Bug 51996 - Encoding problem when using Ant exec on Windows
Summary: Encoding problem when using Ant exec on Windows
Status: CLOSED WONTFIX
Alias: None
Product: projects
Classification: Unclassified
Component: Ant
Version: 4.x
Hardware: PC Windows XP
Importance: P3 blocker
Assignee: Jesse Glick
URL:
Keywords: I18N
Depends on:
Blocks:
 
Reported: 2004-12-02 01:41 UTC by Steve Benigan
Modified: 2006-03-24 10:11 UTC
CC List: 2 users

See Also:
Issue Type: DEFECT
Exception Reporter:


Attachments
New test script (1.35 KB, text/plain)
2004-12-03 18:50 UTC, Jesse Glick

Description Steve Benigan 2004-12-02 01:41:34 UTC
Under Windows, Ant output appears to have the
wrong encoding when running the exec task.

To reproduce:

1. In NetBeans, create the following Ant script in
the Files tab (so that you can execute it within
NetBeans):

<?xml version="1.0" encoding="UTF-8"?>
 <project name="test" basedir="." default="init">
   <target name="init">
     <exec executable="dir"/>
     <exec executable="cmd">
        <arg value="/c dir *"/>
     </exec>
   </target>
 </project>

2. Create a file with a special character in the
same directory such as "über".

3. Run the ant script and you'll see that the
character is displayed incorrectly in both cases.
   It is displayed as "\374ber" and "&#65533;ber" where
the question mark appears as a square.  The second
form is what I was seeing while working on the
subversion profile.

Running either command (dir or cmd /c dir *) from
the command prompt results in the proper display
of the special character.

This was discovered initially while working on the
subversion profile in which "svn status" was
incorrectly returning "&#65533;ber".  This problem
appears to be deeper in Netbeans, possibly due to
the ant integration.  It would need to be fixed in
order for the subversion profile to be used with
special characters since the profile relies on
parsing the command output and matching against
existing file names.
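
One plausible reading of the two renderings, sketched in Java: "\374" is the octal escape for the single byte 0xFC, which is "ü" in ISO-8859-1 and Cp1252, and a square or question mark is what a decoder typically emits when it cannot map that byte. The class and method names below are hypothetical and are not part of Ant or NetBeans. The choice of UTF-8 as the "wrong" charset is only an assumption for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class UmlautBytes {
    // Decode raw bytes with an explicit charset instead of the platform default.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        // "\374ber": octal 374 is 0xFC, followed by the ASCII bytes for "ber".
        byte[] raw = {(byte) 0374, 'b', 'e', 'r'};

        // 0xFC maps to U+00FC (u with diaeresis) in ISO-8859-1.
        String latin1 = decode(raw, StandardCharsets.ISO_8859_1);
        // 0xFC is not a valid UTF-8 lead byte, so the decoder substitutes U+FFFD.
        String utf8 = decode(raw, StandardCharsets.UTF_8);

        System.out.printf("ISO-8859-1: U+%04X%n", (int) latin1.charAt(0));
        System.out.printf("UTF-8:      U+%04X%n", (int) utf8.charAt(0));
    }
}
```

If the bytes and the decoder disagree like this anywhere between the child process and the output window, a later string comparison against the real file name will fail, which matches the symptom described for the profile.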
Comment 1 Steve Benigan 2004-12-02 01:44:25 UTC
Hmmm.  "?ber" in the description above should appear as:

question mark + "ber"

I guess this issue tracker has an encoding problem too.
Comment 2 Steve Benigan 2004-12-02 01:46:45 UTC
Grrr...

&#65533;ber

should have been

question mark + "ber"

It looks like the encoding problem for this issue tracker only applies
to the initial description!  But that's separate from the issue at hand...
Comment 3 Jesse Glick 2004-12-02 02:43:39 UTC
The Ant integration just takes (Unicode) characters as handed to it by
Ant and sends them to the output window for display as they come. I'm
not surprised that Windows has difficulty with certain encodings but I
don't think there's anything to be done about it besides configuring
your operating system to use a uniform encoding everywhere (e.g.
UTF-8). FWIW, Linux in UTF-8 locale has no issues with any character
set, so it is really OS-dependent.
Comment 4 Steve Benigan 2004-12-02 09:39:10 UTC
Reopening because this will be an issue for every vcsgeneric profile
that relies on parsing Ant output under Windows, and for any API that
requires parsing Ant output.  It may also affect users' Ant scripts that
rely on matching file names in tasks such as copy.  NetBeans relies on
Ant behaving the same on the Command Prompt and in NetBeans, and this
is not the case.

While it may not be a NetBeans code issue, it certainly needs to be
addressed.  I've searched Ant's bug database but haven't come up with
anything yet.

Here is a more in depth ant test script.

<?xml version="1.0" encoding="UTF-8"?>
 <project name="test" basedir="." default="init">
    <!-- tests encoding of ant output.  Run this script from the
Command Prompt
    in windows and within Netbeans to see the different results -->
    <target name="init">
        <!-- Running "dir" and "cmd /c dir *" on the Command Prompt
displays the correct results "über" -->

        <echo message="dir without redirector"/>
        <exec executable="dir" />
        <!--
        Netbeans       "\374ber"
        Command Prompt "\374ber"
        -->

        <echo message="cmd /c dir without redirector"/>
        <exec executable="cmd">
        <arg value="/c dir *"/>
        </exec>
        <!--
        Netbeans       "&#65533;ber"
        Command Prompt "?ber"
        -->     

        <echo message="dir with redirector inputencoding=UTF-8"/>
        <exec executable="dir">
        <redirector inputencoding="UTF-8"/>
        </exec>
        <!-- redirector has no effect
        Netbeans       "\374ber"
        Command Prompt "\374ber"
        -->

        <echo message="cmd /c dir with redirector inputencoding=UTF-8"/>
        <exec executable="cmd">
        <arg value="/c dir *"/>
        <redirector inputencoding="UTF-8"/>
        </exec>
        <!-- redirector does not fix the problem
        Netbeans       "?ber"
        Command Prompt "?ber"
        -->

        <echo message="echo task"/>
        <echo message="über.txt"/>
        <!--
        Netbeans       "über"
        Command Prompt "&#8319;ber"
        -->
    </target>
 </project>
Comment 5 Steve Benigan 2004-12-02 09:41:05 UTC
Note that the encoding in this issue tracker isn't quite correct so
the comments in the ant script aren't showing correctly in the
previous issue comment.
Comment 6 Jesse Glick 2004-12-02 16:51:38 UTC
Do not paste large text blocks into IZ's "Comments" area in general
(lines rewrap etc.), and certainly do not try it if you are using any
non-ASCII characters. Use the "Create a new attachment" link which
lets you upload the file exactly as you have it on disk.

Feel free to investigate, though I doubt you'll find anything fixable
in either NB or Ant. NB deals only with characters, not bytes, as far
as process output is concerned; in fact it does nothing special to
handle <exec> or <java>, it just accepts Strings from Ant. Ant
appears to do the conversion in LogOutputStream.processBuffer, where
it calls ByteArrayOutputStream.toString(), which is documented to use
the platform's default character encoding, whatever that is. You could
insert debugging code to print the hexadecimal byte values that go
into LOS and the hex char values that come out and see if something is
wrong.
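
A minimal sketch of the kind of debugging code meant here, assuming you can get at the byte buffer before it is converted. It is standalone Java, not a patch to Ant's LogOutputStream, and the class and method names are hypothetical. It prints the hex values of the bytes that go in and of the chars that come out, once via ByteArrayOutputStream.toString() (platform default encoding) and once with an explicitly named charset for comparison.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;

// Hypothetical tracing helper; not part of Ant.
final class HexTrace {
    // Render each byte as two uppercase hex digits, separated by spaces.
    static String hexBytes(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Render each UTF-16 char of the string as U+XXXX, separated by spaces.
    static String hexChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Stand-in for the bytes a child process wrote; 0xFC is assumed here.
        byte[] raw = {(byte) 0xFC, 'b', 'e', 'r'};
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        buffer.write(raw, 0, raw.length);

        System.out.println("bytes in:            " + hexBytes(buffer.toByteArray()));
        // toString() with no argument uses the platform default encoding.
        System.out.println("default charset out: " + hexChars(buffer.toString()));
        // Naming the charset makes the result independent of the platform.
        System.out.println("ISO-8859-1 out:      " + hexChars(buffer.toString("ISO-8859-1")));
    }
}
```

Comparing the "default charset out" line across the Command Prompt and the IDE should show whether the bytes already differ on the way in or only the decoding differs.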

BTW what does Ant have to do with Subversion exactly? I don't see the
connection.

Reassigning to you since this sort of thing is not generally
reproducible, or not reproducible in the same way, since it depends on
both the operating system and the operating system's configuration
(e.g. current locale setting), so debugging generally has to be done
by someone observing the problem to begin with.
Comment 7 Steve Benigan 2004-12-03 04:01:48 UTC
Sorry, but I posted the ant script so that this issue is searchable. 
The enclosures are not.  I didn't think it was too much text.

I found the issue while working on the subversion profile.  Most vcs
profiles rely on parsing command output to determine things like
revision, author, status, etc.  With international characters, file
names in Netbeans were not matching the parsed output due to this
issue.  It is not just the subversion profile.  This will happen with
any profile that does this and could happen with any api that relies
on parsing ant output.

Since NetBeans is supposed to be capable of running cross-platform
and in many locales, I posted the Ant script to reproduce the issue
apart from the Subversion profile to show that it's an
Ant/NetBeans/Windows problem.  This is completely reproducible on
Windows in the US and German locales.  A fellow NetBeans user using
Windows 2000 in the German locale has seen the problem as well.

Please do not assign this back to me.  I am only working on the
Subversion profile and this issue extends beyond it into a general
issue with Ant/NetBeans/Windows.  Surely there is someone who works on
locale issues besides yourself who would be interested in addressing
the problem.
Comment 8 Jan Chalupa 2004-12-03 09:16:24 UTC
Steve: could you please attach your <userdir>/var/log/messages.log
file? I'm especially interested in the System Locale and Encoding
settings used by your NetBeans.

I think this is very similar to issue #51672. Obviously, the javac
-encoding setting has nothing to do with this case, but other points
mentioned there probably apply. This also makes me think that the
current resolution of #51672 as a duplicate of issue #19928 is incorrect.

Roman, I think you'll like this one -> cc.
Comment 9 Steve Benigan 2004-12-03 11:14:58 UTC
From <userdir>/var/log/messages.log:

>Log Session: Friday, December 3, 2004 5:50:09 AM EST
>System Info: 
  Product Version       = NetBeans IDE Dev (Build 041130)
  Operating System      = Windows XP version 5.1 running on x86
  Java; VM; Vendor      = 1.5.0; Java HotSpot(TM) Client VM 1.5.0-b64; Sun Microsystems Inc.
  Java Home             = Z:\java\1.5.0\jre
  System Locale; Encod. = en_US (nb); Cp1252

I also tried:

  System Locale; Encod. = de_DE (nb); Cp1252

With the same result.
Comment 10 Jan Chalupa 2004-12-03 15:12:18 UTC
Thanks. I think that explains something. The encoding used by NetBeans
on your system is Cp1252 (Windows Latin-1). I assume you would need to
force NetBeans to use a German encoding (Cp273?). I described a way to
do that in issue #51672. You might also need to make Ant run with the
same encoding, but I don't know how to do that.

I'm quite sure the workaround might help in this case, but why it is
needed and why NetBeans doesn't set the right encoding in the first
place is beyond my imagination.
Comment 11 Jan Chalupa 2004-12-03 15:18:11 UTC
Ah, I noticed you set the encoding in the Ant script to UTF-8. Instead
of experimenting with various Cpxxx encodings, can you try to
explicitly tell NetBeans to run with the UTF-8 encoding like this...?

  nb.exe -J-Dfile.encoding=UTF8

Let me know if it helps.
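
To confirm what a given JVM actually picked up, a small check like the following may help. It is only a sketch with a hypothetical class name, and it reports the file.encoding system property next to the charset the JVM uses by default.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic; prints the encoding the running JVM defaults to.
final class DefaultEncodingReport {
    // Build a one-line summary of the property and the effective default charset.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once plainly and once as `java -Dfile.encoding=UTF-8 DefaultEncodingReport` shows whether the override takes effect. The result depends on the JDK version: JDK 18 and later default to UTF-8 regardless of the OS locale, while older releases such as the 1.5.0 build in the log above derive the default from the system settings.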
Comment 12 Jesse Glick 2004-12-03 18:50:40 UTC
Created attachment 19134
New test script
Comment 13 Jesse Glick 2004-12-03 19:00:05 UTC
So, I did some experimentation on my XP partition. Results:


NOTE: removed executable="dir" because it did not work for me on XP
for whatever reason (I have no dir.exe - built-in?). But cmd /c dir works.


From NB running on American English XP w/ default encoding Cp1252:

Both cmd /c dir showed "dobre.txt" (r-hacek mapped to r)
echo from Ant is correct
both lines from Java program (forked or unforked) show ?

(note: correct accented filename appears in Favorites tab; this is
using a VFAT filesystem)


From NB running on XP w/ -J-Dfile.encoding=UTF-8:

cmd /c dir still shows "dobre.txt" (missing accent)
echo from Ant is still correct
forked Java program still shows ?
unforked Java program shows correct accented char


Command.com on XP:

Both cmd /c dir showed "dobre.txt" (r-caron mapped to r)
echo from Ant shows ?
both lines from Java program (forked or unforked) show ?


After setting Control Panel -> Regional & Language Options -> Advanced
-> Language for Non-Unicode Programs to Czech, and rebooting a couple
of times:


NB w/ default encoding (still Cp1252):


cmd /c dir w/o redir shows y-acute (wrong char)
w/ redir shows ?
echo still fine
forked and unforked Java show ?


NB w/ -J-Dfile.encoding=UTF-8:

both cmd /c dir runs show a square "missing char"
echo from Ant is still correct
forked Java program still shows ?
unforked Java program still shows correct accented char


NB w/ -J-Dfile.encoding=Cp1250:

cmd /c dir w/o redir shows y-acute (wrong char)
w/ redir shows ? (and using inputencoding="Cp1250" or ISO-8859-2 shows
y-acute)
echo still fine
forked Java shows ?
unforked Java is correct


Command.com on XP:

cmd /c dir w/o redirector shows correct accented char
w/ redirector shows ?
echo from Ant shows ?
both lines from Java program (forked or unforked) show ?


Now from Linux (Fedora Core 3, locale en_US.UTF-8), after replacing
cmd /c dir with ls:


From a shell on console (font latarcyrheb-sun16):
all correct


From gnome-terminal:
all correct


From NB, using default UTF-8 encoding:
all correct


So my current recommendations: use ASCII or use Linux, your choice.
Sorry, but there just seem to be too many variables on Windows, and
there does not appear to be any way to force the OS to use UTF-8
consistently for everything. (There is no explicit "encoding" setting
available in the Control Panel, just "Language for Non-Unicode
Programs", which parenthetically notes the encoding for each language
- but they are all codepage-based, not Unicode.)

If anyone has any bright ideas on how Ant could automatically detect
that the OS is not Unicode-friendly and massage its I/O to use some
charset translator in a way that would work reliably and fix these
problems, feel free to file a patch on ant.apache.org (and prepare for
exhaustive field testing). Again, NB itself does not do byte <-> char
translation for the Ant integration, so it is probably not responsible
for any bugs in this area (nor can it likely include any fixes).
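
For anyone who wants to experiment with such a charset translator, the core step can be sketched as below. It assumes the caller already knows which encoding the child process writes, which is exactly the part that is hard to detect reliably on Windows. The class and method names are hypothetical and this is not how Ant currently reads process output.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of decoding process output with an explicit charset.
final class ExplicitCharsetReader {
    // Read the whole stream, decoding bytes with the given charset
    // rather than the platform default.
    static String readAll(InputStream in, Charset cs) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, cs)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a child process's stdout, assumed to be ISO-8859-1 here.
        byte[] fakeStdout = "\u00FCber.txt".getBytes(StandardCharsets.ISO_8859_1);
        String text = readAll(new ByteArrayInputStream(fakeStdout), StandardCharsets.ISO_8859_1);
        // Print code points rather than the text, so the console encoding does not matter.
        text.chars().forEach(c -> System.out.printf("U+%04X ", c));
        System.out.println();
    }
}
```

With a real process, the stream would come from something like `new ProcessBuilder("cmd", "/c", "dir").start().getInputStream()`, and the charset would have to match the console code page in effect for that process. Choosing that charset automatically is the open problem.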
Comment 14 Marian Mirilovic 2005-07-12 10:01:35 UTC
closed