24668 – I18N - Output window should use the same encoding as the launched application


Summary: I18N - Output window should use the same encoding as the launched application

Status:	RESOLVED FIXED

Alias:	None

Product:	projects
Classification:	Unclassified
Component:	Ant Project
Version:	6.x
Hardware:	All All

Importance:	P3 blocker
Assignee:	Jesse Glick

URL:
Keywords:	I18N

Depends on:	166597
Blocks:

Reported:	2002-06-11 21:30 UTC by _ gtzabari
Modified:	2011-07-29 14:09 UTC
CC List:	7 users

See Also:
Issue Type:	DEFECT
Exception Reporter:

Attachments
Program, and results of running it (57.43 KB, image/jpeg) 2008-08-20 18:49 UTC, ralphrmartin
Possible patch (includes changes only for j2seproject; other project types may also need patching) (5.71 KB, patch) 2008-08-26 19:23 UTC, Jesse Glick
Alternate patch using <redirector>, for evaluation (fixes UTF-8 for CoS projects; does not handle non-j2se projects) (3.43 KB, patch) 2008-08-26 20:28 UTC, Jesse Glick
Revised patch; now should pick up project encoding even when using CoS; still j2seproject only (5.95 KB, patch) 2008-12-19 01:36 UTC, Jesse Glick

Description _ gtzabari 2002-06-11 21:30:22 UTC

We should be able to (somehow) view visual 
representations of unicode characters being used in the 
code. For example, say I have:

System.out.println("\u0320 This is a test!");

   I should be able to move my mouse over the String and 
see a visual representation or something similar.

Comment 1 Marek Grummich 2002-07-22 12:20:57 UTC

Set target milestone to TBD

Comment 2 Marek Grummich 2002-07-22 12:26:20 UTC

Set target milestone to TBD

Comment 3 Jesse Glick 2002-12-23 16:37:51 UTC

Consistent use of the I18N keyword.

Comment 4 Jesse Glick 2006-01-02 21:49:08 UTC

Or have a toggle button in the editor to switch from "raw" to "cooked" mode. In
raw mode you would see the escapes; in cooked mode, the real Unicode character.
Would be especially valuable for .properties files, for which cf. #32392.

Comment 5 sabernaderi 2006-03-30 21:34:12 UTC

bfb

Comment 6 sabernaderi 2006-03-30 21:36:32 UTC

bfb

Comment 7 ralphrmartin 2008-08-20 15:06:19 UTC

This is a serious issue for anyone using Netbeans to develop programs using non-western fonts.

Please can we have some action on this?

Comment 8 Ken Frank 2008-08-20 16:04:01 UTC

to developers, is the assigned to person still the one who would look
at this and evaluate if it could be done ?

if not, could someone re look at it, based on latest comments from ralph ?

ken.frank@sun.com

Comment 9 Jesse Glick 2008-08-20 16:34:56 UTC

This is certainly desirable, though there are two ways in which it is less critical than in previous NB releases:

1. As of 6.0 (or is it 6.1?) each project may configure a specific source file encoding, such as UTF-8, so you can
insert Unicode characters directly into Java sources without needing an escape. (BTW try the Insert Unicode module
available on the alpha update center.)

2. For the common case that you are keeping international strings in *.properties files, the 6.5 editor shows these
characters "cooked" even though the actual file on disk uses escapes.

There remains the problem that people who (for whatever reason) need to have \uXXXX escapes in *.java files cannot
easily see what they are.

Comment 10 ralphrmartin 2008-08-20 16:50:54 UTC

Using UTF8 for the sources does not completely solve my problem (perhaps I should file a different bug).

The real problem is that if you write a Java command line program, the OUTPUT window does not display UTF8 characters (unlike source windows). This is true 
EVEN IF you set -Dfile.encoding=UTF8 in the arguments for running the program (certainly on MacOS X, anyway).

Comment 11 Jesse Glick 2008-08-20 16:54:21 UTC

Display of characters in output would be a completely unrelated issue. This is very sensitive to the operating system,
as output from the externally spawned process gets converted to a bytestream which then needs to be converted back to
characters for display in the IDE. It works fine on Linux (assuming the system locale is UTF-8, as it is on all modern
distributions), but I'm not surprised if it does not work on Macs.

Comment 12 _ gtzabari 2008-08-20 16:57:25 UTC

> There remains the problem that people who (for whatever reason) need to have \uXXXX escapes in *.java files cannot
easily see what they are.

Is there such a case? Let's name specific use-cases for this feature. Is there a reason for someone to prefer \uXXXX
instead of raw characters typed into a UTF8 document? Or is this simply something we have to deal with because of legacy
documents?

If this is purely a legacy-document thing I would suggest adding a feature where Netbeans asks the user if he'd like to
convert all his \uXXXX in strings to raw UTF8 characters. Even with this feature available, however, certain people
might decide they don't wish to change existing code. In such a case (or if there is a more compelling use-case) you
might want to consider the following user interface: reuse the "code folding" feature to enable the user to toggle
between raw versus visual display mode for strings. For example, a line containing \uXXXX will display the visual
representation if the line is "folded" and the raw \uXXXX string if it is expanded.

Comment 13 Ken Frank 2008-08-20 17:02:44 UTC

Ralph,

does the situation happen in output window even if the project encoding
property of your project is utf-8 ?

and are the characters to be output in the output window the escaped
ones or the actual ones ?

if you could attach a gif of what is seen in output window it could be helpful.

ken.frank@sun.com

Comment 14 Ken Frank 2008-08-20 17:11:43 UTC

gtzabari,

I don't know at all if this would help but will mention it - on nb update center is
encoding plugin that lets convert between different encodings in a project files;
I don't know if it would apply to this case but it might be worth looking at.

ken.frank@sun.com

Comment 15 ralphrmartin 2008-08-20 18:48:03 UTC

Yes, it still is an issue even if the project encoding property is utf-8.

The characters in the output window typically come out as question marks, as per attachment (coming up next).

Comment 16 ralphrmartin 2008-08-20 18:49:11 UTC

Created attachment 67969
Program, and results of running it

Comment 17 Ken Frank 2008-08-20 19:10:51 UTC

ralph,

what platform/os are you using and what locale/reg setting are you using ?

Comment 18 ralphrmartin 2008-08-20 20:17:22 UTC

I am using MacOS X. Netbeans reports 
System: Mac OS X version 10.5.5 running on x86_64; MacRoman; en_US (nb)
(note MacRoman) but I do not know how to change the "System" locale away from MacRoman.

For what it's worth, I am in the UK, and the rest of my Mac's control panel settings are for GB and UK English, roughly speaking.

Comment 19 Vitezslav Stejskal 2008-08-20 22:26:59 UTC

> does the situation happen in output window even if the project encoding
> property of your project is utf-8 ?

As Jesse explained the encoding in the output window and in a project are two unrelated things. So, changing project
encoding may not necessarily fix what you see in the output window. Anyway, this should really be filed as a different
issue.

As for showing the 'cooked' version of unicode escapes maybe we could just show the cooked version in tooltip when you
hover mouse over the escaped part of a string in the editor. That should be reasonably easy to do (at least for java).
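
As an illustration of the "cooked" view discussed above, here is a minimal Java sketch that turns backslash-u escapes in a string into the characters they stand for. The class and method names are hypothetical and nothing here is NetBeans editor API. It is deliberately simplified: it does not treat an escaped backslash specially and does not accept the repeated-u form that the Java language allows.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of NetBeans: shows raw text in "cooked" form.
class UnicodeEscapeCooker {
    // A literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /** Replaces each four-digit escape in raw with the character it denotes. */
    static String cook(String raw) {
        Matcher m = ESCAPE.matcher(raw);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(raw, last, m.start());                     // text before the escape
            out.append((char) Integer.parseInt(m.group(1), 16));  // the decoded character
            last = m.end();
        }
        out.append(raw, last, raw.length());                      // remaining text
        return out.toString();
    }

    public static void main(String[] args) {
        // The string from the original description, as it appears in source.
        String raw = "\\u0320 This is a test!";
        System.out.println("raw:    " + raw);
        System.out.println("cooked: " + cook(raw));
    }
}
```

A tooltip or a raw/cooked toggle could call something like cook() on the string literal under the cursor; how the editor would locate that literal is outside this sketch.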

Comment 20 Ken Frank 2008-08-21 03:36:44 UTC

as to if new issue should be filed,
here is some historical data on some other issues with
output window and non ascii - some were fixed, some closed
as wont fix  - so can see if new issues really should be filed.
(wont include issues or rfe about fonts for ow or about input of multibyte encoding prop rfe or 
encoding detection rfe (which was closed as wont fix))

18666 - fixed 2003
20331 - fixed 2002
115203 - wont fix
130311, 130317 - wont fix

ken.frank@sun.com

Comment 21 Vitezslav Stejskal 2008-08-21 13:46:53 UTC

Thanks Ken. I get that displaying text in output window is beyond Nb control. We had that filed as defects and concluded
that it can't be fixed on our side.

Let's leave this RFE open for adding the possibility to see the real characters for escaped unicode sequences.

Comment 22 Ken Frank 2008-08-21 14:02:59 UTC

Vita,

here is some additional information from separate discussion I had with Ralph - can
you look at it and see if this might lead to separate issue or rfe or if its covered
under the other issues mentioned below that were willnotfix ?

1. when the mac language choice was set to chinese,
the chinese characters were shown ok but the pound sign
was not, which is opposite from original case when he
was running in english locale and the pound sign showed
ok but not the chinese.

2. is there a utf-8 locale on mac like there is on
solaris and linux ? windows does not have one.
perhaps thus from #1 that for nb output window,
the encoding might not be viewed as per project but
as per locale user is in when running nb,
which is different than how rest of nb does it.

3. when he ran java program standalone in mac terminal program, using
-Dfile.encoding=UTF8, all the characters can be seen ok
since terminal can show different encodings.

4. should the user be able to run under english locale,
but being able to input and show in editor both
the chinese and english characters like pound sign --
should nb output window show the characters ok when using
System.out or should user need to do additional java coding related
to encoding handling or conversion so that both characters might
be seen ok in output window ?
(if that is even possible to do) 


5. I think user might assume that since project encoding
is utf-8 which it is in this case, and since all characters
can be seen ok in editor, that simple System.out should
work ok to show all characters in output window.

6. for many/most parts of nb itself, the text shown in
output window does show the non ascii ok as to showing
nb own messages.

---> please let us know if separate issue or rfe needed or if its a wontfix
in context of user needing to do additional encoding handling in code
or if its just something that can't work or is not supported user scenario
as to running in english locale but using characters of other encodings


ken.frank@sun.com

Comment 23 _ gtzabari 2008-08-21 19:07:27 UTC

Correct me if I'm wrong, but isn't the Mac issue simply a matter of the output window using the wrong (non-unicode)
font? Couldn't you resolve it by simply finding a unicode font that displays both English and Chinese characters?

Comment 24 ralphrmartin 2008-08-21 21:12:38 UTC

I don't think it's just a font issue. With the system set to GB, Western chars show up OK, but Asian ones don't. When the system is set to Chinese, Chinese chars 
show up OK but some Western ones don't. BUT in each case, it LOOKS like the same font is used (but I may be wrong on this).

Comment 25 _ gtzabari 2008-08-21 21:15:03 UTC

I've never heard of fonts having a "locale" before. I think a font is a font is a font. I have seen numerous times before
where non-standard characters were rendered as "??" simply because the font did not define a character for them.
Upgrading to a full unicode font always fixed the problem.

Is there a way for you to find out what font is being used by the output window? If so, I would open it up in some font
viewing software and check if the font actually contains the characters you expect it to output.

Comment 26 ralphrmartin 2008-08-21 21:36:57 UTC

I think it's up to the Netbeans engineers to figure out which font is being used. There seems to be no way to change, or even see what the font in the output 
window is from the Netbeans user preferences.

It seems more likely that the issue is that the "-Dfile.encoding=UTF8" run option I am specifying in the project's run preferences is not being passed on, or 
otherwise not being honored in some way.

Indeed, if I change my program to
   public static void main(String[] args) {
        System.out.println(System.getProperty("file.encoding"));        
        System.out.print("£1000 payment to 小青");
    }
and compile it, it compiles fine, but when I try to run it, it seems to hang, and prints nothing.
There is definitely something wrong here.

For completeness, the above program runs fine in Terminal.app, using
java -Dfile.encoding=UTF8 Classname
saying the encoding is UTF8, and printing all characters, Western and Unicode alike.

Comment 27 _ gtzabari 2008-08-21 21:39:11 UTC

I'm not a Netbeans staffer but based on what you wrote this definitely sounds like a bug to me (the hang at the very
least). I would suggest opening a new bug report against the "output window" component with this information.

Comment 28 ralphrmartin 2008-08-21 21:44:28 UTC

OOPS!
I was running the wrong project by mistake.

OK, when I run the right project, the output is (in GB system locale)
MacRoman
£1000 payment to ??

so it looks like my previous supposition is correct. 

The -Dfile.encoding=UTF8 run argument, set in the project properties, is being ignored.

So
(a) I think this should be picked up and not ignored, and
(b) a more sensible default would be for this to automatically get set for the output window to the same as the source code windows, assuming the user 
has not explicitly tried to set it with a -Dfile.encoding=something run argument.

Comment 29 ralphrmartin 2008-08-21 21:48:57 UTC

Would a more competent person than me like to open this as one new bug report, and a separate request for enhancement as follows? 
- Bug: run arguments not being properly picked up and used, specifically -Dfile.encoding=something not being honored
- Enhancement: by default, supply a run argument of -Dfile.encoding=whatever the source code windows have as their encoding

Comment 30 _ gtzabari 2008-08-21 21:50:47 UTC

Okay, now this is beginning to make more sense. My guess is that the following is going on:

1) The Netbeans output window is always initialized with the same font at startup time, regardless of the application
you run.
2) You run a program with a different locale by passing command-line arguments but as far as Netbeans is concerned the
output window is already initialized with an existing font. It makes no attempt to ensure that the output window "tab"
locale matches that of your application.

Think about it from Netbeans' point of view... the output window needs to be created *before* your application actually
runs because of the Ant build process. The real question is whether there is anything Netbeans can do about this, short
of ensuring that the output window is using a full-unicode font to begin with.

Comment 31 ralphrmartin 2008-08-21 22:50:31 UTC

I think this page is relevant: http://java.sun.com/docs/books/tutorial/i18n/text/stream.html

If we cannot set the file.encoding property for output, then the unicode characters from System.out.println are going to get converted to the wrong stream 
of bytes, and hence show up as the wrong characters.

It is not a case of the "output window not understanding". It is a case of "if the file.encoding property is not set right, we get the wrong stream of bytes sent to 
the window" - at least that is what I believe with my incomplete understanding.

Without any disrespect to anyone here, I think that someone who really understands file encodings and charsets on Java deeply needs to take a look at this, 
and tell us all the right thing to do. I know I am pretty much out of my depth.

Comment 32 Vitezslav Stejskal 2008-08-22 09:52:02 UTC

> The -Dfile.encoding=UTF8 run argument, set in the project properties, is being ignored.

How do you specify it? I tried both 6.1 and 6.5, created new java application project, in project properties -> Run ->
VM Options added -Dfile.encoding=MacRoman. Then ran the program which:

System.out.println("file.encoding=" + System.getProperty("file.encoding"));
Charset ch = Charset.defaultCharset();
System.out.println("ch.name=" + ch.name());

and both 'file.encoding' and the default charset was MacRoman. I then tried other encodings like ISO-8859-1 or UTF-8 and
they all seemed to be passed to the program correctly. So, what do I do differently? Why is it working for me and not
for you? (besides of the fact that I'm on linux)

Product Version: NetBeans IDE 6.1 (Build 200804211638)
Java: 1.6.0_10-ea; Java HotSpot(TM) Client VM 11.0-b11
System: Linux version 2.6.22-15-generic running on i386; UTF-8; en_US (nb)

Comment 33 ralphrmartin 2008-08-22 10:18:22 UTC

Aha I can see part of the problem - but unfortunately fixing it still leaves a problem.

You put the -Dfile.encoding in the "VM options" box for the Project's Run Properties.
I was putting it in the "Arguments" box.
This is a subtle difference between arguments for the program, and others for the java command itself!
Maybe the wording here could be tightened up so others do not make this same mistake.

NEVERTHELESS, unfortunately, when I put -Dfile.encoding in the VM options box, I STILL do not get the correct output.
Running this test:

    public static void main(String[] args) {
        System.out.println("file.encoding=" + System.getProperty("file.encoding"));
        Charset ch = Charset.defaultCharset();
        System.out.println("ch.name=" + ch.name());
        System.out.print("£1000 paid to 小青");
    }

Gives this output:

run:
file.encoding=UTF8
ch.name=UTF-8
¬£1000 paid to Â∞èÈùí
BUILD SUCCESSFUL (total time: 0 seconds)

Note an extraneous character before the £ sign, and incorrect Asian characters - even though both file.encoding and ch.name are now OK.

[This is with a completely clean project created in NB6.5. The source file is correctly formatted as UTF-8, which I have verified with an independent editor.]
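
The garbled line above looks like what results when UTF-8 bytes are decoded as MacRoman. A rough way to check that reading is to reverse the mistake: re-encode the garbled text as MacRoman and decode the bytes as UTF-8. The sketch below does this; the class name is hypothetical, the string literals use Unicode escapes so the result does not depend on the compiler's source encoding, and it assumes the JVM provides the x-MacRoman charset (it falls back to a message if not).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic, not part of NetBeans or Ant.
class MojibakeDiagnosis {
    /**
     * Undoes a wrong decode: turns the garbled characters back into the
     * bytes they came from, then reads those bytes as UTF-8.
     */
    static String undo(String garbled, Charset wronglyUsed) {
        return new String(garbled.getBytes(wronglyUsed), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The garbled output line from the run above, spelled with Unicode escapes.
        String garbled = "\u00ac\u00a31000 paid to \u00c2\u221e\u00e8\u00c8\u00f9\u00ed";
        // The string the test program intended to print.
        String intended = "\u00a31000 paid to \u5c0f\u9752";

        if (!Charset.isSupported("x-MacRoman")) {
            System.out.println("x-MacRoman is not available in this JVM");
            return;
        }
        String repaired = undo(garbled, Charset.forName("x-MacRoman"));
        System.out.println("repaired text equals intended text: " + repaired.equals(intended));
    }
}
```

If the repaired text matches, the program wrote correct UTF-8 and the damage happened on the reading side.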

Comment 34 Vitezslav Stejskal 2008-08-22 12:16:28 UTC

Have you tried starting Netbeans with UTF8 by for example adding '-J-Dfile.encoding=UTF8' to <nbinst>/etc/netbeans.conf?
That way both the JVM running Netbeans and JVM running your application should use the same default charset and have no
compatibility problems when converting bytes-to-chars and vice versa.

Comment 35 Ken Frank 2008-08-22 16:03:53 UTC

But should user really need to start nb with encoding option or need
to add such argument to project properties to have characters show
ok in output window, when they do show ok in editor ?

(also had read in some mail or issue in past that the -J-D encoding option
is not an official one to use, and probably should not be needed,
or rather than it does not apply to encoding of other software like dbase,
app servers, though user might think it does)

Getting back to the recent topic, which still probably needs a separate
issue since its not about this issue, is should the output window show
the characters in this case correctly also as to the users program output ?
(and this might be just on mac or not)

to repeat some info provided by Ralph, when he ran in zh locale, the zh characters
showed ok in ow, the pound sign did not, and when he ran in en locale, vice versa.

and with latest info from him, where the encoding was set in run props, still the 
ow did not show the characters ok.

I think its ok if ow might not show all characters ok if indeed project has files/data
with 2 different encodings, same would be for editor or other parts.

but in this case, since project is utf-8, and since assuming those characters
are utf-8, that is, they were entered into editor, for example, pasted, from
terminal that is known to be using utf-8 characters, then shouldn't ow show them
correctly since are talking about just one character encoding at this point ?

maybe its just as he mentions that ow is not using the utf-8 encoding - at least on
mac.  (I've seen in other issues where linux is used to compare and often these
kind of things have worked on linux but not on mac, for example)

---> anyway Vita, do you think a separate issue could be filed on this
since what we are discussing now is not about this issue ?

ken.frank@sun.com

Comment 36 Ken Frank 2008-08-22 17:11:01 UTC

Ralph,

how is the pound sign being entered into the nb editor ?
I found some tool as part of choice of which input methods to use
(international settings has a section/tab that allows to choose this)
that perhaps was a unicode one or at least for japanese that also
had pound sign, dollar sign. this tool has 2 sections where pound sign is available - each one 
looks a bit different.

am running in ja locale, using java project with default utf8 encoding
and then entered ja characters and the 2 pound signs and both showed ok in output
window. (ie same as in editor)



ken.frank@sun.com

Comment 37 ralphrmartin 2008-08-22 17:16:57 UTC

"Have you tried starting Netbeans with UTF8 by for example adding '-J-Dfile.encoding=UTF8' to <nbinst>/etc/netbeans.conf?"

Hurray! I changed netbeans_default_options by adding -J-Dfile.encoding=UTF8 at the front, and it now seems to work fine.

Please consider adding this as a standard default in the next release.

Thanks to all for their hard work and persistence in sorting this out!
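
For readers who want to try the same workaround, the change described above would look roughly like the fragment below. This is only a sketch: the placeholder stands for whatever options the line already contains in a given installation, which should be left as they are.

```
# <nbinst>/etc/netbeans.conf
# Sketch only: add the flag at the front and keep the existing options unchanged.
netbeans_default_options="-J-Dfile.encoding=UTF8 <existing options>"
```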

Comment 38 ralphrmartin 2008-08-22 17:20:01 UTC

How £ is being entered into Netbeans doesn't matter (or Asian characters). It can be cut and paste, typing from Keyboard, or using Apple's Asian input 
methods or whatnot.

As my previous message shows, it is an output encoding problem, and it seems there is a simple fix which should work for all Mac users - change Netbeans 
to run with its default file encoding set to UTF8. Then output and input windows are in agreement, and the problem goes away.

Comment 39 _ gtzabari 2008-08-22 17:20:34 UTC

Okay, so I think it's pretty clear at this point you want to open a new, separate, issue asking the Netbeans team to
change the default file.encoding ;)

It's a shame you didn't open a new issue earlier because now this entire conversation will get lost... jglick, is it
possible to somehow move their conversation and attachment over to the new issue and remove it from this one?

Comment 40 ralphrmartin 2008-08-22 17:22:52 UTC

Apologies for not reading the original posting closely enough. I thought initially it was reporting the same issue I was discussing, but a more careful look 
showed it was not.

Please do raise a new request, and transfer the discussion across if possible.

Better still, just get in there and fix it! :-)

Comment 41 Ken Frank 2008-08-22 17:23:13 UTC

on mac, netbeans about box shows that, for japanese, ja_JP is locale name
and UTF-8 is encoding being seen, thus am assuming that the actual locale
used by setting the i18n properties to Japanese is the ja_JP.UTF-8
locale - am stating this since in /usr/share/locale are 4 ja sub locales, same
for zh_CN and one of them is called ja_JP whereas others have encoding kind
of name like ja_JP.SJIS and don't know if nb about box parses just the ja_JP
part or if thats the locale that is used by the os.

but i guess as long as nb is viewing the encoding of this locale as utf8, that is ok.

if ran using another ja locale and then had the pound sign and the ja characters,
it might be expected that ow might not show all ok if the pound sign was not
part of the character set of that other ja locale.

but in this case, since locale encoding is utf8, seems like both the locale characters and 
pound sign should show ok in ow.

Comment 42 Ken Frank 2008-08-22 17:30:06 UTC

as per Ralph's last comment:

"Have you tried starting Netbeans with UTF8 by for example adding '-J-Dfile.encoding=UTF8' to <nbinst>/etc/netbeans.conf?"

Hurray! I changed netbeans_default_options by adding -J-Dfile.encoding=UTF8 at the front, and it now seems to work fine.

---> isn't it a issue or at least a valid rfe ?
why should user need to use this option when starting nb - I don't think that option has ever
been required and don't think (not sure) its one of the official ones.

if its not issue or rfe, then what are the use cases where using this option would be
needed ?

ken.frank@sun.com

Comment 43 Ken Frank 2008-08-22 17:34:29 UTC

from Ralph's comment - as my previous message shows, it is an output encoding problem, and it seems there is a simple
fix which should work for all Mac users - change Netbeans 
to run with default encoding file encoding to UTF8. Then output and input windows are in agreement, and the problem goes
away.

or perhaps it would be as to implementation - to use the project encoding value rather than utf-8 only
which as you mention the default for a new session with new userdir should be utf-8
since that is default project encoding.

Comment 44 Jesse Glick 2008-08-22 18:11:53 UTC

Anything done to change how I/O encoding works is likely to be obsolete in NB 7.0 since we will probably cease to use
Ant to run programs at that time. AFAIK Ant does byte <-> char conversion using the default encoding, so setting
file.encoding for the NB process could affect it (though even this is unlikely to help Windows users). There is no way
to have the I/O encoding in Ant be sensitive to what project is being run.

Comment 45 nleck 2008-08-24 01:34:38 UTC

please say this isn't so... "obsolete in NB 7.0 since we will probably cease to use
Ant to run programs at that time". ANT is the major reason we use NB over other IDEs.

Comment 46 Vitezslav Stejskal 2008-08-25 10:18:19 UTC

First, thanks Ralph for trying things out and confirming that -J-Dfile.encoding=UTF8 in netbeans.conf helped. Now, let
me summarize what we know and have learned. Most parts of Netbeans don't care about encoding, which in other words means
that they just use the default encoding as it is detected by JVM. Although I'm not familiar with the OW implementation I
think it also uses the default encoding. The encoding set for a project is basically used only by the editor and when
loading/saving the project files. The encoding set by 'Project Properties -> Run -> VM options' field for running the
project is only used as a JVM parameter when running the project application.

Scenario #1: The beginning, no options set.

1. The default encoding for Netbeans is MacRoman, this is also the encoding used by OW
2. Without any other options the JVM running the project uses MacRoman as well
3. System.out.println in the project has to convert unicode characters to bytes in order to send them to the out stream.
This conversion is done by using the MacRoman encoding.

I'm not familiar with MacRoman, but I assume that it contains only a limited number of characters and what exact
characters they are depends on the Mac OS wide locale (english vs chinese) selection. So, the problem here was in
mapping unicode characters to a limited MacRoman charset, which resulted in ? replacing characters that are not in the
MacRoman charset. The OW encoding is in sync with the JVM encoding running the project and shows the out stream
correctly, which means that it shows the ? as they appear in the out stream.

Scenario #2: Using -Dfile.encoding=UTF8 as the VM option for running the project.

1. The default encoding for Netbeans is still MacRoman, which is also the encoding used by OW!
2. The JVM running the project is using UTF8
3. System.out.println in the project converts unicode characters to bytes using the UTF8 encoding.

In this case there is no problem in converting the unicode characters to UTF8 and the out stream of the project's
application is encoded in UTF8. The problem is that OW is using MacRoman encoding (!) to translate the out stream bytes
to characters that are displayed in Netbeans. And again not all characters are displayed.

Scenario #3: Using -Dfile.encoding=UTF8 for both Netbeans' JVM and the JVM running the project.

1. The default encoding for Netbeans is now UTF8 and it is also used by OW.
2. The JVM running the project is using UTF8.
3. System.out.println in the project converts unicode characters to bytes using the UTF8 encoding and the OW is using
UTF8 to convert the out stream bytes back to characters that are displayed in Netbeans.

All characters are displayed correctly now.
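
The three scenarios can be modelled in a few lines of Java. This is a sketch under stated assumptions, not how NetBeans or Ant is implemented: roundTrip() and the class name are hypothetical, the first charset stands for the JVM running the project and the second for the JVM running Netbeans. The demo uses x-MacRoman when the JVM provides it and ISO-8859-1 otherwise, and the test string is written with Unicode escapes so the source stays plain ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical model of the char -> byte -> char path; not NetBeans or Ant code.
class OutputRoundTrip {
    /**
     * Encodes text the way the launched program would, then decodes the
     * resulting bytes the way the IDE side would.
     */
    static String roundTrip(String text, Charset programCharset, Charset ideCharset) {
        // Characters the program's charset cannot represent are replaced here,
        // typically with '?', before the IDE ever sees them.
        byte[] stream = text.getBytes(programCharset);
        return new String(stream, ideCharset);
    }

    public static void main(String[] args) {
        // Pound sign plus two Chinese characters, as in the test program above.
        String text = "\u00a31000 paid to \u5c0f\u9752";
        Charset platform = Charset.isSupported("x-MacRoman")
                ? Charset.forName("x-MacRoman")
                : StandardCharsets.ISO_8859_1;
        Charset utf8 = StandardCharsets.UTF_8;

        System.out.println("Scenario 1: " + roundTrip(text, platform, platform));
        System.out.println("Scenario 2: " + roundTrip(text, utf8, platform));
        System.out.println("Scenario 3: " + roundTrip(text, utf8, utf8));
    }
}
```

Only the third call returns the original string. In the first, the loss happens while encoding; in the second, the bytes are correct and the loss happens while decoding.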

Now, Ken is right when he says 'why should user need to use this option when starting nb'. The user should not have to.
The OW implementation should use the same encoding that was used for running the program. So, if there is
-Dfile.encoding=UTF8 in the VM options and you run the project from within Netbeans, the OW should use UTF8 for decoding
the application's streams to characters. Additionally, the project should by default pass
-Dfile.encoding=<the-selected-project-encoding> to the JVM when launching the application.

I think this is the problem that can and should be reported to the OW and project components. I would actually suggest
to change the summary of this issue and use this issue instead of creating a new one. We can file a different one for the
editor, which will request 'showing the real characters for escape sequences'. This way the conversation we've had
here will not be lost. If there are no objections I'll do that.

Comment 47 ralphrmartin 2008-08-25 10:46:57 UTC

I agree pretty much with the summary; although the following may be slightly more accurate:

Scenario 1, point 1: I am pretty sure the default encoding for Netbeans is whatever the Mac version of Java tells it is the appropriate value of the appropriate 
system property. The actual value will depend how the users have set up their international preferences - e.g. for Westerners it will often be MacRoman, but 
for Chinese users it will surely be different. MacRoman is a 256 char charset like say latin1, but different in detail.

Anyway, please lets do get this fixed in 6.5 final if we can.

Comment 48 Ken Frank 2008-08-25 15:46:23 UTC

Vita,

thanks for the summary of this and offer to change this issue so it can be
into another category/subcat with a different summary about ow itself.

I think that Ralph's recent comment about how nb views encoding before or aside
from a project is correct also, it depends on locale user or OS is in when they start nb.

ken.frank@sun.com

Comment 49 Jesse Glick 2008-08-25 16:38:00 UTC

This issue was reported about displaying Unicode characters in the editor naturally. Please let's leave it that way. Any
issues with running programs are unrelated and should be reported separately. But I think anything that might be filed
would be WONTFIX anyway, as described below.

To rephrase and expand upon my earlier message, since I don't think it was understood well:

1. The Output Window has no "encoding", it deals entirely with Unicode characters just like any Java/Swing component.

2. Any issues with loss of non-ASCII characters during a run of an external program have to do with the char -> byte ->
char translation done by that program first printing to an OS-specific stdio stream and that stream then being decoded
for display in the OW. The OW itself has nothing to do with this process.

3. External Java programs encode characters to System.out using the value of file.encoding in that external process,
defaulting to the platform's encoding. NetBeans has nothing to do with this, except insofar as it could explicitly
override this property using -Dkey=value when launching that program.

4. In current NB releases, external Java programs are run using Ant's <exec>. This task decodes the process's System.out
and .err (the confusingly named Process.getInputStream, as well as .getErrorStream) using the value of file.encoding in
the NetBeans process. NetBeans itself has no control or influence over this decoding other than by setting file.encoding
for the entire NetBeans process. There is no way to align this encoding with a particular project.

5. Post-6.5 releases of NB will likely launch external Java programs directly by default, rather than using Ant. Among
other things such a change could give us more options for controlling the encoding and decoding of characters. This does
_not_ mean that Ant will not be used by NetBeans, just that it will not be used to run user programs interactively by
default (you will still be able to run programs through Ant if you have special needs which are best handled by a custom
Ant script). Projects using Compile on Save in 6.5 already bypass the project's regular build.xml for running the
program; in 6.5 Ant is still being used for the implementation of the run & debug actions, but this special script is
inaccessible to the project and should be considered an implementation detail.
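
Points 2 to 4 above can be illustrated with a minimal Java sketch. It is not part of the original comment: the class and method names are hypothetical, and it only simulates the char -> byte -> char translation in memory rather than launching a real process. It assumes the writer side stands for the external program's file.encoding and the reader side for the encoding used to decode the stream for display.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StdioRoundTrip {

    /**
     * Simulates the char -> byte -> char path: the launched program encodes
     * text with writerEncoding, and the reading side decodes the resulting
     * bytes with readerEncoding.
     */
    static String roundTrip(String text, Charset writerEncoding, Charset readerEncoding) {
        byte[] wire = text.getBytes(writerEncoding);
        return new String(wire, readerEncoding);
    }

    public static void main(String[] args) {
        // A Unicode escape keeps this source file pure ASCII.
        String text = "\u010Cau";

        // Matched encodings that can represent the text: nothing is lost.
        System.out.println(text.equals(
                roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8))); // true

        // Mismatched encodings: the bytes are misread on the way back.
        System.out.println(text.equals(
                roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1))); // false

        // Matched encodings that cannot represent the character: it is
        // replaced while encoding, before the reader ever sees it.
        System.out.println(
                roundTrip(text, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1)); // ?au
    }
}
```

In this sketch the display component never appears, which matches the claim that the loss happens in the encode and decode steps around the stream.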

Comment 50 Ken Frank 2008-08-25 16:48:04 UTC

thanks for explaining the underlying flow of things, but I think what's
being asked here (or rather would be, if this issue had its cat/subcat changed or was in another issue)
is whether netbeans *should* do some of these things for the user, vs them needing to do it themselves.

seems like the user should not need to know or care that the ow flow is different from the editor's -
they just see that some things show ok in the editor and not in the ow (or that they need to do special
things like starting nb with an encoding value or setting it in options, which does not seem like a helpful
thing to require)

in any case, with the new implementation discussed for this, how can we provide comments
in that issue or to that team, so they consider these encoding situations when doing
that implementation?

ken.frank@sun.com

Comment 51 Jesse Glick 2008-08-25 16:58:07 UTC

Of course it _should_ work the way the user wants. In 6.5 as far as I know it _can't_, except insofar as adding
-Dfile.encoding=UTF-8 to netbeans.conf may help UTF-8 users on Mac OS X (unnecessary on Linux, may not work on Windows,
not sure about Solaris). I do not have a cross-platform QA lab to test every scenario on, so I can only guess about how
non-Linux OSs will work. The last time I tried to make Win XP (SP2, standard US-English install) display arbitrary
Unicode characters in a Java console application running in a command shell (not using NetBeans at all), after some
effort I concluded it was not possible, but there may well be special tricks I do not know about.

There is no value in discussing the behavior of NB 7.0+ before it has even begun to be developed. The implementation of
program running will likely be very different and the bugs fixed or introduced will not have much to do with the
implementation in 6.5.

Bear in mind that web applications and GUI applications use entirely different code paths for producing visible output
and these are not likely to have any problems with Unicode. It is console applications that are tricky, since standard
I/O predates widespread Unicode support.

Again, this discussion is cluttering a straightforward and implementable RFE about showing a tooltip in the editor over
Unicode escapes.

Comment 52 Vitezslav Stejskal 2008-08-26 10:34:50 UTC

Ok, here is what happened just now. I filed a new RFE - I18N - Show real characters for escape sequences - and
renamed/moved this one to core/output. I think I understand Jesse's explanation. But since we all concluded that this
__is__ a problem and we may even have a solution for it in NB 7.0+, I think it's fair to track it.

Obviously we can still decide that the solution would be too expensive and/or the situation is not that bad (works on
Linux, has a workaround on Mac) and close this issue as WONTFIX. Thanks

Comment 53 _ gtzabari 2008-08-26 10:38:59 UTC

The new RFE can be found at issue 145106.

Comment 54 Jesse Glick 2008-08-26 16:34:52 UTC

To repeat - this probably has nothing to do with the output window; core/output is the wrong component. (You can confirm
by writing and running a simple module which just uses IOProvider to get an output tab and print some random Unicode
text to it.)

Comment 55 Ken Frank 2008-08-26 16:52:02 UTC

Vita,

thanks for doing this !

one comment on whether this issue should be considered or not - does it
happen on windows, since that's what is used by most?
It's great that it might not happen on linux, but
IMO that's not a reason to not look at it completely
if it happens on windows.

ken.frank@sun.com

Comment 56 Jesse Glick 2008-08-26 18:44:12 UTC

The version "6.0" was intentional, since this was when project encodings were introduced.

Comment 57 Jesse Glick 2008-08-26 19:03:51 UTC

I just tried again in plain US-Eng XP SP2 (inside VirtualBox). Seems the matched use of -J-Dfile.encoding=UTF-8 in
netbeans.conf (=> Ant) and the user program (run.jvmargs=-Dfile.encoding=UTF-8 in project.properties) does let you print
"Čau ty vole! שלם" to System.out without corruption. Note that the project's source file encoding is completely
irrelevant: you must use the same encoding for NB as for running the program, and the only reasonable choice for this
shared encoding is UTF-8.

Comment 58 Jesse Glick 2008-08-26 19:06:04 UTC

Note also that there appears to be no way to get the program to display correctly in an XP command window - only when
run through NetBeans.

Comment 59 Jesse Glick 2008-08-26 19:23:12 UTC

Created attachment 68365 [details]
Possible patch (includes changes only for j2seproject; other project types may also need patching)

Comment 60 Jesse Glick 2008-08-26 19:59:30 UTC

Another option would be to leave netbeans.conf alone and use

<jvmarg value="-Dfile.encoding=${source.encoding}"/>
<redirector outputencoding="${source.encoding}" inputencoding="${source.encoding}" errorencoding="${source.encoding}"/>

inside every call to <java>, where ${source.encoding} is the project's encoding. (Obviously this is the encoding of the
project being actually run; if it has subprojects which specify different encodings, these are ignored for purposes of
running the app.)

I don't fully understand what <redirector> does (it is poorly documented), but it seems to work for Unicode output on
XP. (Unicode input still does not work; I have no idea why, but this is likely less important than output.)

The above snippet could be used directly in build-impl.xsl for e.g. j2seproject. For the snippets in java.source.ant,
used for compile-on-save projects in 6.5+, the ProjectRunner API would have to either accept a new API parameter with
the project encoding, or automatically notice "-Dfile.encoding=..." being used in the VM arguments and create a matching
<redirector/>, or just use UTF-8. You can test it easily in any version of NetBeans by just adding to build.xml of a
j2seproject:

<target name="run" depends="compile">
    <java classname="${main.class}" classpath="${build.classes.dir}" fork="true">
        <jvmarg value="-Dfile.encoding=${source.encoding}"/>
        <redirector outputencoding="${source.encoding}"
                    inputencoding="${source.encoding}"
                    errorencoding="${source.encoding}"/>
    </java>
</target>

I do not know whether a similar change would be useful for e.g. Ruby console projects.
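
For comparison, the same idea (pass one encoding to the launched JVM and decode its output with that same encoding) can be sketched in plain Java without Ant. This is a hedged illustration only, not what the patches do: the class, the method names and the arguments in the usage note are hypothetical. The -Dfile.encoding flag follows the discussion in this thread; later JDK releases changed how the encoding of System.out is chosen, so the flag may not have the same effect there.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class MatchedEncodingLauncher {

    /** Builds a java command line that sets file.encoding in the child JVM. */
    static List<String> command(String javaExe, String classpath, String mainClass, Charset encoding) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaExe);
        cmd.add("-Dfile.encoding=" + encoding.name());
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        return cmd;
    }

    /** Decodes a byte stream using the given encoding instead of the platform default. */
    static String decode(InputStream in, Charset encoding) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        try (Reader reader = new InputStreamReader(in, encoding)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    /** Runs the child JVM and returns its combined stdout/stderr as text. */
    static String run(String javaExe, String classpath, String mainClass, Charset encoding)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(javaExe, classpath, mainClass, encoding))
                .redirectErrorStream(true) // merge stderr into stdout for simplicity
                .start();
        String output = decode(process.getInputStream(), encoding);
        process.waitFor();
        return output;
    }
}
```

A call such as run("java", "build/classes", "demo.Main", StandardCharsets.UTF_8) would use placeholder values for the executable, classpath and main class. The design point is that a single Charset value feeds both the launch argument and the decoder, so the two sides cannot drift apart.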

Comment 61 Jesse Glick 2008-08-26 20:28:00 UTC

Created attachment 68374 [details]
Alternate patch using <redirector>, for evaluation (fixes UTF-8 for CoS projects; does not handle non-j2se projects)

Comment 62 Jesse Glick 2008-09-10 04:42:05 UTC

Too risky for 6.5 I think. If someone from QE is ready to exhaustively test a possible patch I could prepare one after
6.5 has been branched off.

Comment 63 Ken Frank 2008-09-10 04:59:04 UTC

I agree with Jesse here - IMO this should not happen for 6.5. We are already past feature freeze
and almost at code freeze. Also, if separate issues might be needed for this
for non-j2se projects, then doing this for the next release would allow coordination
about it and give users consistent behavior.

We can test this in a while, but for now the focus is on 6.5; perhaps we can see if ralphrmartin
and gtzabari might want to look at the patch also.

ken.frank@sun.com

Comment 64 Jesse Glick 2008-11-07 17:15:10 UTC

Reminding myself to prepare a patch for experimentation.

Comment 65 Jesse Glick 2008-12-19 01:36:45 UTC

Created attachment 75164 [details]
Revised patch; now should pick up project encoding even when using CoS; still j2seproject only

Comment 66 Jesse Glick 2008-12-19 01:39:24 UTC

I would appreciate it if someone from QA could test the latest patch. This is for j2seprojects only, though I am not
sure if it even makes sense for Java EE-oriented project types. Testing should obviously be on a variety of different
OSs set up in different ways. Non-ASCII output which is compatible with the project encoding should be displayed
correctly in the Output Window; non-ASCII input may not work.

Comment 67 Antonin Nebuzelsky 2008-12-19 10:19:29 UTC

> I would appreciate it if someone from QA could test the latest patch.

Lukasi, can you help Jesse?

Comment 68 Tomas Danek 2009-01-06 13:24:17 UTC

I can try to help you, but please give me a hint what to test. My guess is that the test matrix should have these axes:
- CoS on/off
- Sol/Lin/XP/Mac
- UTF-8/other project encoding

I have no experience with I18N, encodings, locales, ... Can you give advice, Ken?

Comment 69 Jesse Glick 2009-01-06 21:16:23 UTC

Yes, all of those axes would be useful. (Vista as well as XP, perhaps.)

Comment 70 Tomas Danek 2009-01-13 13:27:38 UTC

Michal Vanek (thanks!) helped us to test on Win XP as well as Ubuntu; it works as expected. Testing on Solaris will follow.

Comment 71 Jesse Glick 2009-04-03 01:55:57 UTC

Modified patch a bit; in CoS mode, encoding was falling back to UTF-8. (Which can encode all chars and is a good
fallback, but not what was intended.)

Not touching other project types besides j2seproject. Server-based projects would anyway not care much about encoding of
stdio.

core-main #26596531e8a9

Comment 72 Quality Engineering 2009-04-03 20:08:57 UTC

Integrated into 'main-golden', will be available in build *200904031400* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main-golden/rev/26596531e8a9
User: Jesse Glick <jglick@netbeans.org>
Log: #24668: set file.encoding to project's source.encoding and use the same for en/decoding stdio.

Comment 73 Quality Engineering 2011-07-29 14:09:11 UTC

Integrated into 'main-golden'
Changeset: http://hg.netbeans.org/main-golden/rev/e42490057e6c
User: Jesse Glick <jglick@netbeans.org>
Log: 'encoding' will always be set after #24668.