This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
We should be able to (somehow) view visual representations of the Unicode characters used in the code. For example, say I have: System.out.println("\u0320 This is a test!"); I should be able to move my mouse over the String and see a visual representation or something similar.
Set target milestone to TBD
Consistent use of the I18N keyword.
Or have a toggle button in the editor to switch from "raw" to "cooked" mode. In raw mode you would see the escapes; in cooked mode, the real Unicode character. Would be especially valuable for .properties files, for which cf. #32392.
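As a rough illustration of what "cooked" mode would have to compute, here is a minimal Java sketch that replaces \uXXXX escapes in a string with the characters they denote. The class and method names (UnicodeEscapes, cook) are made up for this example and are not part of NetBeans. It also ignores the Java rule that a doubled backslash before the "u" cancels the escape, so treat it as a sketch rather than a complete implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a NetBeans API: turns "raw" text into "cooked" text.
final class UnicodeEscapes {
    // A backslash, one or more 'u', then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    static String cook(String raw) {
        Matcher m = ESCAPE.matcher(raw);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(raw, last, m.start());                     // text before the escape
            out.append((char) Integer.parseInt(m.group(1), 16));  // the decoded UTF-16 unit
            last = m.end();
        }
        out.append(raw, last, raw.length());                      // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        // The argument is the six-character escape followed by plain text.
        System.out.println(cook("\\u00A31000 payment"));
    }
}
```

A tooltip or a toggle button could both be built on a function of this shape; the difference is only where the cooked text is displayed.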
bfb
This is a serious issue for anyone using Netbeans to develop programs using non-western fonts. Please can we have some action on this?
to developers, is the assigned to person still the one who would look at this and evaluate if it could be done ? if not, could someone re look at it, based on latest comments from ralph ? ken.frank@sun.com
This is certainly desirable, though there are two ways in which it is less critical than in previous NB releases: 1. As of 6.0 (or is it 6.1?) each project may configure a specific source file encoding, such as UTF-8, so you can insert Unicode characters directly into Java sources without needing an escape. (BTW try the Insert Unicode module available on the alpha update center.) 2. For the common case that you are keeping international strings in *.properties files, the 6.5 editor shows these characters "cooked" even though the actual file on disk uses escapes. There remains the problem that people who (for whatever reason) need to have \uXXXX escapes in *.java files cannot easily see what they are.
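For the *.properties case, the "cooked" view matches what the standard library already does when it reads such a file: java.util.Properties.load decodes \uXXXX escapes. A small sketch follows; the key name "payee" and the class name are invented for illustration, and the file content is supplied as a string rather than read from disk.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustration only: shows that Properties.load decodes \uXXXX escapes.
final class PropertiesEscapes {
    static String loadValue(String fileContent, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // What the file on disk contains: ASCII only, with escapes.
        String onDisk = "payee=\\u5c0f\\u9752";
        // What the program (and a "cooked" editor view) sees: two CJK characters.
        System.out.println(loadValue(onDisk, "payee"));
    }
}
```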
Using UTF8 for the sources does not completely solve my problem (perhaps I should file a different bug). The real problem is that if you write a Java command line program, the OUTPUT window does not display UTF8 characters (unlike source windows). This is true EVEN IF you set -Dfile.encoding=UTF8 in the arguments for running the program (certainly on MacOS X, anyway).
Display of characters in output would be a completely unrelated issue. This is very sensitive to the operating system, as output from the externally spawned process gets converted to a bytestream which then needs to be converted back to characters for display in the IDE. It works fine on Linux (assuming the system locale is UTF-8, as it is on all modern distributions), but I'm not surprised if it does not work on Macs.
> There remains the problem that people who (for whatever reason) need to have \uXXXX escapes in *.java files cannot easily see what they are. Is there such a case? Let's name specific use-cases for this feature. Is there a reason for someone to prefer \uXXXX instead of raw characters typed into a UTF8 document? Or is this simply something we have to deal because of legacy documents? If this is purely a legacy-document thing I would suggest adding a feature where Netbeans asks the user if he'd like to convert all his \uXXXX in strings to raw UTF8 characters. Even with this feature available, however, certain people might decide they don't wish to change existing code. In such a case (or if there is a more compelling use-case) you might want to consider the following user interface: reuse the "code folding" feature to enable the user to toggle between raw versus visual display mode for strings. For example, a line containing \uXXXX will display the visual representation if the line is "folded" and the raw \uXXXX string if it is expanded.
Ralph, does the situation happen in output window even if the project encoding property of your project is utf-8 ? and are the characters to be output in the output window the escaped ones or the actual ones ? if you could attach a gif of what is seen in output window it could be helpful. ken.frank@sun.com
gtzabari, I don't know at all if this would help but will mention it - on nb update center is encoding plugin that lets convert between different encodings in a project files; I don't know if it would apply to this case but it might be worth looking at. ken.frank@sun.com
Yes, it still is an issue even if the project encoding property is utf-8. The characters in the output window typically come out as question marks, as per attachment (coming up next).
Created attachment 67969 [details] Program, and results of running it
ralph, what platform/os are you using and what locale/reg setting are you using ?
I am using MacOS X. Netbeans reports System: Mac OS X version 10.5.5 running on x86_64; MacRoman; en_US (nb) (note MacRoman) but I do not know how to change the "System" locale away from MacRoman. For what it's worth, I am in the UK, and the rest of my Mac's control panel settings are for GB and UK English, roughly speaking.
> does the situation happen in output window even if the project encoding > property of your project is utf-8 ? As Jesse explained the encoding in the output window and in a project are two unrelated things. So, changing project encoding may not necessarily fix what you see in the output window. Anyway, this should really be filed as a different issue. As for showing the 'cooked' version of unicode escapes maybe we could just show the cooked version in tooltip when you hover mouse over the escaped part of a string in the editor. That should be reasonably easy to do (at least for java).
as to if new issue should be filed, here is some historical data on some other issues with output window and non ascii - some were fixed, some closed as wont fix - so can see if new issues really should be filed. (wont include issues or rfe about fonts for ow or about input of multibyte encoding prop rfe or encoding detection rfe (which was closed as wont fix) 18666 - fixed 2003 20331 - fixed 2002 115203 - wont fix 130311, 130317 - wont fix ken.frank@sun.com
Thanks Ken. I get that displaying text in output window is beyond Nb control. We had that filed as defects and concluded that it can't be fixed on our side. Let's leave this RFE open for adding the possibility to see the real characters for escaped unicode sequences.
Vita, here is some additional information from a separate discussion I had with Ralph - can you look at it and see if this might lead to a separate issue or rfe, or if it's covered under the other issues mentioned below that were willnotfix ?
1. when the mac language choice was set to chinese, the chinese characters were shown ok but the pound sign was not, which is the opposite of the original case, when he was running in english locale and the pound sign showed ok but not the chinese.
2. is there a utf-8 locale on mac like there is on solaris and linux ? windows does not have one. perhaps it follows from #1 that for the nb output window, the encoding might not be viewed as per project but as per the locale the user is in when running nb, which is different from how the rest of nb does it.
3. when he ran the java program standalone in the mac terminal program, using -Dfile.encoding=UTF8, all the characters can be seen ok since the terminal can show different encodings.
4. should the user be able to run under english locale, while being able to input and show in the editor both the chinese and english characters like the pound sign -- should the nb output window show the characters ok when using System.out, or should the user need to do additional java coding related to encoding handling or conversion so that both characters might be seen ok in the output window ? (if that is even possible to do)
5. I think the user might assume that since the project encoding is utf-8, which it is in this case, and since all characters can be seen ok in the editor, simple System.out should work ok to show all characters in the output window.
6. for many/most parts of nb itself, the text shown in the output window does show the non ascii ok as to showing nb's own messages.
---> please let us know if separate issue or rfe needed or if its a wontfix in context of user needing to do additional encoding handling in code or if its just something that can't work or is not supported user scenario as to running in english locale but using characters of other encodings ken.frank@sun.com
Correct me if I'm wrong, but isn't the Mac issue simply a matter of the output window using the wrong (non-unicode) font? Couldn't you resolve it by simply finding a unicode font that displays both English and Chinese characters?
I don't think it's just a font issue. With the system set to GB, Western chars show up OK, but Asian ones don't. When the system is set to Chinese, Chinese chars show up OK but some Western ones don't. BUT in each case, it LOOKS like the same font is used (but I may be wrong on this).
I've never heard of fonts having a "locale" before. I think a font is a font is a font. I have seen numerous times before where non-standard characters were rendered as "??" simply because the font did not define a character for them. Upgrading to a full unicode font always fixed the problem. Is there a way for you to find out what font is being used by the output window? If so, I would open it up in some font viewing software and check if the font actually contains the characters you expect it to output.
I think it's up to the Netbeans engineers to figure out which font is being used. There seems to be no way to change, or even see, what the font in the output window is from the Netbeans user preferences. It seems more likely that the issue is that the "-Dfile.encoding=UTF8" run option I am specifying in the project's run preferences is not being passed on, or otherwise not being honored in some way. Indeed, if I change my program to

    public static void main(String[] args) {
        System.out.println(System.getProperty("file.encoding"));
        System.out.print("£1000 payment to 小青");
    }

and compile it, it compiles fine, but when I try to run it, it seems to hang, and prints nothing. There is definitely something wrong here. For completeness, the above program runs fine in Terminal.app, using

    java -Dfile.encoding=UTF8 Classname

saying the encoding is UTF8, and printing all characters, Western and Unicode alike.
I'm not a Netbeans staffer but based on what you wrote this definitely sounds like a bug to me (the hang at the very least). I would suggest opening a new bug report against the "output window" component with this information.
OOPS! I was running the wrong project by mistake. OK, when I run the right project, the output is (in GB system locale)

    MacRoman
    £1000 payment to ??

so it looks like my previous supposition is correct. The -Dfile.encoding=UTF8 run argument, set in the project properties, is being ignored. So (a) I think this should be picked up and not ignored, and (b) a more sensible default would be for this to automatically get set for the output window to the same as the source code windows, assuming the user has not explicitly tried to set it with a -Dfile.encoding=something run argument.
Would a more competent person than me like to open this as one new bug report, and a separate request for enhancement as follows? - Bug: run arguments not being properly picked up and used, specifically -Dfile.encoding=something not being honored - Enhancement: by default, supply a run argument of -Dfile.encoding=whatever the source code windows have as their encoding
Okay, now this is beginning to make more sense. My guess is that the following is going on: 1) The Netbeans output window is always initialized with the same font at startup time, regardless of the application you run. 2) You run a program with a different locale by passing command-line arguments but as far as Netbeans is concerned the output window is already initialized with an existing font. It makes no attempt to ensure that the output window "tab" locale matches that of your application. Think about it from Netbeans' point of view... the output window needs to be created *before* your application actually runs because of the Ant build process. The real question is whether there is anything Netbeans can do about this, short of ensuring that the output window is using a full-unicode font to begin with.
I think this page is relevant: http://java.sun.com/docs/books/tutorial/i18n/text/stream.html If we cannot set the file.encoding property for output, then the unicode characters from System.out.println are going to get converted to the wrong stream of bytes, and hence show up as the wrong characters. It is not a case of the "output window not understanding". It is a case of "if the file.encoding property is not set right, we get the wrong stream of bytes sent to the window" - at least that is what I believe with my incomplete understanding. Without any disrespect to anyone here, I think that someone who really understands file encodings and charsets on Java deeply needs to take a look at this, and tell us all the right thing to do. I know I am pretty much out of my depth.
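The "wrong stream of bytes" effect can be reproduced without NetBeans at all. The sketch below encodes the test string with a charset that lacks the CJK characters and decodes it again with the same charset. It uses ISO-8859-1 as a stand-in because every JVM is guaranteed to have it; MacRoman is a different single-byte charset, so this is an analogy to the reported output, not a reproduction of it. The class and method names are invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: characters the charset cannot represent are replaced on encoding.
final class LossyEncoding {
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);  // what a PrintStream using this charset would emit
        return new String(bytes, charset);      // what a reader using the same charset recovers
    }

    public static void main(String[] args) {
        String text = "\u00A31000 payment to \u5C0F\u9752"; // pound sign, then two CJK characters
        // Single-byte charset: the pound sign survives, the CJK characters do not.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1));
        // UTF-8 can represent everything, so the round trip is lossless.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8));
    }
}
```

The information is lost at the encoding step, inside the program being run, before any display component sees it.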
> The -Dfile.encoding=UTF8 run argument, set in the project properties, is being ignored. How do you specify it? I tried both 6.1 and 6.5, created new java application project, in project properties -> Run -> VM Options added -Dfile.encoding=MacRoman. Then ran the program which: System.out.println("file.encoding=" + System.getProperty("file.encoding")); Charset ch = Charset.defaultCharset(); System.out.println("ch.name=" + ch.name()); and both 'file.encoding' and the default charset was MacRoman. I then tried other encodings like ISO-8859-1 or UTF-8 and they all seemed to be passed to the program correctly. So, what do I do differently? Why is it working for me and not for you? (besides of the fact that I'm on linux) Product Version: NetBeans IDE 6.1 (Build 200804211638) Java: 1.6.0_10-ea; Java HotSpot(TM) Client VM 11.0-b11 System: Linux version 2.6.22-15-generic running on i386; UTF-8; en_US (nb)
Aha, I can see part of the problem - but unfortunately fixing it still leaves a problem. You put the -Dfile.encoding in the "VM options" box for the Project's Run Properties. I was putting it in the "Arguments" box. This is a subtle difference between arguments for the program, and others for the java command itself! Maybe the wording here could be tightened up so others do not make this same mistake. NEVERTHELESS, unfortunately, when I put -Dfile.encoding in the VM options box, I STILL do not get the correct output. Running this test:

    public static void main(String[] args) {
        System.out.println("file.encoding=" + System.getProperty("file.encoding"));
        Charset ch = Charset.defaultCharset();
        System.out.println("ch.name=" + ch.name());
        System.out.print("£1000 paid to 小青");
    }

Gives this output:

    run:
    file.encoding=UTF8
    ch.name=UTF-8
    ¬£1000 paid to Â∞èÈùí
    BUILD SUCCESSFUL (total time: 0 seconds)

Note an extraneous character before the £ sign, and incorrect Asian characters - even though both file.encoding and ch.name are now OK. [This is with a completely clean project created in NB6.5. The source file is correctly formatted as UTF-8, which I have verified with an independent editor.]
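The extra character before the pound sign is what one would expect if the program writes UTF-8 bytes and the reader decodes them with a single-byte charset. A minimal sketch of that mismatch follows. The names are invented for the example. ISO-8859-1 is used for the guaranteed case; the MacRoman branch assumes the JDK exposes that charset under the name "x-MacRoman" and is skipped if it does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: bytes written with one charset and read back with another.
final class Mojibake {
    static String misdecode(String text, Charset writer, Charset reader) {
        return new String(text.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        String text = "\u00A31000 paid to \u5C0F\u9752";
        // UTF-8 encodes the pound sign as two bytes; Latin-1 shows each byte as its own character.
        System.out.println(misdecode(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
        // Assumption: "x-MacRoman" is the JDK's name for MacRoman; not every runtime ships it.
        if (Charset.isSupported("x-MacRoman")) {
            System.out.println(misdecode(text, StandardCharsets.UTF_8, Charset.forName("x-MacRoman")));
        }
    }
}
```

Unlike the "??" case, nothing is lost here: the bytes are correct and only the decoding side is wrong, which is why aligning the two encodings can fix it.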
Have you tried starting Netbeans with UTF8 by for example adding '-J-Dfile.encoding=UTF8' to <nbinst>/etc/netbeans.conf? That way both the JVM running Netbeans and JVM running your application should use the same default charset and have no compatibility problems when converting bytes-to-chars and vice versa.
But should the user really need to start nb with an encoding option, or need to add such an argument to project properties, to have characters show ok in the output window when they do show ok in the editor ? (also had read in some mail or issue in the past that the -J-D encoding option is not an official one to use, and probably should not be needed, or rather that it does not apply to the encoding of other software like dbase or app servers, though the user might think it does) Getting back to the recent topic, which still probably needs a separate issue since it's not about this issue: should the output window also show the characters correctly in this case, as to the user's program output ? (and this might be just on mac or not) To repeat some info provided by Ralph, when he ran in zh locale, the zh characters showed ok in ow but the pound sign did not, and when he ran in en locale, vice versa. And with the latest info from him, where the encoding was set in run props, the ow still did not show the characters ok. I think it's ok if ow might not show all characters ok if indeed the project has files/data with 2 different encodings; the same would be true for the editor or other parts. But in this case, since the project is utf-8, and assuming those characters are utf-8 (that is, they were entered into the editor, for example pasted, from a terminal that is known to be using utf-8 characters), shouldn't ow show them correctly, since we are talking about just one character encoding at this point ? Maybe it's just as he mentions, that ow is not using the utf-8 encoding - at least on mac. (I've seen other issues where linux is used to compare, and often these kinds of things have worked on linux but not on mac, for example) ---> anyway Vita, do you think a separate issue could be filed on this, since what we are discussing now is not about this issue ? ken.frank@sun.com
Ralph, how is the pound sign being entered into the nb editor ? I found a tool among the choices of input methods (international settings has a section/tab that allows choosing this) that perhaps was a unicode one, or at least for japanese, that also had pound sign and dollar sign. this tool has 2 sections where the pound sign is available - each one looks a bit different. am running in ja locale, using a java project with default utf8 encoding, and then entered ja characters and the 2 pound signs, and both showed ok in output window. (ie same as in editor) ken.frank@sun.com
"Have you tried starting Netbeans with UTF8 by for example adding '-J-Dfile.encoding=UTF8' to <nbinst>/etc/netbeans.conf?" Hurray! I changed netbeans_default_options by adding -J-Dfile.encoding=UTF8 at the front, and it now seems to work fine. Please consider adding this as a standard default in the next release. Thanks to all for their hard work and persistence in sorting this out!
How £ is being entered into Netbeans doesn't matter (or Asian characters). It can be cut and paste, typing from Keyboard, or using Apple's Asian input methods or whatnot. As my previous message shows, it is an output encoding problem, and it seems there is a simple fix which should work for all Mac users - change Netbeans to run with default encoding file encoding to UTF8. Then output and input windows are in agreement, and the problem goes away.
Okay, so I think it's pretty clear at this point you want to open a new, separate, issue asking the Netbeans team to change the default file.encoding ;) It's a shame you didn't open a new issue earlier because now this entire conversation will get lost... jglick, is it possible to somehow move their conversation and attachment over to the new issue and remove it from this one?
Apologies for not reading the original posting closely enough. I thought initially it was reporting the same issue I was discussing, but a more careful look showed it was not. Please do raise a new request, and transfer the discussion across if possible. Better still, just get in there and fix it! :-)
on mac, netbeans about box shows that, for japanese, ja_JP is locale name and UTF-8 is encoding being seen, thus am assuming that the actual locale used by setting the i18n properties to Japanese is the ja_JP.UTF-8 locale - am stating this since in /usr/share/locale are 4 ja sub locales, same for zh_CN and one of them is called ja_JP whereas others have encoding kind of name like ja_JP.SJIS and don't know if nb about box parses just the ja_JP part or if thats the locale that is used by the os. but i guess as long as nb is viewing the encoding of this locale as utf8, that is ok. if ran using another ja locale and then had the pound sign and the ja characters, it might be expected that ow might not show all ok if the pound sign was not part of the character set of that other ja locale. but in this case, since locale encoding is utf8, seems like both the locale characters and pound sign should show ok in ow.
as per Ralph's last comment: Have you tried starting Netbeans with UTF8 by for example adding '-J-Dfile.encoding=UTF8' to <nbinst>/etc/netbeans.conf?" Hurray! I changed netbeans_default_options by adding -J-Dfile.encoding=UTF8 at the front, and it now seems to work fine. ---> isn't it a issue or at least a valid rfe ? why should user need to use this option when starting nb - I don't think that option has ever been required and don't think (not sure) its one of the official ones. if its not issue or rfe, then what are the use cases where using this option would be needed ? ken.frank@sun.com
from Ralph's comment - as my previous message shows, it is an output encoding problem, and it seems there is a simple fix which should work for all Mac users - change Netbeans to run with default encoding file encoding to UTF8. Then output and input windows are in agreement, and the problem goes away. or perhaps, as to implementation, it would be to use the project encoding value rather than utf-8 only, though as you mention the default for a new session with a new userdir should be utf-8, since that is the default project encoding.
Anything done to change how I/O encoding works is likely to be obsolete in NB 7.0 since we will probably cease to use Ant to run programs at that time. AFAIK Ant does byte <-> char conversion using the default encoding, so setting file.encoding for the NB process could affect it (though even this is unlikely to help Windows users). There is no way to have the I/O encoding in Ant be sensitive to what project is being run.
please say this isn't so... "obsolete in NB 7.0 since we will probably cease to use Ant to run programs at that time". ANT is the major reason we use NB over other IDEs.
First, thanks Ralph for trying things out and confirming that -J-Dfile.encoding=UTF8 in netbeans.conf helped. Now, let me summarize what we know and have learned.

Most parts of Netbeans don't care about encoding, which in other words means that they just use the default encoding as it is detected by the JVM. Although I'm not familiar with the OW implementation I think it also uses the default encoding. The encoding set for a project is basically used only by the editor and when loading/saving the project files. The encoding set by the 'Project Properties -> Run -> VM options' field for running the project is only used as a JVM parameter when running the project application.

Scenario #1: The beginning, no options set.
1. The default encoding for Netbeans is MacRoman, this is also the encoding used by OW
2. Without any other options the JVM running the project uses MacRoman as well
3. System.out.println in the project has to convert unicode characters to bytes in order to send them to the out stream. This conversion is done by using the MacRoman encoding. I'm not familiar with MacRoman, but I assume that it contains only a limited number of characters and what exact characters they are depends on the Mac OS wide locale (english vs chinese) selection.
So, the problem here was in mapping unicode characters to a limited MacRoman charset, which resulted in ? replacing characters that are not in the MacRoman charset. The OW encoding is in sync with the JVM encoding running the project and shows the out stream correctly, which means that it shows the ? as they appear in the out stream.

Scenario #2: Using -Dfile.encoding=UTF8 as the VM option for running the project.
1. The default encoding for Netbeans is still MacRoman, which is also the encoding used by OW!
2. The JVM running the project is using UTF8
3. System.out.println in the project converts unicode characters to bytes using the UTF8 encoding.
In this case there is no problem in converting the unicode characters to UTF8 and the out stream of the project's application is encoded in UTF8. The problem is that OW is using MacRoman encoding (!) to translate the out stream bytes to characters that are displayed in Netbeans. And again not all characters are displayed.

Scenario #3: Using -Dfile.encoding=UTF8 for both Netbeans' JVM and the JVM running the project.
1. The default encoding for Netbeans is now UTF8 and it is also used by OW.
2. The JVM running the project is using UTF8.
3. System.out.println in the project converts unicode characters to bytes using the UTF8 encoding and the OW is using UTF8 to convert the out stream bytes back to characters that are displayed in Netbeans.
All characters are displayed correctly now.

Now, Ken is right when he says 'why should user need to use this option when starting nb'. The user should not have to. The OW implementation should use the same encoding that was used for running the program. So, if there is -Dfile.encoding=UTF8 in the VM options and you run the project from within Netbeans, the OW should use UTF8 for decoding the application's streams to characters. Additionally, the project should by default pass -Dfile.encoding=<the-selected-project-encoding> to the JVM when launching the application. I think this is the problem that can and should be reported to the OW and project components.

I would actually suggest changing the summary of this issue and using this issue instead of creating a new one. We can file a different one for the editor, which will request 'showing the real characters for escape sequences'. This way the conversation we've had here will not be lost. If there are no objections I'll do that.
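To make the proposal concrete, here is a hedged sketch of how a launcher could work out which charset to decode with from the VM options string. Nothing here is NetBeans code: the class name, the method name, and the whitespace-only tokenizing (no support for quoted arguments) are all assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper: finds -Dfile.encoding=... in a VM options string.
final class VmOptionsEncoding {
    private static final String PREFIX = "-Dfile.encoding=";

    static Optional<Charset> fileEncoding(String vmOptions) {
        Optional<Charset> result = Optional.empty();
        // Simplification: split on whitespace only, ignoring shell-style quoting.
        for (String token : vmOptions.trim().split("\\s+")) {
            if (!token.startsWith(PREFIX)) {
                continue;
            }
            String name = token.substring(PREFIX.length());
            try {
                if (Charset.isSupported(name)) {
                    result = Optional.of(Charset.forName(name)); // the last supported value wins
                }
            } catch (IllegalArgumentException e) {
                // Empty or syntactically illegal charset name: ignore this token.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Fall back to UTF-8 here purely for the demo; a real launcher might use the project encoding.
        Charset decodeWith = fileEncoding("-Xmx256m -Dfile.encoding=UTF-8").orElse(StandardCharsets.UTF_8);
        System.out.println(decodeWith.name());
    }
}
```

Whether the decoding side can actually be given this charset depends on how the program is launched, which is a separate question from finding the value.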
I agree pretty much with the summary; although the following may be slightly more accurate: Scenario 1, point 1: I am pretty sure the default encoding for Netbeans is whatever the Mac version of Java tells it is the appropriate value of the appropriate system property. The actual value will depend how the users have set up their international preferences - e.g. for Westerners it will often be MacRoman, but for Chinese users it will surely be different. MacRoman is a 256 char charset like say latin1, but different in detail. Anyway, please lets do get this fixed in 6.5 final if we can.
Vita, thanks for the summary of this and offer to change this issue so it can be into another category/subcat with a different summary about ow itself. I think that Ralph's recent comment about how nb views encoding before or aside from a project is correct also, it depends on locale user or OS is in when they start nb. ken.frank@sun.com
This issue was reported about displaying Unicode characters in the editor naturally. Please let's leave it that way. Any issues with running programs are unrelated and should be reported separately. But I think anything that might be filed would be WONTFIX anyway, as described below.

To rephrase and expand upon my earlier message, since I don't think it was understood well:

1. The Output Window has no "encoding", it deals entirely with Unicode characters just like any Java/Swing component.
2. Any issues with loss of non-ASCII characters during a run of an external program have to do with the char -> byte -> char translation done by that program first printing to an OS-specific stdio stream and that stream then being decoded for display in the OW. The OW itself has nothing to do with this process.
3. External Java programs encode characters to System.out using the value of file.encoding in that external process, defaulting to the platform's encoding. NetBeans has nothing to do with this, except insofar as it could explicitly override this property using -Dkey=value when launching that program.
4. In current NB releases, external Java programs are run using Ant's <exec>. This task decodes the process's System.out and .err (the confusingly named Process.getInputStream, as well as .getErrorStream) using the value of file.encoding in the NetBeans process. NetBeans itself has no control or influence over this decoding other than by setting file.encoding for the entire NetBeans process. There is no way to align this encoding with a particular project.
5. Post-6.5 releases of NB will likely launch external Java programs directly by default, rather than using Ant. Among other things such a change could give us more options for controlling the encoding and decoding of characters.
This does _not_ mean that Ant will not be used by NetBeans, just that it will not be used to run user programs interactively by default (you will still be able to run programs through Ant if you have special needs which are best handled by a custom Ant script). Projects using Compile on Save in 6.5 already bypass the project's regular build.xml for running the program; in 6.5 Ant is still being used for the implementation of the run & debug actions, but this special script is inaccessible to the project and should be considered an implementation detail.
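As a sketch of what "launching directly" could allow, the code below starts an external process with the standard ProcessBuilder API and decodes its output with an explicitly chosen charset rather than the launcher's default. This is an illustration under assumptions, not how any NetBeans release does it; the class and method names are invented.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher helper: decodes a child process's output with a chosen charset.
final class ProcessOutputDecoder {
    // Reads the whole stream and decodes it with the given charset.
    static String decode(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Process.getInputStream() is the child's standard output, despite the name.
    static String runAndDecode(List<String> command, Charset charset) throws IOException {
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        return decode(process.getInputStream(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass the command to run as arguments, e.g. java -Dfile.encoding=UTF-8 Classname
        if (args.length > 0) {
            System.out.print(runAndDecode(Arrays.asList(args), StandardCharsets.UTF_8));
        }
    }
}
```

The charset passed to runAndDecode has to match the file.encoding given to the child process; otherwise the same mismatch described in points 3 and 4 reappears.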
thanks for explaining about the underlying flow of things, but I think what's being asked here (or rather would be, if this issue had its cat/subcat changed or was in another issue) is whether netbeans *should* do some of these things for the user, vs them needing to do it themselves. seems like the user should not need to know or care that the ow flow is different from the editor - they just see that some things show ok in the editor and not in ow (or need to do special things like start nb with an encoding value or set it in options, which seems like a not helpful thing to require) in any case, with the new implementation discussed for this, how can we provide comments in that issue or to that team, to consider these encoding situations when doing that implementation ? ken.frank@sun.com
Of course it _should_ work the way the user wants. In 6.5 as far as I know it _can't_, except insofar as adding -Dfile.encoding=UTF-8 to netbeans.conf may help UTF-8 users on Mac OS X (unnecessary on Linux, may not work on Windows, not sure about Solaris). I do not have a cross-platform QA lab to test every scenario on, so I can only guess about how non-Linux OSs will work. The last time I tried to make Win XP (SP2, standard US-English install) display arbitrary Unicode characters in a Java console application running in a command shell (not using NetBeans at all), after some effort I concluded it was not possible, but there may well be special tricks I do not know about. There is no value in discussing the behavior of NB 7.0+ before it has even begun to be developed. The implementation of program running will likely be very different and the bugs fixed or introduced will not have much to do with the implementation in 6.5. Bear in mind that web applications and GUI applications use entirely different code paths for producing visible output and these are not likely to have any problems with Unicode. It is console applications that are tricky, since standard I/O predates widespread Unicode support. Again, this discussion is cluttering a straightforward and implementable RFE about showing a tooltip in the editor over Unicode escapes.
Ok, here is what happened just now. I filed a new RFE - I18N - Show real characters for escape sequences. And renamed/moved this one to core/output. I think I understand Jesse's explanation. But since we all concluded that this __is__ a problem and we may even have a solution for it in Nb7.0+ then I think it's fair to track it. Obviously we can still decide that the solution would be too expensive and/or the situation is not that bad (works on linux, has workaround on Mac) and close this issue as WONTFIX. Thanks
The new RFE can be found at issue 145106.
To repeat - this probably has nothing to do with the output window; core/output is the wrong component. (You can confirm by writing and running a simple module which just uses IOProvider to get an output tab and print some random Unicode text to it.)
Vita, thanks for doing this ! one comment on if this issue should be considered or not - does it happen on windows since thats what is used by most. Its great that it might not happen on linux but IMO thats not a reason to not look at it completely if it happens on windows. ken.frank@sun.com
The version "6.0" was intentional, since this was when project encodings were introduced.
I just tried again in plain US-Eng XP SP2 (inside VirtualBox). Seems the matched use of -J-Dfile.encoding=UTF-8 in netbeans.conf (=> Ant) and the user program (run.jvmargs=-Dfile.encoding=UTF-8 in project.properties) does let you print "Čau ty vole! שלם" to System.out without corruption. Note that the project's source file encoding is completely irrelevant: you must use the same encoding for NB as for running the program, and the only reasonable choice for this shared encoding is UTF-8.
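The requirement that both sides share one encoding can be illustrated outside NetBeans. The following is only a minimal sketch, not the code NetBeans or Ant actually runs: the class name EncodingMismatchDemo and the method transfer are hypothetical names chosen for illustration. It encodes a string the way a writing program would and decodes it the way a reading tool would, so a mismatch shows up as corrupted text. The sample string is written with \uXXXX escapes so that the example compiles regardless of the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative, not part of NetBeans or Ant.
final class EncodingMismatchDemo {

    /**
     * Encodes the text with the writer's charset (what the running program emits)
     * and decodes the resulting bytes with the reader's charset (what the tool
     * displaying the output assumes).
     */
    static String transfer(String text, Charset writer, Charset reader) {
        byte[] bytes = text.getBytes(writer);
        return new String(bytes, reader);
    }

    public static void main(String[] args) {
        // Same text as in the comment above, spelled with Unicode escapes.
        String sample = "\u010Cau ty vole! \u05E9\u05DC\u05DD";

        // Both sides agree on UTF-8: the text survives unchanged.
        String matched = transfer(sample, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("matched intact:    " + sample.equals(matched));

        // Writer uses UTF-8 but the reader assumes ISO-8859-1: the text is corrupted.
        String mismatched = transfer(sample, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("mismatched intact: " + sample.equals(mismatched));

        // Both sides agree, but the shared charset cannot represent the characters.
        String lossy = transfer(sample, StandardCharsets.US_ASCII, StandardCharsets.US_ASCII);
        System.out.println("ASCII intact:      " + sample.equals(lossy));
    }
}
```

The third case is why UTF-8 is the only reasonable shared choice: agreeing on an encoding is not enough if that encoding cannot represent the characters being printed.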
Note also that there appears to be no way to get the program to display correctly in an XP command window - only when run through NetBeans.
Created attachment 68365 [details] Possible patch (includes changes only for j2seproject; other project types may also need patching)
Another option would be to leave netbeans.conf alone and use

    <jvmarg value="-Dfile.encoding=${source.encoding}"/>
    <redirector outputencoding="${source.encoding}" inputencoding="${source.encoding}" errorencoding="${source.encoding}"/>

inside every call to <java>, where ${source.encoding} is the project's encoding. (Obviously this is the encoding of the project being actually run; if it has subprojects which specify different encodings, these are ignored for purposes of running the app.) I don't fully understand what <redirector> does (it is poorly documented), but it seems to work for Unicode output on XP. (Unicode input still does not work; I have no idea why, but this is likely less important than output.)

The above snippet could be used directly in build-impl.xsl for e.g. j2seproject. For the snippets in java.source.ant, used for compile-on-save projects in 6.5+, the ProjectRunner API would have to either accept a new API parameter with the project encoding, or automatically notice "-Dfile.encoding=..." being used in the VM arguments and create a matching <redirector/>, or just use UTF-8.

You can test it easily in any version of NetBeans by just adding to build.xml of a j2seproject:

    <target name="run" depends="compile">
        <java classname="${main.class}" classpath="${build.classes.dir}" fork="true">
            <jvmarg value="-Dfile.encoding=${source.encoding}"/>
            <redirector outputencoding="${source.encoding}" inputencoding="${source.encoding}" errorencoding="${source.encoding}"/>
        </java>
    </target>

I do not know whether a similar change would be useful for e.g. Ruby console projects.
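The idea behind feeding one property to both <jvmarg> and <redirector> can be sketched in plain Java. This is an assumption-laden illustration, not a description of what Ant's <redirector> does internally: the class MatchedStdioDemo and the method writeThenRead are hypothetical, and an in-memory byte buffer stands in for the pipe between the forked program and the tool that reads its output. The point is only that a single charset value configures both the writing and the reading side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; an in-memory buffer stands in for the process pipe.
final class MatchedStdioDemo {

    /** Writes text through a PrintStream and reads it back, using one shared charset. */
    static String writeThenRead(String text, Charset shared) {
        ByteArrayOutputStream pipe = new ByteArrayOutputStream();
        try {
            // Program side: plays the role of System.out in the forked JVM.
            try (PrintStream out = new PrintStream(pipe, true, shared.name())) {
                out.print(text);
            }
            // Tool side: decodes the captured bytes with the same charset.
            try (Reader in = new InputStreamReader(
                    new ByteArrayInputStream(pipe.toByteArray()), shared)) {
                StringBuilder result = new StringBuilder();
                char[] buffer = new char[256];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    result.append(buffer, 0, read);
                }
                return result.toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "\u010Cau ty vole! \u05E9\u05DC\u05DD";
        String received = writeThenRead(sample, StandardCharsets.UTF_8);
        System.out.println("intact: " + sample.equals(received));
    }
}
```

Because both ends are derived from the same value, changing the project encoding cannot put the writer and the reader out of step with each other.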
Created attachment 68374 [details] Alternate patch using <redirector>, for evaluation (fixes UTF-8 for CoS projects; does not handle non-j2se projects)
Too risky for 6.5 I think. If someone from QE is ready to exhaustively test a possible patch I could prepare one after 6.5 has been branched off.
I agree with Jesse here. IMO this should not happen for 6.5: we are already past feature freeze and almost at code freeze. Also, if separate issues might be needed for non-j2se projects, then doing this for the next release would allow coordination and give users consistent behavior. We can test this in a while, but the focus now is on 6.5; perhaps ralphmartin and gtzabari might want to look at the patch as well. ken.frank@sun.com
Reminding myself to prepare a patch for experimentation.
Created attachment 75164 [details] Revised patch; now should pick up project encoding even when using CoS; still j2seproject only
I would appreciate it if someone from QA could test the latest patch. This is for j2seproject only, though I am not sure if it even makes sense for Java EE-oriented project types. Testing should obviously be on a variety of different OSs set up in different ways. Non-ASCII output which is compatible with the project encoding should be displayed correctly in the Output Window; non-ASCII input may not work.
> I would appreciate it if someone from QA could test the latest patch. Lukasi, can you help Jesse?
I can try to help you, but please give me a hint about what to test. My guess is that the test matrix should have these axes:
- CoS on/off
- Solaris/Linux/XP/Mac
- UTF-8/other project encoding
I have no experience with I18N, encodings, locales, ... Can you give advice, Ken?
Yes, all of those axes would be useful. (Vista as well as XP, perhaps.)
Michal Vanek (thanks!) helped us test on Win XP as well as Ubuntu; it works as expected. Testing on Solaris will follow.
Modified patch a bit; in CoS mode, encoding was falling back to UTF-8. (Which can encode all chars and is a good fallback, but not what was intended.) Not touching other project types besides j2seproject. Server-based projects would anyway not care much about encoding of stdio. core-main #26596531e8a9
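The intended precedence (use the project's encoding when it is known, UTF-8 only as a last resort) can be sketched as follows. This is not the code from the changeset: the class ProjectEncoding and the method resolve are hypothetical, and reading source.encoding from a java.util.Properties object is an assumption made to keep the example self-contained.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper; not taken from the actual NetBeans patch.
final class ProjectEncoding {

    /**
     * Returns the charset named by the "source.encoding" property, or UTF-8
     * when the property is missing, blank, or names an unknown charset.
     */
    static Charset resolve(Properties projectProperties) {
        String name = projectProperties.getProperty("source.encoding");
        if (name == null || name.trim().isEmpty()) {
            return StandardCharsets.UTF_8; // fallback: UTF-8 can encode all characters
        }
        try {
            return Charset.forName(name.trim());
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("source.encoding", "ISO-8859-1");
        System.out.println(resolve(props));            // project encoding wins
        System.out.println(resolve(new Properties())); // falls back to UTF-8
    }
}
```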
Integrated into 'main-golden', will be available in build *200904031400* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress) Changeset: http://hg.netbeans.org/main-golden/rev/26596531e8a9 User: Jesse Glick <jglick@netbeans.org> Log: #24668: set file.encoding to project's source.encoding and use the same for en/decoding stdio.
Integrated into 'main-golden' Changeset: http://hg.netbeans.org/main-golden/rev/e42490057e6c User: Jesse Glick <jglick@netbeans.org> Log: 'encoding' will always be set after #24668.