Bug 132664 - UTF-8 Characters not handled properly
 Status: RESOLVED DUPLICATE of bug 125582 editor Unclassified -- Other -- 6.x PC Windows XP P2 with 1 vote 6.x issues@editor issues@editor RANDOM, THREAD

 Reported: 2008-04-11 21:56 UTC by krheinwald
 Modified: 2008-08-06 08:58 UTC
 CC: 4 users (gyszalai, jtulach, mmirilovic, tzezula)
 Type: DEFECT

Attachments
messages.log of a session which showed that behaviour (41.89 KB, text/plain)
2008-04-14 10:57 UTC, krheinwald

krheinwald 2008-04-11 21:56:41 UTC
When the charset for a Java project is set to UTF-8, UTF-8 characters in source files that are open in the editor always get prefixed with an extra character when the IDE is restarted. This does not happen when the file is not open in the editor during launch but opened afterwards.
Example: 'ß' changes to 'ÃŸ' and '©' to 'Â©'.
I remember the same happening in 6.0ß/RC, but being fixed in 6.0 release.

Vitezslav Stejskal 2008-04-14 09:58:37 UTC
What project type are you using? I tried with a simple 'Java Application' project and it seemed to work fine. Could you please attach your log file? It's in /var/log/messages.log. Thanks

krheinwald 2008-04-14 10:57:02 UTC
Created attachment 60117
messages.log of a session which showed that behaviour

krheinwald 2008-04-14 10:58:48 UTC
The project was created initially years ago in NB 4(?) as Java Application, IIRC, and carried over ever since.
> I remember the same happening in 6.0ß/RC, but being fixed in 6.0 release.

Tomas Zezula 2008-04-14 12:09:54 UTC
Do I understand correctly that the unicode characters are destroyed only in the case when the file is opened in the editor during the IDE start? If you open a file which was closed during the start, everything is OK, right?

krheinwald 2008-04-14 12:14:51 UTC
Correct.

Tomas Zezula 2008-04-14 12:36:54 UTC
It seems that the project (j2seproject) FileEncodingQuery is not available at the time the IDE loads the opened document, and some other one, probably DefaultFileEncodingQuery (Charset.defaultCharset), is used. May be caused by lazy project loading, Jardo?

Jaroslav Tulach 2008-04-15 12:25:53 UTC
Strange. The architectural intention of lazy project shall in no way affect the implementation of any query. Try to simulate the problem by closing all projects, opening the file with File/Open, then close and restart. If it exhibits the same problem, then this has nothing to do with lazy project loading.
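
The reported symptom matches what happens when UTF-8 bytes are decoded with a single-byte default charset, which is the fallback suspected above. The following is a minimal sketch, not NetBeans code: it assumes the file on disk is UTF-8 and that the fallback charset is windows-1252 (Cp1252); the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    /**
     * Encodes the text with the charset the file was saved in,
     * then decodes the bytes with the charset the reader assumed.
     */
    static String misdecode(String text, Charset savedAs, Charset readAs) {
        return new String(text.getBytes(savedAs), readAs);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");

        // U+00DF (sharp s) is the two bytes C3 9F in UTF-8.
        // Cp1252 maps C3 to U+00C3 and 9F to U+0178, so one character becomes two.
        String sharpS = misdecode("\u00df", StandardCharsets.UTF_8, cp1252);
        System.out.println(sharpS.equals("\u00c3\u0178")); // true

        // U+00A9 (copyright sign) is C2 A9 in UTF-8 and gains a leading U+00C2.
        String copyright = misdecode("\u00a9", StandardCharsets.UTF_8, cp1252);
        System.out.println(copyright.equals("\u00c2\u00a9")); // true

        // Decoding with the charset the file was saved in is lossless.
        String intact = misdecode("\u00df", StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(intact.equals("\u00df")); // true
    }
}
```

The strings are compared as Unicode escapes rather than printed, because console output would itself depend on the terminal's encoding.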

krheinwald 2008-04-15 12:32:12 UTC
As I wrote in my initial bug report:
> This does not happen when the file is not open in the editor during launch but opened afterwards.

Jan Lahoda 2008-04-30 19:29:00 UTC
Re: "The architectural intention of lazy project shall in no way affect the implementation of any query." - I am afraid that the lazy projects are affecting the queries, as described in issue #125582. The FEQ could be affected in a similar way as the ClassPath.
Reporter, what is your project's layout? Is the file on which you see the problem part of an external source root? Thanks.

krheinwald 2008-04-30 20:28:10 UTC
External source root? Please explain. It's a plain Java project stored on a local disk.
FYI, I could recreate the problem with a new 'Desktop Application' project and its associated source file.

Vitezslav Stejskal 2008-05-04 17:13:14 UTC
Looking at the attached messages.log, the default encoding is Cp1252. If the FEQ is for some reason not available, that will definitely corrupt the loading of files that were stored in UTF-8 and use chars above 0x7F.
I'm not sure if this is going to work at all or will not have some unwanted side effects, but you could try adding '-J-Dfile.encoding=UTF-8' to the startup parameters in /etc/netbeans.conf. It should override your system's default encoding and hopefully load the files correctly even when your project's encoding is not available.
In the meantime, any reliable steps how to reproduce this problem will be much appreciated.

krheinwald 2008-05-04 17:40:20 UTC
> I'm not sure if this is going to work at all or will not have some unwanted side effects, but you could try adding
> '-J-Dfile.encoding=UTF-8' to the startup parameters in /etc/netbeans.conf. It should override your system's
> default encoding and hopefully load the files correctly even when your project's encoding is not available.
That seems to help, but it only shifts the problem to files using CP1252.
> In the meantime, any reliable steps how to reproduce this problem will be much appreciated.
- Create a new project 'Java Desktop Application'
- Switch to the source view of the generated 'DesktopApplication1View.java'
- Insert 'ß' into one of the strings at the top.
- Restart NetBeans with the file open.
- 'ß' will be converted to 2 symbolic characters.

Vitezslav Stejskal 2008-06-02 10:39:13 UTC
*** Issue 135762 has been marked as a duplicate of this issue. ***

Vitezslav Stejskal 2008-06-25 15:42:36 UTC
Could somebody please test this on a Windows box with a recent dev build? I tried on Ubuntu and it seems to be working fine. Thanks

Jiri Prox 2008-06-26 09:28:09 UTC
I've tried to reproduce it, but I cannot get the error even in the version in which the bug was reported.
Reporter, can you try a newer build, please? A dev build can be downloaded here:
http://deadlock.netbeans.org/hudson/job/trunk/lastSuccessfulBuild/artifact/nbbuild/dist/zip/

Product Version: NetBeans IDE 6.1 RC1 (Build 200804100130)
Java: 1.6.0_10-rc; Java HotSpot(TM) Client VM 11.0-b13
System: Windows XP version 5.1 running on x86; Cp1252; de_DE (nb)
Userdir: C:\Documents and Settings\tester\.netbeans\6.1rc1

Product Version: NetBeans IDE Dev (Build 20080626015438)
Java: 1.6.0_10-rc; Java HotSpot(TM) Client VM 11.0-b13
System: Windows XP version 5.1 running on x86; Cp1252; de_DE (nb)
Userdir: C:\Documents and Settings\tester\.netbeans\dev

krheinwald 2008-06-26 10:34:14 UTC
I can confirm the problem is still there in the latest DEV build using both JDK 1.6_06 and 1.6_10ß.

Product Version: NetBeans IDE Dev (Build 20080626082658)
Java: 1.6.0_10-beta; Java HotSpot(TM) Client VM 11.0-b12
System: Windows XP version 5.1 running on x86; Cp1252; de_DE (nb)
Userdir: C:\Dokumente und Einstellungen\krheinwald\.netbeans\dev

Product Version: NetBeans IDE Dev (Build 20080626082658)
Java: 1.6.0_06; Java HotSpot(TM) Client VM 10.0-b22
System: Windows XP version 5.1 running on x86; Cp1252; de_DE (nb)
Userdir: C:\Dokumente und Einstellungen\krheinwald\.netbeans\dev

gyszalai 2008-06-26 12:27:12 UTC
It is still a problem with the daily build (200806240008) on Kubuntu Linux 8.04. The system encoding is set to UTF-8, the Java sources encoding is set to ISO-8859-2. The JDK is 1.6.0_06; Java HotSpot(TM) Client VM 10.0-b22.

Vitezslav Stejskal 2008-06-26 12:33:44 UTC
Thank you to everybody for testing. If this is a race condition, which I think it is, we may never be able to reproduce it reliably. Maybe we could prepare a test which would simulate opening a document from a project which has not yet been fully initialized, although I am not sure how to write such a test. Jarda, would you have a suggestion please?

zn_cn_2 2008-07-15 11:27:01 UTC
I finally found how to reproduce this bug!!!!
Plus: this bug is still happening on NetBeans 6.5 M1 (Dev 200807040101).
This bug may occasionally happen on projects that were created from scratch, BUT WILL happen on projects that were created with existing sources.
Just follow these steps, and you will get that bug:
1. Create a new project: File -> New project -> Java -> Java Application
2. Create a Main Class, such as com.nazca.test.TestEncodingBug, and write some unicode characters in the source file, e.g.:
//----------------------------------------------
package com.nazca.test;

public class TestEncodingBug {
    public static void main(String[] args) {
        System.out.println("2008 北京欢迎你");
    }
}
//----------------------------------------------
3. Leave TestEncodingBug.java opened and directly close the IDE.
4. Start NetBeans; you will find the encoding is still right.
5. Then do the following steps, and you will get that bug.
6. Close this project and copy the Java source file to another dir, e.g. e:\\test\\src\\com\\nazca\\test\\TestEncodingBug.java
7. Create another project, but use existing sources: File -> New project -> Java -> Java Project with Existing Sources
8. Use the former source dir, e.g. e:\\test\\src
9. Open TestEncodingBug.java, and directly close the IDE, leaving it opened.
10. Start NetBeans; you will find that the characters are not correct any more. The source file will look like this:
//----------------------------------------------
package com.nazca.test;

public class TestEncodingBug {
    public static void main(String[] args) {
        System.out.println("2008 鍖椾含娆㈣繋浣�);
    }
}
//----------------------------------------------
11. If you carelessly save the file in this state, the characters will be incorrect permanently.
12. If you close TestEncodingBug.java and reopen it, the characters will be correct.
Maybe this bug is caused by the difference between the ant scripts of 'New Project' and 'New project with existing source'.
I think this bug is very serious; it has destroyed many of my sources, and I had to use 'Revert to...' again and again. Now I'm thinking about writing a plugin that closes every source file when the IDE exits, to avoid this bug. I think many people may be very happy to use this plugin, especially people who use non-ASCII symbols :(

Jan Lahoda 2008-07-15 12:16:36 UTC
Thanks for the info. From the description, it seems to me like a duplicate of issue #125582 (marked as fixed after 6.5M1).

*** This issue has been marked as a duplicate of 125582 ***

Jiri Prox 2008-08-06 08:58:28 UTC
*** Issue 143002 has been marked as a duplicate of this issue. ***
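
As noted in the discussion of the '-J-Dfile.encoding=UTF-8' workaround, forcing one default charset only moves the corruption to files saved in the other encoding, and saving a misread file makes the damage permanent. One possible guard is strict decoding, which reports malformed input instead of silently substituting characters. This is a minimal sketch using plain JDK classes only; `Utf8Check` and `isValidUtf8` are hypothetical names, and nothing here reflects how NetBeans actually selects an encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    /** Returns true only if the bytes form well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of inserting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "Strasse" written with U+00DF, saved once as UTF-8 and once as Cp1252.
        byte[] utf8 = "Stra\u00dfe".getBytes(StandardCharsets.UTF_8);
        byte[] cp1252 = "Stra\u00dfe".getBytes(Charset.forName("windows-1252"));

        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(cp1252)); // false: byte DF is not followed by a continuation byte
    }
}
```

The check only works in one direction: a failed UTF-8 decode shows that a file is not UTF-8, but nearly any byte sequence decodes without error in a single-byte charset such as Cp1252, so a successful Cp1252 decode proves nothing.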