132664 – UTF-8 Characters not handled properly

Status:	RESOLVED DUPLICATE of bug 125582

Alias:	None

Product:	editor
Classification:	Unclassified
Component:	-- Other --
Version:	6.x
Hardware:	PC Windows XP

Importance:	P2 blocker with 1 vote
Assignee:	issues@editor

URL:
Keywords:	RANDOM, THREAD

Duplicates (2):	135762 143002
Depends on:
Blocks:

Reported:	2008-04-11 21:56 UTC by krheinwald
Modified:	2008-08-06 08:58 UTC
CC List:	4 users

See Also:
Issue Type:	DEFECT
Exception Reporter:

Attachments
messages.log of a session which showed that behaviour (41.89 KB, text/plain) 2008-04-14 10:57 UTC, krheinwald

Description krheinwald 2008-04-11 21:56:41 UTC

When the charset for a Java project is set to UTF-8, non-ASCII characters in source files that are open in the editor
are prefixed with an extra character every time the IDE is restarted. This does not happen when the file is not open
in the editor during launch but opened afterwards.

Example: 'ß' changes to 'ÃŸ' and '©' to 'Â©'.

I remember the same happening in 6.0ß/RC, but being fixed in 6.0 release.
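
The substitution in the example is what one would expect if the file's UTF-8 bytes were decoded as windows-1252 (Cp1252). A minimal Java sketch of that effect, not taken from the NetBeans sources; the class and method names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeSketch {
    // Encodes the text as UTF-8, then decodes the same bytes with the wrong charset.
    public static String misread(String text, Charset wrong) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, wrong);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // U+00DF is the sharp s and U+00A9 the copyright sign from the example;
        // they are escaped so the source compiles under any file encoding.
        for (String s : new String[] {"\u00DF", "\u00A9"}) {
            String shown = misread(s, cp1252);
            System.out.printf("U+%04X -> %d chars:", (int) s.charAt(0), shown.length());
            for (char c : shown.toCharArray()) {
                System.out.printf(" U+%04X", (int) c);
            }
            System.out.println();
        }
    }
}
```

Each of these characters takes two bytes in UTF-8, and windows-1252 maps every byte to one character of its own, which is why one character turns into two.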

Comment 1 Vitezslav Stejskal 2008-04-14 09:58:37 UTC

What project type are you using? I tried with a simple 'Java Application' project and it seemed to work fine. Could you
please attach your log file? It's in <userdir>/var/log/messages.log. Thanks

Comment 2 krheinwald 2008-04-14 10:57:02 UTC

Created attachment 60117 [details]
messages.log of a session which showed that behaviour

Comment 3 krheinwald 2008-04-14 10:58:48 UTC

The project was initially created years ago in NB 4(?) as a Java Application, IIRC, and has been carried over ever since.

> I remember the same happening in 6.0ß/RC, but being fixed in 6.0 release.

Comment 4 Tomas Zezula 2008-04-14 12:09:54 UTC

Do I understand correctly that the Unicode characters are destroyed only when the file is open in the editor during
IDE start? If you open a file that was closed during the start, everything is OK, right?

Comment 5 krheinwald 2008-04-14 12:14:51 UTC

Correct.

Comment 6 Tomas Zezula 2008-04-14 12:36:54 UTC

It seems that the project's (j2seproject) FileEncodingQuery is not available at the time the IDE loads the opened
document, and some other one, probably DefaultFileEncodingQuery (Charset.defaultCharset), is used. This may be caused
by lazy project loading, Jardo?
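
The suspected fallback can be modelled in plain Java. This is only a sketch of the idea under stated assumptions, not the NetBeans FileEncodingQuery implementation: `EncodingLookupSketch`, `register`, and `encodingFor` are hypothetical names, and a provider is modelled as a function from a file path to a charset (or null when it has no answer).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

public class EncodingLookupSketch {
    // Hypothetical providers; the project-level one is only registered once the project is loaded.
    private final List<Function<String, Charset>> providers = new CopyOnWriteArrayList<>();
    private final Charset fallback;

    public EncodingLookupSketch(Charset fallback) {
        this.fallback = fallback;
    }

    public void register(Function<String, Charset> provider) {
        providers.add(provider);
    }

    // The first provider with an answer wins; with no answer, the default charset is used.
    public Charset encodingFor(String path) {
        for (Function<String, Charset> provider : providers) {
            Charset cs = provider.apply(path);
            if (cs != null) {
                return cs;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        EncodingLookupSketch lookup = new EncodingLookupSketch(Charset.forName("windows-1252"));
        String path = "src/Main.java";
        System.out.println("before project load: " + lookup.encodingFor(path));
        lookup.register(p -> p.startsWith("src/") ? StandardCharsets.UTF_8 : null);
        System.out.println("after project load:  " + lookup.encodingFor(path));
    }
}
```

In this model a document restored before the project provider is registered is decoded with the fallback, while the same file opened later gets the project encoding.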

Comment 7 Jaroslav Tulach 2008-04-15 12:25:53 UTC

Strange. The architectural intention of lazy project shall in no way affect the implementation of any query. Try to
simulate the problem by closing all projects, opening the file with File/Open, then closing the IDE and restarting it.
If it exhibits the same problem, then this has nothing to do with lazy project loading.

Comment 8 krheinwald 2008-04-15 12:32:12 UTC

As I wrote in my initial bug report:

> This does not happen when the file is not open in the editor
> during launch but opened afterwards.

Comment 9 Jan Lahoda 2008-04-30 19:29:00 UTC

Re: "The architectural intention of lazy project shall in no way affect the implementation of any query." - I am afraid
that the lazy projects are affecting the queries, as described in issue #125582. The FEQ could be affected in a similar
way as the ClassPath.

Reporter, what is your project's layout? Is the file on which you see the problem part of an external source root? Thanks.

Comment 10 krheinwald 2008-04-30 20:28:10 UTC

External source root? Please explain. It's a plain Java project stored on a local disk.

FYI, I could recreate the problem with a new 'Desktop Application' project and its associated source file.

Comment 11 Vitezslav Stejskal 2008-05-04 17:13:14 UTC

Looking at the attached messages.log, the default encoding is Cp1252. If FEQ is for some reason not available, that
will definitely corrupt files on load if they were stored in UTF-8 and use chars above 0x7F.

I'm not sure if this is going to work at all or will not have some unwanted side effects, but you could try adding
'-J-Dfile.encoding=UTF-8' to the startup parameters in <nb-inst>/etc/netbeans.conf. It should override your system's
default encoding and hopefully load the files correctly even when your project's encoding is not available.

In the meantime, any reliable steps how to reproduce this problem will be much appreciated.
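
To check which default the JVM actually ends up with, a small diagnostic along these lines could be used. It is a sketch only; `DefaultCharsetCheck` and `describe` are hypothetical names. How the `file.encoding` property influences the default depends on the Java version, so the output should be read per JVM.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    // Reports the system property next to the charset the JVM really uses as its default.
    public static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once plainly and once with `-Dfile.encoding=UTF-8` on the `java` command line shows whether the option takes effect on a given JVM.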

Comment 12 krheinwald 2008-05-04 17:40:20 UTC

> I'm not sure if this is going to work at all or will not have some unwanted side effects, but you could try adding
> '-J-Dfile.encoding=UTF-8' to the startup parameters in <nb-inst>/etc/netbeans.conf. It should override your system's
> default encoding and hopefully load the files correctly even when your project's encoding is not available.

That seems to help, but it only shifts the problem to files using CP1252.

> In the meantime, any reliable steps how to reproduce this problem will be much appreciated.

- Create a new Project 'Java Desktop Application'
- Switch to the source view of the generated 'DesktopApplication1View.java'
- Insert 'ß' into one of the strings at the top.
- Restart NetBeans with the file open.
- 'ß' will be converted to 2 symbolic characters.
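
The reverse case mentioned at the top of this comment (forcing UTF-8 and then loading a CP1252 file) can be sketched the same way. This is an illustration with hypothetical names, not NetBeans code: bytes above 0x7F written in windows-1252 are generally not valid UTF-8, so Java's decoder substitutes U+FFFD for them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ForcedUtf8Sketch {
    // Writes the text in its real encoding, then reads the bytes back as UTF-8.
    public static String readAsUtf8(String text, Charset storedAs) {
        return new String(text.getBytes(storedAs), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // "Stra\u00DFe" contains the sharp s (U+00DF), a single byte 0xDF in windows-1252.
        String shown = readAsUtf8("Stra\u00DFe", cp1252);
        System.out.println("contains U+FFFD: " + (shown.indexOf('\uFFFD') >= 0));
        System.out.println("ASCII unchanged: " + readAsUtf8("Strasse", cp1252).equals("Strasse"));
    }
}
```

Unlike the UTF-8-read-as-CP1252 direction, the replacement character discards the original byte, so this direction cannot be undone from the decoded text alone.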

Comment 13 Vitezslav Stejskal 2008-06-02 10:39:13 UTC

*** Issue 135762 has been marked as a duplicate of this issue. ***

Comment 14 Vitezslav Stejskal 2008-06-25 15:42:36 UTC

Could somebody please test this on a Windows box with a recent dev build? I tried on Ubuntu and it seems to be working fine.
Thanks

Comment 15 Jiri Prox 2008-06-26 09:28:09 UTC

I've tried to reproduce it, but I cannot get the error, even in the version in which the bug was reported.
Reporter, can you try a newer build, please?

A dev build can be downloaded here:
http://deadlock.netbeans.org/hudson/job/trunk/lastSuccessfulBuild/artifact/nbbuild/dist/zip/

Product Version: NetBeans IDE 6.1 RC1 (Build 200804100130)
Java: 1.6.0_10-rc; Java HotSpot(TM) Client VM 11.0-b13
System: Windows XP version 5.1 running on x86; Cp1252; de_DE (nb)
Userdir: C:\Documents and Settings\tester\.netbeans\6.1rc1


Product Version: NetBeans IDE Dev (Build 20080626015438)
Java: 1.6.0_10-rc; Java HotSpot(TM) Client VM 11.0-b13
System: Windows XP version 5.1 running on x86; Cp1252; de_DE (nb)
Userdir: C:\Documents and Settings\tester\.netbeans\dev

Comment 16 krheinwald 2008-06-26 10:34:14 UTC

I can confirm the problem is still there in the latest DEV build using both JDK 1.6_06 and 1.6_10ß.

Product Version: NetBeans IDE Dev (Build 20080626082658)
Java: 1.6.0_10-beta; Java HotSpot(TM) Client VM 11.0-b12
System: Windows XP version 5.1 running on x86; Cp1252; de_DE (nb)
Userdir: C:\Dokumente und Einstellungen\krheinwald\.netbeans\dev

Product Version: NetBeans IDE Dev (Build 20080626082658)
Java: 1.6.0_06; Java HotSpot(TM) Client VM 10.0-b22
System: Windows XP version 5.1 running on x86; Cp1252; de_DE (nb)
Userdir: C:\Dokumente und Einstellungen\krheinwald\.netbeans\dev

Comment 17 gyszalai 2008-06-26 12:27:12 UTC

It is still a problem with the daily build (200806240008) on Kubuntu Linux 8.04. The system encoding is set to UTF-8,
and the Java sources encoding is set to ISO-8859-2. The JDK is 1.6.0_06; Java HotSpot(TM) Client VM 10.0-b22.

Comment 18 Vitezslav Stejskal 2008-06-26 12:33:44 UTC

Thank you to everybody for testing. If this is a race condition, which I think it is, we may never be able to reproduce
it reliably. Maybe we could prepare a test, which would simulate opening a document from a project which has not yet
been fully initialized. Although I am not sure how to write such a test. Jarda, would you have a suggestion please?
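
One way such a test could be shaped is to force the unlucky ordering with a latch instead of waiting for it to happen. The following is a plain-Java sketch under stated assumptions, not a NetBeans test: `LazyProjectRaceSketch` and its methods are hypothetical stand-ins for the project and the editor, and windows-1252 stands in for the platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class LazyProjectRaceSketch {
    // Hypothetical stand-in for the project's encoding query: empty until the project is initialized.
    private final AtomicReference<Charset> projectEncoding = new AtomicReference<>();
    private final Charset platformDefault;

    public LazyProjectRaceSketch(Charset platformDefault) {
        this.platformDefault = platformDefault;
    }

    public void finishProjectInit(Charset encoding) {
        projectEncoding.set(encoding);
    }

    // Decodes with the project encoding if it is known, otherwise with the platform default.
    public String load(byte[] onDisk) {
        Charset cs = projectEncoding.get();
        return new String(onDisk, cs != null ? cs : platformDefault);
    }

    // Forces the unlucky ordering: the editor restores the document before the project is ready.
    public static String loadBeforeInit(String text, Charset platformDefault) throws InterruptedException {
        LazyProjectRaceSketch ide = new LazyProjectRaceSketch(platformDefault);
        byte[] onDisk = text.getBytes(StandardCharsets.UTF_8);
        CountDownLatch documentLoaded = new CountDownLatch(1);
        AtomicReference<String> shown = new AtomicReference<>();

        Thread editor = new Thread(() -> {
            shown.set(ide.load(onDisk));
            documentLoaded.countDown();
        });
        Thread project = new Thread(() -> {
            try {
                documentLoaded.await(); // hold back initialization until the document is loaded
                ide.finishProjectInit(StandardCharsets.UTF_8);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        project.start();
        editor.start();
        editor.join();
        project.join();
        return shown.get();
    }

    public static void main(String[] args) throws InterruptedException {
        String text = "Stra\u00DFe"; // contains the sharp s, U+00DF
        String shown = loadBeforeInit(text, Charset.forName("windows-1252"));
        System.out.println("matches original: " + shown.equals(text));
    }
}
```

Because the latch fixes the order of the two threads, the outcome no longer depends on timing, which is the property a regression test for a race needs.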

Comment 19 zn_cn_2 2008-07-15 11:27:01 UTC

I finally found how to reproduce this bug!
In addition, this bug still occurs in NetBeans 6.5 M1 (Dev 200807040101).

This bug may occasionally happen in projects created from scratch, but it WILL happen in projects created
from existing sources.

Just follow these steps, and you will get the bug:

1. Create a new project: File -> New project -> Java -> Java Application

2. Create a main class, such as com.nazca.test.TestEncodingBug,
   and write some Unicode characters in the source file, e.g.:
//----------------------------------------------
package com.nazca.test;

public class TestEncodingBug {
    public static void main(String[] args) {
        System.out.println("2008 北京欢迎你");
    }
}
//----------------------------------------------

3. Leave TestEncodingBug.java open and close the IDE directly.

4. Start NetBeans; you will find the encoding is still right.

5. Then do the following steps, and you will get the bug.

6. Close this project and copy the Java source file to another directory, e.g. e:\\test\\src\\com\\nazca\\test\\TestEncodingBug.java

7. Create another project, but use existing sources: File -> New project -> Java -> Java Project with Existing Sources

8. Use the directory from step 6 as the source directory, e.g. e:\\test\\src

9. Open TestEncodingBug.java, then close the IDE directly, leaving the file open.

10. Start NetBeans; you will find that the characters are no longer correct. The source file will look like this:
//----------------------------------------------
package com.nazca.test;

public class TestEncodingBug {
    public static void main(String[] args) {
        System.out.println("2008 鍖椾含娆㈣繋浣�);
    }
}
//----------------------------------------------

11. If you carelessly save the file in this state, the characters will be permanently incorrect.

12. If you close TestEncodingBug.java and reopen it, the characters will be correct.

Maybe this bug is caused by a difference between the Ant scripts of 'New Project' and 'New Project with Existing Sources'.

I think this bug is very serious. It has destroyed many of my sources, and I had to use 'Revert to...' again and again.

Now I'm thinking about writing a plugin that closes every source file when the IDE exits to avoid this bug. I think
many people would be very happy to use such a plugin, especially people who use non-ASCII symbols :(
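
Steps 10 to 12 can be illustrated outside the IDE. The sketch below makes two labelled assumptions: the wrong charset is GBK (a guess based on the garbled output shown in step 10), and the careless save writes UTF-8. `CarelessSaveSketch` and its methods are hypothetical names, and `Charset.forName("GBK")` throws if the running JVM does not ship that charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CarelessSaveSketch {
    // Decodes UTF-8 bytes with the wrong charset, as the editor appears to do in step 10.
    public static String misread(byte[] utf8OnDisk, Charset wrong) {
        return new String(utf8OnDisk, wrong);
    }

    // True if saving the misread text as UTF-8 would reproduce the original bytes (step 11).
    public static boolean saveKeepsBytes(String original, Charset wrong) {
        byte[] onDisk = original.getBytes(StandardCharsets.UTF_8);
        byte[] saved = misread(onDisk, wrong).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(onDisk, saved);
    }

    public static void main(String[] args) {
        // "2008 " followed by the five Chinese characters from the report, escaped
        // so the source compiles under any file encoding.
        String original = "2008 \u5317\u4EAC\u6B22\u8FCE\u4F60";
        Charset gbk = Charset.forName("GBK"); // assumption: the platform default on the reporter's machine
        byte[] onDisk = original.getBytes(StandardCharsets.UTF_8);

        System.out.println("shown equals original: " + misread(onDisk, gbk).equals(original));
        System.out.println("save keeps bytes:      " + saveKeepsBytes(original, gbk));
        // Step 12: without a save the bytes on disk are untouched, so a correct reload still works.
        System.out.println("reload as UTF-8 works: " + new String(onDisk, StandardCharsets.UTF_8).equals(original));
    }
}
```

The damage is only in the editor's in-memory text until a save happens, which matches the observation that closing and reopening the file restores the characters.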

Comment 21 Jan Lahoda 2008-07-15 12:16:36 UTC

Thanks for the info. From the description, it seems to me like a duplicate of issue #125582 (marked as fixed after 6.5M1).

*** This issue has been marked as a duplicate of 125582 ***

Comment 22 Jiri Prox 2008-08-06 08:58:28 UTC

*** Issue 143002 has been marked as a duplicate of this issue. ***