Bug 39521 - I18N - International characters mangled when source is saved
Status: RESOLVED WORKSFORME
Product: java
Classification: Unclassified
Component: Unsupported
3.x
PC Windows 7 x64
: P2
: 4.x
Assigned To: Dusan Balek
issues@java
: I18N
Depends on: 42638
Blocks: 45719
Reported: 2004-02-03 09:16 UTC by digithed
Modified: 2012-01-12 16:22 UTC
6 users

See Also:
Issue Type: DEFECT


Description digithed 2004-02-03 09:16:10 UTC
While using the current Development Q-Build 
(200401201900) I have noticed a problem with 
editing source files containing international 
characters. I work in Sweden and we have 3 extra 
letters in the alphabet (åÅ, äÄ, öÖ), these 
characters are quite common in names and so get 
entered into comments in source code quite often. 
The 3.5.1 release has no problem with these 
characters, but the current Development Q-Build 
seems to mangle them when the file is saved. For 
instance the name..
'Esbjörn'
gets mangled and becomes...
'Esbjörn'
Comment 1 pfelenda 2004-02-03 14:47:02 UTC
The problem probably occurs when a file is saved in one encoding and read
in another. I had the same problem with Czech characters: I saved a file
in iso8859-2 (or cp1250) encoding and read it as UTF-8.
If this is your problem, try changing the encoding for Java files in the
options (Tools|Options->Editing|Java Sources->Expert - Default
Encoding) to the same value that was used when saving the file. (To apply
the change, close all Java files or restart the IDE.)
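The mismatch described above can be reproduced in a few lines. This is a minimal sketch, assuming Java 7 or later; the class name `EncodingRoundTrip` is hypothetical. It encodes a name as ISO-8859-2 (as if saving) and decodes it once with the same charset and once as UTF-8 (as if loading with a different default encoding).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative demo (hypothetical class name): what happens when the charset used
// to read a file differs from the one that was used to write it.
final class EncodingRoundTrip {

    // Encode as if saving with 'saveAs', then decode as if loading with 'loadAs'.
    // new String(byte[], Charset) silently replaces undecodable bytes with U+FFFD.
    static String saveThenLoad(String text, Charset saveAs, Charset loadAs) {
        byte[] onDisk = text.getBytes(saveAs);
        return new String(onDisk, loadAs);
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");
        // "Esbjorn" with o-umlaut, written as a Unicode escape so this file
        // compiles the same way under any source encoding.
        String name = "Esbj\u00f6rn";

        // Same charset for saving and loading: the text survives.
        System.out.println(saveThenLoad(name, latin2, latin2).equals(name)); // true

        // Saved as ISO-8859-2, loaded as UTF-8: the o-umlaut byte is not valid UTF-8.
        String mangled = saveThenLoad(name, latin2, StandardCharsets.UTF_8);
        System.out.println(mangled.indexOf('\uFFFD') >= 0); // true
    }
}
```

The same holds for any pair of different encodings: as long as saving and loading agree, the characters survive.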


Could you provide information about your locale? This system information
is printed when you start the development build from a system console.
Here is part of the info from my system:

System Info:
...
  System Locale; Encod. = cs_CZ; UTF-8
...

Please also give your JDK version.
Comment 2 digithed 2004-02-04 10:45:39 UTC
Sorry to sound lame, but how do you start NetBeans from the system 
console? Do you mean running runide.exe from a command prompt in 
Windows? If this is what you mean, then I do not get any system info
printed out showing the locale.

Also the settings at (Tools|Options->Editing|Java Sources->Expert - 
Default Encoding) are empty, but they are also empty on the 3.5.1 
version where I don't have the international characters problem.

My JDK version is 1.4.1_06
Comment 3 pfelenda 2004-02-04 15:24:05 UTC
Sorry, I am a Linux user. Yes, I meant running it from the command prompt in
Windows. Open cmd, move to the ...\netbeans\bin directory and run runide.exe.
If you have a dev build (Q-build), a lot of information is printed
in this command prompt. The same information is stored in the "c:\Documents
and Settings\{user_name}\.netbeans\dev\system\ide.log" file. You can
open this file in a text editor and find the row containing the "System Locale"
string. In my ide.log it is on row 8.
Another way to find out the file encoding is to put these statements:
  String enc = System.getProperties().getProperty("file.encoding");
  System.out.println("encoding : "+enc);
in the "main" method of your source code and run it. The Output window in
the IDE then shows the default encoding used for reading files.
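The two statements above as a complete, runnable program. This is a small sketch with a hypothetical class name. Besides `file.encoding` it prints `Charset.defaultCharset()` (Java 5+), which is the charset actually used by single-argument APIs such as `new InputStreamReader(in)`, and `java.version`, which answers the JDK question asked in this thread. On JDK 18 and later the default charset is UTF-8 unless overridden (JEP 400), so a modern JDK will report different values than the 1.4 JDKs discussed here.

```java
import java.nio.charset.Charset;

// Sketch: prints the encoding-related settings of the JVM that runs it.
// Run it with the same JDK as the IDE to see what that JVM uses by default.
final class ShowEncoding {

    static String describe() {
        return "file.encoding   : " + System.getProperty("file.encoding") + "\n"
             + "default charset : " + Charset.defaultCharset().name() + "\n"
             + "java.version    : " + System.getProperty("java.version");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```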

---
The Default Encoding is empty by default. In that case the encoding is taken
from your OS, and you can find out what it is with the procedure described above.
A source file must be written and read with the same encoding; otherwise
some national characters won't be displayed correctly.
This can happen when files are written on Linux and read on Windows,
or written and read on Windows with different locales.
Try setting the Default Encoding to the value (e.g. cp1250, iso8859-2,
UTF-8, ...) that was used when your source files were written.
If you edit sources on different computers with different file
encodings, you must find out the encoding of the computer where the files
are displayed correctly.
This happens because the IDE doesn't know anything about the locale of the
edited file.
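One way to follow this advice when the original encoding is unknown is to check which candidate charsets can decode a file's bytes without errors. A rough sketch, assuming Java 7 or later; `GuessEncoding`, `decodesCleanly` and the candidate list are hypothetical. It can only rule encodings out: UTF-8 rejects most byte sequences written in a single-byte encoding, but single-byte encodings such as windows-1252 or ISO-8859-2 accept almost any input, so a clean decode there proves little and the final check is still whether the text looks right.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: reports which candidate charsets decode a file's bytes
// without any malformed or unmappable input.
final class GuessEncoding {

    static boolean decodesCleanly(byte[] bytes, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of inserting U+FFFD
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GuessEncoding <file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String[] candidates = {"UTF-8", "windows-1250", "windows-1252", "ISO-8859-2"};
        for (String name : candidates) {
            boolean ok = decodesCleanly(bytes, Charset.forName(name));
            System.out.println(name + ": " + (ok ? "decodes cleanly" : "decoding errors"));
        }
    }
}
```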
Comment 4 digithed 2004-02-04 15:59:53 UTC
OK I have now found the locale that is being used by NetBeans on my 
machine, it is...
System Locale; Encod. = en_GB; Cp1252
(I am living in Sweden, but I am from England - I am using a Swedish 
keyboard)

I understand your last comment, but I am still a little confused...
The System Locale quoted above is used on my machine by both NetBeans 
3.5.1 and the Dev. Q-Build (200401201900). I am editing the same 
source files on the same machine, sometimes with 3.5.1 and sometimes 
with the Q-Build. When I edit with 3.5.1 I get no problems with the 
Swedish characters åäö (despite the fact that my locale is en_GB), 
but when I edit source files using the Dev. Q-Build I get the 
problems described in my original post.
Does this mean...
a) That there is a bug in 3.5.1 in that it is not using the locale 
information correctly?
or
b) That there is a bug in the Dev. Q-Build in that it should cope 
with the international characters using the locale that is set on my 
machine?
Comment 5 pfelenda 2004-02-04 16:44:39 UTC
I don't know where the problem is now. Maybe you are using
different JDKs. Is that the case?
Check the JDK version in the main menu under Help|About -> Details in
both IDEs (3.5.1 and Q-build).

Comment 6 Miloslav Metelka 2004-02-10 08:59:31 UTC
IMHO it would be useful to attach the ide.log file (under 
<user-dir>/system/ide.log) for both your IDEs (3.5.1 and the dev build).

The logs do not have to be complete; we mainly need the kind of info
shown below:

-------------------------------------------------------------------------------
>Log Session: Monday, February 9, 2004 10:29:54 PM CET
>System Info:
 Product Version       = NetBeans IDE Dev (Build 040205)
 Operating System      = Linux version 2.4.20-19.8custom running on i386
 Java; VM; Vendor      = 1.4.2; Java HotSpot(TM) Client VM 1.4.2-b28; Sun Microsystems Inc.
 Java Home             = /usr/java/j2sdk1.4.2/jre
 System Locale; Encod. = en_US; UTF-8
 Home Dir; Current Dir = /usr/local/home/mmetelka; /usr/local/src/nb/t/nb_all/nbbuild


Anyway, we should be able to resolve this for 3.6. Assigning to Dusan.
Comment 7 digithed 2004-02-10 09:49:59 UTC
ide.log file from NetBeans 3.5.1:
-------------------------------------------------------------------------------
>Log Session: 26 January 2004 17:36:50 o'clock CET
>System Info: 
  Product Version       = NetBeans IDE 3.5.1 (Build 200307302351)
  IDE Versioning        = IDE/1 spec=3.42.2 impl=200307302351
  Operating System      = Windows 2000 version 5.0 running on x86
  Java; VM; Vendor      = 1.4.1_06; Java HotSpot(TM) Client VM 1.4.1_06-b01; Sun Microsystems Inc.
  Java Home             = C:\j2sdk1.4.1_06\jre
  System Locale; Encod. = en_GB; Cp1252
  Home Dir; Current Dir = C:\Documents and Settings\sbrammer.SE; C:\Program Files\NetBeans IDE 3.5.1
  IDE Install; User Dir = C:\Program Files\NetBeans IDE 3.5.1; C:\Documents and Settings\sbrammer.SE\.netbeans\3.5
  CLASSPATH             = C:\Program Files\NetBeans IDE 3.5.1\lib\ext\boot.jar;C:\Program Files\NetBeans IDE 3.5.1\lib\ext\crimson-1.1.3.jar;C:\Program Files\NetBeans IDE 3.5.1\lib\ext\regexp-1.2.jar;C:\Program Files\NetBeans IDE 3.5.1\lib\ext\xerces-2.0.2.jar;C:\Program Files\NetBeans IDE 3.5.1\lib\ext\xml-apis-1.0b2.jar;C:\j2sdk1.4.1_06\lib\dt.jar;C:\j2sdk1.4.1_06\lib\tools.jar
  Boot & ext classpath  = C:\j2sdk1.4.1_06\jre\lib\rt.jar;C:\j2sdk1.4.1_06\jre\lib\i18n.jar;C:\j2sdk1.4.1_06\jre\lib\sunrsasign.jar;C:\j2sdk1.4.1_06\jre\lib\jsse.jar;C:\j2sdk1.4.1_06\jre\lib\jce.jar;C:\j2sdk1.4.1_06\jre\lib\charsets.jar;C:\j2sdk1.4.1_06\jre\classes;C:\j2sdk1.4.1_06\jre\lib\ext\dnsns.jar;C:\j2sdk1.4.1_06\jre\lib\ext\ldapsec.jar;C:\j2sdk1.4.1_06\jre\lib\ext\localedata.jar;C:\j2sdk1.4.1_06\jre\lib\ext\sunjce_provider.jar
  Dynamic classpath     = C:\Program Files\NetBeans IDE 3.5.1\lib\core-windows.jar;C:\Program Files\NetBeans IDE 3.5.1\lib\core.jar;C:\Program Files\NetBeans IDE 3.5.1\lib\openide.jar
-------------------------------------------------------------------------------

ide.log file from NetBeans Dev Build:
-------------------------------------------------------------------------------
>Log Session: 27 January 2004 15:57:41 o'clock CET
>System Info: 
  Product Version       = NetBeans IDE Dev (Build 200401201900)
  Operating System      = Windows 2000 version 5.0 running on x86
  Java; VM; Vendor      = 1.4.1_06; Java HotSpot(TM) Client VM 1.4.1_06-b01; Sun Microsystems Inc.
  Java Home             = C:\j2sdk1.4.1_06\jre
  System Locale; Encod. = en_GB; Cp1252
  Home Dir; Current Dir = C:\Documents and Settings\sbrammer.SE; C:\Program Files\NetBeans3.6
  IDE Install; User Dir = C:\Program Files\NetBeans3.6; C:\Documents and Settings\sbrammer.SE\.netbeans\dev
  CLASSPATH             = C:\Program Files\NetBeans3.6\lib\ext\boot.jar;C:\j2sdk1.4.1_06\lib\dt.jar;C:\j2sdk1.4.1_06\lib\tools.jar
  Boot & ext classpath  = C:\j2sdk1.4.1_06\jre\lib\rt.jar;C:\j2sdk1.4.1_06\jre\lib\i18n.jar;C:\j2sdk1.4.1_06\jre\lib\sunrsasign.jar;C:\j2sdk1.4.1_06\jre\lib\jsse.jar;C:\j2sdk1.4.1_06\jre\lib\jce.jar;C:\j2sdk1.4.1_06\jre\lib\charsets.jar;C:\j2sdk1.4.1_06\jre\classes;C:\j2sdk1.4.1_06\jre\lib\ext\dnsns.jar;C:\j2sdk1.4.1_06\jre\lib\ext\ldapsec.jar;C:\j2sdk1.4.1_06\jre\lib\ext\localedata.jar;C:\j2sdk1.4.1_06\jre\lib\ext\sunjce_provider.jar
  Dynamic classpath     = C:\Program Files\NetBeans3.6\lib\core.jar;C:\Program Files\NetBeans3.6\lib\openfile-cli.jar;C:\Program Files\NetBeans3.6\lib\openide-loaders.jar;C:\Program Files\NetBeans3.6\lib\openide.jar
-------------------------------------------------------------------------------
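Both logs report the same `System Locale; Encod. = en_GB; Cp1252` line, so both IDEs run with the same platform default encoding. When comparing more installations, a small sketch like the following can pull that entry out of each ide.log. The class name is hypothetical, and the assumption that the entry always has the form `locale; encoding` after the `=` is based only on the log excerpts in this report.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: extracts the "System Locale; Encod." entry from ide.log files
// so the values from several IDE installations can be compared.
final class IdeLogLocale {

    private static final String KEY = "System Locale; Encod.";

    /** Returns {locale, encoding}, or null if the line is not the locale entry. */
    static String[] parseLine(String line) {
        String trimmed = line.trim();
        if (!trimmed.startsWith(KEY)) {
            return null;
        }
        int eq = trimmed.indexOf('=');
        if (eq < 0) {
            return null;
        }
        String[] parts = trimmed.substring(eq + 1).split(";");
        if (parts.length != 2) {
            return null;
        }
        return new String[] {parts[0].trim(), parts[1].trim()};
    }

    /** Returns the first locale entry found in the given lines, or null. */
    static String[] find(List<String> lines) {
        for (String line : lines) {
            String[] result = parseLine(line);
            if (result != null) {
                return result;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // args: paths to ide.log files, e.g. one per IDE installation.
        for (String path : args) {
            // ISO-8859-1 maps every byte, so reading never fails on non-ASCII log content;
            // the entry we look for is plain ASCII either way.
            List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.ISO_8859_1);
            String[] entry = find(lines);
            System.out.println(path + ": " + (entry == null ? "not found" : entry[0] + " / " + entry[1]));
        }
    }
}
```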
Comment 8 digithed 2004-02-10 09:59:57 UTC
Sorry the log files are formatted so badly. I think it's still possible to 
see the necessary information, however.
Comment 9 Dusan Balek 2004-02-11 14:42:03 UTC
Let me ask one more question.
Is it problem of saving/reading Java source files only, or
does it appear also when saving/reading other file types (e.g. plain
text)? 
Comment 10 digithed 2004-02-16 15:29:48 UTC
I know it has happened with Java and JSP files; I haven't tried plain 
text. I'll try to test this and get back to you. My normal working 
environment is NetBeans 3.5.1; I was only evaluating the dev version, 
and I have to do some messing around on my machine to use it as I 
don't want to screw up my settings in 3.5.1. I'm a little busy right 
now with 'real work' :-) so please bear with me for a while.

Another thing... I have now also downloaded and used NetBeans 3.6 beta. 
I will try to test this problem in that version too and will add 
comments here about my findings.
Comment 11 Ken Frank 2004-06-08 20:03:13 UTC
Could someone summarize how NB handles encodings, and which options, if any,
exist for setting the encoding of the various file/project types: both the
default behavior and what the user should do if a different encoding is needed.
See also other issues and RFEs on encoding handling.

It would be good if nb had unified and standard ways for encodings
to be set and handled.

ken.frank@sun.com
06/07/2004
Comment 12 digithed 2004-06-28 10:37:16 UTC
This is still a problem in NetBeans 3.6
Comment 13 Miloslav Metelka 2004-06-30 17:11:42 UTC
AFAIK, by default the JVM's default settings are used when loading
files, i.e. the single-parameter InputStreamReader constructor is
used, so the default byte-to-char converter is applied.
Some MIME types, e.g. XML or JSP, override this default behavior:
they read the encoding information from the file and construct the
appropriate InputStreamReader.
The Java module has special support for encodings: there is a
user-selectable option for changing the encoding of Java files.

With the new build system there was some effort to build up a
supporting infrastructure allowing encoding changes, but I don't know
its current status.
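A minimal sketch of the two `InputStreamReader` constructor forms described above, using only `java.io` and `java.nio.charset` (Java 8+ for `UncheckedIOException`). The class and method names are hypothetical and this is not NetBeans code. With the single-argument constructor the result depends on the JVM's default charset, so the same bytes can decode differently on different machines; passing an explicit `Charset` makes decoding deterministic.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReaderEncoding {

    // Single-argument form: decodes with the JVM's default charset,
    // so the result depends on the machine (and, on older JDKs, its locale).
    static Reader defaultReader(InputStream in) {
        return new InputStreamReader(in);
    }

    // Explicit form: the caller chooses the charset, e.g. from a per-file-type setting.
    static Reader explicitReader(InputStream in, Charset cs) {
        return new InputStreamReader(in, cs);
    }

    // Reads everything from the reader into a String.
    static String readAll(Reader r) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "Esbj\u00f6rn".getBytes(StandardCharsets.UTF_8); // a file saved as UTF-8
        System.out.println("default (" + Charset.defaultCharset() + "): "
            + readAll(defaultReader(new ByteArrayInputStream(utf8))));
        System.out.println("explicit UTF-8: "
            + readAll(explicitReader(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8)));
        System.out.println("explicit windows-1252: "
            + readAll(explicitReader(new ByteArrayInputStream(utf8), Charset.forName("windows-1252"))));
    }
}
```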
Comment 14 Ken Frank 2004-06-30 17:34:10 UTC
I am changing this to P2 as it seems to be an important issue: a user
who works in their own locale,
using the characters and encoding of that locale, should be
able to see the characters of that locale when the source is saved
and the file is reloaded, or in any other case, assuming the JDK supports
that locale, which I think it does for Sweden.

It really shouldn't matter what file type user is working with
in this situation; it doesn't seem it should be a matter of
each file type needing to handle this separately - but rather
editor or nb infrastructure.

ken.frank@sun.com
06/30/2004
Comment 15 Jesse Glick 2004-06-30 22:56:28 UTC
"It really shouldn't matter what file type user is working with in this
situation; it doesn't seem it should be a matter of each file type
needing to handle this separately - but rather editor or nb
infrastructure." - however it *does* matter, because different file
types may specify their encoding differently. XML, for example, is
handled completely differently than anything else. And file encoding
needs to interact with the parser and build system, not just the editor.
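To illustrate the point that XML carries its own encoding information: an XML document may declare its encoding in the prolog, e.g. `<?xml version="1.0" encoding="ISO-8859-2"?>`, so a loader has to look at the first bytes before it can choose a Reader. The sketch below is a deliberately simplified, hypothetical helper: it only handles ASCII-compatible encodings and falls back to UTF-8, the XML default when there is no declaration and no byte-order mark. A real loader should follow the autodetection rules in Appendix F of the XML 1.0 specification, or simply hand the raw byte stream to an XML parser.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified sniffer for the encoding declared in an XML prolog.
final class XmlEncodingSniffer {

    // The encoding pseudo-attribute of an XML declaration at the very start of the document.
    // The name pattern follows XML's EncName production: [A-Za-z]([A-Za-z0-9._]|'-')*
    private static final Pattern DECL = Pattern.compile(
        "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static Charset sniff(byte[] head) {
        // Assumes an ASCII-compatible encoding, so the declaration itself is plain ASCII;
        // ISO-8859-1 maps every byte, so this conversion never fails.
        String prefix = new String(head, 0, Math.min(head.length, 200), StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(prefix);
        if (m.find()) {
            // Throws UnsupportedCharsetException if this JVM does not know the declared name.
            return Charset.forName(m.group(1));
        }
        return StandardCharsets.UTF_8; // XML default without a declaration or BOM
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"windows-1252\"?><a/>"
            .getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(doc)); // windows-1252
    }
}
```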
Comment 16 Marian Mirilovic 2004-08-03 07:47:20 UTC
Reevaluate, please...
Comment 17 Jan Becicka 2004-08-03 08:03:57 UTC
To Petr Felenda:
Are you able to reproduce it? I can save/load files with Czech
characters without problems.
Comment 18 pfelenda 2004-08-09 15:26:04 UTC
Honzo, I am not able to reproduce the problem described by the reporter.
Comment 19 Jan Becicka 2004-08-09 15:47:07 UTC
Is anybody able to reproduce it?
Comment 20 Jan Becicka 2004-08-17 07:54:25 UTC
I'd like to change priority of this issue to P3 since we are unable to
reproduce this issue. Any objections?
Comment 21 Tomas Hurka 2004-08-17 12:49:36 UTC
Closing as works-for-me. Please reopen, if you can reproduce it with current dev. 
build. Thanks.
Comment 22 andrei.baciu 2012-01-11 20:18:10 UTC
After many years... the problem came back.
After about 10 hours of testing, I can say that NetBeans 7.1 has a problem with Swedish characters, namely åöä and ÅÖÄ.

I started editing an old website of mine. The site is in Swedish. When I opened a file, I got this message: "The encoding UTF-8; specified in meta tag of the document portfolio_wedding.html is invalid. Do you want to load the file using windows-1252 encoding?". If I say yes, ä becomes ä and ö becomes Ã¥ and so on.

If I try to save it, this message appears: "The encoding utf-8; specified in meta tag of the document portfolio_wedding.html is invalid or the document contains characters which cannot be saved using this encoding. Do you want to save the file using the original windows-1252 encoding?".

I pick YES. At this point the file looks wrong in the NetBeans editor window but shows up OK in the browser window.

Of course, most people will never think of checking the browser when they see the characters all mangled in the source. So most people, including me, start fixing the mangling: I manually replaced all the Ã¥ with ö and so forth. Now the file looks OK in the editor, even after saving, although it still asks on save: "The encoding utf-8; specified in meta tag of the document portfolio_wedding.html is invalid or the document contains characters which cannot be saved using this encoding. Do you want to save the file using the original windows-1252 encoding?".

The BIG surprise is that when I go back to the browser, all my special characters, öäå, look like ���.

Anyone can try this by saving portfolio_wedding.html from http://foto.aastudio.se/se/portfolio_wedding.html and repeating all of the above. I did it a few times.

My OS: Windows 7 64

Best regards,
Andrei.
Comment 23 Jesse Glick 2012-01-11 21:33:45 UTC
(In reply to comment #22)
> When I
> opened a file, I got this message: "The encoding UTF-8; specified in meta tag
> of the document portfolio_wedding.html is invalid.

Which it is. Your file has

<meta http-equiv="Content-Type" content="text/html; charset=utf-8;charset=utf-8" />

which is wrong. BTW Emacs when opening this file also warns you:

Warning: unknown coding system "utf-8;charset=utf-8"

> Do you want to load the file
> using windows-1252 encoding?". If I say yes, ä becomes ä and ö becomes Ã¥ and
> so on.

Which is exactly what should happen if you load a UTF-8 file in windows-1252.
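The effect is easy to reproduce: each Swedish letter takes two bytes in UTF-8, and windows-1252 decodes each of those bytes as a separate character. A small sketch with a hypothetical class name, assuming Java 7 or later:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative demo: UTF-8 bytes interpreted as windows-1252.
final class Mojibake {

    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);      // how the file is stored
        return new String(utf8, Charset.forName("windows-1252"));  // how it is loaded
    }

    public static void main(String[] args) {
        // a-ring, a-umlaut and o-umlaut, written as Unicode escapes
        String[] letters = {"\u00e5", "\u00e4", "\u00f6"};
        for (String s : letters) {
            System.out.println(s + " -> " + misread(s));
        }
    }
}
```

Under this mapping å is read as Ã¥, ä as Ã¤ and ö as Ã¶.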

Anyway, the mentioned issue is unrelated to anything reported previously in this issue and pertains to the web/html component in Bugzilla.
Comment 24 andrei.baciu 2012-01-12 10:14:58 UTC
Well... let's pick another file then.

The file: http://foto.aastudio.se/test_netbeans.html

That file has the right charset, works in a browser, and I even opened it in Emacs and it works just fine. Now, when I try to open it in NetBeans 7.1, it starts with: "The encoding UTF-8; specified in meta tag of the document test_netbeans.html is invalid. Do you want to load the file using windows-1252 encoding?".

So how can I fix this? Or is there a mistake on my side again?

And about the wrong charset line in my first file... I fixed that yesterday in my local copy long before posting here, but somehow forgot to fix the online one.

Best regards,
Andrei.
Comment 25 Jesse Glick 2012-01-12 16:22:45 UTC
(In reply to comment #24)
> http://foto.aastudio.se/test_netbeans.html
> 
> has the right charset, works in a browser

Most browsers are set to be very lax about various standards, so this does not prove much.

> and I even opened it in Emacs and it works just fine.

Emacs 23.3.1 also reports (in status line and *Messages*) for this file:

Warning: unknown coding system "utf-8;"

Obviously the trailing semicolon is unexpected.
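Both meta tags in this thread fail for the same mechanical reason if the charset value is taken literally: everything after `charset=` is `utf-8;charset=utf-8` in the first file and `utf-8;` in the second (matching the `UTF-8;` in the NetBeans message), and neither is a legal charset name; Java, for example, rejects names containing `;` or `=`. The sketch below contrasts a literal extraction with a lenient one that stops at the first `;`. It is a hypothetical helper for illustration only, not how NetBeans or any browser actually parses the attribute.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Locale;

// Hypothetical helper: extracts the charset value from a meta tag's content attribute.
final class MetaCharset {

    private static final String KEY = "charset=";

    // Literal: take everything after the first "charset=".
    static String literalValue(String content) {
        int i = content.toLowerCase(Locale.ROOT).indexOf(KEY);
        return i < 0 ? null : content.substring(i + KEY.length()).trim();
    }

    // Lenient: stop at the first ';' and trim whitespace.
    static String lenientValue(String content) {
        String v = literalValue(content);
        if (v == null) {
            return null;
        }
        int semi = v.indexOf(';');
        return (semi < 0 ? v : v.substring(0, semi)).trim();
    }

    // True if the name is legal and supported by this JVM.
    static boolean usable(String name) {
        try {
            return name != null && Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false; // e.g. names containing ';' or '='
        }
    }

    public static void main(String[] args) {
        String[] contents = {
            "text/html; charset=utf-8;charset=utf-8",
            "text/html; charset=utf-8;"
        };
        for (String c : contents) {
            System.out.println(c);
            System.out.println("  literal: " + literalValue(c) + " usable=" + usable(literalValue(c)));
            System.out.println("  lenient: " + lenientValue(c) + " usable=" + usable(lenientValue(c)));
        }
    }
}
```

Whether a trailing semicolon should be tolerated is a question for the HTML specification and the web/html component, not something this sketch settles.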

> So how can I fix this?

If you believe the trailing semicolon is permitted by the HTML spec (I have never seen this usage before but it might be technically legal), then file a bug report in the web/html component as I said in comment #23.

