Bug 75906 - I18N - Add support for other encodings (other than ISO-8859-1)
Status: RESOLVED FIXED
Product: utilities
Classification: Unclassified
Component: Properties
5.x
All All
: P2 with 9 votes
: 7.4
Assigned To: Jan Peska
issues@utilities
JA_COMMUNITY
: I18N
: 93636 99231 125875 155934 198631 210088 228196
Depends on:
Blocks:
Reported: 2006-05-04 10:57 UTC by Marian Petras
Modified: 2013-08-20 12:00 UTC
13 users

See Also:
Issue Type: ENHANCEMENT
Description Marian Petras 2006-05-04 10:57:34 UTC
JDK 1.6.0 b81 supports saving of properties files in various encodings (i.e. not
just ISO-8859-1). Allow users to choose their preferred encoding of properties
files (at least in projects based on JDK 1.6.0 and newer).

For more information, see JDK issue #6204853
(http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6204853)
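For illustration, here is a minimal sketch of the JDK 1.6 API this request refers to: Properties.load(Reader) and Properties.store(Writer, String) let the caller choose the charset. The class and method names (PropertiesCharsetDemo, store, load) are hypothetical helpers written for this example and are not part of NetBeans.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.Properties;

final class PropertiesCharsetDemo {

    /** Serializes the properties in the given charset via store(Writer, String). */
    static byte[] store(Properties props, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, charset)) {
            props.store(out, null); // no header comment; non-ASCII text is written as-is
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Parses the bytes via load(Reader), decoding them with the given charset. */
    static Properties load(byte[] data, Charset charset) {
        Properties props = new Properties();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

Bytes written with UTF-8 round-trip only when they are read back with UTF-8. Nothing in the file itself records which charset was used.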
Comment 1 Jesse Glick 2006-12-10 20:25:27 UTC
As I mentioned in my comments to the JDK bug, the new Reader constructor is a
bit half-baked; there is no way for the IDE to tell that a given .properties
file is in a different encoding.
Comment 2 Marian Petras 2007-01-09 12:47:38 UTC
This could be done in two phases:

1) Support for ISO-8859-1 and for the system's default encoding.

   If the .properties file cannot be read using ISO-8859-1, try loading it
   using the system's default encoding. Hold the information about
   which encoding was used for loading the file and use the same encoding
   when saving the modified content.

2) Add support for other encodings once issue #42638
   ("Provide support for File Encoding") is resolved.

If issue #42638 is resolved in NB 6.0 M7 (it is currently planned so), I would
skip the first phase.
Comment 3 Marian Petras 2007-01-09 12:59:36 UTC
Issue #42638 is planned for 6.0 M7, so I am planning this for M8; some time is
needed to adapt to the API introduced by the fix of #42638.
Comment 4 Jesse Glick 2007-01-09 17:41:17 UTC
"If the .properties file cannot be read using ISO-8859-1, try loading it using
the system's default encoding." - this cannot work, I think; all byte sequences
are valid in ISO-8859-1, IIRC.
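The point can be checked with a strict decoder. The sketch below (hypothetical class StrictDecode) configures a CharsetDecoder to report errors instead of substituting replacement characters, then feeds it every possible byte value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

final class StrictDecode {

    /** Returns true if the bytes decode with no malformed or unmappable input. */
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** All 256 possible byte values, 0x00 through 0xFF, in order. */
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }
}
```

decodesCleanly(allByteValues(), StandardCharsets.ISO_8859_1) returns true, while the same bytes fail under UTF-8. A failed ISO-8859-1 read therefore cannot serve as the trigger for trying another encoding.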
Comment 5 Marian Petras 2007-01-10 10:06:01 UTC
I would actually detect any characters greater than 0x7e, as there should not be
any in files written by Properties.store(OutputStream, String). I am not sure it
is a good idea, though.
Comment 6 Jiri Prox 2007-01-31 08:56:08 UTC
*** Issue 93636 has been marked as a duplicate of this issue. ***
Comment 7 Marian Petras 2007-04-26 14:32:45 UTC
Now that issue #32392 ("Edit Text Rather than Escape Sequences") is fixed, I do
not plan to support other encodings. Any text typed in the editor is encoded to
a pure ASCII file, with non-ASCII characters encoded in form of \uxxxx sequences.

See also issue #97861 ("Update properties data object to use FileEncodingQuery").
Comment 8 Ken Frank 2007-04-26 17:01:20 UTC
just want to be clear for testing - that the user can enter text with characters
of the locale they are currently in OR with characters of the currently set 
project encoding property (once that is implemented for all project types)

and that in editor it will be changed to show the escaped ascii ?

- is this for editing the property file as the file itself or also for the view
of it
where one gets to input keys and values ?

- does it mean that as they type into property file some multibyte characters,
for example, that they are automatically converted into the escaped ascii sequence ?

ken.frank@sun.com
Comment 9 Marian Petras 2007-04-27 07:57:17 UTC
All properties are saved with encoding ISO-8859-1 (ISO Latin 1) - there is no
change in this. The change is that, when saving the file, characters that are
not part of the ISO-8859-1 character table are not silently replaced with a
question mark (as it used to work) but they are silently replaced with
corresponding \uxxxx sequences as specified in method
java.util.Properties.store(...) - see
http://java.sun.com/j2se/1.5.0/docs/api/java/util/Properties.html#store%28java.io.OutputStream,%20java.lang.String%29

The user can enter any characters they want (including multibyte), characters
having Unicode value less than 20h or greater than 1eh will be saved as \uxxxx
sequences, where 'xxxx' is a Unicode value of the corresponding character
expressed with hexadecimal digits. When the file is opened (loaded) in NetBeans,
these sequences will be decoded and corresponding characters will be displayed
in the editor instead of the sequences. The user is still allowed to enter
\uxxxx sequences in the editor - these sequences will not be modified during
saving but they will be decoded when the file is later loaded.

The above mechanism is independent of the locale settings of the IDE and of the
project's or file's settings.

The view where one gets to input keys and values is unchanged - it has always
allowed to enter any characters and translated them to \uxxxx sequences as
necessary. There is one remaining issue connected with it - when the user edits
the .properties file using the table view and he/she has also the editor view
for the same file opened, non-ASCII characters entered in the table view are
promoted to the editor view as \uxxxx escape sequences. This is no longer
necessary and I just filed issue #102699 for it.
Comment 10 Marian Petras 2007-04-27 08:00:42 UTC
Correction: instead of

    "less than 20h or greater than 1eh"

there should be

    "less than 20h or greater than 7eh"
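A rough sketch of the corrected rule, not the actual NetBeans code: every character below 20h or above 7eh becomes a backslash-u sequence, and Properties.load turns the sequences back into characters. The class UnicodeEscapes and its methods are hypothetical. The sketch deliberately ignores the other escaping that Properties.store performs (backslashes, separators, short forms for tab and newline).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class UnicodeEscapes {

    /** Replaces each char below 0x20 or above 0x7e with a four-digit hex escape. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x20 || c > 0x7e) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Loads one "key=value" line with Properties.load, which decodes the escapes. */
    static String loadValue(String line, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }
}
```

For example, escape("café") yields the pure-ASCII text caf\u00e9, and loadValue("k=" + escape("café"), "k") returns "café" again.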
Comment 11 Marian Petras 2007-04-27 08:01:41 UTC
Reopened so that status can be changed.
Comment 12 Marian Petras 2007-08-29 16:46:28 UTC
*** Issue 99231 has been marked as a duplicate of this issue. ***
Comment 13 Marian Petras 2007-08-29 17:10:45 UTC
It should be possible to semi-detect encoding of a .properties file by checking its content. If it contains any
character of value 0xff or larger, then the file:
 - either has not been modified from NetBeans yet
 - or it is a file encoded using UTF-8

In such a case, the file could be checked whether it could be decoded using the UTF-8 encoding. If it could, ask the
user (something like "Seems to be a UTF-8 encoded file. Right?") and let him/her decide whether it should be loaded
using ISO-8859-1 or UTF-8. If it could not, then use the ISO-8859-1 encoding.

In the question dialogue about encoding (ISO-8859-1 vs. UTF-8), the user could specify to always use the selected
encoding for all .properties files in the project.
Comment 14 Marian Petras 2007-08-29 17:16:06 UTC
The mechanism described in my previous comment could be also generalized so the algorithm would be:

1) Scan the file - search for non-ASCII characters.
2) If there are no non-ASCII characters found, use the default encoding/decoding used for .properties files
   (ISO-8859-1 with \uxxxx sequences translated to corresponding characters).
3) If there are some non-ASCII characters found, try to detect encoding, UTF-8 in the first place.
   If some encoding is detected, ask the user for confirmation. If no encoding is detected,
   let the user specify which encoding to use.
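The three steps above can be sketched as follows. This covers only the detection part; asking the user for confirmation is left out. The names (EncodingGuess, Result, guess) are hypothetical, and UTF-8 is the only encoding probed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncodingGuess {

    enum Result { ASCII_ONLY, LOOKS_LIKE_UTF8, NOT_UTF8 }

    static Result guess(byte[] data) {
        // Step 1: scan for non-ASCII bytes (0x80..0xFF are negative as signed Java bytes).
        boolean nonAscii = false;
        for (byte b : data) {
            if (b < 0) {
                nonAscii = true;
                break;
            }
        }
        // Step 2: pure ASCII, so the default ISO-8859-1 handling applies.
        if (!nonAscii) {
            return Result.ASCII_ONLY;
        }
        // Step 3: try a strict UTF-8 decode; any malformed sequence rules UTF-8 out.
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return Result.LOOKS_LIKE_UTF8;
        } catch (CharacterCodingException e) {
            return Result.NOT_UTF8;
        }
    }
}
```

A LOOKS_LIKE_UTF8 result is only a hint: some ISO-8859-1 byte sequences also happen to be valid UTF-8, which is why the proposal asks the user to confirm.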
Comment 15 Marian Petras 2007-10-08 12:37:05 UTC
*** Issue 114462 has been marked as a duplicate of this issue. ***
Comment 16 Ken Frank 2008-02-29 17:36:40 UTC
Matthias,

could you comment here on problems seen in
125875 and how this enhancement is same or related and impact 
on your team; perhaps it can be raised in priority.

 on cc list here
are developers who could reply to the comments.

ken.frank@sun.com
Comment 17 schmidtm 2008-03-03 15:19:42 UTC
Hi,

the Grails team needs this RFE as well. Grails message-bundles are exclusively UTF-8 encoded:

"The files must be saved in UTF-8 encoding if you wish to use non-ascii characters, which is contrary to standard Java
properties files which use the native Java VM encoding."

see http://docs.codehaus.org/display/GRAILS/Internationalization

At the moment we cannot use NB6.1 to deal with Grails message-bundles. Since we are working on a full-featured
Groovy/Grails integration into NB6.1 we need to be able to work with these files.
Comment 18 schmidtm 2008-03-03 15:29:17 UTC
*** Issue 125875 has been marked as a duplicate of this issue. ***
Comment 19 Petr Jiricka 2008-03-19 12:19:37 UTC
Based on all the discussion above, I think this is a much higher priority than P4 - bumping to P2. Can this be
considered for the next release?
Comment 20 Marian Petras 2008-03-19 13:11:58 UTC
Yes, it will be considered. When planning the next release, I will take priority of enhancements and feature requests
into account.
Comment 21 Keiichi Oono 2009-05-01 03:56:26 UTC
PHP users also use .properties files when log4php is being used. It's a minor case, but it would be helpful to have an
option to save .properties files in the system default encoding, to preserve non-ASCII characters used in comment lines.
Comment 22 Jesse Glick 2010-07-07 18:08:25 UTC
*** Bug 155934 has been marked as a duplicate of this bug. ***
Comment 23 Jan Peska 2011-09-09 11:57:24 UTC
*** Bug 198631 has been marked as a duplicate of this bug. ***
Comment 24 Jan Peska 2012-04-03 07:55:34 UTC
*** Bug 210088 has been marked as a duplicate of this bug. ***
Comment 25 ominds 2012-07-24 01:53:48 UTC
This bug is biting me as well. I'm writing a Firefox extension, .properties files should be UTF-8 according to Mozilla. File was created externally by another developer as UTF-8, when I opened it, it seems netbeans tried to open it as ISO-8859-1 and all Chinese characters broke. 

If netbeans treats this file as any other project file when creating/opening/saving that should solve the problem since all project files are UTF-8.
Comment 26 netmackan 2012-08-08 11:34:15 UTC
I have the same problem. Working with properties files in UTF-8 works fine as long as nobody opens them with NetBeans IDE in which case each UTF-8 character is replaced with two strange characters :(
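The "two strange characters" effect is easy to reproduce: a character that takes two bytes in UTF-8 turns into two ISO-8859-1 characters when the bytes are decoded with the wrong charset. A small sketch (the class name Mojibake is made up for this example):

```java
import java.nio.charset.StandardCharsets;

final class Mojibake {

    /** Encodes text as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1. */
    static String utf8ReadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }
}
```

utf8ReadAsLatin1("é") returns "Ã©", because é is the byte pair C3 A9 in UTF-8. Characters that need three UTF-8 bytes, such as most Chinese characters, come out as three characters instead.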
Comment 27 c69 2012-12-06 21:56:01 UTC
Two of my last projects have utf8 encoded .properties files which are used for localization.

Please, at least add an option to somehow suppress the current behaviour. We need to work with utf8 .properties file every day.
Comment 28 netbeans.89423 2012-12-06 21:57:24 UTC
Same problem here.
Comment 29 Jesse Glick 2012-12-06 22:43:14 UTC
Project types such as Grails or PHP which are known to mandate UTF-8 *.properties should simply provide a FileEncodingQueryImplementation saying so.

For other cases it would be trivial to write a plugin which lets you specify which *.properties should be treated as UTF-8: none (default config), all, based on file path regexp, etc. The downside is the need for manual configuration, especially if you also use standard *.properties files at times.

I am not sure it is possible to reliably detect UTF-8-encoded files, as such files would be loadable in ISO-8859-1 as well and plenty of *.properties in the field include raw European accent characters. There may be some libraries out there which can detect characteristic patterns of UTF-8 misinterpreted as ISO-8859-1, such as improbable punctuation or character sequences (¡å, Ä«). A plugin using juniversalchardet [1] to sniff file contents might be very handy, for example. It may suffice to look for a high percentage of bytes in the 0x80–0x9F range, which are almost never used in ISO-8859-1 documents but frequent in UTF-8. One limitation of all such approaches is that it can only work for an existing *.properties file which has a significant amount of non-ASCII text in it, so an IDE user writing a new file would probably see it treated as UTF-8.
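As a very rough sketch of the last heuristic, one could measure how many of the non-ASCII bytes fall in the 0x80–0x9F range. Taking the ratio against non-ASCII bytes (rather than all bytes) is an assumption made here, as are the names C1ByteHeuristic and c1Fraction. No threshold is suggested, since the comment does not give one.

```java
final class C1ByteHeuristic {

    /** Fraction of the non-ASCII bytes that lie in 0x80..0x9F; 0.0 if there are none. */
    static double c1Fraction(byte[] data) {
        int nonAscii = 0;
        int c1 = 0;
        for (byte b : data) {
            int v = b & 0xFF; // view the signed byte as an unsigned value
            if (v >= 0x80) {
                nonAscii++;
                if (v <= 0x9F) {
                    c1++;
                }
            }
        }
        return nonAscii == 0 ? 0.0 : (double) c1 / nonAscii;
    }
}
```

The signal is uneven. UTF-8 continuation bytes span 0x80–0xBF, so Japanese text scores high, but a UTF-8 "é" (C3 A9) contributes no byte in the range at all. That fits the caveat above: the approach needs a significant amount of non-ASCII text to work.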

The Java team declined to mandate that UTF-8 *.properties start with a BOM, which would have solved the problem cleanly (at least for JVM-based projects; not PHP). NetBeans (or a NB plugin) could adopt Emacs’ convention, that files may start with a header comment specifying the encoding:

# -*- coding: UTF-8 -*-


[1] http://code.google.com/p/juniversalchardet/
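A plugin honoring the Emacs-style header mentioned above might parse the first line roughly like this. This is only a sketch: the class CodingHeader, the method fromFirstLine and the exact regular expression are assumptions, not an existing NetBeans or Emacs API.

```java
import java.nio.charset.Charset;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CodingHeader {

    // A comment line ('#' or '!') containing "-*- ... coding: NAME ... -*-".
    private static final Pattern CODING =
            Pattern.compile("^[#!].*-\\*-.*\\bcoding:\\s*([A-Za-z0-9._-]+).*-\\*-");

    /** Returns the charset named in the header comment, or empty if absent or unknown. */
    static Optional<Charset> fromFirstLine(String firstLine) {
        Matcher m = CODING.matcher(firstLine);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            return Optional.of(Charset.forName(m.group(1)));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return Optional.empty();
        }
    }
}
```

Since the header line is pure ASCII, it can be read safely before the real encoding is known.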
Comment 30 netbeans.89423 2012-12-07 14:49:11 UTC
Actually, there already exists a global project encoding setting. Additionally, the general recommendation usually is to use UTF-8 and UTF-8 only. Using anything other than UTF-8 is bad behavior as can be discovered every day anew when inexperienced programmers forget to define file encodings in build.xml files etc and simple builds fail because one uses an UTF-8 console environment. We should really enforce best practices. Maybe Java should be changed to use UTF-8 by default. Always. (there are many other crappy defaults out there, like jdbc timezone handling etc., which should be killed once and for all)
Comment 31 Jan Peska 2012-12-12 08:26:04 UTC
From what I know .properties files are not used in PHP in general so I guess there are some PHP frameworks which use these files, right? If it is so please state name of these frameworks for me to have a better understanding of the situation.

Thanks
Comment 32 Jan Peska 2012-12-12 08:55:52 UTC
In what projects other than PHP do you have problem with .properties files encoding?
Comment 33 akobberup 2012-12-12 08:58:30 UTC
Google Web Toolkit also requires UTF-8 encoding of properties files:

https://developers.google.com/web-toolkit/doc/latest/DevGuideI18n?hl=en#DevGuidePropertiesFiles

... You must also ensure that all relevant source and .properties files are set to be in the UTF-8 charset in your IDE. ...
Comment 34 Jan Peska 2013-04-03 11:58:23 UTC
*** Bug 228196 has been marked as a duplicate of this bug. ***
Comment 35 ecerichter 2013-04-03 12:43:48 UTC
Would not suffice to just obey project properties encoding?
If my project is set to UTF-8, then assume properties are UTF-8. For me, this is all I need.

In case of need to detect the encoding, just check Notepad++ algorithm, it works almost perfectly IMHO.
Comment 36 gui 2013-04-17 06:32:26 UTC
Please implement it in the next release (maybe 7.4 or 7.5). So I can use Netbeans again.

It seems so easy for you guys. But if you can't do right now I'm encouraging someone to make a patch and send it for analysis.

Why its since 2006 without any "fix" guys?

Just give us some news about it (some main developers perhaps), its not WONTFIX so, what its that you can't do it?

Anyway thanks for you work and for your attention.
Comment 37 Jan Peska 2013-04-17 08:29:41 UTC
(In reply to comment #36)
> Please implement it in the next release (maybe 7.4 or 7.5). So I can use
> Netbeans again.
> 
> It seems so easy for you guys. But if you can't do right now I'm encouraging
> someone to make a patch and send it for analysis.
> 
> Why its since 2006 without any "fix" guys?
> 
> Just give us some news about it (some main developers perhaps), its not WONTFIX
> so, what its that you can't do it?
> 
> Anyway thanks for you work and for your attention.

As I've mentioned before I need to know in what project types you are using properties files with different encoding (not ISO 8859-1). 

PHP was mentioned, but PHP support developers would like to know particular use-cases of using .properties files in PHP (in what framework, etc.). In case of GWT it has to be fixed on side of the 3rd party plugin.

So let me know what particular problem do you have.

Thanks
Comment 38 akobberup 2013-04-17 08:34:52 UTC
Please clarify what you mean when stating that "In case
of GWT it has to be fixed on side of the 3rd party plugin".

Are you not basically saying that this will never be supported for java projects then?
Comment 39 Jan Peska 2013-04-17 08:38:53 UTC
(In reply to comment #38)
> Please clarify what you mean when stating that "In case
> of GWT it has to be fixed on side of the 3rd party plugin".
> 
> Are you not basically saying that this will never be supported for java
> projects then?

Well as far as I know standard Java API is designed to use ISO 8859-1 encoding for the properties file, but maybe I'm just missing something - what is the problem with .properties files in  java projects?
Comment 40 dynamite 2013-04-17 08:49:09 UTC
Properties files were originally ISO 8859-1, but since Java 1.6 they can also be read and written using a Reader or Writer in any encoding.  It is a *long* time since ISO 8859-1 was the required encoding. It is a surprise that NetBeans, as the reference IDE, does not support this.

We use UTF-8 property files in our Java project to hold i18n translations.  This doesn't seem unreasonable.  Clearly I dare not open these files in NetBeans and it does make it hard to try and convince others to change away from Eclipse.
Comment 41 akobberup 2013-04-17 09:02:48 UTC
You are not missing anything - the Java spec does indeed say so for the
load(InputStream) method.  

However, some frameworks/libs (such as GWT) seem to also use the load(Reader)
method of the Properties class, for which it is not specified what encoding
is used by the underlying input stream.

I don't want to get into what seems to be a religious argument here, so i will
just describe my problem:

I use GWT for my web front end (this is all coded in java, so it is a java
project).
The localized files i keep for GWT to read must be UTF-8. 
Therefore i must set netbeans to open these files as plain text in order for it
to not convert the files to ISO-8859-1. 

So my issue is that i can not use the properties file editor.
Comment 42 Jan Peska 2013-04-17 12:42:29 UTC
IMHO we can't automatically assume that project encoding should be used for every properties file - standard Java project sources can be encoded in UTF-8 but resource bundles still has to be ISO-8859-1.

What if there would be a check box in project/properties "use project encoding for .properties files" and according to this check box the encoding of the properties files would be default (ISO-8859-1) or project specific (e.g. UTF-8).

Would that solve your problems with it?
Comment 43 akobberup 2013-04-17 12:53:42 UTC
That'll do the trick for me. 
Its fine that it is something that has to be actively selected for the individual project, as i agree that the standard should still be iso-8859-1 for properties files in java projects.
Comment 44 ecerichter 2013-04-17 17:31:47 UTC
(In reply to comment #42)
> IMHO we can't automatically assume that project encoding should be used for
> every properties file - standard Java project sources can be encoded in UTF-8
> but resource bundles still has to be ISO-8859-1.
> 
> What if there would be a check box in project/properties "use project encoding
> for .properties files" and according to this check box the encoding of the
> properties files would be default (ISO-8859-1) or project specific (e.g.
> UTF-8).
> 
> Would that solve your problems with it?

For me, it is a perfect workaround.

The perfect solution would make NetBeans detect each file encoding prior to opening (such algorithm would be very complex), but this seems to be overwhelming.


Edson
Comment 45 Jesse Glick 2013-04-17 19:04:12 UTC
(In reply to comment #44)
> The perfect solution would make NetBeans detect each file encoding prior to
> opening (such algorithm would be very complex)

See my comment #29 if you did not already.
Comment 46 ecerichter 2013-04-17 19:39:40 UTC
(In reply to comment #45)
> (In reply to comment #44)
> > The perfect solution would make NetBeans detect each file encoding prior to
> > opening (such algorithm would be very complex)
> 
> See my comment #29 if you did not already.

Yes, I was just enforcing my vision that the proposed workaround would work for me, but my humble opinion is that a complete solution would work better in all NB (not only .property files).

By today, my preferred editor with multi encoding support is Notepad++, which does a terrific job identifying file encoding as well converting from one encoding to another.

Would be nice to have such features in NetBeans and not needing external editors to do that.

Notepad++ is open source, and its algorithm would (or not) be easy to adapt to Java (I really don't know how easy would it be).


Regards,

Edson
Comment 47 Jan Peska 2013-04-18 08:51:14 UTC
I'll discuss it with Java and PHP guys, is there any other project which has this problem?
Comment 48 Jan Peska 2013-04-25 08:57:36 UTC
After discussion with Tomas Zezula (Java Project support), we have agreed on a slightly different approach in this case.

User can check "use project encoding" property on a .properties file itself, not on its project. I know that it won't be very effective in case of project with multiple properties files, but on the other hand you can use this feature in any project.

I've added the property to the property sheet of a properties file and I'll integrate it today, so please take a look at that.
Comment 49 Jan Peska 2013-04-25 09:08:59 UTC
fix: http://hg.netbeans.org/core-main/rev/f32f64dd4a38
Comment 50 Quality Engineering 2013-04-28 02:16:42 UTC
Integrated into 'main-golden', will be available in build *201304272301* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main-golden/rev/f32f64dd4a38
User: Jan Peska <JPESKA@netbeans.org>
Log: Issue #75906 - I18N - Add support for other encodings (other than ISO-8859-1)
Support usage of project encoding for .properties files
Comment 51 ecerichter 2013-04-28 22:04:06 UTC
Confirmed: with the fix, properties editor is working as expected.

Thanks!

Edson Richter
Comment 52 mcmagi 2013-04-29 16:28:25 UTC
Jan,

I'm very happy that this issue is getting attention and has a solution now.  I work in projects that have quite a lot of properties files to store internationalized content, so unfortunately checking off each file individually will be a somewhat tedious process.  Is there any way this can be done at a project or folder level?

Again, thanks for looking at this!
Comment 53 Jan Peska 2013-04-30 06:35:50 UTC
(In reply to comment #52)
> Jan,
> 
> I'm very happy that this issue is getting attention and has a solution now.  I
> work in projects that have quite a lot of properties files to store
> internationalized content, so unfortunately checking off each file individually
> will be a somewhat tedious process.  Is there any way this can be done at a
> project or folder level?
> 
> Again, thanks for looking at this!

No, unfortunately you can't specify it at project (or folder) level, it is a property on a properties file itself. I can evaluate possibility to select multiple properties files and then set the property for all of them at once if that would help you...
Comment 54 akobberup 2013-04-30 06:39:49 UTC
I also really appreciate that this is getting attention. 
In my current projects we have 80+ property files per language, that all have to be in utf-8, so i would appreciate the possibility to set this on multiple files at once.
Comment 55 Petr Jiricka 2013-04-30 06:54:52 UTC
> No, unfortunately you can't specify it at project (or folder) level, it is a
> property on a properties file itself.

So why could not we have both? If encoding is specified for an individual file, use this encoding. If it's not specified, use whatever is specified at the project level. If it's not specified at the project level, use the default ISO 8859_1. Would that work?
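The lookup order proposed here could be sketched as below. This illustrates the suggestion only; it is not the behavior that was integrated, and the names PropertiesEncodingResolver and resolve are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PropertiesEncodingResolver {

    /**
     * Picks the charset for one .properties file. Either argument may be null,
     * meaning "not specified" at that level.
     */
    static Charset resolve(Charset fileEncoding, Charset projectEncoding) {
        if (fileEncoding != null) {
            return fileEncoding;        // per-file setting wins
        }
        if (projectEncoding != null) {
            return projectEncoding;     // then the project-level setting
        }
        return StandardCharsets.ISO_8859_1; // standard default for .properties
    }
}
```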
Comment 56 Jan Peska 2013-04-30 07:58:00 UTC
I would like to keep it as simple as possible. I've checked it and it works just fine if you select multiple files at once and set the property.
Comment 57 akobberup 2013-04-30 08:00:19 UTC
Thumbs up from Denmark then :) 10 points.
Comment 58 Jesse Glick 2013-04-30 12:42:18 UTC
Just a reminder that anyone with a need for a more general fix (e.g. sniffing encodings, looking for Emacs-style -*- mode headers, loading folder or project properties, etc.) can implement other strategies in plugins, which could be quite small (one class) using very limited bits of the NetBeans API.
Comment 59 alex.panchenko 2013-06-14 11:11:51 UTC
Was this fix included in 7.3.1 ?
Comment 60 Dranon 2013-06-19 14:54:11 UTC
Having to select the files will be a pain in some legacy projects where the .properties files have been placed where needed and not grouped in some way, and I do happen to work on such a project with NetBeans and Eclipse. This should really be at a project level, especially for imported Eclipse projects that use a different encoding. As it is, this causes NetBeans to create files in the improper encoding in the project.
Comment 61 omeurice 2013-08-20 11:29:07 UTC
Where is this flag at the end???
I do not find it in release 7.2.1 nor 7.3.1.
I did not create a Java project, I just opened a maven project for a javascript client archive.  My properties files *need* to be UTF-8 but the IDE converts them automatically at save time.  That's definitely not what I need!
So please could you give me some clarifications about that dumb issue.
Thx.
Comment 62 Jan Peska 2013-08-20 12:00:31 UTC
This fix is a part of 7.4 - you can try it in 7.4 beta (https://netbeans.org/community/releases/74/)

(In reply to omeurice from comment #61)
> Where is this flag at the end???
> I do not find it in release 7.2.1 nor 7.3.1.
> I did not create a Java project, I just opened a maven project for a
> javascript client archive.  My properties files *need* to be UTF-8 but the
> IDE converts them automatically at save time.  That's definitely not what I
> need!
> So please could you give me some clarifications about that dumb issue.
> Thx.

