This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 131561 - I18N - changing project encoding may cause data loss
Summary: I18N - changing project encoding may cause data loss
Status: VERIFIED FIXED
Alias: None
Product: java
Classification: Unclassified
Component: Project
Version: 6.x
Hardware: All All
Importance: P2 blocker with 1 vote
Assignee: Tomas Zezula
URL:
Keywords: I18N
Depends on:
Blocks:
 
Reported: 2008-03-30 22:14 UTC by joconner
Modified: 2012-05-21 13:50 UTC
CC List: 7 users

See Also:
Issue Type: DEFECT
Exception Reporter:


Description joconner 2008-03-30 22:14:23 UTC
Changing a project's encoding may cause data loss in source files if the target encoding cannot represent characters
present in the source files. For example, a Java file containing Japanese characters (perhaps a ListResourceBundle)
will contain '?' characters after it is saved in a target encoding that has no Japanese characters.

The NB editor should alert the user that data loss is probable in this situation, allowing the user to avoid the file
corruption.
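
A minimal sketch of the kind of check such an alert could rely on, assuming the editor has the buffer text and the target charset at hand. The class and method names are hypothetical and are not part of the NetBeans API; only `java.nio.charset.CharsetEncoder.canEncode` is a real JDK call.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not NetBeans code: reports whether saving would lose data.
final class EncodingCheck {

    /** Returns true if every character of text can be represented in target. */
    static boolean isLossless(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // The kanji are written as Unicode escapes so this file compiles in any encoding.
        String source = "String name = \"\u7530\u4e2d\";";
        System.out.println(isLossless(source, StandardCharsets.UTF_8));      // prints true
        System.out.println(isLossless(source, StandardCharsets.ISO_8859_1)); // prints false
    }
}
```

A `false` result is the situation in which a warning would be worth showing before the save happens.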
Comment 1 joconner 2008-03-30 22:16:30 UTC
More information is provided in this blog entry:
http://joconner.com/site/entry/netbeans_6_1_uses_utf1
Comment 2 Petr Dvorak 2008-04-23 10:21:00 UTC
It is a DEFECT, imho... I know that the data loss caused by the change of the encoding is inevitable, but it could be
announced in advance by the IDE...

I am switching to DEFECT, lowering the priority to P4 and reassigning to i18n...
Comment 3 Petr Dvorak 2008-04-23 10:27:03 UTC
[Well, I will just say I think it is really not such a big deal...]
Comment 4 Ken Frank 2008-04-23 16:51:14 UTC
Wouldn't data loss be something more important, so that p4 is too low ?

I don't know if this relates to assumptions about new project encoding properties
and details so that it could be expected, but as joshis mentions a warning could be
helpful.

is this just about properties files, since that is the category of this issue,
or is it about java files or any project files ? In that case a different
category would be needed.

see these faq items to see if it offers any clarifications:

http://wiki.netbeans.org/wiki/view/FaqI18nProjectEncoding
http://wiki.netbeans.org/wiki/view/FaqI18nChangeProjectEncodingImpact
http://wiki.netbeans.org/wiki/view/FaqI18nFileEncodingQueryObject

ken.frank@sun.com
Comment 5 Ken Frank 2008-04-23 17:45:26 UTC
moving to p2 since it's about data loss
moving back to ide category since it's not about properties files per se
and that is what this category is about (that is, i18n category is about
properties files and i18n wizard; i18n issues themselves belong in
the category in which they are seen). In this case, when it's not yet
clear what category would apply, the ide category is for issues like that,
and it will be assigned to the correct category.

ken.frank@sun.com
Comment 6 Tomas Zezula 2008-04-30 07:29:07 UTC
The IDE doesn't do any recoding of existing files when you change an encoding. There should be a warning in the project customizer that tells the user about
this, or the IDE could, when the user agrees, recode the source files.
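
A rough sketch of what an opt-in recode step might look like, assuming the file content is available as a byte array. `Recoder` and `recode` are hypothetical names and this is not the IDE's implementation. It relies on the JDK behavior that a freshly created `CharsetDecoder` or `CharsetEncoder` reports errors, so the convenience `decode`/`encode` methods throw instead of silently substituting '?'.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not NetBeans code: converts file content between charsets
// and fails loudly when the target charset cannot hold a character.
final class Recoder {

    static byte[] recode(byte[] original, Charset from, Charset to)
            throws CharacterCodingException {
        // New decoders and encoders default to CodingErrorAction.REPORT, so
        // malformed input or unmappable characters raise an exception here.
        CharBuffer chars = from.newDecoder().decode(ByteBuffer.wrap(original));
        ByteBuffer encoded = to.newEncoder().encode(chars);
        byte[] result = new byte[encoded.remaining()];
        encoded.get(result);
        return result;
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u7530\u4e2d".getBytes(StandardCharsets.UTF_8);
        try {
            recode(utf8, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
            System.out.println("recoded without loss");
        } catch (CharacterCodingException e) {
            // Reached for the kanji above: leave the original file untouched.
            System.out.println("cannot recode: " + e);
        }
    }
}
```

Because the failure surfaces before anything is written, the original file can be left as it is when the conversion is not possible.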

Comment 7 Tomas Zezula 2008-06-17 16:11:49 UTC
I will add a dialog warning the user about the possible consequences into j2seproject for now.
Can anyone provide me a warning message?
Comment 8 Ken Frank 2008-06-17 16:38:18 UTC
something like
"changing project encoding might result in some characters in existing files not being 
recognized and shown correctly" ?

I guess its one situation of the model where, to change global project encoding,
it needs to be done within the props of a given project, and that means
it impacts all previously created files of that project.

and that's why, to ensure that all files in a project start off with some encoding other
than utf-8, some other, new project should be created after the encoding has been
changed in some existing project.

How about a global encoding property, such that it would change global project
encoding for all subsequently created projects ?  I realize it would not help
case about this issue, but it could help avoid such an issue if docs were also
clear that - to change project encoding, change the option, THEN create a new project.

would this be a valid task or RFE ?

ken.frank@sun.com
Comment 9 joconner 2008-06-17 17:55:40 UTC
I think the engineers working on this issue still misunderstand. There is data loss -- permanent, irreversible data loss
-- when you follow the instructions I have provided. This is not a situation in which characters are simply not
recognized or not shown properly because of a font mismatch.

Again, here are steps to reproduce the problem:
1. create a new, default project with the default UTF-8 encoding
2. create a java class, Test.java, and then create a string containing characters outside of 8859-1 or US-ASCII. A good
example might be the word TANAKA as kanji (田中). Save the file.
3. change the project encoding to 8859-1 or ascii.
4. modify the Test.java file in some way...add a String or a comment perhaps.
5. Save the Test.java file.
6. You now have data loss...and the editor has never warned you about it. 

This is not simply a situation in which the editor doesn't display the characters correctly. In fact, the editor
continues to show the file correctly until you refresh it from the saved file. Open the file in another editor...you
will see that the original characters have been replaced with '?', actual 0x3F values. The original characters are gone.
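
The substitution described above can be reproduced outside the IDE. The following is only an illustration of the JDK behavior, not the editor's actual save path; the class name is made up. `String.getBytes(Charset)` is documented to replace unmappable characters with the charset's replacement bytes, which for ISO-8859-1 is 0x3F.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: mimics "save in the new encoding, then reopen the file".
final class LossyRoundTrip {

    /** Encodes text the lossy way; unmappable characters become replacement bytes. */
    static byte[] save(String text, Charset cs) {
        return text.getBytes(cs);
    }

    /** Encodes and decodes again, as if the file were saved and reloaded. */
    static String saveAndReload(String text, Charset cs) {
        return new String(save(text, cs), cs);
    }

    public static void main(String[] args) {
        String tanaka = "\u7530\u4e2d"; // the two kanji from step 2
        byte[] saved = save(tanaka, StandardCharsets.ISO_8859_1);
        for (byte b : saved) {
            System.out.printf("0x%02X%n", b); // prints 0x3F twice
        }
        System.out.println(saveAndReload(tanaka, StandardCharsets.ISO_8859_1)); // prints ??
    }
}
```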

I'm not saying NB is doing the wrong thing. After all, the user has changed the character encoding. However, I
respectfully suggest that if the NB engineers themselves don't understand that the file has changed, you cannot expect
an end-user to understand. In many cases, the user will not understand the significance. Saving the file at that point
attempts to save the file in the new encoding which does not support the original character. However, I do think that a
warning should be shown *before* saving...or perhaps a confirmation dialog should pop up when you change the encoding. 

Anyway, good luck with this. I think I've demonstrated this as much as I can without actually providing a video clip of
the problem....hey, maybe that's a good idea!
Comment 10 Tomas Zezula 2008-06-17 18:06:01 UTC
Thanks for the message.
>How about a global encoding property, such that it would change global project encoding for all subsequently created projects ?
Currently there is such an option, but it's not visible to the user. It's set automatically to the last used encoding (when you switch the project encoding).
There were long discussions about which solution is better; I have no strong opinion about this.

Comment 11 Ken Frank 2008-06-17 18:19:54 UTC
I just filed RFE on having the global project encoding option.
137472 - its on projects category; don't know if that is correct.

ken.frank@sun.com
Comment 12 Tomas Zezula 2008-06-17 18:21:30 UTC
To joconner: I understand that the files are destroyed, not only wrongly shown in the editor. Unfortunately there is no way to prevent the user
from doing it; the user selected an encoding which is not able to store the content of the files.
I suggest for now to warn the user about the possible consequences when he selects a different encoding in the project customizer.

>In many cases, the user will not understand the significance.
I fully agree, this is why I want to warn him.
>Saving the file at that point attempts to save the file in the new encoding which does not support the original character. However, I do think that a
>warning should be shown *before* saving...or perhaps a confirmation dialog should pop up when you change the encoding.
Yes, this is what I suggest: show the warning when the user changes the encoding in the project customizer. Unfortunately showing it in the editor before save is
not so simple. But hopefully the warning popping up in the customizer when the user changes the encoding will be enough.
Comment 13 Tomas Zezula 2008-06-18 16:26:00 UTC
I've added the message into the j2seproject.
Log:
38a9cb73108f
30bc0cfa5a39

Adding Tomas to cc if he want to integrate it also in the web project.
Comment 14 joconner 2008-06-18 16:43:50 UTC
You do have another option for avoiding the data loss. If the user attempts to save a file into a more restrictive
encoding, you could ascii-encode the problem characters. For example, imagine the user attempts to save a file with
letter é (LATIN SMALL LETTER E WITH ACUTE) after selecting the project encoding US-ASCII. You could encode the é as
\u00E9 instead, which would preserve the character data in the file despite the US-ASCII encoding choice.
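
One possible shape for that escaping step, as a sketch only. `UnicodeEscaper` is a hypothetical name, and the approach only makes sense for content types that understand Java-style `\uXXXX` escapes (Java sources, properties files). It walks the text by code point so that supplementary characters are tested as a whole and, when needed, escaped as a surrogate pair.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not NetBeans code: keeps encodable characters as they are
// and rewrites the rest as Java-style Unicode escapes.
final class UnicodeEscaper {

    static String escapeUnencodable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint);
            String unit = text.substring(i, i + length);
            if (encoder.canEncode(unit)) {
                out.append(unit);
            } else {
                // One escape per UTF-16 code unit, as the Java language expects.
                for (char c : unit.toCharArray()) {
                    out.append(String.format("\\u%04X", (int) c));
                }
            }
            i += length;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String word = "caf\u00E9"; // ends with LATIN SMALL LETTER E WITH ACUTE
        // US-ASCII cannot hold the accented letter, so it is escaped.
        System.out.println(escapeUnencodable(word, StandardCharsets.US_ASCII));
        // ISO-8859-1 can hold it, so the text is returned unchanged.
        System.out.println(escapeUnencodable(word, StandardCharsets.ISO_8859_1));
    }
}
```

The escaped form is pure ASCII, so it survives any later change to an ASCII-compatible project encoding.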
Comment 15 Tomas Mysik 2008-06-18 21:23:05 UTC
> Adding Tomas to cc if he want to integrate it also in the web project.

Thanks a lot, will do.
Comment 16 Quality Engineering 2008-06-19 04:26:49 UTC
Integrated into 'main-golden', available in NB_Trunk_Production #268 build
Changeset: http://hg.netbeans.org/main/rev/38a9cb73108f
User: Tomas Zezula <tzezula@netbeans.org>
Log: #131561:I18N - changing project encoding may cause data loss
Comment 17 Quality Engineering 2008-06-20 15:50:32 UTC
Integrated into 'main-golden', available in NB_Trunk_Production #271 build
Changeset: http://hg.netbeans.org/main/rev/b5b285f86894
User: Tomas Mysik <tmysik@netbeans.org>
Log: #131561:I18N - changing project encoding may cause data loss
Comment 18 Ken Frank 2008-07-30 19:17:42 UTC
to nb developers:

I want to verify - aside from j2se and web projects mentioned in this issue,
does it mean other project types need to implement this themselves ?
(the warning dialog)

a random look at a few project types shows the warning when try to change
project encoding property - thus is this implemented for all nb projects
that have encoding property ui ?

ken.frank@sun.com

Comment 19 Tomas Zezula 2008-07-31 07:19:21 UTC
Yes, the dialog comes from the project customizer. The project is responsible for implementing it.
I am not sure which project types have implemented this dialog. As far as I remember the j2se project and web project did, but I am not sure about others.

Comment 20 Ken Frank 2008-07-31 19:43:36 UTC
am verifying in context of this issue for j2se.

Tomas, in general, when there is something implemented that might apply to all
projects, like this warning dialog, how is it communicated to dev on other
project types that something is available (and in many cases I am guessing
needs to be implemented for consistency) - is it thru some mail or web page ?

am asking since think this fix/feature needs to be communicated to those other
projects; I don't think its efficient to file issues on each and every project;
and am guessing thats not how its done for other common or shared features that still
need to be implemented per project ?

ken.frank@sun.com
Comment 21 Jesse Glick 2012-05-11 16:44:25 UTC
As per comment #14, I agree that it would be better for the editor kit to try to avoid data loss regardless of project settings: escape unencodable characters where the content type permits that (Java, XML, HTML, properties, ...), and do something saner for content types which have no well-defined Unicode escape syntax: refuse to save the modified buffer (so the user has a chance to at least make a backup of the original file); or try to provide similar content - via java.text.Normalizer.Form.NFKD, if the result is encodable, or a Java-style escape in the general case so that ascii2native could be used as a last resort.

To comment #20, agreed that implementing something like this in just one project type (j2seproject here) is not scalable.
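
The NFKD fallback mentioned for content types without an escape syntax could look roughly like this. It is a sketch under assumptions: `NormalizingFallback` and `toEncodable` are invented names, and an empty result stands for "refuse to save or fall back to escaping". Only `java.text.Normalizer` and `CharsetEncoder.canEncode` are real JDK calls.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Optional;

// Hypothetical helper, not NetBeans code: tries to find similar content that
// the target charset can represent.
final class NormalizingFallback {

    static Optional<String> toEncodable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        if (encoder.canEncode(text)) {
            return Optional.of(text); // nothing to do
        }
        // Compatibility decomposition, e.g. a ligature becomes its plain letters.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFKD);
        if (encoder.canEncode(decomposed)) {
            return Optional.of(decomposed);
        }
        return Optional.empty(); // caller must escape or refuse to save
    }

    public static void main(String[] args) {
        // U+FB01 is the "fi" ligature; NFKD turns it into the letters f and i.
        System.out.println(toEncodable("\uFB01le", StandardCharsets.US_ASCII)); // prints Optional[file]
        // The kanji has no decomposition, so no encodable form is found.
        System.out.println(toEncodable("\u7530", StandardCharsets.US_ASCII));   // prints Optional.empty
    }
}
```

Note that NFKD changes the stored text, so it preserves readable content rather than the exact original characters.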
Comment 22 Jesse Glick 2012-05-21 13:50:03 UTC
(In reply to comment #21)
> refuse to save the modified buffer

88c8987cbaa0 (no bug number) appears to be doing this.