This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
Changing a project's encoding may cause data loss in source files if the target encoding doesn't contain characters in the source (original) encoding. For example, a Java file containing Japanese characters (perhaps a ListResourceBundle) will contain '?' characters when the target encoding doesn't contain the Japanese characters. The NB editor should alert the user that data loss is probable in this situation, allowing the user to avoid the file corruption.
More information is provided in this blog entry: http://joconner.com/site/entry/netbeans_6_1_uses_utf1
It is a DEFECT, imho... I know that the data loss caused by changing the encoding is inevitable, but the IDE could announce it in advance... I am switching this to DEFECT, lowering the priority to P4, and reassigning to i18n...
[Well, I will just say I think it is really not such a big deal...]
Wouldn't data loss be something more important, so that P4 is too low? I don't know if this relates to assumptions about the new project encoding properties, in which case it might be expected, but as joshis mentions, a warning could be helpful. Is this just about properties files, since that is the category of this issue, or is it about Java files or any project files? In that case a different category would be needed. See these FAQ items to see if they offer any clarification: http://wiki.netbeans.org/wiki/view/FaqI18nProjectEncoding http://wiki.netbeans.org/wiki/view/FaqI18nChangeProjectEncodingImpact http://wiki.netbeans.org/wiki/view/FaqI18nFileEncodingQueryObject ken.frank@sun.com
Moving to P2 since it's about data loss. Moving back to the ide category since it's not about properties files per se, which is what this category covers. (That is, the i18n category is about properties files and the i18n wizard; i18n issues themselves belong in the category in which they are seen. In this case, when it's not yet clear what category would apply, the ide category is for issues like that, and the issue will then be assigned to the correct category.) ken.frank@sun.com
The IDE doesn't do any recoding of existing files when you change an encoding. There should be a warning in the project customizer that tells the user about this, or the IDE could recode the source files if the user agrees.
I will add a dialog warning the user about the possible consequences to j2seproject for now. Can anyone provide me with a warning message?
Something like "changing project encoding might result in some characters in existing files not being recognized and shown correctly"? I guess it's one situation of the model where changing the global project encoding has to be done within the properties of a given project, which means it impacts all previously created files of that project. That's why, to ensure that all files in a project start off with some encoding other than UTF-8, a new project should be created after the encoding has been changed in some existing project. How about a global encoding property that would change the project encoding for all subsequently created projects? I realize it would not help the case in this issue, but it could help avoid such an issue if the docs were also clear that, to change project encoding, you change the option and THEN create a new project. Would this be a valid task or RFE? ken.frank@sun.com
I think the engineers working on this issue still misunderstand. There is data loss -- permanent, irreversible data loss -- when you follow the instructions I have provided. This is not a situation in which characters are simply not recognized or not shown properly because of a font mismatch. Again, here are steps to reproduce the problem:
1. Create a new, default project with the default UTF-8 encoding.
2. Create a Java class, Test.java, and then create a string containing characters outside of 8859-1 or US-ASCII. A good example might be the word TANAKA as kanji (田中). Save the file.
3. Change the project encoding to 8859-1 or ASCII.
4. Modify the Test.java file in some way... add a String or a comment perhaps.
5. Save the Test.java file.
6. You now have data loss... and the editor has never warned you about it.
This is not simply a situation in which the editor doesn't display the characters correctly. In fact, the editor continues to show the file correctly until you refresh it from the saved file. Open the file in another editor... you will see that the original characters have been replaced with '?', actual 0x3F values. The original characters are gone.
I'm not saying NB is doing the wrong thing. After all, the user has changed the character encoding. However, I respectfully suggest that if the NB engineers themselves don't understand that the file has changed, you cannot expect an end-user to understand. In many cases, the user will not understand the significance. Saving the file at that point attempts to save the file in the new encoding, which does not support the original character. However, I do think that a warning should be shown *before* saving... or perhaps a confirmation dialog should pop up when you change the encoding. Anyway, good luck with this. I think I've demonstrated this as much as I can without actually providing a video clip of the problem... hey, maybe that's a good idea!
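For readers who want to see the effect outside the IDE, here is a minimal sketch of what steps 3 to 6 amount to at the byte level. It does not use any NetBeans code; the class and method names (EncodingLossDemo, roundTrip) are made up for illustration. It relies only on the documented behavior of String.getBytes(Charset), which substitutes the charset's replacement byte for unmappable characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of NetBeans.
final class EncodingLossDemo {

    // Encodes the text in the target charset and decodes it again,
    // mimicking "save the file, then reload it from disk".
    static String roundTrip(String text, Charset target) {
        // getBytes silently replaces unmappable characters with the
        // charset's replacement byte ('?' for ISO-8859-1 and US-ASCII).
        byte[] saved = text.getBytes(target);
        return new String(saved, target);
    }

    public static void main(String[] args) {
        String original = "String name = \"\u7530\u4E2D\";"; // the kanji for TANAKA
        String reloaded = roundTrip(original, StandardCharsets.ISO_8859_1);
        System.out.println(reloaded);
        System.out.println("content preserved: " + original.equals(reloaded));
    }
}
```

The round trip through UTF-8 returns the original text unchanged, while the round trip through ISO-8859-1 does not, and nothing in the saved bytes records which characters were there before.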
Thanks for the message. >How about a global encoding property, such that it would change global project encoding for all subsequently created projects ? Currently there is such an option, but it's not visible to the user. It's set automatically to the last used encoding (when you switch the project encoding). There were long discussions about which solution is better; I have no strong opinion about this.
I just filed an RFE for a global project encoding option: 137472. It's in the projects category; I don't know if that is correct. ken.frank@sun.com
To joconner: I understand that the files are destroyed, not just wrongly shown in the editor. Unfortunately there is no way to prevent the user from doing it; the user selected an encoding which is not able to store the content of the files. I suggest for now warning the user about the possible consequences when he selects a different encoding in the project customizer. >In many cases, the user will not understand the significance. I fully agree, this is why I want to warn him. >Saving the file at that point attempts to save the file in the new encoding which does not support the original character. However, I do think that a >warning should be shown *before* saving...or perhaps a confirmation dialog should pop up when you change the encoding. Yes, this is what I suggest: show the warning when the user changes the encoding in the project customizer. Unfortunately, showing it in the editor before save is not so simple. But hopefully the warning popping up in the customizer when the user changes the encoding will be enough.
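As an illustration of the kind of check that could back such a warning (in the customizer or before a save), here is a small sketch built on java.nio.charset.CharsetEncoder.canEncode. This is not the code that was committed for this issue; the class and method names are hypothetical, and how the text of each file would be obtained is left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of NetBeans.
final class EncodabilityCheck {

    // Returns the index of the first character that the target charset
    // cannot represent, or -1 if the whole text can be encoded.
    static int firstUnencodable(CharSequence text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = Character.codePointAt(text, i);
            int length = Character.charCount(codePoint);
            // Test whole code points so surrogate pairs are not split.
            if (!encoder.canEncode(text.subSequence(i, i + length))) {
                return i;
            }
            i += length;
        }
        return -1;
    }
}
```

A caller could show the warning only when firstUnencodable returns something other than -1 for at least one source file, which would avoid alarming users whose files are plain ASCII anyway.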
I've added the message to j2seproject. Log: 38a9cb73108f 30bc0cfa5a39 Adding Tomas to cc in case he wants to integrate it in the web project as well.
You do have another option for avoiding the data loss. If the user attempts to save a file into a more restrictive encoding, you could ascii-encode the problem characters. For example, imagine the user attempts to save a file with letter é (LATIN SMALL LETTER E WITH ACUTE) after selecting the project encoding US-ASCII. You could encode the é as \u00E9 instead, which would preserve the character data in the file despite the US-ASCII encoding choice.
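A minimal sketch of that idea, assuming the content type accepts Java-style Unicode escapes (Java source, properties files). The class and method names are hypothetical; nothing here is existing NetBeans API.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of NetBeans.
final class UnicodeEscaper {

    // Replaces every character the target charset cannot encode with a
    // Java-style Unicode escape; encodable characters are copied as-is.
    static String escapeUnencodable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int length = Character.charCount(text.codePointAt(i));
            String unit = text.substring(i, i + length);
            if (encoder.canEncode(unit)) {
                out.append(unit);
            } else {
                // Java escapes are per UTF-16 code unit, so a supplementary
                // character becomes two escapes (one per surrogate).
                for (char ch : unit.toCharArray()) {
                    out.append(String.format("\\u%04X", (int) ch));
                }
            }
            i += length;
        }
        return out.toString();
    }
}
```

With US-ASCII as the target, the word café comes out as caf followed by the six-character escape for U+00E9. The result is only meaningful for content types whose parsers understand such escapes; in a plain text file the escape would just be literal text.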
> Adding Tomas to cc if he want to integrate it also in the web project. Thanks a lot, will do.
Integrated into 'main-golden', available in NB_Trunk_Production #268 build Changeset: http://hg.netbeans.org/main/rev/38a9cb73108f User: Tomas Zezula <tzezula@netbeans.org> Log: #131561:I18N - changing project encoding may cause data loss
Integrated into 'main-golden', available in NB_Trunk_Production #271 build Changeset: http://hg.netbeans.org/main/rev/b5b285f86894 User: Tomas Mysik <tmysik@netbeans.org> Log: #131561:I18N - changing project encoding may cause data loss
To NB developers: I want to verify. Aside from the j2se and web projects mentioned in this issue, does this mean other project types need to implement the warning dialog themselves? A random look at a few project types shows the warning when I try to change the project encoding property. So is this implemented for all NB projects that have an encoding property UI? ken.frank@sun.com
Yes, the dialog comes from the project customizer. The project is responsible for implementing it. I am not sure which project types have implemented this dialog. As far as I remember the j2se project and web project did, but I am not sure about others.
I am verifying this in the context of this issue for j2se. Tomas, in general, when something is implemented that might apply to all projects, like this warning dialog, how is it communicated to developers of other project types that something is available (and, I am guessing, in many cases needs to be implemented for consistency)? Is it through some mail or web page? I am asking since I think this fix/feature needs to be communicated to those other projects. I don't think it's efficient to file issues on each and every project, and I am guessing that's not how it's done for other common or shared features that still need to be implemented per project? ken.frank@sun.com
As per comment #14, I agree that it would be better for the editor kit to try to avoid data loss regardless of project settings: escape unencodable characters where the content type permits that (Java, XML, HTML, properties, ...), and do something saner for content types which have no well-defined Unicode escape syntax: refuse to save the modified buffer (so the user has a chance to at least make a backup of the original file); or try to provide similar content - via java.text.Normalizer.Form.NFKD, if the result is encodable, or a Java-style escape in the general case so that ascii2native could be used as a last resort. To comment #20, agreed that implementing something like this in just one project type (j2seproject here) is not scalable.
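To make the fallback order concrete, here is a rough sketch: keep the text if it is already encodable, otherwise try java.text.Normalizer with Form.NFKD, and only then fall back to Java-style escapes. The class and method names are hypothetical and this is not what any editor kit currently does; it only illustrates the proposal.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.text.Normalizer;

// Hypothetical helper, not part of NetBeans.
final class LossySaveFallback {

    static String toEncodable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        if (encoder.canEncode(text)) {
            return text; // nothing to do
        }
        // Compatibility decomposition, e.g. the "fi" ligature U+FB01
        // becomes the two plain letters f and i.
        String normalized = Normalizer.normalize(text, Normalizer.Form.NFKD);
        if (encoder.canEncode(normalized)) {
            return normalized;
        }
        // Last resort: escape each unencodable UTF-16 code unit so a tool
        // such as ascii2native could restore the original later.
        StringBuilder out = new StringBuilder(text.length());
        for (char ch : text.toCharArray()) {
            if (encoder.canEncode(ch)) {
                out.append(ch);
            } else {
                out.append(String.format("\\u%04X", (int) ch));
            }
        }
        return out.toString();
    }
}
```

Note that NFKD alone rarely helps for accented letters with a US-ASCII target, because the decomposed combining marks are themselves not encodable; in that case the sketch falls through to the escape branch.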
(In reply to comment #21) > refuse to save the modified buffer 88c8987cbaa0 (no bug number) appears to be doing this.