This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
Summary: | Changes are made to unedited French characters while saving a file | ||
---|---|---|---|
Product: | platform | Reporter: | PrakharMathur |
Component: | Text | Assignee: | Vladimir Voskresensky <vv159170> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | dstrupl, mmirilovic |
Priority: | P2 | Keywords: | I18N |
Version: | 7.0 | ||
Hardware: | All | ||
OS: | All | ||
Issue Type: | DEFECT | Exception Reporter: | |
Attachments: | Proposed patch. |
Description
PrakharMathur
2011-03-15 06:50:13 UTC
Not a stopper. The IDE tries to autodetect the encoding of a file by reading the first 1024 chars. Workaround: add a French comment at the very beginning of the file, e.g. a French copyright header. See also issue 191323 and issue 193476.

More details about what Jan said: it actually checks the first 1024 bytes to see whether the encoding set as a project property really matches the content of the file. If it does not match, a warning is shown. The easy workaround should be to set the encoding of the project to be the same as the encoding of the edited files. Closing the report as invalid.

The 1024 check is incorrect. We always have to check the full file content. See reopened P2 CR#6992232 and probably issue #196945 as well.

A more correct version of the encoding check, to prevent false positives:

```java
// Assumes the enclosing class defines ERR as a java.util.logging.Logger and imports java.io.*,
// java.nio.charset.*, java.util.logging.Level and org.openide.filesystems.FileObject.
private static boolean checkIfCharsetCanDecodeFile(FileObject fo, Charset charset) {
    final int BUF_SIZE = 1024 * 4;
    try {
        // Open the file once; the reader below decodes this same stream.
        InputStream input = new BufferedInputStream(fo.getInputStream(), BUF_SIZE);
        try {
            // A fresh decoder reports malformed/unmappable input by default instead of replacing it.
            CharsetDecoder decoder = charset.newDecoder();
            BufferedReader reader = new BufferedReader(new InputStreamReader(input, decoder), BUF_SIZE);
            char[] buf = new char[BUF_SIZE];
            try {
                // Decode the whole file, not just a prefix.
                while (reader.read(buf) != -1) {}
            } catch (CharacterCodingException e) {
                ERR.log(Level.FINE, "Encoding problem using " + charset, e); // NOI18N
                return false;
            } catch (IllegalStateException e) {
                // As before, a CODING_END state error is not treated as a decoding failure;
                // the null check avoids an NPE when the exception has no message.
                String msg = e.getMessage();
                if (msg == null || !msg.contains("CODING_END")) {
                    ERR.log(Level.FINE, "Encoding problem using " + charset, e); // NOI18N
                    return false;
                }
            }
        } finally {
            input.close(); // also releases the stream the reader was decoding
        }
    } catch (IOException ex) {
        ERR.log(Level.FINE, "Encoding problem using " + charset, ex); // NOI18N
    }
    return true;
}
```

In fact, data loss is P1... PrakharMathur, can you attach your file, please?

Thanks, Vladimir. I am sorry, but the above fix does not seem good to me: it means that all files will be read twice, and the encoding will be OK for the vast majority of them. I would prefer the overhead to be near-zero in such cases.
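To make the 1024-byte discussion concrete, here is a small self-contained Java sketch (not NetBeans code) contrasting a prefix-only check with a full-content check. It assumes, for illustration only, that the check means "do the first 1024 bytes decode cleanly in the project encoding"; the names `PrefixCheckDemo`, `decodes`, `prefixDecodes` and `asciiThen` are hypothetical. It shows both failure modes: a Latin-1 file whose first accented character lies past byte 1024 passes the prefix check (which is why a French header at the top of the file works around the problem), while a valid UTF-8 file with a two-byte character straddling byte 1024 is wrongly rejected.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PrefixCheckDemo {
    /** True if the bytes form a complete, valid byte sequence in the charset. */
    static boolean decodes(byte[] bytes, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes)); // treats the buffer as the entire input
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Hypothetical prefix-only check: validates just the first `limit` bytes. */
    static boolean prefixDecodes(byte[] bytes, Charset charset, int limit) {
        return decodes(Arrays.copyOf(bytes, Math.min(limit, bytes.length)), charset);
    }

    /** Builds n ASCII 'x' characters followed by tail, encoded in the given charset. */
    static byte[] asciiThen(int n, String tail, Charset charset) {
        char[] pad = new char[n];
        Arrays.fill(pad, 'x');
        return (new String(pad) + tail).getBytes(charset);
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        // Prefix check misses the problem: Latin-1 'e acute' sits at byte 2000.
        byte[] latin1File = asciiThen(2000, "\u00e9", StandardCharsets.ISO_8859_1);
        System.out.println(prefixDecodes(latin1File, utf8, 1024)); // true
        System.out.println(decodes(latin1File, utf8));             // false
        // Prefix check wrongly rejects: a valid 2-byte UTF-8 char spans bytes 1023-1024.
        byte[] utf8File = asciiThen(1023, "\u00e9", utf8);
        System.out.println(prefixDecodes(utf8File, utf8, 1024));   // false
        System.out.println(decodes(utf8File, utf8));               // true
    }
}
```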
I will attach a patch that tries to achieve that.

Created attachment 107352
Proposed patch.
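Jan's concern is that a separate validation pass reads every file twice. One way to keep the overhead near zero, sketched here under assumptions (this is not the content of attachment 107352; `SinglePassLoader` and `loadStrict` are hypothetical names), is to let the single read that loads the text also do the validation: decode with a decoder set to `CodingErrorAction.REPORT` and stop at the first undecodable sequence.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class SinglePassLoader {
    /**
     * Loads the whole stream as text in one pass. Returns null if any byte
     * sequence cannot be decoded, so the caller can warn the user instead of
     * silently substituting replacement characters.
     */
    static String loadStrict(InputStream in, Charset charset) throws IOException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, decoder)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n); // validation happens as a side effect of loading
            }
        } catch (CharacterCodingException e) {
            return null; // undecodable bytes anywhere in the file, not just in a prefix
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(loadStrict(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8).length()); // 4
        System.out.println(loadStrict(new ByteArrayInputStream(latin1), StandardCharsets.UTF_8) == null); // true
    }
}
```

In this design only a failed load costs a second read: the caller can warn the user and, if they choose to open the file anyway, re-read it with a lenient decoder. If the text is streamed directly into a live document, a failure midway leaves partially loaded content that has to be discarded, which is the document-state question raised in this thread.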
Jan, I'm fine with your patch and have only one comment: what would be the internal state of the kit that read part of the file, if an encoding exception was thrown and the user said "No"?

Integrated into 'main-golden', will be available in build *201103290400* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main/rev/1b8d3b85a037
User: Vladimir Voskresensky <vv159170@netbeans.org>
Log: fixed #196945 - [70cat] org.openide.text.DataEditorSupport$1: The file cannot be safely opened with encoding UTF-8. Do you want to continue opening it? -- check content of whole file, otherwise we incorrectly reject UTF-8 and also can corrupt user's file without notification (issue #196707 - Changes are made to unedited French characters while saving a file)

(Sorry for the late response, I have lots of meetings this week.)

(In reply to comment #10)
> Jan, I'm fine with your patch and have only one comment:
> What would be internal state of kit which read part of file, then encoding
> exception was thrown and user said "No"?

I would expect fewer problems with the do-not-open path, as the document will be thrown away in that case. I am not sure that will be the case for the open-anyway path (that's why the patch tries to remove the document's content). The kit itself should not hold any state (there is just one instance of a kit per MIME type in the IDE). I am a bit worried about the internal state of the document, though. Your patch is of course much safer, in that far fewer things can go wrong. But reading each file twice is not acceptable, IMO, especially since we expect the encoding to be reasonable in almost all cases.

Jan, so what should we push? Trunk now has your variant. I've tested it with the files that caused the issues, and everything works as expected. I've asked the editor team about integration in issue #196945.

If needed, I would go with my variant.
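The data loss named in the summary can be reproduced in miniature. The sketch below (a hypothetical `SilentCorruptionDemo`, assuming the file was opened without a warning and decoded leniently, i.e. with U+FFFD substituted for undecodable bytes) reads Latin-1 bytes as UTF-8 and writes them back without any edit. Each accented byte comes back as the UTF-8 encoding of U+FFFD, so the unedited French characters change on save, which is why the fix checks the whole file and warns instead of decoding silently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SilentCorruptionDemo {
    /** Simulates "open as UTF-8, save as UTF-8" with no edits in between. */
    static byte[] openAndSaveUnchanged(byte[] onDisk) {
        // new String(byte[], Charset) always replaces malformed input with U+FFFD.
        String document = new String(onDisk, StandardCharsets.UTF_8);
        return document.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A French comment saved in ISO-8859-1, opened in a project set to UTF-8.
        byte[] original = "/* R\u00e9sum\u00e9 */".getBytes(StandardCharsets.ISO_8859_1);
        byte[] saved = openAndSaveUnchanged(original);
        System.out.println(Arrays.equals(original, saved)); // false
        System.out.println(new String(saved, StandardCharsets.UTF_8).indexOf('\uFFFD') >= 0); // true
    }
}
```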