This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
Summary: | I18N - Source Code Encoding capability | ||
---|---|---|---|
Product: | platform | Reporter: | lsomchai <lsomchai> |
Component: | Text | Assignee: | issues@editor <issues> |
Status: | RESOLVED INVALID | ||
Severity: | blocker | CC: | gooddreams, issues, jchalupa, jf4jbug, jglick, lneme, mentlicher, mgrummich, misterm, pjiricka, ppisl, sdedic, strzinek
Priority: | P2 | Keywords: | API, ARCH, I18N |
Version: | 3.x | ||
Hardware: | All | ||
OS: | All | ||
Issue Type: | ENHANCEMENT | Exception Reporter: | |
Bug Depends on: | 20259, 42638 | ||
Bug Blocks: | 21748, 67337, 77034 |
Description
lsomchai
2002-01-30 03:40:19 UTC
I think there should be some general mechanism for handling file encodings/conversions. IMHO currently only the default encoding set in the JVM is used in the IDE. When reading a file, the editor kit only gets an input stream, so it falls back to the default encoding. The editor kit must be given a Reader (with the proper byte-to-char converter) instead.

Reassigning to core, but openide should be involved too, I guess. Passing to Peter.

In addition, I can put my comment and generate a document in my own language.

Target milestone was changed from '3.4' to TBD.

I have a feeling this is a duplicate of something. By the way, lsomchai - try my little experimental module, insertunicode.nbm, from http://contrib.netbeans.org/servlets/ProjectDownloadList It at least makes it easier to insert (but not read) escapes - mostly for alphabetic/syllabic languages, as it is too clumsy to be useful for ideographs.

*** Issue 25191 has been marked as a duplicate of this issue. ***

Such an API has been proposed and discussed in various forms on several occasions on the list throughout the past couple of years. Suggestions that I remember have included:

- an EncodingCookie which supplies the encoding of a file
- cause EditorCookie to automatically decode/encode the file according to a locale property associated with it

Definitely needs a complete proposal and discussion; the issue is pretty complicated when you consider:

- How much for API vs. hidden implementation?
- usage of platform default encoding vs. a standard encoding like UTF-8
- Unix vs. Win vs. Mac line endings - should the same mechanism solve this problem?
- external processes like javac may need to know file encoding, so encoding cannot be completely hidden in implementation
- UI to present the choice? prop ed needed (issue #20259); per-file selection? per-file-type? per-filesystem (issue #25189)? global default?
- input methods: is the OS's keyboard support and the JRE's input method framework sufficient for users to enter international text in the editor, or do we need any more support?
- escape vs. raw: for XML, HTML, .properties, and .java, there are standardized Unicode escape syntaxes. Should the Editor window display the raw characters or the escapes, or should you be able to choose on the fly (a question for editor.netbeans.org, probably)? Should the file saved to disk contain the raw characters (encoded suitably) or the escapes (encoding irrelevant), or should this be a choice (i.e. "escaped" is a special kind of "encoding")?

25191 is a duplicate of this issue, so I am marking this one as a defect, since after consultation with NB QA and comments from NB strategy, some i18n RFEs could actually be viewed as defects. Let me know if more details are needed. Also, 20259 will likewise be marked as a defect. Finally, would 27240 be a duplicate of this also? If so, I can mark it as such. ken.frank@sun.com

To the previous note from Jesse: IMO the editor should display the content that was obtained from the java.io.Reader without changes, i.e. if there is a "raw" unicode char, that char should be displayed, and if there was '\\' 'u' ... then that text should be displayed. IMHO the additional tweaking of the characters, such as expanding to escapes etc., should be treated as pluggable filters. In general there could be several cascaded filters. We should discuss whether the input methods are enough for inputting the characters. I have no valuable opinion on that because I don't use the input methods.

*** Issue 27240 has been marked as a duplicate of this issue. ***

Issue #27240 also suggests per-file-type encoding defaults in some uniform way. But I think we need per-file encodings anyway.

Jesse Glick raises several interesting questions, which I'd like to address.
For example:

>> - an EncodingCookie which supplies the encoding of a file
>> - cause EditorCookie to automatically decode/encode the
>> file according to a locale property associated with it

I'm not quite sure what these two mean, but there exists a current mechanism for specifying the encoding of .java files. The encoding gets saved in the directory's .nbattrs file. This works well.

>> - How much for API vs. hidden implementation?

The current mechanism to specify the encoding of .java files works well, and I feel it should be applied to all files. It shouldn't be hidden, because the user needs some control to specify which files use which encodings.

>> - usage of platform default encoding vs. a standard
>> encoding like UTF-8

The platform default should be the default encoding, but the user needs to be able to override it for specific files or file types.

>> - Unix vs. Win vs. Mac line endings - should the same
>> mechanism solve this problem?

This is an interesting idea, but I suspect it would cause more problems than it would solve. Line endings aren't an encoding issue. This should be seen as a separate issue, probably an editor issue. (Personally, I feel users should be allowed to specify a default line ending, which should be used when saving files, but any standard line ending should end a line when reading files.)

>> - external processes like javac may need to know file
>> encoding, so encoding cannot be completely hidden in
>> implementation

If the file is always loaded using the specified encoding, the external processes shouldn't have any problems.

>> - UI to present the choice? prop ed needed (issue
>> #20259); per-file selection? per-file-type? per-
>> filesystem (issue #25189)? global default?

A property editor would be a good idea. It's a separate issue, though, and should be considered separately. I'd also like to specify the encoding by file type, but this shouldn't be seen as a substitute for specifying it for specific files.
>> - input methods: is the OS's keyboard support and JRE's
>> input method framework sufficient for users to enter
>> international text in the editor, or do we need any more
>> support?

Input methods are a separate issue. (In my experience, they are perfectly adequate, and we shouldn't have to worry about them.)

>> - escape vs. raw: for XML, HTML, .properties, and .java,
>> there are standardized Unicode escape syntaxes. Should
>> the Editor window display the raw characters, the
>> escapes, or should you be able to choose on the fly (a
>> question for editor.netbeans.org probably)? Should the
>> file saved to disk contain the raw characters (encoded
>> suitably), the escapes (encoding irrelevant), or should
>> this be a choice (i.e. "escaped" is a special kind
>> of "encoding")?

Again, this isn't an encoding issue, but it raises an interesting question: what happens if an editor enters characters that aren't supported by the file's encoding? However, the java.io package already has a policy for handling unsupported data. (For ISO 8859-1, unsupported characters are converted to question marks.) Users may want the editor to highlight the unsupported data somehow. But this is an editor issue, not an encoding issue. (Properties files use escaped characters because Java requires them to be in the ISO 8859-1 encoding, so they can be cross-platform. Again, this is an editor issue, not an encoding issue, although there is certainly some overlap.)

I like Miloslav Metelka's suggestion. However we decide this, we should keep in mind that, for multi-platform/multi-locale projects, there's a lot of transferring of files from one user to another, so there's no telling what the encoding should be for any given file. So the user needs to be given the maximum possible control. Personally, I'd be happy to see all files get a text tab in their properties view, just like .java files do.
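The question-mark substitution described above is easy to reproduce with the standard JDK charset machinery. This is a minimal, self-contained sketch (the class and method names are illustrative, not NetBeans code) showing that the default encoder replaces an unmappable character rather than failing:

```java
import java.nio.charset.StandardCharsets;

// Demonstrates the replacement behavior discussed above: when a character
// has no representation in the target encoding, the default JDK encoder
// substitutes '?' instead of reporting an error.
public class ReplacementDemo {
    static byte[] encodeAscii(String s) {
        // String.getBytes uses CodingErrorAction.REPLACE under the hood,
        // so unmappable characters become the charset's replacement byte.
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] out = encodeAscii("café");
        // 'é' is not representable in US-ASCII, so it becomes '?' (0x3F).
        System.out.println(new String(out, StandardCharsets.US_ASCII)); // caf?
    }
}
```

An editor that wants to warn the user instead of silently losing data would use java.nio.charset.CharsetEncoder directly with CodingErrorAction.REPORT, which throws on the first unmappable character.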
This wouldn't let me specify encodings for specific file types, but it gives me the flexibility I need to solve this problem. And it could be done quickly - the code already exists.

Here's my (wacky) workaround. Currently, I need all .sql and .utx files encoded with UTF-8. So, in Tools > Options, I go to: IDE Configuration > System > Object Types > Java Source Objects, and I set the "File Extensions" property to: java, sql, utx. Then, for my sql and utx files, I set the compiler to (do not compile).

Ken, I don't understand why you have marked this as a defect? It is a pure feature/enhancement. I also don't understand how an enhancement could be viewed as a defect? After talking with QA, changing back to feature. And since it is not a must-have feature, decreasing the priority back too. If the feature is important, it should be pushed through the plans alongside other features. The resources are limited and not all features can be must-have ones.

Reassigning to David K., new owner of editor.

*** Issue 32028 has been marked as a duplicate of this issue. ***

1. To tell the truth, it seems very strange to me that this issue is marked as an RFE rather than a DEFECT. When it is impossible to do some everyday work (like editing a text file, for example), the module (the text module in my case) has a P1 bug.

2. As the issue has had a rather long life, I think a simple palliative step may be taken:
- introduce a global system property for how to interpret the byte stream, OR
- introduce such a property for the text editor only (the Java editor has such a one; the XML and HTML editors are clever enough to infer the encoding from the appropriate language items).

I think such a little step demands one hour of effort from an NB guru. On the other hand, a significant part of users' problems would be resolved by it (I see it is not a solution to _all_ users' problems). I'm afraid to incur the NB developers' anger :-), so I leave the issue priority and type as is.
Andrew

I agree that as a short-term solution this should be fixed in the plain text editor similarly to the Java editor. I would suggest filing an issue against the text module asking for this. Frankly speaking, I'm not planning to properly fix this issue soon. First, it is not trivial; second, I do not have the resources for that. Somebody will have to contribute this. :-)

That "short term solution" you describe sounds fine to me. I suspect that's all people are really looking for. I'm not sure why a new bug should be filed against the text module. Can't this bug report just be reassigned? When I opened issue 27240 (now closed as a dup of this), all I was concerned with is that the editor read the file in the proper encoding and convert it to Unicode. Once I start editing, I already have everything I need. If I need IMEs, I have them. Just make NetBeans read and write the files with the proper encoding. Thanks. If this bug report has a larger scope than 27240, please reopen 27240 and assign it to the text module.

To Miguel: please don't change the version field. The bug was first logged against FFJ 3.0, and since it's still open, it's understood that it applies to all subsequent versions of NB, FFJ and S1S. Version: 3.5 -> FFJ 3.0.

Yes, I think this issue is asking for a proper solution at file granularity, etc. That's why I want to keep it open. I reopened issue 32028, which was closed as a duplicate of this one. Yours has a larger scope; it asks for setting this property for all files. See issue 42638, which proposes a simple File Encoding API.

Cf. issue #6050 ("Faster alternative to EditorCookie"), which recommends a Reader and Writer interface to a file rather than only a Document.

To the NB dev team - have any of the things discussed in this issue been implemented already? Any in progress? Any that should have a separate RFE filed? ken.frank@sun.com 07/26/04

To Ken: no; no; and probably no.
This stuff should be solved in a reasonably complete proposal to overhaul file encoding in the IDE. No one has worked seriously on such a proposal yet.

*** Issue 51672 has been marked as a duplicate of this issue. ***
*** Issue 55751 has been marked as a duplicate of this issue. ***
*** Issue 55739 has been marked as a duplicate of this issue. ***
*** Issue 56597 has been marked as a duplicate of this issue. ***

Any chance this issue gets solved? The new CVS Diff is facing problems due to the lack of encoding support. If you have a file with Latin characters and change one line, all lines containing Latin characters are marked as different.

In CVS we have file caches. Files in the cache do not have their original extension, to avoid confusing tools that recursively process directory contents by extension. Possible solutions:
- the API could take an InputStreamProvider and a String (the original file name) to address this, and maybe also the original MIME type
- wait for JRE 6.0, which allows setting the file hidden flag (and rewrite all tools to check it...)
- the CVS cache should use the working directory's file encoding (but this rests on the invalid assumption that the encoding cannot change over time)

To misterm - regarding your comments about CVS and Latin chars - can you elaborate a little and tell which locale you are in when running the IDE; in the file, are there characters in an encoding or charset other than the one that is default for your locale; are the issues also about filenames that contain extended-ASCII or multibyte characters?
ken.frank@sun.com 10/26/2005

> ------- Additional comments from kfrank Wed Oct 26 17:34:05 +0000 2005 -------
> To misterm - your comments about cvs and latin chars - can you
> elaborate a little and tell which locale you are in when
> running ide;

pt-BR on one machine and en-US on the other, using the Windows default encoding (cp1252, I guess).

> in the file, are there characters in encoding or charset
> other than the one that is default for the locale you are in;

No, just regular characters for my locale such as ç, ã, á etc.

> are the issues also about filenames that have characters of
> extended ascii or multibyte ?

NB CVS support used to have problems with it, but I haven't tested it lately.

As Jesse mentions, it would help to have an overall proposal and solution; how could that happen? I've seen this kind of question about the need for an encoding capability arise repeatedly over time. That's why I am changing this to P2. ken.frank@sun.com

Just restoring the original version field.

Reassigning to new module owner mslama.

This issue had *6 votes* before the move to the platform component.

*** Bug 168265 has been marked as a duplicate of this bug. ***
*** Bug 55738 has been marked as a duplicate of this bug. ***
*** Bug 177714 has been marked as a duplicate of this bug. ***

Also see issue #114123 and http://wiki.netbeans.org/TextEncodingFOW.

Obsolete issue.
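The per-file/per-type/global resolution order debated throughout this thread can be sketched as a tiny lookup. This is a hypothetical, self-contained illustration of the idea behind issue 42638's "simple File Encoding API" - the names here are invented for the sketch and are not the API that NetBeans eventually shipped:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-file encoding query, modeled on the
// resolution order discussed in this thread: per-file override first,
// then per-file-type default, then the platform/global default.
public class SimpleEncodingQuery {
    private static final Map<String, Charset> perFile = new HashMap<>();
    private static final Map<String, Charset> perExtension = new HashMap<>();

    static void setFileEncoding(String path, Charset cs) { perFile.put(path, cs); }
    static void setExtensionEncoding(String ext, Charset cs) { perExtension.put(ext, cs); }

    static Charset getEncoding(String path) {
        Charset cs = perFile.get(path);          // 1. per-file override
        if (cs != null) return cs;
        int dot = path.lastIndexOf('.');
        if (dot >= 0) {                          // 2. per-file-type default
            cs = perExtension.get(path.substring(dot + 1));
            if (cs != null) return cs;
        }
        return Charset.defaultCharset();         // 3. platform default
    }

    public static void main(String[] args) {
        setExtensionEncoding("sql", StandardCharsets.UTF_8);
        setFileEncoding("notes.txt", StandardCharsets.ISO_8859_1);
        System.out.println(getEncoding("query.sql"));  // UTF-8
        System.out.println(getEncoding("notes.txt"));  // ISO-8859-1
    }
}
```

The editor would then open files via new InputStreamReader(in, getEncoding(path)) instead of relying on the JVM default, which is exactly the Reader-with-proper-converter change the original report asks for.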