This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 19928

Summary: I18N - Source Code Encoding capability
Product: platform Reporter: lsomchai <lsomchai>
Component: Text Assignee: issues@editor <issues>
Status: RESOLVED INVALID    
Severity: blocker CC: gooddreams, issues, issues, jchalupa, jf4jbug, jglick, lneme, mentlicher, mgrummich, misterm, pjiricka, ppisl, sdedic, strzinek
Priority: P2 Keywords: API, ARCH, I18N
Version: 3.x   
Hardware: All   
OS: All   
Issue Type: ENHANCEMENT Exception Reporter:
Bug Depends on: 20259, 42638    
Bug Blocks: 21748, 67337, 77034    

Description lsomchai 2002-01-30 03:40:19 UTC
I found that JSP source files have an encoding capability. However, when I 
change the encoding, for example from ISO-8859-1 to UTF-8, the IDE does not 
convert the file's contents; it only sets the encoding.
Java source files have no such capability. Yes, the Java compiler only accepts 
ASCII, but it is very hard to type Unicode escape characters by hand.
If Java files had the same encoding capability as JSP files, plus a menu item to 
convert a file's encoding (it could use native2ascii), the IDE would be easier 
and more comfortable to use.
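
For illustration, the kind of conversion the reporter asks for (what native2ascii does to non-ASCII characters) can be sketched as follows. This is only a minimal sketch: `UnicodeEscapes` and `escape` are hypothetical names, not part of NetBeans or the JDK tool, and the code handles one `char` at a time, as Java source escapes do.

```java
// Hypothetical helper for illustration only; not a NetBeans or JDK API.
final class UnicodeEscapes {
    /** Replaces every char outside 7-bit ASCII with a backslash-u escape. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);                                  // plain ASCII passes through
            } else {
                out.append(String.format("\\u%04x", (int) c));  // four hex digits per char
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input word ends with an e-acute; the output contains only ASCII.
        System.out.println(escape("caf\u00e9"));
    }
}
```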
Comment 1 Miloslav Metelka 2002-01-30 09:00:18 UTC
I think there should be some general mechanism for handling the file
encodings/conversions. IMHO currently only the default encoding set to
JVM is used in the IDE.
 When reading the file the editor kit only gets the input stream for
reading so it chooses the default encoding. The editor kit must be
given a Reader (with the proper byte-to-char-converter) instead.
Reassigning to core but openide should be involved too I guess.
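
A minimal sketch of the point about Readers, using only standard java.io and java.nio.charset classes: the same bytes decode differently depending on the charset handed to the Reader. The class and method names (`EncodedRead`, `readAll`) are made up for this example, and it does not show how the NetBeans editor kit is actually wired.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class EncodedRead {
    /** Reads the whole stream, decoding bytes with the given charset. */
    static String readAll(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e7".getBytes(StandardCharsets.UTF_8); // c-cedilla, two bytes: C3 A7
        // Decoded as UTF-8 the bytes form one char; decoded as ISO-8859-1 they form two.
        System.out.println(readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8).length());
        System.out.println(readAll(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1).length());
    }
}
```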
Comment 2 David Simonek 2002-03-18 12:16:34 UTC
passing to Peter.
Comment 3 lsomchai 2002-03-23 09:40:33 UTC
In addition, I can put my comment and generate 
document in my own language.
Comment 4 Marek Grummich 2002-07-22 08:42:26 UTC
Target milestone was changed from '3.4' to TBD.
Comment 5 Marek Grummich 2002-07-22 09:07:23 UTC
Target milestone was changed from '3.4' to TBD.
Comment 6 Jesse Glick 2002-08-27 15:57:59 UTC
I have a feeling this is a duplicate of something.

By the way, lsomchai - try my little experimental module,
insertunicode.nbm, from

http://contrib.netbeans.org/servlets/ProjectDownloadList

It at least makes it easier to insert (but not read) escapes - mostly
for alphabetic/syllabic languages, as it is too clumsy to be useful
for ideographs.
Comment 7 Jesse Glick 2002-08-27 16:05:00 UTC
*** Issue 25191 has been marked as a duplicate of this issue. ***
Comment 8 Jesse Glick 2002-08-27 16:28:18 UTC
Such an API has been proposed and discussed in various forms on
several occasions on the list throughout the past couple of years.
Suggestions that I remember have included:

- an EncodingCookie which supplies the encoding of a file

- cause EditorCookie to automatically decode/encode the file according
to a locale property associated with it

Definitely needs a complete proposal and discussion; the issue is
pretty complicated when you consider:

- How much for API vs. hidden implementation?

- usage of platform default encoding vs. a standard encoding like UTF-8

- Unix vs. Win vs. Mac line endings - should the same mechanism solve
this problem?

- external processes like javac may need to know file encoding, so
encoding cannot be completely hidden in implementation

- UI to present the choice? prop ed needed (issue #20259); per-file
selection? per-file-type? per-filesystem (issue #25189)? global default?

- input methods: is the OS's keyboard support and JRE's input method
framework sufficient for users to enter international text in the
editor, or do we need any more support?

- escape vs. raw: for XML, HTML, .properties, and .java, there are
standardized Unicode escape syntaxes. Should the Editor window display
the raw characters, the escapes, or should you be able to choose on
the fly (a question for editor.netbeans.org probably)? Should the file
saved to disk contain the raw characters (encoded suitably), the
escapes (encoding irrelevant), or should this be a choice (i.e.
"escaped" is a special kind of "encoding")?
Comment 9 Ken Frank 2002-10-12 18:57:20 UTC
25191 is a duplicate of this issue, so I am marking this one as a defect:
after consultation with NB QA and comments from NB strategy, some i18n
RFEs could actually be viewed as defects.
Let me know if more details are needed.

Also, 20259 will be marked as a defect for the same reason.

Finally, would 27240 be a duplicate of this also ? If so, I can mark
it as such.
ken.frank@sun.com
Comment 10 Miloslav Metelka 2002-10-14 14:36:11 UTC
To the previous note from Jesse:
IMO the editor should display the content that was obtained from the
java.io.Reader without changes i.e. if there is a "raw" unicode char
that char should be displayed and if there was '\\' 'u' ... then that
text should be displayed.
IMHO the additional tweaking with the characters such as expanding to
escapes etc. should be treated as a pluggable filters. In general
there could be several cascaded filters.
We should discuss whether the input methods are enough for inputting
of the characters. I have no valuable opinion of that because I don't
use the input methods.
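
The idea of pluggable, cascaded filters could look roughly like the sketch below. Everything here is an assumption made for illustration: `TextFilters`, `STRIP_BOM` and `DECODE_ESCAPES` are hypothetical names, not an existing editor API, and the escape decoder is deliberately simplistic (it ignores escaped backslashes and repeated `u` characters).

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical filter chain for illustration only.
final class TextFilters {
    // A backslash, a 'u', then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /** Filter: drops a leading byte-order mark, if present. */
    static final UnaryOperator<String> STRIP_BOM =
            text -> text.startsWith("\uFEFF") ? text.substring(1) : text;

    /** Filter: turns each escape into the char it denotes (simplified). */
    static final UnaryOperator<String> DECODE_ESCAPES = text -> {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    };

    /** Applies the filters in order to the text obtained from the Reader. */
    static String apply(String text, List<UnaryOperator<String>> filters) {
        String result = text;
        for (UnaryOperator<String> filter : filters) {
            result = filter.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        String raw = "\uFEFFcaf\\u00e9"; // BOM, then "caf", then a literal escape sequence
        System.out.println(apply(raw, Arrays.asList(STRIP_BOM, DECODE_ESCAPES)));
    }
}
```

With an empty filter list the text is shown exactly as the Reader delivered it, which matches the default behaviour proposed above.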
Comment 11 Jesse Glick 2002-10-14 20:29:13 UTC
*** Issue 27240 has been marked as a duplicate of this issue. ***
Comment 12 Jesse Glick 2002-10-14 20:30:29 UTC
Issue #27240 also suggests per-file-type encoding defaults in some
uniform way. But I think we need per-file encodings anyway.
Comment 13 MiguelM 2002-10-14 23:57:11 UTC
Jesse Glick raises several interesting questions, which I'd 
like to address. For example:

>> - an EncodingCookie which supplies the encoding of a file
>> - cause EditorCookie to automatically decode/encode the 
>> file according to a locale property associated with it
I'm not quite sure what these two mean, but there exists a 
current mechanism for specifying the encoding for .java 
files. The encoding gets saved in the directory's .nbattrs 
file. This works well.

>> - How much for API vs. hidden implementation?
The current mechanism to specify the encoding of .java 
files works well, and I feel it should be applied to all 
files. It shouldn't be hidden, because the user needs some 
control to specify which files use which encodings.

>> - usage of platform default encoding vs. a standard 
>> encoding like UTF-8
The platform default should be the default encoding, but 
the user needs to be able to override it for specific files 
or file types.

>> - Unix vs. Win vs. Mac line endings - should the same 
>> mechanism solve this problem?
This is an interesting idea, but I suspect it would cause 
more problems than it would solve. Line endings aren't an 
encoding issue. This should be seen as a separate issue, 
probably an editor issue. (Personally, I feel users should 
be allowed to specify a default line-ending, which should 
be used when saving files, but any standard line-ending 
should end a line when reading files.)
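
As a rough sketch of that line-ending policy (accept any standard ending on read, write one user-chosen ending on save), assuming hypothetical names `LineEndings`, `normalize` and `withEnding` that do not exist in the IDE:

```java
// Hypothetical helper for illustration only.
final class LineEndings {
    /** On read: treats CRLF, a lone CR, and LF all as line ends, yielding LF. */
    static String normalize(String text) {
        // Replace CRLF first so it does not turn into two line ends.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** On save: writes every line end using the user's preferred ending. */
    static String withEnding(String normalized, String ending) {
        return normalized.replace("\n", ending);
    }

    public static void main(String[] args) {
        String mixed = "one\r\ntwo\rthree\n";
        String internal = normalize(mixed);
        System.out.println(internal.split("\n").length); // number of lines found
        System.out.println(withEnding(internal, "\r\n").length());
    }
}
```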

>> - external processes like javac may need to know file 
>> encoding, so encoding cannot be completely hidden in 
>> implementation
If the file is always loaded using the specified encoding, 
the external processes shouldn't have any problems.

>> - UI to present the choice? prop ed needed (issue 
>> #20259); per-file selection? per-file-type? per-
>> filesystem (issue #25189)? global default?
A property editor would be a good idea. It's a separate 
issue, though, and should be considered separately. I'd 
also like to specify the encoding by file type, but this 
shouldn't be seen as a substitute for specifying by specific 
files.

>> - input methods: is the OS's keyboard support and JRE's 
>> input method framework sufficient for users to enter 
>> international text in the editor, or do we need any more 
>> support?
Input methods are a separate issue. (In my experience, they 
are perfectly adequate, and we shouldn't have to worry 
about them.)

>> - escape vs. raw: for XML, HTML, .properties, and .java, 
>> there are standardized Unicode escape syntaxes. Should 
>> the Editor window display the raw characters, the 
>> escapes, or should you be able to choose on the fly (a 
>> question for editor.netbeans.org probably)? Should the 
>> file saved to disk contain the raw characters (encoded 
>> suitably), the escapes (encoding irrelevant), or should 
>> this be a choice (i.e. "escaped" is a special kind 
>> of "encoding")?
Again, this isn't an encoding issue, but it raises an 
interesting question: What happens if an editor enters 
characters that aren't supported by the file's encoding? 
However, currently, the java.io package already has a 
policy to handle unsupported data. (For ISO 8859-1, 
unsupported characters are converted to question marks.) 
Users may want the editor to highlight the unsupported data 
somehow. But this is an editor issue, not an encoding 
issue. (Properties files use escaped characters because 
Java requires them to be in the ISO 8859-1 encoding, so 
they can be cross-platform. Again, this is an editor issue, 
not an encoding issue, although there is certainly some 
overlap.) I like Miloslav Metelka's suggestion.
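
The question-mark behaviour mentioned above, and one way an editor might find the offending character in order to highlight it, can be sketched with standard java.nio.charset calls. `UnsupportedChars` and `firstUnencodable` are hypothetical names for this example only, and the check works per `char`, so it is meant for characters in the Basic Multilingual Plane.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class UnsupportedChars {
    /** Returns the index of the first char the charset cannot encode, or -1. */
    static int firstUnencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        for (int i = 0; i < text.length(); i++) {
            if (!encoder.canEncode(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "\u20ac5"; // euro sign followed by the digit 5
        // String.getBytes(Charset) silently substitutes the charset's replacement byte.
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // prints ?5
        System.out.println(firstUnencodable(text, StandardCharsets.ISO_8859_1)); // prints 0
    }
}
```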

However we decide this, we should keep in mind that, for 
multi-platform/multi-Locale projects, there's a lot of 
transferring files from one user to another, so there's no 
telling what the encoding should be for any file. So the 
user needs to be given the maximum possible control. 

Personally, I'd be happy to see all files get a text tab in 
their properties view, just like .java files do. This 
wouldn't let me specify encodings for specific file types, 
but gives me the flexibility I need to solve this problem. 
And it could be done quickly--the code already exists.

Here's my (wacky) workaround. Currently, I need all .sql 
and .utx files encoded with UTF-8. So, in Tools:Options, I 
go to:
  IDE Configuration
    System
      Object types
        Java Source Objects
and I set the "File Extensions" property to 
  java, sql, utx
Then, for my sql and utx files, I set the compiler to (do 
not compile).
Comment 14 Peter Zavadsky 2002-10-15 17:06:55 UTC
Ken, I don't understand why you have marked this as defect? It is pure
feature/enhancement.
I also don't understand how an enhancement could be viewed as defect?
Comment 15 Peter Zavadsky 2002-10-16 10:08:25 UTC
After talking with QA, changing back to feature.
And since it is not a must-have feature, decreasing the
priority back too.

If the feature is important, it has to be pushed through the plans in
accordance with other features. Resources are limited and not all
features can be must-have ones.
Comment 16 Marian Mirilovic 2002-12-06 17:18:10 UTC
Reassigning to David K., the new owner of editor.
Comment 17 David Konecny 2003-03-21 15:14:21 UTC
*** Issue 32028 has been marked as a duplicate of this issue. ***
Comment 18 andrew 2003-05-27 21:56:57 UTC
1. To tell the truth, it seems very strange to me that
the issue is marked as an RFE rather than a DEFECT. When it is
impossible to do some everyday work (like editing
a text file, for example), the module (the text module in
my case) has a P1 bug.

2. As the issue has been open for a rather long time, I think
a simple palliative step could be taken:

- introduce a global system property for how to interpret the byte
  stream, OR
- introduce such a property for the text editor only (the Java
  editor already has one; the XML and HTML editors are clever
  enough to take the encoding from the appropriate language
  items).

I think such a small step would take an NB guru one hour of
effort. On the other hand, a significant part of users'
problems would be resolved by it (I see that it is 
not a solution for _all_ users' problems).

I'm afraid of incurring the NB developers' anger :-), so I leave
the issue priority and type as is.

Andrew
Comment 19 David Konecny 2003-05-28 08:45:38 UTC
I agree that as a short-term solution this should be fixed in the plain
text editor, similarly to the Java editor. I would suggest filing an
issue against the text module asking for this.

Frankly speaking I'm not planning to properly fix this issue soon.
First, it is not trivial, second, I do not have resources for that.

Somebody will have to contribute this. :-)
Comment 20 MiguelM 2003-05-28 09:45:49 UTC
That "short term solution" you describe sounds fine to me. 
I suspect that's all people are really looking for. I'm not 
sure why a new bug should be filed against text module. 
Can't this bug report just be reassigned? 

When I opened issue 27240 (now closed as dup of this), all 
I was concerned with is that the editor read the file in 
the proper encoding, and convert to Unicode. Once I start 
editing, I already have everything I need. If I need IMEs, 
I have them. Just make NetBeans read and write the files 
with the proper encoding. Thanks.

If this bug report has a larger scope than 27240, please 
reopen 27240 and assign it to the text module.
Comment 21 Jan Chalupa 2003-05-28 10:02:11 UTC
To Miguel: please don't change the version field. The bug was first
logged against FFJ 3.0 and since it's still open, it's understood that
it applies to all subsequent versions of NB, FFJ and S1S.

Version: 3.5 -> FFJ 3.0.
Comment 22 David Konecny 2003-05-28 13:58:31 UTC
Yes, I think this issue is asking for proper solution on file
granularity, etc. That's why I would want to keep it open. I reopened
issue 32028 which was closed as duplicate of this one. Your one has
larger scope, it asks for setting this property for all files.
Comment 23 David Konecny 2004-04-30 11:57:20 UTC
See issue 42638 which proposes simple File Encoding API.
Comment 24 Jesse Glick 2004-05-17 04:52:12 UTC
Cf. issue #6050 ("Faster alternative to EditorCookie") which
recommends a Reader and Writer interface to a file rather than only
Document.
Comment 25 Ken Frank 2004-07-26 22:04:18 UTC
To NB dev team - have any of the things discussed in this issue
been implemented already ?

any in progress ?

any that should have a separate rfe filed ?

ken.frank@sun.com
07/26/04
Comment 26 Jesse Glick 2004-07-26 23:24:24 UTC
To Ken: no; no; and probably no. This stuff should be solved in a
reasonably complete proposal to overhaul file encoding in the IDE. No
one has worked seriously on such a proposal yet.
Comment 27 Jesse Glick 2004-12-01 19:03:56 UTC
*** Issue 51672 has been marked as a duplicate of this issue. ***
Comment 28 Jesse Glick 2005-03-09 23:37:43 UTC
*** Issue 55751 has been marked as a duplicate of this issue. ***
Comment 29 Jesse Glick 2005-03-10 16:35:58 UTC
*** Issue 55739 has been marked as a duplicate of this issue. ***
Comment 30 Jesse Glick 2005-03-18 15:21:40 UTC
*** Issue 56597 has been marked as a duplicate of this issue. ***
Comment 31 misterm 2005-10-26 14:30:49 UTC
Any chance this issue gets solved? New CVS Diff is facing problems due to lack 
of encoding support. If you have a file with latin characters and change one 
line, all lines containing latin characters are marked as different.
Comment 32 _ pkuzel 2005-10-26 15:55:52 UTC
In CVS we have file caches. Files in cache do not have original extension to
avoid confusion of tools that recursively process directory content by extensions.

Solutions:
  - the API could take an InputStreamProvider and a String (the original file name) to
address it, and maybe also the original MIME type
  - wait for JRE 6.0, which allows setting the file hidden flag (and rewrite all tools
to check it...)
  - the CVS cache should use the workdir file encoding (but this relies on the invalid assumption
that the encoding cannot change over time)
Comment 33 Ken Frank 2005-10-26 18:34:05 UTC
To misterm - your comments about cvs and latin chars - can you
elaborate a little and tell which locale you are in when
running ide; in the file, are there characters in encoding or charset
other than the one that is default for the locale you are in;
are the issues also about filenames that have characters of 
extended ascii or multibyte ?

ken.frank@sun.com
10/26/2005
Comment 34 misterm 2005-10-26 18:51:06 UTC
>------- Additional comments from kfrank Wed Oct 26 17:34:05 +0000 2005 -------

> To misterm - your comments about cvs and latin chars - can you
> elaborate a little and tell which locale you are in when
> running ide; 

pt-BR in one machine and en-US in the other one, using Windows default encoding 
(cp1252, i guess)

> in the file, are there characters in encoding or charset
> other than the one that is default for the locale you are in;

No, just regular characters for my locale such as ç, ã, á etc.

> are the issues also about filenames that have characters of 
> extended ascii or multibyte ?

NB CVS support used to have problems with it, but I haven't tested it lately.
Comment 35 Ken Frank 2006-05-04 18:20:01 UTC
As Jesse mentions, it would help to have an overall proposal
and solution; how could that happen ?  Over time I've seen
this kind of question about the need for encoding capability
arise. That's why I am changing this to P2.


ken.frank@sun.com
Comment 36 Jesse Glick 2006-05-04 23:15:22 UTC
Just restoring original version field.
Comment 37 Antonin Nebuzelsky 2008-04-17 15:15:03 UTC
Reassigning to new module owner mslama.
Comment 38 Quality Engineering 2008-12-23 14:33:56 UTC
This issue had *6 votes* before move to platform component
Comment 39 Vitezslav Stejskal 2009-12-01 10:11:36 UTC
*** Bug 168265 has been marked as a duplicate of this bug. ***
Comment 40 Vitezslav Stejskal 2009-12-01 10:11:54 UTC
*** Bug 55738 has been marked as a duplicate of this bug. ***
Comment 41 Vitezslav Stejskal 2009-12-01 10:15:36 UTC
*** Bug 177714 has been marked as a duplicate of this bug. ***
Comment 42 Vitezslav Stejskal 2009-12-01 10:17:51 UTC
Also see issue #114123 and http://wiki.netbeans.org/TextEncodingFOW.
Comment 43 Jesse Glick 2010-07-02 19:49:10 UTC
Obsolete issue.