Bug 42638 - [devrev] I18N - Provide support for File Encoding
Status: RESOLVED FIXED
Product: projects
Classification: Unclassified
Component: Generic Infrastructure
Version: 4.x
Hardware/OS: All / All
Priority: P2 (15 votes)
Target Milestone: 6.x
Assigned To: Tomas Zezula
QA Contact: issues@projects
Whiteboard: plan60
Keywords: API, I18N
Duplicates: 51864 56597 69803 71483 74766 95888 97320
Depends on: 98197
Blocks: 92751 19928 32028 39521 55810 57515 71006 79337 87358 92642 94676 97848 97861 97878
Reported: 2004-04-30 11:51 UTC by David Konecny
Modified: 2007-04-26 14:32 UTC
CC: 15 users

Issue Type: DEFECT


Attachments
implementation (36.83 KB, patch)
2004-04-30 11:54 UTC, David Konecny
Diff file with the new API/SPI (8.86 KB, patch)
2007-02-09 14:48 UTC, Tomas Zezula
Default implementations (delegate to DataObject's and Project's lookup) (7.47 KB, patch)
2007-02-09 14:50 UTC, Tomas Zezula
Patch of j2seproject (42.79 KB, patch)
2007-02-09 14:51 UTC, Tomas Zezula
Internal javac IO layer (2.62 KB, patch)
2007-02-09 14:53 UTC, Tomas Zezula
Diff of project/queries arch.xml (4.34 KB, patch)
2007-02-12 14:58 UTC, Tomas Zezula
Fixed project's diff, see TP01 (20.81 KB, patch)
2007-02-26 13:42 UTC, Tomas Zezula
Fixed j2seproject's diff, see TP03 (45.01 KB, patch)
2007-02-26 13:43 UTC, Tomas Zezula
Diff files (33.39 KB, application/octet-stream)
2007-03-13 18:05 UTC, Tomas Zezula
Loaders with test and apichanges (7.70 KB, patch)
2007-03-14 11:27 UTC, Tomas Zezula

Description David Konecny 2004-04-30 11:51:11 UTC
There are several defects/RFEs asking for better
customization of file encodings. See issue
19928 and all its duplicates.

I propose to add a new API which, for a given
FileObject, returns its encoding. It follows the Query
design pattern: it has an SPI which permits providers
to answer the query, and an API which is a static
method consulting all SPI implementations.

SPI implementation: on the SPI side I implemented a
simple per-project provider which answers the query
for all project files and allows both a global
per-project encoding and a per-file-type encoding
(based on the file extension). I expect that only the
global project encoding would be customizable in the
J2SE Project customizer, and that the per-type encoding
would be available only to advanced users, customized
directly in the Ant properties file.

API usage: ideally DataEditorSupport should use
this API, but that would create a dependency from
openide/loaders to project/queries. So instead
I would use the new API directly from the java module
(see its o.n.m.java.Util.getFileEncoding), and it
could similarly be used by the text module (see
issue 32028). Even this small change would be
beneficial for our users.

In the future there should be other SPI implementations
which can globally answer the encoding for file
types like properties, XML, HTML, etc.
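
The proposed query pattern can be sketched roughly as follows. This is an illustrative simplification, not the proposed code: the class and interface names are hypothetical, and a real implementation would take a FileObject and discover providers via Lookup rather than receiving a list.

```java
import java.nio.charset.Charset;
import java.util.List;

// Illustrative sketch of the Query pattern described above: providers implement
// an SPI interface, and a static API method consults them in order. All names
// here are hypothetical stand-ins for the proposed API.
public class EncodingQuery {

    // SPI: a provider answers the query, or returns null if it has no opinion.
    public interface EncodingQueryImplementation {
        Charset getEncoding(String filePath);
    }

    // API: the first provider that answers wins; fall back to the platform default.
    public static Charset getEncoding(String filePath,
                                      List<EncodingQueryImplementation> providers) {
        for (EncodingQueryImplementation impl : providers) {
            Charset c = impl.getEncoding(filePath);
            if (c != null) {
                return c;
            }
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // A per-project provider answering by extension (the per-file-type idea
        // described above), followed by a project-wide default.
        List<EncodingQueryImplementation> providers = List.of(
                p -> p.endsWith(".properties") ? Charset.forName("ISO-8859-1") : null,
                p -> Charset.forName("UTF-8"));
        System.out.println(getEncoding("src/Bundle.properties", providers).name()); // ISO-8859-1
        System.out.println(getEncoding("src/Main.java", providers).name());         // UTF-8
    }
}
```
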
Comment 1 David Konecny 2004-04-30 11:52:54 UTC
I filed this against the projects component only because the API is
currently located in the project/queries module.
Comment 2 David Konecny 2004-04-30 11:54:28 UTC
Created attachment 14639 [details]
implementation
Comment 3 Jesse Glick 2004-04-30 15:43:45 UTC
Adding Radek to CC. This query (as well as SharabilityQuery and
CollocationQuery) would ideally have been part of some FS Ext API if
we had one. But Trung says no to starting one for D, so
projects/queries is the only good place for it except the Filesystems
API itself.
Comment 4 David Konecny 2004-05-04 17:53:39 UTC
Web usecases from offline discussion:

USE CASE 1:
A web project needs to support per-file encodings. For example, a JSP can
have its encoding specified directly in the JSP file, but it can also be
specified in the web descriptor. The web descriptor allows the encoding
to be specified per file, per folder of files, or per group of files,
where a group is defined by a regular expression, etc.

USE CASE 2:
A web project may also contain TXT files which may likewise need different
per-file encodings. These files are then dynamically embedded into
other JSP/HTML documents. Their encodings cannot be stored in the web
descriptor.

Of course, in both these use cases the encoding must be sharable.

----

The current API/SPI supports both use cases. It is just a question of
implementing an SPI provider which supplies per-file encodings in the
case of a web project, and that should be implemented by the web project
together with the appropriate UI.

The SimpleFileEncodingQueryImpl could (perhaps in the future) be extended
to support per-file encoding declarations, either with some
properties-like format (e.g. "encoding:src/org/myapp/Foo.jsp=Win1250")
or with a dedicated XML fragment in project.xml. TBD. That would allow
the Web team to reuse it for per-file encodings which cannot be stored
in the web descriptor. The advantage is that project types would not
have to invent their own storage format.
Comment 5 Jesse Glick 2004-05-04 18:24:21 UTC
Note: for a future per-file syntax for the support class I might
suggest e.g.

encoding.pattern.UTF-8=**/*.txt,**/*.java
encoding.pattern.ISO-8859-1=**/*.html

i.e. like Ant patternsets.
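
Resolving such Ant-patternset-style properties could be sketched like this; the class and method names are illustrative, and java.nio glob matching stands in for Ant's pattern matcher (the two differ in corner cases such as a bare file name at the base directory):

```java
import java.nio.charset.Charset;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of resolving the suggested per-pattern properties. The property
// syntax follows the example above; the code itself is not a proposed API.
public class PatternEncodings {

    private final Map<PathMatcher, Charset> rules = new LinkedHashMap<>();

    // e.g. addRule("UTF-8", "**/*.txt,**/*.java")
    public void addRule(String charsetName, String patterns) {
        Charset cs = Charset.forName(charsetName);
        for (String pattern : patterns.split(",")) {
            rules.put(FileSystems.getDefault().getPathMatcher("glob:" + pattern.trim()), cs);
        }
    }

    // First matching rule wins; null means "no opinion".
    public Charset encodingFor(Path file) {
        for (Map.Entry<PathMatcher, Charset> e : rules.entrySet()) {
            if (e.getKey().matches(file)) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PatternEncodings enc = new PatternEncodings();
        enc.addRule("UTF-8", "**/*.txt,**/*.java");
        enc.addRule("ISO-8859-1", "**/*.html");
        System.out.println(enc.encodingFor(Path.of("src/org/myapp/Foo.java")).name()); // UTF-8
        System.out.println(enc.encodingFor(Path.of("doc/index.html")).name());         // ISO-8859-1
    }
}
```
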
Comment 6 Petr Jiricka 2004-05-04 20:41:05 UTC
Well, I guess I'll be the devil's advocate. This issue claims to be
the solution for 19928; however, the UI you are adding certainly does
not cover all the cases mentioned there. I think it would be useful
to describe how the various use cases of 19928 will be solved and what
role this API will play in the solution.

Next, I didn't see a specification of the UI anywhere. Is this
available? Thanks.
Comment 7 Jesse Glick 2004-05-04 21:00:36 UTC
Certainly this is not an attempt to cover all the use cases mentioned
in issue #19928; just to provide an initial API/SPI usable for
defining a per-project encoding to use when editing *.java and *.txt,
and to tie that to the -encoding switch used for javac for project
types which elect to do so.

The UI is minimal: one extra text field ("File Encoding:") in the
project properties dialog.
Comment 8 David Konecny 2004-05-05 09:06:35 UTC
Re. syntax suggestion: thanks, this looks much better.
Comment 9 Petr Pisl 2004-05-05 15:08:41 UTC
There is another use case, which is not directly connected with webapps.

USE CASE 3:
A user wants to write a description or some documentation for a project,
which will be published on a website. The documentation will be only a
few HTML pages, not a web application. There can be HTML files which
contain only part of the HTML code. These pages are then included in the
resulting HTML by the web server, and these "HTML fragments" should not
contain the meta tag for encoding. The meta tag is currently used for
saving the file in the appropriate encoding. Here is a part of an HTML
file where a different file is included dynamically by the Apache web server:

...
    <head>
        <title>Example</title>
        <meta http-equiv="Content-type" CONTENT="text/html;
charset=iso-8859-2">
    </head>
...  
   <!--#include file="menu.htm" -->
...

The web server tries to parse the HTML file and to find the encoding.
The file is then read with that encoding, and the response has the same
encoding. An included file is read with the same encoding as the
including HTML file. In my example, menu.htm has to be saved in
ISO-8859-2 encoding. NetBeans users are not able to edit and save the
included files in the correct encoding, and a user can include whatever
he wants, for example a Java file with non-English comments.

So the API/SPI should be able to store an encoding per file. My idea is
that each module should take care of its UI, but your API/SPI should
provide support for reading and saving the encoding.
Comment 10 Jesse Glick 2004-05-05 18:42:29 UTC
Of course anyone who wants to forget about encoding issues permanently
can just use UTF-8 everywhere to begin with, as modern operating
systems are starting to do...
Comment 11 Petr Jiricka 2004-05-05 19:23:34 UTC
Unfortunately, for JSPs, 8859_1 is the default.

You should read the encoding section in the JSP spec ;-) Encoding of
web applications (coupled with HTTP encoding issues and browser
implementation issues and specific JSTL and Faces I18N issues) is an
area understood only by the expert group members. Mere mortals like me
are lost. 
Comment 12 Petr Pisl 2004-05-06 09:31:30 UTC
UTF-8 would solve the problems, but we don't live in an ideal
world. For example, Apache 2.0 has ISO-8859-1 as the default encoding
for HTML files, and as Petr J. mentioned, the new JSP specification sets
ISO-8859-1 as the default as well.

We have plenty of requests and complaints from the I18N group and from
Japanese and other non-English-writing users asking to solve this for
the whole IDE. Yes, we have a solution for JSP and HTML, but we should
have a consistent solution for all files in the IDE. I think there
should be support for this from openide/core.
Comment 13 Jesse Glick 2004-05-06 15:14:33 UTC
Yes, I just heard about the ISO-8859-1 problem with the JSP spec from
Petr. Can't imagine why anyone in the 21st century would choose that
as the default encoding for anything. :-(

As far as approving or denying this particular API request, the
question is only this: since we know the API/SPI can handle at least the
basic use cases needed for j2seproject with *.java and *.txt, is it
flexible enough to deal with the following situations if someone
wanted to write support for them, i.e. we would not have to come up
with a totally different infrastructure?

- intrinsic encodings: XML and some HTML files which have their own
internally specified encoding, or .properties files which have the
encoding fixed by the definition of the file format

- JSPs and HTML files which have an encoding specified by some
deployment descriptor

- files which have an encoding manually set by the user within some
project, e.g. Java sources, either on a per-file or per-project or
per-pattern basis

Remember that the editor supports for files with intrinsic encodings
might ignore the FEQ and just load the file according to the stated
encoding, but editor supports for other files should eventually ask
FEQ how to load and save the file. Either way, there should probably
be FEQI impls for files with internally specified encodings, in case
other FEQ clients want to know. Find in Files and also TODO scanning
should at some point use FEQ to know how to create an
InputStreamReader to search the content of the file as Unicode text.

Note that the Subversion VCS is supposed to allow you to specify the
encoding of a particular file as a metadata attribute. Probably some
operating systems permit this too. From that perspective, having the
query pattern is nice because it means we can plug in support for such
filesystems etc. at a particular location in default lookup, so that
you could pick up the file encoding from a native source rather than
the project etc. if that were appropriate.
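
The Find in Files scenario above amounts to the following sketch; `lookupEncoding` is a hypothetical stand-in for the real query call:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch of the Find-in-Files usage described above: ask the query for the
// file's charset, then wrap the byte stream in an InputStreamReader so the
// content is searched as Unicode text. lookupEncoding is a hypothetical
// stand-in for an actual FileEncodingQuery call.
public class EncodedSearch {

    static Charset lookupEncoding(String fileName) {
        // Hypothetical: a real implementation would consult the query SPI.
        return fileName.endsWith(".properties")
                ? StandardCharsets.ISO_8859_1
                : StandardCharsets.UTF_8;
    }

    static boolean contains(InputStream in, String fileName, String needle) throws IOException {
        Charset cs = lookupEncoding(fileName);
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, cs))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.contains(needle)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "caf\u00e9 au lait".getBytes(StandardCharsets.UTF_8);
        System.out.println(contains(new ByteArrayInputStream(utf8), "a.txt", "caf\u00e9")); // true
    }
}
```
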
Comment 14 Jesse Glick 2004-05-06 20:39:03 UTC
Just remembered a potentially significant issue: several file types
(XML, HTML, Java, properties) have support for Unicode escapes of
various kinds (&#xxxx; or &nnnnn; or &ouml; or \uXXXX etc.) which are
insensitive to the file encoding. If you want to support Find in Files
etc. on such files with proper Unicode support, the encoding query
would not suffice; you would need to actually get a Reader from the
file that could transparently interpret such escape sequences. (I
think #19928 mentions some kind of cookie to produce a Reader, though
the cookie solution w/ DataObject is messy for e.g. *.properties
files, and it would not help a project to determine the encoding.) TBD
whether there is any use case for getting a Writer. Also an API to get
a Document has been suggested (performance implications TBD).
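
The escape-sequence problem can be illustrated with a minimal decoder for Java-style \uXXXX escapes; a real "smart Reader" would stream and handle multiple 'u' characters and malformed escapes, which this sketch ignores:

```java
// Sketch of the escape-insensitivity issue above: decode Java-style
// backslash-u escapes so a search sees real Unicode characters. This is an
// illustrative simplification, not a proposed API.
public class EscapeDecoder {

    static String decodeUnicodeEscapes(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 5 < s.length() && s.charAt(i + 1) == 'u') {
                // Four hex digits follow "\u"; convert them to one char.
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeUnicodeEscapes("caf\\u00e9")); // café
    }
}
```
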
Comment 15 David Konecny 2004-05-12 15:52:40 UTC
Re. escaping: is it really related to encoding? Would not something
like this suffice: a global static method somewhere in the API which
returns a smart reader/writer able to decode/encode escapes, which
could consult the FileEncoding query if necessary, etc.? It seems to me
that escaping is independent of encoding, although in practice you need both.

Apart from this, the API/SPI as it stands supports all the use cases that
were provided. The full power of the API might not be used in the first
version. So if there are no objections I will put it in the trunk tomorrow.
Comment 16 Ken Frank 2004-07-26 21:22:24 UTC
Is the implementation of this in nb4, or is it now up to the
various modules to use these features to handle encodings?

(Asking since we would need to test the parts of nb4 affected by the
encoding issues covered by these features.)

ken.frank@sun.com
07/26/2004
Comment 17 Jesse Glick 2004-07-26 23:17:39 UTC
It is not in promo-D.
Comment 18 Jaroslav Tulach 2004-07-27 08:41:45 UTC
When ready to integrate, come back to apireviews@ for review with
final proposal.
Comment 19 Jesse Glick 2004-11-29 21:27:29 UTC
*** Issue 51864 has been marked as a duplicate of this issue. ***
Comment 20 David Konecny 2005-01-24 13:43:35 UTC
Jesse, I'm moving the base project-infrastructure and freeform issues to you.
Comment 21 Jesse Glick 2005-01-24 18:16:07 UTC
Not yet any API to review.
Comment 22 Jesse Glick 2005-04-11 17:13:58 UTC
*** Issue 56597 has been marked as a duplicate of this issue. ***
Comment 23 Jesse Glick 2005-04-11 17:15:55 UTC
Apparently the current state is

1. New impl dependencies on diff module added from other modules.

2. The dependency is on a method in the diff module that computes the encoding
for a given FileObject.

3. The encoding is computed using reflection on various pieces of the system
(java, CES, etc.).
Comment 24 Jesse Glick 2005-12-11 18:01:38 UTC
*** Issue 69803 has been marked as a duplicate of this issue. ***
Comment 25 Jesse Glick 2006-01-17 18:40:06 UTC
*** Issue 71483 has been marked as a duplicate of this issue. ***
Comment 26 gyftaki 2006-01-17 21:23:52 UTC
I am sorry if I am posting in the wrong place, but my request #71483 was
marked as a duplicate of this issue, which honestly has nothing to do with
what I was asking. This issue (#42638) deals with the basic problems of
encoding, but my question had to do with the compiler options, and encoding
was just one example of them. I also looked around the settings to see if an
encoding option was available for the output, and found nothing. Can anyone
please explain how my issue is a duplicate of this?

My issue can be found at http://www.netbeans.org/issues/show_bug.cgi?id=71483

Thanks
Comment 27 Jesse Glick 2006-01-17 23:08:46 UTC
This issue definitely deals with compiler encoding.

There is no issue available for output encoding; all stdio uses the
platform default encoding, I think, and there is not much to do about it.
Comment 28 anchoret 2006-07-24 10:20:19 UTC
A long time has passed since this issue was submitted. Has anything been
done? I thought it was a very important issue, or rather a serious bug;
why has such a basic problem NOT been resolved when NetBeans 5.5 is
already in its beta2 edition? Do NB developers, especially the development
leaders, speak only ENGLISH? Or does Sun want only people who live in the
English-speaking world to use NB? We have placed so much expectation in NB,
but it has disappointed us again and again. No complete i18n support, and
not even plain text file encoding support (editing, debugging), means it
is only a toy for real users.
Comment 29 abs 2006-07-29 16:17:46 UTC
I posted issue 69803.
http://www.netbeans.org/issues/show_bug.cgi?id=69803

But I think there has been no progress on this issue.

NetBeans should provide an encoding setting option for the project and for
every file. Otherwise, non-English users can't resolve encoding problems.
Comment 30 anchoret 2006-08-21 04:03:43 UTC
When will Sun or the NetBeans developers realize that this is a VERY
IMPORTANT and BASIC problem which seriously blocks NB from being widely
used? I found that in NB6 M2 the problem still exists. Even in the
English-speaking world, people don't always speak English, and programmers
need to work with content written in other languages and other encodings,
especially in the webapp field. Then why do we still wait for NetBeans?
Eclipse has already been doing this better and better. Only for the ideal
of "pure Java"? Our patience is limited. What we need is a real tool which
can help us solve problems, including this basic issue, not a toy which
only has many beautiful and complex colors.
Comment 31 Petr Pisl 2006-08-21 17:13:23 UTC
I absolutely agree with you, and I am increasing the priority. I'm a
NetBeans developer, and I have wanted this support at a general level
since 2004, when I started to solve encoding issues in the webapp field.
What exactly in the webapp field do you mean?
Comment 32 _ wadechandler 2006-08-21 19:23:49 UTC
I do not see why, for a start, the encoding has to be determined
automatically rather than simply being a setting at the project and file
level. Most of the time a user will know what encodings should be used in
their projects, and the times they don't, they should be able to contact
someone who does. So if the project had an encoding to be used as the
default for files (for saving and opening, when no encoding is set on the
file), and a specific encoding could also be set for an individual file
inside the project, that would be a good start in my opinion.

Obviously Property files have issues per the documentation:
"When saving properties to a stream or loading them from a stream, the ISO
8859-1 character encoding is used. For characters that cannot be directly
represented in this encoding, Unicode escapes are used; however, only a single
'u' character is allowed in an escape sequence. The native2ascii tool can be
used to convert property files to and from other character encodings."
Possibly some other file types have similar constraints. For those types
of files, native2ascii could be used at build time to convert them. This
could be associated in the UI with a check box letting the user tell the
build system which files to convert on build, and I imagine on debug and
run as well.
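
The quoted Properties behavior is easy to demonstrate: `java.util.Properties.store(OutputStream, ...)` writes ISO 8859-1 and escapes every character outside the printable ASCII range, which is exactly the transformation native2ascii automates for whole files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

// Demonstrates the java.util.Properties behavior quoted above: on store() to
// an OutputStream, characters outside printable ASCII come out as \\uXXXX
// escapes, the same transformation native2ascii applies to whole files.
public class PropertiesEscapeDemo {
    public static void main(String[] args) throws IOException {
        Properties p = new Properties();
        p.setProperty("greeting", "caf\u00e9");  // 'é' is escaped on store
        p.setProperty("kanji", "\u65e5\u672c");  // so are the kanji
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        p.store(out, null);
        System.out.print(out.toString("ISO-8859-1")); // all-ASCII, escaped output
    }
}
```
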

CVS and other VCSes with no unicode/multibyte capability pose other issues
which I do not think should be part of the scope of the IDE or this issue,
other than allowing the user to mark files as binary for check-in purposes;
that is a function of the chosen source control system. The IDE developers
cannot be jacks of all trades, VCS systems themselves are outside the scope
of their task, and if users want a given VCS to support multibyte characters
they have other forums for those requests.
Comment 33 _ wadechandler 2006-08-21 19:30:12 UTC
Actually, in the cases where native2ascii can be used, it could also be
possible to plug in other tools which accept common parameters; the
switches could be included as long as some type of variable
notation/template can be used to supply the correct input and output file
parameters on the command line, so the external tool knows which file to
operate on and where to write output. However, that could be done later;
from the start, only native2ascii need be supported.
Comment 34 anchoret 2006-08-22 03:51:04 UTC
Hi ppisl,
Thanks for your support. By "webapp field" I meant the following: in
desktop application programming the encoding problem is not as important
as in webapps, because on the desktop we can use .properties files,
native2ascii, etc. to solve the encoding of output content, and we can
tolerate that only the OS default encoding is usable (unless we are in a
team where everyone has a different OS encoding). In the webapp field,
however, a web page file must be written in an encoding which can express
its content, e.g. UTF-8, and there are usually other files, e.g. JavaScript
files, that need to be included by the web page but cannot be set to an
encoding other than the OS default.
I am hoping you can start a project to solve these problems. I would like
to put my effort into it.
Comment 35 anchoret 2006-08-22 03:53:50 UTC
I have posted a comment to the community before; please see
http://www.netbeans.org/servlets/ReadMsg?list=nbusers&msgNo=67055
Comment 36 anchoret 2006-08-22 04:23:07 UTC
Hi wadechandler,
Our great NetBeans is not only for starters. In desktop programming, if we
don't work in a team whose members each have their own OS default encoding,
we can use native2ascii to solve the problem; in fact, we should do so. But
in web application development, as I said to ppisl above, things are not so
simple.

.properties files don't need an encoding setting, because according to the
Java specification ISO-8859-1 plus Unicode escapes is used for them,
although in the coming Mustang non-ISO-8859-1 encodings will be supported.

With team programming, where CVS etc. are usually used, the problem is more
serious. We know that in Java files and many plain text files there are no
marker bytes, such as FF FE, to tell what encoding the files are in. So
there must be some mechanism, like cookies, to record the encoding, and
this is the responsibility of the author (equivalently, the IDE), because
only the author (or the IDE) knows what encoding the files are in. It is
not the responsibility of the CVS system, because when we create these
files we may not submit them to CVS at all. And when we do use CVS, how can
CVS know these encodings if there is no record of them anywhere?


Comment 37 _ wadechandler 2006-08-22 07:56:07 UTC
On web files: JSP, HTML, and XHTML files can declare an encoding, so you
would want to save in the encoding you say you are using. Depending on the
target web browser, JS files can use encodings as well, depending on how
they are included in your pages, and the JS files themselves use the
charset they are declared with. What other issues would be specific to web
applications?

On properties files: if you wanted to use your local properties files
natively and type as you normally would, you would have to use native2ascii
and have it be either part of the build process or left to the user to
handle; otherwise you could escape everything manually as needed, but that
would be a pain. With the upcoming property file changes this wouldn't even
matter, but before those become widely adopted you wouldn't want to have to
use escapes for every single character in a properties file, which brings
you back to native2ascii, and you would somehow want build-time conversion
(if you wanted to be as productive as possible).

On CVS: CVS itself doesn't support multi-byte or Unicode characters in
operations such as diff, AFAIK, as it has no mechanism to take the encoding
and pass it to other diff applications, or to easily know the encoding of
certain file types; it just takes what it gets and gives back what it got.
So to store these file types you will need to mark them as binary in CVS.
Until CVS allows one to pass an encoding parameter, use it for the file,
and translate encodings so that diffing still works when the encoding
changes or the data is multi-byte, using CVS for anything other than binary
storage of multibyte-encoded data will simply not work, unless native2ascii
has been run on the code before it is stored in CVS; but then diffs would
be a nightmare, with all the \u### escapes making them horrible to read.

Also, if the "cookie" files were not stored in CVS, then it would certainly
be a huge waste of time for the person on the other side, checking out an
application/project, to have to manually set up all the file encodings.
That is where Java's ability to understand multiple encodings comes in
handy: the encodings could be stored in the "cookie" files and shared.

Encoded file names are another issue entirely which needs to be discussed.
They could be stored in the project encoding information, which gets loaded
into a HashMap that can be searched based on the values given to it without
a known encoding. So an encoding record could look something like:

fileenc0=<UTF-8 file name charset id>:<file name byte length, as a UTF-8
integer>:<file name in that encoding, with the bytes stored as UTF-8 hex if
the encoding is anything other than UTF-8 or ASCII (easier to read if
debugging is ever needed)>:<UTF-8 file charset id>

A simple example:

fileenc0=UTF-8:13:FileName.java:UTF-8

Anyway, it might work something like that, or be stored in XML, obviously
with directory entries needing to be stored there somewhere.

So what I'm getting at is being able to set the encoding for the project
and for individual files in extra files or project information (call it a
cookie), and then handling encodings from that perspective, without
attempting to guess the encoding of any files, as a start to get this
going, while understanding that there are issues with source control
systems which can't be completely resolved without modifications to those
systems, which are outside the NB team's domain. Obviously the XML and
HTML file types need the first line, which declares their encoding, to be
in a UTF encoding.
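
For concreteness, the speculative fileenc0 record above could be parsed as follows. This is wadechandler's hypothetical format, not an actual NetBeans file, and the sketch assumes the simple ASCII case shown, with no colons in the file name.

```java
// Parses the speculative "fileenc0=..." record format sketched above
// (a commenter's proposal, not an actual NetBeans format). Assumes the
// simple case from the comment: charset:length:name:charset, ASCII name.
public class FileEncRecord {
    final String nameCharset;
    final int nameByteLength;
    final String fileName;
    final String contentCharset;

    FileEncRecord(String value) {
        // Split into exactly four fields: name charset, byte length,
        // file name, content charset.
        String[] parts = value.split(":", 4);
        this.nameCharset = parts[0];
        this.nameByteLength = Integer.parseInt(parts[1]);
        this.fileName = parts[2];
        this.contentCharset = parts[3];
    }

    public static void main(String[] args) {
        FileEncRecord r = new FileEncRecord("UTF-8:13:FileName.java:UTF-8");
        System.out.println(r.fileName + " -> " + r.contentCharset); // FileName.java -> UTF-8
    }
}
```
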
Comment 38 _ wadechandler 2006-08-22 07:56:33 UTC
Comment 39 Marian Petras 2006-08-22 13:38:08 UTC
This issue blocks two P2 defects, so I am changing it from "P1 ENHANCEMENT"
to "P2 DEFECT". The two defects are:
    #55810 - I18N - Wrong character encoding during compilation
    #79337 - I18N - Find in Projects ignores Java source file encoding
Comment 40 Ken Frank 2006-08-22 16:39:57 UTC
Marian, since this issue is now a defect and covers both per-file-type
encoding and project-wide encoding, should it be split into different
issues? I can do that if given enough info and wording for each.

ken.frank@sun.com
Comment 41 Jesse Glick 2006-08-22 21:12:43 UTC
Please don't split this up, it will just create more confusion. When there is
someone available to work on it, that person will investigate what is needed in
detail and come up with a proposal. Right now this issue is not scheduled.
Comment 42 Ken Frank 2006-08-22 21:45:05 UTC
OK, won't split this up. 

To the dev team (not asking just Jesse here): what is the process to get
this discussed and scheduled internally?

ken.frank@sun.com
Comment 43 anchoret 2006-08-23 06:50:35 UTC
Hi wadechandler,
In fact, your opinion is not so different from mine. There was a small
difference about CVS at first, but now we are somewhat consistent.
About web pages: a JS file is only one example; there are many other
examples in this issue and in 32028, 19928, etc. Of course I know that
HTML, JSP, and XML files have mechanisms for recognizing the encoding, and
I also know the <script charset=""> tag. But why did I adopt UTF-8 as the
encoding of those HTML and JSP files? Because some characters or ideographs
can't be expressed in the OS default encoding, and the same goes for
JavaScript files; e.g. a web page about foreign language learning.
Certainly, we can write or paste all the JavaScript code inside the JSP
files, but that is ugly. It's a little awkward that NetBeans, such a
powerful and great IDE, can't deal with multiple encodings of pure plain
text files.
About .properties files: no comments; I have agreed with you since my last
comment.
About CVS: so now you acknowledge that some setting or "cookie" should be
introduced. I agree with you that the setting files should be saved in the
project directories, and this is exactly a shortcoming of NB: it saves the
Java files' encoding info in the userdir, from where it can't be shared
through CVS. Please see
http://www.netbeans.org/servlets/ReadMsg?list=nbusers&msgNo=67055
Comment 44 abs 2006-08-23 15:09:27 UTC
Japanese users have discussed this issue, and we conclude that the
following two points are necessary.

1. project encoding setting
2. encoding setting for each file

1. Project encoding setting
 NetBeans should have a project encoding setting like Eclipse. Each module
should use this encoding. If we set a project encoding, the OS native
encoding should be ignored.
 Compiling source files follows this setting. For example, all .java
files in a project are compiled with the specified encoding property.
 The encoding setting should also be shared through the project information.

2. Encoding setting for each file
 We can set an encoding for each file. If we set an encoding on a file,
this setting is given priority over the project encoding setting.
 The following cases are excluded from this rule, because these files have
their own encoding setting:

1. jsp files: should use pageEncoding
2. xml files: should use the XML declaration's encoding
3. html files: should use the meta http-equiv encoding

(see also
http://www.netbeans.org/issues/show_bug.cgi?id=55810&x=30&y=9
http://www.netbeans.org/issues/show_bug.cgi?id=66323&x=13&y=8
)
Comment 45 Masaki Katakai 2006-08-24 03:55:52 UTC
Does anyone know how Eclipse handles these encodings?

I think the last comment (from abs) is a good summary and
exactly the Eclipse way. Having a "File Encoding" setting
per project would be the reasonable and natural way for
developers.

Please correct me if I have a wrong understanding of Eclipse.

- Eclipse has a file encoding setting (for plain text,
  including .java and .js) per project
- by default, the encoding is set to the OS encoding
- this setting can be overridden per individual file
- files are stored in that encoding
- for files that can carry encoding information in their
  content (e.g. html, jsp), the specified encoding is used
- the compiler uses the encoding when compiling

For CVS: because the project holds the file encoding
settings, the settings can be shared among group members who
check out the project in different places.

- on checkout, the IDE can simply find the file encoding
  from the project setting

Eclipse does not care about the "File Name" encoding;
it just uses the OS platform encoding, I think. That should
be OK; developers take a risk when they use such native
characters in file names.
Comment 46 Tomas Zezula 2006-08-25 12:19:32 UTC
The knowledge of a java file's encoding is required by Jackpot when it commits
changes to files. An implementation of javac's JavaFileManager SPI also
requires the encoding of a java file.
Comment 47 lovetide 2006-11-22 07:11:01 UTC
I have an idea: the NetBeans IDE could add two simple and convenient dialogs when 
processing text files:
"Reload As Different Encoding" and "Save As Different Encoding". With these two 
dialogs, developers could manually control the encoding of every text file.

A Windows application named EditPlus is a good text editor which can deal with 
files in different encodings. Even the Notepad in Microsoft Windows can "Save 
As" in a different encoding, although it can't "Reload As".


Note: for HTML or JSP files, "Reload As Different Encoding"/"Save As Different 
Encoding" is also needed, because the actual encoding of an HTML/JSP file may 
differ from the encoding specified in the <meta http-equiv="Content-Type" 
content="text/html; charset=XXXXX"/> tag for various reasons.
Comment 48 Ken Frank 2006-11-28 16:26:41 UTC
To developers on the cc list of this issue:

1. adding the plan60 status whiteboard keyword so this can get on the nb6 features
plan list, at the suggestion of management.

Is this sufficient action to get this onto the planning process for nb6.0 ?

2. referring here to related internal document 
http://jupiter.czech.sun.com/wiki/view/Nbplan/NbFeature1091

that refers to this item and others on same topic --

--should the info in that document be entered as separate issue(s)/RFEs
in IssueZilla, or does it cover what is being discussed in this issue ?

3. in addition to the comments on this issue and the votes for it, there are still
mailing list postings in which users say that encoding support for text-based
files and for projects themselves, and the other encoding-related topics in this
issue or in NbFeature1091, is needed.

4. can the text of the NbFeature1091 document be placed as an attachment to this
issue so it is visible to the community ?

5. to community users - please add additional comments or reasons about this
issue and say whether having it in nb6 would be helpful. If you have not voted
for the issue yet, and you want to do so, please do.

ken.frank@sun.com

Comment 49 vigor 2007-01-09 11:02:52 UTC
make a project configuration setting - "files encoding" - and use this encoding
when opening and saving files in the current project
Comment 50 rmatous 2007-01-09 16:02:37 UTC
*** Issue 74766 has been marked as a duplicate of this issue. ***
Comment 51 tprochazka 2007-01-11 14:35:53 UTC
My idea is one default encoding for everything (default: UTF-8),
except special files such as .properties (which must be ISO-8859-1)
or any other files, such as JSP and HTML, which specify an encoding in their own 
content.

It is important to store this encoding (especially for source code) in the project 
files and use it when running the compile or javadoc task.

An additional feature would be the possibility to set an explicit encoding for some 
files in their properties.
Comment 52 Ken Frank 2007-01-11 15:25:55 UTC
WRT previous comment -
My idea is one default Encoding for everything (default: UTF-8)
except special files as .property (which must be ISO-8859-1)

--- just checking: will this default encoding apply to text and other text-based
files (besides properties) which do not yet have encoding handling, and will it
also allow these text-based files to set a different encoding in their own
properties ?

(besides having a project-wide settable encoding)

ken.frank@sun.com
Comment 53 Tomas Zezula 2007-01-12 08:04:14 UTC
I would like to make it a bit more extensible. There will be a default encoding
which can be overridden by a project encoding (all files in the project have the
same encoding). The encoding query will provide an SPI which can be used by other
modules to provide an explicit encoding (e.g. the HTML module may provide an
encoding based on the encoding attribute).
Comment 54 Tomas Zezula 2007-02-05 12:31:40 UTC
The encoding support will define the following new API and SPI.

API: Provides a static method to find the encoding of a file. It delegates to the
SPI (FileEncodingQueryImplementation).

public class FileEncodingQuery {

    /**
     * Returns the encoding of the given file.
     * @param file the file to find the encoding for
     * @return the encoding; never returns null
     */
    public static String getEncoding (o.o.fs.FileObject file);
}


SPI: Implementations of FileEncodingQueryImplementation are registered in the global
lookup.

public interface FileEncodingQueryImplementation {

    /**
     * Returns the encoding of the given file.
     * @param file the file to find the encoding for
     * @return the encoding, or null when the encoding of the file is not known
     */
    public static String getEncoding (o.o.fs.FileObject file);
}

The encoding support will register three standard FileEncodingQueryImplementations,
sorted by priority (listed from highest to lowest):

MIMELookup - tries to find a FileEncodingQueryImplementation registered in the
MIMELookup to get an encoding. Supports JSP and HTML, where the encoding is
stored in the file, and also properties files, for which the encoding is
given by the specification.

Project - delegates to the project's lookup to find the encoding. Supports java,
text and other files which do not contain an implicit encoding. The value
returned by this implementation is also passed to the java compiler.

Default - called as a last resort; provides the default encoding, i.e. the
encoding in which the IDE was started.

In addition to these implementations, a module can also define its own
FileEncodingQueryImplementation which is called before or after any of these
queries.

The implementation for the J2SE project type will not allow assigning an encoding
to individual java files, even though the API allows it. Assigning encodings to
individual java files would require several compilation units (one per encoding),
which would slow down the compilation.
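To make the proposed priority chain concrete, here is a minimal standalone sketch of the query pattern (ask each implementation in priority order, first non-null answer wins, guaranteed non-null fallback). All names here are illustrative stand-ins, not the real NetBeans classes, and it returns java.nio.charset.Charset rather than String:

```java
import java.nio.charset.Charset;
import java.util.List;

// Illustrative stand-in for the proposed SPI: null means "I don't know".
interface EncodingQueryImpl {
    Charset getEncoding(String fileName);
}

public class EncodingQueryDemo {
    // Ask each registered implementation in priority order; the first
    // non-null answer wins, with the platform default as a guaranteed
    // non-null fallback (mirroring the "Default" implementation above).
    static Charset getEncoding(List<EncodingQueryImpl> impls, String fileName) {
        for (EncodingQueryImpl impl : impls) {
            Charset c = impl.getEncoding(fileName);
            if (c != null) {
                return c;
            }
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // Highest priority first: a mime-type-like rule for .properties,
        // then a project-wide encoding.
        List<EncodingQueryImpl> impls = List.of(
                f -> f.endsWith(".properties") ? Charset.forName("ISO-8859-1") : null,
                f -> Charset.forName("UTF-8"));
        System.out.println(getEncoding(impls, "Bundle.properties")); // ISO-8859-1
        System.out.println(getEncoding(impls, "Main.java"));         // UTF-8
    }
}
```

The per-module extension point then amounts to registering one more EncodingQueryImpl at the appropriate position in the list.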
Comment 55 Petr Jiricka 2007-02-05 16:40:56 UTC
Hi, it's great to see this moving forward, the proposal sounds reasonable to me.
It would be good to put the Java project implementation into java/project, so
all Java-based project types can easily use this, not only J2SE project.

BTW, I guess the method in FileEncodingQueryImplementation should not be static.
Comment 56 Jesse Glick 2007-02-05 19:20:56 UTC
Generally looks OK to me. Some comments (is this API_REVIEW?):


[JG01] MIMELookup is one way to support per-file-type lookup. But can consider
(additionally or instead of MIMELookup) looking in DataObject.lookup, i.e. let
the data loader define it.


[JG02] "Support [...] JSP, HTML where the encoding is stored in the file" - same
for XML.

"also supports properties file for which the encoding is given by specification"
- may be more complicated than that now that in JDK 6 there is support for
loading .properties files in other encodings... with no way of indicating the
encoding in the file itself, unfortunately. Need not be dealt with in this
issue, just FYI.


[JG03] Is there any use case for finding the encoding of a java.io.File (or
java.net.URI/URL) without having to first create the FileObject? I am thinking
of full-text search, for which loading FileObject's on a large file tree is a
potentially serious performance hit.


[JG04] Would it make sense to return a java.nio.charset.Charset rather than a
String? This at least leaves open the possibility of producing a custom
encoder/decoder pair to handle special file formats, e.g. \uXXXX escape
sequences. It does not put much burden on implementors because you can easily
call Charset.forName to convert a String name. Charset.defaultCharset can be
used as the fallback value rather than looking up a system property.


[JG05] Rather than putting the fallback implementation in lookup, it can be
hardcoded into the API class, so that it can guarantee a non-null result
regardless of its environment.


To Petr's "It would be good to put the Java project implementation into
java/project, so all Java-based project types can easily use this, not only J2SE
project" - I suspect the query implementation would be too trivial to share.
Just load a single property from the project's evaluator and return it.
Comment 57 Petr Pisl 2007-02-06 09:32:42 UTC
[PP01] "Supports the JSP, HTML where the encoding is stored in the file ... "

In the jsp case this is not true. If you want to find the right encoding for a
jsp file, you need to parse the deployment descriptor, where a charset can be
defined for a particular jsp file or for a group of files. The exact algorithm is
described in the JSP specification. Html files are not so simple either. For
example, html fragments don't have a defined encoding and use the same encoding
as the html file which includes them. So the implementation of
FileEncodingQuery can be time-consuming in some cases.

[PP02] Will there be any UI change with this? I'm thinking about "Character Set"
property support, which would be accessible through the Property Sheet.
Comment 58 Tomas Zezula 2007-02-06 09:35:05 UTC
[JG01] I agree that using the DataObject's lookup may be better than using
MIMELookup; I will change it. 

[JG02] When the API is in CVS I will create a task for the JSP, XML, HTML
and properties DataObjects to support this API.

[JG03] It would be good to provide an encoding even for a URL, but the problem is
with the FEQI registered for a given mime type in the DataLoader's lookup. I
would need to convert the URL to a DataObject to find the query. Or is there any
other solution?

[JG04] Charset rather than String: I agree, I will change it.

[JG05] Default impl as part of FEQ: it is already done this way.
Comment 59 Tomas Zezula 2007-02-06 09:47:46 UTC
I know about the problem with JSP, and this is the reason why the project's FEQI is
called before the mime type FEQI. The API allows you to register an implementation
(probably in the project) which uses the deployment descriptor and answers with
the correct value; the API supports the JSP case if such an implementation is
provided. The performance problem can be solved by caching.

The UI part depends on the project. In the J2SEProject I want to add an ant
property project.encoding and extend (probably the Sources tab) with a text field
to allow the user to change it. By default project.encoding will be set to the
default encoding. I don't want to allow per-file encoding settings for java files,
since that requires separate compilation units (slows down compilation, makes
build-impl.xml more complicated), but it may be changed in the future; the API
allows it. I understand that in the Web project you may need some UI to change
the encoding of JSPs stored in the deployment descriptor. For the J2SEProject the
text field in the customizer should be enough.
Comment 60 tprochazka 2007-02-06 09:47:57 UTC
I vote primarily for a default encoding option in the NetBeans settings.
When a new project is created, NB should use this setting for it. 

The default encoding must be specified in the project files (so a NetBeans
installation with a different default encoding will still use the right one). 

The encoding will be used for all Java sources and other files which don't have 
their own encoding specification (XML, HTML, JSP). '.properties' files can have 
the same encoding too. 

NetBeans should configure the ant tasks to accept the default encoding and should
use it when compiling source code and creating javadoc, and should also convert
.properties files to ISO 8859-1.
Comment 61 Tomas Zezula 2007-02-06 09:59:54 UTC
tprochazka: You have described what the prototype implementation for the
J2SEProject does, and what I tried to describe in my comments. :-)
The only difference is that there is no IDE setting for a global encoding (or I
didn't find one); I am using System.getProperty ("file.encoding"). But it may be
added if the HIE agrees.
Comment 62 tprochazka 2007-02-06 14:40:29 UTC
This is good.

But I think that using System.getProperty ("file.encoding") is not a good idea,
because for Central European languages it is windows-1250 on Windows and
ISO-8859-2 on Linux, and many other encodings exist. Maybe it is much better to
set the default encoding to UTF-8, with an option "Use system default encoding"
in the settings. When most people use one encoding, the world will be a better
place.

If NB uses System.getProperty ("file.encoding"), many people will have to change
the setting for every new project.
Comment 63 Tomas Zezula 2007-02-06 15:28:12 UTC
Both my Solaris and Linux (SuSE) are UTF-8, but you are right that the Czech
version of Windows has cp1250 by default, maybe to keep compatibility with DOS
:-). UTF-8 as the default sounds reasonable.
Comment 64 Jesse Glick 2007-02-06 17:50:19 UTC
JG03 (query on URL) - maybe this is impractical (and anyway it could be
compatibly added as an option later). I was more wondering if you knew of
existing use cases for this, such as search.


More on JG04 (String vs. Charset) - take a look at issue #94676. The problem
there is that you want to create a new .xml file (though this would also apply
to .html I think) and write some arbitrary String content to it. You need an
encoding *before* you write the content. Calling FileEncodingQuery on the new
.xml file cannot return a meaningful String result since it is empty. I can
think of the following approaches:

1. In this particular case, choose the same encoding for the output file as for
the input template, i.e. do not call FEQ at all for the output. Would probably
work fine for the XML case, but may fail for e.g. Java sources whose encoding is
to be controlled by the destination project and may well differ from the
encoding used by templates bundled in a layer file.

2. Have FEQ return a Charset. Then XMLDataObject can implement the query to
ignore the file argument but return a special subclass of Charset whose Decoder
"sniffs" the XML header, and whose Encoder buffers output until it has seen the
XML header and then encodes all further characters according to the declaration.
Some work to implement, but optimal behavior. (Note that Swing's HTML editor kit
/ renderer does something very similar.)

3. Write the XML file once in UTF-8. Then ask FEQ for its encoding (which would
sniff the header) and overwrite it in that encoding if different.
Straightforward, though inefficient if other encodings are used frequently.
Might fail for some UTF-16 encodings, though I think these are rarely used anyway.
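A rough standalone sketch of the header-sniffing idea behind options 2 and 3 (a hypothetical helper, not the proposed API): look for an explicit encoding declaration in the first bytes of an XML file and fall back to UTF-8, the XML default. A real implementation would live inside a custom CharsetDecoder and would also handle byte order marks and the UTF-16 families; this only models the happy path:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlEncodingSniffer {
    private static final Pattern DECL =
            Pattern.compile("<\\?xml[^>]*encoding=[\"']([^\"']+)[\"']");

    public static Charset sniff(byte[] prefix) {
        // The declaration itself is ASCII-compatible, so decoding the
        // prefix as ISO-8859-1 is safe for the purpose of sniffing.
        String head = new String(prefix, StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(head);
        return m.find() ? Charset.forName(m.group(1)) : StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        byte[] decl = "<?xml version=\"1.0\" encoding=\"ISO-8859-2\"?><r/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(sniff(decl)); // ISO-8859-2
        byte[] bare = "<?xml version=\"1.0\"?><r/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(sniff(bare)); // UTF-8
    }
}
```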


[JG06] +1 on UTF-8 as a default explicit encoding for new projects; though you
could also have an "invisible" Preferences setting which would be updated
whenever you explicitly set a different encoding for an existing project, which
would on average give people the encoding they want without any added GUI.
Probably ${file.encoding} should remain as the fallback encoding in FEQ, as FEQ
is an API which would be as neutral as possible.


[JG07] BTW where do you propose putting FEQ? projects/queries? openide/fs?
Elsewhere?
Comment 65 Petr Jiricka 2007-02-06 19:26:37 UTC
PJ01 - Regarding Jesse's comment "More on JG04 (String vs. Charset)": a similar
situation will arise when saving a file in the editor. In this case, the method
saveFromKitToStream in EditorSupport will need to take the encoding into account,
and it will need to consider the String in the editor, not the file on the disk
(as the encodings may differ). This may be possible with a Charset, but
not with a String. See also the implementations of EditorSupport for JSP, HTML
and XML, e.g.:
web/core/src/org/netbeans/modules/web/core/jsploader/BaseJspEditorSupport.java
Our current approach is to have a parameter to the encoding detection method
that decides whether the encoding should be taken from the file or from the editor.

BTW, I expect that the base EditorSupport class will be changed to take the
encoding into account, rather than forcing the subclasses to take care of it,
right?
Comment 66 Tomas Zezula 2007-02-07 08:57:06 UTC
[JG03]: One more component which may benefit from a method taking a File|URL is
the RepositoryUpdater when it does its initial scan.

[JG04] The API was changed to return a Charset, so it allows any of the
suggested solutions. For java files (J2SEProject), New From Template can
call getEncoding(template), read the content, create a new empty java file, call
getEncoding on it and use that encoding to store the content in the project. For
XML it will be harder, since the content may contain an explicit encoding; the
xml module has to do what you described in point 2. It shouldn't be so hard,
since it already does similar encoding detection.

[JG06] OK, the default project encoding will be "UTF-8". When the user changes
the encoding in a project, the new encoding becomes the default.

[JG07] Currently it's in the project/queries.
Comment 67 Tomas Zezula 2007-02-07 09:09:30 UTC
PJ01: This also covers the JSP and XML cases, but it does not hold for Java and
other files which don't contain a stored encoding. It can be solved using a
custom-implemented encoding, as Jesse suggested in [JG04-2]. The change for your
code is that instead of passing a parameter to the encoding method you create a
"smart" Charset with it. I hope that the EditorSupport will be rewritten, but I
don't know much about this class; I will file an issue for it when the API is
available.
Comment 68 Tomas Zezula 2007-02-07 10:45:57 UTC
[JG06] Storing the last used encoding in the preferences: this would require
defining a path in the preferences as part of the API, which is not nice.
Another solution is to add a Charset getDefaultEncoding() method returning the
encoding which should be used by a project generator; this is not a problem. But
a setter, setDefaultEncoding(Charset encoding), will also be needed, since the
project needs to set a new default encoding when it is changed in the project
properties.
Comment 69 tprochazka 2007-02-07 19:06:11 UTC
[JG06] Storing the last used encoding in the preferences:
Yes, this is a much better solution. A setting for the default encoding in
Tools > Options would be better for me, but solution [JG06] is also great.
Comment 70 Tomas Zezula 2007-02-09 14:48:20 UTC
Created attachment 38298 [details]
Diff file with the new API/SPI
Comment 71 Tomas Zezula 2007-02-09 14:50:12 UTC
Created attachment 38299 [details]
Default implementations (delegate to DataObject's and Project's lookup)
Comment 72 Tomas Zezula 2007-02-09 14:51:15 UTC
Created attachment 38300 [details]
Patch of j2seproject
Comment 73 Tomas Zezula 2007-02-09 14:53:07 UTC
Created attachment 38301 [details]
Internal javac IO layer
Comment 74 Tomas Zezula 2007-02-09 15:01:17 UTC
The project_queries.diff contains the actual API and SPI.

The projectapi.diff contains the default implementations of
FileEncodingQueryImplementation (the first delegates into DataObject's lookup,
the second delegates into the Project's lookup).

j2seproject.diff contains the implementation of FileEncodingQueryImplementation
registered in the project's lookup. This implementation provides the encoding for
all files owned by the j2seproject. It also contains a test of the query. The
project generator was changed to generate projects with a default encoding. The
project customizer (Sources) was extended to allow the user to change the
encoding. The build-impl.xsl was changed to generate the javac task with an
encoding parameter.

The java_source.diff contains the JavaFileObject using the encoding for
reading and storing java files.

Any comments are welcome.
Comment 75 Jesse Glick 2007-02-09 18:03:10 UTC
[JG08] This code:

String _encoding = System.getProperty("file.encoding");     //NOI18N
assert _encoding != null;        
encoding = Charset.forName(_encoding);
assert encoding != null;
return encoding;

should be replaced with

return Charset.defaultCharset();


[JG09] Why doesn't j2seproject/resources/build-impl.xsl use ${file.encoding}
directly in the <javac> def, rather than introducing another property
project.encoding?


[JG10] CustomizerSources should use a combo box for the encoding, not a text
field. Perhaps call Charset.availableCharsets().values() and render using
Charset.displayName(). It might in fact be useful to define a support SPI in
projectuiapi which would provide a ComboBoxModel<Charset> and a
ListCellRenderer<Charset>, since I expect other project types to need the same
thing.
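JG10's combo-box suggestion can be sketched in a few lines (the class and method names here are invented for illustration; the real support SPI in projectuiapi, if added, may look different):

```java
import java.nio.charset.Charset;
import javax.swing.DefaultComboBoxModel;

public class CharsetComboDemo {
    // Fill a Swing model with all charsets the platform supports and
    // preselect the platform default. A ListCellRenderer would call
    // Charset.displayName() on each element for the visible label.
    public static DefaultComboBoxModel<Charset> model() {
        DefaultComboBoxModel<Charset> m = new DefaultComboBoxModel<>();
        for (Charset c : Charset.availableCharsets().values()) {
            m.addElement(c);
        }
        m.setSelectedItem(Charset.defaultCharset());
        return m;
    }
}
```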


[JG11] "//When using MockServices.setServices() the DummyXMLEncodingImpl isn't
registered." - explanation?
Comment 76 Tomas Zezula 2007-02-12 12:22:04 UTC
[JG08] Fixed.

[JG09] The property project.encoding is needed since file.encoding cannot be
defined in project.properties: it is defined by the JVM, and ant's property
task doesn't allow overwriting it. So file.encoding is used as a default when
project.encoding is not set (old projects).

[JG10] Fixed, I've changed the text field into a combo box. I am not sure about
the supporting API; the model + renderer is less than 50 lines, but it's
possible to add such an API later.

[JG11] The original problem was that I had used TestUtil.makeScratchDir, which
overrides the Lookup; I'm not using TestUtil anymore. But the bigger problem is
the order of the services registered by MockServices: a service registered
with MockServices is the last one in the Lookup. The
MockServices.ServiceClassLoader creates a service registration file for the
registered service and places it in front of the existing ones, but the other
services have a position attribute, since the order is important for them, and
the created registration has no position, so it is placed at the end.
Comment 77 Tomas Zezula 2007-02-12 14:58:29 UTC
Created attachment 38376 [details]
Diff of project/queries arch.xml
Comment 78 Tomas Zezula 2007-02-12 15:20:07 UTC
Summary for API review:

Description:
An API for finding the encoding of a file. The lack of such an API causes the
internal java parser to use the wrong encoding when reading files; the same holds
for the java compiler during the build process. The user is also not able to set
an encoding at the project level (#55810). Several modules have to be updated
to use this query. First, project types should provide a
FileEncodingQueryImplementation in the project lookup, otherwise the default
encoding is used (the same behaviour as NetBeans 5.x). When a
DataObject requires special encoding handling, like HTML with the encoding
contained in the file, the DataObject should provide the
FileEncodingQueryImplementation in its lookup. The datasystems
(DataObject.createFromTemplate, DataObject.copy) should use the query. The
internal parser as well as the j2seproject are already fixed.


API changes:
The new FileEncodingQuery API and FileEncodingQueryImplementation SPI were
added; see the attached diff files.

API stability: Stable in 6.0

Architecture overview: See an attached diff file.
Comment 79 tprochazka 2007-02-18 13:19:51 UTC
I tested new API and I have some questions.

[JG12] Why does projectapi.diff create this service file: src/META-INF/services/
org.netbeans.spi.project.FileEncodingQueryImplementation, when 
FileEncodingQueryImplementation is in the org.netbeans.spi.queries package?

[JG12] When I save the project properties I get this:

java.lang.IllegalArgumentException: Null charset name
	at java.nio.charset.Charset.lookup(Charset.java:430)
	at java.nio.charset.Charset.forName(Charset.java:503)
	at 
org.netbeans.modules.java.j2seproject.ui.customizer.J2SEProjectProperties.storeProperties(J2SEProjectProperties.java:445)

I fixed it with "if (value!=null)
FileEncodingQuery.setDefaultEncoding(Charset.forName(value));"
at J2SEProjectProperties.java:445. 

[JG13] And the editor still loads and saves in the system default encoding. This 
is not implemented at the moment, is it?

I'm sorry if these are stupid questions, but this is the first time I have tried 
to explore the NetBeans API and compile it.
Comment 80 Jesse Glick 2007-02-19 19:51:21 UTC
I guess the last comments should be TP01, TP02, TP03. :-)
Comment 81 Tomas Zezula 2007-02-26 13:40:21 UTC
Sorry for the late response, I was on vacation last week.
TP01 - it's an old diff, the correct file is:
projectapi/src/META-INF/services/org.netbeans.spi.queries.FileEncodingQueryImplementation.
I will attach the current projects diff.

TP02 - the rewrite of the JavaEditorSupport is not yet done; it is an issue
depending on this one.

TP03 - already fixed, in the same way as you did. I will attach the current
j2seproject diff.
Comment 82 Tomas Zezula 2007-02-26 13:42:56 UTC
Created attachment 38905 [details]
Fixed project's diff, see TP01
Comment 83 Tomas Zezula 2007-02-26 13:43:48 UTC
Created attachment 38906 [details]
Fixed j2seproject's diff, see TP03
Comment 84 Jaroslav Tulach 2007-02-27 17:25:13 UTC
Y01 As far as I can judge: for the purposes of issue 94676 it is enough to get a 
Charset for a FileObject and create an InputStreamReader (and possibly also a 
writer) using the Charset as a parameter. I have nothing against it.
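Y01 reduced to a minimal in-memory sketch (a hypothetical helper, assuming the query has already produced a Charset): reading and writing then amount to plain java.io readers/writers parameterized by that Charset. In-memory streams stand in for the FileObject's streams here:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class CharsetIoDemo {
    // Encode text to bytes the way an OutputStreamWriter over the
    // file's output stream would, given the queried Charset.
    public static byte[] write(String text, Charset cs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, cs)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen in-memory
        }
        return out.toByteArray();
    }

    // Decode bytes back, as an InputStreamReader over the file's
    // input stream would.
    public static String read(byte[] bytes, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), cs)) {
            int ch;
            while ((ch = r.read()) != -1) {
                sb.append((char) ch);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen in-memory
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset cs = Charset.forName("ISO-8859-2"); // e.g. from the query
        System.out.println(read(write("Čau", cs), cs)); // Čau
    }
}
```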

Y02 I do not like the public static setter method in the API, but I can live 
with it if you document the use of NbPreferences in arch <api group="prefs"/> 
and write a test for it ;-)
Comment 85 Tomas Zezula 2007-02-28 08:32:42 UTC
[Y02] I don't like the setter in the API either, but the other option was to let
clients access the preference node directly, and the path to the preference node
would then become API, which is even worse. The usage of the preferences is
already documented in the attached arch.diff. The test will be added.

Comment 86 Jaroslav Tulach 2007-02-28 17:13:44 UTC
Notes from the meeting (Tomáš Zezula, Marek Fukala, Jesse Glick, Jaroslav 
Tulach):

TZ: should the api be in project/queries or be in filesystems?
JG: no problem being in project/queries
JT: DataEditorSupport in openide/loaders would need new dependency on 
project/queries
JG: Let's keep it where it is.

JT: Tomáš will fix DataEditorSupport to use the new query

JG: Are there many subclasses that override load/store from kit to stream?
JG: Will then the query open the stream twice?
JG: XML and HTML will first care about encoding specified in the file, and 
only if it is missing fallback to the query.
JG: We cannot prevent opening the stream twice.

JT: DefaultData...Impl should be probably in openide/loaders
JG: A protected method in DataEditorSupport to query the "default" method not 
counting in the advice provided by DefaultData...Impl.
JG: Preferred way is to not override the saveKit at all, instead provide good 
Charset with custom encoder/decoder

JT: Use CachedFileObject
JG: Possible problems with delegation, especially FileOwnerQuery
JG: If we put the CachedFileObject impl into DataEditorSupport, it is not 
useful for other usages without EditorSupport


JG: Will getEncoding and DataEditorSupport work for writing?
TZ: It will work, but not with the delegation.
JG: That would work with a thread local.


Comment 87 Jaroslav Tulach 2007-02-28 17:30:21 UTC
MF: If the file is modified in the editor, which encoding shall be chosen?
TZ: The one of the file, as this is FileEncodingQuery
MF: What if the encoding in the editor is 'iso-8859-1' and you use Č? This is 
unsavable to disk...

TCR: rewrite DataEditorSupport to use FileEncodingQuery, and do it in a way 
that everyone can use the same (e.g. no tricks in DataEditorSupport).
Comment 88 Jesse Glick 2007-03-01 05:17:22 UTC
Here is a summary of what I understood:

1. DOFEQ should be moved into openide/loaders. This requires a dep
openide/loaders -> projects/queries but we need that anyway for
DataEditorSupport and no one seemed upset about one more dep from loaders. So
TCR to get it out of projectapi, where it doesn't really belong anyway.

2. TCR to make DES work with FEQ by default, so that most editor supports would
not need to override loadFromStreamToKit or saveFromKitToStream - the default
impl, together with intelligent FEQI implementations, would handle it. Also the
impl in DES should be simple and clear - just get the Encoder/Decoder and use it
- because other code not using the editor will likely need to do the same (e.g.
Search and Replace in Files, if it should operate on characters and not bytes).

3. There was some unresolved discussion about how to handle unencodable
characters during save. Apparently the JSP editor kit currently overrides sFKTS
and prompts the user what to do. (Cancel, change encoding, etc.) This is still
an option for editor kits which want to do it. But an Encoder returned from a
FEQI must not prompt with a dialog! Options:

3a. Leave it up to certain editor kits to override sFKTS to do something
special. Works but clumsy.

3b. Use UserQuestionException or similar to indicate an encoding error that
could be corrected. But puts the burden of complex logic on the user of Encoder,
which is undesirable.

3c. Quietly try to fix up the situation in a content-type-specific way, e.g.
using XML or HTML character entity escapes, changing an explicit encoding to
UTF-8, just writing '?', etc. Optionally try to make up for this by adding
editor hints for unmappable characters in the editor window, offering various
fixes (potentially project-specific or interactive) while the document is still
being edited. This is the "moving the patient to another hospital" solution.

4. A long discussion about how best to handle content types which permit but do
not require explicit in-band encoding declarations, where out-of-band
information might also be applicable. (For example, XML's declaration or HTML
content-type headers, with a fallback to a general project-specific encoding, or
even an encoding derived from web.xml in a web app, etc.) A constraint is that
it is undesirable to reopen the file several times when loading it. There were
two basic proposed approaches.

4a. Using the current SPI and infrastructure, the FEQI for e.g. XMLDataObject
would usually return a constant special Charset object (without examining the
file). Its Decoder would start reading and buffering content, sniffing for an
explicit encoding decl. If it finds one, it decodes the buffered content with
that encoding and continues with the rest of the file. Otherwise, it sets a
thread-local flag and calls FEQ on the same file. While the flag is set, the
loader's FEQI returns null so it does not get itself back. The returned value -
whether from a specific source such as the project, or simply the fallback
Charset.default - is used to decode the file. Similarly, the Encoder sniffs the
character stream and preferentially writes out content using an explicit
encoding, falling back to an out-of-band encoding. Pros: fairly straightforward
to implement using current proposal. Cons: ThreadLocal is a bit ugly; impossible
to add a third-party encoding sniffer, if that ever becomes a requirement.

4b. (Yarda's idea) Make it possible for various FEQIs to cooperate on sniffing
in-band encodings in different ways. One suggested approach involved a proxy
FileObject whose inputStream would be buffered, as we do for MIMEResolver's.
Another suggestion (minority opinion?) was for FEQ to always return a proxy
Charset object whose Decoder would buffer up some content and then delegate to
the real Decoder, permitting a Decoder to throw some special exception in case
it decides it lacks sufficient in-band information to continue. The intended
solution to encoding (writing) was less clear to me - might require a different
SPI? Pros: potentially more elegant composition of independent in-band encoding
sniffers. Cons: probably more complex implementation; possibly more complex API
or SPI, especially if the impl in DES cannot be made trivial.
Comment 89 tprochazka 2007-03-01 17:41:20 UTC
[TP04] And what about .properties files? Will NetBeans use UTF-8 encoding when I
set the project to UTF-8, or will it keep them in ISO-8859-1 and convert them to
Unicode when I open them in the editor?
Comment 90 Tomas Zezula 2007-03-01 17:52:37 UTC
[TP04] Marian Petras, as the owner of the properties module, should decide this.
I would prefer the second solution.
Comment 91 Marian Petras 2007-03-01 17:57:29 UTC
TP04 - This is not clear at the moment. The ideal state is that the user can
choose from two options:
  a) .properties file is always saved in ISO-8859-1 and characters above
Unicode 0x00FF are translated to the form \uXXXX where 'XXXX' is the
respective Unicode value.
  b) .properties file is saved with some other encoding (probably the project's
default encoding). Characters that cannot be encoded using the encoding are
highlighted and the user is warned if he/she tries to save a file containing
such characters.
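The escaping step of option a) is essentially the native2ascii transformation. A minimal sketch, assuming only the simple rule stated above (the class name is made up; this is not the Properties module's code):

```java
// Minimal sketch of the option a) escaping rule: characters that do not
// fit into ISO-8859-1 (code points above 0x00FF) are written as \uXXXX.
// This mirrors what native2ascii does; it is not the Properties module code.
final class UnicodeEscaper {
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x00FF) {
                // escape anything outside the Latin-1 range
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```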

Option a) is not doable for NB 6.0. Option b) is doable.

The current state is that the user is only warned if they open a .properties
file already containing non-ISO-8859-1 characters.
Comment 92 Tomas Zezula 2007-03-01 18:06:58 UTC
[TZ01] Why is option a) not doable? Is it a question of resources, or do you see
an implementation problem? It should be doable by implementing a Charset encoder
and decoder. But option b) is simpler.
Comment 93 Marian Petras 2007-03-01 18:18:43 UTC
TZ01 - I take it back. I had not realized the advantage of the new API: it is
just a matter of using the appropriate Encoder/Decoder. The Encoder should be
pretty simple. The Decoder would be much more complex, but the decoding algorithm
is (must be) already implemented in the classes PropertiesParser and UtilConvert
(in the Properties module), so it should not take so much time either.

If the above ideas prove to be true, I will implement option a). If I have even
more time, I will also implement b) and a UI for the user so that he/she can
choose between a) and b). If I do not have time for a), I will only implement b).
Comment 94 tprochazka 2007-03-01 18:25:25 UTC
TP04 - I would also prefer converting the file from ISO-8859-1 to Unicode when
it is opened in the IDE and back to ISO-8859-1 when I save it. But if this is a
problem :-(

I currently do it this way:
use the same encoding for .properties files as for the source files,
and add this to build.xml:

<target name="-pre-jar" depends="">
    <native2ascii encoding="UTF-8" src="${src.dir}" dest="${build.classes.dir}"
                  includes="**/*.properties"/>
</target>
Comment 95 Jesse Glick 2007-03-01 18:57:00 UTC
For TP04, option (a) also seems most attractive to me. Remember that while JDK 6
(and no earlier JDK) permits you to load .properties files in another encoding,
you have to write special code to do so; ResourceBundle will be broken by
default. This is not very friendly. It is safest to use the traditional
ISO-8859-1 encoding (or even ASCII) with escapes.

The Decoder should not be very complex, I think. Straightforward state machine,
easy to write in test-driven style. See EditableProperties for hints.
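The state machine mentioned above can be sketched for the core \uXXXX case. This is illustrative only; the real decoder must also handle line continuations and the other .properties escapes:

```java
// Sketch of the straightforward state machine for decoding \uXXXX escapes
// back to characters. Illustrative only; a real .properties decoder also
// handles line continuations and key/value escapes.
final class UnicodeUnescaper {
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 5 < s.length() && s.charAt(i + 1) == 'u') {
                // consume the four hex digits following \u
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```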
Comment 96 Miloslav Metelka 2007-03-05 15:53:52 UTC
*** Issue 95888 has been marked as a duplicate of this issue. ***
Comment 97 Petr Jiricka 2007-03-08 16:14:37 UTC
> A constraint is that it is undesirable to reopen the file several times when 
> loading it.

Is this really important? All OSes cache files anyway, so reading them twice
should have no performance impact. We currently open the stream for JSP files
twice and it does not seem to be a problem. Is this a majority opinion?
Comment 98 Tomas Zezula 2007-03-13 18:04:20 UTC
I've fixed all the TCRs:
     Rewritten the DataEditorSupport to use FEQ.
     Moved the DataObjectFileEncodingQuery into loaders.

I also changed the FEQ to return a proxy of the Charset, as Jarda proposed.
Probably none of the DataEditorSupport subclasses need to override the
loadFromStreamToKit or storeFromKitToStream methods. A developer of an FEQImpl
which analyzes the content of a file needs to provide a custom Charset whose
CharsetEncoder and CharsetDecoder sniff for the encoding in the data being
processed. If the FEQImpl is implemented this way, the input is read just once.
I also have to thank Jarda for his help while implementing the proxy.
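The single-pass sniffing idea described above can be sketched in plain Java: buffer a prefix, look for an in-band declaration, then decode all bytes (including the buffered ones) exactly once. The class name and the simple regex are illustrative, not the actual implementation:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of single-pass sniffing: examine a buffered prefix
// for an in-band declaration, then decode the content (prefix included)
// with the charset found, or with a caller-supplied fallback. This is the
// shape of the trick only, not the NetBeans implementation.
final class EncodingSniffer {
    private static final Pattern DECL =
            Pattern.compile("encoding=[\"']([A-Za-z0-9._-]+)[\"']");

    static String decode(byte[] content, Charset fallback) {
        // Sniff only the first 1 KB; Latin-1 maps every byte to a char,
        // and any declaration we care about is ASCII anyway.
        String prefix = new String(content, 0, Math.min(content.length, 1024),
                StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(prefix);
        Charset cs = m.find() ? Charset.forName(m.group(1)) : fallback;
        return new String(content, cs); // bytes decoded exactly once
    }
}
```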

I've attached the zipped diff files (zipped because of the length of the test
files). If there are no other complaints, I am going to integrate it tomorrow.
Comment 99 Tomas Zezula 2007-03-13 18:05:55 UTC
Created attachment 39451 [details]
Diff files
Comment 100 Jaroslav Tulach 2007-03-13 19:37:12 UTC
Y01 No tests to justify changes in openide/loaders
Y02 Maybe there should be an apichange note that since new version of loaders 
the DataEditorSupport uses the FEQ
Comment 101 Vitezslav Stejskal 2007-03-14 01:09:26 UTC
*** Issue 97320 has been marked as a duplicate of this issue. ***
Comment 102 Tomas Zezula 2007-03-14 09:08:25 UTC
Y01: Indeed covered by tests in project/queries.
Y02: I will add the API change note to openide/loaders/apichanges.xml today
before committing.

Comment 103 Tomas Zezula 2007-03-14 11:26:15 UTC
I've added a test for DataEditorSupport to let Jarda sleep calmly at night.
:-)
Comment 104 Tomas Zezula 2007-03-14 11:27:23 UTC
Created attachment 39466 [details]
Loaders with test and apichanges
Comment 105 Jaroslav Tulach 2007-03-14 14:18:35 UTC
Perfect, now I can sleep peacefully.
Comment 106 Tomas Zezula 2007-03-14 15:35:22 UTC
Checking in openide/loaders/manifest.mf;
/cvs/openide/loaders/manifest.mf,v  <--  manifest.mf
new revision: 1.31; previous revision: 1.30
done
Checking in openide/loaders/api/apichanges.xml;
/cvs/openide/loaders/api/apichanges.xml,v  <--  apichanges.xml
new revision: 1.26; previous revision: 1.25
done
Checking in openide/loaders/nbproject/project.xml;
/cvs/openide/loaders/nbproject/project.xml,v  <--  project.xml
new revision: 1.27; previous revision: 1.26
done
RCS file:
/cvs/openide/loaders/src/META-INF/services/org.netbeans.spi.queries.FileEncodingQueryImplementation,v
done
Checking in
openide/loaders/src/META-INF/services/org.netbeans.spi.queries.FileEncodingQueryImplementation;
/cvs/openide/loaders/src/META-INF/services/org.netbeans.spi.queries.FileEncodingQueryImplementation,v
 <--  org.netbeans.spi.queries.FileEncodingQueryImplementation
initial revision: 1.1
done
RCS file:
/cvs/openide/loaders/src/org/netbeans/modules/openide/loaders/DataObjectEncodingQueryImplementation.java,v
done
Checking in
openide/loaders/src/org/netbeans/modules/openide/loaders/DataObjectEncodingQueryImplementation.java;
/cvs/openide/loaders/src/org/netbeans/modules/openide/loaders/DataObjectEncodingQueryImplementation.java,v
 <--  DataObjectEncodingQueryImplementation.java
initial revision: 1.1
done
Checking in openide/loaders/src/org/openide/text/DataEditorSupport.java;
/cvs/openide/loaders/src/org/openide/text/DataEditorSupport.java,v  <-- 
DataEditorSupport.java
new revision: 1.40; previous revision: 1.39
done
Checking in
openide/loaders/test/unit/src/org/openide/text/DataEditorSupportTest.java;
/cvs/openide/loaders/test/unit/src/org/openide/text/DataEditorSupportTest.java,v
 <--  DataEditorSupportTest.java
new revision: 1.7; previous revision: 1.6
done
Checking in
java/j2seproject/src/org/netbeans/modules/java/j2seproject/J2SEProject.java;
/cvs/java/j2seproject/src/org/netbeans/modules/java/j2seproject/J2SEProject.java,v
 <--  J2SEProject.java
new revision: 1.78; previous revision: 1.77
done
Checking in
java/j2seproject/src/org/netbeans/modules/java/j2seproject/J2SEProjectGenerator.java;
/cvs/java/j2seproject/src/org/netbeans/modules/java/j2seproject/J2SEProjectGenerator.java,v
 <--  J2SEProjectGenerator.java
new revision: 1.54; previous revision: 1.53
done
RCS file:
/cvs/java/j2seproject/src/org/netbeans/modules/java/j2seproject/queries/J2SEProjectEncodingQueryImpl.java,v
done
Checking in
java/j2seproject/src/org/netbeans/modules/java/j2seproject/queries/J2SEProjectEncodingQueryImpl.java;
/cvs/java/j2seproject/src/org/netbeans/modules/java/j2seproject/queries/J2SEProjectEncodingQueryImpl.java,v
 <--  J2SEProjectEncodingQueryImpl.java
initial revision: 1.1
done
Checking in
java/j2seproject/src/org/netbeans/modules/java/j2seproject/resources/build-impl.xsl;
/cvs/java/j2seproject/src/org/netbeans/modules/java/j2seproject/resources/build-impl.xsl,v
 <--  build-impl.xsl
new revision: 1.83; previous revision: 1.82
done
Checking in
java/j2seproject/src/org/netbeans/modules/java/j2seproject/ui/customizer/Bundle.properties;
/cvs/java/j2seproject/src/org/netbeans/modules/java/j2seproject/ui/customizer/Bundle.properties,v
 <--  Bundle.properties
new revision: 1.86; previous revision: 1.85
done
Checking in
java/j2seproject/src/org/netbeans/modules/java/j2seproject/ui/customizer/CustomizerSources.form;
/cvs/java/j2seproject/src/org/netbeans/modules/java/j2seproject/ui/customizer/CustomizerSources.form,v
 <--  CustomizerSources.form
new revision: 1.9; previous revision: 1.8
done
Checking in
java/j2seproject/src/org/netbeans/modules/java/j2seproject/ui/customizer/CustomizerSources.java;
/cvs/java/j2seproject/src/org/netbeans/modules/java/j2seproject/ui/customizer/CustomizerSources.java,v
 <--  CustomizerSources.java
new revision: 1.14; previous revision: 1.13
done
Checking in
java/j2seproject/src/org/netbeans/modules/java/j2seproject/ui/customizer/J2SEProjectProperties.java;
/cvs/java/j2seproject/src/org/netbeans/modules/java/j2seproject/ui/customizer/J2SEProjectProperties.java,v
 <--  J2SEProjectProperties.java
new revision: 1.63; previous revision: 1.62
done
RCS file:
/cvs/java/j2seproject/test/unit/src/org/netbeans/modules/java/j2seproject/queries/FileEncodingQueryTest.java,v
done
Checking in
java/j2seproject/test/unit/src/org/netbeans/modules/java/j2seproject/queries/FileEncodingQueryTest.java;
/cvs/java/j2seproject/test/unit/src/org/netbeans/modules/java/j2seproject/queries/FileEncodingQueryTest.java,v
 <--  FileEncodingQueryTest.java
initial revision: 1.1
done
Checking in
java/source/src/org/netbeans/modules/java/source/parsing/SourceFileObject.java;
/cvs/java/source/src/org/netbeans/modules/java/source/parsing/SourceFileObject.java,v
 <--  SourceFileObject.java
new revision: 1.6; previous revision: 1.5
done
Checking in projects/projectapi/overview.html;
/cvs/projects/projectapi/overview.html,v  <--  overview.html
new revision: 1.5; previous revision: 1.4
done
RCS file:
/cvs/projects/projectapi/src/META-INF/services/org.netbeans.spi.queries.FileEncodingQueryImplementation,v
done
Checking in
projects/projectapi/src/META-INF/services/org.netbeans.spi.queries.FileEncodingQueryImplementation;
/cvs/projects/projectapi/src/META-INF/services/org.netbeans.spi.queries.FileEncodingQueryImplementation,v
 <--  org.netbeans.spi.queries.FileEncodingQueryImplementation
initial revision: 1.1
done
Checking in projects/projectapi/src/org/netbeans/api/project/Project.java;
/cvs/projects/projectapi/src/org/netbeans/api/project/Project.java,v  <-- 
Project.java
new revision: 1.18; previous revision: 1.17
done
RCS file:
/cvs/projects/projectapi/src/org/netbeans/modules/projectapi/ProjectFileEncodingQueryImplementation.java,v
done
Checking in
projects/projectapi/src/org/netbeans/modules/projectapi/ProjectFileEncodingQueryImplementation.java;
/cvs/projects/projectapi/src/org/netbeans/modules/projectapi/ProjectFileEncodingQueryImplementation.java,v
 <--  ProjectFileEncodingQueryImplementation.java
initial revision: 1.1
done
Checking in projects/queries/apichanges.xml;
/cvs/projects/queries/apichanges.xml,v  <--  apichanges.xml
new revision: 1.7; previous revision: 1.6
done
Checking in projects/queries/arch.xml;
/cvs/projects/queries/arch.xml,v  <--  arch.xml
new revision: 1.11; previous revision: 1.10
done
Checking in projects/queries/manifest.mf;
/cvs/projects/queries/manifest.mf,v  <--  manifest.mf
new revision: 1.12; previous revision: 1.11
done
Checking in projects/queries/nbproject/project.xml;
/cvs/projects/queries/nbproject/project.xml,v  <--  project.xml
new revision: 1.11; previous revision: 1.10
done
RCS file:
/cvs/projects/queries/src/org/netbeans/api/queries/FileEncodingQuery.java,v
done
Checking in projects/queries/src/org/netbeans/api/queries/FileEncodingQuery.java;
/cvs/projects/queries/src/org/netbeans/api/queries/FileEncodingQuery.java,v  <--
 FileEncodingQuery.java
initial revision: 1.1
done
RCS file:
/cvs/projects/queries/src/org/netbeans/modules/queries/UnknownEncoding.java,v
done
Checking in projects/queries/src/org/netbeans/modules/queries/UnknownEncoding.java;
/cvs/projects/queries/src/org/netbeans/modules/queries/UnknownEncoding.java,v 
<--  UnknownEncoding.java
initial revision: 1.1
done
RCS file:
/cvs/projects/queries/src/org/netbeans/spi/queries/FileEncodingQueryImplementation.java,v
done
Checking in
projects/queries/src/org/netbeans/spi/queries/FileEncodingQueryImplementation.java;
/cvs/projects/queries/src/org/netbeans/spi/queries/FileEncodingQueryImplementation.java,v
 <--  FileEncodingQueryImplementation.java
initial revision: 1.1
done
RCS file:
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/FileEncodingQueryTest.java,v
done
Checking in
projects/queries/test/unit/src/org/netbeans/api/queries/FileEncodingQueryTest.java;
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/FileEncodingQueryTest.java,v
 <--  FileEncodingQueryTest.java
initial revision: 1.1
done
RCS file:
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/data/data.properties,v
done
Checking in
projects/queries/test/unit/src/org/netbeans/api/queries/data/data.properties;
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/data/data.properties,v
 <--  data.properties
initial revision: 1.1
done
RCS file:
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/data/encoding_after_block,v
done
Checking in
projects/queries/test/unit/src/org/netbeans/api/queries/data/encoding_after_block;
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/data/encoding_after_block,v
 <--  encoding_after_block
initial revision: 1.1
done
RCS file:
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/data/encoding_on_block,v
done
Checking in
projects/queries/test/unit/src/org/netbeans/api/queries/data/encoding_on_block;
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/data/encoding_on_block,v
 <--  encoding_on_block
initial revision: 1.1
done
RCS file:
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/data/encoding_on_block_long,v
done
Checking in
projects/queries/test/unit/src/org/netbeans/api/queries/data/encoding_on_block_long;
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/data/encoding_on_block_long,v
 <--  encoding_on_block_long
initial revision: 1.1
done
RCS file:
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/data/encoding_on_start,v
done
Checking in
projects/queries/test/unit/src/org/netbeans/api/queries/data/encoding_on_start;
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/data/encoding_on_start,v
 <--  encoding_on_start
initial revision: 1.1
done
RCS file:
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/data/encoding_on_start_long,v
done
Checking in
projects/queries/test/unit/src/org/netbeans/api/queries/data/encoding_on_start_long;
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/data/encoding_on_start_long,v
 <--  encoding_on_start_long
initial revision: 1.1
done
RCS file:
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/data/no_encoding,v
done
Checking in
projects/queries/test/unit/src/org/netbeans/api/queries/data/no_encoding;
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/data/no_encoding,v
 <--  no_encoding
initial revision: 1.1
done
RCS file:
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/data/no_encoding_long,v
done
Checking in
projects/queries/test/unit/src/org/netbeans/api/queries/data/no_encoding_long;
/cvs/projects/queries/test/unit/src/org/netbeans/api/queries/data/no_encoding_long,v
 <--  no_encoding_long
initial revision: 1.1
done
Checking in ide/golden/deps.txt;
/cvs/ide/golden/deps.txt,v  <--  deps.txt
new revision: 1.473; previous revision: 1.472
done
Comment 107 Tomas Zezula 2007-03-14 15:58:45 UTC
I've created an umbrella issue #97848 for tracking the update of modules which
should depend on FileEncodingQuery (project types, some DataObjects, search).

