94676 – Address file encoding issues in new templating system

This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 94676 - Address file encoding issues in new templating system

Summary: Address file encoding issues in new templating system

Status:	RESOLVED FIXED

Alias:	None

Product:	platform
Classification:	Unclassified
Component:	Data Systems
Version:	6.x
Hardware:	All, OS: All

Importance:	P2 blocker
Assignee:	Jaroslav Tulach

Keywords:	I18N

Depends on:	42638
Blocks:	13250 97848

Reported:	2007-02-06 17:26 UTC by Jesse Glick
Modified:	2008-12-22 11:42 UTC
CC List:	3 users

Issue Type:	DEFECT

Attachments
scripting uses FEQ, UTF-8 is default encoding on SFS (14.75 KB, patch) 2007-03-21 17:42 UTC, Jaroslav Tulach


Description Jesse Glick 2007-02-06 17:26:38 UTC

JG05 in issue #13250, edited a bit:

Use of platform default encoding for OutputStreamWriter and InputStreamReader
(in ScriptingCreateFromTemplateHandler.createFromTemplate) is dangerous because
either the template or the output file (or both) might require an encoding
different from the platform default encoding. For example, a Mexican Windows
user named "Raúl" with encoding set to Cp1252 tries to instantiate an XML
template shipped with the IDE:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Created on ${date} by ${user} -->
<root/>

Decoding the template in Cp1252 is harmless in this case, because the template
contains only ASCII characters, and substitution inside the script engine
probably proceeds without issue. But when the result is then written in Cp1252,
the "ú" in his name becomes the single byte 0xFA, which is not valid UTF-8 even
though the XML declaration promises UTF-8, possibly making the file malformed
(causing parse errors).
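To make the failure concrete, here is a small stand-alone sketch using only the JDK (no NetBeans APIs; the class and method names are made up for illustration). It writes the substituted output once as windows-1252 (Cp1252) and once as UTF-8, then checks each byte sequence with a strict UTF-8 decoder, roughly what an XML parser does when the declaration says encoding="UTF-8".

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of NetBeans.
final class EncodingMismatchDemo {

    // True if the bytes decode as UTF-8 without any malformed or unmappable sequences.
    static boolean isWellFormedUtf8(byte[] bytes) {
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strict.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Template output after substitution; the user name is Ra\u00fal.
        String result = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!-- Created by Ra\u00fal -->\n<root/>\n";

        // Written with the platform default (Cp1252): the accented u becomes the single byte 0xFA.
        byte[] asCp1252 = result.getBytes(Charset.forName("windows-1252"));
        // Written with the encoding the XML declaration promises.
        byte[] asUtf8 = result.getBytes(StandardCharsets.UTF_8);

        System.out.println("Cp1252 bytes are valid UTF-8: " + isWellFormedUtf8(asCp1252)); // false
        System.out.println("UTF-8 bytes are valid UTF-8:  " + isWellFormedUtf8(asUtf8));   // true
    }
}
```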

Issue #42638 proposes something like

public static String FileEncodingQuery.getEncoding(FileObject);

which if available should be used for both the reading and writing stages.

Note that there is a subtlety in the writing part: if you create an empty *.xml
file and then write the String

"<?xml version=\"1.0\" encoding=\"UTF-8\"?><!-- Created by Raúl --><root/>"

then the encoder would actually need to scan the first part of the content to
see what encoding should be used for the remainder. This cannot be done with a
simple String return value, I think.
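A rough sketch of that "scan first, then encode" step, assuming the whole content is already available as a String (the class and method names are hypothetical, and this is not how NetBeans implements it). The regular expression only recognizes a plain ASCII declaration at offset 0, with no BOM and no UTF-16 sniffing, so it is a simplification of the autodetection rules in the XML 1.0 specification.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a sketch, not the NetBeans implementation.
final class XmlDeclarationSniffer {

    // Matches the encoding pseudo-attribute of an XML declaration at the very start of the text.
    private static final Pattern DECLARATION = Pattern.compile(
            "^<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the charset named in the declaration, or the fallback if there is
    // no declaration or the JVM does not support the named charset.
    static Charset chooseCharset(String content, Charset fallback) {
        Matcher m = DECLARATION.matcher(content);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset platformDefault = Charset.forName("windows-1252");
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!-- Created by Ra\u00fal --><root/>";
        String plain = "Created by Ra\u00fal";

        System.out.println(chooseCharset(xml, platformDefault).name());   // UTF-8
        System.out.println(chooseCharset(plain, platformDefault).name()); // windows-1252

        // Encode with whatever the content itself asks for.
        byte[] bytes = xml.getBytes(chooseCharset(xml, platformDefault));
        System.out.println(bytes.length + " bytes written");
    }
}
```

A streaming writer cannot do this with a plain OutputStreamWriter: it has to buffer output until the declaration is complete before choosing the encoder, which is why a custom Charset with a content-sniffing encoder is needed for the stream-based case.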

Comment 1 Tomas Zezula 2007-02-12 15:43:52 UTC

I agree. I changed FileEncodingQuery and FileEncodingQueryImplementation to
return a Charset, which can be subclassed.

The new-from-template code should do roughly:
FileObject template;
FileObject destDir;

Charset inenc = FileEncodingQuery.getEncoding(template);
FileObject newFile = destDir.createData(name, ext);
Charset outenc = FileEncodingQuery.getEncoding(newFile);

FileLock lck = newFile.lock();
try {
    Reader in = new InputStreamReader(template.getInputStream(), inenc);
    Writer out = new OutputStreamWriter(newFile.getOutputStream(lck), outenc);
    copy(in, out);
    ....
} finally {
    lck.releaseLock();
}
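The outline above depends on NetBeans types (FileObject, FileLock, FileEncodingQuery), so it cannot run on its own. The transcoding step it describes can be sketched with plain java.io streams: decode the template with its charset, then encode the characters with the target file's charset. In this sketch the byte streams stand in for template.getInputStream() and newFile.getOutputStream(lck); the class name is hypothetical, and the script-engine substitution that the real handler performs between reading and writing is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the copy(in, out) step; not NetBeans code.
final class TemplateCopy {

    // Decodes src with inenc and re-encodes the characters into dst with outenc.
    // Closes both streams when done.
    static void copy(InputStream src, Charset inenc, OutputStream dst, Charset outenc)
            throws IOException {
        try (Reader in = new InputStreamReader(src, inenc);
             Writer out = new OutputStreamWriter(dst, outenc)) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Charset cp1252 = Charset.forName("windows-1252");
        // A template stored in Cp1252 that contains the name Ra\u00fal.
        byte[] template = "<!-- Created by Ra\u00fal -->".getBytes(cp1252);
        ByteArrayOutputStream newFile = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(template), cp1252, newFile, StandardCharsets.UTF_8);
        // The accented letter now takes two bytes (0xC3 0xBA) instead of one (0xFA).
        System.out.println(template.length + " -> " + newFile.size()); // prints 24 -> 25
    }
}
```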

The Charset returned by the XML FileEncodingQueryImplementation has to have an
encoder and a decoder that detect the correct encoding from the content, as
Jesse described in issue #42638 (Tue Feb 6 17:50:19 +0000 2007) [JG04] point 2.

Comment 2 Jaroslav Tulach 2007-03-21 17:42:25 UTC

Created attachment 39769
scripting uses FEQ, UTF-8 is default encoding on SFS

Comment 3 Jaroslav Tulach 2007-03-21 17:44:17 UTC

Tomáš, Jesse, is this what you wanted me to integrate?

Comment 4 Tomas Zezula 2007-03-21 17:54:56 UTC

Seems good to me. The template system correctly uses the encoding both when
reading the template and when writing the new file. The default encoding for
the default filesystem is UTF-8.

Comment 5 Jesse Glick 2007-03-21 22:34:09 UTC

Looks OK to me.


In FEQT, don't you mean "UTF-8" rather than "utf-8"? AFAIK the canonical name is
uppercase.
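This is easy to check in the JDK: charset lookup is case-insensitive, but Charset.name() returns the canonical name, which for UTF-8 is uppercase. The tiny sketch below (class name hypothetical) suggests that a test comparing Charset objects, or their name(), rather than raw strings does not depend on how the name was spelled.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo; shows canonical vs. requested charset names.
final class CanonicalCharsetName {
    public static void main(String[] args) {
        // Lookup accepts any case...
        Charset cs = Charset.forName("utf-8");
        // ...but name() reports the canonical spelling.
        System.out.println(cs.name());                         // UTF-8
        System.out.println(cs.equals(StandardCharsets.UTF_8)); // true
    }
}
```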

Comment 6 Jaroslav Tulach 2007-03-22 09:25:05 UTC

OK, I'll take this as approval and commit it for M8. It was already discussed
during the review of issue 13250 and issue 42638.

Comment 7 Jaroslav Tulach 2007-03-22 10:21:03 UTC

"#94676: Using FileEncodingQuery in scripting"

Checking in openide/templates/nbproject/project.xml;
/shared/data/ccvs/repository/openide/templates/nbproject/project.xml,v  <--  project.xml
new revision: 1.3; previous revision: 1.2
done
Checking in openide/templates/src/org/netbeans/modules/templates/ScriptingCreateFromTemplateHandler.java;
/shared/data/ccvs/repository/openide/templates/src/org/netbeans/modules/templates/ScriptingCreateFromTemplateHandler.java,v  <--  ScriptingCreateFromTemplateHandler.java
new revision: 1.3; previous revision: 1.2
done
Checking in openide/templates/test/unit/src/org/netbeans/modules/templates/SCFTHandlerTest.java;
/shared/data/ccvs/repository/openide/templates/test/unit/src/org/netbeans/modules/templates/SCFTHandlerTest.java,v  <--  SCFTHandlerTest.java
new revision: 1.4; previous revision: 1.3
done
RCS file: /shared/data/ccvs/repository/openide/templates/test/unit/src/org/netbeans/modules/templates/utf8.xml,v
done
Checking in openide/templates/test/unit/src/org/netbeans/modules/templates/utf8.xml;
/shared/data/ccvs/repository/openide/templates/test/unit/src/org/netbeans/modules/templates/utf8.xml,v  <--  utf8.xml
initial revision: 1.1
done
Checking in projects/queries/apichanges.xml;
/shared/data/ccvs/repository/projects/queries/apichanges.xml,v  <--  apichanges.xml
new revision: 1.9; previous revision: 1.8
done
Checking in projects/queries/manifest.mf;
/shared/data/ccvs/repository/projects/queries/manifest.mf,v  <--  manifest.mf
new revision: 1.13; previous revision: 1.12
done
Checking in projects/queries/src/org/netbeans/api/queries/FileEncodingQuery.java;
/shared/data/ccvs/repository/projects/queries/src/org/netbeans/api/queries/FileEncodingQuery.java,v  <--  FileEncodingQuery.java
new revision: 1.3; previous revision: 1.2
done
Checking in projects/queries/test/unit/src/org/netbeans/api/queries/FileEncodingQueryTest.java;
/shared/data/ccvs/repository/projects/queries/test/unit/src/org/netbeans/api/queries/FileEncodingQueryTest.java,v  <--  FileEncodingQueryTest.java
new revision: 1.3; previous revision: 1.2
done
Checking in ide/golden/deps.txt;
/shared/data/ccvs/repository/ide/golden/deps.txt,v  <--  deps.txt
new revision: 1.486; previous revision: 1.485

Comment 8 tprochazka 2007-03-23 20:55:19 UTC

I tested templates in NB build 200703221900.

A)
I created a project with UTF-8 encoding.
I opened the Java Class template and added the characters ěščřžýáí to it.
I created a new Class file.
Result:
ěščřžýáíé is OK,
but __DATE__ is 23. b�ezen 2007 (I'm using the cs_CZ locale).
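The single replacement character in case A is what you get when a one-byte encoding of "březen" (Czech for March) is read back as UTF-8: in windows-1250, "ř" is the single byte 0xF8, which never appears in well-formed UTF-8. The sketch below (plain JDK, hypothetical class name) only reproduces the symptom; it does not show where in the template code those bytes were produced.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo; reproduces the symptom, not necessarily the actual cause.
final class DateMojibakeDemo {

    // Encodes text as windows-1250, then decodes those bytes as UTF-8.
    static String cp1250BytesReadAsUtf8(String text) {
        byte[] cp1250 = text.getBytes(Charset.forName("windows-1250"));
        // This String constructor replaces invalid UTF-8 input with U+FFFD.
        return new String(cp1250, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String month = "b\u0159ezen"; // "brezen" with r-caron, Czech for March
        // Prints "b", one U+FFFD replacement character, then "ezen".
        System.out.println(cp1250BytesReadAsUtf8(month));
    }
}
```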

B)
I switched the project to Windows-1250.
I created a new Class.
Result:
ěščřžýáíé is created as: Ä›ĹˇÄŤĹ™ĹľĂ˝ĂˇĂĂ©
but __DATE__ is correctly displayed as 23. březen 2007.
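Case B looks like the reverse mix-up: the garbled text matches the UTF-8 bytes of "ěščřžýáíé" decoded as windows-1250. For example, "ě" is C4 9B in UTF-8, and windows-1250 maps those two bytes to "Ä" and "›". The sketch below (plain JDK, hypothetical class name) reproduces the reported string; it is not a diagnosis of which code path wrote the bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo; reproduces the symptom, not necessarily the actual cause.
final class TemplateMojibakeDemo {

    // Encodes text as UTF-8, then decodes those bytes as windows-1250.
    static String utf8BytesReadAsCp1250(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1250"));
    }

    public static void main(String[] args) {
        // The characters typed into the template: e-caron, s-caron, c-caron, r-caron,
        // z-caron, y-acute, a-acute, i-acute, e-acute.
        String typed = "\u011b\u0161\u010d\u0159\u017e\u00fd\u00e1\u00ed\u00e9";
        // Matches the reported text; "i-acute" (C3 AD) also yields an invisible soft hyphen U+00AD.
        System.out.println(utf8BytesReadAsCp1250(typed));
    }
}
```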

Are these bugs related to this issue?

I don't understand this. Why doesn't NB use Unicode for all internal operations,
and convert from/to Unicode only when the user loads or saves a .java file?

Comment 9 Jaroslav Tulach 2007-03-23 22:13:25 UTC

I guess you found a bug. There may be some subtleties in our current
implementation. Please report a new bug with steps to reproduce it (using cs_CZ
is perfect; that is my locale as well).

Comment 10 tprochazka 2007-03-24 11:59:17 UTC

OK. I created a new issue:

http://www.netbeans.org/issues/show_bug.cgi?id=98874