This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 43269 - I18N - .jspx is not saved without setting is-xml:false for specified encoding
Summary: I18N - .jspx is not saved without setting is-xml:false for specified encoding
Status: VERIFIED FIXED
Alias: None
Product: javaee
Classification: Unclassified
Component: Code
Version: 3.x
Hardware: Sun All
Importance: P2 blocker
Assignee: Petr Pisl
URL:
Keywords: I18N, RELNOTE, TOMCAT
Depends on:
Blocks: 44422
Reported: 2004-05-14 12:34 UTC by mtsuruta
Modified: 2007-08-03 09:03 UTC
CC List: 2 users

See Also:
Issue Type: DEFECT
Exception Reporter:


Attachments
I attached some jsps created in nb-trunk 20040517800. (10.02 KB, application/octet-stream)
2004-05-20 13:02 UTC, mtsuruta

Description mtsuruta 2004-05-14 12:34:15 UTC
Unless the <is-xml> property in web.xml is set to "false", a JSP
in XML syntax is not saved using the page-encoding specified in
web.xml.

1. Create a web module.
2. Create a JSP file in XML syntax with the New wizard.
3. Add the following property group to web.xml:

    <jsp-config>
        <jsp-property-group>
            <url-pattern>*.jspx</url-pattern>
            <page-encoding>EUC-JP</page-encoding>
            <is-xml>false</is-xml>
        </jsp-property-group>
    </jsp-config>
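A minimal Java sketch of how a tool might read a property group like the one above and find the settings that apply to a given file. JspConfigLookup and its methods are hypothetical (not NetBeans or Tomcat code), and only extension patterns such as *.jspx are matched; exact and path patterns are ignored.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: finds the <jsp-property-group> whose <url-pattern> matches a file.
class JspConfigLookup {

    // Settings from one matching group; a field is null when its element is absent.
    static final class GroupSettings {
        final String pageEncoding;
        final Boolean isXml;

        GroupSettings(String pageEncoding, Boolean isXml) {
            this.pageEncoding = pageEncoding;
            this.isXml = isXml;
        }
    }

    // Returns the first group whose extension pattern (e.g. "*.jspx") matches fileName, or null.
    static GroupSettings find(String jspConfigXml, String fileName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(jspConfigXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("jsp-config is not well-formed XML", e);
        }
        NodeList groups = doc.getElementsByTagName("jsp-property-group");
        for (int i = 0; i < groups.getLength(); i++) {
            Element group = (Element) groups.item(i);
            NodeList patterns = group.getElementsByTagName("url-pattern");
            for (int j = 0; j < patterns.getLength(); j++) {
                String pattern = patterns.item(j).getTextContent().trim();
                // Only extension mappings ("*.ext") are handled in this sketch.
                if (pattern.startsWith("*.") && fileName.endsWith(pattern.substring(1))) {
                    String isXml = firstText(group, "is-xml");
                    return new GroupSettings(firstText(group, "page-encoding"),
                            isXml == null ? null : Boolean.valueOf(isXml));
                }
            }
        }
        return null;
    }

    // Text of the first descendant element with the given name, or null if there is none.
    private static String firstText(Element parent, String tagName) {
        NodeList nodes = parent.getElementsByTagName(tagName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

For the fragment above, looking up index.jspx yields page-encoding EUC-JP and is-xml false, while a plain .jsp file matches no group and the method returns null.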

4. Type a multibyte character in the <text> tag.
5. The file is saved properly using the encoding specified in
web.xml, but not when <is-xml> is set to "true".
Comment 1 mtsuruta 2004-05-20 05:49:51 UTC
With <is-xml> set to "true", the JSP in XML syntax is saved in
UTF-8, and the page-encoding value in web.xml is not used as the
JSP's encoding.
Comment 2 Petr Pisl 2004-05-20 09:18:42 UTC
Can you attach the jspx page? 

According to the JSP 2.0 specification, section "Page Encoding Detection":

1. Decide whether the source file is a JSP page in standard syntax or
a JSP document in XML syntax. 

a. If there is a <is-xml> element in a <jsp-property-group> that names
this file, then if it has the value "true", the file is a JSP
document, and if it has the value "false", the file is not a JSP document.

.....

4. If the file is a JSP document in XML syntax, use these steps. 

a. Determine the page character encoding as described in appendix F.1
of the XML 1.0 specification. Note whether the encoding was named in
the encoding attribute of the XML prolog or just derived from the
initial bytes. 

b. Check whether there is a JSP configuration element <page-encoding>
whose URL pattern matches this file.

.....

Is it possible that your jspx page itself declares UTF-8 encoding? If
so, then according to the specification that encoding has the highest
priority for JSPs in XML syntax. You can declare in web.xml that the
file is not a document in XML syntax with the <is-xml> element, as you
did, and then a different procedure is used to obtain the encoding.
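Step 4a of the quoted procedure can be approximated as below. This is a simplified sketch of XML 1.0 Appendix F.1, not the Xerces implementation: it recognizes only byte order marks and ASCII-compatible prologs, and XmlEncodingSniffer is a hypothetical class. The declared flag records what step 4a asks to note: whether the encoding was named in the prolog or merely derived from the initial bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified sketch of XML 1.0 Appendix F.1 encoding detection.
class XmlEncodingSniffer {

    // Detected encoding plus whether it was named in the prolog (true) or only derived (false).
    static final class Result {
        final String encoding;
        final boolean declared;

        Result(String encoding, boolean declared) {
            this.encoding = encoding;
            this.declared = declared;
        }
    }

    // Matches encoding="..." (or '...') inside an XML declaration at the very start of the text.
    private static final Pattern ENCODING_DECL = Pattern.compile(
            "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static Result detect(byte[] head) {
        // Byte order marks identify UTF-16 directly (a declaration after them is not read here).
        if (startsWith(head, 0xFE, 0xFF)) return new Result("UTF-16BE", false);
        if (startsWith(head, 0xFF, 0xFE)) return new Result("UTF-16LE", false);
        int offset = startsWith(head, 0xEF, 0xBB, 0xBF) ? 3 : 0; // skip a UTF-8 BOM
        // For ASCII-compatible encodings the prolog can be read byte-for-byte as ISO-8859-1.
        String text = new String(head, offset, head.length - offset, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING_DECL.matcher(text);
        if (m.find()) return new Result(m.group(1), true);
        // No encoding declaration, or no prolog at all: XML falls back to UTF-8.
        return new Result("UTF-8", false);
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

Keeping the declared flag separate from the encoding name matters for the later steps: a UTF-8 result with declared == false means "nothing was specified", not "the author chose UTF-8".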
Comment 3 mtsuruta 2004-05-20 13:02:02 UTC
Created attachment 15031 [details]
I attached some jsps created in nb-trunk 20040517800.
Comment 4 Petr Pisl 2004-06-24 12:45:02 UTC
Hi,

I have solved this issue, and it turned out to be a really complicated one.

I found out that Tomcat's parser needs Xerces on the classpath,
because it uses Xerces to find the encoding of a JSP document. So
this was our bug, and I fixed it:
http://web.netbeans.org/source/browse/web/jspparser/nbproject/project.xml.diff?r1=1.2&r2=1.3
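Whether a class such as the Xerces parser is visible to a given class loader can be checked with a few lines of Java. ClasspathCheck is a hypothetical helper; org.apache.xerces.parsers.SAXParser is the Xerces2 SAX parser class, used here only as an example of what to probe for.

```java
// Hypothetical diagnostic: reports whether a class can be found by a given class loader.
class ClasspathCheck {

    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            // initialize = false: only locate the class, do not run its static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

For example, ClasspathCheck.isLoadable("org.apache.xerces.parsers.SAXParser", ClasspathCheck.class.getClassLoader()). Note that the JDK ships its own repackaged copy of Xerces under com.sun.org.apache.xerces.internal, so a false result for org.apache.xerces only means the standalone Xerces is missing.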

When I tested Mika's attached examples before the fix, I got these
encodings (all JSP pages should have EUC-JP encoding):
jsp from group is-xml_false
       general.jsp        EUC-JP
       xml_syntx.jsp      UTF-8
jsp from group is-xml_true
       general.jsp        UTF-8
       xml_syntx.jsp      UTF-8

After the fix in the JSP parser, the results are:
jsp from group is-xml_false
       general.jsp        EUC-JP
       xml_syntx.jsp      EUC-JP
jsp from group is-xml_true
       general.jsp        UTF-8
       xml_syntx.jsp      UTF-8

The problem is in Tomcat. Tomcat uses Xerces to obtain the encoding
of a JSP document. Xerces determines the encoding according to the
XML 1.0 specification, mainly from the XML prolog. If the document
does not declare an encoding in the prolog (for example, <?xml
version="1.0" encoding="EUC-JP" ?>), or has no XML prolog at all (the
prolog is optional), Xerces returns UTF-8 as the default encoding.
The problem is that Tomcat does not check whether the encoding was
actually declared in the prolog; it always uses the encoding returned
by Xerces. I filed a bug against Tomcat; see
http://issues.apache.org/bugzilla/show_bug.cgi?id=29763.

According to the JSP specification, the encoding of a JSP document
can be set in three places: in the XML prolog (as defined by the XML
1.0 specification), in the deployment descriptor within a
<jsp-property-group>, and as the value of the pageEncoding attribute
of <jsp:directive.page />. If two or more of them are defined, they
must all be the same.
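The agreement rule can be expressed as a small check. This is a hedged sketch: the class and method names are hypothetical, null stands for "not specified", and charset aliases (for example EUC_JP versus EUC-JP) are not normalized; only letter case is ignored.

```java
// Hypothetical sketch: every encoding that is specified must name the same charset.
class JspDocumentEncodingCheck {

    // prologEncoding:   encoding from the XML prolog, or null if none is declared
    // configEncoding:   <page-encoding> from a matching <jsp-property-group>, or null
    // pageEncodingAttr: pageEncoding attribute of <jsp:directive.page />, or null
    // Returns the agreed encoding, or null if none of the three is specified.
    static String agreedEncoding(String prologEncoding, String configEncoding, String pageEncodingAttr) {
        String agreed = null;
        for (String candidate : new String[] {prologEncoding, configEncoding, pageEncodingAttr}) {
            if (candidate == null) {
                continue; // not specified in this place
            }
            if (agreed == null) {
                agreed = candidate;
            } else if (!agreed.equalsIgnoreCase(candidate)) {
                // Two places name different encodings: treated as an error here.
                throw new IllegalStateException("conflicting encodings: " + agreed + " vs " + candidate);
            }
        }
        return agreed;
    }
}
```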

The workaround is easy: the user has to define the encoding in the
XML prolog until the bug is fixed in Tomcat. I suggest including this
bug in the release notes.
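The workaround can be sketched as follows, assuming the tool controls how the file is written: prepend an XML declaration that names the encoding, and encode the text with that same charset. PrologWorkaround is a hypothetical helper, not part of NetBeans.

```java
import java.nio.charset.Charset;

// Hypothetical helper illustrating the workaround: declare the encoding in the XML prolog.
class PrologWorkaround {

    // True if the text starts with an XML declaration ("<?xml" followed by whitespace).
    static boolean hasXmlDeclaration(String document) {
        return document.length() > 5
                && document.startsWith("<?xml")
                && Character.isWhitespace(document.charAt(5));
    }

    // Prepends <?xml version="1.0" encoding="..."?> unless a declaration is already present.
    static String withProlog(String document, Charset charset) {
        if (hasXmlDeclaration(document)) {
            return document; // an existing declaration is kept as-is in this sketch
        }
        return "<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>\n" + document;
    }

    // Encodes the text with the charset named in the new prolog. If the document already
    // had a declaration, the caller must make sure it names the same charset.
    static byte[] encode(String document, Charset charset) {
        return withProlog(document, charset).getBytes(charset);
    }
}
```

Writing the bytes with the declared charset is the important part: a prolog naming EUC-JP in front of UTF-8 bytes would only move the problem.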

I think the JSP compiler needs Xerces on the classpath as well; it
probably has the same problem the JSP parser had.
Comment 5 Petr Pisl 2004-06-28 10:34:43 UTC
I discussed the encoding of JSP documents with the experts Jan Luehe
and Mark Roth. The conclusion is that the user has to declare an
encoding in the XML prolog if she wants to define the encoding in
web.xml or in directive.page. If the JSP document does not have an
XML prolog, the document has to be in UTF-8. See
http://issues.apache.org/bugzilla/show_bug.cgi?id=29763.

So I suggest creating documentation for this and putting the XML
prolog <?xml version="1.0" encoding="UTF-8" ?> in our template.
Comment 6 Dan Kolar 2007-08-03 09:03:11 UTC
v