This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 40780

Summary: I18N - pageEncoding property is always UTF-8
Product: javaee Reporter: Keiichi Oono <keiichio>
Component: Code Assignee: issues@javaee <issues>
Status: RESOLVED INVALID    
Severity: blocker CC: jf4jbug
Priority: P2 Keywords: I18N
Version: 3.x   
Hardware: Sun   
OS: Solaris   
Issue Type: DEFECT Exception Reporter:

Description Keiichi Oono 2004-03-05 07:54:19 UTC
NetBeans Q-Build 200402241900

To reproduce:
  - create web module
  - edit web.xml
    add <page-encoding> element to specify
    encoding (e.g. EUC-JP), save and close
  - create JSP

JSP is generated as follows:
------
<%@page contentType="text/html; charset=EUC-JP"%>
<%@page pageEncoding="UTF-8"%>
<html>...
------

I think charset, pageEncoding, and <page-encoding>
need to be the same. pageEncoding should be set
from <page-encoding>, the same way charset is.
Could you share your thoughts on this?
Comment 1 Keiichi Oono 2004-03-05 07:54:46 UTC
add I18N keyword
Comment 2 Petr Pisl 2004-03-05 12:28:35 UTC
Here are the relevant parts of the JSP 2.0 specification:

---------------------------------------------------
JSP.3.3.4
It is a translation-time error to name different encodings in the
pageEncoding attribute of the page directive of a JSP page and in a
JSP configuration element matching the page. It is also a
translation-time error to name different encodings in the prolog /
text declaration of the document in XML syntax and in a JSP
configuration element matching the document. It is legal to name the
same encoding through multiple mechanisms.

JSP.4.1
The page character encoding is the character encoding in which the JSP
page or tag file itself is encoded. The character encoding is
determined for each file separately, even if one file includes another
using the include directive
...
For JSP pages in standard syntax, the page character encoding is
determined from the following sources:
- A JSP configuration element page-encoding value whose URL pattern
matches the page.
- The pageEncoding attribute of the page directive of the page. It is a
translation-time error to name different encodings in the
pageEncoding attribute of the page directive of a JSP page and in a
JSP configuration element whose URL pattern matches the page.
- The charset value of the contentType attribute of the page
directive. This is used to determine the page character encoding if
neither a JSP configuration element page-encoding nor the pageEncoding
attribute are provided.
- If none of the above is provided, ISO-8859-1 is used as the default
character encoding.

JSP.4.2
The initial response character encoding is set to the CHARSET value of
the contentType attribute of the page directive. If the page doesn't
provide this attribute or the attribute doesn't have a CHARSET value,
the initial response character encoding is determined as follows:
- For documents in XML syntax, it is UTF-8.
- For JSP pages in standard syntax, it is the character encoding
specified by the pageEncoding attribute of the page directive or by a
JSP configuration element page-encoding whose URL pattern matches the
page. Only the character encoding specified for the requested page is
used; the encodings of files included via the include directive are
not taken into consideration. If there's no such specification, no
initial response character encoding is passed to
ServletResponse.setContentType() - the ServletResponse object's
default, ISO-8859-1, is used.
---------------------------------------------------

So as you can see, pageEncoding and charset have different
purposes and can differ.
- page-encoding and pageEncoding specify the encoding of the file
itself. If neither is defined for a file, the value of charset is
used, or else the default encoding (ISO-8859-1).
- The value of charset is used for the response encoding, so it can
differ from the encoding of the file. If no charset value is
defined, page-encoding or pageEncoding is used for the response
encoding, or else the default encoding.
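
The rules quoted from JSP.3.3.4, JSP.4.1, and JSP.4.2 can be sketched
in Java roughly as follows. This is only an illustration for pages in
standard syntax, not the code used by NetBeans or Tomcat. The class
and method names are hypothetical, a null argument stands for "not
specified", and comparing encoding names with equalsIgnoreCase is a
simplification that ignores charset aliases.

```java
final class JspEncodingSketch {
    // Default named in JSP.4.1 and JSP.4.2 of the quoted specification text.
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    // Page character encoding (JSP.4.1): the encoding of the JSP file itself.
    // configPageEncoding: <page-encoding> from a matching JSP configuration element
    // pageEncodingAttr:   pageEncoding attribute of the page directive
    // contentTypeCharset: charset value of the contentType attribute
    static String pageEncoding(String configPageEncoding,
                               String pageEncodingAttr,
                               String contentTypeCharset) {
        // JSP.3.3.4: naming different encodings in both places is a
        // translation-time error; modeled here as an exception.
        if (configPageEncoding != null && pageEncodingAttr != null
                && !configPageEncoding.equalsIgnoreCase(pageEncodingAttr)) {
            throw new IllegalStateException(
                "page-encoding (" + configPageEncoding
                + ") and pageEncoding (" + pageEncodingAttr + ") differ");
        }
        if (configPageEncoding != null) return configPageEncoding;
        if (pageEncodingAttr != null) return pageEncodingAttr;
        if (contentTypeCharset != null) return contentTypeCharset;
        return DEFAULT_ENCODING;
    }

    // Initial response character encoding (JSP.4.2), standard syntax only.
    static String responseEncoding(String configPageEncoding,
                                   String pageEncodingAttr,
                                   String contentTypeCharset) {
        if (contentTypeCharset != null) return contentTypeCharset;
        if (pageEncodingAttr != null) return pageEncodingAttr;
        if (configPageEncoding != null) return configPageEncoding;
        // Nothing specified: the ServletResponse default applies.
        return DEFAULT_ENCODING;
    }

    public static void main(String[] args) {
        // A page with charset=EUC-JP and pageEncoding=UTF-8, no <page-encoding>:
        System.out.println(pageEncoding(null, "UTF-8", "EUC-JP"));     // prints UTF-8
        System.out.println(responseEncoding(null, "UTF-8", "EUC-JP")); // prints EUC-JP
    }
}
```

In this sketch charset only influences the file encoding when neither
page-encoding nor pageEncoding is given, which is why charset and
pageEncoding may legally differ.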

I'm closing this bug as invalid.
There is a separate bug in Tomcat's parser; see issue #40791.
Comment 3 Keiichi Oono 2004-03-05 16:23:39 UTC
Thank you very much for your detailed clarification. I understand this
should be closed as invalid. But I'm still confused about how NB
currently handles the following three settings:
   charset, pageEncoding, and <page-encoding>
Please allow me to add comments in issue #40791.