Tested build: Nevada ML RC1 build (030607_02)
When a JSP file is created with the New Wizard on Japanese Solaris, the default encoding is always "EUC-JP". There are three Japanese locales: ja (ja_JP.eucJP), ja_JP.PCK, and ja_JP.UTF-8, so the default encoding should follow the system locale (the locale in which the IDE was started). I think the following code is related to this problem.
web/core/src/org/netbeans/modules/web/core/jsploader/JspDataObject.java
    public static String getDefaultEncoding() {
        String language = System.getProperty("user.language");
        if ("ja".equals(language)) { // NOI18N
            // we are Japanese
            if (org.openide.util.Utilities.isUnix())
                return "EUC-JP"; // NOI18N
            else
                return "Shift_JIS"; // NOI18N
        }
        else
            // we are English
            return "ISO-8859-1"; // NOI18N
        // per JSP 1.2 specification, the default encoding is always ISO-8859-1,
        // regardless of the setting of the file.encoding property
        //return System.getProperty("file.encoding", "ISO-8859-1");
    }
Also, when the IDE is run in a Chinese locale, the default encoding is "ISO-8859-1". To support Chinese locales, I think the default encoding should follow the system locale, so I would like to request that this problem be fixed in NB3.5.1 and the Nevada (NB3.5) patch release. In Nevada ML (en/ja), we will document this in the ja release notes.
This sounds like a reasonable suggestion. I talked about the default encoding before with Shioda, and we agreed on the behavior that's currently in place. I am happy to change this if you think the behavior should be different. However, there is yet another issue: this encoding must match the <%@page contentType="text/html;charset=..."%> directive in the page. BTW, isn't this currently a bug? Shouldn't the file web/core/src/org/netbeans/modules/web/core/resources/templates/JSP.template also be localized, with the changed encoding? Should I add this file to l10n.list?
The system locale does not necessarily equal the encoding given by "charset=" in a JSP file, so I think it is difficult to derive the charset encoding from the system locale. Also, I think the user can add the appropriate encoding to <%@page contentType="text/html"%> in the template file, and that is the safe approach. I think changing the default encoding should be enough.
Well, the truth is that if the value of the Encoding property on a JSP file (in the "Text" tab) differs from the value specified in the @page directive, then the user will run into serious problems when trying to run the page. The users have come across this before and it resulted in hard-to-debug bugs and we were getting lots of complaints. So I believe the IDE must make sure that these two values are in sync after the page is created. Many users are not savvy enough to set the "charset=" value themselves.
Can the existing RFEs about encoding and charset be looked at in context of this issue ? These RFEs have been around awhile and was wondering if the whole issue of encoding and charset for jsp and webapps could be looked at again, perhaps as part of fixing this issue ? The RFEs 18651, 18652, 7427, 20161. ken.frank@sun.com
I think it would be better that the encoding of "charset=" in the <%@page contentType="text/html;charset=..."%> directive is set dynamically by system locale (we do not have template files for each locale). Is it possible that the encoding of "charset" is identified from system locale?
I agree this would be the ideal solution. However, it is not easy to do in the NB 3.5 code base. Would it be ok to do it in NB 4.0, and think of a simpler solution for NB 3.5 ? Can you think of a simpler solution that would also work (esp. for the Chinese version) ?
We have two ideas and would like #1.
1. default encoding - system locale
   charset - keep the present behavior (do not set "charset=")
2. default encoding - UTF-8
   charset - UTF-8
Have you verified that solution #1 will work? So if I have a file encoded in some exotic encoding (and the page contains multibyte characters), and set the encoding property to this value, but I don't specify the charset= part in the page, will it be possible to successfully compile and deploy the page? In the past we had problems in this scenario.
In the case of #1, we will document a notice (urging users to specify the appropriate encoding in "charset=") in the ja and zh release notes. The current IDE also does not set "charset=", so I think a description in the release notes is enough.
Sorry, that seems like a half-baked solution to me. I don't understand how it can work. I now think it would be better to do a proper fix, i.e. set the "charset=" clause dynamically based on the JVM encoding. I implemented this change in the NetBeans trunk; it should appear in tomorrow's build. If you think this is a showstopper, then we can test this fix properly and put it into the ML release. I think this solution is better than having the file property and the page directive out of sync.
Checking in JspDataObject.java;
/cvs/web/core/src/org/netbeans/modules/web/core/jsploader/JspDataObject.java,v <-- JspDataObject.java
new revision: 1.26; previous revision: 1.25
done
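To illustrate what "set the charset= clause dynamically" could look like, here is a minimal sketch. It is not the code that was checked in: the class name PageDirective and the method forEncoding are hypothetical, and omitting "charset=" for ISO-8859-1 is an assumption based on that being the JSP default.

```java
class PageDirective {
    /**
     * Builds a page directive whose charset matches the given file encoding.
     * Hypothetical helper: for ISO-8859-1 (the JSP default) or a missing
     * encoding, no "charset=" clause is emitted. That choice is an assumption.
     */
    static String forEncoding(String encoding) {
        if (encoding == null || encoding.equalsIgnoreCase("ISO-8859-1")) {
            return "<%@page contentType=\"text/html\"%>";
        }
        return "<%@page contentType=\"text/html;charset=" + encoding + "\"%>";
    }

    public static void main(String[] args) {
        // file.encoding reflects the locale the JVM (and so the IDE) was started in.
        String jvmEncoding = System.getProperty("file.encoding", "ISO-8859-1");
        System.out.println(forEncoding(jvmEncoding));
    }
}
```

The point of deriving both values from one string is that the file's encoding property and the directive cannot drift apart at creation time.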
Add I18N keyword.
Below is the result of our test with the new jsp.jar.
locale          charset         default encoding
----------------------------------------------------
C               n/a             ISO-8859-1
ja              eucJP           eucJP
ja_JP.PCK       PCK             PCK
ja_JP.UTF-8     UTF-8           UTF-8
Shift JIS       MS932           MS932
(ja windows)
ja_JP.eucJP     EUC-JP-LINUX    ja_JP.eucJP
(ja Linux)
zh_CN.EUC       gb2312          gb2312
zh_CN.GBK       GBK             GBK
zh_CN.UTF-8     UTF-8           UTF-8
euro locales    ISO-8859-15     ISO-8859-15
Each default encoding works correctly, but the charset values for the ja and euro locales are not correct. For example:
charset eucJP -> EUC-JP
charset PCK -> Shift_JIS
charset MS932 -> Windows_31J
charset EUC-JP-LINUX -> EUC-JP
charset ISO8859-15 -> ISO-8859-15
To get the correct charset name, could you please use java.nio.charset.Charset? For details, please see the attached file. However, java.nio.charset.Charset does not seem to produce Windows_31J for MS932, so please change MS932 to Windows_31J forcibly.
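The attached sample is not reproduced in this thread. As a rough sketch of the idea, the lookup via java.nio.charset.Charset might look as follows; the class name CanonicalCharsetName and the method canonicalName are hypothetical, and which aliases resolve depends on the JDK version and its installed charsets.

```java
import java.nio.charset.Charset;

class CanonicalCharsetName {
    /**
     * Returns the JDK's canonical name for an encoding alias, or the alias
     * unchanged when the JDK does not know it or the name is malformed.
     */
    static String canonicalName(String alias) {
        try {
            if (Charset.isSupported(alias)) {
                return Charset.forName(alias).name();
            }
        } catch (IllegalArgumentException e) {
            // Thrown for null or syntactically illegal names; fall through.
        }
        return alias;
    }

    public static void main(String[] args) {
        // Results for the Japanese aliases vary between JDK releases.
        String[] samples = {"eucJP", "PCK", "MS932", "ISO8859-15"};
        for (String s : samples) {
            System.out.println(s + " -> " + canonicalName(s));
        }
    }
}
```

Falling back to the alias keeps the behavior no worse than before on JDKs that do not recognize a given name.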
Created attachment 10767 [details] sample code of java.nio.charset.Charset
Yuko, thanks for your advice. (I don't know much about the Charset class.) I implemented your suggestion and sent you the new jar file by e-mail. I don't understand the part about Windows_31J though. When tested with your patch, it seems that Java does not consider "Windows_31J" to be a valid encoding. Rather, "windows-31j" is the canonical name, and that's also what I get if I feed "MS932" into your test. Note the difference in case, and also the hyphen instead of the underscore. So is "Windows_31J" really correct? Shouldn't it be "Windows-31J"? BTW, here is the code I used to produce the default encoding, can you please review it?
    public static String getDefaultEncoding() {
        String language = Locale.getDefault().getLanguage();
        if (language.startsWith("en")) {
            // we are English
            return "ISO-8859-1"; // NOI18N
            // per JSP 1.2 specification, the default encoding
            // is always ISO-8859-1, regardless of the setting
            // of the file.encoding property
            // return System.getProperty("file.encoding",
            // "ISO-8859-1");
        }
        return canonizeEncoding(System.getProperty(
            "file.encoding", "ISO-8859-1"));
    }
    private static final String CORRECT_WINDOWS_31J = "Windows-31J";
    private static String canonizeEncoding(
            String encodingAlias) {
        if (Charset.isSupported(encodingAlias)) {
            Charset cs = Charset.forName(encodingAlias);
            String name = cs.name();
            if (name.equalsIgnoreCase(CORRECT_WINDOWS_31J)) {
                return CORRECT_WINDOWS_31J;
            }
            return name;
        } else {
            return encodingAlias;
        }
    }
I guess "Windows_31J" is a typo. I don't think it is necessary to convert "windows-31j" to "Windows-31J". I've also checked the attached program:
- jdk1.4.1_02: when I feed it MS932, Charset.isSupported() returns false.
- jdk1.4.2 (beta): when I feed it MS932, Charset.isSupported() returns true, and Charset.name() returns "windows-31j".
I think this is a bug in jdk1.4.1, because windows-31j should be returned as the charset name of MS932. Also, I can't find "EUC-JP-LINUX" as a charset name at IANA, but it is returned in the Japanese locale on RH7.2. The IANA website is here: http://www.iana.org/assignments/character-sets
As a workaround for the above two things, would you review whether the following can be implemented?
- If "file.encoding" is "MS932", charset is set to "windows-31j"
- If "file.encoding" is "EUC-JP-LINUX", charset is set to "EUC-JP".
Yuko, please correct me if anything is incorrect. Thank you. Keiichi
By the way, I don't know why Japanese charsets are as complex as this; for the other East Asian locales, the current implementation seems to work fine in jdk1.4.1_02.
Created attachment 10784 [details] Patch for the whole issue against the release35 branch
Fixed in the NetBeans trunk. Will attach the new refined patch, as the previous one didn't quite work.
Created attachment 10833 [details] New refined diff of the changes (in trunk)
I'm sorry for my late verification. I've verified in the latest Q-build with:
- j2sdk 1.4.1_02
- j2sdk 1.4.2
- j2sdk 1.4.2_01
The behavior of the Charset class changed in the 1.4.2 release. Would you add the following name conversions in the same way as the existing ones?
x-EUC-CN -> GB2312
eucJP-open -> EUC-JP
x-euc-jp-linux -> EUC-JP
- x-EUC-CN: Charset.name() returns x-EUC-CN, but it is not a valid charset name. I've just filed this as a Java bug (4914869). The charset name should be "GB2312".
- eucJP-open: this was added in 1.4.2 as a file encoding name. System.getProperty("file.encoding") returns "eucJP-open" as the encoding name, but it is not supported by the Charset class, and "eucJP-open" is not a valid charset name. The charset name should be "EUC-JP".
- x-euc-jp-linux: this is not a registered charset name, but it is returned by the JDK. When this value is returned, the charset name in the JSP should be "EUC-JP".
I'm sorry for these additional conversions. I didn't expect the return value from the JDK to change between versions. Would you add them to the current fix?
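The name conversions requested in this thread can be collected into one lookup table applied around the Charset canonicalization. The following is only a sketch, not the attached patch: the class JspCharsetFixups and method toPageCharset are hypothetical names, and the table contains just the five mappings mentioned in the comments above.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class JspCharsetFixups {
    // JDK-reported names (lower-cased) mapped to the names wanted in "charset=".
    private static final Map<String, String> OVERRIDES = new HashMap<>();
    static {
        OVERRIDES.put("ms932", "windows-31j");
        OVERRIDES.put("euc-jp-linux", "EUC-JP");
        OVERRIDES.put("x-euc-jp-linux", "EUC-JP");
        OVERRIDES.put("eucjp-open", "EUC-JP");
        OVERRIDES.put("x-euc-cn", "GB2312");
    }

    /** Maps a non-null file.encoding value to a charset name for the page directive. */
    static String toPageCharset(String fileEncoding) {
        // 1. Check the raw value first, so the result does not depend on
        //    whether this JDK happens to support the alias.
        String fixed = OVERRIDES.get(fileEncoding.toLowerCase(Locale.ROOT));
        if (fixed != null) {
            return fixed;
        }
        // 2. Otherwise canonicalize through Charset where possible.
        String name = fileEncoding;
        try {
            if (Charset.isSupported(fileEncoding)) {
                name = Charset.forName(fileEncoding).name();
            }
        } catch (IllegalArgumentException e) {
            // Illegal charset name: keep the raw value.
        }
        // 3. Check the canonical name too (e.g. a JDK that reports x-EUC-CN).
        fixed = OVERRIDES.get(name.toLowerCase(Locale.ROOT));
        return fixed != null ? fixed : name;
    }

    public static void main(String[] args) {
        String enc = System.getProperty("file.encoding", "ISO-8859-1");
        System.out.println(enc + " -> " + toPageCharset(enc));
    }
}
```

Checking the table both before and after canonicalization is a design choice made here to cover the JDK differences described above, where the same alias is unsupported in one release and canonicalized to an unregistered name in another.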
Created attachment 11927 [details] Additional changes fixes Keiichi's review
Fixed also in Nevada Patch 1 and in Arrow.
I have verified these fixes using Nevada Patch1 and j2sdk 1.4.1_05, not Arrow. I checked that the charset of the @page directive and the encoding property in the Text tab are set properly for a JSP file, as follows.
OS        locale.lang        encoding type (text and @page)
====================================================
Japan
Sol8      ja                 EUC-JP       -- OK
          ja_JP.PCK          Shift_JIS    -- OK
          ja_JP.UTF-8        UTF-8        -- OK
Win2k     -                  Windows-31j  -- OK
Linux     (default)          EUC-JP       -- OK
----------------------------------------------------
China
Solaris   zh                 GB2312       -- OK
          zh_CN.GB18030      GB2312       -- OK
          zh_CN.UTF-8        UTF-8        -- OK
          zh_CN.GBK          GB2312       -- OK
Win2k     -                  GB2312       -- OK
----------------------------------------------------
Taiwan
Sol8      zh_TW              Big5         -- OK
----------------------------------------------------
France
Sol8      fr_FR.ISO8859-1    ISO8859-1    -- OK *
----------------------------------------------------
German
Sol8      de_DE.ISO8859-1    ISO8859-1    -- OK *
----------------------------------------------------
* no "charset=" in @page directive as the default
Excellent. Since we've done some changes in the encoding handling area in the NetBeans 3.6 code base, I suggest this is also retested on the current NetBeans trunk builds.
Guys, I'm reopening this because I'm not convinced we are doing the right thing. Maybe we are, and it's just difficult to follow the behaviour from the issue entries. So please bear with me... If everything is OK, just update the issue with the details and we can close it again. Here is the problem I have: the page encoding in the JSP is used for two things. Firstly, it's used to read in the JSP file when the container compiles it. Secondly, it is used for the HTTP response, in case the response encoding has not been set explicitly. Because of this second use of the page encoding, I don't think that the approach that was chosen here (based on the last comments from Tsuruta-san) where we set the encoding to be what the system default is is the correct one. We chose it on the basis of what the server where the development is done supports, but it is potentially used to create the HTTP response, and the response will be read by many different types of hosts (especially when it's windows character sets) . The JSP loader's charset and the page encoding have to match and *can* be set to something that's suitable for the development host, but the response charset needs to be set to UTF-8 (which works on all the browsers since a few versions back), or if you're supporting PDAs, to something that's determined dynamically because of the client. Further, there is no reason to set the page encoding to anything but UTF-8 if you're only going to work on the JSPs in the IDE. The only reason to set it to something else is if you're going to use another tool to edit the JSPs and it doesn't support UTF-8. So in fact, Yuko's suggestion (2) was the best one. I hope I've explained this clearly. FWIW, I think we need a spec for all the i18n features in webapps, so that we can review it all in one go.
Reassigning to our new i18n guy.
Fixed in 3.6. You can reopen this bug in bugtraq. The solution is described in #7427: the default encoding for JSP pages is now set to UTF-8.
A couple of questions:
- The last comment says it is OK to open this back up in bugtraq? Should bugs in this area be opened in BT?
- Could someone summarize the solution, since there are many comments on this bug? Knowing the specific spec of the solution will help us with testing. (It is not stated whether the solution in 7427 was used, and that issue is also complex and has many comments, so a summary of both issues would help us verify them.)
- Ana's comments below raised a concern about solving this for the HTTP response as well. Does the solution to this issue address that concern?
ken.frank@sun.com 1/17/2003
> - last comment says its ok to open back into bugtraq ? Should > bugs in this area be opened in BT ? No, Petr meant to say that if the desire is to continue tracking this issue for the purpose of the Arrow release, then a bugtraq bug should be filed (as all Arrow bugs are tracked in bugtraq). All bugs against open source trunk / NB 3.6 should continue to be tracked in issuezilla. I'll let Petr speak to the other questions.
I have a couple of questions.
- If, for some reason, a user needs to create JSPs with a non-UTF-8 encoding, is there any way to do it?
- It seems that JSPs which were created on another IDE in a non-UTF-8 encoding display multibyte chars garbled. Is there any way for the IDE to read a JSP which is not saved in UTF-8 and has no page encoding or charset setting?
Created attachment 12971 [details] collapsed multibyte chars are placed between title tag in JSP
The Encoding property for JSP files was removed. The editor (BaseJspEditorSupport) asks the JSP parser for the encoding when loading and saving files. If the encoding is supported, the file is loaded or saved in that encoding; otherwise the user is informed that the file will be loaded in UTF-8, and on saving the IDE asks the user whether to save in UTF-8 or not to save at all. When the user creates a new JSP page, the page has pageEncoding="UTF-8". The JSP parser is the same parser that is used in Tomcat 5, so if Tomcat recognizes the encoding, the IDE does too. The parser obtains the encoding from the <page-encoding> element of a <jsp-property-group> in web.xml, from the pageEncoding attribute of the page directive, or from the contentType attribute. So the encoding information is carried either by the page itself or by the deployment descriptor. If you take a page from another IDE and the JSP parser cannot recognize its encoding, the parser returns the default, which is ISO-8859-1. You can do a simple test: put the page on a standalone server in a web module that does not define <page-encoding> in the deployment descriptor. The page will not be displayed correctly.
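The lookup described above can be sketched as a simple fallback chain. This is not the Tomcat parser: the class JspEncodingResolver is hypothetical, the regular expressions only handle simple single-directive pages, and the precedence (web.xml, then pageEncoding, then the contentType charset, then ISO-8859-1) is assumed from the order in which the sources are listed in the comment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JspEncodingResolver {
    // Matches pageEncoding="..." inside a <%@ page ... %> directive.
    private static final Pattern PAGE_ENCODING = Pattern.compile(
        "<%@\\s*page\\b[^%]*?\\bpageEncoding\\s*=\\s*[\"']([^\"']+)[\"']");
    // Matches the charset part of contentType="...;charset=..." in a page directive.
    private static final Pattern CONTENT_TYPE_CHARSET = Pattern.compile(
        "<%@\\s*page\\b[^%]*?\\bcontentType\\s*=\\s*[\"'][^\"']*?charset\\s*=\\s*([^\"';\\s]+)");

    /**
     * @param webXmlPageEncoding value of a matching page-encoding element, or null
     * @param jspSource          text of the JSP page
     */
    static String resolve(String webXmlPageEncoding, String jspSource) {
        if (webXmlPageEncoding != null) {
            return webXmlPageEncoding;
        }
        Matcher m = PAGE_ENCODING.matcher(jspSource);
        if (m.find()) {
            return m.group(1);
        }
        m = CONTENT_TYPE_CHARSET.matcher(jspSource);
        if (m.find()) {
            return m.group(1);
        }
        return "ISO-8859-1"; // default when nothing is declared
    }

    public static void main(String[] args) {
        String page = "<%@page contentType=\"text/html;charset=EUC-JP\"%>\n<html></html>";
        System.out.println(resolve(null, page));
    }
}
```

A page with no directive and no descriptor entry falls through to ISO-8859-1, which is why a UTF-8 or Shift_JIS file without any declaration shows garbled multibyte characters.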
I added the <page-encoding> element to web.xml, but multibyte chars could not be loaded in the IDE for JSPs without a page directive. Could you please check whether this test method is proper or not?
1. Added the following element to web.xml and saved it.
    <jsp-property-group>
        <url-pattern>*.jsp</url-pattern>
        <page-encoding>UTF-8</page-encoding>
    </jsp-property-group>
2. Saved JSPs with no tags on nb36, and reopened them.
It seems that all JSPs with no page directive are saved and loaded in ISO-8859-1, even if I create the JSP on nb3.6 and not on another IDE. If we consider the include directive, the user needs a way to specify the encoding type in the IDE without a page directive. At least, a JSP needs to be saved in UTF-8 by default if Yuko-san's 2nd proposal is followed.
Test: Created JSPs on another IDE and displayed them on nb3.6
Saved as   | Page Tag | Result
----------------------------
Shift_JIS  | Added    | OK
           | NotAdded | *2
----------------------------
UTF-8      | Added    | OK
           | NotAdded | *1
----------------------------
EUC-JP     | Added    | OK
           | NotAdded | *3
nb36:bld200401151900
*1 - Loaded, but multibyte chars are garbled
*2, *3 - JSP does not load in the Editor.
workaround: Right-click in the Editor and select "Clone Document".
You are right. The problem is that the parser doesn't recognize the encoding in the web.xml.
*** Issue 35332 has been marked as a duplicate of this issue. ***
Hi, I found out where the problem with the parser is. The parser uses a cache for its data. The information from the web.xml file is stored in this cache as well, but the parser doesn't know about changes in web.xml, so it doesn't update the information in the cache. I fixed the problem and committed the fix to the trunk. So when the web.xml file contains something like
    <jsp-config>
        <jsp-property-group>
            <url-pattern>*.jsp</url-pattern>
            <page-encoding>ISO-8859-2</page-encoding>
        </jsp-property-group>
    </jsp-config>
and the JSP file sets neither pageEncoding nor charset, then the encoding from web.xml is used. Of course, the JSP file has to match the URL pattern. There is still a minor issue where the wrong cache data are used. It occurs in this case:
1) Start editing a JSP page that sets neither pageEncoding nor charset, while the web.xml file sets an encoding for this page in the <jsp-config> element.
2) Change the encoding in web.xml.
3) Save the change in the web.xml file.
4) Save the changes in the JSP file that you started editing before saving web.xml.
As a result, the old encoding is used for saving, but if you save the page again, the right encoding is used. So I think this is not as important a problem as the original issue, and I have set the priority to P3.
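One common way to keep such a cache from serving stale descriptor data is to store a modification stamp next to each cached value and reload when the stamp changes. The sketch below shows that general technique only; it is not the fix that was committed, and the class StampedCache and its parameters are hypothetical.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToLongFunction;

class StampedCache<K, V> {
    private static final class Entry<T> {
        final long stamp;
        final T value;
        Entry(long stamp, T value) { this.stamp = stamp; this.value = value; }
    }

    private final ToLongFunction<K> stampOf;  // e.g. File::lastModified
    private final Function<K, V> loader;      // e.g. parse web.xml
    private final Map<K, Entry<V>> entries = new HashMap<>();

    StampedCache(ToLongFunction<K> stampOf, Function<K, V> loader) {
        this.stampOf = stampOf;
        this.loader = loader;
    }

    /** Returns the cached value, reloading it if the key's stamp has changed. */
    synchronized V get(K key) {
        long stamp = stampOf.applyAsLong(key);
        Entry<V> e = entries.get(key);
        if (e == null || e.stamp != stamp) {
            e = new Entry<>(stamp, loader.apply(key));
            entries.put(key, e);
        }
        return e.value;
    }

    public static void main(String[] args) {
        // Placeholder loader; a real one would read page-encoding from the file.
        StampedCache<File, String> cache =
            new StampedCache<File, String>(File::lastModified, f -> "ISO-8859-2");
        System.out.println(cache.get(new File("web.xml")));
    }
}
```

Because the stamp is checked on every lookup, a save of web.xml is picked up by the next read without any explicit notification, provided the file system's timestamp actually changed.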
I have tested JSPs created on nb36 with the following steps.
1. Mount a new dir.
2. Create a web module.
3. Add <jsp-config>... to web.xml, and save it.
4. Create a JSP which has include directives for a JSP and an HTML file.
    <jsp:include page="test.jsp" flush="true" />
    <%@ include file="incHTML.html" %>
5. Create the included JSP and HTML, which contain a ja message.
6. Create a JSP segment and a JSP document with the template wizard.
7. Load the created JSPs and HTML.
8. Execute the JSP created at step 4.
OS      locale.lang    encoding (@page)   result (execution=e, load=l)
====================================================
Sol8    ja             EUC-JP             l -- OK *1
                                          e -- OK *2
        ja_JP.PCK      Shift_JIS          l -- OK *1
                                          e -- OK *2
        ja_JP.UTF-8    UTF-8              l -- OK *1
                                          e -- OK *2
Win2k   -              Windows-31j        l -- OK *1
Linux   (default)      EUC-JP             e -- OK *1
                                          l -- OK *2
*1 ... JSP, JSP document, and HTML displayed multibyte chars properly in the IDE, but the JSP Segment did not.
*2 ... JSP executed properly, but values set by the included HTML are garbled.
Created JSPs with various page encoding types on another IDE.
- Set up web.xml.
- Specified the @page directive in the JSP, which has include directives for HTML and a JSP segment.
- The included JSP and HTML contain a ja message.
OS      locale.lang    jsp encoding/charset   result (execution=e, load=l)
====================================================
Sol8    ja             EUC-JP                 l -- X *3
                                              e -- X *4
        ja_JP.PCK      Shift_JIS              l -- X *3
                                              e -- X *4
        ja_JP.UTF-8    UTF-8                  l -- OK
                                              e -- X *2
Win2k   -              Windows-31j            l -- X *6
Linux   (default)      EUC-JP                 e -- OK *5
                                              l -- OK *6
*2 - JSP executed properly, but values set by the included HTML are garbled.
*3 - JSP displayed the ja message garbled in the IDE, but the HTML did not.
*4 - JSP displayed the ja message garbled in the browser as ?????.
*5 - JSP is not loaded properly in the IDE (ja is garbled), but the HTML is.
*6 - JSP had HTTP error 500. Please see the attached log for more detail.
Created attachment 13738 [details] logs on browser when executing jsp
All JSPs created on another IDE with various page encodings are displayed properly in the IDE. I had verified with the wrong version of web.xml in the previous comment. I am sorry for the confusion. Here is the correct result of loading JSPs in the IDE:
OS      locale.lang    jsp file encoding & charset   result of loading
====================================================
Sol8    ja             EUC-JP                        OK
                       Shift_JIS                     OK
                       UTF-8                         OK
                       PCK                           OK
Win2k   -              EUC-JP                        OK
                       Shift_JIS                     OK
                       UTF-8                         OK
                       PCK                           OK
Linux   (default)      EUC-JP                        OK
                       Shift_JIS                     OK
                       UTF-8                         OK
                       PCK                           OK
- <url-pattern> of web.xml is "*.jsp".
Ok, so is this issue fixed, or is there any pending item? Can we mark as fixed?
I don't think so. There are still two issues:
1. Bad encoding recognition by the JSP parser. If the user has defined both pageEncoding and charset in a JSP page, the parser returns the encoding from charset, not from pageEncoding.
2. Wrong cache data are used. It occurs in this case:
1) Start editing a JSP page that sets neither pageEncoding nor charset, while the web.xml file sets an encoding for this page in the <jsp-config> element.
2) Change the encoding in web.xml.
3) Save the change in the web.xml file.
4) Save the changes in the JSP file that you started editing before saving web.xml.
As a result, the old encoding is used for saving, but if you save the page again, the right encoding is used.
We can close this bug (if everybody agrees) and file two new issues for the problems I describe above.
Mika (mtsuruta) and I agree to close this bug. Please allow me to ask a question: the two listed issues do not seem to be a problem in my environment. Probably I do not understand them well enough.
Case 1: Is it acceptable to set two different encodings in charset and pageEncoding? I can't think of any case in which the user needs to set two different encodings in charset and pageEncoding.
Case 2: I've tested by following the steps of case 2 above, but the encoding works fine.
step 1 -> 2 -> 3 -> 4 : JSP is saved correctly
step 1 -> 2 -> 4 : JSP is saved in the previous encoding. I think that is because web.xml is not saved.
In my environment, the cache works fine for this implementation.
Case 1) See my last comment in issue #40780.
Case 2) I have some debug messages in the save and open methods. If you do this quickly, the bug happens. The cache is automatically refreshed after 2 seconds.
Finally, I filed a new issue which is connected with encoding: issue #40791. So I am closing this bug as fixed and we can follow the new issues.
verified