This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
I'm testing FFJ 4.0 FCS RC2, running in the ja locale on a pseudo-localized FFJ whose help files and JavaHelp control files have been pseudo-localized to emulate real localization. This process has been in use for some time. A given pseudo-localized help set has all of its help files in docs/ja, except for the _ja.hs file, which is in docs. The localized xx_ja.jar file lives in modules/docs.

1. Given this setup, invoking help->contents produces the exception message about help and terminal output about an illegal character in tomcat-idx.xml (ide.log attached). Specifically, these messages:

a. Parsing failed for nbdocs:/org/netbeans/modules/tomcat/tomcat40/docs/ja/tomcat-idx.xml
Exception caught while parsing nbdocs:/org/netbeans/modules/tomcat/tomcat40/docs/ja/tomcat-idx.xml
java.io.CharConversionException: Illegal XML character 0xa4

b. java.lang.NullPointerException: <no message>
java.lang.NullPointerException
[catch] at org.netbeans.core.JavaHelp.displayInJHelp(JavaHelp.java:598)
at org.netbeans.core.JavaHelp.showHelp(JavaHelp.java:274)
at org.netbeans.core.Help$HelpCtxProcessor$Presenter.actionPerformed(Help ...... (see ide.log)

The help viewer then appears empty.

2. help->help contents does show the chosen help set (except the tomcat one).

3. Context help (for example, the help button on the file->new wizard) shows an empty help viewer, with the same exception and terminal messages.

4. The attached tomcat-idx.xml has just one <indexitem text="XXtext" with a multibyte character, to show that any multibyte character in this file, at the usual location where localization happens, causes the problem.

5. If there is no multibyte character in the localized idx.xml, everything works.

==> Bob May suggested filing here after checking the help documents and items in the docs areas. Assuming this is not a localization process issue, this could be a blocking issue for a localized release.

ken.frank@sun.com
Created attachment 5743 [details] ide.log
Created attachment 5744 [details] help xml file with one line with multibyte
Created attachment 5745 [details] dir tree of the tomcat40 javahelp with pseudo localized files
tomcat-idx.xml is malformed XML, as any real XML parser would have told you. Binky take note - the JavaHelp built-in parser does not report this stuff well.

Doc writers please take note before filing bugs in core/javahelp:

1. When there is an apparent parse problem in a help set that you think is incorrect, please first try the help set in the standalone JavaHelp viewer. If the problem exists there too, assign straight to Binky - nothing to do with the NB integration.

2. Always validate XML files using a real XML parser. For example, ensure that the XML modules are installed in NetBeans/FFJ; also that the entity catalogs mounted include the defs for the JavaHelp DTDs (mount NetBeans catalog, this works); and right-click the XML file and choose Validate. You will then see parser diagnostics from Xerces, which will be more helpful than the cheap & dirty parser in JH, which was not designed to report errors meaningfully.

According to Xerces, there are three problems with your tomcat-idx.xml, making it not only invalid for the DTD but not well-formed XML:

1. If you include the special <?xml?> processing directive in an XML file, it must be the very first characters in the document. Otherwise it would be impossible to detect encodings reliably. You have a space and newline before it.

2. I don't know what the <b> tag is supposed to be, but it is not declared in the JavaHelp Index DTD and cannot be used. Delete occurrences of this tag.

3. One of the <b> tags is "closed" by another <b> tag rather than </b>.
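For anyone who wants to run the well-formedness part of that check outside the IDE, here is a minimal sketch using the JAXP parser that ships with the JDK. It is not the Validate action described above and does not validate against the JavaHelp Index DTD (so an undeclared <b> tag would only be caught if it is also mismatched). The class name WellFormedCheck, the method firstError, and the sample documents are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    /** Returns null if the bytes parse as well-formed XML, otherwise the first diagnostic. */
    static String firstError(byte[] xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Resolve any external DTD to an empty stream so nothing is fetched over the network.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the attached file, not its real contents.
        String bad = " \n<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<index version=\"1.0\"/>";
        String good = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<index version=\"1.0\"/>";
        System.out.println("bad:  " + firstError(bad.getBytes(StandardCharsets.UTF_8)));
        System.out.println("good: " + firstError(good.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The exact wording of the diagnostic depends on the parser implementation, so the sketch only distinguishes "no error" from "some error with a position".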
cc'ing Leslie
Indeed, the helpset really did appear fine in the build, and the problems surfaced *only when a multibyte character was added*; so, in some sense, it was "tested in a helpset viewer". Many thanks for the information about parsing. Apparently the malformed XML problem was caught only when the multibyte character was added. The writer has been out but should be back to take care of this, as needed.
Bob: right, the JH built-in parser is rather simplistic, so it does not signal a direct error for the <?xml?> directive in the wrong position. My guess is that it *does* take advantage of the fact that a well-formed XML file has an <?xml?> directive in the correct position in order to detect the file encoding. This is a somewhat delicate process (see the XML 1.0 specification) since the encoding is listed in the file itself, and some unusual encodings (UTF-16 or EBCDIC, for example) mean the <?xml?> declaration is not in ASCII. So it is not feasible to check the encoding unless the first characters are literally "<?xml" (meaning you can match against some known translations of these five characters into the weird encodings). When you put the " \n" at the beginning of the file, the JH parser decides "oh, file does not begin with recognizable XML declaration, assuming default ASCII (?) encoding" - which is probably harmless unless you are including non-ASCII characters, which it then cannot interpret.
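The byte-matching idea described above can be sketched roughly as follows, in the spirit of the encoding autodetection appendix of the XML 1.0 specification. This is not the JavaHelp parser's actual code, which this thread only guesses at. The class XmlEncodingSniffer and its method names are hypothetical, and the UCS-4 cases are left out for brevity.

```java
import java.nio.charset.StandardCharsets;

public class XmlEncodingSniffer {

    /** True if data begins with the given unsigned byte values. */
    static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    /** Guesses the encoding family from the first bytes of an XML document. */
    static String guessFamily(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8 (byte order mark)";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE (byte order mark)";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE (byte order mark)";
        // "<?" as two 16-bit code units, big-endian then little-endian.
        if (startsWith(data, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (startsWith(data, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";
        // "<?xm" in any ASCII-compatible encoding; the declaration names the exact one.
        if (startsWith(data, 0x3C, 0x3F, 0x78, 0x6D)) return "ASCII-compatible";
        // "<?xm" in EBCDIC.
        if (startsWith(data, 0x4C, 0x6F, 0xA7, 0x94)) return "EBCDIC";
        // What happens next is parser-specific; the specification's default is UTF-8.
        return "no recognizable declaration";
    }

    public static void main(String[] args) {
        String decl = "<?xml version=\"1.0\" encoding=\"EUC-JP\"?>";
        System.out.println(guessFamily(decl.getBytes(StandardCharsets.US_ASCII)));
        // Same declaration preceded by a space and newline, as in the attached file.
        System.out.println(guessFamily((" \n" + decl).getBytes(StandardCharsets.US_ASCII)));
    }
}
```

With the leading " \n", none of the signatures match, so a parser built this way never gets as far as reading the encoding name from the declaration and has to fall back to whatever default it uses.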
Thanks for the clarification, Jesse. That makes perfect sense.
Consistent use of the I18N keyword.
Resolved for 3.4.x or earlier, no new info since then -> verified.
Resolved for 3.4.x or earlier, no new info since then -> closing.