This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 23494 - I18N - help>contents not appear if tomcat40 tomcat-idx.xml has multibyte
Summary: I18N - help>contents not appear if tomcat40 tomcat-idx.xml has multibyte
Status: CLOSED INVALID
Alias: None
Product: platform
Classification: Unclassified
Component: Help System
Version: 3.x
Hardware: Sun Solaris
Importance: P2 blocker
Assignee: Jesse Glick
URL:
Keywords: I18N
Depends on:
Blocks:
 
Reported: 2002-05-14 23:36 UTC by Ken Frank
Modified: 2008-12-23 11:47 UTC
CC List: 5 users

See Also:
Issue Type: DEFECT
Exception Reporter:


Attachments
ide.log (18.76 KB, text/plain)
2002-05-14 23:40 UTC, Ken Frank
help xml file with one line with multibyte (5.83 KB, text/plain)
2002-05-14 23:42 UTC, Ken Frank
dir tree of the tomcat40 javhelp with pseudo localized files (159.50 KB, application/octet-stream)
2002-05-14 23:44 UTC, Ken Frank

Description Ken Frank 2002-05-14 23:36:22 UTC
I'm testing FFJ 4.0 FCS RC2.

I'm running in ja locale on a pseudo localized
ffj that has the help files and javahelp control
files pseudo localized to emulate real
localization.

This process has been in use for some time,
so a given pseudo-localized help set will
have all its help files in docs/ja, except for
the _ja.hs file, which is in docs.

And the xx_ja.jar localized file lives in
modules/docs

1. Given this setup, invoking Help > Contents
produces the exception message about help and the
terminal output about the illegal character in
tomcat-idx.xml (ide.log attached).

Specifically, these messages:
a.
Parsing failed for nbdocs:/org/netbeans/modules/tomcat/tomcat40/docs/ja/tomcat-idx.xml
Exception caught while parsing nbdocs:/org/netbeans/modules/tomcat/tomcat40/docs/ja/tomcat-idx.xml
java.io.CharConversionException: Illegal XML character 0xa4

b.
java.lang.NullPointerException: <no message>
java.lang.NullPointerException
[catch] at org.netbeans.core.JavaHelp.displayInJHelp(JavaHelp.java:598)
at org.netbeans.core.JavaHelp.showHelp(JavaHelp.java:274)
at org.netbeans.core.Help$HelpCtxProcessor$Presenter.actionPerformed(Help
...... (see ide.log)


And the help viewer appears empty.

2. help->help contents does show the chosen
help set (except the tomcat one)

3. Context help (for example, the Help button on
the File > New wizard) shows an empty
help viewer, with the same exception/terminal
messages.


4. The attached tomcat-idx.xml has just
one <indexitem text="XXtext" entry with a
multibyte character, to show that any
multibyte character in this file, at the usual
location where localization happens,
causes the problem.

5. If there is no multibyte character in the
localized idx.xml, then everything works.

==> Bob May suggested filing here after checking
the help documents and items in docs areas.

Assuming this is not a localization process issue,
this could be a blocking issue for the localized
release.

ken.frank@sun.com
Comment 1 Ken Frank 2002-05-14 23:40:34 UTC
Created attachment 5743
ide.log
Comment 2 Ken Frank 2002-05-14 23:42:29 UTC
Created attachment 5744
help xml file with one line with multibyte
Comment 3 Ken Frank 2002-05-14 23:44:52 UTC
Created attachment 5745
dir tree of the tomcat40 javhelp with pseudo localized files
Comment 4 Jesse Glick 2002-05-15 02:59:38 UTC
tomcat-idx.xml is malformed XML, as any real XML parser would have
told you. Binky take note - the JavaHelp built-in parser does not
report this stuff well.

Doc writers please take note before filing bugs in core/javahelp:

1. When there is an apparent parse problem in a help set that you
think is incorrect, please first try the help set in the standalone
JavaHelp viewer. If the problem exists there too, assign straight to
Binky - nothing to do with the NB integration.

2. Always validate XML files using a real XML parser. For example,
ensure that the XML modules are installed in NetBeans/FFJ; also that
the entity catalogs mounted include the defs for the JavaHelp DTDs
(mount NetBeans catalog, this works); and right-click the XML file and
choose Validate. You will then see parser diagnostics from Xerces,
which will be more helpful than the cheap & dirty parser in JH, which
was not designed to report errors meaningfully.
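
For anyone who prefers to run the check outside the IDE, the same kind of diagnostic can be obtained from the standard JAXP parser. The following is only a minimal sketch: the class name WellFormedCheck and the method firstProblem are made up for illustration and are not part of NetBeans or JavaHelp. It checks well-formedness only; it does not validate against the JavaHelp DTDs, and it deliberately resolves every external entity to an empty stream so that no DTD is fetched.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of NetBeans or JavaHelp.
final class WellFormedCheck {

    /**
     * Parses the given bytes with the default JAXP parser.
     * Returns null if the document is well-formed, otherwise a short
     * description of the first fatal problem the parser reported.
     */
    static String firstProblem(byte[] xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Assumption: we only care about well-formedness here, so any
            // external DTD is replaced by an empty stream (no network access).
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            // DefaultHandler rethrows fatal errors and silently ignores the
            // rest, which keeps the parser from printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column "
                    + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return e.toString();
        }
    }
}
```

A caller could pass in the result of java.nio.file.Files.readAllBytes for the file in question and print whatever firstProblem returns.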

According to Xerces, there are three problems with your
tomcat-idx.xml, making it not only invalid for the DTD but not
well-formed XML:

1. If you include the special <?xml?> processing directive in an XML
file, it must be the very first characters in the document. Otherwise
it would be impossible to detect encodings reliably. You have a space
and newline before it.

2. I don't know what the <b> tag is supposed to be, but it is not
declared in the JavaHelp Index DTD and cannot be used. Delete
occurrences of this tag.

3. One of the <b> tags is "closed" by another <b> tag rather than </b>.
Comment 5 Patrick Keegan 2002-05-15 11:41:41 UTC
cc'ing Leslie
Comment 6 Bob May 2002-05-15 16:37:05 UTC
Indeed, the helpset really did appear fine in the build, and there were
problems that surfaced *only when a multibyte character was added*; so,
in some sense, it was "tested in a helpset viewer"; but many thanks for
the information about parsing. Apparently the malformed XML problem was
caught only when the multibyte character was added.
The writer has been out but should be back to take care of this, as
needed.
Comment 7 Jesse Glick 2002-05-15 18:27:16 UTC
Bob: right, the JH built-in parser is rather simplistic, so it does
not signal a direct error for the <?xml?> directive in the wrong position.

My guess is that it *does* take advantage of the fact that a
well-formed XML file has an <?xml?> directive in the correct position
in order to detect the file encoding. This is a somewhat delicate
process (see XML 1.0 specification) since the encoding is listed in
the file itself, and some unusual encodings (UTF-16 or EBCDIC for
example) actually make the <?xml?> declaration not be in ASCII, so it
is not feasible to check the encoding unless the first characters are
literally "<?xml" (meaning you can match against some known
translations of these five characters into the weird encodings).

When you put the " \n" at the beginning of the file, the JH parser
decides "oh, file does not begin with recognizable XML declaration,
assuming default ASCII (?) encoding" - which is probably harmless
unless you are including non-ASCII characters which it then cannot
interpret.
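
The guess above can be illustrated with a small sketch. This is not the JavaHelp parser's actual code; the class EncodingSniffer, the method sniff, and the fallback parameter are all hypothetical, and the sketch only handles ASCII-compatible encodings (it does not attempt the UTF-16 or EBCDIC cases mentioned above). It shows how a simplistic parser that insists on the file starting with "<?xml" would miss the declared encoding as soon as a space and newline precede the declaration.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from JavaHelp.
final class EncodingSniffer {

    // Matches encoding="..." or encoding='...' inside the XML declaration.
    private static final Pattern ENCODING = Pattern.compile(
            "encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /**
     * Returns the encoding named in the XML declaration, "UTF-8" if a
     * declaration is present without an encoding, or the given fallback
     * if the data does not literally begin with "<?xml".
     */
    static String sniff(byte[] data, String fallback) {
        // Decode the first bytes as ISO-8859-1 so each byte maps to one char.
        int n = Math.min(data.length, 200);
        String head = new String(data, 0, n, StandardCharsets.ISO_8859_1);
        if (!head.startsWith("<?xml")) {
            // Anything before the declaration, even " \n", lands here.
            return fallback;
        }
        int end = head.indexOf("?>");
        if (end < 0) {
            return fallback;
        }
        Matcher m = ENCODING.matcher(head.substring(0, end));
        return m.find() ? m.group(1) : "UTF-8";
    }
}
```

With a fallback of "US-ASCII", a file that starts with the declaration and names EUC-JP yields "EUC-JP", while the same file with " \n" in front yields "US-ASCII", after which any byte above 0x7f cannot be interpreted.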
Comment 8 Bob May 2002-05-15 18:29:25 UTC
Thanks for the clarification, Jesse. That makes perfect sense.
Comment 9 Jesse Glick 2002-12-23 16:37:21 UTC
Consistent use of the I18N keyword.
Comment 10 Quality Engineering 2003-07-01 15:53:27 UTC
Resolved for 3.4.x or earlier, no new info since then -> verified.

Comment 11 Quality Engineering 2003-07-01 16:21:10 UTC
Resolved for 3.4.x or earlier, no new info since then -> closing.