This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 31962

Summary: Charset & escape errors in dev_alpha_daily.ent breaks AU
Product: www Reporter: Jesse Glick <jglick>
Component: Builds & Repositories Assignee: Michal Zlamal <mzlamal>
Status: RESOLVED FIXED    
Severity: blocker CC: issues, jhuth, jtulach, mmirilovic, phrebejk, rnovak
Priority: P1    
Version: 3.x   
Hardware: All   
OS: All   
Issue Type: DEFECT Exception Reporter:
Bug Depends on:    
Bug Blocks: 31961, 32497, 32774    
Attachments: Proposed patch
Test script to exercise encoding aspects of <makeupdatedesc> etc.
Proposed patch for this one and #32497

Description Jesse Glick 2003-03-12 19:40:52 UTC
[dev mar 11, JDK 1.4.1 or 1.4.2 b18] I cannot
connect to the daily alpha NBM server. After
adding some better debug info I get this:

Annotation: URL:
http://www.netbeans.org/updates/alpha/dev_1.6_.xml
Annotation: Parse error in file
http://www.netbeans.org/updates/alpha/dev_alpha_daily.ent
line 4,324 column -1 (PUBLIC null)
org.xml.sax.SAXParseException: Whitespace required
before attributes.

Indeed, the file is quite corrupt in this line:

OpenIDE-Module-Long-Description="????????????
"???????????? JSP" ??????????????????????????
?????????????????????? ??????????????
?????????????? JSP. ???????? ????????????
???????????? ???? ?????????? ????????????????????
????????????????????????, ????
???????????????????????? ??????
???????????????????? ??????????????????
???????????????????? ??
???????????????????????????? ???????? JSP, ??
?????????? ?????? ????????????????????
?????????????? JSP ?????????????? ????????????????."

I don't know if anyone is using
MakeUpdateDesc.java, but that is definitely buggy:

java.io.PrintWriter pw = new java.io.PrintWriter
(new java.io.OutputStreamWriter (os));
pw.println ("<?xml version='1.0'?>");

This should (1) set the encoding on the
OutputStreamWriter to UTF-8 explicitly, since
otherwise it will use the platform default,
probably ISO-8859-1 or something else which cannot
handle e.g. Russian characters correctly; and
(2) set the XML encoding to UTF-8, since again
otherwise that will default to some unspecified
value according to the client platform.
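
A minimal sketch of what points (1) and (2) could look like together. This is not the actual MakeUpdateDesc.java fix; the class name, the method name writeCatalog, and the <catalog> element are hypothetical, and StandardCharsets assumes a modern JDK (on JDK 1.4 the charset name "UTF-8" would be passed as a string instead).

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the real MakeUpdateDesc task.
final class Utf8XmlWriterSketch {

    /**
     * Writes a tiny XML document to os as UTF-8, independent of the
     * platform default encoding. The description is assumed to be
     * XML-escaped already.
     */
    static void writeCatalog(OutputStream os, String description) {
        // (1) Name the encoding explicitly on the writer.
        PrintWriter pw = new PrintWriter(
                new OutputStreamWriter(os, StandardCharsets.UTF_8));
        // (2) Declare the same encoding in the XML declaration.
        pw.println("<?xml version='1.0' encoding='UTF-8'?>");
        pw.println("<catalog>" + description + "</catalog>");
        pw.flush();
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        // Cyrillic sample text, written with Unicode escapes.
        writeCatalog(buf, "\u041f\u0440\u0438\u0432\u0435\u0442");
        System.out.println(buf.size() + " bytes written");
    }
}
```

The point of the sketch is that the bytes on disk and the declared encoding agree, so a parser on any client platform decodes them the same way.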

MakeNBM.java and MakeLNBM.java look right to me
w.r.t. encoding.

Also of interest: dev_alpha_daily.ent at least
does seem to be using the new <l10n> tags. Petr
Hr. take notice - I said it did not, but there it
is, in the dev version (not in 3.5). However it is
invalid according to the DTD and clearly cannot
work since the required 'distribution' attr
pointing to the NBM is not there! Somebody must
have introduced this stuff recently and not
actually tested it.

Digging a little deeper, I see a problem even in
translatedfiles, independent of the problems with
the encoding being mangled. Run

ant -f translatedfiles/build.xml nbm

and examine

translatedfiles/nb/web-jspparser/Info_ru/info.xml

It is in UTF-8 but is malformed:

OpenIDE-Module-Long-Description="...stuff in
Cyrillic... "... JSP" ..."

The double quotes are not being escaped. The
reason is that in MakeLNBM.java, writeProp forgets
to call xmlEscape on val.
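
For illustration, a sketch of the kind of escaping that is missing. The name xmlEscape comes from the report above, but its body here is an assumption and not the actual code in MakeLNBM.java; the helper attribute is hypothetical.

```java
// Hypothetical illustration; the real xmlEscape in MakeLNBM.java may differ.
final class XmlEscapeSketch {

    /** Replaces the five XML special characters with entity references. */
    static String xmlEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Formats name="value" with the value escaped for a double-quoted attribute. */
    static String attribute(String name, String val) {
        return name + "=\"" + xmlEscape(val) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(
                attribute("OpenIDE-Module-Long-Description", "Editor \"JSP\" & more"));
    }
}
```

With escaping applied, an embedded "JSP" in the description no longer terminates the attribute value early.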
Comment 1 Jesse Glick 2003-03-12 19:42:05 UTC
As with issue #31961, I recommend that any task which generates XML
also validate it and throw build errors in case of trouble. That would
let us catch these kinds of problems much earlier.
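
A rough sketch of such a check, using the standard JAXP SAX parser. It only tests well-formedness, not validity against a DTD, and the class and method names are hypothetical; a real Ant task would presumably throw a build error rather than return a boolean.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of a post-generation check.
final class XmlCheckSketch {

    /** Returns true if the bytes parse as well-formed XML, false on a parse error. */
    static boolean isWellFormed(byte[] xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler rethrows fatal (well-formedness) errors.
            parser.parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false; // malformed input
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e); // parser setup problem
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bad = "<m d=\"a \"JSP\" b\"/>".getBytes(StandardCharsets.UTF_8);
        System.out.println("well-formed: " + isWellFormed(bad));
    }
}
```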
Comment 2 Jesse Glick 2003-03-12 19:43:41 UTC
The buggy code in MakeLNBM.java is Jerry Huth's. Not sure who or what
creates the entity file, though, and gets the charset - I think Ruda
owns that (cannot find any task in nbbuild/antsrc however).
Comment 3 rbalada 2003-03-13 08:32:28 UTC
Accepting
Comment 4 rbalada 2003-03-13 17:21:38 UTC
Fixed MakeLNBM.java and updated MakeNBM.java. All (or at least almost
all) variables are xmlEscaped before being printed. I will continue to
work on MakeUpdateDesc and issue 31961.
Comment 5 Jesse Glick 2003-03-13 19:21:18 UTC
Just realized that not only does the current dev_1.6_.xml (presumably
before your fixes) lack an XML encoding declaration, but the external
entities lack any XML declaration at all. These should *also* have an
XML declaration with an explicit UTF-8 encoding. E.g.
dev_alpha_daily.ent should begin with:

<?xml version="1.0" encoding="UTF-8"?>

(but no DOCTYPE of course). Otherwise it seems that an XML parser
could interpret the external entity as being in some platform default
encoding even if the master document has the encoding set correctly;
at least I am not sure what

http://www.w3.org/TR/REC-xml#sec-TextDecl

implies, but to be safe, better to include the declaration.
Comment 6 rbalada 2003-03-13 19:33:16 UTC
default charset encoding for external entity has been fixed on trunk
and release35
Comment 7 Jesse Glick 2003-03-13 20:33:29 UTC
OK, then I guess this can be marked FIXED (with #31961 left open)?
Comment 8 rbalada 2003-03-14 10:32:09 UTC
OK, marking FIXED
Comment 9 Jesse Glick 2003-03-17 18:44:45 UTC
Dev alpha AU still does not work. Excerpt:

Annotation: URL: http://www.netbeans.org/updates/alpha/dev_1.6_.xml
Annotation: Parse error in file
http://www.netbeans.org/updates/alpha/dev_alpha_daily.ent line 2,492
column -1 (PUBLIC null)
org.xml.sax.SAXParseException: Character conversion error: "Illegal
ASCII character, 0xef" (line number may be too low).
Comment 10 Petr Hrebejk 2003-03-19 09:30:23 UTC
*** Issue 32105 has been marked as a duplicate of this issue. ***
Comment 11 Jesse Glick 2003-03-20 18:21:12 UTC
Status? This is P1; our update server is broken.
Comment 12 rbalada 2003-03-20 21:37:43 UTC
nbcvs ci -m "#31962 Removed *daily* update centre catalog entity from
3.5 and dev alpha update centres"  
cvs.binary commit: Examining .
Checking in 35_1.6_.xml;
/cvs/www/www/updates/alpha/35_1.6_.xml,v  <--  35_1.6_.xml
new revision: 1.3; previous revision: 1.2
done
Checking in dev_1.6_.xml;
/cvs/www/www/updates/alpha/dev_1.6_.xml,v  <--  dev_1.6_.xml
new revision: 1.5; previous revision: 1.4
done
Comment 13 Jesse Glick 2003-03-27 18:47:15 UTC
*** Issue 32426 has been marked as a duplicate of this issue. ***
Comment 14 Jesse Glick 2003-03-27 19:35:20 UTC
Created attachment 9577
Proposed patch
Comment 15 Jesse Glick 2003-03-27 19:35:56 UTC
Created attachment 9578
Test script to exercise encoding aspects of <makeupdatedesc> etc.
Comment 16 Jesse Glick 2003-03-27 19:37:59 UTC
Not so complicated to solve; <makeupdatedesc> was simply not using
UTF-8 consistently. With the patch, it seems to work, except for the
missing distribution= attr on modules.

Also <makenbm> and <makelocnbm> were not reading license.txt files in
UTF-8, which caused problems for the Japanese license text. Patch also
fixes that. Resulting master.xml viewed in Mozilla seems to be using
Japanese and Cyrillic characters properly.
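
As an illustration of the license.txt part, a sketch of reading a text stream as UTF-8 regardless of the platform default. This is not the code from the attached patch; the class and method names are hypothetical and the try-with-resources syntax assumes a modern JDK.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not the patched <makenbm> code.
final class LicenseReaderSketch {

    /** Reads the whole stream as UTF-8 text and closes it. */
    static String readUtf8(InputStream in) {
        StringBuilder sb = new StringBuilder();
        // Naming the charset avoids falling back to the platform default.
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Japanese sample text, written with Unicode escapes.
        byte[] bytes = "\u30e9\u30a4\u30bb\u30f3\u30b9".getBytes(StandardCharsets.UTF_8);
        System.out.println(readUtf8(new ByteArrayInputStream(bytes)).length() + " chars read");
    }
}
```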
Comment 17 Jesse Glick 2003-03-27 20:06:34 UTC
Filed issue re. distribution attr separately. Also fairly easy to
solve I think.
Comment 18 rbalada 2003-03-28 11:32:54 UTC
The patch looks fine. I'll request approval.
Comment 19 _ ttran 2003-03-28 12:54:25 UTC
approved for 3.5.  Don't forget to commit the fix to trunk as well as
release35
Comment 20 rbalada 2003-03-28 14:43:51 UTC
Integrated to trunk and release35.
Comment 21 Jesse Glick 2003-03-28 15:34:05 UTC
Is the daily entity back in the catalog? Looks that way, though
dev_alpha_daily.ent is still broken (I guess it needs to wait for the
next build).

Also dev_alpha_daily.xml still has the wrong root element.
Comment 22 Jesse Glick 2003-03-31 14:44:42 UTC
Broken again, I presume; license files are again read in a non-UTF-8
encoding. See issue #32497.
Comment 23 Michal Zlamal 2003-04-02 10:24:27 UTC
Created attachment 9648
Proposed patch for this one and #32497
Comment 24 Michal Zlamal 2003-04-02 13:56:18 UTC
I found that the xml/external/flute-sac-license.html is also in UTF-8, but this file has "<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" >" in its header, which is OK. So I think the fix isn't that easy. 
Jesse could you advise?
Comment 25 Jesse Glick 2003-04-02 19:35:31 UTC
For r35, any non-ASCII chars in the license files used in <makenbm>
should be replaced with appropriate ASCII equivalents. For the trunk,
we need to make sure that text files are all in UTF-8, and that HTML
files are encoded according to their <meta> tag (but preferably in UTF-8).

I will fix the unscrambler in the trunk to also expect UTF-8 text
files and any encoding in HTML files (default UTF-8 if unspecified).
This is less critical though.
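
A rough sketch of the rule described for the trunk: HTML is decoded according to the charset named in its <meta> tag, with UTF-8 as the default when none is found. This is an assumption about how such a check could be written, not the unscrambler's actual code; the class name, the regex, and the 2048-byte scan window are all hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; a real HTML parser would be more robust.
final class HtmlCharsetSketch {

    // Matches e.g. charset=iso-8859-1 inside a <meta> Content-Type value.
    private static final Pattern META_CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset declared near the top of the file, or UTF-8 if none. */
    static Charset detect(byte[] html) {
        int len = Math.min(html.length, 2048);
        // ISO-8859-1 maps every byte to a char, so the ASCII markup survives.
        String head = new String(html, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: fall back to the default.
            }
        }
        return StandardCharsets.UTF_8;
    }

    /** Decodes the HTML bytes using the detected charset. */
    static String decode(byte[] html) {
        return new String(html, detect(html));
    }

    public static void main(String[] args) {
        byte[] html = "<html><head><title>License</title></head></html>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(detect(html));
    }
}
```

Plain text files would skip the detection step and simply be read as UTF-8.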
Comment 26 Jesse Glick 2003-04-02 23:57:58 UTC
See issue #32497 for details re. use of UTF-8 encoding in license files.

Presuming stuff mentioned here is now fixed, though the daily NBMs are
not currently included in the alpha AU server for some reason.