This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
Summary: | Charset & escape errors in dev_alpha_daily.ent breaks AU | ||
---|---|---|---|
Product: | www | Reporter: | Jesse Glick <jglick> |
Component: | Builds & Repositories | Assignee: | Michal Zlamal <mzlamal> |
Status: | RESOLVED FIXED | ||
Severity: | blocker | CC: | issues, jhuth, jtulach, mmirilovic, phrebejk, rnovak |
Priority: | P1 | ||
Version: | 3.x | ||
Hardware: | All | ||
OS: | All | ||
Issue Type: | DEFECT | Exception Reporter: | |
Bug Depends on: | |||
Bug Blocks: | 31961, 32497, 32774 | ||
Attachments: |
Proposed patch
Test script to exercise encoding aspects of <makeupdatedesc> etc.
Proposed patch for this one and #32497 |
Description
Jesse Glick
2003-03-12 19:40:52 UTC
As with issue #31961, I recommend that any task which generates XML also validate it and throw build errors in case of trouble. That would let us catch these kinds of problems much earlier. The buggy code in MakeLNBM.java is Jerry Huth's. Not sure who or what creates the entity file, though, and gets the charset - I think Ruda owns that (cannot find any task in nbbuild/antsrc however).

Accepting

Fixed MakeLNBM.java and updated MakeNBM.java. All (or at least almost all) variables are xmlEscaped before being printed. I will continue to work on MakeUpdateDesc and issue 31961.

Just realized that not only does the current dev_1.6_.xml (presumably before your fixes) lack an XML encoding declaration, but the external entities lack any XML declaration at all. These should *also* have an XML declaration with an explicit UTF-8 encoding. E.g. dev_alpha_daily.ent should begin with: <?xml version="1.0" encoding="UTF-8"?> (but no DOCTYPE of course). Otherwise it seems that an XML parser could interpret the external entity as being in some platform default encoding even if the master document has the encoding set correctly; at least I am not sure what http://www.w3.org/TR/REC-xml#sec-TextDecl implies, but to be safe, better to include the declaration.

Default charset encoding for external entity has been fixed on trunk and release35.

OK, then I guess this can be marked FIXED (with #31961 left open)?

OK, marking FIXED

Dev alpha AU still does not work. Excerpt:
Annotation: URL: http://www.netbeans.org/updates/alpha/dev_1.6_.xml
Annotation: Parse error in file http://www.netbeans.org/updates/alpha/dev_alpha_daily.ent line 2,492 column -1 (PUBLIC null)
org.xml.sax.SAXParseException: Character conversion error: "Illegal ASCII character, 0xef" (line number may be too low).

*** Issue 32105 has been marked as a duplicate of this issue. ***

Status? This is P1; our update server is broken.
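For illustration, the three points raised in the comments above (escaping values before printing them, starting an external entity with a UTF-8 text declaration, and checking generated XML at build time) might be sketched as follows. This is only a sketch under stated assumptions: the class `EntityWriterSketch` and its methods are hypothetical and are not the code in MakeLNBM.java or MakeUpdateDesc, and the check shown only tests well-formedness, not validity against a DTD.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not the actual nbbuild task code.
final class EntityWriterSketch {

    /** Escapes the five characters that are special in XML text and attribute values. */
    static String xmlEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Returns the bytes of an external entity: a text declaration naming UTF-8,
     * followed by the body, all encoded explicitly as UTF-8 rather than in the
     * platform default charset.
     */
    static byte[] withTextDecl(String body) {
        String text = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + body;
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Parses a complete XML document and reports whether it is well-formed. */
    static boolean isWellFormed(byte[] document) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(document));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false; // a build task would fail the build here instead
        }
    }
}
```

In an Ant task, a failed check of this kind would presumably be turned into a build error, so that a bad catalog is caught before it reaches the update server.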
nbcvs ci -m "#31962 Removed *daily* update centre catalog entity from 3.5 and dev alpha update centres"
cvs.binary commit: Examining .
Checking in 35_1.6_.xml;
/cvs/www/www/updates/alpha/35_1.6_.xml,v <-- 35_1.6_.xml
new revision: 1.3; previous revision: 1.2
done
Checking in dev_1.6_.xml;
/cvs/www/www/updates/alpha/dev_1.6_.xml,v <-- dev_1.6_.xml
new revision: 1.5; previous revision: 1.4
done

*** Issue 32426 has been marked as a duplicate of this issue. ***

Created attachment 9577
Proposed patch
Created attachment 9578
Test script to exercise encoding aspects of <makeupdatedesc> etc.
Not so complicated to solve; <makeupdatedesc> was simply not using UTF-8 consistently. With the patch, it seems to work, except for the missing distribution= attr on modules. Also <makenbm> and <makelocnbm> were not reading license.txt files in UTF-8, which caused problems for the Japanese license text. Patch also fixes that. Resulting master.xml viewed in Mozilla seems to be using Japanese and Cyrillic characters properly. Filed issue re. distribution attr separately. Also fairly easy to solve I think.

The patch looks fine. I'll request approval.

Approved for 3.5. Don't forget to commit the fix to trunk as well as release35.

Integrated to trunk and release35.

Is the daily entity back in the catalog?

Looks that way, though dev_alpha_daily.ent is still broken (I guess it needs to wait for the next build). Also dev_alpha_daily.xml still has the wrong root element.

Broken again, I presume; license files are again read in a non-UTF-8 encoding. See issue #32497.

Created attachment 9648
Proposed patch for this one and #32497
I found that xml/external/flute-sac-license.html is also in UTF-8, but this file has "<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" >" in its header, which is OK. So I think the fix isn't that easy. Jesse, could you advise?

For r35, any non-ASCII chars in the license files used in <makenbm> should be replaced with appropriate ASCII equivalents. For the trunk, we need to make sure that text files are all in UTF-8, and that HTML files are encoded according to their <meta> tag (but preferably in UTF-8). I will fix the unscrambler in the trunk to also expect UTF-8 text files and any encoding in HTML files (default UTF-8 if unspecified). This is less critical though.

See issue #32497 for details re. use of UTF-8 encoding in license files. Presuming stuff mentioned here is now fixed, though the daily NBMs are not currently included in the alpha AU server for some reason. |
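The rule proposed for the trunk (plain-text license files read as UTF-8, HTML license files read according to their <meta> tag, with UTF-8 as the default) could look roughly like the sketch below. The class `LicenseCharsetSketch`, its method names, the 1024-byte sniffing window, and the `.html` suffix test are all assumptions made for illustration; none of this is taken from <makenbm> or the unscrambler.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the actual <makenbm> code.
final class LicenseCharsetSketch {

    // Matches charset=NAME as it appears in a Content-Type <meta> tag.
    private static final Pattern META_CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)", Pattern.CASE_INSENSITIVE);

    /** Picks the charset declared near the start of an HTML file, or UTF-8 if none is usable. */
    static Charset charsetForHtml(byte[] raw) {
        // ISO-8859-1 maps every byte to a character, so sniffing cannot fail on odd bytes.
        int n = Math.min(raw.length, 1024);
        String head = new String(raw, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Unknown or malformed charset name: fall back to the default below.
            }
        }
        return StandardCharsets.UTF_8;
    }

    /** Decodes a license file: HTML by its declared charset, anything else as UTF-8. */
    static String decodeLicense(String fileName, byte[] raw) {
        boolean html = fileName.toLowerCase(Locale.ROOT).endsWith(".html");
        Charset cs = html ? charsetForHtml(raw) : StandardCharsets.UTF_8;
        return new String(raw, cs);
    }
}
```

Note that this sketch trusts the <meta> tag even when it disagrees with the file's real encoding, which is exactly the situation described for flute-sac-license.html; resolving such a mismatch would still need a decision about which one to correct.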