Bug 83321 - Validating an XML with UTF8_BOM fails.
Status: VERIFIED FIXED
Alias: None
Product: xml
Classification: Unclassified
Component: Code
Version: 5.x
Hardware: All All
Importance: P2 blocker
Assignee: Samaresh Panda
URL:
Keywords:
Duplicates: 26943
Depends on:
Blocks:
 
Reported: 2006-08-23 22:44 UTC by Jun Qian
Modified: 2008-03-27 21:36 UTC
CC List: 5 users

See Also:
Issue Type: DEFECT
Exception Reporter:


Attachments
UTF-8 Test Code Snapshot (58.57 KB, application/octet-stream)
2006-08-23 22:45 UTC, Jun Qian
UTF-8 XML View in NetBeans (31.65 KB, application/octet-stream)
2006-08-23 22:46 UTC, Jun Qian
UTF-8 XML View in Eclipse (43.53 KB, application/octet-stream)
2006-08-23 22:46 UTC, Jun Qian
UTF-8 XML File with BOM created using Windows Notepad (75 bytes, application/octet-stream)
2006-08-23 22:48 UTC, Jun Qian
xml support (462.59 KB, application/octet-stream)
2008-03-26 17:41 UTC, Samaresh Panda

Description Jun Qian 2006-08-23 22:44:21 UTC
NetBeans XML editors display the Unicode BOM at the beginning of files.
Validating an XML file with a UTF-8 BOM fails.

To reproduce, create an XML file with Windows Notepad and save it as UTF-8. Windows
adds the BOM to the file. Now open and validate the XML file in NetBeans.
Validation fails with an "illegal character in prolog" error.
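
The failure can be approximated outside the IDE. The sketch below assumes the file content reaches the parser as already-decoded characters, so the BOM arrives as a leading U+FEFF; the thread does not say how NetBeans feeds the validator, and the class name `BomPrologDemo` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not NetBeans code: checks whether text parses as XML
// when handed to the JDK parser as a character stream.
class BomPrologDemo {
    static boolean parses(String xmlText) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // A Reader gives the parser characters, so it cannot sniff a byte-level BOM.
            builder.parse(new InputSource(new StringReader(xmlText)));
            return true;
        } catch (SAXException e) {
            // Raised for content that is not well-formed, including text before the prolog.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `BomPrologDemo.parses` on a document with and without a leading `\uFEFF` shows the difference: the parser treats the BOM character as content in front of the XML declaration.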

This is background on a similar issue:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058

This supplies a solution:

http://koti.mbnet.fi/akini/java/unicodereader
Comment 1 Jun Qian 2006-08-23 22:45:08 UTC
Created attachment 33212
UTF-8 Test Code Snapshot
Comment 2 Jun Qian 2006-08-23 22:46:08 UTC
Created attachment 33213
UTF-8 XML View in NetBeans
Comment 3 Jun Qian 2006-08-23 22:46:30 UTC
Created attachment 33214
UTF-8 XML View in Eclipse
Comment 4 Jun Qian 2006-08-23 22:48:29 UTC
Created attachment 33215
UTF-8 XML File with BOM created using Windows Notepad
Comment 5 Petr Jiricka 2006-08-24 14:03:30 UTC
Another strange thing is that there is an extra character visible in the editor
before <?xml - I guess it should not be there, as it is a part of the header.
Comment 6 Jun Qian 2006-08-24 17:29:46 UTC
That is exactly what I am talking about. The extra character at the beginning is
the BOM. It is not being skipped during parsing, and therefore breaks both the
editor and the validation.
Comment 7 Petr Pisl 2006-09-11 16:51:17 UTC
I played with this today, and it's not so simple to fix. What is the
default encoding on your system?
Comment 8 Jun Qian 2006-09-12 08:42:37 UTC
Where can I find the default encoding?
Comment 9 Petr Pisl 2006-09-12 09:09:42 UTC
For example, when you start NetBeans, the console prints a line like this:

System Locale; Encoding = en_US (nb); UTF-8

The same line is written to the messages.log file (${user.dir}/var/log/messages.log).
You can attach this file to this issue.

Regards,
Petr
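
As an alternative to reading the console line, the default encoding can also be queried from Java itself. This is a minimal sketch using only standard JDK calls; the class name `DefaultEncoding` is hypothetical, and the value reported depends on the JDK version and how it was launched.

```java
import java.nio.charset.Charset;

// Hypothetical helper: reports the encoding the running JVM uses by default.
class DefaultEncoding {
    static String describe() {
        // file.encoding is the system property; defaultCharset() is the charset
        // the JVM actually applies when no charset is passed explicitly.
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }
}
```

Printing `DefaultEncoding.describe()` from the same JVM installation gives a value comparable to the one in the startup line.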
Comment 10 Jun Qian 2006-09-12 18:15:19 UTC
This is from my NetBeans console:
    System Locale; Encoding = en_US (nb); Cp1252
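
The default encoding matters because the three BOM bytes decode differently depending on the charset. The sketch below decodes them as UTF-8 and as windows-1252 (the JDK's canonical name for Cp1252). Whether the editor actually used the platform default to decode the file is an assumption; the thread does not say. The class name `BomAsText` is hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical helper: shows what the UTF-8 BOM bytes become after decoding.
class BomAsText {
    // The UTF-8 byte order mark: EF BB BF.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static String decode(Charset charset) {
        // Java's decoders do not strip the BOM; it is decoded like any other bytes.
        return new String(UTF8_BOM, charset);
    }
}
```

Decoded as UTF-8, the bytes become the single character U+FEFF; decoded as windows-1252, they become the three characters ï»¿. Either way, something visible or invisible ends up in front of `<?xml`.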
Comment 11 Jun Qian 2008-01-17 23:44:55 UTC
Another user is running into this BOM issue:

 
I am using the WS-A schema in a WSDL,

        <types>
            <xsd:schema targetNamespace="http://j2ee.netbeans.org/wsdl/dynamicPL">
                <xsd:import namespace="http://schemas.xmlsoap.org/ws/2004/08/addressing"
                schemaLocation="http://schemas.xmlsoap.org/ws/2004/08/addressing/"/>
               
                <xsd:import namespace="http://docs.oasis-open.org/wsbpel/2.0/process/executable"
                schemaLocation="http://docs.oasis-open.org/wsbpel/2.0/OS/process/executable/ws-bpel_executable.xsd"/>
                              
            </xsd:schema>
        </types>


The NB XSD parser is not able to parse the file. Whether NB downloads this WS-A schema file through the populate catalog
action or accesses it remotely, it seems to be unable to parse the schema file. When I force it to open in text mode
within NB, I see an additional character at the beginning, a dot. If I delete this "dot" in the file, it works fine.

The problem is that I am able to open the XSD in the browser, by opening
http://schemas.xmlsoap.org/ws/2004/08/addressing/. I can also open the file in Notepad and TextPad and don't see the
additional character. Maybe those editors choose to ignore the "dot".

To test whether NB has an issue, I used another remote XSD,
http://docs.oasis-open.org/wsbpel/2.0/OS/process/executable/ws-bpel_executable.xsd, and that works fine.

I am using Windows XP. Could the XSD model folks or the "catalog" feature folks look at this and confirm that there is
in fact no issue with NB?

thanks,
Kiran.
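
One way to confirm that the "dot" is a BOM is to look at the first bytes of the downloaded schema file rather than at its rendered text. The following is a rough sketch with a hypothetical class name `BomSniffer`; it only recognizes the UTF-8 and UTF-16 marks and has not been run against the schema mentioned above.

```java
// Hypothetical helper: names the byte order mark at the start of a byte array.
class BomSniffer {
    static String sniff(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        // Note: a UTF-32LE mark (FF FE 00 00) would also match this branch.
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return "none";
    }
}
```

Passing the first few bytes of a saved copy of the file to `BomSniffer.sniff` would show whether it begins with a mark, independent of how any editor chooses to display it.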
Comment 12 kiran_bhumana 2008-01-18 00:02:04 UTC
Changed it to P2 because we need this feature to work with WS-A addressing. Without this support, users will not
know why their import doesn't work. For BPEL this is a very important feature as part of dynamic addressing.

            <xsd:import namespace="http://schemas.xmlsoap.org/ws/2004/08/addressing"
            schemaLocation="http://schemas.xmlsoap.org/ws/2004/08/addressing/"/>

Comment 13 Samaresh Panda 2008-02-15 18:37:18 UTC
*** Issue 26943 has been marked as a duplicate of this issue. ***
Comment 14 Ken Frank 2008-03-13 19:53:30 UTC
I don't know if this applies to this issue, but since it was filed there has
been the new FEQ (FileEncodingQuery) encoding handling, related to the encoding
property of the project and to encoding handling for XML, JSP, and HTML files.

That is, the project or file encoding can't be assumed to be the encoding of the
user's locale, nor can it be assumed to be UTF-8, which is now the default project
encoding but which the user can change.

ken.frank@sun.com
Comment 15 Samaresh Panda 2008-03-18 21:15:54 UTC
Windows does add an optional BOM, and ideally this should be fixed by the JDK. The JDK issue
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058 has been marked as "closed, will not be fixed", and the
workaround asks applications to skip BOMs.

If it needs to be fixed by NetBeans at all, it should be fixed at a much higher level, perhaps in DataEditorSupport.
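
The "skip the BOM in the application" workaround mentioned in this comment can be sketched as follows. This is neither the fix attached to this issue nor the code from the unicodereader link in the description; it is a minimal illustration that handles only the UTF-8 mark, and the names `BomSkippingReader`, `open`, and `readAll` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: opens a UTF-8 reader that drops a leading EF BB BF.
class BomSkippingReader {
    static Reader open(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a single read() may return fewer.
        while (n < 3) {
            int r = in.read(head, n, 3 - n);
            if (r < 0) break;
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        // No BOM: push the bytes back so the caller sees the stream unchanged.
        if (!bom && n > 0) in.unread(head, 0, n);
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    // Convenience for experiments: decode a whole byte array through open().
    static String readAll(byte[] bytes) {
        try (Reader r = open(new ByteArrayInputStream(bytes))) {
            StringBuilder sb = new StringBuilder();
            for (int c; (c = r.read()) != -1; ) sb.append((char) c);
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A reader obtained this way can be handed to an editor or parser in place of a plain `InputStreamReader`, so that the mark never reaches the document text. A fuller version would also need to deal with UTF-16 marks and with files whose encoding is not UTF-8.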
Comment 16 Jaroslav Tulach 2008-03-19 20:27:21 UTC
DataEditorSupport knows nothing about XML. It knows nothing about encoding. We can indeed search for a general solution
and find someone other than Samaresh to implement it, but as the problem manifests in 99% of cases with XML files, I am
inclined to believe XML is the place that should fix it for 6.1.
Comment 17 Samaresh Panda 2008-03-26 17:40:23 UTC
jqian, kiran_bhumana, please try the new jar and let me know if it fixes this issue. Copy the jar to ide9/modules.
Comment 18 Samaresh Panda 2008-03-26 17:41:34 UTC
Created attachment 59153
xml support
Comment 20 Jun Qian 2008-03-27 21:36:52 UTC
Verified.

Product Version: NetBeans IDE Dev (Build 20080327180128)
Java: 1.6.0_03; Java HotSpot(TM) Client VM 1.6.0_03-b05
System: Windows XP version 5.1 running on x86; Cp1252; en_US (nb)
Userdir: C:\Documents and Settings\jqian\.netbeans\ide_dev