It seems that this bug has resurfaced (#83321). Trying to validate an XML schema file (right-click | Validate XML) containing the 3-byte UTF-8 BOM yields
XML validation started.
Content is not allowed in prolog. 
XML validation finished.
Clicking on "Content is not allowed in prolog." opens the XML file and highlights its first line.
Other UTF encodings with or without BOM do work.
The exact NetBeans version used is NetBeans IDE 7.3 RC1 (Build 201301240957)
Another observation: if no encoding is specified (e.g. only <?xml version="1.0"?>) then the validation also works for UTF-8 with BOM.
My previous comment actually is not true.
So here are some results:
              | encoding specified | no encoding specified
UTF-8         | OK                 | OK
UTF-8 BOM     | 1)                 | 1)
UTF-16 LE BOM | OK                 | 2) #)
UTF-16 BE BOM | OK                 | 2) #)
[UTF-16 LE    | 1) *)              | 3) +) ]
[UTF-16 BE    | OK                 | 1) +) ]
1) "Content is not allowed in prolog."
2) "Premature end of file."
3) "The markup in the document preceding the root element must be well-formed."
*) Garbage when opened in NB (wrong encoding detected)
#) Nothing when opened in NB
+) Probably detected as UTF-8 (spaces between characters)
The declaration used was: <?xml version="1.0" encoding="$encoding"?>
I am of course unsure which of these combinations should have worked ...
If I understand the W3C requirements correctly, UTF-8 with or without a BOM and UTF-16 with a BOM must be supported; UTF-16 without a BOM is illegal.
Only documents not encoded in UTF-8 or UTF-16 seem to be required to provide correct encoding information. [http://www.w3.org/TR/REC-xml/#charencoding]
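For reference, these are the standard Unicode byte-order marks the table above is exercising (a hedged summary; the class below is just an illustration, not NetBeans code):

```java
import java.nio.charset.StandardCharsets;

// Standard Unicode byte-order marks (BOMs) at the start of a file:
// UTF-8:     EF BB BF
// UTF-16 BE: FE FF
// UTF-16 LE: FF FE
public class BomBytes {
    public static final byte[] UTF8_BOM    = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };
    public static final byte[] UTF16BE_BOM = { (byte) 0xFE, (byte) 0xFF };
    public static final byte[] UTF16LE_BOM = { (byte) 0xFF, (byte) 0xFE };

    public static void main(String[] args) {
        // Java's "UTF-16" charset (big-endian by default) prepends FE FF when encoding,
        // while UTF-16BE/UTF-16LE emit no BOM at all.
        byte[] encoded = "x".getBytes(StandardCharsets.UTF_16);
        System.out.println(String.format("%02X %02X", encoded[0], encoded[1])); // FE FF
    }
}
```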
Hm, at least when the encoding is not specified, the file even opens incorrectly (the BOM is displayed as text). The defect has been present since NetBeans 7.1.2; I cannot pinpoint a changeset that changed the behaviour.
Anyway, EncodingUtil.doDetectEncoding attempts to autodetect the encoding and then reads the document's declared encoding. If the document does NOT declare anything, the autodetected encoding (e.g. UTF-8 detected from the BOM's presence) is thrown away and null is returned. That causes the next encoding in the queue (the project default, ISO-8859-1 in my case) to step in and interpret the BOM as regular text.
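A minimal sketch of the fixed detection order (class and method names here are illustrative, not the real EncodingUtil code): when the document declares no encoding, return the BOM-detected charset instead of null, so the project default never gets a chance to misinterpret the BOM bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only -- not the actual EncodingUtil implementation.
// The point: if the XML declaration specifies no encoding, keep the charset
// autodetected from the BOM instead of discarding it (returning null lets the
// project default, e.g. ISO-8859-1, take over and render the BOM as text).
public class DetectSketch {
    static Charset detect(byte[] head, String declaredEncoding) {
        Charset fromBom = detectFromBom(head);
        if (declaredEncoding != null) {
            return Charset.forName(declaredEncoding);
        }
        // Buggy behaviour would return null here, dropping fromBom.
        return fromBom; // fixed: propagate the BOM-detected charset
    }

    static Charset detectFromBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF)
            return StandardCharsets.UTF_8;
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF)
            return StandardCharsets.UTF_16BE;
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE)
            return StandardCharsets.UTF_16LE;
        return null; // no BOM: caller falls back to the project default
    }

    public static void main(String[] args) {
        byte[] utf8bom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<' };
        System.out.println(detect(utf8bom, null)); // UTF-8
    }
}
```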
Although I was able to fix the charset detection, the UTF-8-encoded file is still not read correctly. The Java I/O libraries do not handle the UTF-8 BOM correctly - see
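Since the JDK's UTF-8 decoder treats a leading BOM as a regular character, a common workaround (sketched here as a generic utility, not NetBeans code) is to strip the EF BB BF bytes with a PushbackInputStream before handing the stream to the parser:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Workaround sketch: the JDK's UTF-8 decoder does not skip a leading BOM
// (EF BB BF), so remove it manually before parsing.
public class BomStripper {
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pb.read(head, 0, 3);
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            pb.unread(head, 0, n); // no BOM: push the bytes back unchanged
        }
        return pb;
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', '?', 'x' };
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(doc));
        System.out.println((char) in.read()); // '<'
    }
}
```

The returned stream can then be wrapped in an InputSource and passed to JAXP as usual.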
Sadly, the net result of the evaluation is that NetBeans XML support should warn if a document contains a BOM sequence at the start; even if NB worked around this JDK defect, JAXP would not parse the XML correctly at application runtime.
I'll commit the encoding detection fix; it won't harm and improves the code's correctness. However, I have to mark the issue as an enhancement to report a JDK-unsupported feature rather than provide a fix for the use case, sorry.
encoding detection improved by http://hg.netbeans.org/jet-main/rev/6bf6bd1eac3f