[ BUILD # : 200903231401 ]
[ JDK VERSION : 1.6.* ]
The NetBeans editor does not support UTF-8 files with signatures
(Byte Order Marks). If present, the BOM is rendered as text and UTF-8
characters are displayed incorrectly.
At our company, we currently use Notepad2
(http://www.flos-freeware.ch/), which is one of the very few editors
supporting UTF-8+signature files.
I have attached one such file and a screenshot of it as rendered in
the NB editor.
Created attachment 78742
UTF-8+signature encoded .properties file
Created attachment 78743
UTF-8+signature encoded .properties file as rendered in IDE
This is in fact a JDK problem, see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058, which was closed as
"Won't fix" for compatibility reasons. Applications are left on their own to handle the BOM in UTF-8 streams.
Anyway, NetBeans uses the encoding of the project and expects files in the project to be stored in that encoding (unless
the file itself can specify a different encoding, as JSP and XML files can). So, from the NetBeans point of view, a BOM in
UTF-8 encoded files is superfluous and you don't need to use it at all.
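A minimal sketch of the JDK behaviour referenced above, assuming Java 7+ (for StandardCharsets and try-with-resources): decoding UTF-8 bytes that start with EF BB BF through a plain InputStreamReader keeps U+FEFF as the first character, which is what then ends up in the editor. The class name BomDemo is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BomDemo {
    /** Decodes bytes as UTF-8 through a plain InputStreamReader (no BOM handling at all). */
    public static String readUtf8(byte[] bytes) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            for (int c; (c = r.read()) != -1; ) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // EF BB BF is the UTF-8 encoding of U+FEFF (the BOM), followed by "key=value".
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'k', 'e', 'y', '=', 'v', 'a', 'l', 'u', 'e'};
        String text = readUtf8(withBom);
        System.out.println(text.length());                       // 10: the BOM survived as a character
        System.out.println(Integer.toHexString(text.charAt(0))); // feff
    }
}
```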
Thank you for explaining this.
As so often, I'm dealing with a production/legacy problem here. In an ideal world I could say "let's get rid of the
signatures!", but I hit the wall of reality: our customer's IT department insists on having the BOM/signature.
Again, this leaves developer Denbo alone with the problem, and I'm forced to edit my .properties files in Notepad2
instead of NetBeans :(
There have been similar issues for other file types.
One of the most critical cases has been fixed: issue 83321.
A similar approach could probably be used in this case.
Please explain what you expect from the NetBeans editor. (It looks like the best place for general support is the DataEditorSupport class in the openide.loaders module.)
I see two possible variants:
a) Shall we use the BOM to detect the encoding of a particular file? (As is done in the UnicodeReader class in the xml.core module.)
b) Or is it enough to just use the system or project encoding (which should be UTF-8 in this case), skip the BOM when reading and prepend it again on save? (In that case you cannot mix different encodings in one project.)
It looks like variant a) is better (and safer).
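For variant a), a minimal sketch of BOM sniffing, assuming Java 7+. It is not the xml.core UnicodeReader itself; the class BomSniffer and its methods are hypothetical. It peeks at up to three leading bytes, picks UTF-8, UTF-16BE or UTF-16LE when a matching BOM is found, otherwise falls back to the project encoding, and pushes back any bytes that are not part of the BOM so the Reader starts at the first real character. UTF-32 BOMs are deliberately ignored (FF FE 00 00 would be taken as UTF-16LE).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public final class BomSniffer {
    private BomSniffer() {}

    /** Opens a Reader whose charset comes from a leading BOM, or from {@code fallback} if there is none. */
    public static Reader open(InputStream raw, Charset fallback) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 3);
        byte[] head = new byte[3];
        int n = readUpTo(in, head);
        Charset cs = fallback;
        int bomLength = 0;
        if (n >= 3 && u(head[0]) == 0xEF && u(head[1]) == 0xBB && u(head[2]) == 0xBF) {
            cs = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (n >= 2 && u(head[0]) == 0xFE && u(head[1]) == 0xFF) {
            cs = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (n >= 2 && u(head[0]) == 0xFF && u(head[1]) == 0xFE) {
            cs = StandardCharsets.UTF_16LE;
            bomLength = 2;
        }
        // Bytes that were peeked at but are not part of the BOM belong to the content.
        if (n > bomLength) {
            in.unread(head, bomLength, n - bomLength);
        }
        return new InputStreamReader(in, cs);
    }

    /** Reads the whole Reader into a String (helper for the example). */
    public static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        for (int k; (k = r.read(buf)) != -1; ) {
            sb.append(buf, 0, k);
        }
        return sb.toString();
    }

    private static int u(byte b) {
        return b & 0xFF;
    }

    /** Fills buf as far as the stream allows; returns the number of bytes actually read. */
    private static int readUpTo(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int r = in.read(buf, total, buf.length - total);
            if (r < 0) {
                break;
            }
            total += r;
        }
        return total;
    }
}
```

Because the BOM decides the charset, files with and without a BOM (or in different Unicode encodings) can coexist in one project, which is the advantage of a) over b).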
Yes, definitely a) is the better choice.
*** Bug 183040 has been marked as a duplicate of this bug. ***
*** Bug 185868 has been marked as a duplicate of this bug. ***
b) is enough for me.
If this is not possible, I will have to look for another editor...
Is it possible to use the BOM only in C++ mode?
Is it possible to do it with a plug-in?
*** Bug 206511 has been marked as a duplicate of this bug. ***
IMO the best place to handle this issue is a FileEncodingQuery so reassigning to queries module.
(In reply to Miloslav Metelka from comment #16)
> IMO the best place to handle this issue is a FileEncodingQuery so
> reassigning to queries module.
FileEncodingQuery itself only delegates to FileEncodingQueryImplementation services that can be in any module.
As of now, however, it's not involved in handling the stream at all. It's passed a FileObject and returns an encoding. If I understand it right, it would have to open the stream, read the BOM, return the encoding value and close the stream. The FEQ API user would then load the file with the given encoding. However, who would strip the BOM from the opened file content? And who would write the BOM at the beginning of the file when saving?
I don't see this as a defect but rather a missing feature; additionally, I'm not convinced this can be fixed in the project system alone (by implementing a BOM-aware FileEncodingQueryImplementation service).
>I don't see this as a defect but rather a missing feature
It's a standard (albeit rare) file format that the editor doesn't support. To me, it doesn't get any more basic than reading and writing a text file properly, so I would pencil it in as a bug. Since it's a bit uncommon, perhaps we could give it a low priority instead?
When I last looked into this (long ago), there were problems not only in the editor but also in the versioning module (where diff also looked at the BOM when computing differences, and removing the BOM in the editor marked the file as 'changed', etc.).
(In reply to Milos Kleint from comment #18)
> I don't see this as a defect but rather a missing feature; additionally, I'm
> not convinced this can be fixed in the project system alone (by implementing a
> BOM-aware FileEncodingQueryImplementation service).
I also think it is rather a bug. As Emi said, it is a basic file format.
Another thing is that the lack of BOM support means the BOM is loaded into the editor as characters (actually as one character), which is definitely a bug.
> FileEncodingQuery itself only delegates to FileEncodingQueryImplementation
> services that can be in any module.
> As of now, however, it's not involved in handling the stream at all. It's
> passed a FileObject and returns an encoding. If I understand it right, it would
> have to open the stream, read the BOM, return the encoding value and close
> the stream. The FEQ API user would then load the file with the given
> encoding. However, who would strip the BOM from the opened file content? And
> who would write the BOM at the beginning of the file when saving?
I thought that the Charset impl returned from FEQ would strip the BOM when decoding and possibly add the BOM when encoding. It could maintain a static WeakSet<FileObject> of the files containing the BOM so that the BOM gets written back when saving. IMHO the BOM should not be present in the output of the java.io.Reader that feeds the javax.swing.text.Document; otherwise we could have problems with the correctness of position offsets if, e.g., a refactoring manipulates the files directly. DataEditorSupport.loadFromStreamToKit() creates the Reader as
new InputStreamReader(stream, decoder);
where decoder is obtained from FEQ, and IMHO all other content-manipulation impls should also go through FEQ.
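The Charset-based idea can be sketched roughly as follows, assuming Java 7+. Utf8BomCharset and the charset name X-UTF-8-BOM-SKETCH are hypothetical and not part of NetBeans or the JDK. The decoder silently drops a leading EF BB BF, so the Reader created with new InputStreamReader(stream, decoder) never hands U+FEFF to the Document, and the encoder writes the BOM before the first encoded byte. This sketch always writes the BOM; the per-file bookkeeping (remembering which FileObjects had one) is left out.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

/** Hypothetical UTF-8 charset that hides a leading BOM when decoding and writes one when encoding. */
public final class Utf8BomCharset extends Charset {
    private static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    public Utf8BomCharset() {
        super("X-UTF-8-BOM-SKETCH", new String[0]);
    }

    @Override
    public boolean contains(Charset cs) {
        return StandardCharsets.UTF_8.contains(cs);
    }

    @Override
    public CharsetDecoder newDecoder() {
        return new Decoder(this);
    }

    @Override
    public CharsetEncoder newEncoder() {
        return new Encoder(this);
    }

    private static final class Decoder extends CharsetDecoder {
        private final CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder();
        private boolean bomChecked;

        Decoder(Charset cs) {
            super(cs, 1.0f, 1.0f);
        }

        @Override
        protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
            if (!bomChecked) {
                int p = in.position();
                int avail = Math.min(in.remaining(), BOM.length);
                for (int i = 0; i < avail && !bomChecked; i++) {
                    // Any mismatch means there is no BOM: decode from the first byte.
                    bomChecked = in.get(p + i) != BOM[i];
                }
                if (!bomChecked) {
                    if (avail < BOM.length) {
                        return CoderResult.UNDERFLOW; // looks like a BOM so far; wait for more bytes
                    }
                    in.position(p + BOM.length); // skip the BOM
                    bomChecked = true;
                }
            }
            return utf8.decode(in, out, false);
        }

        @Override
        protected void implReset() {
            bomChecked = false;
            utf8.reset();
        }
    }

    private static final class Encoder extends CharsetEncoder {
        private final CharsetEncoder utf8 = StandardCharsets.UTF_8.newEncoder();
        private boolean bomWritten;

        Encoder(Charset cs) {
            super(cs, 1.1f, 3.0f);
        }

        @Override
        protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
            if (!bomWritten) {
                if (out.remaining() < BOM.length) {
                    return CoderResult.OVERFLOW; // ask the caller for a bigger buffer
                }
                out.put(BOM);
                bomWritten = true;
            }
            return utf8.encode(in, out, false);
        }

        @Override
        protected void implReset() {
            bomWritten = false;
            utf8.reset();
        }
    }
}
```

Used as new InputStreamReader(stream, new Utf8BomCharset().newDecoder()), the bytes EF BB BF 61 3D 62 read back as "a=b", and new Utf8BomCharset().encode("a=b") yields six bytes starting with EF BB BF. Doing this at the Charset level keeps document offsets free of the BOM, which is the concern about refactorings raised above.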
*** Bug 207898 has been marked as a duplicate of this bug. ***
*** Bug 241478 has been marked as a duplicate of this bug. ***
*** Bug 247881 has been marked as a duplicate of this bug. ***
This old bug may not be relevant anymore. If you can still reproduce it in 8.2 development builds, please reopen this issue.
Thanks for your cooperation,
NetBeans IDE 8.2 Release Boss
This test was done with the state of core-main as of 2016-07-09.
Created attachment 164547
example of displaying BOM in Netbeans Editor
Confirming the bug: I created a .js file (in Notepad++ with UTF-8 encoding).
NetBeans displays the BOM as a strange mark at the beginning of the file (see the attachment of 2017-06-15).