Bug 161015 - [67cat] UTF-8 files with signature (Byte Order Mark) not supported
Status: REOPENED
Product: projects
Classification: Unclassified
Component: Generic Infrastructure
Version: 8.2
Hardware: PC Windows 7 x64
Importance: P3 with 8 votes
Target Milestone: TBD
Assigned To: Tomas Stupka
QA Contact: issues@projects
Duplicates: 183040 185868 206511 207898 241478 247881
Depends on:
Blocks:
Reported: 2009-03-24 13:07 UTC by denbo
Modified: 2017-06-15 10:35 UTC
CC List: 15 users

See Also:
Issue Type: ENHANCEMENT


Attachments
UTF-8+signature encoded .properties file (131 bytes, text/plain)
2009-03-24 13:08 UTC, denbo
UTF-8+signature encoded .properties file as rendered in IDE (19.35 KB, image/jpeg)
2009-03-24 13:09 UTC, denbo
example of displaying BOM in Netbeans Editor (5.77 KB, image/png)
2017-06-15 10:32 UTC, protasovams

Description denbo 2009-03-24 13:07:31 UTC
[ BUILD # : 200903231401 ]
[ JDK VERSION : 1.6.* ]

The NetBeans editor does not support UTF-8 files with signatures
(Byte Order Marks). If present, the BOM is rendered as text and UTF-8
characters are displayed incorrectly.
At our company, we currently use Notepad2
(http://www.flos-freeware.ch/) which is one of the very few editors
supporting UTF-8+signature files.

I attached one such file and a screenshot of this file as rendered in
the NB editor.
Comment 1 denbo 2009-03-24 13:08:24 UTC
Created attachment 78742
UTF-8+signature encoded .properties file
Comment 2 denbo 2009-03-24 13:09:09 UTC
Created attachment 78743
UTF-8+signature encoded .properties file as rendered in IDE
Comment 3 Vitezslav Stejskal 2009-03-24 15:29:08 UTC
This is in fact a JDK problem, see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058, which was closed as
"Won't fix" for compatibility reasons. Applications are left to handle the BOM in UTF-8 streams on their own.
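A minimal sketch of the JDK behaviour referred to above (the class and method names are illustrative, not NetBeans code): decoding a UTF-8 stream that starts with the signature bytes EF BB BF through java.io.InputStreamReader does not drop the signature. It surfaces as the character U+FEFF at the start of the text, which an editor then displays as a stray character.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BomDemo {
    // "a=b" encoded as UTF-8 and preceded by the 3-byte UTF-8 signature EF BB BF.
    static final byte[] WITH_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', '=', 'b'};

    // Decodes the bytes with the standard UTF-8 charset, the same way a plain
    // InputStreamReader-based loader would, and returns every decoded character.
    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = decode(WITH_BOM);
        // The decoder keeps the signature: the text has 4 chars and starts with U+FEFF.
        System.out.printf("length=%d, first char=U+%04X%n", text.length(), (int) text.charAt(0));
    }
}
```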

Anyway, NetBeans uses the encoding from a project and expects the files in the project to be stored using this encoding (unless
a file can specify a different encoding itself, as JSP and XML files can). So, from the NetBeans point of view, a BOM in UTF-8
encoded files is superfluous and you don't have to use it at all.
Comment 4 Vitezslav Stejskal 2009-03-24 15:29:51 UTC
http://unicode.org/faq/utf_bom.html#BOM
Comment 5 denbo 2009-03-24 16:04:34 UTC
Thank you for explaining this.
As so often, I'm dealing with a production/legacy problem here. In an ideal world I could say "let's get rid of the
signatures!", but I hit the reality wall: our customer's IT department insists on having the BOM/signature.
Again, this leaves developer Denbo alone with the problem, and I'm forced to edit my .properties files in Notepad2
instead of NetBeans :(
Comment 6 Vitaly Bychkov 2009-12-15 03:39:56 UTC
There were similar issues against other file types.
One of the most critical cases has been fixed: issue 83321.

Probably a similar approach can be used in this case.
Comment 7 mslama 2009-12-16 06:04:43 UTC
Please explain what you expect from the NetBeans editor. (It looks like the best place for general support is the openide.loaders module, DataEditorSupport class.)
I see 2 possible variants:
a) Should we use the BOM to detect the encoding of a particular file? (Like it is done in the UnicodeReader class in the xml.core module.)
b) Or is it enough just to use the system or project encoding (which should be UTF-8 in this case), skip the BOM when reading, and prepend the BOM on save? (In that case you cannot mix different encodings in one project.)
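A rough sketch of variant a), assuming the caller has already read the first few bytes of the file. This is not the xml.core UnicodeReader; BomSniffer, Detected and detect are hypothetical names. A recognised BOM overrides the project encoding; otherwise the fallback (project) encoding is used and nothing is skipped. Only the UTF-8 and UTF-16 signatures are handled here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomSniffer {
    // Charset implied by the leading bytes, and how many bytes the BOM occupies.
    static final class Detected {
        final Charset charset;
        final int bomLength;

        Detected(Charset charset, int bomLength) {
            this.charset = charset;
            this.bomLength = bomLength;
        }
    }

    // Variant a): a BOM wins over the project encoding; otherwise use the fallback.
    static Detected detect(byte[] head, Charset fallback) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return new Detected(StandardCharsets.UTF_8, 3);
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return new Detected(StandardCharsets.UTF_16BE, 2);
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return new Detected(StandardCharsets.UTF_16LE, 2);
        }
        return new Detected(fallback, 0);
    }

    // Compares the first bytes of data (as unsigned values) with the given prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] head = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'k', '='};
        Detected d = detect(head, StandardCharsets.ISO_8859_1);
        System.out.println(d.charset.name() + ", BOM length " + d.bomLength);
    }
}
```

A loader would then decode past the signature, e.g. `new String(bytes, d.bomLength, bytes.length - d.bomLength, d.charset)`. Note that a UTF-32LE signature (FF FE 00 00) would be misread as UTF-16LE by this simplified check.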
Comment 8 mslama 2009-12-16 06:25:05 UTC
It looks like variant a) is better (and safer).
Comment 9 denbo 2009-12-16 06:27:12 UTC
Yes, definitely a) is the better choice.
Comment 10 Vitezslav Stejskal 2010-05-11 14:55:13 UTC
*** Bug 183040 has been marked as a duplicate of this bug. ***
Comment 11 Vitezslav Stejskal 2010-05-11 14:55:29 UTC
*** Bug 185868 has been marked as a duplicate of this bug. ***
Comment 12 shirojirou 2010-11-17 07:03:21 UTC
b) is enough for me.
If this is not possible, I will have to look for another editor...
Is it possible to use the BOM only in C++ mode?
Comment 13 shirojirou 2010-11-19 07:36:34 UTC
Is it possible to do it with a plugin?
Comment 14 Jiri Prox 2011-12-19 08:25:14 UTC
*** Bug 206511 has been marked as a duplicate of this bug. ***
Comment 15 szd 2012-06-25 10:39:36 UTC
I can understand that the JDK limits BOM support for Java files; however, the NetBeans editor supports many more file types (for instance JavaScript), and the default encoding cannot be specified per file type. Even Windows Notepad inserts a BOM into UTF-8 encoded files (.txt or otherwise).
Comment 16 Miloslav Metelka 2013-08-28 11:36:03 UTC
IMO the best place to handle this issue is a FileEncodingQuery, so reassigning to the queries module.
Comment 17 Milos Kleint 2013-09-02 12:13:02 UTC
(In reply to Miloslav Metelka from comment #16)
> IMO the best place to handle this issue is a FileEncodingQuery so
> reassigning to queries module.

FileEncodingQuery itself only delegates to FileEncodingQueryImplementation services that can be in any module. 

As of now, however, it's not involved in handling the stream at all. It's passed a FileObject and returns an encoding. If I understand it right, it would have to open the stream, read the BOM, return the encoding value and close the stream. The FEQ API user would then load the file with the given encoding. However, who would strip the BOM from the opened file content? And who would write the BOM at the beginning of the file when saving?
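One possible answer to the two questions above, sketched under the assumption that the loader and saver are allowed to track BOM state themselves (BomStripping, Loaded, load and save are hypothetical names, not NetBeans API): the loader peeks at the first decoded character with java.io.PushbackReader, drops it if it is U+FEFF and records that fact; the saver writes U+FEFF first when the flag is set, which a UTF-8 OutputStreamWriter turns back into EF BB BF.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class BomStripping {
    static final char BOM = '\uFEFF';

    // Decoded document text plus a flag recording whether the file started with a BOM.
    static final class Loaded {
        final String text;
        final boolean hadBom;

        Loaded(String text, boolean hadBom) {
            this.text = text;
            this.hadBom = hadBom;
        }
    }

    // Reads all characters, dropping a single leading U+FEFF so it never reaches the document.
    static Loaded load(Reader in) {
        try (PushbackReader pr = new PushbackReader(in, 1)) {
            int first = pr.read();
            boolean hadBom = first == BOM;
            if (first != -1 && !hadBom) {
                pr.unread(first); // not a BOM: put the character back
            }
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = pr.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
            return new Loaded(sb.toString(), hadBom);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text back, restoring the BOM only if the original file had one.
    static void save(Loaded doc, Writer out) {
        try {
            if (doc.hadBom) {
                out.write(BOM);
            }
            out.write(doc.text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Loaded doc = load(new StringReader("\uFEFFkey=value"));
        StringWriter out = new StringWriter();
        save(doc, out);
        System.out.println(doc.text + " hadBom=" + doc.hadBom + " saved chars=" + out.toString().length());
    }
}
```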
Comment 18 Milos Kleint 2013-09-03 12:18:06 UTC
I don't see this as a defect but rather as a missing feature. Additionally, I'm not convinced this can be fixed in the project system alone (by implementing a BOM-aware FileEncodingQueryImplementation service).
Comment 19 emi 2013-09-03 12:35:07 UTC
>I don't see this as defect but rather a missing feature

It's a standard (albeit rare) file format the editor doesn't support. To me it doesn't get any more basic than reading and writing a text file properly, so I would pencil it in as a bug. Since it's a bit uncommon, we could give it a low priority instead?

Last time I looked into this (long ago), there were problems in the editor but also in the versioning module (where diff was also looking at the BOM to compute differences, and removing the BOM in the editor marked the file as 'changed', etc.).
Comment 20 maxym 2013-09-03 19:55:04 UTC
(In reply to Milos Kleint from comment #18)
> I don't see this as defect but rather a missing feature, additionally I'm
> nit convinced this can be fixed in project system alone. (by implementing a
> BOM aware FileEncodingQueryImplementation service.

I also think it is rather a bug. As Emi said, it is a basic file format.

Another thing is that the lack of BOM support means the BOM is loaded into the editor as characters (actually as one character), which is definitely a bug.
Comment 21 Miloslav Metelka 2013-09-16 14:23:04 UTC
> FileEncodingQuery itself only delegates to FileEncodingQueryImplementation
> services that can be in any module. 
> 
> as of now however it's not involved in handling the stream at all. It's
> passed a FileObject and returns encoding. If I understand it right, it would
> have to open the stream and read the BOM, return encoding value and close
> the stream. The FOQ api user would then load the file with the given
> encoding. However who would strip the BOM from the opened file content? and
> who would write the BOM at the beginning of the file when saving?

I thought that the Charset impl returned from FEQ would strip the BOM when decoding and possibly add the BOM when encoding. It could maintain a static WeakSet<FileObject> of the files containing the BOM so that the BOM gets written back when saving. IMHO the BOM should not be present in the output produced by the java.io.Reader that feeds the javax.swing.text.Document; otherwise we could have problems with the correctness of position offsets if, e.g., a refactoring manipulates the files directly. DataEditorSupport.loadFromStreamToKit() produces the Reader as

new InputStreamReader(stream, decoder);

where the decoder is obtained from FEQ; IMHO all other content-manipulation impls should go through FEQ as well.
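A simplified sketch of the bookkeeping proposed in this comment, under stated assumptions: instead of a full custom java.nio.charset.Charset subclass, it decodes and encodes whole byte arrays, and a type parameter K stands in for FileObject. BomRegistry, decode and encode are hypothetical names. The "WeakSet" is built with Collections.newSetFromMap over a WeakHashMap, so entries disappear once the file object is no longer referenced elsewhere; here it is an instance field rather than a static one, to keep the sketch testable.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

// K is a stand-in for the file handle (FileObject in NetBeans).
class BomRegistry<K> {
    private static final char BOM = '\uFEFF';

    // Weakly keyed set of files whose content started with a BOM when last decoded.
    private final Set<K> filesWithBom =
            Collections.synchronizedSet(Collections.newSetFromMap(new WeakHashMap<K, Boolean>()));

    // Decodes file content; a leading BOM is removed and the file is remembered.
    String decode(K file, byte[] content, Charset cs) {
        String text = new String(content, cs);
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            filesWithBom.add(file);
            return text.substring(1);
        }
        filesWithBom.remove(file);
        return text;
    }

    // Encodes document text; the BOM is prepended again for remembered files.
    byte[] encode(K file, String text, Charset cs) {
        String out = filesWithBom.contains(file) ? BOM + text : text;
        return out.getBytes(cs);
    }

    public static void main(String[] args) {
        BomRegistry<Object> registry = new BomRegistry<>();
        Object file = new Object();
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', '=', '1'};
        String text = registry.decode(file, raw, StandardCharsets.UTF_8);
        byte[] saved = registry.encode(file, text, StandardCharsets.UTF_8);
        System.out.println(text + " -> " + saved.length + " bytes");
    }
}
```

Because the Document only ever sees the stripped text, offsets computed by a refactoring that works on the same decoded text stay consistent, which is the concern raised above.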
Comment 22 Jiri Prox 2014-02-06 16:50:06 UTC
*** Bug 207898 has been marked as a duplicate of this bug. ***
Comment 23 Jiri Prox 2014-02-06 16:50:34 UTC
*** Bug 241478 has been marked as a duplicate of this bug. ***
Comment 24 Vladimir Riha 2014-10-13 09:09:31 UTC
*** Bug 247881 has been marked as a duplicate of this bug. ***
Comment 25 Martin Balin 2016-07-07 08:39:35 UTC
This old bug may not be relevant anymore. If you can still reproduce it in 8.2 development builds, please reopen this issue.

Thanks for your cooperation,
NetBeans IDE 8.2 Release Boss
Comment 26 matthias42 2016-07-09 20:08:40 UTC
Reopening: I created a JavaScript file encoded as UTF-8 with BOM, and while gedit and mousepad both open the file just fine, NetBeans renders the BOM as whitespace.

This test was done with the state of core-main as of 2016-07-09.
Comment 27 protasovams 2017-06-15 10:32:14 UTC
Created attachment 164547
example of displaying BOM in Netbeans Editor
Comment 28 protasovams 2017-06-15 10:35:39 UTC
Confirming the bug: I created a .js file in Notepad++ with UTF-8 encoding.
NetBeans displays the BOM as a strange mark at the beginning of the file (see the attachment of 2017-06-15).

