This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 205884 - Recognize TXT file's UTF-16 encoding based on BOM
Summary: Recognize TXT file's UTF-16 encoding based on BOM
Status: RESOLVED WONTFIX
Alias: None
Product: platform
Classification: Unclassified
Component: -- Other --
Version: 7.1
Hardware: PC Windows XP
Importance: P3 normal with 1 vote
Assignee: Antonin Nebuzelsky
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-02 20:41 UTC by bht
Modified: 2013-12-27 12:11 UTC
CC List: 4 users

See Also:
Issue Type: ENHANCEMENT
Exception Reporter:


Attachments
file contained in zip file (5.56 KB, application/zip)
2011-12-02 20:41 UTC, bht
Dialog (23.47 KB, image/gif)
2011-12-02 20:42 UTC, bht
Screen shot of garbage in editor (94.78 KB, image/gif)
2012-06-19 17:14 UTC, bht

Description bht 2011-12-02 20:41:56 UTC
Created attachment 113778 [details]
file contained in zip file

Product Version: NetBeans IDE Dev (Build 201111120600)
Java: 1.6.0_25; Java HotSpot(TM) Client VM 20.0-b11
System: Windows XP version 5.1 running on x86; Cp1252; en_NZ (nb)

The file is the Windows XP boot log file ntbtlog.txt in the attachment.

When opening the file, the editor complains with "This file appears to contain binary data. Are you sure you want to open it in the text editor?"

After pressing OK, the editor displays only garbage.

I can open this file without warnings in any number of text editors, including IDE editors.
Comment 1 bht 2011-12-02 20:42:43 UTC
Created attachment 113779 [details]
Dialog
Comment 2 Jiri Prox 2011-12-05 14:26:04 UTC
Reproducible.
It is probably caused by the leading character of the Unicode file (the Byte-Order Mark).
Comment 3 Miloslav Metelka 2012-04-16 13:57:26 UTC
Since http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058 was closed as WONTFIX, BOM recognition will hardly ever get implemented in the JDK, so we should work around it, ideally in the FileEncodingQuery implementation. Reassigning to Tomas Z. Thanks.
Comment 4 Antonin Nebuzelsky 2012-06-19 11:23:23 UTC
The file opens for me after accepting the warning about binary content. The editor shows the text, but because of the special characters at the beginning, there are extra spaces (UTF-16 vs. UTF-8?).
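
One plausible explanation for the "extra spaces" in Comment 4 is that UTF-16 bytes are being decoded with a single-byte charset, so every second byte (0x00 for ASCII text) becomes its own character. The following is a minimal sketch of that effect, not a reproduction of the editor's actual code path. It assumes the attached file is UTF-16LE with a BOM, and it uses ISO-8859-1 as a stand-in for the reporter's Cp1252 default. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of NetBeans or the JDK.
final class Utf16Misread {
    private Utf16Misread() {}

    /** Encodes text as UTF-16LE and prepends the little-endian BOM (FF FE). */
    static byte[] utf16leWithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_16LE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFF;
        out[1] = (byte) 0xFE;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    /** Decodes the bytes with a single-byte charset, one character per byte. */
    static String misread(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    /** Makes non-printable and non-ASCII characters visible as \\uXXXX escapes. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = utf16leWithBom("ab");
        // Wrong charset: the BOM bytes and the NUL high bytes show up as characters.
        System.out.println(escape(misread(data)));
        // The "UTF-16" charset consumes the BOM and decodes the text cleanly.
        System.out.println(new String(data, StandardCharsets.UTF_16));
    }
}
```

In the misread string, the two BOM bytes appear as leading junk characters and a NUL follows each letter, which an editor may render as a gap or as garbage depending on its font and handling of control characters.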
Comment 5 bht 2012-06-19 17:14:06 UTC
Created attachment 121060 [details]
Screen shot of garbage in editor

When I open the file, I get a full screen of garbage.
Comment 6 Marian Mirilovic 2012-06-25 08:20:39 UTC
This is a stopper for NB 7.2; please evaluate and resolve ASAP.
Comment 7 Milos Kleint 2012-06-25 13:16:48 UTC
With regard to FileEncodingQuery: for a TXT file, the project-based implementation of FEQ will most likely be selected, based on FileOwnerQuery's file ownership. If the file is not in a project, the default encoding Charset will be used.
Comment 8 Antonin Nebuzelsky 2012-06-25 15:53:02 UTC
Definitely not a P1 based on the bug criteria guidelines. Also not a regression: UTF-16 has never been recognized from a TXT file's BOM.

There is a workaround for UTF-16 (and other) encoded TXT files. Install the Encoding Support plugin from this URL:

http://deadlock.netbeans.org/hudson/job/nbms-and-javadoc/lastSuccessfulBuild/artifact/nbbuild/nbms/extra/org-netbeans-modules-encoding.nbm

You can then open a file in any of a variety of encodings via the File / Open In Encoding action.

At least UTF-16 could be recognized by the default Text File dataloader based on its BOM (Byte-Order Mark). Reassigning to datasystems.
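
The BOM-based recognition suggested above could look roughly like the following sketch. It is an illustration using only the JDK, not the NetBeans dataloader or a FileEncodingQuery implementation. The class `BomSniffer` and its methods are hypothetical names, and the `fallback` parameter stands in for whatever encoding would otherwise be chosen (for example the project or default encoding).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of NetBeans or the JDK.
final class BomSniffer {
    private BomSniffer() {}

    /** Returns UTF-16 if the data starts with a UTF-16 BOM, otherwise the fallback. */
    static Charset detect(byte[] head, Charset fallback) {
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            boolean bigEndianBom = b0 == 0xFE && b1 == 0xFF;
            boolean littleEndianBom = b0 == 0xFF && b1 == 0xFE;
            if (bigEndianBom || littleEndianBom) {
                // Java's "UTF-16" decoder reads the BOM itself to choose the
                // byte order and does not include it in the decoded text.
                return StandardCharsets.UTF_16;
            }
        }
        return fallback;
    }

    /** Decodes the data with the detected charset. */
    static String decode(byte[] data, Charset fallback) {
        return new String(data, detect(data, fallback));
    }

    public static void main(String[] args) {
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0};
        byte[] plain = {'h', 'i'};
        System.out.println(decode(utf16le, StandardCharsets.ISO_8859_1));
        System.out.println(decode(plain, StandardCharsets.ISO_8859_1));
    }
}
```

The sketch deliberately ignores other BOMs. In particular, a UTF-32LE BOM (FF FE 00 00) starts with the same two bytes as the UTF-16LE BOM, so a fuller implementation would need to check for that case first.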
Comment 9 Jaroslav Tulach 2012-12-13 15:18:22 UTC
DataEditorSupport calls

FileEncodingQuery.getEncoding(this.getDataObject().getPrimaryFile());

which is part of the queries module, and queries belongs to -- other --:

$ grep queries .nbbugzilla-components 
queries = platform/-- other --