
Bug 103549

Summary: I18N - Editor does not support UTF-16 encoding
Product: web Reporter: Martin Schovanek <mschovanek>
Component: HTML Editor Assignee: Marek Fukala <mfukala>
Status: RESOLVED FIXED    
Severity: blocker CC: kaa, kfrank, mmetelka, tzezula, vstejskal
Priority: P3 Keywords: I18N
Version: 6.x   
Hardware: All   
OS: All   
Issue Type: DEFECT Exception Reporter:
Bug Depends on:    
Bug Blocks: 120529    

Description Martin Schovanek 2007-05-09 11:01:31 UTC
[#200705040000, jdk1.5.0]

to reproduce:
-------------
1) open a .html file
2) change the file encoding to UTF-16 (or another encoding that is not compatible with
the default encoding) by:
Comment 1 Marek Fukala 2007-05-09 12:42:54 UTC
reproducible
Comment 2 Ken Frank 2007-05-30 04:59:58 UTC
What is the problem seen?
Are the characters not displayed correctly in the editor, or when the HTML is run in a browser?

Also, I don't know if the upcoming FEQ changes for web or HTML might impact this
situation (allowing a project to have an encoding property).

ken.frank@sun.com
Comment 3 Martin Schovanek 2007-06-06 14:49:52 UTC
Partially fixed: it works for UTF-16 now, but still does not work for e.g. UTF-16LE,
because there is no BOM by default. Downgrading to P3.

The problem appears when you put the following line into the HTML head section and
reopen the document.

   <meta http-equiv="Content-Type" content="text/html; charset=UTF-16LE">

The encoding property may serve as a workaround for this.
Comment 4 Ken Frank 2007-10-03 18:08:38 UTC
To dev: do the FEQ implementations for file and project solve this issue?

ken.frank@sun.com
Comment 5 Marek Fukala 2007-10-22 15:35:54 UTC
I really do not know how I can fix this without setting a special file encoding property. In the FEQ implementation I need to read
the stream to find the meta tag. However, the string created from the input stream is incorrect, since I do not know the
encoding and use the default one. As a result, I do not find the meta tag and do not return the encoding; the FEQ
infrastructure then uses the project encoding, which causes the file to be loaded incorrectly. BTW, how do other editors handle
this? Isn't it a generic problem? Has anyone already solved this?
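The failure described in this comment can be reproduced in isolation. The following is only a minimal sketch, not the HtmlDataObject code: the class and method names are hypothetical, ISO-8859-1 stands in for "the default encoding" so the result does not depend on the platform, and StandardCharsets requires a newer JDK than the jdk1.5.0 mentioned in the report.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of NetBeans.
final class WrongDecodingDemo {

    /** Decodes the raw bytes with the assumed charset and reports whether a charset declaration is visible. */
    static boolean declarationVisible(byte[] fileBytes, Charset assumed) {
        String text = new String(fileBytes, assumed);
        // With the wrong charset every ASCII letter is separated by a NUL character,
        // so a plain substring search cannot find the declaration.
        return text.contains("charset=");
    }

    /** Builds the bytes of a small HTML document saved as UTF-16LE without a BOM. */
    static byte[] sampleUtf16LeFile() {
        String html = "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-16LE\">"
                + "</head><body></body></html>";
        return html.getBytes(StandardCharsets.UTF_16LE); // UTF_16LE writes no BOM
    }
}
```

Calling declarationVisible(sampleUtf16LeFile(), StandardCharsets.ISO_8859_1) shows why the meta tag is missed, while passing StandardCharsets.UTF_16LE makes the declaration visible again.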
Comment 6 Tomas Zezula 2007-10-22 16:54:01 UTC
I don't know how it is in HTML, but it should probably be the same as in XML: a UTF-16 file starts with the UTF-16 mark (BOM);
otherwise the head has to be in UTF-8 or ISO Latin 1, I am not sure which one. Look into EncodingUtil in XML/Core.
Comment 7 Marek Fukala 2007-10-22 16:58:45 UTC
Tomas, we do the same as in XML (looking for a BOM), but it seems UTF-16LE doesn't have one.
Comment 8 Vitezslav Stejskal 2007-10-29 16:30:31 UTC
IMO the first 128 characters are the same in UTF-8 and ISO Latin 1. So I would say that if there is no UTF-16 mark, just fall
back to UTF-8 for reading the header.
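The suggestion above (check for a byte order mark, otherwise read the header as UTF-8) could look roughly like the sketch below. It is an illustration under assumptions, not the XML/Core EncodingUtil code: the class and method names are hypothetical, and UTF-32 marks are deliberately ignored.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of NetBeans.
final class BomSniffer {

    /** Returns the charset implied by a leading BOM, or UTF-8 when no BOM is present. */
    static Charset charsetForHeader(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8; // UTF-8 BOM
        }
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                return StandardCharsets.UTF_16BE; // big-endian mark
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return StandardCharsets.UTF_16LE; // little-endian mark
            }
        }
        // No mark found: fall back to UTF-8 for reading the header.
        return StandardCharsets.UTF_8;
    }
}
```

A file saved as UTF-16LE without a mark still falls through to the UTF-8 branch here, which is the case the later comments in this report deal with.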
Comment 9 Marek Fukala 2007-10-31 09:02:53 UTC
Fixed. Martin, please verify ASAP.

Checking in HtmlDataObject.java;
/cvs/html/src/org/netbeans/modules/html/HtmlDataObject.java,v  <--  HtmlDataObject.java
new revision: 1.32; previous revision: 1.31
done
Comment 10 Martin Schovanek 2007-10-31 13:13:22 UTC
I can still reproduce it, reopening.
Comment 11 Marek Fukala 2007-10-31 14:39:10 UTC
Fixed. The problem with the previous fix was that it assumed that a UTF-16LE encoded stream has a BOM. I extended the
logic so that the code tries to read the file and find the meta tag using, in turn, the default encoding (or the one found from the BOM), UTF-16LE, and UTF-16BE.

Checking in HtmlDataObject.java;
/cvs/html/src/org/netbeans/modules/html/HtmlDataObject.java,v  <--  HtmlDataObject.java
new revision: 1.33; previous revision: 1.32
done
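The approach described in this comment (try several candidate encodings until the meta tag becomes readable) can be sketched as follows. This is not the code checked in to HtmlDataObject.java: the class name, method name, and regular expression are assumptions made for illustration, and the first candidate is whatever the caller got from the default encoding or a BOM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of NetBeans.
final class MetaCharsetSniffer {

    // Matches charset=NAME inside a <meta ...> tag; a simplification of real HTML parsing.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)",
            Pattern.CASE_INSENSITIVE);

    /**
     * Decodes the start of the file with each candidate charset in turn and
     * returns the charset name declared in the meta tag, or null if none is found.
     */
    static String findDeclaredCharset(byte[] head, Charset firstGuess) {
        Charset[] candidates = {
            firstGuess,                 // default encoding, or the one implied by a BOM
            StandardCharsets.UTF_16LE,  // UTF-16 little-endian without a BOM
            StandardCharsets.UTF_16BE   // UTF-16 big-endian without a BOM
        };
        for (Charset candidate : candidates) {
            String text = new String(head, candidate);
            Matcher matcher = META_CHARSET.matcher(text);
            if (matcher.find()) {
                return matcher.group(1);
            }
        }
        return null;
    }
}
```

The first guess is tried before the UTF-16 variants so that ordinary single-byte files are resolved on the first pass; the UTF-16 candidates only matter when the first decoding yields no readable meta tag.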
Comment 12 Ken Frank 2007-11-04 20:03:07 UTC
Martin,

Can you see if it's OK now?

ken.frank@sun.com