156117 – I18N : parsing JavaFX codes with Japanese characters fails

This Bugzilla instance is a read-only archive of historic NetBeans bug reports.


Summary: I18N : parsing JavaFX codes with Japanese characters fails

Status:	VERIFIED FIXED

Alias:	None

Product:	javafx
Classification:	Unclassified
Component:	Editor
Version:	6.x
Hardware:	All All

Importance:	P2 blocker
Assignee:	Anton Chechel

URL:
Keywords:	I18N

Depends on:
Blocks:

Reported:	2008-12-29 04:04 UTC by Masaki Katakai
Modified:	2009-03-23 13:53 UTC
CC List:	2 users

See Also:
Issue Type:	DEFECT
Exception Reporter:

Attachments
sample project containing two JavaFX files. Try it with windows-31j encoding. (8.05 KB, application/x-compressed) 2008-12-29 04:08 UTC, Masaki Katakai
screenshot: wrong code coloring; an error underline is shown in this case. Changing the Japanese text to ASCII fixes it. (203.59 KB, image/png) 2008-12-29 04:11 UTC, Masaki Katakai
screenshot: wrong code coloring; the " character has the wrong color. (204.39 KB, image/png) 2008-12-29 04:12 UTC, Masaki Katakai
fix imports / format code causes unexpected deletion of user code. (229.34 KB, image/png) 2008-12-29 04:15 UTC, Masaki Katakai
another example for the UTF-8 locale (8.14 KB, application/x-compressed) 2009-02-17 01:40 UTC, Masaki Katakai
patch making LexerInputStream.read() return a byte (2.13 KB, patch) 2009-03-09 04:01 UTC, Masaki Katakai
this patch should work better (1.39 KB, patch) 2009-03-09 12:02 UTC, Adam Sotona


Description Masaki Katakai 2008-12-29 04:04:05 UTC

NB : 6.5
JavaFX : 1.0 plugin
OS : Windows XP Japanese
The project encoding is set to windows-31j as a workaround.

Parsing JavaFX code appears to fail when the code contains Japanese characters.
Syntax coloring in the editor is incorrect. Running Fix Imports or Format Code
in the editor in this situation causes unexpected deletions, and the user's code is lost.

I'll attach the sample project and screenshots.

Comment 1 Masaki Katakai 2008-12-29 04:08:33 UTC

Created attachment 75336 [details]
sample project containing two JavaFX files. Try it with windows-31j encoding.

Comment 2 Masaki Katakai 2008-12-29 04:11:17 UTC

Created attachment 75337 [details]
screenshot: wrong code coloring; an error underline is shown in this case. Changing the Japanese text to ASCII fixes it.

Comment 3 Masaki Katakai 2008-12-29 04:12:41 UTC

Created attachment 75338 [details]
screenshot: wrong code coloring; the " character has the wrong color.

Comment 4 Masaki Katakai 2008-12-29 04:15:42 UTC

Created attachment 75339 [details]
fix imports / format code causes unexpected deletion of user code.

Comment 5 David Strupl 2009-01-19 13:59:55 UTC

Rasta, can you please check this one?

Comment 6 Rastislav Komara 2009-01-23 10:52:54 UTC

Lowering priority. This happens only with a specific combination of environment settings and characters in the file.

Comment 7 Masaki Katakai 2009-02-13 02:44:58 UTC

I don't think this is a rare case: most users write Japanese in JavaFX code on the Windows platform. Once this issue
occurs, it also slows down the IDE.

Comment 8 Masaki Katakai 2009-02-13 02:57:22 UTC

Interestingly, when I set the IDE's encoding (not the source encoding) to UTF-8 with the
"-J-Dfile.encoding=UTF-8" option, this error does not occur.

$ netbeans

The IDE uses ShiftJIS on Japanese Windows, which causes this parse error.

$ netbeans -J-Dfile.encoding=UTF-8

I cannot reproduce this issue. The project (source) encoding does not matter.

On Mac in a Japanese environment, Java's default encoding is UTF-8 on JDK 5 but ShiftJIS on JDK 6, so the issue
occurs when users run JDK 6.
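A minimal sketch, using only the JDK and not the NetBeans sources, of why the outcome can depend on the IDE's default encoding: the same bytes decode to different text depending on which charset is applied. ISO-8859-1 is an assumed stand-in for a mismatched default such as ShiftJIS, chosen because every JVM must support it. Since JDK 18 (JEP 400) the default charset is UTF-8 on all platforms unless configured otherwise, so the demo passes charsets explicitly and prints the default only for reference.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {

    // Decodes raw bytes with an explicit charset, as an InputStreamReader would.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // U+307B encoded as UTF-8 is three bytes: E3 81 BB.
        byte[] utf8Bytes = "\u307b".getBytes(StandardCharsets.UTF_8);

        System.out.println("platform default: " + Charset.defaultCharset());

        // Matching charset: the character survives the round trip.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8).equals("\u307b"));

        // Mismatched charset (ISO-8859-1 is an assumed stand-in for a wrong
        // default): each byte becomes a separate character.
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1).length());
    }
}
```

The point is only that any code path that falls back to the default charset behaves differently per machine, which matches the observation that -J-Dfile.encoding changes the symptoms.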

Comment 9 Masaki Katakai 2009-02-17 01:38:29 UTC

Sorry, my last comment was not correct. Using -Dfile.encoding=UTF-8 only solved
the issue for the sample project attached earlier (JavaFXApplication1.zip).
It changes the behavior, but there are other cases that still fail.
I'll attach another sample project.

Looking at the source code, it seems that the following line should be
changed to specify the encoding. I think that is why the behavior depends on file.encoding.

javafx.lexer/src/org/netbeans/lib/javafx/lexer/JFXLexer.java:
-            ANTLRReaderStream input = new ANTLRInputStream(reader);
+            ANTLRReaderStream input = new ANTLRInputStream(reader, "UTF-8");
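My understanding of the ANTLR 3 runtime is that ANTLRInputStream decodes its InputStream through an InputStreamReader, using the named encoding when one is passed and the platform default otherwise; check the source of the ANTLR version bundled with the plugin before relying on this. The JDK-only sketch below mirrors that pattern. `readAll` is a hypothetical helper, not an ANTLR method.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitEncodingDemo {

    // Hypothetical helper: reads a whole stream into a String.
    // A null charset falls back to the platform default, which is what a
    // constructor without an encoding argument depends on.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = (charset == null)
                ? new InputStreamReader(in)
                : new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] source = "var s = \"\u307b\";".getBytes(StandardCharsets.UTF_8);
        // With an explicit UTF-8 charset the result no longer depends on file.encoding.
        System.out.println(readAll(new ByteArrayInputStream(source), StandardCharsets.UTF_8));
    }
}
```

Naming the encoding only helps if the bytes fed to the reader really are UTF-8; Comment 12 shows that this assumption did not hold for LexerInputStream.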

Another issue is that input.LA(1) in v4Lexer.java does not seem to return
multibyte characters properly. I don't know why: when the data is loaded,
it is stored correctly, but input.LA() does not return it correctly.

For example, if the data is "\u307b", the following input.LA(1) call in mDoubleQuoteBody()
returns only 0x7b:

                int LA5_0 = input.LA(1);
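The reported 0x7b is consistent with a character being squeezed into a single byte: U+307B keeps only its low-order 8 bits, 0x7B, which is the ASCII code for '{'. The snippet below only performs that arithmetic; it does not reproduce the lexer.

```java
class TruncationDemo {

    // Keeps only the low-order 8 bits (the same bits a (byte) cast keeps).
    static int lowByte(char c) {
        return c & 0xFF;
    }

    public static void main(String[] args) {
        char ho = '\u307b';   // Hiragana "ho"
        int truncated = lowByte(ho);
        System.out.printf("U+%04X -> 0x%02X ('%c')%n", (int) ho, truncated, (char) truncated);
        // prints: U+307B -> 0x7B ('{')
    }
}
```

The same arithmetic would turn any character whose low byte is 0x22 into '"', which may explain the wrong coloring of the quote character in the earlier screenshot; that connection is an inference, not something verified in this report.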

Comment 10 Masaki Katakai 2009-02-17 01:40:26 UTC

Created attachment 77052 [details]
another example for the UTF-8 locale

Comment 11 Rastislav Komara 2009-03-02 10:20:29 UTC

Reassigning to the new owner.

Comment 12 Masaki Katakai 2009-03-09 03:58:10 UTC

The root cause seems to be that read() in LexerInputStream is not implemented to return a byte.
LexerInputStream extends InputStream:

    static class LexerInputStream extends InputStream {

so the read() method needs to return a byte, not a character.

http://java.sun.com/javase/6/docs/api/java/io/InputStream.html#read()

   Applications that need to define a subclass of InputStream must always provide a method that returns the next byte of input.

Currently read() returns a character read from the editor contents,
which are already decoded into characters. Because the caller expects
a byte as the return value, the character is corrupted.
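The corruption can be reproduced outside the IDE with a stand-in class. `CharReturningStream` below is hypothetical: it reads from a String instead of the editor document, but, like the read() described above, it returns a whole char. The InputStream Javadoc states that the default read(byte[], int, int) simply calls read() repeatedly, and each result is stored in a byte, so the upper bits are lost before any decoder sees them. InputStreamReader is an assumed stand-in for the reader ANTLR uses internally.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class CharReturningStreamDemo {

    // Hypothetical stand-in for the buggy pattern: read() returns a char,
    // not a value in 0-255, which violates the InputStream contract.
    static class CharReturningStream extends InputStream {
        private final String text;
        private int pos;

        CharReturningStream(String text) {
            this.text = text;
        }

        @Override
        public int read() {
            return pos < text.length() ? text.charAt(pos++) : -1;
        }
    }

    // Decodes the stream as UTF-8, the way a reader-based lexer input would.
    static String decode(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The Japanese character between the quotes comes back as '{' (0x7B).
        System.out.println(decode(new CharReturningStream("\"\u307b\"")));
    }
}
```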

It seems that ANTLRInputStream() accepts only an InputStream,
so I think we need to make read() return a byte.
I could not find any method for reading bytes from the editor contents,
so I made a very simple patch to the read() method.

It's just an example, not production-quality code; I didn't pay attention to performance or error handling. But
it works for me: the unexpected error stripe and the wrong coloring disappear.

Could you please review the patch and consider a better, more reasonable fix?

Comment 13 Masaki Katakai 2009-03-09 04:01:09 UTC

Created attachment 77890 [details]
patch making LexerInputStream.read() return a byte

Comment 14 Adam Sotona 2009-03-09 10:31:21 UTC

The proposed patch contains a bug: whenever the lexer reaches EOF, the rest of the "si" queue is not returned.
Also, using ArrayList.remove(0) for the queue implementation is a performance issue, since each call shifts all remaining elements.
I'll look into a possible fix.
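The two objections suggest the shape of a correct stream: encode characters to bytes, hand out pending bytes before reading more characters, keep draining them after the character source is exhausted, and use a buffer with constant-time removal instead of ArrayList.remove(0). The class below is a hypothetical sketch over a CharSequence, not the committed patch (attachment 77900 is not reproduced here), and it assumes UTF-8 as the target encoding. It uses InputStream.readAllBytes(), which requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical byte-returning replacement for a char-returning stream.
// Encodes the text to UTF-8 one code point at a time.
class Utf8CharSequenceStream extends InputStream {
    private final CharSequence text;
    private int pos;                                      // index of the next unread char
    private ByteBuffer pending = ByteBuffer.allocate(0);  // encoded bytes not yet returned

    Utf8CharSequenceStream(CharSequence text) {
        this.text = text;
    }

    @Override
    public int read() {
        if (!pending.hasRemaining()) {
            if (pos >= text.length()) {
                return -1;                                // source exhausted and nothing pending
            }
            // Take one full code point so surrogate pairs are encoded together.
            int cp = Character.codePointAt(text, pos);
            int end = pos + Character.charCount(cp);
            byte[] bytes = text.subSequence(pos, end).toString().getBytes(StandardCharsets.UTF_8);
            pending = ByteBuffer.wrap(bytes);
            pos = end;
        }
        // ByteBuffer.get() advances in O(1); mask to the unsigned 0-255 range.
        return pending.get() & 0xFF;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new Utf8CharSequenceStream("\"\u307b\"")) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

Because -1 is returned only when no encoded bytes are pending, the last character's bytes are always delivered before EOF, which is the case the first patch missed.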

Comment 15 Adam Sotona 2009-03-09 12:02:51 UTC

Created attachment 77900 [details]
this patch should work better

Comment 16 Adam Sotona 2009-03-09 12:03:30 UTC

fixed

Comment 17 Masaki Katakai 2009-03-09 14:25:13 UTC

Great, thank you for the fix! I built it locally and tried it on Windows and Mac. It works, and the performance is also
much better than with my patch.

Comment 18 Masaki Katakai 2009-03-23 13:53:10 UTC

In my environment, it now works fine.