123631 – I18N: sometimes unicode ascii characters are displayed on Editor, it should be converted to native characters

This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Status:	VERIFIED FIXED

Alias:	None

Product:	utilities
Classification:	Unclassified
Component:	Properties
Version:	6.x
Hardware:	All All

Importance:	P3 blocker
Assignee:	Marian Petras

URL:
Keywords:	I18N

Depends on:
Blocks:

Reported:	2007-12-07 13:26 UTC by Masaki Katakai
Modified:	2008-05-02 06:44 UTC
CC List:	1 user

See Also:
Issue Type:	DEFECT
Exception Reporter:

Attachments
only \u3093 is still displayed as unicode ascii on editor, others are native characters (143.48 KB, image/png) 2007-12-07 13:29 UTC, Masaki Katakai
But the character is displayed OK in table Properties editor (45.07 KB, image/png) 2007-12-07 13:31 UTC, Masaki Katakai
Japanese .properties file (6.49 KB, application/octet-stream) 2007-12-07 13:35 UTC, Masaki Katakai
example - smaller file (2.13 KB, text/plain) 2007-12-25 04:31 UTC, Masaki Katakai

Description Masaki Katakai 2007-12-07 13:26:34 UTC

NetBeans IDE 6.0 (Build 200711261600)
Java: 1.5.0_13, 1.6.0_03
Japanese locale on Solaris and Windows

In NetBeans 6.0, native characters in a .properties file can be edited as they are
in the plain text editor: all Unicode escapes (\uXXXX) are converted to native
characters when the file is opened. However, while using the editor I can still
see some characters in \uXXXX form. They are valid code points, but it seems that
these characters are not converted to native characters.

When I select "Open" to open the Properties editor, the same character is displayed
properly (converted to a native character) in the table.

The source file is correct; I verified it in another editor.

So it seems that something goes wrong when the editor opens the file and converts
the Unicode escapes to native characters. I have not found any pattern (such as a
specific code point) yet.
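The conversion the report expects when a file is opened can be illustrated with a minimal Java sketch. It assumes the standard .properties escape syntax (a backslash, 'u', and exactly four hex digits); `UnicodeEscapes` and `unescape` are hypothetical names, not NetBeans code. An escaped backslash is kept as a pair so that a literal `\\u0041` is not decoded.

```java
// Hypothetical helper, not NetBeans code. Converts each well-formed Unicode
// escape (backslash, 'u', four hex digits) in a complete string into the
// char it denotes; everything else is copied unchanged.
final class UnicodeEscapes {
    private UnicodeEscapes() {}

    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length() && s.charAt(i + 1) == '\\') {
                out.append(c).append(c);   // escaped backslash: keep the pair as-is
                i += 2;
            } else if (c == '\\' && i + 6 <= s.length() && s.charAt(i + 1) == 'u'
                    && isHex(s, i + 2, i + 6)) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);             // ordinary char or malformed escape
                i++;
            }
        }
        return out.toString();
    }

    // True if every char in s[from, to) is an ASCII hex digit.
    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            char c = s.charAt(k);
            boolean hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }
}
```

Because this version sees the whole string at once, it can never split an escape; the problem in this report only arises when the same conversion runs over a file that is read in fixed-size buffers.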

Comment 1 Masaki Katakai 2007-12-07 13:29:24 UTC

Created attachment 53988 [details]
only \u3093 is still displayed as unicode ascii on editor, others are native characters

Comment 2 Masaki Katakai 2007-12-07 13:31:29 UTC

Created attachment 53989 [details]
But the character is displayed OK in table Properties editor

Comment 3 Masaki Katakai 2007-12-07 13:35:00 UTC

Created attachment 53990 [details]
Japanese .properties file

Comment 4 Ken Frank 2007-12-13 19:32:29 UTC

For NB6 there were many enhancements and fixes related to properties files, non-ASCII characters,
and escaped ASCII, so we can check whether what you see is part of that functionality;
that is, there were some differences in behavior between the editor view and the Properties editor view.

Let's let the developer comment on this to see if it is an issue or rfe or if as designed and ok.

ken.frank@sun.com

Comment 5 Masaki Katakai 2007-12-14 07:17:02 UTC

> or rfe or if as designed and ok.

I don't understand what you want to say.
Do you think this is designed/expected behavior?

Comment 6 Ken Frank 2007-12-14 16:53:28 UTC

reply to this comment
> or rfe or if as designed and ok.

I don't understand what you want to say.
Do you think this is designed/expected behavior?

--> You might want to read the whole comment to see the context
of what is being said. I was describing some possible explanations for this
behavior: it could be as designed, it could be a valid issue, or it could
be a valid RFE. And I said that the developer's evaluation will be able to explain which.

Does that help you understand what I was saying?

ken.frank@sun.com

Comment 7 Masaki Katakai 2007-12-25 04:31:54 UTC

Created attachment 54493 [details]
example - smaller file

Comment 8 Masaki Katakai 2007-12-25 04:39:44 UTC

Attached a small example file. This problem always happens with this file.

"\u30c8" is displayed as a raw Unicode escape in the last line:

ACSD_TargetMappingPanel_jLabel10=\u30d7\u30ed\u30b8\u30a7\u30af\u30c8\u30ce\u30fc\u30c9

It seems that this problem happens when the file size is over 8192 bytes.

properties/src/org/netbeans/modules/properties/PropertiesEncoding.java

in decodeLoop()

                        } else if (emptyIn && hasPendingCharacters()) {
                            handlePendingCharacters();
                        }

It seems that handlePendingCharacters() is called even when more characters
remain in the file beyond the current 8192-byte buffer, so only the "\u30" part of "\u30c8" is
stored into outBuf by flushUnicodeSequence(). I think flushUnicodeSequence()
should be called only when there are no characters remaining in the file.

The "c8" at the beginning of the next read buffer should be combined with
the pending "\u30", because "\u30c8" is a single Unicode character.
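The failure analyzed in this comment can be reproduced in miniature. The following Java sketch is hypothetical: it works on Strings rather than on the byte buffers that PropertiesEncoding uses, `ChunkedUnescaper` is an invented name, and a 12-character chunk stands in for the 8192-byte read buffer. With `flushOnEmptyChunk` set, an escape cut by a chunk boundary is written out verbatim, as observed in the editor; without it, the incomplete escape is carried over and completed by the next chunk.

```java
// Hypothetical sketch, not the NetBeans PropertiesEncoding code. It decodes
// backslash-u escapes from text that arrives in fixed-size chunks; an escape
// cut by a chunk boundary waits in 'pending' for the next chunk.
final class ChunkedUnescaper {
    private final StringBuilder pending = new StringBuilder();
    private final StringBuilder out = new StringBuilder();

    // Splits 'text' into chunks of 'chunkSize' chars and decodes them in order.
    // flushOnEmptyChunk = true mimics the reported bug: pending characters are
    // written out verbatim whenever the current chunk is used up.
    static String decode(String text, int chunkSize, boolean flushOnEmptyChunk) {
        ChunkedUnescaper d = new ChunkedUnescaper();
        for (int start = 0; start < text.length(); start += chunkSize) {
            int end = Math.min(text.length(), start + chunkSize);
            for (int i = start; i < end; i++) {
                d.accept(text.charAt(i));
            }
            if (flushOnEmptyChunk) {
                d.flushPending();
            }
        }
        d.flushPending();   // true end of input: an unfinished escape is kept literally
        return d.out.toString();
    }

    private void accept(char c) {
        if (pending.length() == 0) {
            if (c == '\\') {
                pending.append(c);
            } else {
                out.append(c);
            }
            return;
        }
        pending.append(c);
        int n = pending.length();
        if (n == 2) {
            if (c != 'u') {
                flushPending();          // not a Unicode escape, e.g. an escaped backslash
            }
        } else if (!isHex(c)) {
            pending.setLength(n - 1);    // malformed escape: emit it unchanged ...
            flushPending();
            accept(c);                   // ... and let c start over (it may be a backslash)
        } else if (n == 6) {
            out.append((char) Integer.parseInt(pending.substring(2), 16));
            pending.setLength(0);
        }
    }

    private void flushPending() {
        out.append(pending);
        pending.setLength(0);
    }

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }
}
```

For example, `decode("x=\\u30d7\\u30c8", 12, true)` splits the input right after `\u30`, so the result ends with the literal text `\u30c8`; the same call with `false` yields two katakana characters.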

Comment 9 Marian Petras 2008-01-09 08:35:51 UTC

I confirm the bug. However, I do not consider it a P2 bug, so I am lowering the priority to P3.

Comment 10 Marian Petras 2008-01-10 09:20:42 UTC

Yes, you are right about the cause of the bug.

Comment 11 Marian Petras 2008-01-10 10:27:13 UTC

I am afraid I cannot fix this without creating a more serious bug.

The two lines

                        } else if (emptyIn && hasPendingCharacters()) {
                            handlePendingCharacters();
                        }

are there to work around a bug: implFlush() is not called at the end of the decoding routine. Without this
bug, the decoder would rely on implFlush() and call handlePendingCharacters() there. But since implFlush() is not
called, the routine for handling pending characters must be invoked at some other point.

The JDK bugs causing this are also mentioned in the source code of the decoder. The bugs are:

    http://bugs.sun.com/view_bug.do?bug_id=6221056
        - CharsetEncoder.encode(ByteBuffer) should call flush(ByteBuffer)   (fixed in JDK 6)
    http://bugs.sun.com/view_bug.do?bug_id=4744247
        - StreamDecoder.CharsetSD.read does not invoke CharsetDecoder.flush  (reported in 2002, still not fixed!)

If you find any better solution, reopen this bug and let me know. Thanks.
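The point of this comment can be illustrated with the JDK's CharsetDecoder protocol: decodeLoop() only sees the current input buffer, so it cannot tell a buffer boundary from the end of the file, and the natural place to emit a trailing incomplete escape is implFlush(), which runs only if the caller invokes flush(). The class below is a hypothetical sketch, not the NetBeans decoder; it assumes the input bytes are ISO-8859-1, and `EscapeDecoder` and `decodeChunks` are invented names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the NetBeans PropertiesEncoding decoder: reads
// ISO-8859-1 bytes and turns backslash-u escapes into chars. An escape that is
// still incomplete when the current input buffer runs out stays in 'pending';
// only implFlush(), which runs when the caller invokes flush(), writes it out.
final class EscapeDecoder extends CharsetDecoder {
    private final StringBuilder pending = new StringBuilder();

    EscapeDecoder() {
        // At most 6 chars per input byte: 5 pending chars plus the current one.
        super(StandardCharsets.ISO_8859_1, 1.0f, 6.0f);
    }

    @Override
    protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
        while (in.hasRemaining()) {
            if (out.remaining() < 6) {
                return CoderResult.OVERFLOW;    // caller must drain or grow 'out'
            }
            step((char) (in.get() & 0xFF), out);
        }
        // Input buffer used up. This may be a buffer boundary rather than the
        // end of the file, so 'pending' is deliberately NOT flushed here.
        return CoderResult.UNDERFLOW;
    }

    @Override
    protected CoderResult implFlush(CharBuffer out) {
        if (out.remaining() < pending.length()) {
            return CoderResult.OVERFLOW;
        }
        out.append(pending);                    // incomplete escape at EOF: emit verbatim
        pending.setLength(0);
        return CoderResult.UNDERFLOW;
    }

    @Override
    protected void implReset() {
        pending.setLength(0);
    }

    private void step(char c, CharBuffer out) {
        if (pending.length() == 0) {
            if (c == '\\') pending.append(c); else out.put(c);
            return;
        }
        pending.append(c);
        int n = pending.length();
        if (n == 2) {
            if (c != 'u') emitPending(out);     // e.g. an escaped backslash: keep as-is
        } else if (!isHex(c)) {
            pending.setLength(n - 1);           // malformed escape: emit it unchanged
            emitPending(out);
            step(c, out);                       // c may start a new escape
        } else if (n == 6) {
            out.put((char) Integer.parseInt(pending.substring(2), 16));
            pending.setLength(0);
        }
    }

    private void emitPending(CharBuffer out) {
        out.append(pending);
        pending.setLength(0);
    }

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Feeds the chunks in order, as a Reader refilling its buffer would; the last
    // one is marked as end of input. Requires at least one chunk. 1024 chars is
    // ample for this demo, so OVERFLOW results are not handled.
    static String decodeChunks(boolean callFlush, String... chunks) {
        EscapeDecoder dec = new EscapeDecoder();
        CharBuffer out = CharBuffer.allocate(1024);
        for (int i = 0; i < chunks.length; i++) {
            ByteBuffer in = ByteBuffer.wrap(chunks[i].getBytes(StandardCharsets.ISO_8859_1));
            dec.decode(in, out, i == chunks.length - 1);
        }
        if (callFlush) {
            dec.flush(out);   // the call StreamDecoder omits (JDK bug 4744247)
        }
        out.flip();
        return out.toString();
    }
}
```

With `callFlush` false, `decodeChunks(false, "end\\u30")` returns just `end`: the pending escape is silently dropped, which is what happens when a reader never calls flush(). An escape split across chunks, as in `decodeChunks(true, "k=\\u30d7\\u30", "c8")`, is still decoded correctly because nothing is flushed before the final call.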

Comment 12 Marian Petras 2008-01-10 10:37:28 UTC

Now I realize that this bug should remain open. A fix in the JDK alone will not fix this issue: this
issue is caused by a workaround for the JDK bug, and once the JDK bug is fixed in a publicly available release, the
workaround must be disabled in the NetBeans code.

Comment 13 Marian Petras 2008-01-10 10:39:09 UTC

I will try to make the fix appear in some update release of JDK 6.

Comment 14 Marian Petras 2008-01-31 13:52:26 UTC

I urged a fix of JDK bug 4744247, but I have not seen any effect so far.

Comment 15 Marian Petras 2008-01-31 17:29:42 UTC

I will try to fix the editor issue by making a workaround for the JDK bug - via the PropertiesEditorKit.

Comment 16 Marian Petras 2008-03-03 14:24:40 UTC

Fixed.

The fix only addresses the problem in the editor; it is not a general fix of the decoding routine. For example, when
searching a Japanese file using the Find in Projects tool, some Japanese characters can still remain in the form of a
\u.... sequence, so a matching occurrence of a substring can be missed.

Modified files:
    properties/src/org/netbeans/modules/properties/PropertiesEditorSupport.java
    properties/src/org/netbeans/modules/properties/PropertiesEncoding.java

Changeset Id:
93816164a693
(http://hg.netbeans.org/main/rev/93816164a693)

Comment 17 Marian Petras 2008-03-03 14:26:40 UTC

Correction of the target milestone: it will not be in 6.1 M2 but it will be in the final release.

Comment 18 Masaki Katakai 2008-05-02 06:44:35 UTC

Thank you!

I verified the fix in the editor; it still does not work properly in the search window.
I'll file another request for the search issue.