This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 116179 - I18N - [60cat] Encoding problems in editor when pasting from clipboard in Mac OS
Status: RESOLVED WORKSFORME
Alias: None
Product: editor
Classification: Unclassified
Component: -- Other --
Version: 6.x
Hardware: Macintosh Mac OS X
Importance: P3 blocker
Assignee: issues@editor
URL:
Keywords: I18N
Depends on:
Blocks:
 
Reported: 2007-09-20 15:50 UTC by muhlig
Modified: 2007-12-14 20:34 UTC
CC List: 4 users

See Also:
Issue Type: DEFECT
Exception Reporter:


Attachments
Java file with pasted and typed file name (909 bytes, application/octet-stream)
2007-09-21 12:35 UTC, muhlig
Screenshot in vim (19.25 KB, image/png)
2007-09-21 12:39 UTC, muhlig

Description muhlig 2007-09-20 15:50:26 UTC
[ BUILD # : 200709141330 ]
[ JDK VERSION : 1.5.* ]

When pasting a string containing umlauts or other non-ASCII characters into an editor, the caret behaves strangely:
1. an extra space is inserted after the word
2. if you delete a character, the second character before the caret gets deleted instead of the first one

This only happens if you paste from another application. If you copy the string in NetBeans and then paste it, the anomaly does not occur.

The problem appears in both the Java and the XML editors.
Comment 1 Jiri Prox 2007-09-21 11:18:30 UTC
Please provide more info:
What encoding have you set in project properties? Is the file saved correctly, i.e. does the file on disk contain all
characters in the correct encoding? Does the pasted text contain any tabs?
Comment 2 muhlig 2007-09-21 12:32:35 UTC
The project encoding is set to UTF-8 (so it looks like a problem with multi-byte characters). The text does not contain any whitespace, just displayable 
characters (e.g. "größer.png"). 

The problem actually only occurs when copying a file name, so I would assume that it has something to do with Apple using decomposed Unicode as their
filesystem encoding.
I actually just tried to see what the result is in a .properties file and here is what is written to disk (both values are "größer.png"):

pasted=gro\u0308\u00dfer.png
typed=gr\u00f6\u00dfer.png

The first line shows the reported problems.

The strange thing is that when you actually create a java.io.File object with that String as a parameter, the file can be found, no matter whether you use the
typed or the pasted name.
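
The difference between the two values above can be illustrated with a small sketch. It assumes Java 6 or later, where java.text.Normalizer is available (the JDK 1.5 named in this report does not ship it); the class name DecomposedDemo is made up for illustration only.

```java
import java.text.Normalizer;

final class DecomposedDemo {
    // The two values written to the .properties file above.
    static final String PASTED = "gro\u0308\u00dfer.png"; // 'o' followed by combining diaeresis U+0308
    static final String TYPED = "gr\u00f6\u00dfer.png";   // precomposed U+00F6

    /** Returns true if both strings are equal after canonical composition (NFC). */
    static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        System.out.println(PASTED.length());                 // 11
        System.out.println(TYPED.length());                  // 10
        System.out.println(PASTED.equals(TYPED));            // false
        System.out.println(canonicallyEqual(PASTED, TYPED)); // true
    }
}
```

The pasted value is one UTF-16 code unit longer than the typed one, which would be consistent with the caret being off by one position around the umlaut.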
Comment 3 muhlig 2007-09-21 12:35:47 UTC
Created attachment 49228 [details]
Java file with pasted and typed file name
Comment 4 muhlig 2007-09-21 12:39:17 UTC
Created attachment 49229 [details]
Screenshot in vim
Comment 5 Vitezslav Stejskal 2007-09-21 13:12:52 UTC
Could you please try to reproduce this with the latest dev build? There was a problem with painting tab characters,
issue #115638. Its fix could have actually fixed this problem as well. Thanks.
Comment 6 muhlig 2007-09-21 14:49:00 UTC
I can definitely see a difference: the problem persists, but now characters are doubled when navigating the caret with the cursor keys (i.e. "größer.png" becomes "größeer.png" or "größer.png"").

Doing some testing I found out that even some native OS X applications have this problem. It seems like it is more a Finder/HFS+ or MRJ bug than one in NetBeans as you cannot know if the pasted text is actually precomposed 
or decomposed (see http://developer.apple.com/qa/qa2001/qa1235.html for details).

I do not consider fixing this high priority but I think that the fix would actually be to precompose the pasted text if it contains decomposed characters.
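
A minimal sketch of the precomposing step suggested above, assuming Java 6 or later for java.text.Normalizer. PasteNormalizer and precompose are hypothetical names, not existing NetBeans code.

```java
import java.text.Normalizer;

final class PasteNormalizer {
    /**
     * Returns the text in precomposed form (NFC). Text that is already
     * normalized is returned as the same instance, so the common case
     * does not allocate a new string.
     */
    static String precompose(String pasted) {
        if (pasted == null || Normalizer.isNormalized(pasted, Normalizer.Form.NFC)) {
            return pasted;
        }
        return Normalizer.normalize(pasted, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String fromFinder = "gro\u0308\u00dfer.png"; // decomposed, as copied from a file name
        System.out.println(precompose(fromFinder).equals("gr\u00f6\u00dfer.png")); // true
    }
}
```

Checking isNormalized first is a design choice to keep the cost low for text that contains no decomposed characters, which is probably the usual case.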
Comment 7 Ken Frank 2007-11-29 23:03:51 UTC
I am removing the incomplete keyword since the filer has provided the information.

Can the team evaluate this and see whether it's something fixable in NB, as opposed to being
an OS issue?

ken.frank@sun.com
Comment 8 Vitezslav Stejskal 2007-12-03 11:21:13 UTC
I'm not sure if we will be able to work around this somehow. I read the attached FAQ entry on Apple's dev site and
they suggest using their native APIs for converting/normalizing strings in applications that require precomposed
characters. Since this is native C, we would have to provide Java bindings for it and use it in our clipboard interface
so that all NB code could benefit from it (not only the editor and not only the IDE, but platform apps too). The obvious
place for this would be core/applemenu, which is why I'm passing this to core people.
Comment 9 Petr Nejedly 2007-12-03 13:35:58 UTC
Well, there's actually no magic in composing the characters back from the decomposed format.
See: http://pub.ks-and-ks.ne.jp/prog/unicode-precomposed.html

I believe we can quite easily do this on paste in the editor, and maybe the composing table is already somewhere in the JDK,
but what if I really wanted to paste the character 0x0308? (e.g. to explain how composing works ;-))
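
The question about pasting 0x0308 can be explored with a short sketch, again assuming Java 6 or later for java.text.Normalizer; the class name is hypothetical. NFC leaves a combining mark alone when it has no base character to merge with, but folds it into the preceding letter when both arrive in the same text.

```java
import java.text.Normalizer;

final class LoneCombiningMark {
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // A combining diaeresis on its own has no base character to merge with.
        System.out.println(nfc("\u0308").equals("\u0308"));  // true
        // Preceded by 'o' in the same pasted text, it is folded into U+00F6.
        System.out.println(nfc("o\u0308").equals("\u00f6")); // true
    }
}
```

So normalizing only the pasted text would keep a lone 0x0308 intact, while a pasted "o" plus 0x0308 pair would lose its decomposed form, which is the case the question is worried about.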
Comment 10 Vitezslav Stejskal 2007-12-03 15:25:46 UTC
There is a lot about Unicode on www.unicode.org, especially http://www.unicode.org/reports/tr15/#Code_Sample could be
interesting. 

> maybe the composing table is already somewhere in the JDK
I'm not sure about that. The above link mentions the Unicode Character Database as the primary source of information about
Unicode characters. http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html states that its implementation is
based on data from the UCD, but doesn't seem to make this data publicly accessible through its API.

Anyway, this will need some non-trivial amount of effort to implement properly, including testing and avoiding possible
performance implications. Plus I still don't think that this is entirely an editor problem. Text can be pasted to *any*
component in the IDE and/or a platform application. Fixing in the editor will leave all the other cases broken. Are you
sure this should not be addressed for everyone somewhere in openide/core?
Comment 11 muhlig 2007-12-04 00:23:42 UTC
After switching to Mac OS X 10.5 I was no longer able to reproduce this bug, so I think it really is a bug in the Apple runtime. I cannot say, though, whether it is
fixed in Release 6 DP1 for Tiger; someone else would have to test that.
Comment 12 Vitezslav Stejskal 2007-12-04 08:59:44 UTC
I see, so it's OK on Mac OS X 10.5. Just for the record, what was the version you used before (where it was not working)?
If it's working in newer Apple runtimes I would say we can close it as WORKSFORME. Do you agree?
Comment 13 muhlig 2007-12-04 09:20:38 UTC
I was using the latest update on 10.4.10, which was Java Release 5 (based on 1.5.0_07). I can imagine that this bugfix also goes into Release 6, as many of the
features and bugfixes now present in Java for 10.5 are backported. But as always with Apple: no one knows when the update will be released and whether it will
contain that fix or not, unless it is already implemented in the currently available developer preview. If someone has that preview installed already, he/she
could check whether this issue has been addressed there.
This problem would then only exist on Mac OS X < 10.4 (which is not very common on a developer's machine), so WORKSFORME is fine for me.
Comment 14 Vitezslav Stejskal 2007-12-04 09:53:44 UTC
Thanks for the explanation. So, it's fixed in 10.5 and may be fixed in 10.4 + Java Release 6, which has not been
released yet except for its developer preview. Is there anybody on the list with 10.4 Java Release 6 (dev preview), who
could confirm that this issue was fixed? If so, please add your comments here. Thanks
Comment 15 Miloslav Metelka 2007-12-05 15:55:35 UTC
BTW please note that the editor just delegates its paste operation to JTextComponent.paste() and there is a bunch of JDK
code underneath. I don't remember the exact reason, but we had already considered overriding paste in the past and
dropped the idea. JTC delegates the paste to a TransferHandler, which is a client property of the JTC and can be changed
dynamically. The default transfer handler, BasicTextUI.TextTransferHandler, is a 450-line package-private class, so we
would have to copy and maintain a bunch of extra code (though limited overriding is fine; see e.g.
QuietEditorPane.DelegatingTransferHandler).
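
A rough sketch of what a normalizing TransferHandler might look like, assuming Java 6 or later for java.text.Normalizer. NormalizingTransferHandler and readNormalized are hypothetical names; this is not the code of QuietEditorPane.DelegatingTransferHandler or of BasicTextUI.TextTransferHandler, and it only handles plain-text import. Copy, cut and other flavors would still need to be delegated to the original handler.

```java
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;
import java.awt.datatransfer.UnsupportedFlavorException;
import java.io.IOException;
import java.text.Normalizer;
import javax.swing.JComponent;
import javax.swing.TransferHandler;
import javax.swing.text.JTextComponent;

/** Hypothetical handler that precomposes pasted plain text; import only. */
final class NormalizingTransferHandler extends TransferHandler {

    /** Reads the plain-text flavor and returns it in NFC, or null if there is none. */
    static String readNormalized(Transferable t) {
        if (t == null || !t.isDataFlavorSupported(DataFlavor.stringFlavor)) {
            return null;
        }
        try {
            String text = (String) t.getTransferData(DataFlavor.stringFlavor);
            return text == null ? null : Normalizer.normalize(text, Normalizer.Form.NFC);
        } catch (UnsupportedFlavorException e) {
            return null;
        } catch (IOException e) {
            return null;
        }
    }

    @Override
    public boolean canImport(JComponent comp, DataFlavor[] flavors) {
        if (!(comp instanceof JTextComponent)) {
            return false;
        }
        for (DataFlavor flavor : flavors) {
            if (DataFlavor.stringFlavor.equals(flavor)) {
                return true;
            }
        }
        return false;
    }

    @Override
    public boolean importData(JComponent comp, Transferable t) {
        if (!(comp instanceof JTextComponent)) {
            return false;
        }
        String text = readNormalized(t);
        if (text == null) {
            return false;
        }
        // Replaces the current selection, or inserts at the caret if nothing is selected.
        ((JTextComponent) comp).replaceSelection(text);
        return true;
    }
}
```

Such a handler could be installed with textComponent.setTransferHandler(new NormalizingTransferHandler()). Whether replacing the default handler this way is acceptable for the editor is exactly the maintenance question raised above.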
Comment 16 Petr Nejedly 2007-12-05 20:30:04 UTC
I would actually close this as "As designed". The OS really provided the text with decomposed Unicode (which is legal)
and our editor just kept that in the given format. I would guess that it was in fact an implementation detail of the OS X
filesystem (it probably keeps filenames internally normalized into decomposed Unicode so you can easily locate the file
using both composed and decomposed characters) sneaking out through DnD.

The only fault on our side would actually be rendering - such a decomposed Unicode sequence should be rendered as a
single character, not as two independent ones, but that would make the rendering engine much more complicated, I guess.
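
For the point about treating a decomposed sequence as a single character, the JDK's java.text.BreakIterator can report character boundaries that keep a base letter and its combining marks together. The sketch below only counts such user-perceived characters; GraphemeCount is a hypothetical name, and nothing here says how the editor's caret or rendering code is actually implemented.

```java
import java.text.BreakIterator;

final class GraphemeCount {
    /** Counts user-perceived characters, treating a base letter plus combining marks as one. */
    static int countCharacters(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        int count = 0;
        // Each call to next() advances to the following character boundary.
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String decomposed = "gro\u0308\u00dfer.png";
        System.out.println(decomposed.length());         // number of UTF-16 code units
        System.out.println(countCharacters(decomposed)); // number of user-perceived characters
    }
}
```

Caret movement and deletion based on these boundaries, rather than on individual code units, would step over the "o" and its diaeresis together.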
Comment 17 Ken Frank 2007-12-06 17:57:14 UTC
From the comment/question below:
> Thanks for the explanation. So, it's fixed in 10.5 and may be fixed in 10.4 + Java Release 6, which has not been
> released yet except for its developer preview. Is there anybody on the list with 10.4 Java Release 6 (dev preview), who
> could confirm that this issue was fixed? If so, please add your comments here. Thanks

--> It's not fixed in 10.4 using the JDK preview. Thanks to Jirka for investigating this.

ken.frank@sun.com
Comment 18 muhlig 2007-12-14 20:34:28 UTC
Reading the Java for Mac OS X 10.4 Release 6 Release Notes I found out that it actually should be fixed in the new release (fix for Radar #5375935):

http://developer.apple.com/releasenotes/Java/Java104R6RN/ResolvedIssues/chapter_3_section_6.html#//apple_ref/doc/uid/TP40006829-CH3-DontLinkElementID_14

I cannot confirm this though, as I am on Leopard now.