90921 – I18N - Unicode characters become '?'



Status:	CLOSED FIXED

Alias:	None

Product:	java
Classification:	Unclassified
Component:	Editor
Version:	5.x
Hardware:	All All

Importance:	P3 blocker
Assignee:	Tomas Zezula

URL:
Keywords:	I18N

Depends on:
Blocks:

Reported:	2006-12-12 07:45 UTC by _ tboudreau
Modified:	2007-06-10 20:44 UTC
CC List:	4 users

See Also:
Issue Type:	DEFECT
Exception Reporter:


Description _ tboudreau 2006-12-12 07:45:30 UTC

I have the following code in the editor:

    public char getUnicodeChar() {
        switch (this) {
            case HEARTS :
                return '♥';
            case DIAMONDS :
                return '♦';
            case CLUBS :
                return '♣';
            case SPADES :
                return '♠';
            default :
                throw new AssertionError();
        }
    }

(well, who knows what issuezilla will do to it either - the characters are the
constants you can find at the bottom of the page here: 
http://en.wikipedia.org/wiki/Playing_card ).

If I save the file, close it, garbage collect and reopen it, what I get is:
    
    public char getUnicodeChar() {
        switch (this) {
            case HEARTS :
                return '?';
            case DIAMONDS :
                return '?';
            case CLUBS :
                return '?';
            case SPADES :
                return '?';
            default :
                throw new AssertionError();
        }
    }

And indeed, when I run, it is really '?' that I get.

If the editor is going to look like it is accepting pastes of unicode
characters, it should not lose them on save.
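The loss described above can be reproduced outside the IDE. The following is a minimal sketch, assuming the file was saved in windows-1252 (a common Windows default; the report does not name the actual encoding). The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LossySaveDemo {
    /** Encodes the text with the given charset and decodes it again, like a save/reopen cycle. */
    static String roundTrip(String text, Charset charset) {
        // String.getBytes silently substitutes the charset's replacement byte
        // for every character the charset cannot represent.
        byte[] saved = text.getBytes(charset);
        return new String(saved, charset);
    }

    public static void main(String[] args) {
        // Hearts, diamonds, clubs, spades, written as escapes so this file is pure ASCII.
        String suits = "\u2665\u2666\u2663\u2660";
        Charset assumedDefault = Charset.forName("windows-1252"); // assumption, see above

        System.out.println(roundTrip(suits, assumedDefault).equals("????"));       // true
        System.out.println(roundTrip(suits, StandardCharsets.UTF_8).equals(suits)); // true
    }
}
```

No exception is raised during the lossy round trip, which matches the silent replacement seen after reopening the file.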

Comment 1 Jiri Prox 2006-12-12 16:39:47 UTC

What is the encoding of the Java file? Can it represent such characters?
You can find the encoding in the properties of each Java file, or the default
encoding can be set in the advanced options: Editing -> Java Source

Comment 2 Jesse Glick 2006-12-12 17:02:31 UTC

Cannot reproduce on my machine: Linux FC6 with default system locale UTF-8.

However I know that Java files can have various encodings. IMHO if the selected
encoding cannot represent some character in the document, the editor kit impl
must write out an appropriate \uXXXX escape sequence. BTW I think there is
already a bug almost exactly like this filed for .properties files.
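The escaping suggested in this comment could look roughly like the sketch below. It is not the editor kit's code; the class and method names are hypothetical, and windows-1252 is only an assumed target encoding. It relies on CharsetEncoder.canEncode(char) from java.nio.charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class UnicodeEscaper {
    /**
     * Returns the source text with every char that the charset cannot encode
     * replaced by a backslash-u escape of four hex digits.
     */
    static String escapeUnmappable(String source, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (encoder.canEncode(c)) {
                out.append(c);
            } else {
                // Surrogate halves are tested one char at a time, so they are
                // always escaped; that is still legal in Java source.
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Charset target = Charset.forName("windows-1252"); // assumed file encoding
        // Prints the statement with an escape in place of the heart character.
        System.out.println(escapeUnmappable("return '\u2665';", target));
    }
}
```

Because javac translates such escapes before lexing, the escaped file compiles to the same character regardless of the encoding it is read with.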

Comment 3 Miloslav Metelka 2006-12-12 17:26:25 UTC

So it seems that special Unicode characters are used directly in the example
code.
IMHO the problem is that the char-to-byte converter used when saving the file is
likely unable to encode these characters.
Anyway, the editor module only writes the document's content into a properly
configured java.io.Writer that is passed as a parameter to EditorKit.write(),
so I am reassigning to openide/editor for evaluation (IMHO this could in fact be
a dup of issue 19928, but I am not sure).

Comment 4 Jesse Glick 2006-12-12 17:59:27 UTC

Specific to the text/x-java content type.

Comment 5 _ tboudreau 2006-12-12 22:42:06 UTC

Setting the encoding of the source file to UTF-8 fixes the problem - the
characters are preserved on save, so part of the problem is the default encoding
on Windows.

However, using UTF-8 in the source gets me:

Compiling 1 source file to
C:\space\nbsrc\serialversion\samples\nodes_and_explorer_views\CardsSuite\cards\build\classes
C:\space\nbsrc\serialversion\samples\nodes_and_explorer_views\CardsSuite\cards\src\org\netbeans\examples\cards\api\Suits.java:81:
unclosed character literal
                return 'â™¥';

So I'm still out of luck.
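The three characters in the compiler message are consistent with the UTF-8 bytes of the heart being decoded as windows-1252. The sketch below shows that effect; it assumes windows-1252 was the compiler's default encoding, which the log does not state, and its names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MisdecodeDemo {
    /** Decodes the UTF-8 bytes of the text with another charset, as a misconfigured reader would. */
    static String misread(String text, Charset assumed) {
        return new String(text.getBytes(StandardCharsets.UTF_8), assumed);
    }

    public static void main(String[] args) {
        // U+2665 is the heart; in UTF-8 it is the three bytes E2 99 A5.
        String seen = misread("\u2665", Charset.forName("windows-1252"));

        System.out.println(seen.length()); // 3
        for (char c : seen.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // U+00E2, U+2122, U+00A5
        }
    }
}
```

A character literal may hold only one char, so three chars between the quotes yield the "unclosed character literal" error.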

It would be preferable if the editor would somehow warn that I've put characters
into a document that cannot be saved in the encoding I'm using.  Nicer still
would be doing unicode escapes automagically.

Comment 6 Jesse Glick 2006-12-12 23:16:10 UTC

Of course, if you set the file encoding to UTF-8 in the editor, you still need to
change the Ant build to use that encoding. That, among other things, is covered
separately by the more general issue #42638. Any fix in this issue should be
considered a stopgap measure only, and maybe WONTFIX (or duplicate).

Comment 7 Jesse Glick 2007-03-22 19:08:13 UTC

Should be reevaluated now that issue #42638 is fixed.

Comment 8 Tomas Zezula 2007-03-23 08:43:51 UTC

Fixed for J2SEProject by fixing issue #42638. For other project types it should
start to work once FileEncodingQueryImplementation is implemented for them; see
umbrella issue #97848.

To verify: create a project, set UTF-8 encoding, paste the code, save, close,
open => correctly loaded. Add a definition of DIAMONDS, HEARTS,... then build
and run.

Comment 9 Ken Frank 2007-06-10 20:33:02 UTC

I am verifying.

I am in the Solaris ja EUC locale (which usually emulates being on Windows, in
that its encoding is not UTF-8).

I paste the code with the special characters and it shows OK; I close the editor
and reopen it, and the code shows OK.

The default encoding of the project is UTF-8 even though I am running the IDE
from a non-UTF-8 locale; I am guessing that if the project encoding were
changed, such special chars might then be expected not to display correctly.

I don't know if this was also related to running the project.

ken.frank@sun.com

Comment 10 Ken Frank 2007-06-10 20:44:03 UTC

Running a J2SE project in the ja Solaris locale with the default project
encoding of UTF-8, however, the special characters mentioned in this report
don't show OK in the output window, but as ? (even though they are OK in the
editor).

The multibyte characters do show OK in the output window.

--> This can be filed separately, but since UTF-8 is used, isn't it expected
that these chars should show OK without the user needing to set compiler or Ant
options related to encoding?

ken.frank@sun.com
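One possible explanation for the '?' in the output window is that the running program's output stream encodes text with the locale's encoding, not the project's source encoding. This is an assumption, since the report does not confirm it. The sketch below uses US-ASCII as a stand-in for an encoding that lacks the suit characters; the class and method names are hypothetical, and the Charset-taking PrintStream constructor needs Java 10 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class OutputEncodingDemo {
    /** Returns the bytes a PrintStream emits for the text when it uses the given charset. */
    static byte[] printedBytes(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, charset);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String heart = "\u2665";
        // Stand-in for a locale encoding without the character: one '?' byte.
        System.out.println(Arrays.toString(printedBytes(heart, StandardCharsets.US_ASCII))); // [63]
        // UTF-8 output keeps the character as three bytes.
        System.out.println(printedBytes(heart, StandardCharsets.UTF_8).length); // 3
    }
}
```

Under that assumption, the source file and the compiled class can both be correct while the text is still degraded at print time.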