51672 – Bad displaying of characters with diacritics in output window

Status:	CLOSED DUPLICATE of bug 19928

Alias:	None

Product:	projects
Classification:	Unclassified
Component:	Ant
Version:	4.x
Hardware:	PC Linux

Importance:	P3 blocker
Assignee:	issues@projects

URL:
Keywords:

Depends on:
Blocks:

Reported:	2004-11-19 12:36 UTC by Roman Strobl
Modified:	2006-03-24 09:49 UTC
CC List:	4 users

See Also:
Issue Type:	DEFECT
Exception Reporter:

Attachments
Netbeans czech characters screenshot (75.36 KB, image/png) 2004-11-19 15:31 UTC, Roman Strobl
Screenshot - Czech characters in JBuilderX (65.57 KB, image/png) 2004-11-19 15:32 UTC, Roman Strobl
Screenshot of Czech characters in console (101.45 KB, image/png) 2004-11-19 15:43 UTC, Roman Strobl

Description Roman Strobl 2004-11-19 12:36:47 UTC

Not sure if this is the correct module, please
reassign if necessary.

When I create a class which prints characters with
diacritics, they are displayed incorrectly in the
output window. This happens even if I set the
character encoding attribute of the class to
ISO8859_2. As an example, try the following class:

public class TestClass {

    public static void main(String[] args) {
        System.out.println("Příliš žluťoučký kůň úpěl ďábelské ódy.");
    }

}

The text displayed in the output window is:

P&#65533;&#65533;li&#65533; &#65533;lu&#65533;ou&#65533;k&#65533; k&#65533;&#65533; &#65533;p&#65533;l &#65533;&#65533;belsk&#65533; &#65533;dy.

My OS is Java Desktop System v.2.
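
Output of this kind is what a decoder typically produces when bytes written in one encoding are read back in another. A minimal sketch of that effect follows. It assumes, without confirmation from this report, that the program wrote ISO-8859-2 bytes and that the reader decoded them as UTF-8. The class and method names are made up for illustration, and the pangram is written with Unicode escapes so the example does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of NetBeans.
final class MojibakeDemo {

    // The Czech pangram from the report, spelled with escapes only.
    static final String PANGRAM =
        "P\u0159\u00edli\u0161 \u017elu\u0165ou\u010dk\u00fd k\u016f\u0148 "
        + "\u00fap\u011bl \u010f\u00e1belsk\u00e9 \u00f3dy.";

    // Encode text with one charset and decode the resulting bytes with another.
    static String roundTrip(String text, Charset writer, Charset reader) {
        byte[] bytes = text.getBytes(writer);
        return new String(bytes, reader);
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");
        // Matching charsets: the text survives unchanged.
        System.out.println(roundTrip(PANGRAM, latin2, latin2));
        // Mismatched charsets: accented letters become replacement characters.
        System.out.println(roundTrip(PANGRAM, latin2, StandardCharsets.UTF_8));
    }
}
```

How the two printed lines look on screen still depends on the encoding and font of the console that runs the example.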

Comment 1 Roman Strobl 2004-11-19 12:39:11 UTC

Ok, it happened as I expected: netbeans.org doesn't handle
characters with Czech diacritics either :-) The sentence in the source
code uses the correct Czech characters. The text in the output window
contains squares instead of the characters with diacritics. As far as I
remember, this works well in other IDEs such as JBuilder.

Comment 2 Milos Kleint 2004-11-19 12:51:52 UTC

Please include details about your environment:
JDK

Does it print correctly when executed on the command line? Did you try
JBuilder on JDS or elsewhere?

Comment 3 Roman Strobl 2004-11-19 15:30:14 UTC

This issue occurs with JDK 1.5.0-b64.

Yes, I've tried JBuilderX; it works out of the box on JDS (but uses JDK
1.4.2_01). It also works fine from the Linux console with JDK 1.5.0 (I
am more than surprised) after setting the locale to cs_CZ.ISO-8859-2
and setting the correct console font. I'll check whether this works
with JBuilder2005, which supports JDK 1.5.

I cannot make it work with NetBeans, whether or not the encoding is set
in the property sheet. I am attaching screenshots.

Comment 4 Roman Strobl 2004-11-19 15:31:48 UTC

Created attachment 18969
Netbeans czech characters screenshot

Comment 5 Roman Strobl 2004-11-19 15:32:55 UTC

Created attachment 18970
Screenshot - Czech characters in JBuilderX

Comment 6 Roman Strobl 2004-11-19 15:43:07 UTC

Created attachment 18971
Screenshot of Czech characters in console

Comment 7 Milos Kleint 2004-11-20 17:42:58 UTC

ok, thanks for the info.

Tim, any hints where to look and how to fix?

Comment 8 _ tboudreau 2004-11-20 20:17:49 UTC

Well, the output window itself handles these characters fine (try running the task
core/output2/build.xml|demo from within the IDE; the task starts with):

<echo>And I am &#x05D9;&#x05E9;&#x05D9; &#x05D1;&#x05E0; &#x05DC;&#x05D6;
&#x05E8; or similar.</echo>

In fact, printing both Hebrew (R2L) and Czech characters was something we tested while it
was under development.

Embedding them directly in a string sent to the output window was not; it's expecting the
actual UTF-16 characters, not this encoded form. So, the question is *if* such translation
should be done, and if so, who should do it.

Comment 9 Milos Kleint 2004-11-20 21:06:41 UTC

Well, if you ask me, as lazy as I am, I would say output2 just handles
UTF-16 and nothing else :)

Seriously though, IMHO the only *unsecure* printouts can come from
user programs, correct? Or are there other places? What if someone
uses a Chinese localization of Ant, for example (is there something
like that)?

Within the IDE we are able to ensure UTF-16-only usage, I suppose. So
the correct place seems to be the code that is responsible for
wrapping and printing the user program's output. Where is that?

Comment 10 _ tboudreau 2004-11-20 21:29:15 UTC

Either the Ant execution piece, or some layer above the output window would be my 
preference;  probably best to keep all the output window code itself just handling UTF-16 
(though I suppose such a layer could be put into NbWriter if absolutely necessary - there's 
a reasonable amount of abstraction there).

Comment 11 Jan Chalupa 2004-11-21 21:11:53 UTC

IMO, the problem is in how Java determines the encoding based on the
OS setting and how it sets the file.encoding property. On my Win XP
system, I use the English (US) locale and NetBeans starts with
file.encoding set to Cp1252 by default. I can force it to run with an
Eastern Europe-friendly encoding (Cp1250 or ISO8859_2) by setting
the property when starting the IDE, e.g.:

  nb.exe -J-Dfile.encoding=Cp1250

This is important for displaying and saving localized files properly,
but it doesn't help when the localized output of an executed project
needs to be displayed in the Output window. By default, the project
is run in another VM (right?) which doesn't inherit runtime args and
properties from the launching VM. The default encoding is used in
this case, and the IDE captures and displays potentially mangled
localized output.

A simple fix that works for me is to explicitly set the 
file.encoding property for the project using Project Properties | 
Running Project | VM Options -> -Dfile.encoding=Cp1250. This way, 
both the IDE and the externally executed program use the same 
encoding and the Output window displays localized output just fine.

Interestingly, in NB 3.6 it works exactly the same way, although it 
uses a different output window component. No matter whether the IDE 
runs with the default or Eastern European encoding, output of a 
localized program executed using External Execution is mangled. 
Switching to Internal Execution fixes the problem (provided that 
file.encoding was set to Cp1250 when starting the IDE).

I'm not sure if this solution can be used on other operating systems 
as well. I'd expect the behavior to be similar. If so, this bug 
should probably be closed as WONTFIX. Obviously, if the user has the 
OS locale set correctly and Java uses it to determine the file 
encoding, this shouldn't be an issue at all.
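
To check which encoding a forked project VM actually ends up with, the program being run can report it itself. The following is a minimal sketch using only standard JDK calls. The class name is hypothetical, and on recent JDKs the default charset may no longer follow the OS locale, so the output is a diagnostic rather than a guarantee.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class; run it as the project's main class.
final class EncodingReport {

    // Describe the encoding-related settings of the current VM.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
            + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once with and once without -Dfile.encoding=Cp1250 in the project's VM options should show whether the option reaches the forked VM.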

Comment 12 _ tboudreau 2004-11-21 21:22:41 UTC

Probably the plumbing between Ant execution and the output window could do translation 
from the default encoding to UTF-16 (presumably it can detect correctly what the default 
encoding is via Locale.getDefault()) - it knows more about what encoding it should be 
expecting than the output window possibly can, so I think putting this stuff in the output 
window would likely just result in a similar set of problems.  The client doing the writing 
has a better shot at knowing what's being written.
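
A rough sketch of what such plumbing could look like: the layer that reads the program's raw output decodes it with an explicit charset before handing Java strings (UTF-16 internally) to the output window. Note that Locale.getDefault() describes language and country rather than an encoding; the platform encoding is available from Charset.defaultCharset(). The class and method names below are hypothetical and are not NetBeans APIs, and a byte array stands in for the process output stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a NetBeans API.
final class OutputDecoder {

    // Read raw bytes from a process stream and decode them line by line.
    static List<String> readLines(InputStream raw, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(raw, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Charset latin2 = Charset.forName("ISO-8859-2");
        // Stand-in for Process.getInputStream(): two lines of Latin-2 bytes.
        byte[] bytes = "k\u016f\u0148\n\u00f3dy\n".getBytes(latin2);
        System.out.println(readLines(new ByteArrayInputStream(bytes), latin2));
    }
}
```

The charset is a parameter on purpose: the caller that starts the process is the one in a position to know, or to be told, which encoding to expect.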

Comment 13 Roman Strobl 2004-11-22 09:40:08 UTC

I'm glad this issue has opened such a discussion. I would like to
emphasize that this *just works* out of the box in JBuilderX without
any special settings, so I would expect similar functionality from
NetBeans. If that is not possible, then common sense tells me that
setting a file encoding on the executed class should be enough. If
that is not possible either, then instructions would probably be
needed on how to configure NB so that various charsets work well on
various platforms. That's my opinion.

Comment 14 Milos Kleint 2004-11-22 09:59:34 UTC

Agreed. IMHO the conversion should be done on the side of the executor
of user programs, not in the output window.

Since this behaviour was there in the old output window as well, and
since it works with internal execution, I would say it's quite an
uncommon use case (which also explains why users of the IDE have not
discovered it so far).

Comment 15 Roman Strobl 2004-11-24 10:04:29 UTC

FYI, works out of box in JBuilder2005 with JDK 1.5.0_01, no extra
settings, even no locale settings. But most importantly it works out
of box with Eclipse 3.0.1 and JDK 1.5.0.

Comment 16 _ tboudreau 2004-11-25 07:07:45 UTC

Yes, that's well and good, certainly it should work out of the box.  The question of the 
moment is whether the output window itself should dereference the characters (I'm 99.9% 
sure it should *not*), or whether the infrastructure that feeds the output window with text 
should be doing the dereferencing.  Whenever you design a piece of software it should do 
one thing, only that thing and do it well.  In this case, the process which is sending text to 
the output window has a much better shot at figuring out what the characters being sent 
to it mean.  Yes, it needs to be fixed.  But fixing it wrong is worse than not fixing it at all.

Comment 17 Jesse Glick 2004-12-01 19:03:57 UTC

Probably you did not compile the class with the correct encoding when
using NB, but that's just a guess.

Everything works perfectly fine, out of the box, for me on Fedora Core
2/3, using UTF-8 encoding systemwide.

Any specific requests are probably covered by issue #19928. Generally,
if you set things up to use UTF-8 consistently, everything is fine;
otherwise you are in a world of pain and you will have to fix things
piece by piece.

Ant (incl. NB's integration of it) and the output window should handle
whatever encoding fine, but javac requires an -encoding argument in
general. (IMHO the Java spec should have enforced UTF-8 as the only
encoding possible, but too late for that now.)

*** This issue has been marked as a duplicate of 19928 ***

Comment 18 Roman Strobl 2004-12-03 16:03:15 UTC

I have updated my system to JDS release 3 (build 14), and the example
now works with various versions of NetBeans and JDKs! The difference
I've noticed is that the environment variable LANG is now set to
en_US.UTF-8, and this seems to be the reason why it started to work.
AFAIK JDS switched from country-specific encodings (ISO-xxxx-x) to
UTF-8, like most other Linux distributions, including Red Hat. If I
change the LANG variable to plain en_US, it stops working.

So most probably the output window encoding will work out of the box
on newer Linux distributions and JDS releases where the LANG variable
is set correctly. On older systems it should help to set
LANG=en_US.UTF-8, cs_CZ.UTF-8 or similar. It is enough to set it right
before launching NetBeans, like this:

export LANG=cs_CZ.UTF-8
/path-to-netbeans/bin/netbeans
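
The same idea applies when one Java program launches another: the child's locale environment can be set explicitly instead of being inherited. Here is a minimal sketch with ProcessBuilder, where the helper name and the command are placeholders. Whether a given JDK derives its encoding from LANG depends on its version and platform.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher helper.
final class LangLauncher {

    // Prepare a process whose environment has LANG set to the given value.
    static ProcessBuilder withLang(List<String> command, String lang) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.environment().put("LANG", lang);
        // Merge stderr into stdout so all output arrives on one stream.
        builder.redirectErrorStream(true);
        return builder;
    }

    public static void main(String[] args) {
        // Placeholder command; nothing is started here.
        ProcessBuilder builder =
            withLang(Arrays.asList("java", "TestClass"), "cs_CZ.UTF-8");
        System.out.println(builder.command() + " LANG="
            + builder.environment().get("LANG"));
    }
}
```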

Setting status of this issue to verified.