41385 – I18N - Term does not correctly render Unicode

Summary: I18N - Term does not correctly render Unicode

Status:	NEW

Alias:	None

Product:	cnd
Classification:	Unclassified
Component:	Terminalemulator
Version:	6.x
Hardware:	PC Linux

Importance:	P4 blocker with 1 vote
Assignee:	ivan

URL:
Keywords:	I18N

Depends on:
Blocks:

Reported:	2004-03-25 23:55 UTC by Jesse Glick
Modified:	2017-01-29 06:37 UTC
CC List:	3 users

See Also:
Issue Type:	DEFECT
Exception Reporter:

Attachments
Screenshot of term output (2.52 KB, image/png) 2004-03-25 23:56 UTC, Jesse Glick
Dialog showing proper rendering (6.80 KB, image/png) 2004-03-26 00:01 UTC, Jesse Glick
Screenshot of output2 on same program (7.31 KB, image/png) 2008-10-23 19:01 UTC, Jesse Glick
Program run from Gnome Terminal, showing lack of RTL rendering in Hebrew, and only partial Indic support (KA+VIRAMA क् correct but not combined with SSA ष) (7.14 KB, image/png) 2008-10-23 19:07 UTC, Jesse Glick
From FSF Emacs/X, showing complete lack of Unicode support (plus some missing glyphs) (1.64 KB, image/png) 2008-10-23 19:08 UTC, Jesse Glick

Description Jesse Glick 2004-03-25 23:55:34 UTC

Consider the following program, run with Ant in a
NB dev build on a RH9 system (system locale UTF-8):

public class Main {
    public static void main(String[] args) {
        System.out.println("My name is \u05D9\u05E9\u05D9 \u05D1\u05DF \u05DC\u05D6\u05E8...");
        System.out.println("And \u0915\u094D\u0937 is KA VIRAMA SSA");
    }
}

The first line should render the Hebrew
right-to-left, but Term renders it left-to-right.
The second line should render a single glyph for
the three Unicode characters (due to Devanagari
orthographic rules), but it is rendered as three
separate glyphs without consideration to their
proper display.

Screenshot attached.
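
For reference, the expected visual order of the first line can be approximated with the standard java.text.Bidi class. This is only a sketch, not code from Term: the class name VisualOrder and the method toVisual are hypothetical, the base direction is assumed to be left-to-right, and character mirroring and combining marks are ignored.

```java
import java.text.Bidi;

final class VisualOrder {
    /** Returns the characters of a logical-order string in left-to-right display order. */
    static String toVisual(String logical) {
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_LEFT_TO_RIGHT);
        int n = bidi.getRunCount();
        byte[] levels = new byte[n];
        String[] runs = new String[n];
        for (int i = 0; i < n; i++) {
            levels[i] = (byte) bidi.getRunLevel(i);
            String run = logical.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // Runs with an odd embedding level are right-to-left: reverse their characters.
            runs[i] = (levels[i] & 1) == 1
                    ? new StringBuilder(run).reverse().toString()
                    : run;
        }
        // Put the runs themselves into visual order (in place).
        Bidi.reorderVisually(levels, 0, runs, 0, n);
        return String.join("", runs);
    }

    public static void main(String[] args) {
        String logical = "My name is \u05D9\u05E9\u05D9 \u05D1\u05DF \u05DC\u05D6\u05E8...";
        // A renderer that paints strictly left to right could paint this string.
        System.out.println(toVisual(logical));
    }
}
```

A fixed-cell renderer could paint the returned string one character per cell, which is roughly the level of support needed for unpointed Hebrew.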

Comment 1 Jesse Glick 2004-03-25 23:56:45 UTC

Created attachment 14160
Screenshot of term output

Comment 2 Jesse Glick 2004-03-26 00:00:22 UTC

That Java itself supports these rendering styles can be seen plainly
from the modified program:

import java.awt.Font;
import javax.swing.JLabel;
import javax.swing.JOptionPane;
public class Main {
    public static void main(String[] args) {
        String text = "My name is \u05D9\u05E9\u05D9 \u05D1\u05DF \u05DC\u05D6\u05E8...\nAnd \u0915\u094D\u0937 is KA VIRAMA SSA";
        System.out.println(text);
        JLabel l = new JLabel(text);
        l.setFont(new Font("Monospaced", Font.PLAIN, 24));
        JOptionPane.showMessageDialog(null, l);
    }
}

Comment 3 Jesse Glick 2004-03-26 00:01:57 UTC

Created attachment 14161
Dialog showing proper rendering

Comment 4 _ tboudreau 2004-04-19 14:06:11 UTC

Reassigning to Marek, new owner of output window and help system

Comment 5 Marian Mirilovic 2006-09-25 10:50:28 UTC

The core team has not been responsible for the terminal emulator for a long time, so
reassigning all open issues to the responsible person.

Comment 6 ralphrmartin 2008-08-20 15:05:13 UTC

This is a serious issue for anyone using NetBeans to develop programs using non-Western fonts.

Please can we have some action on this?

Comment 7 Ken Frank 2008-08-20 15:53:48 UTC

To developers: is this still the correct category for this issue?

Also, is P4 (vs. P3) still valid for it?

Ralph, see also 96333, 96472, and 91440 on RTL issues and please comment there as needed.

ken.frank@sun.com

Comment 8 Ken Frank 2008-08-20 16:05:33 UTC

Also see 69750 and 98904, related to RTL.

Comment 9 ivan 2008-08-20 19:38:29 UTC

Jesse seems to have filed this very explicitly against the terminalemulator.
The terminalemulator is not being used in mainline netbeans so it can't be
a "serious issue".
I know for a fact that Term doesn't do right-to-left Unicode and overlaid
characters so this is still an issue for Term.

It needs to be determined whether output2, which replaced terminalemulator,
still suffers from this. When I tried it on my Solaris 9 system I just got
???'s so it's hard for me to tell.

Term still needs to address this as it is having a comeback
(see http://wiki.netbeans.org/TerminalEmulator) so P4 seems just right.

Comment 10 ivan 2008-10-22 09:20:21 UTC

I can testify that there is no code in Term that handles
left-to-right vs right-to-left rendering of Unicode characters
so this needs to be looked into. (The trick is to get a Hebrew locale :-)

Comment 11 Jesse Glick 2008-10-23 18:59:00 UTC

You do not need to run in any special locale to observe this, as the rendering is specific to the characters themselves,
not the environment.
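
A quick way to confirm this is to query the Unicode properties directly, which the JDK exposes independently of the default locale. The sketch below uses only Character.getDirectionality and Bidi.requiresBidi from the standard library; the class and method names (DirectionalityCheck, hasRtl) are made up for illustration.

```java
import java.text.Bidi;

final class DirectionalityCheck {
    /** True if the text contains characters that need bidirectional layout. */
    static boolean hasRtl(String s) {
        char[] chars = s.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    public static void main(String[] args) {
        // HEBREW LETTER YOD is right-to-left by its Unicode property, in any locale.
        System.out.println(Character.getDirectionality('\u05D9')
                == Character.DIRECTIONALITY_RIGHT_TO_LEFT);                  // true
        System.out.println(hasRtl("My name is \u05D9\u05E9\u05D9"));         // true
        // Devanagari is a left-to-right script, so no bidi pass is needed for it.
        System.out.println(hasRtl("And \u0915\u094D\u0937 is KA VIRAMA SSA")); // false
    }
}
```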

Whether this is worth fixing is another question. RTL rendering for standard Hebrew (without vowel points or other
diacritics) would not be too hard, but correct rendering for natural-language text in Arabic, Indic, and some other
scripts would be very hard in Term if it assumes that each character occupies a distinct rectangle of fixed size, which
is not even remotely true. Perhaps we don't care about supporting program output in such scripts.
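
To illustrate why the one-character-per-cell assumption breaks down, here is a rough sketch that counts how many cells a string would need if non-spacing marks were overlaid on the preceding cell. It is not how Term works; CellCount and cellsSkippingMarks are hypothetical names, and the rule used (skip marks, one cell for everything else) is a deliberate simplification.

```java
final class CellCount {
    /** Counts code points that would advance the cursor if marks share the previous cell. */
    static int cellsSkippingMarks(String s) {
        int cells = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            int type = Character.getType(cp);
            if (type != Character.NON_SPACING_MARK && type != Character.ENCLOSING_MARK) {
                cells++;
            }
            i += Character.charCount(cp);
        }
        return cells;
    }

    public static void main(String[] args) {
        String ksha = "\u0915\u094D\u0937"; // KA, VIRAMA, SSA
        System.out.println(ksha.length());            // 3
        System.out.println(cellsSkippingMarks(ksha)); // 2
    }
}
```

Even with the virama folded into the KA cell, the count is 2 rather than 1: forming the single conjunct glyph needs font shaping, which a per-character cell model cannot express.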


Note that it seems to work tolerably well in output2, as I will show in a new attachment. At least the basic display is
correct, for both RTL and composite glyphs.

Unlike a true Unicode-aware editor, output2 in text-wrap mode seems to implement caret display (for movement and
highlighting) at the glyph level rather than the character level, so that you can highlight e.g. (and I will try to
annotate this with Latin equivalents in case your browser's RTL rendering is messed up):

My name is ישי בן לזר...
        ^^^^^^
My name is RZL NB JSJ... (~ "yishai ben lazer", supposedly my name)

whereas using Shift-RightArrow in a proper editor (JTextArea, gedit, Firefox text area, ...) would produce e.g.

My name is ישי בן לזר...
        ^^^       ^^^
My name is RZL NB JSJ...

in the logical character order. Confusingly, output2 really is selecting ישי (as you will see if you copy and paste to
another editor), but displays the highlight on לזר.

In non-text-wrap mode, output2 behaves differently, though also incorrectly, showing the highlight as

My name is בן לזר...ישי
        ^^^^^^
My name is JSJ...RZL NB

or some corrupt variant depending on whether or not you refresh the display (i.e. the paint logic does not keep track of
every keystroke).

For Indic rendering, output2 in TWM correctly displays क्ष as a single glyph and highlights it as such with a single
caret movement, but does not copy & paste correctly (just copies क, the first character). In NTWM it behaves correctly,
letting you highlight  क (KA ~ /ka/) then  क् (KA+VIRAMA ~ /k/) then  क्ष (KA+VIRAMA+SSA ~ /ksha/) in sequence, and
copying them appropriately to clipboard.
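
For anyone checking copy and paste results, the code points that actually landed on the clipboard can be listed by name. A small sketch using Character.getName (Java 7 or later) and String.codePoints (Java 8 or later); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointNames {
    /** Returns the Unicode name of each code point in the string, in logical order. */
    static List<String> names(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp -> out.add(Character.getName(cp)));
        return out;
    }

    public static void main(String[] args) {
        String ksha = "\u0915\u094D\u0937";
        // Print the three prefixes that can be highlighted in sequence.
        for (int end = 1; end <= ksha.length(); end++) {
            System.out.println(names(ksha.substring(0, end)));
        }
    }
}
```

A correct copy of the full cluster should list DEVANAGARI LETTER KA, DEVANAGARI SIGN VIRAMA, and DEVANAGARI LETTER SSA; the text-wrap-mode bug described above would show only the first.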

Comment 12 Jesse Glick 2008-10-23 19:01:29 UTC

Created attachment 72553
Screenshot of output2 on same program

Comment 13 Jesse Glick 2008-10-23 19:07:36 UTC

Created attachment 72554
Program run from Gnome Terminal, showing lack of RTL rendering in Hebrew, and only partial Indic support (KA+VIRAMA क् correct but not combined with SSA ष)

Comment 14 Jesse Glick 2008-10-23 19:08:29 UTC

Created attachment 72555
From FSF Emacs/X, showing complete lack of Unicode support (plus some missing glyphs)