Multibyte characters such as Korean ones occupy 2 columns per character. The IDE source editors don't account for this: they increase the column counter by 1 regardless of the character being typed. In summary, the Editor module should take into account that multibyte characters such as Korean ones are 2 columns wide. (Chinese and Japanese characters are probably the same.)
Are there any other editors that count the Korean and Japanese chars as 2 columns?
Target milestone -> 3.3.1.
Added the "I18N" prefix to the Summary and "jf4jbug@netbeans.org" to cc for tracking this bug.
As for C, we have wscol() to get the column width of wide characters. This function returns the screen display width. On Solaris, please type: % man wscol
Please note that the number of columns is not equal to the number of bytes (an alphanumeric character is 1 column, but 2 bytes in UTF-16). Also, not all CJK characters are 2 columns wide; some of them are 1 column. So if Java has no method like C's wscol(), I guess this is difficult to implement. Do you have any information on how to get the column width? The current editor shows the number of characters for wide chars instead of the number of columns.
*** Issue 19310 has been marked as a duplicate of this issue. ***
Set target milestone to TBD
I searched the internet for "java wscol" and "java wide character width" but found nothing really useful. Unfortunately, there seems to be no way to obtain this information in Java at this time, so I am changing this issue to an enhancement. If you know of a way to get this information in Java, please update the issue. Thanks.
*** Issue 164820 has been marked as a duplicate of this issue. ***
See also issue #164820.
Thanks to the wcwidth/wcswidth implementation provided by Markus Kuhn, I ported his C implementation to Java: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
I think it helps with counting the column width taken by characters. I have attached the Java version of wcwidth/wcswidth and a patch here.
I have a small doubt about my patch:
+ if (Character.isHighSurrogate(buffer[offset])) {
+ codePoint = Character.toCodePoint(buffer[offset], buffer[offset + 1]);
Should offset + 1 be checked to make sure it is within the range (offset, offset2)? Normally a low surrogate follows a high surrogate. But if the buffer is cut at the wrong place or the offset2 parameter is wrong, an IndexOutOfBoundsException might be thrown. I think it might be better to throw an exception to signal the error here somehow.
Also, Tibetan characters seem to be displayed improperly in Java. According to the algorithm, Tibetan characters are 1 column wide, like Latin characters, not 2 columns wide like CJK characters. When I use Microsoft Himalaya, a font with Tibetan support, to display text containing Tibetan characters, Java displays them 2 columns wide, which makes the algorithm look weird. Windows 7's Notepad counts and displays Tibetan characters as 1-column characters, so it has no such problem. In the screenshot you can see that the counting algorithm gets the same result as Notepad on that line, but the result differs on the other line, which contains no such characters. Could someone give me an explanation?
The algorithm also calculates some characters, such as Yi syllables, differently from Notepad. I'm still working on these points. But it works great for most CJK characters, half-width Katakana characters and Latin characters.
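For readers following the surrogate question above, here is a minimal sketch of a column counter that cannot read past offset2. It is not the attached patch: the class name ColumnWidthSketch and both method names are hypothetical, the wide ranges are only an approximation modelled on the ones in wcwidth.c, and giving control and combining characters zero width is an assumption. It relies on Character.codePointAt(char[], int, int), which returns the lone high surrogate instead of throwing when the pair is cut by the limit.

```java
/** Hypothetical helper for illustration; not the class from the attached patch. */
final class ColumnWidthSketch {

    /** Columns taken by one code point: 0, 1 or 2. The ranges are approximate. */
    static int charWidth(int cp) {
        int type = Character.getType(cp);
        // Assumption: control and combining characters take no column.
        if (Character.isISOControl(cp)
                || type == Character.NON_SPACING_MARK
                || type == Character.ENCLOSING_MARK) {
            return 0;
        }
        // East Asian wide ranges, modelled on wcwidth.c.
        boolean wide = cp >= 0x1100 && (
                cp <= 0x115F                                       // Hangul Jamo initial consonants
                || cp == 0x2329 || cp == 0x232A
                || (cp >= 0x2E80 && cp <= 0xA4CF && cp != 0x303F)  // CJK ... Yi
                || (cp >= 0xAC00 && cp <= 0xD7A3)                  // Hangul syllables
                || (cp >= 0xF900 && cp <= 0xFAFF)                  // CJK compatibility ideographs
                || (cp >= 0xFE30 && cp <= 0xFE6F)                  // CJK compatibility forms
                || (cp >= 0xFF00 && cp <= 0xFF60)                  // fullwidth forms
                || (cp >= 0xFFE0 && cp <= 0xFFE6)
                || (cp >= 0x20000 && cp <= 0x2FFFD)                // supplementary ideographs
                || (cp >= 0x30000 && cp <= 0x3FFFD));
        return wide ? 2 : 1;
    }

    /** Columns taken by buffer[offset, offset2); never reads at or beyond offset2. */
    static int columnWidth(char[] buffer, int offset, int offset2) {
        int columns = 0;
        int i = offset;
        while (i < offset2) {
            // codePointAt with a limit pairs surrogates only when both chars are in range.
            int cp = Character.codePointAt(buffer, i, offset2);
            columns += charWidth(cp);
            i += Character.charCount(cp);
        }
        return columns;
    }
}
```

With this approach a range that ends in the middle of a surrogate pair counts the unpaired high surrogate as 1 column instead of failing. Throwing an exception there, as suggested above, would be an equally valid design choice.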
Created attachment 87383 [details] #17356 patch (need reviewed and further mod)
Created attachment 87384 [details] counting result comparison (Tibetan ch line, non-Tibetan line, and Notepad)
Created attachment 87385 [details] Counting algorithm demo on CJK / Half-width KATAKANA / Latin mixed
Sorry, I was wrong: Windows 7's Notepad does not count characters by their column widths. It does what NetBeans currently does and counts only the number of characters. Supplementary characters are counted as 2 columns only because wchar_t is 2 bytes on Windows and a supplementary character is represented as a pair of wchar_t values, so it is counted as 2 characters. Eclipse does the same thing. I confirmed that CJK characters in the BMP and in the supplementary planes are counted differently in both Notepad and Eclipse. As for the counting difference between lines with and without Tibetan characters, it might be caused by the proportional font. A proportional font would certainly make the algorithm's results look weird, even though they are reasonable.
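The same effect can be reproduced in Java, where a String is also a sequence of 16-bit code units. The sketch below is only an illustration (the class name CodeUnitCountDemo and its method names are hypothetical): length() counts code units, so a supplementary character counts as 2, while codePointCount() counts it as 1. Neither value is a column width.

```java
/** Hypothetical demo class; shows code-unit counting versus code-point counting. */
final class CodeUnitCountDemo {

    /** Number of UTF-16 code units, which is what a per-char counter reports. */
    static int codeUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points; a surrogate pair counts once. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }
}
```

For a BMP ideograph such as "\u4E2D" both methods return 1. For the supplementary ideograph U+20000, written "\uD840\uDC00", codeUnits returns 2 and codePoints returns 1.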
Created attachment 87446 [details] Screenshot: Notepad counts differently on bmp and supplementary characters.
Created attachment 87447 [details] Screenshot: Eclipse behaves the same with notepad.
Maybe another NetFIX candidate.
Good idea Vito. I have added this candidate to the NetFIX Pool [1]. Thanks. [1] http://wiki.netbeans.org/NetFIXIssues
Could anyone attach a file with this kind of character for future tests?
Created attachment 88405 [details] Character samples
Hi hmichel, I've attached a file with some character samples. I hope it is helpful. The file contains 1) Latin characters, 2) Chinese characters, 3) Japanese Kanji, Katakana, Hiragana and half-width Katakana characters, 4) Korean Hanja and Hangul characters (I barely know Korean; the Korean characters are copied from Wikipedia). It should cover most cases of interest. I have also provided a file with some supplementary characters.
Created attachment 88406 [details] Some supplementary characters (CJK characters)
And this is a text file in which every line ends at exactly 83 columns (6x83).
Created attachment 88409 [details] 6x83 Character samples
Vito, can you please review Lao's latest patch and integrate it if you find it safe? Thanks a lot!
Sorry for the delay - http://hg.netbeans.org/jet-main/rev/f17598cd4963
Great job guys. Thanks!
Integrated into 'main-golden', will be available in build *200912051400* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress) Changeset: http://hg.netbeans.org/main/rev/f17598cd4963 User: Vita Stejskal <vstejskal@netbeans.org> Log: #17356: applying johnsonlau's patch