87158 – I18N - multibyte in uml reports not display correctly in some ja or zh asian locales


Status:	VERIFIED FIXED

Alias:	None

Product:	uml
Classification:	Unclassified
Component:	Reporting
Version:	5.x
Hardware:	Sun All

Importance:	P2 blocker
Assignee:	Yang Su

URL:
Keywords:	I18N

Depends on:
Blocks:

Reported:	2006-10-14 03:01 UTC by Ken Frank
Modified:	2006-11-29 16:00 UTC
CC List:	1 user

See Also:
Issue Type:	DEFECT
Exception Reporter:


Description Ken Frank 2006-10-14 03:01:58 UTC

This is seen in the current UML Griffin build, but Coco will be getting the
Griffin code later this month, so I am filing it here in BT as well as in Bugtraq.

If this is not correct, please let me know how it should be referenced
for Coco.

Background: UML reports use HTML files in the UML report jar as the basis for
some information and labels. When running in the Solaris ja UTF-8 locale, as
opposed to the ja EUC locale, the localized words do not display correctly in
the UML web report pages.

(I am not sure whether everything is OK on Windows or in the ja EUC or ja SJIS
locales; I will add to this report if it is not.)

From a mail exchange with Sheryl:

 Sheryl,

I am doing some additional testing of the report and of how multibyte text
looks in the web pages.

And I realized, regarding the need to change the meta charset tag, that the
problem is this: on Unix there are three ja sublocales and three zh sublocales,
each with a different default encoding.
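To make the sublocale point concrete, here is a minimal Java sketch (an illustration only, not code from the report module; the class and method names are hypothetical). It encodes one short Japanese string, 日本語, with the default encodings of the three Solaris ja sublocales mentioned in this report: EUC-JP for "ja", Shift_JIS for ja_JP.PCK, and UTF-8 for ja_JP.UTF-8. It assumes a JDK that ships the extended EUC-JP and Shift_JIS charsets; standard JDKs do, but the Java platform only guarantees a handful of charsets such as UTF-8 and ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class SubLocaleEncodings {
    // Encodes the same text with each named charset and keeps the raw bytes,
    // preserving the order in which the charsets were given.
    static Map<String, byte[]> encodeAll(String text, String... charsetNames) {
        Map<String, byte[]> result = new LinkedHashMap<>();
        for (String name : charsetNames) {
            result.put(name, text.getBytes(Charset.forName(name)));
        }
        return result;
    }

    // Formats bytes as space-separated uppercase hex pairs, e.g. "E6 97 A5".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Three kanji, written as Unicode escapes so the source compiles under any file encoding.
        String text = "\u65E5\u672C\u8A9E";
        // Default encodings of the Solaris "ja", ja_JP.PCK and ja_JP.UTF-8 locales.
        Map<String, byte[]> encoded = encodeAll(text, "EUC-JP", "Shift_JIS", "UTF-8");
        encoded.forEach((name, bytes) ->
                System.out.println(name + " (" + bytes.length + " bytes): " + hex(bytes)));
    }
}
```

The same three characters take 6 bytes in EUC-JP and in Shift_JIS (with different byte values) and 9 bytes in UTF-8, so a file saved in one sublocale is not byte-compatible with the others.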

What I do:

1. In the ja locale, add multibyte text to some lines of the web report HTML template files.
2. Add a meta charset tag of euc-jp.
3. Run in the ja_JP.UTF-8 locale. (Things are mostly OK if run in the ja locale; I will discuss that another time.)

For other product HTML files, such as JavaHelp files and the HTML files for the
New wizard description areas, we found that if the meta charset was euc-jp or
euc-cn, characters still displayed correctly in all of those sublocales, plus Windows.

I am not sure whether this is because the JavaHelp viewer and the template
description area in NB allow for it, or because of something else.

But in the report viewer, when the meta charset is euc-jp, and/or the actual
characters were created in the ja locale (euc-jp is the default encoding for the
regular ja Solaris locale), the characters do not display correctly in the report
pages when running in the ja UTF-8 locale: they are all either incorrect or
random ASCII. (Changing the encoding in the browser from euc-jp to UTF-8 does
not help either; the characters are still not shown correctly, just with
different symptoms.)
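The symptom above can be reproduced outside the IDE with a small Java sketch (hypothetical names; not the report module's code). Bytes saved in EUC-JP are decoded once with EUC-JP and once with UTF-8, the default charset of the ja_JP.UTF-8 locale. Java's String constructor replaces malformed input with U+FFFD, which is one form of the "incorrect characters" described; a browser may render the same mismatch differently. It assumes the JDK's EUC-JP charset is available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {
    // Decodes template bytes the way a reader using the given default charset would.
    static String readAs(byte[] templateBytes, Charset charset) {
        return new String(templateBytes, charset);
    }

    public static void main(String[] args) {
        String label = "\u65E5\u672C\u8A9E"; // three kanji, written as Unicode escapes
        Charset eucJp = Charset.forName("EUC-JP");

        // Template text saved while working in the Solaris "ja" locale (EUC-JP default).
        byte[] saved = label.getBytes(eucJp);

        // Read back with the same encoding: the round trip is lossless.
        System.out.println(readAs(saved, eucJp).equals(label));   // true

        // Read back under ja_JP.UTF-8: the EUC-JP bytes are malformed UTF-8,
        // so the decoder substitutes U+FFFD replacement characters.
        String garbled = readAs(saved, StandardCharsets.UTF_8);
        System.out.println(garbled.equals(label));                // false
        System.out.println(garbled.indexOf('\uFFFD') >= 0);       // true
    }
}
```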

PROBLEM: only one meta charset value can be used per localized jar and the
files in it, since there is just one localized jar for ja and one for zh. The
product code therefore needs to handle the case where the user is in another ja
or zh sublocale, or in any other locale that has sublocales with different
encodings. (Users in any locale can have diagrams with non-ASCII text, even if
they do not run in a localized locale.)

---> I can file something, but first I want to make sure my steps are OK with
respect to your assumptions.

Thanks - Ken

Sheryl's reply:
That's indeed the problem. We are using a combination of static templates and
generated runtime content; any suggestions on how to address it? Should we use
UTF-8 instead?

Comment 1 Ken Frank 2006-10-14 03:46:08 UTC

This happens as well in the Solaris ja_JP.PCK locale (SJIS encoding).
I think it will happen in the other Unix zh_CN locales as well.


On Windows, even without localized files, if a class name in a class diagram
contains multibyte characters, that name shows as garbage ASCII in the All
Elements section at the lower left of the report window, even though it shows
correctly in the main part of the window. So this is not related to the
localized HTML files, and perhaps needs a separate issue?

But after clicking All Packages, these class names show multibyte correctly in
the lower left.

To clarify, a user who does not run the localized release can still use
multibyte characters in Java class, method and other names, as well as elsewhere
in UML diagrams.

ken.frank@sun.com

Comment 2 Yang Su 2006-11-16 00:10:30 UTC

The root cause is that we read the templates in the client's default encoding,
ignoring the fact that one locale, supplied with a single localized jar
resource, is often associated with multiple encodings.

So the encoding used in HTML templates must be well defined at implementation
time; in our case it is UTF-8. The report module reads the static content in
UTF-8, adds the generated data, and then writes out the HTML files in the same
encoding, UTF-8.
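A minimal sketch of the approach described above, assuming templates are loaded as streams (for example from a jar resource) and that generated values are inserted at ${key} placeholders. The placeholder syntax and the class and method names are hypothetical, not taken from the actual report module. The point it illustrates is that both the reader and the writer name UTF-8 explicitly instead of relying on the platform default, and the template declares the same charset in its meta tag.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class Utf8ReportWriter {
    // Reads the whole template with an explicit UTF-8 decoder (never the platform
    // default) and closes the template stream when done, replaces hypothetical
    // ${key} placeholders with generated values, and writes the result as UTF-8.
    static void render(InputStream template, Map<String, String> values, OutputStream out)
            throws IOException {
        StringBuilder html = new StringBuilder();
        try (Reader in = new BufferedReader(new InputStreamReader(template, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                html.append(buf, 0, n);
            }
        }
        String result = html.toString();
        for (Map.Entry<String, String> e : values.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", e.getValue());
        }
        // Flush but do not close, so the caller still owns the output stream.
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(result);
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // The template declares the same charset it is stored and read in.
        String template = "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">"
                + "</head><body><h1>${title}</h1></body></html>";
        ByteArrayInputStream in = new ByteArrayInputStream(template.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        render(in, Map.of("title", "\u30AF\u30E9\u30B9"), out); // katakana "kurasu" (class)
        System.out.println(out.toString("UTF-8"));
    }
}
```

Because the encoding no longer depends on the sublocale, a single localized jar can serve ja, ja_JP.PCK and ja_JP.UTF-8 alike, provided the templates themselves are saved as UTF-8.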

I have implemented logic to handle the encoding for reading and writing, and
tested it on Solaris 10 under zh_CN with various encodings.

I noticed that in the 061113_2 build a few localized HTML templates (e.g.
overview-summary.html) were saved in a non-UTF-8 encoding; that must be
corrected by the localization team.
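A localization build could catch templates like these with a strict check. The sketch below (hypothetical class name; it assumes the templates are available as files, e.g. extracted from the localized jar) decodes each file with a UTF-8 decoder set to report malformed input rather than replace it. A failure proves a file is not UTF-8; passing is a strong but not absolute signal, since pure-ASCII files and some rare byte sequences in other encodings are also valid UTF-8.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8TemplateCheck {
    // Returns true only if the bytes are well-formed UTF-8; malformed input is
    // reported as an exception instead of being silently replaced.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass template paths (e.g. an extracted overview-summary.html) on the command line.
        for (String arg : args) {
            Path p = Paths.get(arg);
            boolean ok = isValidUtf8(Files.readAllBytes(p));
            System.out.println((ok ? "OK: " : "NOT UTF-8: ") + p);
        }
    }
}
```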

Comment 3 Sergey Petrov 2006-11-22 11:22:04 UTC

Maybe issue 78633 has the same root cause as this one (it was closed as not
reproducible). I can't verify this one with the ja locale.

Comment 4 Ken Frank 2006-11-29 16:00:55 UTC

This can be verified now due to changes and fixes from other issues
related to UML reports.

If anything related to UML reports still does not seem correct,
please open a new issue for it.

ken.frank@sun.com