99488 – I18N: javadoc index search is not working for UTF-8 based Japanese Javadoc

Summary: I18N: javadoc index search is not working for UTF-8 based Japanese Javadoc

Status:	RESOLVED FIXED

Alias:	None

Product:	java
Classification:	Unclassified
Component:	Javadoc
Version:	5.x
Hardware:	All All

Importance:	P3 blocker
Assignee:	Jan Pokorsky

URL:
Keywords:	I18N

Duplicates (1):	108492
Depends on:
Blocks:

Reported:	2007-03-30 06:09 UTC by Masaki Katakai
Modified:	2007-10-12 17:49 UTC
CC List:	3 users

See Also:
Issue Type:	DEFECT
Exception Reporter:

Attachments
proposed patch, check Japanese encoding first. If utf-8, we need to check contents and check localized strings in it. JISAutodetect encoding should not be used. Using allclasses-frame.html file would be reasonable. This will not break English build. Will (4.07 KB, patch) 2007-04-05 03:39 UTC, Masaki Katakai

Description Masaki Katakai 2007-03-30 06:09:49 UTC

It seems that the current code assumes only iso-2022-jp, sjis and euc-jp as
Japanese Javadoc encodings. Only these encodings are accepted, which means that
when users create Japanese Javadoc in UTF-8, it is not accepted.

 Jdk12SearchType_japan.java:
        if ("jisautodetect".equals(acceptedEncoding)) {     //NOI18N
            return "iso-2022-jp".equals (encoding) ||       //NOI18N
                   "sjis".equals (encoding) ||              //NOI18N
                   "euc-jp".equals (encoding);              //NOI18N
                   // || "utf-".equals (encoding);  XXX Probably not, UTF-8 can be anything ????
        }

The JDK changed the encoding of its Japanese Javadoc from euc-jp to UTF-8 in JDK 6.

jdk-6-doc-ja.zip
(can be downloaded from http://java.sun.com/javase/ja/6/download.html)

So currently search does not work with this Javadoc zip. We need to think of a
better way to accept such UTF-8 based Javadoc as Japanese Javadoc.

Comment 1 Masaki Katakai 2007-04-03 04:12:50 UTC

In the current implementation, accepts() only checks the encoding: it accepts
possible Japanese encodings, e.g. sjis, euc-jp and iso-2022-jp. utf-8 is now
widely used, so we should accept it, and we need to provide an additional way to
determine whether the Javadoc contains Japanese keywords.

If the encoding is Japanese -> accept (return true)
If the encoding is utf-8 -> check contents; if Japanese keywords are included -> accept (return true)

Any idea about other conditions? Should we check NetBeans locale?
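
The two rules above can be sketched roughly as follows. This is only an illustration, not the code from the attached patch: the class name `JapaneseJavadocAccept`, the `accepts(encoding, sampleContent)` signature and the kana-based content check are all assumptions made for this example. Kana is used as the signal because Han characters alone also occur in Chinese text.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the proposed acceptance rules; not the NetBeans code.
class JapaneseJavadocAccept {
    // Legacy Japanese encodings listed in the description of this bug.
    private static final Set<String> JAPANESE_ENCODINGS =
            Set.of("iso-2022-jp", "sjis", "euc-jp");

    /** Accept legacy Japanese encodings directly; for utf-8, look at the content. */
    static boolean accepts(String encoding, String sampleContent) {
        if (encoding == null) {
            return false;
        }
        String enc = encoding.toLowerCase(Locale.ROOT);
        if (JAPANESE_ENCODINGS.contains(enc)) {
            return true;
        }
        if ("utf-8".equals(enc)) {
            return containsKana(sampleContent);
        }
        return false;
    }

    /** Assumed heuristic: any Hiragana or Katakana character counts as Japanese. */
    static boolean containsKana(String text) {
        if (text == null) {
            return false;
        }
        return text.codePoints().anyMatch(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            return script == Character.UnicodeScript.HIRAGANA
                    || script == Character.UnicodeScript.KATAKANA;
        });
    }
}
```

With this sketch, `accepts("euc-jp", null)` is true without looking at content, while `accepts("utf-8", text)` depends on whether `text` contains kana.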

Comment 2 Rebecca Liu 2007-04-03 08:36:46 UTC

We have the same situation for Simplified Chinese. It seems that only GB2312,
GB18030, GBK are accepted, but not UTF-8.

Comment 3 Ken Frank 2007-04-03 15:17:11 UTC

I think it's reasonable to include ja or zh_CN utf8 encoding as a search parameter,
since at least on Solaris and Linux there are ja and zh_CN utf8 locales.
Usually the assumption is that the encoding in use is that of the locale the user is in,
but the user can also change the encoding of a given java file through properties.

So perhaps the encoding of each searched file could also be used, but that may
not be possible since the user enters the search term while in a certain
locale and encoding.

ken.frank@sun.com

Comment 4 Masaki Katakai 2007-04-05 03:39:51 UTC

Created attachment 40469 [details]
proposed patch, check Japanese encoding first. If utf-8, we need to check contents and check localized strings in it. JISAutodetect encoding should not be used. Using allclasses-frame.html file would be reasonable. This will not break English build. Will
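
As a rough illustration of the content check described above, the sketch below reads allclasses-frame.html as UTF-8 and looks for a localized string. The class name `AllClassesProbe` and the marker string are assumptions for this example: the marker is the Japanese for "All Classes" (すべてのクラス, written with Unicode escapes in the code), and the string actually used by the patch is not shown in this report.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; not the code from attachment 40469.
class AllClassesProbe {
    // Assumed marker: Japanese for "All Classes", as Unicode escapes.
    static final String ASSUMED_MARKER = "\u3059\u3079\u3066\u306e\u30af\u30e9\u30b9";

    /** Returns true if allclasses-frame.html under the root contains the marker. */
    static boolean looksJapanese(Path javadocRoot) throws IOException {
        Path index = javadocRoot.resolve("allclasses-frame.html");
        if (!Files.isRegularFile(index)) {
            return false;
        }
        // Decode as UTF-8, since this check only applies to utf-8 Javadoc.
        String html = new String(Files.readAllBytes(index), StandardCharsets.UTF_8);
        return html.contains(ASSUMED_MARKER);
    }
}
```

An English Javadoc tree has no such string in allclasses-frame.html, so a check of this shape would leave the English search type unaffected.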

Comment 5 Masaki Katakai 2007-07-30 05:32:15 UTC

Can anyone review the patch?

Comment 6 Jan Pokorsky 2007-07-30 19:59:50 UTC

I just do not see any reason to read "JDK12_ALLCLASSES_JA" from the Bundle.properties file. Otherwise it looks OK. Feel free
to integrate it. Thanks for the patch!

Comment 7 Jan Pokorsky 2007-08-07 15:14:28 UTC

fixed in

/cvs/javadoc/src/org/netbeans/modules/javadoc/search/Jdk12SearchType_japan.java,v  <--  Jdk12SearchType_japan.java
new revision: 1.11; previous revision: 1.10

Comment 8 Jan Pokorsky 2007-08-07 15:54:00 UTC

/cvs/javadoc/src/org/netbeans/modules/javadoc/search/Jdk12SearchType_japan.java,v  <--  Jdk12SearchType_japan.java
new revision: 1.12; previous revision: 1.11

Comment 9 Jan Pokorsky 2007-08-07 15:54:54 UTC

*** Issue 108492 has been marked as a duplicate of this issue. ***

Comment 10 Ken Frank 2007-10-10 18:45:15 UTC

What are the user scenarios here? I'd like to verify.
Please specify both the locale the user is in and the project encoding
properties that might be used.
For example, let's use the ja locale and a project in the default utf-8 or euc-jp project encoding
on Solaris.

(And please confirm: is this about viewing ja javadoc or about a user generating javadoc for their
own project?)

also, there are some issues now with javadoc not appearing ok in firefox if non ascii
is used in places like project name, path, class name, pkg name, etc.
(I know not related to this but FYI since about javadoc)

see 118174 for j2se project, issues on web and j2ee for same will be filed.

Comment 11 Masaki Katakai 2007-10-11 05:18:28 UTC

To verify this issue quickly, you can use the Japanese jdk5 javadoc and jdk6 javadoc and then try a search. We should get the
same results as with English.
 - jdk5 japanese javadoc : EUC encoding
 - jdk6 japanese javadoc : UTF-8 encoding
In both cases it should work, and I actually got the same results.
(But I found a small issue: the icons in the search results are not correct.
I'll open a new bug for this.)

> also, there are some issues now with javadoc not appearing ok in firefox if non ascii
> is used in places like project name, path, class name, pkg name, etc.
> (I know not related to this but FYI since about javadoc)

Yes, I think the reason is described in bug 118174: by default, javadoc
generates its output in the native encoding and does not add a
charset meta tag. So sometimes it does not display correctly until we change
the character encoding in the browser.
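
To check whether a generated page declares a charset at all, something like the following sketch could be used. The class name `CharsetMetaProbe` and the regular expression are assumptions for this example; the pattern is a simple heuristic, not a full HTML parser.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting generated Javadoc HTML.
class CharsetMetaProbe {
    // Matches charset=... inside a <meta ...> tag, with or without quotes.
    private static final Pattern CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9_\\-]+)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the declared charset in lower case, or empty if none is declared. */
    static Optional<String> declaredCharset(String html) {
        Matcher m = CHARSET.matcher(html);
        return m.find()
                ? Optional.of(m.group(1).toLowerCase(Locale.ROOT))
                : Optional.empty();
    }
}
```

An empty result means the browser has to guess the encoding, which matches the behavior described here. The javadoc tool's `-charset` option is the usual way to have such a declaration emitted.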

I'll try the new build to see how the fix of 118174 is working.

Comment 12 Ken Frank 2007-10-12 17:08:38 UTC

Why do bundle files need specific translated words for Japanese related to javadoc viewing?

I don't know if that's completely related to this issue, but I saw bundle files mentioned here
and then saw some separate Japanese words in the bundle files.

ken.frank@sun.com

Comment 13 Masaki Katakai 2007-10-12 17:24:28 UTC

Jan already fixed that issue. Thank you, Jan. We don't need any specific Japanese words in the bundle file. (The old
implementation was using some Japanese words in Bundle files.)

Comment 14 Ken Frank 2007-10-12 17:42:17 UTC

Is it safe to remove them from the current bundles, or is it better to wait until after nb6?

ken.frank@sun.com

Comment 15 Masaki Katakai 2007-10-12 17:49:49 UTC

Yes, it's safe. We can remove them now; in fact, they have already been removed in the fix of bug 118488.
This means that displaying and searching Japanese javadoc works without a Japanese Bundle.properties.