169804 – I18N : search does not work in non-UTF-8 encoding project

Status:	RESOLVED FIXED

Alias:	None

Product:	utilities
Classification:	Unclassified
Component:	Search
Version:	6.x
Hardware:	All All

Importance:	P2 blocker
Assignee:	Victor Vasilyev

URL:
Keywords:	I18N

Depends on:
Blocks:

Reported:	2009-08-04 11:32 UTC by Masaki Katakai
Modified:	2009-09-08 17:22 UTC
CC List:	3 users

See Also:
Issue Type:	DEFECT
Exception Reporter:

Attachments
sample projects (21.13 KB, application/x-compressed) 2009-08-04 11:37 UTC, Masaki Katakai

Description Masaki Katakai 2009-08-04 11:32:05 UTC

I'll attach the sample projects later. The attachment contains two projects:

SearchTest_UTF8
SearchTest_EUC

Open both in NetBeans. SearchTest_UTF8 sets UTF-8 as the
project encoding; SearchTest_EUC uses EUC encoding. Neither
project contains multibyte characters. However, search does not
work correctly in the SearchTest_EUC project.

1. Open the SearchTest_UTF8 project.
2. Verify that the project's source encoding is UTF-8.
3. Select the project and choose Find... from the context menu.
4. Search for a string, e.g. "katakai".
   It should match the following 2 lines:
    * @author katakai
    application.vendor=katakai

5. Repeat the same operation for SearchTest_EUC.
   Only "application.vendor=katakai" is matched.
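
Because neither project contains multibyte characters, the bytes of both expected lines should decode to the same text under either project encoding, so the choice of encoding alone should not hide a match. A minimal sketch of that check follows; the class and method names are hypothetical and are not part of NetBeans, and the fallback to ISO-8859-1 is only there because EUC-JP is an optional charset that a given Java runtime may lack.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not NetBeans code.
final class AsciiDecodeCheck {

    /** Returns true if the bytes decode to the same string under both charsets. */
    static boolean decodesIdentically(byte[] bytes, Charset a, Charset b) {
        return new String(bytes, a).equals(new String(bytes, b));
    }

    public static void main(String[] args) {
        // One of the two lines that the search is expected to match.
        byte[] line = " * @author katakai".getBytes(StandardCharsets.US_ASCII);

        // Use EUC-JP when the runtime provides it; otherwise fall back to ISO-8859-1.
        Charset other = Charset.isSupported("EUC-JP")
                ? Charset.forName("EUC-JP")
                : StandardCharsets.ISO_8859_1;

        System.out.println("UTF-8 vs " + other + " identical: "
                + decodesIdentically(line, StandardCharsets.UTF_8, other));
    }
}
```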

I tried it on 6.8 M1, but it has been reproducible since 6.7.
6.5 works without this issue.

Product Version: NetBeans IDE Dev (Build 200908022240)
Java: 1.6.0_13; Java HotSpot(TM) 64-Bit Server VM 11.3-b02-83
System: Mac OS X version 10.5.7 running on x86_64; SJIS; ja_JP (nb)

Comment 1 Masaki Katakai 2009-08-04 11:37:17 UTC

Created attachment 85755
sample projects

Comment 2 Victor Vasilyev 2009-08-20 21:12:00 UTC

It is a valid issue.

The attached projects were tested on both
NetBeans 6.8 M1:
  Product Version: NetBeans IDE Dev (Build 200908022240)
  Java: 1.6.0_13; Java HotSpot(TM) Client VM 11.3-b02
  System: Windows XP version 5.1 running on x86; Cp1251; ru_RU (nb)
  Userdir: C:\Documents and Settings\vvg\.netbeans\6.8m1
and a trunk version on the same platform:
  changeset:   141827:01c13dece50c 
  date:        Tue Aug 18 18:24:57 2009 +0200

I confirm that the "Find..." functionality does not work properly when it is invoked against the attached project
SearchTest_EUC.

Comment 3 Victor Vasilyev 2009-08-20 23:37:42 UTC

It seems the root cause of the issue is an incorrect encoding value for the Java source files, which is returned after
polling the collection of service providers registered in the IDE
(service=org.netbeans.spi.queries.FileEncodingQueryImplementation.class).

NetBeans takes the encoding of a Java source file to be the encoding defined for the project containing that file (in
the test project "SearchTest_EUC" it is "EUC-JP"), rather than determining it according to the Java Language Specification.
See http://java.sun.com/docs/books/jls/third_edition/html/lexical.html

During the test I've found that the service providers have been polled in the following order:
org.netbeans.modules.versioning.diff.DiffFileEncodingQueryImpl returns null - OK
org.netbeans.modules.versioning.util.queries.DiffFileEncodingQueryImpl returns null - OK
org.netbeans.modules.openide.loaders.DataObjectEncodingQueryImplementation returns null - Not sure that it is OK!
org.netbeans.modules.xml.util.DefaultXmlFileEncodingQueryImpl returns null - OK
org.netbeans.modules.projectapi.ProjectFileEncodingQueryImplementation returns a project encoding "EUC-JP" - OK.
   BTW, it also defines the platform's default encoding as a secondary value, which
       was "windows-1251" on my platform, but it is not used for decoding inside the "Find..." functionality.
org.netbeans.modules.diff.DiffFileEncodingQueryImplementation returns null - OK
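
The polling described above can be sketched roughly as follows. This is a simplified model, not the actual FileEncodingQuery code: the EncodingProvider interface is a hypothetical stand-in for FileEncodingQueryImplementation, files are identified by name only, and it assumes that non-null answers are simply kept in polling order (so the project encoding would come first and the platform default second).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for FileEncodingQueryImplementation; not the NetBeans SPI.
interface EncodingProvider {
    /** Returns an encoding for the named file, or null if this provider has no answer. */
    Charset getEncoding(String fileName);
}

final class EncodingPolling {

    /** Polls the providers in order and collects every non-null answer. */
    static List<Charset> candidates(List<EncodingProvider> providers, String fileName) {
        List<Charset> result = new ArrayList<>();
        for (EncodingProvider provider : providers) {
            Charset charset = provider.getEncoding(fileName);
            if (charset != null) {
                result.add(charset);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Providers that return null model the "returns null" entries in the list above.
        EncodingProvider diff = name -> null;
        EncodingProvider xml = name -> name.endsWith(".xml") ? StandardCharsets.UTF_8 : null;
        // ISO-8859-1 stands in for the project encoding in this sketch.
        EncodingProvider project = name -> StandardCharsets.ISO_8859_1;

        System.out.println(candidates(Arrays.asList(diff, xml, project), "Main.java"));
    }
}
```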

I guess the implementation of DataObjectEncodingQueryImplementation [1] behaves incorrectly: it should return the
correct encoding for a Java source file even if the containing project defines another encoding. Most likely the
encoding of a Java source file should always be UTF-16?

Probably a special service provider (service=org.netbeans.spi.queries.FileEncodingQueryImplementation.class)
should be provided for the MIME type "text/x-java", so that it is accessible via the expression
MimeLookup.getLookup(mimeType).lookup(FileEncodingQueryImplementation.class)
in the method DataObjectEncodingQueryImplementation.getEncoding(FileObject file) - see [2].

[1] 
http://hg.netbeans.org/main/file/2304b0a11dcd/openide.loaders/src/org/netbeans/modules/openide/loaders/DataObjectEncodingQueryImplementation.java#l79
[2]
http://hg.netbeans.org/main/file/2304b0a11dcd/openide.loaders/src/org/netbeans/modules/openide/loaders/DataObjectEncodingQueryImplementation.java#l95

Comment 4 Victor Vasilyev 2009-08-21 15:35:39 UTC

Could you please review my investigation results and/or reassign the issue if my assumption about the
component/subcomponent is wrong?

Comment 5 Jan Lahoda 2009-08-23 18:18:02 UTC

Sorry, but I think that the problem is somewhere else. I added e.printStackTrace() to the catch block on line 712 in
utilities/src/org/netbeans/modules/search/BasicSearchCriteria.java, and the following exception is thrown inside
the corresponding try block:
java.lang.IllegalStateException: Current state = FLUSHED, new state = CODING_END
        at java.nio.charset.CharsetDecoder.throwIllegalStateException(CharsetDecoder.java:951)
        at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:537)
        at org.netbeans.api.queries.FileEncodingQuery$ProxyCharset$ProxyDecoder.implFlush(FileEncodingQuery.java:276)
        at java.nio.charset.CharsetDecoder.flush(CharsetDecoder.java:633)
        at org.netbeans.modules.search.Utils.decodeByteBuffer(Utils.java:226)
        at org.netbeans.modules.search.Utils.getCharSequence(Utils.java:180)
        at org.netbeans.modules.search.BasicSearchCriteria.checkFileContent(BasicSearchCriteria.java:710)
        at org.netbeans.modules.search.BasicSearchCriteria.matches(BasicSearchCriteria.java:644)
        at org.netbeans.modules.search.SpecialSearchGroup.processSearchObject(SpecialSearchGroup.java:146)
        at org.netbeans.modules.search.SpecialSearchGroup.doSearch(SpecialSearchGroup.java:119)
        at org.openidex.search.SearchGroup.search(SearchGroup.java:178)
        at org.netbeans.modules.search.SearchTask.run(SearchTask.java:126)
        at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:602)
        at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:1070)

This is (IMO) responsible for the missing occurrence.

Regarding the encoding of the Java files, I think that the Unicode character set is meant as the internal representation;
the input file can be in any encoding convertible to the Unicode character set (see javac's "-encoding" command line
option). Forcing UTF-16 (or any other encoding) on all Java files would not be correct (note that the files in the
attached project are not UTF-16 encoded files). The project-supplied encoding is meant to be used to decode the Java
files (under normal circumstances). Also note that the files from the attached project can be opened in the editor
correctly (by use of the FileEncodingQuery).
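
For context on the IllegalStateException above: java.nio.charset.CharsetDecoder documents a fixed call order (reset, then decode with endOfInput set to true on the last call, then flush), and it rejects a decode call made once the decoder has been flushed. The sketch below only illustrates that protocol with a standard charset; the class and method names are hypothetical, and it makes no claim about what ProxyDecoder.implFlush actually does or how the fix was implemented.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not the NetBeans search Utils code.
final class DecoderProtocol {

    /** Decodes all bytes in the documented order: reset, decode(endOfInput = true), flush. */
    static String decodeAll(byte[] bytes, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // maxCharsPerByte bounds the output, so a single buffer is sufficient here.
        int capacity = (int) Math.ceil(bytes.length * (double) decoder.maxCharsPerByte()) + 1;
        CharBuffer out = CharBuffer.allocate(capacity);

        decoder.reset();
        CoderResult result = decoder.decode(in, out, true);
        if (result.isError()) {
            throw new IllegalArgumentException("Undecodable input: " + result);
        }
        result = decoder.flush(out);
        if (result.isError()) {
            throw new IllegalArgumentException("Flush failed: " + result);
        }
        out.flip();
        return out.toString();
    }

    /** Returns true if calling decode() again after flush() throws IllegalStateException. */
    static boolean decodeAfterFlushIsRejected(Charset charset) {
        CharsetDecoder decoder = charset.newDecoder();
        CharBuffer out = CharBuffer.allocate(16);
        decoder.decode(ByteBuffer.wrap(new byte[0]), out, true);
        decoder.flush(out);
        try {
            // The decoder is already flushed, so this call violates the protocol.
            decoder.decode(ByteBuffer.wrap(new byte[0]), out, true);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = "application.vendor=katakai".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeAll(bytes, StandardCharsets.UTF_8));
        System.out.println("decode after flush rejected: "
                + decodeAfterFlushIsRejected(StandardCharsets.UTF_8));
    }
}
```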

Comment 6 Victor Vasilyev 2009-09-08 17:22:24 UTC

It is fixed in the main trunk
http://hg.netbeans.org/main/rev/5ba89a3887a1