Bug 184513 - Problem while opening Cyrillic path passed as parameter in Win32
Product: platform
Classification: Unclassified
Component: Launchers&CLI
Hardware: PC, Windows XP
Priority: P2
Version: 6.x
Assigned To: Antonin Nebuzelsky
Keywords: I18N
Reported: 2010-04-20 07:10 UTC by opentitan
Modified: 2010-05-11 07:48 UTC
CC List: 6 users

Issue Type: DEFECT

Patch provided by Tomáš Holý (7.90 KB, patch)
2010-05-05 09:53 UTC, Jaroslav Tulach
Compiled nbexec.exe and nbexec.dll with Tomas's patch applied (109.72 KB, application/octet-stream)
2010-05-06 08:40 UTC, Antonin Nebuzelsky

Description opentitan 2010-04-20 07:10:58 UTC
Product Version = NetBeans IDE Dev (Build 201004060201)
Operating System = Windows 7 version 6.1 running on x86
Java; VM; Vendor = 1.6.0_19
Runtime = Java HotSpot(TM) Client VM 16.2-b04

Unable to open a file that has Cyrillic characters in its path
when the path is passed as a command-line parameter on Win32,
e.g. "w:\Плата\svn\media\флешка\white\flash_testing.html"

When I try to open the file from Windows Explorer (the file type is manually associated with netbeans.exe),
I get the error message "<here is the incorrectly encoded full path> does not exist, or is not a plain file."

By the way, Ctrl+C in the error message window does not copy its text. It should.
I had to retype the message manually to report this issue.

Thanks for NetBeans!

PS: I had been waiting for word wrap for years and migrated from Komodo Edit to the nightly build as soon as wrapping was implemented.
Comment 1 Alexei Mokeev 2010-04-21 08:29:56 UTC
Just in case, I checked on Linux: Cyrillic letters in the path work well, e.g. netbeans /tmp/какой-то/каталог/1.html opens fine.

So it seems to be Windows-specific, if it is a bug at all.
Comment 2 Victor Vasilyev 2010-04-21 12:08:07 UTC
It seems the NetBeans implementation doesn't take into account the case where different encodings are used in the command line and in the GUI.

By default, Russian versions of Windows traditionally use code page 866 in the command line (for DOS applications) and code page 1251 in the GUI (for Windows applications).
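To make the mismatch concrete, here is a small Python sketch (an illustration only, not NetBeans code) that encodes the path from the description with one of these code pages and decodes the bytes with the other, using Python's standard cp1251 and cp866 codecs. The ASCII parts of the path survive, while the Cyrillic folder names turn into unrelated characters.

```python
# The path from the bug description.
path = "w:\\Плата\\svn\\media\\флешка\\white\\flash_testing.html"

# Bytes as a GUI program using the ANSI code page (1251) would produce them,
# interpreted by a console program that uses the OEM code page (866).
ansi_bytes = path.encode("cp1251")
seen_by_console = ansi_bytes.decode("cp866")

# The opposite mismatch: OEM (866) bytes interpreted with the ANSI (1251) code page.
# cp1251 leaves byte 0x98 undefined, so errors="replace" avoids an exception.
oem_bytes = path.encode("cp866")
seen_by_gui = oem_bytes.decode("cp1251", errors="replace")

# Decoding with the code page that was actually used restores the original path.
restored = ansi_bytes.decode("cp1251")
```

Both seen_by_console and seen_by_gui differ from path, while restored equals it: the bytes themselves are fine, only the code page used to read them is wrong.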
Comment 3 Victor Vasilyev 2010-04-27 05:50:04 UTC
1. It is a common I18N issue (not specific to the Russian locale!), because Windows can use different code pages in the console and in the GUI at the same time,
e.g. IBM850 in the console ("OEMCP") and Windows-1252 in the GUI ("ANSI code page").
Also, "OEMCP" can easily be changed by a user:
- for a console session, via the chcp command;
- permanently, by changing the value of "OEMCP" under the Registry key [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage]. Note that you need to restart your computer for the change to take effect.
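The same effect is easy to reproduce outside the Russian locale. The Python sketch below is purely illustrative: it asks Windows for both active code pages through the Win32 functions GetACP and GetOEMCP (returning None on other systems), then reads Windows-1252 bytes as IBM850. The helper name misread and the sample text are made up for this example.

```python
import sys


def active_code_pages():
    """Return (ANSI code page, OEM code page) on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    import ctypes  # ctypes.WinDLL exists only on Windows

    kernel32 = ctypes.WinDLL("kernel32")
    return kernel32.GetACP(), kernel32.GetOEMCP()


def misread(text, written_with, read_with):
    """Hypothetical helper: encode text in one code page, decode the bytes in another."""
    return text.encode(written_with).decode(read_with, errors="replace")


# For the configuration in the example above this would be (1252, 850).
pages = active_code_pages()

# Text produced with the GUI code page, interpreted with the console code page.
garbled = misread("Café Müller", "cp1252", "cp850")
```

Only the accented letters change; plain ASCII is identical in both code pages, which is why such bugs often go unnoticed with English-only paths.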

2. NetBeans, as a GUI application, correctly uses the GUI code page. It is based on a Java solution that relies on the default charset returned by the method java.nio.charset.Charset.defaultCharset().

3. This bug can't be fixed in the package org.netbeans.modules.openfile. More specifically, it can't be fixed at the Java level at all if an incorrect encoding has already been applied to the original bytes of the command-line arguments:
- The "main" Java application uses only String objects (UTF-8) to communicate with the native level.
- In the general case, transcoding between code pages is a non-reversible transformation.
E.g. after sequential transcoding of the Russian alphabet IBM866 -> Windows-1251 -> UTF-8 -> IBM866 we "lose" the capital letter "Ш" and see it as the unrecognized symbol "?", e.g. "О?ИБКА" instead of the correct word "ОШИБКА" ("ERROR" in English).
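The loss of "Ш" can be checked with Python's built-in codecs. The sketch below is one reading of the chain above: IBM866 bytes are misinterpreted as Windows-1251 and turned into Unicode (as when a Java String is created from them), then an attempted repair encodes the text back to Windows-1251 and reinterprets the bytes as IBM866. The function name round_trip is an assumption for illustration.

```python
def round_trip(text):
    """IBM866 bytes -> misread as Windows-1251 -> Unicode -> Windows-1251 -> IBM866."""
    oem_bytes = text.encode("cp866")
    # Misinterpretation: Windows-1251 has no character at byte 0x98, which is "Ш"
    # in IBM866, so that byte becomes U+FFFD (the replacement character).
    as_unicode = oem_bytes.decode("cp1251", errors="replace")
    # Attempted repair: U+FFFD has no Windows-1251 byte, so it is written as "?".
    back_to_bytes = as_unicode.encode("cp1251", errors="replace")
    return back_to_bytes.decode("cp866")


word = round_trip("ОШИБКА")  # "О?ИБКА"

# Every other letter of the Russian alphabet survives the same chain.
alphabet = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдеёжзийклмнопрстуфхцчшщъыьэюя"
lost = [letter for letter in alphabet if round_trip(letter) != letter]  # ["Ш"]
```

Since byte 0x98 is already replaced during the first decode, no later Java-level step can recover it.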

4. AFAIU, Windows recognizes the type of an application when the application is launched. At least I remember such code in Windows 3.1, which inspected the name of the function used as the application's entry point. For information about more modern versions of Windows, see the remarks on the page http://msdn.microsoft.com/en-us/library/ms683156%28VS.85%29.aspx

Note that, according to those remarks, NetBeans is not a Unicode process from Windows' point of view ;-) because the NetBeans launcher for Windows uses neither wmain nor _tmain as its entry point. BTW, the same is true for any application based on Java 6. So there is no need to consider Unicode processes here.
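For a process that is not a Unicode process in this sense, Win32 still exposes the original UTF-16 command line through GetCommandLineW, and CommandLineToArgvW splits it into an argv-style array. The Python/ctypes sketch below only illustrates those two calls; it is not the launcher's C code, and this report does not say whether Tomáš Holý's patch works this way. The helper name unicode_argv is hypothetical, and on non-Windows platforms it simply returns a copy of sys.argv.

```python
import sys


def unicode_argv():
    """Hypothetical helper: command-line arguments as str, bypassing ANSI/OEM code pages.

    On Windows it reads the UTF-16 command line with GetCommandLineW and splits
    it with CommandLineToArgvW. Elsewhere it returns a copy of sys.argv.
    """
    if sys.platform != "win32":
        return list(sys.argv)

    import ctypes
    from ctypes import wintypes  # imported only on Windows

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    shell32 = ctypes.WinDLL("shell32", use_last_error=True)

    kernel32.GetCommandLineW.argtypes = []
    kernel32.GetCommandLineW.restype = wintypes.LPCWSTR
    shell32.CommandLineToArgvW.argtypes = [wintypes.LPCWSTR, ctypes.POINTER(ctypes.c_int)]
    shell32.CommandLineToArgvW.restype = ctypes.POINTER(wintypes.LPWSTR)
    kernel32.LocalFree.argtypes = [ctypes.c_void_p]
    kernel32.LocalFree.restype = ctypes.c_void_p

    argc = ctypes.c_int(0)
    argv = shell32.CommandLineToArgvW(kernel32.GetCommandLineW(), ctypes.byref(argc))
    if not argv:  # NULL pointer: the call failed
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        # Indexing a POINTER(LPWSTR) yields Python str objects.
        return [argv[i] for i in range(argc.value)]
    finally:
        # The array returned by CommandLineToArgvW must be released with LocalFree.
        kernel32.LocalFree(ctypes.cast(argv, ctypes.c_void_p))
```

CPython itself already starts through a wide-character entry point on Windows, so in Python this is only a demonstration; a C launcher built around main() would make the same two calls directly.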

5. The current NetBeans implementation defines several entry points:

Have I missed something?

All these entry points are based on the C function 
int main(int argc, char *argv[]).

Hence NetBeans, including the IDE and any application based on the NetBeans Platform, is a console application from Windows' point of view. In this case, Windows will use "OEMCP" to pass the argv values. But NetBeans is by nature a GUI application, and the "ANSI code page" should be used to encode the command-line arguments.
I guess this is the root cause of this bug.

I think NetBeans should adopt the solution Java has used for many years, where there are two Windows executables with different kinds of entry points: java.exe with the function main(int argc, char ** argv) and javaw.exe with the function WinMain(HINSTANCE inst, HINSTANCE previnst, LPSTR cmdline, int cmdshow). In the latter case, Windows uses the "ANSI code page" to pass cmdline, which is what a GUI application needs.

If both my conclusions and my proposal are correct, a user would associate netbeansw.exe instead of netbeans.exe in order to open files whose names contain non-Latin-1 characters. Of course, in this case all .exe files with entry points would have to be fixed, and each would need its own "w" clone.

I'll reassign this bug to platform/Launchers&CLI.
Please validate my investigation.
Comment 4 Jaroslav Tulach 2010-04-28 16:49:31 UTC
We used to have nb.exe and netbeans.exe in previous versions of NetBeans, but Tomáš Holý unified the two. It would be nice if he could come up with a simpler solution than going back to the original state.

Btw, a workaround exists: drag and drop the file into an open NetBeans window. Do you want to reclassify this as P3, then?
Comment 5 Jaroslav Tulach 2010-05-05 09:53:37 UTC
Created attachment 98474
Patch provided by Tomáš Holý

Return of the prodigal son: Tomáš Holý generously donated a patch that solves the problem for him (tested on a "žluťoučký kůň" filename). Thanks a lot, Tomáš, we owe you a T-shirt.

Can someone verify that the patch solves the problem for Cyrillic characters too?
Comment 6 Alexei Mokeev 2010-05-05 10:28:15 UTC
Hi Jarda,

Sure, I can verify the patch with Cyrillic letters. I just don't have the setup to rebuild the launcher. Can you send me an executable (in a .zip)?

Comment 7 Antonin Nebuzelsky 2010-05-06 08:40:34 UTC
Created attachment 98543
Compiled nbexec.exe and nbexec.dll with Tomas's patch applied

Alexei, please try the attached nbexec.exe and nbexec.dll.
Comment 8 Alexei Mokeev 2010-05-06 09:36:31 UTC
Tested with Cyrillic paths - patched version works with them.
Comment 9 Jaroslav Tulach 2010-05-06 10:36:14 UTC
Good to know. Congratulations, Tomáš!
Comment 10 Antonin Nebuzelsky 2010-05-06 14:52:48 UTC
Comment 11 opentitan 2010-05-07 01:07:44 UTC
(In reply to comment #8)
> Tested with Cyrillic paths - patched version works with them.

Tested, confirmed.

You are great, devs!
An issue reported by a regular user was fixed within 2 weeks.
That's really good. I will promote NetBeans as an open-source product with a very responsive community.

Comment 12 Quality Engineering 2010-05-10 09:16:57 UTC
Integrated into 'main-golden', will be available in build *201005100200* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main/rev/
Comment 13 Quality Engineering 2010-05-11 07:48:34 UTC
Integrated into 'main-golden', will be available in build *201005110931* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main/rev/
