This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 70510 - [50cat] -J-Dfile.encoding=UTF-8 and CVS
Status: RESOLVED INVALID
Alias: None
Product: versioncontrol
Classification: Unclassified
Component: CVS
Version: 5.x
Hardware: PC Windows XP
Importance: P1 blocker
Assignee: issues@versioncontrol
Reported: 2005-12-16 14:32 UTC by wulgar
Modified: 2007-01-04 17:14 UTC
Issue Type: DEFECT
Description wulgar 2005-12-16 14:32:49 UTC
[ BUILD # : 200512152030 ]
[ JDK VERSION : 1.5.0_06 ]

Hello

I'm working in NetBeans with the option -J-Dfile.encoding=UTF-8 because I have some .txt files in UTF-8.

When I create a file named ZażółćGęśląJaźń.txt and add it to CVS, the file created on the server is: zaĹĽĂłĹ,Ä++GÄ^(TM)Ĺ>lÄ...jaĹşĹ".txt

When I remove -J-Dfile.encoding=UTF-8 from the config, CVS -> Show Changes shows that there is a new remote file.

It may cause data loss --> P1
Comment 1 Peter Pis 2005-12-16 15:41:35 UTC
The "-J-Dfile.encoding" switch is used for file content, not for file names.
Command-line cvs treats this kind of file the same way as the javacvs library.
I'm afraid we can't do anything about this.
Comment 2 wulgar 2005-12-16 17:56:43 UTC
-J-Dfile.encoding is supposed to work as you described.

But if I don't run NB with this switch, the file is added to the repository properly.
Comment 3 wulgar 2005-12-16 18:01:41 UTC
Let's suppose that I have two files:

Zażółć.txt

and 

Gęślą.txt

When I add the first one with the switch, it is added as: zaĹĽĂłĹ,Ä+.txt
When I add the second one without -J-Dfile.encoding=UTF-8, it is added as: Gęślą.txt

Is this normal?
Comment 4 _ pkuzel 2005-12-18 19:43:42 UTC
CVS as such supports only ASCII filenames! Non-ASCII file names can work
if the server and the client use the same encoding. Please check the encoding
on your server side and align with it, or stick with ASCII names.
 
BTW -J-Dfile.encoding is used (by default) by the
  new InputStreamReader(in) and
  new OutputStreamWriter(out)
constructors. The CVS library uses these, so the user can align the local
environment with the server setup. Well, an extra property, e.g.
cvs.filename.encoding, could be better.
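
A minimal Java sketch of the point above, assuming a JDK 7 or later (for StandardCharsets): the one-argument InputStreamReader constructor falls back to the platform default charset, while passing a charset explicitly removes that dependency. The class name ReaderCharsetSketch and its helper methods are hypothetical, and cvs.filename.encoding is only the property suggested in this comment; nothing here implies that the javacvs library reads it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReaderCharsetSketch {
    // Hypothetical property name, taken from the suggestion in this comment.
    static final String FILENAME_ENCODING_PROPERTY = "cvs.filename.encoding";

    // Use the property when it is set, otherwise fall back to the platform default.
    static Charset filenameCharset() {
        String name = System.getProperty(FILENAME_ENCODING_PROPERTY);
        return name == null ? Charset.defaultCharset() : Charset.forName(name);
    }

    // Decode bytes through an InputStreamReader with an explicit charset,
    // so the result does not depend on -Dfile.encoding.
    static String decode(byte[] bytes, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder text = new StringBuilder();
            for (int c = reader.read(); c != -1; c = reader.read()) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "Zażółć.txt", written with Unicode escapes so the source file encoding does not matter.
        String name = "Za\u017C\u00F3\u0142\u0107.txt";
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        System.out.println("platform default: " + Charset.defaultCharset());
        System.out.println("filename charset: " + filenameCharset());
        System.out.println("round trip ok:    " + decode(utf8, StandardCharsets.UTF_8).equals(name));
    }
}
```

Running it with and without -Dfile.encoding=UTF-8 (or with -Dcvs.filename.encoding=UTF-8) shows which charset the sketch would pick for filenames.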
 
INVALID because it's likely user's setup problem. Is it? 
Comment 5 wulgar 2005-12-19 06:47:07 UTC
> INVALID because it's likely user's setup problem. Is it?

No, it isn't a user setup problem, because the behavior of CVS depends on this
setting. See my comment from Fri Dec 16 18:01:41 +0000 2005.
Comment 6 Peter Pis 2005-12-19 09:38:33 UTC
> Let's suppose that I have two files:
>
> Zażółć.txt
>
> and 
>
> Gęślą.txt

These two words contain different characters. I think the second file will be
added correctly in both cases (with the switch on or off). Am I right?
Comment 7 wulgar 2005-12-19 09:40:16 UTC
It was only an example :) I've tested it with:

ZażółćGęśląJaźń.txt (with -J-Dfile.encoding=UTF-8)
ZażółćGęśląJaźń_1.txt (without -J-Dfile.encoding=UTF-8)

The results were:
zaĹĽĂłĹ,Ä++GÄ^(TM)Ĺ>lÄ...jaĹşĹ".txt
ZażółćGęśląJaźń_1.txt
Comment 8 Maros Sandor 2005-12-19 11:31:04 UTC
I played with it a bit and I think this is not a bug, but one has to properly
understand what is going on. First off, the CVS server does not understand or
care about different encodings. For filenames, this means that it takes the name
as a series of raw bytes as they come and sends the same bytes back to clients.
This works well for plain 7-bit ASCII characters.

Now back to your case. You started NetBeans with -J-Dfile.encoding=UTF-8, so you
are telling Java to use UTF-8 as the default system (platform) encoding for this
session. Then you created a file whose name contains special characters, and
_those chars have different byte representations depending on the encoding in
use_. The CVS client has to pick one when communicating with the server. It is
natural that it picks the default system encoding, this time UTF-8. In this
encoding, these special characters are encoded with 2 bytes each, hence the
longer filenames. The CVS server takes this and stores it as you sent it.

Later, when you do an update, checkout or any other CVS operation, everything
works perfectly, because the server sends you filenames in UTF-8 and you expect
them to be in this encoding. However, once you remove the -J-D switch, your
platform encoding becomes whatever_it_is and things break, because the server
does not care and you now expect all filenames coming from the server to be in
whatever_it_is.

To conclude, I would suggest you either name your files using safe ASCII only OR
use the same encoding every time.
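
The byte-level effect described above can be reproduced with a short Java sketch. It assumes that the platform encoding without the switch was windows-1250 (a guess based on the Polish file names and Windows XP; the report does not state it) and that the JDK provides that charset. The class FilenameBytesSketch and its methods are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class FilenameBytesSketch {
    // Assumed platform encoding when the -J-Dfile.encoding switch is absent.
    static final Charset ASSUMED_PLATFORM = Charset.forName("windows-1250");

    // Number of bytes the name occupies on the wire in the given encoding.
    static int wireLength(String name, Charset charset) {
        return name.getBytes(charset).length;
    }

    // What a client sees when bytes sent in one encoding are read back in another.
    static String reinterpret(String name, Charset sentAs, Charset readAs) {
        return new String(name.getBytes(sentAs), readAs);
    }

    public static void main(String[] args) {
        // "Zażółć.txt", written with Unicode escapes so the source file encoding does not matter.
        String name = "Za\u017C\u00F3\u0142\u0107.txt";

        // Each of the four Polish letters takes 1 byte in windows-1250 and 2 bytes in UTF-8.
        System.out.println(wireLength(name, ASSUMED_PLATFORM));       // 10
        System.out.println(wireLength(name, StandardCharsets.UTF_8)); // 14

        // Same encoding on both sides: the name survives the round trip.
        System.out.println(reinterpret(name, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(name));

        // Sent as UTF-8, read as windows-1250: every byte becomes its own character.
        System.out.println(reinterpret(name, StandardCharsets.UTF_8, ASSUMED_PLATFORM));
    }
}
```

The last line prints a 14-character garbled name, the same kind of result as the mangled names quoted in the report.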
Comment 9 wulgar 2005-12-19 11:35:49 UTC
Oops. Thanks for clearing things up :)

So -J-Dfile.encoding=UTF-8 is NOT ONLY for the content of files, right?

If I am right, how can I tell NB that some of my text files are UTF-8 encoded?
Comment 10 _ pkuzel 2005-12-19 13:14:01 UTC
From CVS spec:

Conventions regarding transmission of file names

In most contexts, `/' is used to separate directory and file names in filenames,
and any use of other conventions (for example, that the user might type on the
command line) is converted to that form. The only exceptions might be a few
cases in which the server provides a magic cookie which the client then repeats
verbatim, but as the server has not yet been ported beyond unix, the two rules
provide the same answer (and what to do if future server ports are operating on
a repository like e:/foo or CVS_ROOT:[FOO.BAR] has not been carefully thought out).

Characters outside the invariant ISO 646 character set should be avoided in
filenames. This restriction may need to be relaxed to allow for characters such
as `[' and `]' (see above about non-unix servers); this has not been carefully
considered (and currently implementations probably use whatever character sets
that the operating systems they are running on allow, and/or that users
specify). Of course the most portable practice is to restrict oneself further,
to the POSIX portable filename character set as specified in POSIX.1.
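
As a rough illustration of the last recommendation, a client-side check against the POSIX portable filename character set (A-Z, a-z, 0-9, period, underscore and hyphen, with the hyphen not allowed as the first character) might look like the Java sketch below. The class PortableFilename and the method isPortable are hypothetical names; neither CVS nor NetBeans is known to perform such a check.

```java
final class PortableFilename {
    // True if every character is in the POSIX portable filename character set
    // and the name does not start with a hyphen.
    static boolean isPortable(String name) {
        if (name.isEmpty() || name.charAt(0) == '-') {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean allowed = (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '.' || c == '_' || c == '-';
            if (!allowed) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPortable("Zazolc.txt"));                      // true
        System.out.println(isPortable("Za\u017C\u00F3\u0142\u0107.txt"));  // false: non-ASCII letters
    }
}
```

A name that passes this check is sent as the same bytes in UTF-8, windows-1250 and other ASCII-compatible encodings, so the platform encoding no longer matters for it.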