106089 – I18N - ruby project uses default project encoding of utf-8 but at runtime multibyte characters do not show correctly

This Bugzilla instance is a read-only archive of historic NetBeans bug reports.


Summary:	I18N - ruby project uses default project encoding of utf-8 but at runtime multibyte characters do not show correctly

Status:	RESOLVED WONTFIX

Alias:	None

Product:	ruby
Classification:	Unclassified
Component:	Code
Version:	6.x
Hardware:	All All

Importance:	P2 blocker
Assignee:	Torbjorn Norbye

URL:
Keywords:	I18N

Depends on:
Blocks:

Reported:	2007-06-08 08:04 UTC by Ken Frank
Modified:	2007-08-31 12:24 UTC
CC List:	1 user

See Also:
Issue Type:	DEFECT
Exception Reporter:


Description Ken Frank 2007-06-08 08:04:05 UTC

New projects use UTF-8 as the default encoding.

The new-project default encoding should not require users to change the encoding
value for things to work correctly when they use the characters of the locale they
start the IDE in.

But for Ruby projects (and perhaps for other scripting languages such as CSS or
JavaScript), although multibyte characters display correctly in the editor, they do
not display correctly at runtime in certain locales/OSes:

Steps:

Start the IDE in a locale such as Solaris ja (a Solaris EUC-encoding locale), create a
Ruby project, and put some multibyte characters in a string that will be printed.
(The same thing happens on Windows, Mac, and probably Linux non-UTF-8 Asian locales.)

Then run the project.

The characters do not show correctly in the Output window.

Change the project encoding value to EUC-JP, then create a new project and run it: the
characters show correctly in the Output window. (Or use the same project and create
another Ruby file.) That is, changing the project encoding causes all future projects
to use that encoding.


What's needed: the module should do the right conversion/encoding handling at runtime.
It's OK for UTF-8 to be the default, but if so, the runtime encoding handling needs to
happen; users should not need to change the default project encoding if they
are just going to use the characters of the locale they are in.
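A minimal sketch of why the output breaks, assuming a Ruby 1.9 or later interpreter (String#encode and encoding-aware strings did not exist in the Ruby 1.8 runtimes discussed here): the same text has different bytes in UTF-8 and in EUC-JP, so bytes written from a UTF-8 source look like garbage to a console that decodes them as EUC-JP. The sample string and variable names are illustrative only.

```ruby
# Assumes Ruby 1.9+; this file itself is saved as UTF-8.
text = "日本語"

utf8_bytes  = text.bytes                   # bytes the UTF-8 source produces
eucjp_bytes = text.encode("EUC-JP").bytes  # bytes an EUC-JP console expects

# Format a byte list as hex pairs for display, e.g. "E6 97 A5".
hex = ->(bytes) { bytes.map { |b| format("%02X", b) }.join(" ") }

puts "UTF-8  (#{utf8_bytes.size} bytes): #{hex.call(utf8_bytes)}"
puts "EUC-JP (#{eucjp_bytes.size} bytes): #{hex.call(eucjp_bytes)}"

# Reading the UTF-8 bytes as if they were EUC-JP (roughly what a
# ja_JP.eucJP terminal does) does not recover the original text.
misread = text.dup.force_encoding("EUC-JP")
puts "UTF-8 bytes valid as EUC-JP? #{misread.valid_encoding?}"
```

Converting explicitly in both directions round-trips cleanly; only the implicit "write UTF-8, read as EUC-JP" path loses the text.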

Comment 1 Masaki Katakai 2007-06-13 07:40:16 UTC

I have no idea how to fix it, and I think the only solution is to use the native encoding in the project.

Tor, what is your idea?

For example, on Windows, AFAIK there is no way to switch the Ruby runtime's encoding to UTF-8. As you may know, the
Ruby runtime has an option for the source encoding, but I could not find any option to control the runtime encoding
(output encoding). We should support any Ruby runtime, not only JRuby, so I think it's not a good idea to implement
anything special in the JRuby integration alone.

Using the native encoding for projects by default would be best for users.

If we decide that the default is UTF-8, I think we should tell users which default encoding is being used, so that
they can easily understand the encoding and easily switch to the native encoding. If users can find the encoding
quickly, they will be able to convert encodings in their programs. This is not only a Ruby project issue; I filed
bug 106084 for it.

Once users understand the encoding of their sources and of the runtime, it's easy for them to control their output encoding, like below.

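$KCODE = "u"       # Ruby 1.8: source literals are UTF-8
require 'kconv'    # provides String#tosjis

s = "<japanese_characters_in_UTF-8>".tosjis  # convert UTF-8 to Shift_JIS

print(s)           # print, not printf: s is data, not a format string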

$KCODE = "u" means that the source code is UTF-8 encoded. On Windows the output should be SJIS, so users will need to
convert UTF-8 to SJIS when displaying something on the console.
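For comparison, a rough modern equivalent of the kconv approach above, assuming Ruby 1.9 or later, where String#encode takes over the job of $KCODE and kconv. The helper name to_console is hypothetical; the point is only that the program transcodes explicitly from its UTF-8 source encoding to the console's encoding (Windows-31J, Ruby's name for the Japanese Windows code page) before writing.

```ruby
# Assumes Ruby 1.9+. `to_console` is a hypothetical helper, not a library API.
def to_console(str, console_encoding)
  # Transcode from the string's own encoding (UTF-8 for literals in this file)
  # to the console's encoding; characters the target encoding cannot
  # represent are replaced with "?" instead of raising an error.
  str.encode(console_encoding, invalid: :replace, undef: :replace, replace: "?")
end

greeting = "こんにちは"                      # UTF-8 literal
sjis = to_console(greeting, "Windows-31J")   # Shift_JIS-family bytes

print sjis, "\n"  # print, not printf: the string is data, not a format
```

The replacement options are a design choice for console output: a "?" is usually preferable to an Encoding::UndefinedConversionError in the middle of printing.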

Comment 2 Torbjorn Norbye 2007-06-23 19:39:48 UTC

Are you using JRuby or native Ruby when running the program?

I have a patch which passes -Ku to the ruby interpreter (for native Ruby only; JRuby doesn't seem to have this flag, but given that it's a VM I'm wondering if it 
already handles encoding better).  Or does $KCODE work for JRuby as well?

Comment 3 Jiri Kovalsky 2007-07-03 14:12:52 UTC

Reassigning this issue to newly created 'ruby' component.

Comment 4 Masaki Katakai 2007-07-10 01:58:11 UTC

I don't think -Ku and $KCODE will solve this scenario, because these options
are only used when parsing multibyte characters; they do not affect output strings.
On Japanese Windows, the JRuby/Ruby runtime will be invoked in SJIS, so
users will need to take care of the runtime encoding.
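The distinction drawn here, between the encoding used to parse source and the encoding the console uses, is easier to see in Ruby 1.9 or later, where both are exposed directly. A small sketch, assuming such a Ruby: `__ENCODING__` reports how this file's literals were parsed (roughly the role -Ku/$KCODE played in Ruby 1.8), while `Encoding.find("locale")` reports the encoding of the current locale, which terminals usually expect. Nothing here converts output; that still has to be done explicitly.

```ruby
# Assumes Ruby 1.9+ (Ruby 2.0+ parses source as UTF-8 unless a magic comment
# says otherwise).
source_encoding = __ENCODING__             # encoding used to parse this file
locale_encoding = Encoding.find("locale")  # encoding of the current locale
literal = "日本"

puts "source encoding:  #{source_encoding}"
puts "literal encoding: #{literal.encoding}"  # follows the source encoding
puts "locale encoding:  #{locale_encoding}"

if source_encoding != locale_encoding
  # e.g. a UTF-8 source in a ja_JP.eucJP locale: printing `literal` as-is
  # would send UTF-8 bytes to a console expecting another encoding.
  puts "mismatch: convert output explicitly before printing"
end
```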

So I think there is no reasonable solution from NetBeans side,
we use UTF-8 as default, not platform encoding. Users need to
consider the platform encoding and source encoding. I think
it should be OK and acceptable.


By the way, it seems that $KCODE does not work in current JRuby:

http://jira.codehaus.org/browse/JRUBY-1133

Comment 5 Torbjorn Norbye 2007-08-02 19:36:50 UTC

I'm still not sure what we should do about this. When you said:

    So I think there is no reasonable solution from NetBeans side,
    we use UTF-8 as default, not platform encoding. Users need to
    consider the platform encoding and source encoding. I think
    it should be OK and acceptable.

do you mean that you think the IDE doesn't need to do anything else here (e.g. resolve bug as fixed/wontfix)? Or should the UTF-8 default be switched?

Comment 6 Masaki Katakai 2007-08-03 07:33:47 UTC

Hi Tor,

> Or should the UTF-8 default be switched?

I understand that the dev team has already decided to use UTF-8 as the default for all projects, so there is no
solution for us. I think you can mark this as WONTFIX.

Ruby is a scripting language and depends on the runtime environment's encoding, so I think most developers already
take care of the source encoding and the runtime encoding. I hope it will be easy for them to find out that they need
to change the encoding setting.

Comment 7 Martin Krauskopf 2007-08-30 09:51:19 UTC

Seems solved? We could probably reopen this issue when UTF-8 support in Ruby interpreters settles down a little bit,
probably with Ruby 2.0 and the corresponding versions of JRuby (Rubinius, ...).

Comment 8 Ken Frank 2007-08-30 16:00:40 UTC

I think information on this could be added to the release notes and FAQ. I sent Tor a draft of some of that,
but it did not cover this issue; Martin, could you add information about this issue to that draft?

Just so I'm clear on it: does this mean we can't guarantee that non-ASCII characters used in Ruby/Rails
files as data or output will be detected and shown correctly at runtime, regardless of which project
encoding is used for a Ruby/Rails project?

ken.frank@sun.com

Comment 9 Martin Krauskopf 2007-08-30 17:43:32 UTC

I don't know that much about it. It seemed to me that Masaki and Tor agreed that there is nothing to do on the NetBeans
side. I've also read about tons of problems with UTF-8 support in Ruby; it just does not work for some String methods. For example:

puts "こんにちは"
puts "こんにちは".size
puts "Řeřicha".size
puts "Rericha".size

The size is wrong for the Czech characters; I'm not sure about Japanese. So I thought this should just be closed.
If not, sorry for the mistake; please reopen.
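The `size` results above reflect Ruby 1.8, where strings were plain byte arrays and `size` counted bytes, so the Japanese string is miscounted in the same way as the Czech one. A short sketch of the same check, assuming Ruby 1.9 or later (which postdates this thread), where `size` counts characters of an encoding-aware string and `bytesize` gives the old byte count:

```ruby
# Assumes Ruby 1.9+, where String#size counts characters, not bytes.
samples = ["こんにちは", "Řeřicha", "Rericha"]

samples.each do |s|
  # size: characters; bytesize: bytes (what Ruby 1.8's size returned)
  puts "#{s}: size=#{s.size} bytesize=#{s.bytesize}"
end
```

On such a Ruby this reports size=5/bytesize=15 for the Japanese string, size=7/bytesize=9 for "Řeřicha", and 7/7 for the ASCII-only "Rericha".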

Comment 10 Ken Frank 2007-08-30 17:47:44 UTC

I agree with closing it; I just wanted to see whether the release notes or FAQ could give users this limitation info
(limits of Ruby/Rails, not NetBeans).

ken.frank@sun.com

Comment 11 Martin Krauskopf 2007-08-30 18:00:16 UTC

I'm not sure it is appropriate to describe interpreter limitations here. They should be described on the interpreters'
pages/wikis, such as the Ruby, JRuby, and Rubinius wiki pages.
But I'm the wrong person to decide this. I just should not have closed this issue; that was probably a mistake :)
Masaki seems to know the most about this topic, so maybe he could help? :)

Comment 12 Ken Frank 2007-08-30 18:13:06 UTC

IMO the user is using NetBeans and NetBeans-provided functionality, which of course
interacts with other things like Java compilers, Ruby compilers, etc.

But when users don't see non-ASCII characters handled correctly, I don't think they should need
to search the Ruby/Rails docs to find out that it's a limitation of Ruby/Rails; I think they will
consider it a limitation of NetBeans.

So what is wrong with a one-sentence NetBeans release note that explains this?


ken.frank@sun.com

Comment 13 Masaki Katakai 2007-08-30 23:49:51 UTC

How about this?

The Ruby interpreter runs in the OS default encoding (the locale NetBeans is running in).
So if you want to use non-ASCII characters and that encoding differs from your source
encoding, you need to take care of both.

For example, on Japanese Windows, the default source encoding of a newly created Ruby
project in NetBeans is UTF-8, but the Ruby interpreter uses the Windows-31J encoding.
If you need to use native characters in source files, it's better to change the source
encoding to Windows-31J in this case.
You can change to another encoding in the project properties dialog.
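A small check along the lines of this release note, assuming Ruby 1.9 or later. `encoding_mismatch?` is a hypothetical helper (not a NetBeans or Ruby API) that compares a project's source encoding with the platform locale encoding, using `Encoding.find` so that aliases such as CP932 resolve to the same encoding as Windows-31J.

```ruby
# Assumes Ruby 1.9+. `encoding_mismatch?` is a hypothetical helper.
# Returns true when the project (source) encoding differs from the platform
# encoding, the situation in which non-ASCII output may be garbled.
def encoding_mismatch?(project_encoding, platform_encoding = Encoding.find("locale"))
  # Encoding.find resolves names and aliases ("CP932" -> Windows-31J);
  # to_s lets callers pass either a name or an Encoding object.
  Encoding.find(project_encoding.to_s) != Encoding.find(platform_encoding.to_s)
end

# The Japanese Windows case from the note: UTF-8 project, Windows-31J runtime.
if encoding_mismatch?("UTF-8", "Windows-31J")
  puts "Consider switching the project encoding to Windows-31J."
end
```

Called with one argument, the helper compares against the current locale, which is the value users would otherwise have to look up themselves.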

Comment 14 Ken Frank 2007-08-31 01:10:09 UTC

Sounds like a fair compromise to ask users to change the project encoding to the encoding
of the locale they are running NetBeans in,

but assumes users will know what that enc is, which they might not, so we could explain

and we need to point out that for other NetBeans project types, it's OK to use UTF-8 as the default.

Let's get some agreement from dev and others on how to phrase this, and on whether it's an acceptable kind
of release note and request to users.

But another question - will all such use of non ascii where legal in ruby or rails nb projects
work ok with ruby/rails itself if user in non utf8 locale - or still some other limitations ?

I am assuming that, based on other issues and mails, that we will state that use of non ascii
in ruby/rails file names, project names, function names, etc is not supported in any case.



ken.frank@sun.com

Comment 15 Masaki Katakai 2007-08-31 12:24:41 UTC

> but assumes users will know what that enc is, which they might not, so we could explain

FYI, I filed bug 106084: it's not easy to find out the platform encoding right now.

But I understand that developers using scripting languages usually take care of
the source/input/output encoding, because it depends on the interpreter's encoding.
So I hope most of them can find the platform encoding.

> But another question - will all such use of non ascii where legal in ruby or rails nb projects
> work ok with ruby/rails itself if user in non utf8 locale - or still some other limitations ?

When I investigated this before, I could not find any such topic in web search results.

By the way, the original problem in this bug report is not caused by the UTF-8 locale;
it happens when the source encoding and the platform encoding are different.

When we use NetBeans in a UTF-8 locale on Solaris and Linux platforms, it works fine.

> I am assuming that, based on other issues and mails, that we will state that use of non ascii
> in ruby/rails file names, project names, function names, etc is not supported in any case.

I agree. As I posted in bug 99058, it's deprecated usage, and as I understand it,
no developer uses it in practice.