Bug 127086 - I18N - php file with multibyte characters in it not show them ok sometimes
Status: RESOLVED FIXED
Product: php
Classification: Unclassified
Component: Code
Version: 6.x
Hardware: Sun All
Priority: P2
Target Milestone: 6.x
Assigned To: Tomas Mysik
QA Contact: issues@php
Keywords: I18N
Reported: 2008-02-11 04:37 UTC by Ken Frank
Modified: 2008-08-16 15:54 UTC
Issue Type: DEFECT
Description Ken Frank 2008-02-11 04:37:52 UTC
0. running in ja locale on solaris and windows.

1. assumption here is that the php project name and the path to it are english only; problems
with multibyte characters in the project name or path are discussed in another issue.

2. have php echo statement with multibyte characters

3. have the project encoding be either the default utf-8, or have another project (not the same
one) with a project encoding of euc-jp on solaris or win31j on windows.

4. run the php file; this might also happen when running the project

a. in the browser, in these cases, the browser encoding is not chosen correctly, so
the multibyte characters do not show ok. it seems the browser encoding is 8859-1 on solaris or
sjis on windows (for the project with utf-8 project encoding)

windows - utf-8 project encoding - does not show ok
windows - win31j project encoding - shows ok

solaris - utf-8 or euc-jp project encoding - does not show ok in both cases.

b. adding a meta charset tag to the php file,
in the head section of the html part, does not seem
to help, but user should not need to do that.

c. nb regular html files are seeded with the
charset value of the project encoding.
in this case it seems that would not help,
but can the php files be seeded with such an encoding ?

user should not need to do it; let me know if a separate issue is needed.


d. please look overall at the nb6 FEQ (FileEncodingQuery) and project encoding api and features to
see if any of it needs to be done for php files and projects; I don't know if this
issue is related to that or not, but the FEQ things need to be done
(it's more than just having a project encoding property choice)
Comment 1 Tomas Mysik 2008-04-06 13:36:30 UTC
I will investigate more later - now just one notice:

> b. adding a meta charset tag to the php file,
> in the head section of the html part, does not seem
> to help, but user should not need to do that.

The user definitely has to do it, because a PHP file itself doesn't care about encoding. The user has to tell the 
browser which encoding the page is in (using the PHP function header() or the HTML META tag, which is more common), 
because a PHP file is, in the end, nothing more than an ordinary HTML page (I'm not talking about PHP shell scripts 
now). Also, please verify that your web server is not sending a default charset (Apache usually does this; ISO 8859-1 
is sent as the default).

Thanks for reporting.
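
To illustrate the point about a server-sent default charset, here is a minimal sketch in Python (used only as a neutral illustration, not PHP or NetBeans code; the sample string is an assumption). It shows what happens when bytes saved as UTF-8 are decoded as ISO 8859-1, which is roughly what a browser does when it is told the page is in that charset.

```python
# Illustration only: the same bytes decoded with two different charsets.
text = "日本語"                    # sample multibyte text (assumed for the example)
raw = text.encode("utf-8")         # bytes as stored in a UTF-8 encoded file

wrong = raw.decode("iso-8859-1")   # result if the declared charset is ISO 8859-1
right = raw.decode("utf-8")        # result if the declared charset is UTF-8

# Three characters become nine bytes in UTF-8; decoding them as ISO 8859-1
# yields nine unrelated characters instead of the original three.
print(len(text), len(raw), len(wrong))
print(right == text, wrong == text)
```

The bytes are not damaged in either case; only the declared charset differs, which is why a correct header or META tag is enough to fix the display.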
Comment 2 Ken Frank 2008-04-06 17:52:48 UTC
thanks for explaining that about the php file itself - so no encoding tag should be needed
in the php code, since php would always be embedded in an html file that would/should have
an encoding tag

I am still assuming that for users running a file at this level, ie a standalone php file, the php module
will do the right thing in communicating to the browser the correct encoding to use.

but about the web server, which for me on solaris is the apache installed as part of the
samp package, how does one ensure that it's not sending a default charset vs doing the right
thing based on the user's locale or browser choice or something else ?

that is, if the apache default setup handles non ascii ok but a special config is needed
for our php, then it will be important to have that info, perhaps mentioned in our own
docs.


ken.frank@sun.com

Comment 3 Ken Frank 2008-04-06 18:30:10 UTC
another question about the user model - is it a correct assumption that users of the
php functionality in netbeans will also be the same users who install or use
a web server on their own or another machine ?

if that's the case, then they could be expected to make whatever change is
needed if the default charset used by the server is 8859

but if they are using a server owned by root or an admin, which I think
is a typical situation, then they would not be able to make those changes in any case.


Can team clarify about this ?

Also, since in netbeans, users projects and files might be using any project encoding,
and different projects can have different encodings, and not all will be utf-8,
how could apache be customized anyway about this (that is, to not use 8859 as default
but yet be flexible about other encodings ?)

ken.frank@sun.com
Comment 4 Tomas Mysik 2008-04-07 10:47:30 UTC
> but about the web server, which for me on solaris is the apache installed as part of the
> samp package, how does one ensure that its not sending a default charset vs doing the right
> thing based on users locale or browser choice or something else ?

Not sure what the situation is these days - my experience is from Windows and is several years old - maybe someone 
using Windows could verify how it is today. But on linux (I have Gentoo linux) there is no setup related to encoding - 
so no such problem exists. But as I said, not sure about other distros and Windows today.

> that is, if apache default setup handles non ascii ok but a special config is needed
> for our php, then it will be important to have that info, perhaps mentioned in our own
> docs.

I agree, some hints/FAQ should cover this topic.

Ken, could you verify the default Apache configuration on Windows please? Maybe it is OK already. Thanks.
Comment 5 Tomas Mysik 2008-04-07 11:16:39 UTC
> user model...

I think the typical scenario is a standalone developer, meaning every developer has his own Apache and MySQL 
installation on his PC. This is definitely the most common case, and in such a case one has full control over his 
Apache.

> Also, since in netbeans, users projects and files might be using any project encoding,
> and different projects can have different encodings, and not all will be utf-8,
> how could apache be customized anyway about this (that is, to not use 8859 as default
> but yet be flexible about other encodings ?)

It is easy - Apache should not send any header with any charset/encoding. One note - as I said before, I don't 
know the current situation on Windows. But of course, we could have a FAQ entry like "I have UTF-8 in HTML META tag 
but pages are displayed incorrectly" or something like this.

Thanks,
Tomas
Comment 6 Tomas Mysik 2008-04-07 14:33:21 UTC
Adding Radek to CC because he can comment more on this issue (he recently fixed some issues with encoding in the 
output window).
My opinion is that we can close this issue as FIXED because:
- for shell scripts - it has been fixed by Radek (please confirm)
- for web pages - there's nothing to do from PHP project

Any objections? Have I overlooked something?
Comment 7 Tomas Mysik 2008-04-17 12:45:57 UTC
No response for a long time, closing as suggested. Feel free to reopen and comment. Thanks.
Comment 8 Ken Frank 2008-04-17 16:40:20 UTC
several comments:

1. I thought the issue was left incomplete while waiting for confirmation from Radek that it was
fixed for shell scripts; that seems like important info to have
in any case before closing this. and what is the shell script referred to here -
is it a unix shell script ?


2. I will definitely experiment with using a charset tag in the php file
code itself, since I don't think using the meta charset tag in the html part helped.
a new issue can be opened later if needed.

3. also will look at if some apache setup is needed about character sets - any hints
will be appreciated.

4. My assumption is that if apache is set up so that it can/will use utf-8,
for example, then the nb user would still be able to set the project encoding
to any encoding, and the characters of that project should be
usable and shown ok when running a php file -- does this make sense for the php nb model ?

5. yes, I agree a faq about setup related to charsets would be good.

ken.frank@sun.com
Comment 9 Ken Frank 2008-07-08 22:03:43 UTC
 it mentions below - for shell scripts - it has been fixed by Radek (please confirm)-

Radek, can you clarify about this; it can be important info for us to have
for testing.

ken.frank@sun.com
Comment 10 rmatous 2008-07-09 08:29:21 UTC
#127085 is still open as a duplicate of #127088. The only commit of mine in this area that I can remember was related
to running a php script - in options you can adjust where output should go (output window, editor tab, web browser).
The implementation takes the output of /usr/bin/php and copies it to the output window and also to a temporary file.
When the process finishes, the temporary file is shown in the web browser and editor pane. My fix just tried to
consistently use Reader and Writer with the proper project encoding according to EncodingQuery for the project.
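
The idea of reading and writing with one consistent encoding can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the actual NetBeans code (which is Java): `copy_output` and its parameters are hypothetical names, and the project encoding is simply passed in rather than looked up through EncodingQuery.

```python
import io
import tempfile


def copy_output(raw_output: bytes, project_encoding: str) -> str:
    """Decode process output with the project encoding and write it to a
    temporary file using the same encoding. Returns the file path."""
    # Reader side: interpret the raw bytes using the project encoding.
    # newline="" keeps line endings exactly as the process produced them.
    reader = io.TextIOWrapper(
        io.BytesIO(raw_output), encoding=project_encoding, newline=""
    )
    # Writer side: encode with the same encoding, so no character is
    # silently converted through a platform default charset.
    with tempfile.NamedTemporaryFile(
        "w", encoding=project_encoding, newline="", suffix=".html", delete=False
    ) as tmp:
        for line in reader:
            tmp.write(line)
        return tmp.name
```

If either side fell back to the platform default charset instead, the temporary file could contain different bytes than the script produced, which is the kind of inconsistency the fix was aimed at.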
Comment 11 Ken Frank 2008-07-30 19:58:49 UTC
I am in the process of verifying, but want to find out the "rules/assumptions" here

1. for running of php file itself (not php web page) and showing in output window or editor,
the non ascii should show ok without needing header command like
header('Content-type: text/html; charset=utf-8');

assuming the characters used in php print statement are ok for that encoding ?

2. for running of php file, not php web page, in browser, should it show correct
characters if the header('Content-type: text/html; charset=utf-8');
is used ?

that is, is it required that such a header would be there or does php module
code/functionality provide this info to browser for such a standalone file ?


3. I noticed that if php options are set to browser, but php project props
are set to run from command line, then even with the header statement above,
the characters don't show ok in browser (but are ok in output window and editor)

but if php proj props run is set to use local web server, and the header statement is there,
the characters in browser show ok.

---> is it expected that characters would not show ok with this combo of run from command line
but show in browser ?

ken.frank@sun.com
Comment 12 Ken Frank 2008-08-15 20:26:52 UTC
I'd still like to verify but need responses to questions in last posting
before this one. Could someone on dev team look at those questions ?

ken.frank@sun.com
Comment 13 Tomas Mysik 2008-08-16 15:54:44 UTC
> 1. for running of php file itself (not php web page) and showing in output window or editor,
> the non ascii should show ok without needing header command like
> header('Content-type: text/html; charset=utf-8');
>
> assuming the characters used in php print statement are ok for that encoding ?

Yes.

> 2. for running of php file, not php web page, in browser, should it show correct
> characters if the header('Content-type: text/html; charset=utf-8');
> is used ?

Yes, if the file is in UTF-8 encoding.

> that is, is it required that such a header would be there or does php module
> code/functionality provide this info to browser for such a standalone file ?

Ken, you probably still don't understand - it is up to the user to decide whether he uses the HTML META tag or the 
PHP header() function. Moreover, for an HTML file, it is valid not to define a META tag with charset at all - the 
same applies to PHP. So no, the header is NOT required by the PHP module.

> 3. I noticed that if php options are set to browser, but php project props
> set to run for command line, that even if the header statement above,
> the characters dont show ok in browser (but are ok in ow and editor)
> 
> but if php proj props run set to use local web server, and the header statement is there,
> the characters in browser show ok.
> 
> ---> is it expected that characters would not show ok with this combo of run from command line
> but show in browser ?

If the header() function is there, then the browser should display characters correctly (no matter what run 
configuration is set).
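
As a rough illustration of why the header() call matters, the sketch below shows how a client could read the charset out of a Content-Type value such as the one in header('Content-type: text/html; charset=utf-8'). It is written in Python with the standard library only; `charset_from_content_type` is a hypothetical helper name, and real browsers apply additional fallback rules (META tag, user settings) that are not modelled here.

```python
from email.message import Message


def charset_from_content_type(value, default=None):
    """Return the charset parameter of a Content-Type header value,
    or `default` when the header declares no charset."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset(default)


# With the header() call from the discussion, the charset is explicit.
print(charset_from_content_type("text/html; charset=utf-8"))
# Without a charset parameter the client has to fall back to something else.
print(charset_from_content_type("text/html"))
```

When the second case applies, the displayed result depends on the META tag, the server default, or the browser's own guess, which matches the behaviour described in this issue.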

