Hungarian (UNICODE) storage problem
Tony Wood
Sunday 07 November 2004 3:59:36 am
Hi,
We are using MySQL 4.1.7 set to UTF-8 for the db and get the following weird problem with some characters.
When you cut and paste certain characters into Exponential they fail. This is the case with the Á in Árucikkenként.
It appears to affect, but is not limited to:
U+00C1 LATIN CAPITAL LETTER A WITH ACUTE
U+0150 LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
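For reference, a small Python sketch (not part of eZ; the helper name `describe` is made up for illustration) that lists each affected character's code point, Unicode name, UTF-8 bytes, and whether it can be represented at all in an 8-bit Latin-1 charset:

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, UTF-8 bytes as hex, fits in Latin-1?)."""
    try:
        ch.encode("latin-1")
        fits_latin1 = True
    except UnicodeEncodeError:
        fits_latin1 = False
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        ch.encode("utf-8").hex(),
        fits_latin1,
    )


for ch in ("\u00c1", "\u0150"):
    print(describe(ch))
```

Both characters need two bytes in UTF-8, and only the first one exists in Latin-1, which may matter if some layer between the browser and the db is not actually using UTF-8.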
We tried with and without OE and it still fails. Cut and paste works with other apps; it appears to affect only eZ.
Has anyone else come across this and got a fix?
tia
tony
Balazs Halasy
Sunday 07 November 2004 2:35:14 pm
I have no clue why this happens, but: you could try turning on debug output, SQL debug and redirection debug. Copy & paste the text again and look for the SQL insert query which actually stores the thing in the database - does the query itself look healthy?
When you say "other apps" I reckon you mean other Win32 applications, correct? Well, the reason that would work is that most Windows apps support UNICODE (or do some internal mapping) by default. However, what happens if you try using this text in other web-based solutions? Also, what are the exact symptoms? Do the A+acute and the O+double-acute letters simply disappear?
Allman
Monday 08 November 2004 12:51:29 am
Hi, thanks for the quick reply.
>>SQL
There are no errors in the SQL, and the text looks good in the storage line. Somewhere between being entered on the screen and being stored in the db it fails. If I paste the value directly into the DB it still fails on display, even though the DB has the correct value.
>>Cut and Paste
This was cut and paste on Mandrake, but I think this was a red herring, as I have found the problem is not here.
>>Symptoms
The character gets converted into what looks like a non-double-byte character. So Árucikkenként will be converted to �?rucikkenként.
I see that Árucikkenként stores correctly on your site. Is the ez.no site UTF-8 (unicode)?
thanks
Jan Borsodi
Monday 08 November 2004 10:17:25 pm
It might be that the output of the site is not in UTF-8 but in a standard 8-bit charset. Some browsers will send characters not in that range as HTML entities.
You should check the output of the HTML page and see if it contains a meta tag with:
http-equiv="Content-Type" content="text/html; charset=utf-8"
If charset is not utf-8 then that will explain the problem.
You should also examine the HTTP headers for the page.
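A rough way to run both checks, sketched in Python. The function names are made up for illustration, and the sketch assumes you have already saved the page's Content-Type header value and its HTML source by whatever means you prefer:

```python
import re
from email.message import Message


def charset_from_header(content_type):
    """Return the lowercased charset of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def charset_from_meta(html):
    """Return the lowercased charset named in the first matching meta tag, or None."""
    match = re.search(r'<meta[^>]+charset\s*=\s*["\']?([\w-]+)', html, re.IGNORECASE)
    return match.group(1).lower() if match else None


header = "text/html; charset=utf-8"
page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
print(charset_from_header(header), charset_from_meta(page))
```

If the two disagree, keep in mind that browsers generally give the HTTP header precedence over the meta tag, so a correct meta tag alone may not be enough.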
-- Amos Documentation: http://ez.no/ez_publish/documentation FAQ: http://ez.no/ez_publish/documentation/faq
Tuesday 09 November 2004 5:04:29 am
The problem occurs in the admin interface as well as the front end. I have tested the admin interface with and without the OE and it still has the problem.
If you can confirm that you have it working with UTF-8 in your environment then it must be our setup and I will review it.
Tony
Tuesday 16 November 2004 2:49:48 am
Hi Jan,
This is fixed by patching mysqldb.php with the charset code from trunk.
See: http://ez.no/community/bug_reports/hungarian_utf_8_character_bug and http://ez.no/community/bug_reports/mysql_connect_mysql_client_has_differnet_charset_as_server_db
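The patch itself is not reproduced here. As a rough illustration of why a client/server charset mismatch could produce these symptoms, here is a pure-Python simulation. It assumes the db holds correct text, the connection charset is Latin-1, and the page is served as UTF-8; the function names are invented for the example and nothing here talks to a real MySQL server:

```python
def read_through_latin1_connection(stored_text):
    """Simulate reading correct text over a Latin-1 connection into a UTF-8 page."""
    # The server converts to the connection charset; characters that do not
    # exist in Latin-1 are replaced with '?'.
    wire_bytes = stored_text.encode("latin-1", errors="replace")
    # The page is declared as UTF-8, so those bytes get decoded as UTF-8;
    # invalid byte sequences turn into U+FFFD replacement characters.
    return wire_bytes.decode("utf-8", errors="replace")


def read_through_utf8_connection(stored_text):
    """Simulate the same read when the connection charset matches the page."""
    return stored_text.encode("utf-8").decode("utf-8")


print(read_through_latin1_connection("Árucikkenként"))
print(read_through_utf8_connection("Árucikkenként"))
```

In MySQL 4.1 and later the connection charset can be set per connection, for example with a `SET NAMES utf8` statement issued right after connecting; whether the trunk code does exactly that is not stated in this thread.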