Forums / Extensions / eZ Find / charset problem
laurent le cadet
Wednesday 12 December 2007 3:02:32 am
Hi,
I'm using ezfind 1.0.2 with ezP 3.9.3 - iso-8859-1 and text is not correctly indexed.
ie : V�rin hydrauliqueV�rin hydraulique ... pompes, chaleur, hydraulique, v�rin
This should be "Vérin hydraulique"
Are any additional settings needed?
Regards.
Laurent
Thursday 13 December 2007 6:30:19 am
It sounds like the encoding is not correct. Must we have a UTF-8 DB?
Kåre Køhler Høvik
Thursday 13 December 2007 7:36:40 am
Hi
UTF8 should not be required for eZ Find and eZP3. If you have a test environment available, please try commenting out these two lines in extension/ezfind/java/solr/conf/schema.xml:
....
<!-- <filter class="ISOLatin1AccentFilterFactory"/> -->
...
<!-- <filter class="ISOLatin1AccentFilterFactory"/> -->
...
restart Solr, and reindex the data.
Kåre Høvik
Friday 14 December 2007 3:08:13 am
Hi Kåre,
We commented out the lines:
<!-- <filter class="ISOLatin1AccentFilterFactory"/> -->
restarted Solr and reindexed, but the results are still corrupted:
This text :
Le DMP est con�u pour r�aliser pour le microdosage de tr�s haute pr�cision de tous les produits
Should be :
Le DMP est conçu pour réaliser pour le microdosage de très haute précision de tous les produits
The characters ç, é, è (and I presume all the special characters) are not encoded correctly.
Stuck at this point.
Any hint ?
regards.
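The symptom above is what you would expect if ISO-8859-1 bytes are being read as UTF-8 somewhere on the way to the index. Here is a minimal Java sketch of that effect; it is only an illustration, not eZ Find or Solr code, and the class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how ISO-8859-1 text turns into a
// replacement character when its bytes are decoded as UTF-8.
class MojibakeDemo {

    // Encode the text as ISO-8859-1, then (wrongly) decode the bytes as UTF-8.
    static String misdecode(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        // In ISO-8859-1, "é" is the single byte 0xE9. On its own that byte is
        // not valid UTF-8, so the decoder substitutes U+FFFD for it.
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00e9 is "é"; the escape keeps this file independent of its own encoding.
        System.out.println(misdecode("V\u00e9rin hydraulique"));
    }
}
```

The plain ASCII letters survive because they have the same byte values in both encodings; only the accented characters are damaged.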
Monday 17 December 2007 4:36:37 am
I read that on http://lucene.apache.org/solr/tutorial.html#Requirements :
"SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported"
Is that related to our problem or can we override that?
I have tried almost everything without any results so far.
Best regards
Monday 17 December 2007 4:59:55 am
Thank you for looking into this.
It looks like you found the problem. The resolution is to have eZ Find convert the data to UTF-8 before it's indexed. Please add a bug report about this in the issue tracker, and I'll fix it as soon as I have time.
Best regards
Kåre
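The conversion step described above amounts to decoding the bytes with the charset they were really written in and re-encoding them as UTF-8 before they are posted to Solr. A minimal sketch in Java follows, assuming the source data is ISO-8859-1; this is not the actual eZ Find fix, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: re-encodes ISO-8859-1 bytes as UTF-8.
class Latin1ToUtf8 {

    // Decode with the charset the data was stored in, then encode as UTF-8.
    static byte[] toUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Vérin" as stored in an ISO-8859-1 database: "é" is the byte 0xE9.
        byte[] stored = {'V', (byte) 0xE9, 'r', 'i', 'n'};
        byte[] utf8 = toUtf8(stored);
        // Reading the converted bytes back as UTF-8 gives the original text.
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

Every ISO-8859-1 byte maps to exactly one Unicode code point, so the decode step cannot fail; the only thing that can go wrong is assuming ISO-8859-1 when the data is in some other charset.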
Monday 17 December 2007 5:07:56 am
Kåre,
I'm going to report the bug. As you can see, there is additional info on encoding/decoding (java.net), or another alternative with additional code:
String encoding = request.getCharacterEncoding();
if (null == encoding) {
    // Set your default encoding here
    request.setCharacterEncoding("UTF-8");
} else {
    request.setCharacterEncoding(encoding);
}
...
String value = request.getParameter("q");
I'm digging into the "java.net" solution. For the other one, I don't know whether it can help us or where to apply the "patch".
Any idea?
Wednesday 19 December 2007 2:50:18 am
Finally, I converted the DB to UTF-8. Everything works fine.
(http://ez.no/developer/forum/general/convert_from_iso_8859_1_encoding_to_utf_8/)
Hope this helps.
laurent
John Smith
Tuesday 19 August 2008 10:26:10 am
hi laurent,
I used the script by Kristof Coomans to do the UTF-8 conversion while upgrading from 3.6.1 to 3.8.0. The script is posted at
http://ez.no/developer/forum/install_configuration/update_to_3_8_and_codepage_problems
I am getting the notice
SET NAMES 'utf8' on the administration and public websites.
Are you getting the same?
Please help...