Forums / Developer / PHP Search in nodes escaping special characters

"Please Note:
  • At the specific request of Ibexa we are changing this projects name to "Exponential" or "Exponential (CMS)" effective as of August, 11th 2025.
  • This project is not associated with the original eZ Publish software or its original developer, eZ Systems or Ibexa".

PHP Search in nodes escaping special characters

Author Message

Damien MARTIN

Tuesday 01 February 2011 1:26:55 am

Hi there,

I want to do a search using PHP in a tree.

So I use :

$entreprises = eZContentObjectTreeNode::subTreeByNodeID(
    array(
        'ClassFilterType' => 'include',
        'ClassFilterArray' => array('etablissement'),
        'AttributeFilter' => $criteres,
        'SortBy' => array('name', true)
    ),
    2283
);

and where

$criteres[] = array('etablissement/ville', 'like', "*$commune*");

It works very well but I have a problem with accents.

When I'm looking for "Chateau-Thierry" I can't find "Château-Thierry".

But when I search in the back office, I find the same results while typing "chateau" or "château". So I suppose There is something I can do.

But what ?

Could you help me with this ?

Adrien LOCHON

Tuesday 01 February 2011 6:05:16 am

Hi,

I would have used a ID for that. Instead of using a filter on a name, with spaces, accents and things like that, a ID is much better for accuracy.

Or, if you don't want to use any ID or you can't, use a string converter function like this :

function tvReplayImportFiltreBadChars($sContent) {
    $sContent = str_replace(" ", " ", $sContent);
    $sContent = trim($sContent);
    
    $sContent = str_replace("«", "\"", $sContent);
    $sContent = str_replace("»", '"', $sContent);
    $sContent = str_replace("“", "\"", $sContent);
    $sContent = str_replace("”", "\"", $sContent);
    $sContent = str_replace("…", "...", $sContent);
    $sContent = str_replace("´", "'", $sContent);
    $sContent = str_replace("‘", "'", $sContent);
    $sContent = str_replace("’", "'", $sContent);
    $sContent = str_replace("œ", "oe", $sContent);
    $sContent = str_replace(chr(197).chr(34), "oe", $sContent);
    $sContent = str_replace(chr(226) . chr(128) . chr(147), "-", $sContent);
    $sContent = str_replace("Œ", "OE", $sContent);
    $sContent = str_replace("—", "-", $sContent);
    $sContent = str_replace("–", "-", $sContent);
    $sContent = str_replace("–", "-", $sContent);
    $sContent = str_replace("•", "-", $sContent);
    $sContent = str_replace("Ÿ", "Y", $sContent);
    $sContent = str_replace("ž", "z", $sContent);
    $sContent = str_replace("Ž", "Z", $sContent);
    $sContent = str_replace("Š", "S", $sContent);
    $sContent = str_replace("š", "s", $sContent);
    $sContent = str_replace("›", ">", $sContent);
    $sContent = str_replace("‹", "<", $sContent);
    $sContent = str_replace("€", "E", $sContent);
    $sContent = str_replace(chr(226).chr(130).chr(172) , 'euros', $sContent );
    $sContent = str_replace(chr(195).chr(169), "é", $sContent);
    $sContent = str_replace(chr(195).chr(34), "û", $sContent);
    $sContent = str_replace(chr(195).chr(170), "ê", $sContent);
    $sContent = str_replace(chr(195).chr(168), "è", $sContent);
    $sContent = str_replace(chr(195).chr(32), "à" . chr(32), $sContent);
    $sContent = str_replace(chr(195).chr(162), "â", $sContent);
    $sContent = str_replace(chr(195).chr(69), "à", $sContent);
    $sContent = str_replace(chr(194).chr(34), '"', $sContent);
    $sContent = str_replace(chr(226).chr(69).chr(153), "'", $sContent);
    $sContent = str_replace(chr(226).chr(69).chr(166), "...", $sContent);
    $sContent = str_replace(chr(195).chr(167), "ç", $sContent);
    
    $sContent = trim($sContent);

    $sContent = strtoupper($sContent);
    $aLettres = array(",", ";", ".", ":", "°", "-", "_", "'", '"', "&", " ", "/", "\\", "@", "$", "%", "£", "¤", "µ", "*", "!", "§");
    $sContent = str_replace($aLettres, " ", $sContent);
    $sContent = str_replace(" ", "", $sContent);
    
    $sContent = strtr(utf8_decode($sContent), utf8_decode("ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËèéêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ"), "aaaaaaaaaaaaooooooooooooeeeeeeeecciiiiiiiiuuuuuuuuynn");
    $sContent = strtolower($sContent);
    $sContent = utf8_encode($sContent);
    
    return $sContent;
}

This will change something like "île-de-fortûné" into "iledefortune", avoiding search errors, at least most of them i suppose... I use it a lot to match movies titles between our portal and our partners.

My opinion is : use ids :)

Damien MARTIN

Tuesday 01 February 2011 6:35:02 am

Thank you very much Adrien, but it is not what i'm looking for.

Since this morning I wrote a little piece of code to do what I want :

if($commune != "tous")
{
    // Conversion de la chaine de caractere pour la passer en ASCII
    $commune = iconv("UTF-8", "ASCII//TRANSLIT", $commune);

    $resultat = array();
    foreach($entreprises as $e)
    {
        $c = eZContentObject::fetchByNodeID($e->NodeID);
        $dm = $c->DataMap();

        $ville = $dm['ville']->DataText;
        $ville = iconv("UTF-8", "ASCII//TRANSLIT", $ville);

        $commune = strtoupper($commune);
        $ville = strtoupper($ville);

        $commune = str_replace("-", " ", $commune);
        $ville = str_replace("-", " ", $ville);

        //echo "<!-- $commune / $ville -->\n";

        unset($dm);
        unset($c);

        if($commune == $ville)
        {
            $resultat[] = $e;
        }
    }

    $entreprises = $resultat;

}

It works well, but I have 1800+ entries in $entreprises, so the server stop serving the page after a few seconds.

This is why I want to use an embed mecanism in Exponential to search through a tree without having to take care of accents as it does in the administration panel.

@Adrien : You should tke a look at iconv. I discovered it a few weeks ago and it is very usefull.

Thanks.

Franck Magnan

Tuesday 01 February 2011 11:59:52 am

Hello Damien,
I don't think it's an Exponential issue but a MySQL issue. Do you use MySQL as database?
I took your code and it's working on my Exponential instance. When I search for "Chateau-Thierry", I find my "Château-Thierry" content and my sql query looks like:

SELECT DISTINCT
ezcontentobject.*,
ezcontentobject_tree.*,
ezcontentclass.serialized_name_list as class_serialized_name_list,
ezcontentclass.identifier as class_identifier,
ezcontentclass.is_container as is_container

, ezcontentobject_name.name as name,  ezcontentobject_name.real_translation 


FROM
ezcontentobject_tree,
ezcontentobject,ezcontentclass
, ezcontentobject_name 

, ezcontentobject_attribute a1 


WHERE
ezcontentobject_tree.path_string like '/1/2/%' and  



a1.contentobject_id = ezcontentobject.id AND
a1.contentclassattribute_id = 343 AND
a1.version = ezcontentobject_name.content_version AND 
( a1.language_id & ezcontentobject.language_mask > 0 AND
( (   ezcontentobject.language_mask - ( ezcontentobject.language_mask & a1.language_id ) ) & 1 )
+ ( ( ( ezcontentobject.language_mask - ( ezcontentobject.language_mask & a1.language_id ) ) & 2 ) )
<
( a1.language_id & 1 )
+ ( ( a1.language_id & 2 ) )
) 
AND                             ( a1.sort_key_string LIKE '%Chateau-Thierry%'  ) AND 
ezcontentclass.version=0 AND
ezcontentobject_tree.node_id != 2 AND
ezcontentobject_tree.contentobject_id = ezcontentobject.id  AND
ezcontentclass.id = ezcontentobject.contentclass_id AND

ezcontentobject.contentclass_id  IN  ( 45 ) AND
ezcontentobject_tree.contentobject_id = ezcontentobject_name.contentobject_id and
ezcontentobject_tree.contentobject_version = ezcontentobject_name.content_version and 
( ezcontentobject_name.language_id & ezcontentobject.language_mask > 0 AND
( (   ezcontentobject.language_mask - ( ezcontentobject.language_mask & ezcontentobject_name.language_id ) ) & 1 )
+ ( ( ( ezcontentobject.language_mask - ( ezcontentobject.language_mask & ezcontentobject_name.language_id ) ) & 2 ) )
<
( ezcontentobject_name.language_id & 1 )
+ ( ( ezcontentobject_name.language_id & 2 ) )
) 

AND ezcontentobject_tree.is_invisible = 0
AND ((ezcontentobject.section_id in (1))) 

AND 
ezcontentobject.language_mask & 3 > 0 

ORDER BY ezcontentobject_name.name ASC

You can enable output debug and sql debug to compare your sql query with mine.
If it does not work in your environment, I suppose your MySQL server is not configured like mine, especially the collation. You may be take a look to that post

--
Developer at Open Wide