
Clean URL for Vietnamese pages

Guillaume Marty

Thursday 09 June 2011 9:22:19 am

I saw a topic and a bug related to this issue, but they date back to 2009.

The problem is that clean URLs are not generated for pages written in Vietnamese; they fall back to /content/view/full/-style URLs.

I installed the transformation file attached to the bug report and overrode transform.ini like this:

[Transformation]
Charsets[]=utf-8;vietnamese

[vietnamese]
Files[]=vietnamese.tr
Extensions[]


That almost works, but some characters are not caught by the transformation rules and are replaced by a hyphen.

Character: ệ
Rule in transformation file: U+1EC7 = "e"
Result: -
Expected result: e

Any ideas why not all characters are transformed?
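A rule keyed on a code point can only match the exact UTF-8 bytes that code point encodes to. The sketch below uses plain PHP, not eZ Publish's transformation engine, and a hypothetical helper ruleToUtf8() that only understands the single "U+XXXX" form shown above (not the full .tr syntax). It converts the rule's code point to UTF-8 and applies it as a byte-sequence replacement with strtr(). One thing it makes visible, offered as an assumption to rule out rather than a confirmed cause: if the stored text spelled ệ in decomposed form (e followed by the combining marks U+0323 and U+0302), a rule for U+1EC7 would never match it.

```php
<?php
// Sketch only: ruleToUtf8() is a hypothetical helper, not part of eZ Publish.
// It turns the code point of a rule such as U+1EC7 = "e" into the exact
// UTF-8 byte sequence that the rule can match.
function ruleToUtf8(string $codePoint): string
{
    if (!preg_match('/^U\+([0-9A-Fa-f]{4,6})$/', $codePoint, $m)) {
        throw new InvalidArgumentException("Unsupported code point: $codePoint");
    }
    $cp = hexdec($m[1]);
    if ($cp > 0x10FFFF) {
        throw new InvalidArgumentException("Out of Unicode range: $codePoint");
    }
    // Encode the code point as UTF-8 by hand (no mbstring/intl needed)
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// The rule from the post, applied as a plain byte-sequence replacement
$rules = array(ruleToUtf8('U+1EC7') => 'e');

$precomposed = "\u{1EC7}";          // ệ as a single code point
$decomposed  = "e\u{0323}\u{0302}"; // e + combining dot below + combining circumflex

echo bin2hex(ruleToUtf8('U+1EC7')), "\n";      // e1bb87
echo strtr($precomposed, $rules), "\n";         // e
echo bin2hex(strtr($decomposed, $rules)), "\n"; // 65cca3cc82 (no match, unchanged)
```

Since a decomposed ệ still begins with a plain e, this alone would not explain a result with no e at all, so it is only one of several things worth checking.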

Ivo Lukac

Thursday 09 June 2011 10:28:12 am

Hi

Try this custom URL translator filter. Place the file at "urlfilters/ngvietnamesefilter.php" in your extension, with the following content:

<?php
class nGVietnameseFilter extends eZURLAliasFilter
{
static $mappingArray = array('\u00C0' => 'A', '\u1EA2' => 'A', '\u00C3' => 'A', '\u00C1' => 'A', '\u1EA0' => 'A', '\u1EB0' => 'A','\u1EB2' => 'A', '\u1EB4' => 'A', '\u1EAE' => 'A', '\u1EB6' => 'A', '\u1EA6' => 'A', '\u1EA8' => 'A','\u1EAA' => 'A', '\u1EA4' => 'A', '\u1EAC' => 'A', '\u00C8' => 'E', '\u1EBA' => 'E', '\u1EBC' => 'E','\u00C9' => 'E', '\u1EB8' => 'E', '\u1EC0' => 'E', '\u1EC2' => 'E', '\u1EC4' => 'E', '\u1EBE' => 'E','\u1EC6' => 'E', '\u00CC' => 'I', '\u1EC8' => 'I', '\u0128' => 'I', '\u00CD' => 'I', '\u1ECA' => 'I','\u00D2' => 'O', '\u1ECE' => 'O', '\u00D5' => 'O', '\u00D3' => 'O', '\u1ECC' => 'O', '\u1ED2' => 'O','\u1ED4' => 'O', '\u1ED6' => 'O', '\u1ED0' => 'O', '\u1ED8' => 'O', '\u1EDC' => 'O', '\u1EDE' => 'O','\u1EE0' => 'O', '\u1EDA' => 'O', '\u1EE2' => 'O', '\u00D9' => 'U', '\u1EE6' => 'U', '\u0168' => 'U','\u00DA' => 'U', '\u1EE4' => 'U', '\u1EEA' => 'U', '\u1EEC' => 'U', '\u1EEE' => 'U', '\u1EE8' => 'U','\u1EF0' => 'U', '\u1EF2' => 'Y', '\u1EF6' => 'Y', '\u1EF8' => 'Y', '\u00DD' => 'Y', '\u1EF4' => 'Y','\u00E0' => 'a', '\u1EA3' => 'a', '\u00E3' => 'a', '\u00E1' => 'a', '\u1EA1' => 'a', '\u1EB1' => 'a','\u1EB3' => 'a', '\u1EB5' => 'a', '\u1EAF' => 'a', '\u1EB7' => 'a', '\u1EA7' => 'a', '\u1EA9' => 'a','\u1EAB' => 'a', '\u1EA5' => 'a', '\u1EAD' => 'a', '\u00E8' => 'e', '\u1EBB' => 'e', '\u1EBD' => 'e','\u00E9' => 'e', '\u1EB9' => 'e', '\u1EC1' => 'e', '\u1EC3' => 'e', '\u1EC5' => 'e', '\u1EBF' => 'e','\u1EC7' => 'e', '\u00EC' => 'i', '\u1EC9' => 'i', '\u0129' => 'i', '\u00ED' => 'i', '\u1ECB' => 'i','\u00F2' => 'o', '\u1ECF' => 'o', '\u00F5' => 'o', '\u00F3' => 'o', '\u1ECD' => 'o', '\u1ED3' => 'o','\u1ED5' => 'o', '\u1ED7' => 'o', '\u1ED1' => 'o', '\u1ED9' => 'o', '\u1EDD' => 'o', '\u1EDF' => 'o','\u1EE1' => 'o', '\u1EDB' => 'o', '\u1EE3' => 'o', '\u00F9' => 'u', '\u1EE7' => 'u', '\u0169' => 'u','\u00FA' => 'u', '\u1EE5' => 'u', '\u1EEB' => 'u', '\u1EED' => 'u', '\u1EEF' => 'u', '\u1EE9' => 'u','\u1EF1' => 'u', '\u1EF3' => 'y', '\u1EF7' => 'y', 
'\u1EF9' => 'y', '\u00FD' => 'y', '\u1EF5' => 'y','\uFB00' => 'ff', '\uFB01' => 'fi', '\uFB02' => 'fl', '\uFB03' => 'ffi', '\uFB04' => 'ffl', '\uFB05' => 'ft', '\uFB06' => 'st','\u00C2' => 'A', '\u00CA' => 'E', '\u00CE' => 'I', '\u00D4' => 'O', '\u00DB' => 'U','\u00E2' => 'a', '\u00EA' => 'e', '\u00EE' => 'i', '\u00F4' => 'o', '\u00FB' => 'u','\u01A0' => 'O', '\u01A1' => 'o', '\u01AF' => 'U', '\u01B0' => 'u');

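    // Converts one UTF-8 character into a lookup key for $mappingArray: the
    // literal text \u followed by its 4-digit uppercase hex code point
    // (e.g. "ệ" becomes \u1EC7). Digits and ASCII characters from 'A' (0x41)
    // upward are returned unchanged; other ASCII bytes are percent-encoded.
    // Note: 4-byte UTF-8 sequences are not handled.
    public static function utf8ToUnicode( $str )
    {
        $unicode = array();
        $values = array();
        $lookingFor = 1;
        for ( $i = 0; $i < strlen( $str ); $i++ )
        {
            $thisValue = ord( $str[$i] );
            if ( $thisValue < ord( 'A' ) )
            {
                // Digits pass through; other low ASCII bytes are percent-encoded
                if ( $thisValue >= ord( '0' ) && $thisValue <= ord( '9' ) )
                    $unicode[] = chr( $thisValue );
                else
                    $unicode[] = '%' . dechex( $thisValue );
            }
            else if ( $thisValue < 128 )
            {
                // Remaining ASCII characters pass through unchanged
                $unicode[] = $str[$i];
            }
            else
            {
                // Multi-byte sequence: a lead byte below 224 starts a 2-byte
                // sequence, otherwise it is treated as a 3-byte sequence
                if ( count( $values ) == 0 )
                    $lookingFor = ( $thisValue < 224 ) ? 2 : 3;
                $values[] = $thisValue;
                if ( count( $values ) == $lookingFor )
                {
                    // Decode the collected bytes into a code point
                    $number = ( $lookingFor == 3 )
                        ? ( ( $values[0] % 16 ) * 4096 ) + ( ( $values[1] % 64 ) * 64 ) + ( $values[2] % 64 )
                        : ( ( $values[0] % 32 ) * 64 ) + ( $values[1] % 64 );
                    $unicode[] = '\u' . strtoupper( str_pad( dechex( $number ), 4, '0', STR_PAD_LEFT ) );
                    $values = array();
                    $lookingFor = 1;
                }
            }
        }
        return implode( '', $unicode );
    }

    // Filter entry point: replaces every character found in $mappingArray
    // with its unaccented ASCII equivalent and leaves all others untouched
    function process( $text, &$languageObject, &$caller )
    {
        $outputText = '';
        // Split the UTF-8 string into individual characters
        $textArray = preg_split( '/(?<!^)(?!$)/u', $text );
        foreach ( $textArray as $char )
        {
            $unicodeChar = self::utf8ToUnicode( $char );
            $outputText .= array_key_exists( $unicodeChar, self::$mappingArray )
                ? self::$mappingArray[$unicodeChar]
                : $char;
        }
        return $outputText;
    }
}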
?>

Add the following lines to your site.ini:

[URLTranslator]
Extensions[]={YOUR EXTENSION NAME}
Filters[]=nGVietnameseFilter

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac
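For comparison, here is a shorter sketch of the same idea. It keys the lookup table on the accented characters themselves and lets PHP's strtr() do the replacement, instead of converting every character to a "\uXXXX" string first. vietnameseToAscii() is a hypothetical standalone function, not an eZ Publish API; to take effect in eZ Publish it would still have to be called from a filter's process() method like the one above. Unlike the map posted above, it also covers đ/Đ (U+0111/U+0110), which that map does not appear to include, and it leaves out the ff/fi/fl ligature entries. If the intl extension is installed, it first normalizes the input to NFC so that decomposed input (a base letter followed by combining accents) is composed before the lookup.

```php
<?php
// Hypothetical helper (not part of eZ Publish): strips Vietnamese diacritics
// using a lookup keyed by the actual UTF-8 characters.
function vietnameseToAscii(string $text): string
{
    static $map = null;
    if ($map === null) {
        // Each ASCII letter, followed by every accented form that maps to it
        $groups = array(
            'a' => 'àáảãạăằắẳẵặâầấẩẫậ',
            'A' => 'ÀÁẢÃẠĂẰẮẲẴẶÂẦẤẨẪẬ',
            'e' => 'èéẻẽẹêềếểễệ',
            'E' => 'ÈÉẺẼẸÊỀẾỂỄỆ',
            'i' => 'ìíỉĩị',
            'I' => 'ÌÍỈĨỊ',
            'o' => 'òóỏõọôồốổỗộơờớởỡợ',
            'O' => 'ÒÓỎÕỌÔỒỐỔỖỘƠỜỚỞỠỢ',
            'u' => 'ùúủũụưừứửữự',
            'U' => 'ÙÚỦŨỤƯỪỨỬỮỰ',
            'y' => 'ỳýỷỹỵ',
            'Y' => 'ỲÝỶỸỴ',
            'd' => 'đ',
            'D' => 'Đ',
        );
        $map = array();
        foreach ($groups as $ascii => $chars) {
            // Split the group into individual UTF-8 characters
            foreach (preg_split('//u', $chars, -1, PREG_SPLIT_NO_EMPTY) as $char) {
                $map[$char] = $ascii;
            }
        }
    }
    // With the intl extension, compose "base letter + combining marks"
    // sequences into single code points before the lookup
    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($text, Normalizer::FORM_C);
        if ($normalized !== false) {
            $text = $normalized;
        }
    }
    // strtr() with an array replaces whole keys, so multi-byte characters are safe
    return strtr($text, $map);
}

echo vietnameseToAscii('Tiếng Việt'), "\n"; // Tieng Viet
```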

Guillaume Marty

Tuesday 14 June 2011 5:35:45 am

Thanks for your reply, but it didn't work for me.

First, I tried to do what you described.

Then I regenerated the autoloads array and tried:

[URLTranslator]
FilterClasses[]=nGVietnameseFilter

(The Extensions[] and Filters[] settings are now deprecated.)

But it didn't work either. It looks like the characters are already transformed incorrectly before they reach the filter. I'm still investigating.
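To check what a filter actually receives, one option is to list the code points of the incoming string. This is a sketch: codePoints() is a hypothetical helper, and logging its output from inside process() is left to you. A precomposed ệ appears as U+1EC7, which matches the \u1EC7 key in the mapping above; a decomposed ệ appears as U+0065 U+0323 U+0302 and would not match; and if a hyphen has already replaced the character, that shows up directly as U+002D.

```php
<?php
// Hypothetical diagnostic helper (not part of eZ Publish): lists the code
// points of a UTF-8 string as "U+XXXX" strings.
function codePoints(string $text): array
{
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $result = array();
    foreach ($chars as $char) {
        // unpack('C*') returns the bytes as a 1-indexed array; reindex from 0
        $b = array_values(unpack('C*', $char));
        switch (count($b)) {
            case 1:
                $cp = $b[0];
                break;
            case 2:
                $cp = (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
                break;
            case 3:
                $cp = (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
                break;
            default:
                $cp = (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                    | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
        }
        $result[] = sprintf('U+%04X', $cp);
    }
    return $result;
}

echo implode(' ', codePoints("Vi\u{1EC7}t")), "\n";          // U+0056 U+0069 U+1EC7 U+0074
echo implode(' ', codePoints("Vie\u{0323}\u{0302}t")), "\n"; // U+0056 U+0069 U+0065 U+0323 U+0302 U+0074
```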

Ivo Lukac

Tuesday 14 June 2011 5:50:21 am

Hi,

Send me your email address via the "Direct contact" form (http://share.ez.no/authorcontact/form/9504) and I'll send you the files. Maybe copying and pasting the code from the post broke something.