Import rss as literal html

michael depetrillo

Tuesday 15 July 2008 10:14:42 am

Hello everyone

I need to import my RSS feed as literal HTML.

I changed the setEZXMLAttribute function in cronjobs/rssimport.php.

The RSS feed imports OK, but on the front end I do not see all the HTML tags.

In the back end with the editor enabled, I still do not see all the HTML tags.

In the back end with the editor disabled, I see all the HTML. I can then hit save, and both the editor and the front end display the correct HTML.

What piece am I missing here?

The feed I am working with is - http://www.cnbc.com/id/20040302/rssCmp/97305/device/rss/rss.xml

function setEZXMLAttribute( $attribute, $attributeValue, $link = false )
{
    //include_once( 'kernel/classes/datatypes/ezxmltext/handlers/input/ezsimplifiedxmlinputparser.php' );

    $contentObjectID = $attribute->attribute( "contentobject_id" );

    // echo $attributeValue ."\n";

    // ADDED FOR LP
    $contentClassID = $attribute->attribute('contentclassattribute_id');
    if ($contentClassID == 206) {

        $inputData = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
        $inputData .= "<section xmlns:image=\"http://ez.no/namespaces/ezpublish3/image/\"\n";
        $inputData .= "         xmlns:xhtml=\"http://ez.no/namespaces/ezpublish3/xhtml/\"\n";
        $inputData .= "         xmlns:custom=\"http://ez.no/namespaces/ezpublish3/custom/\">\n";
        $inputData .= "<paragraph>\n<literal class=\"html\">";
        $inputData .= strip_tags($attributeValue, "<span><a><p><h1><h2><h3><h4><h5><ul><li><br><table><tr><td><th><tbody><tfoot><hr><img><embed><object>");
        $inputData .= "</literal></paragraph>";
        $inputData .= "</section>";

        $domString = $inputData;

    // END ADDED FOR LP
    } else {

        $parser = new eZSimplifiedXMLInputParser( $contentObjectID, false, 0, false );

        $attributeValue = str_replace( "\r", '', $attributeValue );
        $attributeValue = str_replace( "\n", '', $attributeValue );
        $attributeValue = str_replace( "\t", ' ', $attributeValue );

        $document = $parser->process( $attributeValue );
        if ( !is_object( $document ) )
        {
            $cli = eZCLI::instance();
            $cli->output( 'Error in xml parsing' );
            return;
        }
        $domString = eZXMLTextType::domString( $document );
    }

    // echo $domString;

    $attribute->setAttribute( 'data_text', $domString );
    $attribute->store();
}

Guillaume Kulakowski

Tuesday 15 July 2008 1:37:46 pm

Hello Michael,

I use eZ for a planet: http://planet.fedora-fr.org.

For that, I store the RSS content in a Text block attribute.
To get valid XHTML content, I run it through tidy and a cleaner parser.

You can take inspiration from my code:
http://trac.llaumgui.com/browser/ez_publish/myutils/trunk/cronjobs/planet.php (look at setEZTXTAttribute)
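
The tidy step could look roughly like the sketch below. It assumes PHP's tidy extension is installed. cleanFeedHtml is a hypothetical name, not taken from the linked planet.php, and the options are just one plausible choice.

```php
<?php
// Hypothetical helper (not from planet.php): repair a feed item's HTML
// so that it comes out as well-formed XHTML.
function cleanFeedHtml( $html )
{
    // Assumption: fall back to the raw input when tidy is not installed.
    if ( !function_exists( 'tidy_repair_string' ) )
    {
        return $html;
    }

    $config = array(
        'output-xhtml'   => true, // close open tags, emit XHTML
        'show-body-only' => true, // drop the <html>/<head>/<body> wrapper
        'wrap'           => 0     // do not re-wrap long lines
    );

    $clean = tidy_repair_string( $html, $config, 'utf8' );

    // tidy_repair_string() returns false on failure; keep the input then.
    return ( $clean === false ) ? $html : $clean;
}

$clean = cleanFeedHtml( '<p>Unclosed <b>bold' );
```

The cleaned string can then be stored in the Text block attribute in place of the raw feed content.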

My blog : http://www.llaumgui.com (not in eZ Publish ;-))
eZC on RHEL : http://blog.famillecollet.com/pages/Config-en
eZC on Fedora : just "yum install php-channel-ezc"

michael depetrillo

Thursday 17 July 2008 12:12:37 pm

What does the disabled editor do to the HTML before it saves it to a DOM document?

Or, to put it another way:

What does the editor do to the HTML from the DOM document before it displays it?
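
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: the function in the first post writes the feed's tags into <literal> as raw markup, whereas saving through the disabled editor appears to store them as escaped text (&lt;p&gt; and so on). If that is the case, escaping the HTML before wrapping it may give the same result as the manual save. A minimal sketch follows. buildLiteralXml is a hypothetical helper, not an eZ Publish API.

```php
<?php
// Hypothetical helper: wrap feed HTML in an eZ XML <literal class="html">
// block, escaping it so the tags are stored as character data.
function buildLiteralXml( $html, $allowedTags )
{
    $html = strip_tags( $html, $allowedTags );

    // Escape &, < and > so the XML stays well-formed even when the feed
    // contains unclosed tags such as <br> or bare ampersands.
    $escaped = htmlspecialchars( $html, ENT_NOQUOTES, 'UTF-8' );

    $xml  = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
    $xml .= "<section xmlns:image=\"http://ez.no/namespaces/ezpublish3/image/\"\n";
    $xml .= "         xmlns:xhtml=\"http://ez.no/namespaces/ezpublish3/xhtml/\"\n";
    $xml .= "         xmlns:custom=\"http://ez.no/namespaces/ezpublish3/custom/\">\n";
    $xml .= "<paragraph><literal class=\"html\">" . $escaped . "</literal></paragraph>";
    $xml .= "</section>";

    return $xml;
}

$source = '<p>Markets <br> rallied & closed higher</p>';
$xml    = buildLiteralXml( $source, '<p><br>' );

// Sanity check: the result parses as XML, and the literal's text content
// is the original markup.
$dom     = new DOMDocument();
$ok      = $dom->loadXML( $xml );
$literal = $dom->getElementsByTagName( 'literal' )->item( 0 );
```

In setEZXMLAttribute, the returned string would take the place of $inputData before it is assigned to $domString.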