PDF files not indexed

Jeroen Sangers

Thursday 29 June 2006 1:54:44 am

I am trying to include the contents of PDF files in the search index, but cannot get it to work.

I installed pstotext on my server and tested it with a PDF file. I followed the steps laid out in http://ez.no/products/ez_publish/documentation/configuration/configuration/search_engine/configuring_binary_file_indexing, and uploaded a PDF file to my site. However, when I search for words that occur in that file, no results show up.

Is there any way I can turn on logging/auditing to see what is happening when I upload a PDF file?

Siniša Šehović

Thursday 29 June 2006 11:20:51 pm

Hi Jeroen

I have the same problem on eZ 3.8.2.

Can anyone help us here? :-)

Best regards,
S.

---
If at first you don't succeed, look in the trash for the instructions.

Jeroen Sangers

Friday 30 June 2006 8:17:04 am

I still can't get it to work. I have tried moving pstotext to different locations on my server, I switched to pdftotext, and I specified the full path to pstotext in my binaryfile.ini.append.php, but I always receive the same error:

Plugin for application/pdf was not found

Does anybody have a clue on how I can solve this?

Siniša Šehović

Saturday 01 July 2006 4:18:09 am

Hi Jeroen

What happens if you try to execute pstotext from the Linux shell?

Do you get any errors?

Did you try this approach?
http://ez.no/community/forum/setup_design/indexing_binary_files_excel_and_powerpoint

S.


Jeroen Sangers

Monday 03 July 2006 1:33:13 am

I managed to solve it this weekend. There were two problems, and in every configuration I tried at least one of them was still present, until I finally hit the right combination!

The first problem is a mistake in the documentation. http://ez.no/products/ez_publish/documentation/configuration/configuration/search_engine/configuring_binary_file_indexing gives the following setting:

[HandlerSettings]
MetaDataExtractor[application/pdf]=pdf

I copied that setting to my binaryfile.ini file, effectively destroying PDF parsing. Of course, I should have left it at the default value:

[HandlerSettings]
MetaDataExtractor[application/pdf]=ezpdf

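The second problem was related to pdftotext. I found out that the command used by Exponential (pdftotext example.pdf) does not print anything: by default pdftotext writes the extracted text to a .txt file next to the PDF instead of to standard output. To get this to work, I had to modify kernel/classes/datatypes/ezbinaryfile/plugins/ezpdfparser.php so that a trailing "-" sends the text to standard output: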

passthru( "$textExtractionTool $fileName -" );
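
To check from the shell whether a given extraction tool really prints text when it is called with the trailing "-", something like the sketch below may help. The function name prints_text and the file name example.pdf are made up for illustration and are not part of the eZ Publish code. The sketch only assumes that the tool accepts "-" as the output argument, as pdftotext does.

```shell
#!/bin/sh
# Hypothetical helper (not part of eZ Publish): succeeds only if the
# extraction tool prints something on standard output for the given file.
# The trailing "-" asks pdftotext to write to stdout instead of a .txt file.
prints_text() {
    tool=$1
    file=$2
    # Capture stdout; fail if the tool is missing or exits with an error.
    output=$("$tool" "$file" - 2>/dev/null) || return 1
    # Fail as well if the tool ran but printed nothing.
    [ -n "$output" ]
}

# Usage (example.pdf is a placeholder for one of your own files):
#   prints_text pdftotext example.pdf && echo "text reaches stdout"
```

If the check fails for a tool that works when run by hand, the tool is probably writing to a file rather than to standard output, which is the situation described above.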

Siniša Šehović

Tuesday 04 July 2006 2:57:18 am

Hi Jeroen

Thanks for the tip!

Now I can index my PDFs.

Best regards,
S.
