umbSearch with PDF and Office Filter Options
avalentin
Posted: Friday, December 21, 2007 4:12:47 PM
Rank: Newbie

Joined: 12/21/2007
Posts: 3
Hi!

I just compiled the umbSearch extension. After some work, indexing of content works fine, but I'm still having trouble indexing pdf, doc and docx files.
Of course I also installed the filters, aspx files and dll files.
My current config looks like this (stored under data/umbSearch/umbracoSearchConfig.xml):
<?xml version="1.0" encoding="utf-8" ?>
<indexConfiguration>
<!-- separate multiple items with semicolons -->
<indexDataWithAliases>bodyText;anotherPropertyAlias;Content</indexDataWithAliases>
<excludeUmbracoNaviHide>true</excludeUmbracoNaviHide>
<excludeNodeTypes>sampleAlias</excludeNodeTypes>
<excludeIds>0</excludeIds>
<indexFilesWithAliases>umbracoFile;Content</indexFilesWithAliases>
<!-- the username (login) for a user who should receive error messages (mails) when the index doesn't update -->
<alertUserAlias>umbraco</alertUserAlias>
</indexConfiguration>

My content is stored in the data field Content. That content contains links to the media store, but I do not understand why umbSearch doesn't index the linked files.

Hopefully someone could help.
drobar
Posted: Friday, December 21, 2007 4:20:16 PM

Rank: Umbracoholic

Joined: 9/8/2006
Posts: 1,831
Location: MA, USA
You might also look at the code Ismail developed some months ago at http://ismail.umbraco.net/1243.aspx

cheers,
doug.

MVP 2007-2009 - Percipient Studios
avalentin
Posted: Friday, December 21, 2007 4:27:51 PM
Rank: Newbie

Joined: 12/21/2007
Posts: 3
I just saw that there is a log in the database. It says this:
Code:

Error indexing file: <p><a href="/media/398/trueimageenterpriseserver9.1_ug.de.pdf">PDF Datei:</a></p><p><a href="http://umbraco.marcant.net/media/911/kurt%20kehrt%20heim.docx">DocX</a></p><p><a href="http://umbraco.marcant.net/media/914/kurt%20kehrt%20heim.doc">Doc</a></p>

Full error message:
System.Exception: No filter found matching extension doc">doc</a></p>. Loaded and valid extensions includes: pdf, odt, txt, xls, ppt, doc, docx, xlsx, pptx
   bei umbSearch.businessLogic.FileFilterFactory.GetFilter(String extension) in F:\Projekte\VS\umbracoext-14317\umbSearch\umbSearch\businessLogic\FileFilterFactory.cs:Zeile 34.
   bei umbSearch.businessLogic.Indexer.AddNodeToIndex(XmlNode n, IndexWriter writer, String[] excludeIds) in F:\Projekte\VS\umbracoext-14317\umbSearch\umbSearch\businessLogic\Indexer.cs:Zeile 122.


Why is umbSearch trying to index the PageContent and not the file itself?

Thanks

André
avalentin
Posted: Friday, December 21, 2007 4:31:05 PM
Rank: Newbie

Joined: 12/21/2007
Posts: 3
Hi!

Thanks for the hint, but it seems I have a general misunderstanding of how it works. I'll take a look at Ismail's alternative.
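
For anyone hitting the same error: the log suggests that the whole value of the Content property was handed to the file indexer as if it were a file path, which would fit with Content being listed in indexFilesWithAliases next to umbracoFile. The extension in the message (doc">doc</a></p>) is what you get when you take everything after the last dot of the HTML and lowercase it. The Python sketch below only reproduces that effect and shows one possible way to pull the real extensions out of the links. It is not umbSearch code; the function names naive_extension and linked_file_extensions are made up for illustration, and the "text after the last dot" behaviour is an assumption inferred from the error message.

```python
import posixpath
import re
from urllib.parse import unquote, urlparse

# The property value from the error log above (HTML, not a file path).
page_content = (
    '<p><a href="/media/398/trueimageenterpriseserver9.1_ug.de.pdf">PDF Datei:</a></p>'
    '<p><a href="http://umbraco.marcant.net/media/911/kurt%20kehrt%20heim.docx">DocX</a></p>'
    '<p><a href="http://umbraco.marcant.net/media/914/kurt%20kehrt%20heim.doc">Doc</a></p>'
)


def naive_extension(value):
    """Assumed behaviour: treat the value as a path and take the text after the last dot."""
    return value.rsplit(".", 1)[-1].lower()


def linked_file_extensions(html):
    """Hypothetical alternative: read each href and take the extension of its path."""
    extensions = []
    for href in re.findall(r'href="([^"]+)"', html):
        path = unquote(urlparse(href).path)  # drop scheme/host, decode %20
        ext = posixpath.splitext(path)[1].lstrip(".").lower()
        if ext:
            extensions.append(ext)
    return extensions


print(naive_extension(page_content))         # doc">doc</a></p>
print(linked_file_extensions(page_content))  # ['pdf', 'docx', 'doc']
```

If that reading is right, the filters themselves are fine (pdf, docx and doc are all in the list of loaded extensions); the indexer just never sees a plain file path for this property.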