Fellow Umbracans.
I have de-enterprised, upgraded to 2.0 and open sourced a project (which I chickened out of demonstrating at Codegarden). It lets you do some pretty powerful things that involve retrieving and relating Umbraco content and metadata, with a minimal amount of effort and, hopefully, high performance.
Get it here.
XSLT Lucene search extension
The core of the system is an extended (well, rewritten) Lucene query language hosted in an XSLT extension for easy integration.
Installation
Copy the binary files and 'DefaultStopWords.txt' to the bin directory of your Umbraco installation.
Add the following line into the xsltExtensions element in config/xsltExtensions.config:
<ext assembly="/bin/SearchTools" type="SearchTools.Xsl" alias="stk" />
Usage
To use the search extensions from XSLT, first add xmlns:stk="urn:stk" to your xsl:stylesheet element. You should then be able to call the provided Search method from within your XSLT.
The Search method takes two parameters: a query string and a result limit. It returns a nodeset containing a result for each Umbraco node that matched the query.
Sample result set
The result set returned has the following form:
Code:
<root>
<Result>
<Score>0.23332</Score>
<Field name="PageName">A result page</Field>
<Field name="ADoctypeField">Itsvalue</Field>
<Field name="AnotherDoctypeField">Itsvalue</Field>
</Result>
</root>
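Putting the usage notes and the result format together, a minimal stylesheet might look like the sketch below. The query 'contents:umbraco', the limit of 10 and the exclude-result-prefixes attribute are illustrative choices of mine, not something the project prescribes. descendant-or-self::Result is used only so that the loop works whether the extension hands back the wrapping root element or the Result elements themselves.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:stk="urn:stk"
    exclude-result-prefixes="stk">

  <xsl:template match="/">
    <!-- 'contents:umbraco' and the limit of 10 are placeholder values -->
    <xsl:for-each select="stk:Search('contents:umbraco', 10)/descendant-or-self::Result">
      <p>
        <!-- Field and Score are the element names from the sample result set -->
        <xsl:value-of select="Field[@name='PageName']"/>
        <xsl:text> (</xsl:text>
        <xsl:value-of select="Score"/>
        <xsl:text>)</xsl:text>
      </p>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```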
Sample XSLT - fixed search for terms
The following XSLT searches the site index for the terms 'project' and 'management', returning up to 50 results in descending order of suitability. The first parameter to the Search method is the query; download the source and have a look at the test suite for an idea of what it should be capable of. Any feedback on the syntax would be appreciated at this early stage, where it can be changed easily. For one, I'm starting to doubt my choice of '&&' as the AND operator when the queries are mainly going to be written inside XML attributes.
Simple queries take the form of fieldName:term or fieldName:"composite term", where the field name corresponds to a field in your site's Lucene index.
Code:
<xsl:template match="/">
  <ul>
    <xsl:for-each select="stk:Search('contents:project contents:management',50)">
      <li>
        <xsl:apply-templates/>
      </li>
    </xsl:for-each>
  </ul>
</xsl:template>
<!-- A Result just hands over to the templates for its children -->
<xsl:template match="Result">
  <xsl:apply-templates/>
</xsl:template>
<!-- Show the relevance score in brackets -->
<xsl:template match="Score">
  (<xsl:value-of select="./text()"/>)
</xsl:template>
<!-- Only the PageName field is written out; other fields are ignored -->
<xsl:template match="Field">
  <xsl:if test="./@name = 'PageName'">
    <xsl:value-of select="./text()"/>
  </xsl:if>
</xsl:template>
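The fieldName:"composite term" form needs a little care when the query sits inside a select attribute, because the double quotes around the phrase would otherwise end the attribute. One way round this, sketched here with a made-up phrase and limit, is to write them as &quot; entities:

```xml
<!-- The XML parser turns &quot; into ", so the query string passed to Search
     is PageName:"project management" ('project management' is a placeholder) -->
<xsl:for-each select="stk:Search('PageName:&quot;project management&quot;', 10)">
  <xsl:apply-templates/>
</xsl:for-each>
```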
So far, so not very interesting, though it might be useful in scenarios where you want to use metadata tags to have your pages appear in multiple sections of a site. I have also added features that link Umbraco content directly with the Lucene index as part of your queries.
Sample query - link page fields
#injectMeta(PageUnstructured:tags,content)
This query will take the contents of the current page's 'tags' field and transform it into a query against the 'content' field, so it should return a ranked list of all the pages in your site that contain the page's keywords in their content field.
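As a rough idea of how this could be used, the fragment below builds a "related pages" list from the injectMeta query. The heading, the limit of 5 and the assumption that PageName is present in the index (as in the sample result set) are mine; drop it inside any template in a stylesheet that declares the stk prefix.

```xml
<h3>Related pages</h3>
<ul>
  <!-- The limit of 5 is arbitrary; descendant-or-self::Result copes with
       either a wrapping root element or bare Result elements -->
  <xsl:for-each select="stk:Search('#injectMeta(PageUnstructured:tags,content)', 5)/descendant-or-self::Result">
    <li>
      <xsl:value-of select="Field[@name='PageName']"/>
    </li>
  </xsl:for-each>
</ul>
```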
Sample query - link member field
#injectMeta(MemberUnstructured:tags,content)
This query will take the contents of the current member's 'tags' field and transform it into a query against the 'content' field.
Sample query - composite metadata and boosting
The query language is pretty expressive: you can combine expressions with Boolean logic, boost components of the expression and negate terms.
#injectMeta(PageUnstructured:tags,content)^3 && PageType:NewsItem && !(PageName:Introduction)
This should be a completely valid query in which the metadata injection is boosted by 3x in the ranking, the PageType is required to be 'NewsItem' and the page name must not be 'Introduction'.
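One practical wrinkle: a raw & is not allowed inside an XML attribute, so when this query is written in a select attribute each && has to be escaped as &amp;&amp;. A sketch, where the variable name newsQuery and the limit of 10 are just for illustration:

```xml
<!-- && must be written as &amp;&amp; inside an XML attribute;
     the extension still receives the query with a plain && -->
<xsl:variable name="newsQuery"
    select="'#injectMeta(PageUnstructured:tags,content)^3 &amp;&amp; PageType:NewsItem &amp;&amp; !(PageName:Introduction)'"/>
<xsl:for-each select="stk:Search($newsQuery, 10)">
  <xsl:apply-templates/>
</xsl:for-each>
```

Keeping the query in a variable keeps the escaped text in one place if the same query is reused elsewhere in the stylesheet.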
Utility method - Bayesian summariser
I have also added the simple Bayesian summariser from the nclassifier project. This can be used to automatically determine the most significant sentences in a piece of text, which is handy for automatically producing teaser text.
Quote:
Being able to take a look at the words and phrases people use when looking for things online is invaluable. Rather than listening to people say what they think they might do, you get to observe what they actually did. And when aggregated, you get a nice view of the words people most often use when thinking about and searching for a certain topic. Once armed with keyword intelligence that’s relevant to your niche, you have the unique ability to create highly-relevant content that aids your site visitors and enhances your credibility. You’re speaking the language of the audience after all, and satisfying their needs. And if you get it right, you’ll likely rank well in the search engines too, after promoting the content in a strategic way. It may seem strange to view search traffic as a secondary benefit in a Google-driven world, but that’s exactly how you should view it. Google won’t treat you as relevant until someone else does first.
Will become:
Quote:
Being able to take a look at the words and phrases people use when looking for things online is invaluable. And when aggregated, you get a nice view of the words people most often use when thinking about and searching for a certain topic. Once armed with keyword intelligence that’s relevant to your niche, you have the unique ability to create highly-relevant content that aids your site visitors and enhances your credibility.
When summarised to 2 sentences.
Anybody who has feedback or ideas for this project or who wishes to contribute, please get in touch - ryansroberts at gmail dotcom.