|
|
 Rank: Fanatic
Joined: 7/19/2006 Posts: 496 Location: Göteborg, Sweden
|
Hi! I just implemented the (great) XSLT Search on a site I'm working on. If I search for a word with the letters å, ä or ö in it, it returns only one result, but if I search for the same word and omit the last part with the "ö" in it, I get 13 matches. See for yourself here (in progress): http://www.soms.nu/sok.aspx

This is a Swedish site for a non-profit association for physical therapists who work with people with different disabilities. Words to test with might be:
(learning disability) "utvecklingsstörning" vs. "utvecklingsst" (1 vs. 13 matches)
(network) "nätverk" vs. "tverk" (first part omitted) (1 vs. 5 matches)

Obviously XSLT Search finds the first word, but for some reason it doesn't show the rest... // ;) Kalle
" - Yeah I'd like to share your point of view, as long as it's my view too... ( http://www.d-a-d.dk/lyrics/pointofview)
|
|
 Rank: Umbracoholic
Joined: 9/8/2006 Posts: 1,831 Location: MA, USA
|
Is this related to this post? http://forum.umbraco.org/17461

View the source of the pages that are and aren't returned and see if some are URL-escaped and some are not. Let me know. cheers, doug.
MVP 2007-2009 - Percipient Studios
|
|
 Rank: Fanatic
Joined: 7/19/2006 Posts: 496 Location: Göteborg, Sweden
|
Yes, that's the same "feature", it seems: the page that actually was found contained unescaped chars somehow... I think I'll try Mr Bock's little hack, unless 2.7 is just around the corner, that is ; ) // ; ) Kalle PS. Thanks for pointing me to that thread, I totally missed that one :blush:
|
|
 Rank: Fanatic
Joined: 11/24/2006 Posts: 323 Location: Stockholm, Sweden
|
I'm using XSLTSearch on a Swedish site and I'm not experiencing this problem. Test it at: http://www.itresurs.se/sok

Perhaps it has something to do with the globalization settings in web.config? I have set all mine to UTF-8, like this: <globalization requestEncoding="utf-8" responseEncoding="utf-8" fileEncoding="utf-8"/>

Still, I see inconsistencies in how the characters in the search results are encoded: sometimes none of the Swedish characters are encoded, sometimes only some of them are, and sometimes they all are. I don't know why this is, though...
Web Developer at Kärnhuset - http://www.karnhuset.net - Stockholm, Sweden
|
|
 Rank: Fanatic
Joined: 7/19/2006 Posts: 496 Location: Göteborg, Sweden
|
@Thomas: I'm sorry to say you actually do have the same issue there... If you search first for "konsulttjänster" and then for "konsulttj", you'll notice that the second search returns 4 matches while the first returns 2. You can also see that the first 2 results are matches in the headline, which (I guess) is probably a plain text string, while the word "konsulttjänster" in the body doesn't get matched at all... This sounds like the same kind of problem.

A question: I noticed that the latest news item contained some paragraphs with the class attribute "MsoNormal". Is this a class defined by you, or does it come from the editor somehow? (Cut'n'paste from Word by the client, perhaps?) // ; ) Kalle
|
|
 Rank: Fanatic
Joined: 11/24/2006 Posts: 323 Location: Stockholm, Sweden
|
MsoNormal is something that comes from Microsoft Word, and it's not removed by the code cleaner in Umbraco. It would be nice if it were removed, but I don't know if I can configure Umbraco's HTML editor to do that.

Still, that is a sidetrack. The main issue here is how to make XSLT Search handle Scandinavian characters better. Perhaps this could be a task for the Scandinavian users of Umbraco? I'm up to my neck in work for clients right now, but I could take a closer look at the code if I get some time.
|
|
 Rank: Fanatic
Joined: 11/24/2006 Posts: 323 Location: Stockholm, Sweden
|
Looking through the code, perhaps an idea would be to extend the following functions to cover the Scandinavian characters as well: Code:
public string escapeSearchTerms(string data) { return data.Replace(Convert.ToString((char)38), "&amp;"); }
public string unescapeSearchTerms(string data) { return data.Replace("&amp;", Convert.ToString((char)38)); }
The character code (38) is an ASCII number though, and the Scandinavian characters do not exist in ASCII. Would something like this work? Code:
public string escapeSearchTerms(string data) {
    // Escape "&" first so the entities added afterwards are not escaped again.
    return data.Replace(Convert.ToString((char)38), "&amp;")
               .Replace("ä", "&auml;").Replace("å", "&aring;").Replace("ö", "&ouml;");
}
public string unescapeSearchTerms(string data) {
    // Reverse order: restore the letters first, then the "&".
    return data.Replace("&auml;", "ä").Replace("&aring;", "å").Replace("&ouml;", "ö")
               .Replace("&amp;", Convert.ToString((char)38));
}
If I understand the structure of the code correctly, this would encode the characters as HTML entities before they are used in the search? That means that if an "ä" is entity-encoded in the actual page content you would get a match, but if it's not (as with the page headlines above) you wouldn't.
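One possible way around that mismatch, as a rough sketch and not something taken from the XSLT Search source: instead of escaping only the search term, decode HTML entities in both the page text and the search term before comparing them, so that escaped and unescaped spellings end up in the same form. The names SearchTextNormalizer, Normalize and ContainsTerm are made up for illustration, and System.Net.WebUtility.HtmlDecode assumes .NET 4 or later (on older frameworks System.Web.HttpUtility.HtmlDecode should play the same role).

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of XSLT Search.
public static class SearchTextNormalizer
{
    // Decode HTML entities (for example "&ouml;" or "&amp;") and lower-case
    // the result, so escaped and unescaped text share one comparable form.
    public static string Normalize(string text)
    {
        return WebUtility.HtmlDecode(text ?? string.Empty).ToLowerInvariant();
    }

    // True if the normalized page text contains the normalized search term.
    public static bool ContainsTerm(string pageText, string searchTerm)
    {
        return Normalize(pageText).Contains(Normalize(searchTerm));
    }
}
```

With this, a body stored as "konsulttj&auml;nster" and a headline stored as "konsulttjänster" should both be found by a search for "konsulttjänster", since the comparison no longer depends on how each field happened to be encoded.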
|
|
 Rank: Fanatic
Joined: 7/19/2006 Posts: 496 Location: Göteborg, Sweden
|
You're on the right track there... Further up in this thread Mr ImageGen'n'XSLT Search pointed me to this post about XSLT Search and Danish characters, it covers just what you've tried to accomplish here... // ; ) Kalle
|
|