Archive for January 13th, 2008
Strategies for improving enterprise search
John Ferrara argues for the importance of going beyond the out-of-the-box experience in Boxes and Arrows (Sep 07).
I’d agree with his argument that quality results only come about through applied effort.
The conceptual task
- Search engine. The algorithmic gears that parse the query and assign pages relevance.
- Content. The documents searched.
- Index. A catalogue of the locations of every word in every document.
- User input. The keywords and other parameters the user submits.
- Results display. The way the data returned by the search engine is presented.
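The index described above, a catalogue of where every word occurs in every document, can be sketched as a positional inverted index. This is only a minimal illustration, not how any particular enterprise search product stores its index; the document IDs, sample text, and the function names `tokenize`, `build_index` and `lookup` are all made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            index[word][doc_id].append(position)
    return index


def lookup(index, word):
    """Return the postings for a word without creating new index entries."""
    postings = index.get(word.lower(), {})
    return {doc_id: list(positions) for doc_id, positions in postings.items()}


# Hypothetical content, purely for illustration.
docs = {
    "intranet/home": "Search the intranet",
    "intranet/help": "Help with search and search tips",
}
index = build_index(docs)
print(lookup(index, "search"))
```

Here "search" is found at position 0 of the first document and at positions 2 and 4 of the second, which is the raw material the search engine uses when assigning relevance.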
Strategies
- Make content machine readable.
- Structural markup - using semantic elements and class attributes.
- Use the keywords and description tags and build a controlled vocabulary to populate them. Well, I think there’s room for both a controlled vocabulary and the tactical application of wild keywords.
- More metadata - audience, sector etc.
- Ontology - Get the search engine to exploit the relationships between concepts.
- Index all of the right data
- Ignore unnecessary content - navigation, footers, adverts etc.
- Get all resources - what about content in PDFs, Word documents etc.? I’d agree that search needs to be aware of them, but disagree about indexing them all. I often find a more usable experience is to index a web page that provides links to, and a summary of, such documents. The ranking of digital assets can be unpredictable and getting it right is, as Ferrara says, a big job.
- Make the most of user input - it is important to make the best of the user’s intent on their first search attempt. After all, the user probably has a complex intent behind those few keywords, and users are much less likely to make a second attempt. (People Search Once, Maybe Twice - Jared Spool 2001).
- Query expansion - stemming, thesaurus.
- Syntax conventions - get the parser to understand common human syntax and use AND as the default operator. I’m not sure about the AND default; it can be too restrictive.
- Assisting query formulation - Did you mean?, others searched for…
- Build results around the user’s needs
- Demonstrate relevance
- Generate a snippet which contains the user’s terms. I disagree - in an enterprise solution, generated snippets are often not very good, and you have the opportunity to teach editors to write good, relevant descriptions which include the most important keywords.
- Bold the terms that match the user’s query terms.
- Best bets/ recommended links. I’m a huge fan of these as they can really help orientate users and push important content for the business.
- Conditional content. Take best bets further and display contextually appropriate content when a query indicates a user has a particular interest.
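Several of the strategies above (thesaurus-based query expansion, the choice of default operator, best bets, and bolding matched terms) can be illustrated with a toy sketch. Everything in it is an assumption made for the example: the thesaurus, the best-bet table, the URLs and the function names are hypothetical, stemming is left out, and a real search engine would do all of this against its index rather than by scanning documents.

```python
import re

# Hypothetical, editorially maintained tables.
THESAURUS = {"holiday": {"leave", "vacation"}}
BEST_BETS = {"holiday": "/hr/annual-leave-policy"}

# Hypothetical content.
DOCS = {
    "/hr/annual-leave-policy": "How to book annual leave and carry over unused days.",
    "/news/office-closure": "The office is closed over the holiday period.",
    "/hr/booking-form": "Form to book vacation days with your manager.",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand(term):
    """Query expansion: the term itself plus any thesaurus synonyms."""
    return {term} | THESAURUS.get(term, set())


def search(query, operator="AND"):
    """Return URLs matching every term group (AND) or any term group (OR)."""
    groups = [expand(term) for term in tokenize(query)]
    if not groups:
        return []
    combine = all if operator == "AND" else any
    hits = []
    for url, text in DOCS.items():
        words = set(tokenize(text))
        if combine(words & group for group in groups):
            hits.append(url)
    return hits


def best_bet(query):
    """Return the recommended link for the first query term that has one."""
    for term in tokenize(query):
        if term in BEST_BETS:
            return BEST_BETS[term]
    return None


def highlight(text, query):
    """Wrap query terms and their synonyms in <b> tags."""
    terms = set()
    for term in tokenize(query):
        terms |= expand(term)
    if not terms:
        return text
    pattern = r"\b(" + "|".join(sorted(re.escape(t) for t in terms)) + r")\b"
    return re.sub(pattern, r"<b>\1</b>", text, flags=re.IGNORECASE)


print(search("book holiday"))        # AND as the default operator
print(search("book holiday", "OR"))  # the less restrictive alternative
print(best_bet("book holiday"))
print(highlight(DOCS["/hr/booking-form"], "book holiday"))
```

With these sample documents, the AND default drops the office-closure page because it never mentions booking, while OR keeps it. That is the trade-off noted above: AND is more precise but can be too restrictive for short queries.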
13 January 2008
Information behaviours
“People don’t understand their own information behaviors, and they don’t really understand much about search or the web, so they will have to learn. It could take generations.”
Christina Wodtke from Boxes and Arrows interviews Amanda Spink in Oct 2006.
Most web queries are short:
2-3 terms
Sessions:
2-3 queries in length with little query modification
Long tail:
The long tail makes ‘relevant’ retrieval very hard, especially if users use only 2 or 3 words and search is not very interactive. Personalisation is an attempt to get more information from the user to help determine relevance.
Complex search behaviours:
Often long and complex - beyond the one topic/one search paradigm most search engines assume
- Successive searching (e.g. looking for information on cars)
- Multitasking search (batch searches - time constraint or new topics emerge)
Search is challenging and interactive:
- no silver bullet or ‘single’ feature that’s effective
- clustering + relevance feedback +??
- lack of people trained in info and web retrieval, web design and usability
- how to elucidate real intent from a small number of keywords?
13 January 2008