Act on CO2 campaign
Robin Goad, Research Director at Hitwise UK, blogged about the surge in searches for ‘act on CO2’ at the turn of the year. As he says, it’s a timely push by the Department for Transport’s CO2 and transport campaign.
He notes that there were many more searches for ‘Act on CO2’ than for ‘CO2’ in the last four weeks, and that the “top three sites receiving traffic from ‘act on co2’ were all UK government sites: the Department of Transport (66.5% of searches ended up there), Directgov (27.7%) and www.actonco2.direct.gov.uk (4.6%).” This reflects the fact that there are two UK government ‘Act on CO2’ campaigns: the DfT’s car/driving one and Defra’s CO2 Carbon Calculator.
16 January 2008
Strategies for improving enterprise search
John Ferrara argues for the importance of going beyond the out-of-the-box experience in a September 2007 article at Boxes and Arrows.
I’d agree with his argument that quality results only come about through applied effort.
The conceptual task
- Search engine. The algorithmic gears that parse the query and assign pages relevance.
- Content. The documents searched.
- Index. A catalogue of the locations of every word in every document.
- User input. The keywords and other parameters the user submits.
- Results display. The way the data returned by the search engine is presented.
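Of these components, the index is the easiest to make concrete. A minimal Python sketch of a positional inverted index follows, assuming a naive lowercase tokenizer; the function names and sample documents are hypothetical, and real engines add compression, stop-word handling and much more.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to {doc_id: [word positions]} across all documents."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    # Convert to plain dicts so looking up an unknown word adds no entries.
    return {word: dict(postings) for word, postings in index.items()}

# Hypothetical sample documents.
docs = {
    "d1": "Act on CO2 with the carbon calculator",
    "d2": "Cut CO2 when driving: act on CO2",
}
index = build_index(docs)
print(index["co2"])             # {'d1': [2], 'd2': [1, 6]}
print(index.get("petrol", {}))  # {}
```

Storing positions rather than just document IDs is what makes phrase queries and proximity ranking possible later.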
Strategies
- Make content machine readable.
- Structural markup - using semantic elements and class attributes.
- Use the keywords and description meta tags, and build a controlled vocabulary to populate them. Well, I think there’s room for both a controlled vocabulary and the tactical application of wild keywords.
- More metadata - audience, sector etc.
- Ontology - get the search engine to exploit the relationships between concepts.
- Index All of the Right Data
- Ignore unnecessary content - navigation, footers, adverts etc.
- Get all resources - what about content in PDFs, Word documents etc.? I’d agree that search needs to be aware of them, but disagree about indexing them all. I often find it gives a more usable experience to index a web page that links to and summarises such documents. The ranking of digital assets can be unpredictable, and getting it right is, as Ferrara says, a big job.
- Make the most of user input - it is important to make the most of the user’s intent on their first search attempt. After all, the user probably has a complex intent behind those few keywords, and users are much less likely to make a second attempt (People Search Once, Maybe Twice, Jared Spool, 2001).
- Query expansion - stemming, thesaurus.
- Syntax conventions - get the parser to understand common human syntax, and use AND as the default operator. I’m not sure about that; it can be too restrictive.
- Assisting query formulation - ‘Did you mean?’, ‘others searched for…’
- Build results around the user’s needs
- Demonstrate relevance
- Generate a snippet which contains the user’s terms. I disagree: in an enterprise solution these snippets are often not very good, and you have the opportunity to teach editors to write good, relevant descriptions which include the most important keywords.
- Bold the terms that match the user’s query terms.
- Best bets/recommended links. I’m a huge fan of these as they can really help orientate users and push important content for the business.
- Conditional content. Take best bets further and display contextually appropriate content when a query indicates a user has a particular interest.
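Several of the user-input and results strategies above can be sketched together. This is a minimal Python illustration, not Ferrara’s method: it assumes a crude suffix-stripping stemmer (standing in for a real one such as Porter’s), plus a hypothetical hand-built thesaurus and best-bets table. The `default_and` flag lets you compare AND and OR as the default operator, and matching words are bolded with HTML `<b>` tags.

```python
import re

# Hypothetical controlled-vocabulary tables; a real site would maintain these.
THESAURUS = {"car": {"vehicle", "automobile"}, "co2": {"carbon"}}
BEST_BETS = {"co2": "Recommended: Act on CO2 campaign page"}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(word):
    # Crude suffix stripping, standing in for a real stemmer.
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def expand_query(query):
    """One set of acceptable stems per query term: the term plus its synonyms."""
    groups = []
    for term in tokenize(query):
        variants = {term} | THESAURUS.get(term, set()) | THESAURUS.get(stem(term), set())
        groups.append({stem(v) for v in variants})
    return groups

def matches(text, groups, default_and):
    if not groups:
        return False
    stems = {stem(w) for w in tokenize(text)}
    hits = [bool(group & stems) for group in groups]
    # AND: every query term (or a synonym) must appear; OR: any one is enough.
    return all(hits) if default_and else any(hits)

def bold(text, groups):
    """Wrap words whose stem matches any expanded query term in <b> tags."""
    wanted = set().union(*groups)
    def mark(m):
        word = m.group(0)
        return f"<b>{word}</b>" if stem(word.lower()) in wanted else word
    return re.sub(r"[A-Za-z0-9]+", mark, text)

def search(query, docs, default_and=False):
    """Return (best bets, bolded matching documents)."""
    groups = expand_query(query)
    bets = [BEST_BETS[t] for t in tokenize(query) if t in BEST_BETS]
    hits = [bold(d, groups) for d in docs if matches(d, groups, default_and)]
    return bets, hits

docs = [
    "Cut CO2 emissions when driving cars",
    "Carbon calculator for your home",
    "Train timetables",
]
print(search("car co2", docs))                    # OR: both CO2-related pages
print(search("car co2", docs, default_and=True))  # AND: only the first page
```

With OR as the default, the query ‘car co2’ also finds the carbon calculator page through the thesaurus; with AND, only the page mentioning both a car term and a CO2 term survives, which is one way AND can be too restrictive.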
13 January 2008
Information behaviours
“People don’t understand their own information behaviors, and they don’t really understand much about search or the web, so they will have to learn. It could take generations.”
Christina Wodtke of Boxes and Arrows interviewed Amanda Spink in October 2006.
Most web queries are short:
2-3 terms
Sessions:
2-3 queries in length with little query modification
Long tail:
The long tail makes ‘relevant’ retrieval very hard, especially when users type only 2 or 3 words and search is not very interactive. Personalisation is an attempt to get more information from the user to help determine relevance.
Complex search behaviours:
Often long and complex, going beyond the one-topic/one-search paradigm that most search engines assume:
- Successive searching (e.g. looking for information on cars)
- Multitasking search (batch searches - time constraint or new topics emerge)
Search is challenging and interactive:
- no silver bullet or ‘single’ feature that’s effective
- clustering + relevance feedback +??
- lack of people trained in info and web retrieval, web design and usability
- how to elicit real intent from a small number of keywords?
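Relevance feedback is one of the ingredients listed above. One classic form is the Rocchio algorithm, swapped in here purely as an illustration (Spink does not specify a method): the query vector is moved towards the centroid of results the user judged relevant and away from those judged non-relevant. This is a minimal Python sketch over sparse term-weight dicts; the `alpha`, `beta` and `gamma` values are illustrative assumptions, not tuned settings.

```python
from collections import Counter

def centroid(vectors):
    """Average a list of sparse term-weight vectors (dicts)."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds weights term by term
    n = len(vectors) or 1
    return {term: w / n for term, w in total.items()}

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query towards judged-relevant results and away from non-relevant ones."""
    rel, non = centroid(relevant), centroid(nonrelevant)
    new_query = {}
    for term in set(query) | set(rel) | set(non):
        w = alpha * query.get(term, 0.0) + beta * rel.get(term, 0.0) - gamma * non.get(term, 0.0)
        if w > 0:  # negative weights are conventionally dropped
            new_query[term] = w
    return new_query

# Hypothetical judgements on an ambiguous query.
query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "cat": 1.0}, {"jaguar": 1.0, "habitat": 1.0}]
nonrelevant = [{"jaguar": 1.0, "car": 1.0}]
print(sorted(rocchio(query, relevant, nonrelevant)))
```

In the example, ‘car’ drops out (its weight would be negative) while ‘cat’ and ‘habitat’ join the query, so a few clicks of feedback add the context that two keywords could not.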
13 January 2008
GoPubMed
An interesting use of facets based on an ontology. Sorting abstracts according to the concept hierarchies of GO (Gene Ontology) and MeSH enables a combined search across molecular biology and medicine. The “What” categories help users explore the results systematically. The tree displayed to the left of the results serves as a table of contents to structure the millions of articles in the PubMed/MEDLINE database. By navigating the tree, users can narrow thousands of search results down to a few in seconds.
GoPubMed retrieves PubMed abstracts for a search query and sorts relevant information into four top-level categories:
- What
- Who
- Where
- When
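The tree-based narrowing can be sketched in a few lines, assuming each abstract has already been annotated with concepts and the hierarchy is a simple child-to-parent map. The concept fragment and abstract IDs below are hypothetical; this is not GoPubMed’s implementation, nor the real structure of GO or MeSH.

```python
from collections import Counter

# Hypothetical fragment of a concept hierarchy (child -> parent).
PARENT = {
    "apoptosis": "cell death",
    "necrosis": "cell death",
    "cell death": "What",
    "Germany": "Where",
}

def ancestors(concept):
    """Yield the concept itself, then each ancestor up to a top-level category."""
    while concept is not None:
        yield concept
        concept = PARENT.get(concept)

def facet_counts(results):
    """Count how many results fall under each node of the tree."""
    counts = Counter()
    for concepts in results.values():
        # A set, so each result counts at most once per node.
        counts.update({node for c in concepts for node in ancestors(c)})
    return counts

def narrow(results, node):
    """Keep results annotated with `node` or any concept beneath it."""
    return {rid: cs for rid, cs in results.items()
            if any(node in ancestors(c) for c in cs)}

# Hypothetical annotated abstracts.
results = {"a1": ["apoptosis", "Germany"], "a2": ["necrosis"], "a3": ["Germany"]}
print(facet_counts(results)["cell death"])    # 2
print(sorted(narrow(results, "cell death")))  # ['a1', 'a2']
```

The counts are what a facet tree would display next to each node, and each click on a node is a call to `narrow`.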
9 January 2008
Wikia Search launches
Wikia Search launched today. It has a nice, simple interface but, as the site says, “the results are pretty bad”. The concept, though, is that trusted feedback from a community of users acting together in an open, transparent, public way will improve the results. This seems to be based on users giving feedback and creating mini articles that provide short definitions, disambiguations, photos and ‘see also’s.
Users get to see the Nutch relevancy score by clicking on the numerical score by each result.
7 January 2008
The seven deadly sins of site search according to Vivisimo
A Vivisimo white paper, available at searchdoneright.com, lists the seven deadly sins of site search. It’s a basic but useful introduction:
- Omission - Make sure you offer search - some users prefer it and it’s an escape route when navigation fails.
- Apathy - Make sure you check how search is functioning - relevancy, scent of information from titles and descriptions.
- Complexity - Keep the search interface simple - don’t offer more than one search box on a page.
- Omnipotence - Beware the conceit that your engine will understand the user’s intent and respond with an ‘answer’ to a ‘question’.
- Egoism - Don’t ask too much of a visitor - make it user-friendly.
- Brand confusion - Ensure the search results page carries your brand and your URL.
- Multiple personality - Ensure search pages have branding and a look and feel consistent with the rest of the site.
6 January 2008
Eye tracking in MSN Search: Investigating snippet length, target position and task types
A Microsoft Research report using eye tracking techniques to investigate user strategies for web search.
Trust
The researchers examined how people respond to search results when the target is systematically moved to different positions, on two kinds of search task, and found that users seem to exhibit an implicit trust in the rank generated by the search engine, particularly for informational tasks.
They also examined how varying the amount of information in Web search results affected user performance on the same tasks.
As the length of the query-dependent contextual snippet in search results was increased, performance improved for informational queries, while it degraded for navigational queries.
Eye tracking results suggest this difference in performance was due to the fact that as the snippet length increased, users paid more attention to the snippet and less attention to the URL located at the bottom of the search result.
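A query-dependent snippet with an adjustable length is easy to sketch. This minimal Python version assumes a simple strategy, picking the fixed-length window of words that contains the most query terms; the report does not say how MSN Search built its snippets, and the function name is hypothetical.

```python
import re

def snippet(text, query, length=20):
    """Return the `length`-word window of `text` containing the most query terms."""
    words = text.split()
    if len(words) <= length:
        return " ".join(words)
    terms = {t.lower() for t in query.split()}
    # Strip punctuation before comparing, so "snippets." matches "snippets".
    hits = [re.sub(r"\W", "", w.lower()) in terms for w in words]
    # max() returns the earliest window on ties.
    best = max(range(len(words) - length + 1), key=lambda i: sum(hits[i:i + length]))
    prefix = "... " if best > 0 else ""
    suffix = " ..." if best + length < len(words) else ""
    return prefix + " ".join(words[best:best + length]) + suffix

text = ("Search engines rank pages. Eye tracking shows users read snippets. "
        "Longer snippets help informational queries but hurt navigational ones.")
print(snippet(text, "snippets navigational", length=5))
# ... users read snippets. Longer snippets ...
```

Raising `length` gives the longer snippets that helped informational queries in the study, at the cost of drawing attention away from the URL that navigational searchers rely on.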
Web search is a very attractive domain for the use of eye tracking techniques.
There are many kinds of metadata that are potentially useful for Web search. How can this information be presented to users in such a way that is complementary to existing information in search results?
3 January 2008
Google PSE or Google’s Semantic Web
A summary of an interesting Bear Stearns equity research paper (PDF) from May 2007.
Google is introducing a new layer to its search and indexing methodology. Google’s patent applications were published in February 2007 and call for a Programmable Search Engine (PSE).
PSE will augment its current PageRank algorithm and change the way in which relevance ranking occurs for some types of web pages.
Under the PSE, web page data will be more structured and webmasters will be able to communicate two-way with Google’s PSE.
Web pages will be indexed more effectively and website owners will be able to instruct Google about what it can and cannot do with the web page’s content:
- Provide more granular detail on search results (think car inventory on a lot, not just the local dealer’s phone number)
- Provide more personal results (Google could customize search results to the individual user, based on their preferences and past behavior)
- Reduce spoofed results by spammers and SEOs
- Index password-protected information (sometimes called “deep web” or “invisible web” material), with permission (think of information on sites that people subscribe to)
- Index dynamic sites (sites that change based on what the user asks for – think of flight information on sites like Expedia)
- Do a much better job indexing non-text based information (think video or audio based content)
- Cross-integrate information from different web pages to answer a question more completely
- Finally, Google would be able to leverage its newfound ability to provide more granular information to target advertising better, increasing advertisers’ ROI
PageRank no longer enough:
- Spammers/Black Hat ‘arms race’
- Can’t offer vertical search or deal with deep web/rich media
- Advertisers want more precision.
Key components of PSE:
- Programmable - via XML to guide indexing - essentially Sitemaps
- Partnerships - webmasters to become content partners, not anonymous sources of data. Onus will be on webmasters to conform to structured data format
- Aggregate multiple data sources - across the web - will alter the playing field for web search, relevance and current advantage of vertical search engines
- Targeted for users and advertisers - customised to the context (incl. device) of the user
- PSE will learn and grow - as it accepts instructions from usage analysis, webmasters (Sitemaps etc), users and advertisers. As well as an ontology for all the data held in PSE, there will be a ‘database of databases’
- Opening up Rich Media Web - users can use XML to specify info, output and formats they want
- Barriers to competition - Competitors could emulate, but Google has scale in place
- Semantic Web - PSE takes an important step towards Google delivering Semantic Web functionality. Instead of a flat index and a keyword index, Google has a database of databases. With both the original content and the site’s metadata, data can be accessed in a more manipulated form.
- PSE does not replace PageRank
- PSE will take instructions from XML files - so it’s a more open, two-way design, but Google ultimately defines the formats
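Since the report equates the programmable XML layer with Sitemaps, here is a minimal Python sketch that generates a sitemap using the public Sitemaps protocol namespace (sitemaps.org, schema 0.9) and the standard library. The URLs are hypothetical, and this shows only Sitemaps, not any PSE-specific format, which the report does not spell out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build a Sitemaps XML document from (loc, lastmod, changefreq) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes &, < and >
        ET.SubElement(url, "lastmod").text = lastmod  # W3C date, e.g. 2008-01-02
        ET.SubElement(url, "changefreq").text = changefreq
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical pages.
pages = [
    ("https://www.example.com/", "2008-01-02", "weekly"),
    ("https://www.example.com/search?q=co2&page=2", "2008-01-02", "daily"),
]
print(build_sitemap(pages))
```

The automatic escaping matters: the protocol requires entity-escaped URLs, so the `&` in the second URL is written as `&amp;`.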
The five patents
- PSE - a layer on top of PageRank. Importance of context: Metrics of the rules and instructions - PSE will learn which are the ‘popular’ rules; Metadata at any level; Usage tracking will help PSE focus on a contextual subset of results - possible privacy concerns; generates metadata on pre- and post-processing operations - so can learn what users do after seeing results and push related content/ads
- Aggregating context data.
- Sharing context data.
- Detecting spam
- Generating Ads
2 January 2008
But search will eventually change all websites
Writing in Information World Review on 1 June 2007, Matt Chapman summarises John Battelle’s conversation at HP’s Print 2.0 conference in New York.
Google is now the default interface to the web, but it will not always be the dominant point of access: “The web has an interface and I would argue that it is Google right now.”
“Search is that interface but it is not always going to be, just like it was not always DOS and it is not always going to be Windows.”
Battelle compared Google’s sparse homepage to the DOS command line once used to get information from a computer.
“Where are we now in search? Well, the command line, but with a huge difference: we are not talking in the computer’s language, we are talking in our language,” he said. So search is facilitating a conversation.
Battelle argued that the way search treats its users will eventually have wide-reaching effects on all websites.
“Think about what search does. You come to a place (Google, MSN, Yahoo, whatever), you say something and the whole place reorganises around what you just said,” he explained.
“And this is an interface that we are getting so used to that we are going to start getting mad at businesses that do not do that for us.”
See the videos at HP Corporate TV.
9 June 2007
Web 2.0 as ‘content customisation and personalisation’
Mark Iremonger, head of digital at Proximity London, has an interesting article in May’s Revolution magazine. In Digital direct: You have to make your content useful, entertaining, or both (registration required), he makes the point that:
“…brands need to create content that consumers value to earn the right to a relationship. …This means creating content that is either useful or entertaining. … People currently see web 2.0 as synonymous with ‘community creation’, but it is ‘content customisation and personalisation’ that is really at the heart of it. We will see a growing ‘unbundling’ of services that are traditionally anchored to web sites, setting them free to be available anywhere at anytime.”
5 May 2007