Posts filed under 'web 2.0'
Wikia search launches
Wikia Search launched today. It has a nice, simple interface but, as the site itself says, “the results are pretty bad”. The concept is that trusted user feedback from a community of users acting together in an open, transparent, public way will improve results. This seems to rely on users providing feedback and creating mini articles that offer short definitions, disambiguations, photos and ‘see also’s.
Users can see the Nutch relevancy score by clicking on the numerical score beside each result.
7 January 2008
Google PSE or Google’s Semantic Web
A summary of an interesting Bear Stearns equity research paper (PDF) from May 2007.
Google is introducing a new layer to its search and indexing methodology. Google’s patent applications were published in February 2007 and call for a Programmable Search Engine (PSE).
PSE will augment Google’s current PageRank algorithm and change the way in which relevance ranking occurs for some types of web pages.
Under the PSE, web page data will be more structured and webmasters will be able to communicate two-way with Google’s PSE.
Web pages will be indexed more effectively and web site owners will have the ability to instruct Google about what it can and cannot do with the web page’s content. This would let Google:
- Provide more granular detail on search results (think car inventory on a lot, not just the local dealer’s phone number)
- Provide more personal results (Google could customize search results to the individual user, based on their preferences and past behavior)
- Reduce spoofed results by spammers and SEOs
- Index password-protected information (sometimes called “deep web” or “invisible web” material), with permission (think of information on a site to which people have subscriptions)
- Index dynamic sites (sites that change based on what the user asks for – think of flight information on sites like Expedia)
- Do a much better job indexing non-text based information (think video or audio based content)
- Cross-integrate information from different web pages to answer a question more completely
- Finally, Google would be able to leverage the newfound ability to provide more granular information to better target advertising, increasing advertisers’ ROI
PageRank no longer enough:
- Spammers/Black Hat ‘arms race’
- Can’t offer vertical search or deal with deep web/rich media
- Advertisers want more precision.
Key components of PSE:
- Programmable - via XML to guide indexing - essentially Sitemaps
- Partnerships - webmasters to become content partners, not anonymous sources of data. Onus will be on webmasters to conform to structured data format
- Aggregate multiple data sources - across the web - will alter the playing field for web search, relevance and current advantage of vertical search engines
- Targeted for users and advertisers - customised to context (incl device) of user
- PSE will learn and grow - as it accepts instructions from usage analysis, webmasters (Sitemaps etc), users and advertisers. As well as an ontology for all the data held in PSE, there will be a ‘database of databases’
- Opening up Rich Media Web - users can use XML to specify info, output and formats they want
- Barriers to competition - Competitors could emulate, but Google has scale in place
- Semantic Web - PSE takes an important step towards Google delivering Semantic Web functionality. Instead of a flat index and a keyword index, Google has a database of databases. With the original content and the site’s metadata, data can be accessed in a more manipulable form.
- PSE does not replace PageRank
- PSE will take instructions from XML files - so it’s a more open 2-way design, but Google ultimately defines formats
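The summary above describes the programmable part as XML that guides indexing, “essentially Sitemaps”. The PSE instruction format itself is not given here, so the following is only a rough sketch that uses the public Sitemaps protocol as a stand-in. The example URLs, the `read_sitemap` and `crawl_order` helpers, and the idea of ordering a crawl by `priority` are illustrative assumptions, not anything taken from the patents.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the public Sitemaps protocol (sitemaps.org).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap for an imaginary site; all URLs and values are made up.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/contact</loc>
    <changefreq>yearly</changefreq>
  </url>
  <url>
    <loc>http://www.example.com/inventory/</loc>
    <lastmod>2008-01-02</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
"""


def read_sitemap(xml_text):
    """Return one dict per <url> entry with the hints the webmaster supplied."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        entries.append({
            "loc": url.findtext("sm:loc", namespaces=SITEMAP_NS),
            "lastmod": url.findtext("sm:lastmod", namespaces=SITEMAP_NS),
            "changefreq": url.findtext("sm:changefreq", namespaces=SITEMAP_NS),
            # The Sitemaps protocol treats a missing priority as 0.5.
            "priority": float(
                url.findtext("sm:priority", default="0.5", namespaces=SITEMAP_NS)
            ),
        })
    return entries


def crawl_order(entries):
    """Illustrative use of the hints: visit higher-priority pages first."""
    ranked = sorted(entries, key=lambda e: e["priority"], reverse=True)
    return [e["loc"] for e in ranked]


if __name__ == "__main__":
    for loc in crawl_order(read_sitemap(EXAMPLE_SITEMAP)):
        print(loc)
```

The point of the sketch is the direction of communication: the site owner publishes structured hints, and the indexer decides what to do with them. That matches the “2-way design, but Google ultimately defines formats” description.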
The five patents
- PSE - a layer on top of PageRank. Importance of context: Metrics of the rules and instructions - PSE will learn which are the ‘popular’ rules; Metadata at any level; Usage tracking will help PSE focus on a contextual subset of results - possible privacy concerns; generates metadata on pre- and post-processing operations - so can learn what users do after seeing results and push related content/ads
- Aggregating context data.
- Sharing context data.
- Detecting spam
- Generating Ads
2 January 2008
But search will eventually change all websites
Matt Chapman, in Information World Review (1 June 2007), summarises John Battelle’s conversation at HP’s Print 2.0 conference in New York.
Google is now the default interface for the web but will not always be the dominant point of access: “The web has an interface and I would argue that it is Google right now.”
“Search is that interface but it is not always going to be, just like it was not always DOS and it is not always going to be Windows.”
Battelle compared Google’s sparse homepage to the DOS command line that was used to get information from a computer.
“Where are we now in search? Well, the command line, but with a huge difference: we are not talking in the computer’s language, we are talking in our language,” he said. So search is facilitating a conversation.
Battelle argued that the way search treats its users would eventually have wide-reaching effects for all websites.
“Think about what search does. You come to a place (Google, MSN, Yahoo, whatever), you say something and the whole place reorganises around what you just said,” he explained.
“And this is an interface that we are getting so used to that we are going to start getting mad at businesses that do not do that for us.”
See the videos at HP Corporate TV.
9 June 2007
Web 2.0 as ‘content customisation and personalisation’
Mark Iremonger, head of digital at Proximity London, has an interesting article in May’s Revolution magazine. In Digital direct: You have to make your content useful, entertaining, or both (registration required) he makes the point that:
“…brands need to create content that consumers value to earn the right to a relationship. …This means creating content that is either useful or entertaining. … People currently see web 2.0 as synonymous with ‘community creation’, but it is ‘content customisation and personalisation’ that is really at the heart of it. We will see a growing ‘unbundling’ of services that are traditionally anchored to web sites, setting them free to be available anywhere at anytime.”
5 May 2007