
Google PSE or Google’s Semantic Web

A summary of an interesting Bear Stearns equity research paper (PDF) from May 2007.

Google is introducing a new layer to its search and indexing methodology. Google's patent applications, published in February 2007, describe a Programmable Search Engine (PSE).

PSE will augment Google's current PageRank algorithm and change the way relevance ranking works for some types of web pages.

Under PSE, web page data will be more structured, and webmasters will be able to communicate two-way with Google's PSE.

Web pages will be indexed more effectively, and website owners will be able to instruct Google about what it can and cannot do with their pages' content. This would allow Google to:

  1. Provide more granular detail on search results (think car inventory on a lot, not just the local dealer’s phone number)
  2. Provide more personal results (Google could customize search results to the individual user, based on their preferences and past behavior)
  3. Reduce results spoofed by spammers and SEOs
  4. Index password-protected information (sometimes called "deep web" or "invisible web" material), with permission (think of information behind the login of a subscription site)
  5. Index dynamic sites (sites that change based on what the user asks for – think of flight information on sites like Expedia)
  6. Do a much better job of indexing non-text information (think video or audio content)
  7. Cross-integrate information from different web pages to answer a question more completely
  8. Finally, leverage this newfound ability to provide more granular information to target advertising better, increasing advertisers' ROI

Why PageRank is no longer enough:

  1. An 'arms race' with spammers and black-hat SEOs
  2. It can't offer vertical search or deal with the deep web and rich media
  3. Advertisers want more precision

Key components of PSE:

  • Programmable – via XML to guide indexing – essentially Sitemaps
  • Partnerships – webmasters to become content partners, not anonymous sources of data. Onus will be on webmasters to conform to structured data format
  • Aggregates multiple data sources across the web – this will alter the playing field for web search, relevance and the current advantage of vertical search engines
  • Targeted at users and advertisers – customised to the user's context (including device)
  • PSE will learn and grow – as it accepts instructions from usage analysis, webmasters (Sitemaps etc), users and advertisers. As well as an ontology for all the data held in PSE, there will be a ‘database of databases’
  • Opening up Rich Media Web – users can use XML to specify info, output and formats they want
  • Barriers to competition – competitors could emulate PSE, but Google already has the scale in place
  • Semantic Web – PSE is an important step towards Google delivering Semantic Web functionality. Instead of a flat index and a keyword index, Google has a database of databases. With both the original content and the site's metadata, data can be accessed in a more processed, manipulable form.
  • PSE does not replace PageRank
  • PSE will take instructions from XML files – so it's a more open two-way design, but Google ultimately defines the formats
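The "essentially Sitemaps" comparison points to the Sitemaps XML protocol (sitemaps.org, schema 0.9), the XML format webmasters already used to send indexing hints to Google. The paper does not specify the PSE format itself, so the sketch below only generates a plain sitemap to illustrate the kind of structured, machine-readable instructions involved. The `build_sitemap` helper, the dealer URLs and the dates are hypothetical, and no PSE-specific extensions are assumed.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (sitemaps.org, version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap document from (loc, lastmod, changefreq) tuples.

    Hypothetical helper. loc is required; lastmod (W3C date such as
    "2007-05-01") and changefreq (e.g. "daily") are optional and are
    skipped when None.
    """
    # Declare the default namespace as a plain attribute on the root element.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes &, <, >
        if lastmod is not None:
            ET.SubElement(url, "lastmod").text = lastmod
        if changefreq is not None:
            ET.SubElement(url, "changefreq").text = changefreq
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages for a car dealer whose inventory changes daily.
    print(build_sitemap([
        ("https://dealer.example.com/", "2007-05-01", "weekly"),
        ("https://dealer.example.com/inventory?make=ford", "2007-05-01", "daily"),
    ]))
```

The returned string can be saved as `sitemap.xml` on the site. A `changefreq` of "daily" on the inventory page is the sort of hint that would help with the car-lot example in the list of capabilities above the paper's key components.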

The five patents

  1. PSE – a layer on top of PageRank. Context is central: PSE keeps metrics on its rules and instructions and learns which rules are 'popular'; metadata can exist at any level; usage tracking will help PSE focus on a contextual subset of results (raising possible privacy concerns); and PSE generates metadata on pre- and post-processing operations, so it can learn what users do after seeing results and push related content and ads
  2. Aggregating context data
  3. Sharing context data
  4. Detecting spam
  5. Generating Ads

2 January 2008 at 23:22

