Google has been publishing a series of posts about how its search engine works, such as this one from Amit Singhal, one of the key scientists in its search quality group. (I met Mr. Singhal when I wrote this article about Google’s search algorithms.)
This series reminds me how much confusion there is in the discussion about the future of search and how so many companies are going to become Google-killers by building what they call “semantic search.”
Semantic search is based on the idea that if you can build technology that better understands what a user is really looking for and what a Web page is really about, you can find the page most users want faster. Semantic technology has great promise to improve search, and it offers to provide gainful employment to scores of linguists who have been toiling unappreciated in the world’s universities.
The confusion comes when entrepreneurs, venture capitalists and journalists assume that the existing search engines haven’t thought about whether the meaning of words will be helpful. Yes, Google’s first innovation was what it calls PageRank, a way of finding high-quality Web pages by counting how many other pages link to them. And many search engines look for other clues to what a page is about, such as what words are used in its title and headlines.
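To make the link-counting idea concrete, here is a minimal Python sketch of the simplified description above: score each page by how many other pages link to it. This is not Google's actual PageRank, which also weighs how important the linking pages are. The `link_graph` data and the `count_inbound_links` function are made up for illustration.

```python
from collections import Counter


def count_inbound_links(link_graph):
    """Count, for each page, how many distinct other pages link to it.

    link_graph maps a page to the list of pages it links to.
    (Hypothetical data structure, chosen for this illustration only.)
    """
    inbound = Counter()
    for source, targets in link_graph.items():
        # Use a set so repeated links from one page count once,
        # and skip links a page makes to itself.
        for target in set(targets):
            if target != source:
                inbound[target] += 1
    return inbound


# A made-up four-page Web.
link_graph = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "c.example"],
}

counts = count_inbound_links(link_graph)
# Pages sorted from most-linked to least-linked.
ranking = counts.most_common()
```

In this toy Web, `c.example` comes out on top because three different pages link to it, while `d.example` has no inbound links at all.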
But if you read the most recent post by Mr. Singhal, you will note that the concept of PageRank is not mentioned. He doesn’t use the phrase “semantic search,” either, but that is much of what he writes about. He talks about the nuanced way that Google interprets synonyms. The abbreviation “ab” could mean “air base” or “Alberta, Canada,” depending on the context.
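As a rough illustration of context-dependent interpretation, the Python sketch below picks an expansion for “ab” based on the other words in the query. Mr. Singhal does not describe how Google actually does this; the cue-word table and the `expand` function are assumptions invented for this example.

```python
# Hypothetical cue words suggesting each meaning of an abbreviation.
# A real system would presumably learn such associations from data.
EXPANSIONS = {
    "ab": {
        "air base": {"ramstein", "military", "usaf", "squadron"},
        "alberta": {"calgary", "edmonton", "canada", "province"},
    }
}


def expand(term, query_words):
    """Return the expansion whose cue words best match the query, or None."""
    candidates = EXPANSIONS.get(term.lower())
    if not candidates:
        return None
    # The context is every query word except the abbreviation itself.
    context = {w.lower() for w in query_words if w.lower() != term.lower()}
    best, best_overlap = None, 0
    for expansion, cues in candidates.items():
        overlap = len(context & cues)
        if overlap > best_overlap:
            best, best_overlap = expansion, overlap
    # With no supporting context, decline to guess.
    return best


air = expand("ab", "ramstein ab".split())
province = expand("ab", "calgary ab hotels".split())
unknown = expand("ab", ["ab"])
```

The point of the sketch is only that the same two letters resolve differently depending on their neighbors, and that with no context the safest answer is no expansion at all.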
And Google’s search engine can find meaning even when a word is not used. Mr. Singhal gives the example that a search for “galleria sprovieri londra” typed into Google’s Italian site will return the page for the Sprovieri Gallery in London. This is a double trick: The search engine translates “londra” into “London,” but more importantly, the gallery’s home page doesn’t have any address on it at all, and Google, from other information, associates a location with the site anyway. (Mr. Singhal doesn’t say what information Google uses.)
I bring this up because of a rather frustrating conversation I had with Riza C. Berkan, the chief executive of Hakia, which is based in New York. He kept insisting that Hakia had a rare secret.
“We kept expecting search engines like Google and Yahoo to start building semantic systems, but it never happened,” he told me.
Hakia’s solution, he said, is built around an understanding of the concepts represented on a given Web page and the synonyms a user may use for those concepts. And he kept asserting that Google largely looked at PageRank and not meaning. Mr. Berkan insisted his search engine had no need to look at links or any other signal besides its understanding of semantic concepts. (Hakia is now focused on a few specialized areas, such as medical and financial information. I don’t find its search engine especially useful so far.)
One reason that Google has been successful is that it knows it is a search engine, not a PageRank engine or, for that matter, a synonym engine. And its systems pick which combination of approaches is best for each query.
Google certainly doesn’t have a monopoly on good ideas, and no doubt someone could develop a Google-killer. But I’ll bet that any real success will come from someone who is also devoted to solving users’ problems, not showing off any one innovation from the labs.