Wednesday, June 8, 2011

Debunking Myths about IBM's Watson!


There are many televised technological feats of the human race that have managed to leave a lasting impression on us. In my opinion, the Apollo 11 Moon landing on July 20, 1969 will probably always retain the number one spot in terms of impact at a given time. I don’t know what comes a close second, but IBM Watson’s televised Jeopardy! challenge, in which it bested Brad Rutter, the biggest all-time money winner on Jeopardy!, and Ken Jennings, the record holder for the longest championship streak, will always have a place in history. Organizing these kinds of challenges is not new to IBM - who can forget when Deep Blue, a chess-playing computer developed by IBM, beat world champion Garry Kasparov in a controversial match on May 11, 1997.




A lot of great things have been written about Watson - named after IBM's first president, Thomas J. Watson, though the name was also influenced by Sherlock Holmes' assistant, Dr. Watson - since the televised challenge. At the same time, you will also find many tweets ridiculing it, hopefully in good humour, saying that it needed lessons in geography after the (in)famous "Toronto" answer. But if you understand even a little bit of information retrieval, natural language processing, machine learning, knowledge representation, etc., then you will realize what an amazing accomplishment this is. Recently, I had the opportunity to meet Aditya Kalyanpur, one of the members of the DeepQA team that built Watson. Rome wasn't built in a day! Similarly, it took more than four years and roughly twenty-five brilliant technologists to build Watson. There are many little-known facts that we will come to learn in due course. For example, did you know that from September 2010 through December 2010 Watson played 55 games against Tournament of Champions Jeopardy! players - some of the best Jeopardy! players in the world - and won 71% of them?

Aditya managed to debunk a few myths about Watson and also highlighted the approach the team took to develop the software. I would like to share them with you:

Three prominent myths:

• Watson answers a question by converting it into a structured query and then querying a structured knowledge base. This is not true at all. That approach is taken by traditional QA systems and is very brittle, with poor domain coverage.

• Watson identifies the question type and generates candidates from a precompiled list of instances. This is false. Watson relies on a radically novel open-domain type "coercion" technique (a rough sketch of the idea follows this list).

• It uses either structured or unstructured data analytics. This is not true either. Watson integrates information from both unstructured and structured data analytics.
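
To give a feel for what open-domain type coercion means, and how structured and unstructured evidence get blended rather than used in isolation, here is a tiny Python sketch I put together. To be clear, this is my own illustration and not Watson's code: the toy taxonomy, the passage check, and every weight in it are made-up stand-ins.

```python
# Toy sketch of open-domain "type coercion": generate candidates first from search,
# then score how well each fits the question's lexical answer type (LAT), instead of
# filtering against a precompiled list of instances. The taxonomy, passage check,
# and weights below are hypothetical stand-ins, not Watson's actual components.

from dataclasses import dataclass

# Toy structured resource: candidate -> known types (think DBpedia/YAGO/WordNet).
TAXONOMY = {
    "Toronto": {"city", "place"},
    "Chicago": {"city", "U.S. city", "place"},
    "O'Hare":  {"airport", "place"},
}

@dataclass
class Candidate:
    text: str
    search_score: float   # relevance from unstructured passage search

def type_coercion_score(candidate: str, lat: str, passages: list[str]) -> float:
    """Blend structured and unstructured evidence that `candidate` is a `lat`."""
    structured = 1.0 if lat in TAXONOMY.get(candidate, set()) else 0.0
    # Unstructured check: does any retrieved passage assert "<candidate> is a <lat>"?
    pattern = f"{candidate.lower()} is a {lat.lower()}"
    unstructured = 1.0 if any(pattern in p.lower() for p in passages) else 0.0
    return 0.6 * structured + 0.4 * unstructured   # illustrative weights only

def rank(candidates: list[Candidate], lat: str, passages: list[str]) -> list[tuple[str, float]]:
    scored = [(c.text, 0.5 * c.search_score + 0.5 * type_coercion_score(c.text, lat, passages))
              for c in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)

if __name__ == "__main__":
    passages = ["Chicago is a U.S. city on Lake Michigan and home to O'Hare airport."]
    cands = [Candidate("Toronto", 0.7), Candidate("Chicago", 0.6), Candidate("O'Hare", 0.3)]
    # Chicago outranks Toronto despite a lower search score, because the type evidence is stronger.
    print(rank(cands, lat="U.S. city", passages=passages))
```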


Some of the notable things about the DeepQA project:

• It does a deep analysis of each question, breaking it down into relevant components such as the focus, key entities, and relationships; classifying it into broad classes; and flagging any need for special handling. In one such strategy, the system identifies independent facts within a given question, poses these as new questions to the underlying QA system, and generates answer candidates supported by the facts. Candidates that have support from multiple independent facts reinforce each other and boost the system's confidence in them (see the first sketch after this list).

• Hypotheses, which in the QA case are potential answers to the question, are generated by the Search component, which retrieves content relevant to a given question from the large volume of local knowledge resources Watson can access. These sources include encyclopedias, dictionaries, thesauri, newswire articles, and literary works. Watson also uses databases, taxonomies, and ontologies such as DBpedia, YAGO, and WordNet.

• The Candidate Generation component identifies potential answers to the question from the retrieved content. A variety of answer-scoring algorithms are then applied within DeepQA's pervasive probabilistic framework. Beyond linguistic processing, the scorers draw on evidence dimensions such as taxonomic, geospatial, temporal, popularity, and source-reliability evidence to support or rule out candidate answers (a toy version of this score combination also appears after the list).

• UIMA-AS was used for orchestrating the overall processing, and an in-memory implementation of Sesame was used for storing RDF data. Initially it took 1-2 hours to answer a question, but by using a massively parallel architecture and exploiting more than 2,800 POWER7 cores, the QA time was reduced to a few seconds.

• Machine learning and Monte Carlo methods are used by the strategy components to estimate and optimize the win probabilities for the various players in a particular game state (a stripped-down simulation is sketched below).
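
To make the decomposition strategy from the first bullet concrete, here is a toy sketch: independent facts in a clue are posed as separate sub-questions, and candidates supported by several facts get their confidence boosted. The answer_subquestion function is a hypothetical stand-in for the real QA pipeline, and the confidences are invented; the famous Final Jeopardy! clue is just used as a familiar example.

```python
# Toy sketch of clue decomposition: each independent fact becomes its own sub-question,
# and candidates supported by multiple facts reinforce each other. `answer_subquestion`
# stands in for the full QA pipeline; all confidences are purely illustrative.

from collections import defaultdict
from math import prod

def answer_subquestion(fact: str) -> dict[str, float]:
    """Hypothetical stand-in: return candidate answers with confidences for one fact."""
    canned = {
        "Its largest airport is named for a World War II hero":
            {"Chicago": 0.55, "Toronto": 0.30},
        "Its second largest airport is named for a World War II battle":
            {"Chicago": 0.60, "Toronto": 0.20},
    }
    return canned.get(fact, {})

def combine(facts: list[str]) -> list[tuple[str, float]]:
    """Merge per-fact confidences; candidates backed by several facts score higher."""
    support: dict[str, list[float]] = defaultdict(list)
    for fact in facts:
        for candidate, conf in answer_subquestion(fact).items():
            support[candidate].append(conf)
    # Noisy-OR style combination: independent pieces of evidence reinforce each other.
    merged = {c: 1.0 - prod(1.0 - x for x in confs) for c, confs in support.items()}
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    facts = [
        "Its largest airport is named for a World War II hero",
        "Its second largest airport is named for a World War II battle",
    ]
    print(combine(facts))   # Chicago comes out well ahead of Toronto
```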

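The evidence-combination step can likewise be pictured as a weighted blend of scores along the different evidence dimensions feeding a single confidence value. In DeepQA those weights are learned from prior Jeopardy! questions; the feature values and weights below are invented for illustration only.

```python
# Illustrative combination of evidence dimensions (taxonomic, geospatial, temporal,
# popularity, source reliability) into one confidence via a logistic model.
# The feature values and weights are made up; DeepQA learns its combination
# model from prior Jeopardy! questions.

import math

WEIGHTS = {
    "taxonomic": 2.0, "geospatial": 1.5, "temporal": 1.0,
    "popularity": 0.5, "source_reliability": 0.8,
}
BIAS = -2.5

def confidence(features: dict[str, float]) -> float:
    """Logistic combination of per-dimension scores, each in [0, 1]."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

if __name__ == "__main__":
    chicago = {"taxonomic": 0.9, "geospatial": 0.8, "temporal": 0.7,
               "popularity": 0.6, "source_reliability": 0.9}
    toronto = {"taxonomic": 0.2, "geospatial": 0.3, "temporal": 0.7,
               "popularity": 0.6, "source_reliability": 0.9}
    print(f"Chicago: {confidence(chicago):.2f}  Toronto: {confidence(toronto):.2f}")
```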

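Finally, for the strategy side, a stripped-down Monte Carlo estimate of win probability might look like the sketch below. The game model here (random clue values, a fixed chance of answering correctly, no Daily Doubles or wagering) is a deliberate oversimplification of what the real strategy module does.

```python
# Stripped-down Monte Carlo estimate of Watson's win probability from a game state.
# The per-clue outcomes are random and oversimplified; the real strategy module models
# clue values, buzzer contention, Daily Doubles, and wagering in far more detail.

import random

def simulate_game(scores: list[int], clues_left: int) -> int:
    """Play out the remaining clues with random outcomes; return the winner's index."""
    scores = scores[:]
    for _ in range(clues_left):
        value = random.choice([400, 800, 1200, 1600, 2000])
        player = random.randrange(len(scores))        # who wins the buzz
        correct = random.random() < 0.85              # assumed precision when buzzing
        scores[player] += value if correct else -value
    return max(range(len(scores)), key=lambda i: scores[i])

def win_probability(scores: list[int], clues_left: int, trials: int = 20_000) -> float:
    """Fraction of simulated playouts that player 0 (Watson) wins."""
    wins = sum(simulate_game(scores, clues_left) == 0 for _ in range(trials))
    return wins / trials

if __name__ == "__main__":
    # Hypothetical state: Watson leads its two opponents with 12 clues to go.
    print(f"P(win) ~ {win_probability([18_000, 9_000, 7_000], clues_left=12):.2f}")
```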
