Wednesday, April 29, 2015

Watson's Mistakes

James Barrat has identified IBM's Watson as "the first machine to make us wonder if it understands" (224).  But many critics, including many computer scientists, would say that this is an illusion.  There is no reason to wonder whether Watson understands what it is doing when it plays the Jeopardy game, because Watson is little more than a fancy search engine.

Watson has downloaded millions of documents, including encyclopedias, books, newspapers, and the whole of Wikipedia.  When it is presented with a question or a Jeopardy clue, it analyzes the language and does a statistical word-search across its documents for matching language, comparable to the way the Google search engine scans the Internet for us when we type in some words.  Watson must then find a match between the significant phrases in the original clue and phrases from its search.  From this, Watson can produce an answer that looks like the product of intelligent thinking, comparable to the thinking of Ken Jennings and Brad Rutter when they play Jeopardy.  But surely thinking and searching are not the same thing.
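To make the search-and-match idea concrete, here is a toy sketch in Python.  This is not IBM's actual DeepQA pipeline--the documents, the clue, and the scoring rule are all invented for illustration--but it shows how purely statistical word-overlap can rank candidate answers without anything we would call understanding.

```python
# Toy keyword-overlap scoring, loosely in the spirit of the search-and-match
# description above.  NOT Watson's actual DeepQA pipeline; the documents and
# clue below are made up for the example.

def keywords(text, stopwords={"the", "a", "an", "of", "in", "on", "for", "is", "was", "at"}):
    """Lowercase the text, strip punctuation, and drop common stopwords."""
    return {w.strip(".,?") for w in text.lower().split()} - stopwords

def score_candidates(clue, documents):
    """Rank candidate answers by the fraction of clue keywords their document shares."""
    clue_words = keywords(clue)
    scores = {}
    for answer, doc in documents.items():
        overlap = clue_words & keywords(doc)
        scores[answer] = len(overlap) / len(clue_words)  # crude "confidence"
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

documents = {
    "George Eyser": "George Eyser was a gymnast who won gold on the parallel bars in 1904.",
    "Jesse Owens": "Jesse Owens won four gold medals in track at the 1936 Olympics.",
}
clue = "This gymnast won a gold medal on the parallel bars in 1904."
print(score_candidates(clue, documents))
```

Notice that the top-ranked answer emerges from nothing but shared words--no model of gymnasts, medals, or legs.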

But if we interpret the success of Jennings and Rutter in playing Jeopardy as evidence of their human thinking, why shouldn't we interpret the success of Watson in defeating them in Jeopardy as evidence of his machine thinking?

After all, even Jennings said that he felt that Watson was thinking like a human competitor:
"The computer's techniques for unraveling Jeopardy clues sounded just like mine.  That machine zeroes in on key words in a clue, then combs its memory (in Watson's case, a fifteen-terabyte data bank of human knowledge) for clusters of associations with those words.  It rigorously checks the top hits against all the contextual information it can muster: the category name; the kind of answer being sought; the time, place, and gender hinted at in the clue; and so on.  And when it feels 'sure' enough, it decides to buzz.  This is all an instant, intuitive process for a human Jeopardy player, but I felt convinced that under the hood my brain was doing more or less the same thing." ("My Puny Human Brain," Slate, February 26, 2011)
Or was Jennings caught up in the same illusion that captivated so many other people who observed Watson at work?  Can we dispel this illusion by looking at Watson's mistakes in the game that reveal his mechanical stupidity and failure to understand anything? 

Consider the following three examples of Watson's mistakes.

1.  "Olympic Oddities" in the Jeopardy Round for $1,000.

It was the anatomical oddity of U.S. gymnast George Eyser who won a gold medal on the parallel bars in 1904.

Jennings:  What is he only had one hand?

Watson:  What is leg?

Correct answer: What is he's missing a leg?

Triple Stumper (all three contestants failed to answer correctly)

2. "The Art of the Steal" in the Double Jeopardy Round for $1,600.

In May 2010, 5 paintings worth $125 million by Braque, Matisse, & 3 others left Paris' Museum of this art period.

Watson: What is Picasso?

Jennings:  What is cubism?

Rutter:  What is impressionism?

Correct answer: What is modern art?

Triple Stumper.

3.  "U.S. Cities" in Final Jeopardy Round.

Its largest airport is named for a World War II hero; its second largest, for a World War II battle.

Jennings:  What is Chicago?

Rutter:  What is Chicago?

Watson:  What is Toronto?????

Notice first that two of these three were Triple Stumpers.  So the two human contestants were no better than Watson in unraveling two of these clues.

If you go to the Wikipedia page for Eyser, you will see that he had lost his left leg as a child, and it was replaced with a wooden leg.  Seeing this, Watson answered: "What is leg?"  Alex Trebek said "Yes."  But then a judge stopped the game.  After a five-minute discussion, the judges decided this was the wrong answer, because "leg" was not the "anatomical oddity," but the fact that he was missing a leg.  If Watson had answered--"What is a wooden leg?"--that might have provoked another discussion as to whether that was the correct answer.

We can see that Jennings knew by common sense that for a gymnast doing parallel bars, agile use of hands, arms, and legs is normal, and so missing one of these would be an "oddity."  But he could only guess which one was missing.  In the few seconds he had to think about it, Jennings did not have time to figure out that artificial arms or hands in 1904 would have been too crude for a gold medal performance, but a wooden leg might have been less disabling.

Watson knew that "anatomical" could include leg.  But the Wikipedia page does not include the word "oddity."  Any human being reading the Wikipedia page would immediately identify missing a natural leg and having a wooden leg as an "oddity" for an Olympic gymnast.  This is part of what computer scientists have called "common sense knowledge"--the massive accumulation of informal knowledge that human beings acquire by experience without any explicit instruction, but which computers do not have.

Although it is a daunting project, providing computers with human common sense knowledge is in principle possible.  After all, if Watson had had the information in its database that "missing a leg is an oddity for an Olympic gymnast," Watson could have answered correctly.  Beginning in 1984, Doug Lenat and his colleagues (now at Cycorp) have been building an expert system--CYC (for "encyclopedic")--that encodes common sense knowledge to provide machines with an ability to understand the unspoken assumptions underlying human ideas and reasoning.  Lenat wants CYC to master hundreds of millions of things that a typical person knows about the world.  For example, consider the common sense knowledge that birds can generally fly, but not ostriches and penguins, and not dead birds, and not birds with their feet in cement, and not . . . .  It is hard but not impossible to formalize all such common sense knowledge.
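The flavor of such default-with-exceptions knowledge can be suggested with a toy sketch.  CYC's actual representation language (CycL) is vastly richer; everything here is invented purely to illustrate the pattern of a default rule overridden by exceptions.

```python
# Toy default reasoning with exceptions, in the spirit of the CYC example
# above ("birds generally fly, but not ostriches, penguins, dead birds...").
# Illustrative only; CYC's real knowledge base works nothing like this.

DEFAULTS = {"bird": {"can_fly": True}}
EXCEPTIONS = {
    "ostrich": {"can_fly": False},
    "penguin": {"can_fly": False},
}

def can_fly(kind, alive=True, feet_in_cement=False):
    """Apply the default rule, then override it with any known exception."""
    if not alive or feet_in_cement:
        return False                        # situational exceptions
    props = dict(DEFAULTS["bird"])
    props.update(EXCEPTIONS.get(kind, {}))  # species-level exceptions
    return props["can_fly"]

print(can_fly("robin"))               # True
print(can_fly("penguin"))             # False
print(can_fly("robin", alive=False))  # False
```

The hard part, as Lenat discovered, is not the mechanism but the sheer number of exceptions--and exceptions to exceptions--that ordinary human experience supplies for free.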

Watson's mistake about the art theft clue shows how Jeopardy clues are often confusing in that it's hard to interpret what kind of answer is being sought.  In this case, the clue wasn't seeking the name of an artist or an art period as such, but the name of a museum--the Musée d'Art Moderne (the Museum of Modern Art) in Paris.  Jennings and Rutter couldn't come up with the right answer.  Watson's first choice--"Picasso"--was also wrong.  But Watson did have the right answer--"Modern Art"--as his third choice!

Watson was widely ridiculed for his mistaken answer "Toronto" under the "U.S. Cities" category.  This machine is so dumb that he thinks Toronto is a U.S. city!  But notice that Watson put multiple question marks after his answer to indicate a low level of confidence.  The confidence level for "Toronto" was only 14%.  "Chicago" was his second ranked answer at 11% confidence.
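Watson's buzzing behavior can be pictured as the simple threshold rule Jennings described: answer only when "sure" enough, except in Final Jeopardy, where a response is required.  In a toy sketch (the threshold value is invented; only the 14% and 11% figures come from the broadcast):

```python
# Toy buzz decision with a confidence threshold.  This illustrates why Watson
# offered "Toronto" despite low confidence: in Final Jeopardy a response is
# mandatory, so the top-ranked guess is given even below the usual threshold.
# The threshold of 0.5 is made up for the example.

def decide(ranked, threshold=0.5, must_answer=False):
    """Return the top answer if confident enough, or if an answer is forced."""
    best, confidence = ranked[0]
    if confidence >= threshold or must_answer:
        return best
    return None  # stay silent rather than risk a wrong buzz

ranked = [("Toronto", 0.14), ("Chicago", 0.11)]
print(decide(ranked))                    # None: too unsure to buzz
print(decide(ranked, must_answer=True))  # Toronto: Final Jeopardy forces a guess
```

On this picture, the multiple question marks were Watson's way of flagging that a forced answer fell far below its normal buzzing threshold.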

Why did Watson guess that Toronto was an American city?  There are lots of small towns in the United States named Toronto, but none are large cities with large airports.  In doing his statistical analysis, Watson might have noticed that the United States is often called America, and Toronto, Ontario, is a North American city that has a baseball team (the Blue Jays) that is in the American League.

This all shows how Watson can become so confused that he cannot confidently find the right answer.  But, of course, this also happens to the human contestants in Jeopardy.

Searle would say that we just know that human beings can think, and machines can't!

But what if, after the game was over, it was announced that the game had actually been a Turing Test--that Ken Jennings was really a robot designed to look like Jennings, and that the robot's intelligence was Watson's?  Searle would say that even if this had happened, it would not have proven that Watson can think, because a machine can pass the Turing Test without really understanding anything.  The problem with this argument is that it throws us into solipsism, because it would mean that we cannot be sure that even human beings understand anything based on what we observe of their behavior.
