Wednesday, February 16, 2011

Another problem Watson can't solve


 The Brown Daily Herald 

I.B.M.’s Watson - Computers Close In on the ‘Paris Hilton’ Problem - NYTimes.com

I suppose that this is a little premature, given that we have not observed the final showing. Despite this, most people are speculating (rightfully so) that Watson will surely win the competition.


I enjoyed this article. What does this competition mean for us? What does IBM's feat say about the current state of artificial intelligence (AI)? Artificial intelligence strives to develop algorithms that mimic human intelligence.

Let's view ourselves as a model of computation. Like most computational systems, we have inputs, computation based on those inputs, and outputs that result from that computation. We have five classes of input -- vision, audio, touch, taste, and smell. Our brains are constantly executing millions of "algorithms" that process these input streams into knowledge and information. We do some sifting to store important information in a more readily available way (e.g. touching a hot burner -- BAD!) Don't believe me? Read on.

Think about something simple, and keep track of the first thought that comes to mind... ready? Think about the term "dog". What was your first thought? Your current dog, related pictures of dogs, other four-legged animals, types of dogs you've seen in the past, perhaps your dog from your childhood, the smell of wet dog, common names of dogs, doggie messes on the floor, companionship, sadness over the loss of a dog, various breeds, your favorite breed, the fact that you need to pick up dog food for Fido, and so on. Over your lifetime to date, your brain has assembled enormous quantities of data about dogs. It inherently maintains an incredibly efficient knowledgebase of information related to dogs so that you can immediately retrieve such information when the need arises.

I gave you new input. It was a statement. You read a sentence asking you to think about dogs and remember your first thought. As you read on, you were easily able to recall a vast pool of information in the form of memories of experiences, literature read, etc. When we want to know more about dogs, we easily consult the library for literature, surf online, speak with friends, neighbors, etc., continually feeding our five senses with more information. I brought up just one topic. How many other topics has your brain acquired information about?

Define intelligence. (Go ahead... look it up in Google, and see if you can come up with a consensus definition!) When you think of intelligence, what comes to mind? I would define it as the ability to acquire and store knowledge in a way such that it can be used to reason -- i.e. to think, understand and make sound judgments -- when faced with decisions. Under this definition, surely the majority of us are intelligent to some degree.

Quantifying intelligence is another topic of debate for many.

The current system of quantification being used is a competition on Jeopardy. At face value, it seems to be a fair premise. But before jumping to the conclusion that this is the end of mankind, please carefully consider Watson's fine-tuned set of inputs, processing capabilities, knowledgebase, and output.

WATSON - 2800 CPUs, 15 TB of memory, 10 primary clusters. (I'm sure I'm missing details. Most of this was assembled from a couple of presentations on Watson.) It was fed an initial seed of approximately 200,000,000 pages of text documents, which it needed to sift through and store knowledge about. This was NOT an easy feat! It needed to be able to extract what is important and discard what is not. For a computer to do this is truly remarkable! It used this initial seed of knowledge and continued to improve itself. How? As it played repeated matches of Jeopardy over the past couple of years, its algorithms learned from its mistakes, making corrections in its knowledgebase and, more importantly, fine-tuning the parameters of its underlying models and systems of information retrieval in order to improve its performance. Performance is a typical measure in machine learning -- get the best outcome while minimizing risk. For Jeopardy, we want to maximize the number of questions answered correctly while minimizing the risk of losing money by answering questions for which it is not fully confident of the answer.
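To make that trade-off concrete, here is a minimal sketch in Python. It is NOT Watson's actual algorithm (I don't know the details of that yet); it simply assumes the system attaches a confidence between 0 and 1 to each candidate answer, that a right answer wins the clue's dollar value and a wrong one loses it, and that a single buzz threshold is tuned against a record of past games. All of the names here (`expected_gain`, `should_buzz`, `winnings`, `tune_threshold`) are made up for illustration.

```python
from typing import List, Tuple

# One past clue: (confidence in the answer, clue dollar value, was it correct?)
PastClue = Tuple[float, int, bool]


def expected_gain(confidence: float, clue_value: int) -> float:
    """Expected dollar change from answering: win the value with probability
    `confidence`, lose it otherwise (an assumed, simplified payoff model)."""
    return confidence * clue_value - (1.0 - confidence) * clue_value


def should_buzz(confidence: float, threshold: float) -> bool:
    """Answer only when confidence reaches the (hypothetical) threshold."""
    return confidence >= threshold


def winnings(history: List[PastClue], threshold: float) -> int:
    """Total money that a given threshold would have earned on past clues."""
    total = 0
    for confidence, clue_value, was_correct in history:
        if should_buzz(confidence, threshold):
            total += clue_value if was_correct else -clue_value
    return total


def tune_threshold(history: List[PastClue], candidates: List[float]) -> float:
    """Pick the candidate threshold that would have earned the most money."""
    return max(candidates, key=lambda t: winnings(history, t))


if __name__ == "__main__":
    # Invented practice-match data, purely for illustration.
    history = [(0.9, 400, True), (0.6, 800, False),
               (0.3, 200, False), (0.8, 1000, True)]
    print(tune_threshold(history, [0.0, 0.5, 0.7, 0.95]))
```

Under this simplified payoff, answering only has a positive expected gain when confidence is above 50%. The "learning from mistakes" part is the tuning step: replay the past matches and keep the threshold that would have won the most money.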

With our definition of intelligence, Watson has surely demonstrated an artificial form of intelligence. It can acquire knowledge, store it and efficiently reason with it for personal gain. A computer can far surpass our capability to recall specific facts about an extremely wide range of topics, limited only by computational resources. I argue that the model for knowledge management in Watson is not the real win here. Watson is able to store as much as it does mostly because of the availability of powerful systems that we did not have 30 or more years ago -- a time when AI was a real hot area. The real win with Watson is not so much in its ability to recall information from a database of topics, terms, concepts, etc. Again, you can thank its computational resources for that. The real win with Watson is in its natural language processing!

Let's consider NLP. Watson was fed every question and document in pure text form. It did not have audio input to listen with (which is why it would repeat mistakes made by its opponents). We still have a way to go to get a computer to effectively understand spoken language.

My final thoughts: Given its restricted system for deep analytics, in this context, Watson was a wonderful demonstration of AI for these specific purposes. Its display of deep analytics and knowledge storage and retrieval through natural language processing was amazing. The real-world applications are obvious, ranging from helping doctors make medical decisions based on symptoms to insurance decisions, technical support assistance, military decisions, court document management, and so on. But is it on par with all facets of human intelligence? Can it process the same types of inputs? Absolutely not. Watson has no capability for image recognition. (Notice, there were no pictures displayed, unlike real Jeopardy.) There is no ability to recall audio of famous speeches, which so often brings a new light to a standard text (think about MLK's 'I have a dream...' speech). It cannot reason about important parts of life such as smell or taste (and sometimes I wish I couldn't either!) And most importantly, humans have other needs and face far more complex problems besides factual recall. These decisions often call for a good measure of intelligence without emotion, and yet the emotion cannot be discarded in the process. Many of these decisions have no one specific right answer for all mankind. These decisions often carry an enormous amount of individual context and experience that must be considered.

Consider the complexities of relationships... Watson, help! Perhaps IBM's next project should be a new system called Freud.

PS - I hope to learn more about the algorithms behind Watson soon...












Tuesday, February 08, 2011

Challenges in Data Mining

I always enjoy reading material from Han. If you're interested in learning about some of the challenges still remaining in data mining, I encourage you to read through this paper.

ngdm09_han_gao.pdf