Watson and the DeepQA Architecture - Invited Tutorial 2013

We are proud to offer a two-day tutorial on the Watson DeepQA architecture, given by a member of the original IBM research team that won the Jeopardy! challenge.

The tutorial was open to all participants, including but not limited to students and researchers of TU Darmstadt and other universities.

For students of Computer Science at TU Darmstadt: this tutorial is part of an extended seminar (4CP) about Watson, see http://www.ke.tu-darmstadt.de/lehre/ss13/ml-sem for details.

Download the complete deck of slides (28MB).

 

Presenter


Name: Alfio Massimiliano Gliozzo

Affiliation: Research Staff Member at IBM T.J. Watson Research Center

Contact Information: 19 Skyline Drive, Hawthorne, NY 10532, gliozzo-at-us.ibm-dot-com

Bio: Dr. Alfio Gliozzo is a Research Staff Member at IBM Watson, where he is part of the DeepQA team. His main research focus is Textual Entailment and Domain Adaptation of Question Answering systems using Distributional Semantics. Before joining IBM, Dr. Gliozzo worked as a researcher for 11 years in both academic research and the semantic technology industry.

He is the author of 40+ scientific publications in the areas of Computational Linguistics, Information Retrieval, and Semantic Web. He has a significant track record of delivering competitive Semantic Technology systems by conducting state-of-the-art applied research and successfully coordinating R&D teams. His solutions and technologies have been applied to develop production-level systems for Question Answering, Semantic Advertising, and Multimedia Retrieval.

Organizer

Prof. Dr. Chris Biemann, biem(at)cs(dot)tu-darmstadt(dot)de.

Duration and Sessions

The course is structured in 4 modules (2h each), described below. Videos, demos, and other high-quality educational material developed by IBM will be presented during the sessions, together with technical content describing the DeepQA architecture in detail.

Session 1: Open Domain Question Answering and the Jeopardy! Grand Challenge

  • Open Domain Question Answering
  • The Jeopardy! Grand Challenge
  • Analysis of the Jeopardy! Task

Session 2: Watson and the DeepQA Architecture

  • The DeepQA Architecture
  • UIMA
  • Watson Development Cycle

Session 3: Natural Language Processing and Semantic Web Technology in Watson

  • The NLP Stack in the DeepQA Architecture
  • The NLP Stack
  • Question Classification and Passage Scoring
  • Relation Extraction
  • Linking Text to Knowledge using Linked Data
  • Temporal and Spatial Reasoning
  • Type Coercion
  • Answer Merging

Session 4: Distributional Semantics for Domain Adaptation

  • Introduction to Structuralism and Distributional Similarity
  • Scaling Latent Semantic Analysis
  • The JoBimText project
  • Domain Adaptation using Distributional Models
  • Conclusion: Watson in Healthcare and Potential Business Applications

 

Dates and Schedule

All sessions will take place at

Altes Maschinenhaus, S01|05, Lecture Hall 122, Magdalenenstr. 12, 64285 Darmstadt, close to the main building of TU Darmstadt.

Please find directions below.

Monday, March 18, 2013

  • 13:30 – 15:45: Session 1: Open Domain Question Answering and the Jeopardy! Grand Challenge
  • coffee break
  • 16:00 – 18:15: Session 2: Watson and the DeepQA Architecture

 

Tuesday, March 19, 2013

  • 11:00 – 13:00: Session 3: Natural Language Processing and Semantic Web Technology in Watson
  • lunch break
  • 14:30 – 16:45: Session 4: Distributional Semantics for Domain Adaptation



Topic and Description

Open-domain Question Answering (QA) is a long-standing research problem. Recently, IBM took on this challenge in the context of Jeopardy!, a well-known quiz show that has been airing on television in the United States for more than 25 years. It pits three human contestants against one another in a competition that requires answering rich natural language questions over a very broad range of topics. The development of a system able to compete with grand champions in the Jeopardy! challenge led to the design of the DeepQA architecture and the implementation of Watson.


The DeepQA project shapes a grand challenge in Computer Science that aims to illustrate how the wide and growing accessibility of natural language content and the integration and advancement of Natural Language Processing, Information Retrieval, Machine Learning, Knowledge Representation and Reasoning, and massively parallel computation can drive open-domain automatic Question Answering technology to a point where it clearly and consistently rivals the best human performance. 

Natural Language Processing (NLP) plays a crucial role in the overall DeepQA architecture. It makes it possible to “make sense” of both the question and the unstructured knowledge contained in the large corpora where most of the answers are located. Semantic Web Technology, enhanced by a massive use of open linked data, is another key component of Watson. Linked data and triple stores have been used to generate candidate answers and to score them from multiple points of view, such as type coercion and geographic proximity. In addition, the connection between linked data and natural language text offered by Wikipedia has been very useful for generating open-domain training data for relation detection and entity recognition systems, substantially improving the NLP capabilities of the system and thereby allowing the development of a truly open-domain QA system. Distributional Semantics, finally, allows fast adaptation of the system to new domains by computing semantic similarity from the application domain’s data and automatically linking terms in context to domain-specific ontologies.
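
As a rough illustration of the distributional idea mentioned above, the following Python sketch computes the similarity of two terms from the words that occur around them in a corpus. It is a toy example under stated assumptions, not the method used in Watson or in the JoBimText project: the function names `context_vectors` and `cosine`, the window size, and the three-sentence corpus are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical "domain corpus": two medical sentences and one unrelated one.
corpus = [
    "the doctor prescribed aspirin for the headache",
    "the doctor prescribed ibuprofen for the headache",
    "the pilot landed the plane on the runway",
]

vectors = context_vectors(corpus)
sim_drugs = cosine(vectors["aspirin"], vectors["ibuprofen"])
sim_other = cosine(vectors["aspirin"], vectors["runway"])
print(sim_drugs, sim_other)
```

In this toy corpus, "aspirin" and "ibuprofen" appear in identical contexts and therefore score as more similar to each other than "aspirin" and "runway" do. Because the similarity is derived from the corpus alone, swapping in text from a different domain yields domain-specific similarities without manual annotation.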

Audience

  • Ph.D. students, advanced MA students and researchers in the following areas: Natural Language Processing, Machine Learning, Information Retrieval, and Semantic Web
  • University Lecturers/Professors interested in teaching Watson and DeepQA
  • Learning outcomes: detailed knowledge of state-of-the-art open-domain Question Answering architectures and their components

Prerequisite

  • Basic knowledge of Natural Language Processing and Machine Learning is required
  • Some basic knowledge of Information Retrieval and Semantic Web is preferred, but not required

Relevance

The successful performance of Watson in playing Jeopardy! is stimulating a huge debate around semantic technology and its possible applications. At the same time, little effort has been spent on explaining how Watson works from a technical perspective, which has created a gap between the “external” perception of the Watson technology and the actual state of the art. The goal of this tutorial is to fill this gap by providing the technical background required to understand Watson and its components. The tutorial will be presented in an extended version, covering NLP and Semantic Web topics. Open-domain Question Answering is highly relevant to web mining and knowledge engineering, including topics such as search over both text and linked data, Information Extraction, and NLP.

Previous Editions

The educational activity around Watson is an established series at top conferences worldwide. Below is a list of selected previous venues. The first three sessions have been partially covered by previous tutorials; the fourth session runs for the first time.

Press

Directions

By car:

A 5 / A 67 Exit "Darmstadt Stadtmitte"
B 26 "Rheinstraße" direction Stadtmitte
B 26 "Cityring"; campus section S1 is directly adjacent to the Cityring

Public transport:

Bus lines F and H from Hauptbahnhof (Darmstadt central train station)
via central transfer point "Luisenplatz"
to bus stop "Alexanderstraße/TU".

Alternatively, you can take tram no. 3 to "Willy-Brandt-Platz" and walk through the park.

 
