Teaching

Watson and the DeepQA Architecture - Invited Tutorial 2013

We are proud to offer a two-day tutorial on the Watson DeepQA architecture, given by a member of the original IBM research team that won the Jeopardy! challenge.

The tutorial was open to all participants, including but not limited to students and researchers of TU Darmstadt and other universities.

For students of Computer Science at TU Darmstadt: this tutorial is part of an extended seminar (4CP) about Watson, see http://www.ke.tu-darmstadt.de/lehre/ss13/ml-sem for details.

Download the complete deck of slides (28MB).

Presenter

Name: Alfio Massimiliano Gliozzo

Affiliation: Research Staff Member at IBM T.J. Watson Research Center

Contact Information: 19 Skyline Drive, Hawthorne, NY 10532, gliozzo-at-us.ibm-dot-com

Bio: Dr. Alfio Gliozzo is a Research Staff Member at IBM Watson, where he is part of the DeepQA team. His main research focus is Textual Entailment and Domain Adaptation of Question Answering systems using Distributional Semantics. Before joining IBM, he worked as a researcher for 11 years in both academic research and the semantic technology industry. He is the author of 40+ scientific publications in the areas of Computational Linguistics, Information Retrieval, and Semantic Web. He has a significant track record of delivering competitive Semantic Technology systems by conducting state-of-the-art applied research and successfully coordinating R&D teams. His solutions and technologies have been applied to develop production-level systems for Question Answering, Semantic Advertising and Multimedia Retrieval.

Organizer

Prof. Dr. Chris Biemann, biem(at)cs(dot)tu-darmstadt(dot)de.

Duration and Sessions

The course is structured in 4 modules (2h each), described below. Videos, demos and other high-quality educational material developed by IBM will be presented during the sessions, together with technical content describing details of the DeepQA architecture.

Session 1 Open Domain Question answering and the Jeopardy! Grand challenge

Open Domain Question Answering
The Jeopardy! Grand challenge
Analysis of the Jeopardy! Task

Session 2 Watson and the Deep QA architecture

The Deep QA architecture
UIMA
Watson Development Cycle

Session 3 Natural Language Processing and Semantic Web Technology in Watson

The NLP Stack in the Deep QA architecture
The NLP Stack
Question Classification and Passage Scoring
Relation Extraction
Linking Text to Knowledge using Linked Data
Temporal and Spatial Reasoning
Type Coercion
Answer Merging

Session 4 Distributional Semantics for Domain Adaptation

Introduction to Structuralism and Distributional Similarity
Scaling Latent Semantic Analysis
The JoBimText project
Domain Adaptation using Distributional Models
Conclusion: Watson in Healthcare and Potential Business Applications

Dates and Schedule

All sessions will take place at

Altes Maschinenhaus, S01|05, Lecture Hall 122, Magdalenenstr. 12, 64285 Darmstadt, close to the main building of TU Darmstadt.

Please find directions below.

Monday, March 18, 2013

13:30 – 15:45: Session 1: Open Domain Question answering and the Jeopardy! Grand challenge
coffee break
16:00 – 18:15: Session 2: Watson and the Deep QA architecture

Tuesday, March 19, 2013

11:00 – 13:00: Session 3: Natural Language Processing and Semantic Web Technology in Watson
lunch break
14:30 – 16:45: Session 4: Distributional Semantics for Domain Adaptation

Topic and Description

Open domain Question Answering (QA) is a long-standing research problem. Recently, IBM took on this challenge in the context of Jeopardy!, a well-known TV quiz show that has been airing on television in the United States for more than 25 years. It pits three human contestants against one another in a competition that requires answering rich natural language questions over a very broad domain of topics. The development of a system able to compete with grand champions in the Jeopardy! challenge led to the design of the DeepQA architecture and the implementation of Watson.   
The DeepQA project poses a grand challenge in Computer Science: it aims to illustrate how the wide and growing accessibility of natural language content, together with the integration and advancement of Natural Language Processing, Information Retrieval, Machine Learning, Knowledge Representation and Reasoning, and massively parallel computation, can drive open-domain automatic Question Answering technology to a point where it clearly and consistently rivals the best human performance.
Natural Language Processing (NLP) plays a crucial role in the overall DeepQA architecture. It makes it possible to “make sense” of both the question and the unstructured knowledge contained in the large corpora where most of the answers are located. Semantic Web Technology, enhanced by extensive use of open linked data, is another key component of Watson. Linked data and triple stores have been used to generate candidate answers and to score them from multiple perspectives, such as type coercion and geographic proximity. In addition, the connection between linked data and natural language text offered by Wikipedia has been very useful for generating open domain training data for relation detection and entity recognition systems, substantially improving the NLP capabilities of the system and therefore allowing the development of a truly open domain QA system. Finally, Distributional Semantics allows fast adaptation of the system to new domains by computing semantic similarity from the application domain’s data and automatically linking terms in context to domain-specific ontologies.
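As a rough illustration of what computing semantic similarity from domain data can look like, the following Python sketch builds word co-occurrence vectors from a toy corpus and compares them with cosine similarity. It is not the method used in Watson or JoBimText; the corpus, the window size, and the function names context_vectors and cosine are assumptions made only for this example.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy "domain corpus"; a real system would use far more text.
corpus = [
    "the doctor prescribed aspirin for the headache",
    "the doctor prescribed ibuprofen for the headache",
    "the pilot landed the plane at the airport",
]

vecs = context_vectors(corpus)
sim_drugs = cosine(vecs["aspirin"], vecs["ibuprofen"])
sim_other = cosine(vecs["aspirin"], vecs["plane"])
print("aspirin ~ ibuprofen:", sim_drugs)
print("aspirin ~ plane:", sim_other)
```

On this toy corpus, aspirin and ibuprofen occur in the same contexts and therefore score higher than aspirin and plane. Words that share contexts in the domain data are treated as similar, which is the basic idea behind adapting to a new domain without hand-built resources.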

Audience

Ph.D. students, advanced MA students and researchers in the following areas: Natural Language Processing, Machine Learning, Information Retrieval, and Semantic Web
University Lecturers/Professors interested in teaching Watson and Deep QA
Learning outcomes: detailed knowledge of “state of the art” open domain Question Answering architectures and their components

Prerequisite

Basic knowledge of Natural Language Processing and Machine Learning is required
Some basic knowledge of Information Retrieval and Semantic Web is preferred, but not required

Relevance

The successful performance of Watson in playing Jeopardy! is stimulating a huge debate around semantic technology and its possible applications. At the same time, little effort has been spent on explaining how Watson works from a technical perspective, creating a gap between the “external” perception of the Watson technology and the actual “state of the art”. The goal of this tutorial is to fill this gap by providing the technical background required to understand Watson and its components. The tutorial will be presented in an extended version, covering NLP and Semantic Web topics. Open domain Question Answering is highly relevant to web mining and knowledge engineering, including topics such as search over both text and linked data, Information Extraction and NLP.

Previous Editions

The educational activity around Watson is an established series at top conferences worldwide. Below is a list of selected previous venues. The first three sessions have been partially covered by previous tutorials; the fourth session runs for the first time.