25 July 2004
Much of the current research in question answering systems is driven by programs such as AQUAINT and evaluation exercises such as TREC, NTCIR and CLEF, all of which focus on open-domain question answering. The availability of large volumes of data (e.g. documents extracted from the World Wide Web) has prompted the development of systems that focus on shallow text processing.

But there are many document sets in restricted domains that are potentially valuable as a source for question answering systems. For example, the documentation pages of Unix and Linux systems would make an ideal corpus for QA systems targeted at users who want to know how to use these operating systems. There is a wealth of information in other technical documentation such as software manuals, car maintenance manuals, and encyclopediae of specific areas such as medicine. Users interested in these specific areas would benefit from QA systems targeted at their areas of interest.

Restricted domains typically have limited data available, so conventional techniques based on data redundancy cannot be applied effectively. The scarcity of data seems to call for a more targeted, NLP-intensive approach to QA. The use of additional corpora such as the WWW raises a number of interesting questions. For instance, will these corpora help or obstruct the proper functioning of NLP-intensive approaches to QA? And how do we find good pockets of information that are appropriate to the chosen domains?

On the other hand, restricted domains (e.g. law, medicine) have specific stylistic conventions. Often these domains use terminology that is not stored in conventional lexica. Consequently, NLP approaches devised for open-domain systems may choke on these specific domains, which raises the question of how portable these systems can be.

In this workshop we aim at answering some of the following questions:

We welcome papers that address any of the above questions or that focus on any of the following topics:

  Chair: Diego Mollá
8:45-9:00 Welcome
9:00-10:00 Invited presentation: The Perils and Rewards of Developing Restricted Domain Applications
Daniel Marcu
  Coffee Break
  Chair: José Luis Vicedo
10:30-11:00 Evaluation of Restricted Domain Question-Answering Systems
Anne R. Diekema, Ozgur Yilmazel and Elizabeth D. Liddy
11:00-11:30 The Problem of Precision in Restricted-Domain Question Answering. Some Proposed Methods of Improvement
Hai Doan-Nguyen and Leila Kosseim
11:30-12:00 A Qualitative Comparison of Scientific and Journalistic Texts from the Perspective of Extracting Definitions
Igal Gabbay and Richard F.E. Sutcliffe
  Lunch Break
  Chair: Richard F.E. Sutcliffe
13:50-14:20 BioGrapher: Biography Questions as a Restricted Domain Question Answering Task
Oren Tsur, Maarten de Rijke and Khalil Sima'an
14:20-14:50 Cooperative Question Answering in Restricted Domains: the WEBCOOP Experiment
Farah Benamara
14:50-15:20 A Practical QA System in Restricted Domains
Hoojung Chung, Young-In Song, Kyoung-Soo Han, Do-Sang Yoon, Joo-Young Lee, Hae-Chang Rim and Soo-Hong Kim
  Coffee Break
  Chair: Elizabeth D. Liddy
15:50-16:20 Answering Questions in the Genomics Domain
Fabio Rinaldi, James Dowdall, Gerold Schneider and Andreas Persidis
16:20-16:50 Analysis of Semantic Classes in Medical Text for Question Answering
Yun Niu and Graeme Hirst
