Meeting Room Project

Topic Detection in Meeting Dialogs

This project, part of our Meeting Room Project, aims to automatically locate topic breaks in meeting transcripts: given the text of a meeting, find the points where the topic of conversation changes significantly. To evaluate our technology we need human-annotated meetings that include topic-change information, so we are asking for volunteers to read through some transcripts and record where they think the topic changes.

Guidelines

Markup Format

Each meeting transcript is presented as a web page with each speaker clearly marked. Within a turn, an underlined section means that another speaker is interrupting the first, possibly overlapping with them. Each speaker has an identifier: in some transcripts this is S1, S2, etc., and in others it is a name. Each speaker turn is also numbered in the top left of the box containing that turn. Use this number to identify where breaks occur: for each new topic, write down the number of its first turn, the break strength (1-5), and any comments you might have. Please return this information to us as a small text file, e.g.:

12 2
29 3
89 5   Start talking about fruit
103 1  now fruit canning
130 4  now vegetable production
    

If you think a topic change happens inside a turn, please note this in a comment and estimate roughly where it occurs, e.g. "topic change about 30% into the turn, after 'Eggplant'".
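For anyone who wants to check a file before emailing it, the format above can be read with a short script. This is only a sketch under our own assumptions, not a tool provided by the project: it assumes one break per line, with the turn number and break strength separated by whitespace and any remaining text treated as the comment. The names `parse_annotations` and `TopicBreak` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed line layout: TURN STRENGTH [free-text comment], fields separated by whitespace.
LINE_RE = re.compile(r"^\s*(\d+)\s+([1-5])(?:\s+(.*?))?\s*$")


@dataclass
class TopicBreak:
    turn: int               # number of the first turn in the new topic
    strength: int           # break strength, 1 to 5
    comment: Optional[str]  # optional comment, e.g. "now fruit canning"


def parse_annotations(text: str) -> List[TopicBreak]:
    """Hypothetical checker: parse an annotation file, raising ValueError on a bad line."""
    breaks = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank or whitespace-only lines
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(
                f"line {lineno}: expected 'TURN STRENGTH [comment]', got {line!r}"
            )
        turn, strength, comment = m.groups()
        # An empty comment (e.g. only trailing spaces) is stored as None.
        breaks.append(TopicBreak(int(turn), int(strength), comment or None))
    return breaks


if __name__ == "__main__":
    sample = "12 2\n29 3\n89 5   Start talking about fruit\n"
    for b in parse_annotations(sample):
        print(b.turn, b.strength, b.comment)
```

A line such as "12 7" (strength outside 1-5) or "twelve 2" is rejected with a message giving its line number, which makes typos easy to spot before the file is sent.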

Dialogs

In the first instance we are targeting the following dialogs, which we would like as many people as possible to annotate so that we can measure inter-annotator reliability. If you are willing to help, please pick a dialog from the list below at random and then work your way through the rest of the list as your patience allows. Even a single annotated dialog would be useful. Please email the completed text file (in the format above) to me (Steve Cassidy).

I've added links to MP3 files for the meetings I have audio for; listening to these while following the transcript may make annotation a little easier.

adv105su068
ICSI_1450 MP3
ICSI_1430 MP3
mtg485sg142
NIST_1148 MP3
NIST_1007 MP3
CMU_1500 MP3
CMU_1400 MP3
LDC_1400 MP3
LDC_1500 MP3
tut301mu021
dis115ju087

Many thanks for any help that you can provide with this task.