We describe a competitive question generation and answering project used in our undergraduate natural language processing courses. This semester-long project challenges teams of three or four students to use available NLP components (or develop their own) to construct systems that ask and answer questions about an arbitrary Wikipedia article. We describe how the project and competition were structured, the outcomes, and lessons learned. The Question/Answer dataset was generated by students who took undergraduate natural language processing courses taught by Noah Smith at Carnegie Mellon and Rebecca Hwa at the University of Pittsburgh during Spring 2008, Spring 2009, and Spring 2010.
Data Collection
The project proceeded in four phases over a 15-week semester: data preparation (weeks 1–4), during which the first few course lectures introduced the most important concepts for getting started in NLP and motivating applications; system development (weeks 5–12), during which teams worked on their systems as they learned more about problems and solutions in NLP; evaluation/competition (weeks 13–14); and live demonstrations (hosted by the local Google office) at the end. The first and third phases are most relevant.
