SMA - Annual Report 2004/2005





Project abstracts can be viewed on the enclosed CD-ROM or on the SMA website (http://www.sma.nus.edu.sg).


CS Programme

Using Paraphrases for Information Extraction

Student	:	Shilpa Arora

SMA Supervisors	:	Assoc Prof Ng Hwee Tou (Singapore) & Prof Leslie P. Kaelbling (MIT)

Project Abstract:

Maximum Entropy has been used for several statistical classification problems in Natural Language Processing (NLP). This project uses the Maximum Entropy approach to build an Information Extraction (IE) system that extracts information about events such as terrorist events. Here, the IE task is modeled as a classification problem in which information extracted from a document is used to fill the slots of a template. Manually generated templates serve as the training set for this system, but the amount of manually labeled data that can be provided is very limited. Our approach therefore uses weakly labeled data, which essentially consists of news articles describing the same event and is thus a rich source of paraphrases. Weakly labeled data is freely available and hence provides ample training examples. A basic Maximum Entropy based IE system was developed, and the improvement in performance from using weakly labeled data was investigated.
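
As a rough illustration of the classification framing described in the abstract, the sketch below trains a tiny maximum entropy (multinomial logistic) classifier that assigns candidate phrases to template slots. It is not the project's system: the slot labels, feature names, training examples, and hyperparameters are all invented for illustration, and plain batch gradient ascent stands in for whatever training procedure the project actually used.

```python
import math
from collections import defaultdict

def predict_proba(weights, features, labels):
    """Return P(label | features) under a log-linear (maximum entropy) model."""
    scores = {y: sum(weights[(f, y)] for f in features) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train(examples, labels, epochs=200, lr=0.5, l2=0.01):
    """Fit weights by batch gradient ascent on the L2-regularized log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for features, gold in examples:
            probs = predict_proba(weights, features, labels)
            for f in features:
                for y in labels:
                    # observed feature count minus expected count under the model
                    grad[(f, y)] += (1.0 if y == gold else 0.0) - probs[y]
        for key, g in grad.items():
            weights[key] += lr * (g / len(examples) - l2 * weights[key])
    return weights

# Hypothetical slot labels and binary context features for candidate phrases.
LABELS = ["perpetrator", "victim", "none"]
TRAIN = [
    ({"prev=by", "is_capitalized"}, "perpetrator"),
    ({"prev=by", "head=guerrillas"}, "perpetrator"),
    ({"prev=killed", "is_capitalized"}, "victim"),
    ({"prev=wounded", "head=civilians"}, "victim"),
    ({"prev=in", "is_capitalized"}, "none"),
    ({"prev=on", "head=tuesday"}, "none"),
]

weights = train(TRAIN, LABELS)

# Classify an unseen candidate; "head=rebels" never occurs in training,
# so only the "prev=by" cue contributes to the decision.
probs = predict_proba(weights, {"prev=by", "head=rebels"}, LABELS)
best = max(probs, key=probs.get)
print(best, probs)
```

In this framing, weakly labeled news articles about the same event would simply contribute additional (features, label) pairs to the training list; how those labels are derived from paraphrases is not specified in the abstract and is left out of the sketch.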
