Michael A. Covington, Ph.D.
Senior Research Scientist Emeritus
Adjunct Professor of Computer Science
Institute for Artificial Intelligence
The University of Georgia
Room 111, Boyd Graduate Studies Research Center
Athens, Georgia 30602-7415 U.S.A.


Suggested thesis topics in natural language processing


The following is a list of suggested M.S. thesis topics in natural language processing and information retrieval. Please use this as a stimulus for your own creativity.

Note that to write a successful thesis, you have to think like a researcher — formulate and test hypotheses — and you will have to master a considerable amount of background knowledge not taught to you in courses. For best results, exploit your super-powers — ask yourself what you can do that others can't.

Earlier theses are available for reference. We encourage students to build on the work of their predecessors.

This list was last updated 2011 August 10.

There is no funding or assistantship presently associated with any of these topics.


FinITA project — Financial text analysis. The FinITA project has not yet developed a specific research agenda, but the overall area is the analysis of texts containing financial news or reports in order to extract non-obvious content (e.g., whether the writer is optimistic or pessimistic).

ARC project — Computer understanding of descriptions of cathedrals. This is currently Charles Hollingsworth's thesis project, and others should continue the work. It is a combination of natural language understanding and knowledge representation. Implementation is mainly in Prolog.

CASPR project — Measure the coherence of a text. That is, measure whether a natural-language text stays on the same subject or not. Colin Nicholson has already done one thesis in this area. We need research on this topic for the schizophrenia project; it also has other uses.
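
As a starting point for thinking about the problem, here is a minimal sketch of one possible baseline: the average word-overlap (cosine) similarity between adjacent sentences. This is not the method used in the earlier thesis or in the CASPR project; the function names, the sentence splitter, and the tiny stop-word list are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption); a real study would use a fuller one.
STOPWORDS = {"the", "a", "an", "on", "in", "of", "and", "to", "is", "was", "it"}

def sentence_vectors(text):
    """Split text into sentences and return a bag-of-words Counter for each."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    vectors = []
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        vectors.append(Counter(w for w in words if w not in STOPWORDS))
    return vectors

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def adjacent_coherence(text):
    """Mean cosine similarity of each sentence with the one that follows it."""
    vectors = sentence_vectors(text)
    if len(vectors) < 2:
        return 0.0
    sims = [cosine(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    return sum(sims) / len(sims)

on_topic = "The cat sat on the mat. The cat liked the mat."
off_topic = "The cat sat on the mat. Stock prices fell sharply today."
print(adjacent_coherence(on_topic), adjacent_coherence(off_topic))
```

Raw word overlap misses synonyms and pronouns, so a thesis would need something stronger; the sketch only shows the shape of a measurable score.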

CASPR project — Measure the orderliness and completeness of a picture description. Schizophrenia patients are given pictures to describe. We want to measure whether they mention all the prominent objects and actions in the picture, and also whether the description is "orderly" (to be defined). Some early work on this has been done by Hemali Vin (CURO Symposium).
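
The completeness half of this question can be sketched as a checklist comparison. The example below assumes that someone has already listed the prominent objects and actions in a picture, along with acceptable words for each; the checklist contents and the function name are hypothetical, and "orderliness" is left undefined, as it is above.

```python
import re

def description_coverage(description, checklist):
    """Return (fraction of checklist items mentioned, list of items not mentioned).

    checklist maps an item name to a collection of words that count as mentioning it.
    """
    words = set(re.findall(r"[a-z]+", description.lower()))
    missing = [item for item, cues in checklist.items() if not (words & set(cues))]
    covered = len(checklist) - len(missing)
    fraction = covered / len(checklist) if checklist else 0.0
    return fraction, missing

# Hypothetical checklist for an imagined picture of a boy running with a dog.
checklist = {
    "boy": {"boy", "child", "kid"},
    "dog": {"dog", "puppy"},
    "running": {"run", "runs", "running", "ran"},
}

fraction, missing = description_coverage("A boy is running in the park.", checklist)
print(fraction, missing)
```

Exact word matching is the weak point here; handling paraphrase and deciding which objects count as "prominent" are where the research lies.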

CASPR project — Phonetic aspects of schizophrenia. This is a piece of collaborative research already in progress but there might be room for a thesis. We are measuring phonetic attributes of the speech of schizophrenia patients, to correlate pitch, tongue movement, etc., with abnormalities caused by the disease. Requires some knowledge of phonetics.

CASPR project — Psycholinguistic study of the poet Emily Dickinson. (We have a collaborator at the National Institute of Mental Health who is quite interested in this.) The poet is thought to have suffered from bipolar disorder, which can be tracked by variations in her writing rate. The question is whether the style or content of the poems also varied in a measurable way.

IHUT project — Terrorist manifestoes. We have a collection of terrorist-group manifestoes (all written in English) together with statistics on how violence-prone the groups are. We would like to correlate violence-proneness with the style or vocabulary of the manifesto. One graduate student (David Robinson) has worked on this and failed to get results, but there may still be possibilities. A definite negative result (proof that there is no correlation) would also be significant.

Syntactic stylometry. It is well known that authors can often be identified by their vocabulary, sentence length, and other measures of style. How about using a parser and measuring statistics of their sentence structure? Charles Hollingsworth has done some preliminary work on this.
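
To make the idea concrete, here is a rough sketch that uses part-of-speech tag bigrams as a shallow stand-in for parser output; real syntactic stylometry would use statistics drawn from full parse trees. It assumes the texts have already been tagged by some tagger, and the function names and the choice of distance measure are illustrative assumptions, not the method used in the preliminary work mentioned above.

```python
from collections import Counter

def tag_bigram_profile(tagged_sentences):
    """Relative frequency of each adjacent tag pair, with sentence boundary markers.

    tagged_sentences is a list of sentences, each a list of part-of-speech tags.
    """
    counts = Counter()
    for tags in tagged_sentences:
        padded = ["<s>"] + list(tags) + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bigram: n / total for bigram, n in counts.items()}

def profile_distance(p, q):
    """Total variation distance between two profiles: 0 = identical, 1 = disjoint."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical tag sequences standing in for two authors' tagged texts.
author_a = [["DT", "NN", "VBD"], ["DT", "JJ", "NN", "VBD"]]
author_b = [["PRP", "VBP", "RB"], ["PRP", "VBP", "DT", "NN"]]

print(profile_distance(tag_bigram_profile(author_a), tag_bigram_profile(author_b)))
```

The same two functions would work unchanged on any other discrete structural feature, such as the grammar rules used in each parse tree, once a parser is available.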

CPIDR 5. This is a software package that measures idea density and needs revising. Merely finishing CPIDR 5 is not enough for a thesis, but a set of interesting applications of it would be. Implementation is in C#.

Reusable parser for English. We have the Penn Treebank, and we have a reusable though imperfect part-of-speech tagger (ODT). It would be very useful to have a parser for English that performs well and can be used as a foundation for other research. (Implementation so far has been in C#.)

Visualization of properties of a text. An interesting paper called "Literature Fingerprinting" by Keim and Oelke makes colorful maps of texts reflecting the vocabulary and other characteristics. This is the tip of an iceberg — lots of powerful visualization techniques can be applied to natural-language material.



The content and opinions expressed on this Web page do not necessarily reflect the views of,
nor are they endorsed by, the University of Georgia or the University System of Georgia.