Tuesday, March 24, 2009

476/676 Writing Project

It is sometimes desirable to write comprehensive surveys of a given area of science, including a thorough review of the literature, and a list of open problems.

It is also sometimes useful to write a short summary of a given area, based on a few of the most recent or most important papers in the literature. One can then ask questions such as: what is the most important issue, and what is the most promising approach to addressing that issue? Such documents go by different names, e.g. white papers.

When one approaches an agency such as the National Science Foundation (NSF) for money to do research, the first step is to write a white paper. Its structure has four parts, based on these questions: what's the problem, what have others done to solve it, what is the most promising approach at this point, and what would it take to pursue that approach?

The 476/676 writing project this semester is to choose a topic within (or related to) information retrieval, and write a short white paper on that topic. The paper may be no more than five pages in length, single-spaced, with no more than ten references and at most three small figures.

Possible topics, not an exhaustive list: building large collections for IR evaluation; distributed IR, especially collection selection, or results fusion; cross-language IR; variations on the vector space model, such as GVSM; variations on latent semantic analysis; searching specialized corpora, such as music, patents, images, movies, etc.; searching the semantic web; specialized computer architectures for IR; clustering algorithms, especially variations on k-means; information filtering, especially spam detection; text summarization; information extraction; IR systems for the disabled; recommender systems; adversarial IR; text categorization, especially feature selection; use of linear algebra in IR, e.g. scalable matrix decomposition methods.

Many, but not all, of these may be mentioned in the textbook. We won't have a chance to cover any of them in much depth in class, although we will talk about LSA and clustering. But there is lots of material available on all of these topics!

Important dates: Choose a topic, and send an email to me (and cc Don Dimitroff) describing your topic. I may suggest that the topic be broadened or narrowed. Do this by April 7.
The paper will be due in class on May 12.
