Tuesday, February 10, 2009

notes before class 2/10/09

Last week we did chapters 1 and 2 of the textbook.

The Boolean retrieval model is not used in practice as much as in years past, but I think it is a helpful example of the issues that arise in IR systems in general. The concepts related to managing the "term space", i.e., the set of terms used to represent the documents in a corpus, are still quite relevant.
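
As a rough illustration of what a term space looks like in a Boolean retrieval setting, here is a minimal Python sketch. The toy documents and the names build_index and boolean_and are made up for this note, and splitting on whitespace is an assumed, deliberately simplistic way of choosing terms.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: terms are lowercased, whitespace-separated words.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Return the ids of documents that contain every query term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

# Hypothetical three-document corpus, for illustration only.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
}
index = build_index(docs)

# The term space is just the set of distinct terms in the index.
term_space = sorted(index)

print(term_space)
print(boolean_and(index, "home", "july"))
```

Changing how terms are chosen (stemming, stop-word removal, and so on) changes the term space, and with it which documents a given query matches.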

N-grams have both advantages and disadvantages compared with words as the terms used to represent a document. I'll explain this in class, at a later session if not today.
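
To make the comparison concrete, here is a small Python sketch. It assumes character n-grams (trigrams here) rather than word n-grams, and the helper names char_ngrams and word_terms are hypothetical. It shows only one commonly cited trade-off, which may not be the one covered in class: n-grams tolerate spelling variation, at the cost of producing many more terms per document.

```python
def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a string, in order."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def word_terms(text):
    """Return lowercased, whitespace-separated words."""
    return text.lower().split()

correct = "retrieval"
misspelled = "retreival"  # a made-up typo, for illustration

# As whole words, the two strings are simply different terms.
shared_words = set(word_terms(correct)) & set(word_terms(misspelled))

# As trigrams, they still share some terms despite the typo.
shared_trigrams = set(char_ngrams(correct)) & set(char_ngrams(misspelled))

print(char_ngrams(correct))
print(shared_words)
print(sorted(shared_trigrams))
```

One word becomes seven trigrams here, which hints at the cost side: an n-gram index holds many more term occurrences per document than a word index does.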

The first phase of the project is due today. Some students have offered to attach their programs to the submission, allowing me or Mr. Don to run the programs. This is acceptable but not necessary.

I'll be releasing phase two today, I think.