252-0341-01L  Information Retrieval

Semester: Spring Semester 2018
Lecturers: G. Fourny
Periodicity: yearly recurring course
Language of instruction: English

Abstract: Introduction to information retrieval with a focus on text documents and images.

Main topics include the extraction of characteristic features from documents, index structures, retrieval models, search algorithms, benchmarking, and feedback mechanisms. Web, image, and XML search illustrate recent applications of information retrieval and their implementation.
Objective: In-depth understanding of how to model, index and query unstructured data (text), including the vector space model, Boolean queries, terms, posting lists, and dealing with errors and imprecision.

Knowledge of how to make queries faster and how to make queries work on very large datasets. Knowledge of how to evaluate the quality of an information retrieval engine.

Knowledge of alternative models (structured data, probabilistic retrieval, language models) as well as basic web search algorithms such as Google's PageRank.
Content: Tentative plan (subject to change). The lecture structure will follow the pedagogical approach of the book (see below).

The field of information retrieval also encompasses machine learning aspects. However, we will make a conscious effort to limit overlap with, and be complementary to, the Introduction to Machine Learning lecture.

1. Introduction

2. The basics of how to index and query unstructured data

3. Pre-processing the data prior to indexing: building the term vocabulary, posting lists

4. Dealing with spelling errors: tolerant retrieval

5. Scaling up to large datasets

6. How to improve performance by compressing the index

7. Ranking the results: scores and the vector space model

8. Evaluating the quality of information retrieval: relevance

9. Query expansion

10. Structured retrieval: when the data is not quite unstructured (XML or HTML)

11. Alternate approach: Probabilistic information retrieval

12. Alternate approach: Language models

13. Crawling the Web

14. Link analysis (PageRank)
Literature: C. D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press.
Prerequisites / Notice: Prior knowledge of linear algebra, data structures and algorithms, and probability theory (at the Bachelor's level) is required.