252-0341-01L Information Retrieval
Semester | Spring Semester 2018 |
Lecturers | G. Fourny |
Periodicity | yearly recurring course |
Language of instruction | English |
Abstract | Introduction to information retrieval with a focus on text documents and images. Main topics comprise extraction of characteristic features from documents, index structures, retrieval models, search algorithms, benchmarking, and feedback mechanisms. Searching the web, images, and XML collections demonstrates recent applications of information retrieval and their implementation. |
Learning objective | In-depth understanding of how to model, index and query unstructured data (text): the vector space model, Boolean queries, terms, posting lists, and dealing with errors and imprecision. Knowledge of how to make queries faster and how to make queries work on very large datasets. Knowledge of how to evaluate the quality of an information retrieval engine. Knowledge of alternate models (structured data, probabilistic retrieval, language models) as well as basic search algorithms on the web such as Google's PageRank. |
Content | Tentative plan (subject to change). The lecture structure will follow the pedagogical approach of the book (see below). The field of information retrieval also encompasses machine learning aspects. However, we will make a conscious effort to limit overlap with, and be complementary to, the Introduction to Machine Learning lecture. 1. Introduction 2. The basics of how to index and query unstructured data 3. Pre-processing the data prior to indexing: building the term vocabulary, posting lists 4. Dealing with spelling errors: tolerant retrieval 5. Scaling up to large datasets 6. How to improve performance by compressing the index 7. Ranking the results: scores and the vector space model 8. Evaluating the quality of information retrieval: relevance 9. Query expansion 10. Structured retrieval: when the data is not quite unstructured (XML or HTML) 11. Alternate approach: Probabilistic information retrieval 12. Alternate approach: Language models 13. Crawling the Web 14. Link analysis (PageRank) |
Literature | C. D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press. |
Prerequisites / Notice | Prior knowledge in linear algebra, data structures and algorithms, and probability theory (at the Bachelor's level) is required. |