263-5200-00L Data Mining: Learning from Large Data Sets
Semester | Spring Semester 2015 |
Lecturers | A. Krause |
Periodicity | yearly recurring course |
Course | Does not take place this semester. |
Language of instruction | English |
Comment | The course will be offered again in the autumn semester 2015. |
Abstract | Many scientific and commercial applications require insights from massive, high-dimensional data sets. This course introduces principled, state-of-the-art techniques from statistics, algorithms, and discrete and convex optimization for learning from such large data sets. The course covers both theoretical foundations and practical applications. |
Objective | Many scientific and commercial applications require us to obtain insights from massive, high-dimensional data sets. In this graduate-level course, we will study principled, state-of-the-art techniques from statistics, algorithms, and discrete and convex optimization for learning from such large data sets. The course will cover both theoretical foundations and practical applications. |
Content | Topics covered:
- Dealing with large data (Data centers; Map-Reduce/Hadoop; Amazon Mechanical Turk)
- Fast nearest neighbor methods (Shingling, locality sensitive hashing)
- Online learning (Online optimization and regret minimization, online convex programming, applications to large-scale Support Vector Machines)
- Multi-armed bandits (exploration-exploitation tradeoffs, applications to online advertising and relevance feedback)
- Active learning (uncertainty sampling, pool-based methods, label complexity)
- Dimension reduction (random projections, nonlinear methods)
- Data streams (Sketches, coresets, applications to online clustering)
- Recommender systems |
Prerequisites / Notice | Prerequisites: Solid basic knowledge of statistics, algorithms and programming. Background in machine learning is helpful but not required. |