This course introduces the student to the foundations and state-of-the-art techniques in developing high performance software for mathematical functionality occurring in various fields in computer science. The focus is on optimizing for a single core and includes optimizing for the memory hierarchy, for special instruction sets, and the possible use of automatic performance tuning.
Learning objective
Software performance (i.e., runtime) arises through the complex interaction of an algorithm, its implementation, the compiler used, and the microarchitecture the program runs on. The first goal of the course is to provide the student with an understanding of this "vertical" interaction, and hence of software performance, for mathematical functionality. The second goal is to teach a systematic strategy for using this knowledge to write fast software for numerical problems. This strategy will be practiced in several homework assignments and a semester-long group project.
Content
The fast evolution and increasing complexity of computing platforms pose a major challenge for developers of high performance software for engineering, science, and consumer applications: it becomes increasingly hard to harness the available computing power. Straightforward implementations may lose as much as one or two orders of magnitude in performance. On the other hand, creating optimal implementations requires the developer to understand algorithms, the capabilities and limitations of compilers, and the target platform's architecture and microarchitecture.
This interdisciplinary course introduces the student to the foundations and state-of-the-art techniques in high performance mathematical software development, using important functionality such as matrix operations, transforms, filters, and others as examples. The course will explain how to optimize for the memory hierarchy, take advantage of special instruction sets, and address other details of current processors that require optimization. The concept of automatic performance tuning is introduced. The focus is on optimization for a single core; thus, the course complements others on parallel and distributed computing.
Finally, a general strategy for performance analysis and optimization is introduced, which the students will apply in the group projects that accompany the course.
Prerequisites / Notice
Solid knowledge of the C programming language and matrix algebra.
Performance assessment
Performance assessment information (valid until the course unit is held again)
Repetition only possible after re-enrolling for the course unit.
Additional information on mode of examination
The grade for the course is determined by several homework assignments (30%), one midterm exam (30%), and one semester-long project with final report and presentation (40%). The midterm exam cannot be repeated!
Last cancellation/deregistration date for this graded semester performance: second Friday in March! Please note that no deregistration will be accepted after that date, and the course will then be counted as a "fail".