Search result: Catalogue data in Spring Semester 2020

Computational Biology and Bioinformatics Master Information
More information at: https://www.cbb.ethz.ch
Advanced Courses
A total of 30 ECTS must be acquired in the Advanced Courses category: 18 ECTS in the Theory category and 12 ECTS in the Biology category.
Note that some of the lectures are being recorded: https://video.ethz.ch/lectures.html
Theory
At least 18 ECTS need to be acquired in this category.
Number | Title | Type | ECTS | Hours | Lecturers
252-0063-00L | Data Modelling and Databases (restricted registration) | W | 7 credits | 4V + 2U | C. Zhang
Abstract: Data modelling (entity-relationship), relational data model, relational design theory (normal forms), SQL, database integrity, transactions, and advanced database engines.
Objective: Introduction to relational databases and data management. Basics of SQL programming and transaction management.
Content: The course covers the basic aspects of the design and implementation of databases and information systems. The course focuses on relational databases as a starting point but will also cover data management issues beyond databases, such as transactional consistency, replication, data warehousing, other data models, as well as SQL.
Literature: Kemper, Eickler: Datenbanksysteme: Eine Einführung. Oldenbourg Verlag, 7th edition, 2009.

Garcia-Molina, Ullman, Widom: Database Systems: The Complete Book. Pearson, 2nd edition, 2008.
401-0674-00L | Numerical Methods for Partial Differential Equations | W | 10 credits | 2G + 2U + 2P + 4A | R. Hiptmair
Not meant for BSc/MSc students of mathematics.
Abstract: Derivation, properties, and implementation of fundamental numerical methods for a few key partial differential equations: convection-diffusion, heat equation, wave equation, conservation laws. Implementation in C++ based on a finite element library.
Objective: Main skills to be acquired in this course:
* Ability to implement fundamental numerical methods for the solution of partial differential equations efficiently.
* Ability to modify and adapt numerical algorithms guided by awareness of their mathematical foundations.
* Ability to select and assess numerical methods in light of the predictions of theory.
* Ability to identify features of a PDE (= partial differential equation) based model that are relevant for the selection and performance of a numerical algorithm.
* Ability to understand research publications on theoretical and practical aspects of numerical methods for partial differential equations.
* Skills in the efficient implementation of finite element methods on unstructured meshes.

This course is neither a course on the mathematical foundations and numerical analysis of methods nor a course that merely teaches recipes and how to apply software packages.
Content:
1 Second-Order Scalar Elliptic Boundary Value Problems
1.2 Equilibrium Models: Examples
1.3 Sobolev spaces
1.4 Linear Variational Problems
1.5 Equilibrium Models: Boundary Value Problems
1.6 Diffusion Models (Stationary Heat Conduction)
1.7 Boundary Conditions
1.8 Second-Order Elliptic Variational Problems
1.9 Essential and Natural Boundary Conditions
2 Finite Element Methods (FEM)
2.2 Principles of Galerkin Discretization
2.3 Case Study: Linear FEM for Two-Point Boundary Value Problems
2.4 Case Study: Triangular Linear FEM in Two Dimensions
2.5 Building Blocks of General Finite Element Methods
2.6 Lagrangian Finite Element Methods
2.7 Implementation of Finite Element Methods
2.7.1 Mesh Generation and Mesh File Format
2.7.2 Mesh Information and Mesh Data Structures
2.7.2.1 LehrFEM++ Mesh: Container Layer
2.7.2.2 LehrFEM++ Mesh: Topology Layer
2.7.2.3 LehrFEM++ Mesh: Geometry Layer
2.7.3 Vectors and Matrices
2.7.4 Assembly Algorithms
2.7.4.1 Assembly: Localization
2.7.4.2 Assembly: Index Mappings
2.7.4.3 Distribute Assembly Schemes
2.7.4.4 Assembly: Linear Algebra Perspective
2.7.5 Local Computations
2.7.5.1 Analytic Formulas for Entries of Element Matrices
2.7.5.2 Local Quadrature
2.7.6 Treatment of Essential Boundary Conditions
2.8 Parametric Finite Element Methods
3 FEM: Convergence and Accuracy
3.1 Abstract Galerkin Error Estimates
3.2 Empirical (Asymptotic) Convergence of Lagrangian FEM
3.3 A Priori (Asymptotic) Finite Element Error Estimates
3.4 Elliptic Regularity Theory
3.5 Variational Crimes
3.6 FEM: Duality Techniques for Error Estimation
3.7 Discrete Maximum Principle
3.8 Validation and Debugging of Finite Element Codes
4 Beyond FEM: Alternative Discretizations [dropped]
5 Non-Linear Elliptic Boundary Value Problems [dropped]
6 Second-Order Linear Evolution Problems
6.1 Time-Dependent Boundary Value Problems
6.2 Parabolic Initial-Boundary Value Problems
6.3 Linear Wave Equations
7 Convection-Diffusion Problems [dropped]
8 Numerical Methods for Conservation Laws
8.1 Conservation Laws: Examples
8.2 Scalar Conservation Laws in 1D
8.3 Conservative Finite Volume (FV) Discretization
8.4 Timestepping for Finite-Volume Methods
8.5 Higher-Order Conservative Finite-Volume Schemes
Lecture notes: The lecture will be taught in flipped classroom format:
- Video tutorials for all thematic units will be published online.
- Tablet notes accompanying the videos will be made available to the audience as PDF.
- A comprehensive lecture document will cover all aspects of the course.
Literature: Chapters of the following books provide supplementary reading (detailed references in course material):

* D. Braess: Finite Elemente: Theorie, schnelle Löser und Anwendungen in der Elastizitätstheorie, Springer 2007 (available online).
* S. Brenner and R. Scott. Mathematical theory of finite element methods, Springer 2008 (available online).
* A. Ern and J.-L. Guermond. Theory and Practice of Finite Elements, volume 159 of Applied Mathematical Sciences. Springer, New York, 2004.
* Ch. Großmann and H.-G. Roos: Numerical Treatment of Partial Differential Equations, Springer 2007.
* W. Hackbusch. Elliptic Differential Equations. Theory and Numerical Treatment, volume 18 of Springer Series in Computational Mathematics. Springer, Berlin, 1992.
* P. Knabner and L. Angermann. Numerical Methods for Elliptic and Parabolic Partial Differential Equations, volume 44 of Texts in Applied Mathematics. Springer, Heidelberg, 2003.
* S. Larsson and V. Thomée. Partial Differential Equations with Numerical Methods, volume 45 of Texts in Applied Mathematics. Springer, Heidelberg, 2003.
* R. LeVeque. Finite Volume Methods for Hyperbolic Problems. Cambridge Texts in Applied Mathematics. Cambridge University Press, Cambridge, UK, 2002.

However, studying the supplementary literature is not essential for following the course.
Prerequisites / Notice: Mastery of basic calculus and linear algebra is taken for granted.
Familiarity with fundamental numerical methods (solution methods for linear systems of equations, interpolation, approximation, numerical quadrature, numerical integration of ODEs) is essential.

Important: Coding skills and experience in C++ are essential.

Homework assignments involve substantial coding, partly based on a C++ finite element library. The written examination will be computer based and will comprise coding tasks.
401-3052-05L | Graph Theory | W | 5 credits | 2V + 1U | B. Sudakov
Abstract: Basic notions, trees, spanning trees, Cayley's formula, vertex and edge connectivity, 2-connectivity, Mader's theorem, Menger's theorem, Eulerian graphs, Hamilton cycles, Dirac's theorem, matchings, theorems of Hall, König and Tutte, planar graphs, Euler's formula, basic non-planar graphs, graph colorings, greedy colorings, Brooks' theorem, 5-colorings of planar graphs.
Objective: The students will get an overview of the most fundamental questions concerning graph theory. We expect them to understand the proof techniques and to use them autonomously on related problems.
Lecture notes: The lecture will be given at the blackboard only.
Literature: West, D.: "Introduction to Graph Theory"
Diestel, R.: "Graph Theory"

Further literature links will be provided in the lecture.
Prerequisites / Notice: Students are expected to have a mathematical background and should be able to write rigorous proofs.


NOTICE: This course unit was previously offered as 252-1408-00L Graphs and Algorithms.
227-1034-00L | Computational Vision (University of Zurich) | W | 6 credits | 2V + 1U | D. Kiper
No enrolment for this course at ETH Zurich. Book the corresponding module directly at UZH.
UZH Module Code: INI402

Mind the enrolment deadlines at UZH:
https://www.uzh.ch/cmsssl/en/studies/application/mobilitaet.html
Abstract: This course focuses on neural computations that underlie visual perception. We study how visual signals are processed in the retina, LGN and visual cortex. We study the morphology and functional architecture of cortical circuits responsible for pattern, motion, color, and three-dimensional vision.
Objective: This course considers the operation of circuits in the process of neural computations. The evolution of neural systems will be considered to demonstrate how neural structures and mechanisms are optimised for energy capture, transduction, transmission and representation of information. Canonical brain circuits will be described as models for the analysis of sensory information. The concept of receptive fields will be introduced and their role in coding spatial and temporal information will be considered. The constraints of the bandwidth of neural channels and the mechanisms of normalization by neural circuits will be discussed.
The visual system will form the basis of case studies in the computation of form, depth, and motion. The role of multiple channels and collective computations for object recognition will be considered. Coordinate transformations of space and time by cortical and subcortical mechanisms will be analysed. The means by which sensory and motor systems are integrated to allow for adaptive behaviour will be considered.
Content: This course considers the operation of circuits in the process of neural computations. The evolution of neural systems will be considered to demonstrate how neural structures and mechanisms are optimised for energy capture, transduction, transmission and representation of information. Canonical brain circuits will be described as models for the analysis of sensory information. The concept of receptive fields will be introduced and their role in coding spatial and temporal information will be considered. The constraints of the bandwidth of neural channels and the mechanisms of normalization by neural circuits will be discussed.
The visual system will form the basis of case studies in the computation of form, depth, and motion. The role of multiple channels and collective computations for object recognition will be considered. Coordinate transformations of space and time by cortical and subcortical mechanisms will be analysed. The means by which sensory and motor systems are integrated to allow for adaptive behaviour will be considered.
Literature: Books (recommended references, not required):
1. An Introduction to Natural Computation, D. Ballard (Bradford Books, MIT Press) 1997.
2. The Handbook of Brain Theory and Neural Networks, M. Arbib (editor), (MIT Press) 1995.
227-0558-00L | Principles of Distributed Computing | W | 7 credits | 2V + 2U + 2A | R. Wattenhofer, M. Ghaffari
Abstract: We study the fundamental issues underlying the design of distributed systems: communication, coordination, fault-tolerance, locality, parallelism, self-organization, symmetry breaking, synchronization, uncertainty. We explore essential algorithmic ideas and lower bound techniques.
Objective: Distributed computing is essential in modern computing and communications systems. Examples are on the one hand large-scale networks such as the Internet, and on the other hand multiprocessors such as your new multi-core laptop. This course introduces the principles of distributed computing, emphasizing the fundamental issues underlying the design of distributed systems and networks: communication, coordination, fault-tolerance, locality, parallelism, self-organization, symmetry breaking, synchronization, uncertainty. We explore essential algorithmic ideas and lower bound techniques, basically the "pearls" of distributed computing. We will cover a fresh topic every week.
Content: Distributed computing models and paradigms, e.g. message passing, shared memory, synchronous vs. asynchronous systems, time and message complexity, peer-to-peer systems, small-world networks, social networks, sorting networks, wireless communication, and self-organizing systems.

Distributed algorithms, e.g. leader election, coloring, covering, packing, decomposition, spanning trees, mutual exclusion, store and collect, arrow, ivy, synchronizers, diameter, all-pairs-shortest-path, wake-up, and lower bounds.
Lecture notes: Available. Our course script is used at dozens of other universities around the world.
Literature: Lecture notes by Roger Wattenhofer. These lecture notes are used at about a dozen different universities around the world.

Distributed Computing: Fundamentals, Simulations and Advanced Topics
Hagit Attiya, Jennifer Welch.
McGraw-Hill Publishing, 1998, ISBN 0-07-709352-6

Introduction to Algorithms
Thomas Cormen, Charles Leiserson, Ronald Rivest.
The MIT Press, 1998, ISBN 0-262-53091-0 or 0-262-03141-8

Dissemination of Information in Communication Networks
Juraj Hromkovic, Ralf Klasing, Andrzej Pelc, Peter Ruzicka, Walter Unger.
Springer-Verlag, Berlin Heidelberg, 2005, ISBN 3-540-00846-2

Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes
Frank Thomson Leighton.
Morgan Kaufmann Publishers Inc., San Francisco, CA, 1991, ISBN 1-55860-117-1

Distributed Computing: A Locality-Sensitive Approach
David Peleg.
Society for Industrial and Applied Mathematics (SIAM), 2000, ISBN 0-89871-464-8
Prerequisites / Notice: Course prerequisites: Interest in algorithmic problems. (No particular course needed.)
401-3632-00L | Computational Statistics | W | 8 credits | 3V + 1U | M. H. Maathuis
Abstract: We discuss modern statistical methods for data analysis, including methods for data exploration, prediction and inference. We pay attention to algorithmic aspects, theoretical properties and practical considerations. The class is hands-on and methods are applied using the statistical programming language R.
Objective: The student obtains an overview of modern statistical methods for data analysis, including their algorithmic aspects and theoretical properties. The methods are applied using the statistical programming language R.
Content: See the class website.
Prerequisites / Notice: At least one semester of (basic) probability and statistics.

Programming experience is helpful but not required.
101-0178-01L | Uncertainty Quantification in Engineering | W | 3 credits | 2G | S. Marelli
Abstract: Uncertainty quantification aims at studying the impact of aleatory and epistemic uncertainty on computational models used in science and engineering. The course introduces the basic concepts of uncertainty quantification: probabilistic modelling of data (copula theory), uncertainty propagation techniques (Monte Carlo simulation, polynomial chaos expansions), and sensitivity analysis.
Objective: After this course students will be able to properly pose an uncertainty quantification problem, select the appropriate computational methods and interpret the results in meaningful statements for field scientists, engineers and decision makers. The course is suitable for any master/Ph.D. student in engineering or natural sciences, physics, mathematics, computer science with a basic knowledge in probability theory.
Content: The course introduces uncertainty quantification through a set of practical case studies that come from civil, mechanical, nuclear and electrical engineering, from which a general framework is introduced. The course is then divided into three blocks: probabilistic modelling (introduction to copula theory), uncertainty propagation (Monte Carlo simulation and polynomial chaos expansions) and sensitivity analysis (correlation measures, Sobol' indices). Each block contains lectures and tutorials using Matlab and the in-house software UQLab (www.uqlab.com).
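As an illustration of the uncertainty propagation block, the following is a minimal Monte Carlo sketch in Python. It is not taken from the course material or from UQLab: the toy model (`model`), the input distributions, and the function name `propagate` are all assumptions chosen for illustration.

```python
import random
import statistics


def model(load, stiffness):
    """Hypothetical toy model (assumption): deflection = load / stiffness."""
    return load / stiffness


def propagate(n_samples, seed=0):
    """Propagate input uncertainty through the model by plain Monte Carlo sampling."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        load = rng.gauss(10.0, 1.0)        # assumed Gaussian input
        stiffness = rng.uniform(4.0, 6.0)  # assumed uniform input
        outputs.append(model(load, stiffness))
    # Summarise the output distribution by its sample mean and standard deviation.
    return statistics.mean(outputs), statistics.stdev(outputs)


mean, std = propagate(10_000)
print(f"mean = {mean:.3f}, std = {std:.3f}")
```

Plain sampling is the simplest propagation technique; polynomial chaos expansions, also named above, replace the model by a surrogate and are not shown here.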
Lecture notes: Detailed slides are provided for each lecture. A printed script gathering all the lecture slides may be bought at the beginning of the semester.
Prerequisites / Notice: A basic background in probability theory and statistics (bachelor level) is required. A summary of useful notions will be handed out at the beginning of the course.

A good knowledge of Matlab is required to participate in the tutorials and for the mini-project.
252-0526-00L | Statistical Learning Theory | W | 7 credits | 3V + 2U + 1A | J. M. Buhmann, C. Cotrini Jimenez
Abstract: The course covers advanced methods of statistical learning:

- Variational methods and optimization.
- Deterministic annealing.
- Clustering for diverse types of data.
- Model validation by information theory.
Objective: The course surveys recent methods of statistical learning. The fundamentals of machine learning, as presented in the courses "Introduction to Machine Learning" and "Advanced Machine Learning", are expanded from the perspective of statistical learning.
Content:
- Variational methods and optimization. We consider optimization approaches for problems where the optimizer is a probability distribution. We will discuss concepts like maximum entropy, information bottleneck, and deterministic annealing.

- Clustering. This is the problem of sorting data into groups without using training samples. We discuss alternative notions of "similarity" between data points and adequate optimization procedures.

- Model selection and validation. This refers to the question of how complex the chosen model should be. In particular, we present an information theoretic approach for model validation.

- Statistical physics models. We discuss approaches for approximately optimizing large systems, which originate in statistical physics (free energy minimization applied to spin glasses and other models). We also study sampling methods based on these models.
Lecture notes: A draft of a script will be provided. Lecture slides will be made available.
Literature: Hastie, Tibshirani, Friedman: The Elements of Statistical Learning, Springer, 2001.

L. Devroye, L. Gyorfi, and G. Lugosi: A probabilistic theory of pattern recognition. Springer, New York, 1996
Prerequisites / Notice: Knowledge of machine learning (introduction to machine learning and/or advanced machine learning)
Basic knowledge of statistics.
227-0216-00L | Control Systems II | W | 6 credits | 4G | R. Smith
Abstract: Introduction to basic and advanced concepts of modern feedback control.
Objective: Introduction to basic and advanced concepts of modern feedback control.
Content: This course is designed as a direct continuation of the course "Regelsysteme" (Control Systems). The primary goal is to further familiarize students with various dynamic phenomena and their implications for the analysis and design of feedback controllers. Simplifying assumptions on the underlying plant that were made in the course "Regelsysteme" are relaxed, and advanced concepts and techniques that allow the treatment of typical industrial control problems are presented. Topics include control of systems with multiple inputs and outputs, control of uncertain systems (robustness issues), limits of achievable performance, and controller implementation issues.
Lecture notes: The slides of the lecture are available to download.
Literature: Skogestad, Postlethwaite: Multivariable Feedback Control - Analysis and Design. Second Edition. John Wiley, 2005.
Prerequisites / Notice: Prerequisites:
Control Systems or equivalent
151-0566-00L | Recursive Estimation | W | 4 credits | 2V + 1U | R. D'Andrea
Abstract: Estimation of the state of a dynamic system based on a model and observations in a computationally efficient way.
Objective: Learn the basic recursive estimation methods and their underlying principles.
Content: Introduction to state estimation; probability review; Bayes' theorem; Bayesian tracking; extracting estimates from probability distributions; Kalman filter; extended Kalman filter; particle filter; observer-based control and the separation principle.
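To illustrate what "recursive" means here, the following is a minimal sketch of a scalar Kalman filter in Python. It is not taken from the course notes: the function name `kalman_1d`, the random-walk state model, and all numerical values are assumptions chosen for illustration.

```python
import random


def kalman_1d(measurements, meas_var, process_var=0.0, x0=0.0, p0=1e3):
    """Scalar Kalman filter for the assumed model x_k = x_{k-1} + w_k, z_k = x_k + v_k."""
    x, p = x0, p0  # prior mean and variance (large p0 = vague prior)
    estimates = []
    for z in measurements:
        # Prediction step: the state model is the identity, so only the variance grows.
        p = p + process_var
        # Measurement update: blend prediction and measurement via the Kalman gain.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates, p


rng = random.Random(1)
true_value = 5.0  # hypothetical constant state to be estimated
zs = [true_value + rng.gauss(0.0, 0.5) for _ in range(200)]
estimates, final_var = kalman_1d(zs, meas_var=0.25)
print(estimates[-1], final_var)
```

Each measurement is processed once and then discarded; only the current mean and variance are carried forward, which is what makes the estimator computationally efficient.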
Lecture notes: Lecture notes available on course website: http://www.idsc.ethz.ch/education/lectures/recursive-estimation.html
Prerequisites / Notice: Requirements: Introductory probability theory and matrix-vector algebra.
401-3642-00L | Brownian Motion and Stochastic Calculus | W | 10 credits | 4V + 1U | W. Werner
Abstract: This course covers some basic objects of stochastic analysis. In particular, the following topics are discussed: construction and properties of Brownian motion, stochastic integration, Ito's formula and applications, stochastic differential equations and connection with partial differential equations.
Objective: This course covers some basic objects of stochastic analysis. In particular, the following topics are discussed: construction and properties of Brownian motion, stochastic integration, Ito's formula and applications, stochastic differential equations and connection with partial differential equations.
Lecture notes: Lecture notes will be distributed in class.
Literature:
- J.-F. Le Gall, Brownian Motion, Martingales, and Stochastic Calculus, Springer (2016).
- I. Karatzas, S. Shreve, Brownian Motion and Stochastic Calculus, Springer (1991).
- D. Revuz, M. Yor, Continuous Martingales and Brownian Motion, Springer (2005).
- L.C.G. Rogers, D. Williams, Diffusions, Markov Processes and Martingales, vol. 1 and 2, Cambridge University Press (2000).
- D.W. Stroock, S.R.S. Varadhan, Multidimensional Diffusion Processes, Springer (2006).
Prerequisites / Notice: Familiarity with measure-theoretic probability as in the standard D-MATH course "Probability Theory" will be assumed. Textbook accounts can be found for example in
- J. Jacod, P. Protter, Probability Essentials, Springer (2004).
- R. Durrett, Probability: Theory and Examples, Cambridge University Press (2010).
401-3602-00L | Applied Stochastic Processes | W | 8 credits | 3V + 1U | not available
Does not take place this semester.
Abstract: Poisson processes; renewal processes; Markov chains in discrete and in continuous time; some applications.
Objective: Stochastic processes are a way to describe and study the behaviour of systems that evolve in some random way. In this course, the evolution will be with respect to a scalar parameter interpreted as time, so that we discuss the temporal evolution of the system. We present several classes of stochastic processes, analyse their properties and behaviour and show by some examples how they can be used. The main emphasis is on theory; in that sense, "applied" should be understood to mean "applicable".
Literature: R. N. Bhattacharya and E. C. Waymire, "Stochastic Processes with Applications", SIAM (2009), available online: http://epubs.siam.org/doi/book/10.1137/1.9780898718997
R. Durrett, "Essentials of Stochastic Processes", Springer (2012), available online: http://link.springer.com/book/10.1007/978-1-4614-3615-7/page/1
M. Lefebvre, "Applied Stochastic Processes", Springer (2007), available online: http://link.springer.com/book/10.1007/978-0-387-48976-6/page/1
S. I. Resnick, "Adventures in Stochastic Processes", Birkhäuser (2005)
Prerequisites / Notice: Prerequisites are familiarity with (measure-theoretic) probability theory as it is treated in the course "Probability Theory" (401-3601-00L).
636-0530-00L | High Performance Computing | W | 4 credits | 4G | external organisers
Abstract
Objective
262-6220-00L | Molecular Dynamics Simulations with Applications in Soft Matter | W | 3 credits | 3V | external organisers
Does not take place this semester.
Abstract
Objective
262-0200-00L | Bayesian Phylodynamics | W | 4 credits | 2G + 2A | T. Stadler, T. Vaughan
Abstract: How fast was Ebola spreading in West Africa? Where and when did the epidemic outbreak start? How can we construct the phylogenetic tree of great apes, and did gene flow occur between different apes? At the end of the course, students will have designed, performed, presented, and discussed their own phylodynamic data analysis to answer such questions.
Objective: Attendees will extend their knowledge of Bayesian phylodynamics obtained in the “Computational Biology” class (636-0017-00L) and will learn how to apply this theory to real world data. The main theoretical concepts introduced are:
* Bayesian statistics
* Phylogenetic and phylodynamic models
* Markov Chain Monte Carlo methods
Attendees will apply these concepts to a number of applications yielding biological insight into:
* Epidemiology
* Pathogen evolution
* Macroevolution of species
Content: In the first part of the semester, in each week, we will first present the theoretical concepts of Bayesian phylodynamics. The presentation will be followed by attendees using the software package BEAST v2 to apply these theoretical concepts to empirical data. We use previously published datasets on e.g. Ebola, Zika, Yellow Fever, Apes, and Penguins for analysis. Examples of these practical tutorials are available on https://taming-the-beast.org/.
In the second part of the semester, the students choose an empirical dataset of genetic sequencing data and possibly some non-genetic metadata. They then design and conduct a research project in which they perform Bayesian phylogenetic analyses of their dataset. The weekly class is intended to discuss and monitor progress and to address students’ questions very interactively. At the end of the semester, the students present their research project in an oral presentation. The content of the presentation, the style of the presentation, and the performance in answering the questions after the presentation will be marked.
Lecture notes: Lecture slides will be available on Moodle.
Literature: The following books provide excellent background material:
• Drummond, A. & Bouckaert, R. 2015. Bayesian evolutionary analysis with BEAST.
• Yang, Z. 2014. Molecular Evolution: A Statistical Approach.
• Felsenstein, J. 2003. Inferring Phylogenies.
The tutorials in this course are based on our Summer School “Taming the BEAST”: https://taming-the-beast.org/
Prerequisites / Notice: This class builds upon the content which we teach in the Computational Biology class (636-0017-00L). Attendees must have either taken the Computational Biology class or acquired the content elsewhere.
261-5113-00L | Computational Challenges in Medical Genomics (restricted registration) | W | 2 credits | 2S | A. Kahles, G. Rätsch
Number of participants limited to 20.
Abstract: This seminar discusses recent relevant contributions to the fields of computational genomics, algorithmic bioinformatics, statistical genetics and related areas. Each participant will hold a presentation and lead the subsequent discussion.
Objective: Preparing and holding a scientific presentation in front of peers is a central part of working in the scientific domain. In this seminar, the participants will learn how to efficiently summarize the relevant parts of a scientific publication, critically reflect on its contents, and summarize it for presentation to an audience. The skills needed to successfully present the key points of existing research work are the same as those needed to communicate one's own research ideas.
In addition to holding a presentation, each student will both contribute to and lead a discussion section on the topics presented in the class.
Content: The topics covered in the seminar are related to recent computational challenges that arise from the fields of genomics and biomedicine, including but not limited to genomic variant interpretation, genomic sequence analysis, compressive genomics tasks, single-cell approaches, privacy considerations, statistical frameworks, etc.
The list of selected papers includes both recently published works contributing novel ideas to the areas mentioned above and seminal contributions from the past.
Prerequisites / Notice: Knowledge of algorithms and data structures and interest in applications in genomics and computational biomedicine.
262-6240-00L | Distributed Information Systems | W | 4 credits | 2V | external organisers
Mutually exclusive courses in the advanced course category: "Distributed Information Systems" (Uni Basel) and "Principles of Distributed Computing" (ETH Zürich).
Abstract
Objective
252-0834-00L | Information Systems for Engineers | W | 4 credits | 2V + 1U | G. Fourny
From HS20 (autumn semester 2020) onwards, offered only in the autumn semester.
Abstract: This course provides the basics of relational databases from the perspective of the user.

We will discover why tables are so incredibly powerful to express relations, learn the SQL query language, and how to make the most of it. The course also covers support for data cubes (analytics).
Objective: This course is complementary to Big Data for Engineers, as the two cover different time periods of database history and practices; you can even take both lectures at the same time.

After attending this course, you will be able to:

1. Explain in your own words, at a high level, how a relational database works and what it can do.

2. Explain the relational data model (tables, rows, attributes, primary keys, foreign keys), formally and informally, including the relational algebra operators (select, project, rename, all kinds of joins, division, cartesian product, union, intersection, etc).

3. Perform non-trivial reading SQL queries on existing relational databases, as well as insert new data, update and delete existing data.

4. Design new schemas to store data in accordance with the real world's constraints, such as relationship cardinality.

5. Explain what bad design is and why it matters.

6. Adapt and improve an existing schema to make it more robust against anomalies, thanks to a very good theoretical knowledge of what is called "normal forms".

7. Understand how indices work (hash indices, B-trees), how they are implemented, and how to use them to make queries faster.

8. Access an existing relational database from a host language such as Java, using bridges such as JDBC.

9. Explain what data independence is all about and why it has not aged a bit since the 1970s.

10. Explain, at a high level, how a relational database is physically implemented.

11. Know and deal with the natural syntax for relational data, CSV.

12. Explain the data cube model including slicing and dicing.

13. Store data cubes in a relational database.

14. Map cube queries to SQL.

15. Slice and dice cubes in a UI.

And of course, you will think that tables are the most wonderful object in the world.
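Several of the objectives above (primary and foreign keys, inserting, updating and deleting data, read queries with joins, access from a host language) can be sketched with Python's built-in `sqlite3` module. This is an illustration only, not course material: the course itself names Java and JDBC as the host-language example, and the table names, column names and rows below are invented for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical two-table schema: each course references exactly one lecturer.
conn.executescript("""
    CREATE TABLE lecturer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE course (
        id          INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        credits     INTEGER NOT NULL,
        lecturer_id INTEGER NOT NULL REFERENCES lecturer(id)
    );
""")

# Insert, update and delete data (all rows are made up for illustration).
conn.executemany("INSERT INTO lecturer VALUES (?, ?)",
                 [(1, "A. Example"), (2, "B. Sample")])
conn.executemany("INSERT INTO course VALUES (?, ?, ?, ?)",
                 [(1, "Databases", 4, 1), (2, "Statistics", 8, 2), (3, "Data Cubes", 3, 1)])
conn.execute("UPDATE course SET credits = 5 WHERE title = 'Databases'")
conn.execute("DELETE FROM course WHERE title = 'Data Cubes'")

# Read query: join both tables and aggregate credits per lecturer.
rows = conn.execute("""
    SELECT l.name, SUM(c.credits) AS total
    FROM course AS c JOIN lecturer AS l ON l.id = c.lecturer_id
    GROUP BY l.name
    ORDER BY l.name
""").fetchall()
print(rows)
```

With foreign keys enabled, inserting a course whose `lecturer_id` does not exist in `lecturer` raises `sqlite3.IntegrityError`, which is one way the schema encodes a real-world constraint.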
Content:
Using a relational database
=================
1. Introduction
2. The relational model
3. Data definition with SQL
4. The relational algebra
5. Queries with SQL

Taking a relational database to the next level
=================
6. Database design theory
7. Databases and host languages
8. Databases and host languages
9. Indices and optimization
10. Database architecture and storage

Analytics on top of a relational database
=================
12. Data cubes

Outlook
=================
13. Outlook
Literature:
- Lecture material (slides).

- Book: "Database Systems: The Complete Book", H. Garcia-Molina, J.D. Ullman, J. Widom
(It is not required to buy the book, as the library has it)
Prerequisites / Notice: For non-CS/DS students only, BSc and MSc
Elementary knowledge of set theory and logic
Knowledge of, and basic experience with, a programming language such as Pascal, C, C++, Java, Haskell, Python
636-0022-00L | Design of Experiments | W | 4 credits | 3G | H.-M. Kaltenbach
Abstract: The course introduces 'classical' statistical design of experiments, particularly designs for blocking, full and fractional factorial designs with confounding, and response surface methods. Topics covered include (restricted) randomization and blocking, sample size and power calculations, confounding, and basics of analysis-of-variance methods for analysis including random effects and nesting.
Objective: Students will learn about the statistical basics of designing and analyzing experiments with multiple qualitative and/or quantitative variables. Students will be able to construct designs for efficiently identifying important influence factors in their experiments, use sequential designs for optimizing experimental conditions, and correctly handle analyses with nested sampling or involving multiple comparisons.
Content: The course introduces the basics of statistical design of experiments. We will start by discussing the role of randomization for the validity of inferences, see how replication (i.e., sample size) affects the precision of estimates that can be made, how we deal with nested replication (for example, taking several measurements on the same animal), and how we correctly handle multiple comparisons based on the same data.

We will then discuss how restrictions of randomization lead to blocked designs, which serve to improve precision of comparisons between experimental conditions. Such designs are also important to avoid confounding of the experimental effect of interest with other effects of no interest, e.g., to handle batch effects that are common in biological experimentation.

Next, we learn how to design efficient experiments with multiple factors of interest. In contrast to a one-variable-at-a-time approach, factorial designs allow investigation of multiple factors simultaneously, and under some assumptions on the interplay of the factors, we may even get away with only a fraction of all possible factor combinations while still getting all the information we need.
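
To illustrate the idea of running only a fraction of all factor combinations, here is a minimal Python sketch that builds a full two-level design for three factors and then a half fraction of it. The factor names A, B, C and the defining relation I = ABC are assumptions chosen for illustration, not designs prescribed by the course.

```python
from itertools import product

factors = ["A", "B", "C"]  # hypothetical two-level factors

# Full 2^3 factorial: every combination of low (-1) and high (+1) levels.
full_design = list(product([-1, 1], repeat=len(factors)))

# Half fraction 2^(3-1) with the assumed defining relation I = ABC:
# keep only the runs where the product of the three levels is +1.
half_fraction = [run for run in full_design if run[0] * run[1] * run[2] == 1]

for run in half_fraction:
    print(dict(zip(factors, run)))
```

The half fraction needs four runs instead of eight. The price is that in every retained run the level of C equals the product of the levels of A and B, so the main effect of C cannot be separated from the A-by-B interaction unless that interaction is assumed to be negligible.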

We then discuss optimizing the combination of factors with respect to some response function, such as optimizing the composition of a medium solution to achieve maximum growth rate. Response surface methods offer an efficient and systematic way of finding optimal conditions with low effort through sequential experimentation; they are also common in industrial (engineering) applications.
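
A heavily simplified sketch of one step of such a sequential optimization, assuming a single factor measured at three coded levels (-1, 0, +1): a quadratic is fitted through the three responses and its stationary point suggests where to experiment next. The function name quadratic_optimum and the growth rates are hypothetical; real response surface designs typically involve several factors, replication, and least-squares fitting.

```python
def quadratic_optimum(y_low, y_mid, y_high):
    """Fit y = b0 + b1*x + b2*x^2 through responses at coded levels -1, 0, +1
    (so b0 = y_mid) and return the stationary point x* = -b1 / (2*b2)."""
    b1 = (y_high - y_low) / 2
    b2 = (y_high + y_low) / 2 - y_mid
    if b2 >= 0:
        # The fitted parabola opens upwards or is a line: no maximum exists.
        raise ValueError("fitted curve has no interior maximum")
    return -b1 / (2 * b2)


# Hypothetical growth rates measured at low, centre, and high concentration.
x_star = quadratic_optimum(0.8, 1.2, 1.0)
print(round(x_star, 3))
```

With these made-up numbers the estimated optimum lies slightly above the centre level, so the next round of experiments would be centred there.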

Throughout the course, we will touch on several additional topics without getting into much detail, such as designs that are 'optimal' for either inference or prediction, and designs where experimental conditions are nested (e.g., split-plot designs).

The course assumes familiarity with the content of a typical introductory course in statistics: distributions and random variables, estimators and confidence intervals, hypothesis testing using p-values and false positives/negatives, and basics of linear regression or analysis of variance.
Lecture notesCourse material will be made available at: http://www.csb.ethz.ch/education/lectures.html
LiteratureMain text:
Gary W. Oehlert: A first course in design and analysis of experiments, Freeman (http://users.stat.umn.edu/~gary/Book.html)
Additional texts:
D. R. Cox: Planning of Experiments, Wiley
G. Casella: Statistical Design, Springer
H. R. Lindman: Analysis of variance in complex experimental designs, Freeman (now Springer)
252-3900-00L  Big Data for Engineers
This course is not intended for Computer Science and Data Science MSc students!
W  6 credits  2V + 2U + 1A  G. Fourny
AbstractThis course is part of the series of database lectures offered to all ETH departments, together with Information Systems for Engineers. It introduces the most recent advances in the database field: how do we scale storage and querying to Petabytes of data, with trillions of records? How do we deal with heterogeneous data sets? How do we deal with alternate data shapes like trees and graphs?
ObjectiveThis lecture is complementary to Information Systems for Engineers, as the two cover different time periods of database history and practices -- you can even take both lectures at the same time.

The key challenge of the information society is to turn data into information, information into knowledge, knowledge into value. This has become increasingly complex. Data comes in larger volumes, diverse shapes, from different sources. Data is more heterogeneous and less structured than forty years ago. Nevertheless, it still needs to be processed fast, with support for complex operations.

This combination of requirements, together with the technologies that have emerged in order to address them, is typically referred to as "Big Data." This revolution has led to a completely new way to do business, e.g., develop new products and business models, but also to do science -- which is sometimes referred to as data-driven science or the "fourth paradigm".

Unfortunately, the quantity of data produced and available -- now in the Zettabyte range (that's 21 zeros) per year -- keeps growing faster than our ability to process it. Hence, new architectures and approaches for processing it were and are still needed. Harnessing them must involve a deep understanding of data not only in the large, but also in the small.

The field of databases evolves at a fast pace. In order to be prepared, to the extent possible, for the (r)evolutions that will take place in the next few decades, the emphasis of the lecture will be on the paradigms and core design ideas, while today's technologies will serve as supporting illustrations thereof.

After attending this lecture, you should have gained an overview and understanding of the Big Data landscape, which is the basis on which one can make informed decisions, i.e., pick and orchestrate the relevant technologies for addressing each business use case efficiently and consistently.
ContentThis course gives an overview of database technologies and of the most important database design principles that lay the foundations of the Big Data universe.

It specifically targets students with a scientific or engineering background, but not a Computer Science background.

We take the monolithic, one-machine relational stack from the 1970s, smash it down and rebuild it on top of large clusters: starting with distributed storage, and all the way up to syntax, models, validation, processing, indexing, and querying. A broad range of aspects is covered with a focus on how they all fit together in the big picture of the Big Data ecosystem.

No data is harmed during this course; however, please be psychologically prepared that our data may not always be in normal form.

- physical storage: distributed file systems (HDFS), object storage (S3), key-value stores

- logical storage: document stores (MongoDB), column stores (HBase)

- data formats and syntaxes (XML, JSON, RDF, CSV, YAML, protocol buffers, Avro)

- data shapes and models (tables, trees)

- type systems and schemas: atomic types, structured types (arrays, maps), set-based type systems (?, *, +)

- an overview of functional, declarative programming languages across data shapes (SQL, JSONiq)

- the most important query paradigms (selection, projection, joining, grouping, ordering, windowing)

- paradigms for parallel processing, two-stage (MapReduce) and DAG-based (Spark)

- resource management (YARN)

- what a data center is made of and why it matters (racks, nodes, ...)

- underlying architectures (internal machinery of HDFS, HBase, Spark)

- optimization techniques (functional and declarative paradigms, query plans, rewrites, indexing)

- applications.
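
The two-stage MapReduce paradigm listed above can be sketched on a single machine in plain Python. This is only an illustration of the map, shuffle, and reduce steps under the assumption of a word-count task; the function names and input documents are hypothetical, and a real cluster framework would distribute each stage across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    # Map: emit one (key, value) pair per word.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Shuffle: group all values by key, as a framework would do
    # between the two stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values of one key into a single result.
    return key, sum(values)


documents = ["big data big clusters", "data in the large"]  # hypothetical input
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

Because each call to map_phase and reduce_phase depends only on its own input, both stages can in principle run in parallel, which is what makes the paradigm suitable for large clusters.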

Large scale analytics and machine learning are outside of the scope of this course.
LiteraturePapers from scientific conferences and journals. References will be given as part of the course material during the semester.
Prerequisites / NoticeThis course is not intended for Computer Science and Data Science students. Computer Science and Data Science students interested in Big Data MUST attend the Master's level Big Data lecture, offered in Fall.

Requirements: programming knowledge (Java, C++, Python, PHP, ...) as well as basic knowledge of databases (SQL). If you have already built your own website with a backend SQL database, this is perfect.

Attendance is especially recommended to those who attended Information Systems for Engineers last Fall, which introduced the "good old databases of the 1970s" (SQL, tables and cubes). However, this is not a strict requirement, and it is also possible to take the lectures in reverse order.