252-0376-00L  Data Warehousing

SemesterSpring Semester 2012
LecturersR. Marti
Periodicityyearly recurring course
Language of instructionEnglish



Courses

NumberTitleHoursLecturers
252-0376-00 VData Warehousing2 hrs
Fri09:15-11:00CAB G 51 »
R. Marti
252-0376-00 UData Warehousing1 hrs
Fri11:15-12:00CAB G 51 »
12:15-13:00CAB G 51 »
R. Marti

Catalogue data

AbstractData Warehousing from the perspectives of data modelers, developers, and end users.
Topics: On-Line Analytical Processing (OLAP) vs On-Line Transaction Processing (OLTP); data modeling for OLAP; OLAP querying and Data Mining algorithms; integration of data from existing databases: Extract Transform Load (ETL) processes, incl. mapping to a common data model and dealing with data quality issues.
ObjectiveKnow the raison d’ être for data warehouses, be able to design logical database schemas for On-Line Analytical Processing, know and correctly apply the options of physical database structures, be able to write SQL queries using advanced grouping and ranking operators and window functions, and know various approaches to data mining.
Content1 Introduction
Characterization of Data Warehouses (DWs); basic architecture of DWs; OLAP vs OLTP

2 Data Marts
Star and Snowflake Schemas: dimensions, measures and facts; advanced grouping, aggregation and ranking in SQL, pivoting; special architectural support in relational DBMSs: bitmap indexes, materialized views; columnar data stores

3 The Integrated Data Warehouse
Designing an integrated database schema and Master Data Management (MDM); valid time and transaction time, design patterns for versioned, temporal and bi-temporal data

4 ETL Processing
Staging and Landing Areas; quality assurance per source, e.g. dealing with missing or incorrect data; integration issues, e.g. determining common entities from different sources (entity resolution); data profiling

5 Data Mining
Detecting outliers, clusters, and association rules from large structured data sets.

6 Loose Ends (time permitting)
Workflow support; data lineage

7 Business Environment
The “enterprise feedback loop”: measuring the performance of an enterprise and planning future resource allocation
Lecture notesThe slides presented in the lecture will be published on the Web Page.
LiteratureBuying a textbook is not required. Pointers to selected literature will be published on the Web Page.

However, for "book collectors", the following two references are recommended:

1) Christopher Adamson: Star Schema, The Complete Reference. McGraw Hill 2010.
This book contains a good overview of data warehouses from design and usage perspectives.

2) Jiawei Han, Micheline Kamber, Jian Pei: Data Mining, Concepts and Techniques, 3rd Edition. Morgan Kaufmann, 2012.
The focus of this book is on data mining techniques (chapters 6-12), but it also covers the basics of data warehouses (chapters 2-5).
Prerequisites / NoticeA working knowledge of relational database technology, i.e. "core" SQL and the basic architecture of a DBMS (especially the basics of query evaluation and B-tree indexes), is assumed.

Performance assessment

Performance assessment information (valid until the course unit is held again)
Performance assessment as a semester course
ECTS credits4 credits
ExaminersR. Marti
Typesession examination
Language of examinationEnglish
RepetitionThe performance assessment is only offered in the session after the course unit. Repetition only possible after re-enrolling.
Mode of examinationwritten 60 minutes
Written aids"Open book examination": Any documents candidates wish to take along, including books, but excluding electronic devices.
This information can be updated until the beginning of the semester; information on the examination timetable is binding.

Learning materials

No public learning materials available.
Only public learning materials are listed.

Groups

No information on groups available.

Restrictions

There are no additional restrictions for the registration.

Offered in

ProgrammeSectionType
Computer Science MasterFocus Elective Courses Information SystemsWInformation
Certificate of Advanced Studies in Computer ScienceFocus Courses and ElectivesWInformation