263-5150-00L  Scientific Databases

Semester: Autumn Semester 2014
Lecturers: G. H. Gonnet
Periodicity: yearly recurring course
Language of instruction: English


Abstract: Scientific databases share many aspects with classical DBs, but also have specific requirements of their own. We will review relational DBs, object-oriented DBs, knowledge DBs, textual DBs and the Semantic Web. All these topics will be studied from the point of view of scientific applications (bioinformatics, physics, chemistry, health, engineering). A toy SDB will be used for exercises.
Objective: The goals of this course are to:
(a) Familiarize the students with how existing DBs can be used for
scientific applications.
(b) Recognize the areas where SciDBs differ from classical DBs and
require additional features.
(c) Enable the students to understand SciDBs more easily, improve
existing ones, or design/create new ones.
(d) Familiarize the students with at least two examples of SciDBs.
Content:
1) - Introduction, statement of the problem, course structure, exercises,
why Scientific DBs (SDBs) do not fit exactly into the classical DB area.
Hierarchy: file systems, databases, knowledge bases and variations.
Efficiency issues and how they differ from those of classical DBs.

2) - Relational DB used for scientific data, pros/cons
Introduction to RDB, limitations of the model, basics of SQL,
handling of metadata, examples of scientific use of RDBs.

3) - Object-oriented DBs. Rich/structured objects are very appealing
in SDBs. OODB primitives and environments. OODB searching.
Space and access time efficiency of OODBs.

4) - Knowledge bases, key-value stores, ontologies, workflow-based
architectures. WASA.

5) - MapReduce / Hadoop

6) - Storing and sharing mathematical objects, OpenMath, its relation
to OODBs and knowledge bases. Also the problem of chemical
formula representation.

7) - SGML and XML, human-readable databases, genomic databases.
Advantages of human-readable databases (the huge initial success
of genomic databases).

8) - Semantic Web, Resource Description Framework (RDF) triples, SPARQL.
An example of a very flexible database for knowledge storage. Goals of
the Semantic Web, discussion about its future.

9) - An ideal scenario (and the design of a toy system with most of the
desired features for exploration and exercises).

10) - Automatic dependency management (make and similar). The graph
theory problem. Critical paths.

11) - Functional testing, Verifiers, Consistency, Short-circuit testing,
Recovery and Automatic recovery, Backup (incremental) methods.

12) - Performance and space issues, various uses of compression,
concurrency control. Hardware issues, clusters, Cloud computing,
Crowd-sourcing.

13) - Guest speaker: Ioannis Xenarios (UniProtKB/Swiss-Prot).
Literature: Several papers and online articles will be made available.
There is no single textbook for this course.
A significant amount of material will be delivered in the lectures, so lecture attendance is highly recommended.