CS310

Data Storages

Barcelona Campus
May 01, 2023 - May 19, 2023
The course examines the fundamental problem of data storage in the context of modern technology for data-intensive applications.
Faculty

Nikolay Golov

CPO of Tengri Data Platform

Course length

3 weeks

Duration

3 hours per day

Total hours

45 hours

Credits

6 ECTS

Language

English

Course type

Offline

Fee for single course

€1500

Fee for degree students

€750

Skills you’ll learn

Computer Science, SQL, Database Design, Data Warehousing, Data Structures, Data Storing, Data Engineering

Overview

The course examines the fundamental problem of data storage in the context of modern technology for data-intensive applications. A particular focus is given to choosing the right database technology for the task at hand and to the tradeoffs between performance, ease of use, data integrity and other considerations. Hands-on lessons use PostgreSQL, MongoDB, Redis and Snowflake.

The course starts with a brief overview of the data storage task for an application and of the general solutions to it: files, in-memory services, or specialised applications (databases). We then list the requirements that proved essential for a data storage tool: ACID, transactions, and data access languages (SQL, etc.). Afterwards, we illustrate why these requirements led to the market dominance of classical relational databases (Oracle, MS SQL, PostgreSQL, MySQL, etc.) at the end of the 20th century. Later, we describe why the technological advances of the 21st century gave birth to a set of non-classical databases, such as in-memory storage, document storage and columnar storage.

The bulk of the remaining course focuses on the tradeoffs to consider during technology selection and database design. We discuss the Polyglot Persistence paradigm, which combines multiple databases for different facets of an application to use their strengths and mitigate their weaknesses. We discuss the balance between performance, complexity and permitted data delay for various databases and architectural approaches. We emphasise the difference between OLAP (analytical) and OLTP (transactional) tasks, as well as modern data warehouse designs (Data Vault, Anchor Modeling, etc.). Plenty of hands-on examples and homework are given to demonstrate ideas and to compare and contrast various approaches and technologies. The course wraps up with a discussion of state-of-the-art databases, such as serverless cloud databases (Snowflake) and global cloud tools that appear to violate the CAP theorem (Google Cloud Spanner).

Learning highlights

  • Students shall be able to study real applications (actual business areas of an enterprise or startup) and select appropriate database engines for them.
  • Students shall understand the pros and cons of different existing database engines.
  • Students shall know and efficiently use SQL, both for OLTP and analytical tasks.
  • Students shall be able to decompose a single application into multiple subdomains, offering an optimal database for each subdomain.
  • Students shall be able to forecast the possible long-term risks of the database choices they make and mitigate them.

Course outline

15 classes

Dive into the details of the course and get a sense of what each class will cover.
Session 1 (Monday)

Introduction. Data storage in general. CRUD. Evolution of approaches; the birth of the relational model. SQLite.
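
The four CRUD operations from Session 1 can be sketched with Python's built-in `sqlite3` module against an in-memory SQLite database. This is a minimal illustration, not course material; the `users` table is a hypothetical example.

```python
import sqlite3

# In-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Create: ids 1 and 2 are assigned automatically.
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
conn.execute("INSERT INTO users (name) VALUES (?)", ("bob",))

# Read
names = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]

# Update
conn.execute("UPDATE users SET name = ? WHERE name = ?", ("carol", "bob"))

# Delete
conn.execute("DELETE FROM users WHERE name = ?", ("alice",))

remaining = conn.execute("SELECT id, name FROM users").fetchall()
conn.commit()
print(names)      # ['alice', 'bob']
print(remaining)  # [(2, 'carol')]
```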

Session 2 (Tuesday)

SQLite. ACID - Durability. SQL - Create, Insert, Select, Group by. Practice 01 - SQLite.
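
The next sketch illustrates Session 2's topics. It uses Python's `sqlite3` and a hypothetical `orders` table. It creates and fills a table in a temporary file, commits, and closes the connection. It then reads the data back through a new connection using `GROUP BY`. Reopening after `commit()` shows that committed rows outlive the connection; it does not simulate a power failure.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shop.db")

    # First connection: CREATE, INSERT, then COMMIT and close.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("alice", 10.0), ("bob", 5.0), ("alice", 2.5), ("bob", 7.5), ("carol", 1.0)],
    )
    conn.commit()  # the rows are durable once commit() returns
    conn.close()

    # A fresh connection sees the committed data; aggregate per customer.
    conn = sqlite3.connect(path)
    totals = conn.execute(
        """
        SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        ORDER BY customer
        """
    ).fetchall()
    conn.close()

print(totals)  # [('alice', 2, 12.5), ('bob', 2, 12.5), ('carol', 1, 1.0)]
```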

Session 3 (Wednesday)

ACID - Atomicity. SQL - transactions, commit/rollback. Rollback journal, write-ahead log (WAL). Basic indexes. Practice 02 - SQLite + Python.
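
Atomicity, as covered in Session 3, means that a transaction's changes are applied all together or not at all. The sketch below is a minimal example assuming Python's `sqlite3`, where the connection's context manager commits on success and rolls back when an exception escapes. The `accounts` table and the `transfer` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    # 'with conn' commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated crash between the two updates")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


transfer(conn, "alice", "bob", 30)  # committed
try:
    transfer(conn, "alice", "bob", 50, fail_midway=True)  # rolled back as a whole
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 70, 'bob': 30}
```

The failed transfer leaves no trace: Alice's debit of 50 is undone together with the missing credit.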

Session 4 (Thursday)

Classical client-server databases: PostgreSQL, MSSQL, Oracle, MySQL. Master/Slave replication. Different types of indexes: B-tree, hash table, projection. SQL - Join, View. Practice 03 - PostgreSQL RDS.
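
The next sketch shows Session 4's JOIN and VIEW in SQLite through Python's `sqlite3`. The practice itself uses PostgreSQL RDS; the SQL here is standard enough to carry over, and the schema is hypothetical. It also creates an index on the join column; SQLite stores its indexes as B-trees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    );
    -- B-tree index on the column used in the join
    CREATE INDEX idx_orders_customer ON orders(customer_id);

    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);

    -- A view is a named, stored query; LEFT JOIN keeps customers with no orders.
    CREATE VIEW customer_totals AS
        SELECT c.name, COALESCE(SUM(o.amount), 0) AS total
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name;
    """
)

rows = conn.execute("SELECT name, total FROM customer_totals ORDER BY name").fetchall()
print(rows)  # [('alice', 15.0), ('bob', 7.5), ('carol', 0)]
```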

Session 5 (Friday)

Analytics. OLTP vs OLAP tasks for classical databases. Kimball vs Inmon. “Star” schema, “Snowflake” schema. Modern BI tools - Tableau, Looker. Analytical SQL. Practice 04 - PostgreSQL RDS.
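
A star schema places a central fact table of numeric measures next to dimension tables of descriptive attributes. Analytical queries join the two and aggregate. The sketch below is a minimal example in SQLite via Python's `sqlite3`; the `fact_sales`, `dim_date` and `dim_product` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimensions: descriptive attributes used for filtering and grouping.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT NOT NULL);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT NOT NULL);
    -- Fact table: one row per sale, foreign keys to dimensions plus measures.
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        revenue REAL NOT NULL
    );
    INSERT INTO dim_date VALUES (20230501, '2023-05'), (20230601, '2023-06');
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO fact_sales VALUES
        (20230501, 1, 10.0), (20230501, 2, 20.0),
        (20230601, 1, 5.0), (20230601, 1, 15.0);
    """
)

# Typical analytical query: join facts to dimensions, then aggregate.
report = conn.execute(
    """
    SELECT d.month, p.category, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
    """
).fetchall()
print(report)
# [('2023-05', 'books', 10.0), ('2023-05', 'games', 20.0), ('2023-06', 'books', 20.0)]
```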

Session 6 (Monday)

ACID - Isolation. Transaction isolation levels. Replication techniques. Limits of classical databases: single master, row storage, inefficiency.
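
One effect of isolation discussed in Session 6 is that other sessions do not see a transaction's uncommitted changes (no dirty reads). The sketch below opens two `sqlite3` connections to a temporary SQLite file in write-ahead-log mode. It demonstrates only this one property, not the full range of isolation levels, and the `accounts` table is hypothetical.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bank.db")
    # isolation_level=None: statements autocommit unless we issue BEGIN ourselves.
    writer = sqlite3.connect(path, isolation_level=None)
    reader = sqlite3.connect(path, isolation_level=None)
    # WAL mode lets readers keep reading the last committed state while a write is open.
    writer.execute("PRAGMA journal_mode=WAL").fetchall()
    writer.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    writer.execute("INSERT INTO accounts VALUES ('alice', 100)")

    writer.execute("BEGIN")
    writer.execute("UPDATE accounts SET balance = 0 WHERE name = 'alice'")
    # The reader does not see the uncommitted update.
    before_commit = reader.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchall()
    writer.execute("COMMIT")
    # After COMMIT, a new read sees the change.
    after_commit = reader.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchall()

    writer.close()
    reader.close()

print(before_commit, after_commit)  # [(100,)] [(0,)]
```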

Session 7 (Tuesday)

Key-value storages. Redis. Rethinking everything. Memcached. Using key-value storage as a cache. Sharding approach. Practice 05 - ElastiCache Redis.
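
Session 7's caching and sharding ideas can be sketched in plain Python. The `TTLCache` class below is a hypothetical dict-based stand-in for a key-value store such as Redis or Memcached; it is not their client API. `cached_get` implements the cache-aside pattern. `shard_for` assigns keys to shards by a stable hash modulo the shard count, which is one simple sharding scheme among several.

```python
import time
import zlib
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical dict-based stand-in for a key-value cache (not the Redis API)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._data: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazily drop the expired entry
            return None
        return value

    def set(self, key: str, value: object) -> None:
        self._data[key] = (time.monotonic() + self.ttl, value)


def cached_get(cache: TTLCache, key: str, load: Callable[[str], object]) -> object:
    """Cache-aside: read the cache first; on a miss, load from the database and fill the cache."""
    value = cache.get(key)
    if value is None:
        value = load(key)
        cache.set(key, value)
    return value


def shard_for(key: str, n_shards: int) -> int:
    """Stable hash-modulo sharding (built-in hash() of str is randomised per process)."""
    return zlib.crc32(key.encode("utf-8")) % n_shards


# Usage: the second read is served from the cache, so the "database" is hit once.
db_calls = []


def load_user(key: str) -> dict:
    db_calls.append(key)
    return {"id": key}


cache = TTLCache(ttl_seconds=60)
first = cached_get(cache, "user:42", load_user)
second = cached_get(cache, "user:42", load_user)
print(len(db_calls), shard_for("user:42", 4))
```

Plain modulo sharding moves most keys to different shards when the number of shards changes. Any sharding scheme has to weigh that tradeoff.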

Session 8 (Wednesday)

Document-oriented databases. MongoDB. JSON and SQL, document store for classical DB. Sharding approach. Practice 06.
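
Classical databases can also store documents. The sketch below is a minimal example assuming Python's `sqlite3`, with SQLite's JSON functions available (built in by default since SQLite 3.38 and included in most Python builds for much longer). JSON documents are stored as text and queried with `json_extract`. The `events` table and the document shape are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, doc TEXT NOT NULL)")

# Schemaless documents: not every document has an "amount" field.
docs = [
    {"type": "signup", "user": {"name": "alice", "country": "ES"}},
    {"type": "purchase", "user": {"name": "bob", "country": "TH"}, "amount": 30},
    {"type": "purchase", "user": {"name": "alice", "country": "ES"}, "amount": 12},
]
conn.executemany("INSERT INTO events (doc) VALUES (?)", [(json.dumps(d),) for d in docs])

# Query nested fields with JSON paths inside ordinary SQL.
rows = conn.execute(
    """
    SELECT json_extract(doc, '$.user.name') AS name,
           SUM(json_extract(doc, '$.amount')) AS spent
    FROM events
    WHERE json_extract(doc, '$.type') = 'purchase'
    GROUP BY name
    ORDER BY name
    """
).fetchall()
print(rows)  # [('alice', 12), ('bob', 30)]
```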

Session 9 (Thursday)

Column storages. OLTP vs OLAP tasks for columnar databases. Vertica, Greenplum, ClickHouse, Snowflake. Sharding approach. SQL - window functions. Practice 07 - Snowflake.
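
The next sketch shows Session 9's SQL window functions, run on SQLite via Python's `sqlite3` rather than on Snowflake. Window functions require SQLite 3.25 or later, and the `page_views` table is hypothetical. `SUM ... OVER` computes a per-user running total, and `RANK() OVER` ranks users within each day. Unlike `GROUP BY`, neither collapses the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (username TEXT, day INTEGER, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [("alice", 1, 3), ("alice", 2, 5), ("alice", 3, 2), ("bob", 1, 4), ("bob", 2, 1)],
)

rows = conn.execute(
    """
    SELECT username, day, views,
           -- running total of views per user, ordered by day
           SUM(views) OVER (PARTITION BY username ORDER BY day) AS running_total,
           -- rank of each user among all users on the same day
           RANK() OVER (PARTITION BY day ORDER BY views DESC) AS rank_that_day
    FROM page_views
    ORDER BY username, day
    """
).fetchall()
for row in rows:
    print(row)
```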

Session 10 (Friday)

Modern analytics using columnar databases: Data Vault, Anchor Modeling. Big data clickstream analytics. Practice 08 - Snowflake.
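
Data Vault separates stable business keys (hubs) from their changing descriptive attributes (satellites), which are kept as insert-only history. The sketch below is a simplified example in SQLite via Python's `sqlite3`. It assumes an MD5 hash of the normalised business key as the hub key, a common Data Vault 2.0 convention. Links, and everything beyond one hub and one satellite, are omitted, and all names are hypothetical.

```python
import hashlib
import sqlite3


def hash_key(business_key: str) -> str:
    # Assumed convention: hub key = MD5 of the trimmed, upper-cased business key.
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hub: one row per business key, never updated.
    CREATE TABLE hub_customer (
        customer_hk TEXT PRIMARY KEY,
        customer_bk TEXT NOT NULL,
        load_ts TEXT NOT NULL,
        record_source TEXT NOT NULL
    );
    -- Satellite: attribute history, insert-only, keyed by (hub key, load time).
    CREATE TABLE sat_customer_details (
        customer_hk TEXT NOT NULL REFERENCES hub_customer(customer_hk),
        load_ts TEXT NOT NULL,
        email TEXT,
        PRIMARY KEY (customer_hk, load_ts)
    );
    """
)

hk = hash_key("cust-001")
conn.execute("INSERT INTO hub_customer VALUES (?, ?, ?, ?)", (hk, "cust-001", "2023-05-01", "crm"))
# A changed email is a new satellite row, not an UPDATE.
conn.executemany(
    "INSERT INTO sat_customer_details VALUES (?, ?, ?)",
    [(hk, "2023-05-01", "old@example.com"), (hk, "2023-05-10", "new@example.com")],
)

# Current state: the latest satellite row for each hub key.
current = conn.execute(
    """
    SELECT h.customer_bk, s.email
    FROM hub_customer AS h
    JOIN sat_customer_details AS s ON s.customer_hk = h.customer_hk
    WHERE s.load_ts = (SELECT MAX(load_ts) FROM sat_customer_details
                       WHERE customer_hk = h.customer_hk)
    """
).fetchall()
print(current)  # [('cust-001', 'new@example.com')]
```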

Session 11 (Monday)

Data buses: Kafka, Pulsar. Data buses for OLTP (event-based architecture) and OLAP (data streaming).
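
The core idea behind data buses such as Kafka is an append-only log that independent consumers read at their own offsets. The toy sketch below is plain Python. Its `Topic` class is hypothetical, not the Kafka or Pulsar API, and has no partitions, persistence or replication.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Topic:
    """Toy append-only log with a committed offset per consumer group (not a real client API)."""
    messages: List[str] = field(default_factory=list)
    offsets: Dict[str, int] = field(default_factory=dict)

    def publish(self, message: str) -> int:
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the new message

    def poll(self, group: str, max_messages: int = 10) -> List[str]:
        start = self.offsets.get(group, 0)
        batch = self.messages[start:start + max_messages]
        self.offsets[group] = start + len(batch)  # "commit" the new offset after reading
        return batch


orders = Topic()
orders.publish("order-1 created")
orders.publish("order-1 paid")

billing = orders.poll("billing")         # an OLTP-style consumer
analytics = orders.poll("analytics", 1)  # an OLAP-style consumer reading at its own pace
orders.publish("order-2 created")
billing_next = orders.poll("billing")
print(billing, analytics, billing_next)
# ['order-1 created', 'order-1 paid'] ['order-1 created'] ['order-2 created']
```

Because messages are never removed on read, a slow analytical consumer does not affect the transactional one.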

Session 12 (Tuesday)

Polyglot persistence. Modern databases on a Performance/Complexity/Delay graph. Proper roles of classical, key-value, document-oriented and columnar databases and data buses.

Session 13 (Wednesday)

Risks of polyglot persistence. CAP theorem - meaning and applications. Eventual consistency. SAGA pattern.
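
A saga replaces one distributed transaction with a sequence of local transactions. Each local transaction is paired with a compensating action that undoes it if a later step fails. The sketch below is a minimal orchestration-style example in plain Python; `run_saga` and the step names are hypothetical.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate the completed steps in reverse order."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed: {name}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated: {done_name}")
            return False
        completed.append(step)
        log.append(f"done: {name}")
    return True


def charge_card() -> None:
    raise RuntimeError("payment declined")


log: List[str] = []
ok = run_saga(
    [
        ("reserve_stock", lambda: None, lambda: None),
        ("create_shipment", lambda: None, lambda: None),
        ("charge_card", charge_card, lambda: None),
    ],
    log,
)
print(ok)
print(log)
```

Between a step and its compensation, other services can observe the intermediate state. This is the eventual-consistency tradeoff the session discusses.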

Session 14 (Thursday)

Clouds change everything. Managed databases. Serverless databases. BigQuery, Snowflake, YDB, Athena, Aurora.

Session 15 (Friday)

Future databases. Can the CAP theorem be violated? Final discussion.

Prerequisites

Python coding experience

Basic understanding of algorithms or set theory

Methodology

Classes will consist of lectures and discussions based on readings from database literature. Some topics will be covered by practical tasks.

Grading

The final grade will be composed of the following criteria:
  • 20% - Participation
  • 30% - Daily Quizzes
  • 50% - Practices
Faculty

Nikolay Golov

CPO of Tengri Data Platform

Nikolay received his M.S. degree in applied mathematics and cybernetics from Moscow State University, Russia. He then spent 15 years building data platforms for various startups and enterprises. From 2013 to 2019, he headed the Data Platform at Avito, the "Craigslist of Russia", which grew from a small startup into a multi-billion-dollar company. At Avito, he was responsible for analytical databases (Vertica, ClickHouse), OLTP engines (PostgreSQL, Redis, MongoDB), and data buses (Kafka) for analytics and microservices. Later, he was Head of Data Platform at ManyChat (a California- and Barcelona-based SaaS startup). There he was responsible for the implementation and growth of its Data Platform (AWS + Redis + Snowflake + Tableau), which is used for analytics and AI. Nikolay is currently CPO of Tengri Data Platform, a startup creating a new analytical database.


Apply for this course

Snap up your chance to enroll before all spaces fill up.

How to secure your spot

Complete the application form to start your application

Schedule your Harbour.Space interview

If successful, get ready to join us on campus

FAQ

Will I receive a certificate after completion?

Yes. Upon completion of the course, you will receive a certificate signed by the director of the program your course belonged to.

Do I need a visa?

This depends on your case. Please check with the Spanish or Thai consulate in your country of residence about visa requirements. We will do our part to provide you with the necessary documents, such as the Certificate of Enrollment.

Can I get a discount?

Yes. The easiest way to enroll in a course at a discounted price is to register for multiple courses. Registering for multiple courses will reduce the cost per individual course. Please ask the Admissions Office for more information about the other kinds of discounts we offer and what you can do to receive one.