CS310

Faculty
Nikolay Golov
CPO of Tengri Data Platform
Skills you’ll learn
The course examines the fundamental problem of data storage in the context of modern technology for data-intensive applications. A particular focus is given to choosing the right database technology for the task at hand and to the tradeoffs between performance, ease of use, data integrity and other considerations. Hands-on lessons use the PostgreSQL, MongoDB, Redis and Snowflake databases.
The course starts with a brief overview of the data storage task an application faces. We look at data storage in general and at the range of possible solutions: files, in-memory services and specialised applications (databases). We proceed with a list of requirements that proved essential for a data storage tool: ACID, transactions and the availability of data access languages (SQL, etc.). Afterwards, we illustrate why these requirements led to the market dominance of classical relational databases (Oracle, MS SQL, PostgreSQL, MySQL, etc.) at the end of the 20th century. We then describe why the technological advances of the 21st century gave birth to a set of non-classical databases, such as in-memory, document and columnar stores.
The bulk of the remaining course focuses on the tradeoffs to consider during technology selection and database design. We discuss the Polyglot Persistence paradigm, which uses multiple databases for different facets of an application in order to combine their strengths and mitigate their weaknesses. We discuss the balance between performance, complexity and permitted data delay for various databases and architectural approaches. We emphasise the difference between OLAP (analytical) and OLTP (transactional) tasks and cover modern data warehouse designs (Data Vault, Anchor Modeling, etc.).
Plenty of hands-on examples and homework assignments demonstrate the ideas and compare and contrast various approaches and technologies. The course wraps up with a discussion of state-of-the-art databases, such as serverless cloud databases (Snowflake) and globally distributed cloud databases that appear to challenge the CAP theorem (Google Cloud Spanner).
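As an illustration of the Polyglot Persistence idea described above, here is a minimal sketch that pairs a relational store with a key-value cache. It is not taken from the course materials. It assumes SQLite as the system of record and a plain Python dict standing in for a key-value store such as Redis; the `ProfileStore` class, the `users` table and the `db_reads` counter are hypothetical names chosen for the example.

```python
import sqlite3


class ProfileStore:
    """Cache-aside sketch: relational store for truth, key-value store for speed."""

    def __init__(self):
        # System of record (assumption: in-memory SQLite for the demo).
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
        # Stand-in for a key-value store such as Redis (assumption).
        self.cache = {}
        # Counts how often a read had to go to the relational store.
        self.db_reads = 0

    def save(self, user_id, name):
        # Write to the system of record first, inside a transaction.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
                (user_id, name),
            )
        # Invalidate the cached copy so the next read sees fresh data.
        self.cache.pop(user_id, None)

    def get(self, user_id):
        # Fast path: serve from the key-value store.
        if user_id in self.cache:
            return self.cache[user_id]
        # Slow path: read from the relational store and populate the cache.
        self.db_reads += 1
        row = self.db.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return None
        self.cache[user_id] = row[0]
        return row[0]


store = ProfileStore()
store.save(1, "Ada")
first = store.get(1)    # cache miss, read from SQLite
second = store.get(1)   # cache hit, no database read
store.save(1, "Ada L.")
third = store.get(1)    # cache was invalidated, read from SQLite again
```

The sketch invalidates the cache on every write, which keeps reads fresh at the cost of an extra database read afterwards. Letting cached entries expire on a timer instead would trade freshness for fewer reads, which is the kind of performance versus permitted data delay balance the description mentions.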
15 classes
Introduction. Data storage in general. CRUD. Evolution of approaches, the birth of relational model. SQLite.
SQLite. ACID - Durability. SQL - Create, Insert, Select, Group by. Practice 01 - SQLite.
ACID - Atomicity. SQL - transactions, commit/rollback. Rollback journal, write-ahead log. Basic indexes. Practice 02 - SQLite + Python.
Classical client-server databases: PostgreSQL, MSSQL, Oracle, MySQL. Master/Slave replication. Different types of indexes: B-tree, hashtable, projection. SQL - Join, View. Practice 03 - PostgreSQL RDS.
Analytics. OLTP vs OLAP tasks for classical databases. Kimball vs Inmon. “Star” schema, “Snowflake” schema. Modern BI tools - Tableau, Looker. Analytical SQL. Practice 04 - PostgreSQL RDS.
ACID - Isolation. Transaction isolation levels. Replication techniques. Limits of classical databases: single master, row storage, inefficiency.
Key-value stores. Redis. Rethinking everything. Memcached. Using a key-value store as a cache. Sharding approach. Practice 05 - ElastiCache Redis.
Document-oriented databases. MongoDB. JSON and SQL, document store for classical DB. Sharding approach. Practice 06.
Column storages. OLTP vs OLAP tasks for columnar databases. Vertica, Greenplum, ClickHouse, Snowflake. Sharding approach. SQL - window functions. Practice 07 - Snowflake.
Modern analytics using columnar databases: Data Vault, Anchor Modeling. Big data clickstream analytics. Practice 08 - Snowflake.
Databuses. Kafka, Pulsar. Databuses for OLTP (event-based architecture) and OLAP (data streaming).
Polyglot persistence. Modern databases on a Performance/Complexity/Delay graph. Proper roles of classical, key-value, document-oriented and columnar databases and databuses.
Risks of polyglot persistence. CAP theorem - meaning and applications. Eventual consistency. SAGA pattern.
Clouds change everything. Managed databases. Serverless databases. BigQuery, Snowflake, YDB, Athena, Aurora.
Future databases. Can the CAP theorem be violated? Final discussion.
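To give a flavour of the early SQLite + Python practice sessions, here is a minimal sketch of atomicity with commit and rollback. It is an illustration only, not one of the course's assignments. It uses Python's standard `sqlite3` module; the `accounts` table and the `transfer` and `balances` helpers are hypothetical names.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`; either both updates apply or neither."""
    try:
        # The connection context manager commits on success
        # and rolls back if the block raises an exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort: both updates above are undone by the rollback.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # overdraws, so it is rolled back
final = balances(conn)
```

After the failed transfer the balances are the same as after the successful one: the partial debit and credit never become visible, which is the atomicity guarantee in miniature.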
Python coding experience
Basic understanding of algorithms or set theory
Classes will consist of lectures and discussions based on readings from database literature. Some topics will be covered by practical tasks.
Nikolay received his M.S. degree in applied mathematics and cybernetics from Moscow State University, Russia. He then spent 15 years building data platforms for various startups and enterprises. From 2013 until 2019, he headed the Data Platform of Avito, the Craigslist of Russia, which grew from a small startup into a multi-billion-dollar company. At Avito, he was responsible for analytical databases (Vertica, ClickHouse), OLTP engines (PostgreSQL, Redis, MongoDB) and data buses (Kafka) for analytics and microservices. Later he was Head of Data Platform at ManyChat (a California and Barcelona-based SaaS startup), responsible for the implementation and growth of its Data Platform (AWS+Redis+Snowflake+Tableau), which is used for analytics and AI. Currently, Nikolay is the CPO of Tengri Data Platform, a startup creating a new analytical database.
by Nikolay Golov
Total hours
45 Hours
Dates
May 01 - May 19, 2023
Fee for single course
€1500
Fee for degree students
€750
How to secure your spot
Complete the form below to kickstart your application
Schedule your Harbour.Space interview
If successful, get ready to join us on campus
FAQ
Will I receive a certificate after completion?
Yes. Upon completion of the course, you will receive a certificate signed by the director of the program your course belonged to.
Do I need a visa?
This depends on your case. Please check with the Spanish or Thai consulate in your country of residence about visa requirements. We will do our part to provide you with the necessary documents, such as the Certificate of Enrollment.
Can I get a discount?
Yes. The easiest way to enroll in a course at a discounted price is to register for multiple courses. Registering for multiple courses will reduce the cost per individual course. Please ask the Admissions Office for more information about the other kinds of discounts we offer and what you can do to receive one.