DS414
Faculty Profiles

Oleg Ivchenko
Big Data system administrator at the Yandex-CERN partnership

Julia Ivanova
Machine Learning Software Engineer, Information Analysis Centre of the Ministry of Emergency Situations

Mikhail Anukhin
Practical lecturer at MIPT
Skills you’ll learn
During this course, students will master and sharpen their knowledge of the core technologies of the modern Big Data landscape: HDFS, MapReduce, Hive, and Spark (especially Spark Streaming for real-time processing). Efficient data warehousing with Hive and Spark is a particular focus of the course.
Under the teachers' supervision, students will study the internals of these systems and their applications, and learn what distributed file systems are, why they exist, and how they are used. They will also practice using the MapReduce framework, a workhorse of many modern Big Data applications. A key element of this course is the opportunity to put knowledge into practice by processing texts and solving sample business cases. Finally, participants will work with Spark, the next-generation computational framework, from its basic concepts up to advanced techniques for squeezing out maximum performance.
15 classes
What is BigData? Working with distributed file systems (HDFS)
MapReduce paradigm. Basic knowledge
MapReduce paradigm. Advanced elements
MapReduce paradigm. APIs knowledge, practical examples
SQL over Big Data: Hive. Basic constructions
SQL over Big Data: Hive extensions (Hive Streaming, UDFs). Not only Hive
SQL over Big Data: Different data formats; practical cases
Spark. In-memory computational model. RDD API
Spark. Dataframe API, SQL
Big Data applications examples and Spark optimisation
Real-Time computations over Big Data. Spark Streaming
Real-Time message processing. Apache Kafka and its connection with Spark
Real-Time message processing. Kafka Streams.
Guest speaker Ivan Ponomarev, Staff software engineer at Synthesized.io, senior Java lecturer at MIPT
NoSQL over Big Data. Apache HBase framework and its working with Hadoop. Apache Cassandra.
NoSQL over Big Data. Working with Apache Cassandra and Spark.
Books
Programming experience in Python. Python is required to complete programming assignments.
Basic Java and/or Scala knowledge. Most Hadoop ecosystem frameworks are written in Java or Scala, so basic experience in these languages helps when diving deep into these services.
Unix basics. The Hadoop ecosystem is deployed on computational servers. Modern servers usually run Linux, so learners should have at least minimal experience with Linux.
Git. Modern software development relies on a version control system, and the most popular one is Git. During the course, we will emulate a real team development process (with Git branches, merge requests, etc.).
In the modern world of Big Data, employers require practical skills and experience together with theoretical knowledge. With the help of cloud providers such as Amazon AWS, Google Cloud, and Microsoft Azure, it is becoming easier to spawn your own cluster. However, it is still a challenge to gain practical skills without wasting CPU cycles, money, and time. This course is built upon programming assignments that are evaluated on pre-configured clusters, which helps students focus on the practical assignments instead of cluster sizing and configuration. All the materials will be available to students after the course.
Oleg is a Senior Lecturer at the Department of Algorithms and Programming Technologies. He started working with Big Data in 2015. He is now the Head of the BigData course at the department and a co-developer of the testing framework for the “Big Data for Data Engineers” Coursera specialisation. He is also a Hadoop and HPC administrator at the Yandex-CERN partnership.
Under the direction of Alexey Dral, he developed HJudge, a testing system for applications in the Hadoop ecosystem (Rospatent num. 2016660616). The next generation of this testing framework is used for autonomous testing of students' applications in this course.
Julia completed her Bachelor's and Master's degrees at the Moscow Institute of Physics and Technology. Physics was Julia’s first love at school, but at some point the world of computer science lured her to its side. Now, in parallel with her work in industry, she teaches several CS courses at her university.
Mikhail works at the Department of Industrial Data Analysis in Retail. He developed and taught a course about the Fundamentals of Distributed Systems Theory and Designing Data-intensive Applications. Mikhail has taught many courses at MIPT, such as “Theory and Practice of Concurrent Computing”, “Algorithms and Data Structures” and “Foundations of Programming”. He participated in the production of a few Big Data online courses for the Higher School of Economics and Innopolis University.
Apply for this course
by Oleg Ivchenko, Julia Ivanova, Mikhail Anukhin
Total hours
45 Hours
Dates
Aug 01 - Aug 19, 2022
Fee for single course
€1500
Fee for degree students
€750
How to secure your spot
Complete the form below to kickstart your application
Schedule your Harbour.Space interview
If successful, get ready to join us on campus
FAQ
Will I receive a certificate after completion?
Yes. Upon completion of the course, you will receive a certificate signed by the director of the program your course belonged to.
Do I need a visa?
This depends on your case. Please check with the Spanish or Thai consulate in your country of residence about visa requirements. We will do our part to provide you with the necessary documents, such as the Certificate of Enrollment.
Can I get a discount?
Yes. The easiest way to enroll in a course at a discounted price is to register for multiple courses. Registering for multiple courses will reduce the cost per individual course. Please ask the Admissions Office for more information about the other kinds of discounts we offer and what you can do to receive one.