Course Code: bdbwsr
Duration: 21 hours
Overview:
The "Databases in Building High-Performance Distributed Systems" training is an intensive course that combines the theory of distributed systems with practical database usage in designing scalable, fault-tolerant applications. Over three days, participants will gain knowledge on:
✅ Key challenges of distributed systems: consistency, availability, scalability
✅ Design patterns: CQRS, Saga, Two-Phase Commit (2PC), Circuit Breaker
✅ Modern databases: document-based (MongoDB, CouchDB), key-value (Redis), graph-based (Neo4j), column-based (HBase), object-oriented (GridGain), time-series (TimescaleDB, InfluxDB)
✅ Techniques for partitioning, sharding, and replication to enhance system performance
✅ Managing real-time data versus classical batch processing
✅ Practical methods for implementing distributed transactions and data recovery after failures
The training includes numerous practical workshops where participants will model, optimize, and deploy database systems in a distributed environment. The course is intended for programmers, system architects, and administrators who want to acquire skills in efficiently managing data in distributed systems.
Course Outline:

Day 1: Theory and Introduction to Distributed Systems

  1. Introduction
    • Introduction to the training structure and agenda, discussion of the training environment.
  2. Basic Concepts of Distributed Systems
    • Definition of distributed systems and their significance in modern applications.
    • Key challenges: scalability, availability, consistency, fault tolerance.
  3. Data Consistency Models
    • Discussion of Strong Consistency and Eventual Consistency.
    • Managing consistency in distributed systems: quorums, read/write quorums, read-your-own-writes.
  4. Distributed Logging Systems and Communication
    • The pub/sub pattern and stream-table duality.
    • Data compaction and real-time data processing.
  5. Case Study 1: Examples of High-Performance Applications
    • Analysis of the architecture of communication systems (e.g., WhatsApp, Signal).
    • Challenges related to consistency and data recovery.
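
Illustration for the Day 1 topic on read/write quorums: a minimal Python sketch, not part of the official course materials. It assumes N replicas, a write quorum W, and a read quorum R, and checks the usual overlap condition R + W > N. The names `QuorumConfig` and `newest_value` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    """Replication settings for one data item (illustrative names)."""
    n: int  # total number of replicas
    w: int  # replicas that must acknowledge a write
    r: int  # replicas that must answer a read

    def reads_overlap_writes(self) -> bool:
        # Every read set shares at least one replica with every
        # write set exactly when R + W > N.
        return self.r + self.w > self.n

    def tolerated_write_failures(self) -> int:
        # Writes still succeed while at least W replicas are reachable.
        return self.n - self.w

    def tolerated_read_failures(self) -> int:
        # Reads still succeed while at least R replicas are reachable.
        return self.n - self.r


def newest_value(responses):
    """Return the value with the highest version number.

    `responses` is a list of (version, value) tuples collected from
    the replicas that answered the read.
    """
    return max(responses, key=lambda resp: resp[0])[1]
```

With N=3, W=2, R=2 the read and write sets always overlap, so a read contacts at least one replica holding the latest acknowledged write. With N=3, W=1, R=1 they may not overlap, which corresponds to eventual consistency.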

Day 2: Practical Aspects of Designing Distributed Systems

  1. Designing Fault-Tolerant Applications
    • Discussion of patterns: CQRS, Inbox/Outbox, Two-Phase Commit (2PC), Saga, Change Data Capture (CDC), Circuit Breaker, Read Repair.
    • Examples of practical applications.
  2. Examples of Non-Relational Databases
    • Document Databases (e.g., MongoDB, CouchDB)
    • Key-Value Databases (e.g., Redis)
    • Graph Databases (e.g., Neo4j, OrientDB)
    • Columnar Databases (e.g., HBase)
    • Object Databases (e.g., GridGain)
    • Time-Series Databases (e.g., TimescaleDB, InfluxDB)
    • Search Engines (e.g., Apache Solr)
    • In-Memory Grids (e.g., Hazelcast, GridGain)
  3. Modern Databases: Partitioning, Sharding, and Replication
    • Partitioning and Sharding: Discussion of techniques for dividing data into smaller fragments to improve system performance and scalability.
    • Data Replication: Different types of replication (synchronous, asynchronous), benefits, and challenges related to data replication in distributed environments.
    • Secondary Indexes: Creating secondary indexes and using them to optimize query performance.
  4. Case Study 2: Designing a Graph-Based System
    • Graph-based design and modeling in distributed systems using Neo4j or OrientDB.
    • Practical exercise: graph modeling.
  5. Managing Real-Time Data vs. Traditional Data Warehouses
    • Introduction to real-time data processing and batch processing.
    • Example of using TimescaleDB for monitoring time-series data.
  6. NewSQL – Modern Approach to Relational Databases
    • Discussion of the NewSQL concept as a combination of the advantages of relational databases with the flexibility and scalability of NoSQL solutions.
    • Task for NewSQL: Participants will survey the most popular NewSQL databases and then work with CockroachDB in a practical task. The goal of the task is to implement transactions with ACID guarantees in a distributed environment.
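
Illustration for the Day 2 topic on partitioning and sharding: a minimal Python sketch of one common technique, consistent hashing with virtual nodes. It is an example under stated assumptions, not the method used by any specific database in the course. The class name `HashRing`, the shard names, and the `vnodes` parameter are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Maps keys to shards with consistent hashing (illustrative sketch)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node is placed on the ring `vnodes` times so that
        # keys spread more evenly across nodes.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        # Use the first 8 bytes of SHA-256 as a stable 64-bit ring position.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._points, self._hash(key))
        return self._ring[idx % len(self._ring)][1]
```

A usage sketch: `HashRing(["shard-a", "shard-b", "shard-c"]).node_for("user:42")` returns one of the three shard names. When a fourth shard is added, a key either keeps its previous owner or moves to the new shard, which limits data movement during resharding.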

Day 3: Practical Exercises and Database Optimization

  1. Practical Tasks Using Non-Relational Databases:
    • Task for MongoDB: Creating complex queries with data aggregation.
      • Participants will create queries using the MongoDB aggregation pipeline to group and filter data in real time.
    • Task for Redis: Implementing caching mechanisms using Redis.
      • Participants will build a system for storing query results in Redis to optimize read performance.
    • Task for CouchDB: Data synchronization in CouchDB using replication functions.
      • The task includes configuring replication between two CouchDB instances and analyzing data conflicts.
    • Task for Neo4j: Optimizing Cypher queries in a graph database.
      • Participants will analyze a large graph and optimize queries that find dependencies between nodes.
    • Task for InfluxDB: Processing time-series data and optimizing data retention.
      • Exercises using InfluxQL to analyze data streams and configure retention policies.
    • Task for GridGain: Processing data using GridGain, building and optimizing queries in an object-oriented environment.
      • Participants will optimize the storage and retrieval of large objects.
    • Task for Apache Solr: Implementing full-text search using Solr.
      • Creating indexes and optimizing search queries on large datasets.
  2. Task: Distributed Transactions
    • Participants will learn about the concept of distributed transactions, including mechanisms that ensure consistency in distributed systems.
    • Practical exercise: Implementing distributed transactions using patterns such as Two-Phase Commit (2PC) or Saga.
  3. Data Recovery After Failures and Backups
    • Patterns: Last-Writer-Wins, Vector Clocks, CRDTs.
    • Strategies for data recovery after failures.
  4. Summary and Discussion
    • Q&A session and exchange of experiences.
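
Illustration for the Day 3 topic on conflict-handling patterns: a minimal Python sketch of vector clocks, which let replicas tell whether one update happened before another or whether the two are concurrent and need conflict resolution. Clocks are modeled as plain dictionaries from node name to counter. The function names `tick`, `merge`, and `compare` are hypothetical and chosen only for this example.

```python
def tick(clock: dict, node: str) -> dict:
    """Return a copy of `clock` with the counter of `node` incremented."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: dict, b: dict) -> dict:
    """Combine two clocks by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def compare(a: dict, b: dict) -> str:
    """Classify the relation between two clocks.

    Returns 'equal', 'before' (a happened before b), 'after'
    (a happened after b), or 'concurrent' (neither dominates).
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

Two writes whose clocks compare as `concurrent` cannot be ordered by causality. A system then has to pick a resolution strategy, for example Last-Writer-Wins by timestamp or a CRDT merge function.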

The training does not cover relational databases, Elasticsearch, Apache Kafka, Prometheus, or Cassandra.

Sites Published:

Poland - Bazy danych w budowaniu wysokowydajnych systemów rozproszonych