Course Code: bspcasstxtk
Duration: 21 hours
Prerequisites:

Created for Textkernel

Overview:

Bespoke course with staged delivery for Textkernel

This training is delivered over 3 days, each targeting a specific profile.

The first day is for big data and NoSQL enthusiasts, as well as architects and analysts. It covers:

  • Cassandra’s origins: Cassandra was born out of the association of concepts coming from different precursor technologies and research in the field of IT. We will explore how Google’s “Bigtable”, Amazon’s “Dynamo” and concepts like “Staged Event-Driven Architecture” have contributed to the architecture and data model of Cassandra.
  • Cassandra’s position in today’s Information Technology: Cassandra occupies a very specific spot in the landscape of solutions intended to serve data. Cassandra must be put in perspective with other NoSQL solutions, the Big Data mentality and the CAP theorem. This will help answer questions such as “what are the use cases for Cassandra?”, “where does Cassandra stand in the IT landscape?” and “what can I do with Cassandra?”
  • Cassandra’s Architecture & Data Model: The internals and details of the architecture and the Data Model along with how Cassandra reacts as a cluster are the objective of the first day and constitute a requirement to be able to attend the second and/or third day of the training.

The second day is for administrators. It covers:

  • The different distributions of Cassandra
  • Installation and configuration of a Cassandra cluster spanning a single data centre, then 2 data centres
  • Administration of Cassandra using CQLSH
  • Administration of Cassandra using admin tools
  • Tuning Cassandra

The third day is for any data-centric profile. It covers:

  • Making the most of Cassandra’s Data Model: the hybrid row columnar multidimensional table
  • Querying and manipulating data
  • Cassandra for specific needs (time series in Cassandra, for example)

Course Outline:
  1. Data model of Google’s Bigtable - Architecture of Amazon’s Dynamo

Cassandra was inspired by two major systems, namely Bigtable and Dynamo.

The first day of the training starts by exploring those two solutions and understanding what makes them strong and popular. The Cassandra architecture and internals (protocols…) are then addressed, along with the different cluster topologies and the consequences they have on how read and write operations take place.
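
To give a flavour of how cluster topology affects read and write operations, here is a minimal Python sketch of the quorum arithmetic behind consistency levels. It is an illustration only, not course material: the function names are hypothetical, and it assumes a single data centre with one replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority (a quorum)."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set overlaps every write set.

    The overlap is guaranteed when R + W > RF, so at least one replica
    contacted by the read has acknowledged the most recent write.
    """
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print(f"RF={rf}: quorum is {q} replicas")
print("QUORUM reads + QUORUM writes overlap:", reads_see_latest_write(q, q, rf))
print("ONE read + ONE write overlap:", reads_see_latest_write(1, 1, rf))
```

With a replication factor of 3, reading and writing at quorum always overlap on at least one replica, whereas reading and writing on a single replica each may miss the latest value.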

  1. Solutions that have influenced Cassandra
  2. Data Model of Bigtable
  • The idea behind Google’s Bigtable
  • The data model
    • Rows
    • Column families
    • Timestamps
  • Building blocks
    • Tablets
    • Compaction
    • Commit logs
    • Bloom filters
  • Performance
  • Applications
  1. Architecture of Amazon’s Dynamo
  • The idea behind Amazon’s Dynamo
  • Design considerations
    • Peer to peer systems
    • Distributed systems and databases
  • Architecture
    • Partitioning
    • Replication
    • Versioning
    • Hinted Handoff
    • Membership and Failure detection
  • Performance
  • Load distribution
  • Divergent versions
  1. SEDA
  • Background
    • Thread based concurrency
    • Bounded thread pools
    • Event driven concurrency
    • Structured event queues
  • The Staged Event-Driven Architecture
    • Goals
    • Stages
    • Resource controllers
  • Asynchronous IO
  1. Cassandra
  2. The need for a solution like Cassandra
  • RDBMS Foundations
  • RDBMS for Scale
  • Consistency, consistency levels & the CAP Theorem
  • First Words on Cassandra
  • Use Cases for Cassandra
  1. The Cassandra data model
  • The Relational Data Model
  • The Relational Data Interaction
  • Wanted Evolutions
  • Logical Structures of the data model
  • Map & Sorted Map
  • Considerations (time series…)
  1. A Global view of Cassandra
  • The Cluster
  • KeySpaces
  • Column Families
  • Columns
  1. Elements of architecture 1
  • A multidimensional hybrid row columnar structure
  • Partitioners and data distribution (Consistent Hashing)
  • The Ring representation
  • Vnodes
  • Read and write operations: Quorum and consistency levels
    • Bloom filters
    • Caches
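
The partitioner, ring and vnode topics above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `HashRing` class and node names are hypothetical, MD5 stands in for the partitioner's hash function, and replicas are chosen by simply walking the ring clockwise with no awareness of racks or data centres.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hashing ring with virtual nodes (vnodes)."""

    def __init__(self, nodes, vnodes=8):
        # Each physical node owns several tokens scattered around the ring.
        self._ring = sorted(
            (self._token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(key: str) -> int:
        # Assumption: MD5 is only a stand-in for the real partitioner hash.
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

    def replicas(self, partition_key: str, replication_factor: int = 3):
        """Walk clockwise from the key's token, collecting distinct nodes."""
        start = bisect.bisect_left(self._tokens, self._token(partition_key))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == replication_factor:
                break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("user:42"))
```

Because only the tokens of a joining or leaving node change hands, most keys keep their owners when the cluster is resized, which is the property consistent hashing is chosen for.
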
  1. Elements of architecture 2
  • System keyspace
  • Gossip & Failure Detection
  • Anti-Entropy & Read Repair
  • Hinted Handoff
  • SEDA
  • Memtables
  • SSTables
  • Commit logs
  • Compaction
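
How commit logs, memtables, SSTables and compaction fit together can be illustrated with a toy write path in Python. This is a deliberately simplified sketch, not Cassandra's implementation: the `ToyStore` class is hypothetical, everything lives in memory, and it ignores timestamps, tombstones and Bloom filters.

```python
class ToyStore:
    """Toy log-structured store: commit log -> memtable -> SSTables."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.commit_log = []   # append-only record of unflushed writes
        self.memtable = {}     # mutable, in-memory view of recent writes
        self.sstables = []     # immutable sorted tables, oldest first

    def write(self, key, value):
        self.commit_log.append((key, value))   # durability first
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Turn the memtable into a new immutable, sorted table."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}
            self.commit_log.clear()   # flushed data no longer needs replay

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):   # newest table wins
            for stored_key, value in table:
                if stored_key == key:
                    return value
        return None

    def compact(self):
        """Merge all tables into one, keeping the newest value per key."""
        merged = {}
        for table in self.sstables:   # oldest first, so newer overwrite
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))] if merged else []


store = ToyStore(flush_threshold=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    store.write(key, value)
print(len(store.sstables), store.read("a"))
store.compact()
print(store.sstables)
```

Writes never modify a table in place: an updated key is simply written again, and compaction later discards the superseded value.
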
  1. Installation & Administrative operation of a Cassandra Cluster
  1. The different distributions of Cassandra
  • Apache Cassandra
  • DataStax Cassandra
  1. LABS
  • A simple Cassandra installation
  • The CQL shell (help, Configuration, Keyspaces and tables, moving data with CQLSH, roles, permissions, users, consistency levels)
  • Deployment of a Cassandra cluster on a single data centre
  • Operations on Memtables
  • Backup and Recovery
  • Deployment of a Cassandra cluster on 2 data centres
    • Adding and removing a node to a cluster
    • Use of different snitches
  1. Tuning Cassandra
  • Tracing to analyse performance
  • Bloom filters performance
  • Caching (configuring and monitoring)
  1. Data Modelling with Cassandra
  1. LABS
  • Clustering columns
  • Counters and TTL
  • The design of row keys and column names
  • Compound and composite keys
    • Skinny rows
    • Wide rows
  • Secondary Indexes
  • Operations on tuples, maps, sets and UDTs
  • Time series
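
As an illustration of the row key design and time series topics above, the following Python sketch shows one common way to keep wide rows bounded: bucketing readings by day so that each partition holds a single sensor's data for a single day. The function names, the sensor identifiers and the daily bucket size are assumptions chosen for the example, not part of the course labs.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(sensor_id, timestamp):
    """Compound key: the sensor plus a UTC day bucket."""
    day_bucket = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (sensor_id, day_bucket)


def build_partitions(readings):
    """Group (sensor_id, timestamp, value) readings into partitions.

    Inside a partition, rows are ordered by timestamp, newest first,
    mimicking a descending clustering column.
    """
    partitions = defaultdict(list)
    for sensor_id, timestamp, value in readings:
        partitions[partition_key(sensor_id, timestamp)].append((timestamp, value))
    for rows in partitions.values():
        rows.sort(key=lambda row: row[0], reverse=True)
    return dict(partitions)


readings = [
    ("s1", datetime(2024, 1, 1, 23, 59, tzinfo=timezone.utc), 20.5),
    ("s1", datetime(2024, 1, 2, 0, 1, tzinfo=timezone.utc), 20.7),
    ("s1", datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), 19.0),
]
for key, rows in build_partitions(readings).items():
    print(key, [value for _, value in rows])
```

A smaller bucket (an hour) or a larger one (a month) trades partition size against the number of partitions a range query has to touch.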