Cassandra for Administrators and Developers

Course Code: bspcasstxtk

Duration: 21 hours

Prerequisites:

Created for Textkernel

Overview:

Bespoke course with staged delivery for Textkernel

This training is delivered over 3 days, each targeting a specific profile.

The content of the first day is for big data and NoSQL enthusiasts, as well as for architects and analysts. The first day covers:

Cassandra’s origins: Cassandra was born out of the combination of concepts from several precursor technologies and from research in the field of IT. We will explore how Google’s “Bigtable”, Amazon’s “Dynamo” and concepts such as the “Staged Event-Driven Architecture” (SEDA) shaped Cassandra’s architecture and data model.
Cassandra’s position in today’s Information Technology: Cassandra occupies a very specific place in the landscape of solutions intended to serve data. It must be put in perspective with other NoSQL solutions, the Big Data mindset and the CAP theorem. This helps answer questions such as “Which use cases is Cassandra suited for?”, “Where does Cassandra stand in the IT landscape?” and “What can I do with Cassandra?”
Cassandra’s Architecture & Data Model: the internals of the architecture and the data model, along with how Cassandra behaves as a cluster, are the main objective of the first day and are a prerequisite for attending the second and/or third day of the training.

The content of the second day is for administrators. It covers:

The different distributions of Cassandra
Installation and configuration of a Cassandra cluster spanning a single data centre and 2 data centres
Administration of Cassandra using CQLSH
Administration of Cassandra using admin tools
Tuning Cassandra

The content of the third day is for any data-centric profile. It covers:

Making the most of Cassandra’s data model: the hybrid row-columnar multidimensional table

Querying and manipulating data

Cassandra for specific needs (time series in Cassandra for example)

Course Outline:

Data model of Google’s Bigtable - Architecture of Amazon’s Dynamo

Cassandra was inspired by two major systems, namely Bigtable and Dynamo.

The first day of the training starts by exploring these two systems and understanding what makes them strong and popular. It then addresses the Cassandra architecture and internals (protocols…), along with the different cluster topologies and their consequences for how operations (read and write operations) take place.

Solutions that have influenced Cassandra
Data Model of Bigtable

The idea behind Google’s Bigtable
The data model
- Rows
- column families
- Timestamps
Building blocks
- Tablets
- Compaction
- Commit logs
- Bloom filters
Performance
Applications
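Bigtable uses Bloom filters so that a read can skip SSTables that cannot contain the requested row and column, and Cassandra keeps one per SSTable for the same reason. A minimal Python sketch of the data structure, assuming SHA-256 double hashing purely for convenience (real implementations choose their own hash functions); the class name and parameters are illustrative:

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, key: str) -> list:
        # Assumption: derive k positions by double hashing two 64-bit
        # slices of a SHA-256 digest (not Bigtable's or Cassandra's hash)
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present"
        return all(self.bits[pos] for pos in self._positions(key))
```

A negative answer lets the reader skip the file entirely; a positive answer may be a false positive, so the file must still be read to confirm.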

Architecture of Amazon’s Dynamo

The idea behind Amazon’s Dynamo
Design considerations
- Peer-to-peer systems
Distributed systems and databases
Architecture
- Partitioning
- Replication
- Versioning
- Hinted Handoff
- Membership and Failure detection
Performance
Load distribution
Divergent versions
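Dynamo detects divergent versions with vector clocks: one counter per coordinating node, compared component-wise. A minimal Python sketch, with illustrative function names; the example reproduces the classic scenario of two writes coordinated by different nodes (Sy, Sz) after a common ancestor. Note that Cassandra does not use vector clocks: it resolves conflicts with per-cell timestamps (last write wins).

```python
from typing import Dict

VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of clock with node's counter bumped (a write coordinated by node)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if a has seen every event b has seen (a >= b component-wise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"  # divergent versions: the client must reconcile


v1 = increment({}, "Sx")   # {"Sx": 1}
v2 = increment(v1, "Sx")   # {"Sx": 2}
v3 = increment(v2, "Sy")   # {"Sx": 2, "Sy": 1}
v4 = increment(v2, "Sz")   # {"Sx": 2, "Sz": 1}
```

Here `compare(v3, v4)` reports the two versions as concurrent, which is exactly the "divergent versions" case Dynamo hands back to the application.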

SEDA

Background
- Thread-based concurrency
- Bounded thread pools
- Event-driven concurrency
- Structured event queues
The Staged Event-Driven Architecture
- Goals
- Stages
- Resource controllers
Asynchronous IO
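The SEDA idea can be sketched in Python as stages that communicate only through queues, each drained by its own bounded pool of worker threads. This is an illustrative sketch: the `Stage` class and the two-stage pipeline are hypothetical, and resource controllers (which would resize pools or throttle queues) are left out. Cassandra exposes its own stage thread pools, such as `ReadStage` and `MutationStage`, through `nodetool tpstats`.

```python
import queue
import threading
from typing import Callable, List, Optional


class Stage:
    """One SEDA stage: an incoming event queue drained by a bounded thread pool."""

    def __init__(self, name: str, handler: Callable[[object], Optional[object]],
                 threads: int, next_stage: Optional["Stage"] = None) -> None:
        self.name = name
        self.handler = handler
        self.next_stage = next_stage
        self.events: "queue.Queue[object]" = queue.Queue()
        for _ in range(threads):  # fixed-size pool, started once
            threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, event: object) -> None:
        self.events.put(event)

    def _run(self) -> None:
        while True:
            event = self.events.get()
            try:
                result = self.handler(event)
                # Hand the output to the next stage's queue instead of calling it directly
                if self.next_stage is not None and result is not None:
                    self.next_stage.enqueue(result)
            finally:
                self.events.task_done()


# Usage: a hypothetical "parse" stage feeding a "store" stage
results: List[object] = []
lock = threading.Lock()


def store(event: object) -> None:
    with lock:
        results.append(event)


store_stage = Stage("store", store, threads=2)
parse_stage = Stage("parse", lambda s: str(s).upper(), threads=2, next_stage=store_stage)
for word in ["a", "b", "c"]:
    parse_stage.enqueue(word)
parse_stage.events.join()  # all events parsed and forwarded
store_stage.events.join()  # all forwarded events stored
```

Because a stage only touches its own queue, each pool can be sized (or throttled) independently, which is the property SEDA relies on for graceful degradation under load.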

Cassandra
The need for a solution like Cassandra

RDBMS Foundations
RDBMS for Scale
Consistency, consistency levels & the CAP Theorem
First Words on Cassandra
Use Cases for Cassandra
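The discussion of consistency levels rests on a simple overlap rule: with N replicas, a read that consults R replicas is guaranteed to see a write acknowledged by W replicas whenever R + W > N. A minimal Python sketch with illustrative helper names; `quorum` mirrors Cassandra's QUORUM definition of floor(RF / 2) + 1, and `always_intersect` checks the rule by brute force:

```python
from itertools import combinations


def quorum(replication_factor: int) -> int:
    """A majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def overlaps(n: int, r: int, w: int) -> bool:
    """The overlap rule: read and write replica sets must intersect when R + W > N."""
    return r + w > n


def always_intersect(n: int, r: int, w: int) -> bool:
    """Brute force: does every R-subset share a replica with every W-subset?"""
    replicas = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))
```

With RF = 3, reading and writing at QUORUM (2 + 2 > 3) always overlaps, whereas ONE/ONE (1 + 1 ≤ 3) trades that guarantee for latency and availability, which is where the CAP trade-off becomes concrete.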

The Cassandra data model

The Relational Data Model
The Relational Data Interaction
Wanted Evolutions
Logical Structures of the data model
Map & Sorted Map
Considerations (time series…)
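The "map and sorted map" view can be pictured as a map from partition key to a sorted map of clustering key to columns. The outer map is distributed by hashing, so partitions have no global order, while rows inside a partition are kept sorted, which makes range slices within a partition cheap. A minimal Python sketch; the `Partition` class is a hypothetical illustration of the logical model, not Cassandra's storage format:

```python
import bisect
from typing import Any, Dict, List, Tuple


class Partition:
    """A partition as a sorted map: clustering key -> {column name: value}."""

    def __init__(self) -> None:
        self._keys: List[Tuple] = []             # clustering keys, kept sorted
        self._rows: Dict[Tuple, Dict[str, Any]] = {}

    def upsert(self, clustering: Tuple, columns: Dict[str, Any]) -> None:
        if clustering not in self._rows:
            bisect.insort(self._keys, clustering)
            self._rows[clustering] = {}
        self._rows[clustering].update(columns)  # writes are upserts: merge columns

    def slice(self, start: Tuple, end: Tuple) -> List[Tuple[Tuple, Dict[str, Any]]]:
        """Rows with start <= clustering < end, in clustering order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


# Outer map: partition key -> partition (hash-distributed, so unordered)
table: Dict[str, Partition] = {}
```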

A Global view of Cassandra

The Cluster
Keyspaces
Column Families
Columns

Elements of architecture 1

A multidimensional hybrid row-columnar structure
Partitioners and data distribution (Consistent Hashing)
The Ring representation
Vnodes
Read and write operations: quorum and consistency levels
- Bloom filters
- Caches
- …
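Consistent hashing with vnodes can be sketched as a sorted ring of tokens in which each physical node owns several tokens. As in Cassandra, a key belongs to the first token at or after its own token (wrapping around), and further replicas are found by walking clockwise to the next distinct nodes, as SimpleStrategy does. A minimal Python sketch under stated assumptions: the `token` function is hypothetical (SHA-256 truncated to 64 bits, whereas Cassandra's default Murmur3Partitioner uses Murmur3), and vnode tokens are derived from the node name rather than chosen randomly:

```python
import bisect
import hashlib
from typing import List, Tuple


def token(key: str) -> int:
    """Hypothetical 64-bit token (Cassandra's default partitioner uses Murmur3)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    def __init__(self, nodes: List[str], vnodes: int = 8) -> None:
        self.node_count = len(set(nodes))
        # Each physical node places `vnodes` tokens on the ring
        self.ring: List[Tuple[int, str]] = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str, rf: int) -> List[str]:
        """First token >= the key's token owns it; walk clockwise for rf distinct nodes."""
        i = bisect.bisect_left(self.tokens, token(key))
        owners: List[str] = []
        while len(owners) < min(rf, self.node_count):
            node = self.ring[i % len(self.ring)][1]  # modulo wraps around the ring
            if node not in owners:
                owners.append(node)
            i += 1
        return owners
```

The payoff of consistent hashing shows when a node leaves: only the keys it owned move, and with vnodes they spread over all remaining nodes instead of landing on a single neighbour.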

Elements of architecture 2

System keyspace
Gossip & Failure Detection
Anti-Entropy & Read Repair
Hinted Handoff
SEDA
Memtables
SSTables
Commit logs
Compaction
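Memtables, SSTables, commit logs and compaction fit together in the write path: a write is appended to the commit log, applied to the in-memory memtable, and later flushed as an immutable, sorted SSTable; reads merge all sources by timestamp, and compaction merges SSTables to keep reads cheap. A deliberately simplified, single-node Python sketch; the `Node` class, the flush threshold and the wholesale commit-log reset are illustrative assumptions (real Cassandra recycles commit-log segments and only purges tombstones after `gc_grace_seconds`):

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, Optional[str]]  # (write timestamp, value); None marks a tombstone


class Node:
    def __init__(self, flush_threshold: int = 2) -> None:
        self.commit_log: List[Tuple[str, Cell]] = []       # append-only, for crash recovery
        self.memtable: Dict[str, Cell] = {}                # in-memory, mutable
        self.sstables: List[List[Tuple[str, Cell]]] = []   # immutable, sorted by key
        self.flush_threshold = flush_threshold

    def write(self, key: str, value: Optional[str], ts: int) -> None:
        self.commit_log.append((key, (ts, value)))         # 1. durability first
        current = self.memtable.get(key)
        if current is None or ts >= current[0]:
            self.memtable[key] = (ts, value)               # 2. newest timestamp wins
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        self.sstables.append(sorted(self.memtable.items()))  # 3. sorted, immutable SSTable
        self.memtable = {}
        self.commit_log = []  # simplification: flushed entries no longer needed

    def read(self, key: str) -> Optional[str]:
        # 4. merge memtable and every SSTable, keeping the newest timestamp
        candidates = [cell for table in self.sstables for k, cell in table if k == key]
        if key in self.memtable:
            candidates.append(self.memtable[key])
        if not candidates:
            return None
        return max(candidates, key=lambda cell: cell[0])[1]

    def compact(self) -> None:
        # 5. merge all SSTables into one, keeping only the newest cell per key
        merged: Dict[str, Cell] = {}
        for table in self.sstables:
            for key, cell in table:
                if key not in merged or cell[0] >= merged[key][0]:
                    merged[key] = cell
        self.sstables = [sorted(merged.items())] if merged else []
```

Deletes are just writes of a tombstone with a newer timestamp, which is why they only take effect on read and are physically removed later by compaction.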

Installation & Administrative operation of a Cassandra Cluster

The different distributions of Cassandra

Apache Cassandra
DataStax Cassandra

LABS

A simple Cassandra installation
The CQL shell (help, configuration, keyspaces and tables, moving data with CQLSH, roles, permissions, users, consistency levels)
Deployment of a Cassandra cluster on a single data centre
Operations on Memtables
Backup and Recovery
Deployment of a Cassandra cluster on 2 data centres
- Adding a node to and removing a node from a cluster
- Use of different snitches
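Once a cluster spans 2 data centres, the number of replica acknowledgements a consistency level requires depends on the per-data-centre replication factors. A minimal Python sketch of that arithmetic; `required_acks` and the data-centre names are hypothetical, while the level names are Cassandra's own:

```python
from typing import Dict


def majority(replicas: int) -> int:
    return replicas // 2 + 1


def required_acks(rf_per_dc: Dict[str, int], level: str, local_dc: str) -> int:
    """Replica responses needed for a given consistency level."""
    total = sum(rf_per_dc.values())
    if level == "ONE":
        return 1
    if level == "QUORUM":            # majority of all replicas, in any data centre
        return majority(total)
    if level == "LOCAL_QUORUM":      # majority within the coordinator's data centre only
        return majority(rf_per_dc[local_dc])
    if level == "EACH_QUORUM":       # a majority in every data centre (sum shown here)
        return sum(majority(rf) for rf in rf_per_dc.values())
    if level == "ALL":
        return total
    raise ValueError(f"unsupported level: {level}")


rf = {"dc1": 3, "dc2": 3}  # hypothetical: RF 3 in each of two data centres
```

With RF 3 per data centre, QUORUM needs 4 of 6 replicas and may wait on the remote data centre, whereas LOCAL_QUORUM needs only 2 local replicas, which is why it is the usual choice for latency-sensitive multi-DC deployments.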

Tuning Cassandra

Tracing to analyse performance
Bloom filter performance
Caching (configuring and monitoring)
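When tuning Bloom filters, the standard sizing formulas relate the false-positive probability p to the number of keys n, the bit-array size m and the number of hash functions k. In Cassandra the target p is set per table through the `bloom_filter_fp_chance` option; the formulas below are the textbook approximation, not necessarily Cassandra's exact sizing code:

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k}, \qquad
k_{\text{opt}} = \frac{m}{n}\ln 2, \qquad
m = -\frac{n \ln p}{(\ln 2)^{2}}

% Worked example, p = 0.01:
\frac{m}{n} = \frac{-\ln 0.01}{(\ln 2)^{2}} \approx \frac{4.605}{0.4805} \approx 9.59 \text{ bits per key},
\qquad k_{\text{opt}} \approx 9.59 \times 0.693 \approx 6.6 \rightarrow 7
```

Lowering p grows memory use roughly linearly in -ln p, so a tenfold reduction in false positives costs about 4.8 extra bits per key.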

Data Modelling with Cassandra

LABS

Clustering columns
Counters and TTL
The design of row keys and column names
Compound and composite keys
- Skinny rows
- Wide rows
Secondary Indexes
Operations on tuples, maps, sets and UDTs (user-defined types)
Time series
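A common time-series pattern buckets readings into one partition per source per time window, so that no partition grows without bound, and clusters rows by timestamp in descending order so the newest readings come first. A minimal Python sketch of that bucketing logic, assuming a hypothetical (sensor_id, UTC day) partition key and timezone-aware timestamps; the bucket size is a modelling choice, not a fixed rule:

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[str, datetime, float]  # (sensor_id, timestamp, value)


def partition_key(sensor_id: str, ts: datetime) -> Tuple[str, str]:
    """Hypothetical (sensor_id, day) bucket; expects a timezone-aware timestamp."""
    return (sensor_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d"))


def bucket_readings(readings: Iterable[Reading]) -> Dict[Tuple[str, str], List[Tuple[datetime, float]]]:
    partitions: Dict[Tuple[str, str], List[Tuple[datetime, float]]] = defaultdict(list)
    for sensor_id, ts, value in readings:
        partitions[partition_key(sensor_id, ts)].append((ts, value))
    for rows in partitions.values():
        rows.sort(reverse=True)  # clustering order: newest reading first
    return dict(partitions)
```

A query for "the latest readings of a sensor today" then touches a single partition and reads it from the top, which is the access pattern this layout is designed for.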