Course Code: kafadbesp
Duration: 21 hours
Course Outline:

Introduction (Brief)

Apache Kafka vs. traditional message brokers (Brief)

Overview of Kafka Features and Ecosystem (Brief)

Apache Kafka On-premises vs. in the Cloud (Brief)

Apache Kafka variants and running Kafka in Docker/Kubernetes (Brief)

Setup

Installing and Configuring Apache Kafka

Setting up ZooKeeper to Manage the Kafka Cluster

Testing the Cluster

Testing IDE and API integration with Kafka

  • Note: A lab with a distributed Kafka cluster (Kafka with ZooKeeper) will already be set up

Deep Dive into Kafka

Understanding Kafka Internals & architecture

  • Cluster formation & Membership
  • ZooKeeper and its role
  • Leader and follower roles for Kafka brokers & ZooKeeper (load balancing)
  • The Controller in newer versions where ZooKeeper is not used (brief introduction)
  • KRaft mode in newer versions where ZooKeeper is not used (brief introduction)
  • Topics, partitions & segments
  • Messages & batches
  • Producers & consumers
  • Consumer groups, offsets & fault tolerance
  • Brokers & replication
  • Fault tolerance & semantics
  • Important configurations
  • Physical storage & understanding underlying log/index files
  • Log retention & compaction

Understanding Kafka APIs

  • Kafka Java Client APIs
  • Kafka Producer Java API
  • Kafka Consumer Java API
  • Kafka AdminClient Java API
  • Kafka Streams Java API
  • Kafka Connect Java API
  • Terminology

Working with Kafka programmatically & understanding reliability

Kafka Producers

Constructing a Kafka producer, publishing messages, producer configurations, and understanding serializers, interceptors, headers, partitions, consistency, retries, compression, quotas/throttling, etc.

Related configurations for optimized performance.

Kafka Consumers

Constructing a consumer, working with consumers and consumer groups.

Subscribing & consuming from topics.

Understanding polling & the heartbeat thread, commits & offsets, fetch behaviour, auto offset or preferred read, rebalance listeners, serializers & deserializers.

Related configurations for optimized performance.

Reliable Data Delivery

Reliability Guarantees & Validating System Reliability

Understanding semantics & ensuring data durability & retention

Important considerations

Kafka Streams API

Kafka Connect API

Building Data Pipelines

Considerations

Kafka Connect Versus Producer and Consumer

Kafka Connect

Managing Kafka Programmatically

AdminClient lifecycle (creating, configuring and closing), configuration management, consumer group management, cluster metadata & testing

Managing, Administering & Monitoring Kafka

Cross-Cluster Data Mirroring

Use Cases, Hub-and-Spokes Architecture, Active-Active Architecture, Active-Standby Architecture and Apache Kafka’s MirrorMaker.

Administering Kafka

Topic Operations, Consumer & Consumer Groups, Dynamic Configuration Changes

Partition Management

Monitoring Kafka

Using tools to monitor Kafka Cluster

Understanding metrics emitted by Kafka & ZooKeeper

Client, performance & lag monitoring

Kafka logs

Known issues and optimizing Kafka & its components

Troubleshooting

           

Summary & conclusion