Jan 6, 2021

Review: Kafka: The Definitive Guide

Last week i read the book "Kafka: The Definitive Guide" with the subtitle "Real-Time Data and Stream Processing at Scale" which was provided by confluent.io:


The book contains 11 chapters on 288 pages - let's take look on the content:

Chapter 1 "meet Kafka" start with a motivation, why moving data is important and why you should not spend your effort not into moving but into your business. In addition an introduction to the messaging concepts like publish/subscribe, queues, messages, batches, schemas, topics, partitions, ... Many technical terms are defined there, but some are specific to Kafka and some are more general definitions. One additional info: Kafka was built by linkedin - the complete story told in the last section of this chapter.

The second chapter is about installing Kafka. Nothing special. OS, Java, Zookeeper (if clustered), Kafka.

Chapter 3 is called "Kafka producers: Writing messages to Kafka". Like the title indicates: all configuration details about sending messages are listed and explained. 

Chapter 4 is the same as the previous chapter but for reading messages. Both chapters contain many java example listings.

Chapters 5 & 6 are about clusters and reliability. Here are the nifty details explained like high water marks, message replication, timeouts, indices, ... If you want to run a high available Kafka system, you should read that and in case of failures you will know what to do.

Chapter 7 introduces Kafka Connect. Here a citation, when you should use Connect (it is not possible to summarize this):

You will use Connect to connect Kafka to datastores that you did not write and whose code you cannot or will not modify. Connect will be used to pull data from the external datastore into Kafka or push data from Kafka to an external store. For datastores where a connector already exists, Connect can be used by nondevelopers, who will only need to configure the connectors.

"Cross data cluster mirroring" is the title of chapter 8 - i do not understand why this chapter is not placed before chapter 7...

In chapter 9 and 10 administration and monitoring is explained. Very impressive is the amount of CLI examples. If you have a question: here you will find the CLI command, which provides the answer.

The last chapter "stream processing" is one of the longest chapters (>40 pages).  Here two APIs are presented to do some processing based on the messages. One example is, a stream which processes stock quotes. With stream processing it is possible to calculate the number of trades for every five-second window or the average ask price for every five-second window. Of course this chapter shows much more, but i think this gives the best impression ;-)

All in all a excellent book - even if you are not implementing Kafka ;-)



 

No comments:

Post a Comment