Kafka CheatSheet
Introduction
Let’s review the main concepts in Kafka.
Step 1: Choose your serialization
Kafka treats a message as a payload of bytes, so choose whichever serialization method you want. If you want to serialize multiple messages and send them in one request, you can.
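As a rough illustration of packing several serialized messages into one send, here is a minimal sketch. JSON and the 4-byte length prefix are assumptions chosen for the example, not Kafka's actual wire format, and `encode_batch` / `decode_batch` are hypothetical helper names.

```python
import json
import struct


def encode_batch(messages):
    """Serialize each message as JSON and pack them into one length-prefixed payload."""
    payload = bytearray()
    for message in messages:
        body = json.dumps(message).encode("utf-8")
        payload += struct.pack(">I", len(body))  # 4-byte big-endian length prefix (assumed framing)
        payload += body
    return bytes(payload)


def decode_batch(payload):
    """Split a length-prefixed payload back into the original messages."""
    messages = []
    position = 0
    while position < len(payload):
        (length,) = struct.unpack_from(">I", payload, position)
        position += 4
        messages.append(json.loads(payload[position:position + length].decode("utf-8")))
        position += length
    return messages


# Two messages travel as a single byte payload and come back unchanged.
batch = encode_batch([{"user": "a", "clicks": 1}, {"user": "b", "clicks": 2}])
decoded = decode_batch(batch)
```

Any other format (Avro, Protobuf, plain strings) would work the same way, as long as producer and consumer agree on it.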
Step 2: Scale
You can publish to and consume from the same topic on multiple brokers, because a topic is split into partitions:
- topic1/partition1 can live on broker1. Within a partition, order is maintained: consumers read messages first in, first out.
- topic1/partition2 can live on broker2.
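The partitioning idea can be sketched in a few lines. This is an in-memory simulation, not a Kafka client: the partition count, the `publish` helper, and the use of CRC32 as the key hash are all assumptions made for illustration (Kafka's own default partitioner uses a different hash).

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 2  # assumed: topic1 has two partitions, possibly on different brokers


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition deterministically (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = defaultdict(list)  # partition number -> ordered list of (key, value)


def publish(key, value):
    """Append the message to the partition chosen for its key."""
    partition = choose_partition(key)
    partitions[partition].append((key, value))
    return partition


# Interleave messages for two keys.
for i in range(3):
    publish("user-a", f"a{i}")
    publish("user-b", f"b{i}")

# Messages with the same key land in one partition and keep their publish order.
values_for_a = [v for k, v in partitions[choose_partition("user-a")] if k == "user-a"]
```

Ordering is only guaranteed inside a single partition; nothing in this sketch (or in the notes above) orders messages across partitions.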
Step 3: Storage
- Each partition corresponds to a logical log.
- Messages do not have explicit IDs. Each one is addressed by its logical offset in the log, which reduces complexity: there is no index from IDs to locations to maintain, so no random seeks.
- Each consumer pull request contains the offset to consume from.
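The points above can be sketched as an append-only list where the offset is simply the position of a message. `PartitionLog` and its methods are hypothetical names, and counting offsets per message (rather than in bytes) is an assumption of this sketch.

```python
class PartitionLog:
    """Append-only log for one partition; messages are addressed by offset only."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Append a message and return the logical offset it was stored at."""
        self._messages.append(message)
        return len(self._messages) - 1

    def fetch(self, offset, max_messages=10):
        """Return up to max_messages starting at offset, plus the next offset to request."""
        chunk = self._messages[offset:offset + max_messages]
        return chunk, offset + len(chunk)


log = PartitionLog()
for text in ["m0", "m1", "m2", "m3", "m4"]:
    log.append(text)

# Every pull request carries the offset to start from; the reply tells the
# consumer which offset to ask for next.
first, next_offset = log.fetch(0, max_messages=2)
second, next_offset = log.fetch(next_offset, max_messages=2)
```

Because a fetch is just a sequential slice starting at a known position, the log needs no lookup structure keyed by message ID.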
Step 4: Kafka Broker
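Kafka brokers are stateless with respect to consumers: the broker does not track what each consumer has read.
This is not the case in other messaging queues! The consumer holds its own position (offset). How does the broker know when to delete messages? By a retention SLA: a message is deleted once it has been on the broker longer than a configured period.
A consumer can deliberately break queue semantics by rewinding to an older offset and rereading messages.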
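A small simulation may make the division of labor clearer: the broker side only knows about time-based retention, while the consumer side owns the offset and may move it backwards. `RetainedLog`, the explicit `now` arguments, and the 60-second retention period are assumptions of this sketch, not Kafka settings.

```python
class RetainedLog:
    """Partition log that deletes by age only; it never records who read what."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.base_offset = 0  # offset of the oldest message still stored
        self._entries = []    # list of (timestamp, message)

    def append(self, message, now):
        self._entries.append((now, message))
        return self.base_offset + len(self._entries) - 1

    def expire(self, now):
        """Drop messages older than the retention period, consumed or not."""
        while self._entries and now - self._entries[0][0] > self.retention_seconds:
            self._entries.pop(0)
            self.base_offset += 1

    def fetch(self, offset, max_messages=10):
        """Return messages from offset (clamped to the oldest retained) and the next offset."""
        start = max(offset, self.base_offset) - self.base_offset
        chunk = [message for _, message in self._entries[start:start + max_messages]]
        return chunk, self.base_offset + start + len(chunk)


log = RetainedLog(retention_seconds=60)
log.append("old", now=0)
log.append("mid", now=50)
log.append("new", now=100)

# The consumer, not the broker, remembers where it is.
consumer_offset = 0
read_once, consumer_offset = log.fetch(consumer_offset)

# Rewind: set the offset back and the same messages are served again.
consumer_offset = 1
reread, consumer_offset = log.fetch(consumer_offset)

# Retention: "old" has been stored for 100 seconds, longer than 60, so it goes.
log.expire(now=100)
after_expiry, _ = log.fetch(0)
```

Note that rewinding only works for messages still inside the retention window; once `expire` has dropped a message, no offset will bring it back.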
Step 5: ZooKeeper
ZooKeeper handles coordination tasks such as agreeing on which servers are alive in the presence of network failures. Each Kafka broker coordinates with the other brokers via ZooKeeper.
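To give a feel for "agreeing which server is alive", here is a toy liveness registry. It is loosely modeled on the way ZooKeeper drops a client's ephemeral nodes once its session expires, but it is an in-memory stand-in, not the ZooKeeper API: `LivenessRegistry`, the heartbeat calls, and the 10-second timeout are all hypothetical.

```python
class LivenessRegistry:
    """Toy membership tracker: a broker counts as alive while its heartbeats are recent."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self._last_heartbeat = {}  # broker id -> time of last heartbeat

    def heartbeat(self, broker_id, now):
        """Record that broker_id was reachable at time now."""
        self._last_heartbeat[broker_id] = now

    def live_brokers(self, now):
        """Return the brokers whose last heartbeat is within the session timeout."""
        return sorted(
            broker_id
            for broker_id, seen in self._last_heartbeat.items()
            if now - seen <= self.session_timeout
        )


registry = LivenessRegistry(session_timeout=10)
registry.heartbeat("broker1", now=0)
registry.heartbeat("broker2", now=0)
registry.heartbeat("broker1", now=8)  # broker2 goes silent, e.g. a network failure

alive = registry.live_brokers(now=15)
```

The point of delegating this to a shared service is that every broker sees the same answer, instead of each one guessing on its own which peers are up.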
Step 6: Kafka vs others
- 5 to 10 times faster.
- The producer does not wait for an acknowledgment.
- Multiple messages can be batched in one send or receive.
- More efficient storage format.
- The consumer holds the sequence ID (its offset), so the Kafka broker is stateless.
Summary