Message Queue
A message queue is a form of asynchronous communication between different parts of a system, allowing components to send and receive messages without being directly connected or operating at the same time.
This approach decouples message producers from consumers, improving system scalability and resilience. Messages are stored in a "queue" until they can be processed by the appropriate receiver, which makes it easier to manage traffic spikes and distribute load among multiple workers or services.
Message queues are used in distributed systems to ensure reliable delivery of information, support batch processing, and integrate different applications or software components efficiently.
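As a rough illustration of this decoupling, the sketch below uses Python's standard `queue.Queue` as an in-process stand-in for a message queue. The function name `run_demo`, the worker count, and the "double the number" processing step are assumptions made for the example. A real message queue would run as a separate service, but the shape is the same: the producer enqueues and moves on, and several workers share the load.

```python
import queue
import threading


def run_demo(num_workers=3, num_messages=10):
    """Send messages through a shared queue and let workers process them."""
    q = queue.Queue()          # the buffer that decouples producer and workers
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            msg = q.get()      # blocks until a message is available
            if msg is None:    # sentinel: no more work for this worker
                q.task_done()
                break
            with lock:
                results.append(msg * 2)   # stand-in for real processing
            q.task_done()

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in workers:
        t.start()

    # The producer enqueues and moves on without waiting for processing.
    for i in range(num_messages):
        q.put(i)
    for _ in workers:
        q.put(None)            # one sentinel per worker

    q.join()                   # wait until every message has been handled
    for t in workers:
        t.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_demo())
```

Because the workers pull from the same queue, adding more of them spreads the load without any change to the producer side.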
Apache Kafka
Apache Kafka is an open-source distributed event streaming platform. It was initially developed at LinkedIn and later donated to the Apache Software Foundation. It is designed to handle high volumes of data in real time and provides robust message processing capabilities.
Main concepts
Kafka Cluster: A Kafka cluster is a group of Kafka brokers that work together to store and process real-time streaming data. It provides fault tolerance, scalability, and high availability for data streaming and messaging in large-scale applications.
Brokers: Brokers are the servers that make up the Kafka cluster. Each broker is responsible for receiving, storing and serving data. They manage the read and write operations of producers and consumers. Brokers also manage data replication to ensure fault tolerance.
Topics: Data in Kafka is organized into topics, which are logical channels to which producers send data and from which consumers read data. Each topic is divided into partitions, which are the basic unit of parallelism in Kafka. Partitions allow Kafka to scale horizontally by distributing data across multiple brokers.
Partitions: Each partition is an ordered, append-only log in which records are stored in sequence. Splitting a topic into partitions lets Kafka spread large volumes of data across brokers and process them in parallel.
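To make the "partition as a log" idea concrete, here is a minimal toy model. The `Partition` and `Topic` classes and the `orders` topic are hypothetical names for this sketch and are not part of any Kafka client API. The sketch only shows that each partition is its own append-only sequence.

```python
class Partition:
    """An append-only log: records are only ever added at the end."""

    def __init__(self):
        self._log = []

    def append(self, record):
        self._log.append(record)
        return len(self._log) - 1   # position of the new record in this log

    def read(self, start, max_records=10):
        """Return up to max_records records starting at position start."""
        return self._log[start:start + max_records]

    def __len__(self):
        return len(self._log)


class Topic:
    """A named group of partitions (toy model, not a Kafka API)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]


orders = Topic("orders", num_partitions=3)
first = orders.partitions[0].append("order-1")
second = orders.partitions[0].append("order-2")
orders.partitions[2].append("order-3")

print(first, second)                    # positions within partition 0
print(orders.partitions[0].read(0))     # records of partition 0, in order
```

Note that order is kept within a single partition only. Records written to different partitions have no defined order relative to each other.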
Record: A record is the basic unit of data in Kafka, similar to a message or log entry. Each record carries a value and may also carry a key, a timestamp, and headers.
Producers: Producers are client applications that publish (write) data to Kafka topics. They send records to the appropriate topic and partition based on the partitioning strategy, which can be key-based or round-robin.
Consumers: Consumers are client applications that subscribe to (read) data from Kafka topics. They read records from one or more partitions in order and keep track of how far they have read in each partition.
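The two partitioning strategies mentioned for producers, key-based and round-robin, can be sketched roughly as follows. This is an illustration under assumptions: `make_partitioner` is a hypothetical helper, and `zlib.crc32` is used only as a convenient stable hash. Real Kafka clients use their own hash function and may handle keyless records differently.

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function that picks a partition for a record key."""
    round_robin = itertools.cycle(range(num_partitions))

    def choose_partition(key=None):
        if key is None:
            # No key: rotate through the partitions to spread the load.
            return next(round_robin)
        # Key present: a stable hash sends the same key to the same partition.
        # crc32 is an assumption of this sketch, not Kafka's actual hash.
        return zlib.crc32(key.encode("utf-8")) % num_partitions

    return choose_partition


choose = make_partitioner(3)
keyless = [choose() for _ in range(4)]
same_key = {choose("user-42") for _ in range(5)}

print(keyless)     # cycles through partitions 0, 1, 2 and back to 0
print(same_key)    # a single partition number for the repeated key
```

Key-based partitioning is what keeps all records for one key (for example, one user) in order, since they always land in the same partition.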
ZooKeeper: ZooKeeper is a centralized service for maintaining configuration information, naming, distributed synchronization, and group services. In ZooKeeper-based deployments, Kafka uses it to manage and coordinate the brokers, and it runs as a separate component alongside the Kafka cluster. Newer Kafka releases can instead run in KRaft mode, which does not require ZooKeeper.
Offsets: An offset is a sequential number assigned to each record in a partition, and it uniquely identifies that record within the partition. Consumers use offsets to track their progress in reading records from a topic.
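The way a consumer uses offsets to track progress can be sketched with a small toy class. `SimpleConsumer` and its methods are hypothetical names for this example and do not mirror a real client API. The sketch assumes the committed offset means "the next record to read after a restart".

```python
class SimpleConsumer:
    """Toy consumer that reads one partition log and tracks its offset."""

    def __init__(self, log):
        self.log = log
        self.position = 0    # offset of the next record to read
        self.committed = 0   # offset to resume from after a restart

    def poll(self, max_records=2):
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def commit(self):
        # Persist progress: everything before this offset counts as done.
        self.committed = self.position

    def restart(self):
        # After a crash, reading resumes from the last committed offset.
        self.position = self.committed


log = ["a", "b", "c", "d", "e"]
consumer = SimpleConsumer(log)

first_batch = consumer.poll()    # reads offsets 0 and 1
consumer.commit()                # committed offset is now 2
second_batch = consumer.poll()   # reads offsets 2 and 3, not yet committed
consumer.restart()               # simulated crash before the next commit
replayed = consumer.poll()       # offsets 2 and 3 are delivered again

print(first_batch, second_batch, replayed)
```

The replay in the last step shows why committing after processing can deliver a record more than once: anything read but not yet committed is read again after a restart.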
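Replicas: Replicas are copies of a topic's partitions stored on different brokers. If a broker fails, another broker that holds a replica can continue serving the data, which prevents data loss.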
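A very simplified picture of how replicas protect against data loss is sketched below. The `Replica` class, the broker ids, and the manual "promotion" step are assumptions for illustration. The sketch ignores how Kafka actually tracks which replicas are up to date and how it elects a new leader.

```python
class Replica:
    """Toy copy of one partition, held by one broker."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []

    def fetch_from(self, leader):
        """Copy any records this replica is missing from the leader."""
        self.log.extend(leader.log[len(self.log):])


leader = Replica(broker_id=1)
followers = [Replica(broker_id=2), Replica(broker_id=3)]

# Writes go to the leader; followers catch up by fetching from it.
for record in ["r0", "r1", "r2"]:
    leader.log.append(record)
for follower in followers:
    follower.fetch_from(leader)

# Simulate losing the leader's broker: a follower takes over with a full copy.
new_leader = followers[0]
print(new_leader.broker_id, new_leader.log)
```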