Question about Kafka Flink consumer parallelism - apache-kafka

I am trying to figure out how to leverage parallelism to improve the throughput of a Kafka consumer.
From my research, I understand the scenarios where the number of Kafka partitions is equal to, less than, or greater than the number of consumers, and how rebalancing is used to spread messages evenly across workers.
I also understand that setParallelism(#) achieves a similar effect to adding more bolts, in Storm's terms. In Storm, there is an offsetManager to handle the multiple outstanding offsets that parallelism creates.
Does Flink also have a mechanism to manage multiple offsets when setParallelism is used, and to make sure the offsets are committed 'in order'?
From my own experiments, it looks like this has something to do with whether checkpointing is enabled and, if it is, the checkpoint interval.
When setParallelism is used and one thread is stuck, how does Flink decide how many offsets remain uncommitted?
It looks like Flink is able to manage offsets correctly during parallel execution. I'd like to understand how Flink does it behind the scenes.
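For illustration, here is a minimal sketch of the kind of setup in question, assuming the newer KafkaSource connector API; the broker address, topic, and group id are placeholders. With checkpointing enabled, the connector commits offsets back to Kafka as part of each completed checkpoint, which would explain why the checkpoint interval matters.

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ParallelKafkaJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Offsets are committed back to Kafka only when a checkpoint completes,
            // so the checkpoint interval controls how often the group offsets advance.
            env.enableCheckpointing(10_000); // every 10 seconds

            KafkaSource<String> source = KafkaSource.<String>builder()
                    .setBootstrapServers("localhost:9092")      // placeholder broker
                    .setTopics("input-topic")                   // placeholder topic
                    .setGroupId("my-consumer-group")            // placeholder group id
                    .setStartingOffsets(OffsetsInitializer.earliest())
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    .build();

            env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
               .setParallelism(4)  // each parallel subtask is assigned a subset of the partitions
               .print();

            env.execute("parallel-kafka-consumer");
        }
    }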

Related

In what situation can a Flink 1.15.2 job stop consumption on a single Kafka partition, but continue to consume on the other partitions?

We are running a Flink 1.15.2 cluster with a job that has a Kafka Source and Kafka Sink.
The Source topic has 30 partitions. There are 5 TaskManager nodes with a capacity of 4 slots each, and we are running the job with a parallelism of 16, so that leaves 4 free slots. Depending on the slot/node assignment, we can expect each node to have roughly 6-7 partitions assigned.
Our alerting mechanisms notified us that consumer lag was getting built up on a single partition out of the 30 partitions.
As Flink does its own offset management, we had no way of figuring out (through the Flink Web UI or the Kafka console tools) which TaskManager the partition was assigned to.
I would like to know if anyone else has faced this in their experience, and what can be done to proactively monitor and/or mitigate such instances in future. Is it possible for a single partition consumer thread to behave in this manner?
We decided to bounce the Flink TaskManager service one by one hoping that a partition reassignment would jump start consumption again. Bouncing the first node had no impact, but when we bounced the second node, some other TaskManager picked up the lagging partition and started consumption again.
Maybe this is related to https://issues.apache.org/jira/browse/FLINK-28975?
I doubt this is the correct explanation, but perhaps watermark alignment could explain this sort of behavior.
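If watermark alignment were in play, it would be configured on the WatermarkStrategy passed to the source. A minimal sketch, assuming Flink 1.15+ and a placeholder event type:

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;

    public class AlignedWatermarks {
        // Placeholder event type for the sketch.
        public static class MyEvent {}

        public static WatermarkStrategy<MyEvent> strategy() {
            // Watermark alignment (Flink 1.15+): sources sharing the alignment group throttle
            // subtasks whose watermarks drift too far ahead of the rest, which could make a
            // partition appear to stall while the others keep being consumed.
            return WatermarkStrategy.<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withWatermarkAlignment("alignment-group", Duration.ofSeconds(20));
        }
    }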

how do i test Exactly Once Semantics working in my kafka streams application

I have a Kafka Streams DSL application, and we have a requirement for exactly-once processing, for which I have added the configuration
streamConfig.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, "exactly_once");
I am using Kafka 2.7.
I have 2 questions:
What is the difference between exactly_once and exactly_once_beta?
How do I test this functionality to be sure my messages are getting processed only once?
Thanks!
exactly_once_beta is an improvement over exactly_once. While exactly_once uses a transactional producer for each stream task (a combination of sub-topology and input partition), exactly_once_beta uses a transactional producer for each stream thread of a Kafka Streams client.
Every producer comes with separate memory buffers, a separate thread, and separate network connections, which might limit scaling the number of input partitions (i.e., the number of tasks). A high number of producers might also cause more load on the brokers. Hence, exactly_once_beta has better scaling characteristics. You can find more details in KIP-447.
Note that exactly_once will be deprecated and exactly_once_beta will be renamed to exactly_once_v2 in Apache Kafka 3.0. See KIP-732 for more details.
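For reference, a minimal sketch of how the guarantee can be set via the StreamsConfig constants on Kafka 2.7; the application id and broker address are placeholders:

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class EosConfig {
        public static Properties streamsProperties() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "eos-demo-app");      // placeholder id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            // Kafka 2.7: EXACTLY_ONCE or EXACTLY_ONCE_BETA; from 3.0 on, prefer EXACTLY_ONCE_V2.
            props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_BETA);
            return props;
        }
    }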
For tests you can get inspiration from the tests in the Apache Kafka repo:
https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/integration/EosIntegrationTest.java
https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/integration/EOSUncleanShutdownIntegrationTest.java
https://github.com/apache/kafka/blob/trunk/tests/kafkatest/tests/streams/streams_eos_test.py
Basically, you need to create a failover scenario and verify that messages are not produced multiple times to the output topics. Note that messages may be processed multiple times, but the results in the output topics must appear as if they were only processed once. You can find a pretty good talk about exactly-once semantics that also explains the failover scenarios here: https://www.confluent.io/kafka-summit-london18/dont-repeat-yourself-introducing-exactly-once-semantics-in-apache-kafka/
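One way to automate part of that verification is to read the output topic with a plain consumer configured with isolation.level=read_committed, so that only committed transactional results are visible, and compare the records against the input. A rough sketch, with placeholder broker and topic names:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class OutputVerifier {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "eos-verifier");            // placeholder
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
            // Only read records from committed transactions, i.e. what EOS exposes downstream.
            props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("output-topic"));    // placeholder
                long count = 0;
                ConsumerRecords<String, String> records;
                while (!(records = consumer.poll(Duration.ofSeconds(5))).isEmpty()) {
                    for (ConsumerRecord<String, String> r : records) {
                        count++; // in a real test, also check for duplicate keys/ids here
                    }
                }
                System.out.println("Committed records in output topic: " + count);
            }
        }
    }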

Multiple Flink pipelines for the same Kafka topic

Background
We have a Kafka topic with a steady stream of data. To process it we have a stateless Flink pipeline that consumes that topic and writes to another topic.
From time to time we have bursts of information that our Flink is not configured to handle. We don't want to configure our Flink pipeline and cluster to always support the maximum load we can have; we want to scale dynamically according to the load (budget reasons $$$).
Solutions we thought of
One way to do so is to add/remove nodes to the Flink cluster and change the parallelism of the Flink pipeline operators. This will require stopping the Flink job with a snapshot, reconfiguring the parallelism and restarting with new parallelism.
This would be great but we cannot allow ourselves the downtime it produces. We have to scale up/down without downtime.
If we would use regular Kafka consumers it would be as simple as adding a consumer (assuming we have enough Kafka partitions) and Kafka would redistribute the topic partitions between all the consumers.
The Flink Kafka consumer manages the partition assignment and the offset on its own which allows exactly-once semantics (we don't need it). The drawback is that a single Flink job always uses all the topic partitions.
We thought we could create another instance of Flink that would subscribe to the same topic with the same group and let Kafka distribute the partitions between them. But for that we would need the Kafka Flink consumer to let Kafka manage which partitions are assigned to which consumer.
What are we looking for
We couldn't find a library that contains such a consumer or a configuration in the existing consumer. We could write it on our own (not so difficult) but if there is an existing solution we'd rather use it.
Are we missing something? Are we misunderstanding something? Is there a better solution?
Thanks!
The most straightforward approach, since you said that at worst you'll need double the capacity, would be to modify your topology to be able to write Kafka messages you can't process quickly enough to a second overflow Kafka topic. Both input and output Kafka topic names would be configurable. Maybe you would have a threshold backlog delay that automatically triggers this writing or maybe you would have a flag in the topology that you can externally set while the topology is running. That's a design detail you can work through that has operational implications.
This gives you a Flink topology that can handle some maximum number of messages in a timely fashion while writing the rest of the messages that can't be handled to a second Kafka topic. You can then run a second instance of the same Flink topology that reads from that secondary topic and writes, if necessary, to a third topic. If the writing to the overflow topic happens very early in the topology processing, you could chain several of these instances together via Kafka with minimal latency and without having to reconfigure and restart any topologies.
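As a sketch of the routing step (not a full design), a Flink side output can divert records to the overflow sink when an externally controlled flag is set; the flag mechanism, event type, and topic wiring are placeholders:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    public class OverflowRouter {
        // Side output for records this pipeline instance chooses not to process.
        static final OutputTag<String> OVERFLOW = new OutputTag<String>("overflow") {};

        // Placeholder for "shed load right now?"; in practice this could be driven by a
        // threshold on backlog delay, broadcast state, or an externally set flag.
        static volatile boolean overflowEnabled = false;

        public static SingleOutputStreamOperator<String> route(DataStream<String> input) {
            return input.process(new ProcessFunction<String, String>() {
                @Override
                public void processElement(String value, Context ctx, Collector<String> out) {
                    if (overflowEnabled) {
                        ctx.output(OVERFLOW, value); // goes to the overflow Kafka sink
                    } else {
                        out.collect(value);          // normal processing path
                    }
                }
            });
        }
        // route(...).getSideOutput(OVERFLOW) can then be attached to a sink for the overflow topic.
    }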

How does Kafka message processing scale in publish-subscribe mode?

All, forgive me, I am a newbie and just a beginner with Kafka. I was reading the Kafka documentation about the difference between traditional messaging systems like ActiveMQ and Kafka.
As the documentation puts it, traditional messaging systems cannot scale message processing, since:
Publish-subscribe allows you broadcast data to multiple processes, but has no way of scaling processing since every message goes to every subscriber.
I think this makes sense to me.
But for Kafka, the documentation says that Kafka can scale message processing even in publish-subscribe mode. (Please correct me if I am wrong. Thanks.)
The consumer group concept in Kafka generalizes these two concepts. As with a queue the consumer group allows you to divide up processing over a collection of processes (the members of the consumer group). As with publish-subscribe, Kafka allows you to broadcast messages to multiple consumer groups.
The advantage of Kafka's model is that every topic has both these properties—it can scale processing and is also multi-subscriber—there is no need to choose one or the other.
So my question is: how does Kafka achieve this? I mean scaling the processing in publish-subscribe mode. Thanks.
The main unique features in Kafka that enable scalable pub/sub are:
1. Partitioning individual topics and spreading the active partitions across multiple brokers in the cluster to take advantage of more machines, disks, and cache memory. Producers and consumers often connect to many or all nodes in the cluster, not just a single master node for a given topic/queue.
2. Storing all messages in a sequential commit log and not deleting them when consumed. This leads to more sequential reads and writes, and offloads the broker from keeping track of different copies of messages, deleting individual messages, handling fragmentation, and tracking which consumer has acknowledged consuming which messages.
3. Enabling smart parallel processing of individual consumers and consumer groups in a way that each parallel message stream can come from the distributed partitions mentioned in #1, while offloading the offset management and partition assignment logic onto the clients themselves. Kafka scales with more consumers because the consumers do some of the work (unlike most other pub/sub brokers, where the bulk of the work is done in the broker).
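To make the consumer-group point concrete, here is a minimal sketch with the plain Java client (broker, topic, and group names are placeholders): every instance started with the same group.id shares the topic's partitions, while an instance started with a different group.id receives its own full copy of the stream.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class GroupConsumer {
        public static void main(String[] args) {
            // Run several copies with the same group id: Kafka assigns each instance a
            // disjoint subset of the topic's partitions (queue-like scaling). Run a copy
            // with a different group id: it receives every message (pub/sub broadcast).
            String groupId = args.length > 0 ? args[0] : "analytics-group"; // placeholder group
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("events")); // placeholder topic
                while (true) {
                    for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
                        System.out.printf("group=%s partition=%d offset=%d value=%s%n",
                                groupId, r.partition(), r.offset(), r.value());
                    }
                }
            }
        }
    }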

kafka log deletion and load balancing across consumers

Say a consumer does time-intensive processing. In order to scale consumer-side processing, I would like to spawn multiple consumers and consume messages from a Kafka topic in a round-robin fashion. Based on the documentation, it seems like if I create multiple consumers and add them to one consumer group, only one consumer will get the messages. If I add consumers to different consumer groups, each consumer will get the same messages. So, in order to achieve the above objective, is the only solution to partition the topic?
This seems like an odd design choice, because the consumer scalability is now bleeding into topic and even producer design. Ideally, if a topic does not need partitioning, there should be no need to partition it. This puts unnecessary logic on the producer and also causes other consumer types to consume from these partitions that may only make sense to one type of consumer. Plus it limits the use case where a certain consumer type may want ordering over the messages, so splitting a topic into partitions may not be possible.
Second, if I choose "cleanup.policy" as compact, does it mean that the Kafka log will keep increasing, as it will maintain the latest value for each key? If not, how can I get log deletion and compaction?
UPDATE:
It seems like I have two options to achieve scalability on the consumer side, which are independent of topic scaling.
Create consumer groups and have them consume odd and even offsets. This logic would have to be built into the consumers to discard unneeded messages. It also doubles the network requirements.
Create a hierarchy of topics, where the root topic gets all the messages. Then some job classifies the logs and publishes them again to more fine-grained topics. In this case, strong ordering can be achieved at the root, and more fine-grained topics for consumer scaling can be constructed.
In 0.8, Kafka maintains the consumer offsets, so publishing messages round robin across various consumers is not too far-fetched a requirement given their design.
Partitions are the unit of parallelism in Kafka by design. It is not just about consumption: Kafka distributes the partitions across the cluster, which has other benefits, like sharing load among different servers, replication management to ensure no data loss, and letting the log scale beyond a size that will fit on a single server, etc.
Ordering of messages is a key factor: if you do not need strong ordering, then dividing topics into multiple partitions will allow you to evenly distribute the load while producing (this will be handled by the producer itself). And while using a consumer group you just need to add more consumer instances in the same group in order to consume them in parallel.
Plus it limits the use case where a certain consumer type may want ordering over the messages, so splitting a topic into partitions may not be possible.
True, from the docs:
However, if you require a total order over messages this can be achieved with a topic that has only one partition, though this will mean only one consumer process.
Maintaining ordering while consuming in a distributed manner requires the messaging system to maintain per-message state to keep track of message acknowledgement. But this would involve a lot of expensive random I/O in the system. So clearly there is a trade-off.
Ideally, if a topic does not need partitioning, there should be no need to partition it. This puts unnecessary logic on the producer and also causes other consumer types to consume from these partitions that may only make sense to one type of consumer
Distributing messages across partitions is typically handled by the producer itself, without any intervention from the programmer's end (assuming you don't want to categorize messages using a key). And for the consumers, as you just mentioned, the better choice would be to use the simple/low-level consumer, which will allow you to consume only a subset of the partitions in a topic.
This seems like an odd design choice, because the consumer scalability is now bleeding into topic and even producer design
I believe that a system like Kafka, which focuses on high throughput (handling hundreds of megabytes of reads and writes per second from thousands of clients) while ensuring scalability and strong durability and fault-tolerance guarantees, might not be a good fit for someone with totally different business requirements.
Topic partitioning is primarily a way to scale out consumers and brokers so if you need many consumers to keep up then you need to partition the topic and add multiple consumer instances in the same consumer group. The producer API will manage partitions transparently. If you need to have certain consumers subscribing only to some partitions, then you need to use the simple consumer API instead of the high level API and in this case you don't have the consumer group concept and have to coordinate consumption yourself.
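As a side note, the old simple/low-level consumer API has since been superseded; with the modern Java client the rough equivalent is manual partition assignment via assign(). A minimal sketch with placeholder broker, topic, and partition numbers:

    import java.time.Duration;
    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ManualAssignment {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // assign() bypasses the consumer-group coordinator entirely: this consumer reads
                // exactly these partitions, and any coordination is up to the application.
                consumer.assign(Arrays.asList(
                        new TopicPartition("events", 0),  // placeholder topic/partitions
                        new TopicPartition("events", 1)));
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.printf("partition=%d offset=%d%n", r.partition(), r.offset());
                }
            }
        }
    }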
Message ordering is guaranteed within partitions but not between partitions so if this is a requirement it needs to be dealt with on consumer side.
Setting cleanup.policy=compact means that the Kafka brokers will keep the latest version of each message key indefinitely; use cases like that are more about recording data updates for things you intend to keep around, rather than the log stream buffering use case.
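For completeness, a compacted topic can be created with the AdminClient roughly like this; the broker address, topic name, partition count, and replication factor are placeholders:

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.config.TopicConfig;

    public class CreateCompactedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

            try (AdminClient admin = AdminClient.create(props)) {
                // cleanup.policy=compact keeps only the latest record per key (older values are
                // eventually compacted away); cleanup.policy=delete drops whole segments based on
                // retention time/size; "compact,delete" combines both behaviors.
                NewTopic topic = new NewTopic("user-profiles", 6, (short) 3) // placeholder topic
                        .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                        TopicConfig.CLEANUP_POLICY_COMPACT));
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
        }
    }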
You need to factor out the reading of Kafka messages from the subsequent processing of those messages. You can use partitions and consumer groups to make reading messages as fast as possible, but if you process the messages as part of your consumer logic then you'll just slow down your consumers. By streaming the messages from consumers to other classes that will perform your processing you can adjust the parallelism of the consumers and of the processors independently. You'll see this approach in technologies like Spark and Storm.
This approach does add one complication: the consumer has to commit the message offset before the message has been processed. You may have to track the messages in flight to ensure exactly-once execution.
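As a minimal sketch of the decoupling idea (not the exact design described above): records are handed to a worker pool, and offsets are committed only after the whole batch has been processed, which gives at-least-once rather than exactly-once behavior. The broker, topic, and group names and the process() body are placeholders.

    import java.time.Duration;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Properties;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class DecoupledProcessor {
        static void process(ConsumerRecord<String, String> record) {
            // placeholder for the time-intensive work
        }

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "slow-processors");          // placeholder
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");          // commit manually
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

            ExecutorService workers = Executors.newFixedThreadPool(8); // processing parallelism
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("work-topic"));       // placeholder
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    List<CompletableFuture<Void>> futures = new ArrayList<>();
                    for (ConsumerRecord<String, String> r : records) {
                        futures.add(CompletableFuture.runAsync(() -> process(r), workers));
                    }
                    // Wait for the whole batch, then commit, so committed offsets never run
                    // ahead of processing (a crash mid-batch leads to redelivery, not loss).
                    // Batch processing time must stay under max.poll.interval.ms.
                    CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
                    consumer.commitSync();
                }
            }
        }
    }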