Does Kafka Streams library kill idle StreamThreads? - apache-kafka

Say the KStream topology is simple: input topic -> process -> output topic. The input topic has 4 partitions.
If a single instance of the app is running with num.stream.threads=4, all 4 StreamThreads are utilized.
If a second instance is launched (also with num.stream.threads=4), the stream tasks are distributed between the two: tasks 0_0 and 0_1 on the first instance, tasks 0_2 and 0_3 on the second instance.
On the first instance, does the Kafka Streams library kill the threads that had been running 0_2 and 0_3 so far?

Consider your case, where the input topic has only 4 partitions: what happens when you start 8 instances with num.stream.threads=1?
4 instances become idle but are not killed. They stay up and are assigned a task if one of the instances that currently holds tasks goes down.
The same thing happens when you start multiple threads in one instance. In your case there are 8 threads across 2 instances, 4 per instance, and the scenario explained above applies: 4 of your threads become idle and stay idle until they receive a task because another instance goes down.
More references:
streams-faq-scalability-maximum-parallelism
kafka-streams-internals-StreamThread
Let’s take an example. Imagine your application is reading from an input
topic that has 5 partitions. How many app instances can we run here?
The short answer is that we can run up to 5 instances of this
application, because the application’s maximum parallelism is 5. If we
run more than 5 app instances, then the “excess” app instances will
successfully launch but remain idle. If one of the busy instances goes
down, one of the idle instances will resume the former’s work.
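The arithmetic behind the quoted passage can be sketched in a few lines of Java. This is only an illustration under an assumed round-robin hand-out over a flat list of threads; the real Kafka Streams assignor is more involved, and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Kafka Streams code: hands out tasks round-robin
// to show that threads beyond the task count receive nothing.
class IdleThreadSketch {

    // Returns one list of task ids per thread; an empty list means the thread is idle.
    static List<List<String>> assign(int partitions, int totalThreads) {
        List<List<String>> perThread = new ArrayList<>();
        for (int t = 0; t < totalThreads; t++) {
            perThread.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            // Task ids follow the <subtopology>_<partition> pattern, e.g. 0_0.
            perThread.get(p % totalThreads).add("0_" + p);
        }
        return perThread;
    }

    // Counts the threads that ended up without any task.
    static long idleThreads(int partitions, int totalThreads) {
        return assign(partitions, totalThreads).stream().filter(List::isEmpty).count();
    }

    public static void main(String[] args) {
        // 4 partitions, 2 instances x 4 threads = 8 threads in total.
        System.out.println(assign(4, 8));
        System.out.println("idle threads: " + idleThreads(4, 8));
    }
}
```

With 4 partitions and 8 threads in total, the sketch reports 4 idle threads. Which instance the idle threads live on is not modeled here.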
You can get more information about the threads by setting up metrics, referring to this.

Related

Difference between executing StreamTasks in the same instance v/s multiple instances

Say I have a topic with 3 partitions.
Method 1: I run one instance of Kafka Streams; it starts 3 tasks [0_0,0_1,0_2], and each of these tasks consumes from one partition.
Method 2: I spin up three instances of the same streams application; here again three tasks are started, but now they are distributed among the 3 instances that were created.
Which method is preferable and why?
In method 1 do all the tasks run as a part of the same thread, and in method 2, they run on different threads, or is it different?
Consider that the streams application has a very simple topology and only maps values from a single stream.
By default, a single KafkaStreams instance runs one thread, thus in "Method 1" all three tasks are executed by a single thread. In "Method 2" each task is executed by its own thread. Note that you can also configure multiple threads per KafkaStreams instance via the num.stream.threads configuration parameter. If you set it to 3 for "Method 1", both methods are more or less the same. How many threads you need depends on your workload, i.e., how many messages you need to process per time unit and how expensive the computation is. It also depends on the hardware: for a single-core CPU, it may not make sense to configure more than one thread, but you should deploy multiple instances on multiple machines to get more hardware. Hence, if your workload is lightweight, one single-threaded instance might be enough.
Also note that you may be network bound. In this case, starting more threads would not help, and you would want to scale out to multiple machines, too.
The last consideration is fault-tolerance. Even if a single thread/instance may be powerful enough to not lag, what should happen if the instance crashes? If you only have one instance, the whole computation goes down. If you run two instances, the second instance would take over all the work and your application stays online.
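As a configuration sketch, the thread count for "Method 1" could be set like this. Only the three property keys are real Kafka Streams settings; the application id, the broker address, and the class name are placeholders chosen for illustration.

```java
import java.util.Properties;

// Configuration sketch only; the application id and broker address are placeholders.
class StreamsThreadConfigSketch {

    static Properties config(int threads) {
        Properties props = new Properties();
        props.put("application.id", "value-mapper-app");   // placeholder name
        props.put("bootstrap.servers", "localhost:9092");  // placeholder address
        // Number of processing threads inside this one KafkaStreams instance.
        props.put("num.stream.threads", Integer.toString(threads));
        return props;
    }

    public static void main(String[] args) {
        // "Method 1" with three threads: one instance, one thread per task.
        System.out.println(config(3));
    }
}
```

These properties would be passed to the KafkaStreams constructor together with the topology. Running the same configuration on a second machine with the same application.id gives the fail-over behavior described above.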

Storm+Kafka not parallelizing as expected

We are having an issue regarding parallelism of tasks inside a single topology. We cannot manage to get a good, fluent processing rate.
We are using Kafka and Storm to build a system with different topologies, where data is processed following a chain of topologies connected using Kafka topics.
We are using Kafka 1.0.0 and Storm 1.2.1.
The load is small in number of messages, about 2000 per day, however each task can take quite some time. One topology in particular can take a variable amount of time to process each task, usually between 1 and 20 minutes. If processed sequentially, the throughput is not enough to process all incoming messages. All topologies and Kafka system are installed in a single machine (16 cores, 16 GB of RAM).
As messages are independent and can be processed in parallel, we are trying to use Storm concurrent capabilities to improve the throughput.
For that the topology has been configured as follows:
4 workers
parallelism hint set to 10
Message size when reading from Kafka large enough to read about 8 tasks in each message.
Kafka topics use replication-factor = 1 and partitions = 10.
With this configuration, we observe the following behavior in this topology.
About 7-8 tasks are read in a single batch from Kafka by the Storm topology (task size is not fixed), max message size of 128 kB.
About 4-5 tasks are computed concurrently. Work is more-or-less evenly distributed among workers. Some workers take 1 task, others take 2 and process them concurrently.
As tasks are being finished, the remaining tasks start.
A starvation problem is reached when only 1-2 tasks remain to be processed. Other workers wait idle until all tasks are finished.
When all tasks are finished, the message is confirmed and sent to the next topology.
A new batch is read from Kafka and the process starts again.
We have two main issues. First, even with 4 workers and 10 parallelism hint, only 4-5 tasks are started. Second, no more batches are started while there is work pending, even if it is just 1 task.
It is not a problem of not having enough work to do, as we tried inserting 2000 tasks at the beginning, so there is plenty of work to do.
We have tried to increase the parameter "maxSpoutsPending", expecting that the topology would read more batches and queue them at the same time, but it seems they are being pipelined internally, and not processed concurrently.
Topology is created using the following code:
private static StormTopology buildTopologyOD() {
    // This is the marker interface BrokerHosts.
    BrokerHosts hosts = new ZkHosts(configuration.getProperty(ZKHOSTS));
    TridentKafkaConfig tridentConfigCorrelation = new TridentKafkaConfig(hosts, configuration.getProperty(TOPIC_FROM_CORRELATOR_NAME));
    tridentConfigCorrelation.scheme = new RawMultiScheme();
    tridentConfigCorrelation.fetchSizeBytes = Integer.parseInt(configuration.getProperty(MAX_SIZE_BYTES_CORRELATED_STREAM));
    OpaqueTridentKafkaSpout spoutCorrelator = new OpaqueTridentKafkaSpout(tridentConfigCorrelation);
    TridentTopology topology = new TridentTopology();
    Stream existingObject = topology.newStream("kafka_spout_od1", spoutCorrelator)
            .shuffle()
            .each(new Fields("bytes"), new ProcessTask(), new Fields(RESULT_FIELD, OBJECT_FIELD))
            .parallelismHint(Integer.parseInt(configuration.getProperty(PARALLELISM_HINT)));
    // Create a state Factory to produce outputs to kafka topics.
    TridentKafkaStateFactory stateFactory = new TridentKafkaStateFactory()
            .withProducerProperties(kafkaProperties)
            .withKafkaTopicSelector(new ODTopicSelector())
            .withTridentTupleToKafkaMapper(new ODTupleToKafkaMapper());
    existingObject.partitionPersist(stateFactory, new Fields(RESULT_FIELD, OBJECT_FIELD), new TridentKafkaUpdater(), new Fields(OBJECT_FIELD));
    return topology.build();
}
and config created as:
private static Config createConfig(boolean local) {
    Config conf = new Config();
    conf.setMaxSpoutPending(1); // Also tried 2..6
    conf.setNumWorkers(4);
    return conf;
}
Is there anything we can do to improve the performance, either by increasing the number of parallel tasks and/or by avoiding starvation while a batch is finishing?
I found an old post on storm-users by Nathan Marz regarding setting parallelism for Trident:
I recommend using the "name" function to name portions of your stream
so that the UI shows you what bolts correspond to what sections.
Trident packs operations into as few bolts as possible. In addition,
it never repartitions your stream unless you've done an operation
that explicitly involves a repartitioning (e.g. shuffle, groupBy,
partitionBy, global aggregation, etc). This property of Trident
ensures that you can control the ordering/semi-ordering of how things
are processed. So in this case, everything before the groupBy has to
have the same parallelism or else Trident would have to repartition
the stream. And since you didn't say you wanted the stream
repartitioned, it can't do that. You can get a different parallelism
for the spout vs. the each's following by introducing a repartitioning
operation, like so:
stream.parallelismHint(1).shuffle().each(…).each(…).parallelismHint(3).groupBy(…);
I think you might want to set parallelismHint for your spout as well as your .each.
Regarding processing multiple batches concurrently, you are right that that is what maxSpoutPending is for in Trident. Try checking in Storm UI that your max spout pending value is actually picked up. Also try enabling debug logging for the MasterBatchCoordinator. You should be able to tell from that logging whether multiple batches are in flight at the same time or not.
When you say that multiple batches are not processed concurrently, do you mean by ProcessTask? Keep in mind that one of the properties of Trident is that state updates are ordered with regard to batches. If you have e.g. maxSpoutPending=3 and batch 1, 2 and 3 in flight, Trident won't emit more batches for processing until batch 1 is written, at which point it'll emit one more. So slow batches can block emitting more, even if 2 and 3 are fully processed, they have to wait for 1 to finish and get written.
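The blocking effect described above can be illustrated with a small model. This is a simplified sketch, not Storm code: it assumes every in-flight batch is processed fully in parallel, that commits happen strictly in batch order, and that a new batch is emitted as soon as the oldest in-flight batch is committed. The class and method names are hypothetical, and batches are numbered from 0 here.

```java
import java.util.Arrays;

// Simplified, hypothetical model of ordered batch commits; it is not Storm code.
class TridentPendingSketch {

    // durations[i] is the processing time of batch i.
    // Returns the time at which each batch is emitted.
    static long[] emitTimes(long[] durations, int maxSpoutPending) {
        int n = durations.length;
        long[] emit = new long[n];
        long[] commit = new long[n];
        for (int i = 0; i < n; i++) {
            // A new batch is emitted only once batch (i - maxSpoutPending) is committed.
            emit[i] = i < maxSpoutPending ? 0 : commit[i - maxSpoutPending];
            long finished = emit[i] + durations[i];
            // Commits are ordered: a batch cannot commit before its predecessor.
            commit[i] = i == 0 ? finished : Math.max(finished, commit[i - 1]);
        }
        return emit;
    }

    public static void main(String[] args) {
        // Batch 0 takes 20 minutes, the others 1 minute each; 3 batches in flight.
        long[] emit = emitTimes(new long[] {20, 1, 1, 1, 1}, 3);
        System.out.println(Arrays.toString(emit)); // [0, 0, 0, 20, 20]
    }
}
```

In this model the fourth and fifth batches are not emitted until minute 20, even though the second and third batches finished after one minute, because everything waits for the slow first batch to be written.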
If you don't need the batching and ordering behavior of Trident, you could try regular Storm instead.
More of a side note, but you might want to consider migrating off storm-kafka to storm-kafka-client. It's not important for this question, but you won't be able to upgrade to Kafka 2.x without doing it, and it's easier before you get a bunch of state to migrate.

How many Kafka Streams apps are recommended to run on a single machine in production?

In our architecture, we plan to run approximately three JVM processes on one machine, and each JVM can host up to 15 kafka-stream apps.
If I am not wrong, each kafka-stream app spawns one Java thread. So this seems like an awkward architecture, with around 45 kafka-stream apps running on a single machine.
So, I have a question in three parts:
1) Is my understanding correct that each kafka-stream app spawns one Java thread? Also, does each kafka-stream app start a new TCP connection with the Kafka broker?
2) Is there a way to share one TCP connection among multiple kafka-streams apps?
3) Is it difficult (not recommended) to run 45 streams on a single machine?
The answer to this is definitely NO unless there is a real use case in production.
Multiple answers:
a KafkaStreams instance starts one processing thread by default (you
can configure more processing threads, too)
internally, KafkaStreams uses two KafkaConsumers and one KafkaProducer
(if you turn on EOS, it uses even more KafkaProducers): a KafkaConsumer
starts a background heartbeat thread and a KafkaProducer starts a
background sender thread => you get 4 threads in total (processing, 2x
heartbeat, sender); if you configure two processing threads, you end
up with 8 threads in total, etc.
there is more than one TCP connection, as the consumer and the producer
(and the restore consumer, if you enable StandbyTasks) connect to the
cluster
it's not possible to share any TCP connections atm (this would require
a major rewrite of consumers and producers)
how many threads you can run efficiently depends on your hardware and
workload... monitor your CPU utilization and see how busy your machine is...
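A back-of-the-envelope estimate for the question's scenario, using only the per-thread counts given in this answer (one processing thread, two heartbeat threads, one sender thread). It is a rough sketch: EOS producers and any other JVM or client threads are deliberately left out, and the class name is hypothetical.

```java
// Rough estimate based on the thread counts quoted in the answer; not a measurement.
class StreamsThreadEstimate {

    // Per processing thread: the thread itself, 2 consumer heartbeat threads,
    // and 1 producer sender thread.
    static final int JVM_THREADS_PER_STREAM_THREAD = 1 + 2 + 1;

    static int estimate(int kafkaStreamsInstances, int numStreamThreads) {
        return kafkaStreamsInstances * numStreamThreads * JVM_THREADS_PER_STREAM_THREAD;
    }

    public static void main(String[] args) {
        System.out.println(estimate(1, 1));   // 4
        System.out.println(estimate(1, 2));   // 8
        System.out.println(estimate(45, 1));  // 180
    }
}
```

So 45 single-threaded KafkaStreams instances on one machine would come to roughly 180 threads under these assumptions, which is why monitoring CPU utilization matters more than the raw instance count.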
Each Kafka Streams job spawns a single thread by default. If the number of
threads is set to n, it provides parallelism for processing n Kafka
partitions.
If a single machine does not have the capacity to run a large number of
threads, parallelism can be achieved by starting the Streams
application with the same application name on another machine
in the same cluster. Kafka Streams will identify the new instance and
handle it in the background.
Is it difficult (not recommended) to run 45 streams on a single machine?
The answer to this is definitely NO unless there is a real use
case in production. Unless your system has that many cores
or the input has 45 partitions, this is not necessary.

Kafka Streams thread number

I am new to Kafka Streams and am currently confused about the maximum parallelism of a Kafka Streams application. I went through the following link and did not find the answer I am looking for.
https://docs.confluent.io/current/streams/faq.html#streams-faq-scalability-maximum-parallelism
If I have 2 input topics, one with 10 partitions and the other with 5 partitions, and only one Kafka Streams application instance is running to process these two input topics, what is the maximum thread number I can have in this case? 10 or 15?
If I have 2 input topics, one with 10 partitions and the other with 5 partitions
Sounds good. So you have 15 total partitions. Let's assume you have a simple processor topology, without joins and aggregations, so that all 15 partitions are just being statelessly transformed.
Then, each of the 15 input partitions will map to a single Kafka Streams "task". If you have 1 thread, input from these 15 tasks will be processed by that 1 thread. If you have 15 threads, each task will have a dedicated thread to handle its input. So you can run 1 application with 15 threads or 15 applications with 1 thread and it's logically similar: you process 15 tasks in 15 threads. The only difference is that 15 applications with 1 thread each allow you to spread your load across JVMs.
Likewise, if you start 15 instances of the application, each instance with 1 thread, then each application will be assigned 1 task, and each 1 thread in each application will handle its given 1 task.
what is the maximum thread number I can have in this case? 10 or 15?
You can set your maximum thread count to anything. If your thread count across all instances exceeds the total number of tasks, then some of the threads will remain idle.
I recommend reading https://docs.confluent.io/current/streams/architecture.html#parallelism-model, if you haven't yet. Also, study the logs your application produces when it starts up. Each thread logs the tasks it gets assigned, like this:
[2018-01-04 16:45:26,859] INFO (org.apache.kafka.streams.processor.internals.StreamThread:351) stream-thread [entities-eb9c0a9b-ecad-48c1-b4e8-715dcf2afef3-StreamThread-3] partition assignment took 110 ms.
current active tasks: [0_0, 0_2, 1_2, 2_2, 3_2, 4_2, 5_2, 6_2, 7_2, 8_2, 9_2, 10_2, 11_2, 12_2, 13_2, 14_2]
current standby tasks: []
previous active tasks: []
Dmitry's answer does not seem to be completely correct.
Then, each of the 15 input partitions will map to a single Kafka Streams "task"
Not in general. It depends on the "structure" of your topology. It could also be only 10 tasks.
Otherwise, excellent answer from Dmitry!
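To make the 10-versus-15 point concrete: Kafka Streams creates tasks per sub-topology, and a sub-topology that reads several topics gets as many tasks as its largest source topic has partitions. The following is a minimal sketch of that counting rule; the class name and the list-of-lists input format are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that applies the task-counting rule; not Kafka Streams code.
class TaskCountSketch {

    // Each inner list holds the partition counts of the source topics
    // read by one sub-topology.
    static int taskCount(List<List<Integer>> subTopologies) {
        int total = 0;
        for (List<Integer> sourcePartitions : subTopologies) {
            int max = 0;
            for (int partitions : sourcePartitions) {
                max = Math.max(max, partitions);
            }
            // One task per partition of the largest source topic.
            total += max;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two topics processed independently: two sub-topologies.
        System.out.println(taskCount(Arrays.asList(Arrays.asList(10), Arrays.asList(5)))); // 15
        // Both topics feed the same sub-topology, e.g. they are merged.
        System.out.println(taskCount(Arrays.asList(Arrays.asList(10, 5))));                // 10
    }
}
```

In the second case threads beyond 10 would stay idle, so the useful maximum thread count depends on the topology and not only on the total number of partitions.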

How to design task distribution with ZooKeeper

I am planning to write an application which will have distributed Worker processes. One of them will be the Leader, which will assign tasks to the other processes. Designing the Leader election process is quite simple: each process tries to create an ephemeral node in the same path. Whoever is successful becomes the leader.
Now, my question is how to design the process of distributing the tasks evenly? Any recipe for this?
I'll elaborate a little on the environment setup:
Suppose there are 10 worker machines, each running one process, and one of them becomes the leader. Tasks are submitted to a queue; the Leader takes them and assigns them to a worker. The worker processes get notified whenever a task is submitted.
I am not sure I understand your algorithm for Leader election, but the recommended way of implementing this is to use sequential ephemeral nodes and use the algorithm at http://zookeeper.apache.org/doc/r3.3.3/recipes.html#sc_leaderElection which explains how to avoid the "herd" effect.
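The core decision in that recipe can be sketched without a ZooKeeper connection: the process holding the lowest sequence number leads, and every other process watches only the znode directly ahead of it, which is what avoids the herd effect. In this sketch the list of children is passed in by hand, and the znode names and class name are hypothetical; in a real client the list would come from the ZooKeeper server.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the election decision only; it does not talk to ZooKeeper.
class LeaderElectionSketch {

    // Sequence suffix of a sequential znode name, e.g. "n_0000000007" -> 7.
    static int sequence(String znode) {
        return Integer.parseInt(znode.substring(znode.lastIndexOf('_') + 1));
    }

    static List<String> sorted(List<String> children) {
        List<String> copy = new ArrayList<>(children);
        copy.sort(Comparator.comparingInt(LeaderElectionSketch::sequence));
        return copy;
    }

    // The znode with the lowest sequence number belongs to the leader.
    static String leader(List<String> children) {
        return sorted(children).get(0);
    }

    // Returns the znode this process should watch, or null if it is the leader.
    static String nodeToWatch(List<String> children, String own) {
        List<String> ordered = sorted(children);
        int index = ordered.indexOf(own);
        if (index < 0) {
            throw new IllegalArgumentException("own znode is not among the children");
        }
        return index == 0 ? null : ordered.get(index - 1);
    }

    public static void main(String[] args) {
        List<String> children = List.of("n_0000000012", "n_0000000003", "n_0000000007");
        System.out.println(leader(children));                      // n_0000000003
        System.out.println(nodeToWatch(children, "n_0000000012")); // n_0000000007
    }
}
```

When the watched znode disappears, the watching process re-reads the children and repeats the same check, so only one process reacts to each failure.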
Distribution of tasks can be done with a simple distributed queue and does not strictly need a Leader. The producer enqueues tasks and consumers keep a watch on the tasks node - a triggered watch will lead the consumer to take a task and delete the associated znode. There are certain edge conditions to consider with requeuing tasks from failed consumers. http://zookeeper.apache.org/doc/r3.3.3/recipes.html#sc_recipes_Queues
I would recommend the section "Example: Master-Worker Application" of the book ZooKeeper: Distributed Process Coordination http://shop.oreilly.com/product/0636920028901.do
The example demonstrates how to distribute tasks to workers using znodes and common ZooKeeper commands.
Consider using an actor singleton service pattern. For example, in Scala there is Akka which solves this class of problem with less code.