I'm using Heron to perform streaming analytics on IoT data. Currently the architecture has only one spout, with a parallelism of 1.
I'm trying to benchmark how much data Heron can hold in the queue it uses internally at the spout.
I'm playing around with the setMaxSpoutPending() method by passing different values to it. I want to know whether there is any limit on the number we can pass to this method.
Can we push this parameter higher by increasing the system configuration or by providing more resources to the topology?
So if you have one spout and one bolt, then max spout pending is the best way to control the number of pending tuples.
Max spout pending (MSP) can be increased indefinitely. However, increasing it beyond a certain point raises the probability of timeout errors, and in the worst case the topology makes no forward progress. A higher MSP also typically requires more heap for the spout and the other components of the topology.
MSP is used to control the topology ingestion rate; it tells Storm the maximum number of tuples that may be unacknowledged at any given time. If the MSP is lower than the parallelism of the topology, it can be a bottleneck. On the other hand, increasing MSP beyond the topology's parallelism level can lead to the topology being 'flooded' and unable to keep up with the inbound tuples. In such a situation the 'message timeout' of the topology will be exceeded and Storm will attempt to replay those tuples while still feeding new ones. Storm will stop feeding new inbound tuples only when the MSP limit is reached.
So yes, you can tweak it but keep an eye out for increasing timed out tuples indicating that your topology is overwhelmed.
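For reference, a minimal sketch of the knobs involved, assuming the Storm-compatible topology Config that Heron exposes (swap the import if you build against Heron's native API directly); the values are starting points, not recommendations:

```scala
import org.apache.storm.Config

val conf = new Config()

// There is no hard upper bound on this value, but see the caveats above:
// a larger window of unacked tuples means more heap and a higher chance of timeouts.
conf.setMaxSpoutPending(10000)

// If you raise max spout pending, consider raising the tuple timeout as well,
// otherwise the larger in-flight window mostly produces more replays.
conf.setMessageTimeoutSecs(60)
```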
BTW, if you're processing IoT events you may be able to increase parallelism by grouping the spout tuples by the device id (tuple stream per device) using field grouping.
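Roughly like this; the spout/bolt class names and the "deviceId" field are placeholders for whatever your topology actually uses:

```scala
import org.apache.storm.topology.TopologyBuilder
import org.apache.storm.tuple.Fields

// Fields grouping routes all tuples with the same deviceId to the same bolt task,
// so the bolt's parallelism can be raised without splitting a device's stream.
val builder = new TopologyBuilder()
builder.setSpout("iot-spout", new IotSpout(), 1)          // IotSpout: your existing spout
builder.setBolt("per-device-bolt", new DeviceBolt(), 4)   // DeviceBolt: your processing bolt
  .fieldsGrouping("iot-spout", new Fields("deviceId"))
```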
For example, how does it re-hash after adding nodes? Will there be some period of unavailability?
There are no official documents covering this, so I wonder whether ThingsBoard supports smooth dynamic expansion.
ThingsBoard Rule Engine uses Zookeeper to detect the addition/deletion of sibling nodes. The stream of Rule Engine messages is stored in a Kafka topic with a configurable number of partitions. When you add a node, the PartitionService.recalculatePartitions method is executed on each node to get the list of partitions that node is responsible for. So, the old Rule Engine nodes will stop consuming certain partitions and the new Rule Engine node will start consuming them. For a short period of time you may notice a slight degradation in processing speed (due to the warm-up of the new node), but this should not affect the data processing logic itself.
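To illustrate the general idea only (this is not ThingsBoard's actual implementation): each node can derive its share of partitions from the sorted list of live nodes, so adding a node simply moves a subset of partitions to the newcomer.

```scala
// Generic sketch: partition ownership as a function of the live-node list.
def myPartitions(partitionCount: Int, liveNodes: Seq[String], myNodeId: String): Seq[Int] = {
  val ordered = liveNodes.sorted
  val myIndex = ordered.indexOf(myNodeId)
  (0 until partitionCount).filter(_ % ordered.size == myIndex)
}

myPartitions(12, Seq("node-a"), "node-a")            // node-a owns all 12 partitions
myPartitions(12, Seq("node-a", "node-b"), "node-a")  // after scale-out: 0, 2, 4, 6, 8, 10
myPartitions(12, Seq("node-a", "node-b"), "node-b")  // the new node: 1, 3, 5, 7, 9, 11
```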
As a side effect, some messages may be processed twice in this case. For example, Rule Engine node A polls 1000 messages from the queue but has not committed the offset yet, and the messages are still traveling through the rule chains. Then the second node starts and begins reading the same partition; it will process the uncommitted messages again.
If you must avoid duplicates in processing, configure your queue to process messages one by one (sequentially, with a pack size of 1), although this will cause performance degradation.
I need to consume messages from an event source (represented as a single Kafka topic) producing about 50k to 250k events per second. It provides only a single partition, and the ping is quite high (90-100 ms).
As far as I have learned by reading the Kafka client code, during polling a fetch request is issued; once the response is fully read, the events/messages are parsed and deserialized, and once enough of them are available consumer.poll() provides the subset to the calling application.
In my case this makes the whole thing not worthwhile. The best throughput I achieve is with a fetch duration of about 2 s per request (fetch.max.bytes of about 2.5 MB). Smaller fetch durations increase the idle time (the time the consumer does not receive any bytes) spent between the last byte of the previous response, parsing, deserialization, sending the next request, and waiting for the first byte of the next response.
Using a fetch duration of about 2 s results in a maximum latency of 2 s, which is highly undesirable. What I would like is for the messages in a fetch response to become available to the consumer as soon as each individual message is fully transmitted, while the response is still being received.
Since every message has an individual id and the messages are sent in a particular order, while only a single consumer (and thread) is active for the single partition, it is not a problem to suppress retransmitted messages in case a fetch response is aborted or fails and its messages were partially processed and later retransmitted.
So the big question is whether the Kafka client provides a way to consume messages from a not-yet-completed fetch response.
That is a pretty large number of messages coming in through a single partition. Since you can't control anything on the Kafka server, the best you can do is configure your client to be as efficient as possible, assuming you have access to the Kafka client configuration parameters. You didn't mention anything about needing to consume the messages as fast as they're generated, so I'm assuming you don't need that. I also didn't see any info about average message size or how much message sizes vary, but unless those are extreme values, the suggestions below should help.
The first thing you need to do is set max.poll.records on the client side to a smallish number (say, start with 10000) and see how much throughput that gets you. Make sure to consume without doing anything with the messages, just dump them on the floor, and then call poll() again. This is just to benchmark how much performance you can get with your fixed server setup. Then increase or decrease that number depending on whether you need better throughput or latency. You should be able to find the best-case scenario after playing with this for a while.
After having done the above, the next step is to change your code so it dumps all received messages into an internal in-memory queue and then calls poll() again. This is especially important if processing each message requires DB access, hitting external APIs, etc. If you take even 100 ms to process 1K messages, that can cut your performance in half in your case (100 ms to poll/receive, and then another 100 ms to process the received messages before you start the next poll()).
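Something along these lines, assuming a reasonably recent Java Kafka client (poll(Duration) needs 2.0+); the topic name, broker address, queue capacity and the commented-out process() call are placeholders:

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import java.util.concurrent.ArrayBlockingQueue
import org.apache.kafka.clients.consumer.{ConsumerRecord, KafkaConsumer}

object PollBenchmark {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092")
    props.put("group.id", "bench-consumer")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("max.poll.records", "10000")    // the knob to tune up/down for throughput vs latency
    props.put("fetch.max.bytes", "2621440")   // ~2.5 MB, matching the question

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("events"))

    // Step 2: hand records off to an in-memory queue so poll() is called again
    // immediately instead of waiting for per-message work (DB calls, external APIs, ...).
    val buffer = new ArrayBlockingQueue[ConsumerRecord[String, String]](200000)
    val worker = new Thread(new Runnable {
      override def run(): Unit = while (true) {
        val record = buffer.take()
        // process(record)  // real processing happens here, off the polling thread
      }
    })
    worker.setDaemon(true)
    worker.start()

    // Step 1: for the pure benchmark, replace buffer.put with a no-op ("dump on the floor").
    while (true) {
      val records = consumer.poll(Duration.ofMillis(500)).iterator()
      while (records.hasNext) buffer.put(records.next())
    }
  }
}
```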
Without having access to Kafka configuration parameters on the server side, I believe the above should get you pretty close to an optimal throughput for your configuration.
Feel free to post more details in your question, and I'd be happy to update my answer if that doesn't help.
To deal with such a high throughput, these are the community recommendations for the number of partitions on a source topic. It is worth considering all of these factors when choosing the number of partitions:
• What is the throughput you expect to achieve for the topic?
• What is the maximum throughput you expect to achieve when consuming from a single partition?
• If you are sending messages to partitions based on keys, adding partitions later can be very challenging, so calculate throughput based on your expected future usage, not the current usage.
• Consider the number of partitions you will place on each broker and the available disk space and network bandwidth per broker.
So if you want to be able to write and read 1 GB/sec from a topic, and each consumer can only process 50 MB/s, then you need at least 20 partitions. This way, you can have 20 consumers reading from the topic and achieve 1 GB/sec.
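The arithmetic behind that rule of thumb, for whatever numbers apply to your setup:

```scala
// partitions ≈ target topic throughput / what a single consumer can process
def partitionsNeeded(targetMBps: Double, perConsumerMBps: Double): Int =
  math.ceil(targetMBps / perConsumerMBps).toInt

partitionsNeeded(1000, 50)  // ~1 GB/s with 50 MB/s consumers -> 20 partitions
```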
Also, regarding fetch.max.bytes, I am sure you have already had a look at this one: Kafka fetch max bytes doesn't work as expected.
We are trying to benchmark the performance of our Storm topology. We are ingesting around 1000 messages/second into a Kafka topic. When we set max.spout.pending=2000 on our KafkaSpout we don't see any failed messages in the Storm UI, but when we decrease the max.spout.pending value to 500 or 100, we see a lot of failed messages at the spout in the Storm UI. My understanding was that if we keep max.spout.pending low we should not have any failed messages, as nothing would time out, but it is behaving in the opposite manner. We are using Storm 1.1.0 from HDP 2.6.5.
We have one Kafka spout and two bolts:
KafkaSpout Parallelism - 1
Processing Bolt Parallelism - 1
Custom Kafka Writer Bolt Parallelism - 1
Does anyone have any idea about this?
The first thing you will have to do is check the latency statistics in the Storm UI. You should also look at how the bolts/spouts are loaded (the capacity statistics).
Is the rate at which tuples are emitted much higher than the rate at which the data is sunk? That is the indication I get from your observation that increasing max spout pending fixes the issue. Can you provide those stats? Another avenue worth exploring is increasing the tuple timeout, to see whether timeouts are causing replays and flooding the topology.
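If it helps, a sketch of that timeout experiment with the Storm 1.1 Config API; the values are starting points only, not recommendations:

```scala
import org.apache.storm.Config

val conf = new Config()
conf.setMessageTimeoutSecs(120)  // default is 30s; raising it helps rule out replay storms
conf.setMaxSpoutPending(2000)    // the value that showed no failures in your runs
```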
Please find the topology stats below:
This is interesting. You are right; follow these steps to narrow down the issue:
Upload a screenshot of your Topology Visualization screen at peak load.
Check for bolts that change their color to brown/red. Red indicates that a bolt takes too much time to process records.
Your spout/bolt executor counts look far too low to process 1K tuples per second.
How many machines are you using?
If tuples fail in the KafkaSpout, most of the time it means a timeout error.
Find out how many tuples are failing after processing.
We are using spark-streaming-kafka-0-8 receivers. We are not able to increase the number of consumed events by increasing numPartitions; it seems that increasing numPartitions doesn't affect performance.
The KafkaUtils.createStream method takes a topic_name-to-numPartitions map, where each partition should be consumed in its own thread.
Currently we are working with:
KafkaUtils.createStream[Integer, Event, IntegerDecoder, EventDecoder](ssc,
Configuration.kafkaConfig, scala.collection.immutable.Map(topic -> 1),
StorageLevel.MEMORY_AND_DISK)
I would expect that using scala.collection.immutable.Map(topic -> 10) would pull many more events than using 1 thread, but it doesn't improve performance (I made sure that 10 threads are in fact used per receiver).
However, if I create more Kafka receivers (which, from my understanding, is exactly equivalent to increasing threads) the performance does improve.
Is this a problem with version 0-8?
Should increasing numPartitions improve amount of consumed events?
Why does adding receivers improve performance while increasing numPartition doesn't?
Is this a problem with version 0-8?
No, it is a "problem" with the receiver-based approach, which is what you're using with createStream. That approach creates a single thread for consumption on a given executor node. If you want to read concurrently, you have to create multiple such receivers (see the sketch after the quoted documentation).
Per the documentation:
Topic partitions in Kafka does not correlate to partitions of RDDs generated in Spark Streaming. So increasing the number of topic-specific partitions in the KafkaUtils.createStream() only increases the number of threads using which topics that are consumed within a single receiver. It does not increase the parallelism of Spark in processing the data.
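For completeness, a sketch of the multiple-receivers workaround, reusing the identifiers from your question (ssc, topic, Event, the decoders and Configuration.kafkaConfig); the receiver count here is arbitrary:

```scala
// Each createStream call gets its own receiver; union the resulting DStreams
// so downstream processing sees a single stream.
val numReceivers = 4
val streams = (1 to numReceivers).map { _ =>
  KafkaUtils.createStream[Integer, Event, IntegerDecoder, EventDecoder](
    ssc, Configuration.kafkaConfig, scala.collection.immutable.Map(topic -> 1),
    StorageLevel.MEMORY_AND_DISK)
}
val unified = ssc.union(streams)
```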
If you want to increase concurrency, use the direct (receiverless) approach (KafkaUtils.createDirectStream), which dispatches each TopicPartition to a given executor node for consumption, thus allowing all executors to participate in consuming from Kafka.
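And a minimal direct-stream sketch for the 0-8 integration; the broker list is a placeholder, and String decoders stand in for your Event decoder:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// One Spark task per Kafka partition; with a single-partition topic this is still
// one consuming task, but downstream stages can repartition and parallelize.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
val directStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set(topic))
```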
This is a question regarding how Storm's max spout pending works. I currently have a spout that reads a file and emits a tuple for each line in the file (I know Storm is not the best solution for dealing with files but I do not have a choice for this problem).
I set topology.max.spout.pending to 50k to throttle how many tuples get into the topology to be processed. However, I see this setting having no effect: all records in the file are emitted every time. My guess is that this is due to a loop I have in the nextTuple() method that emits every record in the file.
My question is: Does Storm just stop calling nextTuple() for the Spout task when topology.max.spout.pending is reached? Does this mean I should only emit one tuple every time the method is called?
Exactly! Storm can only limit your spout with the next command, so if you transmit everything when you receive the first next, there is no way for Storm to throttle your spout.
The Storm developers recommend emitting a single tuple with a single next command. The Storm framework will then throttle your spout as needed to meet the "max spout pending" requirement. If you're emitting a high number of tuples, you can batch your emits to at most a tenth of your max spout pending, to give Storm the chance to throttle.
Storm topologies have a max spout pending parameter. The max spout pending value for a topology can be configured via the "topology.max.spout.pending" setting in the topology configuration yaml file. This value puts a limit on how many tuples can be in flight, i.e. have not yet been acked or failed, in a Storm topology at any point of time. The need for this parameter comes from the fact that Storm uses ZeroMQ to dispatch tuples from one task to another task. If the consumer side of ZeroMQ is unable to keep up with the tuple rate, then the ZeroMQ queue starts to build up. Eventually tuples timeout at the spout and get replayed to the topology thus adding more pressure on the queues. To avoid this pathological failure case, Storm allows the user to put a limit on the number of tuples that are in flight in the topology. This limit takes effect on a per spout task basis and not on a topology level. (source) For cases when the spouts are unreliable, i.e. they don't emit a message id in their tuples, this value has no effect.
One of the problems that Storm users continually face is in coming up with the right value for this max spout pending parameter. A very small value can easily starve the topology and a sufficiently large value can overload the topology with a huge number of tuples to the extent of causing failures and replays. Users have to go through several iterations of topology deployments with different max spout pending values to find the value that works best for them.
One solution is to build the input queue outside the nextTuple method, so the only thing nextTuple does is poll the queue and emit. If you are processing multiple files, your nextTuple method should also check whether the result of polling the queue is null, and if so, atomically reset the source file that is populating your queue.
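A minimal sketch of that pattern, assuming Storm 1.x's BaseRichSpout; the file path, queue capacity and output field name are placeholders, and the reader thread stands in for whatever fills your queue:

```scala
import java.util.{Map => JMap}
import java.util.concurrent.LinkedBlockingQueue
import scala.io.Source
import org.apache.storm.spout.SpoutOutputCollector
import org.apache.storm.task.TopologyContext
import org.apache.storm.topology.OutputFieldsDeclarer
import org.apache.storm.topology.base.BaseRichSpout
import org.apache.storm.tuple.{Fields, Values}

class QueueBackedFileSpout(path: String) extends BaseRichSpout {
  private var collector: SpoutOutputCollector = _
  private val queue = new LinkedBlockingQueue[String](100000)

  override def open(conf: JMap[_, _], context: TopologyContext,
                    collector: SpoutOutputCollector): Unit = {
    this.collector = collector
    // The queue is filled outside nextTuple(), on its own thread.
    val reader = new Thread(new Runnable {
      override def run(): Unit =
        Source.fromFile(path).getLines().foreach(line => queue.put(line))
    })
    reader.setDaemon(true)
    reader.start()
  }

  override def nextTuple(): Unit = {
    // Emit at most one tuple per call so max.spout.pending can throttle the spout.
    val line = queue.poll()
    if (line != null) {
      collector.emit(new Values(line), line)  // the line doubles as the message id for acking
    } else {
      // Queue drained: if you process multiple files, this is where you would
      // atomically switch the reader to the next file.
    }
  }

  override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit =
    declarer.declare(new Fields("line"))
}
```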