What is the meaning of 99th percentile latency and throughput - streaming

I've read an article benchmarking the performance of stream processing engines like Spark Streaming, Storm, and Flink. In the evaluation, the criteria were 99th percentile latency and throughput. For example, Apache Kafka sent data at around 100,000 events per second, the three engines acted as stream processors, and their performance was described using 99th percentile latency and throughput.
Can anyone clarify these two criteria for me?

A 99th percentile latency of X milliseconds in a streaming job means that 99% of the items arrived at the end of the pipeline in less than X milliseconds. Throughput, the other criterion, is simply the rate at which the pipeline processes events, e.g. events per second. Read this reference for more details.
When application developers expect a certain latency, they often need
a latency bound. We measure several latency bounds for the stream
record grouping job which shuffles data over the network. The
following figure shows the median latency observed, as well as the
90-th, 95-th, and 99-th percentiles (a 99-th percentile latency of
50 milliseconds, for example, means that 99% of the elements arrive at
the end of the pipeline in less than 50 milliseconds).
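As a toy illustration (not from the article or the reference above), the percentile figure is just an order statistic over the per-record end-to-end latencies. A minimal Java sketch with made-up latency values:

import java.util.Arrays;

public class PercentileLatency {

    // Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
    static double percentile(double[] latenciesMs, double p) {
        double[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        // Hypothetical end-to-end latencies (ms) of records leaving the pipeline.
        double[] latenciesMs = {9, 12, 14, 18, 25, 31, 42, 47, 55, 610};
        System.out.printf("median = %.1f ms%n", percentile(latenciesMs, 50));
        System.out.printf("99th percentile = %.1f ms%n", percentile(latenciesMs, 99));
        // A 99th percentile of X ms means 99% of records completed in at most X ms,
        // so a single slow record (610 ms here) dominates the tail but not the median.
    }
}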

Related

How to identify data skew in Spark jobs via the web/YARN UI?

We have a Spark job (HDInsight) whose run time has been increasing slowly over the last couple of months. Based on the Spark UI, are there any indicators that data skew is the reason its performance is degrading? Below are the stage details; please see the difference between the median and the 75th percentile. How should I go about optimizing this job? I'd appreciate any guidance.
Also, what is the optimal value for spark.sql.shuffle.partitions, given that the input dataset size is around 12 GB and the cluster has 128 cores?
Normally we use the following approach to identify possible stragglers, and you have caught on to that:
When the Max duration among completed tasks is significantly higher than
the Median or 75th percentile value, this indicates the possibility
of stragglers.
The median in your case says enough.
You need to salt the key to distribute the data better (which can be complicated), or look at the current partitioning approach; a rough sketch of salting follows after the link below.
This article https://medium.com/swlh/troubleshooting-stragglers-in-your-spark-application-47f2568663ec provides good guidance.
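Key salting, as suggested above, might look roughly like the following Spark (Java API) sketch; the column names, input path, and salt range are illustrative assumptions, not taken from the question:

import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SaltedAggregation {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("salting-sketch").getOrCreate();

        Dataset<Row> df = spark.read().parquet("/path/to/input"); // placeholder input

        int saltBuckets = 32; // spread each hot key over 32 partial groups

        // Stage 1: aggregate on (key, random salt) so a hot key is split across many tasks.
        Dataset<Row> partial = df
            .withColumn("salt", floor(rand().multiply(lit(saltBuckets))))
            .groupBy(col("key"), col("salt"))
            .agg(sum("value").alias("partial_sum"));

        // Stage 2: drop the salt and combine the partial results per key.
        Dataset<Row> result = partial
            .groupBy(col("key"))
            .agg(sum("partial_sum").alias("total"));

        result.show();
        spark.stop();
    }
}

The trade-off is an extra shuffle and aggregation stage, which is usually cheaper than a handful of straggler tasks processing a hot key alone.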

Higher concurrency vs lower concurrency

When doing a load test (using Siege, for example) against a server, is a lower concurrency number better?
What does this number signify?
The Siege docs go into detail on concurrency here: https://www.joedog.org/2012/02/17/concurrency-single-siege/
From that page:
The calculation is simple: total transactions divided by elapsed time. If we did 100 transactions in 10 seconds, then our concurrency was 10.00.
A higher concurrency measure CAN mean that your server is handling more connections faster, but it can also mean that your server is falling behind on calculations and causing connections to be queued. So your concurrency measure is only valuable when taken in the context of elapsed time.

How long should a weighted average take on Apache Ignite?

I am currently benchmarking Apache Ignite for a near real-time application, and simple operations seem to be excessively slow for a relatively small sample size. The following gives the setup details and timings; please see the two questions at the bottom.
Setup:
Cache mode: Partitioned
Number of server nodes: 3
CPUs: 4 per node (12)
Heap size: 2GB per node (6GB)
The first use case is computing the weighted average over two fields of the object at different rates.
The first method is to run a SQL-style query:
...
SqlFieldsQuery query = new SqlFieldsQuery("select SUM(field1*field2)/SUM(field2) from MyObject");
cache.query(query).getAll();
...
The observed timings are:
Cache size: 500,000 entries, queries/second: 10
Median: 428 ms, 90th percentile: 13,929 ms
Cache size: 500,000 entries, queries/second: 50
Median: 191,465 ms, 90th percentile: 402,285 ms
Clearly this is queuing up, with enormous latency (>400 ms); a simple weighted-average computation on a single JVM (4 cores) takes 6 ms.
The second approach is to use IgniteCompute to broadcast Callables across the nodes, compute the weighted average on each node, and reduce at the caller (see the sketch after the timings below). Latency is only marginally better and throughput improves, but both are still at unusable levels.
Cache size: 500,000 entries, queries/second: 10
Median: 408 ms, 90th percentile: 507 ms
Cache size: 500,000 entries, queries/second: 50
Median: 114,155 ms, 90th percentile: 237,521 ms
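For concreteness, the broadcast/reduce approach might look roughly like the sketch below; the cache name, key type, and MyObject's fields are assumptions, not the code actually used for these timings:

import java.io.Serializable;
import java.util.Collection;

import javax.cache.Cache;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CachePeekMode;
import org.apache.ignite.lang.IgniteCallable;
import org.apache.ignite.resources.IgniteInstanceResource;

public class WeightedAverageBroadcast {

    // Minimal stand-in for the cached value type.
    public static class MyObject implements Serializable {
        public double field1;
        public double field2;
    }

    // Partial sums computed on one node: sum(field1 * field2) and sum(field2).
    public static class Partial implements Serializable {
        public double weightedSum;
        public double weightSum;
    }

    public static class LocalWeightedSum implements IgniteCallable<Partial> {
        @IgniteInstanceResource
        private transient Ignite ignite;

        @Override public Partial call() {
            IgniteCache<Long, MyObject> cache = ignite.cache("myObjects");
            Partial p = new Partial();
            // Scan only the primary copies held on this node so each entry is counted once.
            for (Cache.Entry<Long, MyObject> e : cache.localEntries(CachePeekMode.PRIMARY)) {
                p.weightedSum += e.getValue().field1 * e.getValue().field2;
                p.weightSum += e.getValue().field2;
            }
            return p;
        }
    }

    public static double weightedAverage(Ignite ignite) {
        // Run the local job on every node and reduce the partial sums at the caller.
        Collection<Partial> partials = ignite.compute().broadcast(new LocalWeightedSum());
        double weightedSum = 0, weightSum = 0;
        for (Partial p : partials) {
            weightedSum += p.weightedSum;
            weightSum += p.weightSum;
        }
        return weightedSum / weightSum;
    }
}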
A few things I noticed during the experiment:
No disk swapping is happening
CPUs run at up to 400%
Query is split up in two different weighted averages (map reduce)
Entries are evenly split across the nodes
No garbage collections are triggered, with each node's heap usage around 500 MB
To my questions:
Are these timings expected, or is there some obvious setting I am missing? I could not find benchmarks of similar operations.
What is the advised method to run fork-join style computations on Ignite without moving data?
This topic was discussed in detail on Apache Ignite user forum: http://apache-ignite-users.70518.x6.nabble.com/Ignite-performance-td6703.html

How to interpret Storm UI statistics

I have some problems interpreting Storm UI stats.
I've deployed a simple topology with a Kafka spout and a custom bolt that does nothing but acknowledge tuples. The idea was to measure the performance of the Kafka spout.
So, looking at these stats, I have some questions.
1) In the last 10 minutes the topology acked 1,619,220 tuples and the complete latency was 14.125 ms. But if you do the math:
(1619220 * 14.125) / (1000 * 60) = 381
1,619,220 tuples with a complete latency of 14.125 ms each would require 381 minutes to pass through the topology, yet those stats cover only the last 10 minutes. Is the complete latency showing a wrong number, or am I interpreting it incorrectly?
2) Bolt capacity is around 0.5. Does that prove that the bottleneck is the Kafka spout?
I would appreciate any information about improving Storm topology performance, since it's not obvious to me right now.
Because tuple processing overlaps for the many tuples in flight due to parallelism, your calculation is not correct. (It would only be correct if Storm processed a single tuple completely before starting the next one.)
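As a rough sanity check (an illustration, not part of the original answer), you can work backwards from the reported stats to see how many tuples must have been in flight at once:

public class StormLatencyCheck {
    public static void main(String[] args) {
        double ackedTuples = 1_619_220;
        double completeLatencyMs = 14.125;
        double windowMs = 10 * 60 * 1000; // the 10-minute window

        // Total tuple-milliseconds of latency spread over the window.
        double avgTuplesInFlight = ackedTuples * completeLatencyMs / windowMs;

        // Prints roughly 38: on average about 38 tuples were being processed
        // concurrently, which is how 381 "serial" minutes fit into 10 minutes.
        System.out.printf("average tuples in flight: %.1f%n", avgTuplesInFlight);
    }
}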
A capacity of 0.5 is not really an overload; overload would be more than 1. But I would suggest keeping it below 0.9 or maybe 0.8 to leave some headroom for spikes in your input rate.

How to calculate bandwidth requirements based upon flows per minute (fpm)?

I want to know how one can calculate bandwidth requirements based upon flows, and vice versa.
Meaning, if I had to achieve a total of 50,000 netflows, what is the bandwidth requirement to produce this number? Is there a formula for this? I'm using this to size up a flow analyzer appliance. If its license says it supports 50,000 flows, what does that mean? And if I increase bandwidth, at what point would I exceed the license coverage?
Most applications and appliances list flow volume per second, so you are asking what the bandwidth requirement is to transport 50k netflow updates per second:
For NetFlow v5, each record is 48 bytes, with each packet carrying 20 or so records plus about 24 bytes of overhead per packet. This means you'd use about 20 Mbps to carry the flow packets. If you used NetFlow v9, which uses a template-based format, it might be a bit more or less, depending on what is included in the template.
However, if you are asking how much bandwidth you can monitor with 50k netflow updates per second, the answer becomes more complex. In our experience monitoring regular user networks (using Google, Facebook, etc.), an average flow update encompasses roughly 25 kbytes of transferred data, meaning 50,000 flow updates per second would equate to monitoring 10 Gbps of traffic. Keep in mind that these are very rough estimates.
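To make the two estimates above concrete, here is a small back-of-the-envelope calculation using the same assumed numbers (48-byte v5 records, about 20 records and 24 bytes of overhead per packet, about 25 kbytes of user traffic per flow):

public class NetflowSizing {
    public static void main(String[] args) {
        double flowsPerSecond = 50_000;

        // Export bandwidth: NetFlow v5 records are 48 bytes, ~20 per packet,
        // with ~24 bytes of packet overhead.
        double packetsPerSecond = flowsPerSecond / 20;
        double exportBytesPerSecond = flowsPerSecond * 48 + packetsPerSecond * 24;
        double exportMbps = exportBytesPerSecond * 8 / 1_000_000;
        System.out.printf("Export traffic: ~%.1f Mbps%n", exportMbps); // roughly 20 Mbps

        // Monitored bandwidth: assuming an average flow covers ~25 kbytes of user traffic.
        double monitoredBytesPerSecond = flowsPerSecond * 25_000;
        double monitoredGbps = monitoredBytesPerSecond * 8 / 1_000_000_000.0;
        System.out.printf("Monitored traffic: ~%.0f Gbps%n", monitoredGbps); // roughly 10 Gbps
    }
}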
This answer may vary, however, if you are monitoring a server-centric network in a datacenter, where each flow may be much larger or smaller depending on the content served. The bigger each individual traffic session is, the more bandwidth you can monitor with 50,000 flow updates per second.
Finally, if you enable sampling in your NetFlow exporters you will be able to monitor much more bandwidth. Sampling will only use 1 in every N packets, thus skipping many flows. This lowers the total number of flow updates per second needed to monitor high-volume networks.