Higher concurrency vs lower concurrency - rest

When doing a load test of a server (using Siege, for example), is a lower concurrency number better?
What does this number signify?

The Siege docs go into detail on concurrency here: https://www.joedog.org/2012/02/17/concurrency-single-siege/
From that page:
The calculation is simple: total transactions divided by elapsed time. If we did 100 transactions in 10 seconds, then our concurrency was 10.00.
A higher concurrency measure CAN mean that your server is handling more connections faster, but it can also mean that your server is falling behind on calculations, causing connections to be queued. So your concurrency measure is only valuable when taken in the context of elapsed time.
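As a rough sketch of the quoted formula (illustrative only, not Siege's actual source), concurrency here is just transactions divided by elapsed seconds, so the same transaction count reads as higher concurrency when the wall-clock time is shorter:

    # Sketch of the concurrency figure as described in the quoted Siege page:
    # total transactions divided by elapsed time. Illustrative only; this is
    # not taken from Siege's source code.

    def concurrency(transactions: int, elapsed_seconds: float) -> float:
        return transactions / elapsed_seconds

    print(concurrency(100, 10.0))  # 100 transactions in 10 s -> 10.00
    print(concurrency(100, 20.0))  # same work over 20 s      -> 5.00

    # The bare number is ambiguous on its own: it has to be read together
    # with the elapsed time (and response times) to tell fast handling
    # apart from connections piling up in a queue.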

Related

What is the meaning of 99th percentile latency and throughput

I've read some articles benchmarking the performance of stream-processing engines such as Spark Streaming, Storm, and Flink. In the evaluation, the criteria were 99th percentile latency and throughput. For example, Apache Kafka sent data at around 100,000 events per second, the three engines acted as stream processors, and their performance was described using 99th percentile latency and throughput.
Can anyone clarify these two criteria for me?
A 99th percentile latency of X milliseconds in stream jobs means that 99% of the items arrived at the end of the pipeline in less than X milliseconds. Read this reference for more details.
When application developers expect a certain latency, they often need a latency bound. We measure several latency bounds for the stream record grouping job which shuffles data over the network. The following figure shows the median latency observed, as well as the 90-th, 95-th, and 99-th percentiles (a 99-th percentile latency of 50 milliseconds, for example, means that 99% of the elements arrive at the end of the pipeline in less than 50 milliseconds).
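To make the definition concrete, here is a small sketch that computes the median and the 90th/95th/99th percentile from a set of per-record end-to-end latencies (the latency values are randomly generated purely for illustration):

    # Sketch: computing percentile latencies from per-record end-to-end
    # latencies (milliseconds). The values are made up for illustration.
    import random
    import statistics

    random.seed(42)
    latencies_ms = [random.gauss(20, 5) + random.expovariate(1 / 5) for _ in range(10_000)]

    def percentile(values, p):
        """Return the p-th percentile (0-100) using nearest-rank on sorted data."""
        ordered = sorted(values)
        k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
        return ordered[k]

    print(f"median : {statistics.median(latencies_ms):6.1f} ms")
    for p in (90, 95, 99):
        print(f"p{p}    : {percentile(latencies_ms, p):6.1f} ms")

    # A p99 of X ms means 99% of records reached the end of the pipeline
    # in less than X ms; the remaining 1% took longer.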

Benefits of using aggressive timeouts with reactive programming

In the blocking world, it is highly recommended to set aggressive timeouts in order to fail fast and release the underlying resources (Section 5.1 of https://pragprog.com/book/mnee/release-it).
In the async/non-blocking world, requests do not block the main thread, and resources are immediately available for further processing. Timeouts are still necessary; however, does it still make sense to set aggressive values?
In real-time software, network requests or control operations on machinery take a large amount of time in comparison to day-to-day software operations. For instance, telling a step motor to advance to a particular position may take seconds, while normal operations might take milliseconds. Let's say that a typical step motor advance takes n milliseconds, and one that goes the maximum distance takes m milliseconds.
An aggressive timeout would compute n and add a small fudge factor, perhaps 10%, and fail quickly if the goal wasn't reached in that time. As you stated, the aggressive timeout will allow you to release resources. A non-aggressive timeout of m plus epsilon would fail much more slowly, and tie up resources unnecessarily.
In the asynchronous software world, there are a number of other choices between success and failure. An asynchronous operation might also calculate n plus 10%, put up a progress bar (if user feedback is desired), and show progress toward the estimated completion. When that timeout is reached, the progress bar would be full, but you might make it pulse or change color to indicate the operation is taking longer than expected. If the step motor still had not reached its goal after m milliseconds, then you could announce a failure.
In other cases, when the feedback is not important, then you could certainly use m plus epsilon as your timeout.
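As a sketch of the "n plus a fudge factor, then m plus epsilon" idea in non-blocking code (the motor coroutine and the timing constants below are hypothetical, not taken from the book):

    # Sketch: an aggressive timeout (typical time n + 10%) that degrades to a
    # warning, and a hard failure only once the worst-case time m has passed.
    import asyncio

    TYPICAL_MS = 800        # n: a typical step-motor advance (hypothetical)
    WORST_CASE_MS = 3000    # m: an advance over the maximum distance (hypothetical)

    async def advance_motor() -> None:
        # Stand-in for the real non-blocking control operation.
        await asyncio.sleep(1.2)

    async def advance_with_timeouts() -> None:
        task = asyncio.ensure_future(advance_motor())
        try:
            # Aggressive bound: n + 10%. Shield so the operation keeps running.
            await asyncio.wait_for(asyncio.shield(task), timeout=TYPICAL_MS * 1.1 / 1000)
        except asyncio.TimeoutError:
            # Not yet a failure: e.g. pulse the progress bar and keep waiting.
            print("slower than expected, still waiting...")
            try:
                # Hard bound: the remaining budget up to m (plus epsilon).
                await asyncio.wait_for(task, timeout=(WORST_CASE_MS - TYPICAL_MS * 1.1) / 1000)
            except asyncio.TimeoutError:
                raise RuntimeError("motor did not reach its goal") from None
        print("motor reached its goal")

    asyncio.run(advance_with_timeouts())

The point is unchanged from the blocking world: the aggressive bound is derived from the typical case n, while only the worst case m determines an outright failure.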

Burst limit of IBM API Connect

I found "rate limit" and "burst limit" at Design section of API Designer,
What is the difference of them?
Rate limit can be set at second, minute, hour, day an week time interval.
On the other hand, burst limit can be set only second and minute time interval.
Does it mean same to set 1/1sec rate limit and to set 1/1sec burst limit?
Different Plans can have differing rate limits, both between operations and for the overall limit. This is useful for providing differing levels of service to customers. For example, a "Demo Plan" might enforce a rate limit of ten calls per minute, while a "Full Plan" might permit up to 1000 calls per second.
You can apply burst limits to your Plans, to prevent usage spikes that might damage infrastructure. Multiple burst limits can be set per Plan, at second and minute time intervals.
That said, these two parameters have different meanings and can be used together. For example: I want to permit a total of 1000 calls per hour (rate limit) and a maximum spike of 50 calls per second (burst limit).
A rate limit enforces how many calls (in total) are possible within a given time frame; after that, no further calls are allowed. This is used to create staged plans with different limits and charges (e.g. entry or free, medium, enterprise).
Burst limits are used to manage system load by capping the maximum number of calls over a short interval (hence seconds or minutes), to prevent usage spikes. They can be used to make sure the allowed number of API calls (the rate limit) is spread evenly across the set time frame (day, week, month), and also to protect the backend system from overload.
So you could set a rate limit of 1000 API calls per week and a burst limit of 100 calls per minute. Ten "heavy" minutes would then consume the entire weekly rate, or a user could spread things out at 100+ calls per day to reach the 1000 calls per week.
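A toy sketch of how the two limits combine (simple fixed-window counters, not API Connect's actual implementation; the 1000-per-week and 100-per-minute values are just the example above):

    # Toy sketch (not API Connect's implementation): a weekly rate limit of
    # 1000 calls combined with a per-minute burst limit of 100 calls, using
    # fixed-window counters keyed by (window kind, window start).
    from collections import defaultdict

    WEEK = 7 * 24 * 3600
    MINUTE = 60
    RATE_LIMIT = 1000    # total allowance per week
    BURST_LIMIT = 100    # spike protection per minute

    windows = defaultdict(int)

    def allow_call(now: float) -> bool:
        week_key = ("week", int(now // WEEK))
        minute_key = ("minute", int(now // MINUTE))
        if windows[week_key] >= RATE_LIMIT:
            return False     # weekly rate limit exhausted
        if windows[minute_key] >= BURST_LIMIT:
            return False     # burst limit: too many calls this minute
        windows[week_key] += 1
        windows[minute_key] += 1
        return True

    # A spike of 150 calls in the same second: only 100 get through,
    # and those 100 also count against the 1000-per-week allowance.
    print(sum(allow_call(0.0) for _ in range(150)))   # 100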

Flynn's Bottleneck - maximum speedup 2

According to Flynn's Bottleneck, the speedup due to instruction level parallelism (ILP) can be at best 2. Why is it so?
That version of Flynn's Bottleneck originates in Detection and Parallel Execution of Independent Instructions where the authors empirically conclude that ILP for most programs is less than 2. That was 1970 technology and that was an empirical conclusion. You can contrast it with Fisher's Optimism which said there was lots of ILP out there and proposed trace scheduling and VLIW to exploit it.
So the literal answer to your question is because that's what they measured within basic blocks back then.
The "ILP less than 2" meaning isn't really used anymore because superscalars and better compilers have blown past the number 2. So instead, over time, Flynn's Bottleneck has come to mean "you cannot retire more than you fetch", which stems from his paper Some Computer Organizations and Their Effectiveness.
The execution bandwidth of a system is usually referred to as being the maximum number of operations that can be performed per unit time by the execution area. Notice that due to bottlenecks in issuing instructions, for example, the execution bandwidth is usually substantially in excess of the maximum performance of a system.

Performance degrades if the number of threads is more than 2 on a Xeon X5355

I have a strange problem, though it may not be that strange to some of you.
I am writing an application using boost threads and using boost barriers to synchronize the threads. I have two machines to test the application.
Machine 1 is a Core 2 Duo (T8300) machine (Windows XP Professional, 4 GB RAM), where I am getting the following performance figures:
Number of threads: 1, TPS: 21
Number of threads: 2, TPS: 35 (66% improvement)
Further increases in the number of threads decrease the TPS, but that is understandable as the machine has only two cores.
Machine 2 is a dual quad-core (Xeon X5355) machine (Windows Server 2003, 4 GB RAM), so it has 8 effective cores.
Number of threads: 1, TPS: 21
Number of threads: 2, TPS: 27 (28% improvement)
Number of threads: 4, TPS: 25
Number of threads: 8, TPS: 24
As you can see, performance degrades after 2 threads (even though the machine has 8 cores). If the program had some bottleneck, then performance should have degraded at 2 threads as well.
Any ideas or explanations? Does the OS play some role in performance? It seems the Core 2 Duo (2.4 GHz) scales better than the Xeon X5355 (2.66 GHz), even though the Xeon has the higher clock speed.
Thank you
-Zoolii
The clock speed and the operating system don't have as much to do with it as the way your code is written. Things to check might include:
Are you actually spinning up more than two threads at one time?
Do you have unnecessary synchronization artifacts in your code?
Are you synchronizing your code at the appropriate places?
What is your shareable resource and how many of them are there? If each of your transactions relies on a single section of code, native library, file, database, whatever, then it doesn't matter how many CPUs you've got.
One tool at your disposal when analyzing software bottlenecks is the simple thread dump. Taking a few dumps throughout the life of an execution of your software should expose bottlenecks in your software. You may be able to take that output and use it to reevaluate your code.
Adding more CPUs does not always equate to better performance; locking and contention can severely degrade performance (see the sketch after this list). Factors to consider are:
Is your algorithm suited to parallelisation?
Any inherently sequential portions of code?
Can you partition work into coarse-grained 'chunks'? Coarse-grained is usually better than fine-grained...
Can you alter your code to use less locking?
Synchronisation overheads can often be reduced by ensuring chunks of work are similarly sized.
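One way to see why the extra cores stop helping is Amdahl's law: if a fraction s of each transaction is effectively serial (held locks, a shared file or database connection), the speedup can never exceed 1/s no matter how many cores are added. The serial fraction below is purely hypothetical, not measured from the poster's program:

    # Sketch: Amdahl's-law speedup for a hypothetical serial fraction, to show
    # why adding cores stops helping once contention dominates. The 0.7 serial
    # fraction is illustrative, not measured from the program in the question.

    def amdahl_speedup(serial_fraction: float, cores: int) -> float:
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

    SERIAL = 0.7   # hypothetical: 70% of each transaction runs under contention
    for cores in (1, 2, 4, 8):
        print(f"{cores} cores -> speedup {amdahl_speedup(SERIAL, cores):.2f}x")

    # 1 -> 1.00x, 2 -> 1.18x, 4 -> 1.29x, 8 -> 1.36x: the curve flattens
    # quickly, which matches TPS barely moving past 2 threads (and contention
    # overhead can even push it backwards, as measured above).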
Based on experience, it could be that: the Intel policy on that processor allows only 2 threads or dual-process operation; only pthreads can be used with that version of the operating system; the two processors were designed to conform to different specifications with different provisions or allowances; the process's own threading model is not allowed; or more than n threads are being backed out by the processor, and processing the error messages that report this is slowing down the throughput of the two cores and may lead to the deactivation of cores 3 and 4.