AWS API Gateway difference between method throttling and overall Burst/Rate

I have multiple APIs defined in my APIG. For each of my clients, I've created a separate usage plan. In the Usage Plans tab of the APIG AWS console, I clicked Add API Stage and configured per-method throttling for each of my APIs for a client in the particular stage.
However, unless I also add the overall Rate and Burst values, clients are unable to call APIG; an error is thrown saying no rate is configured.
For example, if I have two APIs, /EP1 and /EP2:
For client1 I set the following rates:
/EP1 -> Rate: 30, Burst: 50
/EP2 -> Rate: 30, Burst: 50
For client2 I set:
/EP1 -> Rate: 10, Burst: 10
/EP2 -> Rate: 20, Burst: 20
Should the overall rate/burst for the 2 clients be the sum of each of the rates? i.e.
For Client1: Rate: 60, Burst 100
For Client2: Rate: 30, Burst: 30


Will the Kafka producer always wait for the value specified by linger.ms before sending a request?

As per LINGER_MS_DOC in ProducerConfig java class:
"The producer groups together any records that arrive in between
request transmissions into a single batched request. Normally this
occurs only under load when records arrive faster than they can be
sent out. However in some circumstances the client may want to reduce
the number of requests even under moderate load. This setting
accomplishes this by adding a small amount of artificial delay; that
is, rather than immediately sending out a record the producer will
wait for up to the given delay to allow other records to be sent so
that the sends can be batched together. This can be thought of as
analogous to Nagle's algorithm in TCP. This setting gives the upper
bound on the delay for batching: once we get "BATCH_SIZE_CONFIG" worth
of records for a partition it will be sent immediately regardless of
this setting, however if we have fewer than this many bytes
accumulated for this partition we will 'linger' for the specified time
waiting for more records to show up. This setting defaults to 0 (i.e.
no delay). Setting "LINGER_MS_CONFIG=5" for example, would have the
effect of reducing the number of requests sent but would add up to 5ms
of latency to records sent in the absence of load."
I searched for a suggested value for linger.ms but nowhere found a higher value suggested. In most places, 5 ms is mentioned.
For testing, I have set "batch.size" to 16384 (16 KB)
and "linger.ms" to 60000 (60 seconds).
As per the doc, I expected that if I send a message larger than 16384 bytes, the producer would not wait and would send the message immediately, but I am not observing that behavior.
I am sending events larger than 16384 bytes, but the producer still waits for 60 seconds. Am I misunderstanding the purpose of "batch.size"? My understanding of "batch.size" and "linger.ms" is that whichever is met first triggers the send of the messages/batch.
If linger.ms is in fact a minimum wait time and "batch.size" gets no preference, then I guess setting a high value for linger.ms is not right.
Here are the Kafka properties used in YAML:
acks: all
retries: 5
size: 16384
ms: 10
size: 1046528
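The "whichever is met first" reading of "batch.size" and the linger setting described in the question can be written down as a small Python sketch. This is a toy model of the rule as the quoted documentation states it, not the producer's actual implementation; `should_send` and its parameters are hypothetical names introduced only for illustration.

```python
def should_send(batch_bytes: int, waited_ms: float,
                batch_size: int = 16384, linger_ms: float = 0) -> bool:
    """Toy model of the documented rule (not Kafka's real code):
    send when the batch is full or when the linger time has elapsed,
    whichever happens first."""
    batch_full = batch_bytes >= batch_size
    linger_expired = waited_ms >= linger_ms
    return batch_full or linger_expired


# A full batch is sent right away, even with a 60 s linger.
print(should_send(batch_bytes=20000, waited_ms=0, linger_ms=60000))
# A small batch keeps waiting until the linger time has passed.
print(should_send(batch_bytes=100, waited_ms=10, linger_ms=60000))
```

Under this reading, a 20000-byte batch would not wait for the linger time. Whether a given client version treats a single oversized record as a "full" batch is a separate question that this sketch does not settle.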

Azure Maps Routing Not Accurate For Duration

If I use the Bing Maps API to calculate a journey from A to B at a specific time when I know there is heavy traffic on the route, I get an accurate journey duration: 44 mins in total, including a 24 min delay due to heavy traffic. If I use the Azure Maps Routing API with the exact same departure date and time, I do not get any traffic delay; I get a duration of 20.9 mins. I understand that the data comes from TomTom, which is different from Bing. It seems that Azure routing is just not accurate when compared to Bing. Maybe I am doing something wrong?
Here is my example, Monday 14th Jan 2019 07:30, in Azure Maps using Postman:,-1.117809:50.850064,-1.071691&departAt=2019-01-14T07:30:00&travelMode=car&&traffic=true
Any non-holiday Monday is fine; the journey has to be in the future. This route is very congested at this time (07:30).
If you put the same route into Bing Maps, the travel time is 58 mins, with 30 mins due to traffic.
With Azure routing:
"routes": [
  {
    "summary": {
      "lengthInMeters": 19357,
      "travelTimeInSeconds": 2166,
      "trafficDelayInSeconds": 0,
      "departureTime": "2019-01-14T07:30:00Z",
      "arrivalTime": "2019-01-14T08:06:05Z"
    }
  }
]
30 mins and no delay due to traffic.
Not getting any delay due to traffic!
The TomTom result does not explicitly show delays. Delays resulting from historic travel information are, however, included in the travel time. Here is a comparison of the Bing and TomTom routes (Start: 50.795225,-1.117809, Destination: 50.850064,-1.071691, Departure: Jan 14 2019, 07:30). Results:
Bing Maps:
Route length: 21 km
Travel time: 41 min
Delay: 11 min
Azure Maps/TomTom:
Route length: 19.35 km
Travel time: 36 min
Delay: 0 min
To get the delay resulting from historic traffic information, the routing parameter "&computeTravelTimeFor=all" needs to be added. This does not directly return the delay from historic traffic; instead it returns the travel time without any delays, the travel time including delays from historic traffic info, and the travel time including delays from historic plus live traffic info.
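As a rough illustration of that last point, the Python sketch below derives the delays by subtracting the travel-time variants from one another. It assumes the route summary then carries `noTrafficTravelTimeInSeconds`, `historicTrafficTravelTimeInSeconds` and `liveTrafficIncidentsTravelTimeInSeconds` fields; check these names against the current API reference. The helper `traffic_delays` is hypothetical, and the sample values are made up for illustration rather than taken from a real response.

```python
def traffic_delays(summary: dict) -> dict:
    """Derive delays (in seconds) from the three travel-time variants.
    The field names are an assumption about the response shape."""
    no_traffic = summary["noTrafficTravelTimeInSeconds"]
    historic = summary["historicTrafficTravelTimeInSeconds"]
    live = summary["liveTrafficIncidentsTravelTimeInSeconds"]
    return {
        "historic_delay_s": historic - no_traffic,
        "live_delay_s": live - historic,
        "total_delay_s": live - no_traffic,
    }


# Hypothetical values, for illustration only (not a real API response).
sample_summary = {
    "noTrafficTravelTimeInSeconds": 1500,
    "historicTrafficTravelTimeInSeconds": 2166,
    "liveTrafficIncidentsTravelTimeInSeconds": 2166,
}
delays = traffic_delays(sample_summary)
print(delays)
```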

Akka Http Performance tuning

I am load testing the Akka HTTP framework (version 10.0) using the wrk tool.
wrk command:
wrk -t6 -c10000 -d 60s --timeout 10s --latency http://localhost:8080/hello
First run, without any blocking call:
object WebServer {
  implicit val system = ActorSystem("my-system")
  implicit val materializer = ActorMaterializer()
  implicit val executionContext = system.dispatcher

  def main(args: Array[String]): Unit = {
    val bindingFuture = Http().bindAndHandle(router.route, "localhost", 8080)
    println(s"Server online at http://localhost:8080/\nPress RETURN to stop...")
    StdIn.readLine() // let it run until user presses return
    bindingFuture
      .flatMap(_.unbind()) // trigger unbinding from the port
      .onComplete(_ => system.terminate()) // and shutdown when done
  }
}

object router {
  implicit val executionContext = WebServer.executionContext
  val route =
    path("hello") {
      get {
        complete {
          // response body is missing from the source
        }
      }
    }
}
Output of wrk:
Running 1m test @ http://localhost:8080/hello
6 threads and 10000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 4.22ms 16.41ms 2.08s 98.30%
Req/Sec 9.86k 6.31k 25.79k 62.56%
Latency Distribution
50% 3.14ms
75% 3.50ms
90% 4.19ms
99% 31.08ms
3477084 requests in 1.00m, 477.50MB read
Socket errors: connect 9751, read 344, write 0, timeout 0
Requests/sec: 57860.04
Transfer/sec: 7.95MB
Now I add a Future call in the route and run the test again:
val route =
  path("hello") {
    get {
      complete {
        Future { // Blocking code
          Thread.sleep(100) // rest of the body is missing from the source
        }
      }
    }
  }
Output of wrk:
Running 1m test @ http://localhost:8080/hello
6 threads and 10000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 527.07ms 491.20ms 10.00s 88.19%
Req/Sec 49.75 39.55 257.00 69.77%
Latency Distribution
50% 379.28ms
75% 632.98ms
90% 1.08s
99% 2.07s
13744 requests in 1.00m, 1.89MB read
Socket errors: connect 9751, read 385, write 38, timeout 98
Requests/sec: 228.88
Transfer/sec: 32.19KB
As you can see, with the Future call only 13744 requests are served.
After following the Akka documentation, I added a separate dispatcher thread pool for the route, which creates a maximum of 200 threads.
implicit val executionContext = WebServer.system.dispatchers.lookup("my-blocking-dispatcher")
// config of dispatcher
my-blocking-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    // or in Akka 2.4.2+
    fixed-pool-size = 200
  }
  throughput = 1
}
After the above change, the performance improved a bit:
Running 1m test @ http://localhost:8080/hello
6 threads and 10000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 127.03ms 21.10ms 504.28ms 84.30%
Req/Sec 320.89 175.58 646.00 60.01%
Latency Distribution
50% 122.85ms
75% 135.16ms
90% 147.21ms
99% 190.03ms
114378 requests in 1.00m, 15.71MB read
Socket errors: connect 9751, read 284, write 0, timeout 0
Requests/sec: 1903.01
Transfer/sec: 267.61KB
If I increase the pool size in the my-blocking-dispatcher config above 200, the performance stays the same.
Now, what other parameters or config should I use to increase performance while using the Future call, so that the app gives the maximum throughput?
Some disclaimers first: I haven't worked with the wrk tool before, so I might get something wrong. Here are the assumptions I've made for this answer:
The connection count is independent of the thread count, i.e. if I specify -t4 -c10000 it keeps 10000 connections, not 4 * 10000.
For every connection the behavior is as follows: it sends the request, receives the response completely, and immediately sends the next one, and so on until the time runs out.
Also, I've run the server on the same machine as wrk, and my machine seems to be weaker than yours (I have only a dual-core CPU), so I've reduced wrk's thread count to 2 and its connection count to 1000 to get decent results.
The Akka HTTP version I've used is 10.0.1, and the wrk version is 4.0.2.
Now to the answer. Let's look at the blocking code you have:
Future { // Blocking code
Because of the Thread.sleep call, every request will take at least 100 milliseconds. If you have 200 threads and 1000 connections, the timeline will be as follows:
Msg:  0  200  400  600  800  1000  1200  ...  2000
Ms:   0  100  200  300  400   500   600  ...  1000
Here Msg is the number of processed messages and Ms is the elapsed time in milliseconds.
This gives us 2000 messages processed per second, or ~60000 messages per 30 seconds, which mostly agrees with the test figures:
wrk -t2 -c1000 -d 30s --timeout 10s --latency http://localhost:8080/hello
Running 30s test @ http://localhost:8080/hello
2 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 412.30ms 126.87ms 631.78ms 82.89%
Req/Sec 0.95k 204.41 1.40k 75.73%
Latency Distribution
50% 455.18ms
75% 512.93ms
90% 517.72ms
99% 528.19ms
here: --> 56104 requests in 30.09s <--, 7.70MB read
Socket errors: connect 0, read 1349, write 14, timeout 0
Requests/sec: 1864.76
Transfer/sec: 262.23KB
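The arithmetic behind the 2000 messages per second estimate can be sketched in a few lines of Python. This is only a back-of-the-envelope model: it assumes each request holds one pool thread for the full blocking time and that every connection always has exactly one request in flight (the wrk behavior assumed earlier). The function names are hypothetical.

```python
def max_throughput(threads: int, blocking_ms: float) -> float:
    """Upper bound on requests/second when each request blocks one thread."""
    return threads * 1000.0 / blocking_ms


def expected_latency_ms(connections: int, throughput_per_s: float) -> float:
    """Little's law: latency = requests in flight / throughput."""
    return connections / throughput_per_s * 1000.0


rate = max_throughput(threads=200, blocking_ms=100)
print(rate)                              # 2000.0 requests per second
print(rate * 30)                         # 60000.0 requests in 30 seconds
print(expected_latency_ms(1000, rate))   # 500.0 ms with 1000 connections
```

The 500 ms figure is in the same range as the latencies in the output above (roughly 410 to 530 ms), which suggests that requests spend most of their time queued waiting for a free thread.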
It is also obvious that this number (2000 messages per second) is strictly bound by the thread count. E.g. if we had 300 threads, we'd process 300 messages every 100 ms, so we'd have 3000 messages per second, provided our system can handle that many threads. Let's see how we fare if we provide 1 thread per connection, i.e. 1000 threads in the pool:
wrk -t2 -c1000 -d 30s --timeout 10s --latency http://localhost:8080/hello
Running 30s test @ http://localhost:8080/hello
2 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 107.08ms 16.86ms 582.44ms 97.24%
Req/Sec 3.80k 1.22k 5.05k 79.28%
Latency Distribution
50% 104.77ms
75% 106.74ms
90% 110.01ms
99% 155.24ms
223751 requests in 30.08s, 30.73MB read
Socket errors: connect 0, read 1149, write 1, timeout 0
Requests/sec: 7439.64
Transfer/sec: 1.02MB
As you can see, one request now takes almost exactly 100 ms on average, i.e. the same amount we put into Thread.sleep. It seems we can't get much faster than this! We're now pretty much in the standard situation of one thread per request, which worked pretty well for many years until asynchronous IO let servers scale much higher.
For the sake of comparison, here are the fully non-blocking test results on my machine with the default fork-join thread pool:
complete {
Future {
wrk -t2 -c1000 -d 30s --timeout 10s --latency http://localhost:8080/hello
Running 30s test @ http://localhost:8080/hello
2 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 15.50ms 14.35ms 468.11ms 93.43%
Req/Sec 22.00k 5.99k 34.67k 72.95%
Latency Distribution
50% 13.16ms
75% 18.77ms
90% 25.72ms
99% 66.65ms
1289402 requests in 30.02s, 177.07MB read
Socket errors: connect 0, read 1103, write 42, timeout 0
Requests/sec: 42946.15
Transfer/sec: 5.90MB
To summarize, if you use blocking operations, you need one thread per request to achieve the best throughput, so configure your thread pool accordingly. There are natural limits to how many threads your system can handle, and you might need to tune your OS for the maximum thread count. For best throughput, avoid blocking operations.
Also, don't confuse asynchronous operations with non-blocking ones. Your code with Future and Thread.sleep is a perfect example of an asynchronous but blocking operation. Lots of popular software operates in this mode (some legacy HTTP clients, Cassandra drivers, AWS Java SDKs, etc.). To fully reap the benefits of a non-blocking HTTP server, you need to be non-blocking all the way down, not just asynchronous. It might not always be possible, but it's something to strive for.
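The difference between "asynchronous but blocking" and "non-blocking" can be reproduced outside Akka. The sketch below is a Python analogue, not Akka code: it runs the same number of simulated requests once through a small thread pool with a blocking `time.sleep` and once with a non-blocking `asyncio.sleep`. The constants and function names are arbitrary choices made for the illustration.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

DELAY_S = 0.05    # stand-in for the blocking call
REQUESTS = 8      # simulated concurrent requests
POOL_SIZE = 2     # threads available for blocking work


async def asynchronous_but_blocking() -> float:
    """Each request occupies a pool thread while it sleeps,
    so at most POOL_SIZE requests make progress at a time."""
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        await asyncio.gather(
            *(loop.run_in_executor(pool, time.sleep, DELAY_S)
              for _ in range(REQUESTS))
        )
    return time.perf_counter() - start


async def non_blocking() -> float:
    """Each request yields while waiting, so all of them overlap."""
    start = time.perf_counter()
    await asyncio.gather(*(asyncio.sleep(DELAY_S) for _ in range(REQUESTS)))
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(asynchronous_but_blocking())
non_blocking_elapsed = asyncio.run(non_blocking())
print(f"thread pool + blocking sleep: {blocking_elapsed:.3f} s")
print(f"non-blocking sleep:           {non_blocking_elapsed:.3f} s")
```

With 8 requests and 2 threads, the blocking variant needs about four rounds of 50 ms, while the non-blocking variant finishes in roughly one.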
I get x3 performance on my localhost with this config:
akka {
  actor {
    default-dispatcher {
      fork-join-executor {
        parallelism-min = 1
        parallelism-max = 64
        parallelism-factor = 1
      }
      throughput = 64
    }
  }
  http {
    host-connection-pool {
      max-connections = 10000
      max-open-requests = 4096
    }
    server {
      pipelining-limit = 1024
      max-connections = 4096
      backlog = 1024
    }
  }
}
Other values for these parameters may do even better (please let me know if so).
Akka Http version 10.1.12.

Stability testing of REST API

I have a registration REST API which I want to test as follows:
Register 15000 users and pound the server with repeated incident reports (varying traffic, with a maximum of 100 per minute, a minimum of 1 per 24 hrs and an average of one per minute) over a period of 48 hours.
Which tool can I use to test the stability of my REST API?
For pounding the server with incidents over a period of time, you can use Runscope.
It is helpful for testing APIs. You can trigger events in Runscope over a period of time or schedule it to hit the server as required.

IPC speed and compare

I am trying to implement a real-time application which involves IPC across different modules. The modules do some data-intensive processing. I am using a message queue (ActiveMQ) as the backbone for IPC in the prototype, which is easy (considering I am a total IPC newbie), but it's very, very slow.
Here is my situation:
I have isolated the IPC part so that I could change it other ways in future.
I have 3 weeks to implement another faster version. ;-(
IPC should be fast, but also comparatively easy to pick up
I have been looking into different IPC approaches: socket, pipe, shared memory. However, I have no experience in IPC, and there is definitely no way I could fail this demo in 3 weeks... Which IPC will be the safe way to start with?
You'll get the best results with a shared memory solution.
I recently ran the same kind of IPC benchmark, and I think my results will be useful for anyone who wants to compare IPC performance.
Pipe benchmark:
Message size: 128
Message count: 1000000
Total duration: 27367.454 ms
Average duration: 27.319 us
Minimum duration: 5.888 us
Maximum duration: 15763.712 us
Standard deviation: 26.664 us
Message rate: 36539 msg/s
FIFOs (named pipes) benchmark:
Message size: 128
Message count: 1000000
Total duration: 38100.093 ms
Average duration: 38.025 us
Minimum duration: 6.656 us
Maximum duration: 27415.040 us
Standard deviation: 91.614 us
Message rate: 26246 msg/s
Message Queues benchmark:
Message size: 128
Message count: 1000000
Total duration: 14723.159 ms
Average duration: 14.675 us
Minimum duration: 3.840 us
Maximum duration: 17437.184 us
Standard deviation: 53.615 us
Message rate: 67920 msg/s
Shared Memory benchmark:
Message size: 128
Message count: 1000000
Total duration: 261.650 ms
Average duration: 0.238 us
Minimum duration: 0.000 us
Maximum duration: 10092.032 us
Standard deviation: 22.095 us
Message rate: 3821893 msg/s
TCP sockets benchmark:
Message size: 128
Message count: 1000000
Total duration: 44477.257 ms
Average duration: 44.391 us
Minimum duration: 11.520 us
Maximum duration: 15863.296 us
Standard deviation: 44.905 us
Message rate: 22483 msg/s
Unix domain sockets benchmark:
Message size: 128
Message count: 1000000
Total duration: 24579.846 ms
Average duration: 24.531 us
Minimum duration: 2.560 us
Maximum duration: 15932.928 us
Standard deviation: 37.854 us
Message rate: 40683 msg/s
ZeroMQ benchmark:
Message size: 128
Message count: 1000000
Total duration: 64872.327 ms
Average duration: 64.808 us
Minimum duration: 23.552 us
Maximum duration: 16443.392 us
Standard deviation: 133.483 us
Message rate: 15414 msg/s
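To make the figures above easier to compare, here is a small Python sketch that ranks the reported message rates and expresses each one relative to the slowest. The numbers are copied from the results above; the script itself is only an illustration and was not part of the original benchmark.

```python
# Message rates (msg/s) copied from the benchmark results above.
rates = {
    "Pipe": 36539,
    "FIFO (named pipe)": 26246,
    "Message queue": 67920,
    "Shared memory": 3821893,
    "TCP socket": 22483,
    "Unix domain socket": 40683,
    "ZeroMQ": 15414,
}

# Fastest first; the last entry is the slowest mechanism.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
slowest_rate = ranked[-1][1]

for name, rate in ranked:
    print(f"{name:<20} {rate:>9} msg/s  {rate / slowest_rate:7.1f}x")
```

In these results, shared memory is roughly 56 times faster than the next best option (message queues) for 128-byte messages.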
Been facing a similar question myself.
I've found the following pages helpful - IPC performance: Named Pipe vs Socket (in particular) and Sockets vs named pipes for local IPC on Windows?.
It sounds like the consensus is that shared memory is the way to go if you're really concerned about performance, but if your current system is built around a message queue, it might be a rather... different structure. A socket and/or named pipe might be easier to implement, and if either meets your specs then you're done there.
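To give a feel for what the shared memory approach involves, here is a minimal Python sketch using the standard library's `multiprocessing.shared_memory` module (Python 3.8+). It is an illustration under simplifying assumptions: writer and reader live in the same process for brevity, and `roundtrip` is a hypothetical helper. In real IPC the reader would be a second process attaching by name, and you would need your own synchronization (for example a semaphore or a ring buffer protocol), which is where most of the extra effort goes.

```python
from multiprocessing import shared_memory


def roundtrip(payload: bytes) -> bytes:
    """Write payload into a shared memory block, then read it back
    through a second handle attached by name (as a peer process would)."""
    writer = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        writer.buf[:len(payload)] = payload
        reader = shared_memory.SharedMemory(name=writer.name)
        try:
            # Copy the bytes out before closing the handle.
            return bytes(reader.buf[:len(payload)])
        finally:
            reader.close()
    finally:
        writer.close()
        writer.unlink()  # free the block once nobody needs it


print(roundtrip(b"x" * 128))
```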
On Windows, you can use WM_COPYDATA, a special kind of shared memory-based IPC. This is an old, but simple technique: "Process A" sends a message, which contains a pointer to some data in its memory, and waits until "Process B" processes (sorry) the message, e.g. creates a local copy of the data. This method is pretty fast and works on Windows 8 Developer Preview, too (see my benchmark). Any kind of data can be transported this way, by serializing it on the sender, and deserializing it on the receiver side. It's also simple to implement sender and receiver message queues, to make the communication asynchronous.
You may check out this blog post
Basically, it compares Enduro/X, which is built on POSIX queues (kernel queue IPC), with ZeroMQ, which can deliver messages over several different transport classes, including tcp:// (network sockets), ipc://, inproc://, and pgm:// and epgm:// for multicast.
From the charts you can see that, at some point, with larger data packets Enduro/X running on queues wins over the sockets.
Both systems run well at ~400 000 messages per second, but with 5 KB messages, kernel queues perform better.
Another update, in answer to the comment below: I reran the test with ZeroMQ on ipc:// too, see the picture:
As we can see, ZeroMQ on ipc:// does better, but again, in some range Enduro/X shows better results, and then ZeroMQ takes over again.
Thus I would say that the choice of IPC depends on the work you plan to do.
Note that ZeroMQ IPC runs on POSIX pipes, while Enduro/X runs on POSIX queues.