How does transactions per second relate to number of users and server configuration?

Consider a brokerage platform where customers can see all the securities they hold. There is an API which supports the brokerage frontend by providing all the securities data corresponding to a customer ID.
This API is certified for a TPS of 6,000.
The SLA for this API is 500 ms, and the 99th-percentile response time is 200 ms.
The API is hosted on AWS, where the servers are auto-scaled.
I have a few questions:
At peak traffic, the number of users on the frontend is in the hundreds of thousands, if not a million. How does the TPS relate to the number of users?
My understanding is that TPS is the number of transactions which the API can process in a second. So is the 6,000 figure for one server, or for one thread in a server? And if we add 5 servers, does the cumulative TPS become 30,000?
How do we decide the configuration (cores, RAM) of each server if we know the TPS?
And finally, is the number of servers basically [total number of users / TPS]?
I tried reading about this in a couple of books and a few blogs, but I wasn't able to clear up my confusion entirely.
Kindly correct me if I am asking the wrong questions or if they don't make sense.
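
To get a feel for how these quantities can relate, here is a back-of-the-envelope sketch using Little's law (offered load ≈ users / think time). The peak-user count and think time below are assumptions, as is the idea that the certified 6,000 TPS is per server and scales linearly:

    import math

    peak_users = 1_000_000     # assumed users on the frontend at peak
    think_time_s = 30.0        # assumed gap between requests per user

    # Offered load: each user issues ~1 request every think_time_s seconds.
    offered_tps = peak_users / think_time_s          # ~33,333 req/s

    per_server_tps = 6_000     # assuming the certified figure is per server
    servers_needed = math.ceil(offered_tps / per_server_tps)   # 6
    print(offered_tps, servers_needed)

On these assumptions the server count is not [users / TPS] but [users / (think time × per-server TPS)]: most users are idle at any given instant, so a million users may translate into only tens of thousands of requests per second.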

Related

How to improve publishing efficiency between CTPs and client procs?

I have:
host A - US
host B - Germany
The CTP (chained tickerplant) is on host A and all the client procs are on host B.
The CTP is publishing data to about 30 client processes.
My question is:
If I move the CTP to the same host where the client processes are, will the data transfer speed improve?
If the 30 clients on host B are subscribing to small tables only, and/or filtering on sym to receive only subsets of tables, then data volumes could actually increase if the CTP is moved, since all data would then be sent between the hosts rather than a subset.
If the 30 clients on host B are each subscribing to a different table without overlap, then the data transfer volume will not change.
If the 30 clients on host B all subscribe to all data from the CTP, then moving it to host B would give a 30x reduction in traffic between the hosts, since the data would be sent only once between the machines before fanning out to subscribers.
In most scenarios you will likely see a decrease. You can see what subscribers are listening to in .u.w in the standard tick.q:
https://code.kx.com/q/kb/publish-subscribe/
You can then check table counts and sum up the data to be transferred to estimate how much traffic will increase or decrease.
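
For example, a rough estimate of the inter-host traffic in the first and third cases might look like this (the feed and subscription sizes are purely illustrative):

    # Purely illustrative numbers: full feed vs. per-client subsets.
    total_feed_bytes = 100e6        # assumed full CTP feed per interval
    subscriptions = [2e6] * 30      # assumed: each client takes a 2 MB subset

    # CTP on host A: each client's subset crosses the wire separately.
    before = sum(subscriptions)     # 60 MB between hosts

    # CTP on host B: the whole feed crosses once; fan-out is then local.
    after = total_feed_bytes        # 100 MB between hosts

    print("traffic increases" if after > before else "traffic decreases")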

Server Throughput definition ambiguity

Is throughput the maximum number of requests a server instance can handle, or is it the number of requests that the server instance is currently handling?
Edit: By "currently handling" I mean the number of requests the server is receiving over a given recent time interval. For example: the server is currently handling 400 requests every minute.
For example, I might have a server instance with a lot of hardware and therefore high potential throughput, but I might be receiving only a small amount of traffic. What does throughput measure in that situation? And what about the inverse case, i.e. if my instance can only handle x requests per minute but is receiving y ≫ x requests per minute?
If throughput is the maximum number of requests a server can handle, how is it measured? Do we do a load/stress test, where we keep increasing the requests per minute on the server until it cannot handle them anymore?
No. Throughput is an aggregate that depends on execution time: you can send 1,000 requests in the same second and your server won't cope, but send 1,000 requests over an hour and your server will handle them normally.
Throughput is calculated as requests per unit of time. The time is counted from the start of the first sample to the end of the last sample, including any intervals between samples, since it is supposed to represent the load on the server.
The formula is: Throughput = (number of requests) / (total time).
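As a tiny illustration of that formula (the sample times below are invented):

    # Throughput = requests / (end of last sample - start of first sample).
    def throughput(samples):
        """samples: list of (start_s, end_s) tuples, one per request."""
        first_start = min(s for s, _ in samples)
        last_end = max(e for _, e in samples)
        return len(samples) / (last_end - first_start)

    # 1,000 requests spread evenly over an hour: ~0.28 req/s.
    hour = [(i * 3.6, i * 3.6 + 0.2) for i in range(1000)]
    print(throughput(hour))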
To find the maximum, you increase the number of JMeter threads (concurrent users) until the server reaches its limit.
Throughput is the number of samplers which JMeter executes within the duration of your test. If you want to see the actual number of requests being sent, consider using, for example, the Server Hits Per Second listener (it can be installed using the JMeter Plugins Manager).
If you see that your server's resource consumption doesn't increase as you increase the load in JMeter, the likely reasons are:
Your application middleware configuration is not suitable for high load (e.g. there is a limit on concurrent sessions and requests are queuing up instead of being processed); check out Web Performance Optimization: Top 3 Server and Client-Side Performance Tips for an overview of what to look at.
Your application code doesn't utilize the underlying OS resources efficiently; consider using a profiler tool to see what's going on under the hood.
JMeter may fail to send requests fast enough; make sure to follow the JMeter Best Practices, and if JMeter's machine is overloaded, consider going for Distributed Testing.

How do I calculate IO operations in a Cloudant account for Bluemix pricing? (throughput - lookups, reads, writes, queries)

The Cloudant Standard plan is described as "100 reads / sec, 50 writes / sec, 5 global queries / sec".
Is this IO/s calculated per end-to-end request, or is it based on the query execution plan?
Let me give some examples.
Q1. Say I use a bulk operation to create 3 new documents in Cloudant (Bluemix Standard plan).
Is that 1 write operation or 3 write operations?
Q2. A query that aggregates (joins) 1,000 indexed docs on "name, age range, join time" and returns them as one document.
Is that 1 read, or 1000 + 1 reads?
Q3. While on the Standard plan (limit 100 reads / sec), suppose 100 users execute the query from Q2 at the same time.
How is the IO calculated? 1 × 100 reads? (1000 + 1) × 100 reads?
Do some users fail to execute their queries because of the IO limit?
The Cloudant pricing method doesn't seem to be properly documented anywhere.
Can anyone point me in the right direction?
I want to know exactly how usage against the Standard plan is measured.
It would be great if you could include a worked example in your answer!
Also answered here: https://developer.ibm.com/answers/questions/525190/how-do-calculate-io-operations-in-cloudant-account/
Bulk operations currently count as 1 W, regardless of the number of docs the request contains.
A query is a request to a URL containing one of _design, _find or _search, again unrelated to the number of documents actually involved. Note that some of these API endpoints (search) are paged, so it would be 1 query per requested page of results.
I assume that by "100 users" you mean 100 concurrent connections using the same credentials, as Cloudant's rate limiting is applied per account. If so, the sum total of requests is counted towards the limit. When that bucket is full, any further requests are cut off and fail with a 429: Too Many Requests HTTP status code.
As an example, say you have a Standard account where you've set the rate limit to allow 100 queries per second. You have 100 concurrent connections hitting _find repeatedly, each query returning 1,000 documents. Cloudant will allow 100 queries per second, so on average each of your connections will get 1 query per second fulfilled, and any attempt to push harder than that will result in 429 HTTP errors. With 10 concurrent connections, on average each will get 10 qps, etc.
Cloudant rate-limits at the HTTP level. No splitting of bulk ops into their constituent parts takes place, at least not yet.
Documentation for how this all hangs together can be found here: https://cloud.ibm.com/docs/services/Cloudant?topic=cloudant-pricing#pricing
The Cloudant offering in the IBM Cloud Catalog has a link to the documentation. The docs describe the plans with additional examples, and have sections explaining how read and write operations are calculated.
The HTTP code 429 is returned by Cloudant to indicate too many requests. The documentation discusses this, and there are code samples on how to handle it.
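For instance, a minimal retry-with-backoff sketch for 429 responses might look like the following (plain requests against a placeholder URL, not the official Cloudant client):

    import time
    import requests

    def query_with_backoff(url, payload, max_retries=5):
        """POST a query, backing off exponentially on 429 responses."""
        for attempt in range(max_retries):
            resp = requests.post(url, json=payload)
            if resp.status_code != 429:       # not rate-limited
                resp.raise_for_status()
                return resp.json()
            time.sleep(0.1 * 2 ** attempt)    # 0.1s, 0.2s, 0.4s, ...
        raise RuntimeError("still rate-limited after retries")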

AWS EC2 Spot Instances fit for edge services?

Spot instances can randomly get shut down by Amazon. Does this mean that they would not work well as edge services (e.g. REST services)? Using an Elastic Load Balancer (ELB) plus some persistent EC2 nodes (plus the spot instances), would this work well if the client retried a few times upon failure? Or could they get numerous 404s, even with a few retries?
You will see a bit of an impact if you use spot instances in this scenario. The key will be getting the load balancer to recognize quickly that an instance is out of service. Also, not using sticky sessions reduces the chance that clients get repeated 504 (Gateway Timeout) errors.
Spot instances are a bit tricky to grok. On one hand they can give you compute power for a very low price, but on the other hand you might lose these instances with minimal notice.
One thing you can do is set a "max bid" which reflects the risk of losing the instances, not only the price you are willing to pay. You don't pay your bid price; you pay the market price, and you keep the instance only while the market price stays below your max bid, so most of the time you pay less than your max bid. For example, if you bid 90% of the on-demand (OD) price, you will most likely pay far less than that (say, 30% of the on-demand price) on average over a week or a month. You can even consider a max bid higher than on-demand (up to 4 times the OD price), and still pay much less than the OD price on average.
It is best to analyze the spot prices for the last 3 months, which are available through the API, and check how the market price behaves for the different instance types in the different regions and availability zones.
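A quick sketch of pulling that history with boto3's describe_spot_price_history (the instance type, region and AZ are example values, and only the first page of results is read):

    import datetime
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    end = datetime.datetime.utcnow()
    start = end - datetime.timedelta(days=90)   # last 3 months

    resp = ec2.describe_spot_price_history(
        InstanceTypes=["m5.large"],
        ProductDescriptions=["Linux/UNIX"],
        StartTime=start,
        EndTime=end,
        AvailabilityZone="us-east-1a",
    )
    prices = [float(p["SpotPrice"]) for p in resp["SpotPriceHistory"]]
    print(f"avg={sum(prices)/len(prices):.4f}, max={max(prices):.4f}")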
Another option to consider is running 2 auto scaling groups (ASGs): one that scales (or heals) your spot-based instances, and one that works with on-demand instances. The latter will be slower to kick in and will work only if the spot-based group is unavailable due to higher market prices.

Distributed rate limiting

I have multiple servers/workers going through a task queue making API requests (Django with Memcached, and Celery for the queue). The API requests are limited to 10 requests a second. How can I rate-limit so that the total number of requests across all servers doesn't exceed the limit?
I've looked through some of the related rate-limit questions, but I'm guessing they are focused on a more linear, non-concurrent scenario. What sort of approach should I take?
Have you looked at RateLimiter from the Guava project? They introduced this class in one of the latest releases and it seems to partially satisfy your needs.
It won't calculate the rate limit across multiple nodes in a distributed environment, but what you could do is configure the rate limit dynamically based on the number of nodes that are running (i.e. for 5 nodes you'd have a rate limit of 2 API requests a second); see the sketch below.
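Something like this (a hypothetical helper; the live node count would have to come from your own service registry):

    # Hypothetical helper: divide the global budget across live nodes.
    GLOBAL_LIMIT_PER_S = 10.0

    def per_node_rate(live_node_count):
        """Each node gets an equal slice of the global 10 req/s budget."""
        return GLOBAL_LIMIT_PER_S / max(1, live_node_count)

    print(per_node_rate(5))   # each of 5 workers throttles itself to 2.0 req/s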
I have been working on an open-source project to solve this exact problem, called Limitd. Although I don't have clients for technologies other than Node yet, the protocol and the idea are simple.
Your feedback is very welcome.
I solved that problem, though unfortunately not for your technology: bandwidth-throttle/token-bucket.
If you want to implement it yourself, here's the idea behind the implementation:
It's a token bucket algorithm which converts the tokens it contains into a timestamp of when it was last completely empty. Every consumption updates this timestamp (under a lock), so that every process shares the same state.
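A minimal Python sketch of that "empty timestamp" idea, assuming Redis as the shared store (the key name and rates are illustrative; the library above is PHP):

    import time
    import redis

    RATE = 10.0       # tokens (API requests) refilled per second
    CAPACITY = 10.0   # maximum burst size
    KEY = "api:bucket:empty_ts"   # illustrative key name

    r = redis.Redis()

    def try_consume(tokens=1.0):
        """Consume tokens atomically across all workers; True if allowed."""
        with r.lock(KEY + ":lock", timeout=1):   # cross-process lock
            now = time.time()
            # Stored value: the instant the bucket was last completely empty.
            raw = r.get(KEY)
            empty_ts = float(raw) if raw else now - CAPACITY / RATE
            available = min(CAPACITY, (now - empty_ts) * RATE)
            if available < tokens:
                return False
            # Consuming moves the "last empty" timestamp forward.
            r.set(KEY, now - (available - tokens) / RATE)
            return True

Workers would call try_consume() before each API request and sleep/retry when it returns False, so the combined rate across all servers stays at 10 requests a second.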