CloudFormation creation failed, Maximum number of simultaneous requests reached - aws-cloudformation

I got the following in the CloudFormation events panel when creating a large stack. It occurred on an RDS resource and caused the stack creation to fail:
Maximum number of simultaneous requests reached. Please consider adding a retry mechanism or increasing the queue size.
What does it mean? How do I fix it?

This is a rate-limiting error from Amazon. It probably occurs on other resources, but it definitely happens on RDS creation because only one RDS instance can be created at a time. In other words, if your stack includes two databases, they can't be created in parallel.
To work around this, make one database depend on the other: add DependsOn: databaseA to the databaseB resource. That ensures database A is created before creation of database B begins.
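For illustration, here is a trimmed template fragment with that dependency; the resource names and RDS properties are placeholders, not taken from the failing stack:

    Resources:
      DatabaseA:
        Type: AWS::RDS::DBInstance
        Properties:
          Engine: mysql
          DBInstanceClass: db.t3.micro
          AllocatedStorage: "20"
          MasterUsername: admin
          ManageMasterUserPassword: true
      DatabaseB:
        Type: AWS::RDS::DBInstance
        DependsOn: DatabaseA    # serialises creation: B starts only after A is complete
        Properties:
          Engine: mysql
          DBInstanceClass: db.t3.micro
          AllocatedStorage: "20"
          MasterUsername: admin
          ManageMasterUserPassword: true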

Related

Handle replication lag in data pipeline

We have a pipeline of tasks that run in sequence. Each task consumes data from the database, manipulates it, and writes the result back to the same database.
We are using AWS RDS Aurora, and in order to spread the load, the “reading phase” of each task is done against the read replica.
Under high load we sometimes see replication lag of 10-15 seconds. This means that by the time the next task consumes data, it gets stale or missing data points.
We know this is not the “right” way to design such a pipeline, and that it contradicts the idiom “Do not communicate by sharing memory; instead, share memory by communicating”.
Since it’s too much effort to change the design now, we came up with an alternative solution:
Create a service that checks the replication lag and exposes it to all tasks. If the lag is greater than x, the task falls back to reading from the RDS master node.
This is not optimal, and I would like to hear other solutions to work around this issue.
It is worth mentioning that we are using Celery (& Python) to construct this workflow, and each task is unaware of the tasks that ran before it.
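For concreteness, the fallback described above might look roughly like this; a minimal sketch assuming SQLAlchemy engines for the Aurora writer and reader endpoints and a hypothetical HTTP lag service (all names and URLs are illustrative):

    from sqlalchemy import create_engine
    import requests

    # Hypothetical endpoints; replace with your Aurora writer/reader and lag service.
    PRIMARY = create_engine("mysql+pymysql://user:pass@aurora-writer.example/db")
    REPLICA = create_engine("mysql+pymysql://user:pass@aurora-reader.example/db")
    LAG_SERVICE_URL = "http://lag-service.internal/lag-seconds"
    MAX_ACCEPTABLE_LAG = 5.0  # the "x" from the proposal, in seconds

    def reader_engine():
        """Return the replica engine normally, or the primary when replication lags."""
        try:
            lag = float(requests.get(LAG_SERVICE_URL, timeout=1).text)
        except (requests.RequestException, ValueError):
            # If the lag service is unreachable, be conservative and read from the primary.
            return PRIMARY
        return PRIMARY if lag > MAX_ACCEPTABLE_LAG else REPLICA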
There will always be data which is inserted into the database but not yet visible, either because it wasn't committed yet, it was committed after your snapshot was started, or due to replication lag. The only real solution is to make your tasks robust to this inevitability.
Create a service that checks the replication lag and exposes it to all tasks. If the lag is greater than x, the task falls back to reading from the RDS master node.
You want to shed load from the master until the first sign of trouble, then you want to suddenly dump all the load back onto it?
Create a service that checks the replication lag and exposes it to all tasks. If the lag is greater than x, the task falls back to reading from the RDS master node.
Depending on the cause of your replication lag, this might make things worse by further increasing the load on the master node.
If your pipeline allows it, you could wait in task A, after the write, until the data has propagated to the read replica.
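A minimal sketch of that wait-for-propagation idea, again assuming SQLAlchemy engines and a hypothetical pipeline_items table with an id primary key (illustrative names, not from the question):

    import time
    from sqlalchemy import create_engine, text

    primary = create_engine("mysql+pymysql://user:pass@aurora-writer.example/db")
    replica = create_engine("mysql+pymysql://user:pass@aurora-reader.example/db")

    def write_then_wait(row_id: int, payload: str, timeout: float = 20.0) -> None:
        """Write on the primary, then block until the row is visible on the replica."""
        with primary.begin() as conn:
            conn.execute(
                text("INSERT INTO pipeline_items (id, payload) VALUES (:id, :payload)"),
                {"id": row_id, "payload": payload},
            )
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with replica.connect() as conn:
                row = conn.execute(
                    text("SELECT 1 FROM pipeline_items WHERE id = :id"), {"id": row_id}
                ).first()
            if row is not None:
                return  # the next task can now safely read its input from the replica
            time.sleep(0.5)
        raise TimeoutError(f"replica did not catch up within {timeout:.1f}s")

With this shape, only the writing task pays the wait; downstream tasks keep reading from the replica.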

Microservices Replication: What about the Database?

Let's say you are using either Service Fabric or Kubernetes, and you are hosting a transaction data warehouse microservice (maybe a bad example, but suppose all it does is a simple CQRS-style service that reads and writes records consisting of the sender's ID, the receiver's ID, the date, and the payment amount).
For the sake of argument, say this microservice needs to be replicated across different geographic locations to ensure that the data remains recoverable if one database goes down.
Now, the naïve approach I'm thinking of is to fire an event when the transaction is received, and have the orchestrator microservice expect an event-processed acknowledgment within a specific timeframe.
But the question remains: what about the database? What will happen when we scale out the microservice and new instances are brought up?
They will write to the same database, no?
One possible solution is to put the database inside the Docker container, so that each replica owns its own copy. Is this a good solution?
Please share your thoughts and best practices.
What will happen when we scale out the microservice and new instances are brought up? They will write to the same database?
Yes, the instances of your service all share the same logical database. To achieve high availability, you typically run a distributed database cluster, but it appears as a single database system to your service.
One possible solution is to put the database inside the Docker container, so that each replica owns its own copy. Is this a good solution?
No; you typically want all instances of your service to see the same consistent data. For example, a read request sent to two different instances of your service should return the same data.
If the database becomes your bottleneck, you can mitigate that by adding caching, sharding your data, or serving read requests from dedicated read replicas.
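As a small illustration of the caching option, a read-through cache in front of the shared database could look something like this (pure-Python sketch; fetch_transaction is a hypothetical loader, not an API from the question):

    import time
    from typing import Any, Callable, Dict, Tuple

    class ReadThroughCache:
        """Serve repeated reads from memory for ttl_seconds before hitting the database again."""

        def __init__(self, loader: Callable[[str], Any], ttl_seconds: float = 5.0) -> None:
            self._loader = loader
            self._ttl = ttl_seconds
            self._entries: Dict[str, Tuple[float, Any]] = {}

        def get(self, key: str) -> Any:
            now = time.monotonic()
            cached = self._entries.get(key)
            if cached is not None and now - cached[0] < self._ttl:
                return cached[1]           # fresh enough: no database round trip
            value = self._loader(key)      # miss or stale: load from the shared database
            self._entries[key] = (now, value)
            return value

    # Usage (fetch_transaction would be your real database read):
    # cache = ReadThroughCache(loader=fetch_transaction, ttl_seconds=5.0)
    # tx = cache.get("transaction-42")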

How to read/write to secondary member of a MongoDB replica-set?

I am currently planning some server infrastructure. I have two servers in different locations, and my apps (APIs and so on) run on both of them. The client connects to the nearest one (best connection); in case of failure of one server, the other can process the requests.
I want to use MongoDB for my projects. The first idea is to use a replica set, so I can ensure the data is consistent; if one server fails, the data is still accessible and the secondary switches to primary. When the app on the primary server wants to use the data, that is fine, but the other server must connect to the primary server in order to handle data (that would solve the failover problem, but not the "best connection" problem).
In MongoDB there is an option to read data from secondary servers, but then I have to ensure that the inserts (only possible on the primary) are consistent on every secondary. There is also an option for this, "writeConcern". Is it possible to somehow specify writeConcern on a specific secondary? Because if I add a second secondary without the apps on it, writeConcern on every secondary would not be necessary. And if I specify a specific value, I don't really know on which secondary the data is available, right?
Summary: I want to reduce the connections between the servers when the API is called.
Please share some thoughts or ideas to fix my problem.
Writes can only be done on primaries.
To control which secondary the reads are directed to, you can use max staleness as well as tags.
that the inserts (only possible on the primary) are consistent on every secondary.
I don't understand what you mean by this phrase.
If you have two geographically separated datacenters, A and B, it is physically impossible to write data in A and instantly see it in B. You must either wait for the write to propagate or wait for the read to fetch data from the remote node.
To pay the cost at write time, set your write concern to the number of nodes in the deployment (2, in your proposal). To pay the cost at read time, use primary reads.
Note that merely setting write concern equal to the number of nodes doesn't make all nodes have the same data at all times - it just makes your application only consider the write successful when all nodes have received it. The primary can still be ahead of a particular secondary in terms of operations committed.
And, as noted in comments, a two-node replica set will not accept writes unless both members are operational, which is why it is generally not a useful configuration to employ.
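To make the read-preference and write-concern part concrete, here is a small pymongo sketch; the connection string, database, collection and tag names are placeholders:

    from pymongo import MongoClient, WriteConcern
    from pymongo.read_preferences import Secondary

    client = MongoClient("mongodb://host-a.example,host-b.example/?replicaSet=rs0")

    # Reads: target secondaries carrying a given tag, but only if they lag by
    # less than 90 seconds (90 is the minimum maxStalenessSeconds MongoDB allows).
    nearby_secondary = Secondary(tag_sets=[{"dc": "eu"}], max_staleness=90)

    payments = client.get_database("app").get_collection(
        "payments",
        read_preference=nearby_secondary,
        write_concern=WriteConcern(w=2),  # wait for both data-bearing nodes to ack writes
    )

    payments.insert_one({"sender": 1, "receiver": 2, "amount": 10.0})
    doc = payments.find_one({"sender": 1})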
Summary: I want to reduce the connections between the servers when the API is called.
This has nothing to do with the rest of the question, and if you really mean this it's a premature optimization.
If what you want is faster network I/O I suggest looking into setting up better connectivity between your application and your database (for example, I imagine AWS would offer pretty good connectivity between their various regions).

.NET Core connection pool exhausted (Postgres) under heavy load spike when new pod instances are created

I have an application which runs stably under heavy load, but only when the load increases gradually.
I run 3 or 4 pods at the same time, and it scales to 8 or 10 pods when necessary.
The standard load is about 4,000 requests per minute (around 66 requests per second overall, which means roughly 16 requests per second per pod).
There is a certain scenario in which we receive a huge load spike (from 4k to 20k requests per minute). New pods are correctly created and then start to receive the new load.
The problem is that in about 10-20% of cases a newly created pod struggles to handle the initial load: DB requests take over 5000 ms, pile up, and finally result in an exception that the connection pool was exhausted: The connection pool has been exhausted, either raise MaxPoolSize (currently 200) or Timeout (currently 15 seconds)
From the New Relic screenshots I can see that the other pods are doing well, and also that after the initial struggle, all pods handle the load without any issue.
Here is what I did when attempting to fix it:
Get rid of non-async calls. I had a few lines of blocking code inside async methods. I've changed everything to async; I no longer have any non-async methods.
Remove long-running transactions. We had long-running transactions, like this:
- beginTransactionAsync
- selectDataAsync
- saveDataAsync
- commitTransactionAsync
which I refactored to:
- selectDataAsync
- saveDataAsync // under the hood, an EF Core short-lived transaction
This helped a lot, but did not solve problem completely.
Ensure some connections are always open and ready. We added Minimum Pool Size=20 to the connection string, to always keep at least 20 connections open.
This also helped, but still sometimes pods struggle.
Our pods start properly only after the readiness probe returns success. The readiness probe checks the connection to the DB using a standard .NET Core health check.
Our connection string has the MaxPoolSize=100;Timeout=15; settings.
I am of course expecting that a new pod instance will initially need some spin-up time during which it operates at reduced capacity, but I do not expect a pod to suffocate and throw errors on 90% of requests this often.
Important note here:
I have 2 different DbContexts sharing the same connection string (and thus the same connection pool). Each DbContext accesses a different schema in the DB. This was done to have a modular architecture. The DbContexts never communicate with each other and are never used together in the same request.
My current guess is that when a pod is freshly created and immediately receives a huge load, it tries to open all 100 connections (this is visible in the DB open-sessions chart), which is too much at the beginning. What other reason could there be? How can I make sure a pod operates at its optimal performance from the very beginning?
Final notes:
The DB processor is not at its max (about 30-40% usage under the heaviest load).
Most of the SQL queries and commands are relatively simple SELECTs and INSERTs.
After the initial period, queries take no more than 10-20 ms each.
I don't want to patch the problem by increasing the number of connections in the pool beyond 100, because after the initial struggle the pods operate properly with around 50 connections in use.
I most likely don't have a connection leak, because in that case it would throw exceptions under normal load as well, after some time.
I use a scoped DbContext, which is disposed (and thus its connection released to the pool) at the end of each request.
EDIT 25-Nov-2020
My current guess is that a newly created pod is not receiving enough of either network bandwidth or CPU. This reasoning is supported by the fact that even requests which do NOT query the DB were struggling.
Question: is it possible that a newly created pod is granted insufficient resources (CPU or network bandwidth) at the beginning?
EDIT 2 26-Nov-2020 (answering Arca Artem)
The app runs on a k8s cluster on AWS.
The app connects to the DB using the standard ADO.NET connection pool (max 100 connections per pod).
I'm monitoring DB connections and DB CPU (all within reasonable limits). Host CPU utilization is also around 20-25%.
I thought that when a pod starts and the /health endpoint responds successfully (it checks DB connectivity with a simple SELECT probe), and the pod's max capacity is e.g. 200 rps, then the pod would be able to handle that traffic from the very first moment after the /health probe succeeded. However, from my logs I see that after the /health probe succeeds 4 times in a row in under 20 ms and traffic starts coming in, the first few seconds of handling traffic take more than 5 s per request (sometimes even 40 seconds per request).
I'm NOT monitoring the hosts' network.
At this point it's just speculation on my part without knowing more about the code and architecture, but it's worth mentioning one thing that jumps out at me: the health check might not be using the normal code path that your other endpoints use, potentially leading to a false positive. If you have the option, using a profiler could help you pinpoint exactly when and how this happens. If not, we can take educated guesses about where the problem might be. There could be a number of things at play here, and you may already be familiar with these, but I'm covering them for completeness' sake:
First of all, it's worth bearing in mind that connections in Postgres are very expensive (to put it simply, each one is a fork of the database process), and your pods are consequently creating them in bulk when you scale your app all at once. A considerable amount of time is needed to set each one up, and if you're creating them in bulk, it adds up (how long depends on configuration, available resources, etc.).
Assuming you're using ASP.NET Core (because you mentioned DbContext), the initial request(s) will take the penalty of initialising the whole stack (creating the minimum required connections in the pool, initialising the ASP.NET stack, dependencies, etc.). Again, this all depends on how you structure your code and what your app is actually doing during initialisation. If your health endpoint is connecting to the DB directly (without utilising the connection pool), it would mean skipping the costly pool initialisation, leaving your initial requests to take the burden.
You're not observing the same behaviour when your load increases gradually, possibly because these things are usually an interplay between different components and generally a non-linear function of available resources, code behaviour, etc. Specifically, if it's just one new pod spinning up, it will require far fewer connections than, say, 5 new pods spinning up, and Postgres will be able to satisfy it much quicker. Postgres is the shared resource here: creating 1 new connection is significantly faster than creating 100 new connections (5 pods x 20 minimum connections in the pool) for all the pods waiting on a new connection.
There are a few things you can do to speed up this process, such as config changes or using an external connection pooler like PgBouncer, but they won't be effective unless your health endpoint represents the actual state of your pods.
Again, it's all based on assumptions, but if you're not doing so already, try using the DbContext in your health endpoint to ensure the pool is initialised and ready to hand out connections. As someone mentioned in the comments, it's worth looking at other types of probes that might be better suited to implementing this pattern.
I found the ultimate reason for the above issue: insufficient CPU resources assigned to the pod.
The problem was hard to isolate because New Relic APM CPU usage charts are calculated in a different way than expected (please refer to the New Relic docs). The real pod CPU usage vs. CPU limit can be seen only in the New Relic Kubernetes cluster viewer (it probably uses a different algorithm to chart CPU usage there).
What is more, when a pod starts up, it needs a little more CPU at the beginning. On top of that, the pods were starting precisely because of high traffic, and there simply was not enough CPU to handle those requests.
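For reference, pod CPU is governed by the resources block of the deployment; a minimal, illustrative Kubernetes fragment (the names and values are made up, not taken from the question):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: api
      template:
        metadata:
          labels:
            app: api
        spec:
          containers:
            - name: api
              image: example/api:latest   # placeholder image
              resources:
                requests:
                  cpu: "500m"      # guaranteed share used for scheduling
                  memory: "512Mi"
                limits:
                  cpu: "2"         # hard cap; CPU throttling kicks in above this
                  memory: "1Gi"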

Redshift WLM: "final queue may not contain User Groups or Query Groups"

I want to update the WLM configuration for my Redshift cluster, but I am unable to make changes and save them because the following message is displayed:
The following problems must be corrected before you can save this workload configuration:
The final queue may not contain User Groups or Query Groups.
Now, the obvious solution is to just create a new queue with no user group specified and give it the remaining amount of memory so that it all adds up to 100%. That's annoying, because adding a new queue requires a cluster reboot, but that's not the reason I'm asking this question.
My main question is: where is this need for a new "non-user" queue explained? This is definitely a change, because previously I had four queues, each with an assigned user group, and their collective memory allocation was 87%. It didn't add up to 100%, and supposedly the rest was dynamically managed by Redshift.
Now, I have no problem creating this new queue, but I would really like to see an explicit explanation of what it does and what its effects are before I do it. I didn't see an update on the official blog, and I don't see this mentioned in the docs or the doc updates (http://docs.aws.amazon.com/redshift/latest/mgmt/document-history.html, https://docs.aws.amazon.com/redshift/latest/dg/doc-history.html). This is blocking me from making other desired changes to the existing queues.
I am not sure about this statement of yours:
previously, I had four queues, each with an assigned user group
However, it is easy to guess why Redshift requires a default queue (i.e. one which doesn't have any User or Query groups) as the last one. Let's say you have four queues, each of which has a different User group, say, UG1, UG2, UG3 and UG4. Now, a User from another User group, say UG5, queries Redshift. Which queue does it get routed to? A default queue helps in such a case.
You might counter-argue that Redshift could just use the remaining 13% of memory. But what if 100 concurrent queries came from UG5? Redshift doesn't have any configuration describing how you'd like that 13% of memory to be allocated to those queries, and if it managed the memory on its own, the performance of your cluster would be unpredictable.
But yeah, I agree that the Redshift documentation lacks clarity on this. The closest I could find was this, though it doesn't say that the default queue can't have any User or Query groups.
The default queue must be the last queue in the WLM configuration. Any queries that are not routed to other queues run in the default queue.
Anyway, you'll have to live with it, I guess.