Kubernetes, Running VPA with Recommendations Only and HPA

There are already a million questions and answers stating that you cannot run a VPA and an HPA at the same time in your cluster when both are actively making changes, for obvious reasons.
However, my question is: can you run a VPA in recommendation-only mode (updateMode: "Off") alongside an active HPA? Others seem to have had the same question, but I haven't found a definitive answer. I want to be really safe before I start turning things on and have something break.
Others have asked here: https://github.com/kubernetes/autoscaler/issues/3858

Well, I just took the plunge and deployed it. So far I'm not having any issues. So it appears to be safe to deploy a VPA and an HPA that both watch CPU and memory, as long as the VPA is set to updateMode: "Off".
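For reference, here is a minimal sketch of what such a recommendation-only VPA object looks like, built as a Python dict and printed as JSON (which kubectl accepts as well as YAML). The names my-app-vpa, default and my-app are placeholders I made up, and the target is assumed to be a Deployment; adjust them to your workload.

```python
import json


def build_vpa_manifest(name: str, namespace: str, target_deployment: str) -> dict:
    """Return a VerticalPodAutoscaler manifest that only records recommendations."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumption: the workload is a Deployment; the names are placeholders.
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": target_deployment,
            },
            # "Off" means recommendations are computed and stored on the VPA
            # object, but pods are never evicted or resized by the VPA.
            "updatePolicy": {"updateMode": "Off"},
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    manifest = build_vpa_manifest("my-app-vpa", "default", "my-app")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped into kubectl apply -f -, and the recommendations then show up in the status of the VPA object (for example via kubectl describe vpa), while the HPA stays the only component that actually changes anything.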


MongoDB Atlas dedicated cluster: Should I be concerned about 'Restarts in last hour' alerts?

We’re using a standard 3-node Atlas replicaset in a dedicated cluster (M10, Mongo 6.0.3, AWS) and have configured an alert if the ‘Restarts in last hour is’ rule exceeds 0 for any node.
We’re seeing this alert fire every now and then, and we’re wondering what this means for a node in a dedicated cluster and whether it is something to be concerned about, since I don’t think we have any control over it. Should we disable this rule or increase the restart threshold?
Thanks in advance for any advice.
(Note I've asked this over at the Mongo community support site also, but haven't received any traction yet so asking here too)
I got an excellent response on my question at the Mongo community support site:
A node restarting is not necessarily a cause for concern. However, you should investigate the cause of the restart itself to better determine if this is an issue or not. You should take a look at your Project Activity Feed to see if you can determine why the nodes are restarting. I understand you have noted this is an M10 cluster, so you should have access to the MongoDB logs; you can also check those to try to determine the cause of the node restart. If you do not have access to the logs, you can consider working with Atlas in-app chat support to diagnose the issue.
It’s always good to keep the alerts active, as they can indicate a potential problem as soon as they occur. You can consider increasing the restart threshold to reduce alert noise after concluding whether the restarts are expected or not.
In my case, having checked the activity feed, I was able to match up all the alerts we were seeing to Mongo version auto-updates on the nodes. We still wanted to keep the alert, so we've increased our threshold to fire on >1 restarts per hour rather than >0 restarts, assuming that auto-updates won't be applied multiple times in the same hour.

Error using the connection to database when RDS scales out

We have a .NET API hosted in ECS that queries data from an Aurora Serverless v1 cluster using Entity Framework. Under normal load this service performs very well, but when there's a large spike in traffic that requires the RDS cluster to scale out to more ACUs, we are seeing a lot of connection errors in our API.
An error occurred using the connection to database '\"ourdatabasename\"' on server '\"tcp://ourcluster.region.rds.amazonaws.com:5432\"'.
The high-level overview of the infrastructure looks like this:
CloudFront >> Load Balancer >> ECS Fargate >> RDS Aurora PostgreSQL Serverless v1
Stack information:
.Net 6 API compiled for Linux
Entity Framework Core 6.x
Npgsql.EntityFrameworkCore.PostgreSQL 6.x
PostgreSQL 10.18
We did open AWS support cases about this issue in the past year, but those basically always result in the answer that this is an implementation issue and not an infrastructure issue.
We can easily reproduce the issue by running a k6 stress test on our API (bypassing the CloudFront caching layer of course) to generate a spike high enough to trigger scaling of the RDS cluster.
For the past year we have worked around this issue by configuring RDS at a capacity at which it basically never needs to scale out. This is of course wasting money, and not the purpose of serverless at all, so we would like to find the underlying root cause and solve that.
Some things we have tried already:
We have experimented with serverless v2, which should scale in a completely different fashion, as it's just the same VM consuming more resources from the hosting machine. But our preliminary conclusion is that this was even worse. We do not yet understand why, but it appears to trigger the same effect, only sooner and more often, since v2 scales a lot faster and more often. With v1 we get into trouble around 400 requests per second; with v2 it was at 150 rps.
EnableRetryOnFailure seemed to help a tiny bit, but not a lot. We have left it at the default configuration as implemented by Npgsql for now.
We have experimented with the Maximum Pool Size connection string parameter. At 300 it appears to be a bit better, but it does not solve the issue.
Changing the scaling behaviour of ECS/the ALB or even just prescaling that to handle peak load did not change anything.
We have not tried:
RDS Proxy: it's supposed to solve all your connection pooling issues. But we're not sure it's even a pooling issue. We're not keen on relying on yet another black box service to solve the issues our first black box service (Aurora Serverless) has. And it's not really cheap. If all of SO now convinces us this is the holy grail, then surely we'll try it out.
Data API for RDS: you can't have connection management issues if you're not making connections, right? It's a huge investment to rewrite all EF code to Data API requests, and I'm not sure what it says about the service that it's still not out for serverless v2. So, not for now, I think.
The first purpose of this question here on SO is to find someone who could help us understand what is even going on: what the error means and where it comes from. We understand that you cannot expect ECS+RDS to just magically handle all the load you throw at it. But if we do not fully understand how it breaks, we are not able to come up with potential failover mechanisms or ways to make the system fail more gracefully.
If someone knows the magic setting but not the why that's also great of course :) We can then maybe figure out the why ourselves and share that back with the community ;)
Feel free to ask more questions where needed.

What causes cold start in serverless

I have read enough papers on serverless cold start, but have not found a clear explanation on what causes cold start. Could you try to explain it from both commercial and open-source platform's points of view?
Commercial platforms such as AWS Lambda or Azure Functions. I know they are more like a black box to us.
There are open-source platforms such as OpenFaaS, Knative, or OpenWhisk. Do those platforms also have a cold start issue?
My initial understanding is that cold start latency is the time spent spinning up a container. Once the container is up, it can be reused as long as it has not been killed, so there is a warm start. Is this understanding correct? I have tried running a container locally from an image, and no matter how large the image is, the latency is close to zero.
Is the image download time also part of cold start? But no matter how many cold starts happened in one node, only one image download is needed, so this seems to make no sense.
Maybe a different question: I also wonder what happens when we instantiate a container from the image. Are the executable and its dependent libraries (e.g., Python libraries) copied from disk into memory during this stage? What if there are multiple containers based on the same image? I guess there should be multiple copies from disk to memory because each container is an independent process.
There are many levels of "cold start", and they all add latency. The hottest of the hot paths is when the container is still running and additional requests can be routed to it. The coldest is a brand new node: it has to pull the image, start the container, register with service discovery, wait for the serverless plane's routing to update, and probably go through some more steps if you dig deep enough. Some of those can happen in parallel, but most can't. If the pod has been shut down because it wasn't being used, and the next run is scheduled on the same machine, then yes, the kubelet usually skips pulling the image (unless imagePullPolicy: Always is forced somewhere), so you get a somewhat faster launch. The Kubernetes scheduler doesn't generally optimize for that, though.
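To make the levels concrete, here is a rough sketch that adds up the steps for each path. The per-step durations are invented placeholders, not measurements from any platform, and the steps are treated as strictly sequential, which is the pessimistic case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepCosts:
    """Hypothetical per-step latencies in seconds (illustrative, not measured)."""
    image_pull: float = 8.0
    container_start: float = 1.0
    service_registration: float = 0.5
    routing_update: float = 1.5


def startup_latency(costs: StepCosts, container_running: bool, image_cached: bool) -> float:
    """Extra latency before a request can be served, assuming sequential steps."""
    if container_running:
        # Hot path: the request is simply routed to the running container.
        return 0.0
    # The container has to be started and made reachable again.
    total = costs.container_start + costs.service_registration + costs.routing_update
    if not image_cached:
        # Coldest path: a node that has never pulled this image.
        total += costs.image_pull
    return total


if __name__ == "__main__":
    costs = StepCosts()
    print("hot:", startup_latency(costs, container_running=True, image_cached=True))
    print("same node, image cached:", startup_latency(costs, False, True))
    print("brand new node:", startup_latency(costs, False, False))
```

With these made-up numbers the image pull dominates, which is why landing on a node that already has the image helps, but the remaining steps still cost something even then.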

Create New Health State in SCOM

We have three health states in SCOM - Healthy, Warning and Critical. Is it possible to add a new custom health state? I tried by editing the System.Health.Library but failed. Is it possible?
Unfortunately there is no other answer except "No. It's impossible.". SCOM has three states maximum hardcoded in a lot of places and it doesn't allow you to have 4 states or more. It's really weird because 4+ states can bring us a lot of interesting and valuable scenarios (especially for ITSM integration), but it is what it is.
I recommend you think about how you can break your monitor down into 2 or more monitors, so that you end up with reasonable and clear 2- or 3-state monitors. In most cases it's possible. If you can give me more details about your case, I'll be glad to help you with the design.
Good luck!

How should I benchmark a system to determine the overall best architecture choice?

This is a bit of an open ended question, but I'm looking for an open ended answer. I'm looking for a resource that can help explain how to benchmark different systems, but more importantly how to analyze the data and make intelligent choices based on the results.
In my specific case, I have a 4 server setup that includes mongo that serves as the backend for an iOS game. All servers are running Ubuntu 11.10. I've read numerous articles that make suggestions like "if CPU utilization is high, make this change." As a new-comer to backend architecture, I have no concept of what "high CPU utilization" is.
I am using Mongo's monitoring service (MMS), and I am gathering some information about it, but I don't know how to make choices or identify bottlenecks. Other servers serve requests from the game client to mongo and back, but I'm not quite sure how I should be benchmarking or logging important information from them. I'm also using Amazon's EC2 to host all of my instances, which also provides some information.
So, some questions:
What statistics are important to log on a backend setup? (CPU, RAM, etc)
What is a good way to monitor those statistics?
How do I analyze the statistics? (RAM usage is high/read requests are low, etc)
What tips should I know before trying to create a stress-test or benchmarking script for my architecture?
Again, if there is a resource that answers many of these questions, I don't need an explanation here, I was just unable to find one on my own.
If more details regarding my setup are helpful, I can provide those as well.
I like to think of performance testing as a mini-project that is undertaken because there is a real-world need. Start with the problem to be solved: is the concern that users will have a poor gaming experience if the response time is too slow? Or is the concern that too much money will be spent on unnecessary server hardware?
In short, what is driving the need for the performance testing? This exercise is sometimes called "establishing the problem to be solved." It is about the goal to be achieved, because if there is no goal, why go through all the work of testing the performance? Establishing the problem to be solved will eventually drive what to measure and how to measure it.
After the problem is established, the next step is to write down what questions have to be answered to know when the goal is met. For example, if the goal is to ensure the response times are low enough to provide a good gaming experience, some questions that come to mind are:
What is the maximum response time before the gaming experience becomes unacceptably bad?
What is the maximum response time that is indistinguishable from zero? That is, if 200 ms response time feels the same to a user as a 1 ms response time, then the lower bound for response time is 200 ms.
What client hardware must be considered? For example, if the game only runs on iOS 5 devices, then testing an original iPhone is not necessary because the original iPhone cannot run iOS 5.
These are just a few questions I came up with as examples. A full, thoughtful list might look a lot different.
After writing down the questions, the next step is to decide what metrics will provide answers to the questions. You have probably come across a lot of metrics already: response time, transactions per second, RAM usage, CPU utilization, and so on.
After choosing some appropriate metrics, write some test scenarios. These are the plain English descriptions of the tests. For example, a test scenario might involve simulating a certain number of games simultaneously with specific devices or specific versions of iOS for a particular combination of game settings on a particular level of the game.
Once the scenarios are written, consider writing the test scripts for whatever tool is simulating the server work loads. Then run the scripts to establish a baseline for the selected metrics.
After a baseline is established, change parameters and chart the results. For example, if one of the selected metrics is CPU utilization versus the number of TCP packets entering the server per second, make a graph to find out how utilization changes as packets/second goes from 0 to 10,000.
In general, observe what happens to performance as the independent variables of the experiment are adjusted. Use this hard data to answer the questions created earlier in the process.
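As a small illustration of turning measurements into an answer, here is a sketch that summarizes response-time samples per load level and finds the highest load that still meets a response-time goal. The 200 ms limit, the load levels and the sample values are placeholders I made up; use whatever your own questions and test tool produce.

```python
import statistics


def summarize_latencies(samples_ms):
    """Return median, 95th percentile and max of response times (needs >= 2 samples)."""
    ordered = sorted(samples_ms)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {"median": statistics.median(ordered), "p95": cuts[94], "max": ordered[-1]}


def meets_goal(summary, max_acceptable_ms):
    """The goal here is defined on the 95th percentile, not on the average."""
    return summary["p95"] <= max_acceptable_ms


def highest_passing_load(results_by_load, max_acceptable_ms):
    """Walk the load levels in increasing order; stop at the first one that fails."""
    best = None
    for load in sorted(results_by_load):
        if meets_goal(summarize_latencies(results_by_load[load]), max_acceptable_ms):
            best = load
        else:
            break
    return best


if __name__ == "__main__":
    # Hypothetical data: requests per second -> observed response times in ms.
    results = {
        100: [10, 12, 11, 14, 13],
        200: [15, 18, 20, 25, 22],
        400: [150, 300, 250, 400, 180],
    }
    for load in sorted(results):
        print(load, summarize_latencies(results[load]))
    print("highest load meeting a 200 ms goal:", highest_passing_load(results, 200))
```

Judging by a high percentile rather than the mean is a deliberate choice in this sketch: a handful of very slow responses can ruin the experience for some players while barely moving the average.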
I did a Google search on "software performance testing methodology" and found a couple of good links:
Check out this white paper Performance Testing Methodology by Johann du Plessis
Have a look at the Methodology section of this Wikipedia article.