We are migrating our infrastructure to Kubernetes. I am talking about one part of it, which consists of an API for, let's say, customers (we have this case for many other resources). Let's say we have a billion customers, each with some data etc., and we decided they deserve a specialized API just for them, with its own DB, server, domain, etc.
Kubernetes has the notion of nodes and pods. So we said "ok, we dedicate node X with all its resources to this particular api". And now the question:
Why would I use multiple pods, each containing the same nginx + FPM and code, limit each to a part of the traffic and resources, and add an internal LB, autoscaling, etc., instead of having a single pod with all the node's resources?
Since each pod adds a bit of extra memory consumption, this seems like a waste to me. The only upside is that if something fails, only part of it goes down (so maybe 2 pods would be optimal in this case?).
Obviously, I would scale the nodes when needed.
Note: I'm not talking about a case where you have multiple pods with different stuff, I'm talking about that particular case.
Note 2: The DB is already outside this node, in its own pod.
Google fails me on this topic. I find hundreds of posts on "how to configure things", but zero on "why?".
Why would I use multiple pods, each containing the same nginx + FPM and code, limit each to a part of the traffic and resources, and add an internal LB, autoscaling, etc., instead of having a single pod with all the node's resources?
Since each pod adds a bit of extra memory consumption, this seems like a waste to me. The only upside is that if something fails, only part of it goes down (so maybe 2 pods would be optimal in this case?).
This comes down to the question: should I scale my app vertically (larger instance) or horizontally (more instances)?
First, try to avoid using only a single instance, since you probably want some redundancy for when you e.g. upgrade a Node. A single instance may be a good option if you are OK with occasional downtime.
Scale app vertically
Scaling an app vertically, by moving it to a bigger instance, is a viable alternative that sometimes is a good option - especially when the app cannot be scaled horizontally, e.g. an app that uses the leader election pattern and typically listens for a specific event and reacts to it. There is, however, a limit to how far you can scale an app vertically.
Scale app horizontally
For a stateless app, it is usually much easier and cheaper to scale horizontally by adding more instances. You typically want more than one instance anyway, since you want to tolerate a Node going down for maintenance. This also works at large scale with very many instances - and the cost scales roughly linearly. However, not every app can scale horizontally; e.g. a replicated, distributed database typically cannot scale well horizontally unless you shard the data. You can even use the Horizontal Pod Autoscaler to automatically adjust the number of instances depending on how busy the app is.
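For example, a minimal HorizontalPodAutoscaler manifest could look roughly like this sketch (the Deployment name `customers-api` and the thresholds are placeholders, not anything from your setup):

```yaml
# Hypothetical HPA: scales the customers-api Deployment between 2 and 10
# replicas based on average CPU utilization across its pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: customers-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: customers-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```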
Trade-offs
As described above, horizontal scaling is usually easier and preferred. But there are trade-offs: you would probably not want to run thousands of instances when you have low traffic - each instance has some resource overhead, and also a cost in maintainability. For availability you should run at least 2 pods and make sure they do not run on the same node; if you have a regional cluster, you also want to make sure they do not run in the same Availability Zone. Consider 2-3 pods when your traffic is low, and use the Horizontal Pod Autoscaler to automatically scale up to more instances when you need them. In the end, this is a numbers game - resources cost money - but you want to provide a good service for your customers as well.
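As a sketch of the "at least 2 pods, not on the same node or zone" idea, a Deployment could use topologySpreadConstraints like this (all names and the image are assumptions for illustration):

```yaml
# Hypothetical Deployment keeping 2 replicas and asking the scheduler to
# spread them across nodes and availability zones.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: customers-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: customers-api
  template:
    metadata:
      labels:
        app: customers-api
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname        # spread across nodes
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: customers-api
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone   # spread across zones
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: customers-api
      containers:
        - name: nginx-fpm
          image: example/customers-api:latest        # placeholder image
```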
Related
I was wondering whether it is really relevant to set "requests" (CPU/MEM) values if I'm not using the HPA.
If those values are not used to scale pods up or down, what is the point?
It's fine, and it will work, if you don't set requests (CPU/MEM) on your workloads.
But consider this scenario: suppose you have 1-2 Nodes with a capacity of 1 GB and you have not set any requests.
An application that is already running utilizes about half of a node, around 0.5 GB. Your new app needs 1 GB to start, yet K8s may still schedule its Pods onto that node because it is not aware of the minimum the application requires to start.
Whatever happens after that, we call it a crash.
If you have spare resources in the cluster, set affinity, and have confidence in the application code, you can get away without setting requests (not best practice).
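As an illustration of what setting requests looks like (the names and values here are made up; tune them to your app):

```yaml
# Requests tell the scheduler the minimum the pod needs, so it is not
# placed on a node that cannot actually fit it; limits cap what it may use.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: example/app:latest    # placeholder image
      resources:
        requests:
          cpu: "250m"
          memory: "512Mi"
        limits:
          cpu: "500m"
          memory: "1Gi"
```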
I am fairly new to Kubernetes, and I think I understand the basics of provisioning nodes and setting memory limits for pods. Here's the problem I have: my application can require dramatically different amounts of memory, depending on the input (and there is no fool-proof way to predict it). Some jobs require 50MB, some require 50GB. How can I set up my K8s deployment to handle this situation?
I have one strategy that I'd like to try out, but I don't know how to do it: start with small instances (nodes with not a lot of memory), and if the job fails with out-of-memory, then automatically send it to increasingly bigger instances until it succeeds. How hard would this be to implement in Kubernetes?
Thanks!
Natively, K8s supports horizontal autoscaling, i.e. automatically deploying more replicas of a deployment based on a chosen metric like CPU usage, memory usage, etc.: Horizontal Pod Autoscaling
What you are describing here, though, is vertical scaling. It is not supported out of the box, but there is a subproject that seems able to fulfill your requirements: vertical-pod-autoscaler
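Once the vertical-pod-autoscaler components are installed in the cluster, the object you create could look something like this sketch (the target Deployment name `my-job` is just an example):

```yaml
# Hypothetical VerticalPodAutoscaler: in "Auto" mode it evicts and
# recreates pods with updated resource requests based on observed usage.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-job-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-job
  updatePolicy:
    updateMode: "Auto"
```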
I want to use a Kubernetes namespace for each user of my application. So potentially I'll need to create thousands of namespaces, each with Kubernetes resources in them. I want to make sure this is scalable, so I want to ensure that I can have millions of namespaces in a Kubernetes cluster before I use this construct on a per-user basis.
I'm building a web hosting application. So I'm giving resources to each user, but I want them separated by namespaces.
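Per user, the construct I have in mind is roughly a namespace plus a ResourceQuota, something like this sketch (names and limits are illustrative, not my real values):

```yaml
# Hypothetical per-user namespace with a quota capping what that
# user's workloads can request in total.
apiVersion: v1
kind: Namespace
metadata:
  name: user-12345
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: user-quota
  namespace: user-12345
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    pods: "10"
```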
Are there any limitations to the number of Kubernetes namespaces you can create?
"In majority of cases, thresholds are NOT hard limits - crossing the limit results in degraded performance and doesn't mean cluster immediately fails over.
Many of the thresholds (for cluster scope) are given for the largest possible cluster. For smaller clusters, the limits are proportionally lower.
"
#Namespaces = 10000 scope=cluster
source with more data
kube Talk explaining how the data is computed
You'll usually run into limitations with resources and etcd long before you hit a namespace limit.
For scaling, you're probably going to want to scale out to more clusters, which most companies treat as cattle, rather than create one giant cluster that becomes a pet - and that is not a scenario you want to be dealing with.
Hi!
I was wondering whether it would be possible to replicate a VMware-style architecture in Kubernetes.
What I mean by that:
Instead of having the control plane always separated from the worker nodes, I would like to put them all together, so in the end we would have a cluster of master nodes on which we can also schedule applications. For now I'm using kata-containers with containerd, so all applications are deployed in 'mini' VMs and there isn't the 'escape from the container' problem. The management of the cluster would be done through a dedicated interface (eth0, 1 Gb). The users would communicate with the apps deployed in the cluster through another interface (eth1, 10 Gb). I would use Keepalived and HAProxy to elect my 'Main Master' and load balance the traffic.
The question might be 'why would you do that?'. Well, to assure high availability at all times and reduce the management overhead: instead of having 2 sets of "entities" to manage (the control plane and the worker nodes), simply reduce it to one. That way there won't be problems such as 'fewer than 50% of my masters are online, so no leader can be elected', where I would have to remove master nodes from my cluster until the percentage of online masters is above 50% - which requires technical intervention as fast as possible and might result in human errors, etc.
Another positive point would be scaling: instead of having 2 parts of the cluster to scale (masters and workers), there would be only one - I would just need to add another master/worker to the cluster and that's it. All the management traffic would be directed to the Main Master, which uses a Virtual IP (VIP), and in case of overload the requests would be redirected to another node.
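To make this concrete, I imagine the workloads would simply tolerate the control-plane taint, something like this rough sketch (names and image are placeholders; the taint key is an assumption - older clusters use node-role.kubernetes.io/master instead):

```yaml
# Hypothetical Deployment allowed to run on combined master/worker nodes
# by tolerating the control-plane NoSchedule taint.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - name: app
          image: example/app:latest   # placeholder image
```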
In the end I would have something resembling this:
Photo - Architecture VMWare-like
I am trying to find disadvantages to this kind of architecture. I know that there would be etcd traffic on each node, but how impactful is that? I know that there will be resources wasted on the control-plane pods on each node, but knowing that these pods (except etcd) won't do much besides waiting, how impactful would that be? With each node capable of taking the master role, there won't be any downtime. Right now, if my control plane (3 masters) goes down, I have to reboot the masters or find a solution as fast as possible before there's a problem with one of the apps that run on the worker nodes.
The topology I'm using right now resembles the following:
Architecture basic Kubernetes
I'm new to Kubernetes, so the question might seem stupid, but I would really like to know the advantages/disadvantages of the two approaches and understand why it wouldn't be a good idea.
Thanks a lot for any help! 🙂
There are two reasons for keeping the control plane on its own nodes. The big one is that you only want a small number of etcd nodes, usually 3 or 5, and that's usually the bounding factor on the size of the control plane; you usually want the ability to scale worker nodes independently of that. The second issue is that etcd is very sensitive to IOPS brownouts and can get into bad cascading failures if the machine runs low on IOPS.
And given that you are running on top of VMware anyway, the overhead of managing 3 vs 6 VMs is generally not a difference in kind. This seems like a false saving in the long run.
We're moving all of our infrastructure to Google Kubernetes Engine (GKE) - we currently have 50+ AWS machines with lots of APIs, Services, Webapps, Database servers and more.
As we have already dockerized everything, it's time to start moving everything to GKE.
I have a question that may sound too basic, but I've been searching the Internet for a week and have not found any reasonable post about this.
Straight to the point: which of the following approaches is better, and why?
1) Having multiple node pools with multiple machine types and always specifying in which pool each deployment should be done; or
2) Having a single pool with lots of machines and letting the Kubernetes scheduler do the job without worrying about where my deployments will be done; or
3) Having BIG machines (in multiple zones to improve the cluster's availability and resilience) and letting Kubernetes deploy everything there.
A list of considerations, to be taken merely as hints; I do not pretend to describe best practice.
Each pod you add brings some overhead with it, but you gain in flexibility and availability, making node failures and maintenance less impactful on production.
Nodes that are too small would cause a big waste of resources, since sometimes it will not be possible to schedule a pod even though the total amount of free RAM or CPU across the nodes would be enough; you can think of this issue as similar to memory fragmentation.
I guess that the sizes of your Pods and their memory and CPU requests are not all similar, but I do not see this as a big issue in principle, nor as a reason to go for 1). I do not see why a big Pod should run only on big machines and a small one should be scheduled on small nodes. I would rather use 1) if you need a different memory-GB/CPU-cores ratio to support different workloads.
I would advise you to run some tests in the initial phase to understand the size of the biggest Pod and the average size of the workload, in order to properly choose the machine types. Consider that having one Pod exactly fit a node, and be assigned to it, is not the right way to proceed (virtual machines exist for this kind of scenario), since fragmentation of resources would easily make it impossible to schedule such a large pod.
Consider that their size will likely increase in the future, and scaling vertically is not always immediate - you need to switch off machines and terminate pods - so I would oversize a bit to take this into account, also because scaling horizontally is way easier.
Talking about the machine type, you could decide to go for a machine 5x the size of the biggest Pod you have (or 3x? or 10x?). Oversize the number of nodes in the cluster a bit as well, to account for overhead and fragmentation and in order to still have free resources.
Remember that you have a hard limit of 110 pods per node and 5,000 nodes per cluster.
Remember that in GCP the network egress throughput cap depends on the number of vCPUs that a virtual machine instance has. Each vCPU has a 2 Gbps egress cap for peak performance; each additional vCPU increases the network cap, up to a theoretical maximum of 16 Gbps per virtual machine.
Regarding the prices of the virtual machines, notice that there is no difference in price between buying two machines of size x and one of size 2x. Avoid customizing the size of machines, because it is rarely convenient; if you feel like your workload needs more CPU or memory, go for a HighMem or HighCpu machine type.
P.S. Since you are going to build a pretty big cluster, check the sizing of the cluster DNS as well.
I will add any further considerations that come to my mind; consider updating your question in the future with a description of the path you chose and the issues you faced.
1) makes a lot of sense: if you want, you can still let kube deployments treat it as one large pool (by not adding nodeSelector/NodeAffinity), but you can have machines of different sizes, you can think about having a pool of spot instances, etc. And, after all, you can have pools that are tainted, and therefore excluded from normal scheduling and available only to a particular set of workloads (see the sketch after this list). In my opinion it is preferable to gain some proficiency with this approach from the very beginning, yet with many provisioners it should be very easy to migrate from 2) to 1) anyway.
2) As explained above, this is effectively a subset of 1), so it's better to build up experience with the 1) approach from day 1; but if you ensure your provisioning solution supports easy extension to the 1) model, then you can get away with starting with this simplified approach.
3) Big is nice, but "big" is relative. It depends on the requirements and the amount of your workloads. Remember that while you need to plan for the loss of a whole AZ anyway, it will be much more frequent to lose single nodes (reboots, decommissions of underlying hardware, updates, etc.), so the more hosts you have, the smaller the impact of losing one. The bottom line is that you need to find your own balance that makes sense for your particular scale. Maybe 50 nodes is too many; would 15 cut it? Who knows but you :)
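As a sketch of what pinning a workload to a specific pool looks like on GKE (the pool name high-mem-pool and all other names are made up for illustration):

```yaml
# Hypothetical Deployment constrained to one GKE node pool via the
# node-pool label GKE puts on its nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: high-mem-pool   # assumed pool name
      containers:
        - name: api
          image: example/api:latest                    # placeholder image
```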