Running multiple containers on the same Service Fabric node

I have a Windows Service Fabric node with 4 cores and I want to host 3 containerized stateless services on it, where each Windows container is allocated 1 core to read a message from a queue and process it. I ran some experiments and got these results:
1 container running on the node: message takes ~18 sec to be processed, avg CPU usage per container: 24.7%, memory usage: 1 GB
2 containers running on the node: message takes ~25 sec to be processed, avg CPU usage per container: 24.4%, memory usage: 1 GB
3 containers running on the node: message takes ~35 sec to be processed, avg CPU usage per container: 24.6%, memory usage: 1 GB
I thought containers were supposed to be isolated, and I expected the processing time to stay constant at ~18 sec regardless of the number of containers, but in this case adding a container seems to affect the processing time in the other containers. Each container is set to use 1 core, so they shouldn't be overstepping into each other's resources, and the CPU is not reaching full utilization. Even if the CPU were a bottleneck here, I'd expect at least 2 containers to be able to run with ~18 sec processing time.
Is there a logical explanation for these results? Isn't it possible to run multiple containers on the same Service Fabric host without affecting the performance of each, when there are enough compute resources? How big could the Service Fabric overhead possibly be when running multiple containers on the same node?
Thanks!

Your container is not only using CPU, but also memory and I/O (disk, network), which can also become bottlenecks.
To see the overhead of SF, run the containers outside of SF and see if it makes a difference.
Use a machine with more memory, and after that, try using an SSD drive. See if that increases performance.
To avoid process overhead, consider using a single container and have multiple threads do parallel message processing. Make sure to assign it 3 cores.
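A minimal sketch of that single-container approach in Python, assuming a hypothetical queue client; receive_messages and process_message are placeholders standing in for your real queue and message handler:

```python
# Sketch: one container, parallel message processing across 3 cores.
# receive_messages() and process_message() are placeholders.
from concurrent.futures import ProcessPoolExecutor

WORKERS = 3  # matches the 3 cores assigned to the single container

_queue = [f"message-{i}" for i in range(9)]  # stand-in for a real queue

def process_message(message: str) -> str:
    # The CPU-heavy work (the ~18 sec step from the question) goes here.
    return message.upper()

def receive_messages(batch_size: int) -> list:
    # Placeholder: pull up to batch_size messages from the queue.
    batch = _queue[:batch_size]
    del _queue[:batch_size]
    return batch

def main() -> None:
    # A process pool sidesteps Python's GIL for CPU-bound work; a
    # ThreadPoolExecutor is fine if the handler is mostly I/O-bound.
    with ProcessPoolExecutor(max_workers=WORKERS) as pool:
        while True:
            batch = receive_messages(WORKERS)
            if not batch:
                break
            for result in pool.map(process_message, batch):
                print(result)

if __name__ == "__main__":
    main()
```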

Related

Can't allocate more than 1 core to a container

I'm having an issue allocating more than 1 CPU to a pod that is running code that requires more processing power.
I have set my limit for the container to 3 CPUs
and have set the container to request 2 CPUs with a limit of 3.
But when running the container it does not go over 1000m (1 CPU).
There is very little else running during this process, and KEDA will start new nodes if needed.
How can I assign more CPU power to this container?
UPDATE
So I changed the Default Limit as suggested by moonkotte, but I can only ever get just over 1 CPU.
New nodes are coming online through KEDA when more containers are required.
Each node has 4 CPUs, so there are sufficient resources.
These are the details of each node; in this case it is running one of the containers in question.
It just isn't using all the CPU allocated.
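For reference, a sketch of the spec described above using the official Kubernetes Python client; the name and image are made up, and only the requests/limits mirror the question:

```python
# Sketch of the described request/limit combination via the official
# kubernetes Python client. Name and image are placeholders; only the
# resources block reflects the "request 2 CPUs, limit 3" setup above.
from kubernetes import client

container = client.V1Container(
    name="worker",                   # hypothetical name
    image="example/worker:latest",   # hypothetical image
    resources=client.V1ResourceRequirements(
        requests={"cpu": "2"},       # what the scheduler reserves on a node
        limits={"cpu": "3"},         # the throttling ceiling, not a target
    ),
)
# Note: a limit is only a cap -- a process that runs on a single thread
# will still top out around 1000m (1 CPU) no matter how high the limit is.
print(container.resources)
```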

Rightsizing Kubernetes Nodes | How much cost we save when we switch from VMs to containers

We are running 4 different micro-services on 4 different ec2 autoscaling groups:
service-1 - vcpu:4, RAM:32 GB, VM count:8
service-2 - vcpu:4, RAM:32 GB, VM count:8
service-3 - vcpu:4, RAM:32 GB, VM count:8
service-4 - vcpu:4, RAM:32 GB, VM count:16
We are planning to migrate this workload on EKS (in containers)
We need help in deciding the right node configuration (in EKS) to start with.
We can start with a small machine (vcpu:4, RAM:32 GB), but we will not get any cost saving as each container will need a separate VM.
We can use a large machine (vcpu:16, RAM:128 GB), but when these machines scale out, the scaled-out machine will also be large and can therefore be underutilized.
Or we can go with a Medium machine like vcpu: 8, RAM:64 GB.
Other than this recommendation, we were also evaluating the cost saving of moving to containers.
As per our understanding, every VM comes with the following overhead:
Overhead of running hypervisor/virtualisation
Overhead of running separate Operating system
Note: One large VM vs many small VMs cost the same on public cloud as cost is based on number of vCPUs + RAM.
Hypervisor/virtualization cost is only valid if we are running on-prem, so no need to consider this.
On the 2nd point, how many resources does a typical Linux machine need to run the OS? If we provision a small machine (vcpu:2, RAM:4 GB), approximate CPU usage is 0.2% and memory consumption (other than user space) is 500 MB.
So, running large instances (5 instances instead of 40 small instances) saves 35 times this CPU and RAM overhead, which does not seem significant.
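A quick back-of-the-envelope check of that estimate in Python (the 500 MB per-OS figure is the question's own estimate, not a measurement):

```python
# Back-of-the-envelope check of the OS-overhead saving estimated above.
# The 500 MB per-OS memory figure is the question's own estimate.
small_vm_count = 8 + 8 + 8 + 16     # current fleet: 40 VMs
large_vm_count = 5                  # hypothetical consolidated fleet
os_ram_gb = 0.5                     # per-OS memory overhead (estimate)
fleet_ram_gb = small_vm_count * 32  # 1280 GB across the current fleet

saved_os_copies = small_vm_count - large_vm_count   # 35 fewer OS instances
ram_saved_gb = saved_os_copies * os_ram_gb          # ~17.5 GB
print(f"RAM saved: {ram_saved_gb:.1f} GB, "
      f"about {100 * ram_saved_gb / fleet_ram_gb:.1f}% of the fleet's RAM")
```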
You are unlikely to see any cost savings in resources when you move to containers in EKS from applications running directly on VMs.
A Linux container is just an isolated Linux process with specified resource limits; it is no different from a normal process when it comes to resource consumption. EKS still uses virtual machines to provide compute to the cluster, so you will still be running processes on a VM regardless of containerization, and from a resource point of view it will be equal. (See this answer for a more detailed comparison of VMs and containers.)
When you add Kubernetes to the mix you are actually adding more overhead compared to running directly on VMs. The Kubernetes control plane runs on a set of dedicated VMs. In EKS those are fully managed as a PaaS, but Amazon charges a small hourly fee for each cluster.
In addition to the dedicated control plane nodes, each worker node in the cluster needs a set of programs (system pods) to function properly (kube-proxy, kubelet, etc.), and you may also define containers that must run on each node (daemon sets), like log collectors and security agents.
When it comes to sizing the nodes you need to find a balance between scaling and cost optimization.
The larger the worker node is, the smaller the relative overhead of system pods and daemon sets becomes. In theory, a worker node large enough to accommodate all your containers would maximize the share of resources consumed by your applications relative to the supporting software on the node.
The smaller the worker nodes are the smaller the horizontal scaling steps can be, which is likely to reduce waste when scaling. It also provides better resilience as a node failure will impact fewer containers.
I tend to prefer small nodes so that scaling can be handled efficiently. They should be slightly larger than what is required by the largest containers, so that system pods and daemon sets can also fit.
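To put rough numbers on that trade-off, here is a small sketch; the 0.3 vCPU / 0.7 GB per-node system figure is an assumption standing in for kubelet, kube-proxy and daemon sets, not an EKS quote:

```python
# Rough illustration of how per-node system overhead shrinks, relatively,
# as nodes get larger. The 0.3 vCPU / 0.7 GB figure is an assumption;
# substitute measurements from your own cluster.
SYSTEM_CPU, SYSTEM_RAM_GB = 0.3, 0.7   # per-node overhead (assumed)

node_shapes = {                         # the three options from the question
    "small  (4 vCPU / 32 GB)": (4, 32),
    "medium (8 vCPU / 64 GB)": (8, 64),
    "large (16 vCPU / 128 GB)": (16, 128),
}

for name, (vcpu, ram_gb) in node_shapes.items():
    cpu_pct = 100 * SYSTEM_CPU / vcpu
    ram_pct = 100 * SYSTEM_RAM_GB / ram_gb
    print(f"{name}: {cpu_pct:.1f}% CPU and {ram_pct:.1f}% RAM "
          "go to system pods / daemon sets")
```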

Airflow Memory Error: Task exited with return code -9

According to both Link1 and Link2, my Airflow DAG run is returning the error INFO - Task exited with return code -9 due to an out-of-memory issue. My DAG run has 10 tasks/operators, and each task simply:
makes a query to get one of my BigQuery tables, and
writes the results to a collection in my Mongo database.
The sizes of the 10 BigQuery tables range from 1 MB to 400 MB, and the total size of all 10 tables is ~1 GB. My Docker container has the default 2 GB of memory, and I've increased this to 4 GB; however, I am still receiving this error from a few of the tasks. This confuses me, as 4 GB should be plenty of memory. I am also concerned because, in the future, these tables may become larger (a single table query could be 1-2 GB), and I'd like to avoid these return code -9 errors at that time.
I'm not quite sure how to handle this issue, since the point of the DAG is to transfer data from BigQuery to Mongo daily, and the data that the DAG's tasks query and hold in memory is then necessarily fairly large, given the size of the tables.
As you said, the error message you get corresponds to an out of memory issue.
Referring to the official documentation:
DAG execution is RAM limited. Each task execution starts with two Airflow processes: task execution and monitoring. Currently, each node can take up to 6 concurrent tasks. More memory can be consumed, depending on the size of the DAG.
High memory pressure in any of the GKE nodes will lead the Kubernetes scheduler to evict pods from nodes in an attempt to relieve that pressure. While many different Airflow components are running within GKE, most don't tend to use much memory, so the case that happens most frequently is that a user uploaded a resource-intensive DAG. The Airflow workers run those DAGs, run out of resources, and then get evicted.
You can check it with following steps:
In the Cloud Console, navigate to Kubernetes Engine -> Workloads
Click on airflow-worker, and look under Managed pods
If there are pods that show Evicted, click each evicted pod and look for the "The node was low on resource: memory" message at the top of the window.
What are the possible ways to fix OOM issue?
Create a new Cloud Composer environment with a larger machine type than the current machine type.
Ensure that the tasks in the DAG are idempotent, which means that the result of running the same DAG run multiple times should be the same as the result of running it once.
Configure task retries by setting the number of retries on the task - this way when your task gets -9'ed by the scheduler it will go to up_for_retry instead of failed
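That last point is a one-line setting on the operator. A minimal sketch, assuming Airflow 2.x; the DAG and bash command are placeholders, and only retries/retry_delay illustrate the suggestion:

```python
# Sketch of the retry suggestion (Airflow 2.x imports assumed).
# The DAG and command are placeholders; retries/retry_delay are the point.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="bq_to_mongo", start_date=datetime(2024, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    transfer = BashOperator(
        task_id="transfer_table",
        bash_command="echo 'transfer one table'",  # placeholder command
        retries=2,                          # go to up_for_retry instead of failed
        retry_delay=timedelta(minutes=5),   # wait between attempts
    )
```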
Additionally you can check the behavior of CPU:
In the Cloud Console, navigate to Kubernetes Engine -> Clusters
Locate Node Pools at the bottom of the page, and expand the default-pool section
Click the link listed under Instance groups
Switch to the Monitoring tab, where you can find CPU utilization
Ideally, the GCE instances shouldn't be running over 70% CPU at all times, or the Composer environment may become unstable during periods of heavy resource usage.
I hope you find the above pieces of information useful.
I am going to chunk the data so that less is loaded into any 1 task at any given time. I'm not sure yet whether I will need to use GCS/S3 for intermediary storage.
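A sketch of that chunking approach, reading the BigQuery result one page at a time and writing each page to Mongo before fetching the next; the project, table, connection string and collection names are placeholders:

```python
# Sketch of a chunked BigQuery -> Mongo transfer so that only one page of
# rows is held in memory at a time. Names and the Mongo URI are placeholders.
from google.cloud import bigquery
from pymongo import MongoClient

PAGE_SIZE = 10_000  # rows held in memory at once; tune to your memory budget

def transfer_table(table: str, collection_name: str) -> None:
    bq = bigquery.Client()
    mongo = MongoClient("mongodb://localhost:27017")   # placeholder URI
    collection = mongo["my_db"][collection_name]       # placeholder names

    rows = bq.query(f"SELECT * FROM `{table}`").result(page_size=PAGE_SIZE)
    for page in rows.pages:                 # stream one page at a time
        # Note: some BigQuery types (DATE, NUMERIC, ...) may need converting
        # before Mongo will accept them.
        docs = [dict(row) for row in page]
        if docs:
            collection.insert_many(docs)

if __name__ == "__main__":
    transfer_table("my_project.my_dataset.my_table", "my_collection")
```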

Kubernetes limit and request of resource would be better to be closer

I was told by a more experienced DevOps person that resource (CPU and memory) limits and requests would be better kept closer together for scheduling pods.
Intuitively I can imagine that less scaling up and down would require less computational work from Kubernetes? Or can someone explain it in more detail?
The resource requests and limits do two fundamentally different things. The Kubernetes scheduler places a pod on a node based only on the sum of the resource requests: if the node has 8 GB of RAM, and the pods currently scheduled on that node requested 7 GB of RAM, then a new pod that requests 512 MB will fit there. The limits control how much resource the pod is actually allowed to use, with it getting CPU-throttled or OOM-killed if it uses too much.
In practice many workloads can be "bursty". Something might require 2 GB of RAM under peak load, but far less than that when just sitting idle. It doesn't necessarily make sense to provision enough hardware to run everything at peak load, but then to have it sit idle most of the time.
If the resource requests and limits are far apart then you can "fit" more pods on the same node. But, if the system as a whole starts being busy, you can wind up with many pods that are all using above their resource request, and actually use more memory than the node has, without any individual pod being above its limit.
Consider a node with 8 GB of RAM, and pods with 512 MB RAM resource requests and 2 GB limits. 16 of these pods "fit". But if each pod wants to use 1 GB RAM (allowed by the resource limits) that's more total memory than the node has, and you'll start getting arbitrary OOM-kills. If the pods request 1 GB RAM instead, only 8 will "fit" and you'll need twice the hardware to run them at all, but in this scenario the cluster will run happily.
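Spelled out as a quick calculation (same 8 GB node and 512 MB / 1 GB / 2 GB figures as above):

```python
# Worked version of the example above: overcommit when requests sit far
# below limits. All figures come from the example.
NODE_RAM_GB = 8

def schedule(request_gb, limit_gb, typical_gb):
    pods = int(NODE_RAM_GB // request_gb)   # the scheduler only sums requests
    return pods, pods * typical_gb, pods * limit_gb

# Requests far below limits: 16 pods "fit", but 16 GB of typical demand
# on an 8 GB node means arbitrary OOM kills (worst case is 32 GB).
print(schedule(request_gb=0.5, limit_gb=2, typical_gb=1))   # (16, 16, 32)

# Requests matching typical usage: only 8 pods fit, so you need twice the
# hardware, but the node is never overcommitted.
print(schedule(request_gb=1, limit_gb=2, typical_gb=1))     # (8, 8, 16)
```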
One strategy for dealing with this in a cloud environment is what your ops team is asking: make the resource requests and limits very close to each other. If a node fills up, an autoscaler will automatically request another node from the cloud. Scaling down is a little trickier. But this approach avoids problems where things die randomly because the Kubernetes nodes are overcommitted, at the cost of needing more hardware for the idle state.

AWS ECS Task Memory and CPU Allocation

I'm looking for guidance on allocating memory for an ECS task. I'm running a Rails app for a client who wants to be as cheap as possible on server cost. I was looking at the medium server size that has 2 CPU and 4 gb memory.
Most of the time I'll only need 1 container running the rails server at a time. However, there are occasional spikes and I want to scale out another server and have the container deployed to it. When traffic slows down, I want to scale back down to the single server / task.
Here's where I need help:
What should I make my task memory setting? 4 GB? That would be the total on the box, but it doesn't account for system processes. I could do 3 GB, but then I'd be leaving some free memory unused. Same question for the CPU... should I just make it 100%?
I don't want to pay for a bigger server, i.e. 16 GB to sit there and only have 1 container needed most of the time... such a waste.
What I want seems simple. 1 task per instance. When the instance gets to 75% usage, scale a new instance and deploy the task to the second. I don't get why I have to set task memory and CPU settings when it's a one-to-one ratio.
Can anyone give me guidance on how to do what I've described? Or what the proper task definition settings should be when it's meant to be one-to-one with the instance?
Thanks for any help.
--Edit--
Based on feedback, here's a potential solution:
Task definition = memory reservation is 3 GB and memory is 4 GB.
EC2 medium nodes, which have 4 GB of memory
ECS Service autoscaling configured:
- scale up (increase task count by 1) when Service CPU utilization is greater than 75%.
- scale down (decrease task count by 1) when Service CPU utilization is less than 25%.
ECS Cluster scaling configured:
- scale up (increase ec2 instance count by 1) when cluster memory utilization is greater than 80%.
- scale down (decrease ec2 instance count by 1) when cluster memory utilization is less than 40%.
Example:
Starts with 1 EC2 instance running a task with 3 GB reservation. This is 75% cluster utilization.
When the service spikes and CPU utilization of the service jumps to greater than 75%, it will trigger a service scale-out. Now the task count is increased and the new task asks for 3 GB again, which makes a total of 6 GB, but only 4 GB is available, so the cluster is at 150% utilization.
This triggers the cluster scale-up (over 80%), which adds a new EC2 node to the cluster for the new service. When it's there, we're back down to 6 GB demand / 8 GB available, which is 75% and stable.
The scale down would happen the same.
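The utilization figures in that walk-through, as a quick check (reservation and node sizes as above):

```python
# Quick check of the utilization figures in the walk-through above.
TASK_RESERVATION_GB = 3
NODE_MEMORY_GB = 4

def cluster_memory_utilization(task_count, node_count):
    return 100 * (task_count * TASK_RESERVATION_GB) / (node_count * NODE_MEMORY_GB)

print(cluster_memory_utilization(1, 1))  # 75.0  -> steady state
print(cluster_memory_utilization(2, 1))  # 150.0 -> triggers cluster scale-up (>80%)
print(cluster_memory_utilization(2, 2))  # 75.0  -> stable again with the new node
```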
For setting memory for containers, I would recommend using "memoryReservation" (the soft memory limit for your container) together with "memory" (the hard memory limit for your container).
You can set "memoryReservation" to 3 GB, which will ensure another instance of the container does not end up on the same EC2 instance. The "memory" option will allow the container to use more memory when absolutely needed.
Ref: http://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html
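As a sketch, this is roughly how those two settings look when registering the task definition with boto3; the family name, image and other fields are placeholders, and the memory values are in MiB:

```python
# Sketch of the soft/hard memory settings via boto3. Family and image are
# placeholders; memory values are MiB (3 GB soft, 4 GB hard).
import boto3

ecs = boto3.client("ecs")
ecs.register_task_definition(
    family="rails-app",                    # placeholder family name
    containerDefinitions=[{
        "name": "rails",
        "image": "example/rails:latest",   # placeholder image
        "memoryReservation": 3072,         # soft limit: used for placement
        "memory": 4096,                    # hard limit: killed if exceeded
        "essential": True,
    }],
)
```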
ECS right now does not support the flexibility to prevent the same task from being deployed twice on the same EC2 instance.
But you can hack your way around this by either blocking CPU/memory or exposing a known host port on your task.