Running an AWS ECS task in a region based on the cost of spot instances - amazon-ecs

I have recently been experimenting with AWS, and in particular the ECS service.
I have an application that runs tasks in a cluster, and these tasks are launched from a Lambda function using boto3.
To reduce the cost of running the containers, I was thinking of creating multiple clusters in different regions with FARGATE_SPOT as the capacity provider, and then choosing the cluster in which to run the containers based on the lowest spot-instance cost.
To check that cost and select the region accordingly, I was hoping for an API that would let me do this from the Lambda, but I couldn't find anything in the documentation.
Would it be possible to do something like this? If so, is there an API to check the trend of the spot market in the various regions?

Fargate Spot pricing is not variable like EC2 spot pricing. There is no Fargate Spot market and no trending Fargate Spot price the way there is with EC2. It is a set price, with the cheapest Fargate Spot pricing always being in the us-east-2 (Ohio) region, followed by Oregon (us-west-2), then Virginia (us-east-1).
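Since the prices are fixed per region, the Lambda doesn't need a market API at all; it can route tasks using a static price table. A minimal boto3 sketch, assuming one identically named cluster exists per region (the prices below are illustrative placeholders, not current AWS rates; look up real values on the Fargate pricing page):

```python
# Hypothetical per-vCPU-hour Fargate Spot prices (illustrative only --
# check the AWS Fargate pricing page for current values).
FARGATE_SPOT_VCPU_HOUR = {
    "us-east-2": 0.0120,  # Ohio: cheapest, per the answer above
    "us-west-2": 0.0125,  # Oregon
    "us-east-1": 0.0130,  # N. Virginia
}

def cheapest_region(prices=FARGATE_SPOT_VCPU_HOUR):
    """Return the region with the lowest Fargate Spot vCPU price."""
    return min(prices, key=prices.get)

def run_task_in_cheapest_region(task_def, cluster_name):
    """Launch an ECS task on FARGATE_SPOT in the cheapest region."""
    import boto3  # assumes a cluster named cluster_name in every listed region
    region = cheapest_region()
    ecs = boto3.client("ecs", region_name=region)
    return ecs.run_task(
        cluster=cluster_name,
        taskDefinition=task_def,
        capacityProviderStrategy=[{"capacityProvider": "FARGATE_SPOT"}],
    )
```

The table has to be maintained by hand if AWS ever changes the rates, but since there is no Fargate Spot price feed, that is the trade-off.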

Related

split Kubernetes cluster costs between namespaces

We are running a multi-tenant Kubernetes cluster on EKS (in AWS), and I need to come up with an appropriate way of charging all the teams that use the cluster. We have the costs of the EC2 worker nodes, but I don't know how to split these costs up given metrics from Prometheus. To make it trickier, I also need to give the cost per team (or pod/namespace) for the past week and the past month.
Each team uses a different namespace, but this will change soon so that each pod will have a label with the team name.
From looking around I can see that I'll need to use the container_spec_cpu_shares and container_memory_working_set_bytes metrics, but how can these two metrics be combined so that we get a percentage of the worker node cost?
Also, I don't know PromQL well enough to get the stats for the past week and the past month from the range vector metrics.
If anyone can share a solution they've already built, or even just point me in the right direction, I would appreciate it.
Thanks
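One common approach (a sketch, not a complete answer): average each pod's metrics over the window with `avg_over_time`, e.g. `avg_over_time(container_memory_working_set_bytes[7d])` (or `[30d]` for the month), then blend each pod's CPU fraction and memory fraction of the node totals into one cost percentage. The 50/50 CPU/memory weighting and the pod numbers below are illustrative assumptions:

```python
# Sketch: blend per-pod CPU and memory fractions into a cost share.
# The inputs would come from Prometheus, e.g.
#   avg_over_time(container_spec_cpu_shares[7d])
#   avg_over_time(container_memory_working_set_bytes[7d])
# The numbers below are made up for illustration.
pods = {
    # team/namespace: (avg CPU shares, avg working-set bytes)
    "team-a": (1024, 2 * 1024**3),
    "team-b": (512, 1 * 1024**3),
    "team-c": (512, 1 * 1024**3),
}

CPU_WEIGHT, MEM_WEIGHT = 0.5, 0.5  # assumption: value CPU and memory equally
NODE_COST_PER_WEEK = 100.0         # total EC2 worker-node cost for the window

total_cpu = sum(cpu for cpu, _ in pods.values())
total_mem = sum(mem for _, mem in pods.values())

def cost_share(cpu, mem):
    """Blend the CPU and memory fractions into one cost fraction."""
    return CPU_WEIGHT * cpu / total_cpu + MEM_WEIGHT * mem / total_mem

costs = {name: cost_share(cpu, mem) * NODE_COST_PER_WEEK
         for name, (cpu, mem) in pods.items()}
```

The shares always sum to the full node cost, so nothing is left unattributed; whether to weight by requests instead of usage, and how to split CPU versus memory, is a chargeback policy decision rather than a technical one.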

Can service fabric autoscaling scale out nodes as well?

Based on this link, Service Fabric provides auto-scaling of instances or partitions.
However, what's confusing is whether it can also auto-scale the nodes (the VMs / actual physical environment) in and out, which doesn't seem to be mentioned explicitly.
Yes, you can auto-scale the cluster as well, assuming you are running in Azure. This is done based on performance counter data, and works by defining rules on the VM scale set.
Note that in order to automatically scale down gracefully, it's recommended you use the Gold or Silver durability level; otherwise you'll be responsible for draining the node before it's taken out of the cluster.
More info here and here.
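On Azure these rules live in a `Microsoft.Insights/autoscaleSettings` resource targeting the scale set behind the node type. A trimmed sketch of one scale-out rule (the resource name, capacity bounds, and CPU threshold are placeholders, not recommendations):

```json
{
  "profiles": [
    {
      "name": "default",
      "capacity": { "minimum": "5", "maximum": "10", "default": "5" },
      "rules": [
        {
          "metricTrigger": {
            "metricName": "Percentage CPU",
            "metricResourceUri": "[resourceId('Microsoft.Compute/virtualMachineScaleSets', 'MyNodeType')]",
            "timeGrain": "PT1M",
            "statistic": "Average",
            "timeWindow": "PT5M",
            "timeAggregation": "Average",
            "operator": "GreaterThan",
            "threshold": 75
          },
          "scaleAction": {
            "direction": "Increase",
            "type": "ChangeCount",
            "value": "1",
            "cooldown": "PT5M"
          }
        }
      ]
    }
  ]
}
```

A matching scale-in rule (direction "Decrease" below some lower threshold) would normally accompany it, which is where the durability-level caveat above matters.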

Breakdown of GKE bill based on pods or deployments

I need a breakdown of my usage inside a single project, categorized by Pods, Services, or Deployments, but the billing section in the console doesn't seem to provide such granular information. Is it possible to get this data somehow? I want to know the network + compute cost per deployment or pod.
Or, failing that, is it possible to have it at least at the cluster level? Is this breakdown available in BigQuery?
A new feature was recently released in GKE that allows collecting metrics inside a cluster, which can also be combined with the exported billing data to separate costs per project/environment, making it possible to separate costs per namespace, deployment, labels, and other criteria.
https://cloud.google.com/blog/products/containers-kubernetes/gke-usage-metering-whose-line-item-is-it-anyway
It's not possible at the moment to break down the billing at the pod, service, or deployment level. Kubernetes Engine uses Google Compute Engine instances for the nodes in the cluster, and you are billed for each of those instances according to Compute Engine's pricing until the nodes are deleted. Compute Engine resources are billed on a per-second basis with a one-minute minimum usage cost.
You can enable Export Billing Data to BigQuery, which exports your daily usage and cost estimates automatically throughout the day to a BigQuery dataset you specify. You can then access your billing data from BigQuery and run queries on the exported billing data to do some breakdown.
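Once the export is enabled, the breakdown is essentially a group-by over the exported rows on a resource label. A sketch of that aggregation on simplified stand-in rows (the real export schema has nested labels, SKU, and usage records; the label names and costs below are assumptions for illustration):

```python
from collections import defaultdict

# Simplified stand-ins for exported billing rows (shape assumed for
# illustration -- the real BigQuery export uses nested records).
rows = [
    {"labels": {"cluster": "prod", "namespace": "team-a"}, "cost": 12.5},
    {"labels": {"cluster": "prod", "namespace": "team-b"}, "cost": 7.5},
    {"labels": {"cluster": "prod", "namespace": "team-a"}, "cost": 2.0},
]

def cost_by(rows, label):
    """Sum cost per value of one resource label (what a GROUP BY would do)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["labels"].get(label, "unlabelled")] += row["cost"]
    return dict(totals)

# e.g. cost_by(rows, "namespace") or cost_by(rows, "cluster")
```

In practice you would express the same grouping as a BigQuery SQL query over the export dataset rather than pulling rows into Python; the sketch just shows the shape of the breakdown.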
You can also view your usage reports and estimate your Kubernetes charges using the GCP Pricing Calculator. If you want to move forward, you can create a feature request in the Public Issue Tracker.
You can get this visibility with your GKE Usage Metering dataset and your BigQuery cost exports.
Cost per namespace, per deployment, and per node can be obtained by writing queries that combine these tables. If you have labels set, you can drill down based on labels too. It shows you the spend on CPU, RAM, and egress.
Check out economize.cloud - it integrates with your datasets and allows you to slice and dice views. For example, cost per customer or cost per service can be obtained with such granular cost data.
https://www.economize.cloud/blog/gke-usage-cost-monitoring-kubernetes-clusters/
New GCP offering: GKE Cost Allocation allows users to easily and natively view and manage the cost of a GKE cluster by cluster, namespace, pod labels and more, right from the Billing page, or to export detailed usage cost data to BigQuery:
https://cloud.google.com/kubernetes-engine/docs/how-to/cost-allocations
GKE Cost Allocation is more accurate and robust compared to GKE Usage Metering.
Kubecost provides Kubernetes cost allocation by any concept, e.g. pod, service, controller, etc. It's open source and is available for GKE, AWS/EKS, and other major providers. https://github.com/kubecost/cost-model

Is there a way to test an M10 Atlas cluster on MongoDB Atlas?

We have an M10 cluster and the official page states that we get a max of 100 IOPS.
I can't run mongoperf on the cluster, as we only have mongo shell and Compass access, and mongoperf needs to be run on the instance that has MongoDB installed.
Is there any way to test the maximum requests per second that this cluster can handle and if not, is there any rough estimate available as to how many read/write operations it can handle concurrently?
PS: Assume the queries being run aren't particularly complex and are only inserting small sets of data such as name, email address, age, etc.
Thanks in advance!
Is there any way to test the maximum requests per second that this cluster can handle and if not, is there any rough estimate available as to how many read/write operations it can handle concurrently?
The answer to this really depends on a lot of variables, e.g. document sizes, index utilisation, network latency between the application and the servers, etc.
To provide a rough estimate, however, assuming your MongoDB cluster is hosted on AWS (GCP and Azure would be different), the specs would be:
M10: 2 GB RAM and 10 GB included storage.
In addition to this, you can select different EBS storage sizes as well as provisioned IOPS to match your required performance.
See also MongoDB Atlas: FAQ
We have an M10 cluster and the official page states that we get a max of 100 IOPS.
The number of IOPS advertised is what is provided by the cloud provider, i.e. AWS. It does not take into account the network latency or your database usage, which affect server resources such as CPU/RAM.
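Since mongoperf needs host access and therefore can't run against Atlas, one option is a rough client-side measurement through the driver. A sketch (the connection string and collection names are placeholders; note the result includes network latency and a serial loop, so run it from where the application actually lives and add concurrency for a more realistic ceiling):

```python
import time

def measured_ops_per_sec(write_one, n=1000):
    """Time n small writes through the given callable and return ops/sec."""
    start = time.perf_counter()
    for i in range(n):
        write_one({"name": f"user{i}", "email": f"u{i}@example.com", "age": 30})
    elapsed = time.perf_counter() - start
    return n / elapsed

# Against Atlas this would be driven via pymongo (connection string elided):
#     from pymongo import MongoClient
#     coll = MongoClient("mongodb+srv://...").mydb.users
#     print(measured_ops_per_sec(coll.insert_one))
```

A single-threaded loop like this mostly measures round-trip latency; running several of these loops in parallel (threads or multiple processes) gets closer to the throughput limit where the 100 IOPS cap would start to show.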

Cluster node VM size options

There appears to be some churn in what VM sizes are available for nodes in a SF cluster. Not long after SF went GA I created a cluster using a mix of A0 and A1 nodes. I was ecstatic at the time to see this was supported as it's awesome for dev/qa scenarios.
Today I went to create a new cluster and find my options for VM size are severely limited. D1v2, D2v2 or D3v2 for Bronze durability and D15v2 for Gold. Hugely disappointing to say the least. And a significant backpedal from just a few weeks ago.
What is the backstory here?
Was my original cluster configuration never supposed to be allowed and was a bug in the Portal?
Were there problems seen with these sizes and the SF team decided they are unusable?
Something else entirely?
And is this a permanent decision?
I'd really like to see as many VM size options as possible be supported.
You're looking at the recommended list of VM sizes. You can still use any VM size you want, including A0 and A1, just click the "View All" button.
We generally recommend VMs with SSDs for stateful services so that your applications aren't bottlenecked on old spinning disks.
The list of recommended SKUs was rolled out in response to customer feedback.
All the VM options are still available under the "View all" button. The intent was to make sure that customers choose the recommended VMs with SSDs (with enough SSD space), unless they were specifically looking for a particular SKU. This was done mainly in response to a good number of customers wrongly choosing the DS SKUs when they were really looking for D-series VMs. (Choosing the DS SKUs resulted in the VMs quickly running out of disk space.)
Although I realize that the A0 SKU is very attractive in terms of price, and may be ideal for a test cluster, for a production cluster it is strongly recommended that you do not choose A0 as the SKU for the primary node type. The primary node type is where the majority of the system services live. For more considerations on cluster capacity planning see - https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-capacity/