How to estimate RAM and CPU per Kubernetes Pod for a Spring Batch processing job? - kubernetes

I'm trying to estimate hardware resources for a Kubernetes Cluster to be able to handle the following scenarios:
On a daily basis I need to read 46,3 Million XML messages 10KB each (approx.) from a queue and then insert them in a Spark instance and in a Sybase DB instance. I need to come out with an estimation of how many pods I will need to process this amount of data and how much RAM and how many vCPUs will be required per pod in order to determine the characteristics of the nodes of the cluster. The reason behind all this is that we have some budget restrictions and we need to have an idea of the sizing before starting the corresponding development.
The second scenario is the same as the one already described but 18,65 times bigger, i.e. 833,33 Million XML messages per day. This is expected to be the case within a couple of years.
So far we plan to use Spring Batch with partitioning steps. I need orientation on how to determine the ideal Spring Batch configuration, required RAM, and required CPU per POD, as well as the number of PODS.
I will greatly appreciate any comments from your side.
Thanks in advance.

Related

Druid Cluster going into Restricted Mode

We have a Druid Cluster with the following specs
3X Coordinators & Overlords - m5.2xlarge
6X Middle Managers(Ingest nodes with 5 slots) - m5d.4xlarge
8X Historical - i3.4xlarge
2X Router & Broker - m5.2xlarge
Cluster often goes into Restricted mode
All the calls to the Cluster gets rejected with a 502 error.
Even with 30 available slots for the index-parallel tasks, cluster only runs 10 at time and the other tasks are going into waiting state.
Loader Task submission time has been increasing monotonically from 1s,2s,..,6s,..10s(We submit a job to load the data in S3), after
recycling the cluster submission time decreases and increases again
over a period of time
We submit around 100 jobs per minute but we need to scale it to 300 to catchup with our current incoming load
Cloud someone help us with our queries
Tune the specs of the cluster
What parameters to be optimized to run maximum number of tasks in parallel without increasing the load on the master nodes
Why is the loader task submission time increasing, what are the parameters to be monitored here
At 100 jobs per minute, this is probably why the overlord is being overloaded.
The overlord initiates a job by communicating with the middle managers across the cluster. It defines the tasks that each middle manager will need to complete and it monitors the task progress until completion. The startup of each job has some overhead, so that many jobs would likely keep the overlord busy and prevent it from processing the other jobs you are requesting. This might explain why the time for job submissions increases over time. You could increase the resources on the overlord, but this sounds like there may be a better way to ingest the data.
The recommendation would be to use a lot less jobs and have each job do more work.
If the flow of data is so continuous as you describe, perhaps a kafka queue would be the best target followed up with a Druid kafka ingestion job which is fully scalable.
If you need to do batch, perhaps a single index_parallel job that reads many files would be much more efficient than many jobs.
Also consider that each task in an ingestion job creates a set of segments. By running a lot of really small jobs, you create a lot of very small segments which is not ideal for query performance. Here some info around how to think about segment size optimization which I think might help.

Kubernetes Orchestration depending upon number of rows/records/Input Files

Requirement is to orchestrate ETL containers depending upon the number of records present at the Source system (SQL/Google Analytics/SAAS/CSV files).
To explain take a Use Case:- ETL Job has to process 50K records present in SQL server, however, it takes good processing time to execute this job by one server/node as this server makes a connection with SQL, fetches the data and process the records.
Now the problem is how to orchestrate in Kubernetes this ETL Job so that it scales up/down the containers depending upon number of records/Input. Like the case discussed above if there are 50K records to process in parallel then it should scale up the containers process the records and scales down.
You would generally use a queue of some kind and Horizontal Pod Autoscaler (HPA) to watch the queue size and adjust the queue consumer replicas automatically. Specifics depend on the exact tools you use.

What is the maximum number of jobs that you can submit to a kubernetes cluster?

I have a Kubernetes cluster and I have tested submitting 1,000 jobs at a time and the cluster has no problem handling this. I am interested in submitting 50,000 to 100,000 jobs and was wondering if the cluster would be able to handle this?
Yes you can but only if only don't run out of resources or you don't exceed this criteria regarding building large clusters.
Usually you want to limit your jobs in some way in order to better handle memory and CPU or to adjust it in any other way according to your needs.
So the best practice in your case would be to:
set as many jobs as you want (bear in mind the building large clusters criteria)
observe the resource usage
if needed use for example Resource Quotas in order to limit resources used by the jobs
I hope you find this helpful.

Apache Geode scaling

I'm trying to measure the performance of Geode
I have 3 identical hosts to test it.
I created a partitioned region.
I started a geode cluster with one server.
I do "get" and "put" operations in the loop.
I get about 50000 op/sec.
Add started a cluster with three geode nodes.
I do get and put operations in the loop.
I get the same 50000 op/sec.
I would expect to see the increased performance, but it is suprisingly the same for 1-node cluster and 3-nodes cluster.
Could you please help. What are the possible settings to change in order to get horizontal scalability.
Thank you.
Well, you just got horizontal scalability for data storage at no loss of throughput :)
To horizontally scale your throughput, I think your workload was not enough to max-out the server. You need to start multiple clients (OR threads in a single client) against a single server until you do not see throughput increase by adding any new clients. At this point you start a new server. This new server should allow you to add more clients and horizontally scale your throughput.
You may find the ycsb benchmark useful, which allows you to start multiple threads in a client to perform operations.
You should setuo and environment who you see a performance decrease with single node and then make same test with partitioned one.

Azure Service Fabric reliable collections and memory

Let's say I'm running a Service Fabric cluster on 5 D1 class (1 core, 3.5GB RAM, 50GB SSD) VMs. and that I'm running 2 reliable services on this cluster, one stateless and one stateful. Let's assume that the replica target is 3.
How to calculate how much can my reliable collections hold?
Let's say I add one or more stateful services. Since I don't really know how the framework distributes services do I need to take most conservative approach and assume that a node may run all of my stateful services on a single node and that their cumulative memory needs to be below the RAM available on a single machine?
TLDR - Estimating the expected capacity of a cluster is part art, part science. You can likely get a good lower bound which you may be able to push higher, but for the most part deploying things, running them, and collecting data under your workload's conditions is the best way to answer this question.
1) In general, the collections on a given machine are bounded by the amount of available memory or the amount of available disk space on a node, whichever is lower. Today we keep all data in the collections in memory and also persist it to disk. So the maximum amount that your collections across the cluster can hold is generally (Amount of available memory in the cluster) / (Target Replica Set Size).
Note that "Available Memory" is whatever is left over from other code running on the machines, including the OS. In your above example though you're not running across all of the nodes - you'll only be able to get 3 of them. So, (unrealistically) assuming 0 overhead from these other factors, you could expect to be able to put about 3.5 GB of data into that stateful service replica before you ran out of memory on the nodes on which it was running. There would still be 2 nodes in the cluster left empty.
Let's take another example. Let's say that it is about the same as your example above, except in this case you set up the stateful service to be partitioned. Let's say you picked a partition count of 5. So now on each node, you have a primary replica and 2 secondary replicas from other partitions. In this case, each partition would only be able to hold a maximum of around 1.16 GB of state, but now overall you can pack 5.83 GB of state into the cluster (since all nodes can now be utilized fully). Incidentally, just to prove out the math works, that's (3.5 GB of memory per node * 5 nodes in the cluster) [17.5] / (target replica set size of 3) = 5.83.
In all of these examples, we've also assumed that memory consumption for all partitions and all replicas is the same. A lot of the time that turns out to not be true (at least temporarily) - some partitions can end up with more or less work to do and hence have uneven resource consumption. We also assumed that the secondaries were always the same as the primaries. In the case of the amount of state, it's probably fair to assume that these will track fairly evenly, though for other resource consumption it may not (just something to keep in mind). In the case of uneven consumption, this is really where the rest of Service Fabric's Cluster Resource Management will help, since we can come to know about the consumption of different replicas and pack them efficiently into the cluster to make use of the available space. Automatic reporting of consumption of resources related to state in the collections is on our radar and something we want to do, so in the future, this would be automatic but today you'd have to report this consumption on your own.
2) By default, we will balance the services according to the default metrics (more about metrics is here). So by default, the different replicas of those two different services could end up on the machine, but in your example, you'll end up with 4 nodes with 1 replica from a service on it and then 1 node with two replicas from the two different services. This means that each service (each with 1 partition as per your example) would only be able to consume 1.75 GB of memory in each service for a total of 3.5 GB in the cluster. This is again less than the total available memory of the cluster since there are some portions of nodes that you're not utilizing.
Note that this is the maximum possible consumption, and presuming no consumption outside the service itself. Taking this as your maximum is not advisable. You'll want to reduce it for several reasons, but the most practical reason is to ensure that in the presence of upgrades and failures that there's sufficient available capacity in the cluster. As an example, let's say that you have 5 Upgrade Domains and 5 Fault Domains. Now let's say that a fault domain's worth of nodes fails while you have an upgrade going on in an upgrade domain. This means that (a little less than) 40% of your cluster capacity can be gone at any time, and you probably want enough room left over on the remaining nodes to continue. This means that if your cluster previously could hold 5.83 GB of state (from our prior calculations), in reality you probably don't want to put more than about 3.5 GB of state in it since with more of that the service may not be able to get back to 100% healthy (note also that we don't build replacement replicas immediately so the nodes would have to be down for your ReplicaRestartWaitDuration before you ran into this case). There's a bunch more information about metrics, capacity, buffered capacity (which you can use to ensure that room is left on nodes for the failure cases) and fault and upgrade domains are covered in this article.
There are some other things that practically will limit the amount of state you'll be able to store. You'll want to do several things:
Estimate the size of your data. You can make a reasonable estimate up-front of how big your data is by calculating the size of each field your objects hold. Be sure to take into consideration 64-bit references. This will give you a lower-bound starting point.
Storage overhead. Each object you store in a collection will come with some overhead for storing that object. In the reliable collections depending on the collection and the operations currently in flight (copy, enumerations, updates, etc.) this overhead can range from between 100 and around 700 bytes per item (row) stored in the collections. Do know also that we're always looking for ways to reduce the amount of overhead we introduce.
We also strongly recommend running your service over some period of time and measuring actual resource consumption via performance counters. Simulating some sort of real workload and then measuring the actual usage of the metrics you care about will serve you pretty well. The reason we recommend this in particular is that you will be able to see consumption from things like which CLR object heap your objects end up placed in, how often GC is running, if there's leaks, or other things like this which will impact the amount of memory you can actually utilize.
I know that this has been a long answer but I hope you find it helpful and complete.