How to increase the number of parallel workers in pgadmin4 - postgresql

My goal is to increase the number of parallel workers used in the Docker image dpage/pgadmin4. I can configure parallel workers for each table, and I can increase the number of threads, but I can't manage to increase the number of workers when the container starts, as you can see in the image below. In an environment with many tables, changing the configuration for each table can be painful. Thanks in advance.
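For reference, a minimal sketch of the two levels at which this can be set, assuming the parameters ultimately live on the PostgreSQL server that pgAdmin connects to and that it is reachable over JDBC (the connection URL, credentials, and table name below are placeholders, and the PostgreSQL JDBC driver must be on the classpath):

```scala
import java.sql.DriverManager

object ParallelWorkersConfig {
  def main(args: Array[String]): Unit = {
    // Placeholder connection details; adjust to your own container/network setup.
    val url  = "jdbc:postgresql://localhost:5432/mydb"
    val conn = DriverManager.getConnection(url, "postgres", "secret")
    val stmt = conn.createStatement()

    // Per-table setting (what the question describes doing table by table).
    stmt.execute("ALTER TABLE my_table SET (parallel_workers = 4)")

    // Server-wide alternative: applies to all tables and is written to
    // postgresql.auto.conf in the data directory, so it survives restarts.
    stmt.execute("ALTER SYSTEM SET max_parallel_workers_per_gather = 4")
    stmt.execute("ALTER SYSTEM SET max_parallel_workers = 8")
    stmt.execute("SELECT pg_reload_conf()") // max_worker_processes would need a full server restart

    conn.close()
  }
}
```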

Related

How to estimate RAM and CPU per Kubernetes Pod for a Spring Batch processing job?

I'm trying to estimate hardware resources for a Kubernetes cluster so it can handle the following scenarios:
On a daily basis I need to read 46.3 million XML messages of roughly 10 KB each from a queue and then insert them into a Spark instance and a Sybase DB instance. I need to come up with an estimate of how many pods will be required to process this volume of data, and how much RAM and how many vCPUs each pod will need, in order to determine the characteristics of the cluster nodes. The reason behind all this is that we have some budget restrictions, and we need an idea of the sizing before starting the corresponding development.
The second scenario is the same as the one already described but 18.65 times bigger, i.e. 833.33 million XML messages per day. This is expected to be the case within a couple of years.
So far we plan to use Spring Batch with partitioned steps. I need guidance on how to determine the ideal Spring Batch configuration, the required RAM and CPU per pod, and the number of pods.
I will greatly appreciate any comments from your side.
Thanks in advance.
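As a starting point, here is a back-of-envelope calculation of the sustained rates implied by those volumes. It is pure arithmetic on the numbers above and says nothing yet about per-message parsing cost or the Spark/Sybase write paths; measuring how many messages per second a single pod can actually handle is what turns these rates into a pod count.

```scala
object ThroughputEstimate {
  def main(args: Array[String]): Unit = {
    val secondsPerDay = 24L * 60 * 60   // 86,400 s
    val messageSizeKB = 10.0            // approximate size of one XML message

    def report(label: String, messagesPerDay: Double): Unit = {
      val perSecond = messagesPerDay / secondsPerDay
      val mbPerSec  = perSecond * messageSizeKB / 1024
      println(f"$label: about $perSecond%.0f msg/s, roughly $mbPerSec%.1f MB/s sustained")
    }

    report("Scenario 1 (46.3M msgs/day)", 46.3e6)     // ~536 msg/s, ~5.2 MB/s
    report("Scenario 2 (833.33M msgs/day)", 833.33e6) // ~9,645 msg/s, ~94 MB/s
  }
}
```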

Do partitions increase performance when only using one computer/node?

I know that partitions boost performance by running tasks in parallel on different nodes in a cluster. But will partitions help me get better performance when I am only using one single computer? I am using Spark and Scala.
Yes, it will increase performance.
Make sure your CPU has more than one core.
When you create your local SparkSession, make sure to use multiple cores: local runs locally with one thread, while local[N] runs locally with N threads. I suggest you use local[*], which uses one thread per available core.
Also make sure your RDD/Dataset has multiple partitions; a good number of partitions is 2 to 4 times the number of cores.
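For illustration, a minimal sketch of that advice (the app name and example dataset are made up here):

```scala
import org.apache.spark.sql.SparkSession

object LocalParallelism {
  def main(args: Array[String]): Unit = {
    // local[*] starts one worker thread per available logical core.
    val spark = SparkSession.builder()
      .appName("local-parallelism-demo") // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    val cores = Runtime.getRuntime.availableProcessors()

    // Aim for roughly 2-4 partitions per core, as suggested above.
    val data = spark.sparkContext.parallelize(1 to 1000000, numSlices = cores * 3)
    println(s"cores = $cores, partitions = ${data.getNumPartitions}")

    // The action below is split into one task per partition, so the
    // partitions are processed concurrently on the local threads.
    println(s"sum = ${data.map(_.toLong).reduce(_ + _)}")

    spark.stop()
  }
}
```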
Apache Spark scales both vertically (CPU, RAM, ...) and horizontally (nodes). Assuming your computer/node has a CPU with more than one core, the partitions are then processed in parallel.

Kubernetes orchestration depending upon the number of rows/records/input files

The requirement is to orchestrate ETL containers depending on the number of records present in the source system (SQL / Google Analytics / SaaS / CSV files).
As a use case: an ETL job has to process 50K records present in SQL Server, but it takes a good amount of processing time for a single server/node to execute this job, because that server has to connect to SQL Server, fetch the data, and process the records.
The problem now is how to orchestrate this ETL job in Kubernetes so that it scales the containers up/down depending on the number of records/input. In the case discussed above, if there are 50K records to process in parallel, it should scale up the containers, process the records, and then scale back down.
You would generally use a queue of some kind and a Horizontal Pod Autoscaler (HPA) that watches the queue size and adjusts the number of queue-consumer replicas automatically. The specifics depend on the exact tools you use.

Large number of connections to kdb

I have a grid with over 10,000 workers, and I'm using qPython to append data to kdb+. Currently, with 1,000 workers, around 40 of them fail to connect and send data on the first try, and top shows q at 100% CPU when that happens. As I scale to 10k workers, the problem will only get worse. The volume of data is only on the order of 100 MB. I've tried running extra slave threads, but kdb+ tells me I can't use them with the -P option, which I'm guessing I need for qPython. Any ideas on how to scale to support 10k workers? My current idea is to write a server in between that buffers write requests and passes them on to kdb+; is there a better solution?
It amazes me that you're willing to dedicate 10,000 CPUs to Python but only a single one to kdb+.
Simply run more kdb+ processes (on other ports) and then have another process receive the updates from those ingestion processes. The tickerplant (u.q) is a good model for this.

Running a single job across multiple workers in Apache Spark

I am trying to understand how Spark splits a single job (a Scala file built using sbt package, with the jar run via the spark-submit command) across multiple workers.
For example: I have two workers (512 MB of memory each). I submit a job and it gets allocated to only one worker (if the driver memory is less than the worker memory). If the driver memory is more than the worker memory, the job doesn't get allocated to any worker (even though the combined memory of both workers is higher than the driver memory) and stays in the SUBMITTED state. The job then goes to the RUNNING state only when a worker with the required memory becomes available in the cluster.
I want to know whether one job can be split across multiple workers and run in parallel. If so, can anyone walk me through the specific steps involved?
Note: the Scala program requires a lot of JVM memory, since I will be using a large array buffer, hence the attempt to split the job across multiple workers.
Thanks in advance!!
Please check whether the array you are using is parallelized. Then, when you run an action on it, the work should be done in parallel across the nodes.
Check out this page for reference: http://spark.apache.org/docs/0.9.1/scala-programming-guide.html
Make sure your RDD has more than one partition (rdd.partitions.size). Make sure you have more than one executor connected to the driver (http://localhost:4040/executors/).
If both of these are fulfilled, your job should run on multiple executors in parallel. If not, please include code and logs in your question.
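As a rough sketch of both suggestions combined (the app name and example data are placeholders, and the master URL is supplied at submit time): distribute the data as an RDD with several partitions instead of holding it all in a driver-side array buffer, and let each connected executor process its share.

```scala
import org.apache.spark.sql.SparkSession

object SplitAcrossWorkers {
  def main(args: Array[String]): Unit = {
    // Submitted with, e.g., spark-submit --master spark://<master-host>:7077,
    // the job can use every executor that registers with the driver.
    val spark = SparkSession.builder()
      .appName("split-across-workers-demo") // hypothetical app name
      .getOrCreate()
    val sc = spark.sparkContext

    // Instead of building one large ArrayBuffer on the driver, parallelize the
    // data into an RDD with multiple partitions so the executors share the work
    // (and the memory).
    val records = sc.parallelize(1 to 10000000, numSlices = 8)
    println(s"partitions = ${records.partitions.length}")

    // An action such as this sum runs as one task per partition, scheduled
    // across whichever executors are connected to the driver.
    val total = records.map(_.toLong).sum()
    println(s"total = $total")

    spark.stop()
  }
}
```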