Speed Up the Processing Time of a Job - google-cloud-dataprep

I have a sample (100 rows) and three steps in my recipe. When I run the job to load the data into a BigQuery table, it takes 6 minutes to create the table. That is too long for a process as simple as the one I am testing. I am trying to understand whether there is a way to speed up the job: change some settings, increase the machine size, run the job at a specific time, etc.

If you look in Google Cloud Platform -> Dataflow -> your Dataprep job, you will see a workflow diagram with the computation steps and their computation times. For complex flows, this lets you identify which operations take the longest and where to improve.
For small jobs there is not much to gain, since setting up the environment takes about 4 minutes on its own. On the right side you can see the "Elapsed time" (real time) and a time graph illustrating how long starting and stopping the workers takes.
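If you want to pull the same timing information programmatically rather than from the console, the sketch below reads a job's timestamps through the Dataflow REST API. It is only an illustration: the project, region, and job ID are placeholders, and it assumes the google-api-python-client package with Application Default Credentials set up.

# Sketch: inspect a Dataflow job's timing via the Dataflow REST API (v1b3).
# PROJECT, REGION and JOB_ID are placeholders for your own values.
from googleapiclient.discovery import build

PROJECT = "my-project"
REGION = "us-central1"
JOB_ID = "my-dataprep-job-id"

dataflow = build("dataflow", "v1b3")
job = (
    dataflow.projects()
    .locations()
    .jobs()
    .get(projectId=PROJECT, location=REGION, jobId=JOB_ID)
    .execute()
)
# createTime vs. startTime shows how long the job sat before workers started;
# currentStateTime marks the last state change (e.g. when the job finished).
print(job.get("createTime"), job.get("startTime"))
print(job.get("currentState"), job.get("currentStateTime"))

Comparing these timestamps with the recipe's own compute time makes it clearer how much of the 6 minutes is fixed startup cost rather than actual processing.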

Related

Avoiding long compute acquisition time for sequential dataflows

I am experiencing very long compute acquisition times (7-8 minutes) when running flows in sequence. I have a pipeline with several flows, all running on the same integration runtime with TTL = 15 minutes. My understanding was that several flows executed one after the other on the same integration runtime would only incur the long acquisition time for the first flow and not for subsequent ones, but I see very erratic behavior, with subsequent flows sometimes spinning up very fast and other times much more slowly (3-8 minutes). How can this be avoided?
If you are using sequential data flow activity executions against the same Azure IR in the same data factory with a TTL, then you should see 1-3 minute startup times for each compute environment. If you are experiencing longer latencies, please create a ticket on the Azure portal. Make sure you are following the techniques described here and here.

How to troubleshoot and reduce communication overhead on Rockwell ControlLogix

Need help. We have a PLC whose CPU keeps getting maxed out. We've already upgraded it once; now we need to work on optimizing it.
We have over 50 outgoing MSG instructions, 60 incoming, and 103 Ethernet devices (flow meters, drives, etc.). I've gone through and tried to make sure everything that can be cached is cached, that only instructions that are currently needed are running, and that communications to the same PLC happen in the same scan, but I haven't made a dent.
I'm having trouble identifying which instructions matter most. It seems the connections get consolidated, so lots of MSGs shouldn't be too big a problem. I'm considering Produced & Consumed tags, but our team isn't very familiar with them, and I believe you have to do a download to modify them, which is a problem. Our I/O module RPIs are all set to around 200 ms (up from 5 ms), but that didn't seem to make a difference.
We have a shutdown this weekend and I plan on disabling everything and turning it back on one part at a time to see where the load is really coming from.
Does anyone have any suggestions? The task monitor doesn't have a lot of detail that I can understand; it's either too summarized or too instantaneous for me to make heads or tails of it. Here are a couple of screenshots from the Task Monitor to shed some light on what I'm seeing.
First question that comes to mind: are you using the continuous task, or is everything in periodic tasks?
I had a similar issue many years ago with a CLX. Rockwell suggested increasing the System Overhead Time Slice to around 40 to 50%. The default is 20%.
Some details:
Look at the System Overhead Time Slice (go to the Advanced tab under Controller Properties). The default is 20%. This determines how much time the controller spends running its background tasks (communications, messaging, ASCII) relative to running your continuous task.
From Rockwell:
For example, at 25%, your continuous task accrues 3 ms of run time. Then the background tasks can accrue up to 1 ms of run time, then the cycle repeats. Note that the allotted time is interrupted, but not reduced, by higher priority tasks (motion, user periodic or event tasks).
Here is a detailed Word Doc from Rockwell:
https://rockwellautomation.custhelp.com/ci/fattach/get/162759
And here is a detailed KB from Rockwell:
https://rockwellautomation.custhelp.com/app/answers/detail/a_id/42964

How to evenly distribute DAG runs throughout the day

I have a huge number of DAGs (>>100,000) that should each run once a day.
In order to avoid big spikes in processing at certain times of day (among other reasons), I would like the actual DAG runs to be distributed evenly throughout the day.
Do I need to do this programmatically myself, by distributing the start_date values throughout the day, or is there a better way where Airflow does it for me?
One possible solution: if you create one or more pools, each with a limited number of slots, you can effectively set a 'maximum parallelism' for execution, and tasks will wait until a slot is available. However, it may not give you quite the flexibility you need.
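As a rough sketch of the alternative the question mentions (spreading the schedule itself rather than, or in addition to, throttling with pools), you can derive a stable time-of-day offset from each DAG ID so the daily runs land at different minutes. The helper name, pool name, and example DAG ID below are made up for illustration.

# Sketch: stagger once-a-day DAGs by hashing the DAG ID into a minute-of-day
# slot, and optionally throttle concurrency with a pool (create the pool
# first, e.g. `airflow pools set daily_batch 128 "throttle daily jobs"`).
import hashlib
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def staggered_cron(dag_id: str) -> str:
    # Map the DAG ID to a stable offset in [0, 1440) minutes and build a
    # once-a-day cron expression at that time.
    offset = int(hashlib.md5(dag_id.encode()).hexdigest(), 16) % (24 * 60)
    hour, minute = divmod(offset, 60)
    return f"{minute} {hour} * * *"

def do_work():
    pass  # placeholder for the real workload

dag_id = "example_dag_000123"
with DAG(
    dag_id=dag_id,
    start_date=datetime(2023, 1, 1),
    schedule_interval=staggered_cron(dag_id),
    catchup=False,
) as dag:
    PythonOperator(
        task_id="work",
        python_callable=do_work,
        pool="daily_batch",  # pool-based cap on parallelism, as suggested above
    )

With this many DAGs, the hash spreads the runs roughly uniformly across the day, while the pool still caps how many can execute at once.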

Optimize JBPM5 engine's performance

We are launching processes in bulk (say 100 instances at a time) in jBPM5, and every task of each process is started and completed asynchronously by external programs. In this scenario, the jBPM engine takes a long time to generate the next task, and the overall performance suffers (e.g., it takes an average of 45 minutes to complete 100 process instances). Kindly suggest a way to optimize the performance of the jBPM5 engine.
Something must be wrong or misconfigured, as 45 minutes to complete 100 process instances seems far too long; each request should in general take significantly less than a second in normal cases. But it's difficult to figure out what might be wrong. Do you have more info on your setup and on what is actually taking up the time? What types of external services are you invoking? Do you have a prototype available that we could look at?
Kris
Yes, that sounds like a problem in your domain and not in the engine. Some time ago we ran performance tests for in-memory processes and for DB-persisted processes, and the latency introduced by the engine was less than 2 ms per activity (in memory) and 5 ms per activity (persisted in the database).
How exactly are you calling the engine, and how are you hosting it? What kinds of calls are you making? Do you have a way to measure how long your external services take to answer?
Cheers
Now it's faster. After completing the task via
client.complete()
I'm notifying/signalling the server using the command
ksession.getWorkItemManager().completeWorkItem(id, data);
With this, the engine generates the subsequent tasks faster and I am able to retrieve them for my processing.
But is this the ideal way of completing tasks?

Celery - Granular tasks vs. message passing overhead

The Celery docs section Performance and Strategies suggests that tasks with multiple 'steps' should be divided into subtasks for more efficient parallelization. It then mentions that (of course) there will be more message-passing overhead, so dividing into subtasks may not be worth it.
In my case, I have an overall task of retrieving a small image (150px x 115px) from a third-party API and then uploading it via HTTP to my site's REST API. I can either implement this as a single task or divide the steps of retrieving the image and uploading it into two separate tasks. If I go with separate tasks, I assume I will have to pass the image as part of the message to the second task.
My question is: which approach is better in this case, and how can I measure the performance in order to know for sure?
Since your jobs are I/O-constrained, dividing the task may increase the number of operations that can be done in parallel. The message-passing overhead is likely to be tiny since any capable broker should be able to handle lots of messages/second with only a few ms of latency.
In your case, uploading the image will probably take longer than downloading it. With separate tasks, the download jobs needn't wait for uploads to finish (so long as there are available workers). Another advantage of separation is that you can put each job type on a different queue and dedicate more workers to whichever queues back up.
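As a concrete sketch of the split approach (the app name, queue names, endpoint, and URL below are placeholders, not from the question), the two steps can be separate tasks routed to their own queues and linked with a chain, so the upload starts as soon as its download finishes:

# Sketch: download and upload as separate Celery tasks on separate queues.
# The default JSON serializer can't carry raw bytes, so the image is
# base64-encoded when passed between the two tasks.
import base64
import requests
from celery import Celery, chain

app = Celery("images", broker="redis://localhost:6379/0")

@app.task
def fetch_image(url):
    # Pull the small thumbnail from the third-party API.
    return base64.b64encode(requests.get(url, timeout=10).content).decode("ascii")

@app.task
def upload_image(image_b64):
    # Push the image to the site's REST API over HTTP.
    requests.post("https://example.com/api/images",
                  data=base64.b64decode(image_b64), timeout=10)

# Route each step to its own queue so workers can be dedicated per step.
chain(
    fetch_image.s("https://thirdparty.example.com/thumb.jpg").set(queue="downloads"),
    upload_image.s().set(queue="uploads"),
)()

Workers can then be started per queue (for example, celery -A images worker -Q downloads and celery -A images worker -Q uploads) and scaled independently as one queue backs up.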
If I were to benchmark this, I would compare execution times using the same number of workers for each of the two strategies: for instance, 2 workers on the combined task vs. 2 workers on the divided one, then 4 workers on each, and so on. My inclination is that the separated tasks will turn out to be faster, especially as the worker count increases.
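A minimal way to run that comparison, assuming a result backend is configured and that a combined_task variant plus the two split tasks sketched above exist in your code, is to push the same batch of URLs through each strategy and time the wall clock:

# Sketch of a wall-clock comparison; combined_task is your single-task
# variant and fetch_image/upload_image are the split tasks from above.
import time
from celery import chain, group

urls = [f"https://thirdparty.example.com/thumb-{i}.jpg" for i in range(200)]

def timed(async_result):
    start = time.monotonic()
    async_result.get(timeout=600)  # wait for every task in the batch to finish
    return time.monotonic() - start

combined = group(combined_task.s(u) for u in urls)()
print("combined:", timed(combined))

split = group(chain(fetch_image.s(u), upload_image.s()) for u in urls)()
print("separate:", timed(split))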