Avoiding long compute acquisition time for sequential dataflows


I am experiencing very long compute acquisition times when running flows in sequence (7-8 minutes). I have a pipeline with several flows all running the same integration runtime with TTL = 15 minutes. It was my understanding that several flows executed one after the other and running on the same integration runtime would only incur long acquisition times for the first and not for subsequent flows, but I experience very sporadic behavior with subsequent flows sometimes spinning up very fast and other times much slower (3-8 minutes). How can this be avoided?

If you are using sequential data flow activity executions against the same Azure IR in the same data factory using a TTL, then you should see 1-3 minute startup times for each compute environment. If you are experiencing longer latencies, then please create a ticket on the Azure portal. Make sure you are following the techniques described here and here.

Related

Druid Cluster going into Restricted Mode

We have a Druid cluster with the following specs:
3X Coordinators & Overlords - m5.2xlarge
6X Middle Managers (ingest nodes with 5 slots) - m5d.4xlarge
8X Historicals - i3.4xlarge
2X Routers & Brokers - m5.2xlarge
The cluster often goes into Restricted mode:
All calls to the cluster get rejected with a 502 error.
Even with 30 available slots for index-parallel tasks, the cluster only runs 10 at a time and the other tasks go into a waiting state.
Loader task submission time has been increasing monotonically from 1s, 2s, ..., 6s, ..., 10s (we submit a job to load the data in S3). After recycling the cluster, the submission time decreases and then increases again over a period of time.
We submit around 100 jobs per minute, but we need to scale to 300 to catch up with our current incoming load.
Could someone help us with our queries:
How should we tune the specs of the cluster?
Which parameters should be optimized to run the maximum number of tasks in parallel without increasing the load on the master nodes?
Why is the loader task submission time increasing, and which parameters should be monitored here?

At 100 jobs per minute, the overlord is probably being overloaded.
The overlord initiates a job by communicating with the middle managers across the cluster. It defines the tasks that each middle manager will need to complete and it monitors the task progress until completion. The startup of each job has some overhead, so that many jobs would likely keep the overlord busy and prevent it from processing the other jobs you are requesting. This might explain why the time for job submissions increases over time. You could increase the resources on the overlord, but it sounds like there may be a better way to ingest the data.
The recommendation would be to use far fewer jobs and have each job do more work.
If the flow of data is as continuous as you describe, perhaps a Kafka queue would be the best target, followed by a Druid Kafka ingestion job, which is fully scalable.
If you need to do batch, a single index_parallel job that reads many files would likely be much more efficient than many jobs.
Also consider that each task in an ingestion job creates a set of segments. By running a lot of really small jobs, you create a lot of very small segments, which is not ideal for query performance. Here is some info on how to think about segment size optimization, which I think might help.
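As a rough illustration of the "one job, many files" suggestion above, here is a minimal Python sketch that builds a single index_parallel task spec covering several S3 prefixes. The datasource name, the bucket prefixes, the `timestamp` column, the JSON input format, and the DAY segment granularity are all assumptions made for the example, not details from the question; the S3 input source also assumes the S3 extension is loaded on the cluster.

```python
import json


def build_index_parallel_spec(datasource, s3_prefixes, max_subtasks):
    """Build one index_parallel task spec that reads many S3 files at once."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Assumed: rows carry an ISO-8601 "timestamp" column.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                # Assumed: no explicit dimension list; adjust to your schema.
                "dimensionsSpec": {"dimensions": []},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "DAY",  # assumed granularity
                    "queryGranularity": "NONE",
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                # One job picks up every object under these prefixes.
                "inputSource": {"type": "s3", "prefixes": list(s3_prefixes)},
                "inputFormat": {"type": "json"},  # assumed file format
            },
            "tuningConfig": {
                "type": "index_parallel",
                # Upper bound on sub-tasks this one job runs in parallel.
                "maxNumConcurrentSubTasks": max_subtasks,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical bucket and datasource names, for illustration only.
    spec = build_index_parallel_spec(
        "events",
        ["s3://example-bucket/events/2024-01-01/"],
        max_subtasks=10,
    )
    # The JSON below is what would be POSTed to the Overlord task endpoint
    # (/druid/indexer/v1/task); the submission itself is left out here.
    print(json.dumps(spec, indent=2))
```

Submitting one such spec per batch window, rather than one job per file, means the overlord tracks a single supervisor task while the sub-tasks fan out across the middle manager slots.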

How to troubleshoot and reduce communication overhead on Rockwell ControlLogix

Need help. We have a PLC whose CPU keeps getting maxed out. We've already upgraded it once. Now we need to work on optimizing it.
We have over 50 outgoing MSG instructions, 60 incoming, and 103 Ethernet devices (flow meters, drives, etc.). I've gone through and tried to make sure that everything that can be cached is cached, that only instructions that are currently needed are running, and that communications to the same PLC happen in the same scan, but I haven't made a dent.
I'm having trouble identifying which instructions are significant. It seems the connections will be consolidated, so lots of MSGs shouldn't be too big of a problem. We are considering Produced & Consumed tags, but our team isn't very familiar with them and I believe you have to do a download to modify them, which is a problem. Our IO module RPIs are all set to around 200ms (up from 5ms), but that didn't seem to make a difference.
We have a shutdown this weekend and I plan on disabling everything and turning it back on one part at a time to see where the load is really coming from.
Does anyone have any suggestions? The task monitor doesn't have a lot of detail that I can understand, i.e. it's either too summarized or too instantaneous for me to make heads or tails of it. Here are a couple of screens from the Task Monitor to shed some light on what I'm seeing.

The first question that comes to mind: are you using the Continuous Task, or is everything in Periodic tasks?

I had a similar issue many years ago with a CLX. Rockwell suggested increasing the System Overhead Time Slice to around 40 to 50%. The default is 20%.
Some details:
Look at the System Overhead Time Slice (go to Advanced tab under Controller Properties). Default is 20%. This determines the time the controller spends running its background tasks (communications, messaging, ASCII) relative to running your continuous task.
From Rockwell:
For example, at 25%, your continuous task accrues 3 ms of run time. Then the background tasks can accrue up to 1 ms of run time, then the cycle repeats. Note that the allotted time is interrupted, but not reduced, by higher priority tasks (motion, user periodic or event tasks).
Here is a detailed Word Doc from Rockwell:
https://rockwellautomation.custhelp.com/ci/fattach/get/162759/&ved=2ahUKEwiy88qq0IjeAhUO3lQKHf01DYcQFjADegQIAxAB&usg=AOvVaw125pgiSor_bf-BpNSvNVF8
And here is a detailed KB from Rockwell:
https://rockwellautomation.custhelp.com/app/answers/detail/a_id/42964
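To make the arithmetic in the quoted example concrete, here is a small Python sketch of the ratio it implies. It assumes the background tasks get a fixed 1 ms slice per cycle, as in the 25% example, and it only reproduces the proportion; the function name is made up for illustration and the controller's real scheduling may differ from this simple model.

```python
def continuous_ms_per_cycle(time_slice_pct, overhead_ms=1.0):
    """Continuous-task run time per cycle implied by a time slice percentage.

    Assumes (as in the quoted 25% example) that background tasks get
    `overhead_ms` per cycle and the rest of the cycle goes to the
    continuous task.
    """
    if not 0 < time_slice_pct < 100:
        raise ValueError("time_slice_pct must be between 0 and 100 (exclusive)")
    return overhead_ms * (100 - time_slice_pct) / time_slice_pct


if __name__ == "__main__":
    for pct in (20, 25, 40, 50):
        run_ms = continuous_ms_per_cycle(pct)
        print(f"{pct}% slice: ~{run_ms:.2f} ms continuous task per 1 ms of background work")
```

Under this model, moving from the 20% default to 40-50% cuts the continuous task's share of each cycle substantially, which is the trade-off to weigh: communications get serviced more often, while the continuous task's scan time grows.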

Speed The Processing Time Of A Job

I have a sample (100 rows) and three steps in my recipe. When I run the job to load the data into a table in BigQuery, it takes 6 minutes to create the table. That is too long for a simple process like the one I am testing. I am trying to understand if there is a way to speed up the job: change some settings, increase the size of the machine, run the job at a specific time, etc.

If you look in Google Cloud Platform -> Dataflow -> Your Dataprep Job, you will see a workflow diagram containing computation steps and computation times. For complex flows, you can identify there the operations that take longer to know what to improve.
For small jobs there is not much to improve, since setting up the environment takes about 4 minutes. On the right side you can see the "Elapsed time" (real time) and a time graph illustrating how long starting and stopping workers takes.

Optimize JBPM5 engine's performance

We are launching processes in bulk (say 100 instances at a time) in JBPM5, and every task of each process is started and completed asynchronously by external programs. In this scenario, the JBPM engine takes a long time to generate the next task, which hurts overall performance (e.g., it takes an average of 45 minutes to complete 100 process instances). Kindly suggest a way to optimize the performance of the JBPM5 engine.

Something must be wrong or misconfigured, as 45 minutes to complete 100 process instances seems way too much; each request should normally take significantly less than a second. But it's difficult to figure out what might be wrong. Do you have more info on your setup and what is actually taking up a lot of time? What types of external services are you invoking? Do you have a prototype available that we might be able to look at?
Kris

Yes, that sounds like a problem in your domain, not in the engine. Some time ago we did some performance tests for in-memory processes and for DB-persisted processes, and the latency introduced by the engine was less than 2ms per activity (in memory) and 5ms per activity (persisted in the database).
How exactly are you calling the engine, and how are you hosting it? What kind of calls are you making? Do you have a way to measure how long your external services take to answer?
Cheers

Now it's faster. After completing the task with
client.complete()
I signal the server using the command
ksession.getWorkItemManager().completeWorkItem(id, data);
With this, the engine generates the subsequent tasks faster and I am able to retrieve them for processing.
But is this the ideal way of completing tasks?

Celery - Granular tasks vs. message passing overhead

The Celery docs section Performance and Strategies suggests that tasks with multiple 'steps' should be divided into subtasks for more efficient parallelization. It then mentions that (of course) there will be more message passing overhead, so dividing into subtasks may not be worth the overhead.
In my case, I have an overall task of retrieving a small image (150px x 115px) from a third party API, then uploading it via HTTP to my site's REST API. I can either implement this as a single task, or split the steps of retrieving the image and uploading it into two separate tasks. If I go with separate tasks, I assume I will have to pass the image as part of the message to the second task.
My question is, which approach should be better in this case, and how can I measure the performance in order to know for sure?

Since your jobs are I/O-constrained, dividing the task may increase the number of operations that can be done in parallel. The message-passing overhead is likely to be tiny since any capable broker should be able to handle lots of messages/second with only a few ms of latency.
In your case, uploading the image will probably take longer than downloading it. With separate tasks, the download jobs needn't wait for uploads to finish (so long as there are available workers). Another advantage of separation is that you can put each job on a different queue and dedicate more workers as backed-up queues reveal themselves.
If I were to try to benchmark this, I would compare execution times using the same number of workers for each of the two strategies. For instance, 2 workers on the combined task vs. 2 workers on the divided one. Then do 4 workers on each, and so on. My inclination is that the separated task will show itself to be faster, especially when the worker count is increased.
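The comparison described above can be sketched as a small timing harness. This version is a stand-in, not Celery code: it uses a thread pool from the Python standard library in place of Celery workers and `time.sleep` in place of the real HTTP calls, and the delays, function names, and image size are all invented for illustration. With a real deployment you would swap the fake functions for your tasks and time them the same way.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DOWNLOAD_DELAY = 0.01  # assumed stand-in for the third-party API call
UPLOAD_DELAY = 0.02    # assumed stand-in for the upload to your REST API


def fetch_image(item_id):
    """Pretend to download a small image; returns fake image bytes."""
    time.sleep(DOWNLOAD_DELAY)
    return b"x" * 1024


def upload_image(data):
    """Pretend to upload the image; returns the number of bytes sent."""
    time.sleep(UPLOAD_DELAY)
    return len(data)


def run_combined(n_items, workers):
    """Strategy 1: one task does download + upload back to back."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda i: upload_image(fetch_image(i)), range(n_items)))
    return time.perf_counter() - start, results


def run_split(n_items, workers):
    """Strategy 2: download and upload are separate tasks on the same pool."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        downloads = [pool.submit(fetch_image, i) for i in range(n_items)]
        # Each upload is queued as soon as its download has finished.
        uploads = [pool.submit(upload_image, d.result()) for d in downloads]
        results = [u.result() for u in uploads]
    return time.perf_counter() - start, results


if __name__ == "__main__":
    for workers in (2, 4):
        combined_s, _ = run_combined(8, workers)
        split_s, _ = run_split(8, workers)
        print(f"{workers} workers: combined {combined_s:.3f}s, split {split_s:.3f}s")
```

Because both strategies share one pool here, the simulated timings will be close; the harness is only meant to show the shape of the measurement (same item count, same worker count, one variable changed at a time). Differences in a real setup would come from broker latency, message size, and how workers are assigned to queues.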
