Checking maximum time taken by specific processes in BIDS SSIS packages

Is there a way to check the maximum time taken by specific processes in BIDS SSIS packages while debugging, so that we can optimize those processes?
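While debugging in BIDS, the Progress / Execution Results tab already shows start and finish times per task. To measure the slowest tasks across runs, one option is to enable package logging with the "SSIS log provider for SQL Server" and query the log table afterwards. Below is a minimal sketch, assuming the default dbo.sysssislog table; the connection details are placeholders and the query is approximate (it pairs OnPreExecute with OnPostExecute events per task and execution):

    import pyodbc

    # Maximum observed duration per task, computed from SSIS log events.
    QUERY = """
    SELECT  pre.source AS task_name,
            MAX(DATEDIFF(SECOND, pre.starttime, post.endtime)) AS max_seconds
    FROM    dbo.sysssislog AS pre
    JOIN    dbo.sysssislog AS post
            ON  post.executionid = pre.executionid
            AND post.sourceid    = pre.sourceid
            AND post.event       = 'OnPostExecute'
    WHERE   pre.event = 'OnPreExecute'
    GROUP BY pre.source
    ORDER BY max_seconds DESC;
    """

    # Placeholder connection string -- point it at the database holding the log table.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=SSISLogs;Trusted_Connection=yes;"
    )
    for task_name, max_seconds in conn.execute(QUERY):
        print(f"{task_name}: {max_seconds}s")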

Related

Optimization experiment technical error in AnyLogic

I'm trying to minimize the waiting time of people in the queue for a truck. I provided 25 trucks but only 1 is being used, so I set up an optimization experiment with the objective of minimizing the waiting time in the queue and a requirement of 95% truck utilization, so that more than one truck at a time could deliver people. When I run the optimization experiment it gives me this error: "OpenJDK 64-Bit Server VM warning: there is insufficient memory for the Java Runtime Environment to continue", even though I set the maximum available memory to 16343 MB. How can I solve this issue so that the experiment gives me the best number of trucks?
Thanks
An "insufficient memory for the Java Runtime Environment" error simply means that running the model requires more memory than is available.
More often than not, the issue in AnyLogic is that there are too many agents, especially if you run parameter variation or optimization experiments that execute multiple runs in parallel, which increases the memory requirement.
One option is to convert as many agents as possible to Java classes. Start by converting the agents that are not used in flowchart blocks and that don't need any animation.
Check this blog post for an example and more information:
https://www.theanylogicmodeler.com/post/why-use-java-classes-in-anylogic

Avoiding long compute acquisition time for sequential dataflows

I am experiencing very long compute acquisition times (7-8 minutes) when running data flows in sequence. I have a pipeline with several flows, all running on the same integration runtime with TTL = 15 minutes. My understanding was that flows executed one after the other on the same integration runtime would only incur the long acquisition time for the first flow and not for subsequent ones, but I see very sporadic behavior, with subsequent flows sometimes spinning up very fast and other times much more slowly (3-8 minutes). How can this be avoided?
If you are using sequential data flow activity executions against the same Azure IR in the same data factory using a TTL, then you should see 1-3 minute startup times for each compute environment. If you are experiencing longer latencies, then please create a ticket on the Azure portal. Make sure you are following the techniques described here and here.
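For reference, the TTL lives on the Azure IR's data flow properties. Below is a minimal sketch of setting it with the azure-mgmt-datafactory Python SDK; the subscription, resource group, factory and IR names are placeholders, and the exact model names may vary between SDK versions:

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        IntegrationRuntimeResource,
        ManagedIntegrationRuntime,
        IntegrationRuntimeComputeProperties,
        IntegrationRuntimeDataFlowProperties,
    )

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Managed IR whose Spark cluster stays warm for 15 minutes between data flows.
    ir = ManagedIntegrationRuntime(
        compute_properties=IntegrationRuntimeComputeProperties(
            data_flow_properties=IntegrationRuntimeDataFlowProperties(
                compute_type="General",
                core_count=8,
                time_to_live=15,  # minutes
            )
        )
    )

    client.integration_runtimes.create_or_update(
        "<resource-group>", "<factory-name>", "<dataflow-ir>",
        IntegrationRuntimeResource(properties=ir),
    )

All sequential data flow activities then need to reference this same IR (and run in the same factory) to reuse the warm cluster within the TTL window.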

How to evenly distribute DAG runs throughout the day

I have a huge number of DAGs (>>100,000) that should each run once a day.
In order to avoid big spikes in processing at certain times during the day (among other reasons), I would like the actual DAG runs to be distributed evenly throughout the day.
Do I need to do this programmatically myself, by distributing the start_date values throughout the day, or is there a better way where Airflow does that for me?
One possible solution: if you create one or more pools, each with a limited number of slots, you can effectively set a 'maximum parallelism' for execution, and tasks will wait until a slot is available. However, it may not give you quite the flexibility you need.
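A minimal sketch of the pool approach (the pool name, slot count and DAG id are made up; create the pool first, e.g. with: airflow pools set daily_batch 50 "cap on concurrent daily tasks"):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="example_daily_job",      # placeholder DAG id
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        BashOperator(
            task_id="process",
            bash_command="echo processing",
            pool="daily_batch",          # tasks queue here until a slot is free
        )

Note that this caps concurrency rather than spreading the runs: with everything scheduled @daily, tasks still pile up at midnight and drain through the pool. To actually spread the load you would still need to stagger the schedules, for example by hashing each dag_id into a minute-of-day offset.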

Speed up the processing time of a job

I have a sample (100 rows) and three steps in my recipe. When I run the job to load the data into a table in BigQuery, it takes 6 minutes to create the table. That is too long for such a simple process as the one I am testing. I am trying to understand whether there is a way to speed up the job: change some settings, increase the size of the machine, run the job at a specific time, etc.
If you look in Google Cloud Platform -> Dataflow -> your Dataprep job, you will see a workflow diagram containing the computation steps and their computation times. For complex flows, you can identify there the operations that take longer, to know what to improve.
For small jobs there is not much room for improvement, since setting up the environment takes about 4 minutes. On the right side you can see the "Elapsed time" (real time) and a time graph illustrating how long starting and stopping the workers takes.

How to distribute data from MongoDB to processors

Please also help me with adding proper tags.
I have a script that streams documents from Mongo and processes them one by one.
Here is the problem: processing a document takes a while. Running on 1 processor is fine, but with 2+ processors they can end up reprocessing the same data. Processors will be added/removed dynamically depending on how busy our CI is.
How can I distribute the docs between the processors?
My ideas:
Distribution based on last digits of doc._id.
Whenever a processor connects, it writes a log entry to the db. From that log, the other processors would calculate the ranges they should process. The problem is that the script works in batches, say 500 docs per call.
Thanks for any ideas.
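A minimal sketch of the first idea (bucketing on the trailing digits of doc._id); the connection string, database/collection names and worker count are placeholders:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # placeholder connection
    coll = client["mydb"]["docs"]                      # placeholder collection

    NUM_WORKERS = 4    # processors currently running
    WORKER_INDEX = 0   # this processor's slot, 0 .. NUM_WORKERS - 1

    def is_mine(doc_id) -> bool:
        # An ObjectId ends with an incrementing counter, so its last hex digits
        # are spread fairly evenly; bucket them with a modulo.
        return int(str(doc_id)[-4:], 16) % NUM_WORKERS == WORKER_INDEX

    def process(doc):
        # Stand-in for the real per-document work.
        print("processing", doc["_id"])

    for doc in coll.find({}, batch_size=500):
        if is_mine(doc["_id"]):
            process(doc)

The catch is that this filter runs client-side, so every processor still streams every document, and if the number of processors changes mid-run the buckets shift, so some documents can be processed twice or missed. The dynamic add/remove case would still need the kind of coordination described in the second idea.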