Why is the speed of a variable-length pipeline determined by the slowest stage, and what is the total execution time of a program? - cpu-architecture

I am new to pipelining and I need some help with the statement that
the speed of a pipeline is determined by the speed of its slowest stage.
Not only this: if I am given a 5-stage pipeline whose stages take 5 ns, 10 ns, 8 ns, 7 ns and 7 ns respectively, it is said that each instruction takes 10 ns.
Can I get a clear explanation for this?
Also, let my program have 3 instructions I1, I2, I3, and let 1 clock cycle = 1 ns,
such that the above stages take 5, 10, 8, 7, 7 clock cycles respectively.
Now according to theory a snapshot of the pipeline would be -
But that gives me a total time of (number of clock cycles) * (clock cycle duration) = 62 * 1 = 62 ns,
whereas according to theory the total time should be (slowest stage) * (number of instructions) = 10 * 3 = 30 ns.
I do have an idea why the slowest stage is important (each pipeline stage needs to wait for it, hence one instruction is produced every 10 clock cycles), but the result is inconsistent when I calculate it using clock cycles. Why this inconsistency? What am I missing?

Assume a car manufacturing process that uses a two-stage pipeline. Say it takes 1 day to manufacture the engine and 2 days to manufacture the rest, and you can do both stages in parallel. What is your car output rate? It should be one car per 2 days: although you manufacture the engine in 1 day, you have to wait another day for the rest to be finished.
In your case, although the other stages finish their job in less time, you have to wait 10 ns for the slowest stage before the pipeline can advance to the next step.

Staging allows the "parts" of different operations to be executed at once.
I'll create a smaller example here, dropping the last 2 stages of your example: 5, 10, 8 ns.
Let's take two operations:

op1: 5 | 10 | 8
op2:     5 | 10 | 8

The first operation starts in stage 1.
Once it moves to stage 2, the second operation can start its first stage.
However, since the stages take different amounts of time, the longest one determines the runtime:
the third stage can only start after the 2nd has completed, i.e. after 15 ns,
and this is also true for the 2nd stage of the 2nd operation.
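To make the arithmetic concrete, here is a small sketch (my own illustration in Java, not part of the original answer), assuming each stage of an operation can start only once both the previous stage of that operation and the same stage of the preceding operation have finished:

public class PipelineTiming {
    public static void main(String[] args) {
        double[] stageNs = {5, 10, 8};   // stage durations from the reduced example
        int ops = 2;                     // two operations, as above
        double[][] finish = new double[ops][stageNs.length];
        for (int i = 0; i < ops; i++) {
            for (int s = 0; s < stageNs.length; s++) {
                // a stage can start once the previous stage of this operation
                // and the same stage of the previous operation have both finished
                double prevStage = (s > 0) ? finish[i][s - 1] : 0;
                double prevOp = (i > 0) ? finish[i - 1][s] : 0;
                finish[i][s] = Math.max(prevStage, prevOp) + stageNs[s];
                System.out.printf("op%d stage%d finishes at %.0f ns%n", i + 1, s + 1, finish[i][s]);
            }
        }
    }
}

With the 5, 10, 8 ns stages and two operations this prints finish times of 5, 15, 23 ns for the first operation and 10, 25, 33 ns for the second, matching the description above: the third stage of the first operation and the second stage of the second operation both start at 15 ns.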

I am not sure about the source of your confusion. If one unit of your pipeline is taking longer, the units behind it cannot push the pipeline ahead until that unit is finished, even though they themselves have finished their work. Like DPG said, try to look at it from the car manufacturing line example; it is one of the most common ones given to explain a pipeline. Even if the units AHEAD of the slowest unit finish quicker, it still doesn't matter, because they have to wait until the slower unit finishes its work. So yes, your pipeline is executing 3 instructions for a total execution time of 30 ns.

Thank you all for your answers.
I think I have it clear by now.
This is what I think the answer is:
Question 1: Why is pipeline execution dependent on the slowest stage?
Clearly, from the diagram, each stage has to wait for the slowest stage to complete,
so the time after which each instruction completes is bounded by that wait
(in my example, one instruction completes every 10 ns).
Question 2: What is the total execution time of the program?
I wanted to know how long the particular program containing 3 instructions will take to execute,
NOT how long it takes for 3 instructions to complete once the pipeline is already full - which is obviously 30 ns, given
that one instruction completes every 10 ns.
Now suppose I1 is fetched into the pipeline while 4 earlier instructions are already executing in the other stages.
Those 4 instructions complete in 40 ns.
After that, I1, I2, I3 complete in order in the next 30 ns (assuming no pipeline stalls).
This gives a total of 40 + 30 = 70 ns.
In fact, for an n-instruction program on a k-stage pipeline, I think the total time is (n + k - 1) * C * T,
where C = number of clock cycles in the slowest stage
and T = clock cycle time.
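As a quick sanity check of the formula (my own sketch in Java, not part of the original question or answers):

public class PipelineFormula {
    public static void main(String[] args) {
        int n = 3;        // number of instructions
        int k = 5;        // number of pipeline stages
        int c = 10;       // clock cycles of the slowest stage
        double t = 1.0;   // clock cycle time in ns
        double totalNs = (n + k - 1) * c * t;   // total pipeline time
        System.out.println(totalNs + " ns");    // prints "70.0 ns"
    }
}

which agrees with the 70 ns worked out above.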
Please review my understanding and let me know if I am thinking anything wrong, so that I can accept my own answer!

Related

Remove stop times from ResourcePool for OEE calculations [duplicate]

I am working on a production model where the input of raw material arrives on an hourly basis, and I am running the model for 8 hours (1 shift), so for the remaining 16 hours the resources are idle. When I was not using the schedule and was running the model for 8 * 7 hours (56 hours), the time measurement for each job was fine, but now that I schedule the resources the measurement includes the idle time as well. How can I calculate only the busy time, to see the average time spent by a job in the workshop (from raw material to finished good)?
This is the time spent by a job in the process; it should be 34 - 16 = approx. 18.
Firstly, one note: although you state that you run 8-hour shifts, the 8 AM to 6 PM time period is actually 10 hours, so I will ignore that in this solution and instead assume that the shifts are actually 8 hours and run from 9:00 to 17:00.
Here is a simple model that was used to test (the model time unit is SECONDS):
There are 4 elements to make this work:
The Service block must be configured to allow pre-emption of tasks with recovery; this is done using the Priorities / preemption options as below:
The ResourcePool must be configured for 'End of Shift' preemption as shown below:
The calculation of true time (excluding dead time between shifts) is done in the f_calcTATsec function:
// get 'Service' enter time for that agent
double startTime = col_startTimesSec.get(_agent);
// calculate time spent
double timeSpent = time() - startTime;
traceln("%.2f: agent spent %.2f in service", time(), timeSpent);
traceln("%.2f: 8hrs is %.2f, 16hrs is %.2f",
time(), (8 * hour()), (16 * hour()));
// below is a ternary statement which says:
// if 'timeSpent' is less than 8 hrs then use it
// otherwise
// exclude whole 16 hr periods (can be more than 1)
// and use the remainder
double trueTimeSpent = timeSpent <= (8 * hour()) ?
timeSpent :
timeSpent % (16 * hour());
// return time spent
traceln("%.2f: returning %.2f", time(), trueTimeSpent);
return trueTimeSpent;
The Service object needs to be configured to record the entry time for each agent in the col_startTimesSec collection and then call the f_calcTATsec() function on exit, i.e. On enter = col_startTimesSec.put(agent, time()); and On exit = double trueTimeSpentSec = f_calcTATsec(agent);
When I was not using the schedule and was running the model for 8 * 7 hours (56 hours), the time measurement for each job was fine, but now that I schedule the resources the measurement includes the idle time as well
So I assume you were using TimeMeasureStart/End blocks to do the time-in-system measurement. They just calculate the elapsed time from the start block to the end block, and so can never account for 'time that shouldn't count'. You don't have to use these blocks to calculate timings; typically you store relevant start times in the (custom) agent type flowing through the process and then calculate the relevant elapsed times as needed (e.g., at on-exit of a Service block, pseudo-code is "current time - time-on-entry = elapsed time in block").
Firstly though, you need to be clearer about what metric you're calculating and why. You want to exclude time where an in-progress (presumably pre-empted) job waits for the resource to return on-shift. But what about jobs which are queueing for a resource which then goes off-shift? What about if there are multiple possible resources that can be used with different shift patterns? What about the more general waiting of jobs for resources (when resources are on-shift)?
It sounds like what you may really want is both
The elapsed time spent working on a job (cf. waiting for anything).
The elapsed time jobs spend waiting for something (typically resources in a Seize/Service block — not just when tasks are pre-empted by shift-end — but could be other wait mechanisms, such as using Wait blocks).
The latter is just the total elapsed time minus the former.
So there are multiple ways to tackle this. Probably the easiest is to retain your overall elapsed time (via TimeMeasureStart/End blocks) and then calculate the working time separately: store it as a variable in the job agent and add to it in each block where it has 'work done to it' (e.g., for Service blocks without pre-emption use duration from on-seize to on-exit, for Delay blocks use duration from on-enter to on-exit).
To handle where a shift-end-pre-empted task waits for a resource to return on-shift you can use the Service block's "On task suspended" and "On task resumed" actions which trigger when a task is suspended (due to pre-emption) or resumed (when the original resource becomes available if that's the preemption option you chose).
This requires an extra variable to store the "current duration start time".
To be explicit:
A Variable of type double named cumulativeWorkingTimeMins in your Job agent type
A Variable of type double named currentWorkStartTimeMins in your Job agent type
...and for a Service block (handling the pre-emption case)
'On seize unit' (or 'On enter delay') action: agent.currentWorkStartTimeMins = time(MINUTE);
'On task suspended' action: agent.cumulativeWorkingTimeMins += (time(MINUTE) - agent.currentWorkStartTimeMins);
'On task resumed' action: agent.currentWorkStartTimeMins = time(MINUTE);
'On exit' action: agent.cumulativeWorkingTimeMins += (time(MINUTE) - agent.currentWorkStartTimeMins);
[Note units specified in variable names to be clear, and explicit specification of units when getting the current time; this ensures the code is robust to changing your model time unit.]
NB: If you really wanted to just subtract the time pre-empted jobs are waiting for off-shift resources to return (and no other waiting time) — which doesn't seem to make sense as a metric — you can still do that using a variant of the above which just captures that waiting time.
(You'll also need to store the relevant final numbers in some HistogramData element or similar when the job finishes to be able to then show this data in charts: the TimeMeasureEnd blocks automatically capture this histogram data in their distribution variable but, when calculating timings yourself, you need to store data yourself for charts.)
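For example (my own sketch, assuming a hypothetical HistogramData element named histWorkingTimeMins), in the 'On enter' action of the block where finished jobs leave the process you could record the value yourself:

histWorkingTimeMins.add(agent.cumulativeWorkingTimeMins);  // record the working time of the finished job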

Time Distribution and time spent in process in anylogic


Increasing concurrency in Azure Data Factory

We have a parent pipeline that gets a list of tables and feeds it into a ForEach. Within the ForEach we then call another pipeline passing in some config, this child pipeline moves the data for the table it is passed as config.
When we run this at scale I often see 20 or so instances of the child pipeline created in the monitor. All but 4 will be "Queued", and the other 4 will be executing as "In progress". I can't seem to find any setting for this limit of 4. We have several hundred pipelines to execute and I really could do with it doing more than 4 at a time. I have set concurrency to 20 throughout the pipelines and tasks, hence we get 20 instances fired up. But I can't figure out what I need to twiddle to get more than 4 executing at the same time.
The ForEach looks like this
activities in ForEach loop look like this
many thanks
I think I have found it. On the child Pipeline (the one that is being executed inside the ForEach loop) on the General Tab is a concurrency setting. I had this set to 4. When I increased this to 8 I got 8 executing, and when I increased it to 20 I got 20 executing.
It seems a maximum of 20 loop iterations can be executed in parallel at once.
The documentation is, however, a bit unclear.
The batchCount setting that controls this has a maximum value of 50 and a default of 20, but the documentation for isSequential states that the maximum is 20.
Under Limitations and workarounds, the documentation states:
"The ForEach activity has a maximum batchCount of 50 for parallel processing, and a maximum of 100,000 items."
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity

Is this an intelligent use case for optaPlanner?

I'm trying to clean up an enterprise BI system that currently is using a prioritized FIFO scheduling algorithm (so a priority 4 report from Tuesday will be executed before priority 4 reports from Thursday and priority 3 reports from Monday.) Additional details:
The queue is never empty, jobs are always being added
Jobs range in execution time from under a minute to upwards of 24 hours
There are 40 some odd identical app servers used to execute jobs
I think I could get optaPlanner up and running for this scenario, with hard rules around priority and some soft rules around average time in the queue. I'm new to scheduling optimization so I guess my question is what should I be looking for in this situation to decide if optaPlanner is going to help me or not?
The problem looks like a form of bin packing (and possibly job shop scheduling), which are NP-complete, so OptaPlanner will do better than a FIFO algorithm.
But is it really NP-complete? If all of these conditions are met, it might not be:
All 40 servers are identical. So running a priority report on server A instead of server B won't deliver a report faster.
All 40 servers are identical. So total duration (for a specific input set) is a constant.
Total makespan doesn't matter. So given 20 small jobs of 1 hour, 1 big job of 20 hours and 2 machines, it's fine that all the small jobs are done after 10 hours before the big job even starts, giving a total makespan of 30 hours. There's no desire to reduce the makespan to 20 hours.
"The average time in the queue" is debatable: do you care about how long the jobs are in the queue until they are started, or until they are finished? If the total duration is a constant, that can be optimized by merely FIFO'ing the small jobs first or last (while still respecting priority, of course).
There are no dependencies between jobs.
If all these conditions are met, OptaPlanner won't be able to do better than a correctly written greedy algorithm (which schedules the highest priority job that is the smallest/largest first). If any of these conditions aren't met (for example you buy 10 new servers which are faster), then OptaPlanner can do better. You just have to evaluate if it's worth spending 1 thread to figure that out.
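To illustrate what such a greedy algorithm could look like (a minimal sketch of my own, with hypothetical job names; not anything OptaPlanner provides): sort the queue by priority and then by duration, and always dispatch to the server that frees up earliest.

import java.util.*;

class Job {
    final String name;
    final int priority;
    final long durationMinutes;
    Job(String name, int priority, long durationMinutes) {
        this.name = name;
        this.priority = priority;
        this.durationMinutes = durationMinutes;
    }
}

public class GreedyDispatcher {
    public static void main(String[] args) {
        List<Job> queue = new ArrayList<>(List.of(
                new Job("dailySales", 4, 30),
                new Job("monthlyRollup", 4, 1440),
                new Job("adHocExtract", 3, 5)));
        int servers = 2; // 40 in the real system

        // highest priority first, then longest job first (the "largest first" variant)
        queue.sort(Comparator.comparingInt((Job j) -> j.priority).reversed()
                .thenComparing(Comparator.comparingLong((Job j) -> j.durationMinutes).reversed()));

        // min-heap of {serverId, timeServerBecomesFree}
        PriorityQueue<long[]> freeAt = new PriorityQueue<>(Comparator.comparingLong((long[] a) -> a[1]));
        for (int s = 0; s < servers; s++) freeAt.add(new long[]{s, 0L});

        for (Job j : queue) {
            long[] server = freeAt.poll();            // the server that frees up earliest
            long start = server[1];
            server[1] = start + j.durationMinutes;    // server is busy until the job ends
            freeAt.add(server);
            System.out.printf("%s -> server %d, start %d min, end %d min%n",
                    j.name, server[0], start, server[1]);
        }
    }
}

Whether you order equal-priority jobs longest-first or shortest-first depends on whether you care more about makespan or about average time in the queue, as discussed above.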
If you use OptaPlanner, definitely take a look at real-time scheduling and daemon mode, to replan as new reports enter the system.

Real time process missing deadline with SCHED_RR

I have the below configs on an ARMv7 embedded OMAP system:
sched_rt_period_us = 1000000 = 1 sec
sched_rt_runtime_us = 950000 = 0.95 sec
And I have 4 real-time processes running with SCHED_RR and priority 1,
and sched_rr_get_interval() returned 93750000 nanoseconds, i.e. 0.093750 sec, on the system.
I have added a new process with SCHED_RR, a priority of 1 and the same default rr_interval
of 0.09375 sec.
According to these configs:
every second, the 5 RT processes should each execute twice (0.09375 * 10 = 0.9375 sec), and
the rest of the 1-second interval is to be used by non-RT tasks,
i.e. 1.0 - 0.9375 = 0.0625 sec.
But as I see from execution, the 5th, newly added task misses its deadline and only executes sporadically, producing output every 1 sec or at indeterminate intervals. Please help me with how to make
this new process deterministic so that it executes twice per second as per the above configs.
I tried configuring a static priority of 2 and also checked with SCHED_FIFO but got the same
results.
Or is there anything I am missing in these calculations?
I am using :
Linux xxxx 2.6.33 #2 PREEMPT Tue Aug 14 16:13:05 CEST 2012 armv7l GNU/Linux
Are you sure that the scheduler is not failing because it is not able to honor the scheduling requests? I mean, that the fifth task doesn't meet its deadline because the system is too heavily loaded?
As far as I know, sched_setscheduler does not have a way to signal that the system load is too heavy. To know whether the system is able to meet the request, you need another scheduling algorithm, such as EDF (earliest deadline first). Maybe you want to check its implementation for Linux.