Frame collector & Out of memory errors (large memory allocations) - AnyLogic

Task: Tasks spawn at fixed time intervals (source); each has a remaining processing time drawn uniformly at random from [0 .. x]. Each task is processed by a module (delay). Each module has a fixed processing time and subtracts its processing time from the task's remaining processing time. If a task's remaining processing time is depleted (less than 0), that task is completed and reaches the (sink). Otherwise it goes to the next module and the same process repeats. There are N modules, linked one after another. If the task's remaining processing time has not been depleted after processing at the N-th module, it goes back to the 1st module with the highest priority and is processed there until its remaining processing time depletes.
Model Image
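For clarity, here is a plain-Java sketch of the routing logic described above (this is not the AnyLogic model itself; all names and the module times are made up, and the priority handling on the return to module 1 is omitted):
import java.util.Random;

// Stand-alone sketch of one task's life cycle: each module subtracts its fixed processing
// time from the task's remaining processing time until the latter drops below 0 (task -> sink).
public class TaskRoutingSketch {
    public static void main(String[] args) {
        double x = 10.0;                                  // upper bound of the initial remaining time (assumed)
        double[] moduleProcessingTime = {1.5, 2.0, 0.5};  // N = 3 modules with fixed processing times (assumed)
        double remaining = new Random().nextDouble() * x; // uniform [0 .. x]

        int module = 0;
        while (remaining >= 0) {                          // "depleted" means remaining < 0
            remaining -= moduleProcessingTime[module];
            module = (module + 1) % moduleProcessingTime.length; // after module N, wrap back to module 1
        }
        System.out.println("task completed (reaches sink)");
    }
}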
The issue: I've created the model; the maximum number of spawned/sunk agents I could get is 17 with -Xmx8G and 15 with -Xmx4G. Then CPU/RAM usage rises to the maximum and nothing happens.
Task Manager + Simulation Image
Task Manager Image
I've also checked the “I got “Out Of Memory” error message. How can I fix it?” troubleshooting page.
Case: Large number of agents, or agents with considerable memory footprints
Result: My agents have 2 parameters that are unique to each agent: one is a double (remaining_processing_time), the other an integer (queue_priority). Also, all 17 spawned agents reached the sink.
Case: System Dynamics delay structures under small numeric time step settings
Result: Not using that anywhere besides the delay block.
Case: Datasets auto-created for dynamic variables
Result: This option is turned off.
Maybe I'm missing something, but I can't really analyze anything with such a small number of agents. I'll leave the model here.

This model really had me stumped. I could not figure out where the memory was going, and why, as you run the model, it visually stops yet the memory keeps on increasing exponentially... until I did some Java profiling and found that you have hundreds of Main instances in memory...
You create a new Main() for every agent that comes from the source - so every agent is a new Main, and every Main is a completely new "simulation", if you will.
Simply change it back to the default, or in your case create your own agent type, since you want to save the remaining time and queue priority.
You will also need to change the agent type in all your other blocks.
Now if you run your app, it uses a fraction of the memory.
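As a minimal sketch of that setup (assuming the custom agent type is called Task, is selected as the "New agent" type in the Source block, and carries the two parameters named in the question), the parameters could be initialized in the Source block's "On exit" action:
// Source block, "On exit" action (sketch; 'agent' is the newly created Task,
// x is the model parameter bounding the initial remaining processing time)
agent.remaining_processing_time = uniform(0, x);
agent.queue_priority = 0;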

Related

reorder buffer problem (computer architecture Udacity course)

Can someone explain to me why the issue time for instruction I5 is cycle 6 and not cycle 5, according to the solution manual provided for this problem?
Notes: 1) the problem and its published solution are given below; 2) this problem is part of the problem set for the computer architecture course on Udacity.
problem:
Using Tomasulo's algorithm, for each instruction in the following
sequence determine when (in which cycle, counting from the start) it
issues, begins execution, and writes its result to the CDB. Assume
that the result of an instruction can be written in the cycle after it
finishes its execution, and that a dependent instruction can (if
selected) begin its execution in the cycle after that. The execution
time of all instructions is two cycles, except for multiplication
(which takes 4 cycles) and division (which takes 8 cycles). The
processor has one multiply/divide unit and one add/subtract unit. The
multiply/divide unit has two reservation stations and the add/subtract
unit has four reservation stations. None of the execution units is
pipelined – each can only be executing one instruction at a time. If
a conflict for the use of an execution unit occurs when selecting
which instruction should start to execute, the older instruction (the
one that appears earlier in program order) has priority. If a conflict
for use of the CDB occurs, the result of the add/subtract unit has
priority over the result of the multiply/divide unit. Assume that at
start all instructions are already in the instruction queue, but none
has yet been issued to any reservation stations. The processor can
issue only one instruction per cycle, and there is only one CDB for
writing results. A way of handling exceptions in the processor
described above would be to simply delete all instructions from
reservation stations and the instruction queue, set all RAT entries to
point to the register file, and jump to the exception handler as soon
as possible (i.e. in the cycle after the one in which divide-by-zero
is detected). 1) Find the cycle time of each instruction for the Issue,
Execution, and Write Back stages. 2) What would be printed in the
exception handler if exceptions are handled this way?
provided solution:
timing diagram
solution for second question
The exception occurs in cycle 20, so the cycle in which we start executing the exception handler
is cycle 21. At that time, the processor has completed instructions I1-I4, but it has also completed
instructions I6 and I10. As a result, register F4 in the register file would have the result of I10,
which is -1 (5-6). The exception handler would print 2, 0, -2, -1, which is incorrect.
Is there a limited ROB or RS (scheduler) size that would stop the front-end from issuing more instructions until some have dispatched to make more room (RS size), or until some have retired (ROB size)? It's common for the front-end's best case to be better throughput than the back-end, precisely so the back-end can get a look at possible independent instructions later on. But there has to be some limit to how many un-executed instructions can be tracked by the back-end.
In this case, yes:
The multiply/divide unit has two reservation stations and the add/subtract unit has four reservation stations
So I think that's the limiting factor there: the first two instructions are mul and div, and the first of those finishes on cycle 5. Apparently this CPU doesn't free the RS entry until the cycle after writeback. (And instead of one unified scheduler, it has queues (reservation stations) for each kind of execution unit separately.)
Some real CPUs may be more aggressive, e.g. I think Intel CPUs can free an RS entry sooner, even though they sometimes need to replay a uop if it was optimistically dispatched early in anticipation of a cache hit (when an input is the result of a load): Are load ops deallocated from the RS when they dispatch, complete or some other time?

Partial batch sizes

I'm trying to simulate pallet behavior by using batch and moveTo. This works fine except towards the end, where the number of elements left is smaller than the batch size, and these never get picked up. Is there any way out of this situation?
I have tried messing with custom queues and pickup/dropoff pairs.
To elaborate: the batch object has a queue size of 15. However, once the entire set has been processed, a number of elements less than 15 remain which don't get picked up by the subsequent moveTo block. I need to send the agents to the subsequent block once the queue size falls below 15.
You can dynamically change the batch size of your Batch object towards "the end" (whatever you mean by that :-) ). You need to figure out when to change the batch size (as this depends on your model). But once it is time to adjust, you can call myBatchItem.set_batchSize(1) and it will now batch things together individually.
However, a better model design might be to have a cool-down period before the model end, i.e. stop taking model measurements before your batch objects run out of agents to batch.
You need to know somehow which element is the last one, for example by using a boolean variable called isLast in your agent that is true for the last agent.
In the Batch block you then have to change the batch size programmatically, maybe like this in the "On enter" action of your Batch block:
if (agent.isLast)
    self.set_batchSize(self.size()); // batch exactly what is currently waiting, including this last agent
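One way to set that flag (just a sketch, assuming the total number of agents to be generated is known in advance as a parameter, here called TOTAL_AGENTS, and that the Source block is named source) is the Source block's "On exit" action:
// Source block, "On exit" action (assumed names)
if (source.count() == TOTAL_AGENTS)
    agent.isLast = true; // flag the final agent so the Batch block's "On enter" check above fires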
To determine whether the "end" or any other lack of supply has been reached, I suggest a timeout. I would save a timestamp in a variable lastBatchDate in the "On exit" action of the Batch block:
lastBatchDate = date();
A cyclically activated event checkForLeftovers will check every once in a while whether there are objects waiting to be batched and the timeout (here: 10 minutes) has been reached. In that case, the batch size is reduced to exactly the number of waiting objects, so that they continue in a smaller batch:
if (lastBatchDate != null // prevent a NullPointerException when the date has not yet been set
        && ((date().getTime() - lastBatchDate.getTime()) / 1000) > 600 // more than 600 seconds since the last batch
        && batch.size() > 0 // something is waiting
        && batch.size() < BATCH_SIZE // not more than a normal batch is waiting
) {
    batch.set_batchSize(batch.size()); // set the batch size to exactly the number waiting
} else {
    batch.set_batchSize(BATCH_SIZE); // reset the batch size to the default value BATCH_SIZE
}
The model will look something like this:
However, as Benjamin already noted, you should be careful about whether this is what you really need to model. Take care, for example, of these aspects:
Is the timeout long enough to not accidentally push smaller batches during normal operations?
Is the timeout short enough to have any effect?
Is it ok to have a batch of a smaller size downstream in your process?
etc.
You might just want to make sure upstream that the number of objects reaching the batching station always fills full batches, or you might just stop your simulation before the line "runs dry".
You can see the model and download the source code here.

How to write data from the Database Log to an output in AnyLogic?

I'm running a simulation where I would like to know the total amount of time agents spend in a delay block. I can access the data when running single simulations in the Dataset log under flowchart_stats_time_in_state_log:
https://imgur.com/R5DG51a
However, I would like to write the data from block 5 (spraying) to an output in order to store the data when running multiple simulations.
https://imgur.com/MwPBvO8
I'm guessing that the value reference should look something like the expression below. It is not working, however, so I would appreciate it a lot if anybody could help me out or suggest an alternative solution for getting the data.
flowchart_stats_time_in_state_log.total_seconds.spraying;
By the way, Time Measure blocks do not work for this situation, since I need to know the total amount of time spent in a block after a 12-hour shift. With time measures I do not get the data from the agents that are still in the block when the simulation ends.
Based on the goal of summing all processing times, you could solve it mathematically: set the output equal to block.statsUtilization.mean() * capacity * time(), calculated at simulation end.
For example, if you have a capacity of 1 and a run length of 100 minutes, a utilization of 50% means you had an agent in the block for 50 minutes. Utilization = time busy / total time; because of this relationship, we can calculate how long agents were actually in the block.
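As a minimal sketch (the block name spraying, the capacity value, and the output name are assumptions), the calculation could run at the end of the simulation, e.g. in Main's "On destroy" action:
// Main, "On destroy" action (sketch): total agent-time spent in the 'spraying' Delay block,
// reconstructed from its utilization statistics; sprayingCapacity is the capacity set on that block
double totalTimeInSpraying = spraying.statsUtilization.mean() * sprayingCapacity * time();
traceln("total time in spraying: " + totalTimeInSpraying);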
Another alternative would be to have a variable that tracks time in the block, incremented when agents leave. At the end of the run, you would need to call a function that iterates over the agents still in the block to add their time. AnyLogic allows you to pretty easily loop over queues, delays, or anything else that holds agents:
for (MyAgent agent : delayBlockName) {
    variable += time() - agent.enterBlockTime; // time this agent has spent in the block so far
}
To implement this solution, you would need to create your own agent type (named something better than MyAgent) with a variable storing when the agent entered the block. You would then need to mark that time on each agent as it enters the block.
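A minimal sketch of that marking step (assuming the custom agent type defines a double variable enterBlockTime), placed in the Delay block's "On enter" action:
// Delay block, "On enter" action (assumed variable name)
agent.enterBlockTime = time();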

Can a process ask for x amount of time but take y amount instead?

Suppose I am running a set of processes that want these burst times: 3, 5, 2 respectively, with a total expected execution time of 10 time units.
Is it possible for one of the processes to take up more time than it asked for? For example, even though it asked for 3, it took 11 instead because it was waiting on the user to enter some input. So the total execution time turns out to be 18.
This is all in a non-preemptive CPU scheduler.
The reality is that software has no idea how long anything will take - my CPU runs at a different "nominal speed" to your CPU, both our CPUs keep changing their speed for power management reasons, and the speed of software executed by both our CPUs is affected by things like what other CPUs are doing (especially for SMT/hyper-threading) and what other devices happen to be doing at the time (their effect on caches, shared RAM bandwidth, etc); and software can't predict the future (e.g. guess when an IRQ will occur and take some time and upset the cache contents, guess when a read from memory will take 10 times longer because there was a single bit error that ECC needed to correct, guess when the CPU will get hot and reduce its speed to avoid melting, etc). It is possible to record things like "start time, burst time and end time" as they happen (to generate historical data from the past that can be analysed) but typically these things are only seen in fabricated academic exercises that have nothing to do with reality.
Note: I'm not saying fabricated academic exercises are bad - they're a useful tool to help learn basic theory before moving on to more advanced (and more realistic) theory.
Instead; for a non-preemptive scheduler, tasks don't try to tell the scheduler how much time they think they might take - the task can't know this information and the scheduler can't do anything with that information (e.g. a non-preemptive scheduler can't preempt the task when it takes longer than it guessed it might take). For a non-preemptive scheduler; a task simply runs until it calls a kernel function that waits for something (e.g. read() that waits for data from disk or network, sleep() that waits for time to pass, etc) and when that happens the kernel function that was called ends up telling the scheduler that the task is waiting and doesn't need the CPU, and the scheduler finds a different task to run that can use the CPU; and if the task never calls a kernel function that waits for something then the task runs "forever".
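To make that mechanism concrete, here is a toy sketch (not real kernel code; every name in it is made up) of a non-preemptive run loop: the scheduler only gets control back when the running task voluntarily blocks or finishes, and it never forces a switch:
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of a non-preemptive (cooperative) scheduler: run() returns only when the
// task blocks or finishes, and the scheduler picks the next ready task only at that point.
public class NonPreemptiveSketch {
    interface Task {
        boolean run(); // run until blocking/finishing; return true if the task wants to run again
    }

    public static void main(String[] args) {
        Queue<Task> ready = new ArrayDeque<>();
        int[] remainingSteps = {3, 5, 2}; // the nominal "burst times" from the question
        for (int i = 0; i < remainingSteps.length; i++) {
            final int id = i;
            ready.add(() -> {
                remainingSteps[id]--; // pretend the task did one unit of work, then blocked (e.g. on I/O)
                System.out.println("task " + id + " ran, " + remainingSteps[id] + " units left");
                return remainingSteps[id] > 0;
            });
        }
        while (!ready.isEmpty()) {
            Task t = ready.poll();
            if (t.run()) ready.add(t); // a task that never blocks would simply never return control here
        }
    }
}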
Of course "the task runs forever" can be bad (not just for malicious code that deliberately hogs all CPU time as a denial of service attack, but also for normal tasks that have bugs), which is why (almost?) nobody uses non-preemptive schedulers. For example; if one (lower priority) task is doing a lot of heavy processing (e.g. spending hours generating a photo-realistic picture using ray tracing techniques) and another (higher priority) task stops waiting (e.g. because it was waiting for the user to press a key and the user did press a key) then you want the higher priority task to preempt the lower priority task "immediately" (e.g. because most users don't like it when it takes hours for software to respond to their actions).

How to model pipeline processing when doing task scheduling and resource allocation in CPLEX?

I've come up with a task scheduling and resource allocation problem in which resources can start running a new task only under a complex condition.
A resource can start a new task on an even time unit only if at least 2n time units have passed since the previous task it started on an even time unit.
The same holds for odd time units.
Below is a valid schedule on a single resource. Each number indicates that a new task was started at that time.
0, 1, 2n, 2n+1, 4n, 4n+1, ...
I've got a lot of tasks with precedence relations between them (I know how to cope with the precedence relations) and several resources of this kind. I carried out the scheduling in the following way, which does not yield an optimal result:
Although a task can start on an odd or an even time unit, I've constrained half of the tasks to start on even time units and the other half on odd time units using "forbidStart" and "stepFunction".
Per resource s, I've considered two "cumulFunction"s s_even and s_odd.
Tasks that are forbidden to start on even (odd) time units need the s_odd (s_even) resource. I defined this constraint using "cumulFunction" and "pulse".
Although the above procedure produces a valid schedule, it is not enough, since I'm seeking an optimal solution. Does anybody have any idea how to model this problem in CPLEX?
As said by Philippe Laborie at https://www.ibm.com/developerworks/community/forums/html/topic?id=ac7a4fa1-f304-420c-8302-18501b4b7602&ps=25:
Just consider an additional interval variable 'task' of length 2n that represents the task, and have an alternative on two optional tasks 'taskEven' and 'taskOdd'. These two intervals are the ones you already have in your model (with the adequate forbidStart constraints, and with a contribution to the adequate resource).
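As a rough sketch of that idea using the CP Optimizer Java API (all names and values are placeholders; the forbidStart constraints and the even/odd cumul functions from the original model are only indicated by comments, and the exact API calls should be checked against your CPLEX version):
import ilog.concert.*;
import ilog.cp.*;

// Sketch of Laborie's suggestion: one interval 'task' of length 2n, realized through an
// alternative between an "even-start" and an "odd-start" optional interval.
public class EvenOddAlternative {
    public static void main(String[] args) throws IloException {
        IloCP cp = new IloCP();
        int n = 3; // placeholder value

        IloIntervalVar task = cp.intervalVar(2 * n);  // the additional interval of length 2n
        task.setName("task");
        IloIntervalVar taskEven = cp.intervalVar();   // the existing interval restricted to even starts
        taskEven.setName("taskEven");
        taskEven.setOptional();
        IloIntervalVar taskOdd = cp.intervalVar();    // the existing interval restricted to odd starts
        taskOdd.setName("taskOdd");
        taskOdd.setOptional();

        // Exactly one of the two optional intervals is selected and synchronized with 'task'.
        cp.add(cp.alternative(task, new IloIntervalVar[]{taskEven, taskOdd}));

        // The forbidStart constraints on taskEven/taskOdd and their pulse contributions to the
        // even/odd cumul functions would be added here, as in the original model.

        cp.solve();
        cp.end();
    }
}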