Terminate a benchmark early because of long score calculation - Drools

What I want to achieve
I am currently running large inputs in my OptaPlanner project, and with the current implementation of the constraints they take a long time even to calculate the initial score. A single solver can therefore ruin the whole benchmark, because it gets stuck and cannot terminate. As the score calculation type I am using Drools.
I am trying to achieve early termination of a solver that, after a certain amount of time, still has not finished the initial score calculation (no "Solving started" is displayed yet). So in a single benchmark I want to run multiple different inputs, give each of them its own timer, and if a timer expires before the initial score calculation is done, terminate that solver immediately. A desirable extra would be a percentage of how much of the score calculation was completed.
The reason I am not just jumping straight into optimizations is that I want a baseline for comparison, so I can keep track of the results as the optimizations go on. That is why knowing what percentage of the initial score calculation has completed is vital for me.
What I have/know currently
1. The version of OptaPlanner that I am using is the one from GitHub with the full source code open (not the official release from the website, where the core is compiled into JARs and not editable).
2. I have implemented timers for each solver in the benchmark which, after a given time period, call the solver.terminateEarly() method.
3. Each solver runs on its own thread, so the solver : thread relation is 1:1. I find out which solver is currently executing the code by a lookup in a Map<Integer, Solver> solverMap, where the key is the hash code of the thread executing the solver: Thread.currentThread().hashCode(). As solvers start and finish, this map is updated. This way I can do the lookup from all the relevant places (the optaplanner-examples, optaplanner-core and optaplanner-benchmark projects, and the Drools rules; a sketch of this wiring is shown after this list).
4. I found kcontext.getKieRuntime().halt() in the Drools documentation, which terminates rule execution immediately.
5. I have implemented specialized rules that reach the then part after each change of a planning/shadow entity; the then part first checks whether the solver has been terminated early (by the corresponding Timer) and, if so, calls kcontext.getKieRuntime().halt(). For example:
In the rule below, the then part is reached after each change to a ShiftAssignment instance, and rule execution is stopped if the solver is set to be terminated early.
rule "ShiftAssignmentChange"
    salience 1 // higher than the default 0, so that it is triggered first
when
    ShiftAssignment()
then
    if (TerminateBenchmarkEarly.solverMap.get(Thread.currentThread().hashCode()).isTerminateEarly()) {
        kcontext.getKieRuntime().halt(); // this call terminates the fire loop early
    }
end
The intention with these rules is that, having salience 1 as opposed to the default 0, they are executed before the score rules, so rule execution is stopped immediately.
6. The kieSession.fireAllRules() call in the calculateScore method of org.optaplanner.core.impl.score.director.drools.DroolsScoreDirector returns the number of rules that were executed. I can use this number as a baseline for how far the initial score calculation got. As the optimizations go on, this number is expected to grow while the time taken shrinks.
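For reference, here is a condensed sketch of how points 2 and 3 fit together. The class name TerminateBenchmarkEarly and the solverMap field come straight from the rule above; the register/unregister helpers and the use of java.util.Timer are my assumptions about the wiring, and the raw Solver type simply matches the question's map declaration.

import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;

import org.optaplanner.core.api.solver.Solver;

public class TerminateBenchmarkEarly {

    // 1 solver : 1 thread, keyed by the hash code of the thread running the
    // solver, so the DRL rule above can find "its" solver from anywhere.
    public static final Map<Integer, Solver> solverMap = new ConcurrentHashMap<>();

    // Must be called on the solver's own thread, right before solver.solve(...),
    // otherwise the hash-code key won't match the thread the DRL rule sees.
    public static void register(final Solver solver, long timeoutMillis) {
        solverMap.put(Thread.currentThread().hashCode(), solver);
        new Timer(true).schedule(new TimerTask() { // daemon timer, one per solver
            @Override
            public void run() {
                solver.terminateEarly(); // flips isTerminateEarly(), which the rule checks
            }
        }, timeoutMillis);
    }

    // Call on the same thread after solve() returns or is terminated.
    public static void unregister() {
        solverMap.remove(Thread.currentThread().hashCode());
    }
}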
The problem I am facing currently
The problem is that even with this in place, it still takes a long time to reach the check in the rules, and in some cases the run crashes with an OutOfMemoryError. With the Trace option turned on for Drools, I could see that a smaller part of the time is spent inserting the facts into working memory, after which it constantly outputs TRACE BetaNode stagedInsertWasEmpty=false. The bottleneck is the kieSession.fireAllRules() call in the calculateScore method of org.optaplanner.core.impl.score.director.drools.DroolsScoreDirector; fireAllRules itself lives in the Drools core, which is compiled into a JAR and cannot be edited.
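Since optaplanner-core is built from source here, one option that avoids touching the Drools JAR at all is to change the calculateScore method of DroolsScoreDirector to use the public KieSession.fireAllRules(int max) overload and fire in bounded batches, checking terminateEarly() between batches. A hedged sketch (the batch size is an arbitrary assumption, and note this only helps once fact insertion has finished; the insertion phase itself is not interruptible this way):

import org.kie.api.runtime.KieSession;
import org.optaplanner.core.api.solver.Solver;

public class BatchedScoreCalculation {

    // Fires rules in bounded batches so terminateEarly() is honored even during
    // the initial score calculation, without any salience hack in the DRL.
    public static int fireInBatches(KieSession kieSession, Solver solver, int batchSize) {
        int totalFired = 0;
        int firedInBatch;
        do {
            firedInBatch = kieSession.fireAllRules(batchSize); // at most batchSize activations
            totalFired += firedInBatch;
        } while (firedInBatch > 0 && !solver.isTerminateEarly());
        return totalFired;
    }
}

The returned total is the same "how far did we get" measure as in point 6 above, so it also serves as the progress baseline.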
Conclusion
Anyway, I know this is somewhat of a hack, but as I said above, I need this information as a baseline to know where my current solution stands, and to keep track of the benchmark information as the optimizations go on.
If there is a different (smarter) way to achieve this, I would be happy to use it.
Results from a benchmark
Input 1
Entity count: 12,870
Variable count: 7,515
Maximum value count: 21
Problem scale: 22,068
Memory usage after loading the inputSolution (before creating the Solver): 44,830,840 bytes on average.
Average score calculation speed after Construction Heuristic = 1965/sec
Average score calculation speed after Local Search = 1165/sec
Average score calculation speed after Solver is finished = 1177/sec
Input 2
Entity count: 17,559
Variable count: 7,515
Maximum value count: 8
Problem scale: 21,474
Memory usage after loading the inputSolution (before creating the Solver): 5,964,200 bytes on average.
Average score calculation speed after Construction Heuristic = 1048/sec
Average score calculation speed after Local Search = 1075/sec
Average score calculation speed after Solver is finished = 1075/sec
Input 3
Entity count: 34,311
Variable count: 14,751
Maximum value count: 8
Problem scale: 43,358
Memory usage after loading the inputSolution (before creating the Solver): 43,178,536 bytes on average.
Average score calculation speed after Construction Heuristic = 1134/sec
Average score calculation speed after Local Search = 450/sec
Average score calculation speed after Solver is finished = 452/sec
Input 4
Entity count: 175,590
Variable count: 75,150
Maximum value count: 11
Problem scale: 240,390
Memory usage after loading the inputSolution (before creating the Solver): 36,089,240 bytes on average.
Average score calculation speed after Construction Heuristic = 739/sec
Average score calculation speed after Local Search = 115/sec
Average score calculation speed after Solver is finished = 123/sec
Input 5
Entity count: 231,000
Variable count: 91,800
Maximum value count: 31
Problem scale: 360,150
Memory usage after loading the inputSolution (before creating the Solver): 136,651,744 bytes on average.
Average score calculation speed after Construction Heuristic = 142/sec
Average score calculation speed after Local Search = 11/sec
Average score calculation speed after Solver is finished = 26/sec
Input 6
Entity count: 770,000
Variable count: 306,000
Maximum value count: 51
Problem scale: 1,370,500
Memory usage after loading the inputSolution (before creating the Solver): 114,488,056 bytes on average.
Average score calculation speed after Construction Heuristic = 33/sec
Average score calculation speed after Local Search = 1/sec
Average score calculation speed after Solver is finished = 17/sec
When commenting out the rules in Drools I get the following average score calculation speeds (for Input 6):
After Construction Heuristic = 17800/sec
After Local Search = 22557/sec
After Solver is finished = 21690/sec

If possible, I'd first focus on making the DRL faster instead of these hacks. That comes down to figuring out which score rules are slow: use the score calculation speed (in the last INFO log line) to determine it, by commenting out score rules and seeing their impact on the score calculation speed.
That being said, normally I'd advise looking at unimprovedSecondsSpentLimit or a custom Termination - but that indeed won't help here, as those aren't checked while the initial score is calculated from scratch: they are only checked between moves (so between every fireAllRules(), usually 10k/sec).
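For completeness, a minimal sketch of setting unimprovedSecondsSpentLimit programmatically (OptaPlanner 7.x API assumed; the SolverConfig comes from wherever you build your solver). As said above, this is only evaluated between moves, so it will not interrupt the initial score calculation:

import org.optaplanner.core.config.solver.SolverConfig;
import org.optaplanner.core.config.solver.termination.TerminationConfig;

public class TerminationSetup {

    // Attach an unimproved-time limit to an existing SolverConfig.
    public static void addUnimprovedLimit(SolverConfig solverConfig, long seconds) {
        TerminationConfig terminationConfig = new TerminationConfig();
        terminationConfig.setUnimprovedSecondsSpentLimit(seconds);
        solverConfig.setTerminationConfig(terminationConfig);
    }
}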

Related

Anylogic - How to measure work in process inventory (WIP) within simulation

I am currently working on a simple simulation that consists of 4 manufacturing workstations with different processing times and I would like to measure the WIP inside the system. The model is PennyFab2 in case anybody knows it.
So far, I have measured throughput and cycle time, and I am calculating WIP using Little's Law; however, the results don't match the expectations. Cycle time is measured using the time measure start and time measure end agents, and throughput by simply counting how many pieces flow through the end of the simulation.
Any ideas on how to directly measure WIP without using Little's law?
Thank you!
For Little's Law you count the arrivals, not the exits... but maybe it doesn't make a difference...
Otherwise, there are many ways:
you can count the number of agents inside your system using a RestrictedAreaStart block and the entitiesInside() function
you can simply have a variable that adds +1 when something enters and -1 when something exits
No matter what, you need to add the information into a dataset or a statistics object to get the mean number of agents in your system; a sketch of the counter idea follows.
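A plain-Java sketch of that "+1 on enter, -1 on exit" counter with a running time average (AnyLogic models are Java, so this ports directly: put the increments in the blocks' on-enter/on-exit actions and call sample() from a cyclic event; the class and method names here are mine):

public class WipTracker {

    private int wip = 0;        // items currently in the system
    private long samples = 0;   // number of times we sampled
    private double sum = 0.0;   // sum of sampled WIP values

    public void onEnter() { wip++; }  // wire to the "on enter" action
    public void onExit()  { wip--; }  // wire to the "on exit" action

    // Call once per time unit (e.g. hourly) to sample the current WIP.
    public void sample() {
        samples++;
        sum += wip;
    }

    public double meanWip() {
        return samples == 0 ? 0.0 : sum / samples;
    }
}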
Little's Law defines the relationship between:
Work in Process (WIP)
Throughput (or Flow Rate)
Lead Time (or Flow Time)
This means that if you have two of the three, you can calculate the third.
Since you have a simulation model, you can record all three explicitly, and this would be my advice.
Little's Law should then be used to validate that you are recording the three values correctly.
You can record them as follows.
WIP = record the average number of items in your system.
The simplest way is to count the number of items that entered the system and subtract the number that left. Do this calculation every time unit that makes sense for the resolution of your model (hourly, daily, weekly, etc.) and save the values to a DataSet or Statistics object.
Lead Time = The time a unit takes from entering the system to leaving the system
If you are using the Process Modelling Library (PML) simply use the timeMeasureStart and timeMeasureEnd Blocks, see the example model in the help file.
Throughput = the number of units out of the system per time unit.
If you run the model and your average WIP is 10 units and on average a unit takes 5 days to exit the system, your throughput is 10 units / 5 days = 2 units/day.
You can validate this by taking the total number of units that exited your system at the end of the simulation and dividing it by the number of time units the model ran:
if you run a model with the above characteristics for 10 days, you would expect 20 units to have exited the system.
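A trivial, purely illustrative check of that arithmetic:

public class LittlesLawCheck {
    public static void main(String[] args) {
        double avgWip = 10.0;                      // average units in the system
        double leadTimeDays = 5.0;                 // days a unit spends in the system
        double throughput = avgWip / leadTimeDays; // Little's Law: 2 units/day
        double runDays = 10.0;
        System.out.println("Throughput: " + throughput + " units/day");
        System.out.println("Expected exits: " + throughput * runDays); // 20 units
    }
}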

Prometheus query quantile of pod memory usage performance

I'd like to get the 0.95 percentile memory usage of my pods over the last x time. However, this query starts to take too long if I use a 'big' (7d/10d) range.
The query that I'm using right now is:
quantile_over_time(0.95, container_memory_usage_bytes[10d])
Takes around 100s to complete
I removed extra namespace filters for brevity
What steps could I take to make this query more performant ? (except making the machine bigger)
I thought about calculating the 0.95 percentile every x time (say, every 30 min), recording it as p95_memory_usage, and using p95_memory_usage in the query instead of container_memory_usage_bytes, so that I can reduce the number of points the query has to go through.
However, would this not distort the values ?
As you already observed, aggregating quantiles (over time or otherwise) doesn't really work.
You could try to build a histogram of memory usage over time using recording rules, shaped like a "real" Prometheus histogram (consisting of _bucket, _count and _sum metrics), although doing it may be tedious. Something like:
- record: container_memory_usage_bytes_bucket
  labels:
    le: "100000.0"
  expr: |
    container_memory_usage_bytes > bool 100000.0
    +
    (
      container_memory_usage_bytes_bucket{le="100000.0"}
        or ignoring(le)
      container_memory_usage_bytes * 0
    )
Repeat for all bucket sizes you're interested in, and add _count and _sum metrics.
Histograms can be aggregated (over time or otherwise) without problems, so you can use a second set of recording rules that computes an increase of the histogram metrics, at much lower resolution (e.g. hourly or daily increase, at hourly or daily resolution). And finally, you can use histogram_quantile over your low resolution histogram (which has a lot fewer samples than the original time series) to compute your quantile.
It's a lot of work, though, and there will be a couple of downsides: you'll only get hourly/daily updates to your quantile and the accuracy may be lower, depending on how many histogram buckets you define.
Else (and this only came to me after writing all of the above) you could define a recording rule that runs at lower resolution (e.g. once an hour) and records the current value of container_memory_usage_bytes metrics. Then you could continue to use quantile_over_time over this lower resolution metric. You'll obviously lose precision (as you're throwing away a lot of samples) and your quantile will only update once an hour, but it's much simpler. And you only need to wait for 10 days to see if the result is close enough. (o:
The quantile_over_time(0.95, container_memory_usage_bytes[10d]) query can be slow because it needs to take into account all the raw samples for all the container_memory_usage_bytes time series over the last 10 days. The number of samples to process can be quite big. It can be estimated with the following query:
sum(count_over_time(container_memory_usage_bytes[10d]))
Note that if the quantile_over_time(...) query is used for building a graph in Grafana (i.e. a range query instead of an instant query), then the number of raw samples returned by sum(count_over_time(...)) must be multiplied by the number of points on the Grafana graph, since Prometheus executes quantile_over_time(...) individually for each point on the displayed graph. Grafana usually requests around 1000 points for building a smooth graph, so the number returned by sum(count_over_time(...)) must be multiplied by 1000 in order to estimate the number of raw samples Prometheus needs to process for building the quantile_over_time(...) graph. See more details in this article.
There are the following options for reducing the query duration:
Add more specific label filters in order to reduce the number of selected time series and, consequently, the number of raw samples to process.
Reduce the lookbehind window in square brackets. For example, changing [10d] to [1d] reduces the number of raw samples to process by 10x.
Use recording rules to calculate coarser-grained results.
Try other Prometheus-compatible systems, which may process heavy queries faster. Try, for example, VictoriaMetrics.

How can a Neural Network learn from testing outputs against external conditions which it cannot directly control

In order to simplify the question and hopefully the answer I will provide a somewhat simplified version of what I am trying to do.
Setting up fixed conditions:
Max Oxygen volume permitted in room = 100,000 units
Target Oxygen volume to maintain in room = 100,000 units
Maximum air processing cycles per second = 3.0 (minimum is 0.3)
Energy (watts) used per second is given by the formula: (100 W * cycles_per_second)^2
Maximum Oxygen added to air per cycle = 100 units (minimum 0 units)
1 person consumes 10 units of O2 per second
Max occupancy of the room is 100 people (1 person is the minimum)
Inputs are processed every cycle and outputs can be changed each cycle; however, if an output is fed back in as an input, it can only affect the next cycle.
Let's say I have these inputs:
A. current oxygen in the room (range: 0 to 1000 units for simplicity - could be normalized)
B. current occupancy of the room (0 to 100 people at max capacity); and/or this could be changed to the total O2 used by all people in the room per second (0 to 1000 units per second)
C. current cycles per second of air processing (0.3 to 3.0 cycles per second)
D. Current energy used (which is the above current cycles per second * 100 and then squared)
E. Current Oxygen added to air per cycle (0 to 100 units)
(possible outputs fed back in as inputs?):
F. previous change to cycles per second (+ or - 0.0 to 0.1 cycles per second)
G. previous cycles O2 units added per cycle (from 0 to 100 units per cycle)
H. previous change to current occupancy maximum (0 to 100 persons)
Here are the actions (outputs) my program can take:
Change cycles per second by increment/decrement of (0.0 to 0.1 cycles per second)
Change O2 units added per cycle (from 0 to 100 units per cycle)
Change current occupancy maximum (0 to 100 persons) - (basically allowing for forced occupancy reduction and then allowing it to normalize back to maximum)
The GOALS of the program are to maintain a homeostasis of:
as close to 100,000 units of O2 in the room as possible
never allowing the room to drop to 0 units of O2
allowing current occupancy of up to 100 people per room for as long as possible without forcibly removing people (as O2 in the room is depleted over time and nears 0 units, people should be removed from the room down to the minimum, and then the maximum should be allowed to recover back up to 100 as more and more O2 is added back to the room)
and ideally using the minimum energy (watts) needed to maintain the above conditions. For instance, if the room was down to 90,000 units of O2 and there are currently 10 people in the room (using 100 units of O2 per second), then instead of running at 3.0 cycles per second (90 kW) at 100 units per cycle, replenishing 300 units per second in total (a surplus of 200 units over the 100 being consumed), for 50 seconds to cover the deficit of 10,000 units, at a total of 4,500 kW·s used, it would be more ideal to run at, say, 2.0 cycles per second (40 kW), which would produce 200 units per second (a surplus of 100 units over the consumed units) for 100 seconds to cover the deficit of 10,000 units, using a total of 4,000 kW·s.
NOTE: occupancy may fluctuate from second to second based on external factors that cannot be controlled (let's say people come and go at liberty). The only control the system has is to forcibly remove people from the room and/or prevent new people from coming in by changing the max capacity permitted at the next cycle (let's just say the system can do this). We don't want the system to impose a permanent reduction in capacity just because, at full power, it can only output enough O2 per second to support 30 people. We have a large volume of available O2, and it would take a while before it was depleted to dangerous levels and the system was forced to reduce capacity.
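To make the dynamics concrete, here is a hedged, plain-Java sketch of one simulation step under the stated constants (the class and field names are mine, not from any real library; this is the kind of model a controller, learned or classical, would act on):

public class RoomSim {

    double o2 = 100_000;            // current O2 units in the room (input A)
    double cyclesPerSecond = 0.3;   // 0.3 .. 3.0 (input C)
    double o2PerCycle = 100;        // 0 .. 100 units added per cycle (input E)

    // Advance one second with `people` occupants; returns watts drawn this second.
    double step(int people) {
        double added = cyclesPerSecond * o2PerCycle; // O2 produced this second
        double consumed = 10.0 * people;             // 10 units per person per second
        o2 = Math.max(0, Math.min(100_000, o2 + added - consumed));
        return Math.pow(100.0 * cyclesPerSecond, 2); // (100 W * cps)^2 (input D)
    }
}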
My question:
Can someone explain how I might configure this neural network so it can learn from each action (cycle) it takes by monitoring for the desired results? My challenge is that most of the articles I find on the topic assume you know the correct output answer (i.e. if inputs A, B, C, D, E all have specific values, then output 1 should be to increase by 0.1 cycles per second).
But what I want is to meet the conditions laid out in the GOALS above. So each time the program does a cycle, and let's say it decides to try increasing the cycles per second, and the result is that the available O2 is either declining by a smaller amount than in the previous cycle or is now increasing back towards 100,000, then that output could be considered more correct than reducing or maintaining the current cycles per second. I am simplifying here, since multiple variables combine to create the "ideal" outcome, but I think I have made the point of what I am after.
Code:
For this test exercise I am using a Swift library called Swift-AI (specifically its NeuralNet module: https://github.com/Swift-AI/NeuralNet).
So if you want to tailor your response in relation to that library, it would be helpful but not required. I am more just looking for the logic of how to set up the network and then configure it to do initial and iterative re-training of itself based on the conditions I listed above. I would assume that at some point, after enough cycles and different conditions, it would have the appropriate weights set up to handle any future condition, and re-training would become less and less impactful.
This is a control problem, not a prediction problem, so you cannot just use a supervised learning algorithm. (As you noticed, you have no target values for learning directly via backpropagation.) You can still use a neural network (if you really insist); have a look at reinforcement learning. But if you already know what happens to the oxygen level when you take an action like forcing people out, why would you learn such simple facts by millions of evaluations of trial and error, instead of encoding them into a model?
I suggest looking at model predictive control. If nothing else, you should study how the problem is framed there. Or maybe even just plain old PID control. It seems really easy to make a good dynamical model of this process with a few state variables.
You may have a few unknown parameters in that model that you need to learn "online". But a simple PID controller can already tolerate and compensate for some amount of uncertainty, and it is much easier to fine-tune a few parameters than to learn the general cause-effect structure from scratch. That can be done, but it involves trying all possible actions: for all your algorithm knows, the best action might be to reduce the number of oxygen consumers to zero permanently by killing them, and then collect a huge reward for maintaining the oxygen level with little energy. When the algorithm knows nothing about the problem, it has to try everything out to discover the effects.
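To illustrate, a minimal textbook PID sketch (written in Java here; the asker's code is Swift, but the logic ports directly, and the kp/ki/kd gains are assumptions to be tuned). Its output would drive the change in cycles per second, clamped to the +/-0.1 per-cycle limit:

public class Pid {

    double kp, ki, kd;
    double setpoint;            // e.g. 100,000 units of O2
    private double integral = 0.0;
    private double lastError = 0.0;

    Pid(double kp, double ki, double kd, double setpoint) {
        this.kp = kp; this.ki = ki; this.kd = kd; this.setpoint = setpoint;
    }

    // Call once per control cycle with the measured O2 level.
    double step(double measured, double dt) {
        double error = setpoint - measured;
        integral += error * dt;
        double derivative = (error - lastError) / dt;
        lastError = error;
        return kp * error + ki * integral + kd * derivative;
    }
}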

Tuning a gain table to match two curves

I have two data sets; let us name them "actual speed" and "desired speed". My main objective is to match the actual speed with the desired speed.
But to do that in my case, I need to tune the FF (1x10), Integral (10x8) and Proportional (10x8) gain tables.
My approach so far has been as follows:
First, start the iteration with 0.1 as the initial value in the first cell (FF[0]) of the FF table.
Then find the R-square or correlation between the two data sets (i.e. actual speed and desired speed).
Increment the value of the first cell (FF[0]) by 0.25 and then compute the R-square or correlation of the two data sets again.
Once the cell (FF[0]) value reaches 2 (the gain's maximum value, already defined by the lab), evaluate the R-square values and write back into FF[0] the gain value that gives the minimum error between the two curves.
Then tune the Integral and Proportional tables in the same way for the same RPM range.
Once it is tuned, move on to the higher RPM range and repeat steps 2-5 (RPM ranges: 800-1000; 1000-1200; ...; 3000-3200). (A sketch of this per-cell sweep is shown below the question.)
The problem is that this process takes far too long to complete: for example, it takes around 1 hour to tune one cell of FF, which is very slow.
If possible, please suggest another approach I could try for tuning the tables. I am using MATLAB R2010a and cannot move to any other version of MATLAB, because my controller can only communicate with this version, and I cannot use any app for tuning, since my GUI is already communicating with the controller and the two data sets are generated in real time.
In the given figure, let us take the (X1,Y1) curve as the desired speed and the (X2,Y2) curve as the actual speed.
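For reference, the per-cell sweep described in steps 1-4 boils down to a loop like this (a plain-Java sketch for clarity, although the asker is on MATLAB R2010a; fitness() stands in for the real-time R-square/correlation against the desired-speed curve, and the 0.1 start, 0.25 step and maximum of 2 come from the question):

import java.util.function.DoubleUnaryOperator;

public class GainSweep {

    // Sweep one gain cell from 0.1 towards the lab-defined maximum of 2 in
    // steps of 0.25 and return the value with the best fit.
    public static double sweepCell(DoubleUnaryOperator fitness) {
        double bestGain = 0.1;
        double bestFit = Double.NEGATIVE_INFINITY;
        for (int k = 0; 0.1 + 0.25 * k <= 2.0 + 1e-9; k++) {
            double gain = 0.1 + 0.25 * k;
            double fit = fitness.applyAsDouble(gain); // e.g. R-square of actual vs desired
            if (fit > bestFit) {
                bestFit = fit;
                bestGain = gain;
            }
        }
        return bestGain;
    }
}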

MATLAB: Slow convergence of convex optimization algorithm

I want to speed up the convergence of a convex optimization problem in MATLAB.
My objective function is convex, has three parameters, and I am using gradient ascent for the maximization.
Right now I am manually writing the iteration, with the termination condition being that the difference between the new and old parameter values is very small (around 0.0000001). I cannot terminate based on the number of iterations, because that doesn't guarantee it has converged to the optimum solution.
So it takes a lot of time to converge - almost 2 days! Is there any way to speed this up?
Actually, my objective function has only three parameters. I know that my first parameter's value should be greater than that of the second.
So, starting from the initial condition, the second parameter's value starts increasing rapidly. After it has reached a certain point, the first parameter's value starts increasing rapidly, and while it does, the second parameter's value starts decreasing slowly. Eventually, the first parameter's value is greater than that of the second.
Is there any way to speed up this process? 2 days is a very long time. Furthermore, calculating the gradient is also time-consuming: it needs a lot of matrix computations.
I don't want to start with predefined parameter values (like the first parameter's value being greater than the second's). It is also not necessary that the first parameter always ends up greater than the second; I just know which parameter value should be greater. Any suggestions?
If the calculation of gradients is very slow and you still want to do a manual implementation, you could try the following; it takes more steps, but each step is so simple that it can be a lot quicker overall:
Define a step size.
Try all the points where each of your variables moves -1, 0 or +1 times the step size (3^3 = 27 possibilities).
Pick the best one.
If the best one is your previous point, multiply the step size by a factor of 0.5.
Of course, the success of this process depends on the properties of your function. It should also be noted that a much simpler solution could be to relax the desired difference to something like 0.0001.
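A hedged Java sketch of that pattern search for the three-parameter case (function and variable names are mine; f is the objective to maximize):

import java.util.function.Function;

public class PatternSearch {

    // Evaluate all 3^3 = 27 combinations of moving each coordinate by
    // -step, 0 or +step; move to the best one, or halve the step if the
    // current point is already the best. Stop when the step is below tol.
    public static double[] maximize(Function<double[], Double> f, double[] start,
                                    double step, double tol) {
        double[] x = start.clone();
        double fx = f.apply(x);
        while (step > tol) {
            double[] bestX = x;
            double bestF = fx;
            for (int i = 0; i < 27; i++) {
                double[] cand = x.clone();
                int code = i;
                for (int d = 0; d < 3; d++) {
                    cand[d] += ((code % 3) - 1) * step; // -1, 0 or +1 step per coordinate
                    code /= 3;
                }
                double fc = f.apply(cand);
                if (fc > bestF) {
                    bestF = fc;
                    bestX = cand;
                }
            }
            if (bestF > fx) {   // found an improvement: move there
                x = bestX;
                fx = bestF;
            } else {            // previous point was best: shrink the step
                step *= 0.5;
            }
        }
        return x;
    }
}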