The problem is to find, for each of a number of warehouses, the quantity that incurs the minimum total cost, using a genetic algorithm.
Let's say there are n warehouses. Associated with each warehouse are a few factors:
LCosti: loading cost for warehouse i
HCosti: holding cost for warehouse i
TCosti: transportation cost for warehouse i
OCosti: ordering cost for warehouse i
Each warehouse has quantity Qi associated with it that must satisfy these 4 criteria:
loading constraint: Qi * LCosti >= Ai for warehouse i
holding constraint: Qi * HCosti >= Bi for warehouse i
Transportation constraint: Qi * TCosti >= Ci for warehouse i
Ordering constraint: Qi * OCosti >= Di for warehouse i
where A, B, C and D are constants for each of the warehouses.
Another important criterion is that each Qi must satisfy:
Qi <= Demandi
where Demandi is the demand in warehouse i (a separate quantity from the ordering constant Di above).
And the equation of total cost is:
Total cost = sum(Qi * (LCosti + HCosti + TCosti) + OCosti / Qi)
How do I encode a chromosome for this problem? What I am thinking is that by combining whichever of the four constraints gives the largest minimum allowable value for Qi with the demand constraint, I can get a range for Qi. Then I can randomly generate values in that range for the initial population. But how do I perform crossover and mutation in this scenario, and how exactly should the chromosomes be encoded?
Generally, for constrained problems you basically have three possible approaches (as far as evolutionary algorithms are concerned):
1. Incorporate constraint violation into fitness
You can design your fitness as a sum of the actual objective and penalties for violation of constraints. The extreme case is a "death penalty", i.e. any individual which violates any constraint in any way receives the worst possible fitness.
This approach is usually very easy to implement, but it has a big drawback: it often penalizes solutions that have good building blocks but violate the constraints too much.
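As a rough illustration of the penalty approach applied to the warehouse question above, here is a minimal Python sketch. All the cost and constraint arrays, as well as the penalty weight, are hypothetical placeholders, not data from the question.

```python
import numpy as np

# Hypothetical data for n = 5 warehouses (placeholders, not taken from the question).
l_cost = np.array([2.0, 1.5, 3.0, 2.5, 1.0])       # loading cost per unit, LCost_i
h_cost = np.array([0.5, 0.4, 0.6, 0.3, 0.7])       # holding cost per unit, HCost_i
t_cost = np.array([1.0, 1.2, 0.8, 1.1, 0.9])       # transportation cost per unit, TCost_i
o_cost = np.array([50.0, 40.0, 60.0, 55.0, 45.0])  # ordering cost, OCost_i
a = np.array([10.0, 12.0, 8.0, 15.0, 9.0])         # loading constants A_i
b = np.array([5.0, 4.0, 6.0, 5.0, 7.0])            # holding constants B_i
c = np.array([8.0, 9.0, 7.0, 10.0, 6.0])           # transportation constants C_i
d = np.array([20.0, 25.0, 18.0, 30.0, 22.0])       # ordering constants D_i
demand = np.array([100.0, 120.0, 90.0, 150.0, 110.0])  # demand per warehouse

PENALTY = 1e4  # penalty weight for constraint violation (a tuning parameter)

def total_cost(q):
    """Objective from the question: sum(Qi*(LCosti+HCosti+TCosti) + OCosti/Qi)."""
    return np.sum(q * (l_cost + h_cost + t_cost) + o_cost / q)

def violation(q):
    """Total amount by which chromosome q violates the constraints (0 if feasible)."""
    v = np.sum(np.maximum(a - q * l_cost, 0.0))    # loading constraint
    v += np.sum(np.maximum(b - q * h_cost, 0.0))   # holding constraint
    v += np.sum(np.maximum(c - q * t_cost, 0.0))   # transportation constraint
    v += np.sum(np.maximum(d - q * o_cost, 0.0))   # ordering constraint
    v += np.sum(np.maximum(q - demand, 0.0))       # Qi <= demand_i
    return v

def fitness(q):
    """Lower is better: the objective plus a penalty proportional to the violation."""
    return total_cost(q) + PENALTY * violation(q)
```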
2. Correction operators, resistant encoding
If it is possible for your problem, you can implement "correction operators" - operators that take a solution that violates the constraints and transform it into one that does not, preserving as much of the original solution's structure as possible. A similar idea is to use an encoding that guarantees feasibility, i.e. a decoding algorithm that always produces a valid solution.
If it is possible, this is probably the best approach you can take. However, it is often quite hard to implement, or only possible with major changes to the solutions, which can significantly slow the search down or even make it useless.
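For this particular problem a correction operator is straightforward, because each Qi has an explicit feasible interval: from the largest of the four constraint-implied lower bounds up to the demand. A minimal sketch, reusing the hypothetical arrays from the previous snippet:

```python
def feasible_bounds():
    """Per-warehouse interval [lo_i, hi_i] implied by the constraints."""
    lo = np.maximum.reduce([a / l_cost, b / h_cost, c / t_cost, d / o_cost])
    hi = demand
    return lo, hi

def repair(q):
    """Correction operator: clamp an arbitrary chromosome into the feasible box."""
    lo, hi = feasible_bounds()
    return np.clip(q, lo, hi)

def random_individual(rng):
    """Feasible-by-construction initialisation, as the question already proposes."""
    lo, hi = feasible_bounds()
    return rng.uniform(lo, hi)
```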
3. Multi-objective approach
Use a multi-objective (MO) algorithm, e.g. NSGA-II, turn your measure(s) of constraint violation into additional objectives, and optimize all the objectives at once. MO algorithms usually provide a Pareto front of solutions - a set of solutions lying on the frontier of the objective-vs-violation tradeoff space.
Using Differential Evolution you can keep the same representation and avoid the double conversion (integer -> binary, binary -> integer).
The mutation operation is:
V(g+1, i) = X(g, r1) + F ⋅ (X(g, r2) − X(g, r3))
where:
i, r1, r2, r3 are indices of vectors in the population, no two of them equal
F is a random constant in the [0, 1.5] range
V (the mutant vector) is recombined with elements of a target vector (X(g, i)) to build a trial vector u(g+1, i). The selection process chooses the better candidate from the trial vector and the target vector (see the references below for further details).
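A minimal sketch of the DE/rand/1/bin step described above, operating directly on real-valued quantity vectors. `fitness_fn` is any fitness to minimize (for instance the hypothetical penalty-based one sketched in zegkljan's answer); the names and default parameter values here are illustrative assumptions.

```python
import numpy as np

def de_step(pop, fitness_fn, rng, f=0.8, cr=0.9):
    """One generation of classic DE/rand/1/bin on a (pop_size, n) array of quantities."""
    pop_size, dim = pop.shape
    new_pop = pop.copy()
    for i in range(pop_size):
        # pick r1, r2, r3 distinct from each other and from i
        choices = [k for k in range(pop_size) if k != i]
        r1, r2, r3 = rng.choice(choices, size=3, replace=False)
        mutant = pop[r1] + f * (pop[r2] - pop[r3])      # V(g+1, i)
        # binomial crossover: mix the mutant with the target vector X(g, i)
        mask = rng.random(dim) < cr
        mask[rng.integers(dim)] = True                  # at least one gene from the mutant
        trial = np.where(mask, mutant, pop[i])          # u(g+1, i)
        # greedy selection between the trial and the target
        if fitness_fn(trial) <= fitness_fn(pop[i]):
            new_pop[i] = trial
    return new_pop
```

For the warehouse problem you would typically also repair (clamp) the trial vector into the feasible range before evaluating it, so that each Qi stays positive and below the demand.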
The interesting aspects of this approach are:
you don't have to redesign the code. You need a different mutation/recombination operator and (perhaps) to cast some reals to integers, but that is simple and fast;
for constraint management you can adopt the techniques described in zegkljan's answer;
DE has been shown to be effective on a large range of optimization problems and it seems to be suitable for your problem.
References:
"Explain the Differential Evolution method" and an old Dr. Dobb's article (by Kenneth Price and Rainer Storn) as an introduction;
Storn's page for more details and many code examples.
Modelica modelling is first-principles modelling, so knowing how to test a model and set an effective benchmark is important. For example, I can design a fluid network however I wish, but when building a dynamic simulation model I need to know the detailed geometry and parameters to set up every piece of the model. Usually I build a steady-state model with simple energy and mass conservation laws, then design every piece of equipment based on the corresponding design manual. However, when I put the dynamic components together and simulate to steady state, the result differs more or less from the steady-state model. So I was wondering whether I should modify my workflow to make the dynamic model agree with the steady-state model. Any suggestions are welcome.
#dymola #modelica
To my understanding of the question, your parameter values are fixed and physically known. I would attempt the following approach as a heuristic to identify the (few) component(s) that one needs to carefully investigate in order to understand how they influence or violate the assumed first principles.
This is just as a first trial and it could be subject to further improvement and fine-tuning.
Consider the set of significant variables x_d(p, t) in R^n and the parameters p in R^m. Note that p also includes significant start values, and contains only the additional parameters that are not present in the steady-state model.
Denote the corresponding variables of the steady state model by x_s
Denote a time point where the dynamic model is "numerically" in "semi-" steady-state by t*
Consider the function C(x_d(p, t*), x_s) = ||D||^2 with D = x_d(p, t*) - x_s.
It could be beneficial to describe C as a vector rather than a single valued function.
Compute the partial derivatives of C w.r.t. p, expressed in terms of dx_d/dp. Since x_s does not depend on p:
dC/dp = d[D^T D]/dp
      = d[(x_d - x_s)^T (x_d - x_s)]/dp
      = 2 (dx_d/dp)^T D
Consider scaling the above derivative, i.e. (dC/dp) * p/C (avoiding the expected numerical issues with some epsilon tricks).
This gives you a ranking of the most significant parameters causing the apparent differences. The (hopefully few) components containing these parameters are likely the ones responsible for the discrepancy.
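If the tool does not expose dx_d/dp, the same ranking can be approximated crudely with finite differences. A hedged Python sketch, where `simulate(p)` is a hypothetical wrapper that runs the dynamic model with parameters p and returns x_d(p, t*):

```python
import numpy as np

def sensitivity_ranking(simulate, p, x_s, rel_step=1e-3, eps=1e-12):
    """Rank parameters by the scaled sensitivity |dC/dp_k * p_k / C|, where
    C = ||x_d(p, t*) - x_s||^2, using one-sided finite differences.
    `simulate(p)` must return x_d(p, t*) for the dynamic model."""
    p = np.asarray(p, dtype=float)
    d0 = simulate(p) - x_s
    c0 = float(d0 @ d0)
    scores = np.zeros(len(p))
    for k in range(len(p)):
        dp = rel_step * max(abs(p[k]), eps)
        p_pert = p.copy()
        p_pert[k] += dp
        d_k = simulate(p_pert) - x_s
        dc_dpk = (float(d_k @ d_k) - c0) / dp
        scores[k] = abs(dc_dpk * p[k] / (c0 + eps))  # scaled sensitivity
    ranking = np.argsort(-scores)                    # most influential parameter first
    return ranking, scores
```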
If this still does not help, perhaps due to high correlation among the parameters, I would go further and set up a dummy parameter-identification problem, from which a more rigorous ranking of significant model parameters can be obtained.
If the Modelica language had capabilities for expressing dynamic parameter sensitivities, all of the above computation could easily be carried out as a single Modelica model (with a slightly modified formulation).
For instance, if we had something like der(x,p) corresponding to dx/dp, one could simply state
dcdp = der(C,p)
An alternative approach is proposed via the DerXP library
The Wikipedia page states that
If m operations, either Union or Find, are applied to n elements, the total run time is O(m log*n).
The detailed analysis that arrives at this result is:
My questions are:
Shouldn't the total be O((m + n) log* n) instead of O(m log* n)?
Is the average time complexity of, say, 1000 Find operations the same as the time complexity of each individual Find?
Disclaimer: I'm still trying to understand these proofs myself, so make no claim to being an expert! I think I may have some insights though.
1) I think they have assumed that n = O(m), thereby turning O((m + n) lg*(n)) into O(m lg*(n)). In Tarjan's original proof (of the inverse Ackermann function bound, found here: https://dl.acm.org/doi/pdf/10.1145/321879.321884?download=true) he assumes that the number m of FIND operations exceeds n. In Introduction to Algorithms (CLRS, ch. 21), the bound they prove is for m operations of which n are MAKE-SET. It seems people assume that m will be asymptotically greater than or equal to n.
2) What they have proved is an amortized cost for each operation. This is an analysis technique which bounds, to within a constant factor, the total time taken for a series of operations, from which you can trivially compute the average time per operation. There are several different ways to go about it (I believe this is an example of aggregate analysis?). It's worth looking into!
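For reference, here is a textbook-style sketch (not from the question) of the union-find structure these bounds apply to, with union by rank and path compression:

```python
class DisjointSet:
    """Union-find with union by rank and path compression.
    A sequence of m operations on n elements runs in O(m * alpha(n)) time,
    where alpha is the inverse Ackermann function (<= log* n)."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path compression: point every node on the path directly at the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True
```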
I'm writing a program intended to significantly lessen the number of collisions that occur when using hash functions like 'key mod table_size'. For this I would like to use genetic programming / a genetic algorithm, but I don't know much about it. Even after reading many articles and examples I don't know, in my case (i.e. for this program definition), what the fitness function would be, what the target would be (the target is usually the required result), what would serve as the population/individuals and parents, etc.
Please help me in identifying the above and with a few codes/pseudo-codes snippets if possible as this is my project.
It's not necessary to use genetic programming/algorithms; it can be anything based on evolutionary programming/algorithms.
thanks..
My advice would be: don't do it that way. The literature on hash functions is vast and we more or less understand what makes a good hash function. We know enough mathematics not to look for them blindly.
If you need a hash function to use, there is plenty to choose from.
However, if this is your uni project and you cannot possibly change the subject or steer it in a more manageable direction, then, as you noticed, there will be the complex issue of getting the fitness function and mutation operators right. As far as I can tell off the top of my head, there are no obvious candidates.
You may look up e.g. 'strict avalanche criterion' and try to see if you can reason about it in terms of fitness and mutations.
Another question is how you want to represent your function. Just a boolean expression? Something built from word operations like AND, XOR, NOT, ROT?
Depending on your constraints (or rather, assumptions) the question of fitness and mutation will be different.
Broadly, the fitness is clearly to minimize the number of collisions in your 'hash modulo table-size' model.
The obvious part is to take a suitably large and (very important) representative distribution of keys and chuck them through your 'candidate' function.
Then you might pass them through 'hash modulo table-size' for one or more values of table-size and evaluate some measure of 'niceness' of the arising distribution(s).
So what that boils down to is what table-sizes to try and what niceness measure to apply.
Niceness is context dependent.
You might measure 'fullest bucket' as a measure of 'worst case' insert/find time.
You might measure the sum of squares of bucket sizes as a measure of 'average' insert/find time, assuming look-ups are distributed uniformly amongst the keys.
Finally you would need to decide what table-size (or sizes) to test at.
Conventional wisdom often uses primes because hash modulo a prime tends to be nicely sensitive to all the bits of the hash, whereas something like hash modulo 2^n only involves the lower n bits.
To keep computation down you might consider the series consisting of the next prime larger than each power of two: 5 (> 2^2), 11 (> 2^3), 17 (> 2^4), etc., up to and including the first power of 2 greater than your 'sample' size.
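Putting those pieces together, here is a hedged Python sketch of such a fitness evaluation: it buckets a sample of keys for a candidate hash function at the prime table sizes just described and combines the two 'niceness' measures mentioned above. `candidate_hash` and the key sample are placeholders you would supply.

```python
from collections import Counter

def next_prime(n):
    """Smallest prime strictly greater than n (simple trial division)."""
    def is_prime(k):
        if k < 2:
            return False
        i = 2
        while i * i <= k:
            if k % i == 0:
                return False
            i += 1
        return True
    p = n + 1
    while not is_prime(p):
        p += 1
    return p

def table_sizes(sample_size):
    """Primes just above each power of two (5, 11, 17, ...),
    up to and including the first power of two greater than the sample size."""
    sizes, power = [], 4
    while power <= sample_size:
        sizes.append(next_prime(power))
        power *= 2
    sizes.append(next_prime(power))
    return sizes

def fitness(candidate_hash, keys):
    """Lower is better: combines the 'fullest bucket' and sum-of-squares measures
    over several table sizes."""
    score = 0.0
    for size in table_sizes(len(keys)):
        buckets = Counter(candidate_hash(k) % size for k in keys)
        fullest = max(buckets.values())                 # worst-case chain length
        sum_sq = sum(c * c for c in buckets.values())   # ~ expected probes per look-up
        score += fullest + sum_sq / len(keys)
    return score
```

For example, `fitness(lambda k: (k * 2654435761) & 0xFFFFFFFF, sample_keys)` would score a simple multiplicative candidate against your key sample.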
There are other ways of considering fitness but without a practical application the question is (of course) ill-defined.
If the functions in your 'space' of potential hash functions don't all have the same execution time, you should also factor in 'cost'.
It's fairly easy to define very good hash functions but execution time can be a significant factor.
I have a linear model that is seeking to move 'units' between 'cells' in an optimal manner. Each transfer costs $2 plus 1% of the unit amount transferred.
Let's say a target cell requires 100 units and can receive it from any of 10 source cells. How can I encourage the optimiser to make a single transfer of 100 units from one of the source cells (total cost 2 + 1) rather than transferring 10 units from each of the valid source cells (total cost 20 + 1)?
I've implemented this in MATLAB using MOSEK, if it matters.
(Apologies if the question is a bit vague, this is all self-taught and I'm not sure how to ask this unambiguously with the correct terminology. Happy to repost this question on a more appropriate SE if there is one.)
This is a standard integer programming problem called the Fixed Charge Transportation Problem.
Let's say that there are suppliers and customers with demands. Each supplier i has S_i units available and each customer j has a demand D_j.
You need two types of decision variables.
X_ij is the amount that goes from supplier i to customer j.
But there is also the fixed cost that we have to take care of: F_ij = 2 ($2 for each supplier-customer pair that actually ships units). Let the fixed-cost indicator variable be
Y_ij = 1 if supplier i sends a non-zero number of units to customer j,
Y_ij = 0 otherwise.
Formulation
Objective: minimize the total fixed plus variable cost.
Min sum_ij F_ij * Y_ij + sum_ij C_ij * X_ij
Subject to:
Sum over i of X_ij >= D_j for each customer j // demand satisfaction
Sum over j of X_ij <= S_i for each supplier i // supply limitation
// if you use a supplier for a customer, Y_ij has to become 1:
X_ij <= M_ij * Y_ij for each i and each j, where M_ij = min(S_i, D_j) is an upper bound on the flow
Y_ij binary, X_ij >= 0
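Not your exact MATLAB/MOSEK setup, but here is a small illustrative sketch of this formulation in Python using PuLP (assuming PuLP and its bundled CBC solver are available); the supplies, demands and the 1% variable cost are placeholder data:

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, value

# Placeholder data: 3 source cells, 2 target cells.
supply = {0: 120, 1: 80, 2: 100}
demand = {0: 100, 1: 60}
fixed = 2.0      # $2 per transfer actually made
var_rate = 0.01  # 1% of the amount transferred

prob = LpProblem("fixed_charge_transportation", LpMinimize)

arcs = [(i, j) for i in supply for j in demand]
x = LpVariable.dicts("x", arcs, lowBound=0)    # X_ij: units moved from i to j
y = LpVariable.dicts("y", arcs, cat=LpBinary)  # Y_ij: 1 if the arc i -> j is used

# Objective: fixed charge per used arc plus 1% of the transferred amount.
prob += lpSum(fixed * y[a] + var_rate * x[a] for a in arcs)

for j in demand:   # demand satisfaction
    prob += lpSum(x[i, j] for i in supply) >= demand[j]
for i in supply:   # supply limitation
    prob += lpSum(x[i, j] for j in demand) <= supply[i]
for i, j in arcs:  # linking: any positive flow forces the fixed charge
    prob += x[i, j] <= min(supply[i], demand[j]) * y[i, j]

prob.solve()
for a in arcs:
    if value(x[a]) > 1e-6:
        print(a, value(x[a]))
```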
You will find more on Fixed Charge Integer Programming problems in any standard OR textbook. Look for the chapter where Integer Programming is introduced.
Hope that helps you move forward.
The point is which objective function you want to minimize or maximize. If you only want to reduce the number of transfers, you must minimize the number of non-zero transfers: assuming you have a variable $x_{ij}$ for the transfer from $i$ to $j$, you should minimize $\sum y_{ij}$, where $y_{ij}$ is a binary variable that takes value $0$ when $x_{ij}=0$ and $1$ otherwise.
I guess you can formulate the overall model as a min-cost flow between cells, possibly with additional constraints and with a non-trivial objective function.
(By the way, if you need help you might also contact us at mosek on our google forum...)
I have a rather large (not too large, but possibly 50+) set of conditions that must be placed on a set of data (or rather, the data should be manipulated to fit the conditions).
For example, suppose I have a sequence of binary numbers of length n;
if n = 5 then an element of the data might be {0,1,1,0,0} or {0,0,0,1,1}, etc.
BUT there might be a set of conditions such as
x_3 + x_4 = 2
sum(x_even) <= 2
x_2*x_3 = x_4 mod 2
etc...
Because the conditions are quite complex, in that they come from experiment (although they can be written down in logical form) and are hard to diagnose, I would like instead to use a large sample set of valid data, i.e. data I know satisfies the conditions. In other words, it is easier to collect the data than it is to deduce the conditions that the data must abide by.
Having said that, basically what I'm doing is very similar to a neural network. The difference is that I would like an actual algorithm, in some sense optimal, in some form of code that I can run instead of the network.
It might not be clear what I'm actually trying to do. What I have is a set of data in some raw format that is unique and unambiguous but not appropriate for my needs (in a sense, the amount of data is too large).
I need to map the data into another set that is actually ambiguous to some degree but also has a certain specific set of constraints that all the data follows (certain things just cannot happen while others are preferred).
The unique constraints and preferences are hard to figure out. That is, the mapping from the non-ambiguous set to the ambiguous set is hard to describe (which is why it is ambiguous). The goal, actually, is to have an unambiguous map by supplying the right constraints if at all possible.
So, on the vein of my initial example, I'm given(or supply) a set of elements and need some way to derive a list of constraints similar to what I've listed.
In a sense, I simply have a set of valid data and train on it, very much like a neural network.
Then, after this "training", I'm given the mapping function, which I can then use on any element in my dataset; it will produce a new element satisfying the constraints, or, if it can't, will give as unambiguous a result as possible.
The main difference between neural networks and what I'm trying to achieve is that I'd like to end up with an algorithm I can code and use instead of having to run a neural network. The advantage is that the algorithm would probably be a lot less complex, would not need potential retraining, and would be a lot faster.
Here is a simple example.
Suppose my "training set" are the binary sequences and mappings
01000 => 10000
00001 => 00010
01010 => 10100
00111 => 01110
then from the "Magical Algorithm Finder"(tm) I would get a mapping out like
f(x) = x rol 1 (rol = rotate left)
or whatever way one would want to express it.
Then I could simply apply f to any other element, such as x = 01110, to generate a hopefully unambiguous output.
Of course there are many such functions that will work on this example, but the goal is to supply enough of the dataset to narrow it down to, hopefully, a few functions that make the most sense (at the very least, ones that always map the training set correctly).
In my specific case I could easily convert my problem into mapping the set of binary strings of length m to the set of base-B strings of length n. The constraints prevent some numbers from having an inverse; i.e., the mapping is injective but not surjective.
My algorithm could be a simple collection of if statements acting on the digits, if need be.
I think what you are looking for here is an application of Learning Classifier Systems (LCS - wiki). There are actually quite a few open-source LCS implementations available, but you may need to experiment with the parameters in order to get a good result.
LCS/XCS/ZCS have the features that you are looking for, including individual rules that can be heavily optimized, pressure to keep the rule set small, and, of course, a human-readable/understandable set of rules (unlike a neural net).