Speed of CPLEX vs CPLEX using SCIP - scipy

I am new to LP and have only briefly used PuLP in Python.
Why is there a speed difference between SCIP 3.2.1 - CPLEX 12.6.3 and CPLEX 12.6.3? Doesn't SCIP still use CPLEX for solving?
Why will someone use SCIP with CPLEX solver, instead of using CPLEX directly?

What is this difference about?
This plot is not showing an LP benchmark, but a mixed-integer programming benchmark.
Mixed-integer programming solvers typically use a branch-and-cut-based algorithm (including heuristics and more), in which many relaxations are solved in sequence; treating the binary/integer variables as continuous turns each of these subproblems into an LP.
One decision, then, is how to solve these relaxed subproblems. The simplest such decision (there are many more, e.g. tuning the simplex algorithm's parameters; it gets even more complex when solving problems with nonlinear conic objectives) is the choice of LP solver.
SoPlex is an LP-solver implementation by the SCIP team. This means:
SCIP - SoPlex will use SCIP's algorithm for MIP (handling branching, cut generation and so on), using SoPlex as the solver for the internal LP subproblems
SCIP - CPLEX will use SCIP's algorithm for MIP, using CPLEX as the solver for the internal LP subproblems
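To make the notion of an LP relaxation concrete, here is a minimal sketch using PuLP (which the asker already mentions); the toy model and the bundled CBC solver are illustrative assumptions, not part of the benchmark discussed above.

```python
# Sketch: the same toy model solved as a MIP and as its LP relaxation.
# A branch-and-cut solver such as SCIP solves many such relaxed subproblems
# internally, delegating each one to its configured LP solver (SoPlex, CPLEX, ...).
import pulp

def solve_toy(cat):
    prob = pulp.LpProblem("toy", pulp.LpMaximize)
    x = pulp.LpVariable("x", lowBound=0, upBound=10, cat=cat)
    y = pulp.LpVariable("y", lowBound=0, upBound=10, cat=cat)
    prob += 3 * x + 2 * y          # objective
    prob += 2 * x + y <= 11        # coupling constraint
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return pulp.value(prob.objective)

print("MIP optimum:  ", solve_toy("Integer"))     # 21.0
print("LP relaxation:", solve_toy("Continuous"))  # 21.5 (an optimistic bound)
```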
Why use SCIP with CPLEX (instead of a pure CPLEX approach)?
The why is not that easy to explain.
Keep in mind that all MIP solvers are heuristics-based, and on some problems SCIP will be faster than CPLEX (regardless of the underlying LP solver selected).
Keywords for some theory: NP-hardness (of MIP) and the No free lunch theorem
Faster here could mean faster because of the MIP-level strategies, not the speed of the underlying LP solver, so you may even gain an additional overall speedup by using CPLEX on the subproblems!
The two MIP solvers probably also differ considerably in their parameters and in the accessibility of internal algorithmic components. Obviously, you can tune SCIP in a much more general way than CPLEX (because it is open source).
As mattmilten mentioned in the comments, SCIP and CPLEX also differ in the problem classes they support. One example is support for certain special nonlinear constraints (resulting in a MINLP). When using SCIP for this kind of problem, you can still use CPLEX's LP solver internally (same arguments as above).

Related

what does impact a simulation runtime in Modelica

In order to make my model simulations in Modelica run faster, I am asking the following question:
What impacts simulation runtime in Modelica?
I will appreciate any help possible.
Edit: More details can be found in my book "Modelica by Application -- Power Systems" (URL)
What impacts the runtime performance?
I. Applied compilation techniques
Naturally, object-oriented Modelica models, even trivial ones, correspond to a large-scale system of equations. Modelica simulation environments usually optimize such generated models:
reduce the number of possible equations by removing trivial ones (i.e. alias equations)
decompose a large block of the equation system, using the so-called BLT transformation, into smaller cascaded blocks of equation systems that can be solved faster sequentially rather than as a single block of equations,
solve so-called large algebraic loops using tearing methods.
It can theoretically even go so far as to solve blocks of the equation system analytically, where possible, instead of conducting expensive numerical integration.
Thus, the runtime performance is influenced by the underlying Modelica compiler and by how far it exploits equation-based compiler methods. Usually some extra settings need to be activated to exploit all such techniques; digging into the documentation to enable these settings is needed.
II. The nature of the model
The nature of the model influences the runtime performance, in particular:
Is the model a large-scale system or a small-scale one?
Is it strongly nonlinear or semi-linear?
Is the optimized equation system corresponding to the model sparse (i.e. a large set of equations, each involving only a few variables, e.g. power system network models) or dense (e.g. multibody systems and biochemical networks)?
Is it a stiff system (e.g. a system with several subsystems, some exhibiting very fast dynamics and others very slow dynamics)?
Does the system exhibit a large number of state events?
...
III. The choice of the solver
The mentioned characteristics of a given model would typically influence the ideal choice of the solver. The solver can largely influence the runtime performance (and accuracy). A strategy for solver choice could be made in the following order:
For a non-stiff, weakly nonlinear model, the ideal choice would be an explicit method, e.g. a single-step Runge-Kutta or a multi-step Adams-Bashforth method of higher order. If accuracy is less significant, one can attempt an explicit method of a lower order, which executes faster. Naturally, increasing the solver error tolerance will also speed up the simulation.
However, it can happen, particularly for large-scale systems, that numerical stability is more difficult to guarantee. Then, smaller solver step sizes (and/or smaller error tolerances) should be attempted for explicit solvers. In this case, an implicit solver with a larger error tolerance can be comparable with an explicit solver with a smaller tolerance.
Actually, it is wise to try both methods, compare the accuracy of the results, and figure out whether the explicit methods produce comparably accurate results. However, as a warning, this is just a heuristic, since the system does not necessarily have the same behavior over the entire space of admissible parameter values.
For increasing nonlinearity of the model, the choice tends more towards modern solvers making use of variable-step-size techniques. Here I would start with implicit variable-step Runge-Kutta (i.e. single-step) methods and/or the implicit variable-step multi-step Adams-Moulton methods. For both of these classes, one can enlarge the solver tolerance and/or lower the solver error order and check whether the simulation still produces comparably accurate solutions (but with a faster runtime).
Implementations of the previous classes of methods are usually less conservative with error control. Therefore, for increasing stiffness of the model, or for badly scaled models, the choice tends more towards modern solvers implementing the numerically more stable backward differentiation formulas (BDF), such as DASSL, CVODE and IDA. These solvers (can) also make use of the Jacobian of the system for adaptive step-size control.
A modern solver like LSODAR, which switches between explicit and implicit solvers and also performs automatic error-order control (switching between different orders), is a good choice if one does not know much about the behavior of the model. Maybe some Modelica environments have an advanced solver making use of such automatic switching. However, if one knows the behavior of the model in advance, it is also wise to use the other suggested methods, since LSODAR may not perform the switching optimally when needed.
x. ...
The comparisons between solvers from classes 3, 4 and 5 are not straightforward to judge, and they also depend on whether the system is continuous or hybrid, i.e. on the underlying root-finding algorithms.
Usually DASSL can be slower, as it is more conservative with step-size/error control, so IDA and the others may be faster. Some published works exist that can give some intuition regarding such comparisons. It would be nice to have a Modelica library including all possible types of models and running all possible benchmarks w.r.t. accuracy and runtime, to draw more solver/model-specific conclusions. A library that could be used and extended for such a purpose is the ScalableTestSuite Modelica library.
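As a rough, hedged illustration of how much the solver choice matters for stiff models (shown here with Python/scipy rather than a Modelica tool), the sketch below integrates a stiff Van der Pol oscillator with an explicit method, a BDF method and the auto-switching LSODA; the parameter values are arbitrary.

```python
# Sketch: one stiff test problem, three solver families. The explicit RK45 needs
# far more (tiny) steps than the implicit BDF or the auto-switching LSODA.
import time
from scipy.integrate import solve_ivp

MU = 100.0  # stiffness parameter of the Van der Pol oscillator (illustrative)

def vdp(t, y):
    return [y[1], MU * (1.0 - y[0] ** 2) * y[1] - y[0]]

for method in ("RK45", "BDF", "LSODA"):
    start = time.perf_counter()
    sol = solve_ivp(vdp, (0.0, 200.0), [2.0, 0.0], method=method, rtol=1e-6)
    print(method, "steps:", sol.t.size, "wall time: %.2f s" % (time.perf_counter() - start))
```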
IV. Advanced aspects
There have been some published works in the Modelica community regarding making use of sparse solvers to exploit the expected sparsity of the Jacobian. If such a feature is provided by the simulation environment, this would usually significantly improve the runtime performance of large-scale models.
For models with a massive number of events, numerical integration in the standard way can be extremely inefficient. It is particularly challenging when, upon triggering an event, other state events are triggered in turn and a queue of state events has to be evaluated. The root-finding algorithm can trigger further events, and the solver can end up stuck in a so-called chattering situation. There are advanced strategies for such situations (so-called sliding mode), however I am not sure how far Modelica simulation environments are handling this issue.
One set of suggested solutions (also for systems with a high degree of stiffness) is to employ so-called QSS (quantized state system) methods. This would be significantly beneficial, particularly for models that cannot be solved using explicit solvers. There are both explicit and implicit QSS methods. There have also been other worthwhile numerical integration strategies where only subsets of the entire equation system are evaluated when approximating a state event. Here I am not sure about the availability of such solvers.
Some simulation environments differentiate between two simulation modes which can influence the simulation runtime: the ODE mode and the DAE mode. In the first mode, the system is reduced to an ODE system, potentially with additional cascaded blocks of nonlinear equation systems. In the DAE mode, the system is reduced to a DAE system of index one. The former mode is beneficial for dense systems exhibiting such large cascaded blocks of nonlinear equations, which are solved using so-called tearing methods instead of numerical integration. The DAE mode is beneficial for large-scale sparse systems solved using sparse solvers. I think the ODE mode is usually activated by choosing CVODE or LSODAR, while the DAE mode is activated by choosing IDA or DASSL, but digging into the documentation here and there is also recommended.
There are also some published works regarding so called multirate numerical integration solvers. Here, in each numerical integration step, only the numerically-significant portion of the equation system and not the entire equation system is integrated. Hence, this is significantly beneficial for large-scale stiff systems.
x. ...
V. Parallelization
Obviously, making use of multicore CPUs / GPUs for executing numerical integration in parallel, among other parallelization approaches, can speed up computations.
VI. Quite advanced topics
To draw attention to some excellent research efforts, some of which can be exploited to speed up the simulation runtime of large-scale (loosely coupled) hybrid networked models, I am listing this here as well. Speed-ups can be obtained by making use of hybrid paradigms, the agent-based modeling paradigm and/or the multimode paradigm. The idea behind these is that a loosely coupled system can be described as several smaller subsystems, with communication among the subsystems conducted only when necessary. This can be beneficial, and the reasons can be traced by searching for the relevant publications. There has been some excellent work in some of the mentioned directions, and it is worth continuing it where it has stopped.
Remark: Not every mentioned solver is necessarily available in all Modelica simulation environments. If a solver is not provided as a choice, one can still produce an FMU-ME (Functional Mock-up Unit for Model Exchange) and write code that numerically integrates this FMU with a desired solver, as sketched below.
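A hedged sketch of that workflow, assuming the third-party FMPy Python package and a placeholder file name 'Model.fmu' (the exact solver options depend on the package version):

```python
# Sketch: simulate an FMU exported from a Modelica tool outside the original
# simulation environment, here via FMPy's high-level API.
from fmpy import simulate_fmu

result = simulate_fmu(
    'Model.fmu',             # placeholder path to an exported model-exchange FMU
    stop_time=10.0,
    solver='CVode',          # variable-step stiff solver shipped with FMPy
    relative_tolerance=1e-6,
)
print(result['time'][-1])    # structured array with the recorded trajectories
```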
Warning: Some of the above aspects are based on personal experiences for a particular type of models and are not necessarily true for all model types.
A few suggested readings (I am definitely missing a lot of key publications):
F. Casella, Simulation of Large-Scale Models in Modelica: State of the Art and Future Perspectives, Modelica 2016
Liu Liu, Felix Felgner and Georg Frey, Comparison of 4 numerical solvers for stiff and hybrid systems simulation, Conference 2010
Willi Braun, Francesco Casella and Bernhard Bachmann, Solving large-scale Modelica models: new approaches and experimental results using OpenModelica, Modelica 2017
Erik Henningsson and Hans Olsson and Luigi Vanfretti, DAE Solvers for Large-Scale Hybrid Models, Modelica 2019
Tamara Beltrame and François Cellier, Quantised state system simulation in Dymola/Modelica using the DEVS formalism, Modelica 2006
Victorino Sanz and Federico Bergero and Alfonso Urquia, An approach to agent-based modeling with Modelica, Simpra 2010

Maximum number of decision variables in scipy linear programming module in python

Is there any maximum limit on the number of decision variables in the scipy linear programming module (minimization) in Python? If so, can the number of decision variables be extended to 10000? If scipy is limited in the number of decision variables, is there any other software that can be installed in Python so that I can proceed?
The original scipy simplex LP solver was only meant for very small problems. The newer scipy interior-point solver can handle larger problems more reliably. Also make sure to pass A_eq and/or A_ub as sparse matrices; if you don't, you may run out of memory.
Having said this, I would be more comfortable with LP solvers that have seen more large, sparse problems than scipy. Most LP solvers have a Python interface.
Finally, larger problems are often (but not always) more complex and it may help to use a modeling tool. This will allow you to express the problem in a more natural way than using matrices. For Python there is PuLP and Pyomo (among others). Some commercial solvers also provide excellent modeling tools.
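A minimal sketch of that advice, with made-up random data purely to show the mechanics (on recent scipy versions method='highs' is the recommended large-scale solver; on older versions use the interior-point method mentioned above):

```python
# Sketch: a 10,000-variable LP whose inequality matrix is kept sparse, so scipy
# never has to materialize a dense constraint matrix.
import numpy as np
from scipy import sparse
from scipy.optimize import linprog

n = 10_000
rng = np.random.default_rng(0)
c = rng.random(n)                                   # placeholder cost vector

A_ub = sparse.random(2_000, n, density=1e-3, format='csr', random_state=0)
b_ub = np.ones(2_000)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * n, method='highs')
print(res.status, res.fun)
```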

Linear Programming Solver for MATLAB, similar to cplexlp or linprog

I'm using MATLAB 2010b 64-bit and its CPLEX integration to solve an engineering problem. However, because of the memory leak of CPLEX, memory usage exceeds acceptable limits with CPLEX (100+ GB including virtual memory), hence I am not able to solve my problem. You can see a similar post here.
Then I tried MATLAB's linprog from the Optimization Toolbox, but the result was disappointing: the running time of the algorithm for a small problem instance increased from 80 CPU seconds to 2600 CPU seconds.
Now I need an LP solver integration for MATLAB that is similar to CPLEX or linprog. By "similar" I mean the way it accepts data input in the form (F, A, B, Aeq, Beq, etc.).
I must be able to use it in loops. Do you have any suggestions for that?
I would be very surprised if there were a memory leak in CPLEX. If you have a large problem, then the memory will grow with any sensible solver. Is there perhaps a memory leak in the interface to CPLEX? How big is your problem? Are you running multi-threaded? Each thread takes a copy of the problem and hence will eat a lot more memory.
You should not be surprised to find that other solvers take a lot longer than CPLEX to solve your problem. Certainly the free solvers will be very much slower than CPLEX for any large problem.
After some attempts to fix the MATLAB/CPLEX API's memory usage problem (memory leak), and after consulting some studies, I decided to switch to the Gurobi solver. For pure LP problems it seems to be slightly slower than CPLEX, but this may be due to the way I use Gurobi; someone else may find Gurobi faster than CPLEX. I suggested this in my previous posts under different questions. Here is an academic study: [Analysis of commercial and free and open source solvers for linear optimization problems][1]
[1] : http://www.statistik.tuwien.ac.at/forschung/CS/CS-2012-1complete.pdf

solve multiobjective optimization: CPLEX or Matlab?

I have to solve a multiobjective problem, but I don't know if I should use CPLEX or MATLAB. Can you explain the advantages and disadvantages of both tools?
Thank you very much!
This is really a question about choosing the most suitable modeling approach in the presence of multiple objectives, rather than deciding between CPLEX or MATLAB.
Multi-criteria Decision making is a whole sub-field in itself. Take a look at: http://en.wikipedia.org/wiki/Multi-objective_optimization.
Once you have decided on the approach and formulated your problem (either by collapsing your multiple objectives into a weighted one, or as series of linear programs) either tool will do the job for you.
Since you are familiar with MATLAB, you can start by using it to solve a series of linear programs (a goal programming approach). This page by Mathworks has a few examples with step-by-step details: http://www.mathworks.com/discovery/multiobjective-optimization.html to get you started.
This question is probably no longer a matter of current concern for you. However, my answer is rather universal, so let me post it here.
If solving a multiobjective problem means deriving a specific Pareto optimal solution, then you need to solve a single-objective problem obtained by scalarizing (aggregating) the objectives. The type of scalarization and the values of its parameters (if any) depend on the decision maker's preferences, e.g. how he/she/you want(s) to prioritize different objectives when they conflict with each other. Weighted sum, achievement scalarization (a.k.a. weighted Chebyshev) and lexicographic optimization are the most widespread types; see the sketch after this paragraph for the weighted-sum case. They have different advantages and disadvantages, so there is no universal recommendation here.
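A minimal sketch of the weighted-sum case, in Python with scipy since either tool can do this step; the objective data and the weights are illustrative assumptions that encode the decision maker's preferences:

```python
# Sketch: collapse two linear objectives c1·x and c2·x into one weighted sum and
# solve the resulting single-objective LP; each weight choice yields (at most)
# one Pareto optimal solution.
import numpy as np
from scipy.optimize import linprog

c1 = np.array([1.0, 2.0])           # first objective (illustrative data)
c2 = np.array([3.0, 1.0])           # second objective (illustrative data)
w1, w2 = 0.7, 0.3                   # assumed priorities of the decision maker

A_ub = np.array([[-1.0, -1.0]])     # x1 + x2 >= 1, rewritten as -x1 - x2 <= -1
b_ub = np.array([-1.0])

res = linprog(w1 * c1 + w2 * c2, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)                        # one Pareto optimal point for these weights
```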
CPLEX is preferred in the case where (A) your scalarized problem belongs to the class solved by CPLEX (obviously), e.g. it is a [mixed-integer] linear/quadratic problem, and (B) the problem is complex enough for computational time to be essential. CPLEX is specialized in this narrow class of problems and should be much faster than Matlab in complex cases.
You do not have to limit the choice of multiobjective methods to the ones offered by Matlab/CPLEX or other solvers (which are usually narrow). It is easy to formulate a scalarized problem by yourself, and then run appropriate single-objective optimization (source: it is one of my main research fields, see e.g. implementation for the class of knapsack problems). The issue boils down to finding a suitable single-objective solver.
If you want to obtain some general information about the whole Pareto optimal set, I recommend starting by deriving the nadir and the ideal objective vectors.
If you want to derive a representation of the Pareto optimal set, then besides the mentioned population-based heuristics such as GAs, there are exact methods developed for specific classes of problems. Examples: a library implemented in Julia, and a recently published method.
All concepts mentioned here are described in the comprehensive book by Miettinen (1999).
Can CPLEX solve a Pareto-type multiobjective problem? All I know is that it can solve simple goal programming by defining lexicographic objectives, or it can use the weighted sum, changing the weights gradually with sensitivity information to "enumerate" the Pareto front, which depends highly on the weights and looks very subjective.
You can refer here for how CPLEX solves the bi-objective case, which does not seem great.
For a true Pareto approach that includes ranking, I only know that some GA variants such as NSGA-II can do it.
A different approach would be to use a domain-specific modeling language for mathematical optimization like YALMIP (or JuMP.jl, if you would like to give Julia a try). There you can write your optimization problem in MATLAB with some extra YALMIP functionality and use CPLEX (or any other supported solver) as a backend, without being restricted to one solver.

Sensitivity analysis in LP solvers from MATLAB

As far as I understand, CPLEX, LP_solve and GLPK, among other LP solvers, offer sensitivity analysis.
I have the above three solvers installed on my machine, along with these two MATLAB wrappers:
CPLEX for MATLAB API (for CPLEX)
YALMIP (a general MATLAB wrapper for several solvers)
I looked in the documentation of these two wrappers but could not find a way of running sensitivity analysis from them. Do they support it? If not, are there any LP solvers that offer MATLAB support for their sensitivity analysis?
What do I mean by sensitivity analysis?
I mean sensitivity analysis with respect to the cost function and constraints. Conceptually speaking, sensitivity analysis tries to address the following question:
How would the solution change if some aspect of the problem is changed?
For example:
What is the range of values the coefficient for the variable j can take without affecting the optimality of the solution?
More specifically, here is a list of the Java, C++ and C APIs that CPLEX provides for sensitivity analysis.
Here is information about the sensitivity analysis provided by LP_solve. You can find the help text for the previous link within LP_solve's main reference guide by searching for "sensitivity" here.