Matlab's optimproblem class of objects allows users to define an Integer Linear Program (ILP) problems using symbolic variables. This is dubbed the "problem-based" formulation. Internal methods takes care of setting up the detailed ILP formulation by assembling the coefficient arrays and matrices for the objective function, equality constraints, and inequality constraints. In Matlab, these details are referred to as the "structure" for the "solver-based" formulation.
Users can see the order in which the optimproblem.Variables are taken in setting up the solver-based formulation by using prob2struct to explicitly convert an optimizationproblem object into a solver-based structure. The Algorithms section of the prob2struct page, the variables are taken in the order in which they appear in the optimizationproblem.Variables property.
I haven't been able to find what determines this order. Is there any way to control het order, maybe even change it if necessary? This would allow one to control the order of the scalar variables of the archetypal ILP problem setup, i.e., the solver-based formulation.
Thanks.
Reason for this question
I'm using Matlab as a prototyping environment, and may be relying on others to develop based on the prototype, possibly calling other solver engines. An uncontrolled ordering of variables makes it hard to compare, especially if the development has a deterministic way of arranging the variables. Hence my wish to control the variable ordering. If this is not possible, it would be nice to know. I would then know to turn my attention completely to mitigating the challenge of disparately ordered variables.
Related
I have seen MICE implemented with different types of algorithms e.g. RandomForest or Stochastic Regression etc.
My question is that does it matter which type of algorithm i.e. does one perform the best? Is there any empirical evidence?
I am struggling to find any info on the web
Thank you
Yes, (depending on your task) it can matter quite a lot, which algorithm you choose.
You also can be sure, the mice developers wouldn't out effort into providing different algorithms, if there was one algorithm that anyway always performs best. Because, of course like in machine learning the "No free lunch theorem" is also relevant for imputation.
In general you can say, that the default settings of mice are often a good choice.
Look at this example from the miceRanger Vignette to see, how far imputations can differ for different algorithms. (the real distribution is marked in red, the respective multiple imputations in black)
The Predictive Mean Matching (pmm) algorithm e.g. makes sure that only imputed values appear, that were really in the dataset. This is for example useful, where only integer values like 0,1,2,3 appear in the data (and no values in between). Other algorithms won't do this, so while doing their regression they will also provide interpolated values like on the picture to the right ( so they will provide imputations that are e.g. 1.1, 1.3, ...) Both solutions can come with certain drawbacks.
That is why it is important to actually assess imputation performance afterwards. There are several diagnostic plots in mice to do this.
I was wondering if there exists a technical way to choose initial parameters to these kind of problems (as they can take virtually any form). My question arises from the fact that my solution depends a little on initial parameters (as usual). My fit consists of 10 parameters and approximately 5120 data points (x,y,z) and has non linear constraints. I have been doing this by brute force, that is, trying parameters randomly and trying to observe a pattern but it led me nowhere.
I also have tried using MATLAB's Genetic Algorithm (to find a global optimum) but with no success as it seems my function has a ton of local minima.
For the purpose of my problem, I need justfy in some manner the reasons behind choosing initial parameters.
Without any insight on the model and likely values of the parameters, the search space is too large for anything feasible. Think that just trying ten values for every parameter corresponds to ten billion combinations.
There is no magical black box.
You can try Bayesian Optimization to find a global optimum for expensive black box functions. Matlab describes it's implementation [bayesopt][2] as
Select optimal machine learning hyperparameters using Bayesian optimization
but you can use it to optimize any function. Bayesian Optimization works by updating a prior belief over a distribution of functions with the observed data.
To speed up the optimization I would recommend adding your existing data via the InitialX and InitialObjective input arguments.
I am using Gurobi 7.0 through Matlab. Based on the documentation, in order to find the n best solutions you need to set the parameters:
PoolSearchMode=2, to find alternative optimal solutions in a systematic way.
PoolSolutions=n, number of of solution in the pool.
When I do this my result contains the same fields as with the default parameters, i.e. only one solution. I have also tried changing the parameter SolutionNumber, but it does not affect the outcome.
I suspect the alternative optimal solutions are being found, since the solver reports on the prompt a solution count equivalent to n with objective values, but I am not able to retrieve them. I hope this is not another limitation of the Gurobi Matlab API.
Also, I know I could find these solutions using integer cuts, but from my understanding that would be much more inefficient since it would require to start the branch and bound tree from the beginning.
It is not possible. The Gurobi Matlab interface is limited because it does not treat the model as a class, even though Matlab offers object oriented programming. This limits many functionalities. CPLEX however allows Matlab users to interact with the model class and retrieve solutions from the solutions pool.
I am currently running a multiple linear regression using MATLAB's LinearModel.fit function, and I am bit confused in regards to how to properly add interaction terms to the model by hand. As I am aware, LinearModel.fit does not standardize variables on its own, so I have been doing so manually.
So far, the way I have done it has been to
Standardize the observations for each variables
Multiply corresponding standardized values from specific variables to create the interaction terms and then add these new variables to the set of regression data
Run the regression
Is this the correct way to go about doing this? Should I standardize the interaction term variables also after calculating the 'raw' terms? Any help would be greatly appreciated!
Whether or not to standardize interaction terms probably depends on what you intend to do with the model. Standardization typically does not affect model performance as much as it allows for more straightforward model interpretation as your learned coefficients will be on similar scales. I suspect whether to do this or not is largely a matter of opinion. Here is a relevant stats.stackexchange post that may help.
My intuition would be the same as how you have described your process so far.
I have to solve a multiobjective problem but I don't know if I should use CPLEX or Matlab. Can you explain the advantage and disadvantage of both tools.
Thank you very much!
This is really a question about choosing the most suitable modeling approach in the presence of multiple objectives, rather than deciding between CPLEX or MATLAB.
Multi-criteria Decision making is a whole sub-field in itself. Take a look at: http://en.wikipedia.org/wiki/Multi-objective_optimization.
Once you have decided on the approach and formulated your problem (either by collapsing your multiple objectives into a weighted one, or as series of linear programs) either tool will do the job for you.
Since you are familiar with MATLAB, you can start by using it to solve a series of linear programs (a goal programming approach). This page by Mathworks has a few examples with step-by-step details: http://www.mathworks.com/discovery/multiobjective-optimization.html to get you started.
Probably this question is not a matter of your current concern. However my answer is rather universal, so let me post it here.
If solving a multiobjective problem means deriving a specific Pareto optimal solution, then you need to solve a single-objective problem obtained by scalarizing (aggregating) the objectives. The type of scalarization and values of its parameters (if any) depend on decision maker's preferences, e.g. how he/she/you want(s) to prioritize different objectives when they conflict with each other. Weighted sum, achievement scalarization (a.k.a. weighted Chebyshev), and lexicographic optimization are the most widespread types. They have different advantages and disadvantages, so there is no universal recommendation here.
CPLEX is preferred in the case, where (A) your scalarized problem belongs to the class solved by CPLEX (obviously), e.g. it is a [mixed integer] linear/quadratic problem, and (B) the problem is complex enough for computational time to be essential. CPLEX is specialized in the narrow class of problems, and should be much faster than Matlab in complex cases.
You do not have to limit the choice of multiobjective methods to the ones offered by Matlab/CPLEX or other solvers (which are usually narrow). It is easy to formulate a scalarized problem by yourself, and then run appropriate single-objective optimization (source: it is one of my main research fields, see e.g. implementation for the class of knapsack problems). The issue boils down to finding a suitable single-objective solver.
If you want to obtain some general information about the whole Pareto optimal set, I recommend to start with deriving the nadir and the ideal objective vectors.
If you want to derive a representation of the Pareto optimal set, besides the mentioned population based-heuristics such as GAs, there are exact methods developed for specific classes of problems. Examples: a library implemented in Julia, a recently published method.
All concepts mentioned here are described in the comprehensive book by Miettinen (1999).
Can cplex solve a pareto type multiobjective one? All i know is that it can solve a simple goal programming by defining the lexicographical objs, or it uses the weighted sum to change weights gradually with sensitivity information and "enumerate" the pareto front, which highly depends on the weights and looks very subjective.
You can refer here as how cplex solves the bi-objetive one, which seems not good.
For a true pareto way which includes the ranking, i only know some GA variants can do like NSGA-II.
A different approach would be to use a domain-specific modeling language for mathematical optimization like YALMIP (or JUMP.jl if you like to give Julia a try). There you can write your optimization problem with Matlab with some extra YALMIP functionalities and use CPLEX (or any other supported solver as a backend) without restricting to one solver.