Distributed computing with or-tools - or-tools

I want to distribute the work of a cp solver algorithm (or other deterministic Algorithms) in a simple way instead of looking for ways to break apart each specific problem into smaller ones.
Is it possible to execute the cp solver with a given number of threads (E.g., 64) and then have 8 different solvers run the same problem with a specific subset of 8 branches?
In this way I could distribute the solving of the problem in a generic way and then just check choose the best result returned.

Related

Can you reproduce results of VRP?

If I am using, say guided local search for a VRP, I keep getting slightly different results even when I set a seed to generate my data. Is this normal because it is a heuristic and not an algorithm or is there something else I am missing?
That is normal, and yes it is because it uses heuristics, not a "real" optimization since VRP is an NP-hard problem. Even if you run it on a different system that has different hardware configurations, especially the CPU, there is a possibility that it will produce different results. Everything depends on how complex is the VRP problem, the more complex the problem the probability of producing different results will increase.

results of two feature selection algo do not match

I am working on two feature selection algorithms for a real world problem where the sample size is 30 and feature size is 80. The first algorithm is wrapper forward feature selection using SVM classifier, the second is filter feature selection algorithm using Pearson product-moment correlation coefficient and Spearman's rank correlation coefficient. It turns out that the selected features by these two algorithms are not overlap at all. Is it reasonable? Does it mean I made mistakes in my implementation? Thank you.
FYI, I am using Libsvm + matlab.
It can definitely happen as both strategies do not have the same expression power.
Trust the wrapper if you want the best feature subset for prediction, trust the correlation if you want all features that are linked to the output/predicted variable. Those subsets can be quite different, especially if you have many redundant features.
Using top correlated features is a strategy which assumes that the relationships between the features and the output/predicted variable are linear, (or at least monotonous in case of Spearman's rank correlation), and that features are statistically independent one from another, and do not 'interact' with one another. Those assumptions are most often violated in real world problems.
Correlations, or other 'filters' such as mutual information, are better used either to filter out features, to decide which features not to consider, rather than to decide which features to consider. Filters are necessary when the initial feature count is very large (hundreds, thousands) to reduce the workload for a subsequent wrapper algorithm.
Depending on the distribution of the data you can either use spearman or pearson.The latter is used for normal distribution while former for non-normal.Find the distribution and use appropriate one.

Is it possible to improve speed in ODE solvers from matlab? (ode45 ode15s etc)

I wrote a code to solve a system using ode45 and ode15s in matlab. I am wondering if I can improve the speed of the code using multiple core (or parallel code) in my script.
Anyone have tried this ??
Thanks
No, you can't.
All numerical integrators, ode45 and friends included, use some form of iterative scheme to solve the user-implemented (coupled) non-linear (partial) differential equations.
Each new step in the iterative schemes of ode45/15s/.. (to compute the new state of the system) depends on the previous step (the old state of the system), therefore, these numerical integrators cannot be parallelized effectively.
The only speedup you can do that's likely to have a big impact is to optimize your implementation of the differential equation.
From my experience, the only way to use multiple cores for ODE suite solvers in MATLAB is to use "parfor loop" to start multiple computations together at the same time, your single computation not be any faster, but you can start many with different parameters and have multiple solutions after that long wait. So if you need to start ODE many times that might speed up your work.
To speed up one ODE function it also a good idea to play with RelTol and AbsTol settings (changes time form seconds to hours), using Jpattern option can also be very helpful (my almost tridiagonal pattern made it run twice as fast). If your differential equation is simple maybe try to compile it first, or at least vectorize (I used to write some part of code in Java and then point MATLAB to use compiled .class file). Obviously the length of your solution vector plays important role, so don't make it more than a few hounded.

Choosing Clustering Method based on results

I'm using WEKA for my thesis and have over 1000 lines of data. The database includes demographical information (Age, Location, status etc.) followed by name of products (valued 1 or 0). The end results is a recommender system.
I used two methods of clustering, K-Means and DBScan.
When using K-means I tried 3 different number of cluster, while using DBscan I chose 3 different epsilons (Epsilon 3 = 48 clusters with ignored 17% of data, Epsilone 2.5 = 19 clusters while cluster 0 holds 229 items with ignored 6%.) Meaning i have 6 different clustering results for same data.
How do I choose what's best suits my data ?
What is "best"?
As some smart people noticed:
the validity of a clustering is often in the eye of the beholder
There is no objectively "better" for clustering, or you are not doing cluster analysis.
Even when a result actually is "better" on some mathematical measure such as separation, silhouette or even when using a supervised evaluation using labels - its still only better at optimizing towards some mathematical goal, not to your use case.
K-means finds a local optimal sum-of-squares assignment for a given k. (And if you increase k, there exists a better assignment!) DBSCAN (it's actually correctly spelled all uppercase) always finds the optimal density-connected components for the given MinPts/Epsilon combination. Yet, both just optimize with respect to some mathematical criterion. Unless this critertion aligns with your requirements, it is worthless. So there is no best, until you know what you need. But if you know what you need, you would not need to do cluster analysis.
So what to do?
Try different algorithms and different parameters and analyze the output with your domain knowledge, if they help you with the problem you are trying to solve. If they help you solving your problem, then they are good. If they do not help, try again.
Over time, you will collect some experience. For example, if the sum-of-squares is meaningless for your domain, don't use k-means. If your data does not have meaningful density, don't use density based clustering such as DBSCAN. It's not that these algorithms fail. They just don't solve your problem, they solve a different problem that you are not interested in. And they might be really good at solving this other problem...

Matlab and GPU/CUDA programming

I need to run several independent analyses on the same data set.
Specifically, I need to run bunches of 100 glm (generalized linear models) analyses and was thinking to take advantage of my video card (GTX580).
As I have access to Matlab and the Parallel Computing Toolbox (and I'm not good with C++), I decided to give it a try.
I understand that a single GLM is not ideal for parallel computing, but as I need to run 100-200 in parallel, I thought that using parfor could be a solution.
My problem is that it is not clear to me which approach I should follow. I wrote a gpuArray version of the matlab function glmfit, but using parfor doesn't have any advantage over a standard "for" loop.
Has this anything to do with the matlabpool setting? It is not even clear to me how to set this to "see" the GPU card. By default, it is set to the number of cores in the CPU (4 in my case), if I'm not wrong.
Am I completely wrong on the approach?
Any suggestion would be highly appreciated.
Edit
Thanks. I'm aware of GPUmat and Jacket, and I could start writing in C without too much effort, but I'm testing the GPU computing possibilities for a department where everybody uses Matlab or R. The final goal would be a cluster based on C2050 and the Matlab Distribution Server (or at least this was the first project).
Reading the ADs from Mathworks I was under the impression that parallel computing was possible even without C skills. It is impossible to ask the researchers in my department to learn C, so I'm guessing that GPUmat and Jacket are the better solutions, even if the limitations are quite big and the support to several commonly used routines like glm is non-existent.
How can they be interfaced with a cluster? Do they work with some job distribution system?
I would recommend you try either GPUMat (free) or AccelerEyes Jacket (buy, but has free trial) rather than the Parallel Computing Toolbox. The toolbox doesn't have as much functionality.
To get the most performance, you may want to learn some C (no need for C++) and code in raw CUDA yourself. Many of these high level tools may not be smart enough about how they manage memory transfers (you could lose all your computational benefits from needlessly shuffling data across the PCI-E bus).
Parfor will help you for utilizing multiple GPUs, but not a single GPU. The thing is that a single GPU can do only one thing at a time, so parfor on a single GPU or for on a single GPU will achieve the exact same effect (as you are seeing).
Jacket tends to be more efficient as it can combine multiple operations and run them more efficiently and has more features, but most departments already have parallel computing toolbox and not jacket so that can be an issue. You can try the demo to check.
No experience with gpumat.
The parallel computing toolbox is getting better, what you need is some large matrix operations. GPUs are good at doing the same thing multiple times, so you need to either combine your code somehow into one operation or make each operation big enough. We are talking a need for ~10000 things in parallel at least, although it's not a set of 1e4 matrices but rather a large matrix with at least 1e4 elements.
I do find that with the parallel computing toolbox you still need quite a bit of inline CUDA code to be effective (it's still pretty limited). It does better allow you to inline kernels and transform matlab code into kernels though, something that