At my workplace I have one MATLAB license on a virtual machine, which includes the Statistics Toolbox. I like to use that instance of MATLAB to import CSV data into dataset arrays, because of the convenience they provide.
However, I'd like to use the imported data on my local machine, which has its own license for MATLAB but (unfortunately) no Statistics Toolbox.
What is the best way to convert the dataset object to something that can be used with only base MATLAB? dataset2struct? It seems that if I'm just converting it back to a structure, I might as well write a function that imports the data directly into a structure. Or is there any other way to work with a dataset array in a MATLAB instance that lacks the Statistics Toolbox?
In MATLAB R2013b (out this September; a prerelease is available now), there will be something similar to a dataset array in base MATLAB called a table data container (I haven't tried it yet, and can't be sure it will be exactly the same). There will also be a categorical array similar to the one currently in the Statistics Toolbox.
Until then, there's not really a way to use a dataset array without the Statistics Toolbox, and I would suggest either of the two methods you mention (personally I'd go with just using a structure throughout, as I find the convenience of dataset arrays to be overrated - but that's just my experience; yours may differ).
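If you do go the conversion route, a minimal sketch (assuming ds is your dataset array; the file name is illustrative):

% On the machine with the Statistics Toolbox: convert and save plain data.
s = dataset2struct(ds);    % struct array, one element per observation
save('mydata.mat', 's');   % a MAT-file loads fine without any toolbox

% On the machine without the toolbox:
tmp = load('mydata.mat');
s = tmp.s;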
I am aware that .p files are protected MATLAB files, so I am not trying to access their contents. However, I was wondering if I could use their output on the MATLAB shell as input to a MATLAB program.
What I mean is the following: I have to simulate a dynamic system in MATLAB using a controller, and afterwards I need to assess its performance. This is done by the .p file. Now, the controller behaviour is defined by six distinct variables, and I pretty much know their range. So what I did was set up an optimization to find the optimal coefficients. However, when I run the .p file I see that the coefficients I obtained as optimal are in fact not optimal, i.e. my cost function is biased in some way.
So, what I would like to do is use the output of the .p file (there are always six strings, each with only two numerical values - so they would be easy to extract if it were a text file) to run a new optimization, so that I can understand what I did wrong in my original cost function.
The alternative is finding the parameters starting from my values by trial and error, but considering there are six variables I would prefer a more mathematically pure approach.
Basically, the question is how I can read the output that a MATLAB .p function prints to the command prompt, and use it as input to a MATLAB function.
Thanks for the help!
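One way to do this is with evalc, which captures everything a call prints to the command window as a string. A sketch (assess_controller is a hypothetical stand-in for your .p function):

% Capture the command-window output of the .p function as a string.
out = evalc('assess_controller(k1, k2, k3, k4, k5, k6)');  % hypothetical call
% Extract every numeric token from the captured text.
tokens = regexp(out, '[-+]?\d+\.?\d*(?:[eE][-+]?\d+)?', 'match');
vals = str2double(tokens);   % feed these into the new optimization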
I have different matrices to import into a Simulink MATLAB Function block from the workspace. These matrices all have different dimensions, which I don't know a priori.
At the beginning I tried using a Constant block, putting the data all together in a structure like this (field names and sizes here are illustrative):
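data.A = rand(3, 5);     % 3-by-5 matrix
data.B = rand(10, 2);    % 10-by-2 matrix
data.C = rand(7, 7);     % 7-by-7 matrix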
But then, I cannot pick the right matrix, since I don't know the dimension of each element (and a Mux block cannot be used to split matrices).
I think I will have the same problem with the From Workspace block.
I was wondering if there is a smart way to import heterogeneous structures like these. I also tried cell arrays, but they do not seem to be supported by Simulink.
Thanks for any suggestions.
If the data is to be used in a MATLAB Function block, you could define the workspace matrices as parameters in the Model Explorer and in the MATLAB Function port editor. You then have them accessible inside that function without even needing the Constant blocks or drawing any signals.
Even if your final intent is not to feed the data into a MATLAB Function block, those blocks are quite useful for extracting signals from heterogeneous data, since you can do some size/type checking in them. Then you can output "simulink friendly" signals for use elsewhere.
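As a sketch (names are illustrative), a MATLAB Function block can reference the workspace matrices directly once they are declared as Parameter-scope data:

function y = useMatrices(u)
% A and B are declared in the Ports and Data Manager with scope
% "Parameter", so they bind to the workspace matrices A and B
% without any input ports or signal wiring.
y = u + sum(A(:)) + sum(B(:));   % fixed-size (scalar) output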
Is there any package available for multiple imputation? Or any reference I can use to write my own function? Since the percentage of missing data is really high in some columns of the data (approximately 50–70%), I think multiple imputation is a good choice.
If you have the Bioinformatics Toolbox installed, check knnimpute for more details. It imputes missing data using a nearest-neighbor method.
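A minimal sketch (X is your data matrix, with NaN marking the missing entries):

Xi = knnimpute(X);       % impute each NaN from the single nearest neighbor
Xi4 = knnimpute(X, 4);   % or use the 4 nearest neighbors instead

Note that knnimpute performs single imputation, not multiple imputation, so it won't reflect the uncertainty of the imputed values.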
If I were to create interpolated splines from a large amount of data (about 400 charts, 500,000 values each), how could I then access the coordinates of those splines from another piece of software quickly and efficiently?
Initially I intended to run a regression on the data and use the resulting formula in my delphi program, but that turned out to be a bigger pain than I thought.
I am currently using Matlab but I can use another software if need be.
Edit: It is probably relevant that this data represents the empirical cumulative distribution of some other data (which I already have in a database).
Here is what one of these charts would look like.
The emphasis is on speed of access. I intend to use this data to run simulations on financial data.
MATLAB has a command for converting a spline into a piecewise polynomial. You can then extract the breaks and the coefficients of each polynomial piece with unmkpp, and evaluate them in another program.
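A sketch (x and y stand in for one chart's data):

pp = spline(x, y);   % interpolating cubic spline in piecewise-polynomial form
[breaks, coefs, npieces, order, dim] = unmkpp(pp);
% Piece i is evaluated as polyval(coefs(i,:), xq - breaks(i))
% for query points xq in [breaks(i), breaks(i+1)].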
If you are also familiar with C, you could use MATLAB Coder or something similar to get an intermediate library to connect your Delphi program and MATLAB. Interfacing Delphi and C code is, albeit a tad tedious, certainly possible (or it was back in the days of Delphi 7). Or you could even write the algorithm in MATLAB, convert the code to C using MATLAB Coder, and call the generated C library from within Delphi.
Perhaps a bit overkill, but you can store your data in a database (e.g. MySQL) from MATLAB and retrieve them from Delphi.
Finally: is Delphi a real constraint? You could also use MATLAB to do the simulations, as you might have the same tools (or even more) available in MATLAB as in Delphi. Afterwards you can just share the results, which I suppose is less speed-critical.
My initial guess at doing this efficiently would be to create a memory mapped file in MATLAB using memmapfile, stuff a look-up table with your data into that, then open the memory mapped file in your Delphi code and read the data from that file.
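A sketch of the MATLAB side (xGrid and cdfVals are stand-ins for one chart's lookup table):

% Write the lookup table to a flat binary file of doubles.
fid = fopen('lut.bin', 'w');
fwrite(fid, [xGrid(:); cdfVals(:)], 'double');
fclose(fid);

% Map it back; the Delphi side can read the same flat doubles directly.
n = numel(xGrid);
m = memmapfile('lut.bin', 'Format', {'double', [n 2], 'lut'});
lut = m.Data.lut;   % column 1: x values, column 2: cdf values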
The fastest is most likely a look-up table that you save to disk and that you load and use in your simulation code (although: why not run the simulation in Matlab?)
You can evaluate the spline for a finely-grained list of values of x using fnval, and use the closest value of x to look up the CDF.
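A sketch (sp is the spline; the grid bounds and size are up to you):

xx = linspace(xmin, xmax, 100000);   % finely-grained grid over the data range
yy = fnval(sp, xx);                  % evaluate the spline (Curve Fitting Toolbox)
% Store [xx(:) yy(:)] as the lookup table the simulation reads.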
I am trying to do some text classification with SVMs in MATLAB and would really like to know if MATLAB has any methods for feature selection (chi-squared, mutual information, ...). I want to try various methods and keep the best one, and I don't have time to implement all of them. That's why I am looking for such methods in MATLAB. Does anyone know?
MATLAB has svmtrain. It also has other utilities for classification like cluster analysis, random forests, etc.
If you don't have the required toolbox for svmtrain, I recommend LIBSVM. It's free and I've used it a lot with good results.
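For reference, the basic LIBSVM MATLAB calls look like this (Xtrain, ytrain, Xtest, ytest are your own variables; the options string is just an example):

% Note: LIBSVM's svmtrain shadows the toolbox function of the same name.
model = svmtrain(ytrain, sparse(Xtrain), '-t 0 -c 1');    % linear kernel, C = 1
[pred, acc, dec] = svmpredict(ytest, sparse(Xtest), model);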
The Statistics Toolbox has sequentialfs. See also the documentation on feature selection.
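A minimal sketch of sequentialfs (X is the feature matrix, y the labels; the wrapped classifier here is LDA via classify, but any classifier works):

% Criterion: number of misclassifications on the held-out fold.
fun = @(XT, yT, Xt, yt) sum(yt ~= classify(Xt, XT, yT));
opts = statset('Display', 'iter');
[inmodel, history] = sequentialfs(fun, X, y, 'cv', 5, 'options', opts);
% inmodel is a logical mask over the columns (features) of X.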
A similar approach is dimensionality reduction. In MATLAB you can easily perform PCA or Factor analysis.
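For example (X is the feature matrix; how many components to keep is up to you):

[coeff, score] = princomp(zscore(X));   % PCA on standardized features
Xred = score(:, 1:10);                  % keep the first 10 principal components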
Alternatively you can take a wrapper approach to feature selection. You would search through the space of features, taking a subset of features each time and evaluating that subset using any classification algorithm you choose (LDA, decision tree, SVM, ...). You can do this exhaustively or use some kind of heuristic to guide the search (greedy, GA, SA, ...), as sketched below.
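An exhaustive wrapper sketch (only feasible for a small number of features d; classify is again LDA, standing in for whatever classifier you pick):

d = size(X, 2);
best = inf; bestSubset = [];
ldafun = @(XT, yT, Xt) classify(Xt, XT, yT);   % predictor for crossval
for mask = 1:2^d - 1                           % every non-empty subset
    feats = find(bitget(mask, 1:d));
    err = crossval('mcr', X(:, feats), y, 'Predfun', ldafun);  % CV error
    if err < best
        best = err;
        bestSubset = feats;
    end
end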
If you have access to the Bioinformatics Toolbox, it has a randfeatures function that does a similar thing. There are even a couple of cool demos of actual use cases.
This might help:
There are two ways of selecting features for classification:
Using fselect.py from the libsvm tools directory (http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#feature_selection_tool)
Using sequentialfs from the Statistics Toolbox.
I would recommend using fselect.py as it provides more options - like automatic grid search for optimal parameters (using grid.py). It also provides an F-score based on the discrimination ability of the features (see http://www.csie.ntu.edu.tw/~cjlin/papers/features.pdf for details of the F-score).
Since fselect.py is written in Python, you can either use the Python interface or, as I prefer, use MATLAB to perform a system call to Python:
system('python fselect.py <training file name>')
It's important that you have Python installed and libsvm compiled (and that you are in the tools directory of libsvm, which has grid.py and other files).
It is necessary to have the training file in libsvm format (sparse format). You can do that by using the sparse function in MATLAB and then libsvmwrite:
xtrain_sparse = sparse(xtrain);
libsvmwrite('filename.txt', ytrain, xtrain_sparse);
Hope this helps.
For sequentialfs with libsvm, you can see this post:
Features selection with sequentialfs with libsvm