Matlab find closest function from plot - matlab

I have a probability distribution of some datasets. I have drawn this from the datasets in matlab. The question is that I want to generate similar data with these. I want to get the approximate function in this graph so that I can use it to generate new numbers. Is there a way to make this? either getting the nearest function of that plot OR some other suggestion to make this is appreciated.
EDIT: I have added the plot
A more general question that may be little more useful can be that
Given that there are some random variables I need an algorithm, software, technique to get similar random variable numbers.
I can understand that artificial intelligence techniques can be useful in this but I am wondering if there is a simple question. Possibly (my ai knowledge is not that good) some neural network or markov chain model can make this job but I believe that could be done easier. The function that I want is probability distribution function. I have 3 datasets and they are consistent with each other. They have very similar distributions.

Related

Are there any softwares that implemented the multiple output gauss process?

I am trying to implement bayesian optimization using gauss process regression, and I want to try the multiple output GP firstly.
There are many softwares that implemented GP, like the fitrgp function in MATLAB and the ooDACE toolbox.
But I didn't find any available softwares that implementd the so called multiple output GP, that is, the Gauss Process Model that predict vector valued functions.
So, Are there any softwares that implemented the multiple output gauss process that I can use directly?
I am not sure my answer will help you as you seem to search matlab libraries.
However, you can do co-kriging in R with gstat. See http://www.css.cornell.edu/faculty/dgr2/teach/R/R_ck.pdf or https://github.com/cran/gstat/blob/master/demo/cokriging.R for more details about usage.
The lack of tools to do cokriging is partly due to the relative difficulty to use it. You need more assumptions than for simple kriging: in particular, modelling the dependence between in of the cokriged outputs via a cross-covariance function (https://stsda.kaust.edu.sa/Documents/2012.AGS.JASA.pdf). The covariance matrix is much bigger and you still need to make sure that it is positive definite, which can become quite hard depending on your covariance functions...

Obtaining distribution from histogram

I have an array of values, with that values I plotted the histogram.I want to know the corresponding distribution from the histogram obtained. How is it possible.
Could you please explain the steps in obtaining appropriate probability distribution from histogram.
You'd better to ask this question in stats.stackexchange.com as it is more about the method than the programming. However, one thing that you can do is to fit a parametric distribution (using moment matching or maximum likelihood for example) then compare the fitted distribution to the normalized histogram using KL divergence or Bhattacharyya distance.
One option might be to use the "Distribution Fitting App" in the Statistics and Machine Learning Toolbox. That should help you evaluate if your data seems like it might have been drawn from some common distributions. You may never know for sure, since multiple distributions could account for the data, but if you have a lot of data it might help you narrow it down.
I think that in many cases an eye-ball comparison is enough. With a reasonable amount of data, it is quite difficult to not be able to distinguish between a gaussian or a weibull or...
I would use fitdist or fithist to eye-ball different distributions.
If you have no idea at all on the distribution and you want to know if two datasets are distributed differently it could be useful to compare their distributions by obtaining them with the option 'kernel'

Essential philosophy behind Support Vector Machine

I am studying Support Vector Machines (SVM) by reading a lot of material. However, it seems that most of it focuses on how to classify the input 2D data by mapping it using several kernels such as linear, polynomial, RBF / Gaussian, etc.
My first question is, can SVM handle high-dimensional (n-D) input data?
According to what I found, the answer is YES!
If my understanding is correct, n-D input data will be
constructed in Hilbert hyperspace, then those data will be
simplified by using some approaches (such as PCA ?) to combine it together / project it back to 2D plane, so that
the kernel methods can map it into an appropriate shape such a line or curve can separate it into distinguish groups.
It means most of the guides / tutorials focus on step (3). But some toolboxes I've checked cannot plot if the input data greater than 2D. How can the data after be projected to 2D?
If there is no projection of data, how can they classify it?
My second question is: is my understanding correct?
My first question is, does SVM can handle high-dimensional (n-D) input data?
Yes. I have dealt with data where n > 2500 when using LIBSVM software: http://www.csie.ntu.edu.tw/~cjlin/libsvm/. I used linear and RBF kernels.
My second question is, does it correct my understanding?
I'm not entirely sure on what you mean here, so I'll try to comment on what you said most recently. I believe your intuition is generally correct. Data is "constructed" in some n-dimensional space, and a hyperplane of dimension n-1 is used to classify the data into two groups. However, by using kernel methods, it's possible to generate this information using linear methods and not consume all the memory of your computer.
I'm not sure if you've seen this already, but if you haven't, you may be interested in some of the information in this paper: http://pyml.sourceforge.net/doc/howto.pdf. I've copied and pasted a part of the text that may appeal to your thoughts:
A kernel method is an algorithm that depends on the data only through dot-products. When this is the case, the dot product can be replaced by a kernel function which computes a dot product in some possibly high dimensional feature space. This has two advantages: First, the ability to generate non-linear decision boundaries using methods designed for linear classifiers. Second, the use of kernel functions allows the user to apply a classifier to data that have no obvious fixed-dimensional vector space representation. The prime example of such data in bioinformatics are sequence, either DNA or protein, and protein structure.
It would also help if you could explain what "guides" you are referring to. I don't think I've ever had to project data on a 2-D plane before, and it doesn't make sense to do so anyway for data with a ridiculous amount of dimensions (or "features" as it is called in LIBSVM). Using selected kernel methods should be enough to classify such data.

Functional form of 2D interpolation in Matlab

I need to construct an interpolating function from a 2D array of data. The reason I need something that returns an actual function is, that I need to be able to evaluate the function as part of an expression that I need to numerically integrate.
For that reason, "interp2" doesn't cut it: it does not return a function.
I could use "TriScatteredInterp", but that's heavy-weight: my grid is equally spaced (and big); so I don't need the delaunay triangularisation.
Are there any alternatives?
(Apologies for the 'late' answer, but I have some suggestions that might help others if the existing answer doesn't help them)
It's not clear from your question how accurate the resulting function needs to be (or how big, 'big' is), but one approach that you could adopt is to regress the data points that you have using a least-squares or Kalman filter-based method. You'd need to do this with a number of candidate function forms and then choose the one that is 'best', for example by using an measure such as MAE or MSE.
Of course this requires some idea of what the form underlying function could be, but your question isn't clear as to whether you have this kind of information.
Another approach that could work (and requires no knowledge of what the underlying function might be) is the use of the fuzzy transform (F-transform) to generate line segments that provide local approximations to the surface.
The method for this would be:
Define a 2D universe that includes the x and y domains of your input data
Create a 2D fuzzy partition of this universe - chosing partition sizes that give the accuracy you require
Apply the discrete F-transform using your input data to generate fuzzy data points in a 3D fuzzy space
Pass the inverse F-transform as a function handle (along with the fuzzy data points) to your integration function
If you're not familiar with the F-transform then I posted a blog a while ago about how the F-transform can be used as a universal approximator in a 1D case: http://iainism-blogism.blogspot.co.uk/2012/01/fuzzy-wuzzy-was.html
To see the mathematics behind the method and extend it to a multidimensional case then the University of Ostravia has published a PhD thesis that explains its application to various engineering problems and also provides an example of how it is constructed for the case of a 2D universe: http://irafm.osu.cz/f/PhD_theses/Stepnicka.pdf
If you want a function handle, why not define f=#(xi,yi)interp2(X,Y,Z,xi,yi) ?
It might be a little slow, but I think it should work.
If I understand you correctly, you want to perform a surface/line integral of 2-D data. There are ways to do it but maybe not the way you want it. I had the exact same problem and it's annoying! The only way I solved it was using the Surface Fitting Tool (sftool) to create a surface then integrating it.
After you create your fit using the tool (it has a GUI as well), it will generate an sftool object which you can then integrate in (2-D) using quad2d
I also tried your method of using interp2 and got the results (which were similar to the sfobject) but I had no idea how to do a numerical integration (line/surface) with the data. Creating thesfobject and then integrating it was much faster.
It was the first time I do something like this so I confirmed it using a numerically evaluated line integral. According to Stoke's theorem, the surface integral and the line integral should be the same and it did turn out to be the same.
I asked this question in the mathematics stackexchange, wanted to do a line integral of 2-d data, ended up doing a surface integral and then confirming the answer using a line integral!

Find class probabilities in matlab PNN and make ROC plot

I have a Probablistic Neural Network classification experiment set up in MATLAB. I can get the classes for unseen data using the sim command. Is there any way I can get the probabilities for the classes that the classifier calculates? Also, is there any direct way to plot the Reciever Operating Characterstic curve and calculate the Area Under the ROC for my classifier?
if you have the Statistics Toolbox, you can use perfcurve function added in recent versions of MATLAB to plot ROC curves and get AUC.
You may have better luck getting a response if you include a little more background and define your terms. I recognize ROC as receiver operating characteristic curve, but PNN and AUC are just alphabet soup to me. Don't make the mistake of assuming that someone outside of your very specific problem domain cannot help you. You have to build a bit of a language bridge by explaining your jargon first, though. This has the added advantage of making this particular question more useful to the stackoverflow community at large when it is eventually answered.