Is there a k-nearest-neighbour regression function in MATLAB, or is only a k-NN classification function available? Does anybody know of any useful literature on this?
Regards
Farideh
I don't believe the k-NN regression algorithm is directly implemented in MATLAB, but with some googling you can find working implementations. The algorithm is fairly simple, though:
Find the k nearest elements using whatever distance metric is suitable.
Compute the inverse-distance weight of each of the k elements.
Compute the weighted mean of the k elements using the inverse-distance weights.
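The three steps can be sketched in a few lines. This is a minimal illustration in Python rather than MATLAB, assuming Euclidean distance and a plain 1/d weight; the function name knn_regress and its parameters are made up for this example and do not come from any toolbox.

```python
import math

def knn_regress(train_x, train_y, query, k=3, eps=1e-12):
    """Predict a value for `query` as the inverse-distance-weighted mean
    of the targets of its k nearest training points (Euclidean distance)."""
    # Step 1: find the k nearest training points.
    dists = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]

    # An exact match would get an infinite weight, so return its target directly.
    exact = [y for d, y in nearest if d < eps]
    if exact:
        return sum(exact) / len(exact)

    # Step 2: inverse-distance weight of each neighbour.
    weights = [1.0 / d for d, _ in nearest]

    # Step 3: weighted mean of the neighbours' targets.
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

train_x = [(0.0,), (1.0,), (2.0,), (10.0,)]
train_y = [0.0, 1.0, 2.0, 10.0]
prediction = knn_regress(train_x, train_y, (1.5,), k=2)
```

Here the two nearest neighbours of 1.5 are equally far away, so the prediction is simply the average of their targets.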
Related
I have read many tutorials and tried a number of MinHash LSH implementations, but none of them can generate the similarity matrix; instead they return only the similar items whose similarity exceeds the threshold. How can I generate the matrix? My intention is to use the LSH results for clustering.
The whole point of LSH is to avoid pairwise distances, because that does not scale.
If you then put the data into a distance matrix, you get all the scalability problems again!
Instead, consider an algorithm like DBSCAN clustering. It doesn't need a distance matrix, only the neighbors within distance epsilon of each point.
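To illustrate why no distance matrix is needed, here is a minimal DBSCAN sketch in Python. It only ever asks "which points lie within eps of point i?". The neighbors helper below answers that by brute force purely for illustration; the assumption is that in an LSH setting this query would be answered from the LSH candidates instead. All names are hypothetical.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point:
    0, 1, ... for clusters and -1 for noise."""
    def neighbors(i):
        # Range query: indices of all points within eps of point i.
        # Brute force here; an index or LSH lookup could replace it.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # point i is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # noise reachable from a core point: border
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # j is a core point too, keep expanding
                queue.extend(nbrs)
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Only the labels list and the current neighbor lists are held in memory, never all pairwise distances at once.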
My aim is to classify the data into two sections, upper and lower, by finding the mid line between the peaks.
I would like to apply machine learning methods, e.g. discriminant analysis.
Could you let me know how to do that in MATLAB?
It seems that what you are looking for is a GMM (Gaussian mixture model). With K=2 (the number of mixture components) and a dimension of 1, this is a simple, fast method that gives you a direct solution. Given the fitted components, it is easy to find the local minimum between them analytically (it is approximately a weighted average of the two means, with weights determined by the standard deviations).
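A rough sketch of this idea in Python, using a hand-written EM loop for a two-component 1-D mixture. The function names, the initialisation at the data extremes, and the fixed iteration count are assumptions made for this example. split_point is one reading of the "weighted average" above: each mean is weighted by the other component's standard deviation, so the split sits at equal standardized distance from both means and closer to the narrower peak.

```python
import math

def fit_gmm_1d(data, iters=200):
    """Fit a two-component 1-D Gaussian mixture with EM.
    Returns (weights, means, stds), each a list of length 2."""
    n = len(data)
    mean_all = sum(data) / n
    std_all = math.sqrt(sum((x - mean_all) ** 2 for x in data) / n) or 1.0
    # Initial guess (an assumption): one component at each extreme of the data.
    w, mu, sd = [0.5, 0.5], [min(data), max(data)], [std_all, std_all]

    def pdf(x, m, s):
        return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

    for _ in range(iters):
        # E-step: responsibility of component 0 for each point.
        resp = []
        for x in data:
            p0 = w[0] * pdf(x, mu[0], sd[0])
            p1 = w[1] * pdf(x, mu[1], sd[1])
            total = p0 + p1
            resp.append(p0 / total if total > 0 else 0.5)
        # M-step: re-estimate weight, mean and std of each component.
        for k, r in enumerate((resp, [1.0 - v for v in resp])):
            nk = sum(r) or 1e-12
            w[k] = nk / n
            mu[k] = sum(ri * x for ri, x in zip(r, data)) / nk
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, data)) / nk
            sd[k] = max(math.sqrt(var), 1e-6)  # floor avoids a collapsed component
    return w, mu, sd

def split_point(mu, sd):
    """Point between the two means at equal standardized distance from both."""
    return (mu[0] * sd[1] + mu[1] * sd[0]) / (sd[0] + sd[1])

data = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 4.9, 5.0, 5.1, 5.0, 4.95, 5.05]
weights, means, stds = fit_gmm_1d(data)
threshold = split_point(means, stds)
upper = [x for x in data if x > threshold]
lower = [x for x in data if x <= threshold]
```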
I'm using the k-means algorithm to cluster my data.
I have 5,000 samples. Each sample describes a customer; to analyse customer value, I'm clustering the customers based on 4 behavior features.
I have tried calculating the distance with both the Euclidean metric and Pearson correlation.
I need to know whether Euclidean distance or Pearson correlation is the correct way to calculate distances.
I'm using the silhouette to validate my clustering. With Pearson correlation the silhouette value is higher than with the Euclidean metric.
Does this mean that Pearson correlation is the more appropriate distance measure?
k-means does not support arbitrary distances.
It is based on variance minimization, which corresponds to (squared) Euclidean distance.
With Pearson correlation, it will fail badly.
See this answer for an example how k-means fails badly with Pearson:
https://stackoverflow.com/a/21335448/1060350
Short summary: the mean does not work for Pearson, but k-means is based on computing means. Instead, use PAM or a similar method that uses medoids.
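A small Python illustration of the problem, with made-up data and helper names. For the four vectors below the component-wise mean is a constant vector, for which the Pearson correlation is undefined, whereas a medoid is always an actual member of the cluster and therefore always has a defined distance to the others. The medoid function shows only the medoid selection step, not the full PAM algorithm.

```python
import math

def pearson_distance(a, b):
    """1 - Pearson correlation; NaN when either vector has zero variance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return math.nan
    return 1.0 - cov / math.sqrt(va * vb)

def medoid(cluster, dist):
    """Member of `cluster` with the smallest total distance to all members."""
    return min(cluster, key=lambda c: sum(dist(c, other) for other in cluster))

cluster = [(1.0, 2.0, 3.0), (3.0, 2.0, 1.0), (2.0, 4.0, 6.0), (6.0, 4.0, 2.0)]

# The k-means style centre: the component-wise mean, here (3.0, 3.0, 3.0).
mean = tuple(sum(col) / len(cluster) for col in zip(*cluster))
mean_to_first = pearson_distance(mean, cluster[0])  # undefined (NaN)

# The PAM style centre: an actual data point.
centre = medoid(cluster, pearson_distance)
```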
I read about spherical k-means but did not come across an implementation. To be clear, similarity is simply the dot product of two document unit vectors. I have read that standard k-means uses distance as its measure. Is the distance in question the vector distance from coordinate geometry, sqrt((x2 - x1)^2 + (y2 - y1)^2)?
There are more clustering methods than k-means. The problem with k-means is not so much that it is built on Euclidean distance, but that the mean must reduce the distances for the algorithm to converge.
However, there are tons of other clustering algorithms that do not need to compute a mean or require the triangle inequality. If you read the Wikipedia article on DBSCAN, it also mentions a version called GDBSCAN, Generalized DBSCAN. You should definitely be able to plug your similarity function into GDBSCAN. Most likely, you could just use 1/similarity as a distance function, unless the algorithm requires the triangle inequality. So this trick should work with DBSCAN and OPTICS, for example, and probably also with hierarchical clustering, k-medians and k-medoids (PAM).
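As a sketch of how a dot-product similarity on unit vectors could be turned into a distance, here are two options in Python. The helper names are invented for this example. inverse_similarity_distance is the 1/similarity trick mentioned above; it is only defined for strictly positive similarity and gives 1 rather than 0 for identical vectors, so it is an ordering more than a true metric. cosine_distance (1 - similarity) is a swapped-in alternative, not something the answer prescribes. The last lines check the identity that, for unit vectors, the squared Euclidean distance equals 2 - 2 * similarity.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def inverse_similarity_distance(a, b):
    """1/similarity; treated as infinitely far apart when similarity <= 0."""
    s = dot(a, b)
    return 1.0 / s if s > 0 else math.inf

def cosine_distance(a, b):
    """1 - similarity; 0 for identical unit vectors, 1 for orthogonal ones."""
    return 1.0 - dot(a, b)

a = normalize([1.0, 2.0, 0.0])
b = normalize([2.0, 1.0, 1.0])

# For unit vectors: ||a - b||^2 == 2 - 2 * (a . b)
squared_euclidean = sum((x - y) ** 2 for x, y in zip(a, b))
from_similarity = 2.0 - 2.0 * dot(a, b)
```

Either function can be passed to an algorithm that accepts a user-defined distance, such as the threshold-based methods listed above.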
I'm working on my thesis project on financial mathematics. One problem I'm having is that I want to find out if there is some correlation between a theoretical curve and scatter point data.
Here is the scatter data and the theoretical curve that I have.
Is there some easy way of doing this?
Bivariate correlation (usually Pearson correlation) is a statistic that measures linear dependence between two sets of data. The theoretical curve in your link does not seem to consist of discrete data points, therefore it is not possible to calculate a correlation between it and a set of data.
Depending on the model and the research question you have, you might be interested in analyzing the fit of your data to the model, using multivariate regression analysis or a general[ized] linear model. These MATLAB commands could be useful: regress (multiple linear regression), regstats (regression diagnostics), glmfit (generalized linear model regression) and glmval (generalized linear model values).
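One simple way to quantify the fit, sketched here in Python rather than with the MATLAB commands above, is to evaluate the theoretical curve at the x-values of the scatter points and compute the coefficient of determination from the residuals. This assumes the curve is available as a function of x; the line used below and the data are placeholders invented for the example, not the curve or data from the question.

```python
def r_squared(xs, ys, curve):
    """Coefficient of determination of `curve` against the points (xs, ys)."""
    predicted = [curve(x) for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                # total sum of squares
    return 1.0 - ss_res / ss_tot

def theoretical_curve(x):
    # Placeholder model for illustration only.
    return 2.0 * x + 1.0

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.5, 4.5, 7.0]
fit_quality = r_squared(xs, ys, theoretical_curve)
```

A value close to 1 means the curve explains most of the variation in the data; a value near or below 0 means it does no better than the mean of the data.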