There is an unknown spatial process on a square.
How can I generate a random sample of points on the square if I only have
the distribution of the average distance between points? Is there a generic model or a simple idea?
What if I have the distance matrix of one realization of such a
process (~500 observations)? Would that matrix of distances between the points of the realization be useful?
The time dimension is of no use here; I don't have any temporal data about the process.
I created 2-dimensional random datasets (each composed of a set of points and a column of labels) for centroid-based k-means clustering in MATLAB. Each point is represented by a vector of X and Y (the point coordinates), and each label gives the cluster the data point belongs to; see the example in the figure below.
I applied the k-means clustering algorithm to these point datasets. I need help with the following:
What function can I use to evaluate the accuracy of the k-means algorithm? In more detail: my aim is to score the k-means algorithm by how many labels it assigns correctly, compared with the labels assigned by MATLAB. For example, I want to verify whether the point (7.200592168, 11.73878455) is assigned to the same cluster as the point (6.951107307, 11.27498898), and so on.
If I understand your question correctly, you are looking for the adjusted Rand index. It scores the similarity between your MATLAB labels and your k-means labels without requiring the cluster numbers themselves to match.
Alternatively, you can create a confusion matrix to visualise the mapping between your two label sets.
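To make this concrete, here is a minimal sketch that builds the confusion (contingency) table and computes the adjusted Rand index from it using only base MATLAB. The label vectors `trueLabels` and `kmeansLabels` are made-up example data, not taken from the question.

```matlab
% Hypothetical label vectors (illustration only).
trueLabels   = [1 1 1 2 2 2 3 3 3];
kmeansLabels = [2 2 2 1 1 3 3 3 3];

% Map both label sets to consecutive integers 1..K.
[~, ~, a] = unique(trueLabels(:));
[~, ~, b] = unique(kmeansLabels(:));

% Contingency table: C(i,j) = number of points with true label i
% and k-means label j.
C = accumarray([a b], 1);
n = numel(a);

% Number of unordered pairs among x items ("x choose 2").
pairs = @(x) x .* (x - 1) / 2;

sumCells = sum(pairs(C(:)));          % pairs grouped together in both labelings
sumRows  = sum(pairs(sum(C, 2)));     % pairs grouped together in trueLabels
sumCols  = sum(pairs(sum(C, 1)));     % pairs grouped together in kmeansLabels

expected = sumRows * sumCols / pairs(n);   % value expected by chance
maxIndex = (sumRows + sumCols) / 2;        % value for a perfect match
ari = (sumCells - expected) / (maxIndex - expected);
```

A value of 1 means the two labelings define the same partition (even if the cluster numbers differ), and values near 0 mean agreement is no better than chance. `C` itself is the confusion matrix mentioned above and can be inspected directly.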
I would use squared error.
You are trying to minimize the total squared distance between each point and the mean coordinate of its cluster.
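As a rough illustration, the total within-cluster squared error can be computed as follows. The points `X` and labels `idx` are made-up values; in practice `idx` would be whatever your clustering returned.

```matlab
% Hypothetical data: six 2-D points and their cluster labels.
X   = [1 1; 1 2; 2 1; 8 8; 9 8; 8 9];
idx = [1; 1; 1; 2; 2; 2];

k = max(idx);
sse = 0;
for c = 1:k
    members  = X(idx == c, :);            % points assigned to cluster c
    centroid = mean(members, 1);          % mean coordinate of the cluster
    dev      = bsxfun(@minus, members, centroid);
    sse      = sse + sum(dev(:).^2);      % add squared distances to the centroid
end
```

Note that this measures how compact the clusters are, not how well they agree with a reference labeling, so it answers a slightly different question than a label-comparison score.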
I have a correlation matrix for N random variables, each uniformly distributed on [0,1]. I am trying to simulate these random variables; how can I do that? Note that N > 2. I tried using the Cholesky decomposition, and these are my steps:
get the lower-triangular Cholesky factor of the correlation matrix (L is N-by-N)
independently sample each of the N uniformly distributed random variables 10000 times (S is N-by-10000)
multiply the two: L*S. This gives me correlated samples, but they no longer lie within [0,1].
How can I solve this problem?
I know that if I have only 2 random variables I can do something like:
rho*x1+sqrt(1-rho^2)*y1
to get my correlated sample y. But with more than two correlated variables, I am not sure what to do.
You can get approximate solutions by generating correlated normals using the Cholesky factorization, then converting them to U(0,1) variables using the normal CDF. The solution is approximate because the normals have the desired correlation, but the conversion to uniforms is a non-linear transformation, and only linear transformations preserve correlation.
There's a transformation available which will give exact solutions if the transformed Var/Cov matrix is positive semidefinite, but that's not always the case. See the abstract at https://www.tandfonline.com/doi/abs/10.1080/03610919908813578.
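A minimal sketch of this approach in base MATLAB, assuming a made-up 3-by-3 target matrix `R`. The sine adjustment `2*sin(pi*R/6)` is one correction I know of for uniform marginals under a normal copula; whether it is the same transformation as in the linked paper is an assumption on my part. As noted above, the adjusted matrix is not guaranteed to be positive definite, in which case `chol` will fail.

```matlab
rng(0);                                % fixed seed, only for reproducibility

% Hypothetical target correlation matrix for the uniforms.
R = [1.0 0.5 0.3;
     0.5 1.0 0.4;
     0.3 0.4 1.0];
n = size(R, 1);
nSamples = 10000;

% Correlation the normals need so that the uniforms end up with R.
% Skip this line (use Rz = R) for the plain approximate version.
Rz = 2 * sin(pi * R / 6);
Rz(1:n+1:end) = 1;                     % keep the diagonal exactly 1

L = chol(Rz, 'lower');                 % errors if Rz is not positive definite
Z = L * randn(n, nSamples);            % correlated standard normals

% Standard normal CDF written with erfc, so no toolbox is required.
U = 0.5 * erfc(-Z / sqrt(2));          % each row is U(0,1)

sampleCorr = corrcoef(U');             % compare with R
```

Each row of `U` is one simulated variable. Because the CDF maps every normal draw into [0,1], the range problem from multiplying uniforms by `L` directly does not arise.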
I am using a chessboard to estimate the translation vector between it and the camera. First the intrinsic camera parameters are calculated, then the translation vector is estimated using n points detected on the chessboard.
I found a very strange phenomenon: the translation vector is accurate and stable when more chessboard points are used, and this effect becomes more pronounced as the distance increases. For instance, with 1 cm x 1 cm squares and a distance of 3 m, the translation vector is estimated accurately when using 25 points, while it is inaccurate and unstable when using the minimal 4 points. However, at a distance of 0.6 m, the estimates from 4 points and from 25 points are similar, and both are accurate.
How can this phenomenon be explained (in theory)? What is the relationship between the stability of the estimate, the distance, and the number of points?
Thanks.
When you use a smaller number of points, the calculation of the translation vector is more sensitive to noise in the coordinates of those points. Point coordinates are noisy due to the finite resolution of the camera (among other things), and the effect of that noise only increases with distance. So using a larger number of points should provide a better estimate.
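A rough back-of-the-envelope version of this argument, assuming an ideal pinhole camera looking straight at the board. The symbols here are my own notation, not from the question: $f$ is the focal length in pixels, $W$ the physical extent spanned by the points used, $Z$ the distance, $w$ the extent in the image, and $\delta w$ the pixel-level measurement error.

```latex
\[
w = \frac{f\,W}{Z}
\quad\Longrightarrow\quad
Z = \frac{f\,W}{w},
\qquad
|\delta Z| \approx \left|\frac{\partial Z}{\partial w}\right| |\delta w|
           = \frac{f\,W}{w^{2}}\,|\delta w|
           = \frac{Z^{2}}{f\,W}\,|\delta w| .
\]

\[
\text{Averaging } n \text{ points with independent noise of standard deviation } \sigma:
\qquad
\sigma_{\text{avg}} \approx \frac{\sigma}{\sqrt{n}} .
\]
```

Under these assumptions the same pixel error produces a depth error that grows with $Z^2$: going from 0.6 m to 3 m multiplies it by $(3/0.6)^2 = 25$. More points help in two ways: the noise is averaged down, and a larger set of points typically spans a larger extent $W$ on the board. At short range both effects are small compared with the signal, which would be consistent with 4 and 25 points giving similar results there.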
Task: I am working in MATLAB and I have to construct a dendrogram from the maximum values of a matrix of Euclidean distances.
What I have done so far: I have constructed the distance matrix based on the correlation coefficients of price returns (this is what I have in my application). I have also built the MST based on these distances. Now I have to construct the ultrametric matrix, which is obtained by defining the subdominant ultrametric distance D*ij between i and j as the maximum value of any Euclidean distance Dkl encountered when moving in single steps from i to j in the MST.
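% Correlation-based distance between the return series
CorrelMatrix = corrcoef(Returns);
DistMatrix = sqrt(2 .* (1 - CorrelMatrix));
% Minimum spanning tree (Prim's algorithm) on the distance graph
DG = sparse(DistMatrix);
[ST, pred] = graphminspantree(DG, 'Method', 'Prim');
% linkage expects a condensed distance vector, not a square matrix
Z = linkage(squareform(DistMatrix, 'tovector'), 'single');
dendrogram(Z)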
I am a newbie in MATLAB and I do not know whether there is a function I should use to find the maximum distance between two nodes along the MST path and then store it in a matrix.
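One possible way to get this matrix without walking the MST explicitly is a Floyd-Warshall style update, where the cost of a path is its largest edge and the best path between i and j is the one with the smallest such maximum. That minimax path always lies on the MST, so the result is the subdominant ultrametric. The sketch below uses a small made-up distance matrix `D` in place of `DistMatrix`.

```matlab
% Hypothetical symmetric distance matrix with zero diagonal.
D = [0 1 4 5;
     1 0 2 6;
     4 2 0 3;
     5 6 3 0];

n = size(D, 1);
U = D;                                  % start from the direct distances
for k = 1:n
    % Best path from i to j that is allowed to pass through node k:
    % its cost is the larger of the two legs i->k and k->j.
    viaK = bsxfun(@max, U(:, k), U(k, :));
    U = min(U, viaK);                   % keep the smaller "largest edge"
end
% U(i,j) is now the largest edge on the MST path between i and j.
```

For this toy matrix the MST edges are 1-2 (length 1), 2-3 (length 2) and 3-4 (length 3), so for example `U(1,4)` is 3. If the Statistics Toolbox is available, I believe `cophenet` applied to a single-linkage tree returns the same distances in condensed form, but I have not checked that against this sketch.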
I have written a simple SOM algorithm in MATLAB. My big challenge is how to visualize/plot the data as a U-Matrix, Sample Hits, and Component/Input Planes. These three plots exist in the SOM toolbox in MATLAB, but I cannot call them on the output of my own code, because they need a 'net' as input and my code does not create any 'net'.
Is there any guidance?
You can create your own functions, as they are not too complicated. For the explanation I will assume a SOM of size 20x20x4 (400 nodes, 4 features).
The Hit-Map is obtained by presenting each sample to the already trained SOM and incrementing by 1 the counter of the node chosen as the Best Matching Unit (BMU). Then you plot this map. So if node(1,1) fires 10 times and node(1,2) fires 100 times, you will have an image where node(1,2) has a higher intensity than node(1,1).
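A minimal sketch of the hit map, assuming the trained weights are stored in a rows-by-cols-by-features array called `node` as in the example. Both `node` and `samples` are random placeholders here, not real data.

```matlab
rng(1);                                  % fixed seed, only for reproducibility
node    = rand(20, 20, 4);               % hypothetical trained SOM weights
samples = rand(500, 4);                  % hypothetical data, one row per sample

[rows, cols, nFeat] = size(node);
W = reshape(node, rows * cols, nFeat);   % one row per node (column-major order)

hits = zeros(rows, cols);
for s = 1:size(samples, 1)
    % Squared Euclidean distance from this sample to every node.
    d = sum(bsxfun(@minus, W, samples(s, :)).^2, 2);
    [~, bmu] = min(d);                   % linear index of the best matching unit
    hits(bmu) = hits(bmu) + 1;           % count one hit for that node
end

% imagesc(hits); colorbar;               % uncomment to display the map
```

Because `reshape` and linear indexing both use column-major order, `hits(bmu)` refers to the same grid position as row `bmu` of `W`.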
The U-Matrix is a map representing the average distance between each node's weight vector and those of its closest neighbours. So here you calculate the Euclidean distance between the feature vector of node X and that of every neighbour. For example, if node(1,1,:)=[1,1,2,3], node(1,2,:)=[2,2,1,1], and node(2,1,:)=[1,1,1,1], then the summed distance for node(1,1) is norm(squeeze(node(1,1,:)-node(1,2,:)))+norm(squeeze(node(1,1,:)-node(2,1,:)))=4.8818. You can use this sum as U(1,1) directly, or divide it by the number of neighbours (2 here, giving 2.4409) to get the average.
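A small sketch of the averaged version, assuming a rectangular grid where each node's neighbours are the nodes directly above, below, left and right of it (a hexagonal grid would need different offsets). The 2x2 array `node` reuses the example vectors; the fourth vector is made up to fill the grid.

```matlab
% Example weights from the text, plus one hypothetical node(2,2).
node = zeros(2, 2, 4);
node(1, 1, :) = [1 1 2 3];
node(1, 2, :) = [2 2 1 1];
node(2, 1, :) = [1 1 1 1];
node(2, 2, :) = [0 0 0 0];               % made-up values

[rows, cols, ~] = size(node);
offsets = [-1 0; 1 0; 0 -1; 0 1];        % up, down, left, right
U = zeros(rows, cols);

for r = 1:rows
    for c = 1:cols
        w = squeeze(node(r, c, :));
        dists = [];
        for k = 1:size(offsets, 1)
            rr = r + offsets(k, 1);
            cc = c + offsets(k, 2);
            % Skip neighbours that fall outside the grid.
            if rr >= 1 && rr <= rows && cc >= 1 && cc <= cols
                dists(end + 1) = norm(w - squeeze(node(rr, cc, :)));
            end
        end
        U(r, c) = mean(dists);           % average distance to the neighbours
    end
end

% imagesc(U); colorbar;                  % uncomment to display the U-matrix
```

High values in `U` mark nodes whose weights differ strongly from their neighbours, which is what shows up as cluster borders in the plot.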
The Component/Input Planes are the simplest and do not require any processing. You just pick each feature of the SOM map and plot it. So in our example of a 20x20x4 SOM, you have 4 features and therefore 4 component planes, which you can plot with, for example, imagesc(node(:,:,1)) for feature 1.