MATLAB: give IDs to points stored in a matrix to distinguish between neighbours

Even though the title might sound trivial at first, I hope someone can help me by giving me hints about the MATLAB functions I can use:
I have a matrix of points with properties for each (read: individuals with properties) of the form (x, y, direction):
A = [1 1 45°]
B = [3 1 225°]
C = [0 2 90°]
D = [5 5 187°]
With a probability P, particle A chooses one of B and C as its neighbour and turns its direction according to that neighbour (while D is too far away). EDIT: it also moves towards the chosen neighbour with a constant velocity (I basically forgot the most important part of the question ..., stupid me).
I have now implemented this as a single matrix:
I = [1 1 45; 3 1 225; 0 2 90; 5 5 187];
In one scenario, A randomly chooses C as its attractive neighbour and turns towards it. This means my program has to be able to distinguish between B and C.
Is there perhaps a type like "point" that can store properties under an ID? Do I have to use separate vectors instead of one matrix? I am working with a lot of individuals right now, so preallocating 50 vectors would not be optimal (which is why I chose a matrix).
To make a clear question:
I have a lot of points; for each point I need to store three properties under an ID, and then check, for one point with ID x, which other points with IDs y are within reach.
The mathematics are irrelevant for now, but I need a MATLAB approach that handles this better than a plain matrix (which seems poorly suited to identifying each point). This is part of a flocking simulation for individuals.
If anyone can help me with this I would be very happy! If I asked the question in a bad way, please give me feedback as well so I can clarify.
Thanks!

From what I understood, you can do the following:
When you store your elements in the original matrix, let the row index be their ID.
Since the points do not change location but only orientation, you can compute the matrix of relative distances just once (the upper triangle of an n-by-n matrix).
In the distance matrix, use the same IDs (row indices) you have from your first matrix for the same objects. Your search is then a min-search over ~0.5*n^2 elements.
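To make that concrete, here is a minimal sketch of the idea, assuming the matrix I from the question and a hypothetical interaction radius R (pdist and squareform require the Statistics Toolbox):
% Rows of I are the individuals; the row index doubles as the ID.
I = [1 1 45; 3 1 225; 0 2 90; 5 5 187];
R = 3;                                 % hypothetical interaction radius
% n-by-n matrix of pairwise distances between positions (columns 1:2).
D = squareform(pdist(I(:, 1:2)));
% For the individual with ID 1, find the IDs of all others within reach.
id = 1;
neighbours = find(D(id, :) <= R & (1:size(I, 1)) ~= id);
% Pick one neighbour at random and read off its direction.
pick = neighbours(randi(numel(neighbours)));
newDirection = I(pick, 3);
With the example data this yields neighbours = [2 3], i.e. B and C, while D stays out of reach.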


Matlab plot scatterplot with intermediate coordinates

This is an exam task that I have. Let's say I have a 200x6 matrix where 200 people rated a movie with respect to 6 questions, each on a continuous [0, 1] scale (0: disagree, 1: agree).
To get a useful overview of the 6-dimensional dataset I want to plot the rank-2 approximation of the data. First I compute the rank-2 approximation:
A = rand(200, 6); % some data (placeholder for the real 200x6 matrix)
[U, S, V] = svd(A);
Ak = U(:, 1:2) * S(1:2, 1:2) * V(:, 1:2)';
I want to plot this approximation as a 2D scatterplot with a "*" mark per survey participant, using either U or V coordinates as intermediate coordinates depending on how my data is organized. The problem is that I don't know what intermediate coordinates are, and I can't find a good explanation anywhere. I wonder if someone could help, possibly with a small code example. Any help appreciated, thank you.
Formally, the intermediate axes are (orthogonal) linear combinations of your data (along maximum explained variance, a.k.a. principal components).
If most of the data have a similar shape (e.g. a [5 4 3 2 1 0] pattern), then the first component will be similar to this shape/vector, since the variance around it is minimal (or: the variance along it is maximal). Likewise, the next components minimize the remaining variance in orthogonal planes.
So the answer is: principal components 1 and 2.
More precisely: the first intermediate coordinate value can be understood as the magnitude of that "first main pattern" in a single data sample.
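For a concrete picture, a minimal sketch with random stand-in data (rows of A as observations) could look like this:
A = rand(200, 6);                 % stand-in for the 200x6 survey data
[U, S, V] = svd(A);
P = U(:, 1:2) * S(1:2, 1:2);      % intermediate coordinates, one row per participant
plot(P(:, 1), P(:, 2), '*');      % 2D scatter with one '*' per participant
xlabel('component 1'); ylabel('component 2');
Each row of P places one participant in the plane spanned by the first two principal directions.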

3-D Plotting with MATLAB for Galton's Skewness and Moors' Kurtosis

I know there are many plotting documents for MATLAB online and I am pretty sure this has been asked many times. I apologize in advance for any inconvenience.
I am dealing with a new distribution and I need to draw a 3D plot for different values of its parameters. (I could do it with Excel or other programs, but since my other graphs are drawn with MATLAB, I need to produce this 3D plot in MATLAB too, in order to publish it in an article.) I calculated the result using MATLAB loops, but the plotting is giving me the hardest time, so I had no choice but to ask for your assistance. I have these equations for different alphas and betas with a constant sigma, and I calculate Galton's skewness and Moors' kurtosis with the last two equations.
median=sqrt(2*(sigma^2)*beta*gammaincinv(0.5,alpha));
q1=sqrt(2*(sigma^2)*beta*gammaincinv((6/8),alpha));
q3=sqrt(2*(sigma^2)*beta*gammaincinv((2/8),alpha));
q4=sqrt(2*(sigma^2)*beta*gammaincinv((7/8),alpha));
q5=sqrt(2*(sigma^2)*beta*gammaincinv((5/8),alpha));
q6=sqrt(2*(sigma^2)*beta*gammaincinv((3/8),alpha));
q7=sqrt(2*(sigma^2)*beta*gammaincinv((1/8),alpha));
galtonskewness=(q1-2*median+q3)/(q1-q3);
moorskurtosis=(q4-q5+q6-q7)/(q1-q3);
Let's assume that,
sigma=1
beta=[0.1 0.2 0.5 1 2 5];
alpha=[0.1 0.2 0.5 1 2 5];
I have used mesh(X,Y,Z) for the same range of alphas and betas with the same increment, but I get the error "these values cannot be complex". I just want to draw something like the one below.
It must be something easy that I am missing out, but I do not understand where the mistake is. I appreciate any help. Thank you!
I ran the above code for a 2D mesh of points for alpha and beta between 0.1 and 5 for both dimensions and I got results for both.
I suspect it's due to your alpha and beta declaration. You are only providing a few points, and if you try to use mesh, it won't give good results. Therefore, define a meshgrid of points for both alpha and beta, then vectorize your MATLAB code to produce the kurtosis and skewness surfaces. Only in certain situations should you use for loops; in general, you should avoid them whenever possible.
How meshgrid works is that given a range of X and Y values, it will produce two (or three if you want 3D co-ordinates) arrays where each location in each array gives you the spatial co-ordinate at that particular location. Therefore, if we did something like:
[X,Y] = meshgrid(1:3, 1:3);
This is what we get:
X =
1 2 3
1 2 3
1 2 3
Y =
1 1 1
2 2 2
3 3 3
Notice that in a 2D grid, for the top-left corner, (x,y) = (1,1), and so for the corresponding location in X, we get 1 and Y we get 1. If you do the same logic for any other position in the 2D grid, you simply look at the X and Y values in each array and it will tell you what the component is for each dimension.
As such, instead of looping through all possible points in your grid, generate them all using meshgrid, then vectorize the computation by calculating your values all at once rather than individually. Once you do this, you have the right structure to be able to put this into mesh.
Therefore, try doing this instead:
%// Define meshgrid of points
[alpha,beta] = meshgrid(0.1:0.1:5, 0.1:0.1:5);
%// From your code
sigma = 1;
%// Calculate quantities - Notice that this is all vectorized
med=sqrt(2*(sigma^2)*beta.*gammaincinv(0.5,alpha));
q1=sqrt(2*(sigma^2)*beta.*gammaincinv((6/8),alpha));
q3=sqrt(2*(sigma^2)*beta.*gammaincinv((2/8),alpha));
q4=sqrt(2*(sigma^2)*beta.*gammaincinv((7/8),alpha));
q5=sqrt(2*(sigma^2)*beta.*gammaincinv((5/8),alpha));
q6=sqrt(2*(sigma^2)*beta.*gammaincinv((3/8),alpha));
q7=sqrt(2*(sigma^2)*beta.*gammaincinv((1/8),alpha));
galtonskewness=(q1-2*med+q3)./(q1-q3);
moorskurtosis=(q4-q5+q6-q7)./(q1-q3);
%// Show our meshes
figure;
mesh(alpha, beta, galtonskewness);
figure;
mesh(alpha, beta, moorskurtosis);
Also take note that I renamed your median variable to med. MATLAB has a built-in function called median, and you don't want to unintentionally shadow that function with a variable of the same name.
This is what I get:
Take note that I'm not getting the plots that you placed in your post. It may be because I'm choosing the wrong variables to define the mesh, or perhaps your equations are incorrect. Double-check what you know in theory against what you have here in code and try again.
This should hopefully give you enough to start with though!

How can I generate a set of n dimensional vectors that contains all integer points in an n-dimensional rectangular prism

Okay, so I'm working on a problem related to quantum chaos, and one of the things I need to do is map the unit cube in n dimensions to a parallelepiped in n dimensions and find all integer points in the interior of this parallelepiped. I have been trying to do this using the following scheme:
1. Given the linear map B and the dimension n of the cube, find the coordinates of the corners of the unit hypercube by converting each number j from 0 to 2^n - 1 into its binary representation and turning it into a vector that describes a vertex of the cube.
2. Apply the map B to each of these vectors, which gives a set of 2^n vectors describing the coordinates of the vertices of the parallelepiped in n dimensions.
3. Take the maximum and minimum value attained by any of these vertices in each coordinate direction; e.g. the first element of the vectors might have a maximum value of 4 across all vertices and a minimum value of -3, etc. This gives an n-dimensional rectangular prism that contains the parallelepiped and some extra unwanted space.
4. Find all points with integer coordinates in this bounding rectangular prism, described as vectors in n dimensions.
5. Apply the inverse of the map B to each of the points and throw away any points that have any coefficients greater than 1, as they must originally have lain outside the unit hypercube.
My issue arises in step 4: I'm struggling to come up with a way of generating all vectors with integer coordinates in my rectangular hyper-prism such that I can change the number of dimensions n on the fly. Ideally, I'd like to be able to increase n at will until it becomes too computationally heavy to do so, but every method of finding all integer points in the prism I've tried so far has relied on n nested for loops to permute each element, and thus I need to rewrite the code every time.
So I guess my question is this: is there any way to code this up so that I can change n on the fly? Also, any thoughts on the algorithm itself would be appreciated :) It wouldn't surprise me if I've massively overcomplicated things...
EDIT:
Of course, as soon as I posted the question I saw a lovely little link in the sidebar where a clever method for this has already been given: Generate a matrix containing all combinations of elements taken from n vectors
I'll leave this up for the moment in case anyone has any comments on the method in general, but otherwise (since I can't upvote yet, I'll just say it here): Luis Mendo, you are a hero!
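For completeness, here is a hedged sketch of that technique: ndgrid accepts a comma-separated list from a cell array, so the number of dimensions n can vary at run time. The bounds lo and hi below are made-up stand-ins for the per-dimension minima and maxima from step 3:
lo = [-3 -2 0];                    % example integer lower bounds, n = 3
hi = [ 4  3 2];                    % example integer upper bounds
n  = numel(lo);
ranges = arrayfun(@(a, b) a:b, lo, hi, 'UniformOutput', false);
grids  = cell(1, n);
[grids{:}] = ndgrid(ranges{:});    % one coordinate grid per dimension
% One row per integer point in the bounding prism, one column per dimension.
pts = cell2mat(cellfun(@(g) g(:), grids, 'UniformOutput', false));
Step 5 is then a single solve, e.g. c = (B \ pts')', keeping only the rows of pts for which all(c >= 0 & c <= 1, 2).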

Find the closest weight vector to each instance in the data matrix

Suppose I have a weight matrix W of size n x m, where m is the number of variables and n is the number of instances. I also have a data matrix X of the same size. I am trying to find the closest weight vector to each instance in X. However, both matrices are so dimensional that plain methods are not sufficient. I have tried a GPU trick in MATLAB, but it did not work well, since it was a sequential approach calculating the closest weight for each instance one at a time. I am now looking for efficient one-shot code that takes all of W and X and finds the winner with some MATLAB tricks, possibly with GPU support. Can anyone suggest a code snippet in MATLAB?
This is what I wrote for the sequential version:
x_in_d = gpuArray(x_in);   % copy the input instance to the device
W_d = gpuArray(W);         % copy the weight matrix to the device
% Replicate x_in across the rows of W (indexing trick, equivalent to repmat)
Dx = W_d - x_in_d(ones(size(W_d,1),1), :);
[d_min,winner] = min(sum(Dx.^2, 2));   % squared distance to every weight row
d_min = gather(d_min);     % gather results back to the host
winner = gather(winner);
What do you mean by "so dimensional"? It's just an m x n matrix, right?
It would be really helpful if you could provide some sample data. Based on your description (which isn't the clearest), here is what I think your data looks like:
weights = [1 4 2
           5 3 1]
data = [2 5 1
        1 2 2]
And you want to figure out which row of weights is closest to each row of data? In this case that would be the first row of weights for both rows of data.
Please edit your question to clarify what you're asking for, and consider using some examples.
EDIT:
I like Rody's duplicate comment; if I am correct, check out: Link Here
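If a one-shot vectorized version is what's needed, something along these lines should work. This is only a sketch, assuming rows of X are instances and rows of W are weight vectors; pdist2 requires the Statistics Toolbox, while the bsxfun variant runs in plain MATLAB (and unchanged on gpuArrays):
W = rand(1000, 50);   % example weight matrix (n x m)
X = rand(1000, 50);   % example data matrix  (n x m)
% Option 1: Statistics Toolbox. winner(i) is the row of W closest to X(i,:).
[d_min, winner] = min(pdist2(X, W), [], 2);
% Option 2: plain MATLAB, using ||x-w||^2 = ||x||^2 - 2*x*w' + ||w||^2.
% For the GPU, first do W = gpuArray(W); X = gpuArray(X);
D2 = bsxfun(@plus, sum(X.^2, 2), sum(W.^2, 2)') - 2 * (X * W');
[d2_min, winner2] = min(D2, [], 2);   % d2_min holds squared distances
This computes all the distances in one shot instead of looping over instances.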

Matlab: how to find which variables from a dataset could be discarded using PCA in Matlab?

I am using PCA to find out which variables in my dataset are redundant due to being highly correlated with other variables. I am using the MATLAB function princomp on data previously normalized using zscore:
[coeff, PC, eigenvalues] = princomp(zscore(x))
I know that the eigenvalues tell me how much of the variation in the dataset each principal component covers, and that coeff tells me how much of the i-th original variable is in the j-th principal component (where i indexes rows, j indexes columns).
So I assumed that to find out which variables of the original dataset are the most important and which are the least, I should multiply the coeff matrix by the eigenvalues: the coeff values represent how much of each variable every component contains, and the eigenvalues tell how important each component is.
So this is my full code:
[coeff, PC, eigenvalues] = princomp(zscore(x));
e = eigenvalues./sum(eigenvalues);
abs(coeff)/e
But this does not really show anything. I tried it on the following set, where variable 1 is fully correlated with variable 2 (v2 = v1 + 2):
v1 v2 v3
1 3 4
2 4 -1
4 6 9
3 5 -2
but the results of my calculations were the following:
v1 0.5525
v2 0.5525
v3 0.5264
and this does not really show anything. I would expect the result for variable 2 to show that it is far less important than v1 or v3.
Which of my assumptions is wrong?
EDIT I have completely reworked the answer now that I understand which assumptions were wrong.
Before explaining what doesn't work in the OP, let me make sure we're using the same terminology. In principal component analysis, the goal is to obtain a coordinate transformation that separates the observations well, and that may make it easy to describe the data, i.e. the different multi-dimensional observations, in a lower-dimensional space. Observations are multidimensional when they're made up of multiple measurements. If there are fewer linearly independent observations than there are measurements, we expect at least one of the eigenvalues to be zero, because e.g. two linearly independent observation vectors in a 3D space can be described by a 2D plane.
If we have an array
x = [ 1 3 4
2 4 -1
4 6 9
3 5 -2];
that consists of four observations with three measurements each, princomp(x) will find the lower-dimensional space spanned by the four observations. Since there are two co-dependent measurements, one of the eigenvalues will be near zero, since the space of measurements is only 2D and not 3D, which is probably the result you wanted to find. Indeed, if you inspect the eigenvectors (coeff), you find that the first two components are quite obviously collinear:
coeff = princomp(x)
coeff =
0.10124 0.69982 0.70711
0.10124 0.69982 -0.70711
0.9897 -0.14317 1.1102e-16
Since the first two components are, in fact, pointing in opposite directions, the values of the first two components of the transformed observations are, on their own, meaningless: [1 1 25] is equivalent to [1000 1000 25].
Now, if we want to find out whether any measurements are linearly dependent, and if we really want to use principal components for this (because in real life measurements may not be perfectly collinear and we are interested in finding good descriptor vectors for, e.g., a machine-learning application), it makes a lot more sense to consider the three measurements as "observations" and run princomp(x'). Since there are then only three "observations" but four "measurements", the fourth eigenvalue will be zero. However, since two of the observations are linearly dependent, we're left with only two non-zero eigenvalues:
eigenvalues =
24.263
3.7368
0
0
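(For reference, these are the third output of princomp, i.e. [coeff, score, eigenvalues] = princomp(x');.)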
To find out which of the measurements are so highly correlated (not actually necessary if you use the eigenvector-transformed measurements as input for e.g. machine learning), the best way would be to look at the correlation between the measurements:
corr(x)
ans =
1 1 0.35675
1 1 0.35675
0.35675 0.35675 1
Unsurprisingly, each measurement is perfectly correlated with itself, and v1 is perfectly correlated with v2.
EDIT2
but the eigenvalues tell us which vectors in the new space are most important (cover the most of variation) and also coefficients tell us how much of each variable is in each component. so I assume we can use this data to find out which of the original variables hold the most of variance and thus are most important (and get rid of those that represent small amount)
This works if your observations show very little variance in one measurement variable (e.g. with x = [1 2 3; 1 4 22; 1 25 -25; 1 11 100], the first variable contributes nothing to the variance). However, with collinear measurements, both vectors hold equivalent information and contribute equally to the variance, so the eigenvectors (coefficients) are likely to be similar to one another.
In order for @agnieszka's comments to keep making sense, I have left the original points 1-4 of my answer below. Note that point 3 was in response to the division of the eigenvectors by the eigenvalues, which to me didn't make a lot of sense.
1. The vectors should be in rows, not columns (each vector is an observation).
2. coeff returns the basis vectors of the principal components, and its order has little to do with the original input.
3. To see the importance of the principal components, use eigenvalues/sum(eigenvalues).
4. If you have two collinear vectors, you can't say that the first is important and the second isn't. How do you know that it shouldn't be the other way around? If you want to test for collinearity, you should check the rank of the array instead, or call unique on normalized (i.e. norm equal to 1) vectors.
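As a quick sketch of that last suggestion, applied to the example data from the question (the centering step is my addition, to match the PCA setting):
x = [1 3  4
     2 4 -1
     4 6  9
     3 5 -2];
% Rank of the centered data: 2 < 3 columns, so at least one
% measurement is a linear combination of the others.
xc = bsxfun(@minus, x, mean(x));
r  = rank(xc);                            % returns 2 here
% Alternatively, normalize the columns to unit norm and look for
% duplicates (sign-flipped collinear columns would need extra handling).
u = bsxfun(@rdivide, xc, sqrt(sum(xc.^2)));
nUnique = size(unique(u', 'rows'), 1);    % 2: the v1 and v2 columns coincide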