Adjacency matrix from edge list (preferably in Matlab) - matlab

I have a list of triads (vertex1, vertex2, weight) representing the edges of a weighted directed graph. Since prototype implementation is going on in Matlab, these are imported as a Nx3 matrix, where N is the number of edges. So the naive implementation of this is
id1 = L(:,1);
id2 = L(:,2);
weight = L(:,3);
m = max(max(id1, id2)); % to find the necessary size
V = zeros(m,m);
for i = 1:numel(weight) % one iteration per edge, not per vertex
V(id1(i),id2(i)) = weight(i);
end
The trouble with tribbles is that "id1" and "id2" are nonconsecutive; they're codes. This gives me two problems: (1) huge matrices with way too many "phantom", spurious vertices, which distorts the results of the algorithms to be used with that matrix, and (2) I need to recover the codes in the results of said algorithms (suffice it to say this would be trivial if the id codes were consecutive 1:m).
Answers in Matlab are preferable, but I think I can hack back from answers in other languages (as long as they're not pre-packaged solutions of the kind "R has a library that does this").
I'm new to StackOverflow, and I hope to be contributing meaningfully to the community soon. For the time being, thanks in advance!
Edit: This would be a solution if we didn't have vertices at the origin of multiple edges. (It assumes a 1:1 match between the list of edge origins and the list of identities.)
for i=1:n
for j=1:n
if id1(i) > 0 && id2(j) > 0
V(i,j) = weight(i);
end
end
end

You can use the function sparse:
sparse(id1,id2,weight,m,m)
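A small sketch of how this behaves with non-consecutive codes. The ids and weights below are made up for illustration. The matrix is still m-by-m, but only the stored edges take memory, and find hands the edges back in their original codes.

```matlab
% Hypothetical edge list with non-consecutive vertex codes
id1    = [1001; 5000; 1001];
id2    = [5000; 1001; 9999];
weight = [0.5; 1.5; 2.0];

m = max([id1; id2]);                  % largest code sets the nominal size
S = sparse(id1, id2, weight, m, m);   % m-by-m, but only 3 entries are stored

[src, dst, w] = find(S);              % edges back out, still in original codes
fprintf('%d stored entries in a %d-by-%d matrix\n', nnz(S), m, m);
```

The phantom vertices are still present as empty rows and columns, so this helps with memory but not with problem (1) from the question.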

If your problem is that the node ID numbers are nonconsecutive, why not re-map them onto consecutive integers? All you need to do is create a dictionary of all unique node IDs and their correspondence to new IDs.
This is really no different to the case where you're asked to work with named nodes (Australia, Britain, Canada, Denmark...) - you would map these onto consecutive integers first.
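One way to sketch such a dictionary, assuming numeric codes, is the third output of unique, which gives each entry its position in the sorted list of distinct codes. The edge list L below is invented for the example.

```matlab
% Hypothetical edge list: [from_code, to_code, weight]
L = [10 30 0.5;
     30 70 1.2;
     70 10 2.0;
     10 70 0.7];

nE = size(L, 1);
[codes, ~, idx] = unique([L(:,1); L(:,2)]);  % codes(k) is the original id of vertex k
idx = idx(:);                                % force a column vector
src = idx(1:nE);                             % consecutive ids of edge origins
dst = idx(nE+1:end);                         % consecutive ids of edge targets

n = numel(codes);
V = zeros(n);
V(sub2ind([n n], src, dst)) = L(:,3);        % dense n-by-n adjacency matrix
```

Row or column k of V corresponds to the original code codes(k), so results can be translated back with a single indexing step.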

You can use the GRP2IDX function (Statistics Toolbox) to convert your id codes to consecutive numbers; the ids can be numerical or not, it does not matter. Just keep the mapping information.
[idx1, gname1, gmap1] = grp2idx(id1);
[idx2, gname2, gmap2] = grp2idx(id2);
You can recover the original ids with gmap1(idx1).
If your id1 and id2 are from the same set you can apply grp2idx to their union:
[idx, gname,gmap] = grp2idx([id1; id2]);
idx1 = idx(1:numel(id1));
idx2 = idx(numel(id1)+1:end);
For the reordering see a recent question - how to assign a set of coordinates in Matlab?
You can use ACCUMARRAY or SUB2IND to solve this problem.
V = accumarray([idx1 idx2], weight);
or
V = zeros(max(idx1),max(idx2)); %# or V = zeros(max(idx));
V(sub2ind(size(V),idx1,idx2)) = weight;
Check whether you have non-unique combinations of id1 and id2; you will have to take care of those.
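To make the duplicate issue concrete, here is a small sketch with made-up indices in which the pair (1,2) appears twice. accumarray sums the repeated weights, while plain indexed assignment keeps only one of them (the last one, in practice).

```matlab
% Hypothetical consecutive indices with one repeated (row, col) pair
idx1   = [1; 2; 1];
idx2   = [2; 3; 2];
weight = [0.5; 1.0; 0.25];

% Detect duplicates before building the matrix
hasDup = size(unique([idx1 idx2], 'rows'), 1) < numel(weight);

% accumarray adds the weights of repeated pairs: V(1,2) = 0.5 + 0.25
Vsum = accumarray([idx1 idx2], weight);

% Indexed assignment overwrites: the later weight for (1,2) survives
Vlast = zeros(max(idx1), max(idx2));
Vlast(sub2ind(size(Vlast), idx1, idx2)) = weight;
```

Which behaviour is right depends on what a repeated edge means in your data (parallel edges to be summed, or a correction of an earlier record).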

Here is another solution:
First put together all your vertex ids, since there might be a sink vertex in your graph:
v_id_from = edge_list(:,1);
v_id_to = edge_list(:,2);
v_id_all = [v_id_from; v_id_to];
Then find the unique vertex ids:
v_id_unique = unique(v_id_all);
Now you can use the ismember function to get the mapping between your vertex ids and their consecutive index mappings:
[~,from] = ismember(v_id_from, v_id_unique);
[~,to] = ismember(v_id_to, v_id_unique);
Now you can use sub2ind to populate your adjacency matrix:
adjacency_matrix = zeros(numel(v_id_unique)); % one row and column per unique vertex
linear_ind = sub2ind(size(adjacency_matrix), from, to);
adjacency_matrix(linear_ind) = edge_list(:,3);
You can always go back from the mapped consecutive id to the original vertex id:
original_vertex_id = v_id_unique(mapped_consecutive_id);
Hope this helps.

Your first solution is close to what you want. However it is probably best to iterate over your edge list instead of the adjacency matrix.
edge_indexes = edge_list(:, 1:2);
n_vertices = max(edge_indexes(:)); % the largest vertex id sets the matrix size
adj_matrix = zeros(n_vertices);
for local_edge = edge_list' %transpose in order to iterate by edge
adj_matrix(local_edge(1), local_edge(2)) = local_edge(3);
end

Related

Finding means from fields in a structure without for loop

I have data from an experiment in a structure like this:
data.subject.trial
I need to find the means for scores on trials across all participants (e.g. what is the mean score of all participants on trial x?).
I can get there using a for loop as below but it feels like there should be an easier one-liner to achieve the same thing (values in "trial" are numeric in this instance). Any tips? Many thanks!
for i = 1:length(data.subject)
for j = 1:length(data.subject(i).trial)
a(i,j) = data.subject(i).trial(j);
end
end
trialMeans = mean(a);
I think I've stumbled across an answer to my own question...
A = cell2mat({data.subject.trial}); % Put all scores from all trials into 1 vector
B = reshape(A,[],length(data.subject))'; % Reshape into rows of however many subjects there are
trialMeans = mean(B);
Thanks!
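A slightly shorter variant, sketched under the assumption that every trial field is a row vector of the same length, stacks the subjects directly with vertcat. The sample scores are invented.

```matlab
% Hypothetical data: 3 subjects, 3 trial scores each (row vectors)
data.subject(1).trial = [1 2 3];
data.subject(2).trial = [3 4 5];
data.subject(3).trial = [5 6 7];

B = vertcat(data.subject.trial);  % one row per subject, one column per trial
trialMeans = mean(B, 1);          % mean over subjects for each trial
```

Passing the dimension to mean explicitly keeps the result correct even when there is only one subject.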

Indexing elements of parameters of a function within nested for loops

I have two matrices of results, A = 128x631 and B = 128x1014 and I have a function SSD that takes two elements (x,y) as parameters and then calculates the sum of squared differences. I also have a 631x1014 matrix of 0s, called SSDMatrix, ready to put the results of my SSD function into.
What I'm trying to do is compare each element of A with each element of B by passing them into SSD, but I can't figure out how to structure my for loops to get the desired results.
When I try:
SSDMatrix = SSD(A, B);
I get exactly the result I'm looking for, but only for the first cell. How can I repeat this process for each element of A and B?
Currently I have this:
SSDMatrix = zeros(NumFeatures1,NumFeatures2);
for i = 1:631
for j = 1:1014
SSDMatrix(i,j) = SSD(A,B);
end
end
This just results in the first answer being repeated 631*1014 times, so I need a way to index A and B to get the appropriate answer for each (i,j) of SSDMatrix.
It seems you need to do something like this -
SSDMatrix = zeros(NumFeatures1,NumFeatures2);
for i = 1:631
for j = 1:1014
SSDMatrix(i,j) = sum( (A(:,i) - B(:,j)).^ 2 );
end
end
You can achieve the same with pdist2, which returns the Euclidean distance, i.e. the square root of the summed squared differences. Please note that pdist2 is part of the Statistics Toolbox. To get the desired output, you can do -
out = pdist2(A.',B.').^2;
Or with bsxfun -
out = squeeze(sum(bsxfun(@minus,A,permute(B,[1 3 2])).^2,1));
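A further toolbox-free option, offered as a sketch only, expands the square: ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a'*b. This avoids the 128x631x1014 intermediate array that the bsxfun version builds. The small A and B below stand in for the real matrices.

```matlab
% Small stand-ins for the real data: columns are feature vectors
A = [1 2 3;
     4 5 6];      % 2 x 3
B = [1 0;
     0 1];        % 2 x 2

% out(i,j) = sum((A(:,i) - B(:,j)).^2), computed without a 3-D intermediate
out = bsxfun(@plus, sum(A.^2, 1).', sum(B.^2, 1)) - 2 * (A.' * B);
```

Because of rounding, entries that should be exactly zero can come out as tiny negative numbers with real-valued data; max(out, 0) clips them if that matters.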

Optimizing code, removing "for loop"

I'm trying to remove outliers from a tick data series, following Brownlees & Gallo 2006 (if you may be interested).
The code works fine but given that I'm working on really long vectors (the biggest has 20m observations and after 20h it was not done computing) I was wondering how to speed it up.
What I did until now is:
I changed the time and date format to numeric double and I saw that it saves quite some time in processing and A LOT OF MEMORY.
I allocated memory for the vectors:
n = size(price,1); %number of observations
x = price;
score = nan(n,1); %using tic and toc I saw that nan requires less time than zeros
trimmed_mean = nan(n,1);
sd = nan(n,1);
out_mat = nan(n,1);
Here is the loop I'd love to remove. I read that vectorizing would speed up a lot, especially using long vectors.
for i = k+1:n-k
neigh = x([i-k:i-1, i+1:i+k]); %the 'k' closest observations on each side of 'i' (i is excluded)
trimmed_mean(i) = trimmean(neigh,10,'round'); %trimmed mean computed on the neighbours
score(i) = x(i) - trimmed_mean(i);
sd(i) = std(neigh); %same neighbours as the mean
tmp = abs(score(i)) > (alpha .* sd(i) + gamma);
out_mat(i) = tmp*1;
end
Here is what I was trying to do
trimmed_mean=trimmean(regroup_matrix,10,'round',2);
score=bsxfun(@minus,x,trimmed_mean);
sd=std(regroup_matrix,0,2); %0 = default normalization, 2 = along rows
temp = abs(score) > (alpha .* sd + gamma);
out_mat = temp*1;
But given that I'm totally new to Matlab, I don't know how to properly construct the matrix of neighbouring observations. I just think it should be shaped like: regroup_matrix= nan (n,2*k).
EDIT: To be specific, what I am trying to do (and I am not able to) is:
Given a column vector "x" (n,1) for each observation "i" in "x" I want to take the "k" neighbouring observations to "i" (from i-k to i-1 and from i+1 to i+k) and put these observations as rows of a matrix (n, 2*k).
EDIT 2: I made a few changes to the code and I think I am getting closer to the solution. I posted another question specific to what I think is the problem now:
Matlab: Filling up matrix rows using moving intervals from a column vector without a for loop
What I am trying to do now is:
[n] = size(price,1);
x = price;
[j1]=find(x);
matrix_left=zeros(n, k,'double');
matrix_right=zeros(n, k,'double');
toc
matrix_left(j1(k+1:end),:)=x(j1-k:j1-1);
matrix_right(j1(1:end-k),:)=x(j1+1:j1+k);
matrix_group=[matrix_left matrix_right];
trimmed_mean=trimmean(matrix_group,10,'round',2);
score=bsxfun(@minus,x,trimmed_mean);
sd=std(matrix_group,0,2); %0 = default normalization, 2 = along rows
temp = abs(score) > (alpha .* sd + gamma);
outmat = temp*1;
I have problems with the matrix_left and matrix_right creation.
j1, that I am using for indexing is a column vector with the indices of price's observations. The output is simply
j1=[1:1:n]
price is a column vector of double with size(n,1)
For your reshape, you can do the following:
idxArray = bsxfun(@plus,(k+1:n-k)',[-k:-1,1:k]); %rows k+1..n-k have k neighbours on both sides
reshapedArray = x(idxArray);
Thanks to Jonas, who showed me the way to go, I came up with this:
idxArray_left=bsxfun(@plus,(k+1:n)',[-k:-1]); %matrix with index of left neighbours observations
idxArray_fill_left=bsxfun(@plus,(1:k)',[1:k]); %for observations from 1:k I take the right neighbouring observations, this way when computing mean and standard deviations there will be no problems.
matrix_left=[idxArray_fill_left; idxArray_left]; %Just join the two matrices and I have the complete matrix of left neighbours
idxArray_right=bsxfun(@plus,(1:n-k)',[1:k]); %same thing as left but opposite.
idxArray_fill_right=bsxfun(@plus,(n-k+1:n)',[-k:-1]);
matrix_right=[idxArray_right; idxArray_fill_right];
idx_matrix=[matrix_left matrix_right]; %complete index matrix, joining left and right indices
neigh_matrix=x(idx_matrix); %exactly as proposed by Jonas, I fill up a matrix of observations from 'x', following idx_matrix indexing
trimmed_mean=trimmean(neigh_matrix,10,'round',2);
score=bsxfun(@minus,x,trimmed_mean);
sd=std(neigh_matrix,0,2); %0 = default normalization, 2 = along rows
temp = abs(score) > (alpha .* sd + gamma);
outmat = temp*1;
Again, thanks a lot to Jonas. You really made my day!
Thanks also to everyone who had a look at the question and tried to help!
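For readers who want to see the neighbour-index idea on a toy input, here is a minimal sketch restricted to interior points (those with k observations on both sides). The vector x is invented, and a plain mean stands in for trimmean so that the example runs without the Statistics Toolbox.

```matlab
% Toy series and window half-width (illustrative values only)
x = ((1:7)').^2;          % [1 4 9 16 25 36 49]'
k = 2;
n = numel(x);

centers = (k+1:n-k)';                            % observations with full neighbourhoods
idx = bsxfun(@plus, centers, [-k:-1, 1:k]);      % each row: indices i-k..i-1, i+1..i+k
neigh = x(idx);                                  % each row: the 2*k neighbours of one centre

score = x(centers) - mean(neigh, 2);             % plain mean used in place of trimmean
sd    = std(neigh, 0, 2);                        % row-wise standard deviation
```

For the first centre (i = 3) the index row is [1 2 4 5], so its neighbours are [1 4 16 25]; the edge observations need the fill-in rows described in the post above.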

Matlab: Avoid for-loop by using clever matrix indexing & find? How?

I've been getting into Matlab more and more lately and another question came up during my latest project.
I generate several rectangles (or meshes) within an overall boundary.
These meshes can have varying spacings/intervals.
I do so because I want to decrease the mesh/pixel resolution of certain areas of a digital elevation model. So far, everything works fine.
But because the rectangles can be chosen in a GUI, it might happen that the rectangles overlap. This overlap is what I want to find and remove. If they had the same spacing, the rectangles would look something like this:
[t1x, t1y] = meshgrid(1:1:9,1:1:9);
[t2x, t2y] = meshgrid(7:1:15,7:1:15);
[t3x, t3y] = meshgrid(5:1:17,7:1:24);
In this case, I could just use unique, to find the overlapping areas.
However, they look more like this:
[t1x, t1y] = meshgrid(1:2:9,1:2:9);
[t2x, t2y] = meshgrid(7:3:15,7:3:15);
[t3x, t3y] = meshgrid(5:4:17,7:4:24);
Therefore, unique cannot be applied, because mesh 1 might very well overlap with mesh 2 without having the same nodes. For convenience and further processing, all rectangles / meshes are brought into column notation and put in one result matrix within my code:
result = [[t1x(:), t1y(:)]; [t2x(:), t2y(:)]; [t3x(:), t3y(:)]];
Now I was thinking about using 2 nested for-loops to solve this problem, something like this (which does not quite work yet):
res = zeros(length(result),1);
for i=1:length(result)
currX = result(i,1);
currY = result(i,2);
for j=1:length(result)
if result(j,1)< currX < result(j+1,1) && result(j,2)< currY < result(j+1,2)
res(j) = 1;
end
end
end
BUT: First of all, this does not quite work yet, because I get an out-of-bounds error when j+1 exceeds length(result), and moreover, res(j) = 1 seems to get overwritten by the loop.
But this was just for testing and demonstration anyway.
Because the meshes shown here are just examples, and the ones I use are fairly big, the result matrix contains up to 2000x2000 = 4 million nodes --> length(result) ~4 million.
Putting this into a nested for-loop running over the entire length will most likely kill my memory.
Therefore I was hoping to find a sophisticated solution which does not require a nested loop, but takes advantage of Matlab's find and clever matrix indexing.
I am not able to think of something, but was hoping to get help here.
Discussions and help are very much appreciated!
Cheers,
Theo
Here follows a quick stab (not extensively tested):
% Example meshes
[t1x, t1y] = meshgrid(1:2:9,1:2:9);
[t2x, t2y] = meshgrid(7:3:15,7:3:15);
% Group points for convenience
A = [t1x(:), t1y(:)];
B = [t2x(:), t2y(:)];
% Compare which points of A within edges of B (and viceversa)
idxA = A(:,1) >= B(1,1) & A(:,1) <= B(end,1) & A(:,2) >= B(1,2) & A(:,2) <= B(end,2);
idxB = B(:,1) >= A(1,1) & B(:,1) <= A(end,1) & B(:,2) >= A(1,2) & B(:,2) <= A(end,2);
% Plot result of identified points
plot(A(:,1),A(:,2), '*r')
hold on
plot(B(:,1),B(:,2), '*b')
plot([A(idxA,1); B(idxB,1)], [A(idxA,2); B(idxB,2)], 'sk')
The points identified as overlapping are marked with black squares in the resulting plot.
Also, related to your question is this Puzzler: overlapping rectangles by Doug Hull of TMW.
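The comparison above relies on the first and last rows of each point list being the lower-left and upper-right corners, which holds for meshgrid output. A variant that does not depend on row order, sketched here with the same example meshes, takes the bounding box from min and max instead.

```matlab
% Same example meshes as above
[t1x, t1y] = meshgrid(1:2:9, 1:2:9);
[t2x, t2y] = meshgrid(7:3:15, 7:3:15);
A = [t1x(:), t1y(:)];
B = [t2x(:), t2y(:)];

% Bounding boxes from the extreme coordinates, independent of row order
loA = min(A, [], 1);  hiA = max(A, [], 1);
loB = min(B, [], 1);  hiB = max(B, [], 1);

% A point overlaps if both of its coordinates fall inside the other box
inB = all(bsxfun(@ge, A, loB) & bsxfun(@le, A, hiB), 2);  % points of A inside B's box
inA = all(bsxfun(@ge, B, loA) & bsxfun(@le, B, hiA), 2);  % points of B inside A's box
```

Each test is a single vectorized pass over the point list, so memory grows linearly with the number of nodes rather than quadratically as in the nested loop.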

Rectifying compute_curvature.m error in Toolbox Graph in Matlab

I am currently using the Toolbox Graph on the Matlab File Exchange to calculate curvature on 3D surfaces and find it very helpful (http://www.mathworks.com/matlabcentral/fileexchange/5355). However, the following error message is issued in “compute_curvature” for certain surface descriptions and the code fails to run completely:
> Error in ==> compute_curvature_mod at 75
> dp = sum( normal(:,E(:,1)) .* normal(:,E(:,2)), 1 );
> ??? Index exceeds matrix dimensions.
This happens only sporadically, but there is no obvious reason why the toolbox works perfectly fine for some surfaces and not for others (of a similar topology). I also noticed that someone had asked about this bug back in November 2009 on File Exchange, but that the question had gone unanswered. The post states
"compute_curvature will generate an error on line 75 ("dp = sum(
normal(:,E(:,1)) .* normal(:,E(:,2)), 1 );") for SOME surfaces. The
error stems from E containing indices that are out of range which is
caused by line 48 ("A = sparse(double(i),double(j),s,n,n);") where A's
values eventually entirely make up the E matrix. The problem occurs
when the i and j vectors create the same ordered pair twice in which
case the sparse function adds the two s vector elements together for
that matrix location resulting in a value that is too large to be used
as an index on line 75. For example, if i = [1 1] and j = [2 2] and s
= [3 4] then A(1,2) will equal 3 + 4 = 7.
The i and j vectors are created here:
i = [face(1,:) face(2,:) face(3,:)];
j = [face(2,:) face(3,:) face(1,:)];
Just wanted to add that the error I mentioned is caused by the
flipping of the sign of the surface normal of just one face by
rearranging the order of the vertices in the face matrix"
I have tried debugging the code myself but have not had any luck. I am wondering if anyone here has solved the problem or could give me insight – I need the code to be sufficiently general-purpose in order to calculate curvature for a variety of surfaces, not just for a select few.
The November 2009 bug report on File Exchange traces the problem back to the behavior of sparse:
S = SPARSE(i,j,s,m,n,nzmax) uses the rows of [i,j,s] to generate an
m-by-n sparse matrix with space allocated for nzmax nonzeros. The
two integer index vectors, i and j, and the real or complex entries
vector, s, all have the same length, nnz, which is the number of
nonzeros in the resulting sparse matrix S . Any elements of s
which have duplicate values of i and j are added together.
The lines of code where the problem originates are here:
i = [face(1,:) face(2,:) face(3,:)];
j = [face(2,:) face(3,:) face(1,:)];
s = [1:m 1:m 1:m];
A = sparse(i,j,s,n,n);
Based on this information, removing the repeated index pairs, presumably using unique or similar, might solve the problem:
[B,I,J] = unique([i.' j.'],'rows');
i = B(:,1).';
j = B(:,2).';
s = s(I);
The full solution may look something like this:
i = [face(1,:) face(2,:) face(3,:)];
j = [face(2,:) face(3,:) face(1,:)];
s = [1:m 1:m 1:m];
[B,I,J] = unique([i.' j.'],'rows');
i = B(:,1).';
j = B(:,2).';
s = s(I);
A = sparse(i,j,s,n,n);
Since I do not have a detailed understanding of the algorithm it is hard to tell whether the removal of entries will have a negative effect.
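The summing behaviour is easy to reproduce on a toy input. The sketch below uses the example from the bug report (pair (1,2) given twice with values 3 and 4) plus one extra made-up entry, and then applies the unique-based de-duplication.

```matlab
% Toy input: the ordered pair (1,2) appears twice
i = [1 1 2];
j = [2 2 3];
s = [3 4 5];
n = 3;

A = sparse(i, j, s, n, n);
summed = full(A(1,2));                 % duplicates are added: 3 + 4

% De-duplicate the (i,j) pairs before building the matrix
[B, I] = unique([i.' j.'], 'rows');
i2 = B(:,1).';
j2 = B(:,2).';
s2 = s(I);
s2 = s2(:).';                          % keep everything as row vectors
A2 = sparse(i2, j2, s2, n, n);
kept = full(A2(1,2));                  % one of the original values, 3 or 4
```

Which of the two values survives depends on whether unique reports the first or the last occurrence, and that default has differed between releases, so it is worth checking on your own installation.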