How to efficiently create an array by tracing back the parent nodes in Matlab? - matlab

I am working on a path planner algorithm. I have a Nx2 matrix NodeInfo which has the current node number in its 1st column and parent node number in its 2nd column. For example:
NodeInfo = [3,1;
4,1;
5,2;
6,2;
7,3;
8,4;
9,4;
10,4;
11,5;
12,6;
13,6;
14,6;
15,7;
16,7;
17,8;
18,8;
19,9;
20,9;
21,10;
22,10;
23,11;
24,11;
25,12;
26,12];
When the algorithm reaches a goal, it outputs the node number, which is 26 in this case. I am looking for a smart way of tracing back through the parent nodes and creating an array of the nodes that led to the goal node. So the output should be:
Array = [26, 12, 6, 2];
Thanks!

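p = NodeInfo(end,1); % goal node (the last row in this example)
parents = p;
while ~isempty(p)
p = NodeInfo(NodeInfo(:,1)==p, 2); % parent of p; empty once the root is reached
parents = [parents p]; % appending an empty p leaves parents unchanged
end
The answer is stored in parents.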

The code below uses a container. It may take some time to build the hash map, but it might be faster than find() on a larger dataset with a large number of lookups.
Edit: Added 2 nodes into NodeMap to prevent isKey() function in while condition from wasting too much time.
NodeMap = containers.Map(NodeInfo(:,1),NodeInfo(:,2)); %Create a container
NodeMap(1)=0; NodeMap(2)=0; %Add 2 nodes
nodes=zeros(1,length(NodeMap)); %pre-allocate
k=2; [N,nodes(1)]=deal(26); %Init parameters
while(N>0)
[nodes(k),N]=deal(NodeMap(N));
k=k+1;
end
nodes(nodes == 0)=[] %Cleaning up & print
The output for N = 26 is:
nodes =
26 12 6 2
hope it helps!
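Because the node numbers in this example are small positive integers, a plain lookup vector is another option. The following is only a sketch under that assumption; the names parentOf and nodePath are made up for illustration, and root nodes are marked with a parent of 0.

```matlab
NodeInfo = [3,1; 4,1; 5,2; 6,2; 7,3; 8,4; 9,4; 10,4; 11,5; 12,6; 13,6; 14,6; ...
            15,7; 16,7; 17,8; 18,8; 19,9; 20,9; 21,10; 22,10; 23,11; 24,11; ...
            25,12; 26,12];

% parentOf(k) holds the parent of node k; 0 marks a node without a parent.
parentOf = zeros(1, max(NodeInfo(:)));
parentOf(NodeInfo(:,1)) = NodeInfo(:,2);

goal = 26;
nodePath = zeros(1, numel(parentOf));   % pre-allocate the worst-case length
n = 0;
node = goal;
while node > 0
    n = n + 1;
    nodePath(n) = node;
    node = parentOf(node);              % one O(1) lookup per step
end
nodePath = nodePath(1:n);               % [26 12 6 2]
```

Each step is a direct index instead of a search, at the cost of a vector as long as the largest node number.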

Related

Can I vectorize extraction of data from a cell array in MATLAB?

I am wondering if it is possible to use a vector to access data within a cell array. I am hoping to accomplish this using a vectorized approach rather than a for-loop.
I'm attempting to run a simple microsimulation in MATLAB. I have a simulated cohort that is initially healthy, but some are at low risk for a particular disease while others are at high risk. Thus, I have an array (Starting_Cohort) that indicates each patient's risk level (first column) and their initial status (second column). In addition, I have a cell array (pstar) that indicates each patient's likelihood of transitioning between two hypothetical health states (i.e., "healthy" and "sick").
What I would like to accomplish is the following:
1) During each period of the simulation (t = 1:T), use the first column of the starting cohort to determine the patient's risk level (i.e., 1 or 2).
2) Use the risk level to access a specific row (dependent on their current health status) within a specific cell (dependent on their risk level) of the cell array.
3) Compare the resultant vector against a random draw from the uniform distribution (contained in the array "r"), and select the column number associated with the first value larger than that draw (the column number determines their health state in the subsequent period).
HOWEVER, I want to avoid doing this for one patient at a time (i.e., introducing a nested loop), as this increases the execution time of the code by an order of magnitude (the actual cohort consists of approximately 20000 patients). I've been trying to accomplish this through vectorization - that is, running the simulation over the entire patient cohort concurrently - but I hit a roadblock when trying to access data from the cell array described above.
Starting_Cohort = [1 1; 1 1; 2 1; 2 1;];
[Cohort_Size, ~] = size(Starting_Cohort);
pstar = cell(2, 1);
pstar{1, 1} = [0.75 1.00; 0.15 1.00]; pstar{2, 1} = [0.65 1.00; 0.25 1.00];
rng(1234, 'twister'); T = 5; r = rand(Cohort_Size, T);
Sim_Results = [Starting_Cohort zeros(Cohort_Size, T)];
for t = 1:T
[~, Sim_Results(:, t+2)] = max(pstar{Sim_Results(:, 1), 1} ...
(Sim_Results(:, t+1), :) > r(:, t), [], 2);
end
When I run the above code, I obtain the error "Expected one output from a curly brace or dot indexing expression, but there were 4 results." I take this to mean that my approach to extracting information from the cell array is inappropriate, although I'm not sure whether I can address this or how. I would be deeply appreciative for any assistance rendered!
UPDATE 070619: I did eventually get this to work, using the code below. Effectively, I created a string array containing the expression I wanted to apply to each row. The expression is identical for every row EXCEPT in that it contains the row index. I can then use arrayfun and evalin to produce results similar to those I was looking for. Unfortunately, my own problem involves sparse arrays, so I could not actually solve my original problem. However, I'm hoping this information may nonetheless be useful for others.
Starting_Cohort = [1 1; 1 1; 2 1; 2 1;];
[Cohort_Size, ~] = size(Starting_Cohort);
pstar = cell(2, 1);
pstar{1, 1} = [0.75 1.00; 0.15 1.00];
pstar{2, 1} = [0.65 1.00; 0.25 1.00];
rng(1234, 'twister'); T = 5; r = rand(Cohort_Size, T);
Sim_Results = [Starting_Cohort zeros(Cohort_Size, T)];
for i = 1:Cohort_Size
TEST(i, 1) = strcat("max(pstar{Sim_Results(", string(i), ", 1), 1}",...
"(Sim_Results(", string(i), ", t+1), :) > ", ...
"r(", string(i), ", t), [], 2)");
end
for t = 1:T
[~, Sim_Results(:, t+2)] = arrayfun(@(x) evalin('base', x), TEST);
end
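One way to avoid both the per-patient loop and evalin is to flatten the cell array into an ordinary lookup table first. This is a sketch only, and it assumes every cell of pstar has the same size (which holds in the example but may not in the real model); the names P, lookup and rowIdx are made up for illustration.

```matlab
Starting_Cohort = [1 1; 1 1; 2 1; 2 1];
Cohort_Size = size(Starting_Cohort, 1);
pstar = {[0.75 1.00; 0.15 1.00]; [0.65 1.00; 0.25 1.00]};
rng(1234, 'twister'); T = 5; r = rand(Cohort_Size, T);

% Stack the cells: P(state, nextState, risk).
P = cat(3, pstar{:});
[S, K, R] = size(P);

% One row per (state, risk) pair, one column per next state.
lookup = reshape(permute(P, [1 3 2]), S*R, K);

Sim_Results = [Starting_Cohort zeros(Cohort_Size, T)];
for t = 1:T
    % Row of the lookup table for each patient's current state and risk level.
    rowIdx = sub2ind([S R], Sim_Results(:, t+1), Sim_Results(:, 1));
    % Column of the first cumulative probability that exceeds the random draw.
    % The comparison relies on implicit expansion (R2016b or later).
    [~, Sim_Results(:, t+2)] = max(lookup(rowIdx, :) > r(:, t), [], 2);
end
```

Only the loop over time remains; the whole cohort is handled in one indexing operation per period.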

How can I avoid constructing these grid variables in MATLAB?

I have the following calculations in two steps:
Initially, I create a set of 4 grid vectors, each spanning from -2 to 2:
u11grid=[-2:0.1:2];
u12grid=[-2:0.1:2];
u22grid=[-2:0.1:2];
u21grid=[-2:0.1:2];
[ca, cb, cc, cd] = ndgrid(u11grid, u12grid, u22grid, u21grid);
u11grid=ca(:);
u12grid=cb(:);
u22grid=cc(:);
u21grid=cd(:);
%grid=[u11grid u12grid u22grid u21grid]
sg=size(u11grid,1);
Next, I have an algorithm assigning the same index (equalorder) to the rows of grid sharing a specific structure:
U1grid=[-u11grid -u21grid -u12grid -u22grid Inf*ones(sg,1) -Inf*ones(sg,1)];
U2grid=[u21grid-u11grid -u21grid u22grid-u12grid -u22grid Inf*ones(sg,1) -Inf*ones(sg,1)];
s1=size(U1grid,2);
s2=size(U2grid,2);
%-------------------------------------------------------
%sortedU1grid gives U1grid with each row sorted from smallest to largest
%for each row i of sortedU1grid and for j=1,2,...,s1 index1(i,j) gives
%the column position 1,2,...,s1 in U1grid(i,:) of sortedU1grid(i,j)
[sortedU1grid,index1] = sort(U1grid,2);
%for each row i of sortedU1grid, d1(i,:) is a 1x(s1-1) row of ones and zeros
% d1(i,j)=1 if sortedU1grid(i,j)-sortedU1grid(i,j-1)=0 and d1(i,j)=0 otherwise
d1 = diff(sortedU1grid,[],2) == 0;
%-------------------------------------------------------
%Repeat for U2grid
[sortedU2grid,index2] = sort(U2grid,2);
d2 = diff(sortedU2grid,[],2) == 0;
%-------------------------------------------------------
%Assign the same index to the rows of grid sharing the same "ordering"
[~,~,equalorder] = unique([index1 index2 d1 d2],'rows', 'stable'); %sgx1
My question: is there a way to compute the algorithm in step 2 without the initial construction of the grid vectors in step 1? I am asking this because step 1 takes a lot of memory given that it basically generates the Cartesian product of 4 sets.
A solution should not rely on the specific content of U1grid and U2grid as that part changes in my actual code. To be more clear: U1grid and U2grid are ALWAYS derived from u11grid, ..., u21grid; however, the way in which they are derived from u11grid, ..., u21grid is slightly more complicated in my actual code from what I have reported here.
As Cris Luengo mentions in a comment, you're always going to be dealing with a trade-off between speed and memory. That said, one option you have is to only compute each of your 4 grid variables (u11grid u12grid u22grid u21grid) when needed instead of computing them once and storing them. You will save on memory but will lose speed if you are recomputing each one multiple times.
The solution I came up with involves creating an anonymous function equivalent for each of the 4 grid variables, using combinations of repmat and repelem to compute each individually instead of ndgrid to compute them all together:
u11gridFcn = @() repmat((-2:0.1:2).', 41.^3, 1);
u12gridFcn = @() repmat(repelem((-2:0.1:2).', 41), 41.^2, 1);
u22gridFcn = @() repmat(repelem((-2:0.1:2).', 41.^2), 41, 1);
u21gridFcn = @() repelem((-2:0.1:2).', 41.^3);
sg = 41.^4;
You would then use these by replacing every usage of your 4 grid variables in U1grid and U2grid with their corresponding function call. For your specific example above, this would be the new code for U1grid and U2grid (note also the use of inf(...) instead of Inf*ones(...), a small detail):
U1grid = [-u11gridFcn() ...
-u21gridFcn() ...
-u12gridFcn() ...
-u22gridFcn() ...
inf(sg, 1) ...
-inf(sg, 1)];
U2grid = [u21gridFcn()-u11gridFcn() ...
-u21gridFcn() ...
u22gridFcn()-u12gridFcn() ...
-u22gridFcn() ...
inf(sg, 1) ...
-inf(sg, 1)];
In this example, you avoid the memory needed to store the 4 grid variables, but the values for u11grid and u12grid will each be computed twice while the values for u21grid and u22grid will each be computed three times. Likely a small time trade-off for a potentially significant memory savings.
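To check that the repmat/repelem expressions really reproduce the columns that ndgrid would give, the comparison can be run on a much smaller grid. This is a sketch with a 3-point grid chosen purely for illustration; the variable names are hypothetical.

```matlab
g = (-1:1:1).';          % small stand-in for (-2:0.1:2).'
n = numel(g);

% Reference: full ndgrid construction.
[ca, cb, cc, cd] = ndgrid(g, g, g, g);

% Same columns built one at a time.
c1 = repmat(g, n^3, 1);                   % first dimension varies fastest
c2 = repmat(repelem(g, n), n^2, 1);
c3 = repmat(repelem(g, n^2), n, 1);
c4 = repelem(g, n^3);                     % last dimension varies slowest

sameColumns = isequal(ca(:), c1) && isequal(cb(:), c2) && ...
              isequal(cc(:), c3) && isequal(cd(:), c4);
```

If sameColumns is true, the on-demand functions can be substituted for the stored grid vectors without changing the result.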
You may be able to remove the ndgrid, but it is not the memory bottleneck of this code, which is the call to unique on the large matrix A = [index1 index2 d1 d2]. The size of A is 2825761 by 22 (much larger than the grids), and it seems that unique may even internally copy A. I was able to avoid this call using
[sorted, ind] = sortrows([index1 index2 d1 d2]);
change = [1; any(diff(sorted), 2)];
uniqueInd = cumsum(change);
equalorder(ind) = uniqueInd;
[~, ~, equalorder] = unique(equalorder, 'stable');
where the last line is still the memory bottleneck and is only needed if you want the same numbering as your code produces. If any unique ordering is okay, you can skip it. You may be able to further reduce the memory footprint by carefully clearing variables as soon as they are no longer needed.
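The sortrows/diff/cumsum idea can be tried on a tiny matrix to see that it assigns the same labels as unique(..., 'rows', 'stable'). The matrix A below is made-up illustration data, not the real [index1 index2 d1 d2].

```matlab
A = [2 1; 1 3; 2 1; 1 3; 0 0];          % toy rows with duplicates

% Sort the rows, then mark every position where a new distinct row starts.
[sorted, ind] = sortrows(A);
change = [1; any(diff(sorted), 2)];
uniqueInd = cumsum(change);              % group number in sorted order

% Scatter the group numbers back to the original row order.
equalorder = zeros(size(A, 1), 1);
equalorder(ind) = uniqueInd;

% Optional relabelling so that groups are numbered by first appearance.
[~, ~, equalorder] = unique(equalorder, 'stable');

% Reference result from a direct call.
[~, ~, reference] = unique(A, 'rows', 'stable');
```

Here both equalorder and reference come out as [1; 2; 1; 2; 3].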

Animated plot of infectious disease spread with for loop (Matlab)

I'm a beginner in Matlab and I'm trying to model the spread of an infectious disease using Matlab. However, I encounter some problems.
At first, I define the matrices that need to be filled and their initial status:
diseasematrix=zeros(20,20);
inirow=10;
inicol=10;
diseasematrix(inirow,inicol)=1; % The first place where a sick person is
infectionmatrix=zeros(20,20); % Infected people, initially all 0
healthymatrix=round(rand(20,20)*100); % Initial healthy population (randomly)
Rate=0.0001; % Rate of spread
Now, I want to make a plot where the spread of the disease is shown, using a for loop. But I'm stuck here...
for t=1:365
Zneighboursum=zeros(size(diseasematrix));
out_ZT = calc_ZT(Zneighboursum, diseasematrix);
infectionmatrix(t) = round((Rate).*(out_ZT));
diseasematrix(t) = diseasematrix(t-1) + infectionmatrix(t-1);
healthymatrix(t) = healthymatrix(t-1) - infectionmatrix(t-1);
imagesc(diseasematrix(t));
title(sprintf('Day %i',t));
drawnow;
end
This basically says that the infectionmatrix is calculated with the formula in the loop, and the diseasematrix is calculated by adding the infected people of the previous time step to the sick people of the previous time step. The healthy people that remain are calculated by subtracting the infected people from the healthy people of the previous time step. The variable out_ZT is the output of a function I made:
function [ZT] = calc_ZT(Zneighboursum, diseasematrix)
Zneighboursum = Zneighboursum + circshift(diseasematrix,[1 0]);
Zneighboursum = Zneighboursum + circshift(diseasematrix,[0 1]);
ZT=Zneighboursum;
end
This is to quantify the number of sick people around a central cell.
However, the result is not what I want. The plot does not evolve dynamically and the values don't seem to be right. Can anyone help me?
Thanks in advance!
There are several problems with the code:
1) (Rate).*(out_ZT) is not an error in itself, because .* accepts a scalar and a matrix. Since Rate is a scalar, a single * does the same thing and is the more usual way to write it.
2) infectionmatrix, diseasematrix and healthymatrix are all 2-dimensional matrices, so indexing them with (t) addresses a single element rather than a time step. To keep every time step in memory you would need 3-dimensional arrays. But since you don't use the stored history later, you can just overwrite the old matrices.
3) You store integers in infectionmatrix, because you calculate it with round(). With such a small Rate, that always sets the result to zero.
4) The value for Rate was too low to see any result, so I increased it to 0.01 instead.
5) (Just a cautionary point) healthymatrix is not used to compute anything else in your code.
The code for the function is fine, so after debugging according to what I perceived, here's the code:
diseasematrix=zeros(20,20);
inirow=10;
inicol=10;
diseasematrix(inirow,inicol)=1; % The first place where a sick person is
infectionmatrix=zeros(20,20); % Infected people, initially all 0
healthymatrix=round(rand(20,20)*100); % Initial healthy population (randomly)
Rate=0.01;
for t=1:365
Zneighboursum=zeros(size(diseasematrix));
out_ZT = calc_ZT(Zneighboursum, diseasematrix);
infectionmatrix = (Rate*out_ZT);
diseasematrix = diseasematrix + infectionmatrix;
healthymatrix = healthymatrix - infectionmatrix;
imagesc(diseasematrix);
title(sprintf('Day %i',t));
drawnow;
end
There are several problems:
1) If you want to store every time step, you will need a 3D array,
so you have to replace myvariable(t) with myvariable(:,:,t).
2) Why did you use round? If you round a value < 0.5, the result will be 0, so nothing will change in your loop.
3) You need to define the initial condition (t = 1) and then start your loop at t = 2.
diseasematrix=zeros(20,20);
inirow=10;
inicol=10;
diseasematrix(inirow,inicol)=1; % The first place where a sick person is
infectionmatrix =zeros(20,20); % Infected people, initially all 0
healthymatrix=round(rand(20,20)*100); % Initial healthy population (randomly)
Rate=0.01; % Rate of spread
for t=2:365
Zneighboursum=zeros(size(diseasematrix,1),size(diseasematrix,2));
out_ZT = calc_ZT(Zneighboursum, diseasematrix(:,:,t-1));
infectionmatrix(:,:,t) = (Rate).*(out_ZT);
diseasematrix(:,:,t) = diseasematrix(:,:,t-1) + infectionmatrix(:,:,t-1);
healthymatrix(:,:,t) = healthymatrix(:,:,t-1) - infectionmatrix(:,:,t-1);
imagesc(diseasematrix(:,:,t));
title(sprintf('Day %i',t));
drawnow;
end
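IMPORTANT: circshift wraps the matrix around, so cells on one edge are treated as neighbours of the cells on the opposite edge. Keep this in mind when dealing with boundary effects.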
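The wrap-around behaviour is easy to see with a sick cell on the top edge. The sketch below compares a four-neighbour sum built from circshift with one built from conv2, which pads with zeros instead of wrapping. Using conv2 here is a swapped-in alternative for illustration, not what the question's calc_ZT does (that function only shifts in two directions).

```matlab
diseasematrix = zeros(20, 20);
diseasematrix(1, 10) = 1;                 % one sick cell on the top edge

% Four-neighbour sum with circshift: the grid wraps around at the edges.
wrapped = circshift(diseasematrix, [ 1  0]) + circshift(diseasematrix, [-1  0]) + ...
          circshift(diseasematrix, [ 0  1]) + circshift(diseasematrix, [ 0 -1]);

% Four-neighbour sum with conv2: cells outside the grid count as zero.
kernel = [0 1 0; 1 0 1; 0 1 0];
unwrapped = conv2(diseasematrix, kernel, 'same');

bottomRowWrapped   = wrapped(20, 10);     % 1, reached through the wrap-around
bottomRowUnwrapped = unwrapped(20, 10);   % 0, no wrap-around
```

Which behaviour is appropriate depends on whether the modelled area should be treated as periodic.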

Indexing elements of parameters of a function within nested for loops

I have two matrices of results, A = 128x631 and B = 128x1014 and I have a function SSD that takes two elements (x,y) as parameters and then calculates the sum of squared differences. I also have a 631x1014 matrix of 0s, called SSDMatrix, ready to put the results of my SSD function into.
What I'm trying to do is compare each element of A with each element of B by passing them into SSD, but I can't figure out how to structure my for loops to get the desired results.
When I try:
SSDMatrix = SSD(A, B);
I get exactly the result I'm looking for, but only for the first cell. How can I repeat this process for each element of A and B?
Currently I have this:
SSDMatrix = zeros(NumFeatures1,NumFeatures2);
for i = 1:631
for j = 1:1014
SSDMatrix(i,j) = SSD(A,B);
end
end
This just results in the first answer being repeated 631*1014 times, so I need a way to index A and B to get the appropriate answer for each (i,j) of SSDMatrix.
It seems you need to do something like this -
SSDMatrix = zeros(NumFeatures1,NumFeatures2);
for i = 1:631
for j = 1:1014
SSDMatrix(i,j) = sum( (A(:,i) - B(:,j)).^ 2 );
end
end
You can achieve this with pdist2 as well. It returns the square root of the summed squared differences (the Euclidean distance), so the result has to be squared. Please do note that pdist2 is part of the Statistics Toolbox. So, to get the desired output, you can do -
out = pdist2(A.',B.').^2;
Or with bsxfun -
out = squeeze(sum(bsxfun(@minus,A,permute(B,[1 3 2])).^2,1));
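If the intermediate 128x631x1014 array is too large, the same result can be obtained from the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a'*b, which needs only one matrix product. The following is a sketch with small random stand-ins for A and B (an assumption made for illustration); the columns are the feature vectors, as in the loop above.

```matlab
rng(0);
A = rand(8, 5);                           % stand-in for the 128x631 matrix
B = rand(8, 7);                           % stand-in for the 128x1014 matrix

normA = sum(A.^2, 1).';                   % squared norm of each column of A (5x1)
normB = sum(B.^2, 1);                     % squared norm of each column of B (1x7)

% out(i,j) = sum((A(:,i) - B(:,j)).^2)
out = bsxfun(@plus, normA, normB) - 2 * (A.' * B);
```

Because of floating-point rounding, entries that should be exactly zero can come out as tiny negative numbers, so max(out, 0) may be worth applying before taking square roots.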

Adjacency matrix from edge list (preferably in Matlab)

I have a list of triads (vertex1, vertex2, weight) representing the edges of a weighted directed graph. Since prototype implementation is going on in Matlab, these are imported as a Nx3 matrix, where N is the number of edges. So the naive implementation of this is
id1 = L(:,1);
id2 = L(:,2);
weight = L(:,3);
m = max(max(id1, id2)); % to find the necessary size
V = zeros(m,m);
for i=1:numel(id1) % loop over the edges, not the vertices
V(id1(i),id2(i)) = weight(i);
end
The trouble with tribbles is that "id1" and "id2" are nonconsecutive; they're codes. This gives me two problems: (1) huge matrices with way too many "phantom", spurious vertices, which distort the results of algorithms to be used with that matrix, and (2) I need to recover the codes in the results of said algorithms (suffice to say this would be trivial if the id codes were consecutive 1:m).
Answers in Matlab are preferable, but I think I can hack back from answers in other languages (as long as they're not pre-packaged solutions of the kind "R has a library that does this").
I'm new to StackOverflow, and I hope to be contributing meaningfully to the community soon. For the time being, thanks in advance!
Edit: This would be a solution if we didn't have vertices at the origin of multiple edges. (This implies a 1:1 match between the list of edge origins and the list of identities.)
for i=1:n
for j=1:n
if id1(i) > 0 && id2(j) > 0
V(i,j) = weight(i);
end
end
end
You can use the function sparse:
sparse(id1,id2,weight,m,m)
If your problem is that the node ID numbers are nonconsecutive, why not re-map them onto consecutive integers? All you need to do is create a dictionary of all unique node ID's and their correspondence to new IDs.
This is really no different to the case where you're asked to work with named nodes (Australia, Britain, Canada, Denmark...) - you would map these onto consecutive integers first.
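A minimal sketch of that re-mapping, using the third output of unique as the dictionary and sparse to build the matrix. The edge list L is made-up illustration data, and the names codes, from and to are hypothetical.

```matlab
% Toy edge list with nonconsecutive vertex codes: (vertex1, vertex2, weight).
L = [10  50 0.5;
     50 200 1.5;
     10 200 2.0];
nEdges = size(L, 1);

% codes(k) is the original id of consecutive index k; idx maps every id to its index.
[codes, ~, idx] = unique([L(:,1); L(:,2)]);
from = idx(1:nEdges);
to   = idx(nEdges+1:end);

% Adjacency matrix over the consecutive indices only, so no phantom vertices.
V = sparse(from, to, L(:,3), numel(codes), numel(codes));

% Going back from a consecutive index to the original code.
originalOfThird = codes(3);               % 200
```

Note that sparse adds up the weights of repeated (from, to) pairs, which may or may not be what is wanted for duplicate edges.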
You can use the grp2idx function (part of the Statistics Toolbox) to convert your id codes to consecutive numbers. The ids can be numerical or not; it does not matter. Just keep the mapping information.
[idx1, gname1, gmap1] = grp2idx(id1);
[idx2, gname2, gmap2] = grp2idx(id2);
You can recover the original ids with gmap1(idx1).
If your id1 and id2 are from the same set you can apply grp2idx to their union:
[idx, gname,gmap] = grp2idx([id1; id2]);
idx1 = idx(1:numel(id1));
idx2 = idx(numel(id1)+1:end);
For the reordering see a recent question - how to assign a set of coordinates in Matlab?
You can use ACCUMARRAY or SUB2IND to solve this problem.
V = accumarray([idx1 idx2], weight);
or
V = zeros(max(idx1),max(idx2)); %# or V = zeros(max(idx));
V(sub2ind(size(V),idx1,idx2)) = weight;
Check whether you have non-unique combinations of id1 and id2. If you do, you will have to take care of them.
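The two approaches treat repeated (idx1, idx2) pairs differently, which is why duplicates matter. The following sketch uses made-up indices and weights: accumarray adds the weights of a repeated pair, while indexed assignment keeps only one of them (the last one, as far as I have observed, so treat that detail as an assumption).

```matlab
idx1   = [1; 1; 2];
idx2   = [2; 2; 1];                       % the pair (1,2) occurs twice
weight = [5; 7; 3];

% accumarray sums the weights that share a subscript pair.
Vsum = accumarray([idx1 idx2], weight);   % [0 12; 3 0]

% Indexed assignment overwrites, so only one weight survives for (1,2).
Vlast = zeros(max(idx1), max(idx2));
Vlast(sub2ind(size(Vlast), idx1, idx2)) = weight;

% Detecting duplicate pairs before choosing an approach.
hasDuplicates = size(unique([idx1 idx2], 'rows'), 1) < numel(idx1);
```

If duplicates should be combined in some other way, accumarray accepts a function handle as its fourth argument, for example @max.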
Here is another solution:
First put together all your vertex ids, since there might be a sink vertex in your graph:
v_id_from = edge_list(:,1);
v_id_to = edge_list(:,2);
v_id_all = [v_id_from; v_id_to];
Then find the unique vertex ids:
v_id_unique = unique(v_id_all);
Now you can use the ismember function to get the mapping between your vertex ids and their consecutive index mappings:
[~,from] = ismember(v_id_from, v_id_unique);
[~,to] = ismember(v_id_to, v_id_unique);
Now you can use sub2ind to populate your adjacency matrix:
n_vertices = numel(v_id_unique); % one row and one column per unique vertex
adjacency_matrix = zeros(n_vertices);
linear_ind = sub2ind(size(adjacency_matrix), from, to);
adjacency_matrix(linear_ind) = edge_list(:,3);
You can always go back from the mapped consecutive id to the original vertex id:
original_vertex_id = v_id_unique(mapped_consecutive_id);
Hope this helps.
Your first solution is close to what you want. However, it is probably best to iterate over your edge list instead of the adjacency matrix.
edge_indexes = edge_list(:, 1:2);
n_vertices = max(edge_indexes(:)); % the largest vertex id determines the matrix size
adj_matrix = zeros(n_vertices);
for local_edge = edge_list' %transpose in order to iterate by edge
adj_matrix(local_edge(1), local_edge(2)) = local_edge(3);
end