Averaging over adjacency groups in a loop

Averaging over adjacency groups in a loop - matlab

I have the following code which eventually outputs a graph and a 'groups' value. The results are dependent on a random function so can provide different results every times.
function [t seqBeliefs] = extendedHK(n, tol, adj)
%extendedHK Summary of function goes here
%Detailed explanation goes here
beliefs = rand(n,1);
seqBeliefs = beliefs; %NxT matrix
converge = 0;
step = 0
t = step
while converge ~= 1
step = step+1;
t = [t step];
A = zeros (n,n);
for i=1:1:n
for j=i:1:n
if abs(beliefs(i) - beliefs(j)) < tol && adj(i,j)==1
A(j,i)=1;
A(i,j)=1;
end
end
end
beliefs = A*beliefs./ sum(A,2);
seqBeliefs = [seqBeliefs beliefs];
if sum(abs(beliefs - seqBeliefs(:,step)))<1e-12
converge = 1;
end
end
groups = length(uniquetol(seqBeliefs(:,step), 1e-10))
plot(t,seqBeliefs)
end
In command window type
adj=random_graph(n)
I usually use n as 100 then call extendedHK function with same n then tol value (I usually choose between 0.1 and 0.4) and 'adj'
e.g. adj = random_graph(100)
extendedHK(100, 0.2, adj)
What I now need help with is running this function say 100 times, and taking an average of how many 'groups' are formed.

First, include the parameter "groups" in your function output. to do so, try this instead of the first line of your code:
function [t seqBeliefs groups] = extendedHK(n, tol, adj)
then save this function in a extendedHK.m file.
open another .m file, say console.m and write this:
results = zeros(1,100);
for i = 1:100
% set n, tol and adj inputs here <=
[~,~,out] = extendedHK(n, tol, adj);
results(1,i) = out;
end
avg = mean(results)
don't forget to define "results" out of the loop. since parameters that change size every loop, will make your code slow
(suppressing unnecessary outputs with ~ added later to this reply)

It is not clear if you need the current outputs of extendedHK:[t seqBeliefs]
Anyhow, you can add "groups" to the output and then from the console
N=100;
groups_vec = zeros(1,N);
for i=1:N
adj = random_graph(100);
[~,~,groups_vec(i)] = extendedHK(100, 0.2, adj);
end
groups_avr = mean(groups_vec);
note that if you use this code you won't be able to "see" the graphs as they will be cleared every loop iteration. you can do one of the following (1) add "figure;" before the plot command and then you will have 100 figures. (2) add "pause" to wait for key press between each graph. (3) add "hold on" to print all graphs on the same figure.

Related

How can I loop through and process files from a directory individually?

I'm trying to write a function that iteratively loops through a bunch of .txt files in a directory and processes them individually. After a file has been processed I would like to plot output variables, then on the next iteration plot the new variables on the same graph. My code is shown below however does not seem to be working, it returns a 'output argument 'output1' (and maybe others) not assigned during call to function2.
Currently my code looks something like this:
function [output1, output2, fig] = Processing(Folder_1,Folder_2,Folder_3)
%% Read in all the Data
% Initialise Loop to process each file independently
for i = 1 : length(Text_Files1)
Sorted_Index1 = Indices1(i);
Path1 = fullfile(Folder_1,Text_Files1(Sorted_Index1).name);
Table1 = readtable(Path1);
Sorted_Index2 = Indices2(i);
Path2 = fullfile(Folder_2,Text_Files2(Sorted_Index2).name);
Table2 = readtable(Path2);
Sorted_Index3 = Indices3(i);
Path3 = fullfile(Folder_3,Text_Files3(Sorted_Index3).name);
Table3 = readtable(Path3,'Delimiter',';');
%% Process Data through the Loop
[HR] = function1(processed_data)
[output1 output2] = function2(HR, Time);
hold on
fig = figure('Visible', false);
subplot(10,10,1:90)
plot(output2, output1, 'Color', [0.92 0.47 0.44], 'LineWidth', 1.5);
end
end
%% Function 2:
function [output1, output2] = function2(t, y, z)
segment = 1:z:max(y);
% Iterate through each time segment and calculate average t value
for x = 1:length(segment)-1
SegmentScores = find(y > segment(x) & ...
y < segment(x+1));
output1(x) = mean(y(SegmentScores));
output2(x) = Segment(x+1);
end
end

The problem is that you're calling function2(HR, Time) with two inputs, while it needs three function2(t, y, z).
For that reason, I'm assuming that inside function2 segment is an empty array, and thus the loop for x = 1:length(segment) - 1 is not entered at all.
Since the definition of output1 and output2 is only inside the loop, they will never be created and thus the error.
If you provide three inputs to your function2, the problem will be solved. Just pay attention to the order in which you provide them (for example, I'm assuming your Time input should be the same as t in the function, so it should be the first input to be passed).

Fast way to get mean values of rows accordingly to subscripts

I have a data, which may be simulated in the following way:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
In other words, it is a matrix, where the last row is subscripts.
Now I want to calculate nanmean() for each subscript. Also I want to save number of rows for each subscript. I have a 'dummy' code for this:
uniqueSubs = unique(M(:,6));
avM = nan(numel(uniqueSubs),6);
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1)];
end
The problem is, that it is too slow. I want it to work for N = 10^8 and K = 10^6 (see commented part in the definition of these variables.
How can I find the mean of the data in a faster way?

This sounds like a perfect job for findgroups and splitapply.
% Find groups in the final column
G = findgroups(M(:,6));
% function to apply per group
fcn = #(group) [mean(group, 1, 'omitnan'), size(group, 1)];
% Use splitapply to apply fcn to each group in M(:,1:5)
result = splitapply(fcn, M(:, 1:5), G);
% Check
assert(isequaln(result, avM));

M = sortrows(M,6); % sort the data per subscript
IDX = diff(M(:,6)); % find where the subscript changes
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)]; % add start and end of data
for iSub= 2:numel(tmp)
% Calculate the mean over just a single subscript, store in iSub-1
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];tmp(iSub-1)];
end
This is some 60 times faster than your original code on my computer. The speed-up mainly comes from presorting the data and then finding all locations where the subscript changes. That way you do not have to traverse the full array each time to find the correct subscripts, but rather you only check what's necessary each iteration. You thus calculate the mean over ~100 rows, instead of first having to check in 1,000,000 rows whether each row is needed that iteration or not.
Thus: in the original you check numel(uniqueSubs), 10,000 in this case, whether all N, 1,000,000 here, numbers belong to a certain category, which results in 10^12 checks. The proposed code sorts the rows (sorting is NlogN, thus 6,000,000 here), and then loop once over the full array without additional checks.
For completion, here is the original code, along with my version, and it shows the two are the same:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
uniqueSubs = unique(M(:,6));
%% zlon's original code
avM = nan(numel(uniqueSubs),7); % add the subscript for comparison later
tic
uniqueSubs = unique(M(:,6));
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1) uniqueSubs(iSub)];
end
toc
%%%%% End of zlon's code
avM = sortrows(avM,7); % Sort for comparison
%% Start of Adriaan's code
avM2 = nan(numel(uniqueSubs),6);
tic
M = sortrows(M,6);
IDX = diff(M(:,6));
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)];
for iSub = 2:numel(tmp)
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];
end
toc %tic/toc should not be used for accurate timing, this is just for order of magnitude
%%%% End of Adriaan's code
all(avM(:,1:6) == avM2) % Do the comparison
% End of script
% Output
Elapsed time is 58.561347 seconds.
Elapsed time is 0.843124 seconds. % ~70 times faster
ans =
1×6 logical array
1 1 1 1 1 1 % i.e. the matrices are equal to one another

Need help saving a parameter to a variable each iteration

In the code shown, I want to save one parameter (fval) per iteration in one variable, but not sure how to do it. Can someone advise?
clear;
close all;
clc;
for i = 0 : 100
ii = i * 0.01;
options = optimset('Display','iter-detailed', ...
'Algorithm','interior point', ...
'Diagnostics','on');
options.TolCon = 0;
options.TolFun = 0;
[X,faval,exitfag,output,lambda,grad,hessian]=fmincon(#myfun9,0,[],[],[],[],ii,1,#mycon,options);
end;

If you want to create 101 separate variables for storing each value, that's not recommended. Pre-allocate an array for fval for storing the values of faval in each iteration as shown below:
fval = zeros(101,1); %Pre-allocating memory for fval
for k = 0 : 100
ii = k * 0.01;
options = optimset('Display','iter-detailed', ...
'Algorithm','interior point', ...
'Diagnostics','on');
options.TolCon = 0;
options.TolFun = 0;
[X,faval,exitfag,output,lambda,grad,hessian]=fmincon(#myfun9,0,[],[],[],[],ii,1, ...
#mycon,options);
%Storing value of faval in fval(k+1). Note that indexing starts from 1 in MATLAB
fval(k+1) = faval;
end
By the way, it seems from your code that you're not interested in the values of all other parameters i.e. X,exitfag,output,lambda,grad,hessi, because these parameters will be overwritten in each iteration. If you're not interested in these values. You can skip storing them using tilde (~). So you might also want to use the following instead:
[~,faval]=fmincon(#myfun9,0,[],[],[],[],ii,1, #mycon,options);
% exitfag, output, lambda, grad, hessi are skipped automatically as per fmincon doc
Suggested reading:
1. Dynamic Variables
2. Preallocation
3. Why does MATLAB have 1 based indexing?
4. Tilde as an Argument Placeholder

MATLAB list manipulation

Consider I have a code segment as follows:
Case 1
n = 20;
for i = 2 : n
mat = rand([2,i]);
mat = [mat, mat(:,1)]; %add the first column to the last
%feed the variable 'mat' to a function
end
Case 2
n = 10;
list = [];
for i = 1 : n
a = rand([2,1]);
b = rand([2,2])
list = [list, [a,b]];
end
In this way, MATLAB gives the below suggestion:
The variable 'mat' appears to change size on every loop. Consider preallocating for speed up.
The variable 'list' appears to change size on every loop. Consider preallocating for speed up.
I am a MATLAB newconer, So I would like to know how to deal with this issue. How to do this in native MATLAB style? Thanks in advance.

I'll focus on the second case, as it's the only one that makes sense:
n = 10;
list = [];
for i = 1 : n
a = rand([2,1]);
b = rand([2,2])
list = [list, [a,b]];
end
That you are doing here, for each loop, is to create two vectors with random numbers, a and b. a has dimension 2x1, and b has dimension 2x2. Then, you concatenate these two, with the matrix list.
Note that each call to rand are independent, so rand(2,3) will behave the same way [rand(2,2), rand(2,1)] does.
Now, since you loop 10 times, and you add rand(2,3) every time, you're essentially doing [rand(2,2), rand(2,1), rand(2,2), rand(2,1) ...]. This is equivalent to rand(2,30), which is a lot faster. Therefore, "Consider preallocating for speed up."
Now, if your concatenations doesn't contain random matrices, but are really the output from some function that can't output the entire matrix you want, then preallocate and insert it to the matrix using indices:
Let's define a few functions:
function x = loopfun(n)
x = n*[1; 2];
end
function list = myfun1(n)
list = zeros(2, n);
for ii = 1:n
list(:,ii) = loopfun(ii);
end
end
function list = myfun2(n)
list = [];
for ii = 1:n
list = [list, loopfun(ii)];
end
end
f1 = #() myfun1(100000); f2 = #() myfun2(100000);
fprintf('Preallocated: %f\nNot preallocated: %f\n', timeit(f1), timeit(f2))
Preallocated: 0.141617
Not preallocated: 0.318272
As you can see, the function with preallocation is twice as fast as the function with an increasing sized matrix. The difference is smaller if there are few iterations, but the general idea is the same.
f1 = #() myfun1(5); f2 = #() myfun2(5);
fprintf('Preallocated: %f\nNot preallocated: %f\n', timeit(f1), timeit(f2))
Preallocated: 0.000010
Not preallocated: 0.000018

MATLAB: subtracting each element in a large vector from each element in another large vector in the fastest way possible

here is the code I have, its not simple subtraction. We want subtract each value in one vector from each value in the other vector, within certain bounds tmin and tmax. time_a and time_b are the very long vectors with times (in ps). binsize is just for grouping times in a similar range for plotting. The longest way possible would be to loop through each element and subtract each element in the other vector, but this would take forever and we are talking about vectors with hundreds of megabytes up to gb.
function [c, dt, dtEdges] = coincidence4(time_a,time_b,tmin,tmax,binsize)
% round tmin, tmax to a intiger multiple of binsize:
if mod(tmin,binsize)~=0
tmin=tmin-mod(tmin,binsize)+binsize;
end
if mod(tmax,binsize)~=0
tmax=tmax-mod(tmax,binsize);
end
dt = tmin:binsize:tmax;
dtEdges = [dt(1)-binsize/2,dt+binsize/2];
% dtEdges = linspace((tmin-binsize/2),(tmax+binsize/2),length(dt));
c = zeros(1,length(dt));
Na = length(time_a);
Nb = length(time_b);
tic1=tic;
% tic2=tic1;
% bbMax=Nb;
bbMin=1;
for aa = 1:Na
ta = time_a(aa);
bb = bbMin;
% tic
while (bb<=Nb)
tb = time_b(bb);
d = tb - ta;
if d < tmin
bbMin = bb;
bb = bb+1;
elseif d > tmax
bb = Nb+1;
else
% tic
% [dum, dum2] = histc(d,dtEdges);
index = floor((d-dtEdges(1))/(dtEdges(end)-dtEdges(1))*(length(dtEdges)-1)+1);
% toc
% dt(dum2)
c(index)=c(index)+1;
bb = bb+1;
end
end
% if mod(aa, 200) == 0
% toc(tic2)
% tic2=tic;
% end
end
% c=c(1:end-1);
toc(tic1)
end

Well, not a final answer but a few clue to simplify and accelerate your system:
First, use cached values. For example, in your line:
index = floor((d-dtEdges(1))/(dtEdges(end)-dtEdges(1))*(length(dtEdges)-1)+1);
your loop repeat the same computations every iteration. You can calculate the value before starting the loop, cache it then reuse the stored result:
cached_dt_constant = (dtEdges(end)-dtEdges(1))*(length(dtEdges)-1) ;
Then in your loop simply use:
index = floor( (d-dtEdges(1)) / cached_dt_constant +1 ) ;
if you have so many loop iteration you'll save valuable time this way.
Second, I am not entirely sure of what the computations are trying to achieve, but you can save time again by using the indexing power of matlab. By replacing the lower part of your code like this, I get an execution time 2 to 3 time faster (and the same results obviously).
Na = length(time_a);
Nb = length(time_b);
tic1=tic;
dtEdge_span = (dtEdges(end)-dtEdges(1)) ;
cached_dt_constant = dtEdge_span * (length(dtEdges)-1) ;
for aa = 1:Na
ta = time_a(aa);
d = time_b - ta ;
iok = (d>=tmin) & (d<=tmax) ;
index = floor( (d(iok)-dtEdges(1)) ./ cached_dt_constant +1 ) ;
c(index) = c(index) +1 ;
end
toc(tic1)
end
Now there is only one loop to go through, the inner loop has been removed and replaced by vectorized calculation. By scratching the head a bit further there might be a way to do even without the top loop and use only vectorized computations. Although this will require to have enough memory to handle quite big arrays in one go.
If the precision of each value is not critical (I see you round and floor values often), try converting your initial vectors to 'single' type instead of the default matlab 'double'. that would almost double the size of array your memory will be able to handle in one go.