MATLAB: subtracting each element in a large vector from each element in another large vector in the fastest way possible - matlab

here is the code I have, its not simple subtraction. We want subtract each value in one vector from each value in the other vector, within certain bounds tmin and tmax. time_a and time_b are the very long vectors with times (in ps). binsize is just for grouping times in a similar range for plotting. The longest way possible would be to loop through each element and subtract each element in the other vector, but this would take forever and we are talking about vectors with hundreds of megabytes up to gb.
function [c, dt, dtEdges] = coincidence4(time_a,time_b,tmin,tmax,binsize)
% round tmin, tmax to a intiger multiple of binsize:
if mod(tmin,binsize)~=0
tmin=tmin-mod(tmin,binsize)+binsize;
end
if mod(tmax,binsize)~=0
tmax=tmax-mod(tmax,binsize);
end
dt = tmin:binsize:tmax;
dtEdges = [dt(1)-binsize/2,dt+binsize/2];
% dtEdges = linspace((tmin-binsize/2),(tmax+binsize/2),length(dt));
c = zeros(1,length(dt));
Na = length(time_a);
Nb = length(time_b);
tic1=tic;
% tic2=tic1;
% bbMax=Nb;
bbMin=1;
for aa = 1:Na
ta = time_a(aa);
bb = bbMin;
% tic
while (bb<=Nb)
tb = time_b(bb);
d = tb - ta;
if d < tmin
bbMin = bb;
bb = bb+1;
elseif d > tmax
bb = Nb+1;
else
% tic
% [dum, dum2] = histc(d,dtEdges);
index = floor((d-dtEdges(1))/(dtEdges(end)-dtEdges(1))*(length(dtEdges)-1)+1);
% toc
% dt(dum2)
c(index)=c(index)+1;
bb = bb+1;
end
end
% if mod(aa, 200) == 0
% toc(tic2)
% tic2=tic;
% end
end
% c=c(1:end-1);
toc(tic1)
end

Well, not a final answer but a few clue to simplify and accelerate your system:
First, use cached values. For example, in your line:
index = floor((d-dtEdges(1))/(dtEdges(end)-dtEdges(1))*(length(dtEdges)-1)+1);
your loop repeat the same computations every iteration. You can calculate the value before starting the loop, cache it then reuse the stored result:
cached_dt_constant = (dtEdges(end)-dtEdges(1))*(length(dtEdges)-1) ;
Then in your loop simply use:
index = floor( (d-dtEdges(1)) / cached_dt_constant +1 ) ;
if you have so many loop iteration you'll save valuable time this way.
Second, I am not entirely sure of what the computations are trying to achieve, but you can save time again by using the indexing power of matlab. By replacing the lower part of your code like this, I get an execution time 2 to 3 time faster (and the same results obviously).
Na = length(time_a);
Nb = length(time_b);
tic1=tic;
dtEdge_span = (dtEdges(end)-dtEdges(1)) ;
cached_dt_constant = dtEdge_span * (length(dtEdges)-1) ;
for aa = 1:Na
ta = time_a(aa);
d = time_b - ta ;
iok = (d>=tmin) & (d<=tmax) ;
index = floor( (d(iok)-dtEdges(1)) ./ cached_dt_constant +1 ) ;
c(index) = c(index) +1 ;
end
toc(tic1)
end
Now there is only one loop to go through, the inner loop has been removed and replaced by vectorized calculation. By scratching the head a bit further there might be a way to do even without the top loop and use only vectorized computations. Although this will require to have enough memory to handle quite big arrays in one go.
If the precision of each value is not critical (I see you round and floor values often), try converting your initial vectors to 'single' type instead of the default matlab 'double'. that would almost double the size of array your memory will be able to handle in one go.

Related

how to assign one value to a list of Objects in an efficient way in Matlab?

I want to assign one value to a list of objects in Matlab without using a for-loop (In order to increase efficiency)
Basically this works:
for i=1:Nr_of_Objects
Objectlist(i,1).weight=0.2
end
But I would like something like this:
Objectlist(:,1).weight=0.2
Which is not working. I get this error:
Expected one output from a curly brace or dot indexing expression, but there were 5 results.
Writing an array to the right hand side is also not working.
I`m not very familiar with object oriented programming in Matlab, so I would be happy if someone could help me.
Your looking for the deal function:
S(1,1).a = 1
S(2,1).a = 2
S(1,2).a = 3
[S(:,1).a] = deal(4)
Now S(1,1).a and S(2,1).a equal to 4.
In matlab you can concatenate several output in one array using []. And deal(X) copies the single input to all the requested outputs.
So in your case:
[Objectlist(:,1).weight] = deal(0.2)
Should work.
Noticed that I'm not sure that it will be faster than the for loop since I don't know how the deal function is implemented.
EDIT: Benchmark
n = 1000000;
[S(1:n,1).a] = deal(1);
tic
for ii=1:n
S(ii,1).a = 2;
end
toc
% Elapsed time is 3.481088 seconds
tic
[S(1:n,1).a] = deal(2);
toc
% Elapsed time is 0.472028 seconds
Or with timeit
n = 1000000;
[S(1:n,1).a] = deal(1);
g = #() func1(S,n);
h = #() func2(S,n);
timeit(g)
% ans = 3.67
timeit(h)
% ans = 0.41
function func1(S,n)
for ii=1:n
S(ii,1).a = 2;
end
end
function func2(S,n)
[S(1:n,1).a] = deal(2);
end
So it seems that using the deal function reduce the computational time.

Matlab generating random numbers and overlap check

I wrote a code for generating random number of rods on Matlab within a specified domain and then saving the output in a text file. I would like to ask for help on adding the following options to the code;
(i) if the randomly generated rod exceeds the specified domain size, the length of that rod should be shortened so that to keep it in that particular domain.
(ii) i would like to avoid the overlapping of the newly generated number (rod) with that of the previous one, in case of overlap generate another place for the new rod.
I can't figure out how shall I do it. It would be of much help if someone may help me write code for these two options.
Thank you
% myrandom.m
% Units are mm.
% domain size
bx = 160;
by = 40;
bz = 40;
lf = 12; % rod length
nf = 500; % Number of rods
rns = rand(nf,3); % Start
rne = rand(nf,3)-0.5; % End
% Start Points
for i = 1:nf
rns(i,1) = rns(i,1)*bx;
rns(i,2) = rns(i,2)*by;
rns(i,3) = rns(i,3)*bz;
end
% Unit Deltas
delta = zeros(nf,1);
for i = 1:nf
temp = rne(i,:);
delta(i) = norm(temp);
end
% Length Deltas
rne = lf*rne./delta;
% End Points
rne = rns + rne;
fileID = fopen('scfibers.txt','w');
for i = 1:nf
fprintf(fileID,'%12.8f %12.8f %12.8f\r\n',rns(i,1),rns(i,2),rns(i,3));
fprintf(fileID,'%12.8f %12.8f %12.8f\r\n\r\n',rne(i,1),rne(i,2),rne(i,3));
end
fclose(fileID);
I would start from writing a function that creates the random rods:
function [rns,rne] = myrandom(domain,len,N)
rns = rand(N,3).*domain; % Start --> rns = bsxfun(#times,rand(N,3),domain)
rne = rand(N,3)-0.5; % End
% Unit Deltas
delta = zeros(N,1);
for k = 1:N
delta(k) = norm(rne(k,:));
end
% Length Deltas
rne = len*rne./delta; % --> rne = len*bsxfun(#rdivide,rne,delta)
% End Points
rne = rns + rne;
% remove rods the exceed the domain:
notValid = any(rne>domain,2); % --> notValid = any(bsxfun(#gt,rne,domain),2);
rns(notValid,:)=[];
rne(notValid,:)=[];
end
This function gets the domain as [bx by bz] and also the length of the rods as len, and N the number of rods to generate. Note that using elementwise multiplication (.*) I have eliminated the first for loop.
In case you use MATLAB version prior to 2016b, you need to use bsxfun:
In MATLAB® R2016b and later, the built-in binary functions listed in this table independently support implicit expansion.
The affected lines are marked with --> in the code (with the alternative).
The last three lines in the function remove from the result all the rodes that exceed the domain size (I hope I got you correctly on this).
Next, I call this function within a script:
% domain size
bx = 160;
by = 40;
bz = 40;
domain = [bx by bz];
lf = 12; % rod length
nf = 500; % Number of rods
[rns,rne] = myrandom(domain,lf,nf);
u = unique([rns rne],'rows');
remain = nf-size(u,1);
while remain>0
[rns_temp,rne_temp] = myrandom(domain,lf,remain);
rns = [rns;rns_temp];
rne = [rne;rne_temp];
u = unique([rns rne],'rows');
remain = nf-size(u,1);
end
After the basic definitions, the function is called and returns rne and rns, which are probably smaller than nf. Then we check for duplicates, and store all unique rods in u. We calculate the rods remain to compute, and we use a while loop to generate new rods as needed. In each iteration of the loop, we add the newly created rods to those we have in rne and rns, and check how many unique vectors we have now, and if there are enough we quit the loop (then you can add printing to the file).
Note that:
I was not sure what you mean by "in case of overlap generate another place for the new rod" - do you want to have more than nf rods if some are duplicates, that from which nf are unique (what the code above does)? or you want to remove the duplicates and remain only with nf unique rods? In the case of the latter option, I would insert the unique function part into the function that creates the rods myrandom.
The wile loop as written above is not efficient since no preallocating of memory is done. I'm not sure that this is possible if you just want to create more rods but keep the duplicates, but if not (the second option in 1 above) and if you are going to use this allot, then preallocating is very recommended.

Fast way to get mean values of rows accordingly to subscripts

I have a data, which may be simulated in the following way:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
In other words, it is a matrix, where the last row is subscripts.
Now I want to calculate nanmean() for each subscript. Also I want to save number of rows for each subscript. I have a 'dummy' code for this:
uniqueSubs = unique(M(:,6));
avM = nan(numel(uniqueSubs),6);
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1)];
end
The problem is, that it is too slow. I want it to work for N = 10^8 and K = 10^6 (see commented part in the definition of these variables.
How can I find the mean of the data in a faster way?
This sounds like a perfect job for findgroups and splitapply.
% Find groups in the final column
G = findgroups(M(:,6));
% function to apply per group
fcn = #(group) [mean(group, 1, 'omitnan'), size(group, 1)];
% Use splitapply to apply fcn to each group in M(:,1:5)
result = splitapply(fcn, M(:, 1:5), G);
% Check
assert(isequaln(result, avM));
M = sortrows(M,6); % sort the data per subscript
IDX = diff(M(:,6)); % find where the subscript changes
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)]; % add start and end of data
for iSub= 2:numel(tmp)
% Calculate the mean over just a single subscript, store in iSub-1
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];tmp(iSub-1)];
end
This is some 60 times faster than your original code on my computer. The speed-up mainly comes from presorting the data and then finding all locations where the subscript changes. That way you do not have to traverse the full array each time to find the correct subscripts, but rather you only check what's necessary each iteration. You thus calculate the mean over ~100 rows, instead of first having to check in 1,000,000 rows whether each row is needed that iteration or not.
Thus: in the original you check numel(uniqueSubs), 10,000 in this case, whether all N, 1,000,000 here, numbers belong to a certain category, which results in 10^12 checks. The proposed code sorts the rows (sorting is NlogN, thus 6,000,000 here), and then loop once over the full array without additional checks.
For completion, here is the original code, along with my version, and it shows the two are the same:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
uniqueSubs = unique(M(:,6));
%% zlon's original code
avM = nan(numel(uniqueSubs),7); % add the subscript for comparison later
tic
uniqueSubs = unique(M(:,6));
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1) uniqueSubs(iSub)];
end
toc
%%%%% End of zlon's code
avM = sortrows(avM,7); % Sort for comparison
%% Start of Adriaan's code
avM2 = nan(numel(uniqueSubs),6);
tic
M = sortrows(M,6);
IDX = diff(M(:,6));
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)];
for iSub = 2:numel(tmp)
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];
end
toc %tic/toc should not be used for accurate timing, this is just for order of magnitude
%%%% End of Adriaan's code
all(avM(:,1:6) == avM2) % Do the comparison
% End of script
% Output
Elapsed time is 58.561347 seconds.
Elapsed time is 0.843124 seconds. % ~70 times faster
ans =
1×6 logical array
1 1 1 1 1 1 % i.e. the matrices are equal to one another

MATLAB: Using a for loop within another function

I am trying to concatenate several structs. What I take from each struct depends on a function that requires a for loop. Here is my simplified array:
t = 1;
for t = 1:5 %this isn't the for loop I am asking about
a(t).data = t^2; %it just creates a simple struct with 5 data entries
end
Here I am doing concatenation manually:
A = [a(1:2).data a(1:3).data a(1:4).data a(1:5).data] %concatenation function
As you can see, the range (1:2), (1:3), (1:4), and (1:5) can be looped, which I attempt to do like this:
t = 2;
A = [for t = 2:5
a(1:t).data
end]
This results in an error "Illegal use of reserved keyword "for"."
How can I do a for loop within the concatenate function? Can I do loops within other functions in Matlab? Is there another way to do it, other than copy/pasting the line and changing 1 number manually?
You were close to getting it right! This will do what you want.
A = []; %% note: no need to initialize t, the for-loop takes care of that
for t = 2:5
A = [A a(1:t).data]
end
This seems strange though...you are concatenating the same elements over and over...in this example, you get the result:
A =
1 4 1 4 9 1 4 9 16 1 4 9 16 25
If what you really need is just the .data elements concatenated into a single array, then that is very simple:
A = [a.data]
A couple of notes about this: why are the brackets necessary? Because the expressions
a.data, a(1:t).data
don't return all the numbers in a single array, like many functions do. They return a separate answer for each element of the structure array. You can test this like so:
>> [b,c,d,e,f] = a.data
b =
1
c =
4
d =
9
e =
16
f =
25
Five different answers there. But MATLAB gives you a cheat -- the square brackets! Put an expression like a.data inside square brackets, and all of a sudden those separate answers are compressed into a single array. It's magic!
Another note: for very large arrays, the for-loop version here will be very slow. It would be better to allocate the memory for A ahead of time. In the for-loop here, MATLAB is dynamically resizing the array each time through, and that can be very slow if your for-loop has 1 million iterations. If it's less than 1000 or so, you won't notice it at all.
Finally, the reason that HBHB could not run your struct creating code at the top is that it doesn't work unless a is already defined in your workspace. If you initialize a like this:
%% t = 1; %% by the way, you don't need this, the t value is overwritten by the loop below
a = []; %% always initialize!
for t = 1:5 %this isn't the for loop I am asking about
a(t).data = t^2; %it just creates a simple struct with 5 data entries
end
then it runs for anyone the first time.
As an appendix to gariepy's answer:
The matrix concatenation
A = [A k];
as a way of appending to it is actually pretty slow. You end up reassigning N elements every time you concatenate to an N size vector. If all you're doing is adding elements to the end of it, it is better to use the following syntax
A(end+1) = k;
In MATLAB this is optimized such that on average you only need to reassign about 80% of the elements in a matrix. This might not seam much, but for 10k elements this adds up to ~ an order of magnitude of difference in time (at least for me).
Bare in mind that this works only in MATLAB 2012b and higher as described in this thead: Octave/Matlab: Adding new elements to a vector
This is the code I used. tic/toc syntax is not the most accurate method for profiling in MATLAB, but it illustrates the point.
close all; clear all; clc;
t_cnc = []; t_app = [];
N = 1000;
for n = 1:N;
% Concatenate
tic;
A = [];
for k = 1:n;
A = [A k];
end
t_cnc(end+1) = toc;
% Append
tic;
A = [];
for k = 1:n;
A(end+1) = k;
end
t_app(end+1) = toc;
end
t_cnc = t_cnc*1000; t_app = t_app*1000; % Convert to ms
% Fit a straight line on a log scale
P1 = polyfit(log(1:N),log(t_cnc),1); P_cnc = #(x) exp(P1(2)).*x.^P1(1);
P2 = polyfit(log(1:N),log(t_app),1); P_app = #(x) exp(P2(2)).*x.^P2(1);
% Plot and save
loglog(1:N,t_cnc,'.',1:N,P_cnc(1:N),'k--',...
1:N,t_app,'.',1:N,P_app(1:N),'k--');
grid on;
xlabel('log(N)');
ylabel('log(Elapsed time / ms)');
title('Concatenate vs. Append in MATLAB 2014b');
legend('A = [A k]',['O(N^{',num2str(P1(1)),'})'],...
'A(end+1) = k',['O(N^{',num2str(P2(1)),'})'],...
'Location','northwest');
saveas(gcf,'Cnc_vs_App_test.png');

MATLAB: Dividing a year-length varying-resolution time vector into months

I have a time series in the following format:
time data value
733408.33 x1
733409.21 x2
733409.56 x3
etc..
The data runs from approximately 01-Jan-2008 to 31-Dec-2010.
I want to separate the data into columns of monthly length.
For example the first column (January 2008) will comprise of the corresponding data values:
(first 01-Jan-2008 data value):(data value immediately preceding the first 01-Feb-2008 value)
Then the second column (February 2008):
(first 01-Feb-2008 data value):(data value immediately preceding the first 01-Mar-2008 value)
et cetera...
Some ideas I've been thinking of but don't know how to put together:
Convert all serial time numbers (e.g. 733408.33) to character strings with datestr
Use strmatch('01-January-2008',DatesInChars) to find the indices of the rows corresponding to 01-January-2008
Tricky part (?): TransformedData(:,i) = OriginalData(start:end) ? end = strmatch(1) - 1 and start = 1. Then change start at the end of the loop to strmatch(1) and then run step 2 again to find the next "starting index" and change end to the "new" strmatch(1)-1 ?
Having it speed optimized would be nice; I am going to apply it on data sampled ~2 million times.
Thanks!
I would use histc with a list a list of last days of the month as the second parameter (Note: use histc with the two return functions).
The edge list can easily be created with datenum or datevec.
This way you don't have operation on string and you that should be fast.
EDIT:
Example with result in a simple data structure (including some code from #Rody):
% Generate some test times/data
tstart = datenum('01-Jan-2008');
tend = datenum('31-Dec-2010');
tspan = tstart : tend;
tspan = tspan(:) + randn(size(tspan(:))); % add some noise so it's non-uniform
data = randn(size(tspan));
% Generate list of edge
edge = [];
for y = 2008:2010
for m = 1:12
edge = [edge datenum(y, m, 1)];
end
end
% Histogram
[number, bin] = histc(tspan, edge);
% Setup of result
result = {};
for n = 1:length(edge)
result{n} = [tspan(bin == n), data(bin == n)];
end
% Test
% 04-Aug-2008 17:25:20
datestr(result{8}(4,1))
tspan(data == result{8}(4,2))
datestr(tspan(data == result{8}(4,2)))
Assuming you have sorted, non-equally-spaced date numbers, the way to go here is to put the relevant data in a cell array, so that each entry corresponds to the next month, and can hold a different amount of elements.
Here's how to do that quite efficiently:
% generate some test times/data
tstart = datenum('01-Jan-2008');
tend = datenum('31-Dec-2010');
tspan = tstart : tend;
tspan = tspan(:) + randn(size(tspan(:))); % add some noise so it's non-uniform
data = randn(size(tspan));
% find month numbers
[~,M] = datevec(tspan);
% find indices where the month changes
inds = find(diff([0; M]));
% extract data in columns
sz = numel(inds)-1;
cols = cell(sz,1);
for ii = 1:sz-1
cols{ii} = data( inds(ii) : inds(ii+1)-1 );
end
Note that it can be difficult to determine which entry in cols belongs to which month, year, so here's how to do it in a more human-readable way:
% change this line:
[y,M] = datevec(tspan);
% and change these lines:
cols = cell(sz,3);
for ii = 1:sz-1
cols{ii,1} = data( inds(ii) : inds(ii+1)-1 );
% also store the year and month
cols{ii,2} = y(inds(ii));
cols{ii,3} = M(inds(ii));
end
I'll assume you have a timeVals an Nx1 double vector holding the time value of each datum. Assuming data is also an Nx1 array. I also assume data and timeVals are sorted according to time: that is, the samples you have are ordered according to the time they were taken.
How about:
subs = #(x,i) x(:,i);
months = subs( datevec(timeVals), 2 ); % extract the month of year as a number from the time
r = find( months ~= [months(2:end), months(end)+1] );
monthOfCell = months( r );
r( 2:end ) = r( 2:end ) - r( 1:end-1 );
dataByMonth = mat2cell( data', r ); % might need to transpose data or r here...
timeByMonth = mat2cell( timeVal', r );
After running this code, you have a cell array dataByMonth each cell contains all data relevant to a specific month. The corresponding cell of timeByMonth holds the sampling times of the data of the respective month. Finally, monthOfCell tells you what is the month's number (1-12) of each cell.