Matlab - Sum groups of elements of a vector based on given indices

I am reposting a question I asked this week that, due to a missing tag, went unnoticed (it was basically viewed only by me).
I have two large vectors, values and indices. I need to sum the elements of values using indices as in this brute force example:
% The two vectors, which are given, look like this:
N = 3e7;
values = (rand(N, 1) > 0.3);
indices = cumsum(ceil(4*rand(N, 1)));
indices = [0; indices(find(indices > 1, 1, 'first'):find(indices < N, 1, 'last')); N];
HH = numel(indices) - 1;
% This is the brute force solution
tic
out1 = zeros(HH, 1);
for hh = 1:HH
out1(hh) = sum(values((indices(hh)+1):indices(hh+1)));
end
toc
A more efficient way to do it is the following:
tic
indices2 = diff(indices);
new_inds = (1:HH+1)';
tmp = zeros(N, 1);
tmp(cumsum(indices2)-indices2+1)=1;
new_inds_long = new_inds(cumsum(tmp));
out2 = accumarray(new_inds_long, values);
toc
A better solution is:
tic
out3 = cumsum(values);
out3 = out3(indices(2:end));
out3 = [out3(1); diff(out3)];
toc
The three solutions are equivalent
all(out1 == out2)
all(out1 == out3)
Question is: since this is really a basic function, is there any faster, already known approach/function that does the same and that I may be overlooking or that I am just not aware of?
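For a hand-checkable view of why the cumsum approach (out3 above) works, here is a minimal sketch on a tiny input. The toy vectors are made up for illustration; indices follows the same convention as above (it starts with 0 and ends with N).

```matlab
% Toy data (illustrative only): 7 values split into 3 groups.
values  = [1; 0; 1; 1; 0; 1; 1];
indices = [0; 2; 3; 7];            % group k covers indices(k)+1 : indices(k+1)

% Brute-force reference.
HH = numel(indices) - 1;
ref = zeros(HH, 1);
for hh = 1:HH
    ref(hh) = sum(values(indices(hh)+1 : indices(hh+1)));
end

% cumsum approach: take the running total at each group end, then difference.
c      = cumsum(values);           % [1 1 2 3 3 4 5]'
totals = c(indices(2:end));        % [1 2 5]'
grp    = diff([0; totals]);        % [1 1 3]'
```

Each group sum is just the difference between the running totals at two consecutive group ends, which is why a single pass over values is enough.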

If generating your indices is not just a placeholder for some other process, this could be improved. Currently you discard most of the generated numbers: the mean step is 2.5, so only about N/2.5 of the N cumulative sums stay below N (roughly 60% are wasted). 1) Determine the number of indices you want (binomial distribution) 2) generate only the used indices.

Related

How can I exploit parallelism when defining values in a sparse matrix?

The following MATLAB code loops through all elements of a matrix with size 2IJ x 2IJ.
for i=1:(I-2)
for j=1:(J-2)
ij1 = i*J+j+1; % row
ij2 = i*J+j+1 + I*J; % col
D1(ij1,ij1) = 2;
D1(ij1,ij2) = -1;
end
end
Is there any way I can parallelize it using MATLAB's parfor command? You can assume any element not defined is 0. So this matrix ends up being sparse (mostly 0s).
Before using parfor, it is worth reading the guidelines on deciding when to use parfor. In particular:
Generally, if you want to make code run faster, first try to vectorize it.
Here vectorization can be used effectively to compute the indices of the nonzero elements, which are then passed to the sparse function. To do so, define one of i or j as a column vector and the other as a row vector; implicit expansion then computes all index combinations.
I = 300;
J = 300;
i = (1:I-2).';
j = 1:J-2;
ij1 = i*J+j+1;
ij2 = i*J+j+1 + I*J;
D1 = sparse(ij1, ij1, 2, 2*I*J, 2*I*J) + sparse(ij1, ij2, -1, 2*I*J, 2*I*J);
For comparison, here is one way to use parfor (not tested):
D1 = sparse (2*I*J, 2*I*J);
parfor i=1:(I-2)
for j=1:(J-2)
ij1 = i*J+j+1;
ij2 = i*J+j+1 + I*J;
D1 = D1 + sparse([ij1;ij1], [ij1;ij2], [2;-1], 2*I*J, 2*I*J) ;
end
end
Here D1 is used as a reduction variable.
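A quick way to gain confidence in a vectorized construction like this is to compare it with the loop on a small case. The sketch below is one possible variant, not the code from the answer: it uses ndgrid instead of implicit expansion (so it should also run on releases before R2016b) and a single sparse call with stacked triplets. The sizes I = 5 and J = 4 are arbitrary.

```matlab
% Small sizes so the full comparison is cheap; I and J are arbitrary here.
I = 5; J = 4; n = 2*I*J;

% Reference: the original double loop.
D_loop = sparse(n, n);
for i = 1:(I-2)
    for j = 1:(J-2)
        ij1 = i*J + j + 1;
        ij2 = ij1 + I*J;
        D_loop(ij1, ij1) = 2;
        D_loop(ij1, ij2) = -1;
    end
end

% Vectorized: build all (row, col, value) triplets, then call sparse once.
[ii, jj] = ndgrid(1:(I-2), 1:(J-2));
r  = ii(:)*J + jj(:) + 1;          % rows, same formula as ij1
c2 = r + I*J;                      % off-diagonal columns, same formula as ij2
D_vec = sparse([r; r], [r; c2], [2*ones(size(r)); -ones(size(r))], n, n);

same = isequal(D_loop, D_vec);     % true if both constructions agree
```

Note that sparse adds up values at repeated (row, col) pairs, whereas loop assignment overwrites them; the two agree here because every pair occurs only once.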

Regarding loop structure in Matlab for an iterative procedure

I'm trying to code a loop in Matlab that iteratively solves for an optimal vector s of zeros and ones. This is my code
N = 150;
s = ones(N,1);
for i = 1:N
if s(i) == 0
i = i + 1;
else
i = i;
end
select = s;
HI = (item_c' * (weights.*s)) * (1/(weights'*s));
s(i) = 0;
CI = (item_c' * (weights.*s)) * (1/(weights'*s));
standarderror_afterex = sqrt(var(CI - CM));
standarderror_priorex = sqrt(var(HI - CM));
ratio = (standarderror_afterex - standarderror_priorex)/(abs(mean(weights.*s) - weights'*select));
ratios(i) = ratio;
s(i) = 1;
end
[M,I] = min(ratios);
s(I) = 0;
This code sets to zero the element of s that has the lowest ratio. But I need this procedure to start all over again, using the new s with one zero, to find the ratios and exclude the element in s that has the lowest ratio. I need to repeat that until no ratios are negative.
Do I need another loop, or do I miss something?
I hope that my question is clear enough, just tell me if you need me to explain more.
Thank you in advance, for helping out a newbie programmer.
Edit
I think that I need to add some form of while loop as well. But I can't see how to structure this. This is the flow that I want
With all items included (s(i) = 1 for all i), calculate HI, CI and the standard errors and list the ratios, exclude item i (s(I) = 0) which corresponds to the lowest negative ratio.
With the new s, including all ones but one zero, calculate HI, CI and the standard errors and list the ratios, exclude item i, which corresponds to the lowest negative ratio.
With the new s, now including all ones but two zeros, repeat the process.
Do this until there is no negative element in ratios to exclude.
Hope that it got more clear now.
Ok. I want to go through a few things before I list my code. These are just how I would try to do it. Not necessarily the best way, or fastest way even (though I'd think it'd be pretty quick). I tried to keep the structure as you had in your code, so you could follow it nicely (even though I'd probably meld all the calculations down into a single function or line).
Some features that I'm using in my code:
bsxfun: Learn this! It is amazing how it works and can speed up code, and makes some things easier.
v = rand(n,1);
A = rand(n,4);
% The two lines below compute the same value:
W = bsxfun(@(x,y)x.*y,v,A);
W_= repmat(v,1,4).*A;
bsxfun dot multiplies the v vector with each column of A.
Both W and W_ are matrices the same size as A, but the first will be much faster (usually).
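To make the snippet above concrete, here is a small self-contained sketch with made-up inputs (n = 5 and the contents of v and A are arbitrary). It uses the built-in handle @times instead of the anonymous function, which does the same element-wise multiplication.

```matlab
n = 5;                              % arbitrary size for the demo
v = (1:n)';                         % n-by-1 column
A = reshape(1:4*n, n, 4);           % n-by-4 matrix
W1 = bsxfun(@times, v, A);          % v is expanded across the columns of A
W2 = repmat(v, 1, 4) .* A;          % explicit replication, same result
same = isequal(W1, W2);
```

On R2016b or newer, implicit expansion should allow plain v .* A to give the same matrix without bsxfun.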
Precalculating dropouts: I made select a matrix, where before it was a vector. This allows me to then form a variable included using logical constructs. The ~(eye(N)) produces an identity matrix and negates it. By logically "and"ing it with select, then the $i$th column is now select, with the $i$th element dropped out.
You were explicitly calculating weights'*s as the denominator in each for-loop. By using the above matrix to calculate this, we can now do a sum(W), where the W is essentially weights.*s in each column.
Take advantage of column-wise operations: the var() and the sqrt() functions are both coded to work along the columns of a matrix, outputting the action for a matrix in the form of a row vector.
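The leave-one-out mask and the column-wise sum can be checked on a tiny case. This is only a sketch with made-up weights; it compares sum(W) against the per-item weights'*s from the question's loop.

```matlab
N = 4;
weights = [0.1; 0.2; 0.3; 0.4];     % made-up weights
select = true(N);
select(2, :) = false;               % pretend item 2 was dropped earlier

included = ~eye(N) & select;        % column i: current selection minus item i
W = bsxfun(@times, weights, included);
denom = sum(W);                     % 1-by-N, one denominator per candidate

% Loop equivalent, as in the question.
expected = zeros(1, N);
for i = 1:N
    s = double(select(:, 1));
    s(i) = 0;
    expected(i) = weights' * s;
end
```

Column 2 of included equals the current selection, since item 2 is already excluded and dropping it again changes nothing.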
Ok. the full thing. Any questions let me know:
% Start with everything selected:
select = true(N);
stop = false; % Stopping flag:
while (~stop)
% Each column leaves a variable out...
included = ~eye(N) & select;
% This calculates the weights with leave-one-out:
W = bsxfun(@(x,y)x.*y,weights,included);
% You can comment out the line below, if you'd like...
W_= repmat(weights,1,N).*included; % This is the same as previous line.
% This calculates the weights before dropping the variables:
V = bsxfun(@(x,y)x.*y,weights,select);
% There's different syntax, depending on whether item_c is a
% vector or a matrix...
if(isvector(item_c))
HI = (item_c' * V)./(sum(V));
CI = (item_c' * W)./(sum(W));
else
% For example: item_c is a matrix...
% We have to use bsxfun() again
HI = bsxfun(@rdivide, (item_c' * V),sum(V));
CI = bsxfun(@rdivide, (item_c' * W),sum(W));
end
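% CI uses the leave-one-out weights (after exclusion); HI uses the
% current selection (prior to exclusion), as in the question's loop.
standarderror_afterex = sqrt(var(bsxfun(@minus,CI,CM)));
standarderror_priorex = sqrt(var(bsxfun(@minus,HI,CM)));
% or:
%
% standarderror_afterex = sqrt(var(CI - repmat(CM,1,size(CI,2))));
% standarderror_priorex = sqrt(var(HI - repmat(CM,1,size(HI,2))));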
ratios = (standarderror_afterex - standarderror_priorex)./(abs(mean(W) - sum(V)));
% Identify the negative ratios:
negratios = ratios < 0;
if ~any(negratios)
% Drop out of the while-loop:
stop = true;
else
% Find the most negative ratio:
neginds = find(negratios);
[mn, mnind] = min(ratios(negratios));
% Drop out the most negative one...
select(neginds(mnind),:) = false;
end
end % end while(~stop)
% Your output:
s = select(:,1);
If for some reason it doesn't work, please let me know.

MATLAB - Vectorize a double loop containing a distance measure

I am trying to optimize my code and am not sure whether, or how, I could vectorize this particular section:
for base_num = 1:base_length
for sub_num = 1:base_length
dist{base_num}(sub_num) = sqrt((x(base_num) - x(sub_num))^2 + (y(base_num) - y(sub_num))^2);
end
end
The following example provides one method of vectorization:
%# Set example parameters
N = 10;
X = randn(N, 1);
Y = randn(N, 1);
%# Your loop based solution
Dist1 = cell(N, 1);
for n = 1:N
for m = 1:N
Dist1{n}(m) = sqrt((X(n) - X(m))^2 + (Y(n) - Y(m))^2);
end
end
%# My vectorized solution
Dist2 = sqrt(bsxfun(@minus, X, X').^2 + bsxfun(@minus, Y, Y').^2);
Dist2Cell = num2cell(Dist2, 2);
A quick speed test at N = 1000 has the vectorized solution running two orders of magnitude faster than the loop solution.
Note: I've used a second line in my vectorized solution to mimic your cell array output structure. It is up to you whether you want to include it, combine it into one line, etc.
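As a sanity check of the vectorized distance formula, here is a small sketch with made-up points chosen so the distances are easy to verify by hand (a 3-4-5 triangle). It also shows hypot as an alternative to writing out the squares; this is a variation, not part of the solution above.

```matlab
X = [0; 3; 0];
Y = [0; 4; 1];

% Same formula as the vectorized solution.
D_bsx = sqrt(bsxfun(@minus, X, X').^2 + bsxfun(@minus, Y, Y').^2);

% hypot computes sqrt(a.^2 + b.^2) element-wise.
D_hyp = hypot(bsxfun(@minus, X, X'), bsxfun(@minus, Y, Y'));

% On R2016b or newer, implicit expansion should allow:
%   D = hypot(X - X', Y - Y');
```

Entry (1,2) is the distance between (0,0) and (3,4), i.e. 5, and the diagonal is zero.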
By the way, +1 for posting code in the question. However, two small suggestions for the future: 1) When posting to SO, use simple variable names - especially for loop subscripts - such as I have in my answer. 2) It is nice when we can copy and paste example code straight into a script and run it without having to do any changes or additions (again such as in my answer). This allows us to converge on a solution more rapidly.

Removing duplicate entries in a vector, when entries are complex and rounding errors are causing problems

I want to remove duplicate entries from a vector on Matlab. The problem I'm having is that rounding errors are stopping the inbuilt Matlab function 'unique' from working properly. Ideally I'd like a way to set some sort of tolerance on the 'unique' function, or a small procedure that will remove the duplicates otherwise. If both the real and imaginary parts of two entries differ by less than 0.0001, then I'm happy to consider them equal. How can I do this?
Any help will be greatly appreciated. Thanks
A simple approximation would be to round the numbers and then use the indices returned by unique:
X = ... (input vector)
[b, ind] = unique(round(X / (tolerance * (1 + 1i)))); % ind rather than i, so the imaginary unit is not shadowed
output = X(ind);
(you can probably replace b with ~ depending on your Matlab version).
It won't quite have the behaviour you want, since it is possible that two numbers are very close but will be rounded differently. I think you could mitigate this by doing:
X = ... (input vector)
[b, ind] = unique(round(X / (tolerance * (1 + 1i))));
X = X(ind);
[b, ind] = unique(round(X / (tolerance * (1 + 1i)) + 0.5 * (1 + 1i)));
X = X(ind);
This will round them twice, so any numbers that are exactly on a rounding boundary will be caught by the second unique.
There is still some messiness in this - some numbers will be affected as though the tolerance was doubled. But it might be sufficient for your needs.
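The rounding-boundary issue is easy to reproduce. The sketch below uses two made-up real values for simplicity (for complex input, round acts on the real and imaginary parts separately), and shows how shifting by half a bin before rounding catches a pair that straddles a boundary.

```matlab
tolerance = 1e-4;
X = [0.49999e-4; 0.50001e-4];              % differ by only 2e-9

keys1 = round(X / tolerance);              % [0; 1] -> treated as different
keys2 = round(X / tolerance + 0.5);        % [1; 1] -> same bin after the shift

n1 = numel(unique(keys1));                 % 2 distinct keys
n2 = numel(unique(keys2));                 % 1 distinct key
```

The first pass alone would keep both values; the shifted pass merges them, which is the purpose of the second unique call.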
The alternative is probably a for loop:
X = sort(X);
last = X(1);
indices = ones(numel(X), 1);
for j=2:numel(X)
% note: for complex operands, > compares only the real parts
if X(j) > last + tolerance * (1 + 1i)
last = X(j) + tolerance * (1 + 1i) / 2;
else
indices(j) = 0;
end
end
X = X(logical(indices));
I think this has the best behaviour you can expect (because you want to represent the vector by as few unique values as possible - when there are lots of numbers that differ by less than the tolerance level, there may be multiple ways of splitting them. This algorithm does so greedily, starting with the smallest).
I'm almost certain the code below will always treat any values closer than 1e-8 as equal. Simply replace 1e-8 with whatever value you want.
% unique variant that treats values closer than 1e-8 as equal
function [out, I] = unique(input, first_last)
threshold = 1e-8;
if nargin < 2
first_last = 'last';
end
[out, I] = sort(input);
db = diff(out);
k = find(abs(db) < threshold);
if strcmpi(first_last, 'last')
k2 = min(I(k), I(k+1));
elseif strcmpi(first_last, 'first')
k2 = max(I(k), I(k+1));
else
error('unknown flag option for unique, must be first or last');
end
k3 = true(1, length(input));
k3(k2) = false;
out = out(k3(I));
I = I(k3(I));
end
The following might serve your purposes. Given X, an array of complex doubles, it sorts them, then checks whether the differences between consecutive elements are within the tolerances real_tol and imag_tol in their real and imaginary parts. It removes elements that satisfy this tolerance.
function X_unique = unique_complex_with_tolerance(X,real_tol,imag_tol)
X_sorted = sort(X); %Sorts by magnitude first, then by phase angle.
dX_sorted = diff(X_sorted);
dX_sorted_real = real(dX_sorted);
dX_sorted_imag = imag(dX_sorted);
remove_idx = (abs(dX_sorted_real)<real_tol) & (abs(dX_sorted_imag)<imag_tol);
X_unique = X_sorted;
X_unique(remove_idx) = [];
return
Note that this code will remove all elements which satisfy this difference tolerance. For example, if X = [1+i,2+2i,3+3i,4+4i], real_tol = 1.1, imag_tol = 1.1, then this function will return only one element, X_unique = [4+4i], even though you might consider, for example, X_unique = [1+i,4+4i] to also be a valid answer.
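If a recent release is available, another option worth trying is the built-in uniquetol. The sketch below assumes R2015a or newer and uses made-up data; it treats each complex number as a row [real, imag] so the tolerance applies to both parts, and sets 'DataScale' to 1 so the tolerance is absolute rather than relative.

```matlab
% Made-up data: the first two entries agree to within 1e-4 in both parts.
X = [1 + 1i; 1.00001 + 1.00002i; 2 + 2i];
tol = 1e-4;

% One row per number: [real part, imaginary part].
[~, ia] = uniquetol([real(X), imag(X)], tol, 'ByRows', true, 'DataScale', 1);

% Keep one representative per group, in the original order.
X_unique = X(sort(ia));
```

Which member of a close group is kept as the representative is up to uniquetol, so only the number of surviving entries should be relied on here.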

Vectorizing for loops in MATLAB

I'm not too sure if this is possible, but my understanding of MATLAB could certainly be better.
I have some code I wish to vectorize as it's causing quite a bottleneck in my program. It's part of an optimisation routine which has many possible configurations of Short Term Average (STA), Long Term Average (LTA) and Sensitivity (OnSense) to run through.
Time is in vector format, FL2onSS is the main data (an Nx1 double), FL2onSSSTA is its STA (NxSTA double), FL2onSSThresh is its Threshold value (NxLTAxOnSense double)
The idea is to calculate a Red alarm matrix which will be 4D - the alarmStatexSTAxLTAxOnSense that is used throughout the rest of the program.
Red = zeros(length(FL2onSS), length(STA), length(LTA), length(OnSense), 'double');
for i=1:length(STA)
for j=1:length(LTA)
for k=1:length(OnSense)
Red(:,i,j,k) = calcRedAlarm(Time, FL2onSS, FL2onSSSTA(:,i), FL2onSSThresh(:,j,k));
end
end
end
I've currently got this repeating a function in an attempt to get a bit more speed out of it, but obviously it will be better if the entire thing can be vectorised. In other words I do not need to keep the function if there is a better solution.
function [Red] = calcRedAlarm(Time, FL2onSS, FL2onSSSTA, FL2onSSThresh)
% Calculate Alarms
% Alarm triggers when STA > Threshold
zeroSize = length(FL2onSS);
%Precompose
Red = zeros(zeroSize, 1, 'double');
for i=2:zeroSize
%Because of time chunks being butted up against each other, alarms can
%go off when they shouldn't. To fix this, timeDiff has been
%calculated to check if the last date is different to the current by 5
%seconds. If it isn't, don't generate an alarm as there is either a
%validity or time gap.
timeDiff = etime(Time(i,:), Time(i-1,:));
if FL2onSSSTA(i) > FL2onSSThresh(i) && FL2onSSThresh(i) ~= 0 && timeDiff == 5
%If Short Term Avg is > Threshold, Trigger
Red(i) = 1;
elseif FL2onSSSTA(i) < FL2onSSThresh(i) && FL2onSSThresh(i) ~= 0 && timeDiff == 5
%If Short Term Avg is < Threshold, Turn off
Red(i) = 0;
else
%Otherwise keep current state
Red(i) = Red(i-1);
end
end
end
The code is simple enough so I won't explain it any further. If you need elucidation on what a particular line is doing, let me know.
The trick is to bring all your data to the same form, using mostly repmat and permute. Then the logic is the simple part.
I needed a nasty trick to implement the last part (if none of the conditions hold, use the last result). Usually that sort of logic is done using a cumsum. I had to use another matrix of 2.^n to make sure the values that are defined are used (so that +1,+1,-1 will really give 1,1,0) - just look at the code :)
%// define size variables for better readability
N = length(Time);
M = length(STA);
O = length(LTA);
P = length(OnSense);
%// transform the main data to the same dimensions (3D matrices)
%// note that I flatten FL2onSSThresh to be 2D first, to make things simpler.
%// anyway, you don't use the fact that it's 3D except when traversing it.
FL2onSSThresh2 = reshape(FL2onSSThresh, [N, O*P]);
FL2onSSThresh3 = repmat(FL2onSSThresh2, [1, 1, M]);
FL2onSSSTA3 = permute(repmat(FL2onSSSTA, [1, 1, O*P]), [1, 3, 2]);
timeDiff = round(diff(datenum(Time))*24*60*60); %// rounded to whole seconds so the == 5 test is not defeated by floating-point error
timeDiff3 = repmat(timeDiff, [1, O*P, M]);
%// we also remove the 1st plane from each of the matrices (the vector equivalent of running i=2:zeroSize)
FL2onSSThresh3 = FL2onSSThresh3(2:end, :, :);
FL2onSSSTA3 = FL2onSSSTA3(2:end, :, :);
Red3 = zeros(N-1, O*P, M, 'double');
%// now the logic in vector form
%// note the change of && (short-circuit operator) to & (element-wise operator)
Red3((FL2onSSSTA3 > FL2onSSThresh3) & (FL2onSSThresh3 ~= 0) & (timeDiff3 == 5)) = 1;
Red3((FL2onSSSTA3 < FL2onSSThresh3) & (FL2onSSThresh3 ~= 0) & (timeDiff3 == 5)) = -1;
%// now you have a matrix with +1 where alarm should start, and -1 where it should end.
%// add the 0s at the beginning
Red3 = [zeros(1, O*P, M); Red3];
%// reshape back to the same shape
Red2 = reshape(Red3, [N, O, P, M]);
Red2 = permute(Red2, [1, 4, 2, 3]);
%// and now some nasty trick to convert the start/end data to 1 where alarm is on, and 0 where it is off.
Weights = 2.^repmat((1:N)', [1, M, O, P]);
Red = (sign(cumsum(Weights.*Red2))+1)==2;
%// and we are done.
%// to test this, print sum(Red(:)~=OldRed(:)), where OldRed is Red calculated in non-vector form.
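The weight trick is easier to follow on a single made-up column of events. The sketch below reproduces it and, as an alternative that is not part of the answer above, carries the last event forward with cummax (assuming R2014b or newer). The alternative may be worth considering for long series, since 2^N overflows to Inf in double precision once N reaches 1024.

```matlab
% Made-up event column: +1 = alarm on, -1 = alarm off, 0 = keep state.
ev = [0; 1; 0; 0; -1; 0; 1; 0];
n  = numel(ev);

% Weight trick: the latest event outweighs the sum of all earlier ones.
w = 2.^(1:n)';
state_w = sign(cumsum(w .* ev)) == 1;

% Alternative: look up the index of the most recent nonzero event.
idx = (1:n)';
idx(ev == 0) = 0;
last = cummax(idx);                 % 0 where no event has occurred yet
state_c = false(n, 1);
seen = last > 0;
state_c(seen) = ev(last(seen)) > 0;
```

Both give the alarm state 0,1,1,1,0,0,1,1 for this input: the state switches on at the first +1, holds through the zeros, switches off at the -1, and on again at the final +1.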