Vectorizing array indexing/subsetting in Matlab - matlab

Suppose I have a long data vector y, plus some indices into it. I want to extract a short snippet or window around every index.
For example, suppose I want to construct a matrix containing 64 samples before and 64 samples after every value that is below three. This is trivial to do in a for-loop:
WIN_SIZE = 64;
% Sample data with padding
data = [nan(WIN_SIZE,1); randn(1e6,1); nan(WIN_SIZE,1)];
% Sample events, could be anything
index = find(data < 3);
snippets = nan(length(index), 2*WIN_SIZE + 1);
for ii=1:length(index)
snippets(ii,:) = data((index(ii)-WIN_SIZE):(index(ii)+WIN_SIZE));
end
However, this is not blazingly fast. Is there any way to vectorize (or otherwise speed up) this operation?
(In case this is unclear, the index could be anything and may not necessarily be a property of the data; I just wanted something simple to illustrate the idea.)

Use bsxfun -
snippets = data(bsxfun(@plus,index(:),[-WIN_SIZE:WIN_SIZE]))
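To see why this works, it can help to look at the index matrix that bsxfun builds: each row is one event index plus the offsets -WIN_SIZE..WIN_SIZE, and indexing a column vector with a matrix returns a result shaped like that matrix. A minimal sketch on a tiny, made-up data vector (the values of data and index are assumptions chosen for readability), checked against the loop version:

```matlab
WIN_SIZE = 2;
% Toy data with NaN padding on both ends (hypothetical values)
data = [nan(WIN_SIZE,1); (1:6).'; nan(WIN_SIZE,1)];
index = [3; 5; 8];                              % hypothetical event positions

offsets = -WIN_SIZE:WIN_SIZE;                   % 1-by-5 row of offsets
idxMatrix = bsxfun(@plus, index(:), offsets);   % 3-by-5 matrix of indices
snippets = data(idxMatrix);                     % same shape as idxMatrix

% Reference result from the explicit loop
loopSnippets = nan(numel(index), 2*WIN_SIZE + 1);
for ii = 1:numel(index)
    loopSnippets(ii,:) = data((index(ii)-WIN_SIZE):(index(ii)+WIN_SIZE));
end
disp(isequaln(snippets, loopSnippets))          % isequaln treats NaNs as equal
```

On releases with implicit expansion (R2016b and later), index(:) + offsets builds the same matrix without bsxfun.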

Related

dynamically fill vector without assigning empty matrix

Oftentimes I need to dynamically fill a vector in Matlab. However, this is slightly annoying since you have to define an empty variable first, e.g.:
[a,b,c]=deal([]);
for ind=1:10
if rand>.5 %some random condition to emphasize the dynamical fill of vector
a=[a, randi(5)];
end
end
a %display result
Is there a better way to implement this 'push' function, so that you do not have to define an empty vector beforehand? People tell me this is nonsensical in Matlab; if you think this is the case, please explain why.
related: Push a variable in a vector in Matlab, is-there-an-elegant-way-to-create-dynamic-array-in-matlab
In MATLAB, pre-allocation is the way to go. From the docs:
for and while loops that incrementally increase the size of a data structure each time through the loop can adversely affect performance and memory use.
As pointed out in the comments by m7913d, there is a question on MathWorks' answers section which addresses this same point, read it here.
I would suggest "over-allocating" memory, then reducing the size of the array after your loop.
numloops = 10;
a = nan(numloops, 1);
for ind = 1:numloops
if rand > 0.5
a(ind) = 1; % assign some value to the current loop index
end
end
a = a(~isnan(a)); % Get rid of values which weren't used (and remain NaN)
No, this doesn't decrease the amount you have to write before your loop, it's even worse than having to write a = []! However, you're better off spending a few extra keystrokes and minutes writing well structured code than making that saving and having worse code.
It is (as far as I know) not possible in MATLAB to omit the initialisation of your variable before using it on the right-hand side of an expression. Moreover, it is not desirable to omit it, as preallocating an array is almost always the right way to go.
As mentioned in this post, it is desirable to preallocate a matrix even if the exact number of elements is not known. To demonstrate this, here is a small benchmark:
Ns = [1 10 100 1000 10000 100000];
timeEmpty = zeros(size(Ns));
timePreallocate = zeros(size(Ns));
for i=1:length(Ns)
N = Ns(i);
timeEmpty(i) = timeit(@() testEmpty(N));
timePreallocate(i) = timeit(@() testPreallocate(N));
end
figure
semilogx(Ns, timeEmpty ./ timePreallocate);
xlabel('N')
ylabel('time_{empty}/time_{preallocate}');
% do not preallocate memory
function a = testEmpty (N)
a = [];
for ind=1:N
if rand>.5 %some random condition to emphasize the dynamical fill of vector
a=[a, randi(5)];
end
end
end
% preallocate memory with the largest possible return size
function a = testPreallocate (N)
last = 0;
a = zeros(N, 1);
for ind=1:N
if rand>.5 %some random condition to emphasize the dynamical fill of vector
last = last + 1;
a(last) = randi(5);
end
end
a = a(1:last);
end
This figure shows how many times slower the method without preallocation is than preallocating an array of the largest possible return size. Note that preallocating is especially important for large arrays, because the slowdown grows rapidly with N.

Looping over fifty elements at a time (matlab)

I am currently new to matlab, and I am trying to do a loop over fifty elements at a time instead of one element at a time. For example, I have a list of 1000 elements, and I would like to compute the sum for every fifty elements. Instead of doing a sum function through indexing, it would be much faster with a loop. How would I go about doing this?
I.e. [1,...50th element, 51st element... 100...]
Output would be the sum values of 1:50, 51:100, 101:150... and so on.
Thanks in advance
I'm not really sure what you mean by "a sum function through indexing", but there are various ways to do this. In general I try to avoid explicit loops in Matlab and let MathWorks functions do their magic.
results = zeros(20,1);
for i = 1:20
results(i) = sum(1 + (50 * (i - 1)):50 + 50 * (i - 1));
end
Another option is to do something like arrayfun.
sIndex = 1:50:951;
eIndex = 50:50:1000;
result = arrayfun(@(x, y) sum(x:y), sIndex, eIndex);
You could also use reshape and sum to do it one shot.
numbers = 1:1000;
numbers2 = reshape(numbers, 50, []);
result = sum(numbers2);
This last method is what I personally would say is a Matlab way of doing it. arrayfun is basically a wrapper around a loop and the loop is...well a loop.
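The reshape approach assumes the number of elements is an exact multiple of 50. If it is not, one option is to zero-pad up to the next multiple first, since zeros do not change a sum. A minimal sketch, with a made-up input length of 1030 (blockSize, nBlocks and padded are illustrative names):

```matlab
numbers = 1:1030;                          % hypothetical input, not a multiple of 50
blockSize = 50;

% Pad with zeros up to the next multiple of blockSize
nBlocks = ceil(numel(numbers) / blockSize);
padded = zeros(1, nBlocks * blockSize);
padded(1:numel(numbers)) = numbers;

% One column per block, then sum down each column
blockSums = sum(reshape(padded, blockSize, []), 1);
```

The last entry of blockSums is then the sum of the incomplete final block (elements 1001 to 1030 here).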
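If you need the sum, you can also use movsum. Note that with 'Endpoints','discard' it returns the sum of every full 50-element window (951 values here), so keep every 50th result to get the non-overlapping blocks:
array = 1:1000;
win = 50; % window size
msum = movsum(array,win,'Endpoints','discard'); % sums of all full windows
blockSums = msum(1:win:end); % sums of 1:50, 51:100, ...
In the same way, you can use: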
movmax Moving maximum
movmean Moving mean
movmedian Moving median
movmin Moving minimum
movstd Moving standard deviation
movvar Moving variance
Using cumsum and diff you can obtain the desired result.
C = [0 cumsum(a)];
out = diff(C(1:50:end));
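The cumsum/diff approach works because C(k+1) holds the sum of the first k elements, so the difference between entries 50 apart is the sum of one block. A small check, assuming a is a row vector whose length is a multiple of 50 (a is not defined in the snippet above, so 1:1000 is used here as a stand-in):

```matlab
a = 1:1000;                        % stand-in row vector (assumption)
C = [0 cumsum(a)];                 % C(k+1) = sum(a(1:k))
out = diff(C(1:50:end));           % C(51)-C(1), C(101)-C(51), ...
ref = sum(reshape(a, 50, []), 1);  % block sums via reshape, for comparison
disp(isequal(out, ref))
```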

MATLAB Piecewise function

I have to construct the following function in MATLAB and am having trouble.
Consider the function s(t) defined for t in [0,4) by
       { sin(pi*t/2) , for t in [0,1)
s(t) = { -(t-2)^3    , for t in [1,3)
       { sin(pi*t/2) , for t in [3,4)
(i) Generate a column vector s consisting of 512 uniform
samples of this function over the interval [0,4). (This
is best done by concatenating three vectors.)
I know it has to be something of the form.
N = 512;
s = sin(5*t/N).' ;
But I need s to be the piecewise function, can someone provide assistance with this?
If I understand correctly, you're trying to create 3 vectors which calculate the specific function outputs for all t, then take slices of each and concatenate them depending on the actual value of t. This is inefficient as you're initialising 3 times as many vectors as you actually want (memory), and also making 3 times as many calculations (CPU), most of which will just be thrown away. To top it off, it'll be a bit tricky to use concatenate if your t is ever not as you expect (i.e. monotonically increasing). It might be an unlikely situation, but better to be general.
Here are two alternatives, the first is imho the nice Matlab way, the second is the more conventional way (you might be more used to that if you're coming from C++ or something, I was for a long time).
function example()
t = linspace(0,4,513); % generate your time-trajectory
t = t(1:end-1); % exclude final value which is 4
tic
traj1 = myFunc(t);
toc
tic
traj2 = classicStyle(t);
toc
end
function trajectory = myFunc(t)
trajectory = zeros(size(t)); % since you know the size of your output, generate it at the beginning. More efficient than dynamically growing this.
% you could put an assert for t>=0 and t<4, otherwise you could end up with 0s wherever t is outside your expected range
% find the indices for each piecewise segment you care about
idx1 = find(t<1);
idx2 = find(t>=1 & t<3);
idx3 = find(t>=3 & t<4);
% now calculate each entry appropriately
trajectory(idx1) = sin(pi.*t(idx1)./2);
trajectory(idx2) = -(t(idx2)-2).^3;
trajectory(idx3) = sin(pi.*t(idx3)./2);
end
function trajectory = classicStyle(t)
trajectory = zeros(size(t));
% conventional way: loop over each t, and differentiate with if-else
% works, but a lot more code and ugly
for i=1:numel(t)
if t(i)<1
trajectory(i) = sin(pi*t(i)/2);
elseif t(i)>=1 && t(i)<3
trajectory(i) = -(t(i)-2)^3;
elseif t(i)>=3 && t(i)<4
trajectory(i) = sin(pi*t(i)/2);
else
error('t is beyond bounds!')
end
end
end
Note that when I tried it, the 'conventional way' is sometimes faster for the sample size you're working with, although the first way (myFunc) is definitely faster as you scale up to much larger inputs. In any case I recommend the first approach, as it is much easier to read.
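For completeness, the concatenation approach hinted at in the exercise can also be written so that each expression is evaluated only on its own sub-interval, which avoids the wasted work described above. This is a sketch, not the assignment's reference solution; the names t1, t2 and t3 are illustrative, and it assumes t is sorted in increasing order:

```matlab
N = 512;
t = (0:N-1).' * (4/N);     % 512 uniform samples on [0,4), as a column

% Split the time axis into the three sub-intervals
t1 = t(t < 1);
t2 = t(t >= 1 & t < 3);
t3 = t(t >= 3);

% Evaluate each piece on its own samples and stack them
s = [sin(pi*t1/2); -(t2 - 2).^3; sin(pi*t3/2)];
```

Because t is increasing, stacking the three pieces in order keeps s aligned with t, and the result is already the column vector the exercise asks for.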

Nearest column in matlab

I want to find the nearest column of a matrix with a vector.
Consider the matrix is D and the vector is y. I want an acceleration method for this function
function BOOLEAN = IsExsist(D,y)
[~, Ysize, ~] = size(D);
BOOLEAN = 0;
MIN = 1.5;
for i=1:Ysize
if(BOOLEAN == 1)
break;
end;
if(norm(y - D(:,i),1) < MIN )
BOOLEAN = 1;
end;
end;
end
I am assuming you are looking to "accelerate" this procedure. For the same, try this -
[~,nearest_column_number] = min(sum(abs(bsxfun(@minus,D,y))))
The above code uses 1-Norm (as used by you) along all the columns of D w.r.t. y. nearest_column_number is your desired output.
If you are interested in using a threshold MIN for getting the first nearest column number, you may use the following code -
normvals = sum(abs(bsxfun(@minus,D,y)))
nearest_column_number = find(normvals<MIN,1)
BOOLEAN = ~isempty(nearest_column_number)
nearest_column_number and BOOLEAN are the outputs you might be interested in.
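As an illustration of what these expressions compute, here is a small worked example on a made-up 2-by-3 matrix (the values of D, y and MIN are assumptions chosen so the distances are easy to check by hand):

```matlab
D = [0 1 5;
     0 1 5];                  % three candidate columns (hypothetical)
y = [1.2; 0.9];               % query vector (hypothetical)
MIN = 1.5;                    % threshold, as in the question

% 1-norm distance from y to every column of D
normvals = sum(abs(bsxfun(@minus, D, y)), 1);   % approximately [2.1 0.3 7.9]

[~, nearest_column_number] = min(normvals);     % column 2 is closest
first_below = find(normvals < MIN, 1);          % first column under the threshold
BOOLEAN = ~isempty(first_below);
```

Passing the dimension argument 1 to sum is a small safeguard: without it, a D with a single row would be summed along the wrong dimension.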
If you are looking to make a function out of it, just wrap in the above code into the function format you were using, as you already have the desired output from the code.
Edit 1: If you are using this procedure for a case with large D matrices with sizes like 9x88800, use this -
normvals = sum(abs(bsxfun(@minus,D,y)));
BOOLEAN = false;
for k = 1:numel(normvals)
if normvals(k) < MIN
BOOLEAN = true;
break;
end
end
Edit 2: It appears that you are calling this procedure/function a lot of times, which is the bottleneck here. So, my suggestion at this point would be to look into your calling function and see if you could reduce the number of calls, otherwise use your own code or try this slight modified version of it -
BOOLEAN = false;
for k = 1:size(D,2)
if norm(y - D(:,k),1) < MIN %// You can try replacing this with "if sum(abs(y - D(:,k))) < MIN" to see if it gives any performance improvement
BOOLEAN = true;
break;
end
end
To find the nearest column of a matrix D to a column vector y, with respect to 1-norm distance, you can use pdist2:
[~, index] = min(pdist2(y.',D.','minkowski',1));
What you are currently trying to do is optimize your Matlab implementation of linear search.
No matter how much you optimize that it will always need to calculate all D=88800 distances over all d=9 dimensions for each search.
Now that's easy to implement in Matlab as discussed in the other answer, but if you are planning to do many such searches, I would recommend to use a different data-structure and search-algorithm instead.
A good candidate would be (binary) space partitioning, which recursively splits your space into two parts along your dimensions. This adds quite some initial overhead to create the tree and makes insert and remove operations a bit more expensive. But as I understand your comments, searches are much more frequent, and their execution will reduce in complexity from O(D) down to O(log(D)), which is a tremendous improvement for this problem size.
I think that there should be some usable Matlab-implementations of BSP around, e.g. on Mathworks file-exchange.
But if you don't manage to find one, I'd be happy to provide some pointers as well.
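As one possible pointer: if the Statistics and Machine Learning Toolbox is available, its KDTreeSearcher builds a k-d tree (a form of space partitioning) once and can then be queried repeatedly with knnsearch. The following is a sketch under that assumption, with random stand-in data in place of the real D; whether it beats the brute-force search for 9 dimensions is something to measure rather than assume.

```matlab
rng(0);                                  % reproducible stand-in data
D = rand(9, 88800);                      % hypothetical 9-by-88800 matrix
y = rand(9, 1);                          % hypothetical query vector

% Build the tree once; observations must be rows, hence the transpose
tree = KDTreeSearcher(D.', 'Distance', 'cityblock');

% Query: index of the nearest column and its 1-norm distance
[nearestIdx, nearestDist] = knnsearch(tree, y.');

% Brute-force check against the linear search
[bruteDist, bruteIdx] = min(sum(abs(bsxfun(@minus, D, y)), 1));
```

The threshold test from the question then becomes nearestDist < MIN.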

Looping over matrix elements more efficiently in Matlab

I am writing some matlab code and have written an algorithm that works but I don't think it's particularly efficient. Since I am trying to improve my programming skills I would like to know if there is a more efficient way of doing this.
I have a (reasonably large ~ E07) matrix of values which are unordered, but fall within the range [-100, 100]. I want to create a second matrix based on the first, by using the following rules:
If the value of the point is > 70, then the value of the point should be set to 70.
If the value of the point is < -70, then the value of the point should be set to -70.
All other values should be rounded to the nearest multiple of 5.
Here is what I am currently doing:
data = 100*(-1+2*rand(1,10000000)); % create random dataset for stackoverflow
new_data = zeros(1,length(data));
for i = 1:length(data)
if (data(i) > 70)
new_data(i) = 70;
elseif (data(i) < -70)
new_data(i) = -70;
else
new_data(i) = round(data(i)/5.0)*5.0;
end
end
Is there a more efficient method? I think there should be a way to do this using logical indexes but those are a new discovery for me...
You do not need a loop at all:
data = 100*(-1+2*rand(1,10000000)); % create random dataset for stackoverflow
new_data = zeros(1,length(data)); % note that this memory allocation is not necessary at this point
new_data = round(data/5.0)*5.0;
new_data(data>70) = 70;
new_data(data<-70) = -70;
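Since logical indexing is new to you, here is the same idea on six made-up values, so each step can be checked by hand. The comparison data > 70 produces a logical mask of the same size as data, and assigning through that mask only touches the elements where it is true (tooHigh and tooLow are illustrative names):

```matlab
data = [-83.2 -12.4 3.1 68.9 71.5 99.0];   % hypothetical sample values

new_data = round(data/5)*5;   % [-85 -10 5 70 70 100]

tooHigh = data > 70;          % logical mask: [0 0 0 0 1 1]
tooLow  = data < -70;         % logical mask: [1 0 0 0 0 0]

new_data(tooHigh) = 70;       % overwrite only where the mask is true
new_data(tooLow)  = -70;      % result: [-70 -10 5 70 70 70]
```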
Even easier is to use max and min. Do it in one simple line.
new_data = 5*round(max(-70,min(70,data))/5);
The two answers by H.Muster and woodchips are of course the way to do it, but there still are small improvements to be found. If you are after performance you might want to exploit specifics of your problem. For example, your output data is integers -100 <= x <= 100. This obviously qualifies for 8-bit signed integer data type. This code (note explicit cast to int8 from arbitrary double precision data)
% your double precision input data
data = 100*(-1+2*rand(1,10000000));
% cast to int8 - matlab does usual round here
data = int8(data);
new_data = 5*(max(-70,min(70,data))/5);
is the fastest for two reasons:
1 data element takes 1 byte, not 8. Memory bandwidth is a limiting factor here, so you get a lot of improvement
round is no longer necessary
Here are some timings from the codes of H.Muster, woodchips, and my small modification:
H.Muster Elapsed time is 0.235885 seconds.
woodchips Elapsed time is 0.167659 seconds.
my code Elapsed time is 0.023061 seconds.
The difference is quite striking. Although MATLAB uses doubles by default, you should try to use integer data types when possible.
Edit: This works because of how MATLAB implements integer arithmetic. Unlike in C, casting a double to an integer type rounds to the nearest integer rather than truncating:
a = 0.1;
int8(a)
ans =
0
a = 0.9;
int8(a)
ans =
1
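The same rounding applies to integer division, which is why 5*(x/5) snaps an int8 value to the nearest multiple of 5 without calling round. A short illustration (the variable names are illustrative); note also that out-of-range values saturate rather than wrap:

```matlab
q1 = int8(7) / 5;     % 1.4 is rounded to 1
q2 = int8(8) / 5;     % 1.6 is rounded to 2

% Divide, let the integer division round, then scale back up
snapped = 5 * (int8([7 8 -12 -13]) / 5);   % [5 10 -10 -15]

sat = int8(200);      % saturates at intmax('int8'), i.e. 127
```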