Looping over fifty elements at a time (matlab) - matlab

I am currently new to matlab, and I am trying to do a loop over fifty elements at a time instead of one element at a time. For example, I have a list of 1000 elements, and I would like to compute the sum for every fifty elements. Instead of doing a sum function through indexing, it would be much faster with a loop. How would I go about doing this?
I.e. [1,...50th element, 51th element... 100...]
Output would be the the sum values of 1:50, 51:100, 101:150... and so on.
Thanks in advance

I'm not really sure what you mean by "a sum function through indexing", but there are various ways to do this. In general I try to avoid explicit loops in Matlab and let MathWorks functions do their magic.
results = zeros(20,1);
for i = 1:20
results(i) = sum(1 + (50 * (i - 1)):50 + 50 * (i - 1));
end
Another option is to do something like arrayfun.
sIndex = 1:50:951;
eIndex = 50:50:1000;
result = arrayfun(#(x, y) sum(x:y), sIndex, eIndex);
You could also use reshape and sum to do it one shot.
numbers = 1:1000;
numbers2 = reshape(numbers, 50, []);
result = sum(numbers2);
This last method is what I personally would say is a Matlab way of doing it. arrayfun is basically a wrapper around a loop and the loop is...well a loop.

In case you need the sum, you can also use movsum:
array = 1:1000;
win = 50; % window size
msum = movsum(array,win,'Endpoints','discard');
in the same way, you can use:
movmax Moving maximum
movmean Moving mean
movmedian Moving median
movmin Moving minimum
movstd Moving standard deviation
movvar Moving variance

Using cumsum and diff you can obtain the desired result.
C = [0 cumsum(a)];
out = diff(C(1:50:end));

Related

Indexing elements of parameters of a function within nested for loops

I have two matrices of results, A = 128x631 and B = 128x1014 and I have a function SSD that takes two elements (x,y) as parameters and then calculates the sum of squared differences. I also have a 631x1014 matrix of 0s, called SSDMatrix, ready to put the results of my SSD function into.
What I'm trying to do is compare each element of A with each element of B by passing them into SSD, but I can't figure out how to structure my for loops to get the desired results.
When I try:
SSDMatrix = SSD(A, B);
I get exactly the result I'm looking for, but only for the first cell. How can I repeat this process for each element of A and B?
Currently I have this:
SSDMatrix = zeros(NumFeatures1,NumFeatures2);
for i = 1:631
for j = 1:1014
SSDMatrix(i,j) = SSD(A,B);
end
end
This just results in the first answer being repeated 631*1014 times, so I need a way to index A and B to get the appropriate answer for each (i,j) of SSDMatrix.
It seems you were needed to do something like this -
SSDMatrix = zeros(NumFeatures1,NumFeatures2);
for i = 1:631
for j = 1:1014
SSDMatrix(i,j) = sum( (A(:,i) - B(:,j)).^ 2 );
end
end
This, you can achieve with pdist2 as well that gets us the square root of summed squared distances. Now, please do note that pdist2 is part of the Statistics Toolbox. So, to get the desired output, you can do -
out = pdist2(A.',B.').^2;
Or with bsxfun -
out = squeeze(sum(bsxfun(#minus,A,permute(B,[1 3 2])).^2,1));

A moving average with different functions and varying time-frames

I have a matrix time-series data for 8 variables with about 2500 points (~10 years of mon-fri) and would like to calculate the mean, variance, skewness and kurtosis on a 'moving average' basis.
Lets say frames = [100 252 504 756] - I would like calculate the four functions above on over each of the (time-)frames, on a daily basis - so the return for day 300 in the case with 100 day-frame, would be [mean variance skewness kurtosis] from the period day201-day300 (100 days in total)... and so on.
I know this means I would get an array output, and the the first frame number of days would be NaNs, but I can't figure out the required indexing to get this done...
This is an interesting question because I think the optimal solution is different for the mean than it is for the other sample statistics.
I've provided a simulation example below that you can work through.
First, choose some arbitrary parameters and simulate some data:
%#Set some arbitrary parameters
T = 100; N = 5;
WindowLength = 10;
%#Simulate some data
X = randn(T, N);
For the mean, use filter to obtain a moving average:
MeanMA = filter(ones(1, WindowLength) / WindowLength, 1, X);
MeanMA(1:WindowLength-1, :) = nan;
I had originally thought to solve this problem using conv as follows:
MeanMA = nan(T, N);
for n = 1:N
MeanMA(WindowLength:T, n) = conv(X(:, n), ones(WindowLength, 1), 'valid');
end
MeanMA = (1/WindowLength) * MeanMA;
But as #PhilGoddard pointed out in the comments, the filter approach avoids the need for the loop.
Also note that I've chosen to make the dates in the output matrix correspond to the dates in X so in later work you can use the same subscripts for both. Thus, the first WindowLength-1 observations in MeanMA will be nan.
For the variance, I can't see how to use either filter or conv or even a running sum to make things more efficient, so instead I perform the calculation manually at each iteration:
VarianceMA = nan(T, N);
for t = WindowLength:T
VarianceMA(t, :) = var(X(t-WindowLength+1:t, :));
end
We could speed things up slightly by exploiting the fact that we have already calculated the mean moving average. Simply replace the within loop line in the above with:
VarianceMA(t, :) = (1/(WindowLength-1)) * sum((bsxfun(#minus, X(t-WindowLength+1:t, :), MeanMA(t, :))).^2);
However, I doubt this will make much difference.
If anyone else can see a clever way to use filter or conv to get the moving window variance I'd be very interested to see it.
I leave the case of skewness and kurtosis to the OP, since they are essentially just the same as the variance example, but with the appropriate function.
A final point: if you were converting the above into a general function, you could pass in an anonymous function as one of the arguments, then you would have a moving average routine that works for arbitrary choice of transformations.
Final, final point: For a sequence of window lengths, simply loop over the entire code block for each window length.
I have managed to produce a solution, which only uses basic functions within MATLAB and can also be expanded to include other functions, (for finance: e.g. a moving Sharpe Ratio, or a moving Sortino Ratio). The code below shows this and contains hopefully sufficient commentary.
I am using a time series of Hedge Fund data, with ca. 10 years worth of daily returns (which were checked to be stationary - not shown in the code). Unfortunately I haven't got the corresponding dates in the example so the x-axis in the plots would be 'no. of days'.
% start by importing the data you need - here it is a selection out of an
% excel spreadsheet
returnsHF = xlsread('HFRXIndices_Final.xlsx','EquityHedgeMarketNeutral','D1:D2742');
% two years to be used for the moving average. (250 business days in one year)
window = 500;
% create zero-matrices to fill with the MA values at each point in time.
mean_avg = zeros(length(returnsHF)-window,1);
st_dev = zeros(length(returnsHF)-window,1);
skew = zeros(length(returnsHF)-window,1);
kurt = zeros(length(returnsHF)-window,1);
% Now work through the time-series with each of the functions (one can add
% any other functions required), assinging the values to the zero-matrices
for count = window:length(returnsHF)
% This is the most tricky part of the script, the indexing in this section
% The TwoYearReturn is what is shifted along one period at a time with the
% for-loop.
TwoYearReturn = returnsHF(count-window+1:count);
mean_avg(count-window+1) = mean(TwoYearReturn);
st_dev(count-window+1) = std(TwoYearReturn);
skew(count-window+1) = skewness(TwoYearReturn);
kurt(count-window +1) = kurtosis(TwoYearReturn);
end
% Plot the MAs
subplot(4,1,1), plot(mean_avg)
title('2yr mean')
subplot(4,1,2), plot(st_dev)
title('2yr stdv')
subplot(4,1,3), plot(skew)
title('2yr skewness')
subplot(4,1,4), plot(kurt)
title('2yr kurtosis')

faster method of interpolation in matlab

I am using interp1 to inteprolate some data:
temp = 4 + (30-4).*rand(365,10);
depth = 1:10;
dz = 0.5; %define new depth interval
bthD = min(depth):dz:max(depth); %new depth vector
for i = 1:length(temp);
i_temp(i,:) = interp1(depth,temp(i,:),bthD);
end
Here, I am increasing the resolution of my measurements by interpolating the measurements from 1 m increments to 0.5 m increments. This code works fine i.e. it gives me the matrix I was looking for. However, when I apply this to my actual data, it takes a long time to run, primarily as I am running an additional loop which runs through various cells. Is there a way of achieving what is described above without using the loop, in other words, is there a faster method?
Replace your for loop with:
i_temp = interp1(depth,temp',bthD)';
You can get rid of the transposes if you change the way that temp is defined, and if you are OK with i_temp being a 19x365 array instead of 365x19.
BTW, the documentation for interp1 is very clear that you can pass in an array as the second argument.

For loop with multiplication step in MATLAB

Is there any way to use a for-loop in MATLAB with a custom step? What I want to do is iterate over all powers of 2 lesser than a given number. The equivalent loop in C++ (for example) would be:
for (int i = 1; i < 65; i *= 2)
Note 1: This is the kind of iteration that best fits for-loops, so I'd like to not use while-loops.
Note 2: I'm actually using Octave, not MATLAB.
Perhaps you want something along the lines of
for i=2.^[1:6]
disp(i)
end
Except you will need to figure out the range of exponents. This uses the fact that since
a_(i+1) = a_i*2 this can be rewritten as a_i = 2^i.
Otherwise you could do something like the following
i=1;
while i<65
i=i*2;
disp(i);
end
You can iterate over any vector, so you can use vector operations to create your vector of values before you start your loop. A loop over the first 100 square numbers, for example, could be written like so:
values_to_iterate = [1:100].^2;
for i = values_to_iterate
i
end
Or you could loop over each position in the vector values_to_iterate (this gives the same result, but has the benefit that i keeps track of how many iterations you have done - this is useful if you are writing a result from each loop sequentially to an output vector):
values_to_iterate = [1:100].^2;
for i = 1:length(values_to_iterate)
values_to_iterate(i)
results_vector(i) = some_function( values_to_iterate(i) );
end
More concisely, you can write the first example as simply:
for i = [1:100].^2
i
end
Unlike in C, there doesn't have to be a 'rule' to get from one value to the next.
The vector iterated over can be completely arbitrary:
for i = [10, -1000, 23.3, 5, inf]
i
end

How can I speed up this call to quantile in Matlab?

I have a MATLAB routine with one rather obvious bottleneck. I've profiled the function, with the result that 2/3 of the computing time is used in the function levels:
The function levels takes a matrix of floats and splits each column into nLevels buckets, returning a matrix of the same size as the input, with each entry replaced by the number of the bucket it falls into.
To do this I use the quantile function to get the bucket limits, and a loop to assign the entries to buckets. Here's my implementation:
function [Y q] = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q)
q=transpose(q);
end
Y = zeros(size(X));
for i = 1:nLevels
% "The variables g and l indicate the entries that are respectively greater than
% or less than the relevant bucket limits. The line Y(g & l) = i is assigning the
% value i to any element that falls in this bucket."
if i ~= nLevels % "The default; doesnt include upper bound"
g = bsxfun(#ge,X,q(i,:));
l = bsxfun(#lt,X,q(i+1,:));
else % "For the final level we include the upper bound"
g = bsxfun(#ge,X,q(i,:));
l = bsxfun(#le,X,q(i+1,:));
end
Y(g & l) = i;
end
Is there anything I can do to speed this up? Can the code be vectorized?
If I understand correctly, you want to know how many items fell in each bucket.
Use:
n = hist(Y,nbins)
Though I am not sure that it will help in the speedup. It is just cleaner this way.
Edit : Following the comment:
You can use the second output parameter of histc
[n,bin] = histc(...) also returns an index matrix bin. If x is a vector, n(k) = >sum(bin==k). bin is zero for out of range values. If x is an M-by-N matrix, then
How About this
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
Y = zeros(size(X));
for i = 1:numel(q)-1
Y = Y+ X>=q(i);
end
This results in the following:
>>X = [3 1 4 6 7 2];
>>[Y, q] = levels(X,2)
Y =
1 1 2 2 2 1
q =
1 3.5 7
You could also modify the logic line to ensure values are less than the start of the next bin. However, I don't think it is necessary.
I think you shoud use histc
[~,Y] = histc(X,q)
As you can see in matlab's doc:
Description
n = histc(x,edges) counts the number of values in vector x that fall
between the elements in the edges vector (which must contain
monotonically nondecreasing values). n is a length(edges) vector
containing these counts. No elements of x can be complex.
I made a couple of refinements (including one inspired by Aero Engy in another answer) that have resulted in some improvements. To test them out, I created a random matrix of a million rows and 100 columns to run the improved functions on:
>> x = randn(1000000,100);
First, I ran my unmodified code, with the following results:
Note that of the 40 seconds, around 14 of them are spent computing the quantiles - I can't expect to improve this part of the routine (I assume that Mathworks have already optimized it, though I guess that to assume makes an...)
Next, I modified the routine to the following, which should be faster and has the advantage of being fewer lines as well!
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
Y = ones(size(X));
for i = 2:nLevels
Y = Y + bsxfun(#ge,X,q(i,:));
end
The profiling results with this code are:
So it is 15 seconds faster, which represents a 150% speedup of the portion of code that is mine, rather than MathWorks.
Finally, following a suggestion of Andrey (again in another answer) I modified the code to use the second output of the histc function, which assigns entries to bins. It doesn't treat the columns independently, so I had to loop over the columns manually, but it seems to be performing really well. Here's the code:
function [Y q] = levels(X,nLevels)
p = linspace(0,1,nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
q(end,:) = 2 * q(end,:);
Y = zeros(size(X));
for k = 1:size(X,2)
[junk Y(:,k)] = histc(X(:,k),q(:,k));
end
And the profiling results:
We now spend only 4.3 seconds in codes outside the quantile function, which is around a 500% speedup over what I wrote originally. I've spent a bit of time writing this answer because I think it's turned into a nice example of how you can use the MATLAB profiler and StackExchange in combination to get much better performance from your code.
I'm happy with this result, although of course I'll continue to be pleased to hear other answers. At this stage the main performance increase will come from increasing the performance of the part of the code that currently calls quantile. I can't see how to do this immediately, but maybe someone else here can. Thanks again!
You can sort the columns and divide+round the inverse indexes:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
[S,IX]=sort(X);
[grid1,grid2]=ndgrid(1:size(IX,1),1:size(IX,2));
invIX=zeros(size(X));
invIX(sub2ind(size(X),IX(:),grid2(:)))=grid1;
Y=ceil(invIX/size(X,1)*nLevels);
Or you can use tiedrank:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
R=tiedrank(X);
Y=ceil(R/size(X,1)*nLevels);
Surprisingly, both these solutions are slightly slower than the quantile+histc solution.