How to efficiently apply correction to a time series with fully vectorized code in Matlab - matlab

I face a computational efficiency problem. I have a time series of a non-monotonically drifting variable: measurements of objects going through a machine (where the measurement is made) in a production line. The job consists of simulating what this time series would look like if a correction were applied to the object each time the measurement drifts above or below a threshold.
To do that I could simply write a for loop and, each time a threshold is crossed, apply the correction to the rest of the time series. However, the time series is very long and the for loop takes too much time to compute. I would like to vectorize this code to make it more efficient. Does anyone see a way to do so?
Here is a working example using a for loop:
ts = [1 2 3 4 3 2 1 0 -1 -2 -3];
threshold = [-3.5 3];
correction = [1 -1];
for i = 1:numel(ts)
    if ts(i) > threshold(2)
        ts(i:end) = ts(i:end) + correction(2);
    elseif ts(i) < threshold(1)
        ts(i:end) = ts(i:end) + correction(1);
    end
end
disp(ts)
The result is and should be the following array:
[1 2 3 3 2 1 0 -1 -2 -3 -3]
How can I vectorize this problem?
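A full vectorization is not obvious here, because each correction depends on the already corrected values before it. As a hedged sketch (not a vectorized solution), the loop can at least avoid rewriting the tail of the series on every crossing by carrying a running offset instead. The names offset and out are introduced here only for illustration:

```matlab
ts = [1 2 3 4 3 2 1 0 -1 -2 -3];
threshold = [-3.5 3];
correction = [1 -1];

offset = 0;                 % cumulative correction applied so far (hypothetical name)
out = zeros(size(ts));      % corrected series (hypothetical name)
for i = 1:numel(ts)
    v = ts(i) + offset;     % value as it would be measured after earlier corrections
    if v > threshold(2)
        offset = offset + correction(2);
    elseif v < threshold(1)
        offset = offset + correction(1);
    end
    out(i) = ts(i) + offset;
end
disp(out)
```

This does one scalar update per sample, so the work grows linearly with the length of the series rather than with the number of crossings times the remaining length.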

Related

Improving the performance of many sub-matrix left division operations (mldivide, \)

I have two matrices, a1 and a2. a1 is 3x12000 and a2 is 3x4000. I want to create another array that is 3x4000 which is the left matrix division (mldivide, \) of the 3x3 sub-matrices of a1 and the 3x1 sub-matrices of a2. You can do this easily with a for loop:
for ii = 1:3:12000
a = a1(:,ii:ii+2)\a2(:, ceil(ii/3));
end
However, I was wondering if there was a faster way to do this.
Edit: I am aware preallocation increases speed, I was just showing that for visual purposes.
Edit2: Removed iterative increase of array. It seems my question has been misinterpreted a bit. I was mainly wondering whether there are matrix operations I could use to achieve my goal, as that would likely be quicker than a for loop, e.g. reshaping a1 to a 3x3x4000 array and a2 to a 3x1x4000 array and left-dividing each page in one go. However, you can't left matrix divide with 3D arrays.
You can create one system of equations containing many independent 'sub-systems' of equations by putting the sub-matrices of a1 on the diagonal of a 12000x12000 matrix like this:
a1(1,1) a1(1,2) a1(1,3) 0 0 0 0 0 0
a1(2,1) a1(2,2) a1(2,3) 0 0 0 0 0 0
a1(3,1) a1(3,2) a1(3,3) 0 0 0 0 0 0
0 0 0 a1(1,4) a1(1,5) a1(1,6) 0 0 0
0 0 0 a1(2,4) a1(2,5) a1(2,6) 0 0 0
0 0 0 a1(3,4) a1(3,5) a1(3,6) 0 0 0
0 0 0 0 0 0 a1(1,7) a1(1,8) a1(1,9)
0 0 0 0 0 0 a1(2,7) a1(2,8) a1(2,9)
0 0 0 0 0 0 a1(3,7) a1(3,8) a1(3,9)
and then left divide it by a2(:).
This can be done using kron and sparse matrix like this (source):
a1_kron = kron(speye(12000/3),ones(3));
a1_kron(logical(a1_kron)) = a1(:);
a = a1_kron\a2(:);
a = reshape(a, [3 12000/3]);
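To see on a small scale that the block-diagonal system gives the same answer as solving each sub-system separately, here is a minimal sketch with only two 3x3 blocks. It uses blkdiag (dense) instead of kron and sparse purely for readability, and the test data is made up and kept well conditioned by adding a multiple of the identity:

```matlab
rng(0);                                      % reproducible made-up data
a1 = rand(3, 6) + repmat(3*eye(3), 1, 2);    % two well-conditioned 3x3 blocks
a2 = rand(3, 2);

% One big block-diagonal system
big = blkdiag(a1(:, 1:3), a1(:, 4:6));
x_big = reshape(big \ a2(:), 3, []);

% The same two systems solved one at a time
x_loop = [a1(:, 1:3) \ a2(:, 1), a1(:, 4:6) \ a2(:, 2)];

disp(max(abs(x_big(:) - x_loop(:))))         % difference should be at rounding level
```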
Advantage - Speed: This is about 3-4 times faster than a for loop with preallocation on my PC.
Disadvantage: There is one disadvantage you must consider with this approach: when using left division, MATLAB looks for the best way to solve the system of linear equations. If you solve each sub-system independently, the best method is chosen for each sub-system, but if you solve them as one system, MATLAB picks one method for all the sub-systems together, which is not necessarily the best for each sub-system.
Note: As shown in Stefano M's answer, using one big system of equations (using kron and a sparse matrix) is faster than using a for loop (with preallocation) only for very small sub-systems of equations (on my PC, for a number of equations <= 7). For bigger sub-systems, using a for loop is faster.
Comparing different methods
I wrote and ran code to compare 4 different methods for solving this problem:
for loop, no preallocation
for loop, with preallocation
kron
cellfun
Test:
n = 1200000;
a1 = rand(3,n);
a2 = rand(3,n/3);
disp('Method 1: for loop, no preallocation')
tic
a_method1 = [];
for ii = 1:3:n
a_method1 = [a_method1 a1(:,ii:ii+2)\a2(:, ceil(ii/3))];
end
toc
disp(' ')
disp('Method 2: for loop, with preallocation')
tic
a1_reshape = reshape(a1, 3, 3, []);
a_method2 = zeros(size(a2));
for i = 1:size(a1_reshape,3)
a_method2(:,i) = a1_reshape(:,:,i) \ a2(:,i);
end
toc
disp(' ')
disp('Method 3: kron')
tic
a1_kron = kron(speye(n/3),ones(3));
a1_kron(logical(a1_kron)) = a1(:);
a_method3 = a1_kron\a2(:);
a_method3 = reshape(a_method3, [3 n/3]);
toc
disp(' ')
disp('Method 4: cellfun')
tic
a1_cells = mat2cell(a1, size(a1, 1), repmat(3 ,1,size(a1, 2)/3));
a2_cells = mat2cell(a2, size(a2, 1), ones(1,size(a2, 2)));
a_cells = cellfun(@(x, y) x\y, a1_cells, a2_cells, 'UniformOutput', 0);
a_method4 = cell2mat(a_cells);
toc
disp(' ')
Results:
Method 1: for loop, no preallocation
Elapsed time is 747.635280 seconds.
Method 2: for loop, with preallocation
Elapsed time is 1.426560 seconds.
Method 3: kron
Elapsed time is 0.357458 seconds.
Method 4: cellfun
Elapsed time is 3.390576 seconds.
Comparing the results of the four methods, you can see that using method 3 - kron gives slightly different results:
disp(['sumabs(a_method1(:) - a_method2(:)): ' num2str(sumabs(a_method1(:)-a_method2(:)))])
disp(['sumabs(a_method1(:) - a_method3(:)): ' num2str(sumabs(a_method1(:)-a_method3(:)))])
disp(['sumabs(a_method1(:) - a_method4(:)): ' num2str(sumabs(a_method1(:)-a_method4(:)))])
Result:
sumabs(a_method1(:) - a_method2(:)): 0
sumabs(a_method1(:) - a_method3(:)): 8.9793e-05
sumabs(a_method1(:) - a_method4(:)): 0
You are solving a series of N systems with m linear equations each, the N systems are of the form
Ax = b
You can convert these to a single system of Nm linear equations:
|A1 0 0 ... 0 | |x1| |b1|
|0 A2 0 ... 0 | |x2| |b2|
|0 0 A3 ... 0 | |x3| = |b3|
|. . . ... . | |. | |. |
|0 0 0 ... AN| |xN| |bN|
However, solving this one system of equations is a lot more expensive than solving all the little ones. Typically, the cost is O(n^3), so you go from O(N m^3) to O((Nm)^3). A huge pessimization. (Eliahu proved me wrong here, apparently the sparsity of the matrix can be exploited.)
Reducing the computational cost can be done, but you need to provide guarantees about the data. For example, if the matrices A are positive definite, the systems can be solved more cheaply. Nonetheless, given that you are dealing with 3x3 matrices, the winnings there will be slim, since those are pretty simple systems to solve.
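As a hedged illustration of the positive definite case, a single symmetric positive definite system can be solved through a Cholesky factorization followed by two triangular solves. The matrix below is made up and constructed to be positive definite; for 3x3 systems the gain is expected to be slim, as noted above:

```matlab
rng(1);                        % reproducible made-up data
M = rand(3);
A = M.' * M + 3*eye(3);        % symmetric positive definite by construction
b = rand(3, 1);

R = chol(A);                   % upper triangular R with A = R.' * R
x = R \ (R.' \ b);             % forward then backward substitution

disp(norm(A*x - b))            % residual should be at rounding level
```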
If you are asking this because you think that loops are inherently slow in MATLAB, you should know that this is no longer the case, and hasn’t been the case since MATLAB gained a JIT 15 years or so ago. Today, many vectorization attempts lead to equally fast code, and oftentimes to slower code, especially for large data. (I could fish up some timings I’ve posted here on SO to prove this if necessary.)
I would think that solving all systems in one go could reduce the number of checks that MATLAB does every time the operator \ is called. That is, hard-coding the problem size and type might improve throughput. But the only way to do so is to write a MEX-file.
The biggest improvement would be to preallocate the output matrix, instead of growing it:
A1 = reshape(A1, 3, 3, []);
a = zeros(size(A2));
for i = 1:size(A1,3)
a(:,i) = A1(:,:,i) \ A2(:,i);
end
With the preallocated array, if the Parallel Computing Toolbox is available, you can try parfor.
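A minimal sketch of the parfor variant, assuming the same layout as the loop above (random data is used here as a stand-in for the real matrices). Whether it is actually faster depends on the pool start-up cost and on how many systems there are, so it is worth timing on your own data:

```matlab
A1 = reshape(rand(3, 12000), 3, 3, []);   % stand-in data: 4000 matrices of size 3x3
A2 = rand(3, 4000);                       % stand-in data: 4000 right-hand sides

a = zeros(size(A2));                      % preallocated output
parfor i = 1:size(A1, 3)
    % A1, A2 and a are all indexed by the loop variable only (sliced variables)
    a(:, i) = A1(:, :, i) \ A2(:, i);
end
```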
Edit
This answer is no longer relevant, since the OP rephrased the question to avoid growing the result array, which was the original major bottleneck.
The problem here is that one has to solve 4000 independent 3x3 linear systems. The matrix is so small that an ad hoc solution could be of interest, especially if one has some information on the matrix properties (symmetric or not, condition number, etc.). However, sticking to the MATLAB \ operator, the best way to speed up computations is by explicitly leveraging parallelism, e.g. with the parfor command.
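One possible ad hoc solution for the 3x3 case, given only as a sketch, is Cramer's rule applied to all systems at once, with each determinant written as a scalar triple product. This is a swapped-in technique, not what \ does internally, and it is less robust than \ for ill-conditioned matrices. The helper det3 is a name made up for this example, and the final division relies on implicit expansion (R2016b or later):

```matlab
rng(0);                                   % reproducible made-up data
a1 = rand(3, 12000);
a2 = rand(3, 4000);

% Columns of every 3x3 sub-matrix, each gathered into a 3xN array
c1 = a1(:, 1:3:end);
c2 = a1(:, 2:3:end);
c3 = a1(:, 3:3:end);

% det([u v w]) = u . (v x w), evaluated for all N systems at once
det3 = @(u, v, w) dot(u, cross(v, w, 1), 1);

D = det3(c1, c2, c3);                     % 1xN determinants of the sub-matrices
x = [det3(a2, c2, c3); ...                % replace column 1 by the right-hand side
     det3(c1, a2, c3); ...                % replace column 2
     det3(c1, c2, a2)] ./ D;              % replace column 3, then divide
```

Comparing a few columns of x against a1(:,1:3)\a2(:,1) and so on is a cheap way to check the result before trusting it on real data.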
The sparse matrix solution of the other answer by Eliahu Aaron is indeed very clever, but its speed advantage is not general but depends on the specific problem size.
With this function you can explore different problem sizes:
function [t2, t3] = sotest(n, n2)
a1 = rand(n,n*n2);
a2 = rand(n,n2);
tic
a1_reshape = reshape(a1, n, n, []);
a_method2 = zeros(size(a2));
for i = 1:size(a1_reshape,3)
a_method2(:,i) = a1_reshape(:,:,i) \ a2(:,i);
end
t2 = toc;
tic
a1_kron = kron(speye(n2),ones(n));
a1_kron(logical(a1_kron)) = a1(:);
a_method3 = a1_kron\a2(:);
a_method3 = reshape(a_method3, [n n2]);
t3 = toc;
assert ( norm(a_method2 - a_method3, 1) / norm(a_method2, 1) < 1e-8)
Indeed for n=3 the sparse matrix method is clearly superior, but for increasing n it becomes less competitive
The above figure was obtained with
>> for i=1:20; [t(i,1), t(i,2)] = sotest(i, 50000); end
>> loglog(1:20, t, '*-')
My final comment is that an explicit loop with the dense \ operator is indeed fast; the sparse matrix formulation is slightly less accurate and could become problematic in edge cases; and for sure the sparse matrix solution is not very readable. If the number n2 of systems to solve is very very big (>1e6) then maybe ad hoc solutions should be explored.

To see the results of patternsearch optimization for each iteration in MATLAB

I have an optimization problem that is very time-consuming, and I need to run it many times (it is somewhat of a trial-and-error problem for me). However, I do not want to wait for the final result; I need to see the result of the optimization at each iteration. More specifically, I want to see the x value (the solution) and am not so interested in fval (the objective function value at x). Unfortunately, patternsearch only shows fval and not the solution at each iteration. I know that I should solve this through the "Output Function", but I spent a lot of time on it and could not understand how to do it. To keep things simple, consider the following optimization problem:
options = optimoptions('patternsearch');
options = optimoptions(options,'Display', 'iter','TolX',1e-6);
x0=2;lb=-3;ub=3;
x = patternsearch(@(x)x^2,x0,[],[],[],[],lb,ub,[],options);
The first few lines we see in the Command Window look like the following:
Iter f-count f(x) MeshSize Method
0 1 4 1
1 2 4 0.5 Refine Mesh
2 3 0 1 Successful Poll
Unfortunately, I see nothing about x.
Could you please change my code so that I can also see the solution (x) at each iteration? It would be greatly appreciated!
Babak
A valid output function handler for patternsearch should be specified as follows:
function [stop,options,optchanged] = fun(optimvalues,options,flag)
The following code should be enough to show the information you are looking for:
options = optimoptions('patternsearch');
options = optimoptions(options,'Display','iter','OutputFcns',@custom,'TolX',1e-6);
x0 = 2; lb = -3; ub = 3;
x = patternsearch(@(x)x^2,x0,[],[],[],[],lb,ub,[],options);
function [stop,options,optchanged] = custom(optimvalues,options,flag)
stop = false;
optchanged = false;
if (strcmp(flag,'iter'))
disp([' Iteration performed for X=' num2str(optimvalues.x)]);
end
end
Here is the output:
Iter Func-count f(x) MeshSize Method
0 1 4 1
Iteration performed for X=2
1 2 4 0.5 Refine Mesh
Iteration performed for X=0
2 3 0 1 Successful Poll
Iteration performed for X=0
3 3 0 0.5 Refine Mesh
Iteration performed for X=0
4 5 0 0.25 Refine Mesh
Iteration performed for X=0
...
It's just an example and you can, of course, tweak the function so that the text is displayed the way you prefer.
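As one possible tweak, here is a sketch of an output function that prints the iteration number, x and f(x) on one line, and that also handles a vector-valued x. The name show_x is made up for this example, and it assumes that optimvalues carries the iteration and fval fields alongside x:

```matlab
% Usage (assumed, following the options call above):
%   options = optimoptions(options, 'OutputFcns', @show_x);
function [stop, options, optchanged] = show_x(optimvalues, options, flag)
    stop = false;          % never request an early stop
    optchanged = false;    % options are returned unchanged
    if strcmp(flag, 'iter')
        % mat2str prints scalars and vectors alike
        fprintf('  iter %d: x = %s, f(x) = %g\n', ...
            optimvalues.iteration, mat2str(optimvalues.x, 6), optimvalues.fval);
    end
end
```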

determine lag between two vector

I want to find the minimum lag between two vectors, that is, the minimum shift after which the pattern in one vector is repeated in the other.
For example, for
x=[0 0 1 2 2 2 0 0 0 0]
y=[1 2 2 2 0 0 1 2 2 2]
I want to obtain 4 for x to y and 2 for y to x.
I found the finddelay(x,y) function, but it works correctly only for x to y (it gives -4 for y to x).
Is there a function that only gives the lag when moving in the rightward direction along the vector? I would be very thankful for any help getting this result.
I think this may be a potential bug in finddelay. Note this excerpt from the documentation (emphasis mine):
X and Y need not be exact delayed copies of each other, as finddelay(X,Y) returns an estimate of the delay via cross-correlation. However this estimated delay has a useful meaning only if there is sufficient correlation between delayed versions of X and Y. Also, if several delays are possible, as in the case of periodic signals, the delay with the smallest absolute value is returned. In the case that both a positive and a negative delay with the same absolute value are possible, the positive delay is returned.
This would seem to imply that finddelay(y, x) should return 2, when it actually returns -4.
EDIT:
This would appear to be an issue related to floating-point errors introduced by xcorr, as I describe in my answer to this related question. If you run the command type finddelay in the Command Window, you can see that finddelay uses xcorr internally. Even when the inputs to xcorr are integer values, the results (which you would expect to be integer values as well) can end up having floating-point errors that cause them to be slightly larger or smaller than an integer value. This can then change the indices where maxima are located. The solution is to round the output from xcorr when you know your inputs are all integer values.
A better implementation of finddelay for integer values might be something like this, which would actually return the delay with the smallest absolute value:
function delay = finddelay_int(x, y)
[d, lags] = xcorr(x, y);
d = round(d);
lags = -lags(d == max(d));
[~, index] = min(abs(lags));
delay = lags(index);
end
However, in your question you are asking for the positive delays to be returned, which won't necessarily be the smallest in absolute value. Here's a different implementation of finddelay that works correctly for integer values and gives preference to positive delays:
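function delay = finddelay_pos(x, y)
    [d, lags] = xcorr(x, y);
    d = round(d);                    % remove floating-point noise for integer inputs
    lags = -lags(d == max(d));       % candidate delays, in descending order
    positive = lags(lags > 0);
    if isempty(positive)
        delay = lags(1);             % no positive delay: take the one closest to zero
    else
        delay = min(positive);       % smallest positive delay
    end
end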
And here are the various results for your test case:
>> x = [0 0 1 2 2 2 0 0 0 0];
>> y = [1 2 2 2 0 0 1 2 2 2];
>> [finddelay(x, y) finddelay(y, x)] % The default behavior, which fails to find
% the delays with smallest absolute value
ans =
4 -4
>> [finddelay_int(x, y) finddelay_int(y, x)] % Correctly finds the delays with the
% smallest absolute value
ans =
-2 2
>> [finddelay_pos(x, y) finddelay_pos(y, x)] % Finds the smallest positive delays
ans =
4 2

How to get the maximal values and the related coordinates? [duplicate]

Suppose we want to determine the peaks in a vector as follows.
We have a real-valued one-dimensional vector of length m:
x(1),x(2),.....x(m)
if x(1)>x(2) then clearly for first point peak(1)=x(1);else we are then comparing x(3) to x(2),if x(3)
[ indexes,peaks]=function(x,m);
c=[];
b=[];
if x(1)>x(2)
peaks(1)=x(1);
else
for i=2:m-1
if x(i+1)< x(i) & x(i)>x(i-1)
peak(i)=x(i);
end;
end
end
end
Peaks are also determined as shown in the following picture:
Sorry for the second picture; maybe it is not a triangle, since A and C are on a straight line, but the peak here is B. I can't work out how to continue my code to find the peak values in my vector. Please help me continue it.
Update: numerical example
x=[2 1 3 5 4 7 6 8 9]
Here, because the first point is greater than the second, peak(1)=2. Then we compare 1 to 3; because 3 is greater than 1, we next compare 5 to 3, and 5 is greater as well. Then we compare 5 to 4; because 5 is greater than 4, peak(2)=5. Continuing, the next peak is 7 and the final peak is 9.
If the first element is less than the second, we compare the second element to the third; if the second is greater than both the first and the third, then the second is a peak, and so on.
You could try something like this:
function [peaks,peak_indices] = find_peaks(row_vector)
A = [min(row_vector)-1 row_vector min(row_vector)-1];
j = 1;
for i=1:length(A)-2
temp=A(i:i+2);
if(max(temp)==temp(2))
peaks(j) = row_vector(i);
peak_indices(j) = i;
j = j+1;
end
end
end
Save it as find_peaks.m
Now, you can use it as:
>> A = [2 1 3 5 4 7 6 8 9];
>> [peaks, peak_indices] = find_peaks(A)
peaks =
2 5 7 9
peak_indices =
1 4 6 9
This would however give you "plateaus" as well (adjacent and equal "peaks").
You can use diff to do the comparison and add two points in the beginning and end to cover the border cases:
B=[1 diff(A) -1];
peak_indices = find(B(1:end-1)>=0 & B(2:end)<=0);
peaks = A(peak_indices);
It returns
peak_indices =
1 4 6 9
peaks =
2 5 7 9
for your example.
findpeaks does it if you have a recent matlab version, but it's also a bit slow.
The proposed for-loop solution would be quite slow, and you also risk rounding errors because it compares the maximal value to the central one instead of comparing the position of the maximum, which is better for your purpose.
You can stack the data so as to have three columns: the first holds the preceding value, the second the data itself, and the third the next value. Take the max along each row, and your local maxima are the points for which the position of the max along the columns is 2.
I've coded this as a subroutine of my own peak detection function, that adds a further level of iterative peak detection
http://www.mathworks.com/matlabcentral/fileexchange/42927-find-peaks-using-scale-space-approach
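The stacking idea described above can be sketched as follows. This is only a minimal illustration, not the File Exchange code; the borders are padded with -Inf so that the first and last samples can count as peaks, which is an assumption made here to match the example in the question. Because max returns the first position on ties, the first sample of a flat run that follows a rise is also flagged, even when the signal continues upward afterwards:

```matlab
x = [2 1 3 5 4 7 6 8 9];
x = x(:);                                   % work on a column vector

% Columns: previous value, current value, next value (borders padded with -Inf)
stacked = [[-Inf; x(1:end-1)], x, [x(2:end); -Inf]];

[~, pos] = max(stacked, [], 2);             % column position of the row maximum
peak_indices = find(pos == 2).';            % maximum sits in the middle column
peaks = x(peak_indices).';

disp(peak_indices)
disp(peaks)
```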
