Matlab iterative polyfit

I have x and y data with n points in each array.
I want to use polyfit on portions of the data.
I want to divide the data into a certain number of divisions (numDivisions).
My idea would be to do something along the lines of
n= size(x)%number of data points
numDivisions = 4;%number of times to divide the data
div = zeros(numDivisions,1)%number of points per division
p = zeros(numDivisions,4);% second number is degree of polynomial+1
S = zeros(numDivisions,1);
mu = zeros(numDivisions,1);
E = zeros(numDivisions,1);
for i = 1:numDivisions
div(i) = round(n(1,1)*i/numDivisions) %assign markers for divisions of points
end
for i = 1:size(div)
if i == 1
start = 1;
endpoint = div(i);
[p(i), S(i), mu(i)] = polyfit(x(start:endpoint), y(start:endpoint), 3);
else
[p(i), S(i), mu(i)] = polyfit(x(div(i-1):div(i)), y(div(i-1):div(i)), 3);
end
end
The goal would be to have an array of p values from the polyfits.
However, when I run it I get this error:
In an assignment A(I) = B, the number of elements in B and I must be the same.

Error in (line 33)
[p(i), S(i), mu(i)] = polyfit(x(start:endpoint), y(start:endpoint), 3);
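The mismatch comes from the outputs of polyfit: with three outputs it returns a 1-by-4 coefficient vector (for degree 3), a goodness-of-fit structure S, and a two-element centering/scaling vector mu, none of which fit into the scalar slots p(i), S(i), mu(i). One possible way to store them, reusing div and numDivisions from the code above (a sketch, not a tested fix):
p  = zeros(numDivisions, 4);   % one row of 4 coefficients per division
S  = cell(numDivisions, 1);    % S is a struct, so a numeric array cannot hold it
mu = zeros(numDivisions, 2);   % mu is a two-element vector per fit
for i = 1:numDivisions
    if i == 1
        start = 1;
    else
        start = div(i-1);
    end
    endpoint = div(i);
    [p(i,:), S{i}, mu(i,:)] = polyfit(x(start:endpoint), y(start:endpoint), 3);
end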

Related

average bins along a dimension of a nd array in matlab

To compute the mean of every bin along a dimension of an n-d array in MATLAB, for example averaging every 10 elements along dim 4 of a 4-D array:
x = reshape(1:30*30*20*300,30,30,20,300);
n = 10;
m = size(x,4)/10;
y = nan(30,30,20,m);
for ii = 1 : m
y(:,:,:,ii) = mean(x(:,:,:,(1:n)+(ii-1)*n),4);
end
It looks a bit silly. I think there must be better ways to average the bins?
Besides, is it possible to make the script applicable to general cases, namely an arbitrary number of dimensions, averaging along an arbitrary dim?
For the second part of your question you can use this:
x = reshape(1:30*30*20*300,30,30,20,300);
dim = 4;
n = 10;
m = size(x,dim)/n;
sz = size(x); sz(dim) = m; % output size, general for any dim
y = nan(sz);
idx1 = repmat({':'},1,ndims(x));
idx2 = repmat({':'},1,ndims(x));
for ii = 1 : m
idx1{dim} = ii;
idx2{dim} = (1:n)+(ii-1)*n;
y(idx1{:}) = mean(x(idx2{:}),dim);
end
For the first part of the question, here is an alternative using cumsum and diff, but it may not be better than the loop solution:
function y = slicedmean(x,slice_size,dim)
% cumulative sum along dim; each slice's sum is a difference of the
% cumulative sum sampled at the end of consecutive slices
s = cumsum(x,dim);
idx1 = repmat({':'},1,ndims(x));
idx2 = repmat({':'},1,ndims(x));
idx1{dim} = slice_size;                        % end of the first slice
idx2{dim} = slice_size:slice_size:size(x,dim); % end of every slice
y = cat(dim,s(idx1{:}),diff(s(idx2{:}),[],dim))/slice_size;
end
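For illustration, a call on the array from the question might look like this (the expected output size is my own reading of the function above, not something stated in the original answer):
x = reshape(1:30*30*20*300, 30, 30, 20, 300);
y = slicedmean(x, 10, 4);   % average every 10 elements along dim 4
size(y)                     % 30 30 20 30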
Here is a generic solution, using the accumarray function. I haven't tested how fast it is. There might be some room for improvement though.
Basically, accumarray groups the values in x according to a matrix of customized indices built for your question:
x = reshape(1:30*30*20*300,30,30,20,300);
s = size(x);
% parameters for averaging
dimAv = 4;
n = 10;
% get linear index
ix = (1:numel(x))';
% transform them to a matrix of index per dimension
% this is a customized version of ind2sub
pcum = [1 cumprod(s(1:end-1))];
sub = zeros(numel(ix),numel(s));
for i = numel(s):-1:1,
ixtmp = rem(ix-1, pcum(i)) + 1;
sub(:,i) = (ix - ixtmp)/pcum(i) + 1;
ix = ixtmp;
end
% correct index for the given dimension
sub(:,dimAv) = floor((sub(:,dimAv)-1)/n)+1;
% run the accumarray to compute the average
sout = s;
sout(dimAv) = ceil(sout(dimAv)/n);
y = accumarray(sub, x(:), sout, @mean);
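As a quick sanity check, the accumarray result can be compared against the plain loop from the question (a rough sketch, reusing x, n and dimAv from above):
yloop = nan(30, 30, 20, size(x,dimAv)/n);
for ii = 1:size(yloop, 4)
    yloop(:,:,:,ii) = mean(x(:,:,:,(1:n)+(ii-1)*n), dimAv);
end
max(abs(y(:) - yloop(:)))   % expected to be zero or negligibly small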
If you need a faster and more memory-efficient operation, you'll have to write your own mex function. It shouldn't be too difficult, I think!

Solve for independent variable between data points in MATLAB

I have many sets of data over the same time period, with a timestep of 300 seconds. Sets that terminate before the end of the observation period (here I've truncated it to 0 to 3000 seconds) have NaNs in the remaining spaces:
x = [0;300;600;900;1200;1500;1800;2100;2400;2700;3000];
y(:,1) = [4.65;3.67;2.92;2.39;2.02;1.67;1.36;1.07;NaN;NaN;NaN];
y(:,2) = [4.65;2.65;2.33;2.18;2.03;1.89;1.75;1.61;1.48;1.36;1.24];
y(:,3) = [4.65;2.73;1.99;1.49;1.05;NaN;NaN;NaN;NaN;NaN;NaN];
I would like to know at what time each dataset would reach the point where y is equal to a specific value, in this case y = 2.5.
I first tried finding the nearest y value to 2.5, and then using the associated time, but this isn't very accurate (the dots should all fall on the same horizontal line):
ybreak = 2.5;
for ii = 1:3
[~, index] = min(abs(y(:,ii)-ybreak));
yclosest(ii) = y(index,ii);
xbreak(ii) = x(index);
end
I then tried doing a linear interpolation between data points, and then solving for x at y=2.5, but wasn't able to make this work:
First I removed the NaNs (it seems like there must be a simpler way of doing this?):
for ii = 1:3
NaNs(:,ii) = isnan(y(:,ii));
for jj = 1:length(x);
if NaNs(jj,ii) == 0;
ycopy(jj,ii) = y(jj,ii);
end
end
end
Then tried fitting:
for ii = 1:3
f(ii) = fit(x(1:length(ycopy(:,ii))),ycopy(:,ii),'linearinterp');
end
And get the following error message:
Error using cfit/subsasgn (line 7)
Can't assign to an empty FIT.
When I try fitting outside the loop (for just one dataset), it works fine:
f = fit(x(1:length(ycopy(:,1))),ycopy(:,1),'linearinterp');
f =
Linear interpolant:
f(x) = piecewise polynomial computed from p
Coefficients:
p = coefficient structure
But I then still can't solve f(x)=2.5 to find the time at which y=2.5
syms x;
xbreak = solve(f(x) == 2.5,x);
Error using cfit/subsref>iParenthesesReference (line 45)
Cannot evaluate CFIT model for some reason.
Error in cfit/subsref (line 15)
out = iParenthesesReference( obj, currsubs );
Any advice or thoughts on other approaches to this would be much appreciated. I need to be able to do it for many many datasets, all of which have different numbers of NaN values.
As you mention, y = 2.5 is not in your data set, so the value of x which corresponds to it depends on the interpolation method you use. For linear interpolation, you could use something like the following:
x = [0;300;600;900;1200;1500;1800;2100;2400;2700;3000];
y(:,1) = [4.65;3.67;2.92;2.39;2.02;1.67;1.36;1.07;NaN;NaN;NaN];
y(:,2) = [4.65;2.65;2.33;2.18;2.03;1.89;1.75;1.61;1.48;1.36;1.24];
y(:,3) = [4.65;2.73;1.99;1.49;1.05;NaN;NaN;NaN;NaN;NaN;NaN];
N = size(y, 2);
x_interp = NaN(N, 1);
for i = 1:N
idx = find(y(:,i) >= 2.5, 1, 'last');
x_interp(i) = interp1(y(idx:idx+1, i), x(idx:idx+1), 2.5);
end
figure
hold on
plot(x, y)
scatter(x_interp, repmat(2.5, N, 1))
hold off
It's worth keeping in mind that the above code is assuming your data is monotonically decreasing (as your data is), but this solution could be adapted for monotonically increasing as well.
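As an aside, the fit-based attempt fails because cfit objects cannot be stored by indexed assignment into an ordinary array (a cell array, e.g. f{ii} = fit(...), does work), and solve cannot evaluate the piecewise interpolant symbolically; a numeric root finder can be used instead. A rough sketch, assuming the Curve Fitting Toolbox is available:
ybreak = 2.5;
xbreak = NaN(1, size(y, 2));
for ii = 1:size(y, 2)
    ok  = ~isnan(y(:, ii));                       % drop the trailing NaNs
    f   = fit(x(ok), y(ok, ii), 'linearinterp');  % piecewise linear interpolant
    idx = find(y(:, ii) >= ybreak, 1, 'last');    % bracket the crossing
    xbreak(ii) = fzero(@(t) f(t) - ybreak, [x(idx), x(idx+1)]);
end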

MATLAB LOOPS: Inserting values from a big array to a small array

I have a vector named signal consisting of 300001 values. In each iteration of the for loop, I want to pick up 2000 consecutive values from this vector and store them in another vector X (X is a 1×2000 vector).
The code is as follows:
D = 1:300001;
A = zeros(1,2000);
r=1;
n=0;
m=1;
for i=1:300001
for p = (1+(2000*n)):(r*2000)
while m<2000
A(1,m)= signal(1,p);
%disp (m);
m = m+1;
end
end
r = r+1;
n = n+1;
m = 1;
end
But it gives me the error "Index exceeds matrix dimensions."
Can somebody help me out with a better way to do it?
This would work:
signal = ones(1,30000);
index1= 1:2000:length(signal);
index2= 2000:2000:length(signal);
for i=1:length(index1)
A = signal(index1(i):index2(i));
end
Or this:
signal = ones(1,30000);
temp = reshape(signal,2000,[]);
for i = 1:size(temp,2)
A=temp(:,i);
end

How do I index codistributed arrays in a spmd block

I am doing a very large calculation (atmospheric absorption) that has a lot of individual narrow peaks that all get added up at the end. For each peak, I have pre-calculated the range over which the value of the peak shape function is above my chosen threshold, and I am then going line by line and adding the peaks to my spectrum. A minimal example is given below:
X = 1:1e7;
K = numel(a); % count the number of peaks I have.
spectrum = zeros(size(X));
for k = 1:K
grid = X >= rng(1,k) & X <= rng(2,k);
spectrum(grid) = spectrum(grid) + peakfn(X(grid),a(k),b(k),c(k));
end
Here, each peak has some parameters that define the position and shape (a,b,c), and a range over which to do the calculation (rng). This works great, and on my machine it benchmarks at around 220 seconds to do a complete data set. However, I have a 4-core machine and I would eventually like to run this on a cluster, so I'd like to parallelize it and make it scalable.
Because each loop relies on the results of the previous iteration, I cannot use parfor, so I am taking my first step into learning how to use spmd blocks. My first try looked like this:
X = 1:1e7;
cores = matlabpool('size');
K = numel(a);
spectrum = zeros(size(X),cores);
spmd
n = labindex:cores:K
N = numel(n);
for k = 1:N
grid = X >= rng(1,n(k)) & X <= rng(2,n(k));
spectrum(grid,labindex) = spectrum(grid,labindex) + peakfn(X(grid),a(n(k)),b(n(k)),c(n(k)));
end
end
finalSpectrum = sum(spectrum,2);
This almost works. The program crashes at the last line because spectrum is of type Composite, and the documentation for 2013a is spotty on how to turn Composite data into a matrix (cell2mat does not work). This also does not scale well because the more cores I have, the larger the matrix is, and that large matrix has to get copied to each worker, which then ignores most of the data. Question 1: how do I turn a Composite data type into a usable array?
The second thing I tried was to use a codistributed array.
spmd
spectrum = codistributed.zeros(K,cores);
disp(size(getLocalPart(spectrum)))
end
This tells me that each worker has a single vector of size [K 1], which I believe is what I want, but when I try to then meld the above methods
spmd
spectrum = codistributed.zeros(K,cores);
n = labindex:cores:K
N = numel(n);
for k = 1:N
grid = X >= rng(1,n(k)) & X <= rng(2,n(k));
spectrum(grid) = spectrum(grid) + peakfn(X(grid),a(n(k)),b(n(k)),c(n(k)));
end
finalSpectrum = gather(spectrum);
end
finalSpectrum = sum(finalSpectrum,2);
I get Matrix dimensions must agree errors. Since it's in a parallel block, I can't use my normal debugging crutch of stepping through the loop and seeing what the size of each block is at each point to see what's going on. Question 2: what is the proper way to index into and out of a codistributed array in an spmd block?
Regarding question #1, the Composite variable in the client basically refers to a non-distributed variant array stored on the workers. You can access the array from each worker by {}-indexing using its corresponding labindex (e.g. spectrum{1}, spectrum{2}, ...).
For your code that would be: finalSpectrum = sum(cat(2,spectrum{:}), 2);
Now I tried this problem myself using random data. Below are three implementations to compare (see here to understand the difference between distributed and nondistributed arrays). First we start with the common data:
len = 100; % spectrum length
K = 10; % number of peaks
X = 1:len;
% random position and shape parameters
a = rand(1,K); b = rand(1,K); c = rand(1,K);
% random peak ranges (lower/upper thresholds)
ranges = sort(randi([1 len], [2 K]));
% dummy peakfn() function
fcn = @(x,a,b,c) x+a+b+c;
% prepare a pool of MATLAB workers
matlabpool open
1) Serial for-loop:
spectrum = zeros(size(X));
for i=1:size(ranges,2)
r = ranges(:,i);
idx = (r(1) <= X & X <= r(2));
spectrum(idx) = spectrum(idx) + fcn(X(idx), a(i), b(i), c(i));
end
s1 = spectrum;
clear spectrum i r idx
2) SPMD with Composite array
spmd
spectrum = zeros(1,len);
ind = labindex:numlabs:K;
for i=1:numel(ind)
r = ranges(:,ind(i));
idx = (r(1) <= X & X <= r(2));
spectrum(idx) = spectrum(idx) + ...
feval(fcn, X(idx), a(ind(i)), b(ind(i)), c(ind(i)));
end
end
s2 = sum(vertcat(spectrum{:}));
clear spectrum i r idx ind
3) SPMD with co-distributed array
spmd
spectrum = zeros(numlabs, len, codistributor('1d',1));
ind = labindex:numlabs:K;
for i=1:numel(ind)
r = ranges(:,ind(i));
idx = (r(1) <= X & X <= r(2));
spectrum(labindex,idx) = spectrum(labindex,idx) + ...
feval(fcn, X(idx), a(ind(i)), b(ind(i)), c(ind(i)));
end
end
s3 = sum(gather(spectrum));
clear spectrum i r idx ind
All three results should be equal (to within an acceptably small margin of error)
>> max([max(s1-s2), max(s1-s3), max(s2-s3)])
ans =
2.8422e-14

Traverse matrix in segments

I have a huge waveform matrix:
[w,fs] = wavread('file.wav');
length(w)
ans =
258048
I want to go through this matrix in segments (say 50) and get the maximum of these segments to compare it to another value. I tried this:
thold = max(w) * .04;
nwindows = 50;
left = 1;
right = length(w)/nwindows;
counter = 0;
for i = 1:nwindows
temp = w(left:right);
if (max(temp) > thold)
counter = counter + 1;
end
left = right;
right = right+right;
end
But MATLAB threw tons of warnings and gave me this error:
Index exceeds matrix dimensions.
Error in wlengthdur (line 17)
temp = w(left:right);
Am I close or way off course?
An alternative approach would be to use reshape to arrange your vector into a 2D matrix with n rows and ceil(length(w) / n) columns, i.e. rounding up so that it divides evenly, since MATLAB matrices must be rectangular. This way you can find the max (or whatever you need) in one step without looping.
w = randn(47, 1);
% this needs to be a column vector; if yours isn't, call w = w(:) to ensure that it is
n = 5;
% pad w so that its length is divisible by n
% (the outer mod avoids appending a full block of NaNs when it already is)
padded = [w; nan(mod(n - mod(length(w), n), n), 1)];
segmented_w = reshape(padded, n, []);
max(segmented_w)
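To tie this back to the original goal, a sketch using w and the threshold values from the question (and the same padding idea as above) could count the segments whose maximum exceeds the threshold in one step:
% assumes w is the waveform column vector read with wavread above
thold = max(w) * .04;
nwindows = 50;
seglen = ceil(length(w) / nwindows);
padded = [w; nan(mod(seglen - mod(length(w), seglen), seglen), 1)];
segmented_w = reshape(padded, seglen, []);
counter = sum(max(segmented_w) > thold);   % max ignores NaNs by default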