I have quite a few (over 60000) data points
f(x_k) = k, where k = 0, 1, 2, ..., N.
The function is monotonically increasing and visually looks pretty smooth. I would like to find a fitting function F(x) such that k <= F(x_k) < k+1 for every x_k.
How should I approach this problem?
Data example
x 0 1 3 5 8 10 14 16 20 23 27 29 35 37 41
f(x) 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
(This looks a bit like a lookup table. Maybe an image processing application of some sort? I did some tools in my past life where an unrounding was needed.)
Is this a one time problem, or will you be doing it often, so you have a need for speed?
I'd throw it into SLM. Since I don't have the data, I cannot test it out or give you any results myself, but there is certainly no problem getting a fit of the quality you wish, as long as you use a sufficient number of knots. You would need additional knots on the right hand side, as it appears to approach a vertical asymptote, thus a singularity. Splines in general tend not to like singularities, as they are still polynomials at heart.
Better yet, swap the x and y axes to do the fit, thus fitting x = f(y). The left end point is not an asymptote, so there is no longer a singularity. Now all you need do is constrain the result to be monotonic increasing, and concave down (thus everywhere a negative second derivative.) You will require far fewer knots for the inverse fit, but use enough knots that the fit is of adequate quality for your goals.
To use the inverse fit, simply interpolate in the reverse direction, something that SLMEVAL is capable of doing. I'll see how it does on the little bit of test data you have provided (with just the default number of knots):
x = [0 1 3 5 8 10 14 16 20 23 27 29 35 37 41];
y = [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14];
slm = slmengine(y,x,'plot','on','increasing','on');
So the fit seems reasonable, but I note that your data seems a bit bumpy. It may indeed be difficult to get a solution that is smooth, yet fits entirely within your requirements.
Let's see how well it did:
[x;y;slmeval(x,slm,-1)]'
ans =
0 0 0.0190
1.0000 1.0000 0.9656
3.0000 2.0000 2.0522
5.0000 3.0000 2.9239
8.0000 4.0000 4.1096
10.0000 5.0000 4.8419
14.0000 6.0000 6.1963
16.0000 7.0000 6.8331
20.0000 8.0000 8.0638
23.0000 9.0000 8.9699
27.0000 10.0000 10.1459
29.0000 11.0000 10.7088
35.0000 12.0000 12.2942
37.0000 13.0000 12.8285
41.0000 14.0000 NaN
It misses the last point completely, refusing to extrapolate. But the remainder are not far off. They do fail your requirement though, as it is not true that
k <= F(x_k) < k+1
Of course, I did not build the spline with such a requirement in the specs. Were I to try to solve this problem in general, I might write code that would estimate the values on the curve directly, with no spline intermediary. Then I could easily enforce your constraints, finding the smoothest set of points that satisfies your error bar requirements and monotonicity, that also lies as close to the original data as is possible. Of course, that would involve a large system solve, with 60k unknowns. I don't know how lsqlin would handle that large of a problem, but there are other solvers that might be able to do so if time was an issue.
Again, with your test data as a small scale example:
x = [0 1 3 5 8 10 14 16 20 23 27 29 35 37 41]';
n = numel(x);
k = (0:(n-1))';
% The "unrounding" bound constraints
LB = k;
UB = k+1;
% The best fit possible
Afit = speye(n,n);
% And as smooth as possible
ind = 1:(n-2);
% could do this with diff of course
dx1 = x(ind+1) - x(ind);
dx2 = x(ind+2) - x(ind + 1);
% central second finite difference, for unequal spacing
den = dx1.*dx2.*(dx1 + dx2)/2;
Areg = spdiags([dx2./den,-(dx1 + dx2)./den,dx1./den],[0 1 2],n-2,n);
rhs = [k;zeros(n-2,1)];
% monotonicity constraints...
Amono = spdiags(repmat([1 -1],n-1,1),[0 1],n-1,n);
bmono = zeros(n-1,1);
% choose a value for r, that allows you to control the smoothness
% larger values of r will make the curve smoother, but the bounds
% will always be enforced. I played with it, and r = 5 seemed a
% reasonable compromise here.
r = 5;
yhat = lsqlin([Afit;r*Areg],rhs,Amono,bmono,[],[],LB,UB);
lsqlin is a bit unhappy, since it does not handle sparse problems of this form at this time. So it throws a warning that it is converting the problem to a full one.
Warning: Large-scale algorithm can handle bound constraints only;
using medium-scale algorithm instead.
> In lsqlin at 270
Warning: This problem formulation not yet available for sparse matrices.
Converting to full to solve.
> In lsqlin at 320
Optimization terminated.
Of course, this conversion will be TOTALLY unacceptable for a problem with 60k unknowns. DO NOT TRY IT ON 60k data points!!!!!!!!!!!!!!!! Your computer will go into a deep freeze.
How did it do though?
disp([x,k,yhat,k+1])
0 0 0.4356 1.0000
1.0000 1.0000 1.0000 2.0000
3.0000 2.0000 2.0504 3.0000
5.0000 3.0000 3.0000 4.0000
8.0000 4.0000 4.2026 5.0000
10.0000 5.0000 5.0000 6.0000
14.0000 6.0000 6.2739 7.0000
16.0000 7.0000 7.0000 8.0000
20.0000 8.0000 8.0916 9.0000
23.0000 9.0000 9.0000 10.0000
27.0000 10.0000 10.2497 11.0000
29.0000 11.0000 11.0000 12.0000
35.0000 12.0000 12.2994 13.0000
37.0000 13.0000 13.0000 14.0000
41.0000 14.0000 14.0594 15.0000
It worked nicely, although it would be a hog of obscene proportions for large problems as you have. Perhaps there is another optimizer (maybe in TOMLAB or some other package) that can handle a large scale sparse linear problem, subject to linear and bound constraints. You also might wish to force the first point through zero, but that is trivial to do.
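One observation that may help at scale: with LB = k and UB = k+1, each upper bound equals the next lower bound, so any vector inside the bounds is already non-decreasing. If that reading is right, the Amono constraint is redundant for this particular problem, leaving bound constraints only, which is the case the first warning above says the large-scale algorithm handles. This is untested on 60k points; the check below is only a sketch that uses random values inside the bounds, not a real fit.

```matlab
n  = 15;
k  = (0:n-1)';
LB = k;                              % same bounds as in the example above
UB = k + 1;
% Arbitrary values inside the bounds (illustrative only, not a fit).
y  = LB + rand(n, 1) .* (UB - LB);
% UB(i) equals LB(i+1), so y(i) <= UB(i) = LB(i+1) <= y(i+1).
isMonotone = all(diff(y) >= 0);
```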
A final option, if say 1000 points is doable, is to recreate the curve in batches of 1010 points at a time using the above scheme. lsqlin should be able to handle problems of that size without trouble. Leave some overlap at the ends; 5 points in each overlap region should be sufficient. Then average the results in the overlap regions.
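A rough sketch of how such a batch driver might look. Everything here is an assumption for illustration: solveBatch is a hypothetical placeholder standing in for the lsqlin fit (it just returns the midpoint of each bound so the script runs without the Optimization Toolbox), the abscissae are synthetic, and neighbouring batches are taken to share 5 points.

```matlab
n = 60000;                          % assumed problem size
k = (0:n-1)';
x = cumsum(1 + mod(k, 4));          % synthetic increasing abscissae (illustrative)
batchLen = 1010;                    % points solved per batch
overlap  = 5;                       % points shared by neighbouring batches
solveBatch = @(xb, kb) kb + 0.5;    % placeholder solver, NOT the real fit

acc = zeros(n, 1);                  % running sum of batch estimates
cnt = zeros(n, 1);                  % number of batches covering each point
first = 1;
while true
    last = min(first + batchLen - 1, n);
    idx = (first:last)';
    acc(idx) = acc(idx) + solveBatch(x(idx), k(idx));
    cnt(idx) = cnt(idx) + 1;
    if last == n, break; end
    first = last - overlap + 1;     % next batch re-uses the last overlap points
end
yhat = acc ./ cnt;                  % plain average where batches overlap
```

Since every batch result respects the same bounds, the averaged values in the overlap regions respect them too.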
Is there a way to calculate a moving mean in a way that the values at the beginning and at the end of the array are averaged with the ones at the opposite end?
For example, instead of this result:
A=[2 1 2 4 6 1 1];
movmean(A,2)
ans = 2.0 1.5 1.5 3.0 5 3.5 1.0
I want to obtain the vector [1.5 1.5 1.5 3 5 3.5 1.0], as the initial array element 2 would be averaged with the ending element 1.
Generalizing to an arbitrary window size N, this is how you can add circular behavior to movmean in the way you want:
movmean(A([(end-floor(N./2)+1):end 1:end 1:(ceil(N./2)-1)]), N, 'Endpoints', 'discard')
For the given A and N = 2, you get:
ans =
1.5000 1.5000 1.5000 3.0000 5.0000 3.5000 1.0000
For an arbitrary window size n, you can use circular convolution with an averaging mask defined as [1/n ... 1/n] (with n entries; in your example n = 2):
result = cconv(A, repmat(1/n, 1, n), numel(A));
Convolution offers some nice ways of doing this. Though, you may need to tweak your input slightly if you are only going to partially average the ends (i.e. the first is averaged with the last in your example, but then the last is not averaged with the first).
conv([A(end),A],[0.5 0.5],'valid')
ans =
1.5000 1.5000 1.5000 3.0000 5.0000 3.5000 1.0000
The generalized case here, for a moving average of size N, is:
conv(A([end-N+2:end, 1:end]),repmat(1/N,1,N),'valid')
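If neither cconv (Signal Processing Toolbox) nor index padding is convenient, the same wrap-around average can be sketched with circshift alone. This is a different technique from the answers above, shown only as an illustration; it uses the same trailing alignment as the conv version, where each element is averaged with the N-1 elements before it.

```matlab
A = [2 1 2 4 6 1 1];
N = 2;                                  % window length
acc = zeros(size(A));
for j = 0:N-1
    % A shifted right by j positions along dimension 2, wrapping around
    acc = acc + circshift(A, j, 2);
end
result = acc / N;
% result is [1.5 1.5 1.5 3 5 3.5 1]
```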
I am trying to compute a moving average on multiple columns of a matrix. After reading some answers on Stack Overflow, namely this one, it seemed that the filter function was the way to go. However, it does not ignore NaN elements, and I would like to ignore them, in the spirit of the function nanmean. Sample code below:
X = rand(100,100); %generate sample matrix
X(sort(randi([1 100],1,10)),sort(randi([1 100],1,10))) = NaN; %put some random NaNs
windowlenght = 7;
MeanMA = filter(ones(1, windowlenght) / windowlenght, 1, X);
Use colfilt with nanmean:
>> A = [1 2 3 4 5; 2 nan nan nan 6; 3 nan nan nan 7; 4 nan nan nan 8; 5 6 7 8 9]
A =
1 2 3 4 5
2 NaN NaN NaN 6
3 NaN NaN NaN 7
4 NaN NaN NaN 8
5 6 7 8 9
>> colfilt(A, [3,3], 'sliding', @nanmean)
ans =
0.6250 1.1429 1.5000 2.5714 1.8750
1.1429 2.2000 3.0000 5.0000 3.1429
1.5000 3.0000 NaN 7.0000 3.5000
2.5714 5.0000 7.0000 7.8000 4.5714
1.8750 3.1429 3.5000 4.5714 3.1250
(if you only care about 'full' blocks, select inner rows / columns appropriately)
Alternatively, you can also use nlfilter, but you then need to be explicit (via an anonymous function handle) about what you'll be doing with the block; in particular, to work with nanmean such that it will produce a scalar output from the whole block, you'll need to convert each block to a column-vector before calling nanmean in your anonymous function:
>> nlfilter(A, [3,3], @(x) nanmean(x(:)))
ans =
0.6250 1.1429 1.5000 2.5714 1.8750
1.1429 2.2000 3.0000 5.0000 3.1429
1.5000 3.0000 NaN 7.0000 3.5000
2.5714 5.0000 7.0000 7.8000 4.5714
1.8750 3.1429 3.5000 4.5714 3.1250
However, for the record, MATLAB claims colfilt will generally be faster, so nlfilter is better reserved for situations where it doesn't make sense for your input to be converted to a column when processing each block.
Also see MATLAB's manual page/chapter on sliding operations in general.
If you have R2016a or beyond, you can use the movmean function with the 'omitnan' option.
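A minimal sketch of that option on a small made-up matrix. filter averages over the current and preceding samples, so the window is given here as [windowlength-1 0] to get the same trailing alignment; that choice and the sample values are assumptions for illustration. Note that movmean shrinks the window at the start by default instead of producing a startup transient.

```matlab
X = [1 2; NaN 4; 3 NaN; 5 8];   % small made-up sample
windowlength = 3;
% [kb kf] = [2 0]: the current row plus the two rows before it, per column.
MeanMA = movmean(X, [windowlength-1 0], 'omitnan');
% MeanMA is [1 2; 1 3; 2 3; 4 6]
```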
Try
MeanMA = filter(ones(1, windowlenght) / windowlenght, 1, X(find(~isnan(X))));
This will extract the non-nan values from X.
The question is... do you still have valid filter processing? If X is filled iteratively, one element per timestep, then the "NaN elimination" will produce a shorter vector whose values are no longer aligned with the original time vector.
EDIT
To still have a valid mean calculation, the filter parameters must be updated according to the number of non-NaN values.
values = X(find(~isnan(X)));
templength = length(values);
MeanMA = filter(ones(1, templength ) / templength , 1, values );
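For comparison, here is a sketch of a different way to keep the time alignment while still using filter: zero out the NaNs, filter the data and a validity mask with the same kernel, and divide the two. This is not the method of the answer above, just a common alternative; the sample matrix and the name windowlength are made up for illustration.

```matlab
X = [1 2; NaN 4; 3 NaN; 5 8];       % small made-up sample
windowlength = 3;
valid = ~isnan(X);
Xz = X;
Xz(~valid) = 0;                     % NaNs contribute nothing to the sums
kernel = ones(1, windowlength);
sums   = filter(kernel, 1, Xz);             % trailing sums per column
counts = filter(kernel, 1, double(valid));  % valid samples in each window
MeanMA = sums ./ counts;            % NaN (0/0) where a window has no valid data
% MeanMA is [1 2; 1 3; 2 3; 4 6]
```

Each output row stays aligned with the corresponding input row, and the first rows are averaged over the samples seen so far, not divided by the full window length.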
So I have this data I'd like plotted on loglog scale, with linear values on the y-axis and the values in dB on the x axis and
loglog(EbN0,BER)
outputs a nice looking curve, but the problem is the axis ticks. It's fine on the y-axis, but the x axis only has one tick, at 10^0, and no other ticks. Furthermore, that tick corresponds to the absolute value, not the dB value. Any convenient way to accomplish this?
(Note that both EbN0 and BER contain absolute values)
EDIT: I'll add my data and explain what I want a bit more.
EbN0 =
Columns 1 through 14
0.5000 1.0000 1.5000 2.0000 2.5000 3.0000 3.5000 4.0000 4.5000 5.0000 5.5000 6.0000 6.5000 7.0000
Columns 15 through 20
7.5000 8.0000 8.5000 9.0000 9.5000 10.0000
BER_TOT_ITER =
Columns 1 through 14
0.2928 0.2024 0.1183 0.0511 0.0164 0.0046 0.0010 0.0003 0.0001 0 0.0000 0.0000 0.0000 0
Columns 15 through 20
0 0 0 0 0 0
If I do plot(10*log10(EbN0),10*log10(BER_TOT_ITER)), I actually get exactly the graph I want and the dB values on the x axis, but now the y ticks are displayed in dB's instead of absolute values... so I just want to relabel the y ticks, NOT rescale the figure.
Relabeling the ticks is really the wrong approach here. You would replace numerical values with strings, and resizing etc. would no longer work.
Also, the data you pass to loglog is not in the form you actually want to look at.
You should always try to transform your data first.
So besides loglog have a look at semilogx and semilogy, which allow you to have a single logarithmic axis.
To sum up, what you're looking for is:
semilogy(10*log10(EbN0), BER_TOT_ITER)
I have a matrix that looks something like this:
a=[1 1 2 2 3 3 4 4;
1.5 1.5 2.5 2.5 3.5 3.5 4.5 4.5]
What I would like to do is reshape this, i.e. take the 2x2 matrices next to one another and put them underneath each other.
So get:
b=[1 1;
1.5 1.5;
2 2;
2.5 2.5;
3 3;
3.5 3.5;
4 4;
4.5 4.5]
but I can't seem to manipulate the reshape function to do this for me
edit: the single line version might be a bit complicated, so I've also added one based on a for loop
2 reshapes and a permute should do it (we first split the matrices and store them in 3d), and then stack them. In order to stack them we first need to permute the dimensions (similar to a transpose).
>> reshape(permute(reshape(a,2,2,4),[1 3 2]),8,2)
ans =
1.0000 1.0000
1.5000 1.5000
2.0000 2.0000
2.5000 2.5000
3.0000 3.0000
3.5000 3.5000
4.0000 4.0000
4.5000 4.5000
The for-loop-based version is a bit more straightforward. We create an empty array of the correct size, and then insert each of the 2x2 matrices separately:
b=zeros(8,2);
for i=1:4,
b((2*i-1):(2*i),:) = a(:,(2*i-1):(2*i));
end
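The one-liner can be sketched for other block widths by letting reshape infer the remaining dimensions with []. The variable names w and r are introduced here for illustration, and the sketch assumes the number of columns of a is a multiple of w.

```matlab
a = [1 1 2 2 3 3 4 4;
     1.5 1.5 2.5 2.5 3.5 3.5 4.5 4.5];
w = 2;                              % block width (assumed to divide size(a,2))
r = size(a, 1);                     % block height = number of rows of a
% Split into r-by-w blocks along dim 3, move the block index next to the
% row index, then collapse rows and blocks into one long row dimension.
b = reshape(permute(reshape(a, r, w, []), [1 3 2]), [], w);
```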
Using MATLAB, how can I find the 3-day moving average of a specific column of a matrix and append the moving average to that matrix? I am trying to compute the 3-day moving average from bottom to top of the matrix. I have provided my code:
Given the following matrix a and mask:
a = [1,2,3;4,5,6;7,8,9;10,11,12;13,14,15;16,17,18];
mask = ones(3,1);
I have tried implementing the conv command but I am receiving an error. Here is the conv command I have been trying to use on the 2nd column of matrix a:
a(:,4) = conv(a(:,2),mask,'valid');
The output I desire is given in the following matrix:
desiredOutput = [1,2,3,5;4,5,6,8;7,8,9,11;10,11,12,14;13,14,15,0;16,17,18,0;]
If you have any suggestions, I would greatly appreciate it. Thank you!
In general it would help if you would show the error. In this case you are doing two things wrong:
First your convolution needs to be divided by three (or the length of the moving average)
c = conv(a(:,2),mask,'valid')/3
c =
5
8
11
14
Second, notice the size of c. You cannot just fit c into a. The typical way of getting a moving average would be to use 'same':
a(:,4) = conv(a(:,2),mask,'same')/3
a =
1.0000 2.0000 3.0000 2.3333
4.0000 5.0000 6.0000 5.0000
7.0000 8.0000 9.0000 8.0000
10.0000 11.0000 12.0000 11.0000
13.0000 14.0000 15.0000 14.0000
16.0000 17.0000 18.0000 10.3333
but that doesn't look like what you want.
Instead you are forced to use a couple of lines:
c = conv(a(:,2),mask,'valid')/3;
a(1:length(c),4) = c
a =
1 2 3 5
4 5 6 8
7 8 9 11
10 11 12 14
13 14 15 0
16 17 18 0
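The same two steps can be sketched for an arbitrary window length. The names w and col are introduced here for illustration; the rows that lack a full window are padded with zeros explicitly, matching the desired output in the question.

```matlab
a = [1,2,3;4,5,6;7,8,9;10,11,12;13,14,15;16,17,18];
w = 3;                                      % window length in rows (days)
col = 2;                                    % column to average
c = conv(a(:,col), ones(w,1)/w, 'valid');   % size(a,1)-w+1 full-window means
a(:, end+1) = [c; zeros(w-1, 1)];           % zero-pad rows without a full window
% a(:,4) is [5 8 11 14 0 0]', up to rounding
```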