Finding roots in data - matlab

I have data that look like this:
These are curves of the same process but with different parameters.
I need to find the index (or x value) for certain y values (say, 10).
For the blue curve, this is easy: I'm using min to find the index:
[~, idx] = min(abs(y - target));
where y denotes the data and target the wanted value.
This approach works fine since I know that there is an intersection, and only one.
Now what to do with the red curve? I don't know beforehand, if there will be two intersections, so my idea of finding the first one and then stripping some of the data is not feasible.
How can I solve this?
Please note the the curves can shift in the x direction, so that checking the found solution for its xrange is not really an option (it could work for the data I have, but since there are more to come, this solution is probably not the best).

Shameless steal from here:
function x0 = data_zeros(x,y)
% Indices of Approximate Zero-Crossings
% (you can also use your own 'find' method here, although it has
% this pesky difference of 1-missing-element because of diff...)
dy = find(y(:).*circshift(y(:), [-1 0]) <= 0);
% Do linear interpolation of near-zero-crossings
x0 = NaN(size(dy,1)-1,1);
for k1 = 1:size(dy,1)-1
b = [[1;1] [x(dy(k1)); x(dy(k1)+1)]] \ ...
[y(dy(k1)); y(dy(k1)+1)];
x0(k1) = -b(1)/b(2);
end
end
Usage:
% Some data
x = linspace(0, 2*pi, 1e2);
y = sin(x);
% Find zeros
xz = data_zeros1(x,y);
% Plot original data and zeros found
figure(1), hold on
plot(x, y);
plot(xz, zeros(size(xz)), '+r');
axis([0,2*pi -1,+1]);
The gist: multiply all data points with their consecutive data points. Any of these products that is negative therefore has opposite sign, and gives you an approximate location of the zero. Then use linear interpolation between the same two points to get a more precise answer, and store that.
NOTE: for zeros exactly at the endpoints, this approach will not work. Therefore, it may be necessary to check those manually.

Subtract the desired number from your curve, i.e. if you want the values at 10 do data-10, then use and equality-within-tolerance, something like
TOL = 1e-4;
IDX = 1:numel(data(:,1)); % Assuming you have column data
IDX = IDX(abs(data-10)<=TOL);
where logical indexing has been used.

I figured out a way: The answer by b3 in this question did the trick.
idx = find(diff(y > target));
Easy as can be :) The exact xvalue can then be found by interpolation. For me, this is fine since i don't need exact values.

Related

How do I build a matrix using two vectors?

So I need to build a matrix of x and y coordinates. I have the x stored in one matrix called vx=0:6000; and y stored in Vy=repmat(300,1,6000);.
Values in x are 0,1,2,...,5999,6000.
Values in y are 300,300,...,300,300.
How do I build a "vector" with the x,y coordinates above?
It would look like this [(0,300);(1,300);...;(5999,300);(6000,300)].
After I finish doing this, I am going to want to find the distance between another fixed point x,y (that I will replicate 6000 times) and the vector above, in order to make a distance graph over time.
Thank you so much!
You can just use horizontal concatenation with []
X = [Vx(:), Vy(:)];
If you want to compute the distance between another point and every point in this 2D array, you could do the following:
point = [10, 100];
distances = sqrt(sum(bsxfun(#minus, X, point).^2, 2));
If you have R2016b or newer you can simply do
distances = sqrt(sum((X - point).^2, 2));
A slightly more elegant alternative (in my opinion) is the following:
Vx = (0:1:6000).';
C = [Vx 0*Vx+300]; % Just a trick to avoid the overly verbose `repmat`.
p = [10,100]; % Define some point of reference.
d = pdist2(C,p); % The default "distance type" is 'euclidian' - which is what you need.
This uses the pdist2 function, introduced in MATLAB 2010a, and requires the Statistics and Machine Learning Toolbox.

How to find if a value lies in between bounds of a histogram?

I have an empirical set of data (Hypothetically x=normrnd(10,3,1000,1);) which has a cumulative distribution function as follows:
I also have a set of data x1=[11,11.1,10.1]. I'd like to find the probability of finding the values of x1 if they came from the distribution x. If it was a continuous known function I could evaluate it exactly but I'd like to do it from the data I have. Any thoughts?
By hand I would find the value on the x axis and trace up to the line and across to the F(x) axis (see figure 1).
EDIT:
size(x1)
10,0000
So I now have found out how to get the data that plots F(x)
handles=cdfplot(X);
xdata=get(handles,'XData');
ydata=get(handles,'YData');
I think now it's a case of finding the location of x in an interval in xdata and subsequently the location in ydata.
e.g.
for i=1:length(x)
for j=1:length(xdata)
if x(i,1)<=xdata(jj,1)
X(i)=xdata(jj,1);
end
end
end
Y=ydata(X);
Is this the most elegant way?
There is a much more elegant way to do that using bsxfun. Also, you can just calculate empirical CDF using ecdf instead of cdfplot (unless you actually need the plot):
x = normrnd(10,3,1000,1);
[f_data, x_data] = ecdf(x);
x1 = [11, 11.1, 10.1];
idx = sum(bsxfun(#le, x_data(:)', x1(:)), 2);
y1 = f_data(idx);

Matlab reversed 2d interpolation interp2

I have a function V that is computed from two inputs (X,Y). Since the computation is quite demanding I just perform it on a grid of points and would like to rely on 2d linear interpolation. I now want to inverse that function for fixed Y. So basically my starting point is:
X = [1,2,3];
Y = [1,2,3];
V =[3,4,5;6,7,8;9,10,11];
Is is of course easy to obtain V at any combination of (X,Y), for instance:
Vq = interp2(X,Y,V,1.8,2.5)
gives
Vq =
8.3000
But how would I find X for given V and Y using 2d linear interploation? I will have to perform this task a lot of times, so I would need a quick and easy to implement solution.
Thank you for your help, your effort is highly appreciated.
P.
EDIT using additional info
If not both x and y have to be found, but one of them is given, this problem reduces to finding a minimum in only 1 direction (i.e. in x-direction). A simple approach is formulating this in a problem which can be minizmied by an optimization routine such as fminsearch. Therefore we define the function f which returns the difference between the value Vq and the result of the interpolation. We try to find the x which minimizes this difference, after we give an intial guess x0. Depending on this initial guess the result will be what we are looking for:
% Which x value to choose if yq and Vq are fixed?
xq = 1.8; % // <-- this one is to be found
yq = 2.5; % // (given)
Vq = interp2(X,Y,V,xq,yq); % // 8.3 (given)
% this function will be minimized (difference between Vq and the result
% of the interpolation)
f = #(x) abs(Vq-interp2(X, Y, V, x, yq));
x0 = 1; % initial guess)
x_opt = fminsearch(f, x0) % // solution found: 1.8
Nras, thank you very much. I did something else in the meantime:
function [G_inv] = G_inverse (lambda,U,grid_G_inverse,range_x,range_lambda)
for t = 1:size(U,1)
for i = 1:size(U,2)
xf = linspace(range_x(1), range_x(end),10000);
[Xf,Yf] = meshgrid(xf,lambda);
grid_fine = interp2(range_x,range_lambda,grid_G_inverse',Xf,Yf);
idx = find (abs(grid_fine-U(t,i))== min(min(abs(grid_fine-U(t,i))))); % find min distance point and take x index
G_inv(t,i)=xf(idx(1));
end
end
G_inv is supposed to contain x, U is yq in the above example and grid_G_inverse contains Vq. range_x and range_lambda are the corresponding vectors for the grid axis. What do you think about this solution, also compared to yours? I would guess mine is faster but less accurate. Spped is, however, a major issue in my code.

Computing a moving average

I need to compute a moving average over a data series, within a for loop. I have to get the moving average over N=9 days. The array I'm computing in is 4 series of 365 values (M), which itself are mean values of another set of data. I want to plot the mean values of my data with the moving average in one plot.
I googled a bit about moving averages and the "conv" command and found something which i tried implementing in my code.:
hold on
for ii=1:4;
M=mean(C{ii},2)
wts = [1/24;repmat(1/12,11,1);1/24];
Ms=conv(M,wts,'valid')
plot(M)
plot(Ms,'r')
end
hold off
So basically, I compute my mean and plot it with a (wrong) moving average. I picked the "wts" value right off the mathworks site, so that is incorrect. (source: http://www.mathworks.nl/help/econ/moving-average-trend-estimation.html) My problem though, is that I do not understand what this "wts" is. Could anyone explain? If it has something to do with the weights of the values: that is invalid in this case. All values are weighted the same.
And if I am doing this entirely wrong, could I get some help with it?
My sincerest thanks.
There are two more alternatives:
1) filter
From the doc:
You can use filter to find a running average without using a for loop.
This example finds the running average of a 16-element vector, using a
window size of 5.
data = [1:0.2:4]'; %'
windowSize = 5;
filter(ones(1,windowSize)/windowSize,1,data)
2) smooth as part of the Curve Fitting Toolbox (which is available in most cases)
From the doc:
yy = smooth(y) smooths the data in the column vector y using a moving
average filter. Results are returned in the column vector yy. The
default span for the moving average is 5.
%// Create noisy data with outliers:
x = 15*rand(150,1);
y = sin(x) + 0.5*(rand(size(x))-0.5);
y(ceil(length(x)*rand(2,1))) = 3;
%// Smooth the data using the loess and rloess methods with a span of 10%:
yy1 = smooth(x,y,0.1,'loess');
yy2 = smooth(x,y,0.1,'rloess');
In 2016 MATLAB added the movmean function that calculates a moving average:
N = 9;
M_moving_average = movmean(M,N)
Using conv is an excellent way to implement a moving average. In the code you are using, wts is how much you are weighing each value (as you guessed). the sum of that vector should always be equal to one. If you wish to weight each value evenly and do a size N moving filter then you would want to do
N = 7;
wts = ones(N,1)/N;
sum(wts) % result = 1
Using the 'valid' argument in conv will result in having fewer values in Ms than you have in M. Use 'same' if you don't mind the effects of zero padding. If you have the signal processing toolbox you can use cconv if you want to try a circular moving average. Something like
N = 7;
wts = ones(N,1)/N;
cconv(x,wts,N);
should work.
You should read the conv and cconv documentation for more information if you haven't already.
I would use this:
% does moving average on signal x, window size is w
function y = movingAverage(x, w)
k = ones(1, w) / w
y = conv(x, k, 'same');
end
ripped straight from here.
To comment on your current implementation. wts is the weighting vector, which from the Mathworks, is a 13 point average, with special attention on the first and last point of weightings half of the rest.

Sequential connecting points in 2D in Matlab

I was wondering if you could advise me how I can connect several points together exactly one after each other.
Assume:
data =
x y
------------------
591.2990 532.5188
597.8405 558.6672
600.0210 542.3244
606.5624 566.2938
612.0136 546.6825
616.3746 570.6519
617.4648 580.4575
619.6453 600.0688
629.4575 557.5777
630.5477 584.8156
630.5477 618.5906
639.2696 604.4269
643.6306 638.2019
646.9013 620.7697
652.3525 601.1584
"data" is coordinate of points.
Now, I would like to connect(plot) first point(1st array) to second point, second point to third point and so on.
Please mind that plot(data(:,1),data(:,2)) will give me the same result. However, I am looking for a loop which connect (plot) each pair of point per each loop.
For example:
data1=data;
figure
scatter(X,Y,'.')
hold on
for i=1:size(data,1)
[Liaa,Locbb] = ismember(data(i,:),data1,'rows');
data1(Locbb,:)=[];
[n,d] = knnsearch(data1,data(i,:),'k',1);
x=[data(i,1) data1(n,1)];
y=[data(i,2) data1(n,2)];
plot(x,y);
end
hold off
Although, the proposed loop looks fine, I want a kind of plot which each point connect to maximum 2 other points (as I said like plot(x,y))
Any help would be greatly appreciated!
Thanks for all of your helps, finally a solution is found:
n=1;
pt1=[data(n,1), data(n,2)];
figure
scatter(data(:,1),data(:,2))
hold on
for i=1:size(data,1)
if isempty(pt1)~=1
[Liaa,Locbb] = ismember(pt1(:)',data,'rows');
if Locbb~=0
data(Locbb,:)=[];
[n,d] = knnsearch(data,pt1(:)','k',1);
x=[pt1(1,1) data(n,1)];
y=[pt1(1,2) data(n,2)];
pt1=[data(n,1), data(n,2)];
plot(x,y);
end
end
end
hold off
BTW it is possible to delete the last longest line as it is not related to the question, if someone need it please let me know.
You don't need to use a loop at all. You can use interp1. Specify your x and y data points as control points. After, you can specify a finer set of points from the first x value to the last x value. You can specify a linear spline as this is what you want to accomplish if the behaviour you want is the same as plot. Assuming that data is a 2D matrix as you have shown above, without further ado:
%// Get the minimum and maximum x-values
xMin = min(data(:,1));
xMax = max(data(:,1));
N = 3000; % // Specify total number of points
%// Create an array of N points that linearly span from xMin to xMax
%// Make N larger for finer resolution
xPoints = linspace(xMin, xMax, N);
%//Use the data matrix as control points, then xPoints are the values
%//along the x-axis that will help us draw our lines. yPoints will be
%//the output on the y-axis
yPoints = interp1(data(:,1), data(:,2), xPoints, 'linear');
%// Plot the control points as well as the interpolated points
plot(data(:,1), data(:,2), 'rx', 'MarkerSize', 12);
hold on;
plot(xPoints, yPoints, 'b.');
Warning: You have two x values that map to 630.5477 but produce different y values. If you use interp1, this will give you an error, which is why I had to slightly perturb one of the values by a small amount to get this to work. This should hopefully not be the case when you start using your own data. This is the plot I get:
You'll see that there is a huge gap between those two points I talked about. This is the only limitation to interp1 as it assumes that the x values are strictly monotonically increasing. As such, you can't have the same two points in your set of x values.