How do I build a matrix using two vectors? - matlab

So I need to build a matrix of x and y coordinates. I have the x stored in one matrix called vx=0:6000; and y stored in Vy=repmat(300,1,6000);.
Values in x are 0,1,2,...,5999,6000.
Values in y are 300,300,...,300,300.
How do I build a "vector" with the x,y coordinates above?
It would look like this [(0,300);(1,300);...;(5999,300);(6000,300)].
After I finish doing this, I am going to want to find the distance between another fixed point x,y (that I will replicate 6000 times) and the vector above, in order to make a distance graph over time.
Thank you so much!

You can just use horizontal concatenation with []
X = [Vx(:), Vy(:)];
If you want to compute the distance between another point and every point in this 2D array, you could do the following:
point = [10, 100];
distances = sqrt(sum(bsxfun(#minus, X, point).^2, 2));
If you have R2016b or newer you can simply do
distances = sqrt(sum((X - point).^2, 2));

A slightly more elegant alternative (in my opinion) is the following:
Vx = (0:1:6000).';
C = [Vx 0*Vx+300]; % Just a trick to avoid the overly verbose `repmat`.
p = [10,100]; % Define some point of reference.
d = pdist2(C,p); % The default "distance type" is 'euclidian' - which is what you need.
This uses the pdist2 function, introduced in MATLAB 2010a, and requires the Statistics and Machine Learning Toolbox.

Related

Finding the nearest neighbor to a single point in MATLAB

I'm trying to do a nearest neighbor search that yields a single point as the single "nearest neighbor" to another point in matlab.
I've got the following data:
A longitude grid that is size 336x264 "lon"
some random point within the bounds of the longitude grid "dxf"
I've tried using MATLAB's "knnsearch" function
https://www.mathworks.com/help/stats/knnsearch.html
But sadly when I use the command:
idx = knnsearch(lon, dxf)
I am met with the error:
"Y must be a matrix with 264 columns."
Is there an alternative nearest neighbor search I can use to find the nearest neighbor to a single point within MATLAB? Is there a simpler solution I can implement?
I literally just want to find the closest point within the "lon" matrix to point "dxf".
Thanks!
Taylor
You should first convert your grid to an n-by-2 matrix (if you created this using meshgrid, it's simply G = [XX(:) YY(:)]), you can then try it with pdist2 if you have the Statistics and Machine Learning Toolbox (which you do):
[D,I] = pdist2(P, G, 'euclidian', 'Smallest', 1);
Where G is the grid and P is your m-by-2 array of points to test.
If you're working without Toolboxes, you can construct a simple distance formula yourself:
xx = [0:364]; % Not sure what your limits were so just making some up here
yy = [0:264];
[X, Y] = meshgrid(xx,yy);
dxf = [221.7, 109.1]; % Again just pulling numbers from nether regions
G = [X(:),Y(:)];
d = sqrt( sum( (G-dxf).^2, 2) );
[minDist, idxMinDist] = min(d);
solution = G(idxMinDist,:);
You can modify the limits for xx and yy to fit your specific setup accordingly.

Finding roots in data

I have data that look like this:
These are curves of the same process but with different parameters.
I need to find the index (or x value) for certain y values (say, 10).
For the blue curve, this is easy: I'm using min to find the index:
[~, idx] = min(abs(y - target));
where y denotes the data and target the wanted value.
This approach works fine since I know that there is an intersection, and only one.
Now what to do with the red curve? I don't know beforehand, if there will be two intersections, so my idea of finding the first one and then stripping some of the data is not feasible.
How can I solve this?
Please note the the curves can shift in the x direction, so that checking the found solution for its xrange is not really an option (it could work for the data I have, but since there are more to come, this solution is probably not the best).
Shameless steal from here:
function x0 = data_zeros(x,y)
% Indices of Approximate Zero-Crossings
% (you can also use your own 'find' method here, although it has
% this pesky difference of 1-missing-element because of diff...)
dy = find(y(:).*circshift(y(:), [-1 0]) <= 0);
% Do linear interpolation of near-zero-crossings
x0 = NaN(size(dy,1)-1,1);
for k1 = 1:size(dy,1)-1
b = [[1;1] [x(dy(k1)); x(dy(k1)+1)]] \ ...
[y(dy(k1)); y(dy(k1)+1)];
x0(k1) = -b(1)/b(2);
end
end
Usage:
% Some data
x = linspace(0, 2*pi, 1e2);
y = sin(x);
% Find zeros
xz = data_zeros1(x,y);
% Plot original data and zeros found
figure(1), hold on
plot(x, y);
plot(xz, zeros(size(xz)), '+r');
axis([0,2*pi -1,+1]);
The gist: multiply all data points with their consecutive data points. Any of these products that is negative therefore has opposite sign, and gives you an approximate location of the zero. Then use linear interpolation between the same two points to get a more precise answer, and store that.
NOTE: for zeros exactly at the endpoints, this approach will not work. Therefore, it may be necessary to check those manually.
Subtract the desired number from your curve, i.e. if you want the values at 10 do data-10, then use and equality-within-tolerance, something like
TOL = 1e-4;
IDX = 1:numel(data(:,1)); % Assuming you have column data
IDX = IDX(abs(data-10)<=TOL);
where logical indexing has been used.
I figured out a way: The answer by b3 in this question did the trick.
idx = find(diff(y > target));
Easy as can be :) The exact xvalue can then be found by interpolation. For me, this is fine since i don't need exact values.

find the line which best fits to the data

I'm trying to find the line which best fits to the data. I use the following code below but now I want to have the data placed into an array sorted so it has the data which is closest to the line first how can I do this? Also is polyfit the correct function to use for this?
x=[1,2,2.5,4,5];
y=[1,-1,-.9,-2,1.5];
n=1;
p = polyfit(x,y,n)
f = polyval(p,x);
plot(x,y,'o',x,f,'-')
PS: I'm using Octave 4.0 which is similar to Matlab
You can first compute the error between the real value y and the predicted value f
err = abs(y-f);
Then sort the error vector
[val, idx] = sort(err);
And use the sorted indexes to have your y values sorted
y2 = y(idx);
Now y2 has the same values as y but the ones closer to the fitting value first.
Do the same for x to compute x2 so you have a correspondence between x2 and y2
x2 = x(idx);
Sembei Norimaki did a good job of explaining your primary question, so I will look at your secondary question = is polyfit the right function?
The best fit line is defined as the line that has a mean error of zero.
If it must be a "line" we could use polyfit, which will fit a polynomial. Of course, a "line" can be defined as first degree polynomial, but first degree polynomials have some properties that make it easy to deal with. The first order polynomial (or linear) equation you are looking for should come in this form:
y = mx + b
where y is your dependent variable and X is your independent variable. So the challenge is this: find the m and b such that the modeled y is as close to the actual y as possible. As it turns out, the error associated with a linear fit is convex, meaning it has one minimum value. In order to calculate this minimum value, it is simplest to combine the bias and the x vectors as follows:
Xcombined = [x.' ones(length(x),1)];
then utilized the normal equation, derived from the minimization of error
beta = inv(Xcombined.'*Xcombined)*(Xcombined.')*(y.')
great, now our line is defined as Y = Xcombined*beta. to draw a line, simply sample from some range of x and add the b term
Xplot = [[0:.1:5].' ones(length([0:.1:5].'),1)];
Yplot = Xplot*beta;
plot(Xplot, Yplot);
So why does polyfit work so poorly? well, I cant say for sure, but my hypothesis is that you need to transpose your x and y matrixies. I would guess that that would give you a much more reasonable line.
x = x.';
y = y.';
then try
p = polyfit(x,y,n)
I hope this helps. A wise man once told me (and as I learn every day), don't trust an algorithm you do not understand!
Here's some test code that may help someone else dealing with linear regression and least squares
%https://youtu.be/m8FDX1nALSE matlab code
%https://youtu.be/1C3olrs1CUw good video to work out by hand if you want to test
function [a0 a1] = rtlinreg(x,y)
x=x(:);
y=y(:);
n=length(x);
a1 = (n*sum(x.*y) - sum(x)*sum(y))/(n*sum(x.^2) - (sum(x))^2); %a1 this is the slope of linear model
a0 = mean(y) - a1*mean(x); %a0 is the y-intercept
end
x=[65,65,62,67,69,65,61,67]'
y=[105,125,110,120,140,135,95,130]'
[a0 a1] = rtlinreg(x,y); %a1 is the slope of linear model, a0 is the y-intercept
x_model =min(x):.001:max(x);
y_model = a0 + a1.*x_model; %y=-186.47 +4.70x
plot(x,y,'x',x_model,y_model)

Computing a moving average

I need to compute a moving average over a data series, within a for loop. I have to get the moving average over N=9 days. The array I'm computing in is 4 series of 365 values (M), which itself are mean values of another set of data. I want to plot the mean values of my data with the moving average in one plot.
I googled a bit about moving averages and the "conv" command and found something which i tried implementing in my code.:
hold on
for ii=1:4;
M=mean(C{ii},2)
wts = [1/24;repmat(1/12,11,1);1/24];
Ms=conv(M,wts,'valid')
plot(M)
plot(Ms,'r')
end
hold off
So basically, I compute my mean and plot it with a (wrong) moving average. I picked the "wts" value right off the mathworks site, so that is incorrect. (source: http://www.mathworks.nl/help/econ/moving-average-trend-estimation.html) My problem though, is that I do not understand what this "wts" is. Could anyone explain? If it has something to do with the weights of the values: that is invalid in this case. All values are weighted the same.
And if I am doing this entirely wrong, could I get some help with it?
My sincerest thanks.
There are two more alternatives:
1) filter
From the doc:
You can use filter to find a running average without using a for loop.
This example finds the running average of a 16-element vector, using a
window size of 5.
data = [1:0.2:4]'; %'
windowSize = 5;
filter(ones(1,windowSize)/windowSize,1,data)
2) smooth as part of the Curve Fitting Toolbox (which is available in most cases)
From the doc:
yy = smooth(y) smooths the data in the column vector y using a moving
average filter. Results are returned in the column vector yy. The
default span for the moving average is 5.
%// Create noisy data with outliers:
x = 15*rand(150,1);
y = sin(x) + 0.5*(rand(size(x))-0.5);
y(ceil(length(x)*rand(2,1))) = 3;
%// Smooth the data using the loess and rloess methods with a span of 10%:
yy1 = smooth(x,y,0.1,'loess');
yy2 = smooth(x,y,0.1,'rloess');
In 2016 MATLAB added the movmean function that calculates a moving average:
N = 9;
M_moving_average = movmean(M,N)
Using conv is an excellent way to implement a moving average. In the code you are using, wts is how much you are weighing each value (as you guessed). the sum of that vector should always be equal to one. If you wish to weight each value evenly and do a size N moving filter then you would want to do
N = 7;
wts = ones(N,1)/N;
sum(wts) % result = 1
Using the 'valid' argument in conv will result in having fewer values in Ms than you have in M. Use 'same' if you don't mind the effects of zero padding. If you have the signal processing toolbox you can use cconv if you want to try a circular moving average. Something like
N = 7;
wts = ones(N,1)/N;
cconv(x,wts,N);
should work.
You should read the conv and cconv documentation for more information if you haven't already.
I would use this:
% does moving average on signal x, window size is w
function y = movingAverage(x, w)
k = ones(1, w) / w
y = conv(x, k, 'same');
end
ripped straight from here.
To comment on your current implementation. wts is the weighting vector, which from the Mathworks, is a 13 point average, with special attention on the first and last point of weightings half of the rest.

Finding the belonging value of given point on a grid of 3D histogram?

I use 2D dataset like below,
37.0235000000000 18.4548000000000
28.4454000000000 15.7814000000000
34.6958000000000 20.9239000000000
26.0374000000000 17.1070000000000
27.1619000000000 17.6757000000000
28.4101000000000 15.9183000000000
33.7340000000000 17.1615000000000
34.7948000000000 18.2695000000000
34.5622000000000 19.3793000000000
36.2884000000000 18.4551000000000
26.1695000000000 16.8195000000000
26.2090000000000 14.2081000000000
26.0264000000000 21.8923000000000
35.8194000000000 18.4811000000000
to create a 3D histogram.
How can I find the histogram value of a point on a grid? For example, if [34.7948000000000 18.2695000000000] point is given, I would like to find the corresponding value of a histogram for a given point on the grid.
I used this code
point = feat_vec(i,:); // take the point given by the data set
X = centers{1}(1,:); // take center of the bins at one dimension
Y = centers{2}(1,:); // take center of the bins at other dim.
distanceX = abs(X-point(1)); // find distance to all bin centers at one dimension
distanceY = abs(Y-point(2)); // find distance to center points of other dimension
[~,indexX] = min(distanceX); // find the index of minimum distant center point
[~,indexY] = min(distanceY); // find the index of minimum distant center point for other dimension
You could use interp2 to accomplish that!
If X (1-D Vector, length N) and Y (1-D vector, length M) determine discrete coordinate on the axes where your histogram has defined values Z (matrix, size M x N). Getting value for one particular point with coordinates (XI, YI) could be done with:
% generate grid
[XM, YM] = meshgrid(X, Y);
% interpolate desired value
ZI = interp2(XM, YM, Z, XI, YI, 'spline')
In general, this kind of problem is interpolation problem. If you would want to get values for multiple points, you would have to generate grid for them in similar fashion done in code above. You could also use another interpolating method, for example linear (refer to linked documentation!)
I think you mean this:
[N,C] = hist3(X,...) returns the positions of the bin centers in a
1-by-2 cell array of numeric vectors, and does not plot the histogram.
That being said, if you have a 2D point x=[x1, x2], you are only to look up the closest points in C, and take the corresponding value in N.
In Matlab code:
[N, C] = hist3(data); % with your data format...
[~,indX] = min(abs(C{1}-x(1)));
[~,indY] = min(abs(C{2}-x(2)));
result = N(indX,indY);
done. (You can make it into your own function say result = hist_val(data, x).)
EDIT:
I just saw, that my answer in essence is just a more detailed version of #Erogol's answer.