MATLAB: find midpoint between two mismatching time series

In MATLAB (R2015b) I need to find the midpoint between two time series of different lengths (roughly 2000 vs. 3000 rows). In both series the first column is time and the second is a measurement. For example, A:
09:30:14 23
09:31:03 23.5
And B:
09:30:19 25.5
09:30:37 25
09:31:12 24.5
How can I get MATLAB to calculate the midpoint value between A and B and get the result as shown below?
09:30:19 24.25 (Here it is 23+(25.5-23)/2)
09:30:37 24 (Here it is 23+(25-23)/2)
09:31:12 24 (Here it is 23.5+(24.5-23.5)/2)

You can use the interp1 function to estimate the values of one series at the time points of the other. Once the time points agree, you can simply take the mean of the two values.
interp1 supports several interpolation methods, such as 'nearest' and 'linear'.
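For example, a minimal sketch (assuming the time strings have been parsed into numeric datenum values; variable names are illustrative):
tA = datenum({'09:30:14';'09:31:03'}, 'HH:MM:SS');          % times of series A
vA = [23; 23.5];                                            % values of series A
tB = datenum({'09:30:19';'09:30:37';'09:31:12'}, 'HH:MM:SS');
vB = [25.5; 25; 24.5];                                      % values of series B
vA_atB = interp1(tA, vA, tB, 'nearest', 'extrap');          % estimate A at B's time stamps
midpoints = [tB, (vA_atB + vB)/2];                          % -> 24.25, 24, 24
Here 'nearest' reproduces the arithmetic in the question ('linear' is usually the better default), and the 'extrap' flag is needed because B's last time stamp lies just outside A's range.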

Related

How to quickly/easily merge and average data in matrix in MATLAB?

I have a matrix of AirFuelRatio values at certain engine speeds and throttle positions (e.g. the AFR is 14 at 2500 rpm and 60% throttle).
The matrix is 25x10; the engine speed ranges from 1200 to 6000 rpm in 200 rpm steps, and the throttle ranges from 0.1 to 1 in steps of 0.1.
Say I measure new values, e.g. an AFR of 13.5 at 2138 rpm and 74.3% throttle; how do I merge that into the matrix? The closest matrix values are 2000 or 2200 rpm and 70 or 80% throttle. I also don't want the new data to replace the older data. How can I make the matrix take this value in and adjust its values to account for it?
Simplified, I have the following x-axis values (top row) and 1x4 matrix (below):
2 4 6 8
14 16 18 20
I just measured an AFR value of 15.5 at 3 rpm. Interpolating the AFR matrix would have given 15, so this value is out of the ordinary.
I want the matrix to take this data and adjust the other variables to it, i.e. average everything, so that the more data I put in, the more reliable and accurate the matrix becomes. So in the simplified case the matrix would become something like:
2 4 6 8
14.3 16.3 18.2 20.1
So it averages the old and new data. I've read the documentation on concatenation, but I believe my problem can't be solved with that function.
EDIT: To clarify my question, here is a visual clarification.
The 'matrix' keeps the same size of 5 points while a new data point is added. It takes the new data into account and adjusts the matrix accordingly. This is what I'm trying to achieve. The more scattered data I get, the more accurate the matrix becomes. (And yes, the green dot in this case would be an outlier, but it explains my case.)
Cheers
This is not a matter of simple merge/average. I don't think there's a quick method to do this unless you have simplifying assumptions. What you want is a statistical inference of the underlying trend. I suggest using Gaussian process regression to solve this problem. There's a great MATLAB toolbox by Rasmussen and Williams called GPML. http://www.gaussianprocess.org/gpml/
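A minimal sketch of what that might look like with GPML (hedged: it assumes the toolbox is installed and on the path, uses the simplified data from the question with the new point at x = 3, and follows the call pattern of the GPML regression demo):
x = [2; 3; 4; 6; 8];                  % simplified x-axis plus the new point
y = [14; 15.5; 16; 18; 20];           % measured AFR values
xs = (2:0.5:8)';                      % points at which to evaluate the fit
meanfunc = @meanConst; covfunc = @covSEiso; likfunc = @likGauss;
hyp = struct('mean', mean(y), 'cov', [0; 0], 'lik', log(0.1));
hyp = minimize(hyp, @gp, -100, @infExact, meanfunc, covfunc, likfunc, x, y);
[mu, s2] = gp(hyp, @infExact, meanfunc, covfunc, likfunc, x, y, xs);  % posterior mean and variance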
This sounds more like a data-fitting task to me. You have a set of measurements for which you want the best linear fit; rather than storing the raw data in a table, you find the best fit to the measured values and tabulate that. So, for example, I can create a matrix, A, which holds all of the recorded values. Let's start with:
A=[2,14;3,15.5;4,16;6,18;8,20];
I now need a matrix of points for the inputs to my fitting curve (which, in this instance, let's assume is linear, so it is the set of values 1 and x):
B=[ones(size(A,1),1), A(:,1)];
We can find the linear fit parameters (the y-intercept and the gradient) using:
B\A(:,2)
Or, if you want the points that the line goes through for the values of x:
B*(B\A(:,2))
This results in the points:
2  14.1897
3  15.1552
4  16.1207
6  18.0517
8  19.9828
which represents the best fit line through these points.
You can extend this manually to polynomial fitting if you want, or you can use the MATLAB function polyfit. To extend the process manually you use a revised B matrix, and you can also evaluate the fit at a specified set of points in the last line. The complete code would then be:
% Original measurements - could be read in from a file,
% but for this example we will set it to a matrix
% Note that not all tabulated values need to be present
A=[2,14; 3,15.5; 4,16; 5,17; 8,20];
% Now create the polynomial values of x corresponding to
% the data points. Choosing a second order polynomial...
B=[ones(size(A,1),1), A(:,1), A(:,1).^2];
% Find the polynomial coefficients for the best fit curve
coeffs=B\A(:,2);
% Now generate a table of values at specific points
% First define the x-values
tabinds = 2:2:8;
% Then generate the polynomial values of x
tabpolys=[ones(length(tabinds),1), tabinds', (tabinds').^2];
% Finally, multiply by the coefficients found
curve_table = [tabinds', tabpolys*coeffs];
% and display the results
disp(curve_table);
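For comparison, the same second-order fit can be produced with polyfit/polyval; note that polyfit returns the coefficients highest power first, the reverse of the B-matrix ordering used above:
p = polyfit(A(:,1), A(:,2), 2);                  % coefficients, highest power first
curve_table2 = [tabinds', polyval(p, tabinds')]; % evaluate at the same x-values
disp(curve_table2);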

How can I find the difference between two plots with a dimensional mismatch?

I have a question that I don't know has an off-the-bat solution.
Here it goes:
I have two data sets, plotted on the same figure, and I need to find their difference. Simple so far...
The problem arises from the fact that matrix A has 1000 data points while matrix B has only 580. How can I find the difference between the two graphs given the dimensional mismatch between them?
One way I thought of is artificially inflating matrix B to 1000 data points while the trend of the plot remains the same. Would this be possible? And if yes, how?
for example:
A=[1 45 33 4 1009 ];
B=[1 22 33 44 55 66 77 88 99 1010];
Ya=A.*20+4;
Yb=B./10+3;
C=abs(B - A)   % this line fails: A (1x5) and B (1x10) have different lengths
plot(A,Ya,'r',B,Yb)
xlim([-100 1000])
grid on
hold on
plot(length(B),C)   % the intended difference plot, which the mismatch prevents
One way to do it is to resample the 580-element vector to 1000 samples. Use MATLAB's resample (which requires the Signal Processing Toolbox, I believe) for this:
x = randn(580,1);
y = randn(1000,1);
xr = resample(x, 50, 29);   % 50/29 = 1000/580 is the resampling ratio
You should then be able to compare the two data vectors.
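For example, continuing the sketch above:
d = y - xr;            % element-wise difference; both vectors are now 1000x1
plot(d); grid on;      % inspect the difference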
There are two ways that I can think of:
1- Matching the sizes:
generate more data for the matrix with fewer elements (using interpolation, etc.), or
remove some data from the matrix with more elements (e.g. outlier removal).
2- Comparing the matrices through their properties:
for instance, you can calculate the mean and the covariance of each matrix and compare them. Other options include cov, mean, median, std, var, xcorr, and xcov.
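A minimal sketch of the second idea, assuming A and B are the two data vectors being compared:
statsA = [mean(A), median(A), std(A), var(A)];
statsB = [mean(B), median(B), std(B), var(B)];
disp([statsA; statsB]);   % one row of summary statistics per series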

How to convert distance into probability?

Can anyone shed some light on my MATLAB program?
I have data from two sensors and I'm doing a kNN classification for each of them separately.
In both cases the training set looks like a set of vectors with 42 rows total, like this:
[44 12 53 29 35 30 49;
54 36 58 30 38 24 37;..]
Then I get a sample, e.g. [40 30 50 25 40 25 30], and I want to classify it to its closest neighbor.
As a criterion of proximity I use the Euclidean metric, sqrt(sum(Y.^2)), where Y is the element-wise difference between the sample and a training vector; this gives me an array of distances between the sample and each class of the training set.
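For reference, a sketch of that distance computation (trainSet and sample are illustrative names, not from the original code):
trainSet = [44 12 53 29 35 30 49;
            54 36 58 30 38 24 37];        % one training vector per row
sample   = [40 30 50 25 40 25 30];
Y = bsxfun(@minus, trainSet, sample);     % element-wise differences (R2015b-safe)
d = sqrt(sum(Y.^2, 2));                   % one Euclidean distance per row
[~, nearest] = min(d);                    % index of the closest training vector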
So, two questions:
Is it possible to convert a distance into a distribution of probabilities, something like: Class 1: 60%, Class 2: 30%, Class 3: 5%, Class 5: 1%, etc.?
Added: up to this moment I have been using the formula probability = distance / sum of distances, but I cannot plot a correct CDF or histogram.
This gives me a distribution of sorts, but I see a problem there: if the distance is large, for example 700, then the closest class still gets the biggest probability, which seems wrong because such a distance is too big for the sample to be a good match for any of the classes.
If I could get two probability density functions, I guess I would then take some product of them. Is that possible?
Any help or remark is highly appreciated.
I think there are multiple ways of doing this (see the sketch after this list):
as Adam suggested, use 1/d / sum(1/d);
use the square, or an even higher power, of the inverse distance, e.g. 1/d^2 / sum(1/d^2); this makes the class probability distribution more skewed. For example, if 1/d gives a 40%/60% prediction, 1/d^2 may give 10%/90%;
use softmax (https://en.wikipedia.org/wiki/Softmax_function), the exponential of the negative distance;
use exp(-d^2/sigma^2) / sum(exp(-d^2/sigma^2)); this imitates Gaussian likelihoods. Sigma could be the average within-cluster distance, or simply set to 1 for all clusters.
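A minimal sketch of the four conversions, for a hypothetical vector of distances d:
d = [120; 300; 700; 950];                  % hypothetical distances to each class
p1 = (1./d)    ./ sum(1./d);               % inverse distance
p2 = (1./d.^2) ./ sum(1./d.^2);            % squared inverse: more skewed
e  = exp(-(d - min(d)));                   % shift by min(d) for numerical stability
p3 = e ./ sum(e);                          % softmax of negative distance
sigma = mean(d);                           % e.g. the average within-cluster distance
g  = exp(-d.^2 ./ sigma^2);
p4 = g ./ sum(g);                          % Gaussian-style likelihoods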
You could try inverting your distances to get a likelihood measure, i.e. the bigger the distance x, the smaller its inverse. Then you can normalize as in probability = (1/distance) / (sum(1/distance)).
Hi: have you ever tried the formula probability = 1 - distance, assuming that you are using a standardized distance between 0 and 1?

Using SVMs for Regression

I need to use SVMs for regression.
I have y, a 261x1 vector, and x, a 261x10 matrix.
I would like to calculate 10 weights such that the weighted sum of the 10 values of x at each of the 261 data points mimics the y value.
However, when I run this using the libsvm package, I get 261 weights, not the 10 I want.
From my understanding, libsvm requires the x and y inputs to have the same number of rows, and hence inputting the transposes of x and y will not work.
(Note: this is a portfolio optimization problem and 261 is the number of days, and 10 is the number of stocks)
I could not understand what 'weights' means here, but I suggest you use the libsvmwrite function to write your labels and feature vectors in the required format, and use libsvmread to read the formatted data back in to pass as input.
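A hedged sketch of that round trip with the libsvm MATLAB interface (the file name is hypothetical; libsvmwrite requires a sparse feature matrix, and for a linear kernel the per-feature weights can be recovered from the model as described in the libsvm FAQ):
libsvmwrite('portfolio.txt', y, sparse(x));   % y: 261x1 labels, x: 261x10 features
[yl, xl] = libsvmread('portfolio.txt');       % read back in libsvm format
model = svmtrain(yl, xl, '-s 3 -t 0');        % -s 3: epsilon-SVR, -t 0: linear kernel
w = model.SVs' * model.sv_coef;               % 10x1 per-feature weight vector
b = -model.rho;                               % bias term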

How to resample with interp1 in Matlab when input vectors are of different length

I have two variables in a .mat file here:
https://www.yousendit.com/download/UW13UGhVQXA4NVVQWWNUQw
testz is a vector of cumulative distance (in meters, monotonically and regularly increasing)
testSDT is a vector of integrated (cumulative) sound wave travel time (in milliseconds) generated using the distance vector and a vector of velocities
(there is an intermediate step of creating interval travel times)
Since velocity is a continuously variable function, the resulting interval travel times, and likewise the integrated travel times, are non-integer and variable in magnitude.
What I want is to resample the distance vector at regular time intervals (e.g. 1 ms, 2 ms, ..., n ms).
What makes this difficult is that the maximum travel time, 994.6659 ms, is less than the number of samples in the two vectors, so it is not straightforward to use interp1 directly.
i.e.:
X=testSDT -> 1680 samples
Y=testz -> 1680 samples
XI=[1:1:994] -> 994 samples
This is the code I've come up with. It is working code, and I think it is not too bad.
%% Initial chores
M=fix(max(testSDT));
L=(1:1:M);
%% Create indices
% this loop finds the samples in the integrated travel time vector
% that are closest to integer milliseconds and their sample number
for i=1:M
[cl(i) ind(i)] = min(abs(testSDT-L(i)));
nearest(i) = testSDT(ind(i));
end
%% Remove duplicates
% this is necessary to remove duplicates in the index vector (happens in this test).
% For example: 2.5 ms would be equally close to both 2 ms and 3 ms.
[clsst,ia,ic] = unique(nearest);
idx=(ind(ia));
%% Interpolation
% this uses the index vectors to resample the depth vectors at
% integer times
newz=interp1(clsst,testz(idx),[1:1:length(idx)],'cubic')';
As far as I can see there is one issue with this code:
I rely on the vector idx as my XI for interpolation. Vector idx is 1 sample shorter than vector ind (one duplicate was removed).
Therefore my new times will stop one millisecond short. This is a very small issue, and duplicates are unlikely, but I am wondering if anybody can think of a workaround, or of a different way to approach the problem altogether.
Thank you
If I understand you correctly, you want to extrapolate to that extra point.
You can do this in many ways; one is to add that extra point to the interp1 call.
If you have some function you expect your data to follow, you can use it by fitting it to the data and then obtaining that extra point, or with a tool like fnxtr.
But I have a problem understanding what you want because of the way you use the interp1 line. The third argument you pass, [1:1:length(idx)], is just the series [1 2 3 ...]; usually when interpolating one supplies a vector x_i of points of interest, and I doubt your points of interest happen to be the integers 1:length(idx). What you want is just [1:length(idx) xi], where xi is that extra point's x-axis value, as in the sketch below.
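A minimal sketch of that fix, applied to the interpolation line from the question (here the removed duplicate means the missing query point is length(idx)+1):
xi = length(idx) + 1;                   % the one-ms point lost to the duplicate
newz = interp1(clsst, testz(idx), [1:length(idx), xi], 'cubic', 'extrap')';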
EDIT:
Instead of the loop, just build matrix forms of L and testSDT; the matrix operation is somewhat faster at doing the min(abs(...)):
MM = ones(numel(testSDT),1) * L;   % replicate L down the rows
TT = testSDT * ones(1, numel(L));  % replicate testSDT across the columns
[cl, ind] = min(abs(TT - MM));     % column-wise minima: one match per millisecond
nearest = testSDT(ind);