Dynamic Time Warping for geology time series, Matlab - matlab

Background:
Basically I'm using a dynamic time warping algorithm like used in speech recognition to try to warp geological data (filter out noise from environmental conditions) The main difference between these two problems is that dtw prints a warping function that allows both vectors that are input to be warped, whereas for the problem I'm trying to solve I need to keep one reference vector constant while stretching and shrinking the test variable vector to fit.
here is dtw in matlab:
function [Dist,D,k,w]=dtw()
%Dynamic Time Warping Algorithm
%Dist is unnormalized distance between t and r
%D is the accumulated distance matrix
%k is the normalizing factor
%w is the optimal path
%t is the vector you are testing against
%r is the vector you are testing
[t,r,x1,x2]=randomtestdata();
[rows,N]=size(t);
[rows,M]=size(r);
%for n=1:N
% for m=1:M
% d(n,m)=(t(n)-r(m))^2;
% end
%end
d=(repmat(t(:),1,M)-repmat(r(:)',N,1)).^2; %this replaces the nested for loops from above Thanks Georg Schmitz
D=zeros(size(d));
D(1,1)=d(1,1);
for n=2:N
D(n,1)=d(n,1)+D(n-1,1);
end
for m=2:M
D(1,m)=d(1,m)+D(1,m-1);
end
for n=2:N
for m=2:M
D(n,m)=d(n,m)+min([D(n-1,m),D(n-1,m-1),D(n,m-1)]);
end
end
Dist=D(N,M);
n=N;
m=M;
k=1;
w=[];
w(1,:)=[N,M];
while ((n+m)~=2)
if (n-1)==0
m=m-1;
elseif (m-1)==0
n=n-1;
else
[values,number]=min([D(n-1,m),D(n,m-1),D(n-1,m-1)]);
switch number
case 1
n=n-1;
case 2
m=m-1;
case 3
n=n-1;
m=m-1;
end
end
k=k+1;
w=cat(1,w,[n,m]);
end
w=flipud(w)
%w is a matrix that looks like this:
% 1 1
% 1 2
% 2 2
% 3 3
% 3 4
% 3 5
% 4 5
% 5 6
% 6 6
so what this is saying is that the both the first and second points of the second vector should be mapped to the first point of the first vector. i.e. 1 1
1 2
and that the fifth and sixth points on the first vector should be mapped to the second vector at point six. etc. so w contains the x coordinates of the warped data.
Normally I would be able to say
X1=w(:,1);
X2=w(:,2);
for i=1:numel(reference vector)
Y1(i)=reference vector(X1(i));
Y2(i)=test vector(X2(i));
end
but I need not to stretch the reference vector so I need to use the repeats in X1 to know how to shrink Y2 and the repeats in X2 to know how to stretch Y2 rather than using repeats in X1 to stretch Y1 and repeats in X2 to stretch Y2.
I tried using a find method to find the repeats in both X1 and X2 and then average(shrink) or interpolate linearly(stretch) as needed but the code became very complicated and difficult to debug.
Was this really unclear? I had a hard time explaining this problem, but I just need to know how to take w and create a Y2 that is stretched and shrunk accordingly.

First, here's DTW in Matlab translated from the pseudocode on wikipedia:
t = 0:.1:2*pi;
x0 = sin(t) + rand(size(t)) * .1;
x1 = sin(.9*t) + rand(size(t)) * .1;
figure
plot(t, x0, t, x1);
hold on
DTW = zeros(length(x0), length(x1));
DTW(1,:) = inf;
DTW(:,1) = inf;
DTW(1,1) = 0;
for i0 = 2:length(x0)
for i1 = 2:length(x1)
cost = abs(x0(i0) - x1(i1));
DTW(i0, i1) = cost + min( [DTW(i0-1, i1) DTW(i0, i1-1) DTW(i0-1, i1-1)] );
end
end
Whether you are warping x_0 onto x_1, x_1 onto x_0, or warping them onto each other, you can get your answer out of the matrix DTW. In your case:
[cost, path] = min(DTW, [], 2);
plot(t, x1(path));
legend({'x_0', 'x_1', 'x_1 warped to x_0'});

I don't have an answer but I have been playing with the code of #tokkot implemented from the pseudocode in the Wikipedia article. It works, but I think it lacks three requeriments of DTW:
The first and last points of both sequences must be a match, with the use of min(), some (or many) of the first and ending points of one of the sequences are lost.
The output sequence is not monotonically increasing. I have used x1(sort(path)) instead, but I don't believe it is the real minimum distance.
Additionally, for a reason I haven't found yet, some intermediate points of the warped sequences are lost, which I believe is not compatible with DTW.
I'm still searching for an algorithm like DTW in which one of the sequences is fixed (not warped). I need to compare a time series of equally spaced temperature measurements with another sequence. The first one cannot be time shifted, it does not make sense.

Related

how to replace the ode45 method with the runge-kutta in this matlab?

I tried everything and looked everywhere but can't find any solution for my question.
clc
clear all
%% Solving the Ordinary Differential Equation
G = 6.67408e-11; %Gravitational constant
M = 10; %Mass of the fixed object
r = 1; %Distance between the objects
tspan = [0 100000]; %Time Progression from 0 to 100000s
conditions = [1;0]; %y0= 1m apart, v0=0 m/s
F=#(t,y)var_r(y,G,M,r);
[t,y]=ode45(F,tspan,conditions); %ODE solver algorithm
%%part1: Plotting the Graph
% plot(t,y(:,1)); %Plotting the Graph
% xlabel('time (s)')
% ylabel('distance (m)')
%% part2: Animation of Results
plot(0,0,'b.','MarkerSize', 40);
hold on %to keep the first graph
for i=1:length(t)
k = plot(y(i,1),0,'r.','MarkerSize', 12);
pause(0.05);
axis([-1 2 -2 2]) %Defining the Axis
xlabel('X-axis') %X-Axis Label
ylabel('Y-axis') %Y-Axis Label
delete(k)
end
function yd=var_r(y,G,M,r) %function of variable r
g = (G*M)/(r + y(1))^2;
yd = [y(2); -g];
end
this is the code where I'm trying to replace the ode45 with the runge kutta method but its giving me errors. my runge kutta function:
function y = Runge_Kutta(f,x0,xf,y0,h)
n= (xf-x0)/h;
y=zeros(n+1,1);
x=(x0:h:xf);
y(1) = y0;
for i=1:n
k1 = f(x(i),y(i));
k2= f(x(i)+ h/2 , y(i) +h*(k1)/2);
y(i+1) = y(i)+(h*k2);
end
plot(x,y,'-.M')
legend('RKM')
title ('solution of y(x)');
xlabel('x');
ylabel('y(x)')
hold on
end
Before converting your ode45( ) solution to manually written RK scheme, it doesn't even look like your ode45( ) solution is correct. It appears you have a gravitational problem set up where the initial velocity is 0 so a small object will simply fall into a large mass M on a line (rectilinear motion), and that is why you have scalar position and velocity.
Going with this assumption, r is something you should be calculating on the fly, not using as a fixed input to the derivative function. E.g., I would have expected something like this:
F=#(t,y)var_r(y,G,M); % get rid of r
:
function yd=var_r(y,G,M) % function of current position y(1) and velocity y(2)
g = (G*M)/y(1)^2; % gravity accel based on current position
yd = [y(2); -g]; % assumes y(1) is positive, so acceleration is negative
end
The small object must start with a positive initial position for the derivative code to be valid as you have it written. As the small object falls into the large mass M, the above will only hold until it hits the surface or atmosphere of M. Or if you model M as a point mass, then this scheme will become increasingly difficult to integrate correctly because the acceleration becomes large without bound as the small mass gets very close to the point mass M. You would definitely need a variable step size approach in this case. The solution becomes invalid if it goes "through" mass M. In fact, once the speed gets too large the whole setup becomes invalid because of relativistic effects.
Maybe you could explain in more detail if your system is supposed to be set up this way, and what the purpose of the integration is. If it is really supposed to be a 2D or 3D problem, then more states need to be added.
For your manual Runge-Kutta code, you completely forgot to integrate the velocity so this is going to fail miserably. You need to carry a 2-element state from step to step, not a scalar as you are currently doing. E.g., something like this:
y=zeros(2,n+1); % 2-element state as columns of the y variable
x=(x0:h:xf);
y(:,1) = y0; % initial state is the first 2-element column
% change all the scalar y(i) to column y(:,i)
for i=1:n
k1 = f(x(i),y(:,i));
k2= f(x(i)+ h/2 , y(:,i) +h*(k1)/2);
y(:,i+1) = y(:,i)+(h*k2);
end
plot(x,y(1,:),'-.M') % plot the position part of the solution
This is all assuming the f that gets passed in is the same F you have in your original code.
y(1) is the first scalar element in the data structure of y (this counts in column-first order). You want to generate in y a list of column vectors, as your ODE is a system with state dimension 2. Thus you need to generate y with that format, y=zeros(length(x0),n+1); and then address the list entries as matrix columns y(:,1)=x0 and the same modification in every place where you extract or assign a list entry.
Matlab introduce various short-cuts that, if used consequently, lead to contradictions (I think the script-hater rant (german) is still valid in large parts). Essentially, unlike in other systems, Matlab gives direct access to the underlying data structure of matrices. y(k) is the element of the underlying flat array (that is interpreted column-first in Matlab like in Fortran, unlike, e.g., Numpy where it is row-first).
Only the two-index access is to the matrix with its dimensions. So y(:,k) is the k-th matrix column and y(k,:) the k-th matrix row. The single-index access is nice for row or column vectors, but leads immediately to problems when collecting such vectors in lists, as these lists are automatically matrices.

Matlab Convolution regarding the conv() function and length()/size() function

I'm kind've new to Matlab and stack overflow to begin with, so if I do something wrong outside of the guidelines, please don't hesitate to point it out. Thanks!
I have been trying to do convolution between two functions and I have been having a hard time trying to get it to work.
t=0:.01:10;
h=exp(-t);
x=zeros(size(t)); % When I used length(t), I would get an error that says in conv(), A and B must be vectors.
x(1)=2;
x(4)=5;
y=conv(h,x);
figure; subplot(3,1,1);plot(t,x); % The discrete function would not show (at x=1 and x=4)
subplot(3,1,2);plot(t,h);
subplot(3,1,3);plot(t,y(1:length(t))); %Nothing is plotted here when ran
I commented my issues with the code. I don't understand the difference of length and size in this case and how it would make a difference.
For the second comment, x=1 should have an amplitude of 2. While x=4 should have an amplitude of 5. When plotted, it only shows nothing in the locations specified but looks jumbled up at x=0. I'm assuming that's the reason why the convoluted plot won't be displayed.
The original problem statement is given if it helps to understand what I was thinking throughout.
Consider an input signal x(t) that consists of two delta functions at t = 1 and t = 4 with amplitudes A1 = 5 and A2 = 2, respectively, to a linear system with impulse response h that is an exponential pulse (h(t) = e ^−t ). Plot x(t), h(t) and the output of the linear system y(t) for t in the range of 0 to 10 using increments of 0.01. Use the MATLAB built-in function conv.
The initial question regarding size vs length
length yields a scalar that is equal to the largest dimension of the input. In the case of your array, the size is 1 x N, so length yields N.
size(t)
% 1 1001
length(t)
% 1001
If you pass a scalar (N) to ones, zeros, or a similar function, it will create a square matrix that is N x N. This results in the error that you see when using conv since conv does not accept matrix inputs.
size(ones(length(t)))
% 1001 1001
When you pass a vector to ones or zeros, the output will be that size so since size returns a vector (as shown above), the output is the same size (and a vector) so conv does not have any issues
size(ones(size(t)))
% 1 1001
If you want a vector, you need to explicitly specify the number of rows and columns. Also, in my opinion, it's better to use numel to the number of elements in a vector as it's less ambiguous than length
z = zeros(1, numel(t));
The second question regarding the convolution output:
First of all, the impulses that you create are at the first and fourth index of x and not at the locations where t = 1 and t = 4. Since you create t using a spacing of 0.01, t(1) actually corresponds to t = 0 and t(4) corresponds to t = 0.03
You instead want to use the value of t to specify where to put your impulses
x(t == 1) = 2;
x(t == 4) = 5;
Note that due to floating point errors, you may not have exactly t == 1 and t == 4 so you can use a small epsilon instead
x(abs(t - 1) < eps) = 2;
x(abs(t - 4) < eps) = 5;
Once we make this change, we get the expected scaled and shifted versions of the input function.

Get Matrix of minimum coordinate distance to point set

I have a set of points or coordinates like {(3,3), (3,4), (4,5), ...} and want to build a matrix with the minimum distance to this point set. Let me illustrate using a runnable example:
width = 10;
height = 10;
% Get min distance to those points
pts = [3 3; 3 4; 3 5; 2 4];
sumSPts = length(pts);
% Helper to determine element coordinates
[cols, rows] = meshgrid(1:width, 1:height);
PtCoords = cat(3, rows, cols);
AllDistances = zeros(height, width,sumSPts);
% To get Roh_I of evry pt
for k = 1:sumSPts
% Get coordinates of current Scribble Point
currPt = pts(k,:);
% Get Row and Col diffs
RowDiff = PtCoords(:,:,1) - currPt(1);
ColDiff = PtCoords(:,:,2) - currPt(2);
AllDistances(:,:,k) = sqrt(RowDiff.^2 + ColDiff.^2);
end
MinDistances = min(AllDistances, [], 3);
This code runs perfectly fine but I have to deal with matrix sizes of about 700 milion entries (height = 700, width = 500, sumSPts = 2k) and this slows down the calculation. Is there a better algorithm to speed things up?
As stated in the comments, you don't necessary have to put everything into a huge matrix and deal with gigantic matrices. You can :
1. Slice the pts matrix into reasonably small slices (say of length 100)
2. Loop on the slices and calculate the Mindistances slice over these points
3. Take the global min
tic
Mindistances=[];
width = 500;
height = 700;
Np=2000;
pts = [randi(width,Np,1) randi(height,Np,1)];
SliceSize=100;
[Xcoords,Ycoords]=meshgrid(1:width,1:height);
% Compute the minima for the slices from 1 to floor(Np/SliceSize)
for i=1:floor(Np/SliceSize)
% Calculate indexes of the next slice
SliceIndexes=((i-1)*SliceSize+1):i*SliceSize
% Get the corresponding points and reshape them to a vector along the 3rd dim.
Xpts=reshape(pts(SliceIndexes,1),1,1,[]);
Ypts=reshape(pts(SliceIndexes,2),1,1,[]);
% Do all the diffs between your coordinates and your points using bsxfun singleton expansion
Xdiffs=bsxfun(#minus,Xcoords,Xpts);
Ydiffs=bsxfun(#minus,Ycoords,Ypts);
% Calculate all the distances of the slice in one call
Alldistances=bsxfun(#hypot,Xdiffs,Ydiffs);
% Concatenate the mindistances
Mindistances=cat(3,Mindistances,min(Alldistances,[],3));
end
% Check if last slice needed
if mod(Np,SliceSize)~=0
% Get the corresponding points and reshape them to a vector along the 3rd dim.
Xpts=reshape(pts(floor(Np/SliceSize)*SliceSize+1:end,1),1,1,[]);
Ypts=reshape(pts(floor(Np/SliceSize)*SliceSize+1:end,2),1,1,[]);
% Do all the diffs between your coordinates and your points using bsxfun singleton expansion
Xdiffs=bsxfun(#minus,Xcoords,Xpts);
Ydiffs=bsxfun(#minus,Ycoords,Ypts);
% Calculate all the distances of the slice in one call
Alldistances=bsxfun(#hypot,Xdiffs,Ydiffs);
% Concatenate the mindistances
Mindistances=cat(3,Mindistances,min(Alldistances,[],3));
end
% Get global minimum
Mindistances=min(Mindistances,[],3);
toc
Elapsed time is 9.830051 seconds.
Note :
You'll not end up doing less calculations. But It will be a lot less intensive for your memory (700M doubles takes 45Go in memory), thus speeding up the process (With the help of vectorizing aswell)
About bsxfun singleton expansion
One of the great strength of bsxfun is that you don't have to feed it matrices whose values are along the same dimensions.
For example :
Say I have two vectors X and Y defined as :
X=[1 2]; % row vector X
Y=[1;2]; % Column vector Y
And that I want a 2x2 matrix Z built as Z(i,j)=X(i)+Y(j) for 1<=i<=2 and 1<=j<=2.
Suppose you don't know about the existence of meshgrid (The example is a bit too simple), then you'll have to do :
Xs=repmat(X,2,1);
Ys=repmat(Y,1,2);
Z=Xs+Ys;
While with bsxfun you can just do :
Z=bsxfun(#plus,X,Y);
To calculate the value of Z(2,2) for example, bsxfun will automatically fetch the second value of X and Y and compute. This has the advantage of saving a lot of memory space (No need to define Xs and Ys in this example) and being faster with big matrices.
Bsxfun Vs Repmat
If you're interested with comparing the computational time between bsxfun and repmat, here are two excellent (word is not even strong enough) SO posts by Divakar :
Comparing BSXFUN and REPMAT
BSXFUN on memory efficiency with relational operations

Finding values of x for a given y when approaching a limit

I'm trying to find two x values for each y value on a plot that is very similar to a Gaussian fn. The difficulty is that I need to be able to find the values of x for several values of y even when the gaussian fn is very close to zero.
I can't post an image due to being a new user, however think of a gaussian function and then the regions where it is close to zero on either side of the peak. This part where the fn is very close to reaching zero is where I need to find the x values for a given y.
What I've tried:
When the fn is discrete: I have tried interp1, however I get the error that it is not strictly monotonic increasing because of the many values that are close to zero.
When I fit a two-term gaussian:
I use fzero (fzero(function-yvalue)) however I get a lot of NaN's. These might be from me not having a close enough 'guess' value??
Does anyone have any other suggestions for me to try? Or how to improve what I've already attempted?
Thanks everyone
EDIT:
I've added a picture below. The data that I actually have is the blue line, while the fitted eqn is in red. The eqn should be accurate enough.
Again, I'm trying to pick out x values for a given y where y is very small (approaching 0).
I've tried splitting the function into left and right halves for the interpolation and fzero method.
Thanks for your responses anyway, I'll have a look at bisection.
Fitting a Gaussian seems to be uneffective, as its deviation (in the x-coordinate) from the real data is noticeable.
Since your data is already presented as a numeric vector y, the straightforward find(y<y0) seems adequate. Here is a sample code, in which the y-values are produced from a perturbed Gaussian.
x = 0:1:700;
y = 2000*exp(-((x-200)/50).^2 - sin(x/100).^2); % imitated data
plot(x,y)
y0 = 1e-2; % the y-value to look for
i = min(find(y>y0)); % first entry above y0
if i == 1
x1 = x(i);
else
x1 = x(i) - y(i)*(x(i)-x(i-1))/(y(i)-y(i-1)); % linear interpolation
end
i = max(find(y>y0)); % last entry above y0
if i == numel(y)
x2 = x(i);
else
x2 = x(i) - y(i)*(x(i)-x(i+1))/(y(i)-y(i+1)); % linear interpolation
end
fprintf('Roots: %g, %g \n', x1, x2)
Output: Roots: 18.0659, 379.306
The curve looks much like your plot.

simulating a random walk in matlab

i have variable x that undergoes a random walk according to the following rules:
x(t+1)=x(t)-1; probability p=0.3
x(t+1)=x(t)-2; probability q=0.2
x(t+1)=x(t)+1; probability p=0.5
a) i have to create this variable initialized at zero and write a for loop for 100 steps and that runs 10000 times storing each final value in xfinal
b) i have to plot a probability distribution of xfinal (a histogram) choosing a bin size and normalization!!* i have to report the mean and variance of xfinal
c) i have to recreate the distribution by application of the central limit theorem and plot the probability distribution on the same plot!
help would be appreciated in telling me how to choose the bin size and normalize the histogram and how to attempt part c)
your help is much appreciated!!
p=0.3;
q=0.2;
s=0.5;
numberOfSteps = 100;
maxCount = 10000;
for count=1:maxCount
x=0;
for i = 1:numberOfSteps
random = rand(1, 1);
if random <=p
x=x-1;
elseif random<=(p+q)
x=x-2;
else
x=x+1;
end
end
xfinal(count) = x;
end
[f,x]=hist(xfinal,30);
figure(1)
bar(x,f/sum(f));
xlabel('xfinal')
ylabel('frequency')
mean = mean(xfinal)
variance = var(xfinal)
For the first question, check the help for hist on mathworks homepage
[nelements,centers] = hist(data,nbins);
You do not select the bin size, but the number of bins. nelements gives the elements per bin and center is all the bin centers. So to say, it would be the same to call
hist(data,nbins);
as
[nelements,centers] = hist(data,nbins);
plot(centers,nelements);
except that the representation is different (line or pile). To normalize, simply divide nelements with sum(nelements)
For c, here i.i.d. variables it actually is a difference if the variables are real or complex. However for real variables the central limit theorem in short tells you that for a large number of samples the distribution will limit the normal distribution. So if the samples are real, you simply asssumes a normal distribution, calculates the mean and variance and plots this as a normal distribution. If the variables are complex, then each of the variables will be normally distributed which means that you will have a rayleigh distribution instead.
Mathworks is deprecating hist that is being replaced with histogram.
more details in this link
You are not applying the PDF function as expected, the expression Y doesn't work
For instance Y does not have the right X-axis start stop points. And you are using x as input to Y while x already used as pivot inside the double for loop.
When I ran your code Y generates a single value, it is not a vector but just a scalar.
This
bar(x,f/sum(f));
bringing down all input values with sum(f) division? no need.
On attempting to overlap the ideal probability density function, often one has to do additional scaling, to have both real and ideal visually overlapped.
MATLAB can do the scaling for us, and no need to modify input data /sum(f).
With a dual plot using yyaxis
You also mixed variance and standard deviation.
Instead try something like this
y2=1 / sqrt(2*pi*var1)*exp(-(x2-m1).^2 / (2*var1))
ok, the following solves your question(s)
codehere
clear all;
close all;
clc
p=0.3; % thresholds
q=0.2;
s=0.5;
n_step=100;
max_cnt=10000;
n_bin=30; % histogram amount bins
xf=zeros(1,max_cnt);
for cnt=1:max_cnt % runs loop
x=0;
for i = 1:n_step % steps loop
t_rand1 = rand(1, 1);
if t_rand1 <=p
x=x-1;
elseif t_rand1<=(p+q)
x=x-2;
else
x=x+1;
end
end
xf(cnt) = x;
end
% [f,x]=hist(xf,n_bin);
hf1=figure(1)
ax1=gca
yyaxis left
hp1=histogram(xf,n_bin);
% bar(x,f/sum(f));
grid on
xlabel('xf')
ylabel('frequency')
m1 = mean(xf)
var1 = var(xf)
s1=var1^.5 % sigma
%applying central limit theorem %finding the mean
n_x2=1e3 % just enough points
min_x2=min(hp1.BinEdges)
max_x2=max(hp1.BinEdges)
% quite same as
min_x2=hp1.BinLimits(1)
max_x2=hp1.BinLimits(2)
x2=linspace(min_x2,max_x2,n_x2)
y2=1/sqrt(2*pi*var1)*exp(-(x2-m1).^2/(2*var1));
% hold(ax1,'on')
yyaxis right
plot(ax1,x2,y2,'r','LineWidth',2)
.
.
.
note I have not used these lines
% Xp=-1; Xq=-2; Xs=1; mu=Xp.*p+Xq.*q+Xs.*s;
% muN=n_step.*mu;
%
% sigma=(Xp).^2.*p+(Xq).^2.*q+(Xs).^2.s; % variance
% sigmaN=n_step.(sigma-(mu).^2);
People ususally call sigma to variance^.5
This supplied script is a good start point to now take it to wherever you need it to go.