I have two sets of data taken from experiments, and they look very similar, except there is a horizontal offset between them, which I believe is due to some bugs in the instrument setting. Suppose they have the form y1=f(x1) and y2=f(x2)= f(x1+c), what's the best way to determine c so that I can take into account the offset to superimpose two data sets to become one data set?
Edit:
let's say my data sets (index 1 and 2) have the form:
x1 = 0:0.2:10;
y1 = sin(x1)
x2 = 0:0.3:10;
y2 = sin(x2+0.5)
Of course, the real data will have some noise, but say the best fit functions have the above forms. How do I find the offset c=0.5? I have looked into the cross-correlation, but I'm not sure if they can handle two data sets with different number of data (and different step sizes). Also, what if the offset values actually fall between two data points? Cross-correlation only returns the index of the data in the array, not something in between if I understand correctly.
This Matlab script calculates the random offset from -pi/2 to +pi/2 between two sine waves:
clear;
C= pi*(rand-0.5); % should be between -pi/2 and +pi/2
N=200; % should be large enough for acceptable sampling rate
N1=20; % fraction for Ts1
N2=30; % fraction for Ts2
Ts1=abs(C*N1/N); % fraction of C for accuracy
Ts2=abs(C*N2/N); % fraction of C for accuracy
Ts=min(Ts1,Ts2); % select highest sampling rate (smaller period)
fs=1/Ts;
P=4; % number of periods should be large enough for well defined correlation plot
x1 = 0:Ts:P*2*pi;
y1 = sin(x1);
x2 = 0:Ts:P*2*pi;
y2 = sin(x2+C);
subplot(3,1,1)
plot(x1,y1);
subplot(3,1,2);
plot(x2,y2);
[cor,lag]=xcorr(y1,y2);
subplot(3,1,3);
plot(lag,cor);
[~,I] = max(abs(cor));
lagdiff = lag(I);
timediff=lagdiff/fs;
In the particular case below, C = timediff = 0.5615:
write a function which takes the time shift as an input and calculates rms between overlapping portions of the two data sets. Then find the minimum of this function using optimization (fminbnd)
Related
I want to determine how well the estimated model fits to the future new data. To do this, prediction error plot is often used. Basically, I want to compare the measured output and the model output. I am using the Least Mean Square algorithm as the equalization technique. Can somebody please help what is the proper way to plot the comparison between the model and the measured data? If the estimates are close to true, then the curves should be very close to each other. Below is the code. u is the input to the equalizer, x is the noisy received signal, y is the output of the equalizer, w is the equalizer weights. Should the graph be plotted using x and y*w? But x is noisy. I am confused since the measured output x is noisy and the model output y*w is noise-free.
%% Channel and noise level
h = [0.9 0.3 -0.1]; % Channel
SNRr = 10; % Noise Level
%% Input/Output data
N = 1000; % Number of samples
Bits = 2; % Number of bits for modulation (2-bit for Binary modulation)
data = randi([0 1],1,N); % Random signal
d = real(pskmod(data,Bits)); % BPSK Modulated signal (desired/output)
r = filter(h,1,d); % Signal after passing through channel
x = awgn(r, SNRr); % Noisy Signal after channel (given/input)
%% LMS parameters
epoch = 10; % Number of epochs (training repetation)
eta = 1e-3; % Learning rate / step size
order=10; % Order of the equalizer
U = zeros(1,order); % Input frame
W = zeros(1,order); % Initial Weigths
%% Algorithm
for k = 1 : epoch
for n = 1 : N
U(1,2:end) = U(1,1:end-1); % Sliding window
U(1,1) = x(n); % Present Input
y = (W)*U'; % Calculating output of LMS
e = d(n) - y; % Instantaneous error
W = W + eta * e * U ; % Weight update rule of LMS
J(k,n) = e * e'; % Instantaneous square error
end
end
Lets start step by step:
First of all when using some fitting method it is a good practice to use RMS error . To get this we have to find error between input and output. As I understood x is an input for our model and y is an output. Furthermore you already calculated error between them. But you used it in loop without saving. Lets modify your code:
%% Algorithm
for k = 1 : epoch
for n = 1 : N
U(1,2:end) = U(1,1:end-1); % Sliding window
U(1,1) = x(n); % Present Input
y(n) = (W)*U'; % Calculating output of LMS
e(n) = x(n) - y(n); % Instantaneous error
W = W + eta * e(n) * U ; % Weight update rule of LMS
J(k,n) = e(n) * (e(n))'; % Instantaneous square error
end
end
Now e consists of errors at the last epoch. So we can use something like this:
rms(e)
Also I'd like to compare results using mean error and standard deviation:
mean(e)
std(e)
And some visualization:
histogram(e)
Second moment: we can't use compare function just for vectors! You can use it for dynamic system models. For it you have to made some workaround about using this method as dynamic model. But we can use some functions as goodnessOfFit for example. If you want something like error at each step that consider all previous points of data then make some math workaround - calculate it at each point using [1:currentNumber].
About using LMS method. There are built-in function calculating LMS. Lets try to use it for your data sets:
alg = lms(0.001);
eqobj = lineareq(10,alg);
y1 = equalize(eqobj,x);
And lets see at the result:
plot(x)
hold on
plot(y1)
There are a lot of examples of such implementation of this function: look here for example.
I hope this was helpful for you!
Comparison of the model output vs observed data is known as residual.
The difference between the observed value of the dependent variable
(y) and the predicted value (ŷ) is called the residual (e). Each data
point has one residual.
Residual = Observed value - Predicted value
e = y - ŷ
Both the sum and the mean of the residuals are equal to zero. That is,
Σ e = 0 and e = 0.
A residual plot is a graph that shows the residuals on the vertical
axis and the independent variable on the horizontal axis. If the
points in a residual plot are randomly dispersed around the horizontal
axis, a linear regression model is appropriate for the data;
otherwise, a non-linear model is more appropriate.
Here is an example of residual plots from a model of mine. On the vertical axis is the difference between the output of the model and the measured value. On the horizontal axis is one of the independent variables used in the model.
We can see that most of the residuals are within 0.2 units which happens to be my tolerance for this model. I can therefore make a conclusion as to the worth of the model.
See here for a similar question.
Regarding you question about the lack of noise in your models output. We are creating a linear model. There's the clue.
I am trying to compare the FFT of exp(-t^2) to the function's analytical fourier transform, exp(-(w^2)/4)/sqrt(2), over the frequency range -3 to 3.
I have written the following matlab code and have iterated on it MANY times now with no success.
fs = 100; %sampling frequency
dt = 1/fs;
t = 0:dt:10-dt; %time vector
L = length(t); %number of sample points
%N = 2^nextpow2(L); %necessary?
y = exp(-(t.^2));
Y=dt*ifftshift(abs(fft(y)));
freq = (-L/2:L/2-1)*fs/L; %freq vector
F = (exp(-(freq.^2)/4))/sqrt(2); %analytical solution
%Y_valid_pts = Y(W>=-3 & W<=3); %compare for freq = -3 to 3
%npts = length(Y_valid_pts);
% w = linspace(-3,3,npts);
% Fe = (exp(-(w.^2)/4))/sqrt(2);
error = norm(Y - F) %L2 Norm for error
hold on;
plot(freq,Y,'r');
plot(freq,F,'b');
xlabel('Frequency, w');
legend('numerical','analytic');
hold off;
You can see that right now, I am simply trying to get the two plots to look similar. Eventually, I would like to find a way to do two things:
1) find the minimum sampling rate,
2) find the minimum number of samples,
to reach an error (defined as the L2 norm of the difference between the two solutions) of 10^-4.
I feel that this is pretty simple, but I can't seem to even get the two graphs visually agree.
If someone could let me know where I'm going wrong and how I can tackle the two points above (minimum sampling frequency and minimum number of samples) I would be very appreciative.
Thanks
A first thing to note is that the Fourier transform pair for the function exp(-t^2) over the +/- infinity range, as can be derived from tables of Fourier transforms is actually:
Finally, as you are generating the function exp(-t^2), you are limiting the range of t to positive values (instead of taking the whole +/- infinity range).
For the relationship to hold, you would thus have to generate exp(-t^2) with something such as:
t = 0:dt:10-dt; %time vector
t = t - 0.5*max(t); %center around t=0
y = exp(-(t.^2));
Then, the variable w represents angular frequency in radians which is related to the normalized frequency freq through:
w = 2*pi*freq;
Thus,
F = (exp(-((2*pi*freq).^2)/4))*sqrt(pi); %analytical solution
My goal is to find the maximum values of wave heights and wave lengths.
dwcL01 though dwcL10 is arrays of <3001x2 double> with output from a numerical wave model.
Part of my script:
%% Plotting results from SWASH
% Examination of phase velocity on deep water with different number of layers
% Wave height 3 meters, wave peroid 8 sec on a depth of 30 meters
clear all; close all; clc;
T=8;
L0=1.56*T^2;
%% Loading results tabels.
load dwcL01.tbl; load dwcL02.tbl; load dwcL03.tbl; load dwcL04.tbl;
load dwcL05.tbl; load dwcL06.tbl; load dwcL07.tbl; load dwcL08.tbl;
load dwcL09.tbl; load dwcL10.tbl;
M(:,:,1) = dwcL01; M(:,:,2) = dwcL02; M(:,:,3) = dwcL03; M(:,:,4) = dwcL04;
M(:,:,5) = dwcL05; M(:,:,6) = dwcL06; M(:,:,7) = dwcL07; M(:,:,8) = dwcL08;
M(:,:,9) = dwcL09; M(:,:,10) = dwcL10;
%% Finding position of wave crest using diff and sign.
for i=1:10
Tp(:,1,i) = diff(sign(diff([M(1,2,i);M(:,2,i)]))) < 0;
Wc(:,:,i) = M(Tp,:,i);
L(:,i) = diff(Wc(:,1,i))
end
This works fine for finding the maximum values, if the data is "smooth". The following image shows a section of my data. I get all peaks, when I only need the one around x = 40. How do I filter so I only get the "real" wave crests. The solution needs to be general so that it still works if I change the domain size, wave height or wave period.
If you're basically trying to fit this curve of data to a sine wave, have you considered performing Fourier analysis (FFT in Matlab), then checking the magnitude of that fundamental frequency? The frequency will tell you the wave spacing, and the magnitude the height, and when used over multiple periods will find an average.
See the Matlab help page for an example of the usage
but the basic gist is:
y = [...] %vector of wave data points
N=length(y); %Make sure this is an even number
Y = fft(y); %Convert into frequency domain
figure;
plot(y(1:N)); %Plot original wave data
figure;
plot(abs(Y(1:N/2))./N); %Plot only the lower half of frequencies to hide aliasing
I have one more solution that might work for you. It involves computing the 2nd-order derivative using a 5-point central difference instead of the 2-point finite differences. When using diff twice you are performing two first-order derivatives consecutively (finite 2-point differences) which are very susceptible to noise/oscillations. The advantage of using a higher-order approximation is that the neighboring points help filter out the small oscillations, and this may work for your case.
Let f(:) = squeeze(M(:,2,i)) be the array of data points, and h is the uniform spacing distance between the points:
%Better approximation of the 2nd derivative using neighboring points:
for j=3:length(f)-2
Tp(j,i) = (-f(j-2) + 16*f(j-1) - 30*f(j) + 16*f(j+1) - f(j+2))/(12*h^2);
end
Note that since this 2nd-order derivative requires the 2 neighboring points to the left and right, that the range of the loop must start at the 3rd index and end 2 short of the array length.
I'll try to be more specific: I have several time histories of a signal which have pretty much all the same behaviour (sine waves) but all start at a different point in time. How do I automatically detect the initial time lag and delete it such that all sine waves start at the same instant in time?
If you have two signals, x and y, each being a n x 1 matrix where y is a shifted version of x:
[c,lags] = xcorr(x,y); % c is the correlation, should have a clear peak
s = lags(c==max(c)); % s is the shift you need
y2 = circshift(y,s); % y2 should now overlap x
(Demo purposes only - I don't suggest you circshift your actual data). The shift you are looking for in this case should ideally be relatively small compared to the length of x and y. A lot depends on the noise level and the nature of the offset.
The following works pretty well under low noise conditions and fast sampling and may do depending on your requirements for accuracy. It uses a simple threshold and thus is subject to inaccuracy when things get noisy. Adjust thresh to a low value above the noise.
Nwav = 3;
Np = 100;
tmax = 50;
A = 1000;
Nz = Np/5;
%%%%%%%%%%%%%%
thresh = A/50;
%%%%%%%%%%%%%%
% generate some waveforms
t = [0:tmax/(Np-1):tmax].';
w = rand(1,Nwav);
offs = round(rand(1,Nwav)*100);
sig = [A*sin(t(1:end-Nz)*w) ; zeros(Nz,Nwav)] + randn(Np,Nwav);
for ii=1:Nwav
sig(:,ii) = circshift(sig(:,ii),round(rand()*Nz));
end
figure, plot(t,sig)
hold on, plot(t,repmat(thresh,length(t),1),'k--')
% use the threshold and align the waveforms
for ii=1:Nwav
[ir ic] = find(sig(:,ii)>thresh,1)
sig(:,ii) = circshift(sig(:,ii),-ir+1);
end
figure, plot(t,sig)
hold on, plot(t,repmat(thresh,length(t),1),'k--')
There is room for improvement (noise filtering, slope detection) but this should get you started.
I also recommend you look into waveform processing toolboxes, in matlab central for instance.
I have a following stochastic model describing evolution of a process (Y) in space and time. Ds and Dt are domain in space (2D with x and y axes) and time (1D with t axis). This model is usually known as mixed-effects model or components-of-variation models
I am currently developing Y as follow:
%# Time parameters
T=1:1:20; % input
nT=numel(T);
%# Grid and model parameters
nRow=100;
nCol=100;
[Grid.Nx,Grid.Ny,Grid.Nt] = meshgrid(1:1:nCol,1:1:nRow,T);
xPower=0.1;
tPower=1;
noisePower=1;
detConstant=1;
deterministic_mu = detConstant.*(((Grid.Nt).^tPower)./((Grid.Nx).^xPower));
beta_s = randn(nRow,nCol); % mean-zero random effect representing location specific variability common to all times
gammaTemp = randn(nT,1);
for t = 1:nT
gamma_t(:,:,t) = repmat(gammaTemp(t),nRow,nCol); % mean-zero random effect representing time specific variability common to all locations
end
var=0.1;% noise has variance = 0.1
for t=1:nT
kappa_st(:,:,t) = sqrt(var)*randn(nRow,nCol);
end
for t=1:nT
Y(:,:,t) = deterministic_mu(:,:,t) + beta_s + gamma_t(:,:,t) + kappa_st(:,:,t);
end
My questions are:
How to produce delta in the expression for Y and the difference in kappa and delta?
Help explain, through some illustration using Matlab, if I am correctly producing Y?
Please let me know if you need some more information/explanation. Thanks.
First, I rewrote your code to make it a bit more efficient. I see you generate linearly-spaced grids for x,y and t and carry out the computation for all points in this grid. This approach has severe limitations on the maximum attainable grid resolution, since the 3D grid (and all variables defined with it) can consume an awfully large amount of memory if the resolution goes up. If the model you're implementing will grow in complexity and size (it often does), I'd suggest you throw this all into a function accepting matrix/vector inputs for s and t, which will be a bit more flexible in this regard -- processing "blocks" of data that will otherwise not fit in memory will be a lot easier that way.
Then, I generated the the delta_st term with rand instead of randn since the noise should be "white". Now I'm very unsure about that last one, and I didn't have time to read through the paper you linked to -- can you tell me on what pages I can find relevant the sections for the delta_st?
Now, the code:
%# Time parameters
T = 1:1:20; % input
nT = numel(T);
%# Grid and model parameters
nRow = 100;
nCol = 100;
% noise has variance = 0.1
var = 0.1;
xPower = 0.1;
tPower = 1;
noisePower = 1;
detConstant = 1;
[Grid.Nx,Grid.Ny,Grid.Nt] = meshgrid(1:nCol,1:nRow,T);
% deterministic mean
deterministic_mu = detConstant .* Grid.Nt.^tPower ./ Grid.Nx.^xPower;
% mean-zero random effect representing location specific
% variability common to all times
beta_s = repmat(randn(nRow,nCol), [1 1 nT]);
% mean-zero random effect representing time specific
% variability common to all locations
gamma_t = bsxfun(#times, ones(nRow,nCol,nT), randn(1, 1, nT));
% mean zero random effect capturing the spatio-temporal
% interaction not found in the larger-scale deterministic mu
kappa_st = sqrt(var)*randn(nRow,nCol,nT);
% mean zero random effect representing the micro-scale
% spatio-temporal variability that is modelled by white
% noise (i.i.d. at different time steps) in Ds·Dt
delta_st = noisePower * (rand(nRow,nCol,nT)-0.5);
% Final result:
Y = deterministic_mu + beta_s + gamma_t + kappa_st + delta_st;
Your implementation samples beta, gamma and kappa as if they are white (e.g. their values at each (x,y,t) are independent). The descriptions of the terms suggest that this is not meant to be the case. It looks like delta is supposed to capture the white noise, while the other terms capture the correlations over their respective domains. e.g. there is a non-zero correlation between gamma(t_1) and gamma(t_1+1).
If you wish to model gamma as a stationary Gaussian Markov process with variance var_g and correlation cor_g between gamma(t) and gamma(t+1), you can use something like
gamma_t = nan( nT, 1 );
gamma_t(1) = sqrt(var_g)*randn();
K_g = cor_g/var_g;
K_w = sqrt( (1-K_g^2)*var_g );
for t = 2:nT,
gamma_t(t) = K_g*gamma_t(t-1) + K_w*randn();
end
gamma_t = reshape( gamma_t, [ 1 1 nT ] );
The formulas I've used for gains K_g and K_w in the above code (and the initialization of gamma_t(1)) produce the desired stationary variance \sigma^2_0 and one-step covariance \sigma^2_1:
Note that the implementation above assumes that later you will sum the terms using bsxfun to do the "repmat" for you:
Y = bsxfun( #plus, deterministic_mu + kappa_st + delta_st, beta_s );
Y = bsxfun( #plus, Y, gamma_t );
Note that I haven't tested the above code, so you should confirm with sampling that it does actually produce a zero noise process of the specified variance and covariance between adjacent samples. To sample beta the same procedure can be extended into two dimensions, but the principles are essentially the same. I suspect kappa should be similarly modeled as a Markov Gaussian Process, but in all three dimensions and with a lower variance to represent higher-order effects not captured in mu, beta and gamma.
Delta is supposed to be zero mean stationary white noise. Assuming it to be Gaussian with variance noisePower one would sample it using
delta_st = sqrt(noisePower)*randn( [ nRows nCols nT ] );