Detect steps in a Piecewise constant signal - matlab

I have a piecewise constant signal shown below. I want to detect the location of step transition (Marked in red).
My current approach:
Smooth signal using moving average filter (http://www.mathworks.com/help/signal/examples/signal-smoothing.html)
Perform Discrete Wavelet transform to get discontinuities
Locate the discontinuities to get the location of step transition
I am currently implementing the last step of detecting the discontinuities. However, I cannot get the precise location and end with many false detection.
My question:
Is this the correct approach?
If yes, can someone shed some info/ algorithm to use for the last step?
Please suggest an alternate/ better approach.
Thanks

Convolve your signal with a 1st derivative of a Gaussian to find the step positions, similar to a Canny edge detection in 1-D. You can do that in a multi-scale approach, starting from a "large" sigma (say ~10 pixels) detect local maxima, then to a smaller sigma (~2 pixels) to converge on the right pixels where the steps are.
You can see an implementation of this approach here.

If your function is really piecewise constant, why not use just abs of diff compared to a threshold?
th = 0.1;
x_steps = x(abs(diff(y)) > th)
where x a vector with your x-axis values, y is your y-axis data, and th is a threshold.
Example:
>> x = [2 3 4 5 6 7 8 9];
>> y = [1 1 1 2 2 2 3 3];
>> th = 0.1;
>> x_steps = x(abs(diff(y)) > th)
x_steps =
4 7

Regarding your point 3: (Please suggest an alternate/ better approach)
I suggest to use a Potts "filter". This is a variational approach to get an accurate estimation of your piecewise constant signal (similar to the total variation minimization). It can be interpreted as adaptive median filtering. Given the Potts estimate u, the jump points are the points of non-zero gradient of u, that is, diff(u) ~= 0. (There are free Matlab implementations of the Potts filters on the web)
See also http://en.wikipedia.org/wiki/Step_detection

Total Variation Denoising can produce a piecewise constant signal. Then, as pointed out above, "abs of diff compared to a threshold" returns the position of the transitions.
There exist very efficient algorithms for TVDN that process millions of data points within milliseconds:
http://www.gipsa-lab.grenoble-inp.fr/~laurent.condat/download/condat_fast_tv.c
Here's an implementation of a variational approach with python and matlab interface that also uses TVDN:
https://github.com/qubit-ulm/ebs

I think, smoothing with a sharper lowpass filter should work better.
Try to use medfilt1() (a median filter) instead, since you have very concrete levels. If you know how long your plateau is, you can take half/quarter of the plateau length for example. Then you would get very sharp edges. The sharp edges should be detectable using a Haar wavelet or even just using simple differentiation.

Related

How to use the FFT (Fast Fourier Transform) in Matlab

I'm just learning Matlab and the fast fourier transform algorithm.
As a first step I tried to duplicate this example: https://en.wikipedia.org/wiki/Fourier_transform#Example
I use the following code:
t = -6:0.01:6;
s = cos(2 * pi * 3 * t) .* exp(-pi * t.^2);
figure(1);
plot(t, s);
xlim([-2 2]);
r = fft(s);
figure(2);
plot(t, abs(r));
And I obtained the following picture:
Figure 2:
Figure 1 is OK, but Figure 2 is not. I see one of the problem is that in Figure 2 I should plot vector r against frequency, not against time. Another problem in Figure 2 is the scale in the Y-axis.
Thus, I have 2 questions in order to duplicate the example:
How can I obtain the frequency domain (X-axis in Figure 2)?
How should I scale vector r (Y-axis in Figure 2)?
Your issue is that you aren't actually creating a frequency vector to plot the fft against. The reason that the fft is plotted against time is because that is what you specified in your plot command.
Here is a working fft outline:
N=length(t);
index=0:N-1;
FrequencyResolution=SamplingRate/N;
Frequency=index.*FrequencyResolution;
data_fft=fft(detrend(data));
%the detrend isn't necessary but it does look nicer because it focuses the plot on changes around the mean of the data
data_FFTmagnitude=abs(data_fft);
plot(Frequency, data_FFTmagnitude)
I remember once for the first time that I wanted to use DFT and FFT for one of my study projects I used this webpage, it explains in detail with examples on how to do so. I suggest you go through it and try to replicate for your case, doing so will give you insight and better understanding of the way one can use FFt as you said you are new to Matlab. Do not hesitate to ask again if you need more detailed help.
And also keep in mind that for FFT it is better to have signal length of a power of 2, that way you will get the most exact results, and if you cannot control your signal length you can take the largest power of 2 close to that length, as everyone usually does.

How to compute distance and estimate quality of heterogeneous grids in Matlab?

I want to evaluate the grid quality where all coordinates differ in the real case.
Signal is of a ECG signal where average life-time is 75 years.
My task is to evaluate its age at the moment of measurement, which is an inverse problem.
I think 2D approximation of the 3D case is hard (done here by Abo-Zahhad) with with 3-leads (2 on chest and one at left leg - MIT-BIT arrhythmia database):
where f is a piecewise continuous function in R^2, \epsilon is the error matrix and A is a 2D matrix.
Now, I evaluate the average grid distance in x-axis (time) and average grid distance in y-axis (energy).
I think this can be done by Matlab's Image Analysis toolbox.
However, I am not sure how complete the toolbox's approaches are.
I think a transform approach must be used in the setting of uneven and noncontinuous grids. One approach is exact linear time euclidean distance transforms of grid line sampled shapes by Joakim Lindblad et all.
The method presents a distance transform (DT) which assigns to each image point its smallest distance to a selected subset of image points.
This kind of approach is often a basis of algorithms for many methods in image analysis.
I tested unsuccessfully the case with bwdist (Distance transform of binary image) with chessboard (returns empty square matrix), cityblock, euclidean and quasi-euclidean where the last three options return full matrix.
Another pseudocode
% https://stackoverflow.com/a/29956008/54964
%// retrieve picture
imgRGB = imread('dummy.png');
%// detect lines
imgHSV = rgb2hsv(imgRGB);
BW = (imgHSV(:,:,3) < 1);
BW = imclose(imclose(BW, strel('line',40,0)), strel('line',10,90));
%// clear those masked pixels by setting them to background white color
imgRGB2 = imgRGB;
imgRGB2(repmat(BW,[1 1 3])) = 255;
%// show extracted signal
imshow(imgRGB2)
where I think the approach will not work here because the grids are not necessarily continuous and not necessary ideal.
pdist based on the Lumbreras' answer
In the real examples, all coordinates differ such that pdist hamming and jaccard are always 1 with real data.
The options euclidean, cytoblock, minkowski, chebychev, mahalanobis, cosine, correlation, and spearman offer some descriptions of the data.
However, these options make me now little sense in such full matrices.
I want to estimate how long the signal can live.
Sources
J. Müller, and S. Siltanen. Linear and nonlinear inverse problems with practical applications.
EIT with the D-bar method: discontinuous heart-and-lungs phantom. http://wiki.helsinki.fi/display/mathstatHenkilokunta/EIT+with+the+D-bar+method%3A+discontinuous+heart-and-lungs+phantom Visited 29-Feb 2016.
There is a function in Matlab defined as pdist which computes the pairwisedistance between all row elements in a matrix and enables you to choose the type of distance you want to use (Euclidean, cityblock, correlation). Are you after something like this? Not sure I understood your question!
cheers!
Simply, do not do it in the post-processing. Those artifacts of the body can be about about raster images, about the viewer and/or ... Do quality assurance in the signal generation/processing step.
It is much easier to evaluate the original signal than its views.

Gaussian derivative - Matlab

I have an RGB image and I am trying to calculate its Gaussian derivative.
Image is a greyscale image and the Gaussian window is 5x5,st is the standard deviation
This is the code i am using in order to find a 2D Gaussian derivative,in Matlab:
N=2
[X,Y]=meshgrid(-N:N,-N:N)
G=exp(-(x.^2+y.^2)/(2*st^2))/(2*pi*st)
G_x = -x.*G/(st^2);
G_x_s = G_x/sum(G_x(:));
G_y = -y.*G/(st^2);
G_y_s = G_y/sum(G_y(:));
where st is the standard deviation i am using. Before I proceed to the convolution of the Image using G_x_s and G_y_s, i have the following problem. When I use a standard deviation that is an even number(2,4,6,8) the program works and gives results as expected. But when i use an odd number for standard deviation (3 or 5) then the G_y_s value becomes Inf because sum(G_y(:))=0. I do not understand that behavior and I was wondering if there is some problem with the code or if in the above formula the standard deviation can only be an even number. Any help will be greatly appreciated.
Thank you.
Your program doesn't work at all. The results you find when using an even number is just because of some numerical errors.
Your G will be a matrix symmetrical to the center. x and y are both point symmetrical to the center. So the multiplication (G times x or y) will result in a matrix with a sum of zero. So a division by that sum is a division by zero. Everything else you observe is because of some roundoff errors. Here, I see a sum og G_xof about 1.3e-17.
I think your error is in the multiplication x.*G and y.*G. I can not figure out why you would do that.
I assume you want to do edge detection rigth? You can use fspecialto create several edge filters. Laplace of gaussian for instance. You could also create two gaussian filters with different standard deviations and subtract them from another to get an edge filter.

Fit sine wave with a distorted time-base

I want to know the best way to fit a sine-wave with a distorted time base, in Matlab.
The distortion in time is given by a n-th order polynomial (n~10), of the form t_distort = P(t).
For example, consider the distortion t_distort = 8 + 12t + 6t^2 + t^3 (which is just the power series expansion of (t-2)^3).
This will distort a sine-wave as follows:
I want to be able to find the distortion given this distorted sine-wave. (i.e. I want to find the function t = G(t_distort), but t_distort = P(t) is unknown.)
If your resolution is high enough, then this is basically an angle-demodulation problem. The standard way to demodulate an angle-modulated signal is to take the derivative, followed by an envelope detector, followed by an integrator.
Since I don't know the exact numbers you're using, I'll show an example with my own numbers. Suppose my original timebase has 10 million points from 0 to 100:
t = 0:0.00001:100;
I then get the distorted timebase and calculate the distorted sine wave:
td = 0.02*(t+2).^3;
yd = sin(td);
Now I can demodulate it. Take the "derivative" using approximate differences divided by the step size from before:
ydot = diff(yd)/0.00001;
The envelope can be easily detected:
envelope = abs(hilbert(ydot));
This gives an approximation for the derivative of P(t). The last step is an integrator, which I can approximate using a cumulative sum (we have to scale it again by the step size):
tdguess = cumsum(envelope)*0.00001;
This gives a curve that's almost identical to the original distorted timebase (so, it gives a good approximation of P(t)):
You won't be able to get the constant term of the polynomial since we made our approximation from its derivative, which of course eliminates the constant term. You wouldn't even be able to find a unique constant term mathematically from just yd, since infinitely many values will yield the same distorted sine wave. You can get the other three coefficients of P(t) using polyfit if you know the degree of P(t) (ignore the last number, it's the constant term):
>> polyfit(t(1:10000000), tdguess, 3)
ans =
0.0200 0.1201 0.2358 0.4915
This is pretty close to the original, aside from the constant term: 0.02*(t+2)^3 = 0.02t^3 + 0.12t^2 + 0.24t + 0.16.
You wanted the inverse mapping Q(t). Can you do that knowing a close approximation for P(t) as found so far?
Here's an analytical driven route that takes asin of the signal with proper unwrapping of the angle. Then you can fit a polynomial using polyfit on the angle or using other fit methods (search for fit and see). Last, take a sin of the fitted function and compare the signal to the fitted one... see this pedagogical example:
% generate data
t=linspace(0,10,1e2);
x=0.02*(t+2).^3;
y=sin(x);
% take asin^2 to obtain points of "discontinuity" where then asin hits +-1
da=(asin(y).^2);
[val locs]=findpeaks(da); % this can be done in many other ways too...
% construct the asin according to the proper phase unwrapping
an=NaN(size(y));
an(1:locs(1)-1)=asin(y(1:locs(1)-1));
for n=2:numel(locs)
an(locs(n-1)+1:locs(n)-1)=(n-1)*pi+(-1)^(n-1)*asin(y(locs(n-1)+1:locs(n)-1));
end
an(locs(n)+1:end)=n*pi+(-1)^(n)*asin(y(locs(n)+1:end));
r=~isnan(an);
p=polyfit(t(r),an(r),3);
figure;
subplot(2,1,1); plot(t,y,'.',t,sin(polyval(p,t)),'r-');
subplot(2,1,2); plot(t,x,'.',t,(polyval(p,t)),'r-');
title(['mean error ' num2str(mean(abs(x-polyval(p,t))))]);
p =
0.0200 0.1200 0.2400 0.1600
The reason I preallocate with NaNand avoid taking the asin at points of discontinuity (locs) is to reduce the error of the fit later. As you can see, for a 100 points between 0,10 the average error is of the order of floating point accuracy, and the polynomial coefficients are as exact as you can have them.
The fact that you are not taking a derivative (as in the very elegant Hilbert transform) allows to be numerically exact. For the same conditions the Hilbert transform solution will have a much bigger average error (order of unity vs 1e-15).
The only limitation of this method is that you need enough points in the regime where the asin flips direction and that function inside the sin is well behaved. If there's a sampling issue you can truncate the data and only maintain a smaller range closer to zero, such that it'll be enough to characterize the function inside the sin. After all, you don't need millions op points to fit to a 3 parameter function.

Am I using the Fourier transformation the right way?

I am wondering if I am using Fourier Transformation in MATLAB the right way. I want to have all the average amplitudes for frequencies in a song. For testing purposes I am using a free mp3 download of Beethovens "For Elise" which I converted to a 8 kHz mono wave file using Audacity.
My MATLAB code is as follows:
clear all % be careful
% load file
% Für Elise Recording by Valentina Lisitsa
% from http://www.forelise.com/recordings/valentina_lisitsa
% Converted to 8 kHz mono using Audacity
allSamples = wavread('fur_elise_valentina_lisitsa_8khz_mono.wav');
% apply windowing function
w = hanning(length(allSamples));
allSamples = allSamples.*w;
% FFT needs input of length 2^x
NFFT = 2^nextpow2(length(allSamples))
% Apply FFT
fftBuckets=fft(allSamples, NFFT);
fftBuckets=fftBuckets(1:(NFFT/2+1)); % because of symetric/mirrored values
% calculate single side amplitude spectrum,
% normalize by dividing by NFFT to get the
% popular way of displaying amplitudes
% in a range of 0 to 1
fftBuckets = (2*abs(fftBuckets))/NFFT;
% plot it: max possible frequency is 4000, because sampling rate of input
% is 8000 Hz
x = linspace(1,4000,length(fftBuckets));
bar(x,fftBuckets);
The output then looks like this:
Can somebody please tell me if my code is correct? I am especially wondering about the peaks around 0.
For normalizing, do I have to divide by NFFT or length(allSamples)?
For me this doesn't really look like a bar chart, but I guess this is due to the many values I am plotting?
Thanks for any hints!
Depends on your definition of "correct". This is doing what you intended, I think, but it's probably not very useful. I would suggest using a 2D spectrogram instead, as you'll get time-localized information on frequency content.
There is no one correct way of normalising FFT output; there are various different conventions (see e.g. the discussion here). The comment in your code says that you want a range of 0 to 1; if your input values are in the range -1 to 1, then dividing by number of bins will achieve that.
Well, exactly!
I would also recommend plotting the y-axis on a logarithmic scale (in decibels), as that's roughly how the human ear interprets loudness.
Two things that jump out at me:
I'm not sure why you are including the DC (index = 1) component in your plot. Not a big deal, but of course that bin contains no frequency data
I think that dividing by length(allSamples) is more likely to be correct than dividing by NFFT. The reason is that if you want the DC component to be equal to the mean of the input data, dividing by length(allSamples) is the right thing to do.
However, like Oli said, you can't really say what the "correct" normalization is until you know exactly what you are trying to calculate. I tend to use FFTs to estimate power spectra, so I want units like "DAC / rt-Hz", which would lead to a different normalization than if you wanted something like "DAC / Hz".
Ultimately there's no substitute for thinking about exacty what you want to get out of the FFT (including units), and working out for yourself what the correct normalization should be (starting from the definition of the FFT if necessary).
You should also be aware that MATLAB's fft has no requirement to use an array length that is a power of 2 (though doing so will presumably lead to the FFT running faster). Because zero-padding will introduce some ringing, you need to think about whether it is the right thing to do for your application.
Finally, if a periodogram / power spectrum is really what you want, MATLAB provides functions like periodogram, pwelch and others that may be helpful.