Remove unknown DC Offset from a non-periodic discrete time signal

Is there a process that can determine and remove an unknown DC offset from a non-periodic discrete-time signal?
The signal in question has a sample rate of 25 Hz and has harmonics of interest between 0.25 and 3 Hz.
I have tried high-pass filters, with mixed results. First I used a 10th-order Gaussian filter with Fc = 0 Hz. This did a good job of removing the offset and left the signal shape intact, but it severely attenuated the AC component as well. Next I used a 168th-order equiripple filter with a stopband at 0 Hz and a passband at 0.25 Hz. The phase shift was too severe and the signal shape too distorted. The distortion could probably be reduced by bringing the passband edge down to 0.1 Hz, but that would further increase the phase shift, which I need to keep to a minimum.
[Figure: the signal before and after applying x - LPF(x), as suggested by Paul R]

I recommend using a notch filter at DC and using filtfilt to make it zero phase.
a = [1 , -0.98]; b = [1,-1];
y = filtfilt(b,a,x);
The closer the second value of a gets to -1 the narrower your notch will be.
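To see how the pole position trades notch width against attenuation of the band of interest, the gain of this filter can be evaluated directly on the unit circle. The following is only a sketch: Fs = 25 Hz and the band edges come from the question, the second pole value 0.995 is an illustrative assumption, and the magnitude is squared because filtfilt applies the filter twice.

```matlab
% Two-pass (filtfilt) gain of the DC notch H(z) = (1 - z^-1)/(1 - p*z^-1).
Fs = 25;                      % sample rate from the question, in Hz
f  = [0 0.25 3];              % DC and the edges of the band of interest
z  = exp(1j*2*pi*f/Fs);       % evaluation points on the unit circle
b  = [1 -1];                  % zero at z = 1 (DC)
poles = [0.98 0.995];         % 0.995 is an illustrative alternative
G = zeros(numel(poles), numel(f));
for k = 1:numel(poles)
    a = [1 -poles(k)];
    H = polyval(b, z) ./ polyval(a, z);   % (z - 1)./(z - p)
    G(k, :) = abs(H).^2;                  % forward and backward pass
end
disp(G)   % rows: pole values, columns: frequencies in f
```

With the pole at 0.98 the two-pass gain at 0.25 Hz is about 0.92, and it moves towards 1 as the pole approaches 1. The price is a longer transient, since the time constant is roughly 1/(1 - p) samples.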

A DC offset means that some constant value was added to the signal (the name originates from adding a DC voltage to an analog AC signal). If the DC component is really constant (and not changing very slowly), then you don't have to design a high-order (and potentially unstable) high-pass filter. You can just subtract the average of your signal from the signal. That is, of course, a high-pass filter as well (averaging is a type of low-pass, and '1 minus the average' is a high-pass), but a very simple one.
If, on the other hand, you have reason to believe that the DC component is not really DC, but rather an AC component with a very low frequency, then you had better average segments of your signal and not the signal as a whole. This is the same as using a low-pass filter whose impulse response is shorter than the length of the signal. In this case you have to make some assumptions about the "DC" component.
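Both cases can be sketched on synthetic data. Everything below is an assumption for illustration: the 1 Hz test tone, the offset and drift values, and the 5 s window. Only the 25 Hz sample rate is taken from the question. movmean shortens its window near the ends, so only the interior samples are compared.

```matlab
Fs = 25;                            % sample rate from the question
t  = (0:1499)/Fs;                   % 60 s of synthetic data
ac = sin(2*pi*1*t);                 % stand-in for the signal of interest

% Case 1: truly constant offset, subtract the global mean.
x1 = ac + 3;
y1 = x1 - mean(x1);

% Case 2: slowly drifting "DC", subtract a centred moving average.
x2  = ac + 3 + 0.02*t;
win = 125;                          % 5 s window (assumed; odd so it is centred)
y2  = x2 - movmean(x2, win);

% Compare against the known AC part, away from the edges for case 2.
mid  = 63:1437;
err1 = max(abs(y1 - ac));
err2 = max(abs(y2(mid) - ac(mid)));
```

The window has to be long compared with the slowest component you want to keep, otherwise the moving average removes part of the signal as well.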

Rather than implementing a high pass filter directly (which can be rather tricky for very low frequencies - you end up with a large number of coefficients and various issues with stability and passband ripple etc), you might instead want to consider implementing a low pass filter which will give you an estimate of the DC offset value, and then subtract this filtered offset from your signal, i.e. rather than:
y = HPF(x)
do this:
y = x - LPF(x)
The low pass filter can probably just be quite a simple filter with a relatively small number of terms. The big advantage of this implementation is that your higher frequency components should not have any unwanted artefacts due to phase, ripple, etc, since all you are doing is subtracting an almost stationary DC value from the samples.
The only potential downside is that if the DC offset is large you may have quite a long initial settling time before the estimate of the DC offset is accurate (although this is also true of any other implementation such as a direct high pass filter of course). If you have any a priori knowledge of what the offset value is likely to be (e.g. if it doesn't change very much from run to run, and you know the value from the previous run) then you can use this to optimise the settling time, by initialising the LPF state variables to a suitable value rather than 0.
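A minimal sketch of y = x - LPF(x) with a one-pole low-pass, including the initialisation trick described above. The smoothing factor alpha, the synthetic signal and the prior estimate dc_guess are all assumptions chosen for the demonstration.

```matlab
Fs = 25; t = (0:1499)/Fs;
x  = 5 + sin(2*pi*1*t);             % synthetic signal with a DC offset of 5

alpha = 0.99;                       % smoothing factor (assumed)
b = 1 - alpha;  a = [1 -alpha];     % y[n] = (1-alpha)*x[n] + alpha*y[n-1]

dc_guess = 4.8;                     % hypothetical offset known from a previous run
zi = alpha*dc_guess;                % filter state equivalent to y[-1] = dc_guess

dc_cold = filter(b, a, x);          % LPF state starts at 0
dc_warm = filter(b, a, x, zi);      % LPF state starts at the prior estimate

y_cold = x - dc_cold;               % long settling time
y_warm = x - dc_warm;               % close to the AC part from the first sample
```

Plotting y_cold and y_warm together shows the difference in settling time; after a few time constants (about 1/(1 - alpha) samples each) the two outputs coincide.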

As others have said, to remove a DC offset, you can simply subtract the mean. Your signal does not need to be periodic, but it does need to be long enough to get a good estimate of the DC component.
If you still wish to go with a filtering approach, you can eliminate the severe distortion due to phase lag by using filtfilt. This function filters your timeseries once in the forwards direction and then once in the reverse direction, so that phase distortions cancel out.
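The forward-backward idea can be illustrated with plain filter calls. This is only a sketch of the principle on an assumed 1 Hz test tone, using the DC notch coefficients given earlier on this page; filtfilt itself additionally takes care of the start and end transients, which this version simply skips over.

```matlab
Fs = 25; t = (0:1499)/Fs;
ac = sin(2*pi*1*t);                 % assumed test tone
x  = ac + 5;                        % add a DC offset

b = [1 -1]; a = [1 -0.98];          % DC notch coefficients

y_fwd = filter(b, a, x);                        % one pass: phase shifted
y_fb  = fliplr(filter(b, a, fliplr(y_fwd)));    % second pass in reverse time

mid = 500:1000;                     % skip the start-up and end transients
err_fwd = max(abs(y_fwd(mid) - ac(mid)));       % waveform error after one pass
err_fb  = max(abs(y_fb(mid)  - ac(mid)));       % error after both passes
```

After the second pass the remaining error comes only from the (squared) magnitude response, because the phase of the reverse pass cancels that of the forward pass.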

You can design a symmetric FIR filter as the low-pass filter that estimates the DC and then subtract the output from your input signal. This filter has constant group-delay.
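A sketch of that approach, assuming a 101-tap Hamming-shaped low-pass and a 1 Hz test tone (both illustrative choices, not values from the question). Because the taps are symmetric, the DC estimate is delayed by exactly (N-1)/2 samples, so the input is aligned with it before subtracting.

```matlab
Fs = 25; t = (0:1499)/Fs;
ac = sin(2*pi*1*t);
x  = ac + 2;

N = 101;                                     % odd length (assumed), about 4 s
w = 0.54 - 0.46*cos(2*pi*(0:N-1)/(N-1));     % Hamming window, written out by hand
h = w/sum(w);                                % symmetric taps, unity gain at DC
D = (N-1)/2;                                 % constant group delay in samples

dc = filter(h, 1, x);                        % dc(n) estimates the offset at sample n-D
y  = x(1:end-D) - dc(D+1:end);               % align the estimate with the input

valid = (N-D):numel(y);                      % samples where the FIR is fully loaded
err = max(abs(y(valid) - ac(valid)));
```

For components as low as 0.25 Hz a filter of this length would still pass part of the signal into the DC estimate, so N would have to be considerably larger in that case.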

Related

How to avoid infinity as output value?

I have written the following code to calculate the required transmission power based on the distance between the sender and receiver and the SNR threshold at the receiver. However, I get huge values for the required intensity (Req_I) and the required transmission power (Req_Pt). Please let me know if I am making a mistake in the technique used to calculate the transmission power or in the code itself.
Best Regards
Pt=12; %Transmit power in watts
spreading=1.5; %Spreading factor
f=10; %Frequency in Kilo Hz.
d=0.5; %Distance in Kilo Meters.
NL=47.69; %Noise Level in db
DI=0; %Directivity Index
pi=3.14159265359;
SNRth=17; %SNR threshold in db
%absorption=10^((0.002+0.11*(f^2/((1+f^2)+0.011*f^2)))/10); %Absorption factor
absorption=10^((0.11*(f^2/(1+f^2))+44*(f^2/(4100+f^2))+2.75*10^(-4)*f^2+0.003)/10);
TL=(d^spreading)*(absorption^d); %Transmission Loss
Req_SL=SNRth+TL+NL+DI; %Required Source Level
Req_I=((10^Req_SL)/10)*(0.67*10^(-18)); %Required Intensity
Req_Pt=Req_I*4*pi; %Required Transmission Power

The calculation of your TL factor is probably wrong, maybe you forgot to take the logarithm of it?
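As a hedged sketch of what "taking the logarithm" could look like: the variable names and constants below are copied from the question, TL is converted to dB before it is added to the other dB quantities, and the source level is converted back with 10^(x/10). Whether the distance in the spreading term should be in km or in metres depends on the source of the formula and is not settled here.

```matlab
spreading = 1.5;  f = 10;  d = 0.5;      % values and units as given in the question
NL = 47.69;  DI = 0;  SNRth = 17;        % all in dB

% Absorption in dB per km for f in kHz (same expression as in the question).
alpha_dB = 0.11*f^2/(1+f^2) + 44*f^2/(4100+f^2) + 2.75e-4*f^2 + 0.003;

TL_lin = d^spreading * 10^(alpha_dB*d/10);   % linear loss factor, as in the question
TL_dB  = 10*log10(TL_lin);                   % the same loss expressed in dB

Req_SL = SNRth + TL_dB + NL + DI;            % every term is now in dB
Req_I  = 10^(Req_SL/10) * 0.67e-18;          % dB to linear is 10^(x/10), not (10^x)/10
Req_Pt = Req_I * 4*pi;                       % uses the built-in pi
```

With these changes the result stays finite, because the exponent is now Req_SL/10 rather than Req_SL.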
I don't know where your formulas come from, nor your specific application. If you don't have the correct formulas, you can take a look at this pdf, which provides expressions for the TL factor due to attenuation and spreading.

Check your units.
The original paper gives an expression (3) for low frequency propagation, the one you have used, but requires the input to be in kHz, not Hz. Either you should use
f = 10*1e-3; %frequency in kHz
or you should be using formula (2). Also note that the attenuation is in dB/km, so you should convert your distance too, unless you actually are interested in propagating 500 km.

Matlab: finding phase difference using cpsd

From my understanding, when using the cpsd function as such:
[Pxy,f] = cpsd(x,y,window,Ns,NFFT,Fs);
Matlab chops the time series into smaller windows of the size you specify, and successive windows are shifted by Ns data points. The final [Pxy, f] is an average of the results obtained from the individual windows. Please correct me if I am wrong about this process.
My question is: if I use angle(Pxy) at a specific frequency, say 34 Hz, does that give me the phase difference between signals x and y at 34 Hz?
I have doubts about this because Pxy is an average over the individual windows, and each window is offset by the window shift. Doesn't that mean the phase of the averaged Pxy is affected by the window shift?
I've tried to correct for this by making the window shift an integer number of full periods at 34 Hz. Is this correct?
And just a little background about what I am doing:
I basically have numerous time-series pressure measurement over 60 seconds at 1000Hz sampling rate.
Power spectrum analysis indicates that there is a peak frequency at 34 Hz for each signal. (averaged over all windows)
I want to compare each signal's phase difference from each other corresponding to the 34Hz peak.
FFT analysis of individual window reveals that this peak frequency moves around. So I am not sure if cpsd is the correct way to be going about this.
I am currently considering using xcorr to calculate the overall time lag between the signals and then calculating the phase difference from that. I have also heard of the Hilbert transform, but I have no idea how that works yet.

Yes, cpsd works.
You can test your result by constructing two input signals with a known phase difference, such as:
t=[0:0.001:5];
omega=25;
x1=sin(2*pi*omega*t);
x2=sin(2*pi*omega*t+pi/3);
You can then check whether the phase shift calculated by cpsd at 25 Hz is pi/3 in magnitude.
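The concern about the window shift can also be checked without cpsd. The sketch below averages the cross-spectrum over shifted segments by hand, using only fft. The segment length, the hop and the conj(X1).*X2 convention are assumptions of this sketch; which of the two signals cpsd conjugates should be checked in its documentation, since it decides the sign of the angle. Each segment shift adds the same phase to both spectra, so it cancels in the product.

```matlab
Fs = 1000; t = 0:1/Fs:5;
f0 = 25;
x1 = sin(2*pi*f0*t);
x2 = sin(2*pi*f0*t + pi/3);

L   = 1000;                                  % segment length (assumed)
hop = 333;                                   % shift that is NOT a whole number of periods
w   = 0.5 - 0.5*cos(2*pi*(0:L-1)/L);         % periodic Hann window, written by hand

Pxy  = zeros(1, L);
nseg = 0;
for s = 1:hop:(numel(t) - L + 1)
    X1  = fft(w .* x1(s:s+L-1));
    X2  = fft(w .* x2(s:s+L-1));
    Pxy = Pxy + conj(X1) .* X2;              % convention chosen for this sketch
    nseg = nseg + 1;
end
Pxy = Pxy / nseg;

k = round(f0*L/Fs) + 1;                      % FFT bin holding 25 Hz
phase_diff = angle(Pxy(k));                  % phase of x2 relative to x1
```

Here phase_diff comes out as pi/3 even though the hop is not a multiple of the 40-sample period, so the shift does not need to be matched to the frequency of interest.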

Filters performance analysis

I am working on some experimental data which, at some point, need to be time-integrated and then high-pass filtered (to remove low-frequency disturbances introduced by the integration and an unwanted DC component).
The aim of my work is not related to filtering, but I would still like to analyze the filters I am using in more detail to give some justification (for example, to motivate why I chose a 4th-order filter instead of a higher or lower one).
This is the filter I am using:
delta_t = 1.53846e-04;
Fs = 1/delta_t;
cut_F = 8;
Wn = cut_F/(Fs/2);
ftype = 'high';
[b,a] = butter(4,Wn,ftype);
filtered_signal = filtfilt(b,a,signal);
I already had a look here: High-pass filtering in MATLAB to learn something about filters (I never had a course on signal processing) and I used
fvtool(b,a)
to see the impulse response, step response, etc. of the filter I have used.
The problem is that I do not know how to "read" these plots.
What do I have to look for?
How can I understand if a filter is good or not? (I do not have any specification about filter performances, I just know that the lowest frequency I can admit is 5 Hz)
What features of different filters are useful to be compared to motivate the choice?

I see you are starting your Uni DSP class on filters :)
The first thing you need to remember is that Matlab can only simulate using finite values, so the results you see are technically all discrete. There are 4 things that will influence your filtering results (or tell you if your filter is good or bad), which you will have to consider while designing a filter:
1, the type of the filter (e.g. an IIR design such as Butterworth, the one you are using, or an FIR design based on a Hamming, Blackman or Hann window, etc.)
2, the number of filter coefficients, i.e. the filter order (which determines how sharp the transition between stopband and passband can be)
3, the sampling frequency of the original signal (ideally, if you have infinite sampling frequency, you can have perfect filters; not possible in Matlab due to the reason above, but you can simulate its effect by setting it really high)
4, the cut-off frequency
You can play around with the 4 parameters so that your filter does what you want it to.
So here comes the theory:
There is a trade-off between the width of the main lobe and the spectral leakage of your filter. The idea is that you have a signal with some frequencies, and you want to filter out the unwanted ones (i.e. your DC noise) and keep the ones you want. But what if your desired signal frequency is so low that it is very close to the DC noise? With a badly designed filter you will not be able to remove the DC component without also damaging the signal. In order to design a good filter, you will need to find a suitable number of filter coefficients, type of filter and cut-off frequency to make sure your filter works as you want.
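One way to compare candidate orders without any toolbox is to evaluate the Butterworth magnitude response in closed form. This sketch assumes the filter comes from the bilinear transform with a pre-warped cut-off and that it is applied with filtfilt, so the gain in dB is doubled. The sample rate and cut-off are taken from the question; the probe frequencies and the list of orders are assumptions.

```matlab
delta_t = 1.53846e-04;  Fs = 1/delta_t;      % from the question
cut_F   = 8;                                 % cut-off in Hz, from the question
f       = [1 5 8 20];                        % probe frequencies in Hz (assumed)
orders  = [2 4 8];                           % candidate orders (assumed)

G_dB = zeros(numel(orders), numel(f));
for k = 1:numel(orders)
    n = orders(k);
    % Squared magnitude of an n-th order digital Butterworth high-pass
    % (bilinear transform with pre-warped cut-off).
    mag2 = 1 ./ (1 + (tan(pi*cut_F/Fs) ./ tan(pi*f/Fs)).^(2*n));
    G_dB(k, :) = 2 * 10*log10(mag2);         % filtfilt applies the filter twice
end
disp(G_dB)   % rows: orders, columns: frequencies in f
```

Each row shows how much a given order attenuates below the cut-off and how close to 0 dB it stays above it. At the cut-off itself the two-pass gain is about -6 dB regardless of the order.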
Here is a simple filter that I wrote back in the day (as written it is a high-pass: a unit impulse minus a sinc low-pass). You can play around with filters a lot by filtering different kinds of signals and plotting the response.
N = 21; %number of filter coefficients (odd)
fc = 4000; %cut-off frequency in Hz
f_sampling = fs; %sampling frequency in Hz (fs must already be defined)
Fc = fc/f_sampling; %cut-off normalised to cycles per sample
n = -(N-1)/2:(N-1)/2; %tap indices centred on zero
delta = [zeros(1,(N-1)/2) 1 zeros(1,(N-1)/2)]; %unit impulse
h = delta - 2*Fc*sinc(2*n*Fc); %impulse minus ideal low-pass = high-pass
output = filter(h,1,yoursignal);
To plot the response, you want to look at your output in the frequency domain using the DFT (fft in Matlab) and see how the signal has been distorted due to leakage etc.
NFFT=256; % FFT length
Lseg=min(numel(output),NFFT); % number of samples that actually enter the FFT
Pxx=1/Lseg*abs(fft(output,NFFT)).^2; % PSD estimate using the FFT (up to a constant scale factor)
This gives you what is known as a periodogram. When you plot it, you might want to take 10*log10 of it, so it looks nicer.
Hope you do well in class.

Time delay estimation using crosscorrelation

I have two sensors separated by some distance which receive a signal from a source. The signal in its pure form is a sine wave at a frequency of 17 kHz. I want to estimate the TDOA between the two sensors. I am using cross-correlation and below is my code
x1; % signal as recieved by sensor1
x2; % signal as recieved by sensor2
len = length(x1);
nfft = 2^nextpow2(2*len-1);
X1 = fft(x1);
X2 = fft(x2);
X = X1.*conj(X2);
m = ifft(X);
r = [m(end-len+1) m(1:len)];
[a,i] = max(r);
td = i - length(r)/2;
I am filtering my signals x1 and x2 by removing all frequencies below 17kHz.
I am having two problems with the above code:
1. With the sensors and source at the same place, I am getting a different value of 'td' each time. I am not sure what is wrong. Is it because of the noise? If so, can anyone please provide a solution? I have read many papers and gone through other questions on stackoverflow, so please answer with code along with theory instead of just stating the theory.
2. The value of 'td' sometimes does not match the delay calculated using xcorr. What am I doing wrong? Below is my code for td using xcorr
[xc,lags] = xcorr(x1,x2);
[m,i] = max(xc);
td = lags(i);

One problem you might have is the fact that you only use a single frequency. At f = 17 kHz, and an estimated speed of sound v = 340 m/s (I assume you use ultra-sound), the wavelength is lambda = v / f = 2 cm. This means that your length measurement has an unambiguous range of 2 cm (sorry, cannot find a good link, google yourself). In other words, you already need to know your distance to better than 2 cm before you can use the result of your measurement to refine the distance.
Think of it in another way: when taking the cross-correlation between two perfect sines, the result should be a 'comb' of peaks with spacing equal to the wavelength. If they overlap perfectly, and you displace one signal by one wavelength, they still overlap perfectly. This means that you first have to know which of these peaks is the right one, otherwise a different peak can be the highest every time purely by random noise. Did you make a plot of the calculated cross-correlation before trying to blindly find the maximum?
This problem is the same as in interferometry, where it is easy to measure small distance variations with a resolution smaller than a wavelength by measuring phase differences, but you have no idea about the absolute distance, since you do not know the absolute phase.
The solution to this is actually easy: let your source generate more frequencies. Even using (band-limited) white-noise should work without problems when calculating cross-correlations, and it removes the ambiguity problem. You should see the white noise as a collection of sines. The cross-correlation of each of them will generate a comb, but with different spacing. When adding all those combs together, they will add up significantly only in a single point, at the delay you are looking for!
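A small simulation of this idea, as a sketch only: the source emits white noise, sensor 2 receives the same noise 37 samples later, and the delay is recovered from an FFT-based cross-correlation. The record length and the delay are assumptions chosen for the demonstration. Note that both FFTs are zero-padded to nfft, so the circular correlation cannot wrap around, and the output is reordered so that it runs from the most negative to the most positive lag.

```matlab
n = 4096;                       % samples per sensor (assumed)
true_delay = 37;                % delay in samples, chosen for the demo
s  = randn(1, n + true_delay);  % white noise emitted by the source
x1 = s(true_delay+1:end);       % sensor 1
x2 = s(1:n);                    % sensor 2: x2(k) = x1(k - true_delay)

nfft = 2^nextpow2(2*n - 1);     % zero-pad so the circular correlation does not wrap
R = ifft(fft(x1, nfft) .* conj(fft(x2, nfft)));
r = real([R(end-n+2:end) R(1:n)]);   % reorder to lags -(n-1) ... (n-1)
lags = -(n-1):(n-1);

[~, imax] = max(r);             % single dominant peak for a noise-like signal
td = -lags(imax);               % positive when x2 lags x1
```

Plotting r against lags shows one sharp peak instead of the comb produced by a pure sine, which is what removes the ambiguity.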

White noise, a Maximum Length Sequence or other non-periodic signals should be used as the test signal for time delay measurement using cross-correlation. This is because non-periodic signals have only one cross-correlation peak, so there is no ambiguity in determining the time delay. It is possible to use burst-type periodic signals to do the job, but with degraded SNR. If you have to use a continuous periodic signal as the test signal, then you can only measure a time delay within one period of the periodic test signal. This should explain why, in your case, using a lower-frequency sine wave as the test signal works while using a higher-frequency sine wave does not. This is demonstrated in these videos: https://youtu.be/L6YJqhbsuFY, https://youtu.be/7u1nSD0RlwY .

Accelerometer signal segmentation

I have a 1D accelerometer signal (one axis only). I would like to create a robust algorithm, which would be able to recognize some shapes in the signal.
At first I apply a moving average filter to the raw signal. On the attached picture the raw signal is coloured red and the averaged signal is black. As seen from the picture, some trends are visible from the averaged (black) signal - the signal contains 10 repetitions of a peak like pattern, where acceleration climbs to a maximum and then drops back down. I have marked the beginnings and endings of those patterns with a cross.
So my goal is to find the marked positions automatically. The problems that make the pattern extraction difficult are:
the start of the pattern could have a different y value than the end of the pattern
the pattern could have more than one peak
I do not have any concrete time information (from start to the end of the pattern it takes A time units)
I have tried different approaches, which are pretty much home-brew, so I won't mention them - I don't want you to be biased by my way of thinking. Are there some standard or by the books approaches for doing that kind of pattern extraction? Or maybe does anyone know how to tackle the problem in a robust way?
Any idea will be appreciated.

Keep it simple!
It appears the moving average is a good enough damper device; keep it as-is, maybe only increasing or decreasing its sample count if you notice that it either leaves too much noise or removes too much signal, respectively. You then work off this averaged signal exclusively.
The pattern markers you seek seem relatively easy to detect. Expressed in English, these markers are:
Targets = the turning points (local minima) in the averaged readings curve, where the slope goes from markedly negative to positive.
You should therefore be able to detect this situation by comparing the slope values, calculated along with the moving average, as each new reading becomes available (with a short delay, of course, as the slope at a given point can only be calculated when the averaged reading for the next [few] point[s] is available).
To avoid false detections, however, you'd need to define a few parameters aimed at filtering out the undesirable patterns. These parameters will define more precisely the meaning of "markedly" in the above target definition.
Tentatively the formula for detecting a point of interest could be as simple as this
(-1 * S(t-1) + St ) > Min_delta_Slope
where
S is the slope (more on this) at time t-1 and t, respectively
Min_delta_Slope is a parameter defining how "sharp" a change in slope we want at a minimum.
Assuming normalized t and Y units, we can set the Min_delta_Slope parameter close to or even past 1. Intuitively, a value of 1 (again in normalized units) would indicate that we target points where the curve "arrived" with a downward slope of, say, 50% and left the point with an upward slope of 50% (or 40% + 60%, or 10%, i.e. almost flat, and 90%, i.e. almost vertical).
To avoid detecting points in the case when this is merely a small dip in the curve, we can take more points into consideration, with a fancier formula such as say
(Pm2 * S(t-2) + Pm1 * S(t-1) + P0 * St + Pp1 * S(t+1) ) > Min_delta_Slope
where
Pm2, Pm1, P0 and Pp1 are coefficients giving relative importance to the slope at various points before and after the point of interest. (Pm2 and Pm1 are typically negative values, unless we use only positive parameters and put negative signs in the formula.)
St +/- n is the slope at various times
and Min_delta_Slope is a parameter defining how "sharp" a change in slope we want at a minimum.
Intuitively, this 4 points formula would take into account the shape of the curve at a point two readings prior and two reading past the point of interest (in addition to considering the point right before and after it). Given the proper values for the parameters, the formula would require that the curve be steadily coming "down" for two time slices, then steadily going up for the next two time slices, hence avoiding to mark smaller dips in the curve.
An alternative way to achieve this may be to compute the slope by using the difference in Y value between the [averaged] reading from two (or more) time slices ago and the current [averaged] reading. These two approaches are similar but would produce slightly different results; generally we'd have more say on the desired shape of the curve with the Pm2, Pm1, P0 and Pp1 parameters.
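A minimal sketch of the simple two-point test on synthetic data. The test signal, the 5-point averaging window and the threshold of 1 are assumptions; with real accelerometer data the threshold depends on how t and Y are scaled. The extra sign checks restrict the detections to places where the slope actually changes from negative to positive.

```matlab
dt = 0.01;  t = 0:dt:5;
y  = abs(sin(pi*t));                     % synthetic stand-in: bumps separated by dips
ys = conv(y, ones(1,5)/5, 'same');       % 5-point moving average (window assumed)

S  = diff(ys)/dt;                        % S(k): slope between samples k and k+1
Min_delta_Slope = 1;                     % threshold, in the units of this example

dS  = S(2:end) - S(1:end-1);             % -1*S(t-1) + S(t)
hit = dS > Min_delta_Slope & S(1:end-1) < 0 & S(2:end) > 0;
idx = find(hit) + 1;                     % sample index of each detected marker
t_marks = t(idx);                        % marker times
```

In this example the markers land on the four interior dips at t = 1, 2, 3 and 4; the rising start and falling end of the record are not flagged because the slope never changes sign there.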

You might want to look at watershed segmentation, which does a related kind of thing (dividing landscapes into their separate catchment basins). Oddly enough, I'm actually writing a PhD thesis which uses watershed a lot at the moment (seriously :))
