Remove noise and smooth the ECG signal - scipy

I am processing the Long Term AF Database (LTAFDB): https://physionet.org/content/ltafdb/1.0.0/
When I test 30 s strips of this data, my model does not predict the signals correctly, so I am trying to deal with the noise in this dataset. Here is how it looks:
Here is the code to plot it:
def plot_filter_graphs(data, xmin, xmax, order):
    import numpy as np
    from scipy import signal
    from scipy.signal import lfilter, filtfilt
    from matplotlib.pyplot import plot, legend, show, grid, figure, xlim
    lowcut = 1
    highcut = 35
    # Nyquist frequency; 300 is the sampling rate in Hz and must match the data.
    nyq = 0.5 * 300
    low = lowcut / nyq
    high = highcut / nyq
    b, a = signal.butter(order, [low, high], btype='band')
    # Apply the filter in one direction only (causal, so it adds phase delay).
    z = lfilter(b, a, data)
    # Use filtfilt to apply the filter forward and backward (zero phase).
    y = filtfilt(b, a, data)
    # Run one more backward pass over the filtfilt output.
    y = np.flipud(y)
    y = signal.lfilter(b, a, y)
    y = np.flipud(y)
    # Make the plot.
    figure(figsize=(16, 5))
    plot(data, 'b', linewidth=1.75)
    plot(z, 'r--', linewidth=1.75)
    plot(y, 'k', linewidth=1.75)
    xlim(xmin, xmax)
    legend(('actual',
            'lfilter',
            'filtfilt'),
           loc='best')
    grid(True)
    show()
I am using a Butterworth band-pass filter to remove the noise. I also compared filtfilt and lfilter, but neither gives a good result.
Any suggestions on how the noise can be removed so that the signal is clean enough to be used for model prediction?
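One variation that may be worth trying is the same 1-35 Hz Butterworth band-pass expressed as second-order sections and applied with zero phase. The sketch below is only an illustration under stated assumptions: the 300 Hz sampling rate is taken from the code above and should be replaced by the real rate of the recordings, the input is a synthetic stand-in rather than LTAFDB data, and the helper names bandpass_zero_phase and tone_amplitude are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass_zero_phase(data, fs, lowcut=1.0, highcut=35.0, order=4):
    """Zero-phase Butterworth band-pass using second-order sections."""
    # Passing fs= lets butter() take the cutoffs directly in Hz.
    sos = butter(order, [lowcut, highcut], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering cancels the phase delay of a single pass.
    return sosfiltfilt(sos, data)


def tone_amplitude(x, fs, freq):
    """Amplitude of the FFT bin closest to freq (Hz)."""
    spectrum = np.abs(np.fft.rfft(x)) * 2.0 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq))]


fs = 300.0                         # assumed sampling rate in Hz
t = np.arange(int(30 * fs)) / fs   # one 30 s strip

# Synthetic stand-in for an ECG strip: a 10 Hz in-band tone plus
# 0.3 Hz baseline wander and 60 Hz mains interference.
raw = (np.sin(2 * np.pi * 10 * t)
       + 0.5 * np.sin(2 * np.pi * 0.3 * t)
       + 0.5 * np.sin(2 * np.pi * 60 * t))
clean = bandpass_zero_phase(raw, fs)

# Compare the amplitude of each component before and after filtering.
for f in (0.3, 10.0, 60.0):
    print(f, tone_amplitude(raw, fs, f), tone_amplitude(clean, fs, f))
```

The second-order-section form is generally better conditioned than (b, a) coefficients when the low cutoff is a small fraction of the Nyquist frequency, which is the case for a 1 Hz edge. Whether this is enough for the model is something only the real data can show.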

Related

FFT not showing any dominant frequencies

I am trying to perform an FFT on time series data of DC motor current from the "F.A.I.R. open dataset of brushed DC motor faults for testing of AI algorithms". However, the result does not show any dominant frequency bands. It just resembles broadband noise. The first image is a zoomed-in snapshot of the time series data (the entire series is over 100,000 data points), after the DC portion has been subtracted.
Timeseries graph
The second image is the FFT graph, and my code is below. The sample spacing is not yet set correctly, but this does not affect the shape of the spectrum, only the frequency values assigned to it.
FFT graph
import matplotlib.pyplot as plt
import h5py
import numpy as np
from scipy.fft import fft, fftfreq
filename = "MOTOR-DC_2020_12_02_17_59_47_Analogico.hdf5"
#MOTOR-DC_2020_12_02_17_59_47_Analogico.hdf5
#MOTOR-DC_2020_12_02_17_30_42_Analogico.hdf5
with h5py.File(filename, "r") as f:
    # List all groups
    print("Keys: %s" % f.keys())
    a_group_key = list(f.keys())[0]
    # Get the data
    data = list(f[a_group_key])
# Columns: 0 = vibration, 1 = current, 2 = voltage
vibration = [data[i][0] for i in range(len(data))]
current = [data[i][1] for i in range(len(data))]
voltage = [data[i][2] for i in range(len(data))]
# Number of sample points
N = len(data)
# Sample spacing (s)
T = 0.0001
x = np.linspace(0.0, N*T, N, endpoint=False)
# Subtract the mean to remove the DC component
y_mean = np.mean(current)
y_med = np.median(current)
print('Mean', y_mean, 'Median=', y_med)
y = np.asarray(current) - y_mean
#plt.plot(x,current)
yf = fft(y)
xf = fftfreq(n=N, d=T)[:N//2]
plt.plot(xf, 2.0/N * np.abs(yf[0:N//2]))
plt.grid()
plt.show()
Try
yf = fft(y)
xf = fftfreq(N)
xf[np.argmax(np.abs(yf))]
This will give you the normalized frequency of the most prominent harmonic.
You can then multiply it by the sampling frequency to get the actual frequency.
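As a self-contained illustration of that idea, here is a minimal sketch on synthetic data. The signal (a DC offset, a 120 Hz ripple and white noise) is made up for the example and is not taken from the motor dataset. It uses rfft and rfftfreq, a swapped-in variant of the calls above, so that only non-negative frequencies are searched and the result comes out directly in Hz.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq

fs = 10_000.0            # sampling frequency in Hz (T = 0.0001 s in the question)
N = 10_000               # number of samples (assumed: 1 s of data)
t = np.arange(N) / fs

# Synthetic stand-in for the motor current: DC offset + 120 Hz ripple + noise.
rng = np.random.default_rng(0)
y = 2.0 + np.sin(2 * np.pi * 120.0 * t) + 0.5 * rng.standard_normal(N)

y = y - np.mean(y)                 # remove the DC component
yf = rfft(y)                       # one-sided spectrum of a real signal
xf = rfftfreq(N, d=1.0 / fs)       # bin frequencies in Hz

# Frequency of the largest spectral peak.
peak_hz = xf[np.argmax(np.abs(yf))]
print(peak_hz)
```

If the real data gives a peak that barely stands out from its neighbours, that would be consistent with the broadband appearance described in the question.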

Using numerical methods to plot solution to first-order nonlinear differential equation in Matlab

I have a question about plotting x(t), the solution to the following differential equation knowing that dx/dt equals the expression below. The value of x is 0 at t = 0.
syms x
dxdt = -(1.0*(6.84e+45*x^2 + 5.24e+32*x - 2.49e+42))/(2.47e+39*x + 7.12e+37)
I want to plot the solution of this first-order nonlinear differential equation. The analytical solution involves complex numbers so that's not relevant because this equation models a real-life process, but Matlab can solve the equation using numerical methods and plot it. Can someone please suggest how to do this?
In MATLAB, try this:
tspan = [0 10];
x0 = 0;
[t,x] = ode45(@(t,x) -(1.0*(6.84e+45*x^2 + 5.24e+32*x - 2.49e+42))/(2.47e+39*x + 7.12e+37), tspan, x0);
plot(t,x,'b')
I tried it and got this:
Hope that helps.
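For readers working in Python, a rough counterpart of the ode45 call can be sketched with scipy.integrate.solve_ivp. The choices below are assumptions of this sketch, not part of the answer above: with these coefficients x settles within a few microseconds, so the example uses the implicit Radau method, tight tolerances and a much shorter time span than [0 10].

```python
import numpy as np
from scipy.integrate import solve_ivp


def dxdt(t, x):
    """Right-hand side of the ODE from the question."""
    return -(6.84e45 * x**2 + 5.24e32 * x - 2.49e42) / (2.47e39 * x + 7.12e37)


# Assumed settings: implicit solver and a short time span, because the
# solution changes on a microsecond time scale and then stays flat.
sol = solve_ivp(dxdt, t_span=(0.0, 1e-5), y0=[0.0], method="Radau",
                rtol=1e-8, atol=1e-12)

t, x = sol.t, sol.y[0]
print(sol.success, x[-1])

# To plot the result:
# import matplotlib.pyplot as plt; plt.plot(t, x); plt.show()
```

The final value should sit close to the positive root of the numerator, roughly 0.019, which is where dx/dt becomes zero.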
I have written an example of how to do this in Python with SymPy and matplotlib. SymPy can calculate both definite and indefinite integrals. Calculate the indefinite integral and add a constant so that it evaluates to 0 at the starting point. Once you have the integral, it is just a matter of plotting. I would define an array from a starting point to an endpoint with 1000 points in between (fewer would likely do). You can then calculate the value of the integral plus the constant at each point and plot the result with matplotlib. There are plenty of other questions on how to customize plots with matplotlib.
The code below displays a basic plot of the indefinite integral of dxdt, shifted so that it equals 0 at x = 0. The tuple passed to Plotting() sets the range of x values to plot. The function plots 500 points between the minimum and maximum values given in that tuple.
For more information on customizing the plot, I recommend the matplotlib documentation. Documentation on integrate can be found in the SymPy documentation.
from sympy import integrate, lambdify
from sympy.abc import x
import matplotlib.pyplot as plt
import numpy as np
def Plotting(xValues, dxdt):
    # Calculate integral
    xt = integrate(dxdt, x)
    # Convert to function
    f = lambdify(x, xt)
    C = -f(0)
    # Define x values, last number in linspace corresponding to number of points to plot
    xValues = np.linspace(xValues[0], xValues[1], 500)
    yValues = [f(xv) + C for xv in xValues]
    # Initialize figure
    fig = plt.figure(figsize=(4, 3))
    ax = fig.add_axes([0, 0, 1, 1])
    # Plot Data
    ax.plot(xValues, yValues)
    plt.show()
    plt.close("all")
# Define Function
dxdt = -(1.0*(6.84e45*x**2 + 5.24e32*x - 2.49e42))/(2.47e39*x + 7.12e37)
# Run Plotting function, with left and right most points defined as tuple, and function as second argument
Plotting((-0.025, 0.05), dxdt)

How to implement a FIR high pass filter in Python?

First of all, I asked this question on Stack Exchange and got only conceptual answers rather than implementation-oriented ones. My problem is that I am trying to create a high-pass filter, which I implemented in Python as follows:
from numpy import cos, sin, pi, absolute, arange
from scipy.signal import kaiserord, lfilter, firwin, freqz, firwin2
from pylab import figure, clf, plot, xlabel, ylabel, xlim, ylim, title, grid, axes, show
# Nyquist rate.
nyq_rate = 48000 / 2
# Width of the roll-off region.
width = 500 / nyq_rate
# Attenuation in the stop band.
ripple_db = 12.0
num_of_taps, beta = kaiserord(ripple_db, width)
# Cut-off frequency.
cutoff_hz = 5000.0
# Estimate the filter coefficients.
if num_of_taps % 2 == 0:
    num_of_taps = num_of_taps + 1
taps = firwin(num_of_taps, cutoff_hz/nyq_rate, window=('kaiser', beta), pass_zero='highpass')
w, h = freqz(taps, worN=1024)
plot((w/pi)*nyq_rate, absolute(h), linewidth=2)
xlabel('Frequency (Hz)')
ylabel('Gain')
title('Frequency Response')
ylim(-0.05, 1.05)
grid(True)
show()
Looking at the frequency response, I am not getting the stop-band attenuation I expected. I want 12 dB of attenuation and I am not getting it. What am I doing wrong?
Change the pass_zero argument of firwin to False. In older SciPy releases that argument must be a boolean (i.e. True or False); SciPy 1.3 and later also accept strings such as 'highpass'. By setting it to False, you are selecting the behavior of the filter to be a high-pass filter (i.e. the filter does not pass the 0 frequency of the signal).
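To make the effect of pass_zero concrete, the following sketch designs the filter twice with the parameters from the question and compares the gain at 0 Hz and at the Nyquist frequency. The helper names gain_at_dc and gain_at_nyquist are hypothetical, introduced only for this illustration.

```python
import numpy as np
from scipy.signal import firwin, kaiserord

nyq_rate = 48000 / 2
width = 500 / nyq_rate          # transition width as a fraction of Nyquist
ripple_db = 12.0
cutoff = 5000.0 / nyq_rate      # cutoff as a fraction of Nyquist

num_taps, beta = kaiserord(ripple_db, width)
num_taps |= 1                   # force an odd tap count, as the high-pass design requires

lowpass = firwin(num_taps, cutoff, window=("kaiser", beta), pass_zero=True)
highpass = firwin(num_taps, cutoff, window=("kaiser", beta), pass_zero=False)


def gain_at_dc(taps):
    # The response at 0 Hz is the plain sum of the taps.
    return abs(np.sum(taps))


def gain_at_nyquist(taps):
    # The response at Nyquist is the sum of the taps with alternating signs.
    signs = (-1.0) ** np.arange(len(taps))
    return abs(np.sum(taps * signs))


print("low-pass :", gain_at_dc(lowpass), gain_at_nyquist(lowpass))
print("high-pass:", gain_at_dc(highpass), gain_at_nyquist(highpass))
```

With pass_zero=True the gain at 0 Hz is 1 and the gain at Nyquist is small; with pass_zero=False the roles are reversed, which is the high-pass behavior.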
Here's a variation of your script. I've added horizontal dashed lines that show the desired attenuation in the stop band (cyan) and desired ripple bounds in the pass band (red) as determined by your choice of ripple_db. I also plot vertical dashed lines (green) to indicate the region of the transition from the stop band to the pass band.
import numpy as np
from scipy.signal import kaiserord, lfilter, firwin, freqz, firwin2
import matplotlib.pyplot as plt
# Nyquist rate.
nyq_rate = 48000 / 2
# Width of the roll-off region.
width = 500 / nyq_rate
# Attenuation in the stop band.
ripple_db = 12.0
num_of_taps, beta = kaiserord(ripple_db, width)
if num_of_taps % 2 == 0:
    num_of_taps = num_of_taps + 1
# Cut-off frequency.
cutoff_hz = 5000.0
# Estimate the filter coefficients.
taps = firwin(num_of_taps, cutoff_hz/nyq_rate, window=('kaiser', beta), pass_zero=False)
w, h = freqz(taps, worN=4000)
plt.plot((w/np.pi)*nyq_rate, 20*np.log10(np.abs(h)), linewidth=2)
plt.axvline(cutoff_hz + width*nyq_rate, linestyle='--', linewidth=1, color='g')
plt.axvline(cutoff_hz - width*nyq_rate, linestyle='--', linewidth=1, color='g')
plt.axhline(-ripple_db, linestyle='--', linewidth=1, color='c')
delta = 10**(-ripple_db/20)
plt.axhline(20*np.log10(1 + delta), linestyle='--', linewidth=1, color='r')
plt.axhline(20*np.log10(1 - delta), linestyle='--', linewidth=1, color='r')
plt.xlabel('Frequency (Hz)')
plt.ylabel('Gain (dB)')
plt.title('Frequency Response')
plt.ylim(-40, 5)
plt.grid(True)
plt.show()
Here is the plot that it generates.
If you look closely, you'll see that the frequency response is close to the corners of the region that defines the desired behavior of the filter.
Here's the plot when ripple_db is changed to 21:

Simple scipy curve_fit test not returning expected results

I am trying to estimate the amplitude, frequency, and phase of an incoming signal of about 50 Hz based on measurement of only a few cycles. The frequency needs to be precise to 0.01 Hz. Since the signal itself is going to be a pretty clean sine wave, I am trying parameter fitting with SciPy's curve_fit. I've never used it before, so I wrote a quick test function.
I start by generating samples of a single cycle of a dummy cosine wave:
from math import *
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
fs = 1000 # Sampling rate (Hz)
T = .1 # Length of collection (s)
windowlength = int(fs*T) # Number of samples
f0 = 10 # Fundamental frequency of our wave (Hz)
wave = [0]*windowlength
for x in range(windowlength):
    wave[x] = cos(2*pi*f0*x/fs)
t = np.linspace(0,T,int(fs*T)) # This will be our x-axis for plotting
Then I try to fit those samples to a function, adapting the code from the official example provided by scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
# Define function to fit
def sinefit(x, A, ph, f):
    return A * np.sin(2*pi*f * x + ph)
# Call curve_fit
popt,cov = curve_fit(sinefit, t, wave, p0=[1,np.pi/2,10])
# Plot the result
plt.plot(t, wave, 'b-', label='data')
plt.plot(t, sinefit(t, *popt), 'r-', label='fit')
print("[Amplitude,phase,frequency]")
print(popt)
This gives me popt = [1., 1.57079633, 9.9] and the plot
plot output
My question is: why is my frequency off? I initialized the curve_fit function with the exact parameters of the cosine wave, so shouldn't the first iteration of the LM algorithm realize that there is zero residual and that it has already arrived at the correct answer? That seems to be the case for amplitude and phase, but frequency is 0.1Hz too low.
I expect this is a dumb coding mistake, since the original wave and the fit are clearly lined up in the plot. I also confirmed that the difference between them was zero across the entire sample. If they really were .1 Hz out of phase, there would be a phase shift of 3.6 degrees over my 100ms window.
Any thoughts would be very much appreciated!
The problem is that your array t is not correct. The last value in your t is 0.1, but with a sampling period of 1/fs = 0.001, the last value in t should be 0.099. That is, the times of the 100 samples are [0, 0.001, 0.002, ..., 0.098, 0.099].
You can create t correctly with either
t = np.linspace(0, T, int(fs*T), endpoint=False)
or
t = np.arange(windowlength)/fs # Use float(fs) if you are using Python 2
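The effect can be reproduced side by side. In the sketch below (variable names t_bad and t_good are hypothetical), the same samples are fitted against both time axes. Because linspace with the endpoint included spaces the 100 samples by T/99 instead of 1/fs, the time axis is stretched and the fitted frequency is scaled by 99/100, which gives 9.9 Hz instead of 10 Hz.

```python
import numpy as np
from scipy.optimize import curve_fit

fs = 1000                      # sampling rate (Hz)
T = 0.1                        # length of collection (s)
n = int(fs * T)                # number of samples
f0 = 10                        # true frequency (Hz)
wave = np.cos(2 * np.pi * f0 * np.arange(n) / fs)


def sinefit(x, A, ph, f):
    return A * np.sin(2 * np.pi * f * x + ph)


p0 = [1, np.pi / 2, 10]
t_bad = np.linspace(0, T, n)                    # last sample at 0.1 s, spacing T/(n-1)
t_good = np.linspace(0, T, n, endpoint=False)   # last sample at 0.099 s, spacing 1/fs

popt_bad, _ = curve_fit(sinefit, t_bad, wave, p0=p0)
popt_good, _ = curve_fit(sinefit, t_good, wave, p0=p0)

print("endpoint=True :", popt_bad[2])
print("endpoint=False:", popt_good[2])
```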

How to use rp, rs, and Wn parameters in scipy.signal.filter_design.ellip?

I'd like to try out the elliptic filter design function from SciPy in scipy.signal.filter_design.ellip. I'm familiar with the filter design functions in Octave, but I'm not sure how to use this:
From the documentation at http://www.scipy.org/doc/api_docs/SciPy.signal.filter_design.html
ellip(N, rp, rs, Wn, btype = 'low', analog = 0, output = 'ba')
Elliptic (Cauer) digital and analog filter design.
Description:
Design an Nth order lowpass digital or analog elliptic filter and return the filter coefficients in (B,A) or (Z,P,K) form.
See also ellipord.
I understand N (order), btype (low or high), analog (true/false), and output (ba vs. zpk).
What are rp, rs, and Wn and how are they supposed to work?
From my experience with Octave, I'm guessing that rp and rs have to do with the maximum allowed ripple in the pass and stop bands, and that Wn is a weight or controls the cutoff frequency, but how these work isn't documented and I can't find any examples.
I believe HYRY is correct. In my experience, the Python clones of the MATLAB functions work well, apart from the poor documentation. Yes, rp is the maximum allowable ripple in the passband and rs is the minimum attenuation in the stopband, both in dB. Wn is the digital cutoff, or passband edge frequency, normalized so that 1 corresponds to the Nyquist frequency.
Here's some code that uses it to replicate the filter that MathWorks uses as an example:
import scipy.signal
import matplotlib.pyplot as plt
import numpy as np
# Order 6, 3 dB passband ripple, 50 dB stopband attenuation, edge at 300/500 of Nyquist
b, a = scipy.signal.ellip(6, 3, 50, 300.0/500.0)
fig = plt.figure()
ax1 = fig.add_subplot(111)
plt.title('Digital filter frequency response')
# freqz returns the frequencies first, then the complex response
w, h = scipy.signal.freqz(b, a)
plt.semilogy(w, np.abs(h), 'b')
plt.ylabel('Amplitude', color='b')
plt.xlabel('Frequency (rad/sample)')
plt.grid()
ax2 = ax1.twinx()
angles = np.unwrap(np.angle(h))
plt.plot(w, angles, 'g')
plt.ylabel('Angle (radians)', color='g')
plt.show()
Sorry the formatting is rough, but it works! You'll notice that the frequency scale differs from what MATLAB shows; that is just cosmetic. This is what you get:
I think this function is the same as Octave or MATLAB, so you can read the MATLAB document about it.
http://www.mathworks.com/help/toolbox/signal/ref/ellip.html
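The roles of rp, rs and Wn can also be checked numerically instead of being read off a plot. The sketch below designs the same 6th-order filter and measures the gain in both bands. The 0.7 boundary used for the stopband check is an assumption of this example, chosen to lie safely past the narrow transition band, and the variable names are illustrative.

```python
import numpy as np
from scipy.signal import ellip, freqz

# Order, passband ripple (dB), stopband attenuation (dB), normalized passband edge.
N, rp, rs, Wn = 6, 3, 50, 0.6
b, a = ellip(N, rp, rs, Wn)        # low-pass by default, (b, a) output

w, h = freqz(b, a, worN=2048)      # w in rad/sample, from 0 up to (not including) pi
# The floor avoids log10(0) at the stopband zeros.
gain_db = 20 * np.log10(np.maximum(np.abs(h), 1e-12))

passband = gain_db[w <= Wn * np.pi]
stopband = gain_db[w >= 0.7 * np.pi]   # assumed point past the transition band

print("passband gain range (dB):", passband.min(), passband.max())
print("worst stopband gain (dB):", stopband.max())
```

The passband gain should stay between about -rp dB and 0 dB, and the stopband gain should stay at or below about -rs dB, which matches the description of the parameters given above.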