Eigenvalues of a Laplacian in NetworkX - scipy

NetworkX has a decent code example for getting all the eigenvalues of a Laplacian matrix, given below:
import matplotlib.pyplot as plt
import networkx as nx
import numpy.linalg
n = 1000 # 1000 nodes
m = 5000 # 5000 edges
G = nx.gnm_random_graph(n, m)
L = nx.normalized_laplacian_matrix(G)
e = numpy.linalg.eigvals(L.A)
print("Largest eigenvalue:", max(e))
print("Smallest eigenvalue:", min(e))
plt.hist(e, bins=100) # histogram with 100 bins
plt.xlim(0, 2) # eigenvalues between 0 and 2
plt.show()
For the most part I follow all of this until you hit numpy.linalg.eigvals(L.A). What's the .A bit doing? I've looked at the documentation for sparse matrices in SciPy, but I can't find a reference to this.

L.A is shorthand for L.toarray(). It returns the dense ndarray representation of the sparse matrix.
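A quick way to convince yourself (a minimal check, not taken from the NetworkX docs; note that newer SciPy sparse array types may drop the .A shorthand, in which case .toarray() is the portable spelling):
import networkx as nx
import numpy as np
G = nx.path_graph(4)
L = nx.normalized_laplacian_matrix(G)
# .A simply calls .toarray(): both produce the same dense ndarray
print(np.array_equal(L.A, L.toarray()))  # True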

Related

Create a weights adjacency matrix in python with networkX

I want to implement Dijkstra's algorithm in Python with a weighted adjacency matrix, but NetworkX only gives us the adjacency matrix without the weights (distances, for my algorithm), so I searched for a way to create a weighted adjacency matrix but couldn't find one. The only code I found from NetworkX is:
A = nx.adjacency_matrix(G, weight='weight')
This is the rest of my code:
G = ox.graph_from_bbox(nord, sud, est, ouest, network_type='drive')
noeud_origine = ox.get_nearest_node(G, point_origine)
noeud_destination = ox.get_nearest_node(G, point_destination)
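For what it's worth, nx.adjacency_matrix (and nx.to_numpy_array) do include the weights as long as the edges carry the named attribute; OSMnx graphs usually store distances under 'length' rather than 'weight'. A minimal sketch (the attribute name is an assumption about OSMnx defaults):
import networkx as nx
G = nx.Graph()
G.add_edge(0, 1, length=120.0)  # OSMnx typically stores edge distances as 'length'
G.add_edge(1, 2, length=80.0)
# Dense weighted adjacency matrix; absent edges become 0.0
A = nx.to_numpy_array(G, weight='length')
print(A)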

FFT not showing any dominant frequencies

I am trying to perform an FFT on time-series data of DC motor current from "F.A.I.R. open dataset of brushed DC motor faults for testing of AI algorithms". However, the result does not show any dominant frequency bands; it just resembles broadband noise. The first image is a zoomed-in snapshot of the time series (the entire series is over 100,000 data points), after the DC portion has been subtracted.
[Image: time series graph]
The second image is the FFT graph and my code is below. The time period is not yet set correctly, but this does not affect the shape of the data, only the frequency values assigned to it.
[Image: FFT graph]
import matplotlib.pyplot as plt
import numpy as np
import h5py
from scipy.fft import fft, fftfreq

filename = "MOTOR-DC_2020_12_02_17_59_47_Analogico.hdf5"
#MOTOR-DC_2020_12_02_17_59_47_Analogico.hdf5
#MOTOR-DC_2020_12_02_17_30_42_Analogico.hdf5
with h5py.File(filename, "r") as f:
    # List all groups
    print("Keys: %s" % f.keys())
    a_group_key = list(f.keys())[0]
    # Get the data
    data = list(f[a_group_key])
# Split the channels into separate lists
vibration = [data[i][0] for i in range(len(data))]
current = [data[i][1] for i in range(len(data))]
voltage = [data[i][2] for i in range(len(data))]
# Number of sample points
N = len(data)
# Sample spacing
T = 0.0001
x = np.linspace(0.0, N*T, N, endpoint=False)
y = current
y_mean = np.mean(y)
y_med = np.median(y)
print('Mean =', y_mean, 'Median =', y_med)
# Subtract the DC offset
y = np.asarray(y) - y_mean
#plt.plot(x, current)
yf = fft(y)
xf = fftfreq(n=N, d=T)[:N//2]
plt.plot(xf, 2.0/N * np.abs(yf[0:N//2]))
plt.grid()
plt.show()
Try
yf = fft(y)
xf = fftfreq(N)
xf[np.argmax(np.abs(yf))]
This will give you the normalized frequency of the most prominent harmonic.
You can then multiply it by the sampling frequency to get the actual frequency.
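For example (a self-contained sketch with a synthetic tone standing in for the motor current, using the T = 0.0001 s spacing from the question):
import numpy as np
from scipy.fft import fft, fftfreq
T = 0.0001                          # sample spacing, i.e. a 10 kHz sampling rate
t = np.arange(5000) * T
y = np.sin(2 * np.pi * 50 * t)      # synthetic 50 Hz tone in place of the current
yf = fft(y)
xf = fftfreq(len(y))                # normalized frequency, cycles per sample
peak = abs(xf[np.argmax(np.abs(yf))])
print(peak / T)                     # 50.0: normalized frequency times the sampling frequency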

Remove noise and smoothen the ecg signal

I am processing Long term afib dataset - https://physionet.org/content/ltafdb/1.0.0/
When I test the 30 s strips of this data, my model is not correctly predicting the signals, so I am trying to deal with the noise in this dataset. Here is how it looks.
Here is the code to plot -
def plot_filter_graphs(data, xmin, xmax, order):
    import numpy as np
    from scipy import signal
    from scipy.signal import lfilter, filtfilt
    from matplotlib.pyplot import plot, legend, show, grid, figure, xlim
    lowcut = 1
    highcut = 35
    nyq = 0.5 * 300  # Nyquist frequency for a 300 Hz sampling rate
    low = lowcut / nyq
    high = highcut / nyq
    b, a = signal.butter(order, [low, high], btype='band')
    # Apply the filter with a single forward pass (introduces phase delay).
    z = lfilter(b, a, data)
    # Apply the filter with filtfilt (forward-backward, zero phase),
    # then one extra reversed lfilter pass.
    y = filtfilt(b, a, data)
    y = np.flipud(y)
    y = signal.lfilter(b, a, y)
    y = np.flipud(y)
    # Make the plot.
    figure(figsize=(16, 5))
    plot(data, 'b', linewidth=1.75)
    plot(z, 'r--', linewidth=1.75)
    plot(y, 'k', linewidth=1.75)
    xlim(xmin, xmax)
    legend(('actual', 'lfilter', 'filtfilt'), loc='best')
    grid(True)
    show()
I am using a Butterworth band-pass filter to remove the noise. I also tried filtfilt and lfilter, but neither gives a good result.
Any suggestions on how the noise can be removed so that the signal is clean enough to be used for model prediction?
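One idea worth trying (a sketch, not a verified fix for this dataset): design the same 1-35 Hz Butterworth band-pass in second-order sections, which is numerically better behaved than the (b, a) form at higher orders, and apply it with sosfiltfilt for zero-phase filtering:
import numpy as np
from scipy import signal
fs = 300.0                               # sampling rate implied by nyq = 0.5 * 300 in the question
sos = signal.butter(4, [1, 35], btype='band', fs=fs, output='sos')
ecg = np.random.randn(10 * int(fs))      # stand-in for a 10 s ECG strip
filtered = signal.sosfiltfilt(sos, ecg)  # forward-backward, zero-phase filtering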

How to calculate the cosine similarity of two vectors in PySpark?

I want to compute the cosine similarity of two vectors in PySpark, like
1 - spatial.distance.cosine(xvec, yvec)
but SciPy does not seem to support the pyspark.ml.linalg.Vector type.
You can use dot and norm methods to calculate this pretty easily:
from pyspark.ml.linalg import Vectors
x = Vectors.dense([1,2,3])
y = Vectors.dense([2,3,5])
1 - x.dot(y)/(x.norm(2)*y.norm(2))
# 0.0028235350472619603
With scipy:
import numpy as np
from scipy.spatial.distance import cosine

x = np.array([1,2,3])
y = np.array([2,3,5])
cosine(x, y)
# 0.0028235350472619603
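If you need this per row of a DataFrame rather than for standalone vectors, a sketch wrapping the same dot/norm recipe in a UDF (the column names xvec and yvec are made up for illustration):
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType
spark = SparkSession.builder.getOrCreate()
@udf(DoubleType())
def cos_sim(x, y):
    # same dot/norm recipe as above, applied to two vector columns
    return float(x.dot(y) / (x.norm(2) * y.norm(2)))
# df = df.withColumn('cos_sim', cos_sim('xvec', 'yvec'))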

Using scipy.stats.gaussian_kde with 2 dimensional data

I'm trying to use the scipy.stats.gaussian_kde class to smooth out some discrete data collected with latitude and longitude information, so that the result looks somewhat like a contour map, where high densities are the peaks and low densities are the valleys.
I'm having a hard time putting a two-dimensional dataset into the gaussian_kde class. I've played around to figure out how it works with 1-dimensional data, so I thought 2-dimensional would be something along the lines of:
from scipy import stats
from numpy import array
data = array([[1.1, 1.1],
              [1.2, 1.2],
              [1.3, 1.3]])
kde = stats.gaussian_kde(data)
kde.evaluate([1,2,3],[1,2,3])
which says that I have 3 points at [1.1, 1.1], [1.2, 1.2], and [1.3, 1.3], and I want the kernel density estimate evaluated from 1 to 3 with a width of 1 on the x and y axes.
When creating the gaussian_kde, it keeps giving me this error:
raise LinAlgError("singular matrix")
numpy.linalg.linalg.LinAlgError: singular matrix
Looking into the source code of gaussian_kde, I realize that the way I'm thinking about what the dataset means is completely different from how the dimensionality is calculated, but I could not find any sample code showing how multi-dimensional data works with the module. Could someone help me with some sample ways to use gaussian_kde with multi-dimensional data?
This example seems to be what you're looking for:
import numpy as np
import scipy.stats as stats
from matplotlib.pyplot import imshow
# Create some dummy data
rvs = np.append(stats.norm.rvs(loc=2, scale=1, size=(2000, 1)),
                stats.norm.rvs(loc=0, scale=3, size=(2000, 1)),
                axis=1)
kde = stats.kde.gaussian_kde(rvs.T)
# Regular grid to evaluate kde upon
x_flat = np.r_[rvs[:,0].min():rvs[:,0].max():128j]
y_flat = np.r_[rvs[:,1].min():rvs[:,1].max():128j]
x,y = np.meshgrid(x_flat,y_flat)
grid_coords = np.append(x.reshape(-1,1),y.reshape(-1,1),axis=1)
z = kde(grid_coords.T)
z = z.reshape(128,128)
imshow(z,aspect=x_flat.ptp()/y_flat.ptp())
Axes need fixing, obviously.
You can also do a scatter plot of the data with
scatter(rvs[:,0],rvs[:,1])
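On the "axes need fixing" point, a sketch that maps the image back into data coordinates by passing extent and origin to imshow (reusing z, x_flat and y_flat from above):
from matplotlib.pyplot import imshow, show
# Rows of z correspond to y, columns to x, so origin='lower' with the
# data ranges as extent lines the image up with the data coordinates.
imshow(z, origin='lower',
       extent=(x_flat.min(), x_flat.max(), y_flat.min(), y_flat.max()),
       aspect='auto')
show()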
I think you are mixing up kernel density estimation with interpolation or maybe kernel regression. KDE estimates the distribution of points if you have a larger sample of points.
I'm not sure which interpolation you want, but either the splines or rbf in scipy.interpolate will be more appropriate.
If you want one-dimensional kernel regression, then you can find a version in scikits.statsmodels with several different kernels.
update: here is an example (if this is what you want)
>>> data = 2 + 2*np.random.randn(2, 100)
>>> kde = stats.gaussian_kde(data)
>>> kde.evaluate(np.array([[1,2,3],[1,2,3]]))
array([ 0.02573917, 0.02470436, 0.03084282])
gaussian_kde expects variables in rows and observations in columns, the reverse of the usual orientation in stats. In your example, all three points lie on a line, so they are perfectly correlated. That is, I guess, the reason for the singular matrix.
Adjusting the array orientation and adding a small amount of noise, the example works, but it still looks very concentrated; for example, you don't have any sample point near (3, 3):
>>> data = np.array([[1.1, 1.1],
...                  [1.2, 1.2],
...                  [1.3, 1.3]]).T
>>> data = data + 0.01*np.random.randn(2,3)
>>> kde = stats.gaussian_kde(data)
>>> kde.evaluate(np.array([[1,2,3],[1,2,3]]))
array([ 7.70204299e+000, 1.96813149e-044, 1.45796523e-251])
I found it difficult to understand the SciPy manual's description of how gaussian_kde works with 2D data. Here is an explanation which is intended to complement #endolith's example. I divided the code into several steps with comments to explain the less intuitive bits.
First, the imports:
import numpy as np
import scipy.stats as st
from matplotlib.pyplot import imshow, show
Create some dummy data: these are 1-D arrays of the "X" and "Y" point coordinates.
np.random.seed(142) # for reproducibility
x = st.norm.rvs(loc=2, scale=1, size=2000)
y = st.norm.rvs(loc=0, scale=3, size=2000)
For 2-D density estimation the gaussian_kde object has to be initialised with an array with two rows containing the "X" and "Y" datasets. In NumPy terminology, we "stack them vertically":
xy = np.vstack((x, y))
so the "X" data is in the first row xy[0,:] and the "Y" data are in the second row xy[1,:] and xy.shape is (2, 2000). Now create the gaussian_kde object:
dens = st.gaussian_kde(xy)
We will evaluate the estimated 2-D density PDF on a 2-D grid. There is more than one way of creating such a grid in NumPy. I show here an approach which is different from (but functionally equivalent to) #endolith's method:
gx, gy = np.mgrid[x.min():x.max():128j, y.min():y.max():128j]
gxy = np.dstack((gx, gy)) # shape is (128, 128, 2)
gxy is a 3-D array; the [i, j]-th element of gxy contains the corresponding pair of "X" and "Y" values: gxy[i, j] is [gx[i, j], gy[i, j]].
We have to invoke dens() (or dens.pdf() which is the same thing) on each of the 2-D grid points. NumPy has a very elegant function for this purpose:
z = np.apply_along_axis(dens, 2, gxy)
In words, the callable dens (could have been dens.pdf as well) is invoked along axis=2 (the third axis) in the 3-D array gxy and the values should be returned as a 2-D array. The only glitch is that the shape of z will be (128, 128, 1) and not (128, 128) as I expected. Note that the documentation says that:
The shape of out [the return value, L.D.] is identical to the shape of arr, except along the
axis dimension. This axis is removed, and replaced with new dimensions
equal to the shape of the return value of func1d. So if func1d returns
a scalar out will have one fewer dimensions than arr.
Most likely dens() returned a length-1 array rather than the scalar I was hoping for. I didn't investigate the issue any further, because this is easy to fix:
z = z.reshape(128, 128)
after which we can generate the image:
imshow(z, aspect=gx.ptp() / gy.ptp())
show() # needed if you try this in PyCharm
Here is the image. (Note that I have implemented #endolith's version as well and got an image indistinguishable from this one.)
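As an aside, the apply_along_axis step (and the stray trailing dimension) can be avoided: gaussian_kde accepts a (2, N) array of stacked coordinates directly, which is also considerably faster. A sketch reusing dens, gx and gy from above:
import numpy as np
# Flatten the grid to shape (2, 128*128), evaluate in one call, reshape back
z = dens(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)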
The example posted in the top answer didn't work for me. I had to tweak it a little bit and it works now:
import numpy as np
import scipy.stats as stats
from matplotlib import pyplot as plt
# Create some dummy data
rvs = np.append(stats.norm.rvs(loc=2, scale=1, size=(2000, 1)),
                stats.norm.rvs(loc=0, scale=3, size=(2000, 1)),
                axis=1)
kde = stats.kde.gaussian_kde(rvs.T)
# Regular grid to evaluate kde upon
x_flat = np.r_[rvs[:,0].min():rvs[:,0].max():128j]
y_flat = np.r_[rvs[:,1].min():rvs[:,1].max():128j]
x,y = np.meshgrid(x_flat,y_flat)
grid_coords = np.append(x.reshape(-1,1),y.reshape(-1,1),axis=1)
z = kde(grid_coords.T)
z = z.reshape(128,128)
plt.imshow(z,aspect=x_flat.ptp()/y_flat.ptp())
plt.show()