I am creating a graph from a numpy array whose element values represent edge weights. See the code below. This graph clearly has one isolate, so why is the code for getting isolates giving me an empty list?
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> import numpy
>>> A = numpy.matrix([[3, 0, 1], [0, 2, 0], [0, 0, 5]])
>>> G = nx.from_numpy_matrix(A, parallel_edges=False)
>>> A
matrix([[3, 0, 1],
        [0, 2, 0],
        [0, 0, 5]])
>>> nx.draw(G, node_color='green', node_size=50, with_labels=False)
>>> plt.show()
>>> nx.degree(G)
DegreeView({0: 3, 1: 2, 2: 3})
>>> list(nx.isolates(G))
[]
According to the networkx documentation, an 'isolate' is a node of degree 0.
In your case the node has an edge to itself, and a self-loop contributes 2 to a node's degree, so the node does not have degree 0.
One option to fix this would be to remove the self-loops from the graph:
https://stackoverflow.com/a/49428652/2966723
But only do this if they aren't needed for whatever your application is.
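A minimal sketch of that fix (networkx 2.x+ API; note that from_numpy_matrix was removed in networkx 3.x, so from_numpy_array is used here):

import networkx as nx
import numpy as np

A = np.array([[3, 0, 1], [0, 2, 0], [0, 0, 5]])
G = nx.from_numpy_array(A)
G.remove_edges_from(list(nx.selfloop_edges(G)))  # drop all self-loops
print(list(nx.isolates(G)))  # [1] -- node 1 now has degree 0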
SciPy calculates the rmsd like this; I'll paraphrase it here for convenience (for readability I drop the weights and the max(*, 0)):
rmsd = np.sqrt(np.sum(b ** 2 + a ** 2) - 2 * np.sum(s))
To me this does not look like RMSD.
Now, from the docs, one would infer that the rmsd return value is defined as the square root of double this expression: [the post shows the loss from the docs here, of the form L(C) = (1/2) * sum_i w_i * ||a_i - C b_i||**2]
The latter is indeed what I would consider to be the RMSD. In fact, I went ahead and coded it up (note that this function expects the estimated transformation to already be applied to one of the sets of points, whereas the snippet above does not):
import numpy as np

def _calc_rmsd(a: np.ndarray, b_transformed: np.ndarray) -> float:
    # Root mean square of the pointwise Euclidean distances.
    distances = np.linalg.norm(a - b_transformed, axis=-1)
    rmsd = np.sqrt((distances ** 2).sum() / len(distances))
    return rmsd
I also plotted what these look like for randomly generated point pairs with normally distributed noise, blue being scipy and orange mine (plot omitted), and again with the plot extended out to 200 point pairs (also omitted).
So to sum it up:
- The definition of rmsd in the docs agrees with what I believe to be the widely accepted notion of RMSD.
- The SciPy code implementation of rmsd disagrees with that definition; I don't even understand what it's supposed to represent mathematically.
- Monte Carlo simulation shows that the two implementations clearly have different outcomes.
So what's going on?
Apparently the SciPy code is not returning the root-mean-square distance. It sums the squared distances, but it does not divide by the number of vectors before taking the square root. The difference between the SciPy calculation and yours is therefore a factor of sqrt(len(a)).
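To spell out the relationship (a restatement assuming unit weights, with d_i = ||a_i - R(b_i)|| and N points): the expression in the SciPy source equals sqrt(sum_i d_i**2), since ||a_i - R b_i||**2 = ||a_i||**2 + ||b_i||**2 - 2 a_i . (R b_i) and, at the optimal rotation, sum_i a_i . (R b_i) equals the sum of singular values s (the standard Kabsch/Wahba result). The conventional RMSD is sqrt(sum_i d_i**2 / N), i.e. smaller by a factor of sqrt(N). You can verify this with an example such as the following.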
In [157]: from scipy.spatial.transform import Rotation
In [158]: def _calc_rmsd(a: np.ndarray, b_transformed: np.ndarray) -> float:
...: distances = np.linalg.norm(a - b_transformed, axis=-1)
...: rmsd = np.sqrt((distances ** 2).sum() / len(distances))
...: return rmsd
...:
Some test data:
In [159]: a = np.array([[0, 1, 1], [1, 1, 1.5], [2.0, -1.0, 4.0], [-1, 0, 5]])
In [160]: b = np.array([[0, 1, 1.5], [2, 2, 2], [1, -1, 5], [-3, 0.1, 1]])
Compute the rotation:
In [161]: R, rmsd = Rotation.align_vectors(a, b)
In [162]: rmsd
Out[162]: 3.8753534834716685
Here's your calculation of the RMSD:
In [163]: _calc_rmsd(a, R.apply(b))
Out[163]: 1.9376767417358356
And here is your calculation, multiplied by sqrt(len(a)), so it matches the result returned by Rotation.align_vectors:
In [164]: _calc_rmsd(a, R.apply(b)) * np.sqrt(len(a))
Out[164]: 3.875353483471671
This looks like a documentation issue. If you have a moment, you could create a new issue for this over in https://github.com/scipy/scipy/issues
I have a dataset that contains a cross-like pattern. How can I filter off this pattern?
I tried DBSCAN, but it didn't work effectively. I also can't use any clustering method that requires specifying the number of clusters, since the data cleaning needs to be automated.
Sorry, forget SVM; that's for classification. Honestly, I didn't read your question carefully the first time. Having re-read what you posted originally: try Mean Shift, which detects the number of clusters automatically. Here's an example; hopefully you can adapt it for your specific use.
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# #############################################################################
# Generate sample data
centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=10000, centers=centers, cluster_std=0.2)

# #############################################################################
# Compute clustering with MeanShift

# The following bandwidth can be automatically detected using estimate_bandwidth
bandwidth = estimate_bandwidth(X, quantile=0.6, n_samples=5000)

ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
ms.fit(X)
labels = ms.labels_
cluster_centers = ms.cluster_centers_

labels_unique = np.unique(labels)
n_clusters_ = len(labels_unique)

print("number of estimated clusters : %d" % n_clusters_)

# #############################################################################
# Plot result
import matplotlib.pyplot as plt
from itertools import cycle

plt.figure(1)
plt.clf()

colors = cycle('bgrcmykbgrcmykbgrcmykbgrcmyk')
for k, col in zip(range(n_clusters_), colors):
    my_members = labels == k
    cluster_center = cluster_centers[k]
    plt.plot(X[my_members, 0], X[my_members, 1], col + '.')
    plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=14)
plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()
Or, try this.
import numpy as np
from sklearn.cluster import MeanShift
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.datasets import make_blobs

# We use the make_blobs method to generate our own data.
clusters = [[2, 2, 2], [7, 7, 7], [5, 13, 13]]
X, _ = make_blobs(n_samples=150, centers=clusters, cluster_std=0.60)

# After training the model, we store the coordinates of the cluster centers.
ms = MeanShift()
ms.fit(X)
cluster_centers = ms.cluster_centers_

# Finally, we plot the data points and centroids in a 3D graph.
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], marker='o')
ax.scatter(cluster_centers[:, 0], cluster_centers[:, 1], cluster_centers[:, 2],
           marker='x', color='red', s=300, linewidth=5, zorder=10)
plt.show()
There are a few clustering methodologies that help you choose the optimal number of clusters automatically. Check out the link below for some ideas of how to move forward with your project.
https://scikit-learn.org/stable/modules/clustering.html
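For instance, here is a minimal sketch with OPTICS from scikit-learn, which, like Mean Shift and DBSCAN, does not need the number of clusters up front (the parameter values below are illustrative, not tuned for your data):

import numpy as np
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=[[1, 1], [-1, -1]], cluster_std=0.2)
labels = OPTICS(min_samples=10).fit_predict(X)  # label -1 marks noise points
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("estimated clusters:", n_clusters)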
I would like to implement unsupervised clustering to detect grids (vertical/horizontal lines) in spatial points.
I have tried DBSCAN and it gives subpar results. It is able to pick out the grids, as seen in red below:
However, it is not able to completely pick out all the points that form the vertical/horizontal lines, and if I relax the epsilon parameter, it will incorrectly classify more points as noisy (e.g. the bottom left of the picture).
I was wondering if there is a variant of DBSCAN that uses ellipses instead of circles? Or any other clustering method recommended for this that does not need the number of clusters prespecified?
Or is there a better method to identify the points that make up the grid? Any help is appreciated.
You can use an anisotropic DBSCAN by rescaling your data this way: anisotropy values < 1 compress the y-axis and find vertical clusters, while values > 1 find horizontal clusters (the example below demonstrates both).
from sklearn.cluster import DBSCAN

def anisotropical_DBSCAN(X, anisotropy, eps, min_samples):
    """Anisotropic DBSCAN: scale the y-axis by `anisotropy`, then run
    ordinary DBSCAN. Note that this modifies X in place. Returns the
    fitted DBSCAN object."""
    X[:, 1] = X[:, 1] * anisotropy
    db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
    return db
Here is a full example with data :
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(
    n_samples=750, centers=centers, cluster_std=0.4, random_state=0
)
print(X.shape)

def anisotropical_DBSCAN(X, anisotropy, eps, min_samples):
    """Anisotropic DBSCAN: scale the y-axis by `anisotropy`, then run
    ordinary DBSCAN. Note that this modifies X in place. Returns the
    fitted DBSCAN object."""
    X[:, 1] = X[:, 1] * anisotropy
    db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
    return db
db = anisotropical_DBSCAN(X, anisotropy = 0.1, eps = 0.1, min_samples = 10)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
# #############################################################################
# Plot result
import matplotlib.pyplot as plt

# Black removed and is used for noise instead.
unique_labels = set(labels)
colors = [plt.cm.Spectral(each) for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = [0, 0, 0, 1]

    class_member_mask = labels == k

    xy = X[class_member_mask & core_samples_mask]
    plt.plot(
        xy[:, 0],
        xy[:, 1],
        "o",
        markerfacecolor=tuple(col),
        markeredgecolor="k",
        markersize=14,
    )

    xy = X[class_member_mask & ~core_samples_mask]
    plt.plot(
        xy[:, 0],
        xy[:, 1],
        "o",
        markerfacecolor=tuple(col),
        markeredgecolor="k",
        markersize=6,
    )

plt.title("Estimated number of clusters: %d" % n_clusters_)
plt.show()
You get vertical clusters:
Now change the parameters to db = anisotropical_DBSCAN(X, anisotropy=10, eps=1, min_samples=10). I had to change the eps value because the horizontal and vertical scales of this sample data aren't the same, but in your case you should be able to keep the same (eps, min_samples) for detecting both line orientations.
And you get horizontal clusters:
There are also implementations of anisotropic DBSCAN that are probably a lot cleaner: https://github.com/gissong/ADCN
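As a hypothetical usage sketch for the grid-filtering goal (the parameter values here are illustrative, not tuned): run the scan once per orientation and keep any point that is non-noise in either pass. Since the function above scales X in place, pass a copy each time.

vertical = anisotropical_DBSCAN(X.copy(), anisotropy=0.1, eps=0.1, min_samples=10)
horizontal = anisotropical_DBSCAN(X.copy(), anisotropy=10, eps=1, min_samples=10)
grid_mask = (vertical.labels_ != -1) | (horizontal.labels_ != -1)  # -1 = noise
grid_points = X[grid_mask]  # points lying on vertical/horizontal lines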
I am trying to compute a per-channel gradient image in PyTorch. To do this, I want to perform a standard 2D convolution with a Sobel filter on each channel of the image, using the torch.nn.functional.conv2d function.
In my minimum working example below, I get an error:
import torch
import torch.nn.functional as F
filters = torch.autograd.Variable(torch.randn(1,1,3,3))
inputs = torch.autograd.Variable(torch.randn(1,3,10,10))
out = F.conv2d(inputs, filters, padding=1)
RuntimeError: Given groups=1, weight[1, 1, 3, 3], so expected
input[1, 3, 10, 10] to have 1 channels, but got 3 channels instead
This suggests that groups need to be 3. However, when I make groups=3, I get a different error:
import torch
import torch.nn.functional as F
filters = torch.autograd.Variable(torch.randn(1,1,3,3))
inputs = torch.autograd.Variable(torch.randn(1,3,10,10))
out = F.conv2d(inputs, filters, padding=1, groups=3)
RuntimeError: invalid argument 4: out of range at
/usr/local/src/pytorch/torch/lib/TH/generic/THTensor.c:440
When I check that code snippet in the THTensor class, it refers to a bunch of dimension checks, but I don't know where I'm going wrong.
What does this error mean? How can I perform my intended convolution with this conv2d function? I believe I am misunderstanding the groups parameter.
If you want to apply a per-channel convolution, then your out-channel count should be the same as your in-channel count. This is expected, since each of your input channels creates a separate output channel that it corresponds to.
In short, this will work
import torch
import torch.nn.functional as F
filters = torch.autograd.Variable(torch.randn(3,1,3,3))
inputs = torch.autograd.Variable(torch.randn(1,3,10,10))
out = F.conv2d(inputs, filters, padding=1, groups=3)
whereas filters of size (2, 1, 3, 3) or (1, 1, 3, 3) will not work.
Additionally, you can also make your out-channel count a multiple of your in-channel count. This works for instances where you want multiple convolution filters for each input channel.
However, this only makes sense if it is an exact multiple. If not, PyTorch (at the time this was written) fell back to the closest multiple, a number less than what you specified: for example, a filter of size (4, 1, 3, 3) or (5, 1, 3, 3) resulted in an out-channel count of 3. Note that current PyTorch versions instead raise an error when the out-channel count is not a multiple of groups.
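Coming back to the per-channel Sobel goal from the question, here is a minimal sketch (the kernel values are the standard Sobel-x kernel; plain tensors are used since torch.autograd.Variable is no longer needed in modern PyTorch):

import torch
import torch.nn.functional as F

sobel_x = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]])
# Weight shape is (out_channels, in_channels / groups, kH, kW) = (3, 1, 3, 3).
weight = sobel_x.view(1, 1, 3, 3).repeat(3, 1, 1, 1)

image = torch.randn(1, 3, 10, 10)  # (batch, channels, height, width)
grad_x = F.conv2d(image, weight, padding=1, groups=3)  # per-channel x-gradient
print(grad_x.shape)  # torch.Size([1, 3, 10, 10])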
In the Tensorflow docs, the tf.nn.conv2d-operation is described to:
Flatten the filter to a 2-D matrix with shape [filter_height * filter_width * in_channels, output_channels].
Extract image patches from the input tensor to form a virtual tensor of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].
For each patch, right-multiply the filter matrix and the image patch vector.
Is there an operation to apply just step 2? I cannot find anything like that in the API docs. I might be searching with the wrong keywords.
This is now added to the TensorFlow API as tf.extract_image_patches: https://www.tensorflow.org/versions/r0.9/api_docs/python/array_ops.html#extract_image_patches
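A minimal usage sketch (using the TF 2.x name tf.image.extract_patches; in TF 1.x the op is tf.extract_image_patches and the sizes argument is called ksizes):

import numpy as np
import tensorflow as tf

image = tf.constant(np.arange(1, 10).reshape((1, 3, 3, 1)), dtype=tf.float32)
patches = tf.image.extract_patches(
    images=image,
    sizes=[1, 3, 3, 1],    # patch height/width = filter height/width
    strides=[1, 1, 1, 1],
    rates=[1, 1, 1, 1],    # no dilation
    padding='SAME',
)
print(patches.shape)  # (1, 3, 3, 9) == [batch, out_height, out_width, 3*3*1]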
I guess a trick to do that would be to:
- Take a filter of shape [filter_height, filter_width, in_channels, output_channels] with output_channels = filter_height * filter_width * in_channels.
- Fix the values of this filter so that, when the filter is flattened to a 2-D matrix (cf. your step 2), it is the identity matrix. Check my example code below for a simple way to do that with np.eye().reshape().
- Perform a normal tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME').
You now have an output of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].
Here is simple code for a 3x3 input image with 1 channel (and batch size 1).
import tensorflow as tf
import numpy as np

# 3x3 input image with 1 channel, batch size 1.
input_value = np.arange(1, 10).reshape((1, 3, 3, 1))
input = tf.constant(input_value)
input = tf.cast(input, tf.float32)

# Identity filter: flattening it to a 2-D matrix gives the 9x9 identity.
filter_value = np.eye(9).reshape((3, 3, 1, 9))
filter = tf.constant(filter_value)
filter = tf.cast(filter, tf.float32)

output = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
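A quick check of the result (TF 1.x Session API assumed, to match the graph-mode code above): with the identity filter, the output at pixel (1, 1) should be its full row-major 3x3 neighborhood, i.e. the values 1 through 9.

with tf.Session() as sess:
    patches = sess.run(output)

print(patches.shape)     # (1, 3, 3, 9)
print(patches[0, 1, 1])  # [1. 2. 3. 4. 5. 6. 7. 8. 9.]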