Finding all subgraphs of depth 2 in NetworkX

I have a huge graph in networkx and I would like to get the subgraph of depth 2 around each node. Is there a nice way to do that using a built-in function in networkx?

As I said in the comment, networkx.ego_graph fits the bill. You just need to make sure that you set the radius to 2 (default is 1):
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
# create some test graph
graph = nx.erdos_renyi_graph(1000, 0.005)
# create an ego-graph for some node
node = 0
ego_graph = nx.ego_graph(graph, node, radius=2)
# plot to check
nx.draw(ego_graph); plt.show()
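The question asks for the depth-2 subgraph around every node, not just one. A minimal sketch of that, looping the same ego_graph call over all nodes (the smaller graph, the seed and the name `ego_graphs` are only there for illustration):

```python
import networkx as nx

# Small random test graph; size, edge probability and seed are arbitrary.
graph = nx.erdos_renyi_graph(50, 0.05, seed=0)

# One radius-2 ego graph per node, keyed by its centre node.
ego_graphs = {node: nx.ego_graph(graph, node, radius=2) for node in graph.nodes}

# Each ego graph holds the centre node plus everything at most 2 hops away.
print(len(ego_graphs), "ego graphs built")
```

On a huge graph it may be preferable to build and process them one at a time in a loop rather than keeping them all in a dict, since every ego graph is a separate copy of part of the original graph.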

Related

How to visualize HeteroData pytorch geometric graph with any tool?

Hello, what is a good way to visualize a PyG HeteroData object?
(defined similarly to: https://pytorch-geometric.readthedocs.io/en/latest/notes/heterogeneous.html#creating-heterogeneous-gnns )
I tried with networkx, but I think it is restricted to homogeneous graphs (it is possible to convert the data, but the result is much less informative).
g = torch_geometric.utils.to_networkx(data.to_homogeneous(), to_undirected=False )
Has anyone tried doing it with another Python library (matplotlib) or a JS one (sigma.js/d3.js)?
Any docs link you can share?
I have done the following:
import networkx as nx
import torch_geometric
from matplotlib import pyplot as plt
# `data` is the HeteroData object from the question
g = torch_geometric.utils.to_networkx(data.to_homogeneous())
# Networkx seems to create extra nodes from our heterogeneous graph, so I remove them
# (note: out_degree == 0 also matches nodes that only have incoming edges)
isolated_nodes = [node for node in g.nodes() if g.out_degree(node) == 0]
g.remove_nodes_from(isolated_nodes)
# Plot the graph
nx.draw(g, with_labels=True)
plt.show()
However, it's true that the graph gets "flattened" to a homogeneous one, while it would be more interesting to, for example, use different colors for different types of nodes.
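A rough sketch of the coloring idea, using plain networkx only. It assumes every node carries an integer `node_type` attribute; I believe something like that can be carried over from the homogeneous conversion (for instance through the node_attrs argument of to_networkx), but treat that part as an assumption. The toy graph and the palette are made up for illustration.

```python
import networkx as nx

# Toy directed graph standing in for the converted HeteroData graph.
# The integer "node_type" attribute on each node is an assumption here.
g = nx.DiGraph()
g.add_nodes_from([0, 1], node_type=0)
g.add_nodes_from([2, 3, 4], node_type=1)
g.add_edges_from([(0, 2), (1, 3), (1, 4)])

# Hypothetical palette: one colour per node type.
palette = {0: "tab:blue", 1: "tab:orange"}
node_colors = [palette[g.nodes[n]["node_type"]] for n in g.nodes]

# The list follows g.nodes order, so it can be passed straight to the drawing call:
# nx.draw(g, with_labels=True, node_color=node_colors)
```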

Feature Selection in Multivariate Linear Regression

import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
a = make_regression(n_samples=300,n_features=5,noise=5)
df1 = pd.DataFrame(a[0])
df1 = pd.concat([df1,pd.DataFrame(a[1].T)],axis=1,ignore_index=True)
df1.rename(columns={0:"X1",1:"X2",2:"X3",3:"X4",4:"X5",5:"Target"},inplace=True)
sns.heatmap(df1.corr(),annot=True);
Correlation Matrix
Now I can ask my question. How can I choose features that will be included in the model?
I am not that well-versed in python as I use R most of the time.
But it should be something like this:
# Select the predictors and the target from the DataFrame built above
X = df1[['X1','X2','X3','X4','X5']]
Y = df1['Target']
# Create a model
model = LinearRegression()
# Call the .fit method and pass in your data
model.fit(X, Y)
# Or simply do both in one step
model = LinearRegression().fit(X, Y)
To do feature selection, fit the model first and then check the p-value of each coefficient. A cut-off of 5% (.05) is typical: if a variable's p-value is above .05, it is not statistically significant and you can consider removing it from the model. Note that scikit-learn's LinearRegression does not report p-values; the OLS summary in statsmodels does. You can also look at the correlation matrix to see which features have little correlation with the target. scikit-learn does ship automated helpers in sklearn.feature_selection (for example SelectKBest, RFE and SequentialFeatureSelector), but in the end statistics are just numbers, and it is up to humans to interpret the results.
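As one possible way to get p-values and an automatic pick in Python, here is a small sketch with scikit-learn's SelectKBest and f_regression. Note that this runs a separate univariate F-test per feature, which is not the same thing as the coefficient p-values of the jointly fitted regression. The values n_informative=2, k=2 and random_state=0 are my own choices for the demo, not something from the question.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data similar to the question; n_informative=2 and random_state=0
# are added here so that some features really are irrelevant.
X, y = make_regression(n_samples=300, n_features=5, n_informative=2,
                       noise=5, random_state=0)

# Score each feature with a univariate F-test and keep the best k=2.
selector = SelectKBest(score_func=f_regression, k=2).fit(X, y)

p_values = selector.pvalues_                   # one p-value per feature
selected = selector.get_support(indices=True)  # column indices that were kept
X_reduced = selector.transform(X)              # data restricted to those columns

print("p-values:", p_values.round(4))
print("selected columns:", selected)
```

Features with a large p-value here are candidates for removal, but it is still worth checking the reduced model yourself.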

Density-Based Clustering Validation (DBCV) never stops running

I have completed running DBSCAN on a dataset of mine clustering patches of deforestation and I am attempting to validate the results according to this paper.
I have installed the package from this GitHub, but when I try to run the code it never completes. I ran it for over 5 days and it never stopped running or threw an error. Running DBSCAN only took 15 minutes, so I am a little confused why just the validation is taking so long. Is there something I'm getting wrong with the DBCV code or the inputs?
Since the code never finishes running, there is no error that I can report. I am unsure if I'm passing the data into the code correctly, but I tried to copy the example on GitHub as closely as I could. I don't know how to share my .csv file to show what it looks like. It has 16 dimensions that I condense down using a MinMaxScaler before running DBSCAN. I have already completed the DBSCAN clustering and am just trying to get the DBCV to work.
import pandas as pd
import numpy as np
from pylab import rcParams
import matplotlib.pyplot as plot
import sklearn
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import MinMaxScaler
from scipy.spatial import euclidean
from DBCV import DBCV
f = pd.read_csv("csv_file_I_Don't_know_how_to_share")
x = f.loc[:, [1-15]].values
norm_data = MinMaxScaler()
data = norm_data.fit_transform(x)
dbscan = DBSCAN(eps=.15, min_samples = 100)
clusters = dbscan.fit_predict(data)
DBCV_score = DBCV(data, clusters, dist_function=euclidean)
print ('DBCV Score: ' + DBCV_score)
I'm expecting a score to be printed but instead the code continues to run and doesn't stop. Any help would be great!
You run:
from scipy.spatial import euclidean
But the example on GitHub imports euclidean like this:
from scipy.spatial.distance import euclidean
Try changing this, it might work.
In addition to the answer by @Dumbfool, there seems to be an error in:
print('DBCV Score: ' + DBCV_score)
The score is a number, so it cannot be concatenated to a string with +. Try changing the + to a comma.
I hope this helps.
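To illustrate the print issue on its own, here is a tiny sketch. The value 0.42 is a made-up placeholder, and it assumes the score comes back as a plain float.

```python
dbcv_score = 0.42  # hypothetical value standing in for the DBCV result

# Concatenating str and float raises a TypeError:
try:
    message = 'DBCV Score: ' + dbcv_score
except TypeError as err:
    message = None
    error_text = str(err)

# Either of these works instead:
print('DBCV Score:', dbcv_score)
formatted = f'DBCV Score: {dbcv_score}'
print(formatted)
```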

sklearn MinMaxScaler ported to a different notebook

How would I go about exporting the attributes of a fitted MinMaxScaler so that I can apply the same transform to data in a different notebook?
For full disclosure, I've trained a NN in one notebook and am running it in a different location. It is simple for me to load the trained weights of the NN in the second location, but I need to scale the data before feeding it into the model. To be accurate, I believe it has to use the original scale attributes.
Per the documentation, you can recreate what min max scaler does using
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min
where X is your original dataset. (Although as long as your feature range is the default of (0,1), the second line above is not needed - you will come out with X_scaled = X_std)
If you want to do this same computation using your already fitted MinMaxScaler instead of your original dataset, consider the following example (again assuming the feature range is left at the default (0,1)):
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
import numpy as np
# Test data set
X = pd.DataFrame(np.random.randint(0, 100, size=(20,4)))
# Test scaler
scaler = MinMaxScaler()
sklearn_result = scaler.fit_transform(X)
# Compute, and verify results match up to machine precision
manual_result = (X - scaler.data_min_)/(scaler.data_max_ - scaler.data_min_)
(manual_result - sklearn_result).abs().max().max()  # Is at most around 1e-16
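To actually move the scaler to another notebook, one option is to save only data_min_ and data_max_ and rebuild the transform by hand as above. A sketch under the same default (0,1) feature range; the file name and the random test data are made up:

```python
import os
import tempfile

import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 100, size=(20, 4))
X_new = rng.uniform(0, 100, size=(5, 4))

# "Notebook 1": fit the scaler and save only the two attributes needed.
scaler = MinMaxScaler().fit(X_train)
path = os.path.join(tempfile.mkdtemp(), "scaler_params.npz")  # hypothetical file name
np.savez(path, data_min=scaler.data_min_, data_max=scaler.data_max_)

# "Notebook 2": load the attributes and apply the same transform by hand.
params = np.load(path)
X_new_scaled = (X_new - params["data_min"]) / (params["data_max"] - params["data_min"])

# Compare against what the original scaler would have produced.
print(np.allclose(X_new_scaled, scaler.transform(X_new)))
```

Alternatively, joblib.dump(scaler, ...) in the first notebook and joblib.load(...) in the second persists the whole fitted object, which is usually fine as long as both notebooks use compatible scikit-learn versions.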

Using draw_networkx(), how to show multiple drawing windows?

The following code creates only one window at a time; the second window only shows up after the user closes the first one.
How to show them at the same time with different titles?
nx.draw_networkx(..a..)
nx.draw_networkx(..b..)
It works the same as making other plots with Matplotlib.
Use the figure() command to switch to a new figure.
import networkx as nx
import matplotlib.pyplot as plt
G=nx.cycle_graph(4)
H=nx.path_graph(4)
plt.figure(1)
nx.draw(G)
plt.figure(2)
nx.draw(H)
plt.show()
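The question also asked for different titles. A sketch of one way to do that: passing a string to plt.figure() labels the figure (used as the window title), and set_title() draws a title inside it. The title strings are placeholders.

```python
import matplotlib.pyplot as plt
import networkx as nx

G = nx.cycle_graph(4)
H = nx.path_graph(4)

# A string passed to plt.figure() becomes the figure label / window title.
fig_g = plt.figure("Cycle graph")
ax_g = fig_g.add_subplot(111)
nx.draw(G, ax=ax_g)
ax_g.set_title("Cycle graph")  # title drawn inside the figure

fig_h = plt.figure("Path graph")
ax_h = fig_h.add_subplot(111)
nx.draw(H, ax=ax_h)
ax_h.set_title("Path graph")

# A single plt.show() at the end then opens both windows together:
# plt.show()
```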
You can use matplotlib and a grid to show multiple graphs:
#!/usr/bin/env python
"""
Draw a graph with matplotlib.
You must have matplotlib for this to work.
"""
__author__ = """Aric Hagberg (hagberg@lanl.gov)"""
# Copyright (C) 2004-2008
# Aric Hagberg <hagberg@lanl.gov>
# Dan Schult <dschult@colgate.edu>
# Pieter Swart <swart@lanl.gov>
# All rights reserved.
# BSD license.
try:
    import matplotlib.pyplot as plt
except:
    raise
import networkx as nx
G=nx.grid_2d_graph(4,4) #4x4 grid
pos=nx.spring_layout(G,iterations=100)
plt.subplot(221)
nx.draw(G,pos,font_size=8)
plt.subplot(222)
nx.draw(G,pos,node_color='k',node_size=0,with_labels=False)
plt.subplot(223)
nx.draw(G,pos,node_color='g',node_size=250,with_labels=False,width=6)
plt.subplot(224)
H=G.to_directed()
nx.draw(H,pos,node_color='b',node_size=20,with_labels=False)
plt.savefig("four_grids.png")
plt.show()
The code above generates this figure:
Reference: https://networkx.org/documentation/networkx-1.9/examples/drawing/four_grids.html