How to visualize HeteroData pytorch geometric graph with any tool? - networkx

Hello, what is a good way to visualize a PyG HeteroData object?
(defined similarly to the example at https://pytorch-geometric.readthedocs.io/en/latest/notes/heterogeneous.html#creating-heterogeneous-gnns)
I tried networkx, but I think it is restricted to homogeneous graphs (it is possible to convert the graph, but the result is much less informative).
g = torch_geometric.utils.to_networkx(data.to_homogeneous(), to_undirected=False)
Has anyone tried to do it with another Python library (matplotlib) or a JavaScript one (sigma.js/d3.js)?
Are there any docs links you can share?

I have done the following:
import networkx as nx
import torch_geometric
from matplotlib import pyplot as plt
# Flatten the heterogeneous graph and convert it to a networkx DiGraph
g = torch_geometric.utils.to_networkx(data.to_homogeneous())
# The converted graph seems to contain extra nodes without outgoing edges, so I remove them
# (note that this also drops pure sink nodes, not only isolated ones)
isolated_nodes = [node for node in g.nodes() if g.out_degree(node) == 0]
g.remove_nodes_from(isolated_nodes)
# Plot the graph
nx.draw(g, with_labels=True)
plt.show()
However, it's true that the graph was "flattened" to a homogeneous one, while it would be more interesting to, for example, use different colors for different types of nodes.
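One possible way to get per-type colors, as a rough sketch: data.to_homogeneous() keeps an integer node_type entry for every node, so each type id can be mapped to a color and the resulting list passed to nx.draw via node_color. The helper name colors_for_nodes and the palette are made up for illustration, and the toy lists stand in for a real HeteroData object.

```python
def colors_for_nodes(nodes, node_type,
                     palette=("tab:blue", "tab:orange", "tab:green", "tab:red")):
    """Return one color per node, chosen by that node's integer type id."""
    return [palette[node_type[n] % len(palette)] for n in nodes]

# Toy stand-in for data.to_homogeneous().node_type.tolist()
node_type = [0, 0, 1, 2, 1]
# Toy stand-in for list(g.nodes()) after node 1 has been removed
nodes = [0, 2, 3, 4]
colors = colors_for_nodes(nodes, node_type)
print(colors)

# With the objects from the snippet above (not run here):
# homo = data.to_homogeneous()
# colors = colors_for_nodes(g.nodes(), homo.node_type.tolist())
# nx.draw(g, node_color=colors, with_labels=True)
```

Building the list from g.nodes() rather than from the full node_type tensor keeps colors aligned with the nodes that are still in the graph after removal.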


Feature Selection in Multivariate Linear Regression

import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
a = make_regression(n_samples=300,n_features=5,noise=5)
df1 = pd.DataFrame(a[0])
df1 = pd.concat([df1,pd.DataFrame(a[1].T)],axis=1,ignore_index=True)
df1.rename(columns={0:"X1",1:"X2",2:"X3",3:"X4",4:"X5",5:"Target"},inplace=True)
sns.heatmap(df1.corr(),annot=True);
(Figure: correlation matrix heatmap)
Now I can ask my question. How can I choose features that will be included in the model?
I am not that well-versed in python as I use R most of the time.
But it should be something like this:
# Create a model
model = LinearRegression()
# Call the .fit method and pass in your data
model.fit(Variables,Target)
# Or simply do
model = LinearRegression().fit(Variables,Target)
# So based on the dataset head provided, it should be
X = df1[['X1','X2','X3','X4','X5']]
Y = df1['Target']
model = LinearRegression().fit(X, Y)
To do feature selection, you need to fit the model first and then check the p-value of each coefficient. Note that scikit-learn's LinearRegression does not report p-values; the OLS summary in statsmodels does. A p-value of 5% (.05) is a common cut-off point: if a variable's p-value exceeds .05, the variable is not statistically significant and you can consider removing it from your model. You can also look at the correlation matrix to see which features have little correlation with the target. You do not have to do all of this manually: scikit-learn's sklearn.feature_selection module offers automated selectors such as SelectKBest and RFE. In the end, statistics are just numbers. It is up to humans to interpret the results.
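The correlation-based screening mentioned above could be sketched as follows. This is only an illustration on synthetic data (the coefficients, seed, and column names are assumptions, not the dataset from the question), and it ranks features by their absolute Pearson correlation with the target.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 3))
# Synthetic target: X1 matters a lot, X2 a little, X3 not at all
y = 5.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

names = ["X1", "X2", "X3"]
# Absolute Pearson correlation of each feature with the target
scores = {name: abs(np.corrcoef(X[:, i], y)[0, 1]) for i, name in enumerate(names)}
# Strongest candidates first
ranked = sorted(names, key=scores.get, reverse=True)
for name in ranked:
    print(f"{name}: |r| = {scores[name]:.2f}")
```

Marginal correlation looks at one feature at a time, so it can miss features that only matter in combination and can overrate features that are collinear with each other; treat the ranking as a first screen rather than a final answer.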

port networkx graph to graphviz without pygraphviz

I am working in Python 3 on a windows machine and despite many efforts have not been able to get pygraphviz installed. Separate discussion.
I have networkx and graphviz modules...Is there a paradigm for building network graphs in networkx and extracting to a graphviz format for display that does not use pygraphviz?
It seems all the relevant functionality in drawing.nx_agraph and nx_agraph requires pygraphviz but I have gotten accustomed to using networkx and like the functionality therein. In the documentation for networkx it even says they are focusing on the development of graph objects and not on the actual display.
I know this is an old question, but I was looking for the same thing: How to get dot notation from a graph in NetworkX?
(If you are only interested in the answer, skip this paragraph. I actually wanted to display a NetworkX multigraph, but even though NetworkX is a powerful Python library for manipulating networks and graphs, it has fairly limited options when it comes to displaying (rendering) them. NetworkX uses Matplotlib to provide basic functionality for visualizing graphs, which has no option for visualizing multigraphs. NetworkX is not intended for visualizing graphs, so the NetworkX documentation recommends using dedicated graph visualization tools such as Cytoscape, Gephi, or Graphviz. All of these tools have some kind of Python interface, but the one I find easiest and simplest to use is graphviz, which supports the DOT language of the Graphviz drawing software. Visualizing graphs in graphviz is very simple and convenient: you can use one short line such as graph.view(), which by default creates a PDF file and opens it in your system's default PDF viewer, but you can also create and view image files such as SVG and PNG. So it would be ideal to convert a NetworkX graph to a graphviz graph, and to do that you need to convert the NetworkX graph to DOT notation, which can then be read by graphviz.)
Aric's solution is simple and works for printing out the DOT notation of a graph, but it is kind of a hack, and in my case I need to save the DOT notation in a string variable so that I can use it to create a graph in graphviz.
networkx.drawing.nx_pydot.write_dot is a function that generates the DOT format and writes it to a path (or file handle). The code of this function is:
# decorated with @open_file(1, mode='w') in the networkx source
def write_dot(G, path):
    P = to_pydot(G)
    path.write(P.to_string())
    return
And this gives us the solution:
from networkx import path_graph
from networkx.drawing.nx_pydot import to_pydot
G = path_graph(4)
dot = to_pydot(G).to_string()
print(dot)
The function to_pydot uses the pydot library to create a pydot.Dot object, which has a to_string method that returns a string representation of the graph in the DOT language.
P.S.:
To render DOT source code in graphviz you can do something like this:
from graphviz import Source
src = Source(dot) # dot is string containing DOT notation of graph
src.view()
P.P.S.:
The following two pictures show the difference between using Matplotlib (left side of the image) and graphviz (right side of the image) to visualize the graph networkx.margulis_gabber_galil_graph(3):
You can use pydot (https://pypi.python.org/pypi/pydot) as a pure Python alternative to PyGraphviz
In [1]: import networkx
In [2]: import sys
In [3]: G = networkx.path_graph(4)
In [4]: networkx.drawing.nx_pydot.write_dot(G,sys.stdout)
strict graph {
0;
1;
2;
3;
0 -- 1;
1 -- 2;
2 -- 3;
}
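If neither pygraphviz nor pydot can be installed, the DOT text for a simple graph can also be assembled by hand, since the format shown above is just a list of node and edge statements. The following is a minimal sketch under that assumption; to_dot is a hypothetical helper, not part of networkx or pydot, and it ignores attributes and does not escape quotes inside node ids.

```python
def to_dot(nodes, edges, directed=False):
    """Build a DOT string from iterables of node ids and (u, v) edge pairs."""
    keyword, arrow = ("digraph", "->") if directed else ("graph", "--")
    lines = [f"{keyword} {{"]
    lines += [f'    "{n}";' for n in nodes]
    lines += [f'    "{u}" {arrow} "{v}";' for u, v in edges]
    lines.append("}")
    return "\n".join(lines)

# Same 4-node path as above; with networkx you could pass G.nodes() and G.edges()
dot = to_dot(range(4), [(0, 1), (1, 2), (2, 3)])
print(dot)
```

The resulting string can be handed to graphviz.Source in the same way as the output of to_pydot(G).to_string().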

Finding all subgraphs of depth 2 Networkx

I have a huge graph in networkx and I would like to get all the subgraphs of depth 2 from each node. Is there a nice way to do that using a built-in function in networkx?
As I said in the comment, networkx.ego_graph fits the bill. You just need to make sure that you set the radius to 2 (default is 1):
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
# create some test graph
graph = nx.erdos_renyi_graph(1000, 0.005)
# create an ego-graph for some node
node = 0
ego_graph = nx.ego_graph(graph, node, radius=2)
# plot to check
nx.draw(ego_graph); plt.show()
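To cover every node rather than a single one, the same call can be repeated in a loop, for example {n: nx.ego_graph(graph, n, radius=2) for n in graph.nodes()}, keeping in mind that storing one subgraph per node may be memory-hungry on a huge graph. To illustrate which nodes radius=2 selects, here is a dependency-free sketch using a breadth-first search over a plain adjacency dict; nodes_within and the toy graph are assumptions made for illustration.

```python
from collections import deque

def nodes_within(adj, source, radius=2):
    """Return the set of nodes at most `radius` hops from `source` (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

# Toy undirected path 0-1-2-3-4 as an adjacency dict
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
# One depth-2 neighborhood per node
neighborhoods = {n: nodes_within(adj, n, radius=2) for n in adj}
print(neighborhoods[0])
```

Computing only the node sets first and building a subgraph on demand is one way to keep memory use down when the graph is large.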

Do I have to preprocess test data using neural networks?

I am using Keras (version 2.0.0) and I'd like to make use of pretrained models like e.g. VGG16.
In order to get started, I ran the example from the Keras documentation site (https://keras.io/applications/) for extracting features with VGG16:
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
import numpy as np
model = VGG16(weights='imagenet', include_top=False)
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
features = model.predict(x)
The preprocess_input() function used here bothers me
(the function does zero-centering by mean pixel, as can be seen by looking at the source code).
Do I really have to preprocess input data (validation/test data) before using a trained model?
a)
If yes, one can conclude that you always have to be aware of what preprocessing steps have been performed during training phase?!
b)
If no: Does preprocessing of validation/test data cause a bias?
I appreciate your help.
Yes, you should use the preprocessing step. You can retrain the model without it, but then the first layers will have to learn to center your data, which is a waste of parameters.
If you do not recenter the inputs, performance will suffer.
Great thread on reddit : https://www.reddit.com/r/MachineLearning/comments/3q7pjc/why_is_removing_the_mean_pixel_value_from_each/
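Regarding point a): the usual practice is that test data goes through exactly the same preprocessing as the training data, using statistics fixed at training time. A minimal NumPy sketch of per-channel zero-centering is shown below; the random arrays are hypothetical stand-ins for image batches, and this is not the Keras preprocess_input implementation itself.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical batches of RGB images, shape (N, H, W, 3), values in [0, 255]
train = rng.uniform(0, 255, size=(8, 4, 4, 3))
test = rng.uniform(0, 255, size=(2, 4, 4, 3))

# Per-channel mean estimated on the training data only
channel_mean = train.mean(axis=(0, 1, 2))

def center(batch, mean):
    """Subtract the stored training-set channel mean from every pixel."""
    return batch - mean

train_c = center(train, channel_mean)
test_c = center(test, channel_mean)  # same statistics, not recomputed on test data
print(train_c.mean(axis=(0, 1, 2)))
```

With a pretrained model the mean is the one used when the weights were trained, which is why calling the model's own preprocess_input on new images is the safe choice.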

Getting values in Seaborn boxplot

I would like to get the specific values shown by a boxplot generated in Seaborn
(i.e., median, quartiles), for example in the boxplot produced by the code below.
Is there any way to get the median and quartiles instead of estimating them manually?
import numpy as np
import seaborn as sns
sns.set(style="ticks", palette="muted", color_codes=True)
# Load the example planets dataset
planets = sns.load_dataset("planets")
# Plot the orbital period with horizontal boxes
ax = sns.boxplot(x="distance", y="method", data=planets,
                 whis=np.inf, color="c")
I would encourage you to become familiar with using pandas to extract quantitative information from a dataframe. For instance, a simple thing you could do to get the values you are looking for (and other useful ones) would be:
planets.groupby("method").distance.describe()
which prints a table of useful values for each method (on pandas versions older than 0.20, append .unstack() to get the same table).
Or if you just want the median:
planets.groupby("method").distance.median()
Sometimes my data is a list of arrays instead of a pandas dataframe. In that case, for each array d you might need:
min(d), np.quantile(d, 0.25), np.median(d), np.quantile(d, 0.75), max(d)
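The one-liner above can be wrapped in a small helper and applied to each array in turn. This is a sketch with made-up data; five_number_summary is a hypothetical name, and the quartiles use NumPy's default linear interpolation, which may differ slightly from other quantile conventions.

```python
import numpy as np

def five_number_summary(d):
    """Return (min, Q1, median, Q3, max) for a 1-D sequence of numbers."""
    d = np.asarray(d, dtype=float)
    q1, med, q3 = np.percentile(d, [25, 50, 75])
    return d.min(), q1, med, q3, d.max()

# Hypothetical data: one array per group, like one box per category
groups = {"a": [1, 2, 3, 4, 5, 6, 7, 8, 9], "b": [10, 20, 30, 40]}
for name, values in groups.items():
    print(name, five_number_summary(values))
```

With whis=np.inf, as in the question, the whiskers should extend to the minimum and maximum, so these five numbers describe the whole box; with the default whisker setting the whisker ends are instead limited to points within 1.5 times the interquartile range.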