Distinguish classes with different colors in NetworkX

I have a huge dataset of 80,000 rows, and I want to draw a meaningful graph in networkx from 2 dataframes (nodes and edges).
In "nodes" I have: actor1, category_id (int: a numerical value from 0-7 describing the type), and fatalities (float: the number of injured or killed people).
In "edges" I have: actor1, actor2, interaction (float64).
My aim is to draw a graph with different colors according to category_id and different sizes based on the number of fatalities.
I started this code, which ran perfectly until I tried to retrieve interaction and fatalities to calculate the weights of the nodes, as follows:
nodes = ACLED_to_graph[['actor1','category_id','fatalities']]
edges = ACLED_to_graph[['actor1','actor2','interaction']]
# Initiate the graph
G4 = nx.Graph()
for index, row in nodes.iterrows():
    G4.add_node(row['actor1'], category_id=row['category_id'], nodesize=row['fatalities'])
for index, row in edges.iterrows():
    G4.add_weighted_edges_from([(row['actor1'], row['actor2'], row['interaction'])])
#greater_than_ = [x for x in G.nodes(data=True) if x[2]['nodesize']>15]
# Sort nodes by degree
sorted(G4.degree, key=lambda x: x[1], reverse=True)
# remove nodes whose degree is >= 200 or < 4 to simplify the graph
cond1 = [node for node, degree in G4.degree() if degree >= 200]
cond2 = [node for node, degree in G4.degree() if degree < 4]
remove = cond1 + cond2
G4.remove_nodes_from(remove)  # incident edges are removed along with the nodes
# Customize the layout
pos=nx.spring_layout(G4, k=0.25, iterations=50)
# Define color map for classes
color_map = {0:'#f1f0c0',1:'#f09494', 2:'#eebcbc', 3:'#72bbd0', 4:'#91f0a1', 5:'#629fff', 6:'#bcc2f2',
7:'#eebcbc' }
plt.figure(figsize=(25,25))
options = {
    'edge_color': '#FFDEA2',
    'width': 1,
    'with_labels': True,
    'font_weight': 'regular',
}
colors = [color_map[G4.node[node]['category_id']] for node in G4.node]
#sizes = [G.node[node]['interaction'] for node in G]
"""
Using the spring layout :
- k controls the distance between the nodes and varies between 0 and 1
- iterations is the number of times simulated annealing is run
default k=0.1 and iterations=50
"""
#node_color=colors,
#node_size=sizes,
nx.draw(G4,node_color=colors, **options,cmap=plt.get_cmap('jet'))
ax = plt.gca()
ax.collections[0].set_edgecolor("#555555")
I am also removing nodes with degrees of 200 or more, or less than 4, to simplify the graph and make it more appealing.
I am getting the following error:
colors = [color_map[G4.node[node]['category_id']] for node in G4.node]
KeyError: 'category_id'

Without the input data it is a bit hard to tell for sure, but it looks as if you are not constructing the graph nodes with a 'category_id' attribute. In the for index, row in nodes.iterrows(): loop, you assign the data from the nodes dataframe's 'category_id' column to the node property "group".
You can confirm this by checking which keys are set for an example node in your graph, e.g. print(G4.node['actor1'].keys()).
To fix this, either
a) change the assignment
for index, row in nodes.iterrows():
    G4.add_node(row['actor1'], category_id=row['category_id'], nodesize=row['interaction'])
or b) change the lookup
colors = [color_map[G4.node[node]['group']] for node in G4.node]
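As a minimal, self-contained illustration (with made-up actor names and categories rather than the ACLED data), the key used when adding the node must match the key used in the color lookup:

```python
import networkx as nx

# Minimal sketch with made-up data: the attribute key passed to add_node
# must match the key used when looking the attribute up again.
color_map = {0: '#f1f0c0', 1: '#f09494'}

G = nx.Graph()
G.add_node('actor_a', category_id=0)
G.add_node('actor_b', category_id=1)
G.add_edge('actor_a', 'actor_b', weight=2.5)

# In networkx >= 2.0 the node-attribute view is G.nodes; older versions
# (as in the question's code) spell it G.node.
colors = [color_map[G.nodes[n]['category_id']] for n in G.nodes]
print(colors)  # ['#f1f0c0', '#f09494']
```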

Performing mathematical operations with node attributes can be summarized as follows:
1- After subsetting the dataframe, we initialize the graph
nodes = ACLED_to_graph[['actor1','category_id','interaction']]
edges = ACLED_to_graph[['actor1','actor2','fatalities']]
# Initiate the graph
G8 = nx.Graph()
2- Add edge attributes first (I emphasize the use of from_pandas_edgelist; it consumes the whole dataframe at once, so no iterrows loop is needed)
G8 = nx.from_pandas_edgelist(edges, 'actor1', 'actor2', ['fatalities'])
3- Next we add node attributes using add_node; other techniques such as set_node_attributes didn't work for me with pandas
for index, row in nodes.iterrows():
    G8.add_node(row['actor1'], category_id=row['category_id'], interaction=row['interaction'])
4- Sort nodes by degree to select the most connected nodes (I am keeping nodes with degrees of at least 3 and less than 200)
sorted(G8.degree, key=lambda x: x[1], reverse=True)
# remove nodes whose degree is >= 200 or < 3
cond1 = [node for node, degree in G8.degree() if degree >= 200]
cond2 = [node for node, degree in G8.degree() if degree < 3]
remove = cond1 + cond2
G8.remove_nodes_from(remove)  # incident edges are removed along with the nodes
5- Set the node color based on degree (calling G8.degree)
node_color = [G8.degree(v) for v in G8]
6- Set the edge width based on fatalities
edge_width = [0.15*G8[u][v]['fatalities'] for u,v in G8.edges()]
7- Set the node size based on interaction
node_size = list(nx.get_node_attributes(G8, 'interaction').values())
I used nx.get_node_attributes instead of pandas to access the features, which returns a dictionary whose values can be listed and converted to an array of values, ready to compute.
8- Select the most important edges based on fatalities
large_edges = [x for x in G8.edges(data=True) if x[2]['fatalities']>=3.0]
9- Finally, draw the network and the highlighted edges separately
nx.draw_networkx(G8, pos, node_size=node_size,node_color=node_color, alpha=0.7, with_labels=False, width=edge_width, edge_color='.4', cmap=plt.cm.Blues)
nx.draw_networkx_edges(G8, pos, edgelist=large_edges, edge_color='r', alpha=0.4, width=6)
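Taken together, the steps above can be condensed into a runnable sketch; the dataframes here are made-up stand-ins for the ACLED data, and the drawing calls of step 9 would follow unchanged:

```python
import networkx as nx
import pandas as pd

# Made-up stand-ins for the ACLED node/edge dataframes.
edges = pd.DataFrame({
    'actor1': ['a', 'a', 'b', 'c'],
    'actor2': ['b', 'c', 'c', 'd'],
    'fatalities': [1.0, 4.0, 2.0, 5.0],
})
nodes = pd.DataFrame({
    'actor1': ['a', 'b', 'c', 'd'],
    'category_id': [0, 1, 2, 3],
    'interaction': [10.0, 20.0, 30.0, 40.0],
})

# Edge attributes first, straight from the dataframe (no iterrows needed).
G = nx.from_pandas_edgelist(edges, 'actor1', 'actor2', ['fatalities'])

# Then node attributes via add_node.
for _, row in nodes.iterrows():
    G.add_node(row['actor1'], category_id=row['category_id'],
               interaction=row['interaction'])

# Degree-based colors, attribute-based sizes and widths.
node_color = [G.degree(v) for v in G]
node_size = list(nx.get_node_attributes(G, 'interaction').values())
edge_width = [0.15 * G[u][v]['fatalities'] for u, v in G.edges()]
large_edges = [e for e in G.edges(data=True) if e[2]['fatalities'] >= 3.0]
```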

Related

How to visualize rownames of specific data points within a cluster plot in R?

I have calculated clusters with a big dataset (1) and found four clusters, which I plotted. Now I have 30 new data points (2) that I want to plot in/on top of the existing clusters in order to see which of the new data points is closest to the original cluster centroids (of the first, big dataset).
What I did so far:
#I have combined both data sets (1. my old big data set) and (2. my 30 new data points) and added an indicator variable in order to distinguish between the old and new data sets:
# I only chose variables that are needed for the cluster calculations as well as the indicator
combined.ind <- combined[, c(1752:1757, 1759:1762, 1942)]
#I created a factor variable that indicates "new" and "old" variables
combined.ind$indicator <- factor(combined.ind$indicator,
                                 levels = c(0,1),
                                 labels = c("new", "old"))
#Then I calculated a hierarchical cluster analysis with the ward-centroids which I have then used for calculating a k-means clustering:
#calculate ward-centroids:
combined.ward.cent <- aggregate(cbind(Z1, Z2, Z3, Z4, Z5, Z6, Z7, Z8, Z9, Z10)~CLU4_1,combined,mean)
combined.ward.cent2 <- combined.ward.cent[, c(2:11)]
#apply kmeans with ward centroids as initial starting points:
kmeans <- kmeans(combined.ind[1:(length(combined.ind)-1)], centers = combined.ward.cent2)
#Then I have plotted the results and tried to highlight the new data points:
#Plot the results
fviz_cluster(kmeans, data = combined.ind[, 1:length(combined.ind)-1])
#I changed the colors with scale color manual in order to see the new data points.
fviz_cluster(kmeans, data = combined.ind[, 1:length(combined.ind)-1], geom=c("point", "text"), ellipse = T) + geom_point(aes(color=combined.ind$indicator)) + ggtitle("My Beautiful Graph") +
scale_color_manual("Old vs New", values = c("new" = "black", "old" = "red"))
Since the first dataset is huge, I cannot see/read the rownames of the new data points because they all overlap. When I add repel = TRUE to the arguments (see below), only the rownames of the data points on the edge are visualized, which does not help me because I am trying to visualize only the rownames of the new data points.
fviz_cluster(kmeans, data = combined.ind[, 1:length(combined.ind)-1], geom=c("point", "text"), repel = TRUE, ellipse = T) +
geom_point(aes(color=combined.ind$indicator)) + ggtitle("My Beautiful Graph") +
scale_color_manual("Old vs New", values = c("new" = "black", "old" = "red"))
How can I solve this problem?

How to generate a triangle-free graph in Networkx (with random seed)?

After checking the documentation on triangles of networkx, I've wondered if there is a more efficient way of generating a triangle free graph than to randomly spawn graphs until a triangle free one happens to emerge, (in particular if one would like to use a constant random seed).
Below is code that spawns graphs until they are triangle free, yet with varying random seeds. For a graph of 10 nodes it already takes roughly 20 seconds.
def create_triangle_free_graph(show_graphs):
    seed = 42
    nr_of_nodes = 10
    probability_of_creating_an_edge = 0.85
    nr_of_triangles = 1  # Initialise at 1 to initiate while loop.
    while nr_of_triangles > 0:
        graph = nx.fast_gnp_random_graph(
            nr_of_nodes, probability_of_creating_an_edge
        )
        triangles = nx.triangles(graph).values()
        nr_of_triangles = sum(triangles) / 3
        print(f"nr_of_triangles={nr_of_triangles}")
    return graph
Hence, I would like to ask:
Are there faster ways to generate triangle free graphs (using random seeds) in networkx?
A triangle exists in a graph iff two vertices connected by an edge share one or more neighbours. A triangle-free graph can be expanded by adding edges between nodes that share no neighbours. The empty graph is triangle-free, so there is a straightforward algorithm to create triangle-free graphs.
#!/usr/bin/env python
"""
Create a triangle free graph.
"""
import random
import networkx as nx
from itertools import combinations
def triangle_free_graph(total_nodes):
    """Construct a triangle free graph."""
    nodes = range(total_nodes)
    g = nx.Graph()
    g.add_nodes_from(nodes)
    edge_candidates = list(combinations(nodes, 2))
    random.shuffle(edge_candidates)
    for (u, v) in edge_candidates:
        if not set(g.neighbors(u)) & set(g.neighbors(v)):
            g.add_edge(u, v)
    return g
g = triangle_free_graph(10)
print(nx.triangles(g))
The number of edges in the resulting graph is highly dependent on the ordering of edge_candidates. To get a graph with the desired edge density, repeat the process until a graph with equal or higher density is found (and then remove superfluous edges), or until your patience runs out.
cutoff = 0.85
max_iterations = 1e+4
iteration = 0
while nx.density(g) < cutoff:
    g = triangle_free_graph(10)
    iteration += 1
    if iteration == max_iterations:
        import warnings
        warnings.warn("Maximum number of iterations reached!")
        break
# TODO: remove edges until the desired density is achieved
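Since the only source of randomness in this construction is random.shuffle, the constant seed the question asks about comes down to seeding Python's global RNG before each build. A small sketch (repeating the function from above so it runs standalone):

```python
import random
from itertools import combinations

import networkx as nx

def triangle_free_graph(total_nodes):
    """Greedy triangle-free construction, as in the answer above."""
    nodes = range(total_nodes)
    g = nx.Graph()
    g.add_nodes_from(nodes)
    edge_candidates = list(combinations(nodes, 2))
    random.shuffle(edge_candidates)
    for u, v in edge_candidates:
        if not set(g.neighbors(u)) & set(g.neighbors(v)):
            g.add_edge(u, v)
    return g

random.seed(42)              # fix the global RNG before the first build ...
g1 = triangle_free_graph(10)
random.seed(42)              # ... and reuse the seed to reproduce it exactly
g2 = triangle_free_graph(10)
```

Because the shuffle is the only randomness, both graphs come out identical edge for edge, and both are triangle-free by construction.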

Edges' length in digraph in Matlab

I need to create a digraph on Matlab. I have the sources, the targets and the matrix with the weights. Normally, all that is needed is the line:
G = digraph(S,T,weights);
My problem is that although I don't have the coordinates of nodes, I do have the lengths of the edges linking the nodes.
In order to have the weights being represented as edges' width, I have this:
LWidths = (1/max(G.Edges.Weight))*G.Edges.Weight;
p.LineWidth = LWidths;
How can I take into account also the length and have it imported from the user, and not by default?
After defining sources, targets, weights and lengths:
G = digraph(S,T,lengths);
p = plot(G);
G.Edges.Weight = lengths';
layout(p,'force','WeightEffect','direct')
G.Edges.LWidths = 7*weights'/max(weights);
p.LineWidth = G.Edges.LWidths;
With this, the edge lengths depend on the actual lengths given, and the edge widths are proportional to the weights.

How to efficiently create an array by tracing back the parent nodes in Matlab?

I am working on a path planner algorithm. I have a Nx2 matrix NodeInfo which has the current node number in its 1st column and parent node number in its 2nd column. For example:
NodeInfo = [3,1;
4,1;
5,2;
6,2;
7,3;
8,4;
9,4;
10,4;
11,5;
12,6;
13,6;
14,6;
15,7;
16,7;
17,8;
18,8;
19,9;
20,9;
21,10;
22,10;
23,11;
24,11;
25,12;
26,12];
When the algorithm reaches a goal, it outputs the node number, which is 26 in this case. I am looking for a smart way of tracking back the parent nodes and creating an array of the nodes that led to the goal node. So the output should be:
Array = [26, 12, 6, 2];
Thanks!
p = NodeInfo(end,1);
parents = [p];
while (~isempty(p))
    p = NodeInfo(find(NodeInfo(:,1)==p),2);
    parents = [parents p];
end
The answer is stored in parents.
The code below uses a container; it may take some time to build up the hashmap, but it might be faster than find() when there is a larger dataset with a vast number of requests.
Edit: Added 2 sentinel nodes to NodeMap so the while condition does not need a time-consuming isKey() call.
NodeMap = containers.Map(NodeInfo(:,1),NodeInfo(:,2)); %Create a container
NodeMap(1)=0; NodeMap(2)=0; %Add 2 sentinel nodes
nodes=zeros(1,length(NodeMap)); %pre-allocate
k=2; [N,nodes(1)]=deal(26); %Init parameters
while(N>0)
    [nodes(k),N]=deal(NodeMap(N));
    k=k+1;
end
nodes(nodes == 0)=[] %Cleaning up & print
The output of N=26 is:
nodes =
26 12 6 2
hope it helps!
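For readers outside MATLAB, the same back-tracing idea can be sketched in Python: a plain dict plays the role of containers.Map, and we follow parent pointers until a node has no entry (the data here is an abbreviated, hypothetical version of NodeInfo):

```python
# Parent map for an abbreviated version of NodeInfo: child -> parent.
parent = {3: 1, 4: 1, 5: 2, 6: 2, 12: 6, 26: 12}

def trace_back(goal):
    """Follow parent pointers from the goal node back to a root."""
    path = [goal]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

print(trace_back(26))  # [26, 12, 6, 2]
```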

Plot portfolio composition map in Julia (or Matlab)

I am optimizing a portfolio of N stocks over M levels of expected return. After doing this I get the weights (i.e. an N x M matrix where each row is a combination of stock weights for a particular level of expected return). Weights add up to 1.
Now I want to plot something called a portfolio composition map (right plot in the picture), which is a plot of these stock weights over all levels of expected return, each with a distinct color, and the length (at every level of return) proportional to its weight.
My question is how to do this in Julia (or MATLAB)?
I came across this and the accepted solution seemed so complex. Here's how I would do it:
using Plots
@userplot PortfolioComposition
@recipe function f(pc::PortfolioComposition)
    weights, returns = pc.args
    weights = cumsum(weights,dims=2)
    seriestype := :shape
    for c=1:size(weights,2)
        sx = vcat(weights[:,c], c==1 ? zeros(length(returns)) : reverse(weights[:,c-1]))
        sy = vcat(returns, reverse(returns))
        @series Shape(sx, sy)
    end
end
# fake data
tickers = ["IBM", "Google", "Apple", "Intel"]
N = 10
D = length(tickers)
weights = rand(N,D)
weights ./= sum(weights, dims=2)
returns = sort!((1:N) + D*randn(N))
# plot it
portfoliocomposition(weights, returns, labels = tickers)
matplotlib has a pretty powerful polygon plotting capability, e.g. this link on plotting filled polygons:
plotting filled polygons in python
You can use this from Julia via the excellent PyPlot.jl package.
Note that the syntax for certain things changes; see the PyPlot.jl README and e.g. this set of examples.
You "just" need to calculate the coordinates from your matrix and build up a set of polygons to plot the portfolio composition graph. It would be nice to see the code if you get this working!
So I was able to draw it, and here's my code:
using PyPlot
using PyCall
@pyimport matplotlib.patches as patch
N = 10
D = 4
weights = Array(Float64, N,D)
for i in 1:N
    w = rand(D)
    w = w/sum(w)
    weights[i,:] = w
end
weights = [zeros(Float64, N) weights]
weights = cumsum(weights,2)
returns = sort!([linspace(1,N, N);] + D*randn(N))
##########
# Plot #
##########
polygons = Array(PyObject, 4)
colors = ["red","blue","green","cyan"]
labels = ["IBM", "Google", "Apple", "Intel"]
fig, ax = subplots()
fig[:set_size_inches](5, 7)
title("Problem 2.5 part 2")
xlabel("Weights")
ylabel("Return (%)")
ax[:set_autoscale_on](false)
ax[:axis]([0,1,minimum(returns),maximum(returns)])
for i in 1:(size(weights,2)-1)
    xy=[weights[:,i] returns;
        reverse(weights[:,(i+1)]) reverse(returns)]
    polygons[i] = matplotlib[:patches][:Polygon](xy, true, color=colors[i], label = labels[i])
    ax[:add_artist](polygons[i])
end
legend(polygons, labels, bbox_to_anchor=(1.02, 1), loc=2, borderaxespad=0)
show()
# savefig("CompositionMap.png",bbox_inches="tight")
Can't say that this is the best way to do this, but at least it is working.