How to calculate the GED between g2 and q? - networkx

I am learning the networkx function "networkx.graph_edit_distance(g2, q)". By my count, GED(g2, q) = 2: to transform g2 into q, we need at least two graph edit operations, namely substituting the edge (1,3), whose label is '2' in g2, with an edge (1,3) whose label is '1', and inserting the edge (3,4), which does not exist in g2, with label '1'. My code is shown below:
import networkx as nx
nodes = [(1,{'label':'C1'}),
(2,{'label':'C2'}),
(3,{'label':'C3'}),
(4,{'label':'C4'}),
(5,{'label':'N'})]
edges = [(1,2,{'label':'1'}),
(2,4,{'label':'1'}),
(4,5,{'label':'1'}),
(5,3,{'label':'1'}),
(3,1,{'label':'2'})]
g2 = nx.Graph()
g2.add_nodes_from(nodes)
g2.add_edges_from(edges)
nodes = [(1,{'label':'C1'}),
(2,{'label':'C2'}),
(3,{'label':'C3'}),
(4,{'label':'C4'}),
(5,{'label':'N'})]
edges = [(1,2,{'label':'1'}),
(2,4,{'label':'1'}),
(4,5,{'label':'1'}),
(5,3,{'label':'1'}),
(3,1,{'label':'1'}),
(3,4,{'label':'1'})]
q = nx.Graph()
q.add_nodes_from(nodes)
q.add_edges_from(edges)
GED_q_g2 = nx.graph_edit_distance(g2, q)
I expected GED = 2, but the function returns GED_q_g2 = 1. How can I get the expected answer?

Looking at the graphs as defined, the edit distance of 1 is correct, because they differ only by the edge (3,4): by default, edge attributes are not compared.
The graph that you've drawn displays two edges between 1 and 3, though. I guess you've misunderstood the label functionality: It's just an optional data attribute that you can use for identification or plotting - it has nothing to do with the number of edges or weights.
If you want to use multiple edges between two nodes, have a look at nx.MultiGraph.

Just pass an edge_match function so that edge labels are compared:
nx.graph_edit_distance(g2, q, edge_match=lambda a, b: a['label'] == b['label'])
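The effect of edge_match can be checked with a small self-contained sketch. It rebuilds the two graphs from the question; the helper names bond_graph and same_label are made up for this illustration and are not part of networkx.

```python
import networkx as nx

def bond_graph(edges):
    # Hypothetical helper: same five labelled nodes as in the question.
    g = nx.Graph()
    g.add_nodes_from([
        (1, {"label": "C1"}), (2, {"label": "C2"}), (3, {"label": "C3"}),
        (4, {"label": "C4"}), (5, {"label": "N"}),
    ])
    g.add_edges_from(edges)
    return g

# Edges shared by both graphs.
ring = [(1, 2, {"label": "1"}), (2, 4, {"label": "1"}),
        (4, 5, {"label": "1"}), (5, 3, {"label": "1"})]
g2 = bond_graph(ring + [(3, 1, {"label": "2"})])
q = bond_graph(ring + [(3, 1, {"label": "1"}), (3, 4, {"label": "1"})])

def same_label(a, b):
    # a and b are attribute dicts (of two edges or of two nodes).
    return a["label"] == b["label"]

# Without a match function, attributes are ignored.
ged_plain = nx.graph_edit_distance(g2, q)
# Compare edge labels only.
ged_edges = nx.graph_edit_distance(g2, q, edge_match=same_label)
# Compare node labels as well.
ged_both = nx.graph_edit_distance(g2, q, node_match=same_label, edge_match=same_label)

print(ged_plain, ged_edges, ged_both)
```

With the labels compared, the distance should come out as the 2 that the question expects, while the plain call reproduces the 1 that was observed.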

Related

N-dimensional GP Regression

I'm trying to use GPflow for a multidimensional regression. But I'm confused by the shapes of the mean and variance.
For example: A 2-dimensional input space X of shape (20,20) is supposed to be predicted. My training samples are of shape (8,2) which means 8 training samples overall for the two dimensions. The y-values are of shape (8,1) which of course means one value of the ground truth per combination of the 2 input dimensions.
If I now use model.predict_y(X) I would expect to receive a mean of shape (20,20) but obtain a shape of (20,1). The same goes for the variance. I think that this problem comes from the shape of the y-values, but I have no idea how to fix it.
bound = 3
num = 20
X = np.random.uniform(-bound, bound, (num,num))
print(X_sample.shape) # (8,2)
print(Y_sample.shape) # (8,1)
k = gpflow.kernels.RBF(input_dim=2)
m = gpflow.models.GPR(X_sample, Y_sample, kern=k)
m.likelihood.variance = sigma_n
m.compile()
gpflow.train.ScipyOptimizer().minimize(m)
mean, var = m.predict_y(X)
print(mean.shape) # (20, 1)
print(var.shape) # (20, 1)
It sounds like you may be confused between the shape of a grid of input positions and the shape of the numpy arrays: if you want to predict on a 20 x 20 grid in two dimensions, you have 400 points in total, each with 2 values. So X (the one that you pass to m.predict_y()) should have shape (400, 2). (Note that the second dimension needs to have the same shape as X_sample!)
To construct this array of shape (400,2) you can use np.meshgrid (e.g., see What is the purpose of meshgrid in Python / NumPy?).
m.predict_y(X) only predicts the marginal variance at each test point, so the returned mean and var both have shape (400,1) (same length as X). You can of course reshape them to the 20 x 20 values on your grid.
(It is also possible to compute the full covariance, for the latent f this is implemented as m.predict_f_full_cov, which for X of shape (400,2) would return a 400x400 matrix. This is relevant if you want consistent samples from the GP, but I suspect that goes well beyond this question.)
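A minimal NumPy sketch of the shapes described above, assuming a 20 x 20 grid on [-3, 3]. The toy_mean array is a hypothetical stand-in for the (400, 1) mean that m.predict_y(X) would return, so no GPflow model is needed to follow the reshaping:

```python
import numpy as np

num = 20
bound = 3.0
x1 = np.linspace(-bound, bound, num)
x2 = np.linspace(-bound, bound, num)
x1_mesh, x2_mesh = np.meshgrid(x1, x2)            # each of shape (20, 20)

# One row per grid point, one column per input dimension.
X = np.column_stack([x1_mesh.ravel(), x2_mesh.ravel()])
print(X.shape)                                     # (400, 2)

# Stand-in for the predicted mean (assumption: any array of shape (400, 1)).
toy_mean = (X[:, 0] + 10.0 * X[:, 1]).reshape(-1, 1)

# Back to the grid layout, e.g. for plot_surface or contourf.
mean_grid = toy_mean.reshape(x1_mesh.shape)
print(mean_grid.shape)                             # (20, 20)
```

Because ravel and reshape both use row-major order, mean_grid[i, j] corresponds to the grid point (x1_mesh[i, j], x2_mesh[i, j]).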
I was indeed making the mistake of not flattening the arrays, which in turn produced the wrong shapes. Thank you for the fast response, STJ!
Here is an example of the working code:
# Generate data
bound = 3.
num = 20  # grid points per dimension, as in the question
x1 = np.linspace(-bound, bound, num)
x2 = np.linspace(-bound, bound, num)
x1_mesh,x2_mesh = np.meshgrid(x1, x2)
X = np.dstack([x1_mesh, x2_mesh]).reshape(-1, 2)
z = f(x1_mesh, x2_mesh) # evaluation of the function on the grid
# Draw samples from feature vectors and function by a given index
size = 2
np.random.seed(1991)
index = np.random.choice(range(len(x1)), size=(size,X.ndim), replace=False)
samples = utils.sampleFeature([x1,x2], index)
X1_sample = samples[0]
X2_sample = samples[1]
X_sample = np.column_stack((X1_sample, X2_sample))
Y_sample = utils.samplefromFunc(f=z, ind=index)
# Change noise parameter
sigma_n = 0.0
# Construct models with initial guess
k = gpflow.kernels.RBF(2,active_dims=[0,1], lengthscales=1.0,ARD=True)
m = gpflow.models.GPR(X_sample, Y_sample, kern=k)
m.likelihood.variance = sigma_n
m.compile()
#print(X.shape)
mean, var = m.predict_y(X)
mean_square = mean.reshape(x1_mesh.shape) # Shape: (num,num)
var_square = var.reshape(x1_mesh.shape) # Shape: (num,num)
# Plot mean
fig = plt.figure(figsize=(16, 12))
ax = plt.axes(projection='3d')
ax.plot_surface(x1_mesh, x2_mesh, mean_square, cmap=cm.viridis, linewidth=0.5, antialiased=True, alpha=0.8)
cbar = ax.contourf(x1_mesh, x2_mesh, mean_square, zdir='z', offset=offset, cmap=cm.viridis, antialiased=True)
ax.scatter3D(X1_sample, X2_sample, offset, marker='o',edgecolors='k', color='r', s=150)
fig.colorbar(cbar)
for t in ax.zaxis.get_major_ticks(): t.label.set_fontsize(fontsize_ticks)
ax.set_title(r"$\mu(x_1,x_2)$", fontsize=fontsize_title)
ax.set_xlabel("\n$x_1$", fontsize=fontsize_label)
ax.set_ylabel("\n$x_2$", fontsize=fontsize_label)
ax.set_zlabel('\n\n' + r'$\mu(x_1,x_2)$', fontsize=fontsize_label)
plt.xticks(fontsize=fontsize_ticks)
plt.yticks(fontsize=fontsize_ticks)
plt.xlim(left=-bound, right=bound)
plt.ylim(bottom=-bound, top=bound)
ax.set_zlim3d(offset,np.max(z))
This produces the resulting plot (red dots are the sample points drawn from the function). Note: the code has not been refactored whatsoever :)

Distinguish classes with different colors Networkx

I have a large dataset of 80,000 rows, and I want to draw a meaningful graph in networkx using two dataframes (nodes and edges).
In "nodes" I have: actor1, category_id (int: a numerical value from 0 to 7 describing the type) and fatalities (float: the number of injured or killed people).
In "edges" I have: actor1, actor2 and interaction (float64).
My aim is to draw a graph with node colors chosen according to category_id and node sizes based on the number of fatalities.
I started with this code, which ran perfectly until I tried to retrieve interaction and fatalities to calculate the weights of the nodes, as follows:
nodes = ACLED_to_graph[['actor1','category_id','fatalities']]
edges = ACLED_to_graph[['actor1','actor2','interaction']]
# Initiate the graph
G4 = nx.Graph()
for index, row in nodes.iterrows():
    G4.add_node(row['actor1'], category_id=row['category_id'], nodesize=row['fatalities'])
for index, row in edges.iterrows():
    G4.add_weighted_edges_from([(row['actor1'], row['actor2'], row['interaction'])])
#greater_than_ = [x for x in G.nodes(data=True) if x[2]['nodesize']>15]
# Sort nodes by degree
sorted(G4.degree, key=lambda x: x[1], reverse=True)
# remove nodes whose degree is >= 200 or < 4
cond1 = [node for node,degree in G4.degree() if degree>=200]
cond2 = [node for node,degree in G4.degree() if degree<4]
remove = cond1+cond2
# removing a node also removes its incident edges, so no separate edge removal is needed
G4.remove_nodes_from(remove)
# Customize the layout
pos=nx.spring_layout(G4, k=0.25, iterations=50)
# Define color map for classes
color_map = {0:'#f1f0c0',1:'#f09494', 2:'#eebcbc', 3:'#72bbd0', 4:'#91f0a1', 5:'#629fff', 6:'#bcc2f2',
7:'#eebcbc' }
plt.figure(figsize=(25,25))
options = {
'edge_color': '#FFDEA2',
'width': 1,
'with_labels': True,
'font_weight': 'regular',
}
colors = [color_map[G4.node[node]['category_id']] for node in G4.node]
#sizes = [G.node[node]['interaction'] for node in G]
"""
Using the spring layout :
- k controls the distance between the nodes and varies between 0 and 1
- iterations is the number of times simulated annealing is run
default k=0.1 and iterations=50
"""
#node_color=colors,
#node_size=sizes,
nx.draw(G4,node_color=colors, **options,cmap=plt.get_cmap('jet'))
ax = plt.gca()
ax.collections[0].set_edgecolor("#555555")
I am also removing nodes with a degree of 200 or more, or less than 4, to simplify the graph and make it more appealing.
I am getting the following error:
colors = [color_map[G4.node[node]['category_id']] for node in G4.node]
KeyError: 'category_id'
Without the input data it is a bit hard to tell for sure, but it looks as if you are not constructing the graph nodes with a property 'category_id'. In the loop for index, row in nodes.iterrows(): you assign the value under the key 'category_id' of the nodes dataframe to the property "group".
You can confirm this by checking which keys are set for an example node in your graph, e.g. print(G4.node['actor1 '].keys()).
To fix this, either
a) change the assignment
for index, row in nodes.iterrows():
    G4.add_node(row['actor1'], category_id=row['category_id'], nodesize=row['interaction'])
or b) change the lookup
colors = [color_map[G4.node[node]['group']] for node in G4.node]
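The key check suggested above can be run over all nodes at once. The sketch below uses a tiny made-up graph rather than the ACLED data. It also illustrates one further way a node can end up without 'category_id', which is an assumption about this data set and not something the question confirms: a node that appears only as an edge endpoint is created implicitly, without attributes. The default_color fallback is likewise an illustrative choice.

```python
import networkx as nx

G = nx.Graph()
G.add_node("A", category_id=0)
G.add_node("B", category_id=3)
# "C" is never added explicitly; add_edge creates it without attributes.
G.add_edge("A", "C", weight=1.0)

# List every node that lacks the attribute used for coloring.
missing = [n for n, data in G.nodes(data=True) if "category_id" not in data]
print(missing)

# Build the color list with a fallback instead of raising KeyError.
color_map = {0: "#f1f0c0", 3: "#72bbd0"}
default_color = "#cccccc"  # assumption: neutral grey for uncategorized nodes
colors = [color_map.get(data.get("category_id"), default_color)
          for _, data in G.nodes(data=True)]
print(colors)
```

If the missing list is non-empty, either add the attribute for those nodes or keep a fallback color as above.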
Computing with node attributes can be summarized as follows:
1- After subsetting the dataframe, we initialize the graph
nodes = ACLED_to_graph[['actor1','category_id','interaction']]
edges = ACLED_to_graph[['actor1','actor2','fatalities']]
# Initiate the graph
G8 = nx.Graph()
2- Add the edge attributes first (I emphasize the use of from_pandas_edgelist, which builds the whole graph in one call, so no loop over the rows is needed)
G8 = nx.from_pandas_edgelist(edges, 'actor1', 'actor2', ['fatalities'])
3- Next, add the node attributes using add_node; other techniques such as set_node_attributes didn't work for me with pandas
for index, row in nodes.iterrows():
    G8.add_node(row['actor1'], category_id=row['category_id'], interaction=row['interaction'])
4- Sort nodes by degree to select the most connected nodes (I am keeping nodes with a degree of at least 3 and less than 200)
sorted(G8.degree, key=lambda x: x[1], reverse=True)
# remove nodes whose degree is >= 200 or < 3
cond1 = [node for node,degree in G8.degree() if degree>=200]
cond2 = [node for node,degree in G8.degree() if degree<3]
remove = cond1+cond2
# removing a node also removes its incident edges
G8.remove_nodes_from(remove)
5- set the color based on degree (calling node.degree)
node_color = [G8.degree(v) for v in G8]
6- set the edge size based on fatalities
edge_width = [0.15*G8[u][v]['fatalities'] for u,v in G8.edges()]
7- set the node size based on interaction
node_size = list(nx.get_node_attributes(G8, 'interaction').values())
I used get_node_attributes instead of pandas to access the features, which allowed me to take the values of the resulting dictionary as a list, ready to compute with.
8- Select the most important edges based on fatalities
large_edges = [x for x in G8.edges(data=True) if x[2]['fatalities']>=3.0]
9- Finally, compute a layout and draw the network and the edges separately
pos = nx.spring_layout(G8, k=0.25, iterations=50)
nx.draw_networkx(G8, pos, node_size=node_size,node_color=node_color, alpha=0.7, with_labels=False, width=edge_width, edge_color='.4', cmap=plt.cm.Blues)
nx.draw_networkx_edges(G8, pos, edgelist=large_edges, edge_color='r', alpha=0.4, width=6)

Shift a semi-log chart

There are two related things I would like to ask help with.
1) I'm trying to shift a "semi-log" chart (using semilogy) such that the new line passes through a given point on the chart, but still appears to be parallel to the original.
2) Shift the "line" exactly as in 1), but then also invert the slope.
I think that the desired results are best illustrated with an actual chart.
Given the following code:
x = [50 80];
y = [10 20];
all_x = 1:200;
P = polyfit(x, log10(y),1);
log_line = 10.^(polyval(P,all_x));
semilogy(all_x,log_line)
I obtain the following chart:
For 1), let's say I want to move the line such that it passes through point (20,10). The desired result would look something like the orange line below (please note that I added a blue dot at the (20,10) point only for reference):
For 2), I want to take the line from 1) and take an inverse of the slope, so that the final result looks like the orange line below:
Please let me know if any clarifications are needed.
EDIT: Based on Will's answer (below), the solution is as follows:
%// to shift to point (40, 10^1.5)
%// solution to 1)
log_line_offset = (10^1.5).^(log10(log_line)/log10(10^1.5) + 1-log10(log_line(40))/log10(10^1.5));
%// solution to 2)
log_line_offset_inverted = (10^1.5).^(1 + log10(log_line(40))/log10(10^1.5) - log10(log_line)/log10(10^1.5));
To do transformations described as linear operations on logarithmic axes, perform those linear transformations on the logarithm of the values and then reapply the exponentiation. So for 1):
log_line_offset = 10.^(log10(log_line) + 1-log10(log_line(20)));
And for 2):
log_line_offset_inverted = 10.^(2*log10(log_line_offset(20)) - log10(log_line_offset));
or:
log_line_offset_inverted = 10.^(1 + log10(log_line(20)) - log10(log_line));
These can then be plotted with semilogy in the same way:
semilogy(all_x,log_line,all_x, log_line_offset, all_x,log_line_offset_inverted)
I can't guarantee that this is a sensible solution for the application for which you're creating these plots and their underlying data, though. It seems an odd way to describe the problem, so you might be better off creating these offsets further up the chain of calculation.
For example, log_line_offset can just as easily be calculated using your original code but with x = [20 50]; whether that is a meaningful way to treat the data may depend on what it's supposed to represent.
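The log-space transformations from this answer can also be written out in a few lines of Python, which makes it easy to check that both curves pass through the target point. This is only a sketch: the function names log_line, shifted and inverted are chosen for illustration, and the two fit points and the target (20, 10) are taken from the question.

```python
import math

# Straight line in log10(y) through the two points from the question.
xa, ya = 50.0, 10.0
xb, yb = 80.0, 20.0
slope = (math.log10(yb) - math.log10(ya)) / (xb - xa)
intercept = math.log10(ya) - slope * xa

def log_line(x):
    return 10.0 ** (slope * x + intercept)

# Target point the shifted curve should pass through.
px, py = 20.0, 10.0

def shifted(x):
    # Add a constant in log space so that shifted(px) == py.
    return 10.0 ** (math.log10(log_line(x)) + math.log10(py) - math.log10(log_line(px)))

def inverted(x):
    # Reflect about log10(py) in log space: the slope changes sign
    # and the curve still passes through (px, py).
    return 10.0 ** (2.0 * math.log10(py) - math.log10(shifted(x)))

print(shifted(px), inverted(px))
```

On a semi-log plot, shifted appears parallel to log_line, and inverted is its mirror image about the horizontal line y = py.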

Plot portfolio composition map in Julia (or Matlab)

I am optimizing a portfolio of N stocks over M levels of expected return. After doing this I get the time series of weights (i.e. an N x M matrix where each row is a combination of stock weights for a particular level of expected return). The weights add up to 1.
Now I want to plot something called a portfolio composition map (right plot in the picture), which is a plot of these stock weights over all levels of expected return, each with a distinct color and with a length (at every level of return) proportional to its weight.
My question is how to do this in Julia (or MATLAB).
I came across this and the accepted solution seemed so complex. Here's how I would do it:
using Plots
@userplot PortfolioComposition
@recipe function f(pc::PortfolioComposition)
    weights, returns = pc.args
    weights = cumsum(weights,dims=2)
    seriestype := :shape
    for c=1:size(weights,2)
        sx = vcat(weights[:,c], c==1 ? zeros(length(returns)) : reverse(weights[:,c-1]))
        sy = vcat(returns, reverse(returns))
        @series Shape(sx, sy)
    end
end
# fake data
tickers = ["IBM", "Google", "Apple", "Intel"]
N = 10
D = length(tickers)
weights = rand(N,D)
weights ./= sum(weights, dims=2)
returns = sort!((1:N) + D*randn(N))
# plot it
portfoliocomposition(weights, returns, labels = tickers)
matplotlib has a pretty powerful polygon plotting capability, e.g. this link on plotting filled polygons:
ploting filled polygons in python
You can use this from Julia via the excellent PyPlot.jl package.
Note that the syntax for certain things changes; see the PyPlot.jl README and e.g. this set of examples.
You "just" need to calculate the coordinates from your matrix and build up a set of polygons to plot the portfolio composition graph. It would be nice to see the code if you get this working!
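The coordinate calculation mentioned above boils down to a cumulative sum per return level. Here is a plain-Python sketch of that step only, with no plotting; the function name composition_polygons and the tiny weights matrix are made up for illustration. Each returned polygon can then be handed to whatever polygon-drawing routine is used.

```python
from itertools import accumulate

def composition_polygons(weights, returns):
    """Return one list of (x, y) vertices per asset.

    weights: one row per return level, one column per asset, rows sum to 1.
    returns: expected return of each row, sorted ascending.
    """
    # Cumulative band boundaries per row, with 0.0 as the left edge of the first band.
    bounds = [[0.0] + list(accumulate(row)) for row in weights]
    n_levels = len(returns)
    n_assets = len(weights[0])
    polygons = []
    for j in range(n_assets):
        # Walk up the left boundary of band j, then down its right boundary.
        left = [(bounds[i][j], returns[i]) for i in range(n_levels)]
        right = [(bounds[i][j + 1], returns[i]) for i in reversed(range(n_levels))]
        polygons.append(left + right)
    return polygons

weights = [[0.5, 0.25, 0.25],
           [0.25, 0.25, 0.5]]
returns = [1.0, 2.0]
polys = composition_polygons(weights, returns)
for poly in polys:
    print(poly)
```

The bands tile the interval from 0 to 1 at every return level, which is what produces the stacked look of a composition map.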
So I was able to draw it, and here's my code:
using PyPlot
using PyCall
@pyimport matplotlib.patches as patch
N = 10
D = 4
weights = Array(Float64, N,D)
for i in 1:N
    w = rand(D)
    w = w/sum(w)
    weights[i,:] = w
end
weights = [zeros(Float64, N) weights]
weights = cumsum(weights,2)
returns = sort!([linspace(1,N, N);] + D*randn(N))
##########
# Plot #
##########
polygons = Array(PyObject, 4)
colors = ["red","blue","green","cyan"]
labels = ["IBM", "Google", "Apple", "Intel"]
fig, ax = subplots()
fig[:set_size_inches](5, 7)
title("Problem 2.5 part 2")
xlabel("Weights")
ylabel("Return (%)")
ax[:set_autoscale_on](false)
ax[:axis]([0,1,minimum(returns),maximum(returns)])
for i in 1:(size(weights,2)-1)
    xy=[weights[:,i] returns;
        reverse(weights[:,(i+1)]) reverse(returns)]
    polygons[i] = matplotlib[:patches][:Polygon](xy, true, color=colors[i], label = labels[i])
    ax[:add_artist](polygons[i])
end
legend(polygons, labels, bbox_to_anchor=(1.02, 1), loc=2, borderaxespad=0)
show()
# savefig("CompositionMap.png",bbox_inches="tight")
I can't say that this is the best way to do this, but at least it works.

Adjacency matrix from edge list (preferrably in Matlab)

I have a list of triads (vertex1, vertex2, weight) representing the edges of a weighted directed graph. Since prototype implementation is going on in Matlab, these are imported as a Nx3 matrix, where N is the number of edges. So the naive implementation of this is
id1 = L(:,1);
id2 = L(:,2);
weight = L(:,3);
m = max(max(id1, id2)); % to find the necessary size
V = zeros(m,m);
for i = 1:size(L,1) % iterate over the edges
    V(id1(i),id2(i)) = weight(i);
end
The trouble with tribbles is that "id1" and "id2" are nonconsecutive; they're codes. This gives me two problems: (1) huge matrices with way too many "phantom", spurious vertices, which distort the results of algorithms to be used with that matrix, and (2) I need to recover the codes in the results of said algorithms (suffice it to say this would be trivial if the id codes were consecutive, 1:m).
Answers in Matlab are preferable, but I think I can hack back from answers in other languages (as long as they're not pre-packaged solutions of the kind "R has a library that does this").
I'm new to StackOverflow, and I hope to be contributing meaningfully to the community soon. For the time being, thanks in advance!
Edit: This would be a solution if no vertex were the origin of multiple edges. (This implies a 1:1 match between the list of edge origins and the list of identities.)
for i=1:n
    for j=1:n
        if id1(i) > 0 && id2(j) > 0
            V(i,j) = weight(i);
        end
    end
end
You can use the function sparse:
sparse(id1,id2,weight,m,m)
If your problem is that the node ID numbers are nonconsecutive, why not re-map them onto consecutive integers? All you need to do is create a dictionary of all unique node ID's and their correspondence to new IDs.
This is really no different to the case where you're asked to work with named nodes (Australia, Britain, Canada, Denmark...) - you would map these onto consecutive integers first.
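The re-mapping idea can be sketched in a few lines of plain Python; the function name build_adjacency and the example id codes are invented for illustration. The sorted list of unique ids doubles as the reverse lookup, so the original codes can be recovered from matrix indices.

```python
def build_adjacency(edges):
    """edges: iterable of (source_id, target_id, weight) with arbitrary id codes."""
    # Collect every id that appears as a source or a target.
    ids = sorted({v for src, dst, _ in edges for v in (src, dst)})
    # Dictionary from original id code to consecutive index 0..n-1.
    index_of = {node_id: i for i, node_id in enumerate(ids)}
    n = len(ids)
    matrix = [[0.0] * n for _ in range(n)]
    for src, dst, w in edges:
        matrix[index_of[src]][index_of[dst]] = w
    return matrix, ids

edges = [(1001, 57, 2.5), (57, 9000, 1.0), (1001, 9000, 4.0)]
matrix, ids = build_adjacency(edges)
print(ids)      # ids[k] is the original code of row/column k
for row in matrix:
    print(row)
```

The matrix has exactly one row and column per vertex that actually occurs, so no phantom vertices are created. Note that with duplicate (source, target) pairs this sketch keeps the last weight.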
You can use GRP2IDX function to convert your id codes to consecutive numbers, and ids can be either numerical or not, does not matter. Just keep the mapping information.
[idx1, gname1, gmap1] = grp2idx(id1);
[idx2, gname2, gmap2] = grp2idx(id2);
You can recover the original ids with gmap1(idx1).
If your id1 and id2 are from the same set you can apply grp2idx to their union:
[idx, gname,gmap] = grp2idx([id1; id2]);
idx1 = idx(1:numel(id1));
idx2 = idx(numel(id1)+1:end);
For the reordering see a recent question - how to assign a set of coordinates in Matlab?
You can use ACCUMARRAY or SUB2IND to solve this problem.
V = accumarray([idx1 idx2], weight);
or
V = zeros(max(idx1),max(idx2)); %# or V = zeros(max(idx));
V(sub2ind(size(V),idx1,idx2)) = weight;
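Check whether you have non-unique combinations of id1 and id2; you will have to take care of those. By default accumarray sums the weights of repeated index pairs, while the sub2ind assignment keeps only one of them.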
Here is another solution:
First put together all your vertex ids, since there might be a sink vertex in your graph:
v_id_from = edge_list(:,1);
v_id_to = edge_list(:,2);
v_id_all = [v_id_from; v_id_to];
Then find the unique vertex ids:
v_id_unique = unique(v_id_all);
Now you can use the ismember function to get the mapping between your vertex ids and their consecutive index mappings:
[~,from] = ismember(v_id_from, v_id_unique);
[~,to] = ismember(v_id_to, v_id_unique);
Now you can use sub2ind to populate your adjacency matrix:
n_vertices = numel(v_id_unique); % one row and one column per unique vertex
adjacency_matrix = zeros(n_vertices);
linear_ind = sub2ind(size(adjacency_matrix), from, to);
adjacency_matrix(linear_ind) = edge_list(:,3);
You can always go back from the mapped consecutive id to the original vertex id:
original_vertex_id = v_id_unique(mapped_consecutive_id);
Hope this helps.
Your first solution is close to what you want. However it is probably best to iterate over your edge list instead of the adjacency matrix.
edge_indexes = edge_list(:, 1:2);
n_vertices = max(edge_indexes(:)); % the largest vertex id gives the matrix size
adj_matrix = zeros(n_vertices);
for local_edge = edge_list' %transpose in order to iterate by edge
    adj_matrix(local_edge(1), local_edge(2)) = local_edge(3);
end