Ullman’s Subgraph Isomorphism Algorithm - matlab

Could somebody give me a working implementation of Ullman's subgraph isomorphism algorithm in MATLAB, or a link to one? Failing that, even a C implementation would do, so I could try to port it to MATLAB.
Thanks

I'm looking for it too. I've been searching the web with no luck so far, but I've found this:
Algorithm, where the algorithm is explained.
On the other hand, I found this:
import copy

def search(graph, subgraph, assignments, possible_assignments):
    update_possible_assignments(graph, subgraph, possible_assignments)
    i = len(assignments)
    # Make sure that every edge between assigned vertices in the subgraph is also an
    # edge in the graph.
    for edge in subgraph.edges:
        if edge.first < i and edge.second < i:
            if not graph.has_edge(assignments[edge.first], assignments[edge.second]):
                return False
    # If all the vertices in the subgraph are assigned, then we are done.
    if i == subgraph.n_vertices:
        return True
    for j in list(possible_assignments[i]):  # iterate over a copy: the list is mutated below
        if j not in assignments:
            assignments.append(j)
            # Create a new set of possible assignments, where graph node j is the only
            # possibility for the assignment of subgraph node i.
            new_possible_assignments = copy.deepcopy(possible_assignments)
            new_possible_assignments[i] = [j]
            if search(graph, subgraph, assignments, new_possible_assignments):
                return True
            assignments.pop()
            possible_assignments[i].remove(j)
            update_possible_assignments(graph, subgraph, possible_assignments)
    return False

def find_isomorphism(graph, subgraph):
    assignments = []
    # each subgraph vertex starts with every graph vertex as a candidate
    possible_assignments = [list(range(graph.n_vertices)) for _ in range(subgraph.n_vertices)]
    if search(graph, subgraph, assignments, possible_assignments):
        return assignments
    return None
here: implementation. I don't have the skills to translate this into MATLAB; if you do, I would really appreciate it if you could share your code when you're done.
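Note that the snippet calls update_possible_assignments without defining it. Below is my own rough sketch of Ullman's refinement step, plus minimal scaffolding (the Graph class and its neighbours method are my assumptions, not part of the linked code), so treat it as a starting point rather than the original author's implementation:

from collections import namedtuple

Edge = namedtuple('Edge', ['first', 'second'])

class Graph:
    # Minimal adjacency-matrix graph matching the interface the snippet assumes.
    def __init__(self, n_vertices, edges):
        self.n_vertices = n_vertices
        self.edges = [Edge(*e) for e in edges]
        self.adj = [[False] * n_vertices for _ in range(n_vertices)]
        for a, b in edges:
            self.adj[a][b] = self.adj[b][a] = True

    def has_edge(self, a, b):
        return self.adj[a][b]

    def neighbours(self, v):
        return [u for u in range(self.n_vertices) if self.adj[v][u]]

def update_possible_assignments(graph, subgraph, possible_assignments):
    # Ullman's refinement: prune candidate j for subgraph vertex i whenever some
    # neighbour of i has no remaining candidate adjacent to j; repeat until stable.
    changed = True
    while changed:
        changed = False
        for i in range(subgraph.n_vertices):
            for j in list(possible_assignments[i]):
                for x in subgraph.neighbours(i):
                    if not any(graph.has_edge(j, y) for y in possible_assignments[x]):
                        possible_assignments[i].remove(j)
                        changed = True
                        break

# Tiny smoke test: find a triangle inside a 4-cycle with a chord.
if __name__ == '__main__':
    g = Graph(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
    sg = Graph(3, [(0, 1), (1, 2), (2, 0)])
    print(find_isomorphism(g, sg))  # e.g. [0, 1, 2]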

Related

Is it possible to use callbacks to access a single trajectory in Julia's DifferentialEquations Ensemble Problems?

I am new to Julia and am trying to use the Julia package DifferentialEquations to simultaneously solve several conditions of the same set of coupled ODEs. My system is a model of an experiment, and in one of the conditions I increase the amount of one of the dependent variables midway through the process.
I would like to be able to adjust the condition of this single trajectory; however, so far I am only able to adjust all the trajectories at once. Is it possible to access a single one using callbacks? If not, is there a better way to do this?
Here is a simplified example using the Lorenz equations for what I want to be doing:
#Differential Equations setup
function lorentz!(du,u,p,t)
    a,r,b = p
    du[1] = a*(u[2]-u[1])
    du[2] = u[1]*(r-u[3])-u[2]
    du[3] = u[1]*u[2]-b*u[3];
end
#function to cycle through initial conditions
function prob_func(prob,i,repeat)
    remake(prob; u0 = u0_arr[i]);
end
#inputs
t_span = [(0.0,100.0),(0.0,100.0)];
u01 = [0.0;1.0;0.0];
u02 = [0.0;1.0;0.0];
u0_arr = [u01,u02];
p = [10.,28.,8/3];
#initialising the Ensemble Problem
prob = ODEProblem(lorentz!,u0_arr[1],t_span[1],p);
CombinedProblem = EnsembleProblem(prob,
    prob_func = prob_func, # repeat is a count of how many times the trajectory has been repeated
    safetycopy = true # whether a safety deepcopy is called on prob before prob_func (best left true for a user-given prob_func)
    );
#introducing callback
function condition(u,t,repeat)
    return 50 .- t
end
function affect!(repeat)
    repeat.u[1] = repeat.u[1] + 50
end
callback = DifferentialEquations.ContinuousCallback(condition, affect!)
#solving
sim = solve(CombinedProblem,Rosenbrock23(),EnsembleSerial(),trajectories=2,callback=callback);
#Plotting for ease of understanding example
plot(sim[1].t,sim[1][1,:])
plot!(sim[2].t,sim[2][1,:])
I want to produce something like this:
[image: Example_desired_outcome]
But this code produces:
[image: Example_current_outcome]
Thank you for your help!
You can make that callback dependent on a parameter and make the parameter different between problems. For example:
function f(du,u,p,t)
    if p == 0
        du[1] = 2u[1]
    else
        du[1] = -2u[1]
    end
    du[2] = -u[2]
end
condition(u,t,integrator) = u[2] - 0.5
affect!(integrator) = integrator.p = 1
For more information, check out the FAQ on this topic: https://diffeq.sciml.ai/stable/basics/faq/#Switching-ODE-functions-in-the-middle-of-integration

How To Use kmedoids from pyclustering with set number of clusters

I am trying to use k-medoids to cluster some trajectory data I am working with (multiple points along the trajectory of an aircraft). I want to cluster these into a set number of clusters (as I know how many types of paths there should be).
I have found that k-medoids is implemented inside the pyclustering package, and am trying to use that. I am technically able to get it to cluster, but I do not know how to control the number of clusters. I originally thought it was directly tied to the number of elements inside what I called initial_medoids, but experimentation shows that it is more complicated than this. My relevant code snippet is below.
Note that D holds a list of lists. Each list corresponds to a single trajectory.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(u, v):
    d = max(directed_hausdorff(u, v)[0], directed_hausdorff(v, u)[0])
    return d

traj_count = len(traj_lst)
D = np.zeros((traj_count, traj_count))
for i in range(traj_count):
    for j in range(i + 1, traj_count):
        distance = hausdorff(traj_lst[i], traj_lst[j])
        D[i, j] = distance
        D[j, i] = distance

from pyclustering.cluster.kmedoids import kmedoids
initial_medoids = [104, 345, 123, 1]
kmedoids_instance = kmedoids(traj_lst, initial_medoids)
kmedoids_instance.process()
cluster_lst = kmedoids_instance.get_clusters()[0]

num_clusters = len(np.unique(cluster_lst))
print('There were %i clusters found' % num_clusters)
I have a total of 1900 trajectories, and the above code finds 1424 clusters. I had expected that I could control the number of clusters through the length of initial_medoids, as I did not see any option to input the number of clusters into the program, but this seems unrelated. Could anyone guide me as to the mistake I am making? How do I choose the number of clusters?
If you need to obtain the clusters, you should call get_clusters():
cluster_lst = kmedoids_instance.get_clusters()
not get_clusters()[0], which is only the list of object indexes in the first cluster:
cluster_lst = kmedoids_instance.get_clusters()[0]
And yes, you can control the amount of clusters via initial_medoids: the algorithm allocates one cluster per initial medoid.
It is true that you can control the number of clusters: it corresponds to the length of initial_medoids.
The documentation is not clear about this. The get_clusters function "Returns list of medoids of allocated clusters represented by indexes from the input data"; that is, it does not return cluster labels, it returns the indexes of rows in your original (input) data.
Please check cluster_lst in your example, using .get_clusters() and not .get_clusters()[0] as annoviko suggested. In your case, its length should be 4: a list of four elements (clusters), each containing the indexes of rows in your original data.
To get, for example, the data from the first cluster, use:
kmedoids_instance = kmedoids(traj_lst, initial_medoids)
kmedoids_instance.process()
cluster_lst = kmedoids_instance.get_clusters()
traj_lst_first_cluster = [traj_lst[idx] for idx in cluster_lst[0]]  # traj_lst is a plain list, so index it element by element
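A further note, which depends on your pyclustering version: since the question already computes the pairwise Hausdorff matrix D, it may make sense to hand that matrix to kmedoids directly instead of the raw trajectory lists, via the data_type keyword (assuming your release supports it). A sketch:

from pyclustering.cluster.kmedoids import kmedoids

# Reuse the precomputed Hausdorff matrix D instead of the raw trajectories.
# Four initial medoid indexes -> exactly four clusters.
initial_medoids = [104, 345, 123, 1]
kmedoids_instance = kmedoids(D.tolist(), initial_medoids, data_type='distance_matrix')
kmedoids_instance.process()

clusters = kmedoids_instance.get_clusters()  # list of 4 lists of row indexes
for k, members in enumerate(clusters):
    print('cluster %d: %d trajectories' % (k, len(members)))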

Trying to balance my dataset through sample_weight in scikit-learn

I'm using RandomForest for classification, and I have an unbalanced dataset: 5830 'no' vs 1006 'yes'. I tried to balance my dataset with class_weight and sample_weight, but I can't.
My code is:
X_train, X_test, y_train, y_test = train_test_split(arrX, y, test_size=0.25)
cw = 'auto'
clf = RandomForestClassifier(class_weight=cw)
param_grid = {'n_estimators': [10, 50, 100, 200, 300], 'max_features': ['auto', 'sqrt', 'log2']}
sw = np.array([1 if i == 0 else 8 for i in y_train])
CV_clf = GridSearchCV(estimator=clf, param_grid=param_grid, cv=10, fit_params={'sample_weight': sw})
But I don't get any improvement in my TPR, FPR, or ROC metrics when using class_weight and sample_weight.
Why? Am I doing anything wrong?
Nevertheless, if I use the function called balanced_subsample, my metrics improve greatly:
def balanced_subsample(x, y, subsample_size):
    class_xs = []
    min_elems = None
    for yi in np.unique(y):
        elems = x[(y == yi)]
        class_xs.append((yi, elems))
        if min_elems is None or elems.shape[0] < min_elems:
            min_elems = elems.shape[0]
    use_elems = min_elems
    if subsample_size < 1:
        use_elems = int(min_elems * subsample_size)
    xs = []
    ys = []
    for ci, this_xs in class_xs:
        if len(this_xs) > use_elems:
            np.random.shuffle(this_xs)
        x_ = this_xs[:use_elems]
        y_ = np.empty(use_elems)
        y_.fill(ci)
        xs.append(x_)
        ys.append(y_)
    xs = np.concatenate(xs)
    ys = np.concatenate(ys)
    return xs, ys
My new code is:
X_train_subsampled, y_train_subsampled = balanced_subsample(arrX, y, 0.5)
X_train, X_test, y_train, y_test = train_test_split(X_train_subsampled, y_train_subsampled, test_size=0.25)
cw = 'auto'
clf = RandomForestClassifier(class_weight=cw)
param_grid = {'n_estimators': [10, 50, 100, 200, 300], 'max_features': ['auto', 'sqrt', 'log2']}
sw = np.array([1 if i == 0 else 8 for i in y_train])
CV_clf = GridSearchCV(estimator=clf, param_grid=param_grid, cv=10, fit_params={'sample_weight': sw})
This is not a full answer yet, but hopefully it'll help get there.
First some general remarks:
To debug this kind of issue it is often useful to have deterministic behavior. You can pass the random_state parameter to RandomForestClassifier and the various scikit-learn objects that have inherent randomness to get the same result on every run. You'll also need:
import numpy as np
np.random.seed(0)  # any fixed seed will do
import random
random.seed(0)
for your balanced_subsample function to behave the same way on every run.
Don't grid search on n_estimators: more trees is always better in a random forest.
Note that sample_weight and class_weight have a similar objective: actual sample weights will be sample_weight * weights inferred from class_weight.
Could you try:
Using subsample_size=1 in your balanced_subsample function. Unless there's a particular reason not to do so, we're better off comparing the results on a similar number of samples.
Using your subsampling strategy with class_weight and sample_weight both set to None.
EDIT: Reading your comment again I realize your results are not so surprising!
You get a better (higher) TPR but a worse (higher) FPR.
It just means your classifier tries hard to get the samples from class 1 right, and thus makes more false positives (while also getting more of those right of course!).
You will see this trend continue if you keep increasing the class/sample weights in the same direction.
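As a concrete sketch of the weighting route (note: class_weight='auto' from the question was renamed to 'balanced' in later scikit-learn releases; arrX and y are the question's arrays, and the exact weights are only a starting point to tune):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_sample_weight

# Fixed seed for reproducibility, per the remarks above.
X_train, X_test, y_train, y_test = train_test_split(
    arrX, y, test_size=0.25, random_state=0)

# Option 1: let the forest reweight classes inversely to their frequency.
clf = RandomForestClassifier(n_estimators=300, class_weight='balanced',
                             random_state=0)
clf.fit(X_train, y_train)

# Option 2: equivalent per-sample weights, passed explicitly.
sw = compute_sample_weight('balanced', y_train)
clf2 = RandomForestClassifier(n_estimators=300, random_state=0)
clf2.fit(X_train, y_train, sample_weight=sw)

Either mechanism alone should be enough; combining them multiplies the effective weights, as noted above.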
There is an imbalanced-learn API that helps with oversampling/undersampling data, which might be useful in this situation. You can pass your training set into one of the methods and it will output the oversampled data for you. See the simple example below.
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=1)
x_oversampled, y_oversampled = ros.fit_resample(orig_x_data, orig_y_data)  # fit_sample in older imbalanced-learn releases
Here is the link to the API: http://contrib.scikit-learn.org/imbalanced-learn/api.html
Hope this helps!

scala: directed and undirected edges of Graphs

If a directed Edge is implemented something like:
class EdgeImpl(origin: Node, dest: Node) {
    def from = origin
    def to = dest
}
then what is the difference when implementing an undirected Edge, given that in both cases we create a new edge the same way: new EdgeImpl(node1, node2)? I do not get the difference in implementation :(
Edit
More concretely, I was analyzing this example.
There is no real difference in the implementation of an Edge; in both cases, just the two connected nodes need to be specified.
The difference pops up when you implement something where the meaning of an edge needs interpretation. For instance, if you had a method areConnected(a: Node, b: Node): Boolean, its implementation would traverse the list of edges and, in a directed graph, would return true if from == a && to == b. The undirected version would evaluate (from == a && to == b) || (from == b && to == a) instead.
That example is somewhat convoluted and does not make clear why the features described are really needed, but consider, for instance, how you would go about creating a WeightedDirectedGraph, where each edge also carries a weight or distance between the connected nodes.

networkx: efficiently find absolute longest path in digraph

I want networkx to find the absolute longest path in my directed,
acyclic graph.
I know about Bellman-Ford, so I negated my graph lengths. The problem:
networkx's bellman_ford() requires a source node. I want to find the
absolute longest path (or the shortest path after negation), not the
longest path from a given node.
Of course, I could run bellman_ford() on each node in the graph and
sort, but is there a more efficient method?
From what I've read (e.g.,
http://en.wikipedia.org/wiki/Longest_path_problem) I realize there
may not actually be a more efficient method, but I was wondering if
anyone had any ideas (and/or had proved P=NP (grin)).
EDIT: all the edge lengths in my graph are +1 (or -1 after negation), so a method that simply visits the most nodes would also work. In general, it won't be possible to visit ALL nodes of course.
EDIT: OK, I just realized I could add an additional node that simply connects to every other node in the graph, and then run bellman_ford from that node. Any other suggestions?
There is a linear-time algorithm mentioned at http://en.wikipedia.org/wiki/Longest_path_problem
Here is a (very lightly tested) implementation
EDIT: this is clearly wrong, see below. +1 for testing more than lightly before posting in the future.
import networkx as nx

def longest_path(G):
    dist = {}  # stores (distance, predecessor) pairs keyed by node
    for node in nx.topological_sort(G):
        pairs = [[dist[v][0] + 1, v] for v in G.pred[node]]  # incoming pairs
        if pairs:
            dist[node] = max(pairs)
        else:
            dist[node] = (0, node)
    node, max_dist = max(dist.items())
    path = [node]
    while node in dist:
        node, length = dist[node]
        path.append(node)
    return list(reversed(path))

if __name__ == '__main__':
    G = nx.DiGraph()
    nx.add_path(G, [1, 2, 3, 4])
    print(longest_path(G))
EDIT: Corrected version (use at your own risk and please report bugs)
def longest_path(G):
    dist = {}  # stores (distance, predecessor) pairs keyed by node
    for node in nx.topological_sort(G):
        # pairs of (dist, node) for all incoming edges
        pairs = [(dist[v][0] + 1, v) for v in G.pred[node]]
        if pairs:
            dist[node] = max(pairs)
        else:
            dist[node] = (0, node)
    node, (length, _) = max(dist.items(), key=lambda x: x[1])
    path = []
    while length > 0:
        path.append(node)
        length, node = dist[node]
    return list(reversed(path))

if __name__ == '__main__':
    G = nx.DiGraph()
    nx.add_path(G, [1, 2, 3, 4])
    nx.add_path(G, [1, 20, 30, 31, 32, 4])
    # nx.add_path(G, [20, 2, 200, 31])
    print(longest_path(G))
Aric's revised answer is a good one, and I found it has been adopted by the networkx library (link).
However, I found a little flaw in this method:
if pairs:
    dist[node] = max(pairs)
else:
    dist[node] = (0, node)
because pairs is a list of (int, nodetype) tuples. When comparing tuples, Python compares the first elements, and if they are equal, proceeds to compare the second elements, which are the nodetypes. In my case the nodetype is a custom class whose comparison methods are not defined, so Python throws an error like 'TypeError: unorderable types: xxx() > xxx()'.
As a possible improvement, the line
dist[node] = max(pairs)
can be replaced by
dist[node] = max(pairs, key=lambda x: x[0])
Sorry about the formatting, since it's my first time posting. I wish I could just post below Aric's answer as a comment, but the website forbids me from doing so, stating I don't have enough reputation (fine...).
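For what it's worth, this algorithm (with a key-based max along the lines of the fix above) has since been adopted into networkx itself as dag_longest_path, so on a current install the whole question reduces to:

import networkx as nx

G = nx.DiGraph()
nx.add_path(G, [1, 2, 3, 4])
nx.add_path(G, [1, 20, 30, 31, 32, 4])

print(nx.dag_longest_path(G))         # [1, 20, 30, 31, 32, 4]
print(nx.dag_longest_path_length(G))  # 5 (number of edges)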