Networkx - outdegree calculation - networkx

I'm calculating the outdegree over a series of different graphs. However, the last 'levels' in the graphs always produce a 0 outdegree when they shouldn't. For example, in the picture below the uppermost level nodes (labeled as 4) all have an outdegree of 0 (when it should be 1):
[Graph]: https://i.stack.imgur.com/yAi1H.jpg
Everything else calculates correctly. Any tips? I've created this graph from a previous piece of code (not mine), and can provide the piece where I'm creating the networkx graph. I could be adding edges incorrectly, but from the docs I seem to be fine. The 'graph' is the networkx DiGraph() which is passed into the function:
[Code - 1]: https://i.stack.imgur.com/E4gGL.png
[Code - 2]: https://i.stack.imgur.com/Y7reI.png
[Output]: https://i.stack.imgur.com/fRQK3.png
From the output the only one that should have a zero outdegree is the first, Part9518. You can see the visual of this graph in the first link.
Sorry about the links -- not enough rep to upload in post.

Related

append an atom with exisiting variables and create new set in clingo

I am totally new in asp, I am learning clingo and I have a problem with variables. I am working on graphs and paths in the graphs so I used a tuple such as g((1,2,3)). what I want is to add new node to the path in which the tuple sequence holds. for instance the code below will give me (0, (1,2,3)) but what I want is (0,1,2,3).
Thanks in advance.
g((1,2,3)).
g((0,X)):-g(X).
Naive fix:
g((0,X,Y,Z)) :- g((X,Y,Z)).
However I sense that you want to store the path in the tuple as is it is a list. Bad news: unlike prolog clingo isn't meant to handle lists as terms of atoms (like your example does). Lists are handled by indexing the elements, for example the list [a,b,c] would be stored in predicates like p(1,a). p(2,b). p(3,c).. Why? Because of grounding: you aim to get a small ground program to reduce the complexity of the solving process. To put it in numbers: assuming you are searching for a path which includes all n nodes. This sums up to n!. For n=10 this are 3628800 potential paths, introducing 3628800 predicates for a comparively small graph. Numbering the nodes as mentioned will lead to only n*n potential predicates to represent the path. For n=10 these are just 100, in comparison to 3628800 a huge gain.
To get an impression what you are searching for, run the following example derived from the potassco website:
% generating path: for every time exactly one node
{ path(T,X) : node(X) } = 1 :- T=1..6.
% one node isn't allowed on two different positions
:- path(T1,X), path(T2,X), T1!=T2.
% there has to be an edge between 2 adjascent positions
:- path(T,X), path(T+1,Y), not edge(X,Y).
#show path/2.
% Nodes
node(1..6).
% (Directed) Edges
edge(1,(2;3;4)). edge(2,(4;5;6)). edge(3,(1;4;5)).
edge(4,(1;2)). edge(5,(3;4;6)). edge(6,(2;3;5)).
Output:
Answer: 1
path(1,1) path(2,3) path(3,4) path(4,2) path(5,5) path(6,6)
Answer: 2
path(1,1) path(2,3) path(3,5) path(4,4) path(5,2) path(6,6)
Answer: 3
path(1,6) path(2,2) path(3,5) path(4,3) path(5,4) path(6,1)
Answer: 4
path(1,1) path(2,4) path(3,2) path(4,5) path(5,6) path(6,3)
Answer: 5
...

Centralities in networkx weighted graph

I am not able to compute centralities for a simple NetworkX weighted graph.
Is it normal or I am rather doing something wrong?
I add edges with a simple add_edge(c[0],c[1],weight = my_values), where
c[0],c[1] are strings (names of the nodes) and my_values integers, within a for loop. This is an example of the resulting edges:
('first node label', 'second node label', {'weight': 14})
(the number of nodes does't really matter — for now I keep it to only 20)
The edge list of my graph is a list of tuples, with (string_node1,string_node2,weight_dictionary) - everything looks fine, as I am also able to draw/save/read/ the graph...
Why?:
nx.degree_centrality gives me all 1s ?
nx.closeness_centrality gives me all 1s ?
example:
{'first node name': 1.0,
...
'last node name': 1.0}
Thanks for your help.
It was easy:
instead of using nx.degree_centrality() I use
my_graph.degree(weight='weight') - still I think this is a basic lack in the module...
...but, the issue is still open for nx.closeness_centrality
For making closeness_centrality consider weight, you have to add a distance attribute of 1 / weight to graph edges, as suggested in this issue.
Here's code to do it (graph is g):
g_distance_dict = {(e1, e2): 1 / weight for e1, e2, weight in g.edges(data='weight')}
nx.set_edge_attributes(g, g_distance_dict, 'distance')
I know this is a pretty old question, but just wanted to point out that the reason why your degree centrality values are all 1 is probably because your graph is complete (i.e., all nodes are connected to every other node), and degree centrality refers to the proportion of nodes in the graph to which a node is connected.
Per networkx's documentation:
The degree centrality for a node v is the fraction of nodes it is connected to.
The degree centrality values are normalized by dividing by the maximum possible degree in a simple graph n-1 where n is the number of nodes in G.

How do I plot values in an array vs the number of times those values appear in Matlab?

I have a set of ages (over 10000 of them) and I want to plot a graph with the age from 20 to 100 on the x axis and then the number of times each of those ages appears in the data on the y axis. I have tried several ways to do this and I can't figure it out. I also have some other data which requires me to plot values vs how many times they occur so any advice on how to do this would be much appreciated.
I'm quite new to Matlab so it would be great if you could explain how things in your answer work rather than just typing out some code.
Thanks.
EDIT:
So I typed histogram(Age, 80) because as I understand that will plot the values in Age on a histogram split up into 80 bars (1 for each age). Instead I get this:
The bars aren't aligned and it's clearly not 1 per age nor has it plotted the number of times each age occurs on the y axis.
You have to use histogram(), and that's correct.
Let's see with an example.
I extract 100 ages between 20 and 100:
ages=randsample([20:100],100,true);
Now I call histogram() in this manner:
h=histogram(ages,[20:100]);
where h is an histogram object and this will also show the following plot:
However, this might look easy due to the fact that my ages vector is in range 20:100, so it will not contain any other values. If your vector, as instead, contains also ages not in range 20:100, you can specify the additional option 'BinLimits' as third input in histogram() like this:
h=histogram(ages,length([20:100]),'BinLimits',[20:100]);
and this option plots a histogram using the values in ages that fall between 20 and 100 inclusive.
Note: by inspecting h you can actually see and/or edit some proprieties of your histogram. An attribute (field) of such object you might be interested to is Values. This is a vector of length 80 (in our case, since we work with 80 bins) in which the i-th element is the number of items is the i-th bin. This will help you count the occurrences (just in case you need them to go on with your analysis).
Like Luis said in comments, hist is the way to go. You should specify bin edges, rather than the number of bins:
ages = randi([20 100], [1 10000]);
hist(ages, [20:100])
Is this what you were looking for?

Determine the attribute that influences the outcome most

I have a dataset in .csv format as shown:
NRC_CLASS,L1_MARKS_FINAL,L2_MARKS_FINAL,L3_MARKS_FINAL,S1_MARKS_FINAL,S2_MARKS_FINAL,S3_MARKS_FINAL,
FAIL,7,12,12,24,4,30,
PASS,49,36,46,51,31,56,
FAIL,59,35,42,18,18,45,
PASS,61,30,51,33,30,52,
PASS,68,30,35,53,45,54,
2,82,77,75,32,36,56,
FAIL,18,35,35,32,21,35,
2,86,56,46,44,37,60,
1,94,45,62,70,50,59,
Where the first column talks about the over all grade:
FAIL - Fail
PASS - Pass class
1 - First class
2 - Second class
D - Distinction
This is followed by marks of each student in 6 subjects.
Is there anyway i can find out performance in which subject makes a difference in overall outcome?
I am using Weka and had used J48 to build a tree.
The summary of J48 classifier is:
=== Summary ===
Correctly Classified Instances 30503 92.5371 %
Incorrectly Classified Instances 2460 7.4629 %
Kappa statistic 0.902
Mean absolute error 0.0332
Root mean squared error 0.1667
Relative absolute error 10.8867 %
Root relative squared error 42.7055 %
Total Number of Instances 32963
Also I discretized the marks data into 10 bins with useEqualFrequency set to true. The summary of J48 now is:
=== Summary ===
Correctly Classified Instances 28457 86.3301 %
Incorrectly Classified Instances 4506 13.6699 %
Kappa statistic 0.8205
Mean absolute error 0.0742
Root mean squared error 0.2085
Relative absolute error 24.3328 %
Root relative squared error 53.4264 %
Total Number of Instances 32963
First of all, you may need to quantify a value for each of the NRC_CLASS Values (or even better, use the actual grade out of 100) to improve the quality of attribute testing.
From there, you could potentially use Attribute Selection (found in the Select Attribute tab of Weka Explorer) to find the attributes that have the greatest influence on the overall grade. Perhaps the CorrelationAttributeEval as the Attribute Evaluator coupled with the Ranker search method could assist in identifying the attributes of greatest importance to the least.
Hope this Helps!
It seems you want to determine the relative relevance of each attribute. In this case, you need to use a weight learning algorithm. Weka has a few, I just used Relief. Go to the tab Select attributes, in Attribute Evaluator, select ReliefF-AttributeEval, it will select the
Select the attribute that has the value for the outcome class.
Search Method for you. Click Start.
The results will include the ranked attributes, the highest ranked is the most relevant.
In a test data set T with 25 attributes, run i=1:25 rounds where you replace the values of the i-th attribute with random values (=noise). Compare the test performance of each of the 25 rounds with the case where no attribute was replaced, and identify the round in which the performance dropped the most.
If the worst performance decrease occurred e.g. in round 13, this indicates that attribute 13 is the most important one.

To find the largest edge in the path between two given nodes / vertices

I am trying to update a MST by adding a new vertex in the MST. For this, I have been following "Updating Spanning Tree" by Chin and Houck. http://www.computingscience.nl/docs/vakken/al/WerkC/UpdatingSpanningTrees.pdf
A step in the paper requires me to find the largest edge in the path/paths between two given vertices. My idea is to find all the possible paths between the vertices and then, subsequently find the largest edge from the paths. I have been trying to implement this in MATLAB. However, so far, I have been unsuccessful. Any lead / clear algorithm to find all paths between two vertices or even the largest edge in the path between two given nodes/ vertices would be really welcome.
For reference, I would like to put forward an example. If the graph has following edges 1-2, 1-3, 2-4 and 3-4, the paths between 4 and 4 are:
1) 4-2-1-3-4
2) 4-3-1-2-4
Thank you
The algorithm works by lowering the t value to exclude large edges from the new MST. When the algorithm completes, t will be the lowest edge that remains to be inserted to complete the MST.
The m value represents the largest edge on a path from r to z, local to each run of INSERT. m is lowered at each iteration of the loop if possible, thereby removing the previous m edge as a possible candidate for t.
It's not easy to explain in words, I recommend doing a run of the algorithm on paper until the steps are clear.
I made a quick attempt to sketch the steps here: http://jacob.midtgaard-olesen.dk/?p=140
But basically, the algorithm adds edges from the old MST unless it finds a smaller edge to add between the new node z and another node in the old MST. In the example, the edge (A,B) is not in the new tree, since a better connection to B was found by the algorithm.
Note that on selecting h and k, if t and (w,r) have equal edge value, I believe you should choose (w,r)
Finally you should probably go trough the proof following the algorithm to understand why the algorithm works. (I didn't read it all :) )