I am trying to process the graph given in wiki-Vote.txt (https://snap.stanford.edu/data/wiki-Vote.html). There are 7115 nodes with IDs ranging from 3 to 8297. I want to relabel the nodes from 0 to 7114. I checked the mappings in relabel_nodes() but still could not solve the problem. Any suggestions? Thanks.
edit
I'm not sure if it's new, but my original answer didn't mention
nx.convert_node_labels_to_integers(G, first_label=0, ordering='default', label_attribute=None). So for a given graph G, you can do
H=nx.convert_node_labels_to_integers(G). With the default ordering, the new labels follow the order of G.nodes(), which is not necessarily the sorted order of the old labels. You can have the original label stored as a node attribute in H if you call H=nx.convert_node_labels_to_integers(G, label_attribute='original_name'). To assign the new labels in sorted order of the old ones, set ordering='sorted'.
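To make that concrete, here is a minimal sketch on a tiny made-up graph (the node IDs and edges below are placeholders, not taken from wiki-Vote.txt). It assumes a NetworkX version where ordering accepts the string 'sorted':

```python
import networkx as nx

# Tiny stand-in graph with non-contiguous integer IDs (made-up data).
G = nx.DiGraph()
G.add_edges_from([(30, 3), (8297, 30), (3, 25)])

# Relabel to 0..n-1 in sorted order of the old labels and keep the
# old label as a node attribute called 'original_name'.
H = nx.convert_node_labels_to_integers(
    G, first_label=0, ordering="sorted", label_attribute="original_name"
)

# Dictionary new_label -> old_label, read back from the node attribute.
old_of = nx.get_node_attributes(H, "original_name")
```

Here old_of lets you translate any result computed on H back to the IDs used in the original file.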
original answer
Given a graph G with some set of nodes, the simplest thing would be
mapping = {old_label:new_label for new_label, old_label in enumerate(G.nodes())}
H = nx.relabel_nodes(G, mapping)
This will create a dictionary mapping whose keys are the old labels and whose values are their new labels (read up on dictionary comprehensions). The ordering of the new labels is given by the order that G.nodes() returns the values (which you can't control). The new graph H has the node labels changed.
If you want a specific order, you need to sort G.nodes() appropriately. So you can do
nodelist = sorted(G.nodes())  # G.nodes() is a view in NetworkX 2.x and later, so build a sorted list from it
mapping = {old_label:new_label for new_label, old_label in enumerate(nodelist)}
H = nx.relabel_nodes(G, mapping)
which would have them sorted in numerical order (or alphabetical order if the node names are strings). If you want some other custom order, you'll have to figure out how to sort nodelist.
I have some graphs built with NetworkX with labelled nodes (names). I have computed trophic levels with the trophic tools script and obtained a numpy array of trophic values.
I want to create a node list of these values with the corresponding labels, as with other topological indices (e.g. the output of nx.degree_centrality is easy to interpret because every node name is followed by its value).
Can someone suggest how to merge or convert the numpy array to a labelled node list?
Thanks in advance!
I realised that the algorithm doesn't produce a real Laplacian and that every entry in the array is simply the trophic value of a node. I assumed the array values are in the same order as the original node list, and the final data seems to agree with that (plants have lower values and predators have higher values).
This is the code to join node names with their trophic values, in case someone needs to compute a similar trophic index:
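import pandas as pd  # ta below is the trophic tools script
# One trophic value per node, assumed to be in the same order as G.nodes()
trophic_levels = ta.trophic_levels(G)
trophic_levels_list = list(trophic_levels)
trophic_levels_series = pd.Series(trophic_levels_list)
# One row per node; reset_index gives a 0..n-1 index that lines up with the series
G_nodes = pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient='index')
G_nodes.reset_index(inplace=True)
G_nodes['trophic_level'] = trophic_levels_series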
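If a plain name-to-value mapping is enough (the same shape nx.degree_centrality returns), a shorter route is to zip the node list with the values. This is only a sketch under the same assumption as above, namely that the values are in the order of G.nodes(); the graph and the numbers below are made up, and a NumPy array would zip the same way as the plain list used here:

```python
import networkx as nx

# Made-up food web; nodes are stored in insertion order.
G = nx.DiGraph()
G.add_edges_from([("grass", "rabbit"), ("rabbit", "fox")])

# Placeholder trophic values, assumed to follow the order of G.nodes().
levels = [1.0, 2.0, 3.0]

# Node name -> value, like the dictionaries returned by nx.degree_centrality.
by_name = {node: float(value) for node, value in zip(G.nodes(), levels)}

# Optionally store the values on the graph as a node attribute.
nx.set_node_attributes(G, by_name, "trophic_level")
```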
I'm sorry to bother you; I hope this is not a silly or repetitive question.
I have been scraping a website, saving the results as a collection in MongoDB, exporting it as a JSON file and importing it in MATLAB.
At the end of the story I obtained a struct object organised like this one in the picture.
What I'm interested in are the last two cell arrays (which can easily be converted to string arrays with string()). The first cell array is a collection of keys (think unique products) and the second is a collection of values (think prices), like a dictionary. Each field holds one instance of the values for a set of these keys (think daily prices). My goal is to build a matrix like this:
KEYS  VALUES_OF_FIELD_1  VALUES_OF_FIELD_2  ...  VALUES_OF_FIELD_n
A     x                  x                       x
B     x                  z                       NaN
C     z                  x                       y
D     NaN                y                       x
E     y                  x                       z
The main problem is that, as shown in the image and as I tried to explain in the example matrix, I don't always have a value for every key in every field (as you can see, sometimes there are 321 keys, other times 319, 320 or 317), and in that case the key is missing from the first array. I should then fill the missing value with NaN. The keys can be ordered alphabetically and are all unique.
What do you think would be the best and most scalable way to approach this problem in MATLAB?
Thank you very much for your time, I hope I explained myself clearly.
EDIT:
Both arrays are made of strings in my case, so types are not a problem (I've modified the example). The main problem is that, since the keys vary in each field, firstly I have to find all the (unique) keys in the structure, to build the rows, and then for each column (field) I have to fill the values putting NaN where the key is missing.
One thing to remember: you can't simply mix strings and numbers in one matrix. If you combine them, the result has to be either all strings or all numbers. I think all strings will work for you.
Before making the matrix, make sure that all the cell arrays have the same number of elements.
new_matrix = horzcat(keys,values1,...valuesn);
This gives you the matrix for one row of the struct (according to your image). You can then use a for loop to get the matrices for all the rows.
For now, I've solved it by considering the longest array of keys in the structure as the complete set of keys, let's call it keys_set.
Then I've created for each field in the structure a Map object in this way:
for i=1:length(structure)
structure(i).myMap = containers.Map(structure(i).key_field, structure(i).value_field);
end
Then I've built my matrix (M) by checking every map against the keys_set array:
for i=1:length(keys_set)
for j=1:length(structure)
if isKey(structure(j).myMap,char(keys_set(i)))
M(i,j) = string(structure(j).myMap(char(keys_set(i))));
else
M(i,j) = string('MISSING');
end
end
end
This works, but it would be ideal to also be able to check that keys_set is really complete.
EDIT: I've solved my problem by using this function and building the correct set of all the possible keys:
%% Finding the maximum number of keys in all the fields
maxnk = length(structure(1).key_field);
for i=2:length(structure)
if length(structure(i).key_field) > maxnk
maxnk = length(structure(i).key_field);
end
end
%% Initialising the matrix containing all the possible sets of keys
keys_set=string(zeros(maxnk,length(structure)));
%% Filling the matrix by putting "0" if the dimension is smaller
for i=1:length(structure)
d = length(string(structure(i).key_field));
if d == maxnk
keys_set(:,i) = string(structure(i).key_field);
else
clear tmp
tmp = [string(structure(i).key_field); string(zeros(maxnk-d,1))];
keys_set(:,i) = tmp;
end
end
%% Merging without duplication and removing the "0" element
keys_set = union_several(keys_set);
keys_set = keys_set(keys_set ~= string(0));
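The underlying idea (take the union of the keys over all fields, then look each key up field by field) does not depend on MATLAB. Here is a minimal sketch of it in Python rather than MATLAB, assuming each field is a pair of equally long key and value lists. The function name build_matrix and the sample data (which mirror the example matrix in the question) are made up for illustration:

```python
import math

def build_matrix(fields):
    """fields: list of (keys, values) pairs, one pair per struct field.

    Returns the sorted union of all keys and one row per key, with
    math.nan wherever a field has no value for that key.
    """
    all_keys = sorted({key for keys, _ in fields for key in keys})
    # One dictionary per field, playing the role of containers.Map.
    lookups = [dict(zip(keys, values)) for keys, values in fields]
    rows = [[lookup.get(key, math.nan) for lookup in lookups] for key in all_keys]
    return all_keys, rows

# Sample data mirroring the example matrix (B and D are each missing once).
fields = [
    (["A", "B", "C", "E"], ["x", "x", "z", "y"]),
    (["A", "B", "C", "D", "E"], ["x", "z", "x", "y", "x"]),
    (["A", "C", "D", "E"], ["x", "y", "x", "z"]),
]
all_keys, rows = build_matrix(fields)
```

Because the key set is built from every field, there is no need to assume that the longest key array is complete.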
For example,
A=[a,b,c,d]
B=[1,2,3,4]
My question is: how can I generate all possible ways to merge A and B such that, in the new list, a appears before b, b appears before c, etc., and 1 appears before 2, 2 appears before 3, etc.?
I can think of one implementation:
We choose 4 slots from 8,then for each possible selection, there are 2 possible ways--A first or B first.
I wonder whether there is a better way to do this?
EDIT:
I've just learned a more intuitive way--use recursion.
For each spot, there are two possible cases, either taken from A or taken from B; keep recursing until A or B is empty, and concatenate the remaining.
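A minimal sketch of that recursion, assuming A and B are plain Python lists (the function name interleavings is made up for illustration):

```python
def interleavings(a, b):
    """Return every merge of a and b that keeps each list's internal order."""
    # Base case: once one list is empty, the rest of the other is appended as is.
    if not a or not b:
        return [a + b]
    # Case 1: the next element is taken from a.
    take_a = [a[:1] + rest for rest in interleavings(a[1:], b)]
    # Case 2: the next element is taken from b.
    take_b = [b[:1] + rest for rest in interleavings(a, b[1:])]
    return take_a + take_b

merged = interleavings(["a", "b", "c", "d"], [1, 2, 3, 4])
```

For two lists of length 4 this produces C(8,4) = 70 merges, one for each choice of the 4 positions that the elements of A occupy.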
If the relative order is different from what constitutes a sorted list (I assume it is, because otherwise it would not be a problem), then you need to formalize the initial order. There are multiple ways to do that, the easiest being to remember the index of each element in its list. Example: valid position for a is 1 in the first array [...]
Then you could just go ahead and join the lists, then generate all the permutations of the elements. A permutation is valid if the new indexes of the elements respect the order you have stored.
Example of one valid permutation array:
a12b3cd4
You can check that this is a valid permutation because the index of element 'a' is smaller than the index of 'b', and so on, which is exactly the ordering you recorded in the first step.
Similarly, an invalid permutation array is
ba314cd2
which can be checked in the same way.
Assume there are M objects that form coalitions together. I need to know how to exhaustively generate all possible coalition formations as an M*M binary matrix with the following properties:
1- The elements of main diagonal are set to 1 (each object is in the same coalition with itself)
2- The matrix is symmetrical (being in the same coalition for two objects is a mutual relationship)
3- If objects (i,j) are in the same coalition and (j,k) are in the same coalition, then (i,k) are in the same coalition as well.
A simple formation of the coalitions with 4 objects is given by this example:
You can use another data structure which is easier to generate, then convert it to the matrix you want. Use a list with the coalition ids, where the coalition id is the minimum of all object ids. For your example this would be [1 2 1 1]. Using this data structure it's easier to describe a generator.
For each object you have the choice between joining one of the existing coalitions opened by objects with a smaller id or to open a new coalition.
There is probably no vectorized solution; to implement this, use recursion or dynamic programming.
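A minimal recursive sketch of this generator, written in Python for illustration. It assumes objects are numbered from 1 and uses the coalition-id list described above; the names coalition_labelings and to_matrix are made up:

```python
def coalition_labelings(m):
    """Yield every coalition-id list for m objects (ids are 1-based).

    Entry i holds the smallest object id in the coalition of object i + 1.
    """
    def extend(labels):
        if len(labels) == m:
            yield list(labels)
            return
        new_id = len(labels) + 1  # id of the object being placed
        # Join any coalition opened so far, or open a new one with its own id.
        for coalition_id in sorted(set(labels)) + [new_id]:
            labels.append(coalition_id)
            yield from extend(labels)
            labels.pop()
    yield from extend([])

def to_matrix(labels):
    """Binary matrix with a 1 wherever two objects share a coalition id."""
    return [[int(a == b) for b in labels] for a in labels]

all_labelings = list(coalition_labelings(4))
example = to_matrix([1, 2, 1, 1])
```

Each list corresponds to exactly one partition of the objects, so for 4 objects there are 15 of them (the Bell number B4), and to_matrix gives a matrix that is symmetric, has ones on the diagonal and is transitive by construction.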
Now, I have to separate out any pair that is common to the two input files and find the mean for that pair like this: (correlation in first text file) x (correlation in second text file) / ((correlation in first text file) + (correlation in second text file)). Again, store these in a separate matrix.
Building a tree :
Now, out of all the elements in both input files, select the 10 most frequent ones. Each of these forms the root of a separate K tree. The algorithm goes like this: for the word at the root level, check all its harmonic mean values with the other tags in the matrix developed in the previous step. Select the two highest harmonic means, and put the other word of each tag pair as a child node of the root.
Can someone please guide me through the MATLAB steps of going through this? Thank you for your time.
Okay, so start by putting the data in a useful format; maybe count the number of distinct words, and make an N-by-M matrix of binary values (I'll call this data1). Each of the N rows will describe the words associated with a single image. Each of the M columns will describe the images in which a single word is tagged. Therefore, the value at (n, m) is 0 if tag m is not in image n, and 1 if it is.
From this matrix, to find correlation between all pairs of words, you could do:
correlations1 = zeros(M, M);
for i=1:M
for j=1:M
correlations1(i, j) = corr(data1(:, i), data1(:, j));
end
end
Now the matrix correlations1 tells you the correlation between each pair of tags. Do the same for the other text file. You can make a matrix of harmonic means with:
h_means = correlations1.*correlations2./(correlations1+correlations2);
You can find the 10 most frequent tags by counting the number of 1s in each column of the data matrix. Since we want to find the most common tags in both files, we'll add the data matrices first:
[~, tag_ranks] = sort(sum(data1 + data2, 1), 'descend'); %get the indices in sorted order
top_tags = tag_ranks(1:10);
For the tree building at the end, you will either want to create a tree class (see classdef), or store the tree in an array. To find the top two highest harmonic means, you will want to look in the h_means matrix; for a tag m1, we can do:
row = h_means(m1, :);
row(m1) = -Inf; %exclude the pairing of the tag with itself
[~, tag_ranks] = sort(row, 'descend');
top_tag = tag_ranks(1);
second_tag = tag_ranks(2);
You will then need to insert these tags into the tree and repeat.
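As an illustration of the "insert and repeat" step, here is a minimal sketch in Python rather than MATLAB. It stores each tree as nested dictionaries and stops at a fixed depth; the depth limit, the function names and the small matrix of scores are all assumptions made for the example, not part of the original algorithm description:

```python
def top_two(h_means, tag):
    """Indices of the two tags with the highest score against `tag`."""
    candidates = [
        (score, other)
        for other, score in enumerate(h_means[tag])
        if other != tag  # never pair a tag with itself
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in candidates[:2]]

def build_tree(h_means, root, depth):
    """Tree as nested dicts; every node gets its two best-scoring tags as children."""
    if depth == 0:
        return {"tag": root, "children": []}
    children = [build_tree(h_means, child, depth - 1) for child in top_two(h_means, root)]
    return {"tag": root, "children": children}

# Made-up symmetric score matrix for four tags (indices 0..3).
h_means = [
    [0.50, 0.40, 0.10, 0.30],
    [0.40, 0.50, 0.20, 0.05],
    [0.10, 0.20, 0.50, 0.45],
    [0.30, 0.05, 0.45, 0.50],
]
tree = build_tree(h_means, root=0, depth=1)
```

Whether a tag may appear more than once in a tree is left open by the description, so this sketch does not try to prevent repeats.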