I apologize for the formatting and what seems like a very easy question. I am new to MATLAB and to this Stack Exchange site. I am attempting to create an adjacency matrix from a few column vectors in MATLAB. The information was imported from a text file and looks like this:
X Y Z W
aa bb 1 aa
bb cc 2 bb
cc dd 3 cc
Columns X and Y hold the names of the two vertices of each edge, and Z is the weight of that edge. Columns X and Y have about 30000 entries, with repetition. Column W is all of the vertices in my graph sorted alphabetically without repetition.
The output should look like this for the sample data.
aa bb cc dd
aa 0 1 0 0
bb 1 0 2 0
cc 0 2 0 3
dd 0 0 3 0
I know how to create the matrix if the vertices are numerical. But I can't figure out how to assign numeric values to the vertices in column W and make everything still match up.
This code will work if the values in all the columns are numerical.
A = sparse([X; Y],[Y; X],[Z; Z]);
Where X, Y and Z are the columns above. When I try this with the cell arrays of strings, I get the following error:
Undefined function 'sparse' for input arguments of type 'cell'.
You can still use sparse but you're going to have to do a bit more work. For one thing, we need to transform the labels in X and Y into unique integer IDs. Try using unique on the combined X and Y inputs so that you can get unique integer IDs shared between both.
Specifically, unique will give you a list of all unique entries of the input (so X and Y combined). The reason why we combine both X and Y is because there are certain tokens in X that may not be present in Y and vice-versa. Doing this ID assigning on the combined input will ensure consistency. The 'stable' flag is there because unique actually sorts all of the unique entries by default. If the input is a cell array of strings, the cell array is sorted in lexicographical order. If you want to maintain the order in which unique entries are encountered starting from the beginning to the end of the cell array, you use the 'stable' flag.
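To make the difference between the default ordering and 'stable' concrete, here is a minimal sketch. The labels are made up for illustration and are not taken from the question's data.

```matlab
% Illustrative labels only (not from the question's data set)
labels = {'cc', 'aa', 'bb', 'aa', 'cc'};

sortedLabels = unique(labels);            % default: lexicographic order
stableLabels = unique(labels, 'stable');  % order of first appearance

disp(sortedLabels);   % contains aa, bb, cc
disp(stableLabels);   % contains cc, aa, bb
```

Either ordering works for building the adjacency matrix; it only changes which row and column each vertex ends up in.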
Next, what I would use is an associative array via a containers.Map that maps a string to a unique integer. Think of an associative array as a dictionary where the input is a key and the output is a value that is associated with this key. The best example of an associative array in this context would be the English dictionary. The key in this case is the word you want to look up, and the value is the definition of this word. The key is a character string, and the output is another character string.
Here, what we'll do is make the input a string and the output a single number. For each unique string we encountered with the combination of X and Y, we'll assign a unique ID to it. After that, we can use X and Y as inputs into the containers.Map to get our IDs which can then be used as input into sparse.
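As a small sketch of how such a map behaves, assuming four hypothetical vertex names, you can look up one key with parentheses or several keys at once with values:

```matlab
% Hypothetical vertex names used only for illustration
names = {'aa', 'bb', 'cc', 'dd'};
map = containers.Map(names, 1:numel(names));

singleId = map('cc');                                   % look up one key -> 3
manyIds  = cell2mat(values(map, {'dd', 'aa', 'dd'}));   % several keys -> [4 1 4]
hasKey   = isKey(map, 'zz');                            % false: 'zz' was never inserted
```

Looking up a key that is not in the map raises an error, so isKey is a handy check if X or Y might contain names you did not register.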
Without further ado, here's the code:
%// Your example
X = {'aa', 'bb', 'cc'};
Y = {'bb', 'cc', 'dd'};
Z = [1 2 3];
%// Call unique and get the unique entries
chars = unique([X Y], 'stable');
%// Create containers.Map
map = containers.Map(chars, 1:numel(chars));
%// Find the IDs for each of X and Y
idX = cell2mat(values(map, X));
idY = cell2mat(values(map, Y));
%// Create sparse matrix; (:) forces column vectors so that rows, columns and weights line up
A = sparse([idX(:); idY(:)], [idY(:); idX(:)], [Z(:); Z(:)]);
The third and second last lines of code are a bit peculiar. You need to use the values function to retrieve the values given a cell array of keys. We have X and Y as both cell arrays, and so the output is also a cell array of values. We don't want this to be a cell array but to be a numerical vector instead as input into sparse, so that's why we use cell2mat to convert this back for us. Once we finally retrieve the IDs for X and Y, we put this into sparse to complete the matrix.
When we display the full version of A, we get:
>> full(A)
ans =
0 1 0 0
1 0 2 0
0 2 0 3
0 0 3 0
Minor Note
I see that W is the cell array of the vertex names sorted and in alphabetical order. If that's the case, then you don't need to do any unique calling, and you can just use W as the input into the containers.Map. As such, do this:
%// Create containers.Map
map = containers.Map(W, 1:numel(W));
%// Find the IDs for each of X and Y
idX = cell2mat(values(map, X));
idY = cell2mat(values(map, Y));
%// Create sparse matrix; (:) forces column vectors so that rows, columns and weights line up
A = sparse([idX(:); idY(:)], [idY(:); idX(:)], [Z(:); Z(:)]);
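If you would rather avoid containers.Map altogether, one alternative (a swapped-in technique, not part of the approach above) is the second output of ismember, which returns the position of each name within W. This is only a sketch that assumes X, Y, Z and W are column vectors holding the sample data from the question:

```matlab
% Sample data from the question, stored as column vectors
X = {'aa'; 'bb'; 'cc'};
Y = {'bb'; 'cc'; 'dd'};
Z = [1; 2; 3];
W = {'aa'; 'bb'; 'cc'; 'dd'};

% Second output of ismember is the position of each label within W
[foundX, idX] = ismember(X, W);
[foundY, idY] = ismember(Y, W);
assert(all(foundX) && all(foundY), 'Some vertex names are missing from W');

% Passing the size explicitly keeps A square even if the last vertices have no edges
n = numel(W);
A = sparse([idX; idY], [idY; idX], [Z; Z], n, n);
full(A)
```

For the sample data this should give the same 4-by-4 symmetric matrix as the containers.Map version.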
Related
I have sum of 3 cell arrays
A=72x1
B=72x720
C=72x90
resultant=A+B+C
size of resultant=72x64800
now when I find the minimum value with row and column indices I can locate the row element easily but how can I locate the column element in variables?
for example
After doing the calculations for A, B and C, I added them all and got a resultant in the form of <72x(720x90)>, that is, a matrix of integers of size <72x64800>. Then I found the minimum value of resultant with its row and column index using the code below.
[minimumValue,ind]=min(resultant(:));
[row,col]=find(resultant==minimumValue);
Then row was 14 and col was 6840.
Now I can trace row 14 of A, B and C easily, but how can I know which combination of the columns of B and C that resultant column belongs to?
Instead of using find, use the ind output from the min function. This is the linear index for minimumValue. To do that you can use ind2sub:
[r,c] = ind2sub(size(resultant),ind);
It is not quite clear what you mean by resultant = A+B+C, since you clearly don't sum them if you get a bigger array (72x64800). On the other hand, this is not a simple concatenation ([A B C]), since that would result in a 72x811 array.
However, assuming this is a concatenation you can do the following:
% get the 2nd dimension size of all matrices:
cols = cellfun(@(x) size(x,2),{A,B,C})
% create a vector repeating each matrix name once per column:
mats = repelem(['A' 'B' 'C'],cols);
% get the relevant matrix for the c column:
mats(c)
so mats(c) will be the matrix with the minimum value.
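Here is a minimal sketch of the concatenation case with tiny stand-in matrices, so the lookup is easy to follow by eye. The sizes and the offsets/localCol names are assumptions added for illustration; they also recover the column inside the matching matrix.

```matlab
% Small stand-in matrices (sizes chosen only for illustration)
A = zeros(2, 1);
B = zeros(2, 3);
C = zeros(2, 2);

cols = cellfun(@(x) size(x, 2), {A, B, C});   % [1 3 2]
mats = repelem('ABC', cols);                   % 'ABBBCC'

c = 5;                                         % a column of [A B C]
whichMat = mats(c);                            % 'C'

% Column inside that matrix: subtract the widths of the matrices before it
offsets  = [0 cumsum(cols(1:end-1))];          % [0 1 4]
localCol = c - offsets(whichMat == 'ABC');     % 1
```

So column 5 of [A B C] is column 1 of C in this toy case.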
EDIT:
From your comment I understand that your code looks something like this:
% arbitrary data:
A = rand(72,1);
B = rand(72,720);
C = rand(72,90);
% initializing:
K = size(B,2);
N = size(C,2);
counter = 1;
resultant = zeros(72,K*N);
% summing:
for k = 1:K
for n = 1:N
resultant(:,counter) = A + B(:,k) + C(:,n);
counter = counter+1;
end
end
% finding the minimum value:
[minimumValue,ind] = min(resultant(:))
and from the start of the answer you know that you can do this:
[r,c] = ind2sub(size(resultant),ind)
to get the row and column of minimumValue in resultant. So, in the same way you can do:
[Ccol,Bcol] = ind2sub([N,K],c)
where Bcol and Ccol are the columns in B and C, respectively, so that:
minimumValue == A(r) + B(r,Bcol) + C(r,Ccol)
To see how this works, imagine that the loop above fills a matrix M with the value of counter, and M has a size of N-by-K. Because we fill M by linear index, it is filled in column-major order, so the row corresponds to the n iterator and the column corresponds to the k iterator. Now c is the value of counter at which we got the minimum, and the row and column of that counter value in M give us the columns in C and B, respectively, so we can use ind2sub again to get those subscripts. Of course, we don't really need to create M, because the values within it are just the linear indices themselves.
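To check the index arithmetic on something small, here is a sketch with hand-picked integer data (4 rows, K = 3, N = 2). The values are made up so that the minimum is unique and easy to verify by hand.

```matlab
% Small hand-picked stand-ins for A, B and C (values are illustrative)
A = [4; 1; 3; 2];
B = [9 2 7; 5 8 1; 6 3 4; 2 9 5];
C = [3 6; 7 2; 1 8; 5 4];
K = size(B, 2);
N = size(C, 2);

% Same double loop as above: n runs fastest, k slowest
resultant = zeros(4, K*N);
counter = 1;
for k = 1:K
    for n = 1:N
        resultant(:, counter) = A + B(:, k) + C(:, n);
        counter = counter + 1;
    end
end

[minimumValue, ind] = min(resultant(:));
[r, c] = ind2sub(size(resultant), ind);   % r = 2, c = 6
[Ccol, Bcol] = ind2sub([N, K], c);        % Ccol = 2, Bcol = 3

% The recovered columns reproduce the minimum (4 = 1 + 1 + 2)
recovered = A(r) + B(r, Bcol) + C(r, Ccol);
```

Note the order of the outputs: because n is the inner loop, the first subscript is the column of C and the second is the column of B.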
Is it possible to vectorize a loop that goes through different index mappings? For example:
a = zeros(1, 5);
m = [4 3 5; 5 1 3];
f = [1 2 3; 4 5 6];
for ii = 1:size(m,1)
a(m(ii,:)) = a(m(ii,:)) + f(ii,:);
end
Gives output:
a = [5 0 2+6 1 3+4] = [5 0 8 1 7]
Can this be done without the for loop?
This is a classic case of accumarray. accumarray works by providing a set of keys and a set of values associated with each key. accumarray groups all values that belong to the same key and does something to all of the values. The default behaviour is to sum all of the values that belong to the same key together, which is what you're after.
In your case, m are the keys and f are the values you want to add up that belong to the same key. Therefore:
>> a = accumarray(m(:), f(:))
a =
5
0
8
1
7
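As a quick sanity check, this sketch runs the loop from the question and the accumarray call side by side on the same data and compares the results:

```matlab
m = [4 3 5; 5 1 3];
f = [1 2 3; 4 5 6];

% Loop version from the question
aLoop = zeros(1, 5);
for ii = 1:size(m, 1)
    aLoop(m(ii, :)) = aLoop(m(ii, :)) + f(ii, :);
end

% accumarray version; it returns a column, so transpose to compare
aAcc = accumarray(m(:), f(:)).';

isequal(aLoop, aAcc)   % true for this data
```

One difference to keep in mind: if the same index appears twice within a single row of m, the loop keeps only the last of those additions, whereas accumarray adds them all.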
In general, the largest keys may be missing from m. Therefore, you may opt to specify the size of the output array explicitly. Here the number of rows is set to the maximum key value seen in m, but you can replace it with a larger value if m does not contain the largest possible key:
a = accumarray(m(:), f(:), [max(m(:)), 1]);
This is of course assuming that m consists of strictly positive integers.
In general, if you have floating point numbers in m, then accumarray out of the box won't work because the keys are assumed to be strictly positive integers. However, a common trick is to assign a unique ID to each value of m and use this as the input into accumarray. The third output of unique should do this for you. You'll also need the first output of unique to help you figure out which sum belongs to what key:
[msorted,~,id] = unique(m);
a = accumarray(id, f(:));
out = [msorted a];
out will contain a 2 column matrix where each row gives you a unique value in m and the associated sum for all values that shared the same key in m.
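A minimal sketch of that trick with made-up non-integer keys (the keys and vals names and their contents are assumptions for illustration):

```matlab
% Hypothetical non-integer keys, for illustration only
keys = [0.5 2.25 0.5; 2.25 7 0.5];
vals = [1 2 3; 4 5 6];

[keysSorted, ~, id] = unique(keys);   % id maps every key to 1..numel(keysSorted)
sums = accumarray(id(:), vals(:));
out = [keysSorted(:) sums];
```

Here out should be [0.5 10; 2.25 6; 7 5]: the key 0.5 collects 1 + 3 + 6, the key 2.25 collects 2 + 4, and the key 7 collects 5.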
I have an N-by-2 matrix which contains the edges of a graph. The entries of the matrix are the IDs of Twitter users. Their relation is the retweeted status (if a user retweets another user). In total, my graph has N retweet relations, and the number of users is M. I want to transform the IDs of the graph from the initial Twitter IDs to the IDs 1:M. For example, I want to replace the first ID of the graph with 1 (in every row and column where it appears). I want to do so without changing again any ID which has already been changed. I tried to use a for-loop combined with the find function in order to transform the IDs to indices. However, what should I do in order to avoid changing items that have already been changed? I know that my code is wrong:
counter = 0;
for index = 1:length(grph)
index1 = find(grph(:,1) == grph(index,1));
index2 = find(grph(:,2) == grph(index,2));
counter = counter+1;
grph(index1,1) = counter;
counter = counter+1;
grph(index2,2) = counter;
end
A little example which illustrates what I want, the following:
35113 45010
5695 57711
22880 33193
22880 45010
43914 35113
Desired output :
1 2
3 4
5 6
5 2
7 1
Pretty simple. Use a combination of unique and reshape. Assuming your ID matrix in your example was stored in A:
[~,~,B] = unique(A.', 'stable');
C = reshape(B, [size(A,2) size(A,1)]).';
A is the matrix of IDs, while C is your desired output. This works because unique's third output gives you, for each value in A, the index of that value in the list of unique values, which serves as its new ID. The reason why we need to transpose A first is because MATLAB traverses matrices column by column, while your IDs need to be assigned row by row. Transposing A effectively does this. Also, you need the 'stable' flag so that we assign IDs in the order we encounter them. Without 'stable', unique sorts the values in A first and then assigns the IDs.
B will inevitably become a column vector, and so we need to reshape this back into a matrix that is the same size as your input A. Note that reshape also fills its output column by column. Because we were operating along the rows, I need to reshape B to the size of the transpose of A, and then transpose that result to get your desired output.
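To see the intermediate shapes, here is the same computation broken into separate steps. The names At and Ct are only introduced for this sketch.

```matlab
A = [35113 45010
      5695 57711
     22880 33193
     22880 45010
     43914 35113];

At = A.';                           % 2-by-5, so linear order reads A row by row
[~, ~, B] = unique(At, 'stable');   % 10 IDs in row-major order of A
B = B(:);                           % make sure it is a column

Ct = reshape(B, size(At));          % 2-by-5, filled column by column
C  = Ct.';                          % back to 5-by-2
```

B comes out as [1 2 3 4 5 6 5 2 7 1] (as a column), and reading it two at a time gives the rows of C.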
Example use:
A = [35113 45010
5695 57711
22880 33193
22880 45010
43914 35113]; %// Matrix defined by you
[~,~,B] = unique(A.', 'stable');
C = reshape(B, [size(A,2) size(A,1)]).';
C =
1 2
3 4
5 6
5 2
7 1
However, if assigning the IDs in order of first appearance isn't required and you just want a unique ID per node, then you can just use unique as is without the 'stable' flag.
Now, if you want to know which IDs from your graph got assigned to which IDs in the output matrix, just use the first output of unique:
[mapping, ~, B] = unique(A.', 'stable');
mapping will give you a list of all unique IDs that were encountered in your matrix. Their position identifies what ID was used to assign them into B. In other words, running this, we get:
mapping =
35113
45010
5695
57711
22880
33193
43914
This means that ID 35113 in A gets mapped to 1 in B, ID 45010 in A gets mapped to 2 in B and so on. As a more verbose illustration:
mappings = [(1:numel(mapping)).' mapping]
mappings =
1 35113
2 45010
3 5695
4 57711
5 22880
6 33193
7 43914
I can't test right now, but this should do what you want:
[~, ~, kk] = unique(A.','stable');
result = reshape(kk, fliplr(size(A))).';
You need a recent enough Matlab version, so that unique has the 'stable' option.
If you have the Communications Toolbox, the second line could be replaced by
result = vec2mat(kk, size(A,2));
I have a matrix A which is a n X 2 matrix of floats with the second column in each row representing the column index of the value in the first column. I would ideally like to vectorize the insertion of elements in the first column of A in the row rowIndex and their respective columns as specified by A(:,2).
The pseudo-code for what I am looking to achieve is as follows:
myCellArray = cell(n X n)
%rowIndex is some predefined integer.
rowIndex
%A is my n X 2 matrix of values and corresponding column indices.
A
myCellArray{(rowIndex*ones(size(A(:,1),1),1)),A(:,2)} = A(:,1)
I have provided code for what I have tried at the bottom although I have tried something similar to the last line and it has failed hence I was wondering how something like this would work in MATLAB. Basically if my question is confusing, I am looking to vectorize insertion of elements into cell array by using a vector of indices and vector of values to insert at each index.
a{1,z(:,2)} = z(:,1)
Here I am trying to insert all values in the first column of z matrix into the cells indexed by 1 and the corresponding entry in the second column of z into the cell array.
Use
a(1,z(:,2)) = mat2cell(z(:,1), ones(1,size(z,1)), 1);
For example, with
z = [1 2
3 4];
this results in
a =
[] [1] [] [3]
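An equivalent way to write this, shown here as a sketch, is num2cell, which wraps every element of a numeric array in its own cell. The preallocated size of a is an assumption for the example.

```matlab
z = [1 2
     3 4];
a = cell(1, 4);                        % preallocated 1-by-4 cell array (size assumed)

a(1, z(:, 2)) = num2cell(z(:, 1)).';   % one value per target cell
```

Note the parentheses on the left-hand side: a(...) = ... assigns cells to cells, whereas a{...} = ... with several indices would try to assign into a comma-separated list, which is why the attempt in the question fails.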
I am pretty new to Matlab, now i want to use the matlab to do some clustering job.
if I have 3 columns values
id1 id2 distvalue1
id1 id3 distvalue2
....
id2 id4 distvalue i
.....
There are 5000 ids in total, but some id pairs are missing the distance value.
In Python I can write loops to import these distance values into matrix form. How can I do it in MATLAB?
Also, how do I let MATLAB know that id1,...,idx are identifiers and that the third column is the value?
Thanks!
Based on the comments, you know how to get the data into the form of an N x 3 matrix, called X, where X(:,1) is the first index, X(:,2) is the second index, and X(:,3) is the corresponding distance.
Let's assume that the indices (id1... idx) are arbitrary numeric labels.
So then we can do the following:
% First, build a list of all the unique indices
indx = unique([X(:,1); X(:,2)]);
Nindx = length(indx);
% Second, initialize an empty connection matrix, C
C = zeros(Nindx, Nindx); %or you could use NaN(Nindx, Nindx)
% Third, loop over the rows of X, and map them to points in the matrix C
for n = 1:size(X,1)
row = find(X(n,1) == indx);
col = find(X(n,2) == indx);
C(row,col) = X(n,3);
end
This is not the most efficient method (that would be to remap the indices of X to the range [1... Nindx] in a vectorized manner), but it should be fine for 5000 ids.
If you end up dealing with very large numbers of unique indices, for which only very few of the index-pairs have assigned distance values, then you may want to look at using sparse matrices -- try help sparse -- instead of pre-allocating a large zero matrix.
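For reference, a vectorized remapping along the lines mentioned above could look like the sketch below. It relies on the third output of unique to relabel the IDs; the sample rows of X are made up for illustration.

```matlab
% Hypothetical N-by-3 data: [id1, id2, distance]
X = [101 205 0.5
     101 309 1.2
     205 412 0.7];

% Third output of unique relabels every ID as 1..Nindx in one call
[indx, ~, pos] = unique([X(:, 1); X(:, 2)]);
pos = pos(:);
Nindx = numel(indx);
nPairs = size(X, 1);
row = pos(1:nPairs);          % remapped first index of each pair
col = pos(nPairs+1:end);      % remapped second index of each pair

% Dense version: write all distances in one indexed assignment
C = zeros(Nindx, Nindx);
C(sub2ind(size(C), row, col)) = X(:, 3);

% Sparse equivalent for large, mostly empty matrices
S = sparse(row, col, X(:, 3), Nindx, Nindx);
```

If the same pair appears more than once, the two versions differ: the indexed assignment keeps the last value, while sparse adds the duplicates together.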