how to transfer the pairwise distance value in a distance matrix - matlab

I am pretty new to Matlab, now i want to use the matlab to do some clustering job.
if I have 3 columns values
id1 id2 distvalue1
id1 id3 distvalue2
....
id2 id4 distvalue i
.....
5000 ids in total, but some ids pairs are missing the distance value
in python I can make loops to import these distance value into a matrix form. How I can do it in matlab?
and also let the matlab knows id1,...idx are identifies and the third column is the value
Thanks!

Based on the comments, you know how to get the data into the form of an N x 3 matrix, called X, where X(:,1) is the first index, X(:,2) is the second index, and X(:,3) is the corresponding distance.
Let's assume that the indices (id1... idx) are arbitrary numeric labels.
So then we can do the following:
% First, build a list of all the unique indices
indx = unique([X(:,1); X(:,2)]);
Nindx = length(indx);
% Second, initialize an empty connection matrix, C
C = zeros(Nindx, Nindx); %or you could use NaN(Nindx, Nindx)
% Third, loop over the rows of X, and map them to points in the matrix C
for n = 1:size(X,1)
row = find(X(n,1) == indx);
col = find(X(n,2) == indx);
C(row,col) = X(n,3);
end
This is not the most efficient method (that would be to remap the indices of X to the range [1... Nindx] in a vectorized manner), but it should be fine for 5000 ids.
If you end up dealing with very large numbers of unique indices, for which only very few of the index-pairs have assigned distance values, then you may want to look at using sparse matrices -- try help sparse -- instead of pre-allocating a large zero matrix.

Related

Access multiple elements and assign to each selected element a different value

I need to know if there is any efficient way of doing the following in MATLAB.
I have several big sparse matrices, the size of each one is roughly 9000000x9000000.
I need to access multiple element of such matrix and assign to each selected element a different value stored in another array. I'll give an example:
What I have:
SPARSE MATRIX of size 9000000x9000000
Matrix with the list of indexes and values I want to access, this is a matrix like this:
[row1, col1, value1;
row2, col2, value2;
...
rowN, colN, valueN]
Where N is the length of such matrix.
What I need:
Assign to the SPARSE MATRIX the corresponding value to the corresponding index, this is:
SPARSE_MATRIX(row1, col1) = value1
SPARSE_MATRIX(row2, col2) = value2
...
SPARSE_MATRIX(rowN, colN) = valueN
Thanks in advance!
EDIT:
Thank you to both for answering, I think I did not explain myself well, I'll try again.
I already have a large SPARSE MATRIX of about 9000000 rows x 9000000 columns, it is a SPARSE MATRIX filled with zeros.
Then I have another array or matrix, let's call it M with N number of rows, where N could take values from 0 to 9000000; and 3 columns. The first two columns are used to index an element of my SPARSE MATRIX, and the third column stores the value I want to transfer to the SPARSE MATRIX, this is, given a random row of M, i:
SPARSE_MATRIX(M(i, 1), M(i, 2)) = M(i, 3)
The idea is to do that for all the rows, I have tried it with common indexing:
SPARSE_MATRIX(M(:, 1), M(:, 2)) = M(:, 3)
Now I would like to do this assignation for all the rows in M as fast as possible, because if I use a loop or common indexing it takes ages (I am using a 7th Gen i7 processor with 16 GB of RAM). And I also need to keep the zeros in the SPARSE_MATRIX.
EDIT 2: SOLVED! Thank you Metahominid, I was not thinking through, but yes the sparse function does solve my problem, I just think my brain circuits were shortcircuited yesterday and was unable to see through it hahaha. Thank you to both anyway!
Regards!
You can construct a sparse matrix like this.
A = sparse(i,j,v)
S = sparse(i,j,v) generates a sparse matrix S from the triplets i, j,
and v such that S(i(k),j(k)) = v(k). The max(i)-by-max(j) output
matrix has space allotted for length(v) nonzero elements. sparse adds
together elements in v that have duplicate subscripts in i and j.
So you can simply construct the row vector, column vector and value vector.
I am answering in part because I cannot comment. You question seems a little confusing to me. The sparse() function in MATLAB does just this.
You can enter your arrays of indices and values directly into the interface, or declare a sparse matrix of zeros and set each individually.
Given your data format make three vectors, ROWS = [row1; ...; rown], COLS = [col1; ...; coln], and DATA = [val1; ... valn]. I am assuming that your size is the overall size of the full matrix and not the sparse portion.
Then
A = sparse(ROWS, COLS, DATA) will do just what you want. You can even specify the original matrix size.
A = sparse(ROWS, COLS, DATA, 90...., 90....).

how to check the values of each variables from a resultant matrix in matlab?

I have sum of 3 cell arrays
A=72x1
B=72x720
C=72x90
resultant=A+B+C
size of resultant=72x64800
now when I find the minimum value with row and column indices I can locate the row element easily but how can I locate the column element in variables?
for example
after dong calculations for A,B,C I added them all and got a resultant in from of <72x(720x90)> or can say a matrix of integers of size <72x64800> then I found the minimum value of resultant with row and column index using the code below.
[minimumValue,ind]=min(resultant(:));
[row,col]=find(result== minimumValue);
then row got 14 and column got 6840 value..
now I can trace row 14 of all A,B,C variables easily but how can I know that the resultant column 6480 belongs to which combination of A,B,C?
Instead of using find, use the ind output from the min function. This is the linear index for minimumValue. To do that you can use ind2sub:
[r,c] = ind2sub(size(resultant),ind);
It is not quite clear what do you mean by resultant = A+B+C since you clearly don't sum them if you get a bigger array (72x64800), on the other hand, this is not a simple concatenation ([A B C]) since this would result in a 72x811 array.
However, assuming this is a concatenation you can do the following:
% get the 2nd dimension size of all matrices:
cols = cellfun(#(x) size(x,2),{A,B,C})
% create a vector with reapiting matrices names for all their columns:
mats = repelem(['A' 'B' 'C'],cols);
% get the relevant matrix for the c column:
mats(c)
so mats(c) will be the matrix with the minimum value.
EDIT:
From your comment I understand that your code looks something like this:
% arbitrary data:
A = rand(72,1);
B = rand(72,720);
C = rand(72,90);
% initializing:
K = size(B,2);
N = size(C,2);
counter = 1;
resultant = zeros(72,K*N);
% summing:
for k = 1:K
for n = 1:N
resultant(:,counter) = A + B(:,k) + C(:,n);
counter = counter+1;
end
end
% finding the minimum value:
[minimumValue,ind] = min(resultant(:))
and from the start of the answer you know that you can do this:
[r,c] = ind2sub(size(resultant),ind)
to get the row and column of minimumValue in resultant. So, in the same way you can do:
[Ccol,Bcol] = ind2sub([N,K],c)
where Bcol and Ccol is the column in B and C, respectively, so that:
minimumValue == A(r) + B(r,Bcol) + C(r,Ccol)
To see how it's working imagine that the loop above fills a matrix M with the value of counter, and M has a size of N-by-K. Because we fill M with a linear index, it will be filled in a column-major way, so the row will correspond to the n iterator, and the column will correspond to the k iterator. Now c corresponds to the counter where we got the minimum value, and the row and column of counter in M tells us the columns in B and C, so we can use ind2sub again to get the subscripts of the position of counter. Off course, we don't really need to create M, because the values within it are just the linear indices themselves.

Partial sum of a vector

For a vector v (e.g. v=[1,2,3,4,5]), and two index vectors (e.g. a=[1,1,1,2,3] and b=[3,4,5,5,5], with all a(i)<b(i)), I would like to construct w=sum(v(a:b)), which gives the values
w = zeros(length(a),1);
for i = 1:length(a)
w(i)=sum(v(a(i):b(i)));
end
It is slow when length(a) is large. Can I compute w without the for loop?
Yes! The nth element of cumsum(v) is the sum of the first n elements in v, so just take that and subtract the sum of the elements that you don't want to include:
v=[1,2,3,4,5]
a=[1,1,1,2,3]
b=[3,4,5,5,5]
C=cumsum(v)
C(b)-C(a)+v(a)
%// or alternatively
C=cumsum([0 v])
C(b+1)-C(a)
The following code works, but it is of course much less readable:
% assume v is a column vector
units = 1:length(v); units = units'; %units is a column vector
units_matrix = repmat(units, [1 length(a)]);
a_matrix = repmat(a, [length(v) 1]); % assuming a is is a row vector
b_matrix = repmat(b, [length(v) 1]);
weights = (units_matrix>=a_matrix) & (units_matrix<=b_matrix);
v_matrix = repmat(v, [1 length(a)]);
w = sum(v_matrix.*weights);
Explanation:
v_matrix contains copies of v. The summation will be done along
the column, so we need to prepare the other needed information in
vectorized form. units_matrix contains the indexes in v along the
columns. The columns are identical. a_matrix and b_matrix, in
each of their column, contains the indexes that are relevant for each
partial summation. All rows are identical. weights is a logical
matrix where, for each column, the indexes contained in
units_matrix between the corresponding a and b are 1 (true),
and the rest is 0. The element-wise multiplication thus filters the
"right" values, and all the indexes outside the range (again, for
each different column) is multiplied by zero. w is then he result
of the sum function, i.e. a row vector (every column of the
"filtered" matrix is summed).

How to sum up weights of equal rows of a matrix without using 'for loops'

I have a m x n matrix 'A' with a m x 1 vector of Weights corresponding to A. I have used 'unique' to find the unique matrix and to find IA and IC. How I can sum up the weights of equal rows in A in a faster way than using two 'for loops'? So far, I have
[Dis_Good_path,IA,IC]=unique(Good_path,'rows','stable');
for i=1:length(IA) % Summing up the weights corresponding to equal paths
Dis_R2(i)=0;
for j=1:length(IC)
if IA(i)==IC(j)
Dis_R2(i)= Dis_R2(i)+R2(j);
end
end
end
An answer with one loop less would look like this :)
[Dis_Good_path,ia,ic]=unique(Good_path,'rows','stable');
Dis_R2 = R2(ia);
rIA = setxor(ia,1:size(Good_path,1));
rIC = ic(rIA);
for i = 1:numel(rIC)
Dis_R2(rIC(i)) = Dis_R2(rIC(i)) + R2(rIA(i));
end

N-Dimensional Histogram Counts

I am currently trying to code up a function to assign probabilities to a collection of vectors using a histogram count. This is essentially a counting exercise, but requires some finesse to be able to achieve efficiently. I will illustrate with an example:
Say that I have a matrix X = [x1, x2....xM] with N rows and M columns. Here, X represents a collection of M, N-dimensional vectors. IN other words, each of the columns of X is an N-dimensional vector.
As an example, we can generate such an X for M = 10000 vectors and N = 5 dimensions using:
X = randint(5,10000)
This will produce a 5 x 10000 matrix of 0s and 1s, where each column is represents a 5 dimensional vector of 1s and 0s.
I would like to assign a probability to each of these vectors through a basic histogram count. The steps are simple: first find the unique columns of X; second, count the number of times each unique column occurs. The probability of a particular occurrence is then the #of times this column was in X / total number of columns in X.
Returning to the example above, I can do the first step using the unique function in MATLAB as follows:
UniqueXs = unique(X','rows')'
The code above will return UniqueXs, a matrix with N rows that only contains the unique columns of X. Note that the transposes are due to weird MATLAB input requirements.
However, I am unable to find a good way to count the number of times each of the columns in UniqueX is in X. So I'm wondering if anyone has any suggestions?
Broadly speaking, I can think of two ways of achieving the counting step. The first way would be to use the find function, though I think this may be slow since find is an elementwise operation. The second way would be to call unique recursively as it can also provide the index of one of the unique columns in X. This should allow us to remove that column from X and redo unique on the resulting X and keep counting.
Ideally, I think that unique might already be doing some counting so the most efficient way would probably be to work without the built-in functions.
Here are two solutions, one assumes all values are either 0's or 1's (just like the example in your description), the other does not. Both codes should be very fast (more so the one with binary values), even on large data.
1) only zeros and ones
%# random vectors of 0's and 1's
x = randi([0 1], [5 10000]); %# RANDINT is deprecated, use RANDI instead
%# convert each column to a binary string
str = num2str(x', repmat('%d',[1 size(x,1)])); %'
%# convert binary representation to decimal number
num = (str-'0') * (2.^(size(s,2)-1:-1:0))'; %'# num = bin2dec(str);
%# count frequency of how many each number occurs
count = accumarray(num+1,1); %# num+1 since it starts at zero
%# assign probability based on count
prob = count(num+1)./sum(count);
2) any positive integer
%# random vectors with values 0:MAX_NUM
x = randi([0 999], [5 10000]);
%# format vectors as strings (zero-filled to a constant length)
nDigits = ceil(log10( max(x(:)) ));
frmt = repmat(['%0' num2str(nDigits) 'd'], [1 size(x,1)]);
str = cellstr(num2str(x',frmt)); %'
%# find unique strings, and convert them to group indices
[G,GN] = grp2idx(str);
%# count frequency of occurrence
count = accumarray(G,1);
%# assign probability based on count
prob = count(G)./sum(count);
Now we can see for example how many times each "unique vector" occurred:
>> table = sortrows([GN num2cell(count)])
table =
'000064850843749' [1] # original vector is: [0 64 850 843 749]
'000130170550598' [1] # and so on..
'000181606710020' [1]
'000220492735249' [1]
'000275871573376' [1]
'000525617682120' [1]
'000572482660558' [1]
'000601910301952' [1]
...
Note that in my example with random data, the vector space becomes very sparse (as you increase the maximum possible value), thus I wouldn't be surprised if all counts were equal to 1...