Pairwise distance matrix containing strings

I need to calculate the pairwise distances between the rows (instances) of two matrices, where the distance between two instances is the number of features/dimensions in which they differ.
I want to do this in MATLAB without using a loop.
For example:
Assume I want to calculate the distance between instances in A and B:
A = [ 1 2 3 ; 2 3 4] % (two instances with three features)
B = [ 2 3 4 ; 2 5 6 ; 4 5 6] % (three instances with three features)
I need to calculate C, a 2x3 matrix containing the distance between each instance in A and each instance in B. For example, the distance between [1 3 3] and [2 3 4] would be 2: comparing feature by feature, add 0 to the distance when the features are equal and 1 when they differ.
So in this case,
C = [3 3 3; 0 2 3].
A and B may contain strings instead of numbers.

You can use bsxfun with @ne (not equal), followed by a sum to count the number of dissimilar features for an instance:
A = [1 2 3; 2 3 4];
B = [2 3 4; 2 5 6; 4 5 6];
C = squeeze(sum(bsxfun(@ne,A,permute(B,[3 2 1])),2))
C =
3 3 3
0 2 3
The above works by generating a logical array that tests each feature of each instance pair for inequality via bsxfun(@ne,...). Here the array is 2x3x3 (instances of A x features x instances of B). Summing over dimension 2 counts the dissimilar features for each pair, and squeeze removes the resulting singleton dimension.
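The question mentions that A and B may contain strings. A minimal sketch of one way to handle that, assuming the strings are stored in cell arrays of character vectors (the question does not say how they are stored): map every distinct string to an integer code with unique, then apply the same bsxfun comparison to the codes. The sample data and the names codes, Ac and Bc are made up for illustration.

```matlab
% Hypothetical string-valued instances (cell arrays of character vectors)
A = {'a','b','c'; 'b','c','d'};
B = {'b','c','d'; 'b','x','y'; 'z','x','y'};

% Give every distinct string an integer code, shared between A and B
[~, ~, codes] = unique([A(:); B(:)]);
nA = numel(A);
Ac = reshape(codes(1:nA), size(A));
Bc = reshape(codes(nA+1:end), size(B));

% Same counting step as for numeric data
C = squeeze(sum(bsxfun(@ne, Ac, permute(Bc, [3 2 1])), 2))
```

Because equal strings receive equal codes, comparing the codes is equivalent to comparing the strings themselves.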

The function pdist2 with Hamming distance already does this for you:
pdist2(A,B,'hamming')
This gives the result as the fraction of coordinates that differ (a value between 0 and 1). Since you want a count instead of a fraction, multiply by the number of columns:
pdist2(A,B,'hamming')*size(A,2)
ans =
3 3 3
0 2 3

Related

How to split a matrix based on how close the values are?

Suppose I have a matrix A:
A = [1 2 3 6 7 8];
I would like to split this matrix into sub-matrices based on how relatively close the numbers are. For example, the above matrix must be split into:
B = [1 2 3];
C = [6 7 8];
I understand that I need to define some sort of criterion for this grouping, so I thought I'd take the absolute difference between each number and the next one, and define a limit up to which a number is allowed to stay in a group. The problem is that I cannot fix a static limit on the difference, since the matrices and sub-matrices will be changing.
Another example:
A = [5 11 6 4 4 3 12 30 33 32 12];
So, this must be split into:
B = [5 6 4 4 3];
C = [11 12 12];
D = [30 33 32];
Here, the matrix is split into three parts based on how close the values are. So the criteria for this matrix is different from the previous one though what I want out of each matrix is the same, to separate it based on the closeness of its numbers. Is there any way I can specify a general set of conditions to make the criteria dynamic rather than static?

I'm afraid, my answer comes too late for you, but maybe future readers with a similar problem can profit from it.
In general, your problem calls for cluster analysis. Nevertheless, maybe there's a simpler solution to your actual problem. Here's my approach:
First, sort the input A.
To find a criterion to distinguish between "intraclass" and "interclass" elements, I calculate the differences between adjacent elements of A, using diff.
Then, I calculate the median over all these differences.
Next, I find the indices of all differences that are greater than or equal to three times the median, with a minimum difference of 1. (Depending on the actual data, this might be modified, e.g. by using the mean instead.) These are the indices where you will have to "split" the (sorted) input.
Finally, I set up two vectors with the start and end indices of each "sub-matrix", and use arrayfun to get a cell array with all desired "sub-matrices".
Now, here comes the code:
% Sort input, and calculate differences between adjacent elements
AA = sort(A);
d = diff(AA);
% Calculate median over all differences
m = median(d);
% Find indices with "significantly higher difference",
% e.g. greater or equal than three times the median
% (minimum difference should be 1)
idx = find(d >= max(1, 3 * m));
% Set up proper start and end indices
start_idx = [1 idx+1];
end_idx = [idx numel(A)];
% Generate cell array with desired vectors
out = arrayfun(@(x, y) AA(x:y), start_idx, end_idx, 'UniformOutput', false)
Due to the unknown number of possible vectors, I can't think of a way to "unpack" these into individual variables.
Some tests:
A =
1 2 3 6 7 8
out =
{
[1,1] =
1 2 3
[1,2] =
6 7 8
}
A =
5 11 6 4 4 3 12 30 33 32 12
out =
{
[1,1] =
3 4 4 5 6
[1,2] =
11 12 12
[1,3] =
30 32 33
}
A =
1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 3
out =
{
[1,1] =
1 1 1 1 1 1 1
[1,2] =
2 2 2 2 2 2
[1,3] =
3 3 3 3 3 3 3
}
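Note that the groups above come back sorted, whereas the question lists each group in the original order of A (e.g. B = [5 6 4 4 3]). A minimal sketch of a variant that keeps the original order, using the same threshold rule: assign a group label to each sorted element with cumsum, then map the labels back through the sort permutation. The names thr, sortedLabels and labels are introduced here for illustration and are not part of the answer's code; A is assumed to be a row vector.

```matlab
A = [5 11 6 4 4 3 12 30 33 32 12];

% Same criterion as above: split where the gap is >= max(1, 3 * median gap)
[AA, order] = sort(A);
d = diff(AA);
thr = max(1, 3 * median(d));

% Group number of each sorted element, then of each original element
sortedLabels = cumsum([1, d >= thr]);
labels = zeros(size(A));
labels(order) = sortedLabels;

% One cell per group, elements in their original order
out = arrayfun(@(g) A(labels == g), 1:max(labels), 'UniformOutput', false);
```

For this input, labels is [1 2 1 1 1 1 2 3 3 3 2], so out{1} is [5 6 4 4 3], out{2} is [11 12 12] and out{3} is [30 33 32].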
Hope that helps!

MATLAB: Obtaining the rank ordering of a vector, with no tied ranks allowed

I've got a vector A = [6 5 7 7 4] and want to obtain the ranks as either [3 2 4 5 1] or [3 2 5 4 1] - I don't mind which. The answer is a vector in which each element is replaced by the rank it holds. This indicates to me that the fifth element is the smallest, then the second element is the second smallest, and so on.
I thought of doing [~,~,rnk] = unique(A), however that doesn't work, and produces instead [3 2 4 4 1].
How can I obtain the solution with no tied ranks?

It's almost a duplicate of this question.
We use sort twice: the first call returns the permutation that sorts the array, and sorting that permutation gives the rank of each element.
A = [6 5 7 7 4];
[~, rnk] = sort(A);
[~, rnk] = sort(rnk);
rnk =
3 2 4 5 1
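The second sort simply inverts the permutation returned by the first one. As a sketch of the same idea, the inverse can also be built by direct assignment, which avoids the second sort (the name idx is introduced here for illustration):

```matlab
A = [6 5 7 7 4];

% idx(k) is the position in A of the k-th smallest element
[~, idx] = sort(A);

% Invert the permutation: the element at position idx(k) has rank k
rnk = zeros(size(A));
rnk(idx) = 1:numel(A);
```

Since sort keeps tied elements in their original order, the two 7s receive ranks 4 and 5 in order of appearance, giving rnk = [3 2 4 5 1].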

Matrix operations matlab to move values around

I have a matrix that I'd like to create a new ordering of, for example,
vals = [1 2; 3 4]
I also have two matrices, new_x and new_y, such that new_x(a,b) = j and new_y(a,b) = k means that I want the value at vals(a,b) to be mapped to new_vals(j,k).
For example, given
new_x = [1 2; 2 1]
new_y = [2 2; 1 1]
I'd want
new_vals = [4 3; 1 2]
I understand that I could just write two for loops to build the new array, but MATLAB is notoriously good at providing operations on entire matrices. My question is, how would I build new_vals without the for loops?

Basically you are trying to get a matrix that when indexed with new_x and new_y would give us vals, i.e. -
output(new_x(1,1),new_y(1,1)) must be equal to vals(1,1),
output(new_x(1,2),new_y(1,2)) must be equal to vals(1,2) and so on.
We will try to verify this later on. For now, here's one solution using linear indexing -
nrows = size(vals,1); %// Store number of rows
%// Calculate linear indices
idx = (new_x + (new_y-1)*nrows);
%// Trace/map back to sorted version of "1:numel(vals)"
[~,traced_back_idx] = sort(idx(:));
%// Index into vals with traced back linear indices & then reshape & transpose
out = reshape(vals(traced_back_idx),[],nrows).'
Here's another and possibly faster way -
out = nan(size(vals));
out((new_x + (new_y-1)*nrows)) = vals;
out = out.'
As discussed earlier for verification, let's index into out with new_x and new_y and that should match up with vals. Here's a code to do so -
for ii = 1:size(out,1)
    for jj = 1:size(out,2)
        check_back(ii,jj) = out(new_y(ii,jj),new_x(ii,jj));
    end
end
Sample runs -
Case #1 (sample from question):
vals =
1 2
3 4
new_x =
1 2
2 1
new_y =
2 2
1 1
new_vals =
4 3
1 2
out =
4 3
1 2
check_back = (must be same as vals)
1 2
3 4
Case #2:
vals =
1 2 5
3 4 5
6 8 3
new_x =
1 2 3
3 1 2
3 2 1
new_y =
2 2 3
2 1 1
1 3 3
out =
4 5 6
1 2 3
3 8 5
check_back = (must be same as vals)
1 2 5
3 4 5
6 8 3

I think I see what you are trying to do here. new_x and new_y are just coordinates into the new_vals matrix, right? The problem is that what you are trying to do only works for vectors, not for matrices, so the way to do it is to transform the matrix into a vector, reorder the values, and then go back to a matrix:
vals = [1, 2; 3, 4];
A = reshape(vals,1,4); % A is the vector [1 3 2 4]
new_coord = [2,3,4,1];
B(new_coord) = A; % B is [4 1 3 2]
new_val = reshape(B,2,2) % back to matrix
This gives new_val = [4 3; 1 2]. B = A(new_coord) is also allowed, but it needs different coordinates, even though the right coordinates are much easier to work out that way.
I am sure there must be a way to derive new_coord from the new_x and new_y matrices.
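One possible way to get new_coord from new_x and new_y is sub2ind, which converts row/column subscripts into linear indices. The sketch below assumes, as the question's example implies, that new_y holds the target row and new_x the target column, that new_vals has the same size as vals, and that every target position is used exactly once.

```matlab
vals  = [1 2; 3 4];
new_x = [1 2; 2 1];   % target column of each element (implied by the example)
new_y = [2 2; 1 1];   % target row of each element (implied by the example)

% Linear index of each element's destination in new_vals
new_coord = sub2ind(size(vals), new_y, new_x);

% Scatter the values to their destinations
new_vals = nan(size(vals));
new_vals(new_coord) = vals;

% Check: reading the destinations back must reproduce vals
check_back = new_vals(new_coord);
```

Here new_coord(:).' is [2 3 4 1], the same coordinate vector that was written out by hand above, and new_vals comes out as [4 3; 1 2].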

put random position set of number with conditional repeat

I have this set of numbers: a=[1 2 3]. I want to place these numbers randomly into a 7 x 1 matrix, such that the number 1 appears 2 times, the number 2 appears 3 times and the number 3 appears 2 times.
The order does not matter. The answer could look like this:
b=[1 2 2 2 1 3 3]'

Try randperm:
a=[1 2 3];
samps = [1 1 2 2 2 3 3]; % specify your desired repeats
samps = samps(randperm(numel(samps))); % shuffle them
b = a(samps)
Or, instead of specifying samps explicitly, you can specify the number of repetitions for each element of a and use arrayfun to compute samps:
reps = [2 3 2];
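% repeat each index into a (not the value itself), so that a(samps) is valid for any a
sampC = arrayfun(@(x,y) x*ones(1,y), 1:numel(a), reps, 'UniformOutput', false);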
samps = [sampC{:}];
samps = samps(randperm(numel(samps))); % shuffle them
b = a(samps)
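In releases that provide repelem (MATLAB R2015a or later), the repeated list can be built in one call. A minimal sketch, which also transposes the result to match the 7 x 1 shape asked for in the question (the name vals is introduced here for illustration):

```matlab
a    = [1 2 3];
reps = [2 3 2];               % how many times each element of a should appear

vals = repelem(a, reps);      % [1 1 2 2 2 3 3]
b = vals(randperm(numel(vals))).';   % shuffle and return a 7 x 1 column
```

The order of b changes from run to run, but it always contains 1 twice, 2 three times and 3 twice.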

% how often each value should occur
quantity = [2,3,2];
% values
a = [1,2,3];
l = [];
% build the list of all values
for idx = 1:numel(a)
    l = [l, ones(1,quantity(idx))*a(idx)];
end
% shuffle l
l = l(randperm(numel(l)))

Generate a vector with elements that contain a fixed number of unique values

Suppose I have a vector
A =
3 5 3 3 2 2 4 2 6
I need to produce a new vector B that contains the elements of A from the beginning up to the point where n unique values have appeared (suppose n=3, for the purpose of this example). The new vector should be B =
3 5 3 3 2
since up to the fifth element of vector A we have 3 unique values (3, 5, 2).
Actual vectors are a lot larger, so I would rather need a general solution and preferably by avoiding a loop. Any ideas? Thanks in advance

You can use unique for this problem. However, be sure to use the 'stable' option.
A = [3 5 3 3 2 2 4 2 6];
n = 3;
[x, id] = unique(A,'stable');
B = A(1:id(n))
This results in
B =
3 5 3 3 2

Do the following:
A = [3 5 3 3 2 2 4 2 6];
n = 3;
[b,i] = unique(A,'first');
h = sort(i);
A(1:h(n))
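Both answers index the n-th first occurrence directly, which raises an error if A contains fewer than n distinct values. A minimal sketch with a guard for that case; returning all of A when there are too few distinct values is an assumption, since the question does not say what should happen then. The name firstIdx is introduced here for illustration.

```matlab
A = [3 5 3 3 2 2 4 2 6];
n = 3;

% Position of the first occurrence of each distinct value, in order of appearance
[~, firstIdx] = unique(A, 'stable');

if numel(firstIdx) >= n
    B = A(1:firstIdx(n));   % stop at the first occurrence of the n-th distinct value
else
    B = A;                  % assumption: keep everything if fewer than n distinct values
end
```

For the sample vector, firstIdx is [1 2 5 7 9], so B = [3 5 3 3 2].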
