Conditionally replacing cell values with column names - perl

I have a 165 x 165 rank matrix such that each row has values ranging from 1-165. I want to parse each row and delete all values > 5, sort each row in increasing order, then replace the values 1-5 with the name of the column from the original matrix.
For example, for row k the values 1, 2, 3, 4, 5 would result after the first two transformations and would be replaced by p, d, m, n, a.

I am assuming that your array consists of an array of arrays...
Neither Awk, Sed, nor Perl has multi-dimensional arrays. However, they can be emulated in Perl by using arrays of arrays.
$a[0]->[0] = xx;
$a[0]->[1] = yy;
[...]
$a[0]->[164] = zz;
$a[1]->[0] = qq;
$a[1]->[1] = rr;
[...]
$a[164]->[164] = vv;
Does this make sense?
I'm calling the row $x and columns $y, so an element in your array will be $array[$x]->[$y]. Is that good?
Okay, your column names will be in row $array[0], so if we find a value of five or less in $array[$x]->[$y], we know the column name is in $array[0]->[$y]. Is that good?
for my $x (1..164) {    # First row is column names
    for my $y (0..164) {
        if ($array[$x]->[$y] <= 5) {
            $array[$x]->[$y] = $array[0]->[$y];
        }
    }
}
I'm simply going through all the rows, and for each row, all the columns, and checking the value. If the value is less than or equal to five, I replace it with the column name.
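For comparison, the same in-place replacement can be sketched in Python. This is only an illustration on a made-up three-column table; the function name replace_small_ranks and the sample data are assumptions, not part of the question. Like the Perl loop, it replaces values but does not sort them.

```python
def replace_small_ranks(table, limit=5):
    """Return a copy of table in which every rank <= limit in a data row
    is replaced by the column name taken from the header row (row 0)."""
    header = table[0]
    result = [list(header)]
    for row in table[1:]:
        result.append([header[y] if value <= limit else value
                       for y, value in enumerate(row)])
    return result

# Hypothetical sample: the first row holds the column names.
table = [
    ["a", "b", "c"],
    [7, 2, 9],
    [1, 8, 4],
]
replaced = replace_small_ranks(table)
```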
I hope I'm not doing your homework for you.

This GNU sed solution might work although it will need scaling up as I only used a 10x10 matrix for testing purposes:
# { echo {a..j};for x in {1..10};do seq 1 10 | shuf |sed 'N;N;N;N;N;N;N;N;N;s/\n/ /g';done; }> test_data
# cat test_data
a b c d e f g h i j
4 5 9 3 6 2 10 8 7 1
3 7 4 2 1 6 10 5 8 9
10 9 3 1 2 7 8 5 6 4
5 10 4 9 7 8 1 3 6 2
8 6 5 9 1 4 3 2 7 10
2 8 9 3 5 6 10 1 4 7
3 9 8 2 1 4 10 6 7 5
3 7 2 1 8 6 10 4 5 9
1 10 8 3 6 5 4 2 7 9
7 2 3 5 6 1 10 4 8 9
# cat test_data |
sed -rn '1{h;d};s/[0-9]{2,}|[6-9]/0/g;G;s/\n|$/ &/g;s/$/&1 2 3 4 5 /;:a;s/^(\S*) (.*\n)(\S* )(.*)/\2\4\1\3/;ta;s/\n//;s/0[^ ]? //g;:b;s/([1-5])(.*)\1(.)/\3\2/;tb;p'
j f d a b
e d a c h
d e c j h
g j h c a
e h g f c
h a d i e
e d a f j
d c a h i
a h d g f
f b c h d
The sed command works as follows.
The first line of the data file, which contains the column headings, is stored in the hold space, and then the pattern space (the current line) is deleted. For all subsequent data lines, every number with two or more digits and every value from 6 to 9 is converted to 0. The column names are appended to the data values, separated by a newline. Spaces are inserted before the newline and at the end of the string. The data is transformed into a lookup, and the sorted values, i.e. 1 2 3 4 5, are prepended to it. The newline is removed along with any 0 values and their associated lookups. Finally, the values 1 to 5 are replaced by the column names in the lookup.
EDIT:
I may have misunderstood the problem regarding sorting columns or rows, if so it's a minimal fix - replace 1 2 3 4 5 by the original values and perform a numeric sort prior to replacing the numeric data with column names from the lookup.
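The whole pipeline (drop ranks above 5, order by rank, substitute column names) can also be sketched in a few lines of Python. This is a rough equivalent for checking the sed output, not a translation of the sed script; the function name top_columns is made up for this example, and it assumes the ranks within a row are unique.

```python
def top_columns(header, ranks, limit=5):
    """Column names whose rank is <= limit, ordered by increasing rank."""
    kept = [(rank, name) for rank, name in zip(ranks, header) if rank <= limit]
    return [name for rank, name in sorted(kept)]

header = "a b c d e f g h i j".split()
first_row = [4, 5, 9, 3, 6, 2, 10, 8, 7, 1]    # first data row of test_data
second_row = [3, 7, 4, 2, 1, 6, 10, 5, 8, 9]   # second data row of test_data

first_names = top_columns(header, first_row)
second_names = top_columns(header, second_row)
```

For the two rows above this gives j f d a b and e d a c h, the same as the first two lines of the sed output.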

Related

How do I evaluate a function into a matrix in KDB?

Let's say I've got a function that defines a matrix in terms of its i and j coordinates:
f: {y+2*x}
I'm trying to create a square matrix that evaluates this function at all locations.
I know it needs to be something like f ' (til 5) /:\: til 5, but I'm struggling with the rest.
Rephrasing your question a bit, you want to create a matrix A = [aij] where aij = f(i, j), i, j = 0..N-1.
In other words you want to evaluate f for all possible combinations of i and j. So:
q)N:5;
q)i:til[N] cross til N; / all combinations of i and j
q)a:f .' i; / evaluate f for all pairs (i;j)
q)A:(N;N)#a; / create a matrix using #: https://code.kx.com/q/ref/take/
q)A
0 1 2 3 4
2 3 4 5 6
4 5 6 7 8
6 7 8 9 10
8 9 10 11 12
P.S. No, (til 5) /:\: til 5 is not exactly what you'd need, but it's close. You are generating a list of all pairs, i.e. you are pairing or joining the first element of til 5 with every element of (another) til 5 one by one, then the second, etc. So you need the join operator (https://code.kx.com/q/ref/join/):
(til 5),/:\: til 5
You were close. But there is no need to generate all the coordinate pairs and then iterate over them. Each Right Each Left /:\: manages all that for you and returns the matrix you want.
q)(til 5)f/:\:til 5
0 1 2 3 4
2 3 4 5 6
4 5 6 7 8
6 7 8 9 10
8 9 10 11 12
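For readers more familiar with other languages, the same idea (apply f to every combination of row index and column index) might look like this in Python. It is only a sketch to show what Each Right Each Left computes; evaluate_on_grid is a name invented for the example.

```python
def f(x, y):
    # Same definition as the q function f: {y+2*x}
    return y + 2 * x

def evaluate_on_grid(func, n):
    """Build an n-by-n nested list whose entry [i][j] is func(i, j)."""
    return [[func(i, j) for j in range(n)] for i in range(n)]

matrix = evaluate_on_grid(f, 5)
for row in matrix:
    print(*row)
```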

How to get the difference of matrixes without repetitions removed

The function setdiff(A,B,'rows') is used to return the set of rows that are in A but not B, with repetitions removed.
Is there any way to do it without removing the repetitions?
Thanks a lot.
You can use ismember instead of setdiff, to find all the rows of A that appear in B.
Because you want only those that do NOT appear in B, negate the result with the ~ sign, and finally use it to index the rows of A:
A =
1 2 3
4 5 6
1 2 3
7 8 9
B =
4 5 6
C=A(~ismember(A,B,'rows'),:)
C =
1 2 3
1 2 3
7 8 9
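The logic of the mask (keep each row of A unless it occurs in B, without collapsing duplicates) can be sketched outside MATLAB as well. Below is a small Python illustration using the same A and B; rows_not_in is a hypothetical helper name.

```python
def rows_not_in(a_rows, b_rows):
    """Rows of a_rows that do not occur in b_rows, in their original
    order and with repetitions kept."""
    excluded = {tuple(row) for row in b_rows}
    return [row for row in a_rows if tuple(row) not in excluded]

A = [[1, 2, 3], [4, 5, 6], [1, 2, 3], [7, 8, 9]]
B = [[4, 5, 6]]
C = rows_not_in(A, B)
```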

Sorting data in MATLAB dependent on one column

How do I sort a column based on the values in another column in MATLAB?
Column A shows position data (it is in neither ascending nor descending order). Column B contains another set of position data. Finally, column C contains numerical values. Is it possible to link each position value in B with the numerical value in the same row of C? After that, I want to sort B so that it is in the same order as column A, with the C values following their B counterparts. Each column has 1558 values.
Before case;
A B C
1 4 10
4 1 20
3 5 30
5 2 40
2 3 50
After Case;
A B C
1 1 20
4 4 10
3 3 50
5 5 30
2 2 40
Basically A and B became the same and Column C followed B.
Since you don't want things necessarily in ascending or descending order, I don't think any built-in sorting functions like sortrows() will help here. Instead you are matching elements in one column with elements in another column.
Using [~,idx]=ismember(A,B) gives, for each element of A, its position in B. You can use that index to reorder the desired columns.
M=[1 4 10
4 1 20
3 5 30
5 2 40
2 3 50];
A=M(:,1); B=M(:,2); C=M(:,3);
[~,idx]=ismember(A,B);
sorted_matrix = [A B(idx) C(idx)]
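To make the indexing step concrete, here is a rough Python sketch of the same matching: look up where each value of A sits in B, then pull B and C in that order. The helper name align_to is made up, and the sketch assumes every value of A occurs exactly once in B.

```python
def align_to(a, b, c):
    """Reorder b and c so that b matches the order of a."""
    # Position of each value in b (assumes the values of b are unique).
    position_in_b = {value: index for index, value in enumerate(b)}
    idx = [position_in_b[value] for value in a]
    return [[a_value, b[i], c[i]] for a_value, i in zip(a, idx)]

A = [1, 4, 3, 5, 2]
B = [4, 1, 5, 2, 3]
C = [10, 20, 30, 40, 50]
sorted_matrix = align_to(A, B, C)
```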
Powerful combo of bsxfun and matrix-multiplication solves it and good for code-golfing too! Here's the implementation, assuming M as the input matrix -
[M(:,1) bsxfun(@eq,M(:,1),M(:,2).')*M(:,2:3)]
Sample run -
>> M
M =
1 4 10
4 1 20
3 5 30
5 2 40
2 3 50
>> [M(:,1) bsxfun(@eq,M(:,1),M(:,2).')*M(:,2:3)]
ans =
1 1 20
4 4 10
3 3 50
5 5 30
2 2 40
Given M = [A B C]:
M =
1 4 10
4 1 20
3 5 30
5 2 40
2 3 50
You need to sort the rows of the matrix excluding the first column:
s = sortrows(M(:,2:3));
s =
1 20
2 40
3 50
4 10
5 30
Then use the first column as the indices to reorder the resulting submatrix:
s(M(:,1),:)
ans =
1 20
4 10
3 50
5 30
2 40
This would be used to build the output matrix:
N = [M(:,1) s(M(:,1),:)];
N =
1 1 20
4 4 10
3 3 50
5 5 30
2 2 40
The previous technique will obviously only work if A and B are permutations of the values (1..m). If this is not the case, then we need to find the ranking of each value in the array. Let's start with new values for our arrays:
A B C
1 5 60
6 1 80
9 6 60
-4 9 40
5 -4 30
We construct s as before:
s = sortrows([B C]);
s =
-4 30
1 80
5 60
6 60
9 40
We can generate the rankings one of two ways. If the elements of A (and B) are unique, we can use the third output of unique as in this answer:
[~, ~, r] = unique(A);
r =
2
4
5
1
3
If the values of A are not unique, we can use the second return value of sort, the indices in the original array of the elements in sorted order, to generate the rank of each element:
[~, r] = sort(A);
r =
4
1
5
2
3
[~, r] = sort(r);
r =
2
4
5
1
3
As you can see, the resulting r is the same, it just takes 2 calls to sort rather than 1 to unique. We then use r as the list of indices for s above:
M = [A s(r, :)];
M =
1 1 80
6 6 60
9 9 40
-4 -4 30
5 5 60
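The ranking trick (sort twice to turn values into ranks) is easy to check in any language. The following Python sketch reproduces the example above under the same assumption that A and B contain the same values; argsort and ranks are helper names chosen for the illustration, and the ranks are 0-based here rather than 1-based as in MATLAB.

```python
def argsort(values):
    """Indices that would sort values in increasing order."""
    return sorted(range(len(values)), key=lambda i: values[i])

def ranks(values):
    """0-based rank of each element, obtained by applying argsort twice."""
    return argsort(argsort(values))

A = [1, 6, 9, -4, 5]
B = [5, 1, 6, 9, -4]
C = [60, 80, 60, 40, 30]

s = sorted(zip(B, C))                      # rows of [B C] sorted by B
r = ranks(A)                               # [1, 3, 4, 0, 2]
M = [[a, *s[rank]] for a, rank in zip(A, r)]
```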
If you must retain the order of A then use something like this
matrix = [1 4 10; 4 1 20; 3 5 30; 5 2 40; 2 3 50];
idx = arrayfun(@(x) find(matrix(:,2) == x), matrix(:,1));
sorted = [matrix(:,1), matrix(idx,2:3)];

dot product of matrix columns

I have a 4x8 matrix. I want to select two different columns, take their dot product, and divide it by the product of the norms of the two columns. I then want to repeat this for every possible pair of distinct columns and save the results in a new matrix. Can anyone provide MATLAB code for this purpose?
The code which I supposed to give me the output is:
A=[1 2 3 4 5 6 7 8;1 2 3 4 5 6 7 8;1 2 3 4 5 6 7 8;1 2 3 4 5 6 7 8;];
for i=1:8
for j=1:7
B(:,i)=(A(:,i).*A(:,j+1))/(norm(A(:,i))*norm(A(:,j+1)));
end
end
I would approach this a different way. First, create two matrices where the corresponding columns of each one correspond to a unique pair of columns from your matrix.
The easiest way I can think of is to create all possible pairs and eliminate the duplicates. You can do this by creating a meshgrid of values, where the outputs X and Y together give every pairing of column indices, and then keeping only the part of each matrix strictly below the main diagonal (an offset of -1 in tril), which excludes the diagonal itself. So do this:
num_columns = size(A,2);
[X,Y] = meshgrid(1:num_columns);
X = X(tril(ones(num_columns),-1)==1); Y = Y(tril(ones(num_columns),-1)==1);
In your case, here's what the grid of coordinates looks like:
>> [X,Y] = meshgrid(1:num_columns)
X =
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
Y =
1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8
As you can see, if we select out the lower triangular part of each matrix excluding the diagonal, we get all combinations of pairs that are unique, which is what I did in the last line of the code. Selecting the lower part is important because MATLAB selects out values column-wise, and traversing the columns of the lower-triangular part of each matrix gives you the pairs of columns in the right order (i.e. 1-2, 1-3, ..., 1-8, 2-3, 2-4, ..., etc.)
The point of all of this is that we can then use X and Y to create two new matrices that contain the columns located at each pair of X and Y, then use dot to apply the dot product to each matrix column-wise. We also need to divide the dot product by the product of the magnitudes of the two vectors. You can't use MATLAB's built-in function norm for this because it will compute the matrix norm for matrices. As such, you have to sum the squares over all of the rows for each column of each of the two matrices, multiply the two results element-wise, and then take the square root. This is the last step of the process:
matrix1 = A(:,X);
matrix2 = A(:,Y);
B = dot(matrix1, matrix2, 1) ./ sqrt(sum(matrix1.^2,1).*sum(matrix2.^2,1));
I get this for B:
>> B
B =
Columns 1 through 11
1 1 1 1 1 1 1 1 1 1 1
Columns 12 through 22
1 1 1 1 1 1 1 1 1 1 1
Columns 23 through 28
1 1 1 1 1 1
Well... this isn't useful at all. Why is that? What you are actually doing is finding the cosine of the angle between two vectors, and since each column is a scalar multiple of every other column, the angle between any two of them is in fact 0, and the cosine of 0 is 1.
You should try this with different values of A so you can see for yourself that it works.
To make this code compatible for copying and pasting, here it is:
%// Define A here:
A = repmat(1:8, 4, 1);
%// Code to produce dot products here
num_columns = size(A,2);
[X,Y] = meshgrid(1:num_columns);
X = X(tril(ones(num_columns),-1)==1); Y = Y(tril(ones(num_columns),-1)==1);
matrix1 = A(:,X);
matrix2 = A(:,Y);
B = dot(matrix1, matrix2, 1) ./ sqrt(sum(matrix1.^2,1).*sum(matrix2.^2,1));
Minor Note
If you have a lot of columns in A, this may be very memory intensive. You can get your original code to work with loops, but you need to change what you're doing at each column.
You can do something like this:
num_columns = nchoosek(size(A,2),2);
B = zeros(1, num_columns);
counter = 1;
for ii = 1 : size(A,2)
for jj = ii+1 : size(A,2)
B(counter) = dot(A(:,ii), A(:,jj), 1) / (norm(A(:,ii))*norm(A(:,jj)));
counter = counter + 1;
end
end
Note that we can use norm here because each input to the function is a vector. We first preallocate a vector B that will hold the results for all possible pairs of columns. Then we go through each pair; note that the inner for loop starts at the outer loop index plus 1, so no pair is visited twice. We take the dot product of the columns at positions ii and jj, normalize it, and store the result in B. An external counter is needed so that each result goes into the correct slot of B.
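Since the all-ones result above says little, it may help to try the pairwise loop on a matrix whose columns are not multiples of each other. Here is a small Python sketch of the same computation; the function name column_cosines and the 2x3 test matrix are assumptions chosen so the expected cosines are easy to verify by hand.

```python
from itertools import combinations
from math import sqrt

def column_cosines(matrix):
    """Cosine of the angle between every pair of distinct columns,
    in the order 1-2, 1-3, ..., 2-3, ..."""
    columns = list(zip(*matrix))            # transpose rows into columns
    result = []
    for u, v in combinations(columns, 2):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = sqrt(sum(a * a for a in u))
        norm_v = sqrt(sum(b * b for b in v))
        result.append(dot / (norm_u * norm_v))
    return result

# Hypothetical test matrix with columns (1,0), (0,1) and (1,1).
A = [[1, 0, 1],
     [0, 1, 1]]
cosines = column_cosines(A)
```

The first two columns are orthogonal, so their cosine is 0, while each of them makes a 45 degree angle with the third column, giving 1/sqrt(2) for the other two pairs.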

matching two matrices in matlab

Suppose I have a matrix p
p =
1 3 6 7 3 6
8 5 10 10 10 4
5 4 8 9 1 7
5 5 5 3 8 9
9 3 5 4 3 1
3 3 9 10 4 1
then after sorting the columns of matrix p into ascending order
y =
1 3 5 3 1 1
3 3 5 4 3 1
5 3 6 7 3 4
5 4 8 9 4 6
8 5 9 10 8 7
9 5 10 10 10 9
I want to know, given a value from y, what its row was in p
For example, the value 3 is located in row 6, column 1 of matrix p;
after sorting, it is located in row 2, column 1 of matrix y.
So in the end, for each value in the sorted matrix y, I want to know where it was originally in matrix p.
Just use second output of sort:
[y ind] = sort(p);
Your desired result (original row of each value) is in matrix ind.
The Matlab sort command returns a second value which can be used to index into the original array or matrix. From the sort documentation:
[Y,I] = sort(X,DIM,MODE) also returns an index matrix I.
If X is a vector, then Y = X(I).
If X is an m-by-n matrix and DIM=1, then
for j = 1:n, Y(:,j) = X(I(:,j),j); end
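To see what the index matrix contains, here is a rough Python sketch that sorts each column of p and records the original (1-based) row of every value. The function name sort_columns_with_rows is invented for the example; ties keep their original order because Python's sort is stable.

```python
def sort_columns_with_rows(matrix):
    """Sort each column in ascending order; also return, for every sorted
    value, the 1-based row it occupied in the original matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    sorted_cols, index_cols = [], []
    for j in range(n_cols):
        order = sorted(range(n_rows), key=lambda i: matrix[i][j])
        sorted_cols.append([matrix[i][j] for i in order])
        index_cols.append([i + 1 for i in order])   # 1-based, as in MATLAB
    # Transpose the per-column lists back into row-major matrices.
    y = [list(row) for row in zip(*sorted_cols)]
    ind = [list(row) for row in zip(*index_cols)]
    return y, ind

p = [[1, 3, 6, 7, 3, 6],
     [8, 5, 10, 10, 10, 4],
     [5, 4, 8, 9, 1, 7],
     [5, 5, 5, 3, 8, 9],
     [9, 3, 5, 4, 3, 1],
     [3, 3, 9, 10, 4, 1]]
y, ind = sort_columns_with_rows(p)
```

Here y[1][0] is 3 and ind[1][0] is 6, matching the example in the question: the value in row 2, column 1 of y came from row 6 of p.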
OK, I understand exactly what you want.
Here is the code I just wrote. It is not optimal, but you can optimize it, or I can work with you to improve it.
p and y have the same size.
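[n,m] = size(p);
for L = 1:m                  % each column
    for k = 1:n              % each row of the sorted matrix y
        for i = 1:n          % search column L of p for the value y(k,L)
            if p(i,L) == y(k,L)
                % A repeated value is reported once for every matching row of p
                disp(['y(' num2str(k) ',' num2str(L) ') is present in line ' num2str(i) ' of p']);
            end
        end
    end
end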
Voilà!!