Delete specific columns when another column has a specific value (Perl or awk)

I have a file with 16 different columns (tab-separated values):
22 51169729 G 39 A 0 0 C 0 0 G 38 0.974359 T 1 0.025641
22 51169730 A 36 A 36 1 C 0 0 G 0 0 T 0 0
22 51169731 C 39 A 0 0 C 39 1 G 0 0 T 0 0
22 51169732 G 37 A 0 0 C 0 0 G 37 1 T 0 0
22 51169733 G 33 A 0 0 C 0 0 G 33 1 T 0 0
22 51169734 C 35 A 0 0 C 35 1 G 0 0 T 0 0
22 51169735 A 32 A 32 1 C 0 0 G 0 0 T 0 0
22 51169736 G 32 A 0 0 C 0 0 G 32 1 T 0 0
22 51169737 C 30 A 0 0 C 30 1 G 0 0 T 0 0
22 51169738 T 27 A 0 0 C 0 0 G 0 0 T 27 1
22 51169739 G 26 A 0 0 C 0 0 G 26 1 T 0 0
22 51169740 A 25 A 25 1 C 0 0 G 0 0 T 0 0
22 51169741 C 22 A 0 0 C 22 1 G 0 0 T 0 0
22 51169742 G 23 A 0 0 C 0 0 G 23 1 T 0 0
22 51169743 C 21 A 0 0 C 21 1 G 0 0 T 0 0
22 51169744 C 22 A 0 0 C 22 1 G 0 0 T 0 0
22 51169745 C 19 A 0 0 C 19 1 G 0 0 T 0 0
22 51169746 C 19 A 0 0 C 19 1 G 0 0 T 0 0
22 51169747 A 15 A 14 0.933333 C 1 0.0666667 G 0 0 T 0 0
22 51169748 C 20 A 0 0 C 20 1 G 0 0 T 0 0
The third column can be A, G, C or T.
I would like to:
remove columns 5, 6 and 7 when $3 == 'A' OR when $7 == '0';
remove columns 8, 9 and 10 when $3 == 'C' OR when $10 == '0';
remove columns 11, 12 and 13 when $3 == 'G' OR when $13 == '0';
and remove columns 14, 15 and 16 when $3 == 'T' OR when $16 == '0'.
When this is done for the entire file, there would only be 4 columns left in some cases and 7 columns in other cases, like in the following example:
22 51169729 G 39 T 1 0.025641
22 51169730 A 36
22 51169731 C 39
22 51169732 G 37
22 51169733 G 33
22 51169734 C 35
22 51169735 A 32
22 51169736 G 32
22 51169737 C 30
22 51169738 T 27
22 51169739 G 26
22 51169740 A 25
22 51169741 C 22
22 51169742 G 23
22 51169743 C 21
22 51169744 C 22
22 51169745 C 19
22 51169746 C 19
22 51169747 A 15 C 2 0.133333
22 51169748 C 20
Any suggestions?

Perl solution for the first part:
#!/usr/bin/perl
use warnings;
use strict;
my %remove = ( A => 4, # Where to start removing the columns
C => 7, # for a given character in column #3.
G => 10,
T => 13,
);
$\ = "\n"; # Add newline to prints.
$, = "\t"; # Separate values by tabs.
while (<>) { # Read input line by line;
chomp; # Remove newline.
my @F = split /\t/; # Split on tabs, populate an array.
splice @F, $remove{ $F[2] }, 3; # Remove the columns.
print @F; # Output.
}
Once you clarify the second requirement, I can try to add more code. What values do you want to remove? Can you show more examples?
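For comparison, here is a minimal sketch, in Python rather than Perl or awk, of the full rule set as stated in the question: keep $1..$4, then drop each three-column group when $3 names that group's base or when the group's last field is 0. It assumes exactly 16 tab-separated fields per line; the names GROUPS and filter_line are hypothetical.

```python
# Sketch of the question's rules; assumes 16 tab-separated fields per line.
# Column numbers are 1-based, as in the question ($3, $5..$7, ...).
GROUPS = [  # (base in $3, first column of the group, column checked for 0)
    ("A", 5, 7),
    ("C", 8, 10),
    ("G", 11, 13),
    ("T", 14, 16),
]


def filter_line(line: str) -> str:
    """Drop each 3-column group whose base equals $3 or whose last field is 0."""
    fields = line.rstrip("\n").split("\t")
    kept = fields[:4]                                # $1..$4 are always kept
    for base, first, last in GROUPS:
        if fields[2] == base or float(fields[last - 1]) == 0:
            continue                                 # rule from the question: drop it
        kept.extend(fields[first - 1:last])          # keep the group's three columns
    return "\t".join(kept)
```

Applied to row 51169747, these rules keep C 1 0.0666667, whereas the example output in the question shows C 2 0.133333 for that row, so it is worth double-checking that line of the expected output.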

Here's one way to do the first part, assuming no empty fields:
$ cat tst.awk
$3 == "A" { $5=$6=$7="" }
$3 == "C" { $8=$9=$10="" }
$3 == "G" { $11=$12=$13="" }
$3 == "T" { $14=$15=$16="" }
{ gsub(/[[:space:]]+/,"\t"); print }
$ awk -f tst.awk file
1 957584 C 157 A 1 0.006 G 0 0 T 0 0
I don't really understand what you're trying to do in the 2nd part, but if the tests on $7/$10/$13 use the field numbers after the first phase, this might be what you want:
$3 == "A" { $5=$6=$7="" }
$3 == "C" { $8=$9=$10="" }
$3 == "G" { $11=$12=$13="" }
$3 == "T" { $14=$15=$16="" }
{ $0=$0 }
$7 ~ /0/ { c++ }
$10 ~ /0/ { c++ }
$13 ~ /0/ { c++ }
c > 1 { $8=$9=$10="" }
{ c=0; gsub(/[[:space:]]+/,"\t"); print }
or this, if the tests on $7/$10/$13 use the original field numbers:
$7 ~ /0/ { c++ }
$10 ~ /0/ { c++ }
$13 ~ /0/ { c++ }
$3 == "A" { $5=$6=$7="" }
$3 == "C" { $8=$9=$10="" }
$3 == "G" { $11=$12=$13="" }
$3 == "T" { $14=$15=$16="" }
c > 1 { $8=$9=$10="" }
{ c=0; gsub(/[[:space:]]+/,"\t"); print }
If not, edit your question to clarify with a better example.

Related

Matlab: find a value in a matrix

I have the following matrix:
A= [23 34 45 0 0 0; 21 34 0 0 23 11; 34 23 0 0 0 22]
I want to check whether a value is present and, if it is, extract it together with the values that follow it in the same row.
E.g., I want to find the value 23 in A; if it is present, I want as output a matrix containing only 23 and the values following it:
B= [23 34 45 0 0 0; 0 0 0 0 23 11; 0 23 0 0 0 22]
This is an interesting question. Here is a loop-free answer that combines find with a row-wise cumsum to do the job efficiently:
G = zeros(size(A));
T = find(A==23);
G(T) = 1;
mask = cumsum(G,2)>0;
result = mask .* A;
>> result
result =
23 34 45 0 0 0
0 0 0 0 23 11
0 23 0 0 0 22
I think this is one of the more efficient ways of doing this.
========EDIT========
Even better, use logical indexing:
B = A.*(cumsum(A==23,2)>0);
Thanks to @obchardon
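The same row-wise idea, sketched in plain Python for readers without MATLAB: a running count of matches along each row (the role of cumsum(A==23,2)) is positive from the first match onwards, and only those entries are kept. The function name keep_from_first is hypothetical.

```python
def keep_from_first(A, value):
    """Zero every entry that comes before the first occurrence of value in its row."""
    out = []
    for row in A:
        seen = 0                          # running count, like cumsum(A==value, 2)
        new_row = []
        for x in row:
            seen += (x == value)
            new_row.append(x if seen > 0 else 0)
        out.append(new_row)
    return out


A = [[23, 34, 45, 0, 0, 0],
     [21, 34, 0, 0, 23, 11],
     [34, 23, 0, 0, 0, 22]]
B = keep_from_first(A, 23)
```

A row without any match comes out as all zeros, which is also what the MATLAB one-liner produces.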
find() returns the rows and columns of the desired value (in your case 23) in matrix A.
Using a for loop, you can copy each match and the values that follow it:
A = [23 34 45 0 0 0; ...
21 34 0 0 23 11; ...
34 23 0 0 0 22];
[r, c] = find(A==23);
B = zeros(size(A));
for i=1:length(r)
columns = c(i):size(A,2);
B(r(i),columns) = A(r(i),columns); % use the row of the i-th match, not i itself
end

How can I make a diamond of zeroes in a matrix of any size? [closed]

I have a square N x N matrix with odd N, and I want to put a diamond of zeroes in it. For example, for a 5 x 5 matrix:
1 3 2 4 2
5 7 8 9 5
3 2 4 6 3
6 8 2 1 3
3 3 3 3 3
is transformed to:
1 3 0 4 2
5 0 8 0 5
0 2 4 6 0
6 0 2 0 3
3 3 0 3 3
How can this be done efficiently?
I'll bite; here is one approach:
% NxN matrix
N = 5;
assert(N>1 && mod(N,2)==1);
A = magic(N);
% diamond mask
N2 = fix(N/2);
[I,J] = meshgrid(-N2:N2);
mask = (abs(I) + abs(J)) == N2;
% fill with zeros
A(mask) = 0;
The result:
>> A
A =
17 24 0 8 15
23 0 7 0 16
0 6 13 20 0
10 0 19 0 3
11 18 0 2 9
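The mask test is "Manhattan distance from the centre equals N2". A plain-Python sketch of the same test (0-based indices; the function name is hypothetical), applied to the magic(5) values typed in by hand:

```python
MAGIC5 = [[17, 24, 1, 8, 15],
          [23, 5, 7, 14, 16],
          [4, 6, 13, 20, 22],
          [10, 12, 19, 21, 3],
          [11, 18, 25, 2, 9]]


def diamond_zero(A):
    """Zero every cell with |i - c| + |j - c| == c, where c is the centre index."""
    n = len(A)
    assert n > 1 and n % 2 == 1, "like the MATLAB answer, assumes odd N > 1"
    c = n // 2                            # N2 = fix(N/2)
    return [[0 if abs(i - c) + abs(j - c) == c else A[i][j] for j in range(n)]
            for i in range(n)]


result = diamond_zero(MAGIC5)
```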
I also had some time to play around. My solution has no restrictions on N being odd or even, or larger than 1. Every integer N is fine (even 0 works, though it does not make sense).
% NxN matrix
N = 7;
A = magic(N);
half = ceil( N/2 );
mask = ones( half );
mask( 1 : half+1 : half*half ) = 0;
mask = [ fliplr( mask ) mask ];
mask = [ mask; flipud( mask ) ];
if( mod(N,2) == 1 )
mask(half, :) = [];
mask(:, half) = [];
end
A( ~mask ) = 0;
A
I first create a square sub-matrix mask of "quarter" size (half the number of rows and half the number of columns, using ceil() to round up when N is odd).
Example for N=7 -> half=4.
mask =
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
I then set its diagonal values to zero:
mask =
0 1 1 1
1 0 1 1
1 1 0 1
1 1 1 0
Mirror the mask horizontally:
mask =
1 1 1 0 0 1 1 1
1 1 0 1 1 0 1 1
1 0 1 1 1 1 0 1
0 1 1 1 1 1 1 0
Then mirror it vertically:
mask =
1 1 1 0 0 1 1 1
1 1 0 1 1 0 1 1
1 0 1 1 1 1 0 1
0 1 1 1 1 1 1 0
0 1 1 1 1 1 1 0
1 0 1 1 1 1 0 1
1 1 0 1 1 0 1 1
1 1 1 0 0 1 1 1
As N is odd, we get a redundant row and a redundant column, which are then removed:
mask =
1 1 1 0 1 1 1
1 1 0 1 0 1 1
1 0 1 1 1 0 1
0 1 1 1 1 1 0
1 0 1 1 1 0 1
1 1 0 1 0 1 1
1 1 1 0 1 1 1
The logical not is then used as a mask to select the values in the original matrix that are set to 0.
Probably not as efficient as @Amro's solution, but it works. :D
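The mirroring steps can be sketched in plain Python as follows (0-based indices; mirror_mask is a hypothetical name). For odd N, the result agrees with the distance test abs(I) + abs(J) == N2 used in the first answer.

```python
def mirror_mask(n):
    """Quarter-size mask, mirrored left/right and up/down; 0 marks the diamond."""
    half = -(-n // 2)                                   # ceil(n/2)
    quarter = [[0 if i == j else 1 for j in range(half)] for i in range(half)]
    top = [row[::-1] + row for row in quarter]          # [fliplr(mask) mask]
    mask = top + top[::-1]                              # [mask; flipud(mask)]
    if n % 2 == 1:                                      # drop the duplicated middle row/column
        del mask[half - 1]
        mask = [row[:half - 1] + row[half:] for row in mask]
    return mask


for row in mirror_mask(7):
    print(*row)
```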
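My solution:
Looking at the left half of the matrix: in the first row, the 0 is in the middle column (let's call it mc); in the second row, it is in column mc-1; and so on as the row index increases.
When you reach column 1, the column index climbs back up to mc while the rows keep increasing, down to the last row.
The right half works in a similar way: starting from column mc+1, the column index moves out to the last column and back to mc while the rows decrease back to the first row.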
n=7
a=randi([20 30],n,n)
% Centre of the matrix
p=ceil(n/2)
% Identify the column sequence
col=[p:-1:1 2:p p+1:n n-1:-1:p]
% Identify the row sequence
row=[1:n n-1:-1:1]
% Transform the row and column indices into linear indices
idx=sub2ind(size(a),row,col)
% Set the zeros
a(idx)=0
a =
22 29 23 27 27 21 23
29 29 21 27 24 26 24
30 28 21 27 29 28 25
28 22 24 20 27 24 25
23 26 21 20 30 20 29
26 20 26 23 25 22 25
21 24 25 25 23 21 30
a =
22 29 23 0 27 21 23
29 29 0 27 0 26 24
30 0 21 27 29 0 25
0 22 24 20 27 24 0
23 0 21 20 30 0 29
26 20 0 23 0 22 25
21 24 25 0 23 21 30
Hope this helps.
Qapla'
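The row and column sequences, and the sub2ind conversion to column-major linear indices, can be reproduced in plain Python to check which cells get zeroed (1-based, to match MATLAB; diamond_indices is a hypothetical name). The first and last (row, col) pairs are the same cell, which is harmless when assigning zeros.

```python
def diamond_indices(n):
    """Row/column sequences from the answer (1-based) and their linear indices."""
    p = -(-n // 2)                                      # ceil(n/2)
    col = (list(range(p, 0, -1)) + list(range(2, p + 1))            # p:-1:1  2:p
           + list(range(p + 1, n + 1)) + list(range(n - 1, p - 1, -1)))  # p+1:n  n-1:-1:p
    row = list(range(1, n + 1)) + list(range(n - 1, 0, -1))         # 1:n  n-1:-1:1
    # sub2ind for an n-by-n matrix: columns are stored one after another
    idx = [(c - 1) * n + r for r, c in zip(row, col)]
    return row, col, idx


row, col, idx = diamond_indices(7)
```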
Using indexing (only works when N is odd):
N = 7;
% Random matrix
A = randi(100, N);
idx = [N-1:-2:1; 2:2:N];
A(cumsum([ceil(N/2) idx(:)' idx(end-1:-1:1)])) = 0
A =
60 77 74 0 54 83 9
8 48 0 76 0 28 67
6 0 32 78 83 0 10
0 27 25 5 11 39 0
76 0 49 43 67 0 16
79 7 0 86 0 70 78
57 28 85 0 81 44 81
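The index arithmetic is easier to follow with the steps written out. This plain-Python sketch (hypothetical function name; odd n only, as in the answer) builds the same step vector and its running sum, giving 1-based column-major linear indices:

```python
from itertools import accumulate


def cumsum_diamond(n):
    """Linear indices (1-based, column-major) of the diamond cells, odd n only."""
    assert n % 2 == 1
    top = list(range(n - 1, 0, -2))                         # N-1:-2:1
    bottom = list(range(2, n + 1, 2))                       # 2:2:N
    flat = [v for pair in zip(top, bottom) for v in pair]   # idx(:)'
    steps = [-(-n // 2)] + flat + flat[-2::-1]              # [ceil(N/2) idx(:)' idx(end-1:-1:1)]
    return list(accumulate(steps))


idx = cumsum_diamond(7)
# Convert back to 1-based (row, column) pairs to see which cells are hit.
cells = [((k - 1) % 7 + 1, (k - 1) // 7 + 1) for k in idx]
```

For N = 7 the steps are 4, 6, 2, 4, 4, 2, 6, 2, 4, 4, 2, 6: each step either moves to the mirrored cell in the same column or jumps to the next column.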

Changing index of matrix

I'm trying to change the following code so that the first matrix will become the second matrix:
function BellTri = matrix(n)
BellTri = zeros(n);
BellTri(1,1) = 1;
for i = 2:n
BellTri(i,1) = BellTri(i-1,i-1);
for j = 2:i
BellTri(i,j) = BellTri(i - 1,j-1) + BellTri(i,j-1);
end
end
BellTri
First matrix (when n = 7)
1 0 0 0 0 0 0
1 2 0 0 0 0 0
2 3 5 0 0 0 0
5 7 10 15 0 0 0
15 20 27 37 52 0 0
52 67 87 114 151 203 0
203 255 322 409 523 674 877
Second matrix
1 1 2 5 15 52 877
1 3 10 37 151 674 0
2 7 27 114 523 0 0
5 20 87 409 0 0 0
15 67 322 0 0 0 0
52 255 0 0 0 0 0
203 0 0 0 0 0 0
An option is to cyclically permute the columns using circshift.
function [BellTri, Second] = matrix(n)
BellTri = zeros(n);
BellTri(1,1) = 1;
for i = 2:n
BellTri(i,1) = BellTri(i-1,i-1);
for j = 2:i
BellTri(i,j) = BellTri(i - 1,j-1) + BellTri(i,j-1);
end
end
Second = BellTri;
for i = 1:n
Second(:, i) = circshift(Second(:,i), 1-i);
end
for i = n-1:-1:2
Second(1, i) = Second(1, i-1);
end
end
Input: [BellTri, Second] = matrix(7)
Output:
BellTri =
1 0 0 0 0 0 0
1 2 0 0 0 0 0
2 3 5 0 0 0 0
5 7 10 15 0 0 0
15 20 27 37 52 0 0
52 67 87 114 151 203 0
203 255 322 409 523 674 877
Second =
1 1 2 5 15 52 877
1 3 10 37 151 674 0
2 7 27 114 523 0 0
5 20 87 409 0 0 0
15 67 322 0 0 0 0
52 255 0 0 0 0 0
203 0 0 0 0 0 0
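What the circshift loop does can be seen more directly in plain Python: moving column j up by j places (0-based) means row r of the result reads T[r + j][j], i.e. the r-th diagonal of the triangle, and the wrapped-around entries come from the zero upper part. The helper names are hypothetical; the last loop reproduces the answer's shift of the first row.

```python
def bell_triangle(n):
    """Same recurrence as the MATLAB BellTri function, with 0-based indices."""
    T = [[0] * n for _ in range(n)]
    T[0][0] = 1
    for i in range(1, n):
        T[i][0] = T[i - 1][i - 1]
        for j in range(1, i + 1):
            T[i][j] = T[i - 1][j - 1] + T[i][j - 1]
    return T


def shift_columns_up(T):
    """Column j moved up by j places with wrap-around, like circshift(T(:,j), 1-j)."""
    n = len(T)
    return [[T[(r + j) % n][j] for j in range(n)] for r in range(n)]


def second_matrix(n):
    S = shift_columns_up(bell_triangle(n))
    for j in range(n - 2, 0, -1):         # the answer's final loop over row 1
        S[0][j] = S[0][j - 1]
    return S
```

Without the final loop, the first row is 1 2 5 15 52 203 877, the corrected first row given in the next answer.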
One approach:
out = zeros(size(A));
out(logical(fliplr(triu(ones(size(A,1)))))) = A(logical(tril(ones(size(A,1)))));
Note: As Divakar pointed out, there appears to be a typo in the first row of the desired output. This method gives the corrected first row.
Results:
A = [1 0 0 0 0 0 0;
1 2 0 0 0 0 0;
2 3 5 0 0 0 0;
5 7 10 15 0 0 0;
15 20 27 37 52 0 0;
52 67 87 114 151 203 0;
203 255 322 409 523 674 877];
>> out
out =
1 2 5 15 52 203 877
1 3 10 37 151 674 0
2 7 27 114 523 0 0
5 20 87 409 0 0 0
15 67 322 0 0 0 0
52 255 0 0 0 0 0
203 0 0 0 0 0 0
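The one-liner works because both logical masks are read in MATLAB's column-major order, and column j of tril(ones(n)) has as many ones as column j of fliplr(triu(ones(n))). A plain-Python sketch of the same pairing (hypothetical function name, 0-based indices):

```python
A = [[1, 0, 0, 0, 0, 0, 0],
     [1, 2, 0, 0, 0, 0, 0],
     [2, 3, 5, 0, 0, 0, 0],
     [5, 7, 10, 15, 0, 0, 0],
     [15, 20, 27, 37, 52, 0, 0],
     [52, 67, 87, 114, 151, 203, 0],
     [203, 255, 322, 409, 523, 674, 877]]


def tril_to_antitriu(A):
    n = len(A)
    # Source: lower-triangle entries (i >= j), read column by column like A(mask).
    src = [A[i][j] for j in range(n) for i in range(n) if i >= j]
    # Target: fliplr(triu(ones(n))) is 1 where i + j <= n - 1, also column by column.
    dst = [(i, j) for j in range(n) for i in range(n) if i + j <= n - 1]
    out = [[0] * n for _ in range(n)]
    for (i, j), v in zip(dst, src):
        out[i][j] = v
    return out


out = tril_to_antitriu(A)
```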

Taking a matrix and retrieving both the diagonals keeping the dimensions of the original matrix in MATLAB

Given the matrix A = magic(5) you get:
A = 17 24 1 8 15
23 5 7 14 16
4 6 13 20 22
10 12 19 21 3
11 18 25 2 9
I want to use commands such as rot90, diag, triu, tril and matrix sums to get the matrix:
A = 17 0 0 0 15
0 5 0 14 0
0 0 13 0 0
0 12 0 21 0
11 0 0 0 9
If you can't think of a way to solve this using only the commands I listed, it's OK to do it your own way.
You can use the eye function for indexing:
>> A(~eye(size(A)) & ~flipud(eye(size(A))))=0
A =
17 0 0 0 15
0 5 0 14 0
0 0 13 0 0
0 12 0 21 0
11 0 0 0 9
You can simply use linear indexing to access the diagonals:
n = size(A,1);
B = zeros(n);
B( 1:(n+1):end ) = A( 1:(n+1):end ); %// main diagonal
B( n:(n-1):(end-n+1) ) = A( n:(n-1):(end-n+1) ) %// anti-diagonal
And you get
B =
17 0 0 0 15
0 5 0 14 0
0 0 13 0 0
0 12 0 21 0
11 0 0 0 9
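In column-major order, consecutive main-diagonal elements are n+1 apart, and consecutive anti-diagonal elements are n-1 apart starting at linear index n. A plain-Python check of those strides (1-based linear indices converted to 0-based row/column; assumes n >= 2; hypothetical function name):

```python
MAGIC5 = [[17, 24, 1, 8, 15],
          [23, 5, 7, 14, 16],
          [4, 6, 13, 20, 22],
          [10, 12, 19, 21, 3],
          [11, 18, 25, 2, 9]]


def both_diagonals(A):
    n = len(A)
    main = range(1, n * n + 1, n + 1)          # 1:(n+1):end
    anti = range(n, n * n - n + 2, n - 1)      # n:(n-1):(end-n+1), end point included
    B = [[0] * n for _ in range(n)]
    for k in [*main, *anti]:
        r, c = (k - 1) % n, (k - 1) // n       # column-major linear index -> (row, col)
        B[r][c] = A[r][c]
    return B


B = both_diagonals(MAGIC5)
```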
Another approach is:
mDiag = diag(diag(A));
aDiag = rot90(diag(diag(rot90(A))))';
overlap = A.*((diag(diag(A)) ~= 0) & (rot90(diag(diag(rot90(A)))) ~= 0));
solution = mDiag + aDiag - overlap
And then:
solution =
17 0 0 0 15
0 5 0 14 0
0 0 13 0 0
0 12 0 21 0
11 0 0 0 9
Using bsxfun:
out = A.*bsxfun(@(x,y) x == y | x+y == size(A,1)+1,(1:size(A,1)).',1:size(A,1)) %//'
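The eye/flipud and bsxfun answers rely on the same test: with 1-based row x and column y, an entry lies on the main diagonal when x == y and on the anti-diagonal when x + y == n + 1. A plain-Python sketch of that predicate (hypothetical function name):

```python
def keep_diagonals(A):
    """Keep entries where x == y or x + y == n + 1 (1-based), zero the rest."""
    n = len(A)
    return [[A[x - 1][y - 1] if (x == y or x + y == n + 1) else 0
             for y in range(1, n + 1)]
            for x in range(1, n + 1)]


A4 = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]
print(keep_diagonals(A4))
```

For even n the two diagonals do not share a cell; for odd n they meet in the centre, which the predicate handles without double counting (unlike summing the two diagonal matrices).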

Is it possible to rotate a matrix by 45 degrees in MATLAB?

That is, so that it appears like a diamond (it's a square matrix), with each row having one more element than the row before, up to the middle row, which has as many elements as the dimension of the original matrix, and then back down again, one element fewer per row, to a single element.
A true rotation is of course not possible, as the "grid" a matrix is based on is regular.
But I remember what your initial idea was, so the following should help you:
%example data
A = magic(5);
A =
17 24 1 8 15
23 5 7 14 16
4 6 13 20 22
10 12 19 21 3
11 18 25 2 9
d = length(A)-1;
diamond = zeros(2*d+1);
for jj = d:-2:-d
ii = (d-jj)/2+1;
kk = (d-abs(jj))/2;
D{ii} = { [zeros( 1,kk ) A(ii,:) zeros( 1,kk ) ] };
diamond = diamond + diag(D{ii}{1},jj);
end
will return the diamond:
diamond =
0 0 0 0 17 0 0 0 0
0 0 0 23 0 24 0 0 0
0 0 4 0 5 0 1 0 0
0 10 0 6 0 7 0 8 0
11 0 12 0 13 0 14 0 15
0 18 0 19 0 20 0 16 0
0 0 25 0 21 0 22 0 0
0 0 0 2 0 3 0 0 0
0 0 0 0 9 0 0 0 0
Now you can again search for words or patterns row by row or column by column; just remove the zeros afterwards:
Imagine you extract a single row:
row = diamond(5,:)
you can extract the non-zero elements with find:
rowNoZeros = row( find(row) )
rowNoZeros =
11 12 13 14 15
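The loop places row ii of A, padded with kk zeros on each side, on diagonal jj of a (2d+1)-by-(2d+1) matrix. A plain-Python sketch of the same placement and of the row extraction (0-based; diamond_embed is a hypothetical name):

```python
MAGIC5 = [[17, 24, 1, 8, 15],
          [23, 5, 7, 14, 16],
          [4, 6, 13, 20, 22],
          [10, 12, 19, 21, 3],
          [11, 18, 25, 2, 9]]


def diamond_embed(A):
    """Put row ii of A on diagonal jj = d, d-2, ..., -d of a (2d+1)x(2d+1) grid."""
    n = len(A)
    d = n - 1
    size = 2 * d + 1
    out = [[0] * size for _ in range(size)]
    for ii, jj in enumerate(range(d, -d - 1, -2)):      # ii is 0-based here
        kk = (d - abs(jj)) // 2
        vec = [0] * kk + list(A[ii]) + [0] * kk         # zero padding, as in the answer
        for t, v in enumerate(vec):
            # diag(vec, jj): element t sits at (t, t+jj) if jj >= 0, else at (t-jj, t)
            r, c = (t, t + jj) if jj >= 0 else (t - jj, t)
            out[r][c] += v
    return out


diamond = diamond_embed(MAGIC5)
row_no_zeros = [v for v in diamond[4] if v != 0]        # like row(find(row))
```

Like find(row), the last line also drops genuine zero values from A, so it is only safe when A contains no zeros.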
Not a real diamond, but probably useful as well:
(Idea from the comments by @beaker. I will remove this part if he posts it himself.)
B = spdiags(A)
B =
11 10 4 23 17 0 0 0 0
0 18 12 6 5 24 0 0 0
0 0 25 19 13 7 1 0 0
0 0 0 2 21 20 14 8 0
0 0 0 0 9 3 22 16 15
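For a square matrix, the layout printed above can be reproduced in plain Python: column k holds diagonal d = k - (n - 1), and entry B[j][k] is A[j - d][j] when that row exists, otherwise 0. This sketch only describes the printed result for a square A; for rectangular matrices, consult the spdiags documentation rather than this sketch. The function name is hypothetical.

```python
MAGIC5 = [[17, 24, 1, 8, 15],
          [23, 5, 7, 14, 16],
          [4, 6, 13, 20, 22],
          [10, 12, 19, 21, 3],
          [11, 18, 25, 2, 9]]


def spdiags_square(A):
    """Columns are the diagonals d = -(n-1)..(n-1) of a square A, aligned by column."""
    n = len(A)
    B = [[0] * (2 * n - 1) for _ in range(n)]
    for k in range(2 * n - 1):
        d = k - (n - 1)                   # diagonal offset: column index minus row index
        for j in range(n):
            i = j - d
            if 0 <= i < n:
                B[j][k] = A[i][j]
    return B


B = spdiags_square(MAGIC5)
```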