Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I have a huge data like the following
NDDDDTSVCLGTRQCSWFAGCTNRTWNSSA 0
VCLGTRQCSWFAGCTNRTWNSSAVPLIGLP 0
LTWSGNDTCLYSCQNQTKGLLYQLFRNLFC 0
CQNQTKGLLYQLFRNLFCSYGLTEAHGKWR 0
ITNDKGHDGHRTPTWWLTGSNLTLSVNNSG 0
GHRTPTWWLTGSNLTLSVNNSGLFFLCGNG 0
FLCGNGVYKGFPPKWSGRCGLGYLVPSLTR 0
KGFPPKWSGRCGLGYLVPSLTRYLTLNASQ 0
QSVCMECQGHGERISPKDRCKSCNGRKIVR 1
I want to use the following key to replace the letter with numbers
A 1
R 2
N 3
D 4
B 5
C 6
E 7
Q 8
Z 9
G 10
H 11
I 12
L 13
K 14
M 15
F 16
P 17
S 18
T 19
W 20
Y 21
V 22
at first I want to remove all the numbers close to letter and then replace the letters , so lets look at the first like
NDDDDTSVCLGTRQCSWFAGCTNRTWNSSA
will have this
3 4 4 4 4 19 18 22 6 19 13 10 19 2 8 6 18 20 16 1 10 6 19 3 2 19 20 3 18 18 1
and for the rest of lines the same as many lines as I have
perl -e'
use autodie;
my %charmap = (
A => 1, R => 2, N => 3, D => 4, B => 5, C => 6, E => 7, Q => 8,
Z => 9, G => 10, H => 11, I => 12, L => 13, K => 14, M => 15, F => 16,
P => 17, S => 18, T => 19, W => 20, Y => 21, V => 22,
);
while (<>) {
s{(.)}{ ($charmap{$1} // $1) . " " }ge;
print;
}
' file
Or just
perl -pe'
BEGIN { #charmap{ split //, "ARNDBCEQZGHILKMFPSTWYV" } = 1..22 }
s{(.)}{ ($charmap{$1} // $1) . " " }ge;
' file
With any awk in any shell on any UNIX box:
$ cat tst.awk
BEGIN {
chars = "ARNDBCEQZGHILKMFPSTWYV"
for (i=1; i<=length(chars); i++) {
char = substr(chars,i,1)
map[char] = i
}
}
{
out = ""
chars = $1
for (i=1; i<=length(chars); i++) {
char = substr(chars,i,1)
out = (out == "" ? "" : out " ") (char in map ? map[char] : char)
}
print out
}
$ awk -f tst.awk file
3 4 4 4 4 19 18 22 6 13 10 19 2 8 6 18 20 16 1 10 6 19 3 2 19 20 3 18 18 1
22 6 13 10 19 2 8 6 18 20 16 1 10 6 19 3 2 19 20 3 18 18 1 22 17 13 12 10 13 17
13 19 20 18 10 3 4 19 6 13 21 18 6 8 3 8 19 14 10 13 13 21 8 13 16 2 3 13 16 6
6 8 3 8 19 14 10 13 13 21 8 13 16 2 3 13 16 6 18 21 10 13 19 7 1 11 10 14 20 2
12 19 3 4 14 10 11 4 10 11 2 19 17 19 20 20 13 19 10 18 3 13 19 13 18 22 3 3 18 10
10 11 2 19 17 19 20 20 13 19 10 18 3 13 19 13 18 22 3 3 18 10 13 16 16 13 6 10 3 10
16 13 6 10 3 10 22 21 14 10 16 17 17 14 20 18 10 2 6 10 13 10 21 13 22 17 18 13 19 2
14 10 16 17 17 14 20 18 10 2 6 10 13 10 21 13 22 17 18 13 19 2 21 13 19 13 3 1 18 8
8 18 22 6 15 7 6 8 10 11 10 7 2 12 18 17 14 4 2 6 14 18 6 3 10 2 14 12 22 2
Alternative Perl solution:
#!/usr/bin/perl
use strict;
use warnings;
my %key = (
A => 1, R => 2, N => 3, D => 4, B => 5,
C => 6, E => 7, Q => 8, Z => 9, G => 10,
H => 11, I => 12, L => 13, K => 14, M => 15,
F => 16, P => 17, S => 18, T => 19, W => 20,
Y => 21, V => 22,
);
while (<STDIN>) {
my($text) = /^(\w+)/;
print join(' ',
map { $key{$_} }
split(//, $text)
), "\n";
}
exit 0;
Output with your given text:
$ perl dummy.pl <dummy.txt
3 4 4 4 4 19 18 22 6 13 10 19 2 8 6 18 20 16 1 10 6 19 3 2 19 20 3 18 18 1
22 6 13 10 19 2 8 6 18 20 16 1 10 6 19 3 2 19 20 3 18 18 1 22 17 13 12 10 13 17
13 19 20 18 10 3 4 19 6 13 21 18 6 8 3 8 19 14 10 13 13 21 8 13 16 2 3 13 16 6
6 8 3 8 19 14 10 13 13 21 8 13 16 2 3 13 16 6 18 21 10 13 19 7 1 11 10 14 20 2
12 19 3 4 14 10 11 4 10 11 2 19 17 19 20 20 13 19 10 18 3 13 19 13 18 22 3 3 18 10
10 11 2 19 17 19 20 20 13 19 10 18 3 13 19 13 18 22 3 3 18 10 13 16 16 13 6 10 3 10
16 13 6 10 3 10 22 21 14 10 16 17 17 14 20 18 10 2 6 10 13 10 21 13 22 17 18 13 19 2
14 10 16 17 17 14 20 18 10 2 6 10 13 10 21 13 22 17 18 13 19 2 21 13 19 13 3 1 18 8
8 18 22 6 15 7 6 8 10 11 10 7 2 12 18 17 14 4 2 6 14 18 6 3 10 2 14 12 22 2
On second thought...
As OP wants to obfuscate clear text then the more appropriate solution IMHO should be something like this:
$ bash <dummy.txt -c "$(echo /Td6WFoAAATm1rRGBMCtAbgBIQEWAAAAAAAAACsG0SbgALcApV0AOBlKq3igoJRmX9TqJifIRDIcDLdDtNRSv+tJBsifrrsdnlllNt2qqnlz0/uBmSnlO0FTKjKH/HXplJm9LaV7kXiNp/ZWDsyVqoV8EPjIEHHkXXd6jKahyq7tcCA4NGTHp/pwmk8jith6j/dcX67QCKmL0UtZUz9BqVWefD41lbrTNazbD8IP6zMLmAVxJav51SSTHzsUqhUfqhVmLsUg8sJkgloAAAAAAOMYtQXt21WNAAHJAbgBAABTvtYRscRn+wIAAAAABFla | base64 -d | xzcat)"
3 4 4 4 4 19 18 22 6 13 10 19 2 8 6 18 20 16 1 10 6 19 3 2 19 20 3 18 18 1
22 6 13 10 19 2 8 6 18 20 16 1 10 6 19 3 2 19 20 3 18 18 1 22 17 13 12 10 13 17
13 19 20 18 10 3 4 19 6 13 21 18 6 8 3 8 19 14 10 13 13 21 8 13 16 2 3 13 16 6
6 8 3 8 19 14 10 13 13 21 8 13 16 2 3 13 16 6 18 21 10 13 19 7 1 11 10 14 20 2
12 19 3 4 14 10 11 4 10 11 2 19 17 19 20 20 13 19 10 18 3 13 19 13 18 22 3 3 18 10
10 11 2 19 17 19 20 20 13 19 10 18 3 13 19 13 18 22 3 3 18 10 13 16 16 13 6 10 3 10
16 13 6 10 3 10 22 21 14 10 16 17 17 14 20 18 10 2 6 10 13 10 21 13 22 17 18 13 19 2
14 10 16 17 17 14 20 18 10 2 6 10 13 10 21 13 22 17 18 13 19 2 21 13 19 13 3 1 18 8
8 18 22 6 15 7 6 8 10 11 10 7 2 12 18 17 14 4 2 6 14 18 6 3 10 2 14 12 22 2
another awk
$ awk 'NR==FNR {a[$1]=$2; next}
{n=length($1);
for(i=1;i<=n;i++)
printf "%s", a[substr($1,i,1)] (i==n?ORS:OFS)}' mapfile datafile
3 4 4 4 4 19 18 22 6 13 10 19 2 8 6 18 20 16 1 10 6 19 3 2 19 20 3 18 18 1
22 6 13 10 19 2 8 6 18 20 16 1 10 6 19 3 2 19 20 3 18 18 1 22 17 13 12 10 13 17
13 19 20 18 10 3 4 19 6 13 21 18 6 8 3 8 19 14 10 13 13 21 8 13 16 2 3 13 16 6
6 8 3 8 19 14 10 13 13 21 8 13 16 2 3 13 16 6 18 21 10 13 19 7 1 11 10 14 20 2
12 19 3 4 14 10 11 4 10 11 2 19 17 19 20 20 13 19 10 18 3 13 19 13 18 22 3 3 18 10
10 11 2 19 17 19 20 20 13 19 10 18 3 13 19 13 18 22 3 3 18 10 13 16 16 13 6 10 3 10
16 13 6 10 3 10 22 21 14 10 16 17 17 14 20 18 10 2 6 10 13 10 21 13 22 17 18 13 19 2
14 10 16 17 17 14 20 18 10 2 6 10 13 10 21 13 22 17 18 13 19 2 21 13 19 13 3 1 18 8
8 18 22 6 15 7 6 8 10 11 10 7 2 12 18 17 14 4 2 6 14 18 6 3 10 2 14 12 22 2
however, there is no provision of missing mappings that are not specified, i.e. if you have chars not listed on the mapfile they will be ignored.
If the goal is encryption I'll propose a different approach:
First let's generate a mapping (or encryption key)
$ key=$(printf "%s\n" {A..Z} | shuf | paste -sd' ' | tr -d ' ')
$ echo "$key"
CNYSGFRDKQTOXJVLEWBAHZPMUI
now you can encrypt/decrypt your file contents, simply
$ tr [A-Z] "$key" < datafile > file.encrypted
and to reverse
$ tr "$key" [A-Z] < file.encrypted > file.decrypted
obviously, you need to save the key.
Related
I have data like
Date ColumnName1 ColumnName2 ColumnName3 ColumnName4 ColumnName5
2018-04-01 1 2 3 4 5
2018-04-02 6 7 8 9 10
2018-04-03 11 12 13 14 15
2018-04-04 16 17 18 19 20
2018-04-05 21 22 23 24 25
and I want data like following
2018-04-01 2018-04-02 2018-04-03 2018-04-03 2018-04-05
1 6 11 16 21
2 7 12 17 22
3 8 13 18 23
4 9 14 19 24
5 10 15 20 25
Now what to do?
In MATLAB, we can use im2col and col2im to transform from columns to blocks and back, for example
>> A = floor(30*rand(4,6))
A =
8 5 2 13 15 11
22 11 27 13 24 24
5 18 23 9 23 15
20 23 14 15 19 10
>> B = im2col(A,[2 2],'distinct')
B =
8 5 2 23 15 23
22 20 27 14 24 19
5 18 13 9 11 15
11 23 13 15 24 10
>> col2im(B,[2 2],[4,6],'distinct')
ans =
8 5 2 13 15 11
22 11 27 13 24 24
5 18 23 9 23 15
20 23 14 15 19 10
my question is that: after using im2col with sliding mode
>> B = im2col(A,[2 2],'sliding')
B =
8 22 5 5 11 18 2 27 23 13 13 9 15 24 23
22 5 20 11 18 23 27 23 14 13 9 15 24 23 19
5 11 18 2 27 23 13 13 9 15 24 23 11 24 15
11 18 23 27 23 14 13 9 15 24 23 19 24 15 10
I wish to get a 4-by-6 matrix C from B(without knowing A) that the value at each site equals the original value multiple the times of sampling.
In other word, C(1,1)=A(1,1), C(1,2)=A(1,2)*2, C(2,2) = A(2,2)*4
Though we can easily implement with a for-loop, but the efficiency is critically low. So how to vectorize the implementation?
If I'm understanding correctly, you're desired output is
C = [ 8 10 4 26 30 11
44 44 108 52 96 48
10 72 92 36 92 30
20 46 28 30 38 10 ]
which I got by computing C = A.*S where
S = [ 1 2 2 2 2 1
2 4 4 4 4 2
2 4 4 4 4 2
1 2 2 2 2 1 ]_
The entries in S represent how many sliding blocks each entry is a member of.
I believe your question boils down to how to construct the matrix S.
Solution:
S = min(min(1:M,M:-1:1),x)'*min(min(1:N,N:-1:1),y)
C = A.*S
where A is size M-by-N, and your sliding block is size x-by-y.
Explanation:
In the given example, M=4, N=6, x=2, and y=2.
Notice the solution S can be written as the outer product of two vectors:
S = [1;2;2;1] * [1,2,2,2,2,1]
We construct each of these two vectors using the values of M,N,x,y:
min(1:M,M:-1:1)' == min(1:4,4:-1:1)'
== min([1,2,3,4], [4,3,2,1])'
== [1,2,2,1]'
== [1;2;2;1]
In this case, the extra min(...,x) does nothing since all entries are already <=x.
min(1:N,N:-1:1) == min(1:6,6:-1:1)
== min([1,2,3,4,5,6],[6,5,4,3,2,1])
== [1,2,3,3,2,1]
This time the extra min(...,y) does matter.
min(min(1:N,N:-1:1),y) == min([1,2,3,3,2,1],y)
== min([1,2,3,3,2,1],2)
== [1,2,2,2,2,1]
I've found many posts to calculate distance between atoms, even I've written my own code for it. Now I want it to be in very less number of lines, I've written something like follows
#!/usr/bin/perl -w
#ARGV = <>;
for ( $i = 0; $i <= $#ARGV; $i++ ) {
#temp = split( /\s+/, $ARGV[$i] );
if ( $temp[0] eq "ATOM" and $temp[2] eq "CA" ) {
( $n1, $ax, $ay, $az ) = #temp[ 5, 6, 7, 8 ];
if ( $temp[0] eq "ATOM" and $temp[2] eq "CA" ) {
( $n2, $bx, $by, $bz ) = #temp[ 5, 6, 7, 8 ];
}
$dista = sprintf( "%0.3f",
sqrt( ( $ax - $bx )**2 + ( $ay - $by )**2 + ( $az - $bz )**2 ) );
print "$n1\t$n2\t$dista\n";
}
}
A sample input file is http://www.rcsb.org/pdb/files/5PTI.pdb. When I run the program it is not taking the next "CA" atom to calculate the distance, I want to calculate first "CA" to all other "CA" atoms and 2nd CA to all other CA's and so on. I know for loop is missing in my code, I tried to include that but something was going wrong. Where can I modify my code to get correct results.
In order to compare all distinct pairs of atoms you need to access the file data out of order, so it is best to read all of the relevant data into memory before doing the calculations.
This program uses a while loop to read all of the CA ATOM numbers and positions into #data, and then calculates the distance between every different pair using a double nested for loop
use strict;
use warnings;
my #data;
while ( <> ) {
my #fields = split;
next unless $fields[0] eq 'ATOM' and $fields[2] eq 'CA';
push #data, [ #fields[5..8] ];
}
for my $i (0 .. $#data-1) {
my ($an, $ax, $ay, $az) = #{ $data[$i] };
for my $j ($i+1 .. $#data) {
my ($bn, $bx, $by, $bz) = #{ $data[$j] };
my ($dx, $dy, $dz) = ($ax-$bx, $ay-$by, $az-$bz);
my $dist = sqrt($dx*$dx + $dy*$dy + $dz*$dz);
printf "%3d %3d %6.3f\n", $an, $bn, $dist;
}
}
output
1 2 3.772
1 3 6.357
1 4 8.230
1 5 6.883
1 6 8.835
1 7 11.836
1 8 14.819
1 9 16.272
1 10 18.781
1 11 22.069
1 12 23.577
1 13 27.304
1 14 28.550
1 15 30.435
1 16 29.411
1 17 27.994
1 18 25.290
1 19 23.197
1 20 19.549
1 21 16.377
1 22 13.683
1 23 10.803
1 24 12.584
1 25 10.300
1 26 13.713
1 27 14.787
1 28 11.633
1 29 13.140
1 30 13.576
1 31 16.899
1 32 19.413
1 33 20.494
1 34 23.292
1 35 22.504
1 36 25.633
1 37 25.598
1 38 24.936
1 39 22.477
1 40 19.299
1 41 16.005
1 42 12.638
1 43 11.659
1 44 14.746
1 45 15.146
1 46 18.399
1 47 17.588
1 48 15.860
1 49 15.021
1 50 13.194
1 51 11.300
1 52 10.500
1 53 9.874
1 54 7.246
1 55 5.502
1 56 7.255
1 57 8.869
1 58 12.364
2 3 3.712
2 4 5.264
2 5 5.705
2 6 7.882
2 7 9.931
2 8 13.310
2 9 14.660
2 10 16.571
2 11 19.974
2 12 20.988
2 13 24.611
2 14 26.005
2 15 28.222
2 16 27.436
2 17 26.450
2 18 23.863
2 19 22.267
2 20 18.602
2 21 15.926
2 22 13.079
2 23 10.991
2 24 13.023
2 25 11.272
2 26 14.827
2 27 16.369
2 28 13.696
2 29 14.802
2 30 14.428
2 31 17.305
2 32 19.214
2 33 19.670
2 34 22.010
2 35 20.704
2 36 23.494
2 37 23.285
2 38 22.198
2 39 19.454
2 40 16.542
2 41 13.059
2 42 9.817
2 43 9.717
2 44 13.047
2 45 14.062
2 46 17.571
2 47 17.545
2 48 16.496
2 49 16.013
2 50 13.461
2 51 11.599
2 52 12.013
2 53 11.355
2 54 7.839
2 55 6.843
2 56 9.812
2 57 12.211
2 58 15.658
3 4 3.765
3 5 5.298
3 6 5.675
3 7 7.436
3 8 11.137
3 9 13.211
3 10 14.962
3 11 18.642
3 12 19.458
3 13 22.991
3 14 24.871
3 15 27.175
3 16 26.862
3 17 25.995
3 18 23.864
3 19 22.429
3 20 18.984
3 21 16.478
3 22 13.154
3 23 11.180
3 24 12.363
3 25 10.420
3 26 13.564
3 27 15.868
3 28 13.932
3 29 15.417
3 30 15.143
3 31 17.475
3 32 19.250
3 33 19.132
3 34 21.369
3 35 20.109
3 36 22.662
3 37 22.879
3 38 21.582
3 39 18.587
3 40 15.998
3 41 12.239
3 42 9.840
3 43 9.607
3 44 13.335
3 45 15.232
3 46 18.951
3 47 19.301
3 48 18.221
3 49 18.412
3 50 15.954
3 51 13.507
3 52 14.244
3 53 14.343
3 54 10.830
3 55 9.041
3 56 11.810
3 57 14.209
3 58 17.229
4 5 3.820
4 6 5.318
4 7 5.362
4 8 8.786
4 9 10.091
4 10 11.574
4 11 15.140
4 12 15.971
4 13 19.569
4 14 21.246
4 15 23.537
4 16 23.146
4 17 22.377
4 18 20.242
4 19 19.031
4 20 15.631
4 21 13.511
4 22 10.308
4 23 9.277
4 24 10.980
4 25 10.254
4 26 13.570
4 27 15.521
4 28 13.857
4 29 14.475
4 30 13.337
4 31 15.229
4 32 16.434
4 33 16.013
4 34 17.953
4 35 16.460
4 36 18.973
4 37 19.118
4 38 17.898
4 39 15.038
4 40 12.292
4 41 8.585
4 42 6.257
4 43 6.086
4 44 9.793
4 45 12.070
4 46 15.828
4 47 16.653
4 48 16.036
4 49 16.682
4 50 13.996
4 51 11.475
4 52 13.125
4 53 13.565
4 54 9.951
4 55 8.588
4 56 11.858
4 57 14.982
4 58 17.912
5 6 3.786
5 7 5.549
5 8 8.148
5 9 9.417
5 10 12.003
5 11 15.345
5 12 17.034
5 13 20.795
5 14 22.183
5 15 23.943
5 16 23.209
5 17 21.821
5 18 19.548
5 19 17.663
5 20 14.226
5 21 11.392
5 22 8.020
5 23 5.923
5 24 7.697
5 25 6.909
5 26 10.530
5 27 12.000
5 28 10.100
5 29 10.716
5 30 9.947
5 31 12.216
5 32 14.072
5 33 14.342
5 34 16.948
5 35 16.188
5 36 19.251
5 37 19.710
5 38 19.200
5 39 16.885
5 40 13.668
5 41 10.418
5 42 7.851
5 43 5.721
5 44 9.267
5 45 10.954
5 46 14.616
5 47 14.695
5 48 13.357
5 49 14.154
5 50 12.112
5 51 9.007
5 52 10.078
5 53 11.232
5 54 8.148
5 55 5.688
5 56 8.503
5 57 11.702
5 58 14.362
6 7 3.831
6 8 6.572
6 9 9.275
6 10 11.961
6 11 15.547
6 12 17.196
6 13 20.827
6 14 22.748
6 15 24.452
6 16 24.212
6 17 22.799
6 18 21.078
6 19 19.207
6 20 16.153
6 21 13.423
6 22 9.721
6 23 7.486
6 24 7.340
6 25 5.698
6 26 8.505
6 27 11.057
6 28 10.121
6 29 11.598
6 30 11.538
6 31 13.190
6 32 15.162
6 33 14.966
6 34 17.633
6 35 17.217
6 36 20.067
6 37 21.104
6 38 20.504
6 39 18.040
6 40 15.210
6 41 11.855
6 42 10.280
6 43 8.110
6 44 11.650
6 45 13.922
6 46 17.544
6 47 17.700
6 48 16.106
6 49 17.323
6 50 15.639
6 51 12.259
6 52 13.014
6 53 14.602
6 54 11.760
6 55 8.788
6 56 10.724
6 57 13.305
6 58 15.397
7 8 3.788
7 9 6.501
7 10 8.500
7 11 12.241
7 12 13.561
7 13 17.118
7 14 19.243
7 15 21.116
7 16 21.202
7 17 20.145
7 18 18.765
7 19 17.426
7 20 14.643
7 21 12.683
7 22 9.085
7 23 8.406
7 24 8.404
7 25 8.354
7 26 10.628
7 27 13.077
7 28 12.781
7 29 13.456
7 30 12.474
7 31 13.236
7 32 14.271
7 33 13.178
7 34 15.234
7 35 14.441
7 36 16.903
7 37 18.077
7 38 17.268
7 39 14.740
7 40 12.239
7 41 9.000
7 42 8.569
7 43 6.762
7 44 10.022
7 45 13.081
7 46 16.670
7 47 17.588
7 48 16.646
7 49 18.349
7 50 16.355
7 51 13.074
7 52 14.771
7 53 16.437
7 54 13.415
7 55 11.103
7 56 13.508
7 57 16.519
7 58 18.574
8 9 3.829
8 10 6.391
8 11 9.730
8 12 11.714
8 13 15.183
8 14 17.246
8 15 18.615
8 16 18.817
8 17 17.467
8 18 16.564
8 19 15.151
8 20 12.958
8 21 11.376
8 22 8.076
8 23 8.330
8 24 7.464
8 25 8.810
8 26 10.170
8 27 12.231
8 28 12.876
8 29 12.969
8 30 11.707
8 31 11.373
8 32 11.921
8 33 10.304
8 34 12.345
8 35 12.170
8 36 14.656
8 37 16.484
8 38 16.239
8 39 14.329
8 40 11.939
8 41 9.623
8 42 10.141
8 43 7.644
8 44 9.770
8 45 12.967
8 46 16.144
8 47 17.164
8 48 16.234
8 49 18.629
8 50 17.098
8 51 13.644
8 52 15.542
8 53 17.852
8 54 15.331
8 55 12.977
8 56 14.899
8 57 17.929
8 58 19.411
9 10 3.796
9 11 6.560
9 12 9.191
9 13 12.815
9 14 14.244
9 15 15.405
9 16 15.226
9 17 13.799
9 18 12.745
9 19 11.505
9 20 9.462
9 21 8.624
9 22 6.111
9 23 8.130
9 24 8.288
9 25 10.836
9 26 12.463
9 27 13.537
9 28 14.017
9 29 13.030
9 30 10.756
9 31 9.762
9 32 9.230
9 33 7.106
9 34 8.767
9 35 8.406
9 36 11.173
9 37 12.963
9 38 13.196
9 39 11.907
9 40 9.274
9 41 7.975
9 42 9.009
9 43 6.426
9 44 7.058
9 45 10.329
9 46 13.110
9 47 14.574
9 48 14.241
9 49 16.984
9 50 15.450
9 51 12.315
9 52 14.872
9 53 17.243
9 54 15.016
9 55 13.356
9 56 15.411
9 57 18.781
9 58 20.275
10 11 3.828
10 12 5.496
10 13 9.172
10 14 10.923
10 15 12.620
10 16 12.993
10 17 12.458
10 18 11.905
10 19 11.855
10 20 10.354
10 21 10.795
10 22 9.077
10 23 11.601
10 24 12.078
10 25 14.440
10 26 16.059
10 27 17.305
10 28 17.774
10 29 16.703
10 30 14.136
10 31 12.928
10 32 11.515
10 33 8.580
10 34 8.535
10 35 6.945
10 36 8.611
10 37 10.409
10 38 10.097
10 39 8.733
10 40 6.955
10 41 6.570
10 42 9.147
10 43 8.045
10 44 7.942
10 45 11.516
10 46 13.970
10 47 16.239
10 48 16.684
10 49 19.362
10 50 17.428
10 51 14.794
10 52 17.795
10 53 19.775
10 54 17.311
10 55 16.195
10 56 18.651
10 57 22.165
10 58 23.882
11 12 3.829
11 13 6.896
11 14 7.832
11 15 8.935
11 16 9.358
11 17 8.968
11 18 9.167
11 19 9.914
11 20 9.600
11 21 11.270
11 22 10.756
11 23 13.943
11 24 14.437
11 25 17.339
11 26 18.680
11 27 19.414
11 28 20.142
11 29 18.516
11 30 15.604
11 31 13.656
11 32 11.238
11 33 7.743
11 34 6.051
11 35 4.259
11 36 5.107
11 37 7.753
11 38 8.381
11 39 8.497
11 40 7.487
11 41 8.904
11 42 11.750
11 43 10.782
11 44 9.363
11 45 12.346
11 46 13.858
11 47 16.459
11 48 17.387
11 49 20.381
11 50 18.730
11 51 16.466
11 52 19.658
11 53 21.770
11 54 19.716
11 55 18.921
11 56 21.179
11 57 24.715
11 58 26.195
12 13 3.798
12 14 5.872
12 15 8.412
12 16 9.787
12 17 10.835
12 18 11.309
12 19 12.940
12 20 12.564
12 21 14.413
12 22 13.731
12 23 16.751
12 24 17.446
12 25 19.934
12 26 21.458
12 27 22.598
12 28 23.145
12 29 21.769
12 30 18.861
12 31 17.257
12 32 14.923
12 33 11.528
12 34 9.487
12 35 6.828
12 36 5.623
12 37 7.368
12 38 6.441
12 39 6.083
12 40 6.825
12 41 8.668
12 42 12.254
12 43 12.427
12 44 11.299
12 45 14.440
12 46 16.010
12 47 19.024
12 48 20.323
12 49 23.002
12 50 20.893
12 51 18.915
12 52 22.282
12 53 23.962
12 54 21.538
12 55 21.023
12 56 23.674
12 57 27.301
12 58 29.102
13 14 3.879
13 15 6.705
13 16 9.300
13 17 11.362
13 18 12.772
13 19 15.105
13 20 15.435
13 21 17.730
13 22 17.366
13 23 20.477
13 24 21.031
13 25 23.571
13 26 24.880
13 27 26.028
13 28 26.780
13 29 25.340
13 30 22.400
13 31 20.523
13 32 17.874
13 33 14.332
13 34 11.618
13 35 9.221
13 36 6.442
13 37 8.090
13 38 7.079
13 39 7.719
13 40 9.752
13 41 12.037
13 42 15.744
13 43 16.183
13 44 14.857
13 45 17.760
13 46 18.893
13 47 22.081
13 48 23.627
13 49 26.366
13 50 24.297
13 51 22.512
13 52 25.936
13 53 27.588
13 54 25.210
13 55 24.803
13 56 27.451
13 57 31.078
13 58 32.820
14 15 3.805
14 16 5.952
14 17 8.843
14 18 10.375
14 19 13.401
14 20 14.307
14 21 17.253
14 22 17.705
14 23 21.206
14 24 22.143
14 25 25.041
14 26 26.510
14 27 27.150
14 28 27.737
14 29 25.808
14 30 22.545
14 31 20.510
14 32 17.400
14 33 14.104
14 34 10.779
14 35 8.226
14 36 4.525
14 37 5.440
14 38 5.667
14 39 8.155
14 40 10.012
14 41 13.178
14 42 16.588
14 43 17.017
14 44 14.871
14 45 17.120
14 46 17.488
14 47 20.842
14 48 22.838
14 49 25.579
14 50 23.675
14 51 22.349
14 52 25.908
14 53 27.494
14 54 25.489
14 55 25.493
14 56 28.046
14 57 31.725
14 58 33.414
15 16 3.763
15 17 6.550
15 18 9.298
15 19 12.391
...etc.
I would do it in this way
use strict;
use warnings;
sub get_ca_atom {
my #result = split;
return
$result[0] eq 'ATOM'
&& $result[2] eq 'CA'
&& #result > 8 ? [ #result[ 5 .. 8 ] ] : ();
}
my #atoms = map get_ca_atom, <>;
while (#atoms) {
my $a = shift #atoms;
for my $b (#atoms) {
my $dist
= sqrt( ( $$a[1] - $$b[1] )**2
+ ( $$a[2] - $$b[2] )**2
+ ( $$a[3] - $$b[3] )**2 );
printf "%s\t%s\t%0.3f\n", $$a[0], $$b[0], $dist;
}
}
But in reality I would like to separate handling with the atom internals from the main algorithm to be clear and communicate an intention of the code as good as possible. There is nothing worse than dealing your own code after few months when written without keeping this in mind.
use strict;
use warnings;
# handle CA ATOM record
use constant { ATOM_TAG => 0, ATOM_TYPE => 2 };
use constant ATOM_SPLICE => ( 5 .. 8 );
use constant { NAME => 0, X => 1, Y => 2, Z => 3 };
sub get_ca_atom {
my #result = split;
return
$result[ATOM_TAG] eq 'ATOM'
&& $result[ATOM_TYPE] eq 'CA'
&& #result > (ATOM_SPLICE)[-1] ? [ #result[ATOM_SPLICE] ] : ();
}
sub get_name { shift->[NAME] }
sub distance {
my ( $a, $b ) = #_;
sqrt( ( $$a[X] - $$b[X] )**2
+ ( $$a[Y] - $$b[Y] )**2
+ ( $$a[Z] - $$b[Z] )**2 );
}
# end of handle CA ATOM record
my #atoms = map get_ca_atom, <>;
while (#atoms) {
my $a = shift #atoms;
for my $b (#atoms) {
my $dist = distance( $a, $b );
printf "%s\t%s\t%0.3f\n", get_name($a), get_name($b), $dist;
}
}
Then you can play with the main algorithm as you wish. For example above code reads all file content into memory which should not be a problem in the real task most time. But if you wish to keep only CA ATOMS just change the line with map to the following.
my #atoms;
while (<>) {
my $atom = get_ca_atom;
push #atoms, $atom if $atom;
}
As you can see, intention of the code is slowly missing with more lines but it can be opposite sometimes. The main objection to code is communicating intention even it would mean more lines of code. Especially you should not mix the low level with the high level which was the reason why I separated distance and get_name in the second code example.
If you prefer to process atoms in order as they go, you can use following code, but notice you will not save memory because you need #atoms stored anyway for calculating distance.
my #atoms;
while (<>) {
my $atom = get_ca_atom;
next unless $atom;
for my $a (#atoms) {
my $dist = distance( $atom, $a );
printf "%s\t%s\t%0.3f\n", get_name($a), get_name($atom), $dist;
}
push #atoms, $atom;
}
Note output comes in a different order. And also note I used next unless $atom; instead of using if ($atom) { and enclosing rest of the loop in block. The reason is I want to emphasize: Just skip it if it's not what you expect. If you would like to surprise yourself with the third different order of output you can replace push with unshift.
I am trying to store vectors. When I run the program in the loop I see all the values, but when referred outside the loop only the last vector is evaluated and stored (the one that ends with prime number 953, see below). Any calculations done with the PVX vector are done only with the last entry. I want PVX to do calculations with all the results not just the last entry. How can I store these results to do calculations with?
This is the code:
PV=[2 3 5 7 11 13 17 19 23 29];
for numba=2:n
if mod(numba,PV)~=0;
xp=numba;
PVX=[2 3 5 7 11 13 17 19 23 29 xp]
end
end
The first few results looks like this:
PVX: Prime Vectors (Result)
PVX =
2 3 5 7 11 13 17 19 23 29 31
PVX =
2 3 5 7 11 13 17 19 23 29 37
PVX =
2 3 5 7 11 13 17 19 23 29 41
PVX =
2 3 5 7 11 13 17 19 23 29 43
PVX = ...........................................................
PVX =
2 3 5 7 11 13 17 19 23 29 953
If you want to store all PVX values, use a different row for each:
PV = [2 3 5 7 11 13 17 19 23 29];
PVX = [];
for numba=2:n
if mod(numba,PV)~=0;
xp = numba;
PVX = [PVX; 2 3 5 7 11 13 17 19 23 29 xp];
end
end
Of course if would be better to initiallize the PVX matrix to the appropriate size, but the number of rows is hard to predict.
Alternatively, build the PVX without loops:
xp = setdiff(primes(n), primes(29)).'; %'// all primes > 29 and <= n
PVX = [ repmat([2 3 5 7 11 13 17 19 23 29], numel(xp), 1) xp ];
As an example, for n=100, either of the above approaches gives
PVX =
2 3 5 7 11 13 17 19 23 29 31
2 3 5 7 11 13 17 19 23 29 37
2 3 5 7 11 13 17 19 23 29 41
2 3 5 7 11 13 17 19 23 29 43
2 3 5 7 11 13 17 19 23 29 47
2 3 5 7 11 13 17 19 23 29 53
2 3 5 7 11 13 17 19 23 29 59
2 3 5 7 11 13 17 19 23 29 61
2 3 5 7 11 13 17 19 23 29 67
2 3 5 7 11 13 17 19 23 29 71
2 3 5 7 11 13 17 19 23 29 73
2 3 5 7 11 13 17 19 23 29 79
2 3 5 7 11 13 17 19 23 29 83
2 3 5 7 11 13 17 19 23 29 89
2 3 5 7 11 13 17 19 23 29 97
I'm assuming you were going for this:
PVX=[2 3 5 7 11 13 17 19 23 29];
for numba=2:n
if mod(numba,PVX)~=0;
xp=numba;
PVX(end+1) = xp;
%// Or alternatively PVX = [PVX, xp];
end
end
but if you could get an estimate of how large PVX will be in the end, you should pre-allocate the array first for a significant speed up.
So, looks like you need all prime till n
As Dan said use this :
PVX=[2 3 5 7 11 13 17 19 23 29 ];
for numba=2:n
if mod(numba,PVX)~=0
xp=numba;
PVX=[ PVX xp];
end
end
Or why not simply use primes function ?
PVX = primes( n ) ;
A simple MATLAB-problem:
coordinates=[1 6 ;9 20];
coordinates =
1 6
9 20
What i now want to have is:
idxList=[1 2 3 4 5 6 9 10 11 12 13 14 15 16 17 18 19 20];
idxList =
1 2 3 4 5 6 9 10 11 12 13 14 15 16 17 18 19 20
How i have to make that?
Here's one way:
>> cell2mat(cellfun(#(x) x(1):x(2), num2cell(coordinates, 2), 'UniformOutput', 0)')
ans =
1 2 3 4 5 6 9 10 11 12 13 14 15 16 17 18 19 20