I have data like this:
Date ColumnName1 ColumnName2 ColumnName3 ColumnName4 ColumnName5
2018-04-01 1 2 3 4 5
2018-04-02 6 7 8 9 10
2018-04-03 11 12 13 14 15
2018-04-04 16 17 18 19 20
2018-04-05 21 22 23 24 25
and I want data like the following:
2018-04-01 2018-04-02 2018-04-03 2018-04-04 2018-04-05
1 6 11 16 21
2 7 12 17 22
3 8 13 18 23
4 9 14 19 24
5 10 15 20 25
How can I do this?
I have two matrices, A and B, of different sizes. In both, the 1st, 2nd, 3rd, and 4th columns hold the year, month, day, and a value. I need to extract the rows of A that have the same year and month as a row of B and a day within +/-6 days of it, together with the related rows from matrix B. If two or more days are close in A and B, I should choose the rows with the highest value from both matrices.
A = 1954 1 16 2,3042
1954 12 5 2,116
1954 12 21 1,9841
1954 12 22 2,7411
1955 1 13 1,8766
1955 10 16 1,4003
1955 12 29 1,4979
1956 1 19 2,1439
1956 1 21 1,7666
1956 11 26 1,7367
1956 11 27 1,8914
1957 1 27 1,151
1957 2 2 1,1484
1957 12 29 1,1906
1957 12 30 1,3157
1958 1 10 1,6186
1958 1 20 1,1637
1958 2 6 1,1639
1958 10 16 1,1444
1959 1 3 1,7784
1959 1 24 1,1871
1959 2 20 1,2264
1959 10 25 1,2194
1960 6 29 1,2327
1960 12 4 1,7213
1960 12 5 1,373
1961 3 21 1,7149
1961 3 27 1,4404
1961 11 3 1,3934
1961 12 5 1,777
1962 2 12 2,1813
1962 2 16 3,5776
1962 2 17 1,9236
1963 9 27 1,6164
1963 10 13 1,786
1963 10 14 1,9203
1963 11 22 1,7575
1964 2 2 1,4402
1964 11 15 1,437
1964 11 17 1,7588
1964 12 4 1,6358
1965 2 13 1,874
1965 11 2 2,6468
1965 11 26 1,7163
1965 12 11 1,8283
1966 12 1 2,1165
1966 12 19 1,6672
1966 12 24 1,8173
1966 12 25 1,4923
1967 2 23 2,3002
1967 3 1 1,9614
1967 3 18 1,673
1967 11 12 1,724
1968 1 4 1,6355
1968 1 15 1,6567
1968 3 6 1,1587
1968 3 18 1,212
1969 9 29 1,5613
1969 10 1 1,5016
1969 11 20 1,9304
1969 11 29 1,9279
1970 10 3 1,9859
1970 10 28 1,4065
1970 11 4 1,4227
1970 11 9 1,7901
B = 1954 12 28 774
1954 12 29 734
1955 3 26 712
1955 3 27 648
1956 7 18 1030
1956 7 23 1090
1957 2 17 549
1957 2 28 549
1958 2 27 759
1958 2 28 798
1959 1 10 421
1959 1 24 419
1960 12 5 762
1960 12 8 829
1961 2 12 788
1961 2 13 776
1962 2 15 628
1962 4 9 628
1963 3 12 552
1963 3 13 552
1964 2 12 260
1964 2 13 253
1965 12 22 862
1965 12 23 891
1966 1 5 828
1966 12 27 802
1967 1 1 777
1967 1 2 787
1968 1 17 981
1968 1 18 932
1969 3 15 511
1969 3 16 546
1970 2 25 1030
1970 2 26 1030
The expected output is a new matrix C:
C = 1954 12 22 2,7411 1954 12 28 774
1959 1 3 1,7784 1959 1 10 421
1959 1 24 1,1871 1959 1 24 419
1960 12 4 1,7213 1960 12 8 829
1962 2 12 2,1813 1962 2 15 628
1966 12 24 1,8173 1966 12 27 802
1968 1 15 1,6567 1968 1 17 981
Any help on how to code this?
I think the following should do what you want.
To deal with overlaps at year and month boundaries, it's useful to map the dates to a number of days since an epoch. The first function finds the earliest date in either dataset and formats it so that the 'daysact' function can interpret it.
function epoch_date_str = get_epoch_datestr(A,B)
    % Stack the [year month day] columns of both datasets. Sorting the rows
    % orders them by year, then month, then day, so the first row is the
    % earliest date.
    dates = sortrows([A(:,1:3); B(:,1:3)]);
    epoch_str = int2str(dates(1,:));
    % Replace the runs of spaces with '/' to get a year/month/day string
    epoch_date_str = regexprep(epoch_str,'\s+','/');
end
The next function calculates the number of days from the epoch to each date in the dataset. It's basically just wrangling the data into a format accepted by the daysact function.
function ndays = days_since_epoch(A, epoch_date_str)
    ndays = zeros(size(A,1),1);
    Astr = int2str(A(:,1:3));
    for i=1:size(Astr,1)
        ndays(i) = daysact(epoch_date_str, regexprep(Astr(i,:),'\s+','/'));
    end
end
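The same days-since-epoch idea can be sketched in Python for illustration. The language and the helper name `days_since_epoch` are assumptions here, not part of the MATLAB answer. The sketch relies on `datetime.date.toordinal`, which counts days in the proleptic Gregorian calendar, so month and year boundaries need no special handling.

```python
from datetime import date

def days_since_epoch(rows, epoch):
    """Return the day offset of each (year, month, day, ...) row from epoch."""
    base = epoch.toordinal()
    return [date(r[0], r[1], r[2]).toordinal() - base for r in rows]

# A few rows taken from the question's B matrix.
rows = [(1954, 12, 28, 774), (1954, 12, 29, 734), (1955, 3, 26, 712)]

# Use the earliest date in the data as the epoch.
epoch = min(date(y, m, d) for y, m, d, *_ in rows)
offsets = days_since_epoch(rows, epoch)
print(offsets)
```

Once every date is a plain integer, a window test such as "within 6 days" becomes a simple comparison of offsets.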
And now we can get on with the actual calculations. I was a bit confused by the fifth column in the 'A' matrix you presented; I assume it is the score, but if not, change the A_MATRIX_SCORE_COL variable. Similarly, the 6-day window is set by WINDOW_SIZE.
ep_str = get_epoch_datestr(A,B);
ndaysA = days_since_epoch(A, ep_str);
ndaysB = days_since_epoch(B, ep_str);
C = [];
WINDOW_SIZE = 6;
A_MATRIX_SCORE_COL = 5;
for i=1:size(B,1)
    % Find dates within the date window
    overlaps = find(ndaysA >= (ndaysB(i) - WINDOW_SIZE) & ndaysA <= (ndaysB(i) + WINDOW_SIZE));
    % If there are multiple matches, choose the highest and append to C
    if ~isempty(overlaps)
        [~, max_idx] = max(A(overlaps,A_MATRIX_SCORE_COL));
        match_row = overlaps(max_idx);
        C = [C; A(match_row,:) B(i,:)];
    end
end
C = unique(C,'rows');
The output I get differs from yours:
C =
1954 12 22 2 7411 1954 12 28 774
1959 1 24 1 1871 1959 1 24 419
1960 12 4 1 7213 1960 12 5 762
1960 12 4 1 7213 1960 12 8 829
1962 2 16 3 5776 1962 2 15 628
1966 12 24 1 8173 1966 12 27 802
1968 1 15 1 6567 1968 1 17 981
1968 1 15 1 6567 1968 1 18 932
But your second row has a difference of 7 days, so I wouldn't expect it to be found. It can be included by increasing WINDOW_SIZE to 7.
As you can see, it's possible for a row in A to be included twice in C if it matches more than one date in B. This could be easily filtered from C if you want:
D = [];
for i = 1:size(C,1)
    % Find matching dates from A. Due to the way C was built, there won't be duplicates from B.
    dupes = find(C(:,1) == C(i,1) & C(:,2) == C(i,2) & C(:,3) == C(i,3));
    % If there's only one match (i.e. it matches itself), then add to D
    if (length(dupes) == 1)
        D = [D; C(i,:)];
    else
        % If there are duplicates, then compare the scores from B and only add the highest score to D.
        best = true;
        for j=1:length(dupes)
            if C(i,end) < C(dupes(j),end)
                best = false;
            end
        end
        if best
            D = [D; C(i,:)];
        end
    end
end
The matrix 'D' is then your de-duplicated output.
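For comparison, the window matching and the de-duplication can be sketched together in Python. This is only an illustration under stated assumptions: the language, the function names `match_within_window` and `keep_best_per_a`, and the reading of the question's values as decimal numbers in the fourth field (decimal comma treated as a decimal point) are not part of the original answer. On ties, this sketch keeps the first pair seen.

```python
from datetime import date

WINDOW_SIZE = 6  # days either side, as in the question

def match_within_window(a_rows, b_rows, window=WINDOW_SIZE):
    """Pair each B row with the highest-valued A row within the window."""
    pairs = []
    for b in b_rows:
        b_day = date(*b[:3]).toordinal()
        near = [a for a in a_rows
                if abs(date(*a[:3]).toordinal() - b_day) <= window]
        if near:
            # Assumption: the value to maximise is the 4th field of A.
            pairs.append((max(near, key=lambda a: a[3]), b))
    return pairs

def keep_best_per_a(pairs):
    """If an A row matched several B rows, keep the pair with the largest B value."""
    best = {}
    for a, b in pairs:
        if a not in best or b[3] > best[a][3]:
            best[a] = b
    return list(best.items())

# A small subset of the question's data.
A = [(1960, 12, 4, 1.7213), (1960, 12, 5, 1.373), (1962, 2, 12, 2.1813),
     (1962, 2, 16, 3.5776), (1962, 2, 17, 1.9236)]
B = [(1960, 12, 5, 762), (1960, 12, 8, 829),
     (1962, 2, 15, 628), (1962, 4, 9, 628)]

pairs = match_within_window(A, B)
result = keep_best_per_a(pairs)
for a, b in result:
    print(a, b)
```

Converting each date to an ordinal day number plays the same role as the days-since-epoch mapping above: the window test is then a plain integer comparison.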
I've found many posts on calculating the distance between atoms, and I've even written my own code for it. Now I want to do it in far fewer lines, so I've written something like the following:
#!/usr/bin/perl -w
@ARGV = <>;
for ( $i = 0; $i <= $#ARGV; $i++ ) {
    @temp = split( /\s+/, $ARGV[$i] );
    if ( $temp[0] eq "ATOM" and $temp[2] eq "CA" ) {
        ( $n1, $ax, $ay, $az ) = @temp[ 5, 6, 7, 8 ];
        if ( $temp[0] eq "ATOM" and $temp[2] eq "CA" ) {
            ( $n2, $bx, $by, $bz ) = @temp[ 5, 6, 7, 8 ];
        }
        $dista = sprintf( "%0.3f",
            sqrt( ( $ax - $bx )**2 + ( $ay - $by )**2 + ( $az - $bz )**2 ) );
        print "$n1\t$n2\t$dista\n";
    }
}
A sample input file is http://www.rcsb.org/pdb/files/5PTI.pdb. When I run the program, it does not move on to the next "CA" atom to calculate the distance. I want to calculate the distance from the first "CA" atom to all other "CA" atoms, from the second "CA" atom to all others, and so on. I know a for loop is missing from my code; I tried to add one, but something went wrong. How should I modify my code to get the correct results?
In order to compare all distinct pairs of atoms you need to access the file data out of order, so it is best to read all of the relevant data into memory before doing the calculations.
This program uses a while loop to read all of the CA ATOM numbers and positions into @data, and then calculates the distance between every distinct pair using a doubly nested for loop:
use strict;
use warnings;
my @data;
while ( <> ) {
    my @fields = split;
    next unless $fields[0] eq 'ATOM' and $fields[2] eq 'CA';
    push @data, [ @fields[5..8] ];
}
for my $i (0 .. $#data-1) {
    my ($an, $ax, $ay, $az) = @{ $data[$i] };
    for my $j ($i+1 .. $#data) {
        my ($bn, $bx, $by, $bz) = @{ $data[$j] };
        my ($dx, $dy, $dz) = ($ax-$bx, $ay-$by, $az-$bz);
        my $dist = sqrt($dx*$dx + $dy*$dy + $dz*$dz);
        printf "%3d %3d %6.3f\n", $an, $bn, $dist;
    }
}
output
1 2 3.772
1 3 6.357
1 4 8.230
1 5 6.883
1 6 8.835
1 7 11.836
1 8 14.819
1 9 16.272
1 10 18.781
1 11 22.069
1 12 23.577
1 13 27.304
1 14 28.550
1 15 30.435
1 16 29.411
1 17 27.994
1 18 25.290
1 19 23.197
1 20 19.549
1 21 16.377
1 22 13.683
1 23 10.803
1 24 12.584
1 25 10.300
1 26 13.713
1 27 14.787
1 28 11.633
1 29 13.140
1 30 13.576
1 31 16.899
1 32 19.413
1 33 20.494
1 34 23.292
1 35 22.504
1 36 25.633
1 37 25.598
1 38 24.936
1 39 22.477
1 40 19.299
1 41 16.005
1 42 12.638
1 43 11.659
1 44 14.746
1 45 15.146
1 46 18.399
1 47 17.588
1 48 15.860
1 49 15.021
1 50 13.194
1 51 11.300
1 52 10.500
1 53 9.874
1 54 7.246
1 55 5.502
1 56 7.255
1 57 8.869
1 58 12.364
2 3 3.712
2 4 5.264
2 5 5.705
2 6 7.882
2 7 9.931
2 8 13.310
2 9 14.660
2 10 16.571
2 11 19.974
2 12 20.988
2 13 24.611
2 14 26.005
2 15 28.222
2 16 27.436
2 17 26.450
2 18 23.863
2 19 22.267
2 20 18.602
2 21 15.926
2 22 13.079
2 23 10.991
2 24 13.023
2 25 11.272
2 26 14.827
2 27 16.369
2 28 13.696
2 29 14.802
2 30 14.428
2 31 17.305
2 32 19.214
2 33 19.670
2 34 22.010
2 35 20.704
2 36 23.494
2 37 23.285
2 38 22.198
2 39 19.454
2 40 16.542
2 41 13.059
2 42 9.817
2 43 9.717
2 44 13.047
2 45 14.062
2 46 17.571
2 47 17.545
2 48 16.496
2 49 16.013
2 50 13.461
2 51 11.599
2 52 12.013
2 53 11.355
2 54 7.839
2 55 6.843
2 56 9.812
2 57 12.211
2 58 15.658
3 4 3.765
3 5 5.298
3 6 5.675
3 7 7.436
3 8 11.137
3 9 13.211
3 10 14.962
3 11 18.642
3 12 19.458
3 13 22.991
3 14 24.871
3 15 27.175
3 16 26.862
3 17 25.995
3 18 23.864
3 19 22.429
3 20 18.984
3 21 16.478
3 22 13.154
3 23 11.180
3 24 12.363
3 25 10.420
3 26 13.564
3 27 15.868
3 28 13.932
3 29 15.417
3 30 15.143
3 31 17.475
3 32 19.250
3 33 19.132
3 34 21.369
3 35 20.109
3 36 22.662
3 37 22.879
3 38 21.582
3 39 18.587
3 40 15.998
3 41 12.239
3 42 9.840
3 43 9.607
3 44 13.335
3 45 15.232
3 46 18.951
3 47 19.301
3 48 18.221
3 49 18.412
3 50 15.954
3 51 13.507
3 52 14.244
3 53 14.343
3 54 10.830
3 55 9.041
3 56 11.810
3 57 14.209
3 58 17.229
4 5 3.820
4 6 5.318
4 7 5.362
4 8 8.786
4 9 10.091
4 10 11.574
4 11 15.140
4 12 15.971
4 13 19.569
4 14 21.246
4 15 23.537
4 16 23.146
4 17 22.377
4 18 20.242
4 19 19.031
4 20 15.631
4 21 13.511
4 22 10.308
4 23 9.277
4 24 10.980
4 25 10.254
4 26 13.570
4 27 15.521
4 28 13.857
4 29 14.475
4 30 13.337
4 31 15.229
4 32 16.434
4 33 16.013
4 34 17.953
4 35 16.460
4 36 18.973
4 37 19.118
4 38 17.898
4 39 15.038
4 40 12.292
4 41 8.585
4 42 6.257
4 43 6.086
4 44 9.793
4 45 12.070
4 46 15.828
4 47 16.653
4 48 16.036
4 49 16.682
4 50 13.996
4 51 11.475
4 52 13.125
4 53 13.565
4 54 9.951
4 55 8.588
4 56 11.858
4 57 14.982
4 58 17.912
5 6 3.786
5 7 5.549
5 8 8.148
5 9 9.417
5 10 12.003
5 11 15.345
5 12 17.034
5 13 20.795
5 14 22.183
5 15 23.943
5 16 23.209
5 17 21.821
5 18 19.548
5 19 17.663
5 20 14.226
5 21 11.392
5 22 8.020
5 23 5.923
5 24 7.697
5 25 6.909
5 26 10.530
5 27 12.000
5 28 10.100
5 29 10.716
5 30 9.947
5 31 12.216
5 32 14.072
5 33 14.342
5 34 16.948
5 35 16.188
5 36 19.251
5 37 19.710
5 38 19.200
5 39 16.885
5 40 13.668
5 41 10.418
5 42 7.851
5 43 5.721
5 44 9.267
5 45 10.954
5 46 14.616
5 47 14.695
5 48 13.357
5 49 14.154
5 50 12.112
5 51 9.007
5 52 10.078
5 53 11.232
5 54 8.148
5 55 5.688
5 56 8.503
5 57 11.702
5 58 14.362
6 7 3.831
6 8 6.572
6 9 9.275
6 10 11.961
6 11 15.547
6 12 17.196
6 13 20.827
6 14 22.748
6 15 24.452
6 16 24.212
6 17 22.799
6 18 21.078
6 19 19.207
6 20 16.153
6 21 13.423
6 22 9.721
6 23 7.486
6 24 7.340
6 25 5.698
6 26 8.505
6 27 11.057
6 28 10.121
6 29 11.598
6 30 11.538
6 31 13.190
6 32 15.162
6 33 14.966
6 34 17.633
6 35 17.217
6 36 20.067
6 37 21.104
6 38 20.504
6 39 18.040
6 40 15.210
6 41 11.855
6 42 10.280
6 43 8.110
6 44 11.650
6 45 13.922
6 46 17.544
6 47 17.700
6 48 16.106
6 49 17.323
6 50 15.639
6 51 12.259
6 52 13.014
6 53 14.602
6 54 11.760
6 55 8.788
6 56 10.724
6 57 13.305
6 58 15.397
7 8 3.788
7 9 6.501
7 10 8.500
7 11 12.241
7 12 13.561
7 13 17.118
7 14 19.243
7 15 21.116
7 16 21.202
7 17 20.145
7 18 18.765
7 19 17.426
7 20 14.643
7 21 12.683
7 22 9.085
7 23 8.406
7 24 8.404
7 25 8.354
7 26 10.628
7 27 13.077
7 28 12.781
7 29 13.456
7 30 12.474
7 31 13.236
7 32 14.271
7 33 13.178
7 34 15.234
7 35 14.441
7 36 16.903
7 37 18.077
7 38 17.268
7 39 14.740
7 40 12.239
7 41 9.000
7 42 8.569
7 43 6.762
7 44 10.022
7 45 13.081
7 46 16.670
7 47 17.588
7 48 16.646
7 49 18.349
7 50 16.355
7 51 13.074
7 52 14.771
7 53 16.437
7 54 13.415
7 55 11.103
7 56 13.508
7 57 16.519
7 58 18.574
8 9 3.829
8 10 6.391
8 11 9.730
8 12 11.714
8 13 15.183
8 14 17.246
8 15 18.615
8 16 18.817
8 17 17.467
8 18 16.564
8 19 15.151
8 20 12.958
8 21 11.376
8 22 8.076
8 23 8.330
8 24 7.464
8 25 8.810
8 26 10.170
8 27 12.231
8 28 12.876
8 29 12.969
8 30 11.707
8 31 11.373
8 32 11.921
8 33 10.304
8 34 12.345
8 35 12.170
8 36 14.656
8 37 16.484
8 38 16.239
8 39 14.329
8 40 11.939
8 41 9.623
8 42 10.141
8 43 7.644
8 44 9.770
8 45 12.967
8 46 16.144
8 47 17.164
8 48 16.234
8 49 18.629
8 50 17.098
8 51 13.644
8 52 15.542
8 53 17.852
8 54 15.331
8 55 12.977
8 56 14.899
8 57 17.929
8 58 19.411
9 10 3.796
9 11 6.560
9 12 9.191
9 13 12.815
9 14 14.244
9 15 15.405
9 16 15.226
9 17 13.799
9 18 12.745
9 19 11.505
9 20 9.462
9 21 8.624
9 22 6.111
9 23 8.130
9 24 8.288
9 25 10.836
9 26 12.463
9 27 13.537
9 28 14.017
9 29 13.030
9 30 10.756
9 31 9.762
9 32 9.230
9 33 7.106
9 34 8.767
9 35 8.406
9 36 11.173
9 37 12.963
9 38 13.196
9 39 11.907
9 40 9.274
9 41 7.975
9 42 9.009
9 43 6.426
9 44 7.058
9 45 10.329
9 46 13.110
9 47 14.574
9 48 14.241
9 49 16.984
9 50 15.450
9 51 12.315
9 52 14.872
9 53 17.243
9 54 15.016
9 55 13.356
9 56 15.411
9 57 18.781
9 58 20.275
10 11 3.828
10 12 5.496
10 13 9.172
10 14 10.923
10 15 12.620
10 16 12.993
10 17 12.458
10 18 11.905
10 19 11.855
10 20 10.354
10 21 10.795
10 22 9.077
10 23 11.601
10 24 12.078
10 25 14.440
10 26 16.059
10 27 17.305
10 28 17.774
10 29 16.703
10 30 14.136
10 31 12.928
10 32 11.515
10 33 8.580
10 34 8.535
10 35 6.945
10 36 8.611
10 37 10.409
10 38 10.097
10 39 8.733
10 40 6.955
10 41 6.570
10 42 9.147
10 43 8.045
10 44 7.942
10 45 11.516
10 46 13.970
10 47 16.239
10 48 16.684
10 49 19.362
10 50 17.428
10 51 14.794
10 52 17.795
10 53 19.775
10 54 17.311
10 55 16.195
10 56 18.651
10 57 22.165
10 58 23.882
11 12 3.829
11 13 6.896
11 14 7.832
11 15 8.935
11 16 9.358
11 17 8.968
11 18 9.167
11 19 9.914
11 20 9.600
11 21 11.270
11 22 10.756
11 23 13.943
11 24 14.437
11 25 17.339
11 26 18.680
11 27 19.414
11 28 20.142
11 29 18.516
11 30 15.604
11 31 13.656
11 32 11.238
11 33 7.743
11 34 6.051
11 35 4.259
11 36 5.107
11 37 7.753
11 38 8.381
11 39 8.497
11 40 7.487
11 41 8.904
11 42 11.750
11 43 10.782
11 44 9.363
11 45 12.346
11 46 13.858
11 47 16.459
11 48 17.387
11 49 20.381
11 50 18.730
11 51 16.466
11 52 19.658
11 53 21.770
11 54 19.716
11 55 18.921
11 56 21.179
11 57 24.715
11 58 26.195
12 13 3.798
12 14 5.872
12 15 8.412
12 16 9.787
12 17 10.835
12 18 11.309
12 19 12.940
12 20 12.564
12 21 14.413
12 22 13.731
12 23 16.751
12 24 17.446
12 25 19.934
12 26 21.458
12 27 22.598
12 28 23.145
12 29 21.769
12 30 18.861
12 31 17.257
12 32 14.923
12 33 11.528
12 34 9.487
12 35 6.828
12 36 5.623
12 37 7.368
12 38 6.441
12 39 6.083
12 40 6.825
12 41 8.668
12 42 12.254
12 43 12.427
12 44 11.299
12 45 14.440
12 46 16.010
12 47 19.024
12 48 20.323
12 49 23.002
12 50 20.893
12 51 18.915
12 52 22.282
12 53 23.962
12 54 21.538
12 55 21.023
12 56 23.674
12 57 27.301
12 58 29.102
13 14 3.879
13 15 6.705
13 16 9.300
13 17 11.362
13 18 12.772
13 19 15.105
13 20 15.435
13 21 17.730
13 22 17.366
13 23 20.477
13 24 21.031
13 25 23.571
13 26 24.880
13 27 26.028
13 28 26.780
13 29 25.340
13 30 22.400
13 31 20.523
13 32 17.874
13 33 14.332
13 34 11.618
13 35 9.221
13 36 6.442
13 37 8.090
13 38 7.079
13 39 7.719
13 40 9.752
13 41 12.037
13 42 15.744
13 43 16.183
13 44 14.857
13 45 17.760
13 46 18.893
13 47 22.081
13 48 23.627
13 49 26.366
13 50 24.297
13 51 22.512
13 52 25.936
13 53 27.588
13 54 25.210
13 55 24.803
13 56 27.451
13 57 31.078
13 58 32.820
14 15 3.805
14 16 5.952
14 17 8.843
14 18 10.375
14 19 13.401
14 20 14.307
14 21 17.253
14 22 17.705
14 23 21.206
14 24 22.143
14 25 25.041
14 26 26.510
14 27 27.150
14 28 27.737
14 29 25.808
14 30 22.545
14 31 20.510
14 32 17.400
14 33 14.104
14 34 10.779
14 35 8.226
14 36 4.525
14 37 5.440
14 38 5.667
14 39 8.155
14 40 10.012
14 41 13.178
14 42 16.588
14 43 17.017
14 44 14.871
14 45 17.120
14 46 17.488
14 47 20.842
14 48 22.838
14 49 25.579
14 50 23.675
14 51 22.349
14 52 25.908
14 53 27.494
14 54 25.489
14 55 25.493
14 56 28.046
14 57 31.725
14 58 33.414
15 16 3.763
15 17 6.550
15 18 9.298
15 19 12.391
...etc.
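The read-everything-then-pair approach can also be sketched in Python for illustration. The language, the function names, and the three sample records below are assumptions made up for this sketch; they are not taken from 5PTI.pdb. As in the Perl program, fields are split on whitespace, and fields 5 to 8 are taken as the residue number and the x, y, z coordinates.

```python
import math
from itertools import combinations

def ca_atoms(lines):
    """Collect (residue number, x, y, z) for every CA ATOM record."""
    atoms = []
    for line in lines:
        f = line.split()
        if len(f) > 8 and f[0] == "ATOM" and f[2] == "CA":
            atoms.append((f[5], float(f[6]), float(f[7]), float(f[8])))
    return atoms

def pairwise_distances(atoms):
    """Yield (name_a, name_b, distance) for each distinct pair, in file order."""
    for (na, *pa), (nb, *pb) in combinations(atoms, 2):
        yield na, nb, math.dist(pa, pb)

# Hypothetical records with easy coordinates (not real PDB data).
sample = [
    "ATOM      1  N   ARG A   1       9.000   9.000   9.000  1.00  0.00           N",
    "ATOM      2  CA  ARG A   1       0.000   0.000   0.000  1.00  0.00           C",
    "ATOM      9  CA  PRO A   2       3.000   4.000   0.000  1.00  0.00           C",
    "ATOM     16  CA  ASP A   3       3.000   4.000  12.000  1.00  0.00           C",
]

results = list(pairwise_distances(ca_atoms(sample)))
for na, nb, dist in results:
    print(f"{na:>3} {nb:>3} {dist:6.3f}")
```

`itertools.combinations` yields each unordered pair exactly once, which corresponds to the `$i+1 .. $#data` inner loop bound in the Perl version.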
I would do it this way:
use strict;
use warnings;
sub get_ca_atom {
    my @result = split;
    return
        $result[0] eq 'ATOM'
        && $result[2] eq 'CA'
        && @result > 8 ? [ @result[ 5 .. 8 ] ] : ();
}
my @atoms = map get_ca_atom, <>;
while (@atoms) {
    my $a = shift @atoms;
    for my $b (@atoms) {
        my $dist
            = sqrt( ( $$a[1] - $$b[1] )**2
                + ( $$a[2] - $$b[2] )**2
                + ( $$a[3] - $$b[3] )**2 );
        printf "%s\t%s\t%0.3f\n", $$a[0], $$b[0], $dist;
    }
}
But in real code I would separate the handling of the atom internals from the main algorithm, to keep things clear and to communicate the intention of the code as well as possible. There is nothing worse than returning to your own code after a few months when it was written without keeping this in mind.
use strict;
use warnings;
# handle CA ATOM record
use constant { ATOM_TAG => 0, ATOM_TYPE => 2 };
use constant ATOM_SPLICE => ( 5 .. 8 );
use constant { NAME => 0, X => 1, Y => 2, Z => 3 };
sub get_ca_atom {
    my @result = split;
    return
        $result[ATOM_TAG] eq 'ATOM'
        && $result[ATOM_TYPE] eq 'CA'
        && @result > (ATOM_SPLICE)[-1] ? [ @result[ATOM_SPLICE] ] : ();
}
sub get_name { shift->[NAME] }
sub distance {
    my ( $a, $b ) = @_;
    sqrt( ( $$a[X] - $$b[X] )**2
        + ( $$a[Y] - $$b[Y] )**2
        + ( $$a[Z] - $$b[Z] )**2 );
}
# end of handle CA ATOM record
my @atoms = map get_ca_atom, <>;
while (@atoms) {
    my $a = shift @atoms;
    for my $b (@atoms) {
        my $dist = distance( $a, $b );
        printf "%s\t%s\t%0.3f\n", get_name($a), get_name($b), $dist;
    }
}
Then you can play with the main algorithm as you wish. For example, the code above reads the whole file into memory, which is rarely a problem in practice. But if you want to keep only the CA atoms in memory, just replace the line with map with the following:
my @atoms;
while (<>) {
    my $atom = get_ca_atom;
    push @atoms, $atom if $atom;
}
As you can see, the intention of the code gradually gets lost as lines are added, although sometimes the opposite is true. The main objective of code is to communicate intention, even if that means more lines of code. In particular, you should not mix low-level and high-level code, which is why I separated distance and get_name in the second code example.
If you prefer to process atoms in the order they arrive, you can use the following code, but note that you will not save memory, because you still need @atoms stored to calculate the distances.
my @atoms;
while (<>) {
    my $atom = get_ca_atom;
    next unless $atom;
    for my $a (@atoms) {
        my $dist = distance( $atom, $a );
        printf "%s\t%s\t%0.3f\n", get_name($a), get_name($atom), $dist;
    }
    push @atoms, $atom;
}
Note that the output comes in a different order. Also note that I used next unless $atom; instead of if ($atom) { with the rest of the loop enclosed in a block. The reason is that I want to emphasize: just skip it if it's not what you expect. If you would like to surprise yourself with a third ordering of the output, you can replace push with unshift.
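To make the change in output order concrete, here is a small Python sketch of the process-as-they-arrive variant. The language, the name `stream_pairs`, and the four made-up atoms are assumptions for illustration only.

```python
import math

def stream_pairs(atoms):
    """Yield (earlier, later, distance) as each atom arrives from an iterable."""
    seen = []
    for atom in atoms:
        # Pair the new atom with every atom already stored.
        for prev in seen:
            yield prev[0], atom[0], math.dist(prev[1:], atom[1:])
        seen.append(atom)

# Hypothetical atoms as (name, x, y, z).
atoms = [("1", 0.0, 0.0, 0.0), ("2", 3.0, 4.0, 0.0),
         ("3", 3.0, 4.0, 12.0), ("4", 0.0, 0.0, 1.0)]

order = [(a, b) for a, b, _ in stream_pairs(iter(atoms))]
print(order)
```

Here the pairs are grouped by the later atom (1-2, then 1-3 and 2-3, then 1-4, 2-4 and 3-4), whereas the batch version groups them by the earlier atom. The list `seen` still grows to hold every atom, which is the memory point made above.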
I'm using MESH2D in Matlab in order to mesh ROI (Region Of Interest) from images. Now I would like to make binary masks from these triangular meshes. The outputs from [p,t] = mesh2d(node) are:
p = Nx2 array of nodal XY co-ordinates.
t = Mx3 array of triangles as indices into p, defined with a counter-clockwise node ordering.
Example of an initial code (feel free to improve it!):
mask= logical([0 0 0 0 0; 0 1 1 0 0; 0 1 1 1 1; 0 1 1 0 0]) %let's say this is my ROI
figure, imagesc(mask)
lol=regionprops(mask,'all')
[p,t] = mesh2d(lol.ConvexHull); %it should mesh the ROI
How to make masks from this triangular mesh?
Thank you in advance!
This is p:
1,50000000000000 2
1,50000000000000 2,50000000000000
1,50000000000000 3
1,50000000000000 3,50000000000000
1,50000000000000 4
1,93703949778653 2,56171771423604
1,96936200278303 3,98632617574682
2 1,50000000000000
2 4,50000000000000
2,00975325040940 3,53647067507122
2,01137717786904 2,05700769275495
2,05400996239344 3,03376821385856
2,41193753423879 2,49774899749798
2,45957145752038 3,46313210038859
2,50000000000000 1,50000000000000
2,50000000000000 4,50000000000000
2,51246316199066 3,99053096338726
2,56500321259084 1,97186739050944
2,64423955240966 2,98576823004855
3 1,50000000000000
3 4,50000000000000
3,00248771086621 2,47385860181019
3,01650848812758 3,52665319517610
3,08981230082503 3,98949609178151
3,12731558449295 2,02370031640169
3,36937385842331 2,99811446160210
3,50000000000000 1,75000000000000
3,50000000000000 4,25000000000000
3,85193739480358 3,46578962137238
3,85353024582881 2,53499308989903
4 2
4 4
4,42246720814684 3,00037409439956
4,50000000000000 2,25000000000000
4,50000000000000 3,75000000000000
4,97304775909580 2,99999314296989
5 2,50000000000000
5 3,50000000000000
5,50000000000000 3
and t:
9 5 7
20 18 15
1 8 11
8 15 11
11 15 18
11 2 1
6 2 11
20 27 25
25 18 20
27 30 25
17 10 14
7 10 17
24 21 17
9 7 17
29 35 32
26 30 29
23 19 26
14 19 23
26 29 23
23 29 24
23 17 14
24 17 23
6 11 13
13 11 18
34 30 31
31 30 27
3 2 6
12 19 14
14 10 12
6 13 12
12 13 19
12 3 6
28 21 24
28 29 32
24 29 28
9 17 16
16 17 21
38 35 33
35 29 33
33 29 30
34 37 33
33 30 34
19 13 22
26 19 22
18 25 22
22 13 18
22 30 26
22 25 30
4 7 5
4 10 7
4 12 10
3 12 4
38 33 36
36 33 37
39 38 36
36 37 39
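To get the mask for the ix-th triangle, use:
poly2mask(p(t(ix,:),1), p(t(ix,:),2), height, width)
Here t(ix,:) is used to index p to get the vertex coordinates of one triangle. Note that poly2mask takes the number of rows (height) before the number of columns (width).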
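To show what building a mask from the mesh involves, here is a rough Python sketch that rasterizes each triangle with a point-in-triangle test on pixel centres and ORs the results together. The language and the function names `triangle_mask` and `mesh_mask` are assumptions for illustration. This is not how poly2mask is implemented, and results may differ from poly2mask for pixels on a triangle boundary.

```python
def triangle_mask(tri, height, width):
    """Mask of pixels whose centre lies in the triangle (edges included).

    Pixel (row r, col c) is taken to have its centre at x = c + 1, y = r + 1,
    mirroring MATLAB's 1-based image coordinates (an assumption of this sketch).
    """
    (x1, y1), (x2, y2), (x3, y3) = tri

    def edge(ax, ay, bx, by, px, py):
        # Sign tells which side of the edge a->b the point p lies on.
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

    mask = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            px, py = c + 1, r + 1
            d = (edge(x1, y1, x2, y2, px, py),
                 edge(x2, y2, x3, y3, px, py),
                 edge(x3, y3, x1, y1, px, py))
            # Inside (or on an edge) when the signs never disagree.
            mask[r][c] = not (min(d) < 0 and max(d) > 0)
    return mask

def mesh_mask(p, t, height, width):
    """OR together the masks of all triangles; t holds 1-based indices into p."""
    total = [[False] * width for _ in range(height)]
    for tri_idx in t:
        m = triangle_mask([p[i - 1] for i in tri_idx], height, width)
        for r in range(height):
            for c in range(width):
                total[r][c] = total[r][c] or m[r][c]
    return total

# Hypothetical mesh: a square split into two triangles.
p = [(1, 1), (4, 1), (1, 4), (4, 4)]
t = [(1, 2, 3), (2, 4, 3)]
full = mesh_mask(p, t, 5, 5)
for row in full:
    print("".join("#" if v else "." for v in row))
```

In MATLAB the analogous step would be to loop over the rows of t, call poly2mask for each triangle, and combine the results with a logical OR.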
I am trying to store vectors. When I run the program, I see all the values inside the loop, but outside the loop only the last vector is stored (the one that ends with the prime number 953, see below). Any calculation with the PVX vector therefore uses only the last entry. I want to do calculations with all the results, not just the last one. How can I store these results?
This is the code:
PV=[2 3 5 7 11 13 17 19 23 29];
for numba=2:n
    if mod(numba,PV)~=0;
        xp=numba;
        PVX=[2 3 5 7 11 13 17 19 23 29 xp]
    end
end
The first few results look like this:
PVX: Prime Vectors (Result)
PVX =
2 3 5 7 11 13 17 19 23 29 31
PVX =
2 3 5 7 11 13 17 19 23 29 37
PVX =
2 3 5 7 11 13 17 19 23 29 41
PVX =
2 3 5 7 11 13 17 19 23 29 43
PVX = ...........................................................
PVX =
2 3 5 7 11 13 17 19 23 29 953
If you want to store all PVX values, use a different row for each:
PV = [2 3 5 7 11 13 17 19 23 29];
PVX = [];
for numba=2:n
    if mod(numba,PV)~=0
        xp = numba;
        PVX = [PVX; 2 3 5 7 11 13 17 19 23 29 xp];
    end
end
Of course, it would be better to initialize the PVX matrix to the appropriate size, but the number of rows is hard to predict.
Alternatively, build the PVX without loops:
xp = setdiff(primes(n), primes(29)).'; % all primes > 29 and <= n
PVX = [ repmat([2 3 5 7 11 13 17 19 23 29], numel(xp), 1) xp ];
As an example, for n=100, either of the above approaches gives
PVX =
2 3 5 7 11 13 17 19 23 29 31
2 3 5 7 11 13 17 19 23 29 37
2 3 5 7 11 13 17 19 23 29 41
2 3 5 7 11 13 17 19 23 29 43
2 3 5 7 11 13 17 19 23 29 47
2 3 5 7 11 13 17 19 23 29 53
2 3 5 7 11 13 17 19 23 29 59
2 3 5 7 11 13 17 19 23 29 61
2 3 5 7 11 13 17 19 23 29 67
2 3 5 7 11 13 17 19 23 29 71
2 3 5 7 11 13 17 19 23 29 73
2 3 5 7 11 13 17 19 23 29 79
2 3 5 7 11 13 17 19 23 29 83
2 3 5 7 11 13 17 19 23 29 89
2 3 5 7 11 13 17 19 23 29 97
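The one-row-per-result idea can be sketched in Python for illustration; the language and the name `prime_rows` are assumptions. One point worth checking with it: testing divisibility only by the primes up to 29 identifies primes only while n stays below 31^2 = 961, since 961 itself passes the test without being prime. The loop and the primes-based version therefore agree only for n < 961.

```python
BASE = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)

def prime_rows(n, base=BASE):
    """One row per number in 2..n that no base prime divides: base + [number]."""
    rows = []
    for numba in range(2, n + 1):
        # Same condition as mod(numba,PV)~=0 holding for every element of PV.
        if all(numba % q != 0 for q in base):
            rows.append(list(base) + [numba])
    return rows

pvx = prime_rows(100)
print(len(pvx))       # number of stored rows
print(pvx[0])         # first row
print(pvx[-1])        # last row
```

Because every row is kept in `pvx`, later calculations can use all results rather than only the last one.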
I'm assuming you were going for this:
PVX=[2 3 5 7 11 13 17 19 23 29];
for numba=2:n
    if mod(numba,PVX)~=0
        xp=numba;
        PVX(end+1) = xp;
        % Or alternatively PVX = [PVX, xp];
    end
end
but if you can estimate how large PVX will be in the end, you should pre-allocate the array first for a significant speed-up.
So it looks like you need all primes up to n.
As Dan said, use this:
PVX=[2 3 5 7 11 13 17 19 23 29];
for numba=2:n
    if mod(numba,PVX)~=0
        xp=numba;
        PVX=[PVX xp];
    end
end
Or why not simply use the primes function?
PVX = primes( n ) ;
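For readers without MATLAB at hand, the result of primes(n), namely all primes less than or equal to n, can be reproduced with a sieve of Eratosthenes. The Python sketch below is an illustration only; the language and the name `primes_up_to` are assumptions, and it says nothing about how MATLAB implements primes internally.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes <= n."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            # Cross out multiples of i, starting at i*i.
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]

print(primes_up_to(30))
```

Unlike a divisibility test against a fixed list of small primes, the sieve stays correct for any n.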