Extra, changing character in output file MATLAB fprintf - matlab

I have two data structures:
atomName <855*1 cell>
atomSASA <855*1 double>
which I am using in code to produce an output file:
%write out SASA values for individual atoms to file
results_file = fopen('output.txt','w');
fprintf(results_file,'SASA for Individual Atoms: \n');
i=1;
while i < (855)
fprintf(results_file,'Atom %s %s: %.3f\n',i,cell2mat(atomName(i)),atomSASA(i));
i = i + 1;
end
The file seems correct for the first 32 lines, but starting at line 33, there is an extra character after the word Atom. The character changes each line, eventually going through digits, capital letters A-Z, lower letters a-z etc. I would like to remove this extra character from each line:
Atom ! HG23: 6.286
Atom " N: 0.000
Atom # CA: 0.000
Atom $ C: 0.000
Atom % O: 0.000
Atom & CB: 0.000
Atom ' CG: 0.000
Atom ( CD1: 0.000
Atom ) CD2: 0.000
Atom * CE1: 0.000
Atom + CE2: 0.000
Atom , CZ: 0.000
Atom - OH: 0.000
Atom . H: 0.000
Atom / HA: 0.000
Atom 0 HB2: 0.000
Atom 1 HB3: 0.000
Atom 2 HD1: 0.000
Atom 3 HD2: 0.000
Atom 4 HE1: 0.000
Atom 5 HE2: 0.000
Atom 6 HH: 0.000
Atom 7 N: 0.000
Atom 8 CA: 0.000
Atom 9 C: 0.000
Atom : O: 0.000
Atom ; CB: 0.000
Atom < CG: 0.000
Atom = CD: 0.000
Atom > CE: 1.208
Atom ? NZ: 1.312
Atom # H: 0.000
Atom A HA: 0.000
Atom B HB2: 0.000
Atom C HB3: 0.000
Atom D HG2: 0.000
Atom E HG3: 0.000
Atom F HD2: 0.000
Atom G HD3: 0.000
Atom H HE2: 0.000
Atom I HE3: 33.979
Atom J HZ1: 0.000
Atom K HZ2: 0.000
Atom L HZ3: 44.513
Atom M N: 0.000
Atom N CA: 0.000
Atom O C: 0.000
Atom P O: 0.000
Atom Q CB: 0.000
Atom R CG: 0.000
Atom S CD1: 0.000
Atom T CD2: 0.000
Atom U H: 0.000
Atom V HA: 0.000
Atom W HB2: 0.000
Atom X HB3: 0.000
Atom Y HG: 0.000
Atom Z HD11: 0.000
Atom [ HD12: 0.000
Atom \ HD13: 0.000
Atom ] HD21: 0.000
Atom ^ HD22: 0.000
Atom _ HD23: 0.000
Atom ` N: 0.000
Atom a CA: 0.000
Atom b C: 0.000
Atom c O: 0.000
Atom d CB: 0.000
Atom e CG1: 0.000
Atom f CG2: 0.000

You should probably replace your fprintf by:
fprintf(results_file,'Atom %i %s: %.3f\n', i, atomName{i}, atomSASA(i));
To print an integer, you should use %i or %d, with %s an integer gets interpreted as an ascii value:
>> fprintf('as ascii: %s\n', 66)
as ascii: B
>> fprintf('as number: %i\n', 66)
as number: 66
One other thing: it is easier to write atomName{i} instead of cell2mat(atomName(i)). Indexing a cell array with curly braces gives you the content of the i-th cell, what you want. Indexing it with normal parenthesis will give you a 1x1 cell array with the content of the i-th cell:
>> atoms = {'aa','bb','cc'}
atoms =
'aa' 'bb' 'cc'
>> class(atoms{2})
ans =
char
>> class(atoms(2))
ans =
cell
See the manual.

Related

How to print same row data in multiple time from pdb file in perl

I am new in perl, I ma trying to write a program to input pdb file (from Directory, I have 3000 files) and output will save another directory (Another folder).
Code:
open( filehandler, "Document1.txt" ) or die $!; #Input file
my #file1 = <filehandler>;
my $OutputDir = 'C:\test_result_file';
foreach my $line (#file1) {
chomp $line;
open( fh, "$line" ) or die $!;
open( out, ">$OutputDir/$line.pdb" ) or die $!;
while ( $file = <fh> ) {
if ( $file =~ /^ATOM.{9}(?:CG|CD1|CD2B|CE1|CE2|CZ|C|O|CB|CG|CD)/ ) {
$hash{$1}{$2}++;
}
foreach $key ( sort { $hash{$1} <=> $hash{$2} or $1 cmp $2 } keys %hash ) {
print out $key;
}
}
print "Completed", "\n";
}
for example input file:
ATOM 1752 CG TYR A 248 89.088 39.843 51.944 1.00 32.03 C
ATOM 1753 CD1 TYR A 248 89.759 39.356 50.810 1.00 37.15 C
ATOM 1754 CD2 TYR A 248 87.727 40.049 51.864 1.00 32.81 C
ATOM 1755 CE1 TYR A 248 89.078 39.081 49.646 1.00 36.00 C
ATOM 1756 CE2 TYR A 248 87.035 39.774 50.706 1.00 35.66 C
ATOM 1757 CZ TYR A 248 87.708 39.285 49.599 1.00 35.16 C
ATOM 7394 C GLN B 331 37.664 74.934 36.854 1.00 22.75 C
ATOM 7395 O GLN B 331 37.728 73.730 36.607 1.00 31.73 O
ATOM 7396 CB GLN B 331 37.467 76.222 34.712 1.00 27.88 C
ATOM 7397 CG GLN B 331 36.515 76.825 33.693 1.00 32.42 C
ATOM 7398 CD GLN B 331 35.390 75.877 33.328 1.00 35.70 C
Expected output:
A chain:
ATOM 1753 CD1 TYR A 248 89.759 39.356 50.810 1.00 37.15 C
ATOM 1752 CG TYR A 248 89.088 39.843 51.944 1.00 32.03 C
ATOM 1754 CD2 TYR A 248 87.727 40.049 51.864 1.00 32.81 C
ATOM 1755 CE1 TYR A 248 89.078 39.081 49.646 1.00 36.00 C
ATOM 1753 CD1 TYR A 248 89.759 39.356 50.810 1.00 37.15 C
ATOM 1754 CD2 TYR A 248 87.727 40.049 51.864 1.00 32.81 C
ATOM 1755 CE1 TYR A 248 89.078 39.081 49.646 1.00 36.00 C
ATOM 1756 CE2 TYR A 248 87.035 39.774 50.706 1.00 35.66 C
ATOM 1754 CD2 TYR A 248 87.727 40.049 51.864 1.00 32.81 C
ATOM 1755 CE1 TYR A 248 89.078 39.081 49.646 1.00 36.00 C
ATOM 1756 CE2 TYR A 248 87.035 39.774 50.706 1.00 35.66 C
ATOM 1757 CZ TYR A 248 87.708 39.285 49.599 1.00 35.16 C
B chain:
ATOM 7394 C GLN B 331 37.664 74.934 36.854 1.00 22.75 C
ATOM 7395 O GLN B 331 37.728 73.730 36.607 1.00 31.73 O
ATOM 7396 CB GLN B 331 37.467 76.222 34.712 1.00 27.88 C
ATOM 7397 CG GLN B 331 36.515 76.825 33.693 1.00 32.42 C
ATOM 7395 O GLN B 331 37.728 73.730 36.607 1.00 31.73 O
ATOM 7396 CB GLN B 331 37.467 76.222 34.712 1.00 27.88 C
ATOM 7397 CG GLN B 331 36.515 76.825 33.693 1.00 32.42 C
ATOM 7398 CD GLN B 331 35.390 75.877 33.328 1.00 35.70 C
ATOM 7396 CB GLN B 331 37.467 76.222 34.712 1.00 27.88 C
ATOM 7397 CG GLN B 331 36.515 76.825 33.693 1.00 32.42 C
ATOM 7398 CD GLN B 331 35.390 75.877 33.328 1.00 35.70 C
ATOM 7394 C GLN B 331 37.664 74.934 36.854 1.00 22.75 C
Chain ID may be a to h. so, rule is see above expected output: First four row will unique and then line five will be same row of second row and will add new row as eight line row.
I am unable to write a code to solve this problem, any one pl help
I'm afraid I have to say your code is rather confusing but what I get from the data samples and further explanation is:
There should be a window of four lines moving through the input lines.
The window contents (all four lines) should be printed as you advance to each new line.
The window should be reset each time a new sequence is encountered.
Each line is a series of space delimited fields and the fifth field identifies the sequence.
The window contents can be stored in a simple Perl array (see #window in the snippet below). You simply append data to it with push and remove the first line with shift as you move to a next line. When the sequence changes, print the current window and reset it. In the sample code below I made an assumption that sequences do not intermix. If this is note the case, you need to read all the input beforehand and sort it as necessary.
use strict;
use warnings;
my $win_len = 4;
my #window = ();
my $prev_chain = "";
while (<>) {
my ($atom_name, $chain) = (split)[2, 4];
next unless $atom_name =~ /\b(?:CG|CD1|CD2B|CE1|CE2|CZ|C|O|CB|CG|CD)\b/;
if ($chain eq $prev_chain) {
if (#window == $win_len) {
print_window();
shift #window;
}
push #window, $_;
} else {
print_window() if #window;
#window = ($_);
$prev_chain = $chain;
}
}
print_window() if #window;
sub print_window {
print foreach #window;
print "\n";
}
Demo: https://ideone.com/oOB8bF
The script reads data from STDIN and prints result to STDOUT for the sake of simplicity. Your code sample suggests you store a list of files to process in the Document1.txt and the actual input is read from these files. In this case you need an extra loop:
use strict;
use warnings;
my $OutputDir = 'C:/test_result_file';
open my $dir, "Document1.txt" or die "Failed to open Document1.txt:$!";
chomp(my #files = <$dir>);
foreach my $file (#files) {
my $win_len = 4;
my #window = ();
my $prev_chain = "";
open my $input, $file or die "failed to open $file: $!\n";
open my $output, '>', "$OutputDir/$file.pdb" or die "failed to open $OutputDir/$file.pdb: $!\n";
while (<$input>) {
my ($atom_name, $chain) = (split)[2, 4];
next unless $atom_name =~ /\b(?:CG|CD1|CD2B|CE1|CE2|CZ|C|O|CB|CG|CD)\b/;
if ($chain eq $prev_chain) {
if (#window == $win_len) {
print_window($output, #window);
shift #window;
}
push #window, $_;
} else {
print_window($output, #window) if #window;
#window = ($_);
$prev_chain = $chain;
}
}
print_window($output, #window) if #window;
}
sub print_window {
my $fh = shift;
print $fh $_ foreach #_;
print $fh "\n";
}

How to illustrate large data using bar plot or other styles?

I have a set of data and want to plot them. However I don't know how to plot them nicely. I tried to use bar plot in Matlab and the resulted plot is not readable.
I thought to ask if there is a setting or even a new style to plot my results in a better way.
There are 15 bars for each element of x axis [1:19].
Here you can see the required data:
DAT=...
[2.476 4.142 0.000 4.302 4.302 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
6.680 8.703 2.611 6.680 6.680 3.261 0.000 3.799 6.680 0.000 0.000 0.000 0.000 0.000 0.000
7.672 6.498 7.809 7.809 7.809 6.615 5.062 7.809 7.809 3.916 5.895 7.809 2.780 5.195 2.385
27.126 17.441 11.386 0.000 0.000 17.435 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.085 0.156 0.284 0.284 0.284 0.000 0.284 0.284 0.284 0.226 0.284 0.284 0.284 0.284 0.284
10.688 12.159 10.688 10.688 10.688 10.688 10.073 5.633 10.688 0.000 0.000 6.681 0.000 0.000 0.000
11.002 11.020 11.002 11.002 11.002 11.002 11.002 9.456 11.002 5.459 2.434 3.585 3.527 2.160 0.117
2.111 2.111 2.111 2.111 2.111 2.111 2.111 0.000 0.000 2.111 2.111 0.000 0.000 0.581 0.000
9.085 9.906 9.085 1.256 9.085 9.085 9.085 7.299 0.000 9.085 0.000 0.000 0.000 0.000 0.000
3.661 3.661 3.661 3.661 3.661 3.661 3.661 0.000 0.000 1.206 0.000 0.000 0.000 0.000 0.000
3.307 3.703 3.307 2.968 3.307 3.307 3.307 0.000 0.000 3.307 0.000 0.000 0.000 0.000 0.000
4.136 4.123 4.123 4.123 4.123 4.123 4.123 4.123 0.000 4.123 4.123 0.000 3.350 0.000 0.000
2.993 3.000 2.993 2.993 2.993 2.993 0.000 1.134 0.000 2.993 0.000 0.000 1.938 0.000 0.000
1.857 1.864 1.857 1.857 1.857 1.857 1.857 1.857 0.000 1.857 1.857 0.016 1.857 0.704 0.000
3.754 3.754 3.754 3.754 3.754 3.754 3.754 0.000 0.000 3.754 3.754 0.000 0.000 0.000 0.000
2.752 2.760 2.752 2.752 2.752 2.752 2.752 2.752 0.000 2.752 2.752 2.752 2.752 2.752 0.000
3.788 5.611 3.788 3.788 3.788 3.788 3.788 3.369 0.000 3.788 3.788 0.000 3.788 0.000 0.000
0.132 0.123 0.123 0.123 0.132 0.123 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
15.465 14.081 12.468 4.808 1.787 12.858 11.067 3.644 9.317 11.846 10.631 4.233 8.394 5.094 4.225];
bar(DAT)
Try out bar3(DAT)
Output graph:

Field manipulation

I have some text files and I need to remove the first character from the fourth column only if the column has four characters
file1 as follows
ATOM 5181 N AMET K 406 12.440 6.552 25.691 0.50 7.37 N
ATOM 5182 CA AMET K 406 13.685 5.798 25.578 0.50 5.87 C
ATOM 5183 C AMET K 406 14.045 5.179 26.909 0.50 5.07 C
ATOM 5184 O MET K 406 14.595 4.083 27.003 0.50 7.07 O
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5202 N AARG K 408 12.186 3.982 29.147 0.50 6.55 N
file2 as follows
ATOM 41 CA ATRP A 6 -18.975 -29.894 -7.425 0.50 19.50 C
ATOM 42 CA BTRP A 6 -18.979 -29.890 -7.428 0.50 19.16 C
ATOM 43 C HIS A 6 -18.091 -29.845 -8.669 1.00 19.84 C
ATOM 44 O HIS A 6 -17.015 -30.452 -8.696 1.00 20.10 O
ATOM 45 CB ASER A 9 -18.499 -28.879 -6.370 0.50 19.73 C
ATOM 46 CB BSER A 9 -18.565 -28.837 -6.367 0.50 19.13 C
ATOM 47 CG CHIS A 12 -19.421 -27.711 -6.216 0.50 21.30 C
Desired output
file1
ATOM 5181 N MET K 406 12.440 6.552 25.691 0.50 7.37 N
ATOM 5182 CA MET K 406 13.685 5.798 25.578 0.50 5.87 C
ATOM 5183 C MET K 406 14.045 5.179 26.909 0.50 5.07 C
ATOM 5184 O MET K 406 14.595 4.083 27.003 0.50 7.07 O
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5202 N ARG K 408 12.186 3.982 29.147 0.50 6.55 N
file2
ATOM 41 CA TRP A 6 -18.975 -29.894 -7.425 0.50 19.50 C
ATOM 42 CA TRP A 6 -18.979 -29.890 -7.428 0.50 19.16 C
ATOM 43 C HIS A 6 -18.091 -29.845 -8.669 1.00 19.84 C
ATOM 44 O HIS A 6 -17.015 -30.452 -8.696 1.00 20.10 O
ATOM 45 CB SER A 9 -18.499 -28.879 -6.370 0.50 19.73 C
ATOM 46 CB SER A 9 -18.565 -28.837 -6.367 0.50 19.13 C
ATOM 47 CG HIS A 12 -19.421 -27.711 -6.216 0.50 21.30 C
This might work for you (GNU sed):
sed -r 's/^((\S+\s+){3})\S(\S{3}\s)/\1 \3/' file
This replaces the first character of the fourth column with a space if that column has four non-space characters.
Use the length() function to find the length of the column and the substr() function to print the substring you need:
$ awk 'length($4)==4{$4=substr($4,2)}1' file | column -t
ATOM 5181 N MET K 406 12.440 6.552 25.691 0.50 7.37 N
ATOM 5182 CA MET K 406 13.685 5.798 25.578 0.50 5.87 C
ATOM 5183 C MET K 406 14.045 5.179 26.909 0.50 5.07 C
ATOM 5184 O MET K 406 14.595 4.083 27.003 0.50 7.07 O
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5202 N ARG K 408 12.186 3.982 29.147 0.50 6.55 N
Piping to column -t rebuilds a nice table format. To store the changes back to a file uses the redirection operator:
$ awk 'length($4)==4{$4=substr($4,2)}1' file | column -t > new_file
With sed you could do:
$ sed -r 's/^((\S+\s+){3})\S(\S{3}\s)/\1\3/' file
ATOM 5181 N MET K 406 12.440 6.552 25.691 0.50 7.37 N
ATOM 5182 CA MET K 406 13.685 5.798 25.578 0.50 5.87 C
ATOM 5183 C MET K 406 14.045 5.179 26.909 0.50 5.07 C
ATOM 5184 O MET K 406 14.595 4.083 27.003 0.50 7.07 O
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5202 N ARG K 408 12.186 3.982 29.147 0.50 6.55 N
To store the changes back to the original file you can use the -i option:
$ sed -ri 's/^((\S+\s+){3})\S(\S{3}\s)/\1\3/' file

insert word between lines

I have pdb (protein data base) file which has thousands of lines.
REMARK 1 PDB file generated by ptraj (set 1000)
ATOM 1 O22 DDM 1 2.800 4.419 20.868 0.00 0.00
ATOM 2 H22 DDM 1 3.427 4.096 20.216 0.00 0.00
ATOM 3 C22 DDM 1 3.351 5.588 21.698 0.00 0.00
ATOM 4 H42 DDM 1 3.456 5.274 22.736 0.00 0.00
ATOM 5 C23 DDM 1 2.530 6.846 21.639 0.00 0.00
ATOM 6 H43 DDM 1 2.347 7.159 20.611 0.00 0.00
ATOM 7 O23 DDM 1 1.313 6.498 22.334 0.00 0.00
ATOM 8 H23 DDM 1 0.903 5.837 21.771 0.00 0.00
ATOM 9 C24 DDM 1 3.073 8.109 22.266 0.00 0.00
ATOM 10 H44 DDM 1 3.139 7.837 23.319 0.00 0.00
ATOM 11 O24 DDM 1 2.218 9.278 22.007 0.00 0.00
ATOM 12 H24 DDM 1 1.278 9.184 22.179 0.00 0.00
ATOM 13 C25 DDM 1 4.494 8.317 21.764 0.00 0.00
ATOM 14 H45 DDM 1 4.391 8.452 20.687 0.00 0.00
'
I want to insert word "TER" every 81 lines in that file whcih contains more than 20,000 lines but ignoring the first line since it is a comment.
I browse through internet, seems SED can do it. But i am lost.
Can anyone guide?
Thanks in advance.
Try this:
sed -i -e '1~81 i\TER' file
I'm partial to awk myself:
awk '{if(FNR%81==0)print "TER"; print}' file
I find this is a lot easier to understand and debug than the sed equivalent. The only magic is that FNR is the line number
You might have to fiddle with the numbers in the if to get it exactly the way you want it.
The more verbose shell commands would be
{
read header
echo "$header"
i=0
while read line; do
echo "$line"
if (( ++i == 81 )); then
echo TER
i=0
fi
done
} < infile > outfile &&
mv outfile infile

match a pattern and print subsequent lines

there are 200 files named File1_0.pdb,File1_60.pdb etc....it looks like:
ATOM 1 N VAL 1 8.897 -21.545 -7.276 1.00 0.00
ATOM 2 H1 VAL 1 9.692 -22.015 -6.868 1.00 0.00
ATOM 3 H2 VAL 1 9.228 -20.766 -7.827 1.00 0.00
ATOM 4 H3 VAL 1 8.289 -22.236 -7.693 1.00 0.00
TER
ATOM 5 CA VAL 1 8.124 -20.953 -6.203 1.00 0.00
ATOM 6 HA VAL 1 8.072 -19.874 -6.345 1.00 0.00
ATOM 7 CB VAL 1 6.693 -21.515 -6.176 1.00 0.00
ATOM 8 HB VAL 1 6.522 -22.024 -5.227 1.00 0.00
ATOM 9 CG1 VAL 1 5.684 -20.370 -6.330 1.00 0.00
ATOM 10 1HG1 VAL 1 5.854 -19.861 -7.279 1.00 0.00
i have to extract the part after TER and put in a different file...this has to be done on all 200 files. I did something like sed '1,/TER/d' File1_0.pdb > 1_0.pdb. But this will work for one file at a time...can there be a solution for all 200 files in one go... output file is named same only "File" is removed from the name...
for i in *.pdb; do sed '1,/TER/d' $i > ${i/File/}; done
This might work:
seq 0 200| xargs -i -n1 cp File1_{}.pdb 1_{}.pbd # backup files
sed -si '1,/TER/d' 1_{0..200}.pdb # edit files separately inline