Edit text columns

Edit text columns - sed

I have a text file (the first two lines are character spacings):
1 2 3 4 5 6 7 8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
ATOM 1 N1 SPINA 3 30.616 29.799 14.979 1.00 20.00 S N
ATOM 2 N1 SPINA 3 28.146 28.381 13.950 1.00 20.00 S N
ATOM 3 N1 SPINA 3 27.605 28.239 14.037 1.00 20.00 S N
ATOM 4 N1 SPINA 3 30.333 29.182 15.464 1.00 20.00 S N
ATOM 5 N1 SPINA 3 29.608 29.434 14.333 1.00 20.00 S N
ATOM 6 N1 SPINA 3 29.303 29.830 13.317 1.00 20.00 S N
ATOM 7 N1 SPINA 3 28.963 31.116 13.472 1.00 20.00 S N
ATOM 8 N1 SPINA 3 28.859 28.743 13.828 1.00 20.00 S N
ATOM 9 N1 SPINA 3 29.699 30.575 14.564 1.00 20.00 S N
ATOM 10 N1 SPINA 3 29.518 29.194 15.301 1.00 20.00 S N
I want to edit it and make it like:
1 2 3 4 5 6 7 8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
ATOM 1 N001 SPINA 3 30.616 29.799 14.979 1.00 20.00 S N
ATOM 2 N002 SPINA 3 28.146 28.381 13.950 1.00 20.00 S N
ATOM 3 N003 SPINA 3 27.605 28.239 14.037 1.00 20.00 S N
ATOM 4 N004 SPINA 3 30.333 29.182 15.464 1.00 20.00 S N
ATOM 5 N005 SPINA 3 29.608 29.434 14.333 1.00 20.00 S N
ATOM 6 N006 SPINA 3 29.303 29.830 13.317 1.00 20.00 S N
ATOM 7 N007 SPINA 3 28.963 31.116 13.472 1.00 20.00 S N
ATOM 8 N008 SPINA 3 28.859 28.743 13.828 1.00 20.00 S N
ATOM 9 N009 SPINA 3 29.699 30.575 14.564 1.00 20.00 S N
ATOM 10 N010 SPINA 3 29.518 29.194 15.301 1.00 20.00 S N
The number of spaces between each column are important and the list of atoms needs to go up to 190 (N001-N190). Thus I would like to replace characters 13-16 (" N1 ") in file 1 with ("N001") and keep the remainder of the file in the original spacing.

You don't need 10 long lines of sample input to demonstrate the problem or the solution:
$ cat file
ATOM 1 N1 SPINA 3
ATOM 2 N1 SPINA 3
ATOM 10 N1 SPINA 3
$ awk '{print substr($0,1,12) sprintf("N%03d",$2) substr($0,17)}' file
ATOM 1 N001 SPINA 3
ATOM 2 N002 SPINA 3
ATOM 10 N010 SPINA 3
I'm assuming we could use $2 as the numeric part of the 3rd field. It seems to increment sequentially with your line numbers. Using NR might be an alternative. If neither of those is actually what you want, post some more representative sample input/output.
Also, note that any solution that involves assigning to a field (e.g. $3=...) WILL cause awk to recompile the line using the value of OFS as the field separator and so will change your spacing.
Oh, and if those 2 initial lines of character spacings are really present in your files, this is the tweak:
$ cat file
1 2
12345678901234567890123456
ATOM 1 N1 SPINA 3
ATOM 2 N1 SPINA 3
ATOM 10 N1 SPINA 3
$ awk 'NR>2{$0 = substr($0,1,12) sprintf("N%03d",$2) substr($0,17)} 1' file
1 2
12345678901234567890123456
ATOM 1 N001 SPINA 3
ATOM 2 N002 SPINA 3
ATOM 10 N010 SPINA 3

Try :
$ awk '{$3=substr($3,1,1) sprintf("%03d",$2)}1' OFS=\\t file
Note : OFS will be tab
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk
--edit--
if you want to increment with line
$ awk '{$3=substr($3,1,1) sprintf("%03d",NR)}1' OFS=\\t file

Here is yet another way:
awk 'sub(/.$/,sprintf("%03d",NR),$3)' OFS='\t' file
Output:
$ awk 'sub(/.$/,sprintf("%03d",NR),$3)' OFS='\t' file
ATOM 1 N001 SPINA 3 30.616 29.799 14.979 1.00 20.00 S N
ATOM 2 N002 SPINA 3 28.146 28.381 13.950 1.00 20.00 S N
ATOM 3 N003 SPINA 3 27.605 28.239 14.037 1.00 20.00 S N
ATOM 4 N004 SPINA 3 30.333 29.182 15.464 1.00 20.00 S N
ATOM 5 N005 SPINA 3 29.608 29.434 14.333 1.00 20.00 S N
ATOM 6 N006 SPINA 3 29.303 29.830 13.317 1.00 20.00 S N
ATOM 7 N007 SPINA 3 28.963 31.116 13.472 1.00 20.00 S N
ATOM 8 N008 SPINA 3 28.859 28.743 13.828 1.00 20.00 S N
ATOM 9 N009 SPINA 3 29.699 30.575 14.564 1.00 20.00 S N
ATOM 10 N010 SPINA 3 29.518 29.194 15.301 1.00 20.00 S N

If you are interesting to resolve it with pure shell, here is the code:
while IFS="\n" read -r line
do
n=${line:9:3}
printf "%sN%03d%s\n" "${line:0:12}" $n "${line:16}"
done < file

awk '$3="N"sprintf("%03d",$2)' OFS='\t' infile.txt
Result
ATOM 1 N001 SPINA 3 30.616 29.799 14.979 1.00 20.00SN
ATOM 2 N002 SPINA 3 28.146 28.381 13.950 1.00 20.00SN
ATOM 3 N003 SPINA 3 27.605 28.239 14.037 1.00 20.00SN
ATOM 4 N004 SPINA 3 30.333 29.182 15.464 1.00 20.00SN
ATOM 5 N005 SPINA 3 29.608 29.434 14.333 1.00 20.00SN
ATOM 6 N006 SPINA 3 29.303 29.830 13.317 1.00 20.00SN
ATOM 7 N007 SPINA 3 28.963 31.116 13.472 1.00 20.00SN
ATOM 8 N008 SPINA 3 28.859 28.743 13.828 1.00 20.00SN
ATOM 9 N009 SPINA 3 29.699 30.575 14.564 1.00 20.00SN
ATOM 10 N010 SPINA 3 29.518 29.194 15.301 1.00 20.00SN

Related

find text sequences and create new files with replacement text [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
I'm trying to find a way to write a script that does the following:
Open and detect the first use of a three-letter sequence that is repeated in the input file
Edit and permute this three letter sequence 19 times, giving 19 outputs each with a different three letter code that corresponds to a list of 19 possible three letter codes
Essentially, this is a fairly straightforward find and replace problem that I know how to do. The problem is that I then need to loop this so that, after creating the 19 files from the previous line, the next line with a different three letter code has the same replacement done to it.
I'm struggling to find a way to have the script recognize sequences of text when it can be one of twenty different things.
Let me know if anyone has any ideas on how I could go about doing this, I'll provide any clarification if necessary too!
Here is an example of an input file:
ATOM 1 N SER A 2 37.396 -5.247 -4.830 1.00 65.06 N
ATOM 2 CA SER A 2 37.881 -6.354 -3.929 1.00 64.88 C
ATOM 3 C SER A 2 36.918 -7.555 -3.786 1.00 64.14 C
ATOM 4 O SER A 2 37.287 -8.576 -3.177 1.00 64.31 O
ATOM 5 CB SER A 2 38.251 -5.804 -2.552 1.00 65.31 C
ATOM 6 OG SER A 2 37.122 -5.210 -1.918 1.00 66.94 O
ATOM 7 N GLU A 3 35.705 -7.438 -4.342 1.00 62.82 N
ATOM 8 CA GLU A 3 34.716 -8.539 -4.306 1.00 61.94 C
ATOM 9 C GLU A 3 35.126 -9.833 -5.033 1.00 59.71 C
ATOM 10 O GLU A 3 34.927 -10.911 -4.473 1.00 59.23 O
ATOM 11 CB GLU A 3 33.328 -8.094 -4.789 1.00 62.49 C
ATOM 12 CG GLU A 3 32.291 -7.994 -3.693 1.00 66.67 C
ATOM 13 CD GLU A 3 31.552 -9.302 -3.426 1.00 71.93 C
ATOM 14 OE1 GLU A 3 32.177 -10.254 -2.892 1.00 73.96 O
ATOM 15 OE2 GLU A 3 30.329 -9.364 -3.723 1.00 74.25 O
ATOM 16 N PRO A 4 35.663 -9.732 -6.280 1.00 57.83 N
ATOM 17 CA PRO A 4 36.131 -10.951 -6.967 1.00 56.64 C
Where an output would look like this:
ATOM 1 N ALA A 2 37.396 -5.247 -4.830 1.00 65.06 N
ATOM 2 CA SER A 2 37.881 -6.354 -3.929 1.00 64.88 C
ATOM 3 C SER A 2 36.918 -7.555 -3.786 1.00 64.14 C
ATOM 4 O SER A 2 37.287 -8.576 -3.177 1.00 64.31 O
ATOM 5 CB SER A 2 38.251 -5.804 -2.552 1.00 65.31 C
ATOM 6 OG SER A 2 37.122 -5.210 -1.918 1.00 66.94 O
ATOM 7 N GLU A 3 35.705 -7.438 -4.342 1.00 62.82 N
ATOM 8 CA GLU A 3 34.716 -8.539 -4.306 1.00 61.94 C
ATOM 9 C GLU A 3 35.126 -9.833 -5.033 1.00 59.71 C
ATOM 10 O GLU A 3 34.927 -10.911 -4.473 1.00 59.23 O
ATOM 11 CB GLU A 3 33.328 -8.094 -4.789 1.00 62.49 C
ATOM 12 CG GLU A 3 32.291 -7.994 -3.693 1.00 66.67 C
ATOM 13 CD GLU A 3 31.552 -9.302 -3.426 1.00 71.93 C
ATOM 14 OE1 GLU A 3 32.177 -10.254 -2.892 1.00 73.96 O
ATOM 15 OE2 GLU A 3 30.329 -9.364 -3.723 1.00 74.25 O
ATOM 16 N PRO A 4 35.663 -9.732 -6.280 1.00 57.83 N
ATOM 17 CA PRO A 4 36.131 -10.951 -6.967 1.00 56.64 C
On the first pass, the SER should be changed to a series of twenty different text sequences, the first being ALA. The issue I'm having is that I'm not sure how to write a script that will change more than one line of text.
My current script can form the 19 mutations of the first SER, but that's where it will stop. It won't mutate the next one, and it won't mutate a different three letter code, for example it wouldn't change the GLU. Is there any easy way to integrate this functionality?
Currently, the way I've approached this is to do a simple text transformation using sed, but as this seems more complicated than what sed can bring to the table, I think perl is likely the way to go. I can add the sed code, but I didn't think it would be of much help.

Your question and comments aren't entirely clear, but I believe this script will do what you want. It parses a PDB file until it reaches the amino acid of interest. A set of 19 files are produced where this AA is substituted by the other 19 AAs. From there onwards, every time an AA differs from the AA in the previous line, another set of 19 files will be generated.
#!/usr/bin/perl
use warnings;
use strict;
# we're going to start mutating when we find this residue.
my $target = 'GLU';
my #aas = ( 'ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLU', 'GLN', 'GLY', 'HIS', 'ILE', 'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL' );
my $prev = '';
my $line_no = 0;
my #lines;
my %changes;
# uncomment the following lines and comment out "while (<DATA>) {"
# to read the input from a file
# my $input = 'path/to/pdb_file';
# open( my $fh, "<", $input ) or die "Could not open $input: $!";
# while (<$fh>) {
while (<DATA>) {
# split the line into columns (assuming it is tab-delimited;
# switch this for "\s+" if it is separated with whitespace.
my #cols = split "\t";
if ($target && $cols[3] eq $target) {
# Found our target residue! unset $target so that the following
# set of tests are performed
undef $target;
}
# see if this AA is the same as the AA in the previous line
if (! $target && $prev ne $cols[3]) {
# if it isn't, store the line number and the amino acid
$changes{ $line_no } = $cols[3];
# update $prev to reflect the new AA
$prev = $cols[3];
}
# store all the lines
push #lines, $_;
# increment the line number
$line_no++;
}
# now, for each of the changes, create substitute files
for (keys %changes) {
create_substitutes($_, $changes{$_}, [#aas], [#lines]);
}
sub create_substitutes {
# arguments: line no, $res: residue, $aas: array of amino acids,
# $all_lines: all lines in the file
my ($line_no, $res, $aas, $all_lines) = #_;
# this is the target line that we want to substitute
my #target = split "\t", $all_lines->[$line_no];
# for each AA in the list of AAs, create a new file called 'XXX-##.txt',
# where XXX is the amino acid and ## is the line number where the
# substituted residue is.
for (#$aas) {
next if $_ eq $res;
open( my $fh, ">", $_."-$line_no.txt") or die "Could not create output file for $_: $!";
# print out all lines up to the changed line
print { $fh } #$all_lines[0..$line_no-1];
# print out the changed line, substituting in the AA
print { $fh } join "\t", #target[0..2], $_, #target[4..$#target];
# print out the rest of the lines.
print { $fh } #$all_lines[$line_no+1 .. $#{$all_lines}];
}
}
__DATA__
ATOM 1 N SER A 2 37.396 -5.247 -4.830 1.00 65.06 N
ATOM 2 CA SER A 2 37.881 -6.354 -3.929 1.00 64.88 C
ATOM 3 C SER A 2 36.918 -7.555 -3.786 1.00 64.14 C
ATOM 4 O SER A 2 37.287 -8.576 -3.177 1.00 64.31 O
ATOM 5 CB SER A 2 38.251 -5.804 -2.552 1.00 65.31 C
ATOM 6 OG SER A 2 37.122 -5.210 -1.918 1.00 66.94 O
ATOM 7 N GLU A 3 35.705 -7.438 -4.342 1.00 62.82 N
ATOM 8 CA GLU A 3 34.716 -8.539 -4.306 1.00 61.94 C
ATOM 9 C GLU A 3 35.126 -9.833 -5.033 1.00 59.71 C
ATOM 10 O GLU A 3 34.927 -10.911 -4.473 1.00 59.23 O
ATOM 11 CB GLU A 3 33.328 -8.094 -4.789 1.00 62.49 C
ATOM 12 CG GLU A 3 32.291 -7.994 -3.693 1.00 66.67 C
ATOM 13 CD GLU A 3 31.552 -9.302 -3.426 1.00 71.93 C
ATOM 14 OE1 GLU A 3 32.177 -10.254 -2.892 1.00 73.96 O
ATOM 15 OE2 GLU A 3 30.329 -9.364 -3.723 1.00 74.25 O
ATOM 16 N PRO A 4 35.663 -9.732 -6.280 1.00 57.83 N
ATOM 17 CA PRO A 4 36.131 -10.951 -6.967 1.00 56.64 C
ATOM 18 CA ARG A 4 36.131 -10.951 -6.967 1.00 56.64 C
This example data will produce a set of files for the first GLU found (line 6), then another set for line 15 (PRO residue), and another set for line 17 (ARG residue).
Example of ALA-6.txt file:
ATOM 1 N SER A 2 37.396 -5.247 -4.830 1.00 65.06 N
ATOM 2 CA SER A 2 37.881 -6.354 -3.929 1.00 64.88 C
ATOM 3 C SER A 2 36.918 -7.555 -3.786 1.00 64.14 C
ATOM 4 O SER A 2 37.287 -8.576 -3.177 1.00 64.31 O
ATOM 5 CB SER A 2 38.251 -5.804 -2.552 1.00 65.31 C
ATOM 6 OG SER A 2 37.122 -5.210 -1.918 1.00 66.94 O
ATOM 7 N ALA A 3 35.705 -7.438 -4.342 1.00 62.82 N
ATOM 8 CA GLU A 3 34.716 -8.539 -4.306 1.00 61.94 C
ATOM 9 C GLU A 3 35.126 -9.833 -5.033 1.00 59.71 C
(etc.)
If this isn't the correct behaviour, you'll have to edit your question as it isn't very clear!

Because your question isn't very clear (more precisely, it is totally unclear), i created the following:
#!/usr/bin/env perl
use 5.014;
use strict;
use warnings;
use Path::Tiny;
use Bio::PDB::Structure;
use Data::Dumper;
my $residues_file = "input2.txt"; #residue names, one per line
my $molfile = "m1.pdb"; #molecule file
#read the residues
my(#residues) = path($residues_file)->lines({chomp => 1});
my $m= Bio::PDB::Structure::Molecule->new;
for my $res (#residues) { #for each residue name from a file "input2.txt"
$m->read("m1.pdb"); #read the molecule
my $atom = $m->atom(0); #get the 1st atom
$atom->residue_name($res); #change the residue to the from file
#create output filename
my $outfile = path($molfile)->basename('.pdb') . '_' . lc($res) . '.pdb';
#write the result
$m->print($outfile);
}
for example, if the input2.txt contains
ALA
ARG
ASN
ASP
CYS
GLN
GLU
GLY
HIS
ILE
LEU
LYS
MET
PHE
PRO
SER
THR
TRP
TYR
VAL
the from your input, generates 20 files where the residue in the 1st atom is changed (according to your output example) to like:
==> m1_ala.pdb <==
ATOM 1 N ALA A 2 37.396 -5.247 -4.830 1.00 65.06
==> m1_arg.pdb <==
ATOM 1 N ARG A 2 37.396 -5.247 -4.830 1.00 65.06
==> m1_asn.pdb <==
ATOM 1 N ASN A 2 37.396 -5.247 -4.830 1.00 65.06
==> m1_asp.pdb <==
ATOM 1 N ASP A 2 37.396 -5.247 -4.830 1.00 65.06
==> m1_cys.pdb <==
ATOM 1 N CYS A 2 37.396 -5.247 -4.830 1.00 65.06
... etc, 20 times...

Field manipulation

I have some text files and I need to remove the first character from the fourth column only if the column has four characters
file1 as follows
ATOM 5181 N AMET K 406 12.440 6.552 25.691 0.50 7.37 N
ATOM 5182 CA AMET K 406 13.685 5.798 25.578 0.50 5.87 C
ATOM 5183 C AMET K 406 14.045 5.179 26.909 0.50 5.07 C
ATOM 5184 O MET K 406 14.595 4.083 27.003 0.50 7.07 O
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5202 N AARG K 408 12.186 3.982 29.147 0.50 6.55 N
file2 as follows
ATOM 41 CA ATRP A 6 -18.975 -29.894 -7.425 0.50 19.50 C
ATOM 42 CA BTRP A 6 -18.979 -29.890 -7.428 0.50 19.16 C
ATOM 43 C HIS A 6 -18.091 -29.845 -8.669 1.00 19.84 C
ATOM 44 O HIS A 6 -17.015 -30.452 -8.696 1.00 20.10 O
ATOM 45 CB ASER A 9 -18.499 -28.879 -6.370 0.50 19.73 C
ATOM 46 CB BSER A 9 -18.565 -28.837 -6.367 0.50 19.13 C
ATOM 47 CG CHIS A 12 -19.421 -27.711 -6.216 0.50 21.30 C
Desired output
file1
ATOM 5181 N MET K 406 12.440 6.552 25.691 0.50 7.37 N
ATOM 5182 CA MET K 406 13.685 5.798 25.578 0.50 5.87 C
ATOM 5183 C MET K 406 14.045 5.179 26.909 0.50 5.07 C
ATOM 5184 O MET K 406 14.595 4.083 27.003 0.50 7.07 O
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5202 N ARG K 408 12.186 3.982 29.147 0.50 6.55 N
file2
ATOM 41 CA TRP A 6 -18.975 -29.894 -7.425 0.50 19.50 C
ATOM 42 CA TRP A 6 -18.979 -29.890 -7.428 0.50 19.16 C
ATOM 43 C HIS A 6 -18.091 -29.845 -8.669 1.00 19.84 C
ATOM 44 O HIS A 6 -17.015 -30.452 -8.696 1.00 20.10 O
ATOM 45 CB SER A 9 -18.499 -28.879 -6.370 0.50 19.73 C
ATOM 46 CB SER A 9 -18.565 -28.837 -6.367 0.50 19.13 C
ATOM 47 CG HIS A 12 -19.421 -27.711 -6.216 0.50 21.30 C

This might work for you (GNU sed):
sed -r 's/^((\S+\s+){3})\S(\S{3}\s)/\1 \3/' file
This replaces the first character of the fourth column with a space if that column has four non-space characters.

Use the length() function to find the length of the column and the substr() function to print the substring you need:
$ awk 'length($4)==4{$4=substr($4,2)}1' file | column -t
ATOM 5181 N MET K 406 12.440 6.552 25.691 0.50 7.37 N
ATOM 5182 CA MET K 406 13.685 5.798 25.578 0.50 5.87 C
ATOM 5183 C MET K 406 14.045 5.179 26.909 0.50 5.07 C
ATOM 5184 O MET K 406 14.595 4.083 27.003 0.50 7.07 O
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5202 N ARG K 408 12.186 3.982 29.147 0.50 6.55 N
Piping to column -t rebuilds a nice table format. To store the changes back to a file uses the redirection operator:
$ awk 'length($4)==4{$4=substr($4,2)}1' file | column -t > new_file
With sed you could do:
$ sed -r 's/^((\S+\s+){3})\S(\S{3}\s)/\1\3/' file
ATOM 5181 N MET K 406 12.440 6.552 25.691 0.50 7.37 N
ATOM 5182 CA MET K 406 13.685 5.798 25.578 0.50 5.87 C
ATOM 5183 C MET K 406 14.045 5.179 26.909 0.50 5.07 C
ATOM 5184 O MET K 406 14.595 4.083 27.003 0.50 7.07 O
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5185 CB MET K 406 14.812 6.674 25.044 0.50 6.80 C
ATOM 5202 N ARG K 408 12.186 3.982 29.147 0.50 6.55 N
To store the changes back to the original file you can use the -i option:
$ sed -ri 's/^((\S+\s+){3})\S(\S{3}\s)/\1\3/' file

Data restructure based on the values using perl

My data looks like this,
1 20010101 945 A 6
1 20010101 946 B 4
1 20010101 947 P 3.5
1 20010101 950 A 5
1 20010101 951 P 4
1 20010101 952 P 4
1 20010101 1010 A 4
1 20010101 1011 P 4
2 20010101 940 A 3.5
2 20010101 1015 A 3
2 20010101 1113 B 3.5
2 20010101 1114 P 3.2
2 20010101 1115 B 3.4
2 20010101 1116 P 3.1
2 20010101 1119 P 3.6
I am trying to find all the lines (with P) followed by the latest A and B values based on the matching of first two columns (e.g., 1 and 20010101).
The result is expected to be like this,
1 20010101 947 P 3.5 6 4
1 20010101 951 P 4 5 4
1 20010101 952 P 4 5 4
1 20010101 1011 P 4 4 4
2 20010101 1114 P 3.2 3 3.5
2 20010101 1116 P 3.1 3 3.4
2 20010101 1119 P 3.6 3 3.4
Does it need to sort by using hash in Perl? I am lack of ideas could anybody give any hint? I will be much appreciated!

perl -ane 'if($F[3] eq "P"){ s/$/ $la $lb/; print; }else{ ($la,$lb) = ($F[3] eq "A")?($F[4],$lb):($la,$F[4]) }' data.txt

Simplest solved with a simple if-elsif structure:
use strict;
use warnings;
my ($A, $B);
while (<DATA>) {
my #data = split;
if ($data[3] eq "A") {
$A = $data[4];
} elsif ($data[3] eq "B") {
$B = $data[4];
} elsif ($data[3] eq "P") {
print join("\t", #data, $A, $B), "\n";
}
}
__DATA__
1 20010101 945 A 6
1 20010101 946 B 4
1 20010101 947 P 3.5
1 20010101 950 A 5
1 20010101 951 P 4
1 20010101 952 P 4
1 20010101 1010 A 4
1 20010101 1011 P 4
2 20010101 940 A 3.5
2 20010101 1015 A 3
2 20010101 1113 B 3.5
2 20010101 1114 P 3.2
2 20010101 1115 B 3.4
2 20010101 1116 P 3.1
2 20010101 1119 P 3.6
Output:
1 20010101 947 P 3.5 6 4
1 20010101 951 P 4 5 4
1 20010101 952 P 4 5 4
1 20010101 1011 P 4 4 4
2 20010101 1114 P 3.2 3 3.5
2 20010101 1116 P 3.1 3 3.4
2 20010101 1119 P 3.6 3 3.4
You might want to compensate for possible empty/undefined/old values in $A and $B.

matlab: change matrix

I have a code that load cell array and convert them to matrix.
now this matrix shows 4 numbers after floating point for example
0 5 15 1 51,9000 3,4000
0 5 15 1 51,9000 3,4000
0 5 15 1 51,9000 3,4000
how can I change all af the rows to just show 2 numbers after the floating point ?
please consider that I want to change the matrix not print it in command window !

If you want to see it in the command window/editor for debugging purposes, use bank format:
format bank;
Example:
A =[ 51.213123 6.132434]
format bank
disp(A);
Will result in :
A =
51.21 6.13
Also, you can use sprintf
A = [51.900 3.4000];
disp(sprintf('%2.2f ',A));

x = [0 5 15 1 51.9000 3.4000
0 5 15 1 51.9000 3.4000
0 5 15 1 51.9000 3.4000];
fprintf([repmat('%.2f ',1,size(x,2)) '\n'], x')
0.00 5.00 15.00 1.00 51.90 3.40
0.00 5.00 15.00 1.00 51.90 3.40
0.00 5.00 15.00 1.00 51.90 3.40

match a pattern and print subsequent lines

there are 200 files named File1_0.pdb,File1_60.pdb etc....it looks like:
ATOM 1 N VAL 1 8.897 -21.545 -7.276 1.00 0.00
ATOM 2 H1 VAL 1 9.692 -22.015 -6.868 1.00 0.00
ATOM 3 H2 VAL 1 9.228 -20.766 -7.827 1.00 0.00
ATOM 4 H3 VAL 1 8.289 -22.236 -7.693 1.00 0.00
TER
ATOM 5 CA VAL 1 8.124 -20.953 -6.203 1.00 0.00
ATOM 6 HA VAL 1 8.072 -19.874 -6.345 1.00 0.00
ATOM 7 CB VAL 1 6.693 -21.515 -6.176 1.00 0.00
ATOM 8 HB VAL 1 6.522 -22.024 -5.227 1.00 0.00
ATOM 9 CG1 VAL 1 5.684 -20.370 -6.330 1.00 0.00
ATOM 10 1HG1 VAL 1 5.854 -19.861 -7.279 1.00 0.00
i have to extract the part after TER and put in a different file...this has to be done on all 200 files. I did something like sed '1,/TER/d' File1_0.pdb > 1_0.pdb. But this will work for one file at a time...can there be a solution for all 200 files in one go... output file is named same only "File" is removed from the name...

for i in *.pdb; do sed '1,/TER/d' $i > ${i/File/}; done

This might work:
seq 0 200| xargs -i -n1 cp File1_{}.pdb 1_{}.pbd # backup files
sed -si '1,/TER/d' 1_{0..200}.pdb # edit files separately inline

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Edit text columns - sed

If you are interesting to resolve it with pure shell, here is the code: while IFS="\n" read -r line do n=${line:9:3} printf "%sN%03d%s\n" "${line:0:12}" $n "${line:16}" done < file

Related

find text sequences and create new files with replacement text [closed]

Field manipulation

Data restructure based on the values using perl

matlab: change matrix

match a pattern and print subsequent lines

Categories

Resources