Write old fasta header and new to file - perl

I want to extract the old fasta names which looks something like this:
>Bartonella bibbi
AUUCCGGUUGAUCCUGCCGGAGGCCACUGCUAUCGGGGUCCG
The new headers should look like this:
>Seq1
AUUCCGGUUGAUCCUGCCGGAGGCCACUGCUAUCGGGGUCCG
and so on...
The Bartonella Bibbi should be saved together with the new name Seq1 in a new file an so on. So I've started a bit, by looking for lines with >, and then I split to get an array to get the old name. I don't know how to continue, because I want two things here, first to put the new name in there, but also extracting the old name together with the new in a file, and ALSO get an output file with my sequence and my new names. Please, any input from you will help!
#!/usr/bin/perl
use warnings;
use strict;
my $infile = $ARGV[0];
open my $IN, '<', $infile or die "Could not open $infile: $!, $?";
while (my $line = <$IN>) {
if ($line =~ /^>/) {
my #header = split (/\>/, $line);
my $oldfasta = "$header[1]";
}
}
So after some edits, this is the current script:
#!/usr/bin/perl
use warnings;
use strict;
my $infile = $ARGV[0];
open my $IN, '<', $infile or die "Could not open $infile: $!, $?";
my $seqid = 1;
my %id;
while (my $line = <$IN>) {
if ($line =~ /^>/) {
$id{"Seq$seqid "} = $line;
print ">Seq$seqid\n";
$seqid++
} else {
print $line;
}
}
my $outfile = 'output';
open my $OUT, '>', $outfile or die "Could not open $outfile: $!, $?"; # overwrites the file $outfile;
print $OUT %id;
This gives me a file that looks like this:
Seq29 >Sulfophobococcus_zilligii
Seq20 >Pyrococcus_shinkaii
and so on.
They are not in order, how do I sort them and get rid of the > in the species name?

You’re simply not printing anything. Once you add a print statement, it should work.
In addition, it’s unclear what you’re using split for. Just increase a counter for the sequence:
#!/usr/bin/perl
use warnings;
use strict;
my $infile = $ARGV[0];
open my $IN, '<', $infile or die "Could not open $infile: $!, $?";
my $seqid = 1;
while (my $line = <$IN>) {
if ($line =~ /^>/) {
print ">Seq$seqid\n";
$seqid++;
} else {
print $line;
}
}

Simply write the new entries as you create them.
#!/usr/bin/perl
use warnings;
use strict;
my $infile = $ARGV[0];
open my $IN, '<', $infile or die "Could not open $infile: $!, $?";
my $outfile = 'output';
open my $OUT, '>', $outfile or die "Could not open $outfile: $!, $?"; # overwrites the file $outfile;
my $seqid = 1;
while (my $line = <$IN>) {
if ($line =~ /^>(.+)/) {
print $OUT "Seq$seqid\t$1\n"
print ">Seq$seqid\n";
$seqid++
} else {
print $line;
}
}
I tried to fix the indentation but left the gratutious variable for the $OUT file name.
If you want to keep the mapping in memory for other reasons (maybe to develop this into a much more complex script) using an array instead of a hash would seem like a natural way to keep the entries sorted; the new label is trivially derivable from the array index.

Related

perl to merge csv files removing the headings

I have several monthly reports in csv format in a folder. The csv files all have 8 common columns (with headings) . Using perl, I would like to merge these files together line by line.
say
file 1:
1,2,3,4,5,6,7,8,
a1,b1,c1,d1,e1,f1,g1,h1,
a1,b1,c1,d1,e1,f1,g1,h1,
a1,b1,c1,d1,e1,f1,g1,h1,
file 2:
1,2,3,4,5,6,7,8,
a2,b2,c2,d2,e2,f2,g2,h2,
a2,b2,c2,d2,e2,f2,g2,h2,
a2,b2,c2,d2,e2,f2,g2,h2,
I would like the output to look something like that (join the rows and remove the headings)
output:
1,2,3,4,5,6,7,8,
a1,b1,c1,d1,e1,f1,g1,h1,
a1,b1,c1,d1,e1,f1,g1,h1,
a1,b1,c1,d1,e1,f1,g1,h1,
a2,b2,c2,d2,e2,f2,g2,h2,
a2,b2,c2,d2,e2,f2,g2,h2,
a2,b2,c2,d2,e2,f2,g2,h2,
I have managed to save the names of the files in an array. but for some reason, I could not join them.
can you please help me figure out what is wrong with my code. I am quite new to perl.
#! C:Strawberry/perl/bin;
use feature ':5.12';
use strict;
use warnings;
my $data_directory = 'R:/testing_data/';
opendir( DIR, $data_directory ) or die "Could not open $data_directory $!\n";
my #files = grep {/_monthlyreport\.csv$/} readdir(DIR); #to get on the monthly reports csv files
foreach my $file (#files) {
open( HANR, "<", '$data_directory' . my $files ) or die "cannot open $files: $!"; #read handler
open( HANW, ">>", "G:/outputfile_script.csv" ) or die "error $! \n"; #write handler for creating new sorted files
my #lines = ();
#lines = <HANR>;
foreach my $line (#lines) {
chomp($line);
my $count++;
next unless $count; # skip header i.e the first line containing stock details
print HANW join $line, "\n";
}
my $count = -1;
close(HANW);
close(HANR);
}
closedir(DIR);
exit 0;
Your open statement to your input filehandle is malformed, and my $count++; is also broken.
I'd also recommend modernizing your code by using lexical file handles. The following is a cleaned up version of your code:
use feature ':5.12';
use strict;
use warnings;
use autodie;
my $data_directory = 'R:/testing_data/';
opendir my $dh, "$data_directory";
open my $outfh, ">>", "G:/outputfile_script.csv";
my $seenheader = 0;
while (my $file = readdir $dh) {
next unless $file =~ /_monthlyreport\.csv$/;
open my $infh, '<', "$data_directory/$file";
while (<$infh>) {
print $outfh $_ if $. > 1 || ! $seenheader++;
}
}
This line is wrong.
open(HANR ,"<",'$data_directory'.my $files) or die "cannot open $files: $!";
Should be
open(HANR ,"<","$data_directory".$files) or die "cannot open $files: $!";
Add a counter and stop printing if the counter equals 0;
#! C:Strawberry/perl/bin;
use feature ':5.12';
use strict;
use warnings;
my $data_directory = 'R:/testing_data/';
opendir(DIR,$data_directory) or die "Could not open $data_directory $!\n";
my #files = grep {/_monthlyreport\.csv$/} readdir(DIR); #to get on the monthly reports csv files
foreach my $file (#files) {
open(HANR ,"<",'$data_directory'.my $files) or die "cannot open $files: $!"; #read handler
open(HANW , ">>","G:/outputfile_script.csv") or die "error $! \n"; #write handler for creating new sorted files
my #lines=();
#lines=<HANR>;
my $i =0;
foreach my $line (#lines){
next if ($i==0) ;
chomp ($line) ;
my $count++;
next unless $count; # skip header i.e the first line containing stock details
print HANW join $line,"\n";
}
my $count= -1;
close(HANW);
close(HANR);
}
closedir(DIR);
exit 0;

Can't write to the file

Why can't I write output to the input file?
It prints it well, but isn't writing to the file.
my $i;
my $regex = $ARGV[0];
for (#ARGV[1 .. $#ARGV]){
open (my $fh, "<", "$_") or die ("Can't open the file[$_] ");
$i++;
foreach (<$fh>){
open (my $file, '>>', '/results.txt') or die ("Can't open the file "); #input file
for (<$file>){
print "Given regexp: $regex\nfile$i:\n line $.: $1\n" if $_ =~ /\b($regex)\b/;
}
}
}
It's unclear whether your problem has been solved.
My best guess is that you want your program to search for the regex passed as the first parameter in the files named in the following paramaters, appending the results to results.txt.
If that is right, then this is closer to what you need
use strict;
use warnings;
use autodie;
my $i;
my $regex = shift;
open my $out, '>>', 'results.txt';
for my $filename (#ARGV) {
open my $fh, '<', $filename;
++$i;
while (<$fh>) {
next unless /\b($regex)\b/;
print $out "Given regexp: $regex\n";
print $out "file$i:\n";
print $out "line $.: $1\n";
last;
}
}

Extracting specific multiple line of records that is pipe delimited in perl

I have a file that looks like
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT
I want to separate the records by country. I have stored each line into array variable #fields
my #fields = split(/\|/, $_ );
making $fields[3] as my basis for sorting it. I wanted it to separate into 2 output text files
OUTPUT TEXT FILE 1:
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT
OUTPUT TEXT FILE 2
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
Putting all that is from JPN to output text 1 & non-JPN country to output text file 2
here's the code that what trying to work out
use strict;
use warnings;
use Data::Dumper;
use Carp qw(croak);
my #fields;
my $tmp_var;
my $count;
;
my ($line, $i);
my $filename = 'data.txt';
open(my $input_fh, '<', $filename ) or croak "Can't open $filename: $!";
open(OUTPUTA, ">", 'JPN.txt') or die "wsl_reformat.pl: could not open $ARGV[0]";
open(OUTPUTB, ">", 'Non-JPN.txt') or die "wsl_reformat.pl: could not open $ARGV[0]";
my $fh;
while (<$input_fh>) {
chomp;
my #fields = split /\|/;
if ($fields[0] eq 'NAME') {
for ($i=1; $i < #fields; $i++) {
if ($fields[3] eq 'JPN') {
$fh = $_;
print OUTPUTA $fh;
}
else {
$fh = $_;
print OUTPUTB $fh;
}
}
}
}
close(OUTPUTA);
close(OUTPUTB)
Still has no luck on it :(
Here is the way I think ikegami was saying, but I've never tried this before (although it gave the correct results).
#!/usr/bin/perl
use strict;
use warnings;
open my $jpn_fh, ">", 'o33.txt' or die $!;
open my $other_fh, ">", 'o44.txt' or die $!;
my $fh;
while (<DATA>) {
if (/^NAME/) {
if (/JPN$/) {
$fh = $jpn_fh;
}
else {
$fh = $other_fh;
}
}
print $fh $_;
}
close $jpn_fh or die $!;
close $other_fh or die $!;
__DATA__
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT
You didn't say what you needed help with, so I'm assuming it's coming up with an algorithm. Here's a good one:
Open the file to read.
Open the file for the JPN entries.
Open the file for the non-JPN entries.
While not eof,
Read a line.
Parse the line.
If it's the first line of a record,
If the person's country is JPN,
Set current file handle to the file handle for JPN entries.
Else,
Set current file handle to the file handle for non-JPN entries.
Print the line to the current file handle.
my $jpn_qfn = '...';
my $other_qfn = '...';
open(my $jpn_fh, '>', $jpn_qfn)
or die("Can't create $jpn_qfn: $!\n");
open(my $other_fh, '>', $other_qfn)
or die("Can't create $other_qfn: $!\n");
my $fh;
while (<>) {
chomp;
my #fields = split /\|/;
if ($fields[0] eq 'NAME') {
$fh = $fields[3] eq 'JPN' ? $jpn_fh : $other_fh;
}
say $fh $_;
}
#!/usr/bin/env perl
use 5.012;
use autodie;
use strict;
use warnings;
# store per country output filehandles
my %output;
# since this is just an example, read from __DATA__ section
while (my $line = <DATA>) {
# split the fields
my #cells = split /[|]/, $line;
# if first field is NAME, this is a new record
if ($cells[0] eq 'NAME') {
# get the country code, strip trailing whitespace
(my $country = $cells[3]) =~ s/\s+\z//;
# if we haven't created and output file for this
# country, yet, do so
unless (defined $output{$country}) {
open my $fh, '>', "$country.out";
$output{$country} = $fh;
}
my $out = $output{$country};
# output this and the next two lines to
# country specific output file
print $out $line, scalar <DATA>, scalar <DATA>;
}
}
close $_ for values %output;
__DATA__
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT
Thanks for your Help heaps
I was able to solved this problem in perl,
many thanks
#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Carp qw(croak);
my #fields;
my $tmp_var;
my ($rec_type, $country);
my $filename = 'data.txt';
open (my $input_fh, '<', $filename ) or croak "Can't open $filename: $!";
open my $OUTPUTA, ">", 'o33.txt' or die $!;
open my $OUTPUTB, ">", 'o44.txt' or die $!;
my $Combline;
while (<$input_fh>) {
$_ = _trim($_);
#fields = split (/\|/, $_);
$rec_type = $fields[0];
$country = $fields[3];
if ($rec_type eq 'NAME') {
if ($country eq 'JPN') {
*Combline = $OUTPUTA;
}
else {
*Combline = $OUTPUTB;
}
}
print Combline;
}
close $OUTPUTA or die $!;
close $OUTPUTB or die $!;
sub _trim {
my $word = shift;
if ( $word ) {
$word =~ s/\s*\|/\|/g; #remove trailing spaces
$word =~ s/"//g; #remove double quotes
}
return $word;
}

copy text after a specific string from a file and append to another in perl

I want to extract the desired information from a file and append it into another. the first file consists of some lines as the header without a specific pattern and just ends with the "END OF HEADER" string. I wrote the following code for find the matching line for end of the header:
$find = "END OF HEADER";
open FILEHANDLE, $filename_path;
while (<FILEHANDLE>) {
my $line = $_;
if ($line =~ /$find/) {
#??? what shall I do here???
}
}
, but I don't know how can I get the rest of the file and append it to the other file.
Thank you for any help
I guess if the content of the file isn't enormous you can just load the whole file in a scalar and just split it with the "END OF HEADER" then print the output of the right side of the split in the new file (appending)
open READHANDLE, 'readfile.txt' or die $!;
my $content = do { local $/; <READHANDLE> };
close READHANDLE;
my (undef,$restcontent) = split(/END OF HEADER/,$content);
open WRITEHANDLE, '>>writefile.txt' or die $!;
print WRITEHANDLE $restcontent;
close WRITEHANDLE;
This code will take the filenames from the command line, print all files up to END OF HEADER from the first file, followed by all lines from the second file. Note that the output is sent to STDOUT so you will have to redirect the output, like this:
perl program.pl headfile.txt mainfile.txt > newfile.txt
Update Now modified to print all of the first file after the line END OF HEADER followed by all of the second file
use strict;
use warnings;
my ($header_file, $main_file) = #ARGV;
open my $fh, '<', $header_file or die $!;
my $print;
while (<$fh>) {
print if $print;
$print ||= /END OF HEADER/;
}
open $fh, '<', $main_file or die $!;
print while <$fh>;
use strict;
use warnings;
use File::Slurp;
my #lines = read_file('readfile.txt');
while ( my $line = shift #lines) {
next unless ($line =~ m/END OF HEADER/);
last;
}
append_file('writefile.txt', #lines);
I believe this will do what you need:
use strict;
use warnings;
my $find = 'END OF HEADER';
my $fileContents;
{
local $/;
open my $fh_read, '<', 'theFile.txt' or die $!;
$fileContents = <$fh_read>;
}
my ($restOfFile) = $fileContents =~ /$find(.+)/s;
open my $fh_write, '>>', 'theFileToAppend.txt' or die $!;
print $fh_write $restOfFile;
close $fh_write;
my $status = 0;
my $find = "END OF HEADER";
open my $fh_write, '>', $file_write
or die "Can't open file $file_write $!";
open my $fh_read, '<', $file_read
or die "Can't open file $file_read $!";
LINE:
while (my $line = <$fh_read>) {
if ($line =~ /$find/) {
$status = 1;
next LINE;
}
print $fh_write $line if $status;
}
close $fh_read;
close $fh_write;

merging two files using perl keeping the copy of original file in other file

I have to files like A.ini and B.ini ,I want to merge both the files in A.ini
examples of files:
A.ini::
a=123
b=xyx
c=434
B.ini contains:
a=abc
m=shank
n=paul
my output in files A.ini should be like
a=123abc
b=xyx
c=434
m=shank
n=paul
I want to this merging to be done in perl language and I want to keep the copy of old A.ini file at some other place to use old copy
A command line variant:
perl -lne '
($a, $b) = split /=/;
$v{$a} = $v{$a} ? $v{$a} . $b : $_;
END {
print $v{$_} for sort keys %v
}' A.ini B.ini >NEW.ini
How about:
#!/usr/bin/perl
use strict;
use warnings;
my %out;
my $file = 'path/to/A.ini';
open my $fh, '<', $file or die "unable to open '$file' for reading: $!";
while(<$fh>) {
chomp;
my ($key, $val) = split /=/;
$out{$key} = $val;
}
close $fh;
$file = 'path/to/B.ini';
open my $fh, '<', $file or die "unable to open '$file' for reading: $!";
while(<$fh>) {
chomp;
my ($key, $val) = split /=/;
if (exists $out{$key}) {
$out{$key} .= $val;
} else {
$out{$key} = $val;
}
}
close $fh;
$file = 'path/to/A.ini';
open my $fh, '>', $file or die "unable to open '$file' for writing: $!";
foreach(keys %out) {
print $fh $_,'=',$out{$_},"\n";
}
close $fh;
The two files to be merged can be read in a single pass and don't need to be treated as separate source files. That allows the use of <> to read all files passed as parameters on the command line.
Keeping a backup copy of A.ini is simply a matter of renaming it before writing the merged data to a new file of the same name.
This program appears to do what you need.
use strict;
use warnings;
my $file_a = $ARGV[0];
my (#keys, %values);
while (<>) {
if (/\A\s*(.+?)\s*=\s*(.+?)\s*\z/) {
push #keys, $1 unless exists $values{$1};
$values{$1} .= $2;
}
}
rename $file_a, "$file_a.bak" or die qq(Unable to rename "$file_a": $!);
open my $fh, '>', $file_a or die qq(Unable to open "$file_a" for output: $!);
printf $fh "%s=%s\n", $_, $values{$_} for #keys;
output (in A.ini)
a=123abc
b=xyx
c=434
m=shank
n=paul