copy text after a specific string from a file and append to another in perl - perl

I want to extract the desired information from a file and append it into another. the first file consists of some lines as the header without a specific pattern and just ends with the "END OF HEADER" string. I wrote the following code for find the matching line for end of the header:
$find = "END OF HEADER";
open FILEHANDLE, $filename_path;
while (<FILEHANDLE>) {
my $line = $_;
if ($line =~ /$find/) {
#??? what shall I do here???
}
}
, but I don't know how can I get the rest of the file and append it to the other file.
Thank you for any help

I guess if the content of the file isn't enormous you can just load the whole file in a scalar and just split it with the "END OF HEADER" then print the output of the right side of the split in the new file (appending)
open READHANDLE, 'readfile.txt' or die $!;
my $content = do { local $/; <READHANDLE> };
close READHANDLE;
my (undef,$restcontent) = split(/END OF HEADER/,$content);
open WRITEHANDLE, '>>writefile.txt' or die $!;
print WRITEHANDLE $restcontent;
close WRITEHANDLE;

This code will take the filenames from the command line, print all files up to END OF HEADER from the first file, followed by all lines from the second file. Note that the output is sent to STDOUT so you will have to redirect the output, like this:
perl program.pl headfile.txt mainfile.txt > newfile.txt
Update Now modified to print all of the first file after the line END OF HEADER followed by all of the second file
use strict;
use warnings;
my ($header_file, $main_file) = #ARGV;
open my $fh, '<', $header_file or die $!;
my $print;
while (<$fh>) {
print if $print;
$print ||= /END OF HEADER/;
}
open $fh, '<', $main_file or die $!;
print while <$fh>;

use strict;
use warnings;
use File::Slurp;
my #lines = read_file('readfile.txt');
while ( my $line = shift #lines) {
next unless ($line =~ m/END OF HEADER/);
last;
}
append_file('writefile.txt', #lines);

I believe this will do what you need:
use strict;
use warnings;
my $find = 'END OF HEADER';
my $fileContents;
{
local $/;
open my $fh_read, '<', 'theFile.txt' or die $!;
$fileContents = <$fh_read>;
}
my ($restOfFile) = $fileContents =~ /$find(.+)/s;
open my $fh_write, '>>', 'theFileToAppend.txt' or die $!;
print $fh_write $restOfFile;
close $fh_write;

my $status = 0;
my $find = "END OF HEADER";
open my $fh_write, '>', $file_write
or die "Can't open file $file_write $!";
open my $fh_read, '<', $file_read
or die "Can't open file $file_read $!";
LINE:
while (my $line = <$fh_read>) {
if ($line =~ /$find/) {
$status = 1;
next LINE;
}
print $fh_write $line if $status;
}
close $fh_read;
close $fh_write;

Related

Write old fasta header and new to file

I want to extract the old fasta names which looks something like this:
>Bartonella bibbi
AUUCCGGUUGAUCCUGCCGGAGGCCACUGCUAUCGGGGUCCG
The new headers should look like this:
>Seq1
AUUCCGGUUGAUCCUGCCGGAGGCCACUGCUAUCGGGGUCCG
and so on...
The Bartonella Bibbi should be saved together with the new name Seq1 in a new file an so on. So I've started a bit, by looking for lines with >, and then I split to get an array to get the old name. I don't know how to continue, because I want two things here, first to put the new name in there, but also extracting the old name together with the new in a file, and ALSO get an output file with my sequence and my new names. Please, any input from you will help!
#!/usr/bin/perl
use warnings;
use strict;
my $infile = $ARGV[0];
open my $IN, '<', $infile or die "Could not open $infile: $!, $?";
while (my $line = <$IN>) {
if ($line =~ /^>/) {
my #header = split (/\>/, $line);
my $oldfasta = "$header[1]";
}
}
So after some edits, this is the current script:
#!/usr/bin/perl
use warnings;
use strict;
my $infile = $ARGV[0];
open my $IN, '<', $infile or die "Could not open $infile: $!, $?";
my $seqid = 1;
my %id;
while (my $line = <$IN>) {
if ($line =~ /^>/) {
$id{"Seq$seqid "} = $line;
print ">Seq$seqid\n";
$seqid++
} else {
print $line;
}
}
my $outfile = 'output';
open my $OUT, '>', $outfile or die "Could not open $outfile: $!, $?"; # overwrites the file $outfile;
print $OUT %id;
This gives me a file that looks like this:
Seq29 >Sulfophobococcus_zilligii
Seq20 >Pyrococcus_shinkaii
and so on.
They are not in order, how do I sort them and get rid of the > in the species name?
You’re simply not printing anything. Once you add a print statement, it should work.
In addition, it’s unclear what you’re using split for. Just increase a counter for the sequence:
#!/usr/bin/perl
use warnings;
use strict;
my $infile = $ARGV[0];
open my $IN, '<', $infile or die "Could not open $infile: $!, $?";
my $seqid = 1;
while (my $line = <$IN>) {
if ($line =~ /^>/) {
print ">Seq$seqid\n";
$seqid++;
} else {
print $line;
}
}
Simply write the new entries as you create them.
#!/usr/bin/perl
use warnings;
use strict;
my $infile = $ARGV[0];
open my $IN, '<', $infile or die "Could not open $infile: $!, $?";
my $outfile = 'output';
open my $OUT, '>', $outfile or die "Could not open $outfile: $!, $?"; # overwrites the file $outfile;
my $seqid = 1;
while (my $line = <$IN>) {
if ($line =~ /^>(.+)/) {
print $OUT "Seq$seqid\t$1\n"
print ">Seq$seqid\n";
$seqid++
} else {
print $line;
}
}
I tried to fix the indentation but left the gratutious variable for the $OUT file name.
If you want to keep the mapping in memory for other reasons (maybe to develop this into a much more complex script) using an array instead of a hash would seem like a natural way to keep the entries sorted; the new label is trivially derivable from the array index.

Can't write to the file

Why can't I write output to the input file?
It prints it well, but isn't writing to the file.
my $i;
my $regex = $ARGV[0];
for (#ARGV[1 .. $#ARGV]){
open (my $fh, "<", "$_") or die ("Can't open the file[$_] ");
$i++;
foreach (<$fh>){
open (my $file, '>>', '/results.txt') or die ("Can't open the file "); #input file
for (<$file>){
print "Given regexp: $regex\nfile$i:\n line $.: $1\n" if $_ =~ /\b($regex)\b/;
}
}
}
It's unclear whether your problem has been solved.
My best guess is that you want your program to search for the regex passed as the first parameter in the files named in the following paramaters, appending the results to results.txt.
If that is right, then this is closer to what you need
use strict;
use warnings;
use autodie;
my $i;
my $regex = shift;
open my $out, '>>', 'results.txt';
for my $filename (#ARGV) {
open my $fh, '<', $filename;
++$i;
while (<$fh>) {
next unless /\b($regex)\b/;
print $out "Given regexp: $regex\n";
print $out "file$i:\n";
print $out "line $.: $1\n";
last;
}
}

To replace a string and append a string in perl in 1 file

I want to replace a line in my file and after replacing it I want to append another line. As you can see here, I have to open and close files for 2 times. Can I do it by opening a file only once? Thanks
use strict;
use warnings;
open(FILE,"tmp1.txt") || die "Can't open file: $!";
undef $/;
my $file = <FILE>;
my #lines = <FILE>;
my #newlines;
for each(#lines) {
$_ =~ s/hello/hi/g;
push(#newlines,$_);
}
close(FILE);
open(FILE, "> tmp1.txt ") || die "File not found";
print FILE #newlines;
close(FILE);
open(FILE,"tmp1.txt") || die "Can't open file: $!";
undef $/;
my $file = <FILE>;
my #lines = <FILE>;
my $first_line = "hi";
my $second_line = "sun";
my $insert = "good morning";
$file =~ s/\Q$first_line\E\n\Q$second_line\E/$first_line\n$insert\n$second_line/;
open(OUTPUT,"> tmp3.txt") || die "Can't open file: $!";
print OUTPUT $file;
close(OUTPUT);
Use Three-arg open and open your file in Read+Write mode by +<.

Read Increment Then Write to a text file in perl

I have this little perl script which opens a txt file, reads the number in it, then overwrites the file with the number incremented by 1. I can open and read from the file, I can write to the file but I"m having issues overwriting. In addition, I'm wondering if there is a way to do this without opening the file twice. Here's my code:
#!/usr/bin/perl
open (FILE, "<", "data.txt") or die "$! error trying to a\
ppend";
undef $/;
$number = <FILE>;
$number = int($number);
$myNumber = $number++;
print $myNumber+'\n';
close(FILE);
open(FILE, ">data.txt") or die "$! error";
print FILE $myNumber;
close(FILE);
Change the line
$myNumber = $number++;
to
$myNumber = $number+1;
That should solve the problem.
Below is how you could do by opening the file just once:
open(FILE, "+<data.txt") or die "$! error";
undef $/;
$number = <FILE>;
$number = int($number);
$myNumber = $number+1;
seek(FILE, 0, 0);
truncate(FILE, tell FILE);
print $myNumber+"\n";
print FILE $myNumber;
close(FILE);
It's good that you used the three-argument form of open the first time. You also needed to do that in your second open. Also, you should use lexical variables, i.e., those which begin with my, in your script--even for your file handles.
You can just increment the variable that holds the number, instead of passing it to a new variable. Also, it's a good idea to use chomp. This things being said, consider the following option:
#!/usr/bin/env perl
use strict;
use warnings;
undef $/;
open my $fhIN, "<", "data.txt" or die "Error trying to open for reading: $!";
chomp( my $number = <$fhIN> );
close $fhIN;
$number++;
open my $fhOUT, ">", "data.txt" or die "Error trying to open for writing: $!";
print $fhOUT $number;
close $fhOUT;
Another option is to use the Module File::Slurp, letting it handle all the I/O operations:
#!/usr/bin/env perl
use strict;
use warnings;
use File::Slurp qw/edit_file/;
edit_file { chomp; $_++ } 'data.txt';
Try this:
#!/usr/bin/perl
use strict;
use warnings;
my $file = "data.txt";
my $number = 0;
my $fh;
if( -e $file ) {
open $fh, "+<", $file or die "Opening '$file' failed, because $!\n";
$number = <$fh>;
seek( $fh, 0, 0 );
} else { # if no data.txt exists - yet
open $fh, ">", $file or die "Creating '$file' failed, because $!\n";
}
$number++;
print "$number\n";
print $fh $number;
close( $fh );
If you're using a bash shell, and you save the code to test.pl, you can test it with:
for i in {1..10}; do ./test.pl; done
Then 'cat data.txt', should show a 10.

merging two files using perl keeping the copy of original file in other file

I have to files like A.ini and B.ini ,I want to merge both the files in A.ini
examples of files:
A.ini::
a=123
b=xyx
c=434
B.ini contains:
a=abc
m=shank
n=paul
my output in files A.ini should be like
a=123abc
b=xyx
c=434
m=shank
n=paul
I want to this merging to be done in perl language and I want to keep the copy of old A.ini file at some other place to use old copy
A command line variant:
perl -lne '
($a, $b) = split /=/;
$v{$a} = $v{$a} ? $v{$a} . $b : $_;
END {
print $v{$_} for sort keys %v
}' A.ini B.ini >NEW.ini
How about:
#!/usr/bin/perl
use strict;
use warnings;
my %out;
my $file = 'path/to/A.ini';
open my $fh, '<', $file or die "unable to open '$file' for reading: $!";
while(<$fh>) {
chomp;
my ($key, $val) = split /=/;
$out{$key} = $val;
}
close $fh;
$file = 'path/to/B.ini';
open my $fh, '<', $file or die "unable to open '$file' for reading: $!";
while(<$fh>) {
chomp;
my ($key, $val) = split /=/;
if (exists $out{$key}) {
$out{$key} .= $val;
} else {
$out{$key} = $val;
}
}
close $fh;
$file = 'path/to/A.ini';
open my $fh, '>', $file or die "unable to open '$file' for writing: $!";
foreach(keys %out) {
print $fh $_,'=',$out{$_},"\n";
}
close $fh;
The two files to be merged can be read in a single pass and don't need to be treated as separate source files. That allows the use of <> to read all files passed as parameters on the command line.
Keeping a backup copy of A.ini is simply a matter of renaming it before writing the merged data to a new file of the same name.
This program appears to do what you need.
use strict;
use warnings;
my $file_a = $ARGV[0];
my (#keys, %values);
while (<>) {
if (/\A\s*(.+?)\s*=\s*(.+?)\s*\z/) {
push #keys, $1 unless exists $values{$1};
$values{$1} .= $2;
}
}
rename $file_a, "$file_a.bak" or die qq(Unable to rename "$file_a": $!);
open my $fh, '>', $file_a or die qq(Unable to open "$file_a" for output: $!);
printf $fh "%s=%s\n", $_, $values{$_} for #keys;
output (in A.ini)
a=123abc
b=xyx
c=434
m=shank
n=paul