How many different ways are there to concatenate two files line by line using Perl? - perl

Suppose file1 looks like this:
bye bye
hello
thank you
And file2 looks like this:
chao
hola
gracias
The desired output is this:
bye bye chao
hello hola
thank you gracias
I myself have already come up with five different approaches to solve this problem. But I think there must be more ways, probably more concise and more elegant ways, and I hope I can learn more cool stuff :)
The following is what I have tried so far, based on what I've learnt from the many solutions of my previous problems. Also, I'm trying to sort of digest or internalize the knowledge I've acquired from the Llama book.
Code 1:
#!perl
use autodie;
use warnings;
use strict;
open my $file1,'<','c:/file1.txt';
open my $file2,'<','c:/file2.txt';
while(defined(my $line1 = <$file1>)
and defined(my $line2 = <$file2>)){
die "Files are different sizes!\n" unless eof(file1) == eof(file2);
$line1 .= $line2;
$line1 =~ s/\n/ /;
print "$line1 \n";
}
Code 2:
#!perl
use autodie;
use warnings;
use strict;
open my $file1,'<','c:/file1.txt';
my #file1 = <$file1>;
open my $file2,'<','c:/file2.txt';
my #file2 =<$file2>;
for (my $n=0; $n<=$#file1; $n++) {
$file1[$n] .=$file2[$n];
$file1[$n]=~s/\n/ /;
print $file1[$n];
}
Code 3:
#!perl
use autodie;
use warnings;
use strict;
open my $file1,'<','c:/file1.txt';
open my $file2,'<','c:/file2.txt';
my %hash;
while(defined(my $line1 = <$file1>)
and defined(my $line2 = <$file2>)) {
chomp $line1;
chomp $line2;
my ($key, $val) = ($line1,$line2);
$hash{$key} = $val;
}
print map { "$_ $hash{$_}\n" } sort keys %hash;
Code 4:
#!perl
use autodie;
use warnings;
use strict;
open my $file1,'<','c:/file1.txt';
open my $file2,'<','c:/file2.txt';
while(defined(my $line1 = <$file1>)
and defined(my $line2 = <$file2>)) {
$line1 =~ s/(.+)/$1 $line2/;
print $line1;
}
Code 5:
#!perl
use autodie;
use warnings;
use strict;
open my $file1,'<','c:/file1.txt';
my #file1 =<$file1>;
open my $file2,'<','c:/file2.txt';
my #file2 =<$file2>;
while ((#file1) && (#file2)){
my $m = shift (#file1);
chomp($m);
my $n = shift (#file2);
chomp($n);
$m .=" ".$n;
print "$m \n";
}
I have tried something like this:
foreach $file1 (#file2) && foreach $file2 (#file2) {...}
But Perl gave me a syntactic error warning. I was frustrated. But can we run two foreach loops simultaneously?
Thanks, as always, for any comments, suggestions and of course the generous code sharing :)

This works for any number of files:
use strict;
use warnings;
use autodie;
my #handles = map { open my $h, '<', $_; $h } #ARGV;
while (#handles){
#handles = grep { ! eof $_ } #handles;
my #lines = map { my $v = <$_>; chomp $v; $v } #handles;
print join(' ', #lines), "\n";
}
close $_ for #handles;

The most elegant way doesn't involve perl at all:
paste -d' ' file1 file2

If I were a golfing man, I could rewrite #FM's answer as:
($,,$\)=(' ',"\n");#_=#ARGV;open $_,$_ for #_;print
map{chomp($a=<$_>);$a} #_=grep{!eof $_} #_ while #_
which you might be able to turn into a one-liner but that is just evil. ;-)
Well, here it is, under 100 characters:
C:\Temp> perl -le "$,=' ';#_=#ARGV;open $_,$_ for #_;print map{chomp($a =<$_>);$a} #_=grep{!eof $_ }#_ while #_" file1 file2
If it is OK to slurp (and why the heck not — we are looking for different ways), I think I have discovered the path the insanity:
#_=#ARGV;chomp($x[$.-1]{$ARGV}=$_) && eof
and $.=0 while<>;print "#$_{#_}\n" for #x
C:\Temp> perl -e "#_=#ARGV;chomp($x[$.-1]{$ARGV}=$_) && eof and $.=0 while<>;print qq{#$_{#_}\n} for #x" file1 file2
Output:
bye bye chao
hello hola
thank you gracias

An easier alternative to your Code 5 which allows for an arbitrary number of lines and does not care if files have different numbers of lines (hat tip #FM):
#!/usr/bin/perl
use strict; use warnings;
use File::Slurp;
use List::AllUtils qw( each_arrayref );
my #lines = map [ read_file $_ ], #ARGV;
my $it = each_arrayref #lines;
while ( my #lines = grep { defined and chomp and length } $it->() ) {
print join(' ', #lines), "\n";
}
And, without using any external modules:
#!perl
use autodie; use warnings; use strict;
my ($file1, $file2) = #ARGV;
open my $file1_h,'<', $file1;
my #file1 = grep { chomp; length } <$file1_h>;
open my $file2_h,'<', $file2;
my #file2 = grep { chomp; length } <$file2_h>;
my $n_lines = #file1 > #file2 ? #file1 : #file2;
for my $i (0 .. $n_lines - 1) {
my ($line1, $line2) = map {
defined $_ ? $_ : ''
} $file1[$i], $file2[$i];
print $line1, ' ', $line2, "\n";
}
If you want to concatenate only the lines that appear in both files:
#!perl
use autodie; use warnings; use strict;
my ($file1, $file2) = #ARGV;
open my $file1_h,'<', $file1;
my #file1 = grep { chomp; length } <$file1_h>;
open my $file2_h,'<', $file2;
my #file2 = grep { chomp; length } <$file2_h>;
my $n_lines = #file1 < #file2 ? #file1 : #file2;
for my $i (0 .. $n_lines - 1) {
print $file1[$i], ' ', $file2[$i], "\n";
}

An easy one with minimal error checking:
#!/usr/bin/perl -w
use strict;
open FILE1, '<file1.txt';
open FILE2, '<file2.txt';
while (defined(my $one = <FILE1>) or defined(my $twotemp = <FILE2>)){
my $two = $twotemp ? $twotemp : <FILE2>;
chomp $one if ($one);
chomp $two if ($two);
print ''.($one ? "$one " : '').($two ? $two : '')."\n";
}
And no, you can't run two loops simultaneous within the same thread, you'd have to fork, but that would not be guaranteed to run synchronously.

Related

perl array prints as GLOB(#x#########)

I have a file which contains a list of email addresses which are separated by a semi-colon which is configured much like this (but much larger) :
$ cat email_casper.txt
casper1#foo.com; casper2#foo.com; casper3#foo.com; casper.casper4#foo.com;
#these throw outlook error :
#casper101#foo.com ; casper100#foo.com
#cat /tmp/emailist.txt | tr '\n' '; '
#cat /tmp/emallist.txt | perl -nle 'print /\<(.*)\>/' | sort
I want to break it up on the semicolon - so I suck the whole file into an array supposedly the contents are split on semicolon.
#!/usr/bin/perl
use strict;
use warnings;
my $filename = shift #ARGV ;
open(my $fh, '<', $filename) or die "Could not open file $filename $!";
my #values = split(';', $fh);
foreach my $val (#values) {
print "$val\n";
}
exit 0 ;
But the file awards me with a golb. I just don't know what is going one.
$ ./spliton_semi.pl email_casper.txt
GLOB(0x80070b90)
If I use Data::Dumper I get
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper ;
my $filename = shift #ARGV ;
open(my $fh, '<', $filename) or die "Could not open file $filename $!";
my #values = split(';', $fh);
print Dumper \#values ;
This is what the Dumper returns :
$ ./spliton_semi.pl email_casper.txt
$VAR1 = [
'GLOB(0x80070b90)'
];
You do not "suck the whole file into an array". You don't even attempt to read from the file handle. Instead, you pass the file handle to split. Expecting a string, it stringifies the file handle into GLOB(0x80070b90).
You could read the file into an array of lines as follows:
my #lines = <$fh>;
for my $line ($lines) {
...
}
But it's far simpler to read one line at a time.
while ( my $line = <$fh> ) {
...
}
In fact, there is no reason not to use ARGV here, simplifying your program to the following:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw( say );
while (<>) {
chomp;
say for split /\s*;\s*/, $_;
}
This line
my #values = split(';', $fh);
is not reading from the filehandle like you think it is. You're actually calling split on the filehandle object itself.
You want this:
my $line = <$fh>;
my #values = split(';', $line);
Starting point:
#!/usr/bin/perl
use strict;
use warnings;
open(my $fh, '<', 'dummy.txt')
or die "$!";
my #values;
while (<$fh>) {
chomp;
# original
# push(#values, split(';', $_));
# handle white space
push(#values, split(/\s*;\s*/, $_));
}
close($fh);
foreach my $val (#values) {
print "$val\n";
}
exit 0;
Output for your example:
$ perl dummy.pl
casper1#foo.com
casper2#foo.com
casper3#foo.com
casper.casper4#foo.com

perl- extract duplicate sequences from a multi-fasta file

I have a big fasta file input.fasta which consists many duplicate sequences. I want to enter a header name and extract out all the sequences with the matching header. I know this could be done easily done with awk/sed/grep but I need a Perl code.
input.fasta
>OGH38127_some_organism
PAAALGFSHLARQEDSALTPKHYTWTAPGEGDVRAPCPVLNTLANHEFLPHNGKNITVDK
AITALGDAMNISPALATTFFTGGLKTNPTPNATWFDLDMLHKHNVLEHDGSLSRRDMHFD
TSNKFDAATFANFLSYFDANATVLGVNETADARARHAYDMSKMNPEFTITSSMLPIMVGE
SVMMMLVWGSVEEPGAQRDYFEYFFRNERLPVELGWTPGETEIGVPVVTAMITAMVAASP
TDVP
>ABC14110_some_different_org_name
WWVAPGPGDSRGPCPGLNTLANHGYLPHDGKGITLSILADAMLDGFNIARSDALLLFTQ
AIRTSPQYPATNSFNLHDLGRDQLNRHNVLEHDASLSRADDFFGSNHIFNETVFDESRAY
AMLANSKIARQINSKAFNPQYKFTSKTEQFSLGEIAAPIIAFGNSTSGEVNRTLVEYFFM
NERLPIELGWKKSEDGIALDDILRVTQMISKAASLITPSALSWTAETLTP
>OGH38127_some_organism
LPWSRPGPGAVRAPCPMLNTLANHGFLPHDGKNISEARTVQALGRALNIEKELSQFLFEK
ALTTNPHTNATTFSLNDLSRHNLLEHDASLSRQDAYFGDNHDFNQTIFDETRSYWPHPVI
DIQAAALSRQARVNTSIAKNPTYNMSELGLDFSYGETAAYILILGDKDFGKVNRSWVEYL
FENERLPVELGWTRHNETITSDDLNTMLEKVVN
.
.
.
I have tried with the following script but it is not giving any output.
script.pl
#!/perl/bin/perl -w
use strict;
use warnings;
print "Enter a fasta header to search for:\n";
my $head = <>;
my $file = "input.fasta";
open (READ, "$file") || die "Cannot open $file: $!.\n";
my %seqs;
my $header;
while (my $line = <READ>){
chomp $line;
$line =~ s/^>(.*)\n//;
if ($line =~ m/$head/){
$header = $1;
}
}
close (READ);
open( my $out , ">", "out.fasta" ) or die $!;
my #count_seq = keys %seqs;
foreach (#count_seq){
print $out $header, "\n";
print $out $seqs{$header}, "\n";
}
exit;
Please help me correct this script.
Thanks!
If you use the Bioperl module Bio::SeqIO to handle the parsing of the fasta files, it becomes really simple:
#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;
my ($file, $name) = #ARGV;
my $in = Bio::SeqIO->new(-file => $file, -format => "fasta");
my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => "fasta");
while (my $s = $in->next_seq) {
$out->write_seq($s) if $s->display_id eq $name;
}
run with perl grep_fasta.pl input.fasta OGH38127_some_organism
There's no need to store the sequences in memory, you can print them directly when reading the file. Use a flag variable ($inside in the example) that tells you whether you're reading the desired sequence or not.
#! /usr/bin/perl
use warnings;
use strict;
my ($file, $header) = #ARGV;
my $inside;
open my $in, '<', $file or die $!;
while (<$in>) {
$inside = $1 eq $header if /^>(.*)/;
print if $inside;
}
Run as
perl script.pl file.fasta OGH38127_some_organism > output.fasta

perl code for splitting a single file by matching charater into multiple files

i want to split a large data file into multiple files where ever it matches a '^' character.
#!/usr/bin/perl -w
use strict;
print "enter the data file name";
chomp( my $a=<STDIN> );
open (<READ>,"$a")| "error";
while ($line=<READ>)
{
my #array=split(" ",$line) unless ^ ;
after splitting of the data file. A total of 23 files will be created
Here's a slight different answer from saiprathapreddy.obula's:
use warnings;
use strict;
my ($file) = #ARGV;
open(my $input, "<$file");
local $/ = "^";
my $i = 0;
while(<>){
chomp;
$i++;
open(my $output, ">file$i.txt");
print $output "$_";
}
$,="";
$"="";
my $i=1;
open OUT ,">DATA_${i}.txt";
while(<>){
chomp;
my #F=split(/\^/);
if(#F==1){
print OUT $_,"\n";
}
elsif(#F>1){
$i++;
close OUT;
open OUT ,">DATA_${i}.txt";
print OUT "#F[1..$#F] \n";
}
}
close OUT;
Here is a cleaned up and tested version of the program of saiprathapreddy.obula.
use strict;
use warnings;
open(FILE,'AUTOSAR.txt');
local $/;
my $var = <FILE>;
close FILE;
my #arr = split('\^',$var);
my $i=0;
foreach (#arr) {
$i++;
open(FILE1,">$i.txt");
print FILE1;
close FILE1;
}
use strict;
open(FILE,'AUTOSAR.txt');
local $/;
my $var = <FILE>;
my #arr = split('\^',$var);
my $i=0;
foreach (#arr) {
$i++;
open(FILE1,">$i.txt");
print FILE1 $_;
close FILE1;
}
close FILE;

Replacing a pattern in file without reading the whole file- Perl

I want to replace a pattern 'good' with 'bad' in my file. Currently what I am doing is:
#!/perl/bin/perl
$filename= "abc.txt";
open my $fh, $filename;
my $text = do { local( $/ ); <$fh> };
close $fh;
$text =~s/good/bad/g;
Is there any way I can do this without reading the whole file??
Edit: Suppose I know that there's only 1 'good' in the file.
P.S.,Hi I am new here. Hope I am doing it correctly.
You could use Tie::File:
#!/usr/bin/perl
use strict;
use Tie::File;
my $filename = ...;
tie my #lines, 'Tie::File', $filename or die;
for (#lines) {
s/good/bad/g and last;
}
Too make sure that not the whole file is slurped in you want to read the lines one-by-one, e.g. using:
#!/usr/bin/perl
use strict;
use Tie::File;
my $filename = ...;
tie my #lines, 'Tie::File', $filename or die;
for(my $i=0; ; $i++) {
last if !defined $lines[$i]; # eof
last if $lines[$i] =~ s/good/bad/g;
}

how to compare the the array values with different file in different directory?

#!/usr/bin/perl
use strict;
use warnings;
use warnings;
use 5.010;
my #names = ("RD", "HD", "MP");
my $flag = 0;
my $filename = 'Sample.txt';
if (open(my $fh, '<', $filename))
{
while (my $row = <$fh>)
{
foreach my $i (0 .. $#names)
{
if( scalar $row =~ / \G (.*?) ($names[$i]) /xg )
{
$flag=1;
}
}
}
if( $flag ==1)
{
say $filename;
}
$flag=0;
}
here i read the content from one file and compare with array values, if file contant matches with array value i just display the file. in the same way how can i access different file from different direcory and compare the array values with same?
Q: How can I access a different file?
A: By specifying a different filename.
By the way: If you are using flags for loop control in Perl, you are doing something wrong. You can specify that this was the last iteration of the loop (in C: break), or that you want to start the next iteration. You can label the loops so that you can break out of as many loops as you like at once:
#!/usr/bin/perl
use 5.010; use warnings;
my #names = qw(RD HD MP);
# unpack command line arguments
my ($filename) = #ARGV;
open my $fh, "<", $filename or die "Oh noes, $filename is bad: $!";
LINE:
while (my $line = <$fh>) {
NAME:
foreach my $name (#names) {
if ($line =~ /\Q$name\E/) { # \QUOT\E the $name to escape everything
say "$filename contains $name";
last LINE;
}
}
}
Other highlights:
using a foreach loop as intended and
removing the (in this context) senseless \G assertion
You can then execute the script as perl script.pl Sample.txt or perl script.pl ../another.dir/foo.bar or whatever.
You can use the ~~ operator in Perl 5.10.
Don't forget to chomp the trailing whitespace.
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
my #names = ('RD', 'HD', 'MP');
my $other_dir = '/tmp';
my $filename = 'Sample.txt';
if ( open( my $fh, '<', "$other_dir/$filename" ) ) {
ROW:
while ( my $row = <$fh> ) {
chomp $row; # remove trailing \n
if ( $row ~~ #names ) {
say $filename;
last ROW;
}
}
}
close $fh;