Perl script to merge multiple files line by line - perl

Can anyone please help me write a Perl script that takes 5 text files as input and creates a new text file by merging the rows of all 5 files, line by line?
Should this be done by opening 5 read streams at a time, or does Perl have something like Java's random access file reader?
Thank You!

Here is a Perl script that will work on an arbitrary number of files:
use strict;
use warnings;
my @files = ('a.txt','b.txt');
my @fh;
#create an array of open filehandles.
@fh = map { open my $f, '<', $_ or die "Can't open $_: $!"; $f } @files;
open my $out_file, ">merged.txt" or die "can't open out_file: $!";
my $output;
do
{
$output = '';
foreach (@fh)
{
my $line = <$_>;
if (defined $line)
{
#Special case: might not be a newline at the end of the file
#add a newline if none is found.
$line .= "\n" if ($line !~ /\n$/);
$output .= $line;
}
}
print {$out_file} $output;
}
while ($output ne '');
a.txt:
foo1
foo2
foo3
foo4
foo5
b.txt:
bar1
bar2
bar3
merged.txt:
foo1
bar1
foo2
bar2
foo3
bar3
foo4
foo5

This program expects a list of files on the command line (or, on Unix systems, a wildcard file spec). It creates an array of filehandles, @fh, for these files and then reads from each of them in turn, printing the merged data to STDOUT.
use strict;
use warnings;
my @fh;
for (@ARGV) {
open my $fh, '<', $_ or die "Unable to open '$_' for reading: $!";
push @fh, $fh;
}
# eof needs an explicit handle here; a bare eof tests the last file read
while (grep { not eof $_ } @fh) {
for my $fh (@fh) {
if (defined(my $line = <$fh>)) {
chomp $line;
print "$line\n";
}
}
}

If a non-perl solution is ok with you, you can try this:
paste -d"\n\n\n\n\n" f1 f2 f3 f4 f5
where f1,f2..are your text files.

Related

How to combine the code so that the output is on the same line?

Can you help me combine these two programs so that the output is displayed in rows with two columns? The first column is for $1 and the second column is for $2.
Kindly help me to solve this. Thank you :)
This is my code 1.
#!/usr/local/bin/perl
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Timing Path Group \'(\S+)\'/) {
$line = $1;
print ("$1\n");
}
}
close (FILE);
This is my code 2.
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Levels of Logic:\s+(\S+)/) {
$line = $1;
print ("$1\n");
}
}
close (FILE);
This is my output for code 1, which contains 26 lines of data:
**async_default**
**clock_gating_default**
Ddia_link_clk
Ddib_link_clk
Ddic_link_clk
Ddid_link_clk
FEEDTHROUGH
INPUTS
Lclk
OUTPUTS
VISA_HIP_visa_tcss_2000
ckpll_npk_npkclk
clstr_fscan_scanclk_pulsegen
clstr_fscan_scanclk_pulsegen_notdiv
clstr_fscan_scanclk_wavegen
idvfreqA
idvfreqB
psf5_primclk
sb_nondet4tclk
sb_nondetl2tclk
sb_nondett2lclk
sbclk_nondet
sbclk_sa_det
stfclk_scan
tap4tclk
tapclk
The output of code 2 has the same number of lines.
paste is useful for this: assuming your shell is bash, then using process substitutions
paste <(perl script1.pl) <(perl script2.pl)
That emits columns separated by a tab character. For prettier output, you can pipe the output of paste to column
paste <(perl script1.pl) <(perl script2.pl) | column -t -s $'\t'
And with this, you don't need to try to "merge" your Perl programs.
To combine the two scripts and to output two items of data on the same line, you need to hold on until the end of the file (or until you have both data items) and then output them at once. So you need to combine both loops into one:
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
my( $levels, $timing );
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Levels of Logic:\s+(\S+)/) {
$levels = $1;
}
if ($line=~ m/^\s+Timing Path Group \'(\S+)\'/) {
$timing = $1;
}
}
print "$levels, $timing\n";
close (FILE);
You still haven't given us one vital piece of information: what does the input data look like? Most importantly, are the two pieces of information you're looking for on the same line?
[Update: Looking more closely at your regexes, I see it's not possible for both pieces of information to be on the same line, as they are both supposed to be the first item on the line. It would be helpful if you were clearer about that in your question.]
I think this will do the right thing, no matter what the answer to your question is. I've also added the improvements I suggested in my answer to your previous question:
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $zipped = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $unzipped = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $zipped => $unzipped
or die "gunzip failed: $GunzipError\n";
open (my $fh, '<', $unzipped) or die "Cannot open '$unzipped': $!\n";
my ($levels, $timing);
while (<$fh>) {
chomp;
if (m/^\s+Levels of Logic:\s+(\S+)/) {
$levels = $1;
}
if (m/^\s+Timing Path Group \'(\S+)\'/) {
$timing = $1;
}
# If we have both values, then print them out and
# set the variables to 'undef' for the next iteration
if (defined $levels and defined $timing) { # defined, so a level count of 0 still counts
print "$levels, $timing\n";
undef $levels;
undef $timing;
}
}
close ($fh);

What produces the white space in my Perl program?

As the title says, I have a program, or rather two functions, that read a file into an array and write an array to a file. Now to the main reason I am writing: when I run my test program several times, the file it tests my functions on gains more and more white space. Could somebody explain my mistake and correct me?
my code
Helper.pm:
#!/usr/bin/env perl
package KconfCtl::Helper;
sub file_to_array($) {
my $file = shift();
my ( $filestream, $string );
my @rray;
open( $filestream, $file ) or die("cant open $file: $!");
@rray = <$filestream>;
close($filestream);
return @rray;
}
sub array_to_file($$;$) {
my @rray = @{ shift() };
my $file = shift();
my $mode = shift();
$mode='>' if not $mode;
my $filestream;
if ( not defined $file ) {
$filestream = STDOUT;
}
else {
open( $filestream, $mode, $file ) or die("cant open $file: $!");
}
my $l = @rray; print $l,"\n";
foreach my $line (@rray) {
print $filestream "$line\n";
}
close($filestream);
}
1;
test_helper.pl:
use KconfCtl::Helper;
use strict;
my @t;
@t= KconfCtl::Helper::file_to_array("kconf.test");
#print @t;
my $t_index=@t;
@t[$t_index]="n";
KconfCtl::Helper::array_to_file(\@t, "kconf.test", ">");
the result after the first:
n
and the 2nd run:
n
n
When you read from a file, the data includes the newline characters at the end of each line. You're not stripping those off, but you are adding an additional newline when you output your data again. That means your file gains additional blank lines each time you read and write it.
Also, you must always use strict and use warnings 'all' at the top of every Perl script; you should avoid using subroutine prototypes; and you should declare all of your variables as late as possible.
Here's a more idiomatic version of your module code which removes the newlines on input using chomp. Note that you don't need the #! line on the module file as it won't be run from the command line, but you may want it on the program file. It's also more usual to export symbols from a module using the Exporter module so that you don't have to qualify the subroutine names by prefixing them with the full package name.
use strict;
use warnings 'all';
package KconfCtl::Helper;
sub file_to_array {
my ($file) = @_;
open my $fh, '<', $file or die qq{Can't open "$file" for input: $!}; #'
chomp(my @array = <$fh>);
return @array;
}
sub array_to_file {
my ($array, $file, $mode) = @_;
$mode //= '>';
my $fh;
if ( $file ) {
open $fh, $mode, $file or die qq{Can't open "$file" for output: $!}; #'
}
else {
$fh = \*STDOUT;
}
print $fh $_, "\n" for @$array;
}
1;
and your test program would be like this
#!/usr/bin/env perl
use strict;
use warnings 'all';
use KconfCtl::Helper;
use constant FILE => 'kconf.test';
my @t = KconfCtl::Helper::file_to_array(FILE);
push @t, 'n';
KconfCtl::Helper::array_to_file(\@t, FILE);
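The Exporter suggestion above could look roughly like the sketch below. It is a single-file illustration, so the package is declared inline and its import method is called by hand; in a real layout the package would live in KconfCtl/Helper.pm and the program would simply say use KconfCtl::Helper qw(file_to_array array_to_file);. The @EXPORT_OK list and the append_line helper are assumptions made for the demo, not part of the original module.

```perl
use strict;
use warnings;

package KconfCtl::Helper;
use Exporter 'import';                      # borrow Exporter's import method
our @EXPORT_OK = qw(file_to_array array_to_file);

sub file_to_array {
    my ($file) = @_;
    open my $fh, '<', $file or die qq{Can't open "$file" for input: $!};
    chomp(my @array = <$fh>);               # strip the trailing newlines
    return @array;
}

sub array_to_file {
    my ($array, $file) = @_;
    open my $fh, '>', $file or die qq{Can't open "$file" for output: $!};
    print $fh "$_\n" for @$array;
    close $fh or die qq{Can't close "$file": $!};
}

package main;

# An inline package is not loaded with "use", so request the exports by hand.
KconfCtl::Helper->import(qw(file_to_array array_to_file));

# Hypothetical helper: append one line to a file using the unqualified names.
sub append_line {
    my ($file, $line) = @_;
    my @lines = -e $file ? file_to_array($file) : ();
    push @lines, $line;
    array_to_file(\@lines, $file);
    return scalar @lines;                   # number of lines now in the file
}
```

With the names imported, the test program can call file_to_array and array_to_file directly instead of spelling out KconfCtl::Helper:: each time.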
When you read in from your file, you need to chomp() the lines, or else the \n at the end of the line is included.
Try this and you'll see what's happening:
use Data::Dumper; ## add this line
sub file_to_array($) {
my $file = shift();
my ( $filestream, $string );
my @rray;
open( $filestream, '<', $file ) or die("cant open $file: $!");
@rray = <$filestream>;
close($filestream);
print Dumper( \@rray ); ### add this line
return @rray;
}
you can add
foreach(@rray){
chomp();
}
into your module to stop this happening.
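To see the effect without touching a real file, the sketch below reads from an in-memory filehandle (a stand-in for kconf.test, which is an assumption made for the demo) and compares the round trip with and without chomp. The sub names read_lines and write_lines are hypothetical.

```perl
use strict;
use warnings;

# Read all lines from a string through an in-memory filehandle.
sub read_lines {
    my ($text, $strip) = @_;
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    my @lines = <$fh>;
    chomp @lines if $strip;      # remove the trailing "\n" from every element
    return @lines;
}

# Write the lines back the way array_to_file does: one "\n" per element.
sub write_lines {
    return join '', map { "$_\n" } @_;
}

my $original      = "a\nb\n";
my $without_chomp = write_lines(read_lines($original, 0));  # newlines doubled
my $with_chomp    = write_lines(read_lines($original, 1));  # same as $original

printf "without chomp: %d bytes, with chomp: %d bytes\n",
    length $without_chomp, length $with_chomp;
```

Without chomp every element already ends in a newline, so writing it back with another "\n" inserts a blank line after each original line, and the file keeps growing on every run.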

Search string with multiple words in the pattern

My program is trying to search a string from multiple files in a directory. The code searches for single patterns like perl but fails to search a long string like Status Code 1.
Can you please let me know how to search for strings with multiple words?
#!/usr/bin/perl
my @list = `find /home/ad -type f -mtime -1`;
# printf("Lsit is $list[1]\n");
foreach (@list) {
# print("Now is : $_");
open(FILE, $_);
$_ = <FILE>;
close(FILE);
unless ($_ =~ /perl/) { # works, but fails to find string "Status Code 1"
print "found\n";
my $filename = 'report.txt';
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
say $fh "My first report generated by perl";
close $fh;
} # end unless
} # end For
There are a number of problems with your code
You must always use strict and use warnings at the top of every Perl program. There is little point in declaring anything with my without strict in place
The lines returned by the find command will have a newline at the end which must be removed before Perl can find the files
You should use lexical file handles (my $fh instead of FILE) and the three-parameter form of open as you do with your output file
$_ = <FILE> reads only the first line of the file into $_
unless ($_ =~ /perl/) is inverted logic, and there's no need to specify $_ as it is the default. You should write if ( /perl/ )
You can't use say unless you have use feature 'say' at the top of your program (or use 5.010, which adds all features available in Perl v5.10)
It is also best to avoid using shell commands as Perl is more than able to do anything that you can using command line utilities. In this case -f $file is a test that returns true if the file is a plain file, and -M $file returns the (floating point) number of days since the file's modification time
This is how I would write your program
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
for my $file ( glob '/home/ad/*' ) {
next unless -f $file and (-M $file) < 1; # modified less than one day ago, like find -mtime -1
open my $fh, '<', $file or die $!;
while ( <$fh> ) {
if ( /perl/ ) {
print "found\n";
my $filename = 'report.txt';
open my $out_fh, '>>', $filename or die "Could not open file '$filename': $!";
say $out_fh "My first report generated by perl";
close $out_fh;
last;
}
}
}
it should have matched unless $_ contains text in different case.
try this.
unless($_ =~ /Status\s+Code\s+1/i) {
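A short check of how this pattern behaves; the sample strings are made up for illustration. \s+ tolerates any run of whitespace between the words and /i ignores case. The trailing \b is an addition that is not in the answer above, included so that "Status Code 10" does not count as a match.

```perl
use strict;
use warnings;

# Return 1 when the text contains "Status Code 1" as a phrase, else 0.
sub has_status_code_1 {
    my ($text) = @_;
    return $text =~ /Status\s+Code\s+1\b/i ? 1 : 0;
}

for my $sample ('Status Code 1', 'status   code 1', 'Status Code 10', 'perl') {
    printf "%-18s %s\n", "'$sample'",
        has_status_code_1($sample) ? 'match' : 'no match';
}
```

Note that the pattern can only match text that has actually been read, so the file needs to be read line by line (or slurped) rather than testing its first line only.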
Change
unless ($_ =~ /perl/) {
to:
unless ($_ =~ /(Status Code 1)/) {
I am certain the above works, except it's case sensitive.
Since you questioned it, I rewrote your script to make more sense of what you're trying to accomplish and to implement the suggestion above. Correct me if I am wrong, but you're trying to write a script that looks for "Status Code 1" in files last modified within the past day and prints the names of the matching files to a text file.
Anyways, below is what I recommend:
#!/usr/bin/perl
use strict;
use warnings;
my $output_file = 'report.txt';
chomp(my @list = `find /home/ad -type f -mtime -1`); # strip the newline find appends to each name
foreach my $filename (@list) {
print "PROCESSING: $filename\n";
open (INCOMING, "<$filename") || die "FATAL: Could not open '$filename' $!";
foreach my $line (<INCOMING>) {
if ($line =~ /(Status Code 1)/) {
open( FILE, ">>$output_file") or die "FATAL: Could not open '$output_file' $!";
print FILE sprintf ("%s\n", $filename);
close(FILE) || die "FATAL: Could not CLOSE '$output_file' $!";
# Bail when we get the first match
last;
}
}
close(INCOMING) || die "FATAL: Could not close '$filename' $!";
}

How to replace string dynamically using perl script

I am trying to solve below issues.
I have 2 files. Address.txt and File.txt. I want to replace all A/B/C/D (File.txt) with corresponding string value (Read from Address.txt file) using perl script. It's not replacing in my output file. I am getting same content of File.txt.
I tried below codes.
Here is Address.txt file
A,APPLE
B,BAL
C,CAT
D,DOG
E,ELEPHANT
F,FROG
G,GOD
H,HORCE
Here is File.txt
A B C
X Y X
M N O
D E F
F G H
Here is my code :
use strict;
use warnings;
open (MYFILE, 'Address.txt');
foreach (<MYFILE>){
chomp;
my @data_new = split/,/sm;
open INPUTFILE, "<", $ARGV[0] or die $!;
open OUT, '>ariout.txt' or die $!;
my $src = $data_new[0];
my $des = $data_new[1];
while (<INPUTFILE>) {
# print "In while :$src \t$des\n";
$_ =~ s/$src/$des/g;
print OUT $_;
}
close INPUTFILE;
close OUT;
# /usr/bin/perl -p -i -e "s/A/APPLE/g" ARGV[0];
}
close (MYFILE);
If I write $_ =~ s/A/Apple/g;
then the output file is fine and A is replaced with "Apple". But when the values come in dynamically, nothing gets replaced.
Thanks in advance. I am new to the Perl scripting language. Correct me if I am wrong anywhere.
Update 1: I updated the code as shown below. It's working fine now. My question is: what is the Big O of this algorithm?
Code :
#!/usr/bin/perl
use warnings;
use strict;
open( my $out_fh, ">", "output.txt" ) || die "Can't open the output file for writing: $!";
open( my $address_fh, "<", "Address.txt" ) || die "Can't open the address file: $!";
my %lookup = map { chomp; split( /,/, $_, 2 ) } <$address_fh>;
open( my $file_fh, "<", "File1.txt" ) || die "Can't open the file.txt file: $!";
while (<$file_fh>) {
my @line = split;
for my $char ( @line ) {
( exists $lookup{$char} ) ? print $out_fh " $lookup{$char} " : print $out_fh " $char ";
}
print $out_fh "\n";
}
Not entirely sure how you want your output formatted. Do you want to keep the rows and columns as is?
I took a similar approach as above but kept the formatting the same as in your 'file.txt' file:
#!/usr/bin/perl
use warnings;
use strict;
open( my $out_fh, ">", "output.txt" ) || die "Can't open the output file for writing: $!";
open( my $address_fh, "<", "address.txt" ) || die "Can't open the address file: $!";
my %lookup = map { chomp; split( /,/, $_, 2 ) } <$address_fh>;
open( my $file_fh, "<", "file.txt" ) || die "Can't open the file.txt file: $!";
while (<$file_fh>) {
my @line = split;
for my $char ( @line ) {
( exists $lookup{$char} ) ? print $out_fh " $lookup{$char} " : print $out_fh " $char ";
}
print $out_fh "\n";
}
That will give you the output:
APPLE BAL CAT
X Y X
M N O
DOG ELEPHANT FROG
FROG GOD HORCE
Here's another option that lets Perl handle the opening and closing of files:
use strict;
use warnings;
my $file_txt = pop;
my %hash = map { $1 => $2 if /(.+?),(.+)/ } <>;
push @ARGV, $file_txt;
while (<>) {
my @array;
push @array, $hash{$_} // $_ for split;
print "@array\n";
}
Usage: perl script.pl Address.txt File.txt [>outFile.txt] (where script.pl is whatever name you saved the script under)
The last, optional parameter directs output to a file.
Output on your dataset:
APPLE BAL CAT
X Y X
M N O
DOG ELEPHANT FROG
FROG GOD HORCE
The name of the file to be rewritten (File.txt) is popped off @ARGV for later use, which leaves only the address file for the first <> to read. A hash is then built from the key/value pairs in Address.txt.
Next, File.txt is pushed back onto @ARGV and read line by line. Each line is split into its elements, and the defined-or (//) operator returns either the hash value for an element or, if there is none, the element itself, which is then pushed onto @array. Finally, the array is interpolated in a print statement.
Hope this helps!
First, here is your existing program, rewritten slightly
open the address file
convert the address file to a hash so that the letters are the keys and the strings the values
open the other file
read in the single line in it
split the line into single letters
use the letters to lookup in the hash
use strict;
use warnings;
open(my $a,"Address.txt")||die $!;
my %address=map {split(/,/) } map {split(' ')} <$a>;
open(my $f,"File.txt")||die $!;
my $list=<$f>;
for my $letter (split(' ',$list)) {
print $address{$letter}."\n" if (exists $address{$letter});
}
To make another file with the substitutions in place, alter the loop that processes $list:
my @output;
for my $letter (split(' ',$list)) {
if (exists $address{$letter}) {
push @output, $address{$letter};
}
else {
push @output, $letter;
}
}
open(my $o,">newFile.txt")||die $!;
print $o "@output";
Your problem is that in every iteration of your foreach loop you overwrite any changes made earlier to output file.
My solution:
use strict;
use warnings;
open my $replacements, 'Address.txt' or die $!;
my %r;
foreach (<$replacements>) {
chomp;
my ($k, $v) = split/,/sm;
$r{$k} = $v;
}
my $re = '(' . join('|', keys %r) . ')';
open my $input, "<", $ARGV[0] or die $!;
while (<$input>) {
s/$re/$r{$1}/g;
print;
}
#!/usr/bin/perl -w
# to replace multiple text strings in a file with text from another file
#select text from 1st file, replace in 2nd file
$file1 = 'Address.txt'; $file2 = 'File.txt';
# save the strings by which to replace
%replacement = ();
open IN,"$file1" or die "cant open $file1\n";
while(<IN>)
{chomp $_;
@a = split ',',$_;
$replacement{$a[0]} = $a[1];}
close IN;
open OUT,">replaced_file";
open REPL,"$file2" or die "cant open $file2\n";
while(<REPL>)
{chomp $_;
@a = split ' ',$_; @replaced_data = ();
# replace strings wherever possible
foreach $i(@a)
{if(exists $replacement{$i}) {push @replaced_data,$replacement{$i};}
else {push @replaced_data,$i;}
}
print OUT trim(join " ",@replaced_data),"\n";
}
close REPL; close OUT;
########################################
sub trim
{
my $str = $_[0];
$str=~s/^\s*(.*)/$1/;
$str=~s/\s*$//;
return $str;
}

How to read from a file and direct output to a file if a file name is given in the command line, and printing to console if no argument given

I made a file, "rootfile", that contains paths to certain files and the perl program mymd5.perl gets the md5sum for each file and prints it in a certain order. How do I redirect the output to a file if a name is given in the command line? For instance if I do
perl mymd5.perl md5file
then it will feed output to md5file. And if I just do
perl mymd5.perl
it will just print to the console.
This is my rootfile:
/usr/local/courses/cs3423/assign8/cmdscan.c
/usr/local/courses/cs3423/assign8/driver.c
/usr/local/courses/cs3423/assign1/xpostitplus-2.3-3.diff.gz
This is my program right now:
open($in, "rootfile") or die "Can't open rootfile: $!";
$flag = 0;
if ($ARGV[0]){
open($out,$ARGV[0]) or die "Can't open $ARGV[0]: $!";
$flag = 1;
}
if ($flag == 1) {
select $out;
}
while ($line = <$in>) {
$md5line = `md5sum $line`;
@md5arr = split(" ",$md5line);
if ($flag == 0) {
printf("%s\t%s\n",$md5arr[1],$md5arr[0]);
}
}
close($out);
If you don't give a FILEHANDLE to print or printf, the output will go to STDOUT (the console).
There are several ways you can redirect the output of your print statements.
select $out; #everything you print after this line will go to the file specified by the filehandle $out.
... #your print statements come here.
close $out; #close the filehandle when done to avoid confusing the rest of the program.
#or you can use the filehandle right after the print statement as in:
print $out "Hello World!\n";
You can print to a file whose name is derived from the value in @ARGV as follows:
This will take the name of the file in $ARGV[0] and use it to name a new file, edit.$ARGV[0]
#!/usr/bin/perl
use warnings;
use strict;
my $file = $ARGV[0];
open my $input, '<', $file or die $!;
my $editedfile = "edit.$file";
open my $name_change, '>', $editedfile or die $!;
if ($file eq "md5file"){
while (<$input>){
chomp;
# Do something...
print $name_change "$_\n";
}
}
Perhaps the following will be helpful:
use strict;
use warnings;
while (<>) {
my $md5line = `md5sum $_`;
my @md5arr = split( " ", $md5line );
printf( "%s\t%s\n", $md5arr[1], $md5arr[0] );
}
Usage: perl mymd5.perl rootfile [>md5file]
The last, optional parameter will direct output to the file md5file; if absent, the results are printed to the console.
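For the original question (write to a file when a name is given on the command line, otherwise to the console), one common approach is to choose the output filehandle once and print to it everywhere. A minimal sketch follows, under some assumptions: it swaps in the core Digest::MD5 module for the external md5sum command, and the sub names output_handle, md5_of_file and report are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return a write handle for the named file, or STDOUT when no name is given.
sub output_handle {
    my ($name) = @_;
    return \*STDOUT unless defined $name && length $name;
    open my $out, '>', $name or die "Can't open '$name' for writing: $!";
    return $out;
}

# Hex MD5 digest of a file's contents.
sub md5_of_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open '$path': $!";
    binmode $fh;
    return Digest::MD5->new->addfile($fh)->hexdigest;
}

# Read one path per line from $list_file and print "path<TAB>digest" to $out.
sub report {
    my ($list_file, $out) = @_;
    open my $in, '<', $list_file or die "Can't open '$list_file': $!";
    while (my $path = <$in>) {
        chomp $path;
        next unless length $path;
        printf {$out} "%s\t%s\n", $path, md5_of_file($path);
    }
}
```

The script body would then be a single call, report('rootfile', output_handle($ARGV[0]));, so that perl mymd5.perl md5file writes to md5file and perl mymd5.perl prints to the console. Note that the file must be opened with '>' for writing; the two-argument open($out, $ARGV[0]) in the question opens it for reading.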