Compare multiple file contents in perl - perl

I have list of files (more than 2) where I need to verify whether all of those files are identical.
I tried to use the File::Compare module, but it seems to accept only two files. In my case I have multiple files whose contents I want to verify are the same. Is there any other way to meet this requirement?

The solution to this problem is to take a digest of every file. Many options exist; most will be 'good enough' (e.g. MD5 technically has some weaknesses, but they're not likely to matter outside a cryptographic/malicious-code scenario).
So simply:
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw ( md5_hex );
use Data::Dumper;
my %digest_of;
my %unique_files;
foreach my $file (@ARGV) {
    open( my $input, '<', $file ) or warn $!;
    my $digest = md5_hex( do { local $/; <$input> } );
    close($input);
    $digest_of{$file} = $digest;
    push @{ $unique_files{$digest} }, $file;
}
print Dumper \%digest_of;
print Dumper \%unique_files;
%unique_files will give you each unique fingerprint and all the files sharing that fingerprint - if you've got 2 (or more) distinct fingerprints, then you've got files that aren't identical.
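To turn that fingerprint grouping into a direct yes/no answer, a minimal sketch along the same lines (the helper name all_identical is mine, not from the original post): if every file produced the same digest, the grouping hash ends up with exactly one key.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Group files by digest; files sharing a digest have identical contents.
sub all_identical {
    my @files = @_;
    my %unique_files;
    for my $file (@files) {
        open my $fh, '<', $file or die "Can't open $file: $!";
        binmode $fh;    # digest the raw bytes, not decoded text
        my $digest = md5_hex( do { local $/; <$fh> } );
        close $fh;
        push @{ $unique_files{$digest} }, $file;
    }
    # Exactly one distinct digest means all the files are identical.
    return keys %unique_files == 1;
}
```

Calling all_identical(@ARGV) then gives a true value only when every listed file has the same contents.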

Related

Including a perl file that is generated in current file

I am working on a perl script that successfully generates output files containing hashes. I want to use those hashes in my file. Is it possible to include a file that is generated in that file or will I have to create another file?
Technically, it might be cleaner to start a new .pl file that uses those hashes, but I would like to keep everything in a single script if possible. Is it even possible to do so?
Edit: I'm just unsure if I can "circle" it back around so I can use those hashes in my file, because the hashes are generated on a weekly basis. I don't want my file to mistakenly reach out for last week's hashes instead of the newly generated ones. I have not yet written my script in a manner that classifies each week's generated hashes.
In summary, here is what my file does: it extracts a table from another file and removes the columns and rows that are not needed. Once left with the only two columns needed, it puts them into a hash, one column being the key and the other being the value. For this reason, I've found Data::Dumper to be the best option for my hashes. I'm intermediate in Perl, and this is a script I'm putting together for an internship.
Here is an example of how you can save a hash as JSON to a file and later read the JSON back into a Perl hash. This example uses JSON::XS:
use strict;
use warnings;
use Data::Dumper;
use JSON::XS;
{
    my %h = ( a => 1, b => 2 );
    my $str = encode_json( \%h );
    my $fn = 'test.json';
    save_json( $fn, \%h );
    my $h2 = read_json( $fn );
    print Dumper( $h2 );
}
sub read_json {
    my ( $fn ) = @_;
    open( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
    my $str = do { local $/; <$fh> };
    close $fh;
    my $h = decode_json $str;
    return $h;
}
sub save_json {
    my ( $fn, $hash ) = @_;
    my $str = encode_json( $hash );
    open( my $fh, '>', $fn ) or die "Could not open file '$fn': $!";
    print $fh $str;
    close $fh;
}
Output:
$VAR1 = {
          'a' => 1,
          'b' => 2
        };
Some alternatives to JSON are YAML and Storable.
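As a taste of the Storable alternative, a minimal sketch (the filename hashes.stor is my own choice): Storable is a core module that serializes Perl data structures directly, in a binary format that round-trips cleanly but is not human-readable.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

my %h = ( a => 1, b => 2 );
store \%h, 'hashes.stor';    # serialize the hash to a file

my $h2 = retrieve 'hashes.stor';    # read it back as a hash reference
print "$_ => $h2->{$_}\n" for sort keys %$h2;
```

Because the format is binary, Storable avoids the encode/decode step JSON needs, at the cost of the file not being inspectable in a text editor.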

perl code for comparing file contents

I'm a newbie in Perl scripting. I have 2 files. I want to compare their contents line by line and delete the matching ones. If I use a wildcard in one file to match multiple lines in the second file, it should delete the multiple matches and write the rest to another file. I got a bit of code from another mail, but it does not take care of wildcards:
use strict;
use warnings;
$\="\n";
open my $FILE, "<", "file.txt" or die "Can't open file.txt: $!";
my %Set = map {$_ => undef} <$FILE>;
open my $FORBIDDEN, "<", "forbidden.txt" or die "Can't open forbidden.txt: $!";
my %Forbidden = map {$_ => undef} <$FORBIDDEN>;
open my $OUT, '>', 'output' or die $!;
my %Result = %Set; # make a copy
delete $Result{$_} for keys %Forbidden;
print $OUT keys %Result
I'm not sure what you mean by "wild card".
Nevertheless, there are many ways to do what you want. Since it's prettier to use existing modules, you can use the List::Compare module available on CPAN.
With the following code you use this module to store all the lines contained in the one file (file.txt) but not in the other file (forbidden.txt). So you implicitly match the lines which are equal. This code doesn't delete them from the file, but finds them.
Your code would look like:
use strict;
use warnings;
use File::Slurp qw(read_file); #cpan-module
use List::Compare; #cpan-module
chomp( my @a_file = read_file 'file.txt' );
chomp( my @b_file = read_file 'forbidden.txt' );

# here it stores all the lines contained in the 'file.txt'
# but not in the 'forbidden.txt' in an array
my @a_file_only = List::Compare->new( \@a_file, \@b_file )->get_Lonly;

print "$_\n" for @a_file_only;

# here you could write these lines in a new file to store them.
# At this point I just print them out.
the new approach:
foreach my $filter (@b_file) {
    # keep only the lines that do NOT match the current filter
    # (requires use Data::Dumper; for the print below)
    @a_file = grep { !/$filter/ } @a_file;
}
print Dumper( \@a_file );
It will reduce the lines in @a_file step by step, removing the matches for each filter in turn.
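If the "wild cards" in forbidden.txt are shell-style globs (* and ?) rather than Perl regexes, they need translating before they can be used as filters. A minimal sketch under that assumption (the glob_to_regex helper and the sample data are illustrative, not from the original post):

```perl
use strict;
use warnings;

# Translate a shell-style glob into an anchored Perl regex:
# '*' matches any run of characters, '?' matches one character,
# everything else is taken literally.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = join '', map {
        $_ eq '*' ? '.*' : $_ eq '?' ? '.' : quotemeta($_)
    } split //, $glob;
    return qr/^$re$/;
}

my @lines   = ( 'error.log', 'access.log', 'notes.txt' );
my @filters = ( '*.log' );

for my $filter (@filters) {
    my $re = glob_to_regex($filter);
    @lines = grep { $_ !~ $re } @lines;    # drop lines matching the glob
}
print "$_\n" for @lines;                   # only 'notes.txt' survives
```

CPAN's Text::Glob offers a ready-made version of this translation if you'd rather not roll your own.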

File::Temp pass system command output to temp file

I'm trying to capture the output of a tail command to a temp file.
here is a sample of my apache access log
Here is what I have tried so far.
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp ();
use File::Temp qw/ :seekable /;
chomp($tail = `tail access.log`);
my $tmp = File::Temp->new( UNLINK => 0, SUFFIX => '.dat' );
print $tmp "Some data\n";
print "Filename is $tmp\n";
I'm not sure how I can go about passing the output of $tail to this temporary file.
Thanks
I would use a different approach for tailing the file. Have a look at File::Tail; I think it will simplify things.
It sounds like all you need is
print $tmp $tail;
But you also need to declare $tail and you probably shouldn't chomp it, so
my $tail = `tail access.log`;
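Putting those pieces together, a minimal sketch of the whole script (assuming an access.log exists in the current directory; with no such file, $tail is simply empty):

```perl
use strict;
use warnings;
use File::Temp ();

my $tail = `tail access.log`;    # capture the command's output; no chomp,
                                 # so the trailing newline is preserved

my $tmp = File::Temp->new( UNLINK => 0, SUFFIX => '.dat' );
print $tmp $tail;                # write the captured text to the temp file
close $tmp;

print "Filename is ", $tmp->filename, "\n";
```

Note that interpolating $tmp in a string also yields the filename, but the explicit filename method makes the intent clearer.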
A classic Perl approach is to use named filehandles:
if ( open LOGFILE, 'tail /some/log/file |' and open TAIL, '>', '/tmp/logtail' )
{
    print TAIL $_ while <LOGFILE>;
    close LOGFILE and close TAIL;
}
There are many ways to do this, but since you are happy to use modules, you might as well use File::Tail:
use v5.12;
use warnings 'all';
use File::Tail;
my $lines_required = 10;
my $out_file = "output.txt";
open(my $out, '>', $out_file) or die "$out_file: $!\n";
my $tail = File::Tail->new("/some/log/file");
for ( 1 .. $lines_required ) {
    print $out $tail->read;
}
close $out;
This sits and monitors the log file until it gets the 10 new lines. If you just want a copy of the last 10 lines as is, the easiest way is to use I/O redirection from the shell: tail /some/log/file > my_copy.txt

Opening, spliting and sorting into an Arrray in perl

I am a beginner programmer who has been given a weeklong assignment to build a complex program, but I am having a difficult time starting off. I have been given a set of data, and the goal is to separate it into two separate arrays by the second column, based on whether the letter is M or F.
This is the code I have thus far:
#!/usr/local/bin/perl
open (FILE, "ssbn1898.txt");
$x=<FILE>;
split/[,]/$x;
@array1=$y;
if @array1[2]="M";
print @array2;
else;
print @array3;
close (FILE);
How do I fix this? Please try to use the simplest terms possible - I started coding last week!
Thank You
First off - you split on comma, so I'm going to assume your data looks something like this:
one,M
two,F
three,M
four,M
five,F
six,M
There's a few problems with your code:
- Turn on strict and warnings. They warn you about possible problems with your code.
- open is better written as open ( my $input, "<", $filename ) or die $!;
- You only actually read one line from <FILE> - if you assign it to a scalar $x, it only reads one line.
- You don't actually insert your values into either array.
So to do what you're basically trying to do:
#!/usr/local/bin/perl
use strict;
use warnings;
#define your arrays.
my @M_array;
my @F_array;

#open your file.
open( my $input, "<", 'ssbn1898.txt' ) or die $!;

#read the file one line at a time - this sets the implicit variable $_
#each loop, which is what we use for the split.
while (<$input>) {
    #remove linefeeds
    chomp;
    #capture values from either side of the comma.
    my ( $name, $id ) = split(/,/);
    #test if id is M. We _assume_ that if it's not, it must be F.
    if ( $id eq "M" ) {
        #insert it into our list.
        push( @M_array, $name );
    }
    else {
        push( @F_array, $name );
    }
}
close($input);

#print the results
print "M: @M_array\n";
print "F: @F_array\n";
You could probably do this more concisely - I'd suggest perhaps looking at hashes next, because then you can associate key-value pairs.
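As a taste of that hash-based approach, a sketch that keys a hash of arrays on the second column, so any number of categories works without an if/else chain (the sample rows here mirror the assumed name,letter format from above):

```perl
use strict;
use warnings;

my @rows = ( 'one,M', 'two,F', 'three,M', 'four,M', 'five,F', 'six,M' );

my %by_group;
for my $line (@rows) {
    my ( $name, $id ) = split /,/, $line;
    push @{ $by_group{$id} }, $name;    # autovivifies one array per category
}

# prints each category and its members, e.g. "F: two five"
print "$_: @{ $by_group{$_} }\n" for sort keys %by_group;
```

A new letter in the data (say 'X') just becomes another key; nothing in the loop needs to change.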
There's a part function in List::MoreUtils that does exactly what you want.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use List::MoreUtils 'part';
my ($f, $m) = part { (split /,/)[1] eq 'M' } <DATA>;
say "M: @$m";
say "F: @$f";
__END__
one,M,foo
two,F,bar
three,M,baz
four,M,foo
five,F,bar
six,M,baz
The output is:
M: one,M,foo
three,M,baz
four,M,foo
six,M,baz
F: two,F,bar
five,F,bar
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my @boys=();
my @girls=();
my $fname="ssbn1898.txt"; # I keep stuff like this in a scalar
open (FIN,"< $fname")
    or die "$fname:$!";
while ( my $line=<FIN> ) {
    chomp $line;
    my @f=split(",",$line);
    push @boys,$f[0] if $f[1]=~ m/[mM]/;
    push @girls,$f[0] if $f[1]=~ m/[fF]/;
}
print Dumper(\@boys);
print Dumper(\@girls);
exit 0;
# Caveats:
# Code is not tested but should work and definitely shows the concepts
#
In fact the same thing...
#!/usr/bin/perl
use strict;
my (@m,@f);
while(<>){
    push (@m,$1) if(/(.*),M/);
    push (@f,$1) if(/(.*),F/);
}
print "M=@m\nF=@f\n";
Or a "perl -n" (=for all lines do) variant:
#!/usr/bin/perl -n
push (@m,$1) if(/(.*),M/);
push (@f,$1) if(/(.*),F/);
END { print "M=@m\nF=@f\n"; }

Writing to a file in perl

I want to write the key and value pairs that I have populated in the hash. I am using:
open(OUTFILE,">>output_file.txt");
{
    foreach my $name (keys %HoH) {
        my $values = $HoH{$name};
        print "$name: $values\n";
    }
}
close(OUTFILE);
Somehow it creates output_file.txt, but it does not write the data to it. What could be the reason?
Use:
print OUTFILE "$name: $values\n";
Without specifying the filehandle in the print statement, you are printing to STDOUT, which is by default the console.
open my $outfile, '>>', "output_file.txt" or die "output_file.txt: $!";
print $outfile map { "$_: $HoH{$_}\n" } keys %HoH;
close($outfile);
I cleaned up your code; using the map function here is more concise. Also, I used my variables for the filehandles, which is always good practice. There are still more ways to do this; you should check out the Perl Cookbook.
When you open OUTFILE you have a couple of choices for how to write to it. One, you can specify the filehandle in your print statements, or two, you can select the filehandle and then print normally (without specifying a filehandle). You're doing neither. I'll demonstrate:
use strict;
use warnings;
use autodie;
my $filename = 'somefile.txt';
open my( $filehandle ), '>>', $filename;
foreach my $name ( keys %HoH ) {
    print $filehandle "$name: $HoH{$name}\n";
}
close $filehandle;
If you were to use select, you could do it this way:
use strict;
use warnings;
use autodie;
my $filename = 'somefile.txt';
open my( $filehandle ), '>>', $filename;
my $oldout = select $filehandle;
foreach my $name ( keys %HoH ) {
    print "$name: $HoH{$name}\n";
}
close $filehandle;
select $oldout;
Each method has its uses, but more often than not, in the interest of writing clear and easy-to-maintain code, you use the first approach unless you have a really good reason not to.
Just remember, whenever you're printing to a file, specify the filehandle in your print statement.
sergio's answer of specifying the filehandle is the best one.
Nonetheless, there is another way: use select to change the default output filehandle. And as another variation, using while ( each ) rather than foreach ( keys ) can be better in some cases (particularly when the hash is tied to a file somehow and it would take a lot of memory to get all the keys at once).
open(OUTFILE,">>output_file.txt");
select OUTFILE;
while ( my ( $name, $value ) = each %HoH ) {
    print "$name: $value\n";
}
close(OUTFILE);