Perl: compare two text files and print only the lines found in the first file

I'm trying to make a script that would only print the difference in text found in the first file but not in the second file.
For example the first text file contains:
a
b
c
d
While the second file contains:
a
x
y
z
With the script I'm trying, it prints the differences from both files, which is:
b
c
d
x
y
z
But the result I want, and can't figure out how to get, is just:
b
c
d
Here is the code:
use strict;
use warnings;
my $f1 = 'C:\Strawberry\new.raw';
my $f2 = 'C:\Strawberry\orig.raw';
my $outfile = 'C:\Strawberry\mt_deleted.txt';
my %results = ();
open FILE1, "$f1" or die "Could not open file: $! \n";
while(my $line = <FILE1>){
$results{$line}=1;
}
close(FILE1);
open FILE2, "$f2" or die "Could not open file: $! \n";
while(my $line =<FILE2>) {
$results{$line}++;
}
close(FILE2);
open (OUTFILE, ">$outfile") or die "Cannot open $outfile for writing \n";
foreach my $line (keys %results) {
print OUTFILE $line if $results{$line} == 1;
}
close OUTFILE;

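You need to add chomp (so that a last line with no trailing newline still matches the same line in the other file), and assign a different value to the keys from file2, so that lines found only in file2 never end up with the value 1: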
use strict;
use warnings;
my $f1 = 'C:\Strawberry\new.raw';
my $f2 = 'C:\Strawberry\orig.raw';
my $outfile = 'C:\Strawberry\mt_deleted.txt';
my %results = ();
open FILE1, "$f1" or die "Could not open file: $! \n";
while ( my $line = <FILE1> ) {
chomp $line;
$results{$line} = 1;
}
close(FILE1);
open FILE2, "$f2" or die "Could not open file: $! \n";
while ( my $line = <FILE2> ) {
chomp $line;
$results{$line} = 2;
}
close(FILE2);
open( OUTFILE, ">$outfile" ) or die "Cannot open $outfile for writing \n";
foreach my $line ( keys %results ) {
print OUTFILE "$line\n" if $results{$line} == 1;
}
close OUTFILE;

Let's start by counting the number of occurrences of each line in file 2.
my %counts;
while (<$fh2>) {
chomp;
++$counts{$_};
}
To print each line of file 1 not matched by a line in file 2, simply process file 1 line by line, decrementing the count, and printing the line if the count is negative.
while (<$fh1>) {
chomp;
say if --$counts{$_} < 0;
}
You said the files could have duplicate lines, but you didn't say how you wanted to handle them. The above handles duplicates as follows:
File 1:
a
a
a
b
c
File 2:
c
a
Output:
a
a
b
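Putting the two fragments above together, here is a minimal self-contained sketch of the counting approach. It assumes Perl 5.10 or later for say; the subroutine name multiset_difference and the two-filename command line are illustrative, not part of the answer.

```perl
use strict;
use warnings;
use feature 'say';

# Return the lines of $fh1 not matched by a line in $fh2. Each line in
# $fh2 cancels one occurrence of the same line in $fh1 (multiset difference).
sub multiset_difference {
    my ($fh1, $fh2) = @_;
    my %counts;
    while (my $line = <$fh2>) {
        chomp $line;
        ++$counts{$line};
    }
    my @only_in_first;
    while (my $line = <$fh1>) {
        chomp $line;
        # The count goes negative once file 1 has more copies than file 2.
        push @only_in_first, $line if --$counts{$line} < 0;
    }
    return @only_in_first;
}

# Usage (hypothetical script name): perl diff.pl new.raw orig.raw
if (@ARGV == 2) {
    my ($f1, $f2) = @ARGV;
    open my $fh1, '<', $f1 or die "Could not open '$f1': $!";
    open my $fh2, '<', $f2 or die "Could not open '$f2': $!";
    say for multiset_difference($fh1, $fh2);
}
```

With the duplicate example above as input, it returns a, a, b.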

Let's start by forming a lookup table of what's in file 2.
my %seen;
while (<$fh2>) {
chomp;
++$seen{$_};
}
To print each line of file 1 not found in file 2, simply process file 1 line by line and print the line if it's not in the lookup table.
while (<$fh1>) {
chomp;
say if !$seen{$_};
}
You said the files could have duplicate lines, but you didn't say how you wanted to handle them. The above handles duplicates as follows:
File 1:
a
a
a
b
c
File 2:
c
a
Output:
b
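A complete, minimal version of the lookup-table approach, assuming the files are plain lists of lines (lines_only_in_first is a hypothetical name). Because it walks file 1 instead of looping over keys %results, it also prints the lines in file 1's original order, which the hash-key loops in the question do not guarantee.

```perl
use strict;
use warnings;

# Return the lines of $fh1, in their original order, that do not occur
# anywhere in $fh2 (set difference: every copy of a line found in $fh2 is
# dropped, every copy of a line not found in $fh2 is kept).
sub lines_only_in_first {
    my ($fh1, $fh2) = @_;
    my %seen;
    while (my $line = <$fh2>) {
        chomp $line;
        $seen{$line} = 1;
    }
    my @only_in_first;
    while (my $line = <$fh1>) {
        chomp $line;
        push @only_in_first, $line unless $seen{$line};
    }
    return @only_in_first;
}

# Usage (hypothetical script name): perl only_first.pl new.raw orig.raw
if (@ARGV == 2) {
    my ($f1, $f2) = @ARGV;
    open my $fh1, '<', $f1 or die "Could not open '$f1': $!";
    open my $fh2, '<', $f2 or die "Could not open '$f2': $!";
    print "$_\n" for lines_only_in_first($fh1, $fh2);
}
```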

Related

Trying to compare "parts of speech" tags of two files and print the matched tags in a separate file

I am trying to code a perl program to compare the "parts of speech" tags of two text files and print the matched tags along with the corresponding words in a separate file in Windows.
File1:
boy N
went V
loves V
girl N
File2:
boy N
swims V
girl N
loves V
The expected output:
boy N N
girl N N
loves V V
The columns are separated by tabs. The code I have so far:
use strict;
use warnings;
my $filename = 'file1.txt';
open(my $fh, $filename)
or die "Could not open file '$filename'";
while (my $row = <$fh>) {
chomp $row;
print "$row\n";
}
my $tagfile = 'file2.txt';
open(my $tg, $tagfile)
or die "Could not open file '$tagfile'";
while (my $row = <$tg>) {
chomp $row;
print "$row\n";
}
It's really unclear what you're asking for. But I think this is close.
#!/usr/bin/perl
use strict;
use warnings;
my ($file1, $file2) = @ARGV;
my %words; # Keep details of the words
while (<>) { # Read all input files a line at a time
chomp;
my ($word, $pos) = split;
$words{$ARGV}{$word}{$pos}++;
# If we're processing file1 then don't look for a match
next if $ARGV eq $file1;
if (exists $words{$file1}{$word}{$pos}) {
print join(' ', $word, ($pos) x 2), "\n";
}
}
Running it like this:
./pos file1 file2
Gives:
boy N N
girl N N
loves V V
OK, first off what you want is a hash.
You need to:
read the first file and split each line into "word" and "pos".
save each pair in a hash, keyed on the word.
read the second file and split each line into "word" and "pos".
look each word up in the hash you populated, and check that the "pos" matches.
Something like this:
#!/usr/bin/env perl
use strict;
use warnings;
#declare our hash:
my %pos_for;
#open the first file
my $filename = 'file1.txt';
open( my $fh, '<', $filename ) or die "Could not open file '$filename'";
while (<$fh>) {
#remove linefeed from this line.
#note - both chomp and split default to using $_ which is defined by the while loop.
chomp;
#split it on whitespace.
my ( $word, $pos ) = split;
#record this value in the hash %pos_for
$pos_for{$word} = $pos;
}
close($fh);
#process second file:
my $tagfile = 'file2.txt';
open( my $tg, '<', $tagfile ) or die "Could not open file '$tagfile': $!";
while (<$tg>) {
#remove linefeed from this line.
chomp;
#split it on whitespace.
my ( $word, $pos ) = split;
#check if this word was in the other file
if (defined $pos_for{$word}
#and that it's the same "pos" value.
and $pos_for{$word} eq $pos
)
{
print join( "\t", $word, $pos, $pos ), "\n";
}
}
close($tg);
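A minimal sketch of the same hash approach, assuming the output format from the question (word, then the tag twice, tab-separated). The subroutine name matching_tags and passing each file's lines as an array reference are illustrative choices that let the logic be tested without files.

```perl
use strict;
use warnings;

# Given the lines of two "word TAG" files (as array refs), return
# "word\tTAG\tTAG" for each word in the second file whose tag equals the
# tag recorded for that word in the first file, in second-file order.
sub matching_tags {
    my ($lines1, $lines2) = @_;
    my %pos_for;
    for my $line (@$lines1) {
        # split ' ' also discards the trailing newline.
        my ($word, $pos) = split ' ', $line;
        next unless defined $pos;    # skip blank or malformed lines
        $pos_for{$word} = $pos;
    }
    my @matches;
    for my $line (@$lines2) {
        my ($word, $pos) = split ' ', $line;
        next unless defined $pos;
        push @matches, join("\t", $word, $pos, $pos)
            if defined $pos_for{$word} && $pos_for{$word} eq $pos;
    }
    return @matches;
}

# Usage (hypothetical script name): perl postags.pl file1.txt file2.txt
if (@ARGV == 2) {
    my @lists;
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Could not open '$file': $!";
        push @lists, [ <$fh> ];
    }
    print "$_\n" for matching_tags(@lists);
}
```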

How to replace strings dynamically using a Perl script

I am trying to solve the following problem.
I have 2 files, Address.txt and File.txt. Using a Perl script, I want to replace every A/B/C/D in File.txt with the corresponding string read from Address.txt. Nothing is replaced in my output file: I get the same content as File.txt.
I tried the code below.
Here is the Address.txt file:
A,APPLE
B,BAL
C,CAT
D,DOG
E,ELEPHANT
F,FROG
G,GOD
H,HORCE
Here is File.txt
A B C
X Y X
M N O
D E F
F G H
Here is my code:
use strict;
use warnings;
open (MYFILE, 'Address.txt');
foreach (<MYFILE>){
chomp;
my @data_new = split/,/sm;
open INPUTFILE, "<", $ARGV[0] or die $!;
open OUT, '>ariout.txt' or die $!;
my $src = $data_new[0];
my $des = $data_new[1];
while (<INPUTFILE>) {
# print "In while :$src \t$des\n";
$_ =~ s/$src/$des/g;
print OUT $_;
}
close INPUTFILE;
close OUT;
# /usr/bin/perl -p -i -e "s/A/APPLE/g" ARGV[0];
}
close (MYFILE);
If I write $_ =~ s/A/Apple/g;
then the output file is fine and A is replaced with "Apple". But when the pattern comes dynamically from Address.txt, it is not replaced.
Thanks in advance. I am new to Perl scripting, so please correct me if I am wrong anywhere.
Update 1: I updated the code below and it works fine now. My question: what is the Big O of this algorithm?
Code:
#!/usr/bin/perl
use warnings;
use strict;
open( my $out_fh, ">", "output.txt" ) || die "Can't open the output file for writing: $!";
open( my $address_fh, "<", "Address.txt" ) || die "Can't open the address file: $!";
my %lookup = map { chomp; split( /,/, $_, 2 ) } <$address_fh>;
open( my $file_fh, "<", "File1.txt" ) || die "Can't open the file.txt file: $!";
while (<$file_fh>) {
my @line = split;
for my $char ( @line ) {
( exists $lookup{$char} ) ? print $out_fh " $lookup{$char} " : print $out_fh " $char ";
}
print $out_fh "\n";
}
Not entirely sure how you want your output formatted. Do you want to keep the rows and columns as is?
I took a similar approach to the one above but kept the formatting the same as in your 'file.txt' file:
#!/usr/bin/perl
use warnings;
use strict;
open( my $out_fh, ">", "output.txt" ) || die "Can't open the output file for writing: $!";
open( my $address_fh, "<", "address.txt" ) || die "Can't open the address file: $!";
my %lookup = map { chomp; split( /,/, $_, 2 ) } <$address_fh>;
open( my $file_fh, "<", "file.txt" ) || die "Can't open the file.txt file: $!";
while (<$file_fh>) {
my @line = split;
for my $char ( @line ) {
( exists $lookup{$char} ) ? print $out_fh " $lookup{$char} " : print $out_fh " $char ";
}
print $out_fh "\n";
}
That will give you the output:
APPLE BAL CAT
X Y X
M N O
DOG ELEPHANT FROG
FROG GOD HORCE
Here's another option that lets Perl handle the opening and closing of files:
use strict;
use warnings;
my $data_file = pop;    # last argument (File.txt); pop outside a sub works on @ARGV
my %hash = map { /(.+?),(.+)/ } <>;    # a list-context match returns ($1, $2), or () if there is no comma
push @ARGV, $data_file;
while (<>) {
my @array;
push @array, $hash{$_} // $_ for split;
print "@array\n";
}
Usage: perl yourscript.pl Address.txt File.txt [>outFile.txt]
The optional last part redirects the output to a file.
Output on your dataset:
APPLE BAL CAT
X Y X
M N O
DOG ELEPHANT FROG
FROG GOD HORCE
The name of the data file (the last argument, File.txt) is implicitly popped off @ARGV for use later. Then a hash is built from the key/value pairs in Address.txt, which is now the only file left in @ARGV for <> to read.
The data file is then pushed back onto @ARGV and read line by line, splitting each line into its single elements. The defined-or (//) operator returns the hash value if one is defined, or the element itself otherwise, and the result is pushed onto @array. Finally, the array is interpolated in a print statement, which joins its elements with spaces.
Hope this helps!
First, here is your existing program, rewritten slightly
open the address file
convert the address file to a hash so that the letters are the keys and the strings the values
open the other file
read in its first line (this assumes File.txt holds a single line)
split the line into single letters
use the letters to lookup in the hash
use strict;
use warnings;
open(my $a,"Address.txt")||die $!;
my %address=map {split(/,/) } map {split(' ')} <$a>;
open(my $f,"File.txt")||die $!;
my $list=<$f>;
for my $letter (split(' ',$list)) {
print $address{$letter}."\n" if (exists $address{$letter});
}
To make another file with the substitutions in place, alter the loop that processes $list:
my @output;
for my $letter (split(' ',$list)) {
if (exists $address{$letter}) {
push @output, $address{$letter};
}
else {
push @output, $letter;
}
}
open(my $o,">newFile.txt")||die $!;
print $o "@output\n";
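Your problem is that every iteration of your foreach loop reopens the input file for reading and ariout.txt for writing, so each pass starts again from the unmodified input and overwrites the changes that earlier passes made to the output file.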
My solution:
use strict;
use warnings;
open my $replacements, 'Address.txt' or die $!;
my %r;
foreach (<$replacements>) {
chomp;
my ($k, $v) = split/,/sm;
$r{$k} = $v;
}
my $re = '(' . join('|', keys %r) . ')';
open my $input, "<", $ARGV[0] or die $!;
while (<$input>) {
s/$re/$r{$1}/g;
print;
}
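The $re above matches a key anywhere in the text, so if File.txt ever contained a longer token such as AXE, its leading A would be replaced too, and a key containing a regex metacharacter would be treated as a pattern. A minimal sketch that guards against both, assuming the tokens in File.txt are separated by whitespace (build_replacer is a hypothetical name):

```perl
use strict;
use warnings;

# Build a function that replaces every whitespace-delimited token that is a
# key of %map with its value. quotemeta makes keys match literally, and the
# lookarounds stop "A" from matching inside a longer token such as "AXE".
sub build_replacer {
    my (%map) = @_;
    die "no replacements given\n" unless %map;
    my $alt = join '|', map { quotemeta($_) } keys %map;
    my $re  = qr/(?<!\S)($alt)(?!\S)/;
    return sub {
        my ($text) = @_;
        $text =~ s/$re/$map{$1}/g;
        return $text;
    };
}

# Usage (hypothetical script name): perl replace.pl Address.txt File.txt
if (@ARGV == 2) {
    my ($addr_file, $data_file) = @ARGV;
    open my $afh, '<', $addr_file or die "Could not open '$addr_file': $!";
    my %map;
    while (my $line = <$afh>) {
        chomp $line;
        my ($k, $v) = split /,/, $line, 2;
        $map{$k} = $v if defined $v;
    }
    my $replace = build_replacer(%map);
    open my $dfh, '<', $data_file or die "Could not open '$data_file': $!";
    while (my $line = <$dfh>) {
        my $out = $replace->($line);
        print $out;
    }
}
```

Because the whole input is scanned once with a single alternation, a replacement value is never itself re-scanned for keys.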
#!/usr/bin/perl -w
# to replace multiple text strings in a file with text from another file
#select text from 1st file, replace in 2nd file
$file1 = 'Address.txt'; $file2 = 'File.txt';
# save the strings by which to replace
%replacement = ();
open IN,"$file1" or die "cant open $file1\n";
while(<IN>)
{chomp $_;
@a = split ',',$_;
$replacement{$a[0]} = $a[1];}
close IN;
open OUT,">replaced_file" or die "cant open replaced_file\n";
open REPL,"$file2" or die "cant open $file2\n";
while(<REPL>)
{chomp $_;
@a = split ' ',$_; @replaced_data = ();
# replace strings wherever possible
foreach $i(@a)
{if(exists $replacement{$i}) {push @replaced_data,$replacement{$i};}
else {push @replaced_data,$i;}
}
print OUT trim(join " ",@replaced_data),"\n";
}
close REPL; close OUT;
########################################
sub trim
{
my $str = $_[0];
$str=~s/^\s*(.*)/$1/;
$str=~s/\s*$//;
return $str;
}

Comparing two text files in Perl and outputting the matched result

I want to compare two text files that I generated with one of the Perl scripts I wrote.
I want to print out the lines that match in both files. I looked at a couple of answers and questions on Stack Overflow, but they did not work for me. Here is what I have tried.
my $file1 = "Scan1.txt";
my $file2 = "Scan2.txt";
my $OUTPUT = "final_result.txt";
my %results = ();
open FILE1, "$file1" or die "Could not open $file1 \n";
while(my $matchLine = <FILE1>)
{
$results{$matchLine} = 1;
}
close(FILE1);
open FILE2, "$file2" or die "Could not open $file2 \n";
while(my $matchLine =<FILE2>)
{
$results{$matchLine}++;
}
close(FILE2);
open (OUTPUT, ">$OUTPUT") or die "Cannot open $OUTPUT \n";
foreach my $matchLine (keys %results) {
print OUTPUT $matchLine if $results{$matchLine} ne 1;
}
close OUTPUT;
EXAMPLE OF THE OUTPUT THAT I WANT
FILE1.TXT
data 1
data 2
data 3
FILE2.TXT
data 2
data 1
OUTPUT
data 1
data 2
Your problem is that your hash can now be in any of the following states:
not set (line not found anywhere),
1 (line found in file1 OR line found once in file2),
2 (line found in file1 and once in file2, OR line found twice in file2),
n (line found in file1 and n-1 times in file2, OR line found n times in file2).
This ambiguity makes your check ($results{$matchLine} ne 1) unreliable: for example, a line that appears twice in file2 but never in file1 is reported as a match.
The minimal required change to your algorithm would be:
my $file1 = "Scan1.txt";
my $file2 = "Scan2.txt";
my $OUTPUT = "final_result.txt";
my %results = ();
open FILE1, "$file1" or die "Could not open $file1 \n";
while(my $matchLine = <FILE1>)
{
$results{$matchLine} = 1;
}
close(FILE1);
open FILE2, "$file2" or die "Could not open $file2 \n";
while(my $matchLine =<FILE2>)
{
$results{$matchLine} = 2 if $results{$matchLine}; #Only when already found in file1
}
close(FILE2);
open (OUTPUT, ">$OUTPUT") or die "Cannot open $OUTPUT \n";
foreach my $matchLine (keys %results) {
print OUTPUT $matchLine if $results{$matchLine} ne 1;
}
close OUTPUT;
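A sketch of the same idea that avoids the counting ambiguity altogether: build a set from file 2, then walk file 1 and keep the lines present in the set. It assumes the files are plain lists of lines; chomp makes a final line without a trailing newline compare equal to the same line elsewhere. common_lines is a hypothetical name, and unlike the keys %results loop it returns the matches in file 1's order.

```perl
use strict;
use warnings;

# Return the lines of $fh1, in order, that also occur in $fh2.
# Each common line is returned once, even if $fh1 repeats it.
sub common_lines {
    my ($fh1, $fh2) = @_;
    my %in_second;
    while (my $line = <$fh2>) {
        chomp $line;
        $in_second{$line} = 1;
    }
    my (@common, %printed);
    while (my $line = <$fh1>) {
        chomp $line;
        push @common, $line if $in_second{$line} && !$printed{$line}++;
    }
    return @common;
}

# Usage (hypothetical script name): perl common.pl Scan1.txt Scan2.txt > final_result.txt
if (@ARGV == 2) {
    my ($f1, $f2) = @ARGV;
    open my $fh1, '<', $f1 or die "Could not open '$f1': $!";
    open my $fh2, '<', $f2 or die "Could not open '$f2': $!";
    print "$_\n" for common_lines($fh1, $fh2);
}
```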

How to remove the header line in many files and renaming them using Perl

I need help trying to process many small files. I need to remove the first line (header date line) if it exists and then rename the file q_dat_20110816.out => q_dat_20110816.dat.
I figured out how to open the file and do the match and print out the line I need to remove.
Now I need to figure out how to remove that line and then rename the file altogether.
How would you approach this?
Test code:
#!/usr/local/bin/perl
use strict;
use warnings;
my $file = '/share/dev/dumps/q_dat_20110816.out';
$file = $ARGV[0] if (defined $ARGV[0]);
open DATA, "< $file" or die "Could not open '$file'\n";
my $count = 0;
while (my $line = <DATA>) {
$count++;
chomp($line);
if ($line =~m/(Data for Process Q)/) {
print "GOT THE DATE: --$line\n";
exit;
}
}
close DATA;
Sample file: q_dat_20110816.out
Data for Process Q, for 08/16/2011
Make Model Text
a b c
d e f
g h i
New file: q_dat_20110816.dat
Make Model Text
a b c
d e f
g h i
Here's one way to do it:
use strict;
use warnings;
my @old_file_names = @ARGV;
for my $f (@old_file_names){
# Slurp up the lines.
local @ARGV = ($f);
my @lines = <>;
# Drop the line you don't want.
shift @lines if $lines[0] =~ /^Data for Process Q/;
# Delete old file.
unlink $f;
# Write the new file.
$f =~ s/\.out$/.dat/;
open(my $h, '>', $f) or die "$f: $!";
print $h @lines;
}
Low-on-memory father-son solution:
use strict;
use warnings;
for my $fni (@ARGV) {
open(FI, '<', $fni) or die "cant open in '$fni', $!,";
my $fno = $fni; $fno =~ s/\.out$/.dat/;
open(FO, '>', $fno) or die "cant open out '$fno', $!,";
# while (not foreach) reads one line at a time, which keeps memory low
# and makes $. the number of the current line.
while ( <FI> ) {
print FO unless $. == 1 and /^Data for Process Q/;
}
close FO;
close FI;
unlink $fni;
};
It is untested!
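A minimal sketch that combines the streaming approach with one safety detail: the .dat file is fully written and closed before the .out file is deleted, so a failure part-way leaves the original in place. It assumes every input name ends in .out; strip_header_and_rename is a hypothetical name.

```perl
use strict;
use warnings;

# Copy $in to the same name with .out replaced by .dat, dropping the first
# line only if it is the "Data for Process Q" header, then delete $in.
sub strip_header_and_rename {
    my ($in) = @_;
    (my $out = $in) =~ s/\.out\z/.dat/
        or die "'$in' does not end in .out\n";
    open my $ifh, '<', $in  or die "Could not open '$in': $!";
    open my $ofh, '>', $out or die "Could not open '$out': $!";
    while (my $line = <$ifh>) {
        # $. is the number of the line just read from $ifh.
        next if $. == 1 && $line =~ /^Data for Process Q/;
        print {$ofh} $line;
    }
    close $ifh;
    # Check close on the output: a failed write may only show up here.
    close $ofh or die "Could not write '$out': $!";
    # Delete the original only after the new file is safely written.
    unlink $in or die "Could not delete '$in': $!";
    return $out;
}

# Usage (hypothetical script name): perl fixdat.pl /share/dev/dumps/*.out
strip_header_and_rename($_) for @ARGV;
```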

Perl: increasing a counter every time the script runs

I have a script that compares 2 files and prints the matching lines to a file. I want to add logic to identify how long these devices have been matched. Currently I start the count at 1, and I want to increase that number every time the script runs and finds a match.
Example.
Input files:
retiredDevice.txt
Alpha
Beta
Gamma
Delta
prodDevice.txt
first
second
third
forth
Gamma
Delta
Output file:
final_result.txt
1 Delta
1 Gamma
My objective is to add a counter to each matching line to show how long "Delta" and "Gamma" have been matching. The script runs every week, so every run should add 1; when I audit final_result.txt, the result should look like:
Delta 4
Gamma 3
The result tells me that Delta has matched for the last 4 weeks and Gamma for the last 3 weeks.
#! /usr/local/bin/perl
my $ndays = 1;
my $f1 = "/opt/retiredDevice.txt";
my $f2 = "prodDevice.txt";
my $outfile = "/opt/final_result.txt";
my %results = ();
open FILE1, "$f1" or die "Could not open file: $! \n";
while(my $line = <FILE1>){ $results{$line}=1;
}
close(FILE1);
open FILE2, "$f2" or die "Could not open file: $! \n";
while(my $line =<FILE2>) {
$results{$line}++;
}
close(FILE2);
open (OUTFILE, ">$outfile") or die "Cannot open $outfile for writing \n";
foreach my $line (keys %results) {
my $x = $ndays;
$x++;
print OUTFILE "$x : ", $line if $results{$line} != 1;
}
close OUTFILE;
Thanks in advance for any help!
Based on your earlier question and comments, perhaps this might work.
use strict;
use warnings;
use autodie;
my $logfile = 'int.txt';
my $f1 = shift || "/opt/test.txt";
my $f2 = shift || "/opt/test1.txt";
my %results;
open my $file1, '<', $f1;
while (my $line = <$file1>) {
chomp $line;
$results{$line} = 1;
}
open my $file2, '<', $f2;
while (my $line = <$file2>) {
chomp $line;
$results{$line}++;
}
{ ############ added part
my %c;
for (keys %results) {
$c{$_} = $results{$_} if $results{$_} > 1;
}
%results = %c;
} ############ end added part
my (%log, $log);
if ( -e $logfile ) {
open $log, '<', $logfile;
while (<$log>) {
my ($num, $key) = split;
$log{$key} = $num;
}
}
open $log, '>', $logfile or die $!;
for my $key (keys %results) {
my $old = ( $log{$key} || 0 ); # keep old count, or 0 otherwise
my $new = ( $results{$key} ? 1 : 0 ); # 1 if it exists, 0 otherwise
print $log $old + $new, " $key\n";
}
Perform this computation in two steps.
Each time you run the comparison between retired and prod, produce an output file that you save with a unique file name, e.g. result-XXX where XXX denotes when you ran the comparison.
Then write a script which iterates over all of the result-XXX files and produces a summary.
I would name the files result-YYYY-MM-DD where YYYY-MM-DD is the date that the comparison was created. Then it will be relatively easy to iterate over a subset of the files (e.g. ones for a certain month).
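A minimal sketch of the summary step, assuming each run's result file is named result-YYYY-MM-DD, holds one matched device per line, and sits in a single directory (match_streaks and read_devices are hypothetical names). For each device in the newest file, it counts how many consecutive runs, ending with the newest one, contain it, which is the "matched for the last N weeks" figure the question asks for.

```perl
use strict;
use warnings;

# Read one device name per line, skipping blank lines.
sub read_devices {
    my ($file) = @_;
    open my $fh, '<', $file or die "Could not open '$file': $!";
    my @devices;
    while (my $line = <$fh>) {
        chomp $line;
        push @devices, $line if length $line;
    }
    close $fh;
    return @devices;
}

# For each device in the newest result file, count the consecutive runs
# (newest first) it appears in. ISO dates sort correctly as plain strings.
sub match_streaks {
    my ($dir) = @_;
    my @files = glob("$dir/result-????-??-??");
    @files = reverse sort @files;                # newest first
    return {} unless @files;

    my %streak = map { ($_ => 1) } read_devices($files[0]);
    my %active = %streak;                        # streaks not yet broken
    for my $file (@files[1 .. $#files]) {
        last unless %active;
        my %seen = map { ($_ => 1) } read_devices($file);
        for my $device (keys %active) {
            if ($seen{$device}) { $streak{$device}++ }
            else                { delete $active{$device} }
        }
    }
    return \%streak;
}

# Usage (hypothetical script name): perl streaks.pl /opt/results
if (@ARGV == 1) {
    my $streak = match_streaks($ARGV[0]);
    print "$_ $streak->{$_}\n" for sort keys %$streak;
}
```

glob splits its pattern on whitespace, so this sketch assumes the directory path contains no spaces.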
Or store the data in a relational database.