How to show matching and Miss matching records of two text files in command prompt using Perl? - perl

I'm using two text files sampleA.txt and sampleB.txt. I have two fields in each file and I need to compare first record(first row) of sampleA.txt with the first row of sampleB.txt and I want to show matching records as well as miss matching records in command prompt.I need to do that in Perl.
Using the below script I'm getting one output but it is wrong. I need to populate both matching as well as mismatching. How to do that?
sampleA.txt:
1|X
2|A
4|Z
5|A
sampleB.txt:
2|A
2|X
3|B
4|C
Output I'm getting:
2|A
2|X
4|C
Outputs I want:
Matching-Output:
2|A
Miss-matching-Output:
1|X
4|Z
5|A
3|B
4|C
Perl Script:
#!/usr/bin/perl
use strict;
use warnings;
open(FILE1,'C:\Users\sathiya.kumar\Desktop\sampleA.txt') || die $!;
open(FILE2,'C:\Users\sathiya.kumar\Desktop\sampleB.txt') || die $!;
my $interline;
while (my $line= <FILE1>) {
my #fields = split('\|',$line);
parser($fields[0]);
}
sub parser {
my $mergeid = shift;
while (defined $interline || ($interline= <FILE2>)) {
my #fields = split('\|',$interline);
my $key = $fields[0];
if ($key lt $mergeid) {
# Skip non-matching records
$interline = undef;
next;
} elsif ($key gt $mergeid) {
# wait for next key
last;
} else {
print $interline;
$interline = undef;
}
}
}
close(FILE1);
close(FILE2);
Let me know if you need more information.

You left out 2|X:
use strict;
use warnings;
use 5.016;
use Data::Dumper;
#Create a set from the entries in sampleA.txt:
my $fname = 'sampleA.txt';
open my $A_INFILE, '<', $fname
or die "Couldn't open $fname: $!";
my %a;
while (my $line = <$A_INFILE>) {
chomp $line;
$a{$line} = undef;
}
close $A_INFILE;
say Dumper(\%a);
#Create a set from the entries in sampleB.txt:
$fname = 'sampleB.txt';
open my $B_INFILE, '<', $fname
or die "Couldn't open $fname: $!";
my %b;
while (my $line = <$B_INFILE>) {
chomp $line;
$b{$line} = undef;
}
close $B_INFILE;
say Dumper(\%b);
#Divide the entries in both files into matches and mismatches:
my (#matches, #mismatches);
for my $a_val (keys %a) {
if (exists $b{$a_val}) {
push #matches, $a_val;
}
else {
push #mismatches, $a_val;
}
}
for my $b_val (keys %b) {
if (not exists $a{$b_val}) {
push #mismatches, $b_val;
}
}
say Dumper(\#matches);
say Dumper(\#mismatches);
--output:--
$VAR1 = {
'5|A' => undef,
'4|Z' => undef,
'1|X' => undef,
'2|A' => undef
};
$VAR1 = {
'2|X' => undef,
'3|B' => undef,
'4|C' => undef,
'2|A' => undef
};
$VAR1 = [
'2|A'
];
$VAR1 = [
'5|A',
'4|Z',
'1|X',
'2|X',
'3|B',
'4|C'
];
If you evaluate a hash in scalar context, it returns false if the hash is empty. If there are any key/value pairs, it returns true; more precisely, the value returned is a string consisting of the number of used buckets and the number of allocated buckets, separated by a slash. This is pretty much useful only to find out whether Perl's internal hashing algorithm is performing poorly on your data set. For example, you stick 10,000 things in a hash, but evaluating %HASH in scalar context reveals "1/16" , which means only one out of sixteen buckets has been touched, and presumably contains all 10,000 of your items. This isn't supposed to happen. If a tied hash is evaluated in scalar context, the SCALAR method is called (with a fallback to FIRSTKEY ).
http://perldoc.perl.org/perldata.html

Related

Passing string and hash ref to subroutines

I am passing a string and a hash to a subroutine where the hash is accepted as a reference to the subroutine. At the end of the subroutine call, I expect my hash to be filled. Snapshot:
#!/usr/local/bin/perl5.8
sub passHashAndFile {
my ($file, $hashRef) = #_;
open(HANDLE, $file) or die("Can not open file $file \n");
while(<HANDLE>) {
my #splitted_values = split("--", $_);
$hashRef->{$spllited_values[0]} = $hashRef->{$spllited_values[1]};
}
close(HANDLE);
}
my %hash;
passHashAndFile("test.txt", %hash);
foreach my $elem (keys %hash) {
print "Key = $elem, Value = $hash{$elem}\n";
}
And my test.txt looks like this:
1--2
3--4
5--6
7--8
I am catching hash as an reference and then dereferencing it to fill the values. What wrong did it do?
passHashAndFile("test.txt", %hash);
should be
passHashAndFile("test.txt", \%hash);
Additionally, you have misspelled variable names, you assigning the wrong value, you're using unlocalized global vars, and you're using problematic 2-arg open. Also, most people would have the sub return a hash ref instead of taking a ref to an empty hash.
#!/usr/local/bin/perl5.8
use strict;
use warnings;
sub parse_file {
my ($qfn) = #_;
open(my $fh, '<', $qfn)
or die("Can't open file \"$qfn\": $!\n");
my %hash;
while (my $line = <$fh>) {
my ($key, $val) = split(/--/, $line);
$hash{$key} = $val;
}
return \%hash;
}
my $hash = parse_file("test.txt");
for my $key (keys %$hash) {
print "Key = $key, Value = $hash->{$key}\n";
}
We usually leave out of our answers, but always use use strict; use warnings;. It would have found at least one of the problems.

perl: How to make 'warn' think we read from a file?

I have a function (a variation of string++):
sub inc
{
$_[0] =~ /^(.*?)([0-9]+)$/;
my ($a,$b)=($1,$2);
die "cannot increment [$_[0]]" unless defined $b;
warn "increment overflow [$_[0]]" if length(++$b) != length($2);
$a.$b;
}
It is invoked in many places of a script, on different data (sometimes from a file, sometimes from a database).
When I read from a filehandle, die and warn print a message like this:
cannot increment [abc] at script line 5, <filehandle> line 123.
otherwise a shorter message is printed:
cannot increment [abc] at script line 5.
When I read from database I would like to have a message like this:
cannot increment [abc] at script line 5, <SELECT...> line 123.
Is it possible?
Setting the line number is quite simple: an assignment to $. can be made. But how to set the 'filehandle' part and make it visible?
I have found such a workaround:
my $fh = "SELECT...";
open $fh, "/dev/null";
<$fh>;
but it is a bit long, and it actually does open a file.
The filehandle information that appears in warn and die messages is only set after calls to <HANDLE>, readline, tell, eof, and seek. When you fetch data from a database with DBI, for example, you're not calling any of these, so you have to pass the extra data yourself.
One way to do this is to write a custom exception class that stringifies to the text you want:
package MyException;
use strict;
use warnings 'all';
use v5.18.0;
use overload '""' => \&as_string;
sub new {
my ($self, $message, $src, $src_line) = #_;
my ($package, $file, $line) = caller;
if (! defined $src && ref ${^LAST_FH} eq 'GLOB') {
$src = *${^LAST_FH}{NAME};
$src_line = $.;
}
bless { message => $message,
file => $file,
line => $line,
src => $src,
src_line => $src_line }, $self;
}
sub as_string {
my ($self) = #_;
my $message = "$self->{message} at $self->{file} line $self->{line}";
if (defined $self->{src} && defined $self->{src_line}) {
$message .= ", <$self->{src}> line $self->{src_line}";
}
$message .= "\n";
}
1;
Note that Perl 5.18.0 or up is required to use the read-only ${^LAST_FH} variable, which holds a reference to the last read filehandle.
Here's how you would use this when reading from a file:
use strict;
use warnings 'all';
use MyException;
while (<DATA>) {
warn MyException->new('foo'); # equivalent to warn 'foo'
}
__DATA__
first
second
Output:
foo at ./myscript line 9, <DATA> line 1
foo at ./myscript line 9, <DATA> line 2
And here's how you would use it when fetching records from a database:
use strict;
use warnings 'all';
use DBI;
use MyException;
my $dbh = DBI->connect('dbi:mysql:test', 'user', 'pass', {
RaiseError => 1
});
my $sql = 'SELECT * FROM test';
my $sth = $dbh->prepare($sql);
$sth->execute;
my $count;
while (my $row = $sth->fetch) {
warn MyException->new('foo', $sql, ++$count);
}
Output:
foo at ./myscript line 19, <SELECT * FROM test> line 1
foo at ./myscript line 19, <SELECT * FROM test> line 2
(Unfortunately, DBI doesn't provide a method to get the number of rows that have been fetched so far, so you have to count them yourself.)
Since you're trying to warn or die from inside a subroutine, you have to do a little bit more work. The simplest approach for die would be to trap exceptions from your subroutine with eval and re-throw them:
my $count = 1;
while (my $row = $sth->fetch) {
eval {
inc($row[0]);
};
if ($# =~ /^(cannot increment \[.*?\])/) {
die MyException->new($1, $sql, $count);
}
elsif ($#) {
die $#;
}
$count++;
}
You can handle warnings in a similar way by creating a __WARN__ handler:
{
my $count = 1;
local $SIG{__WARN__} = sub {
if ($_[0] =~ /^(increment overflow \[.*?\])/) {
warn MyException->new($1, $sql, $count);
}
else {
warn #_;
}
};
while (my $row = $sth->fetch) {
inc($row[0]);
$count++;
}
}
You may prefer this implementation of your inc subroutine. Your own uses the reserved variables $a and $b, as well as saving and retrieving the initial non-numeric part of the string
Note that the STDERR output is not in sync with STDOUT, so the warning appears prematurely in the aggregated text. In reality the warning is issued only when the passed string has an all-nines numeric field
use strict;
use warnings 'all';
my $s = 'ZZ90';
for ( 1 .. 20 ) {
$s = inc_str($s);
print $s, "\n";
}
sub inc_str {
my ($str) = #_;
$str =~ s{([0-9]+)$}{
my $num = $1;
warn "Increment overflow [$str]" unless $num =~ /[^9]/;
sprintf '%0*d', length($num), $num+1;
}e or die "Cannot increment [$str]";
return $str;
}
output
Increment overflow [ZZ99] at E:\Perl\source\inc_str.pl line 18.
ZZ91
ZZ92
ZZ93
ZZ94
ZZ95
ZZ96
ZZ97
ZZ98
ZZ99
ZZ100
ZZ101
ZZ102
ZZ103
ZZ104
ZZ105
ZZ106
ZZ107
ZZ108
ZZ109
ZZ110

How to distinguish between "0" and NULL in perl?

Here we are looking for the string "reftext" in the given file. The line next to this contains a string with 3 integers. So we are extracting them in #all_num. We are printing the value of #all_num[2] only if is not NULL. But the logic used here doesn't print #all_num[2] even if it has 0.
#!/usr/bin/perl
open( READFILE, "<myfile.txt" );
#list = <READFILE>;
$total_lines = scalar #list;
for ( $count = 0; $count < $total_lines; $count++ ) {
if (#list[ $count =~ /reftext/ )
{
#all_num = #list[ $count + 1 ] =~ /(\d+)/g;
if ( #all_num[2] != NULL ) {
print "#all_num[2]\n";
}
}
}
Hope this helps,
use strict;
use warnings;
my #fvals = (
[ i => undef ],
[ j => 0 ],
[ k => "" ],
);
for my $r (#fvals) {
my ($k, $v) = #$r;
if (!defined($v)) { print "$k is undef\n"; }
elsif (!length($v)) { print "$k is empty string\n"; }
# elsif (!$v) { print "$k is zero\n"; }
# recognizes zero value in "0.0" or "0E0" notation
elsif ($v == 0) { print "$k is zero\n"; }
}
output
i is undef
j is zero
k is empty string
Perl does not include a NULL, so the line
if(#all_num[2]!= NULL)
is nonsensical in Perl. (More accurately, it attempts to locate a sub named NULL and run it to get the value to compare against #all_num[2], but fails to do so because you (presumably) haven't defined such a sub.) Note that, if you had enabled use strict, this would cause a fatal error instead of pretending to work. This is one of the many reasons to always use strict.
Side note: When you pull a value out of an array, it's only a single value, so you should say $all_num[2] rather than #all_num[2] when referring to the third element of the array #all_num. (Yes, this is a little confusing to get used to. I hear that it's been changed in Perl 6, but I'm assuming you're using Perl 5 here.) Note that, if you had enabled use warnings, it would have told you that "Scalar value #all_num[2] better written as $all_num[2]". This is one of the many reasons to always use warnings.
If you want to test whether $all_num[2] contains a value, the proper way to express that in Perl is
if (defined $all_num[2])
This is how your program would look using best practices
You should
Always use strict and use warnings, and declare all your variables with my
Use the three-parameter form of open
Check that open calls succeeded, and include $! in the die string if not
Use a while loop to process a file one line at a time, in preference to reading the entire file into memory
#!/usr/bin/perl
use strict;
use warnings;
open my $fh, '<', 'myfile.txt' or die $!;
while ( <$fh> ) {
next unless /reftext/;
my $next_line = <$fh>;
my #all_num = $next_line =~ /\d+/g;
print "$all_num[2]\n" if defined $all_num[2];
}
Try this:
#!/usr/bin/perl
use warnings;
use strict;
open(READFILE, "<", "myfile.txt") or die $!;
my #list = <READFILE>;
my $total_lines = scalar #list;
close (READFILE);
for(my $count=0; $count<$total_lines; $count++)
{
if($list[$count] =~ /reftext/)
{
my #all_num = $list[$count+1] =~ /(\d+)/g;
if($all_num[2] ne '')
{
print "$all_num[2]\n";
}
}
}
To check a variable is null or not:
if ($str ne '')
{
print $str;
}
or better:
my ($str);
$str = "";
if (defined($str))
{
print "defined";
}
else
{
print "not defined";
}
If the other answers do not work, try treating the variable as a string:
if ( $all_num[2] == 'null' && length($all_num[2]) == 4 ){
# null
} else {
# not null
}
As with any code you write, be sure to test your code.

Read ini files without section names

I want to make a configuration file which hold some objects, like this (where of course none of the paramaters can be considered as a primary key)
param1=abc
param2=ghj
param1=bcd
param2=hjk
; always the sames parameters
This file could be read, lets say with Config::IniFiles, because it has a direct transcription into ini file, like this
[0]
param1=abc
param2=ghj
[1]
param1=bcd
param2=hjk
with, for example, something like
perl -pe 'if (m/^\s*$/ || !$section ) print "[", ($section++ || 0) , "]"'
And finish with
open my $fh, '<', "/path/to/config_file.ini" or die $!;
$cfg = Config::IniFiles->new( -file => $fh );
(...parse here the sections starting with 0.)
But, I here ask me some question about the thing becoming quite complex....
(A) Is There a way to transform the $fh, so that it is not required to execute the perl one-liner BEFORE reading the file sequentially? So, to transform the file during perl is actually reading it.
or
(B) Is there a module to read my wonderfull flat database? Or something approching? I let myslef said, that Gnu coreutils does this kind of flat file reading, but I cannot remember how.
You can create a simple subclass of Config::INI::Reader:
package MyReader;
use strict;
use warnings;
use base 'Config::INI::Reader';
sub new {
my $class = shift;
my $self = $class->SUPER::new( #_ );
$self->{section} = 0;
return $self;
}
sub starting_section { 0 };
sub can_ignore { 0 };
sub parse_section_header {
my ( $self, $line ) = #_;
return $line =~ /^\s*$/ ? ++$self->{section} : undef ;
}
1;
With your input this gives:
% perl -MMyReader -MData::Dumper -e 'print Dumper( MyReader->read_file("cfg") )'
$VAR1 = {
'1' => {
'param2' => 'hjk',
'param1' => 'bcd'
},
'0' => {
'param2' => 'ghj',
'param1' => 'abc'
}
};
You can use a variable reference instead of a file name to create a filehandle that reads from it:
use strict;
use warnings;
use autodie;
my $config = "/path/to/config_file.ini";
my $content = do {
local $/;
open my $fh, "<", $config;
"\n". <$fh>;
};
# one liner replacement
my $section = 0;
$content =~ s/^\s*$/ "\n[". $section++ ."]" /mge;
open my $fh, '<', \$content;
my $cfg = Config::IniFiles->new( -file => $fh );
# ...
You can store the modified data in a real file or a string variable, but I suggest that you use paragraph mode by setting the input record separator $/ to the empty string. Like this
use strict;
use warnings;
{
local $/ = ''; # Read file in "paragraphs"
my $section = 0;
while (<DATA>) {
printf "[%d]\n", $section++;
print;
}
}
__DATA__
param1=abc
param2=ghj
param1=bcd
param2=hjk
output
[0]
param1=abc
param2=ghj
[1]
param1=bcd
param2=hjk
Update
If you read the file into a string, adding section identifiers as above, then you can read the result directly into a Config::IniFiles object using a string reference, for instance
my $config = Config::IniFiles->new(-file => \$modified_contents)
This example shows the tie interface, which results in a Perl hash that contains the configuration information. I have used Data::Dump only to show the structure of the resultant hash.
use strict;
use warnings;
use Config::IniFiles;
my $config;
{
open my $fh, '<', 'config_file.ini' or die "Couldn't open config file: $!";
my $section = 0;
local $/ = '';
while (<$fh>) {
$config .= sprintf "[%d]\n", $section++;
$config .= $_;
}
};
tie my %config, 'Config::IniFiles', -file => \$config;
use Data::Dump;
dd \%config;
output
{
# tied Config::IniFiles
"0" => {
# tied Config::IniFiles::_section
param1 => "abc",
param2 => "ghj",
},
"1" => {
# tied Config::IniFiles::_section
param1 => "bcd",
param2 => "hjk",
},
}
You may want to perform operations on a flux of objects (as Powershell) instead of a flux of text, so
use strict;
use warnings;
use English;
sub operation {
# do something with objects
...
}
{
local $INPUT_RECORD_SEPARATOR = '';
# object are separated with empty lines
while (<STDIN>) {
# key value
my %object = ( m/^ ([^=]+) = ([[:print:]]*) $ /xmsg );
# key cannot have = included, which is the delimiter
# value are printable characters (one line only)
operation ( \%object )
}
A like also other answers.

Dealing with multiple capture groups in multiple records

Data Format:
attribname: data
Data Example:
cheese: good
pizza: good
bagel: good
fire: bad
Code:
my $subFilter='(.+?): (.+)';
my #attrib = ($dataSet=~/$subFilter/g);
for (#attrib)
{
print "$_\n";
}
The code spits out:
cheese
good
pizza
good
[etc...]
I was wondering what an easy Perly way to do this is? I am parsing the data from a log the data above is trash for simplicity. I am newer to Perl, I suspect I could do this via fanangling indexes, but I was wondering if there is a short method of implementing this? Is there any way to have the capture groups put into two different variables instead of serially appended to the list along with all matches?
Edit: I want the attribute and it's associated value together so I can the do what I need to to them. For example if within my for loop I could access both the attribute name and attribute value.
Edit:
I tried
my %attribs;
while (my $line = <$data>)
{
my ($attrib, $value) = ($line=~m/$subFilter/);
print $attribs{$attrib}," : ", $value,"\n";
}
and no luck :( I don't get any output with this. My data is in a variable not a file, because it parsed out of a set of parent data which is in a file. It would be convenient if the my variable worked so that my (#attrib, #value) = ($line=~/$subFilter/g); filled the lists appropriately with the multiple matches.
Solution:
my #line = ($7 =~/(.+?)\n/g);
for (#line)
{
my ($attrib, $value) = ($_=~m/$subFilter/);
if ($attrib ne "")
{
print $attrib," : ", $value,"\n";
}
}
I'm not really clear on what you actually want to store, but here's how you could store the data in a hash table, with '1' indicating good and '0' indicating 'bad':
use strict;
use warnings;
use Data::Dumper;
my %foods;
while (my $line = <DATA>)
{
chomp $line;
my ($food, $good) = ($line =~ m/^(.+?): (.+)$/);
$foods{$food} = ($good eq 'good' ? 1 : 0);
}
print Dumper(\%foods);
__DATA__
cheese: good
pizza: good
bagel: good
fire: bad
This prints:
$VAR1 = {
'bagel' => 1,
'cheese' => 1,
'fire' => 0,
'pizza' => 1
};
A sensible approach would be to make use of the split function:
my %attrib;
open my $data, '<', 'fileName' or die "Unable to open file: $!";
while ( my $line = <$data> ) {
my ( $attrib, $value ) = split /:\s*/, $line, 2;
$attrib{$attrib} = $value;
}
close $data;
foreach my $attrib ( keys %attrib ) {
print "$attrib: $attrib{$attrib}\n";
}
If you're into one-liners, the following would achieve the same:
$ perl -F/:\s*/ -ane '$attrib{$F[0]} = $F[1]; } END { print $_,"\t",$attrib{$_},"\n" foreach keys %attrib;" fileName