compare the value of same items with one exception case - perl

A,food,75
B,car,136
A,car,69
B,shop,80
A,house,179
B,food,75
C,car,136
ECX5,flight,50
QC4,train,95
C,food,85
B,house,150
D,shop,80
EAX5,flight,50
QA4,train,75
F,movie,
It should do comparison between the values of same type (wherever the 2nd column matches) and Print the differ .Now i want output to look like:
**A,food,75 is not matching with B,food,75 C,food,85
A,car,69 is not matching with C,car,136 B,Car,136
A,house,179 is not matching with B,house,150
QC4,train,95 is not matching with QA4,train,75
F,movie missing value
Code I've written is below but its not printing the format the way I want.
while (FILE) {
my $line = $_ ;
my #lines = split /,/, $line ;
$data{$lines[1]}{$lines[0]} = $lines[2] ;
}
foreach my $item (keys %val) {
foreach my $letter1 (keys %{$val{$item}}) {
foreach my $letter2 (keys %{$val{$item}}) {
if ( ($val{$item}{$letter1} != $val{$item}{$letter2}) && ($letter1 ne
$letter2) && ( (!defined $done{$item}{$letter1}{$letter2}) ||
(!defined
$done{$item}{$letter2}{$letter1}) ) ) {
print "$item : $letter1, $val{$item}{$letter1}, $letter2 ,
$val{$item}
{$letter2}\n" ;
}
}

Really hard to follow the logic of your code.
But I seem to get the desired result with this:
[Edit] The code was edited as per the comments
use strict;
use warnings;
my (%hash,
);
while(my $line=<DATA>) {
chomp $line;
my ($letter, $object, $number)=split /,/, $line;
### here we are dealing with missing values
if ($number =~ m{^\s*$}) {
print $line, "missing value\n";
next;
}
### here we dissever exceptional items apart from the others
if ($letter=~m{^E[AC]X\d$}) {
$object = "exceptional_$object";
}
$number+=0; # in case there is whitespace at the end
push #{$hash{$object}{$number}}, [$letter,$number,$line];
}
for my $object(sort keys %hash) {
my $oref = $hash{$object};
if (1==keys %$oref) {
next;
}
my $str;
for my $item (values %$oref) {
$str .= $str ? " $item->[0][2]" : "$item->[0][2] is not matching with";
}
print ($str,"\n");
}
__DATA__
A,food,75
B,car,136
A,car,69
B,shop,80
A,house,179
B,food,75
C,car,136
ECX5,flight,50
ECX4,train,95
C,food,85
B,house,150
D,shop,80
EAX5,flight,50
EAX4,train,75
F,movie,
output
F,movie,missing value
A,car,69 is not matching with B,car,136
EAX4,train,75 is not matching with ECX4,train,95
C,food,85 is not matching with A,food,75
A,house,179 is not matching with B,house,150
What the algorithm does:
Looping through the input we remember the all the lines for each unique pair of object and number.
After going through the input loop we do the following:
For each object we skip it if it has no different numbers:
if (1==keys %$oref) {
next
}
If it has, we build an output string from a list of the first remembered lines for that object and number (that is we omit the duplicates for the object and number);
the first item from the list amended with "is not matching with".
Also, I am reading from the special filehandle DATA, which accesses embedded data in the script. This is for convenience of demoing the code

Related

Split string into a hash of hashes (perl)

at the moment im a little confused..
I am looking for a way to write a string with an indefinite number of words (separated by a slash) in a recursive hash.
These "strings" are output from a text database.
Given is for example
"office/1/hardware/mouse/count/200"
the next one can be longer or shorter..
This must be created from it:
{
office {
1{
hardware {
mouse {
count => 200
}
}
}
}
}
Any idea ?
Work backwards. Split the string. Use the last two elements to make the inner-most hash. While more words exist, make each one the key of a new hash, with the inner hash as its value.
my $s = "office/1/hardware/mouse/count/200";
my #word = split(/\//, $s);
# Bottom level taken explicitly
my $val = pop #word;
my $key = pop #word;
my $h = { $key => $val };
while ( my $key = pop #word )
{
$h = { $key => $h };
}
Simple recursive function should do
use strict;
use warnings;
use Data::Dumper;
sub foo {
my $str = shift;
my ($key, $rest) = split m|/|, $str, 2;
if (defined $rest) {
return { $key => foo($rest) };
} else {
return $key;
}
}
my $hash = foo("foo/bar/baz/2");
print Dumper $hash;
Gives output
$VAR1 = {
'foo' => {
'bar' => {
'baz' => '2'
}
}
};
But like I said in the comment: What do you intend to use this for? It is not a terribly useful structure.
If there are many lines to be read into a single hash and the lines have a variable number of fields, you have big problems and the other two answers will clobber data by either smashing sibling keys or overwriting final values. I'm supposing this because there is no rational reason to convert a single line into a hash.
You will have to walk down the hash with each field. This will also give you the most control over the process.
our $hash = {};
our $eolmark = "\000";
while (my $line = <...>) {
chomp $line;
my #fields = split /\//, $line;
my $count = #fields;
my $h = $hash;
my $i = 0;
map { (++$i == $count) ?
($h->{$_}{$eolmark} = 1) :
($h = $h->{$_} ||= {});
} #fields;
}
$h->{$_}{$eolmark} = 1 You need the special "end of line" key so that you can recognize the end of a record and still permit longer records to coexist. If you had two records
foo/bar/baz foo/bar/baz/quux, the second would overwrite the final value of the first.
$h = $h->{$_} ||= {} This statement is a very handy idiom to both create and populate a cache in one step and then take a shortcut reference to it. Never do a hash lookup more than once.
HTH

Extra HASH() reference added to Perl hash output

I'm trying to read a FORTRAN program using Perl, and remove an INCLUDE command and replace it with a USE.
This is working great, except when printing out the contents of the hash storing the existing USE statements, I get an extra entry.
my #lines = ( );
my %uses = { };
foreach $f1line (<>) {
$f1line =~ s/\r[\n]*//g;
if ($f1line =~ /^\s*INCLUDE 'FILE1.CMN'/ ||
$f1line =~ /^\s*INCLUDE 'FILE2.CMN'/ ||
$f1line =~ /^\s*INCLUDE 'FILE3.CMN'/) {
$f1line = " USE My_Mod";
}
if ($f1line =~ /^\s*USE /) {
$uses{$f1line} = 1;
}
push #lines, $f1line . $/;
}
$found = 0;
foreach $line (#lines) {
if ($found == 0 && $line =~ /^\s*USE /) {
foreach my $x (sort(keys(%uses))) {
print $x . $/; # (1)
}
$found = 1;
} elsif ($found == 1 && $line =~ /^\s*USE /) {
next;
} else {
print $line;
}
}
The output is this:
C Include parameters here
USE My_Mod
USE MyOther_Mod
USE EvenAnother_Mod
HASH(0x7f9dc3805ce8)
Where is the HASH(0x...) reference coming from? The only place I'm printing the contents of the hash is on line (1). It almost looks like iterating over the keys of the hash is somehow including the hash itself.
How can I get rid of this?
You are not really having a big problem, the big deal here is that you are not able to see the errors you are doing.
That's why you should always strict and warnings
In your code you start with:
my %uses = { };
When it should be:
my %uses = ();
or
my %uses; #it's fine also
And then it will works.
By using {} in a "hash" context you could create a hashref which is not the case.
A reference to an anonymous hash can be created using curly brackets:
$hashref = {
'Adam' => 'Eve',
'Clyde' => 'Bonnie',
};
Also is a good practice declare your variables in foreach loop like:
foreach my $line (#lines) {
And in the rest of your code.

what aren't I getting here?

This one really has me confused and I don't know how to accurately title it.
I am writing a program, the purpose is irrelevant, but some of you may know as I've been asking a few questions about it recently.
I'm going to post the entire program but I don't think that's necessary. The part you need to look at is the nested loops where it says "beginning search algorithm."
The program takes a long time to complete, so after every iteration of the outermost loop, I print a '.'. after every 7 dots a new line is printed.
for some reason, however no dots will get printed, until a newline is printed.
heres the code:
#!/usr/bin/perl
use v5.14;
use warnings;
# this is a cgi implementation of a theorum proover.
# the program uses resolution refutation, using a breadth-first and set of support strategy
# to generate a proof(if possible) and relay the results to the user.
########################################################################################
#Algorithm:
#1.) Get size(i) of knowledge base
#2.) untill you have i clauses
# 3.) get the clause, add to knowledge base
#4.) get the conclusion variable(conjecture)
#5.) add the negation of the conjecture to the knowledge base
#6.) add the negation of the conjecture to the SOS set.
#7.) compare the SOS set to ever other clause
# 8.) if resolution is possible, add the new clause to the knowledge base if it does not already exist.
# 9.) add the new clause to the SOS set.
#10.) repeat 7-9 untill the null clause is generated or no more resolution is possible.
########################################################################################
my $conclusion;
my $conclusion2;
my #conclusion;
my #SOS;
my #clauses;
my $found=0;
#batch mode
if($ARGV[0])
{
my $filename = $ARGV[0];
open(IN, "<", $filename);
chomp(#clauses=<IN>);
close(IN);
for(#clauses)
{
$_ =~ s/[^A-Za-z~,]//g;
}
#negate the negation to get the desired conclusion for later
$conclusion2=$clauses[$#clauses];
print "$conclusion2";
#conclusion = split("", $conclusion2);
if($conclusion[0] eq '~')
{
splice(#conclusion, 0, 1);
$found=1;
}
if (!$found)
{
$conclusion = "~$conclusion2";
}
else
{
$conclusion = join("", #conclusion);
}
#now break up each line and make #clauses 2d
$_ = [split /,/ ] for #clauses;
}
#interactive mode
else
{
my $count=0;
say "Welcome to my Theorum Proover!";
say "How many clauses are in your knowledge base?";
say "(this does not include the conclusion)";
print "Amount: ";
my $amt = <>;
say "Enter your clauses: ";
say "Negations can be indicated with a '~'.";
say "Variable names must contain only letters.";
say "Separate each literal with a ','<br>";
my $clauses;
while($count < $amt)
{
print "clause $count:";
$clauses .= <>;
$clauses =~ s/[^A-Za-z~,]//g;
$clauses .= ";";
$count++;
print "\n";
}
print "\n \n \n Enter the conclusion, your conclusion should be a literal:";
$conclusion = <>;
$conclusion =~ s/[^A-Za-z~]//g;
print "\n";
#negate the conclusion and add it to the set of clauses.
#conclusion = split("", $conclusion);
if($conclusion[0] eq '~')
{
splice(#conclusion, 0, 1);
$found=1;
}
if (!$found)
{
$conclusion2 = "~$conclusion";
}
else
{
$conclusion2 = join("", #conclusion);
}
# split up the contents of the clause string and add them to a 2d array.
#then, add the negated conclusion to the list.
my #PartClauses= split(';', $clauses);
my $last=#PartClauses;
for my $i (0 .. $#PartClauses)
{
my #tmp=split(',', $PartClauses[$i]);
for my $j (0 .. #tmp)
{
$clauses[$i][$j] = $tmp[$j];
}
}
$clauses[$last][0] = $conclusion2;
}
open(RESULTS, ">", 'results.txt');
for my $i (0 .. $#clauses)
{
print RESULTS "clause $i: {";
for my $j (0 .. $#{ $clauses[$i] })
{
print RESULTS "$clauses[$i][$j]";
if($j != $#{ $clauses[$i] })
{
print RESULTS ",";
}
}
print RESULTS "}\n";
}
print RESULTS "_____________________________\n";
print "Beginning search ....";
##################################################
#begin breadthfirst/sos search/add algorithm
$SOS[0][0]=$conclusion2;
my $cSize=$#clauses;
say "\nworking......";
my $sAdd=0;
my $cAdd=0;
my $res=0;
my $flag=0;
my $dots=0;
SOSROW:
for (my $a=0; $a<=$#SOS; $a++)
{
&update;
CLAUSEROW:
for (my $i=0; $i<=$#clauses; $i++)
{
SOSCOL:
for (my $b=0; $b<=$#{ $SOS[$a] }; $b++)
{
CLAUSECOL:
for my $j (0 .. $#{ $clauses[$i] })
{
if($SOS[$a][$b] eq "~$clauses[$i][$j]"
|| $clauses[$i][$j] eq "~$SOS[$a][$b]")
{
my #tmp;
#found a resolution, so add all other literals from
#both clauses to each set as a single clause
#start with the SOS literals(use a hash to keep track of duplicates)
my %seen;
for my $x (0 .. $#{ $SOS[$a] })
{
if($x != $b)
{
$seen{$SOS[$a][$x]}=1;
push #tmp, "$SOS[$a][$x]";
}
}
#now add the literals from the non-SOS clause
for my $y (0 .. $#{ $clauses[$i] })
{
if($y != $j)
{
if(! $seen{ $clauses[$i][$y] })
{
push(#tmp, "$clauses[$i][$y]");
}
}
}
#check to see if the clause is already listed
my $dupl = 0;
my #a1 = sort(#tmp);
my $s1 = join("", #a1);
MATCH:
for my $i (0 .. $#clauses)
{
my #a2= sort(#{ $clauses[$i] });
my $s2= join("", #a2);
if($s1 eq $s2 )
{
$dupl = 1;
last MATCH;
}
}
#if it isn't, go ahead and add it in
if(! $dupl)
{
$res++;
$sAdd++;
$cAdd++;
my $s = $cSize + $cAdd;
push(#SOS, \#tmp);
push(#clauses, \#tmp);
#print out the new clauses.
print RESULTS"clause $s: ";
my $clause = $cSize+$a;
print RESULTS "{";
if($SOS[$sAdd][0])
{
for my $j(0 .. $#{ $clauses[$s] })
{
if($clauses[$s][$j])
{
print RESULTS "$clauses[$s][$j]";
}
if($j!= $#{ $clauses[$s] })
{
print RESULTS ",";
}
}
print RESULTS "} ($i,$clause)\n";
}
#if you found a new res, but there was nothing to push, you found
# the contradiction, so signal and break.
else
{
print RESULTS "} ($i,$clause)\n";
$flag=1;
last SOSROW;
}
}
}
}
}
}
}
close(RESULTS);
if($flag)
{
say "After $res resolutions, a resolvent was found and the empty set was generated.";
say "This indicates that when '$conclusion' is false, the entire knowledge base is false.";
say "Because we know that the clauses in the knowledge base are actually true, we can soundly conclude that '$conclusion must also be true.";
say "The clauses generated by each resolution can be found below.\n\n";
}
else
{
say "We were not able to generate the empty clause.";
say "this means that adding the negation of the desired conclusion does not render the theorum false.";
say "Therefore, we can not safely conclude that '$conclusion' is true.";
say "Any clauses that we were able to generate through a resoluton can be viewed below.\n\n";
}
print `more results.txt`;
sub update
{
if((($dots % 7) == 0))
{
print "\n";
}
if($dots==14)
{
print "You might want to get some coffee.\n";
}
if($dots==35)
{
print "I'm being VERY Thorough.\n";
}
if($dots==63 || $dots==140)
{
print "Hows that coffee?\n";
}
if($dots==105)
{
print "I think it might be time for a second cup of coffee\n"
}
if($dots==210)
{
print "Like I said, VERY thorough\n";
}
if($dots==630)
{
print "My O is bigger than you can imagine\n"
}
$dots++;
print ".";
}
I can't figure out why this is happening. could it have something to do with buffering?
If instead of calling the subroutine, i just say print "."; nothing will be printed until, the prog finishes execution.
Yes, filehandles are buffered by default. If STDOUT points to a terminal it will be line-buffered (nothing is output until a newline is printed), otherwise it will be block-buffered (nothing is output until a certain number of bytes is printed). The easiest way to change that is to set $|=1, which will make the current output filehandle (usually STDOUT unbuffered), so it will flush after every print.

Ignore empty lines in a CSV file using Perl

I have to read a CSV file, abc.csv, select a few fields from them and form a new CSV file, def.csv.
Below is my code. I am trying to ignore empty lines from abc.csv.
genNewCsv();
sub genNewCsv {
my $csv = Text::CSV->new();
my $aCsv = "abc.csv"
my $mCsv = "def.csv";
my $fh = FileHandle->new( $aCsv, "r" );
my $nf = FileHandle->new( $mCsv, "w" );
$csv->print( $nf, [ "id", "flops""ub" ] );
while ( my $row = $csv->getline($fh) ) {
my $id = $row->[0];
my $flops = $row->[2];
next if ( $id =~ /^\s+$/ ); #Ignore empty lines
my $ub = "TRUE";
$csv->print( $nf, [ $id, $flops, $ub ] );
}
$nf->close();
$fh->close();
}
But I get the following error:
Use of uninitialized value $flops in pattern match (m//)
How do I ignore the empty lines in the CSV file?
I have used Stack Overflow question Remove empty lines and space with Perl, but it didn't help.
You can skip the entire row if any fields are empty:
unless(defined($id) && defined($flop) && defined($ub)){
next;
}
You tested if $id was empty, but not $flops, which is why you got the error.
You should also be able to do
unless($id && $flop && $ub){
next;
}
There, an empty string would evaluate to false.
Edit: Now that I think about it, what do you mean by ignore lines that aren't there?
You can also do this
my $id = $row[0] || 'No Val' #Where no value is anything you want to signify it was empty
This will show a default value for the the variable, if the first value evaluated to false.
Run this on your file first to get non-empty lines
sub removeblanks {
my #lines = ();
while(#_) {
push #lines, $_ unless( $_ =~ /^\s*$/ ); #add any other conditions here
}
return \#lines;
}
You can do the following to skip empty lines:
while (my $row = $csv->getline($fh)){
next unless grep $_, #$row;
...
You could use List::MoreUtils to check if any of the fields are defined:
use List::MoreUtils qw(any);
while(my $row = ...) {
next unless any { defined } #{ $row };
...
}

Perl Hashref Substitutions

I'm using DBI to connect to Sybase to grab records in a hash_ref element. The DBI::Sybase driver has a nasty habit of returning records with trailing characters, specifically \x00 in my case. I'm trying to write a function to clean this up for all elements in the hashref, the code I have below does the trick, but I can't find a way to make it leaner, and I know there is away to do this better:
#!/usr/bin/perl
my $dbh = DBI->connect('dbi:Sybase:...');
my $sql = qq {SELECT * FROM table WHERE age > 18;};
my $qry = $dbh->selectall_hashref($sql, 'Name');
foreach my $val(values %$qry) {
$qry->{$val} =~ s/\x00//g;
}
foreach my $key(keys %$qry) {
$qry->{$key} =~ s/\x00//g;
foreach my $val1(keys %{$qry->{$key}}) {
$qry->{$key}->{$val1} =~ s/\x00//g;
}
foreach my $key1(keys %{$qry->{$key}}) {
$qry->{$key}->{$key1} =~ s/\x00//g;
}
While I think that a regex substitution is not exactly an ideal solution (seems like it should be fixed properly instead), here's a handy way to solve it with chomp.
use Data::Dumper;
my %a = (
foo => {
a => "foo\x00",
b => "foo\x00"
},
bar => {
c => "foo\x00",
d => "foo\x00"
},
baz => {
a => "foo\x00",
a => "foo\x00"
}
);
$Data::Dumper::Useqq=1;
print Dumper \%a;
{
local $/ = "\x00";
chomp %$_ for values %a;
}
print Dumper \%a;
chomp will remove a single trailing value equal to whatever the input record separator $/ is set to. When used on a hash, it will chomp the values.
As you will note, we do not need to use the values directly, as they are aliased. Note also the use of a block around the local $/ statement to restrict its scope.
For a more manageable solution, it's probably best to make a subroutine, called recursively. I used chomp again here, but you can just as easily skip that and use s/\x00//g. Or tr/\x00//d, which basically does the same thing. chomp is only safer in that it only removes characters from the end of the string, like s/\x00$// would.
strip_null(\%a);
print Dumper \%a;
sub strip_null {
local $/ = "\x00";
my $ref = shift;
for (values %$ref) {
if (ref eq 'HASH') {
strip_null($_); # recursive strip
} else {
chomp;
}
}
}
First your code:
foreach my $val(values %$qry) {
$qry->{$val} =~ s/\x00//g;
# here you are using a value as if it was a key
}
foreach my $key(keys %$qry) {
$qry->{$key} =~ s/\x00//g;
foreach my $val1(keys %{$qry->{$key}}) {
$qry->{$key}->{$val1} =~ s/\x00//g;
}
foreach my $key1(keys %{$qry->{$key}}) {
$qry->{$key}->{$key1} =~ s/\x00//g;
}
# and this does the same thing twice...
what you should do is:
foreach my $x (values %$qry) {
foreach my $y (ref $x eq 'HASH' ? values %$x : $x) {
$y =~ s/(?:\x00)+$//
}
}
which will clean up only ending nulls in the values of two levels of the hash.
the body of the loop could also be written as:
if (ref $x eq 'HASH') {
foreach my $y (values %$x) {
$y =~ s/(?:\x00)+$//
}
}
else {
$x =~ s/(?:\x00)+$//
}
But that forces you to write the substitution twice, and you shouldn't repeat yourself.
Or if you really want to reduce the code, using the implicit $_ variable works well:
for (values %$qry) {
s/(?:\x00)+$// for ref eq 'HASH' ? values %$_ : $_
}