Perl Hashref Substitutions - perl

I'm using DBI to connect to Sybase to grab records in a hash_ref element. The DBI::Sybase driver has a nasty habit of returning records with trailing characters, specifically \x00 in my case. I'm trying to write a function to clean this up for all elements in the hashref, the code I have below does the trick, but I can't find a way to make it leaner, and I know there is away to do this better:
#!/usr/bin/perl
my $dbh = DBI->connect('dbi:Sybase:...');
my $sql = qq {SELECT * FROM table WHERE age > 18;};
my $qry = $dbh->selectall_hashref($sql, 'Name');
foreach my $val(values %$qry) {
$qry->{$val} =~ s/\x00//g;
}
foreach my $key(keys %$qry) {
$qry->{$key} =~ s/\x00//g;
foreach my $val1(keys %{$qry->{$key}}) {
$qry->{$key}->{$val1} =~ s/\x00//g;
}
foreach my $key1(keys %{$qry->{$key}}) {
$qry->{$key}->{$key1} =~ s/\x00//g;
}

While I think that a regex substitution is not exactly an ideal solution (seems like it should be fixed properly instead), here's a handy way to solve it with chomp.
use Data::Dumper;
my %a = (
foo => {
a => "foo\x00",
b => "foo\x00"
},
bar => {
c => "foo\x00",
d => "foo\x00"
},
baz => {
a => "foo\x00",
a => "foo\x00"
}
);
$Data::Dumper::Useqq=1;
print Dumper \%a;
{
local $/ = "\x00";
chomp %$_ for values %a;
}
print Dumper \%a;
chomp will remove a single trailing value equal to whatever the input record separator $/ is set to. When used on a hash, it will chomp the values.
As you will note, we do not need to use the values directly, as they are aliased. Note also the use of a block around the local $/ statement to restrict its scope.
For a more manageable solution, it's probably best to make a subroutine, called recursively. I used chomp again here, but you can just as easily skip that and use s/\x00//g. Or tr/\x00//d, which basically does the same thing. chomp is only safer in that it only removes characters from the end of the string, like s/\x00$// would.
strip_null(\%a);
print Dumper \%a;
sub strip_null {
local $/ = "\x00";
my $ref = shift;
for (values %$ref) {
if (ref eq 'HASH') {
strip_null($_); # recursive strip
} else {
chomp;
}
}
}

First your code:
foreach my $val(values %$qry) {
$qry->{$val} =~ s/\x00//g;
# here you are using a value as if it was a key
}
foreach my $key(keys %$qry) {
$qry->{$key} =~ s/\x00//g;
foreach my $val1(keys %{$qry->{$key}}) {
$qry->{$key}->{$val1} =~ s/\x00//g;
}
foreach my $key1(keys %{$qry->{$key}}) {
$qry->{$key}->{$key1} =~ s/\x00//g;
}
# and this does the same thing twice...
what you should do is:
foreach my $x (values %$qry) {
foreach my $y (ref $x eq 'HASH' ? values %$x : $x) {
$y =~ s/(?:\x00)+$//
}
}
which will clean up only ending nulls in the values of two levels of the hash.
the body of the loop could also be written as:
if (ref $x eq 'HASH') {
foreach my $y (values %$x) {
$y =~ s/(?:\x00)+$//
}
}
else {
$x =~ s/(?:\x00)+$//
}
But that forces you to write the substitution twice, and you shouldn't repeat yourself.
Or if you really want to reduce the code, using the implicit $_ variable works well:
for (values %$qry) {
s/(?:\x00)+$// for ref eq 'HASH' ? values %$_ : $_
}

Related

Split string into a hash of hashes (perl)

at the moment im a little confused..
I am looking for a way to write a string with an indefinite number of words (separated by a slash) in a recursive hash.
These "strings" are output from a text database.
Given is for example
"office/1/hardware/mouse/count/200"
the next one can be longer or shorter..
This must be created from it:
{
office {
1{
hardware {
mouse {
count => 200
}
}
}
}
}
Any idea ?
Work backwards. Split the string. Use the last two elements to make the inner-most hash. While more words exist, make each one the key of a new hash, with the inner hash as its value.
my $s = "office/1/hardware/mouse/count/200";
my #word = split(/\//, $s);
# Bottom level taken explicitly
my $val = pop #word;
my $key = pop #word;
my $h = { $key => $val };
while ( my $key = pop #word )
{
$h = { $key => $h };
}
Simple recursive function should do
use strict;
use warnings;
use Data::Dumper;
sub foo {
my $str = shift;
my ($key, $rest) = split m|/|, $str, 2;
if (defined $rest) {
return { $key => foo($rest) };
} else {
return $key;
}
}
my $hash = foo("foo/bar/baz/2");
print Dumper $hash;
Gives output
$VAR1 = {
'foo' => {
'bar' => {
'baz' => '2'
}
}
};
But like I said in the comment: What do you intend to use this for? It is not a terribly useful structure.
If there are many lines to be read into a single hash and the lines have a variable number of fields, you have big problems and the other two answers will clobber data by either smashing sibling keys or overwriting final values. I'm supposing this because there is no rational reason to convert a single line into a hash.
You will have to walk down the hash with each field. This will also give you the most control over the process.
our $hash = {};
our $eolmark = "\000";
while (my $line = <...>) {
chomp $line;
my #fields = split /\//, $line;
my $count = #fields;
my $h = $hash;
my $i = 0;
map { (++$i == $count) ?
($h->{$_}{$eolmark} = 1) :
($h = $h->{$_} ||= {});
} #fields;
}
$h->{$_}{$eolmark} = 1 You need the special "end of line" key so that you can recognize the end of a record and still permit longer records to coexist. If you had two records
foo/bar/baz foo/bar/baz/quux, the second would overwrite the final value of the first.
$h = $h->{$_} ||= {} This statement is a very handy idiom to both create and populate a cache in one step and then take a shortcut reference to it. Never do a hash lookup more than once.
HTH

compare the value of same items with one exception case

A,food,75
B,car,136
A,car,69
B,shop,80
A,house,179
B,food,75
C,car,136
ECX5,flight,50
QC4,train,95
C,food,85
B,house,150
D,shop,80
EAX5,flight,50
QA4,train,75
F,movie,
It should do comparison between the values of same type (wherever the 2nd column matches) and Print the differ .Now i want output to look like:
**A,food,75 is not matching with B,food,75 C,food,85
A,car,69 is not matching with C,car,136 B,Car,136
A,house,179 is not matching with B,house,150
QC4,train,95 is not matching with QA4,train,75
F,movie missing value
Code I've written is below but its not printing the format the way I want.
while (FILE) {
my $line = $_ ;
my #lines = split /,/, $line ;
$data{$lines[1]}{$lines[0]} = $lines[2] ;
}
foreach my $item (keys %val) {
foreach my $letter1 (keys %{$val{$item}}) {
foreach my $letter2 (keys %{$val{$item}}) {
if ( ($val{$item}{$letter1} != $val{$item}{$letter2}) && ($letter1 ne
$letter2) && ( (!defined $done{$item}{$letter1}{$letter2}) ||
(!defined
$done{$item}{$letter2}{$letter1}) ) ) {
print "$item : $letter1, $val{$item}{$letter1}, $letter2 ,
$val{$item}
{$letter2}\n" ;
}
}
Really hard to follow the logic of your code.
But I seem to get the desired result with this:
[Edit] The code was edited as per the comments
use strict;
use warnings;
my (%hash,
);
while(my $line=<DATA>) {
chomp $line;
my ($letter, $object, $number)=split /,/, $line;
### here we are dealing with missing values
if ($number =~ m{^\s*$}) {
print $line, "missing value\n";
next;
}
### here we dissever exceptional items apart from the others
if ($letter=~m{^E[AC]X\d$}) {
$object = "exceptional_$object";
}
$number+=0; # in case there is whitespace at the end
push #{$hash{$object}{$number}}, [$letter,$number,$line];
}
for my $object(sort keys %hash) {
my $oref = $hash{$object};
if (1==keys %$oref) {
next;
}
my $str;
for my $item (values %$oref) {
$str .= $str ? " $item->[0][2]" : "$item->[0][2] is not matching with";
}
print ($str,"\n");
}
__DATA__
A,food,75
B,car,136
A,car,69
B,shop,80
A,house,179
B,food,75
C,car,136
ECX5,flight,50
ECX4,train,95
C,food,85
B,house,150
D,shop,80
EAX5,flight,50
EAX4,train,75
F,movie,
output
F,movie,missing value
A,car,69 is not matching with B,car,136
EAX4,train,75 is not matching with ECX4,train,95
C,food,85 is not matching with A,food,75
A,house,179 is not matching with B,house,150
What the algorithm does:
Looping through the input we remember the all the lines for each unique pair of object and number.
After going through the input loop we do the following:
For each object we skip it if it has no different numbers:
if (1==keys %$oref) {
next
}
If it has, we build an output string from a list of the first remembered lines for that object and number (that is we omit the duplicates for the object and number);
the first item from the list amended with "is not matching with".
Also, I am reading from the special filehandle DATA, which accesses embedded data in the script. This is for convenience of demoing the code

Extra HASH() reference added to Perl hash output

I'm trying to read a FORTRAN program using Perl, and remove an INCLUDE command and replace it with a USE.
This is working great, except when printing out the contents of the hash storing the existing USE statements, I get an extra entry.
my #lines = ( );
my %uses = { };
foreach $f1line (<>) {
$f1line =~ s/\r[\n]*//g;
if ($f1line =~ /^\s*INCLUDE 'FILE1.CMN'/ ||
$f1line =~ /^\s*INCLUDE 'FILE2.CMN'/ ||
$f1line =~ /^\s*INCLUDE 'FILE3.CMN'/) {
$f1line = " USE My_Mod";
}
if ($f1line =~ /^\s*USE /) {
$uses{$f1line} = 1;
}
push #lines, $f1line . $/;
}
$found = 0;
foreach $line (#lines) {
if ($found == 0 && $line =~ /^\s*USE /) {
foreach my $x (sort(keys(%uses))) {
print $x . $/; # (1)
}
$found = 1;
} elsif ($found == 1 && $line =~ /^\s*USE /) {
next;
} else {
print $line;
}
}
The output is this:
C Include parameters here
USE My_Mod
USE MyOther_Mod
USE EvenAnother_Mod
HASH(0x7f9dc3805ce8)
Where is the HASH(0x...) reference coming from? The only place I'm printing the contents of the hash is on line (1). It almost looks like iterating over the keys of the hash is somehow including the hash itself.
How can I get rid of this?
You are not really having a big problem, the big deal here is that you are not able to see the errors you are doing.
That's why you should always strict and warnings
In your code you start with:
my %uses = { };
When it should be:
my %uses = ();
or
my %uses; #it's fine also
And then it will works.
By using {} in a "hash" context you could create a hashref which is not the case.
A reference to an anonymous hash can be created using curly brackets:
$hashref = {
'Adam' => 'Eve',
'Clyde' => 'Bonnie',
};
Also is a good practice declare your variables in foreach loop like:
foreach my $line (#lines) {
And in the rest of your code.

Optimize perl hash mess

I have Perl code, which looks messy:
my $x = $h->[1];
foreach my $y (keys %$x) {
my $ax = $x->{$y};
foreach my $ay (keys %$ax) {
if (ref($ax->{$ay}) eq 'JE::Object::Proxy') {
my $bx = $ax->{$ay};
if ($$bx->{class_info}->{name} eq 'HTMLImageElement') {
print $$bx->{value}->{src}, "\n";
}
}
}
}
Is it possible to optimize the code above to not use any variables, just $h, as that one is an input?
Here's my crack at it:
print $$_->{value}{src}, "\n" for grep {
ref $_ eq 'JE::Object::Proxy' &&
$$_->{class_info}{name} eq 'HTMLImageElement'
} map {
values %$_
} values %{ $h->[1] };
You're using keys, when you really just want values.
foreach my $h ( grep { ref() eq 'HASH' } values %$x ) {
foreach my $obj (
grep { ref() eq 'JE::Object::Proxy'
and $_->{class_info}{name} eq 'HTMLImageElement'
} values %$h
) {
say $obj->{value}{src};
}
}
A lot of the "messiness" can be cleaned up by reducing your line count and minimizing how much nested code you have. Use the each command to get the next key and its associated value from the hash in one line. [EDIT: as Axeman pointed out, you really only need the values, so I'm replacing my use of each with values]. Also, use a pair of next statement to skip the print statement.
for my $ax (values %{$h->[1]} ) {
for my $bx (values %$ax ) {
next unless ref($bx) eq 'JE::Object::Proxy';
next unless $$bx->{class_info}->{name} eq 'HTMLImageElement';
print "$$bx->{value}->{src}\n";
}
}
Just removing the helper variables is easy, something like this should do it:
foreach my $y (keys %{$h->[1]}) {
foreach my $ax (%{$h->[1]->{$y}) {
foreach my $ay (keys %$ax) {
if(ref($h->[1]->{$y}->{$ay}) eq 'JE::Object::Proxy') {
if($h->[1]->{$y}->{$ay}->{class_info}->{name} eq 'HTMLImageElement') {
print $h->[1]->{$y}->{$ay}->{value}->{src}, "\n";
}
}
}
}
}
You could also remove the duplicated if:
foreach my $y (keys %{$h->[1]}) {
foreach my $ax (%{$h->[1]->{$y}) {
foreach my $ay (keys %$ax) {
if(ref($h->[1]->{$y}->{$ay}) eq 'JE::Object::Proxy' && $h->[1]->{$y}->{$ay}->{class_info}->{name} eq 'HTMLImageElement') {
print $h->[1]->{$y}->{$ay}->{value}->{src}, "\n";
}
}
}
}
But I don't really see how to make it more readable: it is a iteration over a three dimensional structure.

Where can I find an array of the (un)assigned Unicode code points for a particular block?

At the moment, I'm writing these arrays by hand.
For example, the Miscellaneous Mathematical Symbols-A block has an entry in hash like this:
my %symbols = (
...
miscellaneous_mathematical_symbols_a => [(0x27C0..0x27CA), 0x27CC,
(0x27D0..0x27EF)],
...
)
The simpler, 'continuous' array
miscellaneous_mathematical_symbols_a => [0x27C0..0x27EF]
doesn't work because Unicode blocks have holes in them. For example, there's nothing at 0x27CB. Take a look at the code chart [PDF].
Writing these arrays by hand is tedious, error-prone and a bit fun. And I get the feeling that someone has already tackled this in Perl!
Perhaps you want Unicode::UCD? Use its charblock routine to get the range of any named block. If you want to get those names, you can use charblocks.
This module is really just an interface to the Unicode databases that come with Perl already, so if you have to do something fancier, you can look at the lib/5.x.y/unicore/UnicodeData.txt or the various other files in that same directory to get what you need.
Here's what I came up with to create your %symbols. I go through all the blocks (although in this sample I skip that ones without "Math" in their name. I get the starting and ending code points and check which ones are assigned. From that, I create a custom property that I can use to check if a character is in the range and assigned.
use strict;
use warnings;
digest_blocks();
my $property = 'My::InMiscellaneousMathematicalSymbolsA';
foreach ( 0x27BA..0x27F3 )
{
my $in = chr =~ m/\p{$property}/;
printf "%X is %sin $property\n",
$_, $in ? '' : ' not ';
}
sub digest_blocks {
use Unicode::UCD qw(charblocks);
my $blocks = charblocks();
foreach my $block ( keys %$blocks )
{
next unless $block =~ /Math/; # just to make the output small
my( $start, $stop ) = #{ $blocks->{$block}[0] };
$blocks->{$block} = {
assigned => [ grep { chr =~ /\A\p{Assigned}\z/ } $start .. $stop ],
unassigned => [ grep { chr !~ /\A\p{Assigned}\z/ } $start .. $stop ],
start => $start,
stop => $stop,
name => $block,
};
define_my_property( $blocks->{$block} );
}
}
sub define_my_property {
my $block = shift;
(my $subname = $block->{name}) =~ s/\W//g;
$block->{my_property} = "My::In$subname"; # needs In or Is
no strict 'refs';
my $string = join "\n", # can do ranges here too
map { sprintf "%X", $_ }
#{ $block->{assigned} };
*{"My::In$subname"} = sub { $string };
}
If I were going to do this a lot, I'd use the same thing to create a Perl source file that has the custom properties already defined so I can just use them right away in any of my work. None of the data should change until you update your Unicode data.
sub define_my_property {
my $block = shift;
(my $subname = $block->{name}) =~ s/\W//g;
$block->{my_property} = "My::In$subname"; # needs In or Is
no strict 'refs';
my $string = num2range( #{ $block->{assigned} } );
print <<"HERE";
sub My::In$subname {
return <<'CODEPOINTS';
$string
CODEPOINTS
}
HERE
}
# http://www.perlmonks.org/?node_id=87538
sub num2range {
local $_ = join ',' => sort { $a <=> $b } #_;
s/(?<!\d)(\d+)(?:,((??{$++1})))+(?!\d)/$1\t$+/g;
s/(\d+)/ sprintf "%X", $1/eg;
s/,/\n/g;
return $_;
}
That gives me output suitable for a Perl library:
sub My::InMiscellaneousMathematicalSymbolsA {
return <<'CODEPOINTS';
27C0 27CA
27CC
27D0 27EF
CODEPOINTS
}
sub My::InSupplementalMathematicalOperators {
return <<'CODEPOINTS';
2A00 2AFF
CODEPOINTS
}
sub My::InMathematicalAlphanumericSymbols {
return <<'CODEPOINTS';
1D400 1D454
1D456 1D49C
1D49E 1D49F
1D4A2
1D4A5 1D4A6
1D4A9 1D4AC
1D4AE 1D4B9
1D4BB
1D4BD 1D4C3
1D4C5 1D505
1D507 1D50A
1D50D 1D514
1D516 1D51C
1D51E 1D539
1D53B 1D53E
1D540 1D544
1D546
1D54A 1D550
1D552 1D6A5
1D6A8 1D7CB
1D7CE 1D7FF
CODEPOINTS
}
sub My::InMiscellaneousMathematicalSymbolsB {
return <<'CODEPOINTS';
2980 29FF
CODEPOINTS
}
sub My::InMathematicalOperators {
return <<'CODEPOINTS';
2200 22FF
CODEPOINTS
}
Maybe this?
my #list =
grep {chr ($_) =~ /^\p{Assigned}$/}
0x27C0..0x27EF;
#list = map { $_ = sprintf ("%X", $_ )} #list;
print "#list\n";
Gives me
27C0 27C1 27C2 27C3 27C4 27C5 27C6 27C7 27C8 27C9 27CA 27D0 27D1 27D2 27D3
27D4 27D5 27D6 27D7 27D8 27D9 27DA 27DB 27DC 27DD 27DE 27DF 27E0 27E1 27E2
27E3 27E4 27E5 27E6 27E7 27E8 27E9 27EA 27EB
I don't know why you wouldn't say miscellaneous_mathematical_symbols_a => [0x27C0..0x27EF], because that's how the Unicode standard is defined according to the PDF.
What do you mean when you say it doesn't "work"? If it's giving you some sort of error when you check the existence of the character in the block, then why not just weed them out of the block when your checker comes across an error?