How to use refernce concept and access element of subroutine argument using Perl? - perl

I am writing a code for calling a subroutine which has 4 argument(3 hashes and one file handler).i want to know how to access them in subroutine.My code is as below.
#print OUTFILE "Content of TPC file:.\n";
my $DATA_INFO = $ARGV[0];
my $OUT_DIR = $ARGV[1];
my $log= "$OUT_DIR/log1";
open(LOG1,">$log");
require "$DATA_INFO";
my $SCRIPT_DIR = $ENV{"SCRIPT_DIR"} ;
require "$SCRIPT_DIR/cmp_fault.pl";
require "$SCRIPT_DIR/pattern_mismatch.pl";
require "$SCRIPT_DIR/scan_count.pl";
print "\nComparing data:\n\n" ;
pattern_mismatch("\%data","\%VAR1","\%status",*LOG1);
cmp_fault("\%data","\%VAR1","\%status",*LOG1);
scan_count("\%data","\%status",*LOG1);
print "\n Comparison done:\n";
foreach $pattern (keys %status) {
print "pattern";
foreach $attr (keys %{$status{$pattern}}) {
print ",$attr";
}
print "\n";
last;
}
#Print Data
foreach $pattern (keys %status) {
print "$pattern";
foreach $attr (keys %{$status{$pattern}}) {
print ",$status{$pattern}{$attr}";
}
print "\n";
Sub routine cmp_fault is here:
sub cmp_fault {
use strict;
use warning;
$data_ref= $_[0];;
$VAR1_ref= $_[1];
$status_ref = $_[2];
$log1_ref=$_[3];
# print LOG1"For TPC : First find the pattern and then its fault type\n";
for $pat ( keys %$data_ref ) {
print "fgh:\n$pat,";
for $key (keys %{$data_ref{$pat}}) {
if($key=~/fault/){
print LOG1 "$key:$data_ref{$pat}{$key},\n";
}
}
}
# print LOG1 "\nFor XLS : First find the pattern and then its pattern type\n";
for $sheet (keys %$VAR1_ref){
if ("$sheet" eq "ATPG") {
for $row (1 .. $#{$VAR1_ref->{$sheet}}) {
$patname = $VAR1_ref->{'ATPG'}[$row]{'Pattern'} ;
next if ("$patname" eq "") ;
$faultXls = $VAR1_ref->{'ATPG'}[$row]{'FaultType'} ;
# print LOG1 " $patname==>$faultXls \n";
if (defined $data{$patname}{'fault'}) {
$faultTpc = $data{$patname}{'fault'} ;
# print LOG1 "\n $patname :XLS: $faultXls :TPC: $faultTpc\n";
if("$faultXls" eq "$faultTpc") {
print LOG1 "PASS: FaultType Matched $patname :XLS: $faultXls :TPC: $faultTpc\n\n\n";
print "PASS: FaultType Matched $patname :XLS: $faultXls :TPC: $faultTpc\n\n";
$status_ref->{$patname}{'FaultType'} = PASS;
}
else {
print LOG1 "FAIL: FaultType Doesn't Match\n\n";
$status_ref->{$patname}{'FaultType'} = Fail;
}
}
}
}
}
}
return 1;

When passing parameters into an array, you can only ever pass a single list of parameters.
For scalars, this isn't a problem. If all you're acting on is a single array, this also isn't a problem.
If you need to send scalars and an array or hash, then the easy way is to 'extract' the scalar parameters first, and then treat 'everything else' as the list.
use strict;
use warnings;
sub scalars_and_array {
my ( $first, $second, #rest ) = #_;
print "$first, $second, ", join( ":", #rest ), "\n";
}
scalars_and_array( "1", "2", "3", 4, 5, 6 );
But it should be noted that by doing so - you're passing values. You can do this with hashes too.
To pass data structure references, it's as you note - pass by reference, then dereference. It's useful to be aware though, that -> becomes useful, because it's accessing a hash and dereferencing it.
use strict;
use warnings;
use Data::Dumper;
sub pass_hash {
my ( $hashref ) = #_;
print $hashref,"\n";
print $hashref -> {"one"},"\n";
print $hashref -> {"fish"} -> {"haddock"};
}
my %test_hash = ( "one" => 2,
"three" => 4,
"fish" => { "haddock" => "plaice" }, );
pass_hash ( \%test_hash );
print "\n";
print Dumper \%test_hash;
The core of your problem here though, is that you haven't turned on strict and warnings which would tell you that:
for $pat ( keys %data_ref ) {
is wrong - there is no hash called data_ref there's only a scalar (which holds a hash reference) called $data_ref.
You need %$data_ref here.
And here:
for $key ( keys %{ $data{$pat} } ) {
You also have no $data - your code says $data_ref. (You might have %data in scope, but that's a really bad idea to mess around with within a sub).
There's a bunch of other errors - which would also be revealed by strict and warnings. That's a very basic debugging step, and you will generally get a much better response from Stack Overflow if you do this before asking for assistance. So please - do that, tidy up your code and remove errors/warnings. If you are still having problems after that, then by all means make a post outlining where and what problem you're having.

Related

Printing content of ARRAY inside hash

how do I print a content of an array inside the hash? I am using Dumper so you can see the data that I am parsing.
print Dumper \%loginusers;
for my $key ( keys %loginusers ) {
my $value = $loginusers{$key};
print "$key => $value\n";
}
printf "%s %-32s %-18s\n","User","Hostname","Since";
The output is
$VAR1 = {
'server1.localdomain.com:8080' => [
', 'user=user1
' 'since=2017-03-10 13:53:27
]
};
server1.localdomain.com:8080 => ARRAY(0x1584748)
User Hostname Since
As you can see there is an ARRAY(0x1584748) and I don't know how to get that value inside from the hash.
What I would like to see is something like:
User Hostname Since
user1 server1.localdomain.com:8080 2017-03-10 13:53:27
Thank you very much for someone that can help.
Update:
So after trying this to see the data how it looks:
foreach my $key (keys %loginusers)
{
print "For $key:\n";
print "\t|$_|\n" for #{$loginusers{$key}};
}
The output looks like this:
For server1.localdomain.com:8080:
| |user=user1
| |since=2017-03-10 13:53:27
Update:
tried the add these on the code:
foreach my $key (keys %loginusers)
{
my #fields =
map { s/^\s*//; s/\s*\Z//; s/\s*\n\s*/ /g; $_ }
grep { /\S/ }
#{$loginusers{$key}};
print "For $key:\n";
print "$_\n" for #fields;
}
And using the both sample code:
printf "%-8s %-32s %s\n", qw(User Hostname Since);
foreach my $key (keys %loginusers)
{
my %field = map { /\s*(.*?)=\s*(.*)/ } #{$loginusers{$key}};
my ($host, $rgsender, $port) = split /:/, $key;
printf "%-8s %-32s %s\n", $field{user}, $host, $field{since};
}
my $newusers;
for my $host ( keys %loginusers ) {
local $/ = "\r\n"; #localised "input record separator" for the "chomp"
%{$newusers->{$host}} = map { chomp; split /=/,$_,2 } #{$loginusers{$host}};
}
undef %loginusers; #not needed anymore
#print "NEW STRUCTURE: ", Dumper $newusers;
printline( qw(User Hostname Since) );
printline($newusers->{$_}{user}, $_, $newusers->{$_}{since}) for (keys %$newusers);
sub printline { printf "%-8s %-32s %-18s\n", #_; }
and here is the results:
User Hostname Since
user1 server1.localdomain.com:8080 2017-03-10 13:53:27
User Hostname Since
user1 server1.localdomain.com:8080 2017-03-10 13:53:27
A hash value is a scalar, and it can take a reference. This is how we build complex data structures. Yours apparently have arrayrefs, so you need to dereference them. Something like
foreach my $key (keys %hash) {
say "$key => #{$hash{key}}";
}
See the tutorial perlretut and the cookbook on data structures perldsc.
The strange output from Dumper indicates that there may be leading/trailing spaces around values (or worse), which need be cleaned out. Until this is clarified I'll assume data like
'server1.localdomain.com:8080' => ['user=user1', 'since=2017-03-10 13:53:27']
In order to get the desired output you need to split each element
printf "%-8s %-32s %s\n", qw(User Hostname Since);
foreach my $key (keys %hash)
{
my ($user, $since) = map { /=\s*(.*)/ } #{$hash{$key}};
printf "%-8s %-32s %s\n", $user, $key, $since;
}
For each value, we dereference it and pass that through map. The code in maps block, that is applied to each element, pulls what is after =. Given the data, the first one is the user and the second one is timestamp. Since this is an array (and not a hash) I assume that the order is fixed. If not, get strings from both sides of = and analyze them to see which one goes where. Or better use a hash
my %field = map { /\s*(.*?)=\s*(.*)/ } #{$hash{$key}};
where .*? is the non-greedy version of .*, capturing until the first =. Then print as
printf "%-8s %-32s %s\n", $field{user}, $key, $field{since};
and you don't rely on the order in the arrayref. See the answer by jm666 for a nice and consistent approach building this from the beginning.
With the hash shown above this prints
User Hostname Since
user1 server1.localdomain.com:8080 2017-03-10 13:53:27
I've used 8 and 32 widths based on shown data. For more precision, there are modules for tabular output. If you do it by hand you need to pre-process and find the longest word for each column among keys and/or values, and then use those lengths in the second pass with printf.
It appears that Dumper is getting confused by strange data. To see what we have do
foreach my $key (keys %loginusers)
{
print "For $key:\n";
print "\t|$_|\n" for #{$loginusers{$key}};
}
To clean up the data you can try
foreach my $key (keys %loginusers)
{
my #fields =
map { s/^\s*//; s/\s*$//; s/\s*\R\s*/ /g; $_ }
grep { /\S/ }
#{$loginusers{$key}};
print "For $key:\n";
print "$_\n" for #fields;
}
The grep takes an input list and filters out those elements for which the code inside its block evaluates false. Here we require at least one non-space character. Then its output goes into map, which removes all leading and trailing whitespace, and replaces all newlines with spaces.
The your data-structure isn't very nice. I would convert it to some better, using:
#convert to better structure
my $newusers;
for my $host ( keys %loginusers ) {
%{$newusers->{$host}} = map { chomp; split /=/,$_,2 } #{$loginusers{$host}};
}
undef %loginusers; #the old not needed anymore
print "NEW STRUCTURE: ", Dumper $newusers;
The dump now looks like:
NEW STRUCTURE: $VAR1 = {
'server1.localdomain.com:8080' => {
'user' => 'user1',
'since' => '2017-03-10 13:53:27'
}
};
after the above the printing is simple:
printline( qw(User Hostname Since) );
printline($newusers->{$_}{user}, $_, $newusers->{$_}{since}) for (keys %$newusers);
sub printline { printf "%-8s %-32s %-18s\n", #_; }
For the explanation read #zdim's excellent answer (and accept his answer :))
full code
use 5.014;
use warnings;
use Data::Dumper;
my %loginusers = (
'server1.localdomain.com:8080' => [
"user=user1\r\n", # you probably have the \r too
"since=2017-03-10 13:53:27\r\n",
]
);
say "OLD STRUCTURE: ", Dumper \%loginusers;
#convert to better structure
my $newusers;
for my $host ( keys %loginusers ) {
%{$newusers->{$host}} = map { s/[\r\n]//g; split /=/, $_, 2 } #{$loginusers{$host}}; #removes all \r and \n
}
undef %loginusers; #not needed anymore
say "NEW STRUCTURE: ", Dumper $newusers;
printline( qw(User Hostname Since) );
printline($newusers->{$_}{user}, $_, $newusers->{$_}{since}) for (keys %$newusers);
sub printline { printf "%-8s %-32s %-18s\n", #_; }
result:
OLD STRUCTURE: $VAR1 = {
'server1.localdomain.com:8080' => [
'user=user1
',
'since=2017-03-10 13:53:27
'
]
};
NEW STRUCTURE: $VAR1 = {
'server1.localdomain.com:8080' => {
'user' => 'user1',
'since' => '2017-03-10 13:53:27'
}
};
User Hostname Since
user1 server1.localdomain.com:8080 2017-03-10 13:53:27
EDIT
You probably have the \r in your data too. See the updated code.

Use of reference to elements in #_ to avoid duplicating code

Is it safe to take reference of elements of #_ in a subroutine in order to avoid duplicating code? I also wonder if the following is good practice or can be simplified. I have a subroutine mod_str that takes an option saying if a string argument should be modified in-place or not:
use feature qw(say);
use strict;
use warnings;
my $str = 'abc';
my $mstr = mod_str( $str, in_place => 0 );
say $mstr;
mod_str( $str, in_place => 1 );
say $str;
sub mod_str {
my %opt;
%opt = #_[1..$#_];
if ( $opt{in_place} ) {
$_[0] =~ s/a/A/g;
# .. do more stuff with $_[0]
return;
}
else {
my $str = $_[0];
$str =~ s/a/A/g;
# .. do more stuff with $str
return $str;
}
}
In order to avoid repeating/duplicating code in the if and else blocks above, I tried to improve mod_str:
sub mod_str {
my %opt;
%opt = #_[1..$#_];
my $ref;
my $str;
if ( $opt{in_place} ) {
$ref = \$_[0];
}
else {
$str = $_[0]; # make copy
$ref = \$str;
}
$$ref =~ s/a/A/g;
# .. do more stuff with $$ref
$opt{in_place} ? return : return $$ref;
}
The "in place" flag changes the function's interface to the point where it should be a new function. It will simplify the interface, testing, documentation and the internals to have two functions. Rather than having to parse arguments and have a big if/else block, the user has already made that choice for you.
Another way to look at it is the in_place option will always be set to a constant. Because it fundamentally changes how the function behaves, there's no sensible case where you'd write in_place => $flag.
Once you do that, the reuse becomes more obvious. Write one function to do the operation in place. Write another which calls that on a copy.
sub mod_str_in_place {
# ...Do work on $_[0]...
return;
}
sub mod_str {
my $str = $_[0]; # string is copied
mod_str_in_place($str);
return $str;
}
In the absence of the disgraced given I like using for as a topicalizer. This effectively aliases $_ to either $_[0] or the local copy depending on the value of the in_place hash element. It's directly comparable to your $ref but with aliases, and a lot cleaner
I see no reason to return a useless undef / () in the case that the string is modified in place; the subroutine may as well return the new value of the string. (I suspect the old value might be more useful, after the fashion of $x++, but that makes for uglier code!)
I'm not sure whether this is readable code to anyone but me, so comments are welcome!
use strict;
use warnings;
my $ss = 'abcabc';
printf "%s %s\n", mod_str($ss), $ss;
$ss = 'abcabc';
printf "%s %s\n", mod_str($ss, in_place => 1), $ss;
sub mod_str {
my ($copy, %opt) = #_;
for ( $opt{in_place} ? $_[0] : $copy ) {
s/a/A/g;
# .. do more stuff with $_
return $_;
}
}
output
AbcAbc abcabc
AbcAbc AbcAbc

Perl + recursive subroutine + accessing variable defined outside of subroutine

I am pulling bitbucket repo list using Perl. The response from bitbucket will contain only 10 repositories and a marker for the next page where there will be another 10 repos and so on ... (they call it paging response)
So, I wrote a recursive subroutine which calls itself if the next page marker is present. This will continue until it reaches the last page.
Here is my code:
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use LWP::UserAgent;
use JSON;
my #array;
recursive("my_bitbucket_url");
foreach ( #array ) { print $_."\n"; }
sub recursive
{
my $url = $_[0];
### here goes my LWP::UserAgent code which connects to bitbucket and pulls back the response in a JSON as $response->decoded_content
### hence, removing this code for brevity
my $hash = decode_json $response->decoded_content;
#print Dumper ($hash);
foreach my $a ( #{$hash->{values}} )
{
push #array, $a->{links}->{self}->{href};
}
if ( defined $hash->{next})
{
print "Next page Exists \n";
print "Recursing with $hash->{next} \n";
recursive( $hash->{next} );
}
else
{
print "Last page reached. No more recursion \n"
}
}
Now, my code works fine and it lists all the repos.
Question:
I am not sure about the way I have used the variable my #array; above. I have defined it outside the subroutine, However, I am accessing it directly from the subroutine. Somehow, I feel this is not right.
So, how to append to an array using a recursive subroutine in such cases. Does my code obey Perl ethics or is it something really absurd (yet correct because it works) ?
UPDATE
After following suggestions from #ikegami, #Sobrique and #Hynek -Pichi- Vychodil, I have come with below code which uses while loop and avoids recusrsion.
Here is my thought process:
Define an array #array.
Call the subroutine call_url with initial bitbucket URL and save the response in $hash
Check the $hash for the next page marker
If next page marker exists, then push the elements to #array and call call_url with the new marker. This will be done with the while loop.
If the next page marker does NOT exists, then push the elements to #array. Period.
Print #array content.
And here is my code:
my #array;
my $hash = call_url("my_bitbucket_url ");
if (defined $hash->{next})
{
while (defined $hash->{next})
{
foreach my $a ( #{$hash->{values}} )
{
push #array, $a->{links}->{self}->{href};
}
$hash = call_url($hash->{next});
}
}
foreach my $a ( #{$hash->{values}} )
{
push #array, $a->{links}->{self}->{href};
}
foreach (#array) { print $_."\n"; }
sub call_url
{
### here goes my LWP::UserAgent code which connects to bitbucket and pulls back the response in a JSON as $response->decoded_content
### hence, removing this code for brevity
my $hash = decode_json $response->decoded_content;
#print Dumper ($hash);
return $hash;
}
Would definitely like to hear whether this looks OK or is there still a room for improvement.
Using global variables to return values demonstrates high coupling, something to be avoided.
You're asking if the following is acceptable:
my $sum;
sum(4, 5);
print("$sum\n");
sub sum {
my ($x, $y) = #_;
$sum = $x + $y;
}
The fact that the sub is recursive is completely irrelevant; it just makes your example larger.
Problem fixed:
sub recursive
{
my $url = $_[0];
my #array;
my $hash = ...;
foreach my $a ( #{$hash->{values}} )
{
push #array, $a->{links}->{self}->{href};
}
if ( defined $hash->{next})
{
print "Next page Exists \n";
print "Recursing with $hash->{next} \n";
push #array, recursive( $hash->{next} );
}
else
{
print "Last page reached. No more recursion \n"
}
return #array;
}
{
my #array = recursive("my_bitbucket_url");
foreach ( #array ) { print $_."\n"; }
}
Recursion removed:
sub recursive
{
my $url = $_[0];
my #array;
while (defined($url)) {
my $hash = ...;
foreach my $a ( #{$hash->{values}} )
{
push #array, $a->{links}->{self}->{href};
}
$url = $hash->{next};
if ( defined $url)
{
print "Next page Exists \n";
print "Recursing with $url\n";
}
else
{
print "Last page reached. No more recursion \n"
}
}
return #array;
}
{
my #array = recursive("my_bitbucket_url");
foreach ( #array ) { print $_."\n"; }
}
Clean up of the latest code you posted:
my $url = "my_bitbucket_url";
my #array;
while ($url) {
my $hash = call_url($url);
for my $value ( #{ $hash->{values} } ) {
push #array, $value->{links}{self}{href};
}
$url = $hash->{next};
}
print("$_\n") for #array;
Yes, using a global variable is bad habit even it is lexical scoped variable.
Each recursive code can be rewritten into its imperative loop version and vice versa. It is because all of this is implemented on CPU which doesn't know anything about recursion at all. Thre are only jumps. All calls and returns are just jumps with some stack manipulation so you can rewrite your recursion algorithm into loop. If it is not obvious and simple as in this case you can even emulate stack and behaviour as it is done in your favourite language interpreter or compiler. In this case it's very simple:
my #array = with_loop("my_bitbucket_url");
foreach ( #array ) { print $_."\n"; }
sub with_loop
{
my $url = $_[0];
my #array;
while(1)
{
### here goes my LWP::UserAgent code which connects to bitbucket and
### pulls back the response in a JSON as $response->decoded_content
### hence, removing this code for brevity
my $hash = decode_json $response->decoded_content;
#print Dumper ($hash);
foreach my $a ( #{$hash->{values}} )
{
push #array, $a->{links}->{self}->{href};
}
unless ( defined $hash->{next})
{
print "Last page reached. No more recursion \n";
last
};
print "Next page Exists \n";
print "Recursing with $hash->{next} \n";
$url = $hash->{next};
};
return #array;
}
But when you would like to stick with recursion you can but it is a little bit more tricky. First of all, there is not tail call optimization so you don't have to try write tail call code as your original version does. So you can do this:
my #array = recursion("my_bitbucket_url");
foreach ( #array ) { print $_."\n"; }
sub recursion
{
my $url = $_[0];
### here goes my LWP::UserAgent code which connects to bitbucket and
### pulls back the response in a JSON as $response->decoded_content
### hence, removing this code for brevity
my $hash = decode_json $response->decoded_content;
#print Dumper ($hash);
# this map version is same as foreach with push but more perlish
my #array = map $_->{links}->{self}->{href}, #{$hash->{values}};
if (defined $hash->{next})
{
print "Next page Exists \n";
print "Recursing with $hash->{next} \n";
push #array, recursive( $hash->{next} );
}
else
{
print "Last page reached. No more recursion \n"
}
return #array;
}
But this version is not very efficient so there is way how to write tail call recursive version in perl which is a little bit tricky.
my #array = tail_recursive("my_bitbucket_url");
foreach ( #array ) { print $_."\n"; }
sub tail_recursive
{
my $url = $_[0];
my #array;
return tail_recursive_inner($url, \#array);
# url is mutable parameter
}
sub tail_recursive_inner
{
my $url = $_[0];
my $array = $_[1];
# $array is reference to accumulator #array
# from tail_recursive function
### here goes my LWP::UserAgent code which connects to bitbucket and
### pulls back the response in a JSON as $response->decoded_content
### hence, removing this code for brevity
my $hash = decode_json $response->decoded_content;
#print Dumper ($hash);
foreach my $a ( #{$hash->{values}} )
{
push #$array, $a->{links}->{self}->{href};
}
if (defined $hash->{next})
{
print "Next page Exists \n";
print "Recursing with $hash->{next} \n";
# first parameter is mutable so its OK to assign
$_[0] = $hash->{next};
goto &tail_recursive_inner;
}
else
{
print "Last page reached. No more recursion \n"
}
return #$array;
}
And if you are interested in some real perl trickery
print $_."\n" for tricky_tail_recursion("my_bitbucket_url");
sub tricky_tail_recursion {
my $url = shift;
### here goes my LWP::UserAgent code which connects to bitbucket and
### pulls back the response in a JSON as $response->decoded_content
### hence, removing this code for brevity
my $hash = decode_json $response->decoded_content;
#print Dumper ($hash);
push #_, $_->{links}->{self}->{href} for #{$hash->{values}};
if (defined $hash->{next}) {
print "Next page Exists \n";
print "Recursing with $hash->{next} \n";
unshift #_, $hash->{next};
goto &tricky_tail_recursion;
} else {
print "Last page reached. No more recursion \n"
};
return #_;
}
See also: LWP::UserAgent docs.
A variable defined outside any closures is available to the whole program. It works fine, there's nothing to worry about. Some might call it 'bad style' in certain cases (mostly around program length and action at distance) but that's not a hard constraint.
I'm not sure I necessarily see the advantage of recursion here though - your problem doesn't seem to warrant it. That's not a problem per-se, but it can be a little confusing for future maintenance programmers ;).
I'd be thinking something along the lines of (non recursive):
my $url = "my_bitbucket_url";
while ( defined $url ) {
##LWP Stuff;
my $hash = decode_json $response->decoded_content;
foreach my $element ( #{ $hash->{values} } ) {
print join( "\n", #{ $element->{links}->{self}->{href} } ), "\n";
}
$url = $hash->{next}; #undef if it doesn't exist, so loop breaks.
}

Push into end of hash in Perl

So what I am trying to do with the following code is push a string, let's say "this string" onto the end of each key in a hash. I'm completely stumped on how to do this. Here's my code:
use warnings;
use strict;
use File::Find;
my #name;
my $filename;
my $line;
my #severity = ();
my #files;
my #info = ();
my $key;
my %hoa;
my $xmlfile;
my $comment;
my #comments;
open( OUTPUT, "> $ARGV[0]" );
my $dir = 'c:/programs/TEST/Test';
while ( defined( $input = glob( $dir . "\\*.txt" ) ) ) {
open( INPUT, "< $input" );
while (<INPUT>) {
chomp;
if (/File/) {
my #line = split /:/;
$key = $line[1];
push #{ $hoa{$key} }, "Filename\n";
}
if ( /XML/ ... /File/ ) {
$xmlfile = $1;
push #{ $hoa{$key} }, "XML file is $xmlfile\n";
}
if (/Important/) {
push #{ $hoa{$key} }, "Severity is $_\n";
}
if (/^\D/) {
next if /Important/;
push #{ $hoa{$key} }, "Given comment is $_\n";
}
push #{ $hoa{$key} }, "this string\n";
}
}
foreach my $k ( keys %hoa ) {
my #list = #{ $hoa{$k} };
foreach my $l (#list) {
print OUTPUT $l, "\n";
}
}
}
close INPUT;
close OUTPUT;
Where I have "this string" is where I was trying to push that string onto the end of the array. However, what ended up happening was that it ended up printing "this string" three times, and not at the end of every key like I wanted. When I tried to put it outside the while() loop, it said that the value of $key was not initialized. So please, any help? And if you need any clarification on what I'm asking, just let me know. Thank you!
No offence, but there are so many issues in this code I don't even know where to start...
First, the 'initialization block' (all these my $something; my #somethings lines at the beginning of this script) is not required in Perl. In fact, it's not just 'redundant' - it's actually confusing: I had to move my focus back and forth every time I encountered a new variable just to check its type. Besides, even with all this $input var is still not declared as local; it's either missing in comments, or the code given has omissions.
Second, why do you declare your intention to use File::Find (good) - but then do not use it at all? It could greatly simplify all this while(glob) { while(<FH>) { ... } } routine.
Third, I'm not sure why you assign something to $key only when the line read is matched by /File/ - but then use its value as a key in all the other cases. Is this an attempt to read the file organized in sections? Then it can be done a bit more simple, either by slurp/splitting or localizing $/ variable...
Anyway, the point is that if the first line of the file scanned is not matched by /File/, the previous (i.e., from the previous file!) value is used - and I'm not quite sure that it's intended. And if the very first line of the first file is not /File/-matched, then an empty string is used as a key - again, it smells like a bug...
Could you please describe your task in more details? Give some test input/output results, perhaps... It'd be great to proceed in short tasks, organizing your code in process.
Your program is ill-conceived and breaks a lot of good practice rules. Rather than enumerate them all, here is an equivalent program with a better structure.
I wonder if you are aware that all of the if statements will be tested and possibly executed? Perhaps you need to make use of elsif?
Aside from the possibility that $key is undefined when it is used, you are also setting $xmlfile to $1 which will never be defined as there are no captures in any of your regular expressions.
It is impossible to tell from your code what you are trying to do, so we can help you only if you show us your output, input and say how to derive one from the other.
use strict;
use warnings;
use File::Find;
my ($outfile) = #ARGV;
my $dir = 'c:/programs/TEST/Test';
my %hoa;
my $key;
while (my $input = glob "$dir/*.txt") {
open my $in, '<', $input or die $!;
while (<$in>) {
chomp;
if (/File/) {
my $key = (split /:/)[1];
push #{ $hoa{$key} }, "Filename\n";
}
if (/XML/ ... /File/) {
my $xmlfile = $1;
push #{ $hoa{$key} }, "XML file is $xmlfile\n";
}
if (/Important/) {
push #{ $hoa{$key} }, "Severity is $_\n";
}
if (/^\D/) {
next if /Important/;
push #{ $hoa{$key} }, "Given comment is $_\n";
}
push #{ $hoa{$key} }, "this string\n";
}
close $in;
}
open my $out, '>', $outfile or die $!;
foreach my $k (keys %hoa) {
foreach my $l (#{ $hoa{$k} }) {
print $out $l, "\n";
}
}
close $out;
I suspect based on your code, that the line where $key is set is not called each time through the loop, and that you do not trigger any of the other if statements.
This would append "this string" to the end of the array. Based on that you are getting 3 of the "this strings" at the end of the array, I would suspect that two lines do not go through the if (/FILE/) or any of the other if statements. This would leave the $key value the same and at the end, you would append "this string" to the array, using whatever the last value of $key was when it was set.
This will append the string "this string" to every element of the hash %hoa, which elements are array refs:
for (values(%hoa)) { push #{$_}, "this string"; }
Put that outside your while loop, and you'll print "this string" at the end of each element of %hoa.
It will autovivify array refs where it finds undefined elements. It will also choke if it cannot dereference an element as an array, and will manipulate arrays by symbolic reference if it finds a simple scalar and is not running under strict:
my %autoviv = ( a => ['foo'], b => undef );
push #$_, "PUSH" for values %autoviv; # ( a => ['foo', 'PUSH'], b => ['PUSH'] )
my %fatal = ( a => {} );
push #$_, "PUSH" for values %fatal; # FATAL: "Not an ARRAY reference at..."
my %dangerous = (a => "foo");
push #$_, "PUSH" for values %dangerous; # Yikes! #foo is now ("PUSH")
use strict;
my %kablam = (a => "foo");
push #$_, "PUSH" for values %kablam; # "Can't use string ("foo") as an ARRAY ref ..."
As I understand it, traverse the hash with a map command to modify its keys. An example:
EDIT: I've edited because I realised that the map command can be assigned to the same hash. No need to create a new one.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my %hash = qw|
key1 value1
key2 value2
key3 value3
|;
my %hash = map { $_ . "this string" => $hash{ $_ } } keys %hash;
print Dump \%hash;
Run it like:
perl script.pl
With following output:
$VAR1 = {
'key3this string' => 'value3',
'key2this string' => 'value2',
'key1this string' => 'value1'
};

Where can I find an array of the (un)assigned Unicode code points for a particular block?

At the moment, I'm writing these arrays by hand.
For example, the Miscellaneous Mathematical Symbols-A block has an entry in hash like this:
my %symbols = (
...
miscellaneous_mathematical_symbols_a => [(0x27C0..0x27CA), 0x27CC,
(0x27D0..0x27EF)],
...
)
The simpler, 'continuous' array
miscellaneous_mathematical_symbols_a => [0x27C0..0x27EF]
doesn't work because Unicode blocks have holes in them. For example, there's nothing at 0x27CB. Take a look at the code chart [PDF].
Writing these arrays by hand is tedious, error-prone and a bit fun. And I get the feeling that someone has already tackled this in Perl!
Perhaps you want Unicode::UCD? Use its charblock routine to get the range of any named block. If you want to get those names, you can use charblocks.
This module is really just an interface to the Unicode databases that come with Perl already, so if you have to do something fancier, you can look at the lib/5.x.y/unicore/UnicodeData.txt or the various other files in that same directory to get what you need.
Here's what I came up with to create your %symbols. I go through all the blocks (although in this sample I skip that ones without "Math" in their name. I get the starting and ending code points and check which ones are assigned. From that, I create a custom property that I can use to check if a character is in the range and assigned.
use strict;
use warnings;
digest_blocks();
my $property = 'My::InMiscellaneousMathematicalSymbolsA';
foreach ( 0x27BA..0x27F3 )
{
my $in = chr =~ m/\p{$property}/;
printf "%X is %sin $property\n",
$_, $in ? '' : ' not ';
}
sub digest_blocks {
use Unicode::UCD qw(charblocks);
my $blocks = charblocks();
foreach my $block ( keys %$blocks )
{
next unless $block =~ /Math/; # just to make the output small
my( $start, $stop ) = #{ $blocks->{$block}[0] };
$blocks->{$block} = {
assigned => [ grep { chr =~ /\A\p{Assigned}\z/ } $start .. $stop ],
unassigned => [ grep { chr !~ /\A\p{Assigned}\z/ } $start .. $stop ],
start => $start,
stop => $stop,
name => $block,
};
define_my_property( $blocks->{$block} );
}
}
sub define_my_property {
my $block = shift;
(my $subname = $block->{name}) =~ s/\W//g;
$block->{my_property} = "My::In$subname"; # needs In or Is
no strict 'refs';
my $string = join "\n", # can do ranges here too
map { sprintf "%X", $_ }
#{ $block->{assigned} };
*{"My::In$subname"} = sub { $string };
}
If I were going to do this a lot, I'd use the same thing to create a Perl source file that has the custom properties already defined so I can just use them right away in any of my work. None of the data should change until you update your Unicode data.
sub define_my_property {
my $block = shift;
(my $subname = $block->{name}) =~ s/\W//g;
$block->{my_property} = "My::In$subname"; # needs In or Is
no strict 'refs';
my $string = num2range( #{ $block->{assigned} } );
print <<"HERE";
sub My::In$subname {
return <<'CODEPOINTS';
$string
CODEPOINTS
}
HERE
}
# http://www.perlmonks.org/?node_id=87538
sub num2range {
local $_ = join ',' => sort { $a <=> $b } #_;
s/(?<!\d)(\d+)(?:,((??{$++1})))+(?!\d)/$1\t$+/g;
s/(\d+)/ sprintf "%X", $1/eg;
s/,/\n/g;
return $_;
}
That gives me output suitable for a Perl library:
sub My::InMiscellaneousMathematicalSymbolsA {
return <<'CODEPOINTS';
27C0 27CA
27CC
27D0 27EF
CODEPOINTS
}
sub My::InSupplementalMathematicalOperators {
return <<'CODEPOINTS';
2A00 2AFF
CODEPOINTS
}
sub My::InMathematicalAlphanumericSymbols {
return <<'CODEPOINTS';
1D400 1D454
1D456 1D49C
1D49E 1D49F
1D4A2
1D4A5 1D4A6
1D4A9 1D4AC
1D4AE 1D4B9
1D4BB
1D4BD 1D4C3
1D4C5 1D505
1D507 1D50A
1D50D 1D514
1D516 1D51C
1D51E 1D539
1D53B 1D53E
1D540 1D544
1D546
1D54A 1D550
1D552 1D6A5
1D6A8 1D7CB
1D7CE 1D7FF
CODEPOINTS
}
sub My::InMiscellaneousMathematicalSymbolsB {
return <<'CODEPOINTS';
2980 29FF
CODEPOINTS
}
sub My::InMathematicalOperators {
return <<'CODEPOINTS';
2200 22FF
CODEPOINTS
}
Maybe this?
my #list =
grep {chr ($_) =~ /^\p{Assigned}$/}
0x27C0..0x27EF;
#list = map { $_ = sprintf ("%X", $_ )} #list;
print "#list\n";
Gives me
27C0 27C1 27C2 27C3 27C4 27C5 27C6 27C7 27C8 27C9 27CA 27D0 27D1 27D2 27D3
27D4 27D5 27D6 27D7 27D8 27D9 27DA 27DB 27DC 27DD 27DE 27DF 27E0 27E1 27E2
27E3 27E4 27E5 27E6 27E7 27E8 27E9 27EA 27EB
I don't know why you wouldn't say miscellaneous_mathematical_symbols_a => [0x27C0..0x27EF], because that's how the Unicode standard is defined according to the PDF.
What do you mean when you say it doesn't "work"? If it's giving you some sort of error when you check the existence of the character in the block, then why not just weed them out of the block when your checker comes across an error?