help understanding perl hash - perl

Perl newbie here...I had help with this working perl script with some HASH code and I just need help understanding that code and if it could be written in a way that I would understand the use of HASHES more easily or visually??
In summary the script does a regex to filter on date and the rest of the regex will pull data related to that date.
use strict;
use warnings;
use constant debug => 0;
my $mon = 'Jul';
my $day = 28;
my $year = 2010;
my %items = ();
while (my $line = <>)
{
chomp $line;
print "Line: $line\n" if debug;
if ($line =~ m/(.* $mon $day) \d{2}:\d{2}:\d{2} $year: ([a-zA-Z0-9._]*):.*/)
{
print "### Scan\n" if debug;
my $date = $1;
my $set = $2;
print "$date ($set): " if debug;
$items{$set}->{'a-logdate'} = $date;
$items{$set}->{'a-dataset'} = $set;
if ($line =~ m/(ERROR|backup-date|backup-size|backup-time|backup-status)[:=](.+)/)
{
my $key = $1;
my $val = $2;
$items{$set}->{$key} = $val;
print "$key=$val\n" if debug;
}
}
}
print "### Verify\n";
for my $set (sort keys %items)
{
print "Set: $set\n";
my %info = %{$items{$set}};
for my $key (sort keys %info)
{
printf "%s=%s;", $key, $info{$key};
}
print "\n";
}
What I am trying to understand is these lines:
$items{$set}->{'a-logdate'} = $date;
$items{$set}->{'a-dataset'} = $set;
And again couple lines down:
$items{$set}->{$key} = $val;
Is this an example of hash reference? hash of hashes?
I guess i'm confused with the use of {$set} :-(

%items is a hash of hash references (conceptually, a hash of hashes). $set is the key into %items and then you get back another hash, which is being added to with keys 'a-logdate' and 'a-dataset'.
(corrected based on comments)

Lou Franco's answer is close, with one minor typographical error—the hash of hash references is %items, not $items. It is referred to as $items{key} when you are retrieving a value from %items because the value you are retrieving is a scalar (in this case, a hash reference), but $items would be a different variable.

Related

Perl DBI — ignore few columns in output

I've used this code:
while (my $line = <IN>)
{
chomp $line;
if($line =~ /(.*?: )\{(.+)\}/)
{
my $value2 = $2;
my #values2 = split(/,/, $value2);
my $insertKeys;
my $insertValues;
foreach $data(#values2)
{
chomp $data;
my ($key, $value) = split(/:/, $data);
$key =~ s/"//g;
$value =~ s/"/'/g;
$insertKeys .= $key.',';
$insertValues .= $value.',';
}
Input:
"actor_ip":"127.0.0.1" "note":"From Git" "user":"Username for 'https" "user_id":null "actor":"Username for 'https" "actor_id":null "org_id":null "action":"user.failed_login" "created_at":1412256345456789 "data":{"actor_location":{"location":{"lat":null "lon":null}}}
Output:
KEYS: actor_ip,note,user,user_id,actor,actor_id,org_id,action,created_at,data,lon,
VALUES: '127.0.0.1','From Git','Username for 'https',null,'Username for 'https',null,null,'user.failed_login',1412256456789,{'actor_location',null
I want to remove these two key and values from output Please let me know how to regex these below
"user":"Username for 'https"
"data":{"actor_location":{"location":{"lat":null "lon":null}}}
You simply need to exclude the keys you don't want:
if ($key !~ /^(data|user)$/)
{
$insertKeys .= $key.',';
$insertValues .= $value.',';
}
However, a more flexible design might be to insert key/value pairs into a hash:
my %params;
foreach $data(#values2)
{
chomp $data;
my ($key, $value) = split(/:/, $data);
$key =~ s/"//g;
$value =~ s/"/'/g;
$params{$key} = $value;
}
Then it would be easy to do whatever you want with the parameters later.
Also, you don't show your DBI code, but this code suggests you are manually building the whole insert query string. A safer (and better-designed) approach would be a parameterized query.

Perl - sort filenames by order based on the filemask YYY-MM-DD that is in the filename

Need some help, not grasping a solution here on what method I should use.
I need to scan a directory and obtain the filenames by order of
1.YYYY-MM-DD, YYYY-MM-DD is part of the filename.
2. Machinename which is at the start of the filename to the left of the first "."
For example
Machine1.output.log.2014-02-26
Machine2.output.log.2014-02-27
Machine2.output.log.2014-02-26
Machine2.output.log.2014-02-27
Machine3.output.log.2014-02-26
So that it outputs in an array as follows
Machine1.output.log.2014-02-26
Machine2.output.log.2014-02-26
Machine3.output.log.2014-02-26
Machine1.output.log.2014-02-27
Machine2.output.log.2014-02-27
Thanks,
Often, temporarily turning your strings into a hash or array for sorting purposes, and then turning them back into the original strings is the most maintainable way.
my #filenames = qw/
Machine1.output.log.2014-02-26
Machine2.output.log.2014-02-27
Machine2.output.log.2014-02-26
Machine2.output.log.2014-02-27
Machine3.output.log.2014-02-26
/;
#filenames =
map $_->{'orig_string'},
sort {
$a->{'date'} cmp $b->{'date'} || $a->{'machine_name'} cmp $b->{'machine_name'}
}
map {
my %attributes;
#attributes{ qw/orig_string machine_name date/ } = /\A(([^.]+)\..*\.([^.]+))\z/;
%attributes ? \%attributes : ()
} #filenames;
You can define your own sort like so ...
my #files = (
"Abc1.xxx.log.2014-02-26"
, "Abc1.xxx.log.2014-02-27"
, "Abc2.xxx.log.2014-02-26"
, "Abc2.xxx.log.2014-02-27"
, "Abc3.xxx.log.2014-02-26"
);
foreach my $i ( #files ) { print "$i\n"; }
sub bydate {
(split /\./, $a)[3] cmp (split /\./, $b)[3];
}
print "sort it\n";
foreach my $i ( sort bydate #files ) { print "$i\n"; }
You can take your pattern 'YYYY-MM-DD' and match it to what you need.
#!/usr/bin/perl
use strict;
opendir (DIRFILES, ".") || die "can not open data file \n";
my #maplist = readdir(DIRFILES);
closedir(MAPS);
my %somehash;
foreach my $tmp (#maplist) {
next if $tmp =~ /^.{1,2}$/;
next if $tmp =~ /test/;
$tmp =~ /(\d{4})-(\d{2})-(\d{2})/;
$somehash{$tmp} = $1 . $2 . $3; # keep the original file name
# allows for duplicate dates
}
foreach my $tmp (keys %somehash) {
print "-->", $tmp, " --> " , $somehash{$tmp},"\n";
}
my #list= sort { $somehash{$a} <=> $somehash{$b} } keys(%somehash);
foreach my $tmp (#list) {
print $tmp, "\n";
}
Works, tested it with touch files.

cant retrieve values from hash reversal (Perl)

I've initialized a hash with Names and their class ranking as follows
a=>5,b=>2,c=>1,d=>3,e=>5
I've this code so far
my %Ranks = reverse %Class; #As I need to find out who's ranked first
print "\nFirst place goes to.... ", $Ranks{1};
The code only prints out
"First place goes to...."
I want it to print out
First place goes to....c
Could you tell me where' I'm going wrong here?
The class hash prints correctly
but If I try to print the reversed hash using
foreach $t (keys %Ranks) {
print "\n $t $Ranks{$t}"; }
It prints
5
abc23
cab2
ord
If this helps in any way
FULL CODE
#Script to read from the data file and initialize it into a hash
my %Code;
my %Ranks;
#Check whether the file exists
open(fh, "Task1.txt") or die "The File Does Not Exist!\n", $!;
while (my $line = <fh>) {
chomp $line;
my #fields = split /,/, $line;
$Code{$fields[0]} = $fields[1];
$Class{$fields[0]} = $fields[2];
}
close(fh);
#Prints the dataset
print "Code \t Name\n";
foreach $code ( keys %Code) {
print "$code \t $Code{$code}\n";
}
#Find out who comes first
my %Ranks = reverse %Class;
foreach $t (keys %Ranks)
{
print "\n $t $Ranks{$t}";
}
print "\nFirst place goes to.... ", $Ranks{1}, "\n";
When you want to check what your data structures actually contain, use Data::Dumper. use Data::Dumper; local $Data::Dumper::Useqq = 1; print(Dumper(\%Class));. You'll find un-chomped newlines.
You need to use chomp. At present your $fields[2] value has a trailing newline.
Change your file read loop to this
while (my $line = <fh>) {
chomp $line;
my #fields = split /,/, $line;
$Code{$fields[0]} = $fields[1];
$Class{$fields[0]} = $fields[2];
}

perl printing hash of arrays with out Data::Dumper

Here is the code, I know it is not perfect perl. If you have insight on how I an do better let me know. My main question is how would I print out the arrays without using Data::Dumper?
#!/usr/bin/perl
use Data::Dumper qw(Dumper);
use strict;
use warnings;
open(MYFILE, "<", "move_headers.txt") or die "ERROR: $!";
#First split the list of files and the headers apart
my #files;
my #headers;
my #file_list = <MYFILE>;
foreach my $source_parts (#file_list) {
chomp($source_parts);
my #parts = split(/:/, $source_parts);
unshift(#files, $parts[0]);
unshift(#headers, $parts[1]);
}
# Next get a list of unique headers
my #unique_files;
foreach my $item (#files) {
my $found = 0;
foreach my $i (#unique_files) {
if ($i eq $item) {
$found = 1;
last;
}
}
if (!$found) {
unshift #unique_files, $item;
}
}
#unique_files = sort(#unique_files);
# Now collect the headers is a list per file
my %hash_table;
for (my $i = 0; $i < #files; $i++) {
unshift #{ $hash_table{"$files[$i]"} }, "$headers[$i]";
}
# Process the list with regex
while ((my $key, my $value) = each %hash_table) {
if (ref($value) eq "ARRAY") {
print "$value", "\n";
}
}
The Perl documentation has a tutorial on "Printing of a HASH OF ARRAYS" (without using Data::Dumper)
perldoc perldsc
You're doing a couple things the hard way. First, a hash will already uniqify its keys, so you don't need the loop that does that. It appears that you're building a hash of files, with the values meant to be the headers found in those files. The input data is "filename:header", one per line. (You could use a hash of hashes, since the headers may need uniquifying, but let's let that go for now.)
use strict;
use warnings;
open my $files_and_headers, "<", "move_headers.txt" or die "Can't open move_headers: $!\n";
my %headers_for_file;
while (defined(my $line = <$files_and_headers> )) {
chomp $line;
my($file, $header) = split /:/, $line, 2;
push #{ $headers_for_file{$file} }, $header;
}
# Print the arrays for each file:
foreach my $file (keys %headers_for_file) {
print "$file: #{ $headers_for_file{$file}}\n";
}
We're letting Perl do a chunk of the work here:
If we add keys to a hash, they're always unique.
If we interpolate an array into a print statement, Perl adds spaces between them.
If we push onto an empty hash element, Perl automatically puts an empty anonymous array in the element and then pushes onto that.
An alternative to using Data::Dumper is to use Data::Printer:
use Data::Printer;
p $value;
You can also use this to customise the format of the output. E.g. you can have it all in a single line without the indexes (see the documentation for more options):
use Data::Printer {
index => 0,
multiline => 0,
};
p $value;
Also, as a suggestion for getting unique files, put the elements into a a hash:
my %unique;
#unique{ #files } = #files;
my #unique_files = sort keys %unique;
Actually, you could even skip that step and put everything into %hash_table in one pass:
my %hash_table;
foreach my $source_parts (#file_list) {
chomp($source_parts);
my #parts = split(/:/, $source_parts);
unshift #{ $hash_table{$parts[0]} }, $parts[1];
}

How can I get to my anonymous arrays in Perl?

The following code generates a list of the average number of clients connected by subnet. Currently I have to pipe it through sort | uniq | grep -v HASH.
Trying to keep it all in Perl, this doesn't work:
foreach $subnet (keys %{keys %{keys %days}}) {
print "$subnet\n";
}
The source is this:
foreach $file (#ARGV) {
open(FH, $file) or warn("Can't open file $file\n");
if ($file =~ /(2009\d{4})/) {
$dt = $+;
}
%hash = {};
while(<FH>) {
#fields = split(/~/);
$subnet = $fields[0];
$client = $fields[2];
$hash{$subnet}{$client}++;
}
close(FH);
$file = "$dt.csv";
open(FH, ">$file") or die("Can't open $file for output");
foreach $subnet (sort keys %hash) {
$tot = keys(%{$hash{$subnet}});
$days{$dt}{$subnet} = $tot;
print FH "$subnet, $tot\n";
push #{$subnet}, $tot;
}
close(FH);
}
foreach $day (sort keys %days) {
foreach $subnet (sort keys %{$days{$day}}) {
$tot = $i = 0;
foreach $amt (#{$subnet}) {
$i++;
$tot += $amt;
}
print "$subnet," . int($tot/$i) . "\n";
}
}
How can I eliminate the need for the sort | uniq process outside of Perl? The last foreach gets me the subnet ids which are the 'anonymous' names for the arrays. It generates these multiple times (one for each day that subnet was used).
but this seemed easier than combining
spreadsheets in excel.
Actually, modules like Spreadsheet::ParseExcel make that really easy, in most cases. You still have to deal with rows as if from CSV or the "A1" type addressing, but you don't have to do the export step. And then you can output with Spreadsheet::WriteExcel!
I've used these modules to read a spreadsheet of a few hundred checks, sort and arrange and mung the contents, and write to a new one for delivery to an accountant.
In this part:
foreach $subnet (sort keys %hash) {
$tot = keys(%{$hash{$subnet}});
$days{$dt}{$subnet} = $tot;
print FH "$subnet,$tot\n";
push #{$subnet}, $tot;
}
$subnet is a string, but you use it in the last statement as an array reference. Since you don't have strictures on, it treats it as a soft reference to a variable with the name the same as the content of $subnet. Which is okay if you really want to, but it's confusing. As for clarifying the last part...
Update I'm guessing this is what you're looking for, where the subnet value is only saved if it hasn't appeared before, even from another day (?):
use List::Util qw(sum); # List::Util was first released with perl 5.007003 (5.7.3, I think)
my %buckets;
foreach my $day (sort keys %days) {
foreach my $subnet (sort keys %{$days{$day}}) {
next if exists $buckets{$subnet}; # only gives you this value once, regardless of what day it came in
my $total = sum #{$subnet}; # no need to reuse a variable
$buckets{$subnet} = int($total/#{$subnet}; # array in scalar context is number of elements
}
}
use Data::Dumper qw(Dumper);
print Dumper \%buckets;
Building on Anonymous's suggestions, I built a hash of the subnet names to access the arrays:
..
push #{$subnet}, $tot;
$subnets{$subnet}++;
}
close(FH);
}
use List::Util qw(sum); # List::Util was first released with perl 5.007003
foreach my $subnet (sort keys %subnets) {
my $total = sum #{$subnet}; # no need to reuse a variable
print "$subnet," . int($total/#{$subnet}) . "\n"; # array in scalar context is number of elements
}
I am not sure if this is the best solution, but I don't have the duplicates any more.