Array of hashes - perl

In perl , i have an array of hashes
like
0 HASH(0x98335e0)
'title' => 1177
'author' => 'ABC'
'quantity' => '-100'
1 HASH(0x832a9f0)
'title' => 1177
'author' => 'ABC'
'quantity' => '100'
2 HASH(0x98335e0)
'title' => 1127
'author' => 'DEF'
'quantity' => '5100'
3 HASH(0x832a9f0)
'title' => 1277
'author' => 'XYZ'
'quantity' => '1030'
Now I need to accumulate the quantity where title and author are same.
In the above structure for hash with title = 1177 and author ='ABC' quantity can be accumulated into one and the entire structure should looks like below
0 HASH(0x98335e0)
'title' => 1177
'author' => 'ABC'
'quantity' => 0
1 HASH(0x98335e0)
'title' => 1127
'author' => 'DEF'
'quantity' => '5100'
2 HASH(0x832a9f0)
'title' => 1277
'author' => 'XYZ'
'quantity' => '1030'
What is the best way i can do this accumulation so that it is optimised? Number of array elements can be very large. I dont mind adding an extra key to the hash to aid the same , but i dont want n lookups . Kindly advise

my %sum;
for (#a) {
$sum{ $_->{author} }{ $_->{title} } += $_->{quantity};
}
my #accumulated;
foreach my $author (keys %sum) {
foreach my $title (keys %{ $sum{$author} }) {
push #accumulated => { title => $title,
author => $author,
quantity => $sum{$author}{$title},
};
}
}
Not sure whether map makes it look nicer:
my #accumulated =
map {
my $author = $_;
map { author => $author,
title => $_,
quantity => $sum{$author}{$_},
},
keys %{ $sum{$author} };
}
keys %sum;

If you don't want N lookups, then you need a hash function -- however you need to store them with that hash function. By the time you have them in a list (or array), it's too late. You either get lucky, all the time, or you're going to have N lookups.
Or insert them into the hash abovebelow. A hybrid solution is to store a locator as item 0 in the list/array.
my $lot = get_lot_from_whatever();
my $tot = $list[0]{ $lot->{author} }{ $lot->{title} };
if ( $tot ) {
$tot->{quantity} += $lot->{quantity};
}
else {
push #list, $list[0]{ $lot->{author} }{ $lot->{title} } = $lot;
}
previous
First of all we'll reformat that to make it readable.
[ { title => 1177, author => 'ABC', quantity => '-100' }
, { title => 1177, author => 'ABC', quantity => '100' }
, { title => 1127, author => 'DEF', quantity => '5100' }
, { title => 1277, author => 'XYZ', quantity => '1030' }
]
Next, you need to break down the problem. You want quantities of things grouped
by author and title. So you need those things to uniquely identify those lots.
To repeat, you want a combination of names to identify entities. Thus, you
will need a hash that identifies things by names.
Since we have two things, a double hash is a good way to do it.
my %hash;
foreach my $lot ( #list ) {
$hash{ $lot->{author} }{ $lot->{title} } += $lot->{quantity};
}
# consolidated by hash
To turn this back into a list, we need to unbundle the levels.
my #consol
= sort { $a->{author} cmp $b->{author} || $a->{title} cmp $b->{title} }
map {
my ( $a, $titles ) = #$_; # $_ is [ $a, {...} ]
map { +{ title => $_, author => $a, quantity => $titles->{$_} }
keys %$titles;
}
map { [ $_ => $hash{$_} ] } # group and freeze a pair
keys %hash
;
# consolidated in a list.
And there you have it back, I even sorted it for you. Of course you could also
sort this by--publishers being what they are--descending quantities.
sort { $b->{quantity} <=> $a->{quantity}
|| $a->{author} cmp $b->{author}
|| $a->{title} cmp $b->{title}
}

I think it is important to step back and consider the source of the data. If the data are coming from a database, then you should write the SQL query so that it gives you one row for each author/title combination with the total quantity in the quantity field. If you are reading the data from a file, then you should either read it directly into a hash or use Tie::IxHash if order is important.
Once you have the data in an array of hashrefs like you do, you will have to create an auxiliary data structure and do a whole bunch of lookups, the cost of which may well dominate the running time of your program (not in a way it matters if it is run for 15 minutes once a day) and you might run into memory issues.

Related

Delete outdated entries in a hash and keep only one latest key value pair

I have a requirement where in i need to delete old entries in a hashref. For e.g. in the below data section only "2017/06/28" key value pair should survive.
Rest all key value pairs should be deleted. Please provide me ideas how to accomplish this.
DATA
$data_hashref = {
'2017/06/27' => {
'start' => '13:07:00',
'end' => '23:47:00'
},
'2017/06/15' => {
'start' => '07:11:00',
'end' => '00:18:00'
},
'2017/06/28' => {
'end' => '06:37:00',
'start' => '00:06:00'
},
'2017/06/17' => {
'start' => '09:17:00',
'end' => '10:17:00'
}
};
RESULT
$data_hashref = {
'2017/06/28' => {
'end' => '06:37:00',
'start' => '00:06:00'
}
};
Just find the one you want to keep, and assign it to the hash.
use List::Util qw( maxstr );
my $newest = maxstr keys %$href;
%$href = ( $newest => $href->{$newest} );
It's a little more efficient to find the newest key than to sort all the keys (O(N) vs O(N log N)), and no more complicated.
#!/usr/bin/perl
use strict;
use warnings;
my $href = {
'2017/06/27' => {
'start' => '13:07:00',
'end' => '23:47:00'
},
'2017/06/15' => {
'start' => '07:11:00',
'end' => '00:18:00'
},
'2017/06/28' => {
'end' => '06:37:00',
'start' => '00:06:00'
},
'2017/06/17' => {
'start' => '09:17:00',
'end' => '10:17:00'
}
};
my (undef, #keys) = sort {$b cmp $a} keys %$href;
delete #$href{ #keys };
use Data::Dumper; print Dumper $href;
Update: stevieb is correct in his comments about this post. I'll try to explain what I did.
The date format goes from largest part to the smallest, (YYYY/MM/DD). So, it can be sorted with a ordinary ascii sort, cmp.
It is sorted from the latest date to the earliest date. The sort result is assigned to undef and #keys. The latest date then will be assigned to undef and the remaining keys (to be deleted) are assigned to #keys.
The delete deletes all the earlier dates and their values from the hash using a hash slice, #$href{ #keys }, leaving only the latest date and its value, a hash reference.

Inserting one hash into another using Perl

I've tried many different versions of using push and splice, but can't seem to combine two hashes as needed. Trying to insert the second hash into the first inside the 'Item' array:
(
ItemData => { Item => { ItemNum => 2, PriceList => "25.00", UOM => " " } },
)
(
Alternate => {
Description => "OIL FILTER",
InFile => "Y",
MfgCode => "FRA",
QtyAvailable => 29,
Stocked => "Y",
},
)
And I need to insert the second 'Alternate' hash into the 'Item' array of the first hash for this result:
(
ItemData => {
Item => {
Alternate => {
Description => "OIL FILTER",
InFile => "Y",
MfgCode => "FRA",
QtyAvailable => 29,
Stocked => "Y",
},
ItemNum => 2,
PriceList => "25.00",
UOM => " ",
},
},
)
Can someone suggest how I can accomplish this?
Assuming you have two hash references, this is straight-forward.
my $item = {
'ItemData' => {
'Item' => {
'PriceList' => '25.00',
'UOM' => ' ',
'ItemNum' => '2'
}
}
};
my $alt = {
'Alternate' => {
'MfgCode' => 'FRA',
'Description' => 'OIL FILTER',
'Stocked' => 'Y',
'InFile' => 'Y',
'QtyAvailable' => '29'
}
};
$item->{ItemData}->{Item}->{Alternate} = $alt->{Alternate};
The trick here is not to actually merge $alt into some part of $item, but to only take the specific part you want and put it where you want it. We take the Alternate key from $alt and put it's content into a new Alternate key inside the guts of $item.
Adam Millerchip pointed out in a hence deleted comment that this is not a copy. If you alter any of the keys inside of $alt->{Alternative} after sticking it into $item, the data will be changed inside of $item as well because we are dealing with references.
$item->{ItemData}->{Item}->{Alternate} = $alt->{Alternate};
$alt->{Alternate}->{InFile} = 'foobar';
This will actually also change the value of $item->{ItemData}->{Item}->{Alternate}->{InFile} to foobar as seen below.
$VAR1 = {
'ItemData' => {
'Item' => {
'ItemNum' => '2',
'Alternate' => {
'Stocked' => 'Y',
'MfgCode' => 'FRA',
'InFile' => 'foobar',
'Description' => 'OIL FILTER',
'QtyAvailable' => '29'
},
'UOM' => ' ',
'PriceList' => '25.00'
}
}
};
References are supposed to do that, because they only reference something. That's what's good about them.
To make a real copy, you need to dereference and create a new anonymous hash reference.
# create a new ref
# deref
$item->{ItemData}->{Item}->{Alternate} = { %{ $alt->{Alternate} } };
This will create a shallow copy. The values directly inside of the Alternate key will be copies, but if they contain references, those will not be copied, but referenced.
If you do want to merge larger data structures where more than the content of one key needs to be merged, take a look at Hash::Merge instead.

Perl- Get Hash Value from Multi level hash

I have a 3 dimension hash that I need to extract the data in it. I need to extract the name and vendor under vuln_soft-> prod. So far, I manage to extract the "cve_id" by using the following code:
foreach my $resultHash_entry (keys %hash){
my $cve_id = $hash{$resultHash_entry}{'cve_id'};
}
Can someone please provide a solution on how to extract the name and vendor. Thanks in advance.
%hash = {
'CVE-2015-6929' => {
'cve_id' => 'CVE-2015-6929',
'vuln_soft' => {
'prod' => {
'vendor' => 'win',
'name' => 'win 8.1',
'vers' => {
'vers' => '',
'num' => ''
}
},
'prod' => {
'vendor' => 'win',
'name' => 'win xp',
'vers' => {
'vers' => '',
'num' => ''
}
}
},
'CVE-2015-0616' => {
'cve_id' => 'CVE-2015-0616',
'vuln_soft' => {
'prod' => {
'name' => 'unity_connection',
'vendor' => 'cisco'
}
}
}
}
First, to initialize a hash, you use my %hash = (...); (note the parens, not curly braces). Using {} declares a hash reference, which you have done. You should always use strict; and use warnings;.
To answer the question:
for my $resultHash_entry (keys %hash){
print "$hash{$resultHash_entry}->{vuln_soft}{prod}{name}\n";
print "$hash{$resultHash_entry}->{vuln_soft}{prod}{vendor}\n";
}
...which could be slightly simplified to:
for my $resultHash_entry (keys %hash){
print "$hash{$resultHash_entry}{vuln_soft}{prod}{name}\n";
print "$hash{$resultHash_entry}{vuln_soft}{prod}{vendor}\n";
}
because Perl always knows for certain that any deeper entries than the first one is always a reference, so the deref operator -> isn't needed here.

How can I join a nested Perl hash?

I have a Perl hash, where I store information about LUNs. It has the following structure:
my %luns = (
360000 => {
Devices => [
{ Major_Minor => "8:144",
SCSI_Address => "1:0:0:8",
SCSI_Device => "sdj",
SCSI_Host => "host1",
},
{ Major_Minor => "129:48",
SCSI_Address => "3:0:0:8",
SCSI_Device => "sder",
SCSI_Host => "host3",
},
],
DM_Device => "dm-13",
Size => "45G",
WWID => 360000,
},
360001 => {
Devices => [
{ Major_Minor => "70:144",
SCSI_Address => "1:0:1:39",
SCSI_Device => "sddb",
SCSI_Host => "host1",
},
{ Major_Minor => "135:48",
SCSI_Address => "3:0:1:39",
SCSI_Device => "sdij",
SCSI_Host => "host3",
},
],
DM_Device => "dm-53",
Size => "200G",
WWID => 360000,
},
);
How can I use join to get a comma-separated list of all SCSI_Devices, for example, of 360000?
You're working with a Hash of Hash of Array of Hash. To learn how to work with such structures, I recommend reading perldsc - Perl Data Structures Cookbook.
In this instance, the following loop will print out each of your device lists:
for my $id ( sort { $a <=> $b } keys %luns ) {
my #devices = map { $_->{SCSI_Device} } #{ $luns{$id}{Devices} };
print "$id - #devices\n";
}
Outputs:
360000 - sdj sder
360001 - sddb sdij
Live Demo
You say you want a list of values for LUN 360000, so for a start you need
$luns->{36000}
which is another hash with a Devices element, which has an array reference as a value, and DM_Device, Size, and WWID elements, whose values are simple scalars.
So presumably you want the list that is
$luns->{36000}{Devices}
which is an array of references to hashes, each of which has Major_Minor, SCSI_Address, SCSI_Device, and SCSI_Host elements.
It sounds like you want the SCSI_Device element, and map is the ideal tool to help you with this
my #scsi_devices = map { $_->{SCSI_Device} } #{ $luns->{360000}{Devices} };
That last step is a big leap, and it may help to separate it in your code. For instance, you can copy the reference to the list of devices for 360000, like this
my $devices = $luns->{360000}{Devices};
and extract the SCSI_Device from each of the hashes in that array with
my #scsi_devices = map { $_->{SCSI_Device} } #$devices;
Either way, the array reference must be dereferenced and the required element from each hash in that array must be extracted.
To get a CSV record, unless the data may contain commas of double-quotes, you simply need to join the result of that map
print join(',', #scsi_devices), "\n";
output
sdj,sder
Although I think this falls short of what you actually need. If this isn't clear then please ask.

Possible to detect hash with more than one key?

I am collecting data in a hash of hashes which looks like
$VAR1 = {
'502' => {
'user2' => '0'
},
'501' => {
'git' => '0',
'fffff' => '755'
},
'19197' => {
'user4' => '755'
}
};
The problem is in 501. Two keys may not occur. Is it possible to detect this?
Update
Fixed typo in hash.
If you are only going to store one key-value pair under each key of the main hash, why not use a 2-element array instead? That way you can check for existence before making each new insert, without needing to check the size of the hash or knowing what its keys are. The structure I'm proposing is this:
$VAR1 = {
'502' => [ 'user2', '0' ],
'501' => [ 'git', '0' ],
'19197' => [ 'user4', '755' ]
}
Assuming your hashref above is named $var :
my #bad = grep { scalar keys %{$var->{$_}} > 1 } keys %$var;
Results in an array of hash keys that have more than one hashref within them. Using your data above:
# perl test.pl
$VAR1 = {
'501' => {
'git' => '0',
'fffff' => '755'
},
'502' => {
'user2' => '0'
},
'19197' => {
'user4' => '755'
}
};
$VAR1 = '501';
Then you could access each element that is detected as bad with:
foreach my $key ( #bad ) {
# do something to or with $var->{$key}
}
keys(%{$VAR1{'501'}}) == 2 where the rest would be one.
Also, syntax error on that key, but I assume it's a typo.