Reading in a CSV File in Perl

I have read files in Perl before, but not when the CSV file has the values I require on different lines. I assume I have to create an array mixed with hash keys but I'm out of my league here.
Basically, my CSV file has the following columns: branch, job, timePeriod, periodType, day1Value, day2Value, day3Value, day4Value, day5Value, day6Value, and day7Value.
The day* values represent the value of a periodType for each day of the week respectively.
For example:
East,Banker,9AM-12PM,Overtime,4.25,0,0,1.25,1.5,1.5,0,0
West,Electrician,12PM-5PM,Regular,4.25,0,0,-1.25,-1.5,-1.5,0,0
North,Janitor,5PM-12AM,Variance,-4.25,0,0,-1.25,-1.5,-1.5,0,0
South,Manager,12A-9AM,Overtime,77.75,14.75,10,10,10,10,10,
Etc.
I need to output a file that takes this data and keys off of branch, job, timePeriod, and day. My output would list each periodType value for one particular day rather than one periodType value for all seven.
For example:
South,Manager,12A-9AM,77.75,14.75,16
In the line above, the last three values are the day1Value for each of the three periodTypes (Overtime, Regular, and Variance).
As you can see, my problem is that I don't know how to load the data into memory in a way that lets me pull values from different lines and output them together. I've only ever parsed single lines before.

Unless you like pain, use Text::CSV and its relatives Text::CSV_XS and Text::CSV_PP.
However, that may be the easier part of this problem. Once you've read and validated that the line is complete, you need to add the relevant information to the correctly keyed hashes. You're probably going to have to get rather intimately familiar with references, too.
You might create a hash %BranchData keyed by the branch. Each element of that hash would be a reference to a hash keyed by job; each element of that would be a reference to a hash keyed by timePeriod; and each element of that would be a reference to an array indexed by day number (using indexes 1..7; this over-allocates space slightly, but the chances of getting it right are vastly greater; do not mess with $[ though!). Each element of the array would be a reference to a hash keyed by the three period types. Ouch!
If everything is working well, a prototypical assignment might be something like:
$BranchData{$row{branch}}->{$row{job}}->{$row{period}}->[1]->{$row{p_type}} +=
$row{day1};
You would be iterating over elements 1..7 and 'day1' .. 'day7'; there's a bit of clean-up on the design work to do there.
You have to worry about initializing stuff correctly (or maybe you don't - Perl will do it for you). I'm assuming that the row is returned as a direct hash (rather than a hash reference), with keys for branch, job, period, period type (p_type), and each day ('day1', .. 'day7').
If you know which day you need in advance, you can avoid accumulating all days, but it may make more generalized reporting simpler to read and accumulate all the data all the time, and then simply have the printing deal with whatever subset of the entire data needs to be processed.
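As a minimal sketch of the prototypical assignment above, assuming each row has already been parsed into a hash with the key names this answer assumes (branch, job, period, p_type, day1 .. day7), the following shows that Perl's autovivification builds every missing level for you. The two rows are copied from the question's sample data; the `// 0` defaulting at the end is an illustrative choice, not part of the answer.

```perl
use strict;
use warnings;

# Two rows from the question's sample data, already parsed into hashes using
# the key names this answer assumes (branch, job, period, p_type, day1 .. day7).
my @rows = (
    { branch => 'East', job => 'Banker', period => '9AM-12PM', p_type => 'Overtime',
      day1 => 4.25, day2 => 0, day3 => 0, day4 => 1.25, day5 => 1.5, day6 => 1.5, day7 => 0 },
    { branch => 'North', job => 'Janitor', period => '5PM-12AM', p_type => 'Variance',
      day1 => -4.25, day2 => 0, day3 => 0, day4 => -1.25, day5 => -1.5, day6 => -1.5, day7 => 0 },
);

my %BranchData;
for my $row (@rows) {
    for my $day (1 .. 7) {
        # No explicit initialization: autovivification creates each missing
        # hash/array level, and += treats the missing leaf as 0 (without a warning).
        $BranchData{ $row->{branch} }{ $row->{job} }{ $row->{period} }[$day]{ $row->{p_type} }
            += $row->{"day$day"};
    }
}

# Only the period types that actually appeared exist, so default the rest to 0.
my $day1 = $BranchData{East}{Banker}{'9AM-12PM'}[1];
print join(',', map { $day1->{$_} // 0 } qw(Overtime Regular Variance)), "\n";   # 4.25,0,0
```

The trade-off versus pre-filling each day with all three period types is that unseen types simply do not exist, so the printing side has to supply the default.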
It was intriguing enough a problem that I've hacked together this code. I doubt if it is optimal, but it does work.
#!/usr/bin/env perl
#
# SO 8570488
use strict;
use warnings;
use Text::CSV;
use Data::Dumper;
use constant debug => 0;
my $file = "input.csv";
my $csv = Text::CSV->new({ binary => 1, eol => $/ })
or die "Cannot use CSV: ".Text::CSV->error_diag();
my @headings = qw( branch job period p_type day1 day2 day3 day4 day5 day6 day7 );
my @days = qw( day0 day1 day2 day3 day4 day5 day6 day7 );
my %BranchData;
open my $in, '<', $file or die "Unable to open $file for reading ($!)";
$csv->column_names(@headings);
while (my $row = $csv->getline_hr($in))
{
print Dumper($row) if debug;
my %r = %$row; # Not for efficiency; for notational compactness
$BranchData{$r{branch}} = { } if !defined $BranchData{$r{branch}};
my $branch = $BranchData{$r{branch}};
$branch->{$r{job}} = { } if !defined $branch->{$r{job}};
my $job = $branch->{$r{job}};
$job->{$r{period}} = [ ] if !defined $job->{$r{period}};
my $period = $job->{$r{period}};
for my $day (1..7)
{
# Assume that Overtime, Regular and Variance are the only types
# Otherwise, you need yet another level of checking whether elements exist...
$period->[$day] = { Overtime => 0, Regular => 0, Variance => 0} if !defined $period->[$day];
$period->[$day]->{$r{p_type}} += $r{$days[$day]};
}
}
print Dumper(\%BranchData);
Given your sample data, the output from this is:
$VAR1 = {
'West' => {
'Electrician' => {
'12PM-5PM' => [
undef,
{
'Regular' => '4.25',
'Overtime' => 0,
'Variance' => 0
},
{
'Regular' => 0,
'Overtime' => 0,
'Variance' => 0
},
{
'Regular' => 0,
'Overtime' => 0,
'Variance' => 0
},
{
'Regular' => '-1.25',
'Overtime' => 0,
'Variance' => 0
},
{
'Regular' => '-1.5',
'Overtime' => 0,
'Variance' => 0
},
{
'Regular' => '-1.5',
'Overtime' => 0,
'Variance' => 0
},
{
'Regular' => 0,
'Overtime' => 0,
'Variance' => 0
}
]
}
},
'South' => {
'Manager' => {
'12A-9AM' => [
undef,
{
'Regular' => 0,
'Overtime' => '77.75',
'Variance' => 0
},
{
'Regular' => 0,
'Overtime' => '14.75',
'Variance' => 0
},
{
'Regular' => 0,
'Overtime' => 10,
'Variance' => 0
},
{
'Regular' => 0,
'Overtime' => 10,
'Variance' => 0
},
{
'Regular' => 0,
'Overtime' => 10,
'Variance' => 0
},
{
'Regular' => 0,
'Overtime' => 10,
'Variance' => 0
},
{
'Regular' => 0,
'Overtime' => 10,
'Variance' => 0
}
]
}
},
'North' => {
'Janitor' => {
'5PM-12AM' => [
undef,
{
'Regular' => 0,
'Overtime' => 0,
'Variance' => '-4.25'
},
{
'Regular' => 0,
'Overtime' => 0,
'Variance' => 0
},
{
'Regular' => 0,
'Overtime' => 0,
'Variance' => 0
},
{
'Regular' => 0,
'Overtime' => 0,
'Variance' => '-1.25'
},
{
'Regular' => 0,
'Overtime' => 0,
'Variance' => '-1.5'
},
{
'Regular' => 0,
'Overtime' => 0,
'Variance' => '-1.5'
},
{
'Regular' => 0,
'Overtime' => 0,
'Variance' => 0
}
]
}
},
'East' => {
'Banker' => {
'9AM-12PM' => [
undef,
{
'Regular' => 0,
'Overtime' => '4.25',
'Variance' => 0
},
{
'Regular' => 0,
'Overtime' => 0,
'Variance' => 0
},
{
'Regular' => 0,
'Overtime' => 0,
'Variance' => 0
},
{
'Regular' => 0,
'Overtime' => '1.25',
'Variance' => 0
},
{
'Regular' => 0,
'Overtime' => '1.5',
'Variance' => 0
},
{
'Regular' => 0,
'Overtime' => '1.5',
'Variance' => 0
},
{
'Regular' => 0,
'Overtime' => 0,
'Variance' => 0
}
]
}
}
};
Have fun taking it from here!
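One possible next step, sketched under assumptions: walk the nested structure and print one line per branch/job/period for a chosen day, with the three period types in the order the question asks for. The hash below is an abbreviated, day-1-only copy of the dump above, and the `day_report` helper is a hypothetical name, not part of the answer.

```perl
use strict;
use warnings;

# Abbreviated copy of the %BranchData dump above (day 1 only; index 0 unused).
my %BranchData = (
    East  => { Banker      => { '9AM-12PM' => [ undef, { Overtime => 4.25,  Regular => 0,    Variance => 0 } ] } },
    North => { Janitor     => { '5PM-12AM' => [ undef, { Overtime => 0,     Regular => 0,    Variance => -4.25 } ] } },
    South => { Manager     => { '12A-9AM'  => [ undef, { Overtime => 77.75, Regular => 0,    Variance => 0 } ] } },
    West  => { Electrician => { '12PM-5PM' => [ undef, { Overtime => 0,     Regular => 4.25, Variance => 0 } ] } },
);

my @types = qw(Overtime Regular Variance);   # output column order

# Hypothetical helper: one "branch,job,period,overtime,regular,variance" line
# per branch/job/period for the requested day, sorted for stable output.
sub day_report {
    my ($data, $day) = @_;
    my @lines;
    for my $branch (sort keys %$data) {
        for my $job (sort keys %{ $data->{$branch} }) {
            for my $period (sort keys %{ $data->{$branch}{$job} }) {
                my $cell = $data->{$branch}{$job}{$period}[$day] or next;   # skip days never seen
                push @lines, join ',', $branch, $job, $period, map { $cell->{$_} // 0 } @types;
            }
        }
    }
    return @lines;
}

print "$_\n" for day_report(\%BranchData, 1);
```

With only the four sample rows, each line has a single non-zero type; the question's `South,Manager,12A-9AM,77.75,14.75,16` would appear once the Regular and Variance rows for that key are loaded too.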

I don't have firsthand experience with it, but you can use DBD::CSV and then pass the relatively simple SQL query needed to compute the aggregation that you want.
If you insist on doing it the hard way, though, you can loop through and gather your data in the following hash of hash references:
(
"branch1,job1,timeperiod1"=>
{
"overtime"=>"overtimeday1value1",
"regular"=>"regulartimeday1value1",
"variance"=>"variancetimeday1value1"
},
"branch2,job2,timeperiod2"=>
{
"overtime"=>"overtimeday1value2",
"regular"=>"regulartimeday1value2",
"variance"=>"variancetimeday1value2"
},
#etc
);
and then just loop through the keys accordingly. This approach does, however, rely on consistent formatting of the keys (e.g. "East,Banker,9AM-12PM" is not the same as "East, Banker, 9AM-12PM"), so you'd have to check for (and enforce) consistent formatting while building the hash above.
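A minimal sketch of this flat-key approach for a single day, assuming the fields never contain quoted commas (that is the only reason a plain `split` is acceptable here; Text::CSV is the robust choice). Whitespace is trimmed before building the key, which is one way to enforce the consistent formatting mentioned above. The `%by_key` name and the choice of day 1 are illustrative.

```perl
use strict;
use warnings;

# Sample rows from the question (no quoted fields, so split on ',' is safe here).
my @lines = (
    'East,Banker,9AM-12PM,Overtime,4.25,0,0,1.25,1.5,1.5,0,0',
    'North,Janitor,5PM-12AM,Variance,-4.25,0,0,-1.25,-1.5,-1.5,0,0',
    'South,Manager,12A-9AM,Overtime,77.75,14.75,10,10,10,10,10,',
);

my $day = 1;   # which day's value to report (1 .. 7)
my %by_key;
for my $line (@lines) {
    my ($branch, $job, $period, $type, @days) = split /,/, $line, -1;
    # Trim whitespace so "East, Banker" and "East,Banker" build the same key.
    s/^\s+|\s+$//g for $branch, $job, $period, $type;
    my $key = join ',', $branch, $job, $period;
    $by_key{$key}{ lc $type } = $days[ $day - 1 ];
}

for my $key (sort keys %by_key) {
    my $v = $by_key{$key};
    print join(',', $key, map { $v->{$_} // 0 } qw(overtime regular variance)), "\n";
}
```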

Related

Declare hash variable in loop

I need to use a hash and a loop in my code, but my sample code below isn't working. I want to print the variables wafer, site and res side by side so that the output looks like this:
1, 1, 63
1, 2, -53
1, 3, 9.47
1, 4, 9.55
1, 5, -8.32
my @wafer = ("1","1","1","1","1");
my @site = ("1", "2", "3", "4", "5");
my @res = ("63","-53","9.47","9.55","-8.32");
my %hash;
foreach my $result(@res) {
$hash{$wafer[0]}{$site[0]} = $result;
last;
}
print "$wafer{$wafer[0]}{$site[0]} \n";
When you want to iterate several arrays synchronously, iterate over the indices:
for my $index (0 .. $#wafer) {
print "$wafer[$index] $site[$index] $res[$index]\n";
}
You also might want to build a hash keyed by the site (as it's the only unique value):
for my $index (0 .. $#wafer) {
$hash{ $site[$index] } = { wafer => $wafer[$index],
res => $res[$index] };
}
This will create a hash like this:
%hash = (
'4' => {
'res' => '9.55',
'wafer' => '1'
},
'3' => {
'wafer' => '1',
'res' => '9.47'
},
'1' => {
'res' => '63',
'wafer' => '1'
},
'2' => {
'res' => '-53',
'wafer' => '1'
},
'5' => {
'res' => '-8.32',
'wafer' => '1'
}
);
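The two-level layout the question was reaching for, `$hash{$wafer}{$site} = $result`, also works once the loop uses every index instead of `[0]` followed by `last`. A sketch, assuming numeric wafer and site IDs (hence the `<=>` sorts, since hash keys come back in no particular order):

```perl
use strict;
use warnings;

my @wafer = ("1", "1", "1", "1", "1");
my @site  = ("1", "2", "3", "4", "5");
my @res   = ("63", "-53", "9.47", "9.55", "-8.32");

# wafer -> site -> res, filled by walking the three arrays in step.
my %hash;
for my $i (0 .. $#res) {
    $hash{ $wafer[$i] }{ $site[$i] } = $res[$i];
}

# Walk both levels in numeric order to get the "wafer, site, res" lines.
my @out;
for my $w (sort { $a <=> $b } keys %hash) {
    for my $s (sort { $a <=> $b } keys %{ $hash{$w} }) {
        push @out, "$w, $s, $hash{$w}{$s}";
    }
}
print "$_\n" for @out;
```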

sorting keys (which are also values) on the basis of values in hashes in perl

For example, this is my hash:
'Level1_A' => {
'Level2_A' => 1071,
'Level2_B' => 3429,
'Level2_C' => 3297
},
'Level1_B' => {
'Level2_A' => 702,
'Level2_B' => 726
},
I want output that looks like this:
'Level1_A' => {
'Level2_B' => 3429,
'Level2_C' => 3297,
'Level2_A' => 1071
},
'Level1_B' => {
'Level2_B' => 726,
'Level2_A' => 702
},
Ultimately, I want the keys corresponding to the highest values:
Level1_A___Level2_B___3429
Level1_B___Level2_B___726
Hashes are inherently unordered, and there is no way to sort them in place. However, you can find the maximum of a hash's values, and it is simplest to use a module to help.
List::UtilsBy provides max_by, which lets you find the hash key corresponding to the maximum numeric value.
Like this:
use strict;
use warnings 'all';
use List::UtilsBy 'max_by';
my $data = {
'Level1_A' => {
'Level2_A' => 1071,
'Level2_B' => 3429,
'Level2_C' => 3297,
},
'Level1_B' => {
'Level2_A' => 702,
'Level2_B' => 726
},
};
for my $k1 ( sort keys %$data ) {
my $v1 = $data->{$k1};
my $k2 = max_by { $v1->{$_} } keys %$v1;
printf "%s %s %s\n", $k1, $k2, $v1->{$k2};
}
output
Level1_A Level2_B 3429
Level1_B Level2_B 726
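If you also want the first output from the question (each inner key listed from highest to lowest value), a sketch using only core `sort` is to keep a separate, ordered list of keys per outer key, since the hash itself cannot hold an order. The `%ranked` name and the `___` separator are illustrative choices taken from the question's desired output.

```perl
use strict;
use warnings;

my $data = {
    Level1_A => { Level2_A => 1071, Level2_B => 3429, Level2_C => 3297 },
    Level1_B => { Level2_A => 702,  Level2_B => 726 },
};

my %ranked;   # Level1 key => [ Level2 keys, highest value first ]
for my $k1 (sort keys %$data) {
    my $v1 = $data->{$k1};
    $ranked{$k1} = [ sort { $v1->{$b} <=> $v1->{$a} } keys %$v1 ];
    my $top = $ranked{$k1}[0];   # key with the largest value
    print join('___', $k1, $top, $v1->{$top}), "\n";
}
```

Sorting the whole key list is O(n log n), whereas max_by makes a single pass; for inner hashes this small the difference does not matter.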

Can't use string as a HASH reference

Here's the structure that I'm trying to access
Dumper $resourceAudit
$VAR1 = '{
\'rh6\' => {
\'h\' => 1,
\'n\' => 1
},
\'win2k8\' => {
\'h\' => 1,
\'n\' => 1
},
\'win2k12\' => {
\'h\' => 3,
\'n\' => 3
},
\'win2k3\' => {
\'h\' => 0,
\'n\' => 1
},
\'usim\' => {
\'h\' => 4,
\'n\' => 4
}
}';
So, I know that $resourceAudit is actually a string and so, %$resourceAudit is sure to give me the Can't use string as a HASH reference error.
Is there any way I can get around this and access the 'rh6' key?
$resourceAudit doesn't contain a reference to a hash; it contains a string. That string is Perl code that would return a reference to a hash when executed. You can use eval EXPR to run Perl code.
my $data = eval($resourceAudit)
or die("Error executing audit code: $@");
my $rh6 = $data->{rh6};   # $data is now a hash reference
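A self-contained sketch of the same idea, using an abbreviated copy of the question's string as a stand-in for $resourceAudit. Keep in mind that eval EXPR runs whatever code the string contains, so this is only appropriate for data you produced or trust.

```perl
use strict;
use warnings;

# Stand-in for $resourceAudit, abbreviated from the question: a *string*
# containing Perl source for an anonymous hash, not a hash reference.
my $resourceAudit = q({ 'rh6' => { 'h' => 1, 'n' => 1 },
    'win2k3' => { 'h' => 0, 'n' => 1 },
    'usim'   => { 'h' => 4, 'n' => 4 } });

# eval EXPR compiles and runs the string; on failure it returns undef and sets $@.
my $data = eval $resourceAudit;
die "Error executing audit code: $@" unless ref($data) eq 'HASH';

print "rh6: h=$data->{rh6}{h}, n=$data->{rh6}{n}\n";   # rh6: h=1, n=1
```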

Perl: Get minimum distance value from multi hash using List::Util

I would like to get the smallest distance to a "snaffle" from the following hash:
$VAR1 = {
'0' => {
'y' => '7012',
'snaffle' => {
'5' => {
'y' => '3856',
'x' => '875',
'id' => '5',
'distance' => 9734
},
'6' => {
'x' => '10517',
'id' => '6',
'distance' => 510,
'y' => '6741'
},
'4' => {
'y' => '5291',
'id' => '4',
'x' => '11331',
'target' => 'true',
'distance' => 2125
},
'8' => {
'x' => '11709',
'id' => '8',
'distance' => 2236,
'y' => '5475'
},
'7' => {
'distance' => 8485,
'x' => '4591',
'id' => '7',
'y' => '544'
}
},
'x' => '10084',
'distance2mybase' => 10598,
'distance2enemybase' => 6755,
'type' => 'WIZARD',
'id' => '0',
'state' => 0
},
It is filled in earlier, in the game loop:
# game loop
while (1) {
chomp(my $entities = <STDIN>); # number of entities still in game
for my $i (0..$entities-1) {
chomp($tokens=<STDIN>);
my ($entity_id, $entity_type, $x, $y, $vx, $vy, $state) = split(/ /,$tokens);
my $type;
if ($entity_type eq "WIZARD") {
$type = "wizard";
}
if ($entity_type eq "OPPONENT_WIZARD") {
$type = "enemy";
}
if ($entity_type eq "SNAFFLE") {
$type = "snaffle";
}
if ($entity_type eq "BLUDGER") {
$type = "bludger";
}
$entity{$type}{$entity_id}{x} = $x;
$entity{$type}{$entity_id}{y} = $y;
$entity{$type}{$entity_id}{state} = $state;
$entity{$type}{$entity_id}{id} = $entity_id;
$entity{$type}{$entity_id}{type} = $entity_type;
$entity{$type}{$entity_id}{distance2mybase} = &getdistance($entity{$type}{$entity_id}{x},$entity{$type}{$entity_id}{y},$mybase_x,$mybase_y);
$entity{$type}{$entity_id}{distance2enemybase} = &getdistance($entity{$type}{$entity_id}{x},$entity{$type}{$entity_id}{y},$enemybase_x,$enemybase_y);
}
foreach my $wizard_id (sort keys %{ $entity{'wizard'} }) {
foreach my $snaffle_id (sort keys %{ $entity{'snaffle'} }) {
$entity{'wizard'}{$wizard_id}{snaffle}{$snaffle_id}{id} = $snaffle_id;
$entity{'wizard'}{$wizard_id}{snaffle}{$snaffle_id}{x} = $entity{'snaffle'}{$snaffle_id}{x};
$entity{'wizard'}{$wizard_id}{snaffle}{$snaffle_id}{y} = $entity{'snaffle'}{$snaffle_id}{y};
$entity{'wizard'}{$wizard_id}{snaffle}{$snaffle_id}{distance} = &getdistance($entity{'wizard'}{$wizard_id}{x},$entity{'wizard'}{$wizard_id}{y},$entity{'snaffle'}{$snaffle_id}{x},$entity{'snaffle'}{$snaffle_id}{y});
}
&action($wizard_id,"sweep","up");
}
I tried List::Util::min, but I think I'm searching too deep because, as you can see in the output, it targets the wrong snaffle (snaffle 6's distance is lower than snaffle 4's, yet 4 is the current target).
How can I find the overall minimum distance across all snaffles? (In case you're wondering, this is for a codingame.com puzzle.)
sub snafflecheck {
my $wizard_id = shift;
my $wizard_x = shift;
my $wizard_y = shift;
if ($entity{'snaffle'}) {
foreach my $snaffle_id (sort keys %{ $entity{'snaffle'} }) {
my $snaffle_x = $entity{'snaffle'}{$snaffle_id}{x};
my $snaffle_y = $entity{'snaffle'}{$snaffle_id}{y};
my $distance2snaffle = &getdistance($wizard_x,$wizard_y,$snaffle_x,$snaffle_y);
my $nearest = min $entity{'wizard'}{$wizard_id}{snaffle}{$snaffle_id}{distance};
if ($distance2snaffle) {
if ($distance2snaffle == $nearest) {
$entity{'wizard'}{$wizard_id}{snaffle}{$snaffle_id}{target} = "true";
return("true",$entity{'wizard'}{$wizard_id}{snaffle}{$snaffle_id}{id},$entity{'wizard'}{$wizard_id}{snaffle}{$snaffle_id}{x},$entity{'wizard'}{$wizard_id}{snaffle}{$snaffle_id}{y},$distance2snaffle);
}
}
}
}
Given the shown data, this is the list of all values for the key distance:
my @dist = map { $_->{distance} } values %{$entity->{0}{snaffle}};
However, getting the minimum value doesn't reveal its key.
One way to find the key whose distance value is smallest:
use List::Util 'reduce';
my $snaff = $entity->{0}{snaffle};
my $min_dist = reduce {
$snaff->{$a}{distance} < $snaff->{$b}{distance} ? $a : $b
} keys %$snaff;
print "Minimal distance: $snaff->{$min_dist}{distance} for key $min_dist\n";
For more control, you can instead iterate over %$snaff using each.
You can also sort the keys of %$snaff by their distance values if you want all of them in order.
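A sketch of both alternatives using only core Perl: a single `each` pass that tracks the smallest distance seen so far, and a sort of all keys by ascending distance. The hash is the snaffle sub-hash from the question's dump, trimmed to the distance field.

```perl
use strict;
use warnings;

# The snaffle sub-hash from the question's dump (x/y/id omitted for brevity).
my $snaff = {
    4 => { distance => 2125 }, 5 => { distance => 9734 }, 6 => { distance => 510 },
    7 => { distance => 8485 }, 8 => { distance => 2236 },
};

# 1) Single pass with each: remember the key with the smallest distance so far.
my ($min_key, $min_dist);
while (my ($id, $s) = each %$snaff) {
    if (!defined $min_dist || $s->{distance} < $min_dist) {
        ($min_key, $min_dist) = ($id, $s->{distance});
    }
}
print "Minimal distance: $min_dist for key $min_key\n";   # 510 for key 6

# 2) All keys ordered by ascending distance.
my @by_distance = sort { $snaff->{$a}{distance} <=> $snaff->{$b}{distance} } keys %$snaff;
print "@by_distance\n";   # 6 4 8 7 5
```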
You should first extract the reference to the snaffle hash to make things tidier. Then you can just use map to extract the distance field of each hash element and min to find the smallest of them.
If you want to know which snaffle has the smallest distance, I suggest that you install List::UtilsBy and use its min_by operator.
This code shows both operations.
The hash is identical to your own, but expressed more compactly using Data::Dump.
use strict;
use warnings 'all';
use feature 'say';
use List::Util 'min';
use List::UtilsBy 'min_by';
my %data = (
"0" => {
distance2enemybase => 6755,
distance2mybase => 10598,
id => 0,
snaffle => {
4 => { distance => 2125, id => 4, target => "true", x => 11331, y => 5291 },
5 => { distance => 9734, id => 5, x => 875, y => 3856 },
6 => { distance => 510, id => 6, x => 10517, y => 6741 },
7 => { distance => 8485, id => 7, x => 4591, y => 544 },
8 => { distance => 2236, id => 8, x => 11709, y => 5475 },
},
state => 0,
type => "WIZARD",
x => 10084,
y => 7012,
},
);
my $snaffles = $data{0}{snaffle};
my $min_distance = min map { $snaffles->{$_}{distance} } keys %$snaffles;
# OR, equivalently, without going through the keys:
# my $min_distance = min map { $_->{distance} } values %$snaffles;
my $closest_snaffle = min_by { $snaffles->{$_}{distance} } keys %$snaffles;
say "\$min_distance = $min_distance";
say "\$closest_snaffle = $closest_snaffle";
output
$min_distance = 510
$closest_snaffle = 6

Perl Mongo find object Id

You would think it is a simple thing. I have a list of object IDs that are in my collection, and I would like to get a single record based on an object ID. I have Googled, but found nothing helpful.
So I have object id: 5106c7703abc120a04070b34
my $client = MongoDB::MongoClient->new;
my $db = $client->get_database( 'myDatabase' );
my $id_find = $db->get_collection('mycollection')->find({},{_id => MongoDB::OID->new(value => "5106c7703abc120a04070b34")});
print Dumper $id_find;
This prints:
$VAR1 = bless( {
'_skip' => 0,
'_ns' => 'MindCrowd_test.Users',
'_grrrr' => 0,
'partial' => 0,
'_query' => {},
'_tailable' => 0,
'_client' => bless( {
'w' => 1,
'query_timeout' => 30000,
'find_master' => 0,
'_servers' => {},
'sasl' => 0,
'wtimeout' => 1000,
'j' => 0,
'timeout' => 20000,
'sasl_mechanism' => 'GSSAPI',
'auto_connect' => 1,
'auto_reconnect' => 1,
'db_name' => 'admin',
'ssl' => 0,
'ts' => 0,
'inflate_dbrefs' => 1,
'port' => 27017,
'host' => 'mongodb://localhost:27017',
'dt_type' => 'DateTime',
'max_bson_size' => 16777216
}, 'MongoDB::MongoClient' ),
'_limit' => 0,
'slave_okay' => 0,
'_request_id' => 0,
'immortal' => 0,
'started_iterating' => 0
}, 'MongoDB::Cursor' );
I have tried different versions of the above find. All of them fail to compile:
$mongo->my_db->my_collection(find({_id => "ObjectId(4d2a0fae9e0a3b4b32f70000"}));
$mongo->my_db->my_collection(
find({ _id => MongoDB::OID->new(value => "4d2a0fae9e0a3b4b32f70000")})
);
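None of them work. How do I find (find_one) a single record using the object ID?
The find method returns a MongoDB::Cursor object for iterating through results; the Dumper output above is that cursor, not a document. (Its '_query' => {} also shows that the query filter was empty: the _id condition was passed as the second argument instead of the first.) If you only want one record, use the find_one method, which returns the matching document itself.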
my $client = MongoDB::MongoClient->new;
my $db = $client->get_database( 'myDatabase' );
my $id_find = $db->get_collection('mycollection')->find_one({_id => MongoDB::OID->new(value => "5106c7703abc120a04070b34")});
print Dumper $id_find;
The answer to this has changed. MongoDB::OID has been deprecated and replaced by BSON::OID, which does not have a method that lets you pass in the 24-character hex string that you have. Here's what you have to do these days:
my $id = "5c7463277fc2198b64654feb";
my $oid = BSON::OID->new(oid => pack('H24', $id));
my $result = $db->get_collection('mycollection')->find_id($oid);
pack('H24', $id) converts the 24 hexadecimal characters in $id into a 12-byte binary string. That is what BSON::OID expects, and the Perl driver then constructs the correct filter for you in the background.
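To see the conversion on its own, here is a small sketch that uses only core pack/unpack (no MongoDB or BSON modules), with the ID from the answer. It shows the 24 hex characters becoming 12 bytes, and unpack turning them back into the same hex string.

```perl
use strict;
use warnings;

my $id  = "5c7463277fc2198b64654feb";   # 24 hex characters, from the answer
my $bin = pack('H24', $id);               # 12 raw bytes, the form BSON::OID is given above
printf "%d hex chars -> %d bytes\n", length $id, length $bin;   # 24 hex chars -> 12 bytes

# unpack reverses the conversion, e.g. to print an OID's bytes as hex again.
my $back = unpack('H24', $bin);
print "$back\n";
```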