Perl Hash Count - perl

I have a table of users and the gender of their kids, on separate lines.
lilly boy
lilly boy
jane girl
lilly girl
jane boy
I wrote a script to parse the lines and give me a total at the end:
lilly boys=2 girls=1
jane boys=1 girls=1
I tried this with a hash, but I don't know how to approach it:
foreach $lines (@all_lines){
if ($lines =~ /(.+?)\s(.+)/){
$person = $1;
if ($2 =~ /boy/){
$boycount=1;
$girlcount=0;
}
if ($2 =~ /girl/){
$boycount=0;
$girlcount=1;
}
The next part: if the person doesn't already exist in the hash, add the person and then start a count for boy and girl. (I think this is the correct way, but I'm not sure.)
if (!$hash{$person}){
%hash = (
'$person' => [
{'boy' => "0+$boycount", 'girl' => "0+$girlcount"}
],
);
Now, I don't know how to keep updating the values inside the hash if the person already exists in the hash.
%hash = (
'$person' => [
{'boys' => $boyscount, 'girls' => $girlscount}
],
);
I am not sure how to keep updating the hash.

You just need to study the Perl Data Structures Cookbook
use strict;
use warnings;
my %person;
while (<DATA>) {
chomp;
my ($parent, $gender) = split;
$person{$parent}{$gender}++;
}
use Data::Dump;
dd \%person;
__DATA__
lilly boy
lilly boy
jane girl
lilly girl
jane boy

use strict;
use warnings;
my %hash;
open my $fh, '<', 'table.txt' or die "Unable to open table: $!";
# Aggregate stats:
while ( my $line = <$fh> ) { # Loop over record by record
chomp $line; # Remove trailing newlines
# split is a better tool than regexes to get the necessary data
my ( $parent, $kid_gender ) = split /\s+/, $line;
$hash{$parent}{$kid_gender}++; # Increment by one
# Take advantage of auto-vivification
}
# Print stats:
for my $parent ( sort keys %hash ) {
printf "%s boys=%d girls=%d\n",
$parent, $hash{$parent}{boy} // 0, $hash{$parent}{girl} // 0; # // 0 covers a missing gender (Perl 5.10+)
}
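The "auto-vivification" mentioned in the comment above means Perl creates the inner hash the first time `$hash{$parent}{$kid_gender}` is touched, so the "does this person exist yet?" check from the question is not needed. A minimal sketch of the idea, assuming a hypothetical helper `count_kids` that takes the lines as a list (the name and interface are only for illustration and are not from the answers above):

```perl
use strict;
use warnings;

# Hypothetical helper: tally "parent gender" lines into a hash of hashes.
sub count_kids {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my ( $parent, $gender ) = split ' ', $line;
        next unless defined $gender;    # skip blank or malformed lines
        # No existence check: the inner hash is created on first use,
        # and ++ treats the missing value as 0.
        $count{$parent}{$gender}++;
    }
    return \%count;
}

my $count = count_kids( 'lilly boy', 'lilly boy', 'jane girl', 'lilly girl', 'jane boy' );
for my $parent ( sort keys %$count ) {
    # "// 0" supplies a zero for a gender that never appeared (Perl 5.10+).
    printf "%s boys=%d girls=%d\n",
        $parent, $count->{$parent}{boy} // 0, $count->{$parent}{girl} // 0;
}
```

Updating an existing person and adding a new one are the same statement here, which is why the answers never reassign the whole `%hash`.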

Related

Read file into two hashes with Perl

I'm struggling to understand how to read a simple text file into two Perl hashes.
I have a text file like:
George Washington
John Adams
Abraham Lincoln
and I want to create two hashes, one that holds the first names and the other that holds the last names.
I'm looking at doing something like:
my %first;
my %last;
open(my $FH, '<', $file) or die $!;
my $count = 1;
while (<$FH>)
{
chomp;
# if count is odd, add to %first
# elsif count is even, add to %last
}
close($FH);
but I'm honestly lost. Does anyone have any ideas?
You can get the desired result with the following code.
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my $count = 0;
my %first;
my %last;
while(<DATA>) {
chomp;
my($f,$l) = split;
$first{$f} = $count;
$last{$l} = $count;
$count++;
}
say Dumper(\%first);
say Dumper(\%last);
__DATA__
George Washington
John Adams
Abraham Lincoln
Output
$VAR1 = {
'George' => 0,
'Abraham' => 2,
'John' => 1
};
$VAR1 = {
'Adams' => 1,
'Lincoln' => 2,
'Washington' => 0
};

Add Multiple values together when a condition is met?

My mind seems to be missing a few screws today. I have an issue that I'm baffled by, but to be fair, I'm new to Perl scripting.
I am opening a csv file and need to look for duplicate values in one column, and where there are duplicates in this column, I need to add all values from another column for each duplicate together and print it on a new line in a new file.
open(my $feed, '<', $rawFile) or die "Could not locate '$rawFile'\n";
open(OUTPUT, '>', $newFile) or die "Could not locate '$newFile'\n";
while(my $line = <$feed>) {
chomp $line;
my @columns = split /,/, $line;
$Address= $columns[1];
$forSale= $columns[3];
}
I understand how to open the file and read it line by line. I know how to print results to a new file. What I'm having trouble with is building the logic to say, "For each Address in this extract that is duplicated, add up all of its forSale values and print the Address to the new file with the summed forSale value." I hope this makes sense. Any assistance is appreciated.
The tool you need for this job is a hash.
This will allow you to 'key' things by Address:
my %sum_of;
while(my $line = <$feed>) {
chomp $line;
my @columns = split /,/, $line;
my $Address = $columns[1];
my $forSale = $columns[3];
$sum_of{$Address} += $forSale;
}
foreach my $address ( sort keys %sum_of ) {
print "$address => $sum_of{$address}\n";
}
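For completeness, here is the same idea as a self-contained sketch that can be run without the original CSV. The helper name `sum_by_address` and the sample rows are made up for illustration; the column positions (1 for Address, 3 for forSale) are taken from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: sum column 3 (forSale) per value of column 1 (Address).
sub sum_by_address {
    my @lines = @_;
    my %sum_of;
    for my $line (@lines) {
        chomp $line;
        my @columns = split /,/, $line;
        next unless @columns > 3;    # skip short or empty rows
        $sum_of{ $columns[1] } += $columns[3];
    }
    return %sum_of;
}

# Made-up sample rows: id, address, city, forSale
my %sum_of = sum_by_address(
    "1,12 Oak St,Springfield,3\n",
    "2,9 Elm Rd,Shelbyville,4\n",
    "3,12 Oak St,Springfield,5\n",
);

# One line per distinct address, with its total.
print "$_,$sum_of{$_}\n" for sort keys %sum_of;
```

To write to the new file instead of the terminal, print to a lexical filehandle opened for writing on `$newFile`. Note that a plain `split /,/` breaks on quoted fields that contain commas; a CSV module is the safer choice for such data.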
Hello Chris Simmons,
I would like to add a few minor modifications to the answer that Sobrique provided.
You can open a file the way you did, but you can also pass several files on the command line, e.g. test.pl sample1.csv sample2.csv; you can read about it in the documentation for eof.
I would also check whether each line contains a comma character (,) and otherwise print a warning that the line cannot be parsed.
Next, after splitting the values into an array, I would trim leading and trailing white space from each string.
Having said all that, see the solution below:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my %hash;
while (<>) {
chomp;
if (index($_, ',') != -1) {
my @fields = split(/,/);
# remove leading and trailing white space
s{^\s+|\s+$}{}g foreach @fields;
$hash{$fields[0]} += $fields[3];
}
else {
warn "Line could not be parsed: $_\n";
}
} continue {
close ARGV if eof;
}
print Dumper \%hash;
__END__
$ perl test.pl sample.csv
$VAR1 = {
'123 6th St.' => 3,
'71 Pilgrim Avenue' => 5
};
__DATA__
123 6th St., Melbourne, FL 32904, 2
71 Pilgrim Avenue, Chevy Chase, MD 20815, 5
123 6th St., Melbourne, CT 06074, 1
Since you did not provide a sample of the input data, I created my own.
Another possible way is to use the module Text::CSV, as ikegami proposed. Sample code with the same checks that I mentioned earlier is below:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use Data::Dumper;
my $csv = Text::CSV->new({ sep_char => ',' });
my %hash;
while (<>) {
chomp;
if ($csv->parse($_)) {
my @fields = $csv->fields();
# remove leading and trailing white space
s{^\s+|\s+$}{}g foreach @fields;
$hash{$fields[0]} += $fields[3];
} else {
warn "Line could not be parsed: $_\n";
}
} continue {
close ARGV if eof;
}
print Dumper \%hash;
__END__
$ perl test.pl sample.csv
$VAR1 = {
'123 6th St.' => 3,
'71 Pilgrim Avenue' => 5
};
__DATA__
123 6th St., Melbourne, FL 32904, 2
71 Pilgrim Avenue, Chevy Chase, MD 20815, 5
123 6th St., Melbourne, CT 06074, 1
Hope this helps.
BR / Thanos

Perl read and write text file with strings

Friends, I need help. The following is my input text file:
Andrew UK
Cindy China
Rupa India
Gordon Australia
Peter New Zealand
I want to convert the above into a hash and write a record back to the file when it exists in a directory. I have tried the following (it does not work).
#!/usr/perl/5.14.1/bin/perl
use strict;
use warnings;
use Data::Dumper;
my %hash = ();
my $file = ".../input_and_output.txt";
my $people;
my $country;
open (my $fh, "<", $file) or die "Can't open the file $file: ";
my $line;
while (my $line =<$fh>) {
my ($people) = split("", $line);
$hash{$people} = 1;
}
foreach my $people (sort keys %hash) {
my @country = $people;
foreach my $c (@country) {
my $c_folder = `country/test1_testdata/17.26.6/$c/`;
if (-d $cad_root){
print "Exit\n";
} else {
print "NA\n";
}
}
This is the primary problem:
my ($people) = split("", $line);
You are splitting using an empty string, and you are assigning the return value to a single variable (which will just end up with the first character of each line).
Instead, you should split on ' ' (a single space character which is a special pattern):
As another special case, ... when the PATTERN is either omitted or a string composed of a single space character (such as ' ' or "\x20" , but not e.g. / /). In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were /\s+/; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator.
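To make the difference concrete, here is a small sketch comparing the three forms of split discussed here. The helper name `parse_person` is hypothetical and only wraps the two-field split; the sample line is one of the records from the question with extra spaces added.

```perl
use strict;
use warnings;

# Hypothetical helper: split a "name country" line into at most two fields.
sub parse_person {
    my ($line) = @_;
    $line =~ s/\s+\z//;            # drop the trailing newline and whitespace
    return split ' ', $line, 2;    # limit of 2 keeps "New Zealand" together
}

my $line = "Peter   New Zealand\n";

# Empty pattern: one element per character, so the first field is just "P".
my ($first_char) = split "", $line;

# Single-space string: splits on runs of whitespace, giving three words here.
my @words = split ' ', $line;

my ( $name, $country ) = parse_person($line);

print "empty pattern gives: '$first_char'\n";
print "split ' ' gives ", scalar(@words), " fields\n";
print "with a limit of 2: name='$name' country='$country'\n";
```

The limit argument is what protects multi-word country names; without it, "New" and "Zealand" come back as separate fields.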
Limit the number of fields returned to ensure the integrity of country names with spaces:
#!/usr/bin/env perl
use strict;
use warnings;
my @people;
while (my $line = <DATA>) {
$line =~ /\S/ or next;
$line =~ s/\s+\z//;
push @people, [ split ' ', $line, 2 ];
}
use YAML::XS;
print Dump \@people;
__DATA__
Andrew UK
Cindy China
Rupa India
Gordon Australia
Peter New Zealand
The entries are added to an array so 1) The input order is preserved; and 2) Two people with the same name but from different countries do not result in one entry being lost.
If the order is not important, you could just use a hash keyed on country names with people's names in an array reference for each entry. For now, I am going to assume order matters (it would help us help you if you put more effort into formulating a clear question).
One option is to now go through the list of person-country pairs, and print all those pairs for which the directory country/test1_testdata/17.26.6/$c/ exists (incidentally, in your code you have
my $c_folder = `country/test1_testdata/17.26.6/$c/`;
That will try to execute a program called country/test1_testdata/17.26.6/$c/ and save its output in $c_folder if it produces any. The moral of the story: in programming, precision matters. Just because ` looks like ', that doesn't mean you can use one to mean the other.)
Given that your question is focused on hashes, I use an array of references to anonymous hashes to store the list of people-country pairs in the code below. I cache the result of the lookup to reduce the number of times you need to hit the disk.
#!/usr/bin/env perl
use strict;
use warnings;
@ARGV == 2 ? run( @ARGV )
: die_usage()
;
sub run {
my $people_data_file = shift;
my $country_files_location = shift;
open my $in, '<', $people_data_file
or die "Failed to open '$people_data_file': $!";
my @people;
my %countries;
while (my $line = <$in>) {
next unless $line =~ /\S/; # ignore lines consisting of blanks
$line =~ s/\s+\z//;# remove all trailing whitespace
my ($name, $country) = split ' ', $line, 2;
push @people, { name => $name, country => $country };
$countries{ $country } = undef;
}
# At this point, @people has a list of person-country pairs
# We are going to use %countries to reduce the number of
# times we need to check the existence of a given directory,
# assuming that the directory tree is stable while this program
# is running.
PEOPLE:
for my $person ( @people ) {
my $country = $person->{country};
if ($countries{ $country }) {
print join("\t", $person->{name}, $country), "\n";
}
elsif (-d "$country_files_location/$country/") {
$countries{ $country } = 1;
redo PEOPLE;
}
}
}
sub die_usage {
die "Need data file name and country files location\n";
}
Now, there are a bazillion variations on this, which is why it is important for you to formulate a clear and concise question so people trying to help you can answer your specific questions, instead of each coming up with his/her own solution to the problem as they see it. For example, one could also do this:
#!/usr/bin/env perl
use strict;
use warnings;
@ARGV == 2 ? run( @ARGV )
: die_usage()
;
sub run {
my $people_data_file = shift;
my $country_files_location = shift;
open my $in, '<', $people_data_file
or die "Failed to open '$people_data_file': $!";
my %countries;
while (my $line = <$in>) {
next unless $line =~ /\S/; # ignore lines consisting of blanks
$line =~ s/\s+\z//;# remove all trailing whitespace
my ($name, $country) = split ' ', $line, 2;
push @{ $countries{$country} }, $name;
}
for my $country (keys %countries) {
-d "$country_files_location/$country"
or delete $countries{ $country };
}
# At this point, %countries maps each country for which
# we have a data file to a list of people. We can then
# print those quite simply so long as we don't care about
# replicating the original order of lines from the original
# data file. People's names will still be sorted in order
# of appearance in the original data file for each country.
while (my ($country, $people) = each %countries) {
for my $person ( @$people) {
print join("\t", $person, $country), "\n";
}
}
}
sub die_usage {
die "Need data file name and country files location\n";
}
If what you want is a counter of names in a hash, then I got you, buddy!
I won't attempt the rest of the code because you are checking a folder of records
that I don't have access to, so I can't troubleshoot anything more than this.
I see one of your problems. Look at this:
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say'; # Really like using say instead of print because no need for newline.
my $file = 'input_file.txt';
my $fh; # A filehandle.
my %hash;
my $people;
my $country;
my $line;
unless(open($fh, '<', $file)){die "Could not open file $file because $!"}
while($line = <$fh>)
{
chomp $line; # Remove the trailing newline so it doesn't end up in $country.
($people, $country) = split(/\s{2,}/, $line); # splitting on at least two spaces
say "$people \t $country"; # Just printing out the columns in the file of people and country.
$hash{$people}++; # Just counting all the people in the hash.
# Seeing how many unique names there are, like is there more than one Cindy, etc ...?
}
say "\nNow I'm just sorting the hash of people by names.";
foreach(sort{$a cmp $b} keys %hash)
{
say "$_ => $hash{$_}"; # Based on your file. The counter is at 1 because nobody has the same names.
}
Here is the output. As you can see I fixed the problem by splitting on at least two white-spaces so the country names don't get cut out.
Andrew UK
Cindy China
Rupa India
Gordon Australia
Peter New Zealand
Andrew United States
Now I'm just sorting the hash of people by names.
Andrew => 2
Cindy => 1
Gordon => 1
Peter => 1
Rupa => 1
I added another Andrew to the file. This Andrew is from the United States
as you can see. I see one of your problems. Look at this:
my ($people) = split("", $line);
You are splitting on characters as there is no space between those quotes.
If you look at this change now, you are splitting on at least one space.
my ($people) = split(" ", $line);

Using hash values as categories in Perl

I'm reading two tab-separated files into two hashes. The files look like this:
apple fruit
pear fruit
carrot vegetable
potato vegetable
peach fruit
and
apple 23
pear 34
carrot 12
potato 45
peach 12
I want to pick up only vegetable and get their numbers. Is there any smarter way than through the for cycle to do this?
And if I want to create two new hashes %fruits and %vegetable, do I really have to do it like:
foreach (keys %kinds_hash) {
if ($kinds_hash{$_} =~ "vegetable") {
$vegetable{$_} = $numbers_hash{$_};
} elsif ($kinds_hash{$_} =~ "fruit") {
$fruit{$_} = $numbers_hash{$_};
}
}
There's nothing wrong with iterating on all the values.
However, if you're going to be doing it often, then perhaps it would be useful to create a new data structure that contains an array of names based off type.
use strict;
use warnings;
# Data in Paragraph mode
local $/ = '';
my %counts = split ' ', <DATA>;
my %types = split ' ', <DATA>;
# Create a structure that puts each type into an array
my %group_by_type;
while (my ($name, $type) = each %types) {
push @{ $group_by_type{$type} }, $name;
}
# Show all vegetables
for my $name (@{ $group_by_type{vegetable} }) {
print "$name $counts{$name}\n";
}
__DATA__
apple 23
pear 34
carrot 12
potato 45
peach 12

apple fruit
pear fruit
carrot vegetable
potato vegetable
peach fruit
Outputs:
carrot 12
potato 45
To learn more about Hashes of Arrays and other data structures, check out perldsc - Perl Data Structures Cookbook
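If a one-off selection is all that is needed, `grep` over the keys combined with a hash slice is a compact alternative to the explicit loop in the question. It still visits every key internally, so it is shorter rather than faster. A minimal sketch, assuming the two hashes are already loaded; the helper name `pick_category` is hypothetical, and `%kinds` and `%numbers` stand in for the question's `%kinds_hash` and `%numbers_hash`:

```perl
use strict;
use warnings;

my %kinds = (
    apple  => 'fruit',     pear   => 'fruit', carrot => 'vegetable',
    potato => 'vegetable', peach  => 'fruit',
);
my %numbers = ( apple => 23, pear => 34, carrot => 12, potato => 45, peach => 12 );

# Hypothetical helper: return name => number pairs for one category.
sub pick_category {
    my ( $category, $kinds, $numbers ) = @_;
    my @names = grep { $kinds->{$_} eq $category } keys %$kinds;
    my %picked;
    @picked{@names} = @{$numbers}{@names};    # hash slice copies all values at once
    return %picked;
}

my %vegetable = pick_category( 'vegetable', \%kinds, \%numbers );
print "$_ $vegetable{$_}\n" for sort keys %vegetable;
```

This uses `eq` rather than the `=~ "vegetable"` match from the question, so a category such as "vegetable oil" would not be picked up by accident.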
You should structure your data so that all the ways you want to access it are made as simple as possible.
You want to access all the items in the vegetable category, and the numbers for all of those items. To make that simple I would build two hashes - one relating the names of the items to their number and category, and another relating the categories to all the names in each category.
This code does just that and uses Data::Dump to show you what has been built.
use strict;
use warnings;
use autodie;
my %items;
my %categories;
open my $fh, '<', 'numbers.tabsep';
while (<$fh>) {
next unless /\S/;
chomp;
my ($name, $number) = split /\t/;
$items{$name}[0] = $number;
}
open $fh, '<', 'categories.tabsep';
while (<$fh>) {
next unless /\S/;
chomp;
my ($name, $cat) = split /\t/;
$items{$name}[1] = $cat;
push @{ $categories{$cat} }, $name;
}
use Data::Dump;
dd \%items;
dd \%categories;
output
{
apple => [23, "fruit"],
carrot => [12, "vegetable"],
peach => [12, "fruit"],
pear => [34, "fruit"],
potato => [45, "vegetable"],
}
{
fruit => ["apple", "pear", "peach"],
vegetable => ["carrot", "potato"],
}
Now, to answer the question "I want to pick up only vegetables and get their numbers" we just loop over the vegetable element of the %categories hash, and use the %items hash to determine their numbers. Like this
for my $item (@{ $categories{vegetable} }) {
printf "%s %d\n", $item, $items{$item}[0];
}
output
carrot 12
potato 45
You can create a hash of hashes: a single nested data structure in which the outer key is the category and the value is another hash, whose keys are the item names and whose values are the numbers.
The following program does that:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my %data;
open my $fh_one, '<', 'file1' or die "Cannot open file1: $!";
while(<$fh_one>) {
next unless /\S+/;
chomp;
my ($type, $category) = split /\t/;
$data{$category}{$type} = undef;
}
close($fh_one);
open my $fh_two, '<', 'file2' or die "Cannot open file2: $!";
OUTER: while(<$fh_two>) {
next unless /\S+/;
chomp;
my ($type, $number) = split /\t/;
for my $category (keys %data) {
for my $item (keys %{ $data{$category} }) {
$data{$category}{$item} = $number and next OUTER if $item eq $type;
}
}
}
close($fh_two);
#print Dumper \%data;
while (my ($type, $number) = each %{ $data{'vegetable'} }) { # each needs a real hash, so dereference
print "$type $number\n";
}
If you uncomment the print Dumper \%data; you will see the nested data structure. It will look like the following:
$VAR1 = {
'fruit' => {
'peach' => '12',
'apple' => '23',
'pear' => '34'
},
'vegetable' => {
'carrot' => '12',
'potato' => '45'
}
};
The output of the above program is:
carrot 12
potato 45

Perl : change array item that is hashed to a key

I am having a problem with my Perl. I hashed a key to an array. Now I want to change things in the array for each key, but I can't figure out how this works:
open(DATEBOOK,"<sample.file");
@datebook = <DATEBOOK>;
$person = "Norma";
foreach(@datebook){
@record = ();
@lines = split(":",$_);
$size = @lines;
for ($i=1; $i < $size; $i++){
$record[$i-1] = $lines[$i];
}
$map{$lines[0]}="@record";
}
for(keys%map){ print $map{$_}};
The datebook file :
Tommy Savage:408.724.0140:1222 Oxbow Court, Sunnyvale,CA 94087:5/19/66:34200
Lesle Kerstin:408.456.1234:4 Harvard Square, Boston, MA 02133:4/22/62:52600
JonDeLoach:408.253.3122:123 Park St., San Jose, CA 94086:7/25/53:85100
Ephram Hardy:293.259.5395:235 Carlton Lane, Joliet, IL 73858:8/12/20:56700
Betty Boop:245.836.8357:635 Cutesy Lane, Hollywood, CA 91464:6/23/23:14500
William Kopf:846.836.2837:6937 Ware Road, Milton, PA 93756:9/21/46:43500
Norma Corder:397.857.2735:74 Pine Street, Dearborn, MI 23874:3/28/45:245700
James Ikeda:834.938.8376:23445 Aster Ave., Allentown, NJ 83745:12/1/38:45000
Lori Gortz:327.832.5728:3465 Mirlo Street, Peabody, MA 34756:10/2/65:35200
Barbara Kerz:385.573.8326:832 Ponce Drive, Gary, IN 83756:12/15/46:268500
I tried $map{$_}[1], but that doesn't work. Can anyone give me an example on how this works :) ?
thanks!
First, use strict and use warnings. Always.
Assuming what you want is a hash of arrays, do something like this:
use strict;
use warnings;
open my $datebookfh, '<', 'sample.file' or die $!;
my @datebook = <$datebookfh>;
my %map;
foreach my $row( @datebook ) {
my @record = split /:/, $row;
my $key = shift @record; # remove the first element and save it in $key
$map{$key} = \@record;
}
You can test that you have the correct structure by using Data::Dumper:
use Data::Dumper;
print Dumper( \%map );
The \ operator takes a reference. All hashes and arrays in Perl contain scalars, so compound structures (e.g. hashes of arrays) are really hashes of references to arrays. A reference is like a pointer.
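Once each value is an array reference, the `$map{$_}[1]` syntax from the question works for both reading and changing elements. It failed originally because `"@record"` stored a joined string rather than an array. A minimal sketch, assuming a hypothetical helper `build_map` and one record from the datebook file:

```perl
use strict;
use warnings;

# Hypothetical helper: build name => [remaining fields] from "a:b:c" rows.
sub build_map {
    my @rows = @_;
    my %map;
    for my $row (@rows) {
        chomp $row;
        my ( $name, @record ) = split /:/, $row;
        $map{$name} = \@record;    # store a reference to the array, not a string
    }
    return \%map;
}

my $map = build_map(
    "Betty Boop:245.836.8357:635 Cutesy Lane, Hollywood, CA 91464:6/23/23:14500\n",
);

# Read one element; the arrow between adjacent subscripts is optional.
print $map->{'Betty Boop'}[0], "\n";

# Change one element in place.
$map->{'Betty Boop'}[3] += 500;

# Array functions need the whole array, so dereference with @{ ... }.
push @{ $map->{'Betty Boop'} }, 'updated';
print join( ' | ', @{ $map->{'Betty Boop'} } ), "\n";
```

With a plain hash `%map` instead of a hash reference, the same accesses are written `$map{'Betty Boop'}[3]` and `@{ $map{'Betty Boop'} }`.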
Before going further, you should check out:
Perl reference tutorial
Arrays of arrays
Perl Data Structure Cookbook
Others have given you excellent advice. Here's one other idea to consider: store your data in a hash of hashes rather than a hash of arrays. It makes the data structure more communicative.
# Include these in your Perl scripts.
use strict;
use warnings;
my %data;
# Use lexical file handles, and check whether open() succeeds.
open(my $fh, '<', shift) or die $!;
while (my $line = <$fh>){
chomp $line;
my ($name, $ss, $address, $date, $number) = split /:/, $line;
$data{$name} = {
name => $name,
ss => $ss,
address => $address,
date => $date,
number => $number,
};
}
# Example usage: print info for one person.
my $person = $data{'Betty Boop'};
print $_, ' => ', $person->{$_}, "\n" for keys %$person;