Print the word after match from Perl array - perl

Array #array has
Sex: M
Name: John, oliver
Age is 33
Has no experience
is 5 feet tall
I want to print the word after Name: which is John, oliver in this case.
Below code works on $string, how to do the same on an #array?
my ($name) = $string =~ /Name: (.+)$/;
print $name;

You will have to iterate over the array, apply the regex on each element. Something like:
my #array = ( 'Sex: M', 'Name: John, oliver', 'Age: 33' )
foreach my $item ( #array ) {
if( $item =~ /Name: (.+)$/ ) {
print $1;
}
}
I think hash is a better datastructure to store your data.

To find results in an array, the tool for the job is grep.
foreach my $name_lines ( grep { m/Name/ } #array ) {
my ($name) = /Name: (.+)$/;
print $name,"\n";
}
Here I've assumed there might be multiple matches - you don't have to do that particularly though and could instead:
my ($name) = map { m/Name: (.+$)/ } #stuff;
print $name;
This uses map to transform the array, but because we assign it to a list containing a single scalar, the second match is discarded. (if there is one).
Although I'd suggest this isn't the best approach to take - an array of keys and values isn't particularly useful compared to a hash.
If you have an array:
my #stuff = ( "Sex: M",
"Name: John, oliver",
"Age is 33",
"Has no experience",
"is 5 feet tall" );
Then you can transform it with map:
my %stuff_hash = map { /(\w+):? (.*)$/ } #stuff;
Which gives you a data structure looking like this:
$VAR1 = {
'Age' => 'is 33',
'Sex' => 'M',
'is' => '5 feet tall',
'Name' => 'John, oliver',
'Has' => 'no experience'
};
So you can:
print $stuff_hash{'Name'},"\n";
Or alternatively - stick your array back together into a string, and then multi line the regex:
my ($name) = join ( "\n", #stuff) =~ m/Name: (.*)$/m;
print $name;

Let the structure fit your data
my #arr = (
{
Sex => 'M',
Name => 'John Doe',
Age => 21,
},
{
Sex => 'F',
Name => 'Jane Doe',
Age => 20,
},
);
Then you can more easily find the data:
for (#arr){
print "$_->{Name}\n";
}

Related

Remove duplicates from a 2D array in perl

I have a 2D array in perl whose data is coming as rows in html format from a DB like the data shown below:
<tr><td>Rafa</td><td>Nadal</td><td>Data1</td></tr>,
<tr><td>Goran</td><td>Ivan</td><td>Data2</td></tr>,
<tr><td>Leander</td><td>Paes</td><td>Data2</td></tr>,
<tr><td>Leander</td><td>Paes</td><td>Data2</td></tr>
i want to remove the duplicate rows from the array.
"<tr><td>Leander</td><td>Paes</td><td>Data2</td></tr>" should be removed in above case.
I tried the below piece of code, but it's not working out.
sub unique {
my %seen;
grep ! $seen{ join $;, #$_ }++, #_
}
First: you really should try not to use outdated Perl syntax and side effects.
Second: the answer depends on the data structure you generate from the input. Here are two example implementations:
#!/usr/bin/perl
use strict;
use warnings;
# 2D Array: list of array references
my #data = (
['Rafa', 'Nadal', 'Data1'],
['Goran', 'Ivan', 'Data2'],
['Leander', 'Paes', 'Data2'],
['Leander', 'Paes', 'Data2'],
);
my %seen;
foreach my $unique (
grep {
not $seen{
join('', #{ $_ })
}++
} #data
) {
print join(',', #{ $unique }), "\n";
}
print "\n";
# List of "objects", keys are table column names
#data = (
{ first => 'Rafa', last => 'Nadal', data => 'Data1' },
{ first => 'Goran', last => 'Ivan', data => 'Data2' },
{ first => 'Leander', last => 'Paes', data => 'Data2' },
{ first => 'Leander', last => 'Paes', data => 'Data2' },
);
%seen = ();
my #key_order = qw(first last data);
foreach my $unique (
grep {
not $seen{
join('', #{ $_ }{ #key_order } )
}++
} #data
) {
print join(',', #{ $unique }{ #key_order }), "\n";
}
Output:
$ perl dummy.pl
Rafa,Nadal,Data1
Goran,Ivan,Data2
Leander,Paes,Data2
Rafa,Nadal,Data1
Goran,Ivan,Data2
Leander,Paes,Data2
The shown sub is good for the job, with an array which for elements has array references. That is indeed a basic way to organize 2D data, where your rows are arrayrefs.
There are modules that can be leveraged for this, but this good old method works fine as well
use warnings;
use strict;
use Data::Dump qw(dd);
sub uniq_arys {
my %seen;
grep { not $seen{join $;, #$_}++ } #_;
}
my #data = (
[ qw(one two three) ],
[ qw(ten eleven twelve) ],
[ qw(10 11 12) ],
[ qw(ten eleven twelve) ],
);
my #data_uniq = uniq_arys(#data);
dd \#data_uniq;
Prints as expected (last row is gone), using Data::Dump to show data.
The sub works by joining each array into a string, and those are then checked for duplicates using a hash. The $; is a subscript separator, and an empty string '' is just fine instead.
This approach creates a lot of ancillary data -- in principle doubles the data -- and if performance becomes a problem it may be better to simply compare element-wise (at the cost of complexity). This can be an issue only with rather large data sets.
A module example: use uniq_by from List::UtilsBy
use List::UtilsBy qw(uniq_by);
my #no_dupes = uniq_by { join '', #$_ } #data;
This does, more or less, the same as the sub above.

perl: subroutine returns 0 instead of specified array

I have a hash of hashes like this:
my %HoH = (
flintstones => {
1 => "fred",
2 => "barney",
},
jetsons => {
1 => "george",
2 => "jane",
},
simpsons => {
1 => "homer",
2 => "marge",
},
);
My subroutine is meant to search through the values of a specified key, e.g. search all 2s for e and return the value for key 1 in each case.
It works since it can print those things just fine, and I can also print it to a text file. I also want the same lines to be pushed to an array #output.
Why does my subroutine return zero which is saved in $hej in this case.
sub search_hash {
# Arguments are
#
# $hash=hash ref
# $parameter1=key no. to search in
# $parameter2=value to find
# $parameter3=name of text file to write to
my ( $hash, $parameter1, $parameter2, $parameter3 ) = #_, ;
# Loop over the keys in the hash
foreach ( keys %{$hash} ) {
# Get the value for the current key
my $value = $hash->{$_};
my $value2 = $hash->{'1'};
search_hash( $value, $parameter1, $parameter2, $parameter3 );
for my $key ( $parameter1 ) {
my #output; #create array for loop outputs to be saved
if ( $value =~ $parameter2 ) {
push #output, "$value2"; #push lines to array
print "Value: $value\n";
print "Name: $value2\n";
open( my $fh, '>>', $parameter3 );
print $fh ( "$value2\n" );
close $fh;
}
return #output;
}
}
}
my $hej = search_hash( \%HoH, "2", 'e', 'data3.txt' );
print $hej;
output
Can't use string ("fred") as a HASH ref while "strict refs" in use
There is no key "1" in first loop of your hash. Recursive subroutine is not a good choice here.
my $value2 = $hash->{'1'};
Borodin's one line code is great. But we should search 2 s.
search all 2 s for e and return the value for key 1 in each case.
As a summary, search_hash.pl
use strict;
use warnings;
use utf8;
my %HoH = (
Flintstones => { 1 => "Fred", 2 => "Barney" },
Jetsons => { 1 => "George", 2 => "Jane" },
Simpsons => { 1 => "Homer", 2 => "Marge" }
);
my #output2 = map { $_->{1} } grep { $_->{2} =~ /e/ } values %HoH;
open( my $fh, '>', "data3.txt");
print $fh ( "$_\n" ) foreach #output2;
close $fh;
And
perl search_hash.pl
cat data3.txt
OUTPUT:
Fred
Homer
George
The return expression of subroutine is evaluated in the same context as the subroutine itself. Since you're assuming the result of the subroutine to a scalar, the subroutine is evaluated in scalar context, and #output is evaluated in scalar context. In scalar context, an array returns the number of elements it contains. In this case, #output happened to be empty, so search_hash returned zero.
If you want the elements of #output instead of the number of elements in #output, you will need to call the subroutine in list context. Assigning the result to an array is one way of doing that.
This is how I fixed the problem in the rewrite posted below. Note that I replaced the scalar $hej with the array #hej below.
I also fixed other problems. The values for key 1 from all three
2nd level hashes are now returned, because each of them contains a value for key 2, which contains e (the value to find). See below.
use strict;
use warnings;
my %HoH = (
Flintstones => { 1 => "Fred", 2 => "Barney" },
Jetsons => { 1 => "George", 2 => "Jane" },
Simpsons => { 1 => "Homer", 2 => "Marge" }
);
sub search_hash {
# Arguments:
# $hash: hash ref
# $search_key: key to search in each 2-nd level hash
# $search_string: value to find
my ( $hash, $search_key, $search_string ) = #_;
my #output;
foreach ( keys %{$hash} ) {
#print "Key: $_\n";
my $hash2 = $hash->{$_}; # 2-nd level hash (reference to)
my $search_val = $hash2->{$search_key}; # Value for key == parameter1
#print "Value: $search_val\n";
if ($search_val =~ /\Q$search_string/) {
my $id = $hash2->{'1'};
#print "Name: $id\n";
push #output, $id;
}
}
return #output;
}
my #hej = search_hash( \%HoH, '2', 'e' );
print "Result: #hej\n";

Manipulate and access the contents of a hash of hashes

I am having trouble with writing a Perl script.
This is the task:
My code works fine but has two issues.
I want to add an element to the hash %grocery, which contains category, brand and price. When adding the item, first the system will ask for the category.
If the category does not exist then it will add a new category, brand and price from the user, but if the category already exists then it will take the brand name and price from the user and append it to the existing category.
When I try to do so it erases the preexisting items. I want the previous items appended with the newly added item.
This issue is with the max value. To find the maximum price in the given hash. I am getting garbage value for that.
What am I doing wrong?
Here is my full code:
use strict;
use warnings;
use List::Util qw(max);
use feature "switch";
my $b;
my $c;
my $p;
my $highest;
print "____________________________STORE THE ITEM_____________________\n";
my %grocery = (
"soap" => { "lux" => 13.00, "enriche" => 11.00 },
"detergent" => { "surf" => 18.00 },
"cleaner" => { "domex" => 75.00 }
);
foreach my $c ( keys %grocery ) {
print "\n";
print "$c\n";
foreach my $b ( keys %{ $grocery{$c} } ) {
print "$b:$grocery{$c}{$b}\n";
}
}
my $ch;
do {
print "________________MENU_________________\n";
print "1.ADD ITEM\n";
print "2.SEARCH\n";
print "3.DISPLAY\n";
print "4.FIND THE MAX PRICE\n";
print "5.EXIT\n";
print "enter your choice \n";
$ch = <STDIN>;
chomp( $ch );
given ( $ch ) {
when ( 1 ) {
print "Enter the category you want to add";
$c = <STDIN>;
chomp( $c );
if ( exists( $grocery{$c} ) ) {
print "Enter brand\n";
$b = <STDIN>;
chomp( $b );
print "Enter price\n";
$p = <STDIN>;
chomp( $p );
$grocery{$c} = { $b, $p };
print "\n";
}
else {
print "Enter brand\n";
$b = <STDIN>;
chomp( $b );
print "Enter price\n";
$p = <STDIN>;
chomp( $p );
$grocery{$c} = { $b, $p };
print "\n";
}
}
when ( 2 ) {
print "Enter the item that you want to search\n";
$c = <STDIN>;
chomp( $c );
if ( exists( $grocery{$c} ) ) {
print "category $c exists\n\n";
print "Enter brand\n";
$b = <STDIN>;
chomp( $b );
if ( exists( $grocery{$c}{$b} ) ) {
print "brand $b of category $c exists\n\n";
print "-----$c-----\n";
print "$b: $grocery{$c}{$b}\n";
}
else {
print "brand $b does not exists\n";
}
}
else {
print "category $c does not exists\n";
}
}
when ( 3 ) {
foreach $c ( keys %grocery ) {
print "$c:\n";
foreach $b ( keys %{ $grocery{$c} } ) {
print "$b:$grocery{$c}{$b}\n";
}
}
}
when ( 4 ) {
print "\n________________PRINT HIGHEST PRICED PRODUCT____________________\n";
$highest = max values %grocery;
print "$highest\n";
}
}
} while ( $ch != 5 );
When I try to do so it erases the preexisting items. I want the previous items appended with the newly added item.
In this line you are overwriting the value of $grocery{$c} with a new hash reference.
$grocery{$c}={$b,$p};
Instead, you need to edit the existing hash reference.
$grocery{$c}->{$b} = $p;
That will add a new key $b to the existing data structure inside of $grocery{$b} and assign it the value of $p.
Let's take a look at what that means. I've added this to the code after %grocery gets initialized.
use Data::Dumper;
print Dumper \%grocery;
We will get the following output. Hashes are not sorted, so the order might be different for you.
$VAR1 = {
'cleaner' => {
'domex' => '75'
},
'detergent' => {
'surf' => '18'
},
'soap' => {
'enriche' => '11',
'lux' => '13'
}
};
As you can see we have hashes inside of hashes. In Perl, references are used to construct a multi level data structure. You can see that from the curly braces {} in the output. The very first one after $VAR1 is because I passed a reference of $grocery to Dumper by adding the backslash \ in front.
So behind the value for $grocery{"cleaner"} is a hash reference { "domex" => 75 }. To reach into that hash reference, you need to use the dereferencing operator ->. You can then put a new key into that hash ref like I showed above.
# ##!!!!!!!!!!
$grocery{"cleaner"}->{"foobar"} = 30;
I've marked the relevant parts above with a comment. You can read up on this stuff in these documents: perlreftut, perllol, perldsc and perlref.
This issue is with the max value. To find the max of the values of the given hash. I am getting garbage value for that.
This problem is also based on the fact that you don't yet understand references.
$highest = max values %grocery;
Your code will only take the values directly inside %grocery. If you scroll up and look at the Dumper output again, you'll see that there are three hash references inside of %grocery. Now if you do not dereference them, you just get their scalar representation. A scalar in Perl is a single value, like a number or a string. But for references it is their type and address. What looks like garbage is in fact the memory address of the three hash references in %grocery which has the highest number.
Of course that's not what you want. You need to iterate both levels of your data structure, collect all values and then find the highest one.
my #all_prices;
foreach my $category (keys %grocery) {
push #all_prices, values %{ $grocery{$category} };
}
$highest = max #all_prices;
print "$highest\n";
I chose a very verbose approach to do that. It iterates over all categories in %grocery and then grabs all the values of the hash reference stored behind each of them. Those get added to an array, and in the end we can take the max of all of them from the array.
You have the exact same code for the when a category already exists and when it does not. The line
$grocery{$c} = { $b, $p };
replaces the entire hash for category $c. That's fine for new categories, but if the category is already there then it will throw away any existing information
You need to write
$grocery{$c}{$b} = $p;
And please add a lot more whitespace around operators, separating the elements of lists, and delineating related sequences of statements
With regard to finding the maximum price, your line
$highest = max values %grocery;
is trying to calculate the maximum of the hash references corresponding to the categories
Since there are two levels of hash here, you need
$highest = max map { values %$_ } values %grocery;
but that may not be the way you're expected to do it. If in doubt then you should use two nested for loops
use List::Util qw(max);
use Data::Dumper;
my $grocery =
{
"soap" => { "lux" => 13.00, "enriche" => 11.00 },
"detergent" => { "surf" => 18.00 },
"cleaner" => { "domex"=> 75.00 }
};
display("unadulterated list");
print Dumper $grocery;
display("new silky soap");
$grocery->{"soap"}->{"silky"} = 12.50;
print Dumper $grocery;
display("new mega cleaner");
$grocery->{"cleaner"}->{"megaclean"} = 99.99;
print Dumper $grocery;
display("new exfoliant soap");
$grocery->{"soap"}->{"exfoliant"} = 23.75;
print Dumper $grocery;
display("lux soap gets discounted");
$grocery->{"soap"}->{"lux"} = 9.00;
print Dumper $grocery;
display("domex cleaner is discontinued");
delete $grocery->{"cleaner"}->{"domex"};
print Dumper $grocery;
display("most costly soap product");
my $max = max values $grocery->{soap};
print $max, "\n\n";
sub display
{
printf("\n%s\n%s\n%s\n\n", '-' x 45, shift, '-' x 45 );
}

Parsing HTML-attributes like strings

Have strings like HTML attributes
key1="value1 value2" key2="va3" key4
need parse such string to get HoA:
$parsed = {
key1' => [
'value1',
'value2'
],
key2 => [ 'val3' ], #or key2 => 'val3' doesn't matter..
key4 => undef,
};
Creating the function myself, like:
#!/usr/bin/env perl
use 5.014;
use strict;
use warnings;
use Data::Dumper;
while(<DATA>) {
my $parsed;
chomp;
next if m/\A\s*#/;
while( m/(\w+)(\s*=\s*(["'])(.*?)(\3))?/g ) {
my $k = $1;
if( $4 ) {
my #v = split(/\s+/, $4);
$parsed->{$k} = \#v;
}
else {
$parsed->{$k} = undef;
}
}
say Dumper $parsed;
}
__DATA__
key1="value1 value2" key2 key3="val3"
key1='value1 "value2"' key8 key3='val3'
key1='value1 i\'m' key2 key3="val3"
key1='value1 value2' key8 key3=val3
works and prints correct results for the first 2 lines.
$VAR1 = {
'key1' => [
'value1',
'value2'
],
'key3' => [
'val3'
],
'key2' => undef
};
$VAR1 = {
'key1' => [
'value1',
'"value2"'
],
'key3' => [
'val3'
],
'key8' => undef
};
Unfortunately, it fails on 3rd line - don't know how to handle the escaped quotes. (And just figured out than the key=val (without quotes) is valid too))
Additionally, because don't want reinvent the wheel again, probably exists some module on CPAN for this, only haven't any idea what to search. ;(
EDIT
#mpapec suggested a module, what could greatly help for parsing the RHS part of the "assignement". My problem is than the string contains multiple space delimited LHS=RHS, where the RHS could be quoted (in single and double) or not quoted (in the case of one value) and the RHS values (in the quotes) are space delimited too..
key1="value1 value2" key2="va3" key4 key5=val5 key6='val6' key7='val x\'y zzz'
So I don't know how to break the string into multiple LHS=RHS parts, because can't split at space and can't use my regex, because it fails in escaped quotes. (maybe some more complicated regex what handles escapes could work).
Any suggestions, please?
You can use Text::ParseWords as mpapec suggested:
use strict;
use warnings;
use 5.010;
use Data::Dumper;
use Text::ParseWords;
$Data::Dumper::Sortkeys = 1;
my $string = q{key1="value1 value2" key2="va3" key4 key5=val5 key6='val6' key7='val x\'y zzz'};
my #words = shellwords $string;
my %parsed;
foreach my $word (#words) {
my ($key, $values) = split /=/, $word, 2;
$parsed{$key} //= [];
push #{ $parsed{$key} }, $_ for shellwords $values;
}
print Dumper \%parsed;
Output:
$VAR1 = {
'key1' => [
'value1',
'value2'
],
'key2' => [
'va3'
],
'key4' => [],
'key5' => [
'val5'
],
'key6' => [
'val6'
],
'key7' => [
'val',
'x\'y',
'zzz'
]
};
Note that for consistency, I assigned keys without values an empty array instead of undef. I think this will make the data structure easier to use.
Also note that I called shellwords twice. I did this to remove the backslashes from escaped quotes, so that
key7='val x\'y zzz'
gets split into
val x'y zzz
instead of
val x\'y zzz
(The backslash in x\'y in the output above is added by Data::Dumper; there's no backslash in the variable itself.)
To fix your current issue, you can setup an alteration to handle backslashes in a special way.
#!/usr/bin/env perl
use 5.014;
use strict;
use warnings;
use Data::Dump;
my $parsed;
while (<DATA>) {
chomp;
next if m/\A\s*#/;
while (
m{
(\w+)
(?:
\s* = \s*
(["'])
( (?: (?!\2)[^\\] | \\. )* )
\2
)?
}gx
)
{
my $k = $1;
if ($2) {
( my $val = $3 ) =~ s/\\(.)/$1/g; # Unescape backslashes
$parsed->{$k} = [ split /\s+/, $val ]; # Split words
} else {
$parsed->{$k} = undef;
}
}
dd $parsed;
print "\n";
}
__DATA__
key1="value1 value2" key2 key3="val3"
key1='value1 "value2"' key2 key3='val3'
key1='value1 i\'m' key2 key3="val3"
Outputs:
{ key1 => ["value1", "value2"], key2 => undef, key3 => ["val3"] }
{ key1 => ["value1", "\"value2\""], key2 => undef, key3 => ["val3"] }
{ key1 => ["value1", "i'm"], key2 => undef, key3 => ["val3"] }
There are still other issues to take into account, but perhaps this will help you get further along.
You might consider something based on Parser::MGC. Your examples look like a nice simple loop on
my $key = $self->token_ident;
$self->expect( '=' );
my $value = $self->token_string;

sum hash of hash values using perl

I have a Perl script that parses an Excel file and does the following : It counts for each value in column A, the number of elements it has in column B, the script looks like this :
use strict;
use warnings;
use Spreadsheet::XLSX;
use Data::Dumper;
use List::Util qw( sum );
my $col1 = 0;
my %hash;
my $excel = Spreadsheet::XLSX->new('inout_chartdata_ronald.xlsx');
my $sheet = ${ $excel->{Worksheet} }[0];
$sheet->{MaxRow} ||= $sheet->{MinRow};
my $count = 0;
# Iterate through each row
foreach my $row ( $sheet->{MinRow}+1 .. $sheet->{MaxRow} ) {
# The cell in column 1
my $cell = $sheet->{Cells}[$row][$col1];
if ($cell) {
# The adjacent cell in column 2
my $adjacentCell = $sheet->{Cells}[$row][ $col1 + 1 ];
# Use a hash of hashes
$hash{ $cell->{Val} }{ $adjacentCell->{Val} }++;
}
}
print "\n", Dumper \%hash;
The output looks like this :
$VAR1 = {
'13' => {
'klm' => 1,
'hij' => 2,
'lkm' => 4,
},
'12' => {
'abc' => 2,
'efg' => 2
}
};
This works great, my question is : How can I access the elements of this output $VAR1 in order to do : for value 13, klm + hij = 3 and get a final output like this :
$VAR1 = {
'13' => {
'somename' => 3,
'lkm' => 4,
},
'12' => {
'abc' => 2,
'efg' => 2
}
};
So basically what I want to do is loop through my final hash of hashes and access its specific elements based on a unique key and finally do their sum.
Any help would be appreciated.
Thanks
I used #do_sum to indicate what changes you want to make. The new key is hardcoded in the script. Note that the new key is not created if no key exists in the subhash (the $found flag).
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my %hash = (
'13' => {
'klm' => 1,
'hij' => 2,
'lkm' => 4,
},
'12' => {
'abc' => 2,
'efg' => 2
}
);
my #do_sum = qw(klm hij);
for my $num (keys %hash) {
my $found;
my $sum = 0;
for my $key (#do_sum) {
next unless exists $hash{$num}{$key};
$sum += $hash{$num}{$key};
delete $hash{$num}{$key};
$found = 1;
}
$hash{$num}{somename} = $sum if $found;
}
print Dumper \%hash;
It sounds like you need to learn about Perl References, and maybe Perl Objects which are just a nice way to deal with references.
As you know, Perl has three basic data-structures:
Scalars ($foo)
Arrays (#foo)
Hashes (%foo)
The problem is that these data structures can only contain scalar data. That is, each element in an array can hold a single value or each key in a hash can hold a single value.
In your case %hash is a Hash where each entry in the hash references another hash. For example:
Your %hash has an entry in it with a key of 13. This doesn't contain a scalar value, but a references to another hash with three keys in it: klm, hij, and lkm. YOu can reference this via this syntax:
${ hash{13} }{klm} = 1
${ hash{13} }{hij} = 2
${ hash{13} }{lkm} = 4
The curly braces may or may not be necessary. However, %{ hash{13} } references that hash contained in $hash{13}, so I can now reference the keys of that hash. You can imagine this getting more complex as you talk about hashes of hashes of arrays of hashes of arrays. Fortunately, Perl includes an easier syntax:
$hash{13}->{klm} = 1
%hash{13}->{hij} = 2
%hash{13}->{lkm} = 4
Read up about hashes and how to manipulate them. After you get comfortable with this, you can start working on learning about Object Oriented Perl which handles references in a safer manner.