Storing of duplicate inputs in as distinct in Perl - perl

I'm stuck on this problem for days and don't know which data structure I should use in Perl
Suppose I have this file with the following inputs:
lookup hello
lookup good night
good night
I want to be able to store "lookup" linked with "hello", "lookup" linked with "good night" and lastly "good night" linked with nothing else. Should I use a hash or an array? If I set "lookup" as the key of the hash, I will lose "hello" information. If I set "good night" as key, I will end up losing the 3rd line of information.
What should I do?
EDIT: There are certain keywords (which I called "lookup") that has stuff that are linked to them. For example, "password" will be followed by the actual password whereas "login" need not have anything following it.

It's not exactly clear how you're expecting to break up the words here, but in the general case, if you need to do random lookups (by arbitrary word), a hash is a better fit than an array. Without additional details on exactly what you're trying to do here, it's hard to be more specific.

It looks like you want a hash of arrays. Ignoring the issue of how to decide the key is "good night" instead of "good" (and just assuming the key should be "good"), you could do something like:
#!/usr/bin/env perl
use strict;
use warnings;
my %hoa; # hash of arrays
while(<>) {
my #f = split;
my $k = shift #f;
$hoa{ $k } = [] unless $hoa{ $k };
push #{$hoa{ $k }}, join( ' ', #f );
}
foreach my $k( keys( %hoa )) {
print "$k: $_\n" foreach ( #{$hoa{ $k }});
}

Not quite sure what you want, but, is this ok:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dump qw(dump);
# build a hash of keywords
my %kword = map{ $_ => 1}qw(lookup kword1 kword2);
my %result;
while(<DATA>) {
chomp;
my ($k, $v) = split(/ /,$_, 2);
push #{$result{$k}}, $v if exists $kword{$k};
}
dump%result;
__DATA__
lookup hello
lookup good night
good night
kword1 foo
kword1 bar
output:
("kword1", ["foo", "bar"], "lookup", ["hello", "good night"])

You could also use a hash of hashes if you plan to store another piece of data with your linked "lookup"->"good night" which will be easy to access later on.
E.g:
%result = (
'lookup' => {
'hello' => $storage1,
'good night' => $storage2,
},
'good night' => {
}
);

Related

How do I preserve the order of a hash in Perl?

I have a .sql file from which I am reading my input. Suppose the file contains the following input....
Message Fruits Fruit="Apple",Color="Red",Taste="Sweet";
Message Flowers Flower="Rose",Color="Red";
Now I have written a perl script to generate hash from this file..
use strict;
use Data::Dumper;
if(open(MYFILE,"file.sql")){
my #stack;
my %hash;
push #stack,\%hash;
my #file = <MYFILE>;
foreach my $row(#file){
if($row =~ /Message /){
my %my_hash;
my #words = split(" ",$row);
my #sep_words = split(",",$words[2]);
foreach my $x(#sep_words){
my($key,$value) = split("=",$x);
$my_hash{$key} = $value;
}
push #stack,$stack[$#stack]->{$words[1]} = {%my_hash};
pop #stack;
}
}
print Dumper(\%hash);
}
I am getting the following output..
$VAR1 = {
'Flowers' => {
'Flower' => '"Rose"',
'Color' => '"Red";'
},
'Fruits' => {
'Taste' => '"Sweet";',
'Fruit' => '"Apple"',
'Color' => '"Red"'
}
};
Now here the hash is not preserving the order in which the input is read.I want my hash to be in the same order as in input file.
I have found some libraries like Tie::IxHash but I want to avoid the use of any libraries.Can anybody help me out???
For a low key approach, you could always maintain the keys in an array, which does have an order.
foreach my $x(#sep_words){
my($key,$value) = split("=",$x);
$my_hash{$key} = $value;
push(#list_keys,$key);
}
And then to extract, iterate over the keys
foreach my $this_key (#list_keys) {
# do something with $my_hash{$this_key}
}
But that does have the issue of, you're relying on the array of keys and the hash staying in sync. You could also accidentally add the same key multiple times, if you're not careful.
Joel has it correct - you cannot reliably trust the order of a hash in Perl. If you need a certain order, you'll have to store your information in an array.
A hash is a set of key-value pairs with unique keys. A set is never ordered per se.
An array is a sequence of any number of scalars. An array is ordered per se, but uniqueness would have to be enforced externally.
Here is my take on your problem:
#!/usr/bin/perl
use strict; use warnings;
use Data::Dumper;
local $/ = ";\n";
my #messages;
while (<DATA>) {
chomp;
my ($msg, $to, $what) = split ' ', $_, 3; # limit number of fragments.
my %options;
while($what =~ /(\w+) = "((?:[^"]++|\\.)*)" (?:,|$)/xg) {
$options{$1} = $2;
}
push #messages, [$to => \%options];
}
print Dumper \#messages;
__DATA__
Message Fruits Fruit="Apple",Color="Red",Taste="Sweet";
Message Flowers Flower="Rose",Color="Red";
I put the messages into an array, because it has to be sorted. Also, I dont do weird gymnastics with a stack I don't need.
I don't split on all newlines, because you could have quoted value that contain newlines. For the same reason, I don't blindly split on , or = and use a sensible regex. It may be worth adding error detection, like die if not defined pos $what or pos($what) != length($what); at the end (requires /c flag on regex), to see if we actually processed everything or were thrown out of the loop prematurely.
This produces:
$VAR1 = [
[ 'Fruits',
{
'Taste' => 'Sweet',
'Fruit' => 'Apple',
'Color' => 'Red'
}
],
[ 'Flowers',
{
'Flower' => 'Rose',
'Color' => 'Red'
}
]
];
(with other indenting, but that's irrelevant).
One gotcha exists: The file has to be terminated by a newline, or the last semicolon isn't caught.

How to use a 'subroutine reference' as a hash key

In Perl, I'm learning how to dereference 'subroutine references'. But I can't seem to use a subroutine reference as a hash 'key'.
In the following sample code,
I can create a reference to a subroutine ($subref) and then dereference it to run the subroutine (&$subref)
I can use the reference as a hash 'value' and then easily dereference that
But I cannot figure out how to use the reference as a hash 'key'. When I pull the key out of the hash, Perl interprets the key as a string value (not a reference) - which I now understand (thanks to this site!). So I've tried Hash::MultiKey, but that seems to turn it into an array reference. I want to treat it as a subroutine/code reference, assuming this is somehow possible?
Any other ideas?
use strict;
#use diagnostics;
use Hash::MultiKey;
my $subref = \&hello;
#1:
&$subref('bob','sue'); #okay
#2:
my %hash;
$hash{'sayhi'}=$subref;
&{$hash{'sayhi'}}('bob','sue'); #okay
#3:
my %hash2;
tie %hash2, 'Hash::MultiKey';
$hash2{$subref}=1;
foreach my $key (keys %hash2) {
print "Ref type is: ". ref($key)."\n";
&{$key}('bob','sue'); # Not okay
}
sub hello {
my $name=shift;
my $name2=shift;
print "hello $name and $name2\n";
}
This is what is returned:
hello bob and sue
hello bob and sue
Ref type is: ARRAY
Not a CODE reference at d:\temp\test.pl line 21.
That is correct, a normal hash key is only a string. Things that are not strings get coerced to their string representation.
my $coderef = sub { my ($name, $name2) = #_; say "hello $name and $name2"; };
my %hash2 = ( $coderef => 1, );
print keys %hash2; # 'CODE(0x8d2280)'
Tieing is the usual means to modify that behaviour, but Hash::MultiKey does not help you, it has a different purpose: as the name says, you may have multiple keys, but again only simple strings:
use Hash::MultiKey qw();
tie my %hash2, 'Hash::MultiKey';
$hash2{ [$coderef] } = 1;
foreach my $key (keys %hash2) {
say 'Ref of the key is: ' . ref($key);
say 'Ref of the list elements produced by array-dereferencing the key are:';
say ref($_) for #{ $key }; # no output, i.e. simple strings
say 'List elements produced by array-dereferencing the key are:';
say $_ for #{ $key }; # 'CODE(0x8d27f0)'
}
Instead, use Tie::RefHash. (Code critique: prefer this syntax with the -> arrow for dereferencing a coderef.)
use 5.010;
use strict;
use warnings FATAL => 'all';
use Tie::RefHash qw();
my $coderef = sub {
my ($name, $name2) = #_;
say "hello $name and $name2";
};
$coderef->(qw(bob sue));
my %hash = (sayhi => $coderef);
$hash{sayhi}->(qw(bob sue));
tie my %hash2, 'Tie::RefHash';
%hash2 = ($coderef => 1);
foreach my $key (keys %hash2) {
say 'Ref of the key is: ' . ref($key); # 'CODE'
$key->(qw(bob sue));
}
From perlfaq4:
How can I use a reference as a hash key?
(contributed by brian d foy and Ben Morrow)
Hash keys are strings, so you can't really use a reference as the key.
When you try to do that, perl turns the reference into its stringified
form (for instance, HASH(0xDEADBEEF) ). From there you can't get back
the reference from the stringified form, at least without doing some
extra work on your own.
Remember that the entry in the hash will still be there even if the
referenced variable goes out of scope, and that it is entirely
possible for Perl to subsequently allocate a different variable at the
same address. This will mean a new variable might accidentally be
associated with the value for an old.
If you have Perl 5.10 or later, and you just want to store a value
against the reference for lookup later, you can use the core
Hash::Util::Fieldhash module. This will also handle renaming the keys
if you use multiple threads (which causes all variables to be
reallocated at new addresses, changing their stringification), and
garbage-collecting the entries when the referenced variable goes out
of scope.
If you actually need to be able to get a real reference back from each
hash entry, you can use the Tie::RefHash module, which does the
required work for you.
So it looks like Tie::RefHash will do what you want. But to be honest, I don't think that what you want to do is a particularly good idea.
Why do you need it? If you e.g. need to store parameters to the functions in the hash, you can use HoH:
my %hash;
$hash{$subref} = { sub => $subref,
arg => [qw/bob sue/],
};
foreach my $key (keys %hash) {
print "$key: ref type is: " . ref($key) . "\n";
$hash{$key}{sub}->( #{ $hash{$key}{arg} } );
}
But then, you can probably choose a better key anyway.

How to count duplicate key and add all values of duplicate key together to make new hash with non duplicate key?

Hi I am new to perl and in a beginners stage Please Help
I am having a hash
%hash = { a => 2 , b=>6, a=>4, f=>2, b=>1, a=>1}
I want output as
a comes 3 times
b comes 2 times
f comes 1 time
a new hash should be
%newhash = { a => 7, b=>7,f =>2}
How can I do this?
To count the frequency of keys in hash i am doing
foreach $element(sort keys %hash) {
my $count = grep /$element/, sort keys %hash;
print "$element comes in $count times \n";
}
But by doing this I am getting the output as:
a comes 1 times
b comes 1 times
a comes 1 times
f comes 1 times
b comes 1 times
a comes 1 times
Which is not what I want.
How can I get the correct number of frequency of the duplicate keys? How can I add the values of those duplicate key and store in a new hash?
By definition, a hash can not have the same hash key in it multiple times. You probably want to store your initial data in a different data structure, such as a two-dimensional array:
use strict;
use warnings;
use Data::Dumper;
my #data = ( [ a => 2 ],
[ b => 6 ],
[ a => 4 ],
[ f => 2 ],
[ b => 1 ],
[ a => 1 ],
);
my %results;
for my $value (#data) {
$results{$value->[0]} += $value->[1];
}
print Dumper %results;
# $VAR1 = 'a';
# $VAR2 = 7;
# $VAR3 = 'b';
# $VAR4 = 7;
# $VAR5 = 'f';
# $VAR6 = 2;
That said, other wrong things:
%hash = { a => 2 , b=>6, a=>4, f=>2, b=>1, a=>1}
You can't do this, it's assigning a hashref ({}) to a hash. Either use %hash = ( ... ) or $hashref = { ... }.
Sonam:
I've reedited your post in order to help format it for reading. Study the Markdown Editing Help Guide and that'll make your posts clearer and easier to understand. Here are a couple of hints:
Indent your code by four spaces. That tells Markdown to leave it alone and don't reformat it.
When you make a list, put astricks with a space in front. Markdown understands it's a bulleted list and formats it as such.
Press "Edit" on your original post, and you can see what changes I made.
Now on to your post. I'm not sure I understand your data. If your data was in a hash, the keys would be unique. You can't have duplicate keys in a hash, so where is your data coming from?
For example, if you're reading it in from a file with two numbers on each line, you could do this:
use autodie;
use strict;
use warnings;
open (my $data_fh, "<", "$fileName");
my %hash;
while (my $line = <$data_fh>) {
chomp $line;
my ($key, $value) = split /\s+/, $line;
$hash{$key}++;
}
foreach my $key (sort keys %hash) {
print "$key appears $hash{$key} times\n";
}
The first three lines are Perl pragmas. They change the way Perl operates:
use autodie: This tells the program to die in certain circumstances such as when you try to open a file that doesn't exist. This way, I didn't have to check to see if the open statement worked or not.
use strict: This makes sure you have to declare your variables before using them which helps eliminate 90% of Perl bugs. You declare a variable most of the time using my. Variables declared with my last in the block where they were declared. That's why my %hash had to be declared before the while block. Otherwise, the variable would become undefined once that loops completes.
use warnings: This has Perl generate warnings in certain conditions. For example, you attempt to print out a variable that has no user set value.
The first loop simply goes line by line through my data and counts the number of occurrences of your key. The second loop prints out the results.

How can I print the contents of a hash in Perl?

I keep printing my hash as # of buckets / # allocated.
How do I print the contents of my hash?
Without using a while loop would be most preferable (for example, a one-liner would be best).
Data::Dumper is your friend.
use Data::Dumper;
my %hash = ('abc' => 123, 'def' => [4,5,6]);
print Dumper(\%hash);
will output
$VAR1 = {
'def' => [
4,
5,
6
],
'abc' => 123
};
Easy:
print "$_ $h{$_}\n" for (keys %h);
Elegant, but actually 30% slower (!):
while (my ($k,$v)=each %h){print "$k $v\n"}
Here how you can print without using Data::Dumper
print "#{[%hash]}";
For debugging purposes I will often use YAML.
use strict;
use warnings;
use YAML;
my %variable = ('abc' => 123, 'def' => [4,5,6]);
print "# %variable\n", Dump \%variable;
Results in:
# %variable
---
abc: 123
def:
- 4
- 5
- 6
Other times I will use Data::Dump. You don't need to set as many variables to get it to output it in a nice format than you do for Data::Dumper.
use Data::Dump = 'dump';
print dump(\%variable), "\n";
{ abc => 123, def => [4, 5, 6] }
More recently I have been using Data::Printer for debugging.
use Data::Printer;
p %variable;
{
abc 123,
def [
[0] 4,
[1] 5,
[2] 6
]
}
( Result can be much more colorful on a terminal )
Unlike the other examples I have shown here, this one is designed explicitly to be for display purposes only. Which shows up more easily if you dump out the structure of a tied variable or that of an object.
use strict;
use warnings;
use MTie::Hash;
use Data::Printer;
my $h = tie my %h, "Tie::StdHash";
#h{'a'..'d'}='A'..'D';
p %h;
print "\n";
p $h;
{
a "A",
b "B",
c "C",
d "D"
} (tied to Tie::StdHash)
Tie::StdHash {
public methods (9) : CLEAR, DELETE, EXISTS, FETCH, FIRSTKEY, NEXTKEY, SCALAR, STORE, TIEHASH
private methods (0)
internals: {
a "A",
b "B",
c "C",
d "D"
}
}
The answer depends on what is in your hash. If you have a simple hash a simple
print map { "$_ $h{$_}\n" } keys %h;
or
print "$_ $h{$_}\n" for keys %h;
will do, but if you have a hash that is populated with references you will something that can walk those references and produce a sensible output. This walking of the references is normally called serialization. There are many modules that implement different styles, some of the more popular ones are:
Data::Dumper
Data::Dump::Streamer
YAML::XS
JSON::XS
XML::Dumper
Due to the fact that Data::Dumper is part of the core Perl library, it is probably the most popular; however, some of the other modules have very good things to offer.
My favorite: Smart::Comments
use Smart::Comments;
# ...
### %hash
That's it.
Looping:
foreach(keys %my_hash) { print "$_ / $my_hash{$_}\n"; }
Functional
map {print "$_ / $my_hash{$_}\n"; } keys %my_hash;
But for sheer elegance, I'd have to choose wrang-wrang's. For my own code, I'd choose my foreach. Or tetro's Dumper use.
I really like to sort the keys in one liner code:
print "$_ => $my_hash{$_}\n" for (sort keys %my_hash);
If you want to be pedantic and keep it to one line (without use statements and shebang), then I'll sort of piggy back off of tetromino's answer and suggest:
print Dumper( { 'abc' => 123, 'def' => [4,5,6] } );
Not doing anything special other than using the anonymous hash to skip the temp variable ;)
The easiest way in my experiences is to just use Dumpvalue.
use Dumpvalue;
...
my %hash = { key => "value", foo => "bar" };
my $dumper = new DumpValue();
$dumper->dumpValue(\%hash);
Works like a charm and you don't have to worry about formatting the hash, as it outputs it like the Perl debugger does (great for debugging). Plus, Dumpvalue is included with the stock set of Perl modules, so you don't have to mess with CPAN if you're behind some kind of draconian proxy (like I am at work).
I append one space for every element of the hash to see it well:
print map {$_ . " "} %h, "\n";

How can I verify that a value is present in an array (list) in Perl?

I have a list of possible values:
#a = qw(foo bar baz);
How do I check in a concise way that a value $val is present or absent in #a?
An obvious implementation is to loop over the list, but I am sure TMTOWTDI.
Thanks to all who answered! The three answers I would like to highlight are:
The accepted answer - the most "built-in" and backward-compatible way.
RET's answer is the cleanest, but only good for Perl 5.10 and later.
draegtun's answer is (possibly) a bit faster, but requires using an additional module. I do not like adding dependencies if I can avoid them, and in this case do not need the performance difference, but if you have a 1,000,000-element list you might want to give this answer a try.
If you have perl 5.10, use the smart-match operator ~~
print "Exist\n" if $var ~~ #array;
It's almost magic.
Perl's bulit in grep() function is designed to do this.
#matches = grep( /^MyItem$/, #someArray );
or you can insert any expression into the matcher
#matches = grep( $_ == $val, #a );
This is answered in perlfaq4's answer to "How can I tell whether a certain element is contained in a list or array?".
To search the perlfaq, you could search through the list of all questions in perlfaq using your favorite browser.
From the command line, you can use the -q switch to perldoc to search for keywords. You would have found your answer by searching for "list":
perldoc -q list
(portions of this answer contributed by Anno Siegel and brian d foy)
Hearing the word "in" is an indication that you probably should have used a hash, not a list or array, to store your data. Hashes are designed to answer this question quickly and efficiently. Arrays aren't.
That being said, there are several ways to approach this. In Perl 5.10 and later, you can use the smart match operator to check that an item is contained in an array or a hash:
use 5.010;
if( $item ~~ #array )
{
say "The array contains $item"
}
if( $item ~~ %hash )
{
say "The hash contains $item"
}
With earlier versions of Perl, you have to do a bit more work. If you are going to make this query many times over arbitrary string values, the fastest way is probably to invert the original array and maintain a hash whose keys are the first array's values:
#blues = qw/azure cerulean teal turquoise lapis-lazuli/;
%is_blue = ();
for (#blues) { $is_blue{$_} = 1 }
Now you can check whether $is_blue{$some_color}. It might have been a good idea to keep the blues all in a hash in the first place.
If the values are all small integers, you could use a simple indexed array. This kind of an array will take up less space:
#primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
#is_tiny_prime = ();
for (#primes) { $is_tiny_prime[$_] = 1 }
# or simply #istiny_prime[#primes] = (1) x #primes;
Now you check whether $is_tiny_prime[$some_number].
If the values in question are integers instead of strings, you can save quite a lot of space by using bit strings instead:
#articles = ( 1..10, 150..2000, 2017 );
undef $read;
for (#articles) { vec($read,$_,1) = 1 }
Now check whether vec($read,$n,1) is true for some $n.
These methods guarantee fast individual tests but require a re-organization of the original list or array. They only pay off if you have to test multiple values against the same array.
If you are testing only once, the standard module List::Util exports the function first for this purpose. It works by stopping once it finds the element. It's written in C for speed, and its Perl equivalent looks like this subroutine:
sub first (&#) {
my $code = shift;
foreach (#_) {
return $_ if &{$code}();
}
undef;
}
If speed is of little concern, the common idiom uses grep in scalar context (which returns the number of items that passed its condition) to traverse the entire list. This does have the benefit of telling you how many matches it found, though.
my $is_there = grep $_ eq $whatever, #array;
If you want to actually extract the matching elements, simply use grep in list context.
my #matches = grep $_ eq $whatever, #array;
Use the first function from List::Util which comes as standard with Perl....
use List::Util qw/first/;
my #a = qw(foo bar baz);
if ( first { $_ eq 'bar' } #a ) { say "Found bar!" }
NB. first returns the first element it finds and so doesn't have to iterate through the complete list (which is what grep will do).
One possible approach is to use List::MoreUtils 'any' function.
use List::MoreUtils qw/any/;
my #array = qw(foo bar baz);
print "Exist\n" if any {($_ eq "foo")} #array;
Update: corrected based on zoul's comment.
Interesting solution, especially for repeated searching:
my %hash;
map { $hash{$_}++ } #a;
print $hash{$val};
$ perl -e '#a = qw(foo bar baz);$val="bar";
if (grep{$_ eq $val} #a) {
print "found"
} else {
print "not found"
}'
found
$val='baq';
not found
If you don't like unnecessary dependency, implement any or first yourself
sub first (&#) {
my $code = shift;
$code->() and return $_ foreach #_;
undef
}
sub any (&#) {
my $code = shift;
$code->() and return 1 foreach #_;
undef
}