Which data structure should I use for a hash without values? - perl

I need to check if a scalar exists in a set of scalars. What is the best way of storing this set of scalars?
Walking through an array would yield linear check time. The check time for a hash would be constant, but it feels inefficient since I wouldn't be using the value part of the hash.

Use a hash, but don't use the values. There really isn't a better way.

The memory overhead of using a hash to test for set membership is minimal, and it is far outweighed by the cost of repeated sequential searches through an array. There are many ways to make a set-membership hash:
my %set = map {$_ => 1} ...;
my %set; $set{$_}++ for ...;
my %set; @set{...} = (1) x num_of_items;
Each of these allows you to use the hash lookup directly in a conditional without any additional syntax.
If your hash is going to be huge, and you are worried about the memory usage, you can store undef as the value for each key. But in that case you will have to use exists $set{...} in your conditionals.
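As a minimal sketch of the undef-valued variant described above (the sample items are made up for illustration), assigning an empty list to a hash slice creates every key with an undef value, and membership is then tested with exists:

```perl
use strict;
use warnings;

# Hypothetical sample data; any list of scalars would do.
my @items = qw(apple banana cherry);

# Assigning an empty list to a hash slice creates each key with undef as its value.
my %set;
@set{@items} = ();

# The values are undef (false), so membership must be tested with exists.
for my $candidate (qw(banana durian)) {
    my $status = exists $set{$candidate} ? 'in the set' : 'not in the set';
    print "$candidate is $status\n";
}
```

Testing `$set{$candidate}` directly would report every item as missing here, because every stored value is undef.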

A hash should do fine. You could use undef for the value and use exists($h{$k}) or you could use 1 and use $h{$k}.
Judy::HS should be a bit more efficient, but there's no value-less version of that structure either.

You may find this section of the FAQ useful:
How can I tell whether a certain element is contained in a list or array?

Iterating through an array could be done:
my @arr = ( $list, $of, $scalars );
push @arr, $any, $other, $ones;
It's expensive to look through, but not that expensive unless you have a massive list:
grep { $_ eq $what_youre_looking_for } @arr;
The hash method also works:
my %hash = ( $list => 1, $of => 1, $scalars => 1 );
$hash{$another} = 1;
if ( exists $hash{$what_youre_looking_for} ) {
...
}
You could implement a binary search and a list sorter, but those are the two most used methods.
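For completeness, the binary search mentioned above could look roughly like this. It is only a sketch: the sub name in_sorted and the sample data are made up, and it assumes the array was sorted with Perl's default string sort.

```perl
use strict;
use warnings;

# Return 1 if $target is in the string-sorted array referenced by $sorted, else 0.
sub in_sorted {
    my ($sorted, $target) = @_;
    my ($low, $high) = (0, $#{$sorted});
    while ($low <= $high) {
        my $mid = int(($low + $high) / 2);
        my $cmp = $target cmp $sorted->[$mid];
        return 1 if $cmp == 0;
        if ($cmp < 0) { $high = $mid - 1 }   # target sorts before the midpoint
        else          { $low  = $mid + 1 }   # target sorts after the midpoint
    }
    return 0;
}

my @sorted = sort qw(pear apple fig cherry);
print in_sorted(\@sorted, 'fig')  ? "found\n" : "missing\n";
print in_sorted(\@sorted, 'kiwi') ? "found\n" : "missing\n";
```

Each lookup takes logarithmic time, but the list has to be kept sorted, which is why a hash is usually the simpler choice.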

HashTable is the best option.
Note: since you said it is a set, I assume there are no duplicate elements.

Related

How to get a single key (random ok) from a very large hash in perl?

Suppose you have a very large hash (lots of keys), and have a function that potentially deletes many of those keys, e.g.:
while ( each %in ) {
push @out, $_;
functionThatDeletesOneOrMoreKeys($_, \%in);
}
I believe each in this case is an efficient way to pull a single key from the hash, but the documentation says each should not be used when deleting keys from the hash.
Otherwise I could use while (%in) { $_ = (keys(%in))[0] .... but that seems horribly inefficient for a very large hash.
Is there a better way to do this?
This seems to be a horrible thing to do, and it would be better if you explained what you were trying to achieve
However the problem with deleting hash elements while iterating using each is that each holds a state value for the hash that depends on the hash remaining unchanged
You can clear that state by calling keys (or values) on the same hash.
Here's an example which deletes the three elements with the given key and those before and after it in an attempt to emulate what your function_that_deletes_one_or_more_keys (which is what it should be called) might do
use strict;
use warnings 'all';
use feature 'say';
my %h = map +( $_ => 1), 0 .. 9;
while ( my $key = each %h ) {
say $key;
delete @h{$key-1, $key, $key+1};
keys %h; # Reset "each" state
}
I recommend that you don't use the global $_ variable for this
output
2
9
4
6
0
General advice without knowing details of the complexity to which you refer:
Change the function lines that delete keys to instead store the keys to be deleted in an array or hash (or file). After the first loop completes, loop through that array or hash (or file) and delete those keys from the first hash.
my @in_keys = keys %in;
for (@in_keys) {
if (exists $in{$_}) {
...
}
}
Or do as Borodin shows, resetting the iterator each time you know you've deleted at least one element.
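The deferred-deletion idea can be sketched as follows. The sub keys_to_delete is a hypothetical stand-in for the real function: instead of deleting, it only reports which keys it would have deleted (here, arbitrarily, the odd ones).

```perl
use strict;
use warnings;

my %in = map { $_ => 1 } 0 .. 9;

# Hypothetical stand-in for the real function: it returns the keys
# it wants removed instead of deleting them itself.
sub keys_to_delete {
    my ($key, $hash) = @_;
    return $key % 2 ? ($key) : ();
}

# First pass: iterate with each and only record what should go.
my %doomed;
while ( defined( my $key = each %in ) ) {
    $doomed{$_} = 1 for keys_to_delete($key, \%in);
}

# The each loop has finished, so deleting is safe now.
delete @in{ keys %doomed };

print join(' ', sort { $a <=> $b } keys %in), "\n";   # prints "0 2 4 6 8"
```

Note that this changes the semantics slightly: keys marked for deletion are still visited by the first loop, which may or may not matter for the real function.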

Comparison to an array of a value [duplicate]

This question already has answers here:
How can I verify that a value is present in an array (list) in Perl?
I'm still feeling my way through Perl, so there's probably a simple way of doing this, but I can't find it. I want to compare a single value, say A or E, to an array that may or may not contain that value, e.g. A B C D, and then perform an action if they match. How should I set this up?
Thanks.
You filter each element of the array to see if it is the element you are looking for and then use the resulting array as a boolean value (not empty = true, empty = false):
@filtered_array = grep { $_ eq 'A' } @array;
if (@filtered_array) {
print "found it!\n";
}
If you store the list in an array, then the only way is to examine each element individually in a loop, using grep, for, or any from List::MoreUtils. (grep is the worst of these, as it searches the entire array even if a match has been found early on.) This is fine if the array is small, but you will hit performance problems if the array is large and you have to check it frequently.
You can speed things up by representing the same list in a hash, when a check for membership is just a single key lookup.
Alternatively, if the list is enormous, then it is best kept in a database, using SQLite.
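To illustrate the point about grep scanning the whole array, here is a minimal sketch using any, which stops at the first match. It assumes a List::Util recent enough to export any (version 1.33 or later); List::MoreUtils exports a function of the same name.

```perl
use strict;
use warnings;
use List::Util qw(any);   # assumes List::Util 1.33 or later

my @array = qw(A B C D);

# any returns as soon as the block is true for some element.
my $has_a = any { $_ eq 'A' } @array;
my $has_e = any { $_ eq 'E' } @array;

print "found A\n" if $has_a;
print "no E\n"    unless $has_e;
```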
Are you stuck on arrays?
Whenever you talk about quickly looking up data in Perl, you should think in terms of hashes. A hash is a collection of data like an array, but it is keyed, and looking up the key is a very fast operation in Perl.
There's nothing that says the keys to your hash can't be your data, and it is very common in Perl to index an array with a hash in order to quickly search for values.
This turns your array @array into a hash called %array_index.
use strict;
use warnings;
use feature qw(say);
use autodie;
my @array = qw(Alpha Beta Delta Gamma Ohm);
my %array_index;
for my $entry ( @array ) {
$array_index{$entry} = 1; # The value doesn't matter, as long as it is true (not blank or zero)
}
Now, looking up whether or not your data is in your array is very quick. Just simply see if it's a key in your %array_index:
my $item = "Delta"; # Is this in my initial array?
if ( $array_index{$item} ) {
say "Yes! Item '$item' is in my array.";
}
else {
say "No. Item '$item' isn't in my array. David sad.";
}
This is so common, that you'll see a lot of programs that use the map command to index the array. Instead of that for loop, I could have done this:
my %array_index = ( map { $_ => 1 } @array );
or
my %array_index;
map { $array_index{$_} = 1 } @array;
You'll see both. The first one is a one-liner. The map command takes each entry in the array and puts it in $_. Then it returns the results as a list. Thus, the map will return a list with your data in the even positions (0, 2, 4, 6...) and a 1 in the odd positions (1, 3, 5...).
The second one is more literal and easier to understand (or about as easy to understand as a map command gets). Again, each item in your @array is being assigned to $_, and that is being used as the key in my %array_index hash.
Whether or not you want to use hashes depends upon the length of your array and how many items of input you'll be searching for. If you're simply checking whether a single item is in your array, I'd probably use List::Util or List::MoreUtils, or use a for loop to search each value of my array. If I am doing this for multiple values, I am better off with a hash.

Nicer way to test if hash entry exists before assigning it

I'm looking for a nicer way to first "test" if a hash key exists before using it. I'm currently writing an event log parser that decodes hex numbers into strings. As I cannot be sure that my decode table contains every hex number, I first need to check if the key exists in a hash before assigning the value to a new variable. So what I'm doing a lot is:
if ($MEL[$i]{type} eq '5024') {
$MEL[$i]{decoded_inline} = $decode_hash{checkpoint}{"$MEL[$i]{raw}[128]"}
if exists ($decode_hash{checkpoint}{"$MEL[$i]{raw}[128]"})
}
What I do not like is that the expression $decode_hash{checkpoint}{"$MEL[$i]{raw}[128]"} is twice in my code. Is there a nicer or shorter version of the line above?
I doubt this qualifies as "nice", but I think it is achieving the goal of not referring to the expression twice. I'm not sure it's worth this pain, mind you:
my $foo = $decode_hash{checkpoint};
my $bar = $MEL[$i]{raw}[128];
if ($MEL[$i]{type} eq '5024') {
$MEL[$i]{decoded_inline} = $foo->{$bar}
if exists ( $foo->{$bar} );
}
Yes, there is an easier way. You know that the nested levels of an array or hash are held as references, right? Well, there's a neat side effect to that. You can take references to deep hash or array slots and then treat them like scalar references. The unfortunate side effect is that it autovivifies the slot, but if you're always going to assign to that slot, and just want to do some checking first, it's not a bad way to keep from typing things over and over, and from repeatedly indexing the structures as well.
my $ref = \$decode_hash{checkpoint}{"$MEL[$i]{raw}[128]"};
unless ( defined( $$ref )) {
...
$$ref = {};
...
}
As long as an existing hash element can't have an undefined value, I would write this
if ($MEL[$i]{type} eq '5024') {
my $value = $decode_hash{checkpoint}{$MEL[$i]{raw}[128]};
$MEL[$i]{decoded_inline} = $value if defined $value;
}
(Note that you shouldn't have the double-quotes around the hash key.)
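The caveat about undefined values can be seen in a small sketch. The decode table below is made up for illustration; the key 'ff' is present but deliberately holds undef, so exists and defined disagree about it.

```perl
use strict;
use warnings;

# Hypothetical decode table: 'ff' exists but its value is undef.
my %decode = ( '0a' => 'start', 'ff' => undef );

for my $code (qw(0a ff 10)) {
    my $exists  = exists  $decode{$code} ? 1 : 0;
    my $defined = defined $decode{$code} ? 1 : 0;
    print "$code: exists=$exists defined=$defined\n";
}
```

If the table can never contain undef values, the two tests are interchangeable and the defined version avoids repeating the long expression.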

Declare and populate a hash table in one step in Perl

Currently when I want to build a look-up table I use:
my $has_field = {};
map { $has_field->{$_} = 1 } @fields;
Is there a way I can do inline initialization in a single step? (i.e. populate it at the same time I'm declaring it?)
Just use your map to create a list then drop into a hash reference like:
my $has_field = { map { $_ => 1 } @fields };
You could use a hash slice, although this doesn't do exactly what you want, as you still have to declare $has_field first:
@{$has_field}{@fields} = (1) x @fields;
The right-hand side uses the x operator to repeat the one-element list (1) once per element of @fields (the array is evaluated in scalar context, giving its element count). Another option in the same vein:
@{$has_field}{@fields} = map {1} @fields;
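As a quick check that the slice form and the single-statement map form build the same lookup table, here is a small sketch. The field names and the variables $by_slice and $by_map are made up for illustration.

```perl
use strict;
use warnings;

my @fields = qw(name email phone);   # hypothetical field names

# Hash slice on a reference: one 1 is assigned per field.
my $by_slice = {};
@{$by_slice}{@fields} = (1) x @fields;

# Single-statement alternative: map inside an anonymous hash constructor.
my $by_map = { map { $_ => 1 } @fields };

# Both lookup tables hold the same keys.
print join(',', sort keys %$by_slice), "\n";
print join(',', sort keys %$by_map), "\n";
```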
Where I've tested it smart match can be 2 to 5 times as fast as creating a lookup hash and testing for the value once. So unless you're going to reuse the hash a good number of times, it's best to do a smart match:
if ( $cand_field ~~ \@fields ) {
do_with_field( $cand_field );
}
It's a good thing to remember that since 5.10, Perl has a native way to ask "is this untested value any of these known values": smart match. (Note that smart match has been flagged as experimental since Perl 5.18.)

In Perl,can we enter two values for the same key in a hash without losing(overwriting) first one?

After I declare a hash in Perl:
%hash1=(a=>"turkey",
b=>"india",
c=>"england",
d=>"usa");
if I assign a new value to an already existing key, like
$hash1{d}="australia";
I lose the previous value for key 'd', i.e. "usa", because when I do
print %hash1;
I don't see the value "usa". How can I retain both values for the same key?
A hash key can only contain a single scalar value, so if that value is a string, you are stuck with one item per key. However, there is nothing stopping you from storing array references (which are also scalars) as the value. To make things easier, you should probably store only array references or strings, and not mix the two:
my %hash1 = (a=>"turkey", b=>"india", c=>"england", d=>"usa");
# upgrade all values to arrays
# $hash1{$_} = [$hash1{$_}] for keys %hash1; # a way with `keys`
$_ = [$_] for values %hash1; # a better way with `values`, thanks to ysth
push @{ $hash1{d} }, 'australia';
print "$_ : @{ $hash1{$_} }\n" for keys %hash1;
As JohnSmith said, use a hash of arrays:
my %hash1 = (
a => ["turkey"],
b => ["india"],
c => ["england"],
d => ["usa"],
);
and use it as:
push @{$hash1{d}}, "australia";
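A hash of arrays does not have to be pre-populated: push creates the array reference the first time a key is seen. A minimal sketch, using a made-up %countries hash:

```perl
use strict;
use warnings;

my %countries;

# push autovivifies the array reference the first time a key is used.
push @{ $countries{d} }, 'usa';
push @{ $countries{d} }, 'australia';
push @{ $countries{a} }, 'turkey';

for my $key (sort keys %countries) {
    print "$key: @{ $countries{$key} }\n";
}
```

Both values stored under d stay available, in insertion order, as $countries{d}[0] and $countries{d}[1].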
You need to store a hash of lists.
Example:
http://www.perlmonks.org/?node_id=1977
This question is precisely equivalent to asking whether if we first assign a variable that can hold only one value some particular value, but then later assign that same variable a different value, whether we can ever access the earlier value that we've just now overwritten.
The same answer applies to both: no, of course not, not without changing around your storage class, access mechanism, or both. One means one. When you have come up with a mechanism that works for a simple unsubscripted scalar variable, you will have done so for an entire class of problem.