Comparison to an array of a value [duplicate] - perl

I'm still feeling my way through Perl, so there's probably a simple way of doing this, but I can't find it. I want to compare a single value, say A or E, to an array that may or may not contain that value, e.g. A B C D, and then perform an action if they match. How should I set this up?
Thanks.

You filter each element of the array to see if it is the element you are looking for and then use the resulting array as a boolean value (not empty = true, empty = false):
@filtered_array = grep { $_ eq 'A' } @array;
if (@filtered_array) {
    print "found it!\n";
}

If you store the list in an array, then the only way is to examine each element individually in a loop, using grep, for, or any from List::MoreUtils. (grep is the worst of these, as it searches the entire array even if a match has been found early on.) This is fine if the array is small, but you will hit performance problems if the array is large and you have to check it frequently.
You can speed things up by representing the same list in a hash, where a check for membership is just a single key lookup.
Alternatively, if the list is enormous, then it is best kept in a database, using SQLite.
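The first two options above can be sketched as follows. This is a minimal illustration, not the answerer's code: the sample data and variable names are assumptions, and it uses any from the core List::Util module (version 1.33 or later) rather than List::MoreUtils.

```perl
use strict;
use warnings;
use List::Util qw(any);    # any() ships with List::Util 1.33 and later

my @array = qw(A B C D);   # sample data, assumed for illustration

# any() stops at the first match, unlike grep, which scans the whole list.
my $found_a = any { $_ eq 'A' } @array;
my $found_e = any { $_ eq 'E' } @array;

# For repeated checks, build a lookup hash once and test its keys.
my %in_array = map { $_ => 1 } @array;

print "A: ", ( $found_a ? "found" : "missing" ), "\n";
print "E: ", ( $found_e ? "found" : "missing" ), "\n";
print "C is in the hash\n" if exists $in_array{C};
```

The hash costs one pass over the array to build, so it only pays off when the same list is checked more than once.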

Are you stuck on arrays?
Whenever in Perl you're talking about quickly looking up data, you should think in terms of hashes. A hash is a collection of data like an array, but it is keyed, and looking up the key is a very fast operation in Perl.
There's nothing that says the keys to your hash can't be your data, and it is very common in Perl to index an array with a hash in order to quickly search for values.
This turns your array @array into a hash called %array_index.
use strict;
use warnings;
use feature qw(say);
use autodie;
my @array = qw(Alpha Beta Delta Gamma Ohm);
my %array_index;
for my $entry ( @array ) {
    $array_index{$entry} = 1; # Doesn't matter. As long as it isn't blank or zero
}
Now, looking up whether or not your data is in your array is very quick. Just simply see if it's a key in your %array_index:
my $item = "Delta"; # Is this in my initial array?
if ( $array_index{$item} ) {
    say "Yes! Item '$item' is in my array.";
}
else {
    say "No. Item '$item' isn't in my array. David sad.";
}
This is so common, that you'll see a lot of programs that use the map command to index the array. Instead of that for loop, I could have done this:
my %array_index = ( map { $_ => 1 } @array );
or
my %array_index;
map { $array_index{$_} = 1 } @array;
You'll see both. The first one is a one-liner. The map command takes each entry in the array and puts it in $_. Then it returns the results as a list. Thus, the map will return a list with your data in the even positions (0, 2, 4, 6...) and a 1 in the odd positions (1, 3, 5...).
The second one is more literal and easier to understand (or about as easy to understand as a map command gets). Again, each item in your @array is being assigned to $_, and that is being used as the key in my %array_index hash.
Whether or not you want to use hashes depends upon the length of your array and how many items of input you'll be searching for. If you're simply checking whether a single item is in your array, I'd probably use List::Util or List::MoreUtils, or use a for loop to search each value of my array. If I am doing this for multiple values, I am better off with a hash.

Related

What's the most efficient way to check multiple hash references in perl

I have a multidimensional data structure for tracking different characteristics of files I am comparing and merging data for. The structure is set up as such:
$cumulative{$slice} = {
DATA => $data,
META => $got_meta,
RECOVER => $recover,
DISPO => $dispo,
DIR => $dir,
};
All of the keys, save DIR (which is just a simple string), are references to hashes, or arrays. I would like to have a simple search for KEYS that match "BASE" for the value DIR points to for each of the $slice keys. My initial thought was to use grep, but I'm not sure how to do that. I thought something like this would be ok:
my (@base_slices) = grep { $cumulative{$_}->{DIR} eq "BASE" } @{$cumulative{$_}};
I was wrong. Is there a way to do this without a loop, or is that pretty much the only way to check those values? Thanks!
Edit: Thanks to Ikegami for answering succinctly, even without my fully representing the outcome of the search. I have changed the question a little bit to more clearly explain the issue I was having.
This is wrong:
@{$cumulative{$slice}}
It gets the value of the array referenced by $cumulative{$slice}. But $cumulative{$slice} is not a reference to an array; it's a reference to a hash. This expression makes no sense, and results in the error
Not an ARRAY reference
What would be correct? Well, it's not quite clear what you want.
Maybe you want the keys of the elements of %cumulative whose DIR attribute equals BASE.
my @matching_keys =                                  # 3. Save the results.
    grep { $cumulative{ $_ }->{ DIR } eq "BASE" }    # 2. Filter them.
        keys( %cumulative );                         # 1. Get the keys.
(The -> is optional between indexes, so $cumulative{ $_ }{ DIR } is also fine.)
Maybe you don't need the keys. Maybe you want the values of the elements of %cumulative whose DIR attribute equals BASE.
my @matching_values =                                # 3. Save the results.
    grep { $_->{ DIR } eq "BASE" }                   # 2. Filter them.
        values( %cumulative );                       # 1. Get the values.
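To see both filters run end to end, here is a small self-contained sketch. The slice names and the contents of %cumulative are hypothetical sample data; only the DIR key matters for the search, and the keys are sorted purely to make the output order predictable.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real structure also holds META, RECOVER, DISPO.
my %cumulative = (
    slice1 => { DIR => 'BASE',  DATA => {} },
    slice2 => { DIR => 'OTHER', DATA => {} },
    slice3 => { DIR => 'BASE',  DATA => {} },
);

# Keys of the elements whose DIR is BASE (sorted for stable output).
my @matching_keys = grep { $cumulative{$_}{DIR} eq 'BASE' } sort keys %cumulative;

# The inner hash references themselves, when the keys are not needed.
my @matching_values = grep { $_->{DIR} eq 'BASE' } values %cumulative;

print "Keys: @matching_keys\n";
print "Number of matching values: ", scalar(@matching_values), "\n";
```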
This was posted for the initial form of the question, before the edit, and reflects what I did and/or did not understand in that formulation.
The use of @{$cumulative{$_}}, with $_ presumably standing for $slice, indicates that the value for key $slice is expected to be an arrayref. However, the question shows there to be a hashref. This is either an error or the question misrepresents the problem.
If the expression in grep accurately represents the problem, for values of $slice that are given or can be built at will, then just feed that list of $slice values to the shown grep:
my @base_slices = grep { $cumulative{$_}{DIR} eq 'BASE' } @slice_vals;
or
my @base_slices =
    grep { $cumulative{$_}{DIR} eq 'BASE' }
    map { generate_list_of_slice_values($_) }
    LIST-OF-INPUTS;
That generate_list_of_slice_values() stands for whatever way the values for $slice get acquired dynamically from some input.†
There is no need for a dereferencing arrow for the key DIR (a syntax convenience), and no need for parentheses around @base_slices, since having an array already provides the needed list context.
Please clarify what $slice is meant to be and I'll update.
† The code in map's block gets elements of LIST-OF-INPUTS one at a time (as $_) and whatever it evaluates with each is joined into its return list. That is passed to grep for filtering: elements of its input list are provided to the code in the block one at a time as $_ and those for which the code evaluates to "true" (in Perl's sense) pass, forming the grep's return list.

Declare and populate a hash table in one step in Perl

Currently when I want to build a look-up table I use:
my $has_field = {};
map { $has_field->{$_} = 1 } @fields;
Is there a way I can do inline initialization in a single step? (i.e. populate it at the same time I'm declaring it?)
Just use your map to create a list then drop into a hash reference like:
my $has_field = { map { $_ => 1 } @fields };
Update: sorry, this doesn't do what you want exactly, as you still have to declare $has_field first.
You could use a hash slice:
@{$has_field}{@fields} = (1) x @fields;
The right-hand side uses the x operator to repeat the one-element list (1) as many times as there are elements in @fields (the right operand of x is evaluated in scalar context, i.e. the number of elements in your array). Another option in the same vein:
@{$has_field}{@fields} = map {1} @fields;
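The two approaches can be compared side by side in a short runnable sketch. The field names are assumed sample data, and $has_field_ref is a hypothetical name chosen to keep the plain hash and the hash reference apart.

```perl
use strict;
use warnings;

my @fields = qw(name email phone);   # sample field names, assumed

# Declare and populate in one statement with map.
my %has_field = map { $_ => 1 } @fields;

# Hash slice on a hash reference that was declared beforehand.
my $has_field_ref = {};
@{$has_field_ref}{@fields} = (1) x @fields;

print join( ",", sort keys %has_field ), "\n";
print "email is a known field\n" if $has_field_ref->{email};
```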
Where I've tested it, smart match can be 2 to 5 times as fast as creating a lookup hash and testing for the value once. So unless you're going to reuse the hash a good number of times, it's best to do a smart match:
if ( $cand_field ~~ \@fields ) {
    do_with_field( $cand_field );
}
It's a good thing to remember that since 5.10, Perl has a native way to ask "is this untested value any of these known values": smart match.
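Smart match was marked experimental in Perl 5.18 and later deprecated, so on newer Perls the same one-off question is often asked with any from the core List::Util module instead. The sketch below is an alternative technique, not the answerer's code, and the field names are assumed sample data.

```perl
use strict;
use warnings;
use List::Util qw(any);   # any() ships with List::Util 1.33 and later

my @fields     = qw(name email phone);   # assumed sample data
my $cand_field = 'email';

# Stops at the first matching element; no lookup hash is built.
my $is_known = any { $_ eq $cand_field } @fields;

print "'$cand_field' is a known field\n" if $is_known;
```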

Perl: Beginner. Which data structure should I use?

Okay, not sure where to ask this, but I'm a beginner programmer, using Perl. I need to create an array of an array, but I'm not sure if it would be better use array/hash references, or array of hashes or hash of arrays etc.
I need an array of matches: @totalmatches
Each match contains 6 elements (strings):
@matches = ($chapternumber, $sentencenumber, $sentence, $grammar_relation, $argument1, $argument2)
I need to push each of these elements into the @matches array/hash/reference, and then push that array/hash/reference into the @totalmatches array.
The matches are found based on searching a file and selecting the strings based on meeting the criteria.
QUESTIONS
Which data structure would you use?
Can you push an array into another array, as you would push an element into an array? Is this an efficient method?
Can you push all 6 elements simultaneously, or have to do 6 separate pushes?
When working with 2-D, to loop through would you use:
foreach (@totalmatches) {
    foreach (@matches) {
        ...
    }
}
Thanks for any advice.
Which data structure would you use?
An array for an ordered set of things. A hash for a set of named things.
Can you push an array into another array, as you would push an element into an array? Is this an efficient method?
If you try to push an array (1) into an array (2), you'll end up pushing all the elements of 1 into 2. That is why you would push an array ref in instead.
Can you push all 6 elements simultaneously, or have to do 6 separate pushes?
Look at perldoc -f push
push ARRAY,LIST
You can push a list of things in.
When working with 2-D, to loop through would you use:
Nested foreach is fine, but that syntax wouldn't work. You have to access the values you are dealing with.
for my $arrayref (@outer) {
    for my $item (@$arrayref) {
        print "$item\n";   # work with $item here
    }
}
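Putting these answers together, an array of array references could look like the sketch below. The six strings in each match are invented sample values; each push adds one reference, so all six elements go in at once.

```perl
use strict;
use warnings;

my @totalmatches;

# Each match is an anonymous array reference holding six strings (sample values).
push @totalmatches, [ 1, 4, 'The cat sat.', 'nsubj', 'sat',  'cat'  ];
push @totalmatches, [ 2, 7, 'Dogs bark.',   'nsubj', 'bark', 'Dogs' ];

for my $match (@totalmatches) {      # $match is an array reference
    for my $item (@$match) {         # dereference to walk the six elements
        print "$item | ";
    }
    print "\n";
}
```

A single element can also be reached directly, for example $totalmatches[1][5].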
Do not push one array into another array.
Lists just join with each other into a new list.
Use list of references.
# create an anonymous hash ref for each match
$one_match_ref = {
    chapternumber    => $chapternumber_value,
    sentencenumber   => $sentencenumber_value,
    sentence         => $sentence_value,
    grammar_relation => $grammar_relation_value,
    arg1             => $argument1,
    arg2             => $argument2
};
# add the reference of the match to the array.
push @all_matches, $one_match_ref;
# list of keys of interest
@keys = qw(chapternumber sentencenumber sentence grammar_relation arg1 arg2);
# walk through all the matches.
foreach $ref (@all_matches) {
    foreach $key (@keys) {
        $val = $$ref{$key};
    }
    # or pick up some specific keys
    my $arg1 = $$ref{arg1};
}
Which data structure would you use?
An array... I can't really justify that choice, but I can't imagine what you would use as keys if you used a hash.
Can you push an array into another array, as you would push an element into an array? Is this an efficient method?
Here's the thing: in Perl, arrays can only contain scalar values, the kind held by variables that start with $. Something like...
@matrix = ();
@row = ();
$matrix[0] = @row; # FAIL! This stores the number of elements in @row
... won't work. You will have to use a reference to the array instead:
@matrix = ();
@row = ();
$matrix[0] = \@row;
Or equally:
push(@matrix, \@row);
Can you push all 6 elements simultaneously, or have to do 6 separate pushes?
If you use references, you need only push once... and since you don't want to concatenate arrays (you need an array of arrays) you're stuck with no alternatives ;)
When working with 2-D, to loop through would you use:
I'd use something like:
for ($i = 0; $i < @matrix; $i++) {
    @row = @{$matrix[$i]}; # de-reference
    for ($j = 0; $j < @row; $j++) {
        print "| $row[$j] ";
    }
    print "|\n";
}
Which data structure would you use?
Some fundamental container properties:
An array is a container for ordered scalars.
A hash is a container for scalars obtained by a unique key (there can be no duplicate keys in the hash). The order of values added later is not available anymore.
I would use the same structure that ZhangChn proposed.
Use a hash for each match.
The details of the match then can be accessed by descriptive names instead of plain numerical indices. i.e. $ref->{'chapternumber'} instead of $matches[0].
Take references of these anonymous hashes (which are scalars) and push them into an array in order to preserve the order of the matches.
To dereference items from the data structure
get an item from the array which is a hash reference
retrieve any matching detail you need from the hash reference
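The two dereferencing steps can be sketched as below. The match data is invented for illustration and only a few of the six fields are shown.

```perl
use strict;
use warnings;

# An array of anonymous hash references (sample values, assumed).
my @all_matches = (
    { chapternumber => 1, sentencenumber => 4, grammar_relation => 'nsubj' },
    { chapternumber => 2, sentencenumber => 7, grammar_relation => 'dobj'  },
);

my $ref     = $all_matches[1];          # step 1: get a hash reference from the array
my $chapter = $ref->{chapternumber};    # step 2: get a detail from that hash

# Both steps in one expression; the arrow between subscripts is optional.
my $relation = $all_matches[0]{grammar_relation};

print "chapter $chapter, relation $relation\n";
```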

Which data structure should I use for a hash without values?

I need to check if a scalar exists in a set of scalars. What is the best way of storing this set of scalars?
Walking through an array would yield linear check time. The check time for a hash would be constant, but it feels inefficient since I wouldn't be using the value part of the hash.
Use a hash, but don't use the values. There really isn't a better way.
The memory overhead of using a hash to test for set membership is minimal, and it is greatly outweighed by the cost of repeated sequential searches through an array. There are many ways to make a set-membership hash:
my %set = map {$_ => 1} ...;
my %set; $set{$_}++ for ...;
my %set; @set{...} = (1) x num_of_items;
Each of these allows you to use the hash lookup directly in a conditional without any additional syntax.
If your hash is going to be huge, and you are worried about the memory usage, you can store undef as the value for each key. But in that case you will have to use exists $set{...} in your conditionals.
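A minimal sketch of the undef-valued variant follows, with assumed sample data. Assigning an empty list to a hash slice creates every key with undef as its value, which is why the membership test has to use exists rather than a plain truth test.

```perl
use strict;
use warnings;

my @items = qw(apple pear plum);   # sample data, assumed

my %set;
@set{@items} = ();   # every key is created with undef as its value

my $has_pear = exists $set{pear} ? 1 : 0;
my $has_kiwi = exists $set{kiwi} ? 1 : 0;
my $truthy   = $set{pear}        ? 1 : 0;   # the value is undef, so this is 0

print "pear: $has_pear, kiwi: $has_kiwi, truth test on pear: $truthy\n";
```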
A hash should do fine. You could use undef for the value and use exists($h{$k}) or you could use 1 and use $h{$k}.
Judy::HS should be a bit more efficient, but there's no value-less version of that structure either.
You may find this section of the FAQ useful:
How can I tell whether a certain element is contained in a list or array?
Using an array could be done like this:
my @arr = ( $list, $of, $scalars );
push @arr, $any, $other, $ones;
It's expensive to look through, but not that expensive unless you have a massive list:
grep { $_ eq $what_youre_looking_for } @arr;
The hash method also works:
my %hash = ( $list => 1, $of => 1, $scalars => 1 );
$hash{$another} = 1;
if ( exists $hash{$what_youre_looking_for} ) {
...
}
You could implement a binary search and a list sorter, but those are the two most used methods.
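For completeness, the binary search mentioned above might look like the sketch below. The function name binary_search and the sample words are hypothetical, and the approach assumes the list is kept sorted with the same string comparison (cmp) that the search uses.

```perl
use strict;
use warnings;

# Returns 1 if $target is in the sorted array referenced by $sorted, else 0.
sub binary_search {
    my ( $sorted, $target ) = @_;
    my ( $low, $high ) = ( 0, $#{$sorted} );
    while ( $low <= $high ) {
        my $mid = int( ( $low + $high ) / 2 );
        my $cmp = $target cmp $sorted->[$mid];
        return 1 if $cmp == 0;
        if ( $cmp < 0 ) { $high = $mid - 1 }   # target sorts before the midpoint
        else            { $low  = $mid + 1 }   # target sorts after the midpoint
    }
    return 0;
}

my @sorted = sort qw(pear apple plum fig);   # default sort uses string order, like cmp

my $found_plum = binary_search( \@sorted, 'plum' );
my $found_kiwi = binary_search( \@sorted, 'kiwi' );
print "plum: $found_plum, kiwi: $found_kiwi\n";
```

Each lookup takes a logarithmic number of comparisons, but the list has to be re-sorted whenever it changes, which is one reason a hash is usually simpler.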
A hash table is the best option.
Note: since you said it is a set, I assume there are no duplicate elements.

how to grep perl Hash Keys in to an array?

I am a Perl newbie and need help in understanding the below piece of code.
I have a perl Hash defined like this
1 my %myFavourite = ("Apple"=>"Apple");
2 my @fruits = ("Apple", "Orange", "Grape");
3 @myFavourite{@fruits}; # This returns Apple. But how?
It would be great if perl gurus could explain what's going on in Line-3 of the above code.
myFavourite is declared as a hash, but used as an array? And the statement simply takes the keys of the hash, greps them in the array, and returns the hash values corresponding to the keys searched. Is this the way we grep hash keys in an array?
It doesn't return Apple. It evaluates to a hash slice consisting of all of the values in the hash corresponding to the keys in @fruits. Notice if you turn on warnings that you get 2 warnings about uninitialized values. This is because myFavourite does not contain values for the keys Orange and Grape. Look up 'hash slice' in perldata.
Essentially, @myFavourite{@fruits} is shorthand for ($myFavourite{Apple}, $myFavourite{Orange}, $myFavourite{Grape}), which in this case is ($myFavourite{Apple},undef,undef). If you print it, the only output you see is Apple.
myFavourite is declared as a hash, but used as an array?
Yes, and it returns a list. It's a hash slice. See: http://perldoc.perl.org/perldata.html
Think of it as an expansion of array @fruits into multiple hash key lookups.
The @hash{@keys} syntax is just a handy way of extracting portions of the hash.
Specifically:
@myFavourite{@fruits}
is equivalent to:
($myFavourite{'Apple'},$myFavourite{'Orange'},$myFavourite{'Grape'})
which yields a three-item list in list context (print, for example, outputs the three elements one after another); in scalar context a slice evaluates to its last element.
my @slice_values = @myFavourite{@fruits};
# @slice_values now contains ('Apple',undef,undef)
# which is functionally equivalent to:
my @slice_values = map { $myFavourite{$_} } @fruits;
If you want to extract only the values whose keys exist in the hash, do:
my @favourite_fruits = @myFavourite{ grep { exists $myFavourite{$_} } @fruits };
# @favourite_fruits now contains ('Apple')
If you:
use warnings;
you'll see the interpreter's warnings about the two uninitialized (undef) values.
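The slice behaviour described in these answers can be checked with a short script. It reuses the question's data; the final print is an addition that shows reading a slice does not add the missing keys to the hash.

```perl
use strict;
use warnings;

my %myFavourite = ( Apple => 'Apple' );
my @fruits      = ( 'Apple', 'Orange', 'Grape' );

# One value per key in @fruits; missing keys yield undef.
my @slice_values = @myFavourite{@fruits};

# Only the values whose keys actually exist in the hash.
my @favourite_fruits = @myFavourite{ grep { exists $myFavourite{$_} } @fruits };

# Replace undef with a visible marker so printing does not warn.
print join( ", ", map { defined $_ ? $_ : 'undef' } @slice_values ), "\n";
print "Keys still in the hash: ", join( ", ", keys %myFavourite ), "\n";
```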