Perl searching for string contained in array - perl

I have an array with the following values:
push #fruitArray, "apple|0";
push #fruitArray, "apple|1";
push #fruitArray, "pear|0";
push #fruitArray, "pear|0";
I want to find out if the string "apple" exists in this array (ignoring the "|0" "|1")
I am using:
$fruit = 'apple';
if( $fruit ~~ #fruitArray ){ print "I found apple"; }
Which isn't working.

Don't use smart matching. It never worked properly for a number of reasons and it is now marked as experimental
In this case you can use grep instead, together with an appropriate regex pattern
This program tests every element of #fruitArray to see if it starts with the letters in $fruit followed by a pipe character |. grep returns the number of elements that matched the pattern, which is a true value if at least one matched
my #fruitArray = qw/ apple|0 apple|1 pear|0 pear|0 /;
my $fruit = 'apple';
print "I found $fruit\n" if grep /^$fruit\|/, #fruitArray;
output
I found apple

I - like #Borodin suggests, too - would simply use grep():
$fruit = 'apple';
if (grep(/^\Q$fruit\E\|/, #fruitArray)) { print "I found apple"; }
which outputs:
I found apple
\Q...\E converts your string into a regex pattern.
Looking for the | prevents finding a fruit whose name starts with the name of the fruit for which you are looking.
Simple and effective... :-)
Update: to remove elements from array:
$fruit = 'apple';
#fruitsArrayWithoutApples = grep ! /^\Q$fruit\E|/, #fruitArray;

If your Perl is not ancient, you can use the first subroutine from the List::Util module (which became a core module at Perl 5.8) to do the check efficiently:
use List::Util qw{ first };
my $first_fruit = first { /\Q$fruit\E/ } #fruitArray;
if ( defined $first_fruit ) { print "I found $fruit\n"; }

Don't use grep, that will loop the entire array, even if it finds what you are looking for in the first index, so it is inefficient.
this will return true if it finds the substring 'apple', then return and not finish iterating through the rest of the array
#takes a reference to the array as the first parameter
sub find_apple{
#array_input = #{$_[0]};
foreach $fruit (#array_input){
if (index($fruit, 'apple') != -1){
return 1;
}
}
}

You can get close to the smartmatch sun without melting your wings by using match::simple:
use match::simple;
my #fruits = qw/apple|0 apple|1 pear|0 pear|0/;
$fruit = qr/apple/ ;
say "found $fruit" if $fruit |M| \#fruits ;
There's also a match() function if the infix [M] doesn't read well.
I like the way match::simple does almost everything I expected from ~~ without any surprising complexity. If you're fluent in perl it probably isn't something you'd see as necessary, but - especially with match() - code can be made pleasantly readable ... at the cost of imposing the use of references, etc.

Related

How can I determine if an element exists in an array (perl)

I'm looping through an array, and I want to test if an element is found in another array.
In pseudo-code, what I'm trying to do is this:
foreach $term (#array1) {
if ($term is found in #array2) {
#do something here
}
}
I've got the "foreach" and the "do something here" parts down-pat ... but everything I've tried for the "if term is found in array" test does NOT work ...
I've tried grep:
if grep {/$term/} #array2 { #do something }
# this test always succeeds for values of $term that ARE NOT in #array2
if (grep(/$term/, #array2)) { #do something }
# this test likewise succeeds for values NOT IN the array
I've tried a couple different flavors of "converting the array to a hash" which many previous posts have indicated are so simple and easy ... and none of them have worked.
I am a long-time low-level user of perl, I understand just the basics of perl, do not understand all the fancy obfuscated code that comprises 99% of the solutions I read on the interwebs ... I would really, truly, honestly appreciate any answers that are explicit in the code and provide a step-by-step explanation of what the code is doing ...
... I seriously don't grok $_ and any other kind or type of hidden, understood, or implied value, variable, or function. I would really appreciate it if any examples or samples have all variables and functions named with clear terms ($term as opposed to $_) ... and describe with comments what the code is doing so I, in all my mentally deficient glory, may hope to possibly understand it some day. Please. :-)
...
I have an existing script which uses 'grep' somewhat succesfully:
$rc=grep(/$term/, #array);
if ($rc eq 0) { #something happens here }
but I applied that EXACT same code to my new script and it simply does NOT succeed properly ... i.e., it "succeeds" (rc = zero) when it tests a value of $term that I know is NOT present in the array being tested. I just don't get it.
The ONLY difference in my 'grep' approach between 'old' script and 'new' script is how I built the array ... in old script, I built array by reading in from a file:
#array=`cat file`;
whereas in new script I put the array inside the script itself (coz it's small) ... like this:
#array=("element1","element2","element3","element4");
How can that result in different output of the grep function? They're both bog-standard arrays! I don't get it!!!! :-(
########################################################################
addendum ... some clarifications or examples of my actual code:
########################################################################
The term I'm trying to match/find/grep is a word element, for example "word123".
This exercise was just intended to be a quick-n-dirty script to find some important info from a file full of junk, so I skip all the niceties (use strict, warnings, modules, subroutines) by choice ... this doesn't have to be elegant, just simple.
The term I'm searching for is stored in a variable which is instantiated via split:
foreach $line(#array1) {
chomp($line); # habit
# every line has multiple elements that I want to capture
($term1,$term2,$term3,$term4)=split(/\t/,$line);
# if a particular one of those terms is found in my other array 'array2'
if (grep(/$term2/, #array2) {
# then I'm storing a different element from the line into a 3rd array which eventually will be outputted
push(#known, $term1) unless $seen{$term1}++;
}
}
see that grep up there? It ain't workin right ... it is succeeding for all values of $term2 even if it is definitely NOT in array2 ... array1 is a file of a couple thousand lines. The element I'm calling $term2 here is a discrete term that may be in multiple lines, but is never repeated (or part of a larger string) within any given line. Array2 is about a couple dozen elements that I need to "filter in" for my output.
...
I just tried one of the below suggestions:
if (grep $_ eq $term2, #array2)
And this grep failed for all values of $term2 ... I'm getting an all or nothing response from grep ... so I guess I need to stop using grep. Try one of those hash solutions ... but I really could use more explanation and clarification on those.
This is in perlfaq. A quick way to do it is
my %seen;
$seen{$_}++ for #array1;
for my $item (#array2) {
if ($seen{$item}) {
# item is in array2, do something
}
}
If letter case is not important, you can set the keys with $seen{ lc($_) } and check with if ($seen{ lc($item) }).
ETA:
With the changed question: If the task is to match single words in #array2 against whole lines in #array1, the task is more complicated. Trying to split the lines and match against hash keys will likely be unsafe, because of punctuation and other such things. So, a regex solution will likely be the safest.
Unless #array2 is very large, you might do something like this:
my $rx = join "|", #array2;
for my $line (#array1) {
if ($line =~ /\b$rx\b/) { # use word boundary to avoid partial matches
# do something
}
}
If #array2 contains meta characters, such as *?+|, you have to make sure they are escaped, in which case you'd do something like:
my $rx = join "|", map quotemeta, #array2;
# etc
You could use the (infamous) "smart match" operator, provided you are on 5.10 or later:
#!/usr/bin/perl
use strict;
use warnings;
my #array1 = qw/a b c d e f g h/;
my #array2 = qw/a c e g z/;
print "a in \#array1\n" if 'a' ~~ #array1;
print "z in \#array1\n" if 'z' ~~ #array1;
print "z in \#array2\n" if 'z' ~~ #array2;
The example is very simple, but you can use an RE if you need to as well.
I should add that not everyone likes ~~ because there are some ambiguities and, um, "undocumented features". Should be OK for this though.
This should work.
#!/usr/bin/perl
use strict;
use warnings;
my #array1 = qw/a b c d e f g h/;
my #array2 = qw/a c e g z/;
for my $term (#array1) {
if (grep $_ eq $term, #array2) {
print "$term found.\n";
}
}
Output:
a found.
c found.
e found.
g found.
#!/usr/bin/perl
#ar = ( '1','2','3','4','5','6','10' );
#arr = ( '1','2','3','4','5','6','7','8','9' ) ;
foreach $var ( #arr ){
print "$var not found\n " if ( ! ( grep /$var/, #ar )) ;
}
Pattern matching is the most efficient way of matching elements. This would do the trick. Cheers!
print "$element found in the array\n" if ("#array" =~ m/$element/);
Your 'actual code' shouldn't even compile:
if (grep(/$term2/, #array2) {
should be:
if (grep (/$term2/, #array2)) {
You have unbalanced parentheses in your code. You may also find it easier to use grep with a callback (code reference) that operates on its arguments (the array.) It helps keep the parenthesis from blurring together. This is optional, though. It would be:
if (grep {/$term2/} #array2) {
You may want to use strict; and use warnings; to catch issues like this.
The example below might be helpful, it tries to see if any element in #array_sp is present in #my_array:
#! /usr/bin/perl -w
#my_array = qw(20001 20003);
#array_sp = qw(20001 20002 20004);
print "#array_sp\n";
foreach $case(#my_array){
if("#array_sp" =~ m/$case/){
print "My God!\n";
}
}
use pattern matching can solve this. Hope it helps
-QC
1. grep with eq , then
if (grep {$_ eq $term2} #array2) {
print "$term2 exists in the array";
}
2. grep with regex , then
if (grep {/$term2/} #array2) {
print "element with pattern $term2 exists in the array";
}

Grep to find item in Perl array

Every time I input something the code always tells me that it exists. But I know some of the inputs do not exist. What is wrong?
#!/usr/bin/perl
#array = <>;
print "Enter the word you what to match\n";
chomp($match = <STDIN>);
if (grep($match, #array)) {
print "found it\n";
}
The first arg that you give to grep needs to evaluate as true or false to indicate whether there was a match. So it should be:
# note that grep returns a list, so $matched needs to be in brackets to get the
# actual value, otherwise $matched will just contain the number of matches
if (my ($matched) = grep $_ eq $match, #array) {
print "found it: $matched\n";
}
If you need to match on a lot of different values, it might also be worth for you to consider putting the array data into a hash, since hashes allow you to do this efficiently without having to iterate through the list.
# convert array to a hash with the array elements as the hash keys and the values are simply 1
my %hash = map {$_ => 1} #array;
# check if the hash contains $match
if (defined $hash{$match}) {
print "found it\n";
}
You seem to be using grep() like the Unix grep utility, which is wrong.
Perl's grep() in scalar context evaluates the expression for each element of a list and returns the number of times the expression was true.
So when $match contains any "true" value, grep($match, #array) in scalar context will always return the number of elements in #array.
Instead, try using the pattern matching operator:
if (grep /$match/, #array) {
print "found it\n";
}
This could be done using List::Util's first function:
use List::Util qw/first/;
my #array = qw/foo bar baz/;
print first { $_ eq 'bar' } #array;
Other functions from List::Util like max, min, sum also may be useful for you
In addition to what eugene and stevenl posted, you might encounter problems with using both <> and <STDIN> in one script: <> iterates through (=concatenating) all files given as command line arguments.
However, should a user ever forget to specify a file on the command line, it will read from STDIN, and your code will wait forever on input
I could happen that if your array contains the string "hello", and if you are searching for "he", grep returns true, although, "he" may not be an array element.
Perhaps,
if (grep(/^$match$/, #array)) more apt.
You can also check single value in multiple arrays like,
if (grep /$match/, #array, #array_one, #array_two, #array_Three)
{
print "found it\n";
}

Simple multi-dimensional array with loop in perl

I'm trying to use an array and a loop to print out the following (basically for each letter of the alphabet, print each letter of the alphabet after it and then move on to the next letter). I'm new to perl, anyone have any quick words of :
aa
ab
ac
ad
...
ba
bb
bc
bd
...
ca
cb
...
Currently I have this, but it only prints a single character alphabet...
#arr = ("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z");
$i = #arr;
while ($i)
{
print $arr[$i];
$i--;
}
Using the range operator and the ranges you want to target:
use strict;
use warnings;
my #elements = ("aa" .. "zz");
for my $combo (#elements)
{
print "$combo\n";
}
You can utilize the initial 2 letters till the ending 2 letters you want as ending and the for will take care of everything.
This really isn't multi-dimensional array work, if it were you'd be working with stuff like:
my #foo = (
[1,2,3],
[4,7,8,1,2,3],
[2,3],
);
This is really a very basic how do I make a nested loop that iterates over the same array. I'll bet this is homework.
So, I'll let you figure out the nesting bits, but give some help with Perl's loop operators.
!! for/foreach
for (the each is optional) is the real heavy hitter for looping in perl. Use it like so:
for my $var ( #array ) {
#do stuff with $var
}
Each element in #array will be aliased to the $var variable, and the block of code will be executed. The fact that we are aliasing, rather than copying means that if alter the value of $var, #array will be changed as well. The stuff between the parenthesis may be any expression. The expression will be evaluated in list context. So if you put a file handle in the parens, the entire file will be read into memory and processed.
You can also leave off naming the loop variable, and $_ will be used instead. In general, DO NOT DO THIS.
!! C-Style for
Every once in a while you need to keep track of indexes as you loop over an array. This is when a C style for loop comes in handy.
for( my $i=0; $i<#array; $i++ ) {
# do stuff with $array[$i]
}
!! While/Until
While and until operate with boolean loop conditions. That means that the loop will repeat as long as the appropriate boolean value if found for the condition ( TRUE for while, and FALSE for until). In addition to the obvious cases where you are looking for a particular condition, while is great for processing a file one line at a time.
while ( my $line = <$fh> ) {
# Do stuff with $line.
}
!! map
map is an amazingly useful bit of functional programming kung-fu. It is used to turn one list into another. You pass an anonymous code reference that is used to enact the transformation.
# Multiply all elements of #old by two and store them in #new.
my #new = map { $_ * 2 } #old;
So how do you solve your particular problem? There are many ways. Which is best depends on how you want to use the results. If you want to create a new array of the letter pairs, use map. If you are interested primarily in a side effect (say printing a variable) use for. If you need to work with really big lists that come from sort of interator (like lines from a filehandle) use while.
Here's a solution. I wouldn't turn it in to your professor until you understand how it works.
print map { my $letter=$_; map "$letter$_\n", "a".."z" } "a".."z";
Look at perldoc articles, perlsyn for info on the looping constructs, perlfunc for info on map and look at perlop for info on the range operator (..).
Good luck.
Use the range operator (..) for your initialization. The range operator basically grabs a range of values such as numbers or characters.
Then use a nested loop to go through the array one time per character for a total of 26^2 iterations.
Rather than a while loop I've used a foreach loop to go through each item in the array. You could also put 'a' .. 'z' instead of declared #arr as the argument to the foreach loop. The foreach loops below set $char or $char2 to each value in #arr in turn.
my #arr = ('a' .. 'z');
for my $char (#arr) {
for my $char2 (#arr) {
print "$char$char2\n";
}
}
If all you really want to do is print the 676 strings you describe, then:
#!/usr/bin/perl
use warnings;
use strict;
my $str = 'aa';
while (length $str < 3) {
print $str++, "\n";
}
But I smell an "XY problem"...

Simplest way to match array of strings to search in perl?

What I want to do is check an array of strings against my search string and get the corresponding key so I can store it. Is there a magical way of doing this with Perl, or am I doomed to using a loop? If so, what is the most efficient way to do this?
I'm relatively new to Perl (I've only written 2 other scripts), so I don't know a lot of the magic yet, just that Perl is magic =D
Reference Array: (1 = 'Canon', 2 = 'HP', 3 = 'Sony')
Search String: Sony's Cyber-shot DSC-S600
End Result: 3
UPDATE:
Based on the results of discussion in this question, depending on your intent/criteria of what constitutes "not using a loop", the map based solution below (see "Option #1) may be the most concise solution, provided that you don't consider map a loop (the short version of the answers is: it's a loop as far as implementation/performance, it's not a loop from language theoretical point of view).
Assuming you don't care whether you get "3" or "Sony" as the answer, you can do it without a loop in a simple case, by building a regular expression with "or" logic (|) from the array, like this:
my #strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
my $combined_search = join("|",#strings);
my #which_found = ($search_in =~ /($combined_search)/);
print "$which_found[0]\n";
Result from my test run: Sony
The regular expression will (once the variable $combined_search is interpolated by Perl) take the form /(Canon|HP|Sony)/ which is what you want.
This will NOT work as-is if any of the strings contain regex special characters (such as | or ) ) - in that case you need to escape them
NOTE: I personally consider this somewhat cheating, because in order to implement join(), Perl itself must do a loop somewhere inside the interpeter. So this answer may not satisfy your desire to remain loop-less, depending on whether you wanted to avoid a loop for performance considerations, of to have cleaner or shorter code.
P.S. To get "3" instead of "Sony", you will have to use a loop - either in an obvious way, by doing 1 match in a loop underneath it all; or by using a library that saves you from writing the loop yourself but will have a loop underneath the call.
I will provide 3 alternative solutions.
#1 option: - my favorite. Uses "map", which I personally still consider a loop:
my #strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
my $combined_search = join("|",#strings);
my #which_found = ($search_in =~ /($combined_search)/);
print "$which_found[0]\n";
die "Not found" unless #which_found;
my $strings_index = 0;
my %strings_indexes = map {$_ => $strings_index++} #strings;
my $index = 1 + $strings_indexes{ $which_found[0] };
# Need to add 1 since arrays in Perl are zero-index-started and you want "3"
#2 option: Uses a loop hidden behind a nice CPAN library method:
use List::MoreUtils qw(firstidx);
my #strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
my $combined_search = join("|",#strings);
my #which_found = ($search_in =~ /($combined_search)/);
die "Not Found!"; unless #which_found;
print "$which_found[0]\n";
my $index_of_found = 1 + firstidx { $_ eq $which_found[0] } #strings;
# Need to add 1 since arrays in Perl are zero-index-started and you want "3"
#3 option: Here's the obvious loop way:
my $found_index = -1;
my #strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
foreach my $index (0..$#strings) {
next if $search_in !~ /$strings[$index]/;
$found_index = $index;
last; # quit the loop early, which is why I didn't use "map" here
}
# Check $found_index against -1; and if you want "3" instead of "2" add 1.
Here is a solution that builds a regular expression with embedded code to increment the index as perl moves through the regex:
my #brands = qw( Canon HP Sony );
my $string = "Sony's Cyber-shot DSC-S600";
use re 'eval'; # needed to use the (?{ code }) construct
my $index = -1;
my $regex = join '|' => map "(?{ \$index++ })\Q$_" => #brands;
print "index: $index\n" if $string =~ $regex;
# prints 2 (since Perl's array indexing starts with 0)
The string that is prepended to each brand first increments the index, and then tries to match the brand (escaped with quotemeta (as \Q) to allow for regex special characters in the brand names).
When the match fails, the regex engine moves past the alternation | and then the pattern repeats.
If you have multiple strings to match against, be sure to reset $index before each. Or you can prepend (?{$index = -1}) to the regex string.
An easy way is just to use a hash and regex:
my $search = "your search string";
my %translation = (
'canon' => 1,
'hp' => 2,
'sony' => 3
);
for my $key ( keys %translation ) {
if ( $search =~ /$key/i ) {
return $translation{$key};
)
}
Naturally the return can just as easily be a print. You can also surround the entire thing in a while loop with:
while(my $search = <>) {
#your $search is declared = to <> and now gets its values from STDIN or strings piped to this script
}
Please also take a look at perl's regex features at perlre
and take a look at perl's data structures at perlref
EDIT
as was just pointed out to me you were trying to steer away from using a loop. Another method would be to use perl's map function. Take a look here.
You can also take a look at Regexp::Assemble, which will take a collection of sub-regexes and build a single super-regex from them that can then be used to test for all of them at once (and gives you the text which matched the regex, of course). I'm not sure that it's the best solution if you're only looking at three strings/regexes that you want to match, but it's definitely the way to go if you have a substantially larger target set - the project I initially used it on has a library of some 1500 terms that it's matching against and it performs very well.

How do I print unique elements in Perl array?

I'm pushing elements into an array during a while statement. Each element is a teacher's name. There ends up being duplicate teacher names in the array when the loop finishes. Sometimes they are not right next to each other in the array, sometimes they are.
How can I print only the unique values in that array after its finished getting values pushed into it? Without having to parse the entire array each time I want to print an element.
Heres the code after everything has been pushed into the array:
$faculty_len = #faculty;
$i=0;
while ($i != $faculty_len)
{
printf $fh '"'.$faculty[$i].'"';
$i++;
}
use List::MoreUtils qw/ uniq /;
my #unique = uniq #faculty;
foreach ( #unique ) {
print $_, "\n";
}
Your best bet would be to use a (basically) built-in tool, like uniq (as described by innaM).
If you don't have the ability to use uniq and want to preserve order, you can use grep to simulate that.
my %seen;
my #unique = grep { ! $seen{$_}++ } #faculty;
# printing, etc.
This first gives you a hash where each key is each entry. Then, you iterate over each element, counting how many of them there are, and adding the first one. (Updated with comments by brian d foy)
I suggest pushing it into a hash.
like this:
my %faculty_hash = ();
foreach my $facs (#faculty) {
$faculty_hash{$facs} = 1;
}
my #faculty_unique = keys(%faculty_hash);
#array1 = ("abc", "def", "abc", "def", "abc", "def", "abc", "def", "xyz");
#array1 = grep { ! $seen{ $_ }++ } #array1;
print "#array1\n";
This question is answered with multiple solutions in perldoc. Just type at command line:
perldoc -q duplicate
Please note: Some of the answers containing a hash will change the ordering of the array. Hashes dont have any kind of order, so getting the keys or values will make a list with an undefined ordering.
This doen't apply to grep { ! $seen{$_}++ } #faculty
This is a one liner command to print unique lines in order it appears.
perl -ne '$seen{$_}++ || print $_' fileWithDuplicateValues
I just found hackneyed 3 liner, enjoy
my %uniq;
undef #uniq(#non_uniq_array);
my #uniq_array = keys %uniq;
Just another way to do it, useful only if you don't care about order:
my %hash;
#hash{#faculty}=1;
my #unique=keys %hash;
If you want to avoid declaring a new variable, you can use the somehow underdocumented global variable %_
#_{#faculty}=1;
my #unique=keys %_;
If you need to process the faculty list in any way, a map over the array converted to a hash for key coalescing and then sorting keys is another good way:
my #deduped = sort keys %{{ map { /.*/? ($_,1):() } #faculty }};
print join("\n", #deduped)."\n";
You process the list by changing the /.*/ regex for selecting or parsing and capturing accordingly, and you can output one or more mutated, non-unique keys per pass by making ($_,1):() arbitrarily complex.
If you need to modify the data in-flight with a substitution regex, say to remove dots from the names (s/\.//g), then a substitution according to the above pattern will mutate the original #faculty array due to $_ aliasing. You can get around $_ aliasing by making an anonymous copy of the #faculty array (see the so-called "baby cart" operator):
my #deduped = sort keys %{{ map {/.*/? do{s/\.//g; ($_,1)}:()} #{[ #faculty ]} }};
print join("\n", #deduped)."\n";
print "Unmolested array:\n".join("\n", #faculty)."\n";
In more recent versions of Perl, you can pass keys a hashref, and you can use the non-destructive substitution:
my #deduped = sort keys { map { /.*/? (s/\.//gr,1):() } #faculty };
Otherwise, the grep or $seen[$_]++ solutions elsewhere may be preferable.