re-order alphabet sorting in Perl - perl

I am trying to fix sorting in Armenian alphabet, because all standard Unix tools and programming languages sort letters and words as a result for only 1 of the 2 major dialects (Western).
Translating this into technical problem is to re-order one of the chars "ւ", to put it in different place among letters, let's say to make it the last character so that words are ordered correctly for the order dialect (Eastern). Linguistically speaking in Eastern dialect this "ւ" symbol is not written "standalone" but is a part of letter that's written with 2 chars "ու". Current sorting puts letter "ու" behind "ոք" or "ոփ" 2-letter constructs.
Basically, it should be totally similar if you wanted to make e. g. letter "v" be on place of letter "z" in Latin alphabet.
I am trying to use something like
#!/usr/bin/perl -w
use strict;
my (#sortd, #unsortd, $char_u, $char_x);
##unsortd = qw(աբասի ապուշ ապրուստ թուր թովիչ թոշակ թոք);
#unsortd = qw(ու ոց ոք ոփ);
#sortd = sort {
$char_u = "ւ";
$char_x = split(//, #unsortd);
if ($char_u gt $char_x) {
1;
} else {
return $a cmp $b;
}
} #unsortd;
print "#sortd\n";
but that does not scale for whole words, just 2 letter forms are fixed.
UPDATE: I was able to solve this using tr function to map letters to numbers as shown in Perlmonks

You should have a look at the Unicode::Collate::Locale module if you haven't done so already.
use Unicode::Collate::Locale;
my $collator = Unicode::Collate::Locale->new(locale => "hy");
#sortd = $collator->sort(#unsortd);
print join("\n", #sortd, '');
This prints:
ու
ոց
ոք
ոփ
(I'm not sure this is the output you're expecting, but that module and Unicode::Collate has quite a lot of information, it might be easier to create a custom collation for your needs based on that rather than rolling your own.)

For standard alphabets Unicode::Collate::Locale as suggested by #mat should be the first choice.
On the other hand, if you have very specific needs `index' can be used as follows. To sort single characters (note that missing characters would be first):
my $alphabet_A = "acb";
sub by_A {index($alphabet_A,$a) <=> index($alphabet_A,$b)};
...
my #sorted = sort by_A #unsorted;
For words, one can include a loop in the definition of by_A. For the following to work define the function min() and fine-tune the case of words of different lengths:
sub by_A {
$flag=0;
foreach my $i (0..min(length($a),length($b))-1) {
return ($flag) if ($flag);
$flag = ($flag or
index($alphabet_A,substr($a,$i,1)) <=> index($alphabet_A,substr($b,$i,1)));
}
return $flag;
}

Related

How can I remove all the vowels unless they are in word beginnings?

$text = "I like apples more than oranges\n";
#words = split /” “/, $text;
foreach (#words) [1..] {
if $words "AEIOUaeiou";
$words =~ tr/A E I O U a e i o u//d;
}
print "$words\n";
"I like apples more than oranges" will become "I lk appls mr thn orngs". "I" in "I", "a" in "appls" and "o" in "orngs" will stay because they are the first letter in the word.
This is my research assignment as a first year student. I am allowed to ask questions and later cite them. Please don't be mean.
I know you say you are not allowed to use a regex, but for everyone else that shows up here I'll show the use of proper tools. But, then I'll do something just as useful with tr///.
One of the tricks of programming (and mathematics) decomposing what look like hard problems into easier problems, especially if you already have solutions for the easy problems. (Read about Parnas decomposition, for example).
So, the question is "How can I remove all the vowels unless they are in word beginnings?" (after I made your title a bit shorter). This led the answers to think about words, so they split up the input, did some work to ensure they weren't working on the first character, and then reassembled the result.
But, another way to frame the problem is "How do I remove all the vowels that come after another letter?". The only letter that doesn't come after another letter is the first letter of a word.
The regex for a vowel that comes after another letter is simple (but I'll stick to ASCII here, although it is just as simple for any Unicode letter):
[a-z][aeiou]
That only matches when there is a vowel after the first letter. Now you want to replace all of those with nothing. Use the substitution operator, s///. The /g flag makes all global substitutions and the /i makes it case insensitive:
s/[a-z][aeiou]//gi;
But, there's a problem. It also replaces that leading letter. That's easy enough to fix. The \K in a substitution says to ignore the part of the pattern before it in the replacement. Anything before the \K is not replaced. So, this only replaces the vowels:
s/[a-z]\K[aeiou]//gi;
But, maybe there are vowels next to each other, so throw in the + quantifier for "one or more" of the preceding item:
s/[a-z]\K[aeiou]+//gi;
You don't need to care about words at all.
Some other ways
Saying that a letter must follow another letter has a special zero-width assertion: the non-word boundary, \B (although that also counts digits and underscore as "letters"):
s/\B[aeiou]+//gi;
The \K was introduced v5.10 and was really a nifty trick to have a variable-width lookbehind. But, the lookbehind here is fixed width: it's one character:
s/(?<=[a-z])[aeiou]+//gi;
But, caring about words
Suppose you need to handle each word separately, for some other requirement. It looks like you've mixed a little Python-ish sort of code, and it would be nice if Perl could do that :). The problem doesn't change that much because you can do the same thing for each individual word.
foreach my $word ( split /\s+/, $x ) {
.... # same thing for each word
}
But, here's an interesting twist? How do you put it all back together? The other solutions just use a single space assuming that's the separator. Maybe there should be two spaces, or tabs, or whatever. The split has a special "separator retention mode" that can keep whatever was between the pieces. When you have captures in the split pattern, those capture values are part of the output list:
my #words_and_separators = split /(\s+)/, $x;
Since you know that none of the separators will have vowels, you can make substitutions on them knowing they won't change. This means you can treat them just like the words (that is, there is no special case, which is another thing to think about as you decompose problems). To get your final string with the original spacing, join on the empty string:
my $ending_string = join '', #words_and_separators;
So, here's how that might all look put together. I'll add the /r flag on the substitution so it returns the modified copy instead of working on the original (don't modify the control variable!):
my #words;
foreach my $word ( split /(\s+)/, $x ) {
push #words, $word =~ s/\B[aeiou]+//gr;
}
my $ending_string = join '', #words;
But, that foreach is a bit annoying. This list pipeline is the same, and it's easier to read these bottom to top. Each thing produces a list that flows into the thing above it. This is how I'd probably express it in real code:
my $ending_string =
join '',
map { s/\B[aeiou]+//gr } # each item is in $_
split /(\s+)/, $x;
Now, here's the grand finale. What if we didn't split thing up on whitespace but on whitespace and the first letter of each word? With separator retention mode we know that we only have to affect every other item, so we count them as we do the map:
my $n = 0;
my $ending_string =
join '',
map { ++$n % 2 ? tr/aeiouAEIOU//dr : $_ }
split /((?:^|\s+)[a-z])/i, $x;
But, I wouldn't write this technique in this way because someone would ultimately find me and exact their revenge. Instead, that foreach I found annoying before may soothe the angry masses:
my $n = 0;
foreach ( split /((?:^|\s+)[a-z])/i, $x ) {
print ++$n % 2 ? tr/aeiouAEIOU//dr : $_;
}
This now remembers the actual separators from the original string and leaves alone the first character of the "word" because it's not in the element we will modify.
The code in the foreach doesn't need to use the conditional operator, ?: or some of the other features. The important part is skipping every other element. That split pattern is a bit of a puzzler if you haven't seen it before, but that's what you get with those sorts of requirements. I think modifying a portion of the substring is just as likely to trip up people on a first read.
I mean, if they are going to make you do it the wrong way in the homework, strike back with something that will take up a bit of their time. :)
Oh, this is fun
I had another idea, because tr/// has another task beyond transliteration. It also counts. Because it returns the number of replacements, if you replace anything with itself, you get a count of the occurrences of that thing. You can count vowels, for instance:
my $has_vowels = $string =~ tr/aeiou/aeiou/; # counts vowels
But, with a string of one letter, that means you have a way to tell if it is a vowel:
my $is_vowel = substr( $string, $i, 1 ) =~ tr/aeiou/aeiou/;
You also can know things about the previous character:
my $is_letter = substr( $string, $i - 1, 1 ) =~ tr/a-zA-Z/a-zA-Z/;
Put that together and you can look at any position and know if it's a vowel that follows a letter. If so, you skip that letter. Otherwise, you add that letter to the output:
use v5.10;
$x = "I like apples more than oranges oooooranges\n";
my $output = substr $x, 0, 1; # avoid the -1 trap (end of string!)
for( my $i = 1; $i < length $x; $i++ ) {
if( substr( $x, $i, 1 ) =~ tr/aeiou/aeiou/ ) { # is a vowel
next if substr( $x, $i - 1, 1 ) =~ tr/a-zA-Z/a-zA-Z/;
}
$output .= substr $x, $i, 1;
}
say $output;
This has the fun consequence of using the recommended operator but completely bypassing the intent. But, this is a proper and intended use of tr///.
It appears that you need to put a little more effort into learning Perl before taking on challenges like this. Your example contains a lot of code that simply isn't valid Perl.
$x = "I like apples more than oranges\n"; #the original sentence
foreach $i in #x[1..] {
You assign your text to the scalar variable $x, but then try to use the array variable #x. In Perl, these are two completely separate variables that have no connection whatsoever. Also, in Perl, the range operator (..) needs values at both ends.
If you had an array called #x (and you don't, you have a scalar) then you could do what you're trying to do here with foreach $i (#x)
if $i "AEIOUaeiou";
I'm not sure what you're trying to do here. I guess the nearest useful Perl expression I can see would be something like:
if ($i =~ /^[AEIOUaeiou]$/)
Which would test if $i is a vowel. But that's a regex, so you're not allowed to use it.
Obviously, I'd solve this problem with a regex, but as those are banned, I've reached for some slightly more obscure Perl features in my code below (that's so your teacher won't believe this is your solution if you just cut and paste it):
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $text = "I like apples more than oranges\n";
# Split the string into an array of words
my #words = split /\s+/, $text;
# For each word...
for (#words) {
# Get a substring that omits the first character
# and use tr/// to remove vowels from that substring
substr($_, 1) =~ tr/AEIOUaeiou//d;
}
# Join the array back together
$text = join ' ', #words;
say $text;
Update: Oh, and notice that I've used tr/AEIUOaeiou//d where you have tr/A E I O U a e i o u//d. It probably won't make any difference here (depending on your approach - but you'll probably be applying it to strings that don't contain spaces) but it's good practice to only include the characters that you want to remove.
We can go over the input string from the end and remove any vowel that's not preceded by a space. We go from right to left so we don't have to adjust the position after each deletion. We don't need to check the very first letter, it shouldn't be ever removed. To remove a vowel, we can use tr///d on the substr of the original string.
for my $i (reverse 1 .. length $x) {
substr($x, $i, 1) =~ tr/aeiouAEIOU//d
if substr($x, $i - 1, 1) ne ' ';
}
Firstly your if statement is wrong.
Secondly this is not a Perl code.
Here is a piece of code that will work, but there is a better way to do it
my $x = "I like apples more than oranges\n";
my $new = "";
my #arr;
foreach my $word (split(' ', $x)) {
#arr = split('', $word);
foreach (my $i; $i<scalar #arr; $i++){
if ($i == 0){
$new .= $arr[$i];
}
elsif (index("AEIOUaeiou", $arr[$i]) == -1) {
$new .= $arr[$i];
}
}
$new .= " ";
}
print "$new\n";
Here I am splitting the string in order to get an array, then I am checking if the given char is a vowel, if it's not, I am appending it to a new string.
Always include
use strict;
use warnings;
on top of your code.
Clearly this is an exercise in lvalues. Obviously. Indubitably!
#!/usr/bin/env perl
# any old perl will do
use 5.010;
use strict;
use warnings;
# This is not idomatic nor fantastic code. Idiotastic?
$_='I am yclept Azure-Orange, queueing to close a query. How are YOU?';
# My little paws typed "local pos" and got
# "Useless localization of match position" :(
# so a busy $b keeps/restores that value
while (/\b./g) {
substr($_,$b=pos,/\b/g && -$b+pos)
# Suggestion to use tr is poetic, not pragmatic,
# ~ tr is sometimes y and y is sometimes a vowel
=~ y/aeiouAEIOU//d;
pos=$b;
}
say
# "say" is the last word.
Was there an embargo against using s/// substitution, or against using all regex? For some reason I thought matching was OK, just not substitution. If matches are OK, I have an idea that "improves" upon this by removing $b through pattern matching side effects. Will see if it pans out. If not, should be pretty easy to replace /\b/ and pos with index and variables, though the definition of word boundary over-simplifies in that case.
(edit) here it is a little more legible with nary a regex
my $text="YO you are the one! The-only-person- asking about double spaces.
Unfortunate about newlines...";
for (my $end=length $text;
$end > 0 && (my $start = rindex $text,' ',$end);
$end = $start-1) {
# y is a beautiful letter, using it for vowels is poetry.
substr($text,2+$start,$end-$start) =~ y/aeiouUOIEA//d;
}
say $text;
Maybe more devious minds will succeed with vec, unpack, open, fork?
You can learn about some of these techniques via
perldoc -f substr
perldoc -f pos
perldoc re
As for my own implementer notes, the least important thing is ending without punctuation so nothing can go after

In Perl, how can I tell if a string is a number, but without using modules?

I have an array, each element is a string which consists of several words. The first word is identifier, which will not be considered. The rest part of the string would be number or alphabet only. I want to sort the alphabet part and then output the whole element. Following are my codes.
#!/usr/bin/perl
use strict;
use warnings;
my #log=("a1 9 2 3 1","gl cct car","zo4 4 7", "abl off key dog","a8 act zoo");
my #values;
my #letter_log;
my #letter_idf;
my #dig_log;
my $i;
$i=0;
foreach(#log)
{
#values=split(/\s+/,$_);chomp(#values);
#_= m/$values[0]\s/;
print "25 \$'=$';\n";
if($' =~ /\D\s+/){$letter_idf[$i]=$values[0];$letter_log[$i]="$'";}
else{$dig_log[$i]=$_;}
$i++;
}
#_=sort { $letter_log[$a] cmp $letter_log[$b] } 0..$#letter_log;
chomp(#_);
enter code here
for($i=0;$i<=$#letter_log;$i++){print"$letter_idf[$_[$i]]"."$letter_log[$_[$i]];\t";}
foreach(#dig_log){print "$_;\t";}
print "\n";
I hope the output is like below
( "g1 act car"; "a8 act zoo"; "ab1 off key dog"; "a1 9 2 3 1"; "zo4 4 7")
However, my #letter_log is empty. I highly appreciate it if you could point out the errors and provide a correct solution.
PLEASE DO NOT USE MODULES, LIKE
import Scalar::Util qw(look_like_number);
I tried to avoid the modules because I want to study how to tell if a string is a number or not. Thanks!
The provided code that demonstrates your intention is not a perfect match for the title since it does a lot more. I would like to address only the main question in the title here.
If you can first define what a number is (instead of using a indirect definition of "whatever perl think a numbers is"), it becomes an easier problem.
For example, if you define the number you are looking for as a string with multiple characters of digit 0..9, then a simple regular expression /^[0-9]+$/ can serve your purpose.
Thanks for comments ikegami and melpomene.
The reason why I want to try it because the module is a "program" as well. I want to learn how to realize it by myself. After all, we meet all kinds of data which need to handle.
After many failures and tests, I figured it out and share it here.
first, the system variable "$'"changed before I assign it in if statement. I save it into another variable.
Second, I use an alternative way to realize it because the sequence numbers with spaces must be string. So, I only take one "character" to test it. It proves that this operation is correct. Do not forget the statements in my original post
"The first word is identifier, which will not be considered. The rest part of the string would be number or alphabet only. "
Here are my revised scripts. "NOTHING IS IMPOSSIBLE."
#!/usr/bin/perl -w
use strict;
use warnings;
#import Scalar::Util qw(look_like_number);
my #log=("a1 9 2 3 1","gl act car","zo4 4 7", "abl off key dog","a8 act zoo");
print "original log:\n";
$"=";\t"; #control array's delimiter
print "#log\n";
my #values;
my #letter_log;
my #letter_idf;
my #dig_log;
my $s;
my $i;
my $s_r;
$i=0;
foreach(#log)
{
#values=split(/\s+/,$_);chomp(#values);
#_= m/$values[0]\s/;
print "25 \$'=$';\n";
$s=$';
$s_r=substr($s,0,1);
if($s_r =~ /\D/)
{ print"26.0: $'\ts=$s\ts_r=$s_r\n";
$letter_idf[$i]=$values[0];
$letter_log[$i]=$s;
print "26: $letter_idf[$i]\t$letter_log[$i]\n";$i++;
}
else{$dig_log[$i]=$_;}
}
#_=sort { $letter_log[$a] cmp $letter_log[$b] } 0..$#letter_log;
chomp(#_);
print"35: srt letter log: #letter_log\n";
#foreach(#letter_log){print "$_;\t";}
print"38: nsrt letter log: ";
for($i=0;$i<=$#letter_log;$i++){print"$letter_idf[$_[$i]] "."$letter_log[$_[$i]];\t";}
print"\n";
print"41: digital log:";
foreach(#dig_log){print "$_;\t";}
print "\n";

Perl replace multiple strings simultaneously (case insensitive)

Consider the following perl code which works perfectly:
%replacements = ("what" => "its", "lovely" => "bad");
($val = $sentence) =~ s/(#{[join "|", keys %replacements]})/$replacements{$1}/g;
stackoverflow user sresevoir brilliantly came up with that replacement code that involved using a hash, allowing you to find and replace multiple terms without iterating through a loop.
I've been throwing other various search and replace terms at it programmatically and I've started using it to highlight words that are the result of a search.
The problem (refer to problem code shown below):
Make it case insensitive by adding an "i" before the "g" at the end.
If the search term $thisterm and the search term word contained in $sentence has no difference in case, there are no problems. If the search term $thisterm (i.e. Stackoverflow) and the search term word contained in $sentence is a different case (i.e. stackoverflow), then the result returned is nothing for that term. It's as if I told it to
$sentence =~ s/$thisterm//g;
Here's the problem code:
foreach $thisterm (#searchtermarray) {
# The variable $thisterm has already gone through a filter to remove special characters.
$thistermtochange = $thisterm;
$replacements{$thistermtochange} = "<span style=\"background-color:#FFFFCC;\">$thistermtochange<\/span>";
}
$sentence =~ s/(#{[join "|", keys %replacements]})/$replacements{$1}/ig;
I also went back and duplicated the problem with the above original code. It seems the combination of adding the i modifier, using a hash reference, and different case is something Perl doesn't like.
What am I missing?
Thanks,
DB
P. S. I've benefited from stackoverflow for years; but I just signed up for this question and the site wouldn't let me directly comment to sresevoir. As a "brand new" user I don't have enough reputation points.
Keep all the keys of the hash in lower case, and do this:
s/(#{[join "|", keys %replacements]})/$replacements{ lc $1 }/ig
(note the addition of lc)
There are a few other things you ought to consider.
First, as is, if you are trying to replace both lovely and love with different replacements, lovely may or may not ever be found, depending on which key is returned by keys first. To prevent this, it's a good idea to sort by descending length:
s/(#{[join "|", sort { length $b <=> length $a } keys %replacements]})/$replacements{$1}/ig
Second, this technique only works with fixed strings; if your keys contain any regex metacharacters, for instance replacing how? with why?, it will fail, because $1 will never be how?. To allow metacharacters (interpreted as literal characters), quote them:
s/(#{[join "|", map quotemeta, sort { length $b <=> length $a } keys %replacements]})/$replacements{$1}/ig
From your comment, it seems to me that you want to find certain strings, all in one pass, and add stuff around them (that doesn't vary by which string). If so, you are going about it the hard way and shouldn't be using a hash at all. Have an array of the strings you want to search for and replace them:
s/(#{[join "|", map quotemeta, sort { length $b <=> length $a } #search_strings]})/<span style="background-color:#FFFFCC;">$1<\/span>/ig;
The problem is that, if you have a hash like this
my %replacements = (
word => '<span style="background-color:#FFFFCC;">word</span>'
)
then the substitution will look like
s/(word)/$replacements{$1}/ig;
But a case-independent regex pattern will match WORD as well, so the replacement expression $replacements{$1} will be $replacements{'WORD'} which doesn't exist.
While you may be pleased with his solution, sresevoir uses an ugly way of embedding a string expression within a regex. This
($val = $sentence) =~ s/(#{[join "|", keys %replacements]})/$replacements{$1}/g;
would be much better as
my $pattern = join '|', keys %replacements;
($val = $sentence) =~ s/($pattern)/$replacements{$1}/g;
But you have generalised this hash idea too far and it is the wrong way to make the changes that you need. If your replacement string is a simple function of the original string, as in this case, then it is best written directly as a replacement string using captures from the pattern. I would write it like this
my $pattern = join '|', #searchtermarray;
$sentence =~ s{($pattern)}{<span style="background-color:#FFFFCC;">$1</span>\n}ig;
But note that that, as it stands, the search will find any words that are substrings of anything in the text, and will also go awry if #searchtermarray has any strings that contain regex metacharacters. You don't say anything about your actual data so I can't really help you to resolve this.

Simple multi-dimensional array with loop in perl

I'm trying to use an array and a loop to print out the following (basically for each letter of the alphabet, print each letter of the alphabet after it and then move on to the next letter). I'm new to perl, anyone have any quick words of :
aa
ab
ac
ad
...
ba
bb
bc
bd
...
ca
cb
...
Currently I have this, but it only prints a single character alphabet...
#arr = ("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z");
$i = #arr;
while ($i)
{
print $arr[$i];
$i--;
}
Using the range operator and the ranges you want to target:
use strict;
use warnings;
my #elements = ("aa" .. "zz");
for my $combo (#elements)
{
print "$combo\n";
}
You can utilize the initial 2 letters till the ending 2 letters you want as ending and the for will take care of everything.
This really isn't multi-dimensional array work, if it were you'd be working with stuff like:
my #foo = (
[1,2,3],
[4,7,8,1,2,3],
[2,3],
);
This is really a very basic how do I make a nested loop that iterates over the same array. I'll bet this is homework.
So, I'll let you figure out the nesting bits, but give some help with Perl's loop operators.
!! for/foreach
for (the each is optional) is the real heavy hitter for looping in perl. Use it like so:
for my $var ( #array ) {
#do stuff with $var
}
Each element in #array will be aliased to the $var variable, and the block of code will be executed. The fact that we are aliasing, rather than copying means that if alter the value of $var, #array will be changed as well. The stuff between the parenthesis may be any expression. The expression will be evaluated in list context. So if you put a file handle in the parens, the entire file will be read into memory and processed.
You can also leave off naming the loop variable, and $_ will be used instead. In general, DO NOT DO THIS.
!! C-Style for
Every once in a while you need to keep track of indexes as you loop over an array. This is when a C style for loop comes in handy.
for( my $i=0; $i<#array; $i++ ) {
# do stuff with $array[$i]
}
!! While/Until
While and until operate with boolean loop conditions. That means that the loop will repeat as long as the appropriate boolean value if found for the condition ( TRUE for while, and FALSE for until). In addition to the obvious cases where you are looking for a particular condition, while is great for processing a file one line at a time.
while ( my $line = <$fh> ) {
# Do stuff with $line.
}
!! map
map is an amazingly useful bit of functional programming kung-fu. It is used to turn one list into another. You pass an anonymous code reference that is used to enact the transformation.
# Multiply all elements of #old by two and store them in #new.
my #new = map { $_ * 2 } #old;
So how do you solve your particular problem? There are many ways. Which is best depends on how you want to use the results. If you want to create a new array of the letter pairs, use map. If you are interested primarily in a side effect (say printing a variable) use for. If you need to work with really big lists that come from sort of interator (like lines from a filehandle) use while.
Here's a solution. I wouldn't turn it in to your professor until you understand how it works.
print map { my $letter=$_; map "$letter$_\n", "a".."z" } "a".."z";
Look at perldoc articles, perlsyn for info on the looping constructs, perlfunc for info on map and look at perlop for info on the range operator (..).
Good luck.
Use the range operator (..) for your initialization. The range operator basically grabs a range of values such as numbers or characters.
Then use a nested loop to go through the array one time per character for a total of 26^2 iterations.
Rather than a while loop I've used a foreach loop to go through each item in the array. You could also put 'a' .. 'z' instead of declared #arr as the argument to the foreach loop. The foreach loops below set $char or $char2 to each value in #arr in turn.
my #arr = ('a' .. 'z');
for my $char (#arr) {
for my $char2 (#arr) {
print "$char$char2\n";
}
}
If all you really want to do is print the 676 strings you describe, then:
#!/usr/bin/perl
use warnings;
use strict;
my $str = 'aa';
while (length $str < 3) {
print $str++, "\n";
}
But I smell an "XY problem"...

Simplest way to match array of strings to search in perl?

What I want to do is check an array of strings against my search string and get the corresponding key so I can store it. Is there a magical way of doing this with Perl, or am I doomed to using a loop? If so, what is the most efficient way to do this?
I'm relatively new to Perl (I've only written 2 other scripts), so I don't know a lot of the magic yet, just that Perl is magic =D
Reference Array: (1 = 'Canon', 2 = 'HP', 3 = 'Sony')
Search String: Sony's Cyber-shot DSC-S600
End Result: 3
UPDATE:
Based on the results of discussion in this question, depending on your intent/criteria of what constitutes "not using a loop", the map based solution below (see "Option #1) may be the most concise solution, provided that you don't consider map a loop (the short version of the answers is: it's a loop as far as implementation/performance, it's not a loop from language theoretical point of view).
Assuming you don't care whether you get "3" or "Sony" as the answer, you can do it without a loop in a simple case, by building a regular expression with "or" logic (|) from the array, like this:
my #strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
my $combined_search = join("|",#strings);
my #which_found = ($search_in =~ /($combined_search)/);
print "$which_found[0]\n";
Result from my test run: Sony
The regular expression will (once the variable $combined_search is interpolated by Perl) take the form /(Canon|HP|Sony)/ which is what you want.
This will NOT work as-is if any of the strings contain regex special characters (such as | or ) ) - in that case you need to escape them
NOTE: I personally consider this somewhat cheating, because in order to implement join(), Perl itself must do a loop somewhere inside the interpeter. So this answer may not satisfy your desire to remain loop-less, depending on whether you wanted to avoid a loop for performance considerations, of to have cleaner or shorter code.
P.S. To get "3" instead of "Sony", you will have to use a loop - either in an obvious way, by doing 1 match in a loop underneath it all; or by using a library that saves you from writing the loop yourself but will have a loop underneath the call.
I will provide 3 alternative solutions.
#1 option: - my favorite. Uses "map", which I personally still consider a loop:
my #strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
my $combined_search = join("|",#strings);
my #which_found = ($search_in =~ /($combined_search)/);
print "$which_found[0]\n";
die "Not found" unless #which_found;
my $strings_index = 0;
my %strings_indexes = map {$_ => $strings_index++} #strings;
my $index = 1 + $strings_indexes{ $which_found[0] };
# Need to add 1 since arrays in Perl are zero-index-started and you want "3"
#2 option: Uses a loop hidden behind a nice CPAN library method:
use List::MoreUtils qw(firstidx);
my #strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
my $combined_search = join("|",#strings);
my #which_found = ($search_in =~ /($combined_search)/);
die "Not Found!"; unless #which_found;
print "$which_found[0]\n";
my $index_of_found = 1 + firstidx { $_ eq $which_found[0] } #strings;
# Need to add 1 since arrays in Perl are zero-index-started and you want "3"
#3 option: Here's the obvious loop way:
my $found_index = -1;
my #strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
foreach my $index (0..$#strings) {
next if $search_in !~ /$strings[$index]/;
$found_index = $index;
last; # quit the loop early, which is why I didn't use "map" here
}
# Check $found_index against -1; and if you want "3" instead of "2" add 1.
Here is a solution that builds a regular expression with embedded code to increment the index as perl moves through the regex:
my #brands = qw( Canon HP Sony );
my $string = "Sony's Cyber-shot DSC-S600";
use re 'eval'; # needed to use the (?{ code }) construct
my $index = -1;
my $regex = join '|' => map "(?{ \$index++ })\Q$_" => #brands;
print "index: $index\n" if $string =~ $regex;
# prints 2 (since Perl's array indexing starts with 0)
The string that is prepended to each brand first increments the index, and then tries to match the brand (escaped with quotemeta (as \Q) to allow for regex special characters in the brand names).
When the match fails, the regex engine moves past the alternation | and then the pattern repeats.
If you have multiple strings to match against, be sure to reset $index before each. Or you can prepend (?{$index = -1}) to the regex string.
An easy way is just to use a hash and regex:
my $search = "your search string";
my %translation = (
'canon' => 1,
'hp' => 2,
'sony' => 3
);
for my $key ( keys %translation ) {
if ( $search =~ /$key/i ) {
return $translation{$key};
)
}
Naturally the return can just as easily be a print. You can also surround the entire thing in a while loop with:
while(my $search = <>) {
#your $search is declared = to <> and now gets its values from STDIN or strings piped to this script
}
Please also take a look at perl's regex features at perlre
and take a look at perl's data structures at perlref
EDIT
as was just pointed out to me you were trying to steer away from using a loop. Another method would be to use perl's map function. Take a look here.
You can also take a look at Regexp::Assemble, which will take a collection of sub-regexes and build a single super-regex from them that can then be used to test for all of them at once (and gives you the text which matched the regex, of course). I'm not sure that it's the best solution if you're only looking at three strings/regexes that you want to match, but it's definitely the way to go if you have a substantially larger target set - the project I initially used it on has a library of some 1500 terms that it's matching against and it performs very well.