Using Perl, how do I show the context around a search term in the search results? - perl

I am writing a Perl script that is searching for a term in large portions of text. What I would like to display back to the user is a small subset of the text around the search term, so the user can have context of where this search term is used. Google search results are a good example of what I am trying to accomplish, where the context of your search term is displayed under the title of the link.
My basic search is using this:
if ($text =~ /$search/i ) {
print "${title}:${text}\n";
}
($title contains the title of the item the search term was found in)
This is too much though, since sometimes $text will be holding hundreds of lines of text.
This is going to be displayed on the web, so I could just provide the title as a link to the actual text, but there is no context for the user.
I tried modifying my regex to capture 4 words before and 4 words after the search term, but ran into problems if the search term was at the very beginning or very end of $text.
What would be a good way to accomplish this? I tried searching CPAN because I'm sure someone has a module for this, but I can't think of the right terms to search for. I would like to do this without modules if possible, because getting modules installed here is a pain. Does anyone have any ideas?

You can use $and $' to get the string before and after the match. Then truncate those values appropriately. But as blixtor points out, shlomif is correct to suggest using#+and#-to avoid the performance penalty imposed by $ and #' -
$foo =~ /(match)/;
my $match = $1;
#my $before = $`;
#my $after = $';
my $before = substr($foo, 0, $-[0]);
my $after = substr($foo, $+[0]);
$after =~ s/((?:(?:\w+)(?:\W+)){4}).*/$1/;
$before = reverse $before; # reverse the string to limit backtracking.
$before =~ s/((?:(?:\W+)(?:\w+)){4}).*/$1/;
$before = reverse $before;
print "$before -> $match <- $after\n";

Your initial attempt at 4 words before/after wasn't too far off.
Try:
if ($text =~ /((\S+\s+){0,4})($search)((\s+\S+){0,4})/i) {
my ($pre, $match, $post) = ($1, $3, $4);
...
}

I would suggest using the positional parameters - #+ and #- (see perldoc perlvar) to find the position in the string of the match, and how much it takes.

You could try the following:
if ($text =~ /(.*)$search(.*)/i ) {
my #before_words = split ' ', $1;
my #after_words = split ' ',$2;
my $before_str = get_last_x_words_from_array(#before_words);
my $after_str = get_first_x_words_from_array(#after_words);
print $before_str . ' ' . $search . ' ' . $after_str;
}
Some code obviously omitted, but this should give you an idea of the approach.
As far as extracting the title ... I think this approach does not lend itself to that very well.

Related

How can I remove all the vowels unless they are in word beginnings?

$text = "I like apples more than oranges\n";
#words = split /” “/, $text;
foreach (#words) [1..] {
if $words "AEIOUaeiou";
$words =~ tr/A E I O U a e i o u//d;
}
print "$words\n";
"I like apples more than oranges" will become "I lk appls mr thn orngs". "I" in "I", "a" in "appls" and "o" in "orngs" will stay because they are the first letter in the word.
This is my research assignment as a first year student. I am allowed to ask questions and later cite them. Please don't be mean.
I know you say you are not allowed to use a regex, but for everyone else that shows up here I'll show the use of proper tools. But, then I'll do something just as useful with tr///.
One of the tricks of programming (and mathematics) decomposing what look like hard problems into easier problems, especially if you already have solutions for the easy problems. (Read about Parnas decomposition, for example).
So, the question is "How can I remove all the vowels unless they are in word beginnings?" (after I made your title a bit shorter). This led the answers to think about words, so they split up the input, did some work to ensure they weren't working on the first character, and then reassembled the result.
But, another way to frame the problem is "How do I remove all the vowels that come after another letter?". The only letter that doesn't come after another letter is the first letter of a word.
The regex for a vowel that comes after another letter is simple (but I'll stick to ASCII here, although it is just as simple for any Unicode letter):
[a-z][aeiou]
That only matches when there is a vowel after the first letter. Now you want to replace all of those with nothing. Use the substitution operator, s///. The /g flag makes all global substitutions and the /i makes it case insensitive:
s/[a-z][aeiou]//gi;
But, there's a problem. It also replaces that leading letter. That's easy enough to fix. The \K in a substitution says to ignore the part of the pattern before it in the replacement. Anything before the \K is not replaced. So, this only replaces the vowels:
s/[a-z]\K[aeiou]//gi;
But, maybe there are vowels next to each other, so throw in the + quantifier for "one or more" of the preceding item:
s/[a-z]\K[aeiou]+//gi;
You don't need to care about words at all.
Some other ways
Saying that a letter must follow another letter has a special zero-width assertion: the non-word boundary, \B (although that also counts digits and underscore as "letters"):
s/\B[aeiou]+//gi;
The \K was introduced v5.10 and was really a nifty trick to have a variable-width lookbehind. But, the lookbehind here is fixed width: it's one character:
s/(?<=[a-z])[aeiou]+//gi;
But, caring about words
Suppose you need to handle each word separately, for some other requirement. It looks like you've mixed a little Python-ish sort of code, and it would be nice if Perl could do that :). The problem doesn't change that much because you can do the same thing for each individual word.
foreach my $word ( split /\s+/, $x ) {
.... # same thing for each word
}
But, here's an interesting twist? How do you put it all back together? The other solutions just use a single space assuming that's the separator. Maybe there should be two spaces, or tabs, or whatever. The split has a special "separator retention mode" that can keep whatever was between the pieces. When you have captures in the split pattern, those capture values are part of the output list:
my #words_and_separators = split /(\s+)/, $x;
Since you know that none of the separators will have vowels, you can make substitutions on them knowing they won't change. This means you can treat them just like the words (that is, there is no special case, which is another thing to think about as you decompose problems). To get your final string with the original spacing, join on the empty string:
my $ending_string = join '', #words_and_separators;
So, here's how that might all look put together. I'll add the /r flag on the substitution so it returns the modified copy instead of working on the original (don't modify the control variable!):
my #words;
foreach my $word ( split /(\s+)/, $x ) {
push #words, $word =~ s/\B[aeiou]+//gr;
}
my $ending_string = join '', #words;
But, that foreach is a bit annoying. This list pipeline is the same, and it's easier to read these bottom to top. Each thing produces a list that flows into the thing above it. This is how I'd probably express it in real code:
my $ending_string =
join '',
map { s/\B[aeiou]+//gr } # each item is in $_
split /(\s+)/, $x;
Now, here's the grand finale. What if we didn't split thing up on whitespace but on whitespace and the first letter of each word? With separator retention mode we know that we only have to affect every other item, so we count them as we do the map:
my $n = 0;
my $ending_string =
join '',
map { ++$n % 2 ? tr/aeiouAEIOU//dr : $_ }
split /((?:^|\s+)[a-z])/i, $x;
But, I wouldn't write this technique in this way because someone would ultimately find me and exact their revenge. Instead, that foreach I found annoying before may soothe the angry masses:
my $n = 0;
foreach ( split /((?:^|\s+)[a-z])/i, $x ) {
print ++$n % 2 ? tr/aeiouAEIOU//dr : $_;
}
This now remembers the actual separators from the original string and leaves alone the first character of the "word" because it's not in the element we will modify.
The code in the foreach doesn't need to use the conditional operator, ?: or some of the other features. The important part is skipping every other element. That split pattern is a bit of a puzzler if you haven't seen it before, but that's what you get with those sorts of requirements. I think modifying a portion of the substring is just as likely to trip up people on a first read.
I mean, if they are going to make you do it the wrong way in the homework, strike back with something that will take up a bit of their time. :)
Oh, this is fun
I had another idea, because tr/// has another task beyond transliteration. It also counts. Because it returns the number of replacements, if you replace anything with itself, you get a count of the occurrences of that thing. You can count vowels, for instance:
my $has_vowels = $string =~ tr/aeiou/aeiou/; # counts vowels
But, with a string of one letter, that means you have a way to tell if it is a vowel:
my $is_vowel = substr( $string, $i, 1 ) =~ tr/aeiou/aeiou/;
You also can know things about the previous character:
my $is_letter = substr( $string, $i - 1, 1 ) =~ tr/a-zA-Z/a-zA-Z/;
Put that together and you can look at any position and know if it's a vowel that follows a letter. If so, you skip that letter. Otherwise, you add that letter to the output:
use v5.10;
$x = "I like apples more than oranges oooooranges\n";
my $output = substr $x, 0, 1; # avoid the -1 trap (end of string!)
for( my $i = 1; $i < length $x; $i++ ) {
if( substr( $x, $i, 1 ) =~ tr/aeiou/aeiou/ ) { # is a vowel
next if substr( $x, $i - 1, 1 ) =~ tr/a-zA-Z/a-zA-Z/;
}
$output .= substr $x, $i, 1;
}
say $output;
This has the fun consequence of using the recommended operator but completely bypassing the intent. But, this is a proper and intended use of tr///.
It appears that you need to put a little more effort into learning Perl before taking on challenges like this. Your example contains a lot of code that simply isn't valid Perl.
$x = "I like apples more than oranges\n"; #the original sentence
foreach $i in #x[1..] {
You assign your text to the scalar variable $x, but then try to use the array variable #x. In Perl, these are two completely separate variables that have no connection whatsoever. Also, in Perl, the range operator (..) needs values at both ends.
If you had an array called #x (and you don't, you have a scalar) then you could do what you're trying to do here with foreach $i (#x)
if $i "AEIOUaeiou";
I'm not sure what you're trying to do here. I guess the nearest useful Perl expression I can see would be something like:
if ($i =~ /^[AEIOUaeiou]$/)
Which would test if $i is a vowel. But that's a regex, so you're not allowed to use it.
Obviously, I'd solve this problem with a regex, but as those are banned, I've reached for some slightly more obscure Perl features in my code below (that's so your teacher won't believe this is your solution if you just cut and paste it):
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $text = "I like apples more than oranges\n";
# Split the string into an array of words
my #words = split /\s+/, $text;
# For each word...
for (#words) {
# Get a substring that omits the first character
# and use tr/// to remove vowels from that substring
substr($_, 1) =~ tr/AEIOUaeiou//d;
}
# Join the array back together
$text = join ' ', #words;
say $text;
Update: Oh, and notice that I've used tr/AEIUOaeiou//d where you have tr/A E I O U a e i o u//d. It probably won't make any difference here (depending on your approach - but you'll probably be applying it to strings that don't contain spaces) but it's good practice to only include the characters that you want to remove.
We can go over the input string from the end and remove any vowel that's not preceded by a space. We go from right to left so we don't have to adjust the position after each deletion. We don't need to check the very first letter, it shouldn't be ever removed. To remove a vowel, we can use tr///d on the substr of the original string.
for my $i (reverse 1 .. length $x) {
substr($x, $i, 1) =~ tr/aeiouAEIOU//d
if substr($x, $i - 1, 1) ne ' ';
}
Firstly your if statement is wrong.
Secondly this is not a Perl code.
Here is a piece of code that will work, but there is a better way to do it
my $x = "I like apples more than oranges\n";
my $new = "";
my #arr;
foreach my $word (split(' ', $x)) {
#arr = split('', $word);
foreach (my $i; $i<scalar #arr; $i++){
if ($i == 0){
$new .= $arr[$i];
}
elsif (index("AEIOUaeiou", $arr[$i]) == -1) {
$new .= $arr[$i];
}
}
$new .= " ";
}
print "$new\n";
Here I am splitting the string in order to get an array, then I am checking if the given char is a vowel, if it's not, I am appending it to a new string.
Always include
use strict;
use warnings;
on top of your code.
Clearly this is an exercise in lvalues. Obviously. Indubitably!
#!/usr/bin/env perl
# any old perl will do
use 5.010;
use strict;
use warnings;
# This is not idomatic nor fantastic code. Idiotastic?
$_='I am yclept Azure-Orange, queueing to close a query. How are YOU?';
# My little paws typed "local pos" and got
# "Useless localization of match position" :(
# so a busy $b keeps/restores that value
while (/\b./g) {
substr($_,$b=pos,/\b/g && -$b+pos)
# Suggestion to use tr is poetic, not pragmatic,
# ~ tr is sometimes y and y is sometimes a vowel
=~ y/aeiouAEIOU//d;
pos=$b;
}
say
# "say" is the last word.
Was there an embargo against using s/// substitution, or against using all regex? For some reason I thought matching was OK, just not substitution. If matches are OK, I have an idea that "improves" upon this by removing $b through pattern matching side effects. Will see if it pans out. If not, should be pretty easy to replace /\b/ and pos with index and variables, though the definition of word boundary over-simplifies in that case.
(edit) here it is a little more legible with nary a regex
my $text="YO you are the one! The-only-person- asking about double spaces.
Unfortunate about newlines...";
for (my $end=length $text;
$end > 0 && (my $start = rindex $text,' ',$end);
$end = $start-1) {
# y is a beautiful letter, using it for vowels is poetry.
substr($text,2+$start,$end-$start) =~ y/aeiouUOIEA//d;
}
say $text;
Maybe more devious minds will succeed with vec, unpack, open, fork?
You can learn about some of these techniques via
perldoc -f substr
perldoc -f pos
perldoc re
As for my own implementer notes, the least important thing is ending without punctuation so nothing can go after

Perl: How to count number of times 3-word phrase (with gaps) occurs within an N-word window

I'm trying to count the number of times a 3-word phrase occurs within a 12-word window in a document, but the difficulty is that the keywords I'm searching for can be spread throughout the window.
For example:
I want to find the phrase "expect bad weather" within a 12-word phrase, where other words can be inserted between the 3 desired words as long as the total phrase in which the 3 words are contained does not exceed 12 words.
Phrases which would work:
I expect there will be bad weather.
They expect bad and windy weather.
I expect, although no one has confirmed this, that bad weather is on
the way.
I've struggled to figure out how to do this. I know how to count occurrences of 2-word phrases where there can be a gap between. For example, if I'm counting how often "expect" and "weather" occur within a 12-word phrase, I can do:
$mycount =()= $text =~ /\b(?:expect\W+(?:\w+\W+){0,10}?weather)\b/gi;
However, it's not as simple when I want to do this with 3 words, because I end up with 2 gaps which must sum together so that my window doesn't exceed 12 words. Ideally I would be able to do something like:
$mycount =()= $text =~ /\b(?:expect\W+(?:\w+\W+){0,$Gap1}?bad\W+(?:\w+\W+){0,$Gap2}?weather)\b/gi;
Where $Gap2 = 9 - $Gap1, but I don't think there's a way to do this.
I also thought of creating a loop so that in one iteration of the loop, $Gap1=0 and $Gap2=9, in the second iteration $Gap1=1 and $Gap2=8, etc, and then adding the counts of all of the loops. However, doing this will doublecount some instances of the phrase.
I'm at a loss. Does anyone have any ideas? I can't find any relevant examples anywhere.
Note   This post addresses the question of finding words spread out within a window, as asked. It does not consider the far more involved issues of general text parsing or language analysis.
The code below searches for the first word and then continues with another regex for the other two. There it scans the text word by word and keeps a counter so it can stop at 12 words. It uses pos to control where it should continue after checking the window.
The 12-long window is taken to start with word expect once it is found, as clarified in comments. The search continues from after the completed phrase, for the next one.
If the phrase is not found within the next 11 words the engine is returned to the position after expect to carry on with the search (as there may be another expect within the checked 11 words).
use warnings;
use strict;
use feature 'say';
my $s = q(I expect, although no one confirmed, that bad weather is on the way.)
. q( Expect that we cannot expect to escape the bad, bad weather.);
my $word_range = 12;
my ($w1, $w2, $w3) = qw(expect bad weather);
FIRST_WORD: while ($s =~ /\b($w1)\b/gi) {
#say "SEARCH, at ", pos $s;
my ($one, $pos_one) = ($1, pos $s);
my ($two, $three, $cnt);
while ($s =~ /(\w+)/g) {
my $word = $1;
#say "\t$word ... (at ", pos $s, ")";
$two = $1 if $word =~ /\b($w2)\b/i;
if ( $two and (($three) = $word =~ /\b($w3)\b/i) ) {
say "$one + $two + $three (pos ", pos $s, ')';
next FIRST_WORD;
}
last if ++$cnt == $word_range-1; # failed (these 11 + 'expect')
}
pos $s = $pos_one; # return to position in string after 'expect'
}
Note that one cannot assign the match (for $one) inside the loop condition as that puts the matching in the list context and thus disturbs the needed behavior of /g and pos.
The prints which are commented out can be used to track the operation. As it stands this prints
expect + bad + weather (pos 53)
Expect + bad + weather (pos 128)
I extend the string to test multiple occurrences of the phrase. The operation with failed matches can be tested by crippling keywords and tracking the position in the search.
A possible extra keyword inside of the phrase, as in the second sentence, is ignored and the phrase is accepted if there, as this is unspecified but implicit in the question. This is easily changed.
If there were more words in the phrase they would all be sought in the inner while loop, in the same way as the last two are now, by matching them sequentially (requiring for each word that all preceding words had been found). The outer while loop is needed only to start the window.
After a failed window-scan the outer while continues its search for expect from the position of the window beginning, thus scanning the same 11 words again.
This repeated search through the text can be reduced by checking for expect as well during the window scan. Then scan afresh from that position, with the inner while
# First sentence shortened and now does not contain the phrase
my $s = q(I expect, although no one confirmed, that bad expect.)
. q( Expect that we cannot expect to escape the bad, bad weather.);
...
FIRST_WORD: while ($s =~ /\b($w1)\b/gi) {
my ($one, $pos_one) = ($1, pos $s);
my ($two, $three, $cnt, $pos_one_new);
while ($s =~ /(\w+)/g) {
my $word = $1;
#say "\t$word ... (at ", pos $s, ")";
$pos_one_new = pos $s
if not $pos_one_new and $word =~ /\b$w1\b/i;
$two = $1 if $word =~ /\b($w2)\b/i;
if ( $two and (($three) = $word =~ /\b($w3)\b/i) ) {
say "$one + $two + $three (pos ", pos $s, ')';
next FIRST_WORD;
}
if (++$cnt == $word_range-1) {
last if not $pos_one_new;
#say "Scan window anew from $pos_one_new";
pos $s = $pos_one_new;
$pos_one = $pos_one_new;
$pos_one_new = 0;
$two = $three = '';
$cnt = 0;
}
}
pos $s = $pos_one;
}
This prints
expect + bad + weather (pos 113)
Note that the first occurrence of expect within the window is used.
Since you mentioned processing a document, I assume you're working with a long string of sentences. So you could have:
I am unsure why I always expect bad from people. Weather isn't an
indicator.
I assume that this would NOT be desirable to mark as an occurrence of the target phrase "expect bad weather".
You've already got one great answer that is purely regex oriented. You could easily fix the cross sentence phrase detection bug it features by splitting on the sentence, like I do here. Despite that, I thought I'd show another way to think about the problem.
The key concepts are tokenize and normalize.
First we turn the corpus into a list of sentences. That's tokenization level one.
Seconds we turn each sentence into a string of lower case words with punctuation (except apostrophe) removed. Tokenization level two and normalization.
Now all you have to do is sift through all your piles of tokens and see if any contain the target tokens.
I handle backtracking in a very lazy way by looping over the corpus text looking for places where we match the first word of our target. Where that happens, I grab up to the maximum number of words from the corpus and check to see if the target list is contained in that list. This gives a nice backtracking behavior without all the book-keeping.
use strict;
use warnings;
use feature 'say';
use Lingua::Sentence;
my $doc = "I am unsure why I always expect bad from people. Weather isn't an indicator. My mood is fine whether it is sunny or I expect to see some bad weather.";
my #targets = (
[qw/ expect bad weather /],
[qw/ my mood is bad /],
);
my $max_phrase_length = 12;
my $splitter = Lingua::Sentence->new('en');
my #sentences = $splitter->split_array( $doc );
my %counter;
for my $sentence ( #sentences ) {
say "Checking sentence: $sentence";
my #words = map lc, # Normalize to lowercase
map /(['\w]*)/, # get rid of punctuation
split /\s+/, $sentence; # Break sentence into words
for my $target ( #targets ) {
say " Checking target: #$target";
for my $i (0..$#words ) {
my $word = #words[$i];
say " Checking $word";
next if $word ne $target->[0];
my $first_word = $i;
my $last_word = $i + $max_phrase_length -1;
$last_word = $#words if $last_word > $#words;
if ( has_phrase( $target, [ #words[$first_word..$last_word] ] ) ) {
say "BINGO! #$target";
$counter{ "#$target" }++;
}
}
}
}
use Data::Dumper;
print Dumper \%counter;
sub has_phrase {
my ( $target, $text ) = #_;
return if #$target > $text;
my $match_idx = 0;
for my $idx ( 0..$#$text ) {
if ($target->[$match_idx] eq $text->[$idx]) {
$match_idx++;
return 1 if $match_idx eq #$target;
}
}
return;
}
Your requirements are a little vague to me. Like I don't know if you want to take in any sequence of words, and then count "expect .* bad .* weather", or if you want to only take 12 words and ignore the rest, or if you want to slide along a word at a time and never look at more than 12 words at a time.
I figured I'd simplify it:
I take the whole line input; I throw out whatever words aren't expect, bad, or weather; then I count the number of "expect bad weather" occurrences thereafter. If something says "expect bad bad weather" That's not a match. I'm sure you can modify this with your more exact requirements AS you understand them better than I.
while(<>){
$_=lc;
#w=split(/\W+/);
#w=map {
if (/expect/) {1}
elsif (/bad/) {2}
elsif (/weather/) {3}
else {0}
} #w;
$_ = join("", #w);
print;
#w=grep {+$_>0} #w;
$_ = join("", #w);
print "=>$_";
#r=/123/g;
print "=".scalar(#r)."\n";
}
Examples:
hi! Expect really bad weather man.
010230=>123=1
hi! Expect really bad weather man.hi! Expect really bad weather man.hi! Expect really bad weather man.
010230010230010230=>123123123=3
Expect expect bad weather, expect bad bad bad weather, expect bad expect weather.
1123122231213=>1123122231213=1
You can also sort of one-line this, but I think scalar(/123/g) means something different than #r=/123/g;scalar #r; so I put scalar(#_=/123/g).
$ perl -nE '$_=lc;$_=join("",grep{+$_>0}map{if(/expect/){1}elsif(/bad/){2}elsif(/weather/){3}else{0}}split(/\W+/));say scalar(#_=/123/g)."\n";'
hi! Expect really bad weather man.
1
hi! Expect really bad weather man. hi! Expect really bad weather man.
2
Expect Sad and Bad Weather today. Maybe Expect bad weather tomorrow too, because scalar is not helping.
2

How do I count the "real" words in a text with Perl?

I've run into a text processing problem. I've an article, and I'd like to find out how many "real" words there are.
Here is what I mean by "real". Articles usually contain various punctuation marks such as dashes, and commas, dots, etc. What I'd like to find out is how many words there are, skipping like "-" dashes and "," commas with spaces, etc.
I tried doing this:
my #words = split ' ', $article;
print scalar #words, "\n";
But that includes various punctuations that have spaces in them as words.
So I'm thinking of using this:
my #words = grep { /[a-z0-9]/i } split ' ', $article;
print scalar #words, "\n";
This would match all words that have characters or numbers in them. What do you think, would this be good enough way to count words in an article?
Does anyone know maybe of a module on CPAN that does this?
Try to use: \W - any non-word character, and also drop _
Solution
use strict;
my $article = 'abdc, dd_ff, 11i-11, ff44';
# case David's, but it didn't work with I'm or There's
$article =~ s/\'//g;
my $number_words = scalar (split /[\W_]+/, $article);
print $number_words;
I think your solution is about as good as you're going to get without resorting to something elaborate.
You could also write it as
my #words = $article =~ /\S*\w\S*/
or count the words in a file by writing
my $n = 0;
while (<>) {
my #words = /\S*\w\S*/g;
$n += #words;
}
say "$n words found";
Try a few sample blocks of text and look at the list of "words" that it finds. If you are happy with that then your code works.

Perl comparison operation between a variable and an element of an array

I am having quite a bit of trouble with a Perl script I am writing. I want to compare an element of an array to a variable I have to see if they are true. For some reason I cannot seem to get the comparison operation to work correctly. It will either evaluate at true all the time (even when outputting both strings clearly shows they are not the same), or it will always be false and never evaluate (even if they are the same). I have found an example of just this kind of comparison operation on another website, but when I use it it doesn't work. Am I missing something? Is the variable type I take from the file not a string? (Can't be an integer as far as I can tell as it is an IP address).
$ipaddress = '192.43.2.130'
if ($address[0] == ' ')
{
open (FH, "serverips.txt") or die "Crossroads could not find a list of backend servers";
#address = <FH>;
close(FH);
print $address[0];
print $address[1];
}
for ($i = 0; $i < #address; $i++)
{
print "hello";
if ($address[$i] eq $ipaddress)
{print $address[$i];
$file = "server_$i";
print "I got here first";
goto SENDING;}
}
SENDING:
print " I am here";
I am pretty weak in Perl, so forgive me for any rookie mistakes/assumptions I may have made in my very meager bit of code. Thank you for you time.
if ($address[0] == ' ')
{
open (FH, "serverips.txt") or die "Crossroads could not find a list of backend servers";
#address = <FH>;
close(FH);
You have several issues with this code here. First you should use strict because it would tell you that #address is being used before it's defined and you're also using numeric comparison on a string.
Secondly you aren't creating an array of the address in the file. You need to loop through the lines of the file to add each address:
my #address = ();
while( my $addr = <FH> ) {
chomp($addr); # removes the newline character
push(#address, $addr);
}
However you really don't need to push into an array at all. Just loop through the file and find the IP. Also don't use goto. That's what last is for.
while( my $addr = <FH> ) {
chomp($addr);
if( $addr eq $ipaddress ) {
$file = "server_$i";
print $addr,"\n";
print "I got here first"; # not sure what this means
last; # breaks out of the loop
}
}
When you're reading in from a file like that, you should use chomp() when doing a comparison with that line. When you do:
print $address[0];
print $address[1];
The output is on two separate lines, even though you haven't explicitly printed a newline. That's because $address[$i] contains a newline at the end. chomp removes this.
if ($address[$i] eq $ipaddress)
could read
my $currentIP = $address[$i];
chomp($currentIP);
if ($currentIP eq $ipaddress)
Once you're familiar with chomp, you could even use:
chomp(my $currentIP = $address[$i]);
if ($currentIP eq $ipaddress)
Also, please replace the goto with a last statement. That's perl's equivalent of C's break.
Also, from your comment on Jack's answer:
Here's some code you can use for finding how long it's been since a file was modified:
my $secondsSinceUpdate = time() - stat('filename.txt')->mtime;
You probably are having an issue with newlines. Try using chomp($address[$i]).
First of all, please don't use goto. Every time you use goto, the baby Jesus cries while killing a kitten.
Secondly, your code is a bit confusing in that you seem to be populating #address after starting the if($address[0] == '') statement (not to mention that that if should be if($address[0] eq '')).
If you're trying to compare each element of #address with $ipaddress for equality, you can do something like the following
Note: This code assumes that you've populated #address.
my $num_matches=0;
foreach(#address)
{
$num_matches++ if $_ eq $ipaddress;
}
if($num_matches)
{
#You've got a match! Do something.
}
else
{
#You don't have any matches. This may or may not be bad. Do something else.
}
Alternatively, you can use the grep operator to get any and all matches from #address:
my #matches=grep{$_ eq $ipaddress}#address;
if(#matches)
{
#You've got matches.
}
else
{
#Sorry, no matches.
}
Finally, if you're using a version of Perl that is 5.10 or higher, you can use the smart match operator (ie ~~):
if($ipaddress~~#address)
{
#You've got a match!
}
else
{
#Nope, no matches.
}
When you read from a file like that you include the end-of-line character (generally \n) in each element. Use chomp #address; to get rid of it.
Also, use last; to exit the loop; goto is practically never needed.
Here's a rather idiomatic rewrite of your code. I'm excluding some of your logic that you might need, but isn't clear why:
$ipaddress = '192.43.2.130'
open (FH, "serverips.txt") or die "Crossroads could not find a list of backend servers";
while (<FH>) { # loop over the file, using the default input space
chomp; # remove end-of-line
last if ($_ eq $ipaddress); # a RE could easily be used here also, but keep the exact match
}
close(FH);
$file = "server_$."; # $. is the line number - it's not necessary to keep track yourself
print "The file is $file\n";
Some people dislike using perl's implicit variables (like $_ and $.) but they're not that hard to keep track of. perldoc perlvar lists all these variables and explains their usage.
Regarding the exact match vs. "RE" (regular expression, or regexp - see perldoc perlre for lots of gory details) -- the syntax for testing a RE against the default input space ($_) is very simple. Instead of
last if ($_ eq $ipaddress);
you could use
last if (/$ipaddress/);
Although treating an ip address as a regular expression (where . has a special meaning) is probably not a good idea.

Simplest way to match array of strings to search in perl?

What I want to do is check an array of strings against my search string and get the corresponding key so I can store it. Is there a magical way of doing this with Perl, or am I doomed to using a loop? If so, what is the most efficient way to do this?
I'm relatively new to Perl (I've only written 2 other scripts), so I don't know a lot of the magic yet, just that Perl is magic =D
Reference Array: (1 = 'Canon', 2 = 'HP', 3 = 'Sony')
Search String: Sony's Cyber-shot DSC-S600
End Result: 3
UPDATE:
Based on the results of discussion in this question, depending on your intent/criteria of what constitutes "not using a loop", the map based solution below (see "Option #1) may be the most concise solution, provided that you don't consider map a loop (the short version of the answers is: it's a loop as far as implementation/performance, it's not a loop from language theoretical point of view).
Assuming you don't care whether you get "3" or "Sony" as the answer, you can do it without a loop in a simple case, by building a regular expression with "or" logic (|) from the array, like this:
my #strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
my $combined_search = join("|",#strings);
my #which_found = ($search_in =~ /($combined_search)/);
print "$which_found[0]\n";
Result from my test run: Sony
The regular expression will (once the variable $combined_search is interpolated by Perl) take the form /(Canon|HP|Sony)/ which is what you want.
This will NOT work as-is if any of the strings contain regex special characters (such as | or ) ) - in that case you need to escape them
NOTE: I personally consider this somewhat cheating, because in order to implement join(), Perl itself must do a loop somewhere inside the interpeter. So this answer may not satisfy your desire to remain loop-less, depending on whether you wanted to avoid a loop for performance considerations, of to have cleaner or shorter code.
P.S. To get "3" instead of "Sony", you will have to use a loop - either in an obvious way, by doing 1 match in a loop underneath it all; or by using a library that saves you from writing the loop yourself but will have a loop underneath the call.
I will provide 3 alternative solutions.
#1 option: - my favorite. Uses "map", which I personally still consider a loop:
my #strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
my $combined_search = join("|",#strings);
my #which_found = ($search_in =~ /($combined_search)/);
print "$which_found[0]\n";
die "Not found" unless #which_found;
my $strings_index = 0;
my %strings_indexes = map {$_ => $strings_index++} #strings;
my $index = 1 + $strings_indexes{ $which_found[0] };
# Need to add 1 since arrays in Perl are zero-index-started and you want "3"
#2 option: Uses a loop hidden behind a nice CPAN library method:
use List::MoreUtils qw(firstidx);
my #strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
my $combined_search = join("|",#strings);
my #which_found = ($search_in =~ /($combined_search)/);
die "Not Found!"; unless #which_found;
print "$which_found[0]\n";
my $index_of_found = 1 + firstidx { $_ eq $which_found[0] } #strings;
# Need to add 1 since arrays in Perl are zero-index-started and you want "3"
#3 option: Here's the obvious loop way:
my $found_index = -1;
my #strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
foreach my $index (0..$#strings) {
next if $search_in !~ /$strings[$index]/;
$found_index = $index;
last; # quit the loop early, which is why I didn't use "map" here
}
# Check $found_index against -1; and if you want "3" instead of "2" add 1.
Here is a solution that builds a regular expression with embedded code to increment the index as perl moves through the regex:
my #brands = qw( Canon HP Sony );
my $string = "Sony's Cyber-shot DSC-S600";
use re 'eval'; # needed to use the (?{ code }) construct
my $index = -1;
my $regex = join '|' => map "(?{ \$index++ })\Q$_" => #brands;
print "index: $index\n" if $string =~ $regex;
# prints 2 (since Perl's array indexing starts with 0)
The string that is prepended to each brand first increments the index, and then tries to match the brand (escaped with quotemeta (as \Q) to allow for regex special characters in the brand names).
When the match fails, the regex engine moves past the alternation | and then the pattern repeats.
If you have multiple strings to match against, be sure to reset $index before each. Or you can prepend (?{$index = -1}) to the regex string.
An easy way is just to use a hash and regex:
my $search = "your search string";
my %translation = (
'canon' => 1,
'hp' => 2,
'sony' => 3
);
for my $key ( keys %translation ) {
if ( $search =~ /$key/i ) {
return $translation{$key};
)
}
Naturally the return can just as easily be a print. You can also surround the entire thing in a while loop with:
while(my $search = <>) {
#your $search is declared = to <> and now gets its values from STDIN or strings piped to this script
}
Please also take a look at perl's regex features at perlre
and take a look at perl's data structures at perlref
EDIT
as was just pointed out to me you were trying to steer away from using a loop. Another method would be to use perl's map function. Take a look here.
You can also take a look at Regexp::Assemble, which will take a collection of sub-regexes and build a single super-regex from them that can then be used to test for all of them at once (and gives you the text which matched the regex, of course). I'm not sure that it's the best solution if you're only looking at three strings/regexes that you want to match, but it's definitely the way to go if you have a substantially larger target set - the project I initially used it on has a library of some 1500 terms that it's matching against and it performs very well.