Exact pattern match using perl index() function - perl

I am trying to use the index() function and I want to find the position of a word inside a string, only when it is an exact match. For example:
My string is STRING="CATALOG SCATTER CAT CATHARSIS"
And my search string is KEY=CAT
I want to say something like index($STRING, $KEY) and check match for CAT, and not CATALOG. How do I accomplish this? The documentation says
The index function searches for one string within another, but without the wildcard-like behavior of a full regular-expression pattern match.
which makes me think that it may not be that straight-forward, but my perl skills are limited :). Is it possible to do what I am trying to do?
Hopefully, I was able to articulate my question well. Thanks in advance for your help!

How about:
my $str = "CATALOG SCATTER CAT CATHARSIS";
my $key = "CAT";
if ($str =~ /\b$key\b/) {
say "match at char ",$-[0];;
} else {
say "no match";
}
output:
match at char 16

You need to learn about Regular Expressions in Perl. Perl didn't invent Regular Expressions, but tremendously expanded upon the concept. In fact, many other programming languages talk specifically about using Perl Regular Expressions.
A regular expression matches a specific word pattern. For example, /cat/ matches the sequence cat in a string.
if ( $string =~ /cat/ ) {
print "String contains the letters 'cat' in a row\n";
}
In many ways, this does the same thing as:
my $location = index ( $string, "cat" );
if ( $location =! -1 ) { # index returns -1 when substring isn't found
print "String contains the letters 'cat' in a row\n";
}
But, both of these would match:
"Don't let the cat out of the bag"
"The Sears catalog arrived in the mail"
You don't want to match the last. So, you could do this:
my $location = index $string, " cat ";
Now, index $string, " cat " won't match the word catalog. Case closed! Or is it? What about:
"cat and dog it doth rain."
Maybe you could check and say things are okay if a sentence starts with "cat":
if ( (index ($string, " cat ") != -1) or (index ($string, "cat") = 0) ) {
print "String contains the letters 'cat' in a row\n";
}
But, what about these?
"The word CAT in all uppercase"
"Stupid cat"
"Cat! Here Cat! Common Cat!": Punctuation after the word "cat"
"Don't let the 'cat' out of the 'bag'": Quotation Marks around "cat"
It could take dozens of lines to specify each and every one of these conditions.
However:
if ( $string =~ /\bcat\b/i ) {
print "String contains the word 'cat' in it\n";
}
Specifies each and every one -- and then some. The \b says this is a word boundary. This could be a space, a tab, a quote, the beginning or ending of a line. Thus /\bcat\b/ specifies that this should be the word cat and not catalog. The i on the end tells your regular expression to ignore case when matching, so you'll find Cat, cat, CAT, cAt, and all other possible combinations.
In fact, Perl's regular expressions is what made Perl such a popular language to begin with.
Fortunately, Perl comes with not one, but two tutorials on Regular Expressions:
perlretut: Perl Regular Expression Tutorial
perlrequick: Perl Regular Expression Quick Start.
Hope this helps.

That's (partial) solution of this problem with index:
use warnings;
use strict;
my $test = 'CATALOG SCATTER CAT CATHARSIS';
my $key = 'CAT';
my $k_length = length $key;
my $s_length = (length $test) - $k_length;
my $pos = -1;
while (($pos = index $test, $key, $pos + 1) > -1) {
if ($pos > 0) {
my $prev_char = substr $test, $pos - 1, 1;
### print "Previous character: '$prev_char'\n";
next if $prev_char ge 'A' && $prev_char le 'Z'
|| $prev_char ge 'a' && $prev_char le 'z';
}
if ($pos < $s_length) {
my $next_char = substr $test, $pos + $k_length, 1;
### print "Next character: '$next_char'\n";
next if $next_char ge 'A' && $next_char le 'Z'
|| $next_char ge 'a' && $next_char le 'z';
}
print "Word '$key' found at " . $pos + 1 . "th position.\n";
}
As you see, it's kinda wordy, because it uses basic Perl string functions - index and substr - only. Checking whether the substring found is indeed a word is done via checking its next and previous characters (if they exist): if they belong to either A-Z or a-z range, it's not a word.
You can simplify it a bit by trying to lowercase these characters (with lc), then check against the single character range only:
my $lc_prev_char = lc( substr $test, $pos - 1, 1 );
next if $lc_prev_char ge 'a' && $lc_prev_char le 'z';
... but then again, it's quite a minor improvement (if improvement at all).
Now consider this:
my $test = 'CATALOG SCATTER CAT CATHARSIS CAT';
my $key = 'CAT';
while ($test =~ /(?<![A-Za-z])$key(?![A-Za-z])/g) {
print "Word '$key' found at " . ($-[0] + 1) . "th position.\n";
}
... and that's it! The pattern literally tests the string given ($test) for the substring given ($key) not being either preceded with or followed by the symbol of A-Za-z range, and supporting Perl regex magic (this variable, in particular) makes it easy to get the starting position of such substring.
The bottom line: use regexes to do the regexes' work.

Regular expressions allow for the search to contain word boundaries as well as distinct characters. While
my $string = "CATALOG SCATTER CAT CATHARSIS";
index($string, 'CAT');
will return zero or greater if $string contains the characters CAT, a regular expression like
$string =~ /\bCAT\b/;
will return false as $string doesn't contain CAT preceded and followed by a word boundary. (A word boundary is either the beginning or end of the string, or between an word character and a non-word character. A word character is any alphanumeric character or an underscore.)

use \E value.
so :
#!usr/bin/perl
my $string ="Little Tony";
my $check = "Ton";
if($string =~ m/$check\E/g)
{
print "match";
}
else
{
die("No Match");
}

Related

How can I remove all the vowels unless they are in word beginnings?

$text = "I like apples more than oranges\n";
#words = split /” “/, $text;
foreach (#words) [1..] {
if $words "AEIOUaeiou";
$words =~ tr/A E I O U a e i o u//d;
}
print "$words\n";
"I like apples more than oranges" will become "I lk appls mr thn orngs". "I" in "I", "a" in "appls" and "o" in "orngs" will stay because they are the first letter in the word.
This is my research assignment as a first year student. I am allowed to ask questions and later cite them. Please don't be mean.
I know you say you are not allowed to use a regex, but for everyone else that shows up here I'll show the use of proper tools. But, then I'll do something just as useful with tr///.
One of the tricks of programming (and mathematics) decomposing what look like hard problems into easier problems, especially if you already have solutions for the easy problems. (Read about Parnas decomposition, for example).
So, the question is "How can I remove all the vowels unless they are in word beginnings?" (after I made your title a bit shorter). This led the answers to think about words, so they split up the input, did some work to ensure they weren't working on the first character, and then reassembled the result.
But, another way to frame the problem is "How do I remove all the vowels that come after another letter?". The only letter that doesn't come after another letter is the first letter of a word.
The regex for a vowel that comes after another letter is simple (but I'll stick to ASCII here, although it is just as simple for any Unicode letter):
[a-z][aeiou]
That only matches when there is a vowel after the first letter. Now you want to replace all of those with nothing. Use the substitution operator, s///. The /g flag makes all global substitutions and the /i makes it case insensitive:
s/[a-z][aeiou]//gi;
But, there's a problem. It also replaces that leading letter. That's easy enough to fix. The \K in a substitution says to ignore the part of the pattern before it in the replacement. Anything before the \K is not replaced. So, this only replaces the vowels:
s/[a-z]\K[aeiou]//gi;
But, maybe there are vowels next to each other, so throw in the + quantifier for "one or more" of the preceding item:
s/[a-z]\K[aeiou]+//gi;
You don't need to care about words at all.
Some other ways
Saying that a letter must follow another letter has a special zero-width assertion: the non-word boundary, \B (although that also counts digits and underscore as "letters"):
s/\B[aeiou]+//gi;
The \K was introduced v5.10 and was really a nifty trick to have a variable-width lookbehind. But, the lookbehind here is fixed width: it's one character:
s/(?<=[a-z])[aeiou]+//gi;
But, caring about words
Suppose you need to handle each word separately, for some other requirement. It looks like you've mixed a little Python-ish sort of code, and it would be nice if Perl could do that :). The problem doesn't change that much because you can do the same thing for each individual word.
foreach my $word ( split /\s+/, $x ) {
.... # same thing for each word
}
But, here's an interesting twist? How do you put it all back together? The other solutions just use a single space assuming that's the separator. Maybe there should be two spaces, or tabs, or whatever. The split has a special "separator retention mode" that can keep whatever was between the pieces. When you have captures in the split pattern, those capture values are part of the output list:
my #words_and_separators = split /(\s+)/, $x;
Since you know that none of the separators will have vowels, you can make substitutions on them knowing they won't change. This means you can treat them just like the words (that is, there is no special case, which is another thing to think about as you decompose problems). To get your final string with the original spacing, join on the empty string:
my $ending_string = join '', #words_and_separators;
So, here's how that might all look put together. I'll add the /r flag on the substitution so it returns the modified copy instead of working on the original (don't modify the control variable!):
my #words;
foreach my $word ( split /(\s+)/, $x ) {
push #words, $word =~ s/\B[aeiou]+//gr;
}
my $ending_string = join '', #words;
But, that foreach is a bit annoying. This list pipeline is the same, and it's easier to read these bottom to top. Each thing produces a list that flows into the thing above it. This is how I'd probably express it in real code:
my $ending_string =
join '',
map { s/\B[aeiou]+//gr } # each item is in $_
split /(\s+)/, $x;
Now, here's the grand finale. What if we didn't split thing up on whitespace but on whitespace and the first letter of each word? With separator retention mode we know that we only have to affect every other item, so we count them as we do the map:
my $n = 0;
my $ending_string =
join '',
map { ++$n % 2 ? tr/aeiouAEIOU//dr : $_ }
split /((?:^|\s+)[a-z])/i, $x;
But, I wouldn't write this technique in this way because someone would ultimately find me and exact their revenge. Instead, that foreach I found annoying before may soothe the angry masses:
my $n = 0;
foreach ( split /((?:^|\s+)[a-z])/i, $x ) {
print ++$n % 2 ? tr/aeiouAEIOU//dr : $_;
}
This now remembers the actual separators from the original string and leaves alone the first character of the "word" because it's not in the element we will modify.
The code in the foreach doesn't need to use the conditional operator, ?: or some of the other features. The important part is skipping every other element. That split pattern is a bit of a puzzler if you haven't seen it before, but that's what you get with those sorts of requirements. I think modifying a portion of the substring is just as likely to trip up people on a first read.
I mean, if they are going to make you do it the wrong way in the homework, strike back with something that will take up a bit of their time. :)
Oh, this is fun
I had another idea, because tr/// has another task beyond transliteration. It also counts. Because it returns the number of replacements, if you replace anything with itself, you get a count of the occurrences of that thing. You can count vowels, for instance:
my $has_vowels = $string =~ tr/aeiou/aeiou/; # counts vowels
But, with a string of one letter, that means you have a way to tell if it is a vowel:
my $is_vowel = substr( $string, $i, 1 ) =~ tr/aeiou/aeiou/;
You also can know things about the previous character:
my $is_letter = substr( $string, $i - 1, 1 ) =~ tr/a-zA-Z/a-zA-Z/;
Put that together and you can look at any position and know if it's a vowel that follows a letter. If so, you skip that letter. Otherwise, you add that letter to the output:
use v5.10;
$x = "I like apples more than oranges oooooranges\n";
my $output = substr $x, 0, 1; # avoid the -1 trap (end of string!)
for( my $i = 1; $i < length $x; $i++ ) {
if( substr( $x, $i, 1 ) =~ tr/aeiou/aeiou/ ) { # is a vowel
next if substr( $x, $i - 1, 1 ) =~ tr/a-zA-Z/a-zA-Z/;
}
$output .= substr $x, $i, 1;
}
say $output;
This has the fun consequence of using the recommended operator but completely bypassing the intent. But, this is a proper and intended use of tr///.
It appears that you need to put a little more effort into learning Perl before taking on challenges like this. Your example contains a lot of code that simply isn't valid Perl.
$x = "I like apples more than oranges\n"; #the original sentence
foreach $i in #x[1..] {
You assign your text to the scalar variable $x, but then try to use the array variable #x. In Perl, these are two completely separate variables that have no connection whatsoever. Also, in Perl, the range operator (..) needs values at both ends.
If you had an array called #x (and you don't, you have a scalar) then you could do what you're trying to do here with foreach $i (#x)
if $i "AEIOUaeiou";
I'm not sure what you're trying to do here. I guess the nearest useful Perl expression I can see would be something like:
if ($i =~ /^[AEIOUaeiou]$/)
Which would test if $i is a vowel. But that's a regex, so you're not allowed to use it.
Obviously, I'd solve this problem with a regex, but as those are banned, I've reached for some slightly more obscure Perl features in my code below (that's so your teacher won't believe this is your solution if you just cut and paste it):
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $text = "I like apples more than oranges\n";
# Split the string into an array of words
my #words = split /\s+/, $text;
# For each word...
for (#words) {
# Get a substring that omits the first character
# and use tr/// to remove vowels from that substring
substr($_, 1) =~ tr/AEIOUaeiou//d;
}
# Join the array back together
$text = join ' ', #words;
say $text;
Update: Oh, and notice that I've used tr/AEIUOaeiou//d where you have tr/A E I O U a e i o u//d. It probably won't make any difference here (depending on your approach - but you'll probably be applying it to strings that don't contain spaces) but it's good practice to only include the characters that you want to remove.
We can go over the input string from the end and remove any vowel that's not preceded by a space. We go from right to left so we don't have to adjust the position after each deletion. We don't need to check the very first letter, it shouldn't be ever removed. To remove a vowel, we can use tr///d on the substr of the original string.
for my $i (reverse 1 .. length $x) {
substr($x, $i, 1) =~ tr/aeiouAEIOU//d
if substr($x, $i - 1, 1) ne ' ';
}
Firstly your if statement is wrong.
Secondly this is not a Perl code.
Here is a piece of code that will work, but there is a better way to do it
my $x = "I like apples more than oranges\n";
my $new = "";
my #arr;
foreach my $word (split(' ', $x)) {
#arr = split('', $word);
foreach (my $i; $i<scalar #arr; $i++){
if ($i == 0){
$new .= $arr[$i];
}
elsif (index("AEIOUaeiou", $arr[$i]) == -1) {
$new .= $arr[$i];
}
}
$new .= " ";
}
print "$new\n";
Here I am splitting the string in order to get an array, then I am checking if the given char is a vowel, if it's not, I am appending it to a new string.
Always include
use strict;
use warnings;
on top of your code.
Clearly this is an exercise in lvalues. Obviously. Indubitably!
#!/usr/bin/env perl
# any old perl will do
use 5.010;
use strict;
use warnings;
# This is not idomatic nor fantastic code. Idiotastic?
$_='I am yclept Azure-Orange, queueing to close a query. How are YOU?';
# My little paws typed "local pos" and got
# "Useless localization of match position" :(
# so a busy $b keeps/restores that value
while (/\b./g) {
substr($_,$b=pos,/\b/g && -$b+pos)
# Suggestion to use tr is poetic, not pragmatic,
# ~ tr is sometimes y and y is sometimes a vowel
=~ y/aeiouAEIOU//d;
pos=$b;
}
say
# "say" is the last word.
Was there an embargo against using s/// substitution, or against using all regex? For some reason I thought matching was OK, just not substitution. If matches are OK, I have an idea that "improves" upon this by removing $b through pattern matching side effects. Will see if it pans out. If not, should be pretty easy to replace /\b/ and pos with index and variables, though the definition of word boundary over-simplifies in that case.
(edit) here it is a little more legible with nary a regex
my $text="YO you are the one! The-only-person- asking about double spaces.
Unfortunate about newlines...";
for (my $end=length $text;
$end > 0 && (my $start = rindex $text,' ',$end);
$end = $start-1) {
# y is a beautiful letter, using it for vowels is poetry.
substr($text,2+$start,$end-$start) =~ y/aeiouUOIEA//d;
}
say $text;
Maybe more devious minds will succeed with vec, unpack, open, fork?
You can learn about some of these techniques via
perldoc -f substr
perldoc -f pos
perldoc re
As for my own implementer notes, the least important thing is ending without punctuation so nothing can go after

Best way to parse string in perl

To achieve below task I have written below C like perl program (As I am new to Perl), But I am not sure if this is the best way to achieve.
Can someone please guide?
Note: Not with the full program, But where I can make improvement.
Thanks in advance
Input :
$str = "mail1, local<mail1#mail.local>, mail2#mail.local, <mail3#mail.local>, mail4 local<mail4#mail.local>"
Expected Output :
mail1, local<mail1#mail.local>
mail2#mail.local
<mail3#mail.local>
mail4, local<mail4#mail.local>
Sample Program
my $str="mail1, \#local<mail1\#mail.local>, mail2\#mail.local, <mail3\#mail.local>, mail4, local<mail4\#mail.local>";
my $count=0, #array, $flag=0, $tempStr="";
for my $c (split (//,$str)) {
if( ($count eq 0) and ($c eq ' ') ) {
next;
}
if($c) {
if( ($c eq ',') and ($flag eq 1) ) {
push #array, $tempStr;
$count=0;
$flag1=0;
$tempStr="";
next;
}
if( ($c eq '>' ) or ( $c eq '#' ) ) {
$flag=1;
}
$tempStr="$tempStr$c";
$count++;
}
}
if($count>0) {
push #array, $tempStr;
}
foreach my $var (#array) {
print "$var\n";
}
Edit:
Input:
Input is the output of above code.
Expected Output :
"mail1, local"<mail1#mail.local>
"mail4, local"<mail4#mail.local>
Sample Code:
$str =~ s/([^#>]+[#>][^,]+),\s*/$1\n/g;
my #addresses = split('\n',$str);
if(scalar #addresses) {
foreach my $address (#addresses) {
if (($address =~ /</) and ($address !~ /\"/) and ($address !~ /^</)){
$address="\"$address";
$address=~ s/</\"</g;
}
}
$str = join(',',#addresses);
}
print "$str\n";
As I see, you want to replace each:
comma and following spaces,
occurring after either # or >,
with a newline.
To make such replacement, instead of writing a parsing program, you can use
a regex.
The search part can be as follows:
([^#>]+[#>][^,]+),\s*
Details:
( - Start of the 1st capturing group.
[^#>]+ - A non-empty sequence of chars other than # or >.
[#>] - Either # or >.
[^,]+ - A non-empty sequence of chars other than a comma.
) - End of the 1st capturing group.
,\s* - A comma and optional sequence of spaces.
The replace part should be:
$1 - The 1st capturing group.
\n - A newline.
So the whole program, much shorter than yours, can be as follows:
my $str='mail1, local<mail1#mail.local>, mail2#mail.local, <mail3#mail.local>, mail4, local<mail4#mail.local>';
print "Before:\n$str\n";
$str =~ s/([^#>]+[#>][^,]+),\s*/$1\n/g;
print "After:\n$str\n";
To replace all needed commas I used g option.
Note that I put the source string in single quotes, otherwise Perl
would have complained about Possible unintended interpolation of #mail.
Edit
Your modified requirements must be handled different way.
"Ordinary" replacement is not an option, because now there are some
fragments to match and some framents to ignore.
So the basic idea is to write a while loop with a matching regex:
(\w+),?\s+(\w+)(<[^>]+>), meaning:
(\w+) - First capturing group - a sequence of word chars (e.g. mail1).
,?\s+ - Optional comma and a sequence of spaces.
(\w+) - Second capturing group - a sequence of word chars (e.g. local).
(<[^>]+>) - Third capturing group - a sequence of chars other than >
(actual mail address), enclosed in angle brackets, e.g. <mail1#mail.local>.
Within each execution of the loop you have access to the groups
captured in this particular match ($1, $2, ...).
So the content of this loop is to print all these captured groups,
with required additional chars.
The code (again much shorter than yours) should look like below:
my $str = 'mail1, local<mail1#mail.local>, mail2#mail.local, <mail3#mail.local>, mail4 local<mail4#mail.local>';
while ($str =~ /(\w+),?\s+(\w+)(<[^>]+>)/g) {
print "\"$1, $2\"$3\n";
}
Here is an approach using split, which in this case also needs a careful regex
use warnings;
use strict;
use feature 'say';
my $string = # broken into two parts for readabililty
q(mail1, local<mail1#mail.local>, mail2#mail.local, )
. q(<mail3#mail.local>, mail4, local<mail4#mail.local>);
my #addresses = split /#.+?\K,\s*/, $string;
say for #addresses;
The split takes a full regex in its delimiter specification. In this case I figure that each record is delimited by a comma which comes after the email address, so #.+?,
To match a pattern only when it is preceded by another brings to mind a negative lookbehind before the comma. But those can't be of variable length, which is precisely the case here.
We can instead normally match the pattern #.+? and then use the \K form (of the lookbehind) which drops all previous matches so that they are not taken out of the string. Thus the above splits on ,\s* when that is preceded by the email address, #... (what isn't consumed).
It prints
mail1, local<mail1#mail.local>
mail2#mail.local
<mail3#mail.local>
mail4, local<mail4#mail.local>
The edit asks about quoting the description preceding <...> when it's there. A simple way is to make another pass once addresses have been parsed out of the string as above. For example
my #addresses = split /#.+?\K,\s*/, $string; #/ stop syntax highlight
s/(.+?,\s*.+?)</"$1"</ for #addresses;
say for #addresses;
The regex in a loop is one way to change elements of an array. I use it for its efficiency (changes elements in place), conciseness, and as a demonstration of the following properties.
In a foreach loop the index variable (or $_) is an alias for the currently processed element – so changing it changes that element. This is a known source of bugs when allowed unknowingly, which was another reason to show it in the above form.
The statement also uses the statement modifier and it is equivalent to
foreach my $elem (#addresses) {
$elem =~ s/(.+?,\s*.+?)</"$1"</;
}
This is often considered a more proper way to write it but I find that the other form emphasizes more clearly that elements are being changed, when that is the sole purpose of the foreach.

Finding index of white space in Perl

I'm trying to find the index of white space in a string in Perl.
For example, if I have the string
stuff/more stuffhere
I'd like to select the word "more" with a substring method. I can find the index of "/" but haven't figured out how to find the index of white space. The length of the substring I'm trying to select will vary, so I can't hard code the index. There will only be one white space in the string (other than those after the end of the string).
Also, if anybody has any better ideas of how to do this, I'd appreciate hearing them. I'm fairly new to programming so I'm open to advice. Thanks.
Just use index:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = 'stuff/more stuffhere';
my $index_of_slash = index $string, '/';
my $index_of_space = index $string, ' ';
say "Between $index_of_slash and $index_of_space.";
The output is
Between 5 and 10.
Which is correct:
0 1
01234567890123456789
stuff/more stuffhere
If by "whitespace" you also mean tabs or whatever, you can use a regular expression match and the special variables #- and #+:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = "stuff/more\tstuffhere";
if ($string =~ m{/.*(?=\s)}) {
say "Between $-[0] and $+[0]";
}
The (?=\s) means is followed by a whitespace character, but the character itself is not part of the match, so you don't need to do any maths on the returned values.
As you stated, you want to select the word between the first /
and the first space following it.
If this is the case, you maybe don't need any index (you need just
the word).
A perfect tool to find something in a text is regex.
Look at the following code:
$txt = 'stuff/more stuffxx here';
if ($txt =~ /\/(.+?) /) {
print "Match: $1.\n";
}
The regex used tries to match:
a slash,
a non-empty sequence of any chars (note ? - reluctant
version), enclosed in a capturing group,
a space.
So after the match $1 contains what was captured by the first
capturing group, i.e. "your" word.
But if for any reason you were interested in starting and ending
offsets to this word, you can read them from $-[1]
and $+[1] (starting / ending indices of the first capturing group).
The arrays #- (#LAST_MATCH_START) and #+ (#LAST_MATCH_END) give offsets of the start and end of last successful submatches. See Regex related variables in perlvar.
You can capture your real target, and then read off the offset right after it with $+[0]
#+
This array holds the offsets of the ends of the last successful submatches in the currently active dynamic scope. $+[0] is the offset into the string of the end of the entire match. This is the same value as what the pos function returns when called on the variable that was matched against.
Example
my $str = 'target and target with spaces';
while ($str =~ /(target)\s/g)
{
say "Position after match: $+[0]"
}
prints
Position after match: 7
Position after match: 18
These are positions right after 'target', so of spaces that come after it.
Or you can capture \s instead and use $-[1] + 1 (first position of the match, the space).
You can use
my $str = "stuff/more stuffhere";
if ($str =~ m{/\K\S+}) {
... substr($str, $-[0], $+[0] - $-[0]) ...
}
But why substr? That's very weird there. Maybe if you told us what you actually wanted to do, we could provide a better alternatives. Here are three cases:
Data extraction:
my $str = "stuff/more stuffhere";
if ( my ($word) = $str =~ m{/(\S+)} ) {
say $word; # more
}
Data replacement:
my $str = "stuff/more stuffhere";
$str =~ s{/\K\S+}{REPLACED};
say $str; # stuff/REPLACED stuffhere
Data replacement (dynamic):
my $str = "stuff/more stuffhere";
$str =~ s{/\K(\S+)}{ uc($1) }e;
say $str; # stuff/MORE stuffhere

Extracting alphanumeric phrase from a string

Trying to extract the alphanumeric characters from this string:
A_phase_I-II,_open-req_project_id_PX15RAD001
The problem is: the term PX15RAD001 can occur anywhere in the string.
Trying to extract the alpha-numeric part using the below expression. But this returns the entire string. I thought Alum was a valid keyword for alpha-numerics. Is that not the case?
(my $string = $line ) =~ s/\P{Alnum}//g;
print $string;
How can I extract the alphanumeric part of the afore mentioned string?
Thanks in advance.
-simak
At the end as per your input:
> echo "A_phase_I-II,_open-req_project_id_PX15RAD001"|perl -lne 'print $1 if(/id_([A-Z0-9]*)/)'
PX15RAD001
In the middle:
> echo "A_phase_I-II,_open-req_id_PX15RAD001_project" | perl -lne 'print $1 if(/id_([A-Z0-9]*)/)'
PX15RAD001
or in your terms:
$line=~m/id_([A-Z0-9]*)/g;
print $1;
Here are some testcases, produced with the comments of #Vijay s Answer:
my #line = (
'A_phase_I-II,_open-req_project_id_PX15RAD001',
'_PX15RAD001_A_phase_I-II,_open-req_project_id',
'A_pha3333se_I-II,_ope_PX15RAD001_n-req_project',
'A_phase_I-II,_PX15RAD001_open-req_projec123123123t_id',
'A_phase_I-II_PX15RAD001_roject_id'
);
foreach my $string ( #line ) {
$string =~ m{_([^_]{10})_?}g;
print $1 . "\n" if $1;
}
These kinds of questions are hard to answer because there is not enough information. What information we have is:
You say your target string is "alphanumeric", but the entire input string is alphanumeric, except for some punctuation, so that really doesn't tell us anything.
You say it is 12 characters long, but the sample you show is 10 characters long.
You seem to think that "alphanumeric" does not include underscore.
So, the reliable information I can sense from you is:
Target string is always delimited by underscore _
Target string is 10-12 characters, all alphanumeric except underscore.
The "reliable" solution based on this rather skimpy information is:
my $str = "A_phase_I-II,_open-req_project_id_PX15RAD001";
for my $field (split /_/, $str) {
if (length($field) <= 12 and
length($field) >= 10 and # field is 10-12 characters
$field !~ /\W/) { # and contains no non-alphanumerics
# do something
}
}
By splitting on underscore, we can easily isolate each field in the string and perform simpler tests on it, such as the ones above.

grep in perl without array

If I have one variable : I assigned entire file text to it
$var = `cat file_name`
Suppose in the file , the word 'mine' comes in 17th line (location is not available but just giving example) and I want to search a pattern 'word' after N (eg 10) lines of word 'mine' if pattern 'word' exist in those lines or not. How can i do that in the regular expression without using array'
Example:
$var = "I am good in perl\n but would like to know about the \n grep command in details";
I want to search particular pattern in specific lines (lines 2 to 3 only). How can I do it without using array.
There is a valid case for not using arrays here - when files are prohibitively large.
This is a pretty specific requirement. Rather than beat around the bush to find that Perl idiom, I'd prescribe a subroutine:
sub n_lines_apart {
my ( $file, $n, $first_pattern, $second_pattern ) = #_;
open my $fh, '<', $file or die $!;
my $lines_apart;
while (<$fh>) {
$lines_apart++ if qr/$first_pattern/ .. qr/$second_pattern/;
}
return $lines_apart && $lines_apart <= $n+1;
}
Caveat
The sub above is not designed to handle multiple matches in a single file. Let that be an exercise for the reader.
You can do this with a regular expression match like this:
my $var = `cat $filename`;
while ( $var =~ /foo/g ) {
print $1, "\n";
print "match occurred at position ", pos($var), " in the string.\n";
}
This will print out all the matches of the string 'foo' from your string, similar to grep but not using an array (or list). The /$regexp/g syntax makes the regular expression iteratively match against the string from left to right.
I'd recommend reading perlrequick for a tutorial on matching with regular expressions.
Try this:
perl -ne '$m=$. if !$m && /first-pattern/;
print if $m && ($.-$m >= 2 && $.-$m <= 3) && /second-pattern/'