using Regexp::Grammars what does (*COMMIT) do? - perl

I'm looking at an example for Regexp::Grammars. The example has a comment next to (*COMMIT) saying that it improves the error messages, but I can't find any documentation on (*COMMIT). What does it do? I've included part of the example below:
use Regexp::Grammars;
qr{
\A
<Answer>
(*COMMIT) # <-- Remove this to see the error messages get less accurate
(?:
\Z
|
<warning: (?{ "Extra junk after expression at index $INDEX: '$CONTEXT'" })>
<warning: Expected end of input>
<error:>
)
<rule: Answer>
<[_Operand=Mult]>+ % <[_Op=(\+|\-)]>
(?{ $MATCH = shift @{$MATCH{_Operand}};
for my $term (@{$MATCH{_Operand}}) {
my $op = shift @{$MATCH{_Op}};
if ($op eq '+') { $MATCH += $term; }
else { $MATCH -= $term; }
}
})
|
<error: Expected valid arithmetic expression>

(*COMMIT) is documented in perlre.
(*COMMIT) is useful for causing a whole alternation to fail when one of its branches fails after reaching a certain point.
$ perl -E'
say "$_: ", /
^
(?: a (*COMMIT) b
| c (*COMMIT) d
| . z
)
/sx ?1:0
for qw( ab cd az yz );
'
ab: 1
cd: 1
az: 0
yz: 1
You could have written the following, but it could be far less efficient and far harder to write in more complex examples:
/
^
(?: a b
| c d
| [^ac] z
)
/x

It's a backtracking control verb, described in perlre.
Essentially, it forces the whole match to fail if a later part of the pattern mismatches in a way that would cause the regex engine to backtrack past the (*COMMIT).
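To make that concrete, here is a minimal sketch (with made-up test strings, not from the question) contrasting a plain alternation with the same alternation plus (*COMMIT):
use strict;
use warnings;

for my $s (qw( ab az )) {
    my $plain  = $s =~ /^(?: a b           | . z )/x ? 1 : 0;
    my $commit = $s =~ /^(?: a (*COMMIT) b | . z )/x ? 1 : 0;
    print "$s: plain=$plain commit=$commit\n";
}

# "ab" matches both patterns. "az" matches only the plain one: once (*COMMIT)
# has been passed, the engine is not allowed to backtrack out of that branch
# (or retry at another starting position) when "b" fails to match "z".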

Related

perl, matching balanced parens using .Net regex

I needed some perl code to match balanced parens in a string.
So I found this regular expression code below from .NET and pasted it into my Perl program, thinking the regex engine was similar enough for it to work:
/
\s*\(
(?: [^\(\)] | (?<openp>\() | (?<-openp>\)) )+
(?(openp)(?!))
\)\s*
/x
My understanding of how this regex works is as follows:
Match first paren:
\(
Match pattern a, b, or c at least once:
(?: <a> | <b> | <c>)+
where a, b, and c are:
a is any character that is not a paren
[^\(\)]
b is character that is a left-paren
\(
c is character that is a right-paren
\)
and:
b is a capture group that pushes to named capture "openp"
(?<openp>\()
c is a capture group that pops from named capture "openp"
(?<-openp>\))
Reject any regular expression match where openp doesn't equal zero items on the stack:
(?(openp)(?!))
Match the end paren:
\)
Here's the perl code:
sub eat_parens($) {
my $line = shift;
if ($line !~ /
\s*\(
(?: [^\(\)] | (?<openp>\() | (?<-openp>\)) )+
(?(openp)(?!))
\)\s*
/x)
{
return $line;
}
return $';
}
sub testit2 {
my $t1 = "(( (sdfasd)sdfsas (sdfasd) )sadf) ()";
my $t2 = eat_parens($t1);
print "t1: $t1\n";
print "t2: $t2\n";
}
testit2();
Error is:
$ perl x.pl
Sequence (?<-...) not recognized in regex; marked by <-- HERE in m/\s*\((?: [^\(\)] | (?<openp> \( ) | (?<- <-- HERE openp> \) ) )+ (?(openp)(?!) ) \) \s*/ at x.pl line 411.
Not sure what's causing this.... any ideas?
Here's one way to do it:
/
(?&TEXT)
(?(DEFINE)
(?<TEXT>
[^()]*+
(?: \( (?&TEXT) \)
[^()]*+
)*+
)
)
/x
It can also be done without naming anything. Search for "recursive" in perlre.
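For illustration, here is a minimal sketch of the unnamed recursive form, run against the string from the question; (?1) recurses into capture group 1 (this is just one possible way to write it):
use strict;
use warnings;

my $line = "(( (sdfasd)sdfsas (sdfasd) )sadf) ()";

while (
    $line =~ /
        (                            # capture one balanced group
            \(
            (?: [^()]++ | (?1) )*    # plain text or a nested group
            \)
        )
    /xg
) {
    print "$1\n";
}

# Prints the two top-level groups:
#   (( (sdfasd)sdfsas (sdfasd) )sadf)
#   ()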

Why does multiple use of `<( )>` token within `comb` not behave as expected?

I want to extract the row key (here 28_2820201112122420516_000000), the column name (here bcp_startSoc), and the value (here 64.0) from $str, where $str is a row from HBase:
# `match` is OK
my $str = '28_2820201112122420516_000000 column=d:bcp_startSoc, timestamp=1605155065124, value=64.0';
my $match = $str.match(/^ ([\d+]+ % '_') \s 'column=d:' (\w+) ',' \s timestamp '=' \d+ ',' \s 'value=' (<-[=]>+) $/);
my @match-result = $match».Str.Slip;
say @match-result; # Output: [28_2820201112122420516_000000 bcp_startSoc 64.0]
# `smartmatch` is OK
# $str ~~ /^ ([\d+]+ % '_') \s 'column=d:' (\w+) ',' \s timestamp '=' \d+ ',' \s 'value=' (<-[=]>+) $/
# say $/».Str.Array; # Output: [28_2820201112122420516_000000 bcp_startSoc 64.0]
# `comb` is NOT OK
# A <( token indicates the start of the match's overall capture, while the corresponding )> token indicates its endpoint.
# The <( is similar to other languages \K to discard any matches found before the \K.
my @comb-result = $str.comb(/<( [\d+]+ % '_' )> \s 'column=d:' <(\w+)> ',' \s timestamp '=' \d+ ',' \s 'value=' <(<-[=]>+)>/);
say @comb-result; # Expect: [28_2820201112122420516_000000 bcp_startSoc 64.0], but got [64.0]
I want comb to skip some parts and just match what I want, so I used multiple <( and )> here, but I only get the last match as the result.
Is it possible to use comb to get the same result as the match method?
TL;DR Multiple <(...)>s don't mean multiple captures. Even if they did, .comb reduces each match to a single string in the list of strings it returns. If you really want to use .comb, one way is to go back to your original regex but also store the desired data using additional code inside the regex.
Multiple <(...)>s don't mean multiple captures
The default start point for the overall match of a regex is the start of the regex. The default end point is the end.
Writing <( resets the start point for the overall match to the position you insert it at. Each time you insert one and it gets applied during processing of a regex it resets the start point. Likewise )> resets the end point. At the end of processing a regex the final settings for the start and end are applied in constructing the final overall match.
Given that your code just unconditionally resets each point three times, the last start and end resets "win".
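For comparison with the \K analogy mentioned in the question, here is a minimal Perl sketch of the same start-point reset idea (the string is just the column fragment from the example); only what comes after the reset point ends up in the overall match:
my $s = 'column=d:bcp_startSoc';
if ($s =~ /column=d:\K\w+/) {
    print "$&\n";   # prints "bcp_startSoc"; everything matched before \K is
                    # excluded from the overall match, much as <( excludes it
}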
.comb reduces each match to a single string
foo.comb(/.../) is equivalent to foo.match(:g, /.../)>>.Str.
That means you only get one string for each match against the regex.
One possible solution is to use the approach @ohmycloudy shows in their answer.
But that comes with the caveats raised by myself and @jubilatious1 in comments on their answer.
Add { @comb-result.push: |$/».Str } to the regex
You can workaround .comb's normal functioning. I'm not saying it's a good thing to do. Nor am I saying it's not. You asked, I'm answering, and that's it. :)
Start with your original regex that worked with your other solutions.
Then add { @comb-result.push: |$/».Str } to the end of the regex to store the result of each match. Now you will get the result you want.
$str.comb( / ^ [\d+]+ % '_' | <?after d\:> \w+ | <?after value\=> .*/ )
Since you have a comma-separated 'row' of information you're examining, you could try using split() to break your matches up, and assign to an array. Below in the Raku REPL:
> my $str = '28_2820201112122420516_000000 column=d:bcp_startSoc, timestamp=1605155065124, value=64.0';
28_2820201112122420516_000000 column=d:bcp_startSoc, timestamp=1605155065124, value=64.0
> my @array = $str.split(", ")
[28_2820201112122420516_000000 column=d:bcp_startSoc timestamp=1605155065124 value=64.0]
> dd @array
Array @array = ["28_2820201112122420516_000000 column=d:bcp_startSoc", "timestamp=1605155065124", "value=64.0"]
Nil
> say @array.elems
3
Match on individual elements of the array:
> say @array[0] ~~ m/ ([\d+]+ % '_') \s 'column=d:' (\w+) /;
「28_2820201112122420516_000000 column=d:bcp_startSoc」
0 => 「28_2820201112122420516_000000」
1 => 「bcp_startSoc」
> say @array[0] ~~ m/ ([\d+]+ % '_') \s 'column=d:' <(\w+)> /;
「bcp_startSoc」
0 => 「28_2820201112122420516_000000」
> say @array[0] ~~ m/ [\d+]+ % '_' \s 'column=d:' <(\w+)> /;
「bcp_startSoc」
Boolean tests on matches to one-or-more array elements:
> say True if ( @array[0] ~~ m/ [\d+]+ % '_' \s 'column=d:' <(\w+)> /)
True
> say True if ( @array[2] ~~ m/ 'value=' <(<-[=]>+)> / )
True
> say True if ( @array[0] ~~ m/ [\d+]+ % '_' \s 'column=d:' <(\w+)> /) & ( @array[2] ~~ m/ 'value=' <(<-[=]>+)> / )
True
HTH.

Perl: How to extract a string between brackets

I have a file in moinmoin text format:
* [[ Virtualbox Guest Additions]] (2011/10/17 15:19)
* [[ Abiword Wordprocessor]] (2010/10/27 20:17)
* [[ Sylpheed E-Mail]] (2010/03/30 21:49)
* [[ Kupfer]] (2010/05/16 20:18)
All the words between the '[[' and ']]' are the short description of the entry. I need to extract the whole entry, but not each individual word.
I found an answer for a similar question here: https://stackoverflow.com/a/2700749/819596
but I can't understand the answer: "my @array = $str =~ /( \{ (?: [^{}]* | (?0) )* \} )/xg;"
Anything that works will be accepted, but explanations would help greatly, i.e. what (?0) or /xg does.
The code probably will look like this:
use warnings;
use strict;
my @subjects; # declaring a lexical variable to store all the subjects
my $pattern = qr/
\[ \[ # matching two `[` signs
\s* # ... and, if any, whitespace after them
([^]]+) # starting from the first non-whitespace symbol, capture all the non-']' symbols
]]
/x;
# main processing loop:
while (<DATA>) { # reading the source file line by line
if (/$pattern/) { # if line is matched by our pattern
push @subjects, $1; # ... push the captured group of symbols into our array
}
}
print $_, "\n" for @subjects; # print our array of subjects line by line
__DATA__
* [[ Virtualbox Guest Additions]] (2011/10/17 15:19)
* [[ Abiword Wordprocessor]] (2010/10/27 20:17)
* [[ Sylpheed E-Mail]] (2010/03/30 21:49)
* [[ Kupfer]] (2010/05/16 20:18)
As I see it, what you need can be described as follows: in each line of the file, try to find this sequence of symbols...
[[, an opening delimiter,
then 0 or more whitespace symbols,
then all the symbols that make a subject (which should be saved),
then ]], a closing delimiter
As you see, this description quite naturally translates into a regex. The only thing that is probably not needed is the /x regex modifier, which allowed me to comment it extensively.
If the text will never contain ], you can simply use the following as previously recommended:
/\[\[ ( [^\]]* ) \]\]/x
The following allows ] in the contained text, but I recommend against incorporating it into a larger pattern:
/\[\[ ( .*? ) \]\]/x
The following allows ] in the contained text, and is the most robust solution:
/\[\[ ( (?:(?!\]\]).)* ) \]\]/x
For example,
if (my ($match) = $line =~ /\[\[ ( (?:(?!\]\]).)* ) \]\]/x) {
print "$match\n";
}
or
my @matches = $file =~ /\[\[ ( (?:(?!\]\]).)* ) \]\]/xg;
/x: Ignore whitespace in pattern. Allows spaces to be added to make the pattern readable without changing the meaning of the pattern. Documented in perlre.
/g: Find all matches. Documented in perlop.
(?0) was used to make the pattern recursive, since the linked answer had to deal with arbitrary nesting of curlies.
\[\[(.*)]]
\[ is a literal [,
] is a literal ],
.* means every sequence of 0 or more character,
something enclosed in parentheses is a capturing group, hence you can access it later in your script with $1 (or $2 .. $9 depending on how many groups you have).
Put it all together and you will match two [, then everything up to the last occurrence of two successive ].
Update
On a second read of your question I am suddenly confused: do you need the content between [[ and ]], or the whole line? In the latter case, leave the parentheses out completely and just test whether the pattern matches; there is no need to capture.
The answer you found is for recursive pattern matching, which I think you don't need.
/x allows you to use insignificant whitespace and comments in the regexp.
/g runs the regexp through the whole string. Without it, it stops at the first match.
/xg is /x and /g combined.
(?0) runs the regexp itself again (recursion).
If I understand correctly, you need something like this:
$text="* [[ Virtualbox Guest Additions]] (2011/10/17 15:19)
* [[ Abiword Wordprocessor]] (2010/10/27 20:17)
* [[ Sylpheed E-Mail]] (2010/03/30 21:49)
* [[ Kupfer]] (2010/05/16 20:18)
";
@array=($text=~/\[\[([^\]]*)\]\]/g);
print join(",",@array);
# this prints " Virtualbox Guest Additions, Abiword Wordprocessor, Sylpheed E-Mail, Kupfer"
I would recommend using "extract_bracketed" or "extract_delimited" from module Text::Balanced - see here: http://perldoc.perl.org/Text/Balanced.html
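A minimal sketch of that Text::Balanced approach, applied to one line from the question (the prefix pattern passed as the third argument is my own choice, used to skip the leading "* "):
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $line = '* [[ Virtualbox Guest Additions]] (2011/10/17 15:19)';

# Skip everything before the first '[' and grab the balanced [[ ... ]] block.
my ($bracketed) = extract_bracketed($line, '[]', qr/[^\[]*/);
if (defined $bracketed) {
    (my $subject = $bracketed) =~ s/^\[\[\s*|\]\]$//g;
    print "$subject\n";   # prints "Virtualbox Guest Additions"
}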
perl -pe 's/.*\[\[(.*)\]\].*/\1/g' temp
tested below:
> cat temp
* [[ Virtualbox Guest Additions]] (2011/10/17 15:19)
* [[ Abiword Wordprocessor]] (2010/10/27 20:17)
* [[ Sylpheed E-Mail]] (2010/03/30 21:49)
* [[ Kupfer]] (2010/05/16 20:18)
>
> perl -pe 's/.*\[\[(.*)\]\].*/\1/g' temp
Virtualbox Guest Additions
Abiword Wordprocessor
Sylpheed E-Mail
Kupfer
>
s/.*\[\[(.*)\]\].*/\1/g
.*\[\[ -> match any characters up to the [[
(.*)\]\] -> store any characters after the "[[" up to "]]" in \1
.* -> match the rest of the line.
Then, since we have our data in \1, we can simply use it for printing on the console.
my @array = $str =~ /( \{ (?: [^{}]* | (?0) )* \} )/xg;
The 'x' flag means that whitespace is ignored in the regex, to allow for a more readable expression. The 'g' flag means that the result will be a list of all matches from left to right (match *g*lobally).
The (?0) represents the regular expression inside the first group of parentheses. It's a recursive regular expression, equivalent to a set of rules such as:
E := '{' ( NoBrace | E )* '}'
NoBrace := [^{}]*
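As a quick sketch of that pattern in action (with a made-up input string):
use strict;
use warnings;

my $str = 'foo {a {b c} d} bar {e}';
my @array = $str =~ /( \{ (?: [^{}]* | (?0) )* \} )/xg;
print "$_\n" for @array;

# Prints:
#   {a {b c} d}
#   {e}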

How to quickly find and replace many items on a list without replacing previously replaced items in BASH?

I want to perform many find and replace operations on some text. I have a UTF-8 CSV file containing what to find (in the first column) and what to replace it with (in the second column), arranged from longest to shortest.
E.g.:
orange,fruit2
carrot,vegetable1
apple,fruit3
pear,fruit4
ink,item1
table,item2
Original file:
"I like to eat apples and carrots"
Resulting output file:
"I like to eat fruit3s and vegetable1s."
However, I want to ensure that if one part of text has already been replaced, that it doesn't mess with text that was already replaced. In other words, I don't want it to appear like this (it matched "table" from within vegetable1):
"I like to eat fruit3s and vegeitem21s."
Currently, I am using this method which is quite slow, because I have to do the whole find and replace twice:
(1) Convert the CSV to three files, e.g.:
a.csv b.csv c.csv
orange 0001 fruit2
carrot 0002 vegetable1
apple 0003 fruit3
pear 0004 fruit4
ink 0005 item1
table 0006 item2
(2) Then, replace all items from a.csv in file.txt with the matching column in b.csv, using ZZZ around the words to make sure there is no mistake later in matching the numbers:
a=1
b=`wc -l < ./a.csv`
while [ $a -le $b ]
do
for i in `sed -n "$a"p ./b.csv`; do
for j in `sed -n "$a"p ./a.csv`; do
sed -i "s/$i/ZZZ$j\ZZZ/g" ./file.txt
echo "Instances of '"$i"' replaced with '"ZZZ$j\ZZZ"' ("$a"/"$b")."
a=`expr $a + 1`
done
done
done
(3) Then running this same script again, but to replace ZZZ0001ZZZ with fruit2 from c.csv.
Running the first replacement takes about 2 hours, but as I must run this code twice to avoid editing the already replaced items, it takes twice as long. Is there a more efficient way to run a find and replace that does not perform replacements on text already replaced?
Here's a Perl solution which does the replacement in "one phase".
#!/usr/bin/perl
use strict;
my %map = (
orange => "fruit2",
carrot => "vegetable1",
apple => "fruit3",
pear => "fruit4",
ink => "item1",
table => "item2",
);
my $repl_rx = '(' . join("|", map { quotemeta } keys %map) . ')';
my $str = "I like to eat apples and carrots";
$str =~ s{$repl_rx}{$map{$1}}g;
print $str, "\n";
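If some of your search terms can be substrings of other search terms, it may help to build the alternation longest-first so the longer term wins, matching the longest-to-shortest ordering described in the question. A small, optional refinement of the line above:
my $repl_rx = '('
            . join("|", map { quotemeta } sort { length($b) <=> length($a) } keys %map)
            . ')';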
Tcl has a command to do exactly this: string map
tclsh <<'END'
set map {
"orange" "fruit2"
"carrot" "vegetable1"
"apple" "fruit3"
"pear" "fruit4"
"ink" "item1"
"table" "item2"
}
set str "I like to eat apples and carrots"
puts [string map $map $str]
END
I like to eat fruit3s and vegetable1s
This is how to implement it in bash (requires bash v4 for the associative array)
declare -A map=(
[orange]=fruit2
[carrot]=vegetable1
[apple]=fruit3
[pear]=fruit4
[ink]=item1
[table]=item2
)
str="I like to eat apples and carrots"
echo "$str"
i=0
while (( i < ${#str} )); do
matched=false
for key in "${!map[@]}"; do
if [[ ${str:$i:${#key}} = $key ]]; then
str=${str:0:$i}${map[$key]}${str:$((i+${#key}))}
((i+=${#map[$key]}))
matched=true
break
fi
done
$matched || ((i++))
done
echo "$str"
I like to eat apples and carrots
I like to eat fruit3s and vegetable1s
This will not be speedy.
Clearly, you may get different results if you order the map differently. In fact, I believe the order of "${!map[@]}" is unspecified, so you might want to specify the order of the keys explicitly:
keys=(orange carrot apple pear ink table)
# ...
for key in "${keys[@]}"; do
One way to do it would be to do a two-phase replace:
phase 1:
s/orange/##1##/
s/carrot/##2##/
...
phase 2:
s/##1##/fruit2/
s/##2##/vegetable1/
...
The ##1## markers should be chosen so that they don't appear in the original text or the replacements of course.
Here's a proof-of-concept implementation in perl:
#!/usr/bin/perl -w
#
my $repls = $ARGV[0];
die ("first parameter must be the replacement list file") unless defined ($repls);
my $tmpFmt = "###%d###";
open(my $replsFile, "<", $repls) || die("$!: $repls");
shift;
my @replsList;
my $i = 0;
while (<$replsFile>) {
chomp;
my ($from, $to) = /\"([^\"]*)\",\"([^\"]*)\"/;
if (defined($from) && defined($to)) {
push(@replsList, [$from, sprintf($tmpFmt, ++$i), $to]);
}
}
while (<>) {
foreach my $r (@replsList) {
s/$r->[0]/$r->[1]/g;
}
foreach my $r (@replsList) {
s/$r->[1]/$r->[2]/g;
}
print;
}
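Assuming the replacement list uses double-quoted CSV fields ("orange","fruit2"), which is what the parsing regex above expects, a typical invocation would look something like: perl replace.pl replacements.csv input.txt > output.txt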
I would guess that most of your slowness is coming from creating so many sed commands, which each need to individually process the entire file. Some minor adjustments to your current process would speed this up a lot by running 1 sed per file per step.
a=1
b=`wc -l < ./a.csv`
while [ $a -le $b ]
do
cmd=""
for i in `sed -n "$a"p ./a.csv`; do
for j in `sed -n "$a"p ./b.csv`; do
cmd="$cmd ; s/$i/ZZZ${j}ZZZ/g"
echo "Instances of '"$i"' replaced with '"ZZZ${j}ZZZ"' ("$a"/"$b")."
a=`expr $a + 1`
done
done
sed -i "$cmd" ./file.txt
done
Doing it twice is probably not your problem. If you managed to do it just once using your basic strategy, it would still take you an hour, right? You probably need to use a different technology or tool. Switching to Perl, as above, might make your code a lot faster (give it a try).
But continuing down the path of other posters, the next step might be pipelining. Write a little program that replaces two columns, then run that program twice, simultaneously. The first run swaps out strings in column1 with strings in column2, the next swaps out strings in column2 with strings in column3.
Your command line would be like this
cat input_file.txt | perl replace.pl replace_file.txt 1 2 | perl replace.pl replace_file.txt 2 3 > completely_replaced.txt
And replace.pl would be like this (similar to other solutions)
#!/usr/bin/perl -w
my $replace_file = $ARGV[0];
my $before_replace_colnum = $ARGV[1] - 1;
my $after_replace_colnum = $ARGV[2] - 1;
open(REPLACEFILE, $replace_file) || die("couldn't open $replace_file: $!");
my @replace_pairs;
# read in the list of things to replace
while(<REPLACEFILE>) {
chomp();
my @cols = split /\t/, $_;
my $to_replace = $cols[$before_replace_colnum];
my $replace_with = $cols[$after_replace_colnum];
push @replace_pairs, [$to_replace, $replace_with];
}
# read input from stdin, do swapping
while(<STDIN>) {
# loop over all replacement strings
foreach my $replace_pair (@replace_pairs) {
my($to_replace,$replace_with) = @{$replace_pair};
$_ =~ s/${to_replace}/${replace_with}/g;
}
print STDOUT $_;
}
A bash+sed approach:
count=0
bigfrom=""
bigto=""
while IFS=, read from to; do
read countmd5sum x < <(md5sum <<< $count)
count=$(( $count + 1 ))
bigfrom="$bigfrom;s/$from/$countmd5sum/g"
bigto="$bigto;s/$countmd5sum/$to/g"
done < replace-list.csv
sed "${bigfrom:1}$bigto" input_file.txt
I have chosen md5sum to get a unique token, but some other mechanism can also be used to generate such a token, like reading from /dev/urandom or shuf -n1 -i 10000000-20000000.
An awk+sed approach:
awk -F, '{a[NR-1]="s/####"NR"####/"$2"/";print "s/"$1"/####"NR"####/"}; END{for (i=0;i<NR;i++)print a[i];}' replace-list.csv > /tmp/sed_script.sed
sed -f /tmp/sed_script.sed input.txt
A cat+sed+sed approach:
cat -n replace-list.csv | sed -rn 'H;g;s|(.*)\n *([0-9]+) *[^,]*,(.*)|\1\ns/####\2####/\3/|;x;s|.*\n *([0-9]+)[ \t]*([^,]+).*|s/\2/####\1####/|p;${g;s/^\n//;p}' > /tmp/sed_script.sed
sed -f /tmp/sed_script.sed input.txt
Mechanism:
Here, it first generates the sed script, using the csv as input file.
Then uses another sed instance to operate on input.txt
Notes:
The intermediate file generated - sed_script.sed can be re-used again, unless the input csv file changes.
####<number>#### is chosen as some pattern, which is not present in the input file. Change this pattern if required.
cat -n | is not UUOC :)
This might work for you (GNU sed):
sed -r 'h;s/./&\\n/g;H;x;s/([^,]*),.*,(.*)/s|\1|\2|g/;$s/$/;s|\\n||g/' csv_file | sed -rf - original_file
Convert the csv file into a sed script. The trick here is to replace the substitution string with one which will not be re-substituted. In this case each character in the substitution string is replaced by itself and a \n. Finally once all substitutions have taken place the \n's are removed leaving the finished string.
There are a lot of cool answers here already. I'm posting this because I'm taking a slightly different approach by making some large assumptions about the data to replace (based on the sample data):
Words to replace don't contain spaces
Words are replaced based on the longest, exactly matching prefix
Each word to replace is exactly represented in the csv
This is a single-pass, awk-only answer with very little regex.
It reads the "repl.csv" file into an associative array (see BEGIN{}), then attempts to match on prefixes of each word when the length of the word falls within the key length bounds, trying to avoid looking in the associative array whenever possible:
#!/bin/awk -f
BEGIN {
while( getline repline < "repl.csv" ) {
split( repline, replarr, "," )
replassocarr[ replarr[1] ] = replarr[2]
# set some bounds on the replace word sizes
if( minKeyLen == 0 || length( replarr[1] ) < minKeyLen )
minKeyLen = length( replarr[1] )
if( maxKeyLen == 0 || length( replarr[1] ) > maxKeyLen )
maxKeyLen = length( replarr[1] )
}
close( "repl.csv" )
}
{
i = 1
while( i <= NF ) { print_word( $i, i == NF ); i++ }
}
function print_word( w, end ) {
wl = length( w )
for( j = wl; j >= 0 && prefix_len_bound( wl, j ); j-- ) {
key = substr( w, 1, j )
wl = length( key )
if( wl >= minKeyLen && key in replassocarr ) {
printf( "%s%s%s", replassocarr[ key ],
substr( w, j+1 ), !end ? " " : "\n" )
return
}
}
printf( "%s%s", w, !end ? " " : "\n" )
}
function prefix_len_bound( len, jlen ) {
return len >= minKeyLen && (len <= maxKeyLen || jlen > maxKeyLen)
}
Based on input like:
I like to eat apples and carrots
orange you glad to see me
Some people eat pears while others drink ink
It yields output like:
I like to eat fruit3s and vegetable1s
fruit2 you glad to see me
Some people eat fruit4s while others drink item1
Of course, any "savings" from not looking in replassocarr go away when the words to be replaced drop to length 1, or if the average word length is much greater than that of the words to replace.

regular expression is not working

my $pat = '^x.*d$';
my $dir = '/etc/inet.d';
if ( $dir =~ /$pat/xmsg ) {
print "found ";
}
How do I make it succeed?
Your pattern is looking for strings starting with x (^x) and ending in d (d$). The path you are trying does not match as it doesn't start with x.
You can use YAPE::Regex::Explain to help you understand regular expressions:
use strict;
use warnings;
use YAPE::Regex::Explain;
my $re = qr/^x.*d$/xms;
print YAPE::Regex::Explain->new($re)->explain();
__END__
The regular expression:
(?msx-i:^x.*d$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?msx-i: group, but do not capture (with ^ and $
matching start and end of line) (with .
matching \n) (disregarding whitespace and
comments) (case-sensitive):
----------------------------------------------------------------------
^ the beginning of a "line"
----------------------------------------------------------------------
x 'x'
----------------------------------------------------------------------
.* any character (0 or more times (matching
the most amount possible))
----------------------------------------------------------------------
d 'd'
----------------------------------------------------------------------
$ before an optional \n, and the end of a
"line"
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
Also, you should not need the g modifier in this case. The documentation has plenty of information about regexes: perlre
There is one 'x' too many:
my $pat = '^.*d$';
my $dir = '/etc/inet.d';
if ( $dir =~ /$pat/xmsg ) {
print "found ";
}
My guess is that you're trying to list all files in /etc/init.d whose name matches the regular expression.
Perl isn't smart enough to figure out that when you name a string variable $dir, assign to it the full pathname of an existing directory, and pattern match against it, you don't intend to match against the pathname,
but against the filenames in that directory.
Some ways to fix this:
perldoc -f glob
perldoc -f readdir
perldoc File::Find
You may just want to use this:
if (glob('/etc/init.d/x*'))
{
warn "found\n";
}
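If you would rather apply the pattern to the filenames yourself, here is a minimal readdir-based sketch (using the /etc/init.d directory from the answer above):
use strict;
use warnings;

my $pat = qr/^x.*d$/;

opendir(my $dh, '/etc/init.d') or die "Cannot open /etc/init.d: $!";
my @found = grep { /$pat/ } readdir($dh);
closedir($dh);

print "found: @found\n" if @found;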