perl increment number in string with evaluation modifier - perl

I'm trying to increment:
Text_1_string(0)
to
Text_1_string(1)
and so on.
Note that I only want to increment the number in the parenthesis.
I've used:
$name =~ s/\(([0-9]+)\)/$1 + 1/e;
but it turns out as:
Text_1_string1
and I don't understand why. The group captured is the number, it shouldn't replace the parenthesis.

The substitution replaces everything the pattern matched, not only the part that was captured. So you do need to put the parentheses back:
$name =~ s/\(([0-9]+)\)/'('.($1 + 1).')'/e;
Since the replacement part is evaluated as code, it needs to be normal Perl code; hence the quotes and concatenation, and the parentheses for precedence.
To add, there are patterns that need not be put back in the replacement part: lookahead and lookbehind assertions. Like common anchors, these are zero-width assertions, so they do not consume what they match -- you only "look":
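A minimal sketch of the difference, using the string from the question. The variable names $broken and $fixed are only for illustration:

```perl
use strict;
use warnings;

my $broken = 'Text_1_string(0)';
my $fixed  = 'Text_1_string(0)';

# The whole match "(0)" is replaced by the value of the expression,
# so the parentheses are lost.
$broken =~ s/\(([0-9]+)\)/$1 + 1/e;

# Rebuild the parentheses around the incremented number.
$fixed =~ s/\(([0-9]+)\)/'(' . ($1 + 1) . ')'/e;

print "$broken\n";   # Text_1_string1
print "$fixed\n";    # Text_1_string(1)
```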
$name =~ s/(?<=\() ([0-9]+) (?=\))/$1 + 1/xe;
The lookbehind can't be of variable length (like \w+); it takes only a fixed string pattern.
The (?<=...) asserts that the (fixed-length) pattern in its parentheses (which do not capture!) must precede the number, while (?=...) asserts that the pattern in its parentheses must follow, for the whole pattern to match.
Often very useful is the lookbehind-type construct \K, which makes the engine keep in the string what it had matched up to that point (instead of "consuming" it); so it "drops" previous matches from what gets replaced, much like the (?<=...) form:
$name =~ s/\(\K ([0-9]+) (?=\))/$1 + 1/xe;
This is also more efficient. While it is also termed a "lookbehind" in documentation, there are in fact distinct differences in behavior. See this post and comments. Thanks to ikegami for a comment.
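As a quick check, the three substitutions shown in this answer can be run side by side. This is only a sketch; the second input string is made up for illustration:

```perl
use strict;
use warnings;

my @results;
for my $name ('Text_1_string(0)', 'Text_1_string(41)') {
    # Three copies of the input, one per technique.
    my ($capture, $lookaround, $keep) = ($name) x 3;

    $capture    =~ s/\(([0-9]+)\)/'(' . ($1 + 1) . ')'/e;    # put the parens back
    $lookaround =~ s/(?<=\() ([0-9]+) (?=\))/$1 + 1/xe;      # lookbehind + lookahead
    $keep       =~ s/\(\K ([0-9]+) (?=\))/$1 + 1/xe;         # \K + lookahead

    push @results, [$capture, $lookaround, $keep];
    printf "%s -> %s | %s | %s\n", $name, $capture, $lookaround, $keep;
}
```

All three columns should be identical for each input, since only the digits inside the parentheses are replaced.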
All these are positive lookarounds; there are also negative ones, asserting that given patterns must not be there for the whole thing to match.
A bit of an overkill in this case but a true gift in some other cases.
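For the negative forms, here is a small sketch under an invented rule (not from the question): increment every number in parentheses unless the opening parenthesis directly follows the letter v.

```perl
use strict;
use warnings;

# Hypothetical rule: leave counters alone when "(" directly follows "v".
my $text = 'a(1) v(2) b(9)';

# (?<!v) is a negative lookbehind: the match fails if "v" precedes "(".
(my $out = $text) =~ s/(?<!v)\(\K([0-9]+)(?=\))/$1 + 1/ge;

print "$out\n";   # a(2) v(2) b(10)
```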

Related

Not able to understand a command in perl

I need help to understand what below command is doing exactly
$abc{hier} =~ s#/tools.*/dfII/?.*##g;
and $abc{hier} contains a path "/home/test1/test2/test3"
Can someone please let me know what the above command is doing exactly. Thanks
s/PATTERN/REPLACEMENT/ is Perl's substitution operator. It searches a string for text that matches the regex PATTERN and replaces it with REPLACEMENT.
By default, the substitution operator works on $_. To tell it to work on a different variable, you use the binding operator - =~.
The default delimiter used by the substitution operator is a slash (/) but you can change that to any other character. This is useful if your PATTERN or your REPLACEMENT contains a slash. In this case, the programmer has used # as the delimiter.
To recap:
$abc{hier} =~ s#PATTERN#REPLACEMENT#;
means "look for text in $abc{hier} that matches PATTERN and replace it with REPLACEMENT".
The substitution operator also has various options that change its behaviour. They are added by putting letters after the final delimiter. In this case we have a g. That means "make the substitution global" - or match and change all occurrences of PATTERN.
In your case, the REPLACEMENT string is empty (we have two # characters next to each other). So we're replacing the PATTERN with nothing - effectively deleting whatever matches PATTERN.
So now we have:
$abc{hier} =~ s#PATTERN##g;
And we know it means, "in the variable $abc{hier}, look for any string that matches PATTERN and replace it with nothing".
The last thing to look at is the PATTERN (or regular expression - "regex"). You can get the full definition of regexes in perldoc perlre. But to explain what we're using here:
/tools : is the fixed string "/tools"
.* : is zero or more of any character
/dfII : is the fixed string "/dfII"
/? : is an optional slash character
.* : is (again) zero or more of any character
So, basically, we're removing bits of a file path from a value that's stored in a hash.
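To see the effect, here is a small sketch. The first path is the one from the question; the second is a made-up path that actually contains /tools and /dfII:

```perl
use strict;
use warnings;

my %abc;
my @paths = (
    '/home/test1/test2/test3',          # from the question: no "/tools", so nothing matches
    '/proj/tools/cadence/dfII/bin/x',   # hypothetical path that does match
);

my @stripped;
for my $path (@paths) {
    $abc{hier} = $path;
    $abc{hier} =~ s#/tools.*/dfII/?.*##g;
    push @stripped, $abc{hier};
    printf "%s -> %s\n", $path, $abc{hier};
}
```

With the path given in the question the substitution changes nothing; with the second path everything from /tools onward is removed, leaving /proj.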
This =~ means "Do a regex operation on that variable."
(Actually, as ikegami correctly reminds me, it is not necessarily only regex operations, because it could also be a transliteration.)
The operation in question is s#something#else#, which means replace the "something" with something "else".
The g at the end means "Do it for all occurrences of something."
Since the "else" is empty, the replacement has the effect of deleting.
The "something" is a definition according to regex syntax, roughly it means "Starting with '/tools' and later containing '/dfII', followed pretty much by anything until the end."
Note that the regex ends with /?.*. In detail, this means "a slash (/), or maybe not (?), and then absolutely anything (.) any number of times, including 0 times (*)". Strictly speaking, it is not necessary to say "slash or not" if it is followed by "anything, any number of times", because "anything" includes a slash, and "any number of times" includes zero or one. I.e. the /? could be omitted without changing the behaviour.
(Thanks ikegami for confirming.)
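A small sketch that compares the pattern with and without the /? on a few invented sample paths:

```perl
use strict;
use warnings;

# Hypothetical sample paths, chosen to cover "slash", "no slash" and "no match".
my @samples = (
    '/a/tools/x/dfII',
    '/a/tools/x/dfII/',
    '/a/tools/x/dfII/bin',
    '/a/tools/x/dfIIextra',
    '/a/b/c',
);

my $mismatches = 0;
for my $path (@samples) {
    (my $with    = $path) =~ s#/tools.*/dfII/?.*##g;   # original pattern
    (my $without = $path) =~ s#/tools.*/dfII.*##g;     # same pattern without /?
    $mismatches++ if $with ne $without;
}

print "mismatches: $mismatches\n";   # mismatches: 0
```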
$abc{hier} =~ s#/tools.*/dfII/?.*##g;
The above command uses a regular expression to remove everything from /tools.*/dfII onward (with or without a trailing /.*) from the value of the hier member of the %abc hash.
It is pretty basic Perl, except for the non-standard regular expression delimiters (# instead of the standard /). They avoid having to escape / inside the regular expression (s/\/tools.*\/dfII\/?.*//g).
My personal preferred style-guide would make it s{/tools.*/dfII/?.*}{}g .

Why aren't my nested lookarounds working correctly in my Perl substitution?

I have a Perl substitution which converts hyperlinks to lowercase:
's/(?<=<a href=")([^"]+)(?=")/\L$1/g'
I want the substitution to ignore any links which begin with a hash, for example I want it to change the path in Foo Bar to lowercase but skip if it comes across Bar.
Nesting lookaheads to instruct it to skip these links isn't working correctly for me. This is the one-liner I've written:
perl -pi -e 's/(?<=<a href=" (?! (?<=<a href="#) ) )([^"]+)(?=")/\L$1/g' *;
Could anyone hint to me where I have gone wrong with this substitution? It executes just fine, but does not do anything.
As near as I can tell, your initial regex will work just fine, if you add the condition that the first character in the link may not be a hash # or a double quote, e.g. [^#"]
s/(?<=<a href=")([^#"][^"]+)(?=")/\L$1/gi;
In the case you have links which do not start with a hash, e.g. Foo Bar, it becomes slightly more complicated:
s{(?<=<a href=")([^#"]+)(#[^"]+)*(?=")}{ lc($1) . ($2 // "") }gei;
We now have to evaluate the substitution, since otherwise we get undefined variable warnings when the optional anchor reference is not present.
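A runnable sketch of that substitution. The HTML snippet is made up for illustration, and use 5.010 is there because the // operator needs Perl 5.10 or later:

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical input: a link with a fragment, a pure fragment link, a plain link.
my $html = '<a href="Foo/Bar.html#Top">one</a> '
         . '<a href="#Bar">two</a> '
         . '<a href="BAZ.html">three</a>';

# $1 is the path (lowercased), $2 the optional "#fragment" (left as-is).
(my $out = $html) =~ s{(?<=<a href=")([^#"]+)(#[^"]+)*(?=")}{ lc($1) . ($2 // "") }gei;

say $out;
# <a href="foo/bar.html#Top">one</a> <a href="#Bar">two</a> <a href="baz.html">three</a>
```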
You don't need look-arounds, from what I see
use 5.010;
...
s/<a \s+ href \s* = \s* "\K([^#"][^"]*)"/\L$1"/gx;
\K means "keep" everything before it. It amounts to a variable-length look-behind.
perlre:
For various reasons \K may be significantly more efficient than the equivalent (?<=...) construct, and it is especially useful in situations where you want to efficiently remove something following something else in a string.
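A short sketch of the \K version on a made-up snippet with irregular whitespace, which a fixed-length lookbehind could not handle:

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical input with extra spaces around "href" and "=".
my $html = '<a  href = "Foo/Bar">one</a> <a href="#Top">two</a>';

# Everything before \K is matched but kept; only the path and the closing
# quote are replaced. Links starting with "#" are skipped by [^#"].
(my $out = $html) =~ s/<a \s+ href \s* = \s* "\K([^#"][^"]*)"/\L$1"/gx;

say $out;   # <a  href = "foo/bar">one</a> <a href="#Top">two</a>
```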

How to get a perfect match for a regexp pattern in Perl?

I've to match a regular-expression, stored in a variable:
#!/bin/env perl
use warnings;
use strict;
my $expr = qr/\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)/sx;
my $str = "abcd[3] xyzg[4:0]";
if ($str =~ m/$expr/) {
print "\n%%%%%%%%% $`-----$&-----$'\n";
}
else {
print "\n********* NOT MATCHED\n";
}
But I'm getting the output in $& as
%%%%%%%%% -----abcd[3] xyzg-----[4:0]
But I expected that it would not enter the if clause at all.
What is intended is:
if $str = "abcd xyzg" => %%%%%%%%% -----abcd xyzg----- (CORRECT)
if $str = "abcd[2] xyzg" => %%%%%%%%% -----abcd[2] xyzg----- (CORRECT)
if $str = "abcd[2] xyzg[3]" => %%%%%%%%% -----abcd[2] xyzg[3]----- (CORRECT)
if $str = "abcd[2:0] xyzg[3]" => ********* NOT MATCHED (CORRECT)
if $str = "abcd[2:0] xyzg[3:0]" => ********* NOT MATCHED (CORRECT)
if $str = "abcd[2] xyzg[3:0]" => ********* NOT MATCHED (CORRECT/INTENDED)
but output is %%%%%%%%% -----abcd[2] xyzg-----[3:0] (WRONG)
OR better to say this is not intended.
In this case, my expectation is that it should go to the else block.
I also don't understand why $& takes only a portion of the string (abcd[2] xyzg), with $' holding [3:0].
HOW?
It should match the full, not a part like the above. If it didn't, it shouldn't go to the if clause.
Can anyone please help me to change my $expr pattern, so that I can have what is intended?
By default, Perl regexes only look for a matching substring of the given string. In order to force comparison against the entire string, you need to indicate that the regex begins at the beginning of the string and ends at the end by using ^ and $:
my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)$/;
(Also, there's no reason to have the /x modifier, as your regex doesn't include any literal whitespace or # characters, and there's no reason for the /s modifier, as you're not using the . metacharacter.)
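A quick sketch that runs the anchored pattern against the six strings listed in the question:

```perl
use strict;
use warnings;

# Anchored pattern from the answer above.
my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)$/;

# Test strings taken from the question.
my @cases = (
    'abcd xyzg',
    'abcd[2] xyzg',
    'abcd[2] xyzg[3]',
    'abcd[2:0] xyzg[3]',
    'abcd[2:0] xyzg[3:0]',
    'abcd[2] xyzg[3:0]',
);

my %matched;
for my $str (@cases) {
    $matched{$str} = $str =~ $expr ? 1 : 0;
    printf "%-22s %s\n", $str, $matched{$str} ? 'MATCHED' : 'NOT MATCHED';
}
```

The first three strings should match and the last three should not, which is the behaviour the question asks for.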
EDIT: If you don't want the regex to match against the entire string, but you want it to reject anything in which the matching portion is followed by something like "[0:0]", the simplest way would be to use lookahead:
my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\]|(?=[^[\w])|$ ))/x;
This will match anything that takes the following form:
beginning of the string (which your example in the comments seems to imply you want)
zero or more whitespace characters
one or more word characters
optional: [, one or more digits, ]
one or more whitespace characters
one or more word characters
one of the following, in descending order of preference:
[, one or more digits, ]
an empty string followed by (but not including!) a character that is neither [ nor a word character (The exclusion of word characters is to keep the regex engine from succeeding on "a[0] bc[1:2]" by only matching "a[0] b".)
end of string (A space is needed after the $ to keep it from merging with the following ) to form the name of a special variable, and this entails the reintroduction of the /x option.)
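A sketch of the lookahead variant in action. Apart from the first string from the question and the "a[0] bc[1:2]" example mentioned above, the inputs are invented to show that trailing text is allowed:

```perl
use strict;
use warnings;

# Lookahead pattern from the answer above (note the space after $ and the /x).
my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\]|(?=[^[\w])|$ ))/x;

my @cases = (
    'abcd[2] xyzg[3:0]',      # from the question: must be rejected
    'a[0] bc[1:2]',           # must not succeed by matching only "a[0] b"
    'abcd[2] xyzg[3] tail',   # hypothetical: trailing text after a valid index
    'abcd xyzg, more',        # hypothetical: trailing punctuation
);

my %matched;
for my $str (@cases) {
    $matched{$str} = $str =~ $expr ? 1 : 0;
    printf "%-22s %s\n", $str, $matched{$str} ? 'MATCHED' : 'NOT MATCHED';
}
```

The first two strings should be rejected and the last two accepted.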
Do you have any more unstated requirements that need to be satisfied?
The short answer is your regexp is wrong.
We can't fix it for you without you explaining what you need exactly, and the community is not going to write a regexp exactly for your purpose because that's just too localized a question that only helps you this one time.
You need to ask something more general about regexps that we can explain to you, that will help you fix your regexp, and help others fix theirs.
Here's my general answer when you're having trouble testing your regexp. Use a regexp tool, like the regex buddy one.
So I'm going to give a specific answer about what you're overlooking here:
Let's make this example smaller:
Your pattern is a(bc+d)?. It will match abcd, abccd, etc. It will not match bcd or bzd. In the case of abzd, it will match, but only the a, because the whole group bc+d is optional. Similarly, for abcbcd it will match only the a, dropping the whole optional group because that group could not be matched (it fails at the second b).
Regexps will match as much of the string as they can, and return a true match when they can match something and have satisfied the entire pattern. If you make something optional, they will leave it out when they have to, including it only when it's present and matches.
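The small example can be checked directly; this sketch records what $& holds for each of the strings discussed:

```perl
use strict;
use warnings;

my %matched;
for my $s (qw(abcd abccd abzd abcbcd bcd)) {
    # Store the matched text, or undef when there is no match at all.
    $matched{$s} = $s =~ /a(bc+d)?/ ? $& : undef;
    printf "%-7s %s\n", $s, defined $matched{$s} ? "matched '$matched{$s}'" : 'no match';
}
# abcd    matched 'abcd'
# abccd   matched 'abccd'
# abzd    matched 'a'
# abcbcd  matched 'a'
# bcd     no match
```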
Here's what you tried:
qr/\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)/sx
First, s and x aren't needed modifiers here.
Second, this regex can match:
Any or no whitespace followed by
a word of at least one word character (\w: a letter, digit or underscore) followed by
optionally a grouped square bracketed number with at least one digit (eg [0] or [9999]) followed by
at least one white space followed by
a word of at least one word character followed by
optionally a square bracketed number with at least one digit.
Clearly when you ask it to match abcd[0] xyzg[0:4] the colon ends the \d+ pattern but doesn't satisfy the \] so it backtracks the whole group, and then happily finds the group was optional. So by not matching the last optional group, your pattern has matched successfully.
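This is easy to reproduce. A minimal sketch with the original, unanchored pattern (the variable names $matched and $rest are only for illustration):

```perl
use strict;
use warnings;

# The unanchored pattern from the question, without the unneeded modifiers.
my $expr = qr/\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)/;
my $str  = 'abcd[0] xyzg[0:4]';

my ($matched, $rest);
if ($str =~ $expr) {
    # $& is the matched text, $' is whatever follows the match.
    ($matched, $rest) = ($&, $');
}

printf "matched: %s\nrest:    %s\n", $matched, $rest;
# matched: abcd[0] xyzg
# rest:    [0:4]
```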

Funky 'x' usage in perl

My usual 'x' usage was :
print("#" x 78, "\n");
Which concatenates 78 times the string "#". But recently I came across this code:
while (<>) { print if m{^a}x }
Which prints every line of input starting with an 'a'. I understand the regexp matching part (m{^a}), but I really don't see what that 'x' is doing here.
Any explanation would be appreciated.
It's a modifier for the regex. The x modifier tells perl to ignore whitespace and comments inside the regex.
In your example code it does not make a difference because there are no whitespace or comments in the regex.
The "x" in your first case, is a repetition operator, which takes the string as the left argument and the number of times to repeat as the right argument. Perl6 can replicate lists using the "xx" repetition operator.
Your second example uses the regular expression m{^a}x. While you may use many different types of delimiters, neophytes may like to use the familiar notation, which uses a forward slash: m/^a/x
The "x" in a regex is called a modifier or a flag and is but one of many optional flags that may be used. It is used to ignore whitespace in the regex pattern, but it also allows the use of normal comments inside. Because regex patterns can get really long and confusing, using whitespace and comments are very helpful.
Your example is very short (all it tests is whether the line starts with "a"), so you probably wouldn't need whitespace or comments, but you could use them if you wanted to.
Example:
m/^a # first letter is an 'a'
# <-- you can put more regex on this line because whitespace is ignored
# <-- and more here if you want
/x
In this use case 'x' is a regex modifier which "Extends your pattern's legibility by permitting whitespace and comments", according to the Perl documentation. However, it is redundant here.
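A short sketch contrasting the two meanings of x. The sample words are invented for illustration:

```perl
use strict;
use warnings;

# x as the repetition operator: string on the left, count on the right.
my $rule  = '#' x 5;      # '#####'
my @zeros = (0) x 3;      # with a parenthesised list on the left: (0, 0, 0)

# x as a regex modifier: whitespace and comments in the pattern are ignored.
my $re = qr/
    ^a      # the string must start with 'a'
/x;

my @hits = grep { $_ =~ $re } ('apple', 'banana', 'avocado');

print "$rule\n";      # #####
print "@zeros\n";     # 0 0 0
print "@hits\n";      # apple avocado
```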

What's the difference between 'eq' and '=~' in Perl?

What is the difference between these two operators? Specifically, what difference in $a will lead to different behavior between the two?
$a =~ /^pattern$/
$a eq 'pattern'
eq is for testing string equality; == does the same thing but for numerical equality.
The =~ operator is for applying a regular expression to a scalar.
For the gory details of every Perl operator and what they're for, see the perldoc perlop manpage.
As others have noted, ($a =~ /^pattern$/) uses the regular expression engine to evaluate whether the strings are identical, whereas ($a eq 'pattern') is the plain string equality test.
If you really only want to know whether two strings are identical, the latter is preferred for reasons of:
Readability - It is more concise, containing fewer special characters.
Maintainability - With a regex pattern, you must escape any special characters that may appear in your string, or use extra markers such as \Q and \E. With a single-quoted string, the only character you need to escape is a single quote. (You also have to escape backslashes if they are followed by another backslash or the string delimiter.)
Performance - You don't incur the overhead of firing up the regex engine just to compare a string. If this happens several million times in your program, for example, the benefit is notable.
On the other hand, the regex form is far more flexible if you need to do something other than a plain string equality test. See perldoc perlre for more on regular expressions.
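To illustrate the maintainability point, here is a sketch with a made-up string that contains regex metacharacters. Without \Q...\E the string is interpreted as a pattern rather than as literal text:

```perl
use strict;
use warnings;

my $literal = 'a.b*c';   # hypothetical value containing regex metacharacters

# Unquoted: "." and "*" are metacharacters, so an unrelated string matches.
my $raw    = 'aXbbbc' =~ /^$literal\z/     ? 1 : 0;   # 1
# Quoted with \Q...\E: only the literal text matches.
my $quoted = 'aXbbbc' =~ /^\Q$literal\E\z/ ? 1 : 0;   # 0
my $exact  = 'a.b*c'  =~ /^\Q$literal\E\z/ ? 1 : 0;   # 1
# The plain string comparison needs no escaping at all.
my $plain  = 'a.b*c' eq $literal           ? 1 : 0;   # 1

print "$raw $quoted $exact $plain\n";   # 1 0 1 1
```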
EDIT: Like most everyone else before ysth, I missed the obvious functional difference between them and went straight for more abstract differences. I've clarified the question but I'll leave the answer as a (hopefully) useful reference.
eq -- Tests for string equality.
=~ -- Binds a scalar expression to a pattern match.
See here for more in-depth descriptions of all of the operators.
"pattern\n" :)
$a = "pattern\n";
print "ok 1\n" if $a =~ /^pattern$/;
print "ok 2\n" if $a eq 'pattern';
Perhaps you meant /^pattern\z/.
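Spelled out as a sketch (using $value rather than $a, since $a is also used by sort): $ matches at the very end of the string or just before a trailing newline, whereas \z matches only at the very end.

```perl
use strict;
use warnings;

my $value = "pattern\n";

my $dollar = $value =~ /^pattern$/  ? 1 : 0;   # 1: $ allows one trailing newline
my $eq     = $value eq 'pattern'    ? 1 : 0;   # 0: the strings differ by "\n"
my $z      = $value =~ /^pattern\z/ ? 1 : 0;   # 0: \z means the very end of the string

print "$dollar $eq $z\n";   # 1 0 0
```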
=~ is the binding operator. It is used to bind a value to either a pattern match (m//), a substitution (s///), or a transliteration (tr// or y//).
eq is the string equality operator; it compares two values to determine whether or not they're equal when considered as strings. There is a peer == operator that does the same thing only considering the values as numbers. (In Perl, strings and numbers are mostly interchangeable with conversions happening automatically depending on how the values are used. Because of this, when you want to compare two values you must specify the type of comparison to perform.)
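A minimal sketch of why the comparison type matters, using two values that are equal as numbers but not as strings:

```perl
use strict;
use warnings;

my ($x, $y) = ('1.0', '1');

my $numeric = $x == $y ? 1 : 0;   # 1: both values are the number 1
my $string  = $x eq $y ? 1 : 0;   # 0: "1.0" and "1" are different strings

print "$numeric $string\n";   # 1 0
```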
In general, $var =~ m/.../ determines whether or not the value of $var matches a pattern, not whether it equals a particular value. However, in this case the pattern is anchored at both ends and contains nothing but literal characters, so it's equivalent to a string comparison. It's better to use eq here because it's clearer and faster.