Substitution operator in Perl - perl

I am new to Perl. I have the following substitution expression:
$tmp =~ s:/x/y/z::;
I have searched a lot for it but couldn't find a similar expression.
What does it mean?

You can use any non-whitespace character as a delimiter; here, instead of the most common / (s/foo/bar/), the delimiter is : (s:foo:bar:), because what you are substituting contains slash characters, and if you used a slash delimiter you'd have to escape them (s/\/x\/y\/z//), which many people consider ugly.
So your expression is simply removing the first /x/y/z from $tmp.

That means: replace /x/y/z with nothing.
For example: if you have a string like /a/b/x/y/z, the result will be /a/b.
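To make the two answers above concrete, here is a minimal sketch. The sample value in $tmp is hypothetical; it only shows that the colon-delimited form and the escaped slash-delimited form do the same thing.

```perl
use strict;
use warnings;

my $tmp = '/a/b/x/y/z';   # hypothetical sample value

# Colon delimiters: the slashes in the pattern need no escaping.
(my $with_colon = $tmp) =~ s:/x/y/z::;

# The same substitution with the usual slash delimiters, escaped.
(my $with_slash = $tmp) =~ s/\/x\/y\/z//;

print "$with_colon\n";   # prints /a/b
print "$with_slash\n";   # prints /a/b
```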

Related

Not able to understand a command in perl

I need help to understand what below command is doing exactly
$abc{hier} =~ s#/tools.*/dfII/?.*##g;
and $abc{hier} contains a path "/home/test1/test2/test3"
Can someone please let me know what the above command is doing exactly. Thanks
s/PATTERN/REPLACEMENT/ is Perl's substitution operator. It searches a string for text that matches the regex PATTERN and replaces it with REPLACEMENT.
By default, the substitution operator works on $_. To tell it to work on a different variable, you use the binding operator - =~.
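A small sketch of that difference, using made-up strings: inside the loop s/// acts on $_ (which aliases each array element), while the last statement uses =~ to target a named variable.

```perl
use strict;
use warnings;

# Without a binding operator, s/// works on $_.
my @words = ('foo bar', 'foo baz');   # hypothetical data
for (@words) {
    s/foo/qux/;                       # modifies the current element via $_
}

# With =~, it works on the variable you name.
my $name = 'foo bar';
$name =~ s/foo/qux/;

print "@words\n";   # prints qux bar qux baz
print "$name\n";    # prints qux bar
```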
The default delimiter used by the substitution operator is a slash (/) but you can change that to any other character. This is useful if your PATTERN or your REPLACEMENT contains a slash. In this case, the programmer has used # as the delimiter.
To recap:
$abc{hier} =~ s#PATTERN#REPLACEMENT#;
means "look for text in $abc{hier} that matches PATTERN and replace it with REPLACEMENT".
The substitution operator also has various options that change its behaviour. They are added by putting letters after the final delimiter. In this case we have a g. That means "make the substitution global" - or match and change all occurrences of PATTERN.
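To see what g changes, here is a short sketch on a made-up string. Without g only the first match is replaced; with g every match is.

```perl
use strict;
use warnings;

my $text = 'a/b/c/d';               # hypothetical sample value

(my $first = $text) =~ s#/#-#;      # no g: only the first slash changes
(my $all   = $text) =~ s#/#-#g;     # g: every slash changes

print "$first\n";   # prints a-b/c/d
print "$all\n";     # prints a-b-c-d
```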
In your case, the REPLACEMENT string is empty (we have two # characters next to each other). So we're replacing the PATTERN with nothing - effectively deleting whatever matches PATTERN.
So now we have:
$abc{hier} =~ s#PATTERN##g;
And we know it means, "in the variable $abc{hier}, look for any string that matches PATTERN and replace it with nothing".
The last thing to look at is the PATTERN (or regular expression - "regex"). You can get the full definition of regexes in perldoc perlre. But to explain what we're using here:
/tools : is the fixed string "/tools"
.* : is zero or more of any character
/dfII : is the fixed string "/dfII"
/? : is an optional slash character
.* : is (again) zero or more of any character
So, basically, we're removing bits of a file path from a value that's stored in a hash.
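As a rough illustration, the sketch below runs the command on the path from the question and on a second, hypothetical path that actually contains /tools and /dfII. The %other hash and its value are made up for the example.

```perl
use strict;
use warnings;

# The value from the question: it contains no "/tools", so nothing matches.
my %abc = (hier => '/home/test1/test2/test3');
$abc{hier} =~ s#/tools.*/dfII/?.*##g;

# A hypothetical value that does match: everything from "/tools" on is removed.
my %other = (hier => '/home/tools/cadence/dfII/bin');
$other{hier} =~ s#/tools.*/dfII/?.*##g;

print "$abc{hier}\n";     # prints /home/test1/test2/test3
print "$other{hier}\n";   # prints /home
```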
This =~ means "Do a regex operation on that variable."
(Actually, as ikegami correctly reminds me, it is not necessarily only regex operations, because it could also be a transliteration.)
The operation in question is s#something#else#, which means replace the "something" with something "else".
The g at the end means "Do it for all occurrences of something."
Since the "else" is empty, the replacement has the effect of deleting.
The "something" is a definition according to regex syntax, roughly it means "Starting with '/tools' and later containing '/dfII', followed pretty much by anything until the end."
Note that the regex ends with /?.*. In detail, this means "a slash (/), or maybe not (?), and then absolutely anything (.) any number of times, including zero times (*)". Strictly speaking, it is not necessary to specify "slash or not" if it is followed by "anything, any number of times", because "anything" includes a slash and "any number of times" includes zero or one. I.e. the /? could be omitted without changing the behaviour.
(Thanks ikegami for confirming.)
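A quick way to convince yourself that the /? is redundant is to run both variants over a few paths and compare. The sample paths below are hypothetical, and the check only covers these samples; it is an illustration, not a proof.

```perl
use strict;
use warnings;

# Hypothetical sample paths: with and without a trailing slash or suffix.
my @samples = (
    '/home/tools/x/dfII',
    '/home/tools/x/dfII/',
    '/home/tools/x/dfII/bin/run',
    '/home/test1',
);

my $mismatches = 0;
my @results;
for my $path (@samples) {
    (my $with    = $path) =~ s#/tools.*/dfII/?.*##g;   # original pattern
    (my $without = $path) =~ s#/tools.*/dfII.*##g;     # /? dropped
    $mismatches++ if $with ne $without;
    push @results, $with;
    print "$path -> $with\n";
}
print "mismatches: $mismatches\n";   # prints mismatches: 0
```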
$abc{hier} =~ s#/tools.*/dfII/?.*##g;
The above command uses a regular expression to strip everything from /tools.*/dfII (optionally followed by / and any further text) to the end of the value of the hier member of the %abc hash.
It is pretty basic Perl, apart from the non-standard regular expression delimiters (# instead of the standard /). They avoid having to escape / inside the regular expression (s/\/tools.*\/dfII\/?.*//g).
My personal preferred style-guide would make it s{/tools.*/dfII/?.*}{}g .

Perl global substitution of a file path

I am reading a tab-delimited file using Perl, and want to apply a global substitution to a file path within this file. I have read that I need to incorporate \Q and \E into my substitution command, but I'm not able to get the substitution to work. I want to replace the partial string psoft/batch/cs with ps/bat/csprd.
$xl[$idx] =~ s/\Qpsoft/batch/cs\E/\Q/psoft/batch/csprd\E/g;
You can't use \Q to escape a delimiter. For example,
s/\Qa*b//
is equivalent to
s/a\*b//
and not
s/a\*b\/\/...
That means
$xl[$idx] =~ s/\Qpsoft/batch/cs\E/\Q/psoft/batch/csprd\E/g;
is equivalent to
$xl[$idx] =~ s/psoft/batch/cs <junk>
Solution:
$xl[$idx] =~ s/psoft\/batch\/cs/\/psoft\/batch\/csprd/g;
Better:
$xl[$idx] =~ s{psoft/batch/cs}{/psoft/batch/csprd}g;
In more detail
There are three steps to parsing an m//, qr// or s/// operator.
The first step is to obtain the trailing flags that affect how the regex pattern is parsed (e.g. x, s, m, i, etc). Since Perl doesn't yet know how to parse the regex pattern, and to keep costs down, Perl simply looks for the delimiter marking the end of the pattern and the end of the substitution (usually /), paying attention to no character other than backslashes (\). \Q is ignored at this point.
The second step is where the double-quoted string escapes (e.g. \Q, \L, etc) and interpolation occurs. Perl won't have a regex pattern until these are processed.
Finally, Perl has a regex pattern and knows how to compile it, so the third step is to compile the regex pattern.
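The effect of these steps can be sketched with a few matches. The test strings are made up; the point is that \Q quotes regex metacharacters such as *, but a / inside \Q...\E would still end a slash-delimited pattern, so the last line switches to braces.

```perl
use strict;
use warnings;

# \Q makes the * literal, so the pattern is a\*b.
my $quoted_hits_literal = ('a*b' =~ /\Qa*b\E/) ? 1 : 0;   # 1
my $quoted_hits_aab     = ('aab' =~ /\Qa*b\E/) ? 1 : 0;   # 0

# Without \Q, a*b means "zero or more a, then b".
my $plain_hits_aab      = ('aab' =~ /a*b/) ? 1 : 0;       # 1

# The delimiter is found before \Q is processed, so a pattern that
# contains / needs a different delimiter, not \Q.
my $path_hit            = ('x/y' =~ m{\Qx/y\E}) ? 1 : 0;  # 1

print "$quoted_hits_literal $quoted_hits_aab $plain_hits_aab $path_hit\n";   # prints 1 0 1 1
```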
The first problem is that you need to use a different set of delimiters for the substitution operator. Instead of s///, you can use s{}{}. Another problem is that you should not use \Q and \E on the right side of s/// because the right side is not a regular expression. In your case, you don't need \Q/\E at all:
s{psoft/batch/cs}{/psoft/batch/csprd}g;
Refer to s/PATTERN/REPLACEMENT/
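If the search text comes from a variable rather than being typed into the pattern, \Q is still useful. The sketch below assumes a made-up tab-delimited line and uses the two strings named in the question's prose (psoft/batch/cs and ps/bat/csprd); $old, $new and $line are hypothetical names.

```perl
use strict;
use warnings;

# Strings taken from the question's prose; the input line is hypothetical.
my ($old, $new) = ('psoft/batch/cs', 'ps/bat/csprd');
my $line = "job1\t/opt/psoft/batch/cs/run.sh\tdaily";

my @xl = split /\t/, $line;
for my $idx (0 .. $#xl) {
    # Braces as delimiters: slashes in $old need no escaping.
    # \Q...\E quotes any regex metacharacters $old might contain.
    $xl[$idx] =~ s{\Q$old\E}{$new}g;
}

print join("\t", @xl), "\n";   # the middle field becomes /opt/ps/bat/csprd/run.sh
```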

What is this replacement doing?

$_ =~ s/\#N\/A//g;
Is it replacing \#N\ with A\? Are these special characters in Perl? Sorry, I have no idea how to even look up this syntax.
It is removing #N/A from the string $_.
\#N matches #N (escaping the #)
\/A matches /A (escaping the /)
You can simplify how confusing this looks by changing the substitution delimiter:
$_ =~ s|#N/A||g;
Like Hunter McMillen said, it is removing #N/A from the default variable.
But you can write something shorter and more readable:
s!#N/A!!g;
As @Hunter McMillen mentioned, it's just normal regex substitution with special characters escaped. It's probably better written as
s|#N/A||g
or
s{#N/A}{}g
In Damian Conway's Perl Best Practices, to make regular expressions more readable, he recommends:
s{ \# N \/ A }{}gmsx;
That is:
Use curly braces around the pattern and replacement.
Use spaces to separate parts of the pattern via the /x switch.
Backslash escape non-alphanumeric characters, even if they aren't meta-characters.
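A small check that the spaced-out form behaves like the original. The sample cell text is hypothetical. Note that under /x an unescaped # would start a comment, so the backslash before # is required there, not just stylistic.

```perl
use strict;
use warnings;

my $cell = 'total: #N/A units';   # hypothetical sample value

# The original, compact form.
(my $compact = $cell) =~ s/\#N\/A//g;

# The Perl Best Practices style: braces, /x whitespace, escaped punctuation.
(my $spaced = $cell) =~ s{ \# N \/ A }{}gmsx;

print "same result\n" if $compact eq $spaced;   # prints same result
```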

Perl string sub

I want to replace something with a path like C:\foo, so I:
s/hello/c:\foo
But that is invalid.
Do I need to escape some chars?
Two problems that I can see.
Your first problem is that your s/// replacement is not terminated:
s/hello/c:\foo # fatal syntax error: "Substitution replacement not terminated"
s/hello/c:\foo/ # syntactically okay
s!hello!c:\foo! # also okay, and more readable with backslashes (IMHO)
Your second problem, the one you asked about, is that the \f is taken as a form feed escape sequence (ASCII 0x0C), just as it would be in double quotes, which is not what you want.
You may either escape the backslash, or let variable interpolation "hide" the problem:
s!hello!c:\\foo! # This will do what you want. Note double backslash.
my $replacement = 'c:\foo'; # N.B.: Using single quotes here, not double quotes
s!hello!$replacement!; # This also works
Take a look at the treatment of Quote and Quote-like Operators in perlop for more information.
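Here is a small sketch that puts the three variants side by side. The variable names are made up for the example; the unescaped form is included only to show the form feed sneaking in.

```perl
use strict;
use warnings;

my $replacement = 'c:\foo';   # single quotes keep the backslash as-is

# Doubled backslash in the replacement.
(my $escaped = 'hello') =~ s!hello!c:\\foo!;

# Replacement taken from a variable.
(my $via_var = 'hello') =~ s!hello!$replacement!;

# Unescaped: \f becomes a single form feed character.
(my $form_feed = 'hello') =~ s!hello!c:\foo!;

print "$escaped\n";                # prints c:\foo
print "$via_var\n";                # prints c:\foo
print length($form_feed), "\n";    # prints 5 ("c:", form feed, "oo")
```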
If I understand what you're asking, then this might be something like what you're after:
$path = "hello/there";
$path =~ s/hello/c:\\foo/;
print "$path\n";
To answer your question, yes you do need to double the backslash because \f is an escape sequence for "form feed" in a Perl string.
The problem is that you are not escaping special characters:
s/hello/c:\\foo/;
would solve your problem. \ is a special character so you need to escape it. {}[]()^$.|*+?\ are meta (special) characters which you need to escape.
Additional reference: http://perldoc.perl.org/perlretut.html

How do I escape special characters for a substitution in a Perl one-liner?

Is there some way to replace a string such as # or * or ? or & without needing to put a "\" before it?
Example:
perl -pe 'next if /^#/; s/\#d\&/new_value/ if /param5/' test
In this example I need to replace a #d& with new_value but the old value might contain any character, how do I escape only the characters that need to be escaped?
You have several problems:
You are using \b incorrectly
You are replacing code with shell variables
You need to quote metacharacters
From perldoc perlre
A word boundary ("\b") is a spot between two characters that has a "\w" on one side of it
Neither of the characters # or & are \w characters. So your match is guaranteed to fail. You may want to use something like s/(^|\s)\#d\&(\s|$)/${1}new text$2/
(^|\s) says to match either the start of the string (^)or a whitespace character (\s).
(\s|$) says to match either the end of the string ($) or a whitespace character (\s).
To solve the second problem, you should use %ENV.
To solve the third problem, you should use the \Q and \E escape sequences to escape the value in $ENV{a}.
Putting it all together we get:
#!/bin/bash
export a='#d&'
export b='new text'
echo 'param5 #d&' |
perl -pe 'next if /^#/; s/(^|\s)\Q$ENV{a}\E(\s|$)/$1$ENV{b}$2/ if /param5/'
Which prints
param5 new text
As discussed at perldoc perlre:
...Today it is more common to use the quotemeta() function or the "\Q" metaquoting
escape sequence to disable all metacharacters' special meanings like this:
/$unquoted\Q$quoted\E$unquoted/
Beware that if you put literal backslashes (those not inside interpolated variables) between "\Q" and "\E", double-quotish backslash interpolation may
lead to confusing results. If you need to use literal backslashes within "\Q...\E", consult "Gory details of parsing quoted constructs" in perlop.
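The quotemeta() function mentioned in that passage can be sketched like this. It is a script version of the one-liner above, with the old and new values held in hypothetical variables $old and $new instead of the environment.

```perl
use strict;
use warnings;

my $old  = '#d&';          # text to find; may contain any characters
my $new  = 'new text';
my $line = 'param5 #d&';   # hypothetical input line

# quotemeta backslashes every character that is not a letter, digit or _.
my $quoted = quotemeta $old;
print "$quoted\n";         # prints \#d\&

# Same pattern shape as the one-liner: whitespace or line edge on both sides.
$line =~ s/(^|\s)$quoted(\s|$)/$1$new$2/ if $line =~ /param5/;
print "$line\n";           # prints param5 new text
```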
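You can also use ' as the delimiter in the s/// operation. That turns off variable interpolation in both the pattern and the replacement, although regex metacharacters such as . and * keep their special meaning: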
my $text = '#';
$text =~ s'#'1';
print $text;
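A short sketch of what the single-quote delimiter does and does not do, using made-up strings: the replacement $5 is inserted as literal text because nothing is interpolated, while the . in the second pattern still matches any character.

```perl
use strict;
use warnings;

# No interpolation: the replacement is the two literal characters $5.
my $price = 'cost';
$price =~ s'cost'$5';

# Regex metacharacters are still active: . matches the X.
my $wild = 'aXb';
$wild =~ s'a.b'hit';

print "$price\n";   # prints $5
print "$wild\n";    # prints hit
```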
In your example, you can do (note the single quotes):
perl -pe 's/\b\Q#f&\E\b/new_value/g if m/param5/ and not /^ *#/'
The other answers have covered the question, now here's your meta-problem: Leaning Toothpick Syndrome. It's when the delimiter and escapes start to blur together:
s/\/foo\/bar\\/\/bar\/baz/
The solution is to use a different delimiter. You can use just about anything, but balanced braces work best. Most editors can parse them and you generally don't have to worry about escaping.
s{/foo/bar\\}{/bar/baz}
Here's your regex with braced delimiters.
s{\#d\&}{new_value}
Much easier on the eyeholes.
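If you really want to avoid typing the \s, put your search string into a variable and then use that in your regex instead. For this particular string you don't need quotemeta or \Q ... \E, because #, d and & are not regex metacharacters; a string containing characters such as * or ? would still need quoting. For example: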
my $s = '#d&';
s/$s/new_value/g;
If you must use this in a one-liner, bear in mind that you will have to escape the $s if you use "s to contain your perl code, or escape the 's if you use 's to contain your perl code.
If you have a string like
my $var1 = 'abc$123';
and you want to replace it with abcd, then you have to use \Q and \E. Without them, the $ in the interpolated value is treated as a regex anchor, so the pattern never matches and Perl doesn't replace the string.
This is the only thing that worked for me (here $var2 holds the text being edited):
$var2 =~ s/\Q$var1\E/abcd/g;
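To show why \Q matters for a value like this, here is a minimal sketch. The text in $text is hypothetical. Without \Q the interpolated $ acts as an end-of-string anchor and nothing matches; with \Q it is matched literally.

```perl
use strict;
use warnings;

my $var1 = 'abc$123';          # single quotes: the $ is a plain character
my $text = 'id=abc$123;';      # hypothetical text to edit

# Without \Q: the pattern becomes abc$123, where $ is an anchor. No match.
(my $plain = $text) =~ s/$var1/abcd/g;

# With \Q: the pattern becomes abc\$123 and matches the literal text.
(my $quoted = $text) =~ s/\Q$var1\E/abcd/g;

print "$plain\n";    # prints id=abc$123;
print "$quoted\n";   # prints id=abcd;
```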