How can I use "s" as a substitution delimiter in Perl? - perl

I was playing with Perl and thought that
sssssss
Would have been the same as
s/s/ss/
It seems only certain delimiters can be used. What are they?

You can use any non-whitespace character as the delimiter, but you can't use the delimiter inside PATTERN or REPLACEMENT without escaping it. This is totally valid:
my $x = 's';
$x =~ s s\ss\s\ss;
print $x; # prints "ss"
Note that a space is required after the first s; otherwise ss would be parsed as an identifier.
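A minimal sketch of the escaping rule, using a made-up sample string (the variable names are illustrative and not from the question). The same substitution is written with three delimiters; only the comma version has to escape the comma inside the pattern:

```perl
use strict;
use warnings;

my $input = 'a,b';   # hypothetical sample string

# Default slash delimiter: the comma needs no escaping.
(my $with_slash = $input) =~ s/a,b/x/;

# Comma as delimiter: the comma inside PATTERN must be escaped.
(my $with_comma = $input) =~ s,a\,b,x,;

# Bracketing delimiters come in pairs, one pair per part.
(my $with_braces = $input) =~ s{a,b}{x};

print "$with_slash $with_comma $with_braces\n";   # prints "x x x"
```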

Related

Code does not remove non-ascii characters from variable

Why do the following lines of code not remove non-ascii characters from my variable and replace it with a single space?
$text =~ s/[[:^ascii:]]+/ /rg;
$text =~ s/\h+/ /g;
Whereas this works to remove newline?
$log_mess =~ s/[\r\n]+//g;
To explain the problem for anyone finding this question in the future:
$text =~ s/[[:^ascii:]]+/ /rg;
The problem is the /r option on the substitution operator (s/.../.../).
This operator is documented in the "Regexp Quote-Like Operators" section of perlop. It says this about /r:
r - Return substitution and leave the original string untouched.
You see, in most cases, the substitution operator works on the string that it is given (e.g. your variable $text) but in some cases, you don't want that. In some cases, you want the original variable to remain unchanged and the altered string to be returned so that you can store it in a new variable.
Previously, you would do this:
my $new_var = $var;
$new_var =~ s/regex/substitution/;
But since the /r option was added, you can simplify that to:
my $new_var = $var =~ s/regex/substitution/r;
I'm not sure why you used /r in your code (I guess you copied it from somewhere else), but you don't need it here and it's what is leading to your original string being unchanged.
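To make the difference concrete, here is a small sketch with a hypothetical input string. It assumes Perl 5.14 or later, where /r is available:

```perl
use strict;
use warnings;

my $text = "caf\x{e9} na\x{ef}ve";   # hypothetical input with two non-ASCII characters

# With /r the original is left untouched and the result is returned.
my $copy = $text =~ s/[[:^ascii:]]+/ /gr;

# Without /r the substitution modifies $text in place.
$text =~ s/[[:^ascii:]]+/ /g;
$text =~ s/\h+/ /g;

print "[$copy]\n";   # prints "[caf  na ve]" (two spaces after "caf")
print "[$text]\n";   # prints "[caf na ve]"
```

If the return value of the /r form is thrown away, as in the question, nothing visible happens at all.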

How do I match a string ending in whitespace using grep in Perl?

I want to call grep in a Perl script like so:
my $line = `grep "^$name\b" $inputFile`;
When I run the actual grep command in the terminal, it returns what is expected, but when I include this line in my script, nothing is returned.
If I replace the \b with a \s, it complains with "Unrecognized escape \s passed through at test.pl line 50."
I've already looked at how to use grep to match with either whitespace or newline, but it doesn't have specifics to why my script isn't returning the expected.
How do I properly include the \b or \s in the command in Perl?
You have to escape the backslash: use \\b in the grep pattern.
Don't use backticks; write it in Perl:
my $name = 'xyz';
my $line;
while ( <> ) {
next unless /^$name\b/;
$line = $_;
last;
}
This could likely be made much less clumsy if the context of the test were known. You may need to escape the contents of $name (for example with \Q...\E) if it can contain regex metacharacters.
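A short sketch of why escaping the name can matter. The name and the list of lines are invented for illustration; \Q...\E is Perl's built-in way to quote metacharacters in an interpolated variable:

```perl
use strict;
use warnings;

my $name  = 'a.b';                         # hypothetical name containing a metacharacter
my @lines = ('axb first', 'a.b second');   # hypothetical file contents

# Unescaped, the dot matches any character, so both lines match.
my @loose = grep { /^$name\b/ } @lines;

# \Q...\E quotes metacharacters, so only the literal name matches.
my @exact = grep { /^\Q$name\E\b/ } @lines;

print scalar(@loose), " vs ", scalar(@exact), "\n";   # prints "2 vs 1"
```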

perl split interesting behavior

Can somebody explain this weird behavior?
I have a path in a string and I want to split it on each backslash:
my $path = "D:\Folder\AnotherFolder\file.txt";
my @folders = split('\', $path);
In the case above it won't work, not even when escaping the backslash like this:
my @folders = split('\\', $path);
But in the case of a regexp it will work:
my @folders = split( /\\/, $path);
Why is that so?
I think amon gave the best literal answer to your question in his comment:
more explicitly: strings and regexes have different rules for escaping. If a string is used in place of a regex, the string literals suffer from double escaping
Meaning that split '\\' uses a string and split /\\/ uses a regex.
As a practical answer, I wanted to add this:
Perhaps you should consider using a module suited for splitting paths. File::Spec is a core module in Perl 5. And also, you have to escape backslash in a double quoted string, which you have not done. You can also use single quotes, which looks a bit better in my opinion.
use strict;
use warnings;
use Data::Dumper;
use File::Spec;
my $path = 'D:\Folder\AnotherFolder\file.txt'; # note the single quotes
my @elements = File::Spec->splitdir($path);
print Dumper \@elements;
Output:
$VAR1 = [
'D:',
'Folder',
'AnotherFolder',
'file.txt'
];
If you look at the documentation by running:
perldoc -f split
you will see three forms of arguments that split can take:
split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/
This means that even when you pass split a string as the first argument, perl is coercing it into a regex.
If we look at the warnings we get when trying to do something like this in re.pl:
$ my $string_with_backslashes = "Hello\\there\\friend";
Hello\there\friend
$ my @arry = split('\\', $string_with_backslashes);
Compile error: Trailing \ in regex m/\/ at (eval 287) line 6.
we see that first, '\\' is interpolated as a backslash escape followed by an actual backslash, which evaluates to a single backslash.
split then takes the backslash we gave it and coerces it to a regex, as if we had written:
$ my @arry = split(/\/, $string_with_backslashes);
which doesn't work because the single backslash is interpreted as escaping the forward slash after it, leaving the regex without a terminating /.
One of the neater ways to extract the elements of a path is to extract all sequences of characters other than a path separator.
use strict;
use warnings;
my $path = 'D:\Folder\AnotherFolder\file.txt';
my @path = $path =~ m([^/\\]+)g;
print "$_\n" for @path;
output
D:
Folder
AnotherFolder
file.txt
When split is used in the form of split STRING and not split REGEX, the string is being converted into a regex. In your case split '\\' will be converted to split /\/ since the first backslash is considered an escape character.
The correct way to do it is split '\\\\' which will be translated to split /\\/.
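The two working forms can be compared side by side. This is only a sketch; the array names are made up for the example:

```perl
use strict;
use warnings;

my $path = 'D:\Folder\AnotherFolder\file.txt';   # single quotes keep the backslashes

# Regex literal: \\ in the pattern matches one literal backslash.
my @from_regex = split /\\/, $path;

# String form: '\\\\' is the two-character string \\, which split
# then compiles as the same regex.
my @from_string = split '\\\\', $path;

print join('|', @from_regex), "\n";    # prints "D:|Folder|AnotherFolder|file.txt"
print join('|', @from_string), "\n";   # prints the same line
```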

What is the meaning of the number sign (#) in a Perl regex match?

What is the meaning of below statement in perl?
($script = $0) =~ s#^.*/##g;
I am trying to understand the operator =~ along with the statement on the right side s#^.*/##g.
Thanks
=~ applies the thing on the right (a pattern match or search and replace) to the thing on the left. There's lots of documentation about =~ out there, so I'm just going to point you at a pretty good one.
There's a couple of idioms going on there which are not obvious nor well documented which might be tripping you up. Let's cover them.
First is this...
($copy = $original) =~ s/foo/bar/;
This is a way of copying a variable and performing a search and replace on it in a single step. It is equivalent to:
$copy = $original;
$copy =~ s/foo/bar/;
The =~ operates on whatever is on the left after the left hand code has been run. ($copy = $original) evaluates to $copy so the =~ acts on the copy.
s#^.*/##g is the same as s/^.*\///g but using alternative delimiters to avoid Leaning Toothpick Syndrome. You can use just about anything as a regex delimiter. # is common, though I think it's ugly and hard to read. I prefer {} because they balance. s{^.*/}{}g is equivalent code.
Unrolling the idioms, you have this:
$script = $0;
$script =~ s{^.*/}{}g;
$0 is the name of the script. So this is code to copy the name of the script and strip everything up to the last slash (.* is greedy and will match as much as possible) off it. It is getting just the filename of the script.
The /g indicates to perform the match on the string as many times as possible. Since this can only ever match once (the ^ anchors it to the beginning of the string) it serves no purpose.
There's a better and safer way to do this.
use File::Basename;
$script = basename($0);
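As a quick check that both approaches agree, here is a sketch that uses a hypothetical path in place of $0, so the result does not depend on how the script is invoked:

```perl
use strict;
use warnings;
use File::Basename;

my $full = '/usr/local/bin/tool.pl';   # hypothetical stand-in for $0

# Copy and substitute in one statement (no /g needed).
(my $script = $full) =~ s{^.*/}{};

# The core-module equivalent.
my $base = basename($full);

print "$script $base\n";   # prints "tool.pl tool.pl"
```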
It's very, very simple:
Perl quote-like expressions can take many different characters as part separators. The separator right after the command (in this case, the s) is the separator for the rest of the operation. For example:
# Out with the "Old" and "In" with the new
$string =~ s/old/new/;
$string =~ s#old#new#;
$string =~ s(old)(new);
$string =~ s@old@new@;
All four of those expressions are the same thing. They replace the string old with new in my $string. Whatever comes after the s is the separator. Note that parentheses, curly braces, and square brackets are used in pairs. This works out rather nicely for q and qq, which can be used instead of single quotes and double quotes:
print "The value of \$foo is \"$foo\"\n"; # A bit hard to read
print qq/The value of \$foo is "$foo"\n/; # Maybe slashes weren't a great choice...
print qq(The value of \$foo is "$foo"\n); # Very nice and clean!
print qq(The value of \$foo is (believe it or not) "$foo"\n); #Still works!
The last still works because the quote like operators count opening and closing parentheses. Of course, with regular expressions, parentheses and square brackets are part of the regular expression syntax, so you won't see them so much in substitutions.
Most of the time, it is highly recommended that you stick with the s/.../.../ form just for readability. It's what people are used to and it's easy to digest. However, what if you have this?
$bin_dir =~ s/\/home\/([^\/]+)\/bin/\/Users\/$1\/bin/;
Those backslashes can make it hard to read, so the tradition has been to replace the slash separators to avoid the hills and valleys effect.
$bin_dir =~ s#/home/([^/]+)/bin#/Users/$1/bin#;
This is still a bit hard to read, but at least I don't have to escape each forward slash with a backslash, so it's easier to see what I'm substituting. Regular expressions are hard because good quote characters are hard to find. Special symbols such as ^, *, |, and + are regular expression metacharacters and could well appear in the pattern itself. The # is a common choice: it's not common in strings, and it has no special meaning in a regular expression (unless /x is in effect), so it is unlikely to clash with the pattern.
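Here is the same substitution written both ways and run against a hypothetical directory name, as a rough check that the delimiter choice changes only the readability and not the result:

```perl
use strict;
use warnings;

my $bin_dir = '/home/alice/bin';   # hypothetical directory

# Slash delimiters: every literal slash needs a backslash.
(my $with_slashes = $bin_dir) =~ s/\/home\/([^\/]+)\/bin/\/Users\/$1\/bin/;

# Hash delimiters: the slashes can be written as-is.
(my $with_hashes = $bin_dir) =~ s#/home/([^/]+)/bin#/Users/$1/bin#;

print "$with_slashes\n";   # prints "/Users/alice/bin"
print "$with_hashes\n";    # prints "/Users/alice/bin"
```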
Getting back to your original question:
($script = $0) =~ s#^.*/##g;
is the equivalent of:
($script = $0) =~ s/^.*\///g;
But because the original programmer didn't want to backslash-escape that slash, they changed the separator character.
As for:
($script = $0) =~ s#^.*/##g;
It's the same as saying:
$script = $0;
$script =~ s#^.*/##g;
You're assigning the $script variable and doing the substitution in a single step. It's very common in Perl, but it is a bit hard to understand at first.
By the way, if I understand that basic expression correctly (removing all characters up to the last forward slash), this would have been way cleaner:
use File::Basename;
...
$script = basename($0);
Much easier to read and understand -- even for an old Perl hand.
In Perl, you can use many kinds of characters as quoting characters (for strings, regular expressions, and lists). Let's break it down:
Assign the $script variable the contents of $0 (the string that contains the name of the calling script).
=~ is the binding operator. It applies a regular expression match or a search and replace to its left-hand side. In this case, that is the new variable, $script.
The s character indicates a search and replace regex.
The # character is being used as the delimiter for the regex. The usual delimiter is the / character, but you can use others, including # in this case.
The regex is ^.*/. It means "at the start of the string, match zero or more characters up to a slash." Because .* is greedy, the match runs to the last slash it can reach, and it never crosses a newline (which . does not match by default).
The second # ends the pattern and starts the 'replace' value. Usually the replacement makes use of whatever the pattern captured.
The # again. This ends the replace value. Since there is nothing between its start and end, everything the pattern matched is replaced with nothing.
g, or global match. The search and replace will keep happening as many times as it matches in the value.
Effectively, it finds and removes everything before the last / in the script name, leaving any newlines in place. It's a really lazy way of getting the script name, and it only works with a Unix-like path.
If you have a chance, consider replacing with File::Basename, a core module in Perl:
use File::Basename;
# later ...
my $script = fileparse($0);

I want to create a perl code to extract what is in the parentheses and port it to a variable

I want to create a perl code to extract what is in the parentheses and port it to a variable.
"(05-NW)HPLaserjet" should become "05-NW"
Something like this:
Catch "("
take out any spaces that exist in between ()
everything in between () = variable 1
How would I go about doing this?
This is a job for regular expressions. It looks confusing because parens are metacharacters in regular expressions and also appear literally in your pattern, where they are escaped by backslashes.
C:\temp $ echo (05-NW)HPLaserjet | perl -nlwe "print for m/\(([^)]+)\)/g"
Match opening paren, start capture group, match one or more characters that aren't the closing paren, close capture group, match closing paren.
You can use regular expressions (see perlretut) to match and capture the value. By assigning to a list, you can put your captures into named variables. The global variables $1, $2 etc. are also used for capture groups, so you can use that instead of list assignment if you like.
use strict;
use warnings;
while (<DATA>) # read every line of the __DATA__ section below
{
my ($printer_code) = m/
\( # Match literal opening parenthesis
([^\)]*) # Capture group (printer_code): Match characters which aren't right parenthesis, zero or more times
\)/x; # Match literal closing parenthesis
# The 'x' modifier allows you to add whitespace and comments to regex for clarity.
# If you use it, make sure you use '\ ' (or '\s', etc.) for actual literal whitespace matching!
print "$printer_code\n" if defined $printer_code;
}
__DATA__
(05-NW)HPLaserjet
perldoc perlre
use warnings;
use strict;
my $s = '(05-NW)HPLaserjet';
my ($v) = $s =~ /\((.*)\)/; # Grab everything between parens (including other parens)
$v =~ s/\s//g; # Remove all whitespace
print "$v\n";
__END__
05-NW
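The greedy .* and the negated character class behave differently once the input holds more than one pair of parentheses. A small sketch with a made-up input that also contains spaces inside the first pair:

```perl
use strict;
use warnings;

my $s = '( 05 - NW )HP(Laser)jet';   # hypothetical input with two groups

my ($greedy) = $s =~ /\((.*)\)/;      # runs to the last closing paren
my ($first)  = $s =~ /\(([^)]*)\)/;   # stops at the first closing paren
$first =~ s/\s+//g;                   # remove the spaces inside the parens

print "$greedy\n";   # prints " 05 - NW )HP(Laser"
print "$first\n";    # prints "05-NW"
```

Which form is right depends on whether nested or repeated parentheses can occur in the real data.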
See also: Perl Idioms Explained - @ary = $str =~ m/(stuff)/g