Perl: quoting correctly all special characters [duplicate] - perl

This question already has answers here:
How can I prevent Perl from interpreting double-backslash as single-backslash character?
(3 answers)
Closed 4 years ago.
I have this sample string, containing 2 backslashes. Please don't ask me for the source of the string, it is just a sample string.
my $string = "use Ppppp\\Ppppp;";
print $string;
Both, double quotes or quotes will print
use Ppppp\Ppppp;
Using
my $string = "\Quse Ppppp\\Ppppp;\E";
print $string;
will print
use\ Ppppp\\Ppppp\;
adding those extra backslashes to the output.
Is there a simple solution in perl to display the string "literally", without modifying the string like adding extra backslashes to escape?

I have this sample string, containing 2 backslashes. ...
my $string = "use Ppppp\\Ppppp;";
Sorry, but you're mistaken - that string only contains one backslash*, as \\ is a escape sequence in double-quoted (and single-quoted) strings that produces a single backslash. See also "Quote and Quote-like Operators" in perlop. If your string really does contain two backslashes, then you need to write "use Ppppp\\\\Ppppp;", or use a heredoc, as in:
chomp( my $string = <<'ENDSTR' );
use Ppppp\\Ppppp;
ENDSTR
If you want the string output as valid Perl source code (using its escaping), then you can use one of several options:
my $string = "use Ppppp\\Ppppp;";
# option 1
use Data::Dumper;
$Data::Dumper::Useqq=1;
$Data::Dumper::Terse=1;
print Dumper($string);
# option 2
use Data::Dump;
dd $string;
# option 3
use B;
print B::perlstring($string);
Each one of these will print "use Ppppp\\Ppppp;". (There are of course other modules available too. Personally I like Data::Dump. Data::Dumper is a core module.)
Using one of these modules is also the best way to verify what your $string variable really contains.
If that still doesn't fit your needs: A previous edit of your question said "How can I escape correctly all special characters including backslash?" - you'd have to specify a full list of which characters you consider special. You could do something like this, for example:
use 5.014; # for s///r
my $string = "use Ppppp\\Ppppp;";
print $string=~s/(?=[\\])/\\/gr;
That'll print $string with backslashes doubled, without modifying $string. You can also add more characters to the regex character class to add backslashes in front of those characters as well.
* Update: So I don't sound too pedantic here: of course the Perl source code contains two backslashes. But there is a difference between the literal source code and what the Perl string ends up containing, the same way that the string "Foo\nBar" contains a newline character instead of the two literal characters \ and n.
For the sake of completeness, as already discussed in the comments: \Q\E (aka quotemeta) is primarily meant for escaping any special characters that may be special to regular expressions (all ASCII characters not matching /[A-Za-z_0-9]/), which is why it is also escaping the spaces and semicolon.
Since you mention external files: If you are reading a line such as use Ppppp\\Ppppp; from an external file, then the Perl string will contain two backslashes, and if you print it, it will also show two backslashes. But if you wanted to represent that string as Perl source code, you have to write "use Ppppp\\\\Ppppp;" (or use one of the other methods from the question you linked to).

Related

What is the space character in Perl?

I want to have this output
"type":"test test"
I don't want to use space in the command not like " ".
I want to have a character that represents a single space, and I know that I can not use \s.
Is there some thing I can use?
print "\"type\":\"test(space character should be here )test\";
The best solution really is to use the space key directly:
print q("type":"test test"); # Changed delimiter for fewer escapes
But if you want to, you can also use
\x20, the ASCII code for “Space”, or
\N{SPACE}, the Unicode charname for the normal space (there are many more).
\N{U+0020}, the Unicode codepoint for the normal space.
…
Note that these only work in double-quoted strings, e.g. qq(...).
If you want to go the charname route, then you have to load the charnames module prior to Perl 5.16:
use charnames ':full';
Since 5.16, that module is loaded automatically once an \N{...} escape is found.
By default, the $" variable contains a single space – the contents of this variable are used to concatenate the contents of an array when it is interpolated.
Just use the space key:
print "\"type\":\"test test\"";
# ^

Different ways of expression quotes in Perl

There are two ways to express quotations:
' apostrophe
’ single quotation
In Perl, I can match ' apostrophe using regular expressions. However, I can't match ’ single quotation in same way.
What's the problem here? Thanks a lot!
What you call "signle quotation" is the unicode character "RIGHT SINGLE QUOTATION MARK". When dealing with unicode characters in Perl, be sure to properly identify the encoding of the input and of the script. See perlunicode - Unicode support in Perl
for details.
$ perl -CO -E 'use utf8; say for "’Hello’" =~ /(’)/g'
’
’
use strict;
use warnings;
my $validq1=qq|' apostrophe|;
my $validq2=qq|’ single quotation|;
my $noquotes=qq| teapot|;
my #listofquotechars=qw(' ` " ’);
my $quotematcher="[".join("",map {quotemeta($_)} #listofquotechars)."]";
print $validq1 if ($validq1 =~ /$quotematcher/);
print $validq2 if ($validq2 =~ /$quotematcher/);
print $noquotes if ($noquotes =~ /$quotematcher/);
Giving a list of characters you wish to match and then making a character class for a regular expression is one way of doing it, as shown above.

Perl: Is quotemeta for regular expressions only? Is it safe for file names?

While answering this question regarding safe escaping of filename with spaces (and potentially other characters), one of the answers said to use Perl's built-in quotemeta function.
The documentation of quotemeta states:
quotemeta (and \Q ... \E ) are useful when interpolating strings
into regular expressions, because by default an interpolated variable
will be considered a mini-regular expression.
In the documentation for quotemeta, the only mention of its use is to escape all the characters other than /[A-Za-z_0-9]/ with a \ for use in a regex. It does not state the use for filenames. This does seem like a very pleasant, if undocumented, side effect however.
In a comment to Sinan Ünür answer to the earlier question, hobbs states:
shell escaping is different from
regexp escaping, and although I can't
come up with a situation where
quotemeta would give a truly unsafe
result, it's not meant for the task.
If you must escape, instead of
bypassing the shell, I suggest trying
String::ShellQuote which takes a more
conservative approach using sh single
quotes to defang everything except
single quotes themselves, and
backslashes for single quotes. – hobbs
Aug 13 '09 at 14:25
Is it safe -- completely -- to use quotemeta in place of more conservative file quoting like String::Shellquote? Is quotemeta utf8 or multibyte character safe?
I put together a test that is unclear. quotemeta works well, it seems, except for a file name or directory name with a \n, or \r in it. While rare, these characters are legal in Unix and I have seen them. Recall that certain characters, such as LF, CR and NUL cannot be escaped with \. I read my hard drive with 700k files with quotemeta and had no failures.
I have suspicion (though I have not demonstrated it yet) that quotemeta might fail with multibyte characters where one or more of the bytes falls into the ASCII range. For example,à can be encoded as one character (UTF8 C3 A0) or as two characters (U+0061 gives a u+0300 is a combining graves accent). The only demonstrated failure I have with quotemeta is with files with a \n or \r in the path that I created. I would be interested in other characters to put in nasty_names to test.
ShellQuote works perfectly on all file names except those terminated by a NUL when creating a file. I have never ever had a failure with it.
So what to use? Just to be clear: shell quoting is not something I do often, since I usually just use Perl open to open a pipe to a process. That method does not suffer the shell issues discussed. I am interested since I have seen quotemeta used often for file name escaping.
(Thanks to Ether I have added IPC::System::Simple)
Test file:
use strict; use warnings; use autodie;
use String::ShellQuote;
use File::Find;
use File::Path;
use IPC::System::Simple 'capturex';
my #nasty_names;
my $top_dir = '/Users/andrew/bin/pipetestdir/testdir';
my $sub_dir = "easy_to_remove_me";
my (#qfail, #sfail, #ipcfail);
sub wanted {
if ($File::Find::name) {
my $rtr;
my $exec1="ls ".quotemeta($File::Find::name);
my $exec2="ls ".shell_quote($File::Find::name);
my #exec3= ("ls", $File::Find::name);
$rtr=`$exec1`;
push #qfail, "$exec1"
if $rtr=~/^\s*$/ ;
$rtr=`$exec2`;
push #sfail, "$exec2"
if $rtr=~/^\s*$/ ;
$rtr = capturex(#exec3);
push #ipcfail, \#exec3
if $rtr=~/^\s*$/ ;
}
}
chdir($top_dir) or die "$!";
mkdir "$top_dir/$sub_dir";
chdir "$top_dir/$sub_dir";
push #nasty_names, "name with new line \n in the middle";
push #nasty_names, "name with CR \r in the middle";
push #nasty_names, "name with tab\tright there";
push #nasty_names, "utf \x{0061}\x{0300} combining diacritic";
push #nasty_names, "utf e̋ alt combining diacritic";
push #nasty_names, "utf e\x{cc8b} alt combining diacritic";
push #nasty_names, "utf άέᾄ greek";
push #nasty_names, 'back\slashes\\Not\\\at\\\\end';
push #nasty_names, qw|back\slashes\\IS\\\at\\\\end\\\\|;
sub create_nasty_files {
for my $name (#nasty_names) {
open my $fh, '>', $name ;
close $fh;
}
}
for my $dir (#nasty_names) {
chdir("$top_dir/$sub_dir");
mkpath($dir);
chdir $dir;
create_nasty_files();
}
find(\&wanted, $top_dir);
print "\nquotemeta failed on:\n", join "\n", #qfail;
print "\nShell Quote failed on:\n", join "\n", #sfail;
print "\ncapturex failed on:\n", join "\n", #ipcfail;
print "\n\n\n",
"Remove \"$top_dir/$sub_dir\" before running again...\n\n";
Quotemeta is safe under these assumptions:
Only non-alphanumeric characters have a special meaning.
If a non-alphanumeric character has a special meaning, putting a backslash in front of it will always make it non-special.
If a non-alphanumeric character doesn't have a special meaning, putting a backslash in front of it will do nothing.
The shell violates rules 2 and 3 no matter what quote context you use -- outside of quotes, backslash-newline doesn't generate newline; in double-quotes, backslash-punctuation puts a backslash into the output (outside of a certain list of punctuation); and in single-quotes, everything is literal and backslash doesn't even protect you against a closing single-quote.
I still recommend String::ShellQuote if you need to quote things for the shell. I also recommend avoiding letting the shell process your filenames entirely, if you can, by using LIST-form system/exec/open or IPC::Open2, IPC::Open3, or IPC::System::Simple.
As for things besides the shell... lots of different things violate one or more of the rules. For example, obsolete POSIX "basic" regexes and various kinds of editor regexes have punctuation characters that are non-special by default, but become special when preceded by backslash. Basically what I'm saying is, know the thing that you're feeding your data to very well, and escape properly. Only use quotemeta if it's an exact fit, or if you're using it for something that's not very important.
You could also use IPC::System::Simple capture() or capturex() (which I suggested in another answer on that first question), which will let you bypass the shell.
I added these lines to your script and found that no examples failed:
use IPC::System::Simple 'capturex';
...
my (#qfail, #sfail, #ipcfail);
...
my #exec3= ("ls", $File::Find::name);
...
$rtr = capturex(#exec3);
push #ipcfail, \#exec3
if $rtr=~/^\s*$/ ;
...
print "\ncapturex failed on:\n", join "\n", #ipcfail;
But in general, you should solve the actual problem, rather than attempting to find better band-aids. quotemeta is intended specifically to escape regular expression-significant characters, which as you have discovered is not a perfect overlap with the set of characters that are significant to the shell.
The following is a Unix-only solution; see https://stackoverflow.com/a/32161361/45375 for Windows support.
An alternative is this simple function, which should work robustly even with non-ASCII characters (assuming the correct encoding), as well as \n, and \r, but excluding NUL (see bottom).
sub quoteforsh { join ' ', map { "'" . s/'/'\\''/gr . "'" } #_ }
The function encloses each argument in single-quotes and, if multiple arguments were specified, separates them with spaces.
Single-quoted strings are used, because their contents is not subject to any interpretation in POSIX-like shells.
As such, however, you cannot even escape ' instances themselves, which requires the following workaround: every embedded ' instance is replaced with '\'' (sic), which effectively splits the input string into multiple single-quoted strings, with escaped ' instances - \' - spliced in - the shell then reassembles the string parts into a single string.
Example:
print quoteforsh 'I\'m here & wëll';
literally produces (including the enclosing single-quotes) 'I'\''m here & wëll', which, to the shell, are 3 contiguous strings - 'I', \', and '&well', which the shell then reassembles into a single string, which, after quote removal, yields I'm here & wëll.
OSX Unicode caveat: The HFS+ stores filenames in NFD (decomposed Unicode normal form - base letter followed by another character that is the associated diacritic), whereas Perl typically creates NFC (composed Unicode normal form - a single character identifies the accented letter).
When using literal filenames, this distinction doesn't matter (the system calls do the mapping), but when using globs, it does, and, unfortunately, you have to do your own translation between the two forms.
Support for NUL (0x0) chars.:
I don't think NUL chars. in filenames are a real-world concern:
Most POSIX-like shells (bash, dash, ksh) ignore NUL chars. on the command line - zsh being the only exception.
Even if that weren't an issue, according to Wikipedia, most Unix systems do not support NUL chars. in filenames.
Besides, trying to pass a literal with a NUL to Perl's system() function breaks the invocation, presumably, because the string passed to sh -c is cut off at the first NUL:
system "echo 'a\x{0}b'"; # BREAKS

How do I escape special characters for a substitution in a Perl one-liner?

Is there some way to replace a string such as #or * or ? or & without needing to put a "\" before it?
Example:
perl -pe 'next if /^#/; s/\#d\&/new_value/ if /param5/' test
In this example I need to replace a #d& with new_value but the old value might contain any character, how do I escape only the characters that need to be escaped?
You have several problems:
You are using \b incorrectly
You are replacing code with shell variables
You need to quote metacharacters
From perldoc perlre
A word boundary ("\b") is a spot between two characters that has a "\w" on one side of it
Neither of the characters # or & are \w characters. So your match is guaranteed to fail. You may want to use something like s/(^|\s)\#d\&(\s|$)/${1}new text$2/
(^|\s) says to match either the start of the string (^)or a whitespace character (\s).
(\s|$) says to match either the end of the string ($) or a whitespace character (\s).
To solve the second problem, you should use %ENV.
To solve the third problem, you should use the \Q and \E escape sequences to escape the value in $ENV{a}.
Putting it all together we get:
#!/bin/bash
export a='#d&'
export b='new text'
echo 'param5 #d&' |
perl -pe 'next if /^#/; s/(^|\s)\Q$ENV{a}\E(\s|$)/$1$ENV{b}$2/ if /param5/'
Which prints
param5 new text
As discussed at perldoc perlre:
...Today it is more common to use the quotemeta() function or the "\Q" metaquoting
escape sequence to disable all metacharacters' special meanings like this:
/$unquoted\Q$quoted\E$unquoted/
Beware that if you put literal backslashes (those not inside interpolated variables) between "\Q" and "\E", double-quotish backslash interpolation may
lead to confusing results. If you need to use literal backslashes within "\Q...\E", consult "Gory details of parsing quoted constructs" in perlop.
You can also use a ' as the delimiter in the s/// operation to make everything be parsed literally:
my $text = '#';
$text =~ s'#'1';
print $text;
In your example, you can do (note the single quotes):
perl -pe 's/\b\Q#f&\E\b/new_value/g if m/param5/ and not /^ *#/'
The other answers have covered the question, now here's your meta-problem: Leaning Toothpick Syndrome. Its when the delimiter and escapes start to blur together:
s/\/foo\/bar\\/\/bar\/baz/
The solution is to use a different delimiter. You can use just about anything, but balanced braces work best. Most editors can parse them and you generally don't have to worry about escaping.
s{/foo/bar\\}{/bar/baz}
Here's your regex with braced delimiters.
s{\#d\&}{new_value}
Much easier on the eyeholes.
If you really want to avoid typing the \s, put your search string into a variable and then use that in your regex instead. You don't need quotemeta or \Q ... \E in that case. For example:
my $s = '#d&';
s/$s/new_value/g;
If you must use this in a one-liner, bear in mind that you will have to escape the $s if you use "s to contain your perl code, or escape the 's if you use 's to contain your perl code.
If you have a string like
my $var1 = abc$123
and you want to replace it with abcd then you have to use \Q \E. If you don't then no matter what perl doesn't replace the string.
This is the only thing that worked for me.
my $var2 = s/\Q$var1\E/abcd/g;

How can I prevent Perl from interpreting \ as an escape character?

How can I print a address string without making Perl take the slashes as escape characters? I don't want to alter the string by adding more escape characters also.
What you're asking about is called interpolation. See the documentation for "Quote-Like Operators" at perldoc perlop, but more specifically the way to do it is with the syntax called the "here-document" combined with single quotes:
Single quotes indicate the text is to be treated literally with no interpolation of its content. This is similar to single quoted strings except that backslashes have no special meaning, with \ being treated as two backslashes and not one as they would in every other quoting construct.
This is the only form of quoting in perl where there is no need to worry about escaping content, something that code generators can and do make good use of.
For example:
my $address = <<'EOF';
blah#blah.blah.com\with\backslashes\all\over\theplace
EOF
You may want to read up on the various other quoting operators such as qw and qq (at the same document as I referenced above), as they are very commonly used and make good shorthand for other more long-winded ways of escaping content.
Use single quotes. For example
print 'lots\of\backslashes', "\n";
gives
lots\of\backslashes
If you want to interpolate variables, use the . operator, as in
$var = "pesky";
print 'lots\of\\' . $var . '\backslashes', "\n";
Notice that you have to escape the backslash at the end of the string.
As an alternative, you could use join:
print join("\\" => "lots", "of", $var, "backslashes"), "\n";
We could give much more helpful answers if you'd give us sample code.
It depends what you're escaping, but the Quote-like operators may help.
See the perlop man page.
Use the backslah two times,
print "This is a backslah character \\";