How can I prevent Perl from interpreting \ as an escape character? - perl

How can I print an address string without Perl treating the backslashes as escape characters? I don't want to alter the string by adding more escape characters either.

What you're asking about is called interpolation. See the documentation for "Quote-Like Operators" at perldoc perlop, but more specifically the way to do it is with the syntax called the "here-document" combined with single quotes:
Single quotes indicate the text is to be treated literally with no interpolation of its content. This is similar to single quoted strings except that backslashes have no special meaning, with \\ being treated as two backslashes and not one as they would in every other quoting construct.
This is the only form of quoting in perl where there is no need to worry about escaping content, something that code generators can and do make good use of.
For example:
my $address = <<'EOF';
blah@blah.blah.com\with\backslashes\all\over\theplace
EOF
You may want to read up on the various other quoting operators such as qw and qq (at the same document as I referenced above), as they are very commonly used and make good shorthand for other more long-winded ways of escaping content.
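To make the difference between these quoting constructs concrete, here is a minimal sketch (the variable names are made up for illustration) that stores the same source text three ways and compares the resulting string lengths:

```perl
use strict;
use warnings;

# Single quotes: \\ collapses to one backslash, \n stays as two characters.
my $single = 'a\\b\n';

# Double quotes: \\ collapses to one backslash, \n becomes a newline.
my $double = "a\\b\n";

# Single-quoted heredoc: nothing is collapsed or interpreted.
chomp(my $heredoc = <<'EOF');
a\\b\n
EOF

printf "%-8s %d characters\n", 'single',  length $single;
printf "%-8s %d characters\n", 'double',  length $double;
printf "%-8s %d characters\n", 'heredoc', length $heredoc;
```

Only the heredoc form keeps every backslash exactly as typed, which is why it suits text pasted from elsewhere.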

Use single quotes. For example
print 'lots\of\backslashes', "\n";
gives
lots\of\backslashes
If you want to interpolate variables, use the . operator, as in
$var = "pesky";
print 'lots\of\\' . $var . '\backslashes', "\n";
Notice that you have to escape the backslash at the end of the string.
As an alternative, you could use join:
print join("\\" => "lots", "of", $var, "backslashes"), "\n";
We could give much more helpful answers if you'd give us sample code.

It depends what you're escaping, but the Quote-like operators may help.
See the perlop man page.

Use the backslash twice:
print "This is a backslash character \\";


Perl: quoting correctly all special characters [duplicate]

This question already has answers here:
How can I prevent Perl from interpreting double-backslash as single-backslash character?
I have this sample string, containing 2 backslashes. Please don't ask me for the source of the string, it is just a sample string.
my $string = "use Ppppp\\Ppppp;";
print $string;
Both double quotes and single quotes will print
use Ppppp\Ppppp;
Using
my $string = "\Quse Ppppp\\Ppppp;\E";
print $string;
will print
use\ Ppppp\\Ppppp\;
adding those extra backslashes to the output.
Is there a simple solution in perl to display the string "literally", without modifying the string like adding extra backslashes to escape?
I have this sample string, containing 2 backslashes. ...
my $string = "use Ppppp\\Ppppp;";
Sorry, but you're mistaken - that string only contains one backslash*, as \\ is an escape sequence in double-quoted (and single-quoted) strings that produces a single backslash. See also "Quote and Quote-like Operators" in perlop. If your string really does contain two backslashes, then you need to write "use Ppppp\\\\Ppppp;", or use a heredoc, as in:
chomp( my $string = <<'ENDSTR' );
use Ppppp\\Ppppp;
ENDSTR
If you want the string output as valid Perl source code (using its escaping), then you can use one of several options:
my $string = "use Ppppp\\Ppppp;";
# option 1
use Data::Dumper;
$Data::Dumper::Useqq=1;
$Data::Dumper::Terse=1;
print Dumper($string);
# option 2
use Data::Dump;
dd $string;
# option 3
use B;
print B::perlstring($string);
Each one of these will print "use Ppppp\\Ppppp;". (There are of course other modules available too. Personally I like Data::Dump. Data::Dumper is a core module.)
Using one of these modules is also the best way to verify what your $string variable really contains.
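As a quick check of the point above, this small sketch counts the backslash characters the string actually holds and prints its source-code form with the core B module; the variable $count is only for illustration:

```perl
use strict;
use warnings;
use B ();

my $string = "use Ppppp\\Ppppp;";

# List-assignment trick: count how many times the pattern matches.
my $count = () = $string =~ /\\/g;

print "backslashes in the string: $count\n";
print "as Perl source: ", B::perlstring($string), "\n";
```

The count is 1 even though the source code shows two backslashes, while B::perlstring shows the doubled form again because it re-escapes the value.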
If that still doesn't fit your needs: A previous edit of your question said "How can I escape correctly all special characters including backslash?" - you'd have to specify a full list of which characters you consider special. You could do something like this, for example:
use 5.014; # for s///r
my $string = "use Ppppp\\Ppppp;";
print $string=~s/(?=[\\])/\\/gr;
That'll print $string with backslashes doubled, without modifying $string. You can also add more characters to the regex character class to add backslashes in front of those characters as well.
* Update: So I don't sound too pedantic here: of course the Perl source code contains two backslashes. But there is a difference between the literal source code and what the Perl string ends up containing, the same way that the string "Foo\nBar" contains a newline character instead of the two literal characters \ and n.
For the sake of completeness, as already discussed in the comments: \Q\E (aka quotemeta) is primarily meant for escaping any special characters that may be special to regular expressions (all ASCII characters not matching /[A-Za-z_0-9]/), which is why it is also escaping the spaces and semicolon.
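A short sketch of what quotemeta is meant for, using a made-up search string that contains regex metacharacters. Without \Q...\E the string is interpreted as a pattern; with it, the string is matched literally:

```perl
use strict;
use warnings;

my $literal = 'a.b*c';                  # text we want to find verbatim
my $text    = 'xxa.b*cxx aXbbc';

my $quoted = quotemeta $literal;        # backslash before every non-word character

my ($as_regex)   = $text =~ /($literal)/;       # . and * act as metacharacters
my ($as_literal) = $text =~ /(\Q$literal\E)/;   # every character matches itself

print "quotemeta result : $quoted\n";
print "unquoted matched : $as_regex\n";
print "quoted matched   : $as_literal\n";
```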
Since you mention external files: If you are reading a line such as use Ppppp\\Ppppp; from an external file, then the Perl string will contain two backslashes, and if you print it, it will also show two backslashes. But if you wanted to represent that string as Perl source code, you have to write "use Ppppp\\\\Ppppp;" (or use one of the other methods from the question you linked to).

Perl: Is quotemeta for regular expressions only? Is it safe for file names?

While answering this question regarding safe escaping of filename with spaces (and potentially other characters), one of the answers said to use Perl's built-in quotemeta function.
The documentation of quotemeta states:
quotemeta (and \Q ... \E ) are useful when interpolating strings
into regular expressions, because by default an interpolated variable
will be considered a mini-regular expression.
In the documentation for quotemeta, the only mention of its use is to escape all the characters other than /[A-Za-z_0-9]/ with a \ for use in a regex. It does not state the use for filenames. This does seem like a very pleasant, if undocumented, side effect however.
In a comment to Sinan Ünür answer to the earlier question, hobbs states:
shell escaping is different from
regexp escaping, and although I can't
come up with a situation where
quotemeta would give a truly unsafe
result, it's not meant for the task.
If you must escape, instead of
bypassing the shell, I suggest trying
String::ShellQuote which takes a more
conservative approach using sh single
quotes to defang everything except
single quotes themselves, and
backslashes for single quotes. – hobbs
Aug 13 '09 at 14:25
Is it safe -- completely -- to use quotemeta in place of more conservative file quoting like String::ShellQuote? Is quotemeta utf8 or multibyte character safe?
I put together a test, but the results are inconclusive. quotemeta works well, it seems, except for a file name or directory name with a \n or \r in it. While rare, these characters are legal in Unix and I have seen them. Recall that certain characters, such as LF, CR and NUL cannot be escaped with \. I read my hard drive with 700k files with quotemeta and had no failures.
I have a suspicion (though I have not demonstrated it yet) that quotemeta might fail with multibyte characters where one or more of the bytes falls into the ASCII range. For example, à can be encoded as one character (UTF-8 C3 A0) or as two characters (U+0061 is a and U+0300 is a combining grave accent). The only demonstrated failure I have with quotemeta is with files with a \n or \r in the path that I created. I would be interested in other characters to put in nasty_names to test.
ShellQuote works perfectly on all file names except those terminated by a NUL when creating a file. I have never ever had a failure with it.
So what to use? Just to be clear: shell quoting is not something I do often, since I usually just use Perl open to open a pipe to a process. That method does not suffer the shell issues discussed. I am interested since I have seen quotemeta used often for file name escaping.
(Thanks to Ether I have added IPC::System::Simple)
Test file:
use strict; use warnings; use autodie;
use String::ShellQuote;
use File::Find;
use File::Path;
use IPC::System::Simple 'capturex';
my @nasty_names;
my $top_dir = '/Users/andrew/bin/pipetestdir/testdir';
my $sub_dir = "easy_to_remove_me";
my (@qfail, @sfail, @ipcfail);
sub wanted {
    if ($File::Find::name) {
        my $rtr;
        my $exec1="ls ".quotemeta($File::Find::name);
        my $exec2="ls ".shell_quote($File::Find::name);
        my @exec3= ("ls", $File::Find::name);
        $rtr=`$exec1`;
        push @qfail, "$exec1"
            if $rtr=~/^\s*$/ ;
        $rtr=`$exec2`;
        push @sfail, "$exec2"
            if $rtr=~/^\s*$/ ;
        $rtr = capturex(@exec3);
        push @ipcfail, \@exec3
            if $rtr=~/^\s*$/ ;
    }
}
chdir($top_dir) or die "$!";
mkdir "$top_dir/$sub_dir";
chdir "$top_dir/$sub_dir";
push @nasty_names, "name with new line \n in the middle";
push @nasty_names, "name with CR \r in the middle";
push @nasty_names, "name with tab\tright there";
push @nasty_names, "utf \x{0061}\x{0300} combining diacritic";
push @nasty_names, "utf e̋ alt combining diacritic";
push @nasty_names, "utf e\x{cc8b} alt combining diacritic";
push @nasty_names, "utf άέᾄ greek";
push @nasty_names, 'back\slashes\\Not\\\at\\\\end';
push @nasty_names, qw|back\slashes\\IS\\\at\\\\end\\\\|;
sub create_nasty_files {
    for my $name (@nasty_names) {
        open my $fh, '>', $name ;
        close $fh;
    }
}
for my $dir (@nasty_names) {
    chdir("$top_dir/$sub_dir");
    mkpath($dir);
    chdir $dir;
    create_nasty_files();
}
find(\&wanted, $top_dir);
print "\nquotemeta failed on:\n", join "\n", @qfail;
print "\nShell Quote failed on:\n", join "\n", @sfail;
print "\ncapturex failed on:\n", join "\n", @ipcfail;
print "\n\n\n",
    "Remove \"$top_dir/$sub_dir\" before running again...\n\n";
Quotemeta is safe under these assumptions:
Only non-alphanumeric characters have a special meaning.
If a non-alphanumeric character has a special meaning, putting a backslash in front of it will always make it non-special.
If a non-alphanumeric character doesn't have a special meaning, putting a backslash in front of it will do nothing.
The shell violates rules 2 and 3 no matter what quote context you use -- outside of quotes, backslash-newline doesn't generate newline; in double-quotes, backslash-punctuation puts a backslash into the output (outside of a certain list of punctuation); and in single-quotes, everything is literal and backslash doesn't even protect you against a closing single-quote.
I still recommend String::ShellQuote if you need to quote things for the shell. I also recommend avoiding letting the shell process your filenames entirely, if you can, by using LIST-form system/exec/open or IPC::Open2, IPC::Open3, or IPC::System::Simple.
As for things besides the shell... lots of different things violate one or more of the rules. For example, obsolete POSIX "basic" regexes and various kinds of editor regexes have punctuation characters that are non-special by default, but become special when preceded by backslash. Basically what I'm saying is, know the thing that you're feeding your data to very well, and escape properly. Only use quotemeta if it's an exact fit, or if you're using it for something that's not very important.
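The recommendation to bypass the shell can be sketched with the list form of a piped open. To stay self-contained, this example runs the current Perl interpreter ($^X) as the child process instead of ls, and the file name is invented; the point is only that the argument reaches the child untouched with no quoting at all:

```perl
use strict;
use warnings;

# A hypothetical "nasty" name: quotes, a dollar sign, spaces and a newline.
my $nasty = "name with 'quotes', \$dollar, spaces\nand a newline";

# List-form open: no shell is involved, so nothing needs escaping.
open my $pipe, '-|', $^X, '-e', 'print $ARGV[0]', $nasty
    or die "cannot start child: $!";
my $echoed = do { local $/; <$pipe> };   # slurp everything the child printed
close $pipe or die "child failed: $?";

print $echoed eq $nasty
    ? "argument arrived unchanged\n"
    : "argument was altered\n";
```

The same idea applies to system and exec when they are given a list rather than a single command string.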
You could also use IPC::System::Simple capture() or capturex() (which I suggested in another answer on that first question), which will let you bypass the shell.
I added these lines to your script and found that no examples failed:
use IPC::System::Simple 'capturex';
...
my (@qfail, @sfail, @ipcfail);
...
my @exec3= ("ls", $File::Find::name);
...
$rtr = capturex(@exec3);
push @ipcfail, \@exec3
    if $rtr=~/^\s*$/ ;
...
print "\ncapturex failed on:\n", join "\n", @ipcfail;
But in general, you should solve the actual problem, rather than attempting to find better band-aids. quotemeta is intended specifically to escape regular expression-significant characters, which as you have discovered is not a perfect overlap with the set of characters that are significant to the shell.
The following is a Unix-only solution; see https://stackoverflow.com/a/32161361/45375 for Windows support.
An alternative is this simple function, which should work robustly even with non-ASCII characters (assuming the correct encoding), as well as \n, and \r, but excluding NUL (see bottom).
sub quoteforsh { join ' ', map { "'" . s/'/'\\''/gr . "'" } @_ }
The function encloses each argument in single-quotes and, if multiple arguments were specified, separates them with spaces.
Single-quoted strings are used, because their contents is not subject to any interpretation in POSIX-like shells.
As such, however, you cannot even escape ' instances themselves, which requires the following workaround: every embedded ' instance is replaced with '\'' (sic), which effectively splits the input string into multiple single-quoted strings, with escaped ' instances - \' - spliced in - the shell then reassembles the string parts into a single string.
Example:
print quoteforsh 'I\'m here & wëll';
literally produces (including the enclosing single-quotes) 'I'\''m here & wëll', which, to the shell, is 3 contiguous strings - 'I', \', and 'm here & wëll' - which the shell then reassembles into a single string that, after quote removal, yields I'm here & wëll.
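A round trip through the shell is one way to check this. The sketch below repeats the function so it runs on its own, and assumes a POSIX sh with printf is available; the test string is invented and kept ASCII-only to sidestep encoding questions:

```perl
use 5.014;   # for s///r
use strict;
use warnings;

# Same function as in the answer above.
sub quoteforsh { join ' ', map { "'" . s/'/'\\''/gr . "'" } @_ }

# Shell metacharacters that must survive: quote, &, $, backslash, *, double quotes.
my $original = q{I'm here & $HOME \ * "x"};
my $quoted   = quoteforsh($original);

# The command string goes through sh -c, so this exercises the quoting.
my $echoed = `printf %s $quoted`;

print "quoted for sh : $quoted\n";
print $echoed eq $original ? "round trip ok\n" : "round trip FAILED\n";
```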
OSX Unicode caveat: HFS+ stores filenames in NFD (decomposed Unicode normal form - base letter followed by another character that is the associated diacritic), whereas Perl typically creates NFC (composed Unicode normal form - a single character identifies the accented letter).
When using literal filenames, this distinction doesn't matter (the system calls do the mapping), but when using globs, it does, and, unfortunately, you have to do your own translation between the two forms.
Support for NUL (0x0) chars.:
I don't think NUL chars. in filenames are a real-world concern:
Most POSIX-like shells (bash, dash, ksh) ignore NUL chars. on the command line - zsh being the only exception.
Even if that weren't an issue, according to Wikipedia, most Unix systems do not support NUL chars. in filenames.
Besides, trying to pass a literal with a NUL to Perl's system() function breaks the invocation, presumably, because the string passed to sh -c is cut off at the first NUL:
system "echo 'a\x{0}b'"; # BREAKS

Perl string sub

I want to replace something with a path like C:\foo, so I:
s/hello/c:\foo
But that is invalid.
Do I need to escape some chars?
Two problems that I can see.
Your first problem is that your s/// replacement is not terminated:
s/hello/c:\foo # fatal syntax error: "Substitution replacement not terminated"
s/hello/c:\foo/ # syntactically okay
s!hello!c:\foo! # also okay, and more readable with backslashes (IMHO)
Your second problem, the one you asked about, is that the \f is taken as a form feed escape sequence (ASCII 0x0C), just as it would be in double quotes, which is not what you want.
You may either escape the backslash, or let variable interpolation "hide" the problem:
s!hello!c:\\foo! # This will do what you want. Note double backslash.
my $replacement = 'c:\foo'; # N.B.: Using single quotes here, not double quotes
s!hello!$replacement!; # This also works
Take a look at the treatment of Quote and Quote-like Operators in perlop for more information.
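The three variants can be compared side by side. In this sketch the variable names and the sample text "hello world" are made up for illustration:

```perl
use strict;
use warnings;

# Unescaped: \f in the replacement is read as a form feed (0x0C).
my $broken = 'hello world';
$broken =~ s!hello!c:\foo!;

# Escaped: the doubled backslash yields one literal backslash.
my $escaped = 'hello world';
$escaped =~ s!hello!c:\\foo!;

# Interpolated: the variable's contents are inserted as-is.
my $replacement  = 'c:\foo';
my $interpolated = 'hello world';
$interpolated =~ s!hello!$replacement!;

printf "%-13s contains a form feed: %s\n", $_->[0], ($_->[1] =~ /\x0C/ ? 'yes' : 'no')
    for [unescaped => $broken], [escaped => $escaped], [interpolated => $interpolated];
```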
If I understand what you're asking, then this might be something like what you're after:
$path = "hello/there";
$path =~ s/hello/c:\\foo/;
print "$path\n";
To answer your question, yes you do need to double the backslash because \f is an escape sequence for "form feed" in a Perl string.
The problem is that you are not escaping special characters:
s/hello/c:\\foo/;
would solve your problem. \ is a special character so you need to escape it. {}[]()^$.|*+?\ are meta (special) characters which you need to escape.
Additional reference: http://perldoc.perl.org/perlretut.html

How do I escape special characters for a substitution in a Perl one-liner?

Is there some way to replace a string such as # or * or ? or & without needing to put a "\" before it?
Example:
perl -pe 'next if /^#/; s/\#d\&/new_value/ if /param5/' test
In this example I need to replace a #d& with new_value but the old value might contain any character, how do I escape only the characters that need to be escaped?
You have several problems:
You are using \b incorrectly
You are replacing code with shell variables
You need to quote metacharacters
From perldoc perlre
A word boundary ("\b") is a spot between two characters that has a "\w" on one side of it
Neither of the characters # or & are \w characters. So your match is guaranteed to fail. You may want to use something like s/(^|\s)\#d\&(\s|$)/${1}new text$2/
(^|\s) says to match either the start of the string (^)or a whitespace character (\s).
(\s|$) says to match either the end of the string ($) or a whitespace character (\s).
To solve the second problem, you should use %ENV.
To solve the third problem, you should use the \Q and \E escape sequences to escape the value in $ENV{a}.
Putting it all together we get:
#!/bin/bash
export a='#d&'
export b='new text'
echo 'param5 #d&' |
perl -pe 'next if /^#/; s/(^|\s)\Q$ENV{a}\E(\s|$)/$1$ENV{b}$2/ if /param5/'
Which prints
param5 new text
As discussed at perldoc perlre:
...Today it is more common to use the quotemeta() function or the "\Q" metaquoting
escape sequence to disable all metacharacters' special meanings like this:
/$unquoted\Q$quoted\E$unquoted/
Beware that if you put literal backslashes (those not inside interpolated variables) between "\Q" and "\E", double-quotish backslash interpolation may
lead to confusing results. If you need to use literal backslashes within "\Q...\E", consult "Gory details of parsing quoted constructs" in perlop.
You can also use ' as the delimiter in the s/// operation, which turns off variable interpolation in both the pattern and the replacement (regex metacharacters in the pattern keep their meaning):
my $text = '#';
$text =~ s'#'1';
print $text;
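A small sketch of what the single-quote delimiter does and does not do; the sample strings are invented. Sigils in the replacement are left alone, but the pattern is still a regular expression:

```perl
use strict;
use warnings;

# Replacement side: $5 and @example are not interpolated.
my $text = 'contact: PLACEHOLDER';
$text =~ s'PLACEHOLDER'$5 to user@example.com';

# Pattern side: the dot is still a regex metacharacter, so both words match.
my $dots = 'a.c abc';
$dots =~ s'a.c'X'g;

print "$text\n";
print "$dots\n";
```

So this delimiter helps with $ and @, while a pattern containing characters such as . * or ? still needs escaping or \Q...\E via a variable.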
In your example, you can do (note the single quotes):
perl -pe 's/\b\Q#f&\E\b/new_value/g if m/param5/ and not /^ *#/'
The other answers have covered the question, now here's your meta-problem: Leaning Toothpick Syndrome. It's when the delimiter and escapes start to blur together:
s/\/foo\/bar\\/\/bar\/baz/
The solution is to use a different delimiter. You can use just about anything, but balanced braces work best. Most editors can parse them and you generally don't have to worry about escaping.
s{/foo/bar\\}{/bar/baz}
Here's your regex with braced delimiters.
s{\#d\&}{new_value}
Much easier on the eyeholes.
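If you really want to avoid typing the \s, put your search string into a variable and then use that in your regex instead. For a string like the one below, which contains no regex metacharacters, you don't need quotemeta or \Q ... \E; if the string might contain characters such as * or ?, wrap the variable in \Q ... \E. For example: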
my $s = '#d&';
s/$s/new_value/g;
If you must use this in a one-liner, bear in mind that you will have to escape the $s if you use "s to contain your perl code, or escape the 's if you use 's to contain your perl code.
If you have a string like
my $var1 = 'abc$123';
and you want to replace it with abcd, then you have to use \Q ... \E. Without them, the $ in the interpolated pattern is treated as an end-of-line anchor, so the pattern never matches and Perl replaces nothing.
This is the only thing that worked for me.
my $var2 = s/\Q$var1\E/abcd/g;

What is the correct usage of (nested | double | simple) quotes

I'm sure this question may seem foolish to some of you, but I'm here to learn.
Are these assumptions true for most of the languages ?
EDIT : OK, let's assume I'm talking about Perl/Bash scripting.
'Single quotes'
=> No interpretation at all (e.g. '$' or any metacharacter will be considered as a character and will be printed on screen)
"Double quotes"
=> Variable interpretation
To be more precise about my concerns, I'm writing some shell scripts (in which quotes can sometimes be a big hassle), and wrote this line :
CODIR=`pwd | sed -e "s/$MODNAME//"`
If I had used single quotes in my sed, my pattern would have been '$MODNAME', right ? (and not the actual value of $MODNAME, which is `alpha' in this particular case)
Another problem I had, with an awk inside an echo :
USAGE=`echo -ne "\
Usage : ./\`basename $0\` [-hnvV]\n\
\`ls -l ${MODPATH}/reference/ | awk -F " " '$8 ~ /\w+/{print "> ",$8}'\`"`
I spent some time debugging that one. I came to the conclusion that backticks were escaped so that the interpreter doesn't "split" the command (and stop right before «basename»). In the awk command, '$8' is successfully interpreted by awk, thus not by shell. What if I wanted to use a shell variable ? Would I write awk -F "\"$MY_SHELL_VAR\"" ? Because $MY_SHELL_VAR as is, will be interpreted by awk, won't it ?
Don't hesitate to add any information about quoting or backticks !
Thank you ! :)
It varies massively by language. For example, in the C/Java/C++/C# etc family, you can't use single quotes for a string at all - they're only for single characters.
I think it's far better to learn the rules properly for the languages you're actually interested in than to try to generalise.
Are these assumptions true for most of the languages ?
Answer: No
In bash scripting, backticks are deprecated in favor of $() in part because it is non-obvious how nested quotes and escaping are supposed to work. You may also want to take a look at Bash Pitfalls.
It's definitely not the same for all languages. In Python, for example, single and double quotes are interchangeable. The only difference is that you can include single quotes within a double-quoted string without escaping them and vice versa ("How's it going?").
Also, there are triple-quoted strings that can span multiple lines.
In Perl, you also have q() and qq() to help you in nested quoting situations:
my $x = q(a string with 'single quotes');
my $y = qq(an $interpreted string with "double quotes");
These certainly will help you avoid "\"needlessly\"" '\'escaping\'' internal quotes.
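For instance, a minimal sketch with invented strings: q() behaves like single quotes and qq{} like double quotes, and balanced delimiters may nest without escaping:

```perl
use strict;
use warnings;

my $name = 'world';

# q(): no interpolation; the inner balanced parentheses need no escaping.
my $plain = q(He said "it's $name" (really));

# qq{}: interpolates $name; both kinds of quote appear unescaped.
my $interp = qq{He said "it's $name"};

print "$plain\n";
print "$interp\n";
```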
Yes, something like awk -F "\"$MY_SHELL_VAR\"" will work, however in this case you wouldn't be able to use variables in awk, since they will be interpreted by shell, so the way to go is something like this (I will use command simpler than yours, if you don't mind :) ):
awk -F " " '$8 ~ /\w+/{print "> ",$8, '$SOME_SHELL_VAR'}'
Note the single quotes terminating and restarting.
The trickiest part, usually, is to pass a quote in the argument to the command. In this case you need to terminate single quote, add escaped quote character, start quote again, like this:
awk '$1 ~ '\''{print}'
Note, that single quote can't be escaped inside single quotes, since the "\" won't be treated as an escape character.
This is probably not related directly to your question, but still useful.
I don't know about perl, but for bash you don't need to backslash the newline.
As for quotes, I have a (very personal) pattern that I call the "five quotes" pattern. It helps to put one quote in a string enclosed by the same kind of quotes
For instance:
doublequoted="some things "'"'"quoted"'"'" and some not"
simplequoted='again '"'"'quote this'"'"' but not that'
Note that you can freely append strings with different kinds of quotes, which is useful when you want the shell to interpret some vars but not some others:
awk -F " " '$8 ~ /\w+/{print "> ",$8, '"$SOME_SHELL_VAR"'}'
Also, I don't use the backtick anymore but the $(...)pattern which is more legible and can be nested.
USAGE=$(echo -ne "
Usage : ./$(basename $0) [-hnvV]\n
$(ls -l ${MODPATH}/reference/ | awk -F " " '$8 ~ /\w+/{print "> ",$8}')")
In perl, double quoted strings will have their variables expanded.
If you write that for instance:
my $email = "foo@bar.com" ;
perl will try to expand @bar. If you use strict, you'll see a complaint about the array bar not existing. If you don't, you'll just see weird behavior.
So it's better to write:
my $email = 'foo@bar.com' ;
For these kinds of reasons, my advice is to always use single quotes for strings, unless you know that you need variable expansion.
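The effect is easy to reproduce when an array with that name happens to exist. In this sketch the array @bar and its contents are invented purely to show the interpolation:

```perl
use strict;
use warnings;

my @bar = ('SURPRISE');               # hypothetical array that happens to exist

my $single       = 'foo@bar.com';     # literal @
my $escaped      = "foo\@bar.com";    # escaped @ inside double quotes
my $interpolated = "foo@bar.com";     # @bar is expanded into the string

print "single      : $single\n";
print "escaped     : $escaped\n";
print "interpolated: $interpolated\n";
```

If @bar were not declared, the last assignment would not compile under strict.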