How to escape all special characters in a string (along with single and double quotes)? - perl

E.g:
$myVar="this###!~`%^&*()[]}{;'".,<>?/\";
I am not able to export this variable and use it as it is in my program.

Use q to store the string, then use quotemeta to escape all the special characters:
my $myVar=q("this###!~`%^&*()[]}{;'".,<>?/\");
$myVar = quotemeta($myVar);
print $myVar;
Alternatively, use a regex substitution to escape every non-word character:
my $myVar=q("this###!~`%^&*()[]}{;'".,<>?/\");
$myVar =~s/(\W)/\\$1/g;
print $myVar;

This is what quotemeta is for, if I understand your question.
Returns the value of EXPR with all non-"word" characters backslashed. (That is, all characters not matching /[A-Za-z_0-9]/ will be preceded by a backslash in the returned string, regardless of any locale settings.) This is the internal function implementing the \Q escape in double-quoted strings.
Its use is very simple
my $myVar = q(this###!~`%^&*()[]}{;'".,<>?/\\);
print "$myVar\n";
my $quoted_var = quotemeta $myVar;
print "$quoted_var\n";
Note that we must manually escape the last backslash, to prevent it from escaping the closing delimiter. Or you can tack on an extra space at the end, and then strip it (by chop).
my $myVar = q(this###!~`%^&*()[]}{;'".,<>?/\ );
chop $myVar;
Now transform $myVar like above, using quotemeta.
I take the outside pair of " to merely indicate what you'd like in the variable. But if they are in fact meant to be in the variable then simply put it all inside q(), since then the last character is ". The only problem is a backslash immediately preceding the closing delimiter.
If you need this in a regex context then you use \Q to start and \E to end escaping.
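As a minimal sketch of the two approaches above (the variable names are only illustrative), the following checks that both a quotemeta-escaped copy and an inline \Q...\E pattern match the original text literally:

```perl
use strict;
use warnings;

# Sample text full of regex metacharacters, ending in one backslash.
my $raw = q(this###!~`%^&*()[]}{;'".,<>?/\\);

# quotemeta backslashes every non-word character.
my $escaped = quotemeta $raw;

# Both escaped forms treat the text as literal characters,
# so each pattern matches the original string exactly.
my $via_quotemeta = ($raw =~ /^$escaped$/) ? 1 : 0;
my $via_q_e       = ($raw =~ /^\Q$raw\E$/) ? 1 : 0;

print "quotemeta matched: $via_quotemeta\n";
print "Q..E matched: $via_q_e\n";
```

Both lines should print 1. The \Q...\E form leaves $raw unchanged, which is convenient when the unescaped text is still needed afterwards.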

What's between \Q and \E is treated as normal characters, not regexp characters. For example,
'.' =~ /./; # match
'a' =~ /./; # match
'.' =~ /\Q.\E/; # match
'a' =~ /\Q.\E/; # no match
It doesn't stop variables from being interpolated.
$search = '.';
'.' =~ /$search/; # match
'a' =~ /$search/; # match
'.' =~ /\Q$search\E/; # match
'a' =~ /\Q$search\E/; # no match

Related

How do you match \'

I need a regex to match \' <---- literally backslash apostrophe.
my $line = '\'this';
$line =~ s/(\o{134})(\o{047})/\\\\'/g;
$line =~ s/\\'/\\\\'/g;
$line =~ s/[\\][']/\\\\'/g;
printf('%s',$line);
print "\n";
All I get out of this is
'this
When what I want is
\\'this
This occurs whether the string is declared using ' or ". This was a test script for tracking down a file parsing bug. I wanted to confirm that the regex was working as expected.
I don't know if when the backslash apostrophe is parsed by the regex it is not treated as 2 characters, but is instead treated as an escaped apostrophe.
Either way, what is the best way to match \' and print out \\'? I don't want to escape any other backslashes or apostrophes, and I can't change the text I am parsing, just the way it is handled and output.
s/\\'/\\\\'/g
All three of your patterns match a backslash followed by a quote, the above being the simplest.
Your testing was in vain because your string doesn't contain any backslashes. Both string literals "\'this" (from earlier edit) and '\'this' (from later edit) produce the string 'this.
say "\'this"; # 'this
say '\'this'; # 'this
To produce the string \'this, you could use either of the following string literals (among others):
"\\'this"
'\\\'this'
say "\\'this"; # \'this
say '\\\'this'; # \'this
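A small sketch, with illustrative variable names, that makes the difference visible by checking string lengths before running the substitution on each literal:

```perl
use strict;
use warnings;

my $no_backslash   = '\'this';     # 5 characters: 'this
my $with_backslash = '\\\'this';   # 6 characters: \'this

# Apply the same substitution to a copy of each string.
(my $first  = $no_backslash)   =~ s/\\'/\\\\'/g;
(my $second = $with_backslash) =~ s/\\'/\\\\'/g;

print length($no_backslash), " $first\n";      # 5 'this
print length($with_backslash), " $second\n";   # 6 \\'this
```

Only the second string contains a backslash, so only it is changed by the substitution.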
The answer is, of course
s/[\\][']/\\\\'/g
This will match
\'this
And substitute with this
\\'this
This was the only way I could get it to work.
Too much "regexing" in your snippet. Try:
my $line = '\'this';
$line =~ s/'/\\\\\'/g;
printf('%s',$line);
print "\n";
# \\'this
or... if you want another mode:
my $line = '\'this';
$line =~ s/'/\\'/g;
printf('%s',$line);
print "\n";
# \'this

Is the use of "||" in a substring search prohibited?

I have a little Perl script which includes a substring search as follows.
#!/usr/bin/perl
use strict;
use warnings;
my $line = "this && is || a test if && ||";
my $nb_if = findSymbols($line, "if ");
my $nb_and = findSymbols($line, "&&");
my $nb_or = findSymbols($line, "||");
print "\nThe result for this func is $nb_if=if , $nb_and=and, $nb_or=or\n";
sub findSymbols {
my $n = () = ($_[0] =~ m/$_[1]/g);
return $n;
}
It should return:
The result for this func is 1=if , 2=and, 2=or
but, instead it returns:
The result for this func is 1=if , 2=and, 30=or
I don't understand what's wrong with my code.
Use quotemeta to escape the special meaning of the regular expression containing || (and any other characters which you pass to the function):
sub findSymbols {
my $pat = quotemeta $_[1];
my $n = () = ($_[0] =~ m/$pat/g);
return $n;
}
The pipe character (|) has a special meaning in regular expressions. It means "or" (matching either the thing on its left or the thing on its right). So having a regex that consists of just two pipes is interpreted as meaning "match an empty string or an empty string or an empty string" - and that matches everywhere in your string (30 times!)
So you need to stop the pipe being interpreted as a special character and let it just represent an actual pipe character. Here are three ways to do that:
Escape the pipes with backslashes when you're creating the string that you pass to findSymbols().
# Note: I've also changed "..." to '...'
# to avoid having to double-escape
my $nb_or = findSymbols($line, '\|\|');
Use quotemeta() to automatically escape problematic characters in any string passed to findSymbols().
my $escaped_regex = quotemeta($_[1]);
my $n = () = ($_[0] =~ m/$escaped_regex/g);
Use \Q...\E to automatically escape any problematic characters used in your regex.
# Note: In this case, the \E isn't actually needed
# as it's at the end of the regex.
my $n = () = ($_[0] =~ m/\Q$_[1]\E/g);
For more detailed information on using regular expressions in Perl, see perlretut and perlre.
| is the alternation operator in the regular expression used by m//. You need to escape each | with a backslash to match literal |s.
my $nb_or = findSymbols($line, "\\|\\|"); # or '\|\|'
(but using quotemeta as suggested by @toolic is a much better idea, as it frees your caller from having to worry about details that should be part of the abstraction provided by findSymbols.)
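To see the difference side by side, here is a rough sketch with two hypothetical helpers, count_literal and count_as_regex (names invented for this example), run on the string from the question:

```perl
use strict;
use warnings;

# Count literal occurrences of $needle in $text.
sub count_literal {
    my ($text, $needle) = @_;
    my $n = () = $text =~ /\Q$needle\E/g;
    return $n;
}

# Same counting idiom, but the second argument is used as a regex.
sub count_as_regex {
    my ($text, $pattern) = @_;
    my $n = () = $text =~ /$pattern/g;
    return $n;
}

my $line = "this && is || a test if && ||";
print count_literal($line, "||"), "\n";    # 2
print count_as_regex($line, "||"), "\n";   # 30
```

The string is 29 characters long, and the empty alternation matches at each of the 30 positions from 0 to 29, which is where the 30 comes from.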

Replace returns with spaces and commas with semicolons?

I want to be able to replace all of the line returns (\n's) in a single string (not an entire file, just one string in the program) with spaces and all commas in the same string with semicolons.
Here is my code:
$str =~ s/"\n"/" "/g;
$str =~ s/","/";"/g;
This will do it. You don't need to use quotations around them.
$str =~ s/\n/ /g;
$str =~ s/,/;/g;
Explanation of modifier options for the Substitution Operator (s///)
e Forces Perl to evaluate the replacement pattern as an expression.
g Replaces all occurrences of the pattern in the string.
i Ignores the case of characters in the string.
m Treats the string as multiple lines.
o Compiles the pattern only once.
s Treats the string as a single line.
x Lets you use extended regular expressions.
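A few of these modifiers in action, as a short sketch on made-up sample strings (none of the data comes from the question):

```perl
use strict;
use warnings;

my $text = "Cat cat\nCAT";

# g + i: replace every "cat", regardless of case.
(my $dogs = $text) =~ s/cat/dog/gi;

# e: the replacement is evaluated as Perl code; here it doubles each number.
(my $doubled = "3 apples, 10 pears") =~ s/(\d+)/$1 * 2/ge;

# m: ^ matches at the start of every line, not only at the start of the string.
(my $marked = $text) =~ s/^/> /gm;

# x: whitespace and comments inside the pattern are ignored.
(my $date = "2024-01-31") =~ s{
    (\d{4}) - (\d{2}) - (\d{2})   # year, month, day
}{$3/$2/$1}x;

print "$dogs\n$doubled\n$marked\n$date\n";
```

Here $doubled ends up as "6 apples, 20 pears" and $date as "31/01/2024".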
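You don't need quotes in either the pattern or the replacement. Everything between the delimiters is taken literally, so quotes written there become part of the match or the output; a plain space is enough for the replacement in your first example.
$str =~ s/\n/ /g;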
$str =~ s/,/;/g;
I'd use tr:
$str =~ tr/\n,/ ;/;
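A quick sketch of the tr approach on a made-up sample string, also showing that tr returns how many characters it translated:

```perl
use strict;
use warnings;

my $str = "a,b\nc,d\n";

# tr maps characters by position: "\n" becomes " " and "," becomes ";".
# Its return value is the number of characters translated.
my $changed = ($str =~ tr/\n,/ ;/);

print "$changed: [$str]\n";   # 4: [a;b c;d ]
```

Because tr works on single characters rather than patterns, it handles both replacements in one pass over the string.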

String match excluding brackets

The below two strings are exactly same, but I am unable to match using regex? Can some one help me in this?
$x="Enzyme(s)"; $y="Enzyme(s)";
if ($x =~ /^$y$/){print "String Matches"};
use quotemeta:
my $x="Enzyme(s)";
my $y="Enzyme(s)";
$y = quotemeta($y);
if ($x =~ /^$y$/){print "String Matches"};
The parentheses in your match string, $y, are being interpreted as a grouping or capture. They need to be "escaped" so that they can be treated as normal characters.
Put the following code after your assignment of $y.
$y =~ s/\(/\\(/g; # escape left parens
$y =~ s/\)/\\)/g; # escape right parens
The 's' is for 'substitution'.
The 'g' is for 'global' replacement. I.e., replace all occurrences in the string.
You should use quotemeta as M42 already mentioned, or, to avoid an extra line of code and a permanent change to the $y variable, you can use \Q...\E in the regex, which disables all pattern metacharacters within its range:
my $x="Enzyme(s)";
my $y="Enzyme(s)";
if ($x =~ /^\Q$y\E$/){print "String Matches"};
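The following sketch (variable names are illustrative) shows what the unescaped pattern actually matches and how \Q...\E changes that:

```perl
use strict;
use warnings;

my $y = "Enzyme(s)";

# Unescaped, the parentheses form a capture group, so the pattern
# means "Enzyme" followed by "s".
my $plain_vs_literal = ("Enzyme(s)" =~ /^$y$/) ? 1 : 0;   # 0
my $plain_vs_other   = ("Enzymes"   =~ /^$y$/) ? 1 : 0;   # 1

# With \Q...\E the parentheses are literal characters.
my $quoted_vs_literal = ("Enzyme(s)" =~ /^\Q$y\E$/) ? 1 : 0;   # 1
my $quoted_vs_other   = ("Enzymes"   =~ /^\Q$y\E$/) ? 1 : 0;   # 0

print "$plain_vs_literal $plain_vs_other $quoted_vs_literal $quoted_vs_other\n";
```

This prints "0 1 1 0": without escaping, the pattern matches "Enzymes" but not the string it was built from.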

How can I manually interpolate string escapes in a Perl string?

In perl suppose I have a string like 'hello\tworld\n', and what I want is:
'hello world
'
That is, "hello", then a literal tab character, then "world", then a literal newline. Or equivalently, "hello\tworld\n" (note the double quotes).
In other words, is there a function for taking a string with escape sequences and returning an equivalent string with all the escape sequences interpolated? I don't want to interpolate variables or anything else, just escape sequences like \x, where x is a letter.
Sounds like a problem that someone else would have solved already. I've never used the module, but it looks useful:
use String::Escape qw(unbackslash);
my $s = unbackslash('hello\tworld\n');
You can do it with 'eval':
my $string = 'hello\tworld\n';
my $decoded_string = eval "\"$string\"";
Note that there are security issues tied to that approach if you don't have 100% control of the input string.
Edit: If you want to ONLY interpolate \x substitutions (and not the general case of 'anything Perl would interpolate in a quoted string') you could do this:
my $string = 'hello\tworld\n';
$string =~ s#([^\\A-Za-z_0-9])#\\$1#gs;
my $decoded_string = eval "\"$string\"";
That does almost the same thing as quotemeta - but exempts '\' characters from being escaped.
Edit2: This still isn't 100% safe because if the last character is a '\' - it will 'leak' past the end of the string though...
Personally, if I wanted to be 100% safe I would make a hash with the subs I specifically wanted and use a regex substitution instead of an eval:
my %sub_strings = (
'\n' => "\n",
'\t' => "\t",
'\r' => "\r",
);
$string =~ s/(\\n|\\t|\\r)/$sub_strings{$1}/gs;
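A self-contained sketch of the lookup-table idea, wrapped in a hypothetical unescape function. The hash is keyed by the letter after the backslash, and handling a doubled backslash is an assumption added here, not part of the answer above:

```perl
use strict;
use warnings;

# Escape letter => replacement character.
# The '\\' entry (doubled backslash) is an added assumption.
my %escape_for = (
    n    => "\n",
    t    => "\t",
    r    => "\r",
    '\\' => "\\",
);

sub unescape {
    my ($text) = @_;
    # Replace a backslash followed by a known character;
    # any other backslash sequence is left untouched.
    $text =~ s/\\([ntr\\])/$escape_for{$1}/g;
    return $text;
}

my $decoded = unescape('hello\tworld\n');
print $decoded;
```

Because the substitution scans left to right in a single pass, a doubled backslash followed by n decodes to a literal backslash and the letter n, not to a newline. No eval is involved, so the input cannot run code.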