How do you match \' - perl

I need a regex to match \' <---- literally backslash apostrophe.
my $line = '\'this';
$line =~ s/(\o{134})(\o{047})/\\\\'/g;
$line =~ s/\\'/\\\\'/g;
$line =~ s/[\\][']/\\\\'/g;
printf('%s',$line);
print "\n";
All I get out of this is
'this
When what I want is
\\'this
This occurs whether the string is declared using ' or ". This was a test script for tracking down a file parsing bug. I wanted to confirm that the regex was working as expected.
I don't know whether the regex treats the backslash apostrophe as two characters or as a single escaped apostrophe.
Either way, what is the best way to match \' and print out \\'? I don't want to escape any other backslashes or apostrophes, and I can't change the text I am parsing, just the way it is handled and output.

s/\\'/\\\\'/g
All three of your patterns match a backslash followed by a quote, the above being the simplest.
Your testing was in vain because your string doesn't contain any backslashes. Both string literals "\'this" (from earlier edit) and '\'this' (from later edit) produce the string 'this.
say "\'this"; # 'this
say '\'this'; # 'this
To produce the string \'this, you could use either of the following string literals (among others):
"\\'this"
'\\\'this'
say "\\'this"; # \'this
say '\\\'this'; # \'this
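To confirm the substitution against a string that really does contain a backslash, a minimal sketch along these lines may help (the variable name $fixed is only illustrative and not from the question):

```perl
use strict;
use warnings;

# Single-quoted literal for the six characters \'this
my $line = '\\\'this';

# Copy the string, then replace each backslash-apostrophe pair
# with two backslashes followed by an apostrophe
(my $fixed = $line) =~ s/\\'/\\\\'/g;

print "$line\n";    # \'this
print "$fixed\n";   # \\'this
```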

The answer is, of course
s/[\\][']/\\\\'/g
This will match
\'this
And substitute with this
\\'this
This was the only way I could get it to work.

Too much "regexing" in your snippet. Try:
my $line = '\'this';
$line =~ s/'/\\\\\'/g;
printf('%s',$line);
print "\n";
# \\'this
or... if you want another mode:
my $line = '\'this';
$line =~ s/'/\\'/g;
printf('%s',$line);
print "\n";
# \'this

Related

How can I prevent Perl from interpreting double-backslash as single-backslash character?

How can I print a single-quoted string containing double-backslash \\ characters as is, without Perl interpreting each pair as a single backslash \? I also don't want to alter the string by adding more escape characters.
my $string1 = 'a\\\b';
print $string1; #prints 'a\\b'
my $string1 = 'a\\\\b';
#I know I can alter the string to escape each backslash
#but I want to keep string as is.
print $string1; #prints 'a\\b'
#I can also use single-quoted here document
#but unfortunately this would make my code syntactically look horrible.
my $string1 = <<'EOF';
a\\b
EOF
print $string1; #prints a\\b, with newline that could be removed with chomp
The only quoting construct in Perl that doesn't interpret backslashes at all is the single-quoted here document:
my $string1 = <<'EOF';
a\\\b
EOF
print $string1; # Prints a\\\b, with newline
Because here-docs are line-based, it's unavoidable that you will get a newline at the end of your string, but you can remove it with chomp.
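As a small sketch of that approach, chomp can be applied directly to the assignment, so the trailing newline never reaches the rest of the program:

```perl
use strict;
use warnings;

# chomp the here-doc as it is assigned, removing the trailing newline
chomp(my $string1 = <<'EOF');
a\\\b
EOF

print "$string1\n";             # a\\\b
print length($string1), "\n";   # 5
```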
Other techniques are simply to live with it and backslash your strings correctly (for small amounts of data), or to put them in a __DATA__ section or an external file (for large amounts of data).
If you are mildly crazy, and like the idea of using experimental software that mucks about with perl's internals to improve the aesthetics of your code, you can use the Syntax::Keyword::RawQuote module, on CPAN since this morning.
use syntax 'raw_quote';
my $string1 = r'a\\\b';
print $string1; # prints 'a\\\b'
Thanks to @melpomene for the inspiration.
Since the backslash interpolation happens in string literals, perhaps you could declare your literals using some other arbitrary symbol, then substitute them for something else later.
my $string = 'a!!!b';
$string =~ s{!}{\\}g;
print $string; #prints 'a\\\b'
Of course it doesn't have to be !; any symbol that does not otherwise occur in the string will do. You said you need to make a number of strings, so you could put the substitution in a function:
sub bs {
$_[0] =~ s{!}{\\}gr
}
my $string = 'a!!!b';
print bs($string); #prints 'a\\\b'
P.S.
That function uses the non-destructive substitution modifier /r, introduced in v5.14. If you are using an older version, the function would need to be written like this (note that it modifies the caller's variable in place, because $_[0] is an alias for the argument):
sub bs {
$_[0] =~ s{!}{\\}g;
return $_[0];
}
Or if you like something more readable
sub bs {
my $str = shift;
$str =~ s{!}{\\}g;
return $str;
}

How to escape all special characters in a string (along with single and double quotes)?

E.g:
$myVar="this###!~`%^&*()[]}{;'".,<>?/\";
I am not able to export this variable and use it as it is in my program.
Use q to store the characters, and use quotemeta to escape every non-word character:
my $myVar=q("this###!~`%^&*()[]}{;'".,<>?/\");
$myVar = quotemeta($myVar);
print $myVar;
Or else use a regex substitution to escape every non-word character:
my $myVar=q("this###!~`%^&*()[]}{;'".,<>?/\");
$myVar =~s/(\W)/\\$1/g;
print $myVar;
This is what quotemeta is for, if I understand your question:
Returns the value of EXPR with all non-"word" characters backslashed. (That is, all characters not matching /[A-Za-z_0-9]/ will be preceded by a backslash in the returned string, regardless of any locale settings.) This is the internal function implementing the \Q escape in double-quoted strings.
Its use is very simple
my $myVar = q(this###!~`%^&*()[]}{;'".,<>?/\\);
print "$myVar\n";
my $quoted_var = quotemeta $myVar;
print "$quoted_var\n";
Note that we must manually escape the last backslash, to prevent it from escaping the closing delimiter. Or you can tack on an extra space at the end, and then strip it (by chop).
my $myVar = q(this###!~`%^&*()[]}{;'".,<>?/\ );
chop $myVar;
Now transform $myVar like above, using quotemeta.
I take the outside pair of " to merely indicate what you'd like in the variable. But if they are in fact meant to be in the variable then simply put it all inside q(), since then the last character is ". The only problem is a backslash immediately preceding the closing delimiter.
If you need this in a regex context then you use \Q to start and \E to end escaping.
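As a rough check that the two approaches agree, the following sketch matches the original string against its own escaped form, once through quotemeta and once through \Q...\E (the variable name $quoted is only illustrative):

```perl
use strict;
use warnings;

my $myVar  = q(this###!~`%^&*()[]}{;'".,<>?/\\);
my $quoted = quotemeta $myVar;

# Both patterns match the original text literally, metacharacters included
print "quotemeta: match\n" if $myVar =~ /\A$quoted\z/;
print "\\Q..\\E: match\n"  if $myVar =~ /\A\Q$myVar\E\z/;
```

Without the escaping, characters such as ( and [ in $myVar would be read as regex syntax rather than as literal text.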
Giving Thanks to:
What's between \Q and \E is treated as normal characters, not regexp characters. For example,
'.' =~ /./; # match
'a' =~ /./; # match
'.' =~ /\Q.\E/; # match
'a' =~ /\Q.\E/; # no match
It doesn't stop variables from being interpolated.
$search = '.';
'.' =~ /$search/; # match
'a' =~ /$search/; # match
'.' =~ /\Q$search\E/; # match
'a' =~ /\Q$search\E/; # no match

issue in matching regexp in perl

I am having following code
$str = "
OTNPKT0553 04-02-03 21:43:46
M X DENY
PLNA
/*Privilege, Login Not Active*/
;";
$val = $str =~ /[
]*([\n]?[\n]+
[\n]?) ([^;^
]+)/s;
print "$1 and $2";
Getting output as
and PLNA
Why is it getting PLNA as output? I believe it should stop at the first \n. I assume the output should be OTNPKT0553 04-02-03 21:43:46
Your regex is messy and contains a lot of redundancy. The following steps demonstrate how it can be simplified and then it becomes more clear why it is matching PLNA.
1) Translating the literal newlines in your regex:
$val = $str =~ /[\n\n]*([\n]?[\n]+\n[\n]?)   ([^;^\n]+)/s;
2) Then simplifying this code to remove the redundancy:
$val = $str =~ /(\n{2})   ([^;^\n]+)/s;
So basically, the regex is looking for two newlines followed by 3 spaces.
There are three spaces before OTNPKT0553, but there is only a single new line, so it won't match.
The next three spaces are before PLNA which IS preceded by two new lines, and so matches.
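A short sketch of that behaviour follows. The exact whitespace in $str is an assumption made for illustration (a single newline and three spaces before OTNPKT0553, a blank line and three spaces before PLNA), since the layout of the original input is not fully recoverable here:

```perl
use strict;
use warnings;

# Assumed layout: one newline before OTNPKT0553, a blank line before PLNA
my $str = "\n   OTNPKT0553 04-02-03 21:43:46\nM  X DENY\n\n   PLNA\n"
        . "   /*Privilege, Login Not Active*/\n;";

# Two newlines, three spaces, then everything up to the next newline or semicolon
if ($str =~ /\n{2}   ([^;\n]+)/) {
    print "$1\n";   # PLNA
}
```

The character class is written here as [^;\n], because the second ^ in [^;^\n] only excludes a literal caret and is not needed.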
You have a whole lot of newlines in there - some literal and some encoded as \n. I'm not clear how you were thinking. Did you think \n matched a number maybe? A \d matches a digit, and will also match many Unicode characters that are digits in other languages. However for simple ASCII text it works fine.
What you need is something like this
use strict;
use warnings;
my $str = "
OTNPKT0553 04-02-03 21:43:46
M X DENY
PLNA
/*Privilege, Login Not Active*/
;";
my $val = $str =~ / (\w+) \s+ ( [\d-]+ \s [\d:]+ ) /x;
print "$1 and $2";
output
OTNPKT0553 and 04-02-03 21:43:46
You have an extra line feed, change the regex to:
$str =~ /[
]*([\n]?[\n]+[\n]?) ([^;^
]+)/s;
and simpler:
$str =~ /\n+ ([^;^\n]+)/s;

Replace returns with spaces and commas with semicolons?

I want to be able to be able to replace all of the line returns (\n's) in a single string (not an entire file, just one string in the program) with spaces and all commas in the same string with semicolons.
Here is my code:
$str =~ s/"\n"/" "/g;
$str =~ s/","/";"/g;
This will do it. You don't need quotation marks around them.
$str =~ s/\n/ /g;
$str =~ s/,/;/g;
Explanation of modifier options for the Substitution Operator (s///)
e Forces Perl to evaluate the replacement pattern as an expression.
g Replaces all occurrences of the pattern in the string.
i Ignores the case of characters in the string.
m Treats the string as multiple lines.
o Compiles the pattern only once.
s Treats the string as a single line.
x Lets you use extended regular expressions.
You don't need quotes in either the pattern or the replacement. Inside s///, quotation marks are literal characters, so a replacement of " " would insert the quotes along with the space; a bare space is enough.
$str =~ s/\n/ /g;
$str =~ s/,/;/g;
I'd use tr:
$str =~ tr/\n,/ ;/;
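For comparison, here is a small sketch (with a made-up input string) showing that the single tr gives the same result as the two substitutions:

```perl
use strict;
use warnings;

my $str = "a,b\nc,d\n";   # sample input, invented for this example

# Two passes with s///
(my $by_subst = $str) =~ s/\n/ /g;
$by_subst =~ s/,/;/g;

# One pass with tr: each character in the first list maps to the
# character at the same position in the second list
(my $by_tr = $str) =~ tr/\n,/ ;/;

print "[$by_tr]\n";                             # [a;b c;d ]
print "same result\n" if $by_tr eq $by_subst;
```

tr works on single characters only, so it fits this case but not replacements involving longer strings or patterns.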

Extracting alphanumeric phrase from a string

Trying to extract the alphanumeric characters from this string:
A_phase_I-II,_open-req_project_id_PX15RAD001
The problem is: the term PX15RAD001 can occur anywhere in the string.
I am trying to extract the alphanumeric part using the expression below, but it returns the entire string. I thought Alnum was a valid keyword for alphanumerics. Is that not the case?
(my $string = $line ) =~ s/\P{Alnum}//g;
print $string;
How can I extract the alphanumeric part of the aforementioned string?
Thanks in advance.
-simak
At the end as per your input:
> echo "A_phase_I-II,_open-req_project_id_PX15RAD001"|perl -lne 'print $1 if(/id_([A-Z0-9]*)/)'
PX15RAD001
In the middle:
> echo "A_phase_I-II,_open-req_id_PX15RAD001_project" | perl -lne 'print $1 if(/id_([A-Z0-9]*)/)'
PX15RAD001
or in your terms:
print $1 if $line =~ m/id_([A-Z0-9]*)/;
Here are some test cases, prompted by the comments on @Vijay's answer:
my @line = (
'A_phase_I-II,_open-req_project_id_PX15RAD001',
'_PX15RAD001_A_phase_I-II,_open-req_project_id',
'A_pha3333se_I-II,_ope_PX15RAD001_n-req_project',
'A_phase_I-II,_PX15RAD001_open-req_projec123123123t_id',
'A_phase_I-II_PX15RAD001_roject_id'
);
foreach my $string ( @line ) {
# print the capture only when this string actually matched
print "$1\n" if $string =~ m{_([^_]{10})_?};
}
These kinds of questions are hard to answer because there is not enough information. What information we have is:
You say your target string is "alphanumeric", but the entire input string is alphanumeric, except for some punctuation, so that really doesn't tell us anything.
You say it is 12 characters long, but the sample you show is 10 characters long.
You seem to think that "alphanumeric" does not include underscore.
So, the reliable information I can sense from you is:
Target string is always delimited by underscore _
Target string is 10-12 characters, all alphanumeric except underscore.
The "reliable" solution based on this rather skimpy information is:
my $str = "A_phase_I-II,_open-req_project_id_PX15RAD001";
for my $field (split /_/, $str) {
if (length($field) <= 12 and
length($field) >= 10 and # field is 10-12 characters
$field !~ /\W/) { # and contains no non-alphanumerics
# do something
}
}
By splitting on underscore, we can easily isolate each field in the string and perform simpler tests on it, such as the ones above.
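The same test can be wrapped in a small function. This is only a sketch under the assumptions listed above (underscore-delimited, 10 to 12 word characters), and the name candidate_ids is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: return every underscore-delimited field that is
# 10 to 12 characters long and contains only word characters
sub candidate_ids {
    my ($str) = @_;
    return grep { length($_) >= 10 && length($_) <= 12 && !/\W/ }
           split /_/, $str;
}

print join(',', candidate_ids('A_phase_I-II,_open-req_project_id_PX15RAD001')), "\n";   # PX15RAD001
print join(',', candidate_ids('A_phase_I-II_PX15RAD001_roject_id')), "\n";              # PX15RAD001
```

Returning a list leaves it to the caller to decide what to do when a line contains no matching field or more than one.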