Parse single quoted string using Marpa:r2 perl - perl

How to parse single quoted string using Marpa:r2?
In my below code, the single quoted strings appends '\' on parsing.
Code:
use strict;
use Marpa::R2;
use Data::Dumper;
my $grammar = Marpa::R2::Scanless::G->new(
{ default_action => '[values]',
source => \(<<'END_OF_SOURCE'),
lexeme default = latm => 1
:start ::= Expression
# include begin
Expression ::= Param
Param ::= Unquoted
| ('"') Quoted ('"')
| (') Quoted (')
:discard ~ whitespace
whitespace ~ [\s]+
Unquoted ~ [^\s\/\(\),&:\"~]+
Quoted ~ [^\s&:\"~]+
END_OF_SOURCE
});
my $input1 = 'foo';
#my $input2 = '"foo"';
#my $input3 = '\'foo\'';
my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });
print "Trying to parse:\n$input1\n\n";
$recce->read(\$input1);
my $value_ref = ${$recce->value};
print "Output:\n".Dumper($value_ref);
Output's:
Trying to parse:
foo
Output:
$VAR1 = [
[
'foo'
]
];
Trying to parse:
"foo"
Output:
$VAR1 = [
[
'foo'
]
];
Trying to parse:
'foo'
Output:
$VAR1 = [
[
'\'foo\''
]
]; (don't want it to be parsed like this)
Above are the outputs of all the inputs, i don't want 3rd one to get appended with the '\' and single quotes.. I want it to be parsed like OUTPUT2. Please advise.
Ideally, it should just pick the content between single quotes according to Param ::= (') Quoted (')

The other answer regarding Data::Dumper output is correct. However, your grammar does not work the way you expect it to.
When you parse the input 'foo', Marpa will consider the three Param alternatives. The predicted lexemes at that position are:
Unquoted ~ [^\s\/\(\),&:\"~]+
'"'
') Quoted ('
Yes, the last is literally ) Quoted (, not anything containing a single quote.
Even if it were ([']) Quoted ([']): Due to longest token matching, the Unquoted lexeme will match the entire input, including the single quote.
What would happen for an input like " foo " (with double quotes)? Now, only the '"' lexeme would match, then any whitespace would be discarded, then the Quoted lexeme matches, then any whitespace is discarded, then closing " is matched.
To prevent this whitespace-skipping behaviour and to prevent the Unquoted rule from being preferred due to LATM, it makes sense to describe quoted strings as lexemes. For example:
Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ~ DQ | SQ
DQ ~ '"' DQ_Body '"' DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body ['] SQ_Body ~ [^']*
These lexemes will then include any quotes and escapes, so you need to post-process the lexeme contents. You can either do this using the event system (which is conceptually clean, but a bit cumbersome to implement), or adding an action that performs this processing during parse evaluation.
Since lexemes cannot have actions, it is usually best to add a proxy production:
Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ::= Quoted_Lexeme action => process_quoted
Quoted_Lexeme ~ DQ | SQ
DQ ~ '"' DQ_Body '"' DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body ['] SQ_Body ~ [^']*
The action could then do something like:
sub process_quoted {
my (undef, $s) = #_;
# remove delimiters from double-quoted string
return $1 if $s =~ /^"(.*)"$/s;
# remove delimiters from single-quoted string
return $1 if $s =~ /^'(.*)'$/s;
die "String was not delimited with single or double quotes";
}

Your result doesn't contain \', it contains '. Dumper merely formats the result like that so it's clear what's inside the string and what isn't.
You can test this behavior for yourself:
use Data::Dumper;
my $tick = chr(39);
my $back = chr(92);
print "Tick Dumper: " . Dumper($tick);
print "Tick Print: " . $tick . "\n";
print "Backslash Dumper: " . Dumper($back);
print "Backslash Print: " . $back . "\n";
You can see a demo here: https://ideone.com/d1V8OE
If you don't want the output to contain single quotes, you'll probably need to remove them from the input yourself.

I am not so familar with Marpa::R2, but could you try to use an action on the Expression rule:
Expression ::= Param action => strip_quotes
Then, implement a simple quote stripper like:
sub MyActions::strip_quotes {
#{$_[1]}[0] =~ s/^'|'$//gr;
}

Related

How to move a substring with preg_replace or preg_match in PHP?

I want to find a substring and move it in the string instead of replacing (for example, moving it from the beginning to the end of the string).
'THIS the rest of the string' -> 'the rest of the string THIS'
I do this by the following code
preg_match('/^(THIS).?/', $str, $match);
$str = trim( $str . $match[1] );
$str = preg_replace('/^(THIS).?/', '', $str);
There should be an easier way to do this with one regex.
You may use
$re = '/^(THIS)\b\s*(.*)/s';
$str = 'THIS the rest of the string';
$result = preg_replace($re, '$2 $1', $str);
See the regex demo and a PHP demo.
Details
^ - start of string
(THIS) - Group 1 (referenced to with $1 from the replacement pattern): THIS
\b - a word boundary (if you do not need a whole word, you may remove it)
\s* - 0+ whitespaces (if there is always at least one whitespace, use \s+ and remove \b, as it will become redundant)
(.*) - Group 2 (referenced to with $2 from the replacement pattern): the rest of the string (s modifier allows . match line break chars, too).

How to escape all special characters in a string (along with single and double quotes)?

E.g:
$myVar="this###!~`%^&*()[]}{;'".,<>?/\";
I am not able to export this variable and use it as it is in my program.
Use q to store the characters and use the quotemeta to escape the all character
my $myVar=q("this###!~`%^&*()[]}{;'".,<>?/\");
$myVar = quotemeta($myVar);
print $myVar;
Or else use regex substitution to escape the all character
my $myVar=q("this###!~`%^&*()[]}{;'".,<>?/\");
$myVar =~s/(\W)/\\$1/g;
print $myVar;
This is what quotemeta is for, if I understand your quest
Returns the value of EXPR with all non-"word" characters backslashed. (That is, all characters not matching /[A-Za-z_0-9]/ will be preceded by a backslash in the returned string, regardless of any locale settings.) This is the internal function implementing the \Q escape in double-quoted strings.
Its use is very simple
my $myVar = q(this###!~`%^&*()[]}{;'".,<>?/\\);
print "$myVar\n";
my $quoted_var = quotemeta $myVar;
print "$quoted_var\n";
Note that we must manually escape the last backslash, to prevent it from escaping the closing delimiter. Or you can tack on an extra space at the end, and then strip it (by chop).
my $myVar = q(this###!~`%^&*()[]}{;'".,<>?/\ );
chop $myVar;
Now transform $myVar like above, using quotemeta.
I take the outside pair of " to merely indicate what you'd like in the variable. But if they are in fact meant to be in the variable then simply put it all inside q(), since then the last character is ". The only problem is a backslash immediately preceding the closing delimiter.
If you need this in a regex context then you use \Q to start and \E to end escaping.
Giving Thanks to:
What's between \Q and \E is treated as normal characters, not regexp characters. For example,
'.' =~ /./; # match
'a' =~ /./; # match
'.' =~ /\Q.\E/; # match
'a' =~ /\Q.\E/; # no match
It doesn't stop variables from being interpolated.
$search = '.';
'.' =~ /$search/; # match
'a' =~ /$search/; # match
'.' =~ /\Q$search\E/; # match
'a' =~ /\Q$search\E/; # no match

Perl Text processing on a variable before its usage

I wrote a perl script whihc will output a list containing similar entries like below:
$var = ' whatever'
$var contains: a single quote, a space, the word whatever, single quote
actually, this is key of a hash and i want to pull the value for the same. but due to the single quotes and a space in betweene, i am not able to pull the hash key value.
So, i want to strip $var as below:
$var = whatever
meaning remove the single quote, the space and the trailing single quote.
so that I can use $var as hash key to pull the respective value.
could you guide me on a perl oneliner for the same.
thnaks.
Here is several ways to do it, but beware - modifying the keys in a hash can end with unwanted results, like:
use strict;
use warnings;
use Data::Dumper;
my $src = {
"a a" => 1,
" a a " => 2,
"' a a '" => 3,
};
print "src: ", Dumper($src);
my $trg;
#$trg{ map { s/^[\s']*(.*?)[\s']*$/$1/; $_ } keys %$src } = values %$src;
print "copy: ", Dumper($trg);
will produce:
src: $VAR1 = {
' a a ' => 2,
'\' a a \'' => 3,
'a a' => 1
};
copy: $VAR1 = {
'a a' => 1
};
Any regex is possible do explain with YAPE::Regex::Explain module. (from CPAN). For the above regex:
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new( qr(^[\s']*(.*?)[\s']*$) )->explain;
will produce:
The regular expression:
(?-imsx:^[\s']*(.*?)[\s']*$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
[\s']* any character of: whitespace (\n, \r, \t,
\f, and " "), ''' (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
[\s']* any character of: whitespace (\n, \r, \t,
\f, and " "), ''' (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
In short the: s/^[\s']*(.*?)[\s']*$/$1/; mean:
at the beginning of the string match whitespaces or apostrophe as much times is possible,
then match anything
match at the end of string whitespaces or apostrophes as much times as possible
and keep the only the "anything" part
#!/usr/bin/perl
$string = "' my string'";
print $string . "\n";
$string =~ s/'//g;
$string =~ s/^ //g;
print $string;
Output
' my string'
my string
$var =~ tr/ '//d;
see: tr operator
or, by regex
$var =~ s/(?:^['\s]+)|'//g;
The latter will keep the spaces in the middle of the word, the former removes all spaces and single quotes.
A short test:
...
$var = q{' what ever'};
$var =~ s/
(?: # find the following group
^ # at string begin, followed by
['\s]+ # space or single quote, one or more
) # close group
| # OR
' # single quotes in the while string
//gx ; # replace by nothing, use formatted regex (x)
print "|$var|\n";
...
prints:
|what ever|
as expected.

How to convert single quoted string to double quoted string so it gets interpreted?

For example I have,
my $str = '\t';
print "My String is ".$str;
I want the output to interpret the tab character and return something like:
"My String is \t"
I am actually getting the value of the string from the database, and it returns it as a single quoted string.
String::Interpolate does exactly that
$ perl -MString::Interpolate=interpolate -E 'say "My String is [". interpolate(shift) . "]"' '\t'
My String is [ ]
'\t' and "\t" are string literals, pieces of Perl code that produces strings ("\","t" and the tab character respectively). The database doesn't return Perl code, so describing the problem in terms of single-quoted literals and double-quoted literals makes no sense. You have a string, period.
The string is formed of the characters "\" and "t". You want to convert that sequence of characters into the tab character. That's a simple substitution.
s/\\t/\t/g
I presume you don't want to deal with just \t. You can create a table of the sequences.
my %escapes = (
"t" => "\t",
"n" => "\n",
"\" => "\\",
);
my $escapes_pat = join('', map quotemeta, keys(%escapes));
$escapes_pat = qr/[$escapes_pat]/;
s/\\($escapes_pat)/$escapes{$1}/g;
You can follow the technique in perlfaq4's answer to How can I expand variables in text strings?:
If you can avoid it, don't, or if you can use a templating system, such as Text::Template or Template Toolkit, do that instead. You might even be able to get the job done with sprintf or printf:
my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
However, for the one-off simple case where I don't want to pull out a full templating system, I'll use a string that has two Perl scalar variables in it. In this example, I want to expand $foo and $bar to their variable's values:
my $foo = 'Fred';
my $bar = 'Barney';
$string = 'Say hello to $foo and $bar';
One way I can do this involves the substitution operator and a double /e flag. The first /e evaluates $1 on the replacement side and turns it into $foo. The second /e starts with $foo and replaces it with its value. $foo, then, turns into 'Fred', and that's finally what's left in the string:
$string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
The /e will also silently ignore violations of strict, replacing undefined variable names with the empty string. Since I'm using the /e flag (twice even!), I have all of the same security problems I have with eval in its string form. If there's something odd in $foo, perhaps something like #{[ system "rm -rf /" ]}, then I could get myself in trouble.
To get around the security problem, I could also pull the values from a hash instead of evaluating variable names. Using a single /e, I can check the hash to ensure the value exists, and if it doesn't, I can replace the missing value with a marker, in this case ??? to signal that I missed something:
my $string = 'This has $foo and $bar';
my %Replacements = (
foo => 'Fred',
);
# $string =~ s/\$(\w+)/$Replacements{$1}/g;
$string =~ s/\$(\w+)/
exists $Replacements{$1} ? $Replacements{$1} : '???'
/eg;
print $string;
Well, I just tried below workaround it worked. Please have a look
my $str1 = "1234\n\t5678";
print $str1;
#it prints
#1234
# 5678
$str1 =~ s/\t/\\t/g;
$str1 =~ s/\n/\\n/g;
print $str1;
#it prints exactly the same
#1234\n\t5678

What's the difference between single and double quotes in Perl?

In Perl, what is the difference between ' and " ?
For example, I have 2 variables like below:
$var1 = '\(';
$var2 = "\(";
$res1 = ($matchStr =~ m/$var1/);
$res2 = ($matchStr =~ m/$var2/);
The $res2 statement complains that Unmatched ( before HERE mark in regex m.
Double quotes use variable expansion. Single quotes don't
In a double quoted string you need to escape certain characters to stop them being interpreted differently. In a single quoted string you don't (except for a backslash if it is the final character in the string)
my $var1 = 'Hello';
my $var2 = "$var1";
my $var3 = '$var1';
print $var2;
print "\n";
print $var3;
print "\n";
This will output
Hello
$var1
Perl Monks has a pretty good explanation of this here
' will not resolve variables and escapes
" will resolve variables, and escape characters.
If you want to store your \ character in the string in $var2, use "\\("
Double quotation marks interpret, and single quotation do not
If you are going to create regex strings you should really be using the qr// quote-like operator:
my $matchStr = "(";
my $var1 = qr/\(/;
my $res1 = ($matchStr =~ m/$var1/);
It creates a compiled regex that is much faster than just using a variable containing string. It also will return a string if not used in a regex context, so you can say things like
print "$var1\n"; #prints (?-xism:\()
Perl takes the single-quoted strings 'as is' and interpolates the double-quoted strings. Interpolate means, that it substitutes variables with variable values, and also understands escaped characters. So, your "\(" is interpreted as '(', and your regexp becomes m/(/, this is why Perl complains.
"" Supports variable interpolation and escaping. so inside "\(" \ escapes (
Where as ' ' does not support either. So '\(' is literally \(