Preg Match only digits Between Curly Braces - preg_match - find

Related to Preg Match Between Curly Braces - preg_match
What if my string is: this an example with {2} words and {7} numbers
And I want to return:
array(
[1] => 2
[2] => 7
)

Where the linked question used . to match any character, you can use \d to match any digit character (0-9):
$str = 'this an example with {2} words and {7} numbers';
if(preg_match_all('/{+(\d*?)}/', $str, $matches)) {
var_dump($matches[1]);
}
Output:
array(2) {
[0]=>
string(1) "2"
[1]=>
string(1) "7"
}
https://regex101.com/r/MmXp6E/1

Related

how can I partition a line into code and comment using a single regex in perl?

I want to read through a text file and partition each line into the following three variables. Each variable must be defined, although it might be equal to the empty string.
$a1code: all characters up to and not including the first non-escaped percent sign. If there is no non-escaped percent sign, this is the entire line. As we see in the example below, this also could be the empty string in a line where the following two variables are non-empty.
$a2boundary: the first non-escaped percent sign, if there is one.
$a3cmnt: any characters after the first non-escaped percent sign, if there is one.
The script below accomplishes this but requires several lines of code, two hashes, and a composite regex, that is, 2 regex combined by |.
The composite seems necessary because the first clause,
(?<a1code>.*?)(?<a2boundary>(?<!\\)%)(?<a3cmnt>.*)
does not match a line that is pure code, no comment.
Is there a more elegant way, using a single regex and fewer steps?
In particular, is there a way to dispense with the %match hash and somehow
fill the %+ hash with all three three variables in a single step?
#!/usr/bin/env perl
use strict; use warnings;
print join('', 'perl ', $^V, "\n",);
use Data::Dumper qw(Dumper); $Data::Dumper::Sortkeys = 1;
my $count=0;
while(<DATA>)
{
$count++;
print "$count\t";
chomp;
my %match=(
a2boundary=>'',
a3cmnt=>'',
);
print "|$_|\n";
if($_=~/^(?<a1code>.*?)(?<a2boundary>(?<!\\)%)(?<a3cmnt>.*)|(?<a1code>.*)/)
{
print "from regex:\n";
print Dumper \%+;
%match=(%match,%+,);
}
else
{
die "no match? coding error, should never get here";
}
if(scalar keys %+ != scalar keys %match)
{
print "from multiple lines of code:\n";
print Dumper \%match;
}
print "------------------------------------------\n";
}
__DATA__
This is 100\% text and below you find an empty line.
abba 5\% %comment 9\% %Borgia
%all comment
%
Result:
perl v5.34.0
1 |This is 100\% text and below you find an empty line. |
from regex:
$VAR1 = {
'a1code' => 'This is 100\\% text and below you find an empty line. '
};
from multiple lines of code:
$VAR1 = {
'a1code' => 'This is 100\\% text and below you find an empty line. ',
'a2boundary' => '',
'a3cmnt' => ''
};
------------------------------------------
2 ||
from regex:
$VAR1 = {
'a1code' => ''
};
from multiple lines of code:
$VAR1 = {
'a1code' => '',
'a2boundary' => '',
'a3cmnt' => ''
};
------------------------------------------
3 |abba 5\% %comment 9\% %Borgia|
from regex:
$VAR1 = {
'a1code' => 'abba 5\\% ',
'a2boundary' => '%',
'a3cmnt' => 'comment 9\\% %Borgia'
};
------------------------------------------
4 |%all comment|
from regex:
$VAR1 = {
'a1code' => '',
'a2boundary' => '%',
'a3cmnt' => 'all comment'
};
------------------------------------------
5 |%|
from regex:
$VAR1 = {
'a1code' => '',
'a2boundary' => '%',
'a3cmnt' => ''
};
------------------------------------------
You can use the following:
my ($a1code, $a2boundary, $a3cmnt) =
/
^
( (?: [^\\%]+ | \\. )* )
(?: (%) (.*) )?
\z
/sx;
It does not consider % escaped in abc\\%def since the preceding \ is escaped.
It requires no backtracking, and it always matches.
$a1code is always a string. It can be zero characters long (when the input is an empty string and when % is the first character), or the entire input string (when there is no unescaped %).
However, $a2boundary and $a3cmnt are only defined if there's an unescaped %. In other words, $a2boundary is equivalent to defined($a3cmnt) ? '%' : undef.
Explanation: [^\\%]+ matches non-escaped characters other than \ and %. \\. matches escaped characters. So (?: [^\\%]+ | \\. )* gets us the prefix, or the entire string if there are no unescaped %.
What about cases like this\\%string where the backslash before the percent sign is itself escaped?
Consider something like this, which instead of trying to use a regular expression to split the string into three groups, uses one to look where for it should be split, and substr to do the actual splitting:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
sub splitter {
my $line = shift;
if ($line =~ /
# Match either
(?<!\\)% # A % not preceded by a backslash
| # or
(?<=[^\\])(?:\\\\)+\K% # Any even number of backslashes followed by a %
/x) {
return (substr($line, 0, $-[0]), '%', substr($line, $+[0]));
} else {
return ($line, '', '');
}
}
while (<DATA>) {
chomp;
# Assign to an array instead of individual scalars for demonstration purposes
my #vals = splitter $_;
print Dumper(\#vals);
}
__DATA__
This is 100\% text and below you find an empty line.
abba 5\% %comment 9\% %Borgia
%all comment
%
a tricky\\%test % case
another \\\%one % to mess with you

Parse single quoted string using Marpa:r2 perl

How to parse single quoted string using Marpa:r2?
In my below code, the single quoted strings appends '\' on parsing.
Code:
use strict;
use Marpa::R2;
use Data::Dumper;
my $grammar = Marpa::R2::Scanless::G->new(
{ default_action => '[values]',
source => \(<<'END_OF_SOURCE'),
lexeme default = latm => 1
:start ::= Expression
# include begin
Expression ::= Param
Param ::= Unquoted
| ('"') Quoted ('"')
| (') Quoted (')
:discard ~ whitespace
whitespace ~ [\s]+
Unquoted ~ [^\s\/\(\),&:\"~]+
Quoted ~ [^\s&:\"~]+
END_OF_SOURCE
});
my $input1 = 'foo';
#my $input2 = '"foo"';
#my $input3 = '\'foo\'';
my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });
print "Trying to parse:\n$input1\n\n";
$recce->read(\$input1);
my $value_ref = ${$recce->value};
print "Output:\n".Dumper($value_ref);
Output's:
Trying to parse:
foo
Output:
$VAR1 = [
[
'foo'
]
];
Trying to parse:
"foo"
Output:
$VAR1 = [
[
'foo'
]
];
Trying to parse:
'foo'
Output:
$VAR1 = [
[
'\'foo\''
]
]; (don't want it to be parsed like this)
Above are the outputs of all the inputs, i don't want 3rd one to get appended with the '\' and single quotes.. I want it to be parsed like OUTPUT2. Please advise.
Ideally, it should just pick the content between single quotes according to Param ::= (') Quoted (')
The other answer regarding Data::Dumper output is correct. However, your grammar does not work the way you expect it to.
When you parse the input 'foo', Marpa will consider the three Param alternatives. The predicted lexemes at that position are:
Unquoted ~ [^\s\/\(\),&:\"~]+
'"'
') Quoted ('
Yes, the last is literally ) Quoted (, not anything containing a single quote.
Even if it were ([']) Quoted ([']): Due to longest token matching, the Unquoted lexeme will match the entire input, including the single quote.
What would happen for an input like " foo " (with double quotes)? Now, only the '"' lexeme would match, then any whitespace would be discarded, then the Quoted lexeme matches, then any whitespace is discarded, then closing " is matched.
To prevent this whitespace-skipping behaviour and to prevent the Unquoted rule from being preferred due to LATM, it makes sense to describe quoted strings as lexemes. For example:
Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ~ DQ | SQ
DQ ~ '"' DQ_Body '"' DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body ['] SQ_Body ~ [^']*
These lexemes will then include any quotes and escapes, so you need to post-process the lexeme contents. You can either do this using the event system (which is conceptually clean, but a bit cumbersome to implement), or adding an action that performs this processing during parse evaluation.
Since lexemes cannot have actions, it is usually best to add a proxy production:
Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ::= Quoted_Lexeme action => process_quoted
Quoted_Lexeme ~ DQ | SQ
DQ ~ '"' DQ_Body '"' DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body ['] SQ_Body ~ [^']*
The action could then do something like:
sub process_quoted {
my (undef, $s) = #_;
# remove delimiters from double-quoted string
return $1 if $s =~ /^"(.*)"$/s;
# remove delimiters from single-quoted string
return $1 if $s =~ /^'(.*)'$/s;
die "String was not delimited with single or double quotes";
}
Your result doesn't contain \', it contains '. Dumper merely formats the result like that so it's clear what's inside the string and what isn't.
You can test this behavior for yourself:
use Data::Dumper;
my $tick = chr(39);
my $back = chr(92);
print "Tick Dumper: " . Dumper($tick);
print "Tick Print: " . $tick . "\n";
print "Backslash Dumper: " . Dumper($back);
print "Backslash Print: " . $back . "\n";
You can see a demo here: https://ideone.com/d1V8OE
If you don't want the output to contain single quotes, you'll probably need to remove them from the input yourself.
I am not so familar with Marpa::R2, but could you try to use an action on the Expression rule:
Expression ::= Param action => strip_quotes
Then, implement a simple quote stripper like:
sub MyActions::strip_quotes {
#{$_[1]}[0] =~ s/^'|'$//gr;
}

How can i find if my string contains nine digits?

how can i find if string contains 9 digits?
this is the cokde that I tried.
if($string ~= m/\d{9}/){
print ("String contains 9 digits");
}
thanx.
Try changing ~= to =~
if($string =~ m/\d{9}/){
print ("String contains 9 digits");
}
This will match any string containing at-least 9 didts
how can i ensure that string only contains 9 digits?
Try enclosing match within ^ and $
if($string =~ m/^\d{9}$/){
print ("String only contains 9 digits");
}
i want to also match nine digits that are preceeded or followed by
whitespaces e.g ' 123456798 '
Try specifying \s* at the beginning and end of the string
if($string =~ m/^\s*\d{9}\s*$/){
print ("String contains only 9 digits with optional leading or trailing spaces");
}
P.S. Observe the difference in the meaning of print message in all the cases. They are not same!

how can i match two consecutive words that both start with capital letters?

I want to match first and last name.
e.g. Robert Still, the words can be preceeded and followed by whitespaces however the string can only contain two words.
' Robert Still ' = true
' Robert Still ' = true
'e Robert Still 4 ' = false
this is the code that i tried
m/^\s*[A-Z].*[a-z]\s*[A-Z].*[a-z]\s*$/
Try this:
#!/usr/bin/perl -w
use strict;
my #names = ('Robert Still', ' Robert Still', 'Robert Still ', '4 Robert Still 2', 'Robert e Still');
foreach (#names){
if ($_ =~ /(^\s*[A-Z]\w+\s+[A-Z]\w+\s*$)/){
print "'$_' : true\n"
}
else {
print "'$_' : false\n";
}
}
Output:
'Robert Still' : true
' Robert Still' : true
'Robert Still ' : true
'4 Robert Still 2' : false
'Robert e Still' : false
Regex explained:
^ Start of line
\s* 0 to infinite times [greedy] Whitespace [\t \r\n\f\v]
Char class [A-Z] matches: A-Z A character range between Literal A and Literal Z
\w+ 1 to infinite times [greedy] Word character [a-zA-Z_\d]
\s+ 1 to infinite times [greedy] Whitespace [\t \r\n\f\v]
Char class [A-Z] matches: A-Z A character range between Literal A and Literal Z
\w+ 1 to infinite times [greedy] Word character [a-zA-Z_\d]
\s* 0 to infinite times [greedy] Whitespace [\t \r\n\f\v]
$ End of line
/^ \s* \p{Lu}\S+ \s+ \p{Lu}\S+ \s* \z/x

Perl Text processing on a variable before its usage

I wrote a perl script whihc will output a list containing similar entries like below:
$var = ' whatever'
$var contains: a single quote, a space, the word whatever, single quote
actually, this is key of a hash and i want to pull the value for the same. but due to the single quotes and a space in betweene, i am not able to pull the hash key value.
So, i want to strip $var as below:
$var = whatever
meaning remove the single quote, the space and the trailing single quote.
so that I can use $var as hash key to pull the respective value.
could you guide me on a perl oneliner for the same.
thnaks.
Here is several ways to do it, but beware - modifying the keys in a hash can end with unwanted results, like:
use strict;
use warnings;
use Data::Dumper;
my $src = {
"a a" => 1,
" a a " => 2,
"' a a '" => 3,
};
print "src: ", Dumper($src);
my $trg;
#$trg{ map { s/^[\s']*(.*?)[\s']*$/$1/; $_ } keys %$src } = values %$src;
print "copy: ", Dumper($trg);
will produce:
src: $VAR1 = {
' a a ' => 2,
'\' a a \'' => 3,
'a a' => 1
};
copy: $VAR1 = {
'a a' => 1
};
Any regex is possible do explain with YAPE::Regex::Explain module. (from CPAN). For the above regex:
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new( qr(^[\s']*(.*?)[\s']*$) )->explain;
will produce:
The regular expression:
(?-imsx:^[\s']*(.*?)[\s']*$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
[\s']* any character of: whitespace (\n, \r, \t,
\f, and " "), ''' (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
[\s']* any character of: whitespace (\n, \r, \t,
\f, and " "), ''' (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
In short the: s/^[\s']*(.*?)[\s']*$/$1/; mean:
at the beginning of the string match whitespaces or apostrophe as much times is possible,
then match anything
match at the end of string whitespaces or apostrophes as much times as possible
and keep the only the "anything" part
#!/usr/bin/perl
$string = "' my string'";
print $string . "\n";
$string =~ s/'//g;
$string =~ s/^ //g;
print $string;
Output
' my string'
my string
$var =~ tr/ '//d;
see: tr operator
or, by regex
$var =~ s/(?:^['\s]+)|'//g;
The latter will keep the spaces in the middle of the word, the former removes all spaces and single quotes.
A short test:
...
$var = q{' what ever'};
$var =~ s/
(?: # find the following group
^ # at string begin, followed by
['\s]+ # space or single quote, one or more
) # close group
| # OR
' # single quotes in the while string
//gx ; # replace by nothing, use formatted regex (x)
print "|$var|\n";
...
prints:
|what ever|
as expected.