how to pass one regex output to another regex in perl - perl

How can I combine two regexes? This is my input:
1.UE_frequency_offset_flag else { 2} UE_frequency_offset_flag
2.served1 0x00 Uint8,unsigned char
#my first regex expression is used for extracting the values inside curly braces
my ($first_match) = /(\b(\d+)\b)/g;
print "$1 \n";
#my second regex expression
my ($second_match) = / \S \s+ ( \{ [^{}]+ \} | \S+ ) /x;
I was trying to combine both regex but did not get the expected output.
my ($second_match) = / \S \s+ ( \{ [^{}]+ \} |\b(\d+)\b| \S+ ) /x;
My expected output:
2,0x00
Where am I making a mistake?

The question is not completely clear to me, because I don't see how you want to combine two regex or pass the output of one to the other.
If you want to pass the captured part of the first regex then you need to save it to a variable:
my ($first_match) = /(\b(\d+)\b)/g;
my $captured = $1;
Then you can place the variable $captured in the second regex.
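As a minimal sketch of that idea (the variable names $line and $word_after and the second pattern are illustrative assumptions, not part of the question), the captured text can be interpolated into a second regex. Wrapping it in \Q...\E keeps any regex metacharacters in the captured text literal:

```perl
use strict;
use warnings;

# First input line from the question
my $line = '1.UE_frequency_offset_flag else { 2} UE_frequency_offset_flag';

# First regex: capture the number inside the curly braces
my ($captured) = $line =~ /\{\s*(\d+)\s*\}/;

# Second regex: reuse the captured text; \Q...\E quotes metacharacters
my ($word_after) = $line =~ /\{\s*\Q$captured\E\s*\}\s+(\S+)/;

print "$captured -> $word_after\n";
```

Assigning the match in list context, as above, avoids relying on $1 still holding the first capture when the second match runs.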
If you want to use the complete match and search inside it, then you need to do the following:
my ($first_match) = /(\b(\d+)\b)/g;
print "$1,"; # Don't print one space then new line if you want to have a comma separating the two values
my ($second_match) = $first_match =~ / \S \s+ ( \{ [^{}]+ \} | \S+ ) /x;
Based on your input, this won't generate the expected output.
The following code, run against each line of your input, prints:
2,0x00
print "$1," if /\{\s*(\d+)\s*\}/;
print "$1\n" if /(\d+x\d+)/;
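For a self-contained version of the same approach, here is a sketch that loops over the two sample lines and collects the values instead of printing them piecemeal. The array names and the stricter hex pattern (0x followed by hex digits, in place of \d+x\d+) are my assumptions:

```perl
use strict;
use warnings;

# The two sample input lines from the question
my @lines = (
    '1.UE_frequency_offset_flag else { 2} UE_frequency_offset_flag',
    '2.served1 0x00 Uint8,unsigned char',
);

my @found;
for (@lines) {
    push @found, $1 if /\{\s*(\d+)\s*\}/;        # number inside curly braces
    push @found, $1 if /\b(0x[0-9A-Fa-f]+)\b/;   # hex literal such as 0x00
}

my $result = join ',', @found;
print "$result\n";   # prints 2,0x00
```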

Related

Why does multiple use of `<( )>` token within `comb` not behave as expected?

I want to extract the row key(here is 28_2820201112122420516_000000), the column name(here is bcp_startSoc), and the value(here is 64.0) in $str, where $str is a row from HBase:
# `match` is OK
my $str = '28_2820201112122420516_000000 column=d:bcp_startSoc, timestamp=1605155065124, value=64.0';
my $match = $str.match(/^ ([\d+]+ % '_') \s 'column=d:' (\w+) ',' \s timestamp '=' \d+ ',' \s 'value=' (<-[=]>+) $/);
my @match-result = $match».Str.Slip;
say @match-result; # Output: [28_2820201112122420516_000000 bcp_startSoc 64.0]
# `smartmatch` is OK
# $str ~~ /^ ([\d+]+ % '_') \s 'column=d:' (\w+) ',' \s timestamp '=' \d+ ',' \s 'value=' (<-[=]>+) $/
# say $/».Str.Array; # Output: [28_2820201112122420516_000000 bcp_startSoc 64.0]
# `comb` is NOT OK
# A <( token indicates the start of the match's overall capture, while the corresponding )> token indicates its endpoint.
# The <( is similar to other languages \K to discard any matches found before the \K.
my @comb-result = $str.comb(/<( [\d+]+ % '_' )> \s 'column=d:' <(\w+)> ',' \s timestamp '=' \d+ ',' \s 'value=' <(<-[=]>+)>/);
say @comb-result; # Expect: [28_2820201112122420516_000000 bcp_startSoc 64.0], but got [64.0]
I want comb to skip some parts of the match and return only the parts I want, so I used multiple <( and )> here, but I only get the last one as the result.
Is it possible to use comb to get the same result as the match method?
TL;DR Multiple <(...)>s don't mean multiple captures. Even if they did, .comb reduces each match to a single string in the list of strings it returns. If you really want to use .comb, one way is to go back to your original regex but also store the desired data using additional code inside the regex.
Multiple <(...)>s don't mean multiple captures
The default start point for the overall match of a regex is the start of the regex. The default end point is the end.
Writing <( resets the start point for the overall match to the position you insert it at. Each time you insert one and it gets applied during processing of a regex it resets the start point. Likewise )> resets the end point. At the end of processing a regex the final settings for the start and end are applied in constructing the final overall match.
Given that your code just unconditionally resets each point three times, the last start and end resets "win".
.comb reduces each match to a single string
foo.comb(/.../) is equivalent to foo.match(:g, /.../)>>.Str;.
That means you only get one string for each match against the regex.
One possible solution is to use the approach @ohmycloudy shows in their answer.
But that comes with the caveats raised by myself and @jubilatious1 in comments on their answer.
Add { @comb-result .push: |$/».Str } to the regex
You can work around .comb's normal functioning. I'm not saying it's a good thing to do. Nor am I saying it's not. You asked, I'm answering, and that's it. :)
Start with your original regex that worked with your other solutions.
Then add { @comb-result .push: |$/».Str } to the end of the regex to store the result of each match. Now you will get the result you want.
$str.comb( / ^ [\d+]+ % '_' | <?after d\:> \w+ | <?after value\=> .*/ )
Since you have a comma-separated 'row' of information you're examining, you could try using split() to break your matches up, and assign to an array. Below in the Raku REPL:
> my $str = '28_2820201112122420516_000000 column=d:bcp_startSoc, timestamp=1605155065124, value=64.0';
28_2820201112122420516_000000 column=d:bcp_startSoc, timestamp=1605155065124, value=64.0
> my @array = $str.split(", ")
[28_2820201112122420516_000000 column=d:bcp_startSoc timestamp=1605155065124 value=64.0]
> dd @array
Array @array = ["28_2820201112122420516_000000 column=d:bcp_startSoc", "timestamp=1605155065124", "value=64.0"]
Nil
> say @array.elems
3
Match on individual elements of the array:
> say @array[0] ~~ m/ ([\d+]+ % '_') \s 'column=d:' (\w+) /;
「28_2820201112122420516_000000 column=d:bcp_startSoc」
0 => 「28_2820201112122420516_000000」
1 => 「bcp_startSoc」
> say @array[0] ~~ m/ ([\d+]+ % '_') \s 'column=d:' <(\w+)> /;
「bcp_startSoc」
0 => 「28_2820201112122420516_000000」
> say @array[0] ~~ m/ [\d+]+ % '_' \s 'column=d:' <(\w+)> /;
「bcp_startSoc」
Boolean tests on matches to one-or-more array elements:
> say True if ( @array[0] ~~ m/ [\d+]+ % '_' \s 'column=d:' <(\w+)> /)
True
> say True if ( @array[2] ~~ m/ 'value=' <(<-[=]>+)> / )
True
> say True if ( @array[0] ~~ m/ [\d+]+ % '_' \s 'column=d:' <(\w+)> /) & ( @array[2] ~~ m/ 'value=' <(<-[=]>+)> / )
True
HTH.

How do we use the '/r' modifier in string replacements?

I need to increment a numeric value in a string:
my $str = "tool_v01.zip";
(my $newstr = $str) =~ s/\_v(\d+)\.zip$/ ($1++);/eri;
#(my $newstr = $str) =~ s/\_v(\d+)\.zip$/ ($1+1);/eri;
#(my $newstr = $str) =~ s/\_v(\d+)\.zip$/ $1=~s{(\d+)}{$1+1}/r; /eri;
print $newstr;
Expected output is tool_v02.zip
Note: the version number 01 may contain any number of leading zeroes
I don't think this question has anything to do with the /r modifier, but rather how to properly format the output. For that, I'd suggest sprintf:
my $newstr = $str =~ s{ _v (\d+) \.zip$ }
{ sprintf("_v%0*d.zip", length($1), $1+1 ) }xeri;
Or, replacing just the number with zero-width Lookaround Assertions:
my $newstr = $str =~ s{ (?<= _v ) (\d+) (?= \.zip$ ) }
{ sprintf("%0*d", length($1), $1+1 ) }xeri;
Note: With either of these solutions, something like tool_v99.zip would be altered to tool_v100.zip because the new sequence number cannot be expressed in two characters. If that's not what you want then you need to specify what alternative behaviour you require.
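To see how the width-preserving sprintf behaves on several inputs, including the overflow case described in the note, here is a small sketch. The helper name bump_version and the sample file names other than tool_v01.zip are hypothetical, and it assumes Perl 5.14 or later for the /r modifier:

```perl
use strict;
use warnings;

# Hypothetical helper: increment the number after _v, keeping its zero-padded width
sub bump_version {
    my ($name) = @_;
    return $name =~ s{ (?<= _v ) (\d+) (?= \.zip \z ) }
                     { sprintf '%0*d', length($1), $1 + 1 }xeir;
}

my @bumped = map { bump_version($_) } qw(tool_v01.zip tool_v0009.zip tool_v99.zip);
print "$_\n" for @bumped;
# prints:
# tool_v02.zip
# tool_v0010.zip
# tool_v100.zip
```

The * in %0*d takes the field width from the argument list, so the padding follows whatever width the original number had.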
The bit you're missing is sprintf which works the same way as printf except rather than outputting the formatted string to stdout or a file handle, it returns it as a string. Example:
sprintf("%02d",3)
generates a string 03
Putting this into your regex, you can do the following. Rather than using /r, you can use a zero-width lookahead ((?=...)) to match the file suffix and replace just the matched number with the new value. Note the escaped dot, so that it matches only a literal "." before zip:
s/(\d+)(?=\.zip$)/sprintf("%02d",$1+1)/ei

Perl - Convert integer to text Char(1,2,3,4,5,6)

I am after some help converting the following log to plain text.
This is a URL, so there may be %20 (a space) and other escapes, but the main part I am trying to convert is char(1,2,3,4,5,6) to text.
Below is an example of what I am trying to convert.
select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)
What I have tried so far is the following, trying to take what is inside char(...) and convert it with chr($2):
perl -pe "s/(char())/chr($2)/ge"
All this has managed to do is remove the char, but I am still trying to convert the numbers to text and remove the commas and brackets.
I may be way off with how I am doing this, as I am fairly new to Perl.
perl -pe "s/word to remove/word to change it to/ge"
"s/(char(what goes in here))/chr($2)/ge"
Output try to achieve is
select -x1-Q-,-x2-Q-,-x3-Q-
Or
select%20-x1-Q-,-x2-Q-,-x3-Q-
Thanks for any help
There's too much to do here for a reasonable one-liner. Also, a script is easier to adjust later.
use warnings;
use strict;
use feature 'say';
use URI::Escape 'uri_unescape';
my $string = q{select%20}
. q{char(45,120,49,45,81,45),char(45,120,50,45,81,45),}
. q{char(45,120,51,45,81,45)};
my $new_string = uri_unescape($string); # convert %20 and such
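my @parts = $new_string =~ /(.*?)(char.*)/;
# split into individual char(...) groups, then turn each group's numbers into characters
$parts[1] = join ',', map { join '', map { chr $_ } m/([0-9]+)/g } split /\),/, $parts[1];
$new_string = join '', @parts;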
say $new_string;
this prints
select -x1-Q-,-x2-Q-,-x3-Q-
Comments
Module URI::Escape is used to convert percent-encoded characters, per RFC 3986.
It is unspecified whether anything can follow the part with char(...)s, and what that might be. If there can be more after the last char(...), adjust the splitting into @parts, or clarify.
In the part with char(...)s only the numbers are needed, which is what the regex in map extracts.
If you are going to use regex you should read up on it. See
perlretut, a tutorial
perlrequick, a quick-start introduction
perlre, the full account of syntax
perlreref, a quick reference (its See Also section is useful on its own)
Alright, this is going to be a messy "one-liner". Assuming your text is in a variable called $text.
$text =~ s{char\( ( (?: (?:\d+,)* \d+ )? ) \)}{
my @arr = split /,/, $1;
my $temp = join('', map { chr($_) } @arr);
$temp =~ s/^|$/"/g;
$temp
}xeg;
The regular expression matches char(, followed by a comma-separated list of sequences of digits, followed by ). We capture the digits in capture group $1. In the substitution, we split $1 on the comma (since chr converts one number at a time, not a whole list of them). Then we map chr over each number and concatenate the result into a string. The next line simply puts quotation marks at the start and end of the string (presumably you want the output quoted) and then returns the new string.
Input:
select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)
Output:
select%20"-x1-Q-","-x2-Q-","-x3-Q-"
If you want to replace the % escape sequences as well, I suggest doing that in a separate line. Trying to integrate both substitutions into one statement is going to get very hairy.
This will do as you ask. It performs the decoding in two stages: first the URI-encoding is decoded using chr hex $1, and then each char() function is translated to the string corresponding to the character equivalents of its decimal parameters
use strict;
use warnings 'all';
use feature 'say';
my $s = 'select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)';
$s =~ s/%([0-9A-Fa-f]{2})/ chr hex $1 /eg;
$s =~ s{ char \s* \( ( [^()]+ ) \) }{ join '', map chr, $1 =~ /\d+/g }xge;
say $s;
output
select -x1-Q-,-x2-Q-,-x3-Q-

match multiple patterns and extract subpatterns into a array in perl

I have the following string in $str:
assign (rregbus_z_partially_resident | regbus_s_partially_resident | reg_two | )regbus_;
I want to parse this line and capture into an array only the strings that start with a non-word character followed by either reg_\w+ or regbus_\w+.
So in the above example, I want to capture only
regbus_s_partially_resident and reg_two into an array.
I tried this and it did not work:
my (@all_matches) = ($str =~ m/\W(reg_\w+)|\W(regbus_\w+)/g);
Since I am trying to use \W, it's copying the non-word character also into the array list, which I do not want.
"it's copying the non-word character also into the array list"
No, it doesn't.
$ perl -le'
my $str = "assign (rregbus_z_partially_resident | regbus_s_partially_resident | reg_two | )regbus_;";
my (@all_matches) = ($str =~ m/\W(reg_\w+)|\W(regbus_\w+)/g);
print $_ // "[undef]" for @all_matches;
'
[undef]
regbus_s_partially_resident
reg_two
[undef]
But you do have a problem: You have two captures, so you will get two values per match.
Fix:
my @all_matches;
push @all_matches, $1 // $2 while $str =~ m/\W(reg_\w+)|\W(regbus_\w+)/g;
Far better:
my @all_matches = $str =~ m/\W(reg(?:bus)?_\w+)/g;
Even better yet:
my @all_matches = $str =~ m/\b(reg(?:bus)?_\w+)/g;
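One practical difference between the \W and \b versions is what happens at the very start of the string: \W needs an actual character to consume, while \b is a zero-width assertion that also holds at position 0. A short sketch with a made-up sample string (not the one from the question):

```perl
use strict;
use warnings;

# Hypothetical sample: a wanted name at the very start, one in the middle,
# and one that is only the tail of a longer word (xreg_three)
my $str = 'reg_one | regbus_two | xreg_three';

my @with_W = $str =~ /\W(reg(?:bus)?_\w+)/g;   # needs a non-word char before "reg"
my @with_b = $str =~ /\b(reg(?:bus)?_\w+)/g;   # needs only a word boundary

print "\\W: @with_W\n";   # prints \W: regbus_two
print "\\b: @with_b\n";   # prints \b: reg_one regbus_two
```

Both versions skip xreg_three, because there is neither a non-word character nor a word boundary between x and reg.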
Your regex needs a little tweak:
my @all_matches = $str =~ m/\W(reg_\w+|regbus_\w+)/g;
or
my @all_matches = $str =~ m/\W( (?:reg|regbus)_\w+ )/gx;
or even something along the lines of
my @all_matches = $str =~ m/\W( reg(?:bus)?_\w+ )/gx;
The most suitable form depends on what patterns you may need and how this is used.
Or, reduce the regex use to the heart of the problem
my @matches = grep { /^(?:reg_\w+|regbus_\w+)/ } split /\W/, $str;
which may be helpful if your strings and/or requirements grow more complex.

issue in matching regexp in perl

I have the following code
$str = "
OTNPKT0553 04-02-03 21:43:46
M X DENY
PLNA
/*Privilege, Login Not Active*/
;";
$val = $str =~ /[
]*([\n]?[\n]+
[\n]?) ([^;^
]+)/s;
print "$1 and $2";
Getting output as
and PLNA
Why is it getting PLNA as output? I believe it should stop at the first \n. I assumed the output would be OTNPKT0553 04-02-03 21:43:46
Your regex is messy and contains a lot of redundancy. The following steps demonstrate how it can be simplified and then it becomes more clear why it is matching PLNA.
1) Translating the literal new lines in your regex:
$val = $str =~ /[\n\n]*([\n]?[\n]+\n[\n]?) ([^;^\n]+)/s;
2) Then simplifying this code to remove the redundancy:
$val = $str =~ /(\n{2}) ([^;^\n]+)/s;
So basically, the regex is looking for two new lines followed by 3 spaces.
There are three spaces before OTNPKT0553, but there is only a single new line, so it won't match.
The next three spaces are before PLNA which IS preceded by two new lines, and so matches.
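The behaviour can be checked with a short sketch. It assumes the layout this explanation describes: three leading spaces before OTNPKT0553 and PLNA, and an empty line directly before PLNA. The exact whitespace of the original string is my reconstruction, and the three spaces are written as " {3}" to make them visible:

```perl
use strict;
use warnings;

# Assumed layout: three-space indent, and a blank line before PLNA
my $indent = ' ' x 3;
my $str = "\n${indent}OTNPKT0553 04-02-03 21:43:46\nM  X DENY\n\n"
        . "${indent}PLNA\n${indent}/*Privilege, Login Not Active*/\n;";

# Simplified regex: two newlines, three spaces, then text up to ; ^ or newline
my ($newlines, $text) = $str =~ /(\n{2}) {3}([^;^\n]+)/;

print "matched: $text\n";   # prints matched: PLNA
```

OTNPKT0553 is skipped because only one newline precedes its three spaces, so the first place the whole pattern fits is in front of PLNA.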
You have a whole lot of newlines in there - some literal and some encoded as \n. I'm not clear how you were thinking. Did you think \n matched a number maybe? A \d matches a digit, and will also match many Unicode characters that are digits in other languages. However for simple ASCII text it works fine.
What you need is something like this
use strict;
use warnings;
my $str = "
OTNPKT0553 04-02-03 21:43:46
M X DENY
PLNA
/*Privilege, Login Not Active*/
;";
my $val = $str =~ / (\w+) \s+ ( [\d-]+ \s [\d:]+ ) /x;
print "$1 and $2";
output
OTNPKT0553 and 04-02-03 21:43:46
You have an extra line feed, change the regex to:
$str =~ /[
]*([\n]?[\n]+[\n]?) ([^;^
]+)/s;
and simpler:
$str =~ /\n+ ([^;^\n]+)/s;