Regular expression matching in Perl for the first occurrence of a pattern

I have multiple variables that have strings in the following format:
some_text_here__what__i__want_here__andthen_some
I want to be able to assign to a variable the what__i__want_here portion of the first variable. In other words, everything after the FIRST double underscore. There may be double underscores in the rest of the string but I only want to take the text after the FIRST pair of underscores.
Ex.
If I have $var = "some_text_here__what__i__want_here__andthen_some", I would like to assign to a new variable only the second part like $var2 = "what__i__want_here__andthen_some"
I'm not very good with regular expressions, so I'm not quite sure how to make it take only everything after the first double underscore.

my $text = 'some_text_here__what__i__want_here';
# .*? # Match a minimal number of characters - see "man perlre"
# /s # Make . match also newline - see "man perlre"
my ($var) = $text =~ /^.*?__(.*)$/s;
# $var is not defined when there is no __ in the string
print "var=${var}\n" if defined($var);
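To see why the non-greedy .*? matters, here is a small sketch that runs the pattern from this answer next to a greedy variant on the question's sample string. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $text = 'some_text_here__what__i__want_here__andthen_some';

# Non-greedy prefix: .*? stops at the FIRST "__".
my ($after_first) = $text =~ /^.*?__(.*)$/s;

# Greedy prefix: .* runs to the end, then backtracks only as far as the LAST "__".
my ($after_last) = $text =~ /^.*__(.*)$/s;

print "non-greedy: $after_first\n";   # what__i__want_here__andthen_some
print "greedy:     $after_last\n";    # andthen_some
```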

You might consider this an example of where split's third parameter is useful. The third parameter to split constrains how many elements to return. Here is an example:
my @examples = (
    'some_text_here__what__i_want_here',
    '__keep_this__part',
    'nothing_found_here',
    'nothing_after__',
);
foreach my $string (@examples) {
    my $want = (split /__/, $string, 2)[1];
    print "$string => ", (defined $want ? $want : ''), "\n";
}
The output will look like this:
some_text_here__what__i_want_here => what__i_want_here
__keep_this__part => keep_this__part
nothing_found_here =>
nothing_after__ =>
This line is a little dense:
my $want = (split /__/, $string, 2)[1];
Let's break that down:
my ($prefix, $want) = split /__/, $string, 2;
The 2 parameter tells split that no matter how many times the pattern /__/ could match, we only want to split one time, the first time it's found. So as another example:
my (@parts) = split /#/, "foo#bar#baz#buzz", 3;
The @parts array will receive these elements: 'foo', 'bar', 'baz#buzz', because we told it to stop splitting after the second split, so that we get a total maximum of three elements in our result.
Back to your case, we set 2 as the maximum number of elements. We then go one step further by eliminating the need for my ($throwaway, $want) = .... We can tell Perl we only care about the second element in the list of things returned by split, by providing an index.
my $want = ('a', 'b', 'c', 'd')[2]; # c, the element at offset 2 in the list.
my $want = (split /__/, $string, 2)[1]; # The element at offset 1 in the list
# of two elements returned by split.
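Putting the pieces of this answer together, a minimal runnable sketch. It assumes Perl 5.10 or later for the // operator, used here to supply '' when the slice comes back undef.

```perl
use strict;
use warnings;

my $string = 'some_text_here__what__i__want_here__andthen_some';

# LIMIT 2: split at the first "__" only, so at most two fields come back.
my ($prefix, $want) = split /__/, $string, 2;

# List slice: keep only the field at offset 1. When there is no "__",
# the slice yields undef and // substitutes an empty string.
my $only_want = (split /__/, $string, 2)[1] // '';
my $missing   = (split /__/, 'nothing_found_here', 2)[1] // '';

# LIMIT 3: at most three fields; everything after the second "#" stays together.
my @parts = split /#/, "foo#bar#baz#buzz", 3;

print "prefix=$prefix want=$want\n";
print "only_want=$only_want missing='$missing'\n";
print join(', ', @parts), "\n";   # foo, bar, baz#buzz
```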

You can use parentheses to capture parts of the string and then reorder them: the first set of parentheses () becomes $1 in the replacement part of the substitution, the second becomes $2, and so on.
my $string = "some_text_here__what__i__want_here";
(my $newstring = $string) =~ s/(some_text_here)(__)(what__i__want_here)/$3$2$1/;
print $newstring;
OUTPUT
what__i__want_here__some_text_here
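The substitution above only works when the exact text some_text_here is known in advance. A more general sketch, assuming the goal is to swap whatever comes before the first __ with everything after it, reuses the non-greedy idea. The /r modifier needs Perl 5.14 or later.

```perl
use strict;
use warnings;

my $string = "some_text_here__what__i__want_here";

# (.*?) stops at the FIRST "__"; (.*) takes the rest. ${1}/${2} keep the
# capture variables clearly separated from the literal "__" in the replacement.
# /r returns a modified copy and leaves $string unchanged.
my $newstring = $string =~ s/^(.*?)__(.*)$/${2}__${1}/sr;

print "$newstring\n";   # what__i__want_here__some_text_here
```

If the string contains no __, the pattern does not match and /r simply returns an unchanged copy.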

Related

Regular expression to print a string from a command output

I have written a function that uses regex and prints the required string from a command output.
The script works as expected, but it does not support dynamic output. Currently, I use a regex for "icmp" and "OK" and print the values. However, the type, destination and return code could change, and there is a high chance that the command doesn't return any output at all. How do I handle such scenarios?
sub check_summary{
my ($self) = @_;
my $type = 0;
my $return_type = 0;
my $ipsla = $self->{'ssh_obj'}->exec('show ip sla');
foreach my $line( $ipsla) {
if ( $line =~ m/(icmp)/ ) {
$type = $1;
}
if ( $line =~ m/(OK)/ ) {
$return_type = $1;
}
}
INFO ($type,$return_type);
}
Command output:
PSLAs Latest Operation Summary
Codes: * active, ^ inactive, ~ pending
ID Type Destination Stats Return Last
(ms) Code Run
-----------------------------------------------------------------------
*1 icmp 192.168.25.14 RTT=1 OK 1 second ago
Update after some clarifications: we need only the last line.
As is often the case, you don't need a regex to parse the output as shown. You have space-separated fields and can just split the line and pick the elements you need.
We are told that the line of interest is the last line of the command output. Then we don't need the loop but can take the last element of the array with lines. It is still unclear how $ipsla contains the output -- as a multi-line string or perhaps as an arrayref. Since it is output of a command I'll treat it as a multi-line string, akin to what qx returns. Then, instead of the foreach loop
my @lines = split /\n/, $ipsla;   # if $ipsla is a multi-line string
# my @lines = @$ipsla;            # if $ipsla is an arrayref
pop @lines while @lines and $lines[-1] !~ /\S/;   # remove possible empty lines at end
my ($type, $return_type) = @lines ? (split ' ', $lines[-1])[1,4] : ();
Here are some comments on the code. Let me know if more is needed.
We can see in the shown output that the fields up to what we need have no spaces. So we can split the last line on whitespace, by split ' ', $lines[-1], and take the 2nd and 5th elements (indices 1 and 4), by ( ... )[1,4]. These are our two needed values and we assign them. If the command returned no output at all, @lines is empty and both variables stay undef, so check them with defined before using them.
Just in case the output ends with empty lines, we first remove them: pop @lines runs as long as there are lines left and the last one has no non-space characters, while @lines and $lines[-1] !~ /\S/. The @lines check also stops the loop from spinning forever on an empty array. That is the same as
while ( @lines and $lines[-1] !~ /\S/ ) { pop @lines }
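A self-contained sketch of the last-line approach, assuming the output arrives as one multi-line string. parse_last_line is a hypothetical helper name, and the sample text is the output shown in the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original code): returns the Type and
# Return Code columns from the last non-blank line of the command output,
# or an empty list when there is nothing to parse.
sub parse_last_line {
    my ($output) = @_;
    return unless defined $output;                    # command gave no output
    my @lines = split /\n/, $output;
    pop @lines while @lines and $lines[-1] !~ /\S/;   # drop trailing blank lines
    return unless @lines;
    my ($type, $return_type) = (split ' ', $lines[-1])[1, 4];
    return ($type, $return_type);
}

# Sample output copied from the question, with a trailing blank line added.
my $sample = <<'END';
PSLAs Latest Operation Summary
Codes: * active, ^ inactive, ~ pending
ID Type Destination Stats Return Last
(ms) Code Run
-----------------------------------------------------------------------
*1 icmp 192.168.25.14 RTT=1 OK 1 second ago

END

my ($type, $return_type) = parse_last_line($sample);
print "type: $type, return code: $return_type\n";   # type: icmp, return code: OK

my @nothing = parse_last_line('');
print scalar(@nothing), " values from empty output\n";   # 0 values from empty output
```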
Original version, edited for clarifications. It is also a valid way to do what is needed.
I assume that data starts after the line with only dashes. Set a flag once that line is reached, process the line(s) if the flag is set. Given the rest of your code, the loop
my $data_start;
foreach (@lines)
{
if (not $data_start) {
$data_start = 1 if /^\s* -+ \s*$/x; # only dashes and optional spaces
}
else {
my ($type, $return_type) = (split)[1,4];
print "type: $type, return code: $return_type\n";
}
}
This is a sketch until clarifications come. It also assumes that there are more lines than one.
I'm not sure of all possibilities of output from that command so my regular expression may need tweaking.
I assume the goal is to get the values of all columns in variables. I opted to store values in a hash using the column names as the hash keys. I printed the results for debugging / demonstration purposes.
use strict;
use warnings;
sub check_summary {
my ($self) = @_;
my %results = map { ($_,undef) } qw(Code ID Type Destination Stats Return_Code Last_Run); # Put results in hash, use column names for keys, set values to undef.
my $ipsla = $self->{ssh_obj}->exec('show ip sla');
foreach my $line (@$ipsla) {
chomp $line; # Remove newlines from last field
if($line =~ /^([*^~])([0-9]+)\s+([a-z]+)\s+([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s+([[:alnum:]=]+)\s+([A-Z]+)\s+([^\s].*)$/) {
$results{Code} = $1; # Code prefixing ID
$results{ID} = $2;
$results{Type} = $3;
$results{Destination} = $4;
$results{Stats} = $5;
$results{Return_Code} = $6;
$results{Last_Run} = $7;
}
}
# Testing
use Data::Dumper;
print Dumper(\%results);
}
# Demonstrate: pass an object whose {ssh_obj}->exec('show ip sla') returns an arrayref of lines
# check_summary($obj);
# Commented for testing
#INFO ($type,$return_type);
Worked on the submitted test line.
EDIT:
Regular expressions allow you to specify patterns instead of the exact text you are attempting to match. This is powerful but complicated at times. You need to read the Perl Regular Expression documentation to really learn them.
Perl regular expressions also allow you to capture the matched text. This can be done multiple times in a single pattern, which is how we were able to capture all the columns with one expression. The matches go into the numbered variables $1, $2, and so on, one per capturing group, numbered by the position of the group's opening parenthesis.
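As a variation, the captures can go straight into the %results hash: a successful match in list context returns ($1, $2, ...), which a hash slice can take in one assignment. This sketch runs the answer's pattern (rewritten with /x so it can be commented, with [^\s] written as the equivalent \S) against the sample line from the question. @columns and $row_re are illustrative names.

```perl
use strict;
use warnings;

# Column names in the order of the capture groups in the pattern below.
my @columns = qw(Code ID Type Destination Stats Return_Code Last_Run);

my $row_re = qr{
    ^ ([*^~]) ([0-9]+)                    # Code prefix and ID, e.g. "*1"
    \s+ ([a-z]+)                          # Type, e.g. "icmp"
    \s+ ([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)  # Destination IPv4 address
    \s+ ([[:alnum:]=]+)                   # Stats, e.g. "RTT=1"
    \s+ ([A-Z]+)                          # Return Code, e.g. "OK"
    \s+ (\S.*) $                          # Last Run, rest of the line
}x;

my $line = '*1 icmp 192.168.25.14 RTT=1 OK 1 second ago';

my %results;
# In list context a successful match returns the captures in order,
# so they can be assigned to a hash slice instead of copying each one by hand.
if (my @captures = $line =~ $row_re) {
    @results{@columns} = @captures;
}

print "$_: $results{$_}\n" for @columns;
```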

How to grok (and modify) this Perl statement

I am new to Perl.
How do I interpret this Perl statement?:
my( $foo, $bar ) = split /\s+/, $foobar, 2;
I know that the variables are being assigned simultaneously from the result of the split function, but I don't understand what the integer 2 is for. I'm guessing the function will return an array with two elements?
Can a Perl monger explain the statement above to me (ELI5)?
Also, on occasion, the string being split does not contain the expected tokens, resulting in either $foo or $bar being uninitialized and thus causing a warning when an attempt is made to use them further on in the code.
How do I initialize $foo and $bar to sensible values (null strings) in case the split "fails" to return two strings?
The split function takes up to three arguments:
A regex that matches separators, or the special value " " (a string consisting of a single space), which removes leading whitespace and then splits on runs of whitespace like /\s+/.
A string that shall be split.
A maximum number of resulting fragments. Sometimes this is an optimization when you aren't interested in all fields, and sometimes you don't want to split at each separator, as is the case here.
So your split expression will return at most two fields, but not necessarily exactly two. Assigning default values before the split does not help, because in a list assignment any variables left over on the left-hand side are set to undef. Instead, check whether they are undef after the split and give the default:
my ($foo, $bar) = split ...;
$_ //= '' for $foo, $bar;
The //= operator (Perl 5.10 and later) assigns the value on the RHS if the LHS is undef. The for loop is just a way to shorten the code: $_ is aliased to each variable in turn.
Alternatively, pad the list that split returns; surplus values are simply discarded:
my ($foo, $bar) = (split(/\s+/, $foobar, 2), '', '');
You may also want to carry on with a piece of code only when exactly two fields were produced:
if ( 2 == (my ($foo, $bar) = split ...) ) {
say "foo = $foo";
say "bar = $bar";
} else {
warn "could not split!";
}
List assignment in scalar context evaluates to the number of elements on the right-hand side of the assignment, here the number of fields split produced.
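A quick check of the defaulting options, using a hypothetical input with no whitespace in it: defaults assigned before a list assignment are overwritten with undef, while //= after the split (Perl 5.10+) or padding the list returned by split leaves empty strings.

```perl
use strict;
use warnings;

my $foobar = 'no_whitespace_here';    # hypothetical input that split cannot cut in two

# Defaults assigned beforehand do not survive: leftover lvalues are set to undef.
my ($foo, $bar) = ('', '');
($foo, $bar) = split /\s+/, $foobar, 2;
print defined($bar) ? "bar kept its default\n" : "bar is undef\n";   # bar is undef

# Supplying defaults after the split works.
my ($first, $rest) = split /\s+/, $foobar, 2;
$_ //= '' for $first, $rest;
print "first='$first' rest='$rest'\n";   # first='no_whitespace_here' rest=''

# Padding the list returned by split also works; surplus values are discarded.
my ($p, $q) = (split(/\s+/, $foobar, 2), '', '');
print "p='$p' q='$q'\n";                 # p='no_whitespace_here' q=''
```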
The 2 is the maximum number of components returned by split.
Thus, the regexp /\s+/ splits $foobar on clumps of whitespace, but will only split once, to make two components. If there is no whitespace, then $bar will be undefined.
See http://perldoc.perl.org/functions/split.html
In addition to amon's method, Perl has a defined(x) function that returns true or false depending on whether its argument x is defined or undefined, and this can be used in an if statement to correct cases where something is undefined.
See http://perldoc.perl.org/functions/defined.html
As stated here: http://perldoc.perl.org/functions/split.html
"If LIMIT is specified and positive, it represents the maximum number of fields into which the EXPR may be split;"
For example:
#!/opt/local/bin/perl
my $foobar = "A B C D";
my( $foo, $bar ) = split /\s+/, $foobar, 2;
print "\nfoo=$foo";
print "\nbar=$bar";
print "\n";
output:
foo=A
bar=B C D
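The special ' ' pattern mentioned in the first answer behaves differently from /\s+/ when the string starts with whitespace. A small sketch with a hypothetical input:

```perl
use strict;
use warnings;

my $foobar = "  A B C D";    # hypothetical input with leading spaces

# /\s+/ also matches the leading spaces, producing an empty first field.
my ($foo1, $bar1) = split /\s+/, $foobar, 2;

# The special pattern ' ' skips leading whitespace before splitting.
my ($foo2, $bar2) = split ' ', $foobar, 2;

print "regex: foo='$foo1' bar='$bar1'\n";   # regex: foo='' bar='A B C D'
print "space: foo='$foo2' bar='$bar2'\n";   # space: foo='A' bar='B C D'
```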

Concatenating strings from a multidimensional array overwrites the target string in Perl

I've built a two-dimensional array with string values. There are always 12 columns, but the number of rows varies. Now I'd like to build a string from each row, but when I run the following code:
$outstring = "";
for ($i=0; $i < $ctrLASTROW + 1; $i++) {
for ($k=0; $k < 12; $k++){
$datastring = $DATATABLE[$i][$k];
$outstring .= $datastring;
}
}
$outstring takes the first value. Then, on the second and subsequent passes of the inner loop, the value in $outstring gets overlaid. For example, the first value is "DATE" and the next value fed to it is "ABC". Rather than the hoped-for "DATEABC", the result is "ABCE". The "E" is the fourth character of DATE. I figure I'm missing some scalar/list issue, but I've tried who knows how many variations to no avail. When I first started, I tried the concatenation directly from @DATATABLE. Same problem, only quicker.
When you have a problem such as two strings DATE and ABC being concatenated and the end result being ABCE, or one of the strings overwriting the other, a likely scenario is that the data comes from a file created on another OS, with \r\n line endings. chomp removes only the \n, so the concatenation produces the string DATE\rABC, which is displayed as ABCE because the \r moves the cursor back to the start of the line before ABC is printed.
In other words:
my $foo = "DATE\r\n";
my $bar = "ABC\r\n"; # \r\n line endings from file
chomp($foo, $bar); # removes \n but leaves \r
print $foo . $bar; # prints ABCE
To confirm, use
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper $DATATABLE[$i][$k];   # e.g. $VAR1 = "DATE\r";
To resolve, instead of chomp use a regex such as:
$foo =~ s/[\r\n]+\z//;
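A self-contained sketch that reproduces the effect and the fix, assuming the cells were read from a file with \r\n line endings (the @cells values are made up for illustration):

```perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Useqq = 1;   # print \r and \n as escapes so they become visible

# Hypothetical cells as they would look after reading a file with \r\n endings.
my @cells = ("DATE\r\n", "ABC\r\n");

my ($broken, $fixed) = ('', '');
for my $cell (@cells) {
    my $chomped = $cell;
    chomp $chomped;                          # removes "\n" only; "\r" stays
    $broken .= $chomped;

    (my $clean = $cell) =~ s/[\r\n]+\z//;    # removes any trailing CR/LF mix
    $fixed .= $clean;
}

print Dumper($broken);   # $VAR1 = "DATE\rABC\r";
print "$fixed\n";        # DATEABC
```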

Convert a string into a hash in Perl using split()

$hashdef = "Mouse=>Jerry, Cat=>Tom, Dog=>Spike";
%hash = split /,|=>/, $hashdef;
print "$_=>$hash{$_}" foreach(keys %hash);
Mouse=>JerryDog=>SpikeCat=>Tom
I am new to Perl. Can anyone explain the regular expression inside the split function? I understand that | is used to choose between alternatives, but I am still confused.
%hash = split /|=>/, $hashdef;
I get the output
S=>pe=>J=>eT=>or=>rm=>,y=>,u=>sM=>og=>D=>oC=>ai=>kt
%hash = split /,/, $hashdef;
Mouse=>Jerry=>Cat=>TomDog=>Spike=>
Please explain the behavior above.
split's first argument defines what separates the elements you want.
/,|=>/ matches a comma (,) or an equals sign followed by a greater-than sign (=>). They're just literals here, there's nothing special about them.
/|=>/ matches the zero-length string or an equals sign followed by a greater-than sign, and splitting on a zero-length string just splits a string up into individual characters; therefore, in your hash, M will map to o, u will map to s, etc. They appear jumbled up in your output because hashes don't have a definite ordering.
/,/ just splits on a comma. You're creating a hash that maps Mouse=>Jerry to Cat=>Tom and Dog=>Spike to nothing.
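One more detail: splitting on /,|=>/ keeps the space that follows each comma, so two of the keys are really " Cat" and " Dog". A sketch that also absorbs surrounding whitespace, assuming spaces are never part of a name:

```perl
use strict;
use warnings;

my $hashdef = "Mouse=>Jerry, Cat=>Tom, Dog=>Spike";

# Splitting on /,|=>/ leaves the space after each comma in the keys.
my %raw = split /,|=>/, $hashdef;

# Allowing optional whitespace around either separator gives clean keys and values.
my %hash = split /\s*(?:,|=>)\s*/, $hashdef;

print join(', ', map { "'$_'" } sort keys %raw), "\n";             # ' Cat', ' Dog', 'Mouse'
print join(', ', map { "$_=>$hash{$_}" } sort keys %hash), "\n";   # Cat=>Tom, Dog=>Spike, Mouse=>Jerry
```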
$hashdef = "Mouse=>Jerry, Cat=>Tom, Dog=>Spike";
my %hash = eval( "( $hashdef )" );
print $hash{'Mouse'}."\n";
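eval executes a string as Perl code. This doesn't use split, but I think it would be a good way to handle the case outlined in your post of getting a hash from your string, since your string happens to be well-formed Perl, so I've added it here. Note that Jerry, Tom and Spike are barewords in that code, so the eval fails under use strict, and a string eval runs arbitrary code, so never use it on input you don't trust.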
sub hash2string {
my $href = $_[0];
my $hstring = "";
foreach (keys %{$href}) {
$hstring .= "$_=>$href->{$_}, ";
}
return substr($hstring, 0, -2);
}
sub string2hash {
my %lhash;
my @lelements = split(/, /, $_[0]);
foreach (@lelements) {
my ($skey,$svalue) = split(/=>/, $_);
$lhash{$skey} = $svalue;
}
return %lhash;
}

How can I split a Perl string only on the last occurrence of the separator?

my $str="1:2:3:4:5";
my ($a,$b)=split(':',$str,2);
In the above code I have used a limit of 2, so $a will contain 1 and the remaining elements will be in $b.
Similarly, I want the last element in one variable and all the elements before it in another variable.
Example
$str = "1:2:3:4:5" ;
# $a should have "1:2:3:4" and $b should have "5"
$str = "2:3:4:5:3:2:5:5:3:2";
# $a should have "2:3:4:5:3:2:5:5:3" and $b should have "2"
split(/:([^:]+)$/, $str)
You could use pattern matching instead of split():
my ($a, $b) = $str =~ /(.*):(.*)/;
The first group greedily captures everything up to the last occurrence of ':', and the second group captures the rest.
In case the ':' is not present in the string, Perl is clever enough to detect that and fail the match without any backtracking.
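A small sketch wrapping that match in a helper and trying it on a string with no ':' at all. split_last_colon is a hypothetical name, and $head/$tail are used instead of $a/$b because those two are the special variables used by sort.

```perl
use strict;
use warnings;

# Greedy (.*) first runs to the end of the string, then backtracks to the
# LAST ':'; the second group takes whatever follows it.
sub split_last_colon {
    my ($str) = @_;
    my ($head, $tail) = $str =~ /\A(.*):(.*)\z/s;
    return ($head, $tail);    # both undef when $str contains no ':'
}

for my $str ('1:2:3:4:5', '2:3:4:5:3:2:5:5:3:2', 'no_colon') {
    my ($head, $tail) = split_last_colon($str);
    if (defined $head) {
        print "$str -> '$head' and '$tail'\n";
    } else {
        print "$str -> no ':' found\n";
    }
}
```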
You can also use rindex(), e.g.:
my $str="1:2:3:4:5";
$i=rindex($str,":");   # -1 when $str contains no ":", so check for that before calling substr
$a=substr($str,0,$i);
$b=substr($str,$i+1);
print "\$a:$a, \$b: $b\n";
output
$ perl perl.pl
$a:1:2:3:4, $b: 5
I know this question is 4 years old, but I found the split-based answer above very interesting, as I didn't know split could work like that. So I want to expand on it with an extract from the perldoc for split that explains this behavior, for the sake of new readers. :-)
my $str = "1:2:3:4:5";
my ($a, $b) = split /:([^:]+)$/, $str;
# Capturing everything after ':' that is not ':' and until the end of the string
# Now $a = '1:2:3:4' and $b = '5';
From Perldoc:
If the PATTERN contains capturing groups, then for each separator, an additional field is produced for each substring captured by a group (in the order in which the groups are specified, as per backreferences); if any group does not match, then it captures the undef value instead of a substring. Also, note that any such additional field is produced whenever there is a separator (that is, whenever a split occurs), and such an additional field does not count towards the LIMIT. Consider the following expressions evaluated in list context (each returned list is provided in the associated comment):
split(/-|,/, "1-10,20", 3)
# ('1', '10', '20')
split(/(-|,)/, "1-10,20", 3)
# ('1', '-', '10', ',', '20')
split(/-|(,)/, "1-10,20", 3)
# ('1', undef, '10', ',', '20')
split(/(-)|,/, "1-10,20", 3)
# ('1', '-', '10', undef, '20')
split(/(-)|(,)/, "1-10,20", 3)
# ('1', '-', undef, '10', undef, ',', '20')
You can do it using split and reverse as follows:
my $str="1:2:3:4:5";
my ($a,$b)=split(':',reverse($str),2); # reverse and split.
$a = reverse($a); # reverse each piece.
$b = reverse($b);
($a,$b) = ($b,$a); # swap a and b
Now $a will be 1:2:3:4 and $b will be 5.
A much simpler and cleaner way is to use a regex, as Mark has done in his answer.
I'm a bit late to this question, but I put together a more generic solution:
# Similar to split() except pattern is applied backwards from the end of the string
# The only exception is that the pattern must be a precompiled regex (i.e. qr/pattern/)
# Example:
# rsplit(qr/:/, 'John:Smith:123:ABC', 3) => ('John:Smith', '123', 'ABC')
sub rsplit {
my $pattern = shift(@_);      # Precompiled regex pattern (i.e. qr/pattern/)
my $expr    = shift(@_);      # String to split
my $limit   = shift(@_) // 0; # Number of chunks to split into; 0 (the default) means no limit, as in split()
# 1) Reverse the input string
# 2) split() it
# 3) Reverse split()'s result array element order
# 4) Reverse each string within the result array
map { scalar reverse($_) } reverse split(/$pattern/, scalar reverse($expr), $limit);
}
It accepts arguments similar to split() except that the splitting is done in reverse order. It also accepts a limit clause in case you need a specified number of result elements.
Note: this subroutine expects a precompiled regex as the first parameter.
Perl's split is a built-in and will interpret /pat/ correctly, but attempting to pass /pat/ to a subroutine will be treated as sub($_ =~ /pat/).
This subroutine is not bulletproof! It works well enough for simple delimiters but more complicated patterns can cause issues. The pattern itself cannot be reversed, only the expression it matches against.
Examples:
rsplit(qr/:/, 'One:Two:Three', 2); # => ('One:Two', 'Three')
rsplit(qr/:+/, 'One:Two::Three:::Four', 3); # => ('One:Two', 'Three', 'Four')
# Discards leading blank elements just like split() discards trailing blanks
rsplit(qr/:/, ':::foo:bar:baz'); # => ('foo', 'bar', 'baz')