How to get the required strings from a text using Perl

Here is the text to trim:
/home/netgear/Desktop/WGET-1.13/wget-1.13/src/cmpt.c:388,error,resourceLeak,Resource leak: fr
From the above text I need to get the data after the first ":". How do I get 388,error,resourceLeak,Resource leak: fr?

You can use split to separate a string into a list based on a delimiter. In your case the delimiter is a colon (':'):
my @parts = split ':', $text;
As the text you want to extract can also contain a :, use the limit argument to stop after the first one:
my @parts = split ':', $text, 2;
$parts[1] will then contain the text you wanted to extract. You could also pass the result into a list, discarding the first element:
my (undef, $extract) = split ':', $text, 2;
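As a minimal, runnable sketch of the limited split applied to the line from the question (the helper name after_first_colon is made up for illustration and is not part of the original answer):

```perl
use strict;
use warnings;

# Return everything after the first ':' in a line, or undef if there is none.
# after_first_colon is a hypothetical helper name used only for this sketch.
sub after_first_colon {
    my ($line) = @_;
    # The limit of 2 stops splitting after the first ':' is consumed.
    my (undef, $rest) = split /:/, $line, 2;
    return $rest;
}

my $text = '/home/netgear/Desktop/WGET-1.13/wget-1.13/src/cmpt.c:388,error,resourceLeak,Resource leak: fr';
my $extract = after_first_colon($text);
print defined $extract ? "$extract\n" : "no ':' found\n";
```

Returning undef when no ':' is present lets the caller tell "nothing after the colon" apart from "no colon at all".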

Aside from @RobEarl's suggestion of using split, you could use a regular expression to do this.
my ($match) = $text =~ /^[^:]+:(.*?)$/;
Regular expression:
^ the beginning of the string
[^:]+ any character except: ':' (1 or more times)
: match ':'
( group and capture to \1:
.*? any character except \n (0 or more times)
) end of \1
$ before an optional \n, and the end of the string
$match will now hold the result of capture group #1:
388,error,resourceLeak,Resource leak: fr

Related

Regular Expression Matching Perl for first case of pattern

I have multiple variables that have strings in the following format:
some_text_here__what__i__want_here__andthen_some 
I want to be able to assign to a variable the what__i__want_here portion of the first variable. In other words, everything after the FIRST double underscore. There may be double underscores in the rest of the string but I only want to take the text after the FIRST pair of underscores.
Ex.
If I have $var = "some_text_here__what__i__want_here__andthen_some", I would like to assign to a new variable only the second part like $var2 = "what__i__want_here__andthen_some"
I'm not very good at matching so I'm not quite sure how to do it so it just takes everything after the first double underscore.
my $text = 'some_text_here__what__i__want_here';
# .*? # Match a minimal number of characters - see "man perlre"
# /s # Make . match also newline - see "man perlre"
my ($var) = $text =~ /^.*?__(.*)$/s;
# $var is not defined when there is no __ in the string
print "var=${var}\n" if defined($var);
You might consider this an example of where split's third parameter is useful. The third parameter to split constrains how many elements to return. Here is an example:
my @examples = (
'some_text_here__what__i_want_here',
'__keep_this__part',
'nothing_found_here',
'nothing_after__',
);
foreach my $string (@examples) {
my $want = (split /__/, $string, 2)[1];
print "$string => ", (defined $want ? $want : ''), "\n";
}
The output will look like this:
some_text_here__what__i_want_here => what__i_want_here
__keep_this__part => keep_this__part
nothing_found_here =>
nothing_after__ =>
This line is a little dense:
my $want = (split /__/, $string, 2)[1];
Let's break that down:
my ($prefix, $want) = split /__/, $string, 2;
The 2 parameter tells split that no matter how many times the pattern /__/ could match, we only want to split one time, the first time it's found. So as another example:
my (@parts) = split /#/, "foo#bar#baz#buzz", 3;
The @parts array will receive these elements: 'foo', 'bar', 'baz#buzz', because we told it to stop splitting after the second split, so that we get a total maximum of three elements in our result.
Back to your case, we set 2 as the maximum number of elements. We then go one step further by eliminating the need for my ($throwaway, $want) = .... We can tell Perl we only care about the second element in the list of things returned by split, by providing an index.
my $want = ('a', 'b', 'c', 'd')[2]; # c, the element at offset 2 in the list.
my $want = (split /__/, $string, 2)[1]; # The element at offset 1 in the list
# of two elements returned by split.
You can use parentheses to capture parts of the string and then reorder them; the first set of parentheses () is $1 in the replacement part of the substitution, the second is $2, and so on:
my $string = "some_text_here__what__i__want_here";
(my $newstring = $string) =~ s/(some_text_here)(__)(what__i__want_here)/$3$2$1/;
print $newstring;
OUTPUT
what__i__want_here__some_text_here

How to move a substring with preg_replace or preg_match in PHP?

I want to find a substring and move it in the string instead of replacing (for example, moving it from the beginning to the end of the string).
'THIS the rest of the string' -> 'the rest of the string THIS'
I do this by the following code
preg_match('/^(THIS).?/', $str, $match);
$str = trim( $str . $match[1] );
$str = preg_replace('/^(THIS).?/', '', $str);
There should be an easier way to do this with one regex.
You may use
$re = '/^(THIS)\b\s*(.*)/s';
$str = 'THIS the rest of the string';
$result = preg_replace($re, '$2 $1', $str);
Details
^ - start of string
(THIS) - Group 1 (referenced to with $1 from the replacement pattern): THIS
\b - a word boundary (if you do not need a whole word, you may remove it)
\s* - 0+ whitespaces (if there is always at least one whitespace, use \s+ and remove \b, as it will become redundant)
(.*) - Group 2 (referenced to with $2 from the replacement pattern): the rest of the string (s modifier allows . match line break chars, too).

Perl, Split string by specific pattern

I found how to split a string by whitespace, but that only takes a single character into account. In my case, I have comments pasted into a file that include newlines and whitespace. I have them separated by this string: [|]
So I need to split my $string into an array for example, where $string =
This is a comment.
This is a newline.
This is the end[|]This is second comment.
This is second newline.
[|]Last comment
Gets split into $array[0], $array[1], and $array[2] which include the newlines and whitespaces. Separated by [|]
Every example I find on the web uses a single character, such as a space or newline, to split strings. In my case I have to use a more specific identifier, which is why I selected [|], but I am having trouble splitting by it.
I have tried to limit it to parse by a single '|' character with this code:
my @words = split /|/, $string;
foreach my $thisline (@words) {
print "This line = '" . $thisline . "'\n";
}
But this seems to split the entire string, character-by-character, into @words.
[, |, and ] are all special characters in regular expressions -- | is used to separate options, and […] are used to specify character sets. Using an unquoted | makes the expression match the empty string (more specifically: the empty string or the empty string), causing it to match and split on every character boundary. These characters must be escaped to use them literally in an expression:
my @words = split /\[\|\]/, $string;
Since all the slashes and backslashes make this visually confusing, you should probably use m{} quotes instead of //, and \Q…\E to quote a range of characters instead of a separate backslash for each one. (This is functionally identical, it's just a little easier to read.)
my @words = split m{\Q[|]\E}, $string;
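If the separator arrives in a variable rather than being typed into the pattern, the built-in quotemeta function does the same escaping as \Q…\E. The following is only a sketch using the sample text from the question; split_on_literal is a hypothetical helper name.

```perl
use strict;
use warnings;

# Split text on a literal multi-character separator.
# split_on_literal is a hypothetical helper name used only for this sketch.
sub split_on_literal {
    my ($separator, $string) = @_;
    my $pattern = quotemeta $separator;   # '[|]' becomes '\[\|\]'
    return split /$pattern/, $string;
}

my $string = "This is a comment.\nThis is a newline.\nThis is the end[|]"
           . "This is second comment.\nThis is second newline.\n[|]Last comment";

my @comments = split_on_literal('[|]', $string);

# Each element keeps its embedded newlines; only the [|] markers are removed.
printf "%d: '%s'\n", $_, $comments[$_] for 0 .. $#comments;
```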

Split Variable on white space [duplicate]

I'm trying to split a string into an array, with the split occurring at the white space. Each block of text is separated by a variable number of spaces.
Here is the string:
NUM8 host01 1,099,849,993 1,099,849,992 1
I have tried the following without success.
my @array1 = split / /, $VAR1;
my @array1 = split / +/, $VAR1;
my @array1 = split /\s/, $VAR1;
my @array1 = split /\s+/, $VAR1;
I'd like to end up with:
$array1[0] = NUM8
$array1[1] = host01
$array1[2] = 1,099,849,993
$array1[3] = 1,099,849,992
$array1[4] = 1
What is the best way to split this?
If the first argument to split is the string ' ' (a single space), it is treated specially: it matches a run of whitespace of any length:
my @array1 = split ' ', $VAR1;
(BTW, it is almost equivalent to your last option, but it also removes any leading whitespace.)
Just try using:
my @array1 = split(' ', $VAR1);
From Perldoc:
As another special case, split emulates the default behavior of the
command line tool awk when the PATTERN is either omitted or a literal
string composed of a single space character (such as ' ' or "\x20" ,
but not e.g. / / ). In this case, any leading whitespace in EXPR is
removed before splitting occur
\s+ matches one or more whitespace characters, so this splits on each run of whitespace:
my @array1 = split /\s+/, $VAR1;
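The difference between the two forms only shows up when the string starts with whitespace. A small sketch, using a made-up input line with leading spaces (the exact spacing is an assumption, since the original spacing is not preserved in the question):

```perl
use strict;
use warnings;

# Sample line with leading whitespace and runs of spaces (made up for this sketch).
my $line = "   NUM8   host01    1,099,849,993  1,099,849,992   1";

my @awk_style = split ' ',   $line;   # leading whitespace is dropped first
my @regex     = split /\s+/, $line;   # leading whitespace yields an empty first field

print scalar(@awk_style), " fields with ' '\n";
print scalar(@regex),     " fields with /\\s+/\n";
print "first field with /\\s+/ is empty\n" if $regex[0] eq '';
```

With input that has no leading whitespace, both forms return the same list.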

Perl Text processing on a variable before its usage

I wrote a Perl script which outputs a list containing entries like the one below:
$var = ' whatever'
$var contains: a single quote, a space, the word whatever, and a single quote.
This is actually the key of a hash, and I want to pull the value for it. But because of the single quotes and the space in between, I am not able to look up the value.
So I want to strip $var as below:
$var = whatever
meaning remove the leading single quote, the space, and the trailing single quote,
so that I can use $var as a hash key to pull the respective value.
Could you guide me to a Perl one-liner for this?
Thanks.
There are several ways to do it, but beware: modifying the keys of a hash can have unwanted results, because distinct keys may collapse into one. For example:
use strict;
use warnings;
use Data::Dumper;
my $src = {
"a a" => 1,
" a a " => 2,
"' a a '" => 3,
};
print "src: ", Dumper($src);
my $trg;
@$trg{ map { s/^[\s']*(.*?)[\s']*$/$1/; $_ } keys %$src } = values %$src;
print "copy: ", Dumper($trg);
will produce:
src: $VAR1 = {
' a a ' => 2,
'\' a a \'' => 3,
'a a' => 1
};
copy: $VAR1 = {
'a a' => 1
};
Any regex can be explained with the YAPE::Regex::Explain module (from CPAN). For the above regex:
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new( qr(^[\s']*(.*?)[\s']*$) )->explain;
will produce:
The regular expression:
(?-imsx:^[\s']*(.*?)[\s']*$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
[\s']* any character of: whitespace (\n, \r, \t,
\f, and " "), ''' (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
[\s']* any character of: whitespace (\n, \r, \t,
\f, and " "), ''' (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
In short, s/^[\s']*(.*?)[\s']*$/$1/; means:
at the beginning of the string, match whitespace or apostrophes as many times as possible,
then match anything,
then match whitespace or apostrophes at the end of the string as many times as possible,
and keep only the "anything" part.
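For the lookup described in the question, one option is to clean a copy of the string and leave the hash itself untouched, which avoids the key collisions shown above. This is only a sketch: clean_key, %value_for, and the value 42 are made up for illustration.

```perl
use strict;
use warnings;

# Strip leading/trailing whitespace and single quotes, returning a cleaned copy.
# clean_key and %value_for are hypothetical names used only for this sketch.
sub clean_key {
    my ($raw) = @_;
    (my $clean = $raw) =~ s/^[\s']*(.*?)[\s']*$/$1/;
    return $clean;
}

my %value_for = ( whatever => 42 );   # made-up data
my $var = "' whatever'";

my $key = clean_key($var);
print exists $value_for{$key}
    ? "$key => $value_for{$key}\n"
    : "no entry for '$key'\n";
```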
#!/usr/bin/perl
$string = "' my string'";
print $string . "\n";
$string =~ s/'//g;
$string =~ s/^ //g;
print $string;
Output
' my string'
my string
$var =~ tr/ '//d;
see: tr operator
or, by regex
$var =~ s/(?:^['\s]+)|'//g;
The latter will keep the spaces in the middle of the word, the former removes all spaces and single quotes.
A short test:
...
$var = q{' what ever'};
$var =~ s/
(?: # find the following group
^ # at string begin, followed by
['\s]+ # space or single quote, one or more
) # close group
| # OR
' # single quotes in the whole string
//gx ; # replace by nothing, use formatted regex (x)
print "|$var|\n";
...
prints:
|what ever|
as expected.