Split Variable on white space [duplicate] - perl

I'm trying to split a string into an array, with the split occurring at the whitespace. The blocks of text are separated by a variable number of spaces.
Here is the string:
NUM8 host01 1,099,849,993 1,099,849,992 1
I have tried the following without success.
my @array1 = split / /, $VAR1;
my @array1 = split / +/, $VAR1;
my @array1 = split /\s/, $VAR1;
my @array1 = split /\s+/, $VAR1;
I'd like to end up with:
$array1[0] = NUM8
$array1[1] = host01
$array1[2] = 1,099,849,993
$array1[3] = 1,099,849,992
$array1[4] = 1
What is the best way to split this?

If the first argument to split is the string ' ' (a single space), it is treated specially: it matches a run of whitespace of any length:
my @array1 = split ' ', $VAR1;
(BTW, it is almost equivalent to your last option, but it also removes any leading whitespace.)
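A small sketch of that difference. The input string and variable names below are made up for illustration (the leading spaces are an assumption, not part of the question's data):

```perl
use strict;
use warnings;

# Hypothetical input: leading spaces plus runs of spaces between fields.
my $line = "   NUM8     host01   1,099,849,993   1,099,849,992   1";

# Special case: a literal ' ' skips leading whitespace, then splits on runs of whitespace.
my @awk_style = split ' ', $line;

# An explicit /\s+/ keeps an empty first field when the string starts with whitespace.
my @regex_style = split /\s+/, $line;

print scalar(@awk_style), " fields with ' '\n";        # 5
print scalar(@regex_style), " fields with /\\s+/\n";   # 6
```

If the string never starts with whitespace, the two forms should give the same list.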

Just try using:
my @array1 = split(' ', $VAR1);
From Perldoc:
As another special case, split emulates the default behavior of the
command line tool awk when the PATTERN is either omitted or a literal
string composed of a single space character (such as ' ' or "\x20" ,
but not e.g. / / ). In this case, any leading whitespace in EXPR is
removed before splitting occurs

\s+ matches one or more whitespace characters, so split on that:
my @array1 = split /\s+/, $VAR1;

Related

Perl - Convert integer to text Char(1,2,3,4,5,6)

I am after some help converting the following log entry to plain text.
It is a URL, so there may be %20 (a space) and other escapes, but the main part I am trying to convert is char(1,2,3,4,5,6) to text.
Below is an example of what I am trying to convert:
select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)
What I have tried so far is the following, trying to pass the contents of char(...) to chr($2):
perl -pe "s/(char())/chr($2)/ge"
All this has managed to do is remove the char, but what I want is to convert the numbers to text and remove the commas and brackets.
I may be way off with how I am doing this, as I am fairly new to Perl.
perl -pe "s/word to remove/word to change it to/ge"
"s/(char(what goes in here))/chr($2)/ge"
The output I am trying to achieve is
select -x1-Q-,-x2-Q-,-x3-Q-
Or
select%20-x1-Q-,-x2-Q-,-x3-Q-
Thanks for any help
There's too much to do here for a reasonable one-liner. Also, a script is easier to adjust later:
use warnings;
use strict;
use feature 'say';
use URI::Escape 'uri_unescape';
my $string = q{select%20}
. q{char(45,120,49,45,81,45),char(45,120,50,45,81,45),}
. q{char(45,120,51,45,81,45)};
my $new_string = uri_unescape($string); # convert %20 and such
my @parts = $new_string =~ /(.*?)(char.*)/;
$parts[1] = join ',', map { chr( (/([0-9]+)/)[0] ) } split /,/, $parts[1];
$new_string = join '', @parts;
say $new_string;
this prints
select -x1-Q-,-x2-Q-,-x3-Q-
Comments
Module URI::Escape is used to convert percent-encoded characters, per RFC 3986
It is unspecified whether anything can follow the part with the char(...)s, and what that might be. If there can be more after the last char(...), adjust the splitting into @parts, or clarify
In the part with the char(...)s only the numbers are needed, which is what the regex in map extracts
If you are going to use regex you should read up on it. See
perlretut, a tutorial
perlrequick, a quick-start introduction
perlre, the full account of syntax
perlreref, a quick reference (its See Also section is useful on its own)
Alright, this is going to be a messy "one-liner". Assuming your text is in a variable called $text.
$text =~ s{char\( ( (?: (?:\d+,)* \d+ )? ) \)}{
my @arr = split /,/, $1;
my $temp = join('', map { chr($_) } @arr);
$temp =~ s/^|$/"/g;
$temp
}xeg;
The regular expression matches char(, followed by a comma-separated list of sequences of digits, followed by ). We capture the digits in capture group $1. In the substitution, we split $1 on the comma (since chr only works on one character, not a whole list of them). Then we map chr over each number and concatenate the result into a string. The next line simply puts quotation marks at the start and end of the string (presumably you want the output quoted) and then returns the new string.
Input:
select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)
Output:
select%20"-x1-Q-","-x2-Q-","-x3-Q-"
If you want to replace the % escape sequences as well, I suggest doing that in a separate line. Trying to integrate both substitutions into one statement is going to get very hairy.
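One way to add that separate step, as a sketch: decode the percent escapes first with a plain substitution, then run the char(...) replacement. The %XX pattern is an assumption about the input (exactly two hex digits per escape), and the quoting step is left out here so the result matches the output the question asked for.

```perl
use strict;
use warnings;

my $text = 'select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)';

# Step 1: turn each %XX escape into the character with that hex code.
$text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

# Step 2: replace each char(n,n,...) with the characters those numbers encode.
$text =~ s{char\( ( \d+ (?: , \d+ )* ) \)}{
    join '', map { chr($_) } split /,/, $1
}xge;

print "$text\n";    # select -x1-Q-,-x2-Q-,-x3-Q-
```

Keeping the two substitutions apart means either one can be changed or removed without touching the other.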
This will do as you ask. It performs the decoding in two stages: first the URI-encoding is decoded using chr hex $1, and then each char() function is translated to the string corresponding to the character equivalents of its decimal parameters
use strict;
use warnings 'all';
use feature 'say';
my $s = 'select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)';
$s =~ s/%([0-9A-Fa-f]{2})/ chr hex $1 /eg;   # exactly two hex digits, so escapes like %2C also decode
$s =~ s{ char \s* \( ( [^()]+ ) \) }{ join '', map chr, $1 =~ /\d+/g }xge;
say $s;
output
select -x1-Q-,-x2-Q-,-x3-Q-

issue in matching regexp in perl

I have the following code:
$str = "
OTNPKT0553 04-02-03 21:43:46
M X DENY
PLNA
/*Privilege, Login Not Active*/
;";
$val = $str =~ /[
]*([\n]?[\n]+
[\n]?) ([^;^
]+)/s;
print "$1 and $2";
I get this output:
and PLNA
Why is it getting PLNA as output? I believe it should stop at the first \n. I assumed the output would be OTNPKT0553 04-02-03 21:43:46
Your regex is messy and contains a lot of redundancy. The following steps demonstrate how it can be simplified and then it becomes more clear why it is matching PLNA.
1) Translating the literal new lines in your regex:
$val = $str =~ /[\n\n]*([\n]?[\n]+\n[\n]?)   ([^;^\n]+)/s;
2) Then simplifying this code to remove the redundancy:
$val = $str =~ /(\n{2})   ([^;^\n]+)/s;
So basically, the regex is looking for two new lines followed by 3 spaces.
There are three spaces before OTNPKT0553, but there is only a single new line, so it won't match.
The next three spaces are before PLNA which IS preceded by two new lines, and so matches.
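A sketch that reproduces this. The exact whitespace of the original string is not recoverable here, so the layout below is an assumption chosen to fit the explanation: a single newline before a three-space indented first line, and a blank line before a three-space indented PLNA.

```perl
use strict;
use warnings;

# Assumed layout (a guess that fits the explanation above):
# one newline before "   OTNPKT0553", and a blank line before "   PLNA".
my $str = "\n   OTNPKT0553 04-02-03 21:43:46\nM  X DENY\n\n   PLNA\n   /*Privilege, Login Not Active*/\n;";

# The simplified pattern: two newlines, three spaces, then a run of
# characters that are not ';', '^' or a newline.
my ($newlines, $text) = $str =~ /(\n{2})   ([^;^\n]+)/;

print "matched: $text\n";    # matched: PLNA
```

The first capture holds only the two newlines, which is why the printed "$1" part looks empty in the question's output.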
You have a whole lot of newlines in there - some literal and some encoded as \n. I'm not clear how you were thinking. Did you think \n matched a number maybe? A \d matches a digit, and will also match many Unicode characters that are digits in other languages. However for simple ASCII text it works fine.
What you need is something like this
use strict;
use warnings;
my $str = "
OTNPKT0553 04-02-03 21:43:46
M X DENY
PLNA
/*Privilege, Login Not Active*/
;";
my $val = $str =~ / (\w+) \s+ ( [\d-]+ \s [\d:]+ ) /x;
print "$1 and $2";
output
OTNPKT0553 and 04-02-03 21:43:46
You have an extra line feed. Change the regex to:
$str =~ /[
]*([\n]?[\n]+[\n]?) ([^;^
]+)/s;
and simpler:
$str =~ /\n+ ([^;^\n]+)/s;

How do I find the sum of all numbers in STDIN even if there are non-digit characters?

I have an assignment asking me to enter a sequence of numbers and characters, each separated by a space, where the sequence is ended by entering "q" or "Q" followed by a space. Everything except the numbers should be discarded and we are to find the sum. So for example, if the input is "1 12 a 2 5 P Q" then we should expect to get "20" as the output.
So far I'm using
$input = <>;
$input =~ tr/0-9//cd;
to get only the numbers, but what I want is to split them up and get the sum. Right now the output would be 11225, whereas I want "1+12+2+5", that is, the sum.
perl -ne '$s=0;($line)=/(.*?)[Qq]/;while($line=~/(\d+)/g) {$s+=$1} print "$s\n"'
Explanation:
This strips the trailing part of each line starting with a Q or a q, then scans the remaining part for isolated non-negative integers and adds them together.
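The same logic unpacked into a small script, as a sketch. The sub name sum_until_q is hypothetical, and the fallback for lines without any Q is an addition that the one-liner does not have:

```perl
use strict;
use warnings;

# Hypothetical helper: sum the integers that appear before the first Q or q.
sub sum_until_q {
    my ($line) = @_;
    my ($head) = $line =~ /(.*?)[Qq]/;    # text before the first Q/q
    $head = $line unless defined $head;   # no Q at all: use the whole line
    my $sum = 0;
    $sum += $1 while $head =~ /(\d+)/g;   # add each run of digits
    return $sum;
}

print sum_until_q("1 12 a 2 5 P Q "), "\n";   # 20
```

Note that this counts digits embedded in other text too, so a field such as "a1" would contribute 1.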
First, strip out all characters that aren't numbers or spaces:
$input =~ s/[^0-9\s]//g;
Then, split on whitespace:
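my @numbers = split(' ', $input);
Then you have a list of numbers that you can add up. (Splitting on ' ' rather than /\s/ avoids empty fields where the removed characters left two spaces in a row.)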
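Put together, the steps might look like the sketch below. The sample input is the question's example without the trailing Q, which this approach does not handle by itself, and the split uses ' ' so that runs of spaces left behind by the removed characters do not produce empty fields.

```perl
use strict;
use warnings;

my $input = "1 12 a 2 5 P";        # sample input, without the trailing Q

$input =~ s/[^0-9\s]//g;           # keep only digits and whitespace
my @numbers = split ' ', $input;   # ' ' copes with runs of spaces

my $sum = 0;
$sum += $_ for @numbers;           # add up the remaining fields

print "$sum\n";                    # 20
```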
Preserve spaces in your first step:
$input =~ tr/0-9 //cd;
Then split on spaces:
my @numbers = split ' ', $input;
(this is a special form of split that works like split /\s+/ but also discards empty leading fields).
You probably want to start by getting rid of everything after a Q though:
$input =~ s/Q .*//i;
For what it's worth, I wouldn't have jumped to using tr here; I'd have started by splitting on spaces, then processed fields that were only digits until a Q was reached.
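A minimal sketch of that field-by-field idea. The sub name sum_fields is hypothetical, and treating a lone q or Q field as the terminator is an assumption about the assignment's input format:

```perl
use strict;
use warnings;

# Hypothetical helper implementing the field-by-field approach.
sub sum_fields {
    my ($input) = @_;
    my $sum = 0;
    for my $field (split ' ', $input) {
        last if lc($field) eq 'q';                # a lone q/Q ends the sequence
        $sum += $field if $field =~ /^[0-9]+$/;   # count all-digit fields only
    }
    return $sum;
}

print sum_fields("1 12 a 2 5 P Q 99"), "\n";      # 20
```

Unlike the tr approach, a mixed field such as "a1" is discarded entirely here rather than contributing its digits.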

Perl Split on first occurrence

Suppose string is:
ABC-Digest-M2-2.03-04.01.00.05
I want to split it into two strings, "ABC-Digest-M2" and "2.03-04.01.00.05", at the first occurrence of a - followed by a digit ("-\d").
How can I do this with one line of code?
You can use split with a lookahead assertion to do this without consuming the digit, e.g.
perl -MData::Dumper -e 'print Dumper(
split /-(?=\d)/, "ABC-Digest-M2-2.03-04.01.00.05", 2
);'
$VAR1 = 'ABC-Digest-M2';
$VAR2 = '2.03-04.01.00.05';
Split on a dash followed by a digit, and limit split() to a maximum of two fields:
my $string = "ABC-Digest-M2-2.03-04.01.00.05";
my ($p1, $p2) = split /-(?=\d)/, $string, 2;
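To see what the limit contributes, this sketch runs the same pattern with and without it (the array names are only illustrative):

```perl
use strict;
use warnings;

my $string = "ABC-Digest-M2-2.03-04.01.00.05";

# With a limit of 2, splitting stops after the first "-" that precedes a digit.
my @limited = split /-(?=\d)/, $string, 2;

# Without the limit, every "-" that precedes a digit is a split point.
my @unlimited = split /-(?=\d)/, $string;

print join(' | ', @limited), "\n";     # ABC-Digest-M2 | 2.03-04.01.00.05
print join(' | ', @unlimited), "\n";   # ABC-Digest-M2 | 2.03 | 04.01.00.05
```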

how to get the required strings from a text using perl

Here is the text to trim:
/home/netgear/Desktop/WGET-1.13/wget-1.13/src/cmpt.c:388,error,resourceLeak,Resource leak: fr
From the above text I need to get the data after the first ":". How do I get 388,error,resourceLeak,Resource leak: fr?
You can use split to separate a string into a list based on a delimiter. In your case the delimiter should be a colon (:):
my #parts = split ':', $text;
As the text you want to extract can also contain a :, use the limit argument to stop after the first one:
my #parts = split ':', $text, 2;
$parts[1] will then contain the text you wanted to extract. You could also pass the result into a list, discarding the first element:
my (undef, $extract) = split ':', $text, 2;
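A short sketch comparing the two calls on the question's text. It assumes the path part itself never contains a colon:

```perl
use strict;
use warnings;

my $text = '/home/netgear/Desktop/WGET-1.13/wget-1.13/src/cmpt.c:388,error,resourceLeak,Resource leak: fr';

my @all_parts = split /:/, $text;             # splits at both colons
my (undef, $extract) = split /:/, $text, 2;   # stops after the first colon

print scalar(@all_parts), "\n";   # 3
print "$extract\n";               # 388,error,resourceLeak,Resource leak: fr
```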
Aside from @RobEarl's suggestion of using split, you could use a regular expression to do this.
my ($match) = $text =~ /^[^:]+:(.*?)$/;
Regular expression:
^        the beginning of the string
[^:]+    any character except: ':' (1 or more times)
:        match ':'
(        group and capture to \1:
  .*?    any character except \n (0 or more times)
)        end of \1
$        before an optional \n, and the end of the string
$match will now hold the result of capture group 1:
388,error,resourceLeak,Resource leak: fr
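The same pattern can be written with the /x flag so the explanation lives next to the regex. This is a sketch rather than a different method; it uses a greedy .* in place of .*?, which should capture the same text here because the trailing $ forces the match to run to the end of the line either way.

```perl
use strict;
use warnings;

my $text = '/home/netgear/Desktop/WGET-1.13/wget-1.13/src/cmpt.c:388,error,resourceLeak,Resource leak: fr';

my ($match) = $text =~ /
    ^        # start of the string
    [^:]+    # everything up to the first colon
    :        # the colon itself
    (.*)     # capture the rest of the line
    $        # end of the string
/x;

print "$match\n";   # 388,error,resourceLeak,Resource leak: fr
```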