Stripping off numbers and alphabetics in Perl

I have an input variable, say $a. $a can be a number, a string, or a mix of both.
How can I split the variable into its numeric digits and its alphabetic characters?
Example:
$a can be 'AB9'
Here I should be able to store 'AB' in one variable and '9' in another.
How can I do that?

Try this version; it works with one or more runs of numeric and alphabetic characters in a variable.
#!/usr/bin/perl
use strict;
use Data::Dumper;
my $var = '11a';
my (@digits, @alphabetics);
while ($var =~ /([a-zA-Z]+)/g) {
    push @alphabetics, $1;
}
while ($var =~ /(\d+)/g) {
    push @digits, $1;
}
print Dumper(\@alphabetics);
print Dumper(\@digits);

Here's a concise way to express it:
my ($digits) = $input =~ /(\d+)/;
my ($alpha) = $input =~ /([a-z]+)/i;
say 'digits: ' . ($digits // 'none');
say 'non-digits: ' . ($alpha // 'none');
It's important to use the match operator in list context here; otherwise it would return whether the match succeeded rather than the captured text.
If you want to get all occurrences in the input string, add the /g modifier and assign to arrays instead of scalars:
my @digits = $input =~ /(\d+)/g;
my @alpha = $input =~ /([a-z]+)/gi;
say 'digits: ' . join ', ' => @digits;
say 'non-digits: ' . join ', ' => @alpha;
For my $input = '42AB17C', the output is
digits: 42, 17
non-digits: AB, C
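To illustrate the list-context point made above, here is a minimal sketch. The variable names ($matched, $first_digits, @all_digits) are only for illustration and are not from the answer.

```perl
use strict;
use warnings;

my $input = '42AB17C';

# Scalar context: the match returns true/false, not the captured text.
my $matched = $input =~ /(\d+)/;

# List context (note the parentheses): the match returns the captures.
my ($first_digits) = $input =~ /(\d+)/;

# With /g in list context, the captures of every match are returned.
my @all_digits = $input =~ /(\d+)/g;

print "scalar context: $matched\n";
print "list context:   $first_digits\n";
print "all matches:    @all_digits\n";
```

The only difference between the first two assignments is the pair of parentheses around the variable, which is easy to miss.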

How to get values in different array from main array splitting by keyword in perl? [duplicate]

I have one string FORCE=(1,10,A,11,20,D,31,5,BI,A,36,9,NU,D,46,9,D)
I want to store these values in different arrays when ever A/D is found, using perl.
Eg.
Array1=1,10,A
Array2=11,20,D
Array3=31,5,BI,A
Array4=36,9,NU,D
Array5=46,9,D
It is not known in advance whether each group will have 3 or 4 values!
Currently I am splitting the string with split:
#!/usr/bin/perl
use strict;
use warnings;
@main = "FORCE=(1,10,A,11,20,D,31,5,BI,A,36,9,NU,D,46,9,D)";
my @val = split(/,/,$1);
print "Val Array = @val\n";
But how to proceed further?
# Grab the stuff inside the parens.
my $input = "FORCE=(1,10,A,11,20,D,31,5,BI,A,36,9,NU,D,46,9,D)";
my ($vals_str) = $input =~ /\(([^)]+)\)/;
# Get substrings of interest.
my @groups = $vals_str =~ /[^,].+?,[AD](?=,|$)/g;
# Split those into your desired arrays.
my @forces = map [split /,/, $_], @groups;
Note that this regex-based approach is reasonable for situations when you can assume that your input data is fairly clean. If you need to handle messier data and need your code to perform validation, I would suggest that you consider a different parsing strategy (as suggested in other answers).
my $str = 'FORCE=(1,10,A,11,20,D,31,5,BI,A,36,9,NU,D,46,9,D)';
my ($list) = $str =~ /^[^=]*=\(([^()]*)\)$/
or die("Unexpected format");
my @list = split(/,/, $list);
my @forces;
while (@list) {
    my @force;
    while (1) {
        die('No "A" or "D" value found') if !@list;
        push @force, shift(@list);
        last if $force[-1] eq 'A' || $force[-1] eq 'D';
    }
    push @forces, \@force;
}
Result:
@{$forces[0]} = ( 1, 10, 'A' );
@{$forces[1]} = ( 11, 20, 'D' );
@{$forces[2]} = ( 31, 5, 'BI', 'A' );
@{$forces[3]} = ( 36, 9, 'NU', 'D' );
@{$forces[4]} = ( 46, 9, 'D' );
#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils 'part';
# Grab the stuff inside the parens.
my $input = "FORCE=(1,10,A,11,20,D,31,5,BI,A,36,9,NU,D,46,9,D)";
my ($vals_str) = $input =~ /\(([^)]+)\)/;
my @val = split(/,/,$vals_str);
print "Val Array = @val\n";
my $i = 0;
my @partitions = part { $_ eq 'A' || $_ eq 'D' ? $i++ : $i } @val;
This creates an array @partitions where each element is a reference to an array with the 3 or 4 elements you want grouped.
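If List::MoreUtils is not installed, the same bucketing rule can be written with core Perl only. This is a rough sketch of the idea rather than a drop-in replacement for part; the loop and the $bucket variable are my own illustration.

```perl
use strict;
use warnings;

my @val = split /,/, '1,10,A,11,20,D,31,5,BI,A,36,9,NU,D,46,9,D';

my $i = 0;
my @partitions;
for my $item (@val) {
    # Same index rule as the part block: use the current bucket,
    # and move on to the next bucket after an 'A' or 'D'.
    my $bucket = ( $item eq 'A' || $item eq 'D' ) ? $i++ : $i;
    push @{ $partitions[$bucket] }, $item;
}

# Each element of @partitions is an array reference; dereference to print.
print join( ',', @$_ ), "\n" for @partitions;
```

The post-increment is what makes this work: an 'A' or 'D' is stored in the current bucket first, and only then does the index advance.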
Let's start with some issues:
@main = "FORCE=(1,10,A,11,20,D,31,5,BI,A,36,9,NU,D,46,9,D)";
You have use strict, but you never declare @main; and @main is an array, yet you're assigning it a single string.
my @val = split(/,/,$1);
Where does $1 come from?
print "Val Array = @val\n";
This might actually work, if @val had anything in it.
You have:
Array1=1,10,A
Array2=11,20,D
Array3=31,5,BI,A
Array4=36,9,NU,D
Array5=46,9,D
As your desired results. Are these scalar variables, or are these sub-arrays?
I'm going to assume the following:
You need to convert your FORCE string into an array.
You need your results in various arrays.
Because of this, I'm going to use an Array of Arrays which means I'm going to be using References.
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
# Convert the string into an array
my $force = "FORCE=(1,10,A,11,20,D,31,5,BI,A,36,9,NU,D,46,9,D)";
$force =~ s/FORCE=\((.*)\)/$1/; # Remove the "FORCE=(" prefix and the ")" suffix
my @main = split /,/, $force; # Convert string into an array
my @array_of_arrays; # Where I'm storing the array of arrays
my $array_of_arrays_number = 0; # Index of the sub-array I'm currently filling in @array_of_arrays
while (@main) { # Going through my "@main" array one element at a time
    # Take an element from the @main array and push it onto the current sub-array
    my $character = shift @main;
    push @{ $array_of_arrays[$array_of_arrays_number] }, $character;
    # If the element is 'A' or 'D', start a new sub-array
    if ( $character eq 'A' or $character eq 'D' ) {
        $array_of_arrays_number += 1;
    }
}
# Let's print out these arrays
for my $array_number ( 0..$#array_of_arrays ) {
    say "Array$array_number = ", join ", ", @{ $array_of_arrays[$array_number] };
}
I like a functional approach, so here is a version that computes the slice indices first and then generates an array of subarrays:
use strict;
use warnings;
use Carp;
sub splice_force ($) {
    my $str = shift;
    croak "Unexpected format" unless $str =~ /^FORCE=\(([^()]*)\)/;
    my @list = split ',', $1;
    # find end positions for each splice
    my @ends = grep $list[$_] =~ /^[AD]$/, 0 .. $#list;
    # make array with starting positions
    my @starts = ( 0, map $_ + 1, @ends );
    # finally make splices (ignore last @starts element so iterate by @ends)
    map [ @list[ shift(@starts) .. $_ ] ], @ends;
}
my $str = 'FORCE=(1,10,A,11,20,D,31,5,BI,A,36,9,NU,D,46,9,D)';
print "@$_\n" for splice_force $str;
You can do this without creating intermediate arrays:
#!/usr/bin/env perl
use strict;
use warnings;
my $input = q{FORCE=(1,10,A,11,20,D,31,5,BI,A,36,9,NU,D,46,9,D)};
my @groups = ([]);
while ($input =~ / ([A-Z0-9]+) ( [,)] ) /xg) {
    my ($token, $sep) = ($1, $2);
    push @{ $groups[-1] }, $token;
    $token =~ /\A(?:A|D)\z/
        or next;
    $sep eq ')'
        and last;
    push @groups, [];
}
use YAML::XS;
print Dump \@groups;
Output:
---
- - '1'
  - '10'
  - A
- - '11'
  - '20'
  - D
- - '31'
  - '5'
  - BI
  - A
- - '36'
  - '9'
  - NU
  - D
- - '46'
  - '9'
  - D
There is no need for anything more than split. This solution checks that the string has the expected form and extracts the characters between the parentheses. Then that is split on commas that are preceded by a field that contains A or D, and the result is split again on commas.
use strict;
use warnings;
use 5.014; # For \K regex pattern
my $str = 'FORCE=(1,10,A,11,20,D,31,5,BI,A,36,9,NU,D,46,9,D)';
my @parts;
if ( $str =~ /FORCE \s* = \s* \( ( [^)]+ ) \)/x ) {
    @parts = map [ split /,/ ], split / [AD] [^,]* \K , /x, $1;
}
use Data::Dump;
dd \@parts;
output
[
  [1, 10, "A"],
  [11, 20, "D"],
  [31, 5, "BI", "A"],
  [36, 9, "NU", "D"],
  [46, 9, "D"],
]

Alternate between upper and lowercase, PERL

I want to alternate between upper and lower case, however I only managed to get the whole string upper or lower, or the first character.
I have not found a proper function to execute what I need. Please have a look and help me out. Cheers.
#!/usr/bin/perl
my $mystring = "this is my string I want each character to alternate between upper and lowercase";
my @myarray = split("", $mystring);
print ucfirst("@myarray");
A more general approach, using a function factory:
use strict;
use warnings;
sub periodic {
    my @subs = @_;
    my $i = 0;
    return sub {
        $i = 0 if $i > $#subs;
        return $subs[$i++]->(@_);
    };
}
my $mystring = "this is my string I want each character to alternate between upper and lowercase";
my $f = periodic(
sub { uc pop },
sub { lc pop },
# sub { .. },
# sub { .. },
);
$mystring =~ s/([a-z])/ $f->($1) /egi;
print $mystring, "\n";
output
ThIs Is My StRiNg I wAnT eAcH cHaRaCtEr To AlTeRnAtE bEtWeEn UpPeR aNd LoWeRcAsE
How about:
my $mystring = "this is my string I want each character to alternate between upper and lowercase";
my @myarray = split("", $mystring);
my $cnt = 1;
for (@myarray) {
    next unless /[a-z]/i;
    $_ = ($cnt%2 ? uc($_) : lc($_));
    $cnt++;
}
say join('',@myarray);
Output:
ThIs Is My StRiNg I wAnT eAcH cHaRaCtEr To AlTeRnAtE bEtWeEn UpPeR aNd LoWeRcAsE
My first thought was to use a regex substitution. Try this:
use strict;
use warnings;
my $str = "this string, I will change";
# Ignore whitespace and punctuation.
$str =~ s/(\w)(\w)/\L$1\U$2/g;
# Or include all characters in the uc/lc alternation.
# $str =~ s/(.)(.)/\L$1\U$2/g;
print $str, "\n";
If, for some reason, you wish to avoid regexes, try:
my $str = "this string, I will change";
my @ary;
my $count = 0;
for my $glyph ( split //, lc $str ) {
    $glyph = uc $glyph if $count % 2;
    push @ary, $glyph;
    $count++;
}
print join( "", @ary ), "\n";
Try this:
use strict;
use warnings;
use 5.016;
use Data::Dumper;
my $str = 'hello';
my $x = 0;
$str =~ s/(.)/($x++ % 2 == 0) ? "\U$1" : "\L$1"/eg;
say $str;
--output:--
HeLlO
Save script below with name alter.pl
#!/usr/bin/perl
print $ARGV[0] =~ s/([a-z])([^a-z]*)([a-z])/uc($1).$2.lc$3/egri;
And run script by command
$ perl alter.pl "this is my string I want each character to alternate between upper and lowercase"
Output
ThIs Is My StRiNg I wAnT eAcH cHaRaCtEr To AlTeRnAtE bEtWeEn UpPeR aNd LoWeRcAse
You have some good answers already but I thought I'd chip in because I hadn't seen map yet.
print map { $c++ % 2 ? lc : uc } split ( //, $mystring );
splits $mystring into characters (split //);
uses map to apply a function to each letter.
uses $c++ to autoincrement, then take a modulo 2 to decide if this should be uppercase or lower case.
join the resultant array.
Gives:
#!c:\Strawberry\perl\bin
use strict;
use warnings;
my $mystring = "this is my string I want each character to alternate between upper and lowercase";
my $c;
print join ( "", map { $c++ % 2 ? lc : uc } split ( //, $mystring ));
Prints:
ThIs iS My sTrInG I WaNt eAcH ChArAcTeR To aLtErNaTe bEtWeEn uPpEr aNd lOwErCaSe
map is a useful function that applies some code to each element in a list, and then 'returns' the list that's produced. So if we treat your string as a list of characters, it works nicely.
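Note that the map version above counts every character, spaces included. If you want only letters to take part in the alternation, one possible variation is to test each character before bumping the counter. This is a sketch on a shortened sample string, not part of the answer above.

```perl
use strict;
use warnings;

my $mystring = "this is my string";
my $c = 0;

# Only letters advance the counter; everything else passes through unchanged.
my $result = join '', map {
    $_ =~ /[a-z]/i
        ? ( $c++ % 2 ? lc($_) : uc($_) )
        : $_
} split //, $mystring;

print "$result\n";   # ThIs Is My StRiNg
```

Here the first letter gets uppercased because $c starts at 0, so the even-numbered letters take the uc branch.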
Try this. A simple if/else condition is enough:
my $mystring = "this is my string I want each character to alternate between upper and lowercase";
my @xz = split( '', $mystring );
for ( my $i = 0; $i < scalar @xz; $i++ ) {
    if ( $i % 2 ) {
        print uc "$xz[$i]";
    }
    else {
        print "$xz[$i]";
    }
}

How to skip splitting for some part of the line

Say I have the line lead=george wife=jane "his boy"=elroy. I want to split on spaces, but without splitting the "his boy" part; it should be treated as one field.
A normal split also splits "his boy", taking "his" as one part and "boy" as the next. How can I avoid this?
This is what I tried:
split " ", $_
I just learned that this works:
use strict; use warnings;
my $string = q(hi my name is 'john doe');
my @parts = $string =~ /'.*?'|\S+/g;
print map { "$_\n" } @parts;
But it does not look good. Is there a simpler way using split itself?
You could use Text::ParseWords for this
use Text::ParseWords;
$list = "lead=george wife=jane \"his boy\"=elroy";
@words = quotewords('\s+', 0, $list);
$i = 0;
foreach (@words) {
    print "$i: <$_>\n";
    $i++;
}
output:
0: <lead=george>
1: <wife=jane>
2: <his boy=elroy>
sub split_space {
my ( $text ) = @_;
while (
$text =~ m/
( # group ($1)
\"([^\"]+)\" # first try find something in quotes ($2)
|
(\S+?) # else minimal non-whitespace run ($3)
)
=
(\S+) # then maximum non-whitespace run ($4)
/xg
) {
my $key = defined($2) ? $2 : $3;
my $value = $4;
print( "key=$key; value=$value\n" );
}
}
split_space( 'lead=george wife=jane "his boy"=elroy' );
Outputs:
key=lead; value=george
key=wife; value=jane
key=his boy; value=elroy
PP posted a good solution. But just to show that there is another neat way to do it, here is my solution:
my $string = q~lead=george wife=jane "his boy"=elroy~;
my @split = split / (?=")/,$string;
my @split2;
foreach my $sp (@split) {
    if ($sp !~ /"/) {
        push @split2, $_ foreach split / /, $sp;
    } else {
        push @split2,$sp;
    }
}
use Data::Dumper;
print Dumper @split2;
Output:
$VAR1 = 'lead=george';
$VAR2 = 'wife=jane';
$VAR3 = '"his boy"=elroy';
I use a lookahead here to first split off the parts whose keys are inside quotes (" "). After that, I loop through the resulting array and split all the other parts, which are plain key=value pairs.
You can get the required result using a single regexp, which extracts the keys and the values and puts the result into a hash.
(\w+|"[\w ]+") will match both single-word and multi-word keys.
The regexp captures only the key and the value, so the result of the match operation will be a list with the following content: key #1, value #1, key #2, value #2, etc.
The hash is automatically populated with the appropriate keys and values when the match result is assigned to it.
Here is the code:
my $str = 'lead=george wife=jane "hello boy"=bye hello=world';
my %hash = ($str =~ m/(?:(\w+|"[\w ]+")=(\w+)(?:\s|$))/g);
## outputs the hash content
foreach my $key (keys %hash) {
print "$key => $hash{$key}\n";
}
and here is the output of this script
lead => george
wife => jane
hello => world
"hello boy" => bye

How to get consecutive pairs of words in Perl

With this sentence:
my $sent = "Mapping and quantifying mammalian transcriptomes RNA-Seq";
We want to get all possible consecutive pairs of words.
my $var = ['Mapping and',
'and quantifying',
'quantifying mammalian',
'mammalian transcriptomes',
'transcriptomes RNA-Seq'];
Is there a compact way to do it?
Yes.
my $sent = "Mapping and quantifying mammalian transcriptomes RNA-Seq";
my @pairs = $sent =~ /(?=(\S+\s+\S+))\S+/g;
A variation that (perhaps unwisely) relies on operator evaluation order but doesn't rely on fancy regexes or indices:
my @words = split /\s+/, $sent;
my $last = shift @words;
my @var;
push @var, $last . ' ' . ($last = $_) for @words;
This works:
my @sent = split(/\s+/, $sent);
my @var = map { $sent[$_] . ' ' . $sent[$_ + 1] } 0 .. $#sent - 1;
i.e. just split the original string into an array of words, and then use map to iteratively produce the desired pairs.
I don't have it as a single line, but the following code should give you somewhere to start. Basically, it does it with a push and a regex with /g.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Indent = 1;
my $t1 = 'aa bb cc dd ee ff';
my $t2 = 'aa bb cc dd ee';
foreach my $txt ( $t1, $t2 )
{
    my @a;
    push( @a, $& ) while( $txt =~ /\G\S+(\s+\S+|)\s*/g );
    print Dumper( \@a );
}
One-liner, thanks to the syntax from @ysth:
my @a = $txt =~ /\G(\S+(?:\s+\S+|))\s*/g;
My regex is slightly different in that if you have an odd number of words, the last word still gets an entry.

Cutting apart string in Perl

I have a string in Perl that is 23 digits long. I need to cut it apart into different pieces. First 2 digits in one variable, next 3 in another variable, next 4 into another variable, etc. Basically the 23 digits needs to end up as 6 separate variables (2,3,4,4,3,7) characters, in that order.
Any ideas how I can cut the string up like this?
There are lots of ways to do it, but the shortest is probably unpack:
my $string = '1' x 23;
my @values = unpack 'A2A3A4A4A3A7', $string;
If you need separate variables, you can use a list assignment:
my ($v1, $v2, $v3, $v4, $v5, $v6) = unpack 'A2A3A4A4A3A7', $string;
Expanding on Alex's method, rather than specify each start and end, use the list you gave of lengths.
#!/usr/bin/env perl
use strict;
use warnings;
my $string = "abcdefghijklmnopqrstuvw";
my $pos = 0;
my @split = map {
    my $start = $pos;
    my $end = $_;
    $pos += $end;
    substr( $string, $start, $end);
} (2,3,4,4,3,7);
print "$_\n" for @split;
This said you probably should look at unpack which is used for fixed width fields. I have no experience with it though.
You could use a regex, viz:
$string =~ /\d{2}\d{3}\d{4}\d{4}\d{3}\d{7}/
and capture each part by surrounding with brackets ().
You then find each capture in the variables $1, $2 ...
or get them all in the returned list
See perldoc perlre
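As a sketch of what that looks like with the capture groups written out: the sample digits below are made up, and the \A and \z anchors are my own addition so that anything other than exactly 23 digits is rejected.

```perl
use strict;
use warnings;

my $string = '12345678901234567890123';   # 23 digits, sample data only

# Each pair of parentheses captures one field; in list context the match
# returns all six captures at once.
my @parts = $string =~ /\A(\d{2})(\d{3})(\d{4})(\d{4})(\d{3})(\d{7})\z/
    or die "Expected exactly 23 digits\n";

print join( '|', @parts ), "\n";   # 12|345|6789|0123|456|7890123
```

Compared with unpack or substr, the regex also validates that the input really consists of digits, which may or may not matter for your data.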
You want to use substr (see perldoc -f substr).
$substring = substr($string, $start, $length);
I'd also use map on a list of [start, length] pairs to make your life easier:
$string = "123456789";
@values = map {substr($string, $_->[0], $_->[1])} ([1, 3], [4, 2] , ...);
Here's a sub that will do it, using the already discussed unpack.
sub string_slices {
    my $str = shift;
    return unpack( join( 'A', '', @_ ), $str );
}
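For completeness, a usage sketch for this sub with the lengths from the question. The sample string is made up, and the sub is repeated so the snippet runs on its own.

```perl
use strict;
use warnings;

sub string_slices {
    my $str = shift;
    # join( 'A', '', 2, 3, 4 ) yields the unpack template 'A2A3A4'
    return unpack( join( 'A', '', @_ ), $str );
}

my @pieces = string_slices( 'abcdefghijklmnopqrstuvw', 2, 3, 4, 4, 3, 7 );
print join( '|', @pieces ), "\n";   # ab|cde|fghi|jklm|nop|qrstuvw
```

One thing to keep in mind: the 'A' format strips trailing spaces from each field. For a string of digits that makes no difference, but for space-padded data you may prefer 'a'.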