Split string into array of words and numbers using Perl - perl

How can I split a string like "aa132bc4253defg18" to get an array of words and numbers:
aa,132,bc,4253,defg,18
I'm using Perl. The lengths of the substrings are variable.

How about:
use Modern::Perl;
use Data::Dump qw(dump);
my $str = "aa132bc4253defg18";
my @l = split(/(?<=\d)(?=\D)|(?<=\D)(?=\d)/, $str);
dump @l;
Output:
("aa", 132, "bc", 4253, "defg", 18)
It splits at every boundary between a digit (\d) and a non-digit (\D), in either order.
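As an alternative sketch (not part of the answer above), the same pieces can be collected by matching the runs themselves instead of the boundaries between them. The helper name split_runs is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the alternating runs of digits and non-digits.
sub split_runs {
    my ($str) = @_;
    # In list context, a /g match without capture groups returns every match.
    my @runs = $str =~ /\d+|\D+/g;
    return @runs;
}

print join(",", split_runs("aa132bc4253defg18")), "\n";   # aa,132,bc,4253,defg,18
```

Because this matches fields rather than separators, it never produces an empty leading element when the string starts with a digit.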

Something like this will do:
split (/(\d+)/, $x);
With a full working example:
use strict; use warnings;
my $x = 'aa132bc4253defg18';
my @y = split /(\d+)/, $x;
print join ",", @y;
The important sections in perldoc split are:
Anything in EXPR that matches PATTERN is taken to be a separator that separates the EXPR into substrings (called "fields") that do not include the separator.
and
If the PATTERN contains capturing groups, then for each separator, an additional field is produced for each substring captured by a group (in the order in which the groups are specified, as per backreferences);
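The effect of the capturing group described in those two passages can be seen by running the same split with and without the parentheses. This is a small illustrative sketch; the variable names are arbitrary.

```perl
use strict;
use warnings;

my $x = 'aa132bc4253defg18';

my @without = split /\d+/,   $x;   # separators are discarded
my @with    = split /(\d+)/, $x;   # captured separators become extra fields

print join(",", @without), "\n";   # aa,bc,defg
print join(",", @with),    "\n";   # aa,132,bc,4253,defg,18
```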
Edit
If the string starts with a number, split returns an empty leading element. It can be eliminated with a grep (testing length rather than truth, so that a field of "0" is kept):
grep { length } split /(\d+)/, $x;
The example becomes
use strict; use warnings;
my $x = '44aa132bc4253defg18';
my @y = grep { length } split /(\d+)/, $x;
print join ",", @y;

Related

Text::ParseWords->quotewords() does not work in case of single quote in a string in perl

I tried something like this :
use Text::ParseWords;
my $var="Id;Id2;my name 'is Ankit;code";
my @temp = quotewords('\;',1,$var);
my $length = scalar @temp;
print "$length\n";
I can use split(), but it does not work for input such as "Id;Id2;"my name is ;Ankit";code". Any suggestions?
You have to escape the single quote.
use strict;
use warnings;
use Text::ParseWords;
my $var = q{Id;Id2;my name \'is Ankit;code};
my @words = quotewords('\;', 0, $var);
print scalar @words; # prints 4

Perl: Find Nth occurrence in a string and return the sub-string up to this occurrence

I have the below string in which the delimiter is comma ",":
$str = "abc,123,rty,567,89,,90,gg"
I want to first find the Nth occurrence of "," in the above string, let's say I want to find the 5th occurrence.
In $str this is the comma after element 89.
Then I want to get that portion of the string $str which starts from 0 and ends up to this 5th comma, which would be:
"abc,123,rty,567,89"
Please advise how can I do this with Perl.
Thank you
One simple way using split and list slices:
#!/usr/bin/perl
use strict; use warnings; use 5.010;
my $str = "abc,123,rty,567,89,,90,gg";
say $str;
my $to_fifth = join ',', ( split(',', $str) )[0 .. 4] ;
say $to_fifth;
output
abc,123,rty,567,89,,90,gg
abc,123,rty,567,89
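If the position of the Nth delimiter itself is wanted, as the question literally asks, one possible sketch walks the string with index and then takes a substr. The helper name up_to_nth is hypothetical and not from the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: the part of $str before the $n-th occurrence of $sep.
# Returns nothing (undef in scalar context) if there are fewer than $n occurrences.
sub up_to_nth {
    my ($str, $sep, $n) = @_;
    my $pos = -1;
    for (1 .. $n) {
        # Search again, starting just past the previous hit.
        $pos = index($str, $sep, $pos + 1);
        return if $pos < 0;
    }
    return substr($str, 0, $pos);
}

print up_to_nth("abc,123,rty,567,89,,90,gg", ",", 5), "\n";   # abc,123,rty,567,89
```

Unlike the slice, this reports a missing Nth delimiter explicitly instead of quietly joining undefined elements.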

How to extract the words through pattern matching?

#!/usr/bin/perl
use strict;
use warnings;
my $string = "praveen is a good boy";
my @try = split(/([a,e,i,o,u]).*\1/,$string);
print "@try\n";
I am trying to print all words in a given string that contain two adjacent vowels.
The output has to be "praveen" and "good".
I also tried a negated character class [^...] to split and keep only the two adjacent vowels.
The Perl function split isn't a great fit for finding a list of matches. Instead, I would recommend using the regex modifier g. To process all the matches, you can either loop, using e.g. while, or you can assign the list of matches in one go.
The following example should match all words in a string which contain two adjacent vowels:
my $string = "praveen is a good boy";
while ( $string =~ /(\w*[aeiou]{2}\w*)/g ) {
print "$1\n"
}
Output:
praveen
good
You could also do this:
my @matches = ( $string =~ /\w*[aeiou]{2}\w*/g );
and process the result in the same way you were processing @try in the question.
You could do something like..
#!/usr/bin/perl
use strict;
use warnings;
my $str
= "praveen is a good boy\n"
. "aaron is a good boy\n"
. "praveen and aaron are good, hoot, ho"
;
while ($str =~ /(\w*([aeiou])\2(?:\w*))/g) {
print $1, "\n";
}
Regular expression:
(             group and capture to \1:
  \w*           word characters (a-z, A-Z, 0-9, _) (0 or more times)
  (             group and capture to \2:
    [aeiou]       any character of: 'a', 'e', 'i', 'o', 'u'
  )             end of \2
  \2            what was matched by capture \2
  (?:           group, but do not capture:
    \w*           word characters (a-z, A-Z, 0-9, _) (0 or more times)
  )             end of grouping
)             end of \1
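This is close to /(\w*([aeiou])[aeiou]+(?:\w*))/, except that \2 requires the same vowel twice in a row ("oo", "ee"), while [aeiou]+ accepts any following vowel ("ea", "ai").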
Output:
praveen
good
aaron
good
praveen
aaron
good
hoot
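A backreference and a character class behave differently on words whose adjacent vowels are not identical. The following sketch, with a made-up word list, shows which words each pattern accepts.

```perl
use strict;
use warnings;

my @words = split ' ', "bread good hoot rain";

# ([aeiou])\1 needs the SAME vowel twice in a row.
my @same_vowel_twice = grep { /([aeiou])\1/ } @words;

# [aeiou]{2} accepts ANY two adjacent vowels.
my @any_two_vowels = grep { /[aeiou]{2}/ } @words;

print "@same_vowel_twice\n";   # good hoot
print "@any_two_vowels\n";     # bread good hoot rain
```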
#!/usr/bin/perl
use strict;
use warnings;
my $string = "praveen is a good boy";
my @try = split(/\s/,$string);
for(@try) {
# if(/[a,e,i,o,u]{2}/) {
if(/[aeiou]{2}/) { # edited after Birei's comment
print "$_\n";
};
};
The first argument of "split" is a delimiter. Split splits (-8

Read from input and store comma separated values in Hash

I have a Perl question like this:
Write a Perl program that will read a series of last names and phone numbers from the given input. The names and numbers should be separated by a comma. Then print the names and numbers alphabetically according to last name. Use hashes.
Any idea how to solve this?
There's more than one way to do it :)
my %phonebook;
while(<>) {
chomp;
my ($name, $phone) = split /,/;
$phonebook{$name} = $phone;
}
print "$_ => $phonebook{$_}\n" for sort keys %phonebook;
Something like the following perhaps.
my %hash;
foreach (<>) { # reads your args from the command line or an input file
chomp; # drop the trailing newline so it does not end up in the value
my @arr = split(/\,/); # split every line at the comma
$hash{$arr[0]} = $arr[1]; # assign to hash
}
#print hash here
foreach my $key (sort keys %hash ) #sort and iterate
{
print "Name: " . $key . " Number: " . $hash{$key} . "\n";
}
Tasks like this are where Perl's command-line switches shine. See perldoc perlrun for more information!
Command line input
$ perl -naF',\s*' -lE'$d{$F[0]}=$F[1];END{say"$_: $d{$_}"for sort keys%d}'
Moe, 12345
Pi, 31416
Homer, 54321
Output
Homer: 54321
Moe: 12345
Pi: 31416
Assuming that we split on commas (you should use Text::CSV generally), we can actually create this hash with a simple application of the map function and the diamond operator (<>).
#!/usr/bin/env perl
use strict;
use warnings;
my %phonebook = map { chomp; split /,/ } <>;
use Data::Dumper;
print Dumper \%phonebook;
The last two lines are just to visualize the result, and the upper three should be in all scripts. The meat of the work is done all in the one line.
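For experimenting without an input file, the same idea can be wrapped in a function that takes the lines as arguments. This is only a sketch: parse_phonebook is a hypothetical name, and trimming whitespace around the comma is an assumption about the input format.

```perl
use strict;
use warnings;

# Hypothetical helper: build a name => number hash from "Name, Number" lines.
sub parse_phonebook {
    my @lines = @_;
    my %book;
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /\S/;                        # skip blank lines
        # Limit of 2: anything after the first comma stays in the number.
        my ($name, $phone) = split /\s*,\s*/, $line, 2;
        $book{$name} = $phone;
    }
    return %book;
}

my %book = parse_phonebook("Moe, 12345\n", "Pi, 31416\n", "Homer, 54321\n");
print "$_: $book{$_}\n" for sort keys %book;
```

Sorting the keys at print time is what gives the alphabetical order, since a hash itself has no defined order.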

How to Split on three different delimiters then ucfirst each result[]?

I am trying to figure out how to split a string that has three possible delimiters (or none) without a million lines of code, while keeping the code legible to a guy like me.
Many possible combinations in the string.
this-is_the.string
this.is.the.string
this-is_the_string
thisisthestring
There are no spaces in the string and none of these characters:
~`!@#$%^&*()+=\][{}|';:"/?>,<.
The string is already stripped of all but:
0-9
a-Z
-
_
.
There are also no sequential dots, dashes or underscores.
I would like the result to be displayed like Result:
This Is The String
I am really having a difficult time trying to get this going.
I believe I will need to use a hash and I just have not grasped the concept even after hours of trial and error.
I am bewildered at the fact I could possibly split a string on multiple delimiters where the delimiters could be in any order AND/OR three different types (or none at all) AND maintain the order of the result!
Any possibilities?
Split the string into words, capitalise the words, then join the words while inserting spaces between them.
It can be coded quite succinctly:
my $clean = join ' ', map ucfirst lc, split /[_.-]+/, $string;
If you just want to print out the result, you can use
use feature qw( say );
say join ' ', map ucfirst lc, split /[_.-]+/, $string;
or
print join ' ', map ucfirst lc, split /[_.-]+/, $string;
print "\n";
It is simple to use a global regular expression to gather all sequences of characters that are not a dot, dash, or underscore.
After that, lc will lower-case each string and ucfirst will capitalise it. Stringifying an array will insert spaces between the elements.
for ( qw/ this-is_the.string this.is.the.string this-is_the_string / ) {
my @string = map {ucfirst lc } /[^-_.]+/g;
print "@string\n";
}
output
This Is The String
This Is The String
This Is The String
"the delimiters could be anywhere AND/OR three different types (or none at all)": you need a delimiter to split a string, but you can define multiple delimiters by passing a regular expression to the split function:
my @parts = split(/[-_\.]/, $string);
print ucfirst "$_ " foreach @parts;
print "\n";
Here's a solution that will work for all but your last test case. It's extremely hard to split a string without delimiters, you'd need to have a list of possible words, and even then it would be prone to error.
#!/usr/bin/perl
use strict;
use warnings;
my @strings = qw(
this-is_the.string
this.is.the.string
this-is_the_string
thisisthestring
);
foreach my $string (@strings) {
print join(q{ }, map {ucfirst($_)} split(m{[_.-]}smx,$string)) . qq{\n};
}
And here's an alternative for the loop that splits everything into separate statements to make it easier to read:
foreach my $string (@strings) {
my @words = split m{[_.-]}smx, $string;
my @upper_case_words = map {ucfirst($_)} @words;
my $string_with_spaces = join q{ }, @upper_case_words;
print $string_with_spaces . qq{\n};
}
And to prove that just because you can, doesn't mean you should :P
$string =~ s{([A-Za-z]+)([_.-]*)?}{ucfirst(lc("$1")).($2?' ':'')}ge;
For all but last possibility:
use strict;
use warnings;
my $file;
my $newline;
open $file, "<", "testfile" or die "Cannot open testfile: $!";
while (<$file>) {
chomp;
$newline = join ' ', map ucfirst lc, split /[-_\.]/, $_;
print $newline . "\n";
}