Shift and Split - perl

i want to make this 2-Commands shorter, 1 line if it possible.
my $id = shift;
my #splittedID = split(" ", $id);
return $splittedID[0];
but it should have at the end the same functional. Thanks

return (split " ", shift)[0];
Or, if you want:
(split " ", shift)[0];
(The result of the last line of a sub implicitly becomes the return value).
Even shorter (Perl 5.16 required for /r option):
$_[0]=~s/ .*//r
Of course, in actual production code, your original example is better, since it's readable.

Because you just want the first item of the split and ' ' is a special pattern that skips all leading spaces, a regex can also solve this like so:
sub firstword {
return (shift =~ /(\S+)/)[0];
}
my $x = firstword('asdf qwer jkl') # Equals 'asdf';
my $y = firstword(' qwer jkl') # Equals 'qwer';
my $z = firstword(' ') # Equals undef;
Also, the return keyword is optional of course, but shorter is not always better.

This is much more compact using a regular expression:
sub first_field {
return unless $_[0] =~ /(\S+)/;
$1;
}

Related

Perl regular expressions and returned array of matched groups

i am new in Perl and i need to do some regexp.
I read, when array is used like integer value, it gives count of elements inside.
So i am doing for example
if (#result = $pattern =~ /(\d)\.(\d)/) {....}
and i was thinking it should return empty array, when pattern matching fails, but it gives me still array with 2 elements, but with uninitialized values.
So how i can put pattern matching inside if condition, is it possible?
EDIT:
foreach (keys #ARGV) {
if (my #result = $ARGV[$_] =~ /^--(?:(help|br)|(?:(input|output|format)=(.+)))$/) {
if (defined $params{$result[0]}) {
print STDERR "Cmd option error\n";
}
$params{$result[0]} = (defined $result[1] ? $result[1] : 1);
}
else {
print STDERR "Cmd option error\n";
exit ERROR_CMD;
}
}
It is regexp pattern for command line options, cmd options are in long format with two hyphens preceding and possible with argument, so
--CMD[=ARG]. I want elegant solution, so this is why i want put it to if condition without some prolog etc.
EDIT2:
oh sry, i was thinking groups in #result array are always counted from 0, but accesible are only groups from branch, where the pattern is success. So if in my code command is "input", it should be in $result[0], but actually it is in $result[1]. I thought if $result[0] is uninitialized, than pattern fails and it goes to the if statement.
Consider the following:
use strict;
use warnings;
my $pattern = 42.42;
my #result = $pattern =~ /(\d)\.(\d)/;
print #result, ' elements';
Output:
24 elements
Context tells Perl how to treat #result. There certainly aren't 24 elements! Perl has printed the array's elements which resulted from your regex's captures. However, if we do the following:
print 0 + #result, ' elements';
we get:
2 elements
In this latter case, Perl interprets a scalar context for #result, so adds the number of elements to 0. This can also be achieved through scalar #results.
Edit to accommodate revised posting: Thus, the conditional in your code:
if(my #result = $ARGV[$_] =~ /^--(?:(help|br)|(?:(input|output|format)=(.+)))$/) { ...
evaluates to true if and only if the match was successful.
#results = $pattern =~ /(\d)\.(\d)/ ? ($1,$2) : ();
Try this:
#result = ();
if ($pattern =~ /(\d)\.(\d)/)
{
push #result, $1;
push #result, $2;
}
=~ is not an equal sign. It's doing a regexp comparison.
So my code above is initializing the array to empty, then assigning values only if the regexp matches.

stripping off numbers and alphabetics in perl

I have an input variable, say $a. $a can be either number or string or mix of both.
My question is how can I strip off the variable to separate numeric digits and alphabetic characters?
Example;
$a can be 'AB9'
Here I should be able to store 'AB' in one variable and '9' in other.
How can I do that?
Check this version, it works with 1 or more numeric and alphabetic characters in a variable.
#!/usr/bin/perl
use strict;
use Data::Dumper;
my $var = '11a';
my (#digits, #alphabetics);
while ($var =~ /([a-zA-Z]+)/g) {
push #alphabetics, $1;
}
while ($var =~ /(\d+)/g) {
push #digits, $1;
}
print Dumper(\#alphabetics);
print Dumper(\#digits);
Here's one way to express it very shortly:
my ($digits) = $input =~ /(\d+)/;
my ($alpha) = $input =~ /([a-z]+)/i;
say 'digits: ' . ($digits // 'none');
say 'non-digits: ' . ($alpha // 'none');
It's important to use the match operator in list context here, otherwise it would return if the match succeeded.
If you want to get all occurrences in the input string, simply change the scalar variables in list context to proper arrays:
my #digits = $input =~ /(\d+)/g;
my #alpha = $input =~ /([a-z]+)/gi;
say 'digits: ' . join ', ' => #digits;
say 'non-digits: ' . join ', ' => #alpha;
For my $input = '42AB17C', the output is
digits: 42, 17
non-digits: AB, C

Using a char variable in tr///

I am trying to count the characters in a string and found an easy solution counting a single character using the tr operator. Now I want to do this with every character from a to z. The following solution doesn't work because tr/// matches every character.
my #chars = ('a' .. 'z');
foreach my $c (#chars)
{
$count{$c} = ($text =~ tr/$c//);
}
How do I correctly use the char variable in tr///?
tr/// doesn't work with variables unless you wrap it in an eval
But there is a nicer way to do this:
$count{$_} = () = $text =~ /$_/g for 'a' .. 'z';
For the TIMTOWTDI:
$count{$_}++ for grep /[a-z]/i, split //, $text;
tr doesn't support variable interpolation (neither in the search list nor in the replacement list). If you want to use variables, you must use eval():
$count{$c} = eval "\$text =~ tr/$c/$c/";
That said, a more efficient (and secure) approach would be to simply iterate over the characters in the string and increment counters for each character, e.g.:
my %count = map { $_ => 0 } 'a' .. 'z';
for my $char (split //, $text) {
$count{$char}++ if defined $count{$char};
}
If you look at the perldoc for tr/SEARCHLIST/REPLACEMENTLIST/cdsr, then you'll see, right at the bottom of the section, the following:
Because the transliteration table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote interpolation. That means that if you want to use variables, you must use an eval():
eval "tr/$oldlist/$newlist/";
die $# if $#;
eval "tr/$oldlist/$newlist/, 1" or die $#;
Thus, you would need an eval to generate a new SEARCHLIST.
This is going to be very inefficient... the code might feel neat, but you're processing the complete string 26 times. You're also not counting uppercase characters.
You'd be better off stepping through the string once and just incrementing counters for each character found.
From the perlop documentation:
tr/AAA/XYZ/
will transliterate any A to X.
Because the transliteration table is built at compile time, neither
the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote
interpolation. That means that if you want to use variables, you must
use an eval()
Alternatively in your case you can use the s/// operator as:
foreach my $c (#chars) {
$count{$c} += ($text =~ s/$c//g);
}
My solution with some modification based from http://www.perlmonks.org/?node_id=446003
sub lowerLetters {
my $string = shift;
my %table;
#table{split //, $letters_uc} = split //, $letters_lc;
my $table_re = join '|', map { quotemeta } reverse sort keys %table;
$string =~ s/($table_re)/$table{$1}/g;
return if not defined $string;
return $string;
}
You may want to use s instead. Substitution is much more powerful than tr
My solution:
$count{$c} =~ s/\$search/$replace/g;
g at the end means "use it globally".
See:
https://blog.james.rcpt.to/2010/10/25/perl-search-and-replace-using-variables/
https://docstore.mik.ua/orelly/perl3/lperl/ch09_06.htm

Saving a transliteration table in perl

I want to transliterate digits from 1 - 8 with 0 but not knowing the number at compile time. Since transliterations do not interpolate variables I'm doing this:
#trs = (sub{die},sub{${$_[0]} =~ tr/[0,1]/[1,0]/},sub{${$_[0]} =~ tr/[0,2]/[2,0]/},sub{${$_[0]} =~ tr/[0,3]/[3,0]/},sub{${$_[0]} =~ tr/[0,4]/[4,0]/},sub{${$_[0]} =~ tr/[0,5]/[5,0]/},sub{${$_[0]} =~ tr/[0,6]/[6,0]/},sub{${$_[0]} =~ tr/[0,7]/[7,0]/},sub{${$_[0]} =~ tr/[0,8]/[8,0]/});
and then index it like:
$trs[$character_to_transliterate](\$var_to_change);
I would appreciate if anyone can point me to a best looking solution.
Any time that you are repeating yourself, you should see if what you are doing can be done in a loop. Since tr creates its tables at compile time, you can use eval to access the compiler at runtime:
my #trs = (sub {die}, map {eval "sub {\$_[0] =~ tr/${_}0/0$_/}"} 1 .. 8);
my $x = 123;
$trs[2]($x);
print "$x\n"; # 103
There is also no need to use references here, subroutine arguments are already passed by reference.
If you do not want to use string eval, you need to use a construct that supports runtime modification. For that you can use the s/// operator:
sub subst {$_[0] =~ s/($_[1]|0)/$1 ? 0 : $_[1]/ge}
my $z = 1230;
subst $z => 2;
print "$z\n"; # 1032
The tr/// construct is faster than s/// since the latter supports regular expressions.
I'd suggest simply ditching tr in favor of something that actually permits a little bit of metaprogramming like s///. For example:
# Replace $to_swap with 0 and 0 with $to_swap, and leave
# everything else alone.
sub swap_with_0 {
my ($digit, $to_swap) = #_;
if ($digit == $to_swap) {
return 0;
} elsif ($digit == 0) {
return $to_swap;
} else {
return $digit;
}
}
# Swap 0 and $to_swap throughout $string
sub swap_digits {
my ($string, $to_swap) = #_;
$string =~ s/([0$to_swap])/swap_with_0($1, $to_swap)/eg;
return $string;
}
which is surprisingly straightforward. :)
Here's a short subroutine that uses substitution instead of transliteration:
sub swap_digits {
my ($str, $digit) = #_;
$str =~ s{ (0) | $digit }{ defined $1 ? $digit : 0 }gex;
return $str;
}

Split on comma, but only when not in parenthesis

I am trying to do a split on a string with comma delimiter
my $string='ab,12,20100401,xyz(A,B)';
my #array=split(',',$string);
If I do a split as above the array will have values
ab
12
20100401
xyz(A,
B)
I need values as below.
ab
12
20100401
xyz(A,B)
(should not split xyz(A,B) into 2 values)
How do I do that?
use Text::Balanced qw(extract_bracketed);
my $string = "ab,12,20100401,xyz(A,B(a,d))";
my #params = ();
while ($string) {
if ($string =~ /^([^(]*?),/) {
push #params, $1;
$string =~ s/^\Q$1\E\s*,?\s*//;
} else {
my ($ext, $pre);
($ext, $string, $pre) = extract_bracketed($string,'()','[^()]+');
push #params, "$pre$ext";
$string =~ s/^\s*,\s*//;
}
}
This one supports:
nested parentheses;
empty fields;
strings of any length.
Here is one way that should work.
use Regexp::Common;
my $string = 'ab,12,20100401,xyz(A,B)';
my #array = ($string =~ /(?:$RE{balanced}{-parens=>'()'}|[^,])+/g);
Regexp::Common can be installed from CPAN.
There is a bug in this code, coming from the depths of Regexp::Common. Be warned that this will (unfortunately) fail to match the lack of space between ,,.
Well, old question, but I just happened to wrestle with this all night, and the question was never marked answered, so in case anyone arrives here by Google as I did, here's what I finally got. It's a very short answer using only built-in PERL regex features:
my $string='ab,12,20100401,xyz(A,B)';
$string =~ s/((\((?>[^)(]*(?2)?)*\))|[^,()]*)(*SKIP),/$1\n/g;
my #array=split('\n',$string);
Commas that are not inside parentheses are changed to newlines and then the array is split on them. This will ignore commas inside any level of nested parentheses, as long as they're properly balanced with a matching number of open and close parens.
This assumes you won't have newline \n characters in the initial value of $string. If you need to, either temporarily replace them with something else before the substitution line and then use a loop to replace back after the split, or just pick a different delimiter to split the array on.
Limit the number of elements it can be split into:
split(',', $string, 4)
Here's another way:
my $string='ab,12,20100401,xyz(A,B)';
my #array = ($string =~ /(
[^,]*\([^)]*\) # comma inside parens is part of the word
|
[^,]*) # split on comma outside parens
(?:,|$)/gx);
Produces:
ab
12
20100401
xyz(A,B)
Here is my attempt. It should handle depth well and could even be extended to include other bracketed symbols easily (though harder to be sure that they MATCH). This method will not in general work for quotation marks rather than brackets.
#!/usr/bin/perl
use strict;
use warnings;
my $string='ab,12,20100401,xyz(A(2,3),B)';
print "$_\n" for parse($string);
sub parse {
my ($string) = #_;
my #fields;
my #comma_separated = split(/,/, $string);
my #to_be_joined;
my $depth = 0;
foreach my $field (#comma_separated) {
my #brackets = $field =~ /(\(|\))/g;
foreach (#brackets) {
$depth++ if /\(/;
$depth-- if /\)/;
}
if ($depth == 0) {
push #fields, join(",", #to_be_joined, $field);
#to_be_joined = ();
} else {
push #to_be_joined, $field;
}
}
return #fields;
}