Autoincrementing letters in Perl

I do not understand autoincrementing letters in Perl.
This example seems perfectly understandable:
$a = 'bz'; ++$a;
ca #output
b gets incremented to c. There is nothing left for z to go to, so it goes back to a (or at least this is how I see the process).
But then I come across statements like this:
$a = 'Zz'; ++$a;
AAa #output
and:
$a = '9z'; ++$a;
10 #output
Why doesn't incrementing Zz return Aa? And why doesn't incrementing 9z return 0z?
Thanks!

To quote perlop:
If, however, the variable has been used in only string contexts since it was set, and has a value that is not the empty string and matches the pattern /^[a-zA-Z]*[0-9]*\z/, the increment is done as a string, preserving each character within its range, with carry.
The ranges are 0-9, A-Z, and a-z. When a new character is needed, it is taken from the range of the first character. Each range is independent; characters never leave the range they started in.
9z does not match the pattern, so it gets a numeric increment. (It probably ought to give an "Argument isn't numeric" warning, but it doesn't in Perl 5.10.1.) Digits are allowed only after all the letters (if any), never before them.
Note that an all-digit string does match the pattern, and does receive a string increment (if it's never been used in a numeric context). However, the result of a string increment on such a string is identical to a numeric increment, except that it has infinite precision and leading zeros (if any) are preserved. (So you can only tell the difference when the number of digits exceeds what an IV or NV can store, or it has leading zeros.)
I don't see why you think Zz should become Aa (unless you're thinking of modular arithmetic, but this isn't). It becomes AAa through this process:
1. Incrementing z wraps around to a. Increment the previous character.
2. Incrementing Z wraps around to A. There is no previous character, so add the first one from this range, which is another A.
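The rules above can be checked with a small sketch. The `successor` helper is a hypothetical name introduced only for this illustration; the warning is silenced because some Perl versions complain about the non-numeric "9z".

```perl
use strict;
use warnings;

# Hypothetical helper: return the value that follows $s under ++.
sub successor {
    my ($s) = @_;
    no warnings 'numeric';    # "9z" falls back to a numeric increment
    return ++$s;
}

# bz -> ca, Zz -> AAa, zz -> aaa, a9 -> b0, 9z -> 10
print "$_ -> ", successor($_), "\n" for qw(bz Zz zz a9 9z);
```

Only the last input fails the pattern, so it is the only one treated as a number.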
The range operator (..), when given two strings (and the left-hand one matches the pattern), uses the string increment to produce a list (this is explained near the end of that section). The list starts with the left-hand operand, which is then incremented until either:
1. The value equals the right-hand operand, or
2. The length of the value exceeds the length of the right-hand operand.
It returns a list of all the values. (If case 2 terminated the list, the final value is not included in it.)
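A short sketch of both termination cases; the variable names are made up for this example.

```perl
use strict;
use warnings;

# Case 1: the sequence reaches the right-hand operand and stops there.
my @until_equal = ('x' .. 'ab');

# Case 2: 'b' is never produced after 'y', so the list ends before
# the first value that is longer than the right-hand operand ('aa').
my @until_longer = ('y' .. 'b');

print "@until_equal\n";     # x y z aa ab
print "@until_longer\n";    # y z
```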

Because (ignoring case for the moment; case is merely preserved, nothing interesting happens with it), 'AA' is the successor to 'Z', so how could it also be the successor to 'ZZ'? The successor to 'ZZ' is 'AAA'.
Because as far as ++ and all other numeric operators are concerned, "9z" is just a silly way of writing 9, and the successor to 9 is 10. The special string behavior of auto-increment is clearly specified to only occur on strings of letters, or strings of letters followed by numbers (and not mixed in any other way).

The answer is to not do that. The automagic incrementing of ++ with non-numbers is full of nasty pitfalls. It is suitable only for quick hacks.
You are better off writing your own iterator for this sort of thing:
#!/usr/bin/perl
use strict;
use warnings;

{
    package StringIter;

    sub new {
        my $class = shift;
        my %self  = @_;
        $self{set}   = ["a" .. "z"] unless exists $self{set};
        $self{value} = -1           unless exists $self{value};
        $self{size}  = @{ $self{set} };    # number of symbols in the set
        return bless \%self, $class;
    }

    sub increment {
        my $self = shift;
        $self->{value}++;
    }

    # Render the counter as a base-$size numeral, using the set as digits.
    sub current {
        my $self = shift;
        my $n    = $self->{value};
        my $size = $self->{size};
        my $s    = "";
        while ($n >= $size) {
            my $offset = $n % $size;
            $s = $self->{set}[$offset] . $s;
            $n = int($n / $size);    # integer division; a plain /= leaves a fraction
        }
        $s = $self->{set}[$n] . $s;
        return $s;
    }

    sub next {
        my $self = shift;
        $self->increment;
        return $self->current;
    }
}

{
    my $iter = StringIter->new;
    for (1 .. 100) {
        print $iter->next, "\n";
    }
}

{
    my $iter = StringIter->new(set => [0, 1]);
    for (1 .. 7) {
        print $iter->next, "\n";
    }
}

You're asking why increment doesn't wrap around.
If it did it wouldn't really be an increment. To increment means you have a totally ordered set and an element in it and produce the next higher element, so it can never take you back to a lower element. In this case the total ordering is the standard alphabetical ordering of strings (which is only defined on the English alphabet), extended to cope with arbitrary ASCII strings in a way that seems natural for certain common types of identifier strings.
Wrapping would also defeat its purpose: usually you want to use it to generate arbitrarily many different identifiers of some sort.
I agree with Chas Owens's verdict: applying this operation to arbitrary strings is a bad idea, that's not the sort of use it was intended for.
I disagree with his remedy: just pick a simple starting value on which increment behaves sanely, and you'll be fine.
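As a rough illustration of that advice, here is a sketch that starts from a plain letters-then-digits value (the prefix "id" and the starting number are arbitrary choices for this example) and lets ++ generate the identifiers:

```perl
use strict;
use warnings;

my $id = 'id98';    # letters followed by digits, so the string increment applies
my @ids;
push @ids, $id++ for 1 .. 4;    # post-increment yields the value before the bump

print "@ids\n";    # id98 id99 ie00 ie01
```

The digits roll over and carry into the letters, and every value stays in the same letters-then-digits shape.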

I don't see why incrementing Zz would return Aa; why do you think it should? 9z incrementing looks like Perl thinks 9z is a number 9 rather than some kind of base-36 weirdness.

=> For alphanumeric strings that start with a letter, like 'bz' or 'Zz', work from the right. The first character is 'z'. As you say, there is nowhere for 'z' to go, so it wraps to 'a' and a carry is passed to the next character on the left. So 'b' increments to 'c'. In the second case, 'Z' has no character to its left. In such cases an extra character from the same range is added as it wraps around.
=> For alphanumeric strings that start with a digit, like '9z', Perl treats the string as a number, taking the leading numeric part (in this case 9), and increments that. So 9 becomes 10.
Please correct me if I am wrong.

Related

How to get the index of the last digit in a string in Perl?

In Perl, how do you find the index of the last digit in a string?
Example: Hello123xyz index of last digit is 7
A RE to match the last digit in the string and the @- variable to get the index of the start of the match:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
sub last_digit_index($) {
if ($_[0] =~ /\d\D*\z/) {
return $-[0];
} else {
return -1;
}
}
say last_digit_index("Hello123xyz"); # 7
I'd probably use the pos function here. Match with /g and Perl remembers where the match left off. The next global match on that string will start where the last match left off, so isolate this in a sub or block to avoid weird effects on subsequent matches on the same variable.
Since positions count from 0, pos will be one greater than the 0-based index of the final digit (that is, equal to its 1-based position). You decide if you want to subtract 1 or not:
use v5.10;
say last_digit_pos('Hello123xyz');
sub last_digit_pos {
    my( $string ) = @_;
    $string =~ m/^.*\d/sg;
    return pos($string); # 8
}
And, if the string doesn't match, pos doesn't return a defined value.
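A possible variant, sketched here under the assumption that a 0-based index is wanted and that -1 should signal "no digit". The name `last_digit_index_pos` is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical variant: 0-based index of the last digit, or -1 if none.
sub last_digit_index_pos {
    my ($string) = @_;
    # In scalar context m//g sets pos() to the offset just past the match.
    return $string =~ m/^.*\d/sg ? pos($string) - 1 : -1;
}

print last_digit_index_pos('Hello123xyz'), "\n";    # 7
print last_digit_index_pos('no digits'), "\n";      # -1
```

Because the match runs on a lexical copy inside the sub, the remembered position cannot leak into later matches on the caller's variable.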
Can also leverage List::MoreUtils::last_index
use List::MoreUtils qw(last_index);
my $last_digit_index = last_index { /[0-9]/ } split '', $string;
I find this simple: break the string into a list of characters with a typical use of split, and use a library to find the last one which is a digit, via a trivial regex.
Note that this is "expensive" as it creates a scalar for each character and runs regex multiple times. So if efficiency matters -- if this is done on an absolutely gigantic string, or many many many times on smaller strings -- then better seek other approaches, or at least benchmark it before deciding on it. (Note, that would have to be really a lot of strings to see degraded efficiency.)

Matching inside square Brackets in Perl

When I try to match 2 variables which are same, it works until there is a square bracket in perl for me.
For ex, VAR1 = u6701, VAR2 = u6701 matches and gives me EQUAL
However, VAR1 = aw[101], VAR2 = aw[101] gives me UNEQUAL.
I use $VAR1 == $VAR2 to check and both the variables are strings. Please help.
Thanks.
== is the numeric equality operator in Perl, it checks that two things are equal as numbers. eq is the string equality operator, that's what you want to be using. "1" and "01.00" are equal as numbers but not as strings. Here's the docs on all the equality operators. There is also the pretty good online book Beginning Perl.
Why == sometimes works is because Perl is pretty liberal, to the point of desperation, about interpreting strings as numbers. Often it will simply consider a string to be 0, but sometimes it will find a number in the string and use it. For example, "101aw" will be interpreted as 101, but "aw101" is 0. Do not rely on this.
BTW Perl will warn you about all this, but not by default. You have to turn on strict and warnings and I highly recommend you do and deal with all the issues it brings up. It will save you (and us) lots of time.
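A small sketch of the difference, using the strings mentioned above. The numeric warnings are switched off locally only so the example runs quietly.

```perl
use strict;
use warnings;

my $num_equal = ("1" == "01.00") ? 1 : 0;    # equal as numbers
my $str_equal = ("1" eq "01.00") ? 1 : 0;    # different as strings

my ($leading, $none);
{
    no warnings 'numeric';    # these conversions would normally warn
    my ($x, $y) = ("101aw", "aw101");
    $leading = $x + 0;        # leading digits are used: 101
    $none    = $y + 0;        # no leading number: 0
}

print "== says $num_equal, eq says $str_equal\n";
print "101aw => $leading, aw101 => $none\n";
```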
In Perl, to compare two string values I recommend using eq.
For example, inside a subroutine:
if ($VAR1 eq $VAR2) {
    return 1;    # true
} else {
    return 0;    # false
}
== tests equality for numbers.
eq does the same for strings.
You can also use the cmp operator, which is the non-numerical equivalent of the <=> operator:
$result = $string1 cmp $string2;
$result will be:
`0` if the strings are equal
`1` if string1 is greater than string2
`-1` if string1 is less than string2
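A brief sketch of cmp next to <=>, with made-up sample data, showing why the choice of operator matters when sorting:

```perl
use strict;
use warnings;

my @results = ('apple' cmp 'banana', 'pear' cmp 'pear', 'b' cmp 'a');
print "@results\n";    # -1 0 1

my @as_strings = sort { $a cmp $b } (10, 9, 100);
my @as_numbers = sort { $a <=> $b } (10, 9, 100);
print "@as_strings\n";    # 10 100 9
print "@as_numbers\n";    # 9 10 100
```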

How exactly does Perl handle operator chaining?

So I have this bit of code that does not work:
print $userInput."\n" x $userInput2; #$userInput = string & $userInput2 is a integer
It prints it out once fine if the number is over 0 of course, but it doesn't print out the rest if the number is greater than 1. I come from a java background and I assume that it does the concatenation first, then the result will be what will multiply itself with the x operator. But of course that does not happen. Now it works when I do the following:
$userInput .= "\n";
print $userInput x $userInput2;
I am new to Perl so I'd like to understand exactly what goes on with chaining, and if I can even do so.
You're asking about operator precedence. ("Chaining" usually refers to chaining of method calls, e.g. $obj->foo->bar->baz.)
The Perl documentation page perlop starts off with a list of all the operators in order of precedence level. x has the same precedence as other multiplication operators, and . has the same precedence as other addition operators, so of course x is evaluated first. (i.e., it "has higher precedence" or "binds more tightly".)
As in Java you can resolve this with parentheses:
print(($userInput . "\n") x $userInput2);
Note that you need two pairs of parentheses here. If you'd only used the inner parentheses, Perl would treat them as indicating the arguments to print, like this:
# THIS DOESN'T WORK
print($userInput . "\n") x $userInput2;
This would print the string once, then duplicate print's return value some number of times. Putting space before the ( doesn't help since whitespace is generally optional and ignored. In a way, this is another form of operator precedence: function calls bind more tightly than anything else.
If you really hate having more parentheses than strictly necessary, you can defeat Perl with the unary + operator:
print +($userInput . "\n") x $userInput2;
This separates the print from the (, so Perl knows the rest of the line is a single expression. Unary + has no effect whatsoever; its primary use is exactly this sort of situation.
This is due to precedence of . (concatenation) operator being less than the x operator. So it ends up with:
use strict;
use warnings;
my $userInput = "line";
my $userInput2 = 2;
print $userInput.("\n" x $userInput2);
And outputs:
line[newline]
[newline]
This is what you want:
print (($userInput."\n") x $userInput2);
This prints out:
line
line
As has already been mentioned, this is a precedence issue, in that the repetition operator (x) has higher precedence than the concatenation operator (.). However, that is not all that's going on here, and the issue itself comes from a bad solution.
First off, when you say
print (($foo . "\n") x $count);
What you are doing is changing the context of the repetition operator to list context.
(LIST) x $count
The above statement really means this (if $count == 3):
print ( $foo . "\n", $foo . "\n", $foo . "\n" ); # list with 3 elements
From perldoc perlop:
Binary "x" is the repetition operator. In scalar context or if the left operand is not enclosed in parentheses, it returns a string consisting of the left operand repeated the number of times specified by the right operand. In list context, if the left operand is enclosed in parentheses or is a list formed by qw/STRING/, it repeats the list. If the right operand is zero or negative, it returns an empty string or an empty list, depending on the context.
The solution works as intended because print takes list arguments. However, if you had something else that takes scalar arguments, such as a subroutine:
foo(("text" . "\n") x 3);
sub foo {
    # @_ is now the list ("text\n", "text\n", "text\n");
    my ($string) = @_; # error enters here
    # $string is now "text\n"
}
This is a subtle difference which might not always give the desired result.
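The context difference can be seen in a short sketch (variable names are arbitrary):

```perl
use strict;
use warnings;

my @list    = ("ab") x 3;       # parenthesized left operand in list context: 3 elements
my $string  = "ab" x 3;         # no parentheses: one repeated string
my ($first) = ("ab\n") x 3;     # list assignment keeps only the first element

print scalar(@list), "\n";      # 3
print "$string\n";              # ababab
print $first;                   # ab
```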
A better solution for this particular case is to not use the concatenation operator at all, because it is redundant:
print "$foo\n" x $count;
Or even use more mundane methods:
for (1 .. $count) {
    print "$foo\n";
}
Or
use feature 'say';
...
say $foo for 1 .. $count;

Skipping particular positions in a string using substitution operator in perl

Yesterday, I got stuck in a perl script. Let me simplify it, suppose there is a string (say ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD), first I've to break it at every position where "E" comes, and secondly, break it specifically where the user wants to be at. But, the condition is, program should not cut at those sites where E is followed by P. For example there are 6 Es in this sequence, so one should get 7 fragments, but as 2 Es are followed by P one will get 5 only fragments in the output.
I need help regarding the second case. Suppose user doesn't wants to cut this sequence at, say 5th and 10th positions of E in the sequence, then what should be the corresponding script to let program skip these two sites only? My script for first case is:
my $otext = 'ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD';
$otext=~ s/([E])/$1=/g; #Main cut rule.
$otext=~ s/=P/P/g;
@output = split( /\=/, $otext);
print "@output";
Please do help!
To split on "E" except where it's followed by "P", you should use Negative look-ahead assertions.
From perldoc perlre "Look-Around Assertions" section:
(?!pattern)
A zero-width negative look-ahead assertion.
For example /foo(?!bar)/ matches any occurrence of "foo" that isn't followed by "bar".
my $otext = 'ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD';
# E E EP E EP E
my @output=split(/E(?!P)/, $otext);
use Data::Dumper; print Data::Dumper->Dump([\@output]);
$VAR1 = [
'ABCD',
'ABCD',
'ABCDEPABCD',
'ABCDEPABCD',
'ABCD'
];
Now, in order to NOT cut at occurrences #2 and #4, you can do 2 things:
1. Concoct a really fancy regex that automatically fails to match on a given occurrence. I will leave that to someone else to attempt in an answer for completeness' sake.
2. Simply stitch together the correct fragments.
I'm too brain-dead to come up with a good idiomatic way of doing it, but the simple and dirty way is either:
my %no_cuts = map { ($_=>1) } (2,4); # Do not cut in positions 2,4
my @output_final;
for(my $i=0; $i < @output; $i++) {
    if ($no_cuts{$i}) {
        $output_final[-1] .= $output[$i];
    } else {
        push @output_final, $output[$i];
    }
}
print Data::Dumper->Dump([\@output_final]);
$VAR1 = [
'ABCD',
'ABCDABCDEPABCD',
'ABCDEPABCDABCD'
];
Or, simpler:
my %no_cuts = map { ($_=>1) } (2,4); # Do not cut in positions 2,4
for(my $i=0; $i < @output; $i++) {
    next unless $no_cuts{$i};      # Only merge at the no-cut positions
    $output[$i-1] .= $output[$i];
    $output[$i]=undef; # Make the slot empty
}
my @output_final = grep { defined } @output; # Skip empty slots
print Data::Dumper->Dump([\@output_final]);
$VAR1 = [
'ABCD',
'ABCDABCDEPABCD',
'ABCDEPABCDABCD'
];
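Another way to skip chosen cut sites, without stitching fragments afterwards, is to walk the matches and count them. This is only a sketch under stated assumptions: the helper name `split_skipping` is hypothetical, occurrences are counted from 1 over the cuttable E's only (those not followed by P), and the E is kept at the end of each fragment, as in the question's own script.

```perl
use strict;
use warnings;

# Hypothetical helper: cut after each "E" not followed by "P",
# except at the 1-based occurrence numbers listed in @$skip.
sub split_skipping {
    my ($text, $skip) = @_;
    my %skip  = map { $_ => 1 } @$skip;
    my $count = 0;
    my $start = 0;
    my @fragments;
    while ($text =~ /E(?!P)/g) {
        next if $skip{ ++$count };    # leave this site uncut
        my $end = pos($text);         # offset just past the matched E
        push @fragments, substr($text, $start, $end - $start);
        $start = $end;
    }
    push @fragments, substr($text, $start) if $start < length $text;
    return @fragments;
}

my $otext = 'ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD';
print "$_\n" for split_skipping($otext, [1, 2]);
```

With the first two sites skipped, this should give the fragments ABCDEABCDEABCDEPABCDE, ABCDEPABCDE and ABCD.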
Here's a dirty trick that exploits two facts:
normal text strings never contain null bytes (if you don't know what a null byte is, you should as a programmer: http://en.wikipedia.org/wiki/Null_character, and nb. it is not the same thing as the number 0 or the character 0).
perl strings can contain null bytes if you put them there, but be careful, as this may screw up some perl internal functions.
The "be careful" is just a point to be aware of. Anyway, the idea is to substitute a null byte at the point where you don't want breaks:
my $s = "ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD";
my @nobreak = (4,9);
foreach (@nobreak) {
    substr($s, $_, 1) = "\0";
}
"\0" is an escape sequence representing a null byte like "\t" is a tab. Again: it is not the character 0. I used 4 and 9 because there were E's in those positions. If you print the string now it looks like:
ABCDABCDABCDEPABCDEABCDEPABCDEABCD
Because null bytes don't display, but they are there, and we are going to swap them back out later. First the split:
my @a = split(/E(?!P)/, $s);
Then swap the zero bytes back:
$_ =~ s/\0/E/g foreach (@a);
If you print @a now, you get:
ABCDEABCDEABCDEPABCD
ABCDEPABCD
ABCD
Which is exactly what you want. Note that split removes the delimiter (in this case, the E); if you intended to keep those you can tack them back on again afterward. If the delimiter is from a more dynamic regex it is slightly more complicated, see here:
http://perlmeme.org/howtos/perlfunc/split_function.html
"Example 9. Keeping the delimiter"
If there is some possibility that the @nobreak positions are not E's, then you must also keep track of those when you swap them out to make sure you replace with the correct character again.

How do I get the length of a string in Perl?

What is the Perl equivalent of strlen()?
length($string)
perldoc -f length
length EXPR
length
Returns the length in characters of the value of EXPR. If EXPR is omitted, returns length of $_. Note that this cannot be used on an entire array or hash to find out how many elements these have. For that, use "scalar @array" and "scalar keys %hash" respectively.
Note the characters: if the EXPR is in Unicode, you will get the number of characters, not the number of bytes. To get the length in bytes, use "do { use bytes; length(EXPR) }", see bytes.
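A quick sketch of the character/byte distinction. Instead of the bytes pragma quoted above, it measures the UTF-8 encoded form with the core Encode module, which is a swapped-in technique chosen for this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $smiley = "\x{263A}";                          # a single character
my $chars  = length($smiley);                     # length in characters
my $bytes  = length(encode('UTF-8', $smiley));    # length of its UTF-8 encoding

print "$chars character(s), $bytes byte(s)\n";    # 1 character(s), 3 byte(s)
```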
Although 'length()' is the correct answer that should be used in any sane code, Abigail's length horror should be mentioned, if only for the sake of Perl lore.
Basically, the trick consists of using the return value of the catch-all transliteration operator:
print "foo" =~ y===c; # prints 3
y///c replaces all characters with themselves (thanks to the complement option 'c'), and returns the number of characters replaced (so, effectively, the length of the string).
length($string)
The length() function:
$string ='String Name';
$size=length($string);
You shouldn't use this, since length($string) is simpler and more readable, but I came across some of these while looking through code and was confused, so in case anyone else does, these also get the length of a string:
my $length = map $_, $str =~ /(.)/gs;
my $length = () = $str =~ /(.)/gs;
my $length = split '', $str;
The first two work by using the global flag to match each character in the string, then using the returned list of matches in a scalar context to get the number of characters. The third works similarly by splitting on each character instead of regex-matching and using the resulting list in scalar context.
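The `() =` form in the second line is a general counting idiom, not something specific to lengths. A minimal sketch with an arbitrary sample string:

```perl
use strict;
use warnings;

my $text = 'banana';

# Assigning to an empty list forces list context on the match; the outer
# scalar assignment then yields how many elements that list contained.
my $count = () = $text =~ /a/g;

# The same count, written out in two steps.
my @matches = $text =~ /a/g;
my $same    = @matches;

print "$count $same\n";    # 3 3
```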