Arithmetic with strings that contain numbers - perl

What is the Perlish way to deal with arithmetic with strings that contain numbers?
Example: say I'm dealing with font sizes that are represented as strings like "120px". I know the values will always be formatted as digits followed by non-digit characters, and I know that Perl ignores any trailing non-numeric characters of a string in arithmetic. So could I do something like the following (with appropriate comments)?
#! /usr/bin/env perl
use warnings;
use strict;
use utf8;
use constant FONT_UNIT => 4;
my $font_size = "120px";
STDOUT->print("${font_size}\n");
$font_size = do {no warnings; $font_size + FONT_UNIT}."px";
STDOUT->print("${font_size}\n");
exit (0);
I ask because this feature of the language really works here.

Yes, your approach seems ok. You can also use substitution with evaluation:
$font_size =~ s/([0-9]+)/$1 + FONT_UNIT/e;
Or, if you need clarity, just extract the number, change it, and glue the parts back:
my ($size, $unit) = $font_size =~ /([0-9]+)(.*)/;
$size += FONT_UNIT;
STDOUT->say("$size$unit");

The only two suggestions I'd make would be to be more precise about which warnings you're turning off and to use a slightly larger naked code block to make it more readable.
#! /usr/bin/env perl
use warnings;
use strict;
use utf8;
use constant FONT_UNIT => 4;
my $font_size = "120px";
STDOUT->print("${font_size}\n");
{
no warnings 'numeric';
$font_size = ($font_size + FONT_UNIT) . "px";
}
STDOUT->print("${font_size}\n");
exit (0);

What is the Perlish way to deal with arithmetic with strings that contain numbers?
When Perl uses a string as a number, the following rules apply:
Leading and trailing whitespace is ignored (without generating a warning).
inf, infinity, nan, case insensitive, after stripping whitespace, and with optional leading + or -, are treated as those special numbers (without generating a warning).
0 but true (no extra whitespace allowed) is treated as the number 0 (without generating a warning).
Any leading thing that looks like an integer or decimal number with an optional following e or E and optionally signed exponent is treated as that number (to the extent it can be represented in a numeric type). If any non-whitespace characters remain afterwards, an "isn't numeric" warning is generated.
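To see these rules in action, here is a minimal sketch. The helper name numify and the sample strings are my own choices for illustration. It adds 0 to each string and records whether Perl emitted a warning while doing so.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: numify a string and report whether Perl warned.
sub numify {
    my ($str) = @_;
    my $warned = 0;
    # Catch the "isn't numeric" warning instead of printing it.
    local $SIG{__WARN__} = sub { $warned = 1 };
    my $num = $str + 0;
    return ($num, $warned);
}

for my $str ('120px', '  42  ', '0 but true', '1.5e3xyz', 'px120') {
    my ($num, $warned) = numify($str);
    printf "%-14s => %-6s %s\n", "'$str'", $num, $warned ? '(warned)' : '';
}
```

A string such as 'px120' has no leading number at all, so it numifies to 0 with a warning. That is why the trick in the question only works when the digits come first.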

Related

Chopping the last sequence of a pattern

I have this series of values
rd_8KB_rms
rd_8KB_rms_qd1
rd_8KB_wh
rd_8KB_wh_q1
rd_8KB_wms
rd_8KB_wms_qd1
rd_256K_rms
rd_256K_rms_1
and where there are 3 underscores I would like to chop the last underscore and the characters that follow it (which vary in number). I think I have tried variations of substr, split, and regex, but can't find anything that works.
You can use transliteration tr/_// to count the number of underscores and substitution s/_[^_]*$// to remove the part from the last underscore to the end.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
while (<DATA>) {
chomp;
s/_[^_]*$// if tr/_// == 3;
say;
}
__DATA__
rd_8KB_rms
rd_8KB_rms_qd1
rd_8KB_wh
rd_8KB_wh_q1
rd_8KB_wms
rd_8KB_wms_qd1
rd_256K_rms
rd_256K_rms_1
If there can be even more underscores, use a variant like
s/_[^_]*$// until tr/_// < 3;
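The same idea can be wrapped in a small function. This is only a sketch: the name trim_segments and its $max parameter are hypothetical, and the third sample value is made up to show the case with extra underscores.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: drop trailing "_segment" parts until at most
# $max underscores remain in the name.
sub trim_segments {
    my ($name, $max) = @_;
    # tr/_// without a replacement list only counts, it changes nothing.
    $name =~ s/_[^_]*$// while ($name =~ tr/_//) > $max;
    return $name;
}

print trim_segments($_, 2), "\n"
    for qw(rd_8KB_rms rd_8KB_rms_qd1 rd_256K_rms_1_x_y);
```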

truncate string in perl into substring with trailing ellipsis

I'm trying to truncate a string in a select input option using Perl if it is longer than a set value, though I can't get it to work correctly.
my $value = defined $option->{value} ? $option->{value} : '';
my $maxValueLength = 50;
if ($value.length > $maxValueLength) {
$value = substr $value, 0, $maxValueLength + '...';
}
Another option is regex
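$string =~ s/.{$maxLength}\K.+/.../s;
It matches any character (.) a given number of times ({N}, here $maxLength), which is the first $maxLength characters in $string; then \K makes it "forget" everything matched so far, so that part won't get replaced. The rest of the string, matched by .+, is then replaced by three dots. The pattern uses .+ rather than .* so that a string of exactly $maxLength characters is left alone (with .* it would still match and get dots appended), and the /s modifier lets . match newlines as well.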
See Lookaround assertions in perlre for \K.
This does start the regex engine for a simple task but it doesn't need any conditionals -- if the string is shorter than the maximum length the regex won't match and nothing happens.
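A quick way to check the boundary cases is a sketch like the one below. The function name truncate_re and the sample strings are made up for illustration; note that it uses .+ after \K so that a string of exactly the maximum length is not given an ellipsis.

```perl
use strict;
use warnings;

# Hypothetical helper: truncate with a single substitution.
sub truncate_re {
    my ($string, $maxLength) = @_;
    # Keep the first $maxLength characters (\K), replace whatever follows.
    $string =~ s/.{$maxLength}\K.+/.../s;
    return $string;
}

for my $string ('short', 'exactly 10', 'definitely longer than ten') {
    print truncate_re($string, 10), "\n";
}
```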
Your code has a couple of mistakes, even though Perl accepts it as valid syntax. Turn on use strict and use warnings if you haven't already, and then read the messages Perl prints. This is a bit tricky because of Perl's very complex syntax (see also Damian Conway's keynote from the 2020 Perl and Raku Conference), but it boils down to these:
Use of uninitialized value in concatenation (.) or string at line 7
Argument "..." isn't numeric in addition (+) at line 8
I've used the following adaptation of your code to produce these:
use strict;
use warnings;
my $value = '1234567890' x 10;
my $maxValueLength = 50;
if ( $value.length > $maxValueLength ) {
$value = substr $value, 0, $maxValueLength + '...';
}
print $value;
Now let's see what they mean.
The . operator in Perl is a concatenation. You cannot use it to call methods, and length is not a method on a string. Perl thinks you are using the built-in length (a function, not a method) without an argument, which makes it default to $_. Most built-ins do this, to make one-liners shorter. But $_ is not defined. Now the . tries to concatenate the length of undef to $value. And using undef in a string operation leads to this warning.
The correct way of doing this is length $value (or with parentheses if you prefer them, length($value)).
The + operator is not concatenation (we just learned that the . is). It's a numerical addition. Perl is pretty good at converting between strings and numbers as there aren't really any types, so saying 1 + "5" would give you 6 without problems, but it cannot do that for a couple of dots in a string. Hence it complains about a non-number value in an addition.
You want the substring with a given length, and then you want to attach the three dots. Because of operator precedence (or stickiness), you need parentheses () around the arguments of your substr call; otherwise the . would bind to the last argument instead of to the result.
$value = substr($value, 0, $maxValueLength) . '...';
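Putting both fixes together, a minimal sketch could look like this. The function name truncate_with_ellipsis is hypothetical; it treats an undefined value as an empty string, as the question does.

```perl
use strict;
use warnings;

# Hypothetical helper combining length() and a parenthesised substr().
sub truncate_with_ellipsis {
    my ($value, $maxValueLength) = @_;
    $value = '' unless defined $value;
    # Strings that already fit are returned unchanged.
    return $value if length($value) <= $maxValueLength;
    return substr($value, 0, $maxValueLength) . '...';
}

my $value = '1234567890' x 10;
print truncate_with_ellipsis($value, 50), "\n";
```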
To find the length of a string, use length(STRING).
Here is a code snippet showing how you can modify the script.
#!/usr/bin/perl
use strict;
use warnings;
use feature qw(say);
my $string = "abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz";
say "length of original string is:".length($string);
my $value = defined $string ? $string : '';
my $maxValueLength = 50;
if (length($value) > $maxValueLength) {
$value = substr $value, 0, $maxValueLength;
say "value:$value";
say "value's length:".length($value);
}
Output:
length of original string is:80
value:abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvw
value's length:50

Perl if Statement only specific length of numbers

Is it possible to have an if statement that checks whether my $expression consists only of digits and has fewer than 12 of them? Something like:
if ($expression has fewer than 12 digits and only digits)
You can match it using regex. Below is the code snippet.
#!/usr/bin/perl
use strict;
use warnings;
use feature qw(say);
my $exp = "1234567898711";
if ($exp =~ /^\d{12}$/) {
say "Matched expression: $exp";
} else {
say "Not matched";
}
EDIT:
If you want to look for 12 digits or fewer, use the expression below, keeping the ^ and $ anchors:
^\d{1,12}$
Note: This expression only works when the value consists purely of digits. If it is alphanumeric, the pattern needs to be changed accordingly.
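As a small sketch of the anchored check, the function below takes the digit limit as a parameter. The name is_short_number is hypothetical. It uses [0-9] instead of \d so that only ASCII digits count, and \A and \z so that a trailing newline is not accepted.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw(say);

# Hypothetical helper: true if $expression is 1..$max_digits ASCII digits.
sub is_short_number {
    my ($expression, $max_digits) = @_;
    return $expression =~ /\A[0-9]{1,$max_digits}\z/ ? 1 : 0;
}

for my $exp ('1234567898711', '123456789012', '42', '12a') {
    say "$exp: ", is_short_number($exp, 12) ? 'matched' : 'not matched';
}
```

For strictly fewer than 12 digits, as in the question, pass 11 as the limit.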

Convert Hex to Binary and keep leading 0's Perl

I have an array of hex numbers that I'd like to convert to binary numbers, the problem is, in my code it removes the leading 0's for things like 0,1,2,3. I need these leading 0's to process in a future section of my code. Is there an easy way to convert Hex to Binary and keep my leading 0's in perl?
use strict;
use warnings;
my @binary;
my @hex = ('ABCD', '0132', '2211');
foreach my $h (@hex) {
my $bin = sprintf( "%b", hex($h));
push @binary, $bin;
}
foreach (@binary) {
print "$_\n";
}
running the code gives me
1010101111001101
100110010
10001000010001
Edit: Found a similar answer using pack and unpack, replaced
sprintf( "%b", hex($h));
with
unpack( 'B*', pack('H*', $h));
You can specify the width of the output in sprintf or printf by putting the number between the % and the format character like this.
printf "%16b\n",hex("0132");
and by preceding the number with 0, make it pad the result with 0s like this
printf "%016b\n",hex("0132");
the latter giving the result of
0000000100110010
But this is all covered in the documentation for those functions.
This solution uses the length of the hex representation to determine the length of the binary representation:
for my $num_hex (@nums_hex) {
my $num = hex($num_hex);
my $num_bin = sprintf('%0*b', length($num_hex)*4, $num);
...
}
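For comparison, here is a short sketch that runs the question's sample values through both approaches. The function name hex_to_bin is made up for this example. Both give four binary digits per hex digit for these inputs.

```perl
use strict;
use warnings;

# Hypothetical helper: zero-padded binary, four bits per hex digit.
sub hex_to_bin {
    my ($hex) = @_;
    return sprintf '%0*b', 4 * length($hex), hex($hex);
}

my @hex = ('ABCD', '0132', '2211');
for my $h (@hex) {
    my $via_sprintf = hex_to_bin($h);
    # pack 'H*' turns the hex digits into bytes, unpack 'B*' lists their bits.
    my $via_pack = unpack 'B*', pack('H*', $h);
    print "$h  $via_sprintf  $via_pack\n";
}
```

The sprintf variant goes through hex(), so it is limited to values that fit in a native integer, while the pack/unpack variant works on the digits as a string.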

performance issue with substr on a very long UTF-8 string

I am using substr on a very long UTF-8 string (~250,000,000 characters).
The thing is, my program almost freezes around the 200,000,000th character.
Does somebody know about this issue? What are my options?
As I am indexing a document using a suffix array, I need:
to keep my string in one piece;
to access variable length substrings using an index.
As for a MWE:
use strict;
use warnings;
use utf8;
my $text = 'あいうえお' x 50000000;
for( my $i = 0 ; $i < length($text) ; $i++ ){
print "\r$i";
my $char = substr($text,$i,1);
}
print "\n";
Perl has two string storage formats. One that's capable of storing 8-bit characters, and one capable of storing 72-bit characters (practically limited to 32 or 64). Your string necessarily uses the latter format. This wide-character format uses a variable number of bytes per character like UTF-8 does.
Finding the ith element of a string in the first format is trivial: Add the offset to the string pointer. With the second format, finding the ith character requires scanning the string from the beginning, just like you would have to scan a file from the beginning to find the nth line. There is a mechanism that caches information about the string as it's discovered, but it's not perfect.
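You can check which format a given string uses with a sketch like the following. The function name storage_info is hypothetical, and use bytes is used here purely for inspection, not as something to rely on in normal code.

```perl
use strict;
use warnings;

# Hypothetical helper: character count, internal byte count, and whether
# the string is stored in the variable-width format.
sub storage_info {
    my ($s) = @_;
    my $bytes = do { use bytes; length $s };
    return (length($s), $bytes, utf8::is_utf8($s) ? 1 : 0);
}

my $ascii = 'abcde';
my $kana  = "\x{3042}\x{3044}\x{3046}\x{3048}\x{304A}";   # the five hiragana from the question

for my $s ($ascii, $kana) {
    my ($chars, $bytes, $wide) = storage_info($s);
    printf "chars=%d internal_bytes=%d wide_format=%s\n",
        $chars, $bytes, $wide ? 'yes' : 'no';
}
```

Each hiragana character takes three bytes internally, which is why the position of the ith character cannot be computed directly.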
The problem goes away if you use a fixed number of bytes per character.
use utf8;
use Encode qw( encode );
my $text = 'あいうえお' x 50000000;
my $packed = encode('UCS-4le', $text);
for my $i (0 .. length($packed)/4 - 1) {
print "\r$i";
my $char = chr(unpack('V', substr($packed, $i*4, 4)));
}
print "\n";
Note that the string will use 33% more memory for hiragana characters. Or maybe not, since there's no cache anymore.
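Since the question asks for variable-length substrings by index, here is a sketch of how that could look on a fixed-width buffer. It assumes the big-endian variant (UTF-32BE) so that vec with 32 bits can read a code point directly; the function names char_at and substr_at are hypothetical.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Hypothetical helper: character at index $i of a UTF-32BE buffer.
sub char_at {
    my ($packed, $i) = @_;
    # vec with 32 bits reads a big-endian 4-byte group at index $i.
    return chr vec($packed, $i, 32);
}

# Hypothetical helper: $len characters starting at character index $off.
sub substr_at {
    my ($packed, $off, $len) = @_;
    my $chunk = substr($packed, 4 * $off, 4 * $len);
    return decode('UTF-32BE', $chunk);
}

my $text   = "\x{3042}\x{3044}\x{3046}\x{3048}\x{304A}" x 3;
my $packed = encode('UTF-32BE', $text);

binmode(STDOUT, ':encoding(UTF-8)');
print char_at($packed, 1), "\n";
print substr_at($packed, 3, 4), "\n";
```

Every lookup is plain pointer arithmetic on a byte string, at the cost of four bytes per character.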
I suggest that you use a regular expression instead of substr.
Benchmarking these two methods shows that a regex is nearly 100 times faster:
use strict;
use warnings;
use utf8;
my $text = 'あいうえお' x 50_000;
sub mysubstr {
for( my $i = 0 ; $i < length($text) ; $i++ ){
my $char = substr($text,$i,1);
}
}
sub myregex {
while ($text =~ /(.)/gs) {   # /s so that . also matches newlines
my $char = $1;
}
}
use Benchmark qw(:all) ;
timethese(10, {
'substr' => \&mysubstr,
'regex' => \&myregex,
});
Outputs:
Benchmark: timing 10 iterations of regex, substr...
regex: 2 wallclock secs ( 2.18 usr + 0.00 sys = 2.18 CPU) # 4.58/s (n=10)
substr: 198 wallclock secs (184.66 usr + 0.16 sys = 184.81 CPU) # 0.05/s (n=10)
It is a known issue listed under Bugs for Perl 5.20.0:
http://perldoc.perl.org/perlunicode.html#Speed
The most important part is the first paragraph of my quote:
Speed
Some functions are slower when working on UTF-8 encoded strings than on byte encoded strings. All functions that need to hop over characters such as length(), substr() or index(), or matching regular expressions can work much faster when the underlying data are byte-encoded.
In Perl 5.8.0 the slowness was often quite spectacular; in Perl 5.8.1 a caching scheme was introduced which will hopefully make the slowness somewhat less spectacular, at least for some operations. In general, operations with UTF-8 encoded strings are still slower. As an example, the Unicode properties (character classes) like \p{Nd} are known to be quite a bit slower (5-20 times) than their simpler counterparts like \d (then again, there are hundreds of Unicode characters matching Nd compared with the 10 ASCII characters matching \d).
The easiest way to avoid it is using byte-strings instead of unicode-strings.
In your particular sample, you can just remove characters from the beginning of the $text string as they are processed in order to avoid the linear lookup:
use utf8;
use Encode qw( encode );
$| = 1;
my $text = 'あいうえお' x 50000000;
while ($text ne '') {
print ".";
my $char = substr($text, 0, 1, '');
}
print "\n";