How do I get the length of a string in Perl? - perl

What is the Perl equivalent of strlen()?

length($string)
perldoc -f length
length EXPR
length Returns the length in characters of the value of EXPR. If EXPR is
omitted, returns length of $_. Note that this cannot be used on an
entire array or hash to find out how many elements these have. For
that, use "scalar #array" and "scalar keys %hash" respectively.
Note the characters: if the EXPR is in Unicode, you will get the num-
ber of characters, not the number of bytes. To get the length in
bytes, use "do { use bytes; length(EXPR) }", see bytes.

Although 'length()' is the correct answer that should be used in any sane code, Abigail's length horror should be mentioned, if only for the sake of Perl lore.
Basically, the trick consists of using the return value of the catch-all transliteration operator:
print "foo" =~ y===c; # prints 3
y///c replaces all characters with themselves (thanks to the complement option 'c'), and returns the number of character replaced (so, effectively, the length of the string).

length($string)

The length() function:
$string ='String Name';
$size=length($string);

You shouldn't use this, since length($string) is simpler and more readable, but I came across some of these while looking through code and was confused, so in case anyone else does, these also get the length of a string:
my $length = map $_, $str =~ /(.)/gs;
my $length = () = $str =~ /(.)/gs;
my $length = split '', $str;
The first two work by using the global flag to match each character in the string, then using the returned list of matches in a scalar context to get the number of characters. The third works similarly by splitting on each character instead of regex-matching and using the resulting list in scalar context

Related

truncate string in perl into substring with trailing elipses

I'm trying to truncate a string in a select input option using perl if it is longer than a set value, though i can't get it to work correctly.
my $value = defined $option->{value} ? $option->{value} : '';
my $maxValueLength = 50;
if ($value.length > $maxValueLength) {
$value = substr $value, 0, $maxValueLength + '...';
}
Another option is regex
$string =~ s/.{$maxLength}\K.*/.../;
It matches any character (.) given number of times ({N}, here $maxLength), what is the first $maxLength characters in $string; then \K makes it "forget" all previous matches so those won't get replaced later. The rest of the string that is matched is then replaced by ...
See Lookaround assertions in perlre for \K.
This does start the regex engine for a simple task but it doesn't need any conditionals -- if the string is shorter than the maximum length the regex won't match and nothing happens.
Your code has several syntax errors. Turn on use strict and use warnings if you don't have it, and then read the error messages it tells you about. This is a bit tricky because of Perl's very complex syntax (see also Damian Conway's keynote from the 2020 Perl and Raku Conference), but it boils down to these:
Use of uninitialized value in concatenation (.) or string at line 7
Argument "..." isn't numeric in addition (+) at line 8
I've used the following adaption of your code to produce these
use strict;
use warnings;
my $value = '1234567890' x 10;
my $maxValueLength = 50;
if ( $value.length > $maxValueLength ) {
$value = substr $value, 0, $maxValueLength + '...';
}
print $value;
Now let's see what they mean.
The . operator in Perl is a concatenation. You cannot use it to call methods, and length is not a method on a string. Perl thinks you are using the built-in length (a function, not a method) without an argument, which makes it default to $_. Most built-ins do this, to make one-liners shorter. But $_ is not defined. Now the . tries to concatenate the length of undef to $value. And using undef in a string operation leads to this warning.
The correct way of doing this is length $value (or with parentheses if you prefer them, length($value)).
The + operator is not concatenation (we just learned that the . is). It's a numerical addition. Perl is pretty good at converting between strings and numbers as there aren't really any types, so saying 1 + "5" would give you 6 without problems, but it cannot do that for a couple of dots in a string. Hence it complains about a non-number value in an addition.
You want the substring with a given length, and then you want to attach the three dots. Because of associativity (or stickyness) of operators you will need to use parentheses () for your substr call.
$value = substr($value, 0, $maxValueLength) . '...';
To find a length of the string use length(STRING)
Here is the code snippet how you can modify the script.
#!/usr/bin/perl
use strict;
use warnings;
use feature qw(say);
my $string = "abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz";
say "length of original string is:".length($string);
my $value = defined $string ? $string : '';
my $maxValueLength = 50;
if (length($value) > $maxValueLength) {
$value = substr $value, 0, $maxValueLength;
say "value:$value";
say "value's length:".length($value);
}
Output:
length of original string is:80
value:abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvw
value's length:50

Substitution on string in perl changes string to an integer value

I am trying to do delete some characters matching a regex in perl and when I do that it returns an integer value.
I have tried substituting multiple spaces in a string with empty string or basically deleting the space.
#! /usr/intel/bin/perl
my $line = "foo/\\bar car";
print "$line\n";
#$line = ~s/(\\|(\s)+)+//; <--Ultimately need this, where backslash and space needs to be deleted. Tried this, returns integer value
$line = ~s/\s+//; <-- tried this, returns integer value
print "$line\n";
Expected results:
First print: foo/\bar car
Second print: foo/barcar
Actual result:
First print: foo/\\bar car
Second print: 18913234908
The proper solution is
$line =~ s/[\s\\]+//g;
Note:
g flag to substitute all occurrences
no space between = and ~
=~ is a single operator, binding the substitution operator s to the target variable $line.
Inserting a space (as in your code) means s binds to the default target, $_, because there is no explicit target, and then the return value (which is the number of substitutions made) has all its bits inverted (unary ~ is bitwise complement) and is assigned to $line.
In other words,
$line = ~ s/...//
parses as
$line = ~(s/...//)
which is equivalent to
$line = ~($_ =~ s/...//)
If you had enabled use warnings, you would've gotten the following message:
Use of uninitialized value $_ in substitution (s///) at prog.pl line 6.
You've already accepted an answer, but I thought it would be useful to give you a few more details.
As you now know,
$line = ~s/\s+//;
is completely different to:
$line =~ s/\s+//;
You wanted the second, but you typed the first. So what did you end up with?
~ is "bitwise negation operator". That is, it converts its argument to a binary number and then bit-flips that number - all the zeroes become ones and all the ones become zeros.
So you're asking for the bitwise negation of s/\s+//. Which means the bitwise negation works on the value returned by s/\s+//. And the value returned by a substitution is the number of substitutions made.
We can now work out all of the details.
s/\s+// carries out your substitution and returns the number of substitutions made (an integer).
~s/\s+// returns the bitwise negation of the integer returned by the substitution (which is also an integer).
$line = ~s/\s+// takes that second integer and assigns it to the variable $line.
Probably, the first step returns 1 (you don't use /g on your s/.../.../, so only one substitution will be made). It's easy enough to get the bitwise negation of 1.
$ perl -E'say ~1'
18446744073709551614
So that might well be the integer that you're seeing (although it might be different on a 32-bit system).

Learning Perl, One-Liner behaves differently

UPDATE
As pointed out in the answer, this question really has to do with Scalar versus List Context in Perl.
## ## ##
I am learning perl via self-taught crash course (primarily with the Llama book and the web). In attempting some byte swap code, I have found a one liner I do not understand completely. A comment in the script explains what I think is happening in the one-liner.
#!/usr/bin/perl --
#
# Script to print byte-swapped hex values
#
use 5.010;
use warnings;
use strict;
# NOTE: I realize I could use a single variable 'data', but the x- y- z- prefixes may help in
# identification (for clarity) in the code for this SO question.
my $xData;
my $yData;
my $zData;
for ( my $ijk = 998; $ijk < 1001; $ijk++ )
{
printf ( "\n%4d is hex " . (sprintf "0x%04X", $ijk) . "\n", $ijk );
# with byte swap
say "These numbers (bytes swapped) should match...";
# do sprintf, match pattern and store to ($1)($2), now reverse them into ($2)($1).
# BindOp leaves $_ alone, match stuffs $_ & is then used as input for reverse, prints.
say reverse ((sprintf "%04X", $ijk) =~ /(..)(..)/) ; # from perlmonks' webpage
$yData = (reverse ((sprintf "%04X", $ijk) =~ /(..)(..)/) );
say $yData; # does NOT match
$xData = sprintf "%04X", $ijk;
$xData =~ s/(..)(..)/$2$1/ ;
say $xData; # does match
$_ = sprintf "%04X", $ijk;
/(..)(..)/;
$zData = $2 . $1 ;
say $zData; # does match
}
exit 0;
OUTPUT:
998 is hex 0x03E6 These numbers (bytes swapped) should match...
E603
6E30
E603
E603
999 is hex 0x03E7 These numbers (bytes swapped) should match...
E703
7E30
E703
E703
1000 is hex 0x03E8 These numbers (bytes swapped) should match...
E803
8E30
E803
E803
Why does the one liner work, and why doesn't the $yData perform the same way? I'm pretty sure I understand why $xData and $zData work, but I would expect $yData to be the closest equivalent non-one-liner. What is the closest equivalent non-one-liner and why? Where is the discrepancy?
The reverse in your print (say) statement comes in the list context, while when assigned to $yData the context is scalar. This function (may) behave considerably differently based on the context.
From perldoc -f reverse
reverse LIST
In list context, returns a list value consisting of the elements of LIST in the opposite order. In scalar context, concatenates the elements of LIST and returns a string value with all characters in the opposite order.
In this case this produces different results.
When called in list context, it swaps the (two) input bytes, keeping each byte intact (represented by two hexadecimal digits matched in a group). When called in scalar context, it joins the input and returns a character string, running in the opposite order. Taken to represent a hex number this would have each byte changed.

In Perl, how can I tell split not to strip empty trailing fields?

Was trying to count the number of lines in a string of text (including empty lines). A little surprised by the behavior of split. Had expected the following to output 2 but it printed 1 on my perl 5.14.2.
$str = "hello\
world\n\n";
#a = split(/\n/, $str);
print $#a, "\n";
Seems that split() is insensitive to consecutive \n (add more \n's at the end of the string will not increase the printout). The only I can get it sort of close to giving the number of lines is
$str = "hello\
world\n\n";
#a = split(/(\n)/, $str);
printf "%d\n", ($#a + 1)/2, "\n";
But it looks more like a workaround than a straight solution. Any ideas?
perldoc -f split:
If LIMIT is negative, it is treated as if it were instead
arbitrarily large; as many fields as possible are produced.
If LIMIT is omitted (or, equivalently, zero), then it is usually
treated as if it were instead negative but with the exception that
trailing empty fields are stripped (empty leading fields are
always preserved); if all fields are empty, then all fields are
considered to be trailing (and are thus stripped in this case).
$ perl -E 'my $x = "1\n2\n\n"; my #x = split /\n/, $x, -1; say $#x'
3
Perhaps the problem is that you are using $#a when scalar #a is what you are actually looking for?
I apologize if you are already aware of this or if this is not the issue, but $#a returns the index of the last element of #a and (scalar #a) returns the number of elements that #a contains. Since array indexing starts at 0, $#a is one less than scalar #a.

Modify the last two characters of a string in Perl

I am looking for a solution to a problem:
I have the NSAP address which is 20 characters long:
39250F800000000000000100011921680030081D
I now have to replace the last two characters of this string with F0 and the final string should look like:
39250F80000000000000010001192168003008F0
My current implementation chops the last two characters and appends F0 to it:
my $nsap = "39250F800000000000000100011921680030081D";
chop($nsap);
chop($nsap);
$nsap = $nsap."F0";
Is there a better way to accomplish this?
You can use substr:
substr ($nsap, -2) = "F0";
or
substr ($nsap, -2, 2, "F0");
Or you can use a simple regex:
$nsap =~ s/..$/F0/;
This is from substr's manpage:
substr EXPR,OFFSET,LENGTH,REPLACEMENT
substr EXPR,OFFSET,LENGTH
substr EXPR,OFFSET
Extracts a substring out of EXPR and returns it.
First character is at offset 0, or whatever you've
set $[ to (but don't do that). If OFFSET is nega-
tive (or more precisely, less than $[), starts
that far from the end of the string. If LENGTH is
omitted, returns everything to the end of the
string. If LENGTH is negative, leaves that many
characters off the end of the string.
Now, the interesting part is that the result of substr can be used as an lvalue, and be assigned:
You can use the substr() function as an lvalue, in
which case EXPR must itself be an lvalue. If you
assign something shorter than LENGTH, the string
will shrink, and if you assign something longer
than LENGTH, the string will grow to accommodate
it. To keep the string the same length you may
need to pad or chop your value using "sprintf".
or you can use the replacement field:
An alternative to using substr() as an lvalue is
to specify the replacement string as the 4th argu-
ment. This allows you to replace parts of the
EXPR and return what was there before in one oper-
ation, just as you can with splice().
$nsap =~ s/..$/F0/;
replaces the last two characters of a string with F0.
Use the substr( ) function:
substr( $nsap, -2, 2, "F0" );
chop( ) and the related chomp( ) are really intended for removing line ending characters - newlines and so on.
I believe that substr( ) will be quicker than using a regular expression.