truncate string in perl into substring with trailing elipses - perl

I'm trying to truncate a string in a select input option using perl if it is longer than a set value, though i can't get it to work correctly.
my $value = defined $option->{value} ? $option->{value} : '';
my $maxValueLength = 50;
if ($value.length > $maxValueLength) {
$value = substr $value, 0, $maxValueLength + '...';
}

Another option is regex
$string =~ s/.{$maxLength}\K.*/.../;
It matches any character (.) given number of times ({N}, here $maxLength), what is the first $maxLength characters in $string; then \K makes it "forget" all previous matches so those won't get replaced later. The rest of the string that is matched is then replaced by ...
See Lookaround assertions in perlre for \K.
This does start the regex engine for a simple task but it doesn't need any conditionals -- if the string is shorter than the maximum length the regex won't match and nothing happens.

Your code has several syntax errors. Turn on use strict and use warnings if you don't have it, and then read the error messages it tells you about. This is a bit tricky because of Perl's very complex syntax (see also Damian Conway's keynote from the 2020 Perl and Raku Conference), but it boils down to these:
Use of uninitialized value in concatenation (.) or string at line 7
Argument "..." isn't numeric in addition (+) at line 8
I've used the following adaption of your code to produce these
use strict;
use warnings;
my $value = '1234567890' x 10;
my $maxValueLength = 50;
if ( $value.length > $maxValueLength ) {
$value = substr $value, 0, $maxValueLength + '...';
}
print $value;
Now let's see what they mean.
The . operator in Perl is a concatenation. You cannot use it to call methods, and length is not a method on a string. Perl thinks you are using the built-in length (a function, not a method) without an argument, which makes it default to $_. Most built-ins do this, to make one-liners shorter. But $_ is not defined. Now the . tries to concatenate the length of undef to $value. And using undef in a string operation leads to this warning.
The correct way of doing this is length $value (or with parentheses if you prefer them, length($value)).
The + operator is not concatenation (we just learned that the . is). It's a numerical addition. Perl is pretty good at converting between strings and numbers as there aren't really any types, so saying 1 + "5" would give you 6 without problems, but it cannot do that for a couple of dots in a string. Hence it complains about a non-number value in an addition.
You want the substring with a given length, and then you want to attach the three dots. Because of associativity (or stickyness) of operators you will need to use parentheses () for your substr call.
$value = substr($value, 0, $maxValueLength) . '...';

To find a length of the string use length(STRING)
Here is the code snippet how you can modify the script.
#!/usr/bin/perl
use strict;
use warnings;
use feature qw(say);
my $string = "abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz";
say "length of original string is:".length($string);
my $value = defined $string ? $string : '';
my $maxValueLength = 50;
if (length($value) > $maxValueLength) {
$value = substr $value, 0, $maxValueLength;
say "value:$value";
say "value's length:".length($value);
}
Output:
length of original string is:80
value:abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvw
value's length:50

Related

Perl module / subroutine to remove shared substring of strings

Given a list/array of strings (in particular, UNIX paths), remove the shared part, eg:
./dir/fileA_header.txt
./dir/fileA_footer.txt
I probably will strip the directory before using the function, but strictly speacking this won't change much.
I'd like to know a method to either remove the shared parts (./dir/fileA_) or remove the not-shared part.
Thank you for your help!
This is a bit of a hack, but if you don't need to support Unicode strings (that is, if all characters have a value below 256), you can use xor to get the length of the longest common prefix of two strings:
my $n = do {
($str1 ^ $str2) =~ /^\0*/;
$+[0]
};
You can apply this operation in a loop to get the common prefix of a list of strings:
use v5.12.0;
use warnings;
sub common_prefix {
my $prefix = shift;
for my $str (#_) {
($prefix ^ $str) =~ /^\0*/;
substr($prefix, $+[0]) = '';
}
return $prefix;
}
my #paths = qw(
./dir/fileA_header.txt
./dir/fileA_footer.txt
);
say common_prefix(#paths);
Output: ./dir/fileA_

Extract substring using two delimiters and NO REGEX

I have a function whose aim is to extract a substring found between two delimiters. I would use regex but in this case I have explicit instructions not to use them.
I had a simpler and more elegant solution which was just one line but I cannot for the life of me remember or find it.
sub findBetween {
my ($theString,$delimiter1,$delimiter2) = (#_);
my $tmp = substr($theString, index($theString,$delimiter1)+length($delimiter1));
$tmp = substr($tmp, 0, index($tmp,$delimiter2));
return $tmp;}
Thank you for taking a look at this issue, I am aware it is very basic and somewhat redundant. What I need is a simpler solution involving perl basic functions and no regex.
You can use two index() calls to locate both delimiters and use indexes to extract string between them,
sub findBetween {
my ($theString,$delimiter1,$delimiter2) = #_;
my $i1 = index($theString, $delimiter1, 0) + length($delimiter1);
my $i2 = index($theString, $delimiter2, $i1);
return substr($theString, $i1, $i2-$i1);
}
print findBetween("111--2222~~333", "--", "~~"), "\n";
output
2222
I would simply use index
use strict;
use warnings;
my $string = "hello my world";
my $substr = "my";
if (index($string, $substr) != -1) {
print "$substr found in $string";
}
Extract from perldoc
• index STR,SUBSTR,POSITION
• index STR,SUBSTR
The index function searches for one string within another, but without the wildcard-like behavior of a full regular-expression pattern match. It returns the position of the first occurrence of SUBSTR in STR at or after POSITION. If POSITION is omitted, starts searching from the beginning of the string. POSITION before the beginning of the string or after its end is treated as if it were the beginning or the end, respectively. POSITION and the return value are based at zero. If the substring is not found, index returns -1.

Finding index of white space in Perl

I'm trying to find the index of white space in a string in Perl.
For example, if I have the string
stuff/more stuffhere
I'd like to select the word "more" with a substring method. I can find the index of "/" but haven't figured out how to find the index of white space. The length of the substring I'm trying to select will vary, so I can't hard code the index. There will only be one white space in the string (other than those after the end of the string).
Also, if anybody has any better ideas of how to do this, I'd appreciate hearing them. I'm fairly new to programming so I'm open to advice. Thanks.
Just use index:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = 'stuff/more stuffhere';
my $index_of_slash = index $string, '/';
my $index_of_space = index $string, ' ';
say "Between $index_of_slash and $index_of_space.";
The output is
Between 5 and 10.
Which is correct:
0 1
01234567890123456789
stuff/more stuffhere
If by "whitespace" you also mean tabs or whatever, you can use a regular expression match and the special variables #- and #+:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = "stuff/more\tstuffhere";
if ($string =~ m{/.*(?=\s)}) {
say "Between $-[0] and $+[0]";
}
The (?=\s) means is followed by a whitespace character, but the character itself is not part of the match, so you don't need to do any maths on the returned values.
As you stated, you want to select the word between the first /
and the first space following it.
If this is the case, you maybe don't need any index (you need just
the word).
A perfect tool to find something in a text is regex.
Look at the following code:
$txt = 'stuff/more stuffxx here';
if ($txt =~ /\/(.+?) /) {
print "Match: $1.\n";
}
The regex used tries to match:
a slash,
a non-empty sequence of any chars (note ? - reluctant
version), enclosed in a capturing group,
a space.
So after the match $1 contains what was captured by the first
capturing group, i.e. "your" word.
But if for any reason you were interested in starting and ending
offsets to this word, you can read them from $-[1]
and $+[1] (starting / ending indices of the first capturing group).
The arrays #- (#LAST_MATCH_START) and #+ (#LAST_MATCH_END) give offsets of the start and end of last successful submatches. See Regex related variables in perlvar.
You can capture your real target, and then read off the offset right after it with $+[0]
#+
This array holds the offsets of the ends of the last successful submatches in the currently active dynamic scope. $+[0] is the offset into the string of the end of the entire match. This is the same value as what the pos function returns when called on the variable that was matched against.
Example
my $str = 'target and target with spaces';
while ($str =~ /(target)\s/g)
{
say "Position after match: $+[0]"
}
prints
Position after match: 7
Position after match: 18
These are positions right after 'target', so of spaces that come after it.
Or you can capture \s instead and use $-[1] + 1 (first position of the match, the space).
You can use
my $str = "stuff/more stuffhere";
if ($str =~ m{/\K\S+}) {
... substr($str, $-[0], $+[0] - $-[0]) ...
}
But why substr? That's very weird there. Maybe if you told us what you actually wanted to do, we could provide a better alternatives. Here are three cases:
Data extraction:
my $str = "stuff/more stuffhere";
if ( my ($word) = $str =~ m{/(\S+)} ) {
say $word; # more
}
Data replacement:
my $str = "stuff/more stuffhere";
$str =~ s{/\K\S+}{REPLACED};
say $str; # stuff/REPLACED stuffhere
Data replacement (dynamic):
my $str = "stuff/more stuffhere";
$str =~ s{/\K(\S+)}{ uc($1) }e;
say $str; # stuff/MORE stuffhere

Perl tr operator is transliterating based on the variable's name not its value

I'm using Perl 5.16.2 to try to count the number of occurrences of a particular delimiter in the $_ string. The delimiter is passed to my Perl program via the #ARGV array. I verify that it is correct within the program. My instruction to count the number of delimiters in the string is:
$dlm_count = tr/$dlm//;
If I hardcode the delimiter, e.g. $dlm_count = tr/,//; the count comes out correctly. But when I use the variable $dlm, the count is wrong. I modified the instruction to say
$dlm_count = tr/$dlm/\t/;
and realized from how the tabs were inserted in the string that the operation was substituting every instance of any of the four characters "$", "d", "l", or "m" to \t — i.e. any of the four characters that made up my variable name $dlm.
Here is a sample program that illustrates the problem:
$_ = "abcdefghij,klm,nopqrstuvwxyz";
my $dlm = ",";
my $dlm_count = tr/$dlm/\t/;
print "The count is $dlm_count\n";
print "The modified string is $_\n";
There are only two commas in the $_ string, but this program prints the following:
The count is 3
The modified string is abc efghij,k ,nopqrstuvwxyz
Why is the $dlm token being treated as a literal string of four characters instead of as a variable name?
You cannot use tr that way, it doesn't interpolate variables. It runs strictly character by character replacement. So this
$string =~ tr/a$v/123/
is going to replace every a with 1, every $ with 2, and every v with 3. It is not a regex but a transliteration. From perlop
Because the transliteration table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote interpolation. That means that if you want to use variables, you must use an eval():
eval "tr/$oldlist/$newlist/";
die $# if $#;
eval "tr/$oldlist/$newlist/, 1" or die $#;
The above example from docs hints how to count. For $dlms in $string
$dlm_count = eval "\$string =~ tr/$dlm//";
The $string is escaped so to not be interpolated before it gets to eval. In your case
$dlm_count = eval "tr/$dlm//";
You can also use tools other than tr (or regex). For example, with string being in $_
my $dlm_count = grep { /$dlm/ } split //;
When split breaks $_ by the pattern that is empty string (//) it returns the list of all characters in it. Then the grep block tests each against $dlm so returning the list of as many $dlm characters as there were in $_. Since this is assigned to a scalar, $dlm_count is set to the length of that list, which is the count of all $dlm.
In the section of the docs on perlop 'Quote Like Operators', it states:
Because the transliteration table is built at compile time, neither
the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote
interpolation. That means that if you want to use variables, you must
use an eval():
As documented and as you discovered, tr/// doesn't interpolate. The simple solution is to use s/// instead.
my $dlm = ",";
$_ = "abcdefghij,klm,nopqrstuvwxyz";
my $dlm_count = s/\Q$dlm/\t/g;
If the transliteration is being performed in a loop, the following might speed things up noticeably:
my $dlm = ",";
my $tr = eval "sub { tr/\Q$dlm\E/\\t/ }";
for (...) {
my $dlm_count = $tr->();
...
}
Although several answers have hinted at the eval() idiom for tr///, none have the form that covers cases where the string has tr syntax characters in it, e.g.- (hyphen):
$_ = "abcdefghij,klm,nopqrstuvwxyz";
my $dlm = ",";
my $dlm_count = eval sprintf "tr/%s/%s/", map quotemeta, $dlm, "\t";
But as others have noted, there are lots of ways to count characters in Perl that avoid eval(), here's another:
my $dlm_count = () = m/$dlm/go;

How do I get the length of a string in Perl?

What is the Perl equivalent of strlen()?
length($string)
perldoc -f length
length EXPR
length Returns the length in characters of the value of EXPR. If EXPR is
omitted, returns length of $_. Note that this cannot be used on an
entire array or hash to find out how many elements these have. For
that, use "scalar #array" and "scalar keys %hash" respectively.
Note the characters: if the EXPR is in Unicode, you will get the num-
ber of characters, not the number of bytes. To get the length in
bytes, use "do { use bytes; length(EXPR) }", see bytes.
Although 'length()' is the correct answer that should be used in any sane code, Abigail's length horror should be mentioned, if only for the sake of Perl lore.
Basically, the trick consists of using the return value of the catch-all transliteration operator:
print "foo" =~ y===c; # prints 3
y///c replaces all characters with themselves (thanks to the complement option 'c'), and returns the number of character replaced (so, effectively, the length of the string).
length($string)
The length() function:
$string ='String Name';
$size=length($string);
You shouldn't use this, since length($string) is simpler and more readable, but I came across some of these while looking through code and was confused, so in case anyone else does, these also get the length of a string:
my $length = map $_, $str =~ /(.)/gs;
my $length = () = $str =~ /(.)/gs;
my $length = split '', $str;
The first two work by using the global flag to match each character in the string, then using the returned list of matches in a scalar context to get the number of characters. The third works similarly by splitting on each character instead of regex-matching and using the resulting list in scalar context