How to truncate a string to a specific length in perl? - perl

I am just unable to find "truncate a string to a specific length" in Perl.
Is there any built in way?
UPDATE:
input: $str = "abcd";
output (truncate for 3 characters): $str is abc

You want to use the substr() function.
$shortened = substr( $long, 0, 50 ); # 50 characters long, starting at the beginning.
For more, use perldoc
perldoc -f substr
In your case, it would be:
$str = 'abcd';
$short = substr( $str, 0, 3 );

For a string of arbitrary length, where truncate length can be longer than string length, I would opt for a substitution
$str =~ s/.{3}\K.*//s;
For shorter strings, the substitution will not match and the string will be unchanged. The convenient \K escape can be replaced with a lookbehind assertion, or a simple capture:
s/(?<=.{3}).*//s # lookbehind
s/(.{3}).*/$1/s # capture

It's probably useful to also mention that, instead of substr() or regular expressions, you could use printf or sprintf.
See perldoc -f sprintf :
For string conversions, specifying a precision truncates the string to
fit the specified width:
printf '<%.5s>', "truncated"; # prints "<trunc>"
printf '<%10.5s>', "truncated"; # prints "< trunc>"

As long as your original string is at least 3 characters long, you can use a call to substr as an lvalue.
my $str = "abcd";
substr($str, 3) = "";
print "$str\n"; # prints "abc"
The initial length of the string may need to be checked, as if it is shorter than 3 characters, the return value of this call to substr cannot be assigned to (see perldoc -f substr for more information) and attempting to do so will cause an error.

If I understand correctly, you need to do like php wordwrap() a string, so :
use Text::Format;
print Text::Format->new({columns => 50})->format($string);
If you just need the first N characters :
print substr $string, 0, 50;

Or you can use regexp to do the same.
#!/usr/bin/perl -w
use strict;
my $str = "abcd";
$str =~ /(\w{0,3})/;
print $1;

The most natural way is to use substr to extract the part you want:
$first_n = substr($string, 0, $n);
If you only want to modify the string and you are certain it is at least the desired length:
substr($string, $n) = '';
If you are not certain, you can do:
use List::Util "min";
substr($string, min($n, length($string))) = '';
or catch the exception:
eval { substr($string, $n) = '' };

Related

How to extract a value from a variable within if condition

I am trying to extract the timestamp portion from the variable, but somehow the substr function isnt working
This is what I have tried
while(<INFILE>){
chomp;
if(/timestamp:(.+$)/){
$ts = $1;
$ts =~ substr($ts, 10);
print $ts;
}
close(INFILE);
This is how the line is in the file
timestamp: 25JUN2019_02:55:02.234
somedata..
..
..
..
timestamp: 25JUN2019_07:00:28.718
I want the output to be
02:55:02.234
07:00:28.718
But instead the output is
25JUN2019_02:55:02.234
25JUN2019_07:00:28.718
Several issues:
You are using the bind operator =~ instead of the assignment operator =
You should always use strict and use warnings
You should ignore whitespace before your match
If there is additional data on the line, substr will return it as well. You should scope your substr to only include want you want.
Revised code:
use strict;
use warnings;
while(<DATA>) {
chomp;
if (/timestamp:\s*(.+$)/) {
my $ts = substr($1, 10, 12); # only include length of data you want
print $ts;
}
}
__DATA__
timestamp: 25JUN2019_02:55:02.234
Output:
02:55:02.234
Two problems:
=~ is the binding operator, you probably want normal assignment.
substr $ts, 10 returns the substring form position 10 to the end of $ts. To only extract 12 characters, use
$ts = substr $ts, 10, 12;
You can also extract the timestamp directly:
if(my ($ts) = /timestamp: [^_]+_(\S+)/){
print $ts, "\n";
}

bitwise shift for a string holding a numeric hex. in perl

I have a string
$string = "0x0"
Now I want to basically use this string as a number and do a bitwise shift i.e my aim is to
$C = $string <<4 ;
But when I do this it says :
Argument "0x0" isn't numeric in left bitshift (<<)
Can someone please help to execute this ?
It's because "0x0" isn't numeric. It's a string. You would have to turn it into a numeric value.
use strict;
use warnings;
my $string = "0x0";
my $number = hex($string);
my $C = $number <<4 ;
print $C;
As mentioned, you have to convert it to a numeric value and use sprintf to turn back to a hexadecimal value.
sprintf ("0x%x" , hex($string) << 4);

Using a char variable in tr///

I am trying to count the characters in a string and found an easy solution counting a single character using the tr operator. Now I want to do this with every character from a to z. The following solution doesn't work because tr/// matches every character.
my #chars = ('a' .. 'z');
foreach my $c (#chars)
{
$count{$c} = ($text =~ tr/$c//);
}
How do I correctly use the char variable in tr///?
tr/// doesn't work with variables unless you wrap it in an eval
But there is a nicer way to do this:
$count{$_} = () = $text =~ /$_/g for 'a' .. 'z';
For the TIMTOWTDI:
$count{$_}++ for grep /[a-z]/i, split //, $text;
tr doesn't support variable interpolation (neither in the search list nor in the replacement list). If you want to use variables, you must use eval():
$count{$c} = eval "\$text =~ tr/$c/$c/";
That said, a more efficient (and secure) approach would be to simply iterate over the characters in the string and increment counters for each character, e.g.:
my %count = map { $_ => 0 } 'a' .. 'z';
for my $char (split //, $text) {
$count{$char}++ if defined $count{$char};
}
If you look at the perldoc for tr/SEARCHLIST/REPLACEMENTLIST/cdsr, then you'll see, right at the bottom of the section, the following:
Because the transliteration table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote interpolation. That means that if you want to use variables, you must use an eval():
eval "tr/$oldlist/$newlist/";
die $# if $#;
eval "tr/$oldlist/$newlist/, 1" or die $#;
Thus, you would need an eval to generate a new SEARCHLIST.
This is going to be very inefficient... the code might feel neat, but you're processing the complete string 26 times. You're also not counting uppercase characters.
You'd be better off stepping through the string once and just incrementing counters for each character found.
From the perlop documentation:
tr/AAA/XYZ/
will transliterate any A to X.
Because the transliteration table is built at compile time, neither
the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote
interpolation. That means that if you want to use variables, you must
use an eval()
Alternatively in your case you can use the s/// operator as:
foreach my $c (#chars) {
$count{$c} += ($text =~ s/$c//g);
}
My solution with some modification based from http://www.perlmonks.org/?node_id=446003
sub lowerLetters {
my $string = shift;
my %table;
#table{split //, $letters_uc} = split //, $letters_lc;
my $table_re = join '|', map { quotemeta } reverse sort keys %table;
$string =~ s/($table_re)/$table{$1}/g;
return if not defined $string;
return $string;
}
You may want to use s instead. Substitution is much more powerful than tr
My solution:
$count{$c} =~ s/\$search/$replace/g;
g at the end means "use it globally".
See:
https://blog.james.rcpt.to/2010/10/25/perl-search-and-replace-using-variables/
https://docstore.mik.ua/orelly/perl3/lperl/ch09_06.htm

Cutting apart string in Perl

I have a string in Perl that is 23 digits long. I need to cut it apart into different pieces. First 2 digits in one variable, next 3 in another variable, next 4 into another variable, etc. Basically the 23 digits needs to end up as 6 separate variables (2,3,4,4,3,7) characters, in that order.
Any ideas how I can cut the string up like this?
There are lots of ways to do it, but the shortest is probably unpack:
my $string = '1' x 23;
my #values = unpack 'A2A3A4A4A3A7', $string;
If you need separate variables, you can use a list assignment:
my ($v1, $v2, $v3, $v4, $v5, $v6) = unpack 'A2A3A4A4A3A7', $string;
Expanding on Alex's method, rather than specify each start and end, use the list you gave of lengths.
#!/usr/bin/env perl
use strict;
use warnings;
my $string = "abcdefghijklmnopqrstuvw";
my $pos = 0;
my #split = map {
my $start = $pos;
my $end = $_;
$pos += $end;
substr( $string, $start, $end);
} (2,3,4,4,3,7);
print "$_\n" for #split;
This said you probably should look at unpack which is used for fixed width fields. I have no experience with it though.
You could use a regex, viz:
$string =~ /\d{2}\d{3}\d{4}\d{4}\d{3}\d{7}/
and capture each part by surrounding with brackets ().
You then find each capture in the variables $1, $2 ...
or get them all in the returned list
See perldoc perlre
You want to use perldoc substr.
$substring = substr($string, $start, $length);
I'd also use `map' on a list of [start, length] pairs to make your life easier:
$string = "123456789";
#values = map {substr($string, $_->[0], $_->[1])} ([1, 3], [4, 2] , ...);
Here's a sub that will do it, using the already discussed unpack.
sub string_slices {
my $str = shift;
return unpack( join( 'A', '', #_ ), $str );
}

Using Perl, how do I decode or create those %-encodings on the web?

I need to handle URI (i.e. percent) encoding and decoding in my Perl script. How do I do that?
This is a question from the official perlfaq. We're importing the perlfaq to Stack Overflow.
This is the official FAQ answer minus subsequent edits.
Those % encodings handle reserved characters in URIs, as described in RFC 2396, Section 2. This encoding replaces the reserved character with the hexadecimal representation of the character's number from the US-ASCII table. For instance, a colon, :, becomes %3A.
In CGI scripts, you don't have to worry about decoding URIs if you are using CGI.pm. You shouldn't have to process the URI yourself, either on the way in or the way out.
If you have to encode a string yourself, remember that you should never try to encode an already-composed URI. You need to escape the components separately then put them together. To encode a string, you can use the URI::Escape module. The uri_escape function returns the escaped string:
my $original = "Colon : Hash # Percent %";
my $escaped = uri_escape( $original );
print "$escaped\n"; # 'Colon%20%3A%20Hash%20%23%20Percent%20%25'
To decode the string, use the uri_unescape function:
my $unescaped = uri_unescape( $escaped );
print $unescaped; # back to original
If you wanted to do it yourself, you simply need to replace the reserved characters with their encodings. A global substitution is one way to do it:
# encode
$string =~ s/([^^A-Za-z0-9\-_.!~*'()])/ sprintf "%%%0x", ord $1 /eg;
#decode
$string =~ s/%([A-Fa-f\d]{2})/chr hex $1/eg;
DIY encode (improving above version):
$string =~ s/([^^A-Za-z0-9\-_.!~*'()])/ sprintf "%%%02x", ord $1 /eg;
(note the '%02x' rather than only '%0x')
DIY decode (adding '+' -> ' '):
$string =~ s/\+/ /g; $string =~ s/%([A-Fa-f\d]{2})/chr hex $1/eg;
Coders helping coders - bartering knowledge!
Maybe this will help deciding which method to choose.
Benchmarks on perl 5.32. Every function returns same result for given $input.
Code:
#!/usr/bin/env perl
my $input = "ala ma 0,5 litra 40%'owej vodki :)";
use Net::Curl::Easy;
my $easy = Net::Curl::Easy->new();
use URI::Encode qw( uri_encode );
use URI::Escape qw( uri_escape );
use Benchmark(cmpthese);
cmpthese(-3, {
'a' => sub {
my $string = $input;
$string =~ s/([^^A-Za-z0-9\-_.!~*'()])/ sprintf "%%%0x", ord $1 /eg;
},
'b' => sub {
my $string = $input;
$string = $easy->escape( $string );
},
'c' => sub {
my $string = $input;
$string = uri_encode( $string, {encode_reserved => 1} );
},
'd' => sub {
my $string = $input;
$string = uri_escape( $string );
},
});
And results:
Rate c d a b
c 5618/s -- -98% -99% -100%
d 270517/s 4716% -- -31% -80%
a 393480/s 6905% 45% -- -71%
b 1354747/s 24016% 401% 244% --
Not surprising. A specialized C solution is the fast. An in-place regex with no sub calls is quite fast, followed closely by a copying regex with a sub call. I didn't look into why uri_encode was so much worse than uri_escape.
use URI and it will make URLs that just work.