How fast are string operations in Perl? In particular, concatenation and assignment

How fast is string concatenation in Perl? Is it linear in the length of the second operand? If so, what conditions need to be met for this operation to be linear? What are examples of non-linear concatenation time?
And what about string assignment? When and where does the actual copy of the buffer occur?
What about other operations like substring or simple regexes?

This is a really complex question, and the answer depends on many factors (architecture, underlying OS, hardware, Perl compilation flags, etc.).
To get an idea, you can take a look at the internals of the Perl structures used to represent your variables. A good source is perlguts illustrated.
If you have a specific implementation in mind, try benchmarking your code:
use Benchmark qw(:all);
my $a = "Some string";
my @b = map { "Some string to append " x $_ } (1..10);
cmpthese(-1, {
    ( map {
        my $i = $_;   # copy the index: a closure captures lexicals, not the global $_
        ( "concat_" . ( $i + 1 ) => sub { my $c = $a . $b[$i] } )
    } 0 .. $#b )
});
The code above compares the operation my $c = $a . $b for various lengths of the second argument. From the results it can be seen that, for this range of lengths, the operation runs in roughly linear time.

I tested this myself. Concatenation is linear in the length of the second argument, but assignment is always linear in the length of the string.
It looks like Perl does not count references for strings but associates a buffer with every variable (reference).
Here are some test results:
Each concatenation seems to take constant time, and the entire test is linear:
248ms my $x; $x .= "a" for 1..2_000_000
501ms my $x; $x .= "a" for 1..4_000_000
967ms my $x; $x .= "a" for 1..8_000_000
$x = $x . $y seems to be optimized and reuses the $x buffer in this case:
295ms my $x; $x = $x . "a" for 1..2_000_000
592ms my $x; $x = $x . "a" for 1..4_000_000
1170ms my $x; $x = $x . "a" for 1..8_000_000
The previous optimization seems to be applied statically, so each concatenation in the next test is linear in the length of the resulting string and the entire test is quadratic:
233ms my $x; ${\$x} = ${\$x} . "a" for 1..40_000
951ms my $x; ${\$x} = ${\$x} . "a" for 1..80_000
3811ms my $x; ${\$x} = ${\$x} . "a" for 1..160_000
Copying is linear:
186ms my $x; for (1..50_000) { $x .= "a"; my $y = $x }
764ms my $x; for (1..100_000) { $x .= "a"; my $y = $x }
3029ms my $x; for (1..200_000) { $x .= "a"; my $y = $x }
Every copy is linear, reference counting is not used for strings:
545ms my $x; for (1..50_000) { $x .= "a"; my $y = $x; my $y2 = $x; my $y3 = $x }
2264ms my $x; for (1..100_000) { $x .= "a"; my $y = $x; my $y2 = $x; my $y3 = $x }
8951ms my $x; for (1..200_000) { $x .= "a"; my $y = $x; my $y2 = $x; my $y3 = $x }
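For anyone who wants to repeat these measurements, here is a minimal sketch using the core Time::HiRes module. The sub names and the loop sizes are my own choices, not the original test harness. Absolute numbers will differ by machine and by Perl version: Perl 5.20 and later use copy-on-write for string buffers, so the copy test in particular may not scale the way the figures above do.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Time $n single-character appends; returns (seconds, final length).
sub time_append {
    my ($n) = @_;
    my $x = '';
    my $start = time();
    $x .= "a" for 1 .. $n;
    return (time() - $start, length $x);
}

# Time $n rounds of "append, then copy the whole string into a new variable".
sub time_append_and_copy {
    my ($n) = @_;
    my $x = '';
    my $start = time();
    for (1 .. $n) { $x .= "a"; my $y = $x }
    return (time() - $start, length $x);
}

# Doubling the size each round makes linear vs. quadratic growth easy to spot.
for my $n (20_000, 40_000, 80_000) {
    my ($append)  = time_append($n);
    my ($copying) = time_append_and_copy($n);
    printf "%7d  append: %.4fs  append+copy: %.4fs\n", $n, $append, $copying;
}
```

If the time roughly doubles when the size doubles, the test is linear; if it roughly quadruples, it is quadratic.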

Related

Comparing multiple numerical values in Perl

Say I have a few variables, $x, $y, $z, $a, $b, $c, and I want to make sure they all have the same value.
Can I test with something like if ($x == $y == $z == $a == $b == $c) to avoid multiple binary comparisons, i.e. (if $x == $y and $x == $z and $y == $z ...)?
Is there any way I can do all the comparing with one short and simple test?
if ( grep $x != $_, $y, $z, $a, $b, $c ) {
print "not all the same\n";
}
$x == $y and $x == $z and $y == $z is equivalent to $x == $y and $x == $z due to equality being transitive. This latter one is also the optimal solution, with N-1 comparisons for N variables.
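The N-1 comparison idea can be written as a plain loop with no modules. This is only a sketch, and all_equal is a name I made up for illustration:

```perl
use strict;
use warnings;

# True if every argument is numerically equal to the first one.
# Returns at the first mismatch, so at most N-1 comparisons are made.
sub all_equal {
    my ($first, @rest) = @_;
    for my $value (@rest) {
        return 0 if $value != $first;
    }
    return 1;
}

print all_equal(3, 3, 3, 3) ? "same\n" : "different\n";   # same
print all_equal(3, 3, 4, 3) ? "same\n" : "different\n";   # different
```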
If you have an array, you can use uniq from List::MoreUtils:
use List::MoreUtils qw(uniq);
my @arr1 = qw(foo foo foo foo foo foo);
my @arr2 = qw(foo BAR foo foo foo foo);
print "arr1: ", (uniq @arr1) == 1 ? "All same" : "Different" , "\n";
print "arr2: ", (uniq @arr2) == 1 ? "All same" : "Different" , "\n";
(If you have more than a few variables and don't have an array, it might be worth rewriting the code...)
You can use List::MoreUtils::first_index.
#!/usr/bin/env perl
use strict;
use warnings;
use List::MoreUtils qw( first_index );
my ($x, $y, $z, $a, $b, $c) = (1) x 6;
if (are_all_same($x, $y, $z, $a, $b, $c)) {
print "They all have the same value\n";
}
$c = 3;
unless (are_all_same($x, $y, $z, $a, $b, $c)) {
print "At least one has a different value than the others\n";
}
sub are_all_same {
my $x = shift;
-1 == first_index { $x != $_ } @_;
}
Of course, there is the issue of whether having so many variables in a small scope is appropriate (are you suffering from Fortranitis?), and whether one should use a hash to avoid a problem like this in the first place.
You can also use are_all_same with a large array, and it will impose minimal additional space and time penalties.
If they are all the same, then in particular the first must be equal to all the remaining ones. So that suggests the use of List::Util::all:
use List::Util 'all';
if( all { $x == $_ } $y, $z, $a, $b, $c ) {
...
}

Perl: Change in Subroutine not printing outside of routine

So I want to change numbers that I pass into a subroutine, and then retain those numbers being changed, but it doesn't seem to work.
my $A = 0;
my $T = 0;
my $C = 0;
my $G = 0;
foreach my $bases in (keys %basereads){
count ($bases, $A, $T, $C, $G);
}
Here is my subroutine
sub count {
my $bases = shift;
my $A = shift;
my $T = shift;
my $C = shift;
my $G = shift;
for (my $i = 0; $i < length($bases); $i++){
print "$bases\t";
if (uc(substr($bases,$i,1)) eq 'A'){
$A++;
}elsif (uc(substr($bases,$i,1)) eq 'T'){
$T++;
} elsif (uc(substr($bases,$i,1)) eq 'G'){
$G++;
} elsif (uc(substr($bases,$i,1)) eq 'C'){
$C++;
} else { next; }
}
print "$A\t$C\t$T\t$G\n";
return my($bases, $A, $T, $C, $G);
}
After the subroutine, I want to store the altered A, C, T, G in a hashmap. When I print bases and ATCG inside the subroutine, it prints, so I know the computer is running through the subroutine, but it's not saving the values, and when I try to manipulate them outside the subroutine (after I've called it), they start from zero (what I had defined the four bases as before). I'm new to Perl, so I'm a little wary of subroutines.
Could someone help?
Always include use strict; and use warnings; at the top of EVERY script.
With warnings enabled, you should've gotten the following messages:
"my" variable $bases masks earlier declaration in same scope at script.pl line ...
"my" variable $A masks earlier declaration in same scope at script.pl line ...
"my" variable $T masks earlier declaration in same scope at script.pl line ...
"my" variable $C masks earlier declaration in same scope at script.pl line ...
"my" variable $G masks earlier declaration in same scope at script.pl line ...
These are caused by the my before your return statement:
return my($bases, $A, $T, $C, $G);
Correct this by simply removing the my:
return ($bases, $A, $T, $C, $G);
And then you just need to capture your returned values
($bases, $A, $T, $C, $G) = count($bases, $A, $T, $C, $G);
Given that you're new to perl, I'm sure you won't be surprised that your code could be cleaned up further though. If one uses a hash, it makes it a lot easier to count various characters in a string, as demonstrated below:
use strict;
use warnings;
my $A = 0;
my $T = 0;
my $C = 0;
my $G = 0;
foreach my $bases (keys %basereads) {
my %counts;
for my $char (split //, uc $bases) {
$counts{$char}++;
}
# "// 0" avoids an "uninitialized value" warning when a base never occurs
$A += $counts{A} // 0;
$T += $counts{T} // 0;
$C += $counts{C} // 0;
$G += $counts{G} // 0;
}
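If you still want the counting in a subroutine, one option is to have it return a hash and add that into running totals in the caller. This is a rough sketch: count_bases and the sample %basereads data are made up for illustration, and it assumes you only care about A, T, C and G.

```perl
use strict;
use warnings;

# Count A, T, C and G in one read (case-insensitive); other characters are ignored.
sub count_bases {
    my ($bases) = @_;
    my %counts = (A => 0, T => 0, C => 0, G => 0);
    for my $char (split //, uc $bases) {
        $counts{$char}++ if exists $counts{$char};
    }
    return %counts;
}

my %basereads = (ACGTAC => 1, ggta => 1);   # hypothetical sample data
my %total = (A => 0, T => 0, C => 0, G => 0);
for my $bases (keys %basereads) {
    my %counts = count_bases($bases);        # capture what the sub returns
    $total{$_} += $counts{$_} for keys %counts;
}
print join("\t", map { "$_=$total{$_}" } qw(A T C G)), "\n";
```

Because the results come back as a return value, nothing depends on the subroutine changing its caller's variables.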

Perl calculator reads in numbers will not do calculation

I have a problem with my simple calculator program. It is not performing the calculation with my if statement: it goes straight to the else.
#!/usr/bin/perl
print "enter a symbol operation symbol to and two numbers to make a calculation";
chomp($input = <>);
if ($input eq '+') {
$c = $a + $b;
print $c;
}
elsif ($input eq '-') {
$c = $a - $b;
print $c;
}
elsif ($input eq '*') {
$c = $a * $b;
print $c;
}
elsif ($input eq '/') {
$c = $a / $b;
print $c;
}
elsif ($input eq '%') {
$c = $a % $b;
print $c;
}
elsif ($input eq '**') {
$c = $a**$b;
print $c;
}
elsif ($input eq 'root') {
$c = sqrt($a);
$c = sqrt($b);
print $c;
}
else {
print " you messed up" . "$input" . "$a" . "$b";
}
To start off with, you need to add strict and warnings to the top of your script
#!/usr/bin/perl
use strict;
use warnings;
That is going to alert you to a lot of syntax errors, and force you to completely rethink/refactor your code. This is a good thing though.
One obvious thing is that $a and $b are never initialized at all. And your first if is missing the dollar sign before input.
I would change the capturing of your variables to the following:
print "enter a symbol operation symbol to and two numbers to make a calculation";
chomp(my $input = <>);
my ($operation, $x, $y) = split ' ', $input;
I'd also lean away from using $a and $b as variable names, as they are special variables used by perl's sort. Once you're certain that you're getting your input properly, then start working on the rest of your logic.
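Putting the pieces together, a minimal sketch might look like the following. The calculate sub is a name I made up, and the "operator first, then two numbers" input format (for example "+ 3 4") is an assumption about what you intend. The root case is left out, and dividing by zero will still die.

```perl
use strict;
use warnings;

# Parse a line such as "+ 3 4" and return the result.
# Returns nothing (undef in scalar context) for incomplete input or an unknown operator.
sub calculate {
    my ($line) = @_;
    my ($operation, $x, $y) = split ' ', $line;
    return unless defined $y;
    if    ($operation eq '+')  { return $x + $y }
    elsif ($operation eq '-')  { return $x - $y }
    elsif ($operation eq '*')  { return $x * $y }
    elsif ($operation eq '/')  { return $x / $y }
    elsif ($operation eq '%')  { return $x % $y }
    elsif ($operation eq '**') { return $x ** $y }
    return;
}

print calculate('+ 3 4'), "\n";     # 7
print calculate('** 2 10'), "\n";   # 1024
```

In the real script you would pass the chomped line read from <> instead of a literal string.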
You forgot '$' sign in the first condition before input:
if($input eq '+'){
$c = $a + $b;
print $c;
my $a = shift(@ARGV); # first argument is a
my $b = shift(@ARGV); # second argument is b
my $input = shift(@ARGV); # third argument is an operator
if($input eq '+'){...
Also, I would recommend 'use strict' and 'use warnings' at the top unless you're proficient at Perl.

What does dot-equals mean in Perl?

What does ".=" mean in Perl (dot-equals)? Example code below (in the while clause):
if( my $file = shift @ARGV ) {
$parser->parse( Source => {SystemId => $file} );
} else {
my $input = "";
while( <STDIN> ) { $input .= $_; }
$parser->parse( Source => {String => $input} );
}
exit;
Thanks for any insight.
The period . is the concatenation operator. The equal sign to the right means that this is an assignment operator, like in C.
For example:
$input .= $_;
Does the same as
$input = $input . $_;
However, there's also some perl magic in this, for example this removes the need to initialize a variable to avoid "uninitialized" warnings. Try the difference:
perl -we 'my $x; $x = $x + 1' # Use of uninitialized value in addition ...
perl -we 'my $x; $x += 1' # no warning
This means that the line in your code:
my $input = "";
is quite redundant, although some people might find it comforting.
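To see this without reading STDERR by eye, here is a small sketch that counts warnings with a __WARN__ handler. count_warnings is a helper name I made up for the illustration.

```perl
use strict;
use warnings;

# Run a code ref and return how many warnings it emitted.
sub count_warnings {
    my ($code) = @_;
    my $count = 0;
    local $SIG{__WARN__} = sub { $count++ };
    $code->();
    return $count;
}

my $plain  = count_warnings(sub { my $x; $x = $x + 1 });   # warns: uninitialized value
my $add_eq = count_warnings(sub { my $x; $x += 1 });       # no warning
my $cat_eq = count_warnings(sub { my $x; $x .= "a" });     # no warning

print "plain: $plain, +=: $add_eq, .=: $cat_eq\n";   # plain: 1, +=: 0, .=: 0
```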
For pretty much any binary operator X, $a X= $b is equivalent to $a = $a X $b. The dot . is a string concatenation operator; thus, $a .= $b means "stick $b at the end of $a".
In your code, you start with an empty $input, then repeatedly read a line and append it to $input until there are no lines left. You should end up with the entire file as the contents of $input, built up one line at a time.
It should be equivalent to the loopless
local $/;
$input = <STDIN>;
(set the input record separator $/ to undef, then read until the "end of line" that never comes).
EDIT: Changed according to TLP's comment.
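A quick way to convince yourself that the loop and the slurp give the same result is to run both over the same data. This sketch uses an in-memory filehandle instead of STDIN so it is self-contained; the sample text is arbitrary.

```perl
use strict;
use warnings;

my $data = "first line\nsecond line\nthird line\n";

# Read line by line, appending each line with .=
open my $by_line, '<', \$data or die "Cannot open in-memory handle: $!";
my $input = "";
while (my $line = <$by_line>) { $input .= $line }
close $by_line;

# Slurp everything at once by undefining the input record separator
open my $all_at_once, '<', \$data or die "Cannot open in-memory handle: $!";
my $slurped = do { local $/; <$all_at_once> };
close $all_at_once;

print $input eq $slurped ? "identical\n" : "different\n";   # identical
```

Wrapping local $/ in a do block keeps the change to $/ from leaking into the rest of the program.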
You have found the string concatenation operator.
Let's try it :
my $string = "foo";
$string .= "bar";
print $string;
foobar
This performs concatenation onto the $input variable. Whatever comes in via STDIN is appended to $input.

What kind of syntactic sugar is available in Perl to reduce code for l/rvalue operators vs. if statements?

There's a bunch out there, as Perl is a pretty sugary language, but the most used statements in any language are combinations of if statements and setting values. I think I've found many of them, but there are still a few gaps. Ultimately, the goal would be to not have to write a variable name more than once.
Here's what I have so far:
$r ||= $s; # $r = $s unless ($r);
$r //= $s; # $r = $s unless (defined $r);
$r &&= $s; # $r = $s if ($r);
$r = $c ? $s : $t; # if ($c) { $r = $s } else { $r = $t }
$c ? $r : $s = $t; # if ($c) { $r = $t } else { $s = $t }
$r = $s || $t; # if ($s) { $r = $s } else { $r = $t }
$r = $s && $t; # if ($s) { $r = $t } else { $r = $s = undef, 0, untrue, etc. }
$c and return $r; # return $r if ($c);
$c or return $r; # return $r unless ($c);
$c and $r = $s; # $r = $s if ($c);
@$r{qw(a b c d)} # ($r->{a}, $r->{b}, $r->{c}, $r->{d})
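For reference, here is a small runnable sketch of a few of these forms. The variable names are mine, chosen only to show which case each operator covers.

```perl
use strict;
use warnings;

my $s = 5;

my $empty = 0;
$empty ||= $s;          # 0 is false, so $empty becomes 5

my $zero = 0;
$zero //= $s;           # 0 is defined, so $zero stays 0

my $undef;
$undef //= $s;          # undef, so $undef becomes 5

my $truthy = 1;
$truthy &&= $s;         # 1 is true, so $truthy becomes 5

# Hash slice through a reference: several values in one expression
my $r = { a => 1, b => 2, c => 3, d => 4 };
my @values = @$r{qw(a b c d)};

print "$empty $zero $undef $truthy @values\n";   # 5 0 5 5 1 2 3 4
```

The $zero case is the practical difference between ||= and //=: only //= leaves a defined-but-false value alone.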
Somebody also had a really interesting article on a "secret operator", shown here:
my @part = (
'http://example.net/app',
( 'admin' ) x!! $is_admin_link,
( $subsite ) x!! defined $subsite,
$mode,
( $id ) x!! defined $id,
( $submode ) x!! defined $submode,
);
However, what I've found to be missing from the list is:
$r <= $s; # read as "$r = min($r, $s);" except with short-circuiting
$r = $s if (defined $s); # what's the opposite of //?
$r and return $r # can that be done without repeating $r?
Is there anything else worth adding? What other conditional set variables are available to reduce the code? What else is missing?
These structures from your question could be written a little bit more clearly using the low precedence and and or keywords:
$c and return $r; # return $r if ($c);
$c or return $r; # return $r unless ($c);
$c and $r = $s; # $r = $s if ($c);
The nice thing about and and or is that unlike the statement modifier control words, and and or can be chained into compound expressions.
Another useful tool for syntactic sugar is using the for/foreach loop as a topicalizer over a single value. Consider the following:
$var = $new_value if defined $new_value;
vs
defined and $var = $_ for $new_value;
or things like:
$foo = "[$foo]";
$bar = "[$bar]";
$_ = "[$_]" for $foo, $bar;
The map function can also be used in this manner, and has a return value you can use.
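Here is a short runnable sketch of the topicalizer trick, including the map variant. The sample values and the angle-bracket wrapping are arbitrary choices for illustration.

```perl
use strict;
use warnings;

my ($foo, $bar) = ("a", "b");
$_ = "[$_]" for $foo, $bar;             # $_ aliases each variable in turn

my $var = "old";
my $new_value;                           # undef for now
defined and $var = $_ for $new_value;    # skipped: $new_value is undef
my $after_undef = $var;

$new_value = "new";
defined and $var = $_ for $new_value;    # assigns: $new_value is defined

# map also topicalizes, and additionally returns the results
my @wrapped = map { "<$_>" } $foo, $bar;

print "$foo $bar $after_undef $var @wrapped\n";   # [a] [b] old new <[a]> <[b]>
```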
There's also the left hand side ternary operator:
$cond ? $var1 : $var2 = "the value";
is equivalent to:
if ($cond) {
$var1 = "the value";
} else {
$var2 = "the value";
}
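A tiny runnable check of the left-hand-side ternary, using the parenthesized form shown in perlop. The values are arbitrary.

```perl
use strict;
use warnings;

my ($var1, $var2) = ("unset", "unset");

# The conditional operator can be assigned to when both branches are lvalues.
my $cond = 1;
($cond ? $var1 : $var2) = "the value";   # assigns to $var1
$cond = 0;
($cond ? $var1 : $var2) = "other";       # assigns to $var2

print "$var1 / $var2\n";   # the value / other
```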
$r = $r < $s ? $r : $s;:
$r = $s if $r > $s;
or
use List::Util qw( min );
$r = min($r, $s);
or:
sub min_inplace {
my $min_ref = \shift;
for (@_) { $$min_ref = $_ if $$min_ref > $_; }
}
min_inplace($r, $s);
$r = $s if (defined $s);:
$r = $s // $r;
$r = $t; $r = $s if (defined $s);:
$r = $s // $t;
$r = !$s ? $s : $t;:
$r = $s && $t;
One of the most requested features in Perl was the switch statement. This finally appeared in Perl 5.10. I'm just using the example from the documentation:
use feature qw(say switch); #My preference
#use feature ":5.10"; #This does both "say" and "switch"
[...]
given($foo) {
when (undef) {
say '$foo is undefined';
}
when ("foo") {
say '$foo is the string "foo"';
}
when ([1,3,5,7,9]) {
say '$foo is an odd digit';
continue; # Fall through
}
when ($_ < 100) {
say '$foo is numerically less than 100';
}
when (\&complicated_check) {
say 'a complicated check for $foo is true';
}
default {
die q(I don't know what to do with $foo);
}
}
Why, oh why, they went with given/when and not switch/case like you find in most languages is a mystery to me. And why, if the statement is given/when, do you specify it in use feature as switch?
Alas, the people who made these decisions are at a higher plane than I am, so I have no right to even question these luminaries.
I avoid the more exotic stuff, and stick with the easiest to understand syntax. Imagine the person who has to go through your code and find a bug or add a feature. Which would be easier for that person to understand:
$r &&= $s;
or
if ($r) {
$r = $s;
}
And, maybe I might realize that I really meant:
if (not defined $r) {
$r = $s;
}
And, in this case, I might even say:
$r = $s if not defined $r;
Although I don't usually like post-fixed if statements because people tend to miss the if part when glancing through the code.
Perl is compiled at runtime, and the compiler is fairly efficient. So, even though it's way cooler to write $r &&= $s, and it earns you more geek points and is less to type, it doesn't execute any faster. The biggest amount of time spent on code is on maintaining it, so I'd rather skip the fancy stuff and go for readability.
By the way, when I think of syntactic sugar, I think of things added to the language to improve readability. A great example is the -> operator:
${${${$employee_ref}[0]}{phone}}[0];
vs.
$employee_ref->[0]->{phone}->[0];
Of course, if you're storing data as a reference to a list to a hash to a list, you are probably better off using object oriented coding.
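To show that the two spellings really reach the same element, here is a small sketch with made-up employee data. It also includes a third form: the arrow is optional between adjacent subscripts.

```perl
use strict;
use warnings;

# Hypothetical data: a list of employees, each with a list of phone numbers.
my $employee_ref = [ { name => "Alice", phone => [ "555-0100", "555-0101" ] } ];

my $nested  = ${${${$employee_ref}[0]}{phone}}[0];
my $arrows  = $employee_ref->[0]->{phone}->[0];
my $compact = $employee_ref->[0]{phone}[0];   # arrow is optional between subscripts

print "all three forms agree\n" if $nested eq $arrows and $arrows eq $compact;
```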
There's also:
$hash{$key||'foo'} = 1; # if($key) { $hash{$key} = 1 } else { $hash{'foo'} = 1 }