Efficiently inserting a decimal point into a Perl string

I came across this code:
sub insertDecimal
{
my $number = shift;
my $sigDigRight = shift;
if ($number =~ /\./) { return ($number); }
elsif (length $number < $sigDigRight) { return ($number); }
else
{
my $leftSide = substr($number, 0, (length $number)-$sigDigRight);
my $rightSide = substr($number, (length $number)-$sigDigRight, );
return ($leftSide . "." . $rightSide);
}
}
I hoped to improve it by rewriting it as:
sub insertDecimal
{
my ($number, $sigDigRight) = @_;
return $number if index ($number, '.') != -1 or length $number < $sigDigRight;
# YES! substr takes an LVALUE ... perldoc it for more :)
substr($number, -$sigDigRight, 0) = '.';
return $number;
}
I was very surprised that a run over some 74MM records showed almost no improvement with the second version.
Questions:
Does anyone know a better way to make insertDecimal more efficient?
Why do I see almost no improvement (just one minute faster on 74MM records)?
If the Perl compiler is re-jiggering the code of the first version to be more efficient, is there any way I can see the improved execution path that Perl has chosen?

Both routines would seem to do essentially the same amount of work:
scan $number for a single character (any compiler ought to be able to reduce that regex match to an index)
compare the length of $number to a limit
possibly insert a single character somewhere within $number
Using lvalue substr (or just taking advantage of the fourth argument to substr) may make the insertion a little more efficient, but, after all, the characters after the insertion point still have to be moved.
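For reference, here is a minimal sketch of the two substr forms mentioned above, applied to a hypothetical value. Both insert the '.' two characters from the end and produce the same string; only the syntax differs.

```perl
use strict;
use warnings;

my $number = "12345";    # hypothetical input
my $digits = 2;          # digits to the right of the decimal point

# Form 1: lvalue substr; assigning to a zero-length substring inserts text.
my $lvalue_copy = $number;
substr($lvalue_copy, -$digits, 0) = '.';

# Form 2: four-argument substr; the fourth argument is the replacement.
my $four_arg_copy = $number;
substr($four_arg_copy, -$digits, 0, '.');

print "$lvalue_copy\n";      # 123.45
print "$four_arg_copy\n";    # 123.45
```

In both cases the "45" still has to be shifted one position to the right to make room for the '.'.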
To my eye, the biggest opportunity for optimization comes from moving the length check ahead of the check for the decimal point.
I would be tempted to rewrite your routine as:
sub insertDecimal {
my ($number, $sigDigRight) = @_;
return $number if length($number) < $sigDigRight;
return $number if index($number, '.') >= 0;
substr($number, -$sigDigRight, 0, '.');
$number;
}
I find simple decisions and short lines to be easier to understand. I do not think this should change the correctness of the function.
An ugly alternative is:
sub gah {
my ($number, $sigDigRight) = @_;
my $n = length($number) - $sigDigRight;
return $number unless $n > 0;
$number =~ s{\A ([^.]{$n}) ([^.]+) \z}{$1.$2}x;
$number;
}
That combines the check for . with the replacement operation.
Again, I cannot be certain this is correct with respect to your spec, but it is something for you to explore.
I probably would not opt for gah unless the improvement was more than 20% of something that took at least an hour or so. On my system, it slows down a simple example by 1,000%.
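To see where the time actually goes, it may help to time the routines in isolation before rerunning the full data set. The sketch below uses the core Benchmark module; the sub names are renamed copies of the versions above, and the sample inputs are hypothetical stand-ins for real records. gah is timed but left out of the agreement check, because it treats some edge cases differently (for '42' with 2 digits it returns '42', while the other two return '.42').

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# First version from the question: regex test, then two substr copies.
sub insert_decimal_v1 {
    my ($number, $sig_dig_right) = @_;
    return $number if $number =~ /\./;
    return $number if length $number < $sig_dig_right;
    my $left  = substr($number, 0, length($number) - $sig_dig_right);
    my $right = substr($number, length($number) - $sig_dig_right);
    return $left . '.' . $right;
}

# Rewrite from the answer: length check first, then 4-arg substr.
sub insert_decimal_v2 {
    my ($number, $sig_dig_right) = @_;
    return $number if length($number) < $sig_dig_right;
    return $number if index($number, '.') >= 0;
    substr($number, -$sig_dig_right, 0, '.');
    return $number;
}

# The regex-substitution alternative.
sub gah {
    my ($number, $sig_dig_right) = @_;
    my $n = length($number) - $sig_dig_right;
    return $number unless $n > 0;
    $number =~ s{\A ([^.]{$n}) ([^.]+) \z}{$1.$2}x;
    return $number;
}

# Hypothetical sample inputs; substitute a slice of your real records.
my @samples = ('12345', '1.5', '7', '1234567890', '42');

# Sanity check: v1 and v2 should agree before they are timed.
for my $s (@samples) {
    my ($got1, $got2) = (insert_decimal_v1($s, 2), insert_decimal_v2($s, 2));
    die "mismatch on '$s': $got1 vs $got2\n" unless $got1 eq $got2;
}

# Run each candidate for about one CPU second and print a comparison table.
cmpthese(-1, {
    v1  => sub { insert_decimal_v1($_, 2) for @samples },
    v2  => sub { insert_decimal_v2($_, 2) for @samples },
    gah => sub { gah($_, 2) for @samples },
});
```

If the per-call difference turns out to be small compared with everything else each record goes through (I/O, parsing, and so on), that would be consistent with seeing only about a minute saved over 74MM records.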

Related

Using a for loop to find the prime numbers between 1 and 100

I've tried lots of ways to fix this, but I can't find the problem with this code.
I use Perl and I want to find the prime numbers between 1 and 100.
use strict;
my $i=0;
my $j=0;
my $count=0;
for ($i=0; $i<101; $i++)
{
for ($j=2; $j<$i; $j++)
{
if ($i%$j==0)
{
$count+=1;
}
}
if ($count==0)
{
print "$i\n";
}
}
There are a few things to think about as you test each number for divisors (strictly speaking, this is trial division rather than the Sieve of Eratosthenes). Armali already pointed out that you were reusing the value in $count because you had it in a higher scope, so it didn't reset for each number you wanted to check.
But, I reformatted your code a bit:
use v5.10;
NUM: for( my $i=2; $i < 101; $i++ ) {
DIVISOR: for( my $j=2; $j < $i; $j++ ) {
next NUM if $i%$j == 0;
}
say $i;
}
Instead of using a flag variable ($count) to figure out what to do, you can use loop controls. If you find any divisor, you know that you have found a non-prime and there's no need to continue. That is, you don't need to count divisors.
When you find one, stop and move on to the next number. To do that, I've labeled the looping constructs. That way, in the inner loop I can skip to the next iteration of the outer loop. And, usually, once I label one loop I label them all but you don't need to do that.
Once you figure that part out, you don't need to do so much work. Aside from 2, you know that all the even numbers are not prime. You don't need to check those. So, instead of being clever, I'll just break out 2 as a special case:
use v5.10;
say 2;
NUM: for( my $i = 3; $i < 101; $i += 2 ) {
DIVISOR: for( my $j=2; $j < $i; $j++ ) {
next NUM if $i%$j == 0;
}
say $i;
}
The inner loop is doing too much work too. None of the numbers that you are checking are even, so you don't need to check any even divisors (and you could also skip divisors ending in 5 once you've tried 5). And you don't have to go all the way up: you can stop at the square root of the number, because any divisor larger than the square root pairs with one smaller than it.
#!perl
use v5.10;
say 2;
NUM: for( my $i = 3; $i < 101; $i += 2 ) {
my $stop_at = int sqrt $i;
DIVISOR: for( my $j=3; $j <= $stop_at; $j += 2 ) {
next NUM if $i % $j == 0;
}
say $i;
}
And, for a final flourish, I'll take the top number from the command-line arguments but default to 100. With that, the comparison in the outer loop changes to <=:
#!perl
use v5.10;
my $limit = $ARGV[0] // 100;
say 2;
NUM: for( my $i = 3; $i <= $limit; $i += 2 ) {
my $stop_at = int sqrt $i;
DIVISOR: for( my $j=3; $j <= $stop_at; $j += 2 ) {
next NUM if $i % $j == 0;
}
say $i;
}
But, ikegami notes in a comment that for my $x (0..$n-1) is more idiomatic. That doesn't easily handle step sizes larger than 1. You can do various things to multiply that number to get the candidate number, or ways to generate the list ahead of time (but that means you have the list all at once). I'll switch to a while instead, and assume that these other bits do their work properly.
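Before moving on, here is one hedged sketch of the "multiply that number" option: keep the idiomatic foreach over a plain range and map each index to an odd candidate. The $k variable and the 2 * $k + 1 mapping are my own illustration, not from the comment.

```perl
use v5.10;
use strict;
use warnings;

my $limit  = $ARGV[0] // 100;
my @primes = (2);

# Each $k maps to the odd candidate 2*$k + 1 (3, 5, 7, ...), so the outer
# loop stays a foreach over a range while effectively stepping by 2.
NUM: for my $k ( 1 .. int( ($limit - 1) / 2 ) ) {
    my $i       = 2 * $k + 1;
    my $stop_at = int sqrt $i;
    # Likewise, $j maps to the odd divisors 3, 5, 7, ... up to $stop_at.
    for my $j ( 1 .. int( ($stop_at - 1) / 2 ) ) {
        next NUM if $i % ( 2 * $j + 1 ) == 0;
    }
    push @primes, $i;
}

say for @primes;
```

The arithmetic mapping keeps memory flat, at the cost of being a little less obvious than a C-style loop with $i += 2.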
$get_number is some magic subroutine that gives back the next number each time it's called, and is_prime does whatever it does to make the determination:
while( my $n = $get_number->() ) {
say $n if is_prime($n);
}
Here's one way that might work. First, there's a nifty Perl regex trick to determine primes. It doesn't matter which test I use, because it's hidden behind is_prime and you can change it to whatever you like. The biggest benefit here is that it's short (and a bit of a show-off):
#!perl
use v5.10;
my $get_number = generate_sub( $ARGV[0] // 100 );
while( my $n = $get_number->() ) {
say $n if is_prime($n);
}
sub is_prime { ( '1' x $_[0] ) !~ /\A(11+?)\1+\z/ }
sub generate_sub {
my( $limit ) = @_;
sub {
state $queue = [ 2, 3 ];
return if $queue->[0] > $limit;
push $queue->@*, $queue->[-1] + 2;
shift $queue->@*;
}
}
The generate_sub is a bit more tricky. First, the 2 makes it a bit tricky. Second, Perl doesn't have a yield like Python or Ruby (that would be nice). To get around that, I'll seed a queue with the first two numbers, then add the next number based on the last one (so adding 2 to 3 gets 5, and so on). That gets around the unique interval from 2 to 3. This stops once the next number in the queue is above the limit you want.
But that's a bit complicated and only there to handle the special case of 2. I've been playing with a different idiom lately, although I'm not convinced it's desirable.
The state keyword declares a persistent lexical variable, and its initializer runs only on the first execution. We'll use a state to return the first 2 right away. Then, the next time we come around, that $rc statement doesn't run and $next has 3. From there, I get the current number (0+$next so it's not the same data) and increment $next in a list, but only return the first element of that list. That's just a trick that condenses the if-else:
sub generate_sub {
my( $limit ) = @_;
sub {
state $rc = do { return 2 };
state $next = 3;
return $next <= $limit ? ( 0+$next, $next += 2 )[0] : ();
}
}
I don't recommend this for your problem, but you should consider a way to generate the list of numbers so it's not tightly coupled to the problem. That way, you can get rid of the looping constructs.
But, that's much more than you needed to know.
You initialized my $count=0; outside instead of inside the outer for loop.
Besides that, $i should start from 2 rather than 0.
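Applying just those two fixes to the original code gives something like this sketch. The @primes array is my addition so the result is easy to check; the rest keeps the question's structure.

```perl
use strict;
use warnings;

my @primes;    # collected only so the output can be checked

for (my $i = 2; $i < 101; $i++)
{
    my $count = 0;    # reset for every candidate $i
    for (my $j = 2; $j < $i; $j++)
    {
        if ($i % $j == 0)
        {
            $count += 1;
        }
    }
    if ($count == 0)
    {
        print "$i\n";
        push @primes, $i;
    }
}
```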

How does this searching work in Perl?

my ($len, $longest) = 0;
length > $len and ($longest, $len) = ($_, length) for @matches;
@matches stores some substrings. This code finds the longest substring in @matches and stores it in $longest.
Source code:
#!/usr/bin/perl
use strict;
use Data::Dumper;
my $needle = "axibidm";
my $haystack = "axididm";
my @matches;
for my $start (0..length $needle) {
for my $len (1 .. ((length $needle)-$start)) {
my $substr = substr($needle, $start, $len);
push @matches, $haystack =~ m[($substr)]g;
print "$substr\t";
print "@matches\t\n";
}
}
my ($len, $longest) = 0;
length > $len and ($longest, $len) = ($_, length) for @matches;
print "The longest common substring between\n", $needle, "\nand\n", $haystack, "\nis '$longest'\n";
Someone was being far too clever, or they made a typo. Or both. Probably both.
There are some things going on in that code that don't do what they appear to be doing. This does not initialize both variables to zero:
my ($len, $longest) = 0;
It's a deceptive way to write this:
my $len = 0;
my $longest;
for $matches is silly: there's only one thing to iterate over, so why use a loop? This idiom is very occasionally used to put the value into $_ and use it in various default constructs, but that's not much use here.
Next, the following is a very tortured way to write "do this if that". It's written that way to make it a single expression, which will work within the for loop statement modifier.
length > $len and ($longest, $len)=($_, length)
It is much better written like this.
if( length > $len ) {
$longest = $_;
$len = length;
}
Expanding it out, and removing the useless for loop, we get...
my $len = 0;
my $longest;
if( length $matches > $len ) {
$longest = $matches;
$len = length $matches;
}
The other option is that $matches is an array reference and they meant for @$matches. for $matches would still "work", but it would always return a length of 21, because an array reference stringifies as something like ARRAY(0x7fc07c800468).
That looks almost deliberately obfuscated. Here's a more verbose expression of the same logic.
my $len = 0;
my $longest;
foreach my $match (@matches) {
if (length($match) > $len) {
$longest = $match;
$len = length($match);
}
}
So let's compare.
my ($len, $longest) = 0;
This declares the two lexical (my) variables $len and $longest, and sets the first one ($len) to 0, leaving $longest at its default value of undef.
This structure:
(code goes here) for @matches;
is the same as this:
for (@matches) {
(code goes here)
}
So we're iterating over the @matches array and running the code once per element. Within the body of the code, the special variable $_ will hold the current element.
length > $len and ($longest, $len) = ($_, length);
First, (expression) and (code) is a shorthand way of writing if ( (expression) ) { (code) }. It works because in Perl, and is evaluated left-to-right in a short-circuiting fashion. That is, if the left side expression is false, Perl doesn't bother evaluating the right side, since its value doesn't matter; false and anything is false.
When length is called without an argument, it means length($_), so that's the length of the current element of @matches that is being examined.
And ($var1, $var2) = ($val1, $val2) is parallel assignment that sets $var1 to $val1 and $var2 to $val2.
EXPR for LIST;
is roughly the same as
for (LIST) { EXPR; }
EXPR1 and EXPR2;
is roughly the same as
if (EXPR1) { EXPR2; }
(This is not a generally accepted practice, except when EXPR2 is a flow control expression (next, die, etc).)
length defaults to using $_ as its argument (length($_)).
( $x, $y ) = ( EXPR1, EXPR2 )
is roughly the same as
$x = EXPR1;
$y = EXPR2;
(One notable difference is that you can do ($x,$y)=($y,$x) to swap values, but that's not used here.)
(Using a list assignment when two scalar assignments would do is also not a generally accepted practice.)
A more conventional way of writing the code would be:
my $len = 0;
my $longest;
for (@matches) {
if (length($_) > $len) {
$longest = $_;
$len = length($_);
}
}
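To check that the terse one-liner and the conventional loop really agree, here is a small sketch that runs both over the same hypothetical @matches list (the sample strings and the $len2/$longest2 names are made up for the comparison).

```perl
use strict;
use warnings;

# Hypothetical sample substrings standing in for @matches.
my @matches = ('ax', 'idm', 'a', 'dm', 'xid');

# Terse form from the question.
my ($len, $longest) = 0;
length > $len and ($longest, $len) = ($_, length) for @matches;

# Conventional form.
my $len2 = 0;
my $longest2;
for (@matches) {
    if (length($_) > $len2) {
        $longest2 = $_;
        $len2     = length($_);
    }
}

print "$longest $longest2\n";    # idm idm
```

Because both versions use a strict >, a later substring of the same length ('xid' here) does not replace the first one found.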

Perl - finding the largest palindrome product

I started learning Perl recently and wrote this code to find the largest palindrome that can be obtained by multiplying two 3-digit numbers (question here: https://projecteuler.net/problem=4 ).
Here is the code:
#!/usr/bin/perl
use 5.010;
sub checkpal{
for ($k=0;$k<length($_[0]);$k++){$b = substr ($_[0], $k, 1).$b;}
if ($_[0] eq $b){1}else{0}
}
$pals = $numb1 = undef;
for ($i = 998001; $i>=10000; $i--){
if (&checkpal($i)){
for ($j = 100; $j <1000; $j++){
if ( !($i % $j) && (length $j == 3) ){$numb1 = $j; $pals = $i; last;}
}
}
if (defined $numb1){last}
}
say $numb1." ".($pals/$numb1);
My idea is quite simple. It goes through a loop starting from 998001 (the largest value the product of two 3-digit numbers can have) and checks whether the value is a palindrome. If it is, it goes through another loop to check whether it can be obtained by multiplying two 3-digit numbers. This algorithm might not be the most efficient in the world, but it should at least give the result, which it doesn't.
The problem isn't in the subroutine checkpal as far as I know, but the if (&checkpal($i)) block doesn't get executed even when $i is a palindrome, and I don't know why. Sorry if this is obvious, but can you tell me why it isn't working?
if ( !($i % $j) and length($i/$j)==3) { .. }
instead of
if ( !($i % $j) && (length $j == 3) )
since you want to check whether $i/$j has three digits, not $j, which runs from 100 to 999 anyway.
As a side-note,
if (checkpal($i))
can be replaced with simple
if ($i eq reverse $i)
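Putting both suggestions together gives something like the sketch below. One more thing worth checking in the original: checkpal builds its reversed string in the global $b and never resets it, so each call prepends to the previous call's leftovers. That alone would likely keep the if block from ever running after the first call; using reverse sidesteps it. The label and variable names here are my own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

my ($factor, $pal);

# Walk down from 999 * 999; the first palindrome that splits into two
# 3-digit factors is the largest such product.
CANDIDATE: for (my $i = 998001; $i >= 10000; $i--) {
    next CANDIDATE unless $i eq reverse $i;    # palindrome check
    for my $j (100 .. 999) {
        # $j is always 3 digits; the quotient must be 3 digits as well.
        if ( !($i % $j) and length($i / $j) == 3 ) {
            ($factor, $pal) = ($j, $i);
            last CANDIDATE;
        }
    }
}

say $factor, " ", $pal / $factor;    # 913 993
```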

Perl 5 - Iterator

I have implemented a simple iterator in Perl. I normally work with C#, and use iterators and functional programming quite frequently, so I thought it would be simple to get some basics working in Perl.
The problem is that I'm getting poor performance. I don't expect it to be any faster than for or foreach, but I thought someone could give me some insight into how to speed it up.
Here is the guts of my package:
package Iterator;
use strict;
#Constructor for Iterator type
sub new {
my $type = $_[0];
my $this = {};
#set default values
$this->{Array} = @_[1];
$this->{Index} = 0;
$this->{Sub} = sub {
my @array = @{$this->{Array}};
return $#array >= $this->{Index} ? $array[$this->{Index}++] : undef;
};
#return self
bless($this, $type);
return $this;
}
#Iterates next
sub Next {
return $_[0]->{Sub}->();
}
Allows you to do this:
my $iterator = Iterator->new(\@array);
while (defined(my $current = $iterator->Next())) {
print $current, "\n";
}
Not flashy... yet.
It also enables some functional code like this:
my $sum = 0;
Iterator
->new(\@array)
->Where(sub { $_[0] % 2 == 0 })
->ForEach(sub { $sum += $_[0] });
Which would sum up all the even values of an array.
My bottleneck is the iteration code:
$this->{Sub} = sub {
my @array = @{$this->{Array}};
return $#array >= $this->{Index} ? $array[$this->{Index}++] : undef;
};
Any pointers to speed this up?
A bit late to the game here, but since you are concerned about performance, one of the largest bottlenecks in iterator type code is that the fields of your hash based object need to be dereferenced on each access. One way to combat this is to use closures in which the fields are closed over variables, avoiding unneeded dereferencing.
In my module List::Gen which contains a fairly performant implementation of lazy lists, I wrote the utility function curse which makes closure based objects behave like normal Perl objects.
Here is a short example of your iterator class written with curse. In a simple benchmark summing 1000 numbers, this method is twice as fast as yours, even after fixing all of the inefficiencies noted in the other answers.
{package Iterator;
use List::Gen 'curse';
sub new {
my ($class, $array) = @_;
my $index = 0;
curse {
next => sub {$$array[$index++]},
index => sub :lvalue {$index},
} => $class
}
sub reset {shift->index = 0}
}
If you are really pushing for more speed, since the next method does not need to be passed anything, you could even write:
my $next = $iterator->can('next');
while (defined (my $x = $next->())) {...}
Which will give you a 30% to 40% speed boost over a normal method call.
You can read the source of List::Gen for more advanced usage of curse
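If you would rather not depend on List::Gen, the same closure idea can be sketched in plain Perl: keep the array reference and the index in lexical variables captured by an anonymous sub, so each call skips the hash lookups. The make_iterator name is hypothetical, and this is not how curse works internally.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a code ref that yields the next element
# of @$array on each call, and undef once the array is exhausted.
sub make_iterator {
    my ($array) = @_;
    my $index = 0;
    return sub {
        return if $index > $#$array;    # exhausted
        return $array->[ $index++ ];
    };
}

my @array = (1 .. 10);
my $next  = make_iterator(\@array);

my $sum = 0;
while (defined(my $x = $next->())) {
    $sum += $x if $x % 2 == 0;    # sum the even values
}
print "$sum\n";    # 30
```

Like any undef-terminated iterator, this stops early if the array itself contains undef.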
You might find it useful to read a bit of Higher Order Perl.
This line:
my @array = @{$this->{Array}};
duplicates the array into @array, which I don't think you want. Just use $#{$this->{Array}} to find the last index of your array.
A much more efficient version:
package Iterator;
use strict;
#Constructor for Iterator type
sub new {
my $type = shift;
my $array = shift;
my $this = {};
$this->{array} = $array;
$this->{index} = 0;
bless($this, $type);
return $this;
}
#Iterates next
sub Next {
my $this = shift;
return $this->{array}->[$this->{index}++];
}
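To get a feel for how much the per-call array copy costs, a Benchmark sketch along these lines could compare the two closures directly. The sub names and the 1..500 test data are hypothetical, and the hash-based state is trimmed down to what the comparison needs.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @data = (1 .. 500);    # hypothetical test data

# Mirrors the question's Sub: copies the whole array on every call.
sub sum_with_copy {
    my %this = (Array => \@data, Index => 0);
    my $next = sub {
        my @array = @{ $this{Array} };
        return $#array >= $this{Index} ? $array[ $this{Index}++ ] : undef;
    };
    my $sum = 0;
    while (defined(my $x = $next->())) { $sum += $x }
    return $sum;
}

# Mirrors the efficient Next: indexes the referenced array directly.
sub sum_without_copy {
    my %this = (array => \@data, index => 0);
    my $next = sub { return $this{array}->[ $this{index}++ ] };
    my $sum = 0;
    while (defined(my $x = $next->())) { $sum += $x }
    return $sum;
}

die "results differ\n" unless sum_with_copy() == sum_without_copy();

# Run each for about one CPU second and print a comparison table.
cmpthese(-1, {
    with_copy    => \&sum_with_copy,
    without_copy => \&sum_without_copy,
});
```

The copying version does work proportional to the array size on every call, so the gap should widen as the array grows.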
Summing even numbers is easier done using grep and List::Util:
use v5.10;
use List::Util 'sum';
say sum grep { not $_ % 2 } (1 .. 10); # 30
It seems very likely to me that the code suggested by your question is over-engineering, unless you can come up with a decent example that cannot easily be solved using the traditional Perl primitives.
Have a look at List::Util and List::MoreUtils for utilities that may help you with this.
You can even use perl5i for a more modern looking syntax.
Example:
use perl5i::2;
my @nums = (0..100);
my $sumEven = @nums->grep(sub { $_ % 2 == 0 })->reduce(sub { $a+$b });
say $sumEven;
There is already an array iterator in CPAN, so you can look at its approach if you have not done it yet.
By the way in your code you have:
#set default values
$this->{Array} = @_[1];
I assume you meant $_[1]. With @_[1] you are requesting an array slice of one element. In the end the result is the same, but the semantics aren't. The curious thing is that I expected @_[1] to give either a one-element array or an error, but when I tested it in the debugger I got the scalar (at least in perl 5.10). Perl 6 will go for this behaviour anyway and will not change the sigil for accessing elements in arrays or hashes, so you are coding 'advanced' Perl ;-)
Don't unload the stored array. You're copying every element of an array from where it is pointed at by $this->{Array} to the local list #array when you do this:
my @array = @{$this->{Array}};
Also, if you know that you are going to stop when you hit undef, then you don't even have to check bounds:
$this->{Sub} = sub { return $this->{Array}[$this->{Index}++]; };
That is all you need. When {Index} goes out of range, it will return undef.
In addition, you can write your expression in Perl like:
$sum += $_ foreach grep { $_ % 2 == 0 } @array;
A much simpler Perl iterator:
my @array = (1, 2, 3, 4);
while (defined(my $i = shift @array))
{
print $i . "\n";
}

How can I count overlapping substrings in Perl?

I need to implement a program in Perl to count the occurrences of a substring in a string. I have implemented it as follows:
sub countnmstr
{
$count =0;
$count++ while $_[0] =~ /$_[1]/g;
return $count;
}
$count = countnmstr("aaa","aa");
print "$count\n";
Now, this is what I would normally do. However, in the implementation above I want to count the occurrences of 'aa' in 'aaa'. Here I get 1 as the answer, which seems reasonable, but I need to consider the overlapping cases as well. Hence the above case should give 2, since there are two 'aa's if we consider overlap.
Can anyone suggest how to implement such a function?
Everyone is getting pretty complicated in their answers (d'oh! daotoad should have made his comment an answer!), perhaps because they are afraid of the goatse operator. I didn't name it, that's just what people call it. It uses the trick that the result of a list assignment is the number of elements in the righthand list.
The Perl idiom for counting matches is then:
my $count = () = $_[0] =~ /($pattern)/g;
The goatse part is the = () =, which is an empty list in the middle of two assignments. The lefthand part of the goatse gets the count from the righthand side of the goatse. Note that the capture isn't strictly required: in list context, a //g match with no capturing parentheses returns every matched string, so the count comes out the same.
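A small sketch of the counting idiom on a made-up string, with and without a capture group. In both cases the list assignment in scalar context yields the number of elements the //g match returned.

```perl
use strict;
use warnings;

my $string = "a1b22c333";    # hypothetical input

# List assignment in scalar context yields the number of elements on its
# right-hand side, i.e. the number of matches the //g produced.
my $with_capture    = () = $string =~ /(\d+)/g;
my $without_capture = () = $string =~ /\d+/g;

print "$with_capture $without_capture\n";    # 3 3
```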
Now, the next trick in your case is that you really want a positive lookbehind (or lookahead maybe). The lookarounds don't consume characters, so you don't need to keep track of the position:
my $count = () = 'aaa' =~ /((?<=a)a)/g;
Your aaa is just an example. If you have a variable-width pattern, you have to use a lookahead. Lookbehinds in Perl have to be fixed width.
See ysth's answer ... I failed to realize that the pattern could consist solely of a zero width assertion and still work for this purpose.
You can use positive lookahead as suggested by others, and write the function as:
sub countnmstr {
my ($haystack, $needle) = @_;
my ($first, $rest) = $needle =~ /^(.)(.*)$/;
return scalar (() = $haystack =~ /(\Q$first\E(?=\Q$rest\E))/g);
}
You can also use pos to adjust where the next search picks up from:
#!/usr/bin/perl
use strict; use warnings;
sub countnmstr {
my ($haystack, $needle) = @_;
my $adj = length($needle) - 1;
die "Search string cannot be empty!" if $adj < 0;
my $count = 0;
while ( $haystack =~ /\Q$needle/g ) {
pos $haystack -= $adj;
$count += 1;
}
return $count;
}
print countnmstr("aaa","aa"), "\n";
Output:
C:\Temp> t
2
sub countnmstr
{
my ($string, $substr) = @_;
return scalar( () = $string =~ /(?=\Q$substr\E)/g );
}
$count = countnmstr("aaa","aa");
print "$count\n";
A few points:
//g in list context matches as many times as possible.
\Q...\E is used to auto-escape any meta characters, so that you are doing a substring count, not a subpattern count.
Using a lookahead (?= ... ) causes each match to not "consume" any of the string, allowing the following match to be attempted at the very next character.
This uses the same feature where a list assignment (in this case, to an empty list) in scalar context returns the count of elements on the right of the list assignment as the goatse/flying-lentil/spread-eagle/whatever operator, but uses scalar() instead of a scalar assignment to provide the scalar context.
$_[0] is not used directly, but instead copied to a lexical; a naive use of $_[0] in place of $string would cause the //g to start partway through the string instead of at the beginning if the passed string had a stored pos().
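The last point is easy to miss, so here is a small sketch that simulates a leftover pos() on the caller's variable. The sub names are hypothetical. Through the alias $_[0], the overlapping count starts at the stored position and comes out lower; the lexical copy carries no pos() and counts from the beginning.

```perl
use strict;
use warnings;

# Matches against $_[0] directly, so a pos() stored on the caller's
# variable decides where the //g starts.
sub count_naive {
    my $substr = $_[1];
    return scalar( () = $_[0] =~ /(?=\Q$substr\E)/g );
}

# Copies the arguments first; the lexical copy has no stored pos().
sub count_copied {
    my ($string, $substr) = @_;
    return scalar( () = $string =~ /(?=\Q$substr\E)/g );
}

my $text = "aaaa";
pos($text) = 2;    # simulate an earlier //g match that left pos() behind

my $naive  = count_naive($text, "aa");
my $copied = count_copied($text, "aa");
print "naive: $naive, copied: $copied\n";
```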
Update: s///g is faster, though not as fast as using index:
sub countnmstr
{
my ($string, $substr) = @_;
return scalar( $string =~ s/(?=\Q$substr\E)//g );
}
You could use a lookahead assertion in the regular expression:
sub countnmstr {
my #matches = $_[0] =~ /(?=($_[1]))/g;
return scalar #matches;
}
I suspect Sinan's suggestion will be quicker though.
You can try this; it doesn't need a regex at all:
$haystack="aaaaabbbcc";
$needle = "aa";
while ( 1 ){
$ind = index($haystack,$needle);
if ( $ind == -1 ) {last};
$haystack = substr($haystack,$ind+1);
$count++;
}
print "Total count: $count\n";
output
$ ./perl.pl
Total count: 4
If speed is an issue, the index approach suggested by ghostdog74 (with cjm's improvement) is likely to be considerably faster than the regex solutions.
use strict;
use warnings;
sub countnmstr_regex {
my ($haystack, $needle) = @_;
return scalar( () = $haystack =~ /(?=\Q$needle\E)/g );
}
sub countnmstr_index {
my ($haystack, $needle) = @_;
my $i = 0;
my $tally = 0;
while (1){
$i = index($haystack, $needle, $i);
last if $i == -1;
$tally ++;
$i ++;
}
return $tally;
}
use Benchmark qw(cmpthese);
my $size = 1;
my $h = 'aaa aaaaaa' x $size;
my $n = 'aa';
cmpthese( -2, {
countnmstr_regex => sub { countnmstr_regex($h, $n) },
countnmstr_index => sub { countnmstr_index($h, $n) },
} );
__END__
# Benchmarks run on Windows.
# Result using a small haystack ($size = 1).
Rate countnmstr_regex countnmstr_index
countnmstr_regex 93701/s -- -66%
countnmstr_index 271893/s 190% --
# Result using a large haystack ($size = 100).
Rate countnmstr_regex countnmstr_index
countnmstr_regex 929/s -- -81%
countnmstr_index 4960/s 434% --