Converting base10 to base36 in perl [duplicate] - perl

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
What’s the best way to do base36 arithmetic in Perl?
Hello, is it possible to convert a number from base 10 to base 36 with a Perl script?
Here's an example:
base 10 - 1234567890123 and outcome
base 36 - FR5HUGNF

Try using the Math::Base36 CPAN module for base conversion.

Poking around the Math::Base36 source code shows how easy it is to enact the conversion:
sub encode_base36 {
    my ( $number, $padlength ) = @_;
    $padlength ||= 1;
    die 'Invalid base10 number'  if $number    =~ m{\D};
    die 'Invalid padding length' if $padlength =~ m{\D};
    my $result = '';
    while ( $number ) {
        my $remainder = $number % 36;
        # Digits 0-9 map to themselves; 10-35 map to 'A'-'Z' (chr(65) is 'A')
        $result .= $remainder <= 9 ? $remainder : chr( 55 + $remainder );
        $number = int $number / 36;
    }
    # Digits were collected least significant first, so reverse before padding
    return '0' x ( $padlength - length $result ) . reverse( $result );
}

It is quite straightforward to write a subroutine to do this. The code below does no value checking and assumes the numbers to be converted are always non-negative.
If your version of Perl isn't sufficiently up-to-date to support the state keyword, then just declare $symbols as a my variable at the head of the program.
use strict;
use warnings;
use feature 'state';
print base36(1234567890123), "\n";
sub base36 {
    my ($val) = @_;
    state $symbols = join '', '0'..'9', 'A'..'Z';
    my $b36 = '';
    while ($val) {
        # Prepend the digit for the current remainder
        $b36 = substr($symbols, $val % 36, 1) . $b36;
        $val = int $val / 36;
    }
    return $b36 || '0';
}
output
FR5HUGNF
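For the reverse direction (base 36 back to base 10), a minimal sketch might look like the following. It is not taken from Math::Base36; the names to_base36 and from_base36 are made up for illustration, and it assumes non-negative integers small enough for Perl's native numbers.

```perl
use strict;
use warnings;

# Digit table shared by both directions (illustrative names, not a CPAN API)
my @DIGITS = ( '0' .. '9', 'A' .. 'Z' );
my %VALUE;
@VALUE{@DIGITS} = 0 .. $#DIGITS;

# Convert a non-negative integer to a base-36 string
sub to_base36 {
    my ($n) = @_;
    my $out = '';
    do {
        $out = $DIGITS[ $n % 36 ] . $out;
        $n   = int( $n / 36 );
    } while ( $n > 0 );
    return $out;
}

# Convert a base-36 string (case-insensitive) back to an integer
sub from_base36 {
    my ($str) = @_;
    my $n = 0;
    for my $ch ( split //, uc $str ) {
        die "Invalid base36 digit '$ch'\n" unless exists $VALUE{$ch};
        $n = $n * 36 + $VALUE{$ch};
    }
    return $n;
}

print to_base36(1234567890123), "\n";
print from_base36('FR5HUGNF'),  "\n";
```

A round trip through both subroutines should give back the original number, which is a cheap sanity check for either direction.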

Related

No values being output

I'm having a problem coding my first Perl program.
What I'm trying to do here is get the maximum, minimum, total and average of a list of numbers, using a subroutine for each value and another subroutine to print the final values. I'm using "private" (my) variables throughout, but I still couldn't print my values.
Here is my code:
&max(<>);
&print_stat(<>);
sub max {
    my ($mymax) = shift @_;
    foreach (@_) {
        if ( $_ > $mymax ) {
            $mymax = $_;
        }
    }
    return $mymax;
}
sub print_stat {
    print max($mymax);
}
Please try this one:
use strict;
use warnings;
my @list_nums = qw(10 21 30 42 50 63 70);
ma_xi(@list_nums);
sub ma_xi
{
    my @list_ele = @_;
    # Seed max and min with the first element so zeros and negatives work too
    my $set_val_max = $list_ele[0];
    my $set_val_min = $list_ele[0];
    my $add_all_vals = 0;
    foreach my $each_ele (@list_ele)
    {
        $set_val_max = $each_ele if $set_val_max < $each_ele;
        $set_val_min = $each_ele if $set_val_min > $each_ele;
        $add_all_vals += $each_ele;
    }
    # Average is the total divided by the number of elements
    my $set_val_avg = $add_all_vals / scalar(@list_ele);
    print "MAX: $set_val_max\n";
    print "MIN: $set_val_min\n";
    print "TOT: $add_all_vals\n";
    print "AVG: $set_val_avg\n";
    #Return these values into array and get into the new sub routine's
}
Some notes
Use plenty of whitespace to lay out your code. I have tidied the Perl code in your question so that I could read it more easily, without changing its semantics
You must always use strict and use warnings 'all' at the top of every Perl program you write
Never use an ampersand & in a subroutine call. That hasn't been necessary or desirable since Perl 4 over twenty-five years ago. Any tutorial that tells you otherwise is wrong
Using <> in a list context (such as the parameters to a subroutine call) will read all of the file and exhaust the file handle. Thereafter, any calls to <> will return undef
You should use chomp to remove the newline from each line of input
You declare $mymax within the scope of the max subroutine, but then try to print it in print_stat where it doesn't exist. use strict and use warnings 'all' would have caught that error for you
Your max subroutine returns the maximum value that it calculated, but you never use that return value
Below is a fixed version of your code.
Note that I've read the whole file into array @values and then chomped them all at once. In general it's best to read and process input one line at a time, which would be quite possible here but I wanted to stay as close to your original code as possible
I've also saved the return value from max in variable $max, and then passed that to print_stat. It doesn't make sense to try to read the file again and pass all of those values to print_stat, as your code does
I hope this helps
use strict;
use warnings 'all';
my @values = <>;
chomp @values;
my $max = max(@values);
print_stat( $max );
sub max {
    my $mymax = shift;
    for ( @_ ) {
        if ( $_ > $mymax ) {
            $mymax = $_;
        }
    }
    return $mymax;
}
sub print_stat {
    my ($val) = @_;
    print $val, "\n";
}
Update
Here's a version that calculates all of the statistics that you mentioned. I don't think subroutines are a help in this case as the solution is short and no code is reusable
Note that I've added the data at the end of the program file, after __DATA__, which lets me read it from the DATA file handle. This is often handy for testing
use strict;
use warnings 'all';
my ($n, $max, $min, $tot);
while ( <DATA> ) {
    next unless /\S/; # Skip blank lines
    chomp;
    if ( not defined $n ) {
        # First value seeds all three accumulators
        $max = $min = $tot = $_;
    }
    else {
        $max = $_ if $max < $_;
        $min = $_ if $min > $_;
        $tot += $_;
    }
    ++$n;
}
my $avg = $tot / $n;
printf "\$n = %d\n", $n;
printf "\$max = %d\n", $max;
printf "\$min = %d\n", $min;
printf "\$tot = %d\n", $tot;
printf "\$avg = %.2f\n", $avg;
__DATA__
7
6
1
5
1
3
8
7
output
$n = 8
$max = 8
$min = 1
$tot = 38
$avg = 4.75
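If you do want the subroutine-based layout from the question, one possible sketch uses the core List::Util module for the individual statistics. The names summarize and print_stat and the hash keys are chosen here for illustration only; the input is assumed to be a non-empty list of numbers.

```perl
use strict;
use warnings;
use List::Util qw(max min sum);

# Return a hash reference of statistics, or nothing for an empty list
sub summarize {
    my @numbers = @_;
    return unless @numbers;
    my $total = sum(@numbers);
    return {
        count => scalar(@numbers),
        max   => max(@numbers),
        min   => min(@numbers),
        total => $total,
        avg   => $total / scalar(@numbers),
    };
}

# Print each statistic on its own line
sub print_stat {
    my ($stats) = @_;
    printf "%-5s = %s\n", $_, $stats->{$_} for qw(count max min total avg);
}

print_stat( summarize( 7, 6, 1, 5, 1, 3, 8, 7 ) );
```

Returning one hash reference keeps the printing subroutine independent of how the values were calculated, which is the separation the question was aiming for.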

Fixed field width number output in perl

I have a random number between 0.001 and 1000 and I need perl to print it with a fixed column width of 5 characters. That is, if it's too long, it should be rounded, and if it's too short, it should be padded with spaces.
Everything I found online suggested using sprintf, but sprintf ignores the field width if the number is too long.
Is there any way to get perl to do this?
What doesn't work:
my $number = 0.001 + rand(1000);
my $formattednumber = sprintf("%5f", $number);
print <<EOF;
$formattednumber
EOF
You need to define your sprintf pattern dynamically. The number of decimals depends on the number of digits on the left hand side of the decimal point.
This function will do that for you.
use strict;
use warnings 'all';
use feature 'say';
sub round_to_col {
    my ( $number, $width ) = @_;
    my $width_int = length int $number;
    # Build a format such as '%2.2f': the decimals fill what the integer part leaves
    return sprintf(
        sprintf( '%%%d.%df', $width_int, $width - 1 - $width_int ),
        $number
    );
}
my $number = 0.001 + rand(1000);
say round_to_col( $number, 5);
Output could be:
11.18
430.7
0.842
You could use pack after the sprintf. It may not be a computationally efficient approach, but it is relatively simple to implement and maintain:
my $formattednumber = pack ('A5', sprintf("%5f", $number));
The answer posted by simbabque does not cover all cases, so this is my improved version, just in case anyone also needs something like this:
sub round_to_col {
    my ( $number, $width ) = @_;
    my $width_int = length int $number;
    my $sprintf;
    print "round_to_col error: number longer than width" if $width_int > $width;
    $sprintf = "%d" if $width_int == $width;
    $sprintf = "% d" if $width_int == $width - 1;
    $sprintf = sprintf( '%%%d.%df', $width_int, $width - 1 - $width_int )
        if $width_int < $width - 1;
    return sprintf( $sprintf , $number );
}
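Another way to approach the same problem, sketched here under the assumption that the number is non-negative, is to try progressively fewer decimals until the formatted text fits. This also covers values such as 9.9996, where rounding adds a digit to the integer part. The name fit_width and the '#' overflow marker are choices made for this example only.

```perl
use strict;
use warnings;

# Format a non-negative number into exactly $width characters,
# dropping decimals until the rounded text fits.
sub fit_width {
    my ( $number, $width ) = @_;
    my $decimals = $width > 2 ? $width - 2 : 0;
    for ( ; $decimals >= 0; $decimals-- ) {
        my $text = sprintf( '%.*f', $decimals, $number );
        # Left-pad with spaces when the text is shorter than the column
        return sprintf( '%*s', $width, $text ) if length($text) <= $width;
    }
    # Even the rounded integer is too wide: flag the overflow
    return '#' x $width;
}

print '[', fit_width( $_, 5 ), "]\n" for 11.1834, 430.66, 0.8421, 9.9996, 1000.4;
```

Because the length is checked after formatting, the subroutine never has to predict how many digits rounding will produce.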

Perl Recursion and Functions

Having heard about Perl for years, I decided to give it a few hours of my time to see how much I could pick up. I got through the basics fine and then got to loops. As a test I wanted to see if I could build a script to recurse through all alphanumerical values of up to 4 characters. I had written some PHP code that did the same thing some time ago, so I took the same concept and used it. However, when I run the script it puts "a" as the first 3 values and then only loops through the last digit. Does anyone see what I am doing wrong?
#!/usr/local/bin/perl
$chars = "abcdefghijklmnopqrstuvwxyz";
$chars .= "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
$chars .= "0123456789";
@charset = split(//, $chars);
$charset_length = scalar(@charset);
sub recurse
{
    ($width, $position, $base_string) = @_;
    for ($i = 0; $i < $charset_length; ++$i) {
        $base = $base_string . $charset[$i];
        if ($position < $width - 1) {
            $pos = $position + 1;
            recurse($width, $pos, $base);
        }
        print $base;
        print "\n";
    }
}
recurse(4, 0, '');
This is what I get when I run it:
aaaa
aaab
aaac
aaad
aaae
aaaf
aaag
aaah
aaai
aaaj
aaak
aaal
aaam
aaan
aaao
aaap
aaaq
aaar
aaas
aaat
aaau
aaav
aaaw
aaax
aaay
aaaz
aaaA
aaaB
aaaC
aaaD
aaaE
aaaF
aaaG
aaaH
aaaI
aaaJ
aaaK
aaaL
aaaM
aaaN
aaaO
aaaP
aaaQ
aaaR
aaaS
aaaT
aaaU
aaaV
aaaW
aaaX
aaaY
aaaZ
aaa0
aaa1
aaa2
aaa3
aaa4
aaa5
aaa6
aaa7
aaa8
aaa9
aaa9
aaa9
aaa9
You've been bitten by non-strict scoping. This code does what it should (note the use strict at the top and the subsequent use of my to guarantee variable scoping).
#!/usr/bin/env perl
use strict;
use warnings;
my $chars = "abcdefghijklmnopqrstuvwxyz";
$chars .= "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
$chars .= "0123456789";
my @charset = split(//, $chars);
my $charset_length = scalar(@charset);
sub recurse {
    # Each call gets its own copies of these variables
    my ($width, $position, $base_string) = @_;
    for (my $i = 0; $i < $charset_length; ++$i) {
        my $base = $base_string . $charset[$i];
        if ($position < $width - 1) {
            my $pos = $position + 1;
            recurse($width, $pos, $base);
        }
        print $base;
        print "\n";
    }
}
recurse(4, 0, '');
Already well answered, but a more idiomatic approach would be:
use strict;
use warnings;
sub recurse {
    my ($width, $base_string, $charset) = @_;
    if (length $base_string) {
        print "$base_string\n";
    }
    if (length($base_string) < $width) {
        # Extend the string by each character in turn and recurse
        recurse($width, $base_string . $_, $charset) for @$charset;
    }
}
my @charset = ('a'..'z', 'A'..'Z', '0'..'9');
recurse(4, '', \@charset);
There's no need to pass position; it's implicit in the width of the base string passed in. The charset, on the other hand, should be passed in rather than having the subroutine use an external variable.
Alternatively, since the width and character set stay constant, generate a closure that references them:
use strict;
use warnings;
sub make_recurser {
    my ($width, $charset) = @_;
    my $recurser;
    # The closure refers to itself through $recurser, and to $width and $charset
    $recurser = sub {
        my ($base_string) = @_;
        if (length $base_string) {
            print "$base_string\n";
        }
        if (length($base_string) < $width) {
            $recurser->($base_string . $_) for @$charset;
        }
    };
    return $recurser;
}
my @charset = ('a'..'z', 'A'..'Z', '0'..'9');
my $recurser = make_recurser(4, \@charset);
$recurser->('');
Alternatively, just:
print "$_\n" for glob(('{' . join(',', 'a'..'z', 'A'..'Z', '0'..'9') . '}') x 4);
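The one-liner above produces only the strings of exactly four characters. A hedged sketch of the same brace-expansion idea extended to every length from 1 up to the width is shown below, with a two-character alphabet so the result stays small. The name strings_up_to is invented for this example, and it assumes the characters have no special meaning to glob (no whitespace, braces, commas or wildcards).

```perl
use strict;
use warnings;

# Build every string of length 1 .. $width over @charset using glob's
# brace expansion. Assumes plain alphanumeric characters.
sub strings_up_to {
    my ( $width, @charset ) = @_;
    my $alternation = '{' . join( ',', @charset ) . '}';
    # Repeating the alternation N times yields all strings of length N
    return map { glob( $alternation x $_ ) } 1 .. $width;
}

my @strings = strings_up_to( 2, 'a', 'b' );
print scalar(@strings), " strings\n";
print "$_\n" for @strings;
```

With the full 62-character set and a width of 4, the list holds several million strings in memory at once, so the recursive versions that print as they go are the safer choice at that size.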
It has to do with the scope of the variables: you're still changing the same variables each time you call the recursion. The keyword 'my' declares variables that are local to the enclosing block, so each call to the subroutine gets its own copies.
(http://perl.plover.com/FAQs/Namespaces.html)
I always use perl with 'use strict;' declared, forcing me to decide on the scope of the variables.
sub recurse {
    my ($width, $position, $base_string) = @_;
    for (my $i = 0; $i < $charset_length; ++$i) {
        my $base = $base_string . $charset[$i];
        if ($position < $width - 1) {
            my $pos = $position + 1;
            recurse($width, $pos, $base);
        }
        print $base;
        print " ";
    }
}
You seem to be running into some scoping issues. Perl is very flexible, so it is taking a guess at what you want because you haven't told it what you want. One of the first things you'll learn is to add use strict; as your first statement after the shebang. It will point out the variables that are not being explicitly declared (which also helps with misspelled variable names, etc).
If you make your code look like this, you'll see why you are getting your errors:
sub recurse {
    ($width, $position, $base_string) = @_;
    for ($i = 0; $i < $charset_length; ++$i) {
        $base = $base_string . $charset[$i];
        if ($position < $width - 1) {
            $pos = $position + 1;
            recurse($width, $pos, $base);
        }
        # print "$base\n";
    }
    print "$position\n";
}
This should output:
3
3
3
3
Because you are not scoping $position correctly with my, you aren't getting a new variable each recurse, you are re-using the same one. Toss a use strict; in there, and fix the errors you get, and the code should be good.
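The effect can be reproduced in a few lines. This is a minimal sketch, not the code from the question: the subroutine names leaky and scoped and the variable $shared_level are invented for the demonstration. One version stores its argument in a package variable, the other in a my variable.

```perl
use strict;
use warnings;

our $shared_level;    # one package variable shared by every call

# Stores its argument in the shared variable, like the original script
sub leaky {
    ($shared_level) = @_;
    leaky( $shared_level + 1 ) if $shared_level < 3;
    return $shared_level;    # already overwritten by the deeper calls
}

# Stores its argument in a fresh lexical variable for each call
sub scoped {
    my ($level) = @_;
    scoped( $level + 1 ) if $level < 3;
    return $level;           # still the value this call received
}

print 'leaky(0)  returns ', leaky(0),  "\n";    # prints 3
print 'scoped(0) returns ', scoped(0), "\n";    # prints 0
```

The outer call to leaky sees the value left behind by the innermost call, which is the same mechanism that makes $position and $i misbehave in the original loop.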
I realize that you're just tinkering with recursion. But as long as you're having fun comparing implementations between two languages you may as well also see how the CPAN can extend your tool set.
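If you don't care about the order, you can generate all 13,388,280 permutations of ( 'a'..'z', 'A'..'Z', '0'..'9' ) taken four at a time with the CPAN module Algorithm::Permute. Note that a permutation never repeats an element, so strings such as aaaa are not among the results.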
Here is an example of how that code may look.
use strict;
use warnings;
use Algorithm::Permute;
my $p = Algorithm::Permute->new(
    [ 'a' .. 'z', 'A' .. 'Z', '0' .. '9' ], # Set of...
    4                                       # <---- at a time.
);
while ( my @res = $p->next ) {
    print @res, "\n";
}
The new() method accepts an array ref that enumerates the character set or list of what to permute. Its second argument is how many at a time to include in the permutation. So you're essentially taking 62 items 4 at a time. Then use the next() method to iterate through the permutations. The rest is just window dressing.
The same thing could be reduced to the following Perl one-liner:
perl -MAlgorithm::Permute -e '$p=Algorithm::Permute->new(["a".."z","A".."Z",0..9],4);print @r, "\n" while @r=$p->next;'
There is also a section on permutation, along with additional examples in perlfaq4. It includes several examples and lists some additional modules that handle the details for you. One of Perl's strengths is the size and completeness of the Comprehensive Perl Archive Network (the CPAN).

How can I do 64-bit hex/decimal arithmetic AND output a full number in HEX as string in Perl?

I need to do some arithmetic with large hexadecimal numbers below, but when I try to output I'm getting overflow error messages "Hexadecimal number > 0xffffffff non-portable", messages about not portable, or the maximum 32-bit hex value FFFFFFFF.
All of which imply that the standard language and output routines only cope with 32 bit values. I need 64-bit values and have done a lot of research, but I found nothing that BOTH enables the arithmetic AND outputs the large number in hex.
my $result = 0x00000200A0000000 +
( ( $id & 0xFFFFF ) * 2 ) + ( ( $id / 0x100000 ) * 0x40000000 );
So, for $id with the following values I should get $result:
$id = 0, $result = 0x00000200A0000000
$id = 1, $result = 0x00000200A0000002
$id = 2, $result = 0x00000200A0000004
How can I do this?
Here is my inconclusive research results, with reasons why:
How can I do 64-bit arithmetic in Perl?
How can I sum large hexadecimal values in Perl? Vague, answer not definitively precise and no example.
Integer overflow
non conclusive
Integer overflow
non conclusive
bigint
no info about assignment, arithmetic or output
bignum
examples not close to my problem.
How can I sprintf a big number in Perl?
example given is not enough info for me: doesn't deal with hex
assignment or arithmetic.
Re: secret code generator
Some examples using Fleximal, mentions to_str to output value of
variable but 1) I don't see how the
variable was assigned and 2) I get
error "Can't call method "to_str"
without a package or object
reference" when I run my code using
it.
String to Hex
Example of using Math::BigInt which
doesn't work for me - still get
overflow error.
Is there a 64-bit hex()?
Nearly there - but doesn't deal with
outputting the large number in hex,
it only talks of decimal.
CPAN Math:Fleximal
does the arithmetic, but there doesn't seem to be any means to actually
output the value still in hex
sprintf
Doesn't seem to be able to cope with
numbers greater than 32-bits, get the
saturated FFFFFFFF message.
Edit: Update - new requirement and supplied solution - please feel free to offer comments
Chas. Owens's answer is still accepted and excellent (part 2 works for me; I haven't tried the part 1 version for newer Perl, though I would invite others to confirm it).
However, another requirement was to be able to convert back from the result to the original id.
So I've written the code to do this. Here's the full solution, including Chas. Owens's original solution, followed by the implementation for this new requirement:
#!/usr/bin/perl
use strict;
use warnings;
use bigint;
use Carp;
sub bighex {
    my $hex = shift;
    my $part = qr/[0-9a-fA-F]{8}/;
    croak "$hex is not a 64-bit hex number"
        unless my ($high, $low) = $hex =~ /^0x($part)($part)$/;
    return hex("0x$low") + (hex("0x$high") << 32);
}
sub to_bighex {
    my $decimal = shift;
    croak "$decimal is not an unsigned integer"
        unless $decimal =~ /^[0-9]+$/;
    my $high = $decimal >> 32;
    my $low = $decimal & 0xFFFFFFFF;
    return sprintf("%08x%08x", $high, $low);
}
for my $id (0, 1, 2, 0xFFFFF, 0x100000, 0x100001, 0x1FFFFF, 0x200000, 0x7FDFFFFF) {
    my $result = bighex("0x00000200A0000000");
    $result += ( ( $id & 0xFFFFF ) * 2 ) + ( ( $id / 0x100000 ) * 0x40000000 );
    my $clusterid = to_bighex($result);
    # the convert back code here:
    my $clusterid_asHex = bighex("0x".$clusterid);
    my $offset = $clusterid_asHex - bighex("0x00000200A0000000");
    my $index_small_units = ( $offset / 2 ) & 0xFFFFF;
    my $index_0x100000_units = ( $offset / 0x40000000 ) * 0x100000;
    my $index = $index_0x100000_units + $index_small_units;
    print "\$id = ".to_bighex( $id ).
        " clusterid = ".$clusterid.
        " back to \$id = ".to_bighex( $index ).
        " \n";
}
Try out this code at http://ideone.com/IMsp6.
#!/usr/bin/perl
use strict;
use warnings;
use bigint qw/hex/;
for my $id (0, 1, 2) {
    my $result = hex("0x00000200A0000000") +
        ( ( $id & 0xFFFFF ) * 2 ) + ( ( $id / 0x100000 ) * 0x40000000 );
    printf "%d: %#016x\n", $id, $result;
}
The bigint pragma replaces the hex function with a version that can handle numbers that large. It also transparently makes the mathematical operators deal with big ints instead of the ints on the target platform.
Note, this only works in Perl 5.10 and later. If you are running an earlier version of Perl 5, you can try this:
#!/usr/bin/perl
use strict;
use warnings;
use bigint;
use Carp;
# Parse a 16-digit hex string in two 32-bit halves
sub bighex {
    my $hex = shift;
    my $part = qr/[0-9a-fA-F]{8}/;
    croak "$hex is not a 64-bit hex number"
        unless my ($high, $low) = $hex =~ /^0x($part)($part)$/;
    return hex("0x$low") + (hex("0x$high") << 32);
}
# Format an unsigned integer as 16 hex digits, again in two halves
sub to_bighex {
    my $decimal = shift;
    croak "$decimal is not an unsigned integer"
        unless $decimal =~ /^[0-9]+$/;
    my $high = $decimal >> 32;
    my $low = $decimal & 0xFFFFFFFF;
    return sprintf("%08x%08x", $high, $low);
}
for my $id (0, 1, 2) {
    my $result = bighex("0x00000200A0000000");
    $result += ( ( $id & 0xFFFFF ) * 2 ) + ( ( $id / 0x100000 ) * 0x40000000 );
    print "$id ", to_bighex($result), "\n";
}
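The same formula can also be written against Math::BigInt objects directly instead of the bigint pragma, which makes each big-number step explicit. This is only a sketch: the subroutine name cluster_id_hex is invented here, the base value is passed as a string so no hex literal larger than 32 bits appears in the source, and $id is assumed to be a non-negative integer.

```perl
use strict;
use warnings;
use Math::BigInt;

# Apply the question's formula to $id and return a 16-digit hex string
sub cluster_id_hex {
    my ($id) = @_;
    my $big_id = Math::BigInt->new($id);

    # ( $id & 0xFFFFF ) * 2
    my $low = $big_id->copy;
    $low->band(0xFFFFF);
    $low->bmul(2);

    # int( $id / 0x100000 ) * 0x40000000
    my $high = $big_id->copy;
    $high->bdiv(0x100000);
    $high->bmul(0x40000000);

    my $sum = Math::BigInt->new('0x00000200A0000000');
    $sum->badd($low);
    $sum->badd($high);

    # as_hex returns something like '0x200a0000002'; pad it to 16 digits
    ( my $digits = $sum->as_hex ) =~ s/^0x//;
    return '0x' . ( '0' x ( 16 - length($digits) ) ) . uc($digits);
}

print "$_: ", cluster_id_hex($_), "\n" for 0, 1, 2;
```

The three printed values should match the results listed in the question for $id 0, 1 and 2.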
The comment by ysth is right. Here is a short example of 64-bit arithmetic using the Perl from Debian stretch, without Math::BigInt (aka "use bigint"):
#!/usr/bin/perl -wwi
sub do_64bit_arith {
    use integer;
    my $x = ~2;
    $x <<= 4;
    printf "0x%08x%08x\n", $x>>32, $x;
}
do_64bit_arith();
exit 0;
The script prints 0xffffffffffffffffffffffffffffffd0.

How do I determine the longest similar portion of several strings?

As per the title, I'm trying to find a way to programmatically determine the longest portion of similarity between several strings.
Example:
file:///home/gms8994/Music/t.A.T.u./
file:///home/gms8994/Music/nina%20sky/
file:///home/gms8994/Music/A%20Perfect%20Circle/
Ideally, I'd get back file:///home/gms8994/Music/, because that's the longest portion that's common for all 3 strings.
Specifically, I'm looking for a Perl solution, but a solution in any language (or even pseudo-language) would suffice.
From the comments: yes, only at the beginning; but there is the possibility of having some other entry in the list, which would be ignored for this question.
Edit: I apologize for a mistake. I overlooked that using a my variable inside countit(x, q{}) is a serious error: the string is evaluated inside the Benchmark module, where @str was empty. This solution is not as fast as I first presented; see the correction below.
Perl can be fast:
use strict;
use warnings;
package LCP;
sub LCP {
    return '' unless @_;
    return $_[0] if @_ == 1;
    my $i = 0;
    my $first = shift;
    my $min_length = length($first);
    foreach (@_) {
        $min_length = length($_) if length($_) < $min_length;
    }
    # Walk the first string one character at a time and stop at the first mismatch
    INDEX: foreach my $ch ( split //, $first ) {
        last INDEX unless $i < $min_length;
        foreach my $string (@_) {
            last INDEX if substr($string, $i, 1) ne $ch;
        }
    }
    continue { $i++ }
    return substr $first, 0, $i;
}
# Roy's implementation
sub LCP2 {
    return '' unless @_;
    my $prefix = shift;
    for (@_) {
        chop $prefix while (! /^\Q$prefix\E/);
    }
    return $prefix;
}
1;
Test suite:
#!/usr/bin/env perl
use strict;
use warnings;
Test::LCP->runtests;
package Test::LCP;
use base 'Test::Class';
use Test::More;
use Benchmark qw(:all :hireswallclock);
sub test_use : Test(startup => 1) {
    use_ok('LCP');
}
sub test_lcp : Test(6) {
    is( LCP::LCP(), '', 'Without parameters' );
    is( LCP::LCP('abc'), 'abc', 'One parameter' );
    is( LCP::LCP( 'abc', 'xyz' ), '', 'None of common prefix' );
    is( LCP::LCP( 'abcdefgh', ('abcdefgh') x 15, 'abcdxyz' ),
        'abcd', 'Some common prefix' );
    my @str = map { chomp; $_ } <DATA>;
    is( LCP::LCP(@str),
        'file:///home/gms8994/Music/', 'Test data prefix' );
    is( LCP::LCP2(@str),
        'file:///home/gms8994/Music/', 'Test data prefix by LCP2' );
    my $t = countit( 1, sub{LCP::LCP(@str)} );
    diag("LCP: ${\($t->iters)} iterations took ${\(timestr($t))}");
    $t = countit( 1, sub{LCP::LCP2(@str)} );
    diag("LCP2: ${\($t->iters)} iterations took ${\(timestr($t))}");
}
__DATA__
file:///home/gms8994/Music/t.A.T.u./
file:///home/gms8994/Music/nina%20sky/
file:///home/gms8994/Music/A%20Perfect%20Circle/
Test suite result:
1..7
ok 1 - use LCP;
ok 2 - Without parameters
ok 3 - One parameter
ok 4 - None of common prefix
ok 5 - Some common prefix
ok 6 - Test data prefix
ok 7 - Test data prefix by LCP2
# LCP: 22635 iterations took 1.09948 wallclock secs ( 1.09 usr + 0.00 sys = 1.09 CPU) # 20766.06/s (n=22635)
# LCP2: 17919 iterations took 1.06787 wallclock secs ( 1.07 usr + 0.00 sys = 1.07 CPU) # 16746.73/s (n=17919)
That means the pure Perl solution using substr is about 20% faster than Roy's solution on your test case, and finding one prefix takes about 50 microseconds. There is no need to use XS unless your data or performance expectations are bigger.
The reference given already by Brett Daniel for the Wikipedia entry on "Longest common substring problem" is very good general reference (with pseudocode) for your question as stated. However, the algorithm can be exponential. And it looks like you might actually want an algorithm for longest common prefix which is a much simpler algorithm.
Here's the one I use for longest common prefix (and a ref to original URL):
use strict; use warnings;
sub longest_common_prefix {
    # longest_common_prefix( $|@ ): returns $
    # URLref: http://linux.seindal.dk/2005/09/09/longest-common-prefix-in-perl
    # find longest common prefix of scalar list
    my $prefix = shift;
    for (@_) {
        # Shorten the candidate until the current string starts with it
        chop $prefix while (! /^\Q$prefix\E/);
    }
    return $prefix;
}
my @str = map {chomp; $_} <DATA>;
print longest_common_prefix(@str), "\n";
__DATA__
file:///home/gms8994/Music/t.A.T.u./
file:///home/gms8994/Music/nina%20sky/
file:///home/gms8994/Music/A%20Perfect%20Circle/
If you truly want a LCSS implementation, refer to these discussions (Longest Common Substring and Longest Common Subsequence) at PerlMonks.org. Tree::Suffix would probably be the best general solution for you and implements, to my knowledge, the best algorithm. Unfortunately recent builds are broken. But, a working subroutine does exist within the discussions referenced on PerlMonks in this post by Limbic~Region (reproduced here with your data).
#URLref: http://www.perlmonks.org/?node_id=549876
#by Limbic~Region
use Algorithm::Loops 'NestedLoops';
use List::Util 'reduce';
use strict; use warnings;
sub LCS{
    my @str = @_;
    my @pos;
    for my $i (0 .. $#str) {
        my $line = $str[$i];
        for (0 .. length($line) - 1) {
            my $char= substr($line, $_, 1);
            push @{$pos[$i]{$char}}, $_;
        }
    }
    my $sh_str = reduce {length($a) < length($b) ? $a : $b} @str;
    my %map;
    CHAR:
    for my $char (split //, $sh_str) {
        my @loop;
        for (0 .. $#pos) {
            next CHAR if ! $pos[$_]{$char};
            push @loop, $pos[$_]{$char};
        }
        my $next = NestedLoops([@loop]);
        while (my @char_map = $next->()) {
            my $key = join '-', @char_map;
            $map{$key} = $char;
        }
    }
    my @pile;
    for my $seq (keys %map) {
        push @pile, $map{$seq};
        for (1 .. 2) {
            my $dir = $_ % 2 ? 1 : -1;
            my @offset = split /-/, $seq;
            $_ += $dir for @offset;
            my $next = join '-', @offset;
            while (exists $map{$next}) {
                $pile[-1] = $dir > 0 ?
                    $pile[-1] . $map{$next} : $map{$next} . $pile[-1];
                $_ += $dir for @offset;
                $next = join '-', @offset;
            }
        }
    }
    return reduce {length($a) > length($b) ? $a : $b} @pile;
}
my @str = map {chomp; $_} <DATA>;
print LCS(@str), "\n";
__DATA__
file:///home/gms8994/Music/t.A.T.u./
file:///home/gms8994/Music/nina%20sky/
file:///home/gms8994/Music/A%20Perfect%20Circle/
It sounds like you want the k-common substring algorithm. It is exceptionally simple to program, and a good example of dynamic programming.
My first instinct is to run a loop, taking the next character from each string, until the characters are not equal. Keep a count of what position in the string you're at and then take a substring (from any of the three strings) from 0 to the position before the characters aren't equal.
In Perl, you'll have to split up the string first into characters using something like
@array = split(//, $string);
(splitting on an empty pattern puts each character into its own element of the array)
Then do a loop, perhaps overall:
$n = 0;
@array1 = split(//, $string1);
@array2 = split(//, $string2);
@array3 = split(//, $string3);
# Compare with eq (string equality) and stop at the end of the shortest string
while ( defined $array1[$n] && defined $array2[$n] && defined $array3[$n]
    && $array1[$n] eq $array2[$n] && $array2[$n] eq $array3[$n] ) {
    $n++;
}
$sameString = substr($string1, 0, $n); # $n is the number of matching characters
Or at least something along those lines. Forgive me if this doesn't work, my Perl is a little rusty.
If you google for "longest common substring" you'll get some good pointers for the general case where the sequences don't have to start at the beginning of the strings.
Eg, http://en.wikipedia.org/wiki/Longest_common_substring_problem.
Mathematica happens to have a function for this built in:
http://reference.wolfram.com/mathematica/ref/LongestCommonSubsequence.html (Note that they mean contiguous subsequence, ie, substring, which is what you want.)
If you only care about the longest common prefix then it should be much faster to just loop for i from 0 until the ith characters don't all match, and return the first i characters (substr(s, 0, i) in Perl, where the third argument is a length).
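That index-based loop might be sketched in Perl as follows. The subroutine name common_prefix_by_index is invented for this example, and it makes no attempt to be the fastest variant.

```perl
use strict;
use warnings;

# Return the longest prefix shared by all the strings passed in
sub common_prefix_by_index {
    my @strings = @_;
    return '' unless @strings;
    my $first = $strings[0];
    my $i     = 0;
    CHAR: while ( $i < length $first ) {
        my $ch = substr( $first, $i, 1 );
        for my $string (@strings) {
            # Stop at the end of a shorter string or at the first mismatch
            last CHAR
                if $i >= length($string) || substr( $string, $i, 1 ) ne $ch;
        }
        $i++;
    }
    # $i now counts the characters that matched in every string
    return substr( $first, 0, $i );
}

print common_prefix_by_index(
    'file:///home/gms8994/Music/t.A.T.u./',
    'file:///home/gms8994/Music/nina%20sky/',
    'file:///home/gms8994/Music/A%20Perfect%20Circle/',
), "\n";
```

For the three example URLs this prints file:///home/gms8994/Music/.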
From http://forums.macosxhints.com/showthread.php?t=33780
my @strings =
(
    'file:///home/gms8994/Music/t.A.T.u./',
    'file:///home/gms8994/Music/nina%20sky/',
    'file:///home/gms8994/Music/A%20Perfect%20Circle/',
);
my $common_part = undef;
my $sep = chr(0); # assuming it's not used legitimately
foreach my $str ( @strings ) {
    # First time through loop -- set common
    # to whole
    if ( !defined $common_part ) {
        $common_part = $str;
        next;
    }
    if ("$common_part$sep$str" =~ /^(.*).*$sep\1.*$/)
    {
        $common_part = $1;
    }
}
print "Common part = $common_part\n";
Faster than above, uses perl's native binary xor function, adapted from perlmongers solution (the $+[0] didn't work for me):
sub common_suffix {
    my $comm = shift @_;
    # defined() keeps the loop going for strings such as '0' or ''
    while ( defined( $_ = shift @_ ) ) {
        $_ = substr($_,-length($comm)) if (length($_) > length($comm));
        $comm = substr($comm,-length($_)) if (length($_) < length($comm));
        # XOR of equal characters is "\0", so trailing NULs mark the common suffix
        if (( $_ ^ $comm ) =~ /(\0*)$/) {
            $comm = substr($comm, -length($1));
        } else {
            return undef;
        }
    }
    return $comm;
}
sub common_prefix {
    my $comm = shift @_;
    while ( defined( $_ = shift @_ ) ) {
        $_ = substr($_,0,length($comm)) if (length($_) > length($comm));
        $comm = substr($comm,0,length($_)) if (length($_) < length($comm));
        # Leading NULs in the XOR mark the common prefix
        if (( $_ ^ $comm ) =~ /^(\0*)/) {
            $comm = substr($comm,0,length($1));
        } else {
            return undef;
        }
    }
    return $comm;
}
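To see why the XOR trick works, it may help to isolate the comparison step. In this sketch the names xor_prefix_length and xor_common_prefix are invented for illustration. Both operands are stringified so that Perl performs a bytewise string XOR rather than a numeric one, and the code assumes the "bitwise" feature (which changes the meaning of ^) is not enabled.

```perl
use strict;
use warnings;

# Count the leading characters two strings have in common.
# XOR of two equal characters is "\0", so the run of NULs at the
# start of the XOR result is as long as the shared prefix.
sub xor_prefix_length {
    my ( $left, $right ) = @_;
    my $diff = "$left" ^ "$right";
    $diff =~ /^(\0*)/;
    return length $1;
}

# Fold the pairwise comparison over a whole list of strings
sub xor_common_prefix {
    my ( $prefix, @rest ) = @_;
    return '' unless defined $prefix;
    for my $string (@rest) {
        $prefix = substr( $prefix, 0, xor_prefix_length( $prefix, $string ) );
    }
    return $prefix;
}

print xor_common_prefix(
    'file:///home/gms8994/Music/t.A.T.u./',
    'file:///home/gms8994/Music/nina%20sky/',
    'file:///home/gms8994/Music/A%20Perfect%20Circle/',
), "\n";
```

When the strings differ in length, Perl pads the shorter one with NUL bytes for the XOR, so the extra characters of the longer string show up unchanged and end the run of NULs.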