Why does if/elsif in Perl execute only the first block?

I am new to Perl. I have an assignment to write a Perl program that accepts a countable noun from the command line and generates its plural form. I have written the code below, and it compiles without errors. When I run it from the command line (perl plural.pl, for example), it prompts me to enter a noun, but whatever noun I enter, the plural is formed the same way: only the first block runs and the remaining elsif branches are never executed.
For example, if I enter the word "cat", the plural is generated as "cats". But when I enter the word "church", the plural is generated as "churchs", and "fly" becomes "flys".
Here is the code:
#!/usr/bin/perl
$suffix1 = 's';
$suffix2 = 'es';
$suffix3 = 'ies';
print "Enter a countable noun to get plural: ";
$word = <STDIN>;
chomp($word);
if(substr $word, -1 == 'b' or 'd' or 'c' or 'g' or 'r' or 'j' or 'k' or 'l' or 'm' or 'n' or 'p' or 'q' or 'r' or 't' or 'v' or 'w' or 'e' or 'i' or 'o' or 'u') {
$temp = $word.$suffix1;
print "The plural form of the word \"$word\" is: $temp \n";
}
elsif (substr $word, -1 == 's' or 'sh' or 'ch' or 'x' or 'z') {
$temp = $word.$suffix2;
print "The plural form of the word \"$word\" is: $temp \n";
}
elsif (substr $word, -1 == 'y') {
chop($word);
$temp = $word.$suffix3;
print "The plural form of the word \"$word\" is: $temp \n";
}
Could you help me make the code execute all three branches?

First of all, always use use strict; use warnings;.
Strings are compared using eq, not ==.
substr $word, -1 eq 'b' means substr $word, (-1 eq 'b') when you meant substr($word, -1) eq 'b'. You'll face lots of problems if you omit parens around function calls.
substr($word, -1) eq 'b' or 'd' means the same as (substr($word, -1) eq 'b') or ('d'). 'd' is always true. You'd need to use substr($word, -1) eq 'b' or substr($word, -1) eq 'd'. (Preferably, you'd save substr $word, -1 in a variable to avoid doing it repeatedly.)
substr $word, -1 will never equal ch or sh.
The match operator makes this easy:
if ($word =~ /[bdcgrjklmnpqrtvweiou]\z/) {
...
}
elsif ($word =~ /(?:[sxz]|[sc]h)\z/) {
...
}
elsif ($word =~ /y\z/) {
...
}

In Perl, we use eq for string comparison instead of ==.
You can't chain or like this. It should be if (substr($word, -1) eq 'b' or substr($word, -1) eq 'd'). Alternatively, you could put all the strings you want to compare against into an array and grep that array.
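The array-and-grep idea could look roughly like this. It is only a sketch: the helper name ends_with_any and the list @es_endings are made up for illustration, and it only covers the "add es" endings from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $word ends with any of the given suffixes.
sub ends_with_any {
    my ($word, @suffixes) = @_;
    # Take as many trailing characters as the suffix has, so that
    # multi-character suffixes such as 'ch' are compared correctly.
    my $count = grep {
        length($_) <= length($word)
            && substr($word, -length($_)) eq $_
    } @suffixes;
    return $count;
}

my @es_endings = qw(s x z sh ch);

for my $word (qw(church cat box)) {
    my $plural = ends_with_any($word, @es_endings) ? $word . 'es' : $word . 's';
    print "$word -> $plural\n";
}
```

In scalar context grep returns the number of matching elements, so the result can be used directly as a condition.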

Duskast is right. Perl uses symbols for numeric comparisons and words for string comparisons:
Numeric  String
==       eq
!=       ne
<        lt
<=       le
>        gt
>=       ge
<=>      cmp
Also, your use of or, though a good try, doesn't work. The keyword or has weak precedence, and so the expression
substr $word, -1 == 'b' or 'd' or 'c' or
'g' or 'r' or 'j' or
'k' or 'l' or 'm' or
'n' or 'p' or 'q' or
'r' or 't' or 'v' or
'w' or 'e' or 'i' or
'o' or 'u'
is interpreted as
substr ($word, (-1 == 'b')) or 'd' or 'c' or
'g' or 'r' or 'j' or
'k' or 'l' or 'm' or
'n' or 'p' or 'q' or
'r' or 't' or 'v' or
'w' or 'e' or 'i' or
'o' or 'u'
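Here -1 == 'b' is false, so the call becomes substr($word, 0), which returns the whole word and is therefore true for any ordinary input. Even if it were false, the expression would continue to the or 'd', and a non-empty string such as 'd' is always true. Have you seen regular expressions yet? This is much more idiomatically done as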
if ($word =~ /[bdcgrjklmnpqrtvweiou]$/) {...}
# Does $word match any of those characters followed by
# the end of the line or string?
Look in the Perl docs for string substitution and the s/.../.../ construct.
By the way, if you were paid to do this instead of being a student, you'd use the Lingua modules instead.

First of all, always, always include use strict; and use warnings;.
Second, use indentations. I've taught Perl courses at work and refuse to accept any assignment that was not indented correctly. In fact, I'm very, very strict about this because I want users to learn to code to the standard (4 space indent, etc.). It makes your program easier to read and to support.
While we're at it, break overly long lines -- especially on StackOverflow. It's hard to read a program when you have to scroll back and forth.
Quick look at your program:
In Perl, strings and numerics use two different sets of boolean operations. This is because strings can contain only digits, but still be strings. Imagine inventory item numbers like 1384 and 993. If I'm sorting these as strings, the 1384 item comes first. If I am sorting them numerically, 993 should come first. Your program has no way of knowing this except by the boolean operation you use:
Boolean Operation    Numeric  String
=================    =======  ======
Equals               ==       eq
Not Equals           !=       ne
Greater Than         >        gt
Less Than            <        lt
Greater than/Equals  >=       ge
Less than/Equals     <=       le
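To make the inventory-number example concrete, here is a small sketch using the two item numbers mentioned above. The variable names are only for illustration.

```perl
use strict;
use warnings;

my @item_numbers = (1384, 993);

# cmp compares character by character, so "1384" sorts before "993"
my @as_strings = sort { $a cmp $b } @item_numbers;

# <=> compares numeric values, so 993 sorts before 1384
my @as_numbers = sort { $a <=> $b } @item_numbers;

print "string order:  @as_strings\n";   # 1384 993
print "numeric order: @as_numbers\n";   # 993 1384
```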
The other problem is that or, and, || and && each combine two boolean expressions; they do not distribute a comparison over a list of values. This won't work:
if ( $a > $b or $c ) {
What this is saying is this:
if ( ( $a > $b ) or $c ) {
So, if $c is a non-zero value, then $c will be true, and the whole statement would be true. You have to do your statement this way:
if ( $a > $b or $a > $c ) {
Another thing, use qq(..) and q() when quoting strings that contain quotation marks. This way, you don't have to put a backslash in front of them.
print "The word is \"swordfish\"\n";
print qq(The word is "swordfish"\n);
And, if you use use feature qw(say); at the top of your program, you get the bonus command of say which is like print, except the ending new line is assumed:
say qq(The word is "swordfish");
When you use substr $foo, -1, you are only looking at the last character. It can never be a two-character string:
if ( substr($word, -1) eq "ch" ) {
will always be false.
Long ifs are hard to maintain. I would use a for loop (actually not, but let's pretend for now..):
#! /usr/bin/env perl
#
# Use these in ALL of your programs
#
use strict;
use warnings;
use feature qw(say);
#
# Use better, more descriptive names
#
my $standard_plural_suffix = 's';
my $uncommon_plural_suffix = 'es';
my $y_ending_plural_suffix = 'ies';
print "Enter a countable noun to get plural: ";
chomp (my $word = <STDIN>);
my $plural_form;
#
# Instead of a long, long "if", use a for loop for testing. Easier to maintain
#
for my $last_letter ( qw( b d c g r j k l m n p q r t v w e i o u ) ) {
if ( substr($word, -1) eq $last_letter ) {
$plural_form = $word . $standard_plural_suffix;
last;
}
}
#
# Is it an "uncommon plural" test (single character)
#
if ( not $plural_form ) {
for my $last_letter ( qw(s x z) ) {
if ( substr($word, -1) eq $last_letter ) {
$plural_form = $word . $uncommon_plural_suffix;
last;
}
}
}
#
# Is it an "uncommon plural" test (double character)
#
if ( not $plural_form ) {
for my $last_two_letters ( qw(sh ch) ) {
if ( substr($word, -2) eq $last_two_letters ) {
$plural_form = $word . $uncommon_plural_suffix;
last;
}
}
}
if ( not $plural_form ) {
if ( substr($word, -1) eq 'y' ) {
chop ( my $chopped_word = $word );
$plural_form = $chopped_word . $y_ending_plural_suffix;
}
}
if ( $plural_form ) {
say qq(The plural of "$word" is "$plural_form");
}
else {
say qq(Could not find plural form of "$word");
}
Do you know about regular expressions? Those would work a lot better than using substr because you can test multiple things at once. Plus, I wouldn't use chop, but a regular expression substitution:
#! /usr/bin/env perl
#
# Use these in ALL of your programs
#
use strict;
use warnings;
use feature qw(say);
#
# Use better, more descriptive names
#
my $standard_plural_suffix = 's';
my $uncommon_plural_suffix = 'es';
my $y_ending_plural_suffix = 'ies';
print "Enter a countable noun to get plural: ";
chomp (my $word = <STDIN>);
my $plural_form;
#
# Standard plural (adding plain ol' 's')
#
if ( $word =~ /[bdcgrjklmnpqrtvweiou]$/ ) {
$plural_form = $word . $standard_plural_suffix;
}
#
# Uncommon plural (adding es)
#
elsif ( $word =~ /([sxz]|[sc]h)$/ ) {
$plural_form = $word . $uncommon_plural_suffix;
}
#
# Final 'y' rule: Replace y with ies
#
elsif ( $word =~ /y$/ ) {
$plural_form = $word;
$plural_form =~ s/y$/ies/;
}
if ( $plural_form ) {
say qq(The plural of "$word" is "$plural_form");
}
else {
say qq(Could not find plural form of "$word");
}

I have changed your code a bit. I'm using a regular expression:
#!/usr/bin/perl
$suffix1 = 's';
$suffix2 = 'es';
$suffix3 = 'ies';
print "Enter a countable noun to get plural: ";
$word = <STDIN>;
chomp($word);
if ( $word =~ m/(s|sh|ch|x|z)$/) {
$temp = $word . $suffix2;
}
elsif ( substr( $word, -1 ) eq 'y' ) {
# Drop the final 'y' from a copy so that $word still prints intact below
$temp = substr( $word, 0, -1 ) . $suffix3;
}
else {
$temp = $word . $suffix1;
}
print "The plural form of the word \"$word\" is: $temp \n";
Also I recommend you always use strict; and use warnings;

Related

How to tell if a string can be transformed into another string by removing or uppercasing lowercase letters?

In perl, I have two input strings, for this example, ahueFFggLKy and HFFGLK. I want to be able to iterate through all of the possible combinations of my input without lowercase letter groups (a, h, u, e, g...ah, au...hegy, etc) so in each iteration lowercase letters are removed and the remaining lowercase letters are uppercased:
ah:
ueFFggLKy (UEFFGGLKY)
^^
au:
h eFFggLKy (HEFFGGLKY)
^ ^
hegy:
a u FF gLKy | a u FFg LKy (AUFFGLKY)
^ ^ ^ | ^ ^ ^
auegy:
h FF gLK | h FFg LK (HFFGLK)
^ ^^ ^ ^ ^ ^^ ^ ^ -^--^-
The last option (auegy) is the answer, and I want to be able to iterate over letters to determine if I am able to convert ahueFFggLKy to HFFGLK without modifying any of the capital letters. This example would return "YES".
If inputs like fOoBar and BAR come up, I am not successfully able to convert fOoBar to BAR because the O in fOoBar is capitalized. My program would return "NO".
Can someone provide me with a perl example of how this would be done?
I think I have understood your requirement: the first string may be transformed by either deleting or upper-casing any lower-case letter, and you wish to know whether the second string can be derived from the first in this way.
I suggest that you transform the second string into a regex pattern to achieve this. If every upper-case letter in the second string can be matched, in order, by the corresponding upper or lower-case letter in the first, with any number of intervening lower-case letters, then the transformation is possible. Otherwise it is not.
This program implements the idea:
use strict;
use warnings 'all';
use feature 'say';
my @pairs = (
[ qw/ ahueFFggLKy HFFGLK / ],
[ qw/ fOoBar BAR / ],
);
for my $pair ( @pairs ) {
my ($s1, $s2) = @$pair;
printf "%s => %s -- %s\n", $s1, $s2, contains($s1, $s2) ? 'YES' : 'NO';
}
sub contains {
my ($s1, $s2) = @_;
my $re = join ' \p{Ll}* ', map { "(?i: $_ )" } $s2 =~ /\p{Lu}/g;
$re = qr/ ^ \p{Ll}* $re \p{Ll}* $ /x;
$s1 =~ $re;
}
output
ahueFFggLKy => HFFGLK -- YES
fOoBar => BAR -- NO
To read an array like @pairs from STDIN you could write something like this
my @pairs;
{
local $/;
my @input = split ' ', <>;
push @pairs, [ splice @input, 0, 2 ] while @input > 1;
}
Kind of an inelegant solution, but it seems to output what you need.
#!/usr/bin/perl
use warnings;
use strict;
use List::Util qw{ all };
my ($str1, $str2) = qw( ahueFFggLKy HFFGLK );
my @small_indices;
push @small_indices, pos($str1) - 1 while $str1 =~ /[[:lower:]]/g;
my @present = (0) x @small_indices;
until (all { $_ } @present) {
my $try = $str1;
for my $i (reverse 0 .. $#present) {
substr $try, $small_indices[$i], 1,
$present[$i] ? substr $str1, $small_indices[$i], 1
: q();
}
if (uc $try eq $str2) {
print $present[$_] ? q() : substr $str1, $small_indices[$_], 1
for 0 .. $#present;
print ":\n";
my $j = 0;
for my $i (0 .. length($str1) - 1) {
my $char = substr $str1, $i, 1;
if ($char eq uc $char || $present[$j++]) {
print $char;
} else {
print '.';
}
}
print "\n";
}
my $idx = 0;
$present[$idx] = 0, ++$idx while $present[$idx];
$present[$idx] = 1;
}
It builds an indicator function @present, which says which lowercase letters are present in the string. All possible values of @present are iterated by adding 1 to the binary number corresponding to the function.
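The "add 1 to the binary number" step can be looked at in isolation. The following is only a sketch with three flags (least significant bit first); the @seen array is a hypothetical addition that records each combination for display, and unlike the loop above it also visits the all-ones combination before stopping.

```perl
use strict;
use warnings;

# Indicator array for three lowercase letters, least significant bit first
my @present = (0) x 3;
my @seen;

while (1) {
    push @seen, join '', @present;
    last unless grep { !$_ } @present;   # stop once every flag is set

    # Add 1 in binary: clear the leading run of 1s, then set the first 0
    my $idx = 0;
    $present[$idx++] = 0 while $present[$idx];
    $present[$idx] = 1;
}

print "$_\n" for @seen;
```

With n flags this visits all 2**n combinations, which is why the approach becomes slow for strings with many lowercase letters.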

stripping off numbers and alphabetics in perl

I have an input variable, say $a. $a can be either number or string or mix of both.
My question is how can I strip off the variable to separate numeric digits and alphabetic characters?
Example;
$a can be 'AB9'
Here I should be able to store 'AB' in one variable and '9' in other.
How can I do that?
Check this version, it works with 1 or more numeric and alphabetic characters in a variable.
#!/usr/bin/perl
use strict;
use Data::Dumper;
my $var = '11a';
my (@digits, @alphabetics);
while ($var =~ /([a-zA-Z]+)/g) {
push @alphabetics, $1;
}
while ($var =~ /(\d+)/g) {
push @digits, $1;
}
print Dumper(\@alphabetics);
print Dumper(\@digits);
Here's one way to express it concisely:
my ($digits) = $input =~ /(\d+)/;
my ($alpha) = $input =~ /([a-z]+)/i;
say 'digits: ' . ($digits // 'none');
say 'non-digits: ' . ($alpha // 'none');
It's important to use the match operator in list context here; in scalar context it would only return whether the match succeeded.
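A quick way to see the difference between the two contexts, using the 'AB9' value from the question (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $input = 'AB9';

# Scalar context: the match returns true/false (here 1), not the capture
my $matched = $input =~ /(\d+)/;

# List context (note the parentheses): the match returns the captures
my ($digits) = $input =~ /(\d+)/;

print "scalar context: $matched\n";   # 1
print "list context:   $digits\n";    # 9
```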
If you want to get all occurrences in the input string, simply change the scalar variables in list context to proper arrays:
my @digits = $input =~ /(\d+)/g;
my @alpha = $input =~ /([a-z]+)/gi;
say 'digits: ' . join ', ' => @digits;
say 'non-digits: ' . join ', ' => @alpha;
For my $input = '42AB17C', the output is
digits: 42, 17
non-digits: AB, C

Saving a transliteration table in perl

I want to transliterate digits from 1 - 8 with 0 but not knowing the number at compile time. Since transliterations do not interpolate variables I'm doing this:
@trs = (sub{die},sub{${$_[0]} =~ tr/[0,1]/[1,0]/},sub{${$_[0]} =~ tr/[0,2]/[2,0]/},sub{${$_[0]} =~ tr/[0,3]/[3,0]/},sub{${$_[0]} =~ tr/[0,4]/[4,0]/},sub{${$_[0]} =~ tr/[0,5]/[5,0]/},sub{${$_[0]} =~ tr/[0,6]/[6,0]/},sub{${$_[0]} =~ tr/[0,7]/[7,0]/},sub{${$_[0]} =~ tr/[0,8]/[8,0]/});
and then index it like:
$trs[$character_to_transliterate](\$var_to_change);
I would appreciate if anyone can point me to a best looking solution.
Any time that you are repeating yourself, you should see if what you are doing can be done in a loop. Since tr creates its tables at compile time, you can use eval to access the compiler at runtime:
my @trs = (sub {die}, map {eval "sub {\$_[0] =~ tr/${_}0/0$_/}"} 1 .. 8);
my $x = 123;
$trs[2]($x);
print "$x\n"; # 103
There is also no need to use references here, subroutine arguments are already passed by reference.
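A small sketch of that aliasing behaviour; the two subroutine names are made up for illustration:

```perl
use strict;
use warnings;

# Elements of @_ are aliases to the caller's variables, so assigning to
# $_[0] changes the caller's scalar directly.
sub zero_out_first { $_[0] = 0 }

# Copying @_ into lexicals breaks the alias; the caller is unaffected.
sub zero_out_copy { my ($value) = @_; $value = 0 }

my $x = 123;
zero_out_copy($x);
print "$x\n";    # 123
zero_out_first($x);
print "$x\n";    # 0
```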
If you do not want to use string eval, you need to use a construct that supports runtime modification. For that you can use the s/// operator:
sub subst {$_[0] =~ s/($_[1]|0)/$1 ? 0 : $_[1]/ge}
my $z = 1230;
subst $z => 2;
print "$z\n"; # 1032
The tr/// construct is faster than s/// since the latter supports regular expressions.
I'd suggest simply ditching tr in favor of something that actually permits a little bit of metaprogramming like s///. For example:
# Replace $to_swap with 0 and 0 with $to_swap, and leave
# everything else alone.
sub swap_with_0 {
my ($digit, $to_swap) = @_;
if ($digit == $to_swap) {
return 0;
} elsif ($digit == 0) {
return $to_swap;
} else {
return $digit;
}
}
# Swap 0 and $to_swap throughout $string
sub swap_digits {
my ($string, $to_swap) = @_;
$string =~ s/([0$to_swap])/swap_with_0($1, $to_swap)/eg;
return $string;
}
which is surprisingly straightforward. :)
Here's a short subroutine that uses substitution instead of transliteration:
sub swap_digits {
my ($str, $digit) = @_;
$str =~ s{ (0) | $digit }{ defined $1 ? $digit : 0 }gex;
return $str;
}

In Perl, I need to read a .conf file that contains condition like $a>$b based on the result of the condition i have to execute either if or else block

The content of condition.conf:
condition1=$a>$b
Example Perl code:
$cndConfFile = $ARGV[0];
open(CONDITIONS, '<', $cndConfFile);
$cndCount=0;
while ( <CONDITIONS> ) {
chomp; # no newline
s/#.*//; # no comments
s/^\s+//; # no leading white
s/\s+$//; # no trailing white
next unless length;
($var, $value) = split(/\s*=\s*/, $_, 2);
$cndOnCounterValues[$cndCount++]=$value;
}
close CONDITIONS;
$cond = $cndOnCounterValues[0];
print "\n$cond\n";
$a=3;
$b=5;
if($cond){
print "a is greater then b";
}
else
{
print "b is greater then a";
}
The above code always gives the output "a is greater then b", regardless of the values of $a and $b.
I assume that you want to eval the $a>$b expression that literally appears in your config file. To do that replace:
if ($cond) {
with:
if (eval $cond) {
That should do the trick.
Disclaimer: don't do this unless you know what you are doing (see comments).
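If running arbitrary code from a config file is a concern, one alternative is to parse the condition and dispatch on a fixed set of operators instead of calling eval. This is only a sketch under the assumption that every condition has the form $name OP $name; the %compare table, the check_condition sub and the %values hash are all hypothetical names, not something from the question.

```perl
use strict;
use warnings;

# Hypothetical whitelist of operators allowed in the config file
my %compare = (
    '>'  => sub { $_[0] >  $_[1] },
    '<'  => sub { $_[0] <  $_[1] },
    '>=' => sub { $_[0] >= $_[1] },
    '<=' => sub { $_[0] <= $_[1] },
    '==' => sub { $_[0] == $_[1] },
    '!=' => sub { $_[0] != $_[1] },
);

# Evaluate a condition such as '$a>$b' against a hash of known values.
# Dies on anything that is not "$name OP $name" with a whitelisted OP.
sub check_condition {
    my ($cond, $values) = @_;
    my ($left, $op, $right) =
        $cond =~ /^\s*\$(\w+)\s*(>=|<=|==|!=|>|<)\s*\$(\w+)\s*$/
        or die "Unsupported condition: $cond\n";
    for my $name ($left, $right) {
        die "Unknown variable: \$$name\n" unless exists $values->{$name};
    }
    return $compare{$op}->($values->{$left}, $values->{$right}) ? 1 : 0;
}

my %values = (a => 3, b => 5);
print check_condition('$a>$b', \%values) ? "a is greater than b\n"
                                          : "a is not greater than b\n";
```

The trade-off is that only the listed operators and plain variable names are accepted, which is the point: nothing from the file is ever executed as Perl code.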
Here is a quick example that seems to solve your problem.
#! /usr/bin/env perl
use strict;
use warnings;
my @cond;
{
while( <> ){
chomp;
next unless length;
next if m' ^ \s* \# 'x;
next unless m' (\w+) \s* = \s* (.*?) \s* $'x;
push @cond, [$1,$2];
}
}
my($a,$b);
$a=3;
$b=5;
for my $elem ( @cond ){
my($name,$cond) = @$elem;
if( eval $cond ){
print "$name is true, because $cond matches "
}else{
print "$name is false, because $cond doesn't match "
}
print '(', eval("qq{$cond}"), ")\n";
}
echo 'condition1=$a>$b
condition2=$a<$b' | perl test.pl
condition1 is false, because $a>$b doesn't match (3>5)
condition2 is true, because $a<$b matches (3<5)

How do I determine the longest similar portion of several strings?

As per the title, I'm trying to find a way to programmatically determine the longest portion of similarity between several strings.
Example:
file:///home/gms8994/Music/t.A.T.u./
file:///home/gms8994/Music/nina%20sky/
file:///home/gms8994/Music/A%20Perfect%20Circle/
Ideally, I'd get back file:///home/gms8994/Music/, because that's the longest portion that's common for all 3 strings.
Specifically, I'm looking for a Perl solution, but a solution in any language (or even pseudo-language) would suffice.
From the comments: yes, only at the beginning; but there is the possibility of having some other entry in the list, which would be ignored for this question.
Edit: I apologize for a mistake. I overlooked that using a my variable inside countit(x, q{}) is an error: the string is evaluated inside the Benchmark module, where @str was empty. This solution is not as fast as I first claimed. See the correction below.
Perl can be fast:
use strict;
use warnings;
package LCP;
sub LCP {
return '' unless @_;
return $_[0] if @_ == 1;
my $i = 0;
my $first = shift;
my $min_length = length($first);
foreach (@_) {
$min_length = length($_) if length($_) < $min_length;
}
INDEX: foreach my $ch ( split //, $first ) {
last INDEX unless $i < $min_length;
foreach my $string (@_) {
last INDEX if substr($string, $i, 1) ne $ch;
}
}
continue { $i++ }
return substr $first, 0, $i;
}
# Roy's implementation
sub LCP2 {
return '' unless @_;
my $prefix = shift;
for (@_) {
chop $prefix while (! /^\Q$prefix\E/);
}
return $prefix;
}
1;
Test suite:
#!/usr/bin/env perl
use strict;
use warnings;
Test::LCP->runtests;
package Test::LCP;
use base 'Test::Class';
use Test::More;
use Benchmark qw(:all :hireswallclock);
sub test_use : Test(startup => 1) {
use_ok('LCP');
}
sub test_lcp : Test(6) {
is( LCP::LCP(), '', 'Without parameters' );
is( LCP::LCP('abc'), 'abc', 'One parameter' );
is( LCP::LCP( 'abc', 'xyz' ), '', 'None of common prefix' );
is( LCP::LCP( 'abcdefgh', ('abcdefgh') x 15, 'abcdxyz' ),
'abcd', 'Some common prefix' );
my @str = map { chomp; $_ } <DATA>;
is( LCP::LCP(@str),
'file:///home/gms8994/Music/', 'Test data prefix' );
is( LCP::LCP2(@str),
'file:///home/gms8994/Music/', 'Test data prefix by LCP2' );
my $t = countit( 1, sub{LCP::LCP(@str)} );
diag("LCP: ${\($t->iters)} iterations took ${\(timestr($t))}");
$t = countit( 1, sub{LCP::LCP2(@str)} );
diag("LCP2: ${\($t->iters)} iterations took ${\(timestr($t))}");
}
__DATA__
file:///home/gms8994/Music/t.A.T.u./
file:///home/gms8994/Music/nina%20sky/
file:///home/gms8994/Music/A%20Perfect%20Circle/
Test suite result:
1..7
ok 1 - use LCP;
ok 2 - Without parameters
ok 3 - One parameter
ok 4 - None of common prefix
ok 5 - Some common prefix
ok 6 - Test data prefix
ok 7 - Test data prefix by LCP2
# LCP: 22635 iterations took 1.09948 wallclock secs ( 1.09 usr + 0.00 sys = 1.09 CPU) # 20766.06/s (n=22635)
# LCP2: 17919 iterations took 1.06787 wallclock secs ( 1.07 usr + 0.00 sys = 1.07 CPU) # 16746.73/s (n=17919)
This means that the pure-Perl solution using substr is about 20% faster than Roy's solution on your test data, and finding one prefix takes about 50 µs. There is no need to use XS unless your data set or your performance expectations are larger.
The reference given already by Brett Daniel for the Wikipedia entry on "Longest common substring problem" is very good general reference (with pseudocode) for your question as stated. However, the algorithm can be exponential. And it looks like you might actually want an algorithm for longest common prefix which is a much simpler algorithm.
Here's the one I use for longest common prefix (and a ref to original URL):
use strict; use warnings;
sub longest_common_prefix {
# longest_common_prefix( $|@ ): returns $
# URLref: http://linux.seindal.dk/2005/09/09/longest-common-prefix-in-perl
# find longest common prefix of scalar list
my $prefix = shift;
for (@_) {
chop $prefix while (! /^\Q$prefix\E/);
}
return $prefix;
}
my @str = map {chomp; $_} <DATA>;
print longest_common_prefix(@str), "\n";
__DATA__
file:///home/gms8994/Music/t.A.T.u./
file:///home/gms8994/Music/nina%20sky/
file:///home/gms8994/Music/A%20Perfect%20Circle/
If you truly want a LCSS implementation, refer to these discussions (Longest Common Substring and Longest Common Subsequence) at PerlMonks.org. Tree::Suffix would probably be the best general solution for you and implements, to my knowledge, the best algorithm. Unfortunately recent builds are broken. But, a working subroutine does exist within the discussions referenced on PerlMonks in this post by Limbic~Region (reproduced here with your data).
#URLref: http://www.perlmonks.org/?node_id=549876
#by Limbic~Region
use Algorithm::Loops 'NestedLoops';
use List::Util 'reduce';
use strict; use warnings;
sub LCS{
my @str = @_;
my @pos;
for my $i (0 .. $#str) {
my $line = $str[$i];
for (0 .. length($line) - 1) {
my $char= substr($line, $_, 1);
push @{$pos[$i]{$char}}, $_;
}
}
my $sh_str = reduce {length($a) < length($b) ? $a : $b} @str;
my %map;
CHAR:
for my $char (split //, $sh_str) {
my @loop;
for (0 .. $#pos) {
next CHAR if ! $pos[$_]{$char};
push @loop, $pos[$_]{$char};
}
my $next = NestedLoops([@loop]);
while (my @char_map = $next->()) {
my $key = join '-', @char_map;
$map{$key} = $char;
}
}
my @pile;
for my $seq (keys %map) {
push @pile, $map{$seq};
for (1 .. 2) {
my $dir = $_ % 2 ? 1 : -1;
my @offset = split /-/, $seq;
$_ += $dir for @offset;
my $next = join '-', @offset;
while (exists $map{$next}) {
$pile[-1] = $dir > 0 ?
$pile[-1] . $map{$next} : $map{$next} . $pile[-1];
$_ += $dir for @offset;
$next = join '-', @offset;
}
}
}
return reduce {length($a) > length($b) ? $a : $b} @pile;
}
my @str = map {chomp; $_} <DATA>;
print LCS(@str), "\n";
__DATA__
file:///home/gms8994/Music/t.A.T.u./
file:///home/gms8994/Music/nina%20sky/
file:///home/gms8994/Music/A%20Perfect%20Circle/
It sounds like you want the k-common substring algorithm. It is exceptionally simple to program, and a good example of dynamic programming.
My first instinct is to run a loop, taking the next character from each string, until the characters are not equal. Keep a count of what position in the string you're at and then take a substring (from any of the three strings) from 0 to the position before the characters aren't equal.
In Perl, you'll have to split up the string first into characters using something like
@array = split(//, $string);
(splitting on an empty character sets each character into its own element of the array)
Then do a loop, perhaps overall:
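$n = 0;
@array1 = split(//, $string1);
@array2 = split(//, $string2);
@array3 = split(//, $string3);
# Compare with eq (string equality, not ==) and stop at the end of the shortest string
while ($n < @array1 && $n < @array2 && $n < @array3
&& $array1[$n] eq $array2[$n] && $array2[$n] eq $array3[$n]) {
$n++;
}
$sameString = substr($string1, 0, $n); # $n is the length of the common prefix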
Or at least something along those lines. Forgive me if this doesn't work, my Perl is a little rusty.
If you google for "longest common substring" you'll get some good pointers for the general case where the sequences don't have to start at the beginning of the strings.
Eg, http://en.wikipedia.org/wiki/Longest_common_substring_problem.
Mathematica happens to have a function for this built in:
http://reference.wolfram.com/mathematica/ref/LongestCommonSubsequence.html (Note that they mean contiguous subsequence, ie, substring, which is what you want.)
If you only care about the longest common prefix then it should be much faster to just loop for i from 0 until the ith characters don't all match and return substr(s, 0, i), that is, the first i characters.
From http://forums.macosxhints.com/showthread.php?t=33780
my #strings =
(
'file:///home/gms8994/Music/t.A.T.u./',
'file:///home/gms8994/Music/nina%20sky/',
'file:///home/gms8994/Music/A%20Perfect%20Circle/',
);
my $common_part = undef;
my $sep = chr(0); # assuming it's not used legitimately
foreach my $str ( @strings ) {
# First time through loop -- set common
# to whole
if ( !defined $common_part ) {
$common_part = $str;
next;
}
if ("$common_part$sep$str" =~ /^(.*).*$sep\1.*$/)
{
$common_part = $1;
}
}
print "Common part = $common_part\n";
Faster than above, uses perl's native binary xor function, adapted from perlmongers solution (the $+[0] didn't work for me):
sub common_suffix {
my $comm = shift @_;
while ($_ = shift @_) {
$_ = substr($_,-length($comm)) if (length($_) > length($comm));
$comm = substr($comm,-length($_)) if (length($_) < length($comm));
if (( $_ ^ $comm ) =~ /(\0*)$/) {
$comm = length($1) ? substr($comm, -length($1)) : ''; # substr($comm, -0) would return the whole string
} else {
return undef;
}
}
return $comm;
}
sub common_prefix {
my $comm = shift @_;
while ($_ = shift @_) {
$_ = substr($_,0,length($comm)) if (length($_) > length($comm));
$comm = substr($comm,0,length($_)) if (length($_) < length($comm));
if (( $_ ^ $comm ) =~ /^(\0*)/) {
$comm = substr($comm,0,length($1));
} else {
return undef;
}
}
return $comm;
}
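To see why the xor trick works: when ^ is applied to two strings, Perl XORs them byte by byte, so every position where the two strings agree becomes a NUL byte. A minimal sketch with two shortened paths (the variable names are only for illustration, and it assumes the bitwise feature is not enabled, so ^ acts as a string operator):

```perl
use strict;
use warnings;

my ($first, $second) = ('Music/t.A.T.u./', 'Music/nina%20sky/');

# Bitwise string XOR: equal characters at the same position give "\0"
my $xor = $first ^ $second;

# Count the leading NUL bytes to get the length of the shared prefix
my ($nuls) = $xor =~ /^(\0*)/;
my $prefix = substr($first, 0, length $nuls);

print "common prefix: $prefix\n";   # Music/
```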