Force auto-increment to treat its argument as string - perl

I need Perl's auto-increment magic for strings, but some strings (such as those composed entirely of digits) are interpreted as numbers and a normal increment is performed instead. How would I force Perl to treat a value passed to ++ as a string?

Here's the related question about how auto incrementing works: Autoincrementing letters in Perl
Like the docs explained, basically you need the variable to
match the regex /^[a-zA-Z]*[0-9]*\z/ and
only be used in string contexts.
Because you have variables that don't match the regex, those ones will be treated as numbers. You can write your own increment function to get your desired functionality. Here's an idea I had about how it could work to get you started.
#!/usr/bin/perl
use strict;
use warnings;
my $test = "1000";
for (0..100) {
$test = increment($test);
}
print $test . "\n";
$test = "M2V3";
for (0..100) {
$test = increment($test);
}
print $test . "\n";
sub increment {
my ($str) = #_;
my #letters = reverse split //, $str;
my $add = "";
my $increment = 1;
my $result = "";
for my $let (#letters) {
if ( $increment == 1 ) {
++$let;
}
if ( $let =~ /(.)(.)/ ) {
$add = $2;
$increment = 1;
} else {
$add = $let;
$increment = 0;
}
$result = $add . $result;
}
return $result;
}
This outputs:
1101
M3F4
I didn't calculate to confirm that M3F4 is the correct result but it seems close.

Related

Passing strings as array to subroutine and return count of specific char

I was trying to think in the right way to tackle this:
-I would to pass say, n elements array as argument to a subroutine. And for each element match two char types S and T and print for each element, the count of these letters. So far I did this but I am locked and found some infinite loops in my code.
use strict;
use warnings;
sub main {
my #array = #_;
while (#array) {
my $s = ($_ = tr/S//);
my $t = ($_ = tr/T//);
print "ST are in total $s + $t\n";
}
}
my #bunchOfdata = ("QQQRRRRSCCTTTS", "ZZZSTTKQSST", "ZBQLDKSSSS");
main(#bunchOfdata);
I would like the output to be:
Element 1 Counts of ST = 5
Element 2 Counts of ST = 6
Element 3 Counts of ST = 4
Any clue how to solve this?
while (#array) will be an infinite loop since #array never gets smaller. You can't read into the default variable $_ this way. For this to work, use for (#array) which will read the array items into $_ one at a time until all have been read.
The tr transliteration operator is the right tool for your task.
The code needed to get your results could be:
#!/usr/bin/perl
use strict;
use warnings;
my #data = ("QQQRRRRSCCTTTS", "ZZZSTTKQSST", "ZBQLDKSSSS");
my $i = 1;
for (#data) {
my $count = tr/ST//;
print "Element $i Counts of ST = $count\n";
$i++;
}
Also, note that my $count = tr/ST//; doesn't require the binding of the transliteration operator with $_. Perl assumes this when $_ holds the value to be counted here. Your code tried my $s = ($_ = tr/S//); which will give the results but the shorter way I've shown is the preferred way.
(Just noticed you had = instead of =~ in your statement. That is an error. Has to be $s = ($_ =~ tr/S//);)
You can combine the 2 sought letters as in my code. Its not necessary to do them separately.
I got the output you want.
Element 1 Counts of ST = 5
Element 2 Counts of ST = 6
Element 3 Counts of ST = 4
Also, you can't perform math operations in a quoted string like you had.
print "ST are in total $s + $t\n";
Instead, you would need to do:
print "ST are in total ", $s + $t, "\n";
where the operation is performed outside of the string.
Don't use while to traverse an array - your array gets no smaller, so the condition is always true and you get an infinite loop. You should use for (or foreach) instead.
for (#array) {
my $s = tr/S//; # No need for =~ as tr/// works on $_ by default
my $t = tr/T//;
print "ST are in total $s + $t\n";
}
Why tr///??
sub main {
my #array = #_;
while (#array) {
my $s = split(/S/, $_, -1) - 1;
my $t = split(/T/, $_, -1) - 1;
print "ST are in total $s + $t\n";
}
}

Perl - longest common prefix of 2 or more strings?

How can i create a Perl subroutine which would take in an array and find the longest common prefix for 2 or more of its elements? (strings)
I have this code:
sub longest_common_prefix {
$prefix = shift;
for (#_) {
chop $prefix while (! /^\Q$prefix\E/);
}
return $prefix;
}
But it only works if you are looking for the longest common prefix of all strings.
For example, if i pass an array with the following strings:
aaaBGFB
aaaJJJJ
jjfkBBB
aaaHGHG
I want it to return aaa as the answer.
Thanks!
I'd use a modified trie.
Normally, one could use the following to add to a trie:
sub add {
my $p = \shift;
my $s = shift;
$p = \( $$p->{$_} ) for split(//, $s);
$$p->{''} = 1;
}
But we need two modifications:
All prefixes of a string must be added when adding a string. For example, adding abc should also add a and ab to the trie.
When adding to the trie, we want to return the length of previously-existing part of the path taken.
So we need:
sub add {
my $p = \shift;
my $s = shift;
my $cp_len = 0;
for (split(//, $s)) {
$p = \( $$p->{$_} );
++$cp_len if $$p->{$_}{''};
$$p->{''} = 1;
}
return $cp_len;
}
Combine (an optimized version of) this with an algorithm to find the longest strings in a list and with an algorithm to remove duplicate strings from a list to get the following solution:
use strict;
use warnings;
use feature qw( say );
sub add {
my $p = \shift;
my $s = shift;
my $cp_len = 0;
for (split(//, $s)) {
++$cp_len if exists($$p->{$_});
$p = \( $$p->{$_} );
}
return $cp_len;
}
my $t;
my $lcp_len = 0; # lcp = longest common prefix
my %lcps;
while (<>) {
chomp;
my $cp_len = add($t, $_)
or next;
if ($cp_len >= $lcp_len) {
if ($cp_len > $lcp_len) {
$lcp_len = $cp_len;
%lcps = ();
}
$lcps{ substr($_, 0, $cp_len) } = 1;
}
}
my #lcps = sort keys %lcps;
if (#lcps) {
say "Longest common prefix(es): #lcps";
} else {
say "No common prefix";
}
Data:
abc
abc
abcd
abcde
hijklx
hijkly
mnopqx
mnopqy
Output:
Longest common prefix(es): hijkl mnopq
The time taken by the above is proportional to the number of input characters.
One way would be to store the information in a hash. In this example, I set the hash key to the length of each prefix, and the value being the actual prefix found.
Note that this method overwrites a key and value if a same-length prefix exists, so you'll always get the last prefix found of the longest length (sort() takes care of finding the longest one).
The regex says "find the first character in the string and capture it, and use that char found in a second capture, and capture as many as there are". This string is then join()ed into a scalar and put into the hash.
use warnings;
use strict;
my %prefixes;
while (<DATA>){
my $prefix = join '', /^(.)(\1+)/;
$prefixes{length $prefix} = $prefix;
}
my $longest = (sort {$b <=> $a} keys %prefixes)[0];
print "$prefixes{$longest}\n";
__DATA__
aaBGFB
aaaJJJJ
jjfkBBB
aaaHGHG
Output:
aaa
You can keep a hash of an array of words keyed by the first character. By definition, if you have words starting with the same letter, those words share at least a one character common prefix of that one letter. Then reduce to the single longest prefix by stepping through the hash by character:
use strict; use warnings;
sub lcp {
(join("\0", #_) =~ /^ ([^\0]*) [^\0]* (?:\0 \1 [^\0]*)* $/sx)[0];
}
my %HoA;
my $longest='';
while (my $line=<DATA>){
$line =~ s/^\s+|\s+$//g ;
push #{ $HoA{substr $line, 0, 1} }, $line if $line=~/^[a-zA-Z]/;
}
for my $key ( sort (keys %HoA )) {
if (scalar #{ $HoA{$key} } > 1){
my $lon=lcp(#{ $HoA{$key} });
my $s = join ', ', map { qq/"$_"/ } #{ $HoA{$key} };
print "lcp: \"$lon\" for ($s)\n";
if (length($lon) > length($longest)) {
$longest=$lon;
}
}
else{
print "$key: no common prefix\n";
}
}
print "\nlongest common prefix is \"$longest\"\n";
__DATA__
aardvark
aaaBGFB
aaaJJJJ
jjfkBBB
aaaHGHG
interspecies
interstellar
interstate
Prints:
lcp: "aa" for ("aardvark", "aaaBGFB", "aaaJJJJ", "aaaHGHG")
lcp: "inters" for ("interspecies", "interstellar", "interstate")
j: no common prefix
longest common prefix is "inters"

Parsing a string into a hash structure in perl

I have the following string:
$str = "list
XYZ
status1 : YES
value1 : 100
status2 : NO
value2 : 200
Thats all";
I want to convert it into a hash using a function which takes this string as input and returns a hash with status1 as key and YES as value for example.
How to do so?
And how to reference the returned hash?
Like always, there's more than one way to do it. Here come five.
Pure regular expressions (YEAH!)
I think this is the coolest one. The regex returns a list of all captures which is exactly the list we want to initialize the hash with:
my %regex = $str =~ /(\S+)\s*:\s*(\S+)/g;
Iterative
This is the most straightforward way for most programmers, I think:
my #lines = split /\R/ => $str;
my %iterative = ();
for (#lines) {
next unless /(\S+)\s*:\s*(\S+)/;
$iterative{$1} = $2;
}
Nothing to explain here. I first split the string in lines, then iterate over them, leaving out lines that don't look like foo : bar. Done.
List processing
Writing everything as a big list expression feels a little bit hackish, but maybe this is interesting to learn more ways to express stuff:
my %list = map { /(\S+)\s*:\s*(\S+)/ and $1 => $2 }
grep { /:/ }
split /\R/ => $str;
Read from right to left: Like in the example above we start with splitting the string in lines. grep filters the lines for : and in the final map I transform matching line strings in a list of length two, with a key and a value.
List reducing
Non-trivial use-cases of List::Util's reduce function are very rare. Here's one, based on the list approach from above, returning a hash reference:
my $reduced = reduce {
$a = { $a =~ /(\S+)\s*:\s*(\S+)/ } unless ref $a;
$a->{$1} = $2 if $b =~ /(\S+)\s*:\s*(\S+)/;
return $a;
} grep { /:/ } split /\R/ => $str;
State machine
Here's a funny one with regex usage for white-space separation only. It needs to keep track of a state:
# preparations
my $state = 'idle';
my $buffer = undef;
my %state = ();
my #words = split /\s+/ => $str;
# loop over words
for my $word (#words) {
# last word was a key
if ($state eq 'idle' and $word eq ':') {
$state = 'got_key';
}
# this is a value for the key in buffer
elsif ($state eq 'got_key') {
$state{$buffer} = $word;
$state = 'idle';
$buffer = undef;
}
# remember this word
else {
$buffer = $word;
}
}
Just for fun (note that I recommend using one of memowe's) here is one that (ab)uses the YAML:
#!/usr/bin/env perl
use strict;
use warnings;
use YAML;
my $str = "list
XYZ
status1 : YES
value1 : 100
status2 : NO
value2 : 200
Thats all";
$str = join "\n", grep { /:/ } split "\n", $str;
my $hash = Load "$str\n";
#!/usr/bin/perl
use warnings;
$\="\n";
sub convStr {
my $str = $_[0];
my %h1=();
while ($str =~m/(\w+)\s+:\s+(\w+)/g) {
$h1{$1} =$2;
}
return \%h1;
}
my $str = "list
XYZ
status1 : YES
value1 : 100
status2 : NO
value2 : 200
Thats all";
my $href=convStr($str);
foreach (keys(%$href)) {
print $_ , "=>", $href->{$_};
}
On running this, I get:
status2=>NO
value1=>100
status1=>YES
value2=>200
my %hhash;
my #lines = split /\s+\n/, $str;
foreach (#lines)
{
$_=~s/^\s+//g;
if(/:/)
{
$key=(split(/:/))[0];
$value=(split(/:/))[1];
$hhash{$key}=$value;
}
}

Subkey comparison function for sorting

I need a Perl comparison function that can be used with sort.
Each key is a text string that has an arbitrary number of subkeys, separated by delimiter characters (dot, colon, space, and slash). Some subkeys are numeric, and need to be sorted numerically. The key format and number of subkeys varies. Therefore, the comparison has to handle one key being longer than the other, and has to handle the case where a subkey is numeric in one key but not in another (in which case a textual comparison is appropriate for that subkey).
This works, but I bet there are better solutions:
use warnings;
use strict;
use Scalar::Util qw[looks_like_number];
sub hier_cmp {
my $aa = $a;
my $bb = $b;
# convert all delims (. : / space) to the same delim
$aa =~ tr/.:\/ /::::/;
$bb =~ tr/.:\/ /::::/;
my #lista = split(":", $aa);
my #listb = split(":", $bb);
my $result;
for my $ix (0 .. min($#lista, $#listb)) {
if (exists($lista[$ix]) && exists($listb[$ix])) {
if ( looks_like_number($lista[$ix]) && looks_like_number($listb[$ix])) {
# compare numerically
$result = ($lista[$ix] <=> $listb[$ix]);
} else {
# compare as strings
$result = ($lista[$ix] cmp $listb[$ix]);
}
if ($result == 0) {
next;
}
return $result;
} elsif (exists($lista[$ix])) {
return 1;
} else {
return -1;
}
}
}
For my purposes, readability is more important than speed. This is just for an internal tool, and lists will rarely have more than hundreds of elements. However, any opportunity to learn something is good.
As you can see, I'm not a perl wizard. Even trivial improvements on my code would be appreciated.
Thanks!
That looks like natural sorting. There are several modules on CPAN that already do that such as Sort::Naturally or Sort::Key::Natural.
For instance:
use Sort::Key::Natural qw(natsort);
my #sorted = natsort #data;
It would help if you gave us some data to test with, but this code passes a few basic tests and it looks right.
It simplifies the problem by using the List::MoreUtils function pairwise to create an array of field pairs.
Then it is just a matter of checking whether only one is defined, when one of the lists has come to an end before the other and should be sorted first; if they are both numeric, when they should be compared with a numeric comparison; or otherwise simply compare them as strings.
If the end of the array of pairs is reached then everything has matched and zero is returned to indicate equiality.
Update
I have changed this code to remove the dependency on List::MoreUtils::pairwise.
use strict;
use warnings;
use Scalar::Util 'looks_like_number';
sub hier_cmp {
our ($a, $b);
my #a = split m|[.: /]+|, $a;
my #b = split m|[.: /]+|, $b;
for my $i (0 .. $#a > $#b ? $#a : $#b) {
my #ab = ( $a[$i], $b[$i] );
if (grep defined, #ab < 2) {
return defined $ab[0] ? 1 : -1;
}
else {
my $numeric = grep(looks_like_number($_), #ab) == 2;
my $result = $numeric ? $ab[0] <=> $ab[1] : $ab[0] cmp $ab[1];
return $result if $result;
}
}
return 0;
}

Trouble recording unique regex output to array in perl

The goal of the following code sample is to read the contents of $target and assign all unique regex search results to an array.
I have confirmed my regex statement works so I am simplifying that so as not to focus on it.
When I execute the script I get a list of all the regex results, however, the results are not unique which leads me to believe that my manipulation of the array or my if (grep{$_ eq $1} #array) { check is causing a problem(s).
#!/usr/bin/env perl
$target = "string to search";
$inc = 0;
$once = 1;
while ($target =~ m/(regex)/g) { #While a regex result is returned
if ($once) { #If $once is not equal to zero
#array[$inc] = $1; #Set the first regex result equal to #array[0]
$once = 0; #Set $once equal to zero so this is not executed more than once
} else {
if (grep{$_ eq $1 } #array ) { #From the second regex result, check to see if the result is already in the array
#If so, do nothing
} else {
#array[$inc] = $1; #If it is not, then assign the regex search result to the next unused position in the array in any position.
$inc++; #Increment to next unused array position.
}
}
}
print #array;
exit 0;
how about this:
while ($target =~ m/(regex)/g) {
$hash{$1}++;
}
print keys %hash;
Update:
# if the order matters
while ($target =~ m/(a.)/g) {
$hash{$1} = ++$i unless $hash{$1};
}
#array = sort {$hash{$a} <=> $hash{$b}} keys %hash;