Apply my hash to a string to get numbers from letters - perl

I am trying to convert letters to their respective numbers in the alphabet. I have a hash that I think should work; I just don't know how to apply it to my string.
string:
my $string = "abc";
and my hash:
@hash{("a".."z")} = (1..26);
How do I get my string to be 123 in this case?

Use a substitution:
use warnings;
use strict;
my $string = "abc";
my %hash;
@hash{("a".."z")} = (1..26);
$string =~ s/(.)/$hash{$1}/g;
print "$string\n";
__END__
123
UPDATE: Another way, without a hash, is to use ord
my $string = "abc";
$string =~ s/(.)/ord($1) - 96/ge;
print "$string\n";

General solution:
my %lookup; @lookup{"a".."z"} = 1..26;
my $pat = '(?:'.( join '|', map quotemeta, keys %lookup ).')';
s/($pat)/$lookup{$1}/g;
If every key is a single character, a character class is enough:
my %lookup; @lookup{"a".."z"} = 1..26;
my $class = '['.( join '', map quotemeta, keys %lookup ).']';
s/($class)/$lookup{$1}/g;
"Hardcoded":
$string =~ s/([a-z])/ ord($1) - ord('a') + 1 /ge;
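One caveat with all of the variants above: once the input contains letters past "i", the concatenated digits become ambiguous ("ab" and "l" both produce 12). A minimal sketch, assuming you are free to put a separator between the numbers and want to skip characters that are not in the hash; the separator and the sample string are illustrative choices, not part of the question:

```perl
use strict;
use warnings;

my %hash;
@hash{ 'a' .. 'z' } = ( 1 .. 26 );

my $string = "abc xyz";

# Map each known letter to its number; skip anything not in the hash.
my @numbers = map { $hash{$_} } grep { exists $hash{$_} } split //, lc $string;

# Joining with a separator keeps multi-digit values unambiguous.
my $joined = join '-', @numbers;
print "$joined\n";    # 1-2-3-24-25-26
```

Keeping the numbers in an array also makes it easy to choose a different output format later.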

Related

Perl Regex to find all strings formed using substring

Using a Perl regex, how do I find the largest superstring in a sentence, where a superstring is one or more repetitions of a substring?
For example:
$sentence = "zsabcxyzabcabcabccde_xdrabcabcrte__23abcerabcabccbabacxyz";
$subStr = "abc";
I want to find every run of one or more consecutive abc, and the longest of those runs.
Output:
abc
abcabcabc
abcabc
abc
abcabc
Largest string is abcabcabc
Compile a regex using a quantifier; + means 'one or more'.
So your pattern becomes ((?:abc)+): the outer parentheses capture, and the inner parentheses are non-capturing. Otherwise the inner group's capture would also end up in the array, although the net result isn't changed much, because the longest hit will still sort to the top.
For example:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $sentence = "zsabcxyzabcabcabccde_xdrabcabcrte__23abcerabcabccbabacxyz";
my $substring = "abc";
my $regex = qr/((?:$substring)+)/;
my @matches = $sentence =~ m/$regex/g;
print Dumper \@matches;
#Then sort it:
my ( $longest ) = sort { length ( $b ) <=> length ( $a ) } @matches;
print $longest,"\n";
Try the following:
use warnings;
use strict;
my $sentence = "zsabcxyzabcabcabccde_xdrabcabcrte__23abcerabcabccbabacxyz";
my ($larg) = sort{length($b)<=>length($a)} $sentence =~ m/((?:abc)+)/g;
print $larg,"\n";
If you don't want to store the matches, use a loop:
use warnings;
use strict;
my $sentence = "zsabcxyzabcabcabccde_xdrabcabcrte__23abcerabcabccbabacxyzabcabcabcabc";
my $longstr;
my $len = 0;
while($sentence=~m/((?:abc)+)/g)
{
# Remember the longest match seen so far.
($longstr, $len) = ($1, length $1) if length($1) > $len;
}
The following does it in a single regex with (?{ }), but this is not recommended:
my $sentence = "zsabcxyzabcabcabccde_xdrabcabcrte__23abcerabcabccbabacxyzabcabcabcabc";
my $lar = 0;
my $larg;
$sentence=~m/((?:abc)+)(?{ $larg = $1 and $lar=(length $1) if(length $1 > $lar )}) \G/x;
Thanks to @mkHun and @Sobrique.
I used @matches = $str =~ m/(abc)+/ng; and then sorted the results.
The /n flag, which I believe is available from Perl 5.22, makes it look much simpler.
The best approach is ((?:abc)+), followed by sorting the matches.
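Sorting is not strictly needed when only the longest match matters. A minimal sketch, assuming the core List::Util module and its reduce function, that picks the longest match in one pass; the \Q...\E escaping is an addition here so that a substring containing regex metacharacters is matched literally:

```perl
use strict;
use warnings;
use List::Util qw(reduce);

my $sentence  = "zsabcxyzabcabcabccde_xdrabcabcrte__23abcerabcabccbabacxyz";
my $substring = "abc";

# \Q...\E escapes any regex metacharacters in the substring.
my @matches = $sentence =~ /((?:\Q$substring\E)+)/g;

# reduce keeps the longer of each pair; on ties, the earlier match wins.
my $longest = reduce { length($b) > length($a) ? $b : $a } @matches;

print "$_\n" for @matches;
print "Largest string is $longest\n";
```

If there are no matches at all, reduce returns undef, so a real script would want to check for that before printing.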

What is the best way to create a unique array of values from a string?

Is there a nice one liner in perl that lets you create an array of unique values from a string?
$str = "bob-2-4 asdfasdfasdf bob-2-4 asdfasdf bob-3-1";
my @unique = $str =~ m/(bob-\d-\d)/g;
# array is now "bob-2-4, bob-2-4, bob-3-1"
I want the unique array to only contain "bob-2-4, bob-3-1" however.
Without modules:
sub uniq { my %seen; grep !$seen{$_}++, @_ }
With a commonly-used module:
use List::MoreUtils qw( uniq );
Usage:
use feature 'say';
my $str = "bob-2-4 asdfasdfasdf bob-2-4 asdfasdf bob-3-1";
my @unique = uniq $str =~ m/(bob-\d-\d)/g;
say for @unique;
If you're talking about uniques, the tool for the job is a hash.
You can do:
#!/usr/bin/env perl
use strict;
use warnings;
my $str = "bob-2-4 asdfasdfasdf bob-2-4 asdfasdf bob-3-1";
my %unique = map { $_ => 1 } $str =~ m/(bob-\d-\d)/g;
print keys %unique;
As a one liner, straight into your array, you could:
my @unique = keys %{{map { $_ => 1 } $str =~ m/(bob-\d-\d)/g}};
This does approximately the same thing - use map to construct the hash, and then keys to extract the unique values. Note - keys doesn't return a defined order.
If ordering is important, you could also use grep, but you'll still need a hash:
my $str = "bob-2-4 asdfasdfasdf bob-2-4 asdfasdf bob-3-1";
my %seen;
my @unique = grep { not $seen{$_}++ } $str =~ m/(bob-\d-\d)/g;
print @unique;
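For an order-preserving version without a hand-written helper, newer versions of the core List::Util module export their own uniq. A minimal sketch, assuming List::Util 1.45 or later is installed (as far as I know that is the release that added uniq, and it ships with Perl 5.26 and later):

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);

my $str    = "bob-2-4 asdfasdfasdf bob-2-4 asdfasdf bob-3-1";
my @unique = uniq $str =~ /(bob-\d-\d)/g;

print join(', ', @unique), "\n";    # bob-2-4, bob-3-1
```

Like the grep version, this keeps the first occurrence of each value in its original position.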

Bug with parsing by Text::CSV_XS?

Tried to use Text::CSV_XS to parse some logs. However, the following code doesn't do what I expected -- split the line into pieces according to separator " ".
The funny thing is, if I remove the double quote in the string $a, then it will do splitting.
Wonder if it's a bug or I missed something. Thanks!
use Text::CSV_XS;
$a = 'id=firewall time="2010-05-09 16:07:21 UTC"';
$userDefinedSeparator = Text::CSV_XS->new({sep_char => " "});
print "$userDefinedSeparator\n";
$userDefinedSeparator->parse($a);
my $e;
foreach $e ($userDefinedSeparator->fields) {
print $e, "\n";
}
EDIT:
In the above code snippet, if I change the = (after time) to a space, then it works fine. I'm starting to wonder whether this is a bug after all.
$a = 'id=firewall time "2010-05-09 16:07:21 UTC"';
You have confused the module by leaving both the quote character and the escape character set to double quote ", and then left them embedded in the fields you want to split.
Disable both quote_char and escape_char, like this
use strict;
use warnings;
use Text::CSV_XS;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my $space_sep = Text::CSV_XS->new({
sep_char => ' ',
quote_char => undef,
escape_char => undef,
});
$space_sep->parse($string);
for my $field ($space_sep->fields) {
print "$field\n";
}
output
id=firewall
time="2010-05-09
16:07:21
UTC"
But note that you have achieved exactly the same thing as print "$_\n" for split ' ', $string, which is to be preferred as it is both more efficient and more concise.
In addition, you should always use strict and use warnings, and never use $a or $b as variable names, both because they are used by sort and because they are meaningless and undescriptive.
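For comparison, the split equivalent mentioned above can be written out as a short script. This is only a sketch of that one-liner, reusing the sample string from the question:

```perl
use strict;
use warnings;

my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';

# split ' ' (a single-space string) splits on runs of whitespace
# and discards any leading whitespace.
my @fields = split ' ', $string;
print "$_\n" for @fields;
```

It produces the same four pieces as the Text::CSV_XS version, with the quoted timestamp still broken apart at its spaces.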
Update
As @ThisSuitIsBlackNot points out, your intention is probably not to split on spaces but to extract a series of key=value pairs. If so then this method puts the values straight into a hash.
use strict;
use warnings;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my %data = $string =~ / ([^=\s]+) \s* = \s* ( "[^"]*" | [^"\s]+ ) /xg;
use Data::Dump;
dd \%data;
output
{ id => "firewall", time => "\"2010-05-09 16:07:21 UTC\"" }
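As the output shows, quoted values keep their surrounding double quotes. A minimal sketch, assuming you want the bare value instead, that strips one pair of enclosing quotes after the hash has been built; the post-processing step is an addition and not part of the answer:

```perl
use strict;
use warnings;

my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my %data = $string =~ / ([^=\s]+) \s* = \s* ( "[^"]*" | [^"\s]+ ) /xg;

# values returns aliases to the hash values, so s/// edits them in place.
# Remove one leading and one trailing double quote from each value.
s/^"(.*)"$/$1/s for values %data;

print "$_ => $data{$_}\n" for sort keys %data;
```

Unquoted values such as firewall do not match the substitution and are left unchanged.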
Update
This program will extract the two name=value strings and print them on separate lines.
use strict;
use warnings;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my @fields = $string =~ / (?: "[^"]*" | \S )+ /xg;
print "$_\n" for @fields;
output
id=firewall
time="2010-05-09 16:07:21 UTC"
If you are not actually trying to parse csv data, you can get the time field by using Text::ParseWords, which is a core module in Perl 5. The benefit to using this module is that it handles quotes very well.
use strict;
use warnings;
use Data::Dumper;
use Text::ParseWords;
my $str = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my @fields = quotewords(' ', 0, $str);
print Dumper \@fields;
my %hash = map split(/=/, $_, 2), @fields;
print Dumper \%hash;
Output:
$VAR1 = [
'id=firewall',
'time=2010-05-09 16:07:21 UTC'
];
$VAR1 = {
'time' => '2010-05-09 16:07:21 UTC',
'id' => 'firewall'
};
I also included how you can make the data more accessible by adding it to a hash. Note that hashes cannot contain duplicate keys, so you need a new hash for each new time key.

perl: how to make compact name from a numbered sequence

[perl 5.8.8]
I have a sequence of names of things like:
names='foobar1304,foobar1305,foobar1306,foobar1307'
where the names differ only by a contiguous string of digits somewhere in the name. The strings of digits in any sequence are all of the same length, and the digit strings form a continuous numeric sequence with no skips, e.g. 003,004,005.
I want a compact representation like:
compact_name='foobar1304-7'
(The compact form is just a name, so its exact form is negotiable.)
There will usually only be <10 things, though some sets might span a decade, e.g.
'foobaz2205-11'
Is there some concise way to do this in perl? I'm not a big perl hacker, so be a little gentle...
Bonus points for handling embedded sequences like:
names='foobar33-pqq,foobar34-pqq,foobar35-pqq'
The ideal script would neatly fall back to 'firstname2301-lastname9922' in case it can't identify a sequence in the names.
I am not sure I got your specification, but it works somehow:
#!/usr/bin/perl
use warnings;
use strict;
use Test::More;
sub compact {
my $string = shift;
my ($name, $value) = split /=/, $string;
$name =~ s/s$// or die "Cannot create compact name for $name.\n"; #/ SO hilite bug
$name = 'compact_' . $name;
$value =~ s/^'|'$//g; #/ SO hilite bug
my @values = split /,/, $value; #/ SO hilite bug
my ($prefix, $first, $suffix) = $values[0] =~ /^(.+?)([0-9]+)(.*)$/;
my $last = $first + $#values;
my $same = 0;
$same++ while substr($first, 0, $same) eq substr($last, 0, $same);
$last = substr $last, $same - 1;
for my $i ($first .. $first + $#values) {
$values[$i - $first] eq ($prefix . $i . $suffix)
or die "Invalid sequence at $values[$i-$first].\n";
}
return "$name='$prefix$first-$last$suffix'";
}
is( compact("names='foobar1304,foobar1305,foobar1306,foobar1307'"),
"compact_name='foobar1304-7'");
is( compact("names='foobaz2205,foobaz2206,foobaz2207,foobaz2208,foobaz2209,foobaz2210,foobaz2211'"),
"compact_name='foobaz2205-11'");
is( compact("names='foobar33-pqq,foobar34-pqq,foobar35-pqq'"),
"compact_name='foobar33-5-pqq'");
done_testing();
Someone will surely post a more elegant solution, but the following
use strict;
use warnings;
my $names='foobar1308-xy,foobar1309-xy,foobar1310-xy,foobar1311-xy';
my @names = split /,/,$names;
my $pfx = lcp(@names);
my @nums = map { m/$pfx(\d*)/; $1 } @names;
my $first=shift @nums;
my $last = pop @nums;
my $suf=$names[0];
$suf =~ s/$pfx\d*//;
print "$pfx\{$first-$last}$suf\n";
#https://gist.github.com/3309172
sub lcp {
my $match = shift;
substr($match, (($match ^ $_) =~ /^\0*/, $+[0])) = '' for @_;
$match;
}
prints:
foobar13{08-11}-xy
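Neither answer covers the fallback the question asks for, or zero-padded numbers such as 003. A minimal sketch of one way to add both; compact_names is a hypothetical helper name, and it assumes a non-empty list in which the varying number is the last run of digits in each name:

```perl
use strict;
use warnings;

# compact_names is a hypothetical helper, not taken from the answers above.
sub compact_names {
    my @names    = @_;
    my $fallback = "$names[0]-$names[-1]";

    # Assumption: the varying number is the last run of digits in the name.
    my ($prefix, $first, $suffix) = $names[0] =~ /^(.*?)(\d+)(\D*)$/
        or return $fallback;
    my $width = length $first;
    my $last  = sprintf '%0*d', $width, $first + $#names;

    # The question says all digit strings have the same length.
    return $fallback if length($last) != $width;

    # Every name must be prefix + zero-padded (first + i) + suffix.
    for my $i (0 .. $#names) {
        my $expected = $prefix . sprintf('%0*d', $width, $first + $i) . $suffix;
        return $fallback if $names[$i] ne $expected;
    }
    return $names[0] if @names == 1;

    # Drop the leading digits that the last number shares with the first.
    my $same = 0;
    $same++ while substr($first, $same, 1) eq substr($last, $same, 1);
    return $prefix . $first . '-' . substr($last, $same) . $suffix;
}

print compact_names(qw(foobar1304 foobar1305 foobar1306 foobar1307)), "\n";
print compact_names(qw(foobar33-pqq foobar34-pqq foobar35-pqq)), "\n";
print compact_names(qw(firstname2301 lastname9922)), "\n";
```

When the names do not form a consecutive sequence, the helper returns the first and last names joined by a hyphen, which is the fallback form suggested in the question.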

stripping off numbers and alphabetics in perl

I have an input variable, say $a. $a can be either number or string or mix of both.
My question is how can I strip off the variable to separate numeric digits and alphabetic characters?
Example:
$a can be 'AB9'
Here I should be able to store 'AB' in one variable and '9' in other.
How can I do that?
Check this version, it works with 1 or more numeric and alphabetic characters in a variable.
#!/usr/bin/perl
use strict;
use Data::Dumper;
my $var = '11a';
my (@digits, @alphabetics);
while ($var =~ /([a-zA-Z]+)/g) {
push @alphabetics, $1;
}
while ($var =~ /(\d+)/g) {
push @digits, $1;
}
print Dumper(\@alphabetics);
print Dumper(\@digits);
Here's one way to express it concisely:
my ($digits) = $input =~ /(\d+)/;
my ($alpha) = $input =~ /([a-z]+)/i;
say 'digits: ' . ($digits // 'none');
say 'non-digits: ' . ($alpha // 'none');
It's important to use the match operator in list context here; in scalar context it would only return whether the match succeeded.
If you want to get all occurrences in the input string, add the /g flag and assign to arrays instead of scalars:
my @digits = $input =~ /(\d+)/g;
my @alpha = $input =~ /([a-z]+)/gi;
say 'digits: ' . join ', ' => @digits;
say 'non-digits: ' . join ', ' => @alpha;
For my $input = '42AB17C', the output is
digits: 42, 17
non-digits: AB, C
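The two separate matches lose the order in which digits and letters appear. A minimal sketch, assuming you want the pieces in their original order, that uses a single alternation instead; this is a variation on the answer above rather than something it proposes:

```perl
use strict;
use warnings;

my $input = '42AB17C';

# Each match is either a run of digits or a run of ASCII letters.
my @tokens = $input =~ /(\d+|[a-z]+)/gi;
print join(', ', @tokens), "\n";    # 42, AB, 17, C
```

For the 'AB9' example from the question this yields the two elements AB and 9.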