Compare 2 sets of numbers contained in strings - perl

I have 2 scalars as below:
$a = '100 105 010';
$b = '010 105 100';
How do I compare if both has same set of values? order doesn't matter.
one way is to arrange it in ascending order and compare, is there a better way?

You can split each string into an array and sort and compare arrays. By hand:
use warnings;
use strict;
my $x = '100 105 1 010';
my $y = '010 105 100 2';
my #xs = sort { $a <=> $b } split ' ', $x;
my #ys = sort { $a <=> $b } split ' ', $y;
if (#xs != #ys) {
print "Differ in number of elements.\n";
}
else {
for (0..$#xs) {
print "Differ: $xs[$_] vs $ys[$_]\n" if $xs[$_] != $ys[$_];
}
}
# For boolean comparison put it in a sub
print arr_num_eq(\#xs, \#ys), "\n";
sub arr_num_eq {
my ($ra, $rb) = #_;
return 0 if #$ra != #$rb;
$ra->[$_] != $rb->[$_] && return 0 for 0..$#$ra;
return 1;
}
The sorting can be moved to the sub as well, which would then take strings. The way it stands it can be used for comparison of existing arrays as well. Please add argument checking.
There is a number of modules that have this capability. The perm from Array::Compare hides the sorting above, but internally joins sorted arrays into strings thus duplicating the work here since we started with strings. The List::AllUtils certainly offers this as well with its long list of utilities.
See this post, for example, for a few methods (just not the smart match ~~), and for benchmarks if efficiency is a concern.
Using the mentioned implementation idea from Array::Compare, per comment by ysth
sub str_num_eq {
return join(' ', sort { $a <=> $b } split / /, $_[0])
eq join(' ', sort { $a <=> $b } split / /, $_[1])
}
What the most suitable method is depends on what this is for and how it is used. Is it only a boolean comparison, or will more be done if they are found to differ? How does it come about in your program flow? What are typical sizes of strings, how often is it run? Are the strings most often the same or different, do they typically differ a lot or a little? Etc.

Without modules, you can use hashes:
#!/usr/bin/perl
use warnings;
use strict;
my $x = '100 105 010 2';
my $y = '010 105 100 100 1';
my (%hx, %hy);
$hx{$_}++ for split ' ', $x;
$hy{$_}++ for split ' ', $y;
for my $k (keys %hx) {
if (! exists $hy{$k}) {
print "$k missing in y\n";
} elsif ($hy{$k} != $hx{$k}) {
print "$k has different number of occurences\n";
}
delete $hy{$k};
}
print "$_ missing in x\n" for keys %hy;
$a and $b are special variables used in sort, so I renamed them to $x and $y.
split turns the strings into lists. Each hash counts how many times a member occurs in the list.
See also Perl FAQ 4.

Something else try with pattern matching,
This is not a straight forward but it will work.
Construct the pattern by anyone of your scalar value. Then check the another string by the constructed pattern.
my $a = '100 100 105';
my $b = '100 105 100';
my #b_ary = split(" ",$b);
my $regex = join'\b|\b', #b_ary;
my $word_length = #b_ary * 2 - 1; #Count the number of words and space.
my $rgx = qr"^(?:\b$regex\b|\s){$word_length}$"; #`{n}` match word exact n times
if($a=~m/$rgx/)
{
print "same values\n";
}
else
{
print "Not a same values\n";
}

The answer is already posted above. This is just in case you want to remove the white spaces and compare each number.
$x = '100 105 010';
$y = '010 105 100';
join("",sort split "",join("",split " ",$x)) eq join("",sort split "",join("",split " ",$y));

Related

Passing strings as array to subroutine and return count of specific char

I was trying to think in the right way to tackle this:
-I would to pass say, n elements array as argument to a subroutine. And for each element match two char types S and T and print for each element, the count of these letters. So far I did this but I am locked and found some infinite loops in my code.
use strict;
use warnings;
sub main {
my #array = #_;
while (#array) {
my $s = ($_ = tr/S//);
my $t = ($_ = tr/T//);
print "ST are in total $s + $t\n";
}
}
my #bunchOfdata = ("QQQRRRRSCCTTTS", "ZZZSTTKQSST", "ZBQLDKSSSS");
main(#bunchOfdata);
I would like the output to be:
Element 1 Counts of ST = 5
Element 2 Counts of ST = 6
Element 3 Counts of ST = 4
Any clue how to solve this?
while (#array) will be an infinite loop since #array never gets smaller. You can't read into the default variable $_ this way. For this to work, use for (#array) which will read the array items into $_ one at a time until all have been read.
The tr transliteration operator is the right tool for your task.
The code needed to get your results could be:
#!/usr/bin/perl
use strict;
use warnings;
my #data = ("QQQRRRRSCCTTTS", "ZZZSTTKQSST", "ZBQLDKSSSS");
my $i = 1;
for (#data) {
my $count = tr/ST//;
print "Element $i Counts of ST = $count\n";
$i++;
}
Also, note that my $count = tr/ST//; doesn't require the binding of the transliteration operator with $_. Perl assumes this when $_ holds the value to be counted here. Your code tried my $s = ($_ = tr/S//); which will give the results but the shorter way I've shown is the preferred way.
(Just noticed you had = instead of =~ in your statement. That is an error. Has to be $s = ($_ =~ tr/S//);)
You can combine the 2 sought letters as in my code. Its not necessary to do them separately.
I got the output you want.
Element 1 Counts of ST = 5
Element 2 Counts of ST = 6
Element 3 Counts of ST = 4
Also, you can't perform math operations in a quoted string like you had.
print "ST are in total $s + $t\n";
Instead, you would need to do:
print "ST are in total ", $s + $t, "\n";
where the operation is performed outside of the string.
Don't use while to traverse an array - your array gets no smaller, so the condition is always true and you get an infinite loop. You should use for (or foreach) instead.
for (#array) {
my $s = tr/S//; # No need for =~ as tr/// works on $_ by default
my $t = tr/T//;
print "ST are in total $s + $t\n";
}
Why tr///??
sub main {
my #array = #_;
while (#array) {
my $s = split(/S/, $_, -1) - 1;
my $t = split(/T/, $_, -1) - 1;
print "ST are in total $s + $t\n";
}
}

How can I use map to clean up this Perl code?

The code below does what I want it to. It prints the list and adds an asterisk at the end of lines that are not sequential, e.g. if you skip from 1 to 3 or 3 to 5.
use strict;
use warnings;
#note: thanks to all who helped with formatting issues.
#note: I recognize a hash would be a much better option for what I want to do.
my #printy = ("1 -> this",
"5 -> that",
"3 -> the other",
"6 -> thus and such");
#printy = sort {num($a) <=> num($b)} #printy;
my $thisID = 0;
my $lastID = 0;
#print out (line)* if initial number is >1 more than previous, or just (line) otherwise
for (#printy)
{
$thisID = $_; $thisID =~s/ .*//g;
if ($thisID - $lastID != 1) { $_ =~ s/$/ \*/; }
$lastID = $thisID;
}
print join("\n", #printy) . "\n";
sub num
{
my $x = $_[0];
$x =~ s/ .*//;
return $x;
}
But I think I can do better. It feels tangled, and my intuition tells me I'm missing something powerful that could do the job more easily, one that takes maybe two lines.
Now I've used the map() command before, but only to look at/modify an element, not how it compares to a previous element. Can anyone recommend a way to make this more succinct? Thanks!
Since Perl promotes TIMTOWTDI, map may seem like an attractive option at first. Let's see how it fares for this task:
Schwartzian thought process
Since access to neighboring elements is necessary, it's convenient to work with the indices. Since for n elements, there are n-1 pairs of neighbors, you don't have to loop n times. In this case, let's start with 1 instead of the usual 0:
1 .. $#printy
One can access neighboring elements by calling the relevant indices inside map.
map { my $prev = $printy[$_-1]; my $curr = $printy[$_] } 1 .. $#printy;
An array slice expresses this more succinctly:
map { my ( $prev, $curr ) = #printy[$_-1,$_]; } 1 .. $#printy;
Let's introduce the real logic related to comparing numbers using the num subroutine:
map {
my ( $prev, $curr ) = #printy[$_-1,$_];
if ( num($curr) - num($prev) > 1 ) {
"$curr *";
}
else {
$curr;
}
} 1 .. $#printy;
Which is equivalent to:
map {
my ( $prev, $curr ) = #printy[$_-1,$_];
$curr .= " *" if num($curr) - num($prev) > 1;
$curr
} 1 .. $#printy;
Remember not to forget the first element:
#printy = ( $printy[0],
map {
my ( $prev, $curr ) = #printy[$_-1,$_];
$curr .= " *" if num($curr) - num($prev) > 1;
$curr
} 1 .. $#printy
);
Given the final result, I'm not so sure I'd use map for this:
It's hard to read
There's a lot going on
The next person working on your code will love you
No map needed, just add some spaces here and there, and remove stuff that's not needed ($_, join, etc.). Also, reuse num() inside the loop, no need to repeat the regex.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my #printy = sort { num($a) <=> num($b) }
'1 -> this', '5 -> that', '3 -> the other', '6 -> thus and such';
my $thisID = my $lastID = 0;
for (#printy) {
$thisID = num($_);
$_ .= ' *' if $thisID - $lastID != 1;
$lastID = $thisID;
}
say for #printy;
sub num {
my ($x) = #_;
$x =~ s/ .*//;
return $x;
}
Also, reimplementing num using /(\d+)/ instead of substitution might tell its purpose more clearly.
I agree with choroba that there is no need for a map here. But I'd refactor a little bit anyway.
use strict;
use warnings;
use feature 'say';
my #printy = ( "1 -> this", "5 -> that", "3 -> the other", "6 -> thus and such" );
my $last_id = 0;
foreach my $line ( sort { num($a) <=> num($b) } #printy ) {
my $current_id = num($line);
$line .= ' *' unless $current_id - $last_id == 1;
$last_id = $current_id;
}
say for #printy;
# returns the number at the start of a string
sub num {
$_[0] =~ m/^(\d+)/;
return $1;
}
I moved the sort down into the foreach, because you shouldn't rely on the fact that your input is sorted.
I changed the variable names to go with the convention that there should be no capital letters in variable names, and I used say, which is like print with a system-specific newline at the end.
I also moved the $current_id into the loop. That doesn't need to be visible outside because it's lexical to that loop. Always declare variables in the smallest possible scope.
You already had that nice num function, but you're not using it inside of the loop to get the $current_id. Use it.
I think if the input gets very long, it might make sense to go with a map construct because sorting will be very expensive at some point. Look at the Schwartzian transform for caching the calculation before sorting. You could then do everything at once. But it won't be readable for a beginner any more.
Your data yells "Use a hash!" to me.
If we had a hash,
my %printy =
map { split / -> / }
"1 -> this", "5 -> that", "3 -> the other", "6 -> thus and such";
The solution would simply be:
my #order = sort { $a <=> $b } keys(%printy);
for my $i (#order[1..$#order]) {
$printy{$i} .= ' *'
if !exists($printy{$i-1});
}
print "$_ -> $printy{$_}\n"
for #order;
This can be golfed down, though I'm not sure it's worth it.
my $count;
print "$_ -> $printy{$_}".( !$count++ || exists($printy{$_-1}) ? "" : " *" )."\n"
for
sort { $a <=> $b }
keys(%printy);
That for can be converted into a map, but it just makes it less efficient.
my $count;
print
map { "$_ -> $printy{$_}".( !$count++ || exists($printy{$_-1}) ? "" : " *" )."\n" }
sort { $a <=> $b }
keys(%printy);
I'd also advise to clean up the code and keep the loop. However, here is a map based way.
The code uses your sorted #printy and the num sub.
my #nums = map { num($_) } #printy;
my #res = map {
$nums[$_] == $nums[$_-1] + 1 # invariably false for $_ == 0
? $printy[$_] : $printy[$_] .= ' *';
}
(0..$#printy);
say for #res;
This works for the first element since it does not come after the last, given that we're sorted. That may be a bit diabolical though and it needs a comment in code. So perhaps better spell it out
my #res = map {
($nums[$_] == $nums[$_-1] + 1) ? $printy[$_] : $printy[$_] .= ' *';
}
(1..$#printy);
unshift #res, $printy[0];
Not as clean but clear.
All this does extra work when compared to a straight loop, of course.
I'm sorry, but your code is a shambles, and you need to do much more than use map to clean up this code
You have no indentation and multiple statements on a single line, and you haven't thought through your logic. Your code is unmaintainable
Here's how I would write this. It builds a parallel array of IDs, and then sorts a list of indices so that both the IDs and the original data are in order
If it makes you happier, it does include map
use strict;
use warnings 'all';
my #data = ( '1 -> this', '5 -> that', '3 -> the other', '6 -> thus and such' );
my #ids = map { /(\d+)/ } #data;
my #indexes = sort { $ids[$a] <=> $ids[$b] } 0 .. $#ids;
my $last_id;
for my $i ( #indexes ) {
print $data[$i];
print ' *' if defined $last_id and $ids[$i] > $last_id + 1;
print "\n";
$last_id = $ids[$i];
}
output
1 -> this
3 -> the other *
5 -> that *
6 -> thus and such

Selecting highest count of element except when...

So i have been working on this perl script that will analyze and count the same letters in different line spaces. I have implemented the count to a hash but am having trouble excluding a " - " character from the output results of this hash. I tried using delete command or next if, but am not getting rid of the - count in the output.
So with this input:
#extract = ------------------------------------------------------------------MGG-------------------------------------------------------------------------------------
And following code:
#Count selected amino acids.
my %counter = ();
foreach my $extract(#extract) {
#next if $_ =~ /\-/; #This line code does not function correctly.
$counter{$_}++;
}
sub largest_value_mem (\%) {
my $counter = shift;
my ($key, #keys) = keys %$counter;
my ($big, #vals) = values %$counter;
for (0 .. $#keys) {
if ($vals[$_] > $big) {
$big = $vals[$_];
$key = $keys[$_];
}
}
$key
}
I expect the most common element to be G, same as the output. If there is a tie in the elements, say G = M, if there is a way to display both in that would be great but not necessary. Any tips on how to delete or remove the '-' is much appreciated. I am slowly learning perl language.
Please let me know if what I am asking is not clear or if more information is needed, thanks again kindly for all the comments.
Your data doesn't entirely make sense, since it's not actually working perl code. I'm guessing that it's a string divided into characters. After that it sounds like you just want to be able to find the highest frequency character, which is essentially just a sort by descending count.
Therefore the following demonstrates how to count your characters and then sort the results:
use strict;
use warnings;
my $str = '------------------------------------------------------------------MGG-------------------------------------------------------------------------------------';
my #chars = split '', $str;
#Count Characteres
my %count;
$count{$_}++ for #chars;
delete $count{'-'}; # Don't count -
# Sort keys by count descending
my #keys = sort {$count{$b} <=> $count{$a}} keys %count;
for my $key (#keys) {
print "$key $count{$key}\n";
}
Outputs:
G 2
M 1
foreach my $extract(#extract) {
#next if $_ =~ /\-/
$_ setting is suppressed by $extract here.
(In this case, $_ keeps value from above, e.g. routine argument list, previous match, etc.)
Also, you can use character class for better readability:
next if $extract=~/[-]/;

Why we can not use "le" in for loop and can use in "if condition"

I've made this program as a beginner. Need to clarify something!
Why I m not able to use "le" in a for loop given below but I'm able to use it in "if condition". What is the reason behind that. ?
print "Type the number upto series should run\n";
my $var;
$var = int<STDIN>;
chomp $var;
my ($i, $t);
my $x = 0;
my $y = 1;
if($var eq 1) {
print "\nYour Series is : $x\n";
} elsif($var eq 2){
print "\nYour Series is : $x $y\n";
} elsif($var ge 2) {
print "Your Series is : $x $y ";
for($i = 1; $i le $var - 2; $i++) {
# Why the condition not working when I'm using "le"
# but it does work when I'm using "<="
$t = $x + $y;
$x = $y;
$y = $t;
print "$t ";
}
print "\n";
} else {
print "Error: Enter a valid postive integer\n";
}
You are free to use le and <= as you like. But you should be aware that they are completely different operators.
Numeric comparision operators are
== != < <= > >= <=>
The equivalent string comparision operators are
eq ne lt le gt ge cmp
Strings and numbers are converted to each other as needed. This means for example that
3 ge 20 # true: 3 is string-greater than 20
11 le 2 # true: 11 is string-less-or-equal than 2
because lexicographic ordering compares character by character. Using the numeric operators when you want to treat the contents of your $variables as numbers is therefore preferable and will yield the correct results.
Note that Perl translates between strings and numbers invisibly. It is advisable to use warnings, so that you get a helpful message when a string can't represent a number (e.g. "a").
The correct answer has been given, that ge does string comparison, where for example 11 is considered less than 2. The solution is to use numerical comparison, == and >=.
I thought I would help demonstrate the problem you are having. Consider this demonstration of how the default sort works:
$ perl -le 'print for sort 1 .. 10'
1
10
2
3
4
5
6
7
8
9
As you can see, it considers 10 lower than 2, and that is because of string comparison, which is the default mode for sort. This is how the default sorting routine looks under the hood:
sort { $a cmp $b }
cmp belongs to the string comparison operators, like eq, le, and ge. These operators are described here (cmp is below that).
For the sort routine to do what we expect in the above example, we would have to use a numeric comparison operator, which is the "spaceship operator" <=>:
sort { $a <=> $b }
In your case, you can try out your problem with this one-liner:
$ perl -nlwe 'print $_ ge 2 ? "Greater or equal" : "Lesser"'
1
Lesser
2
Greater or equal
3
Greater or equal
11
Lesser
When you compare numbers with eq, le, gt..... etc; they first be converted to string. And strings will be checkd for alphabatical order, so "11" will be less then "2" here.
So you should be using ==,<=,>= ...etc when comparing numbers.
I thought you may like to see a more Perl-like program that produces the Fibonnaci series like yours.
use strict;
use warnings;
print "Type the length of the series\n";
chomp(my $length = <>);
unless ($length and $length !~ /\D/ and $length > 0) {
print "Error: Enter a positive integer\n";
}
print "\n";
my #series = (0, 1);
while (#series < $length) {
push #series, $series[-2] + $series[-1];
}
print "#series[0..$length-1]\n";

Subkey comparison function for sorting

I need a Perl comparison function that can be used with sort.
Each key is a text string that has an arbitrary number of subkeys, separated by delimiter characters (dot, colon, space, and slash). Some subkeys are numeric, and need to be sorted numerically. The key format and number of subkeys varies. Therefore, the comparison has to handle one key being longer than the other, and has to handle the case where a subkey is numeric in one key but not in another (in which case a textual comparison is appropriate for that subkey).
This works, but I bet there are better solutions:
use warnings;
use strict;
use Scalar::Util qw[looks_like_number];
sub hier_cmp {
my $aa = $a;
my $bb = $b;
# convert all delims (. : / space) to the same delim
$aa =~ tr/.:\/ /::::/;
$bb =~ tr/.:\/ /::::/;
my #lista = split(":", $aa);
my #listb = split(":", $bb);
my $result;
for my $ix (0 .. min($#lista, $#listb)) {
if (exists($lista[$ix]) && exists($listb[$ix])) {
if ( looks_like_number($lista[$ix]) && looks_like_number($listb[$ix])) {
# compare numerically
$result = ($lista[$ix] <=> $listb[$ix]);
} else {
# compare as strings
$result = ($lista[$ix] cmp $listb[$ix]);
}
if ($result == 0) {
next;
}
return $result;
} elsif (exists($lista[$ix])) {
return 1;
} else {
return -1;
}
}
}
For my purposes, readability is more important than speed. This is just for an internal tool, and lists will rarely have more than hundreds of elements. However, any opportunity to learn something is good.
As you can see, I'm not a perl wizard. Even trivial improvements on my code would be appreciated.
Thanks!
That looks like natural sorting. There are several modules on CPAN that already do that such as Sort::Naturally or Sort::Key::Natural.
For instance:
use Sort::Key::Natural qw(natsort);
my #sorted = natsort #data;
It would help if you gave us some data to test with, but this code passes a few basic tests and it looks right.
It simplifies the problem by using the List::MoreUtils function pairwise to create an array of field pairs.
Then it is just a matter of checking whether only one is defined, when one of the lists has come to an end before the other and should be sorted first; if they are both numeric, when they should be compared with a numeric comparison; or otherwise simply compare them as strings.
If the end of the array of pairs is reached then everything has matched and zero is returned to indicate equiality.
Update
I have changed this code to remove the dependency on List::MoreUtils::pairwise.
use strict;
use warnings;
use Scalar::Util 'looks_like_number';
sub hier_cmp {
our ($a, $b);
my #a = split m|[.: /]+|, $a;
my #b = split m|[.: /]+|, $b;
for my $i (0 .. $#a > $#b ? $#a : $#b) {
my #ab = ( $a[$i], $b[$i] );
if (grep defined, #ab < 2) {
return defined $ab[0] ? 1 : -1;
}
else {
my $numeric = grep(looks_like_number($_), #ab) == 2;
my $result = $numeric ? $ab[0] <=> $ab[1] : $ab[0] cmp $ab[1];
return $result if $result;
}
}
return 0;
}