Subkey comparison function for sorting

Subkey comparison function for sorting - perl

I need a Perl comparison function that can be used with sort.
Each key is a text string that has an arbitrary number of subkeys, separated by delimiter characters (dot, colon, space, and slash). Some subkeys are numeric, and need to be sorted numerically. The key format and number of subkeys varies. Therefore, the comparison has to handle one key being longer than the other, and has to handle the case where a subkey is numeric in one key but not in another (in which case a textual comparison is appropriate for that subkey).
This works, but I bet there are better solutions:
use warnings;
use strict;
use Scalar::Util qw[looks_like_number];
sub hier_cmp {
my $aa = $a;
my $bb = $b;
# convert all delims (. : / space) to the same delim
$aa =~ tr/.:\/ /::::/;
$bb =~ tr/.:\/ /::::/;
my #lista = split(":", $aa);
my #listb = split(":", $bb);
my $result;
for my $ix (0 .. min($#lista, $#listb)) {
if (exists($lista[$ix]) && exists($listb[$ix])) {
if ( looks_like_number($lista[$ix]) && looks_like_number($listb[$ix])) {
# compare numerically
$result = ($lista[$ix] <=> $listb[$ix]);
} else {
# compare as strings
$result = ($lista[$ix] cmp $listb[$ix]);
}
if ($result == 0) {
next;
}
return $result;
} elsif (exists($lista[$ix])) {
return 1;
} else {
return -1;
}
}
}
For my purposes, readability is more important than speed. This is just for an internal tool, and lists will rarely have more than hundreds of elements. However, any opportunity to learn something is good.
As you can see, I'm not a perl wizard. Even trivial improvements on my code would be appreciated.
Thanks!

That looks like natural sorting. There are several modules on CPAN that already do that such as Sort::Naturally or Sort::Key::Natural.
For instance:
use Sort::Key::Natural qw(natsort);
my #sorted = natsort #data;

It would help if you gave us some data to test with, but this code passes a few basic tests and it looks right.
It simplifies the problem by using the List::MoreUtils function pairwise to create an array of field pairs.
Then it is just a matter of checking whether only one is defined, when one of the lists has come to an end before the other and should be sorted first; if they are both numeric, when they should be compared with a numeric comparison; or otherwise simply compare them as strings.
If the end of the array of pairs is reached then everything has matched and zero is returned to indicate equiality.
Update
I have changed this code to remove the dependency on List::MoreUtils::pairwise.
use strict;
use warnings;
use Scalar::Util 'looks_like_number';
sub hier_cmp {
our ($a, $b);
my #a = split m|[.: /]+|, $a;
my #b = split m|[.: /]+|, $b;
for my $i (0 .. $#a > $#b ? $#a : $#b) {
my #ab = ( $a[$i], $b[$i] );
if (grep defined, #ab < 2) {
return defined $ab[0] ? 1 : -1;
}
else {
my $numeric = grep(looks_like_number($_), #ab) == 2;
my $result = $numeric ? $ab[0] <=> $ab[1] : $ab[0] cmp $ab[1];
return $result if $result;
}
}
return 0;
}

Related

Compare values of two hashes with mixed types

I want to compare the hash if the key-value pairs are same in the second hash. I don't want to use smartmatch as it gives warnings.
What is the best way to compare two hashes with integers,strings and maybe also arrays in it?
use warnings;
use diagnostics;
my $hash1={"key_str"=>"a string", "key_int"=>4};
my $hash2={"key_str"=>"b string", "key_int"=>2};
foreach my $key ( keys(%$hash1) ) {
if ($hash1->{$key} != $hash2->{$key}) {
print($key);
}
}
the output as expected is:
Argument "b string" isn't numeric in numeric ne (!=) at hash_compare.pl line 8 (#1)
(W numeric) The indicated string was fed as an argument to an operator
that expected a numeric value instead. If you're fortunate the message
will identify which operator was so unfortunate.
Argument "a string" isn't numeric in numeric ne (!=) at hash_compare.pl line 8 (#1)

First, Perl does not have types. It does not distinguish between strings and numbers (on the outside).
Furthermore, it does not make a difference between numbers and strings on this level. The numerical context and string context matters if you check what's greater or less than. Consider this:
my $foo = 200;
my $bar = 99;
print $foo > $bar ? $foo : $bar;
Obviously it will print 200, because 200 is numerically larger than 99.
my $foo = 200;
my $bar = 99;
print $foo gt $bar ? $foo : $bar;
But this will print 99, because 9 is alphanumerically (as in string) greater than 2. It compared the numbers of the code points for the characters.
But if all you want to do is check for inequality, the ne operator is fine. Even when you are not sure whether there are things other than numbers in your input.
foreach my $key ( keys(%$hash1) ) {
if ($hash1->{$key} ne $hash2->{$key}) {
print($key);
}
}
eq (and ne) are smart enough to see if a number was initially a string or a number without quotes, because the internal representation of those differs.
Warning, technical details ahead.
Scalar values are saved in _SV_s. These in terms can contain different things. There's a special internal type for simple integers called IV, and also one called PV for strings. Perl internally converts between those two as needed when you use numbers inside of strings, or vise versa.
You can get some debugging information about the internal representation of data with Dump from Devel::Peek.
use Devel::Peek;
Dump("01");
Dump(01);
This will output:
SV = PV(0x19560d0) at 0x19327d0
REFCNT = 1
FLAGS = (POK,READONLY,IsCOW,pPOK)
PV = 0x1c94fd0 "01"\0
CUR = 2
LEN = 10
COW_REFCNT = 0
SV = IV(0x19739b0) at 0x19739c0
REFCNT = 1
FLAGS = (IOK,READONLY,pIOK)
IV = 1
As you can see, the first one is a string, and the second one is a number.
But if we do this
print "01" eq 01;
there is no output, because the 01 is an integer and will be converted to "1" for comparison. Since the 0 of "01" is not equal to 1, nothing gets printed.
If the values of your data structures are more complex, you need to walk the structure. Each type of element needs to have its own handling. There could be array references, hash references, scalar references, scalars, glob references, dualvars and so on. There might be objects that you want to treat specially.
I suggest taking a look at how Test::Deep implements this. If you decide to use it in production code (and not a unit test), you can use Test::Deep::NoTest.

You can use Scalar::Util qw( looks_like_number ); to determine if the value is a number or a string. Scalar::Util is a standard module that is included with Perl. For a list of standard modules, see perldoc perlmodlib.
#!/usr/bin/env perl
# always use these two
use strict;
use warnings;
# handle errors in open and close
use autodie; # See http://perldoc.perl.org/autodie.html
use Scalar::Util qw( looks_like_number );
my $hash1={"key_str"=>"a string", "key_int"=>4};
my $hash2={"key_str"=>"b string", "key_int"=>2};
foreach my $key ( keys(%$hash1) ) {
if( looks_like_number( $hash1->{$key} ) && looks_like_number( $hash2->{$key} ) ){
if ($hash1->{$key} != $hash2->{$key}) {
print "number value of $key is different\n";
}
}else{
if ($hash1->{$key} ne $hash2->{$key}) {
print "string value of $key is different\n";
}
}
}

I've written a program that doesn't use any module. I've tested below program for many cases, worked fine, but if you find any case where it fails then please let me know.
Always use ne to compare if you are not sure of datatype to be compared. != works only for integers, ne for integers and strings both.
use strict;
use warnings;
use feature 'say';
my $hash1 = {
'key1' => 'value1',
'key2' => [1, 2, 2],
'key3' => {1=>1, 2=> [5, 7]},
};
my $hash2 = {
'key1' => 'value1',
'key2' => [1, 2, 2],
'key3' => {1=>1, 2=> [5, 7]},
};
my $is_same = 0;
$is_same = compare($hash1, $hash2);
if ($is_same) {
say "Same";
} else {
say "Not same";
}
sub compare {
my ($value1, $value2) = #_;
my $is_same = 1;
if (ref($value1) eq "ARRAY") {
if (is_same_sized_array($value1, $value2)) {
foreach (my $i = 0; $i < #$value1; $i++) {
if (ref $value1->[$i] eq ref $value2->[$i]) {
$is_same = compare($value1->[$i], $value2->[$i]);
return 0 unless $is_same;
} else {
return 0;
}
}
} else {
return 0;
}
} elsif (ref($value1) eq "HASH") {
if (is_same_sized_array([keys %$value1], [keys %$value2])) {
foreach my $key (sort keys %$value1) {
if (exists $value2->{$key} && ref $value1->{$key} eq ref $value2->{$key}) {
$is_same = compare($value1->{$key}, $value2->{$key});
return 0 unless $is_same;
} else {
return 0;
}
}
} else {
return 0;
}
} else {
if ($value1 ne $value2) {
return 0;
}
}
return $is_same;
}
sub is_same_sized_array {
my ($arr1, $arr2) = #_;
return (#$arr1 == #$arr2) || 0;
}

How can I test if a value is repeated in array?

I'm doing a subroutine that takes a list of numbers as an argument. What I would like to do is check if there is a repeated value in that list. In case there are repeated numbers, print a message and stop the program. In case there are no repeated numbers, continue with the execution.
For example:
if (there_is_number_repeated){
print "There is a number repeated";}
else{
run this code...}
I was trying to do this creating a hash with the values of that list, and then check if there are values > 1.
use strict;
use warnings;
sub name_subroutine{
my (#numbers)=#_;
my $n=scalar(#numbers);
my %table=();
foreach my $i(#numbers){
if (exists $tabla{$i}){
$tabla{$i}+=1;}
else{
$tabla{$i} = 1;
}
my #values = values %tabla;
}
}
It's here where I do not know to continue. Is there any way to do this in an amateurish way? I'm newbie in Perl.
Thanks!

I would just do:
my %uniq;
if ( grep ++$uniq{$_} > 1, #numbers ) {
# some numbers are repeated
}
In your existing code (with a couple corrections):
my %table=();
foreach my $i(#numbers){
if (exists $table{$i}){
$table{$i}+=1;}
else{
$table{$i} = 1;
}
}
my #values = values %table;
you don't need to check for exists; doing += 1 or ++ will set it to 1 if it didn't exist. And you don't want the values (those are just the counts of how many times each array value was found), you want the keys, specifically those for which the value is > 1:
my #repeated = grep $table{$_} > 1, keys %table;

my #arr = #_;
my $count = #arr;
for(my $i=0;$i<$count;$i++)
{
my $num = $arr[$i];
for(my $j=0; $j<$count,$j!=$i; $j++)
{
if($num == $arr[$j])
{
print "\n$num is repeated";
last;
}
}
}
Tried and tested. Cheers.

Compare 2 sets of numbers contained in strings

I have 2 scalars as below:
$a = '100 105 010';
$b = '010 105 100';
How do I compare if both has same set of values? order doesn't matter.
one way is to arrange it in ascending order and compare, is there a better way?

You can split each string into an array and sort and compare arrays. By hand:
use warnings;
use strict;
my $x = '100 105 1 010';
my $y = '010 105 100 2';
my #xs = sort { $a <=> $b } split ' ', $x;
my #ys = sort { $a <=> $b } split ' ', $y;
if (#xs != #ys) {
print "Differ in number of elements.\n";
}
else {
for (0..$#xs) {
print "Differ: $xs[$_] vs $ys[$_]\n" if $xs[$_] != $ys[$_];
}
}
# For boolean comparison put it in a sub
print arr_num_eq(\#xs, \#ys), "\n";
sub arr_num_eq {
my ($ra, $rb) = #_;
return 0 if #$ra != #$rb;
$ra->[$_] != $rb->[$_] && return 0 for 0..$#$ra;
return 1;
}
The sorting can be moved to the sub as well, which would then take strings. The way it stands it can be used for comparison of existing arrays as well. Please add argument checking.
There is a number of modules that have this capability. The perm from Array::Compare hides the sorting above, but internally joins sorted arrays into strings thus duplicating the work here since we started with strings. The List::AllUtils certainly offers this as well with its long list of utilities.
See this post, for example, for a few methods (just not the smart match ~~), and for benchmarks if efficiency is a concern.
Using the mentioned implementation idea from Array::Compare, per comment by ysth
sub str_num_eq {
return join(' ', sort { $a <=> $b } split / /, $_[0])
eq join(' ', sort { $a <=> $b } split / /, $_[1])
}
What the most suitable method is depends on what this is for and how it is used. Is it only a boolean comparison, or will more be done if they are found to differ? How does it come about in your program flow? What are typical sizes of strings, how often is it run? Are the strings most often the same or different, do they typically differ a lot or a little? Etc.

Without modules, you can use hashes:
#!/usr/bin/perl
use warnings;
use strict;
my $x = '100 105 010 2';
my $y = '010 105 100 100 1';
my (%hx, %hy);
$hx{$_}++ for split ' ', $x;
$hy{$_}++ for split ' ', $y;
for my $k (keys %hx) {
if (! exists $hy{$k}) {
print "$k missing in y\n";
} elsif ($hy{$k} != $hx{$k}) {
print "$k has different number of occurences\n";
}
delete $hy{$k};
}
print "$_ missing in x\n" for keys %hy;
$a and $b are special variables used in sort, so I renamed them to $x and $y.
split turns the strings into lists. Each hash counts how many times a member occurs in the list.
See also Perl FAQ 4.

Something else try with pattern matching,
This is not a straight forward but it will work.
Construct the pattern by anyone of your scalar value. Then check the another string by the constructed pattern.
my $a = '100 100 105';
my $b = '100 105 100';
my #b_ary = split(" ",$b);
my $regex = join'\b|\b', #b_ary;
my $word_length = #b_ary * 2 - 1; #Count the number of words and space.
my $rgx = qr"^(?:\b$regex\b|\s){$word_length}$"; #`{n}` match word exact n times
if($a=~m/$rgx/)
{
print "same values\n";
}
else
{
print "Not a same values\n";
}

The answer is already posted above. This is just in case you want to remove the white spaces and compare each number.
$x = '100 105 010';
$y = '010 105 100';
join("",sort split "",join("",split " ",$x)) eq join("",sort split "",join("",split " ",$y));

Compare two arrays in a easy way using Perl

Is there any easy way to compare two string arrays in Perl?
#array1 = (value1, value2, value3...);
#array2 = (value1, value3, value4...);
I need the comparison like below for "N" Number of values,
value1 eq value1
value2 eq value3
value3 eq value4
Please suggest me is there any module to do this?
Thanks

Hmm... a module to compare arrays, you say. How about Array::Compare?
use Array::Compare;
my $compare = Array::Compare->new;
my #array1 = (value1, value2, value3...);
my #array2 = (value1, value3, value4...);
if ($compare->compare(\#array1, \#array2)) {
say "Arrays are the same";
} else {
say "Arrays are different";
}
But you can also use the smart match operator.
if (#array1 ~~ #array2) {
say "Arrays are the same";
} else {
say "Arrays are different";
}

You can compare sizes of both arrays (#a1 == #a2 in scalar context), and then compare size of #a1 array with size of list of indices which correspond to equal strings in both arrays (grep $a1[$_] eq $a2[$_], 0..$#a1),
if (#a1 == #a2 and #a1 == grep $a1[$_] eq $a2[$_], 0..$#a1) { print "equal arrays\n" }
More performance oriented version (doesn't go through all elements if not necessary),
use List::Util 'all';
if (#a1 == #a2 and all{ $a1[$_] eq $a2[$_] } 0..$#a1) { print "equal arrays\n" }

Perl already has some parts for a solution to any list operations.
See List::Util and List::MoreUtils.
my $arrays_are_equal
= !List::Util::pairfirst { $a ne $b } # first case where $a != $b
List::MoreUtils::zip( #array1, #array2 )
;
For this application, see List::Util::pairfirst and List::MoreUtils::zip

This task is simple enough that I wouldn't necessarily want to use a CPAN module. Instead, I would likely write my own comparison subroutine and put it in my own utility module. Here is one implementation that will compare two arrays containing strings and/or integers.
#!/usr/bin/env perl
use strict;
use warnings;
my #array1 = (1..10, 'string');
my #array2 = (1..10, 'string');
my $is_same = is_same(\#array1, \#array2);
print "is_same: $is_same\n";
sub is_same {
my($array1, $array2) = #_;
# immediately return false if the two arrays are not the same length
return 0 if scalar(#$array1) != scalar(#$array2);
# turn off warning about comparing uninitialized (undef) string values
# (limited in scope to just this sub)
no warnings;
for (my $i = 0; $i <= $#$array1; $i++) {
if ($array1->[$i] ne $array2->[$i]) {
return 0;
}
}
return 1;
}

How can I get this basic Perl sub program that sorts to work properly?

I am brand new to Perl. Can someone help me out and give me a tip or a solution on how to get this sorting sub program to work. I know it has something to do with how arrays are passed to sub programs. I searched online and did not find an answer that I was satisfied with... I also like the suggestions the helpful S.O. users give me too. I would like to have the program print the sorted array in the main sub program. Currently, it is printing the elements of the array #a in original order. I want the sub program to modify the array so when I print the array it is in sorted order. Any suggestions are appreciated. Of course, I want to see the simplest way to fix this.
sub sort {
my #array = #_;
my $i;
my $j;
my $iMin;
for ( $i = 0; $i < #_ - 1; $i++ ) {
$iMin = $i;
for ( $j = $i + 1; $j < #_; $j++ ) {
if ( $array[$j] < $array[$iMin] ) {
$iMin = $j;
}
}
if ( $iMin != $i ) {
my $temp = $array[$i];
$array[$i] = $array[$iMin];
$array[$iMin] = $temp;
}
}
}
Then call from a main sub program:
sub main {
my #a = (-23,3,234,-45,0,32,12,54,-10000,1);
&sort(#a);
my $i;
for ( $i = 0; $i < #a; $i++ ) {
print "$a[$i]\n";
}
}
main;

When your sub does the following assignment my #array = #_, it is creating a copy of the passed contents. Therefore any modifications to the values of #array will not effect #a outside your subroutine.
Following the clarification that this is just a personal learning exercise, there are two solutions.
1) You can return the sorted array and assign it to your original variable
sub mysort {
my #array = #_;
...
return #array;
}
#a = mysort(#a)
2) Or you can pass a reference to the array, and work on the reference:
sub mysort {
my $arrayref = shift;
...
}
mysort(\#a)
Also, it's probably a good idea to not use a sub named sort since that's that's a builtin function. Duplicating your code using perl's sort:
#a = sort {$a <=> $b} #a;
Also, the for loops inside your sub should be rewritten to utilize the last index of an #array, which is written as $#array, and the range operator .. which is useful for incrementors :
for ( my $j = $i + 1; $j <= $#array; $j++ ) {
# Or simpler:
for my $j ($i+1 .. $#array) {
And finally, because you're new, I should pass on that all your scripts should start with use strict; and use warnings;. For reasons why: Why use strict and warnings?

With very few, rare exceptions the simplest (and easiest) way to sort stuff in perl is simply to use the sort builtin.
sort takes an optional argument, either a block or a subname, which can be used to control how sort evaluates which of the two elements it is comparing at any given moment is greater.
See sort on perldoc for further information.
If you require a "natural" sort function, where you get the sequence 0, 1, 2, 3, ... instead of 0, 1, 10, 11, 12, 2, 21, 22, 3, ..., then use the perl module Sort::Naturally which is available on CPAN (and commonly available as a package on most distros).
In your case, if you need a pure numeric sort, the following will be quite sufficient:
use Sort::Naturally; #Assuming Sort::Naturally is installed
sub main {
my #a = (-23,3,234,-45,0,32,12,54,-10000,1);
#Choose one of the following
#a = sort #a; #Sort in "ASCII" ascending order
#a = sort { $b cmp $a } #a; #Sort in reverse of the above
#a = nsort #a; #Sort in "natural" order
#a = sort { ncmp($b, $a) } #a; #Reverse of the above
print "$_\n" foreach #a; #To see what you actually got
}
It is also worth mentioning the use sort 'stable'; pragma which can be used to ensure that sorting occurs using a stable algorithm, meaning that elements which are equal will not be rearranged relative to one another.
As a bonus, you should be aware that sort can be used to sort data structures as well as simple scalars:
#Assume #a is an array of hashes
#a = sort { $a->{name} cmp $b->{name} } #; #Sort #a by name key
#Sort #a by name in ascending order and date in descending order
#a = sort { $a->{name} cmp $b->{name} || $b->{date} cmp $a->{date} } #a;
#Assume #a is an array of arrays
#Sort #a by the 2nd element of the arrays it contains
#a = sort { $a->[1] cmp $b->[1] } #a;
#Assume #a is an array of VERY LONG strings
#Sort #a alphanumerically, but only care about
#the first 1,000 characters of each string
#a = sort { substr($a, 0, 1000) cmp substr($b, 0, 1000) } #a;
#Assume we want to "sort" an array without modifying it:
#Yes, the names here are confusing. See below.
my #idxs = sort { $a[$a] cmp $a[$b] } (0..$#a);
print "$a[$_]\n" foreach #idxs;
##idxs contains the indexes to #a, in the order they would have
#to be read from #a in order to get a sorted version of #a
As a final note, please remember that $a and $b are special variables in perl, which are pre-populated in the context of a sorting sub or sort block; the upshot is that if you're working with sort you can always expect $a and $b to contain the next two elements being compared, and should use them accordingly, but do NOT do my $a;, e.g., or use variables with either name in non-sort-related stuff. This also means that naming things %a or #a, or %b or #b, can be confusing -- see the final section of my example above.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Subkey comparison function for sorting - perl

That looks like natural sorting. There are several modules on CPAN that already do that such as Sort::Naturally or Sort::Key::Natural. For instance: use Sort::Key::Natural qw(natsort); my #sorted = natsort #data;

Related

Compare values of two hashes with mixed types

How can I test if a value is repeated in array?

Compare 2 sets of numbers contained in strings

Compare two arrays in a easy way using Perl

How can I get this basic Perl sub program that sorts to work properly?

Categories

Resources