Related
I am brand new to Perl. Can someone help me out and give me a tip or a solution on how to get this sorting sub program to work. I know it has something to do with how arrays are passed to sub programs. I searched online and did not find an answer that I was satisfied with... I also like the suggestions the helpful S.O. users give me too. I would like to have the program print the sorted array in the main sub program. Currently, it is printing the elements of the array #a in original order. I want the sub program to modify the array so when I print the array it is in sorted order. Any suggestions are appreciated. Of course, I want to see the simplest way to fix this.
sub sort {
my #array = #_;
my $i;
my $j;
my $iMin;
for ( $i = 0; $i < #_ - 1; $i++ ) {
$iMin = $i;
for ( $j = $i + 1; $j < #_; $j++ ) {
if ( $array[$j] < $array[$iMin] ) {
$iMin = $j;
}
}
if ( $iMin != $i ) {
my $temp = $array[$i];
$array[$i] = $array[$iMin];
$array[$iMin] = $temp;
}
}
}
Then call from a main sub program:
sub main {
my #a = (-23,3,234,-45,0,32,12,54,-10000,1);
&sort(#a);
my $i;
for ( $i = 0; $i < #a; $i++ ) {
print "$a[$i]\n";
}
}
main;
When your sub does the following assignment my #array = #_, it is creating a copy of the passed contents. Therefore any modifications to the values of #array will not effect #a outside your subroutine.
Following the clarification that this is just a personal learning exercise, there are two solutions.
1) You can return the sorted array and assign it to your original variable
sub mysort {
my #array = #_;
...
return #array;
}
#a = mysort(#a)
2) Or you can pass a reference to the array, and work on the reference:
sub mysort {
my $arrayref = shift;
...
}
mysort(\#a)
Also, it's probably a good idea to not use a sub named sort since that's that's a builtin function. Duplicating your code using perl's sort:
#a = sort {$a <=> $b} #a;
Also, the for loops inside your sub should be rewritten to utilize the last index of an #array, which is written as $#array, and the range operator .. which is useful for incrementors :
for ( my $j = $i + 1; $j <= $#array; $j++ ) {
# Or simpler:
for my $j ($i+1 .. $#array) {
And finally, because you're new, I should pass on that all your scripts should start with use strict; and use warnings;. For reasons why: Why use strict and warnings?
With very few, rare exceptions the simplest (and easiest) way to sort stuff in perl is simply to use the sort builtin.
sort takes an optional argument, either a block or a subname, which can be used to control how sort evaluates which of the two elements it is comparing at any given moment is greater.
See sort on perldoc for further information.
If you require a "natural" sort function, where you get the sequence 0, 1, 2, 3, ... instead of 0, 1, 10, 11, 12, 2, 21, 22, 3, ..., then use the perl module Sort::Naturally which is available on CPAN (and commonly available as a package on most distros).
In your case, if you need a pure numeric sort, the following will be quite sufficient:
use Sort::Naturally; #Assuming Sort::Naturally is installed
sub main {
my #a = (-23,3,234,-45,0,32,12,54,-10000,1);
#Choose one of the following
#a = sort #a; #Sort in "ASCII" ascending order
#a = sort { $b cmp $a } #a; #Sort in reverse of the above
#a = nsort #a; #Sort in "natural" order
#a = sort { ncmp($b, $a) } #a; #Reverse of the above
print "$_\n" foreach #a; #To see what you actually got
}
It is also worth mentioning the use sort 'stable'; pragma which can be used to ensure that sorting occurs using a stable algorithm, meaning that elements which are equal will not be rearranged relative to one another.
As a bonus, you should be aware that sort can be used to sort data structures as well as simple scalars:
#Assume #a is an array of hashes
#a = sort { $a->{name} cmp $b->{name} } #; #Sort #a by name key
#Sort #a by name in ascending order and date in descending order
#a = sort { $a->{name} cmp $b->{name} || $b->{date} cmp $a->{date} } #a;
#Assume #a is an array of arrays
#Sort #a by the 2nd element of the arrays it contains
#a = sort { $a->[1] cmp $b->[1] } #a;
#Assume #a is an array of VERY LONG strings
#Sort #a alphanumerically, but only care about
#the first 1,000 characters of each string
#a = sort { substr($a, 0, 1000) cmp substr($b, 0, 1000) } #a;
#Assume we want to "sort" an array without modifying it:
#Yes, the names here are confusing. See below.
my #idxs = sort { $a[$a] cmp $a[$b] } (0..$#a);
print "$a[$_]\n" foreach #idxs;
##idxs contains the indexes to #a, in the order they would have
#to be read from #a in order to get a sorted version of #a
As a final note, please remember that $a and $b are special variables in perl, which are pre-populated in the context of a sorting sub or sort block; the upshot is that if you're working with sort you can always expect $a and $b to contain the next two elements being compared, and should use them accordingly, but do NOT do my $a;, e.g., or use variables with either name in non-sort-related stuff. This also means that naming things %a or #a, or %b or #b, can be confusing -- see the final section of my example above.
The top answer in this post: How can I create a multidimensional array in Perl? suggests building a multi-dimensional array as follows:
my #array = ();
foreach my $i ( 0 .. 10 ) {
foreach my $j ( 0 .. 10 ) {
push #{ $array[$i] }, $j;
}
}
I am wondering if there is a way of building the array more compactly and avoiding the nested loop, e.g. using something like:
my #array = ();
my #other_array = (0 ... 10);
foreach my $i ( 0 .. 10 ) {
$array[$i] = #other_array; # This does not work in Perl
}
}
Does Perl support any syntax like that for building multi-dimensional arrays without nested looping?
Similarly, is there a way to print the multidimensional array without (nested) looping?
There is more than one way to do it:
Generating
push accepts LISTs
my #array;
push #{$array[$_]}, 0 .. 10 for 0 .. 10;
Alternative syntax:
my #array;
push #array, [ 0 .. 10 ] for 0 .. 10;
map eye-candy
my #array = map { [ 0 .. 10 ] } 0 .. 10;
Alternative syntax:
my #array = map [ 0 .. 10 ], 0 .. 10;
Printing
With minimal looping
print "#$_\n" for #array;
On Perl 5.10+
use feature 'say';
say "#$_" for #array;
With more formatting control
print join( ', ', #$_ ), "\n" for #array; # "0, 1, 2, ... 9, 10"
"No loops" (The loop is hidden from you)
use Data::Dump 'dd';
dd #array;
Data::Dumper
use Data::Dumper;
print Dumper \#array;
Have a look at perldoc perllol for more details
You are close, you need a reference to the other array
my #array; # don't need the empty list
my #other_array = (0 ... 10);
foreach my $i ( 0 .. 10 ) {
$array[$i] = \#other_array;
# or without a connection to the original
$array[$i] = [ #other_array ];
# or for a slice
$array[$i] = [ #other_array[1..$#other_array] ];
}
}
You can also make anonymous (unnamed) array reference directly using square braces [] around a list.
my #array;
foreach my $i ( 0 .. 10 ) {
$array[$i] = [0..10];
}
}
Edit: printing is probably easiest using the postfix for
print "#$_\n" for #array;
for numerical multidimensional arrays, you can use PDL. It has several constructors for different use cases. The one analogous to the above would be xvals. Note that PDL objects overload printing, so you can just print them out.
use PDL;
my $pdl = xvals(11, 11);
print $pdl;
I have two arrays. I need to check and see if the elements of one appear in the other one.
Is there a more efficient way to do it than nested loops? I have a few thousand elements in each and need to run the program frequently.
Another way to do it is to use Array::Utils
use Array::Utils qw(:all);
my #a = qw( a b c d );
my #b = qw( c d e f );
# symmetric difference
my #diff = array_diff(#a, #b);
# intersection
my #isect = intersect(#a, #b);
# unique union
my #unique = unique(#a, #b);
# check if arrays contain same members
if ( !array_diff(#a, #b) ) {
# do something
}
# get items from array #a that are not in array #b
my #minus = array_minus( #a, #b );
perlfaq4 to the rescue:
How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
Use a hash. Here's code to do both and more. It assumes that each element is unique in a given array:
#union = #intersection = #difference = ();
%count = ();
foreach $element (#array1, #array2) { $count{$element}++ }
foreach $element (keys %count) {
push #union, $element;
push #{ $count{$element} > 1 ? \#intersection : \#difference }, $element;
}
If you properly declare your variables, the code looks more like the following:
my %count;
for my $element (#array1, #array2) { $count{$element}++ }
my ( #union, #intersection, #difference );
for my $element (keys %count) {
push #union, $element;
push #{ $count{$element} > 1 ? \#intersection : \#difference }, $element;
}
You need to provide a lot more context. There are more efficient ways of doing that ranging from:
Go outside of Perl and use shell (sort + comm)
map one array into a Perl hash and then loop over the other one checking hash membership. This has linear complexity ("M+N" - basically loop over each array once) as opposed to nested loop which has "M*N" complexity)
Example:
my %second = map {$_=>1} #second;
my #only_in_first = grep { !$second{$_} } #first;
# use a foreach loop with `last` instead of "grep"
# if you only want yes/no answer instead of full list
Use a Perl module that does the last bullet point for you (List::Compare was mentioned in comments)
Do it based on timestamps of when elements were added if the volume is very large and you need to re-compare often. A few thousand elements is not really big enough, but I recently had to diff 100k sized lists.
You can try Arrays::Utils, and it makes it look nice and simple, but it's not doing any powerful magic on the back end. Here's the array_diffs code:
sub array_diff(\#\#) {
my %e = map { $_ => undef } #{$_[1]};
return #{[ ( grep { (exists $e{$_}) ? ( delete $e{$_} ) : ( 1 ) } #{ $_[0] } ), keys %e ] };
}
Since Arrays::Utils isn't a standard module, you need to ask yourself if it's worth the effort to install and maintain this module. Otherwise, it's pretty close to DVK's answer.
There are certain things you must watch out for, and you have to define what you want to do in that particular case. Let's say:
#array1 = qw(1 1 2 2 3 3 4 4 5 5);
#array2 = qw(1 2 3 4 5);
Are these arrays the same? Or, are they different? They have the same values, but there are duplicates in #array1 and not #array2.
What about this?
#array1 = qw( 1 1 2 3 4 5 );
#array2 = qw( 1 1 2 3 4 5 );
I would say that these arrays are the same, but Array::Utils::arrays_diff begs to differ. This is because Array::Utils assumes that there are no duplicate entries.
And, even the Perl FAQ pointed out by mob also says that It assumes that each element is unique in a given array. Is this an assumption you can make?
No matter what, hashes are the answer. It's easy and quick to look up a hash. The problem is what do you want to do with unique values.
Here's a solid solution that assumes duplicates don't matter:
sub array_diff {
my #array1 = #{ shift() };
my #array2 = #{ shift() };
my %array1_hash;
my %array2_hash;
# Create a hash entry for each element in #array1
for my $element ( #array1 ) {
$array1_hash{$element} = #array1;
}
# Same for #array2: This time, use map instead of a loop
map { $array_2{$_} = 1 } #array2;
for my $entry ( #array2 ) {
if ( not $array1_hash{$entry} ) {
return 1; #Entry in #array2 but not #array1: Differ
}
}
if ( keys %array_hash1 != keys %array_hash2 ) {
return 1; #Arrays differ
}
else {
return 0; #Arrays contain the same elements
}
}
If duplicates do matter, you'll need a way to count them. Here's using map not just to create a hash keyed by each element in the array, but also count the duplicates in the array:
my %array1_hash;
my %array2_hash;
map { $array1_hash{$_} += 1 } #array1;
map { $array2_hash{$_} += 2 } #array2;
Now, you can go through each hash and verify that not only do the keys exist, but that their entries match
for my $key ( keys %array1_hash ) {
if ( not exists $array2_hash{$key}
or $array1_hash{$key} != $array2_hash{$key} ) {
return 1; #Arrays differ
}
}
You will only exit the for loop if all of the entries in %array1_hash match their corresponding entries in %array2_hash. Now, you have to show that all of the entries in %array2_hash also match their entries in %array1_hash, and that %array2_hash doesn't have more entries. Fortunately, we can do what we did before:
if ( keys %array2_hash != keys %array1_hash ) {
return 1; #Arrays have a different number of keys: Don't match
}
else {
return; #Arrays have the same keys: They do match
}
You can use this for getting diffrence between two arrays
#!/usr/bin/perl -w
use strict;
my #list1 = (1, 2, 3, 4, 5);
my #list2 = (2, 3, 4);
my %diff;
#diff{ #list1 } = undef;
delete #diff{ #list2 };
You want to compare each element of #x against the element of the same index in #y, right? This will do it.
print "Index: $_ => \#x: $x[$_], \#y: $y[$_]\n"
for grep { $x[$_] != $y[$_] } 0 .. $#x;
...or...
foreach( 0 .. $#x ) {
print "Index: $_ => \#x: $x[$_], \#y: $y[$_]\n" if $x[$_] != $y[$_];
}
Which you choose kind of depends on whether you're more interested in keeping a list of indices to the dissimilar elements, or simply interested in processing the mismatches one by one. The grep version is handy for getting the list of mismatches. (original post)
n + n log n algorithm, if sure that elements are unique in each array (as hash keys)
my %count = ();
foreach my $element (#array1, #array2) {
$count{$element}++;
}
my #difference = grep { $count{$_} == 1 } keys %count;
my #intersect = grep { $count{$_} == 2 } keys %count;
my #union = keys %count;
So if I'm not sure of unity and want to check presence of the elements of array1 inside array2,
my %count = ();
foreach (#array1) {
$count{$_} = 1 ;
};
foreach (#array2) {
$count{$_} = 2 if $count{$_};
};
# N log N
if (grep { $_ == 1 } values %count) {
return 'Some element of array1 does not appears in array2'
} else {
return 'All elements of array1 are in array2'.
}
# N + N log N
my #a = (1,2,3);
my #b=(2,3,1);
print "Equal" if grep { $_ ~~ #b } #a == #b;
Not elegant, but easy to understand:
#!/usr/local/bin/perl
use strict;
my $file1 = shift or die("need file1");
my $file2 = shift or die("need file2");;
my #file1lines = split/\n/,`cat $file1`;
my #file2lines = split/\n/,`cat $file2`;
my %lines;
foreach my $file1line(#file1lines){
$lines{$file1line}+=1;
}
foreach my $file2line(#file2lines){
$lines{$file2line}+=2;
}
while(my($key,$value)=each%lines){
if($value == 1){
print "$key is in only $file1\n";
}elsif($value == 2){
print "$key is in only $file2\n";
}elsif($value == 3){
print "$key is in both $file1 and $file2\n";
}
}
exit;
__END__
Try to use List::Compare. IT has solutions for all the operations that can be performed on arrays.
I have a fixed-sized array where the size of the array is always in factor of 3.
my #array = ('foo', 'bar', 'qux', 'foo1', 'bar', 'qux2', 3, 4, 5);
How can I cluster the member of array such that we can get
an array of array group by 3:
$VAR = [ ['foo','bar','qux'],
['foo1','bar','qux2'],
[3, 4, 5] ];
my #VAR;
push #VAR, [ splice #array, 0, 3 ] while #array;
or you could use natatime from List::MoreUtils
use List::MoreUtils qw(natatime);
my #VAR;
{
my $iter = natatime 3, #array;
while( my #tmp = $iter->() ){
push #VAR, \#tmp;
}
}
I really like List::MoreUtils and use it frequently. However, I have never liked the natatime function. It doesn't produce output that can be used with a for loop or map or grep.
I like to chain map/grep/apply operations in my code. Once you understand how these functions work, they can be very expressive and very powerful.
But it is easy to make a function to work like natatime that returns a list of array refs.
sub group_by ($#) {
my $n = shift;
my #array = #_;
croak "group_by count argument must be a non-zero positive integer"
unless $n > 0 and int($n) == $n;
my #groups;
push #groups, [ splice #array, 0, $n ] while #array;
return #groups;
}
Now you can do things like this:
my #grouped = map [ reverse #$_ ],
group_by 3, #array;
** Update re Chris Lutz's suggestions **
Chris, I can see merit in your suggested addition of a code ref to the interface. That way a map-like behavior is built in.
# equivalent to my map/group_by above
group_by { [ reverse #_ ] } 3, #array;
This is nice and concise. But to keep the nice {} code ref semantics, we have put the count argument 3 in a hard to see spot.
I think I like things better as I wrote it originally.
A chained map isn't that much more verbose than what we get with the extended API.
With the original approach a grep or other similar function can be used without having to reimplement it.
For example, if the code ref is added to the API, then you have to do:
my #result = group_by { $_[0] =~ /foo/ ? [#_] : () } 3, #array;
to get the equivalent of:
my #result = grep $_->[0] =~ /foo/,
group_by 3, #array;
Since I suggested this for the sake of easy chaining, I like the original better.
Of course, it would be easy to allow either form:
sub _copy_to_ref { [ #_ ] }
sub group_by ($#) {
my $code = \&_copy_to_ref;
my $n = shift;
if( reftype $n eq 'CODE' ) {
$code = $n;
$n = shift;
}
my #array = #_;
croak "group_by count argument must be a non-zero positive integer"
unless $n > 0 and int($n) == $n;
my #groups;
push #groups, $code->(splice #array, 0, $n) while #array;
return #groups;
}
Now either form should work (untested). I'm not sure whether I like the original API, or this one with the built in map capabilities better.
Thoughts anyone?
** Updated again **
Chris is correct to point out that the optional code ref version would force users to do:
group_by sub { foo }, 3, #array;
Which is not so nice, and violates expectations. Since there is no way to have a flexible prototype (that I know of), that puts the kibosh on the extended API, and I'd stick with the original.
On a side note, I started with an anonymous sub in the alternate API, but I changed it to a named sub because I was subtly bothered by how the code looked. No real good reason, just an intuitive reaction. I don't know if it matters either way.
Or this:
my $VAR;
while( my #list = splice( #array, 0, 3 ) ) {
push #$VAR, \#list;
}
Another answer (a variation on Tore's, using splice but avoiding the while loop in favor of more Perl-y map)
my $result = [ map { [splice(#array, 0, 3)] } (1 .. (scalar(#array) + 2) % 3) ];
Try this:
$VAR = [map $_ % 3 == 0 ? ([ $array[$_], $array[$_ + 1], $array[$_ + 2] ])
: (),
0..$#array];
Another generic solution, non-destructive to the original array:
use Data::Dumper;
sub partition {
my ($arr, $N) = #_;
my #res;
my $i = 0;
while ($i + $N-1 <= $#$arr) {
push #res, [#$arr[$i .. $i+$N-1]];
$i += $N;
}
if ($i <= $#$arr) {
push #res, [#$arr[$i .. $#$arr]];
}
return \#res;
}
print Dumper partition(
['foo', 'bar', 'qux', 'foo1', 'bar', 'qux2', 3, 4, 5],
3
);
The output:
$VAR1 = [
[
'foo',
'bar',
'qux'
],
[
'foo1',
'bar',
'qux2'
],
[
3,
4,
5
]
];
As a learning experience I decided to do this in Perl6
The first, perhaps most simplest way I tried was to use map.
my #output := #array.map: -> $a, $b?, $c? { [ $a, $b // Nil, $c // Nil ] };
.say for #output;
foo bar qux
foo1 bar qux2
3 4 5
That didn't seem very scalable. What if I wanted to take the items from the list 10 at a time, that would get very annoying to write. ... Hmmm I did just mention "take" and there is a keyword named take lets try that in a subroutine to make it more generally useful.
sub at-a-time ( Iterable \sequence, Int $n where $_ > 0 = 1 ){
my $is-lazy = sequence.is-lazy;
my \iterator = sequence.iterator;
# gather is used with take
gather loop {
my Mu #current;
my \result = iterator.push-exactly(#current,$n);
# put it into the sequence, and yield
take #current.List;
last if result =:= IterationEnd;
}.lazy-if($is-lazy)
}
For kicks let's try it against an infinite list of the fibonacci sequence
my $fib = (1, 1, *+* ... *);
my #output = at-a-time( $fib, 3 );
.say for #output[^5]; # just print out the first 5
(1 1 2)
(3 5 8)
(13 21 34)
(55 89 144)
(233 377 610)
Notice that I used $fib instead of #fib. It was to prevent Perl6 from caching the elements of the Fibonacci sequence.
It might be a good idea to put it into a subroutine to create a new sequence everytime you need one, so that the values can get garbage collected when you are done with them.
I also used .is-lazy and .lazy-if to mark the output sequence lazy if the input sequence is. Since it was going into an array #output it would have tried to generate all of the elements from an infinite list before continuing onto the next line.
Wait a minute, I just remembered .rotor.
my #output = $fib.rotor(3);
.say for #output[^5]; # just print out the first 5
(1 1 2)
(3 5 8)
(13 21 34)
(55 89 144)
(233 377 610)
.rotor is actually far more powerful than I've demonstrated.
If you want it to return a partial match at the end you will need to add a :partial to the arguments of .rotor.
Use the spart function from the List::NSect package on CPAN.
perl -e '
use List::NSect qw{spart};
use Data::Dumper qw{Dumper};
my #array = ("foo", "bar", "qux", "foo1", "bar", "qux2", 3, 4, 5);
my $var = spart(3, #array);
print Dumper $var;
'
$VAR1 = [
[
'foo',
'bar',
'qux'
],
[
'foo1',
'bar',
'qux2'
],
[
3,
4,
5
]
];
Below a more generic solution to the problem:
my #array = ('foo', 'bar', 1, 2);
my $n = 3;
my #VAR = map { [] } 1..$n;
my #idx = sort map { $_ % $n } 0..$#array;
for my $i ( 0..$#array ){
push #VAR[ $idx[ $i ] ], #array[ $i ];
}
This also works when the number of items in the array is not a factor of 3.
In the above example, the other solutions with e.g. splice would produce two arrays of length 2 and one of length 0.
Let's make this very easy. What I want:
#array = qw/one two one/;
my #duplicates = duplicate(#array);
print "#duplicates"; # This should now print 'one'.
How to print duplicate values of a array/hash?
sub duplicate {
my #args = #_;
my %items;
for my $element(#args) {
$items{$element}++;
}
return grep {$items{$_} > 1} keys %items;
}
# assumes inputs can be hash keys
#a = (1, 2, 3, 3, 4, 4, 5);
# keep count for each unique input
%h = ();
map { $h{$_}++ } #a;
# duplicate inputs have count > 1
#dupes = grep { $h{$_} > 1 } keys %h;
# should print 3, 4
print join(", ", sort #dupes), "\n";
The extra verbose, extra readable version of what you want to do:
sub duplicate {
my %value_hash;
foreach my $val (#_) {
$value_hash{$val} +=1;
}
my #arr;
while (my ($val, $num) = each(%value_hash)) {
if ($num > 1) {
push(#arr, $val)
}
}
return #arr;
}
This can be shortened considerably, but I intentionally left it verbose so that you can follow along.
I didn't test it, though, so watch out for my typos.
Use a dictionary, put the value in the key, and the count in the value.
Ah, just noticed you've tagged as perl
while ([...]) {
$hash{[dbvalue]}++
}
Unspecified in the question is the order in which the duplicates should be returned.
I can think of several possibilities: don't care; by order of first/second/last occurrence in the input list; sorted.
I'm going golfing!
sub duplicate {
my %count;
grep $count{$_}++, #_;
}
#array = qw/one two one/;
my #duplicates = duplicate(#array);
print "#duplicates"; # This should now print 'one'.
# or if returning *exactly* 1 occurrence of each duplicated item is important
sub duplicate {
my %count;
grep ++$count{$_} == 2, #_;
}