Generate unordered pairs from list in perl - perl

Perl newbie here.
I want to loop over pairs of words, coming from a list of words:
#words = ("word1", "word2", "word3", "word4");
I want to create and process all pairs of words, but pair ordering is not important, i.e. the pairs
("word1", "word2") and ("word2, "word1") are considered the same and only one of them should be generated.
Is there an easy way to do this? The obvious solution would be to have an nested loop somewhat like:
for my $i1 (0 ... $#words) {
for my $i2 ($i1 + 1 ... $#words) {
process_pair(words[$i1], words[$i2])
}
}
but I am looking for something more Perl-esque than this. Thanks in advance.

For a fixed R=2, the solution you presented is quite appropriate.
for my $i1 (0 ... $#words) {
for my $i2 ($i1 + 1 ... $#words) {
process_pair($words[$i1], $words[$i2])
}
}
But what if R was larger or variable? You can do some powerful stuff using NestedLoops.
use Algorithm::Loops qw( NestedLoops );
my $R = 2;
NestedLoops(
[ [ 0..$#words ],
( sub { [$_+1..$#words] } ) x ($R-1),
],
\&process_pair,
);
Or as an iterator
use Algorithm::Loops qw( NestedLoops );
my $R = 2;
my $iter = NestedLoops([
[ 0..$#words ],
( sub { [$_+1..$#words] } ) x ($R-1),
]);
while (my #combo = $iter->()) {
process_pair(#combo);
}
But that's hardly readable. Solutions that specifically address this problem are going to be cleanest.
use Math::Combinatorics qw( );
my $R = 2;
my $iter = Math::Combinatorics->new( count => $R, data => \#words );
while (my #combo = $iter->next_combination) {
process_pair(#combo);
}

Related

perl: assign interval to number with hash keys, increment counter if number is in interval

I have several pairs of numbers and want to check whether the first one falls into a given interval. If so, I want to increment one of two counters for that interval, depending on whether the second number of the pair is higher or lower than the first.
The intervals are simple and look like 1-10,11-20,21-30 etc. The pairs of numbers look like (5,15),(24,13) etc. But I have several thousand of each. The exact format of input does not matter so much.
The desired outcome looks like this
1-10: higher=1, lower=0
11-20: higher=0,lower=0
21-30: higher=0,lower=1
My idea was to create a hash with the interval start as key and store the two counters in the value.
for(my $i = $start;$i<=$end;$i = ($i+$intervalsize)){
my $counter1 = 0;
my $counter2 = 0;
#{$hash{$i}} = ($counter1,$counter2);
but now I don't know how to compare the numbers with the keys and how to address the counters.
Thank you for any help!
Something like this maybe?
#!/usr/bin/env perl
use strict;
use warnings;
use List::MoreUtils qw( first_index );
sub interval_to_key { sprintf '(%d,%d)', #{ $_[0] } }
my #intervals = (
[ 1, 10 ],
[ 11, 20 ],
[ 21, 30 ],
);
my #upper_bounds = map $_->[0], #intervals;
my #pairs = (
[ 5, 15 ],
[ 24, 13 ],
);
my #keys = qw( ascending descending );
my %counts;
for my $interval ( #intervals ) {
#{ $counts{ interval_to_key($interval) } }{ #keys } = (0) x #keys;
}
for my $pair ( #pairs ) {
my $is_ascending = ($pair->[0] <= $pair->[1]);
my $i = first_index { $pair->[0] <= $_ } #upper_bounds;
++ $counts{ interval_to_key( $intervals[$i]) }{ $keys[1 - $is_ascending] };
}
use Data::Dumper;
print Dumper \%counts;
If the counters are to be kept for each interval we can record them in a data structure of choice. For example, store counters for each interval in a hash with the key being the interval index and low-high counters being in a two-element refarray as its value.
use warnings 'all';
use strict;
my #intervals = ( [1,10], [11,20], [21,30] );
my #pairs = ( [5,15], [24, 13] );
my %counter = map { $_ => [0,0] } (0..$#intervals); # initialize
foreach my $i (0..$#intervals)
{
foreach my $pair (#pairs)
{
my ($beg, $end) = #{$intervals[$i]};
my ($pl, $pr) = #$pair;
if ($pl >= $beg and $pl <= $end)
{
if ($pr > $pl) { ++$counter{$i}[1] }
else { ++$counter{$i}[0] }
}
}
}
print "$_ => #{$counter{$_}}\n" for sort keys %counter;
This is still rudimentary, in order to be easier to adjust. For one thing, since the hash uses the index of #intervals array it may well be an array (with arrayrefs) itself. Another possibility is to use a hash with keys like low and high for counters instead of a two-element array.
In order to use start of interval as a key
my %counter = map { $_->[0] => [0,0] } #intervals; # initialize
for my $inter (#intervals) {
for my $pair (#pairs) {
my ($beg, $end) = #$inter;
my ($pl, $pr) = #$pair;
if ($pl >= $beg and $pr <= $end)
{
if ($pr > $pl) { ++$counter{$beg}[1] }
else { ++$counter{$end}[0] }
}
}
}
printf("%3d => #{$counter{$_}}\n", $_) for sort keys %counter;

List array find double and add value

the original perl array is sorted and looks like this:
Original ARRARY:
ccc-->2
ccc-->5
abc-->3
abc-->7
cb-->6
and i like to have the following result:
FINAL ARRARY:
ccc-->7
abc-->10
cb-->6
Question:
can you please create a subroutine for that ?
this was the orig. subroutine that i used:
sub read_final_dev_file {
$dfcnt=0;
$DEVICE_ANZSUMZW=0;
$DEVICE_ANZSUM=0;
open(DATA,"$log_dir1/ALLDEVSORT.$log_file_ext1") || die ("Cannot Open Logfile: $log_dir1/$log_DEV_name.$log_file_ext1 !!!!");
#lines = <DATA>;
close(DATA);
chomp(#lines); # erase the last sign from a string
foreach $logline (#lines) {
if ($logline =~ /(.*)-->(.*)/) {
$DEVICE_CODE[$dfcnt] = $1;
$DEVICE_ANZAHL[$dfcnt] = $2;
print "DEVICE_final = $DEVICE_CODE[$dfcnt], D_ANZAHL_final = $DEVICE_ANZAHL[$dfcnt]\n";
if ($dfcnt > 0 ) {
if ( $DEVICE_CODE[$dfcnt] eq $DEVICE_CODE[$dfcnt-1] ) {
$DEVICE_ANZSUM = $DEVICE_ANZAHL[$dfcnt] + $DEVICE_ANZAHL[$dfcnt-1];
$DEVICE_ANZSUMZW = $DEVICE_ANZSUM++;
#$DEVICE_ANZSUM = $DEVICE_ANZAHL[$dfcnt]++;
#print "DEVICE_ANZAHL = $DEVICE_ANZAHL[$dfcnt],DEVICE_ANZAHL -1 = $DEVICE_ANZAHL[$dfcnt-1]\n";
print "DEVICE_eq = $DEVICE_CODE[$dfcnt], D_ANZAHL_eq = $DEVICE_ANZAHL[$dfcnt],DEVANZSUM = $DEVICE_ANZSUM,COUNT = $dfcnt\n";
}#end if
if ( $DEVICE_CODE[$dfcnt] ne $DEVICE_CODE[$dfcnt-1] ) {
#$DEVICE_ANZSUM=0;
#splice(#data3,$dfcnt+2,1) if ($DEVICE_ANZSUM > 1);
push (#data3,$DEVICE_ANZSUMZW) if ($DEVICE_ANZSUM > 1);
push (#data3,$DEVICE_ANZAHL[$dfcnt]) if ($DEVICE_ANZSUM == 0);
if ( $DEVICE_CODE[$dfcnt] ne $DEVICE_CODE[$dfcnt-1] ) {
$DEVICE_ANZSUM=0;
}
print "DEVICE_ne = $DEVICE_CODE[$dfcnt], D_ANZAHL_ne = $DEVICE_ANZAHL[$dfcnt], DEVANZSUM = $DEVICE_ANZSUM\n";
}#end if
}#end if $dfcnt
$dfcnt++;
}#end if logline
}#end for
print "#labels3\n";
print "#data3\n";
}#end sub read_final_dev_file
Probably not the best way, but this is what came to mind after seeing LeoNerd answer, since I don't have CPAN access in production and never have modules lying around:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my #input = (
[ ccc => 2 ],
[ ccc => 5 ],
[ abc => 3 ],
[ abc => 7 ],
[ cb => 6 ],
);
my %output;
$output{$_->[0]} += $_->[1] for #input;
print Dumper \%output;
my #output = map { [ $_ => $output{$_} ] } keys(%output);
print Dumper \#output;
Output:
$VAR1 = {
'abc' => 10,
'cb' => 6,
'ccc' => 7
};
$VAR1 = [
['abc', 10],
['cb', 6],
['ccc', 7],
];
You could use List::UtilsBy::partition_by to group the original list into partitions, by the first string:
use List::UtilsBy qw( partition_by );
my #input = (
[ ccc => 2 ],
[ ccc => 5 ],
[ abc => 3 ],
[ abc => 7 ],
[ cb => 6 ],
);
my %sets = partition_by { $_->[0] } #input;
Now you have a hash, keyed by the leading strings, whose values are all the ARRAY refs with that key first. You can now sum the values within them, by mapping over $_->[1] which contains the numbers:
use List::Util qw( sum );
my %totals;
foreach my $key ( keys %sets ) {
$totals{$key} = sum map { $_->[1] } #{ $sets{$key} };
}
If you're inclined towards code of a more compact and functional-looking nature, you could instead use the new pairmap here; making the whole thing expressible in one line:
use List::UtilsBy qw( partition_by );
use List::Util qw( pairmap sum );
my %totals = pairmap { $a => sum map { $_->[1] } #$b }
partition_by { $_->[0] } #input;
Edit: I should add that even though you stated in your original question that the array was sorted, this solution doesn't require it sorted. It will happily take the input in any order.
You can simplify your subroutine a lot by using a hash to track the counts instead of an array. The following uses an array #devices to track the order and a hash %device_counts to track the counts:
my #devices;
my %device_counts;
while (<DATA>) { # Read one line at a time from DATA
if (/(.*)-->(.*)/) { # This won't extract newlines so no need to chomp
if (!exists $device_counts{$1}) {
push #devices, $1; # Add to the array the first time we encounter a device
}
$device_counts{$1} += $2; # Add to the count for this device
}
}
for my $device (#devices) {
printf "%s-->%s\n", $device, $device_counts{$device};
}

Enumerating ordered tokens with all possible length in Perl

Given this string:
<VACC-PROP-0><VACC-PROP-0><NUM><EXP-V-0><MIR-0><PREP>
What I want to do is to enumerate all possible ordered length like this:
<VACC-PROP-0><VACC-PROP-0><NUM><EXP-V-0><MIR-0><PREP>
<VACC-PROP-0><VACC-PROP-0><NUM><EXP-V-0><MIR-0>
<VACC-PROP-0><VACC-PROP-0><NUM><EXP-V-0>
<VACC-PROP-0><VACC-PROP-0><NUM>
<VACC-PROP-0><VACC-PROP-0>
<VACC-PROP-0>
<VACC-PROP-0><NUM><EXP-V-0><MIR-0><PREP>
<NUM><EXP-V-0><MIR-0><PREP>
<EXP-V-0><MIR-0><PREP>
<MIR-0><PREP>
<PREP>
Not that the above is done by hand. It's possible that I may be missing something.
But the idea is to identify all possible ordered tokens of all length (number of tokens).
I tried this code but failed, what's the best way to do it?
use Data::Dumper;
my $str = "<VACC-PROP-0><VACC-PROP-0><NUM><EXP-V-0><MIR-0><PREP>";
# Remove all the brackets
my #tokens = grep {!/^$/} split(/[><]/,$str);
# Print the combinations
foreach my $i (0 .. $#tokens) {
print join(" ", #tokens[0..$i]),"\n";
}
Execute here: https://eval.in/51023
You want nested loops of arbitrarily depth.
for my $use_token0 (0..1) {
for my $use_token1 (0..1) {
for my $use_token2 (0..1) {
...
}
}
}
For that, you use Algorithm::Loops's NestedLoops.
use Algorithm::Loops qw( NestedLoops );
my $str = "<VACC-PROP-0><VACC-PROP-0><NUM><EXP-V-0><MIR-0><PREP>";
my #tokens = split /(?<=>)(?=<)/, $str;
my $iter = NestedLoops([ ( [0,1] ) x #tokens ]);
while ( my #bools = $iter->() ) {
say #tokens[ grep $bools[$_], 0..$#tokens ];
}
Although, in this case, you could simply use
my $str = "<VACC-PROP-0><VACC-PROP-0><NUM><EXP-V-0><MIR-0><PREP>";
my #tokens = split /(?<=>)(?=<)/, $str;
for my $i (0 .. (1<<#tokens)-1) {
say #tokens[ grep $i & (1 << ($#tokens-$_)), 0..$#tokens ];
}

Difference of Two Arrays Using Perl

I have two arrays. I need to check and see if the elements of one appear in the other one.
Is there a more efficient way to do it than nested loops? I have a few thousand elements in each and need to run the program frequently.
Another way to do it is to use Array::Utils
use Array::Utils qw(:all);
my #a = qw( a b c d );
my #b = qw( c d e f );
# symmetric difference
my #diff = array_diff(#a, #b);
# intersection
my #isect = intersect(#a, #b);
# unique union
my #unique = unique(#a, #b);
# check if arrays contain same members
if ( !array_diff(#a, #b) ) {
# do something
}
# get items from array #a that are not in array #b
my #minus = array_minus( #a, #b );
perlfaq4 to the rescue:
How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
Use a hash. Here's code to do both and more. It assumes that each element is unique in a given array:
#union = #intersection = #difference = ();
%count = ();
foreach $element (#array1, #array2) { $count{$element}++ }
foreach $element (keys %count) {
push #union, $element;
push #{ $count{$element} > 1 ? \#intersection : \#difference }, $element;
}
If you properly declare your variables, the code looks more like the following:
my %count;
for my $element (#array1, #array2) { $count{$element}++ }
my ( #union, #intersection, #difference );
for my $element (keys %count) {
push #union, $element;
push #{ $count{$element} > 1 ? \#intersection : \#difference }, $element;
}
You need to provide a lot more context. There are more efficient ways of doing that ranging from:
Go outside of Perl and use shell (sort + comm)
map one array into a Perl hash and then loop over the other one checking hash membership. This has linear complexity ("M+N" - basically loop over each array once) as opposed to nested loop which has "M*N" complexity)
Example:
my %second = map {$_=>1} #second;
my #only_in_first = grep { !$second{$_} } #first;
# use a foreach loop with `last` instead of "grep"
# if you only want yes/no answer instead of full list
Use a Perl module that does the last bullet point for you (List::Compare was mentioned in comments)
Do it based on timestamps of when elements were added if the volume is very large and you need to re-compare often. A few thousand elements is not really big enough, but I recently had to diff 100k sized lists.
You can try Arrays::Utils, and it makes it look nice and simple, but it's not doing any powerful magic on the back end. Here's the array_diffs code:
sub array_diff(\#\#) {
my %e = map { $_ => undef } #{$_[1]};
return #{[ ( grep { (exists $e{$_}) ? ( delete $e{$_} ) : ( 1 ) } #{ $_[0] } ), keys %e ] };
}
Since Arrays::Utils isn't a standard module, you need to ask yourself if it's worth the effort to install and maintain this module. Otherwise, it's pretty close to DVK's answer.
There are certain things you must watch out for, and you have to define what you want to do in that particular case. Let's say:
#array1 = qw(1 1 2 2 3 3 4 4 5 5);
#array2 = qw(1 2 3 4 5);
Are these arrays the same? Or, are they different? They have the same values, but there are duplicates in #array1 and not #array2.
What about this?
#array1 = qw( 1 1 2 3 4 5 );
#array2 = qw( 1 1 2 3 4 5 );
I would say that these arrays are the same, but Array::Utils::arrays_diff begs to differ. This is because Array::Utils assumes that there are no duplicate entries.
And, even the Perl FAQ pointed out by mob also says that It assumes that each element is unique in a given array. Is this an assumption you can make?
No matter what, hashes are the answer. It's easy and quick to look up a hash. The problem is what do you want to do with unique values.
Here's a solid solution that assumes duplicates don't matter:
sub array_diff {
my #array1 = #{ shift() };
my #array2 = #{ shift() };
my %array1_hash;
my %array2_hash;
# Create a hash entry for each element in #array1
for my $element ( #array1 ) {
$array1_hash{$element} = #array1;
}
# Same for #array2: This time, use map instead of a loop
map { $array_2{$_} = 1 } #array2;
for my $entry ( #array2 ) {
if ( not $array1_hash{$entry} ) {
return 1; #Entry in #array2 but not #array1: Differ
}
}
if ( keys %array_hash1 != keys %array_hash2 ) {
return 1; #Arrays differ
}
else {
return 0; #Arrays contain the same elements
}
}
If duplicates do matter, you'll need a way to count them. Here's using map not just to create a hash keyed by each element in the array, but also count the duplicates in the array:
my %array1_hash;
my %array2_hash;
map { $array1_hash{$_} += 1 } #array1;
map { $array2_hash{$_} += 2 } #array2;
Now, you can go through each hash and verify that not only do the keys exist, but that their entries match
for my $key ( keys %array1_hash ) {
if ( not exists $array2_hash{$key}
or $array1_hash{$key} != $array2_hash{$key} ) {
return 1; #Arrays differ
}
}
You will only exit the for loop if all of the entries in %array1_hash match their corresponding entries in %array2_hash. Now, you have to show that all of the entries in %array2_hash also match their entries in %array1_hash, and that %array2_hash doesn't have more entries. Fortunately, we can do what we did before:
if ( keys %array2_hash != keys %array1_hash ) {
return 1; #Arrays have a different number of keys: Don't match
}
else {
return; #Arrays have the same keys: They do match
}
You can use this for getting diffrence between two arrays
#!/usr/bin/perl -w
use strict;
my #list1 = (1, 2, 3, 4, 5);
my #list2 = (2, 3, 4);
my %diff;
#diff{ #list1 } = undef;
delete #diff{ #list2 };
You want to compare each element of #x against the element of the same index in #y, right? This will do it.
print "Index: $_ => \#x: $x[$_], \#y: $y[$_]\n"
for grep { $x[$_] != $y[$_] } 0 .. $#x;
...or...
foreach( 0 .. $#x ) {
print "Index: $_ => \#x: $x[$_], \#y: $y[$_]\n" if $x[$_] != $y[$_];
}
Which you choose kind of depends on whether you're more interested in keeping a list of indices to the dissimilar elements, or simply interested in processing the mismatches one by one. The grep version is handy for getting the list of mismatches. (original post)
n + n log n algorithm, if sure that elements are unique in each array (as hash keys)
my %count = ();
foreach my $element (#array1, #array2) {
$count{$element}++;
}
my #difference = grep { $count{$_} == 1 } keys %count;
my #intersect = grep { $count{$_} == 2 } keys %count;
my #union = keys %count;
So if I'm not sure of unity and want to check presence of the elements of array1 inside array2,
my %count = ();
foreach (#array1) {
$count{$_} = 1 ;
};
foreach (#array2) {
$count{$_} = 2 if $count{$_};
};
# N log N
if (grep { $_ == 1 } values %count) {
return 'Some element of array1 does not appears in array2'
} else {
return 'All elements of array1 are in array2'.
}
# N + N log N
my #a = (1,2,3);
my #b=(2,3,1);
print "Equal" if grep { $_ ~~ #b } #a == #b;
Not elegant, but easy to understand:
#!/usr/local/bin/perl
use strict;
my $file1 = shift or die("need file1");
my $file2 = shift or die("need file2");;
my #file1lines = split/\n/,`cat $file1`;
my #file2lines = split/\n/,`cat $file2`;
my %lines;
foreach my $file1line(#file1lines){
$lines{$file1line}+=1;
}
foreach my $file2line(#file2lines){
$lines{$file2line}+=2;
}
while(my($key,$value)=each%lines){
if($value == 1){
print "$key is in only $file1\n";
}elsif($value == 2){
print "$key is in only $file2\n";
}elsif($value == 3){
print "$key is in both $file1 and $file2\n";
}
}
exit;
__END__
Try to use List::Compare. IT has solutions for all the operations that can be performed on arrays.

In Perl, how can I iterate over the Cartesian product of multiple sets?

Given x number of arrays, each with a possibly different number of elements, how can I iterate through all combinations where I select one item from each array?
Example:
[ ] [ ] [ ]
foo cat 1
bar dog 2
baz 3
4
Returns
[foo] [cat] [ 1 ]
[foo] [cat] [ 2 ]
...
[baz] [dog] [ 4 ]
I'm doing this in Perl, btw.
My Set::CrossProduct module does exactly what you want. Note that you aren't really looking for permutations, which is the ordering of the elements in a set. You're looking for the cross product, which is the combinations of elements from different sets.
My module gives you an iterator, so you don't create it all in memory. You create a new tuple only when you need it.
use Set::Crossproduct;
my $iterator = Set::CrossProduct->new(
[
[qw( foo bar baz )],
[qw( cat dog )],
[qw( 1 2 3 4 )],
]
);
while( my $tuple = $iterator->get ) {
say join ' ', $tuple->#*;
}
A simple recursive solution for an arbitrary number of lists:
sub permute {
my ($first_list, #remain) = #_;
unless (defined($first_list)) {
return []; # only possibility is the null set
}
my #accum;
for my $elem (#$first_list) {
push #accum, (map { [$elem, #$_] } permute(#remain));
}
return #accum;
}
A not-so-simple non-recursive solution for an arbitrary number of lists:
sub make_generator {
my #lists = reverse #_;
my #state = map { 0 } #lists;
return sub {
my $i = 0;
return undef unless defined $state[0];
while ($i < #lists) {
$state[$i]++;
last if $state[$i] < scalar #{$lists[$i]};
$state[$i] = 0;
$i++;
}
if ($i >= #state) {
## Sabotage things so we don't produce any more values
$state[0] = undef;
return undef;
}
my #out;
for (0..$#state) {
push #out, $lists[$_][$state[$_]];
}
return [reverse #out];
};
}
my $gen = make_generator([qw/foo bar baz/], [qw/cat dog/], [1..4]);
while ($_ = $gen->()) {
print join(", ", #$_), "\n";
}
Recursive and more-fluent Perl examples (with commentary and documentation) for doing the Cartesian product can be found at http://www.perlmonks.org/?node_id=7366
Example:
sub cartesian {
my #C = map { [ $_ ] } #{ shift #_ };
foreach (#_) {
my #A = #$_;
#C = map { my $n = $_; map { [ $n, #$_ ] } #C } #A;
}
return #C;
}
You can use nested loops.
for my $e1 (qw( foo bar baz )) {
for my $e2 (qw( cat dog )) {
for my $e3 (qw( 1 2 3 4 )) {
my #choice = ($e1, $e2, $e3);
...
}}}
When you need an arbitrary number of nested loops, you can use Algorithm::Loops's NestedLoops.
use Algorithm::Loops qw( NestedLoops );
my #lists = (
[qw( foo bar baz )],
[qw( cat dog )],
[qw( 1 2 3 4 )],
);
my $iter = NestedLoops(\#lists);
while ( my #choice = $iter->() ) {
...
}
There's one method I thought of first that uses a couple for loops and no recursion.
find total number of permutations
loop from 0 to total_permutations-1
observe that, by taking the loop index modulus the number of elements in an array, you can get every permutations
Example:
Given A[3], B[2], C[3],
for (index = 0..totalpermutations) {
print A[index % 3];
print B[(index / 3) % 2];
print C[(index / 6) % 3];
}
where of course a for loop can be substituted to loop over [A B C ...], and a small part can be memoized. Of course, recursion is neater, but this might be useful for languages in which recursion is severely limited by stack size.