print out unique lines in Perl

print out unique lines in Perl - perl

I've been challenged with a mathematics problem. Given ten numbers (in this case, numbers from 1 to 10), how many unique combinations of six numbers are there? The short answer is 210. However, I would like to know what these combinations are.
I have put the following code together. The first while loop works fine to create a lot of sorted combinations, however, I haven't been able to print out only the unique line combinations. How can I print out these unique lines?
my %hash;
my #sorted_numbers;
# make all permutations with numbers from 1 to 10
my $permutor = List::Permutor->new (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
while ( my #permutation = $permutor->next() ) {
my $string = "#permutation";
my ($N1, $N2, $N3, $N4, $N5, $N6, $N7, $N8, $N9, $N10) = split (/ /, $string);
# chose only the first six numbers and sort them in ascending order
my #numbers = ($N1, $N2, $N3, $N4, $N5, $N6);
#sorted_numbers = sort {$a <=> $b} #numbers;
}
# print out the unique number combinations from the temp file
my #unique_combinations = uniq #sorted_numbers;
foreach ( #unique_combinations ) {
print $_, "\n";
}

There are probably other CPAN modules for this, but here's one way:
use Math::Combinatorics qw(combine);
my #comb = combine(6, 1..10);

(this smells a lot like a homework problem, so I'm only going to give you a hint)
On each iteration you need to store #sorted_numbers some place, e.g.:
while (my #permutation = $permutor->next) {
...
#sorted_numbers = sort { $a <= > $b } #numbers;
push(#combinations, ...);
}
my #unique_combinations = uniq #combinations;
foreach (#unique_combinations) { ... }
So you have to figure out what to push onto the list #combinations so that the call to uniq will do what you want.
Some other pointers:
(1,2,3,4,5,6,7,8,9,10) may be written (1..10)
You can compute #numbers directly from #permutation with an array slice:
my #numbers = #permutation[0..5];

Related

Selecting highest count of element except when...

So i have been working on this perl script that will analyze and count the same letters in different line spaces. I have implemented the count to a hash but am having trouble excluding a " - " character from the output results of this hash. I tried using delete command or next if, but am not getting rid of the - count in the output.
So with this input:
#extract = ------------------------------------------------------------------MGG-------------------------------------------------------------------------------------
And following code:
#Count selected amino acids.
my %counter = ();
foreach my $extract(#extract) {
#next if $_ =~ /\-/; #This line code does not function correctly.
$counter{$_}++;
}
sub largest_value_mem (\%) {
my $counter = shift;
my ($key, #keys) = keys %$counter;
my ($big, #vals) = values %$counter;
for (0 .. $#keys) {
if ($vals[$_] > $big) {
$big = $vals[$_];
$key = $keys[$_];
}
}
$key
}
I expect the most common element to be G, same as the output. If there is a tie in the elements, say G = M, if there is a way to display both in that would be great but not necessary. Any tips on how to delete or remove the '-' is much appreciated. I am slowly learning perl language.
Please let me know if what I am asking is not clear or if more information is needed, thanks again kindly for all the comments.

Your data doesn't entirely make sense, since it's not actually working perl code. I'm guessing that it's a string divided into characters. After that it sounds like you just want to be able to find the highest frequency character, which is essentially just a sort by descending count.
Therefore the following demonstrates how to count your characters and then sort the results:
use strict;
use warnings;
my $str = '------------------------------------------------------------------MGG-------------------------------------------------------------------------------------';
my #chars = split '', $str;
#Count Characteres
my %count;
$count{$_}++ for #chars;
delete $count{'-'}; # Don't count -
# Sort keys by count descending
my #keys = sort {$count{$b} <=> $count{$a}} keys %count;
for my $key (#keys) {
print "$key $count{$key}\n";
}
Outputs:
G 2
M 1

foreach my $extract(#extract) {
#next if $_ =~ /\-/
$_ setting is suppressed by $extract here.
(In this case, $_ keeps value from above, e.g. routine argument list, previous match, etc.)
Also, you can use character class for better readability:
next if $extract=~/[-]/;

How can I get this basic Perl sub program that sorts to work properly?

I am brand new to Perl. Can someone help me out and give me a tip or a solution on how to get this sorting sub program to work. I know it has something to do with how arrays are passed to sub programs. I searched online and did not find an answer that I was satisfied with... I also like the suggestions the helpful S.O. users give me too. I would like to have the program print the sorted array in the main sub program. Currently, it is printing the elements of the array #a in original order. I want the sub program to modify the array so when I print the array it is in sorted order. Any suggestions are appreciated. Of course, I want to see the simplest way to fix this.
sub sort {
my #array = #_;
my $i;
my $j;
my $iMin;
for ( $i = 0; $i < #_ - 1; $i++ ) {
$iMin = $i;
for ( $j = $i + 1; $j < #_; $j++ ) {
if ( $array[$j] < $array[$iMin] ) {
$iMin = $j;
}
}
if ( $iMin != $i ) {
my $temp = $array[$i];
$array[$i] = $array[$iMin];
$array[$iMin] = $temp;
}
}
}
Then call from a main sub program:
sub main {
my #a = (-23,3,234,-45,0,32,12,54,-10000,1);
&sort(#a);
my $i;
for ( $i = 0; $i < #a; $i++ ) {
print "$a[$i]\n";
}
}
main;

When your sub does the following assignment my #array = #_, it is creating a copy of the passed contents. Therefore any modifications to the values of #array will not effect #a outside your subroutine.
Following the clarification that this is just a personal learning exercise, there are two solutions.
1) You can return the sorted array and assign it to your original variable
sub mysort {
my #array = #_;
...
return #array;
}
#a = mysort(#a)
2) Or you can pass a reference to the array, and work on the reference:
sub mysort {
my $arrayref = shift;
...
}
mysort(\#a)
Also, it's probably a good idea to not use a sub named sort since that's that's a builtin function. Duplicating your code using perl's sort:
#a = sort {$a <=> $b} #a;
Also, the for loops inside your sub should be rewritten to utilize the last index of an #array, which is written as $#array, and the range operator .. which is useful for incrementors :
for ( my $j = $i + 1; $j <= $#array; $j++ ) {
# Or simpler:
for my $j ($i+1 .. $#array) {
And finally, because you're new, I should pass on that all your scripts should start with use strict; and use warnings;. For reasons why: Why use strict and warnings?

With very few, rare exceptions the simplest (and easiest) way to sort stuff in perl is simply to use the sort builtin.
sort takes an optional argument, either a block or a subname, which can be used to control how sort evaluates which of the two elements it is comparing at any given moment is greater.
See sort on perldoc for further information.
If you require a "natural" sort function, where you get the sequence 0, 1, 2, 3, ... instead of 0, 1, 10, 11, 12, 2, 21, 22, 3, ..., then use the perl module Sort::Naturally which is available on CPAN (and commonly available as a package on most distros).
In your case, if you need a pure numeric sort, the following will be quite sufficient:
use Sort::Naturally; #Assuming Sort::Naturally is installed
sub main {
my #a = (-23,3,234,-45,0,32,12,54,-10000,1);
#Choose one of the following
#a = sort #a; #Sort in "ASCII" ascending order
#a = sort { $b cmp $a } #a; #Sort in reverse of the above
#a = nsort #a; #Sort in "natural" order
#a = sort { ncmp($b, $a) } #a; #Reverse of the above
print "$_\n" foreach #a; #To see what you actually got
}
It is also worth mentioning the use sort 'stable'; pragma which can be used to ensure that sorting occurs using a stable algorithm, meaning that elements which are equal will not be rearranged relative to one another.
As a bonus, you should be aware that sort can be used to sort data structures as well as simple scalars:
#Assume #a is an array of hashes
#a = sort { $a->{name} cmp $b->{name} } #; #Sort #a by name key
#Sort #a by name in ascending order and date in descending order
#a = sort { $a->{name} cmp $b->{name} || $b->{date} cmp $a->{date} } #a;
#Assume #a is an array of arrays
#Sort #a by the 2nd element of the arrays it contains
#a = sort { $a->[1] cmp $b->[1] } #a;
#Assume #a is an array of VERY LONG strings
#Sort #a alphanumerically, but only care about
#the first 1,000 characters of each string
#a = sort { substr($a, 0, 1000) cmp substr($b, 0, 1000) } #a;
#Assume we want to "sort" an array without modifying it:
#Yes, the names here are confusing. See below.
my #idxs = sort { $a[$a] cmp $a[$b] } (0..$#a);
print "$a[$_]\n" foreach #idxs;
##idxs contains the indexes to #a, in the order they would have
#to be read from #a in order to get a sorted version of #a
As a final note, please remember that $a and $b are special variables in perl, which are pre-populated in the context of a sorting sub or sort block; the upshot is that if you're working with sort you can always expect $a and $b to contain the next two elements being compared, and should use them accordingly, but do NOT do my $a;, e.g., or use variables with either name in non-sort-related stuff. This also means that naming things %a or #a, or %b or #b, can be confusing -- see the final section of my example above.

sorting an array on the first number found in each element

I'm looking for help sorting an array where each element is made up of "a number, then a string, then a number". I would like to sort on the first number part of the array elements, descending (so that I list the higher numbers first), while also listing the text etc.
am still a beginner so alternatives to the below are also welcome
use strict;
use warnings;
my #arr = map {int( rand(49) + 1) } ( 1..100 ); # build an array of 100 random numbers between 1 and 49
my #count2;
foreach my $i (1..49) {
my #count = join(',', #arr) =~ m/$i,/g; # maybe try to make a string only once then search trough it... ???
my $count1 = scalar(#count); # I want this $count1 to be the number of times each of the numbers($i) was found within the string/array.
push(#count2, $count1 ." times for ". $i); # pushing a "number then text and a number / scalar, string, scalar" to an array.
}
#for (#count2) {print "$_\n";}
# try to add up all numbers in the first coloum to make sure they == 100
#sort #count2 and print the top 7
#count2 = sort {$b <=> $a} #count2; # try to stop printout of this, or sort on =~ m/^anumber/ ??? or just on the first one or two \d
foreach my $i (0..6) {
print $count2[$i] ."\n"; # seems to be sorted right anyway
}

First, store your data in an array, not in a string:
# inside the first loop, replace your line with the push() with this one:
push(#count2, [$count1, $i];
Then you can easily sort by the first element of each subarray:
my #sorted = sort { $b->[0] <=> $a->[0] } #count2;
And when you print it, construct the string:
printf "%d times for %d\n", $sorted[$i][0], $sorted[$i][1];
See also: http://perldoc.perl.org/perlreftut.html, perlfaq4

Taking your requirements as is. You're probably better off not embedding count information in a string. However, I'll take it as a learning exercise.
Note, I am trading memory for brevity and likely speed by using a hash to do the counting.
However, the sort could be optimized by using a Schwartzian Transform.
EDIT: Create results array using only numbers that were drawn
#!/usr/bin/perl
use strict; use warnings;
my #arr = map {int( rand(49) + 1) } ( 1..100 );
my %counts;
++$counts{$_} for #arr;
my #result = map sprintf('%d times for %d', $counts{$_}, $_),
sort {$counts{$a} <=> $counts{$b}} keys %counts;
print "$_\n" for #result;
However, I'd probably have done something like this:
#!/usr/bin/perl
use strict; use warnings;
use YAML;
my #arr;
$#arr = 99; #initialize #arr capacity to 100 elements
my %counts;
for my $i (0 .. 99) {
my $n = int(rand(49) + 1); # pick a number
$arr[ $i ] = $n; # store it
++$counts{ $n }; # update count
}
# sort keys according to counts, keys of %counts has only the numbers drawn
# for each number drawn, create an anonymous array ref where the first element
# is the number drawn, and the second element is the number of times it was drawn
# and put it in the #result array
my #result = map [$_, $counts{$_}],
sort {$counts{$a} <=> $counts{$b} }
keys %counts;
print Dump \#result;

How to remove array elements equal to some element in a second array in perl

Just wonder if I am given two arrays, A and B, how to remove/delete those elements in A that can also be found in B? What is the most efficient way of doing this?
And also, as a special case, if B is the resulting array after grep on A, how to do this? Of course, in this case, we can do a grep on the negated condition. But is there something like taking a complement of an array with respect to another in perl?
Thank you.

Any time you are thinking of found in you are probably looking for a hash. In this case, you would create a hash of your B values. Then you would grep A, checking the hash for each element.
my #A = 1..9;
my #B = (2, 4, 6, 8);
my %B = map {$_ => 1} #B;
say join ' ' => grep {not $B{$_}} #A; # 1 3 5 7 9
As you can see, perl is not normally maintaining any sort of found in table by itself,
so you have to provide one. The above code could easily be wrapped into a function, but for efficiency, it is best done inline.

Have a look at the none, all, part, notall methods available via List::MoreUtils. You can perform pretty much any set operation using the methods available in this module.
There's a good tutorial available at Perl Training Australia

If you ask for most efficient way:
my #A = 1..9;
my #B = (2, 4, 6, 8);
my %x;
#x{#B} = ();
my #AminusB = grep !exists $x{$_}, #A;
But you will notice difference between mine and Eric Strom's solution only for bigger inputs.
You can find handy this functional approach:
sub complementer {
my %x;
#x{#_} = ();
return sub { grep !exists $x{$_}, #_ };
}
my $c = complementer(2, 4, 6, 8);
print join(',', $c->(#$_)), "\n" for [1..9], [2..10], ...;
# you can use it directly of course
print join(' ', complementer(qw(a c e g))->('a'..'h')), "\n";

You're probably better off with the hash, but you could also use smart matching. Stealing Eric Strom's example,
my #A = 1..9;
my #B = (2, 4, 6, 8);
say join ' ' => grep {not $_ ~~ #B } #A; # 1 3 5 7 9

Again, you're probably better off with the hash, but you could also use Perl6::Junction. Again stealing Eric Strom's example,
use Perl6::Junction qw(none);
my #A = 1..9;
my #B = (2, 4, 6, 8);
say join ' ' => grep {none(#B) == $_} #A; # 1 3 5 7 9

As already mentioned by Eric Strom, whenever you need to search for something specific, it's always easier if you have a hash.
Eric has a nicer solution, but can be difficult to understand. I hope mine is easier to understand.
# Create a B Hash
my %BHash;
foreach my $element (#B) {
$BHash{$element} = 1;
}
# Go through #A element by element and delete duplicates
my $index = 0;
foreach my $element (#A) {
if (exists $BHash{$element}) {
splice #A, $index, 1; #Deletes $A[$index]
$index = $index + 1;
}
}
In the first loop, we simply create a hash that is keyed by the elements in #B.
In the second loop, we go through each element in #A, while keeping track of the index in #A.

How do I remove duplicate items from an array in Perl?

I have an array in Perl:
my #my_array = ("one","two","three","two","three");
How do I remove the duplicates from the array?

You can do something like this as demonstrated in perlfaq4:
sub uniq {
my %seen;
grep !$seen{$_}++, #_;
}
my #array = qw(one two three two three);
my #filtered = uniq(#array);
print "#filtered\n";
Outputs:
one two three
If you want to use a module, try the uniq function from List::MoreUtils

The Perl documentation comes with a nice collection of FAQs. Your question is frequently asked:
% perldoc -q duplicate
The answer, copy and pasted from the output of the command above, appears below:
Found in /usr/local/lib/perl5/5.10.0/pods/perlfaq4.pod
How can I remove duplicate elements from a list or array?
(contributed by brian d foy)
Use a hash. When you think the words "unique" or "duplicated", think
"hash keys".
If you don't care about the order of the elements, you could just create the hash then extract the keys. It's not important how you create that hash: just that you use "keys" to get the unique elements.
my %hash = map { $_, 1 } #array;
# or a hash slice: #hash{ #array } = ();
# or a foreach: $hash{$_} = 1 foreach ( #array );
my #unique = keys %hash;
If you want to use a module, try the "uniq" function from
"List::MoreUtils". In list context it returns the unique elements, preserving their order in the list. In scalar context, it returns the number of unique elements.
use List::MoreUtils qw(uniq);
my #unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in %Seen. The "next" statement creates
the key and immediately uses its value, which is "undef", so the loop
continues to the "push" and increments the value for that key. The next
time the loop sees that same element, its key exists in the hash and
the value for that key is true (since it's not 0 or "undef"), so the
next skips that iteration and the loop goes to the next element.
my #unique = ();
my %seen = ();
foreach my $elem ( #array )
{
next if $seen{ $elem }++;
push #unique, $elem;
}
You can write this more briefly using a grep, which does the same thing.
my %seen = ();
my #unique = grep { ! $seen{ $_ }++ } #array;

Install List::MoreUtils from CPAN
Then in your code:
use strict;
use warnings;
use List::MoreUtils qw(uniq);
my #dup_list = qw(1 1 1 2 3 4 4);
my #uniq_list = uniq(#dup_list);

My usual way of doing this is:
my %unique = ();
foreach my $item (#myarray)
{
$unique{$item} ++;
}
my #myuniquearray = keys %unique;
If you use a hash and add the items to the hash. You also have the bonus of knowing how many times each item appears in the list.

Can be done with a simple Perl one-liner.
my #in=qw(1 3 4 6 2 4 3 2 6 3 2 3 4 4 3 2 5 5 32 3); #Sample data
my #out=keys %{{ map{$_=>1}#in}}; # Perform PFM
print join ' ', sort{$a<=>$b} #out;# Print data back out sorted and in order.
The PFM block does this:
Data in #in is fed into map. map builds an anonymous hash. keys are extracted from the hash and feed into #out

Method 1: Use a hash
Logic: A hash can have only unique keys, so iterate over array, assign any value to each element of array, keeping element as key of that hash. Return keys of the hash, its your unique array.
my #unique = keys {map {$_ => 1} #array};
Method 2: Extension of method 1 for reusability
Better to make a subroutine if we are supposed to use this functionality multiple times in our code.
sub get_unique {
my %seen;
grep !$seen{$_}++, #_;
}
my #unique = get_unique(#array);
Method 3: Use module List::MoreUtils
use List::MoreUtils qw(uniq);
my #unique = uniq(#array);

The variable #array is the list with duplicate elements
%seen=();
#unique = grep { ! $seen{$_} ++ } #array;

That last one was pretty good. I'd just tweak it a bit:
my #arr;
my #uniqarr;
foreach my $var ( #arr ){
if ( ! grep( /$var/, #uniqarr ) ){
push( #uniqarr, $var );
}
}
I think this is probably the most readable way to do it.

Previous answers pretty much summarize the possible ways of accomplishing this task.
However, I suggest a modification for those who don't care about counting the duplicates, but do care about order.
my #record = qw( yeah I mean uh right right uh yeah so well right I maybe );
my %record;
print grep !$record{$_} && ++$record{$_}, #record;
Note that the previously suggested grep !$seen{$_}++ ... increments $seen{$_} before negating, so the increment occurs regardless of whether it has already been %seen or not. The above, however, short-circuits when $record{$_} is true, leaving what's been heard once 'off the %record'.
You could also go for this ridiculousness, which takes advantage of autovivification and existence of hash keys:
...
grep !(exists $record{$_} || undef $record{$_}), #record;
That, however, might lead to some confusion.
And if you care about neither order or duplicate count, you could for another hack using hash slices and the trick I just mentioned:
...
undef #record{#record};
keys %record; # your record, now probably scrambled but at least deduped

Try this, seems the uniq function needs a sorted list to work properly.
use strict;
# Helper function to remove duplicates in a list.
sub uniq {
my %seen;
grep !$seen{$_}++, #_;
}
my #teststrings = ("one", "two", "three", "one");
my #filtered = uniq #teststrings;
print "uniq: #filtered\n";
my #sorted = sort #teststrings;
print "sort: #sorted\n";
my #sortedfiltered = uniq sort #teststrings;
print "uniq sort : #sortedfiltered\n";

Using concept of unique hash keys :
my #array = ("a","b","c","b","a","d","c","a","d");
my %hash = map { $_ => 1 } #array;
my #unique = keys %hash;
print "#unique","\n";
Output:
a c b d