How do I remove duplicate items from an array in Perl?

How do I remove duplicate items from an array in Perl? - perl

I have an array in Perl:
my #my_array = ("one","two","three","two","three");
How do I remove the duplicates from the array?

You can do something like this as demonstrated in perlfaq4:
sub uniq {
my %seen;
grep !$seen{$_}++, #_;
}
my #array = qw(one two three two three);
my #filtered = uniq(#array);
print "#filtered\n";
Outputs:
one two three
If you want to use a module, try the uniq function from List::MoreUtils

The Perl documentation comes with a nice collection of FAQs. Your question is frequently asked:
% perldoc -q duplicate
The answer, copy and pasted from the output of the command above, appears below:
Found in /usr/local/lib/perl5/5.10.0/pods/perlfaq4.pod
How can I remove duplicate elements from a list or array?
(contributed by brian d foy)
Use a hash. When you think the words "unique" or "duplicated", think
"hash keys".
If you don't care about the order of the elements, you could just create the hash then extract the keys. It's not important how you create that hash: just that you use "keys" to get the unique elements.
my %hash = map { $_, 1 } #array;
# or a hash slice: #hash{ #array } = ();
# or a foreach: $hash{$_} = 1 foreach ( #array );
my #unique = keys %hash;
If you want to use a module, try the "uniq" function from
"List::MoreUtils". In list context it returns the unique elements, preserving their order in the list. In scalar context, it returns the number of unique elements.
use List::MoreUtils qw(uniq);
my #unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in %Seen. The "next" statement creates
the key and immediately uses its value, which is "undef", so the loop
continues to the "push" and increments the value for that key. The next
time the loop sees that same element, its key exists in the hash and
the value for that key is true (since it's not 0 or "undef"), so the
next skips that iteration and the loop goes to the next element.
my #unique = ();
my %seen = ();
foreach my $elem ( #array )
{
next if $seen{ $elem }++;
push #unique, $elem;
}
You can write this more briefly using a grep, which does the same thing.
my %seen = ();
my #unique = grep { ! $seen{ $_ }++ } #array;

Install List::MoreUtils from CPAN
Then in your code:
use strict;
use warnings;
use List::MoreUtils qw(uniq);
my #dup_list = qw(1 1 1 2 3 4 4);
my #uniq_list = uniq(#dup_list);

My usual way of doing this is:
my %unique = ();
foreach my $item (#myarray)
{
$unique{$item} ++;
}
my #myuniquearray = keys %unique;
If you use a hash and add the items to the hash. You also have the bonus of knowing how many times each item appears in the list.

Can be done with a simple Perl one-liner.
my #in=qw(1 3 4 6 2 4 3 2 6 3 2 3 4 4 3 2 5 5 32 3); #Sample data
my #out=keys %{{ map{$_=>1}#in}}; # Perform PFM
print join ' ', sort{$a<=>$b} #out;# Print data back out sorted and in order.
The PFM block does this:
Data in #in is fed into map. map builds an anonymous hash. keys are extracted from the hash and feed into #out

Method 1: Use a hash
Logic: A hash can have only unique keys, so iterate over array, assign any value to each element of array, keeping element as key of that hash. Return keys of the hash, its your unique array.
my #unique = keys {map {$_ => 1} #array};
Method 2: Extension of method 1 for reusability
Better to make a subroutine if we are supposed to use this functionality multiple times in our code.
sub get_unique {
my %seen;
grep !$seen{$_}++, #_;
}
my #unique = get_unique(#array);
Method 3: Use module List::MoreUtils
use List::MoreUtils qw(uniq);
my #unique = uniq(#array);

The variable #array is the list with duplicate elements
%seen=();
#unique = grep { ! $seen{$_} ++ } #array;

That last one was pretty good. I'd just tweak it a bit:
my #arr;
my #uniqarr;
foreach my $var ( #arr ){
if ( ! grep( /$var/, #uniqarr ) ){
push( #uniqarr, $var );
}
}
I think this is probably the most readable way to do it.

Previous answers pretty much summarize the possible ways of accomplishing this task.
However, I suggest a modification for those who don't care about counting the duplicates, but do care about order.
my #record = qw( yeah I mean uh right right uh yeah so well right I maybe );
my %record;
print grep !$record{$_} && ++$record{$_}, #record;
Note that the previously suggested grep !$seen{$_}++ ... increments $seen{$_} before negating, so the increment occurs regardless of whether it has already been %seen or not. The above, however, short-circuits when $record{$_} is true, leaving what's been heard once 'off the %record'.
You could also go for this ridiculousness, which takes advantage of autovivification and existence of hash keys:
...
grep !(exists $record{$_} || undef $record{$_}), #record;
That, however, might lead to some confusion.
And if you care about neither order or duplicate count, you could for another hack using hash slices and the trick I just mentioned:
...
undef #record{#record};
keys %record; # your record, now probably scrambled but at least deduped

Try this, seems the uniq function needs a sorted list to work properly.
use strict;
# Helper function to remove duplicates in a list.
sub uniq {
my %seen;
grep !$seen{$_}++, #_;
}
my #teststrings = ("one", "two", "three", "one");
my #filtered = uniq #teststrings;
print "uniq: #filtered\n";
my #sorted = sort #teststrings;
print "sort: #sorted\n";
my #sortedfiltered = uniq sort #teststrings;
print "uniq sort : #sortedfiltered\n";

Using concept of unique hash keys :
my #array = ("a","b","c","b","a","d","c","a","d");
my %hash = map { $_ => 1 } #array;
my #unique = keys %hash;
print "#unique","\n";
Output:
a c b d

Related

I need help regarding perl hash of array

I have a hash of array and two variables -1 list and 1-scalar value
How do I do this?
I want two things out of this. First is a list for every key.
Second is I need $b to have the value of last array element of every key
%abc=(
0=>[1,2,3],
1=>[1,5]
);
#a;
$b;
for key 0 i need #a to have [1,2] and for key 1 i need #a to have [1].
for 0 key i need $b to have value 3 and for key 1 i need $b to have value 5

I understand you want #a to hold all values but last, and $b to just hold the last value. How about:
use feature 'say';
my %abc = (0 => [1,2,3], 1 => [1,5]);
for my $key (keys %abc) {
my #a = #{$abc{$key}};
my $b = pop #a;
say "#a / $b"
}

I'd almost certainly use something like the other solution here. But here's a different solution that a) uses values() instead of keys() and b) uses reverse() to simplify(?!) the assignments. Don't do this :-)
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my %abc = (0 => [1,2,3], 1 => [1,5]);
for (values %abc) {
my ($b, #a) = reverse #$_;
#a = reverse #a;
say "#a / $b";
}

Perl how to compare hash value against a list and returns keys matching

Consider this case:
I have a list:
#Value2CompareList
And I have a hash table:
%Hash2Check
Now I want to achieve the following:
if($Hash2Check{$keys} eq $ElementFromArray) {return matching keys}
How do I do it in a faster way without Loop?
Thanks folks!

You can use grep inside a grep:
#! /usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my #Value2CompareList = qw( a b c d e f );
my %Hash2Check = (
A => 'a',
B => 'b',
C => 'c');
say for grep {
my $k = $_;
grep $Hash2Check{$k} eq $_, #Value2CompareList
} keys %Hash2Check;
It's complex because the data structure is not suitable for what you need. An inverted hash would be much better:
my %Inverted = reverse %Hash2Check;
say for grep defined, #Inverted{#Value2CompareList};
This only works if the values are unique. If not, you need to create a hash of arrays:
my %Inverted;
while (my ($k, $v) = each %Hash2Check) {
push #{ $Inverted{$v} }, $k;
}
say for map #$_, grep defined, #Inverted{#Value2CompareList};

You cannot do it without a loop. You are iterating a list, and that's a loop.
But you can pretend a little, and use grep:
my #matches = grep { exists $Hash2Check{$_} } #Value2CompareList;
But be under no illusions - grep is still iterating the list. Each iteration, it sets $_ to the current element, and checks if it exists in the hash.

How can I get this basic Perl sub program that sorts to work properly?

I am brand new to Perl. Can someone help me out and give me a tip or a solution on how to get this sorting sub program to work. I know it has something to do with how arrays are passed to sub programs. I searched online and did not find an answer that I was satisfied with... I also like the suggestions the helpful S.O. users give me too. I would like to have the program print the sorted array in the main sub program. Currently, it is printing the elements of the array #a in original order. I want the sub program to modify the array so when I print the array it is in sorted order. Any suggestions are appreciated. Of course, I want to see the simplest way to fix this.
sub sort {
my #array = #_;
my $i;
my $j;
my $iMin;
for ( $i = 0; $i < #_ - 1; $i++ ) {
$iMin = $i;
for ( $j = $i + 1; $j < #_; $j++ ) {
if ( $array[$j] < $array[$iMin] ) {
$iMin = $j;
}
}
if ( $iMin != $i ) {
my $temp = $array[$i];
$array[$i] = $array[$iMin];
$array[$iMin] = $temp;
}
}
}
Then call from a main sub program:
sub main {
my #a = (-23,3,234,-45,0,32,12,54,-10000,1);
&sort(#a);
my $i;
for ( $i = 0; $i < #a; $i++ ) {
print "$a[$i]\n";
}
}
main;

When your sub does the following assignment my #array = #_, it is creating a copy of the passed contents. Therefore any modifications to the values of #array will not effect #a outside your subroutine.
Following the clarification that this is just a personal learning exercise, there are two solutions.
1) You can return the sorted array and assign it to your original variable
sub mysort {
my #array = #_;
...
return #array;
}
#a = mysort(#a)
2) Or you can pass a reference to the array, and work on the reference:
sub mysort {
my $arrayref = shift;
...
}
mysort(\#a)
Also, it's probably a good idea to not use a sub named sort since that's that's a builtin function. Duplicating your code using perl's sort:
#a = sort {$a <=> $b} #a;
Also, the for loops inside your sub should be rewritten to utilize the last index of an #array, which is written as $#array, and the range operator .. which is useful for incrementors :
for ( my $j = $i + 1; $j <= $#array; $j++ ) {
# Or simpler:
for my $j ($i+1 .. $#array) {
And finally, because you're new, I should pass on that all your scripts should start with use strict; and use warnings;. For reasons why: Why use strict and warnings?

With very few, rare exceptions the simplest (and easiest) way to sort stuff in perl is simply to use the sort builtin.
sort takes an optional argument, either a block or a subname, which can be used to control how sort evaluates which of the two elements it is comparing at any given moment is greater.
See sort on perldoc for further information.
If you require a "natural" sort function, where you get the sequence 0, 1, 2, 3, ... instead of 0, 1, 10, 11, 12, 2, 21, 22, 3, ..., then use the perl module Sort::Naturally which is available on CPAN (and commonly available as a package on most distros).
In your case, if you need a pure numeric sort, the following will be quite sufficient:
use Sort::Naturally; #Assuming Sort::Naturally is installed
sub main {
my #a = (-23,3,234,-45,0,32,12,54,-10000,1);
#Choose one of the following
#a = sort #a; #Sort in "ASCII" ascending order
#a = sort { $b cmp $a } #a; #Sort in reverse of the above
#a = nsort #a; #Sort in "natural" order
#a = sort { ncmp($b, $a) } #a; #Reverse of the above
print "$_\n" foreach #a; #To see what you actually got
}
It is also worth mentioning the use sort 'stable'; pragma which can be used to ensure that sorting occurs using a stable algorithm, meaning that elements which are equal will not be rearranged relative to one another.
As a bonus, you should be aware that sort can be used to sort data structures as well as simple scalars:
#Assume #a is an array of hashes
#a = sort { $a->{name} cmp $b->{name} } #; #Sort #a by name key
#Sort #a by name in ascending order and date in descending order
#a = sort { $a->{name} cmp $b->{name} || $b->{date} cmp $a->{date} } #a;
#Assume #a is an array of arrays
#Sort #a by the 2nd element of the arrays it contains
#a = sort { $a->[1] cmp $b->[1] } #a;
#Assume #a is an array of VERY LONG strings
#Sort #a alphanumerically, but only care about
#the first 1,000 characters of each string
#a = sort { substr($a, 0, 1000) cmp substr($b, 0, 1000) } #a;
#Assume we want to "sort" an array without modifying it:
#Yes, the names here are confusing. See below.
my #idxs = sort { $a[$a] cmp $a[$b] } (0..$#a);
print "$a[$_]\n" foreach #idxs;
##idxs contains the indexes to #a, in the order they would have
#to be read from #a in order to get a sorted version of #a
As a final note, please remember that $a and $b are special variables in perl, which are pre-populated in the context of a sorting sub or sort block; the upshot is that if you're working with sort you can always expect $a and $b to contain the next two elements being compared, and should use them accordingly, but do NOT do my $a;, e.g., or use variables with either name in non-sort-related stuff. This also means that naming things %a or #a, or %b or #b, can be confusing -- see the final section of my example above.

How to read and print the common elements of multiple array in perl?

I have stored values in 5 text files. The values in each text file should be considered as an array. I am trying to write a perl program, to read and print the common elements in these 5 arrays.
For Instance
#a1=(1,7,4,5);
#a2=(1,9,4,5);
#a3=qw(1,6,4,5 );
#a4=qw(1 2 4 5 );
#a5=qw(1 2 4 5 );
I expect to print
1 4 5

The perlfaq has lots of answer to questions that are frequently asked. Of course it's all a bit of a waste of time and effort if no-one bothers to check there before asking the question again :-)
How do I compute the difference of two arrays? How do I compute the
intersection of two arrays?
Use a hash. Here's code to do both and more. It assumes that each
element is unique in a given array:
my (#union, #intersection, #difference);
my %count = ();
foreach my $element (#array1, #array2) { $count{$element}++ }
foreach my $element (keys %count) {
push #union, $element;
push #{ $count{$element} > 1 ? \#intersection : \#difference }, $element;
}
You need the intersection of two arrays. And then do it three more times.

You don't say what format your input files have, but this program will find all digit strings in each file and list the values common to all of them.
The list of input files is expected as command-line arguments.
use strict;
use warnings;
use File::Slurp 'read_file';
my %counts;
for (#ARGV) {
$counts{$_}++ for map /\d+/g, read_file $_;
}
my #common = grep $counts{$_} == #ARGV, keys %counts;
printf "(%s)\n", join ', ', #common;
output
(4, 1, 5)

sorting an array on the first number found in each element

I'm looking for help sorting an array where each element is made up of "a number, then a string, then a number". I would like to sort on the first number part of the array elements, descending (so that I list the higher numbers first), while also listing the text etc.
am still a beginner so alternatives to the below are also welcome
use strict;
use warnings;
my #arr = map {int( rand(49) + 1) } ( 1..100 ); # build an array of 100 random numbers between 1 and 49
my #count2;
foreach my $i (1..49) {
my #count = join(',', #arr) =~ m/$i,/g; # maybe try to make a string only once then search trough it... ???
my $count1 = scalar(#count); # I want this $count1 to be the number of times each of the numbers($i) was found within the string/array.
push(#count2, $count1 ." times for ". $i); # pushing a "number then text and a number / scalar, string, scalar" to an array.
}
#for (#count2) {print "$_\n";}
# try to add up all numbers in the first coloum to make sure they == 100
#sort #count2 and print the top 7
#count2 = sort {$b <=> $a} #count2; # try to stop printout of this, or sort on =~ m/^anumber/ ??? or just on the first one or two \d
foreach my $i (0..6) {
print $count2[$i] ."\n"; # seems to be sorted right anyway
}

First, store your data in an array, not in a string:
# inside the first loop, replace your line with the push() with this one:
push(#count2, [$count1, $i];
Then you can easily sort by the first element of each subarray:
my #sorted = sort { $b->[0] <=> $a->[0] } #count2;
And when you print it, construct the string:
printf "%d times for %d\n", $sorted[$i][0], $sorted[$i][1];
See also: http://perldoc.perl.org/perlreftut.html, perlfaq4

Taking your requirements as is. You're probably better off not embedding count information in a string. However, I'll take it as a learning exercise.
Note, I am trading memory for brevity and likely speed by using a hash to do the counting.
However, the sort could be optimized by using a Schwartzian Transform.
EDIT: Create results array using only numbers that were drawn
#!/usr/bin/perl
use strict; use warnings;
my #arr = map {int( rand(49) + 1) } ( 1..100 );
my %counts;
++$counts{$_} for #arr;
my #result = map sprintf('%d times for %d', $counts{$_}, $_),
sort {$counts{$a} <=> $counts{$b}} keys %counts;
print "$_\n" for #result;
However, I'd probably have done something like this:
#!/usr/bin/perl
use strict; use warnings;
use YAML;
my #arr;
$#arr = 99; #initialize #arr capacity to 100 elements
my %counts;
for my $i (0 .. 99) {
my $n = int(rand(49) + 1); # pick a number
$arr[ $i ] = $n; # store it
++$counts{ $n }; # update count
}
# sort keys according to counts, keys of %counts has only the numbers drawn
# for each number drawn, create an anonymous array ref where the first element
# is the number drawn, and the second element is the number of times it was drawn
# and put it in the #result array
my #result = map [$_, $counts{$_}],
sort {$counts{$a} <=> $counts{$b} }
keys %counts;
print Dump \#result;

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

How do I remove duplicate items from an array in Perl? - perl

I have an array in Perl: my #my_array = ("one","two","three","two","three"); How do I remove the duplicates from the array?

You can do something like this as demonstrated in perlfaq4: sub uniq { my %seen; grep !$seen{$_}++, #_; } my #array = qw(one two three two three); my #filtered = uniq(#array); print "#filtered\n"; Outputs: one two three If you want to use a module, try the uniq function from List::MoreUtils

Install List::MoreUtils from CPAN Then in your code: use strict; use warnings; use List::MoreUtils qw(uniq); my #dup_list = qw(1 1 1 2 3 4 4); my #uniq_list = uniq(#dup_list);

My usual way of doing this is: my %unique = (); foreach my $item (#myarray) { $unique{$item} ++; } my #myuniquearray = keys %unique; If you use a hash and add the items to the hash. You also have the bonus of knowing how many times each item appears in the list.

The variable #array is the list with duplicate elements %seen=(); #unique = grep { ! $seen{$_} ++ } #array;

That last one was pretty good. I'd just tweak it a bit: my #arr; my #uniqarr; foreach my $var ( #arr ){ if ( ! grep( /$var/, #uniqarr ) ){ push( #uniqarr, $var ); } } I think this is probably the most readable way to do it.

Using concept of unique hash keys : my #array = ("a","b","c","b","a","d","c","a","d"); my %hash = map { $_ => 1 } #array; my #unique = keys %hash; print "#unique","\n"; Output: a c b d

Related

I need help regarding perl hash of array

Perl how to compare hash value against a list and returns keys matching

How can I get this basic Perl sub program that sorts to work properly?

How to read and print the common elements of multiple array in perl?

sorting an array on the first number found in each element

Categories

Resources