How do I print unique elements in Perl array? - perl

I'm pushing elements into an array during a while statement. Each element is a teacher's name. There ends up being duplicate teacher names in the array when the loop finishes. Sometimes they are not right next to each other in the array, sometimes they are.
How can I print only the unique values in that array after its finished getting values pushed into it? Without having to parse the entire array each time I want to print an element.
Heres the code after everything has been pushed into the array:
$faculty_len = #faculty;
$i=0;
while ($i != $faculty_len)
{
printf $fh '"'.$faculty[$i].'"';
$i++;
}

use List::MoreUtils qw/ uniq /;
my #unique = uniq #faculty;
foreach ( #unique ) {
print $_, "\n";
}

Your best bet would be to use a (basically) built-in tool, like uniq (as described by innaM).
If you don't have the ability to use uniq and want to preserve order, you can use grep to simulate that.
my %seen;
my #unique = grep { ! $seen{$_}++ } #faculty;
# printing, etc.
This first gives you a hash where each key is each entry. Then, you iterate over each element, counting how many of them there are, and adding the first one. (Updated with comments by brian d foy)

I suggest pushing it into a hash.
like this:
my %faculty_hash = ();
foreach my $facs (#faculty) {
$faculty_hash{$facs} = 1;
}
my #faculty_unique = keys(%faculty_hash);

#array1 = ("abc", "def", "abc", "def", "abc", "def", "abc", "def", "xyz");
#array1 = grep { ! $seen{ $_ }++ } #array1;
print "#array1\n";

This question is answered with multiple solutions in perldoc. Just type at command line:
perldoc -q duplicate

Please note: Some of the answers containing a hash will change the ordering of the array. Hashes dont have any kind of order, so getting the keys or values will make a list with an undefined ordering.
This doen't apply to grep { ! $seen{$_}++ } #faculty

This is a one liner command to print unique lines in order it appears.
perl -ne '$seen{$_}++ || print $_' fileWithDuplicateValues

I just found hackneyed 3 liner, enjoy
my %uniq;
undef #uniq(#non_uniq_array);
my #uniq_array = keys %uniq;

Just another way to do it, useful only if you don't care about order:
my %hash;
#hash{#faculty}=1;
my #unique=keys %hash;
If you want to avoid declaring a new variable, you can use the somehow underdocumented global variable %_
#_{#faculty}=1;
my #unique=keys %_;

If you need to process the faculty list in any way, a map over the array converted to a hash for key coalescing and then sorting keys is another good way:
my #deduped = sort keys %{{ map { /.*/? ($_,1):() } #faculty }};
print join("\n", #deduped)."\n";
You process the list by changing the /.*/ regex for selecting or parsing and capturing accordingly, and you can output one or more mutated, non-unique keys per pass by making ($_,1):() arbitrarily complex.
If you need to modify the data in-flight with a substitution regex, say to remove dots from the names (s/\.//g), then a substitution according to the above pattern will mutate the original #faculty array due to $_ aliasing. You can get around $_ aliasing by making an anonymous copy of the #faculty array (see the so-called "baby cart" operator):
my #deduped = sort keys %{{ map {/.*/? do{s/\.//g; ($_,1)}:()} #{[ #faculty ]} }};
print join("\n", #deduped)."\n";
print "Unmolested array:\n".join("\n", #faculty)."\n";
In more recent versions of Perl, you can pass keys a hashref, and you can use the non-destructive substitution:
my #deduped = sort keys { map { /.*/? (s/\.//gr,1):() } #faculty };
Otherwise, the grep or $seen[$_]++ solutions elsewhere may be preferable.

Related

Perl searching for string contained in array

I have an array with the following values:
push #fruitArray, "apple|0";
push #fruitArray, "apple|1";
push #fruitArray, "pear|0";
push #fruitArray, "pear|0";
I want to find out if the string "apple" exists in this array (ignoring the "|0" "|1")
I am using:
$fruit = 'apple';
if( $fruit ~~ #fruitArray ){ print "I found apple"; }
Which isn't working.
Don't use smart matching. It never worked properly for a number of reasons and it is now marked as experimental
In this case you can use grep instead, together with an appropriate regex pattern
This program tests every element of #fruitArray to see if it starts with the letters in $fruit followed by a pipe character |. grep returns the number of elements that matched the pattern, which is a true value if at least one matched
my #fruitArray = qw/ apple|0 apple|1 pear|0 pear|0 /;
my $fruit = 'apple';
print "I found $fruit\n" if grep /^$fruit\|/, #fruitArray;
output
I found apple
I - like #Borodin suggests, too - would simply use grep():
$fruit = 'apple';
if (grep(/^\Q$fruit\E\|/, #fruitArray)) { print "I found apple"; }
which outputs:
I found apple
\Q...\E converts your string into a regex pattern.
Looking for the | prevents finding a fruit whose name starts with the name of the fruit for which you are looking.
Simple and effective... :-)
Update: to remove elements from array:
$fruit = 'apple';
#fruitsArrayWithoutApples = grep ! /^\Q$fruit\E|/, #fruitArray;
If your Perl is not ancient, you can use the first subroutine from the List::Util module (which became a core module at Perl 5.8) to do the check efficiently:
use List::Util qw{ first };
my $first_fruit = first { /\Q$fruit\E/ } #fruitArray;
if ( defined $first_fruit ) { print "I found $fruit\n"; }
Don't use grep, that will loop the entire array, even if it finds what you are looking for in the first index, so it is inefficient.
this will return true if it finds the substring 'apple', then return and not finish iterating through the rest of the array
#takes a reference to the array as the first parameter
sub find_apple{
#array_input = #{$_[0]};
foreach $fruit (#array_input){
if (index($fruit, 'apple') != -1){
return 1;
}
}
}
You can get close to the smartmatch sun without melting your wings by using match::simple:
use match::simple;
my #fruits = qw/apple|0 apple|1 pear|0 pear|0/;
$fruit = qr/apple/ ;
say "found $fruit" if $fruit |M| \#fruits ;
There's also a match() function if the infix [M] doesn't read well.
I like the way match::simple does almost everything I expected from ~~ without any surprising complexity. If you're fluent in perl it probably isn't something you'd see as necessary, but - especially with match() - code can be made pleasantly readable ... at the cost of imposing the use of references, etc.

Case Insensitive Unique Array Elements in Perl

I am using the uniq function exported by the module, List::MoreUtils to find the uniq elements in an array. However, I want it to find the uniq elements in a case insensitive way. How can I do that?
I have dumped the output of the Array using Data::Dumper:
#! /usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);
use List::MoreUtils qw(uniq);
use feature "say";
my #elements=<array is formed here>;
my #words=uniq #elements;
say Dumper \#words;
Output:
$VAR1 = [
'John',
'john',
'JohN',
'JOHN',
'JoHn',
'john john'
];
Expected output should be: john, john john
Only 2 elements, rest all should be filtered since they are the same word, only the difference is in case.
How can I remove the duplicate elements ignoring the case?
Use lowercase, lc with a map statement:
my #uniq_no_case = uniq map lc, #elements;
The reason List::MoreUtils' uniq is case sensitive is that it relies on the deduping characteristics of hashes, which also is case sensitive. The code for uniq looks like so:
sub uniq {
my %seen = ();
grep { not $seen{$_}++ } #_;
}
If you want to use this sub directly in your own code, you could incorporate lc in there:
sub uniq_no_case {
my %seen = ();
grep { not $seen{$_}++ } map lc, #_;
}
Explanation of how this works:
#_ contains the args to the subroutine, and they are fed to a grep statement. Any elements that return true when passed through the code block are returned by the grep statement. The code block consist of a few finer points:
$seen{$_}++ returns 0 the first time an element is seen. The value is still incremented to 1, but after it is returned (as opposed to ++$seen{$_} who would inc first, then return).
By negating the result of the incrementation, we get true for the first key, and false for every following such key. Hence, the list is deduped.
grep as the last statement in the sub will return a list, which in turn is returned by the sub.
map lc, #_ simply applies the lc function to all elements in #_.
Use a hash to keep track of the words you have already seen, but also normalize them for upper/lower case:
my %seen;
my #unique;
for my $w (#words) {
next if $seen{lc($w)}++;
push(#unique, $w);
}
# #unique has the unique words
Note that this will preserve the case of the original words.
UPDATE: As noted in the comments, it's not clear exactly what the OP needs, but I wrote the solution this way to illustrate a general technique for selecting unique representatives from a list under some "equivalence relation." In this case the equivalence relationship is word $a is equivalent to word $b if and only if lc($a) eq lc($b).
Most equivalence relationships can be expressed in this way, that is, the relationship is defined by a classifier function f() such that $a is equivalent to $b if and only if f($a) eq f($b). For instance, if we want to say that two words are equivalent if they have the same length, then f() would be length().
So now you might see why I wrote the algorithm this way - the classifier function may not produce values that are part of the original list. In the case of f = length, we want to select words, but f of a word is a number.

Grep to find item in Perl array

Every time I input something the code always tells me that it exists. But I know some of the inputs do not exist. What is wrong?
#!/usr/bin/perl
#array = <>;
print "Enter the word you what to match\n";
chomp($match = <STDIN>);
if (grep($match, #array)) {
print "found it\n";
}
The first arg that you give to grep needs to evaluate as true or false to indicate whether there was a match. So it should be:
# note that grep returns a list, so $matched needs to be in brackets to get the
# actual value, otherwise $matched will just contain the number of matches
if (my ($matched) = grep $_ eq $match, #array) {
print "found it: $matched\n";
}
If you need to match on a lot of different values, it might also be worth for you to consider putting the array data into a hash, since hashes allow you to do this efficiently without having to iterate through the list.
# convert array to a hash with the array elements as the hash keys and the values are simply 1
my %hash = map {$_ => 1} #array;
# check if the hash contains $match
if (defined $hash{$match}) {
print "found it\n";
}
You seem to be using grep() like the Unix grep utility, which is wrong.
Perl's grep() in scalar context evaluates the expression for each element of a list and returns the number of times the expression was true.
So when $match contains any "true" value, grep($match, #array) in scalar context will always return the number of elements in #array.
Instead, try using the pattern matching operator:
if (grep /$match/, #array) {
print "found it\n";
}
This could be done using List::Util's first function:
use List::Util qw/first/;
my #array = qw/foo bar baz/;
print first { $_ eq 'bar' } #array;
Other functions from List::Util like max, min, sum also may be useful for you
In addition to what eugene and stevenl posted, you might encounter problems with using both <> and <STDIN> in one script: <> iterates through (=concatenating) all files given as command line arguments.
However, should a user ever forget to specify a file on the command line, it will read from STDIN, and your code will wait forever on input
I could happen that if your array contains the string "hello", and if you are searching for "he", grep returns true, although, "he" may not be an array element.
Perhaps,
if (grep(/^$match$/, #array)) more apt.
You can also check single value in multiple arrays like,
if (grep /$match/, #array, #array_one, #array_two, #array_Three)
{
print "found it\n";
}

Remove duplicates from list of files in perl

I know this should be pretty simple and the shell version is something like:
$ sort example.txt | uniq -u
in order to remove duplicate lines from a file. How would I go about doing this in Perl?
The interesting spin on this question is the uniq -u! I don't think the other answers I've seen tackle this; they deal with sort -u example.txt or (somewhat wastefully) sort example.txt | uniq.
The difference is that the -u option eliminates all occurrences of duplicated lines, so the output is of lines that appear only once.
To tackle this, you need to know how many times each name appears, and then you need to print the names that appear just once. Assuming the list is to be read from standard input, then this code does the trick:
my %counts;
while (<>)
{
chomp;
$counts{$_}++;
}
foreach my $name (sort keys %counts)
{
print "$name\n" if $counts{$name} == 1;
}
Or, using using grep:
my %counts;
while (<>)
{
chomp;
$counts{$_}++;
}
{
local $, = "\n";
print grep { $counts{$_} == 1 } sort keys %counts;
}
Or, if you don't need to remove the newlines (because you're only going to print the names):
my %counts;
$counts{$_}++ for (<>);
print grep { $counts{$_} == 1 } sort keys %counts;
If you do in fact want every name that appears in the input to appear in the output (but only once), then any of the other solutions will do the trick (or, with minimal adaptation, will do the trick). In fact, since the input lines will end with a newline, you can generate the answer in just two lines:
my %counts = map { $_, 1 } <>;
print sort keys %counts;
No, you can't do it in one by simply replacing %counts in the print line with the map in the first line:
print sort keys map { $_, 1 } <>;
You get the error:
Type of arg 1 to keys must be hash or array (not map iterator) at ...
or use 'uniq' sub from List::MoreUtils module after reading all the file to a list (although its not a good solution)
Are you wanting to update a list of files to remove duplicate lines?
Or process a list of files, ignoring duplicate lines?
Or remove duplicate filenames from a list?
Assuming the latter:
my %seen;
#filenames = grep !$seen{$_}++, #filenames;
or other solutions from perldoc -q duplicate
First of all, sort -u xxx.txt would have been smarter than sort | uniq -u.
Second, perl -ne 'print unless $seen{$_}++' is prone to integer overflow, so a more sophisticated way of perl -ne 'if(!$seen{$_}){print;$seen{$_}=1}' seems preferable.

Sorting an Array Reference to Hashes

After executing these lines in Perl:
my $data = `curl '$url'`;
my $pets = XMLin($data)->(pets);
I have an array reference that contains references to hashes:
$VAR1 = [
{
'title' => 'cat',
'count' => '210'
},
{
'title' => 'dog',
'count' => '210'
}
]
In Perl, how do I sort the hashes first by count and secondarily by title. Then print to STDOUT the count followed by the title on each newline.
Assuming you want counts in descending order and titles ascending:
print map join(" ", #$_{qw/ count title /}) . "\n",
sort { $b->{count} <=> $a->{count}
||
$a->{title} cmp $b->{title} }
#$pets;
That's compact code written in a functional style. To help understand it, let's look at equivalent code in a more familiar, imperative style.
Perl's sort operator takes an optional SUBNAME parameter that allows you to factor out your comparison and give it a name that describes what it does. When I do this, I like to begin the sub's name with by_ to make sort by_... ready more naturally.
To start, you might have written
sub by_count_then_title {
$b->{count} <=> $a->{count}
||
$a->{title} cmp $b->{title}
}
my #sorted = sort by_count_then_title #$pets;
Note that no comma follows the SUBNAME in this form!
To address another commenter's question, you could use or rather than || in by_count_then_title if you find it more readable. Both <=> and cmp have higher precedence (which you might think of as binding more tightly) than || and or, so it's strictly a matter of style.
To print the sorted array, a more familiar choice might be
foreach my $p (#sorted) {
print "$p->{count} $p->{title}\n";
}
Perl uses $_ if you don't specify the variable that gets each value, so the following has the same meaning:
for (#sorted) {
print "$_->{count} $_->{title}\n";
}
The for and foreach keywords are synonyms, but I find that the uses above, i.e., foreach if I'm going to name a variable or for otherwise, read most naturally.
Using map, a close cousin of foreach, instead isn't much different:
map print("$_->{count} $_->{title}\n"), #sorted;
You could also promote print through the map:
print map "$_->{count} $_->{title}\n",
#sorted;
Finally, to avoid repetition of $_->{...}, the hash slice #$_{"count", "title"} gives us the values associated with count and title in the loop's current record. Having the values, we need to join them with a single space and append a newline to the result, so
print map join(" ", #$_{qw/ count title /}) . "\n",
#sorted;
Remember that qw// is shorthand for writing a list of strings. As this example shows, read a map expression back-to-front (or bottom-to-top the way I indented it): first sort the records, then format them, then print them.
You could eliminate the temporary #sorted but call the named comparison:
print map join(" ", #$_{qw/ count title /}) . "\n",
sort by_count_then_title
#$pets;
If the application of join is just too verbose for your taste, then
print map "#$_{qw/ count title /}\n",
sort by_count_then_title
#$pets;