Schwartzian transform in Perl? - perl

my @output =
map $_->[0],
sort{$a->[1] <=> $b->[1]}
map [$_,-s $_],
@array;
Can someone explain the code in more detail? I can't make head or tail of it.

Read from the bottom up:
@array
An array (of filenames, given later usage).
map [$_,-s $_],
For each filename, get a reference to a two element anonymous array, with the first element being the filename and the second element, the byte size of the file. map returns a list of these array references.
sort{$a->[1] <=> $b->[1]}
Sort the list of array references by increasing file size.
map $_->[0],
Turn the list of array references back into a list of filenames, but now in sorted order.
my @output =
Save the list in @output.
This is equivalent in function to:
my @output = sort { -s $a <=> -s $b } @array;
but only gets the size for each file once instead of once per comparison done by the sort.
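A minimal runnable sketch of the same wrap, sort, unwrap pattern, in case it helps. It assumes a made-up word list and uses length in place of -s so it runs without any files on disk; @words and @sorted are illustrative names, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical input; length() stands in for the -s file-size test.
my @words = qw(banana fig apple pear);

my @sorted =
    map  { $_->[0] }                # 3. unwrap: keep only the original item
    sort { $a->[1] <=> $b->[1] }    # 2. sort the pairs by the cached key
    map  { [ $_, length $_ ] }      # 1. wrap: [item, key], key computed once
    @words;

print "@sorted\n";                  # fig pear apple banana
```

The expensive key (here length, in the question -s) is computed once per item in step 1, instead of twice per comparison inside sort.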

Wikipedia has a detailed explanation and analysis

Related

Count hash values while using Data::Dumper

I need to find the count of values (i.e. abc1) in a Perl hash and, if it is > 4, run an internal command within an if block. I just need to figure out the concept of how to count the number of values.
(I could leave a code sample of what I've attempted but that would just result in uncontrolled laughter and confusion)
I am using Data::Dumper, and utilizing the following format to store key/value in hash.
push @{$hash{$key}}, $val;
A print of hash gives :
$ print Dumper \%hash;
$VAR1 = {
'5555' => [
'abc1',
'abc1',
'abc1'
]
};
Please let me know how to get the count.
Thanks in advance.
Well, do you want to count that particular string, or the number of elements?
my $count = @{$hash{$key}}; # get the size of the array (all elements)
my %num;
for my $val (@{$hash{$key}}) {
$num{$val}++; # count how often each value occurs
}
print "Number of 'abc1': $num{'abc1'}\n";
The number of values in a hash is the same as the number of keys. What you are after, though, is the number of elements in an array (referenced from a hash value). To get the size of an array, just use it in scalar context. For an array reference, you have to dereference it first:
my $count = @{ $hash{$key} };
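Putting both answers together, a small sketch of the "if more than 4" check might look like this. The data is invented for illustration (five copies of abc1 plus one other value), and the print stands in for whatever internal command you want to run.

```perl
use strict;
use warnings;

# Hypothetical data in the same shape as the question's hash of arrays.
my %hash;
push @{ $hash{5555} }, 'abc1' for 1 .. 5;
push @{ $hash{5555} }, 'xyz9';

# An array in scalar context gives its element count.
my $total = @{ $hash{5555} };

# grep in scalar context returns how many elements matched.
my $abc1 = grep { $_ eq 'abc1' } @{ $hash{5555} };

if ($abc1 > 4) {
    print "abc1 appears $abc1 times out of $total values\n";
}
```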

Test, if a hash key exists and print values of Hash in perl

If a key exists in an array, I want to print that key and its values from hash. Here is the code I wrote.
for($i=0;$i<@array.length;$i++)
{
if (exists $hash{$array[$i]})
{
print OUTPUT $array[$i],"\n";
}
}
From the above code, I am able to print keys. But I am not sure how to print values of that key.
Can someone help me?
Thanks
@array.length is syntactically legal, but it's definitely not what you want.
@array, in scalar context, gives you the number of elements in the array.
The length function, with no argument, gives you the length of $_.
The . operator performs string concatenation.
So @array.length takes the number of elements in @array and the length of the string contained in $_, treats them as strings, and joins them together. $i < ... imposes a numeric context, so the result is treated as a number -- but surely not the one you want. (If @array has 15 elements and $_ happens to be 7 characters long, the result is 157, a meaningless value.)
The right way to compute the number of elements in @array is just @array in scalar context -- or, to make it more explicit, scalar @array.
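To see that concatenation happen, here is a small sketch using the same numbers as the example above (15 elements, a 7-character $_). The contents of @array and $_ are made up for the demonstration.

```perl
use strict;
use warnings;

my @array = (1 .. 15);          # 15 elements
local $_  = 'seven77';          # 7 characters; length defaults to $_

my $joined = @array . length;   # "15" . "7"
my $count  = scalar @array;     # the element count you actually want

print "joined: $joined, count: $count\n";   # joined: 157, count: 15
```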
To answer your question, if $array[$i] is a key, the corresponding value is $hash{$array[$i]}.
But a C-style for loop is not the cleanest way to traverse an array, especially if you only need the value, not the index, on each iteration.
foreach my $elem (@array) {
if (exists $hash{$elem}) {
print OUTPUT "$elem\n";
}
}
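To print the value along with the key, a sketch along these lines should work. The hash contents and array are invented, and it prints to STDOUT rather than the OUTPUT filehandle from the question so that it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical data: 'cherry' is deliberately missing from the hash.
my %hash  = (apple => 'red', banana => 'yellow');
my @array = qw(apple cherry banana);

my @lines;
for my $elem (@array) {
    next unless exists $hash{$elem};        # skip keys not in the hash
    push @lines, "$elem => $hash{$elem}";   # key and its value
}

print "$_\n" for @lines;
```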
Some alternative methods using hash slices:
foreach (@hash{@array}) { print OUTPUT "$_\n" if defined };
print OUTPUT join("\n",grep {defined} @hash{@array});
(For those who like golfing).

Perl: Beginner. Which data structure should I use?

Okay, not sure where to ask this, but I'm a beginner programmer, using Perl. I need to create an array of arrays, but I'm not sure if it would be better to use array/hash references, or an array of hashes, or a hash of arrays, etc.
I need an array of matches: @totalmatches
Each match contains 6 elements (strings):
@matches = ($chapternumber, $sentencenumber, $sentence, $grammar_relation, $argument1, $argument2)
I need to push each of these elements into the @matches array/hash/reference, and then push that array/hash/reference into the @totalmatches array.
The matches are found based on searching a file and selecting the strings based on meeting the criteria.
QUESTIONS
Which data structure would you use?
Can you push an array into another array, as you would push an element into an array? Is this an efficient method?
Can you push all 6 elements simultaneously, or have to do 6 separate pushes?
When working with 2-D, to loop through would you use:
foreach (@totalmatches) {
foreach (@matches) {
...
}
}
Thanks for any advice.
Which data structure would you use?
An array for an ordered set of things. A hash for a set of named things.
Can you push an array into another array, as you would push an element into an array? Is this an efficient method?
If you try to push an array (1) into an array (2), you'll end up pushing all the elements of 1 into 2. That is why you would push an array ref in instead.
Can you push all 6 elements simultaneously, or have to do 6 separate pushes?
Look at perldoc -f push
push ARRAY,LIST
You can push a list of things in.
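A short sketch of the difference between pushing an array and pushing an array reference. The match values are invented; @flat is a hypothetical name used only to show the flattening.

```perl
use strict;
use warnings;

# Hypothetical values for one match (six strings/numbers).
my @match = (3, 12, 'The cat sat.', 'nsubj', 'sat', 'cat');

my @flat;
push @flat, @match;               # flattens: adds 6 separate elements

my @totalmatches;
push @totalmatches, [ @match ];   # adds ONE element: a reference to a copy
push @totalmatches, [ 4, 1, 'Dogs bark.', 'nsubj', 'bark', 'Dogs' ];

print scalar(@flat), " vs ", scalar(@totalmatches), "\n";   # 6 vs 2
print "$totalmatches[1][2]\n";                              # Dogs bark.
```

Using [ @match ] rather than \@match stores a copy, so reusing @match for the next match does not overwrite the one already stored.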
When working with 2-D, to loop through would you use:
Nested foreach is fine, but that syntax wouldn't work. You have to access the values you are dealing with.
for my $arrayref (@outer) {
for my $item (@$arrayref) {
$item ...
}
}
Do not push one array into another array.
Lists just join with each other into a new list.
Use list of references.
#create an anonymous hash ref for each match
$one_match_ref = {
chapternumber => $chapternumber_value,
sentencenumber => $sentencenumber_value,
sentence => $sentence_value,
grammar_relation => $grammar_relation_value,
arg1 => $argument1,
arg2 => $argument2
};
# add the reference of match into array.
push @all_matches, $one_match_ref;
# list of keys of interest
@keys = qw(chapternumber sentencenumber sentence grammar_relation arg1 arg2);
# walk through all the matches.
foreach $ref (@all_matches) {
foreach $key (@keys) {
$val = $$ref{$key};
}
# or pick up some specific keys
my $arg1 = $$ref{arg1};
}
Which data structure would you use?
An array... I can't really justify that choice, but I can't imagine what you would use as keys if you used a hash.
Can you push an array into another array, as you would push an element into an array? Is this an efficient method?
Here's the thing: in Perl, arrays can only contain scalar values - the ones which start with $. Something like...
@matrix = ();
@row = ();
$matrix[0] = @row; # FAIL! This stores the number of elements in @row, not the array
... won't work. You will have to instead use a reference to the array:
@matrix = ();
@row = ();
$matrix[0] = \@row;
Or equally:
push(@matrix, \@row);
Can you push all 6 elements simultaneously, or have to do 6 separate pushes?
If you use references, you need only push once... and since you don't want to concatenate arrays (you need an array of arrays) you're stuck with no alternatives ;)
When working with 2-D, to loop through would you use:
I'd use something like:
for($i=0; $i<@matrix; $i++) {
@row = @{$matrix[$i]}; # de-reference
for($j=0; $j<@row; $j++) {
print "| $row[$j] ";
}
print "|\n";
}
Which data structure would you use?
Some fundamental container properties:
An array is a container for ordered scalars.
A hash is a container for scalars obtained by a unique key (there can be no duplicate keys in the hash). The order in which values were added is not preserved.
I would use the same structure that ZhangChn proposed.
Use a hash for each match.
The details of the match can then be accessed by descriptive names instead of plain numerical indices, e.g. $ref->{'chapternumber'} instead of $matches[0].
Take references of these anonymous hashes (which are scalars) and push them into an array in order to preserve the order of the matches.
To dereference items from the data structure
get an item from the array which is a hash reference
retrieve any matching detail you need from the hash reference
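The steps above can be sketched roughly as follows. The match data is invented, and the key names follow the ones used earlier in this thread.

```perl
use strict;
use warnings;

my @all_matches;

# Each match is an anonymous hash; pushing the reference keeps match order.
push @all_matches, {
    chapternumber    => 1,
    sentencenumber   => 7,
    sentence         => 'The cat sat.',
    grammar_relation => 'nsubj',
    arg1             => 'sat',
    arg2             => 'cat',
};
push @all_matches, {
    chapternumber    => 2,
    sentencenumber   => 3,
    sentence         => 'Dogs bark.',
    grammar_relation => 'nsubj',
    arg1             => 'bark',
    arg2             => 'Dogs',
};

# Get each hash reference from the array, then read details by name.
for my $match (@all_matches) {
    printf "%d.%d: %s\n",
        $match->{chapternumber}, $match->{sentencenumber}, $match->{sentence};
}

# Direct access to one field of one match.
my $second_arg1 = $all_matches[1]{arg1};
print "$second_arg1\n";
```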

What is the difference between push and unshift in Perl?

Can someone please explain why push behaves the way as shown below?
Basically I am trying to print values of an array populated by push as well as unshift.
When I try to print array contents populated by push using array indexes, it always prints the element at the top of the array, whereas an array populated by unshift prints contents of the array based on array index. I don't understand why.
with unshift
#!/usr/bin/perl
@names = ("Abhijit","Royal Enfield","Google");
@numbers=();
$number=1;
$i=0;
foreach $name (@names) {
#print $_ . "\n";
$number=$number+1;
#push(@numbers,($number));
unshift(@numbers,($number));
print("Array size is :" . @numbers . "\n");
$i=$i+1;
print("Individual Elements are:" . @numbers[i] . "\n");
pop(@numbers);
}
rhv:/var/cl_ip_down>./run.sh
Array size is :1
Individual Elements are:2
Array size is :2
Individual Elements are:3
Array size is :3
Individual Elements are:4
without unshift
#!/usr/bin/perl
@names = ("Abhijit","Royal Enfield","Google");
@numbers=();
$number=1;
$i=0;
foreach $name (@names) {
#print $_ . "\n";
$number=$number+1;
push(@numbers,($number));
#unshift(@numbers,($number));
print("Array size is :" . @numbers . "\n");
$i=$i+1;
print("Individual Elements are:" . @numbers[i] . "\n");
}
rhv:/var/cl_ip_down>./run.sh
Array size is :1
Individual Elements are:2
Array size is :2
Individual Elements are:2
Array size is :3
Individual Elements are:2
/without pop/
#!/usr/bin/perl
@names = ("Abhijit","Royal Enfield","Google");
@numbers=();
$number=1;
$i=0;
foreach $name (@names) {
#print $_ . "\n";
$number=$number+1;
#push(@numbers,($number));
unshift(@numbers,($number));
print("Array size is :" . @numbers . "\n");
$i=$i+1;
print("Individual Elements are:" . @numbers[i] . "\n");
#pop(@numbers);
}
rhv:/var/cl_ip_down>./run.sh
Array size is :1
Individual Elements are:2
Array size is :2
Individual Elements are:3
Array size is :3
Individual Elements are:4
with pop
#!/usr/bin/perl
@names = ("Abhijit","Royal Enfield","Google");
@numbers=();
$number=1;
$i=0;
foreach $name (@names) {
#print $_ . "\n";
$number=$number+1;
#push(@numbers,($number));
unshift(@numbers,($number));
print("Array size is :" . @numbers . "\n");
$i=$i+1;
print("Individual Elements are:" . @numbers[i] . "\n");
pop(@numbers);
}
rhv:/var/cl_ip_down>./run.sh
Array size is :1
Individual Elements are:2
Array size is :1
Individual Elements are:3
Array size is :1
Individual Elements are:4
You really should be using use strict; and use warnings; in your code. Having them activated will allow you to identify errors in your code.
Change all instances of the following:
foreach $name (@names) -> for my $i (@names) as you don't do anything with the elements in the @names array.
@numbers[i] -> $numbers[$i] as this is where you've made a not uncommon mistake of using an array slice rather than referring to an array element.
This is not C. Every 'variable' has to have a sigil ($, @, %, &, etc.) in front of it. That i should really be $i.
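That also explains the output in the question: without strict, the bareword i is treated as the string "i", which is 0 in numeric context, so the slice always reads index 0. A small sketch of what sits at index 0 after each operation (the variable names here are mine, not from the question):

```perl
use strict;
use warnings;

my (@pushed, @unshifted);
for my $number (2 .. 4) {
    push    @pushed,    $number;   # appended at the end
    unshift @unshifted, $number;   # inserted at the front
}

print "pushed:    @pushed\n";      # 2 3 4
print "unshifted: @unshifted\n";   # 4 3 2

# Index 0 is the oldest element after push, the newest after unshift.
print "index 0: $pushed[0] vs $unshifted[0]\n";   # 2 vs 4
```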
As for the difference between push and unshift, the documentation explains:
perldoc -f push:
push ARRAY,LIST
Treats ARRAY as a stack, and pushes the values of LIST onto the end of ARRAY. The length of ARRAY increases by the length of LIST. ... Returns the number of elements in the array following the completed "push".
perldoc -f unshift:
unshift ARRAY,LIST
Does the opposite of a shift. Or the opposite of a push, depending on how you look at it. Prepends list to the front of the array, and returns the new number of elements in the array.
To put it ASCII-matically...
        +---------+          +-----------+          +---------+
<-----  | ITEM(S) |  ----->  | (@) ARRAY |  <-----  | ITEM(S) |  ----->
shift   +---------+ unshift  +-----------+   push   +---------+   pop
                             ^           ^
                             FRONT       END
unshift is used to add a value or values onto the beginning of an array:
Does the opposite of a shift. Or the opposite of a push, depending on how you look at it.
The new values then become the first elements in the array.
push adds elements to the end of an array:
Treats ARRAY as a stack, and pushes the values of LIST onto the end of ARRAY.
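For completeness, a tiny sketch showing all four operations on one array; @queue and its values are made up for the example.

```perl
use strict;
use warnings;

my @queue = (2, 3);

unshift @queue, 1;          # add at the front: (1, 2, 3)
push    @queue, 4;          # add at the end:   (1, 2, 3, 4)

my $front = shift @queue;   # remove from the front: 1, leaving (2, 3, 4)
my $back  = pop   @queue;   # remove from the end:   4, leaving (2, 3)

print "front=$front back=$back rest=@queue\n";   # front=1 back=4 rest=2 3
```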
This should really be a comment but it is too long for a comment box, so here it is.
If you want to illustrate the difference between unshift and push, the following would suffice:
#!/usr/bin/perl
use strict; use warnings;
my @x;
push @x, $_ for 1 .. 3;
my @y;
unshift @y, $_ for 1 .. 3;
print "\@x = @x\n\@y = @y\n";
Output:
@x = 1 2 3
@y = 3 2 1
Note use strict; protects you against many programmer errors and use warnings; warns you when you use constructs of dubious value. At your level, neither is optional.
Note that
the preallocated array is balanced toward the 0 end of the array (meaning there is more free space at the far end of the list than there is before the list's 0 element). This is done purposely to make pushes more efficient than unshifts.
http://www.perlmonks.org/?node_id=17890
Although lists do quite fine as "Perl is smartly coded because the use of lists as queues was anticipated (Ibid.)".
For comparison, in various JavaScript engines shift/unshift on arrays seems to be significantly slower.
I haven't seen any articulation of what these methods actually do in terms of operational complexity, which is what helped me with conceptualization: quintessentially, I believe it is called "shift" because it actually has to shift all of the n elements in your array to new indices in order to properly update the length property.
push() and pop() use simpler operational complexity. No matter what the number of n values in your array, or your array.length, push or pop will always execute 1 operation. It doesn't need to deal with indexes, it doesn't need to iterate, it only needs to execute one operation, always at the end of the stack, either adding or removing a value and index.
Most importantly, notice when using push/pop, that the other elements in the array are not affected - they are the same values in the same indices of your array. The length of the array is also automatically updated properly to what you'd expect when removing or adding values.
On the other hand, shift() and unshift() not only add or remove, but also have to actually "shift" all of the other elements in your array into different indices. This is more complex and takes more time because the number of operations depends on n, the number of elements in your array, or array.length. For each additional element, it has to do one more operation to move that value into the correct index, so that the length is updated properly.
Otherwise, if it didn't perform n operations after shift() and move the other elements, you would have no element at index 0, and it wouldn't change the length of your array, would it? We want the length of our arrays to update intuitively, and shift and unshift have to execute more operations to accomplish this.

How do I print unique elements in Perl array?

I'm pushing elements into an array during a while statement. Each element is a teacher's name. There ends up being duplicate teacher names in the array when the loop finishes. Sometimes they are not right next to each other in the array, sometimes they are.
How can I print only the unique values in that array after it's finished getting values pushed into it? Without having to parse the entire array each time I want to print an element.
Here's the code after everything has been pushed into the array:
$faculty_len = @faculty;
$i=0;
while ($i != $faculty_len)
{
printf $fh '"'.$faculty[$i].'"';
$i++;
}
use List::MoreUtils qw/ uniq /;
my @unique = uniq @faculty;
foreach ( @unique ) {
print $_, "\n";
}
Your best bet would be to use a (basically) built-in tool, like uniq (as described by innaM).
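If installing List::MoreUtils is a problem, the core List::Util module also exports uniq from version 1.45 onwards (bundled with Perl 5.26 and later). A minimal sketch, with made-up names in @faculty:

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);   # uniq was added to List::Util in 1.45

# Hypothetical data with non-adjacent duplicates.
my @faculty = ('Smith', 'Jones', 'Smith', 'Lee', 'Jones');

# uniq keeps the first occurrence of each value, in original order.
my @unique = uniq @faculty;

print "$_\n" for @unique;
```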
If you don't have the ability to use uniq and want to preserve order, you can use grep to simulate that.
my %seen;
my @unique = grep { ! $seen{$_}++ } @faculty;
# printing, etc.
grep walks the list once. %seen counts how many times each entry has appeared so far, and an element passes the filter only the first time it is seen (when its count is still 0), so the original order is kept. (Updated with comments by brian d foy)
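A runnable version of that idiom, with invented names, showing both the filtered list and what ends up in %seen:

```perl
use strict;
use warnings;

# Hypothetical data with non-adjacent duplicates.
my @faculty = ('Smith', 'Jones', 'Smith', 'Lee', 'Jones');

my %seen;
# $seen{$_}++ returns the old count: 0 (false) on first sight, so ! makes it true.
my @unique = grep { !$seen{$_}++ } @faculty;

print "@unique\n";                           # Smith Jones Lee
print "Smith appeared $seen{Smith} times\n"; # Smith appeared 2 times
```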
I suggest pushing it into a hash.
like this:
my %faculty_hash = ();
foreach my $facs (@faculty) {
$faculty_hash{$facs} = 1;
}
my @faculty_unique = keys(%faculty_hash);
@array1 = ("abc", "def", "abc", "def", "abc", "def", "abc", "def", "xyz");
@array1 = grep { ! $seen{ $_ }++ } @array1;
print "@array1\n";
This question is answered with multiple solutions in perldoc. Just type at command line:
perldoc -q duplicate
Please note: Some of the answers containing a hash will change the ordering of the array. Hashes don't have any kind of order, so getting the keys or values will make a list with an undefined ordering.
This doesn't apply to grep { ! $seen{$_}++ } @faculty, which keeps the elements in their original order.
This is a one-liner that prints the unique lines of a file in the order they first appear.
perl -ne '$seen{$_}++ || print $_' fileWithDuplicateValues
I just found a hackneyed three-liner, enjoy:
my %uniq;
@uniq{@non_uniq_array} = ();
my @uniq_array = keys %uniq;
Just another way to do it, useful only if you don't care about order:
my %hash;
@hash{@faculty}=1;
my @unique=keys %hash;
If you want to avoid declaring a new variable, you can use the somewhat underdocumented global variable %_
@_{@faculty}=1;
my @unique=keys %_;
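Because keys returns its list in no particular order, a sketch like the following sorts the keys to get repeatable output. The names are invented, and assigning an empty list to the hash slice simply creates every key with an undefined value.

```perl
use strict;
use warnings;

# Hypothetical data with duplicates.
my @faculty = ('Smith', 'Jones', 'Smith', 'Lee', 'Jones');

my %hash;
@hash{@faculty} = ();          # hash slice: one key per distinct name

my @unique = sort keys %hash;  # sort, since keys has no defined order

print "@unique\n";             # Jones Lee Smith
```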
If you need to process the faculty list in any way, a map over the array converted to a hash for key coalescing and then sorting keys is another good way:
my @deduped = sort keys %{{ map { /.*/? ($_,1):() } @faculty }};
print join("\n", @deduped)."\n";
You process the list by changing the /.*/ regex for selecting or parsing and capturing accordingly, and you can output one or more mutated, non-unique keys per pass by making ($_,1):() arbitrarily complex.
If you need to modify the data in-flight with a substitution regex, say to remove dots from the names (s/\.//g), then a substitution according to the above pattern will mutate the original @faculty array due to $_ aliasing. You can get around $_ aliasing by making an anonymous copy of the @faculty array with @{[ ... ]} (the so-called "baby cart" operator):
my @deduped = sort keys %{{ map {/.*/? do{s/\.//g; ($_,1)}:()} @{[ @faculty ]} }};
print join("\n", @deduped)."\n";
print "Unmolested array:\n".join("\n", @faculty)."\n";
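In Perl 5.14 and later you can use the non-destructive substitution (the /r flag), which returns the modified copy and leaves $_ untouched. Passing a hashref directly to keys was an experimental feature introduced in 5.14 and removed again in 5.24, so keep the explicit %{ } dereference:
my @deduped = sort keys %{{ map { /.*/? (s/\.//gr,1):() } @faculty }};
Otherwise, the grep or $seen{$_}++ solutions elsewhere may be preferable.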
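A more spelled-out sketch of the same idea, for anyone who finds the nested one-liner hard to read. It assumes Perl 5.14 or later for the /r flag, and the names in @faculty are invented.

```perl
use strict;
use warnings;

# Hypothetical data; two spellings collapse to one once dots are removed.
my @faculty = ('J. Smith', 'A. Lee', 'J Smith');

my %seen;
for my $name (@faculty) {
    my $clean = $name =~ s/\.//gr;   # /r returns a modified copy
    $seen{$clean} = 1;               # hash keys coalesce duplicates
}

my @deduped = sort keys %seen;

print join("\n", @deduped), "\n";
print "Original untouched: $faculty[0]\n";
```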