command line pivot - perl

I've been hunting around the past few days for a set of command line tools, a perl or an awk script that allow me to very quickly transpose the following data:
Row|Col|Val
1|A|foo
1|B|bar
1|C|I have a real
2|A|bad
2|C|hangover
into this:
A|B|C
foo|bar|I have a real
bad||hangover
Note that there is only one value in the dataset for each "cell" (i.e., as with a spreadsheet, there aren't any duplicates of Row "1" Col "A")
I've tried various awk shell implementations for transposing data - but can't seem to get them working. One idea I had was to cut each "Col" value into a separate file, then use the "join" command line to put them back together by "Row" -- but there MUST be an easier way. I'm sure this is just incredibly simple to do - but I'm struggling a bit.
My input files have Cols A through G (mostly including variable length strings), and 10,000 Rows. If I can avoid loading everything into memory that would be a huge plus.
Beer-by-mail for anyone who's got the answer!
As always - many thanks in advance for your help.
Cheers,
Josh
p.s. - I'm a bit surprised that there isn't an out-of-the-box command line util for doing this very basic type of pivot/transposition operation. I looked at http://code.google.com/p/openpivot/ and at http://code.google.com/p/crush-tools/ both of which seem to require aggregate calcs.

I can do this in gawk, but not nawk.
#!/usr/local/bin/gawk -f
BEGIN {
FS="|";
}
{
rows[$1]=1; cols[$2]=1; values[$1][$2]=$3;
}
END {
for (col in cols) {
output=output sprintf("|%s", col);
}
print substr(output, 2);
for (row in rows) {
output="";
for (col in cols) {
output=output sprintf("|%s", values[row][col]);
}
print substr(output, 2);
}
}
And it even works:
ghoti#pc $ cat data
1|A|foo
1|B|bar
1|C|I have a real
2|A|bad
2|C|hangover
ghoti#pc $ ./doit.gawk data
A|B|C
foo|bar|I have a real
bad||hangover
ghoti#pc $
I'm not sure how well this will work with 10000 rows, but I suspect if you've got the memory for it, you'll be fine. I can't see how you can avoid loading things into memory except by storing things in separate files which you'd later join. Which is pretty much a manual implementation of virtual memory.
UPDATE:
Per comments:
#!/usr/local/bin/gawk -f
BEGIN {
FS="|";
}
{
rows[$1]=1; cols[$2]=1; values[$1,$2]=$3;
}
END {
for (col in cols) {
output=output sprintf("|%s", col);
}
print output;
for (row in rows) {
output="";
for (col in cols) {
output=output "|" values[row,col];
}
print row output;
}
}
And the output:
ghoti#pc $ ./doit.awk data
|A|B|C
1|foo|bar|I have a real
2|bad||hangover
ghoti#pc $

Just use a hash.
If you don't want to load them into memory, you may need modules like DBM::Deep and a DBM backend.
my %table;
my $maxa = 'A';
my $maxr = 0;
<>;
while (<>) {
chomp;
my ($a,$b,$c) = split /\|/;
$table{$a}->{$b} = $c;
$maxr = $a if ($a > $maxr);
$maxa = $b if ($b gt $maxa);
}
for (my $c = 'A' ; $c lt $maxa ; $c++) {
print $c . '|';
}
print "$maxa\n";
for (my $r = 1 ; $r <= $maxr ; $r++) {
for (my $c = 'A' ; $c lt $maxa ; $c++) {
print $table{$r}->{$c} . '|';
}
print $table{$r}->{$maxa} . "\n";
}

If you know Awk, I'd recommend you look at Perl. Perl is just much more powerful than Awk. The advantage is that if you know BASH/Bourne shell and Awk, much of the syntax in Perl will be familiar.
Another nice thing about Perl is the entire CPAN repository which allows you to download already written Perl modules to use in your program. A quick search in CPAN brought up Data::Pivot which looks like (at a very quick glance) it might do what you want.
If not, take a look at Acme::Tools pivot command. Or try one of the many others.
Others have already provided a few solutions, but I recommend you look at what the CPAN Perl archive has. It's a very powerful tool for things like this.

Related

using printf to create columnar data

I am new to perl and scripting in general. I have five variables that hold data and I need to print them as five columns next to each other. Here is the code I have now.
$i = 0;
foreach $line (<inf>){
chomp $line;
#line=split / +/, $line;
$i = $i + 1;
if ($i > $n+1) {
$i = 1;
$numdata = $numdata + 1;
}
if ($i == 1) {
printf "%20s\n", $n, "\n";
} else {
print $i-1, "BEAD", $line[$col], $line[$col+1], $line[$col+2], "\n";
}
# other statistics
}
The output I get from this looks like:
5
1BEAD0.00000e+000.00000e+000.00000e+00
2BEAD0.00000e+000.00000e+000.00000e+00
3BEAD0.00000e+000.00000e+000.00000e+00
4BEAD0.00000e+000.00000e+000.00000e+00
5BEAD0.00000e+000.00000e+000.00000e+00
5
1BEAD9.40631e-02-3.53254e-022.09369e-01
2BEAD-6.69662e-03-3.13492e-012.62915e-01
3BEAD2.98822e-024.60254e-023.61680e-01
4BEAD-1.45631e-013.45979e-021.50167e-01
5BEAD-5.57204e-02-1.51673e-012.95947e-01
5
1BEAD8.14225e-028.10216e-022.76423e-01
2BEAD2.36992e-02-2.74023e-014.47334e-01
3BEAD1.23492e-011.12571e-012.59486e-01
4BEAD-2.05375e-011.25304e-011.85252e-01
5BEAD5.54441e-02-1.30280e-015.82256e-01
I have tried using "%6d %9d %15.6f %28.6f %39.6f\n" before the variables in my print statement to try to space the data out; however, this did not give me the columns I hoped for. Any help/ suggestions are appreciated.
If you're using Perl and doing more complex stuff, you may want to look into perlform, which is designed for this kind of thing, or a module like Text::Table.
As for using printf though, you can use the padding specifiers to get consistent spacing. For instance, using the Perl docs on it, make sure the field width is before the .: your printf string should probably look something more like this (check out the "precision, or maximum width" section):
printf "%6.d %9.d %15.6f %28.6f %39.6f"
Also, if your things are in an array, you can just pass the array the second argument to printf and save yourself typing everything out. I've also prepended the two other items from your example with unshift:
unshift(#line, $i-1, "BEAD");
printf "%6.d %10s %15.6f %28.6f %39.6f\n", $line;
Note that the %s placeholders don't have the . precision specifier, so leave it out for that. If you want the e-notation for the numbers, use %e or %g instead of %f (%39.6e).
Also, for Perl questions, always check out Perl Monks - much of this answer was culled from a question there.
P.S. Given one of your example columns, here's the proof-of-concept script I tried to make sure everything worked:
perl -e '#line = (8.14225e-02,8.10216e-02,2.76423e-01);
unshift(#line, 4, "BEAD");
printf "%6.d %10s %15.6f %28.6f %39.6e\n", #line;'

Matched lines (with regex) being written to both output files, but it's supposed only to be written to one output file..

I have a tab-delimited text file with several rows. I wrote a script in which I assign the rows to an array, and then I search through the array by means of regular expressions, to find the rows that match certain criteria. When a match is found, I write it to Output1. After going through all the if-statements listed (the regular expressions) and the criteria still isn't met, then the line is written to Output 2.
I works 100% when it comes to matching criteria and writing to Output 1, but here is where my problem comes in:
The matched lines are also being written to Output2, along with the unmatched lines. I am probably making a silly mistake, but I really just can't see it. If someone could have a look and help me out, I'd really appreciate it..
Thanks so much! :)
Inputfile sample:
skool school
losieshuis pension
prys prijs
eeu eeuw
lys lijs
water water
outoritêr outoritaire
#!/usr/bin/perl-w
use strict;
use warnings;
use open ':utf8';
use autodie;
open OSWNM, "<SecondWordsNotMatched.txt";
open ONIC, ">Output1NonIdenticalCognates.txt";
open ONC, ">Output2NonCognates.txt";
while (my $line = <OSWNM>)
{
chomp $line;
my #Row = $line;
for (my $x = 0; $x <= $#Row; $x++)
{
my $RowWord = $Row[$x];
#Match: anything, followed by 'y' or 'lê' or 'ê', followed by anything, followed by
a tab, followed by anything, followed by 'ij' or 'leggen' or 'e', followed by anything
if ($RowWord =~ /(.*)(y|lê|ê)(.*)(\t)(.*)(ij|leggen|e)(.*)/)
{
print ONIC "$RowWord\n";
}
#Match: anything, followed by 'eeu', followed by 'e' or 's', optional, followed by
anyhitng, followed by a tab, followed by anything, followed by 'eeuw', followed by 'en', optional
if ($RowWord =~ /(.*)(eeu)(e|s)?(\t)(.*)(eeuw)(en)?/)
{
print ONIC "$RowWord\n";
}
else
{
print ONC "$RowWord\n";
}
}
}
Inside your loop you essentially have:
if (A) {
output to file1
}
if (B) {
output to file1
} else {
output to file2
}
So you'll output to file2 anything that doesn't satisfy B (regardless of whether A was satisfied or not), and output stuff that satisfies both A and B twice to file1.
If outputting twice was not intended, you should modify your logic to something like:
if (A or B) {
output to file1
} else {
output to file2
}
Or:
if (A) {
output to file1
} elsif (B) {
output to file1
} else {
output to file2
}
(This second version allows you to do different processing for the A and B cases.)
If the double output was intended, you could do something like:
my $output_to_file2 = 1;
if (A) {
output to file1
$output_to_file2 = 0;
}
if (B) {
output to file1
$output_to_file2 = 0;
}
if ($output_to_file2) {
output to file2
}

Perl: Finding _ followed by X with stuff in between

many thanks for the help with the earlier issues.
I've almost finished the last thing I was working on - specifically an ORF (open reading frame) finder program. So far, I've got an array called #AminoAcidArray1. All the start codons are "_" and all the stop codons are "X".
How do I count the ORFs? Put another way, How do I count times in the array when "_" is followed by "X" with random ignorable characters between? What sort of loop should I be using? I need a ~= there somewhere I think
And yes, I know bioPerl can do this easily, but only activePerl is available for some reason.
Sincerest thanks,
Shtanto
First, contemporary ActivePerl has Bundle::BioPerl
in its main 'Activeperl' repository. This should allow
a BioPerl installation on some ActivePerl versions.
Then,
print "$-[0]..$+[0]\n" while $orf =~ /_[^X]*X/g;
prints start- (_) and stop (X) index of your orfs contained in
$orf if they are consecutive (not nested). If nested, then you'd have
to use slightly more complicated expressions (with recursion).
BTW.: What does the expression
print join ',', #AminoAcidArray1;
print on your console?
rbo
If I understand it right from your comment:
you have an array, you don't need =~ operator.
You need to traverse the array once and remember the current state of what you call "reading window". Say:
my $state = 0;
my $count = 0;
for my $item (#array) {
if ($item eq "_") {
if ($state==0) {
$state=1;
}
} elsif ($item eq "X") {
if ($state==1) {
$state=0;
$count++;
}
}
}
return $count;
Your question is too specific to your domain, but what I understand is that you want to count some occurrences in an array, this is what I does in the following code (I use perlconsole) :
Perl> my #a = qw/az ae ar at ay au pr lf/
8
Perl> my $count = grep /^a/, #a
6
Perl> print "$count\n"
6
1
Perl>

How is this Perl code selecting two different elements from an array?

I have inherited some code from a guy whose favorite past time was to shorten every line to its absolute minimum (and sometimes only to make it look cool). His code is hard to understand but I managed to understand (and rewrite) most of it.
Now I have stumbled on a piece of code which, no matter how hard I try, I cannot understand.
my #heads = grep {s/\.txt$//} OSA::Fast::IO::Ls->ls($SysKey,'fo','osr/tiparlo',qr{^\d+\.txt$}) || ();
my #selected_heads = ();
for my $i (0..1) {
$selected_heads[$i] = int rand scalar #heads;
for my $j (0..#heads-1) {
last if (!grep $j eq $_, #selected_heads[0..$i-1]);
$selected_heads[$i] = ($selected_heads[$i] + 1) % #heads; #WTF?
}
my $head_nr = sprintf "%04d", $i;
OSA::Fast::IO::Cp->cp($SysKey,'',"osr/tiparlo/$heads[$selected_heads[$i]].txt","$recdir/heads/$head_nr.txt");
OSA::Fast::IO::Cp->cp($SysKey,'',"osr/tiparlo/$heads[$selected_heads[$i]].cache","$recdir/heads/$head_nr.cache");
}
From what I can understand, this is supposed to be some kind of randomizer, but I never saw a more complex way to achieve randomness. Or are my assumptions wrong? At least, that's what this code is supposed to do. Select 2 random files and copy them.
=== NOTES ===
The OSA Framework is a Framework of our own. They are named after their UNIX counterparts and do some basic testing so that the application does not need to bother with that.
This looks like some C code with Perl syntax. Sometimes knowing the language the person is thinking in helps you figure out what's going on. In this case, the person's brain is infected with the inner workings of memory management, pointer arithmetic, and other low level concerns, so he wants to minutely control everything:
my #selected_heads = ();
# a tricky way to make a two element array
for my $i (0..1) {
# choose a random file
$selected_heads[$i] = int rand #heads;
# for all the files (could use $#heads instead)
for my $j (0..#heads-1) {
# stop if the chosen file is not already in #selected_heads
# it's that damned ! in front of the grep that's mind-warping
last if (!grep $j eq $_, #selected_heads[0..$i-1]);
# if we are this far, the two files we selected are the same
# choose a different file if we're this far
$selected_heads[$i] = ($selected_heads[$i] + 1) % #heads; #WTF?
}
...
}
This is a lot of work because the original programmer either doesn't understand hashes or doesn't like them.
my %selected_heads;
until( keys %selected_heads == 2 )
{
my $try = int rand #heads;
redo if exists $selected_heads{$try};
$selected_heads{$try}++;
}
my #selected_heads = keys %selected_heads;
If you still hate hashes and have Perl 5.10 or later, you can use smart-matching to check if a value is in an array:
my #selected_heads;
until( #selected_heads == 2 )
{
my $try = int rand #heads;
redo if $try ~~ #selected_heads;
push #selected_heads, $try;
}
However, you have a special constraint on this problem. Since you know there are only two elements, you just have to check if the element you want to add is the prior element. In the first case it won't be undef, so the first addition always works. In the second case, it just can't be the last element in the array:
my #selected_heads;
until( #selected_heads == 2 )
{
my $try = int rand #heads;
redo if $try eq $selected_heads[-1];
push #selected_heads, $try;
}
Huh. I can't remember the last time I used until when it actually fit the problem. :)
Note that all of these solutions have the problem that they can cause an infinite loop if the number of original files is less than 2. I'd add a guard condition higher up so the no and single file cases through an error and perhaps the two file case doesn't bother to order them.
Another way you might do this is to shuffle (say, with List::Util) the entire list of original files and just take off the first two files:
use List::Util qw(shuffle);
my #input = 'a' .. 'z';
my #two = ( shuffle( #input ) )[0,1];
print "selected: #two\n";
It selects a random element from #heads.
Then it adds on another random but different element from #heads (if it is the element previously selected, it scrolls through #heads till it find an element not previously selected).
In summary, it selects N (in your case N=2) different random indexes in #heads array and then copies files corresponding to those indexes.
Personally I would write it a bit differently:
# ...
%selected_previously = ();
foreach my $i (0..$N) { # Generalize for N random files instead of 2
my $random_head_index = int rand scalar #heads;
while ($selected_previously[$random_head_index]++) {
$random_head_index = $random_head_index + 1) % #heads; # Cache me!!!
}
# NOTE: "++" in the while() might be considered a bit of a hack
# More readable version: $selected_previously[$random_head_index]=1; here.
The part you labeled "WTF" isn't so troubling, it's just simply making sure that $selected_heads[$i] remains as a valid subscript of #head. The really troubling part is that it is a pretty inefficient way of making sure he's not selecting the same file.
Then again, if the size of #heads is small, stepping from 0..$#heads is probably more efficient than just generating int rand( 2 ) and testing if they are the same.
But basically it copies two files at random (why?) as a '.txt' file and a '.cache' file.
How about just
for my $i (0..1) {
my $selected = splice( #heads, rand #heads, 1 );
my $head_nr = sprintf "%04d", $i;
OSA::Fast::IO::Cp->cp($SysKey,'',"osr/tiparlo/$selected.txt","$recdir/heads/$head_nr.txt");
OSA::Fast::IO::Cp->cp($SysKey,'',"osr/tiparlo/$selected.cache","$recdir/heads/$head_nr.cache");
}
unless #heads or #selected_heads are used later.
Here's yet another way to select 2 unique random indices:
my #selected_heads = ();
my #indices = 0..$#heads;
for my $i (0..1) {
my $j = int rand (#heads - $i);
push #selected_heads, $indices[$j];
$indices[$j] = $indices[#heads - $i - 1];
}

What are some elegant features or uses of Perl?

What? Perl Beautiful? Elegant? He must be joking!
It's true, there's some ugly Perl out there. And by some, I mean lots. We've all seen it.
Well duh, it's symbol soup. Isn't it?
Yes there are symbols. Just like 'math' has 'symbols'. It's just that we programmers are more familiar with the standard mathematical symbols. We grew to accept the symbols from our mother languages, whether that be ASM, C, or Pascal. Perl just decided to have a few more.
Well, I think we should get rid of all the unnecessary symbols. Makes the code look better.
The language for doing so already exists. It's called Lisp. (and soon, perl 6.)
Okay, smart guy. Truth is, I can already invent my own symbols. They're called functions and methods. Besides, we don't want to reinvent APL.
Oh, fake alter ego, you are so funny! It's really true, Perl can be quite beautiful. It can be quite ugly, as well. With Perl, TIMTOWTDI.
So, what are your favorite elegant bits of Perl code?
Perl facilitates the use of lists/hashes to implement named parameters, which I consider very elegant and a tremendous aid to self-documenting code.
my $result = $obj->method(
flux_capacitance => 23,
general_state => 'confusion',
attitude_flags => ATTITUDE_PLEASANT | ATTITUDE_HELPFUL,
);
My favourite pieces of elegant Perl code aren't necessarily elegant at all. They're meta-elegant, and allow you to get rid of all those bad habits that many Perl developers have slipped into. It would take me hours or days to show them all in the detail they deserve, but as a short list they include:
autobox, which turns Perl's primitives into first-class objects.
autodie, which causes built-ins to throw exceptions on failure (removing most needs for the or die... construct). See also my autodie blog and video).
Moose, which provide an elegant, extensible, and correct way of writing classes in Perl.
MooseX::Declare, which provides syntaxic aweseomeness when using Moose.
Perl::Critic, your personal, automatic, extensible and knowledgeable code reviewer. See also this Perl-tip.
Devel::NYTProf, which provides me the most detailed and usable profiling information I've seen in any programming language. See also Tim Bunce's Blog.
PAR, the Perl Archiver, for bundling distributions and even turning whole programs into stand-alone executable files. See also this Perl-tip.
Perl 5.10, which provides some stunning regexp improvements, smart-match, the switch statement, defined-or, and state variables.
Padre, the only Perl editor that integrates the best bits of the above, is cross-platform, and is completely free and open source.
If you're too lazy to follow links, I recently did a talk at Linux.conf.au about most of the above. If you missed it, there's a video of it on-line (ogg theora). If you're too lazy to watch videos, I'm doing a greatly expanded version of the talk as a tutorial at OSCON this year (entitled doing Perl right).
All the best,
Paul
I'm surprised no one mentioned the Schwartzian Transform.
my #sorted =
map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [ $_, expensive_func($_) ] }
#elements;
And in the absence of a slurp operator,
my $file = do { local $/; readline $fh };
Have a list of files the user wants your program to process? Don't want to accidentally process a program, folder, or nonexistent file? Try this:
#files = grep { -T } #files;
And, like magic, you've weeded out all the inappropriate entries. Don't want to ignore them silently? Add this line before the last one:
warn "Not a file: $_" foreach grep { !-T } #files;
Prints a nice warning message for every file that it can't process to standard error. The same thing without using grep would look like this:
my #good;
foreach(#files) {
if(-T) {
push #good, $_;
} else {
warn "Not a file: $_";
}
}
grep (and map) can be used to make code shorter while still keeping it very readable.
The "or die" construct:
open my $fh, "<", $filename
or die "could not open $filename: $!";
The use of qr// to create grammars:
#!/usr/local/ActivePerl-5.10/bin/perl
use strict;
use warnings;
use feature ':5.10';
my $non_zero = qr{[1-9]};
my $zero = qr{0};
my $decimal = qr{[.]};
my $digit = qr{$non_zero+ | $zero}x;
my $non_zero_natural = qr{$non_zero+ $digit*}x;
my $natural = qr{$non_zero_natural | $zero}x;
my $integer = qr{-? $non_zero_natural | $zero}x;
my $real = qr{$integer (?: $decimal $digit)?}x;
my %number_types = (
natural => qr/^$natural$/,
integer => qr/^$integer$/,
real => qr/^$real$/
);
for my $n (0, 3.14, -5, 300, "4ever", "-0", "1.2.3") {
my #types = grep { $n =~ $number_types{$_} } keys %number_types;
if (#types) {
say "$n is of type", #types == 1 ? " ": "s ", "#types";
} else {
say "$n is not a number";
}
}
Anonymous subroutines used to factor out duplicate code:
my $body = sub {
#some amount of work
};
$body->();
$body->() while $continue;
instead of
#some amount of work
while ($continue) {
#some amount of work again
}
Hash based dispatch tables:
my %dispatch = (
foo => \&foo,
bar => \&bar,
baz => \&baz
);
while (my $name = iterator()) {
die "$name not implemented" unless exists $dispatch{$name};
$dispatch{$name}->();
}
instead of
while (my $name = iterator()) {
if ($name eq "foo") {
foo();
} elsif ($name eq "bar") {
bar();
} elsif ($name eq "baz") {
baz();
} else {
die "$name not implemented";
}
}
Three-line classes with constructors, getter/setters and type validation:
{
package Point;
use Moose;
has ['x', 'y'] => (isa => 'Num', is => 'rw');
}
package main;
my $point = Point->new( x => '8', y => '9' );
$point->x(25);
A favorite example of mine is Perl's implementation of a factorial calculator. In Perl 5, it looks like so:
use List::Util qw/reduce/;
sub factorial {
reduce { $a * $b } 1 .. $_[0];
}
This returns false if the number is <= 1 or a string and a number if a number is passed in (rounding down if a fraction).
And looking forward to Perl 6, it looks like this:
sub factorial {
[*] 1..$^x
}
And also ( from the blog in the link above ) you can even implement this as an operator:
sub postfix:<!>(Int $x) {
[*] 1..($x || 1)
}
and then use it in your code like so:
my $fact5 = 5!;
If you have a comma separated list of flags, and want a lookup table for them, all you have to do is:
my %lookup = map { $_ => 1 } split /,/, $flags;
Now you can simply test for which flags you need like so:
if ( $lookup{FLAG} ) {
print "Ayup, got that flag!";
}
I am surprised no one has mentioned this. It's a masterpiece in my opinion:
#!/usr/bin/perl
$==$';
$;||$.| $|;$_
='*$ ( ^#(%_+&~~;# ~~/.~~
;_);;.);;#) ;~~~~;_,.~~,.* +,./|~
~;_);#-, .;.); ~ ~,./##-__);#-);~~,.*+,.
/|);;;~~#-~~~~;.~~,. /.);;.,./#~~#-;.;#~~#-;;
;;,.*+,./.);;#;./#,./ |~~~~;#-(#-__#-__&$#%^';$__
='`'&'&';$___="````" |"$[`$["|'`%",';$~=("$___$__-$[``$__"|
"$___"| ("$___$__-$[.%")).("'`"|"'$["|"'#").
'/.*?&([^&]*)&.*/$'.++$=.("/``"|"/$[`"|"/#'").(";`/[\\`\\`$__]//`;"
|";$[/[\\$[\\`$__]//`;"|";#/[\\\$\\.$__]//'").'#:=("#-","/.",
"~~",";#",";;",";.",",.",");","()","*+","__","-(","/#",".%","/|",
";_");#:{#:}=$%..$#:;'.('`'|"$["|'#')."/(..)(..)/".("```"|"``$["|
'#("').'(($:{$'.$=.'}<<'.(++$=+$=).')|($:{$'.$=.'}))/'.("```;"|
"``$[;"|"%'#;").("````'$__"|"%$[``"|"%&!,").${$[};`$~$__>&$=`;$_=
'*$(^#(%_+&#-__~~;#~~#-;.;;,.(),./.,./|,.-();;#~~#-);;;,.;_~~#-,./.,
./#,./#~~#-);;;,.(),.;.~~#-,.,.,.;_,./#,.-();;#~~#-,.;_,./|~~#-,.
,.);););#-#-__~~;#~~#-,.,.,.;_);~~~~#-);;;,.(),.*+);;# ~~#-,
./|,.*+,.,.);;;);*+~~#-,.*+,.;;,.;.,./.~~#-,.,.,.;_) ;~~~
~#-,.;;,.;.,./#,./.);*+,.;.,.;;#-__~~;#~~#-,.;;,.* +);;
#);#-,./#,./.);*+~~#-~~.%~~.%~~#-;;__,. /.);;##- __#-
__ ~~;;);/#;#.%;#/.;#-(#-__~~;;;.;_ ;#.%~~~~ ;;()
,.;.,./#,. /#,.;_~~#- ););,.;_ );~~,./ #,.
;;;./#,./| ~~~~;#-(#- __,.,.,. ;_);~~~ ~#
-~~());; #);#-,./#, .*+);;; ~~#-~~
);~~);~~ *+~~#-);-( ~~#-#-_ _~~#-
~~#-);; #,./#,.;., .;.);# -~~#-;
#/.;#-( ~~#-#-__ ~~#-~~ #-);#
-);~~, .*+,./ |);;;~ ~#-~~
;;;.; _~~#-# -__);. %;#-(
#-__# -__~~;# ~~#-;; ;#,.
;_,.. %);#-,./#, .*+,
..%, .;.,./|) ;;;)
;;#~ ~#-,.*+,. ,.~~
#-); *+,.;_);;.~ ~););
~~,.; .~~#-);~~,.;., ./.,.;
;,.*+ ,./|,.); ~~#- );;;,.(
),.*+); ;#~~/|#-
__~~;#~~ $';$;;
I absolutely love Black Perl (link to version rewritten to compile under Perl 5). It compiles, but as far as I can tell it doesn't actually do anything.
That's what you get for a language written by a linguist from a pragmatic perspective rather than from a theoretical perspective.
Moving on from that, you can think about the Perl that people complain about as pidgin Perl (perfectly useful, but not expressive, and beware of trying to express anything complex in it), and the stuff that #pjf is talking about as "proper" Perl, the language of Shakespeare, Hemingway, Hume and so on. [edit: err, though easier to read than Hume and less dated than Shakespeare.] [re-edit and hopefully less alcoholic than Hemingway]
Adding to the love of map and grep, we can write a simple command-line parser.
my %opts = map { $_ => 1 } grep { /^-/ } #ARGV;
If we want, we can set each flag to it's index in #ARGV:
my %opts = map { $ARGV[$_] => $_ } grep { $ARGV[$_] =~ /^-/ } 0 .. $#ARGV;
That way, if a flag has an argument, we can get the argument like this:
if( defined( $opts{-e} ) ) {
my $arg = $ARGV[ $opts{-e} ];
# do -e stuff for $arg
}
Of course, some people will cry that we're reinventing the wheel and we should use getopt or some variant thereof, but honestly, this was a fairly easy wheel to reinvent. Plus, I don't like getopt.
If you don't like how long some of those lines are, you can always use intermediate variables or just convenient line breaks (hey, Python fanatics? You hear that? We can put one line of code across two lines and it still works!) to make it look better:
my %opts = map { $ARGV[$_] => $_ }
grep { $ARGV[$_] =~ /^-/ } 0 .. $#ARGV;
This file parsing mechanism is compact and easy to customize (skip blank lines, skip lines starting with X, etc..).
open(H_CONFIG, "< $file_name") or die("Error opening file: $file_name! ($!)");
while (<H_CONFIG>)
{
chomp; # remove the trailing newline
next if $_ =~ /^\s*$/; # skip lines that are blank
next if $_ =~ /^\s*#/; # skip lines starting with comments
# do something with the line
}
I use this type of construct in diverse build situations - where I need to either pre or post process payload files (S-records, etc..) or C-files or gather directory information for a 'smart build'.
My favourite elegant Perl feature is that it uses different operators for numerical values and string values.
my $string = 1 . 2;
my $number = "1" + "2";
my $unambiguous = 1 . "2";
Compare this to other dynamic languages such as JavaScript, where "+" is used for concatenation and addition.
var string = "1" + "2";
var number = 1 + 2;
var ambiguous = 1 + "2";
Or to dynamic languages such as Python and Ruby that require type coercion between strings and numberical values.
string = "1" + "2"
number = 1 + 2
throws_exception = 1 + "2"
In my opinion Perl gets this so right and the other languages get it so wrong.
Poorer typists like me who get cramps hitting the shift key too often and have an almost irrational fear of using a semicolon started writing our Perl code in python formatted files. :)
e.g.
>>> k = 5
>>> reduce(lambda i,j: i*j, range(1,k+1),1)
120
>>> k = 0
>>> reduce(lambda i,j: i*j, range(1,k+1),1)
1