Using an awk command inside a Perl script - perl

This may not be the best way to do the below, so any comments are appreciated.
I'm currently tailing a number of log files and outputting them to screen, so that I get a quick overview of the system.
What I would like to do is highlight the different message levels: [INFO], [WARN] and [ERROR].
The following syntax works fine on the command line, but fails when called from Perl:
system ("tail -n 5 $ArchiverLog | awk '{if ($4 ~ /DEBUG/)print "\033[1;33m"$0"\033[0m"; else if ($6 ~ /ERROR/) print "\033[1;31m"$0"\033[0m"; else print $0}'");
I believe Perl can do this
Should I read in the file line by line, match on the words and print to screen (I only want the last 10 lines). Is that a better option?
I've also seen reference to a2p, which is an awk to Perl translator. Would that be people's preferred choice?

It seems crazy to use one powerful scripting language to call up another one so it can do something which the first one can do very well, so I would not persist with trying to call up awk from perl.
I have not had much experience with a2p, rather I tend to just translate such snippets by hand.
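#!/usr/bin/perl
use strict;
# $ArchiverLog must already be declared and set to the log file's path
foreach (`tail -n 5 $ArchiverLog`) {
# awk fields count from 1, split's list from 0: awk's $4 is $f[3], $6 is $f[5]
my @f = split;
if ($f[3] =~ /DEBUG/) {
print "\033[1;33m$_\033[0m";
} elsif ($f[5] =~ /ERROR/) {
print "\033[1;31m$_\033[0m";
} else {
print $_;
}
}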
(Hard to say if the above is completely correct without some sample input to test it with).
As Borodin says in the comments a more elegant solution would be to use a Tailing module from CPAN rather than calling up tail as a subprocess. But for a quick tool that might be overkill.
NB: if $ArchiverLog comes from anywhere you don't have control of, remember to sanitise it, otherwise you are creating a nice security hole.
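For the line-by-line approach the question asks about, a minimal pure-Perl sketch might look like the following. The helper names colorize_line and last_lines are made up for illustration, and matching a literal [ERROR] or [WARN] tag anywhere in the line is an assumption about the log format (the original awk tested specific fields instead).

```perl
use strict;
use warnings;

# Hypothetical helper: wrap one log line in ANSI colour codes by level.
# Assumes the level appears as a literal [ERROR] or [WARN] tag in the line.
sub colorize_line {
    my ($line) = @_;
    chomp $line;                                                  # keep the newline outside the colour codes
    return "\033[1;31m$line\033[0m\n" if $line =~ /\[ERROR\]/;    # bold red
    return "\033[1;33m$line\033[0m\n" if $line =~ /\[WARN\]/;     # bold yellow
    return "$line\n";
}

# Hypothetical helper: last $n lines of a file, with no tail subprocess.
sub last_lines {
    my ($path, $n) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @keep;
    while (my $line = <$fh>) {
        push @keep, $line;
        shift @keep if @keep > $n;    # keep only the newest $n lines
    }
    close $fh;
    return @keep;
}

# Usage: print colorize_line($_) for last_lines($ArchiverLog, 10);
```

Because the file is opened directly, the path never passes through a shell, which also sidesteps the sanitising concern for this particular call.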

Related

Interpreting & modifying Perl one-liner?

I have the following Perl 'one-liner' script (found it online, so not mine):
perl -lsne '
/$today.* \[([0-9.]+)\]:.+dovecot_(?:login|plain):([^\s]+).* for (.*)/
and $sender{$2}{r}+=scalar (split / /,$3)
and $sender{$2}{i}{$1}=1;
END {
foreach $sender(keys %sender){
printf"Recip=%05d Hosts=%03d Auth=%s\n",
$sender{$sender}{r},
scalar (keys %{$sender{$sender}{i}}),
$sender;
}
}
' -- -today=$(date +%F) /var/log/exim_mainlog | sort
I've been trying to understand its innards, because I would like to modify it to re-use its functionality.
Some questions I got:
What does the flag -lsne does? (From what I know, it's got to be, at least, 3 different flags in one)
Where does $sender gets its value from?
What about that (?:login|plain) segment, are they 'variables'? (I get that's ReGex, I'm just not familiarized with it)
What I'm trying to achieve:
Get the number of emails sent by each user in a SMTP relay periodically (cron job)
If there's an irregular number of emails (say, 500 in a 1-hour timespan), do something (like shutting off the service, or sending a notification)
Why I'm trying to achieve this:
Lately, someone has been using my SMTP server to send spam, so I would like to monitor email activity so they stop abusing the SMTP relay resources. (Security-related suggestions are always welcomed, but out of topic for this question. Trying to focus on the script for now)
What I'm NOT trying to achieve:
To get the script done by third-parties. (Just try and point me in the right direction, maybe an example)
So, any suggestions, guidance, and friendly comments are welcome. I understand this may be an off-topic question, yet I've been struggling with this for almost a week and I have no background in Perl.
Thanks in advance.
What does the flag -lsne does? (From what I know, it's got to be, at least, 3 different flags in one)
-l causes each line of input to be auto-chomped, and each line of output printed to have "\n" auto-appended.
-s enables switch parsing. This is what creates the variable $today, because a command-line switch of -today=$(date +%F) was passed after the --.
-n surrounds the entire "one-liner" in a while (<>) { ... } loop, effectively reading every line from the files named on the command line (or from standard input if none are given) and running the body of the one-liner on that line.
-e is the switch that tells perl to execute the following code from the command line, rather than running a file containing Perl code.
Where does $sender gets its value from?
I suspect you are confusing $sender with %sender. The code uses $sender{$2}{r} without explicitly declaring %sender. This is a feature of Perl called "autovivification". Basically, because we used $sender{$2}{r}, perl automatically created a variable %sender, added a key whose name is whatever is in $2, and set the value of that key in %sender to be a reference to a new hash. It then gave that new hash a key 'r' and added scalar (split / /,$3), the number of space-separated recipients in $3, to its value. The scalar $sender in the END block is a separate variable: the foreach loop sets it to each key of %sender in turn.
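A small sketch of autovivification in isolation may help. The key names alice and 10.0.0.1 are invented for the example.

```perl
use strict;
use warnings;

my %sender;                           # starts out empty

# Neither $sender{alice} nor the inner hashes exist yet; Perl creates
# them on demand the moment they are used as hash references.
$sender{alice}{r} += 3;
$sender{alice}{i}{'10.0.0.1'} = 1;

print ref $sender{alice}, "\n";                       # HASH
print $sender{alice}{r}, "\n";                        # 3
print scalar(keys %{ $sender{alice}{i} }), "\n";      # 1
```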
What about that (?:login|plain) segment, are they 'variables'? (I get that's ReGex, I'm just not familiarized with it)
It's saying that this portion of the regular expression will match either 'login' or 'plain'. The ?: at the beginning tells Perl that these parentheses are used only for clustering, not capturing. In other words, the result of this portion of the pattern match will not be stored in the $1, $2, $3, etc variables.
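To see the difference between clustering and capturing, here is a short sketch. The sample string is made up and only loosely resembles a log fragment.

```perl
use strict;
use warnings;

my $line = 'A=dovecot_login:bob@example.com';

# (?:...) only groups the alternation, so the address is the first capture.
my @clustered = $line =~ /dovecot_(?:login|plain):(\S+)/;

# Plain (...) captures too, so the address is pushed to the second capture.
my @captured = $line =~ /dovecot_(login|plain):(\S+)/;

print "@clustered\n";    # bob@example.com
print "@captured\n";     # login bob@example.com
```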
-MO=Deparse is your friend for understanding one-liners (and one liners that wrap into five lines on your terminal):
$ perl -MO=Deparse -lsne '/$today.* \[([0-9.]+)\]:.+dovecot_( ...
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE:
while ( defined($_ = <ARGV>) ) {
chomp $_;
$sender{$2}{'i'}{$1} = 1 if
/$today.* \[([0-9.]+)\]:.+dovecot_(?:login|plain):([^\s]+).* for (.*)/
and $sender{$2}{'r'} += scalar split(/ /, $3, 0);
sub END {
foreach $sender (keys %sender) {
printf "Recip=%05d Hosts=%03d Auth=%s\n",
$sender{$sender}{'r'},
scalar keys %{$sender{$sender}{'i'};}, $sender;
}
}
}
-e syntax OK
[newlines and indentation added for clarity]
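As a pointer towards the threshold check the question describes, the same pattern can be reused inside an ordinary sub. This is only a sketch: senders_over_limit is a hypothetical name, and the lines are assumed to be in whatever format the original one-liner's pattern already matches.

```perl
use strict;
use warnings;

# Hypothetical helper: return the authenticated users whose total recipient
# count for $today exceeds $limit. The pattern is the one from the one-liner.
sub senders_over_limit {
    my ($today, $limit, @lines) = @_;
    my %recipients;
    for my $line (@lines) {
        next unless $line =~
            /\Q$today\E.* \[([0-9.]+)\]:.+dovecot_(?:login|plain):(\S+).* for (.*)/;
        my ($user, $rcpt_list) = ($2, $3);     # copy the captures before reusing the regex engine
        my @rcpts = split ' ', $rcpt_list;     # one element per recipient address
        $recipients{$user} += @rcpts;          # an array in numeric context is its length
    }
    my @over = grep { $recipients{$_} > $limit } sort keys %recipients;
    return @over;
}

# Usage: my @suspects = senders_over_limit($today, 500, @log_lines);
```

A cron job could read the log lines, call the sub, and send a notification whenever the returned list is not empty.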
What does the flag -lsne does? (From what I know, it's got to be, at least, 3 different flags in one)
You can get a summary of the available perl command-line options by running 'perl -h' in the terminal. The command below filters that summary down to the options you were asking about.
~$ perl -h|perl -ne 'print if /^\s+(-l|-s|-n|-e)/'
-e program one line of program (several -e's allowed, omit programfile)
-l[octal] enable line ending processing, specifies line terminator
-n assume "while (<>) { ... }" loop around program
-s enable rudimentary parsing for switches after programfile
Two examples of the '-s' command line option in use.
~$ perl -se 'print "Todays date is $today\n"' -- -today=`date +%F`
Todays date is 2016-10-17
~$ perl -se 'print "The sky is $color.\n"' -- -color='blue'
The sky is blue.
For detailed explanations of those command line options read the online documentation below.
http://perldoc.perl.org/perlrun.html
Or run the command below from your terminal.
~$ perldoc perlrun
Unrelated to the questions of the OP, I'm aware that this is not a complete answer (added as much as I was able to at the moment), so if this post/answer violates any SO rules, the moderators should remove it. Thx.

How to get first file in directory by alphabetical order?

I want to execute $filename = `ls *.gz | head -n 1`; through perl but I think the pipe is causing an error. Execution of -e aborted due to compilation errors.
This will be part of the perl script rather than run via -e.
What would be the correct way to do this?
How about
my $filename = (sort glob "*.gz")[0];
You asked for alphabetical order, hence the sort, which by default uses the "standard string comparison order". Note that ls can be aliased, and its defaults may also depend on the system.
Going out to the shell would make sense only if you use some particular strengths of ls, that would take a lot of work to do in Perl. For mere sorting there is no reason to go out of your program. It is far less efficient and adds a whole list of new problems to solve.
A good point was raised by mob. Because of sort invocation, sort BLOCK|SUB LIST, one could wonder whether glob above could be taken in an unintended way, as a SUB. It's not, as the builtin certainly runs first. However, that's a little close and this is just clearer
my ($filename) = sort (glob "*.gz");
zdim's solution uses sort and glob, both of which do a lot of needless work.
The glob is something I explain in Wasting time thinking about wasted time, in which I look at some really bad benchmarks we had in Intermediate Perl.
The sort compares a bunch of files to each other, even if it already knows that the two files it wants to compare cannot be the one you want. You only care about which one comes first. A linear scan does just fine (and we show an example of that in Learning Perl to find the maximum number in a list):
opendir my $dh, $dir or die "Could not open dir $dir: $!";
my $first;
while( my $file = readdir $dh ) {
# take the first name seen, then any name that sorts before the current best
$first = $file if !defined $first or $file lt $first;
}
It's a bit more complicated as you probably want to filter out some files (the virtual dirs . and .., and maybe all the hidden files), but a grep handles that:
my @files = grep { ! /\A\.\.?\z/ } readdir $dh;
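Putting the scan and the filter together, one possible sketch uses minstr from the core List::Util module, which makes a single pass over the list. The name first_gz is hypothetical, and skipping every hidden entry is an assumption about what you want.

```perl
use strict;
use warnings;
use List::Util qw(minstr);

# Hypothetical helper: alphabetically first *.gz name in $dir, or undef if none.
sub first_gz {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Could not open dir $dir: $!";
    # Skip hidden entries (which covers . and ..) and keep only *.gz names.
    my @names = grep { !/\A\./ && /\.gz\z/ } readdir $dh;
    closedir $dh;
    return minstr @names;    # one linear pass, no full sort
}

# Usage: my $filename = first_gz('.');
```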
Even better, this wouldn't even have to know how to get the next file. Some other sort of iterator would provide that without high_water knowing how it works. That's a more complicated example that I won't present here (and is one of those areas where Perl should look to Python, and that I demonstrate in Object::Iterate).
But, sometimes the easier, less pure thing is a good enough solution. Maybe not for this problem, but for some other problem. For example, if you have an external process (ls) generate all the input, all its resources can be freed once it does its job. A Perl pipe can do that:
$ perl -e 'open my $ph, q(-|), q(/bin/ls *.gz); print scalar <$ph>'
a.gz
Or, with head as well (notice the disappearing scalar) so I print all the lines instead of the first line (even though in this example there is only one line):
$ perl -e 'open my $ph, q(-|), q(/bin/ls *.gz | /usr/bin/head -n 1); print <$ph>'
a.gz
In your simple case, where you only get one line of input, the backticks can do the job:
$ perl -e 'print `/bin/ls *.gz | /usr/bin/head -n 1`'
a.gz
I don't know what problems you were having with your one-liner, but that's something to consider: the shell things tend to be fragile and difficult to get right. Then, when you get it right, someone else messes it up. Or simple translations from Perl strings to the shell don't come out as expected.
I suspect that you had quoting issues, which is why I use generalized quoting, q(), inside my Perl one-liners. It gets even more fun on Windows where you can't use single ticks to quote the argument to -e, but if you use double ticks in a unix shell, you get interpolation.
Remember though, asking for an external process means you have to be really careful. I used /bin/ls to be sure I got what I wanted and not some other thing in my path (although you can also limit $ENV{PATH}). I write much more about that in Mastering Perl, although perlsec has some advice too.
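One way to reduce the quoting risk is the list form of a piped open, which runs the command without a shell. The sketch below assumes ls lives in /bin or /usr/bin, and first_via_ls is a made-up name. Note that without a shell there is no *.gz expansion, so this lists the whole directory.

```perl
use strict;
use warnings;

# Hypothetical helper: first line of ls output for $dir.
# The list form of open bypasses the shell, so $dir is never re-parsed.
sub first_via_ls {
    my ($dir) = @_;
    local $ENV{PATH} = '/bin:/usr/bin';      # assumed location of ls
    open my $ph, '-|', 'ls', '--', $dir or die "Cannot run ls: $!";
    my $first = <$ph>;                       # scalar context reads one line only
    close $ph;
    chomp $first if defined $first;
    return $first;
}

# Usage: my $filename = first_via_ls('.');
```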

How can I convert Perl script into one-liner

I know of the Perl compiler back-end that lets you convert any one-liner into a script, in the following manner:
perl -MO=Deparse -pe 's/(\d+)/localtime($1)/e'
Which would give the following output
LINE: while (defined($_ = <ARGV>)) {
s/(\d+)/localtime($1);/e;
}
continue {
print $_;
}
Is there a reverse tool, usable from the command line, which, given a full script, will generate a one-liner version of it?
Note: The above example was taken from https://stackoverflow.com/a/2822721/4313369.
Perl is a free-form syntax language with clear statement and block separators, so there is nothing preventing you from simply folding a normal script up into a single line.
To use your example in reverse, you could write it as:
$ perl -e 'LINE: while (defined($_ = <ARGV>)) { s/(\d+)/localtime($1);/e; } continue { print $_; }'
This is a rather contrived example, since there is a shorter and clearer way to write it. Presumably you're starting with scripts that are already as short and clear as they should be.
Any use statements in your program can be turned into -M flags.
Obviously you have to be careful about quoting and other characters that are special to the shell. You mention running this on a remote system, by which I assume you mean SSH, which means you now have two levels of shell to sneak any escaping through. It can be tricky to work out the proper sequence of escapes, but it's always doable.
This method may work for automatically translating a Perl script on disk into a one-liner:
$ perl -e "$(tr '\n' ' ' < myscript.pl)"
The Perl script can't have comments in it, since that would comment out the entire rest of the script. If that's a problem, a bit of egrep -v '\w+#' type hackery could solve the problem.
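The comment problem can also be handled in Perl itself. The following is a rough sketch, not a real parser: fold_script is a hypothetical name, and it only drops the shebang and whole-line comments. Trailing comments, POD and heredocs are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: fold script text onto one line, dropping blank lines,
# the shebang and whole-line comments. Nothing smarter than that is attempted.
sub fold_script {
    my ($text) = @_;
    my @code = grep { !/\A\s*#/ && /\S/ } split /\n/, $text;
    s/\A\s+|\s+\z//g for @code;      # trim each remaining line in place
    return join ' ', @code;
}

# Usage: print fold_script($script_text), "\n";
```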

what does this Perl foreach do?

I am stuck while understanding what this foreach does, I am new to perl programming
-e && print ("$_\n") foreach $base_name, "build/$base_name";
Here, build is a directory. Thanks
Not very pretty code somebody left you with there. :(
Either of
for ($basename, "build/$basename") { say if -e }
or
for my $file ($basename, "build/$basename") {
say $file if -e $file;
}
would be clearer.
It checks whether the file exists and, if it does, prints its name.
It can be written in a clearer way.
It iterates over $base_name and then build/$base_name, with the current file name held in $_.
it is essentially the same as:
foreach( $base_name, "build/$base_name" ){
if( -e ){
print ("$_\n");
}
}
I hope whoever wrote that isn't prolific.
How many ways to do the same thing in a clearer fashion?
grep
grep filters a list and removes anything for which the check condition returns false.
The current value under consideration is set to $_, so it is very convenient to use file test operators.
Since say and print both handle lists of strings well, it makes sense to filter output before passing it to them.
say grep -e, $base_name, "build/$base_name";
On older perls without say, map is sometimes used to apply newlines to the output before it is printed.
print map "$_\n", grep -e, $base_name, "build/$base_name";
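The grep approach is easy to wrap in a named sub. A minimal sketch, where existing_files is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the paths that exist on disk.
sub existing_files {
    my @paths = @_;
    return grep { -e $_ } @paths;    # grep aliases each path to $_ in turn
}

# Usage: print "$_\n" for existing_files($base_name, "build/$base_name");
```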
for
Here we safely use a statement modifier if. IMHO, statement modifier forms can be very handy as long as the LHS is kept very simple.
for my $dir ( $base_name, "build/$base_name" ) {
print "$dir\n" if -e $dir;
}
punctuation
Still ugly as sin, but breaking up the line and using some extra parens helps identify what is being looped over and separate it from the looping code.
-e && print("$_\n")
foreach( $base_name, "build/$base_name" );
It's better, but please don't do this. If you want to do ($foo if $bar) for @baz; DON'T. Use a full-sized block. I can count one time in my life where it felt OK to use a $bar and $foo for @baz construct like this one, and it was probably wrong to use it.
sub
The OP's mystery code is nicer if it is merely wrapped in a sub with a good name.
I'll write this in the ugliest way I know how:
sub print_existing_files { -e && print ("$_\n") foreach @_; }
The body of the sub remains a puzzle, but at least there's some kind of clue in the name.
Conclusion
I'm sure I left out many variations on how this could be done (I don't have anything that uses until, and damn near anything that isn't intentionally obfuscated would be clearer). But that is beside the point.
The point of this is really to say that in any language there are many ways to achieve any given task. It is, therefore, important that each programmer who works on a system remain mindful of those who follow.
Code exists to serve two purposes: to communicate with future programmers and to instruct the computer how to operate--in that order.

How do I specify the "before-the loop" code when using "perl -ne"?

How do I specify the "before-the loop" code when using "perl -ne", without resorting to either BEGIN/END blocks or replacing "-n" with actually spelled-out while loop?
To explain in detail:
Say, I have the following Perl code:
use MyModule;
SETUP_CODE;
while (<>) {
LOOP_CODE;
}
FINAL_CODE;
How can I replace that with a one-liner using perl -ne?
Of course, the loop part is handled by the -n itself, while the FINAL_CODE can be done using a trick of adding "} { FINAL_CODE" at the end; whereas the use statement can be handled via "-M" parameter.
So, if we had no SETUP_CODE before the loop, I could write the following:
perl -MMyModule -ne 'LOOP_CODE } { FINAL_CODE'
But, how can we insert SETUP_CODE here?
The only idea I have is to try to add it after the loop via a BEGIN{} block, ala
perl -MMyModule -ne 'LOOP_CODE } BEGIN { SETUP_CODE } { FINAL_CODE'
But this seems at best hacky.
Any other solution?
Just to be clear - I already know I can do this by either spelling out the while loop instead of using "-n" or by using BEGIN/END blocks (and might even agree that from certain points of view, doing "while" is probably better).
What I'm interested in is whether there is a different solution.
Write BEGIN and END blocks without ceremony:
$ perl -lne 'BEGIN { print "hi" }
print if /gbacon/;
END { print "bye" }' /etc/passwd
hi
gbacon:x:700:700:Greg Bacon,,,:/home/gbacon:/bin/bash
bye
Sneak your extra code into the -M option
perl -M'Module;SETUP CODE' -ne 'LOOP CODE'
$ perl -MO=Deparse -M'MyModule;$SETUP=1' -ne '$LOOP=1}{$FINAL=1'
use MyModule;
$SETUP = 1;
LINE: while (defined($_ = <ARGV>)) {
$LOOP = 1;
}
{
$FINAL = 1;
}
-e syntax OK
Put your extra code in a module and use -M. That’ll run before the loop.
You might even be able to sneak something in via $ENV{PERL5OPT}, although the switches are pretty limited; no -e or -E, for example.
I suppose you could do something outrageous with $ENV{PERL_ENCODING} too, if you really wanted to.
This is all Acme:: territory. Please don’t. ☹
EDIT: The only solution I much like is the very uncreative and completely straightforward INIT{}.
Remove the -n and add while (<>) { ... }.
What? It's shorter and more straightforward than the BEGIN thing.
Maybe there is some obscure way to achieve this with -ne, but yeah, it is much easier just to use perl -e and code the while(<>) loop yourself.
You can have multiple -e switches on the command line.
Assuming test contains
1
2
3
4
perl -e '$a=5;' -ne '$b+=$_ + $a}{print "$b\n"' test
will print 30.
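It may help to see what that command expands to. With -n, every -e fragment ends up inside the implicit loop, so the assignment from the first -e is repeated on each iteration rather than run once beforehand. The sketch below spells this out with lexical variables ($setup and $sum stand in for $a and $b) and reads the sample input from an in-memory filehandle instead of the file test.

```perl
use strict;
use warnings;

# Stand-in for the file "test" from the example above.
my $input = "1\n2\n3\n4\n";
open my $in, '<', \$input or die "Cannot open in-memory file: $!";

my ($setup, $sum);
while (defined(my $line = <$in>)) {
    chomp $line;
    $setup = 5;                # from the first -e; runs on every iteration
    $sum += $line + $setup;
}
{                              # the bare block opened by the }{ trick
    print "$sum\n";            # 30
}
```

The result is the same here because the assignment is idempotent. Setup code with side effects (opening a file, printing a header) would be repeated for every input line.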