I am writing a script that looks at an access_log file to see how many times each search engine was accessed and to see which one is accessed the most. I am sure there are problems with some of my syntax, but I can't even tell since I am not receiving any information back when running it. Any help would be appreciated!
Code:
#!/usr/bin/perl
use 5.010;
$googleCount = 0;
$msnCount = 0;
$yahooCount = 0;
$askCount = 0;
$bingCount = 0;
while (<STDIN>)
{
if (/(google.com)/)
{
$googleCount++;
}
if (/(msn.com)/)
{
$msnCount++;
}
if (/yahoo.com/)
{
$yahooCount++;
}
if (/ask.com/)
{
$askCount++;
}
if (/bing.com/)
{
$bingCount++;
}
}
print "Google.com was accessed $googleCount times in this log.\n";
print "MSN.com was accessed $msnCount times in this log.\n";
print "Yahoo.com was accessed $yahooCount times in this log.\n";
print "Ask.com was accessed $askCount times in this log.\n";
print "Bing.com was accessed $bingCount times in this log.\n";
I am running MacOS. In the terminal I am typing:
perl -w access_scan.pl access_log.1
When I press enter, nothing happens.
Besides the fact that your script didn't work as you expected, there are a few other things wrong with it:
In regexes, the dot . matches any non-newline character. This includes a literal period, but is not restricted to that. Either escape it (/google\.com/) or protect special characters with \Q...\E: /\Qgoogle.com\E/.
There is a programming proverb: “Three or more, use a for”. All the conditionals inside your loop are the same except for the regex, your counters are really one variable keyed by site, and your report at the end is the same line repeated.
You can use a hash to ease the pain:
#!/usr/bin/perl
use strict; use warnings; use feature 'say';
my %count; # a hash is a mapping of strings to scalars (e.g. numbers)
my @sites = qw/google.com msn.com yahoo.com ask.com bing.com/;
# initialize the counts we are interested in:
$count{$_} = 0 foreach @sites;
while (<>) { # accept input from files specified as command line options or STDIN
    foreach my $site (@sites) {
        $count{$site}++ if /\Q$site\E/i; # /i for case insensitive matching
    }
}
foreach my $site (@sites) {
    say "\u$site was accessed $count{$site} times in this log";
}
The \u uppercases the next character, which is required to produce output identical to yours.
The say is exactly like print, but appends a newline. It is available in perl5 v10 or later.
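To make the point about the unescaped dot concrete, here is a minimal sketch. The log line is made up for illustration; it contains "googleXcom" rather than "google.com", so only the unescaped pattern should match it.

```perl
use strict;
use warnings;

# Hypothetical log fragment: the host is "googleXcom", not "google.com".
my $line = 'GET http://googleXcom.example/ HTTP/1.1';

my $loose   = $line =~ /google.com/     ? 1 : 0;  # unescaped dot matches the "X"
my $escaped = $line =~ /google\.com/    ? 1 : 0;  # requires a literal dot
my $quoted  = $line =~ /\Qgoogle.com\E/ ? 1 : 0;  # same effect via \Q...\E

print "loose=$loose escaped=$escaped quoted=$quoted\n";
```

On real access logs the difference is usually small, but escaping costs nothing and avoids counting look-alike hosts.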
The script is trying to read from STDIN, but you are providing the filename to read from as an argument.
"Nothing happens" because the script is waiting for input (which, since you haven't redirected anything to standard input, it expects you to type).
Change <STDIN> to <> or change the command to perl -w access_scan.pl < access_log.1
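A small sketch of why <> fixes it: the diamond operator opens whatever file names are in @ARGV, while <STDIN> never looks at them. The temporary file below is only a stand-in for access_log.1, and @ARGV is set by hand to simulate the command line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway file standing in for access_log.1.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "google.com\nbing.com\ngoogle.com\n";
close $tmp_fh or die "close: $!";

# Simulate "perl access_scan.pl access_log.1": the file name arrives in @ARGV.
local @ARGV = ($tmp_name);

my $lines = 0;
$lines++ while <>;   # <> opens each file named in @ARGV; <STDIN> would ignore them

print "read $lines lines\n";
```

If @ARGV is empty, <> falls back to STDIN, so the same script also works with shell redirection.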
Your script is reading from stdin, but you're providing your input as a file. You need to redirect thus:
perl -w access_scan.pl < access_log.1
The < file construct provides the contents of your file as the standard input for your script.
The script works fine (I tested it), but you need to feed it the log on STDIN:
cat access_log.1 | perl -w access_scan.pl
Related
I made a small script in perl that displays the content of a file sent by a user.
However I noticed something strange: if the file is named 0, nothing will be printed, like if I didn't send any file and just refreshed the page.
How can this happen?
Is there any risk of someone dropping in the filename command to make my server execute it? (with the pipe thing)
Here is the code:
#!/usr/bin/perl
use CGI;
my $cgi = CGI->new;
print "Content-type: text/html\n\n";
if ($cgi->upload('file')) {
print '<h1>file uploaded:</h1>';
my $file = $cgi->param('file');
while (<$file>) {
print "a";
print "<p>".$cgi->escapeHTML($_)."</p>";
}
}
Because the string 0, like the empty string and undef, evaluates to false in a boolean context such as
if ($cgi->upload('file')) { ... }
In filenames and text processing, this is an edge case that is usually more trouble than it's worth to think about, but when you do need to worry about it, the workarounds are to test explicitly for the empty string or for a non-zero length:
if ($cgi->upload('file') ne '') { ... }
if (length($cgi->upload('file'))) { ... }
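The difference between the plain boolean test and the length-based one can be seen without CGI at all. In this sketch has_name is a hypothetical helper and the three strings are made-up file names.

```perl
use strict;
use warnings;

# Hypothetical helper: true for any defined, non-empty name, including "0".
sub has_name {
    my ($name) = @_;
    return (defined $name && length $name) ? 1 : 0;
}

for my $name ('0', '', 'report.txt') {
    my $plain = $name           ? 'true' : 'false';  # plain boolean test
    my $safe  = has_name($name) ? 'true' : 'false';  # length-based test
    print "name='$name' plain=$plain length-based=$safe\n";
}
```

Only the name "0" gets a different answer from the two tests, which is exactly the case from the question.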
Early versions of the CGI module open the temporary file with sysopen and modern versions use File::Temp. Either way is sufficient to ensure that Perl is attempting to open a real file and will not use a shell that can be tricked by pipes or backticks into executing an arbitrary command.
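As a rough illustration of the same idea with plain open (not what CGI does internally, which per the above is sysopen or File::Temp): with the three-argument form and an explicit '<' mode, a name ending in a pipe is treated as a literal file name. The hostile-looking name below is invented, and the sketch assumes no file of that name exists in the current directory.

```perl
use strict;
use warnings;

# Hypothetical hostile "file name" that two-argument open would run as a command.
my $name = 'echo pwned |';

# Three-argument open with an explicit mode never interprets the pipe.
my $opened = open(my $fh, '<', $name) ? 1 : 0;

if ($opened) {
    close $fh;
    print "opened a file literally named '$name'\n";
}
else {
    print "not opened: $!\n";   # typically "No such file or directory"
}
```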
I would probably use autoEscape ().
Another option is to simply test for the file name being "0" and asking the user to change it if it is.
Also, you could use temporary filenames.
See the official CGI documentation for more information.
I am executing my script this way:
./script.pl -f files*
I looked at some other threads (like How can I open a file in Perl using a wildcard in the directory name?)
If I hard-code the file name as it is written in that thread, I get my desired result. If I take it from the command line, I do not.
My options subroutine should save all the files I get this way in an array.
my @file;
sub Options{
    my $i=0;
    foreach my $opt (@ARGV){
        switch ($opt){
            case "-f" {
                $i++;
                ### This part does not work:
                @file= glob $ARGV[$i];
                print Dumper("$ARGV[$i]"); #$VAR1 = 'files';
                print Dumper(@file); #$VAR1 = 'files';
            }
        }
        $i++;
    }
}
It seems the execution is interpreted in advance and the wildcard (*) is dropped in the process.
Desired result: All files beginning with files are saved in an array, after execution from the command line.
I hope you get my problem. If not feel free to ask.
Thank you.
Well, first I'd suggest using a module to do args on command line:
Getopt::Long for example.
But otherwise your problem is simpler: your shell is expanding files* before perl ever sees it (the shell glob gets there first).
If you quote the pattern:
-f 'files*'
then it'll work properly. You can see what perl actually receives if you just run:
use Data::Dumper;
print Dumper \@ARGV;
I expect you'll see a much longer list than you thought.
However, I'd also point out - perl has a really nice feature you may be able to use (depending what you're doing with your files).
You can use <>, which automatically opens and reads all files specified on command line (in order).
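Here is a hedged sketch of the Getopt::Long suggestion, assuming the shell has already expanded files* into separate arguments. The file names are hypothetical, GetOptionsFromArray is used only so the example does not depend on a real command line, and the {1,} repeat specifier is documented by Getopt::Long as an experimental feature.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for @ARGV after the shell has expanded "files*" (names are hypothetical).
my @args = ('-f', 'files1', 'files2', 'files3');

# 'f=s{1,}' means: -f takes one or more string values, collected into @files.
my @files;
GetOptionsFromArray(\@args, 'f=s{1,}' => \@files)
    or die "bad command line\n";

print scalar(@files), " file(s): @files\n";
```

In a real script you would call GetOptions with the same specification and let it read @ARGV directly.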
Since your shell is already expanding the glob files* into a list of filenames, that's what the Perl program gets.
$ perl -E 'say @ARGV' files*
files1files2files3
There's no need to do that in Perl if your shell can do it for you. If all you want is the filenames in an array, you already have @ARGV, which contains them.
I'm trying to modify a script that someone else has written and I wanted to keep my script separate from his.
The script I wrote ends with a print line that outputs all relevant data separated by spaces.
Ex: print "$sap $stuff $more_stuff";
I want to use this data in the middle of another perl script and I'm not sure if it's possible using a system call to the script I wrote.
Ex: system("./sap_calc.pl $id"); #obtain printed data from sap_calc.pl here
Can this be done? If not, how should I go about this?
Somewhat related, but not using system():
How can I get one Perl script to see variables in another Perl script?
How can I pass arguments from one Perl script to another?
You're looking for the "backtick operator."
Have a look at perlop, Section "Quote-like operators".
Generally, capturing a program's output goes like this:
my $output = `/bin/cmd ...`;
Mind that the backtick operator captures STDOUT only. To capture everything (STDERR too), append the usual shell redirection 2>&1 to the command.
If you want to use the data printed to stdout from the other script, you'd need to use backticks or qx().
system will only return the return value of the shell command, not the actual output.
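The three behaviours described so far can be compared side by side. This is only a sketch assuming a Unix-like sh; the echo and exit commands are placeholders for sap_calc.pl.

```perl
use strict;
use warnings;

# Assumes a Unix-like sh; the commands are placeholders for sap_calc.pl.

# system() returns the wait status, not the output; >> 8 extracts the exit code.
my $status = system('sh', '-c', 'exit 3') >> 8;

# Backticks capture STDOUT only; "err" goes straight to the terminal.
my $stdout = `echo out; echo err 1>&2`;

# With 2>&1 on the whole command, STDERR is folded into the captured text.
my $combined = `( echo out; echo err 1>&2 ) 2>&1`;

print "exit code: $status\n";
print "stdout only: $stdout";
print "stdout and stderr: $combined";
```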
Although the proper way to do this would be to import the actual code into your other script, either by building a module or simply by using do.
As a general rule, it is better to use all-Perl solutions than to rely on system/shell as a way of "simplifying".
myfile.pl:
sub foo {
print "Foo";
}
1;
main.pl:
do './myfile.pl'; # explicit ./ because "." is no longer in @INC since Perl 5.26
foo();
perldoc perlipc
Backquotes, like in shell, will yield the standard output of the command as a string (or array, depending on context). They can more clearly be written as the quote-like qx operator.
@lines = `./sap_calc.pl $id`;
@lines = qx(./sap_calc.pl $id);
$all = `./sap_calc.pl $id`;
$all = qx(./sap_calc.pl $id);
open can also be used for streaming instead of reading into memory all at once (as qx does). This can also bypass the shell, which avoids all sorts of quoting issues.
open my $fh, '-|', './sap_calc.pl', $id;
while (readline $fh) {
print "read line: $_";
}
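To show what bypassing the shell buys you, here is a self-contained sketch of the list form of a piped open. It runs the current perl interpreter ($^X) as the child instead of sap_calc.pl, and the two arguments are made up; one contains a space and the other a single quote, and both should arrive intact. It assumes a Unix-like system.

```perl
use strict;
use warnings;

# Child command as a list: the current perl, echoing each argument on its own line.
my @child = ($^X, '-e', 'print "$_\n" for @ARGV', 'two words', "it's");

# List form of the piped open: no shell is involved, so nothing needs quoting.
open(my $fh, '-|', @child) or die "cannot start child: $!";

my @lines;
while (my $line = <$fh>) {
    chomp $line;
    push @lines, $line;
}
close $fh or die "child failed: $! (status $?)";

print "got: [$_]\n" for @lines;
```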
I have a perl script file called as xyz.prl.
If I run this in command prompt, then it will ask for some command line inputs.
So I have placed all the inputs in a separate file called as input.txt.
Then I used the following command in the command window.
D:>xyz.prl < input.txt
But it does not take the values from input.txt; it goes into an infinite loop asking for the first value.
If I run only xyz.prl, it asks for the input values and accepts the values I type manually.
Actually, I have to develop a dialog-based VC++ (MFC) application on Windows XP. In it I have to use the system command to run xyz.prl, passing all the arguments as a text file (input.txt). I am very sorry to inform you that xyz.prl is a highly secured file and I cannot share the code.
If I give xyz.prl directly on command prompt, it is asking for the input values one by one. But using system call I cannot send the values like that.
I am entirely new to perl. So please let me know the command that I have to pass to system command.
Thanks,
Segu
With the shell redirection xyz.pl < input.txt I believe you are giving the input from the file through STDIN, which can be read with <STDIN> or the diamond operator <>.
However, the generic way to read data from a file is:
Usage:
$ xyz.pl input.txt
Code:
use strict;
use warnings;
use ARGV::readonly;
while (<>) {
# $_ variable contains each line from the file
}
That's because input.txt won't be passed as a parameter -- it will be accessible as a stream. In the example below it's the "while(<>)"
http://alumnus.caltech.edu/~svhwan/prodScript/perlGettingInput.html
#!/bin/sh
#! -*- perl -*-
eval 'exec $PERLLOCATION/bin/perl -x $0 ${1+"$@"} ;'
if 0;
$okayToPrint = 0;
while (<>) {
my $currLine = $_;
if ($currLine eq "WorldBegin\n") {
$okayToPrint = 1;
} elsif ($currLine eq "WorldEnd\n") {
$okayToPrint = 0;
} elsif ($okayToPrint) {
# some line between WorldBegin and WorldEnd
print $currLine;
}
}
I've had a search around, and from my perspective using backticks is the only way I can solve this problem. I'm trying to call the mdls command from Perl for each file in a directory to find its last accessed time. The issue I'm having is that the file names I get from find contain unescaped spaces, which bash obviously doesn't like. Is there an easy way to escape all of the white space in my file names before passing them to mdls? Please forgive me if this is an obvious question. I'm quite new to Perl.
my $top_dir = '/Volumes/hydrogen/FLAC';
sub wanted { # Learn about sub routines
if ($File::Find::name) {
my $curr_file_path = $File::Find::name. "\n";
`mdls $curr_file_path`;
print $_;
}
}
find(\&wanted, $top_dir);
If you JUST want the "last access time" in the sense of the OS last access time, mdls is the wrong tool. Use perl's stat. If you want the last access time in terms of the Mac registered application (i.e., a song played by QuickTime or iTunes), then mdls is potentially the right tool. (You could also use osascript to query the Mac app directly...)
Backticks are for capturing the text return. Since you are using mdls, I assume capturing and parsing the text is still to come.
So there are several methods:
Use the list form of system, in which case no quoting is necessary (if you don't care about the returned text);
Use String::ShellQuote to escape the file name before sending it to sh;
Build the string and enclose it in single quotes prior to sending it to the shell. This is harder than it sounds because file names containing single quotes defeat your quotes! For example, sam's song.mp4 is a legal file name, but if you surround it with single quotes you get 'sam's song.mp4', which is not what you meant...
Use open to open a pipe to the output of the child process like this: open my $fh, '-|', "mdls", "$curr_file" or die "$!";
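To illustrate why the hand-rolled single-quote approach is harder than it sounds, here is a sketch of what a correct version has to do. sh_single_quote is a hypothetical helper written for this example, not a function from any module, and the round trip through printf assumes a POSIX-style shell.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any module: wrap a string in single quotes
# for a POSIX shell, rewriting each embedded ' as '\'' .
sub sh_single_quote {
    my ($s) = @_;
    $s =~ s/'/'\\''/g;
    return "'$s'";
}

my $name   = "sam's song.mp4";
my $naive  = "'$name'";                # unbalanced quotes once the shell sees it
my $quoted = sh_single_quote($name);   # safe to paste into a shell command

print "naive:  $naive\n";
print "quoted: $quoted\n";

# Round trip through the shell: printf should hand back the original name.
my $echoed = `printf '%s' $quoted`;
print "shell saw: $echoed\n";
```

In practice, String::ShellQuote or one of the shell-free options above saves you from maintaining a helper like this yourself.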
Example of String::ShellQuote:
use strict; use warnings;
use String::ShellQuote;
use File::Find;
my $top_dir = '/Users/andrew/music/iTunes/iTunes Music/Music';
sub wanted {
if ($File::Find::name) {
my $curr_file = "$File::Find::name";
my $rtr;
return if -d;
my $exec="mdls ".shell_quote($curr_file);
$rtr=`$exec`;
print "$rtr\n\n";
}
}
find(\&wanted, $top_dir);
Example of pipe:
use strict; use warnings;
use String::ShellQuote;
use File::Find;
my $top_dir = '/Users/andrew/music/iTunes/iTunes Music/Music';
sub wanted {
if ($File::Find::name) {
my $curr_file = "$File::Find::name";
my $rtr;
return if -d;
open my $fh, '-|', "mdls", "$curr_file" or die "$!";
{ local $/; $rtr=<$fh>; }
close $fh or die "$!";
print "$rtr\n\n";
}
}
find(\&wanted, $top_dir);
If you're sure the filenames don't contain newlines (either CR or LF), then pretty much all Unix shells accept backslash quoting, and Perl has the quotemeta function to apply it.
my $curr_file_path = quotemeta($File::Find::name);
my $time = `mdls $curr_file_path`;
Unfortunately, that doesn't work for filenames with newlines, because the shell handles a backslash followed by a newline by deleting both characters instead of just the backslash. So to be really safe, use String::ShellQuote:
use String::ShellQuote;
...
my $curr_file_path = shell_quote($File::Find::name);
my $time = `mdls $curr_file_path`;
That should work on filenames containing anything except a NUL character, which you really shouldn't be using in filenames.
Both of these solutions are for Unix-style shells only. If you're on Windows, proper shell quoting is much trickier.
If you just want to find the last access time, is there some weird Mac reason you aren't using stat? When would it be worse than kMDItemLastUsedDate?
my $last_access = ( stat($file) )[8];
It seems kMDItemLastUsedDate isn't always updated to the last access time. If you work with a file through the terminal (e.g. cat, more), kMDItemLastUsedDate doesn't change but the value that comes back from stat is right. touch appears to do the right thing in both cases.
It looks like you need stat for the real answer, but mdls if you're looking for access through applications.
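For completeness, a minimal sketch of the stat route. The temporary file is only a stand-in for one of the music files; element 8 of the list returned by stat is the access time in seconds since the epoch.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway file standing in for one of the files found by File::Find.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "placeholder\n";
close $fh or die "close: $!";

# Element 8 of stat's return list is the last access time (epoch seconds).
my $last_access = (stat $file)[8];
defined $last_access or die "stat failed: $!";

print "last access: ", scalar localtime($last_access), "\n";
```

Inside a File::Find wanted callback the same expression can be applied to $File::Find::name, with no shell and no quoting involved.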
You can bypass the shell by expressing the command as a list, combined with capture() from IPC::System::Simple:
use IPC::System::Simple qw(capture);
my $output = capture('mdls', $curr_file_path);
Quote the variable name inside the backticks:
`mdls "$curr_file_path"`;
`mdls '$curr_file_path'`;