Perl - Command line input and STDIN

I've created this script below for an assignment I have. It asks for a text file, checks the frequency of words, and lists the 10 words that appear the most times. Everything is working fine, but I need this script to be able to start from the command line as well as via the standard input.
So I need to be able to write 'perl wfreq.pl example.txt' and have that start the script without asking the question for a text file. I'm not sure how to accomplish this. I think I might need a loop at the start somewhere that skips the STDIN prompt if you give it the text file on the command line.
How can I do it?
The script
#! /usr/bin/perl
use utf8;
use warnings;
print "Please enter the name of the file: \n" ;
$file = <STDIN>;
chop $file;
open(my $DATA, "<:utf8", $file) or die "Oops!!: $!";
binmode STDOUT, ":utf8";
while (<$DATA>) {
    tr/A-Za-z//cs;
    s/[;:()".,!?]/ /gio;
    foreach $word (split(' ', lc $_)) {
        $freq{$word}++;
    }
}
foreach $word (sort { $freq{$b} <=> $freq{$a} } keys %freq) {
    @fr  = (@fr, $freq{$word});
    @ord = (@ord, $word);
}
for ($v = 0; $v < 10; $v++) {
    print " $fr[$v] | $ord[$v]\n";
}

Instead of reading from <STDIN>, you can read from <> to get data either from files provided on the command line or from stdin if there are no files.
For example, with the program:
#!/usr/bin/env perl
while (<>) {
    print $_;
}
The command ./script foo.txt will read and print lines from foo.txt, while ./script by itself will read and print lines from standard input.
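Applied to your word-frequency script, a minimal sketch might look like the following. This is an illustration, not a drop-in replacement: it drops the interactive prompt and the tr line, and assumes UTF-8 input is acceptable everywhere.
#!/usr/bin/perl
use strict;
use warnings;
use open qw( :std :encoding(UTF-8) );    # UTF-8 for STDIN/STDOUT and files read via <>

my %freq;
while (<>) {                             # named files, or STDIN if none are given
    s/[;:()".,!?]/ /g;                   # strip punctuation as in the original
    $freq{$_}++ for split ' ', lc;
}

my @by_count = sort { $freq{$b} <=> $freq{$a} } keys %freq;
for my $word ( @by_count[0 .. 9] ) {
    last unless defined $word;           # fewer than ten distinct words
    print " $freq{$word} | $word\n";
}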

You need to do the following:
my $DATA;
my $filename = $ARGV[0];
unless ($filename) {
    print "Enter filename:\n";
    $filename = <STDIN>;
    chomp $filename;
}
open($DATA, '<', $filename) or die $!;
Though I have to say, user prompts are very un-Unix-like.

perl script.pl < input.txt
The < operator redirects input.txt to script.pl as standard input. You can then skip querying for the filename. Otherwise, use $ARGV[0] or similar, if defined.
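If you want the script to handle all three cases (a filename argument, redirected or piped input, and an interactive prompt), a sketch along these lines may help; the -t STDIN test is true only when STDIN is attached to a terminal. This is my illustration of the idea, not part of the original answer:
my $fh;
if (@ARGV) {                    # filename given on the command line
    open $fh, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
}
elsif ( !-t STDIN ) {           # data piped or redirected in
    $fh = \*STDIN;
}
else {                          # interactive: fall back to prompting
    print "Please enter the name of the file: ";
    chomp( my $file = <STDIN> );
    open $fh, '<', $file or die "Can't open $file: $!";
}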

You can check for a command-line argument in @ARGV, Perl's array that automagically grabs command-line arguments, and, if present, process it (else continue with input from STDIN). Something like:
use utf8;
use strict;    # Don't ever forget this! Always, always, ALWAYS use strict!
use warnings;
if (@ARGV)
{
    # Assume that the first command line argument is a file to be processed as in your original code.
    # You may or may not want to care if additional command line arguments are passed. Up to you.
}
else
{
    # Ask for the filename and proceed as normal.
}
Note that you might want to isolate the code for processing the file (i.e., the opening of DATA, going through the lines, counting the words, etc.) into a subroutine that you can easily call (perhaps with the file name as an argument) in each branch of the code; a sketch follows below.
Oh, and to make sure I'm clear about this: always use strict and always use warnings. If you don't, you're asking for trouble.
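Such a refactoring might look something like this (process_file is an illustrative name, not from the original post):
sub process_file {
    my ($filename) = @_;
    open my $data, '<:utf8', $filename or die "Oops!!: $!";
    my %freq;
    while (<$data>) {
        s/[;:()".,!?]/ /g;
        $freq{$_}++ for split ' ', lc;
    }
    return \%freq;
}

my $freq;
if (@ARGV) {
    $freq = process_file( $ARGV[0] );
}
else {
    print "Please enter the name of the file: ";
    chomp( my $file = <STDIN> );
    $freq = process_file($file);
}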

Related

Unable to redirect the output of the system command to a file named error.log and stderr to another file named test_file.errorlog

This Perl script traverses all directories and subdirectories, searching for a file named RUN. It then opens the file and runs the first line written in it. The problem is that I am not able to redirect the output of the system command to a file named error.log and STDERR to another file named test_file.errorlog; no such file is created.
Note that all variables are declared even if not shown here.
find(\&pickup_run, $path_to_search);

### Subroutine for extracting path of directories with RUN FILE PRESENT
sub pickup_run {
    if ($File::Find::name =~ /RUN/) {
        ### If RUN file is present, push it into array named run_file_present
        push(@run_file_present, $File::Find::name);
    }
}

###### Iterate over the array containing paths to directories containing RUN files one by one
foreach my $var (@run_file_present) {
    $var =~ s/\//\\/g;
    ($path_minus_run = $var) =~ s/RUN\b//;
    #print "$path_minus_run\n";
    my $test_case_name;
    ($test_case_name = $path_minus_run) =~ s/expression to be replaced//g;
    chdir "$path_minus_run";
    ######## While iterating over the paths, open each file
    open data, "$var";
    ##### Run the first two lines containing commands
    my @lines = <data>;
    my $return_code = system(" $lines[0] >error.log 2>test_file.errorlog");
    if ($return_code) {
        print "$test_case_name \t \t FAIL \n";
    }
    else {
        print "$test_case_name \t \t PASS \n";
    }
    close(data);
}
The problem is almost certainly that $lines[0] has a newline at the end after being read from the file.
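A minimal fix for just that symptom, applied to the original loop, would be to chomp the command before handing it to system:
chomp( my $cmd = $lines[0] );    # strip the trailing newline read from the file
my $return_code = system("$cmd >error.log 2>test_file.errorlog");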
But there are several improvements you could make:
Always use strict and use warnings at the top of every Perl program, and declare all your variables using my as close as possible to their first point of use
Use the three-parameter form of open and always check whether it succeeded, putting the built-in variable $! into your die string to say why it failed. You can also use autodie to save writing the code for this manually for every open, but it requires Perl v5.10.1 or better
You shouldn't put quotes around scalar variables -- just use them as they are, so chdir $path_minus_run and open data, $var are correct
There is also no need to save all the files to be processed and deal with them later. Within the wanted subroutine, File::Find sets you up with $File::Find::dir set to the directory containing the file, and $_ set to the bare file name without a path. It also does a chdir to the directory for you, so the context is ideal for processing the file
use strict;
use warnings;
use v5.10.1;
use autodie;
use File::Find;

my $path_to_search = '.';    # set this to the root directory you want to search

find( \&pickup_run, $path_to_search );

sub pickup_run {
    return unless -f and $_ eq 'RUN';
    my $cmd = do {
        open my $fh, '<', $_;
        <$fh>;
    };
    chomp $cmd;
    ( my $test_name = $File::Find::dir ) =~ s/expression to be replaced//g;
    my $retcode = system("$cmd >error.log 2>test_file.errorlog");
    printf "%s\t\t%s\n", $test_name, $retcode ? 'FAIL' : 'PASS';
}

'merging' 2 files into a third using perl

I am reviewing for a test and I can't seem to get this example to code out right.
Problem: Write a perl script, called ileaf, which will interleave the lines of a file with those of another file, writing the result to a third file. If the files are of different lengths, the excess lines are written at the end.
A sample invocation:
ileaf file1 file2 outfile
This is what I have:
#!/usr/bin/perl -w
open(file1, "$ARGV[0]");
open(file2, "$ARGV[1]");
open(file3, ">$ARGV[2]");
while (($line1 = <file1>) || ($line2 = <file2>)) {
    if ($line1) {
        print $line1;
    }
    if ($line2) {
        print $line2;
    }
}
This sends the information to the screen so I can immediately see the result. The final version should use "print file3 $line1;". I am getting all of file1, then all of file2, without any interleaving of the lines.
If I understand correctly, this is a result of the "||" in my while loop. The while checks the first comparison and, if it's true, drops into the loop, so it only reads file1. Once file1 is exhausted, the while checks file2 and again drops into the loop.
What can I do to interleave the lines?
You're not getting what you want from while(($line1 = <file1>)||($line2 = <file2>)){ because as long as ($line1 = <file1>) is true, ($line2 = <file2>) never happens.
Try something like this instead:
open my $file1, "<", $ARGV[0] or die;
open my $file2, "<", $ARGV[1] or die;
open my $file3, ">", $ARGV[2] or die;

while (my $f1 = readline($file1)) {
    print $file3 $f1;                    # line from file1
    if (my $f2 = readline($file2)) {     # if there are any lines left in file2
        print $file3 $f2;
    }
}
while (my $f2 = readline($file2)) {      # if there are any lines left in file2
    print $file3 $f2;
}

close $file1;
close $file2;
close $file3;
You'd think if they're teaching you Perl, they'd use the modern Perl syntax. Please don't take this personally; after all, this is how you were taught. However, you should know the modern Perl programming style, because it helps eliminate all sorts of programming mistakes and makes your code easier to understand.
Use the pragmas use strict; and use warnings;. The warnings pragma replaces the need for the -w flag on the command line. It's actually more flexible and better. For example, I can turn off particular warnings when I know they'll be an issue. The use strict; pragma requires me to declare my variables with either my or our. (NOTE: don't declare Perl built-in variables.) 99% of the time, you'll use my. These variables are called lexically scoped, but you can think of them as true local variables. Lexically scoped variables don't have any value outside of their scope. For example, if you declare a variable using my inside a while loop, that variable will disappear once the loop exits.
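For example, this tiny sketch shows the scoping rule in action; uncommenting the last print would be a compile-time error under strict, because $copy no longer exists there:
use strict;
use warnings;

while ( my $line = <STDIN> ) {
    my $copy = $line;    # $copy exists only inside this loop body
}
# print $copy;           # error under strict: $copy is out of scope here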
Use the three parameter syntax for the open statement: In the example below, I use the three parameter syntax. This way, if a file is called >myfile, I'll be able to read from it.
Use locally defined file handles. Note that I use my $file_1_fh instead of simply FILE_1_HANDLE. The old way, FILE_1_HANDLE, is globally scoped, and it's very difficult to pass the file handle to a function. Using lexically scoped file handles just works better.
Use or and and instead of || and &&: They're easier to understand, their operator precedence is better, and they're less likely to cause problems.
Always check whether your open statement worked: You need to make sure your open statement actually opened a file. Or use the use autodie; pragma, which will kill your program if an open statement fails (which is probably what you want to do anyway).
And, here's your program:
#! /usr/bin/env perl
#
use strict;
use warnings;
use autodie;

open my $file_1, "<", shift;
open my $file_2, "<", shift;
open my $output_fh, ">", shift;

for (;;) {
    my $line_1 = <$file_1>;
    my $line_2 = <$file_2>;
    last if not defined $line_1 and not defined $line_2;
    no warnings qw(uninitialized);
    print {$output_fh} $line_1 . $line_2;
    use warnings;
}
In the above example, I read from both files even if they're empty. If there's nothing to read, then $line_1 or $line_2 is simply undefined. After I do my read, I check whether both $line_1 and $line_2 are undefined. If so, I use last to end my loop.
Because my file handle is a scalar variable, I like putting it in curly braces, so people know it's a file handle and not a variable I want to print out. I don't need it, but it improves clarity.
Notice the no warnings qw(uninitialized);. This turns off the uninitialized warning I would otherwise get. I know that either $line_1 or $line_2 might be uninitialized, so I don't want the warning. I turn it back on right below my print statement because it is a valuable warning.
Here's another way to do that for loop:
while ( 1 ) {
    my $line_1 = <$file_1>;
    my $line_2 = <$file_2>;
    last if not defined $line_1 and not defined $line_2;
    print {$output_fh} $line_1 if defined $line_1;
    print {$output_fh} $line_2 if defined $line_2;
}
The infinite loop is a while loop instead of a for loop. Some people don't like the C style of for loop and have banned it from their coding practices. Thus, if you have an infinite loop, you use while ( 1 ) {. To me, maybe because I came from a C background, for (;;) { means infinite loop, and while ( 1 ) { takes a few extra milliseconds to digest.
Also, I check whether $line_1 or $line_2 is defined before I print them out. I guess it's better than using no warnings and use warnings, but I need two separate print statements instead of combining them into one.
Here's another option that uses List::MoreUtils's zip to interleave arrays and File::Slurp to read and write files:
use strict;
use warnings;
use List::MoreUtils qw/zip/;
use File::Slurp qw/read_file write_file/;
chomp( my @file1 = read_file shift );
chomp( my @file2 = read_file shift );
write_file shift, join "\n", grep defined $_, zip @file1, @file2;
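For reference, zip interleaves its arguments element by element and pads the shorter list with undef, which is what the grep defined is stripping out. A tiny demonstration:
use List::MoreUtils qw(zip);

my @a = qw(a1 a2 a3);
my @b = qw(b1 b2);
my @merged = zip @a, @b;    # ('a1', 'b1', 'a2', 'b2', 'a3', undef)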
Just noticed Tim A has a nice solution already posted. This solution is a bit wordier, but might illustrate more explicitly what is going on.
The method I went with reads all of the lines from both files into two arrays, then loops through them using a counter.
#!/usr/bin/perl -w
use strict;

open(IN1, "<", $ARGV[0]);
open(IN2, "<", $ARGV[1]);

my @file1_lines;
my @file2_lines;

while (<IN1>) {
    push(@file1_lines, $_);
}
close IN1;

while (<IN2>) {
    push(@file2_lines, $_);
}
close IN2;

my $file1_items = @file1_lines;
my $file2_items = @file2_lines;

open(OUT, ">", $ARGV[2]);

my $i = 0;
while (($i < $file1_items) || ($i < $file2_items)) {
    if (defined($file1_lines[$i])) {
        print OUT $file1_lines[$i];
    }
    if (defined($file2_lines[$i])) {
        print OUT $file2_lines[$i];
    }
    $i++;
}
close OUT;

Validate perl input - filter out non-existent files

I have a perl script to which I supply input (a text file) from a batch file or sometimes from the command prompt. When I supply input from a batch file, sometimes the file may not exist. I want to catch the "No such file" error and do some other task when it is thrown. Please find the sample code below.
while (<>)    # here it throws an error when the file doesn't exist
{
    # parse the file.
}
# if an error is thrown I want to handle it and do some other task.
Filter @ARGV before you use <>:
@ARGV = grep { -e $_ } @ARGV;
die('no files') unless @ARGV;
# now carry on; if we've got here there is something to do with files that exist
while (<>) {
    #...
}
<> reads from the files listed in @ARGV, so if we filter that before it gets there, it won't try to read non-existent files. I've added the check for the size of @ARGV because if you supply a list of files which are all absent, it will wait on stdin (the flipside of using <>). This assumes that you don't want to do that.
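If you also want to report which files were dropped, the filter can be split so it warns as it goes; a small variation on the above:
warn "No such file: $_\n" for grep { !-e $_ } @ARGV;
@ARGV = grep { -e $_ } @ARGV;
die "no files\n" unless @ARGV;
while (<>) {
    #...
}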
However, if you don't want to read from stdin, <> is probably a bad choice; you might as well step through the list of files in #ARGV. If you do want the option of reading from stdin, then you need to know which mode you're in:
my $have_files = scalar(@ARGV);
@ARGV = grep { -e $_ } @ARGV;
die('no files') if $have_files && scalar(@ARGV) == 0;
# now carry on; if we've got here there is something to do:
# we have files that exist, or we're expecting stdin
while (<>) {
    #...
}
The diamond operator <> means:
- Look at the names in @ARGV and treat them as files you want to open.
- Loop through all of them, as if they were one big file. (Perl actually uses the ARGV filehandle for this purpose.)
- If no command-line arguments are given, use STDIN instead.
So if a file doesn't exist, Perl gives you an error message (Can't open nonexistant_file: ...) and continues with the next file. This is what you usually want. If this is not the case, just do it manually. Stolen from the perlop page:
unshift(@ARGV, '-') unless @ARGV;
FILE: while ($ARGV = shift) {
    open(ARGV, $ARGV);
    LINE: while (<ARGV>) {
        ...    # code for each line
    }
}
The open function returns a false value when a problem is encountered. So always invoke open like
open my $filehandle, "<", $filename or die "Can't open $filename: $!";
The $! variable contains the reason for the failure. Instead of dying, we can do some other error recovery:
use feature qw(say);

@ARGV or @ARGV = ("-");    # the - symbolizes STDIN

FILE: while (my $filename = shift @ARGV) {
    my $filehandle;
    unless (open $filehandle, "<", $filename) {
        say qq(Oh dear, I can't open "$filename". What do you want me to do?);
        my $tries = 5;
        do {
            say qq(Type "q" to quit, or "n" for the next file);
            my $response = <STDIN>;
            exit if $response =~ /^q/i;
            next FILE if $response =~ /^n/i;
            say "I have no idea what that meant.";
        } while --$tries;
        say "I give up";
        exit 1;
    }
    LINE: while (my $line = <$filehandle>) {
        # do something with $line
    }
}

How do I copy a CSV file, but skip the first line?

I want to write a script that takes a CSV file, deletes its first row and creates a new output csv file.
This is my code:
use Text::CSV_XS;
use strict;
use warnings;

my $csv = Text::CSV_XS->new({ sep_char => ',' });
my $file = $ARGV[0];

open(my $data, '<', $file) or die "Could not open '$file'\n";

my $csvout = Text::CSV_XS->new({ binary => 1, eol => $/ });
open my $OUTPUT, '>', "file.csv" or die "Can't able to open file.csv\n";

my $tmp = 0;
while (my $line = <$data>) {
    # if ($tmp == 0)
    # {
    #     $tmp = 1;
    #     next;
    # }
    chomp $line;
    if ($csv->parse($line)) {
        my @fields = $csv->fields();
        $csvout->print($OUTPUT, \@fields);
    } else {
        warn "Line could not be parsed: $line\n";
    }
}
On the command line I write c:\test.pl csv.csv and it doesn't create the file.csv output, but when I double-click the script it creates a blank CSV file. What am I doing wrong?
Your program isn't ideally written, but I can't tell why it doesn't work if you pass the CSV file on the command line as you have described. Do you get the errors Could not open 'csv.csv' or Can't able to open file.csv? If not then the file must be created in your current directory. Perhaps you are looking in the wrong place?
If all you need to do is to drop the first line then there is no need to use a module to process the CSV data - you can handle it as a simple text file.
If the file is specified on the command line, as in c:\test.pl csv.csv, you can read from it without explicitly opening it using the <> operator.
This program reads the lines from the input file and prints them to the output only if the line counter (the $. variable) isn't equal to one.
use strict;
use warnings;

open my $out, '>', 'file.csv' or die $!;

while (my $line = <>) {
    print $out $line unless $. == 1;
}
Hmm.. you don't need any modules for this task, since CSV (comma-separated values) files are simply text files: just open the file and iterate over its lines, writing to the output all lines except a particular one (e.g., the first). Such a task (skip the first line) is so simple that it would probably be better done with a command-line one-liner than a dedicated script.
A quick search turns up numerous tutorials about Perl input/output operations; see e.g. this link for an example:
http://learn.perl.org/examples/read_write_file.html
PS. Perl scripts (programs) usually are not "compiled" into a binary file. They are of course compiled, but on the fly; that's why /usr/bin/perl is called an "interpreter" rather than a "compiler" like gcc or g++. I guess what you're looking for is an editor with syntax highlighting and other development goods; you could try Eclipse with the EPIC Perl plugin for that (cross-platform).
http://www.eclipse.org/downloads/
http://www.epic-ide.org/download.php/
This:
user@localhost:~$ cat blabla.csv | perl -ne 'print $_ if $x++; '
skips the first line (it prints only when $x, which is incremented after each test, is already non-zero).
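An equivalent one-liner that uses Perl's built-in line counter $. instead of a hand-rolled flag would be:
perl -ne 'print if $. > 1' blabla.csv > file.csv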
You are missing your first (and only) argument due to Windows.
I think this question will help you: @ARGV is empty using ActivePerl in Windows 7

Perl script that works the same as unix command "history | grep keyword"

In Unix, what I want to do is "history | grep keyword". Because it takes quite a few steps if I want to grep many kinds of keywords, I want to automate it: write a Perl script that does everything, instead of repeating the commands and just changing the keyword. Then, whenever I want to see those commands, I can just use the Perl script.
The keywords that I would like to grep for are things like source, ls, cd, etc.
It can be printed out in any format; I just want to know how to do it.
Thanks! I appreciate any comments.
modified (thanks to @chas-owens)
#!/bin/perl
my $searchString = $ARGV[0];
my $historyFile = ".bash.history";
open FILE, "<", $historyFile or die "could not open $historyFile: $!";
my @lines = <FILE>;
print "Lines that matched $searchString\n";
for (@lines) {
    if ($_ =~ /$searchString/) {
        print "$_\n";
    }
}
original
#!/bin/perl
my $searchString = $ARGV[0];
my $historyFile = "<.bash.history";
open FILE, $historyFile;
my @line = <FILE>;
print "Lines that matched $searchString\n";
for (@lines) {
    if ($_ =~ /$searchString/) {
        print "$_\n";
    }
}
to be honest ... history | grep whatever is clean and simple and nice ; )
note code may not be perfect
because it takes quite some steps if i wanna grep many types of keywords
history | grep -E 'ls|cd|source'
-P will switch on the Perl-compatible regular expression library, if you have a new enough version of grep.
This being Perl, there are many ways to do it. The simplest is probably:
#!/usr/bin/perl
use strict;
use warnings;

my $regex = shift;
print grep { /$regex/ } `cat ~/.bash_history`;
This runs the shell command cat ~/.bash_history and returns the output as a list of lines. The list of lines is then consumed by the grep function. The grep function runs the code block for every item and only returns the ones that have a true return value, so it will only return lines that match the regex.
This code has several things wrong with it (it spawns a shell to run cat, it holds the entire file in memory, $regex could contain dangerous things, etc.), but in a safe environment where speed/memory isn't an issue, it isn't all that bad.
A better script would be
#!/usr/bin/perl
use strict;
use warnings;
use constant HISTORYFILE => "$ENV{HOME}/.bash_history";

my $regex = shift;

open my $fh, "<", HISTORYFILE
    or die "could not open ", HISTORYFILE, ": $!";

while (<$fh>) {
    next unless /$regex/;
    print;
}
This script uses a constant to make it easier to change which history file it is using at a later date. It opens the history file directly and reads it line by line. This means the whole file is never in memory, which can be very important if the file is very large. It still has the problem that $regex might contain a harmful regex, but so long as you are the person running it, you only have yourself to blame (I wouldn't let outside users pass arguments to a command like this through, say, a web application).
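If the pattern might come from an untrusted source, one common mitigation (my addition, not part of the original answer) is to escape it with quotemeta so it is matched as a literal string:
my $raw  = shift // '';
my $safe = quotemeta $raw;    # escape all regex metacharacters

while ( my $line = <STDIN> ) {
    print $line if $line =~ /$safe/;    # $raw is now matched literally
}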
I think you are better off writing a Perl script which does your fancy matching (i.e., replaces the grep) but does not read the history file. I say this because the history does not appear to be flushed to the .bash_history file until I exit the shell. Now there are probably settings and/or environment variables to control this, but I don't know what they are. So if you just write a Perl script which scans STDIN for your favourite commands, you can invoke it like
history | findcommands.pl
If it's less typing you are after, set up a shell function or alias to do this for you.
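For example, in bash (hgrep and findcmds are illustrative names):
alias hgrep='history | grep'
findcmds() { history | findcommands.pl "$@"; }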
As requested by @keifer, here is a sample Perl script which searches for a specified (or default) set of commands in your history. Obviously you should change dflt_cmds to whichever commands you search for most frequently.
#!/usr/bin/perl
my @dflt_cmds = qw( cd ls echo );
my $cmds = \@ARGV;

if ( !scalar(@$cmds) )
{
    $cmds = \@dflt_cmds;
}

while ( my $line = <STDIN> )
{
    my( $num, $cmd, @args ) = split( ' ', $line );
    if ( grep( $cmd eq $_, @$cmds ) )
    {
        print join( ' ', $cmd, @args ) . "\n";
    }
}
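A sample invocation, matching the original pipeline (findcommands.pl being whatever name you saved the script under):
history | findcommands.pl source ls cd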
}