Perl script giving unpredictable results - perl

I am very new to Perl. I wrote a script to display user names from the Linux passwd file.
It displays the list of user names, but then it also displays user IDs (which I am not trying to display at the moment), and at the end it prints "List of users ids and names:", which should appear before the list of names.
Any idea why it is behaving like this?
#!/usr/bin/perl
@names=system("cat /etc/passwd | cut -f 1 -d :");
@ids=system("cat /etc/passwd | cut -f 3 -d :");
$length=@ids;
$i=0;
print "List of users ids and names:\n";
while ($i < $length) {
    print $names[$i];
    $i += 1;
}

Short answer: system doesn't return output from a command; it returns the exit value. As the output of the cut isn't redirected, it prints to the current STDOUT (e.g. your terminal). Use open or qx// quotes (aka backticks) to capture output:
@names = `cat /etc/passwd | cut -f 1 -d :`;
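If you'd rather avoid the shell entirely, the open-based variant mentioned above can run cut directly; a sketch (the list form of a pipe open skips the shell):
open my $cut, "-|", "cut", "-f", "1", "-d", ":", "/etc/passwd"
    or die "can't run cut: $!";
my @names = <$cut>;
close $cut;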
As you are still learning Perl, here is a write-up detailing how I'd solve that problem:
First, always use strict; use warnings; at the beginning of your script. This helps prevent and detect many problems, which makes it invaluable.
Next, starting a shell when everything could be done inside Perl is inefficient (your solution starts six unnecessary processes: two each of sh, cat, and cut). In fact, cat is useless even in the shell version; just use a shell redirection operator: cut ... </etc/passwd.
To open a file in Perl, we'll do
use autodie; # automatic error handling
open my $passwd, "<", "/etc/passwd";
The "<" is the mode (here: reading). The $passwd variable now holds a file handle from which we can read lines like <$passwd>. The lines still contain a newline, so we'll chomp the variable (remove the line ending):
while (<$passwd>) { # the <> operator reads into $_ by default
    chomp; # defaults to $_
    ...
}
The split builtin takes a regex that matches separators, a string (defaults to the $_ variable), and an optional limit. It returns a list of fields. To split a string on the : separator, we'll do
my @fields = split /:/;
The left hand side doesn't have to be an array, we can also supply a list of variables. This matches the list on the right, and assigns one element to each variable. If we want to skip a field, we name it undef:
my ($user, undef, $id) = split /:/;
Now we just want to print the user. We can use the print command for that:
print "$user\n";
From Perl v5.10 on, we can use the say feature. This behaves exactly like print, but auto-appends a newline to the output:
say $user;
And voilà, we have our final script:
#!/usr/bin/perl
use strict; use warnings; use autodie; use feature 'say';
open my $passwd, "<", "/etc/passwd";
while (<$passwd>) {
    chomp;
    my ($user, undef, $id) = split /:/;
    say $user;
}
Edit for antique perls
The autodie module was first distributed as a core module with Perl v5.10.1. Also, the say feature isn't available before v5.10.
Therefore, we must use print instead of say and do manual error handling:
#!/usr/bin/perl
use strict; use warnings;
open my $passwd, "<", "/etc/passwd" or die "Can't open /etc/passwd: $!";
while (<$passwd>) {
    chomp;
    my ($user, undef, $id) = split /:/;
    print "$user\n";
}
The open returns a false value when it fails. In that case, the $! variable will hold the reason for the error.
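A tiny illustration of that (the file name is made up):
open my $fh, "<", "/no/such/file" or die "open failed: $!";
The die message would then end with something like "open failed: No such file or directory".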

For reading system databases, you should use the proper system functions:
use feature qw(say);
while (
    my ($name, $passwd, $uid, $gid, $quota,
        $comment, $gcos, $dir, $shell, $expire
    )
    = getpwent
)
{
    say "$uid $name";
}

If you're scanning the entire password file, you can use getpwent():
while( my @pw = getpwent() ){
    print "@pw\n";
}
See perldoc -f getpwent.
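Relatedly, if you only need a single user rather than a full scan, getpwnam returns the same field list; a small sketch ("root" is just an example):
my ($name, undef, $uid) = getpwnam("root")
    or die "no such user";
print "$uid $name\n";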

Related

print specific INFILE area using perl

I have a file with the format below
locale,English,en_AU,6251
locale,French,fr_BE,25477
charmap,English,EN,5423
And I would like to use Perl, with the option "-a" followed by the file, to print out something like
Available locales:
en_AU
fr_BE
EN
To do that, I have the perl script below
$o = $ARGV[0];
$f = $ARGV[1];
open (INFILE, "<$f") or die "error";
my $line = <INFILE>;
my @fields = split(',', $line);
if($o eq "-a"){
    if(!$fields[2]){print "No locales available\n";}
    else{print "Available locales: \n";
        while($fields[2]){print "$fields[2]\n";}
    }
}
close(INFILE);
And I have three questions here.
1. My script only prints the first locale, "en_AU", forever.
2. It should be able to test whether a file is empty, but if the file is completely empty it outputs nothing, while if I type two empty lines into the file it prints "No locales available" twice instead.
3. In the (!$fields[2]) part I should verify whether the file is empty or no available locales exist; do I also need a regular expression here to verify that the value is actually a locale?
Hope someone can help me figure these out! Many thanks!!!
The biggest missing thing is a loop over lines from the file, in which you then process one line at a time. Comments follow the code.
use warnings;
use strict;
use feature 'say';
use Getopt::Long;
# my ($opt, $file) = @ARGV; # better: use a module
my ($opt, $file);
GetOptions( 'a' => \$opt, 'file=s' => \$file ) or usage();
usage() if not $file; # mandatory argument
open my $fh, '<', $file or die "Can't open $file: $!";
while (my $line = <$fh>) {
    chomp $line;
    my @fields = split /,/, $line;
    next if not $fields[2];
    if ($opt) {
        say $fields[2];
    }
}
close $fh;
sub usage {
    say STDERR "Usage: $0 [-a] --file filename";
    exit 1;
}
This prints the desired output. (Is that simple condition on $fields[2] really all you need?)
Comments
Always have use warnings; and use strict; at the beginning
I do not recommend single-letter variable names. One forgets what they mean, it makes the code harder to follow, and it's way too easy to make silly mistakes
The @ARGV array can be assigned to variables in a list. Much better, use the Getopt::Long module, which checks the invocation and allows for far easier interface changes. I set the -a option to act as a "flag," so it just sets a variable ($opt) if it's given. If it should take a value instead, use 'a=s' => \$opt and check for a value (a sketch appears at the end of this answer).
Use lexical filehandles and the three-argument open, open my $fh, '<', $file ...
When die-ing, print the reason for the error: die "... $!";, using the $! variable
The "diamond" (angle) operator, <$fh>, reads one line from a file opened with $fh when used in scalar context, as in $line = <$fh>. It advances a pointer in the file as it reads a line so the next time it's used it returns the next line. If you use it in list context then it returns all lines, but when you process a file you normally want to go line by line.
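For instance, to illustrate the scalar-versus-list context difference just described (the file name is hypothetical):
open my $fh, '<', 'data.txt' or die "Can't open data.txt: $!";
my $first_line = <$fh>; # scalar context: reads exactly one line
my @rest = <$fh>;       # list context: all remaining lines at once
close $fh;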
Some of the described logic and requirements aren't clear to me, but hopefully the code above is going to be easier to adjust as needed.
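And here is the value-taking variant of -a mentioned in the Getopt::Long bullet, as a minimal sketch (variable names are illustrative):
use strict;
use warnings;
use Getopt::Long;
my ($mode, $file);
GetOptions( 'a=s' => \$mode, 'file=s' => \$file ) or die "Bad options\n";
print "got -a => $mode\n" if defined $mode;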

'merging' 2 files into a third using perl

I am reviewing for a test and I can't seem to get this example to code out right.
Problem: Write a Perl script, called ileaf, which will interleave the lines of a file with those of another file, writing the result to a third file. If the files are of different lengths, the excess lines are written at the end.
A sample invocation:
ileaf file1 file2 outfile
This is what I have:
#!/usr/bin/perl -w
open(file1, "$ARGV[0]");
open(file2, "$ARGV[1]");
open(file3, ">$ARGV[2]");
while(($line1 = <file1>)||($line2 = <file2>)){
    if($line1){
        print $line1;
    }
    if($line2){
        print $line2;
    }
}
This sends the information to the screen so I can immediately see the result. The final version should use "print file3 $line1;". I am getting all of file1, then all of file2, without any interleaving of the lines.
If I understand correctly, this is a consequence of the "||" in my while loop. The while checks the first comparison and, if it's true, drops into the loop, which only handles file1. Once file1 is exhausted, the while checks file2 and again drops into the loop.
What can I do to interleave the lines?
You're not getting what you want from while(($line1 = <file1>)||($line2 = <file2>)){ because as long as ($line1 = <file1>) is true, ($line2 = <file2>) never happens.
Try something like this instead:
open my $file1, "<", $ARGV[0] or die;
open my $file2, "<", $ARGV[1] or die;
open my $file3, ">", $ARGV[2] or die;
while (my $f1 = readline ($file1)) {
    print $file3 $f1; # line from file1
    if (my $f2 = readline ($file2)) { # if there are any lines left in file2
        print $file3 $f2;
    }
}
while (my $f2 = readline ($file2)) { # if there are any lines left in file2
    print $file3 $f2;
}
close $file1;
close $file2;
close $file3;
You'd think if they're teaching you Perl, they'd use the modern Perl syntax. Please don't take this personally; after all, this is how you were taught. However, you should know the modern Perl programming style, because it helps eliminate all sorts of programming mistakes and makes your code easier to understand.
Use the pragmas use strict; and use warnings;. The warnings pragma replaces the need for the -w flag on the command line. It's actually more flexible and better. For example, I can turn off particular warnings when I know they'll be an issue. The use strict; pragma requires me to declare my variables with either my or our. (NOTE: Don't declare Perl's built-in variables.) 99% of the time, you'll use my. These variables are called lexically scoped, but you can think of them as true local variables. Lexically scoped variables don't have any value outside of their scope. For example, if you declare a variable using my inside a while loop, that variable will disappear once the loop exits.
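A tiny sketch of that scoping rule (illustrative only):
use strict;
use warnings;
while ( my $line = <DATA> ) {
    my $copy = $line; # $copy is created fresh each iteration...
}
# print $copy;        # ...and using it here would fail to compile under strict
__DATA__
just one line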
Use the three-parameter syntax for the open statement: In the example below, I use the three-parameter form. This way, if a file is called >myfile, I'll be able to read from it.
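For example, with the three-argument form the mode can't leak into the name, so a file literally called ">myfile" is still read, not clobbered (hypothetical name):
open my $fh, "<", ">myfile" or die "Can't open >myfile: $!"; # reads the file named ">myfile"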
Use locally defined file handles: Note that I use my $file_1 instead of simply FILE_1_HANDLE. The old way, FILE_1_HANDLE is globally scoped, plus it's very difficult to pass the file handle to a function. Using lexically scoped file handles just works better.
Use or and and instead of || and &&: They're easier to understand, their operator precedence is better, and they're less likely to cause problems.
Always check whether your open statement worked: You need to make sure your open statement actually opened a file. Or use the use autodie; pragma, which will kill your program if the open statements fail (which is probably what you want to do anyway).
And, here's your program:
#! /usr/bin/env perl
#
use strict;
use warnings;
use autodie;
open my $file_1, "<", shift;
open my $file_2, "<", shift;
open my $output_fh, ">", shift;
for (;;) {
    my $line_1 = <$file_1>;
    my $line_2 = <$file_2>;
    last if not defined $line_1 and not defined $line_2;
    no warnings qw(uninitialized);
    print {$output_fh} $line_1 . $line_2;
    use warnings;
}
In the above example, I read from both files even if one of them has run out of lines. If there's nothing to read, then $line_1 or $line_2 is simply undefined. After I do my read, I check whether both $line_1 and $line_2 are undefined. If so, I use last to end my loop.
Because my file handle is a scalar variable, I like putting it in curly braces, so people know it's a file handle and not a variable I want to print out. I don't need it, but it improves clarity.
Notice the no warnings qw(uninitialized);. This turns off the uninitialized warning I'd otherwise get. I know that either $line_1 or $line_2 might be uninitialized, so I don't want the warning. I turn it back on right below my print statement because it is a valuable warning.
Here's another way to do that for loop:
while ( 1 ) {
    my $line_1 = <$file_1>;
    my $line_2 = <$file_2>;
    last if not defined $line_1 and not defined $line_2;
    print {$output_fh} $line_1 if defined $line_1;
    print {$output_fh} $line_2 if defined $line_2;
}
The infinite loop is a while loop instead of a for loop. Some people don't like the C style of for loop and have banned it from their coding practices. Thus, if you have an infinite loop, you use while ( 1 ) {. To me, maybe because I came from a C background, for (;;) { means infinite loop, and while ( 1 ) { takes a few extra milliseconds to digest.
Also, I check whether $line_1 or $line_2 is defined before printing it. I guess it's better than the no warnings/use warnings pair, but I need two separate print statements instead of combining them into one.
Here's another option that uses List::MoreUtils's zip to interleave arrays and File::Slurp to read and write files:
use strict;
use warnings;
use List::MoreUtils qw/zip/;
use File::Slurp qw/read_file write_file/;
chomp( my @file1 = read_file shift );
chomp( my @file2 = read_file shift );
write_file shift, join "\n", grep defined $_, zip @file1, @file2;
Just noticed Tim A has a nice solution already posted. This solution is a bit wordier, but might illustrate exactly what is going on a bit more.
The method I went with reads all of the lines from both files into two arrays, then loops through them using a counter.
#!/usr/bin/perl -w
use strict;
open(IN1, "<", $ARGV[0]);
open(IN2, "<", $ARGV[1]);
my @file1_lines;
my @file2_lines;
while (<IN1>) {
    push (@file1_lines, $_);
}
close IN1;
while (<IN2>) {
    push (@file2_lines, $_);
}
close IN2;
my $file1_items = @file1_lines;
my $file2_items = @file2_lines;
open(OUT, ">", $ARGV[2]);
my $i = 0;
while (($i < $file1_items) || ($i < $file2_items)) {
    if (defined($file1_lines[$i])) {
        print OUT $file1_lines[$i];
    }
    if (defined($file2_lines[$i])) {
        print OUT $file2_lines[$i];
    }
    $i++;
}
close OUT;

Parsing large files in Perl

I need to compare a big file (2 GB, containing 22 million lines) with another file. It takes too long to process using Tie::File, so I tried a plain 'while' loop over the file, but the problem remains. See my code below...
use strict;
use Tie::File;
# use warnings;
my @arr;
# tie @arr, 'Tie::File', 'title_Nov19.txt';
# open(IT,"<title_Nov19.txt");
# my @arr=<IT>;
# close(IT);
open(RE,">>res.txt");
open(IN,"<input.txt");
while(my $data=<IN>){
    chomp($data);
    print "$data\n";
    my $occ=0;
    open(IT,"<title_Nov19.txt");
    while(my $line2=<IT>){
        my $line=$line2;
        chomp($line);
        if($line=~m/\b$data\b/is){
            $occ++;
        }
    }
    print RE "$data\t$occ\n";
}
close(IT);
close(IN);
close(RE);
So please help me reduce the processing time...
Lots of things wrong with this.
Aside from the usual problems (use warnings commented out, use of 2-argument open(), not checking the open() result, use of global filehandles), the specific problem in your case is that you are opening, reading, and closing the second file once for every single line of the first. This is going to be very slow.
I suggest you open the file title_Nov19.txt once, read all the lines into an array or hash or something, then close it; and then you can open the first file, input.txt and walk along that once, comparing to things in the array so you don't have to reopen that second file all the time.
Further, I suggest you read some basic articles on style, etc., as your question is likely to gain more attention if it's written to vaguely modern standards.
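As a sketch of that advice, assuming the search terms are single words (the original uses a word-boundary regex, so multi-word terms would still need the regex approach), you can hash the terms once and do a constant-time lookup per word:
use strict;
use warnings;
# read the (small) list of search terms once
open my $in, '<', 'input.txt' or die "input.txt: $!";
chomp( my @terms = <$in> );
close $in;
my %count = map { lc $_ => 0 } @terms;
# single pass over the big file
open my $big, '<', 'title_Nov19.txt' or die "title_Nov19.txt: $!";
while ( my $line = <$big> ) {
    for my $word ( split /\W+/, lc $line ) {
        $count{$word}++ if exists $count{$word}; # O(1) hash lookup per word
    }
}
close $big;
print "$_\t$count{$_}\n" for @terms;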
I tried to build a small example script with a better structure, but I have to say, man, your problem description is really very unclear. It's important not to read the whole comparison file each time, as @LeoNerd explained in his answer. Then I use a hash to keep track of the match count:
#!/usr/bin/env perl
use strict;
use warnings;
# cache all lines of the comparison file
open my $comp_file, '<', 'input.txt' or die "input.txt: $!\n";
chomp (my @comparison = <$comp_file>);
close $comp_file;
# prepare comparison
open my $input, '<', 'title_Nov19.txt' or die "title_Nov19.txt: $!\n";
my %count = ();
# compare each line
while (my $title = <$input>) {
    chomp $title;
    # iterate comparison strings
    foreach my $comp (@comparison) {
        $count{$comp}++ if $title =~ /\b$comp\b/i;
    }
}
# done
close $input;
# output (sorted by count)
open my $output, '>>', 'res.txt' or die "res.txt: $!\n";
foreach my $comp (@comparison) {
    print $output "$comp\t$count{$comp}\n";
}
close $output;
Just to get you started... If someone wants to further work on this: these were my test files:
title_Nov19.txt
This is the foo title
Wow, we have bar too
Nothing special here but foo
OMG, the last title! And Foo again!
input.txt
foo
bar
And the result of the program was written to res.txt:
foo 3
bar 1
Here's another option using memowe's (thank you) data:
use strict;
use warnings;
use File::Slurp qw/read_file write_file/;
my %count;
my $regex = join '|', map { chomp; $_ = "\Q$_\E" } read_file 'input.txt';
for ( read_file 'title_Nov19.txt' ) {
    my %seen;
    !$seen{ lc $1 }++ and $count{ lc $1 }++ while /\b($regex)\b/ig;
}
write_file 'res.txt', map "$_\t$count{$_}\n",
    sort { $count{$b} <=> $count{$a} } keys %count;
Numerically-sorted output to res.txt:
foo 3
bar 1
An alternation regex which quotes metacharacters (\Q$_\E) is built and used, so only one pass over the large file's lines is needed. The hash %seen is used to ensure that the input words are only counted once per line.
Hope this helps!
Try this:
grep -i -c -w -f input.txt title_Nov19.txt > res.txt

Perl - Command line input and STDIN

I've created the script below for an assignment I have. It asks for a text file, checks the frequency of words, and lists the 10 words that appear the most times. Everything is working fine, but I need this script to be able to start via the command line as well as via standard input.
So I need to be able to run 'perl wfreq.pl example.txt', and that should start the script without asking for a text file. I'm not sure how to accomplish this. I think I might need a loop at the start that skips the STDIN prompt if you give it the text file on the command line.
How can I do it?
The script
#! /usr/bin/perl
use utf8;
use warnings;
print "Please enter the name of the file: \n" ;
$file = <STDIN>;
chop $file;
open(my $DATA, "<:utf8", $file) or die "Oops!!: $!";
binmode STDOUT, ":utf8";
while(<$DATA>) {
    tr/A-Za-z//cs;
    s/[;:()".,!?]/ /gio;
    foreach $word (split(' ', lc $_)) {
        $freq{$word}++;
    }
}
foreach $word (sort { $freq{$b} <=> $freq{$a} } keys %freq) {
    @fr = (@fr, $freq{$word});
    @ord = (@ord, $word);
}
for ($v = 0; $v < 10; $v++) {
    print " $fr[$v] | $ord[$v]\n";
}
Instead of reading from <STDIN>, you can read from <> to get data either from files provided on the command line or from stdin if there are no files.
For example, with the program:
#!/usr/bin/env perl
while (<>) {
print $_;
}
The command ./script foo.txt will read and print lines from foo.txt, while ./script by itself will read and print lines from standard input.
You need to do the following:
my $DATA;
my $filename = $ARGV[0];
unless ($filename) {
    print "Enter filename:\n";
    $filename = <STDIN>;
    chomp $filename;
}
open($DATA, '<', $filename) or die $!;
Though I have to say, user-prompts are very un-Unix like.
perl script.pl < input.txt
The use of the operator < passes input.txt to script.pl as standard input. You can then skip querying for the filename. Otherwise, use $ARGV[0] or similar, if defined.
You can check for a command-line argument in @ARGV, which is Perl's array that automagically grabs command-line arguments, and, if present, process it (else continue with input from STDIN). Something like:
use utf8;
use strict; # Don't ever forget this! Always, always, ALWAYS use strict!
use warnings;
if(@ARGV)
{
    # Assume that the first command line argument is a file to be processed as in your original code.
    # You may or may not want to care if additional command line arguments are passed. Up to you.
}
else
{
    # Ask for the filename and proceed as normal.
}
Note that you might want to isolate the code for processing the file (i.e., the opening of DATA, going through the lines, counting the words, etc.) into a subroutine that you can easily call (perhaps with the file name as an argument) in each branch of the code, as sketched below.
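A sketch of that refactoring (process_file is a hypothetical name; the word counting is simplified from the original script):
use utf8;
use strict;
use warnings;
sub process_file {
    my ($filename) = @_;
    open my $fh, "<:utf8", $filename or die "Oops!!: $!";
    my %freq;
    while (<$fh>) {
        $freq{$_}++ for split ' ', lc $_;
    }
    close $fh;
    return %freq;
}
my $file = $ARGV[0];
unless (defined $file) {
    print "Please enter the name of the file: \n";
    chomp($file = <STDIN>);
}
my %freq = process_file($file);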
Oh, and to make sure I'm clear about this: always use strict and always use warnings. If you don't, you're asking for trouble.

Perl script that works the same as unix command "history | grep keyword"

In Unix, what I want to do is "history | grep keyword". It takes quite a few steps if I want to grep many kinds of keywords, so I want to automate it: write a Perl script that does everything, so that instead of repeating the command and just changing the keyword, whenever I want to see those commands I can just use the Perl script.
The keywords I would like to 'grep' are things like source, ls, cd, etc.
The output can be in any format, as long as I know how to do it.
Thanks! I appreciate any comments.
modified (thanks to @chas-owens)
#!/bin/perl
my $searchString = $ARGV[0];
my $historyFile = ".bash.history";
open FILE, "<", $historyFile or die "could not open $historyFile: $!";
my @lines = <FILE>;
print "Lines that matched $searchString\n";
for (@lines) {
    if ($_ =~ /$searchString/) {
        print "$_\n";
    }
}
original
#!/bin/perl
my $searchString = $ARGV[0];
my $historyFile = "<.bash.history";
open FILE, $historyFile;
my @lines = <FILE>;
print "Lines that matched $searchString\n";
for (@lines) {
    if ($_ =~ /$searchString/) {
        print "$_\n";
    }
}
to be honest ... history | grep whatever is clean and simple and nice ; )
note code may not be perfect
because it takes quite a few steps if I want to grep many kinds of keywords
history | grep -E 'ls|cd|source'
-P will switch on the Perl compatible regular expression library, if you have a new enough version of grep.
This being Perl, there are many ways to do it. The simplest is probably:
#!/usr/bin/perl
use strict;
use warnings;
my $regex = shift;
print grep { /$regex/ } `cat ~/.bash_history`;
This runs the shell command cat ~/.bash_history and returns the output as a list of lines. The list of lines is then consumed by the grep function. The grep function runs the code block for every item and only returns the ones that have a true return value, so it will only return lines that match the regex.
This code has several things wrong with it (it spawns a shell to run cat, it holds the entire file in memory, $regex could contain dangerous things, etc.), but in a safe environment where speed/memory isn't an issue, it isn't all that bad.
A better script would be
#!/usr/bin/perl
use strict;
use warnings;
use constant HISTORYFILE => "$ENV{HOME}/.bash_history";
my $regex = shift;
open my $fh, "<", HISTORYFILE
    or die "could not open ", HISTORYFILE, ": $!";
while (<$fh>) {
    next unless /$regex/;
    print;
}
This script uses a constant to make it easier to change which history file it is using at a later date. It opens the history file directly and reads it line by line. This means the whole file is never in memory, which can be very important if the file is very large. It still has the problem that $regex might contain a harmful regex, but so long as you are the person running it, you only have yourself to blame (but I wouldn't let outside users pass arguments to a command like this through, say, a web application).
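If the argument should be treated as a literal keyword rather than a regex, one defensive tweak (my addition, not part of the answer above) is to escape it with \Q...\E:
#!/usr/bin/perl
use strict;
use warnings;
my $keyword = shift or die "usage: $0 keyword\n";
open my $fh, "<", "$ENV{HOME}/.bash_history"
    or die "could not open history file: $!";
while (<$fh>) {
    print if /\Q$keyword\E/; # \Q...\E neutralizes regex metacharacters
}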
I think you are better off writing a Perl script which does your fancy matching (i.e., replaces the grep) but does not read the history file. I say this because the history does not appear to be flushed to the .bash_history file until I exit the shell. There are probably settings and/or environment variables to control this, but I don't know what they are. So if you just write a Perl script which scans STDIN for your favourite commands, you can invoke it like
history | findcommands.pl
If it's less typing you are after, set up a shell function or alias to do this for you.
As requested by @keifer, here is a sample Perl script which searches for a specified (or default) set of commands in your history. Obviously you should change dflt_cmds to whichever commands you search for most frequently.
#!/usr/bin/perl
my @dflt_cmds = qw( cd ls echo );
my $cmds = \@ARGV;
if( !scalar(@$cmds) )
{
    $cmds = \@dflt_cmds;
}
while( my $line = <STDIN> )
{
    my( $num, $cmd, @args ) = split( ' ', $line );
    if( grep( $cmd eq $_, @$cmds ) )
    {
        print join( ' ', $cmd, @args )."\n";
    }
}
}