Does "print $ARGV" alter the argument array in any way? - perl

Here is the example:
$a = shift;
$b = shift;
push(@ARGV,$b);
$c = <>;
print "\$b: $b\n";
print "\$c: $c\n";
print "\$ARGV: $ARGV\n";
print "\@ARGV: @ARGV\n";
And the output:
$b: file1
$c: dir3
$ARGV: file2
@ARGV: file3 file1
I don't understand what exactly is happening when printing $ARGV without any index. Does it print the first argument and then remove it from the array? Because I thought after all the statements the array becomes:
file2 file3 file1
Invocation:
perl port.pl -axt file1 file2 file3
file1 contains the lines:
dir1
dir2
file2:
dir3
dir4
dir5
file3:
dir6
dir7

Greg has quoted the appropriate documentation, so here's a quick rundown of what happens
$a = shift; # "-axt" is removed from @ARGV and assigned to $a
$b = shift; # "file1" likewise
push(@ARGV,$b); # "file1" inserted at end of @ARGV
$c = <>; # "file2" is removed from @ARGV, its file
# handle is opened, and the first line of file2 is read
When the file handle for "file2" is opened, its file name is stored in $ARGV. As Greg mentioned, @ARGV and $ARGV are completely different variables.
The internal workings of the diamond operator <> are probably what is confusing you here: it does approximately $ARGV = shift @ARGV before opening each file.
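The sequence above can be reproduced with a minimal sketch. It uses throwaway temporary files in place of file1, file2 and file3; the helper name demo_diamond and the temp-file setup are assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: recreate the question's setup with temp files
# and report what <> does to @ARGV and $ARGV.
sub demo_diamond {
    my $dir = tempdir(CLEANUP => 1);
    my %content = (
        file1 => "dir1\ndir2\n",
        file2 => "dir3\ndir4\ndir5\n",
        file3 => "dir6\ndir7\n",
    );
    my %path;
    for my $name (sort keys %content) {
        $path{$name} = File::Spec->catfile($dir, $name);
        open my $out, '>', $path{$name} or die "$path{$name}: $!";
        print {$out} $content{$name};
        close $out or die "$path{$name}: $!";
    }

    # Same argument list as: perl port.pl -axt file1 file2 file3
    local @ARGV = ('-axt', @path{qw(file1 file2 file3)});

    my $opt   = shift @ARGV;   # "-axt"
    my $first = shift @ARGV;   # path of file1
    push @ARGV, $first;        # file1 goes to the end
    my $line = <>;             # shifts file2 off @ARGV, opens it, reads one line
    chomp $line;

    my @result = ($line, $ARGV, [@ARGV]);
    close ARGV;                # tidy up the magic handle
    return @result;
}

my ($line, $current, $left) = demo_diamond();
print "line read:  $line\n";
print "\$ARGV:      $current\n";
print "\@ARGV left: @$left\n";
```

The line read is dir3, $ARGV holds the path of file2, and @ARGV is left with file3 followed by file1, which matches the output in the question. Note that inside a subroutine a bare shift works on @_, so the sketch spells out shift @ARGV.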

In Perl, $ARGV and @ARGV are completely different. From perlvar:
$ARGV
Contains the name of the current file when reading from <>.
@ARGV
The array @ARGV contains the command-line arguments intended for the script. $#ARGV is generally the number of arguments minus one, because $ARGV[0] is the first argument, not the program's command name itself. See $0 for the command name.

No, but <> does. <> is short for <ARGV> (which in turn is short for readline(ARGV)), where ARGV is a special file handle that reads from the files listed in @ARGV (or STDIN if @ARGV is empty). As it opens each file in @ARGV, it removes the name from @ARGV and stores it in $ARGV.

Related

Perl ARGV value in scalar context

Given the following Perl script:
# USAGE: ./flurp -x -vf file1 file2 file3 file4
# e.
$a = shift;
$b = shift;
$c = shift;
@d = <>;
# ei. value of $b = -vf
# eii. value of @d = content of file2, file3, file4
print "$b\n";
print "@d\n";
print "$ARGV\n";
This is the output:
-vf
{contents of file2, file3, file4}
file4
I am puzzled by the output of print "$ARGV\n". If I try to do print "$ARGV[-1]\n", an empty line is printed out to STDOUT. If I directly reference $ARG[2], I get an empty line as well.
Why is the script printing file4 when $ARGV is used?
As a counter-example, I tried print "$d\n", expecting to get the last line of file4. Instead of the expected output I got an empty line. How does $ARGV work?
In answer to your specific question: "How does $ARGV work?"
$ARGV Contains the name of the current file when reading from <> .
from Variables related to filehandles in the Perl docs.
Although at the point you print $ARGV you've finished reading from file4, the variable still holds the name of the file.
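As a rough illustration of that last point, here is a small sketch that reads two throwaway files to the end through <> and then inspects $ARGV. The helper name slurp_and_report and the temporary files are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read every line from the given files through <>
# and return the line count plus the value $ARGV is left holding.
sub slurp_and_report {
    local @ARGV = @_;
    my @lines = <>;            # list context: reads all files to the end
    return (scalar @lines, $ARGV);
}

# Two throwaway files stand in for the files named on the command line.
my ($fh1, $name1) = tempfile(UNLINK => 1);
print {$fh1} "a\nb\n";
close $fh1 or die "$name1: $!";
my ($fh2, $name2) = tempfile(UNLINK => 1);
print {$fh2} "c\n";
close $fh2 or die "$name2: $!";

my ($count, $last) = slurp_and_report($name1, $name2);
print "read $count lines; \$ARGV is still $last\n";
```

By the time the read finishes, <> has shifted every name off @ARGV, which is why $ARGV[-1] prints an empty line. Likewise $d is a scalar unrelated to the array @d; the last line read would be $d[-1].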

Performing a one-liner on multiple input files specified by extension

I'm using the following line to split and process a tab-delimited .txt file:
perl -lane 'next unless $. >30; @array = split /[:,\/]+/, $F[2]; print if $array[1]/$array[2] >0.5 && $array[4] >2' input.txt > output.txt
Is there a way to alter this one-liner in order to perform this on multiple input files without specifying each individually?
Ideally this would be accomplished by performing it on all files within the current directory holding the .txt (or other) file extension - and then outputting a set of modified files names e.g.:
Input:
test1.txt
test2.txt
Output:
test1MOD.txt
test2MOD.txt
I know that I can access the filename to modify it with $ARGV but I do not know how to go about getting it to run on multiple files.
Solution:
perl -i.MOD -lane 'next unless $. >30; @array = split /[:,\/]+/, $F[2]; print if $array[1]/$array[2] >0.5 && $array[4] >2; close ARGV if eof;' *.txt
$. needs to be reset for each file; otherwise it throws a division by zero error.
If you don't mind a slightly different output file name (with -i.MOD the edited text replaces each original file, and the original is kept as e.g. test1.txt.MOD),
perl -i.MOD -lane'
next unless $. >30;
@array = split /[:,\/]+/, $F[2];
print if $array[1]/$array[2] >0.5 && $array[4] >2;
close ARGV if eof; # Reset $. for each file.
' *.txt
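To see why the reset matters, the following sketch records the value of $. on the first line of each file, with and without close ARGV if eof. The helper name first_line_numbers and the temporary files are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the value of $. seen on the first line of
# each file, with or without the "close ARGV if eof" reset.
sub first_line_numbers {
    my ($reset, @files) = @_;
    local @ARGV = @files;
    my (%seen, @numbers);
    while (<>) {
        push @numbers, $. unless $seen{$ARGV}++;
        close ARGV if $reset && eof;   # eof without parens: end of the current file
    }
    return @numbers;
}

# Two throwaway files of three and two lines.
my @files;
for my $text ("a\nb\nc\n", "d\ne\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh or die "$name: $!";
    push @files, $name;
}

print "without reset: @{[ first_line_numbers(0, @files) ]}\n";   # 1 4
print "with reset:    @{[ first_line_numbers(1, @files) ]}\n";   # 1 1
```

Without the explicit close, $. keeps counting across files, so a test such as next unless $. > 30 only skips the leading lines of the first file. Presumably that is how header lines from later files reached the division in the one-liner.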
Have you considered calling the perl script from a shell for loop?
for TXT in *.txt; do
OUT="$(basename "$TXT" .txt)MOD.txt"
perl ... "$TXT" > "$OUT"
done

Find file which content not match a string pattern in Perl

I'm writing code to find the files that do not contain a string pattern. Given a list of files, I have to look into the content of each file, and I would like to get the file name if the string pattern "clean" does not appear inside the file. Please help.
Here is the scenario:
I have a list of files, and each file has numerous lines. If the file is clean, it will contain the word "clean". But if the file is dirty, the word "clean" does not exist, and there is no other clear indication that the file is dirty. So for each file, if the word "clean" is not detected, I'll categorize it as a dirty file, and I would like to record its file name.
You can use a simple one-liner:
perl -0777 -nlwE 'say $ARGV if !/clean/i' *.txt
The -0777 switch slurps each file whole, so the regex is checked against the entire file. If the match is not found, we print the file name.
For perl versions lower than 5.10 that do not support -E you can substitute -E with -e and say $ARGV with print "$ARGV".
perl -0777 -nlwe 'print "$ARGV\n" if !/clean/i' *.txt
If you need to generate the list within Perl, the File::Finder module will make life easy.
Untested, but should work:
use File::Finder;
my @wanted = File::Finder # finds all ..
->type( 'f' ) # .. files ..
->name( '*.txt' ) # .. ending in .txt ..
->not # .. that do not ..
->contains( qr/clean/ ) # .. contain "clean" ..
->in( '.' ); # .. under the current dir (in runs the search, so it goes last)
print $_, "\n" for @wanted;
Neat stuff!
EDIT:
Now that I have a clearer picture of the problem, I don't think any module is necessary here:
use strict;
use warnings;
my @files = glob '*.txt'; # Dirty & clean laundry
my @dirty;
foreach my $file ( @files ) { # For each file ...
local $/ = undef; # Slurps the file in
open my $fh, '<', $file or die "$file: $!";
unless ( <$fh> =~ /clean/ ) { # if the file isn't clean ..
push @dirty, $file; # .. it's dirty
}
close $fh;
}
print $_, "\n" for @dirty; # Dirty laundry list
Once you get the mechanics, this can be simplified a la grep, etc.
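One possible shape for that simplification, as a sketch only: the helper names file_matches and dirty_files are made up for this example, and the pattern is the same /clean/ test used above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the whole file matches the pattern.
sub file_matches {
    my ($file, $pattern) = @_;
    open my $fh, '<', $file or die "$file: $!";
    local $/;                          # slurp mode
    my $text = <$fh>;
    close $fh;
    return defined $text && $text =~ $pattern;
}

# Hypothetical helper: the files whose contents never mention "clean".
sub dirty_files {
    return grep { !file_matches($_, qr/clean/) } @_;
}

print "$_\n" for dirty_files(glob '*.txt');
```

The defined check keeps an empty file from raising an "uninitialized value" warning; such a file is reported as dirty.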
One way like this:
ls *.txt | grep -v "$(grep -l clean *.txt)"
#!/usr/bin/perl
use strict;
use warnings;
open(FILE,"<file_list_file>");
while(<FILE>)
{
my $flag=0;
my $filename=$_;
open(TMPFILE,"$_");
while(<TMPFILE>)
{
$flag=1 if(/<your_string>/);
}
close(TMPFILE);
if(!$flag)
{
print $filename;
}
}
close(FILE);

Perl script to make first 8 characters all caps but not the whole file name

What Perl script should I be using to only change the first 8 characters in a file name to all caps instead of the script changing the entire file name to all caps?
Here is how I am setting it up:
#!/usr/bin/perl
chdir "directory path";
#@files = `ls *mw`;
@files = `ls | grep mw`;
chomp @files;
foreach $oldname (@files) {
$newname = $oldname;
$newname =~ s/mw//;
print "$oldname -> $newname\n";
rename("$oldname","$newname");
}
You can use this regex:
my $str = 'Hello World!';
$str =~ s/^(.{8})/uc($1)/se; # $str now contains 'HELLO WOrld!'
The substitution
s/^(.{1,8})/\U$1/
will set the first eight characters of a string to upper case. The complete program looks like this
use strict;
use warnings;
chdir "directory path" or die "Unable to change current directory: $!";
opendir my $dh, '.' or die $!;
my @files = grep -f && /mw/, readdir $dh;
foreach my $file (@files) {
(my $new = $file) =~ s/mw//;
$new =~ s/^(.{1,8})/\U$1/s;
print "$file -> $new\n";
rename $file, $new;
}
How about:
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy;
chdir'/path/to/directory';
# Find all files that contain 'mw'
my @files = glob("*mw*");
foreach my $file (@files) {
# skip directories
next if -d $file;
# remove 'mw' from the filename
(my $FILE = $file) =~ s/mw//;
# Change filename to uppercase even if the length is <= 8 char
$FILE =~ s/^(.{1,8})/uc $1/se;
move($file, $FILE);
}
As said in the doc for rename, you'd better use File::Copy to be platform independent.
Always check return values of system calls!
When you make any call to OS services, you should always check the return value. For example, the Perl documentation for chdir is (with added emphasis)
chdir EXPR
chdir FILEHANDLE
chdir DIRHANDLE
chdir
Changes the working directory to EXPR, if possible. If EXPR is omitted, changes to the directory specified by $ENV{HOME}, if set; if not, changes to the directory specified by $ENV{LOGDIR}. (Under VMS, the variable $ENV{SYS$LOGIN} is also checked, and used if it is set.) If neither is set, chdir does nothing. It returns true on success, false otherwise. See the example under die.
On systems that support fchdir(2), you may pass a filehandle or directory handle as the argument. On systems that don't support fchdir(2), passing handles raises an exception.
As written in your question, your code discards important information: whether system calls chdir and rename succeeded or failed.
Providing useful error messages
An example of a common idiom for checking return values in Perl is
chdir $path or die "$0: chdir $path: $!";
The error message contains three important bits of information:
the program emitting the error, $0
what it was trying to do, chdir in this case
why it failed, $!
Also note that die appends the name of the file and the line number where program control was if your error message does not end with a newline. When the chdir fails, the standard error will resemble
./myprogram: chdir: No such file or directory at ./myprogram line 3.
Logical or is true when at least one of its arguments is true. The “do something or die” idiom works because if chdir above fails, it returns a false value and requires or to evaluate the right-hand side and terminates execution with die. In the happy case where chdir succeeds and returns a true value, there is no need to evaluate the right-hand side because we already have one true argument to logical or.
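The idiom and the effect of a trailing newline can be seen in a small sketch. The helper name failed_chdir_message is invented for the example, and the directory path is assumed not to exist; eval is used only so the message can be captured instead of ending the program.

```perl
use strict;
use warnings;

# Hypothetical helper: attempt a chdir that is expected to fail and
# return the message that die would have sent to standard error.
sub failed_chdir_message {
    my ($path, $with_newline) = @_;
    my $ending = $with_newline ? "\n" : "";
    eval {
        chdir $path or die "$0: chdir $path: $!" . $ending;
        1;
    };
    return $@;
}

my $missing = "/no/such/directory/for/this/sketch";   # assumed not to exist

# No trailing newline: die appends " at FILE line N." to the message.
print failed_chdir_message($missing, 0);

# Trailing newline: the message is used exactly as written.
print failed_chdir_message($missing, 1);
```

If the chdir were to succeed, the right-hand side of or would never run and the helper would return an empty string.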
Suggested improvements to your code
For what you’re doing, I recommend using readdir to avoid problems in case one of the filenames contains whitespace. Note the defined test in the code below that’s there to stop a file named 0 (i.e., a single zero character) terminating your loop.
#! /usr/bin/env perl
chdir "directory path" or die "$0: chdir: $!";
opendir $dh, "." or die "$0: opendir: $!";
while (defined($oldname = readdir $dh)) {
next unless ($newname = $oldname) =~ s/mw//;
$newname =~ s/^(.{1,8})/\U$1/;
rename $oldname, $newname or die "$0: rename $oldname, $newname: $!";
}
For the rename to have any hope, you have to preserve the value of $oldname, so right away, the code above copies it to $newname and starts changing the copy rather than the original. You will see
($new = $old) =~ s/.../.../; # or /.../
in Perl code, so it is also an important idiom to understand.
The perlop documentation defines handy escape sequences for use in strings and regex substitutions:
\l lowercase next character only
\u titlecase (not uppercase!) next character only
\L lowercase all characters till \E seen
\U uppercase all characters till \E seen
\Q quote non-word characters till \E
\E end either case modification or quoted section (whichever was last seen)
The code above grabs the first eight characters (or fewer if $newname is shorter in length) and replaces them with their upcased counterparts.
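A short sketch of these escapes in action; the helper name upcase_prefix is made up for the example and simply wraps the substitution used in the code above.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the substitution above: upper-case at most
# the first eight characters of a name and leave the rest untouched.
sub upcase_prefix {
    my ($name) = @_;
    (my $copy = $name) =~ s/^(.{1,8})/\U$1/;
    return $copy;
}

print upcase_prefix("qrstuvwxyz"), "\n";   # QRSTUVWXyz
print upcase_prefix("abc"), "\n";          # ABC
print "\LHELLO\E World\n";                 # hello World
print "\uhello\n";                         # Hello
```

The {1,8} quantifier is what lets names shorter than eight characters still match, and \U stays in effect to the end of the replacement because no \E follows it.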
Example output
See the code in action:
$ ls directory\ path/
defmwghijk mwabc nochange qrstuvwxyzmw
$ ./prog
$ ls directory\ path/
ABC DEFGHIJK QRSTUVWXyz nochange
I figure there's more to your requirements than you're telling us, such as not uppercasing parts of the file extension. Instead of matching the first eight characters, I'll match the first eight letters:
use v5.14;
use utf8;
chdir "/Users/brian/test/";
my @files = glob( 'mw*' );
foreach my $old (@files) {
my $new = $old =~ s/\Amw(\pL{1,8})/\U$1/ir;
print "$old → $new\n";
}
Some other notes:
You can do the glob directly in Perl. You don't need ls.
It looks like you were stripping off mw, so I did that. If that's not what you want, it's easy to change.
In lieu of a regular expression to up-case the first eight characters you could use the 4-argument form of substr. This offers in situ replacement.
my $old = q(abcdefghij);
my $new = $old;
substr( $new, 0, 8, substr( uc($old), 0, 8 ) );
print "$old\n$new\n";
abcdefghij
ABCDEFGHij
Use rename or File::Copy::move (as M42 showed) to perform the actual rename.

How can I do bulk search and replace with Perl?

I have the following script that takes in an input file, output file and
replaces the string in the input file with some other string and writes out
the output file.
I want to change the script to traverse through a directory of files
i.e. instead of prompting for input and output files, the script should take
as argument a directory path such as C:\temp\allFilesTobeReplaced\ and
search for a string x and replace it with y for all files under that
directory path and write out the same files.
How do I do this?
Thanks.
$file=$ARGV[0];
open(INFO,$file);
@lines=<INFO>;
print @lines;
open(INFO,">c:/filelist.txt");
foreach $file (@lines){
#print "$file\n";
print INFO "$file";
}
#print "Input file name: ";
#chomp($infilename = <STDIN>);
if ($ARGV[0]){
$file= $ARGV[0]
}
print "Output file name: ";
chomp($outfilename = <STDIN>);
print "Search string: ";
chomp($search = <STDIN>);
print "Replacement string: ";
chomp($replace = <STDIN>);
open(INFO,$file);
@lines=<INFO>;
open(OUT,">$outfilename") || die "cannot create $outfilename: $!";
foreach $file (@lines){
# read a line from file IN into $_
s/$search/$replace/g; # change the lines
print OUT $_; # print that line to file OUT
}
close(IN);
close(OUT);
The use of the perl single liner
perl -pi -e 's/original string/new string/' filename
can be combined with File::Find, to give the following single script (this is a template I use for many such operations).
use File::Find;
# search for files down a directory hierarchy ('.' taken for this example)
find(\&wanted, ".");
sub wanted
{
if (-f $_)
{
# for the files we are interested in call edit_file().
edit_file($_);
}
}
sub edit_file
{
my ($filename) = @_;
# you can re-create the one-liner above by localizing @ARGV as the list of
# files the <> will process, and localizing $^I as the name of the backup file.
local (@ARGV) = ($filename);
local($^I) = '.bak';
while (<>)
{
s/original string/new string/g;
}
continue
{
print;
}
}
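For trying the technique without touching real data, here is a self-contained sketch that applies the same local @ARGV and local $^I approach to one temporary file. The helper name replace_in_file, the sample file name and the sample text are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: in-place substitution on one file, keeping a .bak copy,
# using the same local @ARGV / local $^I technique as edit_file() above.
sub replace_in_file {
    my ($filename, $search, $replace) = @_;
    local @ARGV = ($filename);
    local $^I   = '.bak';
    while (<>) {
        s/\Q$search\E/$replace/g;   # \Q...\E: treat $search as literal text
        print;                      # goes to the rewritten file, not STDOUT
    }
    return;
}

# Build a throwaway file to edit.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'sample.txt');
open my $out, '>', $file or die "$file: $!";
print {$out} "original string here\nnothing to change\n";
close $out or die "$file: $!";

replace_in_file($file, 'original string', 'new string');

# Show the result and confirm the backup exists.
open my $in, '<', $file or die "$file: $!";
print while <$in>;
close $in;
print "backup kept: ", (-e "$file.bak" ? "yes" : "no"), "\n";
```

Because both variables are localized, @ARGV and $^I return to their previous values when the helper returns, so the rest of the program is unaffected.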
You can do this with the -i param:
Just process all the files as normal, but include -i.bak:
#!/usr/bin/perl -i.bak
while ( <> ) {
s/before/after/;
print;
}
This should process each file, and rename the original to original.bak. And of course you can do it as a one-liner as mentioned by @Jamie Cook.
Try this
#!/usr/bin/perl -w
@files = <*>;
foreach $file (@files) {
print $file . "\n";
}
Also take a look at glob in Perl:
http://perldoc.perl.org/File/Glob.html
http://www.lyingonthecovers.net/?p=312
I know you can use a simple Perl one-liner from the command line, where filename can be a single filename or a list of filenames. You could probably combine this with bgy's answer to get the desired effect:
perl -pi -e 's/original string/new string/' filename
And I know it's trite but this sounds a lot like sed, if you can use gnu tools:
for i in `find ./allFilesTobeReplaced -type f`; do sed -i 's/original string/new string/g' "$i"; done
perl -pi -e 's#OLD#NEW#g' filename.
You can replace filename with the pattern that suits your file list.