Perl wildcards in file paths

I am working on a project where a GNU Makefile should automatically test my Perl program with different input files. I have this code, which reads a single file from the inputs directory, filters out stop words, and writes an out.txt file containing a frequency table of the non-stop words.
#!/usr/bin/perl
use strict;
use warnings;
use Lingua::StopWords qw(getStopWords);
my %found;
my $src = '/programu-testavimas/1-dk/trunk/tests/inputs/test.txt';
my $des = '/programu-testavimas/1-dk/trunk/tests/outputs/out.txt';
open(SRC,'<',$src) or die $!;
open(DES,'>',$des) or die $!;
my $stopwords = getStopWords('en');
while( my $line = <SRC>){
++$found{$_} for grep { !$stopwords->{$_} }
split /\s+/, lc $line;
}
print DES $_, "\t\t", $found{$_}, $/ for sort keys %found;
close(SRC);
close(DES);
My goal is to test many files with separate case.sh scripts, where the input file is different in each case. This is one of the cases:
#!/bin/sh
perl /programu-testavimas/1-dk/trunk/scripts/test.pl /programu-testavimas/1-dk/trunk/tests/inputs/test.txt > /home/aleksandra/programų-testavimas/1-dk/trunk/tests/outputs/out.txt
Then my Makefile should run all the cases at once, testing the program with a different input in each one. Right now I'm struggling with my Perl code, where the input file is a single hard-coded path, and I need to make it read different files from the inputs directory. How can I change the path so that each bash script can use its own input file?
EDIT: I tried this with the glob function, but it outputs an empty file:
open(DES,'>',$des) or die $!;
my $stopwords = getStopWords('en');
for my $file ( glob $src ) {
open(SRC,'<',$file) or die "$! opening $file";
while( my $line = <SRC>){
++$found{$_} for grep { !$stopwords->{$_} }
split /\s+/, lc $line;
}
print DES $_, "\t\t", $found{$_}, $/ for sort keys %found;
close(SRC);
}
close(DES);

Correct me if I'm wrong, but it sounds like you have different shell scripts, each calling your Perl script with a different input and redirecting the Perl script's output to a new file.
You don't need to glob anything in your Perl script. It already has all the information it needs: which file to read. Your shell scripts/Makefile handle the rest.
So given the shell script
#!/bin/sh
perl /path/to/test.pl /path/to/input.txt > /path/to/output.txt
Then in your perl script, simply read from the file provided via the first positional parameter:
#!/usr/bin/perl
use strict;
use warnings;
use Lingua::StopWords qw(getStopWords);
my %found;
my $stopwords = getStopWords('en');
while(my $line = <>) {
++$found{$_} for grep { !$stopwords->{$_} }
split /\s+/, lc $line;
}
print $_, "\t\t", $found{$_}, $/ for sort keys %found;
while(<>) will read from STDIN or from the files named in @ARGV.
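A self-contained sketch of that behaviour (the file name sample.txt is invented here; in the real setup the shell script passes the input path):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Create a tiny sample input so the sketch runs on its own.
open my $out, '>', 'sample.txt' or die $!;
print $out "the quick fox\n";
close $out;

# <> reads STDIN when @ARGV is empty; otherwise it opens each
# file named in @ARGV in turn, and $ARGV holds the current name.
@ARGV = ('sample.txt');    # as if invoked: perl test.pl sample.txt
while (my $line = <>) {
    print "$ARGV: $line";
}

unlink 'sample.txt';
```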
Your shell script could then call your perl script with different inputs and define outputs:
#!/bin/sh
for input in /path/to/*.txt; do
perl /path/to/test.pl "$input" > "$input.out"
done

Related

My perl script isn't working, I have a feeling it's the grep command

I'm trying to search one file for instances of a number, and print the line if the other file contains that number.
#!/usr/bin/perl
open(file, "textIds.txt");
@file = <file>; #file looking into
#close file;
while(<>){
$temp = $_;
$temp =~ tr/|/\t/; #puts tab between name and id
@arrayTemp = split("\t", $temp);
@found=grep{/$arrayTemp[1]/} <file>;
if (defined $found[0]){
#if (grep{/$arrayTemp[1]/} <file>){
print $_;
}
@found=();
}
print "\n";
close file;
#the input file lines have the format of
#John|7791 154
#Smith|5432 290
#Conor|6590 897
#And in the file the format is
#5432
#7791
#6590
#23140
There are some issues in your script.
Always include use strict; and use warnings;.
This would have told you about odd things in your script in advance.
Never use barewords as filehandles as they are global identifiers. Use three-parameter-open
instead: open( my $fh, '<', 'textIds.txt');
use autodie; or check whether the opening worked.
You read and store textIds.txt into the array @file, but later on (in your grep) you are
again trying to read from that filehandle (with <file>). As @PaulL said, this will always
give undef (false) because the file was already read.
Replacing | with tabs and then splitting at tabs is not necessary. You can split on tabs and
pipes at the same time (assuming "John|7791 154" is really "John|7791\t154").
You talk about an "input file" and an "in file" without saying exactly which is which.
I assume your "textIds.txt" is the one with only the numbers, and the other input file is the
one read from STDIN (the one with the |'s in it).
With this in mind your script could be written as:
#!/usr/bin/perl
use strict;
use warnings;
# Open 'textIds.txt' and slurp it into the array @file:
open( my $fh, '<', 'textIds.txt') or die "cannot open file: $!\n";
my @file = <$fh>;
close($fh);
# iterate over STDIN and compare with lines from 'textIds.txt':
while( my $line = <>) {
# split "John|7791\t154" into ("John", "7791", "154"):
my ($name, $number1, $number2) = split(/\||\t/, $line);
# compare $number1 to each member of @file and print if found:
if ( grep( /$number1/, @file) ) {
print $line;
}
}
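The exhausted-filehandle point is easy to demonstrate on its own; this sketch (with an invented file name) shows the second read returning nothing, and seek rewinding the handle:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write a throwaway file so the example is self-contained.
open my $out, '>', 'demo_ids.txt' or die $!;
print $out "7791\n5432\n";
close $out;

open my $fh, '<', 'demo_ids.txt' or die $!;
my @first  = <$fh>;   # slurps both lines; handle is now at EOF
my @second = <$fh>;   # empty list: the file was already read
print scalar(@first), " lines, then ", scalar(@second), " lines\n";

seek $fh, 0, 0;       # rewind to the beginning
my @again = <$fh>;    # both lines are available again
print scalar(@again), " lines after seek\n";

close $fh;
unlink 'demo_ids.txt';
```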

I have a file that I want to split using pipe as delimiter. How can I read the file using Perl?

Here is a shell script reading the file.
#!/bin/sh
procDate=$1
echo "Date $procDate"
file=`cat filename_$procDate.txt`
echo "$file"
I want to convert it to Perl and use the split operator with pipe | as delimiter.
It's far from clear from your question what you want to do with these fields once you have split them.
Your own shell script uses cat to copy the entire contents of your file into $file, but that's unlikely to be what you need to do.
A very generalised Perl program would look like this
use strict;
use warnings 'all';
my ($procDate) = @ARGV;
print "Date $procDate\n";
open my $fh, '<', "filename_$procDate.txt" or die $!;
while ( <$fh> ) {
chomp;
my @fields = split /\|/;
# do something with @fields, for instance
print "@fields\n";
}
That code splits each line on pipe | characters, puts the list of substrings in @fields, and then prints them separated by spaces. But I can't guess what more you might want to do.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my ($procDate) = @ARGV;
open(FILE, "<filename_$procDate.txt") or die "Couldn't open file filename_$procDate.txt, $!";
while ( my $line = <FILE> ) {
print "Line content is $line\n";
my @line_content = split(/\|/, $line);
print Dumper (\@line_content);
}
close (FILE);

foreach and special variable $_ not behaving as expected

I'm learning Perl and wrote a small script to open perl files and remove the comments
# Will remove this comment
my $name = ""; # Will not remove this comment
#!/usr/bin/perl -w <- won't remove this special comment
The name of files to be edited are passed as arguments via terminal
die "You need to a give atleast one file-name as an arguement\n" unless (#ARGV);
foreach (#ARGV) {
$^I = "";
(-w && open FILE, $_) || die "Oops: $!";
/^\s*#[^!]/ || print while(<>);
close FILE;
print "Done! Please see file: $_\n";
}
Now when I ran it via Terminal:
perl removeComments file1.pl file2.pl file3.pl
I got the output:
Done! Please see file:
This script is working EXACTLY as I'm expecting but
Issue 1 : Why $_ didn't print the name of the file?
Issue 2 : Since the loop runs for 3 times, why Done! Please see file: was printed only once?
How would you write this script in as few lines as possible?
Please comment on my code as well, if you have time.
Thank you.
The while stores the lines read by the diamond operator <> into $_, so you're writing over the variable that stores the file name.
On the other hand, you open the file with open but don't actually use the handle to read; it uses the empty diamond operator instead. The empty diamond operator makes an implicit loop over files in #ARGV, removing file names as it goes, so the foreach runs only once.
To fix the second issue you could use while(<FILE>), or rewrite the loop to take advantage of the implicit loop in <> and write the entire program as:
$^I = "";
/^\s*#[^!]/ || print while(<>);
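The way the empty <> consumes @ARGV can be shown directly; in this sketch (throwaway file names) @ARGV is empty after the read, which is why a surrounding foreach over @ARGV would never see a second file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two throwaway files standing in for the script arguments.
for my $name ('one.txt', 'two.txt') {
    open my $out, '>', $name or die $!;
    print $out "some line\n";
    close $out;
}

@ARGV = ('one.txt', 'two.txt');
print scalar(@ARGV), " names before reading\n";   # 2
1 while <>;    # drain the diamond operator across both files
print scalar(@ARGV), " names after reading\n";    # 0: <> shifted them off

unlink 'one.txt', 'two.txt';
```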
Here's a more readable approach.
#!/usr/bin/perl
# always!!
use warnings;
use strict;
use autodie;
use File::Copy;
# die with some usage message
die "usage: $0 [ files ]\n" if #ARGV < 1;
for my $filename (#ARGV) {
# create tmp file name that we are going to write to
my $new_filename = "$filename\.new";
# open $filename for reading and $new_filename for writing
open my $fh, "<", $filename;
open my $new_fh, ">", $new_filename;
# Iterate over each line in the original file: $filename,
# if our regex matches, we bail out. Otherwise we print the line to
# our temporary file.
while(my $line = <$fh>) {
next if $line =~ /^\s*#[^!]/;
print $new_fh $line;
}
close $fh;
close $new_fh;
# use File::Copy's move function to rename our files.
move($filename, "$filename\.bak");
move($new_filename, $filename);
print "Done! Please see file: $filename\n";
}
Sample output:
$ ./test.pl a.pl b.pl
Done! Please see file: a.pl
Done! Please see file: b.pl
$ cat a.pl
#!/usr/bin/perl
print "I don't do much\n"; # comments dont' belong here anyways
exit;
print "errrrrr";
$ cat a.pl.bak
#!/usr/bin/perl
# this doesn't do much
print "I don't do much\n"; # comments dont' belong here anyways
exit;
print "errrrrr";
It's not safe to use multiple loops and rely on getting the right $_. The while loop is clobbering your $_. Give your files specific names inside that loop, like so:
foreach my $filename (@ARGV) {
$^I = "";
(-w && open my $FILE,'<', $filename) || die "Oops: $!";
/^\s*#[^!]/ || print while(<$FILE>);
close $FILE;
print "Done! Please see file: $filename\n";
}
or that way:
foreach (@ARGV) {
my $filename = $_;
$^I = "";
(-w && open my $FILE,'<', $filename) || die "Oops: $!";
/^\s*#[^!]/ || print while(<$FILE>);
close $FILE;
print "Done! Please see file: $filename\n";
}
Please never use barewords for filehandles and do use a 3-argument open.
open my $FILE, '<', $filename — good
open FILE, $filename — bad
Simpler solution: Don't use $_.
When Perl was first written, it was conceived as a replacement for Awk and shell, and Perl heavily borrowed from their syntax. For brevity, Perl also created the special variable $_, which lets you use various commands without having to create variables:
while ( <INPUT> ) {
next if /foo/;
print OUTPUT;
}
The problem is that if everything is using $_, then everything will affect $_, with many unpleasant side effects.
Now, Perl is a much more sophisticated language, and has things like lexically scoped variables (hint: you don't use local to create these variables -- that merely gives package variables (aka global variables) a local value).
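The difference is easy to see side by side; this sketch (variable and sub names invented) shows that local gives a package variable a temporary value that callees see, while my creates a fresh lexical that callees never see:

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $x = 'outer';                  # a package (global) variable

sub show { print "x is $x\n" }     # reads the package variable

sub with_local {
    local $x = 'localized';        # temporary value, dynamic scope
    show();                        # prints "x is localized"
}                                  # old value restored on exit

sub with_my {
    my $x = 'lexical';             # new lexical, invisible to callees
    show();                        # prints "x is outer"
}

with_local();
show();                            # prints "x is outer" again
with_my();
```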
Since you're learning Perl, you might as well learn Perl correctly. The problem is that there are too many books that are still based on Perl 3.x. Find a book or web page that incorporates modern practice.
In your program, $_ switches from the file name to the line in the file and back to the next file name. That's what's confusing you. If you used named variables, you could distinguish between files and lines.
I've rewritten your program using more modern syntax, but your same logic:
use strict;
use warnings;
use autodie;
use feature qw(say);
if ( not $ARGV[0] ) {
die "You need to give at least one file name as an argument\n";
}
for my $file ( @ARGV ) {
# Remove suffix and copy file over
if ( $file !~ /\..+?$/ ) {
die qq(File "$file" doesn't have a suffix);
}
( my $output_file = $file ) =~ s/\..+?$//; # Remove suffix for output
open my $input_fh, "<", $file;
open my $output_fh, ">", $output_file;
while ( my $line = <$input_fh> ) {
print {$output_fh} $line unless $line =~ /^\s*#[^!]/;
}
close $input_fh;
close $output_fh;
}
This is a bit more typing than your version of the program, but it's easier to see what's going on and maintain.

Perl-script to read and print lines from multiple txt files?

We have 300+ txt files, of which are basically replicates of an email, each txt file has the following format:
To: blabla@hotmail.com
Subject: blabla
From: bla1@hotmail.com
Message: Hello World!
The platform I am running the script on is Windows, and everything is local (including the Perl instance). The aim is to write a script which crawls through each file (all located within the same directory) and prints out a list of each unique email address found in the From field. The concept is very easy.
Can anyone point me in the right direction here? I know how to start off a Perl script, and I am able to read a single file and print all details:
#!/usr/local/bin/perl
open (MYFILE, 'emails/email_id_1.txt');
while (<MYFILE>) {
chomp;
print "$_\n";
}
close (MYFILE);
So now, I need to be able to read and print line 3 of this file, but perform this activity not just once, but for all of the files. I've looked into the File::Find module, could this be of any use?
What platform? If Linux then it's simple:
foreach $f (@ARGV) {
# Do stuff
}
and then call with:
perl mything.pl *.txt
On Windows you'll need to expand the wildcard first, as cmd.exe doesn't expand wildcards (unlike Linux shells):
@ARGV = map glob, @ARGV;
foreach $f (@ARGV) {
# Do stuff
}
Then extracting the third line is just a simple case of reading each line in and counting until you've got to line 3, so you know when to print the result.
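That counting can lean on Perl's built-in line counter $.; a sketch with an invented file name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A throwaway file standing in for one of the email files.
open my $out, '>', 'email1.txt' or die $!;
print $out "To: someone\@example.com\n",
           "Subject: hello\n",
           "From: sender\@example.com\n",
           "Message: Hello World!\n";
close $out;

open my $fh, '<', 'email1.txt' or die $!;
while (my $line = <$fh>) {
    if ($. == 3) {     # $. is the line number of the last read
        print $line;   # the From: line
        last;          # no need to read the rest of the file
    }
}
close $fh;
unlink 'email1.txt';
```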
The glob() builtin can give you a list of files in a directory:
chdir $dir or die $!;
my #files = glob('*');
You can use Tie::File to access the 3rd line of a file:
use Tie::File;
for (@files) {
tie my @lines, 'Tie::File', $_ or die $!;
print $lines[2], "\n";
}
Perl one-liner, windows-version:
perl -wE "#ARGV = glob '*.txt'; while (<>) { say $1 if /^From:\s*(.*)/ }"
It will check all the lines, but only print if it finds a valid From: tag.
Are you using a Unix-style shell? You can do this in the shell without even using Perl.
grep "^From:" ./* | sort | uniq -c"
The breakdown is as follows:
grep will grab every line that starts with "From:", and send it to...
sort, which will alpha sort those lines, then...
uniq, which will filter out dupe lines. The "-c" part will count the occurrences.
Your output would look like:
3 From: dave@example.com
5 From: foo@bar.example.com
etc...
Possible issues:
I'm not sure how complex your "From" lines will be, e.g. multiple addresses, different formats, etc.
You could enhance that grep step in a few ways, or replace it with a Perl script that has less-broad functionality than your proposed all-in-one script.
Please comment if anything isn't clear.
Here's my solution (I hope this isn't homework).
It checks all files in the current directory whose names end with ".txt", case-insensitive (e.g., it will find "foo.TXT", which is probably what you want under Windows). It also allows for possible variations in line terminators (at least CR-LF and LF), and searches for the From: prefix case-insensitively, and allows arbitrary whitespace after the :.
#!/usr/bin/perl
use strict;
use warnings;
opendir my $DIR, '.' or die "opendir .: $!\n";
my @files = grep /\.txt$/i, readdir $DIR;
closedir $DIR;
# print "Got ", scalar @files, " files\n";
my %seen = ();
foreach my $file (@files) {
open my $FILE, '<', $file or die "$file: $!\n";
while (<$FILE>) {
if (/^From:\s*(.*)\r?$/i) {
$seen{$1} = 1;
}
}
close $FILE;
}
foreach my $addr (sort keys %seen) {
print "$addr\n";
}

How can I do bulk search and replace with Perl?

I have the following script that takes an input file and an output file, replaces the string in the input file with some other string, and writes out the output file.
I want to change the script to traverse a directory of files, i.e. instead of prompting for input and output files, the script should take as an argument a directory path such as C:\temp\allFilesTobeReplaced\, search for a string x, replace it with y in all files under that directory path, and write out the same files.
How do I do this?
Thanks.
$file=$ARGV[0];
open(INFO,$file);
@lines=<INFO>;
print @lines;
open(INFO,">c:/filelist.txt");
foreach $file (@lines){
#print "$file\n";
print INFO "$file";
}
#print "Input file name: ";
#chomp($infilename = <STDIN>);
if ($ARGV[0]){
$file= $ARGV[0]
}
print "Output file name: ";
chomp($outfilename = <STDIN>);
print "Search string: ";
chomp($search = <STDIN>);
print "Replacement string: ";
chomp($replace = <STDIN>);
open(INFO,$file);
@lines=<INFO>;
open(OUT,">$outfilename") || die "cannot create $outfilename: $!";
foreach $file (@lines){
# read a line from file IN into $_
s/$search/$replace/g; # change the lines
print OUT $_; # print that line to file OUT
}
close(IN);
close(OUT);
The Perl one-liner
perl -pi -e 's/original string/new string/' filename
can be combined with File::Find to give the following script (this is a template I use for many such operations).
use File::Find;
# search for files down a directory hierarchy ('.' taken for this example)
find(\&wanted, ".");
sub wanted
{
if (-f $_)
{
# for the files we are interested in call edit_file().
edit_file($_);
}
}
sub edit_file
{
my ($filename) = @_;
# you can re-create the one-liner above by localizing @ARGV as the list of
# files the <> will process, and localizing $^I as the name of the backup file.
local (@ARGV) = ($filename);
local($^I) = '.bak';
while (<>)
{
s/original string/new string/g;
}
continue
{
print;
}
}
You can do this with the -i param:
Just process all the files as normal, but include -i.bak:
#!/usr/bin/perl -i.bak
while ( <> ) {
s/before/after/;
print;
}
This should process each file and rename the original to original.bak. And of course you can do it as a one-liner, as mentioned by @Jamie Cook.
Try this
#!/usr/bin/perl -w
@files = <*>;
foreach $file (@files) {
print $file . "\n";
}
Take also a look at glob in Perl:
http://perldoc.perl.org/File/Glob.html
http://www.lyingonthecovers.net/?p=312
I know you can use a simple Perl one-liner from the command line, where filename can be a single filename or a list of filenames. You could probably combine this with bgy's answer to get the desired effect:
perl -pi -e 's/original string/new string/' filename
And I know it's trite, but this sounds a lot like sed, if you can use GNU tools:
for i in `find ./allFilesTobeReplaced`; do sed -i 's/original string/new string/g' "$i"; done
perl -pi -e 's#OLD#NEW#g' filename
You can replace filename with the pattern that suits your file list.