My question is probably simple but I'm a complete newbie. I want to search the contents of multiple text files for a particular phrase and then display the matching lines on screen. I've already learnt how to deal with a single file. For example, if I want to search for a word, say "Okay", in a text file named "wyvern.txt" in the root directory of F, the following code works:
#!/usr/bin/perl
$file = 'F:\wyvern.txt';
open(txt, $file);
while($line = <txt>) {
print "$line" if $line =~ /Okay/;
}
close(txt);
But what should I do if I want to search for the same phrase in two text files, say "wyvern" and "casanova"? Or how about all the files in the directory "novels" in the root directory of F?
Any help would be greatly appreciated.
Thanks in advance
Mike
Edit:
Haha, I finally figured out how to search all the files in a directory for a pattern match:)
The following code works great:
#!/usr/bin/perl
@files = <F:/novels/*>;
foreach $file (@files) {
open (FILE, "$file");
while($line= <FILE> ){
print "$line" if $line =~ /Okay/;
}
close FILE;
}
Extending the good answer provided by Jonathan Leffler:
The filename where the match was found is in $ARGV, and with a small change, the line number can be found in $.. Example:
while (<>) {
print "$ARGV:$.:$_" if /pattern/;
} continue {
close ARGV if eof; # Reset $. at the end of each file.
}
Furthermore, if you have a list of filenames and they're not on the commandline, you can still get the magic ARGV behavior. Watch:
{
local @ARGV = ('one.txt', 'two.txt');
while (<>) {
print "$ARGV:$.:$_" if /Okay/;
} continue {
close ARGV if eof;
}
}
Which is a generally useful pattern for doing line-by-line processing on a series of files, whatever it is -- even if I might recommend File::Grep or App::Ack for this specific problem :)
On a system where command line arguments are properly expanded, you can use:
[sinan@host:~/test]$ perl -ne 'print "$.:$_" if /config/' *
1:$(srcdir)/config/override.m4
The problem with Windows is:
C:\Temp> perl -ne "print if /perl/" *.txt
Can't open *.txt: Invalid argument.
On Windows, you could do:
C:\Temp> for %f in (*.txt) do perl -ne "print if /perl/" %f
But, you might just want to use cmd.exe builtin findstr or the grep command line tool.
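If you would rather stay in Perl on Windows, one possible workaround is to let the script expand the wildcards itself before the `<>` loop starts. The sketch below is only an illustration: `expand_args` is a hypothetical helper name, the `/perl/` pattern is taken from the example above, and file names containing spaces would need extra care because `glob` splits its argument on whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: expand arguments that contain wildcards,
# since cmd.exe passes "*.txt" to the script unchanged.
sub expand_args {
    return map { /[*?]/ ? glob($_) : $_ } @_;
}

@ARGV = expand_args(@ARGV);

# Only scan when at least one file name remains; with an empty @ARGV,
# <> would fall back to reading standard input.
if (@ARGV) {
    while (<>) {
        print "$ARGV:$.:$_" if /perl/;
    } continue {
        close ARGV if eof;    # reset $. at the end of each file
    }
}
```

Saved as, say, `search.pl` (a made-up name), it could then be called as `perl search.pl *.txt` from cmd.exe.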
The easiest way is to list the files on the command line, and then simply use:
while (<>)
{
print if m/Okay/;
}
File::Grep is what you need here
Just a tweak on your line <F:/novels/*>: I prefer to use the glob function. It works the same in this context and avoids confusing the many different uses of angle brackets in Perl, i.e.:
@files = glob "F:/novels/*";
See perldoc glob for more.
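Putting glob together with the loop from the question, a slightly more defensive version might look like the sketch below. It assumes lexical filehandles, the three-argument form of open, and a hypothetical helper named `grep_files`; the directory `F:/novels` and the pattern `Okay` come from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every line matching $pattern in the files
# matched by $glob_spec, each prefixed with "filename:line_number:".
sub grep_files {
    my ($pattern, $glob_spec) = @_;
    my @hits;
    for my $file (glob $glob_spec) {
        next unless -f $file;    # skip subdirectories
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while (my $line = <$fh>) {
            # $. is the line number within the file currently being read
            push @hits, "$file:$.:$line" if $line =~ /$pattern/;
        }
        close $fh;
    }
    return @hits;
}

print grep_files(qr/Okay/, 'F:/novels/*');
```

Checking the result of open matters here because a file that cannot be read would otherwise be skipped silently.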
Put the files in a for loop, or something along those lines, e.g.:
for $file ('F:\wyvern.txt','F:\casanova.txt') {
open(TXT, $file);
while($line = <TXT>) {
print "$line" if $line =~ /Okay/;
}
close TXT;
}
Okay, I'm a complete dummy. But to sum up, I can now search a single text file or multiple text files for a specified string. I'm still trying to figure out how to deal with all the files in one folder.
The following code works.
Code 1:
#!/usr/bin/perl
$file = 'F:\one.txt';
open(txt, $file);
while($line = <txt>) {
print "$line" if $line =~ /Okay/;
}
close(txt);
Code 2:
#!/usr/bin/perl
{
local @ARGV = ('F:\wyvern.txt', 'F:\casanova.txt');
while (<>) {
print "$ARGV:$.:$_" if /Okay/;
} continue {
close ARGV if eof;
}
}
Thanks again for your help. I really appreciate it.
Related
I have this simple script that I'm working on. I must admit, I'm totally new to PERL and kinda stuck with this stupid problem.
open(IN, "<def/t.html") or die();
while(<IN>) {
chomp;
if($_ =~ m/FF0000/) {
print "\n".$_."\n";
}
}
So... I opened t.html and found the given string in the file. The output was OK, but I also need to print the name of the file in which the string was found. I really don't know how to get it, and I need it right after the $_. Thanks for the help in advance.
Simply save the file name in a variable before you open it, then go from there:
my $filename = 'def/t.html';
open( IN, '<', $filename ) or die $!;
...
print "\n$filename: " . $_ . "\n";
Notice that the above uses the 3-arg form of open(), which is safer.
(Also, the language is "Perl", not "PERL".)
That is a strange idea, but you can if you want:
$ cat 1.pl
#somewhere in the code
open(F, "f.txt");
my $f = fileno(F);
#here you want to find the filename
open(FILENAME, "ls -l /proc/$$/fd/$f|");
my @fn = split(/\s+/, <FILENAME>);
print $fn[$#fn],"\n";
$ perl 1.pl
/home/ic/f.txt
Here you know only the file descriptor and use it to find the filename.
You can also write it much shorter with readlink:
open(F, "f.txt");
my $f = fileno(F);
#here you want to find the filename
print readlink("/proc/$$/fd/$f"), "\n";
I must note that the file may already have been deleted (though it still exists as long as it is open).
I have a text file to parse in Perl. I parse it from the start of file and get the data that is needed.
After all that is done I want to read the last line in the file with data. The problem is that the last two lines are blank. So how do I get the last line that holds any data?
If the file is relatively short, just read on from where you finished getting the data, keeping the last non-blank line:
use autodie ':io';
open(my $fh, '<', 'file_to_read.txt');
# get the data that is needed, then:
my $last_non_blank_line;
while (my $line = readline $fh) {
# choose one of the following two lines, depending what you meant
if ( $line =~ /\S/ ) { $last_non_blank_line = $line } # line isn't all whitespace
# if ( $line !~ /^$/ ) { $last_non_blank_line = $line } # line has no characters before the newline
}
If the file is longer, or you may have passed the last non-blank line in your initial data gathering step, reopen it and read from the end:
use File::ReadBackwards;
my $backwards = File::ReadBackwards->new( 'file_to_read.txt' );
my $last_non_blank_line;
do {
$last_non_blank_line = $backwards->readline;
} until ! defined $last_non_blank_line || $last_non_blank_line =~ /\S/;
perl -e 'while (<>) { if (/\S/) {$last = $_;} } print $last;' < my_file.txt
You can use the module File::ReadBackwards in the following way:
use File::ReadBackwards ;
$bw = File::ReadBackwards->new('filepath') or
die "can't read file";
while( defined( $log_line = $bw->readline ) ) {
print $log_line ;
exit 0;
}
If they're blank, just check $log_line for a match with \n;
If the file is small, I would store it in an array and read from the end. If it's large, use the File::ReadBackwards module.
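For the small-file case, the array approach could be sketched roughly as follows. The helper name `last_non_blank` is made up for illustration, and the demo reads from an in-memory string instead of a real file so that it runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the last line that contains a
# non-whitespace character, or undef if there is none.
sub last_non_blank {
    my @lines = @_;
    for my $line (reverse @lines) {
        return $line if $line =~ /\S/;
    }
    return;
}

# Demo on an in-memory "file" whose last two lines are blank.
my $text = "first\nlast data line\n\n\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @lines = <$fh>;
close $fh;

print last_non_blank(@lines);    # prints "last data line"
```

With a real file, only the open call would change; the whole file is still held in memory, which is why this only suits small files.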
Here's my variant of command line perl solution:
perl -ne 'END {print $last} $last= $_ if /\S/' file.txt
No one mentioned Path::Tiny. If the file size is relatively small you can do this:
use Path::Tiny;
my $file = path($file_name);
my ($last_line) = $file->lines({count => -1});
CPAN page.
Just remember that for a large file, as @ysth said, it's better to use File::ReadBackwards. The difference can be substantial.
Sometimes it is more convenient for me to run shell commands from Perl code, so I'd prefer the following code for this case:
$result=`tail -n 1 /path/file`;
I am using File::Find and file i/o on a text file to parse a series of directories and move the contents into a new folder. It is a simple script (see below):
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Copy;
my $dir = "/opt/CollectMinderDocuments/coastalalglive"; #base directory for Coastal documents
#read file that contains a list of closed IDs
open(MYDATA, "Closed.txt");
mkdir("Closed");
while(my $line = <MYDATA>) {
chomp $line;
my $str = "$dir" . "/Account$line";
print "$str\n";
find(\&move_documents, $str);
}
sub move_documents {
my $smallStr = substr $File::Find::name, 43;
if(-d) {
#system("mkdir ~/Desktop/Closed/$smallStr");
print "I'm here\n";
system("mkdir /opt/CollectMinderDocuments/coastalalglive/Closed/$smallStr");
#print "Made a directory: /opt/CollectMinderDocuments/coastalalglive/Closed/$smallStr\n";
}
else {
print "Now I'm here\n";
my $smallerStr = substr $File::Find::dir, 43;
my $temp = "mv * /opt/CollectMinderDocuments/coastalalglive/Closed/$smallerStr/";
system("$temp");
}
}
The text file contains a list of numbers:
1234
2805
5467
The code worked when I executed it last month, but it is now returning a "file or directory not found" error. The actual error is "No such file or directoryerDocuments/coastalalglive/Account2805". I know all of the directories it is searching for exist. I have manually typed in one of the directories, and the script executes fine:
find(\&move_documents, "/opt/CollectMinderDocuments/coastalalglive/Account2805/");
I am not sure why the error is being returned. Thanks in advance for the help.
Your error:
"No such file or directoryerDocuments/coastalalglive/Account2805"
Seems to imply that there is an \r that was not removed by your chomp. That will happen when transferring files between different file systems, where the file contains \r\n as line endings. The real error string would be something like:
/opt/CollectMinderDocuments/coastalalglive/Account2805\r: No such file or directory
Try changing chomp $line to $line =~ s/[\r\n]+$//; instead, and see if that works.
Also:
my $temp = "mv * /opt/CollectMinderDocuments/coastalalglive/Closed/$smallerStr/";
system("$temp");
Is very wrong. The first non-directory file in that loop will move all the remaining files (including dirs? not sure if mv does that by default). Hence, subsequent iterations of the subroutine will find nothing to move, also causing a "Not found" type error. Though not one caught by perl, since you are using system instead of File::Copy::move. E.g.:
move $_, "/opt/CollectMinderDocuments/coastalalglive/Closed/$smallerStr/" or die $!;
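To see the line-ending fix in isolation, here is a minimal sketch. `strip_eol` is a hypothetical helper name, and the sample strings stand in for lines read from Closed.txt with Windows, Unix, or no line ending.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strip any trailing CR and/or LF characters,
# whatever line ending the file was saved with.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;
    return $line;
}

for my $raw ("2805\r\n", "2805\n", "2805") {
    printf "[%s]\n", strip_eol($raw);    # prints [2805] each time
}
```

A plain chomp would leave the "\r" in place in the first case. When the path is printed, that stray carriage return sends the cursor back to the start of the line, which is why the error message appeared to overwrite the beginning of the path.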
I've created the script below for an assignment. It asks for a text file, checks the frequency of words, and lists the 10 words that appear the most times. Everything is working fine, but I need this script to be able to start via the command line as well as via the standard input.
So I need to be able to write 'perl wfreq.pl example.txt' and that should start the script and not ask the question for a text file. I'm not sure how to accomplish this really. I think I might need a while loop at the start somewhere that skips the STDIN if you give it the text file on a terminal command line.
How can I do it?
The script
#! /usr/bin/perl
use utf8;
use warnings;
print "Please enter the name of the file: \n" ;
$file = <STDIN>;
chop $file;
open(my $DATA, "<:utf8", $file) or die "Oops!!: $!";
binmode STDOUT, ":utf8";
while(<$DATA>) {
tr/A-Za-z//cs;
s/[;:()".,!?]/ /gio;
foreach $word (split(' ', lc $_)) {
$freq{$word}++;
}
}
foreach $word (sort { $freq{$b} <=> $freq{$a} } keys %freq) {
@fr = (@fr, $freq{$word});
@ord = (@ord, $word);
}
for ($v =0; $v < 10; $v++) {
print " $fr[$v] | $ord[$v]\n";
}
Instead of reading from <STDIN>, you can read from <> to get data either from files provided on the command line or from stdin if there are no files.
For example, with the program:
#!/usr/bin/env perl
while (<>) {
print $_;
}
The command ./script foo.txt will read and print lines from foo.txt, while ./script by itself will read and print lines from standard input.
You need to do the following:
my $DATA;
my $filename = $ARGV[0];
unless ($filename) {
print "Enter filename:\n";
$filename = <STDIN>;
chomp $filename;
}
open($DATA, $filename) or die $!;
Though I have to say, user-prompts are very un-Unix like.
perl script.pl < input.txt
The use of the operator < passes input.txt to script.pl as standard input. You can then skip querying for the filename. Otherwise, use $ARGV[0] or similar, if defined.
You can check for a command-line argument in @ARGV, which is Perl's array that automagically grabs command line arguments, and, if present, process them (else continue with input from STDIN). Something like:
use utf8;
use strict; #Don't ever forget this! Always, always, ALWAYS use strict!
use warnings;
if(@ARGV)
{
#Assume that the first command line argument is a file to be processed as in your original code.
#You may or may not want to care if additional command line arguments are passed. Up to you.
}
else
{
#Ask for the filename and proceed as normal.
}
Note that you might want to isolate the code for processing the file name (i.e., the opening of DATA, going through the lines, counting the words, etc.) to a subroutine that you can easily call (perhaps with an argument consisting of the file name) in each branch of the code.
Oh, and to make sure I'm clear about this: always use strict and always use warnings. If you don't, you're asking for trouble.
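As a rough illustration of the subroutine idea, the counting could be pulled into a function that takes an already-open filehandle, so that each branch (command-line argument or prompt) only has to open the file and call it. The name `top_words` is hypothetical, the word matching is simplified to runs of ASCII letters rather than the original tr/split logic, and the demo reads from an in-memory string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count words read from an open filehandle and return
# the $n most frequent as [word, count] pairs, most frequent first
# (ties are broken alphabetically).
sub top_words {
    my ($fh, $n) = @_;
    my %freq;
    while (my $line = <$fh>) {
        # Simplification: a "word" is any run of ASCII letters.
        $freq{lc $1}++ while $line =~ /([A-Za-z]+)/g;
    }
    my @sorted = sort { $freq{$b} <=> $freq{$a} || $a cmp $b } keys %freq;
    splice @sorted, $n if @sorted > $n;    # keep only the first $n words
    return map { [ $_, $freq{$_} ] } @sorted;
}

# Demo on an in-memory "file".
my $text = "the cat and the dog\nThe end\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
printf "%3d | %s\n", $_->[1], $_->[0] for top_words($fh, 2);
close $fh;
```

In the real script, both the @ARGV branch and the prompt branch would end with the same open call followed by `top_words($fh, 10)`.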
In Unix, what I want to do is "history | grep keyword". Because it takes quite some steps if I want to grep many types of keywords, I want to automate it: I would write a Perl script to do everything, instead of repeating the command and just changing the keyword. Then, whenever I want to see those commands, I can just use the Perl script to do it for me.
The keywords that I would like to grep for are things such as source, ls, cd, etc.
The output can be in any format; I just want to know how to do it.
Thanks! I appreciate any comments.
modified (thanks to @chas-owens)
#!/bin/perl
my $searchString = $ARGV[0];
my $historyFile = ".bash_history";
open FILE, "<", $historyFile or die "could not open $historyFile: $!";
my @lines = <FILE>;
print "Lines that matched $searchString\n";
for (@lines) {
if ($_ =~ /$searchString/) {
print "$_\n";
}
}
original
#!/bin/perl
my $searchString = $ARGV[0];
my $historyFile = "<.bash_history";
open FILE, $historyFile;
my @lines = <FILE>;
print "Lines that matched $searchString\n";
for (@lines) {
if ($_ =~ /$searchString/) {
print "$_\n";
}
}
to be honest ... history | grep whatever is clean and simple and nice ; )
note code may not be perfect
because it takes quite some steps if i wanna grep many types of keywords
history | grep -E 'ls|cd|source'
-P will switch on the Perl compatible regular expression library, if you have a new enough version of grep.
This being Perl, there are many ways to do it. The simplest is probably:
#!/usr/bin/perl
use strict;
use warnings;
my $regex = shift;
print grep { /$regex/ } `cat ~/.bash_history`;
This runs the shell command cat ~/.bash_history and returns the output as a list of lines. The list of lines is then consumed by the grep function. The grep function runs the code block for every item and only returns the ones that have a true return value, so it will only return lines that match the regex.
This code has several things wrong with it (it spawns a shell to run cat, it holds the entire file in memory, $regex could contain dangerous things, etc.), but in a safe environment where speed/memory isn't an issue, it isn't all that bad.
A better script would be
#!/usr/bin/perl
use strict;
use warnings;
use constant HISTORYFILE => "$ENV{HOME}/.bash_history";
my $regex = shift;
open my $fh, "<", HISTORYFILE
or die "could not open ", HISTORYFILE, ": $!";
while (<$fh>) {
next unless /$regex/;
print;
}
This script uses a constant to make it easier to change which history file it is using at a later date. It opens the history file directly and reads it line by line. This means the whole file is never in memory. This can be very important if the file is very large. It still has the problem that $regex might contain a harmful regex, but so long as you are the person running it, you only have yourself to blame (but I wouldn't let outside users pass arguments to a command like this through, say a web application).
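If the argument is meant to be a plain keyword rather than a regex, one way to sidestep the harmful-regex concern is to escape it with \Q...\E (the quotemeta escape), so that metacharacters match literally. This is a minimal sketch with a hypothetical helper `literal_grep` and made-up history lines, not a change to the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines that contain $keyword as a literal
# substring. \Q...\E escapes regex metacharacters, so "a.b" matches only
# the text "a.b" and not "axb".
sub literal_grep {
    my ($keyword, @lines) = @_;
    return grep { /\Q$keyword\E/ } @lines;
}

my @history = ("ls -la\n", "echo a.b\n", "echo axb\n");
print literal_grep('a.b', @history);    # prints only "echo a.b"
```

The trade-off is that alternations such as `ls|cd|source` no longer work as patterns, so this only fits the single-keyword case.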
I think you are better off writing a Perl script which does your fancy matching (i.e. replaces the grep) but does not read the history file. I say this because the history does not appear to be flushed to the .bash_history file until I exit the shell. Now there are probably settings and/or environment variables to control this, but I don't know what they are. So if you just write a Perl script which scans STDIN for your favourite commands, you can invoke it like
history | findcommands.pl
If it's less typing you are after, set up a shell function or alias to do this for you.
As requested by @keifer, here is a sample Perl script which searches your history for a specified (or default) set of commands. Obviously you should change dflt_cmds to whichever ones you search for most frequently.
#!/usr/bin/perl
my @dflt_cmds = qw( cd ls echo );
my $cmds = \@ARGV;
if( !scalar(@$cmds) )
{
$cmds = \@dflt_cmds;
}
while( my $line = <STDIN> )
{
my( $num, $cmd, @args ) = split( ' ', $line );
if( grep( $cmd eq $_ , @$cmds ) )
{
print join( ' ', $cmd, @args )."\n";
}
}