Perl script to parse a text file and match a string - perl

I'm editing my question to add more details
The script executes the command and redirects the output to a text file.
The script then parses the text file to match the following string " Standard 1.1.1.1"
The output in the text file is :
Host Configuration
------------------
Profile Hostname
-------- ---------
standard 1.1.1.1
standard 1.1.1.2
The code works if i search for either 1.1.1.1 or standard . When i search for standard 1.1.1.1 together the below script fails.
this is the error that i get "Unable to find string: standard 172.25.44.241 at testtest.pl
#!/usr/bin/perl
use Net::SSH::Expect;
use strict;
use warnings;
use autodie;
open (HOSTRULES, ">hostrules.txt") || die "could not open output file";
my $hos = $ssh->exec(" I typed the command here ");
print HOSTRULES ($hos);
close(HOSTRULES);
sub find_string
{
my ($file, $string) = #_;
open my $fh, '<', $file;
while (<$fh>) {
return 1 if /\Q$string/;
}
die "Unable to find string: $string";
}
find_string('hostrules.txt', 'standard 1.1.1.1');

Perhaps write a function:
use strict;
use warnings;
use autodie;
sub find_string {
my ($file, $string) = #_;
open my $fh, '<', $file;
while (<$fh>) {
return 1 if /\Q$string/;
}
die "Unable to find string: $string";
}
find_string('output.txt', 'object-cache enabled');
Or just slurp the entire file:
use strict;
use warnings;
use autodie;
my $data = do {
open my $fh, '<', 'output.txt';
local $/;
<$fh>;
};
die "Unable to find string" if $data !~ /object-cache enabled/;

You're scanning a file for a particular string. If that string is not found in that file, you want an error thrown. Sounds like a job for grep.
use strict;
use warnings;
use features qw(say);
use autodie;
use constant {
OUTPUT_FILE => 'output.txt',
NEEDED_STRING => "object-cache enabled",
};
open my $out_fh, "<", OUTPUT_FILE;
my #output_lines = <$out_fh>;
close $out_fh;
chomp #output_lines;
grep { /#{[NEEDED_STRING]}/ } #output_lines or
die qq(ERROR! ERROR! ERROR!); #Or whatever you want
The die command will end the program and exit with a non-zero exit code. The error will be printed on STDERR.
I don't know why, but using qr(object-cache enabled), and then grep { NEEDED_STRING } didn't seem to work. Using #{[...]} allows you to interpolate constants.
Instead of constants, you might want to be able to pass in the error string and the name of the file using GetOptions.
I used the old fashion <...> file handling instead of IO::File, but that's because I'm an old fogy who learned Perl back in the 20th century before it was cool. You can use IO::File which is probably better and more modern.
ADDENDUM
Any reason for slurping the entire file in memory? - Leonardo Herrera
As long as the file is reasonably sized (say 100,000 lines or so), reading the entire file into memory shouldn't be that bad. However, you could use a loop:
use strict;
use warnings;
use features qw(say);
use autodie;
use constant {
OUTPUT_FILE => 'output.txt',
NEEDED_STRING => qr(object-cache enabled),
};
open my $out_fh, "<", OUTPUT_FILE;
my $output_string_found; # Flag to see if output string is found
while ( my $line = <$out_fh> ) {
if ( $line =~ NEEDED_STRING ){
$output_string_found = "Yup!"
last; # We found the string. No more looping.
}
}
die qq(ERROR, ERROR, ERROR) unless $output_string_found;
This will work with the constant NEEDED_STRING defined as a quoted regexp.

perl -ne '/object-cache enabled/ and $found++; END{ print "Object cache disabled\n" unless $found}' < input_file
This just reads the file a line at a time; if we find the key phrase, we increment $found. At the end, after we've read the whole file, we print the message unless we found the phrase.
If the message is insufficient, you can exit 1 unless $found instead.
I suggest this because there are two things to learn from this:
Perl provides good tools for doing basic filtering and data munging right at the command line.
Sometimes a simpler approach gets a solution out better and faster.
This absolutely isn't the perfect solution for every possible data extraction problem, but for this particular one, it's just what you need.
The -ne option flags tell Perl to set up a while loop to read all of standard input a line at a time, and to take any code following it and run it into the middle of that loop, resulting in a 'run this pattern match on each line in the file' program in a single command line.
END blocks can occur anywhere and are always run at the end of the program only, so defining it inside the while loop generated by -n is perfectly fine. When the program runs out of lines, we fall out the bottom of the while loop and run out of program, so Perl ends the program, triggering the execution of the END block to print (or not) the warning.
If the file you are searching contained a string that indicated the cache was disabled (the condition you want to catch), you could go even shorter:
perl -ne '/object-cache disabled/ and die "Object cache disabled\n"' < input_file
The program would scan the file only until it saw the indication that the cache was disabled, and would exit abnormally at that point.

First, why are you using Net::SSH::Expect? Are you executing a remote command? If not, all you need to execute a program and wait for its completion is system.
system("cmd > file.txt") or die "Couldn't execute: $!";
Second, it appears that what fails is your regular expression. You are searching for the literal expression standard 1.1.1.1 but in your sample text it appears that the wanted string contains either tabs or several spaces instead of a single space. Try changing your call to your find_string function:
find_string('hostrules.txt', 'standard\s+1.1.1.1'); # note '\s+' here

Related

How to match and find common based on substring from two files?

I have two files. File1 contains list of email addresses. File2 contains list of domains.
I want to filter out all the email addresses after matching exact domain using Perl script.
I am using below code, but I don't get correct result.
#!/usr/bin/perl
#use strict;
#use warnings;
use feature 'say';
my $file1 = "/home/user/domain_file" or die " FIle not found\n";
my $file2 = "/home/user/email_address_file" or die " FIle not found\n";
my $match = open(MATCH, ">matching_domain") || die;
open(my $data1, '<', $file1) or die "Could not open '$file1' $!\n";
my #wrd = <$data1>;
chomp #wrd;
# loop on the fiile to be searched
open(my $data2, '<', $file2) or die "Could not open '$file2' $!\n";
while(my $line = <$data2>) {
chomp $line;
foreach (#wrd) {
if($line =~ /\#$_$/) {
print MATCH "$line\n";
}
}
}
File1
abc#1gmail.com.au
abc#gmail.com
abc#gmail.com1
abc#2outlook.com2
abc#outlook.com1
abc#yahoo.com
abc#yahooo1.com
abc#yahooo.com
File2
yahoo.com
gmail.com
Expected output
abc#gmail.com
abc#yahoo.com
First off, since you seem to be on *nix, you might want to check out grep -f, which can take search patterns from a given file. I'm no expert in grep, but I would try the file and "match whole words" and this should be fairly easy.
Second: Your Perl code can be improved, but it works as expected. If you put the emails and domains in the files as indicated by your code. It may be that you have mixed the files up.
If I run your code, fixing only the paths, and keeping the domains in file1, it does create the file matching_domain and it contains your expected output:
abc#gmail.com
abc#yahoo.com
So I don't know what you think your problem is (because you did not say). Maybe you were expecting it to print output to the terminal. Either way, it does work, but there are things to fix.
#use strict;
#use warnings;
It is a huge mistake to remove these two. Biggest mistake you will ever do while coding Perl. It will not remove your errors, just hide them. You will spend 10 times as much time bug fixing. Uncomment this as your first thing you do to fix this.
use feature 'say';
You never use this. You could for example replace print MATCH "$line\n" with say MATCH $line, which is slightly more concise.
my $file1 = "/home/user/domain_file" or die " FIle not found\n";
my $file2 = "/home/user/email_address_file" or die " FIle not found\n";
This is very incorrect. You are placing a condition on the creation of a variable. If the condition fails, does the variable exist? Don't do this. I assume this is to check if the file exists, but that is not what this does. To check if a file exists, you can use -e, documented as perldoc "-X" (various file tests).
Furthermore, a statement in the form of a string, "/home/user..." is TRUE ("truthy"), as far as Perl conditions are concerned. It is only false if it is "0" (zero), "" (empty) or undef (undefined). So your or clause will never be executed. E.g. "foo" or die will never die.
Lastly, this test is quite meaningless, as you will be testing this in your open statement later on anyway. If the file does not exist, the open will fail and your program will die.
my $match = open(MATCH, ">matching_domain") || die;
This is also very incorrect. First off, you never use the $match variable. Secondly, I bet it does not contain what you think it does. (it contains a boolean which states whether open was successful or not, see perldoc -f open) Thirdly, again, don't put conditions on my declarations of variables, it is a bad idea.
What this statement really means is that $match will contain either the return value of the open, or the return value of die. This should probably be simply:
open my $match, ">", "matching_domain" or die "Cannot open '$match': $!;
Also, use the three argument open with explicit open MODE, and use lexical file handles, like you have done elsewhere.
And one more thing on top of all the stuff I've already badgered you with: I don't recommend hard coding output files for small programs like this. If you want to redirect the output, use shell redirection: perl foo.pl > output.txt. I think this is what has prompted you to think something is wrong with your code: You don't see the output.
Other than that, your code is fine, as near as I can tell. You may want to chomp the lines from the domain file, but it should not matter. Also remember that indentation is a good thing, and it helps you read your code. I mentioned this in a comment, but it was removed for some reason. It is important though.
Good luck!
This assumes that the lines labeled File1 are in the file pointed to by $file1 and the lines labeled File2 are in the file pointed to by $file2.
You have your variables swapped. You want to match what is in $line against $_, not the other way around:
# loop on the file to be searched
open( my $data2, '<', $file2 ) or die "Could not open '$file2' $!\n";
while ( my $line = <$data2> ) {
chomp $line;
foreach (#wrd) {
if (/\#$line$/) {
print MATCH "$_\n";
}
}
}
You should un-comment the warnings and strict lines:
use strict;
use warnings;
warnings shows you that the or die checks are not really working the way you intended in the file name assignment statements. Just use :
my $file1 = "/home/user/domain_file";
my $file2 = "/home/user/email_address_file";
You are already doing the checks where they belong (on open).

Printing a text file using perl

I am completely new to this and this should be the easiest thing to do but for some reason I cannot get my local text file to print. After trying multiple times with different code I came to use the following code but it doesn't print.
I have searched for days on various threads to solve this and have had no luck. Please help. Here is my code:
#!/usr/bin/perl
$newfile = "file.txt";
open (FH, $newfile);
while ($file = <FH>) {
print $file;
}
I updated my code to the following:
#!/user/bin/perl
use strict; # Always use strict
use warnings; # Always use warnings.
open(my $fh, "<", "file.txt") or die "unable to open file.txt: $!";
# Above we open file using 3 handle method
# or die die with error if unable to open it.
while (<$fh>) { # While in the file.
print $_; # Print each line
}
close $fh; # Close the file
system('C:\Users\RSS\file.txt');
It returns the following: my first report generated by perl. I do not know where this is coming from. Nowhere do I have a print "my first report generated by perl."; statement and it definitely is not in my text file.
My text file is full of various emails, addresses, phone numbers and snippets of emails.
Thank you all for your help. I figured out my problem. I somehow managed to kick myself out of my directory and did not realize it.
This is most likely a combination of a failure to open the file, and a failure to check the return value of open.
If you are completely new to perl, I warmly recommend reading the excellent "perlintro" man page, using either man perlintro or perldoc perlintro on the command line, or taking a look here: https://perldoc.perl.org/perlintro.html.
The "Files and I/O" section there gives a good and concise way of doing this:
open(my $in, "<", "input.txt") or die "Can't open input.txt: $!";
while (<$in>) { # assigns each line in turn to $_
print "Just read in this line: $_";
}
This version will give you an explanation and abort if anything goes wrong while trying to open the file. For example, if there is no file named file.txt in the current working directory, your version will quietly fail to open the file, and afterwards it will quietly fail to read from the closed file handle.
Also, always adding at least one of these to your perl scripts will save you a lot of trouble in the long run:
use warnings; # or use the -w command line switch to turn warnings on globally
use diagnostics;
These won't catch the failure to open the file, but will alert on the failed read.
In the first example here you can see that without the diagnostics module, the code fails without any error messages. The second example shows how the diagnostics module changes this.
$ perl -le 'open FH, "nonexistent.txt"; while(<FH>){print "foo"}'
$ perl -le 'use diagnostics; open FH, "nonexistent.txt"; while(<FH>){print "foo"}'
readline() on closed filehandle FH at -e line 1 (#1)
(W closed) The filehandle you're reading from got itself closed sometime
before now. Check your control flow.
By the way, the legendary "Camel Book" is basically the perl man pages formatted for paper printing, so reading the perldocs in the order listed in perldoc perl will give you a high level of understanding of the language in a reasonably accessible and inexpensive manner.
Happy hacking!
This is simple and including explanations.
use strict; # Always use strict
use warnings; # Always use warnings.
open(my $fh, "<", "file.txt") or die "unable to open file.txt: $!";
# Above we open file using 3 handle method
# or die die with error if unable to open it.
while (<$fh>) { # While in the file.
print $_; # Print each line
}
close $fh; # Close the file
There is then also the case where you are trying to open a file which is not in a location where you think it is. So consider doing full path, if not in the same dir.
open(my $fh, "<", 'F:\Workdir\file.txt') or die "unable to open < input.txt: $!";
EDIT: After your comments, it seems that you are opening an empty file. Please add this at the bottom of that same script and rerun. It will open the file in C:\Users\RSS and make sure it does actually contain data?
system('C:\Users\RSS\file.txt');
First, of all as you are starting out, it is better to enable all warnings by 'use warnings' and disable all such expression which can lead to uncertain behavior or are difficult to debug by pragma 'use strict'.
As you are dealing with file stream, it is always recommended to the check if you were able to open the stream. so, try to use croak or die both would terminate the program with a given message.
Instead of reading inside the while condition, I would recommend checking for end of file. So, loop breaks as end is found. Usually, when reading a line you would use it for further processing, so it is good idea to remove end of lines using chomp.
A sample for reading a file in perl can be as follows:
#!/user/bin/perl
use strict;
use warnings;
my $newfile = "file.txt";
open (my $fh, $newfile) or die "Could not open file '$newfile' $!";
while (!eof($fh))
{
my $line=<$fh>;
chomp($line);
print $line , "\n";
}

perl - string compare failing while fetching a line from a file.

my code,
#!/usr/bin/perl -w
use strict;
use warnings;
my $codes=" ";
my $count=0;
my $str1="code1";
open (FILE, '/home/vpnuser/testFile.txt') or die("Could not open the file.");
while($codes=<FILE>)
{
print($codes);
if($codes eq $str1)
{
$count++;
}
}
print "$count";
the comparison always fails. my testFile.txt contains one simple line - code1
when i have written a separate perl script where i have two strings declared in the script it self rather than getting it from a file, the eq operator works fine. but when i am getting it from a file, there is a problem. Pease help,
Thanks in advance!
Don't forget to chomp your file input if you don't want it to end in a return character.
while(my $codes = <FILE>)
{
chomp $codes;
That is likely the reason why your string comparison is failing.
As on additional aside, kudus for including use strict; and use warnings; at the the top of your script, like one should always do.
I'd like to recommend that you also include use autodie; at the top as well when doing file processing. It will automatically give you a detailed error message for doing many kinds of operations, such as opening a file, so you won't have to remember to include the error code $! or the filename in your die statement.

Is there an issue with opening filenames provided on the command line through $_?

I'm having trouble modifying a script that processes files passed as command line arguments, merely for copying those files, to additionally modifying those files. The following perl script worked just fine for copying files:
use strict;
use warnings;
use File::Copy;
foreach $_ (#ARGV) {
my $orig = $_;
(my $copy = $orig) =~ s/\.js$/_extjs4\.js/;
copy($orig, $copy) or die(qq{failed to copy $orig -> $copy});
}
Now that I have files named "*_extjs4.js", I would like to pass those into a script that similarly takes file names from the command line, and further processes the lines within those files. So far I am able get a file handle successfully as the following script and it's output shows:
use strict;
use warnings;
foreach $_ (#ARGV) {
print "$_\n";
open(my $fh, "+>", $_) or die $!;
print $fh;
#while (my $line = <$fh>) {
# print $line;
#}
close $fh;
}
Which outputs (in part):
./filetree_extjs4.js
GLOB(0x1a457de8)
./async_submit_extjs4.js
GLOB(0x1a457de8)
What I really want to do though rather than printing a representation of the file handle, is to work with the contents of the files themselves. A start would be to print the files lines, which I've tried to do with the commented out code above.
But that code has no effect, the files' lines do not get printed. What am I doing wrong? Is there a conflict between the $_ used to process command line arguments, and the one used to process file contents?
It looks like there are a couple of questions here.
What I really want to do though rather than printing a representation of the file handle, is to work with the contents of the files themselves.
The reason why print $fh is returning GLOB(0x1a457de8) is because the scalar $fh is a filehandle and not the contents of the file itself. To access the contents of the file itself, use <$fh>. For example:
while (my $line = <$fh>) {
print $line;
}
# or simply print while <$fh>;
will print the contents of the entire file.
This is documented in pelrdoc perlop:
If what the angle brackets contain is a simple scalar variable (e.g.,
<$foo>), then that variable contains the name of the filehandle to
input from, or its typeglob, or a reference to the same.
But it has already been tried!
I can see that. Try it after changing the open mode to +<.
According to perldoc perlfaq5:
How come when I open a file read-write it wipes it out?
Because you're using something like this, which truncates the file
then gives you read-write access:
open my $fh, '+>', '/path/name'; # WRONG (almost always)
Whoops. You should instead use this, which will fail if the file
doesn't exist:
open my $fh, '+<', '/path/name'; # open for update
Using ">" always clobbers or creates. Using "<" never does either. The
"+" doesn't change this.
It goes without saying that the or die $! after the open is highly recommended.
But take a step back.
There is a more Perlish way to back up the original file and subsequently manipulate it. In fact, it is doable via the command line itself (!) using the -i flag:
$ perl -p -i._extjs4 -e 's/foo/bar/g' *.js
See perldoc perlrun for more details.
I can't fit my needs into the command-line.
If the manipulation is too much for the command-line to handle, the Tie::File module is worth a try.
To read the contents of a filehandle you have to call readline read or place the filehandle in angle brackets <>.
my $line = readline $fh;
my $actually_read = read $fh, $text, $bytes;
my $line = <$fh>; # similar to readline
To print to a filehandle other than STDIN you have to have it as the first argument to print, followed by what you want to print, without a comma between them.
print $fh 'something';
To prevent someone from accidentally adding a comma, I prefer to put the filehandle in a block.
print {$fh} 'something';
You could also select your new handle.
{
my $oldfh = select $fh;
print 'something';
select $oldfh; # reset it back to the previous handle
}
Also your mode argument to open, causes it to clobber the contents of the file. At which point there is nothing left to read.
Try this instead:
open my $fh, '+<', $_ or die;
I'd like to add something to Zaid's excellent suggestion of using a one-liner.
When you are new to perl, and trying some tricky regexes, it can be nice to use a source file for them, as the command line may get rather crowded. I.e.:
The file:
#!/usr/bin/perl
use warnings;
use strict;
s/complicated/regex/g;
While tweaking the regex, use the source file like so:
perl -p script.pl input.js
perl -p script.pl input.js > testfile
perl -p script.pl input.js | less
Note that you don't use the -i flag here while testing. These commands will not change the input files, only print the changes to stdout.
When you're ready to execute the (permanent!) changes, just add the in-place edit -i flag, and if you wish (recommended), supply an extension for backups, e.g. ".bak".
perl -pi.bak script.pl *.js

STDOUT to array Perl

I am compiling a Perl program, i am writing the output STDOUT to a file. In the same program , i want to run another small script using while function on the output of STDOUT. So, I need to save the output of first script in an array, then i can use in while<#array>. Like
open(File,"text.txt");
open(STDOUT,">output,txt");
#file_contents=<FILE>;
foreach (#file_contents){
//SCRIPT GOES HERE//
write;
}
format STDOUT =
VARIABLE #<<<<<< #<<<<<< #<<<<<<
$x $y $z
.
//Here I want to use output of above program in while loop //
while(<>){
}
How can i save the output of first program into array so that i can use in while loop, or how can i directly use STDOUT in while loop. I have to make sure that first part is completely executed. Thanks in advance.
Since you remapped STDOUT so it writes to a file, you could presumably close STDOUT, and then reopen the file for reading.
Quite where you're going to send any other output is a bit of a mystery, but presumably you can resolve that. Were it me, I'd not fiddle with STDOUT. I'd make the script write to a file handle:
use strict;
use warnings;
open my $input, "<", "text.txt" or die "A horrible death";
open my $output, ">", "output.txt" or die "A horrible death";
my #file_contents = <$input>;
close($input);
foreach (#file_contents)
{
# Script goes here
print $output "Any information that goes to output\n";
}
close $output;
open my $reread, "<", "output.txt" or die "A horrible death";
while (<$reread>)
{
# Process the previous output
}
Note the use of lexical file handles, the checking that the open worked, the close when finished with the input file, the use of use strict; and use warnings;. (I've only been working with Perl for 20 years and I know I don't trust my scripts until they run clean with those settings.)
I assume you want to reopen STDOUT in order to make the write function work. However, the correct solution for that is to either specify the file handle, or to a lesser extent, to use select.
write FILEHANDLE;
or
select FILEHANDLE;
write;
Unfortunately, it seems the IO of perlform is a bit arcane, and does not seem to allow for lexical file handles.
Your problem is you can't reuse the formatted text within the program, so a bit of trixy programming is required. What you can do is open a file handle that prints to a scalar. Which is another somewhat arcane perl functionality, but in this case, it might be the only way to do this directly.
# Using FOO as format to avoid destroying STDOUT
format FOO =
VARIABLE #<<<<<< #<<<<<< #<<<<<<
$x $y $z
.
my $foo;
use autodie; # save yourself some typing
open INPUT, '<', "text.txt"; # normally, we would add "or die $!" on these
open FOO, '>', \$foo; # but now autodie handles that for us
open my $output, '>', "output.txt";
while (<FILE>) {
$foo = ""; # we need to reset $foo each iteration
write FOO; # write to the file handle instead
print $output $foo; # this now prints $foo to output.txt
do_something($foo); # now you can also process the text at the same time
}
As you'll notice, we now first print the formatted line to the scalar $foo. While it is there, we can handle it as regular data, so there's no need to save to a file and reopening it to get to the data.
Each iteration, data is concatenated to the end of $foo, so to avoid accumulation, we need to reset $foo. The best way to handle this would be to make $foo lexical within the scope, but unfortunately we need $foo to be declared outside the while loop in order to be able to use it in the open statement.
It might be possible to use local $foo inside the while-loop, but I think that's adding yet more bad practice to this already very bad hack.
Conclusion:
With all this said and done, I suspect the best way to handle this is to not use perlform at all, and format your data in some other way. While perlform might be well suited to print to a file, it is not the best suited for what you have in mind. I recall this question from earlier, perhaps there was some other answer that would work better. Such as using sprintf, like Jonathan suggested
Assuming the output from your first program is tab-delimited:
while (<>) {
chomp $_;
my ($variable, $x, $y, $z) = split("\t", $_);
# do stuff with values
}