How to read multiple files from a directory, extract specific strings and output to an html file? - perl

Greetings,
I have the following code and am stuck on how to modify it so that it asks for a directory, reads all files in that directory, then extracts specific strings and outputs them to an html file. Thanks in advance.
#!/usr/local/bin/perl
use warnings;
use strict;
use Cwd;
print "Enter filename: "; # Should be Enter directory
my $perlfile =STDIN;
open INPUT_FILE, $perlfile || die "Could not open file: $!";
open OUTPUT, '>out.html' || die "Could not open file: $!";
# Evaluates the file and imports it into an array.
my @comment_array = <INPUT_FILE>;
close(INPUT_FILE);
chomp @comment_array;
@comment_array = grep /^\s*#/g, @comment_array;
my $comment;
foreach $comment (@comment_array) {
$comment =~ /####/; #Pattern match to grab only #s
# Prints comments to screen
Print results in html format
# Writes comments to output.html
Writes results to html file
}
close (OUTPUT);

Take it one step at a time. You have a lot planned, but so far you haven't even changed your prompt string to ask for a directory.
To read the entered directory name, your:
my $perlfile =STDIN;
gives an error (under use strict;). Start by looking that error up (use diagnostics; automates this) and trying to figure out what you should be doing instead.
Once you can prompt for a directory name and print it out, then add code to open the directory and read the directory. Directories can be opened and read with opendir and readdir. Make sure you can read the directory and print out the filenames before going on to the next step.
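A minimal sketch of those first two steps follows: it prompts for a directory, reads the answer with <STDIN>, and prints the plain files in that directory using opendir and readdir. The helper name list_plain_files and the fallback to the current directory when nothing is entered are assumptions made for this example, not part of the answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the names of the plain files (not subdirectories) in $dir, sorted.
# list_plain_files is a hypothetical helper name used only in this sketch.
sub list_plain_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    my @files;
    # defined() matters: a file named "0" would otherwise end the loop early
    while (defined(my $name = readdir $dh)) {
        # readdir returns bare names, so put the directory back for the -f test
        push @files, $name if -f "$dir/$name";
    }
    closedir $dh;
    return sort @files;
}

$| = 1;                          # flush STDOUT so the prompt shows before we wait
print "Enter directory: ";
my $dir = <STDIN>;               # reads one line, including its newline
$dir = '.' unless defined $dir;  # assumption: use the current directory on EOF
chomp $dir;                      # strip the trailing newline
$dir = '.' if $dir eq '';        # assumption: ... or on an empty answer

print "$_\n" for list_plain_files($dir);
```

Once this prints the right names, the loop body is the place to open each file and look for the strings you want.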

A good starting point for learning about specific functions is perldoc, run from the command line:
perldoc -f opendir
Your particular problem can be solved as follows. You can also run command-line programs and capture their output in a string to simplify file handling ('cat') and pattern matching ('grep').
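#!/usr/bin/perl -w
use strict;
my $dir = "/tmp";
my @patterns;
opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
while (my $file = readdir($dh)) {
    if (-f "$dir/$file") {
        # Let the shell read and filter the file; the quotes protect names with spaces
        my $string = `cat '$dir/$file' | grep pattern123`;
        push @patterns, $string if length $string;   # skip files with no match
    }
}
closedir($dh);
my $html = join("<br>", @patterns);
open(my $out, '>', 'out.html') or die "Cannot write out.html: $!";
print $out $html;
close $out;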
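The same job can also be done without starting a shell for every file. The sketch below reads each file in Perl, keeps the lines matching a regular expression, and HTML-escapes them before writing out.html. The directory and qr/pattern123/ are placeholders carried over from the answer; the helper names html_escape and matching_lines are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder settings carried over from the answer above; adjust as needed.
my $dir     = '/tmp';
my $pattern = qr/pattern123/;
my $outfile = 'out.html';

# Replace the characters that have a special meaning in HTML.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first, or the other entities get mangled
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Return every line, from every plain file in $dir, that matches $re.
sub matching_lines {
    my ($dir, $re) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    my @names = readdir $dh;
    closedir $dh;

    my @matches;
    for my $name (sort @names) {
        my $path = "$dir/$name";
        next unless -f $path;    # skip ., .. and subdirectories
        open(my $in, '<', $path) or do { warn "Skipping '$path': $!\n"; next };
        while (my $line = <$in>) {
            chomp $line;
            push @matches, $line if $line =~ $re;
        }
        close $in;
    }
    return @matches;
}

my @found = matching_lines($dir, $pattern);
open(my $out, '>', $outfile) or die "Cannot write '$outfile': $!";
print {$out} "<html><body>\n";
print {$out} html_escape($_), "<br>\n" for @found;
print {$out} "</body></html>\n";
close($out) or die "Cannot close '$outfile': $!";
```

Compared with the backtick version, this avoids shell quoting problems with odd file names, and the escaping keeps a matched line such as "a < b" from breaking the generated page.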

Related

Perl multiple files read replacing string, write to multiple files

I have this code working, but only for one file with a specific name. How can I make it process every .vb file in the current folder and write each result to a file with _1 appended to the name?
#!/usr/bin/perl
use strict;
use warnings;
open my $fhIn, '<', 'file.vb' or die $!;
open my $fhOut, '>', 'file_1.vb' or die $!;
while (<$fhIn>) {
print $fhOut "'01/20/2016 Added \ngpFrmPosition = Me.Location\n" if /MessageBox/ and !/'/;
print $fhOut $_;
}
close $fhOut;
close $fhIn;
I might approach it like this. (This assumes the script is running in the same directory as the .vb files).
#!/usr/bin/perl
use strict;
use warnings;
# script running in same directory as the .vb files
for my $file (glob "*.vb") {
my $outfile = $file =~ s/(?=\.vb$)/_1/r;
print "$file $outfile\n"; # DEBUG
# open input and output files
# do the while loop
}
The print statement in the loop is for debug purposes - to see if you are creating the new file names correctly. You can delete it or comment it out when you are satisfied you have got the files you want.
Update: Put the glob in the for loop instead of reading it to an array.
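Putting the two halves together, a sketch of the complete script could look like this: the question's while loop moves into a subroutine (add_position_lines is a name made up for this example), and the answer's glob loop calls it once per file. Skipping files that already end in _1.vb is an added assumption, so that a second run does not produce file_1_1.vb.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy one .vb file to another, inserting the two extra lines before every
# line that mentions MessageBox and contains no ' (same logic as the question).
sub add_position_lines {
    my ($in_name, $out_name) = @_;
    open my $fhIn,  '<', $in_name  or die "Cannot read '$in_name': $!";
    open my $fhOut, '>', $out_name or die "Cannot write '$out_name': $!";
    while (<$fhIn>) {
        print $fhOut "'01/20/2016 Added \ngpFrmPosition = Me.Location\n" if /MessageBox/ and !/'/;
        print $fhOut $_;
    }
    close $fhOut or die "Cannot close '$out_name': $!";
    close $fhIn;
}

# Script running in the same directory as the .vb files.
for my $file (glob "*.vb") {
    next if $file =~ /_1\.vb$/;                # assumption: skip output of an earlier run
    my $outfile = $file =~ s/(?=\.vb$)/_1/r;   # file.vb -> file_1.vb (/r needs Perl 5.14+)
    add_position_lines($file, $outfile);
}
```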

How to point this script to one folder for reading and another for writing

I can't seem to get this script to read from one directory and write to another. Both directories exist. I've commented out what I tried. The funny thing is, it runs fine when I place it in the directory with the files to process. Here's the code:
use strict;
use warnings "all";
my $tmp;
my $dir = ".";
#my $dir = "Ask/Parsed/Html4/";
opendir(DIR, $dir) or die "Cannot open directory: $dir!\n";
my @files = readdir(DIR);
closedir(DIR);
open my $out, ">>output.txt" or die "Cannot open output.txt!\n";
#open my $out, ">>Ask/Parsed/Html5/output.txt" or die "Cannot open output.txt!\n";
foreach my $file (@files)
{
if($file =~ /html$/)
{
open my $in, "<$file" or die "Cannot open $file!\n";
undef $tmp;
while(<$in>)
{
$tmp .= $_;
}
print $out ">$file\n";
print $out "$tmp\n";
#print $out "===============";
close $in;
}
}
close $out;
The directories you use -- . and Ask/Parsed/Html4/ -- are relative paths, which means they are relative to your current working directory, and so it makes a difference where in the file system you are currently located when you run the script.
In addition, the files you are opening -- output.txt and $file -- have no path information, so Perl will look in your current working directory to find them.
There are a few ways to solve this.
You could cd to the directory where your files are before running the script, and open the directory as . as you currently do.
You could achieve the same effect by calling chdir from within the script, which changes the current working directory so that it no longer matters where you were when you ran the script.
Or you could add an absolute directory path to the beginning of the file names, preferably using catfile from File::Spec::Functions.
However I would choose to use glob -- which works in the same way as command-line filename expansion -- in preference to opendir / readdir as the resulting strings include the path (if one was specified in the parameter) and there is no need to separately filter the .html files.
I would also choose to undefine the input record separator $/ to read the whole file, rather than reading it line-by-line and concatenating them all.
Finally, if you are running version 10 or later of Perl 5 then it is simpler to use autodie rather than checking the success of every open, readline, close, opendir, readdir, and closedir etc.
Something like this
use strict;
use warnings 'all';
use 5.010;
use autodie;
my $dir = '/path/to/Ask/Parsed/Html4';
my @html = glob "$dir/*.html";
open my $out, '>>', "$dir/output.txt";
for my $file (@html) {
my $contents = do {
open my $in, '<', $file;
local $/;
<$in>;
};
print $out "> $file\n";
print $out "$contents\n";
print $out "===============\n";
}
close $out;
It is probably looking for the files relative to wherever you run the script from. If your files are located relative to the script itself, use FindBin to build a full path:
use FindBin;
my $file = "$FindBin::Bin/Ask/Parsed/Html5/output.txt";
If your file is not relative to the script, provide the full path:
my $file = "/home/john.doe/Ask/Parsed/Html5/output.txt";
Note that readdir() returns only the file name. To open the file, prepend the directory, e.g.:
open my $in, "<", "$dir/$file" or die "Cannot open $dir/$file: $!\n";
Note that best practice is to use the three-argument version of open, as in the line above.
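For the original question (read from one folder, write to another), here is a sketch that combines these suggestions: both directories are passed on the command line, each readdir name is joined back onto the input directory with catfile, and the output file is built inside the second directory. The subroutine name append_html_files and the command-line interface are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec::Functions qw(catfile);

# Append every .html file in $in_dir to $out_dir/output.txt.
# Both directories are explicit, so the current directory no longer matters.
sub append_html_files {
    my ($in_dir, $out_dir) = @_;
    opendir(my $dh, $in_dir) or die "Cannot open directory '$in_dir': $!";
    my @names = grep { /\.html$/ } readdir $dh;   # readdir returns bare names
    closedir $dh;

    my $out_file = catfile($out_dir, 'output.txt');
    open(my $out, '>>', $out_file) or die "Cannot open '$out_file': $!";
    for my $name (sort @names) {
        my $path = catfile($in_dir, $name);       # put the directory back in front
        open(my $in, '<', $path) or die "Cannot open '$path': $!";
        my $contents = do { local $/; <$in> };    # slurp the whole file
        close $in;
        print {$out} ">$name\n$contents\n";
    }
    close($out) or die "Cannot close '$out_file': $!";
}

# Usage: perl script.pl Ask/Parsed/Html4 Ask/Parsed/Html5
if (@ARGV == 2) {
    append_html_files(@ARGV);
}
else {
    warn "Usage: $0 INPUT_DIR OUTPUT_DIR\n";
}
```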

Rename all .txt files in a directory and then open that file in perl

I need some help with file manipulations and need some expert advice.
It looks like I am making a silly mistake somewhere but I can't catch it.
I have a directory that contains files with a .txt suffix, for example file1.txt, file2.txt, file3.txt.
I want to add a revision string, say rev0, to each of those files and then open the modified files. For instance rev0_file1.txt, rev0_file2.txt, rev0_file3.txt.
I can append rev0, but my program fails to open the files.
Here is the relevant portion of my code
my $dir = "path to my directory";
my @::tmp = ();
opendir(my $DIR, "$dir") or die "Can't open directory, $!";
@::list = readdir($DIR);
@::list2 = sort grep(/^.*\.txt$/, @::list);
foreach (@::list2) {
my $new_file = "$::REV0" . "_" . "$_";
print "new file is $new_file\n";
push(@::tmp, "$new_file\n");
}
closedir($DIR);
foreach my $cur_file (<@::tmp>) {
$cur_file = $_;
print "Current file name is $cur_file\n"; # This debug print shows nothing
open($fh, '<', "$cur_file") or die "Can't open the file\n"; # Fails to open file;
}
Your problem is here:
foreach my $cur_file(<@::tmp>) {
$cur_file = $_;
You are using the loop variable $cur_file, but you overwrite it with $_, which is not used at all in this loop. To fix this, just remove the second line.
Your biggest issue is the fact you are using $cur_file in your loop for the file name, but then reassign it with $_ even though $_ won't have a value at that point. Also, as Borodin pointed out, $::REV0 was never defined.
You can use the move command from the File::Copy to move the files, and you can use File::Find to find the files you want to move:
use strict;
use warnings;
use feature qw(say);
use autodie;
use File::Copy;      # Provides the move command
use File::Find;      # Gives you a way to find the files you want
use File::Basename;  # Provides fileparse, to split a path into directory and name
use constant {
    DIRECTORY => '/path/to/directory',
    PREFIX    => 'rev0_',
};
my @files_to_rename;
find (
    sub {
        return unless /\.txt$/;   # Only interested in ".txt" files (return, not next: this is a sub)
        push @files_to_rename, $File::Find::name;
    }, DIRECTORY );
for my $file ( @files_to_rename ) {
    my ( $name, $path ) = fileparse($file);
    my $new_name = $path . PREFIX . $name;   # Prefix the file name, not the whole path
    move($file, $new_name) or die "Can't move $file -> $new_name: $!";
    $file = $new_name;                       # Updates @files_to_rename with new name
    open my $file_fh, "<", $new_name;        # Open the file here?
    # ... process the file ...
    close $file_fh;
}
for my $file ( @files_to_rename ) {
    open my $file_fh, "<", $file;            # Or, open the file here?
    # ... process the file ...
    close $file_fh;
}
See how using Perl modules can make your task much easier? Perl comes with hundreds of pre-installed packages to handle zip files, tarballs, time, email, etc. You can find a list at the Perldoc page (make sure you select the version of Perl you're using!).
The $file = $new_name is actually changing the value of the file name right inside the @files_to_rename array. It's a little Perl trick: the loop variable is an alias for the array element. This way, your array refers to the file even though it has been renamed.
You have two choices for where to open the file for reading: you can rename all of your files first and then loop through once again to open each one, or you can open each file right after renaming it. I've shown both.
Don't use $:: at all. This is very bad form since it overrides use strict; -- that is if you're using use strict to begin with. The standard is not to use package variables (aka global variables) unless you have to. Instead, you should use lexically scoped variables (aka local variables) defined with my.
One advantage of my variables is that I don't really need the close command: the variable goes out of scope at the end of each iteration of the loop and disappears entirely once the loop is complete. When the variable that contains the file handle goes out of scope, the file handle is automatically closed.
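A small demonstration of that behaviour, using a scratch directory from the core File::Temp module (the file name demo.txt is arbitrary): the handle is never closed explicitly, yet the data is on disk once the block that declared it ends.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);   # scratch directory, deleted when the program exits
my $path = "$dir/demo.txt";

{
    open(my $fh, '>', $path) or die "Cannot write '$path': $!";
    print {$fh} "written inside the block\n";
    # No close here: $fh goes out of scope at the closing brace, which
    # closes the handle and flushes its buffered output to disk.
}

open(my $in, '<', $path) or die "Cannot read '$path': $!";
my $line = <$in>;
close $in;
print "Read back: $line";
```

The trade-off is error reporting: an explicit close ... or die (or close under autodie) stops the program if the final buffered write fails, whereas the implicit close at scope exit will not die.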
Always include use strict; and use warnings; at the top of EVERY script, and use autodie; whenever you're doing file or directory processing.
There is no reason why you should be prefixing your variables with :: so please simplify your code like the following:
use strict;
use warnings;
use autodie;
use File::Copy;
my $dir = "path to my directory";
chdir($dir); # Make easier by removing the need to prefix path information
foreach my $file (glob('*.txt')) {
my $newfile = 'rev0_'.$file;
copy($file, $newfile) or die "Can't copy $file -> $newfile: $!";
open my $fh, '<', $newfile;
# File processing
}
What you've attempted to store is the updated name of the file in @::tmp. The file hasn't been renamed, so it's little surprise that the code died because it couldn't find the renamed file.
Since it's just renaming, consider the following code:
use strict;
use warnings;
use File::Copy 'move';
for my $file ( glob( "file*.txt" ) ) {
move( $file, "rev0_$file" )
or die "Unable to rename '$file': $!";
}
From a command line/terminal, consider the rename utility if it is available:
$ rename file rev0_file file*.txt

Error when running a DOS command in CGI

I tried to run a simple copy of one file to another folder using Perl
system("copy template.html tmp/$id/index.html");
but I got the error: The syntax of the command is incorrect.
When I change it to
system("copy template.html tmp\\$id\\index.html");
the system copies another file to the tmp\$id folder.
Can someone help me?
I suggest you use File::Copy, which comes with your Perl distribution.
use strict; use warnings;
use File::Copy;
my $id = 1; # example ID
print copy('template.html', "tmp/$id/index.html");
You do not need to worry about the slashes or backslashes on Windows because the module will take care of that for you.
Note that relative paths are resolved from your current working directory, so both template.html and the directory tmp/$id/ need to exist there. If you want to create the folders on the fly, take a look at File::Path.
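A sketch combining File::Path and File::Copy: make_path creates tmp/$id (and any missing parents) before copy runs. The subroutine name copy_template and the command-line wrapper are assumptions for this example; in the CGI script you would call copy_template('template.html', $id) with the ID from the request.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Path qw(make_path);
use File::Spec::Functions qw(catdir catfile);

# Copy $template to tmp/$id/index.html, creating tmp/$id first if needed.
sub copy_template {
    my ($template, $id) = @_;
    my $target_dir = catdir('tmp', $id);
    make_path($target_dir);                 # like "mkdir -p"; existing directories are fine
    my $target = catfile($target_dir, 'index.html');
    copy($template, $target) or die "Cannot copy '$template' to '$target': $!";
    return $target;
}

# Usage: perl script.pl template.html 42
if (@ARGV == 2) {
    print "Copied to ", copy_template(@ARGV), "\n";
}
```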
Update: Reply to comment below.
You can use this program to create your folders and copy the files with in-place substitution of the IDs.
use strict; use warnings;
use File::Path qw(make_path);
my $id = 1; # edit ID here
# Create output folder
make_path("tmp/$id");
# Open the template for reading and the new file for writing
open my $fh_in, '<', 'template.html' or die $!;
open my $fh_out, '>', "tmp/$id/index.html" or die $!;
# Read the template
while (<$fh_in>) {
s/ID/$id/g; # replace all instances of ID with $id
print $fh_out $_; # print to new file
}
# Close both files
close $fh_out;
close $fh_in;

filehandle - won't write to a file

I cannot get the script below to write to the file data.txt using a FILEHANDLE. Both files are in the same folder, so that's not the issue. Since I started with Perl, I have noticed that to run scripts I have to use a full path (c:\programs\scriptname.pl), and the same goes for input files. I thought that could be the issue and tried the syntax below, but that didn't work either...
open(WRITE, ">c:\programs\data.txt") || die "Unable to open file data.txt: $!";
Here is my script. I have checked the syntax until it makes me crazy and cannot see an issue. Any help would be greatly appreciated! I'm also puzzled why the die function hasn't kicked in.
#!c:\strawberry\perl\bin\perl.exe
#strict
#diagnostics
#warnings
#obtain info in variables to be written to data.txt
print("What is your name?");
$name = <STDIN>;
print("How old are you?");
$age = <STDIN>;
print("What is your email address?");
$email = <STDIN>;
#data.txt is in the same file as this file.
open(WRITE, ">data.txt") || die "Unable to open file data.txt: $!";
#print information to data.txt
print WRITE "Hi, $name, you are \s $age and your email is \s $email";
#close the connection
close(WRITE);
How I solved this problem:
I have Strawberry Perl (perl.exe) installed on the C: drive using its installer, and my scripts were in a folder also on C:. With that setup I couldn't read or write files (neither with redirection nor with Perl functions such as open), and I always had to use full paths to launch a script. I solved this after someone suggested leaving the interpreter installed where it was and moving my scripts folder to the desktop (keep the shebang line at the top of each script as it is, since the interpreter is still in the same place). Now I can run the scripts with one click, and read, write, and append to files both from the CMD prompt and with Perl functions.
Backslashes have a special meaning in double-quoted strings. Try escaping the backslashes.
open(WRITE, ">c:\\programs\\data.txt") || die ...;
Or, as you're not interpolating variables, switch to single quotes.
open(WRITE, '>c:\programs\data.txt') || die ...;
It's also worth using the three-argument version of open and lexical filehandles.
open(my $write_fh, '>', 'c:\programs\data.txt') || die ...;
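To see what each spelling actually turns into, here is a small illustrative check. In a double-quoted string, \p and \d are not recognized escapes, so Perl drops the backslash (and, under use warnings, reports "Unrecognized escape ... passed through"); the demo silences that warning only around its deliberately broken example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $escaped = "c:\\programs\\data.txt";   # doubled backslashes in double quotes
my $single  = 'c:\programs\data.txt';     # single quotes keep these backslashes as-is

my $broken;
{
    no warnings 'misc';                   # silence "Unrecognized escape" for this demo only
    $broken = "c:\programs\data.txt";     # \p and \d are not string escapes
}

print "escaped: $escaped\n";   # c:\programs\data.txt
print "single:  $single\n";    # c:\programs\data.txt
print "broken:  $broken\n";    # c:programsdata.txt
```

The broken form names a file that does not exist, which is why the path has to be escaped, single-quoted, or written with forward slashes.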
Use "/" instead: Perl on Windows accepts forward slashes in paths, and they are more portable, so: open(WRITE, ">c:/programs/data.txt")
Note: this assumes the c:/programs folder exists.
You may want to try FindBin.
use strict;
use warnings;
use autodie; # open will now die on failure
use FindBin;
use File::Spec::Functions 'catfile';
my $filename = catfile $FindBin::Bin, 'data.txt';
#obtain info in variables to be written to data.txt
print("What is your name?"); my $name = <STDIN>;
print("How old are you?"); my $age = <STDIN>;
print("What is your email address?"); my $email = <STDIN>;
{
open( my $fh, '>', $filename );
print {$fh} "Hi, $name, you are $age, and your email is $email\n";
close $fh;
}
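If you have an access problem when you try to print to data.txt, you can change that line to:
print WRITE "Hi, $name, you are \s $age and your email is \s $email" or die $!;
to get more information. Use or here rather than ||: || binds to the string, so the die would never run. Note that a read-only data.txt already makes the open fail, with an error message like this: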
Unable to open file data.txt: Permission denied at perl.pl line 12, <STDIN> line 3.
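A sketch of writing the file with each step checked, using a lexical filehandle and three-argument open as suggested in an earlier answer. The sample values stand in for the <STDIN> input, and writing data.txt to the current directory is an assumption of this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $filename = 'data.txt';   # assumption: output goes to the current directory
my ($name, $age, $email) = ('Ann', 30, 'ann@example.com');   # sample values

open(my $fh, '>', $filename) or die "Unable to open file $filename: $!";

# 'or' binds more loosely than print's argument list, so die sees print's
# return value. With '||' it would bind to the string instead and never fire.
print {$fh} "Hi, $name, you are $age and your email is $email\n"
    or die "Unable to write to $filename: $!";

# Buffered output may only be written (and fail) when the handle is closed.
close($fh) or die "Unable to close $filename: $!";
```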