I am using getpdftext.pl from CAM::PDF to extract the text of a PDF and print it, but in my web application I want to call getpdftext.pl from inside a .cgi script. Can you suggest how to proceed? I tried converting getpdftext.pl to getpdftext.cgi, but it doesn't work.
Thanks all
This is an extract from my request_admin.cgi script:
my $filename = $q->param('quote');
:
:
:
&parsePdf($filename);
#function to extract text from pdf ,save it in a text file and parse the required fields
sub parsePdf($)
{
my $i;
print $_[0];
$filein = "quote_uploads/$_[0]";
$fileout = 'output.txt';
print "inside parsePdf\n";
open OUT, ">$fileout" or die "error: $!";
open IN, '-|', "getpdftext.pl $filein" or die "error :$!" ;
while(<IN>)
{
print "$i";
$i++;
print OUT;
}
}
It's highly likely that your CGI script's environment isn't complete enough to locate getpdftext.pl, and/or the web-server user doesn't have permission to execute it anyway.
Have a look in your web-server's error-log and see if it is reporting any pointers as to why this doesn't work.
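Both suspicions can also be checked from inside the CGI script. The following is only a sketch: find_in_path is a hypothetical helper name, and it assumes getpdftext.pl is expected to live in one of the directories in the web server's PATH. The warn output goes to STDERR, which most web servers write to the error log.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of $program if it exists as an
# executable file in one of the given directories, otherwise return nothing.
sub find_in_path {
    my ($program, @dirs) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $program);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# File::Spec->path returns the directories listed in $ENV{PATH}
my @path_dirs = File::Spec->path();
my $found = find_in_path('getpdftext.pl', @path_dirs);
warn defined $found
    ? "getpdftext.pl found at $found\n"
    : "getpdftext.pl not found or not executable in: @path_dirs\n";
```

If the script is reported as missing, calling it by its full path in the open is one possible workaround.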
In your particular case, it might be simpler and more direct to use CAM::PDF directly, which should have been installed along with getpdftext.pl anyway.
I had a look at this script and I think that your parsePdf sub could just as easily be written as:
#!/usr/bin/perl
use warnings;
use strict;
use CAM::PDF;
sub parsePdf {
my $filein = "quote_uploads/$_[0]";
my $fileout = 'output.txt';
open my $out_fh, '>', $fileout or die "error: $!";
my $doc = CAM::PDF->new($filein) || die "$CAM::PDF::errstr\n";
my $i = 0;
foreach my $p ($doc->rangeToArray(1,$doc->numPages()))
{
my $str = $doc->getPageText($p);
if (defined $str)
{
CAM::PDF->asciify(\$str);
print $i++;
print $out_fh $str;
}
}
}
My program is trying to search a string from multiple files in a directory. The code searches for single patterns like perl but fails to search a long string like Status Code 1.
Can you please let me know how to search for strings with multiple words?
#!/usr/bin/perl
my @list = `find /home/ad -type f -mtime -1`;
# printf("Lsit is $list[1]\n");
foreach (@list) {
# print("Now is : $_");
open(FILE, $_);
$_ = <FILE>;
close(FILE);
unless ($_ =~ /perl/) { # works, but fails to find string "Status Code 1"
print "found\n";
my $filename = 'report.txt';
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
say $fh "My first report generated by perl";
close $fh;
} # end unless
} # end For
There are a number of problems with your code:
You must always use strict and use warnings at the top of every Perl program. There is little point in declaring anything with my without strict in place
The lines returned by the find command will have a newline at the end which must be removed before Perl can find the files
You should use lexical file handles (my $fh instead of FILE) and the three-parameter form of open as you do with your output file
$_ = <FILE> reads only the first line of the file into $_
unless ($_ =~ /perl/) is inverted logic, and there's no need to specify $_ as it is the default. You should write if ( /perl/ )
You can't use say unless you have use feature 'say' at the top of your program (or use 5.010, which adds all features available in Perl v5.10)
It is also best to avoid using shell commands as Perl is more than able to do anything that you can using command line utilities. In this case -f $file is a test that returns true if the file is a plain file, and -M $file returns the (floating point) number of days since the file's modification time
This is how I would write your program
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
for my $file ( glob '/home/ad/*' ) {
next unless -f $file and (-M $file) < 1; # modified less than one day ago, like find -mtime -1
open my $fh, '<', $file or die $!;
while ( <$fh> ) {
if ( /perl/ ) {
print "found\n";
my $filename = 'report.txt';
open my $out_fh, '>>', $filename or die "Could not open file '$filename': $!";
say $out_fh "My first report generated by perl";
close $out_fh;
last;
}
}
}
It should have matched, unless $_ contains the text in a different case.
Try this:
unless($_ =~ /Status\s+Code\s+1/i) {
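A minimal sketch of that pattern in use, checking every line of the input rather than only the first one. The helper name contains_status_code and the sample text are made up for illustration, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any line of $text contains the phrase,
# ignoring case and allowing any run of whitespace between the words.
sub contains_status_code {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    while (my $line = <$fh>) {
        return 1 if $line =~ /Status\s+Code\s+1/i;
    }
    return 0;
}

my $sample = "first line\nstatus   code 1 reported\n";
print contains_status_code($sample) ? "found\n" : "not found\n";
```

Note that this pattern also matches longer numbers such as "Status Code 10"; adding \b after the 1 would be one way to rule those out.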
Change
unless ($_ =~ /perl/) {
to:
unless ($_ =~ /(Status Code 1)/) {
I am certain the above works, except it's case sensitive.
Since you questioned it, I rewrote your script to make more sense of what you're trying to accomplish and to implement the above suggestion. Correct me if I am wrong, but you're trying to write a script that matches "Status Code 1" in a set of files last modified within the past day and prints each matching filename to a text file.
Anyways, below is what I recommend:
#!/usr/bin/perl
use strict;
use warnings;
my $output_file = 'report.txt';
my @list = `find /home/ad -type f -mtime -1`;
chomp @list; # strip the trailing newline from each path returned by find
foreach my $filename (@list) {
print "PROCESSING: $filename\n";
open (INCOMING, "<$filename") || die "FATAL: Could not open '$filename' $!";
foreach my $line (<INCOMING>) {
if ($line =~ /(Status Code 1)/) {
open( FILE, ">>$output_file") or die "FATAL: Could not open '$output_file' $!";
print FILE sprintf ("%s\n", $filename);
close(FILE) || die "FATAL: Could not CLOSE '$output_file' $!";
# Bail when we get the first match
last;
}
}
close(INCOMING) || die "FATAL: Could not close '$filename' $!";
}
I'm planning to convert a Perl file into an executable so that, when you run it, it detects all the *.css files in the current directory and prints the output.
Is there a way to do this? I have written the program below, and it's throwing an error message. Can you please help me fine-tune it?
Please bear with me since I'm trying to learn Perl. The directory to be read is bin.
#!/usr/bin/perl
my @files;
opendir(bin, $Directoryname) or die "cannot open directory $Directoryname";
@files = grep(/\.css$/, readdir(bin));
# my $regex = qr/
# (?=.*font-size) &&
# (?!.*%)
# /ix;
foreach $file (@files) {
open my $handle, '<', $file or die "could not open '$file': $!";
while (my $line = <$handle>) {
if ($line =~ /font-size/) {
if ($line !~ /%/) {
print "\n Forced Font is detected. -- $line\n";
}
else {
print "\n Font is specified in % -- $line \n";
}
}
if ($line =~ /line-height/) {
print "\n Forced Line-height is detected. \n";
}
if ($line =~ /position:absolute/) {
print "\n position:absolute is detected. \n";
}
}
}
close(txt);
This is giving the output:
cannot open directory at C:\Perl64\bin\sample1.pl line 4.
The variable $Directoryname has no value associated with it.
You should set it with something like
$Directoryname = 'bin';
That said, in order to find all css files, I'd use glob:
my @files = glob('*.css');
It looks like $Directoryname is blank or undefined. Where do you assign a value to it?
Add use strict; at the beginning of your script to see where you have other problems.
Check the docs for opendir/readdir and how they should be used:
use strict;
use warnings;
my $Directoryname = "bin";
opendir(my $dir, $Directoryname) or die "$! $Directoryname";
my @files = grep(/\.css$/, readdir($dir));
# ..
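One detail to watch with this approach: readdir returns bare file names without the directory, so the names need the directory prepended before they can be opened from elsewhere. A small sketch, where css_files_in is a hypothetical helper name and the directory "bin" is taken from the question:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: list the .css files in $dir, each with the
# directory name prepended so the result can be passed straight to open.
sub css_files_in {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @names = grep { /\.css$/ } readdir($dh);
    closedir($dh);
    return map { File::Spec->catfile($dir, $_) } sort @names;
}

# Only list the files if the directory actually exists
if (-d 'bin') {
    print "$_\n" for css_files_in('bin');
}
```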
Thanks, guys! I fixed the issue. I went through the readdir/opendir docs and learned how to code this!
I made a file, "rootfile", that contains paths to certain files and the perl program mymd5.perl gets the md5sum for each file and prints it in a certain order. How do I redirect the output to a file if a name is given in the command line? For instance if I do
perl mymd5.perl md5file
then it will feed output to md5file. And if I just do
perl mymd5.perl
it will just print to the console.
This is my rootfile:
/usr/local/courses/cs3423/assign8/cmdscan.c
/usr/local/courses/cs3423/assign8/driver.c
/usr/local/courses/cs3423/assign1/xpostitplus-2.3-3.diff.gz
This is my program right now:
open($in, "rootfile") or die "Can't open rootfile: $!";
$flag = 0;
if ($ARGV[0]){
open($out,$ARGV[0]) or die "Can't open $ARGV[0]: $!";
$flag = 1;
}
if ($flag == 1) {
select $out;
}
while ($line = <$in>) {
$md5line = `md5sum $line`;
@md5arr = split(" ",$md5line);
if ($flag == 0) {
printf("%s\t%s\n",$md5arr[1],$md5arr[0]);
}
}
close($out);
If you don't give a FILEHANDLE to print or printf, the output will go to STDOUT (the console).
There are several ways you can redirect the output of your print statements.
select $out; #everything you print after this line will go the file specified by the filehandle $out.
... #your print statements come here.
close $out; #close the filehandle when done to avoid confusing the rest of the program.
#or you can use the filehandle right after the print statement as in:
print $out "Hello World!\n";
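Applied to the program in the question, a minimal sketch might look like the following. It assumes the output file should be opened for writing with '>' (the question's open($out,$ARGV[0]) opens it for reading), and output_handle is a hypothetical helper name rather than anything from a library.

```perl
use strict;
use warnings;

# Hypothetical helper: return a handle opened for writing to $name,
# or STDOUT when no file name was given.
sub output_handle {
    my ($name) = @_;
    return \*STDOUT unless defined $name && length $name;
    open my $fh, '>', $name or die "Can't open $name for writing: $!";
    return $fh;
}

my $out = output_handle($ARGV[0]);
print {$out} "Hello World!\n";

# Close the handle only when it is a real file, not the console
if (fileno($out) != fileno(STDOUT)) {
    close $out or die "Can't close output file: $!";
}
```

Passing the handle explicitly to print keeps the choice of destination in one place, so the loop that prints the checksums does not need a separate flag.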
You can print to a file whose name is derived from the value in @ARGV as follows:
This will take the name of the file in $ARGV[0] and use it to name a new file, edit.$ARGV[0]
#!/usr/bin/perl
use warnings;
use strict;
my $file = $ARGV[0];
open my $input, '<', $file or die $!;
my $editedfile = "edit.$file";
open my $name_change, '>', $editedfile or die $!;
if ($file eq "md5file"){
while (my $line = <$input>){
# Do something...
print $name_change $line;
}
}
Perhaps the following will be helpful:
use strict;
use warnings;
while (<>) {
my $md5line = `md5sum $_`;
my @md5arr = split( " ", $md5line );
printf( "%s\t%s\n", $md5arr[1], $md5arr[0] );
}
Usage: perl mymd5.perl rootfile [>md5file]
The last, optional parameter will direct output to the file md5file; if absent, the results are printed to the console.
I have a large txt file made up of thousands of articles, and I am trying to split it into individual files, one for each article, that I'd like to save as article_1, article_2, etc. Each article begins with a line containing the word /DOCUMENTS/.
I am totally new to Perl and any insight would be great (even advice on good documentation websites). Thanks a lot.
So far, what I have tried looks like this:
#!/usr/bin/perl
use warnings;
use strict;
my $id = 0;
my $source = "2010_FTOL_GRbis.txt";
my $destination = "file$id.txt";
open IN, $source or die "can t read $source: $!\n";
while (<IN>)
{
{
open OUT, ">$destination" or die "can t write $destination: $!\n";
if (/DOCUMENTS/)
{
close OUT ;
$id++;
}
}
}
close IN;
Let's say that /DOCUMENTS/ appears by itself on a line. Thus you can make that the record separator.
use English qw<$RS>;
use File::Slurp qw<write_file>;
my $id = 0;
my $source = "2010_FTOL_GRbis.txt";
{ local $RS = "\n/DOCUMENTS/\n";
open my $in, $source or die "can t read $source: $!\n";
while ( <$in> ) {
chomp; # removes the line "\n/DOCUMENTS/\n"
write_file( 'file' . ( ++$id ) . '.txt', $_ );
}
# being scoped by the surrounding brackets (my "local block"),
close $in; # an explicit close is not necessary
}
NOTES:
use English declares the global variable $RS. The "messy name" for it is $/. See perldoc perlvar
A line separator is the default record separator. That is, the standard unit of file reading is a record, which by default is simply a "line".
As you will find in the linked documentation, $RS only takes literal strings. So, using the idea that the division between articles was '/DOCUMENTS/' all by itself on a line, I specified newline + '/DOCUMENTS/' + newline. If this is part of a path that occurs somewhere on the line, then that particular value will not work for the record separator.
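To see how a multi-character record separator behaves without touching any files, here is a small sketch that reads from an in-memory string instead. The helper name split_records and the sample text are made up for illustration; $/ is used directly, which is the same variable that English calls $RS.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into records separated by a line
# that contains only /DOCUMENTS/.
sub split_records {
    my ($text) = @_;
    local $/ = "\n/DOCUMENTS/\n";    # record separator, restored on exit
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @records;
    while (my $record = <$fh>) {
        chomp $record;               # chomp removes the current value of $/
        push @records, $record;
    }
    close $fh;
    return @records;
}

my @articles = split_records("first article\n/DOCUMENTS/\nsecond article\n");
print scalar(@articles), " records\n";
```

The last record keeps its ordinary trailing newline, because chomp only strips the separator string itself.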
Have you read Programming Perl? It is the best book for beginners!
I don't understand what you are trying to do. I assume you have text that has articles and want to get all articles in separate files.
use warnings;
use strict;
use autodie qw(:all);
my $id = 0;
my $source = "2010_FTOL_GRbis.txt";
my $destination = "file$id.txt";
open my $IN, '<', $source;
#open first file
open my $OUT, '>', $destination;
while (<$IN>) {
chomp; # kill \n at the end
if ($_ eq '/DOCUMENTS/') { # assumes the marker is alone on its line
close $OUT;
$id++;
$destination = "file$id.txt";
open $OUT, '>', $destination; # reuse the outer $OUT; "my" here would create a new handle visible only in this block
} else {
print {$OUT} $_, "\n"; # print into file with $id name (as you open above)
}
}
close $IN;
I am trying to extract some information from pdf. I am trying to use getpdftext.pl from the CAM::PDF module. When I just run $~ getpdftext.pl sample.pdf, it produces a text of the pdf to stdout.
But I am thinking of writing this to a text file and parsing it for the required fields in Perl. Can someone please guide me on how to do this?
But when I try to call pdftotext.pl inside my perl script I am getting a No such file error.
#program to extract text from pdf and save it in a text file
use PDF;
use CAM::PDF;
use CAM::PDF::PageText;
use warnings;
use IPC::System::Simple qw(system capture);
$filein = 'sample.pdf';
$fileout = 'output1.txt';
open OUT, ">$fileout" or die "error: $!";
open IN, "getpdftext.pl $filein" or die "error :$!" ;
while(<IN>)
{
print OUT $fileout;
}
It would probably be easier to make getpdftext.pl do what you want.
Working with the code from getpdftext.pl, this (untested code) should output the pdf to a text file.
use strict;
use warnings;
use CAM::PDF;
my $filein = 'sample.pdf';
my $fileout = 'output1.txt';
my $doc = CAM::PDF->new($filein) || die "$CAM::PDF::errstr\n";
open my $fo, '>', $fileout or die "error: $!";
foreach my $p ( 1 .. $doc->numPages() ) {
my $str = $doc->getPageText($p);
if (defined $str) {
CAM::PDF->asciify(\$str);
print $fo $str;
}
}
close $fo;
See perldoc -f open. You want to take the output stream of an external command and use it as an input stream inside your Perl script. That's what the -| mode is for:
open my $IN, '-|', "getpdftext.pl $filein" or die $!;
while (<$IN>) {
...
}
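A variation worth knowing is the list form of the same open, which passes each argument to the command separately instead of building one shell string, so a file name containing spaces or shell metacharacters is not reinterpreted. The sketch below uses the running perl binary ($^X) purely as a stand-in command so that it runs anywhere; read_command_output is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command and return its output lines.
# The list form of open does not pass the arguments through a shell.
sub read_command_output {
    my (@command) = @_;
    open my $in, '-|', @command or die "Cannot run $command[0]: $!";
    my @lines = <$in>;
    close $in or die "Command $command[0] failed: $! (exit status $?)";
    return @lines;
}

# $^X is the perl interpreter currently running; it stands in for getpdftext.pl
my @lines = read_command_output($^X, '-e', 'print "page one\n", "page two\n"');
print scalar(@lines), " lines captured\n";
```

With the real script, the call would presumably take the form read_command_output('getpdftext.pl', $filein), provided getpdftext.pl can be found in the PATH.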