Run a LaTeX file for PDF generation - perl

I found this code on the internet and I was deeply confused. Could you please help me out?
It seems like the file generated will be saved into a folder with name cover_letters, which should already be there. I ran the code, but I can't find the generated file at all even though no errors are given. Did I misunderstand something?
use strict;
use Schools;
sub main;
sub generate_latex;
# Set this to 1 if you want to generate cover letters for all
# schools regardless of whether their DO_GENERATE_KEY is set.
my $generate_all_ = 0;
# The directory where the files will go (should be created already).
my $cover_letter_dir_ = "cover_letters";
# The prefix that will be prepended to each file.
my $fname_pre_ = "coverletter_";
# A log file to see an errors.
my $log_file_ = "coverletter.log";
main();
sub main {
my $key;
my $num_keys = scalar(keys(%Schools::univ_));
my $ctr = 1;
print "\n";
if(-e $cover_letter_dir_."/".$log_file_) {
unlink($cover_letter_dir_."/".$log_file_);
}
my $i;
if($generate_all_ == 1) {
my #prev_fnames = glob($cover_letter_dir_."/".$fname_pre_."*");
foreach $i (#prev_fnames) {
unlink $i;
}
}
foreach $key (keys(%Schools::univ_)) {
if($generate_all_ == 1 ||
(defined $Schools::univ_{$key}{$Schools::DO_GENERATE_KEY} &&
$Schools::univ_{$key}{$Schools::DO_GENERATE_KEY} > 0)) {
print "Generating $key, $ctr of $num_keys\n";
generate_latex($key);
$ctr++;
}
}
print "\n";
}
# NOTE: You should change the LaTeX in this function
# to generate the format you desire. I used the newlfm
# and charter (a font) packages to create my files.
sub generate_latex {
my $key = shift;
my $fname = $cover_letter_dir_."/".$fname_pre_.$key.".tex";
local *OUT;
open(OUT, '>', $fname) || die "Cannot open $fname\n";
# newlfm is available at:
# http://www.tug.org/tex-archive/help/Catalogue/entries/newlfm.html
print OUT "\\documentclass[10pt,stdletter]{newlfm}\n";
print OUT "\\usepackage{charter}\n\n";
# Determines how forceful LaTeX is in making sure page breaks
# do not leave orphaned lines.
print OUT "\\widowpenalty=1000\n";
print OUT "\\clubpenalty=1000\n\n";
# Replace "figures/logo" with the logo you want to use and
# ".eps" with the extension of your file.
print OUT "\\newsavebox{\\Luiuc}\n";
print OUT "\\sbox{\\Luiuc}{%\n";
print OUT "\t\\parbox[b]{1.75in}{%\n";
print OUT "\t\t\\vspace{0.5in}%\n";
print OUT "\t\t\\includegraphics[scale=1.0,ext=.eps]\n";
print OUT "\t\t{figures/logo}%\n";
print OUT "\t}%\n";
print OUT "}%\n";
print OUT "\\makeletterhead{Uiuc}{\\Lheader{\\usebox{\\Luiuc}}}\n\n";
print OUT "\\newlfmP{headermarginskip=20pt}\n";
print OUT "\\newlfmP{sigsize=50pt}\n";
#print OUT "\\newlfmP{sigskipbefore=20pt}\n";
print OUT "\\newlfmP{dateskipafter=20pt}\n";
print OUT "\\newlfmP{addrfromphone}\n";
print OUT "\\newlfmP{addrfromemail}\n";
print OUT "\\PhrPhone{Phone}\n";
print OUT "\\PhrEmail{Email}\n\n";
print OUT "\\lthUiuc\n\n";
print OUT "\\namefrom{Your Name}\n";
print OUT "\\addrfrom{%\n";
print OUT "\tYour Street\\\\\n";
print OUT "\tBeverly Hills, CA 90210\n";
print OUT "}\n";
print OUT "\\phonefrom{217-555-5555}\n";
print OUT "\\emailfrom{you\#school.edu}\n\n";
if(defined $Schools::univ_{$key}{$Schools::ADDR_KEY}) {
if(defined $Schools::univ_{$key}{$Schools::TO_NAME_KEY}) {
print OUT "\\nameto{".
"$Schools::univ_{$key}{$Schools::TO_NAME_KEY}}\n";
}
print OUT "\\addrto{%\n";
print OUT "$Schools::univ_{$key}{$Schools::ADDR_KEY}";
print OUT "}\n\n";
}
# Specify the default string to use if no TO_NAME key is defined.
my $to_str = "To Whom It May Concern";
#my $to_str = "Dear Faculty Search Committee";
if(defined $Schools::univ_{$key}{$Schools::TO_NAME_KEY}) {
my #name = split(/ /,
$Schools::univ_{$key}{$Schools::TO_NAME_KEY});
$to_str = "Dear $name[0] $name[$#name]";
}
print OUT "\\greetto{$to_str,}\n";
print OUT "\\closeline{Sincerely,}\n";
print OUT "\\begin{document}\n";
print OUT "\\begin{newlfm}\n\n";
# Specify a default string to use if no UNIV_KEY or DEPT_KEY is defined.
my $dept_str = "your department";
if(defined $Schools::univ_{$key}{$Schools::UNIV_KEY} &&
defined $Schools::univ_{$key}{$Schools::DEPT_KEY}) {
# Make the school possessive...deal with school names ending
# in 's'.
my $poss_str = "'s";
my $name_str = $Schools::univ_{$key}{$Schools::UNIV_KEY};
if(substr($name_str,(length($name_str) - 1), 1) eq "s") {
$poss_str = "'";
}
$dept_str = "$Schools::univ_{$key}{$Schools::UNIV_KEY}$poss_str ".
"$Schools::univ_{$key}{$Schools::DEPT_KEY} Department"
}
# By default, do not include any string about your research
# background.
my $area_str = "";
if(defined $Schools::univ_{$key}{$Schools::AREA_KEY}) {
# If the ad is relevant to your specific background, use this
# string.
$area_str =
"My background is particularly applicable to your advertised\n".
"preference in the area of ".
"$Schools::univ_{$key}{$Schools::AREA_KEY}.\n";
}
# By default, do not print anything extra for the school.
my $loc_str = "";
if(defined $Schools::univ_{$key}{$Schools::LOCATION_KEY}) {
# If extra info is defined for the school, print it.
$loc_str = $Schools::univ_{$key}{$Schools::LOCATION_KEY};
}
# Here's the actual text that will go in your cover letter.
# I will leave mine in as an example, but it is NOT a template.
# DO NOT PLAGERIZE THIS!!! The cover letter should be unique
# to you. Plus, it is a well-known fact that the job gods
# look very unfavorably upon anyone that doesn't write their
# own cover letter.
print OUT "I am writing to apply for the position of assistant\n".
"professor in $dept_str. I plan to receive my\n".
"Ph.D.\\ degree from the University of Illinois at\n".
"Urbana-Champaign in Summer of 2006. My adviser is\n".
"Prof.\\ Nitin H.\\ Vaidya, and my general areas of interest\n".
"include wireless and sensor network performance and security.\n".
"$area_str $loc_str\n";
print OUT "In my graduate work, I focus on the design of\n".
"energy-efficient protocols and secure key distribution.\n".
"More specifically, I have explored various techniques at\n".
"multiple layers of the network stack to effectively reduce\n".
"the energy consumption of wireless communication. In security,\n".
"my work was the first to propose leveraging channel diversity\n".
"for sensor network key distribution. My research appears in \n".
"the \\textit{IEEE Transactions on Mobile Computing} journal as\n".
"well as \\textit{Infocom 2006} and \\textit{ICDCS 2005},\n".
"prestigious conferences in the areas of networking and distributed\n".
"systems, respectively.\n\n";
print OUT "Enclosed is my curriculum vitae (including a list of\n".
"publications), contact information for my references, a research\n".
"statement, and a teaching statement. All of my publications and\n".
"presentations are available at:\n\n";
print OUT "http://www.crhc.uiuc.edu/\$\\sim\$mjmille2/publications/\n\n";
print OUT "Please let me know if there are any other materials\n".
"or information that will assist you in processing my application.\n\n";
print OUT "Thank you for your consideration. I look forward to\n".
"hearing from you.\n\n";
print OUT "\\end{newlfm}\n";
print OUT "\\end{document}\n\n";
close OUT;
chdir($cover_letter_dir_);
# Change this command if you generate LaTeX with a different
# series of commands.
system("latexmk -pdfps $fname_pre_$key >> $log_file_ 2>&1");
chdir("..");
}

Related

Perl - Using a Variable that is inside an if statement

I've been putting together a Perl script that defines a variable, and then via an if statement will assign a new value to the variable.
I want to use the last assigned value from the if statement and reference it somewhere else outside the if altogether.
It appears that when referencing it that it uses the original value.
Here's the code:
## VARIABLES
my $FILENAME = $input[0];
open my $info, $DATAFILE or die "can't open <$DATAFILE> for reading $!";
{
while (my $line = <$info>) {
chomp $line;
my #input = split(':', $line);
chomp(#input);
$FILENAME = $input[0];
$PERMIT = $input[1];
$FILESIZE = -s "$TIFF_DL_LOCATION/$PERMIT\_$FILENAME";
$SHORT_PERMIT = substr($PERMIT, 0, 2);
### DEBUG ONLY ###
print "$FILENAME / $PERMIT / $FTPBASE/$SHORT_PERMIT/$PERMIT/$FILENAME\n";
my $ftp = Net::FTP::Throttle->new(
"example.com",
MegabitsPerSecond => $THROTTLELVL,
Debug => $DEBUGLVL
) or die "Cannot connect: $#";
$ftp->login("anonymous", 'anonymous') or die "Cannot login ", $ftp->message;
$ftp->binary or die "Unable to set binary mode ", $ftp->message;
if ($PROGRESSBAR eq 1) {
print "\n[$./$LINE_COUNT] Downloading $FILENAME\n";
my $REMOTE_FILESIZE = $ftp->size("/PATH/TO/DATA/$SHORT_PERMIT/$PERMIT/$FILENAME");
if (!defined($REMOTE_FILESIZE)) {
print "=> FILE DOES NOT APPEAR TO EXIST ON FTP SERVER\n";
if ($FILENAME =~ m/_\s/) {
print "=> ATTEMPTING TO FIX NOW\n";
$FILENAME =~ s/_\s/, /g;
$REMOTE_FILESIZE = $ftp->size("/PATH/TO/DATA/$SHORT_PERMIT/$PERMIT/$FILENAME");
if (!defined($REMOTE_FILESIZE)) {
print "=> FAILED!\n";
}
}
elsif ($FILENAME =~ m/_\s\s/) {
print "=> ATTEMPTING TO FIX NOW\n";
$FILENAME =~ s/_\s\s/, /g;
$REMOTE_FILESIZE = $ftp->size("/PATH/TO/DATA/$SHORT_PERMIT/$PERMIT/$FILENAME");
if (!defined($REMOTE_FILESIZE)) {
print "$FILENAME\n";
print "=> FAILED!\n";
}
}
else {
print "=> ALL ATTEMPTS TO RESOLVE THE ISSUE HAVE FAILED.\n";
next;
}
}
$REMOTE_FILESIZE = $ftp->size("/PATH/TO/DATA/$SHORT_PERMIT/$PERMIT/$FILENAME");
print "FILENAME: $FILENAME\n";
--- SNIP SNIP - MORE DATA, NOT RELEVANT--
The output I get is the name of the file that was originally opened, not the value after modification by the substitutions in the if statements.
This is error correction: it checks the filename that it gets from the file, and if it matches something, it corrects it, and in the end, I have an old variable $FILENAME with a new value that I want to use outside the if.
I struggled to understand the code in your question, mainly because of all the upper case strings; both identifiers and comments
Local identifiers in most languages, including Perl, are usually written using lower-case letters, digits, and the underscore _. Capital letters are reserved for global variables, and in Perl's case that is mostly package (class) names
Identifiers of important variables use capital letters so that they stand out from the rest; but in your example most identifiers are all-capitals. It is diffifult to understand the structure of your code if every identifier says it's very important
I have looked carefully at the code you show, and have done my best to reorganise and simplify it so that it is more legible and does what I think you intended. I have added declarations for all of the variables that are not declared within your code sample, and have added what I think are the correct closing braces } to balance the syntax so that it will compile
As others have noted, the second conditional clause that tests for underscore followed by two spaces will never be executed, because the preceding test for underscore followed by one space will already have caught that case. It is also pointless to use a separate pattern match to determine whether a substitution s/// will succeed, as the substitution alone does nothing and returns a false value if there was no match
The intention is to help you to write Perl code that you can understand, as well as others who you may ask for help (like Stack Overflow) or may be given the job of maintaining your software.
use strict;
use warnings;
my ($progressbar, $line_count);
my ($datafile, $ftpbase, $short, $tiff_dl_location);
my ($throttlelvl, $debuglvl);
my #input;
## Variables
my $filename = $input[0];
open my $info, $datafile or die "can't open <$datafile> for reading $!";
{
while (<$info>) {
chomp;
my ($filename, $permit) = split /:/;
my $filesize = -s "$tiff_dl_location/${permit}_${filename}";
my $short_permit = substr $permit, 0, 2;
### DEBUG ONLY ###
print "$filename / $permit / $ftpbase/$short_permit/$permit/$filename\n";
my $file_path = "/PATH/TO/DATA/$short_permit/$permit/$filename";
my $ftp = Net::FTP::Throttle->new(
"example.com",
MegabitsPerSecond => $throttlelvl,
Debug => $debuglvl
) or die "Cannot connect: $#";
$ftp->login(qw/ anonymous anonymous /) or die "Cannot login: ", $ftp->message;
$ftp->binary or die "Unable to set binary mode ", $ftp->message;
if ($progressbar == 1) {
print "\n[$./$line_count] Downloading $filename\n";
my $remote_filesize = $ftp->size($file_path);
if (not defined $remote_filesize) {
warn "=> File does not appear to exist on FTP server\n";
warn "=> Attempting to fix now\n";
$filename =~ s/_\s+/, /g;
$remote_filesize = $ftp->size($file_path);
if (not defined $remote_filesize) {
warn "=> ALL ATTEMPTS TO RESOLVE THE ISSUE HAVE FAILED.\n";
next;
}
}
print "File name: $filename\n";
# --- snip snip - more data, not relevant ---
}
}
}
Declare variables outside of (before) the conditional unless they will only be used inside of that conditional block {}.
Also,
use strict;
use warnings;

A simple perl program to send data from different files

I wrote a perl program which send the updated data from a file to remote server periodically. But now i want it to read it from differnt files and send the updated data such that the reciever should know how to seperate the data from the mixed data. Do i just need to put some kind of delimiter? Is there any standards already there for such things?
#############
#Change parameters
############
$PeerAddr='192.168.0.7';
$PeerPort='7070';
##############
# Import packages
##############
use Text::Diff;
use IO::Socket;
#############
# Define global variables
#############
$lineCount=0;
$loopCount=0;
our $stats2 = 0;
for($count = 0; $count <= 10000; $count++){
my $data_dir="archive/otat/*dat";
my $data_file= `ls -t $data_dir | head -1`;
chomp($changed_data_file);
print "old data_file is $changed_data_file \n";
chomp($data_file);
if($data_file ne $changed_data_file){
$lineCount2=0;
$changed_data_file=$data_file;
print ("String:$data_file :$changed_data_file are not equal\n");
}
while(defined($data_file)){
print "$data_file \n";
open (DAT,$data_file) || die("Could not open file! $!");
#iofile = <DAT>;
$lineCount = #iofile;
splice(#diffLines);
print "printing: $lineCount\n";
print "printing 2: $lineCount2 \n";
chomp $lineCount;
chomp $lineCount2;
if($lineCount != $lineCount2){
$j=0;
for($i=$lineCount2;$i <= $lineCount; $i++){
$diffLines[$j] = $iofile[$i];
$j++;
}
$num=#diffLines;
print "count of diff lines:$num\n";
$lineCount2 = $lineCount;
$loopCount=0;
}
if($loopCount>2){
$loopCount=0;
print "Look for recent file \n";
last;
}
$loopCount++;
sleep(5);
############################
&socket_con(#diffLines);
}
}
#### Methods/Functions
sub socket_con {
if ($sock== 0){
$sock = new IO::Socket::INET (
PeerAddr => $PeerAddr,
PeerPort => $PeerPort,
Proto => 'tcp'
);
die "Could not create socket: $!\n" unless $sock;
}
print $sock #_;
#close($sock);
}
I've used JSON a lot with good results http://metacpan.org/pod/JSON You can store your data in a hash, serialize it, send the text to the client and have it turn the string back into a Perl hash for easy use. For example:
# on the server
use JSON;
...
# store changed lines in a hash
$diffLines->{$data_file}[$j]=$io_file[$i];
...
# Serialize the hash reference into a string which you then send to the client
$diffLinesSerialized = encode_json $diffLines;
# on the clinet
use JSON;
...
# convert received data from serialized string into hash
$diffLines = decode_json $diffLinesSerialized;
# $diffLines is now a has reference which can be accessed like normal
foreach my $data_file (keys %$diffLines) {
foreach my $line (#{$diffLines->{$data_file}}) {
...
}
}
All this being said, as much as I don't really like XML from a programming perspective, it is a prevalent standard for this kind of thing. If this is just a specialized, internal tool that won't grow into something larger it probably doesn't matter, but if you think this could turn into a more general service, say for non-perl clients, it would be good to consider XML as an option. Programming from a service-oriented perspective can make it easier to grow things down the road.

How can I extract U.S. DoD contract information to use in statistical analysis?

I'm trying to scrape and analyze the contracts the defense department gets, correlating it with other economic data I've already got. It's all publicly available on Defense.gov.
However, they don't list it in a table, rather the relevant information (Contractor, Date, Name, Contract ID, etc) are written in paragraph form. I've been trying to get the data into a CSV so I can run it through R.
Normally I'd just extract based on the tags around the data, but can anyone recommend a simpler way of getting at this data? I've already pulled the data using wget, but I'm just trying to extract it.
This is an example of a typical paragraph:
Booz Allen Hamilton, Inc., Herndon, Va., is being awarded a $9,450,189 cost-plus-fixed-fee, indefinite-delivery, requirements contract for research and development in order to complete/deliver the assessment of army warfighting challenges and integrated learning plans, the experiment final reports, and experiment-to-action plans. The U.S. Army will use these reports to develop and revise Army concepts and contribute to other services and joint concepts; make recommendations for the development of Army and joint capabilities development scenarios; research current and future warfare through experimentation; and build models and simulations to test new warfighting ideas. ESG/PKS DTIC, Offut Air Force Base, Neb., is the contracting activity (SP0700-03-D-1380, Delivery Order: 0452).
I started with a Perl script, but the extraction isn't working out so well. I'm curious if anyone's built a script that's more dynamic that I can build off of rather then rebuilding from scratch.
#!/usr/bin/perl -w
use Spreadsheet::WriteExcel;
# Create a new workbook called simple.xls and add a worksheet.
my $workbook = Spreadsheet::WriteExcel->new('Dec4_min.xls');
my $worksheet = $workbook->add_worksheet();
our $row = 0;
#files = <~/Def_Contracts/*.*>;
foreach $HTML (#files) { # open each file in folder #$HTML = "contract.html";
open (HTML) or die "Can't open the file!";
#fullpage = <HTML>;
print "fullpage array size = ", #fullpage. "\n";
my #cleaned; # this is a simplified array we will create
foreach $curr (#fullpage){ #this for each loop cuts array elements without dollar signs
# [0-9]+?\/[0-9]+?\/[0-9]{3}
if($curr =~ m/content="([0-9]+?\/[0-9]+?\/[0-9]{4})/) { #get date - looking for this: content="8/29/1995"
print $1;
# if ($currnt =~ m/([0-9]+,.[0-9]{4}/){ # extract date dd,(space)dddd
our $date = $1;
}
# CLEAN UP
while(substr($curr,0,1) =~ m/[^\w]/){ # while not a word char
substr($curr,0,1)=''; #cut that char
}
if($curr =~ m/\$[0-9]/) { # only use if has $number.
####################### Now we've got what we need, output relevant parts into excel.
my $firstcom = index($curr, ',');
$name = substr($curr,0,$firstcom);
# print "Name:", $name. "\n";
$worksheet->write($row,0,$name); # print the name in the first col
$worksheet->write($row,1,$date); # print the date in the 2nd col
if($curr =~ m/\$([0-9,]*)/) { # finds the cost PROBLEM: there may be more than one
# print "Cost:", $1. "\n";
$worksheet->write($row,2,$1);
}
if($curr =~ m/([A-Za-z0-9][A-Z0-9]{4}[A-Z0-9]?\-[0-9]+\-[A-Z]\-[A-Z0-9]{4})/) { # print ref # in 3rd col
# print "Cost:", $1. "\n";
$worksheet->write($row,3,$1); # ref takes form (letter ...-...-...number)
}
# 2nd attempt to get ref #
if($curr =~ m/\((.*\-.*\-.*)\)/){ # print ref # in 4rd col
# print "Cost:", $1. "\n";
$worksheet->write($row,4,$1); # ref takes form (letter ...-...-...number)
}
$worksheet->write($row,5,$curr); # print full record (for verification!)
$row ++;
} # close for if has a number statement
} # close foreach line of HTML Page
#print "cleaned array size = ", #cleaned. "\n";
print "The end.\n";
close (HTML);
} # End of foreach file
Obviously, very incomplete, but then, normally it takes a significant amount of cash to convince me to deal with this kind of mess (VIEWSTATE, really?):
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TokeParser::Simple;
use Regexp::Common qw( number );
my $parser = HTML::TokeParser::Simple->new('contract.html');
my %contracts;
while (my $tag = $parser->get_tag('p')) {
if (defined( my $align = $tag->get_attr('align')) ) {
my $text = get_text_in_p($parser);
next unless defined $text;
if (lc($text) eq 'contracts') {
process_contracts($parser, \%contracts);
}
}
}
use YAML;
print Dump \%contracts;
sub process_contracts {
my ($parser, $contracts) = #_;
my $current_dept = '';
while (my $tag = $parser->get_tag('p')) {
my $text = get_text_in_p($parser);
next unless defined $text;
if (defined $tag->get_attr('align')) {
$current_dept = $text;
next;
}
my ($company) = ($text =~ /^(.+?), (?:is|was)/);
my ($amount) = ($text =~ m{
(
\$
$RE{num}{int}{-base => 10}{-sep => ','}
)
}x
);
my ($contract) = ($text =~ m{
(
[A-Z0-9]{6}
[A-Z0-9/-]+
)
}x
);
push #{ $contracts->{$current_dept} }, {
company => $company,
amount => $amount,
contract => $contract,
# text => $text,
};
}
}
sub get_text_in_p {
my ($parser) = #_;
my $text = $parser->get_text('/p');
return unless defined $text;
$text =~ s/^[^A-Z]+//;
$text =~ s/\s+\z//;
return $text;
}
Output for No. 1001-11:
---
DEFENSE LOGISTICS AGENCY:
- amount: '$49,418,113'
company: 'Physio-Control, Inc., Redmond, Wash.'
contract: SPM200-07-D-8261/P00005
- amount: '$43,246,524'
company: 'Johnson & Johnson Healthcare Systems, on behalf of Ortho-McNell-Janssen Pharmaceuticals, Inc., Piscataway, N.J.'
contract: SPM2D0-12-D-0001
- amount: '$15,240,054'
company: 'Patterson Dental Supply, Inc., Minn.'
contract: SPM2DE-10-D-7447/P00005
NAVY:
- amount: '$60,360,995'
company: 'Raytheon Co., Integrated Defense Systems, San Diego, Calif.'
contract: N00024-11-C-2404
- amount: '$33,693,891'
company: 'Wyle Laboratories, Inc., Huntsville, Ala.'
contract: N00421-03-D-0015
- amount: '$30,071,729'
company: 'Deloitte Consulting, L.L.P., Lexington Park, Md.'
contract: N00421-03-D-0014
- amount: '$22,151,900'
company: 'Raytheon Co., Tucson, Ariz.'
contract: N00024-08-C-5401
- amount: '$18,508,325'
company: 'Canadian Commercial Corp., General Dynamics Land Systems - Canada, Ontario, Canada'
contract: M67854-07-D-5028
Looking at a few entries, I suspect these paragraphs are entered manually using a bunch of boilerplate templates. (The different branches / agencies seem to have their own formats; for example, the Air Force and the Navy write "is being awarded", while the Army and the DLA use "was awarded", and some other agencies have their own peculiar variants.)
Thus, it seems unlikely that you can write code to parse all the entries reliably. The best you can do is probably to write a bunch of regexps to parse most (say, 99% or so) of them, and flag the rest for manual processing.
I'm too tired to write a more detailed answer right now, but I'd suggest starting with something like this:
my (#records, #rejects);
foreach (split /\n\s*\n/, $text) { # split page into paragraphs
s#\s+# #g; s#^ ##; s# $##; # normalize whitespace
my (%data, #m);
if (#m = /^(.*), (?:is being|was) awarded a \$([0-9,]+) (.*) contract to (.*)\. (.*), is the contracting activity(?: \(([A-Z]{2}\d{4}-\d\d-[A-Z]-\d{4})(?:, Delivery Order: (\d+))?\))?\.$/) {
#data{qw(company amount contract_type purpose activity contract_id delivery_order)} = #m;
}
elsif (#m = /^...$/) {
#data{qw(...)} = #m;
}
# ...
else {
push #rejects, $_;
next;
}
push #records, \%data;
}
Then go through the entries being rejected, add new regexps (or adjust existing ones) to handle the most common types among them, and repeat. Also remember to check the parsed output to see if the regexps are working correctly, of course.

How to alter email2sms script to search for qualifier WARNING: blatant Plagiarism

I found a perl script that checks an email account and forwards the contents to a gsm phone. It uses below code to determine the body of the email. This can be different for each email package so doesn't really work. I was going to have a # at the beginning of the email body instead, how would go about doing this?
sub ProcessEmail
{
# Assign parameter to a local variable
my (#lines) = #_;
my $body_start = 'FALSE';
$sms_body = "";
# Declare local variables
my ($from, $line, $sms_to);
# Check each line in the header
foreach $line (#lines)
{
print $line;
if($line =~ m/^From: (.*)/)
{
# We found the "From" field, so let's get what we need
$from = $1;
$from =~ s/"|<.*>//g;
$from = substr($from, 0, 39); # This gives us the 'From' Name
}
elsif( $line =~ m/^Subject: (.*)/)
{
# We found the "Subject" field. This contains the No to send the SMS to.
$sms_to = $1;
$sms_to = substr($sms_to, 0, 29);
if ($sms_to =~ /^[+]?\d+$/ ) # here we check if the subject is a no. If so we proceed.
{
print "Got email. Subject is a number. Processing further\n";
}
else #Otherwise we delete the message and ignore it.
{
print "Got email. Subject is NOT a number. Ignoring it. \n";
return;
}
}
elsif(( $line =~ m/^Envelope-To:/)||($body_start eq 'TRUE')) # This is the last line in the email header
{ # after this the body starts
if($body_start ne 'FALSE')
{
$sms_body = $sms_body . $line;
}
$body_start='TRUE';
}
}
# At this point we know the Subject, From and Body.
# So we can send the SMS out to the provided no.
$sms_body = "SMS via Email2SMS from $from: " . $sms_body;
# You can only send SMS in chunks of 160 chars Max according to gnokii.
# so breaking the body into chunks of 160 and sending them 1 at a time.
print $sms_to;
print $sms_body;
This is something you'll be able to avoid re-inventing by using a module from CPAN to handle it.
At a quick glance, Mail::Message::Body from the Mail::Box distribution looks like it should probably do the job. See also Email::Abstract.
Altered to search for a string that was specific to the mailer. Nasty and will only work from a specific mail sender, but it worked

just can't get perl working as expected ( conditionals and variable declaring )

EDIT:
I will try a better explication this time, this is the exact code from my script (sorry for all them coments, they are a result of your sugestions, and apear in the video below).
#use warnings;
#use Data::Dumper;
open(my $tmp_file, ">>", "/tmp/some_bad.log") or die "Can not open log file: $!\n";
#if( $id_client != "")
#allowed_locations = ();
#print $tmp_file "Before the if: ". Data::Dumper->Dump([\#allowed_locations, $id_client]) . "";
if( $id_client )
{
# print $tmp_file "Start the if: ". Data::Dumper->Dump([\#allowed_locations, $id_client]) . "";
# my $q = "select distinct id_location from locations inner join address using (id_db5_address) inner join zona_rural_detaliat using (id_city) where id_client=$id_client";
# my $st = &sql_special_transaction($sql_local_host, $sql_local_database, $sql_local_root, $sql_local_root_password, $q);
# print $tmp_file "Before the while loop: ref(st)='". ref($st) . "\n";
# while((my $id)=$st->fetchrow())
# {
# print $tmp_file "Row the while loop: ". Data::Dumper->Dump([$id]) . "";
# my $id = 12121212;
# push(#allowed_locations, $id);
# }
# print $tmp_file "After the while loop: ref(st)='". ref($st) . "\n";
# my($a) = 1;
#} else {
# my($a) = 0;
}
#print $tmp_file "After the if: ". Data::Dumper->Dump([\#allowed_locations, $id_client]) . "";
close($tmp_file) or die "Can not close file: $!\n";
#&html_error(#allowed_locations);
First off all, somebody said that I should try to run it in command line, the script works fine in command line (no warnings, It was uncommented then), but when triyng to load in via apache in the browser it fails, please see this video where I captured the script behavior, what I tried to show in the video:
I have opened 2 tabs the first doesn't define the variable $id_client, the second defines the variable $id_client that is read from GET: ?id_client=36124 => $id_client = 36124; , both of them include the library in the video "locallib.pl"
When running the script with all the
new code commented the page loads
when uncoment the line that defines
the #allowed_locations = (); the
script fails
leave this definition and uncoment
the if block, and the definition of
my $a; in the if block; Now the script works fine when $id_client is
defined, but fails when $id_client
is not defined
Uncoment the else block and the
definition of my $a; in the else
block. Now the script works fine
with or without $id_client
now comment all the my $a;
definisions and comment the else
block, the script fails
but if I'm using open() to open
a file before the IF, and
close() to close it after the if it does't fail even if the IF block
is empty and event if there is no
else block
I have replicated all the steps when running the script in the command line, and the script worked after each step.
I know it sounds like something that cannot be the behavior of the script, but please watch the video (2 minutes), maybe you will notice something that I'm doing wrong there.
Using perl version:
[root#db]# perl -v
This is perl, v5.8.6 built for i386-linux-thread-mult
Somebody asked if I don't have a test server, answer: NO, my company has a production server that has multiple purposes, not only the web interface, and I cannot risk to update the kernel or the perl version, and cannot risk instaling any debuger, as the company owners say: "If it works, leave it alone", and for them the solution with my ($a); is perfect beacause it works, I'm asking here just for me, to learn more about perl, and to understand what is going wrong and what can I do better next time.
Thank you.
P.S. hope this new approach will restore some of my -1 :)
EDIT:
I had success starting the error logging, and found this in the error log after each step that resulted in a failure I got this messages:
[Thu Jul 15 14:29:19 2010] [error] locallib.pl did not return a true value at /var/www/html/rdsdb4/cgi-bin/clients/quicksearch.cgi line 2.
[Thu Jul 15 14:29:19 2010] [error] Premature end of script headers: quicksearch.cgi
What I found is that this code is at the end of the main code in the locallib.pl after this there are sub definitions, and locallib.pl is a library not a program file, so it's last statement must returns true. , a simple 1; statement at the end of the library ensures that (I put it after sub definitions to ensure that noobody writes code in the main after the 1;) and the problem was fixed.
Don't know why in CLI it had no problem ...
Maybe I will get a lot of down votes now ( be gentle :) ) , but what can I do ...and I hope that some newbies will read this and learn something from my mistake.
Thank you all for your help.
You need to explicitly check for definedness.
If you want to enter the loop when $client is defined,
use if ( defined $client ).
If you want to enter the loop when $client is defined and a valid integer,
use if ( defined $client && $client =~ /^-?\d+$/ ).
I assume it's an integer from the context, if it can be a float, the regex needs to be enhanced - there's a standard Perl library containing pre-canned regexes, including ones to match floats. If you require a non-negative int, drop -? from regex's start.
If you want to enter the loop when $client is defined and a non-zero (and assuming it shouldn't ever be an empty string),
use if ( $client ).
If you want to enter the loop when $client is defined and a valid non-zero int,
use if ( $client && $client =~ /^-?\d+$/ ).
Your #ids is "undef" when if condition is false, which may break the code later on if it relies on #ids being an array. Since you didn't actually specify how the script breaks without an else, this is the most likely cause.
Please see if this version works (use whichever "if" condition from above you need, I picked the last one as it appears to match the closest witrh the original code's intent - only enter for non-zero integers):
UPDATED CODE WITH DEBUGGING
use Data::Dumper;
open(my $tmp_file, ">", "/tmp/some_bad.log") or die "Can not open log file: $!\n";
#ids = (); # Do this first so #ids is always an array, even for non-client!
print $tmp_file "Before the if: ". Data::Dumper->Dump([\#ids, $client]) . "\n";
if ( $client && $client =~ /^-?\d+$/ ) # First expression catches undef and zero
{
print $tmp_file "Start the if: ". Data::Dumper->Dump([\#ids, $client]) . "\n";
my $st = &sql_query("select id from table where client=$client");
print $tmp_file "Before the while loop: ref(st)='". ref($st) . "'\n";
while(my $row = $st->fetchrow())
{
print $tmp_file "Row the while loop: ". Data::Dumper->Dump([row]) . "'\n";
push(#ids, $row->[0]);
}
print $tmp_file "After the while loop: ref(st)='". ref($st) . "'\n";
# No need to undef since both variables are lexically in this block only
}
print $tmp_file "After the if\n";
close($tmp_file) or die "Can not close file: $!\n";
when checking against a string, == and != should be respectively 'eq' or 'ne'
if( $client != "" )
should be
if( $client ne "" )
Otherwise you don't get what you're expecting to get.
Always begin your script with :
use warnings;
use strict;
these will give you usefull informations.
Then you could write :
my #ids;
if (defined $client) {
#ids = (); # not necessary if you run this part only once
my $st = sql_query("select id from table where client=$client");
while( my ($id) = $st->fetchrow ) {
push #ids, $id;
}
} else {
warn '$client not defined';
}
if (#ids) { # Your query returned something
# do stuff with #ids
} else {
warn "client '$client' does not exist in database";
}
Note: this answer was deleted because I consider that this is not a real question. I am undeleting it to save other people repeating this.
Instead of
if( $client != "" )
try
if ($client)
Also, Perl debugging is easier if you
use warnings;
use strict;
What I found is that this code is at the end of the main code in the locallib.pl after this there are sub definitions, and locallib.pl is a library not a program file, so it's last statement must returns true, a simple 1; statement at the end of the library ensures that (put it after sub definitions to ensure that noobody writes code in the main after the 1;) and the problem was fixed.
The conclusion:
I have learned that every time you write a library or modify one, ensure that it's last statment returns true;
Oh my... Try this as an example instead...
# Move the logic into a subroutine
# Forward definition so perl knows func exists
sub getClientIds($);
# Call subroutine to find id's - defined later.
my #ids_from_database = &getClientIds("Joe Smith");
# If sub returned an empty list () then variable will be false.
# Otherwise, print each ID we found.
if (#ids_from_database) {
foreach my $i (#ids_from_database) {
print "Found ID $i \n";
}
} else {
print "Found nothing! \n";
}
# This is the end of the "main" code - now we define the logic.
# Here's the real work
sub getClientIds($) {
my $client = shift #_; # assign first parameter to var $client
my #ids = (); # what we will return
# ensure we weren't called with &getClientIds("") or something...
if (not $client) {
print "I really need you to give me a parameter...\n";
return #ids;
}
# I'm assuming the query is string based, so probably need to put it
# inside \"quotes\"
my $st = &sql_query("select id from table where client=\"$client\"");
# Did sql_query() fail?
if (not $st) {
print "Oops someone made a problem in the SQL...\n";
return #ids;
}
my #result;
# Returns a list, so putting it in a list and then pulling the first element
# in two steps instead of one.
while (#result = $st->fetchrow()) {
push #ids, $result[0];
}
# Always a good idea to clean up once you're done.
$st->finish();
return #ids;
}
To your specific questions:
If you want to test if $client is defined, you want "if ( eval { defined $client; } )", but that's almost certainly NOT what you're looking for! It's far easier to ensure $client has some definition early in the program (e.g. $client = "";). Also note Kaklon's answer about the difference between ne and !=
if (X) { stuff } else { } is not valid perl. You could do: if (X) { stuff } else { 1; } but that's kind of begging the question, because the real issue is the test of the variable, not an else clause.
Sorry, no clue on that - I think the problem's elsewhere.
I also echo Kinopiko in recommending you add "use strict;" at the start of your program. That means that any $variable #that %you use has to be pre-defined as "my $varable; my #that; my %you;" It may seem like more work, but it's less work than trying to deal with undefined versus defined variables in code. It's a good habit to get into.
Note that my variables only live within the squiggliez in which they are defined (there's implicit squiggliez around the whole file:
my $x = 1;
if ($x == 1)
{
my $x = 2;
print "$x \n"; # prints 2. This is NOT the same $x as was set to 1 above.
}
print "$x \n"; # prints 1, because the $x in the squiggliez is gone.