How can I get a full Mail::SpamAssassin::MailMessage object from text?


I use the following code to generate a spam report using SpamAssassin:
use Mail::SpamAssassin;
my $sa = Mail::SpamAssassin->new();
open FILE, "<", "mail.txt";
my @lines = <FILE>;
my $mail = $sa->parse(@lines);
my $status = $sa->check($mail);
my $report = $status->get_report();
$report =~ s/\n/\n<br>/g;
print "<h1>Spam Report</h1>";
print $report;
$status->finish();
$mail->finish();
$sa->finish();
The problem I have is that it classifies 'sample-nonspam.txt' as spam:
Content preview: [...]
Content analysis details: (6.9 points, 5.0 required)
pts rule name description
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS Informational: message was not relayed via SMTP
1.2 MISSING_HEADERS Missing To: header
0.1 MISSING_MID Missing Message-Id: header
1.8 MISSING_SUBJECT Missing Subject: header
2.3 EMPTY_MESSAGE Message appears to have no textual parts and no
Subject: text
-0.0 NO_RECEIVED Informational: message has no Received headers
1.4 MISSING_DATE Missing Date: header
0.0 NO_HEADERS_MESSAGE Message appears to be missing most RFC-822 headers
And that information -is- in the file. What worries me is that in the documentation, it states "Parse will return a Mail::SpamAssassin::Message object with just the headers parsed.". Does that mean it will not return a full message?

You're missing a single character:
my $mail = $sa->parse(\@lines);
From the docs (with emphasis added):
parse($message, $parse_now [, $suppl_attrib])
Parse will return a Mail::SpamAssassin::Message object with just the headers parsed. When calling this function, there are two optional parameters that can be passed in: $message is either undef (which will use STDIN), a scalar of the entire message, an array reference of the message with 1 line per array element, or a file glob which holds the entire contents of the message; and $parse_now, which specifies whether or not to create the MIME tree at parse time or later as necessary.
With the change above, I get the following output (HTML stripped):
pts rule name description
---- ---------------------- --------------------------------------------------
-2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
[score: 0.0000]
As the docs mention, parse is flexible. You could instead use
my $mail = $sa->parse(join "" => <FILE>); # scalar of the entire message
or
my $mail = $sa->parse(\*FILE); # a file glob with the entire contents
or
my $mail;
{ local $/; $mail = $sa->parse(<FILE>) } # scalar of the entire message
or even
open STDIN, "<", "mail.txt" or die "$0: open: $!";
my $mail = $sa->parse(undef); # undef means read STDIN
You'd remove my @lines = <FILE> for these last four examples to function as expected.
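To see why the missing backslash matters, here is a minimal sketch using a hypothetical stand-in sub (describe_message_arg is not part of SpamAssassin; it only mimics the ($message, $parse_now) argument order quoted from the docs). Passing an array flattens it into the argument list, so only the first line reaches $message, while passing a reference keeps the whole message together.

```perl
use strict;
use warnings;

# Hypothetical stand-in for parse($message, $parse_now); NOT SpamAssassin code.
# It only reports what ended up in the first argument slot.
sub describe_message_arg {
    my ($message, $parse_now) = @_;
    return ref($message) eq 'ARRAY'
        ? 'array ref with ' . scalar(@$message) . ' lines'
        : "plain scalar: $message";
}

my @lines = ("Subject: hello\n", "To: someone\@example.com\n", "\n", "Body text\n");

# The array is flattened: only the first line lands in $message,
# and the second line is taken as $parse_now.
my $flattened = describe_message_arg(@lines);

# The reference keeps all lines together in a single argument.
my $by_ref = describe_message_arg(\@lines);

print $flattened;
print "$by_ref\n";
```

This would explain the report in the question: a message that consists of one header line looks like it has no body and almost no headers.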

This is the right way to construct a Message:
my $mail = Mail::SpamAssassin::Message->new({ "message" => $content });

Related

Reading from two files (one raw, one XMP) with ExifTool

I am new to PERL and even newer to ExifTool—and am therefore likely missing something quite basic.
The goal is to read XMP fields from a photo file. Looking at the exiftool documentation on both the ExifTool site and CPAN, I was able to read tagged jpeg and the XMP sidecar files, both without issues.
The problem is when I read from a raw file—which obviously doesn't have custom fields—I would get an error with an uninitialized value. That is to be expected.
So, I want to have code that says "if you read a field/tag from the raw file and it isn't there, look at the associated XMP file, and if that fails, return a blank string."
I therefore tried to open a second instance of ExifTool, such as:
my $exifInfo = ImageInfo($filePath);
goes to
my $exifInfoXMP = ImageInfo($filePathXMP);
But that keeps failing. If I read the XMP directly from the get-go, it works just fine, so I am getting the impression that I cannot read two ExifTool structures at the same time (which can't be right; I have to be the error here). The code below works, but I cannot "interleave" the conditionals on the two files. I have to process the raw first, then run a second pass with a new handler for the XMP. Knowing how efficient PERL is, my approach cannot possibly be a good one (even though it does the job).
In particular, there is one line that puzzles me. If I remove it, nothing works. (it should be well marked).
$filePath =~ s/$photoExtensions$/.XMP/i;
That line essentially does the same as reading the XMP from the get-go (not my ideal solution).
Anyone have an idea as to where I am messing up?
Thanks,
Paul
header [EDITED TO SHOW ALL OPTIONS; HAD SHOWN ALL USED IN QUESTION]
#!/usr/bin/perl
# load standard packages
use strict;
use warnings;
use Data::Dumper;
use File::Find;
no warnings 'File::Find';
use Image::ExifTool ':Public';
# define proxy for ExifTool
my $exifTool = new Image::ExifTool;
my $exifToolXMP = new Image::ExifTool;
# turn on immediate updates
$|=1;
# common extensions that I want to recognize
my $photoExtensions = "\.(jpg|crw|cr2|cr3|rw2|orf|raw|nef|arw|dng)";
my $imageExtensions = "\.(tiff|tif|psd|png|eps|hdr|exr|svg|gif|afphoto|pdf)";
my $videoExtensions = "\.(flv|vob|ogv|avi|mts|m2ts|mov|qt|wmv|mp4|m4p|m4v|svi|3gp|3g2)";
my $audioExtensions = "\.(aiff|aac|wav|mp3|m4a|m4p|ogg|wma)";
my $appFileExtensions = "\.(on1|cos|cof)";
my $GPSFileExtensions = "\.(gpx|kml|kmz|log)";
# start main program
main();
routine in question
sub listKeywords {
print "Reads and displays file information from certain tags (typically set in Photomechanic):\n";
print "\t1. Subject\n";
print "\t2. Hierarchical Subject\n";
print "\t3. Supplemental Categories\n";
print "\t4. Label Name 1\n";
print "\t5. Label Name 2\n";
print "\t6. Label Name 3\n";
print "\t7. Label Name 4\n\n";
print "List Keywords ---\n\tEnter file name (with path) --> ";
my $filePath = <STDIN>;
chomp $filePath;
$filePath =~ s/\\//g;
$filePath =~ s/\s+$//;
########################################################
# COMMENT OUT THE FOLLOWING LINE AND NOTHING WORKS;
# $filePathXMP should be defined anyway, which suggests to
# me that the second invocation of ImageInfo doesn't actually occur.
# But I don't understand why.
$filePath =~ s/$photoExtensions$/.XMP/i;
print "\n\n";
my $filePathXMP = $filePath;
$filePathXMP =~ s/$photoExtensions$/.XMP/i; # TO FIX: filename may not have uppercase extension
# Get Exif information from image file
my $exifInfo = $exifTool->ImageInfo($filePath);
# my $exifInfoXMP = $exifToolXMP->ImageInfo($filePath =~ s/$photoExtensions$/.XMP/gi);
print "XMP Sidecar: \[$filePathXMP\]\n\n";
########################################################
# Get Specific Tag Value
my $hierarchicalSubject = $exifTool->GetValue('HierarchicalSubject');
my $subject = $exifTool->GetValue('Subject');
my $supplementalCategories = $exifTool->GetValue('SupplementalCategories');
my $labelName1 = $exifTool->GetValue('LabelName1');
my $labelName2 = $exifTool->GetValue('LabelName2');
my $labelName3 = $exifTool->GetValue('LabelName3');
my $labelName4 = $exifTool->GetValue('LabelName4');
my $exifInfo = ImageInfo($filePathXMP);
if (not defined $hierarchicalSubject) {$hierarchicalSubject = $exifTool->GetValue('HierarchicalSubject');}
if (not defined $hierarchicalSubject) {$hierarchicalSubject = "";}
if (not defined $subject) {$subject = $exifTool->GetValue('Subject');}
if (not defined $subject) {$subject = "";}
if (not defined $supplementalCategories) {$supplementalCategories = $exifTool->GetValue('SupplementalCategories');}
if (not defined $supplementalCategories) {$supplementalCategories = "";}
if (not defined $labelName1) {$labelName1 = $exifTool->GetValue('LabelName1');}
if (not defined $labelName1) {$labelName1 = "";}
if (not defined $labelName2) {$labelName2 = $exifTool->GetValue('LabelName2');}
if (not defined $labelName2) {$labelName2 = "";}
if (not defined $labelName3) {$labelName3 = $exifTool->GetValue('LabelName3');}
if (not defined $labelName3) {$labelName3 = "";}
if (not defined $labelName4) {$labelName4 = $exifTool->GetValue('LabelName4');}
if (not defined $labelName4) {$labelName4 = "";}
print "Subject:\n------------------------------\n$subject\n\n";
print "Hierarchical Subject:\n------------------------------\n$hierarchicalSubject\n\n";
print "Supplemental Categories:\n------------------------------\n$supplementalCategories\n\n";
print "Label Name 1:\n------------------------------\n$labelName1\n\n";
print "Label Name 2:\n------------------------------\n$labelName2\n\n";
print "Label Name 3:\n------------------------------\n$labelName3\n\n";
print "Label Name 4:\n------------------------------\n$labelName4\n\n";
}

As your code is incomplete, I have to ask: did you make sure to start your script with the following lines?
use strict;
use warnings;
Those two lines are not there to annoy you, they will protect you from simple mistakes you might have made in your code.
IMHO the real problem with your sub listKeywords() is the following line:
my $exifInfo = ImageInfo($filePathXMP);
There are two problems here:
you redefine the variable $exifInfo from a few lines before.
you are not using the OO approach for the 2nd image info.
I think what you intended to write was the following line:
my $exifInfoXMP = $exifToolXMP->ImageInfo($filePathXMP);
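Once both files have been read into separate variables, the "raw first, then sidecar, then blank" lookup from the question can be expressed in one helper. The sketch below is only an illustration under stated assumptions: tag_with_fallback is a hypothetical helper, and the two hash refs are hand-written stand-ins for what ImageInfo() would return for the raw file and the XMP file.

```perl
use strict;
use warnings;

# Return the tag from the first hash that defines it, else an empty string.
# Hypothetical helper, not part of Image::ExifTool.
sub tag_with_fallback {
    my ($tag, @sources) = @_;
    for my $info (@sources) {
        return $info->{$tag} if defined $info->{$tag};
    }
    return '';
}

# Hand-written stand-ins for the hash refs returned by ImageInfo().
my $raw_info = { Model => 'Some Camera' };
my $xmp_info = { Subject => 'birds, lake', LabelName1 => 'Keep' };

# Subject is missing from the raw data, so the sidecar value is used.
my $subject = tag_with_fallback('Subject', $raw_info, $xmp_info);

# LabelName4 is missing from both, so the result is a blank string.
my $label4 = tag_with_fallback('LabelName4', $raw_info, $xmp_info);

print "Subject: [$subject]\nLabel Name 4: [$label4]\n";
```

With the real module, the two stand-in hashes would be replaced by $exifInfo and $exifInfoXMP, which also removes the need for the long chain of "if (not defined ...)" lines.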

perl - fetch column names from file

I have the following command in my perl script:
my @files = `find $basedir/ -type f -iname '$sampleid*.summary.csv'`; # there are multiple summary.csv files in my basedir; I store them in an array
my $summary = `tail -n 1 $files[0]`; # each summary.csv contains a header line and a line with data; I fetch the last line here
chomp($summary);
my @sp = split(/,/,$summary); # I split on ','
my $gender = $sp[11]; # the value from column 11 is stored in $gender
my $qc = $sp[2]; # the value from column 2 is stored in $qc
Now, I'm experiencing the situation where my *summary.csv files don't have the same number of columns. They do all have 2 lines, where the first line represents the header.
What I want now is not storing the values from column 11 in gender, but I want to store the values from the column 'Gender' in $gender.
How can I achieve this?
First try at solution:
my %hash = ();
my $header = `head -n 1 $files[0]`; #reading the header
chomp ($header);
my @colnames = split (/,/,$header);
my $keyfield = $colnames[#here should be the column with the name 'Gender']
push @{ $hash{$keyfield} };
my $gender = $sp[$keyfield]

You will have to read the header line as well as the data to know what column holds which information. This is done easiest by writing actual Perl code instead of shelling out to various command line utilities. See further below for that solution.
Fixing your solution also requires a hash. You need to read the header line first, store the header fields in an array (as you've already done), and then read the data line. The data needs to be a hash, not an array. A hash is a map of keys and values.
# read the header and create a list of header fields
my $header = `head -n 1 $files[0]`;
chomp ($header);
my @colnames = split (/,/,$header);
# read the data line
my $summary = `tail -n 1 $files[0]`;
chomp($summary);
my %sp; # use a hash for the data, not an array
# use a hash slice to fill in the columns
@sp{@colnames} = split(/,/,$summary);
my $gender = $sp{Gender};
The tricky part here is this line.
@sp{@colnames} = split(/,/,$summary);
We have declared %sp as a hash, but we now access it with an @ sigil. That's because we are taking a hash slice, as indicated by the curly braces {}. The slice we take is all elements whose keys are the values in @colnames. There is more than one value, so the return value is not a scalar (with a $) any more. There is a list of return values, so the sigil turns to @. Now we use that list on the left-hand side (that's called an lvalue), and assign the result of the split to that list.
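As a small self-contained illustration of the hash slice, the sketch below uses two hard-coded strings in place of the head and tail output. The column names and values are made up for the example; only 'Gender' comes from the question.

```perl
use strict;
use warnings;

# Made-up header and data lines standing in for the two lines of a summary.csv.
my $header  = "SampleID,Status,QC,Gender";
my $summary = "S001,done,pass,F";

my @colnames = split /,/, $header;

# Hash slice: pair each column name with the value at the same position.
my %sp;
@sp{@colnames} = split /,/, $summary;

print "Gender: $sp{Gender}, QC: $sp{QC}\n";
```

Because the lookup goes by name, it keeps working when a file has extra columns or a different column order.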
Doing it with modern Perl
The following program will use File::Find::Rule to replace your find command, and Text::CSV to read the CSV file. It grabs all the files, then opens one at a time. The header line will be read first, and fed into the Text::CSV object, so that it can then give back a hash reference, which you can use to access every field by name.
I've written it in a way that it will only read one line for each file, as you said there are only two lines per file. You can easily extend that to be a loop.
use strict;
use warnings;
use File::Find::Rule;
use Text::CSV;
my $sampleid;
my $basedir;
my $csv = Text::CSV->new(
{
binary => 1,
sep => ',',
}
) or die "Cannot use CSV: " . Text::CSV->error_diag;
my @files = File::Find::Rule->file()->name("$sampleid*.summary.csv")->in($basedir);
foreach my $file (@files) {
open my $fh, '<', $file or die "Can't open $file: $!";
# get the headers
my @cols = @{ $csv->getline($fh) };
$csv->column_names(@cols);
# read the first data line
my $row = $csv->getline_hr($fh);
# do whatever you want with the row; keys match the header text exactly
print "$file: ", $row->{Gender}, "\n";
}
Please note that I have not tested this program.

Storing a file in a hash - Only stores first line?

I am trying to read a file and store it in a hash. When I print out the contents of the hash, only the first line of the file is stored.
#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dump;
local $/ = "";
my %parameters;
open(my $PARAMS, 'SimParams.conf')
or die "Unable to open file, $!";
while(<$PARAMS>) {
my @temp = split(/:\s*|\n/);
$parameters{$temp[0]} = $temp[1];
}
dd(\%parameters);
exit 0
The dd(\%parameters) shows only the first line of the file as key and value. How can I get all 3 lines to be Key and Value pairings in this hash?
EDIT: SimParams file as requested:
RamSize: 1000
PageSize: 200, 200
SysClock: 1
The datadump gives the output:
{ RamSize => "1000\r" }

The line
local $/ = "";
is reading your 3 line file as 1 chunk, the entire file. If you eliminate that code, your hash should be created.
You should probably chomp your input to remove the newline. Place it in your code before splitting to @temp.
chomp;
Borodin best explains what local $/ = ""; does.

Setting $/ to the null string enables paragraph mode. Each time you read from $PARAMS (which should be $params because it is a local variable) you will be given the next block of data until a blank line is encountered
It looks like there are no blank lines in your data, so the read will return the entire contents of the file
You don't say why you modified the value of $/, but it looks like just removing that assignment will get your code working properly
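The difference between paragraph mode and the default line mode can be checked without touching the real file. This sketch reads the three lines from the question through an in-memory filehandle; count_records is a hypothetical helper written just for this demonstration.

```perl
use strict;
use warnings;

# The three lines from SimParams.conf, with no blank lines between them.
my $conf = "RamSize: 1000\nPageSize: 200, 200\nSysClock: 1\n";

# Count how many records a read loop sees for a given value of $/.
sub count_records {
    my ($text, $separator) = @_;
    local $/ = $separator;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $count = 0;
    $count++ while <$fh>;
    close $fh;
    return $count;
}

my $paragraph_mode = count_records($conf, "");    # $/ = "" is paragraph mode
my $line_mode      = count_records($conf, "\n");  # the default separator

print "paragraph mode: $paragraph_mode record(s), line mode: $line_mode record(s)\n";
```

In paragraph mode the loop body runs once for the whole file, which is why only the first key/value pair ever reaches the hash.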

perl Net::LDAP::LDIF read ldif from variable string

Usually, I load and parse LDIF from a file using the following method:
use Net::LDAP::LDIF;
use Net::LDAP::Entry;
use Data::Dumper;
my $ldif = Net::LDAP::LDIF->new( "filename.ldif", "r") or die "file not exits\n";
while( not $ldif->eof ( ) ) {
$entry = $ldif->read_entry ( );
print Dumper $entry;
}
But instead of loading from a file, I need to load the LDIF-format data directly from a string variable, so the code would look like:
use Net::LDAP::LDIF;
use Net::LDAP::Entry;
use Data::Dumper;
my $var_ldif = "dn: cn=Sheri Smith,ou=people,dc=example,dc=com
objectclass: inetOrgPerson
cn: Sheri Smith
sn: smith
uid: ssmith
userpassword: sSmitH
carlicense: HERCAR 125
homephone: 555-111-2225";
my $ldif = Net::LDAP::LDIF->new( $var_ldif, "r") or die "file not exits\n";
while( not $ldif->eof ( ) ) {
$entry = $ldif->read_entry ( );
print Dumper $entry;
}
So, how do I do it correctly?
thanks and sorry for this stupid question. :)
BR//
BACKGROUND IDEA
My goal is to build a script that compares LDIF data before and after in detail (from the DN down to the attribute values, one by one). The LDIF data itself is really huge, about 10GB or more per file.
So I came up with the idea of reading the file one DN at a time and comparing before and after. The parsed data for each DN is stored in $variable_before and $variable_after. That is why I need to read from a variable: the LDIF-format data comes from the output of a previous process.
I need Net::LDAP::LDIF to make it easier to parse an LDIF string into a Perl hashref.
I want to avoid temporary files because there are so many DN entries that processing would be slower with temporary files.

You can append the data you have to the end of the script and read from the DATA filehandle ( Net::LDAP::LDIF documentation states the first parameter can be a filename or a filehandle )
use Net::LDAP::LDIF;
use Net::LDAP::Entry;
use Data::Dumper;
my $ldif = Net::LDAP::LDIF->new( *DATA, "r") or die "file not exits\n";
while( not $ldif->eof ( ) ) {
$entry = $ldif->read_entry ( );
print Dumper $entry;
}
__DATA__
dn: cn=Sheri Smith,ou=people,dc=example,dc=com
objectclass: inetOrgPerson
cn: Sheri Smith
sn: smith
uid: ssmith
userpassword: sSmitH
carlicense: HERCAR 125
homephone: 555-111-2225
Another solution is to write the contents of the $var_ldif to a temporary file.
Are you sure Net::LDAP::LDIF is the right module for what you want to do?

You can open a SCALAR ref:
Perl is built using PerlIO by default. Unless you've changed this
(such as building Perl with Configure -Uuseperlio ), you can open
filehandles directly to Perl scalars via:
open(my $fh, ">", \$variable) || ..
And per the Net::LDAP::LDIF documentation:
FILE may be the name of a file or an already open filehandle. If FILE
begins or ends with a | then FILE will be passed directly to open.
So, to answer your question:
open(my $string_fh, '<', \$var_ldif) || die("failed to open: $!");
my $ldif = Net::LDAP::LDIF->new($string_fh, 'r', onerror => 'die');
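The in-memory filehandle can be tried on its own, without Net::LDAP installed. The sketch below opens a reference to a string for reading and pulls the lines back out; the shortened LDIF text is taken from the question, and the line counting is only there to show that the handle behaves like a normal file.

```perl
use strict;
use warnings;

# A shortened version of the LDIF text from the question.
my $var_ldif = "dn: cn=Sheri Smith,ou=people,dc=example,dc=com\n"
             . "cn: Sheri Smith\n"
             . "uid: ssmith\n";

# Note the backslash: open() needs a reference to the scalar, not its value.
open(my $string_fh, '<', \$var_ldif) or die "failed to open: $!";
my @lines = <$string_fh>;
close $string_fh;
chomp @lines;

print scalar(@lines), " lines, first: $lines[0]\n";
```

Without the backslash, open() would treat the contents of the string as a file name and fail.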

sprintf format and print inconsistencies whilst creating fixed width column

(I have already cross posted onto another site and will update either with the solution but so far struggling with an answer)
19th Dec 2013 7:06pm PT --- I found the solution and I updated below.
I am outputting two items of data per line. The first column of data is not fixed length, and I want the second item of data to be correctly aligned in the same position each time so I am using sprintf to format the data and then mail out the data
The print command output illustrates that the data is formatted correctly.
Yet, when the output in my email is different, the alignment is all wrong.
I initially thought it was the mailer (MIME::Lite) program but I am not sure it is.
The reason I think that is because I am using the Eclipse Perl environment, and when I look at the debug variable list, I see that the strings are padded out exactly like the output in my email, yet the print statement shows the data correctly aligned!
Please help me understand what is going on here and how to fix it.
use MIME::Lite;
$smtp = "mailserver";
$internal_email_address = 'myemailaddess';
$a = sprintf ("%-60s %-s\n", "the amount for Apple is ","34");
$b = sprintf ("%-60s %-s\n", "the amount for Lemons is", "7");
print $a;
print $b;
$c = $a.$b;
mailer( $internal_email_address,"issue", $c);
sub mailer {
my ( $addr, $subj, $output ) = @_;
print "$_\n" for $addr;
print "$_\n" for $subj;
print "$_\n" for $output;
$msg = MIME::Lite->new(
From => 'xxxx',
To => $addr,
Subject => $subj,
Data => $output
);
MIME::Lite->send( 'smtp', $smtp, Timeout => 60 );
eval { $msg->send };
$mailerror = "Status: ERROR email - MIME::Lite->send failed: $@\n" if $@;
if ( $mailerror eq '' ) {
$status = "Status: Mail sent\n";
}
else {
$status = $mailerror;
}
}

$a = sprintf ("%-10s %-s\n", "the amount for Apple is ","34");
The argument "the amount for Apple is" is too long for the format specifier %-10s, so the actual amount of space used for that argument will be the length of the string.
You could use a format specifier with a larger value (e.g., %-25s) that can accommodate any value you're likely to apply to it.
Or if you want sprintf to truncate the argument at 10 characters, use the format specifier %-10.10s.
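The three format specifiers can be compared side by side. In this sketch the label is the string from the question without its trailing space (23 characters), and a | is used instead of a space between the columns so the padding is visible.

```perl
use strict;
use warnings;

my $label = "the amount for Apple is";   # 23 characters

# Width 10 is a minimum: the longer string is printed in full.
my $too_narrow = sprintf("%-10s|%s", $label, "34");

# Width 10 with precision 10: the string is cut to its first 10 characters.
my $truncated = sprintf("%-10.10s|%s", $label, "34");

# Width 25 is wide enough, so the string is padded with two spaces.
my $wide = sprintf("%-25s|%s", $label, "34");

print "$too_narrow\n$truncated\n$wide\n";
```

Note that padding with spaces only lines up visually when the text is shown in a fixed-width font.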