Splitting Emails with MIME::Parser - perl

I got handed 4GB of emails concatenated into a single file and the suggestion that MIME::Parser could split the individual emails back out again. All my attempts to date end up with the parser just copying the original file without extracting any of the emails. So: Is this even something that MIME::Parser can handle? My code is very basic:
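use strict;
use warnings;
use IO::File;      # also exports the O_* constants when Fcntl is available
use MIME::Parser;

my $file = IO::File->new("somefile", O_RDONLY) or die "Cannot open somefile: $!";
my $parser = MIME::Parser->new;
$parser->output_dir("somedir");
my $entity = $parser->parse($file);
$file->close;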
Below is a link to the sample data that some have requested. These are all spam and phishing emails. DO NOT CLICK ANY OF THE LINKS. Enjoy: Pastebin of 4KB of emails.

MIME::Parser is for reading a single mail to get at its attachments etc. It can be used to extract mails that are attached inside another mail as message/rfc822, but it is not intended to extract mails from an archive in which many mails are concatenated.
It is not clear what format your single file of mails has. But if it comes from a UNIX system or from a Thunderbird installation, it might simply be in the classical mbox format, and there are several tools to split mbox files into separate messages. Apart from several Perl modules, there are also other tools like git-mailsplit which help you extract the mails from an mbox file.
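If the file does turn out to be mbox, the split can be sketched with Python's standard mailbox module. This is only a sketch under that assumption: the function name split_mbox, the output directory layout, and the numbered .eml file names are made up for illustration and are not from the question.

```python
import mailbox
import os


def split_mbox(mbox_path, out_dir):
    """Write every message of an mbox file to its own .eml file.

    Returns the number of messages written. Assumes the input really is
    in mbox format (messages separated by "From " lines).
    """
    os.makedirs(out_dir, exist_ok=True)
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        # Messages are read one at a time, so the whole archive is not
        # held in memory at once.
        for count, message in enumerate(box, start=1):
            target = os.path.join(out_dir, f"{count:06d}.eml")
            with open(target, "wb") as handle:
                handle.write(message.as_bytes())
    finally:
        box.close()
    return count
```

Calling split_mbox("somefile", "somedir") would then leave one file per message in somedir. If it reports zero or one message for a large archive, the file is probably not mbox after all.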

Related

How to attach a flat file into informatica Email?

I want to send a mail to some users after workflow is done. This email should contain 3 xls file or 1 xls file with 3 sheets. The xls file contains a query result -less than 50 rows- which is loaded by a task dynamically on each run of the workflow.
So in Informatica I couldn't see an option inside the "Email" task to attach anything. Can you help me please?
You cannot send emails with attachments using the Email task (don't ask me why, I cannot imagine any reason for that and this is how it is).
However, sessions can send emails and these emails may contain attachments - the appropriate options are on the Components tab, the variable you need is %a<>.
More information: How to attach new files to email task?
cat filename | mailx -s "subject" abc@def.com
This sends the file contents as the message body.
Alternatively, use the mutt command to send files as attachments.
At the session level, the Components tab has an option to send mail with attachments. You have to provide the path of the file as well, using the built-in %a<> variable.
Example:-
%a

How can I open multiple attachments of the same name in an email, then move the sender of the attachment to a spreadsheet?

I have an internship and was recently assigned the tedious task of cleaning the email lists. My employer has sent me a series of email with email bounces as attachments, many at a time, all with the same name. I have considered ways of doing this most efficiently, I'm looking to avoid just clicking through like a slave. My thoughts were to create a macro using autohotkey's language, but I feel like maybe a batch file or some sort of Perl might do the same thing. Could anybody give me an idea as to how to do this, specifically with a batch file? Thanks in advance!
Mail::DeliveryStatus::BounceParser parses bouncing email addresses out of delivery report messages.
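For bounces that follow the standard delivery status notification layout (a message/delivery-status part with a Final-Recipient field), the same idea can be sketched with Python's standard email package. This is an assumption-laden sketch, not a substitute for the module above: the function name failed_recipients is hypothetical, and bounces that do not contain a delivery-status part are simply skipped.

```python
from email import message_from_string


def failed_recipients(raw_message):
    """Return the addresses reported as failed in a bounce message.

    Only looks at message/delivery-status parts; free-text bounces that
    lack such a part yield an empty list.
    """
    message = message_from_string(raw_message)
    addresses = []
    for part in message.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        if not part.is_multipart():
            continue  # malformed report: payload was not parsed into blocks
        # Each header block of the report is parsed as its own sub-message.
        for block in part.get_payload():
            recipient = block.get("Final-Recipient")
            action = str(block.get("Action", "")).strip().lower()
            if recipient is None or action != "failed":
                continue
            # The field looks like "rfc822; user@example.com".
            address = str(recipient).split(";", 1)[-1].strip()
            addresses.append(address)
    return addresses
```

The resulting list could be written one address per line and pasted into the spreadsheet.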
If you don't know any Perl, then I recommend that you first convert the mailbox into some format that stores each email in a separate text file, like MH or similar.
At that point, you can trivially use the command grep _pattern_ * | sed -e 's/:.*//' | sort | uniq > _list_ to obtain a list of all files matching _pattern_. You may inspect/edit this file _list_ to verify that the desired results were obtained.
You may then create another directory, junk or whatever, and move all the files listed in _list_ into junk with a command like perl -ne 'chomp; rename($_, "junk/$_");' < _list_.
If you need this regularly, then you could automate it further, likely using Perl alone, but a one-off task will probably involve more messing about with getting the right message list.
Alternatively, you could load all the emails into a single folder in a sane mail reader, like Mac OS X's Mail.app, and simply use the search, select all, and move/delete commands.

Is there a difference between the Outlook .MSG and .OFT file formats?

This question is somewhat of a long shot, but I've spent hours on it to no avail. I have some code that generates an email file on a webserver, and allows the user to download that email and open it in Outlook. From here, they can make various manual changes to the email before they send it to a bunch of people.
Right now, I generate a .OFT file, which is basically an email template. What I want to do is generate a .MSG file, which is an actual email. From a binary point of view, it seems these file formats are identical. They have the same Stream IDs and properties and stuff.
My approach was to first create a blank email message in Outlook and then just save it to a file called Base.oft. In my code, I open the document and modify Stream ID __substg1.0_1013001E which is the ID for the HTML email body. I then save the file and write it out to the client. This works perfectly.
I tried the same approach with the MSG format. I created a blank email message, saved it as Base.msg, and modify the same Stream ID. If I look at the resulting file, the new body is actually in there and saved. However, if I open the email, the body is still blank.
What's even weirder is if I type in a body in Outlook and save that to the base file, I can see that body under stream 0_1013001E. If I then modify that stream with a different body, I can verify the new body is indeed saved in the file, but if I open the message in Outlook, I see the old, original body. It's as if the email body is stored in a different place in the file for the .MSG format, however I've looked through each stream and cannot find anything else that looks like it could be an email body.
Perhaps .MSG files are encrypted, or their bodies are stored in some proprietary binary format unlike .OFT files? Hopefully someone has some insight on this, as I scoured the Internet and found basically nothing on these formats.
Update:
It seems the .MSG format stores the body in Stream ID __substg1.0_10090102, which is encoded in some binary form (not sure what). If I delete the stream (or set it to a single \0), the file becomes corrupt.
First of all, to find more information on this and related topics, move away from raw substream numbers and google for the corresponding MAPI properties. For example, 1013 is PR_HTML and 1009 is PR_RTF_COMPRESSED. MAPI has ways of synching the body from one format to the other.
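To make the mapping from substream names to MAPI properties concrete, here is a small sketch that splits a name such as __substg1.0_1013001E into its property ID (first four hex digits) and property type (last four). The function name describe_stream and the two lookup tables are illustrative only; the tables cover just the properties mentioned in this thread plus the common string and binary type codes.

```python
import re

# Only the properties discussed here; anything else is reported in hex.
PROPERTY_NAMES = {0x1013: "PR_HTML", 0x1009: "PR_RTF_COMPRESSED"}
PROPERTY_TYPES = {0x001E: "PT_STRING8", 0x001F: "PT_UNICODE", 0x0102: "PT_BINARY"}

STREAM_NAME = re.compile(r"^__substg1\.0_([0-9A-Fa-f]{4})([0-9A-Fa-f]{4})$")


def describe_stream(name):
    """Return (property name, property type) for a .MSG substream name."""
    match = STREAM_NAME.match(name)
    if match is None:
        raise ValueError(f"not a property substream name: {name!r}")
    prop_id = int(match.group(1), 16)
    prop_type = int(match.group(2), 16)
    return (
        PROPERTY_NAMES.get(prop_id, f"0x{prop_id:04X}"),
        PROPERTY_TYPES.get(prop_type, f"0x{prop_type:04X}"),
    )
```

With this, describe_stream("__substg1.0_10090102") identifies the stream from the update as PR_RTF_COMPRESSED stored as binary, which is a better search term than the raw number.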
See this article on MSDN for a good overview of all content-related MAPI properties (i.e. the different "streams" inside the .MSG file).
To write PR_RTF_COMPRESSED, wrap the stream inside WrapCompressedStream. On the other hand, in your particular situation you might want to avoid the MAPI dependencies in your code, so maybe you're better off finding the PR_STORE_SUPPORT_MASK and setting the STORE_UNCOMPRESSED_RTF bit. This will allow you to use straight RTF in the PR_RTF_COMPRESSED substream. Or Outlook's fancy HTML-wrapped-in-RTF, if you are feeling brave.
None of this stuff is for the faint of heart, but seeing how you are already handling raw .MSG substream writing, I'm guessing it would be feasible.
When it comes to the format, there is no difference.
The only difference is that OFT files have CLSID_TemplateMessage ({0006F046-0000-0000-C000-000000000046}) as the storage class (WriteClassStg), while MSG files use CLSID_MailMessage ({00020D0B-0000-0000-C000-000000000046}).

Using OLE in Perl to traverse Outlook folders

I've got a script which will happily export messages from a folder in outlook to rfc822 files, fine.
But I want to traverse/iterate/recurse through the entire list of folders in outlook to extract copies of everything.
I'm thwarted by days of unsuccessful web searches.
Point me to TFM that I may R it.
Have you looked at Mail::Outlook and its all_folders() method?
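If you end up recursing by hand instead, the traversal itself is short. The sketch below shows only the recursion, in Python, over any object that has a Name attribute and an iterable Folders attribute, which mirrors how Outlook's object model exposes folders. The function name walk_folders is made up, and the stand-in objects in the usage note are not real COM objects.

```python
def walk_folders(folder, prefix=""):
    """Yield (path, folder) for a folder and, depth-first, all its subfolders.

    Assumes each folder has a Name attribute and an iterable Folders
    attribute holding its direct children.
    """
    path = f"{prefix}/{folder.Name}" if prefix else folder.Name
    yield path, folder
    for child in folder.Folders:
        yield from walk_folders(child, path)
```

For a quick dry run without Outlook, types.SimpleNamespace(Name="Inbox", Folders=[]) is enough as a stand-in folder. In the real script, the per-folder export you already have would be called once for each folder this yields.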

How to extract e-mail data into R?

How could I export my e-mail database from Gmail (or Thunderbird) into R?
Like there is the rgoogledocs package and twitteR, is there a gmailR package, or a standard format for exporting emails into stat packages ?
Tal
You need to install the edeR package first and load it with library(edeR). You may need to manually install 64-bit Java on Windows 8, and you may need to enable IMAP access in Gmail.
dat3 <- extractKeyword(username = "YOURLOGIN@gmail.com",
                       password = "YouRPaSS",
                       kw = "adsense",
                       nmail = 5)
This will download 5 emails with keyword 'adsense'.
Standard email (on a Unix system) is stored either as an mbox file (containing several messages) or in a maildir setup where each mail is a file in a directory.
Either way, it's ASCII text. This is why an MUA (mail user agent, i.e. your mail reader) is orthogonal to your MTA (mail transport agent, i.e. mail server software like exim, qmail, postfix, ...). The mail server may use a network protocol like POP3 or IMAP to serve the mail files to the client, in which case the client (which may be Gmail or Thunderbird) no longer sees the underlying files. So you may need to learn how to export your mail from whichever backend you employ and then read it.
This has nothing to do with R or programming so far --- unless you now feel you must extend R with POP3 or IMAP facilities to connect to a (remote) mail server.
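Once the mail has been exported to an mbox file, one way to get it into a stat package is to flatten the headers into a CSV first. The following is a rough sketch using Python's standard mailbox and csv modules. The function name mbox_headers_to_csv and the choice of the four header fields are assumptions made for illustration.

```python
import csv
import mailbox

# Header fields to export; adjust to whatever the analysis needs.
FIELDS = ("Date", "From", "To", "Subject")


def mbox_headers_to_csv(mbox_path, csv_path):
    """Write one CSV row of selected headers per message; return the row count."""
    box = mailbox.mbox(mbox_path, create=False)
    rows = 0
    try:
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(FIELDS)
            for message in box:
                # Missing headers become empty strings.
                writer.writerow([str(message.get(name, "")) for name in FIELDS])
                rows += 1
    finally:
        box.close()
    return rows
```

The resulting file can then be loaded in R with read.csv().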
There is now an R package to extract email data. The package is still in its testing phase, but anyone can install it from GitHub; the package name is edeR. Right now it can extract email data from IMAP-enabled Gmail accounts.
Gmail and Thunderbird are not the same... you can set up your Gmail account in Thunderbird, hence export each email to an ASCII file, hence write an R batch script that will take each file and import it into R as an object, hence... you get the point. =)
Usually I try to avoid "the pedestrian approach"... but I'm getting the impression that you're inclined to use R as a "general purpose" programming language... Python or Java, on the other hand, can be quite efficient, so you can write (or ask someone to write for you) a script that will "bring" you the data in the desired format, and then crunch it in R. R has matured a lot, and it's not solely a tool for statistical analysis any more, but it's always a good idea to use some widely known programming language to prepare your data.
So there... Roll up your sleeves, and dive into Python (Java, C... whatever you feel like diving into)!
P.S.
I reckon that this has something to do with your previous post with word cloud...
Once you have exported your e-mails in mbox format to your PC, you can make use of both the tm and tm.plugin.mail packages in R. The latter makes it possible to import your e-mails into R.
require("tm")
require("tm.plugin.mail")
Then, to convert your e-mails from mbox (i.e., several mails in a single box) format to eml (i.e., every mail in a single file) format: convert_mbox_eml(mbox, dir). In the example below, mbox is represented by "yourmails.mbox" and it describes the mbox location. The output directory is given by "your_mails".
convert_mbox_eml("yourmails.mbox", "your_mails")
You can read in the electronic mail documents and inspect them with the following R commands.
mails <- VCorpus(DirSource("your_mails/"), readerControl = list(reader = readMail))
inspect(mails)