Existing tool or code to identify quoted text in emails

I am looking for a way to identify quoted text in emails. The goal is to add something along the lines of Gmail's "show quoted text" feature to my web app, which includes a mail handler bot.
There are similar questions on Stack Overflow, but they ask for an algorithm. I could implement one if I had to, but I would much prefer a tried-and-true solution.
Requirements:
1) Support both HTML and plain text emails
2) Operates on the full thread (that is, it has the original text to compare the quoted text against; no need to guess)
3) Handles common quote-related additions such as "On May 10th, 2008 at 6:35 PM Brandon wrote:"
A Python library would be the super magically awesome ideal, but I don't expect to get that lucky. A simple command-line tool that can do this would be pretty close to ideal, but I don't expect to get that lucky either. I'd gladly settle for a well-known, good implementation from an open-source mail client that could reasonably be extracted into a tool.
Does anyone have a suggestion what my best bet would be?
I'm kind of surprised that there is no such thing as an "email handler bot construction kit".

Just following up on an email I received regarding this question.
Sup has a pretty easy to understand/extract/translate bit of logic for accomplishing this. I ported the relevant functions to Python and tweaked them for my purposes.
Sup is a terminal-based mail client written in Ruby: http://sup.rubyforge.org/
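For the plain-text case, a line-based heuristic in the same spirit can be sketched in a few lines of Python. This is an illustrative sketch, not a port of Sup's code: the quote-marker pattern `QUOTE_RE` and the attribution pattern `ATTRIBUTION_RE` are assumptions (Sup's own rules differ in detail), and the sketch neither uses the original message (requirement 2) nor handles HTML.

```python
import re

# Assumed patterns, not Sup's actual rules: a quoted line starts with ">"
# (after optional whitespace), and an attribution line looks like
# "On May 10th, 2008 at 6:35 PM Brandon wrote:".
QUOTE_RE = re.compile(r"^\s*>")
ATTRIBUTION_RE = re.compile(r"^\s*On\b.*\bwrote:\s*$", re.IGNORECASE)


def split_quoted(body: str) -> tuple[str, str]:
    """Split a plain-text message body into (new_text, quoted_text)."""
    lines = body.splitlines()
    new, quoted = [], []
    for i, line in enumerate(lines):
        if QUOTE_RE.match(line):
            quoted.append(line)
        elif ATTRIBUTION_RE.match(line) and any(
            QUOTE_RE.match(nxt) for nxt in lines[i + 1:i + 3]
        ):
            # Only treat "On ... wrote:" as an attribution if a quote block
            # starts on one of the next two lines (allowing one blank line).
            quoted.append(line)
        else:
            new.append(line)
    return "\n".join(new).strip(), "\n".join(quoted)
```

A "show quoted text" toggle would then render `new_text` and keep `quoted_text` collapsed. Requiring a quote block right after the attribution keeps ordinary sentences that happen to start with "On" and end with "wrote:" in the visible part.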

Google has a patent for their method:
http://www.google.co.uk/patents/US7222299

Related

Clients want to copy/paste from word processors; rich text editors will make it a mess. How do we solve this?

After years of experience with custom-made CMS systems, I've come to this conclusion:
Clients really want to copy and paste information from word processors into their website CMS. They don't like to create large texts in a website box, and prefer to do so from their good old word processor. Or they simply have their text already prepared for other purposes, and therefore want to copy and paste.
Clients do not like to lose their format. They've spent time on their boldface text, headings, etc, and they do not like to do this all over again.
Rich text editor fields (TinyMCE, CKEditor, etc.) are not yet able to properly convert all formatted text into the right HTML. I do not blame them; this must be very difficult given the odd 'source code' that word processors put on the clipboard. But after reading all the SO topics about rich-text issues, I feel this is a known limitation.
What do you do in such cases? I've tried the following:
1) Explain to the client beforehand that this is not a word processor we are implementing, and that it has limitations. They understand, but still want to copy and paste.
2) Only show very few formatting buttons (bold, italic, links). That way, we can strip the tags and clean things up quite well, which limits issues. This works better, but clients keep asking for font options, more colors, headers, etc.
So there's no really good solution in sight. Are there others who have tackled this issue successfully?
One solution (and probably the best I've come up with) is to post-process the pasted content. So, catch the publish event and correct all the crappy HTML: catch all the "MsoNormal" classes and "mso-" styles, for instance, and remove them. You'd have a set of rules that clean up content coming out of, say, MS Word.
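A minimal sketch of such a publish-time cleanup pass, in Python using only the standard library's `html.parser`. Rather than matching Word's `mso-` rules one by one, it keeps only a whitelist of tags and attributes, which also removes `MsoNormal` classes, `mso-*` styles, and Office tags such as `<o:p>`. The whitelist (`ALLOWED_TAGS`, `ALLOWED_ATTRS`) is a hypothetical choice matching the "few buttons" approach, and this is not a hardened XSS sanitizer (for example, it does not check `href` schemes).

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: the tags the editor exposes, plus a few headings.
ALLOWED_TAGS = {"p", "br", "b", "strong", "i", "em", "a", "ul", "ol", "li", "h1", "h2", "h3"}
ALLOWED_ATTRS = {"a": {"href"}}
# Everything inside these tags is dropped (Word pastes <style> blocks).
DROP_CONTENT = {"style", "script", "xml"}


class WordCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # e.g. <span style="mso-...">, <o:p>, <font>: keep text, drop tag
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = [(k, v) for k, v in attrs if k in allowed and v is not None]
        attr_text = "".join(f' {k}="{escape(v)}"' for k, v in kept)
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        # <br> is a void element, so never emit </br> (e.g. for <br/>).
        if self.skip_depth or tag not in ALLOWED_TAGS or tag == "br":
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))
    # Comments (including Word's <!--[if gte mso 9]> blocks) are ignored by default.


def clean_word_html(html: str) -> str:
    cleaner = WordCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)
```

For example, `clean_word_html('<p class="MsoNormal"><b>Hello</b> <span style="mso-bidi-font-weight:bold">world</span><o:p></o:p></p>')` gives `<p><b>Hello</b> world</p>`. Adding a heading or color option then means widening the whitelist deliberately instead of chasing every new Word quirk.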
Though, this is not just a word processing problem. You're pasting from one rich text editor into another, and styles just don't transfer between rich editing environments. This is not so much a technical problem as a logical one.
Update: Someone pointed me to this: Copy-Pasting Word to your Web CMS. No real solutions, but just confirmation that it's a sticky problem.
I totally agree with you:
Last week I did a very interesting test with a customer for whom I had to prepare some demos of .NET-based CMS systems (Umbraco, Sitefinity, DNN, Composite C1, etc.). The customer himself had a Drupal-based site, and I was ashamed that none of my CMS demos handled a complicated Word table perfectly (ceteris paribus: I did no CMS fine-tuning and used every CMS out of the box). The worst part was that his Drupal CMS did it perfectly! It looked exactly the same as it did in Word. For a client who works a lot with Word, my CMSes were a showstopper. Of course there are a lot of discussions on the web along the lines of 'you should not copy from Word' or 'do NOT use Word for CMS things'. The fact is: clients work with Word, so we should deal with it.

Extremely simple content updating tool for websites - CMS? PHP forms? Suggestions please!

As a side project I tutor grandparents and other computer novices in Computer & Internet 101, from physically using a mouse to dealing with e-mail/searching/etc. Web development isn't really my area of focus - I do have reasonable HTML/CSS/Javascript etc skills, so I can throw together a decent-looking simple, static site - but occasionally I get asked to put together extremely simple websites for these people, that they can update themselves; that is, edit text-based content without giving Grandpa a heart attack by making him come face-to-face with HTML/Javascript.
I've waded through a mile-long list of CMS software, largely culled from the many other similar questions on SO, but each one has something ruling it out: hosted, restricts the design (can't use w/existing CSS, looks "Word-press-y", etc.), not free/FOSS, etc. I wonder if "CMS" is even the right word for what I'm looking for. What I need is a simple text editor for the client: something that will give the client a text box of some variety, let them edit it, and update the content with that info. They can't mess with navigation, add new pages, or change anything other than text. If it were really fancy, they could upload a picture.
I was planning to do this just with a couple of password-protected php forms, but thought I'd ask if there's anything already out there that might provide this functionality? Any suggestions on building my own version of this, in PHP or something else?
What I'm really interested in is:
1) the simplicity/customize-ability of the admin interface (or lack of admin interface, if the client could somehow edit directly in the page), and
2) ease of set up for me (not getting paid much if at all for this, don't want to wade through three million plugin options to figure out how to get some unwieldy, high learning-curve framework to do what I want).
Try pulsecms.
Here is another very simple CMS that uses jQuery, Modernizr, HTML5 Boilerplate and TinyMCE.
I have my wife set up with Windows Live Writer:
http://explore.live.com/windows-live-writer?os=other
This means she writes her articles just as she would in a word processor (the experience is almost identical) and then uploads each article to her blog. I use BlogEngine.NET to host the blog on a GoDaddy hosting plan.
BlogEngine.NET comes with built-in support for Live Writer and only requires that you enter the address, username and password.
I understand this is an old post, but I hope someone finds this of interest.
You could instruct users to upload text files to the site, and then have the HTML/PHP/ASP pages load the contents of those .txt files.
Each web page should have a specifically named .txt file associated with it.
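A minimal sketch of that idea, written in Python only for illustration (the answer itself mentions HTML/PHP/ASP pages). The `content/` directory, the `PAGE_TEMPLATE` markup, the page-name rule, and the convention that blank lines separate paragraphs are all hypothetical. Escaping the text means the client cannot break the layout by typing `<` or `&`.

```python
import re
from html import escape
from pathlib import Path

# Hypothetical layout: content/<page>.txt holds the client-editable text.
CONTENT_DIR = Path("content")

PAGE_TEMPLATE = """<!DOCTYPE html>
<html><head><title>{title}</title><link rel="stylesheet" href="site.css"></head>
<body><main>{body}</main></body></html>"""


def text_to_html(text: str) -> str:
    """Turn blank-line-separated paragraphs into escaped <p> elements."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)


def render_page(page: str, title: str, content_dir: Path = CONTENT_DIR) -> str:
    # Only allow simple names so a request cannot read files outside content_dir.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", page):
        raise ValueError(f"invalid page name: {page!r}")
    # Text mode uses universal newlines, so Windows "\r\n" files split correctly.
    text = (content_dir / f"{page}.txt").read_text(encoding="utf-8")
    return PAGE_TEMPLATE.format(title=escape(title), body=text_to_html(text))
```

The navigation and design stay in the template, which the client never touches; they only edit plain text files.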

How can I customize an email notification within a SharePoint 2010 workflow?

I've got a workflow in my SharePoint application that allows employees to request a holiday period, so some emails have to be sent around.
I'd like to customize these email notifications in the following ways:
language (default is 'en', and I'd like to change it if possible)
structure
thanks a lot!
george
I still have no idea how to customize an email notification. Perhaps you could find its template and "customize" it there, but that would change it for all notifications, so it's not a real solution!
Anyway, I've found a fairly good way to change the language. Email notifications always use the "default language" (which you can find under Site Actions/Site Settings/Site Administration/Language Settings) of the current Site/SiteCollection/... For more on this, check my answer to this question: link
So you can change a Site's default language to a new one (provided you have the right Language ID and the language pack(s) installed), then do things like sending notification mails in the preferred language(s) (for example, within a workflow like the one I described above).
When your work (actually, the workflow's work) is done, you can set the Site's language back to the original one, and you're done!

Email difference algorithm

I would like to replicate Gmail's functionality of "magically" hiding irrelevant quoted text in emails while still showing the mostly relevant parts. Are there any libraries that can help me find the text that is actually new and should be shown? Or do you have any suggestions on how to proceed?
I do know which two messages belong together and which one is the answer to the other, but I would love to show only the relevant text.
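Since the parent message is known, one simple approach is a line diff between the reply and its parent. A minimal sketch in Python with the standard library's `difflib`, assuming plain-text bodies and a quoted copy that stays close to the original; it is a generic line diff, not claimed to match Gmail's behavior. The helper names `_normalize` and `new_lines` are hypothetical.

```python
import difflib


def _normalize(line: str) -> str:
    # Drop leading quote markers (">", "> >", ">>") and surrounding whitespace
    # so "> Lunch at noon?" compares equal to "Lunch at noon?".
    return line.lstrip("> \t").strip()


def new_lines(reply: str, original: str) -> list[str]:
    """Return the non-blank reply lines that have no match in the original message."""
    reply_lines = reply.splitlines()
    a = [_normalize(line) for line in original.splitlines()]
    b = [_normalize(line) for line in reply_lines]
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    result = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" spans in b are text the reply added or changed.
        if tag in ("insert", "replace"):
            result.extend(line for line in reply_lines[j1:j2] if line.strip())
    return result
```

For a reply `"Sounds good, see you then.\n\nOn Mon, Alice wrote:\n> Lunch at noon?\n> Bring the slides."` to the original `"Lunch at noon?\nBring the slides."`, this returns `["Sounds good, see you then.", "On Mon, Alice wrote:"]`. Attribution lines like "On ... wrote:" still come through and need separate handling, and quotes that the replier re-wrapped or trimmed will show up as new text.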

Is Perl's CAM::PDF able to aggregate Annotation objects?

I have several copies of the same PDF file. These copies have annotations in it (Rect type with pop-up comments).
I want to know if I can get all these annotations from these copies and aggregate them into a single master copy using CAM::PDF (or another free tool).
An example to illustrate:
I have file1_userA.pdf and file1_userB.pdf. They are both annotated.
I want to generate file1_allusers.pdf aggregating annotations from both files file1_userA.pdf and file1_userB.pdf.
PS: I have the original un-annotated copy.
-- EDIT (Aug 4):
I have developed an extension for CAM::PDF, namely CAM::PDF::Annot. It 'use base's CAM::PDF and adds extra functionality regarding Drawing Markup Annotations.
I am in the process of tidying up the code so I can post it to CPAN.
-- EDIT (Aug 19):
I have finally submitted it to PAUSE, but I am running into some problems related to world-writable files...
In any case, if anyone is interested in taking a look at the code, I will try and make it available somewhere... until then, just PM me and I will mail it to you.
Geez, I'm getting such a thrill posting a module to CPAN... I've found the joy of working with Perl...
best regards,
Donato Azevedo
I'm the author of CAM::PDF. I have built only very limited support for annotations to date, specifically just for form field filling. So, no, that's not a supported feature today. The feature you describe is very interesting, though, and I can imagine that others would use it too, so I'd be interested in discussing it further with you offline.