Converting email metadata RFC 5322 dates to Org mode dates in Doom Emacs

I have about 200 old emails, as *.eml files, that I want to concatenate into one *.org file so that I can use the information in Org mode. Each file has the string "Date: " followed by a timestamp in the RFC 5322 date format, i.e.
Date: Tue, 23 Apr 2019 13:31:18 -0400
I know the UNIX date command can convert the date part of that string to RFC 3339 date format, i.e. the command:
date --rfc-3339='ns' --date='Tue, 23 Apr 2019 13:31:18 -0400'
would give the result:
2019-04-23 13:31:18.000000000-04:00
I guess I could do all the conversion with awk in one go, but my awk is rusty, and I've been having trouble getting it right.
I'd really like to convert all of these dates to Org mode dates with one command, using either a vi command or a Doom Emacs command.
Any suggestions?

You could build a regular expression for M-x query-replace-regexp whose replacement calls out to the shell and runs the date command you mention above. The trick is the \, construct in the replacement string, which lets you evaluate arbitrary Emacs Lisp code, as described in the Regexp Replacement section of the Emacs info manual.
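If doing the conversion outside the editor is acceptable, the same idea can be sketched in Python. This is an alternative to the query-replace-regexp approach, not an implementation of it. The names rfc5322_to_org and convert_date_headers are made up for illustration. The sketch assumes an active Org timestamp of the form <YYYY-MM-DD Day HH:MM> with English day abbreviations, and it keeps the wall-clock time from the header while dropping the UTC offset.

```python
import re
from email.utils import parsedate_to_datetime

# English day abbreviations, indexed by datetime.weekday() (Monday == 0).
# Spelled out here so the result does not depend on the process locale.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# A "Date: " header at the start of a line; the value runs to the end of the line.
DATE_HEADER = re.compile(r"^Date: ([^\r\n]+)", re.MULTILINE)


def rfc5322_to_org(value):
    """Turn an RFC 5322 date string into an active Org timestamp (assumed format)."""
    dt = parsedate_to_datetime(value)
    return "<{} {} {}>".format(
        dt.strftime("%Y-%m-%d"), DAYS[dt.weekday()], dt.strftime("%H:%M")
    )


def convert_date_headers(text):
    """Rewrite every 'Date: ...' line in text so it carries an Org timestamp."""
    return DATE_HEADER.sub(lambda m: "Date: " + rfc5322_to_org(m.group(1)), text)


print(rfc5322_to_org("Tue, 23 Apr 2019 13:31:18 -0400"))
```

For the example header in the question this prints <2019-04-23 Tue 13:31>. A malformed date raises an exception instead of being skipped, which is probably what you want for a one-off batch of 200 files.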

Related

Interpret ISO week-of-year in gnu coreutils

GNU date is usually pretty flexible with input to its --date option. However, it does not give me the expected result for the ISO 8601 week-of-year format that I found on Wikipedia, namely yyyy-'W'ww or yyyy-'W'ww-d:
$ date -d 2019-W14-2 # expect some variation on 2019-04-02
date: invalid date '2019-W14-2'
Is there any format with which I can ask date to tell me about the 14th week of 2019?
I've tracked the relevant code down to gnulib module parse-datetime. It looks like this would have to be a new feature. I'll update this space if it's ever implemented.
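Until date itself understands this format, one workaround is to do the week arithmetic elsewhere. The following is a minimal sketch in Python (3.8 or later, for date.fromisocalendar). The helper name parse_iso_week is hypothetical, and treating a missing day as Monday is an assumption made here, not something ISO 8601 prescribes for this purpose.

```python
import re
from datetime import date

# yyyy-Www with an optional -d (ISO weekday, 1 = Monday ... 7 = Sunday).
ISO_WEEK = re.compile(r"^(\d{4})-W(\d{2})(?:-(\d))?$")


def parse_iso_week(text):
    """Parse 'yyyy-Www' or 'yyyy-Www-d' into a datetime.date.

    When the day is omitted, Monday (day 1) of that week is assumed.
    """
    match = ISO_WEEK.match(text)
    if match is None:
        raise ValueError("not an ISO week date: {!r}".format(text))
    year, week, day = match.groups()
    # fromisocalendar raises ValueError for out-of-range weeks or days.
    return date.fromisocalendar(int(year), int(week), int(day or 1))


print(parse_iso_week("2019-W14-2").isoformat())
```

This prints 2019-04-02, the result the question expected from date -d 2019-W14-2.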

Why is the mail command adding an extra character (">") to the email?

Here is the text entry:
From the ...
Here is the text in Email:
>From the ...
Because "From " (that is, From with an upper-case F, lower-case rom, followed by a space character) at the beginning of a line marks the beginning of a new message in the mbox format. The mbox format is really just one text file with messages appended to each other.
Quoting from mbox (Wikipedia):
mboxo and mboxrd locate the message start by scanning for From lines that are found before the email message headers. If a From string occurs at the beginning of a line in either the header or the body of a message (a mail standard violation for the former, but not for the latter), the email message must be modified before the message is stored in an mbox mailbox file or the line will be taken as a message boundary. To avoid misinterpreting a From string at the beginning of the line in the email body as the beginning of a new email, some systems "From-munge" the message, typically by prepending a greater-than sign:
>From my point of view...
The same practice has been introduced in another tool: Git (2.10, June 2016), which taught format-patch and mailsplit (hence "am") an option in which a line that happens to begin with "From " in the e-mail message is quoted with ">", so that such lines can later be restored to their original shape.
See commit d9925d1, commit c88098d, commit 9f23e04 (05 Jun 2016) by Eric Wong (ele828).
(Merged by Junio C Hamano -- gitster -- in commit e25a4de, 06 Jul 2016)
pretty: support "mboxrd" output format
Signed-off-by: Eric Wong
This output format prevents format-patch output from breaking readers if somebody copy+pasted an mbox into a commit message.
Unlike the traditional "mboxo" format, "mboxrd" is designed to be fully-reversible.
"mboxrd" also gracefully degrades to showing extra ">" in existing "mboxo" readers.
This degradation is preferable to breaking message splitting completely, a problem I've seen in "mboxcl" due to having multiple, non-existent, or inaccurate Content-Length headers.
"mboxcl2" is a non-starter since it's inherits the problems of "mboxcl" while being completely incompatible with existing tooling based around mailsplit.
Documentation in Git 2.27 (Q2 2020):
See commit 88eaf36 (16 Apr 2020) by Emma Brooks.
(Merged by Junio C Hamano -- gitster -- in commit e08387d, 28 Apr 2020)
Documentation: explain "mboxrd" pretty format
Signed-off-by: Emma Brooks
Acked-by: Eric Wong
The "mboxrd" pretty format was introduced in 9f23e04061 (pretty: support "mboxrd" output format, 2016-06-05, Git 2.10.0) but wasn't mentioned in the documentation.
mboxrd
Like 'email', but lines in the commit message starting with "From " (preceded by zero or more ">") are quoted with ">" so they aren't confused as starting a new commit.
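The quoting rule described above (add one ">" to any line that starts with zero or more ">" followed by "From ") and its inverse can be sketched as follows. This is an illustrative Python sketch of the mboxrd rule as described in the quoted text, not Git's actual implementation; the function names are made up.

```python
import re

# Zero or more ">" followed by "From " (note the trailing space) at line start.
QUOTE = re.compile(r"^(>*From )", re.MULTILINE)
# The reverse direction needs at least one ">" to strip.
UNQUOTE = re.compile(r"^>(>*From )", re.MULTILINE)


def mboxrd_quote(body):
    """Prefix one '>' to every line that could be mistaken for a separator."""
    return QUOTE.sub(r">\1", body)


def mboxrd_unquote(body):
    """Strip exactly one '>' from every previously quoted line."""
    return UNQUOTE.sub(r"\1", body)


original = "From my point of view...\n>From an earlier reply\nNot From here\n"
quoted = mboxrd_quote(original)
print(quoted, end="")
print(mboxrd_unquote(quoted) == original)
```

Because lines that already start with ">From " gain a further ">", unquoting restores the body exactly, which is the "fully reversible" property the commit message refers to. The older mboxo scheme only quotes bare "From " lines, so a body that already contained ">From " cannot be told apart from a quoted one.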

Inverse function to format-date

In XSLT 2.0, the function format-date converts a date to a string in a desired format, e.g.
<xsl:value-of select = "format-date(xs:date('2000-01-01'), '[D01] [MN,*-3] [Y0001]', 'en', (), ())"/>
results in
01 JAN 2000.
My question is: which function takes 01 JAN 2000 as input and outputs 2000-01-01?
As noted above:
XPath 3.1 adds parse-ietf-date() which handles many of the date formats used in internet standards such as email (which are often very US-oriented). But there are too many varieties of date formats out there for a general solution to be viable.
It's much easier to define a syntax for converting one input format to a wide variety of output formats than to do the converse. A syntax that is sufficiently powerful to do the job properly would end up being very similar to doing it "by hand" using the replace() function and regular expressions.
It's quite easy to DIY.
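To illustrate the "by hand" approach of a regular expression plus a month lookup, here is a minimal sketch in Python instead of XSLT. The logic carries over to replace() or analyze-string in a stylesheet. The helper name parse_d_mon_y is hypothetical, and the sketch assumes the input always has the shape DD MON YYYY with English three-letter month abbreviations.

```python
import re
from datetime import date

MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}

# Two-digit day, three-letter month, four-digit year, separated by spaces.
PATTERN = re.compile(r"^(\d{2}) ([A-Za-z]{3}) (\d{4})$")


def parse_d_mon_y(text):
    """Convert 'DD MON YYYY' (e.g. '01 JAN 2000') to an ISO 8601 date string."""
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError("unexpected date format: {!r}".format(text))
    day, mon, year = match.groups()
    month = MONTHS.get(mon.upper())
    if month is None:
        raise ValueError("unknown month abbreviation: {!r}".format(mon))
    # date() validates the day against the month, e.g. it rejects 31 FEB.
    return date(int(year), month, int(day)).isoformat()


print(parse_d_mon_y("01 JAN 2000"))
```

This prints 2000-01-01. Any other input layout needs its own pattern, which is the reason a general inverse of format-date is impractical.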
References
XSLT Date Formatting
Parsing Date/Time Information from Google XML feed using XSL Stylesheet

How to understand these email header fields? ("From" field prefixed with a "greater" sign)

I have a raw email with headers that look like this:
From xxxx#xxxx Fri Apr 25 22:46:08 2003
>From xxxx#mxxxx Wed Feb 19 20:06:07 2003
Envelope-to: yyyy#xxxx
...
Date: Wed, 19 Feb 2003 22:05:59 +0500
From: "Actual Author" <xxxx#xxxx>
I don't know how to interpret the first two lines, and an initial reading of RFC 2822 has left me without a clue. They don't look like normal headers, and they confuse the Python 2.7 email parser (which works fine if I remove the > sign at the start of the second line). I have the same email body in Apple Mail's cache, and it seems fine there, so the input is clearly correct.
What's that header format? (From <email> <date>\r\n)
Why is the second one prefixed with > (greater-than sign)?
What you have is a mail in mbox format, where the first "From" line marks the start of the message. The second line (>From) seems to be caused by the escaping strategy of mbox known as From quoting - has this message been double-encoded as mbox?
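To make the distinction concrete, here is a small Python sketch that recognises an mbox separator line of the shape shown in the question and ignores everything else. The helper name parse_from_line is made up. The sketch assumes the separator has exactly the form "From <sender> <asctime-style date>"; real mbox files vary, for example some add a time zone. It also assumes English day and month names, since %a and %b are locale-dependent.

```python
import re
from datetime import datetime

# "From " + envelope sender (no spaces) + the rest of the line as the date.
FROM_LINE = re.compile(r"^From (\S+) (.+)$")


def parse_from_line(line):
    """Return (sender, datetime) for an mbox separator line, else None."""
    match = FROM_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    sender, stamp = match.groups()
    return sender, datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")


lines = [
    "From xxxx#xxxx Fri Apr 25 22:46:08 2003",
    ">From xxxx#mxxxx Wed Feb 19 20:06:07 2003",
    'From: "Actual Author" <xxxx#xxxx>',
]
for line in lines:
    print(parse_from_line(line))
```

Only the first line is treated as a separator. The ">From " line is a quoted line, and "From:" with a colon is an ordinary header, so both yield None.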

Formatting a java.util.Date through xsl built-in function "format-dateTime" displays language

I'm using an xslt transformation to format a Java object to pdf through Apache FOP libraries.
In particular I want to format a field of my object, a java.util.Date into DD/MM/YYYY format. To be able to format using built-in function "format-dateTime" I set xslt version to 2.0 and switched the transformation processor to saxon-8.7 because xalan did not support version 2.0, then I added in the xslt the date formatting instruction as follows:
Value date: <xsl:value-of select="format-dateTime(valueDate, '[D01]/[M01]/[Y0001]') " />
Before starting the transformation, I printed the Date field to stdout to be sure it was set correctly in the input object:
valueDate: Thu Jan 01 01:00:00 CET 1970
And that's what I expected.
But in the output text, after the XSL transformation, an unwanted "language" marker appears in front of the (correctly formatted) date:
[Language: en]01/01/1970
Does anyone know why?
Why did you choose Saxon 8.7? It's a very old release; it actually predates the XSLT 2.0 recommendation of January 2007. The current release is 9.5.
I think you will find this goes away if you use a more recent release. However, it could still happen if your Java configuration has a default Locale which is a language that Saxon does not support. (The message indicates that Saxon has chosen to output the date in English despite this not being the language you requested, which is implicitly your default language).
If moving to a more recent release fails to solve the problem, try setting the language argument of format-dateTime to the string "en".