Parsing email message headers in Go


How can I read some headers from an email message in Go?
Usually I would use ReadMIMEHeader(), but sadly not everybody has read all the relevant RFCs, and for some messages I get output like:
malformed MIME header line: name="7DDA4_foo_9E5D72.zip"
I narrowed the culprit to be
Content-Type: application/x-zip-compressed; x-unix-mode=0600;
name="7DDA4_foo_9E5D72.zip"
instead of
Content-Type: application/x-zip-compressed; x-unix-mode=0600;
    name="7DDA4_foo_9E5D72.zip"
in the source of the message.
What is the correct way to parse the headers, regardless of whether continuation lines are indented?

Given that the message is malformed, I would fix it through a separate piece of code that reformats the message:
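    // fixBrokenMime copies r to w, indenting every line that looks like an
    // unindented header continuation (no leading whitespace and no ':').
    func fixBrokenMime(r io.Reader, w io.WriteCloser) {
        defer w.Close()
        s := bufio.NewScanner(r)
        for s.Scan() {
            line := s.Text()
            if len(line) > 0 && line[0] != ' ' && line[0] != '\t' && strings.IndexByte(line, ':') < 0 {
                line = " " + line
            }
            w.Write([]byte(line + "\n"))
        }
    }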
Playground: http://play.golang.org/p/OZsXT7pmtN
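Obviously, you may want a different heuristic. I assumed that any line that is not indented and doesn't contain ":" is a continuation line and must be indented. Note that the loop applies this to every line it copies, message body included.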
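For illustration only, here is the same heuristic sketched in Python rather than Go, restricted to the header block so that body lines are left alone. The function name fix_unfolded_headers is made up for this sketch, LF line endings are assumed, and Python's email module stands in for net/textproto.

```python
import email


def fix_unfolded_headers(raw: str) -> str:
    """Indent header lines that look like unindented continuations.

    Only the header block (everything before the first blank line) is
    touched; the body is passed through unchanged.
    """
    head, sep, body = raw.partition("\n\n")
    fixed = []
    for line in head.split("\n"):
        # Same heuristic as the Go code: no leading whitespace and no ':'
        if line and line[0] not in " \t" and ":" not in line:
            line = " " + line
        fixed.append(line)
    return "\n".join(fixed) + sep + body


broken = (
    "Subject: test\n"
    "Content-Type: application/x-zip-compressed; x-unix-mode=0600;\n"
    'name="7DDA4_foo_9E5D72.zip"\n'
    "\n"
    "body line without colon\n"
)

msg = email.message_from_string(fix_unfolded_headers(broken))
print(msg.get_param("name"))
print(repr(msg.get_payload()))
```

Stopping at the first blank line is a design choice for this sketch: body text often contains unindented lines without a colon, and those should not be rewritten.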

Check out https://github.com/sendgrid/go-gmime (disclaimer, I work with SendGrid, but did not put together anything in the lib)

Related

Mirc script to find exact match in customer list

I am using this to find a customer name in a text file. Each name is on a separate line. I need to find the exact name: searching for Nick should find Nick only, but my code reports a match even if only Nickolson is in the list.
    On*:text:*!Customer*:#: {
      if ($read(system\Customer.txt,$2)) {
        .msg $chan $2 Customer found in list! | halt }
      else { .msg $chan 4 $2 Customer not found in list. | halt }
    }

You have to loop through every matching line and check whether the line is an exact match.
Something like this:
    On *:text:*!Customer*:#: {
      var %nick
      ; loop over all lines that contain the name
      while ($read(customer.txt, nw, * $+ $2 $+ *, $calc($readn + 1))) {
        ; check if the line is an exact match
        if ($v1 == $2) {
          %nick = $v1
          ; stop the loop because a result is found
          break
        }
      }
      if (%nick == $null) {
        .msg $chan 4 $2 Customer not found in list.
      }
      else {
        .msg $chan $2 Customer found in list!
      }
    }
You can find more here: https://en.wikichip.org/wiki/mirc/text_files#Iterating_Over_Matches

If you're looking for an exact match in a newline-separated list, you can use the 'w' switch without the wildcard '*' character.
From the mIRC documentation:
    $read(filename, [ntswrp], [matchtext], [N])
    //echo $read(help.txt, w, *help*)
    Scans the file help.txt for a line matching the wildcard text *help*.
Because we don't want wildcard matching but an exact match, we would use:
    $read(customers.txt, w, Nick)
Complete code:
    ON *:TEXT:!Customer *:#: {
      var %foundInTheList = $read(system\Customer.txt, w, $2)
      if (%foundInTheList) {
        .msg # $2 Customer found in list!
      }
      else {
        .msg # 4 $2 Customer not found in list.
      }
    }
A few remarks on the original code
Halting
halt should only be used when you explicitly want to stop any further processing. In most cases you can avoid it by structuring the code flow so that it behaves that way without an explicit halt.
Avoiding it also prevents a problem that shows up later: you add new code and wonder why it isn't executing, because of a now-forgotten halt command.
It also makes debugging easier, since there is no hidden exit point in the flow to be surprised by.
Readability
    if (..) {
    .... }
    else { .. }
With many lines of code inside the first { }, the else (or elseif) becomes hard to notice, because the mIRC remote editor puts the line above it, the one ending with the closing }, at the same indentation as the else line. A few extra lines are almost always worth it for readability, especially since they cost nothing; as far as I remember, new lines are free of charge.
So as a rule of thumb, put every command on its own line (that includes the closing bracket).
Matching Text
    On*:text:*!Customer*:#: {
The above code has a critical problem and a bug.
Critical: it will not work, because On*:text has no space between On and *:text.
Bug: *!Customer* will match EVERYTHING-BEFORE!customerANDAFTER <NICK>, which is clearly not the desired behavior. What you want is :!Customer *:, which only matches if the first word is !customer and at least one more word follows, because of the [SPACE]*.

Count redirects in jmeter

At the moment I'm using an HTTP Request sampler with 'Follow Redirects' enabled and want to keep it that way. As a secondary check besides the assertion I want to count the number of redirects as well, but I don't want to implement this solution.
Is there a way to use only one HTTP sampler and a postprocessor (BeanShell for now) to fetch this information? I'm checking the SampleResult documentation, but can't find any method that would give me this information.

I heard Groovy is the new black; moreover, since JMeter 3.1 users are encouraged to use JSR223 Test Elements and the __groovy() function, as BeanShell does not perform that well. So you can count the redirects as follows:
Add a JSR223 PostProcessor as a child of your HTTP Request sampler.
Put the following code into the "Script" area:
    int redirects = 0;
    // range covering the 3xx response codes
    def range = new IntRange(false, 299, 400)
    // sub-results hold the individual requests made while following redirects
    prev.getSubResults().each {
        if (range.contains(it.getResponseCode() as int)) {
            redirects++;
        }
    }
    log.info('Redirects: ' + redirects)
Once you run your test you will be able to see the number of redirects that occurred in the jmeter.log file.

Add the following Regular Expression Extractor as a child of your sampler:
    Apply to: Main sample and sub-samples
    Field to check: Response code
    Regular Expression: (\d+)
    Template: $1$
    Match No.: -1
Then add a BeanShell PostProcessor, also as a child of the sampler, and put the following into the script area:
    // MyVar is the variable name of the Regular Expression Extractor above
    int matchNr = Integer.parseInt(vars.get("MyVar_matchNr"));
    int counter = 0;
    for (int i = 1; i <= matchNr; i++) {
        String x = vars.get("MyVar_" + i);
        if (x.equals("302")) {
            counter = counter + 1;
        }
    }
    // The log line looks like "BeanShell PostProcessor: Number of redirects = 3",
    // so you might want to give the PostProcessor the same name as your sampler.
    log.info(Label + ": Number of redirects = " + String.valueOf(counter));
Then you can see the number of redirects for the sampler in the log.

Why code shows "Error 354 (net::ERR_CONTENT_LENGTH_MISMATCH): The server unexpectedly closed the connection."

I am building my own HTTP web server in Java.
If a client requests a file and that file exists on the server, the server sends the file to the client. I wrote this code, and it works fine.
The part of the code that implements this:
    File targ = [CONTAINS ONE FILE]
    PrintStream ps;
    InputStream is = new FileInputStream(targ.getAbsolutePath());
    while ((n = is.read(buf)) > 0) {
        System.out.println(n);
        ps.write(buf, 0, n);
    }
But now, to optimize my code, I replaced it with the code below:
    InputStream is = null;
    BufferedReader reader = null;
    String output = null;
    is = new FileInputStream(targ.getAbsolutePath());
    reader = new BufferedReader(new InputStreamReader(is));
    while( (output = reader.readLine()) != null) {
        System.out.println("new line");
        //System.out.println(output);
        ps.print(output);
    }
But it sometimes shows the error "Error 354 (net::ERR_CONTENT_LENGTH_MISMATCH): The server unexpectedly closed the connection.". I don't understand why. The error is strange, because the server responds with a 200 code, which means the file is there.
Help me please.
Edit no. 1
    char[] buffer = new char[1024*16];
    int k = reader.read(buffer);
    System.out.println("size : " + k);
    do {
        System.out.println("\tsize is : " + k);
        //System.out.println(output);
        ps.println(buffer);
    }while( (k = reader.read(buffer)) != -1 );
This prints the whole file, but for bigger files the client browser shows unreadable characters.

You call output = reader.readLine() to get the data, which strips the newline characters. Then you call ps.print(output), so the newline characters are never sent to the client.
Say you read this:
    Hello\r\n
    World\r\n
Then you send this:
    Content-length: 14
    HelloWorld
And then you close the connection, confusing the browser, which was still waiting for the other 4 bytes.
I guess you'll have to use ps.println(output).
You would have seen this if you had been monitoring the network traffic, which can prove quite useful when writing or debugging a server that is supposed to communicate over the network.
However, this will still cause trouble if the newlines of the file and of the system don't match (\n vs \r\n). Say you have this file:
    Hello\r\n
    World\r\n
Its length is 14 bytes. However, if your system writes a newline as \n when printing, your code with println() will print this:
    Hello\n
    World\n
That is 12 bytes, not 14. You are better off writing exactly the bytes you read.
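The byte counts above can be checked with a short sketch. It is written in Python purely as an illustration (the server in the question is Java); copy_bytes is a hypothetical helper name, and the in-memory streams stand in for the file and the socket.

```python
import io


def copy_bytes(src, dst, bufsize=4096):
    """Copy src to dst unchanged and return the number of bytes written."""
    total = 0
    while True:
        chunk = src.read(bufsize)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


file_bytes = b"Hello\r\nWorld\r\n"
content_length = len(file_bytes)  # what the Content-Length header announces

# Line-based reading drops the line terminators, like readLine() does.
lines = file_bytes.decode("ascii").splitlines()
sent_print = "".join(lines).encode("ascii")  # print(): no newlines at all
sent_println = "".join(line + "\n" for line in lines).encode("ascii")  # println() with \n

# Byte-for-byte copy, like the original read(buf)/write(buf, 0, n) loop.
out = io.BytesIO()
sent_raw = copy_bytes(io.BytesIO(file_bytes), out)

print(content_length, len(sent_print), len(sent_println), sent_raw)
# prints: 14 10 12 14
```

Only the byte-for-byte copy matches the announced length, which is why the first version of the server worked.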

Alternative or fix for "\n" in google script

First time I put up a question here (been searching before posting) so please bear with possible mistakes done on my end.
To the problem:
I am currently working on a website with google sites. Made some forms there and am adding a script to those forms to get the info input there emailed away once the form is submitted and saved on the spreadsheet, which works just fine, but the message inside the email that arrives is pretty messed up.
All the "\n" expressions in the code get simply ignored.
I got the base for the code from www.labnol.org and just edited it a little.
For the start, the code:
    function sendFormByEmail(e)
    {
      //I took the two mail addresses out here, but they are working in the original
      var email = "first mail address";
      var email2 = "second mail address";
      var subject = "New Announce your visit form submitted";
      var s = SpreadsheetApp.getActiveSheet();
      var headers = s.getRange(1,1,1,s.getLastColumn()).getValues()[0];
      var message = "A new 'Announce your visit' form has been submitted on the website: \n\n" + "\n\n";
      for(var i in headers) {
        message = message + "\n \n";
        message += headers[i] + ' = '+ e.namedValues[headers[i]].toString() + "\n\n";
      }
      var senderEmail = e.namedValues[headers[6]].toString();
      MailApp.sendEmail(email, senderEmail, subject, message);
      MailApp.sendEmail(email2, senderEmail, subject, message);
    }
As you can see, I have tried placing the \n in various places as an alternative, but it gets ignored no matter where I put it.
The original loop looked like this:
    for(var i in headers)
      message += headers[i] + ' = '+ e.namedValues[headers[i]].toString() + "\n\n";
Before I did some modifications to the code it worked just fine. But even though I didn't touch these lines, nor any of the other parts of the code that contribute to getting the message, the \n stopped working.
I kept trying to fix it (as the messed up \n placing above shows), but without success.
So now I am trying to find a way to fix them, or at least a work-around and hoped that any of you might know what is going on with the \n's or how to get them working again.
Thanks in advance.
ps: if you need any more information on it, just let me know

A solution is to use the GmailApp.sendEmail method instead of the MailApp.sendEmail method. GmailApp's sendEmail handles \n correctly.

Here's an alternate solution to what you're doing: HtmlTemplate
Some sample code:
    function sendEmailWithTemplateExample() {
      var t = HtmlService.createTemplateFromFile("body.html");
      t.someValue = "some dynamic value";
      var emailBody = t.evaluate().getContent();
      MailApp.sendEmail("your@email.here", "test email", emailBody);
    }
And here's the corresponding template code in body.html (Click "File -> New -> Html File"):
    Body goes here
    <?= someValue ?>

Extracting the body of an email from mbox file, decoding it to plain text regardless of Charset and Content Transfer Encoding

I am trying to use Python 3 to extract the body of email messages from a thunderbird mbox file. It is an IMAP account.
I would like to have the text part of the body of the email available to process as a unicode string. It should 'look like' the email does in Thunderbird, and not contain escaped characters such as \r\n =20 etc.
I think that it is the Content Transfer Encodings that I don't know how to decode or remove.
I receive emails with a variety of different Content Types, and different Content Transfer Encodings.
This is my current attempt :
    import mailbox
    import quopri,base64

    def myconvert(encoded,ContentTransferEncoding):
        if ContentTransferEncoding == 'quoted-printable':
            result = quopri.decodestring(encoded)
        elif ContentTransferEncoding == 'base64':
            result = base64.b64decode(encoded)

    mboxfile = 'C:/Users/Username/Documents/Thunderbird/Data/profile/ImapMail/server.name/INBOX'

    for msg in mailbox.mbox(mboxfile):
        if msg.is_multipart():    #Walk through the parts of the email to find the text body.
            for part in msg.walk():
                if part.is_multipart(): # If part is multipart, walk through the subparts.
                    for subpart in part.walk():
                        if subpart.get_content_type() == 'text/plain':
                            body = subpart.get_payload() # Get the subpart payload (i.e the message body)
                            for k,v in subpart.items():
                                if k == 'Content-Transfer-Encoding':
                                    cte = v             # Keep the Content Transfer Encoding
                elif subpart.get_content_type() == 'text/plain':
                    body = part.get_payload()           # part isn't multipart Get the payload
                    for k,v in part.items():
                        if k == 'Content-Transfer-Encoding':
                            cte = v                     # Keep the Content Transfer Encoding
        print(body)
        print('Body is of type:',type(body))
        body = myconvert(body,cte)
        print(body)
But this fails with :
Body is of type: <class 'str'>
Traceback (most recent call last):
File "C:/Users/David/Documents/Python/test2.py", line 31, in <module>
body = myconvert(body,cte)
File "C:/Users/David/Documents/Python/test2.py", line 6, in myconvert
result = quopri.decodestring(encoded)
File "C:\Python32\lib\quopri.py", line 164, in decodestring
return a2b_qp(s, header=header)
TypeError: 'str' does not support the buffer interface

Here is some code that does the job. It prints errors instead of crashing on messages where decoding fails. I hope it is useful. Note that if this behavior is a bug in Python 3 and it gets fixed, then .get_payload(decode=True) may return a str object instead of a bytes object. I ran this code today on Python 2.7.2 and on Python 3.2.1.
    import mailbox

    def getcharsets(msg):
        # Collect every charset declared anywhere in the message.
        charsets = set()
        for c in msg.get_charsets():
            if c is not None:
                charsets.update([c])
        return charsets

    def handleerror(errmsg, emailmsg, cs):
        print()
        print(errmsg)
        print("This error occurred while decoding with ",cs," charset.")
        print("These charsets were found in the one email.",getcharsets(emailmsg))
        print("This is the subject:",emailmsg['subject'])
        print("This is the sender:",emailmsg['From'])

    def getbodyfromemail(msg):
        body = None
        #Walk through the parts of the email to find the text body.
        if msg.is_multipart():
            for part in msg.walk():
                # If part is multipart, walk through the subparts.
                if part.is_multipart():
                    for subpart in part.walk():
                        if subpart.get_content_type() == 'text/plain':
                            # Get the subpart payload (i.e the message body)
                            body = subpart.get_payload(decode=True)
                            #charset = subpart.get_charset()
                # Part isn't multipart so get the email body
                elif part.get_content_type() == 'text/plain':
                    body = part.get_payload(decode=True)
                    #charset = part.get_charset()
        # If this isn't a multi-part message then get the payload (i.e the message body)
        elif msg.get_content_type() == 'text/plain':
            body = msg.get_payload(decode=True)
        # No checking done to match the charset with the correct part.
        for charset in getcharsets(msg):
            try:
                body = body.decode(charset)
            except UnicodeDecodeError:
                handleerror("UnicodeDecodeError: encountered.",msg,charset)
            except AttributeError:
                handleerror("AttributeError: encountered" ,msg,charset)
        return body

    mboxfile = 'C:/Users/Username/Documents/Thunderbird/Data/profile/ImapMail/server.name/INBOX'
    print(mboxfile)
    for thisemail in mailbox.mbox(mboxfile):
        body = getbodyfromemail(thisemail)
        # body stays None when no text/plain part was found
        if body is not None:
            print(body[0:1000])
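On why the question's myconvert failed: get_payload() without decode=True returns a str, while the traceback shows quopri.decodestring rejecting a str on Python 3.2, and the function never returned its result. The following is a minimal sketch of doing the transfer decoding by hand, mainly to show what get_payload(decode=True) does for you. The name decode_payload and the utf-8 default charset are assumptions made for this illustration.

```python
import base64
import quopri


def decode_payload(payload, cte, charset="utf-8"):
    """Undo the Content-Transfer-Encoding of a str payload and return text.

    payload: the str returned by get_payload() without decode=True
    cte:     value of the Content-Transfer-Encoding header, or None
    charset: charset to decode the resulting bytes with (assumed default)
    """
    # The decoders operate on bytes, so convert the str payload first.
    raw = payload.encode("ascii", "surrogateescape")
    cte = (cte or "").lower()
    if cte == "quoted-printable":
        raw = quopri.decodestring(raw)
    elif cte == "base64":
        raw = base64.b64decode(raw)
    # 7bit, 8bit and binary need no transfer decoding.
    return raw.decode(charset, errors="replace")


print(decode_payload("Caf=C3=A9=20ok", "quoted-printable"))
```

In practice get_payload(decode=True) already performs the transfer-decoding step, so only the final bytes-to-str conversion with the right charset is left to the caller.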

This script seems to return all messages correctly:
    def getcharsets(msg):
        charsets = set()
        for c in msg.get_charsets():
            if c is not None:
                charsets.update([c])
        return charsets

    def getBody(msg):
        # Descend into the first sub-part until a non-multipart part is reached.
        while msg.is_multipart():
            msg = msg.get_payload()[0]
        t = msg.get_payload(decode=True)
        for charset in getcharsets(msg):
            t = t.decode(charset)
        return t
The earlier answer from acd often returns only a footer of the real message (at least for the GMANE email messages I am opening for this toolbox: https://pypi.python.org/pypi/gmane).
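Both answers decode with charsets collected from the message rather than with the charset of the specific part being decoded. A hedged alternative sketch is to take the first text/plain part found by walk() and decode it with that part's own declared charset. The name text_body and the utf-8 fallback are assumptions for illustration, and picking the first text/plain part is a simplification that may not suit every message.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def text_body(msg):
    """Return the first text/plain part of msg as str, or None if absent."""
    for part in msg.walk():  # walk() also yields msg itself when not multipart
        if part.get_content_type() != "text/plain":
            continue
        raw = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "utf-8"  # assumed fallback
        try:
            return raw.decode(charset, errors="replace")
        except LookupError:  # charset name unknown to Python
            return raw.decode("utf-8", errors="replace")
    return None


# Build a small multipart message to try it on.
sample = MIMEMultipart("alternative")
sample.attach(MIMEText("Grüße", "plain", "iso-8859-1"))
sample.attach(MIMEText("<p>Gr&uuml;&szlig;e</p>", "html", "utf-8"))
parsed = email.message_from_bytes(sample.as_bytes())

print(text_body(parsed))
```

The same function can be applied to each message yielded by mailbox.mbox(...), since those are email message objects as well.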
