SMTP protocol: multiple mails per connection - email

I need to implement support of multiple messages per one connection for my SMTP server.
Every message ends with:
data
<<content>>
.
And it's logically that protocol state should be reset to "after receive authentication" point. Is it correct?
The question: Is it possible that any client sends message content with multiple data commands? Does the standard allow it?

From RFC2821 ("Simple Mail Transfer Protocol"):
The mail data is terminated by a line containing only a period, that
is, the character sequence "." (see section 4.5.2).
...
Receipt of the end of mail data indication requires the server to process the stored mail transaction information. This processing consumes the information in the reverse-path buffer, the forward-path buffer, and the mail data buffer, and on the completion of this command these buffers are cleared.
i.e. after <CRLF>.<CRLF> is received, the server consumes the mail data and clears its buffers; hence the client cannot then send more content associated with the message, since the server will have forgotten about the message.
...
Once started, a mail transaction consists of a transaction beginning command, one or more RCPT commands, and a DATA command, in that order.
...
MAIL (or SEND, SOML, or SAML) MUST NOT be sent if a mail transaction is already open, i.e., it should be sent only if no mail transaction had been started in the session, or it the previous one successfully concluded with a successful DATA command, or if the previous one was aborted with a RSET.
i.e. MAIL begins a new mail transaction, and a successful DATA command (terminated by <CRLF>.<CRLF>) concludes it; the client may then send another message.
From RFC4954 ("SMTP Service Extension for Authentication"):
After an AUTH command has been successfully completed, no more AUTH commands may be issued in the same session. After a successful AUTH command completes, a server MUST reject any further AUTH commands with a 503 reply.
i.e. authentication takes place at most once per session, and applies until the end of that session.

Related

Sending response in a Pub-Sub architecture

At the moment I have a single AWS EC2 instance which handles all incoming http client requests. It analyses each request and then decides which back end worker server should handle the request and then makes a http call to the chosen server. The back end server then responds when it has processed the request. The front end server will then respond to the client. The front end server is effectively a load balancer.
I now want to go to a Pub-Sub architecture instead of the front end server pushing the requests to the back end instances. The front end server will do some basic processing and then simply put the request into an SNS queue and the logic of which back end server should handle the request is left to the back end servers themselves.
My question is with this model what is the best way to have the back end servers notify the front end server that they have processed the request? Previously they just replied to the http request the front end server sent but now there is no direct request, just an item of work being published to a queue and a back end instance picking it off the queue.
Pubsub architectures are not well suited to responses/acknowledgements. Their fire-and-forget broadcasting pattern decouples publishers and the subscribers: a publisher does not know if or how many subscribers there are, and the subscribers do no know which publisher generated a message. Also, it can be difficult to guarantee sequence of responses, they won't necessarily match the sequence of messages due to the nature of network comms and handling of messages can take different amounts of time etc. So each message that needs to be acknowledge needs a unique ID that the subscriber can include in its response so the publisher can match a response with the message sent. For example:
publisher sends message "new event" and provides a UUID for the
event
many subscribers get the message; some may be the handlers for
the request, but others might be observers, loggers, analytics, etc
if only one subscriber handles the message (e.g. the first
subscriber to get a key from somewhere), that subscriber generates a
message "new event handled" and provides a UUID
the original
publisher, as well as any number of other subscribers, may get that
message;
the original publisher sees the ID is
in its cache as an unconfirmed message, and now marks it as
confirmed
if a certain amount of time passes without receiving a
confirmation with given ID, the original publisher republishes the
original message, with a new ID, and removes the old ID from cache.
In step 3, if many subscribers handled the message instead of just one, then it
less obvious how the original publisher should handle "responses": how does it
know how many subscribers handle the message, some could be down or
too busy to respond, or some may be in the process of responding by the time
the original publisher determines that "not enough handlers have
responded".
Publish-subscribe architectures should be designed to not request any response, but instead to check for some condition that should have happened as a result of the command being handled, such as a thumbnail having gotten generated (it can assume as a result of a handler of the message).

Sending data immediately after accept. Data loss possibility

I have read following on msdn about accept function:
https://msdn.microsoft.com/pl-pl/library/windows/desktop/ms737526(v=vs.85).aspx
When using the accept function, realize that the function may return
before connection establishment has traversed the entire distance
between sender and receiver. This is because the accept function
returns as soon as it receives a CONNECT ACK message; in ATM, a
CONNECT ACK message is returned by the next switch in the path as soon
as a CONNECT message is processed (rather than the CONNECT ACK being
sent by the end node to which the connection is ultimately
established). As such, applications should realize that if data is
sent immediately following receipt of a CONNECT ACK message, data loss
is possible, since the connection may not have been established all
the way between sender and receiver.
Could someone explain it in more details? What it has with SYN, SYN ACK? What's the problem here? So when such data loss can happen, and how to prevent it?
You're omitting an important paragraph on that page, right before your quote:
The following are important issues associated with connection setup,
and must be considered when using Asynchronous Transfer Mode (ATM)
with Windows Sockets 2
That is, it is only applicable when you use things like AF_ATM and SOCKADDR_ATM. It is not relevant for TCP which you seem to imply with:
What it has with SYN, SYN ACK

SSE Server Sent Events - Client keep sending requests (like polling)

How come every site explains that in SSE a single connection stays opened between client and server "With SSE, a client sends a standard HTTP request asking for an event stream, and the server responds initially with a standard HTTP response and holds the connection open"
And then, when server decides it can send data to the client while what I am trying to implement SSE I see on fiddler requests being sent every couple of seconds
For me it feels like long polling and not a one single connection kept opened.
Moreover, It is not that the server decides to send data to the client and it sends it but it sends data only when the client sends next request
If i respond with "retry: 10000" even tough something has happened that the server wants to notify right now, will get to the client only on the next request (in 10 seconds from now) which for me does not really looks like connection that is kept opened and server sends data as soon as he wants to
Your server is closing the connection immediately. SSE has a built-in retry function for when the connection is lost, so what you are seeing is:
Client connects to server
Server myteriously dies
Client waits two seconds then auto-reconnects
Server myteriously dies
Client waits two seconds then auto-reconnects
...
To fix the server-side script, you want to go against everything your parents taught you about right and wrong, and deliberately create an infinite loop. So, it will end up looking something like this:
validate user, set up database connection, etc.
while(true){
get next bit of data
send it to client
flush
sleep 2 seconds
}
Where get next bit of data might be polling a DB table for new records since the last poll, or scan a file system directory for new files, etc.
Alternatively, if the server-side process is a long-running data analysis, your script might instead look like this:
validate user, set-up, etc.
while(true){
calculate next 1000 digits of pi
send them to client
flush
}
This assumes that the calculate line takes at least half a second to run; any more frequently and you will start to clog up the socket with lots of small packets of data for no benefit (the user won't notice that they are getting 10 updates/second instead of 2 updates/second).

Receiving FD_CLOSE when there should be FD_READ

I have a strange problem on one of my clients workstation. I have a simple application that exchanges some data over network between two endpoints.
Basically the transaction goes like this:
Client A listens for incomming connection
Client B connects to A and sends some data
Client A read this data for further processing
Now the strange part is that client A does not receive whole data (sometimes it a part of buffer sometimes it is empty).
The A client uses WSAEventSelect function and waits for FD_READ to read data sent by B and for FD_CLOSE to detect disconnection.
Usually ( everytime except this one particular client) the FD_READ is signaled, data is processed and after that FD_CLOSE is signaled and all is fine, but here instead FD_READ i receive FD_CLOSE.
Can someone tell me how this is possible? Another thing is that program was working fine for about a year and suddenly it crashed.
Now the strange part is that client A does not receive whole data (sometimes it a part of buffer sometimes it is empty).
There's nothing strange about that, that's how TCP works, except that you will never receive zero bytes in blocking mode.
Usually ( everytime except this one particular client) the FD_READ is signaled, data is processed and after that FD_CLOSE is signaled and all is fine, but here instead FD_READ i receive FD_CLOSE.
Note that FD_READ can be signalled any number of times, not just once. You're not guaranteed to receive an entire message in a single read.
Can someone tell me how this is possible?
The client has closed the connection.
Quoting http://msdn.microsoft.com/en-us/library/windows/desktop/ms741576%28v=vs.85%29.aspx
"An application should check for remaining data upon receipt of FD_CLOSE to avoid any possibility of losing data."
So if the error code associated with the FD_CLOSE notification is 0, you should check to see if you still have data to read, that might be where your missing data is.
If the error code is NOT 0, then there was an error and the missing data is probably lost.

XEP-0124 / BOSH: Omit ACK in response

I'm reading the XEP-0124 / BOSH specification and do not understand the following sentence in chapter 9.1 Request Acknowledgements:
The only exception is that, after its
session creation response, the
connection manager SHOULD NOT include
an 'ack' attribute in any response if
the value would be the 'rid' of the
request being responded to.
In my words: I should not send an ACK if the respond is dedicated for the last and only request (in connection manager's queue).
But: There is a client with it's own state machine. Maybe the client already send a second request -- where the first one is not replied -- and expect to get two answers. In this case the client except a ACK with RID of the "older" request and the connection manager have to set ACK.
Conclusion: The connection mananager MUST set ACK as long multiple requests are allowed.
I'm not sure, but is this text paragraph dedicated only for the use case where no further request is send by the client but the session creation phase is finished successfully and the connection manager have to send "ping" messages to the client due to "wait" timeouts ?
So, as I read it:
If the highest RID (in sequence) that you have received is 11 (you might have received 14 after that, but it is out of sequence since 12 & 13 are missing), and you are responding on:
The same request, then you should not (it is recommended that you do not, but if you have a good reason to, then you may) send an 'ack' attribute.
An earlier held request (say RID 10) then you should set 'ack' to 11 since that is the highest in-sequence RID that you have received so far.
It's okay if the client sent multiple requests and the server doesn't yet know about them. This is because there is a chance that when the client sent 11, the server has no held connections and it will respond back on the same connection. In that case, there are 2 requests sent out (11 & 12), but the response for each one acks that same request since the server always has something to send back immediately.