IMAP fetch command race condition on sequence number change - email

I'm trying to work my way through RFC 3501 to determine what happens when you fetch by sequence number, but a CREATE or EXPUNGE command comes in before the response. For example:
> C: t fetch 32 rfc822.size
> S: * 32 FETCH (RFC822.SIZE 4085)
is easy, but what about:
> C: t fetch 32 rfc822.size
> S: * 12 EXPUNGE
> S: * 32 EXISTS
> S: * 31 FETCH (RFC822.SIZE 4085)
Does the 31 refer to the new sequence number, or the sequence number referenced in the fetch?

Section 7.4.1 of RFC 3501 specifically contains this language:
An EXPUNGE response MUST NOT be sent when no command is in
progress, nor while responding to a FETCH, STORE, or SEARCH
command. This rule is necessary to prevent a loss of
synchronization of message sequence numbers between client and
server. A command is not "in progress" until the complete command
has been received; in particular, a command is not "in progress"
during the negotiation of command continuation.
This specifically forbids the example. It cannot have been sent unilaterally ("MUST NOT be sent when no command is in progress"), and it could not have been sent as a response to FETCH ("nor while responding to a FETCH, STORE, or SEARCH command").
Also see 5.5 which contains some information about race conditions when multiple commands are in progress. The client is forbidden from sending plain FETCH, STORE, or SEARCH while other types of commands are in progress, and vice versa.

The answer should be clear: for the 31 in the response following the expunge to refer to anything other than the message that currently has sequence number 31, the IMAP server would have to maintain an index of sequence numbers for each command-point-in-time. The IMAP protocol requires no such work on the part of the server.
Furthermore note that strictly speaking the untagged responses have nothing to do with the fetch command; the association is merely a suggestion.
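To make the renumbering concrete, here is a minimal Python sketch of how a client might keep its own view of sequence numbers in step with untagged EXPUNGE responses. The `MailboxView` class and the UIDs used as stand-in message identities are assumptions made for illustration; they are not part of any IMAP library.

```python
class MailboxView:
    """Client-side view: index i holds the message with sequence number i + 1."""

    def __init__(self, uids):
        self.uids = list(uids)

    def on_expunge(self, seq):
        # "* <seq> EXPUNGE": the message is removed and every higher
        # sequence number immediately drops by one.
        del self.uids[seq - 1]

    def message_at(self, seq):
        return self.uids[seq - 1]


view = MailboxView(range(101, 133))   # 32 messages, hypothetical UIDs 101..132
before = view.message_at(32)          # the last message in the mailbox
view.on_expunge(12)
# After the expunge, the same message is addressed as sequence number 31.
after = view.message_at(31)
```

Any sequence number that appears in a response after an EXPUNGE is interpreted against the already-renumbered view, which is why neither side needs a per-command snapshot.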

Related

Server return status 200 but client doesn't receive it because network connection is broken

I have a REST service and a client (an Android app) that sends POST requests to the REST service. On the client side there are documents (orders) that need to be synchronized with the web server. Synchronization means that the client sends a POST request to the REST service for each order. When the REST service receives a POST request, it writes the data to the database and sends a response with status 200 to the client. The client receives the 200 and marks that order as synchronized.
The problem arises when the connection breaks after the server has sent the 200 response but before the client has received it. The client doesn't mark the order as synchronized. The next time, the client sends the order again and the server writes it to the database again, so we have the same order twice.
What is good practice for dealing with this kind of problem?
> The problem arises when the connection breaks after the server has sent the 200 response but before the client has received it. The client doesn't mark the order as synchronized. The next time, the client sends the order again and the server writes it to the database again, so we have the same order twice.
Welcome to the world of unreliable messaging.
> What is good practice for dealing with this kind of problem?
You should review Nobody Needs Reliable Messaging, by Marc de Graauw (2010).
The cornerstone of reliable messaging is idempotent request handling. Idempotent semantics are described this way:
> A request method is considered "idempotent" if the intended effect on the server of multiple identical requests with that method is the same as the effect for a single such request.
Simply fussing with the request method, however, doesn't get you anything. First, the other semantics in the message may not align with the idempotent request methods, and second, the server needs to know how to implement the effect as intended.
There are two basic patterns to idempotent request handling. The simpler of these is set, meaning "overwrite the current representation with the one I am providing".
// X == 6
server.setX(7)
// X == 7
server.setX(7) // a second, identical request, but the _effect_ is the same
// X == 7
The alternative is test and set (sometimes called compare and swap); in this pattern, the request has two parts: a predicate to determine if some condition holds, and the change to apply if the condition does hold.
// X == 6
server.testAndSetX(6,7)
// X == 7
server.testAndSetX(6,7) // this is a no-op, because 7 != 6
// X == 7
That's the core idea.
From your description, what you are doing is manipulating a collection of orders.
The same basic idea works there. If you can calculate a unique identifier from the information in the request, then you can treat your collection like a set/key-value store.
// collection.get(Id.of(7)) == Nothing
collection.put(Id.of(7), 7)
// collection.get(Id.of(7)) == Just(7)
collection.put(Id.of(7), 7) // a second, identical request, but the _effect_ is the same
// collection.get(Id.of(7)) == Just(7)
When that isn't an option, then you need some property of the collection that will change when your edit is made, encoded into the request:
if (collection.size() == 3) {
    collection.append(7)
}
A generic way to manage something like this is to consider version numbers: each time a change is made, the version number is incremented as part of the same transaction.
// begin transaction
if (resource.version.get() == expectedVersion) {
    resource.version.set(1 + expectedVersion)
    resource.applyChange(request)
}
// end transaction
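The version check above can be sketched in runnable form. This is only an illustration under stated assumptions: `VersionedResource` is a hypothetical in-memory class, and a `threading.Lock` stands in for the database transaction.

```python
import threading


class VersionedResource:
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()  # stands in for the transaction

    def test_and_set(self, expected_version, new_value):
        """Apply the change only if the caller saw the current version."""
        with self._lock:
            if self.version != expected_version:
                return False  # stale or repeated request: no-op
            self.value = new_value
            self.version = expected_version + 1
            return True


res = VersionedResource(6)
first = res.test_and_set(0, 7)    # applied, version becomes 1
repeat = res.test_and_set(0, 7)   # identical retry, ignored
```

Because the repeated request carries the old version number, replaying it leaves the resource unchanged.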
For a real world example, consider JSON Patch, which includes a test operation that can be used as a condition to prevent "concurrent" modification of a document.
What we're describing in all of these test and set scenarios is the notion of a conditional request:
> Conditional requests are HTTP requests [RFC7231] that include one or more header fields indicating a precondition to be tested before applying the method semantics to the target resource.
What the conditional requests specification gives you is a generic way to describe conditions in the metadata of your requests and responses, so that generic HTTP components can usefully contribute.
Note well: what this work gets us is not a guarantee that the server will do what the client wants. Instead, it's a weaker guarantee: that the client can safely repeat the request until it receives the acknowledgement from the server.
Surely your documents must have a unique identifier. The semantically correct way would be to use the If-None-Match header, in which you send that identifier.
The server then checks whether a document with that identifier already exists, and responds with 412 Precondition Failed if that is the case.
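A rough sketch of that server-side check, assuming the client puts the order identifier in If-None-Match as described. The function name, the plain dict standing in for the database, and the choice of 428 for a missing header are all assumptions made for illustration.

```python
def handle_post_order(store, headers, order):
    """Return an HTTP status code for a POST that creates an order.

    `store` is a plain dict standing in for the database (an assumption).
    """
    order_id = headers.get("If-None-Match")
    if order_id is None:
        return 428  # Precondition Required: one possible choice for a missing header
    if order_id in store:
        return 412  # Precondition Failed: this order was already stored
    store[order_id] = order
    return 201      # Created


db = {}
hdrs = {"If-None-Match": '"order-42"'}
first = handle_post_order(db, hdrs, {"item": "widget"})
retry = handle_post_order(db, hdrs, {"item": "widget"})
```

On a retry after a lost response, the client can treat the 412 as confirmation that the order already reached the server and mark it as synchronized.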
One possible option is validation on the server side. The order should have some unique attribute (a name, an id, or something else), and the client must send it as well. The server then takes this value (e.g. the name, if names are unique) and looks for the order in the database. If the order is found, there is no need to save it again, and the server should send a 409 Conflict response to the client. If no such order is found, the server saves it and sends a 201 Created response.
Best practices:
201 Created for POST
409 Conflict - if the resource already exists
Your requests should be idempotent.
From your description, you should be using PUT instead of POST.
Client-side generated ids (GUIDs) and upsert logic on the server side help achieve this.
This way you can implement retry logic on the client side for failed requests without introducing duplicate records.
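The retry idea can be sketched as follows. `FlakyServer` is a hypothetical in-memory stand-in that stores the order but loses its first response, and `sync_order` is an illustrative client helper; neither corresponds to a real API.

```python
import uuid


class FlakyServer:
    """Hypothetical server: stores the order, but the first response is lost."""

    def __init__(self):
        self.orders = {}
        self.responses_to_drop = 1

    def put_order(self, order_id, order):
        self.orders[order_id] = order      # upsert: same key, same record
        if self.responses_to_drop > 0:
            self.responses_to_drop -= 1
            raise ConnectionError("response lost")
        return 200


def sync_order(server, order_id, order, max_attempts=3):
    for _ in range(max_attempts):
        try:
            return server.put_order(order_id, order)
        except ConnectionError:
            continue  # safe to repeat: the request is idempotent
    return None


server = FlakyServer()
order_id = str(uuid.uuid4())   # generated once on the client, reused on retry
status = sync_order(server, order_id, {"item": "widget"})
```

The key point is that the id is generated once, before the first attempt, so every retry addresses the same record.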

A situation when HTTP put is not idempotent

Consider the following scenario:
Alice updates item1 using HTTP PUT
Bob updates item1 using HTTP PUT with different data
Alice accidentally updates item1 again using HTTP PUT with the same data, for instance by using the back button in a browser
Charlie reads the data
Is this idempotent?
> Is this idempotent?
Yes. The relevant definition of idempotent is provided by RFC 7231:
A request method is considered "idempotent" if the intended effect on the server of multiple identical requests with that method is the same as the effect for a single such request.
However, the situation you describe is that of a data race -- the representation that Charlie receives depends on the order that the server applies the PUT requests received from Alice and Bob.
The usual answer to avoiding lost writes is to use requests that target a particular version of the resource to update; this is analogous to using compare and swap semantics on your request, where a write that loses the data race gets dropped on the floor.
For example:
x = 7
x.swap(7, 8) # Request from Alice changes x == 7 to x == 8
x.swap(8, 9) # Request from Bob changes x == 8 to x == 9
x.swap(7, 8) # No-Op, this request is ignored, x == 9
In HTTP, the specification of Conditional Requests gives you a way to take simple predicates and lift them into the metadata so that generic components can understand the semantics of what is going on. This is done with validators like ETag.
The basic idea is this: the server provides, in the metadata, a representation of the validator associated with the current representation of the resource. When the client wants to make a request on the condition that the representation hasn't changed, it includes that same validator in the request. The server is expected to recalculate the validator using the current state of the server side resource, and apply the change only if the two validator representations match.
If the origin server rejects a request because the expected precondition headers are missing from the request, it can use 428 Precondition Required to classify the nature of the client error.
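A minimal sketch of that validator check, replaying the Alice/Bob scenario. The handler, the dict standing in for the resource, and the hash-based ETag are assumptions for illustration; a real server may derive its validators differently.

```python
import hashlib


def etag_of(body):
    # A validator derived from the representation; hashing is one option.
    return '"' + hashlib.sha256(body.encode("utf-8")).hexdigest()[:16] + '"'


def handle_put(resource, headers, new_body):
    """`resource` is a dict with a "body" key; returns an HTTP status code."""
    expected = headers.get("If-Match")
    if expected is None:
        return 428                       # Precondition Required
    if expected != etag_of(resource["body"]):
        return 412                       # Precondition Failed: stale validator
    resource["body"] = new_body
    return 204                           # No Content


item = {"body": "v0"}
tag0 = etag_of(item["body"])
alice = handle_put(item, {"If-Match": tag0}, "alice data")
tag1 = etag_of(item["body"])
bob = handle_put(item, {"If-Match": tag1}, "bob data")
alice_again = handle_put(item, {"If-Match": tag0}, "alice data")  # replayed request
```

Alice's replayed request still carries the validator for the original representation, so it is rejected and Bob's update survives.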
Yes, this is idempotent. If this behavior is wrong for you, we would need to know the business logic behind it.

REST response code for accessing a corrupt/invalid resource

What's the best HTTP status code to use in response to an HTTP GET for a resource that's corrupt or semantically invalid?
E.g., consider a request to GET /person/1234 where data for person ID 1234 exists on the server but violates some business rule, so the server refuses to use it.
404 doesn't apply (because the data actually exists).
4xx in general seems not ideal (because the problem is on the server end, not under the client's control).
503 seems to apply to the service as a whole, not a particular resource.
500 certainly fits, but it's very vague in actually telling the client what might be wrong.
Any suggestions?
After reading the comments and the linked resources, it looks like @RemyLebeau's approach is best:
I think 500 is the only official response code that fits this situation. And there is nothing stopping you from including a response body that describes the reason for the failure.
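As a rough illustration of a 500 response that carries a descriptive body, here is a small sketch. The function name and the JSON field names (`error`, `resource`, `detail`) are hypothetical; the point is only that the status code stays generic while the body explains the failure.

```python
import json


def corrupt_resource_response(resource_path, reason):
    """Build (status, headers, body) for a resource the server refuses to serve."""
    body = json.dumps({
        "error": "resource_invalid",     # hypothetical error code
        "resource": resource_path,
        "detail": reason,
    })
    return 500, {"Content-Type": "application/json"}, body


status, headers, body = corrupt_resource_response(
    "/person/1234", "stored record violates a business rule")
```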
According to iana.org:
4xx: Client Error - The request contains bad syntax or cannot be fulfilled
5xx: Server Error - The server failed to fulfill an apparently valid request
I think none of the 4xx status codes is a valid response to an internal server error, a migration, or any other situation where the client bears no responsibility and the user's input does not need to be rechecked. The exception is when the user's own circumstances are involved: for example, if the user's package no longer allows access to that data after a predetermined and known date, a 403 Forbidden may be valid, as @Bari suggested.
I'm not an expert, but I think that when the server itself decides to treat the endpoint's data as corrupt or invalid, the right code depends on what should happen next. I see 3 possible cases:
1. It is expected that this will somehow be fixed, and the client should be invited to request the resource again at some future moment ==> 503 (Service Unavailable):
> The 503 (Service Unavailable) status code indicates that the server is currently unable to handle the request due to a temporary overload or scheduled maintenance, which will likely be alleviated after some delay. The server MAY send a Retry-After header field (Section 7.1.3) to suggest an appropriate amount of time for the client to wait before retrying the request.
2. Something is wrong and it is not the client's responsibility, but there is an alternative way to access the data that requires the request to include further details. Example: when the requested data is corrupt, the server's error response may include a list of older (or unsaved, unversioned) versions of it and expect the client to be more specific about which version to select, so that it can be fetched instead of the corrupted one ==> 510 Not Extended
510 Not Extended
The policy for accessing the resource has not been met in the
request. The server should send back all the information necessary
for the client to issue an extended request. It is outside the scope
of this specification to specify how the extensions inform the
client.
If the 510 response contains information about extensions that were
not present in the initial request then the client MAY repeat the
request if it has reason to believe it can fulfill the extension
policy by modifying the request according to the information provided
in the 510 response. Otherwise the client MAY present any entity
included in the 510 response to the user, since that entity may
include relevant diagnostic information.
Case 2 was updated to include an example, as IMHO it may fit such a case. But again, I'm not an expert and I may be wrong about it.
3. No alternative ways, nothing to be expected or none of the other cases ==> 500 should be good
> The 500 (Internal Server Error) status code indicates that the server encountered an unexpected condition that prevented it from fulfilling the request.

Simple way to count emails in GMail thread

Is there any simple way to know how many emails are in a thread in a GMail mailbox? I fetched information about a message (message_id, X-GM-THRID, references, in_reply_to etc.) and I want to know how many other messages with the same X-GM-THRID are in the mailbox. Is it possible without fetching information about those other messages?
According to this page about GMail IMAP extensions, X-GM-THRID is supported as a search key. This is the example from that page:
a009 UID SEARCH X-GM-THRID 1266894439832287888
* SEARCH 2 3 4
a009 OK Search (Success)
That gives you the UIDs of the messages in that thread, and you can just count the number of results.
If you really just want the count, and don't need the message ids, you can make use of the fact that GMail supports the ESEARCH capability (described in RFC 4731), which lets you ask for the count and nothing else:
C: 202 SEARCH RETURN (COUNT) X-GM-THRID 1261978514042297166
S: * ESEARCH (TAG "202") COUNT 2
S: 202 OK SEARCH completed (Success)
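Both approaches come down to reading one untagged response line. The helpers below are a minimal sketch that works on the raw lines shown above; the function names are hypothetical, and how you obtain the raw line depends on your IMAP client library.

```python
def count_from_search(line):
    """Count message ids in an untagged '* SEARCH ...' response line."""
    parts = line.split()
    if parts[:2] != ["*", "SEARCH"]:
        raise ValueError("not a SEARCH response: " + line)
    return len(parts) - 2


def count_from_esearch(line):
    """Read the COUNT value from an untagged '* ESEARCH ...' response line."""
    parts = line.split()
    if parts[:2] != ["*", "ESEARCH"]:
        raise ValueError("not an ESEARCH response: " + line)
    return int(parts[parts.index("COUNT") + 1])


plain = count_from_search("* SEARCH 2 3 4")
extended = count_from_esearch('* ESEARCH (TAG "202") COUNT 2')
```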

IMAP COPY command - uid and Message-Id

I understood that the UID changes but the Message-Id will not change during any operation on a particular mail. However, after some operations I came up with this case:
Let's assume I have a total of 2000 emails in my INBOX. If I copy the 1000th email with UID 1000 and Message-Id 1000 to my Trash mailbox and then copy it back to the INBOX, the UID will change to 2000 and the Message-Id to 2000, the current value for that folder. Then, regardless of its date, that email will be at the top.
Now the question is: if I send ". fetch 1990:2000 fast" (the last 10 emails), I'll get that particular email among those fetched. How would you fetch the last 10 based on the date without having to fetch 2000 emails and then sort them by date?
If the IMAP server supports it, you can use the SORT command for this, as described in RFC 5256. The specific command you're looking for is probably:
C: A11 SORT (REVERSE DATE) UTF-8 ALL
S: * SORT 5 3 4 1 2
S: A11 OK SORT completed
The server response is a list of message sequence numbers that you can use for a subsequent fetch.
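A minimal sketch of that follow-up step: take the first few sequence numbers from the SORT response (with REVERSE DATE these are the newest by date) and build the sequence set for the next FETCH. The function name and the tag `A12` are assumptions for illustration.

```python
def fetch_set_for_newest(sort_line, limit=10):
    """Turn '* SORT 5 3 4 1 2' into a sequence set for the first `limit` ids."""
    parts = sort_line.split()
    if parts[:2] != ["*", "SORT"]:
        raise ValueError("not a SORT response: " + sort_line)
    return ",".join(parts[2:2 + limit])


seq_set = fetch_set_for_newest("* SORT 5 3 4 1 2", limit=3)
command = "A12 FETCH " + seq_set + " FAST"
```

If the SORT response lists no messages, the helper returns an empty string and no FETCH should be sent.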
If your server supports the SORT extension (modern IMAP servers do), it will be announced in response to the CAPABILITY command. Here's a response from the ancient version of Courier-IMAP I'm running:
CAPABILITY
* CAPABILITY IMAP4rev1 CHILDREN NAMESPACE THREAD=ORDEREDSUBJECT THREAD=REFERENCES SORT QUOTA LOGIN IDLE ACL ACL2=UNION STARTTLS
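Checking for the extension before issuing SORT can be sketched like this. The `supports` helper is hypothetical and simply looks for a token in the raw CAPABILITY line; it assumes the response has already been joined into a single line.

```python
def supports(capability_line, name):
    """True if `name` appears as a token in a '* CAPABILITY ...' line."""
    tokens = capability_line.upper().split()
    return name.upper() in tokens[2:]


line = ("* CAPABILITY IMAP4rev1 CHILDREN NAMESPACE THREAD=ORDEREDSUBJECT "
        "THREAD=REFERENCES SORT QUOTA LOGIN IDLE ACL ACL2=UNION STARTTLS")
has_sort = supports(line, "SORT")
has_esearch = supports(line, "ESEARCH")
```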