I'm reading the XEP-0124 / BOSH specification and do not understand the following sentence in section 9.1, Request Acknowledgements:
The only exception is that, after its
session creation response, the
connection manager SHOULD NOT include
an 'ack' attribute in any response if
the value would be the 'rid' of the
request being responded to.
In my words: I should not send an ACK if the response is for the last and only request (in the connection manager's queue).
But: the client has its own state machine. Maybe the client has already sent a second request -- while the first one has not been answered yet -- and expects to get two responses. In this case the client expects an ACK with the RID of the "older" request, and the connection manager has to set the ACK.
Conclusion: the connection manager MUST set the ACK as long as multiple requests are allowed.
I'm not sure, but is this paragraph meant only for the use case where no further request is sent by the client, the session creation phase has finished successfully, and the connection manager has to send "ping" messages to the client because of "wait" timeouts?
So, as I read it:
If the highest in-sequence RID that you have received is 11 (you might have received 14 after that, but it is out of sequence since 12 and 13 are missing), and you are responding to:
The same request, then you should not (it is recommended that you do not, but if you have a good reason to, then you may) send an 'ack' attribute.
An earlier held request (say RID 10), then you should set 'ack' to 11, since that is the highest in-sequence RID that you have received so far.
It's okay if the client sent multiple requests and the server doesn't yet know about them. This is because there is a chance that when the client sent 11, the server had no held connections and responded on that same connection. In that case, there are 2 requests in flight (11 & 12), but the response to each one acks that same request, since the server always has something to send back immediately.
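Put as a small sketch (the function and its names are mine, not from the XEP), the connection manager's decision boils down to:

def ack_value(responding_to_rid: int, highest_in_sequence_rid: int):
    # Per XEP-0124 section 9.1 as read above: omit 'ack' when it would just
    # echo the 'rid' of the request being answered; otherwise report the
    # highest RID received in sequence so far.
    if highest_in_sequence_rid == responding_to_rid:
        return None  # recommended: no 'ack' attribute on this response
    return highest_in_sequence_rid

So responding to request 11 itself yields no 'ack' attribute, while responding on the held request 10 yields ack='11'.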
Half-Established Connections
With a half-established connection I mean a connection for which the client's call to connect() returned successfully, but the server's call to accept() didn't. This can happen in the following way: the client calls connect(), resulting in a SYN packet to the server. The server goes into state SYN-RECEIVED and sends a SYN-ACK packet to the client. This causes the client to reply with an ACK, go into state ESTABLISHED and return from the connect() call. If the final ACK is lost (or ignored due to a full accept queue at the server, which is probably the more likely scenario), the server is still in state SYN-RECEIVED and the accept() does not return. Due to timeouts associated with the SYN-RECEIVED state, the SYN-ACK will be resent, allowing the client to resend the ACK. If the server is able to process the ACK eventually, it will go into state ESTABLISHED as well. Otherwise it will eventually reset the connection (i.e. send a RST to the client).
You can create this scenario by starting lots of connections on a single listen socket (if you do not adjust the backlog and tcp_max_syn_backlog). See this question and this article for more details.
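For what it's worth, here is a rough Python sketch of that backlog trick (my experiments below use Erlang; exact behaviour also depends on kernel settings such as the listen backlog, tcp_max_syn_backlog and SYN cookies):

import socket

# A listener with a tiny backlog that never calls accept(). Once the accept
# queue is full, a client's final ACK may be ignored, leaving the server side
# in SYN-RECEIVED even though connect() already returned on the client side.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)                      # tiny backlog, and we never accept()
addr = server.getsockname()

clients = []
for _ in range(50):
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.settimeout(10)
    c.connect(addr)                   # may still succeed for some of these
    clients.append(c)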
Experiments
I performed several experiments (with variations of this code) and observed some behaviour I cannot explain. All experiments were performed using Erlang's gen_tcp and a recent Linux, but I strongly suspect that the answers are not specific to this setup, so I tried to keep the description general here.
connect() -> wait -> send() -> receive()
My starting point was to establish a connection from the client, wait between 1 and 5 seconds, send a "Ping" message to the server and wait for the reply. With this setup I observed that the receive() failed with the error closed when I had a half-established connection. There was never an error during the send() on a half-established connection. You can find a more detailed description of this setup here.
connect() -> long wait -> send()
To see whether I could get errors while sending data on a half-established connection, I waited for 4 minutes before sending data. The 4 minutes should cover all timeouts and retries associated with the half-established connection. Sending data was still possible, i.e. send() returned without error.
connect() -> receive()
Next I tested what happens if I only call receive() with a very long timeout (5 minutes). My expectation was to get a closed error for the half-established connections, as in the original experiment. Alas, nothing happened, no error was thrown and the receive eventually timed out.
My questions
Is there a common name for what I call a half-established connection?
Why is the send() on a half-established connection successful?
Why does a receive() only fail if I send data first?
Any help, especially links to detailed explanations, is welcome.
From the client's point of view, the session is fully established, it sent SYN, got back SYN/ACK and sent ACK. It is only on the server side that you have a half-established state. (Even if it gets a repeated SYN/ACK from the server, it will just re-ACK because it's in the established state.)
The send on this session works fine because as far as the client is concerned, the session is established. The sent data does not have to be acknowledged by the far side in order to succeed (the send system call is finished when the data is copied into kernel buffers) but see below.
I believe here that the send actually is generating an error on the connection (probably a RST) because the receiving system cannot ACK data on a session it has not finished establishing. My guess is that any system call referencing the socket on the client side that happens after the send plus a short delay (i.e. when the RST has had a chance to come back) will result in an error.
The receive by itself never causes an error because the client side doesn't need to do anything (I mean TCP protocol-wise) for a receive; it's just idly waiting. But once you send some data, you've forced the server side's hand: it either has completed the session establishment (in which case it can accept the data) or it must send a reset (my guess here that it can't "hold" undelivered data on a session that isn't fully established).
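A hedged sketch of that sequence from the client's side (in Python rather than the question's Erlang; the half-established socket is assumed to already exist, e.g. via the backlog trick above):

import socket
import time

def probe(client: socket.socket) -> None:
    # send() succeeds: the data only has to be copied into local kernel buffers.
    client.sendall(b"Ping\x00")
    # Give the server's RST a chance to come back.
    time.sleep(1)
    try:
        client.recv(1024)          # expected to fail now, e.g. connection reset/closed
    except OSError as err:
        print("receive failed:", err)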
I am trying to implement the FIX 4.2 protocol, but it is difficult to understand the message log I attached below. Here a Logon (35=A) request was sent from the client with MsgSeqNum (34=1). Then, to test the ResendRequest and SequenceReset session-level messages, I sent a NewOrderSingle request with MsgSeqNum=7 (instead of MsgSeqNum=2, since subsequent messages should have an incremented MsgSeqNum after the logon request). As expected, because the received MsgSeqNum is higher than expected, Fiximulator responded with a ResendRequest (35=2) asking for messages from 2 to 0 (i.e. everything from 2 onwards, here 2 to 7). Why is Fiximulator not waiting for the client's reply, and instead sending a heartbeat message? And why is the client sending a ResendRequest in response to Fiximulator's ResendRequest instead of sending a SequenceReset message?
Please also explain the remaining cases if possible.
Thanks in advance.
What is the value of ResetOnLogon in your config file for the acceptor? The default value is N, so the sequence isn't being reset. Always check your config file or try debugging to figure out issues.
ResetOnLogon: Determines if sequence numbers should be reset when receiving a logon request. Acceptors only.
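ResetOnLogon is a QuickFIX session setting, so if that's what you are running, the acceptor's settings file would contain something along these lines (the session IDs below are made up for illustration):

[DEFAULT]
ConnectionType=acceptor
ResetOnLogon=Y

[SESSION]
BeginString=FIX.4.2
SenderCompID=FIXIMULATOR
TargetCompID=BANZAI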
How come every site explains that in SSE a single connection stays open between client and server ("With SSE, a client sends a standard HTTP request asking for an event stream, and the server responds initially with a standard HTTP response and holds the connection open"),
and then the server decides when to send data to the client, while when I try to implement SSE I see in Fiddler requests being sent every couple of seconds?
To me it feels like long polling and not one single connection that is kept open.
Moreover, it is not that the server decides to send data to the client and sends it; it sends data only when the client sends the next request.
If I respond with "retry: 10000", then even if something has happened that the server wants to report right now, it will reach the client only on the next request (in 10 seconds from now), which to me does not really look like a connection that is kept open with the server sending data as soon as it wants to.
Your server is closing the connection immediately. SSE has a built-in retry function for when the connection is lost, so what you are seeing is:
Client connects to server
Server mysteriously dies
Client waits two seconds then auto-reconnects
Server mysteriously dies
Client waits two seconds then auto-reconnects
...
To fix the server-side script, you want to go against everything your parents taught you about right and wrong, and deliberately create an infinite loop. So, it will end up looking something like this:
validate user, set up database connection, etc.
while(true){
get next bit of data
send it to client
flush
sleep 2 seconds
}
Where get next bit of data might be polling a DB table for new records since the last poll, scanning a file system directory for new files, etc.
Alternatively, if the server-side process is a long-running data analysis, your script might instead look like this:
validate user, set-up, etc.
while(true){
calculate next 1000 digits of pi
send them to client
flush
}
This assumes that the calculate line takes at least half a second to run; any more frequently and you will start to clog up the socket with lots of small packets of data for no benefit (the user won't notice that they are getting 10 updates/second instead of 2 updates/second).
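As a concrete (hedged) illustration of the first pattern, here is roughly what such an endpoint could look like in Python with Flask; get_next_data() is a placeholder for whatever polling logic you have:

import json
import time

from flask import Flask, Response

app = Flask(__name__)

def get_next_data():
    # Placeholder: poll a DB table, scan a directory, etc.
    return {"now": time.time()}

@app.route("/events")
def events():
    def stream():
        while True:                        # the deliberate infinite loop
            payload = json.dumps(get_next_data())
            yield f"data: {payload}\n\n"   # SSE framing: data line + blank line
            time.sleep(2)
    return Response(stream(), mimetype="text/event-stream")

The essential point is that the handler never returns; it keeps yielding events on the same open response instead of closing it after each message.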
I looked through the FIX 4.2 spec, and it is not clear to me what the expected behavior should be when the TCP connection is lost in the middle of a session.
More specifically, suppose the current sequence number is 100 and at this point the TCP connection is lost. When either side tries to resume the session, does it re-send message number 100, or does it start a new session with a logon?
In describing a FIX session, the spec says one session has one logon and one logout, but it could span multiple physical connections. This leads me to think that when the TCP connection is lost, the resumption should not start with a logon message, but I am not positive about that.
Thanks in advance!
The FIX protocol does not define anything related to the transport layer. There are some documents on the official web site that suggest how it can be implemented on top of this or that transport, but they are only suggestions.
Therefore, the expected behavior in case of a TCP/IP disconnect depends on the implementation. For instance, it is possible to have a system that does not care about TCP/IP disconnects at all, which would make those details irrelevant. In that case, the expected behavior would be to continue sending and receiving messages after the connection is re-established, and of course to proceed with a "recovery" of lost messages, if any. In reality, though, I have never seen a system like that.
In practice, all systems treat a TCP/IP disconnect as an implicit loss of the session and expect clients to send a logon upon re-connect.
When logging in, there are two options: a re-connecting session may send the next outgoing sequence number, or it may ask the server to reset the sequence (to 1). In the first case, the server side may send a logon acknowledgement if the sequence is greater than or equal to what it expected, or close (or even reject) the session if the received sequence number is less than expected. Additionally, if the sequence was greater than expected, the server will request a re-transmission. The client session monitors the server's sequence as well, and needs to request a re-transmission if it detects a gap (the received sequence is greater than expected). In the second case, if the server supports sequence reset, both incoming and outgoing sequences are reset to 1 and no messages are recovered.
In your case, if the connection is lost after sending a message with sequence number 100, the client would have to re-connect and send a logon with sequence 101, and proceed from there. Alternatively, it can connect and reset the sequence, in which case some messages might get lost.
Also, don't forget to check the specifics of the venue you connect to. There can be very weird details that are not specified by the FIX protocol at all, or even ones that go against the FIX protocol. For instance, ICE (indeed one of the more brain-dead exchanges in general) is one of the silliest in this regard: it doesn't allow re-connecting within the first 15 seconds, and then if clients cannot connect for 30 seconds, they should switch to a failover server. If a failover happens, it fails to keep the sequence numbers intact, and clients are left with no choice but to reset the sequence number.
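A compressed sketch of the acceptor-side decision described above (the function, its arguments and the return strings are illustrative, not from the spec):

def on_logon(received_seq: int, expected_seq: int, reset_requested: bool) -> str:
    if reset_requested:
        return "reset both sequence numbers to 1, no message recovery"
    if received_seq < expected_seq:
        return "sequence too low: close (or reject) the session"
    if received_seq > expected_seq:
        return "acknowledge the logon, then request re-transmission of the gap"
    return "acknowledge the logon and continue"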
Hope it makes things a bit clearer for you. Good Luck!
If the transport layer is TCP/IP, I would expect the session initiator to:
Re-establish a socket connection
Send a new logon message
The sequence number to use on the logon message depends on the type of session and on what has been agreed with the FIX session acceptor (see the spec for details). For sessions where there is no value in replaying any lost messages, e.g. market data feeds where the prices would be stale, it makes sense to send a logon message with sequence number 1 and set tag 141=Y (to reset the sequence numbers). For an orders session, where message replay might be required, the session initiator should generally log on with a sequence number one greater than the last message sent (and expect a logon response from the FIX session acceptor with a sequence number one greater than the last message received).
Unless you really need the message replay, it is cleaner and easier to reset the sequence numbers each time upon logon. This obviously depends on the FIX session acceptor (FIX server) supporting it. For things like STP feeds, I've found this to be far more reliable, and it is generally better for the application protocol to provide application-level replay facilities rather than relying on the brittleness of FIX session replay.
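To make the two options concrete, the relevant logon fields would look roughly like this (tag numbers are standard FIX 4.2; the sequence values are just the example numbers used in the answers above):

# Option 1: reset sequence numbers on logon (e.g. market data / STP feeds).
reset_logon = {
    35: "A",   # MsgType = Logon
    34: 1,     # MsgSeqNum restarts at 1
    141: "Y",  # ResetSeqNumFlag
    108: 30,   # HeartBtInt (seconds)
}

# Option 2: resume the session where it left off (e.g. an orders session).
resume_logon = {
    35: "A",
    34: 101,   # one greater than the last message sent before the disconnect
    108: 30,
}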
I already read this question about socket synchronization but I still don't get it.
Recently I was working on a relatively simple client/server app where the communication happens over a TCP socket. The client is written in PHP using the C-like functions PHP provides to interact with sockets (especially fsockopen and fgetc); the server is written in node.js using a Stream for outputting data.
The protocol is quite simple, the message is just a string which ends with a 0-byte character.
Basically it works like this:
SERVER: Message 1
CLIENT: Ack 1
SERVER: Message 2
CLIENT: Ack 2
....
This really worked fine, as my client processed one message at a time by reading char by char from the socket until a 0-byte was encountered, which designates the end of the message. The client then writes back to the server that it has successfully received the message (that's the Ack <message id> part).
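For reference, the one-message-at-a-time reading logic looked conceptually like this (sketched in Python here; the real client is PHP using fgetc):

def read_one_message(sock) -> str:
    # Read byte by byte until the 0-byte terminator, then return the message.
    chunks = []
    while True:
        byte = sock.recv(1)
        if byte == b"":
            raise ConnectionError("peer closed the connection")
        if byte == b"\x00":
            return b"".join(chunks).decode()
        chunks.append(byte)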
Now this happened:
SERVER: Message 1
CLIENT: Ack 1
SERVER: Message 2
CLIENT: Ack 2
SERVER: Message 3
Message 4
Message 5
Message 6
CLIENT: <DOH!>
....
Meaning the server unexpectedly sent multiple messages in one "batch" to the client, although every message is a single stream.write(...) operation on the server. It seemed like the messages were buffered somewhere and then sent to the client at once. My client code couldn't cope with multiple messages arriving on the socket WITHOUT an Ack response in between, so it cut off the remaining messages after id 3.
So my question is:
How synchronized are sockets in their reads and writes? From the question above I understand that a socket is basically two uni-directional pipes, which means they are not synchronized at all?
How can it happen that some messages were sent to my client in a simple "one message, one ack" manner and then suddenly multiple messages are written to the stream?
Does it actually change the picture if the socket is opened in a blocking/non-blocking manner?
I tested this on an Ubuntu VM (so no load or anything that could provoke strange behaviour) using PHP 5.4 and node 0.6.x.
TCP is an abstraction of a bi-directional stream, and as such has no concept of messages and cannot preserve message boundaries. There is no guarantee how multiple send() or recv() calls will map to TCP packets. You should treat send() as if calling it multiple times were equivalent to calling it once with the concatenation of all the data. More importantly, when receiving, you should make sure that your code interprets the incoming data exactly the same way, no matter how it was split over individual recv() calls.
To receive properly, you can use a buffer where you store incomplete messages. But be careful: when you have an incomplete message in the buffer, the next recv() call may complete the current message, provide zero or more additional complete messages, and possibly part of another incomplete message.
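A hedged Python sketch of that buffering approach, using the 0-byte delimiter from the question's protocol:

def extract_messages(sock, buffer: bytearray):
    # Append whatever recv() returns to a persistent buffer and split it on
    # the 0-byte delimiter; anything after the last delimiter stays in the
    # buffer as an incomplete message for the next call.
    data = sock.recv(4096)
    if data == b"":
        raise ConnectionError("peer closed the connection")
    buffer.extend(data)
    messages = []
    while True:
        end = buffer.find(b"\x00")
        if end == -1:
            break
        messages.append(bytes(buffer[:end]).decode())
        del buffer[:end + 1]
    return messages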
The blocking or non-blocking mode doesn't change anything here - it's only about the way your application interfaces with the OS.
There are two synchronization concepts to deal with:
The (generally) synchronous operation of send() or recv().
The asynchronous way that one process sends a message and the way the other process handles the message.
If you can, try to avoid a design that keeps the client and server in process-synchronized "lock step" with each other. That's asking for trouble. What if one of the processes closes unexpectedly? The other process/thread might hang on a recv() that will never come. It's one thing for your design to expect each message to be acknowledged eventually, but it's quite another for your design to expect that only one message can be sent, which must then be acknowledged, before you may send another.
Consider this:
Server: send 1
Client: ack 1
Server: send 2
Server: send 3
Client: ack 2
Server: send 4
Client: ack 3
Client: ack 4
A design that can accommodate this situation is better than one that expects:
Server: send 1
Client: ack 1
Server: send 2
Client: ack 2
Server: send 3
Client: ack 3
Server: send 4
Client: ack 4
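One way to accommodate the first interleaving is simply to track outstanding ids instead of blocking on each ack; a minimal sketch (names are illustrative, using the question's 0-byte framing):

# The sender keeps going and only records which ids still need an ack;
# acks may arrive later, batched, or out of order.
outstanding: set[int] = set()

def send_message(sock, msg_id: int, payload: bytes) -> None:
    sock.sendall(payload + b"\x00")
    outstanding.add(msg_id)

def handle_ack(msg_id: int) -> None:
    outstanding.discard(msg_id)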