Infinity confirmation loop - sockets

I came to interesting theoretical problem:
Let's assume we have Program A and Program B connected via some IPC like tcp socket or named pipe. Program A sends some data to Program B and depending on success of data delivery both A and B do some operations. However B should do its operation only if it is sure that A has got the delivery confirmation. So we came up to 3 connections:
A -> B [data tranfer]
B -> A [delivery confirmation]
A -> B [confirmation of getting delivery confirmation]
It may look weird but the goal is to don't do any operation neither on A nor B until both sides know that data has been transfered.
And here is the problem because second connection is for confirmation of success of first. And third is for confirmation of second but in fact there is no guarantee that connection 2 and 3 not fail and in that case we fall into infinite loop of confirmation. Is there some CS theory which solve that problem?

If I read your question right, the problem is called "the two general's problem". The gist of the issue is that the last entity that sends either a message or an acknowledgement knows nothing about the status of what it just sent, and so on.

Related

ZeroMQ: Which socket types for arbitrary communication between exactly 2 peers?

I'm using 0MQ to let multiple processes talk to each other (IPC sockets, but should also work via TCP across different nodes). My code is similar to a client/server pattern, but REQ/REP sockets are not enough. Here is a sample conversation. See below for further details.
Process A
Process B
open socket
not started
start process B
-
-
open socket, connect to A
-
send hello (successful start, socket information)
request work
-
-
do work
-
send response (work result 1)
-
send response (work result 2)
-
send unsolicited message
-
send response (work finished)
request termination
-
Actually, A is (even though doing all the requests) closer to be the server component, since it is constantly running. Based on external triggers, A starts a sort of plugin process B.
Every request needs to be answered by a finished response. Before that, N (between 0 and an arbitrary upper bound) responses can be sent from B.
A new request can be sent from A even when the current request is still ongoing (no finished message received). If relevant, the code could be updated to buffer the requests.
B sends an initial message which is not preceded by a request from A.
B can send other messages (logging) anywhere in between, also not preceded by a request.
Optional: A single socket in A should handle multiple plugin processes B, C, D...
A DEALER/ROUTER combination would probably match all requirements, but might be a bit too much. Process B will only ever connect to a single peer. And without the optional requirement above, the same would be true for process A as well. So I'm a bit hesitant to use DEALER and ROUTER sockets which are both able to handle multiple peers.

Always use neg[.z.w] to ensure that all messages are asynchronous?

Consider the following definition on server:
f:{show "Received ",string x; neg[.z.w] (`mycallback; x+1)}
on client side:
q)mycallback:{show "Returned ",string x;}
q)neg[h] (`f; 42)
q)"Returned 43"
In the q for motrtals, the tip says:
When performing asynchronous messaging, always use neg[.z.w] to ensure
that all messages are asynchronous. Otherwise you will get a deadlock
as each process waits for the other.
therefore I change the definition on the server as:
f:{show "Received ",string x; .z.w (`mycallback; x+1)}
everything goes fine, and I haven't seen any deadlocks.
Can anyone give me an example to show why I should always use neg[.z.w]?
If I understand you're question correctly I think your asking how sync and async messages work. The issue with the example you have provided is that x+1 is a very simple query that can be evaluated almost instantaneously. For a more illustrative example consider changing this to a sleep (or a more strenuous calculation, eg. a large database query).
On your server side define:
f:{show "Received ",string x;system "sleep 10"; neg[.z.w] (`mycallback; x+1)}
Then on your client side you can send the synchronous query:
h(`f; 42)
multiple times. Doing this you will see there is no longer a q prompt on the client side as it must wait for a response. These requests can be queued and thus block both the client and server for a significant amount of time.
Alternatively, if you were to call:
(neg h)(`f; 42)
on the client side. You will see the q prompt remain, as the client is not waiting for a response. This is an asynchronous call.
Now, in your server side function you are looking at using either .z.w or neg .z.w. This follows the exact same principal however from a server perspective. If the response to query is large enough, the messaging can take a significant amount of time. Consequently, by using neg this response can be sent asynchronously so the server is not blocked during this process.
NOTE: If you are working on a windows machine you will need to swap out sleep for timeout or perhaps a while loop if you are following my examples.
Update: I suppose one way to cause such a deadlock would be to have two dependant processes, attempting to synchronously call each other. For example:
q)\p 10002
q)h:hopen 10003
q)g:{h (`f1;`)}
q)h (`f;`)'
on one side and
q)\p 10003
q)h:hopen 10002
q)f:{h (`g;`)}
q)f1:{show "test"}
on the other. This would result in both processes being stuck and thus test never being shown.
Joe's answer covers pretty much everything, but to your specific example, a deadlock happens if the client calls
h (`f; 42)
Client is waiting response from the server before processing the next request, but the server is also waiting response from the client before it completes the client's request.

Mirth Connect: 2 Way ACK

I'm trying to figure out if it is somehow possible to setup Mirth to send 2 ACK back to the caller Application:
A) 1 ACK sent from Mirth to the caller when the transmission has been received from Mirth;
B) 1 ACK sent from Mirth to the caller after the channel is finished processing the message.
I know that Mirth can either be configured to send ACK before processing (case A above) or after processing (Case B above), but I could not find any way to send both.
Has anyone had experience in doing this?
Thank you all for your help.
Mirth uses a single responseMap to store acknowledgement which is processed after all scripts. So, if you put anything there when a message is received, this Ack will be overridden with a new Ack placed into the same map at the end. And only the latter will be sent, which you've already experienced I guess.
If I'm correct, what you are trying to achieve is, first, to confirm that the message is received by a remote location (let's call it System B) and, second, is to confirm that the message successfully processed. If your client (System A) is capable to send a message to two endpoints at System B then you may create two receiving channels on the System B side, one of these channels sends ACK immediately after receiving the message and does nothing. The other channel processes the message and sends ACK in postprocessor.
There are other options, say, on System B side redirect an incoming message to another channel which forms Ack and sends it back to System A, but then System A should have a listener on its side.
Or, System B may have a receiving channel that sends Ack immediately, routes the message to another channel that is connected to its destinations, and remove that destination to prevent incoming message to propagate to that channel. The second channel processes the message and sends Ack back to the first channel. First channel resends that Ack back to System A. (I have not tested such configuration, so this is just an idea to overcome a single responseMap. It may not work.)

What's the expected behavior when TCP connection is lost?

I looked through FIX v4.2 spec, it is not clear to me what the expected behavior it should be when the TCP connection is lost in the middle of a session.
More specifically, suppose the current sequence number is 100 and at this point the TCP connection is lost, when either side tries to resume the session, it re-sends message number 100, or starts a new session with logon?
In describing FIX session, the spec says one session has one logon and one logout, but could go across multiple physical connections. This leads me to think that when the TCP connection is lost, the resuming process should not be starting with a logon message, but I am not positive on that.
Thanks in advance!
FIX protocol does not define anything related to the transport protocol. There were some documents on the official web site that only suggest how it can be implemented on top of this or that protocol, but only suggests.
Therefore, the expected behavior in case of TCP/IP disconnect depends on implementation. For instance, it is possible to have a system that does not care about TCP/IP disconnects at all, which would make those details irrelevant. In that case, the expected behavior would have been to continue sending receiving messages after connection is re-established, and of course proceed to a “recovery” of lost messages, if any. In reality, though, I have never seen a system like that.
In practice, all systems treat TCP/IP disconnects as implicit lose of session and expect clients to send a logon upon re-connect.
When logging in, there are two options — a re-connecting session may send the next outgoing sequence number or it may ask server to reset the sequence (to 1). In first case, the server side may send a logon acknowledgement if sequence is greater or equal to what it expected, or close (or even reject) the session if the received sequence number is less than expected. Additionally, if the sequence was greater than expected, server will issue a re-transmission. Client session monitors the sequence of the server as well, and needs to request a re-transmission if it detects a gap (received sequence is greater than expected). In the second case, if the server supports sequence reset, both in and out sequences are reset to 1 and no messages are recovered.
In your case, if connection is lost after sending a message with sequence number 100, client would have to re-connect and send a logon with sequence 101, and proceed from there. Alternatively, connect and reset the sequence, in which case some messages might get lost.
Also, don’t forget to check specifics of the venue you connect to. There could be very weird details that are not specified by the FIX protocol at all, or even those going against the FIX protocol. For instance, ICE (indeed one of the most brain-dead exchanges in general) is one of the silliest exchanges in this regard — it doesn’t allow re-connecting within first 15 seconds, and then if clients cannot connect for 30 seconds, they should switch to a failover server. If failover happens, they fail to keep the sequence number in tact, and clients are left no choice but reset the sequence number.
Hope it makes things a bit clearer for you. Good Luck!
If the transport layer is TCP/IP, I would expect the session initator to:
Re-establish a socket connection
Send a new logon message
The sequence number to use on the logon message depends on the type of session and what has been agreed with the FIX session acceptor (see the spec for details). For sessions where there is no value in replaying any lot messages e.g. market data feeds where the prices would be stale, it makes sense to send a logon message with sequence number 1 and set tag 141=Y (to reset the sequence numbers). For an orders session, where message replay might be required, the session initiator should generally logon with a sequence number of one greater than the last message sent (and expect a logon response from the FIX session acceptor with sequence number of 1 greater than the last message received).
Unless you really need the message replay, it is cleaner and easier to reset the sequence numbers each time upon logon. This obviously depends on the FIX session acceptor (FIX server) support for this. For things like STP feeds, I've found this to be far more reliable and it is generally better for the application protocol to provide application level replay facilities rather than relying on the brittleness of FIX session replay.

Receiving FD_CLOSE when there should be FD_READ

I have a strange problem on one of my clients workstation. I have a simple application that exchanges some data over network between two endpoints.
Basically the transaction goes like this:
Client A listens for incomming connection
Client B connects to A and sends some data
Client A read this data for further processing
Now the strange part is that client A does not receive whole data (sometimes it a part of buffer sometimes it is empty).
The A client uses WSAEventSelect function and waits for FD_READ to read data sent by B and for FD_CLOSE to detect disconnection.
Usually ( everytime except this one particular client) the FD_READ is signaled, data is processed and after that FD_CLOSE is signaled and all is fine, but here instead FD_READ i receive FD_CLOSE.
Can someone tell me how this is possible? Another thing is that program was working fine for about a year and suddenly it crashed.
Now the strange part is that client A does not receive whole data (sometimes it a part of buffer sometimes it is empty).
There's nothing strange about that, that's how TCP works, except that you will never receive zero bytes in blocking mode.
Usually ( everytime except this one particular client) the FD_READ is signaled, data is processed and after that FD_CLOSE is signaled and all is fine, but here instead FD_READ i receive FD_CLOSE.
Note that FD_READ can be signalled any number of times, not just once. You're not guaranteed to receive an entire message in a single read.
Can someone tell me how this is possible?
The client has closed the connection.
Quoting http://msdn.microsoft.com/en-us/library/windows/desktop/ms741576%28v=vs.85%29.aspx
"An application should check for remaining data upon receipt of FD_CLOSE to avoid any possibility of losing data."
So if the error code associated with the FD_CLOSE notification is 0, you should check to see if you still have data to read, that might be where your missing data is.
If the error code is NOT 0, then there was an error and the missing data is probably lost.