QuickFix maintaining sequence number between multiple acceptors - quickfixn

I am using the QuickFIX/n library. I have an acceptor running on 2 machines and one initiator. Currently I am having issues maintaining sequence numbers between the initiator and the acceptor when one of the acceptors goes down.
For example, the initiator is sending and receiving messages from the acceptor on machine 1. The last SeqNum sent by the acceptor on machine 1 was 5 when it went down. Now the acceptor on machine 2 is connected, but the initiator is sending Logout messages saying that MsgSeqNum is too low: expecting SeqNum 6 but receiving 1.
So how do I ensure that the acceptor on machine 2 will start from SeqNum 6 and not 1?

I think the only way this could work is if both Acceptors are using the same source for their message store.
If you are using a FileStore, then that file would have to be on a shared drive, with both Acceptors pointing to it.
Updated answer based on comment:
I didn't realize your acceptors were running simultaneously. (I don't know how that would work, though. How does the initiator get routed to the second acceptor when the first one dies?)
Parenthesized question aside, a custom DB store could work. It's pretty easy to implement the IMessageStore and IMessageStoreFactory interfaces. Both acceptors could point to the same DB; as long as they're not both writing to the same table at the same time, you should be good.
It doesn't have to be a DB, of course. Any persistent location that allows two connections is fine. Just implement an IMessageStore that works with it.
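A minimal sketch of the idea, modelling only the sequence-number half of a message store. `SharedSeqTable` and `SharedSeqStore` are hypothetical names (this is not the QuickFIX/n `IMessageStore` interface); an in-process map stands in for the shared DB that both machines would reach.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for one shared persistent location (for example a DB
// table). Only the sequence-number part of a message store is modelled here.
final class SharedSeqTable {
    // sessionId -> {nextSenderMsgSeqNum, nextTargetMsgSeqNum}
    private final Map<String, int[]> rows = new HashMap<>();

    private int[] row(String sessionId) {
        return rows.computeIfAbsent(sessionId, k -> new int[] {1, 1});
    }

    synchronized int nextSender(String sessionId) { return row(sessionId)[0]; }
    synchronized int nextTarget(String sessionId) { return row(sessionId)[1]; }
    synchronized void incrSender(String sessionId) { row(sessionId)[0]++; }
    synchronized void incrTarget(String sessionId) { row(sessionId)[1]++; }
}

// Hypothetical per-acceptor view: each acceptor reads and writes the same table
// instead of a local file, so a standby continues the same sequence series.
final class SharedSeqStore {
    private final SharedSeqTable table;
    private final String sessionId;

    SharedSeqStore(SharedSeqTable table, String sessionId) {
        this.table = table;
        this.sessionId = sessionId;
    }

    int getNextSenderMsgSeqNum() { return table.nextSender(sessionId); }
    int getNextTargetMsgSeqNum() { return table.nextTarget(sessionId); }
    void incrNextSenderMsgSeqNum() { table.incrSender(sessionId); }
    void incrNextTargetMsgSeqNum() { table.incrTarget(sessionId); }
}
```

If the acceptor on machine 1 sends messages 1 to 5 and dies, a `SharedSeqStore` opened on machine 2 for the same session reports 6 as the next outgoing number. In a real deployment the `synchronized` methods would be replaced by the DB's own transactions or locking.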

Related

How to handle reordered RPC in raft

When implementing the Raft algorithm, I found a situation that may or may not harm the cluster.
It is reasonable to assume that some AppendEntries RPCs from the Leader are received out of order (due to network delay or other reasons). Suppose the Leader sends a heartbeat AppendEntries RPC to peer A with prev_log_index = 1, then sends another AppendEntries RPC with entry 2, and then crashes (I ensure this happens immediately via a callback in my test). If the two RPCs are handled in the order in which they were sent, entry 2 will be inserted successfully. However, if the heartbeat RPC is delayed, peer A will first insert entry 2 and respond to the Leader. Then the delayed heartbeat arrives, and peer A erases entry 2, because the entry conflicts with the Leader's prev_log_index = 1. So peer A erases a log entry by mistake.
To dig a little deeper: if the Leader doesn't crash immediately, will it fix this? I think if peer A responds to the delayed heartbeat correctly, the Leader will notice and fix it in some later RPCs.
However, what if peer A's response to entry 2 leads to commit_index advancing? In this case peer A votes to advance commit_index to 2, even though it no longer has entry 2, so there may not be enough votes to justify the advance. If the Leader crashes now, a node with fewer log entries may be elected Leader. And I did encounter this situation during my testing.
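The scenario can be reproduced with a toy follower log. In this sketch (hypothetical `FollowerLog` class; each list element is the term of one log entry, the term check on the RPC itself is omitted) `appendNaive` always drops everything after `prev_log_index`, which is the behaviour described above. For comparison, `appendFigure2` follows receiver rules 3 and 4 of Figure 2 in the Raft paper: delete existing entries only when one conflicts with a new entry (same index, different term), and append only entries not already present.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical follower log: element i holds the term of log index i + 1.
final class FollowerLog {
    final List<Integer> terms = new ArrayList<>();

    private boolean prevMatches(int prevLogIndex, int prevLogTerm) {
        return prevLogIndex == 0
            || (prevLogIndex <= terms.size() && terms.get(prevLogIndex - 1) == prevLogTerm);
    }

    // Naive handler: always truncate after prevLogIndex, then append.
    boolean appendNaive(int prevLogIndex, int prevLogTerm, List<Integer> entries) {
        if (!prevMatches(prevLogIndex, prevLogTerm)) return false;
        terms.subList(prevLogIndex, terms.size()).clear();
        terms.addAll(entries);
        return true;
    }

    // Figure 2 rules 3-4: truncate only on a real conflict, append what is missing.
    boolean appendFigure2(int prevLogIndex, int prevLogTerm, List<Integer> entries) {
        if (!prevMatches(prevLogIndex, prevLogTerm)) return false;
        for (int i = 0; i < entries.size(); i++) {
            int pos = prevLogIndex + i; // 0-based position of new entry i
            if (pos < terms.size()) {
                if (terms.get(pos).equals(entries.get(i))) continue; // already present
                terms.subList(pos, terms.size()).clear();            // conflict: drop suffix
            }
            terms.add(entries.get(i));
        }
        return true;
    }
}
```

Delivering the entry-2 RPC before the heartbeat (both with prev_log_index = 1) leaves the naive log with one entry and the Figure 2 log with two: an empty heartbeat carries no entries, so under those rules it has nothing to conflict with.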
My question is:
Is my reasoning correct?
If reordered RPCs are a real problem, how should I solve it? Is indexing and caching all RPCs, and forcing them to be handled one by one, a good solution? I found it hard to implement in gRPC.
Raft assumes an ordered stream protocol such as TCP. That is, if a message arrives out of order, it is buffered until its predecessor arrives. (This behavior is why TCP exists: individual packets can take separate routes between servers, so out-of-order arrival is quite likely, and most applications prefer the peace of mind of a strict ordering.)
Other protocols, such as plain old Paxos, can work with out-of-order messages, but are typically much slower than Raft.
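The "buffer until the predecessor arrives" behaviour can be sketched as a small reorder buffer. This assumes each message carries a per-sender sequence number starting at 1 (Raft RPCs do not include one by themselves; here it plays the role that TCP's byte sequence plays), and `InOrderBuffer` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-order delivery buffer keyed by a per-sender sequence number.
final class InOrderBuffer<T> {
    private final TreeMap<Long, T> pending = new TreeMap<>();
    private long nextExpected = 1;

    // Accepts a possibly out-of-order message and returns every message that
    // can now be handed to the application, in send order.
    List<T> receive(long seq, T msg) {
        List<T> ready = new ArrayList<>();
        if (seq < nextExpected) {
            return ready; // duplicate of something already delivered
        }
        pending.putIfAbsent(seq, msg);
        while (!pending.isEmpty() && pending.firstKey() == nextExpected) {
            ready.add(pending.pollFirstEntry().getValue());
            nextExpected++;
        }
        return ready;
    }
}
```

With the question's timing, the entry-2 RPC (sequence 2) is held back until the heartbeat (sequence 1) arrives, and then both are delivered as heartbeat first, entry 2 second.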

quickfixj initiator disconnecting due to seqnum too low

My QuickFIX/J initiator gets "Disconnecting: Encountered END_OF_STREAM" while trying to log on to the acceptor. We are using a vendor's FIX engine as the acceptor, and the feedback from the acceptor is that the logon request for xxxx was not accepted: "incoming too small, expect 305, received 27".
I read the QuickFIX documentation but didn't understand exactly what the proper solution for the sequence number mismatch is. I understand that if I am disconnected, my initiator will send a ResendRequest (35=2) with the initiator-side seqnum, asking the acceptor to resend the messages and fill the gap.
But in which cases will a lower seqnum sent by the initiator be rejected by the acceptor, with the connection refused?
And what's the proper procedure to handle this kind of rejection and reconnect? In order not to lose any messages, how should both sides do the reset and fill the gap?
In case there is a break between the initiator and the acceptor, what's the recommended way to keep the messages in sync without losing any?
Because of the first sentence of your question, I would like to point you to an answer about the same error message, "Disconnecting: Encountered END_OF_STREAM". It quotes a blog post by bhageera:
In the end the reason was pretty silly… the counterparty I was connecting to allows only 1 connection per user/password (i.e. session with those credentials) at a time. As it turns out there was another application using the same credentials against the same TargetCompID. As soon as that application was killed off, the current one logged in fine.
I searched for the cause of the bug for a while, until I realized that I had two initiators with the same credentials running on two different test environments.
According to the default logic in QuickFIX/J:
QuickFIX/J manages 2 sequence numbers: the expected sequence number to receive (targetSeqNum) and the next sequence number to send (nextSeqNumber).
It checks the next expected target SeqNum against the received SeqNum. If a mismatch is detected, it applies the following logic:
if lower than expected SeqNum, logout
if higher, send a resend request
In your case, the received SeqNum was lower than expected, so the session was disconnected.
Reason for receiving a higher than expected SeqNum:
The receiver missed some messages, so this can be a normal scenario.
Reason for receiving a lower than expected SeqNum (your case):
One of the counterparties reset its sequence number. This is not expected; a reset should be agreed by both counterparties.
In a normal scenario, whenever you miss a message you will receive a higher number, and QuickFIX/J will handle it.
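The check described above can be sketched as follows. `TargetSeqCheck` and `SeqAction` are hypothetical names, not QuickFIX/J classes, and the sketch deliberately leaves out details a real engine handles, such as possible-duplicate (PossDupFlag) messages.

```java
// Hypothetical model of the incoming MsgSeqNum check; not QuickFIX/J code.
enum SeqAction { PROCESS, RESEND_REQUEST, LOGOUT }

final class TargetSeqCheck {
    private int expectedTargetSeqNum;

    TargetSeqCheck(int expectedTargetSeqNum) {
        this.expectedTargetSeqNum = expectedTargetSeqNum;
    }

    SeqAction onMessage(int receivedSeqNum) {
        if (receivedSeqNum < expectedTargetSeqNum) {
            // "MsgSeqNum too low": the other side reset without agreement.
            return SeqAction.LOGOUT;
        }
        if (receivedSeqNum > expectedTargetSeqNum) {
            // Gap: ask for expectedTargetSeqNum .. receivedSeqNum - 1 again.
            return SeqAction.RESEND_REQUEST;
        }
        expectedTargetSeqNum++; // in sequence: process and move on
        return SeqAction.PROCESS;
    }

    int expected() { return expectedTargetSeqNum; }
}
```

With the numbers from the question, an acceptor expecting 305 that receives 27 lands in the first branch, which is why the logon is refused rather than triggering a resend.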

How are out-of-order and wait-free writes handled?

As stated in Guarantees:
Sequential Consistency - Updates from a client will be applied in the order that they were sent.
Let's assume a client makes 2 updates (update1 and update2) in a very short time window (I understand ZooKeeper is good at read-dominated applications). So my questions are:
Is it possible that update2 is received before update1, so that for ZooKeeper update1 has a later stamp than update2? I assume yes, due to the nature of network connections. If this is the case, the client will lose its update2 and keep update1. Is there any way ZooKeeper can ACK back to the client with a different stamp, or other data that lets the client determine whether update2 was really received after update1? Basically, ZooKeeper would tell the client what it sees on the server side, which gives the client some information to act on if that's not what it wants.
What if there is a leader failure after receiving and confirming update1 and before receiving update2? I assume such writes are persisted somewhere on disk/DB etc. When the new leader comes up, will it catch up first, meaning apply update1, before confirming update2 back to the client?
Just curious: since ZooKeeper claims it supports wait-free writes, does that mean there is a message queue built into ZooKeeper to hold incoming writes? Otherwise, if the leader has to make sure the update is propagated to all other followers, the client is actually blocked during the replication process. I am guessing that's part of the reason ZooKeeper does not support write-heavy applications.
For the first two questions, I think you can find details in Zookeeper's paper.
At the packet level it is quite normal for different operations from the same client to arrive out of order, but ZooKeeper uses TCP, which ensures that the packets are delivered to the server in the order they were sent.
The leader must write operations to its write-ahead log before it can confirm them. The problem then splits along two dimensions. First, consider whether the leader can recover before the followers notice the leader failure. If yes, nothing bad happens: operations issued during the failure are lost, and the client will resend them. If not, consider whether the leader had issued a proposal before it failed. If it failed before proposing, the client will learn of the failure. If it had proposed, there must be at least one node in the cluster that has the newest transactions, and that node will become the new leader in the next election round. When the original leader recovers, it will realize it is no longer the leader (every ZooKeeper transaction carries a 64-bit transaction id, the zxid, whose high 32 bits are the epoch and whose low 32 bits are the proposal counter). It will then communicate with the new leader and get updated (sometimes it needs to truncate its local transaction log first).
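The zxid layout explains how a recovered leader can tell it is stale: any zxid from a newer epoch compares greater than every zxid from an older one, regardless of the counter. A small hypothetical helper (`Zxid` is an illustrative name, not ZooKeeper's API) that packs and unpacks the two halves:

```java
// Hypothetical helper for the zxid layout: high 32 bits = epoch,
// low 32 bits = proposal counter within that epoch.
final class Zxid {
    static long make(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    static long counter(long zxid) {
        return zxid & 0xffffffffL;
    }
}
```

For example, the first proposal of epoch 6 is larger than the last possible proposal of epoch 5, so an old leader holding epoch-5 transactions sees that the cluster has moved on.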
I don't know the details, since I haven't read ZooKeeper's source code. But the leader only needs acknowledgements from a majority of the servers before it responds to clients. ZooKeeper provides both blocking and non-blocking APIs, and you can choose whichever you like.
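The majority rule is simple arithmetic. A minimal sketch (hypothetical `Quorum` helper) assuming a plain majority quorum over the whole ensemble, with the leader's own write counted as one acknowledgement:

```java
// Hypothetical majority-quorum check; assumes the leader counts its own ack.
final class Quorum {
    static boolean isCommitted(int acks, int ensembleSize) {
        return acks > ensembleSize / 2; // strict majority
    }
}
```

In a 5-server ensemble, 3 acknowledgements are enough, so the client does not wait for the two slowest servers; this is why a write does not block on replication to every follower.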

What's the expected behavior when TCP connection is lost?

I looked through the FIX v4.2 spec, and it is not clear to me what the expected behavior should be when the TCP connection is lost in the middle of a session.
More specifically, suppose the current sequence number is 100 and at this point the TCP connection is lost. When either side tries to resume the session, does it re-send message number 100, or start a new session with a logon?
In describing a FIX session, the spec says one session has one logon and one logout but can span multiple physical connections. This leads me to think that when the TCP connection is lost, the resuming process should not start with a logon message, but I am not positive about that.
Thanks in advance!
The FIX protocol does not define anything related to the transport protocol. There were some documents on the official website that suggest how it can be implemented on top of one protocol or another, but they are only suggestions.
Therefore, the expected behavior on a TCP/IP disconnect depends on the implementation. For instance, it is possible to have a system that does not care about TCP/IP disconnects at all, which would make those details irrelevant. In that case, the expected behavior would be to continue sending and receiving messages after the connection is re-established and, of course, to recover any lost messages. In reality, though, I have never seen a system like that.
In practice, all systems treat a TCP/IP disconnect as an implicit loss of session and expect clients to send a logon upon re-connect.
When logging in, there are two options: a re-connecting session may send its next outgoing sequence number, or it may ask the server to reset the sequence (to 1). In the first case, the server may send a logon acknowledgement if the sequence number is greater than or equal to what it expected, or close (or even reject) the session if the received sequence number is less than expected. Additionally, if the sequence number was greater than expected, the server will request a re-transmission. The client session monitors the server's sequence as well, and needs to request a re-transmission if it detects a gap (received sequence number greater than expected). In the second case, if the server supports sequence reset, both incoming and outgoing sequences are reset to 1 and no messages are recovered.
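A minimal sketch of the acceptor side of those two options. `AcceptorLogon` and `LogonOutcome` are hypothetical names; the reset case assumes the client signals it with ResetSeqNumFlag (141=Y) on a Logon carrying MsgSeqNum 1, and the sketch does not model the server's own logon response being sent.

```java
// Hypothetical acceptor-side model of the two logon options described above.
enum LogonOutcome { ACCEPT, ACCEPT_AND_REQUEST_RESEND, REJECT_TOO_LOW, ACCEPT_WITH_RESET }

final class AcceptorLogon {
    int expectedIncoming; // next MsgSeqNum expected from the client
    int nextOutgoing;     // MsgSeqNum the server will put on its next message

    AcceptorLogon(int expectedIncoming, int nextOutgoing) {
        this.expectedIncoming = expectedIncoming;
        this.nextOutgoing = nextOutgoing;
    }

    LogonOutcome onLogon(int logonSeqNum, boolean resetSeqNumFlag) {
        if (resetSeqNumFlag) {
            // 141=Y: the Logon itself is MsgSeqNum 1 and both directions
            // restart, so nothing from the old series is recovered.
            expectedIncoming = 2;
            nextOutgoing = 1;
            return LogonOutcome.ACCEPT_WITH_RESET;
        }
        if (logonSeqNum < expectedIncoming) {
            return LogonOutcome.REJECT_TOO_LOW; // close or reject the session
        }
        if (logonSeqNum > expectedIncoming) {
            // Gap: accept, then ask for expectedIncoming .. logonSeqNum - 1.
            return LogonOutcome.ACCEPT_AND_REQUEST_RESEND;
        }
        expectedIncoming++;
        return LogonOutcome.ACCEPT;
    }
}
```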
In your case, if the connection is lost after sending a message with sequence number 100, the client would have to re-connect, send a logon with sequence number 101, and proceed from there. Alternatively, it can connect and reset the sequence, in which case some messages might get lost.
Also, don't forget to check the specifics of the venue you connect to. There could be very weird details that are not specified by the FIX protocol at all, or that even go against it. For instance, ICE (indeed one of the most brain-dead exchanges in general) is one of the silliest in this regard: it doesn't allow re-connecting within the first 15 seconds, and if clients then cannot connect for 30 seconds, they should switch to a failover server. If failover happens, ICE fails to keep the sequence number intact, and clients are left with no choice but to reset it.
Hope it makes things a bit clearer for you. Good Luck!
If the transport layer is TCP/IP, I would expect the session initiator to:
Re-establish a socket connection
Send a new logon message
The sequence number to use on the logon message depends on the type of session and what has been agreed with the FIX session acceptor (see the spec for details). For sessions where there is no value in replaying any lost messages, e.g. market data feeds where the prices would be stale, it makes sense to send a logon message with sequence number 1 and set tag 141=Y (to reset the sequence numbers). For an orders session, where message replay might be required, the session initiator should generally log on with a sequence number one greater than the last message sent (and expect a logon response from the FIX session acceptor with a sequence number one greater than the last message received).
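The choice above can be written down as a small decision function. `SessionKind`, `LogonPlan` and `forReconnect` are hypothetical names for illustration; the only FIX facts used are MsgSeqNum and ResetSeqNumFlag (tag 141).

```java
// Hypothetical helper mirroring the advice above: which MsgSeqNum and
// ResetSeqNumFlag (141) an initiator would put on its reconnect Logon.
enum SessionKind { MARKET_DATA, ORDERS }

final class LogonPlan {
    final int msgSeqNum;
    final boolean resetSeqNumFlag; // true means send 141=Y

    LogonPlan(int msgSeqNum, boolean resetSeqNumFlag) {
        this.msgSeqNum = msgSeqNum;
        this.resetSeqNumFlag = resetSeqNumFlag;
    }

    static LogonPlan forReconnect(SessionKind kind, int lastSentSeqNum) {
        if (kind == SessionKind.MARKET_DATA) {
            // Stale prices are worthless: restart both sides at 1.
            return new LogonPlan(1, true);
        }
        // Orders: continue the series so missed messages can be replayed.
        return new LogonPlan(lastSentSeqNum + 1, false);
    }
}
```

For the example in the question (last message sent was 100), an orders session would log on with MsgSeqNum 101 and no reset flag.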
Unless you really need the message replay, it is cleaner and easier to reset the sequence numbers each time upon logon. This obviously depends on the FIX session acceptor (FIX server) support for this. For things like STP feeds, I've found this to be far more reliable and it is generally better for the application protocol to provide application level replay facilities rather than relying on the brittleness of FIX session replay.

QuickFix Sequence Reset not working

I am working on QuickFIX/J (FIX 4.2) to submit orders to an acceptor FIX engine. Basically, I need help on two accounts:
When I first try to establish a connection with the acceptor, the acceptor rejects the initial Logon requests saying "Msg Seq No too Low". After this, my initiator keeps incrementing the outgoing sequence number by one, and when this number matches the one expected by the acceptor engine, I get a stable connection. To speed this process up, I began to extract the expected sequence number from the reject message sent by the acceptor engine and changed the outgoing sequence number for my engine using
session.setNextTargetMsgSeqNum(expectedSeqNo).
However, later on, if my engine finds an incoming sequence number higher than expected, it sends a Resend Request. In response, the other party sends back a Sequence Reset message (35=4, 123=Y). After receiving this message, my engine's incoming sequence number should automatically be set to the one in the Sequence Reset message. But this does not happen, and my engine keeps sending resend requests with no change in the incoming sequence number.
Interestingly, I found that this works when I don't explicitly change the outgoing sequence number in the first place (using setNextTargetMsgSeqNum).
Why is my engine not showing the expected behavior when it gets a Sequence Reset message?
I have talked to the other party, and they won't set ResetOnLogon=Y in their configuration. So every time my engine comes up, it often sends a Logon request with a sequence number lower than expected (starting from 1). Is there a better way to set up the connection quickly? For example, can I make my engine resume from the sequence number it had just before it went down? What would be the ideal approach?
So I am now persisting the messages in a file, which takes care of sequence numbers. However, what troubles me again is that my QuickFIX initiator engine is not responding to Sequence Reset messages. There are no admin callbacks at all now.
I notice that the lack of response to Sequence Reset messages happens almost always when I connect to the acceptor from one server, close that session, and then connect to the acceptor from a different server using the same session ID. Once the logon is accepted, I expect things to work fine. However, when the other engine sends a Sequence Reset to a particular number (basically a gap fill), my FIX engine does not respond to it; that is, it does not reset its expected sequence number and keeps sending resend requests to the acceptor. Any help would be greatly appreciated!
For normal FIX session usage, you configure the session start and end times and let the engine manage the sequence numbers. For example, if your session is active from 8:00 AM to 4:30 PM then QuickFIX/J will automatically reset the outgoing and incoming sequence number to 1 the first time the engine is started after 8:00 AM (or at 8:00 AM if the engine is already started at that time).
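A simplified model of that schedule-driven reset, to show the decision the engine is making. `DailySchedule` and `shouldResetSeqNums` are hypothetical, this is not QuickFIX/J's implementation, and it assumes one time zone and a session that starts and ends on the same day.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;

// Simplified, hypothetical daily schedule (single zone, no overnight sessions).
final class DailySchedule {
    private final LocalTime start;
    private final LocalTime end;

    DailySchedule(LocalTime start, LocalTime end) {
        this.start = start;
        this.end = end;
    }

    boolean isSessionTime(LocalDateTime t) {
        LocalTime tod = t.toLocalTime();
        return !tod.isBefore(start) && tod.isBefore(end);
    }

    // Stored sequence numbers belong to an earlier session if they were created
    // before today's session start; in that case both counters go back to 1.
    boolean shouldResetSeqNums(LocalDateTime storeCreated, LocalDateTime now) {
        if (!isSessionTime(now)) {
            return false; // outside the window the engine should not be logged on
        }
        LocalDateTime currentSessionStart = LocalDateTime.of(now.toLocalDate(), start);
        return storeCreated.isBefore(currentSessionStart);
    }
}
```

With an 8:00 to 16:30 schedule, a store created yesterday afternoon is reset when the engine starts at 8:05 today, while a store created at 8:01 today is kept if the engine restarts at 9:00.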
(Question #1). You are correct that your engine should use the new incoming sequence number after the Sequence Reset. Given that this works properly for thousands of QuickFIX/J users, think about what you might be doing that would change that behavior. For example, do you have an admin message callback, and might it be throwing exceptions? Have you looked at your log files for any hints?
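One thing that may be worth checking along those lines: QuickFIX/J's `Session` has separate setters, `setNextSenderMsgSeqNum` for the outgoing number and `setNextTargetMsgSeqNum` for the number expected from the counterparty, while the question describes changing the outgoing number with the latter. The hypothetical `SeqCounters` model below (not QuickFIX/J code) just makes the two counters explicit: a "too low" logon reject concerns the sender counter, and a Sequence Reset's NewSeqNo (tag 36) moves the target counter forward. Decreasing resets are simply ignored here.

```java
// Hypothetical model of a session's two independent counters.
final class SeqCounters {
    int nextSender = 1; // MsgSeqNum (tag 34) we will put on our next message
    int nextTarget = 1; // MsgSeqNum we expect next from the counterparty

    // "Msg Seq No too Low, expected N" refers to OUR outgoing number.
    void onTooLowReject(int counterpartyExpects) {
        nextSender = counterpartyExpects;
    }

    // SequenceReset (35=4) with NewSeqNo (36) moves the INCOMING counter forward.
    void onSequenceReset(int newSeqNo) {
        if (newSeqNo > nextTarget) {
            nextTarget = newSeqNo;
        }
    }
}
```

If the value from the reject is written into the target counter instead, the engine's idea of which incoming message comes next no longer matches the counterparty's outgoing series, which would be consistent with the resend-request loop described in the question.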
(Question #2). If you are using a persistent MessageStore (FileStore, JdbcStore, etc.) then your outgoing sequence number will be available when you restart.
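To illustrate what "available when you restart" means, here is a deliberately minimal, hypothetical file-backed store for the two counters (`SeqNumFile` is not a QuickFIX/J class; QuickFIX/J's FileStore also keeps the message bodies needed for resends, which this sketch does not).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical, minimal persistence of {nextSender, nextTarget} across restarts.
final class SeqNumFile {
    private final Path path;

    SeqNumFile(Path path) {
        this.path = path;
    }

    // Returns {nextSender, nextTarget}, or {1, 1} if nothing has been saved yet.
    int[] load() throws IOException {
        if (!Files.exists(path)) {
            return new int[] {1, 1};
        }
        String[] parts = Files.readString(path, StandardCharsets.UTF_8).trim().split(":");
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    // Overwrites the file with the current counters.
    void save(int nextSender, int nextTarget) throws IOException {
        Files.writeString(path, nextSender + ":" + nextTarget, StandardCharsets.UTF_8);
    }
}
```

An engine that saves after every message and loads on startup logs on with the number it stopped at instead of 1, which is what avoids the "too low" rejects from a counterparty that will not reset on logon.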