quickfixj initiator disconnecting due to seqnum too low - quickfix

My quickfixj initiator gets "Disconnecting: Encountered END_OF_STREAM" while trying to log on to the acceptor. We are using a vendor's FIX engine as the acceptor, and the feedback from the acceptor is that the logon request for xxxx was not accepted: incoming too small, expect 305, received 27.
I read the quickfix documentation but did not understand what the proper solution for a sequence number mismatch is. I understand that if I am disconnected, my initiator will send a Resend Request (35=2) with the initiator-side seqnum, asking the acceptor to resend the messages and fill the gap.
But in what case will an initiator that sends a lower seqnum be rejected by the acceptor, so that the connection is refused?
And what is the proper procedure to handle this kind of rejection and reconnect? In order not to lose any messages, how should both sides do the reset and fill the gap?
If there is a break between the initiator and acceptor, what is the recommended way to keep the messages in sync without losing any?

Because of the first sentence of your question, I would like to point you to an answer about the same error message, Disconnecting: Encountered END_OF_STREAM. It quotes a blog post by bhageera:
In the end the reason was pretty silly… the counterparty I was connecting to allows only 1 connection per user/password (i.e. session with those credentials) at a time. As it turns out there was another application using the same credentials against the same TargetCompID. As soon as that application was killed off, the current one logged in fine.
I searched for the cause of the bug for a while, until I realized that I had two initiators with the same credentials running on two different test environments.

According to the default logic in QuickFIX/J:
QuickFIX/J manages two sequence numbers: the next sequence number it expects to receive (the target sequence number) and the next sequence number it will send.
It checks the next expected target SeqNum against the received SeqNum. If a mismatch is detected, it applies the following logic:
If the received SeqNum is lower than expected, log out.
If it is higher, send a resend request.
In your case the received number was lower than expected, so the session gets disconnected.
Reason for receiving a higher than expected SeqNum:
The receiver missed some messages, so this can be a normal scenario.
Reason for a lower than expected SeqNum (your case):
One of the counterparties reset its sequence number on its own. That is not expected; a reset should be agreed by both counterparties.
In a normal scenario, whenever you miss a message you will receive a higher number, and QuickFIX/J handles that for you.
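The decision described above can be sketched in a few lines of plain Java. This is a simplified model and not QuickFIX/J source code: the class, enum and method names are made up for illustration, and the real engine also considers things like PossDupFlag and queued messages.

```java
// Simplified model of the inbound sequence-number check described above.
// Not QuickFIX/J source code; SeqNumCheck and Action are hypothetical names.
final class SeqNumCheck {
    enum Action { PROCESS, SEND_RESEND_REQUEST, LOGOUT_AND_DISCONNECT }

    /**
     * @param expected next MsgSeqNum (tag 34) we expect from the counterparty
     * @param received MsgSeqNum (tag 34) on the message that just arrived
     */
    static Action evaluate(int expected, int received) {
        if (received < expected) {
            // Counterparty appears to have gone backwards: refuse the session.
            return Action.LOGOUT_AND_DISCONNECT;
        }
        if (received > expected) {
            // We missed something: ask for the gap to be filled.
            return Action.SEND_RESEND_REQUEST;
        }
        return Action.PROCESS;
    }

    public static void main(String[] args) {
        // Numbers from the question: the acceptor expects 305, the initiator sends 27.
        System.out.println(evaluate(305, 27));   // LOGOUT_AND_DISCONNECT
        System.out.println(evaluate(305, 310));  // SEND_RESEND_REQUEST
        System.out.println(evaluate(305, 305));  // PROCESS
    }
}
```

With the numbers from the question the acceptor lands in the first branch, which matches the "expect 305, received 27" rejection.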

Related

quickfixj initiator manually resend reset to a seqnum at logon

I have a quickfixj initiator connecting to a vendor's acceptor and receiving messages. I keep the FIX messages in a buffer which is processed by a thread. To avoid losing messages if the application crashes while messages are still in the buffer, I store the last seqnum processed and plan to send a resend request starting from the next seqnum on my side when I reconnect.
I know a better solution would be to save the messages as I receive them, but the design is to avoid doing any db access in the onMessage call.
I didn't find any example of how this could be done, that is, requesting a resend from a specific seqnum. Should I simply override the logon message and send the seqnum?
Does anyone have an example?
I guess you are already in sync as per the last thread, "if quickfixj crash in onmessage, will I lose my current message?".
QuickFixJ manages 2 sequence numbers:
SenderSequenceNum: Sequence number used in sending messages.
TargetSequenceNum: Sequence number expected to receive.
So you have two options:
Option 1: Process the received messages on the QuickFIX/J onMessage() callback thread, so that in case of an exception the sequence number is not incremented. QuickFIX/J then automatically sends a resend request when it receives the next FIX message, because it detects the sequence gap.
Option 2: Persist the sequence number that you have successfully processed. In case of a crash, on restart you can set the expected receive sequence number using:
Session.lookupSession(sessionID).setNextTargetMsgSeqNum(lastProcessedSeqNum + 1);
If you then receive a sequence number higher than that, QuickFIX/J automatically sends a resend request.
Note: Do not change the sender sequence number; otherwise the other party will receive a sequence number lower than expected, which can cause a disconnection.
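For Option 2, the persistence part could look roughly like the sketch below. It is only an assumption about how one might store the number: ProcessedSeqNumStore and its file layout are hypothetical, the write is not atomic, and the class deliberately has no dependency on QuickFIX/J so it can be run on its own.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: remembers the last MsgSeqNum the worker thread finished with.
final class ProcessedSeqNumStore {
    private final Path file;

    ProcessedSeqNumStore(Path file) {
        this.file = file;
    }

    /** Call after a message has been fully processed (outside onMessage). */
    void saveLastProcessed(int seqNum) throws IOException {
        Files.write(file, Integer.toString(seqNum).getBytes(StandardCharsets.UTF_8));
    }

    /** Sequence number to expect next after a restart; 1 if nothing was recorded yet. */
    int nextExpected() throws IOException {
        if (!Files.exists(file)) {
            return 1;
        }
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return text.isEmpty() ? 1 : Integer.parseInt(text) + 1;
    }
}
```

On restart, before the session logs on, the value returned by nextExpected() would be the argument passed to setNextTargetMsgSeqNum on the session, so that the first message with a higher number triggers a resend request.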

Why are resent messages discarded in QuickFIX?

I have a QuickFIX/J application running as acceptor. ResetOnLogon is N in the configuration.
When the initiator logs on, the seq nums are different, so the initiator app resends the messages and I see those messages in the FIX log file. The first of those messages is passed to the application layer but the others are not; they are all discarded.
What can be the reason that the messages are received but not passed to the application level?
The most likely reason for this is that the messages contain the PossDupFlag <43> with a 'Y' value, and a MsgSeqNum <34> that is in fact recognized as a dupe by the engine. In that case you won't receive these as application-level messages.
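To check this against the FIX log, a small helper like the following can flag which logged messages an engine would treat as duplicates. It is a rough sketch: DuplicateFilter is a hypothetical class, the parsing is a naive split on the SOH delimiter, and the rule (PossDupFlag=Y together with a MsgSeqNum below the expected one) is a simplification of what an engine does.

```java
// Hypothetical log-inspection helper; not part of QuickFIX/J.
final class DuplicateFilter {
    private static final char SOH = '\u0001';

    /** Returns the value of the given tag in a raw FIX message, or null if absent. */
    static String field(String rawMessage, int tag) {
        String prefix = tag + "=";
        for (String part : rawMessage.split(String.valueOf(SOH))) {
            if (part.startsWith(prefix)) {
                return part.substring(prefix.length());
            }
        }
        return null;
    }

    /** True if the message carries 43=Y and a MsgSeqNum (34) we have already passed. */
    static boolean isDiscardedAsDuplicate(String rawMessage, int expectedSeqNum) {
        int seqNum = Integer.parseInt(field(rawMessage, 34));
        boolean possDup = "Y".equals(field(rawMessage, 43));
        return possDup && seqNum < expectedSeqNum;
    }
}
```

If most of the resent messages in the log come out as duplicates under this check, that would be consistent with the explanation above.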

What's the expected behavior when TCP connection is lost?

I looked through the FIX v4.2 spec, and it is not clear to me what the expected behavior is when the TCP connection is lost in the middle of a session.
More specifically, suppose the current sequence number is 100 and at this point the TCP connection is lost. When either side tries to resume the session, does it re-send message number 100, or does it start a new session with a logon?
In describing a FIX session, the spec says one session has one logon and one logout, but could go across multiple physical connections. This leads me to think that when the TCP connection is lost, the resuming process should not start with a logon message, but I am not positive on that.
Thanks in advance!
The FIX protocol does not define anything related to the transport protocol. There were some documents on the official web site that suggest how it can be implemented on top of this or that protocol, but they are only suggestions.
Therefore, the expected behavior in case of a TCP/IP disconnect depends on the implementation. For instance, it is possible to have a system that does not care about TCP/IP disconnects at all, which would make those details irrelevant. In that case, the expected behavior would be to continue sending and receiving messages after the connection is re-established, and of course to proceed to a "recovery" of lost messages, if any. In reality, though, I have never seen a system like that.
In practice, all systems treat TCP/IP disconnects as an implicit loss of session and expect clients to send a logon upon re-connect.
When logging in, there are two options: a re-connecting session may send the next outgoing sequence number, or it may ask the server to reset the sequence (to 1). In the first case, the server side may send a logon acknowledgement if the sequence is greater than or equal to what it expected, or close (or even reject) the session if the received sequence number is less than expected. Additionally, if the sequence was greater than expected, the server will request a re-transmission. The client session monitors the sequence of the server as well, and needs to request a re-transmission if it detects a gap (received sequence is greater than expected). In the second case, if the server supports sequence reset, both in and out sequences are reset to 1 and no messages are recovered.
In your case, if connection is lost after sending a message with sequence number 100, client would have to re-connect and send a logon with sequence 101, and proceed from there. Alternatively, connect and reset the sequence, in which case some messages might get lost.
Also, don’t forget to check the specifics of the venue you connect to. There could be very weird details that are not specified by the FIX protocol at all, or even ones that go against the FIX protocol. For instance, ICE (indeed one of the most brain-dead exchanges in general) is one of the silliest exchanges in this regard. It doesn’t allow re-connecting within the first 15 seconds, and then if clients cannot connect for 30 seconds, they should switch to a failover server. If failover happens, they fail to keep the sequence number intact, and clients are left no choice but to reset the sequence number.
Hope it makes things a bit clearer for you. Good Luck!
If the transport layer is TCP/IP, I would expect the session initiator to:
Re-establish a socket connection
Send a new logon message
The sequence number to use on the logon message depends on the type of session and what has been agreed with the FIX session acceptor (see the spec for details). For sessions where there is no value in replaying any lost messages, e.g. market data feeds where the prices would be stale, it makes sense to send a logon message with sequence number 1 and set tag 141=Y (to reset the sequence numbers). For an orders session, where message replay might be required, the session initiator should generally log on with a sequence number one greater than the last message sent (and expect a logon response from the FIX session acceptor with a sequence number one greater than the last message received).
Unless you really need the message replay, it is cleaner and easier to reset the sequence numbers each time upon logon. This obviously depends on the FIX session acceptor (FIX server) support for this. For things like STP feeds, I've found this to be far more reliable and it is generally better for the application protocol to provide application level replay facilities rather than relying on the brittleness of FIX session replay.
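The two logon variants discussed in these answers differ only in a couple of fields, which the following sketch makes explicit. It is an illustration, not how any engine builds messages: LogonSketch is a hypothetical class, only a handful of fields are shown, and the standard header and trailer fields (8, 9, 49, 56, 52, 10) are left out.

```java
// Illustrative only: shows which fields distinguish a "resume" logon from a "reset" logon.
final class LogonSketch {
    private static final char SOH = '\u0001';

    /** Selected fields of a Logon (35=A); header and trailer fields are omitted. */
    static String logonFields(int msgSeqNum, int heartBtInt, boolean resetSeqNum) {
        StringBuilder sb = new StringBuilder();
        sb.append("35=A").append(SOH);                        // MsgType: Logon
        sb.append("34=").append(msgSeqNum).append(SOH);       // MsgSeqNum
        sb.append("98=0").append(SOH);                        // EncryptMethod: none
        sb.append("108=").append(heartBtInt).append(SOH);     // HeartBtInt
        if (resetSeqNum) {
            sb.append("141=Y").append(SOH);                   // ResetSeqNumFlag
        }
        return sb.toString();
    }

    /** Orders-style session: continue from one past the last message sent. */
    static String resumeLogon(int lastSentSeqNum, int heartBtInt) {
        return logonFields(lastSentSeqNum + 1, heartBtInt, false);
    }

    /** Market-data-style session: start again from 1 and ask the peer to do the same. */
    static String resetLogon(int heartBtInt) {
        return logonFields(1, heartBtInt, true);
    }
}
```

For the example in the question, resumeLogon(100, 30) carries 34=101, while resetLogon(30) carries 34=1 together with 141=Y.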

FIX protocol sequence number

I have a few questions on FIX protocol sequence numbers:
What is the benefit of setting ResetOnLogon=N?
Can both the initiator and the acceptor send a Resend Request?
How does the message sequence help in session recovery/error handling?
ResetOnLogon=Y means that sequence numbers are reset by the protocol on a logon message. This keeps sequence numbers low, which can be useful. With ResetOnLogon=N the numbers carry over from the previous connection, so messages missed while disconnected can still be requested again. The sell side usually defines whether a reset should be done or not.
Yes, as long as the engine thinks that, due to out-of-sync sequence numbers, a message may have been lost, it may request a resend.
If sequence numbers are out of sync between a message and its predecessor, and the number is higher than expected, then the engine may assume that some messages have been lost in the connection. This means that it needs to recover these messages.
If you have any more questions or want more information I would be happy to reply.
ResetOnLogon determines if sequence numbers should be reset when receiving a logon request. (Please find the documentation here: http://www.quickfixengine.org/quickfix/doc/html/configuration.html)
Yes, both can send a Resend Request, but you must follow the specs between your side and the counterparty.
The message sequence numbers tell that no messages were lost during the current session. If there is a mismatch, actions must be taken in order to establish the correct sync between the 2 sides.
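As a concrete illustration of how sequence numbers drive recovery, the sketch below tracks the next expected inbound number and works out which range a Resend Request (35=2) would ask for when a gap shows up. GapDetector is a hypothetical class, and the model is simplified: a real engine would also queue the out-of-order message and process it once the gap is filled. In FIX 4.2 and later, EndSeqNo (16) may also be sent as 0 to mean "everything after BeginSeqNo"; a bounded range is used here to keep the arithmetic visible.

```java
// Simplified inbound tracker; does not queue out-of-order messages.
final class GapDetector {
    private int nextExpected;

    GapDetector(int nextExpected) {
        this.nextExpected = nextExpected;
    }

    int nextExpected() {
        return nextExpected;
    }

    /**
     * Feed the MsgSeqNum (34) of an incoming message.
     *
     * @return null if the message was in sequence, otherwise
     *         {BeginSeqNo (7), EndSeqNo (16)} for a Resend Request
     */
    int[] onMessage(int received) {
        if (received < nextExpected) {
            // Too low without PossDupFlag is a serious error in FIX.
            throw new IllegalStateException(
                "MsgSeqNum too low, expected " + nextExpected + " but received " + received);
        }
        if (received > nextExpected) {
            // Gap: ask for everything from the first missing message up to the one before this.
            return new int[] { nextExpected, received - 1 };
        }
        nextExpected++;
        return null;
    }
}
```

For example, with 101 expected and 105 received, the requested range is 101 to 104, and the expected number stays at 101 until the missing messages (or a gap fill) arrive.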

QuickFix Sequence Reset not working

I am working with QuickFix/J (FIX 4.2) to submit orders to an acceptor FIX engine. Basically I need help on two points:
When I first try to establish a connection with the acceptor, the acceptor rejects the initial Logon requests saying "Msg Seq No too Low". After this my initiator goes on incrementing the outgoing sequence number by one and when this seq no. and the no. expected by the acceptor engine match, I get a stable connection. To speed this process, I began to extract the expected seq. no. from the reject message sent by the acceptor engine and changed the outgoing sequence no. for my engine using
session.setNextTargetMsgSeqNum(expectedSeqNo).
However, later on, if my engine finds incoming sequence no. higher than expected, it sends a Resend request. In response, the other party sends back a Sequence Reset msg (35=4, 123=Y). Now after receiving this msg, incoming seq no. for my engine should be automatically set to the one it received from Seq Reset msg. But this does not happen and my engine goes on asking for messages resend request with no change in the incoming seq no.
Interesting thing is, I found this thing to work when I don't explicitly change the outgoing seq no in the first place (using setNextTargetMsgSeqNum).
Why is my engine not showing expected behavior when it gets Sequence Reset Msg?
I have talked to the other party and they won't have ResetOnLogon=Y in their configuration. So every time my engine comes up, it often sends Logon request with a seq no. lower than expected(starts from 1). Is there a better way to have the connection set up quickly? Like can I somehow make my engine use the sequence no. resuming from the point just before it went down? What should be the ideal approach?
So I am now persisting the messages in a file which is taking care of sequence numbers. However, what is troubling again is, my quickfix initiator engine is not responding to Sequence Reset messages. There are no admin call backs at all now.
I notice that no response to sequence reset message is happening almost always when I am connecting to the acceptor from one server and then, closing that session, and using a different server to connect to the acceptor, using the same session id. Once the logon is accepted, I expect things to work fine. However, while the other engine sends sequence reset to a particular number (gap fill basically), my fix engine does not respond to it, meaning, it does not reset its expected sequence number and keeps on sending resend requests to the acceptor. Any help will be greatly appreciated!
For normal FIX session usage, you configure the session start and end times and let the engine manage the sequence numbers. For example, if your session is active from 8:00 AM to 4:30 PM then QuickFIX/J will automatically reset the outgoing and incoming sequence number to 1 the first time the engine is started after 8:00 AM (or at 8:00 AM if the engine is already started at that time).
(Question #1). You are correct that your engine should use the new incoming sequence number after the Sequence Reset. Given that this works properly for thousands of QuickFIX/J users, think about what you might be doing that would change that behavior. For example, do you have an admin message callback, and might it be throwing exceptions? Have you looked at your log files to see if there are any hints there?
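For reference, the behavior the question expects from a Sequence Reset (35=4) can be modeled as below. This is a sketch of the protocol rule, not of QuickFIX/J internals: InboundSequence is a hypothetical class, and the only rule modeled is that NewSeqNo (36) may move the expected inbound number forward but never backward.

```java
// Hypothetical model of how NewSeqNo (36) from a Sequence Reset affects the inbound counter.
final class InboundSequence {
    private int nextExpected;

    InboundSequence(int nextExpected) {
        this.nextExpected = nextExpected;
    }

    int nextExpected() {
        return nextExpected;
    }

    /**
     * Apply a Sequence Reset carrying the given NewSeqNo.
     *
     * @return true if the expected inbound number was moved forward
     */
    boolean applySequenceReset(int newSeqNo) {
        if (newSeqNo > nextExpected) {
            nextExpected = newSeqNo;
            return true;
        }
        // Equal changes nothing; lower is not allowed to move the counter back
        // (a real engine would reject such a message).
        return false;
    }
}
```

If the engine's expected number does not move after a gap fill like this, the log and any exceptions from callbacks are the first places to look. Separately, note that in QuickFIX/J Session.setNextTargetMsgSeqNum changes the next expected incoming number, while Session.setNextSenderMsgSeqNum changes the outgoing one. The question describes adjusting the outgoing number with the former, which may be worth double-checking.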
(Question #2). If you are using a persistent MessageStore (FileStore, JdbcStore, etc.) then your outgoing sequence number will be available when you restart.
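A session configuration along the lines described in this answer might look like the fragment below. The setting names are standard QuickFIX/J configuration keys, but every value (the CompIDs, host, port, store path and times) is a placeholder chosen for illustration, not taken from the question. StartTime and EndTime are interpreted as UTC unless a TimeZone setting is given, and FileStorePath only takes effect if the application creates its initiator with a FileStoreFactory.

```ini
[DEFAULT]
ConnectionType=initiator
# Placeholder path; sequence numbers and sent messages are persisted here
FileStorePath=data/store
# Session window: sequence numbers reset to 1 at the start of each new session
StartTime=08:00:00
EndTime=16:30:00
# Keep sequence numbers across logons within the session window
ResetOnLogon=N
HeartBtInt=30
ReconnectInterval=5

[SESSION]
BeginString=FIX.4.2
# Placeholder identifiers and endpoint
SenderCompID=MY_FIRM
TargetCompID=VENDOR
SocketConnectHost=fix.example.com
SocketConnectPort=9876
```

With a persistent store in place, a restart inside the session window should pick up the outgoing sequence number from where it left off, which avoids logging on with 1 and being rejected as too low.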