Why are resent messages discarded in QuickFIX? - quickfix

I have a QuickFIX/J application running as acceptor. ResetOnLogon is N in the configuration.
When the initiator is logged on, since the seq nums are different the initiator app sends the messages and I see those messages in the FIX log file. The first one of those message is passed to the application layer but the others are not, all are discarded.
What can be the reason that the messages are received but not passed to the application level?

The most likely reason for this is that the messages contain the PossDupFlag <43> with a 'Y' value, and a MsgSeqNum <34> that is infact recognized as a dupe by the engine. In that case you won't receive these as application level messages.

Related

Is there a way to explicitly acknowledge message receipt with QuickFIX/J?

For a guaranteed message receiver, in an ACK-based protocol like Apache Kafka, TIBCO EMS/RVCM, IBM MQ and JMS there is a way to explicitly acknowledge the receipt of a message. Explicit Acks are not just automatically sent when you return from a dispatcher's callback but an extra method on the session or message to say "I've processed this message". The reason for the existence of this explicit ack is that you can safely queue received messages to be processed by another thread at a later time and then only call this explicit-ack method once your are really done processing this message (safely storing to DB, forwarding to another MOM, etc.) Having this explicit method ensures that you are not losing messages even when you crash after receiving messages but didn't process them yet.
Now with QuickFIX/J (of FIX in general) I know it's not ACK-based but instead persists the last received SeqNum in a file and instead of sendings Acks, message guarantee is achieved by sending ResendRequests for missed SeqNums. But still, is there a way to tell the QuickFIX/J API "I don't automatically want you to persist this last SeqNum once I exit this onMessage() callback but hold off until I tell you so". In other words is there a Session variation which doesn't persist SeqNums automatically and then I can call something on the FIX message to persist this last Seqnum once I've really processed/saved that message ?
(If this feature doesn't exist I think it would be a good addition to the API)

quickfixj initiator manually resend reset to a seqnum at logon

I have a quickfixj initiator connecting to vendor's acceptor and receiving messages. I keep the fix messages in a buffer which is processed by a thread. To avoid loosing the message in case crash with message in the buffer, I have the last seqnum processed, and plan to send resend message for that next seqnum on my side when I reconnect.
I know the better solution would be that I save the messages before I receives them, but the design is to avoid doing any db access in the onMessage call.
I didn't find any example how this could be done, resending request for a specific seqnum. Should I simply overload the logon message and send the seqnum?
Anyone has an example?
I guess you are already in synch as per the last thread if quickfixj crash in onmessage, will I lose my current message?.
QuickFixJ manages 2 sequence numbers:
SenderSequenceNum: Sequence number used in sending messages.
TargetSequenceNum: Sequence number expected to receive.
So you have two options:
Option 1: Process the receive messages on the QuickfixJ onMessage() callback thread. So that in case of an exception the sequence number does not increment. And QuickFixJ automatically sends the resend request on receiving next fix message as it will detect the sequence gap.
Option 2: Persist the sequence number that you have successfully processed. In case of crash, on restart you can set the expected receive sequence number using:
Session.lookupSession(session_).setNextTargetMsgSeqNum();
So if you receive a sequence number higher than that, QuickfixJ automatically sends resend the request.
Note: Do not change the sender sequence number then another party will receive a sequence number lower than expected and can cause disconnection.

quickfixj initiator disconnecting due to low seqnum too low

quickfixj initiator getting Disconnecting: Encountered END_OF_STREAM while trying to logon to the acceptor. We are using vendor's fix engine as acceptor. and feedback from acceptor is that logon request for xxxx was not accepted, incoming too small, expect 305, received 27.
I read the quickfix documentation but didn't get it exactly what's the proper solution for the sequence number mismatch. I understand that if I am disconnected, my initiator will send an 35=4 for resend with initiator side seqnum asking acceptor to resend the messages and fill up the gap.
But in what case, if initiator is sending a lower seqnum will be rejected by acceptor and refuse the connection?
And what's the proper procedure to handle this kind of rejection and reconnect? In order to not loose any message, how should both side do the reset and fill the gap?
In case there is a break between the initiator and acceptor, what's the recommended solution to keep the messages in sync and not loosing any?
Due to the first sentence of your question I would like to show you an answer to the same error message Disconnecting: Encountered END_OF_STREAM. There is a blog post by bhageera quoted.
In the end the reason was pretty silly… the counterparty I was connecting to allows only 1 connection per user/password (i.e. session with those credentials) at a time. As it turns out there was another application using the same credentials against the same TargetCompID. As soon as that application was killed off, the current one logged in fine.
I searched for the cause of the bug for a while, until I realized that I had two initiators with the same credentials running on two different test environments.
According to default logic in QuickfixJ:
QuickfixJ manages 2 sequence number, expectedSeqNum to receive(targetSeqNum) and nextSeqNumber to sent.
Check the next expected target SeqNum against the received SeqNum.If a mismatch is detected, apply the following logic:
if lower than expected SeqNum, logout
if higher, send a resend request
In your case received was lower than expected so it gets disconnected.
Reason for receiving higher than expected SeqNum:
Receiver misses some message so it could be a normal scenario.
Reason for lower than expected SeqNum(Your case):
One of the counterparties resets its sequence number, which is not expected it should be agreed by both the counterparties.
In a normal scenario, whenever you miss the message you will receive a higher number and it would be managed by QuickFixJ.

JMS messages moving to DLQ

JMS mesages are sometimes moving to the DLQ without throwing any exception.
Jboss server instance used is 4.3.0.GA_CP04_EAP.
We are using an an MDB that listens for incoming messages on a queue A, when it receives any message it updates the database and sens an email in one transaction.Transaction is CMT.
Now, what is happening is, sometimes mesages are not picked up by the consumer and they end up in the DLQ. Though from the JMX- console message count i could see that the message once did arrive to the queue A but then goes to the DLQ.
This happens intermittently and does not throw any exceptions on the logs either .
What seems to work most of the times is restarting the servers. No idea about what happens behind the scenes though.
**And after 29 days, same problem has returned.
This follows a pattern but varies with every restart.
There are 2 clustered serevrs which also do loadbalancing , P1 and P2.
First two email messages go to and processed by P1-Email sent
Next email message resquest goes to P2-Email sent
Next two email messages go to and processed by P1-Email sent
Next email message resquest goes to P2-Email NOT SENT
and the cycle repeats
I have found a workaround to this nagging problem thanks to the helpful info found at http://leakfromjavaheap.blogspot.in/2013/05/when-dead-letter-queue-becomes-zombie.html
DLQ listener is set up to listen for any incoming messages and puts them back to their intended destination if any of them is found on DLQ.
Also, considering the situation where any message is travelling from DLQ to the Queue and back to the DLQ in endless loops, a counter is set to check how many times the message has been to the DLQ before, if it exceeds the limit, then it is put to a Permanent DLQ (DLQ for a DLQ).
Application has been running smoothly ever since.
If you can provide the log details when message goes to DLQ, would be better to dig into this issue.
The logs did not contain any useful info; not even an exception to give a hint.
Finally,changed the local tx data source to xa data source and it was a success.Still wondering if there is a reason behind it.

What's the expected behavior when TCP connection is lost?

I looked through FIX v4.2 spec, it is not clear to me what the expected behavior it should be when the TCP connection is lost in the middle of a session.
More specifically, suppose the current sequence number is 100 and at this point the TCP connection is lost, when either side tries to resume the session, it re-sends message number 100, or starts a new session with logon?
In describing FIX session, the spec says one session has one logon and one logout, but could go across multiple physical connections. This leads me to think that when the TCP connection is lost, the resuming process should not be starting with a logon message, but I am not positive on that.
Thanks in advance!
FIX protocol does not define anything related to the transport protocol. There were some documents on the official web site that only suggest how it can be implemented on top of this or that protocol, but only suggests.
Therefore, the expected behavior in case of TCP/IP disconnect depends on implementation. For instance, it is possible to have a system that does not care about TCP/IP disconnects at all, which would make those details irrelevant. In that case, the expected behavior would have been to continue sending receiving messages after connection is re-established, and of course proceed to a “recovery” of lost messages, if any. In reality, though, I have never seen a system like that.
In practice, all systems treat TCP/IP disconnects as implicit lose of session and expect clients to send a logon upon re-connect.
When logging in, there are two options — a re-connecting session may send the next outgoing sequence number or it may ask server to reset the sequence (to 1). In first case, the server side may send a logon acknowledgement if sequence is greater or equal to what it expected, or close (or even reject) the session if the received sequence number is less than expected. Additionally, if the sequence was greater than expected, server will issue a re-transmission. Client session monitors the sequence of the server as well, and needs to request a re-transmission if it detects a gap (received sequence is greater than expected). In the second case, if the server supports sequence reset, both in and out sequences are reset to 1 and no messages are recovered.
In your case, if connection is lost after sending a message with sequence number 100, client would have to re-connect and send a logon with sequence 101, and proceed from there. Alternatively, connect and reset the sequence, in which case some messages might get lost.
Also, don’t forget to check specifics of the venue you connect to. There could be very weird details that are not specified by the FIX protocol at all, or even those going against the FIX protocol. For instance, ICE (indeed one of the most brain-dead exchanges in general) is one of the silliest exchanges in this regard — it doesn’t allow re-connecting within first 15 seconds, and then if clients cannot connect for 30 seconds, they should switch to a failover server. If failover happens, they fail to keep the sequence number in tact, and clients are left no choice but reset the sequence number.
Hope it makes things a bit clearer for you. Good Luck!
If the transport layer is TCP/IP, I would expect the session initator to:
Re-establish a socket connection
Send a new logon message
The sequence number to use on the logon message depends on the type of session and what has been agreed with the FIX session acceptor (see the spec for details). For sessions where there is no value in replaying any lot messages e.g. market data feeds where the prices would be stale, it makes sense to send a logon message with sequence number 1 and set tag 141=Y (to reset the sequence numbers). For an orders session, where message replay might be required, the session initiator should generally logon with a sequence number of one greater than the last message sent (and expect a logon response from the FIX session acceptor with sequence number of 1 greater than the last message received).
Unless you really need the message replay, it is cleaner and easier to reset the sequence numbers each time upon logon. This obviously depends on the FIX session acceptor (FIX server) support for this. For things like STP feeds, I've found this to be far more reliable and it is generally better for the application protocol to provide application level replay facilities rather than relying on the brittleness of FIX session replay.