FIX - Determining message length? - fix-protocol

In FIX what determines a message length? Because I have read if a message exceeds its length it will be sent in fragments.

Splitting up large messages is not an implicit part of the FIX protocol.
Some counterparties may choose to split up data into multiple messages instead of sending giant messages, but they don't have to. In my experience, I've seen counterparties send ridiculously large messages.
If a message is split up, it's because the sending party chose to implement their system that way.

Related

How does the FIX protocol handle a message sequence number overflow?

We are currently incorporating a FIX engine (using QuickFixJ) in our application. We will be the initiator and use trade capture reports to get informed on all trades happening on the platform.
The trading (and thus the FIX session) will be running 24/7 and we are currently looking into ways to handle this properly. Our concern is that at some point we will need to reset the message sequence numbers to avoid an overflow. We would ideally not want to reset the sequence number as we need to be sure that we catch every single trade. We are worried about the following scenario:
We send a SequenceReset message
Our system crashes due to unrelated reasons
The acceptor side send us one or more TradeCaptureReport messages
Only now does the acceptor side receive our SequenceReset message
Our system has recovered and sends a ResendRequest message, with BeginSeqNo equal to 1 (because we have reset the message sequence number)
We do not get the TradeCaptureReport messages from (3.)
However, we have noticed that in case of a message sequence overflow, neither our engine nor the acceptor side seem to be troubled by this.
The example I have tested is simply sending heartbeats which will overflow the sequence number:
8=FIXT.1.19=13135=A34=149=INITIATOR50=INITIATOR52=20220901-15:26:03.40356=ACCEPTOR98=0108=10141=Y553=INITIATOR554=password1137=910=224
8=FIXT.1.19=00010235=A49=ACCEPTOR56=INITIATOR34=157=INITIATOR52=20220901-15:26:03.65498=0108=10141=Y1409=01137=910=212
8=FIXT.1.19=9035=434=249=INITIATOR50=INITIATOR52=20220901-15:26:03.71856=ACCEPTOR36=2147483646123=Y10=038
8=FIXT.1.19=00007035=049=ACCEPTOR56=INITIATOR34=257=INITIATOR52=20220901-15:26:13.79210=009
8=FIXT.1.19=7935=034=214748364649=INITIATOR50=INITIATOR52=20220901-15:26:13.78956=ACCEPTOR10=044
8=FIXT.1.19=00007035=049=ACCEPTOR56=INITIATOR34=357=INITIATOR52=20220901-15:26:23.85210=008
8=FIXT.1.19=7935=034=214748364749=INITIATOR50=INITIATOR52=20220901-15:26:23.85056=ACCEPTOR10=035
8=FIXT.1.19=00007035=049=ACCEPTOR56=INITIATOR34=457=INITIATOR52=20220901-15:26:33.89610=018
8=FIXT.1.19=8035=034=-214748364849=INITIATOR50=INITIATOR52=20220901-15:26:33.89256=ACCEPTOR10=080
8=FIXT.1.19=00007035=049=ACCEPTOR56=INITIATOR34=557=INITIATOR52=20220901-15:26:43.93310=012
8=FIXT.1.19=8035=034=-214748364749=INITIATOR50=INITIATOR52=20220901-15:26:43.93256=ACCEPTOR10=075
Is this a feature of the FIX protocol or is it undefined behaviour (and just works coincidentally)? And if this doesn't work (or is discouraged), is there a best way to handle ongoing FIX sessions? We have not found any usable information and most exchanges we have seen simply reset once a day.
I think the title of the question should rather be "how does a FIX engine handle message sequence number overflow".
As per the FIX spec the sequence number is always positive: FIX datatypes
Sequence of character digits without commas or decimals. Value must be
positive.
I can only speak for QuickFIX/J: internally the sequence number is of type java.lang.Integer which means its maximum positive value is 2147483647.
Now when QuickFIX/J (or any other engine) accepts or uses negative sequence numbers it clearly is a bug.
Maybe you should approach your Exchange how other clients handle this. I think at some point they have a time window where sequence numbers can (and should) be reset.
I guess the exchange handles it like outlined here: FIX session 24-hour connectivity

TCPSteam package combine multiple packages

I have a question regarding the TCPStream package in Rust. I want to read data from a server. The problem is that it is not guaranteed that the data is sent in one TCP package.
And here comes my question:
Is the read message capable of reading more than one package, or do I have to call it more than one? Is there any "best practice"?
From the user space TCP packets are not visible and their boundaries don't matter. Instead user space reads only a byte stream and writes only to a byte stream. Packetizing is done at a lower level in a way to be optimal for latency and bandwidth. It might well happen that multiple write from user space end up in the same packet and it might also happen that a single write will result in multiple packets. And the same is true with read: it might get part of a packet, it might get the payload taken from multiple consecutive packets ...
Any packet boundaries from the underlying transport are no longer visible from user space. Thus protocols using TCP must implement their own message semantic on top of the byte stream.
All of this is not specific to Rust, but applies to other programming languages too.

TCP Socket Read Variable Length Data w/o Framing or Size Indicators

I am currently writing code to transfer data to a remote vendor. The transfer will take place over a TCP socket. The problem I have is the data is variable length and there are no framing or size markers. Sending the data is no problem, but I am unsure of the best way to handle the returned data.
The data is comprised of distinct "messages" but they do not have a fixed size. Each message has an 8 or 16 byte bitmap that indicates what components are included in this message. Some components are fixed length and some are variable. Each variable length component has a size prefix for that portion of the overall message.
When I first open the socket I will send over messages and each one should receive a response. When I begin reading data I should be at the start of a message. I will need to interpret the bitmap to know what message fields are included. As the data arrives I will have to validate that each field indicated by the bitmap is present and of the correct size.
Once I have read all of the first message, the next one starts. My concern is if the transmission gets cut partway through a message, how can I recover and correctly find the next message start?
I will have to simulate a connection failure and my code needs to automatically retry a set number of times before canceling that message.
I have no control over the code on the remote end and cannot get framing bytes or size prefixes added to the messages.
Best practices, design patterns, or ideas on the best way to handle this are all welcomed.
From a user's point of view, TCP is a stream of data, just like you might receive over a serial port. There are no packets and no markers.
A non-blocking read/recv call will return you what has currently arrived at which point you can parse that. If, while parsing, you run out of data before reaching the end of the message, read/recv more data and continue parsing. Rinse. Repeat. Note that you could get more bytes than needed for a specific message if another has followed on its heels.
A TCP stream will not lose or re-order bytes. A message will not get truncated unless the connection gets broken or the sender has a bug (e.g. was only able to write/send part and then never tried to write/send the rest). You cannot continue a TCP stream that is broken. You can only open a new one and start fresh.
A TCP stream cannot be "cut" mid-message and then resumed.
If there is a short enough break in transmission then the O/S at each end will cope, and packets retransmitted as necessary, but that is invisible to the end user application - as far as it's concerned the stream is contiguous.
If the TCP connection does drop completely, both ends will have to re-open the connection. At that point, the transmitting system ought to start over at a new message boundary.
For something like this you would probably have a lot easier of a time using a networking framework (like netty), or a different IO mechansim entirely, like Iteratee IO with Play 2.0.

nServiceBus with large XML messages

I have read about the true messaging and that instead of sending payload on the bus, it sends an identifier. In our case, we have a lot of legacy apps/services and those were designed to receive the payload of messages (xml) that is close to 4MB (close MSMQ limit). Is there a way for nService bus to handle large payload and persist messages automatically or another work-around, so that the publisher/subscriber services don't have to worry neither about the payload size, nor about how to de/re-hydrate the payload?
Thank you in advance.
You could use the Message Sequence pattern. In NServiceBus, you would split the payload in the sender, wrap the chunks in a custom 'Sequence' IMessage, and then implement a saga at the other end to extract the chunks & reassemble. You would need to put some effort into error handling & timeouts.
You can always use the quick "fix" of compressing the messages.
A POCO serialized with the binary serializer can be compressed down by a large margin. We saw our messages that were 20mb compressed down to 3.1mb.
So if your messages are hovering around 4mb it might be simple to just write an IMessageSerializer that automatically compresses the message while it is on the wire.
I'm not aware of any internal NServiceBus capability to associate extra data with a message out of band.
I think you're right on the mark - if the entire payload can't fit within the limit, then it's better to persist it elsewhere on your own and then passing an ID.
However, it may be possible for you to design a message structure such that a message could implement an IHasPayload interface (which would perhaps incorporate an ID and a Type?), and then your application logic could have a common method for getting the payload given an IHasPayload message.

Replacing a message in a jms queue

I am using activemq to pass requests between different processes. In some cases, I have multiple, duplicate message (which are requests) in the queue. I would like to have only one. Is there a way to send a message in a way that it will replace an older message with similar attributes? If there isn't, is there a way to inspect the queue and check for a message with specific attributes (in this case I will not send the new message if an older one exists).
Clarrification (based on Dave's answer): I am actually trying to make sure that there aren't any duplicate messages on the queue to reduce the amount of processing that is happening whenever the consumer gets the message. Hence I would like either to replace a message or not even put it on the queue.
Thanks.
This sounds like an ideal use case for the Idempotent Consumer which removes duplicates from a queue or topic.
The following example shows how to do this with Apache Camel which is the easiest way to implement any of the Enterprise Integration Patterns, particularly if you are using ActiveMQ which comes with Camel integrated out of the box
from("activemq:queueA").
idempotentConsumer(memoryMessageIdRepository(200)).
header("myHeader").
to("activemq:queueB");
The only trick to this is making sure there's an easy way to calculate a unique ID expression on each message - such as pulling out an XPath from the document or using as in the above example some unique message header
You could browse the queue and use selectors to identify the message. However, unless you have a small amount of messages this won't scale very well. Instead, you message should just be a pointer to a database-record (or set of records). That way you can update the record and whoever gets the message will then access the latest version of the record.