Session level and application level seqNum? - quickfix

Does QuickFIX/J maintain the same seqNum sequence for both types of messages (session level and application level), or does each level have its own sequence of seqNums?
Suppose I have a connection running and, at the same time, I'm receiving orders (msgType=D) and QuickFIX session level heartbeat messages (msgType=0).
Say the current seqNum of the heartbeat (msgType=0) is 10, and in the meantime an order comes in; I want to know what the sequence number of the msgType=D message will be. Will its seqNum be 11 or 1?

It's the same sequence for all types of messages.
Note, however, that incoming and outgoing messages each have their own sequence.
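So if the heartbeat just went out with MsgSeqNum=10, the next outgoing message of any type, including your NewOrderSingle (35=D), will carry 11. A minimal sketch for inspecting the two counters on a live session, assuming you have the SessionID at hand (getExpectedSenderNum/getExpectedTargetNum are the accessors I'd reach for here):
import quickfix.Session;
import quickfix.SessionID;

// ...
Session session = Session.lookupSession(sessionID);
// Next MsgSeqNum (tag 34) this side will stamp on ANY outgoing message,
// whether it is a Heartbeat (35=0) or a NewOrderSingle (35=D).
int nextOutgoing = session.getExpectedSenderNum();
// Next MsgSeqNum (tag 34) expected on ANY incoming message.
int nextIncoming = session.getExpectedTargetNum();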

Related

CQRS + ES Implementation Advice

I'm working on a generic CQRS + ES framework (with Node.js) at my company. Remark: only an RDBMS + Redis (without AOF/RDB persistence) is allowed, for various reasons.
I really need some advice on how to implement the CQRS + ES framework.
Ignoring the ES part, I'm struggling with the implementation of the message propagation.
Here are the tables I have in the RDBMS:
EventStore: [aggregateId (varchar), aggregateType (varchar), aggregateVersion (bigint), messageId (varchar), messageData (varchar), messageMetadata (varchar), sequenceNumber (bigint)]
EventDelivery: [messageId (varchar, foreign key to EventStore), sequenceId (equal to aggregateId, varchar), sequenceNumber (equal to the one in EventStore, bigint)]
ConsumerGroup: [consumerGroup (varchar), lastSequenceNumberSeen (bigint)]
And I have multiple EventSubscribers:
// In Application 1
#EventSubscriber("consumerGroup1", AccountOpenedEvent)
...
// In Application 2
#EventSubscriber("consumerGroup2", AccountOpenedEvent)
...
Here is the flow when an AccountOpenedEvent is written to the EventStore table:
1. For each application (i.e. application 1 and application 2), scan the codebase to obtain all the #EventSubscriber declarations, create a consumer group in the ConsumerGroup table with lastSequenceNumberSeen = 0, then run a scheduler (with a 100 ms polling interval) that polls all the interested events (grouped by consumer group) from EventStore with the condition sequenceNumber >= lastSequenceNumberSeen.
2. For each event (EventStore row) from step 1, calculate the sequenceId (here the sequenceId is equal to the aggregateId); this sequenceId, together with the sequenceNumber, is used to guarantee message delivery ordering. Persist it into the EventDelivery table and update lastSequenceNumberSeen = sequenceNumber (this is to prevent a duplicate event being scanned in the next interval).
3. For each application (i.e. application 1 and application 2), run another scheduler (also with a 100 ms polling interval) that polls the EventDelivery table (grouped by sequenceId and ordered by sequenceNumber ASC).
4. For each event (EventDelivery row) from step 3, call the corresponding message handler; after the message is handled, acknowledge it by deleting the record from EventDelivery.
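To make steps 1 and 2 concrete, here is a rough sketch of one tick of the first scheduler, written in Java/JDBC purely for illustration (the real framework is Node.js). Table and column names follow the schema above; everything else (the dataSource, the pollEventStoreOnce name, the strictly-greater comparison so the checkpointed event is not re-read, and the omitted filter on the event types the group is interested in) is an assumption, not the actual implementation.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical helper: one 100 ms tick of the first scheduler for one consumer group.
// Returns the new checkpoint to be used on the next tick.
static long pollEventStoreOnce(DataSource dataSource, String consumerGroup,
                               long lastSequenceNumberSeen) throws SQLException {
    try (Connection con = dataSource.getConnection()) {
        con.setAutoCommit(false);
        long newLastSeen = lastSequenceNumberSeen;

        try (PreparedStatement select = con.prepareStatement(
                 "SELECT messageId, aggregateId, sequenceNumber FROM EventStore "
                 + "WHERE sequenceNumber > ? ORDER BY sequenceNumber ASC");
             PreparedStatement insert = con.prepareStatement(
                 "INSERT INTO EventDelivery (messageId, sequenceId, sequenceNumber) VALUES (?, ?, ?)");
             PreparedStatement update = con.prepareStatement(
                 "UPDATE ConsumerGroup SET lastSequenceNumberSeen = ? WHERE consumerGroup = ?")) {

            // Step 1: fetch events this consumer group has not seen yet.
            select.setLong(1, lastSequenceNumberSeen);
            try (ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    // Step 2: sequenceId == aggregateId, per the schema above.
                    insert.setString(1, rs.getString("messageId"));
                    insert.setString(2, rs.getString("aggregateId"));
                    insert.setLong(3, rs.getLong("sequenceNumber"));
                    insert.addBatch();
                    newLastSeen = rs.getLong("sequenceNumber");
                }
            }
            insert.executeBatch();

            // Advance the checkpoint so the next tick does not re-scan these rows.
            update.setLong(1, newLastSeen);
            update.setString(2, consumerGroup);
            update.executeUpdate();
        }
        con.commit();
        return newLastSeen;
    }
}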
Since I have 2 applications, I have to deliver the AccountOpenedEvent from EventStore in 2 separate transactions; since the 2 applications don't know about each other, I can only do this passively. That's why I need the EventDelivery table and the polling scheduler.
Assume I can use Redlock + cron to make sure only one instance runs the polling jobs, in case application 1 has more than one replica.
Application 1 will poll the AccountOpenedEvent, create a record in EventDelivery, and store the lastSequenceNumberSeen in its consumer group.
Application 2 will also poll the AccountOpenedEvent, create a record in EventDelivery, and store the lastSequenceNumberSeen in its consumer group.
Since application 1 and application 2 are different consumer groups, they treat the event store stream separately.
Here is the first problem: we have 2 schedulers, and we would have more if there were more consumer groups, which puts a heavy load on the database. How can this be solved? One of my ideas is to convert these 2 schedulers into jobs and put the jobs into a queue that processes them per interval (let's say 100 ms), but that seems to introduce a large latency if a job is unfortunately placed at the end of the queue.
Here is the second problem: in the flow above, I introduced the second polling job to guarantee message delivery ordering. Unlike the first one, it has no lastSequenceNumberSeen; instead it removes a record from EventDelivery once the message is handled. But it is common for a message to take more than 100 ms to handle, and in that case the same event in EventDelivery will be scanned again.
I'm not sure what the common practice is, and I'm quite stuck on how to implement this. I did a lot of research on the internet and see that some implementations do the message propagation with Debezium + Kafka (although I cannot use those two tools, I still cannot understand how it works).
I know Debezium uses a CDC approach to tail the transaction log of the RDBMS and forward the messages to Kafka, and I've seen recommendations that we should not have multiple subscriptions on the same transaction log. Let's say Debezium guarantees the events are propagated to Kafka; that means I need application 1 and application 2 to subscribe to the Kafka topic, each belonging to a different consumer group (and using aggregateId as the partition key). Since Kafka guarantees message ordering per partition, everything should work fine. But I don't think Kafka stores all the messages from the very beginning. Let's say it is configured to retain 1,000,000 messages: if a message handler keeps failing for some unexpected reason, the 1,000,000 messages after the failed one cannot be handled, and the 1,000,001st event will be lost... Although this is a rare case, I'm not sure I understand it correctly. The database table is the most reliable source of truth, as it stores all events from the very beginning; if the system runs into this case, does that mean I need to manually republish all the events to Kafka to recover the projection model?
Another case: if I have a new event subscriber that needs the historical events to build its projection model, then with Debezium + Kafka I need to assign a new consumerGroup and configure it to read the Kafka stream from the very beginning? It has the same problem: the consumerGroup can only get the last 1,000,000 events... This is not an issue if we poll the database table directly instead.
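For my own understanding, this is roughly how I picture the rebuild with a new consumer group; the topic name account-events, the group consumerGroup3, and the String deserializers are just placeholders. A brand-new group has no committed offsets, so auto.offset.reset=earliest makes it start from the oldest offset Kafka still retains, which is exactly why anything beyond the retention limit is gone.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class NewProjectionConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // A new consumer group: no committed offsets yet, so start from the
        // oldest offset Kafka still retains (not necessarily event #1).
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "consumerGroup3");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("account-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    // record.key() is the aggregateId (partition key), so
                    // per-aggregate ordering is preserved; rebuild the projection here.
                }
            }
        }
    }
}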
I don't understand why most implementations don't poll the database table but use a message broker instead.
Again, I really need advice on how to implement a CQRS + ES framework, especially the message propagation part (keep in mind I can only use an RDBMS + Redis without persistence).

Kafka Topology Design: How to do sliding window join and emit events on timeout? [Hard]

I have a set of requirements as below:
When a message 'T' arrives, it must wait for 5 seconds for the corresponding message in 'A' (with the same key) to arrive. If that message comes within 5 seconds, join the values and send the result downstream. If it does not come within 5 seconds, send only the 'T' message downstream.
When a message 'A' arrives, it must wait for 5 seconds for the corresponding message in 'T' (with the same key) to arrive. If that message comes within 5 seconds, join the values and send the result downstream. If it does not come within 5 seconds, send only the 'A' message downstream.
My current thinking was to do a KStream-KStream sliding window OUTER join. However, that does not wait for 5 seconds before sending the (T, null) or (null, A) record downstream (it is sent instantly).
I need to wait for the timeout to happen, and only if a join did not occur, send the unjoined message through.
I've attached a diagram to help make sense of the cases. I am trying to use the DSL as much as possible.
Any help appreciated.
Okay, I found a fairly hacky solution that I'm still evaluating, but it works for this scenario.
I can simply groupByKey at the end and then suppress until the window closes, with an unbounded buffer.
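A sketch of what I mean, with String values, placeholder topic names, and a zero grace period; this is the shape of the topology I'm evaluating, not a verified final version:
import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.TimeWindows;

// ...
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> tStream = builder.stream("T");
KStream<String, String> aStream = builder.stream("A");

tStream
    // Outer join: eagerly emits (t, a), (t, null) or (null, a) within the 5 s window.
    .outerJoin(aStream,
        (t, a) -> (a == null) ? t : (t == null) ? a : t + "+" + a,
        JoinWindows.of(Duration.ofSeconds(5)))
    // Re-window by key and keep only the latest join result per key and window.
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofSeconds(5)).grace(Duration.ZERO))
    .reduce((older, newer) -> newer)
    // Hold results back until the window closes, so an eager (x, null) result is
    // replaced by the joined value if the partner arrived in time.
    .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
    .toStream()
    .map((windowedKey, value) -> KeyValue.pair(windowedKey.key(), value))
    .to("joined-or-timed-out");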

Kafka KStream-KTable join race condition

I have the following:
KTable<Integer, A> tableA = builder.table("A");
KStream<Integer, B> streamB = builder.stream("B");
Messages in streamB need to be enriched with data from tableA.
Example data:
Topic A: (1, {name=john})
Topic B: (1, {type=create,...}), (1, {type=update,...}), (1, {type=update...})
In a perfect world, I would like to do
streamB.join(tableA, (b, a) -> { b.name = a.name; return b; })
.selectKey((k,b) -> b.name)
.to("C");
Unfortunately this does not work for me, because my data is such that every time a message is written to topic A, a corresponding message is also written to topic B (the source is a single DB transaction). After this initial 'creation' transaction, topic B keeps receiving more messages. Sometimes several events per second show up on topic B, but it is also possible for consecutive events for a given key to be hours apart.
The reason the simple solution does not work is that the original 'creation' transaction causes a race condition: topics A and B get their messages almost simultaneously, and if the B message reaches the 'join' part of the topology first (say a few ms before the A message gets there), tableA will not yet contain a corresponding entry. At this point the event is lost. I can see this happening on topic C: some events show up, some don't (if I use a leftJoin, all events show up, but some have a null key, which is equivalent to being lost). This is only a problem for the initial 'creation' transaction; after that, every time an event arrives on topic B, the corresponding entry already exists in tableA.
So my question is: how do you fix this?
My current solution is ugly. What I did was create a 'collection of B' and read topic B using:
B.groupByKey()
 .aggregate(() -> new CollectionOfB(), (id, b, agg) -> agg.add(b))
 .join(tableA, ...);
Now we have a KTable-KTable join, which is not susceptible to this race condition. The reason I consider this 'ugly' is that after each join I have to send a special message back to topic B that essentially says "remove the event(s) I just processed from the collection". If this special message is not sent to topic B, the collection keeps growing and every event in the collection is reported again on every join.
Currently I'm investigating whether a window join would work (reading both A and B into KStreams and using a windowed join). I'm not sure this will work either, because there is no upper bound on the size of the window. I want to say "the window starts 1 second 'before' and ends infinitely far 'after'". Even if I can somehow make this work, I am a bit concerned about the space requirements of an unbounded window.
Any suggestion would be greatly appreciated.
Not sure what version you are using, but the latest Kafka, 2.1, improves the stream-table join. Even before 2.1, the following holds:
the stream-table join is based on event-time
Kafka Streams processes messages based on event-time, but in offset order (for two input streams, the stream with the smaller record timestamps is processed first)
if you want to ensure that the table is updated first, the table update record should have a smaller timestamp than the stream record
Since 2.1:
to allow for some delay, you can configure max.task.idle.ms to delay processing for the case that only one input topic has input data
The event-time processing order is implemented as best effort in 2.0 and earlier versions, which can lead to the race condition you describe. In 2.1, the processing order is guaranteed and might only be violated if max.task.idle.ms is hit.
For details, see https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization
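For example, the config-side change would look roughly like this (the application id, bootstrap servers, and the 500 ms value are placeholders; tune the idle time to how much latency you are willing to trade for ordering):
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

// ...
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "stream-table-enricher");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Since 2.1: let a task wait up to 500 ms for data on its other input
// (here, the topic backing tableA) before processing the stream side, so the
// table update with the smaller timestamp is applied first.
props.put(StreamsConfig.MAX_TASK_IDLE_MS_CONFIG, 500L);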

quickfixj initiator: manually send a resend request for a seqnum at logon

I have a QuickFIX/J initiator connecting to a vendor's acceptor and receiving messages. I keep the FIX messages in a buffer which is processed by a thread. To avoid losing messages in case of a crash while messages are still in the buffer, I keep the last seqnum processed and plan to send a resend request for the next seqnum on my side when I reconnect.
I know the better solution would be to persist the messages as I receive them, but the design is to avoid doing any DB access in the onMessage call.
I didn't find any example of how this could be done, i.e. sending a resend request for a specific seqnum. Should I simply override the logon message and send the seqnum?
Does anyone have an example?
I guess you are already in sync, as per the earlier thread "if quickfixj crash in onmessage, will I lose my current message?".
QuickFIX/J manages two sequence numbers:
SenderSequenceNum: the sequence number used on sent messages.
TargetSequenceNum: the sequence number expected on received messages.
So you have two options:
Option 1: Process the received messages on the QuickFIX/J onMessage() callback thread, so that in case of an exception the sequence number does not increment. QuickFIX/J then automatically sends a resend request on receiving the next FIX message, as it detects the sequence gap.
Option 2: Persist the sequence number that you have successfully processed. In case of a crash, on restart you can set the expected receive sequence number using:
Session.lookupSession(session_).setNextTargetMsgSeqNum(lastProcessedSeqNum + 1); // lastProcessedSeqNum: the value you persisted
So if you receive a sequence number higher than that, QuickFIX/J automatically sends the resend request.
Note: do not change the sender sequence number; otherwise the other party will receive a sequence number lower than expected, which can cause a disconnection.
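A minimal sketch of option 2 on the initiator side, assuming a hypothetical loadLastProcessedSeqNum() that reads the value you persisted (error handling reduced to a comment):
import java.io.IOException;
import quickfix.Session;
import quickfix.SessionID;

// ...
try {
    int lastProcessedSeqNum = loadLastProcessedSeqNum();   // hypothetical: read from your own store
    Session session = Session.lookupSession(sessionID);
    if (session != null) {
        // Expect the message right after the last one fully processed; if the
        // acceptor sends a higher MsgSeqNum, QuickFIX/J detects the gap and
        // issues the ResendRequest automatically.
        session.setNextTargetMsgSeqNum(lastProcessedSeqNum + 1);
    }
} catch (IOException e) {
    // the underlying message store could not be updated; log and decide how to proceed
}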

Why are resent messages discarded in QuickFIX?

I have a QuickFIX/J application running as an acceptor. ResetOnLogon is N in the configuration.
When the initiator logs on, since the seq nums are different, the initiator app resends the messages and I see those messages in the FIX log file. The first of those messages is passed to the application layer, but the others are not; they are all discarded.
What can be the reason that the messages are received but not passed to the application level?
The most likely reason for this is that the messages contain PossDupFlag <43> with a value of 'Y' and a MsgSeqNum <34> that is in fact recognized as a duplicate by the engine. In that case you won't receive them as application level messages.
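If you want to confirm that, a quick way is to check tag 43 on the messages in question (for example after parsing a line from the FIX log back into a quickfix.Message); message here stands for whatever Message instance you are examining:
import quickfix.FieldNotFound;
import quickfix.Message;
import quickfix.field.PossDupFlag;

// ...
boolean possDup = false;
try {
    // Tag 43 lives in the header; resent messages carry PossDupFlag=Y.
    Message.Header header = message.getHeader();
    if (header.isSetField(PossDupFlag.FIELD)) {
        possDup = header.getBoolean(PossDupFlag.FIELD);
    }
} catch (FieldNotFound ignored) {
    // isSetField() guards against this, so this should not happen
}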