Aggregator pattern in Mirth

I am receiving a large number of correlated HL7 messages in Mirth. They contain an ID that is the same for all correlated messages, and the correlated messages always arrive within a minute of each other. Multiple batches can be received at the same time. It's hard to say exactly when a batch ends, but when no more messages have arrived for a minute, it's safe to assume that the batch has finished.
How could I implement an aggregator pattern in Mirth that keeps reading and combining correlated messages, and sends the completed message once it hasn't received any new messages with the same ID within a defined time interval?

You may drop all incoming messages into a folder and store the message ID in a Global Map. Once new messages start to arrive with a message ID different from the one stored in the map (meaning the next sequence has started), trigger another channel, either by sending it the message ID it needs to look for or in some other way. After that, replace the message ID in the Global Map with the message ID of the new sequence.
If that sounds too complicated, you may do the same, but have the second channel constantly scan the folder (File Reader) and grab only the files that have the same message ID and are older than a minute relative to the current time (which, in my mind, is too vague a qualifier).

I've implemented this by saving all messages in a folder using an ID inside the message (that identifies the sequence) as the file name. The file gets updated with each new message. Several sequences live together in the same folder.
The next channel uses a simple File Reader that only fetches files that are a minute or more old.
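For illustration, here is the core of that file-based approach as a standalone Java sketch rather than the actual Mirth JavaScript steps; the folder path, file naming and the 60-second quiet period are assumptions:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.Instant;
import java.util.stream.Stream;

public class BatchFolder {

    static final Path DIR = Paths.get("/data/hl7-batches"); // assumed staging folder
    static final long QUIET_MS = 60_000;                    // "no new message for a minute"

    // First channel: append each message to a file named after its correlation ID.
    // Each write bumps the file's last-modified time, which restarts the quiet period.
    static void append(String correlationId, String hl7) throws IOException {
        Files.writeString(DIR.resolve(correlationId + ".hl7"), hl7 + "\r",
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Second channel (what the File Reader's age filter does): pick up only files
    // that haven't been touched for QUIET_MS, i.e. batches assumed to be complete.
    static Stream<Path> completedBatches() throws IOException {
        long cutoff = Instant.now().toEpochMilli() - QUIET_MS;
        return Files.list(DIR).filter(p -> p.toFile().lastModified() < cutoff);
    }
}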

Related

MongoDB: Schema for storing read receipts of messages in a chat system

We are implementing read receipts in a chat system. We want to store the timestamp at which each message was read by each participant (i.e. messageId, participantId and timestamp), and display this when the user wants to see message info on the frontend. Besides this, we also want to display ticks next to each message, depending on whether all participants have seen it.
We are using sockets to send the event when the message(s) are seen.
As in a chat system, the number of messages will keep growing, and we are concerned about the scalability of the system.
We have thought of the following schemas/approaches as of now:
1. Creating entries for the participant and the message in the 'ReadMessages' collection when a message is seen. When fetching messages, they have to be populated with 'ReadMessages'.
2. Embedding this data in the 'Messages' collection itself. This approach would require frequent updates to the 'Messages' collection, and would probably slow down reads on it.
3. Creating entries in the 'ReadMessages' collection when a message is seen by a participant, and storing a flag named 'Read' in the 'Messages' collection that is re-calculated and updated (if required) whenever a message is seen. This approach doesn't require populating while fetching messages, and also minimizes updates.
4. Similar to 3, except the 'Read' status in the 'Messages' collection is updated by a cron job. This requires less processing when a message-seen event occurs, but the cron job has to scan through all messages to find those not yet marked as 'Read' (as the index is only on _id). Also, read receipts on the client will be updated with a delay depending on the interval at which the cron job runs.
Which of these would be better in the long run?
Or is there a better way the schema for this could be designed?
Also, could multiple concurrent inserts to the collection cause performance problems?
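For reference, a rough sketch of what approach 3 might look like with the MongoDB Java driver; the collection and field names here are just placeholders:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.IndexOptions;
import com.mongodb.client.model.Indexes;
import com.mongodb.client.model.Updates;
import org.bson.Document;
import java.util.Date;

public class ReadReceipts {
    public static void main(String[] args) {
        MongoClient client = MongoClients.create("mongodb://localhost:27017");
        MongoCollection<Document> readMessages = client.getDatabase("chat").getCollection("ReadMessages");
        MongoCollection<Document> messages = client.getDatabase("chat").getCollection("Messages");

        // One receipt document per (message, participant); the unique compound index
        // also keeps concurrent "seen" events from creating duplicate receipts.
        readMessages.createIndex(
                Indexes.compoundIndex(Indexes.ascending("messageId"), Indexes.ascending("participantId")),
                new IndexOptions().unique(true));

        readMessages.insertOne(new Document("messageId", "m1")
                .append("participantId", "u42")
                .append("readAt", new Date()));

        // Denormalized flag on the message, recomputed when a receipt arrives, so
        // fetching messages doesn't need a populate/lookup just to draw the ticks.
        messages.updateOne(Filters.eq("_id", "m1"), Updates.set("read", true));
    }
}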

How to process a record in Kafka based on the processing result of another record?

I have a @KafkaListener class that listens to a particular topic and consumes records that contain either a Person object or a Phone object (and only one of them). Every Phone has a reference / correlation ID to the corresponding Person. The listener class performs certain validations that are specific to the type received, saves the object into a database and produces a transfer success / failed response back to Kafka, which is consumed by another service.
So a Person can successfully be transferred without any corresponding Phone, but a Phone transfer should only succeed if the corresponding Person transfer has succeeded. I can't wrap my head around how to implement this "synchronization", because Persons and Phones get into Kafka independently as separate records and it's not guaranteed that the Person corresponding to a particular Phone will be processed before the Phone.
Is it at all possible to have such a synchronization given the current architecture or should I redesign the producer and send a Person / Phone pair as a separate type?
Thanks.
It's not clear how you're using the same serializer for different object types, but you should probably create separate topics and/or branch your current one into two (refer to the Kafka Streams API).
I assume there are fewer people than phones, in which case you could build a KTable from a people topic; then, as you get phone records, you can perform a left join or lookup against this table for the person ID.
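A rough sketch of that join with the Streams DSL; the topic names, the TransferResult type and the assumption that both topics are keyed by the person ID are mine:

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class PhoneTransferTopology {
    // Person, Phone and TransferResult stand in for the question's domain types.
    static StreamsBuilder buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // People materialized as a changelog table, keyed by person ID (Serdes assumed configured).
        KTable<String, Person> people = builder.table("people");

        // Phone records, assumed re-keyed by their correlated person ID.
        KStream<String, Phone> phones = builder.stream("phones");

        // Left join: if no matching person has arrived yet, 'person' is null, so the
        // phone transfer can be failed (or parked for a retry) instead of succeeding.
        phones.leftJoin(people, (phone, person) -> person == null
                        ? TransferResult.failed(phone)
                        : TransferResult.success(phone, person))
              .to("phone-transfer-results");

        return builder;
    }
}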
Other solutions could involve using Kafka Connect to dump records into a system where you can do the join

What does a consumer of a RESTful events atom feed have to remember?

I am researching Atom feeds as a way of distributing event data as part of our organisation's internal REST APIs. I can control the feeds and ensure:
there is a "head" feed containing time-ordered events with an etag which updates if the feed changes (and short cache headers).
there are "archive" feeds containing older events with a fixed etag (and long cache headers).
the events are timestamped and immutable, i.e. they happened and can't change.
The question is, what must the consumer remember to be sure to synchronize itself with the latest data at any time, without double processing of events?
The last etag it processed?
The timestamp of the last event it processed?
I suppose it needs both? The etag to efficiently ask the feed whether there have been any changes (using HTTP If-None-Match), and if so, the timestamp to apply only the changes from that updated feed that haven't already been processed...
The question has nothing particularly to do with REST or the technology used to consume the feed. It would apply to anyone writing an Atom-based RSS feed reader, for example.
UPDATE
Thinking about it - some of the events may have the same timestamp, as they get "detected" at the same time in batches. It could be awkward then for the consumer to rely on the timestamp of the last event successfully processed, in case its processing dies halfway through a batch with the same timestamp... This is why I hate timestamps!
In that case does the feed need to send an id with every event that the consumer has to remember instead? Wouldn't that id have to increment to eternity, and never ever be reset? What are the alternatives?
Your events should all carry a unique ID. A client is required to track those IDs, and that is enough to prevent double processing.
In that case does the feed need to send an id with every event that the consumer has to remember instead?
Yes. An atom:entry is required to have an atom:id that is unique. If your events are immutable, uniqueness of the ID is enough. In general, entries aren't required to be immutable. atom:updated contains the last significant change: "the most recent instant in time when an entry or feed was modified in a way the publisher considers significant".
So a general client would need to consider the pair of id and updated.
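As an illustration, a minimal sketch of a consumer that uses the etag only to avoid re-fetching and a persisted set of atom:id values to avoid double processing; the feed URL, the AtomEntry type and the parseFeed/process helpers are hypothetical:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.HashSet;
import java.util.Set;

public class FeedConsumer {
    private final HttpClient http = HttpClient.newHttpClient();
    private final Set<String> processedIds = new HashSet<>(); // must survive restarts in practice
    private String lastEtag;

    void poll() throws Exception {
        HttpRequest.Builder req = HttpRequest.newBuilder(URI.create("https://example.org/events/head"));
        if (lastEtag != null) {
            req.header("If-None-Match", lastEtag);           // cheap "has anything changed?" check
        }
        HttpResponse<String> resp = http.send(req.build(), HttpResponse.BodyHandlers.ofString());
        if (resp.statusCode() == 304) {
            return;                                           // nothing new since lastEtag
        }
        lastEtag = resp.headers().firstValue("ETag").orElse(lastEtag);
        for (AtomEntry entry : parseFeed(resp.body())) {      // hypothetical Atom parsing helper
            if (processedIds.add(entry.id())) {               // atom:id is the deduplication key
                process(entry);                               // hypothetical business handler
            }
        }
    }
}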

Mule messages currently in the VM Queue

How can I get a copy of all messages (or references to all messages) that are on a VM queue?
I want to loop through the list of messages currently on the VM queue, check the payload of every message, and based on that make a decision about the next step in the flow.
Thanks.
You have to consume them with a vm:inbound-endpoint; you can't peek at or browse them without actually taking them out of the queue.

Replacing a message in a jms queue

I am using ActiveMQ to pass requests between different processes. In some cases, I have multiple duplicate messages (which are requests) in the queue. I would like to have only one. Is there a way to send a message so that it replaces an older message with similar attributes? If there isn't, is there a way to inspect the queue and check for a message with specific attributes (in which case I will not send the new message if an older one exists)?
Clarification (based on Dave's answer): I am actually trying to make sure that there aren't any duplicate messages on the queue, to reduce the amount of processing that happens whenever the consumer gets the message. Hence I would like either to replace a message or to not even put it on the queue.
Thanks.
This sounds like an ideal use case for the Idempotent Consumer which removes duplicates from a queue or topic.
The following example shows how to do this with Apache Camel, which is the easiest way to implement any of the Enterprise Integration Patterns, particularly if you are using ActiveMQ, which comes with Camel integrated out of the box:
from("activemq:queueA").
idempotentConsumer(memoryMessageIdRepository(200)).
header("myHeader").
to("activemq:queueB");
The only trick to this is making sure there's an easy way to calculate a unique ID expression for each message - such as pulling out an XPath from the document or, as in the above example, using some unique message header.
You could browse the queue and use selectors to identify the message. However, unless you have a small number of messages this won't scale very well. Instead, your message should just be a pointer to a database record (or set of records). That way you can update the record, and whoever gets the message will then access the latest version of the record.
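For the browse-and-check variant, a small sketch with plain JMS against ActiveMQ; the queue name and the requestKey property are assumptions:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Queue;
import javax.jms.QueueBrowser;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class DuplicateCheck {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("requests");

            // Browse (without consuming) for a message carrying the same correlation attribute.
            QueueBrowser browser = session.createBrowser(queue, "requestKey = 'ABC-123'");
            boolean alreadyQueued = browser.getEnumeration().hasMoreElements();

            if (!alreadyQueued) {
                TextMessage msg = session.createTextMessage("request payload");
                msg.setStringProperty("requestKey", "ABC-123");
                session.createProducer(queue).send(msg);
            }
        } finally {
            connection.close();
        }
    }
}

As noted above, browsing with a selector like this only stays cheap while the queue is small.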