I have a distributed publisher service that puts items into RabbitMQ. I want to avoid putting duplicate items into a queue in RabbitMQ.
Is there any way I can check for the existence of an item in RabbitMQ before putting a new item in?
RabbitMQ queues do not allow you to check their contents.
Nevertheless, if what you are after is message de-duplication, you can check this plugin.
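For illustration, a common workaround (not a queue-content check) is to stamp each message with a deterministic ID derived from the item, so that a de-duplication plugin or an idempotent consumer can drop repeats. A minimal sketch with the RabbitMQ Java client; the queue name and the "x-deduplication-header" header are assumptions based on how such a plugin is typically configured:

```java
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.nio.charset.StandardCharsets;
import java.util.Collections;

public class DedupPublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            String itemId = "item-42"; // deterministic ID: duplicates of the same item reuse it
            AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
                    .messageId(itemId)
                    // header a de-duplication plugin is expected to look at (assumption)
                    .headers(Collections.singletonMap("x-deduplication-header", itemId))
                    .build();
            channel.basicPublish("", "items", props, "payload".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```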
I'm using ActiveMQ Artemis 2.18.0 and some Spring Boot clients that communicate with each other via topics. The Spring Boot clients use JMS for all MQTT operations.
I'd like to know if it is possible for a producer with one or more subscribers to find out whether a certain subscriber is actively listening or not. For example, there are 3 clients - SB1, SB2, and SB3. SB1 publishes to test/topic, and SB2 and SB3 are subscribed to test/topic. If SB2 shuts down for any reason, would it be possible for SB1 to become aware of this?
I understand that queues would be the way to go for this, but my project is much better suited to the use of topics, and it is set up this way already and works fine. There's just one operation where it must be determined whether the listener is active or not in order to update the listener's online status, a crucial parameter. Right now, clients and the server continually poll a database so that the online status is periodically updated. I want to avoid doing this and use something that Artemis may provide instead.
Apache ActiveMQ Artemis emits notifications to inform listeners of potentially interesting events, such as a consumer being created or closed; see Management Notifications at http://activemq.apache.org/components/artemis/documentation/latest/management.html.
A listener on the management notification address would receive a message for each consumer created or closed; see the Management Notification Example at http://activemq.apache.org/components/artemis/documentation/latest/examples.html#management-notification
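As a rough illustration, a JMS client could listen on the notification address directly and filter on the notification type. A minimal sketch, assuming the broker still uses the default management notification address "activemq.notifications" (configurable in broker.xml) and the Core property names "_AMQ_NotifType" / "_AMQ_Address":

```java
import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.Topic;

import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class ConsumerNotificationListener {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");
        try (Connection connection = cf.createConnection()) {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            // Default management notification address.
            Topic notifications = session.createTopic("activemq.notifications");
            MessageConsumer consumer = session.createConsumer(notifications);
            consumer.setMessageListener(message -> {
                try {
                    String type = message.getStringProperty("_AMQ_NotifType");
                    if ("CONSUMER_CREATED".equals(type) || "CONSUMER_CLOSED".equals(type)) {
                        // SB1 could update the online status of the affected subscriber here.
                        System.out.println(type + " on address " + message.getStringProperty("_AMQ_Address"));
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            connection.start();
            Thread.sleep(60_000); // keep the demo listener alive for a minute
        }
    }
}
```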
Part of the point of pub/sub based messaging is to decouple the information producer (publisher) from the consumer (subscriber). As a rule, a publisher REALLY shouldn't care if there even are any subscribers.
If you want to know the status of the subscriber then it's up to the subscriber to update this, not the publisher. Things like the Last Will & Testament feature allow the subscriber's status to be updated even when it fails to do so explicitly before going offline.
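For example, with an MQTT client (sketched here with Eclipse Paho; the status topic and client ID are made-up names), the subscriber registers a will message so the broker marks it offline even if it crashes:

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;

public class SubscriberWithLastWill {
    public static void main(String[] args) throws Exception {
        MqttClient client = new MqttClient("tcp://localhost:1883", "SB2");
        MqttConnectOptions options = new MqttConnectOptions();
        // The broker publishes this retained message if SB2 disappears without a clean disconnect.
        options.setWill("status/SB2", "offline".getBytes(), 1, true);
        client.connect(options);
        client.publish("status/SB2", "online".getBytes(), 1, true); // announce presence explicitly

        // ... do the actual work ...

        // On a clean shutdown, update the status explicitly before disconnecting.
        client.publish("status/SB2", "offline".getBytes(), 1, true);
        client.disconnect();
    }
}
```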
I have a system that uses MongoDB for persistence and RabbitMQ as a message broker. My challenge is that I only want to implement the transactional outbox for RabbitMQ publish failure scenarios. I'm not sure it is possible, because I also have consumers that use the same MongoDB persistence, so when I write code that covers the transactional outbox for RabbitMQ publish failure scenarios, published messages reach the consumers before MongoDB's commitTransaction, and my consumer can't find the message in MongoDB because of the latency.
My code is something like below:
1- start session transaction
2- insert into document with session (so it doesn't persist until I call commit)
3- publish rabbitMQ
4- if success commitTransaction
5- if error insert into outbox document with session then commitTransaction
6- if something went wrong on mongoDB abortTransaction (if published succeed and mongoDB failed, my consumers first check for mongoDB existence and if it doesn't exist don't do anything.)
So the problem is that messages reach the consumer earlier than the MongoDB persistence. Do you advise any solution that covers my problem?
As far as I can tell, the architecture outlined in the picture at https://microservices.io/patterns/data/transactional-outbox.html maps directly to MongoDB change streams:
keep the transaction around
insert into the outbox table in the transaction
set up a message relay process which requests a change stream on the outbox table and, for every inserted document, publishes a message to the message broker
The publication to the message broker can be retried, and the change stream reads can also be retried in case of any errors. You need to track resume tokens correctly; see e.g. https://docs.mongodb.com/ruby-driver/master/reference/change-streams/#resuming-a-change-stream.
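A minimal sketch of such a relay with the MongoDB Java driver (database, collection, and helper names are assumptions; token persistence and broker publishing are left as placeholders):

```java
import com.mongodb.client.ChangeStreamIterable;
import com.mongodb.client.MongoChangeStreamCursor;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.changestream.ChangeStreamDocument;

import org.bson.BsonDocument;
import org.bson.Document;

public class OutboxRelay {
    public static void main(String[] args) {
        MongoClient client = MongoClients.create("mongodb://localhost:27017/?replicaSet=rs0");
        MongoCollection<Document> outbox = client.getDatabase("app").getCollection("outbox");

        BsonDocument resumeToken = loadResumeToken(); // last token this relay persisted, or null
        ChangeStreamIterable<Document> stream =
                resumeToken == null ? outbox.watch() : outbox.watch().resumeAfter(resumeToken);

        try (MongoChangeStreamCursor<ChangeStreamDocument<Document>> cursor = stream.cursor()) {
            while (cursor.hasNext()) {
                ChangeStreamDocument<Document> change = cursor.next();
                if (change.getFullDocument() != null) {
                    publishToBroker(change.getFullDocument()); // retried until it succeeds
                }
                saveResumeToken(change.getResumeToken()); // so a restart resumes where it left off
            }
        }
    }

    static BsonDocument loadResumeToken() { return null; }    // placeholder
    static void saveResumeToken(BsonDocument token) { }       // placeholder
    static void publishToBroker(Document outboxEntry) { }     // placeholder
}
```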
Limitations of this approach:
there is only one message relay process, so no scalability and no redundancy - if it dies you won't get notifications until it comes back
Your proposed solution has a different set of issues; for example, by publishing notifications before committing you open yourself up to the possibility of the notification processor not being able to find the document it got out of the message broker, as you said.
So I would like to share my solution.
Unfortunately it's not possible to implement the transactional outbox pattern only for failure scenarios.
What I decided is to create an architecture around high availability:
MongoDB as highly available persistence and RabbitMQ as a highly available message broker.
I removed all the session transactions that I coded before and implemented an immediate write and publish.
In the worst-case scenario:
1- insert into document (success)
2- rabbitmq publish (failed)
3- insert into outbox (failed)
What I will have is unpublished documents in my MongoDB. Even in the worst-case scenario I could re-publish messages from MongoDB with another application, but I won't write that application until I face that case, because we cannot cover every failure scenario in our code. So our message brokers and persistence must be highly available.
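For clarity, a sketch of this "immediate write and publish" flow (collection, exchange, and routing-key names are made up):

```java
import java.nio.charset.StandardCharsets;

import com.mongodb.client.MongoCollection;
import com.rabbitmq.client.Channel;

import org.bson.Document;

public class WriteThenPublish {
    // 1- insert the document, 2- publish to RabbitMQ, 3- fall back to the outbox on publish failure.
    static void saveAndPublish(Document doc,
                               MongoCollection<Document> orders,
                               MongoCollection<Document> outbox,
                               Channel channel) {
        orders.insertOne(doc); // persist immediately, no session transaction
        try {
            channel.basicPublish("orders-exchange", "order.created", null,
                    doc.toJson().getBytes(StandardCharsets.UTF_8));
        } catch (Exception publishFailure) {
            try {
                outbox.insertOne(new Document("payload", doc)); // to be re-published later
            } catch (Exception outboxFailure) {
                // the worst case described above: the document is saved but neither published
                // nor in the outbox; a separate re-publisher reading MongoDB would cover it
            }
        }
    }
}
```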
I'm trying to find a messaging system that supports the following use case.
Producer registers topic namespace
client subscribes to topic
first client triggers notification on producer to start producing
new client with subscription to the same topic receives data (potentially conflated, similar to hot/cold observables in the Rx world)
When the last client goes away (unsubscribe or crash), notify the producer to stop producing to said topic
I am aware that according to the pub/sub pattern a producer is defined to be blissfully unaware of the existence of consumers, meaning that my use case simply does not fit the pub/sub paradigm.
So far I have looked into Kafka, Redis, NATS.io and Amazon SQS, but without much success. I've been thinking about a few possible ways to solve this, but haven't found anything that would satisfy my needs yet.
One option that springs to mind, for bullet 2), is to model a request/reply pattern, as detailed among others on the NATS page, to have the producer listen to clients. A client would then publish a 'subscribe' message into the system that the producer would pick up on a wildcard subscription. This however leaves one big problem: unsubscribing. Assuming the consumer stops as it should, publishing an unsubscribe message just like the subscribe would work. But in the case of a crash or similar, this won't work.
I'd be grateful for any ideas, references or architectural patterns/best practices that satisfy the above.
I've been doing quite a bit of research over the past week but haven't come across any satisfying Q&A or articles. Either I'm approaching it entirely wrong, or there just doesn't seem to be much out there, which would surprise me, as this appears to be a fairly common scenario that applies to many domains.
thanks in advance
Chris
//edit
An actual simple use-case that I have at hand is stock quote distribution.
Quotes come from external source
subscribe to stock A quotes from external system when the first end-user looks at stock A
Stop receiving quotes for stock A from external system when no more end-users look at said stock
RabbitMQ has internal events you can use with the Event Exchange Plugin. Events such as consumer.created or consumer.deleted could be used to trigger actions at your server level: for example, checking the remaining number of consumers using the RabbitMQ Management API and taking an action such as closing a topic, based on your use case.
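A minimal sketch with the RabbitMQ Java client, assuming the plugin is enabled (it publishes broker events to the built-in "amq.rabbitmq.event" topic exchange; what you do inside the callback is up to your use case):

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class ConsumerEventListener {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            // Bind a private queue to the event exchange for consumer.* events only.
            String queue = channel.queueDeclare().getQueue();
            channel.queueBind(queue, "amq.rabbitmq.event", "consumer.#");
            channel.basicConsume(queue, true, (tag, delivery) -> {
                String event = delivery.getEnvelope().getRoutingKey(); // consumer.created / consumer.deleted
                System.out.println("Broker event: " + event);
                // e.g. query the Management API here for the remaining consumer count and react
            }, tag -> { });
            Thread.sleep(60_000); // keep the demo alive
        }
    }
}
```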
I don't have any messaging system with consumer-presence-based publishing in mind. It gets even worse because you'll need some kind of heartbeat mechanism to handle consumer crashes.
So here are my two cents: I'm not sure if you're looking for an out-of-the-box solution, but if not, you could wrap your application around a ZooKeeper cluster to handle all your use cases.
Simply use watchers on ephemeral nodes to detect when you have no more consumers (including crashes), and put a watcher on a 'consumers' path to be notified when you gain consumers.
On the consumer side, you would have to register your ZooKeeper node ID whenever you start a consumer.
It's not so complicated to do, and ZooKeeper is not the only solution for this; you could use other consensus technologies as well.
A start for ZooKeeper:
https://zookeeper.apache.org/doc/r3.1.2/zookeeperStarted.html
(I strongly advise using the Curator API, which handles a lot of recipes in a smooth way.)
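For illustration, a minimal Curator-based sketch of the ephemeral-node idea (connection string, paths, and node names are just examples):

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.cache.PathChildrenCache;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class ConsumerPresence {
    public static void main(String[] args) throws Exception {
        CuratorFramework zk = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        zk.start();

        // Consumer side: register an ephemeral node; it vanishes automatically on crash or disconnect.
        zk.create().creatingParentsIfNeeded().withMode(CreateMode.EPHEMERAL)
                .forPath("/consumers/stockA/consumer-1");

        // Producer side: watch the children of the 'consumers' path for this topic.
        PathChildrenCache watcher = new PathChildrenCache(zk, "/consumers/stockA", true);
        watcher.getListenable().addListener((client, event) -> {
            int remaining = client.getChildren().forPath("/consumers/stockA").size();
            System.out.println(event.getType() + ", consumers left: " + remaining);
            // start producing when the count goes from 0 to 1, stop when it drops back to 0
        });
        watcher.start();
    }
}
```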
Yannick
Unfortunately you haven't specified the business use case that you are trying to solve with these requirements. From the sound of it you don't want a pub/sub system, but an orchestration one.
I would recommend checking out Cadence Workflow, which is capable of supporting your listed requirements and many more orchestration use cases.
Here is a strawman design that satisfies your requirements:
Any new subscriber sends a signal to a workflow with the TopicName as the workflowID to subscribe. If a workflow with the given ID doesn't exist, it is automatically started.
Any subscriber sends another signal to unsubscribe.
When no subscribers are left, the workflow exits.
The publisher sends an event to the workflow to deliver to subscribers.
The workflow delivers the event to the subscribers using an activity.
If a workflow with the given TopicName isn't running, publishing an event to it is going to fail.
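A rough sketch of such a workflow with the Cadence Java client (the interface and method names here are illustrative, not an existing API; it assumes subscribers use signalWithStart so the subscribe signal is delivered before the workflow method runs):

```java
import com.uber.cadence.activity.ActivityOptions;
import com.uber.cadence.workflow.SignalMethod;
import com.uber.cadence.workflow.Workflow;
import com.uber.cadence.workflow.WorkflowMethod;

import java.time.Duration;
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

public interface TopicWorkflow {
    @WorkflowMethod
    void run();                               // workflowID = TopicName

    @SignalMethod
    void subscribe(String subscriberId);

    @SignalMethod
    void unsubscribe(String subscriberId);

    @SignalMethod
    void publish(String event);
}

// Activity that actually pushes an event to one subscriber (implementation not shown).
interface DeliveryActivities {
    void deliver(String subscriberId, String event);
}

class TopicWorkflowImpl implements TopicWorkflow {
    private final Set<String> subscribers = new HashSet<>();
    private final Queue<String> pending = new ArrayDeque<>();

    private final DeliveryActivities delivery = Workflow.newActivityStub(
            DeliveryActivities.class,
            new ActivityOptions.Builder()
                    .setScheduleToCloseTimeout(Duration.ofMinutes(1))
                    .build());

    @Override
    public void run() {
        while (true) {
            // Wake up when there is something to deliver or the last subscriber has left.
            Workflow.await(() -> !pending.isEmpty() || subscribers.isEmpty());
            while (!pending.isEmpty()) {
                String event = pending.poll();
                for (String subscriberId : subscribers) {
                    delivery.deliver(subscriberId, event);
                }
            }
            if (subscribers.isEmpty()) {
                return; // workflow exits; publishing to this TopicName now fails
            }
        }
    }

    @Override
    public void subscribe(String subscriberId) { subscribers.add(subscriberId); }

    @Override
    public void unsubscribe(String subscriberId) { subscribers.remove(subscriberId); }

    @Override
    public void publish(String event) { pending.add(event); }
}
```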
Cadence offers a lot of other advantages over using queues for task processing.
Built-in exponential retries with an unlimited expiration interval
Failure handling. For example, it allows executing a task that notifies another service if both updates couldn't succeed within a configured interval.
Support for long-running heartbeating operations
Ability to implement complex task dependencies. For example, chaining of calls or compensation logic in case of unrecoverable failures (SAGA)
Gives complete visibility into the current state of the update. For example, when using queues all you know is whether there are some messages in a queue, and you need an additional DB to track the overall progress. With Cadence every event is recorded.
Ability to cancel an update in flight.
Distributed CRON support
See the presentation that goes over the Cadence programming model.
I'm following the axon-springboot example shared by Allard (https://github.com/abuijze/bootiful-axon).
My understanding so far is (please correct me if I have misunderstood some of the concepts):
Events are raised and stored in the event store/event bus (MySQL) (using EmbeddedEventStore). Now, event processors (TrackingProcessors, in my case) will pull events from the source (MySQL, right?) and event handlers will execute the business logic, update the query storage, and publish a message to RabbitMQ.
My first question is: where, when, and by whom is this message published to RabbitMQ (which is used by the statistics application that has the message listener configured)?
I have configured the TrackingProcessor to try the replay functionality. To execute the replay I stop my processor, delete the token entry for the processor, start the processor and events are replayed and my Query Storage is up-to-date as expected.
My second question is: when the replay is triggered and the query storage is updated, I don't see any messages being published to RabbitMQ... so my statistics application is out of sync. Am I doing something wrong?
Can you please advise?
Thanks
Singh
First of all, a correction: it is not the Tracking Processor or the updater of the view model that sends the messages to RabbitMQ. The Events are forwarded to Rabbit as they are published to the Event Bus.
The answer to your first question: messages are published by the SpringAmqpPublisher, which connects directly to the Event Bus, and forwards any published message to RabbitMQ as they are published.
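For reference, wiring that up in a Spring configuration looks roughly like this (a sketch only; the exchange name is an assumption and the exact class and package names differ between Axon versions):

```java
import org.axonframework.amqp.eventhandling.spring.SpringAMQPPublisher;
import org.axonframework.eventhandling.EventBus;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AmqpPublisherConfig {

    // Subscribes to the Event Bus and forwards every published event to RabbitMQ.
    @Bean(initMethod = "start", destroyMethod = "shutDown")
    public SpringAMQPPublisher amqpPublisher(EventBus eventBus, ConnectionFactory connectionFactory) {
        SpringAMQPPublisher publisher = new SpringAMQPPublisher(eventBus);
        publisher.setConnectionFactory(connectionFactory);
        publisher.setExchangeName("events"); // exchange name is an assumption
        return publisher;
    }
}
```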
To answer your second question, let's clarify how replays work, first. While it's called a "replay", essentially it's more a "reset". The Tracking Processor uses a TrackingToken to remember its progress of processing the Event Store. When the token is deleted (or just not yet available), the Tracking Processor starts processing from the beginning of the Event Store.
You never replay an entire application, just a single (Tracking) Processor. Just imagine: you re-publish all messages to RabbitMQ again, other components are triggered again, unaware of the fact that these are "old" messages, and user-confirmation emails are sent again, orders placed again, etc.
If your statistics are out of date, it's because they aren't part of the same processor and aren't rebuilt together with the other elements. RabbitMQ doesn't support "replaying", since it doesn't remember the messages after delivering them.
Any model that you want to be able to rebuild, should be managed by a Tracking Processor.
Check out the Axon Reference guide for more information: https://docs.axonframework.org/part3/event-processing.html#event-processors
What functionality is there in a queue that can't be achieved by a topic?
The main requirement that I run into is that consumers cannot compete for a single message on a topic. For example, I have a client who publishes call center events. Several systems subscribe to these events. One of these systems is the actual call routing application which has multiple instances running. If each instance subscribes then the call is routed to all of them. However, if the message is dropped onto a queue and all the instances consume off the same queue then only one will receive the message and the call goes to that operator. If the publishing application converts from topics to a queue, the call center works but all the other subscriber apps don't get the message.
The solution (as implemented in WebSphere MQ) was to create an administrative subscription on the topic and deliver the messages to a queue that all application instances consume from. So the producer apps are still publishers, all the dynamic subscribers still get copies of the message and the call center app instances compete for a single instance of each published message.
Also, you can't use browse semantics on a topic whereas you can on a queue. With topics you can specify selectors to filter the messages that are returned but that's about it. With queues you can browse, reset the browse pointer and then browse some more.
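For example, a plain JMS browse only works against a queue (a small sketch; the selector is just an illustration):

```java
import java.util.Enumeration;

import javax.jms.Message;
import javax.jms.Queue;
import javax.jms.QueueBrowser;
import javax.jms.Session;

public class BrowseExample {
    // Peeks at queued messages without consuming them; there is no topic equivalent.
    static void browse(Session session, Queue queue) throws Exception {
        QueueBrowser browser = session.createBrowser(queue, "JMSPriority > 4"); // selector is optional
        Enumeration<?> messages = browser.getEnumeration();
        while (messages.hasMoreElements()) {
            Message message = (Message) messages.nextElement();
            System.out.println(message.getJMSMessageID());
        }
        browser.close();
    }
}
```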
If you put a message on a queue and nothing is there to receive it, the message remains queued up. If you put a message to a topic and there are no active subscribers or durable subscriptions, the message is discarded. Therefore messages in a queue are naturally durable whereas messages on a topic may or may not be.
From a pure JMS perspective, queue and topic are both instances of Destination and are interchangeable if you don't try to browse. An application may not know whether the destination it opens is a queue or a topic unless it uses instanceof at run-time to find out.