Send broadcast message to 100K subscribers using ejabberd - xmpp

I am creating PubSub like messaging application in which all subscribers of particular channel will get the message if publishers sends to the channel moreover 100k is the maximum subscriber count.
using ejabberd may I know the possibility of performance i.e can ejabberd handle 100k subscribers and will able to send message to all ?

Performance depends on many elements. Payload size, push frequency, node configuration, type of online clients connection (slow / fast), machine specification.
However, you should be able to reach that level, indeed.

Related

Google PubSub with pull subscriber design flaw?

We are using googles steaming pull subscriber the design is as follows
We are doing
sending file from FE (frontend) to BE (backend)
BE converting that file to ByteArray and publishing to pubsub topic as message (so ByteArray going as message)
Topic sending that message to subscriber, subscriber converting the ByteArray to file again
that converted file subscriber sending to that tool
tool doing some cool stuff with file and notify the status to subscriber
that status going to BE and BE update the DB and sending that status to FE
Now in our subscriber when we receive message we are immediately acknowledge it and removing the listener of subscriber so that we don't get message any more
and when that tool done that stuff, it sending status to subscriber (we have express server running on subscriber) and
after receiving status we are re-creating listener of subscriber to receive message
Note
that tool may take 1hr or more to do stuff
we are using ordering key to properly distribute message to VM's
this code is working fine but my question is
is there any flaw in this (bcz we r removing listener then again re creating it or anything like that)
or any better option or GCP services to best fit this design
or any improvement in code
EDIT :
Removed code sample
I would say that there are several parts of this design that are sub-optimal. First of all, acking a message before you have finished processing it means you risk message loss. What happens if your tool or subscriber crashes after acking the message, but before processing has completed? This means when the processes start back up, they will not receive the message again. Are you okay with requests from the frontend possibly never being processed? If not, you'll want to ack after processing is completed, or--given that your processing takes so long--persist the request to a database or to some storage and then acknowledge the message. If you are going to have to persist the file somewhere else anyway, you might want to consider taking Pub/Sub out of the picture and just writing the file to storage like GCS and then having your subscribers instead read out of GCS directly.
Secondly, stopping the subscriber upon each message being received is an anti-pattern. Your subscriber should be receiving and processing each message as it arrives. If you need to limit the number of messages being processed in parallel, use message flow control.
Also ordering keys isn't really a way to "properly distribute message to VM's." Ordering keys is only a means by which to ensure ordered delivery. There are no guarantees that the messages for the same ordering key will continually go to the same subscriber client. In fact, if you shut down the subscriber client after receiving each message, then another subscriber could receiving the next message for the ordering key since you've acked the earlier message. If all you mean by "properly distribute message" is that you want the messages delivered in order, then this is the correct way to use ordering keys.
You say you have a subscription per client, then whether or not that is the right thing to do depends on what you mean by "client." If client means "user of the front end," then I imagine you plan to have a different topic per user as well. If so, then you need to keep in mind the 10,000 topic-per-project limit. If you mean that each VM has its own subscription, then note that each VM is going to receive every message published to the topic. If you only want one VM to receive each message, then you need to use the same subscription across all VMs.
In general, also keep in mind that Cloud Pub/Sub has at-least-once delivery semantics. That means that even an acknowledged message could be redelivered, so you do need to be prepared to handle duplicate message delivery.

kafka topic with Server Sent Events (SSE)

I have a requirement to use Kafka for SSE events, my application is a mid-tier application which hooks up Mobile chat calls from front-end with the server-side.
P.S: I saw a post similar to my requirement "https://stackoverflow.com/questions/65071532/replay-kafka-topic-with-server-sent-events" I have a few questions on it.
If we go with Kafka, is it a good practice to use multiple group id's, if an application is deployed in 200 machines(creating group id with machine name), as it will create 200 group id's?
Whenever we get a response from the server for a particular call we will need to map it to the same machine, in order to achieve it we have to consume messages in all 200 machines and read the header to see if it belongs to the same machine if machine name matches then we will send the response but other machines which consumes the same have to discard it after reading the header, will it not be an over kill in terms of CPU consumption or memory usage?

Is it possible for a single message to be given to multiply instance of the same subscription in gcloud PubSub

I have a publisher that publishes messages to a particular topic (myTopic), then on my PubSub I create a subscription name: myTopicSub to this topic (myTopic), then I have a VM that runs a service that listeners on my subscription myTopicSub
THIS WORKS
MY PROBLEM IS: if there be a need to scale, and I add 5 more VM to handle more messages from my subscription... is it possible for PubSub to send the same message to more than one VM...
Because I only need one VM to process the message once. Please I need help
Cloud Pub/Sub offers at-least-once delivery. That means that a message can be delivered multiple times and in some cases, can be delivered to two different subscribers for the same subscription within a short period of time. That particular type of duplicate delivery is rare, but not impossible.
Subscribers have to be able to handle the delivery of duplicates and depending on their nature may handle it in different ways. For some, all actions are idempotent, so re-processing the same message has no ill effects. In other cases, the subscribers need to track which messages they have received and processed and if a message is a duplicate, just immediately ack the message instead of process it.

How to prevent sending same data to different clients in REST api GET?

I have 15 worker clients and one master connected through internet. Job & data are been passed through REST api in json format.
Jobs are not restricted to any particular client. Any worker can query for the available job in regular interval(say 30 seconds), process it and will update the status.
In this scenario, how can I prevent same records been sent to different clients while GET request.
Followings are my solution approach to overcome this issue:
Take top 5 unprocessed records from the database and make it as SENT and expose via REST GET.
But the problem is, it creates inconsistency. Some times, the client doesn't got data due to network connectivity issue. But in server, it will be marked as SENT. So, no other clients can get that data. It will remain as SENT forever.
Get the list from server, and reply back the list of job IDs to Server as received. But in-between this time gap, some other clients also getting same set of Jobs.
You've stumbled upon a fundamental problem in distributed systems: there is no way to know if the other side received your message. You can certainly improve the situation with TCP and ack messages. But if you never get the ACK did the message never arrive, did it arrive but the recipient die before processesing, or did the recipient send he ACK and the ACK get dropped?
That means you need to design your system to handle receiving data more than once.
You offer two partial solutions; if you combine them, your solution starts to look like how SQS works. Mark the item as pending_ack with a timestamp. After client replies, it is marked sent. Any pending_ackss past a certain time period are eligible to be resent.
Pick your time period to allow for slow network and slow clients and it boils down to only sending duplicates when you really don't know if the client died or not.
Maybe you should reconsider the approach to blocking resources. REST architecture - by definition is not obliged to save information about client. Instead, you may want to consider optimistic concurrency control (http://en.wikipedia.org/wiki/Optimistic_concurrency_control).

Is RabbitMQ capable of "pushing" messages from a queue to a consumer?

With RabbitMQ, is there a way to "push" messages from a queue TO a consumer as opposed to having a consumer "poll and pull" messages FROM a queue?
This has been the cause of some debate on a current project i'm on. The argument from one side is that having consumers (i.e. a windows service) "poll" a queue until a new message arrives is somewhat inefficient and less desirable than the idea having the message "pushed" automatically from the queue to the subscriber(s)/consumer(s).
I can only seem to find information supporting the idea of consumers "polling and pulling" messages off of a queue (e.g. using a windows service to poll the queue for new messages). There isn't much information on the idea of "pushing" messages to a consumer/subscriber...
Having the server push messages to the client is one of the two ways to get messages to the client, and the preferred way for most applications. This is known as consuming messages via a subscription.
The client is connected. (The AMQP/RabbitMQ/most messaging systems model is that the client is always connected - except for network interruptions, of course.)
You use the client API to arrange that your channel consume messages by supplying a callback method. Then whenever a message is available the server sends it to the client over the channel and the client application gets it via an asynchronous callback (typically one thread per channel). You can set the "prefetch count" on the channel which controls the amount of pipelining your client can do over that channel. (For further parallelism an application can have multiple channels running over one connection, which is a common design that serves various purposes.)
The alternative is for the client to poll for messages one at a time, over the channel, via a get method.
You "push" messages from Producer to Exchange.
https://www.rabbitmq.com/tutorials/tutorial-three-python.html
BTW this is fitting very well IoT scenarios. Devices produce messages and sends them to an exchange. Queue is handling persistence, FIFO and other features, as well as delivery of messages to subscribers.
And, by the way, you never "Poll" the queue. Instead, you always subscribe to publisher. Similar to observer pattern. Generally, I would say genius principle.
So it is similar to post box or post office, except it sends you a notification when message is available.
Quoting from the docs here:
AMQP brokers either deliver messages to consumers subscribed to
queues, or consumers fetch/pull messages from queues on demand.
And from here:
Storing messages in queues is useless unless applications can consume
them. In the AMQP 0-9-1 Model, there are two ways for applications to
do this:
Have messages delivered to them ("push API")
Fetch messages as needed ("pull API")
With the "push API", applications have to indicate interest in
consuming messages from a particular queue. When they do so, we say
that they register a consumer or, simply put, subscribe to a queue. It
is possible to have more than one consumer per queue or to register an
exclusive consumer (excludes all other consumers from the queue while
it is consuming).
Each consumer (subscription) has an identifier called a consumer tag.
It can be used to unsubscribe from messages. Consumer tags are just
strings.
RabbitMQ broker is like server that wont send data to consumer without consumer client getting registering itself to server. but then question comes like below
Can RabbitMQ keep client consumer details and connect to client when packet comes?
Answer is no. so what is alternative well then write plugin by yourself that maintain client information in some kind of config. Plugin will pull from RabbitMQ Queue and push to client.
Please give look at this plugin might help.
https://www.rabbitmq.com/shovel.html
Frankly speaking Client need to implement AMQP protocol to receive so and should listen connection on some port for that. This sound like another server.
Regards,
Vishal