MSMQ multiple readers - msmq

This is my proposed architecture. Process A creates items and adds them to queue A on the local machine, and I plan to have multiple instances of a Windows service (running on different machines) reading from queue A. Each instance would read a set of messages and then process that batch.
What I want to make sure is that a particular message will not get processed multiple times (by the different Windows services). Does MSMQ guarantee single delivery by default?
Should I make the queue transactional, or would a regular queue suffice?

If you need to make sure that a message is delivered only once, use a transactional queue. That said, when a service receives a message, the message is removed from the queue, so it can only be received once, by a single reader.

Related

What's the point of having a single celery worker with multiple queues?

Continuing from "How does a Celery worker consuming from multiple queues decide which to consume from first?"
I've set up a single worker and have it listen to two queues. I understand from the question linked above that the worker consumes messages from those two queues either round-robin or in the order they arrived (depending on the Celery version).
So what's the purpose of this setting? How is it different from a single queue? Is it helpful only for monitoring, or is there an operational benefit I'm missing here?
In most scenarios your worker will be subscribed to a single queue; however, there are scenarios where the ability to subscribe to multiple queues makes sense.
Here is one. Imagine you have a Celery cluster of 10 machines. They perform various tasks, and among them is a task that downloads files from a remote file server. However, the owner of the file server has whitelisted the IPs of only two of your 10 machines, so only those two can download files from that particular file server. Typically you would have the Celery workers on these two machines subscribe to an additional queue, called "download" for example, and schedule download tasks by sending them to the "download" queue.
This is a very common scenario, where only a subset of your nodes can do a particular thing (access remote servers: file servers, database servers, etc.).
One could ask, "why not have just the 'download' queue on these two machines?" That may be a waste of resources, since those machines would sit idle whenever there is nothing to download.
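To make the "download" scenario concrete, here is a minimal configuration sketch. The settings `task_default_queue` and `task_routes` and the worker's `-Q` option are regular Celery features, but the module name `celeryconfig.py`, the project name `proj`, the task name `tasks.download_file` and the `WORKER_QUEUES` dictionary are assumptions made up for illustration.

```python
# celeryconfig.py (hypothetical module; task and project names are made up)

# Tasks go to the default queue unless a route says otherwise.
task_default_queue = "celery"

# Send only the download task to the dedicated "download" queue.
task_routes = {
    "tasks.download_file": {"queue": "download"},
}

# Not a Celery setting: just a note of which queues each kind of machine
# would subscribe to, as passed to the worker's -Q option:
#   whitelisted machines:  celery -A proj worker -Q celery,download
#   all other machines:    celery -A proj worker -Q celery
WORKER_QUEUES = {
    "whitelisted": ["celery", "download"],
    "other": ["celery"],
}
```

With this layout the two whitelisted machines still take their share of ordinary tasks, and they are the only ones that ever see a download task.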

Using Celery with multiple workers in different pods

What I'm trying to do is use Celery with Kubernetes. I'm using Redis as the message broker, running in its own pod, and I have a separate pod for each Celery queue.
Say I have 3 queues: I would have 3 different pods (i.e. workers) that can accept and handle the requests.
Everything is working fine so far, but my question is: what would happen if I cloned the pod of one of the queues, so that two pods serve a single queue?
I think the client (i.e. Django) creates a new message in Redis to send to the worker and start the job, but it's not clear to me what would happen when two pods are listening to the same queue. Does the first pod accept the request, start the job, and prevent the other pod from accepting it?
(I searched the Celery documentation for clues but couldn't find any, which is why I'm asking this question.)
I guess you are using the basic task type, which uses a 'direct' queue rather than a 'fanout' or 'topic' queue; the latter two behave quite differently and are not discussed here.
When Redis is the broker transport, celery/kombu uses a Redis list as the storage for a queue (source), publishing messages with LPUSH and consuming them with BRPOP.
In short, BRPOP (doc) blocks the connection when there are no elements to pop from the given lists; if a list is not empty, an element is popped from its tail. This operation is guaranteed to be atomic, so no two connections can get the same element.
Celery leverages this feature to guarantee at-least-once message delivery. The use of acknowledgements doesn't affect this guarantee.
In your case there are multiple Celery workers across multiple pods, but all of them are connected to the same Redis server, blocked on the same key, and trying to pop an element from the same list. When a new message arrives, one and only one worker gets it.
A task message is not removed from the queue until that message has been acknowledged by a worker. A worker can reserve many messages in advance and even if the worker is killed – by power failure or some other reason – the message will be redelivered to another worker.
More: http://docs.celeryproject.org/en/latest/userguide/tasks.html
The two workers (pods) will receive tasks and complete them independently. It's like having a single pod that processes tasks at twice the speed.
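As a rough in-process analogy (not Celery or Redis code), the sketch below uses Python's `queue.Queue` in place of the Redis list and threads in place of pods. The names `run_workers`, `handled` and `all_ids` are made up for the example. The point it illustrates is only that an atomic pop hands each task to exactly one consumer.

```python
import queue
import threading


def run_workers(num_workers, num_tasks):
    """Run several workers against one shared queue; return what each handled."""
    tasks = queue.Queue()
    for task_id in range(num_tasks):
        tasks.put(task_id)

    handled = [[] for _ in range(num_workers)]

    def worker(index):
        while True:
            try:
                # The pop is atomic: a given task is returned to one caller only.
                task_id = tasks.get_nowait()
            except queue.Empty:
                return
            handled[index].append(task_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled


handled = run_workers(num_workers=2, num_tasks=100)
# Every task shows up exactly once across both workers.
all_ids = sorted(task_id for per_worker in handled for task_id in per_worker)
```

How the tasks are split between the two workers varies from run to run; what does not vary is that no task is handled twice.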

BizTalk - How to throttle a streaming disassemble pipeline

I need to limit the number of orchestration instances spawned while debatching a large message in a streaming disassemble receive pipeline. Let’s say that I have a large XML message coming in that contains 100 000 separate "Order" messages. The receive pipeline would then debatch it and create 100 000 "ProcessOrder" orchestrations. This is too much and I need to limit that.
Requirements
The debatching needs to be done in a streaming manner so that I only load one "Order" message in memory at a time before sending it to the messagebox;
The debatching needs to be throttled based on the number of currently running "ProcessOrder" orchestration instances (say, if I already have 100 running instances, the debatching would wait until one has finished before sending another "Order" message to the messagebox).
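Taken together, the two requirements describe a streaming producer gated by a counting semaphore. The sketch below shows that idea in plain Python, with a generator standing in for the streaming debatch and threads standing in for orchestration instances. It is not BizTalk code, and every name in it (`debatch`, `process_throttled`, `max_running`) is an assumption made up for illustration.

```python
import threading


def debatch(order_count):
    """Yield one order at a time instead of building the whole list in memory."""
    for n in range(order_count):
        yield f"order-{n}"


def process_throttled(orders, max_running):
    """Start one worker per order, never exceeding max_running at a time."""
    slots = threading.BoundedSemaphore(max_running)
    lock = threading.Lock()
    stats = {"running": 0, "peak": 0, "processed": 0}
    threads = []

    def process(order):
        try:
            with lock:
                stats["running"] += 1
                stats["peak"] = max(stats["peak"], stats["running"])
            # Real order processing would happen here.
            with lock:
                stats["running"] -= 1
                stats["processed"] += 1
        finally:
            slots.release()  # free the slot so the next order can be pulled

    for order in orders:
        slots.acquire()  # blocks while max_running workers are active
        thread = threading.Thread(target=process, args=(order,))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return stats


stats = process_throttled(debatch(200), max_running=5)
```

The next order is only pulled from the generator once a slot is free, so the producer waits on the consumers rather than running ahead of them.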
Where I'm at
I have the receive pipeline that does the debatching and functional modifications to my messages. It does what it should in a streaming manner and puts individual messages in VirtualStreams;
I have an orchestration and helper methods that can limit the number of “ProcessOrder” orchestration instances.
The problem
I know that I can run a receive pipeline inside an orchestration (and that would solve my problem, since on every "getnext" call to the pipeline I could just hold on if there are too many running orchestration instances). But, digging into the BizTalk DLLs, I noticed that Microsoft.XLANGs.Pipeline.XLANGPipelineManager still loads all the messages into memory instead of enumerating them like Microsoft.BizTalk.PipelineOM.PipelineManager does. I know every message is put in a VirtualStream, but this is still inadequate, memory-wise, for such a large number of messages.
Question
My next step would be to run the receive pipeline directly in the receive port (so it would use Microsoft.BizTalk.PipelineOM.PipelineManager), without the orchestration that limits the number of “ProcessOrder” instances; but to meet the requirements, I would need to add delay logic to my pipeline. Is this a viable option? If not, why not, and what other alternatives do I have?
You should debatch all the messages once in the pipeline and store the individual messages in MSMQ before they are processed by the orchestration. Use the standard pipeline to debatch, as it handles debatching of large files efficiently. MSMQ is available for free through "Turn Windows features on or off". Using MSMQ is very easy and does not require any development. Sending to MSMQ is very fast; 100K messages is not an issue at all.
Then have a receive location read from MSMQ. Depending on your orchestration throughput, you can control the message flow by using BizTalk receive host throttling, by receiving the messages from MSMQ in order, or by a combination of both. Make sure you have separate host instances for receiving from MSMQ, for sending to MSMQ, and for your orchestration processing.
All of this is done through configuration, without any extra code, which simplifies your design. Make sure your orchestration has a minimal number of persistence points.

NServiceBus distributor worker create a queue called PRIVATE$\order_queue$

I have created an NServiceBus Distributor and Worker, running on separate machines. When I run the worker, it successfully sends a message to the Distributor (and I can see it processed through the Storage queue) but for some reason an output queue is created on the Distributor called
'DIRECT=TCP:xx.xx.xx.xx\PRIVATE$\order_queue$' when the queue should be called
'DIRECT=OS:WORKERDNSNAME\private$\myqueue'.
Does anyone know why the order_queue$ is being created?
Shameless copy direct from an old post at pg2e.blogspot.co.uk:
Transactional queues over HTTP from private networks
When sending messages to a transactional queue over http/s from a
server without a public IP address, the ACK messages may have a hard
time reaching their destination. This is due to the same cause as in
this post (basically, NATting causes a mismatch with the message destination address).
By default the receipts are sent to the sending computer's name, which
of course will not work unless both parties reside on the same
network. To fix this you have to map the receipts to the public address
of the sender. This is done by creating an XML file (of any name) in
C:\WINDOWS\system32\msmq\mapping with the following content.
<StreamReceiptSetup xmlns="msmq-streamreceipt-mapping.xml">
<setup>
<LogicalAddress>http://msmq.domain.com/*</LogicalAddress>
<StreamReceiptURL>http://[ADDRESS_TO_SENDER]/msmq/Private$/order_queue$</StreamReceiptURL>
</setup>
<default>http://xxx.xx.xxx.xx/msmq/Private$/order_queue$</default>
</StreamReceiptSetup>
Explanation: All messages sent to any queue at msmq.domain.com will
have their receipts sent to the given StreamReceiptURL. The
order_queue$ queue is used to handle transactional control messages.
I suspect later versions of MSMQ or NServiceBus handle creating this queue automatically without you having to create the XML file yourself.

MSMQ as a job queue

I am trying to implement a job queue with MSMQ to save myself the time of implementing it in SQL. After reading around, I realized MSMQ might not offer what I am after. Could you please advise me whether my plan is realistic with MSMQ, or recommend an alternative?
I have a number of processes picking up jobs from a queue (I might need to scale out in the future). Once a job is picked up, processing follows; during this time the job is locked to other processes by its status. If needed, the job is chucked back to the queue (its status changes again) for further processing, but physically the job still sits in the queue until it is completed.
MSMQ doesn't let me keep the message in the queue while working on it, i.e. I can either peek or read. Read takes the message out of the queue, and peek doesn't allow changing the message (its status).
Thank you
Using MSMQ as a datastore is probably bad as it's not designed for storage at all. Unless the queues are transactional the messages may not even get written to disk.
Certainly updating queue items in-situ is not supported for the reasons you state.
If you don't want a full-blown relational DB you could use an in-memory cache of some kind, like memcached, or a cheap object DB like raven.
Take a look at RabbitMQ, or one of the many other message queues. Most offer this functionality out of the box.
For example, RabbitMQ calls what you are describing "Work Queues". Multiple consumers can pull from the same queue without pulling the same item. Furthermore, if you use acknowledgements and the processing fails, the item is not removed from the queue.
.NET examples:
https://www.rabbitmq.com/tutorials/tutorial-two-dotnet.html
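To show the acknowledgement idea without a broker, here is a toy in-memory model in Python. It is not the RabbitMQ client API; `AckQueue`, `process_all` and `flaky` are made-up names for the sketch. A delivered message stays tracked until it is acked, and a failed one is put back for another attempt.

```python
from collections import deque


class AckQueue:
    """Toy work queue: a delivered message stays tracked until it is acked."""

    def __init__(self, items):
        self._ready = deque(items)
        self._unacked = {}
        self._next_tag = 0

    def get(self):
        """Deliver the next ready message as (tag, body), or None if none is ready."""
        if not self._ready:
            return None
        self._next_tag += 1
        body = self._ready.popleft()
        self._unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        """Processing succeeded: forget the message for good."""
        del self._unacked[tag]

    def nack(self, tag):
        """Processing failed: make the message available again."""
        self._ready.append(self._unacked.pop(tag))

    def pending(self):
        return len(self._ready) + len(self._unacked)


def process_all(work_queue, handler):
    """Drain the queue, acking successes and requeueing failures."""
    done = []
    delivery = work_queue.get()
    while delivery is not None:
        tag, body = delivery
        try:
            handler(body)
        except Exception:
            work_queue.nack(tag)
        else:
            work_queue.ack(tag)
            done.append(body)
        delivery = work_queue.get()
    return done


failed_once = set()


def flaky(body):
    """Fail the first time "b" is seen, succeed afterwards."""
    if body == "b" and body not in failed_once:
        failed_once.add(body)
        raise RuntimeError("transient failure")


work_queue = AckQueue(["a", "b", "c"])
done = process_all(work_queue, flaky)
```

Here "b" fails once, goes back to the end of the queue, and is completed on the second attempt, so nothing is lost.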
EDIT: After using MSMQ myself, it would probably work very well for what you are doing, as far as I can tell. The key is to use transactions and multiple queues. For example, each status should have its own queue. It's fairly safe to "move" messages from one queue to another, since the move occurs within a transaction. This moving of messages is essentially your change of status.
We also use the Message Extension byte array for storing message metadata, like status. This way we don't have to alter the actual message when moving it to another queue.
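The one-queue-per-status pattern can be modelled in a few lines of Python. This is an in-memory sketch and not MSMQ code: the `Message` and `StatusQueues` classes are made up, the `extension` dictionary only stands in for the Extension byte array, and the put-back in the `except` branch only imitates what a transaction abort would do.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    body: str
    extension: dict = field(default_factory=dict)  # metadata kept outside the body


class StatusQueues:
    """One queue per status; changing status means moving the message."""

    def __init__(self, statuses):
        self.queues = {status: deque() for status in statuses}

    def send(self, status, message):
        message.extension["status"] = status
        self.queues[status].append(message)

    def move(self, source, target, work=None):
        """Receive from source, run work, send to target; undo on failure."""
        target_queue = self.queues[target]  # fail early on an unknown status
        message = self.queues[source].popleft()
        try:
            if work is not None:
                work(message)
        except Exception:
            # Imitates a transaction abort: the message returns to where it was.
            self.queues[source].appendleft(message)
            raise
        message.extension["status"] = target
        target_queue.append(message)
        return message


queues = StatusQueues(["new", "processing", "done"])
queues.send("new", Message("job-1"))
job = queues.move("new", "processing")
queues.move("processing", "done", work=lambda message: None)
```

The body of the message is never touched; only the queue it sits in and the metadata change.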
MSMQ, and queues in general, require a different set of patterns than most programmers are used to. Keep that in mind.
Perhaps, if you can give more information on why you need to peek for messages that are currently in process, there would be a way to handle that scenario with MSMQ. You could always add a database for additional tracking.