Azure DevOps: Azure Service Bus items stuck as Queued

I have a Service Hook set up that sends an entry to an Azure Service Bus queue. After experiencing processing delays, I noticed that the items were not being processed and remained in Queued status. After several hours, new items were processed successfully.
No errors were found in Service Bus queue message processing for the items that did go through.
Will the queued items eventually get processed?

I'm afraid they won't be processed.
Any items that still show a Queued status will not be picked up again by the service and will not be delivered at this point, and there is no way to send the request again.
As far as I know, Service Hooks doesn't use jobs to create or send notifications; it uses only application-tier (AT) threads and runs entirely in memory. Hence, if the AT machine restarts or loses its memory contents, the notification for that event is lost and will not be sent. This could be the cause of this issue, but it is hard to determine the specific cause.
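A minimal sketch, assuming a purely in-memory dispatcher, of why Queued items are not recovered: pending notifications live only in process memory, so a simulated restart discards them, while events raised after the restart flow normally. The class and method names are hypothetical; this is not the actual Service Hooks implementation.

```python
from collections import deque


class InMemoryDispatcher:
    """Hypothetical in-memory notification dispatcher (illustration only)."""

    def __init__(self):
        self.pending = deque()  # queued notifications exist only in this process
        self.delivered = []     # stands in for the receiver (e.g. the Service Bus queue)

    def enqueue(self, event_id):
        self.pending.append(event_id)

    def deliver_one(self):
        # Send the oldest queued notification, if there is one.
        if self.pending:
            self.delivered.append(self.pending.popleft())

    def restart(self):
        # Nothing was persisted, so a restart wipes every queued notification.
        self.pending = deque()


d = InMemoryDispatcher()
for event in ("e1", "e2", "e3"):
    d.enqueue(event)
d.deliver_one()     # e1 is delivered
d.restart()         # e2 and e3 are lost for good
d.enqueue("e4")     # events raised after the restart are processed normally
d.deliver_one()
print(d.delivered)  # ['e1', 'e4']
```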
Alternatively, you could submit a suggestion ticket requesting the ability to resend Queued/Failed events (no such feature exists now), which would help in this situation.

Related

Rebus: How to cancel or clear deferred messages from the schedule list with Azure Service Bus?

We recently moved from RabbitMQ to Azure Service Bus, and since then we have been getting many complaints from customers about duplicate reminders/reports.
Cause: deferred messages are duplicated whenever the app/component restarts, so they are sent multiple times.
Why did it work with RabbitMQ? Because during bus initialization we could use configurer.Timeouts(x => x.StoreInMemory()), but with Azure Service Bus we can't. With the in-memory store, the deferred (schedule) list was cleared on every restart; with Azure Service Bus the schedule is persisted, so every restart adds more scheduled messages.
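To make the mechanism concrete, here is a minimal simulation (hypothetical names; it does not use the Rebus API): a startup routine defers the same reminder each time the app starts. With an in-memory store the schedule is wiped on every restart, so only one reminder remains; with a persistent store the deferrals accumulate.

```python
class TimeoutStore:
    """Hypothetical stand-in for wherever deferred messages are kept."""

    def __init__(self):
        self.scheduled = []

    def defer(self, message):
        self.scheduled.append(message)


def start_app(store):
    # Hypothetical startup logic that schedules a reminder on every start.
    store.defer("daily-reminder")


def run_restarts(persistent, restarts):
    store = TimeoutStore()
    for _ in range(restarts):
        if not persistent:
            # In-memory store: each restart begins with an empty schedule.
            store = TimeoutStore()
        start_app(store)
    return store.scheduled


print(run_restarts(persistent=False, restarts=3))  # ['daily-reminder']
print(run_restarts(persistent=True, restarts=3))   # three copies of 'daily-reminder'
```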
So, what is the best way to handle this?
Is it possible to clear deferred messages on startup, or to use the in-memory Timeouts feature with Azure Service Bus?

Event Replay using TrackingEventProcessor - Axon 3

I'm following the axon-springboot example shared by Allard (https://github.com/abuijze/bootiful-axon).
My understanding so far is: (please correct me if I have misunderstood some of the concepts)
Events are raised and stored in the event store/event bus (MySQL, using EmbeddedEventStore). Event processors (TrackingProcessors, in my case) then pull events from the source (MySQL, right?), and event handlers execute the business logic, update the query storage, and publish a message to RabbitMQ.
My first question: where, when, and by whom is this message published to RabbitMQ? (It is consumed by the statistics application, which has the message listener configured.)
I have configured the TrackingProcessor to try the replay functionality. To execute the replay, I stop my processor, delete the token entry for the processor, and start the processor again; the events are replayed and my Query Storage is up to date, as expected.
My second question: when the replay is triggered and Query Storage is updated, I don't see any messages being published to RabbitMQ, so my statistics application is out of sync. Am I doing something wrong?
Can you please advise?
Thanks
Singh
First of all, a correction: it is not the Tracking Processor or the updater of the view model that sends the messages to RabbitMQ. The Events are forwarded to Rabbit as they are published to the Event Bus.
The answer to your first question: messages are published by the SpringAmqpPublisher, which connects directly to the Event Bus, and forwards any published message to RabbitMQ as they are published.
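A minimal sketch of publish-time forwarding, assuming a simplified event bus (hypothetical classes, not Axon's API; a plain list stands in for RabbitMQ): the forwarder sees each event exactly once, at the moment it is published, and never reads the event store afterwards.

```python
class EventBus:
    """Hypothetical event bus that notifies subscribers at publish time."""

    def __init__(self):
        self.subscribers = []
        self.store = []  # stands in for the event store

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event):
        self.store.append(event)          # the event is persisted...
        for callback in self.subscribers:  # ...and pushed to live subscribers, once
            callback(event)


rabbit_queue = []  # stands in for the RabbitMQ exchange/queue
bus = EventBus()
# Plays the role described for SpringAmqpPublisher: forward each event as it is published.
bus.subscribe(rabbit_queue.append)

bus.publish("OrderPlaced")
bus.publish("OrderShipped")
print(rabbit_queue)  # ['OrderPlaced', 'OrderShipped']
```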
To answer your second question, let's clarify how replays work, first. While it's called a "replay", essentially it's more a "reset". The Tracking Processor uses a TrackingToken to remember its progress of processing the Event Store. When the token is deleted (or just not yet available), the Tracking Processor starts processing from the beginning of the Event Store.
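The token mechanism can be sketched as follows, under simplifying assumptions: the event store is a list, the token is the position of the last handled event, and `None` means "no token". This illustrates the idea only; it is not Axon's `TrackingEventProcessor`. Note that only this processor's handler runs during the reset; nothing here republishes events to a broker.

```python
class TrackingProcessor:
    """Hypothetical tracking processor that reads the event store from its token."""

    def __init__(self, handler):
        self.handler = handler
        self.token = None  # None: no token yet, so start at the beginning

    def process(self, event_store):
        start = 0 if self.token is None else self.token + 1
        for position in range(start, len(event_store)):
            self.handler(event_store[position])
            self.token = position  # remember progress

    def reset(self):
        # Deleting the token makes the next run start from position 0.
        self.token = None


event_store = ["Created", "Renamed", "Closed"]
view = []
processor = TrackingProcessor(view.append)

processor.process(event_store)
processor.process(event_store)  # token already at the end: nothing is handled twice
print(view)  # ['Created', 'Renamed', 'Closed']

view.clear()       # wipe the query model...
processor.reset()  # ...and delete the token
processor.process(event_store)
print(view)  # ['Created', 'Renamed', 'Closed'], rebuilt from the first event
```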
You never replay an entire application, just a single (Tracking) Processor. Just imagine: you republish all messages to RabbitMQ, other components are triggered again, unaware that these are "old" messages, and user-confirmation emails are sent again, orders are placed again, etc.
If your Statistics are out of date, it's because they aren't part of the same processor and aren't rebuilt together with the other elements. RabbitMQ doesn't support "replaying", since it doesn't remember messages after delivering them.
Any model that you want to be able to rebuild should be managed by a Tracking Processor.
Check out the Axon Reference guide for more information: https://docs.axonframework.org/part3/event-processing.html#event-processors

Reference counted Pub/Sub system

I am searching for a way to design my system that consists of multiple publishers, multiple channels and multiple subscribers, all of which can be uniquely identified easily.
I need to send messages in both directions with as low latency as possible. However, if a subscriber dies, the messages it subscribed to should not be dropped; when it comes back online, it should receive all pending messages. Because I handle very high message volumes (up to 1000 per second on a regular basis) on a low-spec server, keeping lists of all messages at all times is not an option.
I was considering whether a reference count/list for messages is a viable option. When a message is published, it is initialized with a list of the subscribers to that specific channel; when a subscriber receives the message, that subscriber is removed from the list. The message is deleted once the list is empty.
Now, if a subscriber dies without unsubscribing, its messages will not be removed, because the list of subscribers still waiting for them is not empty. When it comes back online, it will be able to receive all of its pending messages, since it identifies itself with the same ID as the dead instance.
Perhaps messages/subscribers would also need to time out; for example, if a subscriber has been inactive for 10 minutes, all list entries containing it are cleared.
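A minimal in-memory sketch of the reference-counted design described above, assuming single-threaded use and caller-supplied timestamps so it can be tested deterministically. The class and method names are hypothetical, and this is not how RabbitMQ or Redis work internally. Each message keeps the set of subscriber IDs that still need it; fetching by ID delivers and releases; expiring inactive subscribers releases their references.

```python
import time


class RefCountedBroker:
    """Sketch of the reference-counted pub/sub idea (all names are hypothetical)."""

    def __init__(self, inactivity_timeout=600.0):
        self.channels = {}   # channel -> set of subscriber IDs
        self.messages = {}   # message ID -> (payload, subscribers still to receive it)
        self.last_seen = {}  # subscriber ID -> time of last activity
        self.timeout = inactivity_timeout
        self._next_id = 0

    @staticmethod
    def _now(now):
        return time.monotonic() if now is None else now

    def subscribe(self, subscriber, channel, now=None):
        self.channels.setdefault(channel, set()).add(subscriber)
        self.last_seen[subscriber] = self._now(now)

    def publish(self, channel, payload):
        # The message starts with one reference per current subscriber.
        pending = set(self.channels.get(channel, ()))
        if not pending:
            return None  # no subscribers: nothing needs to be kept
        msg_id = self._next_id
        self._next_id += 1
        self.messages[msg_id] = (payload, pending)
        return msg_id

    def fetch(self, subscriber, now=None):
        """Deliver every pending message for a (possibly reconnected) subscriber ID."""
        self.last_seen[subscriber] = self._now(now)
        delivered = []
        for msg_id in sorted(self.messages):  # sorted() copies the keys, so deleting is safe
            payload, pending = self.messages[msg_id]
            if subscriber in pending:
                delivered.append(payload)
                pending.discard(subscriber)
                if not pending:
                    del self.messages[msg_id]  # last reference gone
        return delivered

    def expire(self, now=None):
        """Forget subscribers idle longer than the timeout and release their references."""
        now = self._now(now)
        dead = {s for s, t in self.last_seen.items() if now - t > self.timeout}
        for msg_id in list(self.messages):
            _, pending = self.messages[msg_id]
            pending -= dead
            if not pending:
                del self.messages[msg_id]
        for members in self.channels.values():
            members -= dead  # an expired subscriber must subscribe again
        for s in dead:
            del self.last_seen[s]
        return dead


broker = RefCountedBroker(inactivity_timeout=600)
broker.subscribe("A", "prices", now=0)
broker.subscribe("B", "prices", now=0)
broker.publish("prices", "p1")
broker.publish("prices", "p2")
print(broker.fetch("A", now=1))  # ['p1', 'p2']
print(len(broker.messages))      # 2: B still holds a reference to both
print(broker.fetch("B", now=5))  # ['p1', 'p2'] once B comes back online
print(len(broker.messages))      # 0

broker.publish("prices", "p3")
broker.fetch("A", now=700)
print(broker.expire(now=700))    # {'B'}: idle for 695 s > 600 s
print(len(broker.messages))      # 0: p3 released along with B's reference
```

One design choice worth noting: `expire` also unsubscribes the dead ID, so a subscriber returning after the timeout has to subscribe again and will not see messages published while it was gone.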
Is that a good idea? Have I overlooked problems that could arise with this system in particular? Is there any system that already does this? RabbitMQ and similar pub/sub systems don't seem to have this; if not, I guess Redis is the way to go?
I can imagine managing a reference count for the purposes of message lifecycles. This sounds reasonable in terms of message and memory management during normal service operation. Of course, timeouts provide a patch for references from dead services.
However in terms of health monitoring and service recovery issues this is quite another story.
The danger I currently see here is state management. Imagine a service that is a stateful subscriber (i.e. it has a state machine) that is driven from its initial state (I) to a certain state (S). Each message is processed differently in different states. Now imagine that your service dies and gets restarted. Meanwhile, some messages are stored, and after the service is back online they are dispatched to it. However, the service receives them in the wrong state (I instead of S) and acts unexpectedly.
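A small, hypothetical state machine illustrates the danger: the same buffered message produces a different result when it is redelivered to a freshly restarted instance that is back in state I.

```python
class Subscriber:
    """Hypothetical stateful subscriber: one message means different things per state."""

    def __init__(self):
        self.state = "I"  # initial state

    def handle(self, message):
        if message == "start":
            self.state = "S"
            return "started"
        if message == "tick":
            # "tick" is only meaningful once the machine has reached state S.
            return "counted" if self.state == "S" else "ignored"
        return "unknown"


live = Subscriber()
print([live.handle(m) for m in ["start", "tick"]])  # ['started', 'counted']

# The service crashes after handling "start"; a fresh instance begins in state I,
# and only the stored "tick" is dispatched to it.
restarted = Subscriber()
print(restarted.handle("tick"))  # ignored (the original instance would have counted it)
```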
Can you restore the service in the exact state it was when crashed? In practice, this is extremely difficult since even in the State Machine approach the service has side effects / communicates with global state(s) etc.
Bottom line: reference counting seems reasonable for managing messages, but mixing it with health monitoring results in a lot of complexity.

First message not arriving over an MSMQ/MassTransit Service Bus

I've got a MassTransit ServiceBus running over MSMQ. It appears that the first message sent over the Bus doesn't arrive, but subsequent messages do?
Is there some initialization that needs performing on the queue or bus before the message is sent?
How much time the system needs to set up before everything routes correctly depends on a few settings. If only the first message fails to end up in the right location, then the subscription data likely hasn't propagated everywhere yet. http://readthedocs.org/docs/masstransit/en/develop/overview/subscriptions.html
Multicast subscriptions, the easiest choice, require a few seconds after an endpoint has come up to register a subscriber with all other endpoints. If you can control the order in which services start up, this can often be avoided by starting them back to front in the flow.
If you are using the subscription service, it can also take a couple of seconds to get data everywhere. The subscription has to go through the subscription service, which then sends it to everyone on the bus. This is tied to a SQL database, so latency to the database can affect this timing.
Lastly, if you are using static routing, it should work immediately, because the subscription is set up at startup.
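A toy model of the race, assuming a router that only delivers to endpoints whose subscriptions it already knows about (hypothetical names, not the MassTransit API): with dynamic subscriptions the first message can be sent before the subscription arrives and is simply not delivered, whereas a statically configured route exists from startup.

```python
class Router:
    """Hypothetical router: delivers a message only to endpoints it already knows."""

    def __init__(self, static_routes=None):
        # Static routes are known at startup; dynamic ones arrive later.
        self.routes = {k: list(v) for k, v in (static_routes or {}).items()}
        self.delivered = []
        self.dropped = []

    def add_subscription(self, message_type, endpoint):
        # Called when subscription data finally propagates to this node.
        self.routes.setdefault(message_type, []).append(endpoint)

    def send(self, message_type, body):
        endpoints = self.routes.get(message_type, [])
        if not endpoints:
            self.dropped.append(body)  # no known subscriber yet
        for endpoint in endpoints:
            self.delivered.append((endpoint, body))


# Dynamic subscriptions: the first message races the subscription and is lost.
dynamic = Router()
dynamic.send("OrderPlaced", "msg-1")
dynamic.add_subscription("OrderPlaced", "billing")
dynamic.send("OrderPlaced", "msg-2")
print(dynamic.dropped)    # ['msg-1']
print(dynamic.delivered)  # [('billing', 'msg-2')]

# Static routing: the route exists at startup, so the first message arrives.
static = Router(static_routes={"OrderPlaced": ["billing"]})
static.send("OrderPlaced", "msg-1")
print(static.delivered)   # [('billing', 'msg-1')]
```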

NServiceBus MSMQ messages intermittently get stuck on the Outgoing Queue

We have a Pub / Sub system based on NServiceBus, where we have intermittent issues with messages getting stuck indefinitely on the Publisher's outgoing queue, rather than being transmitted to the Subscribers' input queues.
Points to note:
When we restart the Publisher Service and Subscriber services, message flow resumes normally for a while.
The problem seems to occur more often after a long period with no messages.
The publisher service resides on the LAN, the subscribers on the other side of a firewall.
Some messages get through! As mentioned after service restarts, things go fine for a while.
Using QueueExplorer, I can see the messages on the Outgoing queue have a state of WAITING.
Annoyingly our development environment does not exhibit this behaviour, but then again the publisher and subscribers all reside on the same LAN in this environment.
MSMQ messages being stuck in an outgoing queue is purely an MSMQ issue. Restarting the Publisher and Subscriber services should make no difference as they are not directly involved in message delivery. If you can fix the problem by ONLY restarting the Pub/Sub services and NOT the Message Queuing services then it looks like a resources/memory leak problem.
I imagine something like this happening:
1. Messages flow to the destination, which uses up kernel memory in storing them
2. For some reason, kernel memory runs out (too many messages, memory leak, whatever)
3. Destination now rejects new messages as they cannot be loaded into memory from the wire
4. Connection is reset and not re-connected until the WaitTime value is reached; the queue is "waiting" at this point
5. System loops through (3) and (4) until ...
6. Pub/Sub services are restarted and now there are sufficient resources for messages to be delivered
7. Goto (2)
Occasional messages get through when just enough kernel memory is temporarily freed up by one of the many services and device drivers that use it.
Item 4 of this blog post is the most likely culprit:
http://blogs.msdn.com/b/johnbreakwell/archive/2006/09/18/insufficient-resources-run-away-run-away.aspx
Cheers
John Breakwell
We had a similar scenario in production: it turned out we had migrated one of our subscriber endpoints to a new physical host and forgot to unsubscribe before shutting down the old endpoint. Our publisher was trying to deliver messages to both the old and new endpoints but could only reach the new one. Eventually the publisher's outbound queue grew so large that it started affecting all outgoing messages.
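A rough simulation of that failure, assuming each subscriber gets its own outgoing copy of every published message and copies for an unreachable host are never drained (hypothetical function and host names, not NServiceBus or MSMQ APIs):

```python
from collections import deque


def publish_round(outgoing, subscribers, reachable, message):
    """Fan a message out to every subscriber; undeliverable copies stay queued."""
    for sub in subscribers:
        outgoing[sub].append(message)
    for sub in subscribers:
        if sub in reachable:
            outgoing[sub].clear()  # delivered, so that queue drains


subscribers = ["new-host", "old-host"]  # old-host was never unsubscribed
reachable = {"new-host"}                # old-host has been shut down
outgoing = {sub: deque() for sub in subscribers}

for i in range(1000):
    publish_round(outgoing, subscribers, reachable, f"event-{i}")

print(len(outgoing["new-host"]))  # 0
print(len(outgoing["old-host"]))  # 1000 copies waiting for an endpoint that no longer exists
```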
I have run into this issue as well. I know it is not Item 4, as I don't send anything to it before it gets stuck in the outgoing queue. If I let both publisher and subscriber sit for about 10 minutes before sending a message, it never leaves the outgoing queue. If I send a message before that amount of time, it flows fine. Also, if I restart the subscriber, the message will then flow. This is reproducible every time I let them sit idle for 10 minutes.
I think I found the answer here, at least this fixed the issue I was having:
http://support.microsoft.com/kb/2554746
Also, in my case it had nothing to do with restarting, so don't let that throw you off. I did see the symptoms in netstat, and messages would initially go through when the client was first started up.
Just to throw my 2p in:
We had an issue where the message queuing service had some kind of memory leak and would consume large amounts of memory, which it did not release.
This led to messages getting stuck for long periods of time, although they would eventually be delivered (sometimes after 3 days).
We have not bothered fixing this yet as it only happens when the service is under heavy load which does not happen often.