Microsoft Message Queue Missing Messages - msmq

I am using C# and .Net Framework 1.1 (yes its old but I inherited this stuff and can't convert up). I places messages on a transactional queue but it does not get on the queue about 50% of the time. Running workgroup and Windows/XP Professional with all service packs installed. I don't see any messages in the dead letter queue either.
Any ideas where to look?

If it isn't hitting the queue at all and isn't going to the dead-letter queue, it suggests the item isn't being sent to the queue. You should be able to confirm that this is the case by switching on the journal for the queue.
Assuming it isn't hitting the queue, it is probably a transaction issue. I would check that you are definitely committing the message to the queue every time. Make sure there aren't any exceptions being thrown and swallowed that causes the transaction to roll back or never be committed (essentially the same thing). Also make sure there aren't any conditional statements that mean the commit gets skipped.
I would add some logging around every location where a transaction is started, committed and rolled back and also around any location where you are creating a message. You can then review you log to see the order of events and see what's going astray.
Another option would be to remove all of the transaction code and test the code against a non-transactional queue. If the messages all appear then it is a transactional problem. If not, the issue is elsewhere.
I use MSMQ a lot and the one thing I have learned through experience is that it works really well and the weak point is me :-)

Related

How to persist and replay NestJS CQRS event and saga across restart?

I am making an application which will need to use NestJS' CQRS module, as the requirements naturally lend themselves to that pattern.
Updates to the application logic are expected to be frequent and to happen during busy hours (that's just how my management works...), so the application needs to be able to restart gracefully. However, this means that events started just before the shutdown may not finish, or even if they do, some sagas may not trigger due to some events having happened before the restart... I'd like to ensure that doesn't happen.
I'm aware of NestJS' OnApplicationShutdown and OnApplicationBootstrap hooks, which is exactly for this purpose, but what I'm not sure is what I should do there. How can I capture all events that have unfinished handlers and sagas? Then after a restart, how can I make the event bus aware of the events monitored by sagas, without executing the already executed handlers?
I guess the second part could be worked around with a random ID per event/handler combo, that will be looked up in a log, and if present, the handler will be skipped, and if not, it will be executed and added to the log... But even with such a workaround, I don't see how I could do the first part. There will be a lot of events, and sagas (by definition) execute commands, meaning they have side effects... Even if all commands can become idempotent, the sheer quantity of events and frequent restarts means restarting from the very first command is a no go.
I've seen this package but I'm not sure if it solves this particular use case, or if it's really just logging the events, and pretty much nothing more.

Laravel Mail Queue Infinite Loop on Exception

Hello fellow programmers, I wish everyone a good morning.
The Situation
Laravel is great. Laravel Mail queues and the beanstalkd integration is great. It took me almost no time to get everything working. The sun is shining and its not raining. Its awesome.
Except when an exception is thrown while sending an email. Then thise mail is processed again and again and again and the exception is also thrown again and again and again.
Infinite loop.
I think I wouldnt even notice this if I wouldn't have seeded the database with invalid data. Validation usually would have taken care of that, that emails like 361FlorindaMatthäi#gmail.com dont end up with the folowing exception:
[Swift_RfcComplianceException]
Address in mailbox given [361FlorindaMatthäi#gmail.com] does not
comply with RFC 2822, 3.6.2.
But what validation wouldnt have taken care for is for example, when my mandrill account reaches its limits or my server looses internet connection, whatever. An Exception sends it into an infinite loop.
In the world where the sun is shining and everything is great the job has to be marked as buried or suspended and the next email should be processed. An infinite loop with an invalid email address is not great.
Basicly your application doesnt send out any emails anymore. This guy has roughly the same issue.
How can I fix this? Has anyone else encountered this Error?
Any Help is much appreciated.
You just need to travel Laravel how many times to try a specific job, before deciding it has failed:
php artisan queue:daemon --tries=3
This way, it will stop processing that specific job after 3 tries.
The hard part of any queue-based system is dealing with the errors, I've run tens of millions of jobs through BeanstalkD and many more through other systems like SQS.
With this Swift_RfcComplianceException exception it's clear that the job will never be able to succeed, and so trying it again would be futile.
Some other problems might be able to be recovered, but in either event, you have to wrap the code in a try/catch block and do what you can.
Since there is no way to 'fix' this particular issue, I would record what happened (the name of the exception and any message, and the data) to a log to check on, and then delete or bury the job. If you store the job-id in the log when it is buried it, you can go back and delete or kick that particular job again later - this would be after being able to change what happens to the job (rather than having it fail again).

PostgreSQL backend behavior upon receiving "Terminate" ('X') after "COMMIT"

We run a postgres server v9.2.8, and use epgsql (erlang) as a client library. And in some cases, which we had on production but weren't able to reproduce in dev environment, we're loosing data.
A function in our application (it should be killed) allows an operator to change session parameters on a running connection. Since connection is usually always busy on production, a "SET SESSION bla-bla" query always crashes pgsql_connection process.
Before crashing, pgsql_connection sends a "Terminate" ('X') signal via pgsql_sock (a wrapper around tcp socket) to a backend. At the same time another erlang process (let's call it "worker") is waiting for a response from postgres backend using the same socket.
Now the question: is it possible that upon receiving a "Terminate" signal from a client, backend can cancel last transaction even if it has sent an "OK" on "COMMIT" statement already?
Because if it is possible, a worker will have a chance to report to the main application process about successfully written transaction while indeed the transaction has been cancelled.
Or, where can I read more details about this? Documentation says (http://www.postgresql.org/docs/9.2/static/protocol-flow.html):
For either normal or abnormal termination, any open transaction is
rolled back, not committed. One should note however that if a frontend
disconnects while a non-SELECT query is being processed, the backend
will probably finish the query before noticing the disconnection. If
the query is outside any transaction block (BEGIN ... COMMIT sequence)
then its results might be committed before the disconnection is
recognized.
– not a crystal clear statement.
Now the question: is it possible that upon receiving a "Terminate" signal from a client, backend can cancel last transaction even if it has sent an "OK" on "COMMIT" statement already?
No. that is fundamentally impossible. If it's committed, it's committed, and there's no going back. That's what "commit" means.
The only time Pg might return success before the commit hits disk and is persistent is if you told it to by setting synchronous_commit = off.
If you're seeing anything different happening then most likely it's a result of attempting to share a single connection between multiple processes (as you establish the connection before fork()) without proper locking or other mutual exclusion to ensure that the connection is locked while a command is in-flight.
Note that the reverse isn't true, which might be what you're thinking of with the quoted documentation passage. A transaction can get committed without returning a successful OK to the client if the client goes away (crashes, loses connection, etc) after issuing the commit command.
What the application is doing, where it sends out-of-sync messages on the wire protocol, is totally broken. It's guaranteed to cause unpredictable problems. The protocol is somewhat robust, so you're not likely to get things like an unintended commit, but you're very likely to get transactions aborted or whole sessions disconnected suddenly.
If you need to be able to roll back/abort committed transactions, then your application design has problems. You're not really ready to commit when you say COMMIT. You would have the same problem if the app process crashed or the whole server crashed between Pg committing the transaction and you doing whatever you need to do.
If you cannot fix the app design to avoid this then you will have to use two-phase transactions, either directly using PREPARE TRANSACTION then COMMIT PREPARED, or indirectly via the XA API. This has significant costs in performance and management overhead, but it's the only option if you need to do special work after database commit but before you're really "done".
The docs you quote are talking about the case where the app has sent a COMMIT but then disconnects before receiving the backend's acknowledgement of the commit. Because TCP/IP is buffered there's no guarantee the COMMIT got flushed to Pg, and if it did there's no guarantee it doesn't accompany the RST that terminates the connection. So in this specific case it's somewhat uncertain whether the transaction will commit or not. An application for which this is a problem would need to have a way of checking whether the last unit of work committed or not when it resumes work, or if it can't do that use two-phase transactions. The docs you quote say nothing about being able to cancel a commit after it's completed, because you can't. Ever.
Assuming that the app has to do some kind of extra work after commit, like moving a file or sending an email or doing work on another data store, then you're probably going to need two-phase transactions. Even then you're vulnerable to issues unless all parties in the distributed transaction support two phase commit, because your "other bit" could get done then your worker or server could crash before the confirmation of its completion is sent to the database to finish phase II of the commit.
You can keep your own two phase commit log of sorts in the DB instead of using true 2PC:
Do the main database work and write a record to the work log table that says "I've done the work in the database and I'm about to do the next part".
Do the next part; and
Update the work log to say the next part is done.
... but this has the same problem, where a crash between parts 2 and 3 causes the app to forget that it did part 2 and repeat it on startup. If you can't live with that, you need to find a way to make part 2 commit completion verifiable, so you can tell if it's done or not, or find a way to make it capable of doing 2-phase commit.
To learn more about this topic, read about XA, distributed transactions, two-phase commit, etc.

CQRS/EventStore: How are failures to deliver events handled?

Getting into CQRS and I understand that you have commands (app layer) and events (from the domain).
In the simple case where events are to update the read model, do read model updates fail? If there is no "bug" then I cannot see them failing and as I am using EventStore, I know there is a commit flag which will retry failures.
So my question is do I have to do anything in addition to EventStore to handle failures?
Coming from a world where you do everything in one transaction and now things are done separately is worrying me.
Of course there may be cases where a published event will fail in the read models.
You have to make sure you can detect that and solve it.
The nice thing is that you can replay all the events again and again so you have the chance not only to fix the error. You can also test the fix by replaying every single event if you want.
I use NServiceBus as my publishing mechanism which allows me to use an error queue. Using my other logging tools together with the error queue I can easily determine what happened since I have the error log and the actual message that caused the error in the first place.

Silly WebSphere MQ questions

I have two very basic questions on WebSphere MQ - given that I had been kind of administrating it for past few months I tend to think that these are silly questions
Is there a way to "deactivate" a
queue ? (for example through a
runmqsc command or through the
explorer interface) - I think not. I
think what I can do is just delete
it.
What will happen if I create a
remote queue definition if the real
remote queue is not in place? Will
it cause any issues on the queue
manager? - I think not. I think all
I will have are error messages in
the logs.
Please let me know your thoughts.
Thanks!
1 Is there a way to "deactivate" a
queue?
Yes. You can change the queue attributes like so:
ALTER Q(QUEUE_NAME) PUT(DISABLED) GET(DISABLED)
Any connected applications will receive a return code on the next API call telling them that the queue is no longer available for PUT/GET. If these are well-behaved programs they will then report the error and either end or go into a retry loop.
2 What will happen if I create a
remote queue definition if the real
remote queue is not in place?
The QRemote definition will resolve to a transmit queue. If the message can successfully be placed there your application will receive a return code of zero. (Any unsuccessful PUT will be due to hitting MAXDEPTH or other local problem not connected to the fact that the remote definition does not exist.)
The problem will be visible when the channel tries to deliver the message. If the remote QMgr has a Dead Letter Queue, the message will go there. If not, it will be backed out onto the local XMitQ and the channel will stop.