Where can I find a NServiceBus 4.1 message during an SLR retry? - nservicebus4

We are currently implementing a new system. Sometimes the content of a message is wrong and gets rejected by the connecting system (we transfer data over a REST service). I can edit the message and re-queue it once it is in the error queue. But while NServiceBus is still retrying it (which will of course fail every time), I can't find the message to correct it before the next attempt. Any idea where the message is "parked" during SLR?

The message gets moved to our timeout storage, which is by default RavenDB.


Message lost while the receiver's presence is not updated in OF server

I have searched this forum for a solution to this problem but couldn't find one. My issue is the same as this one:
https://igniterealtime.org/issues/si/jira.issueviews:issue-html/OF-161/OF-161.html
I have configured the server-side ping request to run every 30 seconds, but 30 seconds is still a long time, and many messages are lost during that window.
XEP-0184 is more of a client-side delivery receipt mechanism. Is it possible to get the acknowledgement on the server as well?
Is it possible to store every message in OF until we receive the delivery receipt from the receiver, and delete it from OF once the receipt arrives?
Please suggest how to prevent this message loss.
Right now there is no working solution in Openfire 3.9.3.
What I have done is create a custom plugin:
* It intercepts each message packet and adds it to a custom table, where it stays until an ack packet is received from the receiver.
This way we avoid the message loss.
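The store-until-acknowledged idea behind such a plugin can be sketched roughly as follows. This is only an illustration in Python, not Openfire plugin code: the `PendingStore` class, its method names, and the in-memory dict standing in for the custom table are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PendingStore:
    """Holds messages until the receiver acknowledges them (hypothetical helper)."""
    pending: dict = field(default_factory=dict)  # message id -> (recipient, body)

    def on_message(self, msg_id, recipient, body):
        # Called when the server intercepts a message on its way to the receiver.
        self.pending[msg_id] = (recipient, body)

    def on_ack(self, msg_id):
        # Called when the receiver's ack arrives; unknown ids are ignored.
        self.pending.pop(msg_id, None)

    def undelivered_for(self, recipient):
        # Messages that still need redelivery, e.g. when the recipient reconnects.
        return [(mid, body) for mid, (rcpt, body) in self.pending.items()
                if rcpt == recipient]


store = PendingStore()
store.on_message("m1", "bob@example.org", "hello")
store.on_message("m2", "bob@example.org", "are you there?")
store.on_ack("m1")
print(store.undelivered_for("bob@example.org"))  # [('m2', 'are you there?')]
```

A real plugin would persist the table so that pending messages survive a server restart.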

MSMQ: How do you send a msg from transactional dead letter queue to a private queue on remote machine

Windows Server 2012
MSMQ 6 Workgroup Mode
We've had issues trying to recover MSMQ messages that were sent to the transactional dead letter queue. We've tried moving them to the outbound queue; the message seems to send fine (even the Event Log says so), but it never reaches the destination queue.
After some trial and error we've figured out how to get them to another queue on the same server, but not to the destination queue on a remote server. We don't want to lose any more messages. Does anyone have a suggestion for how we can deliver these messages?
Thank you,
As I understand your question, it's a one-time problem with some number of messages you already have in MSMQ, and not a general connectivity issue between machines? If so, you should be able to solve it with an MSMQ management tool. Disclaimer: I'm the author of one such tool, QueueExplorer. I don't know what other tools can do, but with QueueExplorer you can copy/paste or drag/drop messages to another machine opened in a separate tab or window. To do that, QueueExplorer has to perform an MSMQ Send operation, so the messages will have to pass through MSMQ between the two machines.
So if the issue that prevented the original delivery is still there, you'll still be stuck. In that case you can save all messages to a file, transfer it to the other machine through the file system, and load it there into whichever queue the messages should go to. This is obviously just a manual workaround for a one-time situation. By the way, this can be done in QueueExplorer's trial mode.
If, however, the problem is with connectivity and messages always end up in the dead letter queue, it's better to check them from Computer Management. This is one area where it's better than our tool: you can turn on the "Class" column and see the reason why messages couldn't be delivered. For instance, if you see "The time-to-be-received has elapsed", you'll know what the problem is.

Apple Push Notifications in Bulk

I have an app that involves sending Apple Push Notifications to ~1M users periodically. The setup for doing so has been built and tested for small numbers of notifications. Since there is no way I can test sending at that scale, I am interested in knowing whether there are any gotchas in sending bulk push notifications. I have scripts written in Python that open a single connection to the push server and send all notifications over that connection. Apple recommends keeping it open for as long as possible. But I have also seen that the connection terminates and you need to reestablish it.
All in all, it is disconcerting that successful sends are not acknowledged, only erroneous ones are flagged. From a programmer's standpoint instead of simply checking one thing "if (success)" you now need to watch for numerous things that could go wrong.
My question is: What are the typical set of errors that you need to watch out for to make sure your messages don't silently disappear into oblivion? The connection closing is an easy one. Are there others?
I completely agree with you that this API is very frustrating, and if they would have sent a response for each notification it would have been much easier to implement.
That said, here's what Apple says you should do (from a Technical Note):
Push Notification Throughput and Error Checking
There are no caps or batch size limits for using APNs. The iOS 6.1
press release stated that APNs has sent over 4 trillion push
notifications since it was established. It was announced at WWDC 2012
that APNs is sending 7 billion notifications daily.
If you're seeing throughput lower than 9,000 notifications per second,
your server might benefit from improved error handling logic.
Here's how to check for errors when using the enhanced binary
interface. Keep writing until a write fails. If the stream is ready
for writing again, resend the notification and keep going. If the
stream isn't ready for writing, see if the stream is available for reading.
If it is, read everything available from the stream. If you get zero
bytes back, the connection was closed because of an error such as an
invalid command byte or other parsing error. If you get six bytes
back, that's an error response that you can check for the response
code and the ID of the notification that caused the error. You'll need
to send every notification following that one again.
Once everything has been sent, do one last check for an error response.
It can take a while for the dropped connection to make its way from
APNs back to your server just because of normal latency. It's possible
to send over 500 notifications before a write fails because of the
connection being dropped. Around 1,700 notifications writes can fail
just because the pipe is full, so just retry in that case once the
stream is ready for writing again.
Now, here's where the tradeoffs get interesting. You can check for an
error response after every write, and you'll catch the error right
away. But this causes a huge increase in the time it takes to send a
batch of notifications.
Device tokens should almost all be valid if you've captured them
correctly and you're sending them to the correct environment. So it
makes sense to optimize assuming failures will be rare. You'll get way
better performance if you wait for write to fail or the batch to
complete before checking for an error response, even counting the time
to send the dropped notifications again.
None of this is really specific to APNs, it applies to most
socket-level programming.
If your development tool of choice supports multiple threads or
interprocess communication, you could have a thread or process waiting
for an error response all the time and let the main sending thread or
process know when it should give up and retry.
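The "six bytes back" step quoted above can be sketched in Python as follows. The note does not spell out the byte layout; the sketch assumes one command byte, one status byte, and a four-byte big-endian notification identifier. The function names and the list of sent identifiers are hypothetical.

```python
import struct

ERROR_RESPONSE_LEN = 6  # assumed layout: command (1) + status (1) + identifier (4)


def parse_error_response(data):
    """Decode a six-byte error response into (status, identifier)."""
    if len(data) != ERROR_RESPONSE_LEN:
        raise ValueError("expected exactly six bytes")
    _command, status, identifier = struct.unpack("!BBI", data)
    return status, identifier


def notifications_to_resend(sent_ids, failed_id):
    """Return the ids written after the one named in the error response."""
    index = sent_ids.index(failed_id)
    return sent_ids[index + 1:]


# Simulated response: command 8, status 8, identifier 103.
sent = [101, 102, 103, 104, 105]
status, failed = parse_error_response(struct.pack("!BBI", 8, 8, 103))
print(status, failed, notifications_to_resend(sent, failed))  # 8 103 [104, 105]
```

This only works if the sender keeps the identifiers of recently written notifications in order, so that everything after the failing one can be replayed.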
Just wanted to chime in with a first person perspective, as we send millions of APNS notifications every day.
The reference @Eran quotes is unfortunately about the best resource we have for how Apple manages APNS sockets. It's fine for low volume, but Apple's documentation overall is very skewed towards the casual, low-volume developer. You will see plenty of undocumented behavior once you get to scale.
The part of that document about doing error detection asynchronously is critical for high throughput. If you insist on blocking for errors on every send, then you'll need to heavily parallelize your workers to keep up throughput. The recommended way, however, is to just send as fast as you can, and whenever you do get an error: repair and replay.
The part of that post I take exception to is:
Device tokens should almost all be valid if you've captured them
correctly and you're sending them to the correct environment. So it
makes sense to optimize assuming failures will be rare.
To predicate that advice on such a huge "IF" seems hugely misleading. I can almost guarantee that most developers are not capturing tokens and processing Apple's feedback service 100% "correctly". Even if they were, the system is inherently lossy, so drift is going to happen.
We see a non-zero number of error #8 responses (invalid device token) which I attribute to rooted phones, client bugs, or users intentionally spoofing their tokens to us. We have also seen a number of error #7 (invalid payload size) in the past, which we tracked down to improperly encoded messages that a developer added on our end. That was our fault of course, but that's my point--saying "optimize assuming failures will be rare" is the wrong message to send to learning developers. What I would say instead would be:
Assume errors will happen.
Hope that they happen infrequently, but
code defensively in case they don't.
If you optimize assuming errors will be rare, you may be putting your infrastructure at risk whenever the APNS service goes down and every message you send returns an error #10.
The trouble comes when trying to figure out how to properly respond to errors. Documentation is ambiguous or absent regarding how to properly handle and recover from different errors. This is left as an exercise for the reader apparently.
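One way to code defensively is to decide up front what to do for each status code. The sketch below covers only the three codes mentioned in this answer. The actions themselves are assumptions, not documented Apple guidance, and `Action`, `POLICY`, and `action_for` are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    DROP_TOKEN = "drop the token, replay the rest"
    DROP_MESSAGE = "drop the message, replay the rest"
    RECONNECT = "back off, reconnect, replay the rest"
    INVESTIGATE = "log and alert, replay the rest"


# Only the codes named in the discussion are mapped; the chosen actions are assumptions.
POLICY = {
    8: Action.DROP_TOKEN,    # invalid device token
    7: Action.DROP_MESSAGE,  # invalid payload size
    10: Action.RECONNECT,    # seen when the service is going down
}


def action_for(status):
    # Anything unexpected falls through to a safe default instead of being ignored.
    return POLICY.get(status, Action.INVESTIGATE)


print(action_for(8).name, action_for(10).name, action_for(255).name)
```

The point of the default branch is that an unmapped code still triggers a replay and an alert rather than a silent loss.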

First message not arriving over an MSMQ/MassTransit Service Bus

I've got a MassTransit ServiceBus running over MSMQ. It appears that the first message sent over the Bus doesn't arrive, but subsequent messages do?
Is there some initialization that needs performing on the queue or bus before the message is sent?
How long the system needs to set up before everything routes correctly depends on a few settings. If only the first message fails to reach the right location, then the subscription data probably hasn't propagated everywhere yet. http://readthedocs.org/docs/masstransit/en/develop/overview/subscriptions.html
With multicast subscriptions, the easiest choice, an endpoint needs a few seconds after coming up to register its subscriptions with all other endpoints. If you can control the order in which services start, you can often avoid this by starting them back to front in the flow.
If you are using the subscription service, it can also take a couple of seconds to get data everywhere. The subscription has to go through the subscription service, but it is then sent to everyone on the bus. The service is tied to a SQL database, and latency to the database can affect this timing.
Lastly, if you are using static routing, it should work immediately, because the subscriptions are set up at startup.

How can I get QuickFix to process messages that come in from a resend request?

I am writing an acceptor application and using a persistent FIX session. I am trying to write a recovery mode, such that if I go offline or my program restarts, when I reconnect I want to reprocess all the messages sent to me during the day to get back to the current state.
To do this, when I start up I send a resend request for all messages to the server. They fire me back all the relevant messages, and they are marked possdupflag=Y and possresend=Y. Before each message, they send a sequence reset for the repeated message they are about to send.
The problem, though, is that these messages do not seem to be processed by my message cracker. Neither fromAdmin nor fromApp receives them. I assume they are being ignored because of the PossDupFlag and/or PossResend. Is there a way to tell QuickFIX that I want to see these messages?
On that note, if anyone has recommendations for a better recovery process, I would be open to them.
There are at least a couple of potential problems with this recovery strategy. The first is that it's not very friendly to your trading counterparty. If you only receive a small number of messages during your session then it may not be an issue, but if you receive hundreds of thousands of messages then your counterparty might complain about the massive resends.
The other issue is that message resend is intended for error recovery and is managed by the session protocol layer. In QuickFIX/J (and other FIX engines) the session maintains recovery state and sends the ResendRequest automatically when it detects a sequence number gap. Your approach might work if you reset the next expected incoming sequence number to 1. When the session receives the next message with a higher sequence number, it will detect the gap and request the missing messages. If the messages are validated, they will be forwarded to the application layer with the PossDup flag set. If you send the ResendRequest message yourself, the behavior is undefined, since the session state will not have been set up properly.
I recommend using a MessageLog implementation to store your incoming messages in a form you can use for recovery when your application starts. You can look at the implementation of the existing message logs (FileLog, JdbcLog) to get some ideas.
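To make the idea concrete, here is a minimal Python sketch of an append-only log of incoming messages that is replayed at startup. It is not the QuickFIX/J `MessageLog` interface: `FileMessageLog`, `parse`, and the sample messages are hypothetical, and it assumes messages never contain a newline.

```python
import os
import tempfile

SOH = "\x01"  # FIX field delimiter


class FileMessageLog:
    """Append-only log of raw incoming messages (hypothetical helper)."""

    def __init__(self, path):
        self.path = path

    def on_incoming(self, raw):
        # One message per line; assumes no field value contains a newline.
        with open(self.path, "a", encoding="ascii") as fh:
            fh.write(raw + "\n")

    def replay(self):
        # Return every stored message in arrival order, or nothing on first run.
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="ascii") as fh:
            return [line.rstrip("\n") for line in fh]


def parse(raw):
    # Split "tag=value" pairs into a dict keyed by tag number.
    return dict(pair.split("=", 1) for pair in raw.strip(SOH).split(SOH))


path = os.path.join(tempfile.mkdtemp(), "incoming.log")
log = FileMessageLog(path)
log.on_incoming("35=8\x0134=2\x0111=ORDER1\x01")
log.on_incoming("35=8\x0134=3\x0111=ORDER2\x01")

# After a restart, a fresh instance rebuilds state from the file.
recovered = [parse(raw) for raw in FileMessageLog(path).replay()]
print([m["11"] for m in recovered])  # ['ORDER1', 'ORDER2']
```

Recovering from a local log avoids asking the counterparty to resend the whole day's traffic.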
The behaviour occurs because the engine's persistence system tells it that the received messages are resent messages, and so (per the FIX protocol specification) they are discarded. Here we save FIXML strings into our database to provide a recovery ability similar to the one you describe (they are also written to XML files on disk for other reasons). I don't believe there is any way to tell QuickFIX that you want to see duplicate messages, but it is probably better to use a different form of persistence anyway, to save on connection overheads. QuickFIX does provide a way of writing messages to a file as they come in, if that helps.
I have the same issue, and what Frank says is absolutely correct.
Just set the target sequence number to the begin sequence number of the desired resend request.
The engine then detects the sequence gap internally and automatically sends a resend request, and all messages are delivered to the onMessage callback as usual.
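The gap detection this relies on can be modelled with a toy example in Python. It is not the QuickFIX API: `InboundSequence` is a hypothetical class, and the dict it returns merely stands in for a ResendRequest (MsgType 2) with BeginSeqNo (tag 7) and EndSeqNo (tag 16), where an EndSeqNo of 0 asks for all subsequent messages.

```python
class InboundSequence:
    """Toy model of a session's inbound sequence tracking (not the QuickFIX API)."""

    def __init__(self, next_expected=1):
        self.next_expected = next_expected

    def on_message(self, seq_num):
        if seq_num > self.next_expected:
            # Gap detected: request everything from the expected number onward.
            return {"35": "2", "7": self.next_expected, "16": 0}
        if seq_num == self.next_expected:
            self.next_expected += 1
        return None  # in sequence (lower numbers are simply ignored in this toy)


session = InboundSequence(next_expected=250)
session.next_expected = 1        # the manual reset described above
print(session.on_message(250))   # {'35': '2', '7': 1, '16': 0}
```

Because the session itself issues the request, its recovery state stays consistent, which is not the case when the application sends a ResendRequest by hand.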