What all kind of errors result in Rejected Messages other than Translation errors? I have to desgin a fault policy for handling Rejected Messages, i need to be sure that only Translation errors get catched.
All the inbound adapter translation errors lead to Rejected Messages. If your incoming data at runtime is not in compliance with the XSD defined at design time, then all such translation errors leads to Rejected Messages. As you mentioned, use a fault policy file to handle them.
Related
A Quickfix client validates incoming messages using XML spec files. If a message fails validation, quickfix automatically sends a rejection response. AFAIK in this case quickfix does not call the standard callback for incoming messages fromApp(), so up till now I was unable to programatically capture these erroneous incoming messages and handle them.
Is there a way to capture incoming FIX messages which fail quickfix validation?
Of course they may appear in the default quickfix log files, but I would rather capture them in my code in realtime.
There is not.
QuickFIX simply does not consider this a useful feature. If a message is invalid, QF performs the protocol-specified behavior and there is nothing that the application could or should do to recover. Any fix will require developer analysis and xml and/or code fixes, thus log files are sufficient to record the issue.
If you would like an automated alert when such errors occur, I suggest perhaps some kind of external log monitoring app that could watch your logs for occurrences of 35=3 or 35=j. (On the cheap side, a composition of cron/grep actions could do this very easily.)
Validation via XML spec file is in session level processing.
So, there is not suitable hook for this.
On the other hand, there are some configuration parameters;
UseDataDictionary : eliminates validation
ValidateUserDefinedFields : eliminates user defined field's validation
look for detailed descriptions
edit:
If your real problem is monitoring rejections, capturing Reject(3) and BusinessReject(j) messages at toAdmin() hook is sufficient.
I have an app that involves sending Apple Push Notifications to ~1M users periodically. The setup for doing so has been built and tested for small numbers of notifications. Since there is no way I can test sending at that scale, I am interested in knowing whether there are any gotchas in sending bulk push notifications. I have scripts written in Python that open a single connection to the push server and send all notifications over that connection. Apple recommends keeping it open for as long as possible. But I have also seen that the connection terminates and you need to reestablish it.
All in all, it is disconcerting that successful sends are not acknowledged, only erroneous ones are flagged. From a programmer's standpoint instead of simply checking one thing "if (success)" you now need to watch for numerous things that could go wrong.
My question is: What are the typical set of errors that you need to watch out for to make sure your messages don't silently disappear into oblivion? The connection closing is an easy one. Are there others?
I completely agree with you that this API is very frustrating, and if they would have sent a response for each notification it would have been much easier to implement.
That said, here's what Apple say you should do (from Technical Note) :
Push Notification Throughput and Error Checking
There are no caps or batch size limits for using APNs. The iOS 6.1
press release stated that APNs has sent over 4 trillion push
notifications since it was established. It was announced at WWDC 2012
that APNs is sending 7 billion notifications daily.
If you're seeing throughput lower than 9,000 notifications per second,
your server might benefit from improved error handling logic.
Here's how to check for errors when using the enhanced binary
interface. Keep writing until a write fails. If the stream is ready
for writing again, resend the notification and keep going. If the
stream isn't ready for writing, see if the stream is available for
reading.
If it is, read everything available from the stream. If you get zero
bytes back, the connection was closed because of an error such as an
invalid command byte or other parsing error. If you get six bytes
back, that's an error response that you can check for the response
code and the ID of the notification that caused the error. You'll need
to send every notification following that one again.
Once everything has been sent, do one last check for an error
response.
It can take a while for the dropped connection to make its way from
APNs back to your server just because of normal latency. It's possible
to send over 500 notifications before a write fails because of the
connection being dropped. Around 1,700 notifications writes can fail
just because the pipe is full, so just retry in that case once the
stream is ready for writing again.
Now, here's where the tradeoffs get interesting. You can check for an
error response after every write, and you'll catch the error right
away. But this causes a huge increase in the time it takes to send a
batch of notifications.
Device tokens should almost all be valid if you've captured them
correctly and you're sending them to the correct environment. So it
makes sense to optimize assuming failures will be rare. You'll get way
better performance if you wait for write to fail or the batch to
complete before checking for an error response, even counting the time
to send the dropped notifications again.
None of this is really specific to APNs, it applies to most
socket-level programming.
If your development tool of choice supports multiple threads or
interprocess communication, you could have a thread or process waiting
for an error response all the time and let the main sending thread or
process know when it should give up and retry.
Just wanted to chime in with a first person perspective, as we send millions of APNS notifications every day.
The reference #Eran quotes is unfortunately about the best resource we have for how Apple manages APNS sockets. It's fine for low volume, but Apple's documentation overall is very skewed towards the casual, low volume developer. You will see plenty of undocumented behavior once you get to scale.
The part of that document about doing error detection asynchronously is critical for high throughput. If you insist on blocking for errors on every send, then you'll need to heavily parallelize your workers to keep up throughput. The recommended way, however, is to just send as fast as you can send, and whenever you do get and error: repair and replay.
The part of that post I take exception to is:
Device tokens should almost all be valid if you've captured them
correctly and you're sending them to the correct environment. So it
makes sense to optimize assuming failures will be rare.
To predicate that advice with such a huge "IF" seems hugely misleading. I can almost guarantee that most developers are not capturing tokens and processing Apple's feedback service 100% "correctly". Even if they were, the system is inherently lossy, so drift is going to happen.
We see a non-zero number of error #8 responses (invalid device token) which I attribute to rooted phones, client bugs, or users intentionally spoofing their tokens to us. We have also seen a number of error #7 (invalid payload size) in the past, which we tracked down to improperly encoded messages that a developer added on our end. That was our fault of course, but that's my point--saying "optimize assuming failures will be rare" is the wrong message to send to learning developers. What I would say instead would be:
Assume errors will happen.
Hope that they happen infrequently, but
code defensively in case they don't.
If you optimize assuming errors will be rare, you may be putting your infrastructure at risk whenever the APNS service goes down and every message you send returns an error #10.
The trouble comes when trying to figure out how to properly respond to errors. Documentation is ambiguous or absent regarding how to properly handle and recover from different errors. This is left as an exercise for the reader apparently.
What is the purpose of the message store structure in quickfix? I understand that you can log all incoming and outgoing fix messages via the message store interface and quickfix provides multiple implementations like file store etc.
My question is why do you even care about the message store other than logging your fix messages for the record?
You are confusing the MessageStore and the Log, which are two different things.
The MessageStore is for internal engine use. It tracks the current incoming and outgoing message sequence numbers, session start time, and other stuff. If your app goes down for whatever reason, when it restarts, it uses the MessageStore to resume where it left off with regards to sequence number and whether to reset the session.
The Log, however, is just a log. The engine doesn't really care about it. It's for the developers.
I am writing an acceptor application and using a persistent FIX session. I am trying to write a recovery mode, such that if I go offline or my program restarts, when I reconnect I want to reprocess all the messages sent to me during the day to get back to the current state.
To do this, when I start up I send a resend request for all messages to the server. They fire me back all the relevant messages, and they are marked possdupflag=Y and possresend=Y. Before each message, they send a sequence reset for the repeated message they are about to send.
The problem is though, these messages do not seem to be processed by my message cracker. Both fromAdmin and fromApp do not get these messages. I assume they are being ignored because of the dup flag and/or resend. So is there a way for me to tell QuickFIX that I want to see these messages?
On that note- if anyone has any recommendations on better recovery processes I would be open to them.
Thanks.
There's at least a couple of potential problems with this recovery strategy. The first is that it's not very friendly to your trading counterparty. If you only receive a small number of messages during your session then it may not be an issue, but if you receive hundreds of thousands of messages then your counterparty might complain about the massive resends.
The other issue is that message resend is intended for error recovery and is managed by the session protocol layer. In QuickFIX/J (and other FIX engines) the session maintains recovery state in addition to sending the ResendRequest automatically when it detects a sequence number gap. Your approach might work if you reset the next expected incoming sequence number to 1. When the session receives the next message with a higher sequence number it will detect the gap and request the missing messages. If the messages are validated, they will be forwarded to application layer with the PossDup flag set. If you send the ResendRequest message yourself the behavior is undefined since the session state will not have been set up properly.
I recommend using a MessageLog implementation to store your incoming messages in a form you can use for recovery when your application starts. You can look at the implementation of the existing message logs (FileLog, JdbcLog) to get some ideas.
The behaviour occurs because the engine's persistance system tells it that the recieved messages are resent messages and so (per the FIX protocol specification) are discarded. Here we save FIXml strings into our database to provide a similar recovery ability to that which you describe(they are also written to xml files on disk for other reasons). I don't believe that there is any way to tell quickfix that you want to see duplicate messages but it is probably better to use a different form or persistance to save on connection overheads. Quickfix does provide a way of outputting messages to file as they come in if that helps.
I too have the same issue and What Frank Says is absolutely correct ,
Just use the below method to set the target sequence number to the begin seq number of the desired resend req .
getSession()->setNextTargetMsgSeqNum(atoi(seq.c_str()));
The engine internally identifies that the target number is way too large and automatically sends resend request , and all messages will be captured in onMessage call back itself as usual
We're doing stress testing of our application right now and when it blows up we wind up hammering our Exchange server with exception notifications sent via EntLib 4.0 Email Trace Listener.
What strategies can we use to throttle the emails being sent. Is there anything in EntLib for this or does it have to be something configured in Exchange? I'm hoping this can be solved in our App so we don't have to tell the client to make changes to their email server config.
It's either that, or we just disable the Email Trace Listener, which I don't think is a very good option at all.
The e-mail trace listener is really geared up for high throughput of notifications. If you are getting huge numbers of exceptions then this may not be the appropriate mechanism for you.
You should ask yourself what the objective of logging through email was. Then you can adjust the level of severity of log messages (Critical, Error, Warning, Info etc) that need to go through to meet your logging objectives. Notice, 'All' is the default option. You can do this at the level of the TraceListener or by setting up Filters on log categories.