I am implementing QoS in Tigase to ensure delivery of messages. My QoS steps are listed below:
Cache every message packet (except composing, typing, stopped, etc.), sorted by timestamp.
On receiving the delivery notification for a packet, I delete it from the cache.
At regular intervals a thread is launched to check for packets that are still in the cache within a time window.
If the thread finds any packet in that time window (meaning the message was not delivered), it needs to send the packet again.
My question is: how can I send the packet again from inside that thread?
Can I call addOutPacket from the thread? (It's not working right now.)
Should I implement my QoS in a component so that it can easily call addOutPacket?
Or is there a better way to achieve this?
Your suggestions are highly appreciated.
EDIT:
Some clarifications:
We will not save messages in cache indefinitely
We will retry every message at most 3 times after which we will save the un-acknowledged packet in the Offline Storage
If XEP-0198 is enabled in Tigase by default, which I assume it is, we still experience message loss when the connection between server and client is lost (irrecoverable failure). If the network layer takes time to detect an irrecoverable failure, the messages sent to that connection are permanently lost. On EDGE or a shaky internet connection we face this consistently, and it seriously hurts the user experience.
Tigase already supports XEP-0198, the stream management extension, which includes packet delivery confirmation. Therefore I am not sure you really need to implement your own QoS system for Tigase.
Please explain why XEP-0198 is not good enough and what you are trying to implement. What do you really mean by a QoS system?
A few other questions: what happens when the message cannot be delivered, let's say, 100 times? Are you going to retry indefinitely? How many messages can you keep in your cache? What if your cache is full and you cannot put a new message in it? Is your QoS system designed to handle a load of 100k messages per second for 10 million connected users?
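To make those questions concrete, here is a minimal sketch of a bounded retry cache with a retry limit, along the lines the question describes (at most 3 retries, then hand the packet to offline storage). It is plain Python, not Tigase code: the class and method names are hypothetical, and actually resending a packet or writing it to offline storage is left to the caller.

```python
import time
from collections import OrderedDict


class RetryCache:
    """Bounded cache of unacknowledged messages with a retry limit (hypothetical helper)."""

    def __init__(self, capacity, resend_after, max_retries=3):
        self.capacity = capacity          # maximum number of tracked messages
        self.resend_after = resend_after  # seconds to wait for an ack before resending
        self.max_retries = max_retries
        self._pending = OrderedDict()     # msg_id -> [payload, last_sent, retries]

    def add(self, msg_id, payload, now=None):
        """Track a sent message; return False when the cache is full."""
        if len(self._pending) >= self.capacity:
            return False
        sent_at = time.time() if now is None else now
        self._pending[msg_id] = [payload, sent_at, 0]
        return True

    def ack(self, msg_id):
        """Forget a message once its delivery notification arrives."""
        self._pending.pop(msg_id, None)

    def sweep(self, now=None):
        """Return (to_resend, to_offline) for messages whose ack window expired."""
        now = time.time() if now is None else now
        to_resend, to_offline = [], []
        for msg_id, entry in list(self._pending.items()):
            payload, last_sent, retries = entry
            if now - last_sent < self.resend_after:
                continue  # still inside the ack window
            if retries >= self.max_retries:
                # Retry budget exhausted: stop tracking, caller stores it offline.
                del self._pending[msg_id]
                to_offline.append((msg_id, payload))
            else:
                entry[1] = now
                entry[2] = retries + 1
                to_resend.append((msg_id, payload))
        return to_resend, to_offline
```

The periodic thread would call sweep(), resend everything in the first list and store everything in the second. What to do when add() returns False (reject, block, or spill to storage) is exactly the kind of design decision the questions above are asking about.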
Related
I have 15 worker clients and one master connected over the internet. Jobs and data are passed through a REST API in JSON format.
Jobs are not restricted to any particular client. Any worker can query for available jobs at a regular interval (say 30 seconds), process them, and update their status.
In this scenario, how can I prevent the same records from being sent to different clients on a GET request?
These are the approaches I have considered to overcome this issue:
Take the top 5 unprocessed records from the database, mark them as SENT, and expose them via REST GET.
But the problem is that this creates inconsistency. Sometimes the client doesn't get the data due to a network connectivity issue, but on the server it is already marked as SENT. So no other client can get that data, and it remains SENT forever.
Get the list from the server, and reply to the server with the list of job IDs as received. But during this time gap, other clients can also get the same set of jobs.
You've stumbled upon a fundamental problem in distributed systems: there is no way to know if the other side received your message. You can certainly improve the situation with TCP and ACK messages. But if you never get the ACK, did the message never arrive, did it arrive but the recipient died before processing it, or did the recipient send the ACK and the ACK got dropped?
That means you need to design your system to handle receiving data more than once.
You offer two partial solutions; if you combine them, your solution starts to look like how SQS works. Mark the item as pending_ack with a timestamp. After the client replies, it is marked sent. Any pending_ack items older than a certain time period are eligible to be resent.
Pick your time period to allow for slow network and slow clients and it boils down to only sending duplicates when you really don't know if the client died or not.
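A minimal in-memory sketch of that pending_ack idea, assuming a single server process. The JobQueue class, its state names and the ack_timeout parameter are made up for illustration; a real server would keep this state in the database.

```python
import time


class JobQueue:
    """Hands out jobs with a pending_ack lease that expires (illustrative only)."""

    def __init__(self, job_ids, ack_timeout):
        self.ack_timeout = ack_timeout  # seconds a client has to acknowledge
        # job_id -> {"state": "new" | "pending_ack" | "sent", "leased_at": float or None}
        self.jobs = {j: {"state": "new", "leased_at": None} for j in job_ids}

    def fetch(self, limit, now=None):
        """Return up to `limit` jobs that are new or whose lease has expired."""
        now = time.time() if now is None else now
        batch = []
        for job_id, job in self.jobs.items():
            if len(batch) >= limit:
                break
            expired = (job["state"] == "pending_ack"
                       and now - job["leased_at"] >= self.ack_timeout)
            if job["state"] == "new" or expired:
                job["state"] = "pending_ack"
                job["leased_at"] = now
                batch.append(job_id)
        return batch

    def ack(self, job_ids):
        """Client confirms receipt: pending_ack jobs become sent."""
        for job_id in job_ids:
            if self.jobs[job_id]["state"] == "pending_ack":
                self.jobs[job_id]["state"] = "sent"
```

A job that was fetched but never acknowledged becomes visible again once ack_timeout has passed, so a client that lost its response simply causes a later redelivery instead of a record stuck in SENT forever. Workers still have to tolerate receiving the same job twice.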
Maybe you should reconsider the approach of blocking resources. A REST architecture is, by definition, not obliged to save information about the client. Instead, you may want to consider optimistic concurrency control (http://en.wikipedia.org/wiki/Optimistic_concurrency_control).
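As a rough illustration of optimistic concurrency control for this case, the sketch below lets a worker claim a job only if the row still has the version the worker last read. The jobs table, its owner and version columns, and the claim_job helper are assumptions made for the example; SQLite is used only because it ships with Python.

```python
import sqlite3


def claim_job(conn, job_id, worker, seen_version):
    """Claim a job only if nobody changed it since we read `seen_version`."""
    cur = conn.execute(
        "UPDATE jobs SET owner = ?, version = version + 1 "
        "WHERE id = ? AND version = ? AND owner IS NULL",
        (worker, job_id, seen_version),
    )
    conn.commit()
    # rowcount is 1 when our conditional update won, 0 when someone else got there first.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, owner TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO jobs (id, owner, version) VALUES (1, NULL, 0)")
conn.commit()

# Both workers read version 0, but only the first conditional update succeeds.
first = claim_job(conn, 1, "worker-a", seen_version=0)
second = claim_job(conn, 1, "worker-b", seen_version=0)
```

The losing worker sees that its claim failed and just asks for another job, so nothing is locked while clients are slow or offline.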
I'm writing a game server for a turn-based game. One criterion is that the game needs to be as fair for all players as possible.
So far it works like this:
Each client has a TCP connection. (If relevant, the connection is opened via WebSockets)
While running, continually check for incoming socket messages via epoll.
Iterate through clients with sockets ready to read:
Read all messages from the client.
Update the internal game state for each message.
Queue outgoing messages to affected clients.
At the end of each "window" (turn):
Iterate through clients and write all queued outgoing messages to their sockets
My concern for fairness raises the following questions:
Does it matter in which order I send messages to the clients?
Calling write() on all the sockets takes only a fraction of a second for my program, but somewhere in the underlying OS or networking would it make a difference if I sorted the client list?
Perhaps I should be sending to the highest-latency clients first?
Does it matter how I write the outgoing messages to the sockets?
Currently I'm writing them as one large chunk. The size can exceed a single packet.
Would it be faster for the client to begin its processing if I sent messages in smaller chunks than 1 packet?
Would it be better to write 1 packet worth to each client at a time, and iterate over the clients multiple times?
Are there any linux/networking configurations that would bear impact here?
Thanks in advance for your feedback and tips.
Does it matter in which order I send messages to the clients?
Yes, by fractions of milliseconds. If the network interface is available for sending the OS will immediately start sending. Why would it wait?
Perhaps I should be sending to the highest-latency clients first?
I think you should be sending in random order. Shuffle the list prior to sending. This makes it fair. I think your question is valid and this should be addressed.
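A small sketch of the shuffle, assuming the server keeps its clients in a list; the send_order helper is hypothetical.

```python
import random


def send_order(clients, rng=random):
    """Return a new, randomly ordered copy of the client list for this turn."""
    order = list(clients)  # copy so the server's own list keeps its order
    rng.shuffle(order)
    return order
```

Calling send_order(clients) at the end of each turn and writing to the sockets in that order means no client is systematically first or last.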
Currently I'm writing them as one large chunk. [...]
First, realize that TCP is stream-based and that there are no packets/messages at the protocol level. On a physical level data is indeed packetized.
It is not necessary to manually split off packets because clients will read data as it arrives anyway. If a client issues a read, that read will complete immediately once the first packet has arrived. There is no artificial waiting in the OS.
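Because TCP only delivers a byte stream, message boundaries have to come from the application protocol. WebSockets, as used in the question, already frame messages; for a raw TCP socket, one common approach is a length prefix. The sketch below assumes a 4-byte big-endian length header, which is a choice made for this example and not something from the question.

```python
import socket
import struct


def send_message(sock, payload):
    """Send one message as a 4-byte big-endian length followed by the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, however the stream happens to be chunked."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_message(sock):
    """Read one length-prefixed message from the stream."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

The receiver gets whole messages regardless of how the sender's writes were split into packets, which is why splitting the writes by hand on the server gains nothing.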
Are there any linux/networking configurations that would bear impact here?
I don't know. Be sure to disable Nagle's algorithm (the TCP_NODELAY socket option).
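For reference, disabling Nagle's algorithm is a per-socket option. A minimal sketch using Python's standard socket module (the disable_nagle helper name is made up):

```python
import socket


def disable_nagle(sock):
    """Turn off Nagle's algorithm so small writes are sent without coalescing delay."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
disable_nagle(sock)
# Read the option back; a non-zero value means TCP_NODELAY is set.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
```

On a server, the option would be set on each accepted client socket.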
I have an existing system and am wondering whether MSMQ can retain the contents of a queue when it restarts. Currently the queue is cleared when I restart.
As paxdiablo writes, MSMQ is a persistent queueing solution, but not by default! The default is to store messages in RAM. To have MSMQ persist messages to disk so that they are not lost in case of a server crash, you have to specify it on EACH message.
More information on this can be found if you take a look at the property Message.Recoverable.
As @Kjell-Åke Gafvelin already said, you can configure each message, but IMHO the more convenient way is to set it on the queue itself.
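using System.Messaging;

// Every message sent through this queue object is marked Recoverable (persisted to disk).
MessageQueue msgQ = new MessageQueue(@".\private$\Orders");
msgQ.DefaultPropertiesToSend.Recoverable = true;
msgQ.Send("This message will be marked as Recoverable");
msgQ.Close();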
From the article above (highlights by me):
By default, MSMQ stores some messages in memory for increased
performance, and a message may be sent and received from a queue
without ever having been written to disk.
Additionally, you should make the queue transactional to guarantee that messages are sent and received correctly.
(Edit 2020-10-27: Removed link to external Microsoft post "Reliable messaging with MSMQ and .NET" as it is not available anymore.)
Yes, MSMQ is a persistent queueing solution. It stores messages securely on backing storage that will not be affected by loss of power (unless you experience things like the disk blowing apart from a truly massive power surge of course).
Its whole point is to provide reliable queueing of messages in a potentially unreliable environment. To that end, losing messages when a particular server went down would be a considerable disadvantage.
From Microsoft's own pages (and apologies for the sales-pitch-like language):
Message Queuing applications can use the Message Queuing infrastructure to communicate across heterogeneous networks and with computers that may be offline. Message Queuing provides guaranteed message delivery, efficient routing, security, transaction support, and priority-based messaging.
I am writing an acceptor application and using a persistent FIX session. I am trying to write a recovery mode, such that if I go offline or my program restarts, when I reconnect I want to reprocess all the messages sent to me during the day to get back to the current state.
To do this, when I start up I send a resend request for all messages to the server. They fire back all the relevant messages, marked PossDupFlag=Y and PossResend=Y. Before each message, they send a sequence reset for the repeated message they are about to send.
The problem, though, is that these messages do not seem to be processed by my message cracker. Neither fromAdmin nor fromApp gets these messages. I assume they are being ignored because of the dup flag and/or the resend flag. So is there a way for me to tell QuickFIX that I want to see these messages?
On that note, if anyone has recommendations for a better recovery process, I would be open to them.
Thanks.
There's at least a couple of potential problems with this recovery strategy. The first is that it's not very friendly to your trading counterparty. If you only receive a small number of messages during your session then it may not be an issue, but if you receive hundreds of thousands of messages then your counterparty might complain about the massive resends.
The other issue is that message resend is intended for error recovery and is managed by the session protocol layer. In QuickFIX/J (and other FIX engines) the session maintains recovery state in addition to sending the ResendRequest automatically when it detects a sequence number gap. Your approach might work if you reset the next expected incoming sequence number to 1. When the session receives the next message with a higher sequence number, it will detect the gap and request the missing messages. If the messages are validated, they will be forwarded to the application layer with the PossDup flag set. If you send the ResendRequest message yourself, the behavior is undefined, since the session state will not have been set up properly.
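The gap detection described above boils down to comparing the incoming sequence number with the next expected one. The function below is a simplified illustration of that comparison, not QuickFIX code; check_sequence and its return values are invented for the example, and a real engine also takes PossDupFlag and sequence resets into account.

```python
def check_sequence(expected, received):
    """Classify an incoming sequence number against the next expected one.

    Returns ("ok", None), ("gap", (begin, end)) or ("too_low", None).
    """
    if received == expected:
        return ("ok", None)
    if received > expected:
        # Messages expected..received-1 are missing and should be re-requested.
        return ("gap", (expected, received - 1))
    # Lower than expected: only acceptable for flagged duplicates in a real session.
    return ("too_low", None)
```

With the next expected number reset to 1, the first live message (say number 500) yields ("gap", (1, 499)), which is the range the session would put in its resend request.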
I recommend using a MessageLog implementation to store your incoming messages in a form you can use for recovery when your application starts. You can look at the implementation of the existing message logs (FileLog, JdbcLog) to get some ideas.
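To show the general idea without relying on any particular engine API, here is a generic append-only log that stores each raw incoming message and replays them at startup. The IncomingMessageLog class and its file format (one JSON-encoded string per line) are assumptions for this sketch; wiring it into a QuickFIX message log or application callback is left out.

```python
import json
import os


class IncomingMessageLog:
    """Append-only file of raw incoming messages, replayable at startup (illustrative)."""

    def __init__(self, path):
        self.path = path

    def append(self, raw_message):
        """Write one message as a JSON string per line and force it to disk."""
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(raw_message) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        """Yield stored messages in the order they were received."""
        if not os.path.exists(self.path):
            return
        with open(self.path, "r", encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)
```

On restart the application would feed everything from replay() through its normal message handling to rebuild state, without asking the counterparty to resend the whole day.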
The behaviour occurs because the engine's persistence system tells it that the received messages are resent messages, and so (per the FIX protocol specification) they are discarded. Here we save FIXML strings into our database to provide a recovery ability similar to the one you describe (they are also written to XML files on disk for other reasons). I don't believe there is any way to tell QuickFIX that you want to see duplicate messages, but it is probably better to use a different form of persistence anyway, to save on connection overhead. QuickFIX does provide a way of writing messages to a file as they come in, if that helps.
I have the same issue too, and what Frank says is absolutely correct.
Just use the method below to set the target sequence number to the begin sequence number of the desired resend request.
getSession()->setNextTargetMsgSeqNum(atoi(seq.c_str()));
The engine internally detects the resulting sequence gap and automatically sends a resend request, and all messages will be captured in the onMessage callback as usual.
We have a Pub/Sub system based on NServiceBus, where we have intermittent issues with messages getting stuck on the Publisher's outgoing queue indefinitely, rather than being transmitted to the Subscribers' input queues.
Points to note:
When we restart the Publisher Service and Subscriber services, message flow resumes normally for a while.
The problem seems to occur more often after a sustained period of time with no messages.
The publisher service resides on the LAN, the subscribers on the other side of a firewall.
Some messages get through! As mentioned after service restarts, things go fine for a while.
Using QueueExplorer, I can see the messages on the Outgoing queue have a state of WAITING.
Annoyingly our development environment does not exhibit this behaviour, but then again the publisher and subscribers all reside on the same LAN in this environment.
MSMQ messages being stuck in an outgoing queue is purely an MSMQ issue. Restarting the Publisher and Subscriber services should make no difference as they are not directly involved in message delivery. If you can fix the problem by ONLY restarting the Pub/Sub services and NOT the Message Queuing services then it looks like a resources/memory leak problem.
I imagine something like this happening:
1. Messages flow to destination, which uses up kernel memory in storing them
2. For some reason, kernel memory runs out (too many messages, memory leak, whatever)
3. Destination now rejects new messages as they cannot be loaded into memory from the wire
4. Connection is reset and not re-connected until WaitTime value reached; Queue is "waiting" at this point
5. System loops through (3) and (4) until ...
6. Pub/Sub services are restarted and now there are sufficient resources for messages to be delivered
7. Goto (2)
Occasional messages get through when just enough kernel memory is temporarily freed up by one of the many services and device drivers that use it.
Item 4 of this blog post is the most likely culprit:
http://blogs.msdn.com/b/johnbreakwell/archive/2006/09/18/insufficient-resources-run-away-run-away.aspx
Cheers
John Breakwell
We had a similar scenario in production. It turned out we had migrated one of our subscriber endpoints to a new physical host and forgot to unsubscribe before shutting down the old endpoint. Our publisher was trying to deliver messages to both the old and new endpoints but could only reach the new one. Eventually the publisher's outbound queue grew so large that it started affecting all outgoing messages.
I have run into this issue as well. I know it is not Item 4, as I don't send anything to it before it gets stuck in the outgoing queue. If I let both publisher and subscriber sit for about 10 minutes before sending a message, it never leaves the outgoing queue. If I send a message before that amount of time, it flows fine. Also, if I restart the subscriber the message will then flow. This is reproducible every time I let them sit idle for 10 minutes.
I think I found the answer here, at least this fixed the issue I was having:
http://support.microsoft.com/kb/2554746
Also, in my case it had nothing to do with restarting, so don't let that throw you off. I did see the symptoms in netstat, and messages would initially go through when the client was first started up.
Just to throw my 2p in:
We had an issue where the message queuing service had some kind of memory leak and would consume large amounts of memory which it did not release.
This led to messages getting stuck for long periods of time, although they would eventually be delivered (sometimes after 3 days).
We have not bothered fixing this yet as it only happens when the service is under heavy load which does not happen often.