I'm working with Rest Api that requires an incremented parameter to be sent with each request. I use unix miliseconds as nonce and originally naively sent requests one after another but even if I send one message before another, they can arrive in a reversed order which results in an error.
One solution could be sending the next request only after the previous one got back. But it would be too slow. I'm thinking about less strict solution like measuring latency over the last 10 requests and waiting for x% of latency before sending the next message. I feel like this problem should've been already solved but can't find any good reference. Would appreciate any advice.
Related
I would like to count the number of messages in the last hour (last hour referring to a timestamp field in the message data).
I currently have a code that will count the messages synchronously (I am using Google Cloud Pub/Sub Synchronous pull), but I noticed it will take quite long.
My code will repeatedly poll the subscription for a predefined (I set it to 100+) number of times so that I am sure there are no more messages in the last hour that are coming in out of order.
This is not an acceptable design because it means the user has to wait for 5-10 mins for the service to count the messages when they want the metric!
Are there best practices in Pub Sub design for solving this kind of problem?
This seems like a simple problem to solve (count the number of events in the last X timeframe) so I thought there might be.
Will asynchronous design help? How would an async design work? I am not too sure about the async and Python future concept (I am using GCP Pub/Sub's Python client library).
I will try to catch the message differently. My solution is based on logging and BigQuery. The idea is to write a log, for example message received with timestamp xxxxx, to filter this log pattern and to sink the result in BigQuery.
Then, when a user ask, you simply have to request BigQuery and to count the message in the desired lap of time. You also have the advantage to change the time frame, to have an history,...
For writing this log, 2 solutions
Cheaper but not really recommended, the process which consume the message log it with it process it. However, you are dependent of an external service. And this service has 2 responsibilities: its work, and this log (for metrics). Not SOLID. Maybe it's can be the role of the publisher with a loge like this: message published at XXXX. However this imply that all the publisher or all the subscribers are on GCP.
Better is to plug a function, the cheaper (128Mb of memory) to simply handle the message and write the log.
I have some questions regarding Azure Insights REST Api for Events.
When I make HTTP request to Inisghts API for events, I receive the header "
x-ms-ratelimit-remaining-subscription-reads", with value "14999".
But next query in 1s returns me the same value of remaining reads.
I see there is some throttling policy there, but I would like to understand how it works and what is the correct way to deal with that.
In particular,
1) how many reads I am able to do per second?
2) if I exceed the whole remaining reads parameter, how much time should I wait before it will again be maximum?
3) is it decreased on every query attempt, despite of the $top parameter setted and how many results has been returned?
Thank you!
This article seems to have the responses you need.
To answer the questions based on it:
There is no limit to the number of requests per second, but you have 15k
requests/hour/subscription/region/instance of ARM region. Worst case scenario you will get throttled after 15k requests but you'd have to be extremely unlucky for that.
If you exceed the limit, you are
told how much you have to wait and you can integrate that logic by
looking at the Retry-After header. Happily, it's a matter of
seconds.
I believe the $top parameter doesn't affect the query since
no matter how many results are brought back, a paging request is
still just one request.
As for the fact that you get 14999 requests
remaining multiple times, as they say in their documentation it is
expected since an ARM region has multiple instances and each instance has
15k requests limit/subscription/hour. If you hit simultaneously and
you get the same number remaining, it just means that you were lucky
enough to hit different instances within the same ARM region.
1) how many reads I am able to do per second?
Based on the rate limits published here - https://azure.microsoft.com/en-in/documentation/articles/azure-subscription-service-limits/#subscription-limits, you can perform 15000 reads / hour (not sure it would translate to 4 reads / second).
2) if I exceed the whole remaining reads parameter, how much time
should I wait before it will again be maximum?
Given the rates are defined per hour, my guess would be to wait till next hour if you exhaust 15000 read request limit.
3) is it decreased on every query attempt, despite of the $top
parameter setted and how many results has been returned?
This is based on the number of API calls and not the amount of data returned. So I would say defining $top parameter should not have any impact on this.
When I make HTTP request to Inisghts API for events, I receive the
header " x-ms-ratelimit-remaining-subscription-reads", with value
"14999". But next query in 1s returns me the same value of remaining
reads.
I would assume there's some caching in play here. Is it the same request you're repeating or a different request all together?
I have a ZMQ_PUB socket sending messages out at ~50Hz. One destination needs to react to each message, so it has a standard ZMQ_SUB socket with a while(true) loop checking for new messages. A second destination should only react once a second to the "most recent" message. That is, my second destination needs to subsample.
For the second destination, I believe I'd want to have a time-based loop that is called at my desired rate (1Hz) and recv() the latest message, dropping the rest. I believe this is done via a ZMQ_HWM on the subscriber. Is there another option that needs to be set somewhere?
Do I need to worry about the different subscribers having different HWMs? Will the publisher become angry? It's a shame ZMQ_RATE only applies to multicast sockets.
Is there a best way to accomplish what I'm attempting?
zmq v3.2.4
The HighWaterMark will not be a fantastic solution for your problem. Setting it on the subscriber to, let's say, 10 and reading 1 message per second, will just give you the old messages first, slowly, and throw away all the new, because it's limit are reached.
You could either use a topic on you publisher that makes you able to filter out every 50th message like making the topic messageCount % 50 and subscribe to 0.
Otherwise maybe you shouldn't use zmq's pub/sub, but instead do you own look alike with router/dealer that allows you to subscribe to sampled messages.
Lastly you could also just send them all. 50 m/s is hardly anything in zmq (if they aren't heavy on data, like megs) and then only use every 50th message.
I have a use case where by i wish to have a ZeroMQ Request / Reply socket 'stream' back results, is this possible with MultiPart messages (i.e. The Reply sockets streams the frames back before HasMore = false?) or am i approaching this incorrectly?
The situation:
1) Client makes a query (Request) for some records
2) Server looks up Database for results and responds with the current large amount records (Reply) split into frames
3) Server must wait until a Server Side event is generated before the final Frame is sent (HasMore = false)
4) Client wont get the previous Frames until the Final Event has been generated and HasMore = false
Thanks for your help.
As far as I understand what you're aiming for, it sounds like what you have will work the way you expect. See here for more discussion on message frames. The salient points:
As you say, all of the frames will be sent to the client at one time, they will be stored on the server until HasMore is set to false.
One important thing to remember here, if it's a truly large amount of data, you must be able to fit the entire data set into memory, because it'll be stored in your server memory until the entire message with all frames is complete, and then it'll be received into memory before it's processed on the client side.
I assume primarily what you're looking for is a way to iteratively build up a message before you send it? And perhaps to be able to deal with the data on the client iteratively as well? Also you get a guarantee that you won't lose part of the data in the middle, you either get the whole message or lose the whole message (as opposed to instead sending each frame as a separate message). This is one of the primary use cases for frames, so you've done well.
The only thing I object to is using the word "stream", as that implies that the data is being sent to the client continuously as it's being processed on the server, and that's explicitly not what you're trying to do (nor is it possible with ZMQ message frames).
Goal:
I use Bloomberg Java API's subscription service to monitor bond prices in real time (subscribing to ASK/BID real time fields). However in the RESPONSE messages, bloomberg does not provide the associated yield for the given price. I need a way to calculate the yields.
Attempt:
Here's what I've tried:
Within in the code that processes Events coming backing from a real time subscription, when I get a BID or ASK response, I extract the price from the message element, and then initiates a new synchronous reference data request, using overrides to get the YAS_BOND_YLD by providing YAS_BOND_PX and setting the overriding flag.
Problem:
This seems very slow and cumbersome. Is there a better way other than having to calculate yields myself?
In my code, I seem to be able to process real time prices if they are being sent to me slowly. If a few bonds' prices were updated at the same time (say, in MSG1 pricing), I seem to only capture one out of these updates, it feels like I'm missing the other events.. Is this because I cannot use a synchronous reference data request while the subscription is still alive?
Thanks.
bloomberg does not provide the associated yield for the given price
Have you tried retrieving the ASK_YIELD and BID_YIELD fields? They may be what you are looking for.
Problem: This seems very slow and cumbersome.
Synchronous one-off requests are slower than real time subscription. Unless you need real time data on the yield, you could queue the requests and send them all at once every x seconds for example. The time to get 100 or 1 yield is probably not that different, and certainly not 100 times slower.
In my code, I seem to be able to process real time prices if they are being sent to me slowly. If a few bonds' prices were updated at the same time (say, in MSG1 pricing), I seem to only capture one out of these updates, it feels like I'm missing the other events.. Is this because I cannot use a synchronous reference data request while the subscription is still alive?
You should not miss items just because you are sending a synchronous request. You may get a "Slow consumer warning" but that's about it. It's difficult to say more without seeing your code. However, if you want to make sure your real time data is not delayed by your synchronous requests, you should use two separate Sessions.