Last week encountered for the first time a rate limit exceed error (4003) in our nightly batch-process. This batch proces is synchronising Smartsheet objects with our TimeTracking application 4TT.
Since 2016 this proces works fine, but somehow now this rate limit error occurs and therefore stops synchronising. With the help of the API (and blog about rate limit) I managed to change the code, putting in pauses when this error occurs. This has taken me quite a lot of time, as every time the error occured in a different part of the synchronisation proces.
Is there or will there be a way to let the API automatically pauses, when the rate limit is about to exceed in stead of changing the code every time. And for those who don't want this feature, for example adding an optional boolean argument 'AutomaticallyPauseWhenRateLimitExceeds' (default false) when making the connection to the Smartsheet API?
You'll need to include logic in your code to effectively handle the rate limiting error -- there's no mechanism by which the Smartsheet API can automatically handle this situation for you.
A simple approach would be for you to include logic in your code such that when a rate limiting error is thrown, your code pauses execution for 60 seconds before continuing. Alternatively, a more sophisticated approach would be to implement exponential backoff logic in your code (an error handling strategy whereby you periodically retry a failed request with progressively longer wait times between retries, until either the request succeeds or the certain number of retry attempts is reached).
Implementing this type of error handling logic should not be difficult or tedious, provided that your code is structured in an efficient manner (i.e., error handling logic is encapsulated in a single location).
Additional note: The Smartsheet API Best Practices blog post (specifically the Be practical: Adhere to rate limiting guidelines section) contains info about this topic.
All our SDKs include error retry. So that's the easiest way to handle this situation. http://smartsheet-platform.github.io/api-docs/#sdks-and-sample-code
I found this and other interesting problems (in my lab) while updating the sheet including Poor Internet connection/bandwidth issues.
If unable to accommodate your code to process chunks of data, my suggestion is to use a simple Try/Catch logic to pause the thread/task for 60 secs and then try again.
using System.Threading
...
... //all your code goes here
...
try
{
// your code to Save/update the Sheet goes here
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
Thread.Sleep(60000);
}
The next step is to work notifications when those errors happen
Related
I have a Spring Batch Process which submits something around 5M urls to Google Indexing API. In the past, the process was segmented e parallelized int two threads by an attribute, one for the small segments and one for the bigger. From some days ago up to now, it was refactored to submit request as it come from a query response (sorted by its priority, ignoring the previous segmenting attribute, using a single thread to execute). After that refactoring, I started getting a "rateLimitExceed" error from Google API. I have (by contract) 5M request a day and I'm submitting batches of 500 urls a time. The average sending time is around 1.2 seconds for each 500 urls batch.
Does anybody know what may be causing this error?
I did not do the math, but if you are getting this exception, it means you are exceeding the limit. Depending on where you are doing the API call (ie in the item writer or in an item processor), you can do the math and delay the call as needed with a listener to not exceed the limit.
You can find a similar question/answer here: Spring batch writer throttling
One of my tests for a function that performs increments using the MongoDB driver for Go is randomly breaking in an unexpected way. Here's what the test does:
Create a proxy (with toxiproxy) to a local MongoDB instance.
Disable the proxy, so the database looks like it's down.
Run a function that does an update that increments a field, timing out after 100ms. If it fails, it keeps retrying every 100ms until the command succeeds.
Sleep 1 second.
Enable the proxy.
Wait for the function to complete and assert that the field has been incremented correctly - only once.
This test is randomly breaking because sometimes that field gets incremented twice. I noticed that it happens when an update is retried just as the proxy gets enabled: the client code receives an incomplete read of message header: context deadline exceeded error, which makes it retry the command, but the previous one indeed succeeded because the field ends up being incremented twice.
I took a look at the driver code and I guess it's timing out while reading the server response - perhaps the proxy is enabled just after the update has started and there isn't much timeout left for both write and read operations to complete.
Is there anything that I can do on my side to prevent this from happening? I tried to find a specific error to catch, but I couldn’t find any. Or is this something the driver itself is supposed to handle?
Any help is appreciated.
UPDATE: I looked closely at the error messages and noticed that, while the MongoDB instance was down, all errors were handshake failures. So I made sure the test ping the database before disabling the proxy to get the handshake out of the way and the test stopped randomly breaking; it ran 1000 times flawlessly, at least. I assume the handshake itself takes time to complete and that contributes to the command timeout.
In general, if you know the command went through (to the server), if you can't read the response, you can't assume anything about its success.
In some cases when it only matters if the server got the command, or you only care about the command reaching the server, then read on.
Unfortunately the current state of the driver (v1.7.1) is not "sophisticated" enough to easily tell if the error is from reading the response.
I was able to reproduce your issue locally. Here is the error when a timeout happens reading the response:
mongo.CommandError{Code:0, Message:"connection(localhost:27017[-30]) incomplete read of message header: context deadline exceeded", Labels:[]string{"NetworkError", "RetryableWriteError"}, Name:"", Wrapped:topology.ConnectionError{ConnectionID:"localhost:27017[-30]", Wrapped:context.deadlineExceededError{}, init:false, message:"incomplete read of message header"}}
And there is the error when the timeout happens writing the command:
mongo.CommandError{Code:0, Message:"connection(localhost:27017[-31]) unable to write wire message to network: context deadline exceeded", Labels:[]string{"NetworkError", "RetryableWriteError"}, Name:"", Wrapped:topology.ConnectionError{ConnectionID:"localhost:27017[-31]", Wrapped:context.deadlineExceededError{}, init:false, message:"unable to write wire message to network"}}
As you can see, in both cases mongo.CommandError is returned, with identical Code and Labels fields. Which leaves you having to analyze the error string (which is ugly and may "break" with future changes).
So the best you can do is check if the error string contains "incomplete read of message header", and if so, you don't have to retry. Hopefully this (error support and analysis) improves in the future.
If you are using the retryable writes as implemented by MongoDB 3.6+ and the respective drivers, this shouldn't happen. Each write is accompanied by a transaction number (not to be confused with client-side transactions as implemented by MongoDB 4.0+), and if the same transaction number is used in two consecutive writes there is only one write being done by the server.
This functionality has been around for years so unless you are using an ancient driver version you should already have it.
If you are performing write retries in your application manually rather than using the driver's retryable write functionality, you can write twice as you found out. The solution is to use the driver's retryable writes.
I had the same problem (running on go.mongodb.org/mongo-driver v1.8.1 on a MongoDB 4.4) and will leave my experiences with this problem here.
To add to #icza solution:
You can also get the error context deadline exceeded so check also for that.
A check for a context abortion would look something like this:
if strings.Contains(err.Error(), "context") && (strings.Contains(err.Error(), " canceled") || strings.Contains(err.Error(), " deadline exceeded")) {
...
}
My solution to the problem was instead of first checking if there was an error you'd first check if there was a result from the transaction.
Example:
result, err := database.collection.InsertOne(context, item)
if result != nil {
return result.InsertedID, err
}
return nil, err
If the transaction did process it despite the error, you could add some compensation logic to undo the transaction.
We have a java based Data flow pipeline which reads from Bigtable, after some processing write data back to Bigtable. We use CloudBigtableIO for these purposes.
I am trying wrap my head around failure handling in CloudBigtableIO. I haven;t found any references/documentation on how the errors are handled inside and outside the CloudBigtableIO.
CloudBigtableIO has bunch of Options in BigtableOptionsFactory which specify timeouts, grpc codes to retry on, retry limits.
google.bigtable.grpc.retry.max.scan.timeout.retries - is this the retry limit for scan operations or does it include Mutation operations as well? if this is just for scan, how many retries are done for Mutation operations? is it configurable?
google.bigtable.grpc.retry.codes - Do these codes enable retries for both scan, Mutate operations?
Customizing options would only enable retries, would there be cases where CloudBigtableIO reads partial data than what is requested but not fails the pipeline?
When mutating few millions of records, I think it is possible we get errors beyond the retry limits, what happens to such mutations? do they fail simply? how do we handle them in pipeline? BigQueryIO has function that collects failures and provides a way to retrieve them through side output, why do CloudBigtableIO doesn't have one such functions?
We occasionally get DEADLINE_EXCEEDED errors while writing mutations but the logs are not clear whether the mutations were retried and successful or Retries were exhausted, I do see RetriesExhaustedWithDetailsException but that is of no use, if we are not able to handle failures
Are these failures thrown back to the preceding step in data flow pipeline if preceding step and CloudBigtableIO write are fused? with bulk mutations enabled it is not really clear on how the failures are thrown back to preceding steps.
For question 1, I believe google.bigtable.mutate.rpc.timeout.ms would correspond to mutation operations, though it is noted in the Javadoc that the feature is experimental. google.bigtable.grpc.retry.codes allows you to add additional codes to retry on that are not set by default (defaults include DEADLINE_EXCEEDED, UNAVAILABLE, ABORTED, and UNAUTHENTICATED)
You can see an example of the configuration getting set for mutation timeouts here: https://github.com/googleapis/java-bigtable-hbase/blob/master/bigtable-client-core-parent/bigtable-hbase/src/test/java/com/google/cloud/bigtable/hbase/TestBigtableOptionsFactory.java#L169
google.bigtable.grpc.retry.max.scan.timeout.retries:
It is only for setting the number of time to retry after a SCAN timeout.
Regarding retries on mutation operations
This is how Bigtable handles operations failures.
Regarding your question about handling errors in the pipeline
I see that you already are aware for the "RetriesExhaustedWithDetailsException". Please keep in mind that in order to retrieve the detailed exceptions for each failed request you have to call the "RetriesExhaustedWithDetailsException#getCauses()"
As for the failures, Google documentation states:
" Append and Increment operations are not suitable for retriable batch
programming models, including Hadoop and Cloud Dataflow, and are
therefore not supported inputs to CloudBigtableIO.writeToTable.
Dataflow bundles, or a group of inputs, can fail even though some of
the inputs have been processed. In those cases, the entire bundle will
be retried, and previously completed Append and Increment operations
would be performed a second time, resulting in incorrect data."
Some documentation that you may consider helpful:
Cloud Bigtable Client Libraries
Source code for Cloud Bigtable HBase client for Java
Sample code for use with Cloud BigTable
Hope you find the above helpful.
I would like to count the number of messages in the last hour (last hour referring to a timestamp field in the message data).
I currently have a code that will count the messages synchronously (I am using Google Cloud Pub/Sub Synchronous pull), but I noticed it will take quite long.
My code will repeatedly poll the subscription for a predefined (I set it to 100+) number of times so that I am sure there are no more messages in the last hour that are coming in out of order.
This is not an acceptable design because it means the user has to wait for 5-10 mins for the service to count the messages when they want the metric!
Are there best practices in Pub Sub design for solving this kind of problem?
This seems like a simple problem to solve (count the number of events in the last X timeframe) so I thought there might be.
Will asynchronous design help? How would an async design work? I am not too sure about the async and Python future concept (I am using GCP Pub/Sub's Python client library).
I will try to catch the message differently. My solution is based on logging and BigQuery. The idea is to write a log, for example message received with timestamp xxxxx, to filter this log pattern and to sink the result in BigQuery.
Then, when a user ask, you simply have to request BigQuery and to count the message in the desired lap of time. You also have the advantage to change the time frame, to have an history,...
For writing this log, 2 solutions
Cheaper but not really recommended, the process which consume the message log it with it process it. However, you are dependent of an external service. And this service has 2 responsibilities: its work, and this log (for metrics). Not SOLID. Maybe it's can be the role of the publisher with a loge like this: message published at XXXX. However this imply that all the publisher or all the subscribers are on GCP.
Better is to plug a function, the cheaper (128Mb of memory) to simply handle the message and write the log.
Goal:
I use Bloomberg Java API's subscription service to monitor bond prices in real time (subscribing to ASK/BID real time fields). However in the RESPONSE messages, bloomberg does not provide the associated yield for the given price. I need a way to calculate the yields.
Attempt:
Here's what I've tried:
Within in the code that processes Events coming backing from a real time subscription, when I get a BID or ASK response, I extract the price from the message element, and then initiates a new synchronous reference data request, using overrides to get the YAS_BOND_YLD by providing YAS_BOND_PX and setting the overriding flag.
Problem:
This seems very slow and cumbersome. Is there a better way other than having to calculate yields myself?
In my code, I seem to be able to process real time prices if they are being sent to me slowly. If a few bonds' prices were updated at the same time (say, in MSG1 pricing), I seem to only capture one out of these updates, it feels like I'm missing the other events.. Is this because I cannot use a synchronous reference data request while the subscription is still alive?
Thanks.
bloomberg does not provide the associated yield for the given price
Have you tried retrieving the ASK_YIELD and BID_YIELD fields? They may be what you are looking for.
Problem: This seems very slow and cumbersome.
Synchronous one-off requests are slower than real time subscription. Unless you need real time data on the yield, you could queue the requests and send them all at once every x seconds for example. The time to get 100 or 1 yield is probably not that different, and certainly not 100 times slower.
In my code, I seem to be able to process real time prices if they are being sent to me slowly. If a few bonds' prices were updated at the same time (say, in MSG1 pricing), I seem to only capture one out of these updates, it feels like I'm missing the other events.. Is this because I cannot use a synchronous reference data request while the subscription is still alive?
You should not miss items just because you are sending a synchronous request. You may get a "Slow consumer warning" but that's about it. It's difficult to say more without seeing your code. However, if you want to make sure your real time data is not delayed by your synchronous requests, you should use two separate Sessions.