Rate limiting in Google Cloud Storage - google-cloud-storage

At the top of every minute my code uploads between 20 and 40 files in total (from multiple machines, about 5 files in parallel until they are all uploaded) to Google Cloud Storage. I frequently get 429 Too Many Requests errors, like the following:
java.io.IOException: Error inserting: bucket: mybucket, object: work/foo/hour/out/2015/08/21/1440191400003-e7ba2b0c-b71b-460a-9095-74f37661ae83/2015-08-21T20-00-00Z/
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.wrapException(GoogleCloudStorageImpl.java:1583)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl$3.run(GoogleCloudStorageImpl.java:474)
... 3 more
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 429 Too Many Requests
{
"code" : 429,
"errors" : [ {
"domain" : "usageLimits",
"message" : "The total number of changes to the object mybucket/work/foo/hour/out/2015/08/21/1440191400003-e7ba2b0c-b71b-460a-9095-74f37661ae83/2015-08-21T20-00-00Z/ exceeds the rate limit. Please reduce the rate of create, update, and delete requests.",
"reason" : "rateLimitExceeded"
} ],
"message" : "The total number of changes to the object mybucket/work/foo/hour/out/2015/08/21/1440191400003-e7ba2b0c-b71b-460a-9095-74f37661ae83/2015-08-21T20-00-00Z/ exceeds the rate limit. Please reduce the rate of create, update, and delete requests."
}
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl$3.run(GoogleCloudStorageImpl.java:471)
... 3 more
I have some retry logic, which helps a bit, but even after some exponential backoff and up to 3 retries, I still often get the error.
Strangely, when I go to the Google Developers Console -> APIs & auth -> APIs -> Cloud Storage API -> Quotas, I see Per-user limit 102,406.11 requests/second/user. When I look at the Usage tab, it shows no usage.
What am I missing? How do I stop getting rate limited when uploading files to GCS? Why is my quota so high and my usage reported as 0?

Judging by your description of multiple machines all taking an action at the same moment, I suspect all of your machines are attempting to write exactly the same object name at the same moment. GCS limits the number of writes per second against any one single object (1 per second).
Since it looks like your object names end in a slash, like they're meant to be a directory (work/foo/hour/out/2015/08/21/1440191400003-e7ba2b0c-b71b-460a-9095-74f37661ae83/2015-08-21T20-00-00Z/ ), is it possible you meant to end them with some unique value or a machine name or something but left that bit off?
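If the writers really are colliding on one name, a per-writer suffix avoids it. The following is a minimal sketch, not the asker's code: the function name `unique_object_name` and the host-plus-UUID suffix scheme are assumptions made for illustration.

```python
import socket
import uuid


def unique_object_name(prefix, hostname=None):
    """Build an object name that no other writer will produce.

    `prefix` is the shared "directory" part of the name; the suffix combines
    the machine name with a random UUID (a hypothetical naming scheme).
    """
    host = hostname or socket.gethostname()
    # Strip the trailing slash so the result is a file-like name, not a
    # directory placeholder that every machine would write to.
    return f"{prefix.rstrip('/')}/{host}-{uuid.uuid4().hex}"
```

Each call yields a different name, so parallel uploads from several machines never target the same object.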

That error happens when you attempt to update the same object too frequently. From https://cloud.google.com/storage/docs/concepts-techniques#object-updates:
There is no limit to how quickly you can create or update different objects in a bucket. However, a single particular object can only be updated or overwritten up to once per second.
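When the same object genuinely has to be rewritten, retries need to be spaced out enough to respect the once-per-second limit. Below is a rough sketch of exponential backoff with jitter; `RateLimitError` is a hypothetical stand-in for the 429 response, and the attempt count and delays are illustrative values, not documented recommendations.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a 429 rateLimitExceeded response."""


def call_with_backoff(operation, max_attempts=6, base_delay=1.0,
                      max_delay=32.0, sleep=time.sleep):
    """Call `operation`, retrying on RateLimitError with jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The upper bound doubles each attempt; a random delay below it
            # keeps machines that failed together from retrying together.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Backoff alone does not help if many machines keep writing the same name every minute; it only spreads out the retries.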


Argo and Kubernetes "Request entity too large: limit is 3145728"

I've been trying to deploy a workflow in Argo with Kubernetes and I'm getting this error
Can someone help me to know the root of the issue?
I’ve tried several things but I’ve been unsuccessful.
The way Argo solves that problem is by using compression on the stored entity, but the real question is whether you have to have all 3MB worth of that data at once, or if it is merely more convenient for you and they could be decomposed into separate objects with relationships between each other. The kubernetes API is not a blob storage, and shouldn't be treated as one.
The "error": "Request entity too large: limit is 3145728" is probably the default response from the Kubernetes handler for objects larger than 3MB, as you can see here at L305 of the source code:
expectedMsgFor1MB := `etcdserver: request is too large`
expectedMsgFor2MB := `rpc error: code = ResourceExhausted desc = trying to send message larger than max`
expectedMsgFor3MB := `Request entity too large: limit is 3145728`
expectedMsgForLargeAnnotation := `metadata.annotations: Too long: must have at most 262144 bytes`
etcd does indeed have a 1.5MB limit on the size of a single request, and the etcd documentation suggests trying the --max-request-bytes flag, but it would have no effect on a GKE cluster because you don't have that permission on the master node.
But even if you did, it would not be ideal, because this error usually means that you are consuming the objects instead of referencing them, which would degrade your performance.
I highly recommend that you consider instead these options:
- Determine whether your object includes references that aren't used
- Break up your resource
- Consider a volume mount instead
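To decide which part of a resource to break up, it can help to measure it first. This is a rough sketch under stated assumptions: the two limits are copied from the error messages quoted above, the helper names are made up, and the JSON size only approximates what the API server actually receives.

```python
import json

# Limits taken from the error messages quoted above.
REQUEST_LIMIT_BYTES = 3145728
ANNOTATION_LIMIT_BYTES = 262144


def serialized_size(obj):
    """Approximate size in bytes of `obj` once serialized as JSON."""
    return len(json.dumps(obj).encode("utf-8"))


def largest_fields(manifest, top=3):
    """Return the `top` biggest top-level fields as (name, bytes) pairs."""
    sizes = {key: serialized_size(value) for key, value in manifest.items()}
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:top]
```

Running `largest_fields` on a manifest loaded as a dict shows whether one field (for example an inlined script or parameter blob) accounts for most of the size and is a candidate for a volume mount or a separate object.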
There's a request for a new API Resource: File (or BinaryData) that could apply to your case. It's very fresh but it's good to keep an eye on.
Partial source for this answer: https://stackoverflow.com/a/60492986/12153576

Cloud firestore bandwidth exhausted error

We are using Cloud Firestore as our database and get the following error when the rate of parallel reads from the database increases.
details: "Bandwidth exhausted"
message: "8 RESOURCE_EXHAUSTED: Bandwidth exhausted"
stack: "Error: 8 RESOURCE_EXHAUSTED: Bandwidth exhausted
at callErrorFromStatus (/usr/service/node_modules/@grpc/grpc-js/build/src/call.js:30:26)
at Http2CallStream.call.on (/usr/service/node_modules/@grpc/grpc-js/build/src/call.js:79:34)
at Http2CallStream.emit (events.js:198:15)
at process.nextTick (/usr/service/node_modules/@grpc/grpc-js/build/src/call-stream.js:100:22)
at processTicksAndRejections (internal/process/task_queues.js:79:9)"
We couldn't find what the rate limits are. Could you please let me know what the read rate limits are, and in which cases Firestore returns the Bandwidth exhausted error?
Note: Billing is enabled in our project. The problem is we can't find what limit we are reaching.
The RESOURCE_EXHAUSTED error indicates that the project exceeded either its quota or the region/multi-region capacity, so probably your app is doing more reads than expected given what you described. You can check more details in this documentation.
You can check the free quotas and the standard limits on this link and the pricing for what exceeds those numbers on this link. It's important to note that, if you choose to allow your app to go further than the free quotas, you must enable billing for your Cloud Platform project, here is a how to.
You can also check how much your app is actually using of the quotas on app engine on the section below:
Hope this helps.
If you are reading all the data from Firebase, you will run into this issue. I had the same problem when reading all the data; after a while, I figured out that if we stop the process and start a new one, we can get past the error and continue the job.
So I used child-process, and it helped:
I wrote a parent script and a child script.
The parent script runs the child script as a child process.
The child goes through a collection until it gets the [8 RESOURCE_EXHAUSTED] error, then sends a message to the parent to inform it of the error.
The parent then kills the child, creates a new one, and tells it where to resume reading.
This solution works 100 percent, but it's a little advanced, and beginner to intermediate developers may not be able to implement it.
Update:
I have written complete instructions for this issue in a GitHub Gist; you can check it:
https://gist.github.com/navidshad/973e9c594a63838d1ebb8f2c2495cf87
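The parent/child restart loop described in this answer can be sketched roughly as follows. This is not the code from the Gist: it is written in Python rather than Node.js, the child is a simulated script that "fails" after a fixed read budget instead of hitting a real RESOURCE_EXHAUSTED error, and the child exits on its own rather than being killed by the parent. All names here are hypothetical.

```python
import subprocess
import sys

# Hypothetical child script. A real child would read documents from the
# database; this one only simulates running out of budget per process.
CHILD_SOURCE = r"""
import sys
start, total, budget = (int(arg) for arg in sys.argv[1:4])
end = min(start + budget, total)
print(end)                           # report the checkpoint to the parent
sys.exit(0 if end == total else 8)   # 8 mimics the RESOURCE_EXHAUSTED code
"""


def read_all(total, budget):
    """Restart the child from its last checkpoint until `total` is reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    position, restarts = 0, 0
    while position < total:
        result = subprocess.run(
            [sys.executable, "-c", CHILD_SOURCE,
             str(position), str(total), str(budget)],
            capture_output=True, text=True,
        )
        position = int(result.stdout.strip())  # where the child stopped
        if result.returncode != 0:
            restarts += 1  # the child hit the error: start a fresh process
    return position, restarts
```

The important part is the checkpoint: each new child is told where the previous one stopped, so no documents are read twice or skipped.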

How to find maximum update domains/fault domains available in an Azure region

The only way I know of to query the maximum number of update domains or fault domains allowed when creating an availability set in Azure is to pass very large values and then parse the error message. Is there a better way to query for the maximum values?
For example, executing New-AzureRmAvailabilitySet -PlatformUpdateDomainCount 100 -PlatformFaultDomainCount 100 <other parameters> will fail with an error that looks like below:
ErrorCode: InvalidParameter
ErrorMessage: The specified fault domain count 100 must fall in the range 1 to 2.
StatusCode: 400
ReasonPhrase: Bad Request
Maybe you can find the maximum fault domains in this article:
The maximum update domains is 20 by default.

How to design a REST API to fetch a large (ephemeral) data stream?

Imagine a request that starts a long running process whose output is a large set of records.
We could start the process with a POST request:
POST /api/v1/long-computation
The output consists of a large sequence of numbered records that must be sent to the client. Since the output is large, the server does not store everything; instead it maintains a window of records with an upper limit on its size. Let's say that it stores up to 1000 records (and pauses computation whenever this many records are available). When the client fetches records, the server may subsequently delete those records and so continue generating more records (as more slots in the 1000-length window become free).
Let's say we fetch records with:
GET /api/v1/long-computation?ack=213
We can take this to mean that the server should return records starting from index 214. When the server receives this request, it can assume that the (well-behaved) client is acknowledging that records up to number 213 are received by the client and so it deletes them, and then returns records starting from number 214 to whatever is available at that time.
Next if the client requests:
GET /api/v1/long-computation?ack=214
the server would delete record 214 and return records starting from 215.
This seems like a reasonable design until it is noticed that GET requests need to be safe and idempotent (see section 9.1 in the HTTP RFC).
Questions:
Is there a better way to design this API?
Is it OK to keep it as GET even though it appears to violate the standard?
Would it be reasonable to make it a POST request such as:
POST /api/v1/long-computation/truncate-and-fetch?ack=213
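To make the server-side behavior in this question concrete, here is a minimal in-memory model of the acknowledged window. It is only a sketch: the class and method names are invented, and record numbering is assumed to start at 1.

```python
from collections import deque


class RecordWindow:
    """Bounded buffer of numbered records, trimmed by client acks."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.records = deque()  # (index, payload) pairs, oldest first
        self.next_index = 1

    def produce(self, payload):
        """Store one record; return False when the computation must pause."""
        if len(self.records) >= self.capacity:
            return False
        self.records.append((self.next_index, payload))
        self.next_index += 1
        return True

    def fetch(self, ack):
        """Drop records numbered <= ack, then return what is still buffered."""
        while self.records and self.records[0][0] <= ack:
            self.records.popleft()
        return list(self.records)
```

Note that `fetch` changes server state (it deletes records), which is exactly why mapping it onto GET is questionable.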
One question I always feel needs to be asked is: are you sure that REST is the right approach for this problem? I'm a big fan and proponent of REST, but I try to apply it only to situations where it's applicable.
That being said, I don't think there's anything necessarily wrong with expiring resources after they have been used, but I think it's bad design to re-use the same url over and over again.
Instead, when I call the first set of results (maybe with):
GET /api/v1/long-computation
I'd expect that resource to give me a next link with the next set of results.
Although that particular url design does sort of tell me there's only 1 long-computation on the entire system going on at the same time. If this is not the case, I would also expect a bit more uniqueness in the url design.
The best solution here is to buy a bigger hard drive. I'm assuming you've pushed back and that's not in the cards.
I would consider your operation to be "unsafe" as defined by RFC 7231, so I would suggest not using GET. I would also strongly advise you to not delete records from the server without the client explicitly requesting it. One of the principles REST is built around is that the web is unreliable. Under your design, what happens if a response doesn't make it to the client for whatever reason? If they make another request, any records from the lost response will be destroyed.
I'm going to second @Evert's suggestion: if you absolutely must keep this design, pick a technology that's built around reliable delivery of information, such as a message queue. If you're going to stick with REST, you need to allow clients to tell you when it's safe to delete records.
For instance, is it possible to page records? You could do something like:
POST /long-running-operations?recordsPerPage=10
202 Accepted
Location: "/long-running-operations/12"
{
"status": "building next page",
"retry-after-seconds": 120
}
GET /long-running-operations/12
200 OK
{
"status": "next page available",
"current-page": "/pages/123"
}
-- or --
GET /long-running-operations/12
200 OK
{
"status": "building next page",
"retry-after-seconds": 120
}
-- or --
GET /long-running-operations/12
200 OK
{
"status": "complete"
}
GET /pages/123
{
// a page of records
}
DELETE /pages/123
// remove this page so new records can be made
You'll need to cap out page size at the number of records you support. If the client request is smaller than that limit, you can background more records while they process the first page.
That's just spitballing, but maybe you can start there. No promises on quality - this is totally off the top of my head. This approach is a little chatty, but it saves you from returning a 404 if the new page isn't ready yet.
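The page lifecycle above can be modeled in memory to check that the operations behave as intended. This is a sketch under assumptions, not a server implementation: the class name, the one-outstanding-page rule, and the sequential page ids are all made up for illustration.

```python
class PagedOperation:
    """Model of a long-running operation that exposes one page at a time."""

    def __init__(self, records, page_size):
        self._pending = list(records)  # records not yet placed on a page
        self._page_size = page_size
        self._pages = {}               # page id -> list of records
        self._next_id = 1

    def build_page(self):
        """Background step: materialize the next page if none is outstanding."""
        if self._pages or not self._pending:
            return
        self._pages[self._next_id] = self._pending[:self._page_size]
        self._pending = self._pending[self._page_size:]
        self._next_id += 1

    def status(self):
        """What GET on the operation resource would report."""
        if self._pages:
            page_id = next(iter(self._pages))
            return {"status": "next page available",
                    "current-page": f"/pages/{page_id}"}
        if self._pending:
            return {"status": "building next page"}
        return {"status": "complete"}

    def get_page(self, page_id):
        """Safe read: repeating it returns the same records."""
        return list(self._pages[page_id])

    def delete_page(self, page_id):
        """Idempotent delete: frees the slot so the next page can be built."""
        self._pages.pop(page_id, None)
```

Because reading a page never deletes anything, a lost response costs only a repeated GET; records are removed only when the client sends the explicit DELETE.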

Maximum number of network updates retrieved per API call

Is there any restriction on the number of entries that are retrieved using a single call to the Network Updates API? I found this forum comment "The per-user limit is per call, so 300 requests with however many updates they have." on the thread
http://developer.linkedin.com/forum/increase-search-api-throttle-limit
I want to confirm that indeed there is no limit. I have received as many as 106 entries in a single call.
Thanks in advance.
The maximum number of updates returned from the Network Updates API appears to be 250. Performing the following query as an example:
http://api.linkedin.com/v1/people/~/network/updates?count=500
Even if I try to specify the start parameter at, say, 250, I can't get the next 250 updates from the API:
http://api.linkedin.com/v1/people/~/network/updates?count=250&start=250
So it looks like 250 is the max, with no ability to page beyond that.
UPDATE:
Have verified that 250 is the maximum number returned, either in a single call or via the paging parameters. Looks like the documentation has been updated to reflect this.