I have a Grails application. I'm using Amazon DynamoDB for a specific requirement; the table is accessed, and entries are added, by a different application. Now I need to get all the information from the DynamoDB table into a PostgreSQL table. There are over 10,000 records in DynamoDB, but the throughput is:
Read capacity units : 100
Write capacity units : 100
In BuildConfig.groovy I have defined the plugin
compile ":dynamodb:0.1.1"
In Config.groovy I have the following configuration:
grails {
    dynamodb {
        accessKey = '***'
        secretKey = '***'
        disableDrop = true
        dbCreate = 'create'
    }
}
The domain class I have looks something like this:
class Book {
    Long id
    String author
    String name
    Date publishedDate

    static constraints = {
    }

    static mapWith = "dynamodb"

    static mapping = {
        table 'book'
        throughput read: 100
    }
}
When I try something like Book.findAll() I get the following error:
AmazonClientException: Unable to unmarshall response (Connection reset)
And when I tried to reduce the number of records with something like Book.findAllByAuthor() (which would also return thousands of records), I get the following error:
Caused by ProvisionedThroughputExceededException: Status Code: 400, AWS Service: AmazonDynamoDB, AWS Request ID: ***, AWS Error Code: ProvisionedThroughputExceededException, AWS Error Message: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API.
I need to get all the records from DynamoDB despite the throughput restriction and save them in a PostgreSQL table. Is there a way to do so?
I'm very new to this area; thanks in advance for the help.
After some research I came across Google Guava. But even to use Guava's RateLimiter, I don't know in advance how many requests I would need to send or how long it would take, so I'm looking for a solution that suits this requirement.
Your issue is probably not connected with Grails at all. The returned error message says: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API.
So you should either increase the provisioned throughput (you pay more for that) or adjust your queries to stay within the current limits.
Check out also this answer: https://stackoverflow.com/a/31484168/2166188
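If you go the second route (staying within the provisioned limits), a paginated Scan combined with a rate limiter is usually enough: DynamoDB returns at most 1MB per Scan page plus a LastEvaluatedKey, so you fetch page after page and simply throttle how quickly you request them. Below is a minimal sketch, assuming direct use of the AWS SDK for Java v1 and Guava's RateLimiter from a Grails service or script; the table name matches the question, while saveToPostgres, the page size, and the request rate are placeholders you would tune to your item size and the 100 provisioned read units.

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;
import com.google.common.util.concurrent.RateLimiter;

import java.util.Map;

public class BookExporter {

    public static void main(String[] args) {
        AmazonDynamoDB dynamo = new AmazonDynamoDBClient(new BasicAWSCredentials("***", "***"));

        // Issue only a couple of page requests per second so the scan stays
        // well under the provisioned read capacity (tune to your item size).
        RateLimiter limiter = RateLimiter.create(2.0);

        Map<String, AttributeValue> lastKey = null;
        do {
            limiter.acquire(); // blocks until the next page request is allowed

            ScanRequest request = new ScanRequest()
                    .withTableName("book")
                    .withLimit(100)                 // small pages keep per-request consumed capacity low
                    .withExclusiveStartKey(lastKey);

            ScanResult result = dynamo.scan(request);
            for (Map<String, AttributeValue> item : result.getItems()) {
                saveToPostgres(item);               // hypothetical helper: INSERT the row via JDBC/GORM
            }
            lastKey = result.getLastEvaluatedKey(); // null once the whole table has been scanned
        } while (lastKey != null);
    }

    private static void saveToPostgres(Map<String, AttributeValue> item) {
        // Placeholder: map id/author/name/publishedDate onto the PostgreSQL "book" table.
    }
}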
I've been trying to deploy a workflow in Argo on Kubernetes and I'm getting this error: Request entity too large: limit is 3145728
Can someone help me find the root of the issue? I've tried several things but I've been unsuccessful.
The way Argo solves that problem is by compressing the stored entity, but the real question is whether you need all 3MB of that data at once, or whether that is merely more convenient for you and the data could be decomposed into separate objects with relationships between them. The Kubernetes API is not blob storage and shouldn't be treated as one.
The "error": "Request entity too large: limit is 3145728" is probably
the default response from kubernetes handler for objects larger than
3MB, as you can see here at L305 of the source code:
expectedMsgFor1MB := `etcdserver: request is too large`
expectedMsgFor2MB := `rpc error: code = ResourceExhausted desc = trying to send message larger than max`
expectedMsgFor3MB := `Request entity too large: limit is 3145728`
expectedMsgForLargeAnnotation := `metadata.annotations: Too long: must have at most 262144 bytes`
etcd does indeed have a 1.5MB limit for processing a file, and you will find in the etcd documentation a suggestion to try the --max-request-bytes flag, but it would have no effect on a GKE cluster because you don't have that permission on the master node.
But even if you did, it would not be ideal, because usually this error means that you are consuming the objects instead of referencing them, which degrades performance.
I highly recommend that you consider these options instead:
- Determine whether your object includes references that aren't used
- Break up your resource
- Consider a volume mount instead
There's a request for a new API resource, File (or BinaryData), that could apply to your case. It's very fresh, but it's worth keeping an eye on.
Partial source for this answer: https://stackoverflow.com/a/60492986/12153576
I created a REST API using AWS API Gateway and DynamoDB without AWS Lambda (I wrote mapping templates for both the integration request and the integration response instead of a Lambda), using a GET API method, the POST HTTP integration method, and the Scan action. I'm fetching from a global secondary index in DynamoDB to make my scan smaller than the original table.
It's working well, except that I am only able to scan roughly 1,000 of the 7,500 items I need. I looked into paginating the JSON into an S3 bucket, but I really want to keep it simple with just API Gateway and DynamoDB, if possible.
Is there a way to get all 7,500 of the items in my payload with some modification to my integration request and/or response mappings? If not, what do you suggest?
Below is the mapping code I'm using. It works for a 1,000-item JSON payload instead of the 7,500 items that I would like to have:
Integration Request:
{
"TableName": "TrailData",
"IndexName": "trail-index"
}
Integration Response:
#set($inputRoot = $input.path('$'))
[
#foreach($elem in $inputRoot.Items)
{
"id":$elem.id.N,
"trail_name":"$elem.trail_name.S",
"challenge_rank":$elem.challenge_rank.N,
"challenge_description":"$elem.challenge_description.S",
"reliability_description":"$elem.reliability_description.S"
}
#if($foreach.hasNext),#end
#end
]
Here is a screenshot of the GET method settings for my API:
API Screenshot
I have already checked out this related Stack Overflow question, but I can't figure out how to apply it to my situation. I have put a lot of time into this.
I am aware of the 1MB query limit for DynamoDB, but the limited data I am returning is only 142KB.
I appreciate any help or suggestions. I am new to this. Thank you!
This limitation is not related to the DynamoDB Scan; rather, #foreach in the VTL response template is restricted to 1,000 iterations. Here is the issue.
We can confirm this by simply removing the #foreach (or the entire response template): we should then see all the records (up to 1MB) come back, though not well formatted.
The easiest solution is to pass request parameters that restrict the scan to only the necessary attributes of the DynamoDB table:
{
"TableName":"ana-qa-linkshare",
"Limit":2000,
"ProjectionExpression":"challenge_rank,reliability_description,trail_name"
}
Alternatively, we can avoid a single loop that exceeds 1,000 iterations by nesting multiple #foreach loops. This gets a little complex inside the template (a Lambda function would be simpler), but here is how it might look:
#set($inputRoot = $input.path('$'))
#set($maxRec = 500)
#set($totalLoops = $inputRoot.Count / $maxRec)
#set($maxIndex = $maxRec - 1)
#set($outerArray = [0..$totalLoops])
#set($innerArray = [0..$maxIndex])
[
#foreach($outer in $outerArray)
#foreach($inner in $innerArray)
#set($index = $outer * $maxRec + $inner)
#if($index < $inputRoot.Count)
#if($index > 0),#end
#set($elem = $inputRoot.Items.get($index))
{
    "id": $elem.id.N,
    "trail_name": "$elem.trail_name.S",
    "challenge_rank": $elem.challenge_rank.N,
    "challenge_description": "$elem.challenge_description.S",
    "reliability_description": "$elem.reliability_description.S"
}
#end
#end
#end
]
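For reference, here is a rough sketch of the Lambda alternative mentioned above (the handler class name is hypothetical; the table, index, and attribute names are taken from the question). It pages through the index with ExclusiveStartKey using the AWS SDK for Java v1, so the VTL iteration limit no longer applies and the 1MB-per-Scan-page limit is handled by paging; the overall response is then only bounded by the Lambda/API Gateway payload limits.

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TrailScanHandler implements RequestHandler<Map<String, Object>, List<Map<String, Object>>> {

    private final AmazonDynamoDB dynamo = AmazonDynamoDBClientBuilder.defaultClient();

    @Override
    public List<Map<String, Object>> handleRequest(Map<String, Object> input, Context context) {
        List<Map<String, Object>> items = new ArrayList<>();
        Map<String, AttributeValue> lastKey = null;
        do {
            // Scan one page at a time, restricted to the attributes the client needs.
            ScanRequest request = new ScanRequest()
                    .withTableName("TrailData")
                    .withIndexName("trail-index")
                    .withProjectionExpression("id,trail_name,challenge_rank,challenge_description,reliability_description")
                    .withExclusiveStartKey(lastKey);
            ScanResult result = dynamo.scan(request);
            for (Map<String, AttributeValue> item : result.getItems()) {
                Map<String, Object> row = new HashMap<>();
                row.put("id", item.get("id").getN());
                row.put("trail_name", item.get("trail_name").getS());
                row.put("challenge_rank", item.get("challenge_rank").getN());
                row.put("challenge_description", item.get("challenge_description").getS());
                row.put("reliability_description", item.get("reliability_description").getS());
                items.add(row);
            }
            lastKey = result.getLastEvaluatedKey(); // null/empty when the scan is complete
        } while (lastKey != null && !lastKey.isEmpty());
        return items;
    }
}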
I did a data migration with the help of the Azure Database Migration Service from MongoDB 3.4 to Azure Cosmos DB. All collections were copied. Then I deployed the app and ran a report inside the application. I was receiving errors in k8s like:
[report-srv-8a49370c7976028acfc037b7b9b69a37b34b8afezmg5r] 2020-09-17T14:12:27.653Z ERROR: [handleControllerHeart] Error handling heart: {"err":{"driver":true,"name":"MongoError","index":0,"code":16500}}
Error=16500, RetryAfterMs=5481, Details='Response status code does not
indicate success: TooManyRequests (429); Substatus: 3200; ActivityId:
********; Reason: ({\r\n "Errors": [\r\n "Request rate is large. More Request Units may be needed, so no changes were made.
Please retry this request later. Learn more:
http://aka.ms/cosmosdb-error-429
Then I increased the RUs, but the behavior was the same.
Does anybody have experience with migration from Mongo3.4 to Azure Cosmos DB?
You need to increase the throughput, a.k.a. RUs (Request Units). You can do that from here, and you can see how much you are already using from here; maybe double it. Then, from the dashboard, check how much you used when you ran your report and adjust it to what you actually need.
In the end, we created indexes in each collection, which made it possible to decrease the shared RUs.
Increasing the RUs also helped, but queries were very slow.
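For reference, here is a minimal sketch of creating such an index with the MongoDB Java driver against the Cosmos DB Mongo API; the connection string, database, collection, and field names below are placeholders for whatever your report actually filters and sorts on.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

public class CreateReportIndexes {
    public static void main(String[] args) {
        // All names here are placeholders; use your Cosmos DB connection string,
        // database, collection, and the fields your report queries actually use.
        try (MongoClient client = MongoClients.create("mongodb://<account>:<key>@<account>.mongo.cosmos.azure.com:10255/?ssl=true")) {
            MongoDatabase db = client.getDatabase("appdb");
            MongoCollection<Document> hearts = db.getCollection("hearts");

            // A compound index on the filtered fields means each report query
            // scans far fewer documents, so it consumes far fewer RUs.
            hearts.createIndex(Indexes.ascending("controllerId", "createdAt"));
        }
    }
}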
I'm using the BigQuery Java API to run ~1000 copy jobs simultaneously (with scala.concurrent.Future) with WriteDisposition WRITE_APPEND, but I'm getting
com.google.cloud.bigquery.BigQueryException: API limit exceeded: Unable to return a row that exceeds the API limits. To retrieve the row, export the table
I thought this was caused by too much concurrency, so I tried to use Monix's Task to limit the parallelism to at most 20:
def execute(queries: List[Query]): Future[Seq[Boolean]] = {
  val tasks: Iterator[Task[List[Boolean]]] = queries
    .map(q => BqApi.copyTable(q, destinationTable))
    .sliding(20, 20)
    .map(Task.gather(_))
  val results: Task[List[Boolean]] = Task.sequence(tasks)
    .map(_.flatten.toList)
  results.runAsync
}
where BqApi.copyTable executes the query, copies the result to the destination table, and then returns a Task[Boolean].
The same exception still happens.
But if I change the WriteDisposition to WRITE_TRUNCATE, the exception goes away.
Can anyone help me understand what happens under the hood, and why the BigQuery API behaves like this?
This message is encountered when a query exceeds the maximum response size. Since copy jobs use jobs.insert, maybe you're hitting the maximum row size listed in the query jobs limits. I suggest filing a BigQuery bug on its issue tracker to properly describe this behavior with the Java API.
Would a database like Cassandra and a schema like GraphQL work well together?
Cassandra's philosophy is based on optimizing your queries and denormalizing data. This doesn't seem to mesh well with GraphQL's philosophy, where data appears to be accessible at every level of a query.
Example:
Suppose I architect my Cassandra tables like so:
User:
    name
    address
    etc... (many properties)

Group:
    id
    name
    user_name (denormalized user, where we generally just need the name of a user)
But with GraphQL, one wouldn't exactly expect a denormalized User:
query getGroup {
    group(id: 1) {
        name
        users {
            name
        }
    }
}
So a couple of things:
1.) This GraphQL query could end up hitting our Cassandra database multiple times (assuming no caching): once for the group name, and then potentially once per user. But let's say our resolver creates multiple User objects with one Cassandra call.
2.) We can't really build an idiomatic, denormalized Cassandra model with GraphQL in mind, can we? Otherwise we should expect that certain properties of a User aren't returned to us by the query.
To sum up the question: what's the GraphQL strategy for working with denormalized data? Is it acceptable to omit certain properties that the client thinks are accessible? E.g., the client tries to access the address of a user, but we don't have it at the moment because our data is denormalized. Or should one not worry about denormalization at all and just let GraphQL make the calls, with a caching mechanism between the database and GraphQL, e.g. GraphQL first gets the group, then gets the user data for the group id?
This is a side effect of GraphQL, where a query can get quite complex in retrieving the data. But as long as the user is only requesting the data they need, and you are smart about your resolvers, the end result will actually be faster.
Consider tools like DataLoader to batch and cache lookups when resolving a query.
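As a rough illustration (not tied to any particular GraphQL server), here is a minimal sketch using the java-dataloader library, which graphql-java integrates with; the User type and fetchUsersFromCassandra are placeholders for your own model and Cassandra query.

import org.dataloader.BatchLoader;
import org.dataloader.DataLoader;
import org.dataloader.DataLoaderFactory;

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class UserLoaderExample {

    static class User {
        final String name;
        final String address;
        User(String name, String address) { this.name = name; this.address = address; }
    }

    public static void main(String[] args) {
        // Batch loader: one Cassandra round trip (e.g. SELECT ... WHERE name IN ?)
        // for all the user names collected while resolving a single GraphQL request.
        BatchLoader<String, User> userBatchLoader = names ->
                CompletableFuture.supplyAsync(() -> fetchUsersFromCassandra(names));

        DataLoader<String, User> userLoader = DataLoaderFactory.newDataLoader(userBatchLoader);

        // Resolvers call load(); nothing hits Cassandra yet, and repeated keys are cached.
        CompletableFuture<User> alice = userLoader.load("alice");
        CompletableFuture<User> bob = userLoader.load("bob");

        // dispatchAndJoin() fires the single batched query and completes the futures.
        userLoader.dispatchAndJoin();
        System.out.println(alice.join().address + " / " + bob.join().address);
    }

    // Placeholder for the real Cassandra lookup; results must come back in key order.
    private static List<User> fetchUsersFromCassandra(List<String> names) {
        return names.stream().map(n -> new User(n, "unknown")).collect(Collectors.toList());
    }
}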
As far as omitting certain properties goes, GraphQL validates the response and will throw an error, although it will also return the data you did provide. It would probably be better to implement some sort of timeout and throw a more descriptive error if there is an issue retrieving the data.