Updating entities in ndb while paging with cursors

Updating entities in ndb while paging with cursors - entity-framework

To make things short, I have to make a script in Second Life communicating with an AppEngine app updating records in an ndb database. Records extracted from the database are sent as a batch (a page) to the LSL script, which updates customers, then asks the web app to mark these customers as updated in the database.
To create the batch I use a query on a (integer) property update_ver==0 and use fetch_page() to produce a cursor to the next batch. This cursor is also sent as urlsafe()-encoded parameter to the LSL script.
To mark the customer as updated, the update_ver is set to some other value like 2, and the entity is updated via put_async(). Then the LSL script fetches the next batch thanks to the cursor sent earlier.
My rather simple question is: in the web app, since the query property update_ver no longer satisfies the filter, is my cursor still valid ? Or do I have to use another strategy ?
Stripping out irrelevant parts (including authentication), my code currently looks like this (Customer is the entity in my database).
class GetCustomers(webapp2.RequestHandler): # handler that sends batches to the update script in SL
def get(self):
cursor=self.request.get("next",default_value=None)
query=Customer.query(Customer.update_ver==0,ancestor=customerset_key(),projection=[Customer.customer_name,Customer.customer_key]).order(Customer._key)
if cursor:
results,cursor,more=query.fetch_page(batchsize,start_cursor=ndb.Cursor(urlsafe=cursor))
else:
results,cursor,more=query.fetch_page(batchsize)
if more:
self.response.write("more=1\n")
self.response.write("next={}\n".format(cursor.urlsafe()))
else:
self.response.write("more=0\n")
self.response.write("n={}\n".format(len(results)))
for c in results:
self.response.write("c={},{},{}\n".format(c.customer_key,c.customer_name,c.key.urlsafe()))
self.response.set_status(200)
The handler that updates Customer entities in the database is the following. The c= parameters are urlsafe()-encoded entity keys of the records to update and the nv= parameter is the new version number for their update_ver property.
class UpdateCustomer(webapp2.RequestHandler):
#ndb.toplevel # don't exit until all async operations are finished
def post(self):
updatever=self.request.get("nv")
customers=self.request.get_all("c")
for ckey in customers:
cust=ndb.Key(urlsafe=ckey).get()
cust.update_ver=nv # filter in the query used to produce the cursor was using this property!
cust.update_date=datetime.datetime.utcnow()
cust.put_async()
else:
self.response.set_status(403)
Will this work as expected ? Thanks for any help !

Your strategy will work and that's the whole point for using these cursors, because they are efficient and you can get the next batch as it was intended regardless of what happened with the previous one.
On a side note you could also optimise your UpdateCustomer and instead of retrieving/saving one by one you can do things in batches using for example the ndb.put_multi_async.

Related

Why Spring Data doesn't support returning entity for modifying queries?

When implementing a system which creates tasks that need to be resolved by some workers, my idea would be to create a table which would have some task definition along with a status, e.g. for document review we'd have something like reviewId, documentId, reviewerId, reviewTime.
When documents are uploaded to the system we'd just store the documentId along with a generated reviewId and leave the reviewerId and reviewTime empty. When next reviewer comes along and starts the review we'd just set his id and current time to mark the job as "in progress" (I deliberately skip the case where the reviewer takes a long time, or dies during the review).
When implementing such a use case in e.g. PostgreSQL we could use the UPDATE review SET reviewerId = :reviewerId, reviewTime: reviewTime WHERE reviewId = (SELECT reviewId from review WHERE reviewId is null AND reviewTime is null FOR UPDATE SKIP LOCKED LIMIT 1) RETURNING reviewId, documentId, reviewerId, reviewTime (so basically update the first non-taken row, using SKIP LOCKED to skip any already in-processing rows).
But when moving from native solution to JDBC and beyond, I'm having troubles implementing this:
Spring Data JPA and Spring Data JDBC don't allow the #Modifying query to return anything else than void/boolean/int and force us to perform 2 queries in a single transaction - one for the first pending row, and second one with the update
one alternative would be to use a stored procedure but I really hate the idea of storing such logic so away from the code
other alternative would be to use a persistent queue and skip the database all along but this introduced additional infrastructure components that need to be maintained and learned. Any suggestions are welcome though.
Am I missing something? Is it possible to have it all or do we have to settle for multiple queries or stored procedures?

Why Spring Data doesn't support returning entity for modifying queries?
Because it seems like a rather special thing to do and Spring Data JDBC tries to focus on the essential stuff.
Is it possible to have it all or do we have to settle for multiple queries or stored procedures?
It is certainly possible to do this.
You can implement a custom method using an injected JdbcTemplate.

CQRS - How to handle if a command requires data from db (query)

I am trying to wrap my head around the best way to approach this problem.
I am importing a file that contains bunch of users so I created a handler called
ImportUsersCommandHandler and my command is ImportUsersCommand that has List<User> as one of the parameters.
In the handler, for each user that I need to import I have to make sure that the UserType is valid, this is where the confusion comes in. I need to do a query against the database, to get list of all possible user types and than for each user I am importing, I want to verify that the user type id in the import matches one that is in the db.
I have 3 options.
Create a query GetUserTypesQuery and get the rest of this and then pass it on to the ImportUsersCommand as a list and verify inside the command handler
Call the GetUserTypesQuery from the command itself and not pass it (command calling another query)
Do not create a GetUsersTypeQuery and just do the query results within the command (still a query but no query/handler involved)
I feel like all these are dirty solutions and not the correct way to apply CQRS.

I agree option 1 sounds the best but would maybe suggest adding a pre handler to validate your input?
So ImportUsersCommandHandler deals with importing you data (and only that) and add a handler that runs before that validates (in your example, checks the user types and maybe other stuff) and bails out of it does not pass. So it queries the db, checks the usertypes and does whatever it needs to if it fails. Otherwise it just passes down to your business handler (ImportUsersCommandHandler).
I am used to using Mediatr in NET Core and this pattern works well (this is what we do) so sorry if this does not fit with your environment/setup!

Update associated field in Waterline/Sails without affecting other fields

I notice that multiple requests to a record causes writes to be possibly overwritten. I am using Mongo btw.
I have a schema like:
Trip { id, status, tagged_friends }
where tagged_friends is an association to Users collection
When I make 2 calls to update trips in close succession (in this case I am making 2 API calls from client - actually automated tests), its possible for them to interfere. Since they all call trip.save().
Update 1: update the tagged_friends association
Update 2: update the status field
So I am thinking these 2 updates should only save the "dirty" fields. I think I can do that with Trips.update() rather than trip.save()? But problem is I cannot use update to update an association? That does not appear to work?
Or perhaps there's a better way to do this?

EF database concurrency

I have two apps: one app is asp.net and another is a windows service running in background.
The windows service running in background is performing some tasks (read and update) on database while user can perform other operations on database through asp.net app. So I am worried about it as for example, in windows service I collect some record that satisfy a condition and then I iterate over them, something like:
IQueryable<EntityA> collection = context.EntitiesA.where(<condition>)
foreach (EntityA entity in collection)
{
// do some stuff
}
so, if user modify a record that is used later in the loop iteration, what value for that record is EF taken into account? the original retrieved when performed:
context.EntitiesA.where(<condition>)
or the new one modified by the user and located in database?
As far as I know, during iteration, EF is taken each record at demand, I mean, one by one, so when reading the next record for the next iteration, this record corresponds to that collected from :
context.EntitiesA.where(<condition>)
or that located in database (the one the user has just modified)?
Thanks!

There's a couple of process that will come into play here in terms of how this will work in EF.
Queries are only performed on enumeration (this is sometimes referred to as query materialisation) at this point the whole query will be performed
Lazy loading only effects navigation properties in your above example. The result set of the where statement will be pulled down in one go.
So what does this mean in your case:
//nothing happens here you are just describing what will happen later to make the
// query execute here do a .ToArray or similar, to prevent people adding to the sql
// resulting from this use .AsEnumerable
IQueryable<EntityA> collection = context.EntitiesA.where(<condition>);
//when it first hits this foreach a
//SELECT {cols} FROM [YourTable] WHERE [YourCondition] will be performed
foreach (EntityA entity in collection)
{
//data here will be from the point in time the foreach started (eg if you have updated during the enumeration in the database you will have out of date data)
// do some stuff
}

If you're truly concerned that this can happen then get a list of id's up front and process them individually with a new DbContext for each (or say after each batch of 10). Something like:
IList<int> collection = context.EntitiesA.Where(...).Select(k => k.id).ToList();
foreach (int entityId in collection)
{
using (Context context = new Context())
{
TEntity entity = context.EntitiesA.Find(entityId);
// do some stuff
context.Submit();
}
}

I think the answer to your question is 'it depends'. The problem you are describing is called 'non repeatable reads' an can be prevented from happening by setting a proper transaction isolation level. But it comes with a cost in performance and potential deadlocks.
For more details you can read this

Multiple / Rapid ajax requests and concurrency issues with Entity Framework

I have an asp.net MVC4 application that I am using Unity as my IoC. The constructor for my controller takes in a Repository and that repository takes in a UnitOfWork (DBContext). Everything seems to work fine until multiple ajax requests from the same session happen too fast. I get the Store update, insert, or delete statement affected an unexpected number of rows (0) error due to a concurrency issue. This is what the method looks like called from the ajax request:
public void CaptureData(string apiKey, Guid sessionKey, FormElement formElement)
{
var trackingData = _trackingService.FindById(sessionKey);
if(trackingData != null)
{
formItem = trackingData.FormElements
.Where(f => f.Name == formElement.Name)
.FirstOrDefault();
if(formItem != null)
{
formItem.Value = formElement.Value;
_formElementRepository.Update(formItem);
}
}
}
This only happens when the ajax requests happens rapidly, meaning fast. When the requests happen at a normal speed everything seems fine. It is like the app needs time to catch up. Not sure how I need to handle the concurrency check in my repository so I don't miss an update. Also, I have tried setting the "MultipleActiveResultSets" to true and that didn't help.

As you mentioned in the comment you are using a row version column. The point of this column is to prevent concurrent overwrites of the same row. You have two operations:
Read record - reads record and current row version
Update record - update record with specified key and row version. The row version is updated automatically
Now if those operations are executed by concurrent request you may receive this:
Request A: Read record
Request B: Read record
Request A: Write record - changes row version!
Request B: Write record - fires exception because record with row version retrieved during Read record doesn't exist
The exception is fired to tell you that you are trying to update obsolete data because there is already a new version of the updated record. Normally you need to refresh data (by reloading current record from the database) and try to save them again. In highly concurrent scenario this handling may repeat many times because simply your database is designed to prevent this. Your options are:
Remove row version and let requests overwrite the value as they wish. If you really need concurrent request processing and you are happy to have "some" value, this may be the way to go.
Not allow concurrent requests. If you need to process all updates you most probably also need their real order. In such case your application should not allow concurrent requests.
Use SQL / stored procedure instead. By using table hints you will be able to lock record during Read operation and no other request will be able to read that record before the first one save changes and commits or rollbacks transaction.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse