Batch updates using Entity Framework

Can anyone suggest how to deal with batch updates using Entity Framework?
For example: updating a field for a list (List) of between 100 and 1000 IDs, where the ID is the primary key. Calling each update separately seems to be too much overhead and takes a long time.

Nothing is written to the database until you call SaveChanges(), so you can make as many changes as you like and then save them all with a single call.
You can take a look at the Unit of Work design pattern if you want a more structured way of saving changes to the database.
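As a rough sketch of what that can look like, assuming EF Core rather than classic Entity Framework: the first method below loads the rows once, changes them in memory and calls SaveChanges() once. The second uses ExecuteUpdate, which only exists in EF Core 7 and later, to issue a single set-based UPDATE without loading anything. The Customer entity, the ShopContext class and the IsActive field are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity and context, invented for this sketch.
public class Customer
{
    public int Id { get; set; }
    public bool IsActive { get; set; }
}

public class ShopContext : DbContext
{
    public ShopContext(DbContextOptions<ShopContext> options) : base(options) { }
    public DbSet<Customer> Customers => Set<Customer>();
}

public static class BatchUpdates
{
    // Option 1: load the matching rows with one query, change them in memory,
    // then persist everything with a single SaveChanges() call.
    public static int DeactivateTracked(ShopContext db, List<int> ids)
    {
        var customers = db.Customers.Where(c => ids.Contains(c.Id)).ToList();
        foreach (var customer in customers)
        {
            customer.IsActive = false;
        }
        return db.SaveChanges(); // number of changed entities written
    }

    // Option 2 (assumes EF Core 7 or later): one set-based UPDATE statement,
    // no entities are loaded or tracked.
    public static int DeactivateSetBased(ShopContext db, List<int> ids)
    {
        return db.Customers
            .Where(c => ids.Contains(c.Id))
            .ExecuteUpdate(setters => setters.SetProperty(c => c.IsActive, false));
    }
}
```

Option 1 works on any EF version and keeps change tracking consistent. Option 2 bypasses the change tracker, so entities already loaded into the context will not reflect the update.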

Related

What is the most efficient way to update existing records using EF Core?

I am getting data from an external API and then using Entity Framework Core to insert the data into a table that has an identity field which is the primary key. I plan to regularly update the table with data from the API. There will be new records, and records that need to be updated.
I am struggling to determine what the best approach is to update existing records because the data from the API does not contain the primary key, as this is generated when data is inserted into the table. The API data does contain another unique id that I could use to look up the primary key before performing an update.
I am hoping someone has an example of a clean approach to performing the updates.
The straightforward thing to do is:
Make sure you have an index on that other unique id
Look up the entity using that other unique id
Save the data coming in on the API request into the entity
Save changes
The more complicated thing to do would be something like:
context.Database.ExecuteSqlRaw("UPDATE Entity SET ... WHERE ExternalId = {0}", externalId);
The second approach is much harder to get right and maintain and overall will not be that much faster, just a few milliseconds. I'd go with the first approach.
A good maxim to follow is to defer optimization until you see that you need it.
By the way, if you really want to experiment with the performance, you might want to try running the queries in LINQPad. The free edition should be enough to give you timings without having to put it all in your code.
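The four steps of the straightforward approach could be sketched roughly as follows. The Product entity, its ExternalId and Name properties, and the CatalogContext class are placeholders invented for this example, not names from the question.

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity: Id is the identity primary key generated by the
// database, ExternalId is the unique id that arrives with the API data.
public class Product
{
    public int Id { get; set; }
    public string ExternalId { get; set; } = "";
    public string Name { get; set; } = "";
}

public class CatalogContext : DbContext
{
    public CatalogContext(DbContextOptions<CatalogContext> options) : base(options) { }
    public DbSet<Product> Products => Set<Product>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Step 1: a unique index on the external id keeps the lookup cheap.
        modelBuilder.Entity<Product>().HasIndex(p => p.ExternalId).IsUnique();
    }
}

public static class ProductSync
{
    public static void Upsert(CatalogContext db, string externalId, string name)
    {
        // Step 2: look up the entity by the external id.
        var product = db.Products.SingleOrDefault(p => p.ExternalId == externalId);
        if (product == null)
        {
            // New record: the database assigns the primary key on insert.
            product = new Product { ExternalId = externalId };
            db.Products.Add(product);
        }

        // Step 3: copy the incoming API data onto the entity.
        product.Name = name;

        // Step 4: save changes (INSERT or UPDATE as appropriate).
        db.SaveChanges();
    }
}
```

Calling SaveChanges() per record keeps the sketch short. For a large import it would probably be better to apply all the changes first and save once at the end.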

Is optimistic locking equivalent to Select For Update?

It is my first time using EF Core and DDD concepts. Our database is Microsoft SQL Server. We use optimistic concurrency based on the RowVersion for user requests. This handles concurrent reads and writes by users.
With the DDD paradigm, user changes are not written directly to the database, nor is the logic handled in the database with a stored procedure. It is a three-step process:
get aggregate from repository that pulls it from the database
update aggregate through domain commands that implement business logic
save aggregate back to repository that writes it to the database
The separation of read and write in the application logic can lead again to race conditions between parallel commands.
Since the time between read and write in the backend is normally fairly short, those race conditions can be handled with optimistic and also pessimistic locking.
To my understanding, optimistic concurrency using RowVersion is sufficient for the lost update problem, but not for write skew, as shown in Martin Kleppmann's book "Designing Data-Intensive Applications". Preventing write skew would require locking the records that were read.
To prevent write skew a common solution is to lock the records in step 1 with FOR UPDATE or in SQL Server with the hints UPDLOCK and HOLDLOCK.
EF Core supports neither FOR UPDATE nor SQL Server's WITH table hints.
If I'm not able to lock records with EF Core does it mean there is no way to prevent write skew except using Raw SQL or Stored Procedures?
If I use RowVersion, I first check the RowVersion after getting the aggregate from the database. If it doesn't match I can fail fast. If it matches it is checked through EF Core in step 3 when updating the database. Is this pattern sufficient to eliminate all race conditions except write skew?
Since the write skew race condition occurs when read and write is on different records, it seems that there can always be a transaction added maybe later during development that makes a decision on a read. In a complex system I would not feel safe if it is not just simple CRUD access. Is there another solution when using EF Core to prevent write skew without locking records for update?
If you tell EF Core about the RowVersion attribute, it will use it in any update statement. BUT you have to be careful to preserve the RowVersion value from your data retrieval. The usual work pattern would retrieve the data, the user potentially edits the data, and then the user saves the data. When the user saves the data, you would normally have EF retrieve the entity, update the entity with the user's changes, and save the updates. EF uses the RowVersion in a WHERE clause to ensure nothing has changed since you read the data. This is the tricky part: you want to make sure the RowVersion is compared against the value from your initial data retrieval, not the value from the second retrieval used to update the entity before saving.
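A minimal sketch of that pattern, assuming the SQL Server provider (where a byte[] property marked [Timestamp] maps to a rowversion column). The Order entity, OrdersContext and the TryChangeStatus helper are hypothetical names. The key line is the one that overwrites OriginalValue with the version the user originally saw.

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity; assumes SQL Server, which fills RowVersion itself.
public class Order
{
    public int Id { get; set; }
    public string Status { get; set; } = "";

    [Timestamp] // marks the property as a concurrency token
    public byte[] RowVersion { get; set; } = Array.Empty<byte>();
}

public class OrdersContext : DbContext
{
    public OrdersContext(DbContextOptions<OrdersContext> options) : base(options) { }
    public DbSet<Order> Orders => Set<Order>();
}

public static class OrderUpdater
{
    // rowVersionSeenByUser is the value from the FIRST read (the one the user
    // based the edit on), round-tripped through the client.
    public static bool TryChangeStatus(
        OrdersContext db, int orderId, string newStatus, byte[] rowVersionSeenByUser)
    {
        // Second read, done only to get a tracked entity to apply changes to.
        var order = db.Orders.Single(o => o.Id == orderId);
        order.Status = newStatus;

        // Make EF compare against the version from the first read,
        // not the one that was just loaded.
        db.Entry(order).Property(o => o.RowVersion).OriginalValue = rowVersionSeenByUser;

        try
        {
            // The generated UPDATE includes the original RowVersion in its WHERE clause.
            db.SaveChanges();
            return true;
        }
        catch (DbUpdateConcurrencyException)
        {
            // The row was changed by someone else after the user's first read.
            return false;
        }
    }
}
```

This only guards the row being updated, so it addresses lost updates on that row. It does nothing about decisions that were based on other rows.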

Entity Framework - add or subtract set amount from DB field

I am working on my first project using an ORM (currently using Entity Framework, although that's not set in stone) and am unsure what the best practice is when I need to add or subtract a given amount from a database field, when I am not interested in the new value and I know the field in question is frequently updated, so concurrency conflicts are a concern.
For example, in a retail system where I am recording a sale, as well as creating records for the sale and each of the line items, I need to update the quantity on hand of the items sold. It seems unnecessary to query the database for the existing quantity on hand, just so that I can populate the entity model before saving the updated quantity - and in the time taken for that round-trip, there is a chance that the same item will have been sold through another checkout or the website, so I either have a conflict or (if using a transaction) the other sale is blocked until I complete my update.
In SQL I would simply write
UPDATE Item SET Quantity=Quantity-1 WHERE ...
It seems the best option in this case is to fall back to ADO.NET + stored procedure for this one update, but is there a better way within Entity Framework?
You're right. ORMs are specialized in tracking changes to each individual entity, and applying those changes to the DB individually. Some ORMs support sending the changes in batches, but, even so, modifying all the records in a table implies reading them all, modifying each one, and sending the changes back to the DB as individual UPDATEs.
And that's a big no-no, as you have correctly thought. It implies loading all the rows into memory, modifying all of them, tracking their changes, and sending them back to the DB as individual updates, which is way more expensive than running a single UPDATE on the DB.
As to the final question, to run a SQL command you don't need to use traditional ADO.NET. You can run SQL queries directly from an EF DbContext using ExecuteSqlCommand like this:
MyDbContext.Database.ExecuteSqlCommand("Your SQL here!!");
I recommend looking at the MSDN docs for the Database class to learn all the things that can be done, for example managing transactions, executing commands that return no data (as in the previous example) or executing queries that return data, and even mapping them to entities (classes) in your model: SqlQuery().
So you can run SQL commands and queries without using a different technology.
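For the quantity-on-hand case from the question, a sketch along these lines may help. It assumes EF Core 3.0 or later, where ExecuteSqlInterpolated is the counterpart of the older ExecuteSqlCommand. The Item entity, StoreContext and the Stock.Decrement helper are illustrative names only, and the table is assumed to be called Items (EF Core's default for a DbSet property with that name).

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical entity and context, invented for this sketch.
public class Item
{
    public int Id { get; set; }
    public int Quantity { get; set; }
}

public class StoreContext : DbContext
{
    public StoreContext(DbContextOptions<StoreContext> options) : base(options) { }
    public DbSet<Item> Items => Set<Item>();
}

public static class Stock
{
    // Subtracts 'amount' from the current quantity inside the database,
    // without reading the row first. Returns the number of rows affected.
    public static int Decrement(StoreContext db, int itemId, int amount)
    {
        // The interpolated values are sent as SQL parameters,
        // not concatenated into the command text.
        return db.Database.ExecuteSqlInterpolated(
            $"UPDATE Items SET Quantity = Quantity - {amount} WHERE Id = {itemId}");
    }
}
```

Because the arithmetic happens in the UPDATE statement itself, two checkouts selling the same item at the same time each subtract their own amount instead of overwriting each other's result. Any Item entity already tracked by the context will still hold the old quantity until it is reloaded.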

EclipseLink batch vs. fetch-join reading in fetch groups: is this possible?

I have a situation where I would like to create a query against an entity using EclipseLink JPA. The entity has many fields, and I require only 5 of them. 2 of those fields are joined OneToMany relationships, and I only require 2 primitive fields from each of the joins.
What is the most efficient way to do this?
I've considered a number of possibilities. Batch reading seems the best bet based on what I have read, but I believe it will only work if I retrieve the full entity, i.e. SELECT a FROM Entity a... The reason I don't want to do this is that I have LOB and BLOB types that will eat dangerously into the memory.
Join-fetch is another but the entity has ~10 joined tables and I don't want to duplicate all of this data.
I have been using fetch groups (http://wiki.eclipse.org/EclipseLink/Examples/JPA/AttributeGroup) and specifying the fields I want, which causes cached lazy loading. This is workable and the memory footprint is better. The issue, though, is that each call to entity.getCollection() executes its own SQL statement, and this is where I feel it is inefficient. If I could do SELECT a.Field, a.Field2 FROM Entity a using some form of batching or join-fetching, or better still apply this to my fetch group, I imagine that would be best, but I am not sure whether I could ensure that it would not load all related tables and would only give me the ones I want.
Help/thoughts would be much appreciated.
I think batch fetching will also work with nested fetch groups. Did you try this?
You can also set a defaultFetchGroup on your FetchGroupManager (either directly or by adding fetch=LAZY to your fields you do not want in your fetch group, i.e. add fetch=LAZY to your LOB fields).

Entity Framework Self Tracking Entities - Synchronize between 2 databases

I am using Self Tracking Entities with the Entity Framework 4. I have 2 databases, with the exact same schema. However, tables in one database will be added to/edited etc (and I mean data will be added/edited, not the actual table definitions) and at certain points of the day I will need to synchronize all the changes between this database and the other database.
I can create a separate context for both of them. But if I read a large graph from one database, how can I update the other database with the graph? Is there an easy way?
My database model is large and complex and fully relational. So it would be a big job to go through every single entity and do a read from the other database to see if it exists or not, update/insert it if need be, and then carry this on through the full object graph!
Any ideas?
This is not a use case for EF. In EF you will have to do exactly what you've described. Self-tracking entities are able to track changes to these object instances. They know nothing about changes made to their own database over time, and they will not know anything about the state of your second database either.
Try looking at SQL Server native features (including mirroring, transaction log shipping or SSIS) and the Microsoft Sync Framework. Depending on your detailed requirements, these tools may suit you better.