Entity Framework bulk insert without holding objects

Is there a way to essentially make the EF context stateless so I can insert a bunch of POCOs and not have them remain in memory, kind of the equivalent of a stateless session in NHibernate? This is to try to improve the performance of bulk inserts. I'm going to be inserting 1.7M POCOs into a SQL Server Compact table in the first run, and then inserting/updating records on subsequent runs.

No. EF will require you to load all objects into the context (memory), and after that it will insert / update each object in a separate round trip to the database.
Every performance improvement is mostly based on hacking EF and trying to overcome its limitations. In such a case you can write the inserts yourself and batch them into SqlCeCommands manually. You will build such a solution faster, it will perform better, and it will be less error-prone.
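
The manual batching idea can be sketched independently of EF. The following is a minimal illustration, not SqlCeCommand code: it uses Python's standard-library sqlite3 module as a stand-in for SQL Server Compact, and the table `items`, the helper `insert_in_batches`, and the batch size of 1000 are all hypothetical. It reads rows from a generator, so only one batch is held in memory at a time, and commits once per batch.

```python
import sqlite3
from itertools import islice


def insert_in_batches(conn, rows, batch_size=1000):
    """Insert rows from any iterable, holding only one batch in memory at a time."""
    rows = iter(rows)
    total = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        with conn:  # one transaction per batch: commit on success, roll back on error
            conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)", batch)
        total += len(batch)
    return total


# Hypothetical schema, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

# A generator, so the full data set is never materialized as a list.
source = ((i, f"item-{i}") for i in range(10_000))
inserted = insert_in_batches(conn, source, batch_size=1000)
row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

The same shape should carry over to a prepared, parameterized command executed in a loop inside a transaction, though the right batch size depends on the database and would need measuring.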

Related

Is optimistic locking equivalent to Select For Update?

It is my first time using EF Core and DDD concepts. Our database is Microsoft SQL Server. We use optimistic concurrency based on the RowVersion for user requests. This handles concurrent read and writes by users.
With the DDD paradigm, user changes are not written directly to the database, nor is the logic handled in the database with a stored procedure. It is a three-step process:
1. get aggregate from repository that pulls it from the database
2. update aggregate through domain commands that implement business logic
3. save aggregate back to repository that writes it to the database
The separation of read and write in the application logic can lead again to race conditions between parallel commands.
Since the time between read and write in the backend is normally fairly short, those race conditions can be handled with optimistic and also pessimistic locking.
To my understanding optimistic concurrency using RowVersion is sufficient for lost update problem, but not for write skew as is shown in Martin Kleppmann's book "Designing Data-Intensive Applications". This would require locking the read records.
To prevent write skew a common solution is to lock the records in step 1 with FOR UPDATE or in SQL Server with the hints UPDLOCK and HOLDLOCK.
EF Core supports neither FOR UPDATE nor SQL Server's WITH table hints.
If I'm not able to lock records with EF Core does it mean there is no way to prevent write skew except using Raw SQL or Stored Procedures?
If I use RowVersion, I first check the RowVersion after getting the aggregate from the database. If it doesn't match I can fail fast. If it matches it is checked through EF Core in step 3 when updating the database. Is this pattern sufficient to eliminate all race conditions except write skew?
Since the write skew race condition occurs when read and write is on different records, it seems that there can always be a transaction added maybe later during development that makes a decision on a read. In a complex system I would not feel safe if it is not just simple CRUD access. Is there another solution when using EF Core to prevent write skew without locking records for update?

If you tell EF Core about the RowVersion attribute, it will use it in any update statement. However, you have to be careful to preserve the RowVersion value from your data retrieval. The usual work pattern would retrieve the data, the user potentially edits the data, and then the user saves the data. When the user saves the data, you would normally have EF retrieve the entity, update the entity with the user's changes, and save the updates. EF uses the RowVersion in a WHERE clause to ensure nothing has changed since you read the data. This is the tricky part: you want to make sure the RowVersion is still the same as in your initial data retrieval, not the second retrieval used to update the entity before saving.
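
The version check in the WHERE clause can be illustrated without EF. This is a rough sketch using Python's sqlite3 module, not the SQL that EF Core actually generates: the `accounts` table and the `load` and `save` helpers are hypothetical, and the version is a plain integer incremented by hand, whereas SQL Server maintains a rowversion column automatically. The point it shows is that `save` must receive the version from the initial retrieval.

```python
import sqlite3

# Hypothetical table with a manually maintained integer version column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, row_version INTEGER)"
)
conn.execute("INSERT INTO accounts VALUES (1, 100, 1)")
conn.commit()


def load(conn, account_id):
    """Initial retrieval: return the data together with the version the user saw."""
    balance, version = conn.execute(
        "SELECT balance, row_version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return {"id": account_id, "balance": balance, "row_version": version}


def save(conn, account_id, new_balance, original_version):
    """Update only if the row still carries the version from the initial retrieval."""
    with conn:
        cur = conn.execute(
            "UPDATE accounts SET balance = ?, row_version = row_version + 1 "
            "WHERE id = ? AND row_version = ?",
            (new_balance, account_id, original_version),
        )
    return cur.rowcount == 1  # False means the row changed since it was read


# Two users read the same row, then both try to save.
seen_by_alice = load(conn, 1)
seen_by_bob = load(conn, 1)

alice_saved = save(conn, 1, 150, seen_by_alice["row_version"])
bob_saved = save(conn, 1, 50, seen_by_bob["row_version"])  # stale version, rejected
final_balance = load(conn, 1)["balance"]
```

If the second save had re-read the row and used the freshly loaded version instead of the one Bob originally saw, the check would pass and Alice's change would be silently overwritten, which is the pitfall described above.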

How many rows in a SQL Server table is too much?

I am parsing and storing some OSM (OpenStreetMap) data in a SQL table, using Entity Framework.
I've estimated there will be around 11 million records in this table, which will be bound to a model with EF etc. Is this too many?
What can I do to make this amount of data useable and not too slow?

The total number of rows in the DB is not the deciding factor regarding EF usage. What does become an issue with EF is working with many records at once. If you regularly manipulate many records at once, e.g. insert 10k, delete 10k, or update 10k daily, then you will want to use SQL stored procs.
The context, context objects, proxies, and change tracking are all nice with small transactions, but they make large-volume activity slow.
My personal rule of thumb: above around 1000 objects loaded at once, use direct SQL.
I use direct SQL side by side with EF. I use EF for 95% of activity.
Data loads, extracts, table copies, etc. are all done with SQL scripts/SPs.
Also, since EF6 you can tell EF to add extra indexes beyond the foreign keys, so that the generated SQL performs OK.

Can EF successfully be used to create very large tables?

We are very successfully using EF 5.0 for our real time server as well as from our internal websites. Now I need to create a utility that parses the data history to create a new table, which will be done using a copy of the production database used for data mining. Given that EF is transaction based, is there a good way to create a very large table where the table may have > 1M rows? My current thinking is no, and that the way to do this is to perhaps read the data with EF, but create a CSV file that is then bulk loaded, which I successfully do in some other situations. I'm not necessarily looking for the most efficient way, but I cannot imagine that EF or SQL would do well to add > 1M records with a single transaction. I know I could batch them in 1000 record chunks but that is not especially appealing. EF is said to be MSFT's principal data access technology going forward but they need to support this sort of scenario as part of that plan. Any ideas and insight appreciated. Thx.

EF is not geared for bulk operations (see Efficient way to do bulk insert/update with Entity Framework).
Instead of using a CSV to bulk-load, perhaps you want to look into the SQL Server Bulk Copy API.

How Entity Framework works in case of batch insert and update data

I use MS data access application block for interaction with database and I saw its performance is good. When I like to add 100 or more records then I send those 100 records in xml format to a stored procedure and from there I do a bulk insert. Now I have to use Entity Framework. I haven't ever used EF before so I am not familiar with EF and how it works.
In another forum I asked a question like "How Entity Framework works in case of batch insert and update data" and got answer
From my experience, EF does not support batch insert or batch update.
What it does is issue an individual insert or update statement for each change, but it will wrap all of them in a transaction if you add all of your changes to the DbContext before calling SaveChanges().
Is it true that EF cannot handle batch insert/update? In the case of a batch insert/update, does EF insert the data in a loop? If there are 100 records which we need to commit at once, can EF not do it?
If that is not right, then please guide me on how to write code so that EF can do a batch insert/update. Also tell me the trick to see what kind of SQL it will generate.
If possible, please guide me with sample code for batch insert/update with EF. Also tell me which version of EF supports true batch operations. Thanks

Yes, EF is not a bulk load or bulk update tool.
You can of course add a few thousand entries and commit (SaveChanges).
But when you have serious volumes or speed is critical, use SQL.
See Batch update/delete EF5 as an example on the topic.

entity framework performance

I am using Entity Framework to layer on my SQL Server 2008 database. The EF is present in my web service and the webservice is invoked by a Silverlight client.
I am seeing a serious performance issue in terms of the duration taken by a query to execute in the EF. This wouldn't happen in the consecutive calls.
A little bit of googling revealed that this is caused by EF constructing the in-memory model of the db objects once per app domain. I found this Microsoft link explaining pre-generation of views for performance improvement. Even after implementing the steps, the performance actually degraded instead of improving. I am curious if anyone has tried this approach successfully and if there are any other avenues for improving performance.
I am using .NET 3.5.

A couple areas to look at for EF performance
Do as much of the processing as possible before calling things like ToList(). ToList() will bring everything in the set into memory. By default, EF will keep building the expression tree and only actually process it when you need the data in memory. That first query will be against the database, but afterwards the processing will be in memory. When working with large data, you definitely want as much of the heavy lifting done by the database as possible.
EF 1 only has the option to pull the entire row back. Therefore if you have a column that is a large string or binary blob, it is going to be pulled down and into memory whether you need it or not. You can create a projection that doesn't include this column, but then you don't get the benefits of having it be an entity.
You can look at the sql generated by EF using the suggestion in this post
How do I view the SQL generated by the Entity Framework?

The same laws of physics apply for EF queries as they do for ordinary SQL. Check your database tables and make sure that you have indexes on primary and foreign keys, that your database is properly normalized, and so forth. If performance is degrading after Microsoft's suggestions, then that's my guess as to the problem area.

Are you hosting the webservice in IIS? Is it running on the same site as the Silverlight App? What about the database itself? Is it running on a dedicated machine? Are there other apps hitting it? The first call to a dormant database is painful (I've had situations where it would actually time out in my environment.)
There are a number of factors to take into consideration here. But it comes down to more than just EF's overhead.
Edit: I didn't fully qualify this, but the process of opening the first connection to SQL Server is slow regardless of your data access solution.

Use SQL Profiler to check how many queries are executed to retrieve your data. If it's a large number, use the Include() method of ObjectQuery to retrieve child objects with their parent in one query.
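
The difference between one query per parent and a single joined query can be seen in a small stand-alone sketch. This is not EF or Include() itself: it uses Python's sqlite3 module, the `parents` and `children` tables are hypothetical, and the connection's trace callback plays the role that SQL Profiler plays above by recording every statement that runs.

```python
import sqlite3

# Hypothetical parent/child schema, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parents (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE children (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO parents VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO children VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'b1'), (4, 3, 'c1');
""")

# Record every SQL statement the connection executes from here on.
statements = []
conn.set_trace_callback(statements.append)


def count_selects(log):
    return sum(1 for s in log if s.lstrip().upper().startswith("SELECT"))


# Lazy pattern: one query for the parents, then one more query per parent.
lazy = {}
for parent_id, name in conn.execute("SELECT id, name FROM parents").fetchall():
    kids = conn.execute(
        "SELECT name FROM children WHERE parent_id = ? ORDER BY id", (parent_id,)
    ).fetchall()
    lazy[name] = [k[0] for k in kids]
lazy_queries = count_selects(statements)

statements.clear()

# Eager pattern: parents and children fetched together with a single join.
eager = {}
for name, child in conn.execute(
    "SELECT p.name, c.name FROM parents p "
    "JOIN children c ON c.parent_id = p.id ORDER BY c.id"
):
    eager.setdefault(name, []).append(child)
eager_queries = count_selects(statements)
```

Both approaches build the same result, but the lazy loop issues one query plus one per parent, so the query count grows with the number of parents, while the join stays at a single query.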
