Bulk Insert/Update with EF6? - entity-framework

I’m looking for a way to insert or update about 155,000 records using EF6. It has become obvious that EF6 out of the box will take far too long to look up each record, decide whether it’s an insert or an update, create or update the object, and then commit it to the database.
Looking around I’ve seen third-party libraries like EntityFramework.Extended, but they appear to be designed for mass updates like “UPDATE Table WHERE Field = value”, which doesn’t quite fit what I’m looking to do.
In my case I read in an XML doc, create a list of objects from that document, and then use EF to either insert or update records in a table. Would I be better off going back to plain ADO.NET and doing bulk inserts that way?
BTW: this is using an Oracle database, not SQL Server.

You may use the EntityFramework.BulkInsert-ef6 package:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using EntityFramework.BulkInsert.Extensions;

class Program
{
    static void Main(string[] args)
    {
        // Build one million demo rows in memory.
        var data = new List<Demo>();
        for (int i = 0; i < 1000000; i++)
        {
            data.Add(new Demo { InsertDate = DateTime.Now, Key = Guid.NewGuid(), Name = "Example " + i });
        }

        // Stopwatch.StartNew() already starts the timer; no extra Start() needed.
        Stopwatch sw = Stopwatch.StartNew();
        using (Model1 model = new Model1())
        {
            model.BulkInsert(data);
        }
        sw.Stop();

        Console.WriteLine($"Elapsed time for {data.Count} rows: {sw.Elapsed}");
        Console.ReadKey();
    }
}
Running this on my local HDD gives:
Elapsed time for 1000000 rows: 00:00:24.9646688
P.S. The package provider claims that this version of the bulk package is outdated. Anyhow, it has fit my needs for years now, and the package proposed by the author is no longer free of charge.

If you are looking for a "free" way to do it, I recommend going back to ADO.NET and using array binding, which is what I do under the hood in my library.
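For illustration, here is a minimal sketch of array binding with the managed Oracle provider (Oracle.ManagedDataAccess.Client); the pages table, its columns, and connectionString are assumptions for the example, not part of the question:
using Oracle.ManagedDataAccess.Client;

// One array per column; element i across the arrays forms row i.
var ids = new[] { 1, 2, 3 };
var names = new[] { "Page 1", "Page 2", "Page 3" };

using (var conn = new OracleConnection(connectionString))
using (var cmd = conn.CreateCommand())
{
    conn.Open();
    cmd.CommandText = "INSERT INTO pages (id, name) VALUES (:id, :name)";
    cmd.ArrayBindCount = ids.Length; // a single round trip inserts every row
    cmd.Parameters.Add(new OracleParameter("id", OracleDbType.Int32) { Value = ids });
    cmd.Parameters.Add(new OracleParameter("name", OracleDbType.Varchar2) { Value = names });
    cmd.ExecuteNonQuery();
}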
Disclaimer: I'm the owner of Entity Framework Extensions
This library supports all major providers, including Oracle:
Oracle DevArt
Oracle DataAccess
Oracle DataAccessManaged
This library allows you to perform all bulk operations you need for your scenarios:
Bulk SaveChanges
Bulk Insert
Bulk Delete
Bulk Update
Bulk Merge
Example
// Easy to use
context.BulkSaveChanges();

// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);

// Perform bulk operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);

// Customize the primary key
context.BulkMerge(customers, operation => {
    operation.ColumnPrimaryKeyExpression =
        customer => customer.Code;
});
This library will save you a ton of time without you having to write any ADO.NET!

Related

EF Core Bulk Delete on PostgreSQL

I’m trying to do a potentially large-scale delete operation on a single table (think 100,000 rows in a 1M-row table).
I’m using PostgreSQL and EntityFrameworkCore.
Details: the application code has a predicate to match and knows nothing about how many rows potentially match it. It could be zero rows or a very large number.
Research indicates EF Core is incapable of handling this efficiently (i.e. the following code produces a DELETE statement for each row!):
using (var db = new DbContext())
{
    var queryable = db.Table.AsQueryable()
        .Where(o => o.ForeignKey == fKey)
        .Where(o => o.OtherColumn == false);

    db.Table.RemoveRange(queryable);
    await db.SaveChangesAsync();
}
So here is the SQL I would prefer to run in a sort of batched operation:
delete from Table
where ForeignKey = 1234
and OtherColumn = false
and PK in (
select PK
from Table
where ForeignKey = 1234
and OtherColumn = false
limit 500
)
There are extension libraries out there, but I’ve yet to find an active one that supports Postgres. I’m currently executing the raw SQL above through EF Core, roughly as sketched below.
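For reference, a minimal sketch of that raw-SQL approach using EF Core 3.x's ExecuteSqlRawAsync (table and column names mirror the SQL above; the loop repeats the 500-row batch until nothing matches):
int deleted;
do
{
    deleted = await db.Database.ExecuteSqlRawAsync(
        @"delete from Table
          where ForeignKey = {0}
            and OtherColumn = false
            and PK in (
                select PK
                from Table
                where ForeignKey = {0}
                  and OtherColumn = false
                limit 500)",
        fKey);
} while (deleted > 0);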
This leads to a couple questions:
Is there any way to get EF Core to delete these rows more efficiently on Postgres using LINQ, etc.?
(It seems to me that handing the context a queryable should give it everything it needs to make the proper decision here.)
If not, what are your opinions on deleting in batches vs. handing the DB just the predicate?
I think you are trying to do something you should not use EntityFrameworkCore for. The point of EntityFrameworkCore is to provide a nice way to move data between a .NET Core application and a database, and the typical use case is a single object or a small number of objects. For bulk operations there are some NuGet packages. There is this package for inserting and updating with Postgres. This article by the creator explains how it uses temporary tables and the Postgres COPY command to do bulk operations, which shows us a way to delete rows in bulk by id:
var toDelete = GetIdsToDelete();
using (var conn = new NpgsqlConnection(connectionString))
{
    conn.Open();

    // ON COMMIT DROP requires an explicit transaction; without one the temp
    // table is dropped as soon as the CREATE statement's implicit transaction ends.
    using (var tx = conn.BeginTransaction())
    {
        // Create a temp table to hold the ids, dropped automatically at commit.
        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText = "CREATE TEMP TABLE temp_ids_to_delete (id int NOT NULL) ON COMMIT DROP";
            cmd.ExecuteNonQuery();
        }

        // Stream the ids in via the binary COPY protocol.
        using (var writer = conn.BeginBinaryImport("COPY temp_ids_to_delete (id) FROM STDIN (FORMAT BINARY)"))
        {
            foreach (var id in toDelete)
            {
                writer.StartRow();
                writer.Write(id);
            }
            writer.Complete();
        }

        // Delete every row whose id landed in the temp table.
        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText = "delete from myTable where id in (select id from temp_ids_to_delete)";
            cmd.ExecuteNonQuery();
        }

        tx.Commit();
    }
}
With some small changes this can be more generalized.
But you want to do something different. You don't want to move data or information between the application and the database; you want to use EF Core to build a SQL statement on the fly and run it on the server. The problem is that EF Core is not really built for that, but maybe there are ways around it. One way I can think of is to use EF Core to build a query, get the query string, and then embed that string in another SQL string to run on the server.
Getting the query string is currently not easy, but apparently it will be with EF Core 5.0. Then you could do this:
var queryable = db.Table.AsQueryable()
    .Where(o => o.ForeignKey == fKey)
    .Where(o => o.OtherColumn == false);

var queryString = queryable.ToQueryString();
db.Database.ExecuteSqlRaw("delete from Table where PK in (" + queryString + ")");
And yes, that is terribly hacky and I would not recommend it. I would recommend writing procedures and functions on the database server instead, because this is not something EF Core should be used for. You can still run those functions from EF Core and pass parameters.
I would suggest using temp tables to do an operation like this. You would create a mirror temp table, bulk-add the records to keep or delete into it, and then execute a delete operation that looks for records in (or not in) that temp table. Try a library such as PgPartner, which makes bulk additions and temp table creation very easy.
Check out PgPartner: https://www.nuget.org/packages/PgPartner/
https://github.com/SourceKor/PgPartner
Disclaimer: I'm the owner of the project Entity Framework Plus
Your scenario looks like something our Batch Delete feature could handle: https://entityframework-plus.net/batch-delete
using (var db = new DbContext())
{
    var queryable = db.Table.AsQueryable()
        .Where(o => o.ForeignKey == fKey)
        .Where(o => o.OtherColumn == false);

    queryable.Delete();
}
Entities are not loaded into the application, and only a single SQL statement is executed, as you specified.

How to audit log inserts (adds) of records using a tracker-enabled DbContext

We are using https://github.com/bilal-fazlani/tracker-enabled-dbcontext
to create an audit trail of changes. We'd also like to record inserts of new records into the trail. We can loop through the entities just added, but there seems to be no way of getting the ID for an entity just added.
A year ago there was an article written suggesting it's a limitation / not possible - https://www.exceptionnotfound.net/entity-change-tracking-using-dbcontext-in-entity-framework-6/
but there are also some comments suggesting there is a way. We studied those and the related code but are not any clearer. Is it actually possible to audit inserts properly with this framework?
foreach (var entryAdded in addedEntities)
{
    var entityKeyObject = objectContext.ObjectStateManager.GetObjectStateEntry(entryAdded.Entity).EntityKey;
    var entityKey = entityKeyObject.EntityKeyValues.Length >= 1 ? entityKeyObject.EntityKeyValues[0].Value : 0;
    // insert into audit log here..
}
Yes, it's possible to get the inserted ID for an entity just added.
In short, you simply need to handle two events:
PreSaveChanges: gather the information not related to the primary key
PostSaveChanges: gather the information related to the primary key (which has been generated by then)
All the code can be found via the link. So I cannot answer exactly how to make it work with this library, but at least I can assure you 100% that it's possible.
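For illustration only, here is a minimal sketch of that two-phase idea using plain EF6 rather than the tracker library's actual API (the key property name "Id" is an assumption):
public override int SaveChanges()
{
    // Pre-save: capture the added entries; their keys are not generated yet.
    var added = ChangeTracker.Entries()
        .Where(e => e.State == EntityState.Added)
        .ToList();

    int result = base.SaveChanges(); // the database generates the identity values

    // Post-save: the key properties are now populated.
    foreach (var entry in added)
    {
        var id = entry.Property("Id").CurrentValue; // assumes a key named "Id"
        // ... write the audit record for this insert here ...
    }

    return result;
}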
One alternative is using another Entity Framework Audit Library
Disclaimer: I'm the owner of the project Entity Framework Plus
Wiki: EF+ Audit
This library allows you to audit and save information in a database. By default, it already supports capturing the IDs of added entities.
// using Z.EntityFramework.Plus; // Don't forget to include this.
var ctx = new EntityContext();
// ... ctx changes ...

var audit = new Audit();
audit.CreatedBy = "ZZZ Projects"; // Optional
ctx.SaveChanges(audit);

// Access all the auditing information
var entries = audit.Entries;
foreach (var entry in entries)
{
    foreach (var property in entry.Properties)
    {
        // each property exposes the audited values, including generated keys
    }
}

CreateOrUpdate Operation Over Many Records Using Entity Framework 6

I am writing a web crawler for fun.
I have a remote SQL database that I want to save information about each page I visit and I am using Entity Framework 6 to persist data. For the sake of illustration, let's assume that the only data I want to save about each page is the last time I visited it.
Updating this database is very slow. Here is the operation that I want make fast:
For the current page, check if it exists in the database already
If it exists already, update the "lastVisited" timestamp field on the record and save.
If it doesn't exist, create it.
Currently I can only do about 300 updates per minute. My SQL Server instance shows almost no activity, so I assume I am client-bound.
My code is naive:
public static void AddOrUpdatePage(long id, DataContext db)
{
    Page p = db.Pages.SingleOrDefault(f => f.id == id);
    if (p == null)
    {
        // create
        p = new Page();
        p.id = id;
        db.Pages.Add(p);
    }
    p.lastSeen = DateTime.Now;
    db.SaveChanges();
}
I crawl a bunch of pages (1000s), then call AddOrUpdatePage in a loop for each page.
It seems like the way to get more speed is batching. What is the best way to get 1000 records from my database at a time, given a set of page ids? In SQL I would use a table variable for this and join, or use a lengthy IN clause.
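For what it's worth, here is a minimal sketch of the batched lookup in EF6, reusing the Page and DataContext types from the question (the batch size and helper shape are illustrative; EF translates Contains over a list into a SQL IN clause):
public static void AddOrUpdatePages(IEnumerable<long> ids, DataContext db)
{
    // Process the ids in batches of 1000, one round trip per batch.
    foreach (var batch in ids.Select((id, i) => new { id, i })
                             .GroupBy(x => x.i / 1000, x => x.id))
    {
        var batchIds = batch.ToList();

        // One query fetches every existing page in the batch (SQL IN clause).
        var existing = db.Pages
            .Where(p => batchIds.Contains(p.id))
            .ToDictionary(p => p.id);

        foreach (var id in batchIds)
        {
            Page p;
            if (!existing.TryGetValue(id, out p))
            {
                p = new Page { id = id };
                db.Pages.Add(p);
            }
            p.lastSeen = DateTime.Now;
        }

        // One SaveChanges commits the whole batch.
        db.SaveChanges();
    }
}
Setting db.Configuration.AutoDetectChangesEnabled = false while adding large batches can speed this up further; you then need to call db.ChangeTracker.DetectChanges() yourself before SaveChanges.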

Entity Framework 6: is there a way to iterate through a table without holding each row in memory

I would like to be able to iterate through every row in an entity table without holding every row in memory. This is a read-only operation and every row can be discarded after being processed.
If there is a way to discard the row after processing that would be fine. I know that this can be achieved using a DataReader (which is outside the scope of EF), but can it be achieved within EF?
Or is there a way to obtain a DataReader from within EF without directly using SQL?
More detailed example:
Using EF I can code:
foreach (Quote q in context.Quotes)
    sw.WriteLine(q.QuoteId.ToString() + "," + q.Quotation);
but to achieve the same result with a DataReader I need to code:
// get the connection to the database
SqlConnection connection = context.Database.Connection as SqlConnection;
// open a new connection to the database
connection.Open();
// get a DataReader for our table
SqlCommand command = new SqlCommand(context.Quotes.ToString(), connection);
SqlDataReader dr = command.ExecuteReader();
// get a recipient for our database fields
object[] L = new object[dr.FieldCount];
while (dr.Read())
{
    dr.GetValues(L);
    sw.WriteLine(((int)L[0]).ToString() + "," + (string)L[1]);
}
The difference is that the former runs out of memory (because it pulls the entire table into client memory) while the latter runs to completion (and is much faster) because it only retains a single row in memory at any one time.
But equally importantly, the latter example loses the strong typing of EF, and should the database change, errors can be introduced.
Hence my question: can we get a similar result with strongly typed rows coming back in EF?
Based on your last comment, I'm still confused. Take a look at both pieces of code below.
EF
using (var ctx = new AppContext())
{
    foreach (var order in ctx.Orders)
    {
        Console.WriteLine(order.Date);
    }
}
Data Reader
var constr = ConfigurationManager.ConnectionStrings["AppContext"].ConnectionString;
using (var con = new SqlConnection(constr))
{
    con.Open();
    var cmd = new SqlCommand("select * from dbo.Orders", con);
    var reader = cmd.ExecuteReader();
    while (reader.Read())
    {
        Console.WriteLine(reader["Date"]);
    }
}
Even though EF issues a few initial queries, both of them execute a similar query, as can be seen in the profiler.
I haven't tested it, but try foreach (Quote L in context.Quotes.AsNoTracking()) {...}. AsNoTracking() should keep entities out of the context's cache, so I assume they will be collected by the GC once they go out of scope.
Another option is to use context.Entry(quote).State = EntityState.Detached; in the foreach loop. It should have a similar effect to option 1.
A third option (which should definitely work, but requires more coding) would be to implement batch processing: select the top N entities, process them, then select the next top N, as sketched below. In this case make sure that you dispose of the context and create a new one every iteration (so the GC can eat it :)) and use a proper OrderBy() in the query.
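A minimal sketch of that third option, assuming a Quote entity with an int QuoteId key and a QuotesContext type (both names illustrative), reusing the sw writer from the question; paging on the key itself avoids the growing cost of Skip() offsets:
const int pageSize = 1000;
int lastId = 0;
while (true)
{
    // A fresh context per page keeps the change tracker empty.
    using (var context = new QuotesContext())
    {
        var page = context.Quotes
            .AsNoTracking()
            .Where(q => q.QuoteId > lastId)
            .OrderBy(q => q.QuoteId)
            .Take(pageSize)
            .ToList();

        if (page.Count == 0)
            break;

        foreach (var q in page)
            sw.WriteLine(q.QuoteId + "," + q.Quotation);

        lastId = page[page.Count - 1].QuoteId;
    }
}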
You need to use an EntityDataReader, which behaves in a way similar to a traditional ADO.NET DataReader.
The problem is that, to do so, you need to use ObjectContext instead of DbContext, which makes things harder.
See this SO answer, not the accepted one: How can I return a datareader when using Entity Framework 4?
Even though that refers to EF4, things work the same way in EF6. Usually an ORM is not intended for streaming data; that's why this functionality is so hidden.
You can also look at this project: Entity Framework (Linq to Entities) to IDataReader Adapter
I have done this by pages, cleaning the Context after each page load.
Sample:
Load first 50 rows
Iterate over them
Clean the Context or create a new one.
Load second 50 rows
...
Clean the Context = Set all its Entries as Detached.

iOS Core Data using existing SQLite?

I want to create a new table in iOS Core Data. I have used the following XML file to create it in Java before, and would like to re-use it if possible.
sql.xml file
<sql>
<statement>
CREATE TABLE IF NOT EXISTS place (
_id INTEGER PRIMARY KEY AUTOINCREMENT,
Name VARCHAR(50),
Location VARCHAR(50),
Description VARCHAR(300),
Type VARCHAR(50),
longitude DOUBLE(50),
latitude DOUBLE(50))
</statement>
<statement>INSERT INTO place VALUES(1,'Clare'
,'Co Clare'
,'Clare Description'
,'County'
,'52.924014'
,'-9.353399')
</statement>
<statement>INSERT INTO surfSpot VALUES(2,'etc...
Java code
public void onCreate(SQLiteDatabase db) {
    String s;
    try {
        InputStream in = context.getResources().openRawResource(R.raw.sql);
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in, null);
        NodeList statements = doc.getElementsByTagName("statement");
        for (int i = 0; i < statements.getLength(); i++) {
            s = statements.item(i).getChildNodes().item(0).getNodeValue();
            db.execSQL(s);
        }
    } catch (Throwable t) {
        // note: all parse/SQL errors are silently swallowed here
    }
}
The database is static. I would like suggestions on how to do the same thing for iOS; step-by-step instructions would be the ideal answer.
That's not how Core Data works, I'm afraid. That it uses SQLite is an implementation detail. In fact, it doesn't even have to use SQLite; there are other persistent store types.
You could insert directly into the SQLite database that Core Data creates, but I would strongly recommend against doing this. It would be very fragile and liable to fail at major version updates.
A better solution might be to use SQLite directly, ignoring Core Data entirely. Core Data is a great abstraction for most apps, but isn't the only way and isn't the best way for all use cases.
You must first recognize that Core Data is not a database engine; it is an object graph persistence framework. One of its persistent store types happens to be sqlite store. Therefore, terms like "table" that are recognizable in the database world are not transferable to Core Data, at least at the level of abstraction you would be working with in your application.
You could use the existing XML export to populate your Core Data persistent store; but realize that the sqlite backing store format is opaque - you would have to locate it on the simulator's file system, then write a block of code that bridges the existing XML export to Core Data's sqlite persistent store. It would be much more trouble than it's worth.