What is the best way to deal with batch updates using Entity Framework 5 (EF5)?
I have 2 particular cases I'm interested in:
Updating a field (e.g. UpdateDate) for a list (List<int>) of between 100 and 100,000 Ids, where the Id is the primary key. Calling each update separately seems to be too much overhead and takes a long time.
Inserting many objects of the same type (e.g. Users), also between 100 and 100,000, in a single go.
Any good advice?
There are two open source projects allowing this: EntityFramework.Extended and Entity Framework Extensions. You can also check the discussion about bulk updates on EF's CodePlex site.
Inserting 100k records through EF points to the wrong application architecture in the first place; you should choose a different, lightweight technology for data imports. Even EF's internal handling of such a big record set will cost you a lot of processing time. There is currently no solution for batch inserts in EF, but there is a broad discussion about this feature on EF's CodePlex site.
I see the following options:
1. The simplest way - create your SQL request by hand and execute it through ObjectContext.ExecuteStoreCommand:
context.ExecuteStoreCommand("UPDATE TABLE SET FIELD1 = {0} WHERE FIELD2 = {1}", value1, value2);
2. Use EntityFramework.Extended:
context.Tasks.Update(
    t => t.StatusId == 1,
    t => new Task { StatusId = 2 });
3. Make your own extension for EF. There is an article, Bulk Delete, where this goal was achieved by inheriting from the ObjectContext class. It's worth taking a look. Bulk insert/update can be implemented in the same way.
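For the asker's first case (stamping a date on up to 100,000 Ids), option 1 can be combined with chunking so that each round trip updates one batch of Ids. The following is only a minimal sketch under assumptions not in the answer above: a dbo.Users table with an UpdateDate column, integer primary keys, and a chunk size of 1000 chosen to stay under SQL Server's per-command parameter limit.
// Hypothetical helper, not from the answer above: stamps UpdateDate
// for the given ids in chunks, one ExecuteStoreCommand per chunk.
private void BatchStampUpdateDate(ObjectContext context, List<int> ids, DateTime stamp)
{
    const int chunkSize = 1000;
    for (var i = 0; i < ids.Count; i += chunkSize)
    {
        var chunk = ids.Skip(i).Take(chunkSize).ToList();
        // {0} is the date; {1}..{n} are the ids of this chunk.
        var placeholders = string.Join(",", chunk.Select((_, n) => $"{{{n + 1}}}"));
        var sql = $"UPDATE dbo.Users SET UpdateDate = {{0}} WHERE Id IN ({placeholders})";
        var args = new object[] { stamp }.Concat(chunk.Cast<object>()).ToArray();
        context.ExecuteStoreCommand(sql, args);
    }
}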
You may not want to hear it, but your best option is to not use EF for bulk operations. For updating a field across a table of records, use an UPDATE statement in the database (possibly called through a stored proc mapped to an EF function). You can also use the Context.ExecuteStoreCommand method to issue an UPDATE statement to the database.
For massive inserts, your best bet is to use bulk copy or SSIS. EF will require a separate hit to the database for each row being inserted.
Bulk inserts should be done using the SqlBulkCopy class. Please see the pre-existing Stack Overflow Q&A on integrating the two: SqlBulkCopy and Entity Framework.
SqlBulkCopy is a lot more user-friendly than bcp (the bulk-copy command-line utility) or even OPENROWSET.
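As a rough illustration of the shape this takes (a minimal sketch, not taken from the linked Q&A; the User type and the table and column names are assumptions), SqlBulkCopy streams all rows to the server in one operation:
using System.Data;
using System.Data.SqlClient;

// Sketch: bulk-insert a batch of users, assuming a dbo.Users table
// with Name and CreatedDate columns.
public static void BulkInsertUsers(string connectionString, IEnumerable<User> users)
{
    var table = new DataTable();
    table.Columns.Add("Name", typeof(string));
    table.Columns.Add("CreatedDate", typeof(DateTime));
    foreach (var u in users)
        table.Rows.Add(u.Name, u.CreatedDate);

    using (var bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "dbo.Users";
        bulk.ColumnMappings.Add("Name", "Name");
        bulk.ColumnMappings.Add("CreatedDate", "CreatedDate");
        bulk.WriteToServer(table);
    }
}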
Here's what I've done successfully:
private void BulkUpdate()
{
    var oc = ((IObjectContextAdapter)_dbContext).ObjectContext;

    // ToString() on the IQueryable yields its SQL. This MUST run before
    // the parameters are extracted below.
    var updateQuery = myIQueryable.ToString();
    var updateParams = GetSqlParametersForIQueryable(myIQueryable).ToArray();
    var updateSql = $@"UPDATE dbo.myTable
                       SET col1 = x.alias2
                       FROM dbo.myTable
                       JOIN ({updateQuery}) x(alias1, alias2) ON x.alias1 = dbo.myTable.Id";
    oc.ExecuteStoreCommand(updateSql, updateParams);
}

private void BulkInsert()
{
    var oc = ((IObjectContextAdapter)_dbContext).ObjectContext;

    var insertQuery = myIQueryable.ToString(); // Again, must run before extracting the parameters.
    var insertParams = GetSqlParametersForIQueryable(myIQueryable).ToArray();
    var insertSql = $@"INSERT INTO dbo.myTable (col1, col2)
                       SELECT x.alias1, x.alias2
                       FROM ({insertQuery}) x(alias1, alias2)";
    oc.ExecuteStoreCommand(insertSql, insertParams);
}

private static IEnumerable<SqlParameter> GetSqlParametersForIQueryable<T>(IQueryable<T> queryable)
{
    var objectQuery = GetObjectQueryFromIQueryable(queryable);
    // Clone each parameter: a SqlParameter cannot belong to two commands at once.
    return objectQuery.Parameters.Select(x => new SqlParameter(x.Name, x.Value));
}

private static ObjectQuery<T> GetObjectQueryFromIQueryable<T>(IQueryable<T> queryable)
{
    // DbQuery<T> hides its underlying ObjectQuery<T>; reach it via reflection.
    var dbQuery = (DbQuery<T>)queryable;
    var iqProp = dbQuery.GetType().GetProperty("InternalQuery", BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public);
    var iq = iqProp.GetValue(dbQuery, null);
    var oqProp = iq.GetType().GetProperty("ObjectQuery", BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public);
    return (ObjectQuery<T>)oqProp.GetValue(iq, null);
}
public static bool BulkDelete(string tableName, string columnName, List<object> val)
{
    bool ret = true;
    var max = 2000; // stay well below SQL Server's parameter limit per command

    var pages = Math.Ceiling((double)val.Count / max);
    for (int i = 0; i < pages; i++)
    {
        // The last page may be smaller than max.
        var count = Math.Min(max, val.Count - i * max);
        var args = val.GetRange(i * max, count);
        var cond = string.Join(",", args.Select((t, index) => $"@p{index}"));
        var sql = $"DELETE FROM {tableName} WHERE {columnName} IN ({cond})";
        ret &= Db.ExecuteSqlCommand(sql, args.ToArray()) > 0;
    }
    return ret;
}
I agree with the accepted answer that EF is probably the wrong technology for bulk inserts.
However, I think it's worth having a look at EntityFramework.BulkInsert.
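For context, usage with that library is essentially a one-liner. This is a minimal sketch assuming its BulkInsert extension method on DbContext and a prepared list of entities; check the project's documentation for the exact namespace and options:
using EntityFramework.BulkInsert.Extensions;

using (var context = new MyDbContext())
{
    // One bulk operation instead of one INSERT round trip per entity.
    context.BulkInsert(users);
}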
I have a console program that moves data between two different servers (DatabaseA and DatabaseB).
DatabaseB is a Postgres server.
It calls a lot of stored procedures and other raw queries.
I use ExecuteSqlRaw a lot.
I also use NpgsqlBulk.EfCore.
The program uses the same context instance for DatabaseB during the whole run it takes to finish.
Somehow I get locks on some of my tables on DatabaseB that never get released.
This always happens on my table mytable_fromdatabase_import.
The code that runs on it is the following:
protected override void AddIdsNew()
{
    var toAdd = IdsNotInDatabaseB();
    var newObjectsToAdd = GetByIds(toAdd).Select(Converter.ConvertAToB);

    // Clear the staging table, bulk-load it, then merge into the real table.
    DatabaseBContext.Database.ExecuteSqlRaw("truncate mytable_fromdatabase_import;");
    var uploader = new NpgsqlBulkUploader(DatabaseBContext);
    uploader.Insert(newObjectsToAdd); // inserts data into mytable_fromdatabase_import
    DatabaseBContext.Database.ExecuteSqlRaw("call insert_myTable_from_importTable();");
}
After I run it, the whole table is not accessible anymore, and when I query the locks on the server I can see there is a process holding them.
How can I make sure this process always closes and releases its locks on the tables?
I thought EF Core would do that automatically.
Edit:
I just wanted to add that this is not a temporary problem during the run of the console program. When I run this code and it has finished, my table is still locked and nothing can access it. My understanding was that the EF Core context would release everything after it is disposed (whether through an error or by finishing).
The problem had nothing to do with EF Core but with a misconfigured backup script. The program is now running with no changes to it and it works fine.
For a concrete task you need the right tools. You probably get locks when you retrieve the Ids and also when you try to avoid loading already-imported records. These steps are slow!
I would suggest using linq2db (disclaimer: I'm a co-author of this library).
Create two projects with models from the different databases:
Source.Model.csproj - install linq2db.SqlServer
Destination.Model.csproj - install linq2db.PostgreSQL
Follow the instructions in the T4 templates on how to generate a model from the two databases. It is easy, and you can ask questions on linq2db's GitHub site.
I'll post a helper class which I used for transferring tables on a previous project. It additionally uses the CodeJam library for mapping, but in your project you can of course use AutoMapper.
public class DataImporter
{
    private readonly DataConnection _source;
    private readonly DataConnection _destination;

    public DataImporter(DataConnection source, DataConnection destination)
    {
        _source = source;
        _destination = destination;
    }

    private long ImportDataPrepared<TSource, TDest>(IOrderedQueryable<TSource> source, Expression<Func<TSource, TDest>> projection) where TDest : class
    {
        var destination = _destination.GetTable<TDest>();
        var tableName = destination.TableName;

        var sourceCount = source.Count();
        if (sourceCount == 0)
            return 0;

        var currentCount = destination.Count();
        if (currentCount > sourceCount)
            throw new Exception($"'{tableName}': destination has more rows than source. What happened here?");
        if (currentCount >= sourceCount)
            return 0;

        // Skip rows that were already transferred; this is what makes the import reentrant.
        IQueryable<TSource> sourceQuery = source;
        if (currentCount > 0)
            sourceQuery = sourceQuery.Skip(currentCount);

        var projected = sourceQuery.Select(projection);
        var copied =
            _destination.BulkCopy(
                new BulkCopyOptions
                {
                    BulkCopyType = BulkCopyType.MultipleRows,
                    RowsCopiedCallback = (obj) => RowsCopiedCallback(obj, currentCount, sourceCount, tableName)
                }, projected);

        return copied.RowsCopied;
    }

    private void RowsCopiedCallback(BulkCopyRowsCopied obj, int currentRows, int totalRows, string tableName)
    {
        var percent = (currentRows + obj.RowsCopied) / (double)totalRows * 100;
        Console.WriteLine($"Copied {percent:N2}% \tto {tableName}");
    }

    public class ImporterHelper<TSource>
    {
        private readonly DataImporter _importer;
        private readonly IOrderedQueryable<TSource> _sourceQuery;

        public ImporterHelper(DataImporter importer, IOrderedQueryable<TSource> sourceQuery)
        {
            _importer = importer;
            _sourceQuery = sourceQuery;
        }

        public long To<TDest>() where TDest : class
        {
            var mapperBuilder = new MapperBuilder<TSource, TDest>();
            return _importer.ImportDataPrepared(_sourceQuery, mapperBuilder.GetMapper().GetMapperExpressionEx());
        }

        public long To<TDest>(Expression<Func<TSource, TDest>> projection) where TDest : class
        {
            return _importer.ImportDataPrepared(_sourceQuery, projection);
        }
    }

    public ImporterHelper<TSource> ImportData<TSource>(IOrderedQueryable<TSource> source)
    {
        return new ImporterHelper<TSource>(this, source);
    }
}
So begin transferring. Note that I used OrderBy/ThenBy to establish a stable Id order so as not to re-import already transferred records; the order fields should be a unique key combination. This makes the sample reentrant: it can simply be re-run when the connection is lost.
var sourceBuilder = new LinqToDbConnectionOptionsBuilder();
sourceBuilder.UseSqlServer(SourceConnectionString);
var destinationBuilder = new LinqToDbConnectionOptionsBuilder();
destinationBuilder.UsePostgreSQL(DestinationConnectionString);

using (var source = new DataConnection(sourceBuilder.Build()))
using (var destination = new DataConnection(destinationBuilder.Build()))
{
    var dataImporter = new DataImporter(source, destination);

    dataImporter.ImportData(source.GetTable<Source.Model.FirstTable>()
            .OrderBy(e => e.Id1)
            .ThenBy(e => e.Id2))
        .To<Dest.Model.FirstTable>();

    dataImporter.ImportData(source.GetTable<Source.Model.SecondTable>().OrderBy(e => e.Id))
        .To<Dest.Model.SecondTable>();
}
The boring OrderBy part could certainly be generated automatically, but that would blow up this already long answer.
Also play with BulkCopyOptions. The native Npgsql COPY may fail, in which case the MultipleRows variant should be used.
I need to know the correct way to handle SQL injection when using the FromSql command.
At runtime, I need to dynamically create a WHERE clause, so I am using FromSql to create the SQL command. Now I know that string interpolation is the way to go. However, I need to step through a list of "where parameters" to generate the command. Simple enough to do:
foreach (var x in wp)
{
    if (!string.IsNullOrEmpty(results))
        results = $"{results} and {x.Field} = {x.Value}";
    else
        results = $"where {x.Field} = {x.Value}";
}
The problem is that this returns a plain concatenated string, not an interpolated string that EF can parameterize. How can I do this correctly?
Entity Framework will parameterize your queries if you put them in the following format:
db.something.FromSql("SELECT * FROM yourTable WHERE AuthorId = {0}", id)
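For what it's worth, in EF Core the FromSql overload that accepts an interpolated string (a FormattableString) likewise turns each interpolated value into a parameter. A minimal sketch of that pattern, reusing yourTable and AuthorId from the snippet above:
// Each {...} value becomes a DbParameter rather than inline SQL text.
var rows = db.something.FromSql($"SELECT * FROM yourTable WHERE AuthorId = {id}").ToList();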
Is x.Field a form field that has a fixed number of possibilities, i.e. title, firstname, etc.? If so, then something like the following:
var sqlstring = new StringBuilder();
var sqlp = new List<SqlParameter>();
var i = 0;
foreach (var x in wp)
{
    var param = "@param" + i.ToString();
    if (i != 0)
    {
        sqlstring.Append($" AND {x.Field} = " + param);
    }
    else
    {
        sqlstring.Append($"WHERE {x.Field} = " + param);
    }
    sqlp.Add(new SqlParameter(param, x.Value));
    i++;
}
You'd then need to do something like this:
db.something.FromSql(sqlstring.ToString(), sqlp.ToArray())
There might be a better/cleaner way, but that should work.
My solution to this problem is a VS extension, QueryFirst. QueryFirst generates a C# wrapper for SQL that lives in a .sql file. As such, parameters are the only way to get data into your query, and SQL injection is near impossible. There are numerous other advantages: you edit your SQL in a real environment, it's constantly validated against your db, and using your query in your code is very simple.
Our current system is using lazy loading by default (it is something I am going to be disabling, but it can't be done right now).
For this basic query I want to return two tables, CustomerNote and Note.
This is my query:
using (var newContext = new Entities(true))
{
    newContext.Configuration.LazyLoadingEnabled = false;

    var result = from customerNotes in newContext.CustomerNotes.Include(d => d.Note)
                 join note in newContext.Notes
                     on customerNotes.NoteId equals note.Id
                 where customerNotes.CustomerId == customerId
                 select customerNotes;

    return result.ToList();
}
My result, however, only contains the data in the CustomerNote table.
The linked entities Customer and Note are both null. What am I doing wrong here?
I got it working with the following, which is much simpler than what I've found elsewhere:
Context.Configuration.LazyLoadingEnabled = false;

var result = Context.CustomerNotes.Where<CustomerNote>(d => d.CustomerId == customerId)
    .Include(d => d.Note)
    .Include(d => d.Note.User);

return result.ToList();
This returns my CustomerNote table, related Notes and related Users from the Notes.
What you want to achieve is called eager loading.
var customerNotes = newContext.CustomerNotes.Include(t => t.Note).ToList();
This should work; I don't really understand the query keyword syntax.
If the code above doesn't work, try this:
var customerNotes = newContext.CustomerNotes.Include(t => t.Note).Select(t => new {
    Note = t.Note,
    Item = t
}).ToList();
A newbie question: I am using Entity Framework 4.0. The backend database has a function that will return a subset of records based on time.
An example of working code is:
var query = from rx in context.GetRxByDate(tencounter, groupid)
            select rx;
var result = context.CreateDetachedCopy(query.ToList());
return result;
I need to verify that a record does not exist in the database before inserting a new one. Before applying the "Any" filter, I would like to populate context.Rxes with a subset of the larger backend database using the "GetRxByDate()" function above.
I do not know how to populate "Rxes" before performing any further filtering, since Rxes is defined as
IQueryable<Rx> Rxes
and does not allow "Rxes = ..". Here is what I have so far:
using (var context = new EnityFramework())
{
    if (!context.Rxes.Any(c => c.Cform == rx.Cform))
    {
        // Insert new record
        Rx r = new Rx();
        r.Trx = realtime;
        context.Add(r);
        context.SaveChanges();
    }
}
I am fully prepared to kick myself since I am sure the answer is simple.
All help is appreciated. Thanks.
Edit:
If I do it this way, "Any" seems to return the opposite of what is expected:
var g = context.GetRxByDate(tencounter, groupid).ToList();
if (g.Any(c => c.Cform == rx.Cform)) {....}
OK, I must be working too hard, because I can't get my head around what it takes to use the Entity Framework correctly.
Here is what I am trying to do:
I have two tables: HeaderTable and DetailTable. DetailTable will have 1 to many records for each row in HeaderTable. In my EDM I set up a relationship between these two tables to reflect this.
Since there is now a relationship set up between these tables, I thought that by querying all the records in HeaderTable, I would be able to access the DetailTable collection created by the EDM (I can see the property when querying, but it's null).
Here is my query (this is a Silverlight app, so I am using the DomainContext on the client):
// myContext is instantiated with class scope
EntityQuery<Project> query = _myContext.GetHeadersQuery();
_myContext.Load<Project>(query);
Since these calls are asynchronous, I check the values after the callback has completed. When checking the value of _myContext.HeaderTable, I have all the rows expected. However, the DetailTable property within _myContext.HeaderTable is empty.
foreach (var h in _myContext.HeaderTable) // Has records
{
    foreach (var d in h.DetailTable) // No records
    {
        string test = d.Description;
    }
}
I'm assuming my query to return all HeaderTable objects needs to be modified to somehow return the DetailTable collection for each HeaderTable row. I just don't understand how this modeling stuff works yet.
What am I doing wrong? Any help is greatly appreciated. If you need more information, just let me know. I will be happy to provide anything you need.
Thanks,
-Scott
What you're probably missing is the Include(), which I think is out of scope of the code you provided.
Check out this video; it explained everything about the EDM and LINQ to Entities to me:
http://msdn.microsoft.com/en-us/data/ff628210.aspx
In case you can't view the video right now, check out this piece of code I have based on those videos (sorry it's not in Silverlight, but it's the same basic idea, I hope).
The retrieval:
public List<Story> GetAllStories()
{
    return context.Stories.Include("User").Include("StoryComments").Where(s => s.HostID == CurrentHost.ID).ToList();
}
Loading the data:
private void LoadAllStories()
{
    lvwStories.DataSource = TEContext.GetAllStories();
    lvwStories.DataBind();
}
Using the data:
protected void lvwStories_ItemDataBound(object sender, ListViewItemEventArgs e)
{
    if (e.Item.ItemType == ListViewItemType.DataItem)
    {
        Story story = e.Item.DataItem as Story;

        // blah blah blah....
        hlStory.Text = story.Title;
        hlStory.NavigateUrl = "StoryView.aspx?id=" + story.ID;
        lblStoryCommentCount.Text = "(" + story.StoryComments.Count.ToString() + " comment" + (story.StoryComments.Count != 1 ? "s" : "") + ")";
        lblStoryBody.Text = story.Body;
        lblStoryUser.Text = story.User.Username;
        lblStoryDTS.Text = story.AddedDTS.ToShortTimeString();
    }
}
}