I want to know why Code 1 is faster than Code 2 when using POCOs with Devart dotConnect for Oracle.
I tried both over 100,000 records and Code 1 is much faster than Code 2. Why? I thought SaveChanges() would clear the buffer, making Code 2 faster since there is only one connection. Am I wrong?
Code 1:
for (var i = 0; i < 100000; i++)
{
    using (var ctx = new MyDbContext())
    {
        MyObj obj = new MyObj();
        obj.Id = i;
        obj.Name = "Foo " + i;
        ctx.MyObjects.Add(obj);
        ctx.SaveChanges();
    }
}
Code 2:
using (var ctx = new MyDbContext())
{
    for (var i = 0; i < 100000; i++)
    {
        MyObj obj = new MyObj();
        obj.Id = i;
        obj.Name = "Foo " + i;
        ctx.MyObjects.Add(obj);
        ctx.SaveChanges();
    }
}
The first code snippet works faster because the same connection is taken from the pool each time, so there is no performance loss from re-opening it.
In the second case, 100,000 objects are gradually added to the context. Slow snapshot-based change tracking is used (if there are no dynamic proxies), so every SaveChanges() call has to check whether any of the cached objects has changed, and each subsequent iteration takes more and more time.
We recommend trying the following approach instead; it should perform better than either of the ones above:
using (var ctx = new MyDbContext())
{
    for (var i = 0; i < 100000; i++)
    {
        MyObj obj = new MyObj();
        obj.Id = i;
        obj.Name = "Foo " + i;
        ctx.MyObjects.Add(obj);
    }
    ctx.SaveChanges();
}
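If memory use with 100,000 tracked entities becomes a concern, a common middle ground is to flush in batches and recreate the context periodically. This is a minimal sketch, not part of the original answer; it reuses the MyDbContext/MyObj names from the question, and the batch size of 1000 is an arbitrary assumption:
const int batchSize = 1000;
var ctx = new MyDbContext();
try
{
    for (var i = 0; i < 100000; i++)
    {
        ctx.MyObjects.Add(new MyObj { Id = i, Name = "Foo " + i });

        if ((i + 1) % batchSize == 0)
        {
            ctx.SaveChanges();          // flush the current batch
            ctx.Dispose();              // throw away the tracked entities
            ctx = new MyDbContext();    // continue with an empty change tracker
        }
    }
    ctx.SaveChanges();                  // flush any remainder
}
finally
{
    ctx.Dispose();
}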
EDIT
If you execute a large number of operations within a single SaveChanges(), it is also useful to configure the Entity Framework behaviour of the Devart dotConnect for Oracle provider:
// Turn on the Batch Updates mode:
var config = OracleEntityProviderConfig.Instance;
config.DmlOptions.BatchUpdates.Enabled = true;

// If necessary, enable re-using parameters that have the same values:
config.DmlOptions.ReuseParameters = true;

// If the object has many nullable properties and a significant share of them are not set
// (i.e., are null), omitting the explicit insert of NULL values greatly reduces the size
// of the generated SQL:
config.DmlOptions.InsertNullBehaviour = InsertNullBehaviour.Omit;
Only some of the options are mentioned here. The full list is available in our article:
http://www.devart.com/blogs/dotconnect/index.php/new-features-of-entity-framework-support-in-dotconnect-providers.html
Am I wrong to assume that when SaveChanges() is called, all the objects in the cache are stored to the DB and the cache is cleared, so each loop is independent?
SaveChanges() sends and commits all changes to the database, but change tracking continues for all entities that remain attached to the context. If snapshot-based change tracking is used, the next SaveChanges() therefore starts a long process of checking the value of every property of every tracked object to see whether it has changed.
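If you stay on the DbContext API, one way to reduce that snapshot-comparison cost during bulk adds is to switch off automatic change detection while adding. This is a sketch reusing the names from the question; newly added entities are already tracked in the Added state, so no manual DetectChanges call is needed for this insert-only case:
using (var ctx = new MyDbContext())
{
    ctx.Configuration.AutoDetectChangesEnabled = false; // skip the per-call snapshot scan
    for (var i = 0; i < 100000; i++)
    {
        ctx.MyObjects.Add(new MyObj { Id = i, Name = "Foo " + i });
    }
    // If you had also modified already-tracked entities, call
    // ctx.ChangeTracker.DetectChanges() here before saving.
    ctx.SaveChanges();
}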
Related
I wrote code to update my table (SecurityQuestionAnswer) with new security password questions and move the old questions to another table (SecurityQuestionAnswersArchives). The total number of security questions is 3. I am able to update the current table, but when I add the same rows to the history table I get weird data: only two records are added instead of 3, and the data is duplicated. My code is as follows:
if (oldQuestions.Any())
{
    var oldquestionstoarchivelist = new List<SecurityQuestionAnswersArchives>();
    var oldquestionstoarchive = new SecurityQuestionAnswersArchives();
    for (int i = 0; i < 3; i++)
    {
        oldquestionstoarchive.Id = oldQuestions[i].Id;
        oldquestionstoarchive.SecurityQuestionId = oldQuestions[i].SecurityQuestionId;
        oldquestionstoarchive.Answer = oldQuestions[i].Answer;
        oldquestionstoarchive.UpdateDate = oldQuestions[i].UpdateDate;
        oldquestionstoarchive.IpAddress = oldQuestions[i].IpAddress;
        oldquestionstoarchive.SecurityQuestion = oldQuestions[i].SecurityQuestion;
        oldquestionstoarchive.User = oldQuestions[i].User;
        oldquestionstoarchivelist.Add(oldquestionstoarchive);
    }
    user.SecurityQuestionAnswersArchives = oldquestionstoarchivelist;
    //await Store.UpdateAsync(user);
    _dbContext.ArchiveSecurityQuestionAnswers.AddRange(oldquestionstoarchivelist);
    _dbContext.SecurityQuestionAnswers.RemoveRange(oldQuestions);
    await _dbContext.SaveChangesAsync();
    oldquestionstoarchivelist.Clear();
}
UPDATE 1
The loop looks fine; it iterates three times (0, 1, 2), which is expected. The first issue was with the AddRange function, to which I was passing a list; it takes an IEnumerable input, so I rectified it with the following code.
IEnumerable<SecurityQuestionAnswersArchives> finalArchiveses = oldquestionstoarchivelist;
_dbContext.ArchiveSecurityQuestionAnswers.AddRange(finalArchiveses);
The other issue is the duplicate data, and I am unable to figure out where the problem is. Please help me find it.
Your help is much appreciated!
Got it! Just sharing in case anybody has the same issue.
The problem was initialization in the wrong place. I moved
var oldquestionstoarchive = new SecurityQuestionAnswersArchives();
inside the for loop, so now the variable holds a unique instance on each iteration.
var oldquestionstoarchivelist = new List<SecurityQuestionAnswersArchives>();
for (int i = 0; i < 3; i++)
{
    var oldquestionstoarchive = new SecurityQuestionAnswersArchives();
    oldquestionstoarchive.SecurityQuestionId = oldQuestions[i].SecurityQuestionId;
    oldquestionstoarchive.Answer = oldQuestions[i].Answer;
    oldquestionstoarchive.UpdateDate = oldQuestions[i].UpdateDate;
    oldquestionstoarchive.IpAddress = oldQuestions[i].IpAddress;
    oldquestionstoarchive.SecurityQuestion = oldQuestions[i].SecurityQuestion;
    oldquestionstoarchive.User = oldQuestions[i].User;
    oldquestionstoarchivelist.Add(oldquestionstoarchive);
}
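For what it's worth, a more compact equivalent of that fix (same property names as above, with Take(3) standing in for the hard-coded loop bound) projects each old question into a fresh archive entity, so every list element is guaranteed to be a distinct object:
var oldquestionstoarchivelist = oldQuestions
    .Take(3)
    .Select(q => new SecurityQuestionAnswersArchives
    {
        SecurityQuestionId = q.SecurityQuestionId,
        Answer = q.Answer,
        UpdateDate = q.UpdateDate,
        IpAddress = q.IpAddress,
        SecurityQuestion = q.SecurityQuestion,
        User = q.User
    })
    .ToList();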
What is the best way to deal with batch updates using Entity Framework (EF5)?
I have 2 particular cases I'm interested in:
Updating a field (e.g. UpdateDate) for a list of between 100 and 100,000 Ids, which are the primary key. Calling each update separately seems to be too much overhead and takes a long time.
Inserting many objects of the same type (e.g. Users), also between 100 and 100,000, in a single go.
Any good advice?
There are two open source projects that allow this: EntityFramework.Extended and Entity Framework Extensions. You can also check the discussion about bulk updates on EF's CodePlex site.
Inserting 100k records through EF is, in the first place, the wrong application architecture; you should choose a different, lightweight technology for data imports. Even EF's internal handling of such a big record set will cost you a lot of processing time. There is currently no solution for batch inserts in EF, but there is a broad discussion about this feature on EF's CodePlex site.
I see the following options:
1. The simplest way: create your SQL request by hand and execute it through ObjectContext.ExecuteStoreCommand:
context.ExecuteStoreCommand("UPDATE TABLE SET FIELD1 = {0} WHERE FIELD2 = {1}", value1, value2);
2. Use EntityFramework.Extended:
context.Tasks.Update(
    t => t.StatusId == 1,
    t => new Task { StatusId = 2 });
3. Make your own extension for EF. There is an article, Bulk Delete, where this goal was achieved by inheriting the ObjectContext class; it's worth a look. Bulk insert/update can be implemented the same way, as sketched below.
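A minimal sketch of such a hand-rolled helper (the Tasks table and StatusId column are hypothetical; ExecuteStoreCommand parameterizes the {0}/{1} placeholders as in option 1):
using System.Data.Objects; // EF5 namespace for ObjectContext

public static class BulkExtensions
{
    // Issues one set-based UPDATE instead of loading and saving each entity.
    public static int BulkUpdateTaskStatus(this ObjectContext context, int fromStatusId, int toStatusId)
    {
        return context.ExecuteStoreCommand(
            "UPDATE Tasks SET StatusId = {0} WHERE StatusId = {1}",
            toStatusId, fromStatusId);
    }
}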
You may not want to hear it, but your best option is not to use EF for bulk operations. To update a field across a table of records, use an UPDATE statement in the database (possibly called through a stored procedure mapped to an EF function). You can also use the context's ExecuteStoreCommand method to issue the UPDATE statement.
For massive inserts, your best bet is to use bulk copy or SSIS. EF will require a separate round trip to the database for each row being inserted.
Bulk inserts should be done using the SqlBulkCopy class. Please see the pre-existing Stack Overflow Q&A on integrating the two: SqlBulkCopy and Entity Framework.
SqlBulkCopy is a lot more user-friendly than bcp (the bulk copy command-line utility) or even OPENROWSET.
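A short sketch of that route (the connection string, table and column names are placeholders): build a DataTable whose columns match the destination table and stream it with SqlBulkCopy.
using System.Data;
using System.Data.SqlClient;

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    var table = new DataTable();
    table.Columns.Add("Id", typeof(int));
    table.Columns.Add("Name", typeof(string));
    for (var i = 0; i < 100000; i++)
    {
        table.Rows.Add(i, "User " + i);
    }

    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.Users";
        bulkCopy.BatchSize = 5000;          // send rows to the server in chunks
        bulkCopy.WriteToServer(table);
    }
}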
Here's what I've done successfully:
private void BulkUpdate()
{
    var oc = ((IObjectContextAdapter)_dbContext).ObjectContext;

    var updateQuery = myIQueryable.ToString(); // ToString() MUST be called before reading the parameters.
    var updateParams = GetSqlParametersForIQueryable(myIQueryable).ToArray();
    var updateSql = $@"UPDATE dbo.myTable
                       SET col1 = x.alias2
                       FROM dbo.myTable
                       JOIN ({updateQuery}) x(alias1, alias2) ON x.alias1 = dbo.myTable.Id";
    oc.ExecuteStoreCommand(updateSql, updateParams);
}

private void BulkInsert()
{
    var oc = ((IObjectContextAdapter)_dbContext).ObjectContext;

    var insertQuery = myIQueryable.ToString(); // ToString() MUST be called before reading the parameters.
    var insertParams = GetSqlParametersForIQueryable(myIQueryable).ToArray();
    var insertSql = $@"INSERT INTO dbo.myTable (col1, col2)
                       SELECT x.alias1, x.alias2
                       FROM ({insertQuery}) x(alias1, alias2)";
    oc.ExecuteStoreCommand(insertSql, insertParams);
}

// Copies the parameters of the underlying ObjectQuery so they can be re-used
// in the hand-written SQL above.
private static IEnumerable<SqlParameter> GetSqlParametersForIQueryable<T>(IQueryable<T> queryable)
{
    var objectQuery = GetObjectQueryFromIQueryable(queryable);
    return objectQuery.Parameters.Select(x => new SqlParameter(x.Name, x.Value));
}

// Digs the ObjectQuery<T> out of a DbQuery<T> via reflection over internal members.
private static ObjectQuery<T> GetObjectQueryFromIQueryable<T>(IQueryable<T> queryable)
{
    var dbQuery = (DbQuery<T>)queryable;
    var iqProp = dbQuery.GetType().GetProperty("InternalQuery", BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public);
    var iq = iqProp.GetValue(dbQuery, null);
    var oqProp = iq.GetType().GetProperty("ObjectQuery", BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public);
    return (ObjectQuery<T>)oqProp.GetValue(iq, null);
}
public static bool BulkDelete(string tableName, string columnName, List<object> val)
{
    bool ret = true;
    var max = 2000; // stay under SQL Server's parameter limit by deleting in pages
    var pages = Math.Ceiling((double)val.Count / max);
    for (int i = 0; i < pages; i++)
    {
        var count = max;
        if (i == pages - 1 && val.Count % max != 0) { count = val.Count % max; }
        var args = val.GetRange(i * max, count);
        // Builds "@p0,@p1,..." for the IN clause.
        var cond = string.Join(",", args.Select((t, index) => $"@p{index}"));
        var sql = $"DELETE FROM {tableName} WHERE {columnName} IN ({cond})";
        ret &= Db.ExecuteSqlCommand(sql, args.ToArray()) > 0;
    }
    return ret;
}
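Hypothetical usage of the helper above (the table, column, and the id source are placeholders):
// Deletes a large set of rows by primary key, 2000 ids per DELETE statement.
List<object> idsToDelete = GetExpiredOrderIds(); // hypothetical method returning the ids
bool allDeleted = BulkDelete("dbo.Orders", "Id", idsToDelete);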
I agree with the accepted answer that EF is probably the wrong technology for bulk inserts.
However, I think it's worth having a look at EntityFramework.BulkInsert.
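Roughly, usage looks like the sketch below (MyDbContext and User are placeholder names; the BulkInsert extension method ships in the EntityFramework.BulkInsert.Extensions namespace of that package and is backed by SqlBulkCopy on SQL Server):
using System.Linq;
using EntityFramework.BulkInsert.Extensions;

using (var context = new MyDbContext())
{
    var users = Enumerable.Range(0, 100000)
                          .Select(i => new User { Name = "User " + i })
                          .ToList();

    // Bypasses the change tracker entirely; no SaveChanges call is needed for these rows.
    context.BulkInsert(users);
}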
I am having a real issue with EF v1. I have quite a big EDMX with maybe 50 entities mapped, but this one entity is causing me grief.
The entity has mappings to other entities which are, in effect, reference tables, but for some reason EF is trying to insert them instead of just updating the entity itself.
Here is a fragment of my code:
using (var context = new someEntities())
{
    var studentCourseJoin =
        context.StudentCourseJoinSet.Where(o => o.Code == scjCode).First();

    studentCourseJoin.EntryStatus = new EntryStatus { Code = viewModel.StudentDetails.EntryStatusCode };
    studentCourseJoin.ParentalInHigherEducation = new ParentalInHigherEducation { Code = viewModel.StudentDetails.ParentalInHigherEducationCode };
    studentCourseJoin.School = new School { Code = viewModel.StudentDetails.SchoolCode };
    studentCourseJoin.Institution = new Institution { Code = viewModel.StudentDetails.InstitutionCode };
    studentCourseJoin.LastSchoolEndYear = viewModel.StudentDetails.LastSchoolEndYear;
    studentCourseJoin.LastInstitutionEndYear = viewModel.StudentDetails.LastInstitutionEndYear;

    // Blows up here trying to do an insert on studentCourseJoin.Institution.
    // But if I remove that one, then it blows up on another one.
    context.SaveChanges(true);
}
If anyone has ANY ideas, please share; they would help a lot.
Try adding these lines before calling SaveChanges:
ObjectStateEntry entry = context.ObjectStateManager.GetObjectStateEntry(studentCourseJoin);
entry.ChangeState(EntityState.Modified);
Update:
Try this for Institution instead:
studentCourseJoin.Institution = context.Institutions.FirstOrDefault(i => i.Code == viewModel.StudentDetails.InstitutionCode);
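The same idea can be applied to the other reference properties, so none of them are treated as new rows. A sketch follows; the entity set names here are guesses based on the property names, so adjust them to the actual model:
// Look up the existing reference rows instead of constructing new instances,
// so EF updates the foreign keys rather than inserting duplicate reference records.
studentCourseJoin.EntryStatus = context.EntryStatusSet
    .FirstOrDefault(e => e.Code == viewModel.StudentDetails.EntryStatusCode);
studentCourseJoin.ParentalInHigherEducation = context.ParentalInHigherEducationSet
    .FirstOrDefault(p => p.Code == viewModel.StudentDetails.ParentalInHigherEducationCode);
studentCourseJoin.School = context.SchoolSet
    .FirstOrDefault(s => s.Code == viewModel.StudentDetails.SchoolCode);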
I'm new to the Entity Framework and I'm really confused about how SaveChanges works. There's probably a lot of code in my example which could be improved, but here's the problem I'm having.
The user enters a bunch of picks. I make sure the user hasn't already entered those picks.
Then I add the picks to the database.
var db = new myModel();
var predictionArray = ticker.Substring(1).Split(','); // Get rid of the initial comma.
var user = Membership.GetUser();
var userId = Convert.ToInt32(user.ProviderUserKey);

// Get the member with all his predictions for today.
var memberQuery = (from member in db.Members
                   where member.user_id == userId
                   select new
                   {
                       member,
                       predictions = from p in member.Predictions
                                     where p.start_date == null
                                     select p
                   }).First();

// Load all the company ids.
foreach (var prediction in memberQuery.predictions)
{
    prediction.CompanyReference.Load();
}

var picks = from prediction in predictionArray
            let data = prediction.Split(':')
            let companyTicker = data[0]
            where !(from i in memberQuery.predictions
                    select i.Company.ticker).Contains(companyTicker)
            select new Prediction
            {
                Member = memberQuery.member,
                Company = db.Companies.Where(c => c.ticker == companyTicker).First(),
                is_up = data[1] == "up", // This turns up and down into true and false.
            };
// Save the records to the database.
// HERE'S THE PART I DON'T UNDERSTAND.
// This saves the records, even though I don't have db.AddToPredictions(pick).
foreach (var pick in picks)
{
    db.SaveChanges();
}

// This does not save records when db.SaveChanges() is called before the loop over picks.
db.SaveChanges();
foreach (var pick in picks)
{
}

// This saves records, but it will insert all the picks exactly once no matter how many picks you have.
// The fact that you're skipping a pick makes no difference in what gets inserted.
var counter = 1;
foreach (var pick in picks)
{
    if (counter == 2)
    {
        db.SaveChanges();
    }
    counter++;
}
I've tested and the SaveChanges doesn't even have to be in the loop.
The below code works, too.
foreach (var pick in picks)
{
    break;
}
db.SaveChanges();
There's obviously something going on with the context that I don't understand. I'm guessing I've somehow loaded my new picks as pending changes, but even if that's true, I don't understand why I have to loop over them to save the changes.
Can someone explain this to me?
Here's the updated working code based on Craig's responses:
1) Remove the entity type from the projection, then loop over the results and populate new objects.
var picks = (from prediction in predictionArray
             let data = prediction.Split(':')
             let companyTicker = data[0]
             where !(from i in memberQuery.predictions
                     select i.Company.ticker).Contains(companyTicker)
             select new // NO TYPE HERE
             {
                 Member = memberQuery.member,
                 Company = db.Companies.Where(c => c.ticker == companyTicker).First(),
                 is_up = data[1] == "up", // This turns up and down into true and false.
             }).ToList();

foreach (var prediction in picks)
{
    if (includePrediction)
    {
        var p = new Prediction
        {
            Member = prediction.Member,
            Company = prediction.Company,
            is_up = prediction.is_up
        };
        db.AddToPredictions(p);
    }
}
2) Or if I don't want the predictions to be saved, I can detach the predictions.
foreach (var prediction in picks)
{
    if (excludePrediction)
    {
        db.Detach(prediction);
    }
}
The reason is here:
select new Prediction
{
    Member = memberQuery.member,
These lines will (once the IEnumerable is iterated; LINQ is lazy):
Instantiate a new Prediction
Associate that Prediction with an existing Member, which is attached to db.
Associating an instance of an entity with an attached entity automatically adds that entity to the context of the associated, attached entity.
So as soon as you start iterating over predictionArray, the code above executes and you have a new entity in your context.
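A tiny sketch of that rule in isolation, reusing the names from the question:
var attachedMember = db.Members.First();  // already attached to the context

var newPrediction = new Prediction();     // not tracked yet
newPrediction.Member = attachedMember;    // associating it with an attached entity...

// ...adds newPrediction to the context, so the next SaveChanges will insert it
// even though AddToPredictions was never called.
db.SaveChanges();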
When I run this code:
korlenEntities2 _db = new korlenEntities2();

for (int i = 0; i < 10; i++)
{
    klienci klient = new klienci();
    klient.nazwa = "Janek_" + i.ToString();
    klient.miejscowosc = "-";
    _db.AddToklienci(klient);
}

_db.SaveChanges();
the records are added to the database in random order, so my ID field is not filled in the order I expect. This is important to me because I want to use it for ordering later.
You cannot control the order of statement execution unless you call SaveChanges after every insert. Nor can you depend on auto-incremented keys being sequential in all cases (consider replication). If order is important, you should add a field for it, for example as sketched below.
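A sketch of such an explicit ordering field (SortOrder is a hypothetical column added to klienci for this purpose):
korlenEntities2 _db = new korlenEntities2();
for (int i = 0; i < 10; i++)
{
    klienci klient = new klienci();
    klient.nazwa = "Janek_" + i.ToString();
    klient.miejscowosc = "-";
    klient.SortOrder = i; // explicit, deterministic ordering value
    _db.AddToklienci(klient);
}
_db.SaveChanges();

// Order by the explicit field instead of relying on the identity column:
var ordered = _db.klienci.OrderBy(k => k.SortOrder).ToList();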