When should I call SaveChanges() when creating 1000's of Entity Framework objects? (like during an import)

I am running an import that will have 1000's of records on each run. Just looking for some confirmation on my assumptions:
Which of these makes the most sense:
Run SaveChanges() after every AddToClassName() call.
Run SaveChanges() after every n AddToClassName() calls.
Run SaveChanges() after all of the AddToClassName() calls.
The first option is probably slow, right? It would need to analyze the EF objects in memory, generate SQL, etc. on every call.
I assume the second option is the best of both worlds, since we can wrap a try/catch around that SaveChanges() call and only lose n records at a time if one of them fails. Maybe store each batch in a List<>; if the SaveChanges() call succeeds, discard the list, and if it fails, log the items.
The last option would probably end up being very slow as well, since every single EF object would have to stay in memory until SaveChanges() is called. And if the save failed, nothing would be committed, right?

I would test it first to be sure. Performance doesn't have to be that bad.
If you need to insert all rows in one transaction, call SaveChanges() after all of the AddToClassName() calls. If rows can be entered independently, save changes after every row. Database consistency is important.
I don't like the second option. It would be confusing to me (from the final user's perspective) if I made an import into the system and it declined 10 rows out of 1000 just because one is bad. You can try to import 10, and if that fails, try them one by one and then log the failures.
Test whether it takes a long time. Don't write 'probably'; you don't know yet. Only when it is actually a problem should you think about another solution (marc_s).
EDIT
I've done some tests (time in milliseconds):
10000 rows:
SaveChanges() after 1 row: 18510,534
SaveChanges() after 100 rows: 4350,3075
SaveChanges() after 10000 rows: 5233,0635
50000 rows:
SaveChanges() after 1 row: 78496,929
SaveChanges() after 500 rows: 22302,2835
SaveChanges() after 50000 rows: 24022,8765
So it is actually faster to commit after n rows than after all.
My recommendation is to:
SaveChanges() after n rows.
If one commit fails, try the rows one by one to find the faulty row (a sketch of this pattern follows below).
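A minimal sketch of that pattern (commit in batches, fall back to row-by-row on failure). It borrows the CamelTrapEntities context and TestTable entity from the test classes below; adapt the names and error handling to your own model:
// Sketch only: commits every batchSize rows; if a batch fails, its rows are retried
// one by one so the single faulty row can be isolated and logged.
private void ImportInBatches(IEnumerable<TestTable> rows, int batchSize)
{
    var batch = new List<TestTable>();
    foreach (var row in rows)
    {
        batch.Add(row);
        if (batch.Count == batchSize)
        {
            SaveBatch(batch);
            batch = new List<TestTable>();
        }
    }
    if (batch.Count > 0) SaveBatch(batch);
}
private void SaveBatch(List<TestTable> batch)
{
    try
    {
        // A fresh context per batch keeps the change tracker small.
        using (var context = new CamelTrapEntities())
        {
            foreach (var row in batch) context.AddToTestTable(row);
            context.SaveChanges();
        }
    }
    catch (Exception)
    {
        // The batch failed; retry one by one to find and log the faulty row.
        foreach (var row in batch)
        {
            try
            {
                using (var context = new CamelTrapEntities())
                {
                    context.AddToTestTable(row);
                    context.SaveChanges();
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine("Failed row: " + ex.Message);
            }
        }
    }
}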
Test classes:
TABLE:
CREATE TABLE [dbo].[TestTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[SomeInt] [int] NOT NULL,
[SomeVarchar] [varchar](100) NOT NULL,
[SomeOtherVarchar] [varchar](50) NOT NULL,
[SomeOtherInt] [int] NULL,
CONSTRAINT [PkTestTable] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Class:
public class TestController : Controller
{
//
// GET: /Test/
private readonly Random _rng = new Random();
private const string _chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
private string RandomString(int size)
{
var randomSize = _rng.Next(size);
char[] buffer = new char[randomSize];
for (int i = 0; i < randomSize; i++)
{
buffer[i] = _chars[_rng.Next(_chars.Length)];
}
return new string(buffer);
}
public ActionResult EFPerformance()
{
string result = "";
TruncateTable();
result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(10000, 1).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 100 rows:" + EFPerformanceTest(10000, 100).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 10000 rows:" + EFPerformanceTest(10000, 10000).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(50000, 1).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 500 rows:" + EFPerformanceTest(50000, 500).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 50000 rows:" + EFPerformanceTest(50000, 50000).TotalMilliseconds + "<br/>";
TruncateTable();
return Content(result);
}
private void TruncateTable()
{
using (var context = new CamelTrapEntities())
{
var connection = ((EntityConnection)context.Connection).StoreConnection;
connection.Open();
var command = connection.CreateCommand();
command.CommandText = @"TRUNCATE TABLE TestTable";
command.ExecuteNonQuery();
}
}
private TimeSpan EFPerformanceTest(int noOfRows, int commitAfterRows)
{
var startDate = DateTime.Now;
using (var context = new CamelTrapEntities())
{
for (int i = 1; i <= noOfRows; ++i)
{
var testItem = new TestTable();
testItem.SomeVarchar = RandomString(100);
testItem.SomeOtherVarchar = RandomString(50);
testItem.SomeInt = _rng.Next(10000);
testItem.SomeOtherInt = _rng.Next(200000);
context.AddToTestTable(testItem);
if (i % commitAfterRows == 0) context.SaveChanges();
}
}
var endDate = DateTime.Now;
return endDate.Subtract(startDate);
}
}

I just optimized a very similar problem in my own code and would like to point out an optimization that worked for me.
I found that much of the time in processing SaveChanges, whether processing 100 or 1000 records at once, is CPU bound. So, by processing the contexts with a producer/consumer pattern (implemented with BlockingCollection), I was able to make much better use of the CPU cores and went from a total of 4,000 changes/second (as reported by the return value of SaveChanges) to over 14,000 changes/second. CPU utilization moved from about 13% (I have 8 cores) to about 60%. Even using multiple consumer threads, I barely taxed the (very fast) disk I/O system, and the CPU utilization of SQL Server was no higher than 15%.
By offloading the saving to multiple threads, you have the ability to tune both the number of records prior to commit and the number of threads performing the commit operations.
I found that creating 1 producer thread and (# of CPU Cores)-1 consumer threads allowed me to tune the number of records committed per batch such that the count of items in the BlockingCollection fluctuated between 0 and 1 (after a consumer thread took one item). That way, there was just enough work for the consuming threads to work optimally.
This scenario of course requires creating a new context for every batch, which I find to be faster even in a single-threaded scenario for my use case.
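For anyone who wants to try it, here is a rough sketch of the arrangement, assuming the DbContext API; ImportContext and Record are placeholder names, and the batch size and consumer count are the knobs to tune as described above:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelSaver
{
    // Rough sketch of the producer/consumer pattern described above, not exact production code.
    public static void Run(IEnumerable<Record> rows, int batchSize)
    {
        // Bounded to 1 so the queue "fluctuates between 0 and 1" as described above.
        var queue = new BlockingCollection<List<Record>>(boundedCapacity: 1);

        // One producer thread groups the rows into batches.
        var producer = Task.Run(() =>
        {
            var batch = new List<Record>(batchSize);
            foreach (var row in rows)
            {
                batch.Add(row);
                if (batch.Count == batchSize)
                {
                    queue.Add(batch);
                    batch = new List<Record>(batchSize);
                }
            }
            if (batch.Count > 0) queue.Add(batch);
            queue.CompleteAdding();
        });

        // (# of CPU cores) - 1 consumer threads, each saving batches on its own context.
        int consumerCount = Math.Max(1, Environment.ProcessorCount - 1);
        var consumers = Enumerable.Range(0, consumerCount).Select(_ => Task.Run(() =>
        {
            foreach (var batch in queue.GetConsumingEnumerable())
            {
                using (var context = new ImportContext())   // new context per batch
                {
                    foreach (var row in batch) context.Records.Add(row);
                    context.SaveChanges();
                }
            }
        })).ToArray();

        Task.WaitAll(consumers);
        producer.Wait();
    }
}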

If you need to import thousands of records, I'd use something like SqlBulkCopy rather than Entity Framework for that (a minimal sketch follows the links below).
MSDN docs on SqlBulkCopy
Use SqlBulkCopy to Quickly Load Data from your Client to SQL Server
Transferring Data Using SqlBulkCopy
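For illustration, a minimal SqlBulkCopy sketch; the connection string, destination table, and column mappings are placeholders and must match your own schema:
using System.Data;
using System.Data.SqlClient;

public static class BulkLoader
{
    // Minimal sketch: the DataTable layout must match the destination table.
    public static void Load(string connectionString, DataTable rows)
    {
        using (var bulkCopy = new SqlBulkCopy(connectionString))
        {
            bulkCopy.DestinationTableName = "dbo.TestTable";            // placeholder table name
            bulkCopy.BatchSize = 5000;                                  // rows per round trip
            bulkCopy.ColumnMappings.Add("SomeInt", "SomeInt");          // map source -> destination columns
            bulkCopy.ColumnMappings.Add("SomeVarchar", "SomeVarchar");
            bulkCopy.WriteToServer(rows);
        }
    }
}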

Use a stored procedure.
Create a User-Defined Data Type in SQL Server.
Create and populate an array of this type in your code (very fast).
Pass the array to your stored procedure with one call (very fast).
I believe this would be the easiest and fastest way to do this (a sketch follows below).
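A rough sketch of that idea, assuming a user-defined table type dbo.IdList(Id int) and a stored procedure dbo.ImportIds(@ids dbo.IdList READONLY) already exist; both names are hypothetical:
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class TvpImport
{
    // Passes the whole array to the stored procedure in a single call.
    public static void Import(string connectionString, IEnumerable<int> ids)
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        foreach (var id in ids) table.Rows.Add(id);

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.ImportIds", connection))  // hypothetical procedure
        {
            command.CommandType = CommandType.StoredProcedure;
            var parameter = command.Parameters.AddWithValue("@ids", table);
            parameter.SqlDbType = SqlDbType.Structured;
            parameter.TypeName = "dbo.IdList";                             // hypothetical table type
            connection.Open();
            command.ExecuteNonQuery();                                     // one round trip for the whole array
        }
    }
}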

Sorry, I know this thread is old, but I think this could help other people with this problem.
I had the same problem, but you can validate the changes before you commit them. My code looks like this and it works fine. I use chUser.LastUpdated to check whether it is a new entry or only a change, because it is not possible to reload an entry that is not in the database yet.
// Validate Changes
var invalidChanges = _userDatabase.GetValidationErrors();
foreach (var ch in invalidChanges)
{
// Delete invalid User or Change
var chUser = (db_User) ch.Entry.Entity;
if (chUser.LastUpdated == null)
{
// Invalid, new User
_userDatabase.db_User.Remove(chUser);
Console.WriteLine("!Failed to create User: " + chUser.ContactUniqKey);
}
else
{
// Invalid Change of an Entry
_userDatabase.Entry(chUser).Reload();
Console.WriteLine("!Failed to update User: " + chUser.ContactUniqKey);
}
}
_userDatabase.SaveChanges();

Related

How to calculate mean of distributed data?

How can I calculate the arithmetic mean of a large vector (series) in a distributed setting where the data is partitioned across multiple nodes? I do not want to use the map-reduce paradigm. Is there any distributed algorithm to efficiently compute the mean, besides the trivial approach of computing an individual sum on each node, bringing the results to a master node, and dividing by the size of the vector (series)?
Distributed average consensus is an alternative.
The problem with the trivial master-based map-reduce approach is that with a vast set of data, making everything depend on everything else can take a very long time to compute, by which point the information is out of date and therefore wrong, unless you lock the entire dataset, which is impractical for a massive set of distributed data. Using distributed average consensus (the same methods work for algorithms other than the mean), you get a more up-to-date, better guess at the current value of the mean without locking the data, and in real time.
Here is a link to a paper on it, but it's math-heavy:
http://web.stanford.edu/~boyd/papers/pdf/lms_consensus.pdf
You can google for many papers on it.
The general concept is like this: say each node has a socket listener. You evaluate your local sum and average, then publish them to the other nodes. Each node listens for the other nodes and receives their sums and averages on a timescale that makes sense. You can then evaluate a good guess at the total average as sumForAllNodes(storedAverage[node] * storedCount[node]) / sumForAllNodes(storedCount[node]). If you have a truly large dataset, you could just listen for new values as they are stored on the node, amend the local count and average, and then publish them.
If even this is taking too long, you could average over a random subset of the data in each node.
Here is some C# code that gives you an idea (it uses Fleck so it runs on more versions of Windows than the Windows-10-only Microsoft WebSockets implementation). Run this on two nodes, one with
<appSettings>
<add key="thisNodeName" value="UK" />
</appSettings>
in the app.config, and use "EU-North" in the other. The two instances exchange means over WebSockets; you just need to add your back-end enumeration of the database.
using System;
using System.Collections.Generic;
using System.Configuration;
using System.Linq;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using Fleck;
namespace WebSocketServer
{
class Program
{
static List<IWebSocketConnection> _allSockets;
static Dictionary<string,decimal> _allMeans;
static Dictionary<string,decimal> _allCounts;
private static decimal _localMean;
private static decimal _localCount;
private static decimal _localAggregate_count;
private static decimal _localAggregate_average;
static void Main(string[] args)
{
_allSockets = new List<IWebSocketConnection>();
_allMeans = new Dictionary<string, decimal>();
_allCounts = new Dictionary<string, decimal>();
var serverAddresses = new Dictionary<string,string>();
//serverAddresses.Add("USA-WestCoast", "ws://127.0.0.1:58951");
//serverAddresses.Add("USA-EastCoast", "ws://127.0.0.1:58952");
serverAddresses.Add("UK", "ws://127.0.0.1:58953");
serverAddresses.Add("EU-North", "ws://127.0.0.1:58954");
//serverAddresses.Add("EU-South", "ws://127.0.0.1:58955");
foreach (var serverAddress in serverAddresses)
{
_allMeans.Add(serverAddress.Key, 0m);
_allCounts.Add(serverAddress.Key, 0m);
}
var thisNodeName = ConfigurationManager.AppSettings["thisNodeName"]; // for example "UK"
var serverSocketAddress = serverAddresses.First(x=>x.Key==thisNodeName);
serverAddresses.Remove(thisNodeName);
var websocketServer = new Fleck.WebSocketServer(serverSocketAddress.Value);
websocketServer.Start(socket =>
{
socket.OnOpen = () =>
{
Console.WriteLine("Open!");
_allSockets.Add(socket);
};
socket.OnClose = () =>
{
Console.WriteLine("Close!");
_allSockets.Remove(socket);
};
socket.OnMessage = message =>
{
Console.WriteLine(message + " received");
var parameters = message.Split('~');
var remoteHost = parameters[0];
var remoteMean = decimal.Parse(parameters[1]);
var remoteCount = decimal.Parse(parameters[2]);
_allMeans[remoteHost] = remoteMean;
_allCounts[remoteHost] = remoteCount;
};
});
while (true)
{
//evaluate my local average and count
Random rand = new Random(DateTime.Now.Millisecond);
_localMean = 234.00m + (rand.Next(0, 100) - 50)/10.0m;
_localCount = 222m + rand.Next(0, 100);
//evaluate my local aggregate average using means and counts sent from all other nodes
//could publish aggregate averages to other nodes, if you wanted to monitor disagreement between nodes
var total_mean_times_count = 0m;
var total_count = 0m;
foreach (var server in serverAddresses)
{
total_mean_times_count += _allCounts[server.Key]*_allMeans[server.Key];
total_count += _allCounts[server.Key];
}
//add on local mean and count which were removed from the server list earlier, so won't be processed
total_mean_times_count += (_localMean * _localCount);
total_count = total_count + _localCount;
_localAggregate_average = (total_mean_times_count/total_count);
_localAggregate_count = total_count;
Console.WriteLine("local aggregate average = {0}", _localAggregate_average);
System.Threading.Thread.Sleep(10000);
foreach (var serverAddress in serverAddresses)
{
using (var wscli = new ClientWebSocket())
{
var tokSrc = new CancellationTokenSource();
using (var task = wscli.ConnectAsync(new Uri(serverAddress.Value), tokSrc.Token))
{
task.Wait();
}
using (var task = wscli.SendAsync(new ArraySegment<byte>(Encoding.UTF8.GetBytes(thisNodeName+"~"+_localMean + "~"+_localCount)),
WebSocketMessageType.Text,
true, // endOfMessage: the whole payload is sent as one complete frame
tokSrc.Token
))
{
task.Wait();
}
}
}
}
}
}
}
Don't forget to add a static lock or to separate the activity by synchronising at given times (not shown for simplicity).
There are two simple approaches you can use.
One is, as you correctly noted, to calculate the sum on every node and then combine the sums and divide by the total amount of data:
avg = (sum1+sum2+sum3)/(cnt1+cnt2+cnt3)
Another possibility is to calculate the average on every node and then use weighted average:
avg = (avg1*cnt1 + avg2*cnt2 + avg3*cnt3) / (cnt1+cnt2+cnt3)
= avg1*cnt1/(cnt1+cnt2+cnt3) + avg2*cnt2/(cnt1+cnt2+cnt3) + avg3*cnt3/(cnt1+cnt2+cnt3)
I don't see anything wrong with these trivial ways and am wondering why you would want to use a different approach.
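A quick numeric sanity check (made-up numbers) showing the two formulas give the same result:
// Node 1: sum 10 over 2 values; node 2: sum 20 over 4; node 3: sum 60 over 6.
double sum1 = 10, sum2 = 20, sum3 = 60;
double cnt1 = 2,  cnt2 = 4,  cnt3 = 6;
double avg1 = sum1 / cnt1, avg2 = sum2 / cnt2, avg3 = sum3 / cnt3;            // 5, 5, 10
double bySums     = (sum1 + sum2 + sum3) / (cnt1 + cnt2 + cnt3);              // 90 / 12 = 7.5
double byWeighted = (avg1*cnt1 + avg2*cnt2 + avg3*cnt3) / (cnt1+cnt2+cnt3);   // 90 / 12 = 7.5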

Please help me convert a trigger to batch Apex

Please help me convert my after trigger to batch Apex.
This trigger fires when an opportunity's stage changes to Closed Won.
It runs through the line items and checks whether a forecast (custom object) exists for that account. If yes, it links the line item to it; if not, it creates a new forecast.
My trigger works fine for some records, but for mass updates I get a timed-out error, so I am opting for batch Apex, which I have never written. Please help.
trigger Accountforecast on Opportunity (after insert,after update) {
List<Acc__c> AccproductList =new List<Acc__c>();
List<Opportunitylineitem> opplinitemlist =new List<Opportunitylineitem>();
list<opportunitylineitem > oppdate= new list<opportunitylineitem >();
List<Acc__c> accquery =new List<Acc__c>();
List<date> dt =new List<date>();
Set<Id> sProductIds = new Set<Id>();
Set<Id> sAccountIds = new Set<Id>();
Set<id> saccprodfcstids =new set<Id>();
Acc__c accpro =new Acc__c();
string aname;
Integer i;
Integer myIntMonth;
Integer myIntyear;
Integer myIntdate;
opplinitemlist=[select Id,PricebookEntry.Product2.Name,opp_account__c,Opp_account_name__c,PricebookEntry.Product2.id, quantity,ServiceDate,Acc_Product_Fcst__c from Opportunitylineitem WHERE Opportunityid IN :Trigger.newMap.keySet() AND Acc__c=''];
for(OpportunityLineItem oli:opplinitemlist) {
sProductIds.add(oli.PricebookEntry.Product2.id);
sAccountIds.add(oli.opp_account__c);
}
accquery=[select id,Total_Qty_Ordered__c,Last_Order_Qty__c,Last_Order_Date__c,Fcst_Days_Period__c from Acc__c where Acc__c.product__c In :sproductids and Acc__c.Account__c in :saccountids];
for(Acc__c apf1 :accquery){
saccprodfcstids.add(apf1.id);
}
if(saccprodfcstids!=null){
oppdate=[select servicedate from opportunitylineitem where Acc__c IN :saccprodfcstids ];
i =[select count() from Opportunitylineitem where acc_product_fcst__c in :saccprodfcstids];
}
for(Opportunity opp :trigger.new)
{
if(opp.Stagename=='Closed Won')
{
for(opportunitylineitem opplist:opplinitemlist)
{
if(!accquery.isempty())
{
for(opportunitylineitem opldt :oppdate)
{
string myDate = String.valueOf(opldt);
myDate = myDate.substring(myDate.indexof('ServiceDate=')+12);
myDate = myDate.substring(0,10);
String[] strDate = myDate.split('-');
myIntMonth = integer.valueOf(strDate[1]);
myIntYear = integer.valueOf(strDate[0]);
myIntDate = integer.valueOf(strDate[2]);
Date d = Date.newInstance(myIntYear, myIntMonth, myIntDate);
dt.add(d);
}
dt.add(opp.closedate);
dt.sort();
integer TDays=0;
system.debug('*************dt:'+dt.size());
for(integer c=0;c<dt.size()-1;c++)
{
TDays=TDays+dt[c].daysBetween(dt[c+1]);
}
for(Acc_product_fcst__c apf:accquery)
{
apf.Fcst_Days_Period__c = TDays/i;
apf.Total_Qty_Ordered__c =apf.Total_Qty_Ordered__c +opplist.quantity;
apf.Last_Order_Qty__c=opplist.quantity;
apf.Last_Order_Date__c=opp.CloseDate ;
apf.Fcst_Qty_Avg__c=apf.Total_Qty_Ordered__c/(i+1);
Opplist.Acc__c =apf.Id;
}
}
else{
accpro.Account__c=opplist.opp_account__c;
accpro.product__c=opplist.PricebookEntry.Product2.Id;
accpro.opplineitemid__c=opplist.id;
accpro.Total_Qty_Ordered__c =opplist.quantity;
accpro.Last_Order_Qty__c=opplist.quantity;
accpro.Last_Order_Date__c=opp.CloseDate;
accpro.Fcst_Qty_Avg__c=opplist.quantity;
accpro.Fcst_Days_Period__c=7;
accproductList.add(accpro);
}
}
}
}
if(!accproductlist.isempty()){
insert accproductlist;
}
update opplinitemlist;
update accquery;
}
First of all, you should take a look at this: Apex Batch Processing
Once you get a better idea on how batches work, we need to take into account the following points:
Identify the object that requires more processing. Account? Opportunity?
Should the data be maintained across batch calls? Stateful?
Use correct data structure in terms of performance. Map, List?
From your code, we can see you have three objects: OpportunityLineItems, Accounts, and Opportunities. It seems that your account object is using the most processing here.
It seems you're just keeping track of dates and not doing any aggregations. Thus, you don't need to maintain state across batch calls.
Your code has the potential of hitting governor limits, especially heap memory limits. You have four levels of nested loops. Our suggestion would be to maintain the opportunity line items related to Opportunities in a Map rather than in a List. Plus, we can get rid of those unnecessary for loops by refactoring the code as follows:
Note: This is just a template for the batch you will need to construct.
global class AccountforecastBatch implements Database.Batchable<sObject>
{
global Database.QueryLocator start(Database.BatchableContext BC)
{
// 1. Do some initialization here: (i.e. for(OpportunityLineItem oli:opplinitemlist) {sProductIds.add(oli.PricebookEntry.Product2.id)..}
// 2. return Opportunity object here: return Database.getQueryLocator([select id,Total_Qty_Ordered__c,Last_Order_Qty ....]);
}
global void execute(Database.BatchableContext BC, List<sObject> scope)
{
// 1. Traverse your scope which at this point will be a list of Accounts
// 2. You're adding dates inside the process for Opportunity Line Items. See if you can isolate this process outside the for loops with a Map data structure.
// 3. You have three potential database transactions here (insert accproductlist; update opplinitemlist; update accquery;).
//    Ideally, you will only need one DB transaction per batch. If you can complete step 2 above, you might only need
//    to update your opportunity line items. Otherwise, you're trying to do more than one thing in a method and you will need to redesign your solution.
}
global void finish(Database.BatchableContext BC)
{
// send email or do some other tasks here
}
}

Batch update/delete EF5

What is the best way to deal with batch updates using Entity Framework 5 (EF5)?
I have 2 particular cases I'm interested in:
Updating a field (e.g. UpdateDate) for a list of between 100 and 100,000 IDs, which are the primary key. Calling each update separately seems to be too much overhead and takes a long time.
Inserting many objects of the same type (e.g. Users), also between 100 and 100,000, in a single go.
Any good advice?
There are two open source projects allowing this: EntityFramework.Extended and Entity Framework Extensions. You can also check discussion about bulk updates on EF's codeplex site.
Inserting 100k records through EF is the wrong application architecture in the first place. You should choose a different, lightweight technology for data imports. Even EF's internal operations with such a big record set will cost you a lot of processing time. There is currently no solution for batch inserts in EF, but there is a broad discussion about this feature on EF's CodePlex site.
I see the following options:
1. The simplest way: create your SQL request by hand and execute it through ObjectContext.ExecuteStoreCommand
context.ExecuteStoreCommand("UPDATE TABLE SET FIELD1 = {0} WHERE FIELD2 = {1}", value1, value2);
2. Use EntityFramework.Extended
context.Tasks.Update(
t => t.StatusId == 1,
t => new Task {StatusId = 2});
3. Make your own extension for EF. There is an article, Bulk Delete, where this goal was achieved by inheriting from the ObjectContext class. It's worth taking a look. Bulk insert/update can be implemented in the same way.
You may not want to hear it, but your best option is to not use EF for bulk operations. For updating a field across a table of records, use an UPDATE statement in the database (possibly called through a stored proc mapped to an EF function). You can also use the Context.ExecuteStoreCommand method to issue an UPDATE statement to the database.
For massive inserts, your best bet is to use Bulk Copy or SSIS. EF will require a separate hit to the database for each row being inserted.
Bulk inserts should be done using the SqlBulkCopy class. Please see pre-existing StackOverflow Q&A on integrating the two: SqlBulkCopy and Entity Framework
SqlBulkCopy is a lot more user-friendly than bcp (the Bulk Copy command-line utility) or even OPENROWSET.
Here's what I've done successfully:
private void BulkUpdate()
{
var oc = ((IObjectContextAdapter)_dbContext).ObjectContext;
var updateQuery = myIQueryable.ToString(); // This MUST be above the call to get the parameters.
var updateParams = GetSqlParametersForIQueryable(updateQuery).ToArray();
var updateSql = $@"UPDATE dbo.myTable
SET col1 = x.alias2
FROM dbo.myTable
JOIN ({updateQuery}) x(alias1, alias2) ON x.alias1 = dbo.myTable.Id";
oc.ExecuteStoreCommand(updateSql, updateParams);
}
private void BulkInsert()
{
var oc = ((IObjectContextAdapter)_dbContext).ObjectContext;
var insertQuery = myIQueryable.ToString(); // This MUST be above the call to get the parameters.
var insertParams = GetSqlParametersForIQueryable(insertQuery).ToArray();
var insertSql = $@"INSERT INTO dbo.myTable (col1, col2)
SELECT x.alias1, x.alias2
FROM ({insertQuery}) x(alias1, alias2)";
oc.ExecuteStoreCommand(insertSql, insertParams.ToArray());
}
private static IEnumerable<SqlParameter> GetSqlParametersForIQueryable<T>(IQueryable<T> queryable)
{
var objectQuery = GetObjectQueryFromIQueryable(queryable);
return objectQuery.Parameters.Select(x => new SqlParameter(x.Name, x.Value));
}
private static ObjectQuery<T> GetObjectQueryFromIQueryable<T>(IQueryable<T> queryable)
{
var dbQuery = (DbQuery<T>)queryable;
var iqProp = dbQuery.GetType().GetProperty("InternalQuery", BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public);
var iq = iqProp.GetValue(dbQuery, null);
var oqProp = iq.GetType().GetProperty("ObjectQuery", BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public);
return (ObjectQuery<T>)oqProp.GetValue(iq, null);
}
public static bool BulkDelete(string tableName, string columnName, List<object> val)
{
bool ret = true;
var max = 2000;
var pages = Math.Ceiling((double)val.Count / max);
for (int i = 0; i < pages; i++)
{
var count = max;
if (i == pages - 1 && val.Count % max != 0) { count = val.Count % max; } // keep the full page size when Count divides evenly by max
var args = val.GetRange(i * max, count);
var cond = string.Join("", args.Select((t, index) => $",@p{index}")).Substring(1);
var sql = $"DELETE FROM {tableName} WHERE {columnName} IN ({cond}) ";
ret &= Db.ExecuteSqlCommand(sql, args.ToArray()) > 0;
}
return ret;
}
I agree with the accepted answer that EF is probably the wrong technology for bulk inserts.
However, I think it's worth having a look at EntityFramework.BulkInsert.
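For what it's worth, its usage is roughly a one-liner; the namespace and extension-method surface shown here are an assumption and may vary between package versions:
using System.Collections.Generic;
using EntityFramework.BulkInsert.Extensions;   // assumed namespace; may differ by package version

public static class BulkInsertExample
{
    // MyDbContext and MyEntity are placeholders for your own context and entity types.
    public static void Insert(IEnumerable<MyEntity> entities)
    {
        using (var context = new MyDbContext())
        {
            context.BulkInsert(entities);   // one bulk operation instead of per-row inserts
        }
    }
}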

Devart Oracle Entity Framework 4.1 performance

I want to know why code fragment 1 is faster than code fragment 2 when using POCOs with Devart dotConnect for Oracle.
I tried it with 100,000 records and Code 1 is way faster than Code 2. Why? I thought SaveChanges would clear the buffer, making it faster, as there is only one connection. Am I wrong?
Code 1:
for (var i = 0; i < 100000; i++)
{
using (var ctx = new MyDbContext())
{
MyObj obj = new MyObj();
obj.Id = i;
obj.Name = "Foo " + i;
ctx.MyObjects.Add(obj);
ctx.SaveChanges();
}
}
Code 2:
using (var ctx = new MyDbContext())
{
for (var i = 0; i < 100000; i++)
{
MyObj obj = new MyObj();
obj.Id = i;
obj.Name = "Foo " + i;
ctx.MyObjects.Add(obj);
ctx.SaveChanges();
}
}
The first code snippet works faster because the same connection is taken from the pool every time, so there are no performance losses from re-opening it.
In the second case, 100,000 objects are gradually added to the context. Slow snapshot-based change tracking is used (if there are no dynamic proxies), so each SaveChanges() call has to detect whether any of the cached objects have changed, and more and more time is spent on each subsequent iteration.
We recommend that you try the following approach. It should perform better than either of the ones mentioned:
using (var ctx = new MyDbContext())
{
for (var i = 0; i < 100000; i++)
{
MyObj obj = new MyObj();
obj.Id = i;
obj.Name = "Foo " + i;
ctx.MyObjects.Add(obj);
}
ctx.SaveChanges();
}
EDIT
If you use an approach that executes a large number of operations within one SaveChanges(), it is additionally useful to configure the Entity Framework behaviour of the Devart dotConnect for Oracle provider:
// Turn on the Batch Updates mode:
var config = OracleEntityProviderConfig.Instance;
config.DmlOptions.BatchUpdates.Enabled = true;
// If necessary, enable the mode of re-using parameters with the same values:
config.DmlOptions.ReuseParameters = true;
// If the object has a lot of nullable properties and a significant part of them are not set (i.e., null), omitting the explicit insert of NULL values will greatly decrease the size of the generated SQL:
config.DmlOptions.InsertNullBehaviour = InsertNullBehaviour.Omit;
Only some options are mentioned here. The full list of them is available in our article:
http://www.devart.com/blogs/dotconnect/index.php/new-features-of-entity-framework-support-in-dotconnect-providers.html
Am I wrong to assume that when SaveChanges() is called, all the objects in cache are stored to DB and the cache is cleared, so each loop is independent?
SaveChanges() sends and commits all changes to the database, but change tracking continues for all entities that are attached to the context. If snapshot-based change tracking is used, a new SaveChanges() will again start the long process of checking the values of each property of each object to see whether anything has changed.

ADO.NET Entity Data Model - order of executing query

When I run this code:
korlenEntities2 _db = new korlenEntities2();
for (int i = 0; i < 10; i++)
{
klienci klient = new klienci();
klient.nazwa = "Janek_" + i.ToString();
klient.miejscowosc = "-";
_db.AddToklienci(klient);
};
_db.SaveChanges();
records are added to the database in random order, so my ID field is not filled in the order I expect. This is important to me since I want to use it for ordering later.
You cannot control the order of query execution unless you call SaveChanges after every query. Nor can you depend on auto-incremented keys to be sequential in all cases (consider replication). If order is important, you should add a field for that.
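For example, a sketch of that suggestion, assuming you add a hypothetical SortOrder column to the klienci table and regenerate the model:
korlenEntities2 _db = new korlenEntities2();
for (int i = 0; i < 10; i++)
{
    klienci klient = new klienci();
    klient.nazwa = "Janek_" + i.ToString();
    klient.miejscowosc = "-";
    klient.SortOrder = i;   // hypothetical column that records insertion order explicitly
    _db.AddToklienci(klient);
}
_db.SaveChanges();
// Later, order by the explicit column instead of relying on the identity value:
// var ordered = _db.klienci.OrderBy(k => k.SortOrder);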