I have a Grails job that updates the totalSellCount of products. I build a map, productTotalSellCount, keyed by each product's identifier with its total sell count, and then iterate over it to update every product like this:
productTotalSellCount.each { k, v ->
    Product product = Product.findByIdentifier(k)
    product.totalSellCount = v
    product.save(flush: true)
}
I have around 50k products, this is a daily scheduled job, and it always fails. Help!
Instead of using GORM, you should use batch updates. For example:
org.hibernate.StatelessSession session = grails.util.Holders.applicationContext.sessionFactory.openStatelessSession()
org.hibernate.Transaction tx = session.beginTransaction()
groovy.sql.Sql sql = new groovy.sql.Sql(session.connection())

// Create batches of 100 update statements before executing them against the db
sql.withBatch(100, "update product set totalSellCount = :val0 where identifier = :val1") { groovy.sql.BatchingStatementWrapper stmt ->
    productTotalSellCount.each { identifier, value ->
        stmt.addBatch(val0: value, val1: identifier)
    }
}

tx.commit()
session.close()
You can also try parallel execution using GPars.
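For reference, the same batching can be done with plain JDBC. The sketch below (Java) assumes a javax.sql.DataSource is available and that the column really is named totalSellCount, as in the statement above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.Map;
import javax.sql.DataSource;

class ProductSellCountUpdater {
    // Sketch only: batches of 100 updates per executeBatch(), a single commit at the end.
    static void updateSellCounts(DataSource dataSource, Map<String, Long> productTotalSellCount) throws Exception {
        String sql = "update product set totalSellCount = ? where identifier = ?";
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            con.setAutoCommit(false);
            int pending = 0;
            for (Map.Entry<String, Long> e : productTotalSellCount.entrySet()) {
                ps.setLong(1, e.getValue());
                ps.setString(2, e.getKey());
                ps.addBatch();
                if (++pending % 100 == 0) {
                    ps.executeBatch(); // send the current batch of 100 statements to the database
                }
            }
            ps.executeBatch(); // flush the remainder
            con.commit();
        }
    }
}

If you do add parallelism (with GPars or otherwise), give each worker its own connection (or Sql instance) and its own slice of the map; a single JDBC connection should not be shared between threads.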
I am having trouble with an EF method returning duplicate rows of data. When I run it, in my example, it returns four rows from a database view, but the fourth row contains the details of the third row.
The same query in SSMS returns four individual rows with the correct details. I have read somewhere about EF having problems with optimization when there is no identity column. But is there any way to alter the code below to force EF to read all records with all their details?
public List<vs_transactions> GetTransactionList(int cID)
{
using (StagingDataEntities db = new StagingDataEntities())
{
var res = from trans in db.vs_transactions
where trans.CreditID == cID
orderby trans.ActionDate descending
select trans;
return res.ToList();
}
}
Found the solution :) MergeOption.NoTracking
public List<vs_transactions> GetTransactionList(int cID)
{
using (StagingDataEntities db = new StagingDataEntities())
{
db.vs_transactions.MergeOption = MergeOption.NoTracking;
var res = from trans in db.vs_transactions
where trans.CreditID == cID
orderby trans.ActionDate descending
select trans;
return res.ToList();
}
}
The context is that I need to update a value in a single document. I have a Mono<ChangeBalanceRequestResource>; the wrapped object contains values such as a username (to find the correct user by their unique username) and an amount.
The problem is that this value (due to other components of my application) is the amount by which I need to increase/decrease the user's balance, as opposed to being a new balance. I intend to do this using two Monos: one finds the user, and it is then combined with the other Mono carrying the inbound request, so I can perform a simple sum (i.e. balance + changeRequest.amount) and write the result back to the document database.
override fun increaseBalance(changeRequest: Mono<ChangeBalanceRequestResource>): Mono<ChangeBalanceResponse> {
    val changeAmount: Mono<Decimal128> = changeRequest.map { it.transactionAmount }
    val user: Mono<User> = changeRequest.flatMap { rxUserRepository.findByUsername(it.username) }
    val newBalance = user.map {
        val r = changeAmount.block()
        it.balance = sumBalance(it.balance!!, r!!)
        rxUserRepository.save(it)
    }
    .flatMap { it }
    .map { it.balance!! }
    return Mono.just(ChangeBalanceResponse("success", newBalance.block()!!))
}
Obviously I'm trying to achieve this in a non-blocking fashion. I'm also open to using only a single Mono if that's possible/optimal. I also appreciate I've truly butchered the example and used .block as a placeholder to illustrate what I'm trying to achieve.
P.S this is my first post, so any tips on how to express my problem clearer would be useful.
Here's how I would do this in Java (using Double instead of Decimal128):
public Mono<ChangeBalanceResponse> increaseBalance(Mono<ChangeBalanceRequestResource> changeRequest) {
    Mono<Double> changeAmount = changeRequest.map(a -> a.transactionAmount());
    Mono<User> user = changeRequest.map(a -> a.username()).flatMap(rxUserRepository::findByUsername);
    return Mono.zip(changeAmount, user).flatMap(t2 -> {
        Double amount = t2.getT1();
        User u = t2.getT2();
        // assumes User's setters are chained (fluent), so balance(...) returns the User
        return rxUserRepository.save(u.balance(sumBalance(amount, u.balance())));
    }).map(saved -> new ChangeBalanceResponse("success", saved.balance()));
}
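Since the question mentions being open to a single Mono: you can also flatMap the request once and keep it in scope, which avoids zipping two Monos that are both derived from the same changeRequest. A rough sketch, assuming the same chained accessors as above:

public Mono<ChangeBalanceResponse> increaseBalance(Mono<ChangeBalanceRequestResource> changeRequest) {
    return changeRequest.flatMap(req ->
            // single subscription to the request; no zip and no block()
            rxUserRepository.findByUsername(req.username())
                            .flatMap(user -> rxUserRepository.save(
                                    user.balance(sumBalance(req.transactionAmount(), user.balance()))))
                            .map(saved -> new ChangeBalanceResponse("success", saved.balance())));
}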
I am populating a Titan 1.0.0 single instance with a moderate graph in order to test its query performance. I am using Cassandra 2.0.17 as storage backend.
The problem is that I am not able to create node indexes, and hence cannot query optimally. I have read the docs and I am trying to follow them carefully, without much success. I am using the following Groovy script for the schema definition, data population and index creation:
import com.thinkaurelius.titan.core.*;
import com.thinkaurelius.titan.core.schema.*;
import com.thinkaurelius.titan.graphdb.database.management.ManagementSystem;
import java.time.temporal.ChronoUnit;
graph = TitanFactory.open('conf/my-titan.properties');
mgmt = graph.openManagement();
// Build graph schema
// Node properties
idProp = mgmt.containsPropertyKey('userId') ?
    mgmt.getPropertyKey('userId') : mgmt.makePropertyKey('userId').dataType(String.class).cardinality(Cardinality.SINGLE).make();
isPublicProp = mgmt.containsPropertyKey('isPublic') ?
    mgmt.getPropertyKey('isPublic') : mgmt.makePropertyKey('isPublic').dataType(Boolean.class).cardinality(Cardinality.SINGLE).make();
completionPercentageProp = mgmt.containsPropertyKey('completionPercentage') ?
    mgmt.getPropertyKey('completionPercentage') : mgmt.makePropertyKey('completionPercentage').dataType(Integer.class).cardinality(Cardinality.SINGLE).make();
genderProp = mgmt.containsPropertyKey('gender') ?
    mgmt.getPropertyKey('gender') : mgmt.makePropertyKey('gender').dataType(String.class).cardinality(Cardinality.SINGLE).make();
regionProp = mgmt.containsPropertyKey('region') ?
    mgmt.getPropertyKey('region') : mgmt.makePropertyKey('region').dataType(String.class).cardinality(Cardinality.SINGLE).make();
lastLoginProp = mgmt.containsPropertyKey('lastLogin') ?
    mgmt.getPropertyKey('lastLogin') : mgmt.makePropertyKey('lastLogin').dataType(String.class).cardinality(Cardinality.SINGLE).make();
registrationProp = mgmt.containsPropertyKey('registration') ?
    mgmt.getPropertyKey('registration') : mgmt.makePropertyKey('registration').dataType(String.class).cardinality(Cardinality.SINGLE).make();
ageProp = mgmt.containsPropertyKey('age') ? mgmt.getPropertyKey('age') : mgmt.makePropertyKey('age').dataType(Integer.class).cardinality(Cardinality.SINGLE).make();
mgmt.commit();
nUsers = 0
println 'Starting nodes population...';
// Load users
new File('/home/jarandaf/soc-pokec-profiles.txt').eachLine {
try {
fields = it.split('\t').take(8);
userId = fields[0];
isPublic = fields[1] == '1' ? true : false;
completionPercentage = fields[2]
gender = fields[3] == '1' ? 'male' : 'female';
region = fields[4];
lastLogin = fields[5];
registration = fields[6];
age = fields[7] as int;
graph.addVertex('userId', userId, 'isPublic', isPublic, 'completionPercentage', completionPercentage, 'gender', gender, 'region', region, 'lastLogin', lastLogin, 'registration', registration, 'age', age);
} catch (Exception e) {
// Silently skip...
}
nUsers += 1
if (nUsers % 100000 == 0) println String.valueOf(nUsers) + ' loaded...';
};
graph.tx().commit();
println 'Nodes population finished';
// Index users by userId, gender and age
println 'Getting node properties...';
mgmt = graph.openManagement();
userId = mgmt.getPropertyKey('userId');
gender = mgmt.getPropertyKey('gender');
age = mgmt.getPropertyKey('age');
println 'Building byUserId index...';
if (mgmt.getGraphIndex('byUserId') == null) mgmt.buildIndex('byUserId', Vertex.class).addKey(userId).buildCompositeIndex();
println 'Building byGender index...';
if (mgmt.getGraphIndex('byGender') == null) mgmt.buildIndex('byGender', Vertex.class).addKey(gender).buildCompositeIndex();
println 'Building byAge index...';
if (mgmt.getGraphIndex('byAge') == null) mgmt.buildIndex('byAge', Vertex.class).addKey(age).buildCompositeIndex();
mgmt.commit();
// Wait for the indexes to become available
println 'Awaiting byUserId graph index status...';
ManagementSystem.awaitGraphIndexStatus(graph, 'byUserId')
.status(SchemaStatus.REGISTERED)
.timeout(10, ChronoUnit.MINUTES)
.call();
println 'Awaiting byGender graph index status...';
ManagementSystem.awaitGraphIndexStatus(graph, 'byGender')
.status(SchemaStatus.REGISTERED)
.timeout(10, ChronoUnit.MINUTES)
.call();
println 'Awaiting byAge graph index status...';
ManagementSystem.awaitGraphIndexStatus(graph, 'byAge')
.status(SchemaStatus.REGISTERED)
.timeout(10, ChronoUnit.MINUTES)
.call();
// Reindex the existing data
mgmt = graph.openManagement();
println 'Reindexing data by byUserId index...';
mgmt.updateIndex(mgmt.getGraphIndex('byUserId'), SchemaAction.REINDEX).get();
println 'Reindexing data by byGender index...';
mgmt.updateIndex(mgmt.getGraphIndex('byGender'), SchemaAction.REINDEX).get();
println 'Reindexing data by byAge index...';
mgmt.updateIndex(mgmt.getGraphIndex('byAge'), SchemaAction.REINDEX).get();
mgmt.commit();
// Enable indexes
println 'Enabling byUserId index...'
mgmt.awaitGraphIndexStatus(graph, 'byUserId').status(SchemaStatus.ENABLED).call();
println 'Enabling byGender index...'
mgmt.awaitGraphIndexStatus(graph, 'byGender').status(SchemaStatus.ENABLED).call();
println 'Enabling byAge index...'
mgmt.awaitGraphIndexStatus(graph, 'byAge').status(SchemaStatus.ENABLED).call();
graph.close();
The error I am getting is the following, and it is related to the reindex phase:
08:24:26 ERROR com.thinkaurelius.titan.graphdb.database.management.ManagementLogger - Evicted [2#0ac717511509-mybox] from cache but waiting too long for transactions to close. Stale transaction alert on: [standardtitantx[0x4b8696a4], standardtitantx[0x2d39f30a], standardtitantx[0x0da9172d], standardtitantx[0x7c6c7909], standardtitantx[0x79dd0a38], standardtitantx[0x5999c49e], standardtitantx[0x5aaba4a7]]
08:24:26 ERROR com.thinkaurelius.titan.graphdb.database.management.ManagementLogger - Evicted [3#0ac717511509-mybox] from cache but waiting too long for transactions to close. Stale transaction alert on: [standardtitantx[0x4b8696a4], standardtitantx[0x2d39f30a], standardtitantx[0x0da9172d], standardtitantx[0x7c6c7909], standardtitantx[0x79dd0a38], standardtitantx[0x5999c49e], standardtitantx[0x5aaba4a7]]
08:24:26 ERROR com.thinkaurelius.titan.graphdb.database.management.ManagementLogger - Evicted [4#0ac717511509-mybox] from cache but waiting too long for transactions to close. Stale transaction alert on: [standardtitantx[0x4b8696a4], standardtitantx[0x2d39f30a], standardtitantx[0x0da9172d], standardtitantx[0x7c6c7909], standardtitantx[0x79dd0a38], standardtitantx[0x5999c49e], standardtitantx[0x5aaba4a7]]
Any hints on this would be much appreciated.
The errors you get indicate that you have open transactions when you try to modify the schema. Titan needs to wait for all transactions to complete before it can modify the schema. See the answer from Matthias Broecheler on the mailing list for more information.
In general, you should avoid reindexing if possible as it requires Titan to walk over all vertices to see whether they need to be added to the index that should be updated. The documentation contains more information about this process.
For your use case, you can simply create all indexes before you load any data. When you then add the data after all indexes are ready, they will be simply added to the indexes. That way, you should be able to use the indexes immediately.
A minimal example for the schema creation in Groovy (but it should be basically the same in Java):
import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.Multiplicity;
import com.thinkaurelius.titan.core.Cardinality;
graph = TitanFactory.open('conf/my-titan.properties')
mgmt = graph.openManagement()
id = mgmt.makePropertyKey('id').dataType(String.class).cardinality(Cardinality.SINGLE).make()
// some other properties that will not be indexed
mgmt.makePropertyKey('isPublic').dataType(Boolean.class).cardinality(Cardinality.SINGLE).make()
mgmt.makePropertyKey('completionPercentage').dataType(Integer.class).cardinality(Cardinality.SINGLE).make()
// I prefer to use vertex labels to differentiate between different 'types' of vertices, but this isn't necessary
User = mgmt.makeVertexLabel('User').make()
mgmt.buildIndex('UserById', Vertex.class).addKey(id).indexOnly(User).buildCompositeIndex()
mgmt.commit()
I removed all the checks for already existing schema elements for simplicity, but you can of course add them again.
After the schema creation, you can add your data just like before.
A final note about index management: try to always define the property keys that you want to index in the same transaction in which you create the index. Otherwise, Titan cannot know whether there is already data that needs to be added to the new index, which again requires a complete scan of all data. This might require choosing a different name for a property. When you later add, for example, a new vertex label post, you might want to use a new property like postId instead of reusing the property id, to avoid that scan of all existing data.
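To illustrate that last point, here is a minimal Java sketch (the postId property and the index name are just examples) that creates the new key and its index inside one and the same management transaction:

import com.thinkaurelius.titan.core.Cardinality;
import com.thinkaurelius.titan.core.PropertyKey;
import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;
import com.thinkaurelius.titan.core.schema.TitanManagement;
import org.apache.tinkerpop.gremlin.structure.Vertex;

public class PostIdIndexExample {
    public static void main(String[] args) throws Exception {
        TitanGraph graph = TitanFactory.open("conf/my-titan.properties");
        TitanManagement mgmt = graph.openManagement();
        // Key and index are created in the same management transaction,
        // so Titan never has to scan the existing data for this index.
        PropertyKey postId = mgmt.makePropertyKey("postId")
                                 .dataType(String.class)
                                 .cardinality(Cardinality.SINGLE)
                                 .make();
        mgmt.buildIndex("PostById", Vertex.class).addKey(postId).buildCompositeIndex();
        mgmt.commit();
        graph.close();
    }
}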
Please help me convert my after trigger to Batch Apex.
This trigger fires when an opportunity's stage changes to won.
It runs through the line items and checks whether a forecast (custom object) exists for that account. If yes, it links to it; if not, it creates a new forecast.
My trigger works fine for some records, but on mass updates I get a timed-out error, so I'm opting for Batch Apex, which I have never written. Please help me.
trigger Accountforecast on Opportunity (after insert,after update) {
List<Acc__c> AccproductList = new List<Acc__c>();
List<Opportunitylineitem> opplinitemlist =new List<Opportunitylineitem>();
list<opportunitylineitem > oppdate= new list<opportunitylineitem >();
List<Acc__c> accquery =new List<Acc__c>();
List<date> dt =new List<date>();
Set<Id> sProductIds = new Set<Id>();
Set<Id> sAccountIds = new Set<Id>();
Set<id> saccprodfcstids =new set<Id>();
Acc__c accpro =new Acc__c();
string aname;
Integer i;
Integer myIntMonth;
Integer myIntyear;
Integer myIntdate;
opplinitemlist=[select Id,PricebookEntry.Product2.Name,opp_account__c,Opp_account_name__c,PricebookEntry.Product2.id, quantity,ServiceDate,Acc_Product_Fcst__c from Opportunitylineitem WHERE Opportunityid IN :Trigger.newMap.keySet() AND Acc__c=''];
for(OpportunityLineItem oli:opplinitemlist) {
sProductIds.add(oli.PricebookEntry.Product2.id);
sAccountIds.add(oli.opp_account__c);
}
accquery=[select id,Total_Qty_Ordered__c,Last_Order_Qty__c,Last_Order_Date__c,Fcst_Days_Period__c from Acc__c where Acc__c.product__c In :sproductids and Acc__c.Account__c in :saccountids];
for(Acc__c apf1 :accquery){
saccprodfcstids.add(apf1.id);
}
if(saccprodfcstids!=null){
oppdate=[select servicedate from opportunitylineitem where Acc__c IN :saccprodfcstids ];
i =[select count() from Opportunitylineitem where acc_product_fcst__c in :saccprodfcstids];
}
for(Opportunity opp :trigger.new)
{
if(opp.Stagename=='Closed Won')
{
for(opportunitylineitem opplist:opplinitemlist)
{
if(!accquery.isempty())
{
for(opportunitylineitem opldt :oppdate)
{
string myDate = String.valueOf(opldt);
myDate = myDate.substring(myDate.indexof('ServiceDate=')+12);
myDate = myDate.substring(0,10);
String[] strDate = myDate.split('-');
myIntMonth = integer.valueOf(strDate[1]);
myIntYear = integer.valueOf(strDate[0]);
myIntDate = integer.valueOf(strDate[2]);
Date d = Date.newInstance(myIntYear, myIntMonth, myIntDate);
dt.add(d);
}
dt.add(opp.closedate);
dt.sort();
integer TDays=0;
system.debug('*************dt:'+dt.size());
for(integer c=0;c<dt.size()-1;c++)
{
TDays=TDays+dt[c].daysBetween(dt[c+1]);
}
for(Acc__c apf:accquery)
{
apf.Fcst_Days_Period__c = TDays/i;
apf.Total_Qty_Ordered__c =apf.Total_Qty_Ordered__c +opplist.quantity;
apf.Last_Order_Qty__c=opplist.quantity;
apf.Last_Order_Date__c=opp.CloseDate ;
apf.Fcst_Qty_Avg__c=apf.Total_Qty_Ordered__c/(i+1);
Opplist.Acc__c =apf.Id;
}
}
else{
accpro.Account__c=opplist.opp_account__c;
accpro.product__c=opplist.PricebookEntry.Product2.Id;
accpro.opplineitemid__c=opplist.id;
accpro.Total_Qty_Ordered__c =opplist.quantity;
accpro.Last_Order_Qty__c=opplist.quantity;
accpro.Last_Order_Date__c=opp.CloseDate;
accpro.Fcst_Qty_Avg__c=opplist.quantity;
accpro.Fcst_Days_Period__c=7;
accproductList.add(accpro);
}
}
}
}
if(!accproductlist.isempty()){
insert accproductlist;
}
update opplinitemlist;
update accquery;
}
First of all, you should take a look at this: Apex Batch Processing
Once you get a better idea on how batches work, we need to take into account the following points:
Identify the object that requires more processing. Account? Opportunity?
Should the data be maintained across batch calls? Stateful?
Use correct data structure in terms of performance. Map, List?
From your code, we can see you have three objects: OpportunityLineItems, Accounts, and Opportunities. It seems that your account object requires the most processing here.
It also seems you're just keeping track of dates and not doing any aggregations, so you don't need to maintain state across batch calls.
Your code has the potential to hit governor limits, especially heap memory limits: you have four nested loops. Our suggestion would be to keep the opportunity line items related to each Opportunity in a Map rather than in a List. That also lets us get rid of the unnecessary for loops by refactoring the code as follows:
Note: This is just a template for the batch you will need to construct.
global class AccountforecastBatch implements Database.Batchable<sObject>
{
global Database.QueryLocator start(Database.BatchableContext BC)
{
// 1. Do some initialization here: (i.e. for(OpportunityLineItem oli:opplinitemlist) {sProductIds.add(oli.PricebookEntry.Product2.id)..}
// 2. return Opportunity object here: return Database.getQueryLocator([select id,Total_Qty_Ordered__c,Last_Order_Qty ....]);
}
global void execute(Database.BatchableContext BC, List<sObject> scope)
{
// 1. Traverse your scope which at this point will be a list of Accounts
// 2. You're adding dates inside the process for Opportunity Line Items. See if you can isolate this process outside the for loops with a Map data structure.
// 3. You have 3 potential database transactions here (insert accproductlist; update opplinitemlist; update accquery;).
//    Ideally, you will only need one DB transaction per batch. If you can complete step 2 above, you might only need
//    to update your opportunity line items. Otherwise, you're trying to do more than one thing in a method and you
//    will need to redesign your solution.
}
global void finish(Database.BatchableContext BC)
{
// send email or do some other tasks here
}
}
I am running an import that will have thousands of records on each run. I'm just looking for some confirmation of my assumptions:
Which of these makes the most sense:
Run SaveChanges() every AddToClassName() call.
Run SaveChanges() every n number of AddToClassName() calls.
Run SaveChanges() after all of the AddToClassName() calls.
The first option is probably slow right? Since it will need to analyze the EF objects in memory, generate SQL, etc.
I assume that the second option is the best of both worlds, since we can wrap a try catch around that SaveChanges() call, and only lose n number of records at a time, if one of them fails. Maybe store each batch in an List<>. If the SaveChanges() call succeeds, get rid of the list. If it fails, log the items.
The last option would probably end up being very slow as well, since every single EF object would have to be in memory until SaveChanges() is called. And if the save failed nothing would be committed, right?
I would test it first to be sure. Performance doesn't have to be that bad.
If you need to enter all rows in one transaction, call it after all of the AddToClassName() calls. If rows can be entered independently, save changes after every row. Database consistency is important.
I don't like the second option. It would be confusing for me (from the end user's perspective) if an import into the system declined 10 rows out of 1000 just because one is bad. You can try importing 10, and if that fails, try them one by one and then log the failures.
Test whether it takes a long time. Don't write 'probably'; you don't know yet. Only when it is actually a problem should you think about another solution (marc_s).
EDIT
I've done some tests (times in milliseconds):
10000 rows:
SaveChanges() after 1 row: 18510,534
SaveChanges() after 100 rows: 4350,3075
SaveChanges() after 10000 rows: 5233,0635
50000 rows:
SaveChanges() after 1 row: 78496,929
SaveChanges() after 500 rows: 22302,2835
SaveChanges() after 50000 rows: 24022,8765
So it is actually faster to commit after n rows than after all.
My recommendation is to:
SaveChanges() after n rows.
If one commit fails, try the rows one by one to find the faulty row (see the sketch below).
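The second bullet can be sketched roughly like this; it's shown in Java against a hypothetical saveBatch callback, since the pattern itself doesn't depend on EF:

import java.util.List;
import java.util.function.Consumer;

// Sketch: commit every `batchSize` rows; if a batch fails, retry its rows one by one
// so the faulty row can be logged and skipped.
final class BatchedImport {
    static <T> void run(List<T> rows, int batchSize, Consumer<List<T>> saveBatch) {
        for (int from = 0; from < rows.size(); from += batchSize) {
            List<T> batch = rows.subList(from, Math.min(from + batchSize, rows.size()));
            try {
                saveBatch.accept(batch);            // happy path: one commit per batch
            } catch (RuntimeException batchFailure) {
                for (T row : batch) {
                    try {
                        saveBatch.accept(List.of(row));
                    } catch (RuntimeException rowFailure) {
                        System.err.println("Bad row skipped: " + row + " (" + rowFailure.getMessage() + ")");
                    }
                }
            }
        }
    }
}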
Test classes:
TABLE:
CREATE TABLE [dbo].[TestTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[SomeInt] [int] NOT NULL,
[SomeVarchar] [varchar](100) NOT NULL,
[SomeOtherVarchar] [varchar](50) NOT NULL,
[SomeOtherInt] [int] NULL,
CONSTRAINT [PkTestTable] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Class:
public class TestController : Controller
{
//
// GET: /Test/
private readonly Random _rng = new Random();
private const string _chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
private string RandomString(int size)
{
var randomSize = _rng.Next(size);
char[] buffer = new char[randomSize];
for (int i = 0; i < randomSize; i++)
{
buffer[i] = _chars[_rng.Next(_chars.Length)];
}
return new string(buffer);
}
public ActionResult EFPerformance()
{
string result = "";
TruncateTable();
result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(10000, 1).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 100 rows:" + EFPerformanceTest(10000, 100).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 10000 rows:" + EFPerformanceTest(10000, 10000).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(50000, 1).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 500 rows:" + EFPerformanceTest(50000, 500).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 50000 rows:" + EFPerformanceTest(50000, 50000).TotalMilliseconds + "<br/>";
TruncateTable();
return Content(result);
}
private void TruncateTable()
{
using (var context = new CamelTrapEntities())
{
var connection = ((EntityConnection)context.Connection).StoreConnection;
connection.Open();
var command = connection.CreateCommand();
command.CommandText = @"TRUNCATE TABLE TestTable";
command.ExecuteNonQuery();
}
}
private TimeSpan EFPerformanceTest(int noOfRows, int commitAfterRows)
{
var startDate = DateTime.Now;
using (var context = new CamelTrapEntities())
{
for (int i = 1; i <= noOfRows; ++i)
{
var testItem = new TestTable();
testItem.SomeVarchar = RandomString(100);
testItem.SomeOtherVarchar = RandomString(50);
testItem.SomeInt = _rng.Next(10000);
testItem.SomeOtherInt = _rng.Next(200000);
context.AddToTestTable(testItem);
if (i % commitAfterRows == 0) context.SaveChanges();
}
}
var endDate = DateTime.Now;
return endDate.Subtract(startDate);
}
}
I just optimized a very similar problem in my own code and would like to point out an optimization that worked for me.
I found that much of the time in processing SaveChanges, whether processing 100 or 1000 records at once, is CPU bound. So, by processing the contexts with a producer/consumer pattern (implemented with BlockingCollection), I was able to make much better use of CPU cores and got from a total of 4000 changes/second (as reported by the return value of SaveChanges) to over 14,000 changes/second. CPU utilization moved from about 13 % (I have 8 cores) to about 60%. Even using multiple consumer threads, I barely taxed the (very fast) disk IO system and CPU utilization of SQL Server was no higher than 15%.
By offloading the saving to multiple threads, you have the ability to tune both the number of records prior to commit and the number of threads performing the commit operations.
I found that creating 1 producer thread and (# of CPU Cores)-1 consumer threads allowed me to tune the number of records committed per batch such that the count of items in the BlockingCollection fluctuated between 0 and 1 (after a consumer thread took one item). That way, there was just enough work for the consuming threads to work optimally.
This scenario of course requires creating a new context for every batch, which I find to be faster even in a single-threaded scenario for my use case.
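For what it's worth, the shape of that producer/consumer setup looks roughly like this. This is a Java sketch (the original uses .NET's BlockingCollection), and saveBatch stands in for "create a fresh context, add the batch, call SaveChanges, dispose":

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Sketch of the producer/consumer shape described above.
final class ParallelSaver {
    static <T> void run(Iterable<List<T>> batches, Consumer<List<T>> saveBatch, int consumers) throws InterruptedException {
        BlockingQueue<List<T>> queue = new ArrayBlockingQueue<>(1); // capacity 1 keeps the queue hovering between 0 and 1 items
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < consumers; i++) {
            Thread worker = new Thread(() -> {
                try {
                    // take batches until the empty "poison pill" arrives; each consumer commits its own batches
                    for (List<T> batch = queue.take(); !batch.isEmpty(); batch = queue.take()) {
                        saveBatch.accept(batch);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }
        for (List<T> batch : batches) {
            queue.put(batch);       // producer blocks while all consumers are busy
        }
        for (int i = 0; i < consumers; i++) {
            queue.put(List.of());   // one empty-list poison pill per consumer
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }
}

Tuning then comes down to the batch size the producer hands over and the number of consumer threads, as described above.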
If you need to import thousands of records, I'd use something like SqlBulkCopy, and not the Entity Framework for that.
MSDN docs on SqlBulkCopy
Use SqlBulkCopy to Quickly Load Data from your Client to SQL Server
Transferring Data Using SqlBulkCopy
Use a stored procedure.
Create a User-Defined Data Type in Sql Server.
Create and populate an array of this type in your code (very fast).
Pass the array to your stored procedure with one call (very fast).
I believe this would be the easiest and fastest way to do this.
Sorry, I know this thread is old, but I think this could help other people with this problem.
I had the same problem, but there is a way to validate the changes before you commit them. My code looks like this and it works fine. With chUser.LastUpdated I check whether it is a new entry or only a change, because it is not possible to reload an entry that is not in the database yet.
// Validate Changes
var invalidChanges = _userDatabase.GetValidationErrors();
foreach (var ch in invalidChanges)
{
// Delete invalid User or Change
var chUser = (db_User) ch.Entry.Entity;
if (chUser.LastUpdated == null)
{
// Invalid, new User
_userDatabase.db_User.Remove(chUser);
Console.WriteLine("!Failed to create User: " + chUser.ContactUniqKey);
}
else
{
// Invalid Change of an Entry
_userDatabase.Entry(chUser).Reload();
Console.WriteLine("!Failed to update User: " + chUser.ContactUniqKey);
}
}
_userDatabase.SaveChanges();