I am trying to save hundreds of thousands of records using Entity Framework. After saving a few hundred thousand records I get the following error:
System.OutOfMemoryException
My code:
foreach (BibContent objbibcontents in lstBibContent)
{
    db.BibContents.AddObject(objbibcontents);
    c = c + 1;
    if (c == 1000)
    {
        db.SaveChanges();
        c = 0;
    }
}
I noticed that after saving 1000 records, the context is not replacing them with the next 1000; it keeps adding them to my DbContext.
I am creating a new instance after every 1000 records, but it still holds the previous objects' data. See my code:
foreach (var objbibcontents in lstBibContent)
{
    vibrantEntities db1 = new vibrantEntities(szConStr);
    lstBibCon.Add(objbibcontents);
    // db.BibContents.AddObject(objbibcontents);
    c = c + 1;
    if (c == 1000)
    {
        foreach (BibContent bibobject in lstBibCon)
        {
            db1.BibContents.AddObject(bibobject);
        }
        lstBibCon.Clear();
        db1.SaveChanges();
        c = 0;
        flag = 1;
    }
}
How many objects are you going to save, and how big is a single object? The context holds references to all objects you added with an AddObject call. Calling SaveChanges does not purge its internal data structures, so if you run your code over 1M objects you will have 1M objects in memory, and they will be fully alive, because their GC root is the context instance, which is still in scope in your running code.
If you want to avoid the memory issue, you should use a new context instance for every 1000 records (or even for every record). The only difference between running SaveChanges for 1000 records and for a single record is the automatically involved transaction.
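A minimal sketch of that approach, reusing the names from your question (vibrantEntities, szConStr, lstBibContent); disposing the context after each batch makes the saved objects collectible:
int c = 0;
vibrantEntities db = new vibrantEntities(szConStr); // assumed from the question
try
{
    foreach (BibContent objbibcontents in lstBibContent)
    {
        db.BibContents.AddObject(objbibcontents);
        c = c + 1;
        if (c == 1000)
        {
            db.SaveChanges();
            db.Dispose();                       // release the 1000 tracked objects
            db = new vibrantEntities(szConStr); // fresh context for the next batch
            c = 0;
        }
    }
    db.SaveChanges(); // flush the final partial batch
}
finally
{
    db.Dispose();
}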
I searched the web and finally found a good solution to my problem:
Fastest Way of Inserting in Entity Framework
I am working with .NET Core Entity Framework. I have two lists of a class type: one for updates and one for new entries. Adding new records, done via context.[Model].Add, works fine, but the update, done via context.[Model].Update, throws an exception. I know no record has been updated yet, as it is running locally.
$exception {Microsoft.EntityFrameworkCore.DbUpdateConcurrencyException: Database operation expected to affect 1 row(s) but actually affected 0 row(s). Data may have been modified or deleted since entities were loaded.
Code:
List<AnswerDataModel> surveyResponseListToCreate = new List<AnswerDataModel>();
List<AnswerDataModel> surveyResponseListToUpdate = new List<AnswerDataModel>();
if (surveyResponseListToUpdate.Count > 0)
{
    foreach (var answerObject in surveyResponseListToUpdate)
    {
        Context.Answers.Update(answerObject);
        if (answerObject.AnswerOptions.Count > 0)
        {
            foreach (var optItem in answerObject.AnswerOptions)
            {
                AnswerOptionDataModel answOpt = new AnswerOptionDataModel();
                answOpt = optItem;
                Context.AnswerOptions.Update(answOpt);
            }
        }
    }
}
var recordsAffected = Context.SaveChanges();
if (!UsingExternalTransaction)
{
    FinalizeTransaction(recordsAffected);
}
I can't resist a quote:
"I do not think [your code] means what you think it means."
Assuming that surveyResponseListToUpdate was a list of entities previously loaded and modified:
if (answerObject.AnswerOptions.Count > 0) // Unnecessary...
{
    foreach (var optItem in answerObject.AnswerOptions)
    {
        AnswerOptionDataModel answOpt = new AnswerOptionDataModel(); // does nothing.
        answOpt = optItem; // references existing answer option..
        Context.AnswerOptions.Update(answOpt);
    }
}
The whole block boils down to:
foreach (var optItem in answerObject.AnswerOptions)
    Context.AnswerOptions.Update(optItem);
The error you are likely running into is because Update will recurse through navigation properties automatically, so when the parent (Answer) is updated, its AnswerOptions are updated as well. When you then go through the extra steps to save the answer options, they have already been updated along with the answer. Provided the Answer was loaded by the same context that you are saving it to, you should be in the clear with:
foreach (var answerObject in surveyResponseListToUpdate)
    Context.Answers.Update(answerObject);
var recordsAffected = Context.SaveChanges();
This should update the answer and its associated answer options. Even if options were added or removed, the change tracking should do its job and ensure all of the associated data records are updated.
The extra if checks and such aren't necessary; they just add nesting depth, making the code harder to read.
However, I suspect that your real code is doing something different from the example, given that in my tests, where I tried to reproduce your error, the code worked fine even when updating the child references after updating the parent. If the above still raises issues, please update your example with the code you are running.
I'm profiling my application locally (using the dev server) to get more information about how GAE works. My tests compare the common full-entity query and the projection query. Both tests run the same query, but the projection is specified with 2 properties. The test kind has 100 properties, all with the same value for each entity, with a total of 10 entities. An image with the Datastore viewer and the Appstats-generated data is shown below. In the Appstats image, Request 4 is a memcache flush, Request 3 is the test database creation (it was already created, so no costs here), Request 2 is the full-entity query and Request 1 is the projection query.
I'm surprised that both queries resulted in the same amount of reads. My guess is that small and read operations are being reported the same by Appstats. If this is the case, I want to separate them in the reports. These are the query-related functions:
// Full Entity Query
public ReturnCodes doQuery() {
    DatastoreService dataStore = DatastoreServiceFactory.getDatastoreService();
    for (int i = 0; i < numIters; ++i) {
        Filter filter = new FilterPredicate(DBCreation.PROPERTY_NAME_PREFIX + i,
                FilterOperator.NOT_EQUAL, i);
        Query query = new Query(DBCreation.ENTITY_NAME).setFilter(filter);
        PreparedQuery prepQuery = dataStore.prepare(query);
        Iterable<Entity> results = prepQuery.asIterable();
        for (Entity result : results) {
            log.info(result.toString());
        }
    }
    return ReturnCodes.SUCCESS;
}
// Projection Query
public ReturnCodes doQuery() {
    DatastoreService dataStore = DatastoreServiceFactory.getDatastoreService();
    for (int i = 0; i < numIters; ++i) {
        String projectionPropName = DBCreation.PROPERTY_NAME_PREFIX + i;
        Filter filter = new FilterPredicate(DBCreation.PROPERTY_NAME_PREFIX + i,
                FilterOperator.NOT_EQUAL, i);
        Query query = new Query(DBCreation.ENTITY_NAME).setFilter(filter);
        query.addProjection(new PropertyProjection(DBCreation.PROPERTY_NAME_PREFIX + 0, Integer.class));
        query.addProjection(new PropertyProjection(DBCreation.PROPERTY_NAME_PREFIX + 1, Integer.class));
        PreparedQuery prepQuery = dataStore.prepare(query);
        Iterable<Entity> results = prepQuery.asIterable();
        for (Entity result : results) {
            log.info(result.toString());
        }
    }
    return ReturnCodes.SUCCESS;
}
Any ideas?
EDIT: To get a better overview of the problem I have created another test, which does the same query but uses a keys-only query instead. For this case, Appstats correctly shows DATASTORE_SMALL operations in the report. I'm still pretty confused about the behavior of the projection query, which should also be reporting DATASTORE_SMALL operations. Please help!
[I wrote the go port of appstats, so this is based on my experience and recollection.]
My guess is this is a bug in appstats, which is a relatively unmaintained program. Projection queries are new, so appstats may not be aware of them, and treats them as normal read queries.
For some background, calculating costs is difficult. For write ops, the costs are returned with the results, as they must be, since the app has no way of knowing what changed (which is where the write costs happen). For reads and small ops, however, there is a formula to calculate the cost. Each appstats implementation (Python, Java, Go) must implement this calculation, including reflection or whatever is needed over the request object to determine what's going on. The APIs for doing this are not entirely obvious, and there are lots of little things, so it's easy to get it wrong, and annoying to get it right.
In my Symfony2 command, I am running a script that inserts hundreds of thousands of URLs (as strings) into a document.
Here are the basic structures of the two documents I'm using. Before the program runs, there are thousands of ParentDocuments already inside MongoDB, but zero ChildDocuments:
ParentDocument:
    $id:id
    $subDocument:OneToManyReference(ChildDocument)
    $etc:everythingelse
ChildDocument:
    $id:id
    $url:string
    $parentDocument:ManyToOneReference(ParentDocument)
And my Command code:
$dm = $this->getContainer()->get('doctrine_mongodb.odm.document_manager');
$parentDocuments = $dm->getRepository('My:Bundle:ParentDocument')->findAll();

while ($parentDocument = $parentDocuments->getNext()) {
    // Returns an array of hundreds of thousands of urls
    $urls = $this->somehowFetchUrlsRelatedToTheParentDocument($parentDocument);

    foreach ($urls as $url) {
        $subDocument = new SubDocument();
        $subDocument->setUrl($url);
        $subDocument->setParentDocument($parentDocument);
        $dm->persist($subDocument);
    }

    $dm->flush();
}
When I run this simple command, the write speed at first is incredibly fast. However, when inserting millions of rows, the writes become significantly slower, as slow as 1 write per second after the command has been running for 10 minutes, making the code extremely inefficient.
My first attempt at fixing this problem was to clear the document manager right after it flushes, using $dm->clear().
But this meant that the document manager would lose track of the current ParentDocument. So my solution was this:
$dm = $this->getContainer()->get('doctrine_mongodb.odm.document_manager');
$parentDocumentCursors = $dm->getRepository('My:Bundle:ParentDocument')->findAll();

$parentDocuments = array();
while ($parentDocument = $parentDocumentCursors->getNext()) {
    array_push($parentDocuments, $parentDocument);
}

$dm->clear();
unset($dm);
$dm = $this->getContainer()->get('doctrine_mongodb.odm.document_manager');

foreach ($parentDocuments as $parentDocument) {
    $urls = $this->somehowFetchUrlsRelatedToTheParentDocument($parentDocument);

    foreach ($urls as $url) {
        $subDocument = new SubDocument();
        $subDocument->setUrl($url);
        $subDocument->setParentDocument($parentDocument);
        $dm->persist($subDocument);
    }

    $dm->flush();
    $dm->clear();
}
This solved the problem: write speeds stayed consistently fast for the whole run, and millions of rows could be inserted without the gradual slowdown.
However, this feels like bad practice and a quick-fix hack. What is the best practice for inserting millions of rows in Symfony2 using the document manager without read/write speeds becoming slow?
I would avoid using Symfony's document manager and use the batchInsert() function directly, as described in the documentation at http://php.net/manual/en/mongocollection.batchinsert.php. It feels to me like Doctrine's ODM is actually hurting you here.
In order to do a bulk insert in Doctrine you need to move your flush outside of your loop: persist in the foreach, then flush once the foreach has completed, as in the scenario below. The only catch is that you will not be able to query any of the data being inserted in the batch until after the flush.
$dm = $this->getContainer()->get('doctrine_mongodb.odm.document_manager');

foreach ($parentDocuments as $parentDocument) {
    $urls = $this->somehowFetchUrlsRelatedToTheParentDocument($parentDocument);

    foreach ($urls as $url) {
        $subDocument = new SubDocument();
        $subDocument->setUrl($url);
        $subDocument->setParentDocument($parentDocument);
        $dm->persist($subDocument);
    }
}

$dm->flush();
$dm->clear();
Another option is to do a push, pushAll, or addToSet.
One issue to consider is that you will need to use stdClass in PHP in order to add an object.
I find this to be the quickest way to update a subdocument.
For example:
$dm->createQueryBuilder('My:Bundle:ParentDocument')
    ->update()
    ->field('subDocument')->push((object) array('url' => $url))
    ->field('id')->equals($parentDocumentId)
    ->getQuery()
    ->execute();
I have employee details of more than 4000 employees. While retrieving those rows, I am faced with performance issues due to the looping. So what can I do to improve performance?
This is the looping I mentioned:
List<EmployeesEntityObject> lstEmployee = new List<EmployeesEntityObject>();
foreach (var item in lst)
{
    EmployeesEntityObject obj = new EmployeesEntityObject();
    obj.EmployeeID = item.EmployeeID;
    obj.EmployeeName = item.EmployeeName;
    lstEmployee.Add(obj);
}
You could try and see what happens when you completely "linqify" your code:
var lstEmployee = lst.Select(emp => new EmployeesEntityObject
{
    EmployeeID = emp.EmployeeID,
    EmployeeName = emp.EmployeeName
}).ToList();
But as marc_s says, there is no apparent cause for any performance issues in your code, unless the constructor and/or the property setters conceal time-consuming code. Both, by the way, would be inadvisable.
What is the best way to check if an object exists in the database from a performance point of view? I'm using Entity Framework 1.0 (ASP.NET 3.5 SP1).
If you don't want to execute SQL directly, the best way is to use Any(). This is because Any() will return as soon as it finds a match. Another option is Count(), but this might need to check every row before returning.
Here's an example of how to use it:
if (context.MyEntity.Any(o => o.Id == idToMatch))
{
    // Match!
}
And in VB.NET:
If context.MyEntity.Any(Function(o) o.Id = idToMatch) Then
    ' Match!
End If
From a performance point of view, I guess that a direct SQL query using the EXISTS command would be appropriate. See here for how to execute SQL directly in Entity Framework: http://blogs.microsoft.co.il/blogs/gilf/archive/2009/11/25/execute-t-sql-statements-in-entity-framework-4.aspx
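For illustration, a minimal sketch with plain ADO.NET; the table name MyEntities and the column Id are placeholders, not from the original post:
using System.Data.SqlClient;

// Sketch only: "MyEntities" and "Id" are placeholder names.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "SELECT CASE WHEN EXISTS (SELECT 1 FROM MyEntities WHERE Id = @id) THEN 1 ELSE 0 END",
    conn))
{
    cmd.Parameters.AddWithValue("@id", idToMatch);
    conn.Open();
    bool exists = (int)cmd.ExecuteScalar() == 1;
}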
I had to manage a scenario where the percentage of duplicates in the new data records was very high, so many thousands of database calls were being made to check for duplicates (and the CPU spent a lot of time at 100%). In the end I decided to keep the last 100,000 records cached in memory. This way I could check for duplicates against the cached records, which was extremely fast compared to a LINQ query against the SQL database, and then write any genuinely new records to the database (as well as add them to the data cache, which I also sorted and trimmed to keep its length manageable).
Note that the raw data was a CSV file that contained many individual records that had to be parsed. The records in each consecutive file (which came at a rate of about 1 every 5 minutes) overlapped considerably, hence the high percentage of duplicates.
In short, if you have timestamped raw data coming in, pretty much in order, then using a memory cache might help with the record duplication check.
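A rough sketch of that caching idea; the string key and the 100,000-record cap are assumptions, not the original implementation:
using System.Collections.Generic;

public class DuplicateChecker
{
    private const int CacheLimit = 100000;
    private readonly HashSet<string> _seenKeys = new HashSet<string>();
    private readonly Queue<string> _insertionOrder = new Queue<string>(); // oldest first, for trimming

    // Returns true if the key was already seen; otherwise records it as seen.
    public bool IsDuplicate(string recordKey)
    {
        if (!_seenKeys.Add(recordKey))
            return true; // cache hit: no database round-trip needed

        _insertionOrder.Enqueue(recordKey);
        if (_insertionOrder.Count > CacheLimit)
            _seenKeys.Remove(_insertionOrder.Dequeue()); // keep the cache bounded

        return false;
    }
}
Only records that pass this check need a database write; a unique constraint in the database can act as a final safety net for keys that have already aged out of the cache.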
I know this is a very old thread, but just in case someone like myself needs this solution in VB.NET, here's what I used based on the answers above.
Private Function ValidateUniquePayroll(PropertyToCheck As String) As Boolean
    ' Return True if the payroll number is unique
    Dim rtnValue = False
    Dim context = New CPMModel.CPMEntities
    If (context.Employees.Any()) Then ' Check if there are "any" records in the Employee table
        Dim employee = From c In context.Employees Select c.PayrollNumber ' Select just the PayrollNumber column to work with
        For Each item As Object In employee ' Loop through each employee in the Employees entity
            If (item = PropertyToCheck) Then ' Check if PayrollNumber in current row matches PropertyToCheck
                ' Found a match, so return False
                rtnValue = False
                Exit For
            Else
                ' No match so far, return True (unique)
                rtnValue = True
            End If
        Next
    Else
        ' There are currently no employees in the entity, so return True (unique)
        rtnValue = True
    End If
    Return rtnValue
End Function
I had some trouble with this - my EntityKey consists of three properties (a PK with 3 columns) and I didn't want to check each of the columns, because that would be ugly.
I thought about a solution that works every time, with any entity.
Another reason for this is that I don't like catching UpdateExceptions every time.
A little bit of Reflection is needed to get the values of the key properties.
The code is implemented as an extension to simplify the usage as:
context.EntityExists<MyEntityType>(item);
Have a look:
public static bool EntityExists<T>(this ObjectContext context, T entity)
    where T : EntityObject
{
    object value;
    var entityKeyValues = new List<KeyValuePair<string, object>>();
    var objectSet = context.CreateObjectSet<T>().EntitySet;
    foreach (var member in objectSet.ElementType.KeyMembers)
    {
        var info = entity.GetType().GetProperty(member.Name);
        var tempValue = info.GetValue(entity, null);
        var pair = new KeyValuePair<string, object>(member.Name, tempValue);
        entityKeyValues.Add(pair);
    }
    var key = new EntityKey(objectSet.EntityContainer.Name + "." + objectSet.Name, entityKeyValues);
    if (context.TryGetObjectByKey(key, out value))
    {
        return value != null;
    }
    return false;
}
I just check if the object is null; it works 100% for me.
try
{
    var ID = Convert.ToInt32(Request.Params["ID"]);
    var Cert = (from cert in db.TblCompCertUploads where cert.CertID == ID select cert).FirstOrDefault();
    if (Cert != null)
    {
        db.TblCompCertUploads.DeleteObject(Cert);
        db.SaveChanges();
        ViewBag.Msg = "Deleted Successfully";
    }
    else
    {
        ViewBag.Msg = "Not Found !!";
    }
}
catch
{
    ViewBag.Msg = "Something Went wrong";
}
Why not do it like this?
var result = ctx.table.Where(x => x.UserName == "Value").FirstOrDefault();
if (result?.field == value)
{
    // Match!
}
Best way to do it:
Regardless of what your object is and which database table it maps to, the only thing you need to have is the primary key on the object.
C# Code
var dbValue = EntityObject.Entry(obj).GetDatabaseValues();
if (dbValue == null)
{
    // Doesn't exist
}
VB.NET Code
Dim dbValue = EntityObject.Entry(obj).GetDatabaseValues()
If dbValue Is Nothing Then
    ' Doesn't exist
End If