I have a strange problem: when I try to use ObjectSet<TEntity>.AddObject() to add a new entity, it is very slow compared to the normal behavior. It takes around 500ms, it happens in just one place, and I am sure that the instruction which takes the time is AddObject and not SaveChanges.
The code is very simple:
ObjectEntity obj = businessObj.ExtractEntityObj();
context.ObjectTable.AddObject(obj);
I do some work on the ObjectContext before calling this part of the code, but it's not much more than what I do in all the other operations. It seems to depend on how many rows with the same foreign key are stored in the database; I know it sounds weird, but it's the only variable that changes the performance of this same instruction. I have also noticed that if I run it on a new context the performance goes back to normal (a few milliseconds), but for architectural reasons I can't create a new context every time I have to do something.
Could someone help me?
Thanks
I resolved my issue; maybe it could be helpful to someone. It was due to a simple query I was running before that instruction, this one:
int id = context.ObjectTable.Where(t => t.Something == somethingElse).FirstOrDefault().ID;
I still don't know why, but this instruction attaches all of the "ObjectTable" data to the context, making it very slow. "ObjectTable" is also the same table I was adding the new entity to.
Setting the ObjectQuery MergeOption to NoTracking resolved the problem.
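For example, one common way to do it looks roughly like this (a sketch based on the query above; ObjectTable, ObjectEntity and Something are the names from my code and will differ in yours):

var query = (ObjectQuery<ObjectEntity>)context.ObjectTable.Where(t => t.Something == somethingElse);
query.MergeOption = MergeOption.NoTracking;   // results are no longer attached to the context
int id = query.FirstOrDefault().ID;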
Bye
Modify your query
int id = context.ObjectTable.Where(t => t.Something == somethingElse).FirstOrDefault().ID;
to
int id = context.ObjectTable.FirstOrDefault(t => t.Something == somethingElse).ID;
It shouldn't make any significant performance difference, but why use Where() and then FirstOrDefault()?
Related
Recently, I have been doing some simple EF work. Very simple:
first,
List<Book> books = entity.Books.Where(c => c.processed == false).ToList();
then
foreach (var book in books)
{
    // Do some logic, update some properties
    book.ISBN = "ISBN " + randomNumber();
    book.processed = true;
    entity.SaveChanges();
}
I put entity.SaveChanges() inside the foreach because it is a large list, around 100k records; if a record is processed without problems, I flag it by setting book.processed = true, so that if the process is interrupted by an exception, I don't have to process the good records again next time.
Everything seemed OK to me. It is fast when you process hundreds of records, but when we moved to 100k records, entity.SaveChanges() became very, very slow: around 1-3 seconds per record. We then kept the entity model but replaced entity.SaveChanges() with a classic SqlHelper.ExecuteNonQuery("update_book", sqlparams), and it is very fast.
Could anyone tell me why Entity Framework is that slow? And if I still want to use entity.SaveChanges(), what is the best way to improve its performance?
Thank you
Turn off change tracking before you perform your inserts. This will improve your performance significantly (orders of magnitude). Putting SaveChanges() outside your loop will help as well, but turning off change tracking will help even more.
using (var context = new CustomerContext())
{
    context.Configuration.AutoDetectChangesEnabled = false;

    // A loop to add all your new entities

    context.SaveChanges();
}
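Applied to the loop from the question, it could look something like this (just a sketch; it assumes the question's entity context is a DbContext so that the Configuration property is available):

entity.Configuration.AutoDetectChangesEnabled = false;

foreach (var book in books)
{
    book.ISBN = "ISBN " + randomNumber();
    book.processed = true;
}

entity.Configuration.AutoDetectChangesEnabled = true;   // re-enable so SaveChanges() picks up the modifications
entity.SaveChanges();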
See this page for some more information.
I would take the SaveChanges() call outside of the foreach. Since the books come from the entity context as a list, you can save once outside the loop and EF will work better with the resulting code.
The list is an attribute on the entity, and EF is designed to optimize updates/creates/deletes on the back-end database. If you do this, I'd be curious whether it helps.
I would also advise you to take SaveChanges() out of the loop, as calling it inside performs 'n' separate updates against the database, and the context has to go through its change-tracking checks and validations 'n' times.
var books = entity.Books.Where(c => c.processed == false).ToList();

books.ForEach(b =>
{
    // Do some logic, update some properties
    b.ISBN = "ISBN " + randomNumber();
    b.processed = true;
});

entity.SaveChanges();
The Entity Framework, in my opinion, is a poor choice for BULK operations both from a performance and a memory consumption standpoint. Once you get beyond a few thousand records, the SaveChanges method really starts to break down.
You can try to partition your work over into smaller transactions, but again, I think you're working too hard to create this.
A much better approach is to leverage the BULK operations that are already provided by your DBMS. SQL Server provides bulk copy via .NET (SqlBulkCopy). Oracle provides bulk copy for Oracle.DataAccess and unmanaged data access; for Oracle.ManagedDataAccess the bulk copy library unfortunately isn't available, but I can create an Oracle stored procedure using BULK COLLECT/FORALL that lets me insert thousands of records within seconds, with a much lower memory footprint in the application. Within the .NET app you can pass PL/SQL associative arrays as parameters, etc.
The benefit of leveraging the BULK facilities within your DBMS is reducing the context switches between your application, the query processor and the database engine.
I'm sure other database vendors provide something similar.
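For SQL Server, a minimal SqlBulkCopy sketch could look like this (the table and column names are invented for illustration and have to match your actual schema):

using System.Data;
using System.Data.SqlClient;

var table = new DataTable();
table.Columns.Add("ISBN", typeof(string));
table.Columns.Add("Processed", typeof(bool));

foreach (var book in books)
    table.Rows.Add(book.ISBN, book.processed);

using (var bulkCopy = new SqlBulkCopy(connectionString))   // connectionString: your SQL Server connection string
{
    bulkCopy.DestinationTableName = "dbo.Books";
    bulkCopy.WriteToServer(table);   // streams all rows in one bulk operation instead of one INSERT per row
}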
"AsNoTracking" works for me
For example:
Item itemctx = ctx.Items.AsNoTracking().Single(i=>i.idItem == item.idItem);
ctx.Entry(itemctx).CurrentValues.SetValues(item);
itemctx.images = item.images;
ctx.SaveChanges();
Without "AsNoTracking" the updates is very slow.
I just execute the insert command directly.
// the Id property is an identity primary key, hence the IDENTITY_INSERT switch around the inserts
db.Database.ExecuteSqlCommand("SET IDENTITY_INSERT [dbo].[MyTable] ON");

foreach (var p in itemsToSave)
{
    db.Database.ExecuteSqlCommand(
        "INSERT INTO [dbo].[MyTable] ([Property1], [Property2]) VALUES (@Property1, @Property2)",
        new SqlParameter("@Property1", p.Property1),
        new SqlParameter("@Property2", p.Property2));
}

db.Database.ExecuteSqlCommand("SET IDENTITY_INSERT [dbo].[MyTable] OFF");
Works really quickly. EF is just impossibly slow with bulk updates, completely unusable in my case for more than a dozen or so items.
Use this Nuget package:
Z.EntityFramework.Extensions
It has extension methods that you can call on the DbContext, like DbContext.BulkSaveChanges, which works amazingly fast.
Note: this is NOT a free package, but it has a trial period.
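Usage is meant to be a drop-in replacement for SaveChanges(); roughly like this (BookContext is a made-up context name, and the exact API may differ between package versions):

using (var context = new BookContext())
{
    // ... add or modify entities as usual ...
    context.BulkSaveChanges();   // extension method from Z.EntityFramework.Extensions
}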
I want to set up Entity Framework so that SaveChanges() doesn't wrap all of its generated SQL in a BEGIN and a COMMIT. I came across this suggestion in a related SO question:
using (var transaction = new TransactionScope(TransactionScopeOption.Suppress))
{
    ObjectContext.SaveChanges();
}
But that doesn't work for me. We are using the MySQL connector, so I did some digging by stepping through its source code, and I found that Entity Framework is the one asking for a new transaction to be created; it is not the fault of the connector, see the following stack trace.
Next I started looking through EF's source code (EF 6 is open source, and the relevant code looks the same in the decompiler):
var needLocalTransaction = false;
var connection = (EntityConnection)Connection;

if (connection.CurrentTransaction == null
    && !connection.EnlistedInUserTransaction
    && _lastTransaction == null)
{
    needLocalTransaction = startLocalTransaction;
}

...

if (needLocalTransaction)
{
    localTransaction = connection.BeginTransaction();
}
Okay, so it looks like if a transaction doesn't exist, EF creates one itself. Stepping further into how the current transaction is set up, I get to the code where EF sets up the connection:
var currentTransaction = Transaction.Current;

EnsureContextIsEnlistedInCurrentTransaction(
    currentTransaction,
    () =>
    {
        Connection.Open();
        _openedConnection = true;
        _connectionRequestCount++;
        return true;
    },
    false);
And this seems to be the only place where TransactionScopeOption.Suppress comes into play; however, all it does is set the ambient transaction (Transaction.Current) to null, forcing EF not to see a transaction, which makes EF do the opposite of what I want and create a new transaction.
Has anyone else had any luck with turning off transactions in EF 5, or is my only option to hack together and build my own version of the SQL connector?
Thanks!
EF5 is hard-coded to wrap INSERTS and UPDATES into transactions. And in almost all cases, this is the desired behavior when batching writes of multiple entities.
If you want to change that you will have to write new code for yourself, compile it to an appropriate DLL, and reference the new code instead. A lot of work to get around a very important safety feature. I don't recommend it.
The question is why do you want to disable transactions so much?
Is it causing performance problems? Do you find the extra safety of the transaction to be detrimental to the business process?
It sounds like you don't care whether lost writes, phantom reads, etc. happen (which I assume because you want to skip transactions altogether), and that is a bad idea.
If that's not the case (I hope) and you're worried about performance, maybe your transaction isolation level is too high for what you need? You may want to wrap your calls in transactions that set a much lower (less safe, but with less locking and therefore better performance) isolation level, like READ UNCOMMITTED.
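For example, a rough sketch of lowering the isolation level with a TransactionScope instead of suppressing the transaction entirely (context here is a placeholder for your ObjectContext/DbContext):

using System.Transactions;

var options = new TransactionOptions { IsolationLevel = IsolationLevel.ReadUncommitted };

using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
{
    context.SaveChanges();
    scope.Complete();   // commit the scope; without this it rolls back
}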
In our EF implementation for a brand-new project, we have a GetAll method in the repository. But as our application grows, we are going to have, let's say, 5000 products, and getting all of them the way it is now would be a performance problem, wouldn't it?
If so, what is a good implementation to tackle this future problem?
Any help will be highly appreciated.
It could become a performance problem if GetAll enumerates the collection, causing the entire data set to be returned as an IEnumerable. It all depends on the number of joins, the size of the data set, whether you have lazy loading enabled, and how SQL Server performs, just to name a few factors.
Some people will say this is not desirable, but you could have GetAll() return an IQueryable, which would defer the query until something caused the collection to be filled by going to SQL. That way you could filter the results with other Where(), Take(), Skip(), etc. statements to allow for paging without having to retrieve all 5000+ products from the database.
It depends on how your repository class is set up. If you're performing the query immediately, i.e. if your GetAll() method returns something like IEnumerable<T> or IList<T>, then yes, that could easily be a performance problem, and you should generally avoid that sort of thing unless you really want to load all records at once.
On the other hand, if your GetAll() method returns an IQueryable<T>, then there may not be a problem at all, depending on whether you trust the people writing queries. Returning an IQueryable<T> would allow callers to further refine the search criteria before the SQL code is actually generated. Performance-wise, it would only be a problem if developers using your code didn't apply any filters before executing the query. If you trust them enough to give them enough rope to hang themselves (and potentially take your database performance down with them), then just returning IQueryable<T> might be good enough.
If you don't trust them, then, as others have pointed out, you could limit the number of records returned by your query by using the Skip() and Take() extension methods to implement simple pagination, but note that it's possible for records to slip through the cracks if people make changes to the database before you move on to the next page. Making pagination work seamlessly with an ever-changing database is much harder than a lot of people think.
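For illustration, a simple Skip/Take page query might look like this (page, pageSize and the ordering key are assumptions, and LINQ to Entities requires an OrderBy before Skip):

int page = 2, pageSize = 50;

var products = repository.GetAll()      // assuming GetAll() returns IQueryable<Product>
    .OrderBy(p => p.Id)                 // paging needs a stable ordering
    .Skip((page - 1) * pageSize)
    .Take(pageSize)
    .ToList();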
Another approach would be to replace your GetAll() method with one that requires the caller to apply a filter before returning results:
public IQueryable<T> GetMatching<T>(Expression<Func<T, bool>> filter)
{
    // Replace myQuery with Context.Set<T>() or whatever you're using for data access
    return myQuery.Where(filter);
}
and then use it like var results = GetMatching(x => x.Name == "foo");, or whatever you want to do. Note that this could be easily bypassed by calling GetMatching(x => true), but at least it makes the intention clear. You could also combine this with the first method to put a firm cap on the number of records returned.
My personal feeling, though, is that all of these ways of limiting queries are just insurance against bad developers getting their hands on your application, and if you have bad developers working on your project, they'll find a way to cause problems no matter what you try to do. So my vote is to just return an IQueryable<T> and trust that it will be used responsibly. If people abuse it, take away the GetAll() method and give them training-wheels methods like GetRecentPosts(int count) or GetPostsWithTag(string tag, int count) or something like that, where the query logic is out of their hands.
One way to improve this is by using pagination:
context.Products.OrderBy(p => p.Id).Skip(n).Take(m); // LINQ to Entities needs an OrderBy before Skip; Id here is just an example key
What you're referring to is known as paging, and it's pretty trivial to do using LINQ via the Skip/Take methods.
EF queries use deferred execution, which means they won't actually hit the database until they are enumerated, so the following only pulls ten rows after skipping the first ten:
context.Table.OrderBy(t => t.Id).Skip(10).Take(10); // an OrderBy is required before Skip in LINQ to Entities; Id here is just an example key
I'm pretty new to Entity Framework and am having some trouble adjusting. Like all things, in the beginning I saw how much simpler it made my life for CRUD operations and thought it was great. However, as my object model grew slightly more complex, I began to have more and more issues with it.
Although I have managed to find answers to virtually all my questions by searching on here to date, my current one stumps me.
I have two entities which are linked by a 1 to 1/0 relationship. I can't post images yet, so please forgive my dodgy drawing below:
Person (id, name, address, dob, etc. etc.)
Spouse (id, name, address, dob, etc. etc.)
Whilst the actual model is more complex, I don't think that's part of my problem.
Now, seeing this is a 1 to 0/1 relationship, either a person has 1 spouse or they have 0. If I construct my object like this:
Person person = new Person();
Spouse spouse = new Spouse();
person.Spouse = spouse;
(imagine property setting included)
and then save it. It works a treat. I can then load it again, edit it, etc. etc. Life is grand.
Where I run into problems is where I save a person (no spouse), then load the person for editing at a later stage and try to attach a spouse. When I get to my container.Attach(person); call, it throws the following exception: "An object with a temporary EntityKey value cannot be attached to an object context."
Now, I am extremely confident this is due to the way I'm adding the spouse, i.e.:
Person person = LoadPerson(id);
Spouse spouse = new Spouse();
person.Spouse1 = spouse;
The problem I'm seeing is that I'm trying to associate a child entity which was not previously associated. I have done a lot of searching on the error message, but generally it seems to relate to the object context. I have also found a workaround (from the results I have found): if I use a 1-to-many relationship and do Person.Spouses.Add(spouse), it works fine. However, I'm hesitant to do so as the relationship is not logical (in this country, anyway...). I'm sure it's a simple answer which I have obviously overlooked in the results I have seen (I highly doubt I'm the first to try and do this...), but nothing I have tried seems to work.
Any help would be highly appreciated...
You probably need to add the Spouse to the Context first
Context.Spouses.Add(spouse);
Before you attach it to the Person.
In our multi-tier business application we have ObservableCollections of Self-Tracking Entities that are returned from service calls.
The idea is we want to be able to get entities, add, update and remove them from the collection client side, and then send these changes to the server side, where they will be persisted to the database.
Self-Tracking Entities, as their name might suggest, track their state themselves.
When a new STE is created, it has the Added state; when you modify a property, it sets the Modified state. It can also have the Deleted state, but this state is not set when the entity is removed from an ObservableCollection (obviously). If you want this behavior, you need to code it yourself.
In my current implementation, when an entity is removed from the ObservableCollection, I keep it in a shadow collection, so that when the ObservableCollection is sent back to the server, I can send the deleted items along, so Entity Framework knows to delete them.
Something along the lines of:
protected IDictionary<int, IList> DeletedCollections = new Dictionary<int, IList>();

protected void SubscribeDeletionHandler<TEntity>(ObservableCollection<TEntity> collection)
{
    var deletedEntities = new List<TEntity>();
    DeletedCollections[collection.GetHashCode()] = deletedEntities;

    collection.CollectionChanged += (o, a) =>
    {
        if (a.OldItems != null)
        {
            deletedEntities.AddRange(a.OldItems.Cast<TEntity>());
        }
    };
}
Now if the user decides to save his changes to the server, I can get the list of removed items, and send them along:
ObservableCollection<Customer> customers = MyServiceProxy.GetCustomers();
customers.RemoveAt(0);
MyServiceProxy.UpdateCustomers(customers);
At this point the UpdateCustomers method will check my shadow collection to see whether any items were removed, and send them along to the server side.
This approach works fine, until you start to think about the life-cycle of these shadow collections. Basically, when the ObservableCollection is garbage collected, there is no way of knowing that we need to remove the shadow collection from our dictionary.
I came up with a complicated solution that basically does manual memory management in this case: I keep a WeakReference to the ObservableCollection and every few seconds I check to see if the reference is no longer alive, in which case I remove the shadow collection.
But this seems like a terrible solution... I hope the collective genius of StackOverflow can shed light on a better solution.
EDIT:
In the end I decided to go with subclassing the ObservableCollection. The service proxy code is generated so it was a relatively simple task to change it to return my derived type.
Thanks for all the help!
Instead of rolling your own "weak reference + poll: is it dead, is it alive" logic, you could use HttpRuntime.Cache (available from all project types, not just web projects).
Add each shadow collection to the Cache, either with a generous timeout, or a delegate that can check if the original collection is still alive (or both).
It isn't dreadfully different to your own solution, but it does use tried and trusted .Net components.
Other than that, you're looking at extending ObservableCollection and using that new class instead (which I'd imagine is no small change), or changing/wrapping the UpdateCustomers method to remove the shadow collection from DeletedCollections.
Sorry I can't think of anything else, but hope this helps.
BW
If replacing ObservableCollection is a possibility (e.g. if you are using a common factory for all the collection instances), then you could subclass ObservableCollection and add a finalizer which cleans up the deleted items that belong to this collection.
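As an illustration only (the class and property names are made up, and this variant keeps the removed items on the collection itself rather than in the external dictionary), such a subclass might look roughly like this:

using System.Collections.Generic;
using System.Collections.ObjectModel;

public class TrackingObservableCollection<T> : ObservableCollection<T>
{
    // Removed items live with the collection, so there is no external state to clean up
    public IList<T> DeletedItems { get; } = new List<T>();

    protected override void RemoveItem(int index)
    {
        DeletedItems.Add(this[index]);
        base.RemoveItem(index);
    }

    protected override void ClearItems()
    {
        foreach (var item in this)
            DeletedItems.Add(item);
        base.ClearItems();
    }
}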
Another alternative is to change the way you compute which items are deleted. You could maintain the original collection and give the client a shallow copy. When the collection comes back, you can compare the two to see which items are no longer present. If the collections are sorted, the comparison can be done in time linear in the size of the collection. If they're not sorted, the modified collection's values can be put in a hash table and used to look up each value in the original collection. If the entities have a natural id, then using that as the key is a safe way of determining which items are not present in the returned collection, that is, have been deleted. This also runs in linear time.
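For instance, with a natural id the deleted set can be computed in linear time along these lines (Customer and Id are placeholders):

// Compare the original snapshot with the collection that came back from the client
var returnedIds = new HashSet<int>(returnedCustomers.Select(c => c.Id));
var deletedItems = originalCustomers.Where(c => !returnedIds.Contains(c.Id)).ToList();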
Otherwise, your original solution doesn't sound that bad. In Java, a WeakReference can register a callback that gets called when the reference is cleared. There is no similar feature in .NET, but polling is a close approximation. I don't think this approach is so bad, and if it's working, then why change it?
As an aside, aren't you concerned about GetHashCode() returning the same value for distinct collections? Using a weak reference to the collection as the key might be more appropriate; then there is no chance of a collision.
I think you're on a good path; I'd consider refactoring in this situation. My experience is that in 99% of cases the garbage collector makes memory management awesome - almost no real work needed.
But in the 1% of cases it takes someone to realize that they've got to up the ante and go "old school" by firming up their caching/memory management in those areas. Hats off to you for realizing you're in that situation and for trying to avoid the IDisposable/WeakReference tricks. I think you'll really help the next guy who works in your code.
As for getting a solution, I think you've got a good grip on the situation:
- be clear when your objects need to be created
- be clear when your objects need to be destroyed
- be clear when your objects need to be pushed to the server
Good luck! Tell us how it goes :)