Entity Framework memory leak in Azure worker role - entity-framework

I am investigating my worker role memory dumps using WinDbg.
I got the following results by grabbing dumps every half hour from WaWorkerHost.exe locally. The first column is the object count, the second is the total size in bytes. Also, the most expensive objects in the dump are of type string.
35360 3394560 System.Data.Objects.EntitySqlQueryState
40256 3864576 System.Data.Objects.EntitySqlQueryState
45152 4334592 System.Data.Objects.EntitySqlQueryState
I found that class here: http://entityframework.codeplex.com/SourceControl/latest#src/EntityFramework/Core/Objects/Internal/EntitySqlQueryState.cs
As you can see, it caches the query string.
Is it possible that Entity Framework caches these objects without ever releasing them?
I found an article showing that NHibernate can:
http://rasmuskl.dk/2008/12/19/a-windbg-debugging-journey-nhibernate-memory-leak/
On the production server, the worker role automatically restarts every day when it runs out of RAM.
I am using Entity Framework 5 and Azure SDK 2.5.
Please help me with this issue. What would you advise?
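For illustration only (this is not from the original post): EF 5 keeps a query plan cache keyed by the query text, so Entity SQL built by concatenating literal values produces a distinct EntitySqlQueryState entry per value, and the cache grows for the lifetime of the process. A hedged sketch, assuming a hypothetical ObjectContext named MyObjectContext with a Users entity set:
// Hypothetical names: MyObjectContext, Users, User, and userId are placeholders.
using (var context = new MyObjectContext())
{
    // Leaky pattern: embedding the value in the Entity SQL text means every
    // distinct userId creates its own plan-cache entry.
    var leaky = context.CreateQuery<User>(
        "SELECT VALUE u FROM MyObjectContext.Users AS u WHERE u.Id = " + userId);

    // Parameterized pattern: the query text stays constant, so the cache holds
    // a single entry no matter how many values are used.
    var query = context.CreateQuery<User>(
        "SELECT VALUE u FROM MyObjectContext.Users AS u WHERE u.Id = @id",
        new System.Data.Objects.ObjectParameter("id", userId));

    // If the cache still grows, plan caching can be turned off per query.
    query.EnablePlanCaching = false;
}
If the app only issues LINQ queries, similar internal Entity SQL can come from relationship/lazy loading; running !gcroot on a few of these objects in WinDbg would show whether they are actually held by the plan cache.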

Related

GCP datastore sudden extreme data inconsistency (NDB 1.8.0)

I have a 6-month-old Python 3.8 App Engine Standard project in the europe-west3 region, along with Firestore in Datastore mode.
With or without Redis as a global cache, I have never had any inconsistency issues. An immediate fetch after a put (insert), about 1 second later via the redirect, yielded fresh results, up until last week. I did some benchmarking and it now takes around 30 seconds for a put to become visible to a global query. It behaves much like the Datastore emulator with the consistency parameter set to 0.05.
I have read a lot about Datastore and its eventual consistency, but as the documentation says, this applies to the "old" version. The new Firestore in Datastore mode should ensure strong consistency, as per this part:
Eventual consistency: all Datastore queries become strongly consistent.
Am I interpreting this claim wrong?
I have also created a fresh project (same region) with only the essential NDB initialization, and I still see the extreme "lag".
I'm running out of ideas about what could cause this new behavior. Could it be that the Warsaw datacenter just came online and this is causing the issues?
Abstract code with google-cloud-ndb==1.8.0:
import time
from google.cloud import ndb

class X(ndb.Model):
    foo = ndb.StringProperty()

client = ndb.Client()
with client.context():
    x = X(foo="a")
    x.put()
    time.sleep(5)
    for y in X.query():  # returns 0 results
        print(y)
If I get the entity by its key, it is there and fresh. It even shows up instantly in the Datastore admin console.
This was also filed as https://github.com/googleapis/python-ndb/issues/666. It turns out that Cloud NDB before 1.9.0 was explicitly requesting eventually consistent queries.

MS Application Insights - Sql Dependencies error code 208

What does error 208 mean? The query:
dependencies
| where type == "SQL" and success == "False"
| summarize count() by resultCode
is giving me 4500+ items in the last hour alone, and I can't seem to find any solid documentation about it.
Details:
The frequency of the error rises as concurrency rises, meaning 1000 concurrent requests will generate more errors than 1000 sequential ones.
My application is ASP.NET MVC 4 on .NET Framework 4.6, using the latest EF.
The error is intermittent; performing a given operation does not reliably reproduce it.
I don't think this error means "Invalid Object Name" (as per other threads), because I can see EF auto-retrying and the call eventually goes through and the whole request returns successfully (otherwise I would have A LOT of missed phone calls...).
The error occurs on both async and sync requests.
I got in touch with MS support and, according to them, this is caused by Entity Framework. Apparently EF keeps looking for two tables (__MigrationHistory and EdmMetadata) that I deliberately deleted. Although that makes sense, I don't know why the error does not show up in our in-house tests (the tables are not present in the in-house dev environment either...).
The above answer is correct; however, I'd like to add some additional information:
You need to have the __MigrationHistory table, and it has to be populated correctly. EdmMetadata is an old table that was replaced by __MigrationHistory, so there is no need to worry about that one.
Just adding the __MigrationHistory table did not solve the issue completely (I went from 5 error-208 exceptions down to 3).
However, keep in mind that populating the __MigrationHistory table will leave your DbContext out of sync if the latest migration is not inserted into it!
The best way to get the correct rows is to issue the
Update-Database -Script
command and copy the CREATE/INSERT/UPDATE statements from its output.
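Not part of the answers above, but a related mitigation worth mentioning: if you manage the schema yourself and never want EF to probe for __MigrationHistory / EdmMetadata at all, you can disable the database initializer so the model-compatibility check (the source of those probes) never runs. A minimal sketch; MyDbContext is a placeholder name:
// Requires: using System.Data.Entity;
public class MyDbContext : DbContext   // placeholder context name
{
    static MyDbContext()
    {
        // Passing null disables Code First database initialization, so EF stops
        // looking for __MigrationHistory / EdmMetadata on first use of the context.
        Database.SetInitializer<MyDbContext>(null);
    }
}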

Exporting/Importing a Breeze Model and hasTempKey issues

When you create a new entity, Breeze sets id: -1, state: 'Added', hasTempKey: true. After export and re-import, Breeze doesn't merge the imported -1 entity with the current -1 entity in memory; it adds a new one. This is explained in the docs (but how to overcome this problem is the question in my case). So I tried calling setUnchanged() on the created entity. Now the export/import cycle runs as expected, but the created entity has lost its hasTempKey: true property, so a newly created entity can conflict with a current one. Some advice on how to resolve these issues would really be appreciated. Thanks.
I assume that this question relates to your approach to implementing a disconnected app as described in this SO question.
As I said there, I think you're spending too much time trying to trick Breeze into doing what it should not do. Here, for example, you want EntityManager.saveChanges to not actually save to the remote data store. But the whole point of "saveChanges" is that it persists permanently. "Saving locally" is not really saving. No one but you knows about these saved changes. You don't know if they would pass the business validation rules on your server or if they would collide with a different user's changes. If your laptop dies or is stolen, your locally saved data are gone.
I think Breeze can be a huge help in crafting occasionally connected applications. But I think it is critical to properly differentiate between stashing changes locally with the intent to save them and actually saving them remotely.
Outline of a Disconnected App
I urge you to take a different tack.
Your app could easily initiate a sequence of distinct editing sessions. For example, one session could be a travel reservation for client 'A', another session for client 'B', and a third session is about something else entirely ... maybe the client 'C' profile.
When your app can't reach the server, it preserves each session as a WIP ("Work In Progress"). Each WIP session is its own serialized bundle, identified by a WIP key.
Aside: you'll see this pattern in John Papa's "Building Apps with Angular and Breeze Part 2" when that comes out later this year.
The Breeze EntityManager.exportEntities(list_of_entities) serializes everything about the changed entities of that session including their change-state, original values, and temp keys. Remember that the list_of_entities can be anything including an object graph. You can save that bundle to browser local storage under the WIP key and restore it later.
I'd keep a directory of WIP sessions that included information about the state of the session as a whole (e.g., what kind of editing session it is and whether this session was ready to be persisted remotely). Your app presents WIP sessions to the user while offline. When it gets a connection, it goes through a "synchronization" phase during which it tries to persist the changes. With luck it succeeds. If not, you can rehydrate the session in the UI and help the user reconcile the conflicts.
These are broad strokes. The devil is in the details.
The critical thing in this context is that you do not mess with entity state or the temp keys. You don't care what the keys are or if they change. The serialized session will hold that state information for you. The serialized bundles will move in and out of local storage without complaint. You are using Breeze as intended, while offline or online.
The current behavior is deliberate. Generally we assume that temp keys are effectively not comparable. However, I do understand your use case. So, one approach would be to:
1) Import your "exported entities" into a temporary EntityManager and check for temp key collisions between this EntityManager and your "destination" EntityManager.
2) Remove any "dups" from the destination EntityManager.
3) Import your original "exported entities" into your destination EntityManager.
You can actually skip step 1 if you know that all tempKeys are dups.
Another approach would be to use GUIDs for your keys. This completely bypasses the temp key issue because GUIDs never need to be "temporary".

Arquillian Persistence Extension - Long execution time, is it normal?

I'm writing some tests with Arquillian for the persistence layer in my app. I would like to use the Persistence Extension for database population, etc. The problem is that a single test takes about 15-25 seconds. Is that normal, or am I doing something wrong? I've tried running these tests against a local Postgres database (~10 sec per test), a remote Postgres database (~15 sec per test), and HSQLDB in a local container (~15 sec per test).
Thanks in advance
P.S. When I'm not using the Persistence Extension, 12 tests take about ~11 sec (and that's acceptable), but then I have to persist and delete entities from code (hard to maintain and manage).
I am going to guess you are using APE (Arquillian Persistence Extension) v1.0.0a6. If this is the case, what you are experiencing is the result of refactoring done between alpha5 and alpha6, against which I filed the following ticket: https://issues.jboss.org/browse/ARQ-1440
You could try using 1.0.0a5, which has some different issues that you might encounter and need to work around, but it has 300% better performance than alpha6.

Issue with Entity Framework 4.2 Code First taking a long time to add rows to a database

I am using Entity Framework 4.2 with Code First. I have a Windows 2008 application server and a database server running on Amazon EC2. The application server has a Windows Service installed that runs once per day. The service executes the following code:
// returns between 2000-4000 records
var users = userRepository.GetSomeUsers();
// do some work
foreach (var user in users)
{
    var userProcessed = new UserProcessed { User = user };
    userProcessedRepository.Add(userProcessed);
}
// Calls SaveChanges() on DbContext
unitOfWork.Commit();
This code takes a few minutes to run. It also maxes out the CPU on the application server. I have tried the following measures:
Removed the unitOfWork.Commit() call to see if the slowdown was network-related when the application server talks to the database. This did not change the outcome.
Changed my application server from a medium instance to a high-CPU instance on Amazon to see if it was resource-related. The server no longer maxed out the CPU and the execution time improved slightly, but it still took a few minutes.
As a test, I modified the above code to run three times to see how the execution time changed for the second and third loops when using the same DbContext. Each consecutive loop took longer to run than the previous one, which could be related to reusing the same DbContext.
Am I missing something? Is it really possible that something as simple as this takes minutes to run? Even if I don't commit to the database after each loop? Is there a way to speed this up?
Entity Framework (as it stands) isn't really well suited to this kind of bulk operation. Are you able to use one of the bulk insert methods with EC2? Otherwise, you might find that hand-coding the T-SQL INSERT statements is significantly faster. If performance is important then that probably outweighs the benefits of using EF.
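To make that suggestion concrete (this is not from the original answer): on SQL Server, SqlBulkCopy is one such bulk insert method. A rough sketch; the table name, column, and connectionString are placeholders:
// Requires: using System.Data; using System.Data.SqlClient;
var table = new DataTable();
table.Columns.Add("UserId", typeof(int));   // placeholder column
foreach (var user in users)
{
    table.Rows.Add(user.Id);
}

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.UserProcessed";   // placeholder table
        // One streamed round trip instead of thousands of individual INSERTs.
        bulkCopy.WriteToServer(table);
    }
}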
My guess is that your ObjectContext is accumulating a lot of entity instances. SaveChanges seems to have a phase whose time is linear in the number of entities loaded, which is likely why each consecutive run takes longer and longer.
A way to resolve this is to use multiple, smaller ObjectContexts to get rid of old entity instances.
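As a rough illustration of that approach (placeholder names, not the poster's actual code): process the users in batches, create a fresh context per batch, and turn off automatic change detection while adding:
// Requires: using System.Linq;
const int batchSize = 500;
for (int i = 0; i < users.Count; i += batchSize)       // assumes users is a List<User>
{
    using (var context = new MyDbContext())             // placeholder context type
    {
        // Avoid running DetectChanges on every Add; it gets slower as the
        // context tracks more entities.
        context.Configuration.AutoDetectChangesEnabled = false;

        foreach (var user in users.Skip(i).Take(batchSize))
        {
            // Set the foreign key rather than the navigation property so the
            // detached User entity is not re-inserted by the new context.
            context.UsersProcessed.Add(new UserProcessed { UserId = user.Id });
        }

        context.SaveChanges();                           // small, short-lived context per batch
    }
}
Each batch commits independently, so the context never tracks more than batchSize entities; the trade-off is that a failure partway through leaves earlier batches already saved.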