Entity Framework 5 Memory Leak - entity-framework

The application is a Windows Service that is retrieving a Generic List of my entity and then based on the data returned, I am running other tasks. This data is read-only. There will be no requirement to update the data in the backend store (Oracle 11g).
Why do I believe I have a memory leak?
Despite disposing the context via a using statement, it doesn't seem the GC is harvesting these objects. While profiling the application, I have noticed that the Heap memory gradually increases throughout the lifetime of the application. I also notice more than a few GEN 2 Collections, which of course means objects are existing for quite some time before being collected.
This doesn't occur when I just return a manually generated collection of objects. In that case I see very few GEN 2 collections (good) and the Heap memory doesn't increase gradually.
I am running Entity Framework 5.0/C# 4.5. I am accessing an Oracle database. I am using the database first approach.
Besides disposing the context immediately, what else can I do to ensure that the GC will collect the context?
public IEnumerable<EventServer> GetRegionalEventServers()
{
IEnumerable<EventServer> eventServers = null;
using (AssetEntities context = new AssetEntities())
{
context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.LazyLoadingEnabled = false;
context.Configuration.ProxyCreationEnabled = false;
try
{
eventServers = from p in context.EVENTSERVERS
select new EventServer
{
ServerName = p.SERVER_NAME,
Domain = p.DOMAIN,
ServerLocation = p.SERVER_LOCATION
};
return eventServers;
}
finally
{
eventServers = null;
}
}
}

Related

avoid concurrent access of postgres db

We have two .net services (.Net core console applications) which are accessing a postgres db table.
Service 1 inserts some 500 rows every 1 minute. It runs as a background thread.
Service 2 reads data from the same table continuously. There is an MQTT publisher which keeps reading data from this table when any new data is requested. This also happens very frequently i.e atleast 4/5 times a minute.
We are getting "FATAL: sorry, too many clients already " error.
What I am assuming is since write and read is happening simultaneously too frequently, the connection is not getting dispose properly.
Is there a way to avoid read whenever a write is happening.
EDITED
Thanks for the reply.. I know some connection pooling is happening but not sure where.. so my question was how to avoid concurrent access of postgres db..
Was not sure what part of code I can post to make the question clear
I am having using clause on dbcontext and also disposed like the below..
This is retrieval section
using (PlatinumDBContext platinumDBContext = new PlatinumDBContext())
{
try
{
var data = platinumDBContext.TrendPoints.Where(x => ids.Contains(x.TrendPointID) && x.TimeStamp >= DateTime.Now.AddHours(-timeinHours));
result = data.Select(x => new Last24hours
{
Label = x.TrendPointID.ToString(),
Value = (double)x.TrendPointValue,
time = x.TimeStamp.ToString("MM/dd/yyyy HH:mm:ss")
}).ToList();
}
catch (Exception oE)
{
}
finally {
platinumDBContext.Dispose();
}
}
This is the insertion section
using (PlatinumDBContext platinumDBContext = new PlatinumDBContext())
{
try
{
foreach (var point in trendPoints)
{
if (point != null)
{
TrendPoint item = new TrendPoint();
item.CreatedDate = DateTime.Now;
item.ObjectState = ObjectState.Added;
item.TrendPointID = point.TrendID;
item.TrendPointValue = double.IsNaN(point.Value) ? decimal.MinValue : (decimal)point.Value;
item.TimeStamp = new DateTime(point.TimeStamp);
platinumDBContext.Add(item);
}
}
platinumDBContext.SaveChanges();
}
catch (Exception ex)
{
}
finally
{
platinumDBContext.Dispose();
}
}
Regards,
Geervani

What impact does changing a IReliableQueue to a IReliableConcurrentQueue have in an existing deployment?

I am working in a Service Fabric application that uses IReliableQueue. For the uses cases of this system, the IReliableConcurrentQueue makes sense to use and some local testing (i.e. basically by just changing the code to use IReliableConcurrentQueue instead of IReliableQueue - queue name does not change) shows great performance improvements. However, I am worried about the impact of changing this in a production system (i.e. upgrading). I can't find any docs or online questions (unless I just missed them) about these considerations. For example, in this system, the existing IReliableQueue will almost always have items. So what happens to that data when I upgrade the SF application? Will it be available to dequeue in the IReliableConcurrentQueue? Or would data be lost? I know I can "just try it" but wanted to see if someone out there had done the same or could offer pointers to existing resources. Thanks!
Sorry for a late answer (that you probably don't need anymore but still).
When we calling GetOrAddAsync method on IReliableStateManager we aren't retrieving the interface to store values - we actually creating an instance of reliable collection. This basically means that type of the interface we specify is very important.
Taking this into account if we do this:
Service v. 1.0
// Somewhere in RunAsync for example
await this.StateManager.GetOrAddAsync<IReliableQueue<long>>("MyCollection")
Then doing this in the next version:
Service v. 1.1
// Somewhere in RunAsync for example
await this.StateManager.GetOrAddAsync<IReliableConcurrentQueue<long>>("MyCollection")
will throw an exception:
Returned reliable object of type Microsoft.ServiceFabric.Data.Collections.DistributedQueue`1[System.Int64] cannot be casted to requested type Microsoft.ServiceFabric.Data.Collections.IReliableConcurrentQueue`1[System.Int64]
and then:
System.ExecutionEngineException: 'Exception of type 'System.ExecutionEngineException' was thrown.'
The above exception looks like a bug so I have filled one.
UPDATE 2019.06.28
It turned out that appearance of System.ExecutionEngineException isn't a bug but rather an undocumented behavior of Environment.FailFast method in combination with Visual Studio debugger.
Please see my comment to the above issue.
This is what would happen.
There are plenty ways to overcome this.
Here is the most obvious one:
Example
var migrate = false; // This flag indicates whether the migration was already done.
var migrateValues = new List<long>();
var applicationFlags = await this.StateManager
.GetOrAddAsync<IReliableDictionary<string, bool>>("application-flags");
using (var transaction = this.StateManager.CreateTransaction())
{
var flag = await applicationFlags
.TryGetValueAsync(transaction, "queue-to-concurrent-queue-migration");
if (!flag.HasValue || !flag.Value)
{
var queue = await this.StateManager
.GetOrAddAsync<IReliableQueue<long>>("value-collection");
for (;;)
{
var c = await queue.TryDequeueAsync(transaction);
if (!c.HasValue)
{
break;
}
migrateValues.Add(c.Value);
}
migrate = true;
}
}
if (migrate)
{
await this.StateManager.RemoveAsync("value-collection");
using (var transaction = this.StateManager.CreateTransaction())
{
var concurrentQueue = await this.StateManager
.GetOrAddAsync<IReliableConcurrentQueue<long>>("value-collection");
foreach (var i in migrateValues)
{
await concurrentQueue.EnqueueAsync(transaction, i);
}
await applicationFlags.AddOrUpdateAsync(
transaction,
"queue-to-concurrent-queue-migration",
true,
(s, b) => true);
}
await transaction.CommitAsync();
}
Please note that this code is just an illustrative example and should be properly tested before applying it to real life application.

Parallel.Foreach and BulkCopy

I have a C# library which connects to 59 servers of the same database structure and imports data to my local db to the same table. At this moment I am retrieving data server by server in foreach loop:
foreach (var systemDto in systems)
{
var sourceConnectionString = _systemService.GetConnectionStringAsync(systemDto.Ip).Result;
var dbConnectionFactory = new DbConnectionFactory(sourceConnectionString,
"System.Data.SqlClient");
var dbContext = new DbContext(dbConnectionFactory);
var storageRepository = new StorageRepository(dbContext);
var usedStorage = storageRepository.GetUsedStorageForCurrentMonth();
var dtUsedStorage = new DataTable();
dtUsedStorage.Load(usedStorage);
var dcIp = new DataColumn("IP", typeof(string)) {DefaultValue = systemDto.Ip};
var dcBatchDateTime = new DataColumn("BatchDateTime", typeof(string))
{
DefaultValue = batchDateTime
};
dtUsedStorage.Columns.Add(dcIp);
dtUsedStorage.Columns.Add(dcBatchDateTime);
using (var blkCopy = new SqlBulkCopy(destinationConnectionString))
{
blkCopy.DestinationTableName = "dbo.tbl";
blkCopy.WriteToServer(dtUsedStorage);
}
}
Because there are many systems to retrieve data, I wonder if it is possible to use Pararel.Foreach loop? Will BulkCopy lock the table during WriteToServer and next WriteToServer will wait until previous will complete?
-- EDIT 1
I've changed Foreach to Parallel.Foreach but I face one problem. Inside this loop I have async method: _systemService.GetConnectionStringAsync(systemDto.Ip)
and this line returns error:
System.NotSupportedException: A second operation started on this
context before a previous asynchronous operation completed. Use
'await' to ensure that any asynchronous operations have completed
before calling another method on this context. Any instance members
are not guaranteed to be thread safe.
Any ideas how can I handle this?
In general, it will get blocked and will wait until the previous operation complete.
There are some factors that may affect if SqlBulkCopy can be run in parallel or not.
I remember when adding the Parallel feature to my .NET Bulk Operations, I had hard time to make it work correctly in parallel but that worked well when the table has no index (which is likely never the case)
Even when worked, the performance gain was not a lot faster.
Perhaps you will find more information here: MSDN - Importing Data in Parallel with Table Level Locking

Code First - Retrieve and Update Record in a Transaction without Deadlocks

I have a EF code first context which represents a queue of jobs which a processing application can retrieve and run. These processing applications can be running on different machines but pointing at the same database.
The context provides a method that returns a QueueItem if there is any work to do, or null, called CollectQueueItem.
To ensure no two applications can pick up the same job, the collection takes place in a transaction with an ISOLATION LEVEL of REPEATABLE READ. This means that if there are two attempts to pick up the same job at the same time, one will be chosen as the deadlock victim and be rolled back. We can handle this by catching the DbUpdateException and return null.
Here is the code for the CollectQueueItem method:
public QueueItem CollectQueueItem()
{
using (var transaction = new TransactionScope(TransactionScopeOption.Required, new TransactionOptions { IsolationLevel = IsolationLevel.RepeatableRead }))
{
try
{
var queueItem = this.QueueItems.FirstOrDefault(qi => !qi.IsLocked);
if (queueItem != null)
{
queueItem.DateCollected = DateTime.UtcNow;
queueItem.IsLocked = true;
this.SaveChanges();
transaction.Complete();
return queueItem;
}
}
catch (DbUpdateException) //we might have been the deadlock victim. No matter.
{ }
return null;
}
}
I ran a test in LinqPad to check that this is working as expected. Here is the test below:
var ids = Enumerable.Range(0, 8).AsParallel().SelectMany(i =>
Enumerable.Range(0, 100).Select(j => {
using (var context = new QueueContext())
{
var queueItem = context.CollectQueueItem();
return queueItem == null ? -1 : queueItem.OperationId;
}
})
);
var sw = Stopwatch.StartNew();
var results = ids.GroupBy(i => i).ToDictionary(g => g.Key, g => g.Count());
sw.Stop();
Console.WriteLine("Elapsed time: {0}", sw.Elapsed);
Console.WriteLine("Deadlocked: {0}", results.Where(r => r.Key == -1).Select(r => r.Value).SingleOrDefault());
Console.WriteLine("Duplicates: {0}", results.Count(r => r.Key > -1 && r.Value > 1));
//IsolationLevel = IsolationLevel.RepeatableRead:
//Elapsed time: 00:00:26.9198440
//Deadlocked: 634
//Duplicates: 0
//IsolationLevel = IsolationLevel.ReadUncommitted:
//Elapsed time: 00:00:00.8457558
//Deadlocked: 0
//Duplicates: 234
I ran the test a few times. Without the REPEATABLE READ isolation level, the same job is retrieved by different theads (seen in the 234 duplicates). With REPEATABLE READ, jobs are only retrieved once but performance suffers and there are 634 deadlocked transactions.
My question is: is there a way to get this behaviour in EF without the risk of deadlocks or conflicts? I know in real life there will be less contention as the processors won't be continually hitting the database, but nonetheless, is there a way to do this safely without having to handle the DbUpdateException? Can I get performance closer to that of the version without the REPEATABLE READ isolation level? Or are Deadlocks not that bad in fact and I can safely ignore the exception and let the processor retry after a few millis and accept that the performance will be OK if the not all the transactions are happening at the same time?
Thanks in advance!
Id recommend a different approach.
a) sp_getapplock
Use an SQL SP that provides an Application lock feature
So you can have unique app behaviour, which might involve read from the DB or what ever else activity you need to control. It also lets you use EF in a normal way.
OR
b) Optimistic concurrency
http://msdn.microsoft.com/en-us/data/jj592904
//Object Property:
public byte[] RowVersion { get; set; }
//Object Configuration:
Property(p => p.RowVersion).IsRowVersion().IsConcurrencyToken();
a logical extension to the APP lock or used just by itself is the rowversion concurrency field on DB. Allow the dirty read. BUT when someone goes to update the record As collected, it fails if someone beat them to it. Out of the box EF optimistic locking.
You can delete "collected" job records later easily.
This might be better approach unless you expect high levels of concurrency.
As suggested by Phil, I used optimistic concurrency to ensure the job could not be processed more than once. I realised that rather than having to add a dedicated rowversion column I could use the IsLocked bit column as the ConcurrencyToken. Semantically, if this value has changed since we retrieved the row, the update should fail since only one processor should ever be able to lock it. I used the fluent API as below to configure this, although I could also have used the ConcurrencyCheck data annotation.
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
modelBuilder.Entity<QueueItem>()
.Property(p => p.IsLocked)
.IsConcurrencyToken();
}
I was then able to simple the CollectQueueItem method, losing the TransactionScope entirely and catching the more DbUpdateConcurrencyException.
public OperationQueueItem CollectQueueItem()
{
try
{
var queueItem = this.QueueItems.FirstOrDefault(qi => !qi.IsLocked);
if (queueItem != null)
{
queueItem.DateCollected = DateTime.UtcNow;
queueItem.IsLocked = true;
this.SaveChanges();
return queueItem;
}
}
catch (DbUpdateConcurrencyException) //someone else grabbed the job.
{ }
return null;
}
I reran the tests, you can see it's a great compromise. No duplicates, nearly 100x faster than with REPEATABLE READ, and no DEADLOCKS so the DBAs won't be on my case. Awesome!
//Optimistic Concurrency:
//Elapsed time: 00:00:00.5065586
//Deadlocked: 624
//Duplicates: 0

Get connection used by DatabaseFactory.GetDatabase().ExecuteReader()

We have two different query strategies that we'd ideally like to operate in conjunction on our site without opening redundant connections. One strategy uses the enterprise library to pull Database objects and Execute_____(DbCommand)s on the Database, without directly selecting any sort of connection. Effectively like this:
Database db = DatabaseFactory.CreateDatabase();
DbCommand q = db.GetStoredProcCommand("SomeProc");
using (IDataReader r = db.ExecuteReader(q))
{
List<RecordType> rv = new List<RecordType>();
while (r.Read())
{
rv.Add(RecordType.CreateFromReader(r));
}
return rv;
}
The other, newer strategy, uses a library that asks for an IDbConnection, which it Close()es immediately after execution. So, we do something like this:
DbConnection c = DatabaseFactory.CreateDatabase().CreateConnection();
using (QueryBuilder qb = new QueryBuilder(c))
{
return qb.Find<RecordType>(ConditionCollection);
}
But, the connection returned by CreateConnection() isn't the same one used by the Database.ExecuteReader(), which is apparently left open between queries. So, when we call a data access method using the new strategy after one using the old strategy inside a TransactionScope, it causes unnecessary promotion -- promotion that I'm not sure we have the ability to configure for (we don't have administrative access to the SQL Server).
Before we go down the path of modifying the query-builder-library to work with the Enterprise Library's Database objects ... Is there a way to retrieve, if existent, the open connection last used by one of the Database.Execute_______() methods?
Yes, you can get the connection associated with a transaction. Enterprise Library internally manages a collection of transactions and the associated database connections so if you are in a transaction you can retrieve the connection associated with a database using the static TransactionScopeConnections.GetConnection method:
using (var scope = new TransactionScope())
{
IEnumerable<RecordType> records = GetRecordTypes();
Database db = DatabaseFactory.CreateDatabase();
DbConnection connection = TransactionScopeConnections.GetConnection(db).Connection;
}
public static IEnumerable<RecordType> GetRecordTypes()
{
Database db = DatabaseFactory.CreateDatabase();
DbCommand q = db.GetStoredProcCommand("GetLogEntries");
using (IDataReader r = db.ExecuteReader(q))
{
List<RecordType> rv = new List<RecordType>();
while (r.Read())
{
rv.Add(RecordType.CreateFromReader(r));
}
return rv;
}
}