EF database concurrency - entity-framework

I have two apps: one is an ASP.NET app and the other is a Windows service running in the background.
The Windows service performs some tasks (reads and updates) on the database while users can perform other operations on the database through the ASP.NET app. So I am worried about the following case: in the Windows service I collect some records that satisfy a condition and then I iterate over them, something like:
IQueryable<EntityA> collection = context.EntitiesA.Where(<condition>);
foreach (EntityA entity in collection)
{
    // do some stuff
}
So, if the user modifies a record that is used later in the loop, which value for that record does EF take into account: the original one retrieved when
context.EntitiesA.Where(<condition>)
was executed, or the new one modified by the user and sitting in the database?
As far as I know, EF reads each record on demand during iteration, one by one. So when it reads the next record for the next iteration, does that record come from the result set produced by
context.EntitiesA.Where(<condition>)
or from the database (i.e. the value the user has just modified)?
Thanks!

There are a couple of processes that come into play here in terms of how this works in EF.
Queries are only performed on enumeration (this is sometimes referred to as query materialisation); at that point the whole query is executed.
Lazy loading only affects navigation properties, which don't appear in your example. The result set of the Where clause is pulled down in one go.
So what does this mean in your case:
// Nothing happens here: you are just describing the query that will run later.
// To make the query execute here, call .ToArray() or .ToList(); to prevent
// later operators being folded into the resulting SQL, use .AsEnumerable().
IQueryable<EntityA> collection = context.EntitiesA.Where(<condition>);
// When execution first hits this foreach, a
// SELECT {cols} FROM [YourTable] WHERE [YourCondition] is performed
foreach (EntityA entity in collection)
{
    // Data here is from the point in time the foreach started (e.g. if a row
    // was updated in the database during the enumeration, you will see out-of-date data).
    // do some stuff
}

If you're truly concerned that this can happen, then get a list of ids up front and process them individually with a new DbContext for each (or, say, after each batch of 10). Something like:
IList<int> collection = context.EntitiesA.Where(...).Select(k => k.id).ToList();
foreach (int entityId in collection)
{
    // a fresh context per entity re-reads the current database state
    using (Context batchContext = new Context())
    {
        EntityA entity = batchContext.EntitiesA.Find(entityId);
        // do some stuff
        batchContext.SaveChanges();
    }
}

I think the answer to your question is 'it depends'. The problem you are describing is called a 'non-repeatable read' and can be prevented by setting a proper transaction isolation level. But that comes with a cost in performance and potential deadlocks.
For more details you can read this
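For illustration, here is a minimal sketch (using System.Transactions, with the Context and EntitiesA names from the examples above) of running the read under a stricter isolation level:

using System.Transactions;

var options = new TransactionOptions { IsolationLevel = IsolationLevel.RepeatableRead };
using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
using (Context context = new Context())
{
    foreach (EntityA entity in context.EntitiesA.Where(<condition>))
    {
        // Rows read here keep their locks until the scope completes,
        // so another transaction cannot modify them mid-loop.
    }
    scope.Complete();
}

Note the trade-off described above: those very locks are what can block the ASP.NET app's writes and cause deadlocks.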

Related

Entity Framework: ARITHABORT ON, but the query is still slow

I have a simple query:
var count = await _context.ExchangeRate.AsNoTracking().CountAsync(u => u.Currency == "GBP");
The table has only 3 columns and 10 rows of data.
When I execute the query from a .NET 5 project it takes around 2.3 seconds the first time and 500 ms (+/- 100 ms) for subsequent requests. When I run the same query in SSMS it returns almost instantly (45 ms as seen in SQL Profiler).
I have implemented ARITHABORT ON in EF as described here.
In SQL Profiler I can see that ARITHABORT ON is being set, but the query still takes the same time for the first and subsequent requests.
How do I achieve the same speed as the SSMS query? I need the query to run really fast, as my project is required to return a response within 1 second (it needs to make at least 5 simple DB calls, so if 1 call takes 500 ms the 1-second budget is blown).
Edit
I also tried plain ADO.NET. The execution time as seen in SQL Profiler is 40 ms, whereas by the time it reaches the code it is almost 400 ms. That is a big difference.
using (var conn = new SqlConnection(connectionString))
{
    var sql = "select count(ExchangeRate) as cnt from ExchangeRate where Currency = 'GBP'";
    SqlCommand cmd = new SqlCommand();
    cmd.CommandText = "SET ARITHABORT ON; " + sql;
    cmd.CommandType = CommandType.Text;
    cmd.Connection = conn;
    conn.Open();
    var t1 = DateTime.Now;
    var rd = cmd.ExecuteReader();
    var t2 = DateTime.Now;
    TimeSpan diff = t2 - t1;
    Console.WriteLine((int)diff.TotalMilliseconds);
    while (rd.Read())
    {
        Console.WriteLine(rd["cnt"].ToString());
    }
    conn.Close();
}
Your "first run" scenario is generally the one-off static initialization of the DbContext. This is where the DbContext works out its mappings for the first time and will occur when the first query is executed. The typical approach to avoid this occurring for a user is to have a simple "warm up" query that runs when the service starts up.. For instance after your service initializes, simply put something like the following:
// Warm up the DbContext
using (var context = new AppDbContext())
{
    var hasUser = context.Users.Any();
}
This also serves as a quick start-up check that the database is reachable and responding. The query itself will do a very quick operation, but the DbContext will resolve its mappings at this time so any newly generated DbContext instances will respond without incurring that cost during a request.
As for raw performance, if it isn't a query that is expected to take a while and tie up a request, don't make it async. Asynchronous requests are not faster; they are actually a bit slower. Using async requests against the DbContext is about keeping your web server / application thread responsive while potentially expensive database operations are processing. If you want a response as quickly as possible, use a synchronous call.
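For example, the question's query in synchronous form (same names as the original code):

// Synchronous count: blocks the calling thread briefly, but skips the
// async state-machine overhead for a query this cheap.
var count = _context.ExchangeRate.Count(u => u.Currency == "GBP");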
Next, ensure that any fields you are filtering against, in this case Currency, are indexed. Having a field called Currency in your entity as a String rather than a CurrencyId FK (int) pointing to a Currency record is already an extra indexing expense as indexes on integers are smaller/faster than those on strings.
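If you are using EF Core code-first, a minimal sketch of declaring such an index (the entity and property names follow the question; this is illustrative, not your actual model configuration):

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Index the column used in the WHERE clause so the count can seek
    // rather than scan the table.
    modelBuilder.Entity<ExchangeRate>()
        .HasIndex(e => e.Currency);
}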
You also don't need to bother with AsNoTracking when using a Count query. AsNoTracking applies solely when you are returning entities (ToList/ToArray/Single/First, etc.) and avoids having the DbContext hold a reference to the returned entity. When you use Count/Any, or project properties from entities using Select, there is no entity returned to track.
Also consider network latency between where your application code is running and the database server. Are they the same machine, or is there a network connection in play? How does this compare when you are performing an SSMS query? Using a profiler you can see what SQL EF is actually sending to the database. Everything else in terms of time is the cost of getting the request to the DB, getting the resulting data back to the requester, and parsing that response. (When you return entities, this includes allocating and populating them and checking them against existing tracked references; for counts and the like, just checking existing references.)
Lastly, to get peak performance, ensure that your DbContext lifetimes are kept short. If a DbContext is kept open and has had a number of tracking queries run against it (selecting entities without AsNoTracking), those tracked entity references accumulate and can hurt the performance of future queries, even ones that use AsNoTracking, because EF looks through its tracked references for entities that might be applicable/related to your new queries. Many times I see developers assume DbContexts are "expensive" and instantiate them as rarely as possible to avoid those costs, only to end up making operations more expensive over time.
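As a rough sketch of that lifetime advice (reusing the hypothetical AppDbContext from the warm-up example above):

// One short-lived context per unit of work; tracked references never
// accumulate across operations.
using (var context = new AppDbContext())
{
    var count = context.ExchangeRate.Count(u => u.Currency == "GBP");
    // ... use count ...
} // disposed here; the next operation gets a fresh context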
With all that considered, EF will never be as fast as raw SQL. It is an ORM designed to provide convenience to .Net applications when it comes to working with data. That convenience in working with entity classes rather than sanitizing and writing your own raw SQL every time comes with a cost.

Spring Boot controller: preventing multiple inserts in MongoDB upon quick successive requests

I have a REST API that calculates something upon a request and, if the same request is made again, returns the result from a cache, which consists of documents saved in MongoDB. To know whether two requests are the same, I hash some relevant fields of the request. But when the same request is made in quick succession, duplicate documents appear in MongoDB, which later results in an "IncorrectResultSizeDataAccessException" when I try to read them.
To solve it I tried to synchronize on the hash value in the following controller method (non-relevant parts cut out):
@PostMapping(
    path = "/{myPath}",
    consumes = {MediaType.APPLICATION_JSON_UTF8_VALUE},
    produces = {MediaType.APPLICATION_JSON_UTF8_VALUE})
@Async("asyncExecutor")
public CompletableFuture<ResponseEntity<?>> retrieveAndCache( ... a, b, c, d various request parameters) {
    // perform some validations on request...
    // hash relevant request parameters
    int hash = Objects.hash(a, b, c, d);
    synchronized (Integer.toString(hash).intern()) {
        Optional<Result> resultOpt = cacheService.findByHash(hash);
        if (resultOpt.isPresent()) {
            return CompletableFuture.completedFuture(ResponseEntity.status(HttpStatus.OK).body(resultOpt.get().getResult()));
        } else {
            Result result = ... // perform requests to external services and do some calculations...
            cacheService.save(result);
            return CompletableFuture.completedFuture(ResponseEntity.status(HttpStatus.OK).body(result));
        }
    }
}
// cacheService methods
@Transactional
public Optional<Result> findByHash(int hash) {
    return repository.findByHash(hash); // this is the part that throws the error
}
I am sure no hash collision is occurring; it's just that when the same request is performed in quick succession, duplicate records appear. To my understanding, this shouldn't happen as long as I have only 1 running instance of my Spring Boot application. Do you see any reason for this other than multiple instances running in production?
You should check the settings of your MongoDB client.
If one thread calls the cacheService.save(result) method and releases the lock once that method returns, and another thread then calls cacheService.findByHash(hash), it is still possible that it will not find the record that was just saved.
It's possible, for example, that the save method returns as soon as the saved object is in the transaction log but before it is fully processed. Or the save is processed on the primary node, but the findByHash is executed on a secondary node, where it is not replicated yet.
You could use WriteConcern.MAJORITY, but I'm not 100% sure it covers everything.
Even better is to let MongoDB do the locking by using findAndModify with FindAndModifyOptions.upsert(true), and forget about the lock in your Java code.

Get int value from database

How can I get an int value from the database?
The table has 4 columns:
Id, Author, Like, Dislike.
I want to get the Dislike amount and add 1.
I tried:
var db = new memyContext();
var amountLike = db.Memy.Where(s => s.IdMema == id).Select(s => s.Like);
memy.Like = amountLike + 1;
I know this is the wrong way to do it.
Please help
I'm not entirely sure what your question is here, but there are a few things that might help.
First, if you're retrieving by something that reasonably has only one match, or you just want the first match, you should use SingleOrDefault or FirstOrDefault respectively, not Where. Where is reserved for scenarios where you expect multiple matches, i.e. the result is a list of objects, not a single object. Since you're querying by an id, you clearly expect just one match. Therefore:
var memy = db.Memy.SingleOrDefault(s => s.IdMema == id);
Second, if you just need to read the value of Like, then you can use Select, but there are two problems with that here. First, Select operates on the whole sequence, and as already discussed, you need a single object, not a list of objects. In truth, you can sidestep this in a somewhat convoluted way:
var amountLike = db.Memy.Where(s => s.IdMema == id).Select(s => s.Like).SingleOrDefault();
However, this is still flawed, because you not only need to read this value, but also write back to it, which then needs the context of the object it belongs to. As such, your code should actually look like:
var memy = db.Memy.SingleOrDefault(s => s.IdMema == id);
memy.Like++;
In other words, you pull out the instance you want to modify, and then modify the value in place on that instance. I also took the liberty of using the increment operator here, since it makes far more sense that way.
That then only solves part of your problem, as you need to persist this value back to the database as well, of course. That also brings up the side issue of how you're getting your context. Since this is an EF context, it implements IDisposable and should therefore be disposed when you're done with it. That can be achieved simply by calling db.Dispose(), but it's far better to use using instead:
using (var db = new memyContext())
{
    // do stuff with db
}
And while we're here, based on the tags of your question, you're using ASP.NET Core, which means that even this is sub-optimal. ASP.NET Core uses DI (dependency injection) heavily, and encourages you to do likewise. An EF context is generally registered as a scoped service, and should therefore be injected where it's needed. I don't have the context of where this code exists, but for illustration purposes, we'll assume it's in a controller:
public class MemyController : Controller
{
    private readonly memyContext _db;

    public MemyController(memyContext db)
    {
        _db = db;
    }

    ...
}
With that, ASP.NET Core will automatically pass in an instance of your context to the constructor, and you do not need to worry about creating the context or disposing of it. It's all handled for you.
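For completeness, the registration side of that looks roughly like the following in Startup.ConfigureServices (a sketch assuming EF Core's SQL Server provider; "Default" is a hypothetical connection string name for illustration):

public void ConfigureServices(IServiceCollection services)
{
    // Registers memyContext as a scoped service, created and disposed
    // per request by the DI container.
    services.AddDbContext<memyContext>(options =>
        options.UseSqlServer(Configuration.GetConnectionString("Default")));
    services.AddControllers();
}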
Finally, you need to do the actual persistence, but that's where things start to get trickier, as you now most likely need to deal with concurrency. This code could be running simultaneously on multiple different threads, each querying the database at its current state, incrementing the value, and then attempting to save it back. If you do nothing, one thread will inevitably overwrite the changes of another. For example, say we receive three simultaneous "likes" on this object. Each thread queries the object from the database, and say the current like count is 0. Each increments that value, making it 1, and each saves the result back to the database. The end result is that the value is 1, but that's not correct: three likes were just added.
As such, you'll need to implement a semaphore to essentially gate this logic, allowing only one like operation through at a time for this particular object. That's a bit beyond the scope here, but there's plenty of material online about how to achieve it; a rough sketch follows.
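A rough sketch of one in-process approach, assuming a single application instance (the LikeService class and the _locks dictionary are illustrative additions, not part of your code):

using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class LikeService
{
    // One gate per record id, so increments of the same record are serialized
    // while different records proceed in parallel.
    private static readonly ConcurrentDictionary<int, SemaphoreSlim> _locks =
        new ConcurrentDictionary<int, SemaphoreSlim>();

    private readonly memyContext _db;

    public LikeService(memyContext db)
    {
        _db = db;
    }

    public async Task IncrementLikeAsync(int id)
    {
        var gate = _locks.GetOrAdd(id, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync();
        try
        {
            var memy = _db.Memy.SingleOrDefault(s => s.IdMema == id);
            if (memy != null)
            {
                memy.Like++;
                await _db.SaveChangesAsync();
            }
        }
        finally
        {
            gate.Release();
        }
    }
}

Note this only serializes requests within one process; if you scale out to multiple instances you'd need a database-level mechanism (e.g. an atomic UPDATE or optimistic concurrency) instead.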

Does EF caching work differently for SQL Server CE 3.5?

I have been developing some single-user desktop apps using Entity Framework and SQL Server CE 3.5. I thought I had read somewhere that once records are in the EF cache for one context, then if they are deleted using a different context, they are not removed from the first context's cache even when a new query is executed. Hence, I've been writing really inefficient and obfuscatory code so I can dispose the context and instantiate a new one whenever another method modifies the database using its own context.
I recently discovered some code where I had not re-instantiated the first context under these conditions, but it worked anyway. I wrote a simple test method to see what was going on:
using (UnitsDefinitionEntities context1 = new UnitsDefinitionEntities())
{
    List<RealmDef> rdl1 = (from RealmDef rd in context1.RealmDefs
                           select rd).ToList();
    RealmDef rd1 = RealmDef.CreateRealmDef(100, "TestRealm1", MeasurementSystem.Unknown, 0);
    context1.RealmDefs.AddObject(rd1);
    context1.SaveChanges();
    int rd1ID = rd1.RealmID;
    using (UnitsDefinitionEntities context2 = new UnitsDefinitionEntities())
    {
        RealmDef rd2 = (from RealmDef r in context2.RealmDefs
                        where r.RealmID == rd1ID
                        select r).Single();
        context2.RealmDefs.DeleteObject(rd2);
        context2.SaveChanges();
        rd2 = null;
    }
    rdl1 = (from RealmDef rd in context1.RealmDefs select rd).ToList();
Setting a breakpoint at the last line I was amazed to find that the added and deleted entity was in fact not returned by the second query on the first context!
I can think of several possible explanations:
1. I am totally mistaken in my understanding that the cached records are not removed upon requerying.
2. EF is capricious in its caching and it's a matter of luck.
3. Caching has changed in EF 4.1.
4. The issue does not arise when the two contexts are instantiated in the same process.
5. Caching works differently for SQL CE 3.5 than for other versions of SQL Server.
I suspect the answer may be one of the last two options. I would really rather not deal with all the hassle of constantly re-instantiating contexts for single-user desktop apps if I don't have to.
Can I rely on this discovered behavior for single-user desktop apps using SQL CE (3.5 and 4)?
When you run the 2nd query on the ObjectSet, it re-queries the database, which is why it reflects the change made by your 2nd context. Before we go too far into this: are you sure you want to have 2 contexts like you're describing? Contexts should be short-lived, so it might be better to cache your list in memory or do something else of that nature.
That being said, you can inspect the local store by calling ObjectStateManager.GetObjectStateEntries and viewing what is in there. However, what you're probably looking for is the .Local property provided by DbSet in EF 4.2 and beyond. See this blog post for more information about that.
Judging by your class names, it looks like you're using an edmx, so you'll need to make some changes to your file so that your context inherits from DbContext and exposes DbSet properties instead of ObjectSets. This post can show you how.
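As a hedged illustration (assuming the context has been converted to the DbContext API as just described), .Local exposes the tracked entities without another round trip to the database:

using System.Data.Entity; // for the Load() extension method

using (var context = new UnitsDefinitionEntities())
{
    context.RealmDefs.Load();              // one database query; entities become tracked
    var tracked = context.RealmDefs.Local; // in-memory view of the tracked entities
    // Adds and removes made through this context show up in 'tracked'
    // without re-querying the database.
}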
Apparently Explanation #1 was closer to the fact. Inserting the following statement at the end of the example:
var cached = context1.ObjectStateManager.GetObjectStateEntries(System.Data.EntityState.Unchanged);
revealed that the record was in fact still in the cache. Mark Oreta was essentially correct in that the database is actually re-queried in the above example.
However, navigation properties apparently behave differently, e.g.:
RealmDef distance = (from RealmDef rd in context1.RealmDefs
                     where rd.Name == "Distance"
                     select rd).Single();
SystemDef metric = (from SystemDef sd in context1.SystemDefs
                    where sd.Name == "Metric"
                    select sd).Single();
RealmSystem rs1 = (from RealmSystem rs in distance.RealmSystems
                   where rs.SystemID == metric.SystemID
                   select rs).Single();
UnitDef ud1 = UnitDef.CreateUnitDef(distance.RealmID, metric.SystemID, 100, "testunit");
rs1.UnitDefs.Add(ud1);
context1.SaveChanges();
using (UnitsDefinitionEntities context2 = new UnitsDefinitionEntities())
{
    UnitDef ud2 = (from UnitDef ud in context2.UnitDefs
                   where ud.Name == "testunit"
                   select ud).Single();
    context2.UnitDefs.DeleteObject(ud2);
    context2.SaveChanges();
}
List<UnitDef> udList = (from UnitDef ud in rs1.UnitDefs select ud).ToList();
In this case, breaking after the last statement reveals that the last query returns the deleted entry from the cache. This was my source of confusion.
I think I now have a better understanding of what Julia Lerman meant by "Query the model, not the database." As I understand it, in the previous example I was querying the database. In this case I am querying the model. Querying the database in the previous situation happened to do what I wanted, whereas in the latter situation querying the model would not have the desired effect. (This is clearly a problem with my understanding, not with Julia's advice.)

Get duplicate key exception when deleting and then re-adding child rows with Entity Framework

I'm using the Entity Framework to model a simple parent-child relationship between a document and its pages. The following code is supposed to (in this order):
1. make a few property updates to the document
2. delete any of the document's existing pages
3. insert a new list of pages passed into the method.
The new pages do have the same keys as the deleted pages because there is an index that consists of the document number and then the page number (1..n).
This code works. However, when I remove the first call to SaveChanges, it fails with:
System.Data.SqlClient.SqlException: Cannot insert duplicate key row in object 
'dbo.DocPages' with unique index 'IX_DocPages'.
Here is the working code with two calls to SaveChanges:
Document doc = _docRepository.GetDocumentByRepositoryDocKey(repository.Repository_ID, repositoryDocKey);
if (doc == null) {
    doc = new Document();
    _docRepository.Add(doc);
}
_fieldSetter.SetDocumentFields(doc, fieldValues);
List<DocPage> pagesToDelete = (from p in doc.DocPages
                               select p).ToList();
foreach (DocPage page in pagesToDelete) {
    _docRepository.DeletePage(page);
}
_docRepository.GetUnitOfWork().SaveChanges(); // IF WE TAKE THIS OUT IT FAILS
int pageNo = 0;
foreach (ConcordanceDatabase.PageFile pageFile in pageList) {
    ++pageNo;
    DocPage newPage = new DocPage();
    newPage.PageNumber = pageNo;
    newPage.ImageRelativePath = pageFile.Filespec;
    doc.DocPages.Add(newPage);
}
_docRepository.GetUnitOfWork().SaveChanges(); // WHY CAN'T THIS BE THE ONLY CALL TO SaveChanges
If I leave the code as written, EF creates two transactions, one for each call to SaveChanges. The first updates the document and deletes any existing pages; the second inserts the new pages. I examined the SQL trace and that is what I see.
However, if I remove the first call to SaveChanges (because I'd like the whole thing to run in a single transaction), EF mysteriously does not appear to do the deletes at all and generates only the inserts, which results in the duplicate key error. I wouldn't have thought that waiting to call SaveChanges should matter here.
Incidentally, the call to _docRepository.DeletePage(page) does an objectContext.DeleteObject(page). Can anyone explain this behavior? Thanks.
I think a more likely explanation is that EF does do the deletes, but probably executes them after the inserts, so you end up passing through an invalid state.
Unfortunately you don't have low-level control over the order in which DbCommands are executed in the database.
So you need two SaveChanges() calls.
One option is to create a wrapping TransactionScope.
Then you can call SaveChanges() twice and it all happens inside the same transaction.
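A minimal sketch of that pattern (reusing the repository calls from the question; requires a reference to System.Transactions):

using (var scope = new TransactionScope())
{
    // ... delete the existing pages ...
    _docRepository.GetUnitOfWork().SaveChanges(); // flushes the deletes

    // ... add the replacement pages ...
    _docRepository.GetUnitOfWork().SaveChanges(); // flushes the inserts

    scope.Complete(); // both batches commit as a single transaction
}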
See this post for more information on the related techniques.
Hope this helps,
Alex
Thank you Alex. This is very interesting. I did indeed decide to wrap the whole thing in a transaction scope, and it worked fine with the two SaveChanges() calls, which, as you point out, appear to be needed due to the primary key conflict between the deletes and the subsequent inserts. A new issue now arises based on the article you linked. It properly advises calling SaveChanges(false), instructing EF to hold onto its changes because the outer transaction scope actually controls whether those changes ever make it to the database. Once the controlling code calls scope.Complete(), the pattern is to then call EF's context.AcceptAllChanges(). But I think this will be problematic for me because I'm forced to call SaveChanges TWICE for the problem originally described. If both of those calls pass false for the accept-changes parameter, I suspect the second call will end up repeating the SQL from the first. I fear I may be in a Catch-22.