Entity framework core and ChangeTracker memory consumption - entity-framework

Using entity framework core 6:
In my aspnet core project, my controllers get a DB context through dependency injection with service setup like the following:
...
builder.Services.AddDbContext<MyDBContext>(options => options.UseSqlServer(dbConnectionString));
...
The controllers in the application do several database operations that are not inter-related. Means:
Step 1:
Query Table A, get a large result, then do CUD on it
_dbContect.SaveChanges();
Step 2:
Query Table B, get a large result, then do CUD on it
_dbContect.SaveChanges();
Step 3:
Query Table C, get a large result, then do CUD on it
_dbContect.SaveChanges();
Question:
1- Should I call _dbContect.ChangeTracker.Clear(); after each SaveChanges(); to save memory?
2- Is there a better way of handling _dbContect excess change tracking memory consumption?
3- Should I even be worry about change tracking memory consumption?

Related

JPA First level cache and when its filled

working with Spring data JPA and reading it Hibernate first level cache is missed, the answer says "Hibernate does not cache queries and query results by default. The only thing the first level cache is used is when you call EntityManger.find() you will not see a SQL query executing. And the cache is used to avoid object creation if the entity is already loading."
So, if If get an entity not by its Id but other criteria, if I update some property I should not see an update sql inside a transactional methods because it has not been stored int the first level cache, right?
According to the above answer, if I get some list of entities, they will not be stored in first level cache not matter the criteria I use to find them, right?
When a Transactional(propagation= Propagation.NEVER) method loads the same entity by its id two times, is not supposed it will hit the database two times because each loading will run in its own "transaction" and will have its own persistent context? What is the expected behaviour in this case?
Thanks

How does EntityFramework Core mange data internally?

I'm trying to understand how EntityFramework Core manages data internally because it influences how I call DbSets. Particularly, does it refer to in-memory data or re-query the database every time?
Example 1)
If I call _context.ToDo.Where(x => x.id == 123).First() and then in a different procedure call the same command again, will EF give me the in-memory value or re-query the DB?
Example 2)
If I call _context.ToDo.Where(x => x.id == 123).First() and then a few lines later call _context.ToDo.Find(123).Where(x => x.id == 123).Incude(x => x.Children).First(), will it use the in-memeory and then only query the DB for "Children" or does it recall the entire dataset?
I guess I'm wondering if it matters if I duplicate a call or not?
Is this affected by the AsNoTracking() switch?
What you really ask is how caching works in EF Core, not how DbContext manages data.
EF always offered 1st level caching - it kept the entities it loaded in memory, as long as the context remains alive. That's how it can track changes and save all of them when SaveChanges is called.
It doesn't cache the query itself, so it doesn't know that Where(....).First() is meant to return those specific entities. You'd have to use Find() instead. If tracking is disabled, no entities are kept around.
This is explained in Querying and Finding Entities, especially Finding entities using primary keys:
The Find method on DbSet uses the primary key value to attempt to find an entity tracked by the context. If the entity is not found in the context then a query will be sent to the database to find the entity there. Null is returned if the entity is not found in the context or in the database.
Find is different from using a query in two significant ways:
A round-trip to the database will only be made if the entity with the given key is not found in the context.
Find will return entities that are in the Added state. That is, Find will return entities that have been added to the context but have not yet been saved to the database.
In Example #2 the queries are different though. Include forces eager loading, so the results and entities returned are different. There's no need to call that a second time though, if the first entity and context are still around. You could just iterate over the Children property and EF would load the related entities one by one, using lazy loading.
EF will execute 1 query for each child item it loads. If you need to load all of them, this is slow. Slow enough to be have its own name, the N+1 selects problem. To avoid this you can load a related collection explicitly using explicit loading, eg. :
_context.Entry(todo).Collection(t=>t.Children).Load();
When you know you're going to use all children though, it's better to eagerly load all entities with Include().

What's the point of running an EF migration when you can SQL directly in database?

How to create View (SQL) from Entity Framework in ABP Framework
Not allowed to post comments because of reputation. Just trying to get more information on connecting a database to an Entity Framework, without having to switch to a code-first development style. View selected answer's response (he told the OP to basically do the same thing he was going to do in the DB but with EF, and then added an extra step where EF "...ignores..." the previous instructions...
I want to create tables and design database directly in SQL, and have the csharp library just read/write the table values (kind of like how dapper function where it isnt replacing your database, just working along side of it).
The tutorials don't talk about how to integrate your databases with your project. It either brushes over the subject, ignores it completely, or discusses how to replace it.
I don't want to do any EF migrations (i dont want/need to destroy/create database everytime i decide to run, duplicate, or transfer project). Any and all database back-track (back-up/restore) should be done with and thru SQL (within my work environment).
Just to be clear on exactly what i'm trying to learn:
How does somebody who specializes in database administration (building database schema, managing and monitoring data, and has existing database with data established) connect to project to fetch data (again, specifically referencing Dapper's Query functionality).
I want to integrate and design micro-services, some may share the same database connection or rely on another. But i just simply want to read data in a clean strongly-typed class entity, and maybe deal with insert/update somewhere else if i have to.
I would prefer to use Dapper instead of EF, but ABP is so heavily integrated with EF's design, it's more of a headache to avoid it, than it is to just go along with.
You should be able to map EF under ABP the same way as any other project using DB-first configuration.
The consistent approach I use for EF: (DB-First)
Define entities to match the table/view structure.
Define configuration classes extending EntityTypeConfiguration<TEntity> with the associated ToTable(), HasKey(), and any HasMany/HasRequired/HasOptional for relationships as needed.
In DbContext.OnModelCreating: modelBuilder.Configurations.AddFromAssembly(GetType().Assembly); to load all entity configurations. (assuming DbContext is in the same assembly as the models/configurations Substitute GetType().Assembly to point at the entity assembly.
Turn off Migrations. In DbContext constructor: Database.SetInitializer<MyDbContext>(null);
EF offers a lot more than simply mapping tables to classes. By mapping relationships between entities, EF can help generate optimized queries for retrieving data across those related entities. This can allow you to flatten data structures without returning unnecessary data, replace the need for views, and generally reduce the amount of data coming across the wire from the database to the application server.

Aggregate Root support in Entity Framework

How can we tell Entity Framework about Aggregates?
when saving an aggregate, save entities within the aggregate
when deleting an aggregate, delete entities within the aggregate
raise a concurrency error when two different users attempt to modify two different entities within the same aggreate
when loading an aggregate, provide a consistent point-in-time view of the aggregate even if there is some time delay before we access all entities within the aggregate
(Entity Framework 4.3.1 Code First)
EF provides features which allows you defining your aggregates and using them:
This is the most painful part. EF works with entity graphs. If you have an entity like Invoice and this entity has collection of related InvoiceLine entities you can approach it like aggregate. If you are in attached scenario everything works as expected but in detached scenario (either aggregate is not loaded by EF or it is loaded by different context instance) you must attach the aggregate to context instance and tell it exactly what did you changed = set state for every entity and independent association in object graph.
This is handled by cascade delete - if you have related entities loaded, EF will delete them but if you don't you must have cascade delete configured on the relation in the database.
This is handled by concurrency tokens in the database - most commonly either timestamp or rowversion columns.
You must either use eager loading and load all data together at the beginning (= consistent point of view) or you will use lazy loading and in such case you will not have consistent point of view because lazy loading will load current state of relations but it will not update other parts of aggregate you have already loaded (and I consider this as performance killer if you try to implement such refreshing with EF).
I wrote GraphDiff specifically for this purpose. It allows you to define an 'aggregate boundary' on update by providing a fluent mapping. I have used it in cases where I needed to pass detached entity graphs back and forth.
For example:
// Update method of repository
public void Update(Order order)
{
context.UpdateGraph(order, map => map
.OwnedCollection(p => p.OrderItems);
}
The above would tell the Entity Framework to update the order entity and also merge the collection of OrderItems. Mapping in this fashion allows us to ensure that the Entity Framework only manages the graph within the bounds that we define on the aggregate and ignores all other properties. It supports optimistic concurrency checking of all entities. It handles much more complicated scenarios and can also handle updating references in many to many scenarios (via AssociatedCollections).
Hope this can be of use.

Entity Framework - what's the difference between using Include/eager loading and lazy loading?

I've been trying to familiarize myself with the Entity Framework. Most of it seems straight forward, but I'm a bit confused on the difference between eager loading with the Include method and default lazy loading. Both seem like they load related entities, so on the surface it looks like they do the same thing. What am I missing?
Let's say you have two entities with a one-to-many relationship: Customer and Order, where each Customer can have multiple Orders.
When loading up a Customer entity, Entity Framework allows you to either eager load or lazy load the Customer's Orders collection. If you choose to eager load the Orders collection, when you retrieve a Customer out of the database Entity Framework will generate SQL that retrieves both the Customer's information and the Customer's Orders in one query. However, if you choose to lazy load the Orders collection, when you retrieve a Customer out of the database Entity Framework will generate SQL that only pulls the Customer's information (Entity Framework will then generate a separate SQL statement if you access the Customer's Orders collection later in your code).
Determining when to use eager loading and when to use lazy loading all comes down to what you expect to do with the entities you retrieve. If you know you only need a Customer's information, then you should lazy-load the Orders collection (so that the SQL query can be efficient by only retrieving the Customer's information). Conversely, if you know you'll need to traverse through a Customer's Orders, then you should eager-load the Orders (so you'll save yourself an extra database hit once you access the Customer's Orders in your code).
P.S. Be very careful when using lazy-loading as it can lead to the N+1 problem. For example, let's say you have a page that displays a list of Customers and their Orders. However, you decide to use lazy-loading when fetching the Orders. When you iterate over the Customers collection, then over each Customer's Orders, you'll perform a database hit for each Customer to lazy-load in their Orders collection. This means that for N customers, you'll have N+1 database hits (1 database hit to load up all the Customers, then N database hits to load up each of their Orders) instead of just 1 database hit had you used eager loading (which would have retrieved all Customers and their Orders in one query).
If you come from SQL world think about JOIN.
If you have to show in a grid 10 orders and the customer that put the order you have 2 choices:
1) LAZY LOAD ( = 11 queryes = SLOW PERFORMANCES)
EF will shot a query to retrieve the orders and a query for each order to retrieve the customer data.
Select * from order where order=1
+
10 x (Select * from customer where id = (order.customerId))
1) EAGER LOAD ( = 1 query = HIGH PERFORMANCES)
EF will shot a single query to retrieve the orders and customers with a JOIN.
Select * from orders INNER JOIN customers on orders.customerId=customer.Id where order=1
PS:
When you retrieve an object from the db, the object is stored in a cache while the context is active.
In the example that I made with LAZY LOAD, if all the 10 orders relate to the same customer you will see only 2 query because when you ask to EF to retrieve an object the EF will check if the object is in the cache and if it find it will not fire another SQL query to the DB.
Eager loading is intended to solve the N+1 Selects problem endemic to ORMs. The short version is this: If you are going to directly retrieve some number of entities and you know you will be accessing certain related entities via the retrieved entities, it is much more efficient to retrieve all the related entities up-front in one pass, as compared to retrieving them incrementally via lazy loading.
An important issue is serialization. Microsoft recommends NOT using the default lazy loading if you're dealing with serialized objects. Serialization causes ALL related properties to be called, which can start a chain reaction of related entities being queried. This really comes into play if you're returning JSON data from a controller. JSON data is obviously serialized. You'd either want to return data immediately via Eager or turn the lazyloading off in the context and employ Explicit Lazy loading.