I'm trying to use Entity Framework for my model/data access, and running into speed issues, hopefully someone can assist?
What I've been doing is using the EF diagram with the default code generator to generate partial classes describing anything that is going to be persisted. I then have partial classes with methods and non-persisted properties. These might be simple things like full name as the concatenated first/last names, or derived from related entities, such as total stock as the sum of the collection of stock locations' quantity.
Any methods accessing related entities do work, but seem to be very slow. Here's an example of an especially slow one, it takes about 6-7 seconds:
Quick description of entities involved:
Supplier --> supplies many SupplierLines, each has cost price
SupplierLine --> broken down into StockLines
StockLine --> has many Locations, each location has quantity
So I'm trying to add a method to get the total stock value from a supplier, i.e. mySupplier.StockValue() which should obviously be the total of cost price x total quantity for each supplier line and its stock lines.
I've done this as a function in Supplier as:
Public Function StockValue() As Decimal
Return SupplierLines.
Sum(Function(sul) sul.LastPrice * sul.StockLines.Sum(Function(skl) skl.Locations.Sum(Function(l) l.Quantity)))
End Function
Which gives correct results, but takes forever to do so.
Any thoughts as to how I can get better results?
I want to keep my model classes persistence-ignorant
I want to keep all my logic compile-checked
I want to everything to be easily unit-testable using fake data sources
I don't really want to pre-load this information, because it isn't always needed
Your problem is most probably lazy loading. If you load only Supplier entity it doesn't load its realted SupplierLine instances and they related StockLine instances and their related Location instances. If this is really case (if you didn't Include them in query retrieving Supplier) the situation is as follows:
SupplierLines. - executes query to database to get all lines for current Supplier
sul.StockLines. - executes separate query for each supplier line to get its stock lines
skl.Locations. - executes separate query for each stock line to get its locations
So depending on amount of data you have in these collections you can end up with tens to thousands sql queries executed in your first call to StockValue. Next call will be fast because data are already loaded.
If you want to avoid it you must retrieve supplier with all its realted data:
context.Supplier.Include("SupplierLines.StockLines.Locations").Where(...);
I found an Include()... method that extends IEnumerable, and use that. It resolves the performance issues, while maintaining ignorance of the context.
Related
After reading dozens of articles and watching hours of videos, I don't seem to get an answer to a simple question:
Should static data be included in the events of the write/read models?
Let's take the oh-so-common "orders" example.
In all examples you'll likely see something like:
class OrderCreated(Event):
....
class LineAdded(Event):
itemID
itemCount
itemPrice
But in practice, you will also have lots of "static" data (products, locations, categories, vendors, etc).
For example, we have a STATIC products table, with their SKUs, description, etc. But in all examples, the STATIC data is never part of the event.
What I don't understand is this:
Command side: should the STATIC data be included in the event? If so, which data? Should the entire "product" record be included? But a product also has a category and a vendor. Should their data be in the event as well?
Query side: should the STATIC data be included in the model/view? Or can/should it be JOINED with the static table when an actual query is executed.
If static data is NOT part of the event, then the projector cannot add it to the read model, which implies that the query MUST use joins.
If static data IS part of the event, then let's say we change something in the products table (e.g. typo in the item description), this change will not be reflected in the read model.
So, what's the right approach to using static data with ES/CQRS?
Should static data be included in the events of the write/read models?
"It depends".
First thing to note is that ES/CQRS are a distraction from this question.
CQRS is simply the creation of two objects where there was previously only one. -- Greg Young
In other words, CQRS is a response to the idea that we want to make different trade offs when reading information out of a system than when writing information into the system.
Similarly, ES just means that the data model should be an append only sequence of immutable documents that describe changes of information.
Storing snapshots of your domain entities (be that a single document in a document store, or rows in a relational database, or whatever) has to solve the same problems with "static" data.
For data that is truly immutable (the ratio of a circle's circumference and diameter is the same today as it was a billion years ago), pretty much anything works.
When you are dealing with information that changes over time, you need to be aware of the fact that that the answer changes depending on when you ask it.
Consider:
Monday: we accept an order from a customer
Tuesday: we update the prices in the product catalog
Wednesday: we invoice the customer
Thursday: we update the prices in the product catalog
Friday: we print a report for this order
What price should appear in the report? Does the answer change if the revised prices went down rather than up?
Recommended reading: Helland 2015
Roughly, if you are going to need now's information later, then you need to either (a) write the information down now or (b) write down the information you'll need later to look up now's information (ex: id + timestamp).
Furthermore, in a distributed system, you'll need to think about the implications when part of the system is unavailable (ex: what happens if we are trying to invoice, but the product catalog is unavailable? can we cache the data ahead of time?)
Sometimes, this sort of thing can turn into a complete tangle until you discover that you are missing some domain concept (the invoice depends on a price from a quote, not the catalog price) or that you have your service boundaries drawn incorrectly (Udi Dahan talks about this often).
So the "easy" part of the answer is that you should expect time to be a concept you model in your solution. After that, it gets context sensitive very quickly, and discovering the "right" answer may involve investigating subtle questions.
I am dealing with CoreData, for training, I decided to create a small application for recording user income and expenses. CoreData tutorials all contain To-Do-List examples, and I haven't found any good examples that would help me.
// MARK: - Grammar
// I want to apologize for grammatical errors in the text. Unfortunately,
// English is not my native language, so in some places I used a translator.
If something is not clear, I will definitely try to explain it again.
When I began to think over how I would implement the application, I assumed that the most convenient way would be to save all user operations and make calculations in the application in the right places. So far, abstract, since It seems to me that this has little to do with the question, if you need to be more precise, I can provide a complete idea.
So, I'm going to save the user model, which will have the following data:
User operations (Operation type) - all operations will be saved, each operation includes the category for which the operation was performed, as well as the amount in currency.
User-selected categories (Category Type) - Categories that will be used for expenses or income when adding an operation.
Wallets (Type Wallet) - User's wallets, Everything is simple, the name, and the balance on it.
Budget Units (BudgetUnit Type) - These are user budgets, contains a category, and a budget for it. For example: Products - 10.000 $
When I started building dependencies in CoreData, I got a little strange behavior.
That is, the user has a relationship on the same category model as the Budget Unit and Operation. Something tells me that it won't work that way.
I want the user categories to be independent, he selected them, and I'm going to display them on the main screen, and each operation will have its own category model
In the picture above, the category model is used 3 times, the same model.
This is roughly how I represent the data structure that I would like to see. Different models have their own category model, independently of the others.
I think it could be implemented using 3 different models with the same values, but it seems to me that this approach is considered wrong.
So how do you properly implement the data model so that everything works as expected? I would be grateful for any help!
--- EDIT ---
As a solution to the problem, I can create multiple entities as Category (Example bellow)
But I don't know if this is good practice
I looked into several other open source projects and saw a solution to the problem.
I hope this helps someone in the future.
There is no need to save the categories for the user, you can simply save the categories in the application by adding the IsSelected and ID parameter to them in order to change these parameters when you select a category, and immediately understand which ones you need to display.
For budgets and operations (transactions) , we only need to save the category ID to immediately display the correct one.
For example:
Thanks #JoakimDanielson and #Moose for helping. It gave me a different view of the subject.
Let's say I have a large collection of tasks stored in DB and I want to retrieve the latest one according to requesting user's permissions. The permissions checking logic is complex and not related to the persistence layer, hence I can't put it in an SQL query. What I'm doing today is retrieving ALL tasks from DB ordered by descending date, then filter them by permissions set and taking first one. Not a perfect solution: I retrieve thousands of objects when I need only one.
My question is: how can I materialize objects coming from DB until I find one that matches my criteria and discards rest of results?
I thought about one solution, but couldn't find information regarding EF Core behavior in this case and don't know how to check it myself:
Build the IQueryable, cast to IEnumerable, then iterate over it and take the first good task. I know that IQueryable part will be executed on Server and IEnumerable on the client, but I don't know if all task will be materialized before applying FilterByPermissions or it will be performed by demand? And I also don't like the synchronous nature of this solution.
IQueryable<MyTask> allTasksQuery = ...;
IEnumerable<MyTask> allTasksEnumerable = allTasksQuery.AsEnumerable();
IEnumerable<MyTask> filteredTasks = FilterByPermissions(allTasksEnumerable);
MyTask latestTask = filteredTasks.FirstOrDefault();
The workaround could be retrieving small sets of data (pages of 50 for example) until one good task is found but I don't like it.
I am working on an app where I use core data.
I already have tried to do it with one entity but that didn't work.
But I now have like twenty entities and my question is: is there a limit to the number of entities or a number recommended?
Is there a better way to store that amount of data?
UPDATE:
What I am storing are grads from school but not the A,b,c,d,e,f but a number from 1 to 10. And each grad has is own weighing(amount of times a number count) like some grad count 2 time because the are more imported.
So i first thought to have an array with a string for the name of the subject and then to array one store's the grad the other the corresponding weighing.
Like this:
var subjects: [String,[Int],[Int]]
but this isn't possible and I don't even know how I should put this in core data and get it back properly.
Because I couldn't figure it out, I thought of just making a entity for each subject but there are a lot of them so there for this question.
There's no limit to the number of entities, but it's possible to go overboard and create more than you actually need. The recommended number is "as many as you need and no more", which obviously will vary a great deal depending on the nature of the data and how the app uses it. Whether there's a better way than your current approach is totally dependent on the fine details of exactly what you're doing, and so is impossible to answer without a far more detailed question.
You could setup a Subject entity that has one-to-many relationships to ordered sets of Grade and Weight, like so:
However, each grade apparently has a corresponding weight, so it would be more accurate to store each grade's weight in the Grade entity:
This still may not represent your real-world model.
If your subject is something general, like math or english, you could have more than one subject per grade, (e.g., algebra, geometry, trigonometry), or more than one level per subject (e.g., algebra 1, algebra 2) which may or may not have a different grade.
If your subject is very specific, your data may end up spread across unique one-to-one relationships, instead of one-to-many relationships.
You would also need to consider whether you can use ordered or unordered relationships, or whether an attribute exists that you can use to sort an entity.
You should consider these different facets of what you're trying to model (as well as the specific fetches you'd want to perform), before you try to design or implement the model, to allow you to efficiently represent this particular object graph.
I find I'm confused about lazy loading, etc.
First, are these two statements equivalent:
(1) Lazy loading:
_flaggedDates = context.FlaggedDates.Include("scheduledSchools")
.Include ("interviews").Include("partialDayAvailableBlocks")
.Include("visit").Include("events");
(2) Eager loading:
_flaggedDates = context.FlaggedDates;
In other words, in (1) the "Includes" cause the navigation collections/properties to be loaded along with the specific collection requested, regardless of the fact that you are using lazy loading ... right?
And in (2), the statement will load all the navigation entities even though you do not specifically request them, because you are using eager loading ... right?
Second: even if you are using eager loading, the data will not actually be downloaded from the database until you "enumerate the enumerable", as in the following code:
var dates = from d in _flaggedDates
where d.dateID = 2
select d;
foreach (FlaggedDate date in dates)
{
... etc.
}
The data will not actually be downloaded ("enumerated") until the foreach loop ... right? In other words, the "var dates" line defines the query, but the query is not executed until the foreach loop.
Given that (if my assumptions are correct), what's the real difference between eager loading and lazy loading?? It seems that in either case, the data does not appear until the enumeration. Am I missing something?
(My specific experience is with code-first, POCO development, by the way ... though the questions may apply more generally.)
Your description of (1) is correct, but it is an example of Eager Loading rather than Lazy Loading.
Your description of (2) is incorrect. (2) is technically using no loading at all, but will use Lazy Loading if you try to access any non-scalar values on your FlaggedDates.
In either case, you are correct that no data will be loaded from your data store until you attempt to "do something" with the _flaggedDates. However, what happens is different in each case.
(1): Eager loading: as soon as you begin your for loop, every one of the objects that you have specified will get pulled from the database and built into a gigantic in-memory data structure. This will be a very expensive operation, pulling an enormous amount of data from your database. However, it will all happen in one database round trip, with a single SQL query getting executed.
(2): Lazy loading: When your for loop begins, it will only load the FlaggedDates objects. However, if you access related objects inside your for loop, it will not have those objects loaded into memory yet. The first attempt to retrieve the scheduledSchools for a given FlaggedDate will result in either a new database roundtrip to retrieve the schools, or an Exception being thrown because your context has already been disposed. Since you'd be accessing the scheduledSchools collection inside a for loop, you would have a new database round trip for every FlaggedDate that you initially loaded at the beginning of the for loop.
Reponse to Comments
Disabling Lazy Loading is not the same as Enabling Eager Loading. In this example:
context.ContextOptions.LazyLoadingEnabled = false;
var schools = context.FlaggedDates.First().scheduledSchools;
The schools variable will contain an empty EntityCollection, because I didn't Include them in the original query (FlaggedDates.First()), and I disabled lazy loading so that they couldn't be loaded after the initial query had been executed.
You are correct that the where d.dateID == 2 would mean that only the objects related to that specific FlaggedDate object would be pulled in. However, depending on how many objects are related to that FlaggedDate, you could still end up with a lot of data going over that wire. This is due to the way the EntityFramework builds out its SQL query. SQL Query results are always in a tabular format, meaning you must have the same number of columns for every row. For every scheduledSchool object, there needs to be at least one row in the result set, and since every row has to contain at least some value for every column, you end up with every scalar value on your FlaggedDate object being repeated. So if you have 10 scheduledSchools and 10 interviews associated with your FlaggedDate, you'll end up with 20 rows that each contain every scalar value on FlaggedDate. Half of the rows will have null values for all the ScheduledSchool columns, and the other half will have null values for all of the Interviews columns.
Where this gets really bad, though, is if you go "deep" in the data you're including. For example, if each ScheduledSchool had a students property, which you included as well, then suddenly you would have a row for each Student in each ScheduledSchool, and on each of those rows, every scalar value for the Student's ScheduledSchool would be included (even though only the first row's values end up getting used), along with every scalar value on the original FlaggedDate object. It can add up quickly.
It's difficult to explain in writing, but if you look at the actual data coming back from a query with multiple Includes, you will see that there is a lot of duplicate data. You can use LinqPad to see the SQL Queries generated by your EF code.
No difference. This was not true in EF 1.0, which didn't support eager loading (at least not automatically). In 1.0, you had to either modify the property to load automatically, or call the Load() method on the property reference.
One thing to keep in mind is that those Includes can go up in smoke if you query across multiple objects like so:
from d in ctx.ObjectDates.Include("MyObjectProperty")
from da in d.Days
ObjectDate.MyObjectProperty will not be automatically loaded.