Entity Framework 5 Get All method

In our EF implementation for a brand-new project, we have a GetAll method in the repository. But as our application grows, we are going to have, let's say, 5000 products. Wouldn't getting them all the way it works now become a performance problem?
If so, what is a good implementation to tackle this future problem?
Any help will be highly appreciated.

It could become a performance problem if GetAll enumerates the collection, causing the entire data set to be returned as an IEnumerable. It all depends on the number of joins, the size of the data set, whether you have lazy loading enabled, and how SQL Server performs, just to name a few factors.
Although some people will say this is not desirable, you could have GetAll() return an IQueryable, which defers the query until something causes the collection to be materialized by going to SQL. That way you could refine the results with further Where(), Take(), Skip(), etc. calls to allow for paging without having to retrieve all 5000+ products from the database.
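For illustration, here's a minimal sketch of that approach (the Repository<T> shape and member names are assumptions, not code from the question):

using System.Data.Entity;
using System.Linq;

public class Repository<T> where T : class
{
    private readonly DbContext context;

    public Repository(DbContext context)
    {
        this.context = context;
    }

    // Deferred: no SQL runs until the caller enumerates the query.
    public IQueryable<T> GetAll()
    {
        return context.Set<T>();
    }
}

// Callers compose the query before it executes, so only one page is fetched:
// var page = repo.GetAll().Where(p => p.IsActive)
//                         .OrderBy(p => p.Id).Skip(40).Take(20).ToList();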

It depends on how your repository class is set up. If you're performing the query immediately, i.e. if your GetAll() method returns something like IEnumerable<T> or IList<T>, then yes, that could easily be a performance problem, and you should generally avoid that sort of thing unless you really want to load all records at once.
On the other hand, if your GetAll() method returns an IQueryable<T>, then there may not be a problem at all, depending on whether you trust the people writing queries. Returning an IQueryable<T> would allow callers to further refine the search criteria before the SQL code is actually generated. Performance-wise, it would only be a problem if developers using your code didn't apply any filters before executing the query. If you trust them enough to give them enough rope to hang themselves (and potentially take your database performance down with them), then just returning IQueryable<T> might be good enough.
If you don't trust them, then, as others have pointed out, you could limit the number of records returned by your query by using the Skip() and Take() extension methods to implement simple pagination, but note that it's possible for records to slip through the cracks if people make changes to the database before you move on to the next page. Making pagination work seamlessly with an ever-changing database is much harder than a lot of people think.
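If that risk matters for your data, one alternative worth knowing (not mentioned in the original answer) is keyset pagination: instead of skipping a row count, you remember the last key handed out and filter past it. A sketch, with Product and Id as illustrative names:

// Keyset ("seek") pagination: stable when rows are inserted or deleted
// between page requests, unlike Skip()/Take() offsets.
public IQueryable<Product> GetNextPage(int lastSeenId, int pageSize)
{
    return context.Products
                  .Where(p => p.Id > lastSeenId)
                  .OrderBy(p => p.Id)
                  .Take(pageSize);
}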
Another approach would be to replace your GetAll() method with one that requires the caller to apply a filter before returning results:
public IQueryable<T> GetMatching<T>(Expression<Func<T, bool>> filter) where T : class
{
    // Context.Set<T>() is the EF entry point; swap in whatever you're using for data access
    return Context.Set<T>().Where(filter);
}
and then use it like var results = GetMatching<Product>(x => x.Name == "foo");, or whatever you want to do (the explicit type argument is needed because the compiler can't infer T from the lambda alone). Note that this could easily be bypassed by calling GetMatching<Product>(x => true), but at least it makes the intention clear. You could also combine this with the Skip()/Take() approach above to put a firm cap on the number of records returned.
My personal feeling, though, is that all of these ways of limiting queries are just insurance against bad developers getting their hands on your application, and if you have bad developers working on your project, they'll find a way to cause problems no matter what you try to do. So my vote is to just return an IQueryable<T> and trust that it will be used responsibly. If people abuse it, take away the GetAll() method and give them training-wheels methods like GetRecentPosts(int count) or GetPostsWithTag(string tag, int count) or something like that, where the query logic is out of their hands.

One way to improve this is by using pagination:
context.Products.OrderBy(p => p.Id).Skip(n).Take(m); // an OrderBy is required before Skip in LINQ to Entities; Id is an illustrative key

What you're referring to is known as paging, and it's pretty trivial to do using LINQ via the Skip/Take methods.
EF queries are lazily evaluated, which means they won't actually hit the database until they are enumerated, so the following would pull only rows 11-20 (skip the first 10, then take the next 10):
context.Table.OrderBy(t => t.Id).Skip(10).Take(10); // ordering is required before Skip in LINQ to Entities; Id is illustrative
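Wrapped up as a reusable helper, a sketch might look like this (Product, Id, and the page parameters are illustrative assumptions):

public IQueryable<Product> GetPage(int pageNumber, int pageSize)
{
    // Pages are 1-based; LINQ to Entities requires an explicit ordering
    // before Skip, otherwise it throws at runtime.
    return context.Products
                  .OrderBy(p => p.Id)
                  .Skip((pageNumber - 1) * pageSize)
                  .Take(pageSize);
}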

Related

Check for object ownership with Prisma

I'm new to working with Prisma. One aspect that is unclear to me is the right way to check whether a user has permission on an object. Let's assume we have Book and Author models. Every book has an author (one-to-many). Only a book's author has permission to delete it.
An easy way to enforce this would be this:
prismaClient.book.deleteMany({
  where: {
    id: bookId,   // id is known
    author: {
      id: userId  // id is known
    }
  }
})
But this way it's very hard to show an UnauthorizedError to the user. Instead, the response will be a 500 status code since we can't know the exact reason why the query failed.
The other approach would be to query the book first and check the author of the book instance, which would result in one more query.
Is there a best practice for this in Prisma?
Assuming you are using PostgreSQL, the best approach would be to use row-level security (RLS), but unfortunately it is not yet officially supported by Prisma.
There is a discussion about this subject here
https://github.com/prisma/prisma/issues/5128
As for the current situation, in my opinion it is better to use an additional query and provide users with informative feedback, rather than using the other method you suggested without knowing why the delete failed.
Eventually, it is up to you to decide based on your use case - whether or not it is important for you to know the reason for failure.
So this question is more generic than Prisma: it applies equally when running updates/deletes in raw SQL.
When you have extra where clauses to check for ownership and the update does not happen, it's difficult to infer which of the clauses caused the failure without further queries.
You can achieve this with row-level security in Postgres, but even that does not come out of the box and involves custom configuration to throw specific exceptions when rows are not found due to row-level security rules. See this answer for more detail.
I tend to think that doing customised stuff like this is rarely worth the tradeoff, unless you need specialised UX for an uncommon circumstance.
What I would suggest instead in this case is to keep it simple: use extra queries to check for ownership, but optimise the UX optimistically for the case where the user does own the entity, and keep that common and legitimate use case to a single query.
That is, catch the failure from Prisma (an exception, or the fact that the delete reports 0 affected rows, whatever it is in the given case), and only then run a specific select for ownership, to check whether that was the reason the update failed.
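A minimal sketch of that pattern (the author relation filter matches the question's models; NotFoundError and UnauthorizedError are hypothetical error classes):

// deleteMany resolves with a count rather than throwing, so a count of 0
// is the signal to run the follow-up ownership check.
const { count } = await prismaClient.book.deleteMany({
  where: { id: bookId, author: { id: userId } },
});

if (count === 0) {
  // Only now pay for the extra query to find out why the delete did nothing.
  const book = await prismaClient.book.findUnique({ where: { id: bookId } });
  if (book === null) {
    throw new NotFoundError('Book does not exist');   // hypothetical class
  }
  throw new UnauthorizedError('Not the book author'); // hypothetical class
}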
This is a nice tradeoff because it keeps things simple, and only runs extra queries in the (usually) far less common failure case.
Even having said all that, the broader caveat, as always, is that the extra queries probably won't matter regardless. In 99% of cases it's best to just always run the extra ownership query upfront as a pattern and keep things as simple as possible; simplicity is what you really want to optimise for over performance until you're running at significant scale.

Disable global query filters *inside* another global query filter

Currently using EF Core 3.1 (although upgrading to EF Core 5 soon, so if this becomes easier in that version, I'm still interested).
I'm implementing some global query filters to restrict access to entity data based on various criteria.
One of the filters operates on a type called "Encounter". An Encounter is linked to a single Person (via foreign key PersonID), but a Person can have many Encounters. The Person record has a navigation property to all linked encounters via Person.Encounters.
The query filter for Encounter is to restrict records based on the value of a particular property on Encounter (call it EncounterType). This works fine if my query filter is something like this:
x => x.EncounterType == "MyType"
However, I need to extend this logic so that an Encounter is allowed/loaded if any encounter linked to the Person meets the criteria.
So, the query filter for Encounter needs to be something like this:
x => x.Person.Encounters.Any(y => y.EncounterType == "MyType")
This does not currently work, because we run into a cycle and a StackOverflowException [1]; the global query filter for Encounter ends up expanding itself infinitely. (I think it's because we access the Person.Encounters navigation property and evaluate the same Encounter query filter for each encounter in Person.Encounters.)
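For reference, here is roughly what that registration looks like in OnModelCreating, using the names from the question:

// The filter on Encounter traverses Person.Encounters, whose elements are
// themselves subject to this same filter - hence the infinite expansion.
modelBuilder.Entity<Encounter>().HasQueryFilter(
    x => x.Person.Encounters.Any(y => y.EncounterType == "MyType"));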
What I really want to do is to completely disable the global query filter for any navigation properties used in this particular expression. In this scenario, I want to consider all other Encounters linked to the Person, without further filtering.
I know when actually running a query, I can call IgnoreQueryFilters(). I want something like that, but available inside the expression or when adding the query filter with HasQueryFilter().
Is this possible? Is there any other way I can accomplish this with global query filters?
[1] Which, while frustrating, is pretty cool for me... I've never posted a question about an actual stack overflow on Stack Overflow :)

SOQL Query in loop

The following Apex code:
for (List<MyObject__c> records : [select id, some_fields from MyObject__c]) {
    for (MyObject__c item : records) {
        // do stuff
    }
}
is a standard pattern for processing large quantities of data: it will automatically break a large result set into chunks of 200 records, so each List will contain up to 200 objects.
Is there any difference between the above approach, and below:
for (List<MyObject__c> records : Database.query('select...')) {
    for (MyObject__c item : records) {
        // do stuff
    }
}
used when the SOQL needs to be dynamic?
I have been told that the 2nd approach loses data, but I'm not sure of the details, and the challenge is to work out why and to prevent the loss.
Where did you read that using dynamic SOQL results in data loss in this scenario? This is untrue AFAIK. The Database.query() method behaves exactly the same as an inline (static) SOQL query, except for a few small differences.
To answer your first question, the main difference between your approaches is that the first is static (the query is fixed), and the second allows you to dynamically define the query, as the name suggests. Using your second approach with Dynamic SOQL will still chunk the result records for you.
This difference does lead to some other subtle considerations: dynamic SOQL doesn't support variable bindings beyond simple cases. See Daniel Ballinger's idea for this feature here. I also need to add the necessary security caveat: if you're using dynamic SOQL, do not build the query from unsanitized user input, as that would render your application vulnerable to SOQL injection.
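If user input must flow into a dynamic query anyway, the usual guard is String.escapeSingleQuotes (a standard Apex method); a sketch, with the field and variable names as illustrative assumptions:

// Escape quotes so user input can't break out of the string literal.
String safeName = String.escapeSingleQuotes(userSuppliedName);
String soql = 'select Id from MyObject__c where Name = \'' + safeName + '\'';
for (List<MyObject__c> records : Database.query(soql)) {
    for (MyObject__c item : records) {
        // do stuff
    }
}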
Also, I'm just curious: what is your use case, and how much data are you talking about here? It might be better to use Batch Apex, depending on your requirements.

Optimizing sling queries

I have a CQ5 component that needs to query a given path for a couple of other component types, like this:
String query = "select * from nt:unstructured where jcr:path like '/content/some/path/%' and ( contains(sling:resourceType, 'resourceType1') or contains(sling:resourceType, 'resourceType2')) ";
Iterator<Resource> resources = resourceResolver.findResources(query, "sql");
Unfortunately if it is working through a path with a lot of content the page times out. Is there any way to optimize a function like this or tips on improving performance?
1. Use some more specific JCR type than nt:unstructured.
I guess you are looking for page nodes, so try cq:Page or (even better) cq:PageContent.
2. Denormalize your data.
If I understand your query correctly, it should return pages containing resource1 or resource2. Rather than using the contains() predicate, which is very costly and prevents JCR from using an index, mark pages containing these resources with an additional attribute. E.g., set the jcr:content/containsResource1 and jcr:content/containsResource2 properties appropriately and then use them in your query:
select * from cq:PageContent where (containsResource1 is not null or containsResource2 is not null) and jcr:path like '/content/some/path/%'
You could use an EventHandler or a SlingPostProcessor to set the properties automatically when resource1 or resource2 is added, as sketched below.
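A rough sketch of the SlingPostProcessor route (the resource type, flag name, and class name are assumptions; treat it as a starting point rather than a drop-in implementation):

import java.util.List;

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.resource.ModifiableValueMap;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.servlets.post.Modification;
import org.apache.sling.servlets.post.SlingPostProcessor;

// Register as an OSGi service so the Sling POST servlet picks it up.
public class ContainsResourceFlagProcessor implements SlingPostProcessor {

    @Override
    public void process(SlingHttpServletRequest request, List<Modification> changes)
            throws Exception {
        Resource changed = request.getResource();
        if (changed.isResourceType("myapp/components/resourceType1")) { // assumed type
            // Walk up to the enclosing jcr:content node and set the flag there.
            Resource node = changed;
            while (node != null && !"jcr:content".equals(node.getName())) {
                node = node.getParent();
            }
            if (node != null) {
                ModifiableValueMap props = node.adaptTo(ModifiableValueMap.class);
                props.put("containsResource1", Boolean.TRUE);
            }
        }
    }
}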
I have added "jackrabbit" and "jcr" tags to your question. I'm not an expert in JCR queries, but one of those experts might want to comment on the query statement you are using and on whether and how it can be optimized.
That being said, your "page times out" statement seems to imply that it's the client browser that is timing out because it receives no data for too long. I would first check (with a debugger or log statements) whether it's really the findResources call that takes too long, or whether the code that runs afterwards is the culprit.
If findResources is slow you'll need to optimize the query or redesign your code to make it asynchronous, for example have the client code first get the HTML page and then get the query results via asynchronous calls.
If the code that runs after findResources causes the timeout, you might redesign it to start sending data to the browser as soon as possible and flush the output regularly to avoid timeouts. But if you're returning lots of results, that might take too long for the user anyway, and more asynchronous behavior would be needed as well.

How can I filter DBIx::Class resultsets with external data?

I'm using DBIx::Class and I have a resultset which needs to be filtered by data which cannot be computed in SQL. What I need to do is something effectively equivalent to this hypothetical example:
my $resultset = $schema->resultset('Service')->search(\%search);
my $new_resultset = $resultset->filter( sub {
    my $web_service = shift;
    return $web_service->is_available;
} );
Reading through the docs gives me no clue how to accomplish a strategy like this.
You can’t really, due to the goals for which DBIC result sets are designed:
They compile down to SQL and run a single query, which they do no earlier than when you ask for results.
They are composable.
Allowing filtering by code that runs on the Perl side would make it extremely hairy to achieve those properties, and would hide the fact that such result sets actually run N queries when composed.
Why do you want this, anyway? Why is simply retrieving the results and filtering them yourself insufficient?
Encapsulation? (E.g. hiding the filtering logic in your business logic layer but kicking off the query in the display logic layer.) Then write a custom ResultSet subclass that has an accessor which runs the query and does the desired filtering; a sketch follows below.
Overhead? (E.g. you will reject most results, so you don't want the overhead of creating row objects for them.) Then use HashRefInflator.
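A minimal sketch of such a subclass, assuming the usual schema layout and an is_available method on the row objects (all names illustrative):

package MyApp::Schema::ResultSet::Service;

use strict;
use warnings;
use base 'DBIx::Class::ResultSet';

# Runs the query and filters rows on the Perl side.
# Note: this returns a plain list of row objects, not a chainable resultset.
sub available {
    my ($self) = @_;
    return grep { $_->is_available } $self->all;
}

1;

Callers would then write $schema->resultset('Service')->search(\%search)->available, which keeps the Perl-side filtering behind the resultset API even though the return value is a plain list.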
If you filter the results and end up with a list of rows, you can create a new resultset as shown here: http://search.cpan.org/~abraxxa/DBIx-Class-0.08127/lib/DBIx/Class/Manual/Cookbook.pod#Creating_a_result_set_from_a_set_of_rows.
This keeps things consistent by keeping the results as a resultset, but I imagine you would not be able to chain it or use many other resultset methods on it.