Materialize partial set of results with EF Core 2.1

Let's say I have a large collection of tasks stored in the DB and I want to retrieve the latest one the requesting user is permitted to see. The permission-checking logic is complex and unrelated to the persistence layer, so I can't put it in an SQL query. What I do today is retrieve ALL tasks from the DB ordered by descending date, then filter them by the user's permission set and take the first one. Not a perfect solution: I retrieve thousands of objects when I need only one.
My question is: how can I materialize objects coming from the DB only until I find one that matches my criteria, and discard the rest of the results?
I thought about one solution, but couldn't find information regarding EF Core behavior in this case and don't know how to check it myself:
Build the IQueryable, cast it to IEnumerable, then iterate over it and take the first suitable task. I know that the IQueryable part will be executed on the server and the IEnumerable part on the client, but I don't know whether all tasks will be materialized before FilterByPermissions is applied, or whether they will be fetched on demand. I also don't like the synchronous nature of this solution.
IQueryable<MyTask> allTasksQuery = ...;
IEnumerable<MyTask> allTasksEnumerable = allTasksQuery.AsEnumerable();
IEnumerable<MyTask> filteredTasks = FilterByPermissions(allTasksEnumerable);
MyTask latestTask = filteredTasks.FirstOrDefault();
The workaround could be retrieving small sets of data (pages of 50, for example) until one good task is found, but I don't like it.
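For what it's worth, a minimal sketch of the streaming approach (context.Tasks and the CreatedAt property are assumed names). EF Core streams query results as the enumerator advances rather than buffering them, so as long as FilterByPermissions is implemented lazily (with Where or yield return rather than building a list), rows after the first match should never be materialized:
IQueryable<MyTask> allTasksQuery = context.Tasks
    .OrderByDescending(t => t.CreatedAt); // ordering runs on the server

// AsEnumerable() switches to client-side evaluation without buffering;
// FirstOrDefault() stops pulling rows once the first match is found
MyTask latestTask = FilterByPermissions(allTasksQuery.AsEnumerable())
    .FirstOrDefault();
EF Core 2.1 also ships an AsAsyncEnumerable() extension for an asynchronous variant, though consuming it before C# 8's await foreach takes some extra ceremony.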

Related

How to fetch complex MongoDB Data from Kedro?

I'm attempting to get hands-on with Kedro, but I don't understand how to build the data fetcher I used before.
My data is stored in a MongoDB instance across multiple “tables”. One table holds my usernames, which I want to fetch first.
Then, based on the usernames I get, I would like to fetch data from three “tables” and merge the results.
What is the best way to do this in Kedro?
Should I put everything in a custom dataset? Or fetch only the usernames and do the rest in a part of the pipeline?
So this is an interesting one: Kedro has been designed in such a way that tasks have no knowledge of the IO required to provide or save their data. Your use case (for good reasons) requires you to cross this boundary.
My recommendation is to go down the custom-dataset route, but potentially go a little further and make it return the three tables you need directly, i.e. do the username-filtering logic in this stage as well.
It is also perfectly fine to raise a NotImplementedError in save() if you're not going to support saving.

Disable global query filters *inside* another global query filter

Currently using EF Core 3.1 (although upgrading to EF Core 5 soon, so if this becomes easier in that version, I'm still interested).
I'm implementing some global query filters to restrict access to entity data based on various criteria.
One of the filters operates on a type called "Encounter". An Encounter is linked to a single Person (via foreign key PersonID), but a Person can have many Encounters. The Person record has a navigation property to all linked encounters via Person.Encounters.
The query filter for Encounter is to restrict records based on the value of a particular property on Encounter (call it EncounterType). This works fine if my query filter is something like this:
x => x.EncounterType == "MyType"
However, I need to extend this logic so that an Encounter is allowed/loaded if any encounter linked to the Person meets the criteria.
So, the query filter for Encounter needs to be something like this:
x => x.Person.Encounters.Any(y => y.EncounterType == "MyType")
This does not currently work, because we run into a cycle and a StackOverflowException [1]; the global query filter for Encounter ends up expanding itself infinitely. (I think this is because the filter accesses the Person.Encounters navigation property, which evaluates the same Encounter query filter for each encounter in Person.Encounters.)
What I really want to do is to completely disable the global query filter for any navigation properties used in this particular expression. In this scenario, I want to consider all other Encounters linked to the Person, without further filtering.
I know when actually running a query, I can call IgnoreQueryFilters(). I want something like that, but available inside the expression or when adding the query filter with HasQueryFilter().
Is this possible? Is there any other way I can accomplish this with global query filters?
[1] Which, while frustrating, is pretty cool for me...I've never posted a question about an actual stack overflow on StackOverflow :)
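For reference, a minimal sketch of the two filter registrations described above, using the entity and property names from the question:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Works: the filter only touches the entity's own properties
    modelBuilder.Entity<Encounter>()
        .HasQueryFilter(x => x.EncounterType == "MyType");

    // Recurses: expanding Person.Encounters re-applies this same filter,
    // which expands it again, until the StackOverflowException
    // modelBuilder.Entity<Encounter>()
    //     .HasQueryFilter(x => x.Person.Encounters.Any(y => y.EncounterType == "MyType"));
}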

Taking a snapshot of a sequence/pattern query in Siddhi

I am currently working on a mechanism to extract the intermediate states of CEP queries, so that I can use this information to restore the query state at a later point in time (for example, after node failure).
Extracting intermediate state works fine for window-based queries (e.g., timeBatch) or aggregate-based queries (e.g., sum), given that I can create a custom window/aggregate function and extract the intermediate state (for example, the current events in the window or the current sum).
Even for a sequence/pattern query, I can see how extracting the intermediate state can be useful for restoration purposes. For example, if I have a pattern query where I look for A -> B -> C, and so far, A -> B has already been recognised, I can try extracting this sub-pattern as part of a query snapshot. Later, upon restoration, I only need to search for the events of type C to satisfy the pattern.
So far, I have not come across a viable approach of extracting this information. Does anyone have any ideas on how I can go about this issue? Thanks!
You can start by referring to SnapshotService, which can be used to get the current serialized states of queries.

Entity Framework 5 Get All method

In our EF implementation for a brand-new project, we have a GetAll method in the repository. But as our application grows, we are going to have, let's say, 5000 products; getting all of them the way we do now would become a performance problem, wouldn't it?
If so, what is a good implementation to tackle this future problem?
Any help will be highly appreciated.
It could become a performance problem if GetAll enumerates the collection, causing the entire data set to be returned as an IEnumerable. It all depends on the number of joins, the size of the data set, whether you have lazy loading enabled, and how SQL Server performs, just to name a few factors.
Although some people will say this is not desirable, you could have GetAll() return an IQueryable, which would defer the query until something caused the collection to be filled by going to SQL. That way you could refine the results with further Where(), Take(), Skip(), etc. calls to allow paging without having to retrieve all 5000+ products from the database.
It depends on how your repository class is set up. If you're performing the query immediately, i.e. if your GetAll() method returns something like IEnumerable<T> or IList<T>, then yes, that could easily be a performance problem, and you should generally avoid that sort of thing unless you really want to load all records at once.
On the other hand, if your GetAll() method returns an IQueryable<T>, then there may not be a problem at all, depending on whether you trust the people writing queries. Returning an IQueryable<T> would allow callers to further refine the search criteria before the SQL code is actually generated. Performance-wise, it would only be a problem if developers using your code didn't apply any filters before executing the query. If you trust them enough to give them enough rope to hang themselves (and potentially take your database performance down with them), then just returning IQueryable<T> might be good enough.
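For instance, with an IQueryable-returning repository, a caller could refine the query before any SQL runs (the repository, entity, and property names here are assumed for illustration):
var recent = repository.GetAll<Product>()   // IQueryable<Product>, no SQL yet
    .Where(p => p.Category == "Books")      // still composing the expression tree
    .OrderByDescending(p => p.CreatedAt)
    .Take(20)
    .ToList();                              // SQL is generated and executed here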
If you don't trust them, then, as others have pointed out, you could limit the number of records returned by your query by using the Skip() and Take() extension methods to implement simple pagination, but note that it's possible for records to slip through the cracks if people make changes to the database before you move on to the next page. Making pagination work seamlessly with an ever-changing database is much harder than a lot of people think.
Another approach would be to replace your GetAll() method with one that requires the caller to apply a filter before returning results:
public IQueryable<T> GetMatching<T>(Expression<Func<T, bool>> filter) where T : class
{
    // Context.Set<T>() assumes a DbContext; substitute whatever
    // you're using for data access
    return Context.Set<T>().Where(filter);
}
and then use it like var results = GetMatching(x => x.Name == "foo");, or whatever you want to do. Note that this could be easily bypassed by calling GetMatching(x => true), but at least it makes the intention clear. You could also combine this with the first method to put a firm cap on the number of records returned.
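A capped variant might look like this (the 500-row default is purely illustrative, not a recommendation):
public List<T> GetMatching<T>(Expression<Func<T, bool>> filter, int maxResults = 500) where T : class
{
    // Take() puts the cap into the generated SQL, and ToList() executes
    // the query immediately, so callers can't remove the limit afterwards
    return Context.Set<T>().Where(filter).Take(maxResults).ToList();
}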
My personal feeling, though, is that all of these ways of limiting queries are just insurance against bad developers getting their hands on your application, and if you have bad developers working on your project, they'll find a way to cause problems no matter what you try to do. So my vote is to just return an IQueryable<T> and trust that it will be used responsibly. If people abuse it, take away the GetAll() method and give them training-wheels methods like GetRecentPosts(int count) or GetPostsWithTag(string tag, int count) or something like that, where the query logic is out of their hands.
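One of those training-wheels methods might be sketched like this (Post, PostedAt, and Context are assumed names):
public List<Post> GetRecentPosts(int count)
{
    // The query shape is fixed; callers only control the row count
    return Context.Set<Post>()
        .OrderByDescending(p => p.PostedAt)
        .Take(count)
        .ToList();
}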
One way to improve this is by using pagination:
// note that LINQ to Entities requires an OrderBy before Skip
context.Products.OrderBy(p => p.Id).Skip(n).Take(m);
What you're referring to is known as paging, and it's pretty trivial to do using LINQ via the Skip/Take methods.
EF queries are lazily evaluated, which means they won't actually hit the database until they are enumerated, so the following would pull only 10 rows, after skipping the first 10:
// Skip requires a sorted input in LINQ to Entities, hence the OrderBy
context.Table.OrderBy(t => t.Id).Skip(10).Take(10);

Couchbase: Synchronizing views with a bucket

I've got a question for you Couchbase pros: is it possible to synchronize a subset of documents (e.g. the documents within a view) with another bucket,
so that the other bucket's documents are always a direct subset of the "master" bucket?
If so, isn't that too expensive in terms of performance? Or does Couchbase have any functionality to create deep links to the documents instead of copying them?
Alternatively: is it possible to write views on views?
Thank you in advance!
--- EDIT ----
Let's say I want to have two sets (buckets) of documents, S1 and S2, where S2 is a subset of S1. Each set contains the same views V1, V2 and V3, since I want to be able to query any of them with the same logic/interface. In my case, a set S2 is built per user/company/store/whatever; in production there would be around 1000 such subsets. To stay abstract, let's call them S2a, S2b and S2c.
The selection of documents to be contained in a given subset is done by a filtering instance (for example, a view). Let's call these filtering instances F1 for filtering S1 down to S2, hence F1a, F1b and F1c.
So with my current knowledge of Couchbase, this results in the following design/view architecture: I've got the three "base" views V1, V2 and V3, and to realize S2a, S2b and S2c I must create the views S2aV1, S2aV2, S2aV3, S2bV1, S2bV2, etc. (9 views).
One could say, "Well, choose your keys wisely and you can avoid the sub-views", but in my opinion it isn't that easy, for the following reason: in the worst case, the filter parameters change every minute and contain many WHERE IN constraints, which (from my current point of view) cannot be handled efficiently by querying key/value lists.
This leads to the thoughts behind the question I initially asked: if I use the same views in every subset (defined by a filter), shouldn't it be possible to build some entity that helps me handle complex filtering? For example, a function that is called at runtime while generating the view output? This could look like /design/view?filter=F1 or something like that.
Or do you have any other ideas for solving this problem? Or should I use SQL, since it's more capable of handling frequently changing filters?
Generally speaking, for most models you don't really need bucket "subsets". Is there a particular reason you are trying to do this, and why you would want that data broken out? You can also query your views, or, instead of a view on a view, you can just make a separate view that maps/filters further based on your needs (i.e., does the same job as a view on a view).
We are working on Elastic Search integration. Maybe that would be better for your use case.
I think what you want to do is write a view on your original bucket, and then copy the keys/values from that view to become documents in a new bucket.
It shouldn't be hard to write an automated framework for managing this so that you can keep the derived data up to date in near real time.
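A rough sketch of such a copy job with the Couchbase .NET SDK 2.x (the design document, view, and bucket names are assumptions, and error handling is omitted):
var query = sourceBucket.CreateQuery("filters", "f1"); // the filtering view
var result = sourceBucket.Query<dynamic>(query);

foreach (var row in result.Rows)
{
    // Upsert keeps the derived bucket idempotent: re-running the
    // job overwrites rather than duplicates existing documents
    targetBucket.Upsert(row.Id, row.Value);
}
Run on a schedule (or driven by a changes feed), this keeps the derived bucket a direct subset of the master, at the cost of copying rather than deep-linking the documents.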