Splitting up OData query with multiple ApplyTo calls returns incorrect TotalCount

I'm using an OData-enabled grid with an ASP.NET Core 6 and EF Core 3 backend. The grid applies sorting, filtering and paging through OData, which I can easily integrate with EF Core using query.ApplyTo(efContext.Table).
There is a special requirement though: all data rows are grouped in sets, and filtering should not apply per row but rather per set of rows.
Let's say the rows are products, which are bundled in packages. A user can only buy the whole package; however, they need to be able to search the products in detail.
For example, there are columns such as Price or Brand that can be used for filtering. If a user filters on Price <= 100 && Brand = 'Contoso', then every set with at least one row satisfying the filter needs to be shown, and it needs to be shown in full: all rows of the set, including those that don't themselves pass the filter.
Currently I implemented this by calling ApplyTo twice, with different AllowedQueryOptions:
public IQueryable<Product> ListPackages(ODataQueryOptions<Product> query)
{
    // First pass: apply only the filter to find packages with at least one matching row.
    var packageIds = ((IQueryable<Product>)query.ApplyTo(efContext.Products, AllowedQueryOptions.Filter))
        .Select(product => product.PackageID)
        .Distinct()
        .ToHashSet();

    // Second pass: return every row of the matching packages with the remaining options applied.
    var packages = efContext.Products.Where(product => packageIds.Contains(product.PackageID));

    return (IQueryable<Product>)query.ApplyTo(packages,
        AllowedQueryOptions.Select | AllowedQueryOptions.OrderBy | AllowedQueryOptions.Top |
        AllowedQueryOptions.Skip | AllowedQueryOptions.Count | AllowedQueryOptions.Expand);
}
It seems that the TotalCount that is returned is based on the results of the first ApplyTo() call, even though Count is not among the allowed options there; the second call is the one that actually produces the full set of rows. How can I fix this?
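If it helps, one workaround might be to compute the total yourself from the package-level query before paging, and hand it to the OData layer explicitly. A sketch, assuming Microsoft.AspNetCore.OData 8.x, where ODataQueryOptions exposes the current Request and request.ODataFeature().TotalCount is the value the serializer writes to @odata.count; verify both against your actual package version:

// using Microsoft.AspNetCore.OData.Extensions;   // for ODataFeature()
public IQueryable<Product> ListPackages(ODataQueryOptions<Product> query)
{
    var packageIds = ((IQueryable<Product>)query.ApplyTo(efContext.Products, AllowedQueryOptions.Filter))
        .Select(product => product.PackageID)
        .Distinct()
        .ToHashSet();

    var packages = efContext.Products.Where(product => packageIds.Contains(product.PackageID));

    // Count the package-level rows ourselves, before Top/Skip are applied.
    if (query.Count?.Value == true)
    {
        query.Request.ODataFeature().TotalCount = packages.LongCount();
    }

    // Count is deliberately left out of the allowed options below,
    // so ApplyTo cannot overwrite the value we just set.
    return (IQueryable<Product>)query.ApplyTo(packages,
        AllowedQueryOptions.Select | AllowedQueryOptions.OrderBy |
        AllowedQueryOptions.Top | AllowedQueryOptions.Skip | AllowedQueryOptions.Expand);
}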

Related

How should I aggregate some columns in ObjectionJS while eager loading a large number of relations?

I'm using Objection in a project that often needs to grab data from a large number of related tables. In one of my queries, I need to get a subset of columns and sum two of those columns. However, I can't seem to find a way to build that query without having to write every single column in each of 6 or so tables into a group by. That seems too inefficient to be the only way that this can be done.
Example code:
ModelA.query()
    .select()
    .sum('columnA')
    .sum('columnB')
    .withGraphJoined({
        ModelB: true,
        ModelC: true,
        ModelD: {
            ModelE: true,
            ModelF: true
        }
    })
    .where('columnC', value)
    // Do I just have to enter every single column in models A - F?
    // Putting a list of columns into the select doesn't seem to prevent the
    // query builder from including every column in each table
    .groupBy()
Any thoughts?
EDIT: I've just noticed another problem. Since withGraphJoined requires that an id be selected for each table, I can't actually use groupBy effectively; grouping on those ids would create a separate row for each individual id. A possible alternative is sketched below.
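The SQL-level idea that usually gets around this, whatever the ORM, is to compute the sums as correlated subqueries instead of grouping by every selected column, so nothing needs a GROUP BY at all. A rough sketch of that shape in C#/LINQ, the language used elsewhere on this page; ModelA, its Children collection, and the column names are hypothetical stand-ins:

// Each Sum becomes a correlated subquery evaluated per parent row,
// so the parent's columns never have to appear in a GROUP BY.
var rows = context.ModelAs
    .Where(a => a.ColumnC == value)
    .Select(a => new
    {
        Parent = a,                                        // all of ModelA's columns
        SumA = a.Children.Sum(c => (decimal?)c.ColumnA) ?? 0m,
        SumB = a.Children.Sum(c => (decimal?)c.ColumnB) ?? 0m
    })
    .ToList();

If I remember Objection's API correctly, a relatedQuery placed in the select list can express the same correlated subquery, which would also sidestep the id-per-row problem from the edit.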

Search functions inadequately searching large dataframe in R

First time asking a question on here, and really hoping to get some help; I don't believe this question is out there yet.
I have a dataframe of 7,000,000+ observations, each with 140 variables. I am trying to filter the data down to a smaller cohort using a set of multiple criteria, any one of which would allow for inclusion in the smaller, filtered dataset.
I have tried two methods to search my data:
The first strategy uses filter_all() over all variables and searches for my inclusion criteria:
filteredData <- filter_all(rawData, any_vars(. %in% c(criteria1, criteria2, criteria3)))
The second strategy uses a which() call with a series of conditions, also trying to identify every row that contains one of my criteria:
filteredData <- rawData[which(rawData$criteria1 == "criteria" | rawData$criteria2 == "criteria" | rawData$criteria3 == "criteria"),]
Both approaches accurately pull one or two rows meeting these criteria; however, I don't believe all 7,000,000 rows are being searched. I added a row label to my rawData set and saw that the function successfully pulled row #60,192. I am expecting hundreds of rows in the final result and am very confused why only a couple from early in the dataframe are identified.
My questions:
Do the filter_all() and which() functions have size limits beyond which they stop searching?
Does anyone have a suggestion on how to filter/search based on multiple criteria on a very large dataset?
Thank you!

Active Record efficient querying on multiple different tables

Let me give a summary of what I've been attempting to do and the efficiency issues I've been running into:
Essentially, I want my users to be able to select parameters to filter data from my database, and then I want to pass the data that satisfies those filters from the controller.
However, these filters query data from multiple different tables (about 5-6 of them), some of which are quite large (100k+ rows). These tables are all related to what I want to show, e.g., here is a bond that meets such-and-such criteria, which is issued by such-and-such issuer, which must meet its own criteria, and so on.
In the end, I only really need about 100 rows after querying on the parameters given by the user, but it feels like I need to look at everything in every table because I don't know beforehand how strict the filters will be. For example, with a starting universe of 100k sets of data, applying filters f1 and f2 of table 1 might leave 90k, but after filter f3 of table 2, filters f4, f5 and f6 of table 3, and so on, we might end up with 100 or fewer sets that pass, because the last filters checked might be quite strict.
How can I go about querying through these multiple different tables efficiently?
Doing a join between them seems like it would yield a time complexity on the order of |T_1||T_2||T_3||T_4||T_5||T_6|, where |T_i| is the size of table i.
On the other hand, just looking through the later tables based on the ids that pass the previous filter (as in: ids 5, 7, 8 pass the filters in T_1; of those, some subset passes the filters in T_2; of those, some pass the filters in T_3; and so on) looks like it might(?) have time complexity of |T_1| + |T_2| + ... + |T_6|.
I'm relatively new to Ruby on Rails, so I'm not entirely sure of all the tools at my disposal that could help with optimizing this, and at the same time I'm not entirely sure how best to approach this algorithmically.
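Whatever the ORM, the usual middle ground is to compose every filter into a single query and let the database planner use indexes: with an index, each join probe costs roughly log|T| rather than |T|, so neither the product |T_1|...|T_6| nor a full pass per table reflects what actually runs. A sketch of that composition in C#/LINQ, the stack used elsewhere on this page (the Bond/Issuer models and filter values are invented); in ActiveRecord the equivalent is chaining where and joins before materializing:

// Hypothetical EF models: Bond has Price and Coupon, plus a navigation property Issuer.
decimal? maxPrice = 100m;          // user-supplied filter values; null means "not set"
decimal? minCoupon = null;
string issuerCountry = "US";

IQueryable<Bond> query = context.Bonds;

if (maxPrice.HasValue)
    query = query.Where(b => b.Price <= maxPrice.Value);           // f1 on table 1
if (minCoupon.HasValue)
    query = query.Where(b => b.Coupon >= minCoupon.Value);         // f2 on table 1
if (issuerCountry != null)
    query = query.Where(b => b.Issuer.Country == issuerCountry);   // f3 on table 2, via a JOIN

// Nothing has executed yet; this sends one SQL statement and fetches only the rows needed.
var results = query.Take(100).ToList();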

Entity Framework with OrderBy, Take and Skip can give incorrect results

So I want to return paged data from a query:
var data = context.MyTable
    .OrderByDescending(r => r.Field1)
    .Skip(10)
    .Take(10);
This gives me the second page of 10 rows, ordered by Field1.
But let's say all rows in the table have the same value for Field1; then the data returned by Skip/Take is not correct. I've seen cases where the second page contains rows that were already returned on page 1.
Note: I'm using EF 6.1.3.
It would seem that to get correct results, I need to ensure that the ordering produces a unique order of the data. So I add another column to the OrderBy: the unique ID of the table.
var data = context.MyTable
    .OrderByDescending(r => r.Field1)
    .ThenBy(r => r.FieldId)
    .Skip(10)
    .Take(10);
I've not found any documentation that confirms I need to do this, or is this a bug in EF?
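For what it's worth, this is database behaviour rather than an EF bug: ORDER BY on a non-unique column leaves the relative order of ties undefined, so consecutive Skip/Take queries are free to tie-break differently. Appending a unique key, as above, is the standard defence. A small sketch of a reusable helper (the extension method and its names are illustrative, not part of EF):

using System;
using System.Linq;
using System.Linq.Expressions;

public static class PagingExtensions
{
    // Appends a unique tiebreaker so the total order is deterministic
    // and consecutive pages can never overlap or skip rows.
    public static IQueryable<T> PageByDescending<T, TKey, TId>(
        this IQueryable<T> source,
        Expression<Func<T, TKey>> sortKey,
        Expression<Func<T, TId>> uniqueId,
        int pageIndex,
        int pageSize)
    {
        return source
            .OrderByDescending(sortKey)
            .ThenBy(uniqueId)
            .Skip(pageIndex * pageSize)
            .Take(pageSize);
    }
}

// Usage, equivalent to the query above:
// var data = context.MyTable.PageByDescending(r => r.Field1, r => r.FieldId, pageIndex: 1, pageSize: 10);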

Apex query optimization

I am trying this query:
List<Account> onlyRRCustomer = [
    SELECT ac.rr_First_Name__c,
           ac.rr_Last_Name__c,
           ac.rr_National_Insurance_Number__c,
           ac.id,
           ac.rr_Date_of_Birth__c
    FROM Account ac
    WHERE ac.rr_National_Insurance_Number__c IN :uniqueNiInputSet
        AND RecordTypeId = :recordTypeId
];
It gives me an error:
SELECT ac.rr_First_Name__c, ac.rr_Last_Name__c, ac.rr_National_Insurance_Number__c, ac.id, ac.rr_Date_of_Birth__c FROM Account ac WHERE (ac.rr_National_Insurance_Number__c = :tmpVar1 AND RecordTypeId = :tmpVar2)

10:12:05.0 (11489528)|EXCEPTION_THROWN|[49]|System.QueryException: Non-selective query against large object type (more than 200000 rows). Consider an indexed filter or contact salesforce.com about custom indexing.
I understand that uniqueNiInputSet.size() is about 50, so that isn't the issue, but that record type might contain many more records.
So, if I change the position, will that work? That is, put the record type first and then the NI set in the WHERE clause. Is there any order in which WHERE clauses are evaluated in Salesforce, so that it would only look at the 50 members and then, within those 50, search for the particular record type?
That just means that the script is taking too long to execute. You may need to move this to a @future method or execute it using Database.Batchable.
I don't think the order matters in SOQL; I think it's just trying to return too many records.
A non-selective query means you are performing a query against a table that has a large number of records and your query is not specific enough. You can work with Salesforce support to try to resolve this, either through the creation of additional backend indexes or by making the query more selective.
To be honest, your query looks very selective already; you're not using LIKE or IN. You should also put your most selective conditions first (resulting in a more focused query against your records).
I know it shouldn't matter, but I would also move your conditions out of the parentheses.
If there are any other fields you can filter on, that may help. Sometimes you have to actually create new fields and populate them just to help make your queries more selective.
Also, if rr_National_Insurance_Number__c is a formula field, you will want to change it to a text field and populate it via workflow or Apex instead. Formula fields require additional time on the servers to calculate.
SELECT rr_First_Name__c, rr_Last_Name__c, rr_National_Insurance_Number__c, id, rr_Date_of_Birth__c
FROM Account
WHERE new_custom_field__c = TRUE
AND rr_National_Insurance_Number__c = :tmpVar1
AND RecordTypeId = :tmpVar2
Your query is non-selective. For a standard index, the selectivity threshold is 30% of the first million records plus 15% of the records beyond the first million, up to a maximum of 1 million records total (for example, on an object with 2 million rows the threshold would be 300,000 + 150,000 = 450,000 rows). For an "AND" query, each individual WHERE criterion must itself be selective; see the quick-reference cheat sheet. In general, try making
rr_National_Insurance_Number__c
an external ID, which will make it indexed by Salesforce by default, and retry your query. Record types are already indexed by default. If the result is still non-selective because of the number of results returned, try limiting the scope of the query using a field like CreatedDate.