Doctrine QueryBuilder cuts the result when I use orderBy - postgresql

I have a large repository method that builds a regular query on the backend. Some of the parameters I pass to it are max-results, first-result, order-by and order-by-dir, which control the number of records to display, the pagination and the order of the records. The problem appears with certain configurations, e.g. (4th page, max-results: 10, first-result: 40): this should give me the 40th to 50th records out of 1000+ records in the database, but it returns fewer than 10 records.
QB Code
....
return $total ? //this is a bool parameter to find out if I want the records or the records amount
$qb
->select($qb->expr()->count('ec.id'))
->getQuery()->getSingleScalarResult() :
$qb//these are the related entities all are joined by leftJoin of QB
->addSelect('c')
->addSelect('e')
->addSelect('pr')
->addSelect('cl')
->addSelect('ap')
->addSelect('com')
->addSelect('cor')
->addSelect('nav')
->addSelect('pais')
->addSelect('tarifas')
->addSelect('transitario')
->orderBy(isset($options['sortBy']) ? $options['sortBy'] : 'e.bl', isset($options['sortDir']) ? $options['sortDir'] : 'asc')
->getQuery()
->setMaxResults(isset($options['limit']) ? $options['limit'] : 10)
->setFirstResult(isset($options['offset']) ? $options['offset'] : 0)
->getArrayResult();
Scenario 1: QueryBuilder with orderBy, and the database
QB: The result is only one entity with the expected data, not 10, even though more than 1000 records exist.
DB: I get 10 records, but all for the same entity (the same output as from the QB, repeated 10 times).
Scenario 2: QueryBuilder without orderBy, and the database
QB: The result is as expected: 10 records filtered from 1000+ records.
DB: The result is as expected: 10 records.
The only problem in this scenario is that I can't order my results using the QB.
Environment description
Symfony: 3.4.11
PostgreSQL: 9.2
PHP 7.2
OS: Ubuntu Server 16.04 x64
Why are Doctrine/Postgres giving me this kind of result?
There are no exceptions and no misconfigurations; it only cuts the results when I use orderBy.

Posting this as an answer, as suggested in the comments.
I guess it's because you are selecting related entities via left joins, so you get multiple result rows per main entity (due to one-to-many relationships). The limit and offset apply to those joined rows, not to main entities. When you order the result set, the duplicate rows for one main entity end up next to each other, so a page of 10 rows can contain a single entity. Without the order by, the duplicates were still there, but scattered through the unsorted results, so you didn't notice them or consider them duplicates.
As a workaround for your case, I think you should select only your main entity (let's say A) in the query builder, not select the related ones with addSelect(...), and use lazy loading when you want to display data from the related entities.
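The effect described in this answer can be reproduced outside Doctrine. The following is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL; the parent/child tables and the helper names are hypothetical stand-ins for the main entity and one of its left-joined one-to-many relations.

```python
import sqlite3

def build_db():
    """Create 20 parents with 10 children each (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, bl TEXT)")
    conn.execute(
        "CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
    )
    for pid in range(1, 21):
        conn.execute("INSERT INTO parent (id, bl) VALUES (?, ?)", (pid, f"BL{pid:03d}"))
        for n in range(10):
            conn.execute(
                "INSERT INTO child (parent_id, name) VALUES (?, ?)",
                (pid, f"c{pid}-{n}"),
            )
    return conn

def joined_page(conn, limit, offset):
    """LIMIT/OFFSET applied to the joined rows, as in the question's query."""
    return conn.execute(
        "SELECT p.id, c.name FROM parent p "
        "LEFT JOIN child c ON c.parent_id = p.id "
        "ORDER BY p.bl LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()

def parent_page(conn, limit, offset):
    """LIMIT/OFFSET applied to the main table only, as in the workaround."""
    return conn.execute(
        "SELECT p.id FROM parent p ORDER BY p.bl LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()

conn = build_db()
rows = joined_page(conn, 10, 40)
distinct_parents = {row[0] for row in rows}
print(len(rows), "joined rows covering", len(distinct_parents), "parent(s)")
print([row[0] for row in parent_page(conn, 10, 10)])
```

Here the joined page holds 10 rows that all belong to one parent, while paging the main table alone yields 10 different parents. Hydrating the joined rows into entities would collapse them into a single entity, which matches Scenario 1.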

Related

Apex query optimization

I am trying this query:
List<Account> onlyRRCustomer = [SELECT
ac.rr_First_Name__c,
ac.rr_Last_Name__c,
ac.rr_National_Insurance_Number__c,
ac.id,
ac.rr_Date_of_Birth__c
FROM
Account ac
WHERE
ac.rr_National_Insurance_Number__c IN :uniqueNiInputSet
AND RecordTypeId = :recordTypeId];
It gives me an error:
SELECT ac.rr_First_Name__c, ac.rr_Last_Name__c,
ac.rr_National_Insurance_Number__c, ac.id, ac.rr_Date_of_Birth__c FROM
Account ac WHERE (ac.rr_National_Insurance_Number__c = :tmpVar1 AND
RecordTypeId = :tmpVar2) 10:12:05.0
(11489528)|EXCEPTION_THROWN|[49]|System.QueryException: Non-selective
query against large object type (more than 200000 rows). Consider an
indexed filter or contact salesforce.com about custom indexing.
I understand that uniqueNiInputSet.size() is about 50, so that's not the issue, but that record type might contain many more records.
So, if I change the order, will that work? That is, put the record type first and then the NI set in the WHERE clause. Is there any order in which WHERE conditions are evaluated in SF, so that it would look up only the 50 members first and then search for the particular record type within those 50?
That just means that the script is taking too long to execute. You may need to move this to a @future method or execute it using Database.Batchable.
I don't think the order matters in SOQL, I think it's just trying to return too many records.
A non-selective query means you are performing a query against a table that has a large number of records and your query is not specific enough. You can work with Salesforce support to try to resolve this, either through the creation of additional backend indexes or by making the query more selective.
To be honest, your query looks very selective already, you're not using LIKE or IN. You should also put your most selective conditions first (resulting in a more focused query against your records).
I know it shouldn't matter, but I would also move your conditions out of the parentheses.
If there are any other fields you can filter on, that may help. Sometimes, you have to actually create new fields and populate them just to help make your queries more selective.
Also, if rr_National_Insurance_Number__c is a formula field, you will want to change it to a text field and populate it with a workflow or Apex instead. Formula fields require additional time on the servers to calculate.
SELECT rr_First_Name__c, rr_Last_Name__c, rr_National_Insurance_Number__c, id, rr_Date_of_Birth__c
FROM Account
WHERE new_custom_field__c = TRUE
AND rr_National_Insurance_Number__c = :tmpVar1
AND RecordTypeId = :tmpVar2
Your query is non-selective. The selectivity threshold for a standard index is 30% of the first million records and 15% of the records beyond the first million, up to a maximum of 1 million records in total. For an "AND" query, each individual WHERE criterion must itself be selective; see this quick reference cheat sheet. In general, try making rr_National_Insurance_Number__c an external ID, which makes Salesforce index it by default, and retry your query. Record types are already indexed by default. If the result is still non-selective because of the number of results returned, try limiting the scope of the query with a field like CreatedDate.
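To make the threshold arithmetic quoted in this answer concrete, here is a small sketch of the calculation in Python. It only encodes the 30% / 15% / 1 million figures from the answer; the function names are hypothetical, and any additional rules the platform may apply are not modeled.

```python
def standard_index_threshold(total_records: int) -> int:
    """Row-count threshold for a standard index, per the figures quoted above."""
    first_million = min(total_records, 1_000_000)
    beyond = max(total_records - 1_000_000, 0)
    # 30% of the first million plus 15% of the rest, in integer arithmetic.
    threshold = (30 * first_million + 15 * beyond) // 100
    # The answer states an overall cap of 1 million records.
    return min(threshold, 1_000_000)

def looks_selective(matching_records: int, total_records: int) -> bool:
    """True if a filter matches fewer rows than the threshold (assumed rule)."""
    return matching_records < standard_index_threshold(total_records)

print(standard_index_threshold(200_000))
print(standard_index_threshold(2_000_000))
print(looks_selective(50, 200_000))
```

For an object with 200,000 rows this gives a threshold of 60,000 rows, so a filter that matches about 50 rows would be well within it, provided the filtered field is actually indexed.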

MyBatis API RowBounds Class does full scan of the DB?

When the RowBounds class in the MyBatis API gets data from the DB, does it do a full scan and then cut out the rows selected by the limit and offset parameters, or does it fetch only the data within the bounds?
If the SQL query contains offset and limit/fetch first n rows only, then the result set contains only data within those bounds. Bounds are applied on the DB side. OFFSET 10000 LIMIT 20 produces a result set of (at most) 20 records.
This is likely what you need.
RowBounds does not alter the SQL query and operates independently. MyBatis works with the whole result set returned by the DB.
e.g.: RowBounds(10000, 20) will skip the first 10000 records of the result set, then fetch 20 records and stop. But the result size may be MAX_INT.
It does retrieve the full data from the database but returns only the requested number of records to the program. So there is no need to worry about OutOfMemory, but the query will take long on the database side.
Hibernate and EclipseLink, on the other hand, pass the given limit on to the database and retrieve only the required number of records. Hibernate achieves this by using database-vendor-specific SQL constructs in its generated SQL, e.g. the LIMIT clause in MySQL or ROWNUM in Oracle.
If you want to achieve the same in MyBatis, you need to use these constructs yourself.
It is easy, and you can use MyBatis conditions to make the SQL specific to any database.
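The difference between the two approaches in these answers can be sketched as follows. This is an illustration in Python with sqlite3, not MyBatis itself: page_client_side imitates the skip-then-fetch behavior the answers attribute to RowBounds, and page_in_sql puts the bounds into the SQL. The table and function names are hypothetical.

```python
import sqlite3

def make_conn(n=1000):
    """In-memory table with ids 1..n (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, n + 1)])
    return conn

def page_client_side(conn, offset, limit):
    """Run the unbounded query, skip `offset` rows in the client, keep `limit` rows."""
    if limit <= 0:
        return [], 0
    cursor = conn.execute("SELECT id FROM items ORDER BY id")
    rows_read = 0
    page = []
    for row in cursor:
        rows_read += 1
        if rows_read <= offset:
            continue  # row was transferred only to be thrown away
        page.append(row[0])
        if len(page) == limit:
            break
    return page, rows_read

def page_in_sql(conn, offset, limit):
    """Let the database apply the bounds, so only the page is transferred."""
    rows = conn.execute(
        "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?", (limit, offset)
    ).fetchall()
    return [r[0] for r in rows], len(rows)

conn = make_conn()
client_page, client_rows_read = page_client_side(conn, 100, 20)
sql_page, sql_rows_read = page_in_sql(conn, 100, 20)
print(client_rows_read, sql_rows_read)
```

Both calls return the same page, but the client-side version reads offset + limit rows from the cursor, while the SQL version reads only the page itself.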

How to find a record count mismatch in a stored procedure in SQL Server 2008 R2?

Two stored procedures were developed by .NET developers. They used to return the same record counts when passed the same parameter.
Now, due to some changes, we are getting mismatched record counts, i.e.
if the first stored procedure returns 2 records for a parameter, the second SP returns only 1 record.
To find the cause, I verified the following:
1. the total number of records of a table after joining
2. the total number of tables used in the joins
3. whether distinct / group by is used in the 2 tables or not
In the end I was not able to find the issue.
How do I fix it?
Could anybody share some ideas?
Thanks in advance.
Assuming the same JOINs and filters, then the problem is NULLs.
That is, either
A WHERE clause has a direct NULL comparison which will fail
A COUNT is on a nullable column. See Count(*) vs Count(1) for more
Either way, why do you have two very similar stored procedures, written by 2 different developers, that appear to have differences?
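Both NULL pitfalls named in this answer can be shown with a two-row table. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server 2008 R2, and the orders table is hypothetical; the behavior shown is standard SQL NULL semantics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT)")
# Two rows, one of them with a NULL in the nullable column.
conn.executemany("INSERT INTO orders (region) VALUES (?)", [("north",), (None,)])

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

count_all = scalar("SELECT COUNT(*) FROM orders")          # counts rows
count_col = scalar("SELECT COUNT(region) FROM orders")     # skips NULL values
eq_null = scalar("SELECT COUNT(*) FROM orders WHERE region = NULL")   # never true
is_null = scalar("SELECT COUNT(*) FROM orders WHERE region IS NULL")
not_north = scalar("SELECT COUNT(*) FROM orders WHERE region <> 'north'")

print(count_all, count_col, eq_null, is_null, not_north)
```

COUNT(*) sees 2 rows while COUNT(region) sees 1, which is the same 2-versus-1 gap as in the question. The comparison with NULL and the <> filter both silently drop the NULL row.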

Nhibernate Expression.In limit

I am running an NHibernate criteria query with more than 2100 values in the IN clause.
I do something like Session.CreateCriteria(typeof()).Add(Expression.In("fieldName",arrayValue))
where arrayValue contains more than 2100 values. I get this error:
Exception occurred:
UnknownError
NHibernate.ADOException: could not execute query .. followed by the query with more than 3000 values in the array.
With some help from Google we found out that the IN clause in SQL supports only up to 2100 values.
Has anyone faced a similar issue before? We do not want to change the query, as it is written in a generic way and not a customized one.
This is a limitation of SQL Server. I wouldn't suggest doing this, but if you insist, you could work around it by creating a table-valued SQL function (see http://www.dzone.com/snippets/function-getting-comma) that splits a string by commas (or whatever delimiter you want) and returns the values as a table, then passing in all your IDs as (say) a comma-separated list in 1 parameter and using a SQLCriterion in your criteria query.
eg:
criteria.Add(
new SQLCriterion("{alias}.ID IN (SELECT element FROM dbo.GetCSVValues(?))",
new[]{csvListOfIds},
new[]{NHibernateUtil.String}))
You could split the array into multiple batches, query multiple times, and then combine the result.
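The batching idea from this last answer might look like the sketch below. It is written in Python against sqlite3 rather than NHibernate, and the items table, the fetch_by_ids helper and the batch size of 500 are all assumptions chosen for illustration.

```python
import sqlite3

def chunked(values, size):
    """Yield consecutive slices of `values` with at most `size` elements."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

def fetch_by_ids(conn, ids, batch_size=500):
    """Run one IN query per batch and combine the results."""
    results = []
    for batch in chunked(list(ids), batch_size):
        placeholders = ",".join("?" for _ in batch)
        sql = f"SELECT id, name FROM items WHERE id IN ({placeholders})"
        results.extend(conn.execute(sql, batch).fetchall())
    return results

# Hypothetical data: 5000 rows, of which 3000 are requested by id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 5001)],
)
rows = fetch_by_ids(conn, range(1, 3001))
print(len(rows))
```

Each query stays well below the parameter limit, at the cost of several round trips. If the combined result must be ordered, it has to be sorted after the batches are merged.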

Does DataReader.NextResult always retrieve the results in the same order?

I have a SELECT query that yields multiple results and does not have an ORDER BY clause.
If I execute this query multiple times and then iterate through the results using DataReader.NextResult(), am I guaranteed to get the results in the same order?
For example, if I execute the following query, which returns 199 rows:
SELECT * FROM products WHERE productid < 200
would I always get the first result with productid = 1 and so on?
As far as I have observed, it always returns the results in the same order, but I cannot find any documentation for this behavior.
======================================
As per my research:
Check out this blog, Conor vs. SQL. I actually wanted to ask whether the query result changes even if the data in the table remains the same (i.e. no update or delete). But it seems that for a large table, when SQL Server employs parallelism, the order can be different.
First of all, to iterate the rows in a DataReader, you should call Read, not NextResult.
Calling NextResult will move to the next result set if your query has multiple SELECT statements.
To answer your question, you must not rely on this.
A query without an ORDER BY clause will return rows in SQL Server's default iteration order.
For small tables, this will usually be the order in which the rows were added, but this is not guaranteed and is liable to change at any time. For example, if the table is indexed or partitioned, the order will be different.
No, DataReader will return the results in the order they come back from SQL. If you don't specify an ORDER BY clause, that will be the order that they exist in the table.
It is possible, perhaps even likely, that they will always return in the same order, but this isn't guaranteed. The order is determined by the query plan (at least in SQL Server) on the database server. If something changes that query plan, the order could change. You should always use ORDER BY if the order of results is in any way important to your processing of the data.
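As a small illustration of the advice in these answers, the sketch below runs the question's query with and without ORDER BY. It uses Python's sqlite3 module instead of SQL Server and a hypothetical three-row products table, so it says nothing about SQL Server's actual iteration order; it only shows which of the two results the code may rely on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (productid INTEGER, name TEXT)")
# Rows are deliberately inserted out of productid order.
conn.executemany(
    "INSERT INTO products (productid, name) VALUES (?, ?)",
    [(3, "c"), (1, "a"), (2, "b")],
)

# Without ORDER BY: the same rows come back, but their order is up to the engine.
unordered = [row[0] for row in conn.execute(
    "SELECT productid FROM products WHERE productid < 200"
)]

# With ORDER BY: the order is part of the query's contract.
ordered = [row[0] for row in conn.execute(
    "SELECT productid FROM products WHERE productid < 200 ORDER BY productid"
)]

print(unordered, ordered)
```

Only the second list is guaranteed to be sorted by productid. The first contains the same ids, and any order it happens to show should be treated as incidental.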