MyBatis API RowBounds Class does full scan of the DB? - mybatis

When the Class RowBounds in MyBatis API gets data from DB, does it do full scan and then cut the row that is set up by limit and offset parameters? or does it only get the data bound?

If the SQL query contains offset and limit/fetch first n rows only then the resultset will return only data within bounds. Bounds are applied on DB side. OFFSET 10000 LIMIT 20 will produces a (maximum) 20 records resultset.
This is likely what you need.
Rowbound does not alter the SQL query and operates independently. Mybatis works with whole Resultset returned by the DB.
e.g.: RowBounds(10000, 20) will skip first 10000 records of the resultset, then fetch 20 records and stop. But the result size may be MAX_INT.

It does retrieve full data from database however return only the requested number of records into the program. So, no worries for OutOfMemory but query will take long on database side.
Hibernate and Eclipselink, on the other hand pass on the given limit count onto the database and retrieves only required number of records from database. Hibernate achieves this by using database vendor specific SQL construct in its generated SQL. Ex - LIMIT clause in MS-SQL, Rownum for Oracle.
If you want to achieve the same in mybatis, you need to use these constructs yourselves.
It is easy and you can use mybatis conditions to make the SQL specific to any database.

Related

DB2 Optimize for n rows

I'm learning DB2, and I came across this clause: OPTIMIZE FOR 1 ROW right after FETCH FIRST 100 ROWS ONLY.
I understand that FETCH FIRST 100 ROWS ONLY would give me the first 100 rows that qualified. But I don't understand what the OPTIMIZE FOR 1 ROW really doing here. I read this DB2 documentation, it says
Use OPTIMIZE FOR 1 ROW clause to influence the access path. OPTIMIZE FOR 1 ROW tells Db2 to select an access path that returns the first qualifying row quickly.
and this DB2 documentation, it says
In general, if you are retrieving only a few rows, specify OPTIMIZE FOR 1 ROW to influence the access path that Db2 selects.
But I'm still confused. Is using OPTIMIZE FOR n ROWS would make a query more efficient?
I also found this post on SO and it seems like OPTIMIZE FOR n ROWS is equivalent to FETCH FIRST n ROWS ONLY per the accepted answer.
But when I experimented it myself using OPTIMIZE FOR n ROWS instead of FETCH FIRST n ROWS ONLY, the result set was not the same. With OPTIMIZE FOR n ROWS, the query returns all qualifying rows.
Could someone please explain it to me what OPTIMIZE FOR n ROWS really does? Thanks!
Is using OPTIMIZE FOR n ROWS would make a query more efficient?
Not necessarily. However, it might cause your application to start receiving rows earlier than it otherwise would, if there is an access plan alternative that can find the first row matching the query criteria faster although the entire query will as a result run longer.
There's this bit in the Db2 for LUW docs that gives some examples specific to that platform:
Try specifying OPTIMIZE FOR n ROWS along with FETCH FIRST n ROWS ONLY, to encourage query access plans that return rows directly from the referenced tables, without first performing a buffering operation such as inserting into a temporary table, sorting, or inserting into a hash join hash table.
Applications that specify OPTIMIZE FOR n ROWS to encourage query access plans that avoid buffering operations, yet retrieve the entire result set, might experience poor performance. This is because the query access plan that returns the first n rows fastest might not be the best query access plan if the entire result set is being retrieved.

Apex query optimization

I am trying this query:
List<Account> onlyRRCustomer = [SELECT
ac.rr_First_Name__c,
ac.rr_Last_Name__c,
ac.rr_National_Insurance_Number__c,
ac.id,
ac.rr_Date_of_Birth__c
FROM
Account ac
WHERE
ac.rr_National_Insurance_Number__c IN :uniqueNiInputSet
AND RecordTypeId = :recordTypeId];
It gives me an error:
SELECT ac.rr_First_Name__c, ac.rr_Last_Name__c,
ac.rr_National_Insurance_Number__c, ac.id, ac.rr_Date_of_Birth__c FROM
Account ac WHERE (ac.rr_National_Insurance_Number__c = :tmpVar1 AND
RecordTypeId = :tmpVar2) 10:12:05.0
(11489528)|EXCEPTION_THROWN|[49]|System.QueryException: Non-selective
query against large object type (more than 200000 rows). Consider an
indexed filter or contact salesforce.com about custom indexing.
I understand uniqueNiInputSet.size() ~ 50, so, it's not an issue but for that record type, it might contains more records.
So, if i changed the position will that work? Means, first the recordtype and then the NIset in where clause. Is there any order how where clause are selected in SF. So, it will only look for 50 member and then within 50 it will serach for the particular record type?
That just means that the script is taking too long to execute. You may need to move this to a #future method or make execute it using Database.Batchable.
I don't think the order matters in SOQL, I think it's just trying to return too many records.
A non-selective query means you are performing a query against a table that has a large number of records and your query is not specific enough. You can work with Salesforce support to try to resolve this, either through the creation of additional backend indexes or by making the query more selective.
To be honest, your query looks very selective already, you're not using LIKE or IN. You should also put your most selective conditions first (resulting in a more focused query against your records).
I know it should'nt matter, but I would also move your conditions out of the parenthesis.
If there are any other fields you can filter on, that may help. Sometimes, you have to actually create new fields and populate them just to help make your queries more selective.
Also, if rr_National_Insurance_Number__c is a formula field, you will want to change it to a text field and populate workflow or apex instead. Formula fields require additional time on the servers to calculate.
SELECT rr_First_Name__c, rr_Last_Name__c, rr_National_Insurance_Number__c, id, rr_Date_of_Birth__c
FROM Account
WHERE new_custom_field__c = TRUE
AND rr_National_Insurance_Number__c = :tmpVar1
AND RecordTypeId = :tmpVar2
Your query is non-selective. For a standard indexes is 30% for the fist million records and 15% of records over a million up to 1 million records total. For and "AND" query each individual where criteria must itself be selective see this quick reference cheat sheet. In general try making
rr_National_Insurance_Number__c
an external id which will make it an indexed by salesforce by default and retry you query. Record Types are already indexed by default. If the result is still non-selective because of the number of results returned, try limiting the number of results using a field like CreatedDate to limit the scope of the query.

SQL Query with a little bit more complicated WHERE clause performance issue

I have some performance issue in this case:
Very simplified query:
SELECT COUNT(*) FROM Items WHERE ConditionA OR ConditionB OR ConditionC OR ...
Simply I have to determine how many Items the user has access through some complicated conditions.
When there is a large number of records (100,000+) in the Items table and say ~10 complicated conditions concatenated in WHERE clause, I get the result about 2 seconds in my case. The problem is when a very few conditions are met, f.e. when I get only 10 Items from 100,000.
How can I improve the performace in this "Get my items" case?
Additional information:
the query is generated by EF 6.1
MS SQL 2012 Express
SQL Execution Plan
Add an additional table to your schema. Instead of building a long query, insert the value for each condition into this new table, along with a key for that user/session. Then, JOIN the two tables together.
This should perform much better, because it will allow the database engine to make better use of indexes on your Items table.
Additionally, this will position you to eventually pre-define sets of permissions for your users, such that you don't need to insert them right at the moment you do the check. The permission sets will already be in the table, and the new table can also be indexed, which will improve performance further.

Optimize SQLite query that has multiple parameters using FMDB

I have a query that starts pretty simple:
SELECT zudf16, zudf17, zudf18, zudf19, zudf20, zitemid, zItemName, zPhotoName, zBasePrice FROM zskus WHERE zmanufacturerid=? AND zudf16=? ORDER BY zItemID
As choices are made I build on it:
SELECT zudf16, zudf17, zudf18, zudf19, zudf20, zitemid, zItemName, zPhotoName, zBasePrice FROM zskus WHERE zmanufacturerid=? AND zudf16=? AND zudf19=? AND zudf20=? ORDER BY zItemID
For each parameter I add the query drops in performance, especially on iPad1. Each of these columns have an index on them.
How can I increase the performance of this query as it get's more targeted rather than lose performance?
This is across around 60,000 records that match the initial ManufacturerID but with a total of about 200,000 rows total in the database.

Does DataReader.NextResult retrieves the result is always the same order

I have a SELECT query that yields multiple results and do not have any ORDER BY clause.
If I execute this query multiple times and then iterate through results using DataReader.NextResult(), would I be guaranteed to get the results in the same order?
For e.g. if I execute the following query that return 199 rows:
SELECT * FROM products WHERE productid < 200
would I always get the first result with productid = 1 and so on?
As far as I have observed it always return the results in same order, but I cannot find any documentation for this behavior.
======================================
As per my research:
Check out this blog Conor vs. SQL. I actually wanted to ask if the query-result changes even if the data in table remains the same (i.e no update or delete). But it seems like in case of large table, when SQL server employees parallelism, the order can be different
First of all, to iterate the rows in a DataReader, you should call Read, not NextResult.
Calling NextResult will move to the next result set if your query has multiple SELECT statements.
To answer your question, you must not rely on this.
A query without an ORDER BY clause will return rows in SQL Server's default iteration order.
For small tables, this will usually be the order in which the rows were added, but this is not guaranteed and is liable to change at any time. For example, if the table is indexed or partitioned, the order will be different.
No, DataReader will return the results in the order they come back from SQL. If you don't specify an ORDER BY clause, that will be the order that they exist in the table.
It is possible, perhaps even likely that they will always return in the same order, but this isn't guaranteed. The order is determined by the queryplan (at least in SQL Server) on the database server. If something changes that queryplan, the order could change. You should always use ORDER BY if the order of results is in anyway important to your processing of the data.