SQL limit query - tsql

I'm having an issue with limiting a SQL query. I'm using SQL Server 2000, so I can't use features like ROW_NUMBER(), CTEs, or OFFSET ... FETCH.
I have tried the SELECT TOP limit * FROM approach and excluded the already shown results, but that way the query is very slow because sometimes my result query fetches more than 10,000 records.
Also I have tried the following approach:
SELECT * FROM (
    SELECT DISTINCT TOP 100 PERCENT im.name, im.location, im.image,
        (SELECT COUNT(DISTINCT i.id) FROM image AS i WHERE i.id <= im.id) AS recordnum
    FROM images AS im
    ORDER BY im.location ASC, im.name ASC
) AS tmp
WHERE recordnum BETWEEN 5 AND 15
The same problem occurs here, plus I couldn't add an ORDER BY option in the subquery for recordnum. I have placed both solutions in a stored procedure, but the query execution is still very slow.
So my question is:
Is there an efficient way to limit the query to pull 20 records per page in SQL Server 2000 for large amounts of data, i.e. more than 10,000 records?
Thanks.

With this approach the subquery is only run once, and the where im2.id is null condition skips the first 40 rows:
SELECT top 25 im1.*
FROM images im1
left join ( select top 40 id from images order by id ) im2
on im1.id = im2.id
where im2.id is null
order by im1.id
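To turn this into a generic paging query, note that TOP in SQL Server 2000 does not accept a variable, so the page math has to go through dynamic SQL. A rough sketch under that assumption (the variable names are just illustrative):

DECLARE @PageNumber int, @PageSize int, @Skip int, @sql varchar(1000)
SET @PageNumber = 3
SET @PageSize = 20
SET @Skip = (@PageNumber - 1) * @PageSize

-- Build the same left-join exclusion query with the computed TOP values
SET @sql = 'SELECT TOP ' + CAST(@PageSize AS varchar(10)) + ' im1.* '
         + 'FROM images im1 '
         + 'LEFT JOIN (SELECT TOP ' + CAST(@Skip AS varchar(10))
         + ' id FROM images ORDER BY id) im2 ON im1.id = im2.id '
         + 'WHERE im2.id IS NULL '
         + 'ORDER BY im1.id'

EXEC (@sql)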

Query-wise, there is no great-performing way. If performance is critical and the data will always be grouped/ordered the same way, you could add an int column and set its value by trigger based on that grouping/ordering. Index it and it should be extremely fast for reads; writes will be a bit slower.
Also, make sure you have indexes on the Id columns on image and images.
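A rough sketch of the precomputed-position idea, assuming the location/name ordering from the question; the column, index, and trigger names are made up and the renumbering is deliberately naive:

-- Add a persisted position column and index it
ALTER TABLE images ADD sort_position int NULL
GO
CREATE INDEX IX_images_sort_position ON images (sort_position)
GO
-- Renumber after every data change (simple but heavy; acceptable only if writes are rare,
-- and assumes RECURSIVE_TRIGGERS is OFF so the trigger does not re-fire itself)
CREATE TRIGGER trg_images_renumber ON images
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    UPDATE im
    SET sort_position = (SELECT COUNT(*)
                         FROM images i2
                         WHERE i2.location < im.location
                            OR (i2.location = im.location AND i2.name <= im.name))
    FROM images im
END
GO
-- Page 3 with 20 rows per page then becomes an indexed range scan
SELECT name, location, image
FROM images
WHERE sort_position BETWEEN 41 AND 60
ORDER BY sort_position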


How to delete rows (with geometry column) from a table in PostgreSQL (PostGIS) where the row has no spatial intersection with any row in another table

I have seen a strange behavior in PostgreSQL (PostGIS).
I have two tables in PostGIS with geometry columns. One table is a grid and the other one is lines. I want to delete all grid cells that no line passes through.
In other words, I want to delete the rows of one table that have no spatial intersection with any row of the second table.
First, in a subquery, I find the ids of the rows that have an intersection. Then, I delete every row whose id is not in that returned list of ids.
DELETE FROM base_grid_916453354
WHERE id NOT IN
(
    SELECT DISTINCT bg.id
    FROM base_grid_916453354 bg,
         (SELECT * FROM tracks_heatmap_1000 LIMIT 100000) tr
    WHERE bg.geom && tr.buffer
);
The following subquery returns in only 12 seconds:
SELECT DISTINCT bg.id
FROM base_grid_916453354 bg,
     (SELECT * FROM tracks_heatmap_1000 LIMIT 100000) tr
WHERE bg.geom && tr.buffer
while the whole query did not return even after 1 hour!
I ran the EXPLAIN command on it, but I cannot interpret the result (the plan was posted as an image).
How can I improve this query, and why does deleting from the returned list take so much time?
It is very strange, because the subquery is a spatial query between two tables of 9 million and 100k rows, while the delete part is just checking a list and deleting! In my mind, the delete part should be much easier.
Don't post text as images of text!
Increase work_mem until the subplan becomes a hashed subplan.
Or rewrite it to use 'NOT EXISTS' rather than NOT IN
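For example, the work_mem suggestion could be applied to the current session before running the DELETE; the value here is purely illustrative and needs to be large enough for the list of roughly 9 million ids to be hashed:

-- Raise work_mem for this session only; with enough memory the
-- NOT IN subplan can become a hashed subplan.
SET work_mem = '256MB';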
I found a fast way to do this query. As @jjanes suggested, I used NOT EXISTS:
DELETE FROM base_grid_916453354 bg
WHERE NOT EXISTS
(
SELECT 1
FROM tracks_heatmap_1000 tr
WHERE bg.geom && tr.buffer
);
This query takes around 1 minute and it is acceptable for the size of my tables.

PostgREST using limit and offset in subqueries or CTE

We are using PostgREST in our project for some quite complex database views.
At some point, when we use limit and offset (x-range headers or query parameters) with sub-selects, we get very high response times.
From what we have read, this seems to be a known issue where PostgreSQL executes the sub-selects even for the records that are not requested. The suggested workaround is to move the offset and limit into a sub-select or a CTE.
Is there an internal GUC value or something similar that we can use in the database views in order to optimize the response times? Does anybody have a hint on how to achieve this?
EDIT: as suggested, here are some more details. Let's say we have a relationship between products and parts. I want to know the parts count per product (this is a simplified version of the database views that we are exposing).
There are two ways of doing this:
A. Subselect:
SELECT products.id,
       (SELECT count(part_id) AS total
        FROM parts
        WHERE product_id = products.id)
FROM products
LIMIT 1000 OFFSET 99000
B. CTE:
WITH parts_count AS (
    SELECT product_id,
           count(part_id) AS total
    FROM parts
    GROUP BY product_id
    ORDER BY product_id
)
SELECT products.id,
       parts_count.total
FROM products
LEFT JOIN parts_count ON parts_count.product_id = products.id
LIMIT 1000
OFFSET 99000
The problem with A is that the sub-select is performed for every row, so even though I read only 1000 records there are 100 000 sub-selects.
The problem with B is that the join with the parts_count table takes very long, since there are 100 000 records in it (although the WITH query itself takes only 200 ms for 2000 records). Ideally I would like to limit the parts_count table with the same limit and offset as the main query, but I can't do this in PostgREST since it just appends the limit and offset at the end of the query; I don't have access to those parameters inside the WITH query.
It is unavoidable that high OFFSET leads to bad performance.
There is no other way to compute OFFSET but to scan and discard all the rows until you reach the offset, and no database in the world will be fast if OFFSET is high.
That's a conceptual problem, and the only way to avoid it is to avoid OFFSET.
If your goal is pagination, then usually keyset pagination is a better solution:
You add an ORDER BY clause that matches your requirements, make sure there is a unique key in the ORDER BY clause, and remember the last values you found. To fetch the next page, add a WHERE condition with those values. With proper index support, this can be very fast.
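As a sketch using the products table from the question (assuming id is the unique ordering key and is indexed), keyset pagination looks roughly like this:

-- First page
SELECT id
FROM products
ORDER BY id
LIMIT 1000;

-- Next page: continue after the largest id seen on the previous page (say 1000)
SELECT id
FROM products
WHERE id > 1000
ORDER BY id
LIMIT 1000;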
For your query, a more efficient version is probably:
SELECT p.id,
       count(parts.part_id) AS total
FROM (SELECT id
      FROM products
      LIMIT 1000 OFFSET 99000) p
LEFT JOIN parts ON parts.product_id = p.id
GROUP BY p.id;
It is rather weird that you have no ORDER BY, but LIMIT and OFFSET. That doesn't make much sense.

Redshift large 'in' clause best practices

We have a query in which a list of parameter values is provided in the "IN" clause of the query. Some time back this query failed to execute because the data in the "IN" clause got quite large, and the resulting query exceeded Redshift's 16 MB query size limit. We then tried processing the data in batches so as to limit the data and not breach the 16 MB limit.
My question is: what are the factors/pitfalls to keep in mind while supplying such large data for the "IN" clause of a query, and is there any alternative way in which I can deal with such large data in the "IN" clause?
If you have control over how you are generating your code, you could split it up as follows.
First, drop and recreate the filter table:
drop table if exists myfilter;
create table myfilter (filter_text varchar(max));
The second step is to populate the filter table in chunks of a suitable size, e.g. 1000 values at a time:
insert into myfilter
values ({{myvalue1}}), ({{myvalue2}}), ({{myvalue3}}); -- etc., up to 1000 values per statement
Repeat the above step multiple times until you have all of your values inserted.
Then, use that filter table as follows
select * from master_table
where some_value in (select filter_text from myfilter);
drop table myfilter;
A large IN list is not best practice in itself; it's better to use joins for large lists:
construct a virtual table in a subquery,
join your target table to the virtual table,
like this:
with your_list as (
    select 'first_value' as search_value
    union select 'second_value'
    ...
)
select ...
from target_table t1
join your_list t2
  on t1.col = t2.search_value

Reform a PostgreSQL query using one (or more) indexes - Example

I am a beginner in PostgreSQL and, after understanding the very basic things, I want to find out how I can get better performance for a query by using an index (one or more). I have read some documentation, but I would like a specific example so as to "catch" it.
MY EXAMPLE: Let's say I have just one table (MyTable) with three columns (Customer (text), Time (timestamp), Consumption (integer)) and I want to find the customer(s) with the maximum consumption on '2013-07-01 02:00:00'. MY SOLUTION (without index usage):
SELECT Customer FROM MyTable WHERE Time='2013-07-01 02:00:00'
AND Consumption=(SELECT MAX(consumption) FROM MyTable);
----> What would be the exact full code, using at least one index, for the query example above?
The correct query (using a correlated subquery) would be:
SELECT Customer
FROM MyTable
WHERE Time = '2013-07-01 02:00:00' AND
Consumption = (SELECT MAX(t2.consumption) FROM MyTable t2 WHERE t2.Time = '2013-07-01 02:00:00');
The above is very reasonable. An alternative approach if you want exactly one row returned is:
SELECT Customer
FROM MyTable
WHERE Time = '2013-07-01 02:00:00'
ORDER BY Consumption DESC
LIMIT 1;
And the best index is MyTable(Time, Consumption, Customer).
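As a sketch, that index could be created like this (the index name is only illustrative):

CREATE INDEX mytable_time_consumption_customer_idx
    ON MyTable (Time, Consumption, Customer);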

Equivalent of LIMIT for DB2

How do you do LIMIT in DB2 for iSeries?
I have a table with more than 50,000 records and I want to return records 0 to 10,000, and records 10,000 to 20,000.
I know in SQL you write LIMIT 0,10000 at the end of the query for 0 to 10,000 and LIMIT 10000,10000 at the end of the query for 10000 to 20,000
So, how is this done in DB2? What's the code and syntax?
(full query example is appreciated)
Using FETCH FIRST [n] ROWS ONLY:
http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.perf/db2z_fetchfirstnrows.htm
SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
FROM EMP
ORDER BY SALARY DESC
FETCH FIRST 20 ROWS ONLY;
To get ranges, you'd have to use ROW_NUMBER() (since v5r4) and use that within the WHERE clause: (stolen from here: http://www.justskins.com/forums/db2-select-how-to-123209.html)
SELECT code, name, address
FROM (
SELECT row_number() OVER ( ORDER BY code ) AS rid, code, name, address
FROM contacts
WHERE name LIKE '%Bob%'
) AS t
WHERE t.rid BETWEEN 20 AND 25;
I developed this method:
You NEED a table that has a unique value that can be ordered.
If you want rows 10,000 to 25,000 and your table has 40,000 rows, first you need to get the starting point and total rows:
int start = 40000 - 10000;
int total = 25000 - 10000;
Then pass these values, by code, into the query:
SELECT * FROM
(SELECT * FROM schema.mytable
ORDER BY userId DESC fetch first {start} rows only ) AS mini
ORDER BY mini.userId ASC fetch first {total} rows only
Support for OFFSET and LIMIT was recently added to DB2 for i 7.1 and 7.2. You need the following DB PTF group levels to get this support:
SF99702 level 9 for IBM i 7.2
SF99701 level 38 for IBM i 7.1
See here for more information: OFFSET and LIMIT documentation, DB2 for i Enhancement Wiki
Here's the solution I came up with:
select FIELD from TABLE where FIELD > LASTVAL order by FIELD fetch first N rows only;
By initializing LASTVAL to 0 (or '' for a text field), then setting it to the last value in the most recent set of records, this will step through the table in chunks of N records.
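For example, stepping through the table in chunks of 1000 rows might look like this (schema.mytable and userId are reused from the earlier answer; starting at 0 assumes a numeric key):

-- First chunk: LASTVAL initialized to 0
SELECT userId FROM schema.mytable WHERE userId > 0 ORDER BY userId FETCH FIRST 1000 ROWS ONLY;

-- Next chunk: plug in the largest userId returned by the previous query, suppose it was 1000
SELECT userId FROM schema.mytable WHERE userId > 1000 ORDER BY userId FETCH FIRST 1000 ROWS ONLY;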
@elcool's solution is a smart idea, but you need to know the total number of rows (which can even change while you are executing the query!). So I propose a modified version, which unfortunately needs 3 subqueries instead of 2:
select * from (
    select * from (
        select * from MYLIB.MYTABLE
        order by MYID asc
        fetch first {last} rows only
    ) I
    order by MYID desc
    fetch first {length} rows only
) II
order by MYID asc
where {last} should be replaced with the row number of the last record I need and {length} should be replaced with the number of rows I need, calculated as last row - first row + 1.
E.g. if I want rows 10 to 25 (16 rows in total), {last} will be 25 and {length} will be 25 - 10 + 1 = 16.
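With those values substituted (table and column names taken from the query above), it would look like this:

select * from (
    select * from (
        select * from MYLIB.MYTABLE
        order by MYID asc
        fetch first 25 rows only
    ) I
    order by MYID desc
    fetch first 16 rows only
) II
order by MYID asc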
Try this:
SELECT * FROM
(
    SELECT T.*, ROW_NUMBER() OVER() AS R FROM TABLE T
) AS tmp
WHERE R BETWEEN 10000 AND 20000
The LIMIT clause allows you to limit the number of rows returned by the query. The LIMIT clause is an extension of the SELECT statement that has the following syntax:
SELECT select_list
FROM table_name
ORDER BY sort_expression
LIMIT n [OFFSET m];
In this syntax:
n is the number of rows to be returned.
m is the number of rows to skip before returning the n rows.
Another shorter version of LIMIT clause is as follows:
LIMIT m, n;
This syntax means skipping m rows and returning the next n rows from the result set.
A table may store rows in an unspecified order. If you don’t use the ORDER BY clause with the LIMIT clause, the returned rows are also unspecified. Therefore, it is a good practice to always use the ORDER BY clause with the LIMIT clause.
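For example, reusing the EMP table from the earlier answer (and assuming a Db2 version with the LIMIT support described above), skipping 40 rows and returning the next 20 could be written as:

SELECT empno, lastname, salary
FROM emp
ORDER BY salary DESC
LIMIT 20 OFFSET 40;
-- or, with the shorter form: LIMIT 40, 20;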
See Db2 LIMIT for more details.
You should also consider the OPTIMIZE FOR n ROWS clause. More details on all of this in the DB2 LUW documentation in the Guidelines for restricting SELECT statements topic:
The OPTIMIZE FOR clause declares the intent to retrieve only a subset of the result or to give priority to retrieving only the first few rows. The optimizer can then choose access plans that minimize the response time for retrieving the first few rows.
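A small sketch of that clause, again reusing the EMP table from above (20 rows is just an example):

SELECT lastname, firstname, salary
FROM emp
ORDER BY salary DESC
FETCH FIRST 20 ROWS ONLY
OPTIMIZE FOR 20 ROWS;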
There are 2 solutions to paginate efficiently on a DB2 table:
1 - the technique using the row_number() function and the OVER clause, which has been presented in another answer ("SELECT row_number() OVER ( ORDER BY ... )"). On some big tables, I sometimes noticed a degradation of performance.
2 - the technique using a scrollable cursor. The implementation depends on the language used. That technique seems more robust on big tables.
I presented the 2 techniques implemented in PHP during a seminar last year. The slides are available at this link:
http://gregphplab.com/serendipity/uploads/slides/DB2_PHP_Best_practices.pdf
Sorry, but this document is only in French.
These are the available options:
DB2 has several strategies to cope with this problem.
You can use the "scrollable cursor" feature.
In this case you can open a cursor and, instead of re-issuing the query, FETCH forward and backward.
This works great if your application can hold state, since it doesn't require DB2 to rerun the query every time (a rough sketch follows this list).
You can use the ROW_NUMBER() OLAP function to number rows and then return the subset you want.
This is ANSI SQL.
You can use the ROWNUM pseudo column, which does the same as ROW_NUMBER() but is convenient if you have Oracle skills.
You can use LIMIT and OFFSET if you lean more toward a MySQL or PostgreSQL dialect.
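A very rough sketch of the scrollable-cursor option, in embedded-SQL style (the table, columns, and host variables are placeholders reused from the contacts example above):

-- Declare a scrollable cursor over the ordered result set
DECLARE c1 SCROLL CURSOR FOR
    SELECT code, name, address
    FROM contacts
    ORDER BY code;

OPEN c1;

-- Jump straight to row 10000, then keep fetching forward (or backward) as pages are requested
FETCH ABSOLUTE 10000 FROM c1 INTO :code, :name, :address;
FETCH NEXT FROM c1 INTO :code, :name, :address;

CLOSE c1;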