Optimizing select query with limit and order by - postgresql

Following is my query:
select * from table order by timestamp desc limit 10
this takes too much time compared to
select * from table limit 10
How can I optimize the first query to get performance close to that of the second query?
UPDATE: I don't have control over the DB server, so I cannot add indexes to gain performance.

Create an index on timestamp.
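A minimal sketch, assuming the placeholder names from the question (the table is written here as mytable, since table is a reserved word):
create index mytable_timestamp_idx on mytable ("timestamp");
PostgreSQL can scan a b-tree index backwards, so this plain ascending index also serves order by timestamp desc limit 10.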

Quassnoi is correct -- you need an index on timestamp.
That said, if your timestamp field roughly follows the order of your primary key (e.g. a date_created or an invoice_date field), you can try this workaround:
select *
from (select * from table order by id desc limit 1000) as t
order by timestamp desc limit 10;

@Nishan is right. There is little you can do. If you do not need every column in the table, you may gain a few milliseconds by explicitly asking for just the columns you need.
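For example, a sketch assuming you only need the hypothetical columns id and "timestamp":
select id, "timestamp" from mytable order by "timestamp" desc limit 10;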

Related

postgresql: how to get the last record even with WHERE clause

I have the following postgresql command
SELECT *
FROM (
SELECT *
FROM tablename
ORDER by id DESC
LIMIT 1000
) as t
WHERE t.col1 = 'someval'
Now I also want to get, along with the above query, the last record of
FROM (
SELECT *
FROM tablename
ORDER by id DESC
LIMIT 1000
)
Currently I am doing:
SELECT *
FROM (
SELECT *
FROM tablename
ORDER by id DESC
LIMIT 1000
) as t
WHERE t.col1 = 'someval'
UNION ALL
SELECT *
FROM (
SELECT *
FROM tablename
ORDER by id DESC
LIMIT 1000
) as t
ORDER BY id ASC
LIMIT 1
Is this the right way?
I would use UNION rather than UNION ALL in this case, since the final row could also be returned by the first query, and I wouldn't want to have it twice in the result set if that happens. The primary key will guarantee that UNION cannot accidentally remove genuinely distinct result rows.
I don't understand the query, in particular why there is a WHERE condition on the outer query in the first case but not in the second. But that is unrelated to the question.
Your current effort is wrong, since the LIMIT 1 applies outside the UNION ALL, so you get only one row as a result. That this is wrong should have been immediately obvious upon testing, so it is baffling that you are asking us if it is right.
You should wrap the whole second SELECT in parentheses, so the LIMIT applies just to it.
Better yet, rather than ordering and taking 1000 rows and then reversing the order and taking the first row, you could just do OFFSET 999 LIMIT 1 to get the 1000th row.
If the 1000th row matches both conditions, do you want to see it twice?
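Putting the corrections together (parentheses around the second SELECT, UNION, OFFSET 999 LIMIT 1 instead of re-sorting, and single quotes for the string literal, since double quotes denote identifiers in PostgreSQL), a sketch:
SELECT *
FROM (
SELECT *
FROM tablename
ORDER BY id DESC
LIMIT 1000
) AS t
WHERE t.col1 = 'someval'
UNION
(
SELECT *
FROM tablename
ORDER BY id DESC
OFFSET 999 LIMIT 1
);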

The last updated data shows first in the Postgres select query?

I have a simple query that takes some results from the User model.
Query 1:
SELECT users.id, users.name, company_id, updated_at
FROM "users"
WHERE (TRIM(telephone) = '8973847' AND company_id = 90)
LIMIT 20 OFFSET 0;
Result: (screenshot not shown)
Then I did some update on customer 341683 and ran the same query again; this time the result is different: the last updated row shows first. So is Postgres returning the last updated row by default, or is something else happening here?
Without an order by clause, the database is free to return rows in any order, and will usually just return them in whichever way is fastest. It stands to reason the row you recently updated will be in some cache, and thus returned first.
If you need to rely on the order of the returned rows, you need to explicitly state it, e.g.:
SELECT users.id, users.name, company_id, updated_at
FROM "users"
WHERE (TRIM(telephone) = '8973847' AND company_id = 90)
ORDER BY id -- Here!
LIMIT 20 OFFSET 0

How do I sort partition data when using query binding on an SSAS cube?

I'm trying to implement various sorts as described in this article.
I have a typical Sales Measure Group partitioned by fiscal period. If I try to add an order by clause to the query, it fails when processing because SSAS wraps the query in a subquery. Is there a way to prevent this from happening? How do I ensure the sort order in a case like this?
Here is the code that is generated for a partition:
SELECT *
FROM
(
SELECT *
FROM [Sales]
WHERE SaleDate between '1/1/2015' and '1/28/2015'
order by SaleDate
)
AS [Sales]
I replaced the field names with * for clarity.
SELECT TOP 100 PERCENT * FROM Sales ORDER BY SaleDate
That is not guaranteed to work: SQL Server is free to ignore both TOP 100 PERCENT and the ORDER BY inside a subquery. The best way to get the order is to ensure the clustered index is on the column you want to order by.
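For instance, a hedged T-SQL sketch (the index name is made up, and this assumes the Sales table has no clustered index yet):
CREATE CLUSTERED INDEX IX_Sales_SaleDate ON Sales (SaleDate);
Even then, a SELECT without ORDER BY carries no ordering guarantee; the clustered index merely makes the desired order the likely scan order.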

get the latest records in an effective way

I must get the latest records from my table. For this, I first select all records, then order them and take the latest 100. It costs a lot. I wonder, is there a better way to do this?
I am using Oracle 10g.
You may try to add a condition if you know that those 100 records will always meet it, e.g. load_date > '01-Jan-2012'.
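A sketch of that idea, assuming a hypothetical load_date column and combining it with Oracle's rownum pattern (also shown in the next answer):
SELECT *
FROM (
SELECT *
FROM mytable
WHERE load_date > DATE '2012-01-01'
ORDER BY load_date DESC
)
WHERE rownum <= 100;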
I presume you are doing this all in SQL?
SELECT * FROM (
SELECT *
FROM table
ORDER BY modified_date DESC
)
WHERE rownum <= 100;
One thing you can do is create a descending index on 'modified_date':
CREATE INDEX table_modified_date_desc_idx on table(modified_date DESC);
Then your query should use this index to retrieve only the latest records. If it doesn't, you may also have to re-gather statistics on this table.
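If you do need to re-gather statistics, a hedged sketch using Oracle's DBMS_STATS package (schema and table names are placeholders):
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'MYSCHEMA', tabname => 'MYTABLE');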

Equivalent of LIMIT for DB2

How do you do LIMIT in DB2 for iSeries?
I have a table with more than 50,000 records and I want to return records 0 to 10,000, and records 10,000 to 20,000.
I know in MySQL you write LIMIT 0,10000 at the end of the query for rows 0 to 10,000 and LIMIT 10000,10000 for rows 10,000 to 20,000.
So, how is this done in DB2? What's the code and syntax?
(A full query example is appreciated.)
Using FETCH FIRST [n] ROWS ONLY:
http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.perf/db2z_fetchfirstnrows.htm
SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
FROM EMP
ORDER BY SALARY DESC
FETCH FIRST 20 ROWS ONLY;
To get ranges, you'd have to use ROW_NUMBER() (available since V5R4) and filter on it in the WHERE clause of an outer query (stolen from here: http://www.justskins.com/forums/db2-select-how-to-123209.html):
SELECT code, name, address
FROM (
SELECT row_number() OVER ( ORDER BY code ) AS rid, code, name, address
FROM contacts
WHERE name LIKE '%Bob%'
) AS t
WHERE t.rid BETWEEN 20 AND 25;
Developed this method:
You NEED a table that has a unique value that can be ordered.
If you want rows 10,000 to 25,000 and your table has 40,000 rows, first you need to get the starting point and the number of rows:
int start = 40000 - 10000;
int total = 25000 - 10000;
And then pass these values into the query:
SELECT * FROM
(SELECT * FROM schema.mytable
ORDER BY userId DESC fetch first {start} rows only ) AS mini
ORDER BY mini.userId ASC fetch first {total} rows only
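With the numbers above, {start} = 40000 - 10000 = 30000 and {total} = 25000 - 10000 = 15000, so the substituted query reads:
SELECT * FROM
(SELECT * FROM schema.mytable
ORDER BY userId DESC fetch first 30000 rows only ) AS mini
ORDER BY mini.userId ASC fetch first 15000 rows only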
Support for OFFSET and LIMIT was recently added to DB2 for i 7.1 and 7.2. You need the following DB PTF group levels to get this support:
SF99702 level 9 for IBM i 7.2
SF99701 level 38 for IBM i 7.1
See here for more information: OFFSET and LIMIT documentation, DB2 for i Enhancement Wiki
Here's the solution I came up with:
select FIELD from TABLE where FIELD > LASTVAL order by FIELD fetch first N rows only;
By initializing LASTVAL to 0 (or '' for a text field), then setting it to the last value in the most recent set of records, this will step through the table in chunks of N records.
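A worked sketch, assuming a numeric key column id and N = 100 (all names are hypothetical):
-- first chunk: LASTVAL starts at 0
select id from mytable where id > 0 order by id fetch first 100 rows only;
-- if the last id returned was 4711, the next chunk is:
select id from mytable where id > 4711 order by id fetch first 100 rows only;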
@elcool's solution is a smart idea, but you need to know the total number of rows (which can even change while you are executing the query!). So I propose a modified version, which unfortunately needs 3 nested queries instead of 2:
select * from (
select * from (
select * from MYLIB.MYTABLE
order by MYID asc
fetch first {last} rows only
) I
order by MYID desc
fetch first {length} rows only
) II
order by MYID asc
where {last} should be replaced with the row number of the last record I need and {length} should be replaced with the number of rows I need, calculated as last row - first row + 1.
E.g. if I want rows from 10 to 25 (16 rows in total), {last} will be 25 and {length} will be 25-10+1=16.
Try this:
SELECT * FROM
(
SELECT T.*, ROW_NUMBER() OVER() AS R FROM TABLE T
) AS X
WHERE R BETWEEN 10000 AND 20000
Note that an empty OVER() makes the row numbering arbitrary; put an ORDER BY inside OVER(...) if you need a deterministic paging order.
The LIMIT clause allows you to limit the number of rows returned by the query. The LIMIT clause is an extension of the SELECT statement that has the following syntax:
SELECT select_list
FROM table_name
ORDER BY sort_expression
LIMIT n [OFFSET m];
In this syntax:
n is the number of rows to be returned.
m is the number of rows to skip before returning the n rows.
Another shorter version of LIMIT clause is as follows:
LIMIT m, n;
This syntax means skipping m rows and returning the next n rows from the result set.
A table may store rows in an unspecified order. If you don’t use the ORDER BY clause with the LIMIT clause, the returned rows are also unspecified. Therefore, it is a good practice to always use the ORDER BY clause with the LIMIT clause.
See Db2 LIMIT for more details.
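For example, reusing the EMP table from earlier, a sketch that returns rows 21 to 30 by descending salary (assuming a Db2 version with LIMIT support):
SELECT LASTNAME, FIRSTNAME, SALARY
FROM EMP
ORDER BY SALARY DESC
LIMIT 10 OFFSET 20;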
You should also consider the OPTIMIZE FOR n ROWS clause. More details on all of this in the DB2 LUW documentation in the Guidelines for restricting SELECT statements topic:
The OPTIMIZE FOR clause declares the intent to retrieve only a subset of the result or to give priority to retrieving only the first few rows. The optimizer can then choose access plans that minimize the response time for retrieving the first few rows.
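A sketch combining it with FETCH FIRST, again on the EMP table from earlier:
SELECT LASTNAME, SALARY
FROM EMP
ORDER BY SALARY DESC
FETCH FIRST 20 ROWS ONLY
OPTIMIZE FOR 20 ROWS;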
There are 2 solutions to paginate efficiently on a DB2 table:
1 - the technique using the ROW_NUMBER() function and the OVER clause, which has been presented in another post ("SELECT row_number() OVER ( ORDER BY ... )"). On some big tables, I have sometimes noticed a degradation of performance.
2 - the technique using a scrollable cursor. The implementation depends on the language used. That technique seems more robust on big tables.
I presented the 2 techniques implemented in PHP during a seminar last year. The slides are available at this link:
http://gregphplab.com/serendipity/uploads/slides/DB2_PHP_Best_practices.pdf
Sorry, but this document is only in French.
There are these available options:
DB2 has several strategies to cope with this problem.
You can use the "scrollable cursor" feature.
In this case you can open a cursor and, instead of re-issuing the query, you can FETCH forward and backward (see the sketch after this list).
This works great if your application can hold state, since it doesn't require DB2 to rerun the query every time.
You can use the ROW_NUMBER() OLAP function to number rows and then return the subset you want.
This is ANSI SQL.
You can use the ROWNUM pseudo column, which does the same as ROW_NUMBER() but is suitable if you have Oracle skills.
You can use LIMIT and OFFSET if you are more leaning towards a MySQL or PostgreSQL dialect.
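As mentioned in the list above, a minimal embedded-SQL sketch of the scrollable-cursor option (cursor, table, and host-variable names are placeholders):
DECLARE C1 SCROLL CURSOR FOR
SELECT EMPNO, LASTNAME FROM EMP ORDER BY EMPNO;
OPEN C1;
FETCH ABSOLUTE 10000 FROM C1 INTO :empno, :lastname; -- jump straight to row 10,000
FETCH NEXT FROM C1 INTO :empno, :lastname; -- then step forward one row at a time
CLOSE C1;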