PostgreSQL: how to get the last record even with a WHERE clause

I have the following PostgreSQL query:
SELECT *
FROM (
    SELECT *
    FROM tablename
    ORDER BY id DESC
    LIMIT 1000
) AS t
WHERE t.col1 = 'someval'
Now I also want to get the last record of the following subquery along with the above query:
FROM (
    SELECT *
    FROM tablename
    ORDER BY id DESC
    LIMIT 1000
)
Currently I am doing:
SELECT *
FROM (
    SELECT *
    FROM tablename
    ORDER BY id DESC
    LIMIT 1000
) AS t
WHERE t.col1 = 'someval'
UNION ALL
SELECT *
FROM (
    SELECT *
    FROM tablename
    ORDER BY id DESC
    LIMIT 1000
) AS t
ORDER BY id ASC
LIMIT 1
Is this the right way?

I would use UNION rather than UNION ALL in this case, since the final row could also be returned by the first query, and I wouldn't want to have it twice in the result set if that happens. The primary key guarantees that UNION cannot accidentally collapse two genuinely different rows.
I don't understand the query, in particular why there is a WHERE condition on the outer query in the first case but not in the second. But that is unrelated to the question.

Your current effort is wrong, since the LIMIT 1 applies outside the UNION ALL, so you get only one row as a result. That this is wrong should have been immediately obvious upon testing, so it is baffling that you are asking us if it is right.
You should wrap the whole second SELECT in parentheses, so the LIMIT applies just to it.
Better yet, rather than ordering and taking 1000 rows and then reversing the order and taking the first row, you could just do OFFSET 999 LIMIT 1 to get the 1000th row.
If the 1000th row matches both conditions, do you want to see it twice?
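Putting both fixes together, a sketch of what the corrected query could look like (using UNION as the first answer suggests, and OFFSET 999 to grab the 1000th row directly):
SELECT *
FROM (
    SELECT *
    FROM tablename
    ORDER BY id DESC
    LIMIT 1000
) AS t
WHERE t.col1 = 'someval'
UNION
(
    SELECT *
    FROM tablename
    ORDER BY id DESC
    OFFSET 999
    LIMIT 1
)
Note that if the table has fewer than 1000 rows, the OFFSET branch simply returns nothing.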

Related

How to limit to just one result per condition when looking through multiple OR/IN conditions in the WHERE clause (Postgresql)

For Example:
SELECT * FROM Customers
WHERE Country IN ('Germany', 'France', 'UK')
I want to LIMIT 1 for each of the countries in my IN clause so I only see a total of 3 rows: one customer per country (1 German, 1 French, 1 UK). Is there a simple way to do that?
Normally, a simple GROUP BY would suffice for this type of solution, however as you have specified that you want to include ALL of the columns in the result, then we can use the ROW_NUMBER() window function to provide a value to filter on.
As a general rule it is important to specify the column to sort on (ORDER BY) for all windowing or paged queries to make the result repeatable.
As no schema has been supplied, I have used Name as the field to sort on for the window; please update that (or the question) with any other field you would like. The PK is a good candidate if you have nothing else to go on.
SELECT * FROM
(
    SELECT *
         , ROW_NUMBER() OVER (PARTITION BY Country ORDER BY Name) AS _rn
    FROM Customers
    WHERE Country IN ('Germany', 'France', 'UK')
) AS ranked
WHERE _rn = 1
The PARTITION BY causes the ROW_NUMBER to be counted separately within each group of records sharing the same Country value, starting at 1 in each group, so in this case we only select the rows that get a row number (aliased as _rn) of 1.
The Country filter could have gone in the outer query if you really wanted, but ROW_NUMBER() can only be specified in the SELECT or ORDER BY clauses of a query, so to use it as a filter criterion we are forced to wrap the results in some way.
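Since the question is tagged PostgreSQL, a shorter Postgres-specific alternative is DISTINCT ON, which keeps the first row per group directly; this sketch assumes the same Customers table and the same Name ordering:
SELECT DISTINCT ON (Country) *
FROM Customers
WHERE Country IN ('Germany', 'France', 'UK')
ORDER BY Country, Name;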

Find time difference between two most recent orders

I am trying to estimate the time of a new order from repeat customers by finding the time difference between the most recent order and the second most recent order, and then adding that difference to the most recent order.
I have been trying limit and offset, but this returns a blanket date for every row. I am thinking I need to do a lateral join, but not sure how to implement it correctly. When I try to do it, I receive no output.
select public.orders.customer_id,
       max(public.orders.created_at) as last_order_date,
       (select created_at
        from public.orders
        group by created_at
        order by created_at desc
        limit 1 offset 1) as second_last
from public.orders
inner join (select customer_id, count(*)
            from public.orders
            where status = 'fulfilled'
            group by public.orders.customer_id
            having count(customer_id) > 1) repeat_customers
    on public.orders.customer_id = repeat_customers.customer_id
group by public.orders.customer_id;
I wanted the second_last field to be populated by the second most recent date for each customer_id, but the output is the second most recent date for the entire table, resulting in the same date for every entry.
Your second_last column isn't limited per customer, so it will indeed find the second most recent date across the entire table, just like the results you've seen. See the WHERE clause in the example below, which should solve this:
(SELECT created_at
 FROM public.orders po
 WHERE po.customer_id = orders.customer_id
 ORDER BY created_at DESC
 LIMIT 1 OFFSET 1) AS second_last
I've aliased the inner table (po) so that its customer_id doesn't shadow the outer query's; note also the DESC, without which OFFSET 1 would pick the second oldest date rather than the second most recent.
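Since the question mentions lateral joins, here is a rough sketch of that approach under the same assumptions (fulfilled orders only, repeat customers only); it also adds the difference to the most recent order, as the question describes:
select o.customer_id,
       o.last_order_date,
       prev.created_at as second_last,
       o.last_order_date + (o.last_order_date - prev.created_at) as estimated_next_order
from (select customer_id, max(created_at) as last_order_date
      from public.orders
      where status = 'fulfilled'
      group by customer_id
      having count(*) > 1) o
cross join lateral
    (select created_at
     from public.orders po
     where po.customer_id = o.customer_id
       and po.status = 'fulfilled'
     order by created_at desc
     limit 1 offset 1) prev;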

PostgreSQL - order randomly, but with NULLs first

I have a query that takes all rows out of a table, and joins with another table I am updating. The other table has some items that have been checked (these get a value), and some which are not yet checked. I am trying to implement a way to update all records, but make sure any NULLs get sorted as quickly as possible. I have the following query:
SELECT * FROM posts
LEFT JOIN post_stats
ON post_stats.post_id = posts.id
ORDER BY RANDOM() NULLS FIRST LIMIT 10
However, this is ordering everything randomly. Is there a way to order everything randomly, but any NULLs get shown first?
Note that you don't even specify which column can contain NULLs in your query; RANDOM() itself never returns NULL, so NULLS FIRST has nothing to act on there. This is an indicator that something is going wrong.
The following query (replace <YOUR_COLUMN> with the nullable column you need) should do what you want.
SELECT *
FROM posts
LEFT JOIN post_stats ON post_stats.post_id = posts.id
ORDER BY <YOUR_COLUMN> IS NOT NULL, RANDOM()
LIMIT 10;
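This works because boolean false sorts before true, so rows where the column IS NULL (for which IS NOT NULL evaluates to false) come first, and ties are then shuffled by RANDOM(). As a concrete sketch, assuming the unchecked items are marked by a NULL post_stats.checked_at (a hypothetical column, since the schema wasn't given):
SELECT *
FROM posts
LEFT JOIN post_stats ON post_stats.post_id = posts.id
ORDER BY post_stats.checked_at IS NOT NULL, RANDOM()
LIMIT 10;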

Equivalent of LIMIT for DB2

How do you do LIMIT in DB2 for iSeries?
I have a table with more than 50,000 records and I want to return records 0 to 10,000, and records 10,000 to 20,000.
I know in MySQL you write LIMIT 0,10000 at the end of the query for 0 to 10,000 and LIMIT 10000,10000 at the end of the query for 10,000 to 20,000.
So, how is this done in DB2? What's the code and syntax?
(a full query example is appreciated)
Using FETCH FIRST [n] ROWS ONLY:
http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.perf/db2z_fetchfirstnrows.htm
SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
FROM EMP
ORDER BY SALARY DESC
FETCH FIRST 20 ROWS ONLY;
To get ranges, you'd have to use ROW_NUMBER() (since v5r4) and use that within the WHERE clause: (stolen from here: http://www.justskins.com/forums/db2-select-how-to-123209.html)
SELECT code, name, address
FROM (
    SELECT row_number() OVER (ORDER BY code) AS rid, code, name, address
    FROM contacts
    WHERE name LIKE '%Bob%'
) AS t
WHERE t.rid BETWEEN 20 AND 25;
I developed this method:
You NEED a table that has a unique value that can be ordered.
If you want rows 10,000 to 25,000 and your Table has 40,000 rows, first you need to get the starting point and total rows:
int start = 40000 - 10000;
int total = 25000 - 10000;
And then pass these by code to the query:
SELECT * FROM
    (SELECT * FROM schema.mytable
     ORDER BY userId DESC FETCH FIRST {start} ROWS ONLY) AS mini
ORDER BY mini.userId ASC FETCH FIRST {total} ROWS ONLY
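Plugging in the numbers from the example above (start = 30000, total = 15000), the query becomes:
SELECT * FROM
    (SELECT * FROM schema.mytable
     ORDER BY userId DESC FETCH FIRST 30000 ROWS ONLY) AS mini
ORDER BY mini.userId ASC FETCH FIRST 15000 ROWS ONLY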
Support for OFFSET and LIMIT was recently added to DB2 for i 7.1 and 7.2. You need the following DB PTF group levels to get this support:
SF99702 level 9 for IBM i 7.2
SF99701 level 38 for IBM i 7.1
See here for more information: OFFSET and LIMIT documentation, DB2 for i Enhancement Wiki
Here's the solution I came up with:
select FIELD from TABLE where FIELD > LASTVAL order by FIELD fetch first N rows only;
By initializing LASTVAL to 0 (or '' for a text field), then setting it to the last value in the most recent set of records, this will step through the table in chunks of N records.
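For example, stepping through in chunks of 10,000 (keeping the original placeholder names, and assuming FIELD is unique and numeric):
-- first chunk, LASTVAL initialized to 0
select FIELD from TABLE where FIELD > 0 order by FIELD fetch first 10000 rows only;
-- suppose the last FIELD value returned was 12345; the next chunk is then
select FIELD from TABLE where FIELD > 12345 order by FIELD fetch first 10000 rows only;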
#elcool's solution is a smart idea, but you need to know the total number of rows (which can even change while you are executing the query!). So I propose a modified version, which unfortunately needs 3 subqueries instead of 2:
select * from (
    select * from (
        select * from MYLIB.MYTABLE
        order by MYID asc
        fetch first {last} rows only
    ) I
    order by MYID desc
    fetch first {length} rows only
) II
order by MYID asc
where {last} should be replaced with row number of the last record I need and {length} should be replaced with the number of rows I need, calculated as last row - first row + 1.
E.g. if I want rows from 10 to 25 (totally 16 rows), {last} will be 25 and {length} will be 25-10+1=16.
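With those numbers plugged in, the query reads:
select * from (
    select * from (
        select * from MYLIB.MYTABLE
        order by MYID asc
        fetch first 25 rows only
    ) I
    order by MYID desc
    fetch first 16 rows only
) II
order by MYID asc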
Try this (supply a sort column inside OVER(); without one the row numbering is nondeterministic, so pages may not be stable):
SELECT * FROM
(
    SELECT T.*, ROW_NUMBER() OVER (ORDER BY <sort_column>) AS R
    FROM mytable T
) AS numbered
WHERE R BETWEEN 10000 AND 20000
The LIMIT clause allows you to limit the number of rows returned by the query. The LIMIT clause is an extension of the SELECT statement that has the following syntax:
SELECT select_list
FROM table_name
ORDER BY sort_expression
LIMIT n [OFFSET m];
In this syntax:
n is the number of rows to be returned.
m is the number of rows to skip before returning the n rows.
Another shorter version of LIMIT clause is as follows:
LIMIT m, n;
This syntax means skipping m rows and returning the next n rows from the result set.
A table may store rows in an unspecified order. If you don’t use the ORDER BY clause with the LIMIT clause, the returned rows are also unspecified. Therefore, it is a good practice to always use the ORDER BY clause with the LIMIT clause.
See Db2 LIMIT for more details.
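For example, the question's second page of 10,000 rows could be fetched like this (assuming a Db2 level that has LIMIT/OFFSET enabled, and reusing the EMP table from above):
SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
FROM EMP
ORDER BY SALARY DESC
LIMIT 10000 OFFSET 10000;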
You should also consider the OPTIMIZE FOR n ROWS clause. More details on all of this in the DB2 LUW documentation in the Guidelines for restricting SELECT statements topic:
The OPTIMIZE FOR clause declares the intent to retrieve only a subset of the result or to give priority to retrieving only the first few rows. The optimizer can then choose access plans that minimize the response time for retrieving the first few rows.
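A sketch of its use on the EMP table from above; note that OPTIMIZE FOR only influences the access plan, it does not restrict the result set:
SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
FROM EMP
ORDER BY SALARY DESC
OPTIMIZE FOR 20 ROWS;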
There are 2 solutions to paginate efficiently on a DB2 table :
1 - the technique using the row_number() function and the OVER clause, which has been presented in another answer ("SELECT row_number() OVER ( ORDER BY ... )"). On some big tables, I have sometimes noticed a degradation in performance.
2 - the technique using a scrollable cursor. The implementation depends on the language used. That technique seems more robust on big tables.
I presented the 2 techniques implemented in PHP during a seminar last year. The slides are available at this link:
http://gregphplab.com/serendipity/uploads/slides/DB2_PHP_Best_practices.pdf
Sorry, but this document is only in French.
There are several options available; DB2 has multiple strategies to cope with this problem:
You can use the "scrollable cursor" feature (see the sketch after this list).
In this case you can open a cursor and, instead of re-issuing the query, you can FETCH forward and backward.
This works great if your application can hold state, since it doesn't require DB2 to rerun the query every time.
You can use the ROW_NUMBER() OLAP function to number rows and then return the subset you want.
This is ANSI SQL.
You can use the ROWNUM pseudo-column, which does the same as ROW_NUMBER() but is suitable if you have Oracle skills.
You can use LIMIT and OFFSET if you are leaning more toward a MySQL or PostgreSQL dialect.
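A minimal sketch of the scrollable cursor approach in embedded SQL (the host variable names are hypothetical, and the exact declaration varies by environment):
DECLARE C1 SCROLL CURSOR FOR
    SELECT LASTNAME, SALARY FROM EMP ORDER BY SALARY DESC;
OPEN C1;
-- jump straight to row 10,000, then walk forward from there
FETCH ABSOLUTE 10000 FROM C1 INTO :lastname, :salary;
FETCH NEXT FROM C1 INTO :lastname, :salary;
CLOSE C1;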

Select only half the records

I am trying to figure out how to select half the records where an ID is null. I want half because I am going to use that result set to update another ID field. Then I am going to update the rest with another value for that ID field.
So essentially I want to update someFieldID, the field in question, on half of the records with one number and on the rest with another number, splitting the update between the two values.
In Oracle you can use the ROWNUM pseudocolumn. I believe in SQL Server you can use TOP.
Example:
select TOP 50 PERCENT * from table
You can select by percent:
SELECT TOP 50 PERCENT *fields* FROM YourTable WHERE ...
UPDATE x SET id = #value
FROM (SELECT TOP 50 PERCENT * FROM table WHERE id IS NULL) x
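Extending that to the full task of splitting the NULL rows between two values, a sketch in the same T-SQL style (the values 1 and 2 stand in for your two someFieldID numbers, and YourTable for the real table name):
-- first pass: give half of the NULL rows the first value
UPDATE x SET someFieldID = 1
FROM (SELECT TOP 50 PERCENT * FROM YourTable WHERE someFieldID IS NULL) x;
-- second pass: whatever is still NULL is the other half
UPDATE YourTable SET someFieldID = 2 WHERE someFieldID IS NULL;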
The following SQL will return the col_ids of the first half of the table.
SELECT col_id FROM table
WHERE rownum <= (SELECT count(col_id)/2 FROM table);
If the total number of col_ids is an odd number then you will get the first half minus one. This is because if, for instance, we have 51 total records, count(col_id)/2 returns 25.5, and since there is no rownum equal to this result, we get everything at 25 and below. That means the other 26 are not returned.
However, the reverse statement does not work, because Oracle assigns rownum as rows are fetched: the first candidate row always gets rownum 1, fails the > condition, and is discarded, so no row can ever satisfy the predicate:
SELECT col_id FROM table
WHERE rownum > (SELECT count(col_id)/2 FROM table);
So if you want the other half of the table, you could just store the first results into a temp table; let's call it TABLE_A. Then just do MINUS on the original table from this table:
SELECT col_id FROM table
MINUS
SELECT col_id FROM table_a
Hopefully this helps someone.