PostgreSQL - order randomly, but with NULLs first - postgresql

I have a query that takes all rows out of a table, and joins with another table I am updating. The other table has some items that have been checked (these get a value), and some which are not yet checked. I am trying to implement a way to update all records, but make sure any NULLs get sorted as quickly as possible. I have the following query:
SELECT * FROM posts
LEFT JOIN post_stats
ON post_stats.post_id = posts.id
ORDER BY RANDOM() NULLS FIRST LIMIT 10
However, this is ordering everything randomly. Is there a way to order everything randomly, but any NULLs get shown first?

Note that you don't even specify which column can contain NULLs in your query. This is an indicator that something is going wrong.
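The following query (replace <YOUR_COLUMN> with the column that can be NULL) should do what you want. The expression <YOUR_COLUMN> IS NOT NULL is false for NULL rows, and false sorts before true, so those rows come first; RANDOM() then shuffles the rows within each group.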
SELECT *
FROM posts
LEFT JOIN post_stats ON post_stats.post_id = posts.id
ORDER BY <YOUR_COLUMN> IS NOT NULL, RANDOM()
LIMIT 10;
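As a rough, runnable illustration of the same ordering, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example runs without a server). The column name checked_at and the sample data are hypothetical; the ORDER BY clause is the part that carries over.

```python
import sqlite3


def nulls_first_random(limit=10):
    # In-memory SQLite database standing in for PostgreSQL (assumption).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY);
        CREATE TABLE post_stats (post_id INTEGER, checked_at TEXT);
    """)
    conn.executemany("INSERT INTO posts (id) VALUES (?)",
                     [(i,) for i in range(1, 21)])
    # Hypothetical data: posts 1..15 were checked, posts 16..20 have no stats row,
    # so the LEFT JOIN yields NULL in checked_at for them.
    conn.executemany(
        "INSERT INTO post_stats (post_id, checked_at) VALUES (?, '2020-01-01')",
        [(i,) for i in range(1, 16)],
    )
    rows = conn.execute("""
        SELECT posts.id, post_stats.checked_at
        FROM posts
        LEFT JOIN post_stats ON post_stats.post_id = posts.id
        ORDER BY post_stats.checked_at IS NOT NULL, RANDOM()
        LIMIT ?
    """, (limit,)).fetchall()
    conn.close()
    return rows
```

Calling nulls_first_random(10) should return the five unchecked posts first, in random order, followed by five randomly chosen checked posts.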

Related

postgresql: how to get the last record even with WHERE clause

I have the following postgresql command
SELECT *
FROM (
SELECT *
FROM tablename
ORDER by id DESC
LIMIT 1000
) as t
WHERE t.col1='someval'
Now I also want to get the last record of the following subquery, along with the result of the above query:
FROM (
SELECT *
FROM tablename
ORDER by id DESC
LIMIT 1000
)
Currently i am doing
SELECT *
FROM (
SELECT *
FROM tablename
ORDER by id DESC
LIMIT 1000
) as t
WHERE t.col1='someval'
UNION ALL
SELECT *
FROM (
SELECT *
FROM tablename
ORDER by id DESC
LIMIT 1000
) as t
ORDER BY id ASC
LIMIT 1
Is this the right way?
I would use UNION rather than UNION ALL in this case, since the final row could also be returned by the first query, and I wouldn't want to have it twice in the result set if that happens. Because every row carries the primary key, UNION cannot accidentally remove rows that are genuinely different.
I don't understand the query, in particular why there is a WHERE condition at the outside query in the first case, but not in the second. But that is unrelated to the question.
Your current effort is wrong, since the LIMIT 1 applies outside the UNION ALL, so you get only one row as a result. That this is wrong should have been immediately obvious upon testing, so it is baffling that you are asking us if it is right.
You should wrap the whole second SELECT in parentheses, so the LIMIT applies just to it.
Better yet, rather than ordering and taking 1000 rows and then reversing the order and taking the first row, you could just do OFFSET 999 LIMIT 1 to get the 1000th row.
If the 1000th row matches both conditions, do you want to see it twice?
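Putting these suggestions together, a minimal sketch might look like the following. It runs against SQLite through Python's sqlite3 module purely as a stand-in for PostgreSQL (an assumption). SQLite does not accept a parenthesized SELECT as a UNION branch, so the second branch is wrapped in a derived table instead, a form PostgreSQL accepts as well. The helper names, the alias last_row and the sample data are hypothetical.

```python
import sqlite3

QUERY = """
SELECT * FROM (
    SELECT * FROM tablename ORDER BY id DESC LIMIT :n
) AS t
WHERE t.col1 = :val
UNION
SELECT * FROM (
    -- the n-th newest row, fetched directly with OFFSET n-1 LIMIT 1
    SELECT * FROM tablename ORDER BY id DESC LIMIT 1 OFFSET :skip
) AS last_row
ORDER BY id
"""


def matches_plus_window_end(conn, n, val):
    # UNION (not UNION ALL) drops the duplicate when the n-th row also matches.
    params = {"n": n, "val": val, "skip": n - 1}
    return conn.execute(QUERY, params).fetchall()


def make_tablename_db():
    # Hypothetical sample data: ids 1..20, even ids carry 'someval'.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, col1 TEXT)")
    conn.executemany(
        "INSERT INTO tablename VALUES (?, ?)",
        [(i, "someval" if i % 2 == 0 else "other") for i in range(1, 21)],
    )
    return conn
```

With a window of 5 rows, the oldest row in the window (id 16) also matches the filter and appears only once; with a window of 4, the oldest row (id 17) does not match and is added by the second branch.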

Postgres pagination with non-unique keys?

Suppose I have a table of events with (indexed) columns id : uuid and created : timestamp.
The id column is unique, but the created column is not. I would like to walk the table in chronological order using the created column.
Something like this:
SELECT * FROM events WHERE created >= $<after> ORDER BY created ASC LIMIT 10
Here $<after> is a template parameter that is taken from the previous query.
Now, I can see two issues with this:
Since created is not unique, the order will not be fully defined. Perhaps the sort should be id, created?
Each row should only be on one page, but with this query the last row is always included on the next page.
How should I go about this in Postgres?
SELECT * FROM events
WHERE created >= $<after> and (id > $<id> OR created > $<after>)
ORDER BY created ASC ,id ASC LIMIT 10
That way, events that share a timestamp are ordered by id, and you can split pages anywhere. Here $<id> is the id of the last row of the previous page, and the strict comparison keeps that row from being repeated on the next page.
You can say the same thing this way:
SELECT * FROM events
WHERE (created,id) > ($<after>,$<id>)
ORDER BY created ASC ,id ASC LIMIT 10
and for me this produces a slightly better plan.
An index on (created,id) will help performance most, but for
many circumstances an index on created may suffice.
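To see the row-comparison form walk a table page by page, here is a minimal sketch using Python's sqlite3 module as a stand-in for PostgreSQL (an assumption; SQLite supports the same row-value comparison syntax). The helper names are hypothetical, and the ids are plain text rather than real uuids.

```python
import sqlite3


def fetch_page(conn, after=None, page_size=10):
    """Return one page; `after` is the (created, id) pair of the
    last row of the previous page, or None for the first page."""
    if after is None:
        sql = ("SELECT created, id FROM events "
               "ORDER BY created ASC, id ASC LIMIT ?")
        params = (page_size,)
    else:
        sql = ("SELECT created, id FROM events "
               "WHERE (created, id) > (?, ?) "
               "ORDER BY created ASC, id ASC LIMIT ?")
        params = (after[0], after[1], page_size)
    return conn.execute(sql, params).fetchall()


def walk_all(conn, page_size=10):
    # Keep asking for the next page until an empty page comes back.
    seen, after = [], None
    while True:
        page = fetch_page(conn, after, page_size)
        if not page:
            return seen
        seen.extend(page)
        after = page[-1]


def make_events_db():
    # Hypothetical data: 25 events sharing only three distinct timestamps.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, created TEXT)")
    conn.executemany(
        "INSERT INTO events (id, created) VALUES (?, ?)",
        [(f"id-{i:02d}", f"2020-01-01 00:00:0{i % 3}") for i in range(25)],
    )
    return conn
```

Walking the demo table with walk_all(make_events_db(), page_size=4) should visit every row exactly once, even though page boundaries fall in the middle of groups of rows with the same timestamp.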
First, as you said, you should enforce a total ordering. Since the main thing you care about is created, you should start with that. id could be the secondary ordering, a tie breaker invisible to the user that just ensures the ordering is consistent. Secondly, instead of messing around with conditions on created, you could just use an offset clause to return later results:
SELECT * FROM events ORDER BY created ASC, id ASC LIMIT 10 OFFSET <10 * page number>
-- Note that page number is zero based

Will Postgres' DISTINCT function always return null as the first element?

I'm selecting distinct values from tables thru Java's JDBC connector and it seems that NULL value (if there's any) is always the first row in the ResultSet.
I need to remove this NULL from the List where I load this ResultSet. The logic looks only at the first element and if it's null then ignores it.
I'm not using any ORDER BY in the query, can I still trust that logic? I can't find any reference in Postgres' documentation about this.
You can add a check for NOT NULL. Simply like
select distinct columnName
from Tablename
where columnName IS NOT NULL
Also, if you do not provide an ORDER BY clause, the order in which you get the results is not guaranteed, so you cannot rely on it. It is better, and recommended, to provide an ORDER BY clause if you want your output in a particular order (i.e., ascending or descending).
If you are looking for a reference Postgresql document then it says:
If ORDER BY is not given, the rows are returned in whatever order the
system finds fastest to produce.
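A minimal sketch of that advice, filtering the NULL out in the query and asking for an explicit order, could look like this. It uses Python's sqlite3 module instead of JDBC and PostgreSQL (an assumption, so that it runs standalone); the table t, the column n and the sample values are hypothetical.

```python
import sqlite3


def distinct_non_null_values():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (n INTEGER)")
    # Hypothetical sample values, including a duplicate and a NULL.
    conn.executemany(
        "INSERT INTO t (n) VALUES (?)",
        [(1,), (2,), (1,), (3,), (None,), (8,), (0,)],
    )
    # The WHERE clause removes the NULL; ORDER BY makes the order explicit
    # instead of relying on whatever the engine happens to produce.
    rows = conn.execute(
        "SELECT DISTINCT n FROM t WHERE n IS NOT NULL ORDER BY n"
    ).fetchall()
    conn.close()
    return [n for (n,) in rows]
```

With this approach the application code no longer needs to inspect the first element of the list at all.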
If it is not stated in the manual, I wouldn't trust it. However, just for fun, and to try to figure out what logic is being used, running the following query does bring the NULL (for no apparent reason) to the top, while all other values are in an apparently random order:
with t(n) as (values (1),(2),(1),(3),(null),(8),(0))
select distinct * from t
However, cross joining the table with a modified version of itself brings two NULLs to the top, but leaves other NULLs dispersed at random throughout the result set. So there doesn't seem to be clear-cut logic that clumps all NULL values at the top.
with t(n) as (values (1),(2),(1),(3),(null),(8),(0))
select distinct * from t
cross join (select n+3 from t) t2

Understanding a simple DISTINCT ON in postgresql

I am having a small difficulty understanding the below simple DISTINCT ON query:
SELECT DISTINCT
ON (bcolor) bcolor,
fcolor
FROM
t1
ORDER BY
bcolor,
fcolor;
I have this table here:
What is the order of execution of the above query, and why am I getting the following result:
As I understand since ORDER BY is used it will display the table columns (both of them), in alphabetical order and since ON is used it will return the 1st matched duplicate, but I am still confused about how the resulting table is displayed.
Can somebody take me through how exactly this query is executed ?
This is an odd one since you would think that the SELECT would happen first, then the ORDER BY like any normal RDBMS, but the DISTINCT ON is special. It needs to know the order of the records in order to properly determine which records should be dropped.
So, in this case, it orders first by the bcolor, then by the fcolor. Then it determines distinct bcolors, and drops any but the first record for each distinct group.
In short, it does ORDER BY then applies the DISTINCT ON to drop the appropriate records. I think it would be most helpful to think of 'DISTINCT ON' as being special functionality that differs greatly from DISTINCT.
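The "sort first, then keep the first row of each group" idea can be mimicked in a few lines of plain Python. This is only a conceptual sketch of the semantics described above, not how PostgreSQL implements it, and the sample rows are hypothetical since the original table is not shown. NULLs are left out because Python cannot sort None against strings.

```python
from itertools import groupby


def distinct_on_first(rows):
    """Emulate SELECT DISTINCT ON (bcolor) bcolor, fcolor ... ORDER BY bcolor, fcolor
    for a list of (bcolor, fcolor) tuples."""
    # Step 1: ORDER BY bcolor, fcolor (tuples sort element by element).
    ordered = sorted(rows)
    # Step 2: for each distinct bcolor, keep only the first row of its group.
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[0])]


# Hypothetical sample data: (bcolor, fcolor)
sample = [
    ("red", "red"),
    ("red", "green"),
    ("blue", "red"),
    ("green", "blue"),
    ("blue", "green"),
    ("red", "blue"),
]
```

For the sample rows, distinct_on_first(sample) keeps ("blue", "green"), ("green", "blue") and ("red", "blue"): one row per bcolor, each with the alphabetically first fcolor.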
Added after initial post:
This could be done using window functions and a subquery as well:
SELECT
bcolor,
fcolor
FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY bcolor ORDER BY fcolor ASC) as rownumber,
bcolor,
fcolor
FROM t1
) t2
WHERE rownumber = 1

sql date order by problem

I have an image table which has 2 or more rows with the same date. Now I'm trying to do ORDER BY created_date DESC, which works fine and shows the rows in the same position, but when I change the query and try again, it shows them in different positions. And no, I don't have any other ORDER BY field, so I'm a bit confused about why it's doing this and how I can fix it.
can you please help on this.
To get reproducible results you need to have columns in your order by clause that together are unique. Do you have an ID column? You can use that to tie-break:
ORDER BY created_date DESC, id
I suspect that this is happening because MySQL is not given any ordering information other than ORDER BY created_date DESC, so it does whatever is most convenient for MySQL depending on its complicated inner workings (caching, indexing, etc.). Assuming you have a unique key id, you could do:
SELECT * FROM table t ORDER BY t.created_date DESC, t.id ASC
Which would give you the same result every time because putting a comma in the arguments following ORDER BY gives it a secondary ordering rule that is executed when the first ordering rule doesn't produce a clear order between two rows.
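Here is a small sketch of the tie-break in action, using Python's sqlite3 module as a stand-in database (an assumption; the question appears to concern MySQL, and the same ORDER BY applies there). The table name images and the sample rows are hypothetical.

```python
import sqlite3


def ordered_image_ids(conn):
    # created_date decides the order first; id breaks ties between equal dates.
    sql = "SELECT id FROM images ORDER BY created_date DESC, id ASC"
    return [row[0] for row in conn.execute(sql)]


def make_images_db():
    # Hypothetical data: three images share the same created_date.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images (id INTEGER PRIMARY KEY, created_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO images VALUES (?, ?)",
        [(3, "2020-05-02"), (1, "2020-05-02"), (2, "2020-05-01"), (4, "2020-05-02")],
    )
    return conn
```

Because id is unique, the three rows dated 2020-05-02 always come back as 1, 3, 4, followed by the older row 2, no matter how the rest of the query changes.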
To have consistent results, you will need to add at least one more column to the 'ORDER BY' clause. Since the values in the created_date column are not unique, there is not a defined order. If you wanted that column to be closer to 'unique', you could define it as a timestamp, which makes ties less likely, though it does not strictly guarantee uniqueness.