https://www.postgresql.org/docs/10/queries-order.html
When sorting in Postgres, you can specify whether NULLs come first or last, and you can do so in every ORDER BY clause.
But I wonder whether there is a setting for this at the database or server level, so that you don't have to write NULLS FIRST every time if that's what you always want for ASC sorting.
No, there is no way to do that.
The default value for ASC ordering is always NULLS LAST, and for DESC it is NULLS FIRST. If you need anything else, you'll have to say it explicitly.
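For illustration, here is a minimal sketch of the defaults and the explicit override, using an inline VALUES list so it runs without any table (the alias t and column n are made up for the example):

```sql
-- Defaults: NULLs sort as if larger than any non-null value.
SELECT n FROM (VALUES (2), (NULL), (1)) AS t(n) ORDER BY n ASC;              -- 1, 2, NULL
SELECT n FROM (VALUES (2), (NULL), (1)) AS t(n) ORDER BY n DESC;             -- NULL, 2, 1

-- Explicit override, which has to be repeated in each ORDER BY that needs it.
SELECT n FROM (VALUES (2), (NULL), (1)) AS t(n) ORDER BY n ASC NULLS FIRST;  -- NULL, 1, 2
SELECT n FROM (VALUES (2), (NULL), (1)) AS t(n) ORDER BY n DESC NULLS LAST;  -- 2, 1, NULL
```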
Related
I'm building a local TimescaleDB server and I'm creating my first "production" hypertables. At the moment, all the future consumers of my DB are going to read the data in ASC order, but by default Timescale creates a DESC index on the time column.
My question is: is it worth changing the default behaviour and making the index ASC?
I don't know whether it's DESC by default for a good reason, or whether I'm going to pay some penalty. I have also read that indexes in PostgreSQL can be read backward, so a DESC index could be used for an ASC query, but I don't know if there are performance penalties.
On the other hand, is it safe to simply delete the default index and create a new one with a different order? I'm also not sure whether deleting it is going to break some internal Timescale functionality.
Thanks for your time,
H25E
For a single-column index, it does not matter at all if it is created ASC or DESC, because indexes can be read in both directions with the same efficiency.
The only time when you really need to specify DESC in an index is if the index is supposed to support an ORDER BY clause like ORDER BY a, b DESC. Then one of the index columns must be sorted ASC and the other DESC — but again it doesn't matter which one is ASC and which DESC, as the index can be read in both directions.
So, for a single column index, there is no need to build the index again, and there was no good reason to create it DESC in the first place (but it doesn't matter).
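A rough sketch of the multi-column case described above. The table tab and its columns a and b are hypothetical, and whether the planner actually picks the index depends on statistics and costs:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE tab (a integer, b integer);

-- A mixed-direction index is only needed for a mixed-direction ORDER BY.
CREATE INDEX tab_a_b_desc_idx ON tab (a ASC, b DESC);

-- Can be served by scanning the index forward ...
SELECT * FROM tab ORDER BY a, b DESC;
-- ... or backward, which yields the exactly reversed ordering.
SELECT * FROM tab ORDER BY a DESC, b;

-- Neither scan direction of that index produces this ordering by itself.
SELECT * FROM tab ORDER BY a, b;
```

Running EXPLAIN on each query shows whether an index scan, a backward index scan, or an explicit sort was chosen.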
Pursuant to PostgreSQL: detecting the first/last rows of result set, I've been given reason to suspect that such a clause is dangerous or otherwise inappropriate, and want to understand that better. Take:
SELECT last_value(unique_column) OVER (), * FROM mytable;
unique_column is unique and not null. So what's wrong with using OVER () in this way? Is it dangerous/unreliable? Suboptimal? From what I can tell, this should return the value from the last row in the result set—at least, it has when I've tried it. I've been told that "last" doesn't make sense without sorting, but clearly there is a last row that is returned. I've also been told that OVER () means "anything goes", which suggests that the results are unreliable, but so far, every time I've run that kind of query, I've been consistently given the value from the end of the result set.
Now I have found a problem if I use ORDER BY:
SELECT last_value(unique_column) OVER (), * FROM mytable ORDER BY something_else;
But, my solution to that is to subquery:
SELECT last_value(unique_column) OVER (), * FROM (SELECT * FROM mytable ORDER BY something_else) sub;
It's as if OVER () means the analytic functions (like first_value() and last_value()) operate according to the order in which the engine happens to read the table/subquery. And, from what I can tell, you have enough control over the order in which the engine happens to read the table/subquery (without having to do unnecessary sorting).
I'm running PostgreSQL 9.6 in a Debian 9.5 environment.
You should provide ORDER BY inside the OVER clause:
SELECT *,
last_value(unique_column)
OVER (ORDER BY sth_else ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM mytable
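The explicit frame in that query matters: when a window has an ORDER BY but no frame clause, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so last_value() only sees rows up to the current row's last peer. A sketch of the difference, reusing the names from the query above, plus an alternative spelling that should give the same value when sth_else is unique:

```sql
-- Default frame: the result follows the current row, not the end of the set.
SELECT *,
       last_value(unique_column) OVER (ORDER BY sth_else) AS last_in_default_frame
FROM mytable;

-- Alternative: reverse the window ordering and take first_value(), which
-- needs no frame clause because the frame always starts at the first row.
SELECT *,
       first_value(unique_column) OVER (ORDER BY sth_else DESC) AS last_by_sth_else
FROM mytable
ORDER BY sth_else;  -- the window's ORDER BY does not order the output rows
```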
In the last few months, this solution has worked out rather well, and I've not been shown an alternative, so I'm going to continue using it. However, it is finicky and can fail if you change the query without taking the analytic functions into consideration. (No doubt I'm misusing the feature, and it was not developed for this purpose.) So I'll use this space to record the gotchas as I find them.
If you order your results, you've got a problem, but I've already explained that in the question.
I tried to use it in an outer join. Since this caused fields in the result set to be null (even though they are taken from fields in a table that cannot be null) this caused OVER() to return NULL. I have a few ideas about how to get around this, but they would make the query very ugly and possibly very inefficient.
Is there a configuration option somewhere or anything that will allow me to force postgres to use NULLS LAST on every query that uses DESC ordering?
I don't want to rewrite all the queries in my app from the Criteria API to JPQL, and it seems the JPA Criteria API does not allow setting a nulls-last option.
No. At least, I have never heard of one. A quick check does not give much hope either:
t=# select setting, name from pg_settings where name like '%null%';
 setting |         name
---------+-----------------------
 on      | array_nulls
 off     | transform_null_equals
(2 rows)
https://www.postgresql.org/docs/current/static/queries-order.html does not mention such global switches either, just:
The NULLS FIRST and NULLS LAST options can be used to determine
whether nulls appear before or after non-null values in the sort
ordering. By default, null values sort as if larger than any non-null
value; that is, NULLS FIRST is the default for DESC order, and NULLS
LAST otherwise.
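Since there is no global switch, the placement has to be spelled out per query. A sketch of two spellings, using a hypothetical table items with a nullable column price. The second form avoids the NULLS keyword, which may help when a query builder cannot emit it; whether a given JPA provider can express it is an assumption to check:

```sql
-- Hypothetical table and column, for illustration only.
SELECT * FROM items ORDER BY price DESC NULLS LAST;

-- Same placement without the NULLS keyword: false sorts before true,
-- so rows with a non-null price come first, followed by the NULL rows.
SELECT * FROM items ORDER BY (price IS NULL), price DESC;
```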
I'm selecting distinct values from tables through Java's JDBC connector, and it seems that the NULL value (if there is one) is always the first row in the ResultSet.
I need to remove this NULL from the List where I load this ResultSet. The logic looks only at the first element and if it's null then ignores it.
I'm not using any ORDER BY in the query, can I still trust that logic? I can't find any reference in Postgres' documentation about this.
You can filter the NULL out with an IS NOT NULL check, like this:
select distinct columnName
from Tablename
where columnName IS NOT NULL
Also, if you do not provide an ORDER BY clause, the order in which you get the results is not guaranteed, so you cannot rely on it. It is better, and recommended, to provide an ORDER BY clause if you want your output in a particular order (i.e., ascending or descending).
If you are looking for a reference Postgresql document then it says:
If ORDER BY is not given, the rows are returned in whatever order the
system finds fastest to produce.
If it is not stated in the manual, I wouldn't trust it. However, just for fun, and to try to figure out what logic is being used, I ran the following query. It does bring the NULL to the top (for no apparent reason), while all other values come back in apparently random order:
with t(n) as (values (1),(2),(1),(3),(null),(8),(0))
select distinct * from t
However, cross joining the table with a modified version of itself brings two NULLs to the top but leaves other NULLs dispersed randomly throughout the result set. So there doesn't seem to be any clear-cut logic that clumps all NULL values at the top.
with t(n) as (values (1),(2),(1),(3),(null),(8),(0))
select distinct * from t
cross join (select n+3 from t) t2
I have an image table that has two or more rows with the same date. When I order by created_date DESC, it works fine and shows those rows in the same positions, but when I change the query and try again, it shows them in different positions. I don't have any other ORDER BY field, so I'm a bit confused about why this happens and how I can fix it.
Can you please help with this?
To get reproducible results you need to have columns in your order by clause that together are unique. Do you have an ID column? You can use that to tie-break:
ORDER BY created_date DESC, id
I suspect that this is happening because MySQL is not given any ordering information other than ORDER BY created_date DESC, so it does whatever is most convenient for MySQL depending on its complicated inner workings (caching, indexing, etc.). Assuming you have a unique key id, you could do:
SELECT * FROM table t ORDER BY t.created_date DESC, t.id ASC
Which would give you the same result every time because putting a comma in the arguments following ORDER BY gives it a secondary ordering rule that is executed when the first ordering rule doesn't produce a clear order between two rows.
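To get consistent results, you will need to add at least one more column to the ORDER BY clause. Since the values in the created_date column are not unique, there is no defined order among rows that share a date. If you wanted that column to be 'unique', you could define it as a timestamp, although two rows can still end up with the same timestamp, so a unique tie-breaker column is the more reliable fix.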