JSONB nulls and order by - postgresql

It seems that ORDER BY ... NULLS LAST doesn't really work for jsonb? If I have many rows in a table that look like this:

key   | cae1f6e1-8c1b-4fec-9002-7fd878e0dc06
value | {"id": "cae1f6e1-8c1b-4fec-9002-7fd878e0dc06",
         "debit-amount": 207853501,
         "credit-amount": null}

and I run a query like this:

select value->'debit-amount' deb
from balance_table
order by deb asc nulls last
limit 20;

it still shows only nulls.

You can convert a jsonb null to a SQL NULL with nullif() (the 'null' literal here gets an implicit cast to jsonb):
select nullif(value->'debit-amount', 'null') deb
from balance_table
order by deb asc nulls last
limit 20;

The problem with your query is in the following part:
order by deb asc
nulls last
limit 20;
You would assume that this sorts all rows so that the ones with a null debit-amount end up at the end, and that when you apply LIMIT 20 you will probably see only non-null ones. There is just one issue: the jsonb null is not the same as the SQL NULL.
For example, suppose that a particular row has a value with "debit-amount": null. Then value->'debit-amount' returns a jsonb null, which sorts to the top: in jsonb's ordering, null is the lowest value type. In other words, you'll never get a SQL NULL (unless of course the whole value is SQL NULL), so the NULLS LAST is pointless here.
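You can verify jsonb's ordering of nulls directly (a quick sanity check, not part of the original answer):

select 'null'::jsonb < '0'::jsonb;  -- true: jsonb null sorts below jsonb numbers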
You probably want something like this:
select value->'debit-amount' deb
from balance_table
order by ((value->'debit-amount')='null'::jsonb)::integer, deb asc
limit 20;
The expression ((value->'debit-amount')='null'::jsonb)::integer will convert all (jsonb) nulls for debit-amount to 1 and will return 0 otherwise. This will push all (jsonb) nulls to the end.
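For what it's worth, here is another sketch (not from the original answers): the ->> operator returns text and maps a jsonb null to a SQL NULL, so after a cast the ordinary NULLS LAST works as intended, assuming debit-amount always holds a number or null:

select (value->>'debit-amount')::numeric as deb
from balance_table
order by deb asc nulls last
limit 20;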

Related

postgresql order the null date to bottom [duplicate]

I have a SQL table with a datetime field. The field in question can be null. I have a query and I want the results sorted in ascending order by the datetime field, however I want rows where the datetime field is null at the end of the list, not at the beginning.
Is there a simple way to accomplish that?
select MyDate
from MyTable
order by case when MyDate is null then 1 else 0 end, MyDate
(A "bit" late, but this hasn't been mentioned at all)
You didn't specify your DBMS.
In standard SQL (and most modern DBMS like Oracle, PostgreSQL, DB2, Firebird, Apache Derby, HSQLDB and H2) you can specify NULLS LAST or NULLS FIRST:
Use NULLS LAST to sort them to the end:
select *
from some_table
order by some_column DESC NULLS LAST
I also just stumbled across this and the following seems to do the trick for me, on MySQL and PostgreSQL:
ORDER BY date IS NULL, date DESC
as found at https://stackoverflow.com/a/7055259/496209
If your engine allows ORDER BY x IS NULL, x or ORDER BY x NULLS LAST, use that. But if it doesn't, these might help:
If you're sorting by a numeric type you can do this: (Borrowing the schema from another answer.)
SELECT *
FROM Employees
ORDER BY ISNULL(DepartmentId*0,1), DepartmentId;
Any non-null number becomes 0, and nulls become 1, which sorts nulls last because 0 < 1.
You can also do this for strings:
SELECT *
FROM Employees
ORDER BY ISNULL(LEFT(LastName,0),'a'), LastName
Any non-null string becomes '', and nulls become 'a', which sorts nulls last because '' < 'a'.
This even works with dates by coercing to a nullable int and using the method for ints above:
SELECT *
FROM Employees
ORDER BY ISNULL(CONVERT(INT, HireDate)*0, 1), HireDate
(Let's pretend the schema has HireDate.)
These methods avoid the issue of having to come up with or manage a "maximum" value of every type or fix queries if the data type (and the maximum) changes (both issues that other ISNULL solutions suffer). Plus they're much shorter than a CASE.
You can use the built-in ISNULL function to check for null, as below. I tested it and it's working fine.
select MyDate from MyTable order by ISNULL(MyDate,1) DESC, MyDate ASC;
order by coalesce(datetime_field, DATE '9999-12-31')  -- substitute any date safely in the future
When your order column is numeric (like a rank) you can multiply it by -1 and then order descending. It will keep the order you're expecting but put NULLs last.
select *
from table
order by -rank desc
In Oracle, you can use NULLS FIRST or NULLS LAST to specify that NULL values should be returned before / after non-NULL values:
ORDER BY { column-Name | [ ASC | DESC ] | [ NULLS FIRST | NULLS LAST ] }
For example:
ORDER BY date DESC NULLS LAST
Ref: http://docs.oracle.com/javadb/10.8.3.0/ref/rrefsqlj13658.html
If you're using MariaDB, they mention the following in the NULL Values
documentation.
Ordering
When you order by a field that may contain NULL values, any NULLs are
considered to have the lowest value. So ordering in DESC order will see the
NULLs appearing last. To force NULLs to be regarded as highest values, one can
add another column which has a higher value when the main field is NULL.
Example:
SELECT col1 FROM tab ORDER BY ISNULL(col1), col1;
Descending order, with NULLs first:
SELECT col1 FROM tab ORDER BY IF(col1 IS NULL, 0, 1), col1 DESC;
All NULL values are also regarded as equivalent for the purposes of the
DISTINCT and GROUP BY clauses.
The above shows two ways to order by NULL values, you can combine these with the
ASC and DESC keywords as well. For example the other way to get the NULL values
first would be:
SELECT col1 FROM tab ORDER BY ISNULL(col1) DESC, col1;
--                                          ^^^^
SELECT *
FROM Employees
ORDER BY ISNULL(DepartmentId, 99999);
See this blog post.
Thanks RedFilter for providing an excellent solution to the nagging issue of sorting a nullable datetime field.
I am using a SQL Server database for my project.
Changing the datetime null value to '1' does solve the problem of sorting for a datetime datatype column. However, it fails for columns of other datatypes.
To handle sorting a varchar column, I tried using 'ZZZZZZZ', since I knew the column does not have values beginning with 'Z'. It worked as expected.
Along the same lines, I used the max value + 1 for int and other data types to get the sort as expected. This also gave me the required results.
However, it would always be ideal to get something easier in the database engine itself that could do something like:
Order by Col1 Asc Nulls Last, Col2 Asc Nulls First
As mentioned in the answer provided by a_horse_with_no_name.
The solution using CASE is universal, but it does not use indexes:
order by case when MyDate is null then 1 else 0 end, MyDate
In my case, I needed performance.
SELECT someCol1, someCol2
FROM someSch.someTab
WHERE someCol2 = 2101 AND someCol1 IS NULL
UNION
SELECT someCol1, someCol2
FROM someSch.someTab
WHERE someCol2 = 2101 AND someCol1 IS NOT NULL
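Since the two branches are mutually exclusive, UNION ALL (a small tweak, not in the original answer) arguably serves the performance goal better by skipping the duplicate-elimination step; note that without an outer ORDER BY neither form formally guarantees the branches stay in order:

SELECT someCol1, someCol2
FROM someSch.someTab
WHERE someCol2 = 2101 AND someCol1 IS NULL
UNION ALL
SELECT someCol1, someCol2
FROM someSch.someTab
WHERE someCol2 = 2101 AND someCol1 IS NOT NULL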
Use the NVL function (Oracle):
select * from MyTable order by NVL(MyDate, to_date('1-1-1','DD-MM-YYYY'))
Most other popular DBMSs have an equivalent of NVL, such as the standard COALESCE.
Negating the value and sorting descending keeps rows in ascending order of the value while pushing NULLs last (on engines where NULLs sort first ascending):
order by -cast([nativeDateModify] as bigint) desc

ERROR: array must not contain nulls PostgreSQL

My Query is
SELECT
  id,
  ARRAY_AGG(session_os)::integer[]
FROM t
GROUP BY id
HAVING ARRAY_AGG(session_os)::integer[] && ARRAY[1,NULL]
It's giving ERROR: array must not contain nulls
Actually I want to get rows like
id | Session_OS
-------|-------------
641 | {1, 2}
642 | {NULL, 2}
643 | {NULL}
Kindly check the sample data here
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=7793fa763a360bf7334787e4249d6107
The && operator does not support NULL values, so you need another approach. For example, you could join the data to the table first. This gives you the ids which are linked to your required data. In a second step you can aggregate all values using these ids.
step-by-step demo:db<>fiddle
SELECT
  id,
  ARRAY_AGG(session_os)                          -- 4
FROM t
WHERE id IN (                                    -- 3
  SELECT id
  FROM t
  JOIN (
    SELECT unnest(ARRAY[1, null]) AS a           -- 1
  ) s ON s.a IS NOT DISTINCT FROM t.session_os   -- 2
)
GROUP BY id
1. Create a table or query result which contains your relevant data, incl. the NULL value.
2. You can join the data, incl. the NULL value, using the operator IS NOT DISTINCT FROM, which treats NULLs as equal.
3. Now you have fetched the relevant id values, which can be used in the WHERE filter.
4. Finally you can do your aggregation.
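A quick illustration of why IS NOT DISTINCT FROM is used here (a minimal check, not from the original answer):

SELECT NULL = NULL;                    -- NULL (unknown), so a plain join condition misses NULLs
SELECT NULL IS NOT DISTINCT FROM NULL; -- true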
The extension intarray installs its own && operator for int[], which doesn't allow NULLs and takes precedence over the built-in && operator (an exact int[] match beats the generic anyarray one).
If you are not using intarray, you can just uninstall it (except for in dbfiddle, where you can't). If you are using it occasionally, I think it is best to install it in its own schema which is not in your search path. Then you need to schema qualify its operators when you do need them.
Alternatively, you can leave intarray in place and schema qualify the normal built-in operators when you need those ones specifically, as shown here.
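For example, a sketch of the schema-qualified call (the built-in anyarray operator lives in pg_catalog; note that the built-in && avoids the error but NULL elements still never count as overlapping, which is why the join approach above uses IS NOT DISTINCT FROM):

SELECT id, ARRAY_AGG(session_os)
FROM t
GROUP BY id
HAVING ARRAY_AGG(session_os) OPERATOR(pg_catalog.&&) ARRAY[1, NULL];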

Find duplicate rows in PostgreSQL with additional criteria

I have a table called entries that has the following columns: case_id, number and filed_on.
If I were only looking for duplicates where the case_id and number were the same, I would use the following query:
SELECT case_id, number, count(*) FROM entries GROUP BY case_id, number HAVING count(*) > 1;
But I would like to filter by an additional criterion, namely, that at least 1 of the duplicate rows has a filed_on that is null.
I thought the following query would work, but I think it gives me duplicate rows where ALL the duplicates have filed_on set to null, instead of duplicate rows where 1 or more of the rows have filed_on of null:
SELECT case_id, number, count(*) FROM entries WHERE filed_on IS NULL GROUP BY case_id, number HAVING count(*) > 1;
Any ideas for how I can modify this query to get what I want?
You want a condition which is checked after grouping, not before, i.e. HAVING instead of WHERE. Note that the condition must reference either grouping columns or aggregates (just like the SELECT list). You can count the number of rows which satisfy the condition as in this answer:
SELECT case_id, number, count(*)
FROM entries
GROUP BY case_id, number
HAVING (count(*) > 1) AND (count(CASE WHEN filed_on IS NULL THEN 1 END) >= 1)
See SQL Fiddle
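On PostgreSQL 9.4+ the conditional count can also be written with the aggregate FILTER clause (an equivalent sketch, not from the original answer):

SELECT case_id, number, count(*)
FROM entries
GROUP BY case_id, number
HAVING count(*) > 1
   AND count(*) FILTER (WHERE filed_on IS NULL) >= 1;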

postgresql: exclude data based on other incomplete data

In this data there are multiple DATA_ID values associated with time-series data. I am trying to exclude all data for any DATA_ID that returns a NULL value for USE at any timestamp.
In other words, I only want to return DATA_ID values (and their data) if they have complete (not any NULL) values for all timestamp values.
Sample query given below:
SELECT
  My.Table.DATA_ID,
  My.Table.timestamp,
  My.Table.USE
FROM My.TABLE
WHERE timestamp BETWEEN '2012-06-01 00:00:00' AND '2012-06-02 23:59:59'
-- Something here that says exclude all data from DATA_ID(s)
-- with any missing USE data, i.e. USE=NULL
ORDER BY DATA_ID, timestamp
Assuming I understand your question correctly, you want to exclude whole batches of samples (all rows sharing a data_id within the timestamp window) whenever the batch contains a null value:
SELECT
  o.DATA_ID,
  o.timestamp,
  o.USE
FROM My.TABLE o
WHERE o.timestamp BETWEEN '2012-06-01 00:00:00' AND '2012-06-02 23:59:59'
  AND NOT EXISTS (SELECT 1
                  FROM My.TABLE i
                  WHERE i.USE IS NULL
                    AND i.DATA_ID = o.DATA_ID
                    AND i.timestamp BETWEEN '2012-06-01 00:00:00' AND '2012-06-02 23:59:59')
ORDER BY o.DATA_ID, o.timestamp
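An alternative sketch (not in the original answers, PostgreSQL 9.4+): scan the table once and use a window aggregate to count NULL USE values per DATA_ID, keeping only groups with none:

SELECT DATA_ID, timestamp, USE
FROM (
  SELECT DATA_ID, timestamp, USE,
         count(*) FILTER (WHERE USE IS NULL)
           OVER (PARTITION BY DATA_ID) AS null_count
  FROM My.TABLE
  WHERE timestamp BETWEEN '2012-06-01 00:00:00' AND '2012-06-02 23:59:59'
) s
WHERE null_count = 0
ORDER BY DATA_ID, timestamp;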
The simple thing to do is something like this:
CREATE FUNCTION missing_info(MY.TABLE)
RETURNS BOOL
LANGUAGE SQL AS
$$ select $1.use is null -- chain conditions together with or.
-- no from clause needed. no where clause needed.
$$;
Then you can just add:
where (My.Table).missing_info is not true;
And as you need to change the logic as to what sorts of info is missing you can just change it in the function and everything still works.
This is the sort of encapsulation of derived information where ORDBMS's like PostgreSQL really shine.
Edit: Re-reading your example, it looks like what you are looking for is the IS NULL operator. However if you need to re-use some sort of logic, see the above example. NULL never "equals" NULL (because we can't say whether two unknown values are the same). But IS NULL tells you whether it is NULL or not.

PostgreSQL DISTINCT problem: works locally but not on server

I've come across a vexing problem with a PostgreSQL query. This works in my local development environment:
SELECT distinct (user_id) user_id, created_at, is_goodday
FROM table
WHERE ((created_at >= '2011-07-01 00:00:00') AND user_id = 95
AND (created_at < '2011-08-01 00:00:00'))
ORDER BY user_id, created_at ASC;
...but gives the following error on my QA server (which is on Heroku):
PGError: ERROR: syntax error at or near "user_id"
LINE 1: SELECT distinct (user_id) user_id, created_at,
^
Why could this be?
Other possibly relevant info:
I have tried single-quoting and double-quoting the field names
It's a Rails 3 app, but I'm using this SQL raw, i.e. no ActiveRecord magic
My local version of Postgres is 9.0.4 on Mac, but I have no idea what version Heroku is using
As per your comment, the standard PostgreSQL version of that query would be:
SELECT user_id, created_at, is_goodday
FROM table
WHERE created_at >= '2011-07-01 00:00:00'
AND created_at < '2011-08-01 00:00:00'
AND user_id = 95
ORDER BY created_at DESC, id DESC
LIMIT 1
You don't need user_id in the ORDER BY because you have user_id = 95. You want created_at DESC in the ORDER BY to put the most recent created_at at the top, and then LIMIT 1 slices off just the first row of the result set. GROUP BY can be used to enforce uniqueness, or to group things for an aggregate function, but you don't need it for either of those here: you get uniqueness through ORDER BY and LIMIT, and you can hide your aggregation inside the ORDER BY (i.e. you don't need MAX because ORDER BY does that for you).
Since you have user_id = 95 in your WHERE, you don't need user_id in the SELECT but you can leave it in if that makes it easier for you in Ruby-land.
It is possible that you could have multiple entries with the same created_at so I added an id DESC to the ORDER BY to force PostgreSQL to choose the one with the highest id. There's nothing wrong with being paranoid when they really are out to get you and bugs definitely are out to get you.
Also, you want DESC in your ORDER BY to get the highest values at the top; ASC puts the lowest values at the top. The more recent timestamps will be the higher ones.
In general, the GROUP BY and SELECT have to match up because:
When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.
But that doesn't matter here because you don't need a GROUP BY at all. I linked to the 8.3 version of the documentation to match the PostgreSQL version you're using.
There are probably various other ways to do this, but this one is probably as straightforward and clear as you're going to get.
Put quotes around the value, as in user_id = '95'. Your query should be:
SELECT distinct (user_id) as uid, created_at, is_goodday
FROM table
WHERE (created_at >= '2011-07-01 00:00:00')
  AND user_id = '95'
  AND (created_at < '2011-08-01 00:00:00')
ORDER BY user_id, created_at ASC;
You're using DISTINCT ON (without writing the ON). Perhaps you should write the ON. Perhaps your postgres server dates from before the feature was implemented (which is pretty old by now).
If all else fails, you can always do that with some GROUP BY...
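For completeness, the DISTINCT ON variant the previous answer alludes to would look roughly like this (a sketch; my_table stands in for the real table name, and the ORDER BY must start with the DISTINCT ON expression):

SELECT DISTINCT ON (user_id) user_id, created_at, is_goodday
FROM my_table
WHERE user_id = 95
  AND created_at >= '2011-07-01 00:00:00'
  AND created_at <  '2011-08-01 00:00:00'
ORDER BY user_id, created_at ASC;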