Removing null values from lead function - postgresql

I have used the following PostgreSQL query to find the maximum difference between timestamped events for each user:
select
  sq.user_id,
  max(sq.diffs) as inactivity
from (
  select
    user_id,
    (lead("when", 1, now()) over (partition by user_id order by "when") - "when") as diffs
  from tracking_viewed
) as sq
group by sq.user_id
order by inactivity desc;
This query works on a different table, but on this one, where the "when" column contains NULLs, it returns nothing but NULL values.
How can I remove or skip the NULLs in the lead and partition functions?
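PostgreSQL's lead() has no IGNORE NULLS option, so the usual approach is to filter the NULL timestamps out before the window is computed. A minimal sketch, assuming rows with a NULL "when" can simply be excluded:

select
  sq.user_id,
  max(sq.diffs) as inactivity
from (
  select
    user_id,
    (lead("when", 1, now()) over (partition by user_id order by "when") - "when") as diffs
  from tracking_viewed
  where "when" is not null  -- drop NULL timestamps before lead() sees them
) as sq
group by sq.user_id
order by inactivity desc;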

Related

postgresql order the null date to bottom [duplicate]

I have a SQL table with a datetime field. The field in question can be null. I have a query, and I want the results sorted in ascending order by the datetime field; however, I want rows where the datetime field is null at the end of the list, not at the beginning.
Is there a simple way to accomplish that?
select MyDate
from MyTable
order by case when MyDate is null then 1 else 0 end, MyDate
(A "bit" late, but this hasn't been mentioned at all)
You didn't specify your DBMS.
In standard SQL (and most modern DBMS like Oracle, PostgreSQL, DB2, Firebird, Apache Derby, HSQLDB and H2) you can specify NULLS LAST or NULLS FIRST:
Use NULLS LAST to sort them to the end:
select *
from some_table
order by some_column DESC NULLS LAST
I also just stumbled across this and the following seems to do the trick for me, on MySQL and PostgreSQL:
ORDER BY date IS NULL, date DESC
as found at https://stackoverflow.com/a/7055259/496209
If your engine allows ORDER BY x IS NULL, x or ORDER BY x NULLS LAST, use that. If it doesn't, these might help:
If you're sorting by a numeric type you can do this: (Borrowing the schema from another answer.)
SELECT *
FROM Employees
ORDER BY ISNULL(DepartmentId*0,1), DepartmentId;
Any non-null number becomes 0, and nulls become 1, which sorts nulls last because 0 < 1.
You can also do this for strings:
SELECT *
FROM Employees
ORDER BY ISNULL(LEFT(LastName,0),'a'), LastName
Any non-null string becomes '', and nulls become 'a', which sorts nulls last because '' < 'a'.
This even works with dates by coercing to a nullable int and using the method for ints above:
SELECT *
FROM Employees
ORDER BY ISNULL(CONVERT(INT, HireDate)*0, 1), HireDate
(Let's pretend the schema has a HireDate column.)
These methods avoid the issue of having to come up with or manage a "maximum" value for every type, or having to fix queries if the data type (and the maximum) changes (both issues that other ISNULL solutions suffer from). Plus they're much shorter than a CASE.
You can use the built-in ISNULL function to substitute a value for nulls, as below. I tested it and it's working fine.
select MyDate from MyTable order by ISNULL(MyDate,1) DESC, MyDate ASC;
order by coalesce(date_time_field, timestamp '9999-12-31')  -- any sentinel date far in the future (literal syntax varies by DBMS)
When your order column is numeric (like a rank), you can multiply it by -1 and then order descending. It will keep the order you're expecting but put NULLs last.
select *
from table
order by -rank desc
In Oracle, you can use NULLS FIRST or NULLS LAST to specify that NULL values should be returned before or after non-NULL values:
ORDER BY { column-Name | [ ASC | DESC ] | [ NULLS FIRST | NULLS LAST ] }
For example:
ORDER BY date DESC NULLS LAST
Ref: http://docs.oracle.com/javadb/10.8.3.0/ref/rrefsqlj13658.html
If you're using MariaDB, they mention the following in the NULL Values
documentation.
Ordering
When you order by a field that may contain NULL values, any NULLs are
considered to have the lowest value. So ordering in DESC order will see the
NULLs appearing last. To force NULLs to be regarded as highest values, one can
add another column which has a higher value when the main field is NULL.
Example:
SELECT col1 FROM tab ORDER BY ISNULL(col1), col1;
Descending order, with NULLs first:
SELECT col1 FROM tab ORDER BY IF(col1 IS NULL, 0, 1), col1 DESC;
All NULL values are also regarded as equivalent for the purposes of the
DISTINCT and GROUP BY clauses.
The above shows two ways to order by NULL values, you can combine these with the
ASC and DESC keywords as well. For example the other way to get the NULL values
first would be:
SELECT col1 FROM tab ORDER BY ISNULL(col1) DESC, col1;  -- note the added DESC
SELECT *
FROM Employees
ORDER BY ISNULL(DepartmentId, 99999);
Thanks RedFilter for providing an excellent solution to the bugging issue of sorting a nullable datetime field.
I am using a SQL Server database for my project.
Changing the null datetime value to '1' does solve the problem of sorting for a datetime column. However, if the column has a datatype other than datetime, it fails to handle it.
To handle a varchar column sort, I tried using 'ZZZZZZZ', since I knew the column had no values beginning with 'Z'. It worked as expected.
Along the same lines, I used the max value + 1 for int and other data types to get the expected sort order. This also gave me the results that were required.
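For the record, those substitutions would look roughly like this (a sketch borrowing the Employees schema used above; the sentinel values are whatever sorts after your real data):

SELECT *
FROM Employees
ORDER BY ISNULL(LastName, 'ZZZZZZZ'), LastName;  -- varchar: sentinel sorts after all real names

SELECT *
FROM Employees
ORDER BY ISNULL(DepartmentId, 2147483647), DepartmentId;  -- int: INT max as the sentinel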
However, it would be ideal to have something easier in the database engine itself that could do something like:
Order by Col1 Asc Nulls Last, Col2 Asc Nulls First
as mentioned in the answer provided by a_horse_with_no_name.
The solution using CASE is universal, but it cannot use indexes:
order by case when MyDate is null then 1 else 0 end, MyDate
In my case, I needed performance, so I used a UNION instead:
SELECT someCol1, someCol2
FROM someSch.someTab
WHERE someCol2 = 2101 AND someCol1 IS NULL
UNION
SELECT someCol1, someCol2
FROM someSch.someTab
WHERE someCol2 = 2101 AND someCol1 IS NOT NULL;
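Since the two branches are mutually exclusive (IS NULL vs. IS NOT NULL), UNION ALL should return the same rows while skipping the duplicate-elimination sort, so it is presumably faster still:

SELECT someCol1, someCol2
FROM someSch.someTab
WHERE someCol2 = 2101 AND someCol1 IS NULL
UNION ALL
SELECT someCol1, someCol2
FROM someSch.someTab
WHERE someCol2 = 2101 AND someCol1 IS NOT NULL;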
Use the NVL function:
select * from MyTable order by NVL(MyDate, to_date('1-1-1','DD-MM-YYYY'))
For reference, the equivalent of Oracle's NVL in other popular DBMSs is ISNULL (SQL Server), IFNULL (MySQL), or the standard COALESCE.
order by -cast([nativeDateModify] as bigint) desc

postgres: insert rows to table with multiple records from other join tables

I am trying to insert multiple records, obtained from a join, into another table, user_to_property. In the user_to_property table, user_to_property_id is the primary key and not null, but it is not auto-incrementing. So I am trying to add user_to_property_id manually, incrementing it by 1.
WITH selectedData AS
( -- selection of the data that needs to be inserted
  SELECT t2.user_id AS userId
  FROM property_lines t1
  INNER JOIN user t2 ON t1.account_id = t2.account_id
)
INSERT INTO user_to_property (user_to_property_id, user_id, property_id, created_date)
VALUES ((SELECT MAX(user_to_property_id) + 1 FROM user_to_property),
        (SELECT selectedData.userId FROM selectedData),
        3,
        now());
The above query gives me the below error:
ERROR: more than one row returned by a subquery used as an expression
How can I insert multiple records into a table from the join of other tables, given that the user_to_property table should contain only one record per user_id and property_id pair?
Typically, for INSERT you use either VALUES or SELECT. The structure VALUES( SELECT ... ) often (generally?) just causes more trouble than it's worth, and it is never necessary, since you can always select a constant or an expression. In this case, convert to just SELECT. For generating your ID, get the max value from your table and then add the row_number of each row you are inserting: (see demo)
insert into user_to_property (user_to_property_id
                             , user_id
                             , property_id
                             , created_date
                             )
with start_with (current_max_id) as
   ( select max(user_to_property_id) from user_to_property )
select current_max_id + id_incr, user_id, 3, now()
from (
       select t2.user_id, row_number() over () as id_incr
       from property_lines t1
       join users t2 on t1.account_id = t2.account_id
     ) js
join start_with on true;
A couple of notes:
DO NOT use user as a table name, or for any other object name. It is a documented reserved word in both Postgres and the SQL standard (and has been since Postgres v7.1 and the SQL-92 standard at least).
You really should create another column, or change the column user_to_property_id to be auto-generated. Using max() + 1, or anything based on that idea, is a virtual guarantee that you will generate duplicate keys, much to the amusement of users and developers alike. Consider what happens under MVCC when two users run the query concurrently.
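A sketch of that fix, assuming PostgreSQL 10+; the restart value of 1000 is hypothetical and should be max(user_to_property_id) + 1 for your data:

-- let Postgres generate the key safely instead of using max()+1
ALTER TABLE user_to_property
  ALTER COLUMN user_to_property_id ADD GENERATED BY DEFAULT AS IDENTITY;

-- move the sequence past the existing keys
ALTER TABLE user_to_property
  ALTER COLUMN user_to_property_id RESTART WITH 1000;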

Rank() window function on Redshift over multiple columns, not 1 or 2

I want to use the rank() window function on a Redshift database to rank over specific, multiple columns. The code should check those multiple columns for each row and assign the same rank to rows that have identical values in ALL those columns.
Example image found in link below:
https://ibb.co/GJv1xQL
There are 18 distinct rows, but the rank should have only 3 distinct values, according to the ranking I wish to apply.
I tried :
select tbl.*
, dense_rank() over (partition by secondary_id order by created_on, type1, type2, money, amount nulls last ) as rank
from table tbl
where secondary_id='92d30f87-b2da-45c0-bdf7-c5ca96fe5ea6'
But the ranks assigned were wrong, and then I tried:
select tbl.*
, dense_rank() over (partition by secondary_id,created_on, type1, type2, money, amount ) as rank
from table tbl
where secondary_id='92d30f87-b2da-45c0-bdf7-c5ca96fe5ea6'
But this assigned rank=1 everywhere, in every row.
I found how to solve this.
The reason that ordering by all the columns of interest was failing is that the timestamp column contained values differing only in milliseconds, which was not obvious from viewing the data. So I took the timestamp into account only down to whole seconds, and it worked: I wrapped the created_on column in date_trunc('s', created_on).
select tbl.*
     , dense_rank() over (partition by secondary_id
                          order by date_trunc('s', created_on), type1, type2, money, amount nulls last) as rank
from table tbl
where secondary_id = '92d30f87-b2da-45c0-bdf7-c5ca96fe5ea6'

select last of an item for each user in postgres

I want to get the last entry for each user, but customer_id is a hash ('ASAG#...'), and ordering by customer_id destroys the query. Is there an alternative?
Select Distinct On (l.customer_id)
l.customer_id
,l.created_at
,l.text
From likes l
Order By l.customer_id, l.created_at Desc
Your current query already appears to be working, q.v. here:
Demo
I don't know why your current query is not generating the results you would expect. It should return one distinct record for every customer, corresponding to the most recent one, given your ORDER BY clause.
In any case, if it does not do what you want, an alternative would be to use ROW_NUMBER() here with a partition by user. The inner query assigns a row number to each user, with the value 1 going to the most recent record for each user. Then the outer query retains only the latest record.
SELECT
    t.customer_id,
    t.created_at,
    t.text
FROM
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) rn
    FROM likes
) t
WHERE t.rn = 1
To speed up the inner query which uses ROW_NUMBER() you can try adding a composite index on the customer_id and created_at columns:
CREATE INDEX yourIdx ON likes (customer_id, created_at);

PostgreSQL row diff timestamp, and calculate stddev for group

I have a table with an ID column called mmsi and another column of timestamp, with multiple timestamps per mmsi.
For each mmsi I want to calculate the standard deviation of the difference between consecutive timestamps.
I'm not very experienced with SQL, but I have tried to construct a query as follows:
SELECT
    mmsi, stddev(time_diff)
FROM
    (SELECT mmsi,
            EXTRACT(EPOCH FROM (timestamp - lag(timestamp) OVER (ORDER BY mmsi ASC, timestamp ASC)))
     FROM ais_messages.ais_static
     ORDER BY mmsi ASC, timestamp ASC) AS time_diff
WHERE time_diff IS NOT NULL
GROUP BY mmsi;
Your query looks on the right track, but it has several problems. You gave the alias time_diff to the subquery itself rather than to the computed column, so the outer stddev(time_diff) and the WHERE filter don't actually refer to your EXTRACT expression. The window function also needs to be partitioned by mmsi, so that the lag never reaches across from one mmsi group into another. Here is a corrected version:
SELECT
    t.mmsi,
    STDDEV(t.time_diff) AS std
FROM
(
    SELECT
        mmsi,
        EXTRACT(EPOCH FROM (timestamp - LAG(timestamp) OVER
            (PARTITION BY mmsi ORDER BY timestamp))) AS time_diff
    FROM ais_messages.ais_static
    ORDER BY mmsi, timestamp
) t
WHERE t.time_diff IS NOT NULL
GROUP BY t.mmsi
This approach should be fine, but there is one edge case where it might not behave as expected. If a given mmsi group has only one record, it will not appear in the result set of standard deviations at all, because the LAG calculation returns NULL for that single record and the row is then filtered out.
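If those single-record groups should still appear (with a NULL standard deviation), one option is to drop the WHERE filter and rely on the fact that aggregate functions such as STDDEV ignore NULL inputs. A sketch of the same query under that assumption:

SELECT
    t.mmsi,
    STDDEV(t.time_diff) AS std  -- NULL for groups with a single record
FROM
(
    SELECT
        mmsi,
        EXTRACT(EPOCH FROM (timestamp - LAG(timestamp) OVER
            (PARTITION BY mmsi ORDER BY timestamp))) AS time_diff
    FROM ais_messages.ais_static
) t
GROUP BY t.mmsi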