Postgres 10 lateral unnest missing null values - postgresql

I have a Postgres table where the content of a text column is delimited with '|'.
ID | ... | my_column
-----------------------
1 | ... | text|concatenated|as|such
2 | ... | NULL
3 | ... | NULL
I tried unnest(string_to_array()) to split this column into separate rows. That works, except that the rows where the column is NULL (>90% of all entries) are dropped. I have tried several approaches:
SELECT * from "my_table", lateral unnest(CASE WHEN "my_column" is NULL
THEN NULL else string_to_array("my_column", '|') END);
or the approach suggested here: PostgreSQL unnest with empty array
What I get:
ID | ... | my_column
-----------------------
1 | ... | text
1 | ... | concatenated
1 | ... | as
1 | ... | such
But this is what I need:
ID | ... | my_column
-----------------------
1 | ... | text
1 | ... | concatenated
1 | ... | as
1 | ... | such
2 | ... | NULL
3 | ... | NULL

Use a LEFT JOIN instead:
SELECT m.id, t.*
from my_table m
left join lateral unnest(string_to_array(my_column, '|')) as t(w) on true;
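There is no need for a CASE expression to handle NULL values. string_to_array returns NULL for a NULL input, unnest then produces no rows, and the LEFT JOIN keeps the row from my_table with NULL in w. The comma (cross) join in the question drops exactly those rows.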
Online example: http://rextester.com/XIGXP80374
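For anyone who wants to try this without creating a table, here is a minimal self-contained sketch. The CTE stands in for my_table with the sample rows from the question, and WITH ORDINALITY (plus the column name n) is my addition so the words come back in their original order:

```sql
-- Hypothetical stand-in for my_table, using the rows from the question.
with my_table (id, my_column) as (
    values (1, 'text|concatenated|as|such'),
           (2, null),
           (3, null)
)
select m.id, t.w
from my_table m
-- LEFT JOIN ... ON TRUE keeps rows for which unnest() returns nothing.
left join lateral unnest(string_to_array(m.my_column, '|'))
          with ordinality as t(w, n) on true
order by m.id, t.n;
```

This should return four rows for id 1 (text, concatenated, as, such) followed by one row each for ids 2 and 3 with w set to NULL.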

Related

How to expand columns into individual timesteps in PostgreSQL

I have a table of columns that represent a time series. The datatypes are not important, but anything after timestep2 could potentially be NULL.
| id | timestep1 | timestep2 | timestep3 | timestep4 |
|----|-----------|-----------|-----------|-----------|
| a | foo1 | bar1 | baz1 | qux1 |
| b | foo2 | bar2 | baz2 | NULL |
I am attempting to retrieve a view of the data more suitable for modeling. My modeling use-case requires that I break each time series (row) into rows representing their individual "states" at each step. That is:
| id | timestep1 | timestep2 | timestep3 | timestep4 |
|----|-----------|-----------|-----------|-----------|
| a | foo1 | NULL | NULL | NULL |
| a | foo1 | bar1 | NULL | NULL |
| a | foo1 | bar1 | baz1 | NULL |
| a | foo1 | bar1 | baz1 | qux1 |
| b | foo2 | NULL | NULL | NULL |
| b | foo2 | bar2 | NULL | NULL |
| b | foo2 | bar2 | baz2 | NULL |
How can I accomplish this in PostgreSQL?
Use UNION.
select id, timestep1, timestep2, timestep3, timestep4
from my_table
union
select id, timestep1, timestep2, timestep3, null
from my_table
union
select id, timestep1, timestep2, null, null
from my_table
union
select id, timestep1, null, null, null
from my_table
order by
id,
timestep2 nulls first,
timestep3 nulls first,
timestep4 nulls first
There is a more compact solution, maybe more convenient when dealing with a greater number of timesteps:
select distinct
id,
timestep1,
case when i > 1 then timestep2 end as timestep2,
case when i > 2 then timestep3 end as timestep3,
case when i > 3 then timestep4 end as timestep4
from my_table
cross join generate_series(1, 4) as i
order by
id,
timestep2 nulls first,
timestep3 nulls first,
timestep4 nulls first
Test it in Db<>fiddle.
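A possible variant that avoids DISTINCT, sketched under the assumption (stated in the question) that NULLs only ever appear at the end of a row: generate only as many steps as there are non-NULL timesteps. num_nonnulls is built in from PostgreSQL 9.6 on; the CTE is just a stand-in for my_table with the sample data.

```sql
-- Hypothetical stand-in for my_table, using the rows from the question.
with my_table (id, timestep1, timestep2, timestep3, timestep4) as (
    values ('a', 'foo1', 'bar1', 'baz1', 'qux1'),
           ('b', 'foo2', 'bar2', 'baz2', null)
)
select
    id,
    timestep1,
    case when i > 1 then timestep2 end as timestep2,
    case when i > 2 then timestep3 end as timestep3,
    case when i > 3 then timestep4 end as timestep4
from my_table
-- One generated row per filled timestep, so no duplicates arise.
cross join lateral generate_series(
    1, num_nonnulls(timestep1, timestep2, timestep3, timestep4)
) as i
order by id, i;
```

With the sample data this should give four rows for a and three rows for b, matching the desired output in the question.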

Filter postgresql query where value is not a substring of any other value in the same column

I have a table with a column called values:
values | other_columns...
-------+-----------------
f | ...
foo | ...
fo | ...
bar | ...
ba | ...
baz | ...
foobar | ...
When querying this table I want to filter the results so that the only remaining rows are those in which value is not a substring of any other value in the column:
prime_values | other_result_columns...
-------------+------------------------
baz | ...
foobar | ...
How can I do this?
With NOT EXISTS:
select t.*
from tablename t
where not exists (
    select 1
    from tablename o
    where
        o."values" <> t."values"
        and
        o."values" like concat('%', t."values", '%')
)
See the demo.
Results:
> | values |...
> | :----- |
> | baz |
> | foobar |
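One caveat with LIKE is that a stored value containing % or _ would be treated as a wildcard. A rough alternative, assuming plain substring matching is what is wanted, is strpos. The CTE below is only a stand-in for the table, and the alias o is my own naming:

```sql
-- Hypothetical stand-in for the table, using the rows from the question.
with tablename ("values") as (
    values ('f'), ('foo'), ('fo'), ('bar'), ('ba'), ('baz'), ('foobar')
)
select t."values"
from tablename t
where not exists (
    select 1
    from tablename o
    where o."values" <> t."values"
      -- strpos returns 0 when the second string does not occur in the first.
      and strpos(o."values", t."values") > 0
)
order by t."values";
```

With the sample data this should return baz and foobar.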

How to fill Null with the previous value in PostgreSQL?

I have a table which contains Null values. I need to replace them with a previous non-Null value.
This is an example of data which I have:
date | category | start_period | period_number |
------------------------------------------------------
2018-01-01 | A | 1 | 1 |
2018-01-02 | A | 0 | Null |
2018-01-03 | A | 0 | Null |
2018-01-04 | A | 0 | Null |
2018-01-05 | B | 1 | 2 |
2018-01-06 | B | 0 | Null |
2018-01-07 | B | 0 | Null |
2018-01-08 | A | 1 | 3 |
2018-01-09 | A | 0 | Null |
2018-01-10 | A | 0 | Null |
The result should look like this:
date | category | start_period | period_number |
------------------------------------------------------
2018-01-01 | A | 1 | 1 |
2018-01-02 | A | 0 | 1 |
2018-01-03 | A | 0 | 1 |
2018-01-04 | A | 0 | 1 |
2018-01-05 | B | 1 | 2 |
2018-01-06 | B | 0 | 2 |
2018-01-07 | B | 0 | 2 |
2018-01-08 | A | 1 | 3 |
2018-01-09 | A | 0 | 3 |
2018-01-10 | A | 0 | 3 |
I tried the following query, but in this case, only the first Null value will be replaced.
select
date,
category,
start_period,
case
when period_number isnull then lag(period_number) over()
else period_number
end as period_number
from period_table;
Also, I tried to use first_value() window function, but I don't know how to set up the correct window.
Any help is highly appreciated.
You can join the table with itself to find, for each row, the latest earlier row that has a value. This assumes your date column is the primary key or otherwise unique.
update your_table upd set period_number = tbl.period_number
from
(
    select b.date, max(b2.date) as d2 from your_table b
    inner join your_table b2 on b2.date < b.date and b2.period_number is not null
    group by b.date
) t
inner join your_table tbl on tbl.date = t.d2
where t.date = upd.date
and upd.period_number is null
If you only need a select statement rather than an update:
select yt.date, yt.category, yt.start_period, tbl.period_number
from your_table yt
inner join
(
    select b.date, max(b2.date) as d2 from your_table b
    inner join your_table b2 on b2.date <= b.date and b2.period_number is not null
    group by b.date
) t on yt.date = t.date
inner join your_table tbl on tbl.date = t.d2
If you replace your case statement with:
(
select
_.period_number
from
period_table as _
where
_.period_number is not null
and _.category = period_table.category
and _.date <= period_table.date
order by
_.date desc
limit 1
) as period_number
Then it should have the intended effect. It's nowhere near as elegant as a window function, but I don't think window functions are quite flexible enough for your specific use case here (or at least, if they are, I don't know how to flex them that much).
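For what it's worth, one window-function approach that seems to fit this case is to count the non-NULL values seen so far and use that running count as a group key. This is only a sketch: the CTE stands in for period_table with the rows from the question, and grp and s are names I made up.

```sql
-- Hypothetical stand-in for period_table, using the rows from the question.
with period_table (date, category, start_period, period_number) as (
    values (date '2018-01-01', 'A', 1, 1),
           (date '2018-01-02', 'A', 0, null),
           (date '2018-01-03', 'A', 0, null),
           (date '2018-01-04', 'A', 0, null),
           (date '2018-01-05', 'B', 1, 2),
           (date '2018-01-06', 'B', 0, null),
           (date '2018-01-07', 'B', 0, null),
           (date '2018-01-08', 'A', 1, 3),
           (date '2018-01-09', 'A', 0, null),
           (date '2018-01-10', 'A', 0, null)
)
select
    date,
    category,
    start_period,
    -- Each group holds exactly one non-NULL value, so max() picks it.
    max(period_number) over (partition by grp) as period_number
from (
    select *,
           -- count() skips NULLs, so the count only grows at filled rows.
           count(period_number) over (order by date) as grp
    from period_table
) s
order by date;
```

With the sample data the running count is 1 for the first four rows, 2 for the next three and 3 for the last three, so period_number should come out as 1, 1, 1, 1, 2, 2, 2, 3, 3, 3.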
Examples of window functions with a frame clause:
select
    date, category, score,
    FIRST_VALUE(score) OVER (
        PARTITION BY category
        ORDER BY date
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as first_score
from testing.rec_test
order by date, category;

select
    date, category, score,
    LAST_VALUE(score) OVER (
        PARTITION BY category
        ORDER BY date
        RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    ) as last_score
from testing.rec_test
order by date, category;

NULL help in T_SQL script

On SQL Server 2008R2, I am using this script:
SELECT a.id,
       a.ea1,
       a.ea2
FROM database1table1 AS a
WHERE a.id LIKE N'Active';
The result set looks like this:
+-----+-----+---------------+---------------+
| Row | ID | EA1 | EA2 |
+-----+-----+---------------+---------------+
| 1 | 1 | wf#email.co | NULL |
| 2 | 1 | NULL | wf2#email.co |
| 3 | 1 | NULL | NULL |
| 4 | 2 | NULL | NULL |
| 5 | 3 | wf3#email.co | NULL |
+-----+-----+---------------+---------------+
etc.
ID = business number.
EA = email address.
In the above output, there are three rows where ID=1, but only two of those have email addresses.
I want my result to output the rows where there is no email address. So for this example, the output should only include rows where ID=2.
I have tried adding this WHERE clause:
AND (a.EA1 IS NULL) AND (a.EA2 IS NULL);
It's still returning rows where ID=1, because one of the rows there has no email address.
Can anyone please suggest an amendment to my script which would only return the row where ID=2?
Many thanks
Try with NOT EXISTS
SELECT
*
FROM
Tbl T
WHERE
T.EA1 IS NULL AND
T.EA2 IS NULL AND
NOT EXISTS
(
SELECT 1 FROM Tbl IT
WHERE
IT.ID = T.ID AND
(
IT.EA1 IS NOT NULL OR
IT.EA2 IS NOT NULL
)
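);
Alternatively, aggregate per ID. MAX ignores NULLs, so it returns NULL only when every row for that ID has no address:
;WITH CTE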
AS
(
SELECT ID,MAX(ROW) AS RW,MAX(EA1) AS EA1,MAX(EA2) AS EA2
FROM #TEMP GROUP BY ID
)
SELECT * FROM CTE WHERE EA1 IS NULL AND EA2 IS NULL
Output:
ID RW EA1 EA2
2 4 NULL NULL
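A similar idea can be sketched with GROUP BY and HAVING, since COUNT(column) counts only non-NULL values. The derived table v below is a stand-in built from the sample rows in the question, not the real table:

```sql
-- Hypothetical stand-in for the real table, using the rows from the question.
SELECT v.ID
FROM (VALUES
        (1, 1, N'wf#email.co',  NULL),
        (2, 1, NULL,            N'wf2#email.co'),
        (3, 1, NULL,            NULL),
        (4, 2, NULL,            NULL),
        (5, 3, N'wf3#email.co', NULL)
     ) AS v ([Row], ID, EA1, EA2)
GROUP BY v.ID
-- COUNT(col) ignores NULLs, so 0 means no row for this ID has an address.
HAVING COUNT(v.EA1) = 0
   AND COUNT(v.EA2) = 0;
```

With the sample rows this should return only ID 2. SQL Server may print a warning that NULL values were eliminated by an aggregate, which is expected here.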

postgres select latest record per entity except on field value

I have a log table that looks something like this:
---------------------------------------------
| id | company | type | date_created |notes|
---------------------------------------------
| 1 | co1 | | 2016-06-30 | ... |
| 2 | co2 | ERROR | 2016-06-30 | ... |
| 3 | co1 | | 2016-06-29 | ... |
| 4 | co2 | | 2016-06-29 | ... |
I have the following which selects the latest record per entity:
SELECT *
FROM import_data_log a
JOIN (SELECT company, max(date_created) date_created
FROM import_data_log
GROUP BY company) b
ON a.company = b.company AND a.date_created = b.date_created
which gives the result:
| 1 | co1 | | 2016-06-30 | ... |
| 2 | co2 | ERROR | 2016-06-30 | ... |
I need to add a condition that does not select the entry with type = ERROR and get the next highest date for that company, so it would give:
| 1 | co1 | | 2016-06-30 | ... |
| 4 | co2 | | 2016-06-29 | ... |
Any ideas? It's probably something simple but for the life of me I can't seem to get it to work.
UPDATE / FIX:
OK, so after a lot of hair pulling, for anyone running into this issue: comparing NULL with anything in Postgres yields NULL rather than true, so a condition like type <> 'ERROR' silently filters out every row where type is NULL.
This is how I fixed it. There is probably a nicer solution out there, but for now this works:
SELECT *
FROM import_data_log a
JOIN (SELECT company, max(date_created) date_created
FROM import_data_log
WHERE (type <> 'ERROR' OR type is null)
GROUP BY company) b
ON a.company = b.company AND a.date_created = b.date_created
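Use the query below to get the latest non-ERROR date per company, then join it back to the table if you need the other columns:
SELECT company, max(date_created) AS date_created
FROM import_data_log
WHERE type IS DISTINCT FROM 'ERROR'
GROUP BY company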
select distinct on (company) *
from import_data_log
where type is distinct from 'ERROR'
order by company, date_created desc
See the documentation for DISTINCT ON and IS [NOT] DISTINCT FROM:
Ordinary comparison operators yield null (signifying "unknown"), not true or false, when either input is null. For example, 7 = NULL yields null, as does 7 <> NULL. When this behavior is not suitable, use the IS [ NOT ] DISTINCT FROM constructs.
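A quick way to see the difference for yourself is a tiny sketch with literal values only (the column aliases are just labels I picked):

```sql
select
    null::text <> 'ERROR'                as plain_comparison,  -- null
    null::text is distinct from 'ERROR'  as null_vs_error,     -- true
    'ERROR'    is distinct from 'ERROR'  as error_vs_error;    -- false
```

A WHERE clause only keeps rows for which the condition is true, which is why the plain <> comparison drops the rows with a NULL type while IS DISTINCT FROM keeps them.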