How to expand columns into individual timesteps in PostgreSQL - postgresql

I have a table of columns that represent a time series. The datatypes are not important, but anything after timestep2 could potentially be NULL.
| id | timestep1 | timestep2 | timestep3 | timestep4 |
|----|-----------|-----------|-----------|-----------|
| a | foo1 | bar1 | baz1 | qux1 |
| b | foo2 | bar2 | baz2 | NULL |
I am attempting to retrieve a view of the data more suitable for modeling. My modeling use-case requires that I break each time series (row) into rows representing their individual "states" at each step. That is:
| id | timestep1 | timestep2 | timestep3 | timestep4 |
|----|-----------|-----------|-----------|-----------|
| a | foo1 | NULL | NULL | NULL |
| a | foo1 | bar1 | NULL | NULL |
| a | foo1 | bar1 | baz1 | NULL |
| a | foo1 | bar1 | baz1 | qux1 |
| b | foo2 | NULL | NULL | NULL |
| b | foo2 | bar2 | NULL | NULL |
| b | foo2 | bar2 | baz2 | NULL |
How can I accomplish this in PostgreSQL?

Use UNION.
select id, timestep1, timestep2, timestep3, timestep4
from my_table
union
select id, timestep1, timestep2, timestep3, null
from my_table
union
select id, timestep1, timestep2, null, null
from my_table
union
select id, timestep1, null, null, null
from my_table
order by
id,
timestep2 nulls first,
timestep3 nulls first,
timestep4 nulls first
There is a more compact solution, maybe more convenient when dealing with a greater number of timesteps:
select distinct
id,
timestep1,
case when i > 1 then timestep2 end as timestep2,
case when i > 2 then timestep3 end as timestep3,
case when i > 3 then timestep4 end as timestep4
from my_table
cross join generate_series(1, 4) as i
order by
id,
timestep2 nulls first,
timestep3 nulls first,
timestep4 nulls first
Test it in Db<>fiddle.
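For trying the compact query without creating a table, the sample rows from the question can stand in for my_table via a VALUES list. This is only a sketch: the inline CTE and its text-typed columns are assumptions made for illustration (the question says the real datatypes are not important).

```sql
-- Hypothetical stand-in for my_table, built from the sample rows in the question.
with my_table(id, timestep1, timestep2, timestep3, timestep4) as (
    values
        ('a'::text, 'foo1'::text, 'bar1'::text, 'baz1'::text, 'qux1'::text),
        ('b',       'foo2',       'bar2',       'baz2',       null)
)
select distinct
    id,
    timestep1,
    -- step i keeps the first i timesteps and blanks out the rest
    case when i > 1 then timestep2 end as timestep2,
    case when i > 2 then timestep3 end as timestep3,
    case when i > 3 then timestep4 end as timestep4
from my_table
cross join generate_series(1, 4) as i
order by
    id,
    timestep2 nulls first,
    timestep3 nulls first,
    timestep4 nulls first;
```

For id b, steps 3 and 4 produce the same row because timestep4 is already NULL, and DISTINCT collapses them, so the query returns seven rows in total.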

Related

Postgresql array unique aggregation

I have a large table with structure
CREATE TABLE t (
id SERIAL primary key ,
a_list int[] not null,
b_list int[] not null,
c_list int[] not null,
d_list int[] not null,
type int not null
)
I want to query all unique values from a_list, b_list, c_list and d_list for a given type, like this:
select
some_array_unique_agg_function(a_list),
some_array_unique_agg_function(b_list),
some_array_unique_agg_function(c_list),
some_array_unique_agg_function(d_list),
count(1)
from t
where type = 30
For example, for this data
+----+---------+--------+--------+---------+------+
| id | a_list | b_list | c_list | d_list | type |
+----+---------+--------+--------+---------+------+
| 1 | {1,3,4} | {2,4} | {1,1} | {2,4,5} | 30 |
| 2 | {1,2,4} | {2,4} | {4,1} | {2,4,5} | 30 |
| 3 | {1,3,5} | {2} | {} | {2,4,5} | 30 |
+----+---------+--------+--------+---------+------+
I want the following result:
+-------------+--------+--------+-----------+-------+
| a_list | b_list | c_list | d_list | count |
+-------------+--------+--------+-----------+-------+
| {1,2,3,4,5} | {2,4} | {1,4} | {2,4,5} | 3 |
+-------------+--------+--------+-----------+-------+
Is there some_array_unique_agg_function for my purposes?
Try this:
with cte as (
select
unnest(a_list) as a_list,
unnest(b_list) as b_list,
unnest(c_list) as c_list,
unnest(d_list) as d_list,
-- count only the rows that match the filter, not the whole table
(select count(*) from t where type = 30) as row_count
from t
where type = 30
)
select array_agg(distinct a_list), array_agg(distinct b_list),
array_agg(distinct c_list), array_agg(distinct d_list), row_count
from cte
group by row_count;
Result:
"{1,2,3,4,5}";"{2,4,NULL}";"{1,4,NULL}";"{2,4,5}";3
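The NULL entries in that result come from unnesting several arrays of different lengths side by side: in PostgreSQL 10 and later the shorter arrays are padded with NULL. One way to avoid this, sketched here against the table t from the question, is to unnest each array in its own scalar subquery. The alias x and the ORDER BY inside array_agg are illustrative choices, not part of the original answer.

```sql
-- Each subquery unnests one array column on its own, so no NULL padding occurs.
select
    (select array_agg(distinct x order by x)
       from t cross join lateral unnest(t.a_list) as x
      where t.type = 30) as a_list,
    (select array_agg(distinct x order by x)
       from t cross join lateral unnest(t.b_list) as x
      where t.type = 30) as b_list,
    (select array_agg(distinct x order by x)
       from t cross join lateral unnest(t.c_list) as x
      where t.type = 30) as c_list,
    (select array_agg(distinct x order by x)
       from t cross join lateral unnest(t.d_list) as x
      where t.type = 30) as d_list,
    (select count(*) from t where type = 30) as count;
```

Note that a column whose arrays are all empty comes back as NULL rather than as an empty array, because array_agg over zero rows returns NULL.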

How to fill Null with the previous value in PostgreSQL?

I have a table which contains Null values. I need to replace them with a previous non-Null value.
This is an example of data which I have:
date | category | start_period | period_number |
------------------------------------------------------
2018-01-01 | A | 1 | 1 |
2018-01-02 | A | 0 | Null |
2018-01-03 | A | 0 | Null |
2018-01-04 | A | 0 | Null |
2018-01-05 | B | 1 | 2 |
2018-01-06 | B | 0 | Null |
2018-01-07 | B | 0 | Null |
2018-01-08 | A | 1 | 3 |
2018-01-09 | A | 0 | Null |
2018-01-10 | A | 0 | Null |
The result should look like this:
date | category | start_period | period_number |
------------------------------------------------------
2018-01-01 | A | 1 | 1 |
2018-01-02 | A | 0 | 1 |
2018-01-03 | A | 0 | 1 |
2018-01-04 | A | 0 | 1 |
2018-01-05 | B | 1 | 2 |
2018-01-06 | B | 0 | 2 |
2018-01-07 | B | 0 | 2 |
2018-01-08 | A | 1 | 3 |
2018-01-09 | A | 0 | 3 |
2018-01-10 | A | 0 | 3 |
I tried the following query, but in this case, only the first Null value will be replaced.
select
date,
category,
start_period,
case
when period_number isnull then lag(period_number) over()
else period_number
end as period_number
from period_table;
Also, I tried to use first_value() window function, but I don't know how to set up the correct window.
Any help is highly appreciated.
You can join the table with itself to get the desired value, assuming your date column is the primary key or unique.
update your_table upd set period_number = tbl.period_number
from
(
-- for every row, find the latest date up to and including its own
-- that has a non-null period_number
select b.date, max(b2.date) as d2 from your_table b
inner join your_table b2 on b2.date <= b.date and b2.period_number is not null
group by b.date
)t
inner join your_table tbl on tbl.date = t.d2
where t.date = upd.date
If you don't need to update the table and only want a select statement, then:
select yt.date, yt.category, yt.start_period, tbl.period_number
from your_table yt
inner join
(
select b.date, max(b2.date) as d2 from your_table b
inner join your_table b2 on b2.date <= b.date and b2.period_number is not null
group by b.date
)t on yt.date = t.date
inner join your_table tbl on tbl.date = t.d2
If you replace your case statement with:
(
select
_.period_number
from
period_table as _
where
_.period_number is not null
and _.category = period_table.category
and _.date <= period_table.date
order by
_.date desc
limit 1
) as period_number
Then it should have the intended effect. It's nowhere near as elegant as a window function but I don't think window functions are quite flexible enough for your specific use case here (Or at least, if they are, I don't know how to flex them that much)
Examples of window functions with a frame clause:
select
date, category, score,
FIRST_VALUE(score) OVER (
PARTITION BY category
ORDER BY date RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as first_score
from testing.rec_test
order by date, category;
select
date, category, score,
LAST_VALUE(score) OVER (
PARTITION BY category
ORDER BY date RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) as last_score
from testing.rec_test
order by date, category;
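Applied to the period_table from the question, a window-function approach could look like the sketch below. It is not taken from any of the answers above: it numbers each run of rows by counting the non-null period_number values seen so far, then spreads the single non-null value across its run. The names grp and s are made up for illustration.

```sql
select
    date,
    category,
    start_period,
    -- every group holds exactly one non-null value, so max() picks it
    max(period_number) over (partition by grp) as period_number
from (
    select
        *,
        -- count() skips NULLs, so the running count only increases
        -- on rows that carry a period_number
        count(period_number) over (order by date) as grp
    from period_table
) s
order by date;
```

Rows that come before the first non-null period_number land in group 0 and stay NULL.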

Select all columns from two tables

Let's say I have the following:
table_a
| id | date | order_id | sku | price |
--------------------------------------------
| 10 | 2016-08-18 | 111 | ABC | 10 |
table_b
| id | date | order_id | description | type | notes | valid |
-------------------------------------------------------------------
| 50 | 2016-08-18 | 111 | test | AA | | true |
I want to get all columns from both tables, so the resulting table looks like this:
| id | date | order_id | sku | price | description | type | notes | valid |
---------------------------------------------------------------------------------
| 10 | 2016-08-18 | 111 | ABC | 10 | | | | |
---------------------------------------------------------------------------------
| 50 | 2016-08-18 | 111 | | | test | AA | | true |
I tried union:
(
SELECT *
from table_a
where table_a.date > Date('today')
)
UNION
(
SELECT *
from table_b
where table_b.date > Date('today')
)
But I get a:
ERROR: each UNION query must have the same number of columns
How can this be fixed / is there another way to do this?
Easily :)
(
SELECT id, date, order_id, sku, price, NULL AS description, NULL AS type, NULL AS notes, NULL AS valid
from table_a
where table_a.date > Date('today')
)
UNION
(
SELECT id, date, order_id, NULL AS sku, NULL AS price, description, type, notes, valid
from table_b
where table_b.date > Date('today')
)
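Alternatively, if you want matching rows side by side in a single row rather than stacked, you can JOIN the tables instead of using UNION. In the sample data the ids differ (10 vs. 50) while order_id matches, so join on order_id:
SELECT *
FROM table_a A
JOIN table_b B USING ( order_id )
WHERE A.date > TIMESTAMP 'TODAY'
AND B.date > TIMESTAMP 'TODAY';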
See more options: https://www.postgresql.org/docs/9.5/static/queries-table-expressions.html#QUERIES-JOIN
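As a variation on the NULL-padded UNION above, the placeholder columns can be given explicit types, and UNION ALL can be used when duplicate removal is not wanted. The sketch below is self-contained: the two CTEs are hypothetical stand-ins built from the sample rows, the column types are guesses, and the date filter is left out because the sample dates are in the past.

```sql
-- Hypothetical stand-ins for table_a and table_b (types are assumptions).
with table_a(id, date, order_id, sku, price) as (
    values (10, date '2016-08-18', 111, 'ABC'::text, 10)
),
table_b(id, date, order_id, description, type, notes, valid) as (
    values (50, date '2016-08-18', 111, 'test'::text, 'AA'::text, null::text, true)
)
select id, date, order_id, sku, price,
       null::text    as description,
       null::text    as type,
       null::text    as notes,
       null::boolean as valid
from table_a
union all
select id, date, order_id,
       null::text    as sku,
       null::integer as price,
       description, type, notes, valid
from table_b;
```

The casts are not strictly required when the other branch supplies the type, but they make the intended type of each padded column explicit.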

Left outer join - how to return a boolean for existence in the second table?

In PostgreSQL 9 on CentOS 6 there are 60000 records in pref_users table:
# \d pref_users
Table "public.pref_users"
Column | Type | Modifiers
------------+-----------------------------+--------------------
id | character varying(32) | not null
first_name | character varying(64) | not null
last_name | character varying(64) |
login | timestamp without time zone | default now()
last_ip | inet |
(... more columns skipped...)
And another table holds around 500 ids of users which are not allowed to play anymore:
# \d pref_ban2
Table "public.pref_ban2"
Column | Type | Modifiers
------------+-----------------------------+---------------
id | character varying(32) | not null
first_name | character varying(64) |
last_name | character varying(64) |
city | character varying(64) |
last_ip | inet |
reason | character varying(128) |
created | timestamp without time zone | default now()
Indexes:
"pref_ban2_pkey" PRIMARY KEY, btree (id)
In a PHP script I am trying to display all 60000 users from pref_users in a jQuery-dataTable. And I would like to mark the banned users (the users found in pref_ban2).
Which means I need a column named ban for each record in my query holding true or false.
So I am trying a left outer join query:
# select
b.id, -- how to make this column a boolean?
u.id,
u.first_name,
u.last_name,
u.city,
u.last_ip,
to_char(u.login, 'DD.MM.YYYY') as day
from pref_users u left outer join pref_ban2 b on u.id=b.id
limit 10;
id | id | first_name | last_name | city | last_ip | day
----+----------+-------------+-----------+-------------+-----------------+------------
| DE1 | Alex | | Bochum | 2.206.0.224 | 21.11.2014
| DE100032 | Княжна Мэри | | London | 151.50.61.131 | 01.02.2014
| DE10011 | Aлександр Ш | | Симферополь | 37.57.108.13 | 01.01.2014
| DE10016 | Semen10 | | usa | 69.123.171.15 | 25.06.2014
| DE10018 | Горловка | | Горловка | 178.216.97.214 | 25.09.2011
| DE10019 | -Дмитрий- | | пермь | 5.140.81.95 | 21.11.2014
| DE10047 | Василий | | Cумы | 95.132.42.185 | 25.07.2014
| DE10054 | Maedhros | | Чикаго | 207.246.176.110 | 26.06.2014
| DE10062 | ssergw | | москва | 46.188.125.206 | 12.09.2014
| DE10086 | Вадим | | Тула | 109.111.26.176 | 26.02.2012
(10 rows)
As you can see the b.id column above is empty - because these 10 users aren't banned.
How to get a false value in that column instead of a String?
And I am not after some coalesce or case expression, but am looking for "the proper" way to do such a query.
"IS NULL" and "IS NOT NULL" return a boolean, so this should make it easy.
I think this is all you need?
SELECT
b.id IS NOT NULL as is_banned, -- The value of "is_banned" will be a boolean
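Keep the "NOT": after the left join, b.id is non-NULL exactly for the users that have a row in pref_ban2, so the expression is true for banned users and false for everyone else.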
A CASE or COALESCE expression with an outer join IS the proper way to do this.
select
CASE
WHEN b.id IS NOT NULL THEN true -- a matching row in pref_ban2 means the user is banned
ELSE false
END AS banned,
u.id,
u.first_name,
u.last_name,
u.city,
u.last_ip,
to_char(u.login, 'DD.MM.YYYY') as day
from pref_users u
left outer join pref_ban2 b
on u.id=b.id
limit 10;
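Another way to get the flag, sketched here with the table and column names from the question, is to skip the join and ask directly whether a matching row exists. EXISTS always returns a boolean, so no NULL handling is needed; the column alias ban follows the name the question asks for.

```sql
select
    -- true when the user has a row in pref_ban2, false otherwise
    exists (select 1 from pref_ban2 b where b.id = u.id) as ban,
    u.id,
    u.first_name,
    u.last_name,
    u.city,
    u.last_ip,
    to_char(u.login, 'DD.MM.YYYY') as day
from pref_users u
limit 10;
```

Because pref_ban2.id is the primary key, the left join version cannot duplicate users either, so the choice between the two is mostly a matter of readability.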

Recursive SQL PostgreSQL Empty Result Set

The categories table:
=# \d
List of relations
Schema | Name | Type | Owner
--------+-------------+-------+-------
public | categories | table | pgsql
public | products | table | pgsql
public | ticketlines | table | pgsql
(3 rows)
Contents of categories:
=# select * from categories;
id | name | parentid
----+--------+----------
1 | Rack |
2 | Women | 1
3 | Shorts | 2
4 | Wares |
5 | Toys | 4
6 | Trucks | 5
(6 rows)
Running the following query:
WITH RECURSIVE nodes_cte(name, id, parentid, depth, path) AS (
-- Base case?
SELECT c.name,
c.id,
c.parentid,
1::INT AS depth,
c.id::TEXT AS path
FROM categories c
WHERE c.parentid = ''
UNION ALL
-- nth case
SELECT c.name,
c.id,
c.parentid,
n.depth + 1 AS depth,
(n.path || '->' || c.id::TEXT)
FROM nodes_cte n
JOIN categories c on n.id = c.parentid
)
SELECT * FROM nodes_cte AS n GROUP BY n.name, n.id, n.parentid, n.depth, n.path ORDER BY n.id ASC
;
yields these results:
name | id | parentid | depth | path
--------+----+----------+-------+---------
Rack | 1 | | 1 | 1
Women | 2 | 1 | 2 | 1->2
Shorts | 3 | 2 | 3 | 1->2->3
Wares | 4 | | 1 | 4
Toys | 5 | 4 | 2 | 4->5
Trucks | 6 | 5 | 3 | 4->5->6
(6 rows)
Great!
But given a similar table (categories):
=# \d categories
Table "public.categories"
Column | Type | Modifiers
----------+-------------------+-----------
id | character varying | not null
name | character varying | not null
parentid | character varying |
image | bytea |
Indexes:
"categories_pkey" PRIMARY KEY, btree (id)
"categories_name_inx" UNIQUE, btree (name)
Referenced by:
TABLE "products" CONSTRAINT "products_fk_1" FOREIGN KEY (category) REFERENCES categories(id)
=# select * from categories;
id | name | parentid | image
--------------------------------------+-------+--------------------------------------+-------
611572c9-326d-4cf9-ae4a-af5269fc788e | Rack | |
22d15300-40b5-4f43-a8d1-902b8d4c5409 | Women | 611572c9-326d-4cf9-ae4a-af5269fc788e |
6b061073-96f4-49a1-9205-bab7c878f0cf | Wares | |
3f018dfb-e6ee-40d1-9dbc-31e6201e7625 | Toys | 6b061073-96f4-49a1-9205-bab7c878f0cf |
(4 rows)
the same query produces zero rows.
Why?
Is it something to do with primary / foreign keys?
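It has nothing to do with the keys. The top-level rows in the second table presumably store NULL in parentid rather than an empty string (psql prints both as blank by default), so the base-case filter has to handle NULL:
WHERE COALESCE(parentid, '') = ''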
Worked. Thank you.
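To see why the original filter matches nothing when parentid is NULL, it can help to evaluate the comparisons on their own. The first statement below is a minimal sketch that needs no tables. The second assumes the categories table from the question and shows an equivalent way to write the base-case filter.

```sql
-- A comparison with NULL yields NULL, and WHERE only keeps rows
-- whose condition is true.
select
    null = ''               as plain_comparison,  -- NULL
    coalesce(null, '') = '' as with_coalesce,     -- true
    null is null            as is_null_test;      -- true

-- Root categories, whether parentid is NULL or an empty string.
select c.id, c.name
from categories c
where c.parentid is null
   or c.parentid = '';
```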