How to have two sort options in PostgreSQL RECURSIVE - postgresql

I have the following query that recursively organises comments and their replies.
WITH RECURSIVE comment_tree AS (
SELECT
id AS comment_id,
body AS comment_body,
reply_to AS comment_reply_to,
1 AS level,
"createdAt" AS comment_date,
commenter_id,
article_id,
array["createdAt"] AS path_info
FROM "Comments"
WHERE "reply_to" IS NULL
UNION ALL
SELECT
c.id,
c.body,
c.reply_to,
p.level + 1,
"createdAt",
c.commenter_id,
c.article_id,
p.path_info || c."createdAt"
FROM "Comments" c
JOIN comment_tree p ON c.reply_to = comment_id
)
SELECT
comment_id,
path_info,
comment_body,
comment_reply_to,
comment_date,
level,
U.first_name,
U.last_name,
coalesce(U.username, CAST(U.id AS VARCHAR)) AS username
FROM comment_tree
LEFT JOIN
"Users" U ON commenter_id = U.id
WHERE article_id = '62834723-B804-4CA1-B984-D949B1A7E1E2'
ORDER BY path_info DESC;
From what I can see, this is working well so far except for the sorting.
Currently the comments are sorted oldest to newest, which nests the replies underneath correctly, but I want the parent list to be newest to oldest.
Is there a way I can sort the parents DESC and the child values ASC?
eg.
+----+----------+----------+
| id | reply_to | date |
+----+----------+----------+
| C1 | null | 01052016 | < - Oldest
| C2 | null | 02052016 |
| C3 | C1 | 03052016 |
| C4 | C1 | 04052016 |
| C5 | null | 05052016 |
| C6 | C4 | 06052016 |
| C7 | C2 | 07052016 |
| C8 | C6 | 08052016 | < - Newest
| | | |
+----+----------+----------+
desired result
| C5 (Newest Parent first)
| C2
| C7
| C1
| C3 (Oldest Child first for all tiers below parent)
| C4
| C6
| C8

I'd introduce an artificial column sort in the Common Table Expression.
With "Comments" defined like this:
Table "laurenz.Comments"
┌───────────┬───────────────────────┬───────────┐
│ Column │ Type │ Modifiers │
├───────────┼───────────────────────┼───────────┤
│ id │ character varying(10) │ not null │
│ reply_to │ character varying(10) │ │
│ createdAt │ date │ not null │
└───────────┴───────────────────────┴───────────┘
Indexes:
"comment_tree_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"comment_tree_reply_to_fkey" FOREIGN KEY (reply_to) REFERENCES "Comments"(id)
Referenced by:
TABLE ""Comments"" CONSTRAINT "comment_tree_reply_to_fkey" FOREIGN KEY (reply_to) REFERENCES "Comments"(id)
I'd write something like this:
WITH RECURSIVE comment_tree AS (
SELECT id, reply_to, "createdAt",
-- zero-pad the age in days so that the text comparison matches the numeric order
lpad(CAST(current_date - "createdAt" AS text), 10, '0') AS sort
FROM "Comments"
WHERE reply_to IS NULL
UNION ALL SELECT c.id, c.reply_to, c."createdAt",
substring(p.sort FROM '^[^-]*') || '-' || c."createdAt"
FROM "Comments" c
JOIN comment_tree p ON c.reply_to = p.id
)
SELECT id, reply_to, "createdAt"
FROM comment_tree
ORDER BY sort;
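The same ordering can also be expressed with separate sort columns instead of one concatenated string. The following is only a sketch under assumptions: it runs the query through Python's sqlite3 module as a stand-in for PostgreSQL, the table and column names (comments, created_at, root_date, root_id, is_reply) are made up for the illustration, and the sample dates from the question are rewritten as ISO strings.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (assumption)
# so that plain text comparison matches chronological order.
rows = [
    ("C1", None, "2016-05-01"),
    ("C2", None, "2016-05-02"),
    ("C3", "C1", "2016-05-03"),
    ("C4", "C1", "2016-05-04"),
    ("C5", None, "2016-05-05"),
    ("C6", "C4", "2016-05-06"),
    ("C7", "C2", "2016-05-07"),
    ("C8", "C6", "2016-05-08"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id TEXT PRIMARY KEY, reply_to TEXT, created_at TEXT NOT NULL)"
)
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", rows)

query = """
WITH RECURSIVE comment_tree AS (
    -- top-level comments start a thread and remember their own date
    SELECT id, created_at, created_at AS root_date, id AS root_id, 0 AS is_reply
    FROM comments
    WHERE reply_to IS NULL
    UNION ALL
    -- every reply inherits the date and id of the thread's root
    SELECT c.id, c.created_at, p.root_date, p.root_id, 1
    FROM comments c
    JOIN comment_tree p ON c.reply_to = p.id
)
SELECT id
FROM comment_tree
ORDER BY root_date DESC,  -- newest thread first
         root_id,         -- keep threads with equal dates apart
         is_reply,        -- the root precedes its replies
         created_at       -- replies oldest first, at any depth
"""
ordered_ids = [row[0] for row in conn.execute(query)]
print(ordered_ids)
```

In PostgreSQL the same ORDER BY list should work with real date columns; the extra root_id key only matters when two top-level comments share a date.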

Related

Using unnest to join in Postgres

Appreciate this is a simple use case but having difficulty doing a join in Postgres using an array.
I have two tables:
table: shares
id         | likes_id_array  | timestamp | share_site
-----------+-----------------+-----------+-----------
12345_6789 | [xxx, yyy, zzz] | date1     | fb
abcde_wxyz | [vbd, fka, fhx] | date2     | tw
table: likes
likes_id | name | location
---------+------+---------
xxx      | aaaa | nice
fpg      | bbbb | dfpb
yyy      | mmmm | place
dhf      | cccc | fiwk
zzz      | dddd | here
desired - a result set based on shares.id = 12345_6789:
likes_id | name | location | timestamp
---------+------+----------+----------
xxx      | aaaa | nice     | date1
yyy      | mmmm | place    | date1
zzz      | dddd | here     | date1
The first step is using unnest() on likes_id_array:
SELECT unnest(likes_id_array) as i FROM shares
WHERE id = '12345_6789'
but I can't figure out how to join the result set this produces with the likes table on likes_id. Any help would be much appreciated!
You can create a CTE from your query that returns the like identifiers, and then make a regular inner join with the likes table:
with like_ids as (
select
unnest(likes_id_array) as like_id
from shares
where id = '12345_6789'
)
select
likes_id,
name,
location
from likes
inner join like_ids
on likes.likes_id = like_ids.like_id
You can use ANY:
SELECT a.*, b.timestamp FROM likes a JOIN shares b ON a.likes_id = ANY(b.likes_id_array) WHERE id = '12345_6789';
You could do this with subqueries or a CTE, but the easiest way is to call the unnest function not in the SELECT clause but as a table expression in the FROM clause:
SELECT likes.*, shares.timestamp
FROM shares, unnest(likes_id_array) as arr(likes_id)
JOIN likes USING (likes_id)
WHERE shares.id = '12345_6789'
You can use jsonb_array_elements_text with an (implicit) lateral join, assuming likes_id_array is a jsonb column:
SELECT
likes.likes_id,
likes.name,
likes.location,
shares.timestamp
FROM
shares,
jsonb_array_elements_text(shares.likes_id_array) AS share_likes(id),
likes
WHERE
likes.likes_id = share_likes.id AND
shares.id = '12345_6789';
Output:
┌──────────┬──────┬──────────┬─────────────────────┐
│ likes_id │ name │ location │ timestamp │
├──────────┼──────┼──────────┼─────────────────────┤
│ xxx │ aaaa │ nice │ 2022-10-12 11:32:39 │
│ yyy │ mmmm │ place │ 2022-10-12 11:32:39 │
│ zzz │ dddd │ here │ 2022-10-12 11:32:39 │
└──────────┴──────┴──────────┴─────────────────────┘
(3 rows)
Or if you want to make the lateral join explicit (notice the addition of the LATERAL keyword):
SELECT
likes.likes_id,
likes.name,
likes.location,
shares.timestamp
FROM
shares,
LATERAL jsonb_array_elements_text(shares.likes_id_array) AS share_likes(id),
likes
WHERE
likes.likes_id = share_likes.id AND
shares.id = '12345_6789';
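To try the lateral-join idea without a PostgreSQL server, here is a rough sketch using Python's sqlite3 module. It is an analogue, not the same API: SQLite has no array type, so the sketch assumes the identifiers are stored as a JSON text array and uses SQLite's json_each table-valued function where PostgreSQL would use unnest or jsonb_array_elements_text. It also assumes the SQLite build includes the JSON functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shares (id TEXT PRIMARY KEY, likes_id_array TEXT, timestamp TEXT, share_site TEXT);
CREATE TABLE likes (likes_id TEXT PRIMARY KEY, name TEXT, location TEXT);
INSERT INTO shares VALUES
    ('12345_6789', '["xxx", "yyy", "zzz"]', 'date1', 'fb'),
    ('abcde_wxyz', '["vbd", "fka", "fhx"]', 'date2', 'tw');
INSERT INTO likes VALUES
    ('xxx', 'aaaa', 'nice'),
    ('fpg', 'bbbb', 'dfpb'),
    ('yyy', 'mmmm', 'place'),
    ('dhf', 'cccc', 'fiwk'),
    ('zzz', 'dddd', 'here');
""")

# json_each expands the array of the current shares row into one row per
# element, which is then joined to likes like any other table.
query = """
SELECT likes.likes_id, likes.name, likes.location, shares.timestamp
FROM shares, json_each(shares.likes_id_array) AS arr
JOIN likes ON likes.likes_id = arr.value
WHERE shares.id = ?
ORDER BY likes.likes_id
"""
result = conn.execute(query, ("12345_6789",)).fetchall()
for row in result:
    print(row)
```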

Selecting on a condition in window function postgresql

I am using PostgreSQL and applying a window function. Previously I had to find the first g_id with the same last name and address (street_address and city), so I simply put the last name in the PARTITION BY clause of the window function.
Now I have a requirement to find the first g_id whose last name is not the same while the address is the same. How can I do it?
This is what I was doing previously:
SELECT g_id,
first_value(g_id)
OVER (PARTITION BY l_name, street_address, city
ORDER BY last_date DESC NULLS LAST) AS c_id,
street_address
FROM mytable;
Let's say this is my table:
g_id | l_name | street_address | city | last_date
_________________________________________________
x1 | bar | abc road | khi | 11-6-19
x2 | bar | abc road | khi | 12-6-19
x3 | foo | abc road | khi | 19-6-19
x4 | harry | abc road | khi | 17-6-19
x5 | bar | xyz road | khi | 11-6-19
_________________________________________________
In the previous scenario, if I run it for the first row, c_id should return 'x2', as it considers these rows:
_________________________________________________
g_id | l_name | street_address | city | last_date
_________________________________________________
x1 | bar | abc road | khi | 11-6-19
x2 | bar | abc road | khi | 12-6-19
_________________________________________________
and returns the row with the latest last_date.
What I want now is to select these rows (rows with the same street_address and city but a different l_name):
g_id | l_name | street_address | city | last_date
_________________________________________________
x1 | bar | abc road | khi | 11-6-19
x3 | foo | abc road | khi | 19-6-19
x4 | harry | abc road | khi | 17-6-19
_________________________________________________
and the output will be x3.
Somehow I want to compare the l_name column so that it is not equal to the current row's l_name, and then partition by the address fields. If no rows satisfy the condition, c_id should be equal to the current g_id.
Looking at your expected output, it's not clear whether you want the earliest or the latest row for each group. You may change the ORDER BY on last_date accordingly in this query, which uses DISTINCT ON:
SELECT DISTINCT ON ( street_address, city, l_name) *
FROM mytable
ORDER BY street_address,
city,
l_name,
last_date --change this to last_date desc if you want latest
After discussing the details in this chat:
SELECT DISTINCT ON (t1.g_id)
t1.*,
COALESCE(t2.g_id, t1.g_id) AS c_id
FROM
mytable t1
LEFT JOIN mytable t2
ON t1.street_address = t2.street_address AND t1.city = t2.city AND t1.l_name != t2.l_name
ORDER BY t1.g_id, t2.last_date DESC
Here is how I solved it using a subquery.
Creating the example table:
CREATE TABLE mytable
("g_id" varchar(2), "l_name" varchar(5), "street_address" varchar(8), "city" varchar(3), "last_date" date)
;
INSERT INTO mytable
("g_id", "l_name", "street_address", "city", "last_date")
VALUES
('x1', 'bar', 'abc road', 'khi', '11-6-19'),
('x2', 'bar', 'abc road', 'khi', '12-6-19'),
('x3', 'foo', 'abc road', 'khi', '19-6-19'),
('x4', 'harry', 'abc road', 'khi', '17-6-19'),
('x5', 'bar', 'xyz road', 'khi', '11-6-19')
;
Query to get the g_ids:
SELECT *,
(select b.g_id from mytable b
where (base.g_id = b.g_id)
or (base.l_name <> b.l_name and base.street_address = b.street_address and base.city = b.city)
order by b.last_date desc limit 1) AS c_id
from mytable base
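For comparison, here is a hedged sketch of a variant that never considers the current row as a candidate and only falls back to the row's own g_id through COALESCE when no other last name lives at the same address. It runs on Python's sqlite3 module as a stand-in for PostgreSQL, and it assumes the sample dates are day-month-year, rewritten here as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (g_id TEXT, l_name TEXT, street_address TEXT, city TEXT, last_date TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        ("x1", "bar", "abc road", "khi", "2019-06-11"),
        ("x2", "bar", "abc road", "khi", "2019-06-12"),
        ("x3", "foo", "abc road", "khi", "2019-06-19"),
        ("x4", "harry", "abc road", "khi", "2019-06-17"),
        ("x5", "bar", "xyz road", "khi", "2019-06-11"),
    ],
)

# For each row: the latest g_id at the same address with a different last
# name; COALESCE supplies the row's own g_id when no such row exists.
query = """
SELECT base.g_id,
       COALESCE(
           (SELECT b.g_id
            FROM mytable b
            WHERE b.l_name <> base.l_name
              AND b.street_address = base.street_address
              AND b.city = base.city
            ORDER BY b.last_date DESC
            LIMIT 1),
           base.g_id) AS c_id
FROM mytable base
ORDER BY base.g_id
"""
pairs = conn.execute(query).fetchall()
print(pairs)
```

The difference shows up for x3: it has the latest date at its address, so a query that also admits the row itself returns x3, while this variant returns x4.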

PostgreSQL: detecting the first/last rows of result set

Is there any way to embed a flag in a select that indicates that it is the first or the last row of a result set? I'm thinking something to the effect of:
> SELECT is_first_row() AS f, is_last_row() AS l FROM blah;
f | l
-----------
t | f
f | f
f | f
f | f
f | t
The answer might be in window functions but I've only just learned about them, and I question their efficiency.
SELECT first_value(unique_column) OVER () = unique_column, last_value(unique_column) OVER () = unique_column, * FROM blah;
seems to do what I want. Unfortunately, I don't even fully understand that syntax, but since unique_column is unique and NOT NULL it should deliver unambiguous results. But if it does sorting, then the cure might be worse than the disease. (Actually, in my tests, unique_column is not sorted, so that's something.)
EXPLAIN ANALYZE doesn't indicate there's an efficiency problem, but when has it ever told me what I needed to know?
And I might need to use this in an aggregate function, but I've just been told window functions aren't allowed there. 😕
Edit:
Actually, I just added ORDER BY unique_column to the above query and the rows identified as first and last were thrown into the middle of the result set. It's as if first_value()/last_value() really means "the first/last value I picked up before I began sorting." I don't think I can safely do this optimally. Not unless a much better understanding of the use of the OVER keyword is to be had.
I'm running PostgreSQL 9.6 in a Debian 9.5 environment.
This isn't a duplicate, because I'm trying to get the first row and last row of the result set to identify themselves, while Postgres: get min, max, aggregate values in one select is just going for the minimum and maximum values for a column in a result set.
You can use the lead() and lag() window functions (over the appropriate window) and compare them to NULL:
-- \i tmp.sql
CREATE TABLE ztable
( id SERIAL PRIMARY KEY
, starttime TIMESTAMP
);
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '1 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '2 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '3 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '4 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '5 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '6 minute');
SELECT id, starttime
, ( lag(id) OVER www IS NULL) AS is_first
, ( lead(id) OVER www IS NULL) AS is_last
FROM ztable
WINDOW www AS (ORDER BY id )
ORDER BY id
;
SELECT id, starttime
, ( lag(id) OVER www IS NULL) AS is_first
, ( lead(id) OVER www IS NULL) AS is_last
FROM ztable
WINDOW www AS (ORDER BY starttime )
ORDER BY id
;
SELECT id, starttime
, ( lag(id) OVER www IS NULL) AS is_first
, ( lead(id) OVER www IS NULL) AS is_last
FROM ztable
WINDOW www AS (ORDER BY starttime )
ORDER BY random()
;
Result:
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
id | starttime | is_first | is_last
----+----------------------------+----------+---------
1 | 2018-08-31 18:38:45.567393 | t | f
2 | 2018-08-31 18:37:45.575586 | f | f
3 | 2018-08-31 18:36:45.587436 | f | f
4 | 2018-08-31 18:35:45.592316 | f | f
5 | 2018-08-31 18:34:45.600619 | f | f
6 | 2018-08-31 18:33:45.60907 | f | t
(6 rows)
id | starttime | is_first | is_last
----+----------------------------+----------+---------
1 | 2018-08-31 18:38:45.567393 | f | t
2 | 2018-08-31 18:37:45.575586 | f | f
3 | 2018-08-31 18:36:45.587436 | f | f
4 | 2018-08-31 18:35:45.592316 | f | f
5 | 2018-08-31 18:34:45.600619 | f | f
6 | 2018-08-31 18:33:45.60907 | t | f
(6 rows)
id | starttime | is_first | is_last
----+----------------------------+----------+---------
2 | 2018-08-31 18:37:45.575586 | f | f
4 | 2018-08-31 18:35:45.592316 | f | f
6 | 2018-08-31 18:33:45.60907 | t | f
5 | 2018-08-31 18:34:45.600619 | f | f
1 | 2018-08-31 18:38:45.567393 | f | t
3 | 2018-08-31 18:36:45.587436 | f | f
(6 rows)
[updated: added a randomly sorted case]
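A compact way to check the lag()/lead() idea: a row with no predecessor in the window order is the first one, and a row with no successor is the last one. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL (it assumes SQLite 3.25 or later for window functions) with hand-written timestamps in place of now().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ztable (id INTEGER PRIMARY KEY, starttime TEXT)")
# id 1 gets the latest timestamp and id 6 the earliest, as in the answer above.
conn.executemany(
    "INSERT INTO ztable (id, starttime) VALUES (?, ?)",
    [(i, f"2018-08-31 18:{39 - i}:45") for i in range(1, 7)],
)

query = """
SELECT id,
       (lag(id)  OVER w) IS NULL AS is_first,  -- nothing before this row
       (lead(id) OVER w) IS NULL AS is_last    -- nothing after this row
FROM ztable
WINDOW w AS (ORDER BY starttime)
ORDER BY starttime
"""
flags = conn.execute(query).fetchall()
for row in flags:
    print(row)
```

SQLite reports the booleans as 1 and 0. Because the window and the outer ORDER BY use the same column, the flagged rows are also the first and last rows printed.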
It is simple using window functions with particular frames:
with t(x, y) as (select generate_series(1,5), random())
select *,
count(*) over (rows between unbounded preceding and current row),
count(*) over (rows between current row and unbounded following)
from t;
┌───┬───────────────────┬───────┬───────┐
│ x │ y │ count │ count │
├───┼───────────────────┼───────┼───────┤
│ 1 │ 0.543995119165629 │ 1 │ 5 │
│ 2 │ 0.886343683116138 │ 2 │ 4 │
│ 3 │ 0.124682310037315 │ 3 │ 3 │
│ 4 │ 0.668972567655146 │ 4 │ 2 │
│ 5 │ 0.266671542543918 │ 5 │ 1 │
└───┴───────────────────┴───────┴───────┘
As you can see, count(*) over (rows between unbounded preceding and current row) returns the number of rows from the beginning of the data set to the current row, and count(*) over (rows between current row and unbounded following) returns the number of rows from the current row to the end of the data set. A value of 1 indicates the first or last row, respectively.
This works until you sort your data set with ORDER BY. In that case you need to repeat the ordering in the frame definitions:
with t(x, y) as (select generate_series(1,5), random())
select *,
count(*) over (order by y rows between unbounded preceding and current row),
count(*) over (order by y rows between current row and unbounded following)
from t order by y;
┌───┬───────────────────┬───────┬───────┐
│ x │ y │ count │ count │
├───┼───────────────────┼───────┼───────┤
│ 1 │ 0.125781774986535 │ 1 │ 5 │
│ 4 │ 0.25046408502385 │ 2 │ 4 │
│ 5 │ 0.538880597334355 │ 3 │ 3 │
│ 3 │ 0.802807193249464 │ 4 │ 2 │
│ 2 │ 0.869908029679209 │ 5 │ 1 │
└───┴───────────────────┴───────┴───────┘
PS: As mentioned by a_horse_with_no_name in the comment:
there is no such thing as the "first" or "last" row without sorting.
In fact, window functions are a great approach, and for this requirement of yours they are a good fit.
Regarding efficiency, window functions work over the data set already at hand, which means the DBMS will just add some extra processing to infer the first/last values.
Just one thing I'd like to suggest: I like to put ORDER BY criteria inside the OVER clause, to ensure the data set order is the same between multiple executions and therefore returns the same values to you.
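One detail to watch when adding ORDER BY inside OVER: the default frame then ends at the current row, so last_value() returns the current row's own value unless the frame is widened. A small sketch of the difference, run on Python's sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 or later assumed); the table blah and unique_column come from the question, while the sample values and the helper function flags are made up here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blah (unique_column INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO blah VALUES (?)", [(30,), (10,), (20,), (50,), (40,)])

def flags(window_definition):
    """Return (value, is_first, is_last) per row for the given window."""
    sql = f"""
    SELECT unique_column,
           (first_value(unique_column) OVER w) = unique_column AS f,
           (last_value(unique_column)  OVER w) = unique_column AS l
    FROM blah
    WINDOW w AS ({window_definition})
    ORDER BY unique_column
    """
    return conn.execute(sql).fetchall()

# Default frame: from the start of the partition up to the current row.
default_frame = flags("ORDER BY unique_column")
# Explicit frame covering the whole partition.
full_frame = flags(
    "ORDER BY unique_column "
    "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"
)
print(default_frame)
print(full_frame)
```

With the default frame every row claims to be the last one; with the explicit frame only the row holding 50 does.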
Try using
(SELECT columns
FROM mytable
Join conditions
WHERE conditions ORDER BY date DESC LIMIT 1)
UNION ALL
(SELECT columns
FROM mytable
Join conditions
WHERE conditions ORDER BY date ASC LIMIT 1)
The parentheses are needed so that each ORDER BY ... LIMIT applies to its own branch of the UNION ALL. This just cut the processing time in half. You can also add an index.

Flatten hierarchy on self-join table

I have data in a self-join hierarchical table where Continents have many Countries have many Regions have many States have many Cities.
Self-joining table structure:
|-------------------------------------------------------------|
| ID | Name | Type | ParentID | IsTopLevel |
|-------------------------------------------------------------|
| 1 | North America | Continent | NULL | 1 |
| 12 | United States | Country | 1 | 0 |
| 113 | Midwest | Region | 12 | 0 |
| 155 | Kansas | State | 113 | 0 |
| 225 | Topeka | City | 155 | 0 |
| 2 | South America | Continent | NULL | 1 |
| 22 | Argentina | Country | 2 | 0 |
| 223 | Southern | Region | 22 | 0 |
| 255 | La Pampa | State | 223 | 0 |
| 777 | Santa Rosa | City | 255 | 0 |
|-------------------------------------------------------------|
I have been able to successfully use a recursive CTE to get the tree structure and depth of each node. Where I am failing is using a pivot to create a nice list of all bottom locations and their corresponding parents at each level.
The expected results:
|------------------------------------------------------------------------------------|
| Continent | Country | Region | State | City | Bottom_Level_ID |
|------------------------------------------------------------------------------------|
| North America | United States | Midwest | Kansas | Topeka | 225 |
| South America | Argentina | Southern | La Pampa | Santa Rosa | 777 |
|------------------------------------------------------------------------------------|
There are a few key points I should clarify.
Every single entry has a bottom level and a top level. There are no
cases where all five Types are not present for a given location.
If I filled out this data, I'd have 50 entries for North America at the
State level, so you can imagine how immense this table is at the
City level for every continent on the planet. Billions of rows.
The reason this is a necessity is because I need to be able to join onto a historical table of all addresses a person has lived at, and journey up the tree. I figure if I have the LocationID from that table, I can just LEFT JOIN onto a View of this query and nab the appropriate columns.
This is an old database, 2005, and I don't have sysadmin or control of the schema.
My CTE Code
--CTE
;WITH Tree
AS (
SELECT ID, Name, ParentID, Type, 1 as Depth
FROM LocationTable
WHERE IsTopLevel = 1
UNION ALL
SELECT L.ID, L.Name, L.ParentID, L.Type, T.Depth+1
FROM Tree T
JOIN LocationTable L
ON L.ParentID = T.ID
)
Good solid data, in a mostly useful format. BUT then I got to thinking about it and isn't the table structure already in this format, so why would I bother doing a depth tree search if I wasn't going to join the entries together at the same time?
Anyway, here was the rest.
The Pivot Attempt
;WITH Tree
AS (
SELECT ID, Name, ParentID, Type
FROM LocationTable
WHERE IsTopLevel = 1
UNION ALL
SELECT L.ID, L.Name, L.ParentID, L.Type
FROM Tree T
JOIN LocationTable L
ON L.ParentID = T.ID
)
select *
from Tree
pivot (
max(Name)
for Type in ([Continent],[Country],[Region],[State],[City])
) pvt
And now I have everything by Type in a column, with nulls for everything else. As I have struggled with before, I need to filter/join the CTE data before I attempt my pivot, but I have no idea where to start with that piece. Everything I have tried is extremely slow.
Every time I think I understand CTEs and Pivot, something new makes me extremely humbled. Please help me.
If your structure is as clean as you describe it (no gaps, always 5 levels), you might go the easy way with plain self-joins, one per level.
This data really calls for a classical 1:n table tree, where your Countries, States etc. live in their own tables and link to their parent record.
Make sure there's an index on ParentID and ID!
DECLARE @tbl TABLE(ID INT,Name VARCHAR(100),Type VARCHAR(100),ParentID INT,IsTopLevel BIT);
INSERT INTO @tbl VALUES
(1,'North America','Continent',NULL,1)
,(12,'United States','Country',1,0)
,(113,'Midwest','Region',12,0)
,(155,'Kansas','State',113,0)
,(225,'Topeka','City',155,0)
,(2,'South America','Continent',NULL,1)
,(22,'Argentina','Country',2,0)
,(223,'Southern','Region',22,0)
,(255,'La Pampa','State',223,0)
,(777,'Santa Rosa','City',255,0);
SELECT Level1.Name AS Continent
,Level2.Name AS Country
,Level3.Name AS Region
,Level4.Name AS State
,Level5.Name AS City
,Level5.ID AS Bottom_Level_ID
FROM @tbl AS Level1
INNER JOIN @tbl AS Level2 ON Level1.ID=Level2.ParentID
INNER JOIN @tbl AS Level3 ON Level2.ID=Level3.ParentID
INNER JOIN @tbl AS Level4 ON Level3.ID=Level4.ParentID
INNER JOIN @tbl AS Level5 ON Level4.ID=Level5.ParentID
WHERE Level1.ParentID IS NULL
The result
Continent Country Region State City Bottom_Level_ID
North America United States Midwest Kansas Topeka 225
South America Argentina Southern La Pampa Santa Rosa 777
Another solution with CTE could be :
;WITH Tree
AS (
SELECT cast(NULL as varchar(100)) as C1, cast(NULL as varchar(100)) as C2, cast(NULL as varchar(100)) as C3, cast(NULL as varchar(100)) as C4, Name as C5, ID as B_Level
FROM LocationTable
WHERE IsTopLevel = 1
UNION ALL
SELECT T.C2, T.C3, T.C4, T.C5, L.Name, L.ID
FROM Tree T
JOIN LocationTable L
ON L.ParentID = T.B_Level
)
select *
from Tree
where C1 is not null
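The column-shifting CTE can be tried out on the sample rows. The sketch below is an assumption-laden stand-in: it uses Python's sqlite3 module rather than SQL Server, and the table and column names (location, parent_id, is_top_level) are adapted for the illustration. Each recursion step shifts the names one column to the left, so only rows that are five levels deep end up with a non-NULL c1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT, type TEXT, "
    "parent_id INTEGER, is_top_level INTEGER)"
)
conn.executemany(
    "INSERT INTO location VALUES (?, ?, ?, ?, ?)",
    [
        (1, "North America", "Continent", None, 1),
        (12, "United States", "Country", 1, 0),
        (113, "Midwest", "Region", 12, 0),
        (155, "Kansas", "State", 113, 0),
        (225, "Topeka", "City", 155, 0),
        (2, "South America", "Continent", None, 1),
        (22, "Argentina", "Country", 2, 0),
        (223, "Southern", "Region", 22, 0),
        (255, "La Pampa", "State", 223, 0),
        (777, "Santa Rosa", "City", 255, 0),
    ],
)

query = """
WITH RECURSIVE tree(c1, c2, c3, c4, c5, b_level) AS (
    -- a continent starts in the rightmost name column
    SELECT NULL, NULL, NULL, NULL, name, id
    FROM location
    WHERE is_top_level = 1
    UNION ALL
    -- each child shifts the known names left and appends its own name
    SELECT t.c2, t.c3, t.c4, t.c5, l.name, l.id
    FROM tree t
    JOIN location l ON l.parent_id = t.b_level
)
SELECT c1 AS continent, c2 AS country, c3 AS region, c4 AS state,
       c5 AS city, b_level AS bottom_level_id
FROM tree
WHERE c1 IS NOT NULL
ORDER BY b_level
"""
flat = conn.execute(query).fetchall()
for row in flat:
    print(row)
```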

How to eliminate repeated field with GROUP BY clause?

I have 3 tables called:
1. app_tenant pk:id, fk:pasar_id
----+------+----------+
id  | nama | pasar_id |
----+------+----------+
1   | joe  | 1        |
2   | adi  | 2        |
3   | adam | 3        |
2. app_pasar pk:id
----+--------------+
id  | nama         |
----+--------------+
1   | kosambi      |
2   | gede bage    |
3   | pasar minggu |
3. app_kios pk:id, fk:tenant_id
----+-------+-----------
id  | nama  | tenant_id
----+-------+-----------
1   | kios1 | 1
2   | kios2 | 2
3   | kios3 | 3
4   | kios4 | 1
5   | kios5 | 1
6   | kios6 | 2
7   | kios7 | 2
8   | kios8 | 3
9   | kios9 | 3
Then, with a LEFT JOIN query and grouping by id in every table, I want to display data like this:
----+-------------+--------------+-----------
id  | nama_tenant | nama_pasar   | nama_kios
----+-------------+--------------+-----------
1   | joe         | kosambi      | kios1
2   | adi         | gede bage    | kios2
3   | adam        | pasar minggu | kios3
but after I execute this query, the data are not shown as expected. The problem is
redundancy in the nama_tenant field. How can I eliminate repeated nama_tenant records?
This is my query:
select a.id,a.nama as nama_tenant,
b.nama as nama_pasar,
c.nama as nama_kios
from app_tenant a
left join app_pasar b on a.id=b.id
left join app_kios c on a.id= c.tenant_id
group by
a.id,
b.id,
c.id
Table definitions:
CREATE TABLE app_tenant (
id serial PRIMARY KEY,
nama character varying,
pasar_id integer);
CREATE TABLE app_kios (
id serial PRIMARY KEY,
nama character varying,
tenant_id integer REFERENCES app_tenant);
The problem is that tenants can have multiple kiosks. From your sample data it looks like you want to display the first kiosk of every tenant (although "first" is a vague concept on strings, here I use alphabetical sort order). Your query would be like this:
SELECT t.id, t.nama AS nama_tenant, p.nama AS nama_pasar, k.nama AS nama_kios
FROM app_tenant t
LEFT JOIN app_pasar p ON p.id = t.pasar_id
LEFT JOIN (
SELECT tenant_id, nama
FROM (
SELECT tenant_id, nama, rank() OVER (PARTITION BY tenant_id ORDER BY nama) AS rnk
FROM app_kios) ranked
-- rnk has to be filtered one level above the window function that computes it
WHERE rnk = 1) k ON k.tenant_id = t.id
ORDER BY t.id
The sub-query on app_kios uses a window function to get the first kiosk name after sorting the names of the kiosk for each tenant.
I would also suggest to use meaningful aliases for table names instead of simply a, b, c.
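To see the "first kiosk per tenant" idea end to end, here is a small sketch on the sample data. It uses Python's sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 or later assumed for window functions) and, as a variation, filters the rank in the join condition instead of in a WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_pasar (id INTEGER PRIMARY KEY, nama TEXT);
CREATE TABLE app_tenant (id INTEGER PRIMARY KEY, nama TEXT, pasar_id INTEGER);
CREATE TABLE app_kios (id INTEGER PRIMARY KEY, nama TEXT, tenant_id INTEGER);
INSERT INTO app_pasar VALUES (1, 'kosambi'), (2, 'gede bage'), (3, 'pasar minggu');
INSERT INTO app_tenant VALUES (1, 'joe', 1), (2, 'adi', 2), (3, 'adam', 3);
INSERT INTO app_kios VALUES
    (1, 'kios1', 1), (2, 'kios2', 2), (3, 'kios3', 3),
    (4, 'kios4', 1), (5, 'kios5', 1), (6, 'kios6', 2),
    (7, 'kios7', 2), (8, 'kios8', 3), (9, 'kios9', 3);
""")

# The subquery ranks the kiosks of each tenant by name; the join keeps only
# the kiosk with rank 1, so every tenant appears exactly once.
query = """
SELECT t.id, t.nama AS nama_tenant, p.nama AS nama_pasar, k.nama AS nama_kios
FROM app_tenant t
LEFT JOIN app_pasar p ON p.id = t.pasar_id
LEFT JOIN (
    SELECT tenant_id, nama,
           rank() OVER (PARTITION BY tenant_id ORDER BY nama) AS rnk
    FROM app_kios
) k ON k.tenant_id = t.id AND k.rnk = 1
ORDER BY t.id
"""
tenants = conn.execute(query).fetchall()
for row in tenants:
    print(row)
```

If two kiosks of one tenant could share a name, rank() would return both of them; row_number() would be the safer choice in that case.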