PostgreSQL - Fetch only a specified limit of related rows but with a number indicating the total count

Basically I have 2 models like so:
Comments:
+----+--------+----------+------------------+
| id | userId | parentId | text             |
+----+--------+----------+------------------+
| 2  | 5      | 1        | Beautiful photo  |
+----+--------+----------+------------------+
| 3  | 2      | 2        | Thanks Jeff.     |
+----+--------+----------+------------------+
| 4  | 7      | 2        | Thank you, Jeff. |
+----+--------+----------+------------------+
This table is designed to handle threads. Each parentId is a comment itself.
And CommentLikes:
+----+--------+-----------+
| id | userId | commentId |
+----+--------+-----------+
| 1  | 2      | 2         |
+----+--------+-----------+
| 2  | 7      | 2         |
+----+--------+-----------+
| 3  | 7      | 3         |
+----+--------+-----------+
What I'm trying to achieve is an SQL query that will perform the following (given the parameter parentId):
Get at most 10 replies that belong to parentId. With each reply, I need a count indicating the total number of replies to that reply and another count indicating the total number of likes given to that reply.
Sample input #1: /replies/1
Expected output:
[{
  id: 2,
  userId: 5,
  parentId: 1,
  text: 'Beautiful photo',
  likeCount: 2,
  replyCount: 2
}]
Sample input #2: /replies/2
Expected output:
[
  {
    id: 3,
    userId: 2,
    parentId: 2,
    text: 'Thanks Jeff.',
    replyCount: 0,
    likeCount: 1
  },
  {
    id: 4,
    userId: 7,
    parentId: 2,
    text: 'Thank you, Jeff.',
    replyCount: 0,
    likeCount: 0
  }
]
I'm trying to use Sequelize for my case, but it seems to only over-complicate things, so any raw SQL query will do.
Thank you in advance.

What about something like this?
SELECT *,
       (SELECT COUNT(*) FROM comment_likes WHERE comments.id = comment_likes."commentId") AS likecount,
       (SELECT COUNT(*) FROM comments AS c WHERE c."parentId" = comments.id) AS commentcount
FROM comments
WHERE comments."parentId" = 2
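Since the question asks for at most 10 replies per request, a slight variant of the same query with the parent id parameterised, an ORDER BY and a LIMIT might look like the sketch below ($1 as the parameter placeholder and ordering by id are assumptions, not part of the original answer):
-- Sketch: same correlated subqueries, capped at 10 replies for the parent passed as $1.
-- Ordering by id is an assumption; order by whatever "first 10 replies" means in your app.
SELECT comments.*,
       (SELECT COUNT(*) FROM comment_likes
         WHERE comment_likes."commentId" = comments.id) AS "likeCount",
       (SELECT COUNT(*) FROM comments AS c
         WHERE c."parentId" = comments.id)              AS "replyCount"
FROM comments
WHERE comments."parentId" = $1
ORDER BY comments.id
LIMIT 10;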

Related

How to query across multiple rows in postgres

I'm saving dynamic objects (objects of which I do not know the type upfront) using the following 2 tables in Postgres:
CREATE TABLE IF NOT EXISTS objects(
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    name TEXT NOT NULL,
    PRIMARY KEY(id)
);
CREATE TABLE IF NOT EXISTS object_values(
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    event_id UUID NOT NULL,
    param TEXT NOT NULL,
    value TEXT NOT NULL
);
So for instance, if I have the following objects:
dog = [
  { breed: "poodle", age: 15, ... },
  { breed: "husky", age: 9, ... },
]
monitors = [
  { manufacturer: "dell", ... },
]
It will live in the DB as follows:
-- objects
| id | user_id | name |
|----|---------|---------|
| 1 | 1 | dog |
| 2 | 2 | dog |
| 3 | 1 | monitor |
-- object_values
| id | event_id | param | value |
|----|----------|--------------|--------|
| 1 | 1 | breed | poodle |
| 2 | 1 | age | 15 |
| 3 | 2 | breed | husky |
| 4 | 2 | age | 9 |
| 5 | 3 | manufacturer | dell |
Note: these tables are big (hundreds of millions of rows) and generally optimised for writing.
What would be a good way of querying/filtering objects based on multiple object params? For instance: Select the number of all husky dogs above the age of 10 per unique user.
I also wonder whether it would have been better to denormalise the tables and collapse the params onto a JSON column (and use gin indexes).
Are there any standards I can use here?
"Select the number of all husky dogs above the age of 10 per unique user" - The following query would do it.
SELECT user_id, COUNT(DISTINCT event_id) AS num_husky_dogs_older_than_10
FROM objects o
INNER JOIN object_values ov
ON o.id_ = ov.event_id
AND o.name_ = 'dog'
GROUP BY o.user_id
HAVING MAX(CASE WHEN ov.param = 'age'
AND ov.value_::integer >= 10 THEN 1 END) = 1
AND MAX(CASE WHEN ov.param = 'breed'
AND ov.value_ = 'husky' THEN 1 END) = 1;
Since your queries will most likely always perform the same JOIN between these two tables on the same fields, it would be good to have indices on:
the fields you join on ("objects.id", "object_values.event_id")
the fields you filter on ("objects.name", "object_values.param", "object_values.value")
Check the demo here.
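A hedged sketch of what those indexes could look like (the index names are made up, and objects.id is already covered by its primary key):
-- Illustrative only; index names are arbitrary.
CREATE INDEX IF NOT EXISTS object_values_event_id_idx    ON object_values (event_id);
CREATE INDEX IF NOT EXISTS objects_name_idx              ON objects (name);
CREATE INDEX IF NOT EXISTS object_values_param_value_idx ON object_values (param, value);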

Recursive JSONB join in Postgres

I am trying to expand an existing system to allow self-referencing "foreign key" relationships based on values in JSONB. I'll give an example for better understanding:
| ID | DATA |
|---------------------------------------------------------------|
| 1 |{building_id: 1, building_name: 'Office 1'} |
|---------------------------------------------------------------|
| 2 |{building_id: 2, building_name: 'Office 2'} |
|---------------------------------------------------------------|
| 3 |{building_id: 1, full_name: 'John Doe', calary: 3000} |
|---------------------------------------------------------------|
| 4 |{building_id: 1, full_name: 'Alex Smit', calary: 2000} |
|---------------------------------------------------------------|
| 5 |{building_id: 1, full_name: 'Anna Birkin', calary: 2500}|
I tried using jsonb_array_elements_text, but I need the new data to be combined into a single JSON field like this:
| ID | DATA |
|--------------------------------------------------------------------------------------------|
| 1 |{building_id: 1, building_name: 'Office 1', full_name: 'John Doe', calary: 3000} |
|--------------------------------------------------------------------------------------------|
| 2 |{building_id: 1, building_name: 'Office 1', full_name: 'Alex Smit', calary: 2000} |
|--------------------------------------------------------------------------------------------|
| 3 |{building_id: 2, building_name: 'Office 2', full_name: 'Anna Birkin', calary: 2500}|
I am wondering whether it is even possible.
Assuming that Anna Birkin is in building_id 2 and these different object types are consistent enough to determine types by the presence of keys, try something like this:
select b.data || p.data as result
from really_bad_idea b
join really_bad_idea p on p.data->'building_id' = b.data->'building_id'
where b.data ? 'building_name'
and p.data ? 'full_name';
db<>fiddle here
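For context on the operators used above: || concatenates two jsonb values (keys from the right operand win on conflict) and ? tests whether a top-level key exists. A minimal, self-contained illustration, not tied to the real table:
-- jsonb concatenation: the right operand's keys override the left's.
SELECT '{"building_id": 1, "building_name": "Office 1"}'::jsonb
       || '{"building_id": 1, "full_name": "John Doe", "calary": 3000}'::jsonb AS merged;

-- top-level key existence, as used in the WHERE clause above
SELECT '{"building_name": "Office 1"}'::jsonb ? 'building_name' AS has_key;  -- true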

Replicating MongoDB $bucket with conditional sum in Postgres

I have a database with hundreds of thousands of rows with this schema:
+----+----------+---------+
| id | duration | type |
+----+----------+---------+
| 1 | 41 | cycling |
+----+----------+---------+
| 2 | 15 | walking |
+----+----------+---------+
| 3 | 6 | walking |
+----+----------+---------+
| 4 | 26 | running |
+----+----------+---------+
| 5 | 30 | cycling |
+----+----------+---------+
| 6 | 13 | running |
+----+----------+---------+
| 7 | 10 | running |
+----+----------+---------+
I was previously using a MongoDB aggregation to do this and get a distribution of activities by type and total count:
{
  $bucket: {
    groupBy: '$duration',
    boundaries: [0, 16, 31, 61, 91, 121],
    default: 121,
    output: {
      total: { $sum: 1 },
      walking: {
        $sum: { $cond: [{ $eq: ['$type', 'walking'] }, 1, 0] },
      },
      running: {
        $sum: { $cond: [{ $eq: ['$type', 'running'] }, 1, 0] },
      },
      cycling: {
        $sum: { $cond: [{ $eq: ['$type', 'cycling'] }, 1, 0] },
      },
    },
  },
}
I have just transitioned to using Postgres and can't figure out how to do the conditional sums there. What would the query be to get a result table like this?
+---------------+---------+---------+---------+-------+
| duration_band | walking | running | cycling | total |
+---------------+---------+---------+---------+-------+
| 0-15 | 41 | 21 | 12 | 74 |
+---------------+---------+---------+---------+-------+
| 15-30 | 15 | 1 | 44 | 60 |
+---------------+---------+---------+---------+-------+
| 30-60 | 6 | 56 | 7 | 69 |
+---------------+---------+---------+---------+-------+
| 60-90 | 26 | 89 | 32 | 150 |
+---------------+---------+---------+---------+-------+
| 90-120 | 30 | 0 | 6 | 36 |
+---------------+---------+---------+---------+-------+
| 120+ | 13 | 90 | 0 | 103 |
+---------------+---------+---------+---------+-------+
| Total | 131 | 257 | 101 | 492 |
+---------------+---------+---------+---------+-------+
SQL is very good at retrieving data, making calculations on it, and delivering it, so getting the values you want is an easy task. It is not so good at formatting results, which is why that task is typically left to the presentation layer. That said, it does not mean it cannot be done - it can, and in a single query. The difficulty is the pivot process - transforming rows into columns. But first some setup. You should put the duration bands in their own table (if not already), with the addition of an identifier, which then allows multiple criteria sets (more on that later). I will proceed that way.
create table bands( name text, period int4range, title text );
insert into bands(name, period, title)
values ('Standard', '[ 0, 15)'::int4range , '0 - 15')
, ('Standard', '[ 15, 30)'::int4range , '15 - 30')
, ('Standard', '[ 30, 60)'::int4range , '30 - 60')
, ('Standard', '[ 60, 90)'::int4range , '60 - 90')
, ('Standard', '[ 90,120)'::int4range , '90 - 120')
, ('Standard', '[120,)'::int4range , '120+');
This sets up your current criteria. The name column is the previously mentioned identifier, and the title column becomes the duration band in the output. The interesting column is period, defined as an integer range - in this case a [closed,open) range that includes the first number but not the second (yes, the brackets have meaning). That definition becomes the heart of the resulting query. The query builds as follows:
1. Retrieve the desired interval set ('[0,15)' ...) and append a "totals" entry to it.
2. Define the list of activities (cycling, ...).
3. Combine these sets to create a list of intervals paired with each activity. This gives the activity intervals that become the matrix generated when pivoted.
4. Combine the "test" table values with the above list, calculating the total time for each activity within each interval. This is the workhorse of the query; it does ALL of the calculations. The result contains intervals plus the total activity for each cell in the matrix, but it still exists in row orientation.
5. With the results calculated, pivot them from row orientation to column orientation.
6. Finally, compress the pivoted results into a single row for each interval and set the final interval ordering.
The numbers above correspond to the ---- n markers in the query.
And the result is:
with buckets ( period , title, ord) as
( select period , title, row_number() over (order by lower(b.period)) ord ---- 1
from bands b
where name = 'Standard'
union all
select '[0,)','Total',count(*) + 1
from bands b
where name = 'Standard'
)
, activities (activity) as ( values ('running'),('walking'),('cycling'), ('Total')) ---- 2
, activity_buckets (period, title, ord, activity) as
(select * from buckets cross join activities) ---- 3
select s2.title "Duration Band" ---- 6
, max(cycling) "Cycling"
, max(running) "Running"
, max(walking) "Walking"
, max(Total) "Total "
from ( select s1.title, s1.ord
, case when s1.activity = 'cycling' then duration else null end cycling ---- 5
, case when s1.activity = 'running' then duration else null end running
, case when s1.activity = 'walking' then duration else null end walking
, case when s1.activity = 'Total' then duration else null end total
from ( select ab.ord, ab.title, ab.activity
, sum(coalesce(t.duration,0)) duration ---- 4
from activity_buckets ab
left join test t
on ( (t.type = ab.activity or ab.activity = 'Total')
and t.duration <@ ab.period --** determines which time interval(s) the value belongs to
)
group by ab.ord, ab.title, ab.activity
) s1
) s2
group by s2.ord,s2.title
order by s2.ord;
See the demo. It contains each of the major steps along the way. Additionally, it shows how creating a table for the intervals can be put to use. Since I dislike long queries, I generally hide them behind a SQL function and then just use the function; the demo contains this as well.
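As a side note, if only the conditional sums are needed (without the reusable bands table and the totals row), Postgres aggregate FILTER clauses over a CASE-based band are a much shorter route. A rough sketch, assuming the activity table is named test with columns duration and type, and counting rows the way the Mongo $sum: 1 did:
-- Conditional counts per duration band using aggregate FILTER clauses.
-- Band edges mirror the Mongo boundaries [0, 16, 31, 61, 91, 121].
SELECT CASE
         WHEN duration < 16  THEN '0-15'
         WHEN duration < 31  THEN '15-30'
         WHEN duration < 61  THEN '30-60'
         WHEN duration < 91  THEN '60-90'
         WHEN duration < 121 THEN '90-120'
         ELSE '120+'
       END                                      AS duration_band,
       COUNT(*) FILTER (WHERE type = 'walking') AS walking,
       COUNT(*) FILTER (WHERE type = 'running') AS running,
       COUNT(*) FILTER (WHERE type = 'cycling') AS cycling,
       COUNT(*)                                 AS total
FROM test
GROUP BY 1
ORDER BY MIN(duration);
A grand-total row could then be added with GROUP BY ROLLUP or a UNION ALL, at the cost of a little more ceremony.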

How to iterate over a JSON object in a transaction for multiple inserts in database using pg-promise

Apologies if my title might not be clear, I'll explain the question further here.
What I would like to do is to have multiple inserts based on a JSON array that I (backend) will be receiving from the frontend. The JSON object has the following data:
//Sample JSON
{
  // Some other data here to insert
  ...
  "quests": [
    {
      "player_id": [1, 2, 3],
      "task_id": [11, 12]
    },
    {
      "player_id": [4, 5, 6],
      "task_id": [13, 14, 15]
    }
  ]
}
Based on this JSON, this is my expected output upon being inserted in Table quests and processed by the backend:
//quests table (Output)
---------------------------
 id | player_id | task_id |
---------------------------
  1 |         1 |      11 |
  2 |         1 |      12 |
  3 |         2 |      11 |
  4 |         2 |      12 |
  5 |         3 |      11 |
  6 |         3 |      12 |
  7 |         4 |      13 |
  8 |         4 |      14 |
  9 |         4 |      15 |
 10 |         5 |      13 |
 11 |         5 |      14 |
 12 |         5 |      15 |
 13 |         6 |      13 |
 14 |         6 |      14 |
 15 |         6 |      15 |
// Not sure if useful info, but I will be using player_id in a join later on.
-- My current progress --
What I currently have (and tried) is to do multiple inserts by iterating each JSON object.
//The previous JSON response I accept:
{
  "quests": [
    {
      "player_id": 1,
      "task_id": 11
    },
    {
      "player_id": 1,
      "task_id": 12
    },
    {
      "player_id": 6,
      "task_id": 15
    }
  ]
}
// My current backend code
db.tx(async t => {
  const q1 // some queries
  ....
  const q3 = await t.none(
    `INSERT INTO quests (player_id, task_id)
     SELECT player_id, task_id
     FROM json_to_recordset($1::json) AS x(player_id int, task_id int)`,
    [JSON.stringify(quests)]
  );
  return t.batch([q1, q2, q3]);
}).then(data => {
  // Success
}).catch(error => {
  // Fail
});
It works, but I think it's not good to have such a long request body, which is why I'm wondering whether it's possible to iterate over the arrays inside each object instead.
If more information is needed, I'll edit this post again.
Thank you in advance!
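For what it's worth, the nested-array format can also be expanded entirely inside Postgres by unnesting each quest element and cross-joining its player ids with its task ids. A rough sketch under the assumption that the whole payload is passed as a single jsonb parameter (untested against the real schema, and the casts assume the ids are plain numbers):
-- Expand [{player_id: [...], task_id: [...]}, ...] into one row per (player, task) pair.
INSERT INTO quests (player_id, task_id)
SELECT p.player_id::int, t.task_id::int
FROM jsonb_array_elements($1::jsonb -> 'quests') AS q(elem)
CROSS JOIN LATERAL jsonb_array_elements_text(q.elem -> 'player_id') AS p(player_id)
CROSS JOIN LATERAL jsonb_array_elements_text(q.elem -> 'task_id')   AS t(task_id);
This keeps the request body short while still producing the full cross product per quest entry.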

How do I group by identifier and create an array of JSON objects in PostgreSQL?

I have a table of transactions where I need to group by customerId and create aggregated JSON records in an array.
customerId|transactionVal|day
1234| 2|2019-01-01
1234| 3|2019-01-04
14| 1|2019-01-01
What I'm looking to do is return something like this:
customerId|transactions
1234|[{'transactionVal': 2, 'day': '2019-01-01'}, {'transactionVal': 3, 'day': '2019-01-04'}]
14|[{'transactionVal': 1, 'day': '2019-01-01'}]
I will need to later iterate through each transaction in the array to calculate % changes in the transactionVal.
I searched for a while but could not seem to find something to handle this, as the table is quite large (> 70 mil rows).
Thanks!
Should be possible to use array_agg and json_build_object like so:
ok=# select customer_id,
array_agg(json_build_object('thing', thing, 'time', time))
from test group by customer_id;
customer_id | array_agg
-------------+---------------------------------------------------------------------------------------------------
2 | {"{\"thing\" : 1, \"time\" : 1}"}
1 | {"{\"thing\" : 1, \"time\" : 1}",
"{\"thing\" : 2, \"time\" : 2}",
"{\"thing\" : 3, \"time\" : 3}"}
(2 rows)
ok=# select * from test;
customer_id | thing | time
-------------+-------+------
1 | 1 | 1
2 | 1 | 1
1 | 2 | 2
1 | 3 | 3
(4 rows)
ok=#
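Applied to the column names from the question, the same idea with json_agg (rather than a Postgres array of json values) might look like the sketch below; the table name transactions is an assumption:
-- One row per customer with an ordered JSON array of that customer's transactions.
SELECT "customerId",
       json_agg(json_build_object('transactionVal', "transactionVal",
                                  'day', day)
                ORDER BY day) AS transactions
FROM transactions
GROUP BY "customerId";
json_agg returns a single JSON array per group, which is usually easier to consume downstream than a text[] of JSON strings.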