Finding and creating missing rows in table - postgresql

Hello postgres experts,
I have an app where users can vote on a poll. My schema looks like this:
polls table:
id | name
1 | Favorite fruit
options table:
id | poll_id | content
1 | 1 | apple
2 | 1 | orange
3 | 1 | grape
4 | 1 | banana
participants table:
id | poll_id | name
1 | 1 | John
2 | 1 | Jane
votes table:
id | poll_id | participant_id | option_id | type
1 | 1 | 1 | 1 | yes
2 | 1 | 1 | 3 | yes
3 | 1 | 2 | 2 | yes
I made the poor choice of not creating rows for "no" votes in the votes table, thinking it would save space. I now realize that was a mistake: in the future I would like to know whether a user explicitly voted "no" or whether the option was added after they voted, so they never had the chance to choose it. So I need a query that fills in all the missing "no" votes in the votes table for existing participants. The final result should look like this:
votes table:
id | poll_id | participant_id | option_id | type
1 | 1 | 1 | 1 | yes
2 | 1 | 1 | 3 | yes
3 | 1 | 2 | 2 | yes
4 | 1 | 1 | 2 | no
5 | 1 | 1 | 4 | no
6 | 1 | 2 | 1 | no
7 | 1 | 2 | 3 | no
8 | 1 | 2 | 4 | no
I have a dbfiddle with all the data already in it:
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=7d0f4c83095638cc6006b1d7876d0e01
Side question: Should I be concerned about the size of the votes table in this schema? I expect it to quickly blow up to millions of rows. Is a schema where options are stored as an array in the polls table and votes stored in the participants table a better idea?
Thank you for your help.

You seem to be looking for a JOIN of participants with options, EXCEPT the rows that already are in votes. There are various ways to do that, but most straightforward:
INSERT INTO votes(poll_id, participant_id, option_id, type)
SELECT poll_id, participant_id, option_id, 'no'
FROM (
  -- the aliases give the EXCEPT result the column names the outer SELECT refers to
  SELECT o.poll_id, p.id AS participant_id, o.id AS option_id
  FROM options o
  JOIN participants p ON o.poll_id = p.poll_id
  EXCEPT
  SELECT poll_id, participant_id, option_id
  FROM votes
) AS missing;
Alternatively:
INSERT INTO votes(poll_id, participant_id, option_id, type)
SELECT o.poll_id, p.id, o.id, 'no'
FROM options o
JOIN participants p ON o.poll_id = p.poll_id
WHERE NOT EXISTS (
SELECT *
FROM votes
WHERE poll_id = o.poll_id AND participant_id = p.id AND option_id = o.id
);
Or, assuming you already have a UNIQUE constraint on votes (named votes_p_key here) that covers these columns, just
INSERT INTO votes(poll_id, participant_id, option_id, type)
SELECT o.poll_id, p.id, o.id, 'no'
FROM options o
JOIN participants p ON o.poll_id = p.poll_id
ON CONFLICT ON CONSTRAINT votes_p_key
DO NOTHING;
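To try the NOT EXISTS variant without touching a real database, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for PostgreSQL here, and the column types are assumptions, since the question does not show the DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE options (id INTEGER PRIMARY KEY, poll_id INTEGER, content TEXT);
    CREATE TABLE participants (id INTEGER PRIMARY KEY, poll_id INTEGER, name TEXT);
    CREATE TABLE votes (
        id INTEGER PRIMARY KEY,  -- auto-assigned when omitted from an INSERT
        poll_id INTEGER, participant_id INTEGER, option_id INTEGER, type TEXT
    );
    INSERT INTO options VALUES (1, 1, 'apple'), (2, 1, 'orange'), (3, 1, 'grape'), (4, 1, 'banana');
    INSERT INTO participants VALUES (1, 1, 'John'), (2, 1, 'Jane');
    INSERT INTO votes VALUES (1, 1, 1, 1, 'yes'), (2, 1, 1, 3, 'yes'), (3, 1, 2, 2, 'yes');
""")

# Every (participant, option) pair of the same poll that has no vote yet gets a 'no' row.
conn.execute("""
    INSERT INTO votes (poll_id, participant_id, option_id, type)
    SELECT o.poll_id, p.id, o.id, 'no'
    FROM options o
    JOIN participants p ON o.poll_id = p.poll_id
    WHERE NOT EXISTS (
        SELECT 1
        FROM votes v
        WHERE v.poll_id = o.poll_id AND v.participant_id = p.id AND v.option_id = o.id
    )
""")

no_votes = conn.execute(
    "SELECT participant_id, option_id FROM votes WHERE type = 'no' "
    "ORDER BY participant_id, option_id"
).fetchall()
print(no_votes)
```

With the sample data this adds five "no" rows (options 2 and 4 for John, options 1, 3 and 4 for Jane), which matches the desired votes table in the question. Running the INSERT a second time adds nothing, because every pair then has a row.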

Related

PostgreSQL Query not returning the proper results

So this is my table structure
learning_paths: id, name, version, created_at, updated_at
learning_path_levels: id, name, learning_path_id, order, created_at, updated_at
learning_path_level_nodes: id, name, description, documentation_links, evaluation_methodology, learning_path_level_id, created_at, updated_at
learning_path_node_users: id, learning_path_level_node_id, user_id, evaluated_by, evaluated_at, is_successful, created_at, updated_at
I'm writing a query to retrieve the learning path name, the number of levels each learning path has, the pending and completed nodes per level for the user, and the total number of nodes per level.
I have the following query
select learning_paths."name",
sum(case when learning_path_node_users.is_successful and learning_path_node_users.user_id is not null then 1 else 0 end) as completed_nodes,
sum(case when learning_path_node_users.is_successful = false or learning_path_node_users.user_id is null then 1 else 0 end) as pending_nodes,
count(learning_path_levels.id) as total_levels,
count(*) as total_nodes
from learning_path_level_nodes
inner join learning_path_levels on learning_path_levels.id = learning_path_level_nodes.learning_path_level_id
inner join learning_paths on learning_paths.id = learning_path_levels.learning_path_id
left join learning_path_node_users on learning_path_node_users.learning_path_level_node_id = learning_path_level_nodes.id
group by learning_paths."name"
which returns:
name | completed_nodes | pending_nodes | total_levels | total_nodes
Devops | 5 | 3 | 8 | 8
QA | 0 | 1 | 1 | 1
Project manager | 3 | 3 | 6 | 6
AI | 0 | 5 | 5 | 5
Everything is correct except for the levels count.
For example, for Devops it should be 2, and it is returning 8.
For Project manager it should be 2, and it is returning 6.
The pattern I see is that it returns the number of nodes as the number of levels.
How can I fix this?
I'd really appreciate any help or suggestions, as I've been struggling with this.
Thanks in advance
EDIT: As per your suggestion, I'm attaching a fiddle with the tables and data.
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=f29676ff7051686a28de96928db1e3a6
While I don't get the exact results you want, I think you want to add a distinct to your count for the total levels:
select
lp.name,
sum(case when u.is_successful and u.user_id is not null then 1 else 0 end) as completed_nodes,
sum(case when u.is_successful = false or u.user_id is null then 1 else 0 end) as pending_nodes,
count(distinct lpl.id) as total_levels, -- added "distinct"
array_agg (lpl.id) as level_detail, -- debugging aid
count(*) as total_nodes
from
learning_path_level_nodes n
join learning_path_levels lpl on lpl.id = n.learning_path_level_id
join learning_paths lp on lp.id = lpl.learning_path_id
left join learning_path_node_users u on u.learning_path_level_node_id = n.id
group by
lp.name
To help expose the rationale, I added the field level_detail to show why the results are what they are. You can remove it once the results are what you want.
If it's not what you expect, perhaps you can explain or give by example what I might be missing.
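The effect of the join on the count can be seen in isolation. The sketch below uses two hypothetical, simplified tables (levels and nodes, not the asker's real schema) in an in-memory SQLite database as a stand-in for PostgreSQL: after the join there is one row per node, so count(l.id) counts nodes, while count(DISTINCT l.id) counts levels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical, simplified tables: one path with 2 levels and 5 nodes
    CREATE TABLE levels (id INTEGER PRIMARY KEY, path_name TEXT);
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, level_id INTEGER);
    INSERT INTO levels VALUES (1, 'Devops'), (2, 'Devops');
    INSERT INTO nodes VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
""")

row = conn.execute("""
    SELECT l.path_name,
           count(l.id)          AS levels_per_joined_row,  -- one per joined row
           count(DISTINCT l.id) AS distinct_levels,        -- one per level
           count(*)             AS total_nodes
    FROM nodes n
    JOIN levels l ON l.id = n.level_id
    GROUP BY l.path_name
""").fetchone()
print(row)
```

Here the plain count comes out as 5 (the number of nodes) and the distinct count as 2, the same pattern as the 8 versus 2 for Devops in the question.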

How to include and exclude ids in one query postgresql

I use PostgreSQL 13.3
I'm trying to work out how to apply include and exclude filters in the same query.
I have include_system_ids [1,5] and exclude_system_ids [3].
There's one big table, records.
system_records table:
record | system_id
1 | 1
1 | 5
1 | 3
2 | 1
2 | 5
If a record has an excluded identifier, it should not appear in the final selection. I have made several attempts, but I didn't get the result I need.
Expected result: record with id 2
Actual result: 1, 2
My attempt:
select r.id from records r
left join (select record_id from system_records
where system_id in (1,5)
) include_ids on r.id = include_ids.record_id
left join (select record_id from system_records
where system_id not in (3)
) exclude_ids on r.id = exclude_ids.record_id
Honestly, I don't understand how to do it.
Can anyone help me?
Maybe this query could be a solution:
with x as (select record,string_agg(system_id::varchar,',' order by system_id) as sys_id from records group by record)
select records.*
from records,x
where records.record = x.record
and x.sys_id = '1,5'
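Comparing an aggregated string only works when a record has exactly the listed ids. A more general alternative, sketched below, is an IN filter for the included ids plus NOT EXISTS for the excluded ones. It assumes that "included" means the record has at least one of the included ids, which the question does not state explicitly, and it runs on an in-memory SQLite database as a stand-in for PostgreSQL, with the table and column names taken from the question's system_records listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE system_records (record INTEGER, system_id INTEGER);
    INSERT INTO system_records VALUES (1, 1), (1, 5), (1, 3), (2, 1), (2, 5);
""")

rows = conn.execute("""
    SELECT DISTINCT sr.record
    FROM system_records sr
    WHERE sr.system_id IN (1, 5)            -- include_system_ids
      AND NOT EXISTS (                      -- drop records that have any excluded id
          SELECT 1
          FROM system_records ex
          WHERE ex.record = sr.record
            AND ex.system_id IN (3)         -- exclude_system_ids
      )
    ORDER BY sr.record
""").fetchall()
records = [r[0] for r in rows]
print(records)
```

For the sample rows this returns only record 2, because record 1 also has system_id 3.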

Limit for inner Join Table

I have a scenario where I am joining three tables and getting the results.
My problem is that I have to apply a limit to one of the joined tables.
Take the example below. I have three tables: 1) Books, 2) Customer and 3) Authors. Given a list of book ids, I need to find the books sold today with their author and customer names, but for each book I only need the last N customers, not all of them.
Books Customer Authors
--------------- ---------------------- -------------
Id Name AID Id BID Name Date AID Name
1 1 1 ABC 1 A1
2 2 1 CED 2 A2
3 3 2 DFG
How can we achieve this?
You are looking for LATERAL.
Sample:
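SELECT B.Id, C.Name
FROM Books B,
LATERAL (
  SELECT * FROM Customer Cu
  WHERE Cu.BID = B.Id
  ORDER BY Cu.Id DESC
  LIMIT N  -- N: how many of the latest customers to keep per book
) C
WHERE B.Id = ANY(ids)  -- ids: array of book ids passed in
AND C.Date = CURRENT_DATE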

Postgresql selecting with limit equal values?

I have a PostgreSQL table where I store stories from different sites.
The table has story_id and site_id fields,
where story_id is the primary key and site_id is the id of the site the story came from.
I need to SELECT the latest 30 added stories from this table.
But I don't want to get more than 2 stories coming from the same site...
So if I have something like this:
story_id | site_id
1 | 1
2 | 1
3 | 2
4 | 1
5 | 3
My results must be : story_ids = 1,2,3,5!
4 must be skipped because I have already picked 2 ids with site_id 1.
select story_id,
site_id
from (
select story_id,
site_id,
row_number() over (partition by site_id order by story_id desc) as rn
from the_table
) t
where rn <= 2
order by story_id desc
limit 30
If you want more or less than 2 entries "per group" you have to adjust the value in the outer where clause.
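To see what the query returns for the sample rows in the question, here is a minimal sketch on an in-memory SQLite database (a stand-in for PostgreSQL; it needs SQLite 3.25 or later for window functions). The pick helper and its direction argument are hypothetical, added only to compare the two sort orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE the_table (story_id INTEGER PRIMARY KEY, site_id INTEGER);
    INSERT INTO the_table VALUES (1, 1), (2, 1), (3, 2), (4, 1), (5, 3);
""")

QUERY = """
    SELECT story_id
    FROM (
        SELECT story_id,
               row_number() OVER (PARTITION BY site_id ORDER BY story_id {direction}) AS rn
        FROM the_table
    ) AS t
    WHERE rn <= 2
    ORDER BY story_id {direction}
    LIMIT 30
"""

def pick(direction):
    """Run the query with 'ASC' or 'DESC' ordering (fixed literals only)."""
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be 'ASC' or 'DESC'")
    return [r[0] for r in conn.execute(QUERY.format(direction=direction))]

newest_first = pick("DESC")
oldest_first = pick("ASC")
print(newest_first, oldest_first)
```

With descending order, as in the answer, site 1 keeps its two newest stories (4 and 2), so story 1 is the one that is skipped. The ids 1, 2, 3, 5 listed in the question come out only with ascending order, so which direction to use depends on whether "latest" or the example is the real requirement.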

How to determine whether a value exists in a junction table and return zero or one?

I am using SQL Server 2008 R2
I am trying to write a single query that will return only exactly what I need. I will drop in a MovieID and get back a list of ALL genres. If the movie represents a specific genre (has an associated record in the junction table), the Checked value will be 1. If not, then 0.
My result set should look like this:
GenreID Genre Checked
1 ABC 0
2 DEF 1
3 HIJ 0
4 KLM 1
My First table is named Genres. It looks like this:
GenreID Genre
1 ABC
2 DEF
3 HIJ
4 KLM
My second table is named Movies. It looks like this:
MovieID Title
1 Blah
2 Foo
3 Carpe
4 Diem
My third table is a junction table named Movies_Genres. It looks like this:
MovieID GenreID
1 2
1 1
1 4
2 1
2 3
3 4
4 1
I would normally do a couple of queries and a couple of loops to handle this, but I really want to make the database do the work here. How do I tweak my query so that I get the result set I need with just a single query?
Here's the starting query:
SELECT GenreID,
Genre
FROM Genres
Thanks in advance for your help!!!
SELECT g.GenreID, g.Genre, Checked = CASE WHEN EXISTS
(SELECT 1 FROM dbo.Movies_Genres AS mg
INNER JOIN dbo.Movies AS m
ON mg.MovieID = m.MovieID
WHERE mg.GenreID = g.GenreID
AND m.MovieID = @MovieID) THEN 1 ELSE 0 END
FROM dbo.Genres AS g
ORDER BY g.GenreID;
If there is a unique constraint or primary key on dbo.Movies_Genres(MovieID, GenreID) then this can be simply:
SELECT g.GenreID, g.Genre, Checked = COUNT(mg.GenreID)
FROM dbo.Genres AS g
LEFT OUTER JOIN dbo.Movies_Genres AS mg
ON g.GenreID = mg.GenreID
AND mg.MovieID = @MovieID
GROUP BY g.GenreID, g.Genre;
...since the count for any genre can only be 0 or 1 given a single @MovieID.
Pretty straightforward using CASE:
SELECT DISTINCT g.GenreID, g.Genre,
CASE WHEN mg.MovieID IS NULL THEN 0 ELSE 1 END Checked
FROM Genres g
LEFT JOIN Movies_Genres mg
ON g.GenreID=mg.GenreID
AND mg.MovieId=@MovieID;
Edit: If entries are guaranteed to be unique in Movies_Genres, you could choose to drop the DISTINCT.
@MovieID is the movie you want to filter by.
SELECT Genres.GenreID,
Genres.Genre,
CASE WHEN (Movies_Genres.GenreID IS NULL)
THEN 0
ELSE 1
END AS Checked
FROM Genres LEFT JOIN
Movies_Genres ON Movies_Genres.GenreID = Genres.GenreID AND
Movies_Genres.MovieID = @MovieID
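The LEFT JOIN plus CASE approach can be checked against the sample tables from the question. The sketch below is only an illustration: it runs on an in-memory SQLite database rather than SQL Server, passes the movie id as a named parameter (:movie_id) in place of the T-SQL variable, and assumes a primary key on (MovieID, GenreID) so that no DISTINCT is needed. The genres_for helper is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Genres (GenreID INTEGER PRIMARY KEY, Genre TEXT);
    CREATE TABLE Movies_Genres (
        MovieID INTEGER, GenreID INTEGER,
        PRIMARY KEY (MovieID, GenreID)  -- assumed: each pair appears at most once
    );
    INSERT INTO Genres VALUES (1, 'ABC'), (2, 'DEF'), (3, 'HIJ'), (4, 'KLM');
    INSERT INTO Movies_Genres VALUES (1, 2), (1, 1), (1, 4), (2, 1), (2, 3), (3, 4), (4, 1);
""")

def genres_for(movie_id):
    """Return every genre with Checked = 1 if the movie has it, else 0."""
    return conn.execute("""
        SELECT g.GenreID, g.Genre,
               CASE WHEN mg.MovieID IS NULL THEN 0 ELSE 1 END AS Checked
        FROM Genres g
        LEFT JOIN Movies_Genres mg
               ON mg.GenreID = g.GenreID
              AND mg.MovieID = :movie_id  -- filter in the ON clause keeps unmatched genres
        ORDER BY g.GenreID
    """, {"movie_id": movie_id}).fetchall()

movie_1 = genres_for(1)
movie_3 = genres_for(3)
print(movie_1)
print(movie_3)
```

The movie filter has to stay in the ON clause. Moving it to a WHERE clause would discard the genres the movie does not have and turn the query back into an inner join.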