PostgreSQL: a variation of rows to columns

PostgreSQL version 9.3.
Best explained with an example.
I have 2 tables.
Books table:
book_id name
1 Aragorn
2 Harry Potter
3 The Great Gatsby
4 Book name, with a comma
User IDs to book IDs table:
user_id book_id
31 1
31 2
32 3
34 1
34 4
I would like to show each user his/her books, something like this:
user_id book_names
31 Aragorn,Harry Potter
32 The Great Gatsby
34 Aragorn,Book name, with a comma
Basically, each user gets his/her books separated by commas.
How can I achieve this in an efficient way?

If you are using Postgres version 8.4 or later, then you have array_agg() at your disposal. One option is to join the user books table to the books table, group by user_id, and wrap array_agg() in array_to_string() to generate the comma-separated list of books for each user.
SELECT t1.user_id,
array_to_string(array_agg(t2.name), ',') AS book_names
FROM user_books t1
INNER JOIN books t2
ON t1.book_id = t2.book_id
GROUP BY t1.user_id
In Postgres 9.0 and above, you could use the following to aggregate book names into a CSV list:
string_agg(t2.name, ',' order by t2.name)
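As a minimal sketch, here is the string_agg() variant written out as a full query against the question's sample data. The table names books and user_books follow the answer above; the temporary tables and the aliases ub and b are assumptions made only to keep the example self-contained.

```sql
-- Sample data from the question, loaded into temporary tables.
CREATE TEMP TABLE books (book_id integer PRIMARY KEY, name text);
CREATE TEMP TABLE user_books (user_id integer, book_id integer);

INSERT INTO books VALUES
    (1, 'Aragorn'),
    (2, 'Harry Potter'),
    (3, 'The Great Gatsby'),
    (4, 'Book name, with a comma');

INSERT INTO user_books VALUES
    (31, 1), (31, 2), (32, 3), (34, 1), (34, 4);

-- One row per user; book names are sorted and joined with commas.
SELECT ub.user_id,
       string_agg(b.name, ',' ORDER BY b.name) AS book_names
FROM user_books ub
JOIN books b ON b.book_id = ub.book_id
GROUP BY ub.user_id
ORDER BY ub.user_id;
```

Because one of the book names contains a comma itself, a different delimiter (for example ' | ') may be easier to split again downstream.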

Related

PostgreSQL how do I COUNT with a condition?

Can someone please assist with a query I am working on for school, using a sample database from a PostgreSQL tutorial? Here is my query that gets the raw data, which I can export to Excel and put in a pivot table to get the needed counts. The goal is a query that does the counting itself, so I don't have to do the manual export to Excel and the subsequent pivot table:
SELECT
i.film_id,
r.rental_id
FROM
rental as r
INNER JOIN inventory as i ON i.inventory_id = r.inventory_id
ORDER BY film_id, rental_id
;
From the database this gives me a list of films (by film_id) showing each time the film was rented (by rental_id). That query works fine if I just export to Excel. Since we don't want that manual process, I need the query itself to count how many times a given film (by film_id) was rented. The results should look something like this (just showing the first five here; the query need not limit itself to five):
film_id | COUNT of rental_id
1 | 23
2 | 7
3 | 12
4 | 23
5 | 12
Database setup instructions can be found here: LINK
I have tried using COUNTIF and CASE (following other posts here) and I can't get either to work, please help.
Did you try this?
SELECT
i.film_id,
COUNT(1)
FROM
rental as r
INNER JOIN inventory as i ON i.inventory_id = r.inventory_id
GROUP BY i.film_id
ORDER BY film_id;
If the join could return the same rental_id more than once for a film, you may want to use COUNT(DISTINCT r.rental_id) instead.

Best usage of indexes and primary key on joined and filtered data in PostgreSQL

I have 2 tables with the exact same number of rows and the same non-repeated id. Because the data comes from 2 sources I want to keep it 2 tables and not combine it. I assume the best approach would be to leave the unique id as the primary key and join on it?
SELECT * FROM tableA INNER JOIN tableB ON tableA.pk = tableB.pk
The data is used by an application that forces the user to select 1 or many values from 5 drop-downs in cascading order:
select 1 or many values from tableA column1.
select 1 or many values from tableA column2 but filtered from the first filter.
select 1 or many values from tableA column3 but filtered from the second filter which in turn is filtered from the first filter.
For example:
pk | Column 1 | Column 2 | Column 3
123 | Doe | Jane | 2022-01
234 | Doe | Jane | 2021-12
345 | Doe | John | 2022-03
456 | Jones | Mary | 2022-04
Selecting "Doe" from column1 would limit the second filter to ("Jane","John"). And selecting "Jane" from column2 would filter column3 to ("2022-01","2021-12")
And the last part of the question:
The application has 3 selection options for column3:
picking the exact value (for example "2022-01"), picking the year ("2022"), or picking the quarter that the month falls into ("Q1", which equates to "01","02","03").
What would be the best usage of indexes AND/OR additional columns for this scenario?
Volume of data would be 20-100 million rows.
Each filter is in the range of 5-25 distinct values.
Which version of Postgres do you operate?
The volume you state is rather daunting for populating drop-down boxes from live data in a PG database.
That said, it is possible: Kibana/Elastic, for instance, even has a filter widget that works exactly this way.
My guess is you may want to store the distinct combinations of the search columns in another table, simply to speed up populating the drop-downs. You can maintain it with triggers on the two main tables. So instead of additional columns/indexes you may end up with an additional table ;)
Regarding indexing strategy, and given the hints you stated (AND/OR), I'd say there's no silver bullet. Index the columns that will be queried most often.
Index each column individually, because Postgres can combine multiple indexes through bitmap scans (available since 8.1) to answer conjunctive/disjunctive conditions in WHERE clauses.
Hope this helps
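A rough sketch of the two ideas above, individual indexes plus a small table of distinct filter combinations. Everything here is an assumption for illustration: the table name tablea, the column names, the index names, the filter_combos table, and storing column3 as 'YYYY-MM' text. The trigger-based maintenance mentioned above is not shown.

```sql
-- Hypothetical stand-in for tableA, using the sample rows from the question.
CREATE TEMP TABLE tablea (
    pk      integer PRIMARY KEY,
    column1 text,
    column2 text,
    column3 text              -- assumed to hold 'YYYY-MM'
);

INSERT INTO tablea VALUES
    (123, 'Doe',   'Jane', '2022-01'),
    (234, 'Doe',   'Jane', '2021-12'),
    (345, 'Doe',   'John', '2022-03'),
    (456, 'Jones', 'Mary', '2022-04');

-- One index per filter column; the planner may combine them with bitmap scans.
CREATE INDEX tablea_column1_idx ON tablea (column1);
CREATE INDEX tablea_column2_idx ON tablea (column2);
CREATE INDEX tablea_column3_idx ON tablea (column3);

-- Small lookup table with the distinct combinations, plus derived year and quarter.
CREATE TEMP TABLE filter_combos AS
SELECT DISTINCT
       column1,
       column2,
       column3,
       left(column3, 4) AS yr,
       'Q' || (((substr(column3, 6, 2)::integer + 2) / 3)::text) AS qtr
FROM tablea;

-- Values for the second drop-down after the user picked "Doe" in the first.
SELECT DISTINCT column2
FROM filter_combos
WHERE column1 IN ('Doe')
ORDER BY column2;
```

With 5 to 25 distinct values per filter, the lookup table stays tiny compared to the 20 to 100 million source rows, so the drop-downs never have to scan the main tables.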

count(1) and distinct behaviour in postgres

Imagine a table:
name age
John 20
Sam 60
Dave 30
John 15
I want to check count of distinct names, I query the table like so:
SELECT COUNT(1), DISTINCT(name)
FROM table
GROUP BY 2
But I get:
ERROR: syntax error at or near "DISTINCT"
Position: 18
But when I use:
SELECT DISTINCT(name), COUNT(1)
FROM table
GROUP BY 1
I do get what's expected:
John 2
Sam 1
Dave 1
Is there a reason why the first query is not working or am I making a mistake somewhere?
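The DISTINCT here is not required: GROUP BY already returns one row per distinct value of the grouped columns. As for the error, DISTINCT is not a function but a keyword that must come directly after SELECT and applies to the whole select list, so the first query is a syntax error, while in the second query the parentheses around name are just an ordinary parenthesized expression.
So: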
SELECT COUNT(*), name
FROM table
GROUP BY name;
Will give you the result I think you want.
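For illustration, a small sketch contrasting the per-name count with a single count of distinct names. The table name people is an assumption (table is a reserved word, so the question's name is presumably a placeholder), as is the temporary table itself.

```sql
-- Sample data from the question.
CREATE TEMP TABLE people (name text, age integer);

INSERT INTO people VALUES
    ('John', 20), ('Sam', 60), ('Dave', 30), ('John', 15);

-- One row per name with the number of rows carrying that name.
SELECT name, COUNT(*) AS occurrences
FROM people
GROUP BY name;

-- A single number: how many different names exist in the table.
SELECT COUNT(DISTINCT name) AS distinct_names
FROM people;
```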

number of lessons for a teacher SQL

I have four tables, one is teacher and 3 different kinds of lessons.
My task is to find the teacher that has given the most lessons during a year.
input:
lesson1.id lesson1.date teacher.id
1 2020-12-01 1
2 2020-04-01 1
lesson2.id lesson2.date teacher.id
1 2020-10-01 2
2 2020-05-01 3
lesson3.id lesson3.date teacher.id
1 2020-02-01 1
2 2020-06-01 3
teacher.id teacher.name
1 john
2 scott
3 david
output:
teacher.id teacher.name lessons_given
1 john 3
I tried to join them together with a left join on teacher, but it's not working...
Hope you guys can help me out:)
Thanks
What you are attempting to build is a many-to-many (m:m) relationship between Teacher and Lesson. Instead, what you have is many one-to-many relationships. While that works for a small number of lessons (with some difficulty), think about the same requirement with 50 or 500 or more lessons. What you actually need is 3 tables:
create table lessons( lesson_id integer generated always as identity
, name text
, subject text -- for example
-- other lesson related attributes
);
create table teachers( teacher_id integer generated always as identity
, name text
-- other related teacher attributes
);
create table teacher_lessons( teacher_id integer
, lesson_id integer
, lesson_date date
);
Now you have a structure that can handle any number of teachers and/or lessons. The same tables can also serve other uses as they are, for example relating students to lessons. See fiddle for current issue.
You could union all the three lesson tables to get a "flat" list of lessons, and then join that on the teachers table:
SELECT t.id, t.name, COUNT(*) AS lessons_given
FROM teacher t
JOIN (SELECT teacher_id FROM lesson1 UNION ALL
SELECT teacher_id FROM lesson2 UNION ALL
SELECT teacher_id FROM lesson3) l ON t.id = l.teacher_id
GROUP BY t.id, t.name;
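To get from the per-teacher counts to the single busiest teacher in a given year, the UNION ALL approach could be extended roughly as follows. This is only a sketch: the column names teacher_id and lesson_date, the temporary tables, and the choice of 2020 as the year are assumptions based on the question's sample data.

```sql
-- Sample data from the question.
CREATE TEMP TABLE teacher (id integer PRIMARY KEY, name text);
CREATE TEMP TABLE lesson1 (id integer, lesson_date date, teacher_id integer);
CREATE TEMP TABLE lesson2 (id integer, lesson_date date, teacher_id integer);
CREATE TEMP TABLE lesson3 (id integer, lesson_date date, teacher_id integer);

INSERT INTO teacher VALUES (1, 'john'), (2, 'scott'), (3, 'david');
INSERT INTO lesson1 VALUES (1, '2020-12-01', 1), (2, '2020-04-01', 1);
INSERT INTO lesson2 VALUES (1, '2020-10-01', 2), (2, '2020-05-01', 3);
INSERT INTO lesson3 VALUES (1, '2020-02-01', 1), (2, '2020-06-01', 3);

-- Flatten the three lesson tables, keep one year, count per teacher, take the top row.
SELECT t.id, t.name, COUNT(*) AS lessons_given
FROM teacher t
JOIN (SELECT teacher_id, lesson_date FROM lesson1
      UNION ALL
      SELECT teacher_id, lesson_date FROM lesson2
      UNION ALL
      SELECT teacher_id, lesson_date FROM lesson3) l ON l.teacher_id = t.id
WHERE l.lesson_date >= DATE '2020-01-01'
  AND l.lesson_date <  DATE '2021-01-01'
GROUP BY t.id, t.name
ORDER BY lessons_given DESC
LIMIT 1;
```

On the sample data this returns john with 3 lessons. LIMIT 1 picks an arbitrary teacher if several are tied for first place.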
The obvious way to do it is to use Belayer's solution.
However, if and only if you cannot put all the data in a single table for some reason (for example, if lesson1, lesson2 and lesson3 all have specific attributes), then another solution would be to use table inheritance.
For instance :
CREATE TABLE lesson (
id INT,
date TIMESTAMP,
teacher INT
);
ALTER TABLE lesson1 INHERIT lesson;
ALTER TABLE lesson2 INHERIT lesson;
ALTER TABLE lesson3 INHERIT lesson;
Now, in order to count the number of lessons each teacher is involved in, you can just use the lesson table:
SELECT teacher.id, teacher.name, COUNT(lesson.id)
FROM teacher
LEFT JOIN lesson ON lesson.teacher = teacher.id
GROUP BY teacher.id, teacher.name
ORDER BY COUNT(lesson.id) DESC
FETCH FIRST ROW WITH TIES;
Note that WITH TIES requires PostgreSQL 13 or later. You can replace the last line with LIMIT 1 if you are only interested in getting one of the most active teachers, but then your result is no longer deterministic when several teachers are tied.
Again, please do not use inheritance if there is no need to.

SQL: How to prevent double summing

I'm not exactly sure what the term is for this but, when you have a many-to-many relationship when joining 2 tables and you want to sum up one of the variables, I believe that you can sum the same values over and over again.
What I want to accomplish is to prevent this from happening. How do I make sure that my sum function is returning the correct number?
I'm using PostgreSQL
Example:
Table 1                 Table 2
SampleID  DummyName     SampleID  DummyItem
1         John          1         5
1         John          1         4
2         Doe           1         5
3         Jake          2         3
3         Jake          2         3
                        3         2
If I join these two tables ON SampleID, and I want to sum the DummyItem for each DummyName, how can I do this without double summing?
The solution is to first aggregate and then do the join:
select t1.sampleid, t1.dummyname, t.total_items
from table_1 t1
join (
select t2.sampleid, sum(dummyitem) as total_items
from table_2 t2
group by t2.sampleid
) t ON t.sampleid = t1.sampleid;
The real question, however, is: why are there duplicates in table_1?
I would take a step back and try to assess the database design. Specifically, what rules allow such duplicate data?
To address your specific issue given your data, here's one option: create a temp table that contains unique rows from Table 1, then join the temp table with Table 2 to get the sums I think you are expecting.
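The temp-table option could look roughly like this. The names table_1, table_2, sampleid, dummyname and dummyitem follow the earlier answer; table_1_unique and the temporary sample tables are assumptions added to make the sketch self-contained.

```sql
-- Sample data from the question.
CREATE TEMP TABLE table_1 (sampleid integer, dummyname text);
CREATE TEMP TABLE table_2 (sampleid integer, dummyitem integer);

INSERT INTO table_1 VALUES
    (1, 'John'), (1, 'John'), (2, 'Doe'), (3, 'Jake'), (3, 'Jake');
INSERT INTO table_2 VALUES
    (1, 5), (1, 4), (1, 5), (2, 3), (2, 3), (3, 2);

-- Step 1: collapse the duplicate rows of table_1.
CREATE TEMP TABLE table_1_unique AS
SELECT DISTINCT sampleid, dummyname
FROM table_1;

-- Step 2: join the de-duplicated rows to table_2, so each item is summed once.
SELECT u.sampleid, u.dummyname, SUM(t2.dummyitem) AS total_items
FROM table_1_unique u
JOIN table_2 t2 ON t2.sampleid = u.sampleid
GROUP BY u.sampleid, u.dummyname
ORDER BY u.sampleid;
```

On the sample data this gives 14 for John, 6 for Doe and 2 for Jake, whereas joining the raw table_1 directly would count John's and Jake's items twice.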