Postgres select all data via relation - postgresql

If parents have children, and children have books they've read, how do I know all the books read by all the children of a parent?
Basic setup:
CREATE TABLE parent(
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL
);
CREATE TABLE child(
id SERIAL PRIMARY KEY,
parent INTEGER REFERENCES parent(id) NOT NULL,
name VARCHAR(50) NOT NULL
);
CREATE TABLE book(
id SERIAL PRIMARY KEY,
readBy INTEGER REFERENCES child(id) NOT NULL,
name VARCHAR(50) NOT NULL
);
Insert some data:
INSERT INTO parent (name) VALUES
('Bob'),
('Mary');
INSERT INTO child (parent, name) VALUES
(1, 'Stu'), -- Bob has children Stu and Jane
(1, 'Jane'),
(2, 'Greg'), -- Mary has children Greg and Bella
(2, 'Bella');
INSERT INTO book (readBy, name) VALUES
(1, 'Crime & Punishment'), -- Stu has read C&P and Goodnight Moon
(1, 'Goodnight Moon'),
(2, 'The Singularity'), -- Jane has read The Singularity and 1fish2fish
(2, 'One Fish Two Fish'),
(3, 'Narnia'); -- Greg has read Narnia (Bella read nothing)
How do I formulate a SELECT query that takes "Bob" as a parameter and returns all the books read by his kids?
( 'Crime & Punishment', 'Goodnight Moon', 'The Singularity', 'One Fish Two Fish' )
The same query, except involving "Mary" should give only the one book read by "Greg", who is her only child who has read anything:
( 'Narnia' )
Thanks in advance for any help! :)
Disclaimer: I'm sure this question must have been asked before but I wasn't able to find it :(

You can chain joins
select book.name
from book
join child on book.readby=child.id
join parent on child.parent=parent.id
where parent.name='Bob';
Or, if you want the results as an array/list:
This is unusual; results as rows, like above, are usually more useful, but you appear to be asking for a single-line result.
select array_agg(book.name)
from book
join child on book.readby=child.id
join parent on child.parent=parent.id
where parent.name='Bob';
Note: the results could appear in any order.
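If the order of the array elements matters, one option is to put an ORDER BY inside the aggregate call. A minimal sketch, assuming the tables from the question; alphabetical order is just an arbitrary choice for illustration:

```sql
-- Unquoted identifiers are folded to lower case, so readBy and readby
-- name the same column.
SELECT array_agg(book.name ORDER BY book.name) AS books
FROM book
JOIN child ON book.readby = child.id
JOIN parent ON child.parent = parent.id
WHERE parent.name = 'Bob';
```

With 'Mary' in place of 'Bob', the same query should return a one-element array containing Narnia.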

Related

postgres constraint based on SELECT condition

Hello! I think I have a somewhat tricky postgres situation:
Parents have children. Children have an age, and a flag marking them as appreciated.
The rule: a parent can't appreciate two children of the same age!
My question is: how to enforce this rule?
Current schema:
CREATE TABLE parent (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL
);
CREATE TABLE child (
id SERIAL PRIMARY KEY,
parent INTEGER REFERENCES parent(id) NOT NULL,
name VARCHAR(50) NOT NULL,
age INTEGER NOT NULL,
appreciated BOOLEAN NOT NULL
);
Put some values in:
INSERT INTO parent(name) VALUES
('bob'), -- SERIAL starts at 1, so bob's id = 1
('mary'); -- and mary's id = 2
INSERT INTO child(parent, name, age, appreciated) VALUES
(1, 'child1', 10, FALSE), -- Bob has children 1, 2, 3
(1, 'child2', 10, FALSE),
(1, 'child3', 15, FALSE),
(2, 'child4', 20, FALSE), -- Mary has children 4, 5, 6
(2, 'child5', 20, FALSE),
(2, 'child6', 10, FALSE);
All fine so far. No child is appreciated, which is always valid.
Mary is allowed to appreciate child6:
UPDATE child SET appreciated=TRUE WHERE name='child6';
Bob is allowed to appreciate child2. child2 is the same age as child6 (who is already appreciated), but child6 is not Bob's child.
UPDATE child SET appreciated=TRUE WHERE name='child2';
Bob now cannot appreciate child1. This child1 is the same age as child2, and child2 is already appreciated.
UPDATE child SET appreciated=TRUE WHERE name='child1'; -- This needs to FAIL!
How do I enforce such a constraint? I'm open to all kinds of solutions, but modifying the general schema is not an option.
Thanks in advance!
How about a UNIQUE partial index, like so:
CREATE UNIQUE INDEX ON child(parent,age) WHERE appreciated;
So every pair of parent,age has to be unique, but only when appreciated children are considered.
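A quick way to check this against the scenario in the question is sketched below. It assumes the sample rows are loaded with parent ids that actually exist, and the index name is a hypothetical one chosen for illustration:

```sql
-- Same partial unique index, with an explicit (hypothetical) name.
CREATE UNIQUE INDEX child_appreciated_parent_age
    ON child (parent, age)
    WHERE appreciated;

-- Mary appreciates child6: allowed.
UPDATE child SET appreciated = TRUE WHERE name = 'child6';

-- Bob appreciates child2: allowed, because child6 belongs to another parent.
UPDATE child SET appreciated = TRUE WHERE name = 'child2';

-- Bob appreciates child1: rejected with a unique violation, because
-- child2 (same parent, same age) is already appreciated.
UPDATE child SET appreciated = TRUE WHERE name = 'child1';
```

Because the rule is enforced by the index itself, it should also hold when several sessions update the table at the same time.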
You might want to use a trigger that activates BEFORE the insert/update and that fails if the constraint you create is not satisfied.
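I suppose in PostgreSQL it should be something like this (the function and trigger names are just examples):
create function check_appreciated() returns trigger as $$
begin
-- reject the row if this parent already appreciates another child of the same age
if new.appreciated and exists (
select 1
from child c
where c.appreciated
and c.parent = new.parent
and c.age = new.age
and c.id <> new.id
) then
raise exception 'Parent % already appreciates a child aged %', new.parent, new.age;
end if;
return new;
end;
$$ language plpgsql;
create trigger child_appreciated_check
before insert or update on child
for each row execute procedure check_appreciated();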
Some documentation
The simplest thing I would think to do is add a flag grateful(?) == false to the parent model and when child.appreciated == true { parent.grateful == true }
Check the value of parent.grateful in the function that acts on child.appreciated.
If parent.grateful == true
return "sorry this parent has already shown their appreciation."
LOL this is an interesting concept though. Good Luck. :)

Can the categories in the postgres tablefunc crosstab() function be integers?

It's all in the title. Documentation has something like this:
SELECT *
FROM crosstab('...') AS ct(row_name text, category_1 text, category_2 text);
I have two tables, lab_tests and lab_tests_results. All of the lab_tests_results rows are tied to the primary key id integer in the lab_tests table. I'm trying to make a pivot table where the lab tests (identified by an integer) are row headers and the respective results are in the table. I can't get around a syntax error at or around the integer.
Is this possible with the current set up? Am I missing something in the documentation? Or do I need to perform an inner join of sorts to make the categories strings? Or modify the lab_tests_results table to use a text identifier for the lab tests?
Thanks for the help, all. Much appreciated.
Edit: Got it figured out with the help of Dmitry. He had the data layout figured out, but I was unclear on what kind of output I needed. I was trying to get the pivot table to be based on batch_id numbers in the lab_tests_results table. Had to hammer out the base query and casting data types.
SELECT *
FROM crosstab('SELECT lab_tests_results.batch_id, lab_tests.test_name, lab_tests_results.test_result::FLOAT
FROM lab_tests_results, lab_tests
WHERE lab_tests.id=lab_tests_results.lab_test AND (lab_tests.test_name LIKE ''Test Name 1'' OR lab_tests.test_name LIKE ''Test Name 2'')
ORDER BY 1,2'
) AS final_result(batch_id VARCHAR, test_name_1 FLOAT, test_name_2 FLOAT);
This provides a pivot table from the lab_tests_results table like below:
batch_id |test_name_1 |test_name_2
---------------------------------------
batch1 | result1 | <null>
batch2 | result2 | result3
If I understand correctly your tables look something like this:
CREATE TABLE lab_tests (
id INTEGER PRIMARY KEY,
name VARCHAR(500)
);
CREATE TABLE lab_tests_results (
id INTEGER PRIMARY KEY,
lab_tests_id INTEGER REFERENCES lab_tests (id),
result TEXT
);
And your data looks something like this:
INSERT INTO lab_tests (id, name)
VALUES (1, 'test1'),
(2, 'test2');
INSERT INTO lab_tests_results (id, lab_tests_id, result)
VALUES (1,1,'result1'),
(2,1,'result2'),
(3,2,'result3'),
(4,2,'result4'),
(5,2,'result5');
First of all crosstab is part of tablefunc, you need to enable it:
CREATE EXTENSION tablefunc;
You need to run it once per database as per this answer.
The final query will look like this:
SELECT *
FROM crosstab(
'SELECT lt.name::TEXT, lt.id, ltr.result
FROM lab_tests AS lt
JOIN lab_tests_results ltr ON ltr.lab_tests_id = lt.id
ORDER BY 1, ltr.id' -- crosstab() expects rows with the same row name to be adjacent
) AS ct(test_name text, result_1 text, result_2 text, result_3 text);
Explanation:
The crosstab() function takes the text of a query that should return 3 columns: (1) the row name, which identifies the group, (2) the category, and (3) the value. The wrapping query selects everything crosstab() returns and defines the output columns in the list after AS: first the row name (test_name), then the values (result_1, result_2, result_3). With this column list I get up to 3 results per test. If a test has more than 3 results, the extra ones are not shown; if it has fewer than 3, the remaining columns are null.
The result for this query is:
test_name |result_1 |result_2 |result_3
---------------------------------------
test1 |result1 |result2 |<null>
test2 |result3 |result4 |result5
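On the title question itself: the category values can be integers, since only the output column names have to be valid identifiers. One plausible cause of a syntax error "at or around the integer" is an unquoted column name that starts with a digit. A sketch using the two-argument form of crosstab() and the example tables above; the output column names "1" and "2" are quoted, and pivoting one row per result id is only for illustration:

```sql
SELECT *
FROM crosstab(
    -- source query: row name, integer category, value
    'SELECT ltr.id, ltr.lab_tests_id, ltr.result
       FROM lab_tests_results AS ltr
      ORDER BY 1',
    -- category query: the integer ids that become output columns
    'SELECT id FROM lab_tests ORDER BY 1'
) AS ct(result_id integer, "1" text, "2" text);
```

Each result row should then carry its value in the column named after its lab test id, with null in the other column.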

CSV file data into a PostgreSQL table

I am trying to create a database for movielens (http://grouplens.org/datasets/movielens/). We've got movies and ratings. Movies have multiple genres. I split those out into a separate table since it's a 1:many relationship. There's a many:many relationship as well, users to movies. I need to be able to query this table multiple ways.
So I created:
CREATE TABLE genre (
genre_id serial NOT NULL,
genre_name char(20) DEFAULT NULL,
PRIMARY KEY (genre_id)
);
INSERT INTO genre VALUES
(1,'Action'),(2,'Adventure'),(3,'Animation'),(4,'Children''s'),(5,'Comedy'),(6,'Crime'),
(7,'Documentary'),(8,'Drama'),(9,'Fantasy'),(10,'Film-Noir'),(11,'Horror'),(12,'Musical'),
(13,'Mystery'),(14,'Romance'),(15,'Sci-Fi'),(16,'Thriller'),(17,'War'),(18,'Western');
CREATE TABLE movie (
movie_id int NOT NULL DEFAULT '0',
movie_name char(75) DEFAULT NULL,
movie_year smallint DEFAULT NULL,
PRIMARY KEY (movie_id)
);
CREATE TABLE moviegenre (
movie_id int NOT NULL DEFAULT '0',
genre_id smallint NOT NULL DEFAULT '0', -- PostgreSQL has no tinyint type
PRIMARY KEY (movie_id, genre_id)
);
I don't know how to import my movies.csv, which has the columns movie_id, movie_name and movie_genre. For example, the first row is (1;Toy Story (1995);Animation|Children's|Comedy)
If I INSERT manually, it should look like:
INSERT INTO moviegenre VALUES (1,3),(1,4),(1,5)
Because 3 is Animation, 4 is Children's and 5 is Comedy.
How can I import the whole data set this way?
You should first create a table that can ingest the data from the CSV file:
CREATE TABLE movies_csv (
movie_id integer,
movie_name varchar,
movie_genre varchar
);
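Loading the file into this staging table could look roughly like the sketch below. The file path is a placeholder, and the semicolon delimiter is an assumption based on the sample row in the question:

```sql
-- '/path/to/movies.csv' is a placeholder; with COPY the file must be
-- readable by the database server process.
COPY movies_csv (movie_id, movie_name, movie_genre)
FROM '/path/to/movies.csv'
WITH (FORMAT csv, DELIMITER ';');
```

If the file lives on the client machine instead, psql's \copy meta-command accepts the same options and reads the file locally.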
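Note that single quotes only need doubling (Children's becomes Children''s) inside SQL string literals, such as the genre names in your INSERT; data loaded from the file with COPY needs no such escaping. Once the data is in this staging table you can copy the data over to the movie table, which should have the following structure: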
CREATE TABLE movie (
movie_id integer, -- A primary key has implicit NOT NULL and should not have default
movie_name varchar NOT NULL, -- Movie should have a name, varchar more flexible
movie_year integer, -- Regular integer is more efficient
PRIMARY KEY (movie_id)
);
Sanitize your other tables likewise.
Now copy the data over, extracting the unadorned name and the year from the CSV name:
INSERT INTO movie (movie_id, movie_name, movie_year)
SELECT movie_id, parts[1], parts[2]::integer
FROM movies_csv, regexp_matches(movie_name, '([[:ascii:]]*)\s\(([\d]*)\)$') p(parts);
Here the regular expression says:
([[:ascii:]]*) - Capture all characters until the matches below
\s - Read past a space
\( - Read past an opening parenthesis
([\d]*) - Capture any digits
\) - Read past a closing parenthesis
$ - Match from the end of the string
So on input "Die Hard 17 (John lives forever) (2074)" it creates a string array with {'Die Hard 17 (John lives forever)', '2074'}. The scanning has to be from the end $, assuming all movie titles end with the year of publication in parentheses, in order to preserve parentheses and numbers in movie titles.
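To try the pattern on a single title before running the full INSERT, something like the following sketch can be used. Note that it swaps in `.` for `[[:ascii:]]` so that titles with non-ASCII characters would also match, and uses `\d+` to require at least one digit; both are my substitutions, not part of the query above:

```sql
-- Split one literal title into its name and year parts.
SELECT parts[1] AS title,
       parts[2]::integer AS year
FROM regexp_matches(
         'Die Hard 17 (John lives forever) (2074)',
         '^(.*)\s\((\d+)\)$'
     ) AS p(parts);
```

This should return a single row with the title "Die Hard 17 (John lives forever)" and the year 2074. A title that does not end in a parenthesised number produces no row at all, so such movies would be silently skipped by the INSERT.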
Now you can work on the movie genres. You have to split the string on the bar | using the regexp_split_to_table() function and then join to the genre table on the genre name:
INSERT INTO moviegenre (movie_id, genre_id)
SELECT m.movie_id, g.genre_id
FROM movies_csv m
CROSS JOIN LATERAL regexp_split_to_table(m.movie_genre, '\|') AS p(genre) -- escape the |
JOIN genre g ON g.genre_name = p.genre;
After all is done and dusted you can drop the movies_csv table.

Efficient approach for language tags (small tag set) in rdbms queries

My app uses an RDBMS (postgres w/ activerecord) to store and fetch text objects. Each text object can have any number of languages associated with it. So far I've been thinking of these language associations as tags in my head, much like blog posts can have any number of arbitrary tags. However, these language tags are not arbitrary, and are instead limited to a small set of about 30. In my app a user can request some text objects and supply a set of languages (say English, German, and French) and the app should go ahead and grab some text objects associated with ANY of those languages.
What's the most effective approach/schema for associating languages with these text objects to make querying easy?
To make it easy to query you can create a view to avoid the constant joining.
create table object (
id serial unique,
object text primary key
);
create table tag (
id serial unique,
tag text primary key
);
create table object_tag (
object_id integer references object(id),
tag_id integer references tag(id)
);
insert into tag (tag) values ('English'), ('French'), ('German');
insert into object (object) values ('o1'), ('o2');
insert into object_tag (object_id, tag_id) values (1, 1), (1, 2), (2, 3);
create view v_object_tag as
select o.id object_id, o.object, t.id tag_id, t.tag
from
object o
inner join
object_tag ot on o.id = ot.object_id
inner join
tag t on t.id = ot.tag_id
;
Now query as if it were a single table:
select *
from v_object_tag
where tag in ('English', 'German')
;
object_id | object | tag_id | tag
-----------+--------+--------+---------
1 | o1 | 1 | English
2 | o2 | 3 | German
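The view returns one row per matching tag, so an object tagged with several of the requested languages shows up more than once. If the app only needs each text object once, a sketch along these lines might do, using the view and sample data above:

```sql
-- One row per object, even when several requested languages match.
SELECT DISTINCT object_id, object
FROM v_object_tag
WHERE tag = ANY (ARRAY['English', 'French'])
ORDER BY object_id;
```

With the sample data, o1 carries both English and French but should appear only once here. Passing the languages as an array also makes it easy to bind the whole set as a single query parameter.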

DB2 loses track of table after getting 3 levels deep into a subquery

With this standard authors/books setup:
CREATE TABLE authors (
id int NOT NULL,
name varchar(255) NOT NULL
)
CREATE TABLE books (
id int NOT NULL,
name varchar(255) NOT NULL,
author_id int NOT NULL,
sold int NOT NULL
)
INSERT INTO authors VALUES (1, 'author 1')
INSERT INTO authors VALUES (2, 'author 2')
INSERT INTO books VALUES (1, 'book 1', 1, 10)
INSERT INTO books VALUES (2, 'book 2', 1, 5)
INSERT INTO books VALUES (3, 'book 3', 2, 7)
this query somehow doesn't work:
SELECT
(
SELECT
count(*)
FROM
(
SELECT
books.name
FROM
books
WHERE
books.author_id = authors.id
AND books.sold > (
SELECT
avg(sold)
FROM
books
WHERE
books.author_id <> authors.id
)
) AS t
) AS good_selling_books
FROM
authors
WHERE
authors.id = 1
The error message is:
SQL0204N "AUTHORS.ID" is an undefined name. SQLSTATE=42704
It looks like DB2 loses track of the outermost table after getting 3 levels deep into a subquery?
NOTE: This is just a fabricated query so it may not make much sense (and can be easily rewritten to only have 2 levels nesting which works fine). I just want to confirm if DB2 indeed has such a glaring shortcoming.
Just found the solution which is rather simple. DB2 has this LATERAL keyword which is needed for such query to work, e.g.
SELECT
(
SELECT
count(*)
FROM
LATERAL( -- this is the only change
SELECT
books.name
FROM
books
WHERE
books.author_id = authors.id
AND books.sold > (
SELECT
avg(sold)
FROM
books
WHERE
books.author_id <> authors.id
)
) AS t
) AS good_selling_books
FROM
authors
WHERE
authors.id = 1
The solution came from this blog post https://www.ibm.com/developerworks/mydeveloperworks/blogs/SQLTips4DB2LUW/entry/scoping_rules_in_db2125?lang=en, where the author also noticed the same shortcoming in DB2:
But DB2 also didn't jump two levels up to S.c1. I suppose it could but, alas, it does not.
The problem is the innermost query.
You can't just compare to authors.id without having authors in the FROM clause.
This also fails in MySQL with the exact same error:
ERROR 1054 (42S22): Unknown column 'authors.id' in 'where clause'
I would suspect that the query is indeed incorrect. I believe that you need a reference to the authors table in the FROM clause on the innermost query. I've been doing a lot of NOSQL stuff lately so my SQL query skills are a little dusty, but I'm pretty sure that an inner query cannot reach out to other tables.
Rewrite the query using joins instead of nested queries if you can. Nested queries tend to optimize poorly anyway (unsubstantiated in DB2, but true in MySQL 5.1 at least).
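For reference, the question notes that the query can be rewritten with only two levels of nesting. A sketch of one such rewrite follows; it drops the derived table (the part that needed LATERAL) and counts directly in a scalar subquery. The alias `other` is introduced here for readability and is not in the original:

```sql
SELECT
    (
        SELECT count(*)
        FROM books
        WHERE books.author_id = authors.id
          -- compare against the average sales of all other authors' books
          AND books.sold > (
              SELECT avg(other.sold)
              FROM books other
              WHERE other.author_id <> authors.id
          )
    ) AS good_selling_books
FROM authors
WHERE authors.id = 1
```

With the sample data this should count one book for author 1: book 1 sold 10 copies, against an average of 7 for the other author's books.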