SELECT DISTINCT on an ordered subquery's table - postgresql

I'm working on a problem, involving these two tables.
books
isbn | title | author
------------+-----------------------------------------+------------------
1840918626 | Hogwarts: A History | Bathilda Bagshot
3458400871 | Fantastic Beasts and Where to Find Them | Newt Scamander
9136884926 | Advanced Potion-Making | Libatius Borage
transactions
id | patron_id | isbn | checked_out_date | checked_in_date
----+-----------+------------+------------------+-----------------
1 | 1 | 1840918626 | 2012-05-05 | 2012-05-06
2 | 4 | 9136884926 | 2012-05-05 | 2012-05-06
3 | 2 | 3458400871 | 2012-05-05 | 2012-05-06
4 | 3 | 3458400871 | 2018-04-29 | 2018-05-02
5 | 2 | 9136884926 | 2018-05-03 | NULL
6 | 1 | 3458400871 | 2018-05-03 | 2018-05-05
7 | 5 | 3458400871 | 2018-05-05 | NULL
The task: "Make a list of all book titles and denote whether or not a copy of that book is checked out." So it's pretty much just the first table with a checked_out column.
I'm trying to SELECT DISTINCT on a subquery that orders the checked-out books first, but that doesn't work. I've researched this, and others say to use a GROUP BY clause instead of DISTINCT, but the examples they provide are one-column queries, and when more columns are added it doesn't work.
This is my closest attempt:
SELECT DISTINCT ON (title)
title, checked_out
FROM(
SELECT b.title, t.checked_in_date IS NULL AS checked_out
FROM transactions t
natural join books b
ORDER BY checked_out DESC
) t;

Or you can join only the transactions where the book has not been checked in yet:
SELECT b.title, t.isbn IS NOT NULL AS checked_out
, t.checked_out_date
FROM books b
LEFT JOIN transactions t ON t.isbn = b.isbn AND t.checked_in_date IS NULL
ORDER BY checked_out DESC

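I adjusted your attempt a little. The main change is how the tables are joined: a LEFT JOIN starting from books keeps titles that have no transactions at all. I also moved the ORDER BY to the outer query, because DISTINCT ON keeps the first row per title according to that query's own ORDER BY.
SELECT DISTINCT ON (title)
title, checked_out
FROM(
-- t.id IS NOT NULL guards against books without any transaction row,
-- which would otherwise look checked out because checked_in_date is NULL
SELECT b.title, (t.id IS NOT NULL AND t.checked_in_date IS NULL) AS checked_out
FROM books b
LEFT OUTER JOIN transactions t USING (isbn)
) t
ORDER BY title, checked_out DESC;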
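
A different technique that avoids DISTINCT ON altogether is an EXISTS subquery, which yields exactly one row per book. This is only a sketch against the books and transactions tables shown in the question, and it assumes that an open checkout is one whose checked_in_date is NULL:

```sql
-- One row per book; checked_out is true if at least one transaction
-- for that isbn has not been checked in yet.
SELECT b.title,
       EXISTS (
           SELECT 1
           FROM transactions t
           WHERE t.isbn = b.isbn
             AND t.checked_in_date IS NULL
       ) AS checked_out
FROM books b
ORDER BY checked_out DESC, b.title;
```

Because the subquery only tests for existence, a book with several open checkouts still produces a single row, and a book with no transactions at all comes out as false.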

How to query parent child in PostgreSQL?

I have the following table structure :
place_id | parent_place_id | name
---------|-----------------|------------
1 | 2 | child
2 | 3 | dad
3 | | grandfather
......
I am trying to write a query so that my output data is as follows :
id_Grandfather | name_Grandfather | id_Dad | name_Dad | id_Child | name_child
----------------|------------------|--------|----------|----------|-----------
3 | grandfather | 2 | dad | 1 | child
I have tried many approaches but have not got the expected result. Can anyone help me solve it? Thanks!
There is a way to do it with a double self-join. Whether it makes sense for your data is a different question.
SELECT
gf.place_id as id_Grandfather,
gf.name as name_Grandfather,
d.place_id as id_Dad,
d.name as name_Dad,
c.place_id as id_Child,
c.name as name_Child
FROM
your_table c
LEFT JOIN your_table d ON c.parent_place_id = d.place_id
LEFT JOIN your_table gf ON d.parent_place_id = gf.place_id
-- Add this if you want only rows that have both the Dad and Grandfather fields populated
WHERE d.place_id IS NOT NULL AND gf.place_id IS NOT NULL
;
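
If the hierarchy can be deeper than three levels, a recursive CTE is a different technique that walks the tree without one join per level. This is only a sketch: it assumes the table is called your_table as above, that the root row has parent_place_id NULL (the empty cell in the sample), and that the data contains no cycles. It returns one row per place with its depth and path, not the wide layout asked for in the question:

```sql
WITH RECURSIVE lineage AS (
    -- Anchor: start from the rows without a parent (the grandfather).
    SELECT place_id, parent_place_id, name,
           1 AS depth,
           ARRAY[name::text] AS path
    FROM your_table
    WHERE parent_place_id IS NULL
  UNION ALL
    -- Recursive step: attach each child to its already-visited parent.
    SELECT c.place_id, c.parent_place_id, c.name,
           l.depth + 1,
           l.path || c.name::text
    FROM your_table c
    JOIN lineage l ON c.parent_place_id = l.place_id
)
SELECT place_id, name, depth, path
FROM lineage
ORDER BY path;
```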

LEFT JOIN in Postgres when there is a WHERE clause [duplicate]

I've been using PostgreSQL almost daily for over 11 years now, and today I wrote what I thought was a very simple query with a LEFT JOIN that doesn't behave the way I expected. I'm lucky I caught the bug, but it has me concerned that there is something fundamental here that I am missing. The following reproduces the problem.
CREATE TEMP TABLE tbl_a(date date);
INSERT INTO tbl_a VALUES ('2022-01-01'), ('2022-01-02'), ('2022-01-03'), ('2022-01-04');
CREATE TEMP TABLE sale(date date, item_id int);
INSERT INTO sale VALUES ('2022-01-02', 2), ('2022-01-03', 2), ('2022-01-04', 3);
When I run the following query I get the results I expect with a LEFT JOIN
SELECT t.*, s.item_id FROM tbl_a AS t LEFT JOIN sale AS s ON t.date = s.date;
+------------+---------+
| date | item_id |
+------------+---------+
| 2022-01-01 | NULL |
| 2022-01-02 | 2 |
| 2022-01-03 | 2 |
| 2022-01-04 | 3 |
+------------+---------+
I get every record in tbl_a and since I have no sale records for 2022-01-01, I get a NULL.
However, when I add a WHERE to the query I get an unexpected result.
SELECT t.*, s.item_id FROM tbl_a AS t LEFT JOIN sale AS s ON t.date = s.date WHERE s.item_id = 2;
+------------+---------+
| date | item_id |
+------------+---------+
| 2022-01-02 | 2 |
| 2022-01-03 | 2 |
+------------+---------+
Note: there is no record for 2022-01-01 or 2022-01-04.
However, if I rewrite the query with a CTE, I get the results I expect.
WITH s AS (select * from sale WHERE item_id = 2) SELECT t.*, s.item_id FROM tbl_a AS t LEFT JOIN s ON t.date = s.date ORDER BY t.date;
+------------+---------+
| date | item_id |
+------------+---------+
| 2022-01-01 | NULL |
| 2022-01-02 | 2 |
| 2022-01-03 | 2 |
| 2022-01-04 | NULL |
+------------+---------+
My question is why do the above two queries yield different results.
Note:
SELECT version();
+-----------------------------------------------------------------------------------------------------------------------------------+
| version |
+-----------------------------------------------------------------------------------------------------------------------------------+
| PostgreSQL 13.7 (Ubuntu 13.7-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit |
+-----------------------------------------------------------------------------------------------------------------------------------+
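That's due to the logical order in which the clauses are evaluated.
In the first query, the two tables are joined first and the WHERE s.item_id = 2 filter is then applied to the joined result. The tbl_a row without a matching sale has s.item_id NULL, so the comparison is not true and the row is dropped; 2022-01-04 is dropped as well because its item_id is 3.
In the second query, sale is filtered first and tbl_a is then LEFT JOINed to the filtered result, so every tbl_a row survives.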
The equivalent of the 1st query would be something like:
WITH s AS
(select * from sale)
SELECT t.*, s.item_id
FROM tbl_a AS t
LEFT JOIN s ON t.date = s.date
WHERE s.item_id = 2
ORDER BY t.date;
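
To keep every row of tbl_a while restricting which sale rows can match, the filter can go into the ON clause instead of the WHERE clause. A minimal sketch against the temp tables defined in the question; it should return the same four rows as the CTE version there:

```sql
-- The item_id condition is part of the join condition, so it only limits
-- which sale rows can match; unmatched tbl_a rows are kept with NULL.
SELECT t.*, s.item_id
FROM tbl_a AS t
LEFT JOIN sale AS s
       ON t.date = s.date
      AND s.item_id = 2
ORDER BY t.date;
```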

Mysql- SELECT Column 'A' even with NULLS

Table A contains student names; tables B and C contain classes and student attendance.
I would like to display all students along with their attendance. The problem is that I cannot display students who have no attendance record.
Where attendance has been recorded it works, but if there is no attendance record for a given class, on a given day, in a given subject, nothing is displayed.
My query:
SELECT student.id_student, CONCAT(student.name,' ' ,student.surname) as 'name_surname',pres_student_present, pres_student_absent, pres_student_justified, pres_student_late, pres_student_rel, pres_student_course, pres_student_delegation, pres_student_note FROM student
LEFT JOIN class ON student.no_classes = class.no_classes
LEFT JOIN pres_student ON student.id_student = pres_student.id_student
WHERE (class.no_classes = '$class' OR NULL AND pres_student_data = '$data' AND pres_student_id_subject = $id_subject OR NULL)
GROUP BY student.surname
ORDER BY student.surname ASC
I want to display name_surname always and any other column should have NULL or 1
like:
Name | present | absent | just | late | rel | delegation | note |
Donald Trump | 1 | | | | | | |
Bush | | | | | | | |
Someone | 1 | | | | | | |
etc...
You should move the restrictions on the class and pres_student tables from the WHERE clause to the ON clause of each LEFT JOIN.
When you restrict a column of an outer-joined table in the WHERE clause, the rows where that table contributed only NULLs fail the condition and are filtered out, so the LEFT JOIN effectively behaves like an INNER JOIN.
SELECT student.id_student
, CONCAT(student.name, ' ', student.surname) AS 'name_surname'
, pres_student_present
, pres_student_absent
, pres_student_justified
, pres_student_late
, pres_student_rel
, pres_student_course
, pres_student_delegation
, pres_student_note
FROM student
LEFT JOIN class
ON student.no_classes = class.no_classes
AND class.no_classes = '$class'
LEFT JOIN pres_student
ON student.id_student = pres_student.id_student
AND pres_student_data = '$data'
AND pres_student_id_subject = $id_subject
GROUP BY student.surname
ORDER BY student.surname ASC
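
One caveat with putting the class condition in the ON clause: student is the leftmost table, so students from other classes are still returned. The sketch below filters students in WHERE and keeps only the attendance conditions in ON. It assumes the pres_student_* columns belong to the pres_student table, and the literal values are made up to stand in for the $class, $data and $id_subject placeholders:

```sql
SELECT s.id_student,
       CONCAT(s.name, ' ', s.surname) AS name_surname,
       p.pres_student_present,
       p.pres_student_absent,
       p.pres_student_late
FROM student AS s
LEFT JOIN pres_student AS p
       ON p.id_student = s.id_student
      AND p.pres_student_data = '2024-01-15'   -- hypothetical date
      AND p.pres_student_id_subject = 3        -- hypothetical subject id
WHERE s.no_classes = '1A'                      -- hypothetical class code
ORDER BY s.surname ASC;
```

The WHERE condition is safe here because it restricts the preserved (left) table, not the outer-joined one. In application code the three values would normally be passed as bound parameters, not interpolated into the SQL string.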

How to get back aggregate values across 2 dimensions using Python Cubes?

Situation
Using Python 3, Django 1.9, Cubes 1.1, and Postgres 9.5.
These are my data tables in text format:
Store table
------------------------------
| id | code | address |
|-----|------|---------------|
| 1 | S1 | Kings Row |
| 2 | S2 | Queens Street |
| 3 | S3 | Jacks Place |
| 4 | S4 | Diamonds Alley|
| 5 | S5 | Hearts Road |
------------------------------
Product table
------------------------------
| id | code | name |
|-----|------|---------------|
| 1 | P1 | Saucer 12 |
| 2 | P2 | Plate 15 |
| 3 | P3 | Saucer 13 |
| 4 | P4 | Saucer 14 |
| 5 | P5 | Plate 16 |
| and many more .... |
|1000 |P1000 | Bowl 25 |
|----------------------------|
Sales table
----------------------------------------
| id | product_id | store_id | amount |
|-----|------------|----------|--------|
| 1 | 1 | 1 |7.05 |
| 2 | 1 | 2 |9.00 |
| 3 | 2 | 3 |1.00 |
| 4 | 2 | 3 |1.00 |
| 5 | 2 | 5 |1.00 |
| and many more .... |
| 1000| 20 | 4 |1.00 |
|--------------------------------------|
The relationships are:
Sales belongs to Store
Sales belongs to Product
Store has many Sales
Product has many Sales
What I want to achieve
I want to use cubes to be able to do a display by pagination in the following manner:
Given the stores S1-S3:
-------------------------
| product | S1 | S2 | S3 |
|---------|----|----|----|
|Saucer 12|7.05|9 | 0 |
|Plate 15 |0 |0 | 2 |
| and many more .... |
|------------------------|
Note the following:
Even though there were no records in sales for Saucer 12 under Store S3, I displayed 0 instead of null or none.
I want to be able to do sort by store, say descending order for, S3.
The cells indicate the SUM total of that particular product spent in that particular store.
I also want to have pagination.
What I tried
This is the configuration I used:
"cubes": [
{
"name": "sales",
"dimensions": ["product", "store"],
"joins": [
{"master":"product_id", "detail":"product.id"},
{"master":"store_id", "detail":"store.id"}
]
}
],
"dimensions": [
{ "name": "product", "attributes": ["code", "name"] },
{ "name": "store", "attributes": ["code", "address"] }
]
This is the code I used:
result = browser.aggregate(drilldown=['Store','Product'],
order=[("Product.name","asc"), ("Store.name","desc"), ("total_products_sale", "desc")])
I didn't get what I want.
I got it like this:
----------------------------------------------
| product_id | store_id | total_products_sale |
|------------|----------|---------------------|
| 1 | 1 | 7.05 |
| 1 | 2 | 9 |
| 2 | 3 | 2.00 |
| and many more .... |
|---------------------------------------------|
That is the whole table in long form with no pagination, and a product that was not sold in a given store does not show up as zero.
My question
How do I get what I want?
Do I need to create another data table that aggregates everything by store and product before I use cubes to run the query?
Update
I have read more. I realised that what I want is called dicing as I needed to go across 2 dimensions. See: https://en.wikipedia.org/wiki/OLAP_cube#Operations
Cross-posted at Cubes GitHub issues to get more attention.
This is a pure SQL solution using crosstab() from the additional tablefunc module to pivot the aggregated data. It typically performs better than any client-side alternative. If you are not familiar with crosstab(), read this first:
PostgreSQL Crosstab Query
And this about the "extra" column in the crosstab() output:
Pivot on Multiple Columns using Tablefunc
SELECT product_id, product
, COALESCE(s1, 0) AS s1 -- 1. ... displayed 0 instead of null
, COALESCE(s2, 0) AS s2
, COALESCE(s3, 0) AS s3
, COALESCE(s4, 0) AS s4
, COALESCE(s5, 0) AS s5
FROM crosstab(
'SELECT s.product_id, p.name, s.store_id, s.sum_amount
FROM product p
JOIN (
SELECT product_id, store_id
, sum(amount) AS sum_amount -- 3. SUM total of product spent in store
FROM sales
GROUP BY product_id, store_id
) s ON p.id = s.product_id
ORDER BY s.product_id, s.store_id;'
, 'VALUES (1),(2),(3),(4),(5)' -- desired store_id's
) AS ct (product_id int, product text -- "extra" column
, s1 numeric, s2 numeric, s3 numeric, s4 numeric, s5 numeric)
ORDER BY s3 DESC; -- 2. ... descending order for S3
Produces your desired result exactly (plus product_id).
To include products that have never been sold, replace [INNER] JOIN with LEFT [OUTER] JOIN, and select and order by p.id instead of s.product_id, which would be NULL for unsold products.
SQL Fiddle with base query.
The tablefunc module is not installed on sqlfiddle.
Major points
Read the basic explanation in the reference answer for crosstab().
I am including product_id because product.name is hardly unique. Omitting it might lead to sneaky errors that conflate two different products.
You don't need the store table in the query if referential integrity is guaranteed.
ORDER BY s3 DESC works, because s3 references the output column where NULL values have been replaced with COALESCE. Else we would need DESC NULLS LAST to sort NULL values last:
PostgreSQL sort by datetime asc, null first?
For building crosstab() queries dynamically consider:
Dynamic alternative to pivot with CASE and GROUP BY
I also want to have pagination.
That last item is fuzzy. Simple pagination can be had with LIMIT and OFFSET:
Displaying data in grid view page by page
I would consider a MATERIALIZED VIEW to materialize results before pagination. If you have a stable page size I would add page numbers to the MV for easy and fast results.
To optimize performance for big result sets, consider:
SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
Optimize query with OFFSET on large table
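
A rough sketch of the materialized view plus keyset pagination idea. It uses conditional aggregation (the FILTER clause, available since PostgreSQL 9.4) in place of crosstab(), so it does not depend on tablefunc. The view name product_store_totals, the page size of 20, and the literal values in the last query are made up for illustration, and only stores 1 to 3 are pivoted:

```sql
CREATE MATERIALIZED VIEW product_store_totals AS
SELECT p.id   AS product_id
     , p.name AS product
     , COALESCE(sum(s.amount) FILTER (WHERE s.store_id = 1), 0) AS s1
     , COALESCE(sum(s.amount) FILTER (WHERE s.store_id = 2), 0) AS s2
     , COALESCE(sum(s.amount) FILTER (WHERE s.store_id = 3), 0) AS s3
FROM   product p
LEFT   JOIN sales s ON s.product_id = p.id   -- keeps products never sold
GROUP  BY p.id, p.name;

-- First page, sorted by S3 descending; product_id breaks ties deterministically.
SELECT *
FROM   product_store_totals
ORDER  BY s3 DESC, product_id DESC
LIMIT  20;

-- Next page: pass in s3 and product_id from the last row of the previous page.
SELECT *
FROM   product_store_totals
WHERE  (s3, product_id) < (2.00, 2)   -- hypothetical values from that last row
ORDER  BY s3 DESC, product_id DESC
LIMIT  20;
```

The view has to be refreshed with REFRESH MATERIALIZED VIEW product_store_totals whenever the sales data changes. Unlike a growing OFFSET, the row-value comparison lets the database skip directly to the next page, provided an index on (s3, product_id) exists on the view.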

Query to combine two tables into one based on timestamp

I have three tables in Postgres. They are all about a single event (an occurrence, not "sports event"). Each table is about a specific item during the event.
table_header columns
gid, start_timestamp, end_timestamp, location, positions
table_item1 columns
gid, side, visibility, item1_timestamp
table_item2 columns
gid, position_id, name, item2_timestamp
I've tried the following query:
SELECT h.gid, h.location, h.start_timestamp, h.end_timestamp, i1.side,
i1.visibility, i2.position_id, i2.name, i2.item2_timestamp AS timestamp
FROM table_header AS h
LEFT OUTER JOIN table_item1 i1 on (i1.gid = h.gid)
LEFT OUTER JOIN table_item2 i2 on (i2.gid = i1.gid AND
i1.item1_timestamp = i2.item2_timestamp)
WHERE h.start_timestamp BETWEEN '2016-03-24 12:00:00'::timestamp AND now()::timestamp
The problem is that I'm losing some data from rows when item1_timestamp and item2_timestamp do not match.
So if I have in table_item1 and table_item2:
gid | item1_timestamp | side gid | item2_timestamp | name
---------------------------- -----------------------------------
1 | 17:00:00 | left 1 | 17:00:00 | charlie
1 | 17:00:05 | right 1 | 17:00:03 | frank
1 | 17:00:10 | left 1 | 17:00:06 | dee
I would want the final output to be:
gid | timestamp | side | name
-----------------------------
1 | 17:00:00 | left | charlie
1 | 17:00:03 | | frank
1 | 17:00:05 | right |
1 | 17:00:06 | | dee
1 | 17:00:10 | left |
based purely on the timestamp (and gid). Naturally I would have the header info in there too, but that's trivial.
I tried playing around with the query I posted used different JOINs and UNIONs, but I cannot seem to get it right. The one I posted gives the best results I could manage, but it's incomplete.
Side note: every minute or so there will be a new "event". So the gid will be unique to each event and the query needs to ensure that each dataset is paired with data from the same gid. Which is the reason for my i1.gid = h.gid lines. Data between different events should not be compared.
select t1.gid, t1.timestamp, t1.side, t2.name
from t1
left join t2 on t2.timestamp=t1.timestamp and t2.gid=t1.gid
union
select t2.gid, t2.timestamp, t1.side, t2.name
from t2
left join t1 on t2.timestamp=t1.timestamp and t2.gid=t1.gid
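
The same result can also be sketched with a single FULL OUTER JOIN, written here against the table and column names from the question (table_item1, table_item2). It assumes the two timestamp columns have the same type and leaves out the header table for brevity:

```sql
SELECT COALESCE(i1.gid, i2.gid)                         AS gid,
       COALESCE(i1.item1_timestamp, i2.item2_timestamp) AS "timestamp",
       i1.side,   -- NULL when the row exists only in table_item2
       i2.name    -- NULL when the row exists only in table_item1
FROM table_item1 AS i1
FULL OUTER JOIN table_item2 AS i2
       ON i2.gid = i1.gid
      AND i2.item2_timestamp = i1.item1_timestamp
ORDER BY 1, 2;   -- by gid, then by the merged timestamp
```

COALESCE picks whichever side of the join actually has the row, so gid and the timestamp are never NULL in the output.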