Full Outer Join on two columns is omitting rows - postgresql

Some background: I am making a table in Postgres 9.5 that counts the number of actions performed by a user and groups these actions by month using date_trunc(). The counts for each individual action are split into separate tables, following this format:
Feedback table:
id | month | feedback_counted
----+---------+-------------------
1 | 2 | 3
1 | 3 | 10
1 | 4 | 7
1 | 5 | 2
Comments table:
id | month | comments_counted
----+---------+-------------------
1 | 4 | 12
1 | 5 | 4
1 | 6 | 57
1 | 7 | 12
Ideally, I would like to do a FULL OUTER JOIN of these tables ON the "id" and "month" columns at the same time and produce this result:
Combined table:
id | month | feedback_counted | comments_counted
----+---------+--------------------+-------------------
1 | 2 | 3 |
1 | 3 | 10 |
1 | 4 | 7 | 12
1 | 5 | 2 | 4
1 | 6 | | 57
1 | 7 | | 12
However, my current query does not capture the id and month for rows that exist only in the feedback table, and displays them like this:
Rollup table:
id | month | feedback_counted | comments_counted
----+---------+--------------------+-------------------
| | |
| | |
1 | 4 | 7 | 12
1 | 5 | 2 | 4
1 | 6 | | 57
1 | 7 | | 12
This is my current statement. Note that it uses date_trunc in place of month. I add the action counts later; the main issue is somewhere here.
CREATE TABLE rollup_table AS
SELECT c.id, c.date_trunc
FROM comments_counted c FULL OUTER JOIN feedback_counted f
ON c.id = f.id AND c.date_trunc = f.date_trunc
GROUP BY c.id, c.date_trunc, f.id, f.date_trunc;
I'm a bit of a novice with SQL and am not sure how to fix this. Any help would be appreciated.

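Replace ON c.id = f.id AND c.month = f.month with USING(id, month), and select the unqualified id and month columns. With USING, those output columns are merged from both tables, so they are filled in for every row.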
SELECT id, month, feedback_counted, comments_counted
FROM comments c
FULL OUTER JOIN feedback f
USING(id, month);
id | month | feedback_counted | comments_counted
----+-------+------------------+------------------
1 | 2 | 3 |
1 | 3 | 10 |
1 | 4 | 7 | 12
1 | 5 | 2 | 4
1 | 6 | | 57
1 | 7 | | 12
(6 rows)
Test it in db<>fiddle.

USING() is mostly the same as ON: if the two tables share the same column names, you can use USING() instead of ON to save some typing. One difference matters here, though. With USING(id, month), the unqualified output columns id and month are merged from both tables, so they are filled in for every row. If you still select c.id and c.month, with either USING() or ON, PostgreSQL returns the values from comments only, and those are NULL for rows that exist only in feedback. That's why the id and month are missing from some rows of the full outer join.
If you want to keep ON, here is a way that at least works for me.
SELECT COALESCE(c.id, f.id) AS id,
COALESCE(c.month, f.month) AS month,
feedback_counted,
comments_counted
FROM comments c
FULL OUTER JOIN feedback f
ON c.id = f.id AND c.month = f.month;
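To see why qualifying the columns with c. loses values, here is a minimal sketch that runs the same join condition through Python's sqlite3 module. SQLite is only a stand-in for PostgreSQL here (an assumption), and a LEFT JOIN from feedback is used instead of a FULL OUTER JOIN because older SQLite builds lack the latter; the unmatched feedback rows behave the same way in both.

```python
import sqlite3

# SQLite stands in for PostgreSQL (assumption); names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feedback (id INTEGER, month INTEGER, feedback_counted INTEGER);
    CREATE TABLE comments (id INTEGER, month INTEGER, comments_counted INTEGER);
    INSERT INTO feedback VALUES (1, 2, 3), (1, 3, 10), (1, 4, 7), (1, 5, 2);
    INSERT INTO comments VALUES (1, 4, 12), (1, 5, 4), (1, 6, 57), (1, 7, 12);
""")

# Every feedback row is kept; c.id and c.month are NULL where comments
# has no matching row, while COALESCE falls back to the feedback side.
rows = conn.execute("""
    SELECT c.id                       AS c_id,
           c.month                    AS c_month,
           COALESCE(c.id, f.id)       AS id,
           COALESCE(c.month, f.month) AS month
    FROM feedback f
    LEFT JOIN comments c
      ON c.id = f.id AND c.month = f.month
    ORDER BY f.month
""").fetchall()

for row in rows:
    print(row)
```

For months 2 and 3 the c_id and c_month values come back as None, while the COALESCE columns still carry 1 and the month.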

Related

join columns from two table into one

I have table AnalysisForm
a_id| a_description | medical_card_id
-------------------------
1 | Analysis1 | 5
2 | Analysis2 | 3
3 | Analysis3 | 2
4 | Analysis4 | 1
and table DicomForm
d_id| d_description | medical_card_id
-------------------------
1 | DicomForm1 | 5
2 | DicomForm2 | 3
3 | DicomForm3 | 2
4 | DicomForm4 | 1
Now I want to get info by medical_card_id = 5 like this
form_id| form_description | medical_card_id
-------------------------
1 | DicomForm1 | 5
1 | Analysis1 | 5
How can I make it in Postgres?
I actually think that you want a union query here, rather than a join:
SELECT a_id AS form_id, a_description AS form_description, medical_card_id
FROM AnalysisForm
WHERE medical_card_id = 5
UNION ALL
SELECT d_id, d_description, medical_card_id
FROM DicomForm
WHERE medical_card_id = 5;
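As a quick check of the union approach, the sketch below loads the sample rows into an in-memory SQLite database (a stand-in for Postgres, which is an assumption) and runs the same query. The ORDER BY is added here only to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AnalysisForm (a_id INTEGER, a_description TEXT, medical_card_id INTEGER);
    CREATE TABLE DicomForm (d_id INTEGER, d_description TEXT, medical_card_id INTEGER);
    INSERT INTO AnalysisForm VALUES
        (1, 'Analysis1', 5), (2, 'Analysis2', 3), (3, 'Analysis3', 2), (4, 'Analysis4', 1);
    INSERT INTO DicomForm VALUES
        (1, 'DicomForm1', 5), (2, 'DicomForm2', 3), (3, 'DicomForm3', 2), (4, 'DicomForm4', 1);
""")

# UNION ALL lines columns up by position; the output names come from
# the first SELECT, so only that one needs the aliases.
forms = conn.execute("""
    SELECT a_id AS form_id, a_description AS form_description, medical_card_id
    FROM AnalysisForm
    WHERE medical_card_id = ?
    UNION ALL
    SELECT d_id, d_description, medical_card_id
    FROM DicomForm
    WHERE medical_card_id = ?
    ORDER BY form_description
""", (5, 5)).fetchall()

for form in forms:
    print(form)
```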

postgres tablefunc, sales data grouped by product, with crosstab of months

TIL about tablefunc and crosstab. At first I wanted to "group data by columns" but that doesn't really mean anything.
My product sales look like this
product_id | units | date
-----------------------------------
10 | 1 | 1-1-2018
10 | 2 | 2-2-2018
11 | 3 | 1-1-2018
11 | 10 | 1-2-2018
12 | 1 | 2-1-2018
13 | 10 | 1-1-2018
13 | 10 | 2-2-2018
I would like to produce a table of products with months as columns
product_id | 01-01-2018 | 02-01-2018 | etc.
-----------------------------------
10 | 1 | 2
11 | 13 | 0
12 | 0 | 1
13 | 20 | 0
First I would group by month, then invert and group by product, but I cannot figure out how to do this.
After enabling the tablefunc extension,
SELECT product_id, coalesce("2018-1-1", 0) as "2018-1-1"
, coalesce("2018-2-1", 0) as "2018-2-1"
FROM crosstab(
$$SELECT product_id, date_trunc('month', date)::date as month, sum(units) as units
FROM test
GROUP BY product_id, month
ORDER BY 1$$
, $$VALUES ('2018-1-1'::date), ('2018-2-1')$$
) AS ct (product_id int, "2018-1-1" int, "2018-2-1" int);
yields
| product_id | 2018-1-1 | 2018-2-1 |
|------------+----------+----------|
| 10 | 1 | 2 |
| 11 | 13 | 0 |
| 12 | 0 | 1 |
| 13 | 10 | 10 |
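If the tablefunc extension is not available, the same pivot can be written with plain conditional aggregation. This is a different technique from crosstab, sketched here with Python's sqlite3 module as a stand-in for Postgres. It assumes the sample dates are written month-day-year (the reading consistent with the output above) and stores them as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (product_id INTEGER, units INTEGER, date TEXT);
    -- Sample rows from the question, read as month-day-year (assumption).
    INSERT INTO test VALUES
        (10, 1, '2018-01-01'), (10, 2, '2018-02-02'),
        (11, 3, '2018-01-01'), (11, 10, '2018-01-02'),
        (12, 1, '2018-02-01'),
        (13, 10, '2018-01-01'), (13, 10, '2018-02-02');
""")

# One SUM(CASE ...) per month column; rows from other months add 0.
pivot = conn.execute("""
    SELECT product_id,
           SUM(CASE WHEN strftime('%Y-%m', date) = '2018-01' THEN units ELSE 0 END) AS "2018-1-1",
           SUM(CASE WHEN strftime('%Y-%m', date) = '2018-02' THEN units ELSE 0 END) AS "2018-2-1"
    FROM test
    GROUP BY product_id
    ORDER BY product_id
""").fetchall()

for row in pivot:
    print(row)
```

Like crosstab, this needs the month columns listed by hand; the trade-off is one extra expression per month instead of an extension.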

Postgresql get total matches by player

I've got the following Postgres query:
SELECT p_id as player_id, name as player_name
FROM Players
LEFT OUTER JOIN matches
ON Players.p_id = matches.player1 or Players.p_id = matches.player2
;
and it returns the following
player_id | player_name
-----------+-------------------
1 | Twilight Sparkle
1 | Twilight Sparkle
2 | Fluttershy
3 | Applejack
3 | Applejack
4 | Pinkie Pie
5 | "Rarity
5 | "Rarity
6 | Rainbow Dash
7 | Princess Celestia
7 | Princess Celestia
8 | Princess Luna
How can I end up with a table of unique p_id's with each one's name and the total of rows that p_id is in?
player_id | player_name | total_matches
-----------+-------------------+------
1 | Twilight Sparkle | 2
2 | Fluttershy | 1
3 | Applejack | 1
4 | Pinkie Pie | 1
5 | "Rarity | 2
6 | Rainbow Dash | 1
7 | Princess Celestia | 2
8 | Princess Luna | 1
You can achieve it using the GROUP BY clause. Every non-aggregated column in the SELECT list has to appear in it, so group by both p_id and name:
SELECT p_id as player_id, name as player_name, count(*) as total_matches
FROM Players
LEFT OUTER JOIN matches
ON Players.p_id = matches.player1 or Players.p_id = matches.player2
GROUP BY p_id, name
;
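One detail worth checking with this kind of query is what a player with no matches gets. The sketch below uses a small made-up data set (three players, two matches, all hypothetical) in SQLite as a stand-in for Postgres, groups by both p_id and name, and compares count(*) with count(matches.player1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Players (p_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE matches (player1 INTEGER, player2 INTEGER);
    -- Hypothetical sample data: Fluttershy has played no matches.
    INSERT INTO Players VALUES (1, 'Twilight Sparkle'), (2, 'Fluttershy'), (3, 'Applejack');
    INSERT INTO matches VALUES (1, 3), (3, 1);
""")

# count(*) counts joined rows, including the single NULL-extended row that
# the LEFT JOIN produces for a player without matches; count(column)
# skips NULLs and therefore reports 0 for that player.
totals = conn.execute("""
    SELECT p_id AS player_id,
           name AS player_name,
           count(*) AS joined_rows,
           count(matches.player1) AS total_matches
    FROM Players
    LEFT OUTER JOIN matches
      ON Players.p_id = matches.player1 OR Players.p_id = matches.player2
    GROUP BY p_id, name
    ORDER BY p_id
""").fetchall()

for row in totals:
    print(row)
```

Which of the two counts is wanted depends on whether a player without any matches should show 1 (rows in the join output) or 0 (actual matches).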

How to set sequence number of sub-elements in TSQL using same element as parent?

I need to set a sequence number in T-SQL where the first column holds a sequence marker (which repeats) and another column is used for ordering.
It is hard to explain, so I will try with an example.
This is what I need:
|------------|-------------|----------------|
| Group Col | Order Col | Desired Result |
|------------|-------------|----------------|
| D | 1 | NULL |
| A | 2 | 1 |
| C | 3 | 1 |
| E | 4 | 1 |
| A | 5 | 2 |
| B | 6 | 2 |
| C | 7 | 2 |
| A | 8 | 3 |
| F | 9 | 3 |
| T | 10 | 3 |
| A | 11 | 4 |
| Y | 12 | 4 |
|------------|-------------|----------------|
So my marker is A (each time I meet A I must start a new group inside my result). All rows before the first A must be set to NULL.
I know that I can achieve that with a loop, but it would be a slow solution and I need to update a lot of rows (sometimes several thousand).
Is there a way to achieve this without a loop?
You can use the window version of COUNT to get the desired result:
SELECT [Group Col], [Order Col],
COUNT(CASE WHEN [Group Col] = 'A' THEN 1 END)
OVER
(ORDER BY [Order Col]) AS [Desired Result]
FROM mytable
COUNT returns 0 for the rows before the first A. If you need those rows set to NULL, use SUM instead of COUNT: SUM over a window that contains no non-NULL values returns NULL.
Demo here
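The difference between the two aggregates can be checked with a small sketch. It uses Python's sqlite3 module as a stand-in for SQL Server (an assumption; window functions need SQLite 3.25 or later), and the column names group_col and order_col are simplified versions of the ones in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (group_col TEXT, order_col INTEGER)")
# The twelve sample rows from the question: D, A, C, E, A, B, C, A, F, T, A, Y.
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 list(zip("DACEABCAFTAY", range(1, 13))))

# Both expressions keep a running tally of the 'A' markers seen so far;
# they differ only on rows where no marker has been seen yet.
result = conn.execute("""
    SELECT group_col,
           order_col,
           COUNT(CASE WHEN group_col = 'A' THEN 1 END)
               OVER (ORDER BY order_col) AS with_count,
           SUM(CASE WHEN group_col = 'A' THEN 1 END)
               OVER (ORDER BY order_col) AS with_sum
    FROM mytable
    ORDER BY order_col
""").fetchall()

for row in result:
    print(row)
```

The first row (D) gets 0 from COUNT and None (NULL) from SUM; from the first A onward both columns agree.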

I'm a bit new to PostgreSQL and need help constructing a complex query

I need to list all the cities you can get to after stopping off at exactly one other city, starting from any city of my choice, along with the distance to the final city and to the intermediate city.
The tables in the database consist of cities, with the attributes:
| city_id | name |
1 Edinburgh
2 Newcastle
3 Manchester
citypairs:
| citypair_id | city_id |
1 1
1 2
2 1
2 3
3 2
3 3
and distances:
| citypair_id | distance |
1 1234
2 1324
3 1324
and trains:
| train_id | departure_city_id | destination_city_id |
1 1 2
2 2 3
3 1 3
4 3 2
I haven't put any of the data in but basically if a city.name is chosen at random by me I need to find out which cities I can get to from this city if I go via another city (i.e. in two journeys) and then the distance to the final and intermediate city.
How would you, or how should I, go about forming a query to return the desired table?
Edited to include data and a missing table! As an example you can go from Edinburgh(1) to Manchester(3) via Newcastle(2) and you can go from Edinburgh to Newcastle via Manchester, however you can not go from Manchester to Edinburgh via Newcastle (since a train departs from 3, arrives at 2, but no train from 2 arrives in 1) and this route should not be returned from the query. Apologies for any confusion beforehand.
I've got a CTE that builds a tree of all the destinations.
WITH RECURSIVE trip AS (
SELECT c.city_id AS start_city,
ARRAY[c.city_id] AS route,
cast(c.name AS varchar(100)) AS route_text,
c.city_id AS leg_start_city,
c.city_id AS leg_end_city,
0 AS trip_count,
0 AS leg_length,
0 AS total_length
FROM cities c
UNION ALL
SELECT
trip.start_city,
trip.route || t.destination_city_id,
cast(trip.route_text || ',' || c.name AS varchar(100)),
t.departure_city_id,
t.destination_city_id,
trip.trip_count + 1,
d.distance,
trip.total_length + d.distance
FROM trains t
INNER JOIN trip
ON t.departure_city_id = trip.leg_end_city
INNER JOIN citypairs cps
ON t.departure_city_id = cps.city_id
INNER JOIN citypairs cpe
ON t.destination_city_id = cpe.city_id AND
cpe.citypair_id = cps.citypair_id
INNER JOIN distances d
ON cps.citypair_id = d.citypair_id
INNER JOIN cities c
ON t.destination_city_id = c.city_id
WHERE NOT (array[t.destination_city_id] <@ trip.route))
SELECT *
FROM trip
WHERE trip_count = 2
AND start_city = (SELECT city_id FROM cities WHERE name = 'Edinburgh');
The CTE starts from each city (in the non-recursive part at the start), then determines all the destination cities it can go to. It keeps track of all the cities it's been to in an array (the route column), so it won't loop back to itself again. As it goes through the tree, it keeps a running total of the trip distance (total_length) and of the number of trains taken (trip_count).
This gives results of
| START_CITY | ROUTE | ROUTE_TEXT | LEG_START_CITY | LEG_END_CITY | TRIP_COUNT | LEG_LENGTH | TOTAL_LENGTH |
--------------------------------------------------------------------------------------------------------------------------------
| 1 | 1,2,3 | Edinburgh,Newcastle,Manchester | 2 | 3 | 2 | 1324 | 2558 |
| 1 | 1,3,2 | Edinburgh,Manchester,Newcastle | 3 | 2 | 2 | 1324 | 2648 |
If you remove the final WHERE clause it'll show all the possible trips in the data; likewise, you can change the trip_count to find all single-train destinations etc.
| START_CITY | ROUTE | ROUTE_TEXT | LEG_START_CITY | LEG_END_CITY | TRIP_COUNT | LEG_LENGTH | TOTAL_LENGTH |
--------------------------------------------------------------------------------------------------------------------------------
| 1 | 1 | Edinburgh | 1 | 1 | 0 | 0 | 0 |
| 2 | 2 | Newcastle | 2 | 2 | 0 | 0 | 0 |
| 3 | 3 | Manchester | 3 | 3 | 0 | 0 | 0 |
| 1 | 1,2 | Edinburgh,Newcastle | 1 | 2 | 1 | 1234 | 1234 |
| 1 | 1,3 | Edinburgh,Manchester | 1 | 3 | 1 | 1324 | 1324 |
| 2 | 2,3 | Newcastle,Manchester | 2 | 3 | 1 | 1324 | 1324 |
| 3 | 3,2 | Manchester,Newcastle | 3 | 2 | 1 | 1324 | 1324 |
| 1 | 1,2,3 | Edinburgh,Newcastle,Manchester | 2 | 3 | 2 | 1324 | 2558 |
| 1 | 1,3,2 | Edinburgh,Manchester,Newcastle | 3 | 2 | 2 | 1324 | 2648 |
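The cast( ... as varchar(100)) is a bit hacky. It is most likely needed because PostgreSQL requires each column of a recursive CTE to have the same type in the non-recursive and recursive parts, and concatenating onto name produces a different type than name itself. I haven't had a chance to find a cleaner way around that yet.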
The SQL is here for testing: http://sqlfiddle.com/#!1/93964/24
The first part is easy:
SELECT c2.name
FROM cities AS c
JOIN trains t ON c.city_id=t.departure_city_id
JOIN trains t2 ON t.destination_city_id=t2.departure_city_id
JOIN cities AS c2 ON t2.destination_city_id=c2.city_id
WHERE c2.city_id!=c.city_id
AND c.name='Edinburgh';
http://sqlfiddle.com/#!12/a656f/14
In PG 9.1+ you could even do it with a recursive CTE for any number of cities in between. The distances are a little more complicated, and you would probably be better off transforming citypairs into actual pairs (one row per pair with both city ids).
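One way to read "actual pairs" is a view with one row per ordered pair of cities and its distance. The sketch below tries that idea on the sample data, using Python's sqlite3 module as a stand-in for Postgres. The view name pair_distances and its columns are hypothetical, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (city_id INTEGER, name TEXT);
    CREATE TABLE citypairs (citypair_id INTEGER, city_id INTEGER);
    CREATE TABLE distances (citypair_id INTEGER, distance INTEGER);
    CREATE TABLE trains (train_id INTEGER, departure_city_id INTEGER,
                         destination_city_id INTEGER);
    INSERT INTO cities VALUES (1, 'Edinburgh'), (2, 'Newcastle'), (3, 'Manchester');
    INSERT INTO citypairs VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 2), (3, 3);
    INSERT INTO distances VALUES (1, 1234), (2, 1324), (3, 1324);
    INSERT INTO trains VALUES (1, 1, 2), (2, 2, 3), (3, 1, 3), (4, 3, 2);

    -- Hypothetical helper view: each city pair in both directions.
    CREATE VIEW pair_distances AS
    SELECT a.city_id AS from_city, b.city_id AS to_city, d.distance
    FROM citypairs a
    JOIN citypairs b
      ON a.citypair_id = b.citypair_id AND a.city_id <> b.city_id
    JOIN distances d ON d.citypair_id = a.citypair_id;
""")

# Two train legs (t, then t2), each joined to its distance via the view.
routes = conn.execute("""
    SELECT c2.name                   AS final_city,
           mid.name                  AS via_city,
           p1.distance               AS distance_to_via,
           p1.distance + p2.distance AS distance_to_final
    FROM cities AS c
    JOIN trains t  ON c.city_id = t.departure_city_id
    JOIN trains t2 ON t.destination_city_id = t2.departure_city_id
    JOIN cities AS mid ON t.destination_city_id = mid.city_id
    JOIN cities AS c2  ON t2.destination_city_id = c2.city_id
    JOIN pair_distances p1
      ON p1.from_city = t.departure_city_id AND p1.to_city = t.destination_city_id
    JOIN pair_distances p2
      ON p2.from_city = t2.departure_city_id AND p2.to_city = t2.destination_city_id
    WHERE c2.city_id != c.city_id
      AND c.name = ?
    ORDER BY final_city
""", ("Edinburgh",)).fetchall()

for route in routes:
    print(route)
```

For Edinburgh this gives Manchester via Newcastle and Newcastle via Manchester, with the same total distances (2558 and 2648) as the recursive CTE output above.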