Postgres: How to JOIN

I am working on Postgres 9.3. I have two tables, the first for payment items:
Table "public.prescription"
Column | Type | Modifiers
-------------------+-------------------------+--------------------------------------------------------------------
id | integer | not null default nextval('frontend_prescription_id_seq'::regclass)
presentation_code | character varying(15) | not null
presentation_name | character varying(1000) | not null
actual_cost | double precision | not null
pct_id | character varying(3) | not null
And the second for organisations:
Table "public.pct"
Column | Type | Modifiers
-------------------+-------------------------+-----------
code | character varying(3) | not null
name | character varying(200) |
I have a query to get all the payments for a particular code:
SELECT sum(actual_cost) as total_cost, pct_id as row_id
FROM prescription
WHERE presentation_code='1234' GROUP BY pct_id
Here is the query plan for that query.
Now, I'd like to annotate each row with the name property of the associated organisation. This is what I'm trying:
SELECT sum(prescription.actual_cost) as total_cost, prescription.pct_id, pct.name as row_id
FROM prescription, pct
WHERE prescription.presentation_code='0212000AAAAAAAA'
GROUP BY prescription.pct_id, pct.name;
Here's the EXPLAIN ANALYSE output for that query. It's incredibly slow: what am I doing wrong?
I think there must be a way to annotate each row with the pct.name AFTER the first query has run, which would be faster.
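Something like this is what I have in mind: aggregate first in a subquery, then join the (much smaller) result to pct afterwards. This is only an untested sketch:
SELECT totals.total_cost, totals.row_id, pct.name
FROM (
    SELECT sum(actual_cost) AS total_cost, pct_id AS row_id
    FROM prescription
    WHERE presentation_code = '0212000AAAAAAAA'
    GROUP BY pct_id
) totals
LEFT JOIN pct ON pct.code = totals.row_id;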

With a JOIN (a LEFT JOIN in this case, because we want the row even if there is no matching pct):
SELECT
sum(prescription.actual_cost) as total_cost,
prescription.pct_id,
pct.name as row_id
FROM prescription
LEFT JOIN pct ON pct.code = prescription.pct_id
WHERE
prescription.presentation_code='0212000AAAAAAAA'
GROUP BY
prescription.pct_id,
pct.name;
I don't know if it works well; I haven't tried this query.

You are taking data from 2 tables, but you do not join the tables in any way. Effectively, you get a cross join, resulting in the Cartesian product of both tables. If you look at your EXPLAIN ANALYZE statistics, you see that your nested loop processes 62 million rows. That takes time.
Add in a join condition to make this all fast:
SELECT sum(prescription.actual_cost) as total_cost, prescription.pct_id, pct.name as row_id
FROM prescription
JOIN pct ON pct.code = prescription.pct_id
WHERE prescription.presentation_code = '0212000AAAAAAAA'
GROUP BY prescription.pct_id, pct.name;
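If it is still slow after adding the join condition, indexes on the filter and join columns usually help. These statements are only a suggestion based on your table definitions (and pct.code may already be covered by a primary key):
CREATE INDEX prescription_code_pct_idx ON prescription (presentation_code, pct_id);
CREATE INDEX pct_code_idx ON pct (code);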

Related

PostgreSQL How to merge two tables row to row without condition

I have two tables
The first table contains three text fields (username, email, num); the second has only one column, birth_date DATE, with random dates.
I need to merge the tables without a join condition.
For example
first table:
+----------+--------------+-----------+
| username | email        | num       |
+----------+--------------+-----------+
| 'user1'  | 'user1#mail' | '+794949' |
| 'user2'  | 'user2#mail' | '+799999' |
+----------+--------------+-----------+
second table:
+--------------+
| birth_date   |
+--------------+
| '2001-01-01' |
| '2002-02-02' |
+--------------+
And I need a result like
+----------+------------+-------------+--------------+
| username | email      | num         | birth_date   |
+----------+------------+-------------+--------------+
| 'user1'  | 'us1#mail' | '+7979797'  | '2001-01-01' |
| 'user2'  | 'us2#mail' | '+79898998' | '2002-02-02' |
+----------+------------+-------------+--------------+
Each table has 100 rows, and I need the result table to have 100 rows too.
I tried different JOINs, but there is no join condition here.
Sure there is a join condition, about the simplest there is: join on true, or a cross join. Either is the basic way to merge tables without a condition. However, this does not result in what you want, as it generates a result set of 10k rows. But you can then use limit:
select *
from table1
join table2 on true
order by random()
limit 100;
select *
from table1
cross join table2
order by random()
limit 100;
There is another option, which I think may be closer to what you want. Assign a value to each row of each table, then join on this assigned value:
select <column list>
from (select *, row_number() over() rn from table1) t1
join (select *, row_number() over() rn from table2) t2
on (t1.rn = t2.rn);
To eliminate the assigned value you must specifically list each column desired in the result. But that is the way it should be done anyway.
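For example, with the columns above, the result could be produced like this (an untested sketch using the same table1 and table2 names):
select t1.username, t1.email, t1.num, t2.birth_date
from (select *, row_number() over() as rn from table1) t1
join (select *, row_number() over() as rn from table2) t2
  on (t1.rn = t2.rn);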
See demo here (the demo uses just 3 rows instead of 100).

How to filter NULLs based on whether a column value has only one record or not, T-SQL

I have a dataset not too dissimilar from this:
| RECORD_VALUE | RECORD_ATTRIBUTE |
| :--- | :--- |
| ABC | NULL |
| DEF | 123 |
| DEF | 456 |
| GHI | NULL |
| GHI | 789 |
From that data, I would like to filter such that the row for record "ABC" is kept, but the record with a "NULL" value in Col_B for record "GHI" is removed. Basically, for records that do have a value other than "NULL" in Col_B, I only want the record(s) with values. But for records that only have an associated "NULL" value in Col_B, I want to keep the entire record.
I would appreciate any ideas! Thanks!
There are a couple of ways to skin this cat.
One method is to read the data first to get the Col_A values that have a NOT NULL Col_B, then join back to the table to get the relevant rows.
The approach below, however, involves only one read of the data and then processing based on that (it saves doing two table reads).
WITH A AS
(SELECT Col_A,
Col_B,
MAX(Col_B) OVER (PARTITION BY Col_A) AS Max_ColB
FROM yourtable
)
SELECT A.Col_A, A.Col_B
FROM A
WHERE (Max_ColB IS NULL)
OR (Max_ColB IS NOT NULL AND Col_B IS NOT NULL);
The above finds the maximum value of Col_B for each value in Col_A, which will be NULL if all the values of Col_B are NULL, else will be an actual value.
Then the main part filters through those: reporting all rows when Max_ColB is NULL, or only the rows where Col_B is NOT NULL if Max_ColB has a value.
Here is a db<>fiddle with your example data and results. It also includes the CTE component broken out (the WITH... part) showing its results.
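For comparison, a minimal sketch of the first (read-then-join-back) method, assuming the same yourtable with columns Col_A and Col_B:
SELECT t.Col_A, t.Col_B
FROM yourtable t
LEFT JOIN (SELECT DISTINCT Col_A
           FROM yourtable
           WHERE Col_B IS NOT NULL) has_value
       ON has_value.Col_A = t.Col_A
WHERE has_value.Col_A IS NULL   -- this Col_A only has NULL Col_B rows: keep them
   OR t.Col_B IS NOT NULL;      -- otherwise keep only the rows with values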

PostgreSQL Group By not working as expected - wants too many inclusions

I have a simple PostgreSQL table that I'm trying to query. Imagine a table like this...
| ID | Account_ID | Iteration |
|----|------------|-----------|
| 1 | 100 | 1 |
| 2 | 101 | 1 |
| 3 | 100 | 2 |
I need to get the ID column for each Account_ID where Iteration is at its maximum value. So, you'd think something like this would work
SELECT "ID", "Account_ID", MAX("Iteration")
FROM "Table_Name"
GROUP BY "Account_ID"
And I expect to get:
| ID | Account_ID | MAX(Iteration) |
|----|------------|----------------|
| 2 | 101 | 1 |
| 3 | 100 | 2 |
But when I do this, Postgres complains:
ERROR: column "ID" must appear in the GROUP BY clause or be used in an aggregate function
And when I add "ID" to the GROUP BY, it just destroys the grouping altogether and gives me the whole table!
Is the best way to approach this using the following?
SELECT DISTINCT ON ("Account_ID") "ID", "Account_ID", "Iteration"
FROM "Marketing_Sparks"
ORDER BY "Account_ID" ASC, "Iteration" DESC;
The GROUP BY statement aggregates rows with the same values in the columns included in the group by into a single row. Because this row isn't the same as the original row, you can't have a column that is not in the group by or in an aggregate function. To get what you want, you will probably have to select without the ID column, then join the result to the original table. I don't know PostgreSQL syntax, but I assume it would be something like the following.
SELECT Table_Name.ID, aggregate.Account_ID, aggregate.MIteration
FROM (SELECT Account_ID, MAX(Iteration) AS MIteration
      FROM Table_Name
      GROUP BY Account_ID) aggregate
LEFT JOIN Table_Name ON aggregate.Account_ID = Table_Name.Account_ID AND
                        aggregate.MIteration = Table_Name.Iteration
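In PostgreSQL, with the quoted identifiers from the question, the same idea might look like this (untested sketch):
SELECT t."ID", agg."Account_ID", agg."MaxIteration"
FROM (SELECT "Account_ID", MAX("Iteration") AS "MaxIteration"
      FROM "Table_Name"
      GROUP BY "Account_ID") agg
JOIN "Table_Name" t
  ON agg."Account_ID" = t."Account_ID"
 AND agg."MaxIteration" = t."Iteration";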

How to use COUNT() on more than one column?

Let's say I have these 3 tables
Countries ProvOrStates MajorCities
-----+------------- -----+----------- -----+-------------
Id | CountryName Id | CId | Name Id | POSId | Name
-----+------------- -----+----------- -----+-------------
1 | USA 1 | 1 | NY 1 | 1 | NYC
How do you get something like
---------------------------------------------
CountryName | ProvinceOrState | MajorCities
| (Count) | (Count)
---------------------------------------------
USA | 50 | 200
---------------------------------------------
Canada | 10 | 57
So far, the way I see it:
Run the first SELECT COUNT (GROUP BY Countries.Id) on Countries JOIN ProvOrStates,
store the result in a table variable,
Run the second SELECT COUNT (GROUP BY Countries.Id) on ProvOrStates JOIN MajorCities,
Update the table variable based on the Countries.Id
Join the table variable with Countries table ON Countries.Id = Id of the table variable.
Is there a possibility to run just one query instead of multiple intermediary queries? I don't know if it's even feasible; I've tried with no luck.
Thanks for helping
Use a subquery or derived tables and views.
Basically, if you have 3 tables:
select * from [TableOne] as T1
join
(
select T2.Column, T3.Column
from [TableTwo] as T2
join [TableThree] as T3
on T2.ConditionColumn = T3.ConditionColumn
) AS DerivedTable
on T1.DepName = DerivedTable.DepName
And when you are 100% sure it's working, you can create a view that contains your three-table join and call it whenever you want.
PS: in case of any identical column names, or when you get this message:
"The column 'ColumnName' was specified multiple times for 'Table'. "
you can use aliases to solve the problem.
This answer comes from #lotzInSpace.
SELECT ct.[CountryName], COUNT(DISTINCT p.[Id]), COUNT(DISTINCT c.[Id])
FROM dbo.[Countries] ct
LEFT JOIN dbo.[Provinces] p
ON ct.[Id] = p.[CountryId]
LEFT JOIN dbo.[Cities] c
ON p.[Id] = c.[ProvinceId]
GROUP BY ct.[CountryName]
It's working. I'm using LEFT JOIN instead of INNER JOIN because with an INNER JOIN, if a country doesn't have provinces, or a province doesn't have cities, then that country or province doesn't display.
Thanks again #lotzInSpace.

Adding the results of two select queries into one table row with PostgreSQL

I am attempting to return the result of two distinct select statements into one row in PostgreSQL. For example, I have two queries each that return the same number of rows:
Select tableid1, tableid2, tableid3 from table1
+----------+----------+----------+
| tableid1 | tableid2 | tableid3 |
+----------+----------+----------+
| 1        | 2        | 3        |
| 4        | 5        | 6        |
+----------+----------+----------+
Select table2id1, table2id2, table2id3, table2id4 from table2
+-----------+-----------+-----------+-----------+
| table2id1 | table2id2 | table2id3 | table2id4 |
+-----------+-----------+-----------+-----------+
| 7         | 8         | 9         | 15        |
| 10        | 11        | 12        | 19        |
+-----------+-----------+-----------+-----------+
Now I want to concatenate these tables, keeping the same number of rows. I do not want to join on any values. The desired result would look like the following:
+----------+----------+----------+-----------+-----------+-----------+-----------+
| tableid1 | tableid2 | tableid3 | table2id1 | table2id2 | table2id3 | table2id4 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
| 1        | 2        | 3        | 7         | 8         | 9         | 15        |
| 4        | 5        | 6        | 10        | 11        | 12        | 19        |
+----------+----------+----------+-----------+-----------+-----------+-----------+
What can I do to the two above queries (select * from table1) and (select * from table2) to return the desired result above?
Thanks!
You can use row_number() for the join, but I'm not sure you have guarantees that the order of the rows will stay the same as in the tables. So it's better to add some ordering to the over() clause.
with cte1 as (
select
tableid1, tableid2, tableid3, row_number() over() as rn
from table1
), cte2 as (
select
table2id1, table2id2, table2id3, table2id4, row_number() over() as rn
from table2
)
select *
from cte1 as c1
inner join cte2 as c2 on c2.rn = c1.rn
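For example, ordering by the first column of each table (assuming that reflects the order you want), the over() clauses could be written as:
with cte1 as (
    select tableid1, tableid2, tableid3,
           row_number() over(order by tableid1) as rn
    from table1
), cte2 as (
    select table2id1, table2id2, table2id3, table2id4,
           row_number() over(order by table2id1) as rn
    from table2
)
select *
from cte1 as c1
inner join cte2 as c2 on c2.rn = c1.rn;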
You can't have what you want, as you wrote the question. Your two SELECTs don't have any ORDER BY clause, so the database can return the rows in whatever order it feels like. If it currently matches up, it does so only by accident, and will stop matching up as soon as you UPDATE a row.
You need a key column. Then you need to join on the key column. Anything else is attempting to invent unreliable and unsafe joins without actually using a join.
Frankly, this seems like a pretty dodgy schema. Lots of numbered integer columns like this, and the desire to concatenate them, may be a sign you should be looking at using integer arrays, or using a side-table with a foreign key relationship, instead.
Sample data in case anyone else wants to play:
CREATE TABLE table1(tableid1 integer, tableid2 integer, tableid3 integer);
INSERT INTO table1 VALUES (1,2,3), (4,5,6);
CREATE TABLE table2(table2id1 integer, table2id2 integer, table2id3 integer, table2id4 integer);
INSERT INTO table2 VALUES (7,8,9,15), (10,11,12,19);
Depending on what you're actually doing you might really have wanted arrays.
I think you might need to read these two posts:
Join 2 sets based on default order
How keep data don't sort?
which explain that SQL tables just don't have an order. So you cannot fetch them in a particular order.
DO NOT USE THE FOLLOWING CODE, IT IS DANGEROUS AND ONLY INCLUDED AS A PROOF OF CONCEPT:
As it happens, you can use a set-returning function hack to very inefficiently do what you want. It's incredibly ugly and completely unsafe without an ORDER BY in the SELECTs, but I'll include it for completeness. I guess.
CREATE OR REPLACE FUNCTION t1() RETURNS SETOF table1 AS $$ SELECT * FROM table1 $$ LANGUAGE sql;
CREATE OR REPLACE FUNCTION t2() RETURNS SETOF table2 AS $$ SELECT * FROM table2 $$ LANGUAGE sql;
SELECT (t1()).*, (t2()).*;
If you use this in any real code then kittens will cry. It'll produce insane and bizarre results if the number of rows in the tables differ and it'll produce the rows in orderings that might seem right at first, but will randomly start coming out wrong later on.
THE SANE WAY is to add a primary key properly, then do a join.
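A minimal sketch of that approach, assuming you add an id column to each table (note that the values assigned during backfill are not guaranteed to follow any particular row order, which is exactly the point above):
ALTER TABLE table1 ADD COLUMN id serial PRIMARY KEY;
ALTER TABLE table2 ADD COLUMN id serial PRIMARY KEY;

SELECT t1.tableid1, t1.tableid2, t1.tableid3,
       t2.table2id1, t2.table2id2, t2.table2id3, t2.table2id4
FROM table1 t1
JOIN table2 t2 ON t2.id = t1.id;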