Postgres join jsonb column - postgresql

I have a left table that looks like this
+-----------+-----------------------+
| name | interests |
+-----------+-----------------------+
| Jason | ["sports", "food"] |
+-----------+-----------------------+
And another table that has the interest information.
+-----------+----------------------------+
| interest | items |
+-----------+----------------------------+
| sports | ["football", "swimming"] |
+-----------+----------------------------+
| food | ["pasta", "bread"] |
+-----------+----------------------------+
| news | ["BBC", "New York Times"] |
+-----------+----------------------------+
How could I now make a query so that I can obtain an output like this?
Basically, as we would in Python, iterate over the interests and collect all the items belonging to them.
Many thanks.
+-----------+---------------------------------------------+
| name | items |
+-----------+---------------------------------------------+
| Jason | ["football", "swimming", "pasta", "bread"] |
+-----------+---------------------------------------------+

This is really a questionable database design. A traditional many-to-many table would probably make more sense.
However, to achieve what you want, you need to unnest all the elements from the interest table for all interests from the person table, then aggregate them back into a JSON array:
select p.name,
i.interests
from person p
left join lateral (
select jsonb_agg(ix.item) as interests
from interest i
cross join jsonb_array_elements(i.items) as ix(item)
where p.interests ? i.interest
) as i on true
A slightly more compact version can be achieved by defining your own aggregate that appends multiple jsonb values (rather than creating an array of arrays, as jsonb_agg() would if applied to the "raw" arrays).
create aggregate jsonb_append_agg(jsonb)
(
sfunc = jsonb_concat(jsonb, jsonb),
stype = jsonb
);
Then you can use it like this:
select p.name,
i.interests
from person p
left join lateral (
select jsonb_append_agg(i.items) as interests
from interest i
where p.interests ? i.interest
) as i on true;
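For comparison, the many-to-many design suggested at the top of this answer could look roughly like the sketch below. The junction table person_interest, the interest_item table, and all column names are made up for illustration and do not come from the question; temp tables are used so the sketch does not touch existing tables.

```sql
-- Hypothetical normalized schema; every name here is an illustrative assumption.
create temp table person   (id int primary key, name text not null);
create temp table interest (id int primary key, name text not null unique);
create temp table interest_item (
    interest_id int  not null references interest (id),
    item        text not null,
    primary key (interest_id, item)
);
create temp table person_interest (
    person_id   int not null references person (id),
    interest_id int not null references interest (id),
    primary key (person_id, interest_id)
);

insert into person values (1, 'Jason');
insert into interest values (1, 'sports'), (2, 'food'), (3, 'news');
insert into interest_item values
    (1, 'football'), (1, 'swimming'),
    (2, 'pasta'), (2, 'bread'),
    (3, 'BBC'), (3, 'New York Times');
insert into person_interest values (1, 1), (1, 2);

-- Plain joins replace the jsonb unnesting. jsonb_agg is only needed
-- if the result must still be a JSON array; the FILTER clause keeps
-- a person without interests from producing [null].
select p.name,
       jsonb_agg(ii.item) filter (where ii.item is not null) as items
from person p
  left join person_interest pi on pi.person_id = p.id
  left join interest_item ii on ii.interest_id = pi.interest_id
group by p.id, p.name;
```

For Jason this collects football, swimming, pasta and bread. The order of elements inside the array is not guaranteed unless an ORDER BY is added inside the aggregate.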

Related

POSTGRESQL: How to join tables on array column?

I would like to ask for your help creating a PostgreSQL query that left joins the categories and products tables and replaces the productnums with the actual product names.
Below you can see the tables structures & desired output for the query
Categories Table:
name | productnums
---------------------------------+------------------------------
Books | {605,614,663,647,645,619,627}
Kitchen | {345,328}
Electronics | {145,146}
Products Table:
id | name
---------------------------------+----------------------
145 | LCD Monitor
147 | Mouse
345 | Glass
Desired Output:
name | productnums
---------------------------------+-------------------------------------------
Electronics | {LCD Monitor,Mouse}
I will appreciate any kind of support.
You can use the ANY operator in a JOIN condition, then use array_agg to aggregate the product names.
select c.name,
array_agg(p.name) as products
from categories c
left join products p on p.id = any(c.productnums)
group by c.name;
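Two details of the query above may matter in practice: a category with no matching product comes back as {NULL} rather than an empty result, and the names are not guaranteed to follow the order of productnums. A possible variant, sketched here under the assumption that productnums is an int[] column, unnests the array WITH ORDINALITY and filters out the NULLs. The temp tables and sample rows are only for illustration; Electronics is given 147 instead of 146 so that it matches the question's desired output.

```sql
-- Sample data modelled on the question; temp tables are for illustration only.
create temp table categories (name text, productnums int[]);
create temp table products (id int, name text);

insert into categories values
    ('Kitchen', '{345,328}'),
    ('Electronics', '{145,147}');
insert into products values
    (145, 'LCD Monitor'), (147, 'Mouse'), (345, 'Glass');

-- u.ord is the position of each id inside productnums, so the
-- aggregated names keep the array's original order.
select c.name,
       array_agg(p.name order by u.ord)
           filter (where p.id is not null) as products
from categories c
  left join lateral unnest(c.productnums) with ordinality as u(id, ord) on true
  left join products p on p.id = u.id
group by c.name;
```

With this data, Electronics yields {LCD Monitor,Mouse} and Kitchen yields {Glass}, because id 328 has no row in products and is filtered out.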

selecting multiple columns but group by only one in postgres

I have a simple table in postgres:
remoteaddr count
142.4.218.156 592
158.69.26.144 613
167.114.209.28 618
Which I pulled using the following:
select remoteaddr,
count (remoteaddr)
from domain_visitors
group by remoteaddr
having count (remoteaddr) > 500
How do I select additional columns and still only group by remoteaddr?
Option 1: You could use the array_agg() function to concatenate the additional column values into a grouped list:
SELECT
remoteaddr,
array_agg(DISTINCT username) AS unique_users,
array_agg(username) AS repeated_users,
count(remoteaddr) as remote_count
FROM domain_visitors
GROUP BY remoteaddr;
This query would return something like the below:
+----------------+---------------------------------+-----------------------------------------------------------------------------------------------------+--------------+
| remoteaddr | unique_users | repeated_users | remote_count |
+----------------+---------------------------------+-----------------------------------------------------------------------------------------------------+--------------+
| 142.4.218.156 | anotheruser,user9688766,vistor1 | user9688766,anotheruser,vistor1,vistor1,vistor1,vistor1,vistor1,anotheruser,anotheruser,anotheruser | 10 |
| 158.69.26.144 | anotheruser,user9688766 | anotheruser,user9688766,user9688766,user9688766,user9688766 | 5 |
| 167.114.209.28 | vistor1 | vistor1 | 1 |
+----------------+---------------------------------+-----------------------------------------------------------------------------------------------------+--------------+
Option 2: You could put your first query in a common table expression (aka a "WITH" clause), and join it against the original table, like this:
WITH grouped_addr AS (
SELECT remoteaddr, count(remoteaddr) AS remote_count
FROM domain_visitors
GROUP BY remoteaddr
)
SELECT ga.remoteaddr, dv.username, ga.remote_count
FROM grouped_addr ga
INNER JOIN domain_visitors dv
ON ga.remoteaddr = dv.remoteaddr
WHERE remote_count > 500;
Bear in mind that this returns one row per original row, so remoteaddr and its count are repeated for every value of the additional columns (in this example, username). This is not usually what you want. Compare the output of both options and see which best suits your purpose.
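If the repeated rows of Option 2 are acceptable, a window function is another way to attach the per-address count to every row without joining the table to itself. This is a different technique from the two options above, shown only as a sketch. The column types and sample rows are assumptions, and the threshold is lowered from 500 to 1 so that the tiny sample returns something.

```sql
-- Assumed column types and made-up rows, for illustration only.
create temp table domain_visitors (remoteaddr text, username text);
insert into domain_visitors values
    ('142.4.218.156', 'user_a'),
    ('142.4.218.156', 'user_b'),
    ('158.69.26.144', 'user_a');

-- count(...) over (partition by ...) computes the group count
-- without collapsing the rows, so username stays available.
select remoteaddr, username, remote_count
from (
    select remoteaddr,
           username,
           count(remoteaddr) over (partition by remoteaddr) as remote_count
    from domain_visitors
) dv
where remote_count > 1;  -- use > 500 against the real table
```

The subquery is needed because a window function cannot be referenced directly in the WHERE clause of the same query level.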

PostgreSQL Group By not working as expected - wants too many inclusions

I have a simple PostgreSQL table that I'm trying to query. Imagine a table like this...
| ID | Account_ID | Iteration |
|----|------------|-----------|
| 1 | 100 | 1 |
| 2 | 101 | 1 |
| 3 | 100 | 2 |
I need to get the ID column for each Account_ID where Iteration is at its maximum value. So, you'd think something like this would work
SELECT "ID", "Account_ID", MAX("Iteration")
FROM "Table_Name"
GROUP BY "Account_ID"
And I expect to get:
| ID | Account_ID | MAX(Iteration) |
|----|------------|----------------|
| 2 | 101 | 1 |
| 3 | 100 | 2 |
But when I do this, Postgres complains:
ERROR: column "ID" must appear in the GROUP BY clause or be used in an aggregate function
But when I add "ID" to the GROUP BY, it destroys the grouping altogether and gives me the whole table!
Is the best way to approach this using the following?
SELECT DISTINCT ON ("Account_ID") "ID", "Account_ID", "Iteration"
FROM "Marketing_Sparks"
ORDER BY "Account_ID" ASC, "Iteration" DESC;
The GROUP BY statement aggregates rows with the same values in the columns included in the group by into a single row. Because this row isn't the same as the original row, you can't have a column that is not in the group by or in an aggregate function. To get what you want, you will probably have to select without the ID column, then join the result to the original table. I don't know PostgreSQL syntax, but I assume it would be something like the following.
-- Identifiers are double-quoted because the table uses mixed-case names.
SELECT t."ID", agg."Account_ID", agg."MIteration"
FROM (SELECT "Account_ID", MAX("Iteration") AS "MIteration"
FROM "Table_Name"
GROUP BY "Account_ID") agg
LEFT JOIN "Table_Name" t ON agg."Account_ID" = t."Account_ID" AND
agg."MIteration" = t."Iteration";
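The DISTINCT ON approach from the question also works in PostgreSQL and avoids the extra join. A minimal sketch with the question's sample rows follows; the temp table is only for illustration, and the trailing "ID" in the ORDER BY is an added tie-breaker that the question did not include.

```sql
-- Sample rows copied from the question's table.
create temp table "Table_Name" ("ID" int, "Account_ID" int, "Iteration" int);
insert into "Table_Name" values (1, 100, 1), (2, 101, 1), (3, 100, 2);

-- DISTINCT ON keeps the first row of each "Account_ID" group in
-- ORDER BY order, i.e. the row with the highest "Iteration".
select distinct on ("Account_ID") "ID", "Account_ID", "Iteration"
from "Table_Name"
order by "Account_ID", "Iteration" desc, "ID";
```

This returns ID 3 for account 100 and ID 2 for account 101. One difference worth knowing: when several rows of an account share the maximum Iteration, the join-based query returns all of them, while DISTINCT ON returns exactly one per account.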

How to use COUNT() in more than one column?

Let's say I have these 3 tables
Countries             ProvOrStates          MajorCities
-----+-------------   -----+-----+------    -----+-------+------
Id   | CountryName    Id   | CId | Name     Id   | POSId | Name
-----+-------------   -----+-----+------    -----+-------+------
1    | USA            1    | 1   | NY       1    | 1     | NYC
How do you get something like
---------------------------------------------
CountryName | ProvinceOrState | MajorCities
| (Count) | (Count)
---------------------------------------------
USA | 50 | 200
---------------------------------------------
Canada | 10 | 57
So far, the way I see it:
Run the first SELECT COUNT (GROUP BY Countries.Id) on Countries JOIN ProvOrStates,
store the result in a table variable,
Run the second SELECT COUNT (GROUP BY Countries.Id) on ProvOrStates JOIN MajorCities,
Update the table variable based on the Countries.Id
Join the table variable with Countries table ON Countries.Id = Id of the table variable.
Is there a possibility to run just one query instead of multiple intermediary queries? I don't know if it's even feasible as I've tried with no luck.
Thanks for helping
Use a subquery (derived table) or a view.
Basically, if you have 3 tables:
select * from [TableOne] as T1
join
(
select T2.DepName, T2.[Column] as T2Column, T3.[Column] as T3Column
from [TableTwo] as T2
join [TableThree] as T3
on T2.ConditionColumn = T3.ConditionColumn
) AS DerivedTable
on T1.DepName = DerivedTable.DepName
Once you are sure it works, you can create a view that contains the three-table join and query it whenever you want.
PS: if the tables have identical column names, or you get this message
"The column 'ColumnName' was specified multiple times for 'Table'. "
you can use column aliases to solve the problem.
This answer comes from @lotzInSpace.
SELECT ct.[CountryName], COUNT(DISTINCT p.[Id]), COUNT(DISTINCT c.[Id])
FROM dbo.[Countries] ct
LEFT JOIN dbo.[Provinces] p
ON ct.[Id] = p.[CountryId]
LEFT JOIN dbo.[Cities] c
ON p.[Id] = c.[ProvinceId]
GROUP BY ct.[CountryName]
It's working. I'm using LEFT JOIN instead of INNER JOIN because with INNER JOIN a country without provinces, or a province without cities, would not appear in the result.
Thanks again @lotzInSpace.
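The same idea can be written against the table and column names from the question (ProvOrStates.CId, MajorCities.POSId). The sketch below assumes CId references Countries.Id and POSId references ProvOrStates.Id, which the question implies but does not state. The sample rows are made up and supplied inline through VALUES so the query can be run on its own; the output column names ProvinceCount and CityCount are also illustrative.

```sql
WITH Countries AS (
    SELECT * FROM (VALUES (1, 'USA'), (2, 'Canada')) AS v (Id, CountryName)
),
ProvOrStates AS (
    SELECT * FROM (VALUES (1, 1, 'NY'), (2, 1, 'CA'), (3, 2, 'ON')) AS v (Id, CId, Name)
),
MajorCities AS (
    SELECT * FROM (VALUES (1, 1, 'NYC'), (2, 1, 'Buffalo'), (3, 2, 'LA')) AS v (Id, POSId, Name)
)
-- Joining cities repeats each province once per city, so DISTINCT is
-- needed to count every province (and every city) only once.
SELECT ct.CountryName,
       COUNT(DISTINCT p.Id) AS ProvinceCount,
       COUNT(DISTINCT c.Id) AS CityCount
FROM Countries ct
LEFT JOIN ProvOrStates p ON p.CId = ct.Id
LEFT JOIN MajorCities c ON c.POSId = p.Id
GROUP BY ct.CountryName;
```

With these rows, USA has 2 provinces and 3 cities, and Canada has 1 province and 0 cities. Canada still shows up only because of the LEFT JOINs, and COUNT ignores the NULL city Id.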

Find all multipolygons from one table within another

So, I've got two tables - PLUTO (pieces of land), and NYZMA (rezoning boundaries). They look like:
pluto nyzma
id | geom name | geom
-------------------- -------------------
1 | MULTIPOLYGON(x) A | MULTIPOLYGON(a)
2 | MULTIPOLYGON(y) B | MULTIPOLYGON(b)
And I want it to spit out something like this, assuming that PLUTO record 1 is in multipolygons A and B, and PLUTO record 2 is in neither:
pluto_id | nyzma_id
-------------------
1 | [A, B]
2 |
How do I, for every PLUTO record's corresponding geometry, cycle through each NYZMA record, and print the names of any whose geometry matches?
Join the two tables using the spatial function ST_Contains. Then use GROUP BY and ARRAY_AGG in the main query:
WITH subquery AS (
SELECT pluto.id, nyzma.name
FROM pluto LEFT OUTER JOIN nyzma
ON ST_Contains(nyzma.geom, pluto.geom)
)
SELECT id, array_agg(name) FROM subquery GROUP BY id;
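Because of the LEFT OUTER JOIN, a PLUTO record that lies in no NYZMA polygon is aggregated as {NULL} rather than as an empty result. A possible variant adds a FILTER clause to drop those NULLs. The sketch below needs the PostGIS extension, and the square geometries are invented purely to have something to test against.

```sql
-- Requires PostGIS; the geometries below are made-up squares.
create temp table pluto (id int, geom geometry);
create temp table nyzma (name text, geom geometry);

insert into pluto values
    (1, ST_GeomFromText('MULTIPOLYGON(((1 1, 2 1, 2 2, 1 2, 1 1)))')),
    (2, ST_GeomFromText('MULTIPOLYGON(((50 50, 51 50, 51 51, 50 51, 50 50)))'));
insert into nyzma values
    ('A', ST_GeomFromText('MULTIPOLYGON(((0 0, 10 0, 10 10, 0 10, 0 0)))')),
    ('B', ST_GeomFromText('MULTIPOLYGON(((0 0, 5 0, 5 5, 0 5, 0 0)))'));

-- FILTER removes the NULL name produced by unmatched pluto rows,
-- so record 2 gets NULL instead of {NULL}.
select pluto.id,
       array_agg(nyzma.name order by nyzma.name)
           filter (where nyzma.name is not null) as nyzma_names
from pluto
  left join nyzma on ST_Contains(nyzma.geom, pluto.geom)
group by pluto.id
order by pluto.id;
```

Record 1 sits inside both squares and returns {A,B}; record 2 is outside both and returns NULL.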