How to join tables by overlapping jsonb values? (PostgreSQL)

To clarify the question, here's an example:
Table venues
 id | match_ids (jsonb)
----+-------------------
  1 | []
  2 | [112]
  3 | ["25", 112]
  4 | [25, 112]
  5 | ["112"]
  6 | ["999"]
Table sports
 id | object (jsonb)
----+--------------------------
  1 | {"match_ids": [25, 112]}
  2 | {}
  3 |
To join venues and sports so that joined rows share at least one match id, what does the SQL statement have to look like?
Here's the expected outcome:
sports.id | venue.id | venue.match_ids
----------+----------+----------------
        1 |        2 | [112]
        1 |        3 | ["25", 112]
        1 |        4 | [25, 112]
        1 |        5 | ["112"]
I wrote the following...
select * from venues
join sports on venues.match_ids <@ sports.object::jsonb->'match_ids';
... but this resulted in
ERROR: operator does not exist: boolean -> unknown
LINE 4: sports.object::jsonb->'match_ids';
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
(Here's a fiddle: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=d5263069f490828288c30b6a51a181c8)
Any advice will be appreciated!
PS: I use PostgreSQL v13.

I managed to solve the issue by moving from jsonb to array and working with array intersect &&
select * from venues
join sports on ARRAY(SELECT jsonb_array_elements(match_ids)::varchar)
&&
ARRAY(SELECT jsonb_array_elements(sports.object::jsonb->'match_ids')::varchar);
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=5143f236235e4411c729f41a60c69012
EDIT from the original poster
Thank you so much @Ftisiot for this brilliant solution.
However, I found that this approach works only if match_ids in venues contains integers. If the column also holds text values, it may not. To match both text and integers, jsonb_array_elements_text should be used.
e.g.
select * from venues
join sports on ARRAY(SELECT jsonb_array_elements_text(match_ids))
&&
ARRAY(SELECT jsonb_array_elements_text(sports.object::jsonb->'match_ids'));
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=64f8fac38ccf73d38834c22e90ba4b2c
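To see outside the database why the _text variant matters, here is a small Python sketch using the element values from the example tables above. Casting a jsonb element to varchar keeps the JSON quoting, so the string "112" and the number 112 never compare equal; jsonb_array_elements_text unwraps strings, so both normalize to the same text.

```python
import json

# venues.match_ids and sports.object->'match_ids' from the example above
venues = {1: [], 2: [112], 3: ["25", 112], 4: [25, 112], 5: ["112"], 6: ["999"]}
sport_ids = [25, 112]  # sports.id = 1

def as_varchar(elem):
    # jsonb_array_elements(...)::varchar keeps JSON syntax: 112 -> '112', "112" -> '"112"'
    return json.dumps(elem)

def as_text(elem):
    # jsonb_array_elements_text(...) unwraps strings: both 112 and "112" -> '112'
    return str(elem)

def overlapping(norm):
    # The && overlap test: any element in common after normalization?
    return {vid for vid, ids in venues.items()
            if {norm(e) for e in ids} & {norm(e) for e in sport_ids}}

print(overlapping(as_varchar))  # misses venue 5: '"112"' != '112'
print(overlapping(as_text))     # finds venues 2, 3, 4, 5 as expected
```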

Change the precedence with parentheses:
... on venues.match_ids <@ (sports.object::jsonb->'match_ids')

Related

Postgres 13 join from another table on JSONB array of String

I have a table with a JSONB column in which we store identifiers from another table as a JSON array of strings. How can I join the tables?
Table Customer:
CustomerID | Name | Campaigns (JSONB)
-----------+------+-----------------------------------------
         1 | John | [ "rxuatoak", "vsnxcvdsl", "jkiasokd" ]
         2 | Mick | [ "jdywmsks", "nxbsvwios", "jkiasokd" ]
Table Campaign:
CampaignID | Identifier | CampaignName
-----------+------------+-------------
         1 | rxuatoak   | Alpha
         2 | vsnxcvdsl  | Bravo
         3 | jkiasokd   | Charlie
         4 | jdywmsks   | Delta
         5 | nxbsvwios  | Echo
Result something like:
CustomerID | Name | CampaignNames
-----------+------+----------------------
         1 | John | Alpha, Bravo, Charlie
         2 | Mick | Delta, Echo, Charlie
I tried many ways, and could only find online help for json objects inside the jsonb column. My jsonb column holds a simple array of strings.
Using POSTGRES 13
You can JOIN the two tables on the condition that the campaign's identifier appears in the customer's Campaigns array, using the ? operator (which tests whether a string exists as a top-level array element). Then aggregate with STRING_AGG, grouping by "CustomerID" and "Name":
SELECT customer.CustomerID,
customer.Name_,
STRING_AGG(campaign.CampaignName, ',') AS CampaignNames
FROM customer
INNER JOIN campaign
ON customer.Campaigns ? campaign.Identifier
GROUP BY customer.CustomerID,
customer.Name_
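The join-plus-aggregate logic can be mirrored outside the database. A Python sketch with the sample data, where the ? test becomes a plain membership check and STRING_AGG becomes str.join:

```python
customers = [(1, "John", ["rxuatoak", "vsnxcvdsl", "jkiasokd"]),
             (2, "Mick", ["jdywmsks", "nxbsvwios", "jkiasokd"])]
campaigns = [(1, "rxuatoak", "Alpha"), (2, "vsnxcvdsl", "Bravo"),
             (3, "jkiasokd", "Charlie"), (4, "jdywmsks", "Delta"),
             (5, "nxbsvwios", "Echo")]

result = {}
for cust_id, name, idents in customers:
    # customer.Campaigns ? campaign.Identifier -> membership test
    names = [cname for _, ident, cname in campaigns if ident in idents]
    # STRING_AGG(campaign.CampaignName, ',') -- note that without an
    # ORDER BY inside the aggregate, SQL does not guarantee the order
    result[(cust_id, name)] = ",".join(names)

print(result[(1, "John")])  # Alpha,Bravo,Charlie
```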

PostgreSQL: get first non null value per group

What I'd like to obtain is the first occurrence of a non-null value per category.
If there are only null values, the result for that category shall be NULL.
For a table like this:
Category Value
1 NULL
1 1922
2 23
2 99
3 NULL
3 NULL
the result should be
Category Value
1 1922
2 23
3 NULL
How can this be achieved using Postgres?
Unfortunately, the two features that would make this trivial are not implemented in PostgreSQL:
IGNORE NULLS in FIRST_VALUE, LAST_VALUE
FILTER clause in non-aggregate window functions
However, you can get the desired result by combining GROUP BY with array_agg, which does support the FILTER clause, and then picking the first element using square-bracket syntax. (Recall that PostgreSQL array indexing starts at 1.)
Also, I would advise providing an explicit ordering for the aggregation step; otherwise the value that ends up as the first element depends on the query plan and the physical data layout of the underlying table.
WITH vals (category, val) AS ( VALUES
(1,NULL),
(1,1922),
(2,23),
(2,99),
(3,NULL),
(3,NULL)
)
SELECT
category
, (ARRAY_AGG(val) FILTER (WHERE val IS NOT NULL))[1]
FROM vals
GROUP BY 1
produces the following output:
category | array_agg
----------+-----------
1 | 1922
3 |
2 | 23
(3 rows)
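The FILTER-then-index trick amounts to scanning each group in order and keeping the first non-null value. A Python sketch of the same logic, assuming the input order is the intended ordering:

```python
rows = [(1, None), (1, 1922), (2, 23), (2, 99), (3, None), (3, None)]

first_non_null = {}
for category, val in rows:
    first_non_null.setdefault(category, None)  # NULL-only groups stay NULL
    if first_non_null[category] is None and val is not None:
        # (ARRAY_AGG(val) FILTER (WHERE val IS NOT NULL))[1]
        first_non_null[category] = val

print(first_non_null)  # {1: 1922, 2: 23, 3: None}
```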

Choosing data structure/storage solution for complex geo queries

I have a dataset of entities with their type and lat/long. Like this:
Name Type Lat Long
House1 Big 1 2
House11 Bigger 2 2
House12 Biggest 3 2
House13 Small 4 2
House14 Medium 5 2
So these are houses with their type and location. Now I need to answer queries like: "Find all houses of type Big which have a Small and a Medium house within a 10 km radius"
What kind of data structure/storage solution would be right here? I looked at Elasticsearch and Redis, but it looks like I would need to iterate over all houses of the given type (Big for the sample query above) to answer this.
It's perfectly feasible directly from PostgreSQL with PostGIS.
Considering your table structure ...
CREATE TEMPORARY TABLE t (name TEXT, type TEXT, geom GEOGRAPHY);
... and your test data ...
INSERT INTO t VALUES ('House1','Big', ST_MakePoint(1,2));
INSERT INTO t VALUES ('House11','Bigger', ST_MakePoint(2,2));
INSERT INTO t VALUES ('House12','Biggest', ST_MakePoint(3,2));
INSERT INTO t VALUES ('House13','Small', ST_MakePoint(4,2));
INSERT INTO t VALUES ('House14','Medium', ST_MakePoint(5,2));
(Note: it makes no sense here to split lat/long into separate columns. PostGIS can store both in a single GEOGRAPHY or GEOMETRY column. See ST_MakePoint for more details.)
"Find all house of type Big which have a Small and a Medium house in
its 10km radius"
Try something like this using ST_Distance:
WITH j AS (SELECT * FROM t WHERE type = 'Big')
SELECT
j.name,j.type,
ST_Distance(j.geom,t.geom) AS distance,
t.name, t.type
FROM j,t
WHERE
ST_Distance(j.geom,t.geom) > 10000 AND
t.type IN ('Small','Medium');
name | type | distance | name | type
--------+------+-----------------+---------+--------
House1 | Big | 333756.3481116 | House13 | Small
House1 | Big | 445008.41595616 | House14 | Medium
(2 rows)
(This query returns records that are more than 10,000 meters away from the Big-type house. Just adapt the WHERE condition to your needs.)
EDIT: Query based on the comments.
WITH j AS (SELECT *, ARRAY(SELECT DISTINCT t2.type
FROM t t2
WHERE t2.type IN ('Small','Medium') AND
ST_Distance(t2.geom,t1.geom) < 100000
) AS nearHouseType
FROM t t1 WHERE type = 'Big')
SELECT *
FROM j
WHERE j.nearHouseType #> '{Medium, Small}'::TEXT[]
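To sanity-check the radius logic outside PostGIS, here is a hypothetical Python sketch using a spherical (haversine) distance. PostGIS's GEOGRAPHY ST_Distance computes on a spheroid, so its figures differ slightly from these:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lon1, lat1, lon2, lat2):
    # Great-circle distance in meters on a sphere of radius 6371 km
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

# name, type, lon, lat -- note ST_MakePoint takes (long, lat) in that order
houses = [("House1", "Big", 1, 2), ("House11", "Bigger", 2, 2),
          ("House12", "Biggest", 3, 2), ("House13", "Small", 4, 2),
          ("House14", "Medium", 5, 2)]

def near_types(name, radius_m):
    # Which of the wanted types exist within radius_m of the named house?
    _, _, lon, lat = next(h for h in houses if h[0] == name)
    return {t for n, t, lon2, lat2 in houses
            if t in ("Small", "Medium") and haversine_m(lon, lat, lon2, lat2) < radius_m}

print(near_types("House1", 500_000))  # both types within 500 km
print(near_types("House1", 10_000))   # nothing within 10 km in this toy data
```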

Getting categoryid for more than one shortname passed

I have the following tables:
business
 id | catid | subcatid
----+-------+------------
 10 | {1}   | {10,20}
 20 | {2}   | {30,40}
 30 | {3}   | {50,60,70}
cat_subcat
 catid | shortname | parent_id | bid
-------+-----------+-----------+-----
     1 | A         |           |  10
     2 | B         |           |  20
     3 | c         |           |  30
    10 | x         |         1 |  10
    20 | y         |         1 |  10
    30 | z         |         2 |  20
    40 | w         |         2 |  20
Both tables are related through these ids. The problem I am getting is outlined below. Here is my current query:
SELECT ARRAY[category_id]::int[] from cat_subcat
where parentcategoryid IS not NULL and shortname ilike ('x,y');
I want to get the category_id for the shortnames entered, but my query is not giving the proper output. If I pass one shortname it retrieves the category_id, but if I pass more than one it displays nothing. Please tell me how to get the category_id for more than one shortname.
To actually use pattern matching with ILIKE, you cannot use a simple IN expression. Instead, you need ILIKE ANY (...) or ILIKE ALL (...), depending on whether you want the tests ORed or ANDed.
Also, your ARRAY constructor will be applied to individual rows, which seems rather pointless. I assume you want this instead (educated guess):
SELECT array_agg(catid) AS cats
FROM cat_subcat
WHERE parent_id IS NOT NULL
AND shortname ILIKE ANY ('{x,y}');
Well, as long as you don't use wildcards (%, _) in your pattern, you can translate this to:
AND lower(shortname) IN ('x','y');
But that would be rather pointless, since Postgres internally converts this to:
AND lower(shortname) = ANY ('{x,y}');
... before evaluating it.
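For the wildcard-free case, the equivalence of ILIKE ANY and lower(...) IN can be sketched in Python (cat_subcat rows as above, simplified to drop the bid column; top-level categories carry no parent_id):

```python
# catid, shortname, parent_id
cat_subcat = [(1, "A", None), (2, "B", None), (3, "c", None),
              (10, "x", 1), (20, "y", 1), (30, "z", 2), (40, "w", 2)]

patterns = {"x", "y"}

# shortname ILIKE ANY ('{x,y}') without wildcards
# is the same test as lower(shortname) IN ('x', 'y')
cats = [catid for catid, shortname, parent_id in cat_subcat
        if parent_id is not None and shortname.lower() in patterns]

print(cats)  # [10, 20] -- what array_agg(catid) collects
```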

Perl + PostgreSQL-- Selective Column to Row Transpose

I'm trying to find a way to use Perl to further process PostgreSQL output. If there's a better way to do this in PostgreSQL itself, please let me know. I basically need to pick certain columns (Realtime, Value) and concatenate their values into a single row per group while keeping ID and CAT.
First time posting, so please let me know if I missed anything.
Input:
ID CAT Realtime Value
A 1 time1 55
A 1 time2 57
B 1 time3 75
C 2 time4 60
C 3 time5 66
C 3 time6 67
Output:
ID CAT Time Values
A 1 time1,time2 55,57
B 1 time3 75
C 2 time4 60
C 3 time5,time6 66,67
You could do this most simply in Postgres like so, using array columns:
CREATE TEMP TABLE output AS SELECT
id, cat, ARRAY_AGG(realtime) as time, ARRAY_AGG(value) as values
FROM input GROUP BY id, cat;
Then select whatever you want out of the output table.
SELECT id
, cat
, string_agg(realtime, ',') AS realtimes
, string_agg(value::text, ',') AS values
FROM input
GROUP BY 1, 2
ORDER BY 1, 2;
string_agg() requires PostgreSQL 9.0 or later and concatenates all values into a delimiter-separated string, while array_agg() (8.4+) creates an array out of the input values.
About the 1, 2: I quote the manual on the SELECT command:
GROUP BY clause
expression can be an input column name, or the name or ordinal number
of an output column (SELECT list item), or ...
ORDER BY clause
Each expression can be the name or ordinal number of an output column
(SELECT list item), or
Emphasis mine. So that's just notational convenience. Especially handy with complex expressions in the SELECT list.
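And since the question started from Perl post-processing: the same grouping is easy in any host language. A Python sketch of what string_agg does here (dict insertion order stands in for ORDER BY 1, 2, assuming the input is already sorted):

```python
rows = [("A", 1, "time1", 55), ("A", 1, "time2", 57), ("B", 1, "time3", 75),
        ("C", 2, "time4", 60), ("C", 3, "time5", 66), ("C", 3, "time6", 67)]

groups = {}
for id_, cat, realtime, value in rows:
    times, values = groups.setdefault((id_, cat), ([], []))
    times.append(realtime)        # string_agg(realtime, ',')
    values.append(str(value))     # string_agg wants text, hence the str()/::text

output = [(id_, cat, ",".join(t), ",".join(v))
          for (id_, cat), (t, v) in groups.items()]

print(output[0])  # ('A', 1, 'time1,time2', '55,57')
```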