Aggregate results based on array of strings in JSON? - postgresql

I have a table with a field called 'keywords'. It is a JSONB field with an array of keyword metadata, including the keyword's name.
What I would like is to query the counts of all these keywords by name, i.e. aggregate on keyword name and count(id). All the examples of GROUP BY queries I can find result in the grouping occurring on the full list (i.e. they only give me counts where two records have the same entire set of keywords).
So is it possible to somehow expand the list of keywords in a way that lets me get these counts?
If not, I am still at the planning stage and could refactor my schema to better handle this.
"keywords": [
  {
    "addedAt": "2017-04-07T21:11:00+0000",
    "addedBy": {
      "email": "foo@bar.com"
    },
    "keyword": {
      "name": "Animal"
    }
  },
  {
    "addedAt": "2017-04-07T20:54:00+0000",
    "addedBy": {
      "email": "foo@bar.comm"
    },
    "keyword": {
      "name": "Mammal"
    }
  }
]

step-by-step demo: db<>fiddle
SELECT
    elems -> 'keyword' ->> 'name' AS keyword,              -- 2
    COUNT(*) AS count
FROM
    mytable t,
    jsonb_array_elements(myjson -> 'keywords') AS elems    -- 1
GROUP BY 1                                                 -- 3
1. Expand the array records into one row per element.
2. Get the keyword's name.
3. Group these text values.
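A minimal, self-contained way to try this out (table and column names are taken from the question; the CROSS JOIN LATERAL spelling is equivalent to the implicit comma join above):

```sql
CREATE TABLE mytable (id serial PRIMARY KEY, myjson jsonb);

INSERT INTO mytable (myjson) VALUES
  ('{"keywords": [{"keyword": {"name": "Animal"}}, {"keyword": {"name": "Mammal"}}]}'),
  ('{"keywords": [{"keyword": {"name": "Animal"}}]}');

-- One output row per distinct keyword name, with its count:
SELECT elems -> 'keyword' ->> 'name' AS keyword,
       COUNT(*) AS count
FROM mytable t
CROSS JOIN LATERAL jsonb_array_elements(t.myjson -> 'keywords') AS elems
GROUP BY 1
ORDER BY count DESC;
-- keyword | count
-- Animal  | 2
-- Mammal  | 1
```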

Related

Query by object property in child array

I have jsonb documents in a Postgres table like this:
{
  "Id": "267f9e75-efb8-4331-8220-932b023b3a34",
  "Name": "Some File",
  "Tags": [
    {
      "Key": "supplier",
      "Value": "70074"
    },
    {
      "Key": "customer",
      "Value": "1008726"
    }
  ]
}
My working query to find documents where Tags.Key is supplier is this:
SELECT *
FROM docs
WHERE EXISTS (
    SELECT TRUE
    FROM jsonb_array_elements(data -> 'Tags') x
    WHERE x ->> 'Key' IN ('supplier')
);
I wanted to find a shorter way and tried this:
select * from docs where data->'Tags' @> '[{ "Key":"supplier"}]';
But then I get this error for the syntax of @>:
<operator>, AT, EXCEPT, FETCH, FOR, GROUP, HAVING, INTERSECT, ISNULL, LIMIT, NOTNULL, OFFSET, OPERATOR, ORDER, TIME, UNION, WINDOW, WITH or '[' expected, got '@'
My questions are: is there a shorter working query and what's wrong with my second query?
It turned out to be an IDE issue: https://youtrack.jetbrains.com/issue/RIDER-83829/Postgres-json-query-error-but-works-in-DataGrip
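For reference, the shorter containment form does work server-side with the `@>` operator, and when applied to the whole column it can be backed by a GIN index (a sketch; the index name is made up):

```sql
-- jsonb_path_ops is a compact GIN operator class that supports @>:
CREATE INDEX docs_data_gin_idx ON docs USING gin (data jsonb_path_ops);

-- Containment on the whole document; can use the index above:
SELECT *
FROM docs
WHERE data @> '{"Tags": [{"Key": "supplier"}]}';
```

Containment on an extracted sub-document (`data -> 'Tags' @> ...`) cannot use that whole-column index, which is why the predicate is written against `data` directly.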

Indexing of string values in JSONB nested arrays in Postgres

I have a complex object stored as JSONB (PostgreSQL 12) with arrays nested inside nested arrays, and so on. I want to search for all invoices that match specific criteria.
create table invoice (
    invoice_number text primary key not null,
    parts jsonb,
    ...
);
Object:
"parts": [
  {
    "groups": [
      {
        "categories": [
          {
            "items": [
              {
                ...
                "articleName": "article1",
                "articleSize": "M"
              },
              {
                ...
                "articleName": "article2",
                "articleSize": "XXL"
              }
            ]
          }
        ]
      }
    ]
  },
  {
    "groups": [
      ...
    ]
  }
]
I've built a native query to search for items with a specific articleName:
select *
from invoice i,
     jsonb_array_elements(i.parts) parts,
     jsonb_array_elements(parts -> 'groups') groups,
     jsonb_array_elements(groups -> 'categories') categories,
     jsonb_array_elements(categories -> 'items') items
where items ->> 'articleName' like '%name%'
  and items ->> 'articleSize' = 'XXL';
I assume I could improve search speed with indexing. I've read about trigram indexes. Would that be the best type of index for my case? If yes, how do I build it for such a complex object?
Thanks in advance for any advice.
The only option that might speed this up is to create a GIN index on the parts column and use a JSON path operator:
select *
from invoice
where parts @? '$.parts[*].groups[*].categories[*].items[*] ? (@.articleName like_regex "name" && @.articleSize == "XXL")'
But I doubt this is going to be fast enough, even if that uses the index.
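The GIN index mentioned here can be sketched like this (the index name is made up; jsonb_path_ops is a compact operator class that supports the containment and jsonpath match operators):

```sql
CREATE INDEX invoice_parts_gin_idx ON invoice USING gin (parts jsonb_path_ops);
```

Note that like_regex conditions inside a jsonpath generally cannot be served by a GIN index; only the equality part of the filter can benefit, which is one reason the speedup may be limited.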

PostgreSQL query nested object in array by WHERE

Using PostgreSQL 13.4, I have a query like this, which is used for a GraphQL endpoint:
export const getPlans = async (filter: {
  offset: number;
  limit: number;
  search: string;
}): Promise<SearchResult<Plan>> => {
  const query = sql`
    SELECT
      COUNT(p.*) OVER() AS total_count,
      p.json,
      TO_CHAR(MAX(pp.published_at) AT TIME ZONE 'JST', 'YYYY-MM-DD HH24:MI') AS last_published_at
    FROM
      plans_json p
    LEFT JOIN
      published_plans pp ON p.plan_id = pp.plan_id
    WHERE
      1 = 1
  `;
  if (filter.search)
    query.append(sql`
      AND
      (
        p.plan_id::text ILIKE ${`${filter.search}%`}
        OR
        p.json->>'name' ILIKE ${`%${filter.search}%`}
        OR
        p.json->'activities'->'venue'->>'name' ILIKE ${`%${filter.search}%`}
      )
    `);
  // The above OR line or this alternative didn't work:
  // p @> '{"activities":[{"venue":{"name":${`%${filter.search}`}}}]}'
  ...
}
The data I'm accessing looks like this:
{
  "data": {
    "plans": {
      "records": [
        {
          "id": "345sdfsdf",
          "name": "test1",
          "activities": [{...},{...}]
        },
        {
          "id": "abc123",
          "name": "test2",
          "activities": [
            {
              "name": "test2",
              "id": "123abc",
              "venue": {
                "name": "I WANT THIS VALUE" <------------------------
              }
            }
          ]
        }
      ]
    }
  }
}
Since the search parameter provided to this query varies, I can only make changes in the WHERE block, in order to avoid affecting the other two working searches.
I tried 2 approaches (see above query), but neither worked.
Using TypeORM would be an alternative.
EDIT: For example, could I make that statement work somehow? I want to compare VALUE with the search string that is provided as argument:
p.json ->> '{"activities":[{"venue":{"name": VALUE}}]}' ILIKE ${`%${filter.search}`}
First, you should use the jsonb type instead of the json type in Postgres, for many reasons; see the manual:
... In general, most applications should prefer to store JSON data as
jsonb, unless there are quite specialized needs, such as legacy
assumptions about ordering of object keys...
Then you can use the following query to get the whole JSON data based on the search_parameter provided via the user interface, as long as the search_parameter is a regular expression (see the manual):
SELECT query
FROM plans_json p
CROSS JOIN LATERAL jsonb_path_query(p.json :: jsonb, FORMAT('$ ? (@.data.plans.records[*].activities[*].venue.name like_regex "%s")', search_parameter) :: jsonpath) AS query
If you only need to retrieve part of the JSON data, move the corresponding part of the jsonpath from the filter section '? (@...)' into the '$' section. For instance, if you just want to retrieve the jsonb object {"name": "test2", ...}, the query is:
SELECT query
FROM plans_json p
CROSS JOIN LATERAL jsonb_path_query(p.json :: jsonb, FORMAT('$.data.plans.records[*].activities[*] ? (@.venue.name like_regex "%s")', search_parameter) :: jsonpath) AS query
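If only row filtering is needed (as in the WHERE fragment the question wants to extend), `jsonb_path_exists` returns a boolean and fits that spot. A sketch, under the assumption that each stored p.json row starts at the plan object, so the activities array sits at the top level:

```sql
SELECT p.json
FROM plans_json p
WHERE jsonb_path_exists(
  p.json::jsonb,
  format('$.activities[*].venue.name ? (@ like_regex "%s")', search_parameter)::jsonpath
);
```

The format() call splices the user-supplied pattern into the jsonpath the same way as in the answer's queries; the search_parameter must still be a valid regular expression.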

PostgreSQL query of JSONB array by nested object

I have the following array JSON data structure:
{ "arrayOfObjects":
  [
    {
      "fieldA": "valueA1",
      "fieldB": { "fieldC": "valueC", "fieldD": "valueD" }
    },
    {
      "fieldA": "valueA",
      "fieldB": { "fieldC": "valueC", "fieldD": "valueD" }
    }
  ]
}
I would like to select all records where fieldD matches my criteria (and fieldC is unknown). I've seen similar answers, such as Query for array elements inside JSON type, but there the field being queried is a simple string (akin to searching on fieldA in my example), whereas my problem is that I would like to query based on an object within an object within the array.
I've tried something like select * from myTable where jsonData -> 'arrayOfObjects' @> '[ { "fieldB": { "fieldD": "valueD" } } ]' but that doesn't seem to work.
How can I achieve what I want?
You can execute a "contains" query on the JSONB field directly and pass the minimal structure you're looking for:
SELECT *
FROM mytable
WHERE json_data @> '{"arrayOfObjects": [{"fieldB": {"fieldD": "valueD"}}]}'::JSONB;
This of course assumes that fieldD is always nested under fieldB, but that's a fairly low bar to clear in terms of schema consistency.
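If that assumption ever fails (fieldD nested under some key other than fieldB), the Postgres-specific `.**` jsonpath accessor can match at any depth; a sketch using strict mode, which the manual recommends when combining `.**` with filters:

```sql
SELECT *
FROM mytable
WHERE jsonb_path_exists(
  json_data,
  'strict $.arrayOfObjects[*].** ? (@.fieldD == "valueD")'
);
```

This trades the index-friendliness of the `@>` containment form for flexibility about where fieldD lives.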

MongoDB query on Index

Each element in the collection has the following format:
{
"Name": "Some Name",
"Description": "Some description",
"Tags": ["java", "code", "some tag"]
}
I have created an index on the field "Tags" as follows:
db.Establishments.ensureIndex({ Tags: 1 });
Now I want to make a query to find all the tags that begin with "ja", for example (for auto-complete suggestions).
Instead of querying the collection, is there a way to query the index directly, or an efficient query that involves operations on the index only?
I suppose that you actually want to query the tag attribute and return distinct values for your autocompletion feature, right?
This is quite easy using the distinct method:
db.Establishments.distinct( 'Tags' )
See http://docs.mongodb.org/manual/reference/method/db.collection.distinct/ for more info on distinct queries
As to your question about index queries: you can't query an index directly; the index is there for query optimization. Using distinct on an indexed attribute will be quick.
To add a filter to the distinct call, execute:
db.Establishments.distinct( 'Tags', { 'Tags': /^ja/ } )
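Note that distinct with that filter returns every tag from the matching documents, not only the tags that start with "ja". An aggregation pipeline can restrict the values themselves (a sketch):

```javascript
db.Establishments.aggregate([
  { $unwind: "$Tags" },            // one document per tag value
  { $match: { Tags: /^ja/ } },     // keep only tags starting with "ja"
  { $group: { _id: "$Tags" } }     // de-duplicate the remaining tags
])
```

The $match stage on the anchored regex can still use the Tags index, so this stays efficient for autocompletion.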