Postgres: add an additional field to each element of a JSON array payload before inserting it as a record set into the database - postgresql

I'm creating a stored procedure in Postgres which takes a JSON payload and inserts it into a table using json_to_recordset.
Before inserting the record set into the SQL table with that command, I want to add an additional field to each element of the JSON array.
Is this possible?
For each element of the array I want to add "current_status": "pending". Before:
[
  {
    "batch_id": "40",
    "state_id": "10"
  },
  {
    "batch_id": "40",
    "state_id": "10"
  }
]
After:
[
  {
    "batch_id": "40",
    "state_id": "10",
    "current_status": "pending"
  },
  {
    "batch_id": "40",
    "state_id": "10",
    "current_status": "pending"
  }
]
Another option is updating only the NEW records in the table after the fact.
I'm new to postgres and have been reading through the documentation.

Based on your added comment, the current_status = 'pending' should be added as part of the insert into your target table instead of appending the key to the json objects.
insert into target_table (batch_id, state_id, current_status)
select batch_id, state_id, 'pending' as current_status
from json_to_recordset(<json>) as x(batch_id text, state_id text);
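Since you mentioned a stored procedure, here is a minimal sketch of wrapping that insert in one (the procedure name, target table and columns are assumptions for illustration; CREATE PROCEDURE requires PostgreSQL 11+, on older versions use a function instead):
-- Hypothetical procedure wrapping the insert above
CREATE PROCEDURE import_batches(payload json)
LANGUAGE sql
AS $$
    INSERT INTO target_table (batch_id, state_id, current_status)
    SELECT batch_id, state_id, 'pending'
    FROM json_to_recordset(payload) AS x(batch_id text, state_id text);
$$;

-- Usage:
CALL import_batches('[{"batch_id": "40", "state_id": "10"}]'::json);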

Adding to Mike's correct answer: in the case where we want to modify an existing table of json arrays to include the status, we may do the following:
-- First create the table
CREATE TABLE myJson AS
SELECT '[
  {
    "batch_id": "40",
    "state_id": "10"
  },
  {
    "batch_id": "50",
    "state_id": "60"
  }
]'::json AS js;

WITH unnest_and_concat AS (
  SELECT json_array_elements(js)::jsonb || json_build_object('current_status', 'pending')::jsonb AS jee
  FROM myJson
)
SELECT json_agg(jee)::json
FROM unnest_and_concat;
Of course this is only intended to work for the illustration table above. If the objective is to update the entire table, then we can combine this (ideally with a LATERAL join) with an UPDATE statement, along the lines of this template (a concrete sketch follows):
UPDATE myJson
SET old_col=new_col
FROM <subquery or table> AS new_table
WHERE myJson.id = new_table.id;
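For the illustration table, a concrete version of that pattern could look like the sketch below. It assumes the table also has an id primary key (the demo table above only has the js column, so you would need to add one, or key on ctid instead):
-- Sketch: rewrite every row's js array, appending current_status to each element
UPDATE myJson m
SET js = agg.new_js
FROM (
    SELECT t.id,
           json_agg(elem::jsonb || '{"current_status": "pending"}'::jsonb)::json AS new_js
    FROM myJson t
    CROSS JOIN LATERAL json_array_elements(t.js) AS elem
    GROUP BY t.id
) agg
WHERE m.id = agg.id;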
Nevertheless, I would recommend modifying upon insert rather than updating.

Related

PostgreSQL query nested object in array by WHERE

Using PostgreSQL 13.4, I have a query like this, which is used for a GraphQL endpoint:
export const getPlans = async (filter: {
offset: number;
limit: number;
search: string;
}): Promise<SearchResult<Plan>> => {
const query = sql`
SELECT
COUNT(p.*) OVER() AS total_count,
p.json, TO_CHAR(MAX(pp.published_at) AT TIME ZONE 'JST', 'YYYY-MM-DD HH24:MI') as last_published_at
FROM
plans_json p
LEFT JOIN
published_plans pp ON p.plan_id = pp.plan_id
WHERE
1 = 1
`;
if (filter.search)
query.append(sql`
AND
(
p.plan_id::text ILIKE ${`${filter.search}%`}
OR
p.json->>'name' ILIKE ${`%${filter.search}%`}
OR
p.json->'activities'->'venue'->>'name' ILIKE ${`%${filter.search}%`}
)
`);
// The above OR line or this alternative didn't work
// p #> '{"activities":[{"venue":{"name":${`%${filter.search}`}}}]}'
.
.
.
}
The data I'm accessing looks like this:
{
  "data": {
    "plans": {
      "records": [
        {
          "id": "345sdfsdf",
          "name": "test1",
          "activities": [{...},{...}]
        },
        {
          "id": "abc123",
          "name": "test2",
          "activities": [
            {
              "name": "test2",
              "id": "123abc",
              "venue": {
                "name": *"I WANT THIS VALUE"* <------------------------
              }
            }
          ]
        }
      ]
    }
  }
}
Since the search parameter provided to this query varies, I can only make changes in the WHERE block, in order to avoid affecting the other two working searches.
I tried 2 approaches (see above query), but neither worked.
Using TypeORM would be an alternative.
EDIT: For example, could I make that statement work somehow? I want to compare VALUE with the search string that is provided as argument:
p.json ->> '{"activities":[{"venue":{"name": VALUE}}]}' ILIKE ${`%${filter.search}`}
First, you should use the jsonb type instead of the json type in postgres, for many reasons; see the manual:
... In general, most applications should prefer to store JSON data as
jsonb, unless there are quite specialized needs, such as legacy
assumptions about ordering of object keys...
Then you can use the following query to get the whole json data based on the search_parameter provided to the query via the user interface, as long as the search_parameter is a regular expression (see the manual):
SELECT query
FROM plans_json p
CROSS JOIN LATERAL jsonb_path_query(p.json::jsonb, FORMAT('$ ? (@.data.plans.records[*].activities[*].venue.name like_regex "%s")', search_parameter)::jsonpath) AS query
If you need to retrieve only part of the json data, move the corresponding part of the jsonpath from the filter section '? (@ ...)' into the '$' root path passed to jsonb_path_query. For instance, if you just want to retrieve the jsonb object {"name": "test2", ...}, the query is:
SELECT query
FROM plans_json p
CROSS JOIN LATERAL jsonb_path_query(p.json::jsonb, FORMAT('$.data.plans.records[*].activities[*] ? (@.venue.name like_regex "%s")', search_parameter)::jsonpath) AS query
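If you specifically need a boolean predicate to drop into the existing WHERE block rather than a row source, jsonb_path_exists (available since PostgreSQL 12) follows the same idea. A sketch, assuming each row's p.json has activities at the top level as the original OR clause suggests:
-- Hypothetical extra predicate for the WHERE block
AND jsonb_path_exists(
    p.json::jsonb,
    format('$.activities[*] ? (@.venue.name like_regex "%s")', search_parameter)::jsonpath
)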

Azure Data Factory - traverse JSON array with multiple rows

I have a REST API that outputs JSON data similar to this example:
{
  "GroupIds": [
    "1234",
    "2345",
    "3456",
    "4567"
  ],
  "Id": "w5a19-a493-bfd4-0a0c8djc05",
  "Name": "Test Item",
  "Description": "test item description",
  "Notes": null,
  "ExternalId": null,
  "ExpiryDate": null,
  "ActiveStatus": 0,
  "TagIds": [
    "784083-4c77-b8fb-0135046c",
    "86de96-44c1-a497-0a308607",
    "7565aa-437f-af36-8f9306c9",
    "d5d841-1762-8c14-d8420da2",
    "bac054-2b6e-a19b-ef5b0b0c"
  ],
  "ResourceIds": []
}
Using ADF, I want to parse through this JSON object and insert a row for each value in the GroupIds array along with the object's Id and Name... So ultimately the above JSON should translate to a table like this:
GroupID  Id                          Name
1234     w5a19-a493-bfd4-0a0c8djc05  Test Item
2345     w5a19-a493-bfd4-0a0c8djc05  Test Item
3456     w5a19-a493-bfd4-0a0c8djc05  Test Item
4567     w5a19-a493-bfd4-0a0c8djc05  Test Item
Is there some configuration I can use in the Copy Activity settings to accomplish this?
You can use a Data Flow activity to get the desired result.
First add the REST API source, then use a Select transformer and add the required columns.
After this, add a Derived Column transformer and use the unfold function to flatten the JSON array.
Another way is to use the Flatten formatter.
I tend to use a more ELT pattern for this, i.e. passing the JSON to a Stored Procedure activity and letting the SQL database handle the JSON. This assumes you already have access to a SQL DB, which is very capable with JSON.
A simplified example:
DECLARE @json NVARCHAR(MAX) = '{
  "GroupIds": [
    "1234",
    "2345",
    "3456",
    "4567"
  ],
  "Id": "w5a19-a493-bfd4-0a0c8djc05",
  "Name": "Test Item",
  "Description": "test item description",
  "Notes": null,
  "ExternalId": null,
  "ExpiryDate": null,
  "ActiveStatus": 0,
  "TagIds": [
    "784083-4c77-b8fb-0135046c",
    "86de96-44c1-a497-0a308607",
    "7565aa-437f-af36-8f9306c9",
    "d5d841-1762-8c14-d8420da2",
    "bac054-2b6e-a19b-ef5b0b0c"
  ],
  "ResourceIds": []
}'
SELECT
    g.[value] AS groupId,
    m.Id,
    m.[Name]
FROM OPENJSON( @json, '$' )
    WITH
    (
        Id VARCHAR(50) '$.Id',
        [Name] VARCHAR(50) '$.Name',
        GroupIds NVARCHAR(MAX) AS JSON
    ) m
    CROSS APPLY OPENJSON( @json, '$.GroupIds' ) g;
You could convert this to a stored procedure where @json is the parameter and convert the SELECT to an INSERT.
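For example, a sketch of that stored-procedure version (the procedure and target table names here are hypothetical):
-- Hypothetical procedure: @json comes from the ADF Stored Procedure activity
CREATE PROCEDURE dbo.usp_LoadGroups @json NVARCHAR(MAX)
AS
BEGIN
    INSERT INTO dbo.Groups ( GroupId, Id, [Name] )
    SELECT
        g.[value],
        m.Id,
        m.[Name]
    FROM OPENJSON( @json, '$' )
    WITH
    (
        Id VARCHAR(50) '$.Id',
        [Name] VARCHAR(50) '$.Name',
        GroupIds NVARCHAR(MAX) AS JSON
    ) m
    CROSS APPLY OPENJSON( @json, '$.GroupIds' ) g;
END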
I worked through a very similar example with more screenshots here, which is worth a look. It's a different pattern to using Mapping Data Flows, but if you already have a SQL DB available then it makes sense to use it rather than fire up separate compute at duplicate cost. If you are not loading into a SQL DB, or don't have access to one, then the Mapping Data Flows approach might make sense for you.

Aggregate results based on array of strings in JSON?

I have a table with a field called 'keywords'. It is a JSONB field with an array of keyword metadata, including the keyword's name.
What I would like is to query the counts of all these keywords by name, i.e. aggregate on keyword name and count(id). All the examples of GROUP BY queries I can find result in the grouping occurring on the full list (i.e. only giving me counts where two records have the same set of keywords).
So is it possible to somehow expand the list of keywords in a way that lets me get these counts?
If not, I am still at the planning stage and could refactor my schema to better handle this.
"keywords": [
{
"addedAt": "2017-04-07T21:11:00+0000",
"addedBy": {
"email": "foo#bar.com"
},
"keyword": {
"name": "Animal"
}
},
{
"addedAt": "2017-04-07T20:54:00+0000",
"addedBy": {
"email": "foo#bar.comm"
},
"keyword": {
"name": "Mammal"
}
}
]
step-by-step demo: db<>fiddle
SELECT
elems -> 'keyword' ->> 'name' AS keyword, -- 2
COUNT(*) AS count
FROM
mytable t,
jsonb_array_elements(myjson -> 'keywords') AS elems -- 1
GROUP BY 1 -- 3
1. Expand the array records into one row per element.
2. Get the keyword's name.
3. Group these text values.
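Since the linked demo isn't reproduced here, a minimal setup to try the query above (table and column names assumed to match it):
-- Hypothetical reproduction of the demo
CREATE TABLE mytable (id serial PRIMARY KEY, myjson jsonb);

INSERT INTO mytable (myjson) VALUES
('{"keywords": [{"keyword": {"name": "Animal"}}, {"keyword": {"name": "Mammal"}}]}'),
('{"keywords": [{"keyword": {"name": "Animal"}}]}');

-- Expected result: Animal -> 2, Mammal -> 1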

Postgres jsonb nested array append

I have a simple table with a jsonb column:
CREATE TABLE things (
id SERIAL PRIMARY KEY,
data jsonb
);
with data that looks like:
{
  "id": 1,
  "title": "thing",
  "things": [
    {
      "title": "thing 1",
      "moreThings": [
        { "title": "more thing 1" }
      ]
    }
  ]
}
So how do I append inside of a deeply nested array like moreThings?
For a single-level nested array I could do this and it works:
UPDATE things SET data = jsonb_set(data, '{things}', data->'things' || '{ "text": "thing" }', true);
But the same doesn't work for deeply nested arrays:
UPDATE things SET data = jsonb_set(data, '{things}', data->'things'->'moreThings' || '{ "text": "thing" }', true)
How can I append to moreThings?
It works just fine:
UPDATE things
SET data =
jsonb_set(data,
'{things,0,moreThings}',
data->'things'->0->'moreThings' || '{ "text": "thing" }',
TRUE
)
WHERE id = 1;
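If you ever need to append to moreThings inside every element of things rather than just index 0, one possible approach (a sketch, not part of the original answer) is to rebuild the outer array:
-- Sketch: assumes every element of "things" already has a "moreThings" array
UPDATE things t
SET data = jsonb_set(
    t.data,
    '{things}',
    (
        SELECT jsonb_agg(
                   jsonb_set(elem, '{moreThings}',
                             elem->'moreThings' || '{ "text": "thing" }')
               )
        FROM jsonb_array_elements(t.data->'things') AS elem
    )
)
WHERE id = 1;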
If you have a table that consists only of a primary key and a jsonb attribute and you regularly want to manipulate this jsonb in the database, you are certainly doing something wrong. Your life will be much easier if you normalize the data some more.

Building query in Postgres 9.4.2 for JSONB datatype using builtin function

I have a table schema as follows:
DummyTable
-------------
someData JSONB
All my values will be JSON objects. For example, when you do a select * from DummyTable, it would look like:
someData(JSONB)
------------------
{"values":["P1","P2","P3"],"key":"ProductOne"}
{"values":["P3"],"key":"ProductTwo"}
I want a query which will give me a result set as follows:
[
  {
    "values": ["P1","P2","P3"],
    "key": "ProductOne"
  },
  {
    "values": ["P4"],
    "key": "ProductTwo"
  }
]
I'm using Postgres version 9.4.2. I looked at its documentation page, but could not find a query which would give the above result.
However, in my API I can build the JSON by iterating over the rows, but I would prefer a query that does the same. I tried json_build_array and row_to_json on the result of select * from table_name, but no luck.
Any help would be appreciated.
Here is the link I looked at to write a query for JSONB.
You can use json_agg or jsonb_agg:
create table dummytable(somedata jsonb not null);
insert into dummytable(somedata) values
('{"values":["P1","P2","P3"],"key":"ProductOne"}'),
('{"values":["P3"],"key":"ProductTwo"}');
select jsonb_pretty(jsonb_agg(somedata)) from dummytable;
Result:
[
  {
    "key": "ProductOne",
    "values": [
      "P1",
      "P2",
      "P3"
    ]
  },
  {
    "key": "ProductTwo",
    "values": [
      "P3"
    ]
  }
]
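One caveat: if I remember correctly, jsonb_agg and jsonb_pretty only arrived in PostgreSQL 9.5, so on 9.4.2 you may need to fall back to json_agg (jsonb_pretty is purely cosmetic and can be dropped):
-- json_agg is available on 9.4; casting the jsonb column to json keeps the output type json
SELECT json_agg(somedata::json) FROM dummytable;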
Note, however, that retrieving the data row by row and building the array on the client side can be more efficient, as the server can start sending data much sooner: as soon as it retrieves the first matching row from storage. If it has to build the JSON array first, it needs to retrieve and merge all the rows before it can start sending any data.