Building query in Postgres 9.4.2 for JSONB datatype using builtin function - postgresql

I have a table schema as follows:
DummyTable
-------------
someData JSONB
All my values will be a JSON object. For example, when you do a select *
from DummyTable, it would look like
someData(JSONB)
------------------
{"values":["P1","P2","P3"],"key":"ProductOne"}
{"values":["P3"],"key":"ProductTwo"}
I want a query which will give me result set as follows:
[
{
"values": ["P1","P2","P3"],
"key": "ProductOne"
},
{
"values": ["P4"],
"key": "ProductTwo"
}
]
I'm using Postgres version 9.4.2. I looked at documentation page of the same, but could not find the query which would give the above result.
However, in my API, I can build the JSON by iterating over rows, but I would prefer query doing the same. I tried json_build_array, row_to_json on a result which would be given by select * from table_name, but no luck.
Any help would be appreciated.
Here is the link I looked for to write a query for JSONB

You can use json_agg or jsonb_agg:
create table dummytable(somedata jsonb not null);
insert into dummytable(somedata) values
('{"values":["P1","P2","P3"],"key":"ProductOne"}'),
('{"values":["P3"],"key":"ProductTwo"}');
select jsonb_pretty(jsonb_agg(somedata)) from dummytable;
Result:
[
{
"key": "ProductOne",
"values": [
"P1",
"P2",
"P3"
]
},
{
"key": "ProductTwo",
"values": [
"P3"
]
}
]
Although retrieving the data row by row and building on client side can be made more efficient, as the server can start to send data much sooner - after it retrieves first matching row from storage. If it needs to build the json array first, it would need to retrieve all the rows and merge them before being able to start sending data.

Related

ADF: use the output from a lookup activity on another activity in Data Factory

I have a lookup activity (Get_ID) that returns:
{
"count": 2,
"value": [
{
"TRGT_VAL": "10000"
},
{
"TRGT_VAL": "52000"
}
],
(...)
I want to use these 2 values from TRGT_VAL in a WHERE clause of a query in another activity. I'm using
#concat('SELECT * FROM table WHERE column in ',activity('Get_ID').output.value[0].TRGT_VAL)
But only the first value of 10000 is being taken into account. How to get the whole list?
I solved by using a lot of replaces:
#concat('(',replace(replace(replace(replace(replace(replace(replace(string(activity('Get_ID').output.value),'{',''),' ',''),'"',''),'TRGT_VAL:',''),'[',''),'}',''),']',''),')')
Output
{
"name": "AptitudeCF",
"value": "(10000,52000)"
}
Instead of using big expression with lot of replace functions, you can use String interpolation syntax and frame your query. Below is query which you can consider.
SELECT * FROM table WHERE column in (#{activity('Get_ID').output.value[0].TRGT_VAL},#{activity('Get_ID').output.value[1].TRGT_VAL)

PostgreSQL query nested object in array by WHERE

Using PostgreSQL 13.4, I have a query like this, which is used for a GraphQL endpoint:
export const getPlans = async (filter: {
offset: number;
limit: number;
search: string;
}): Promise<SearchResult<Plan>> => {
const query = sql`
SELECT
COUNT(p.*) OVER() AS total_count,
p.json, TO_CHAR(MAX(pp.published_at) AT TIME ZONE 'JST', 'YYYY-MM-DD HH24:MI') as last_published_at
FROM
plans_json p
LEFT JOIN
published_plans pp ON p.plan_id = pp.plan_id
WHERE
1 = 1
`;
if (filter.search)
query.append(sql`
AND
(
p.plan_id::text ILIKE ${`${filter.search}%`}
OR
p.json->>'name' ILIKE ${`%${filter.search}%`}
**OR
p.json->'activities'->'venue'->>'name' ILIKE ${`%${filter.search}%`}
)
`);
// The above OR line or this alternative didn't work
// p #> '{"activities":[{"venue":{"name":${`%${filter.search}`}}}]}'
.
.
.
}
The data I'm accessing looks like this:
{
"data": {
"plans": {
"records": [
{
"id": "345sdfsdf",
"name": "test1",
"activities": [{...},{...}]
},
{
"id": "abc123",
"name": "test2",
"activities": [
{
"name": "test2",
"id": "123abc",
"venue": {
"name": *"I WANT THIS VALUE"* <------------------------
}
}
]
}
]
}
}
}
Since the search parameter provided to this query varies, I can only make changes in the WHERE block, in order to avoid affecting the other two working searches.
I tried 2 approaches (see above query), but neither worked.
Using TypeORM would be an alternative.
EDIT: For example, could I make that statement work somehow? I want to compare VALUE with the search string that is provided as argument:
p.json ->> '{"activities":[{"venue":{"name": VALUE}}]}' ILIKE ${`%${filter.search}`}
First, you should use the jsonb type instead of the json type in postgres for many reasons, see the manual :
... In general, most applications should prefer to store JSON data as
jsonb, unless there are quite specialized needs, such as legacy
assumptions about ordering of object keys...
Then you can use the following query to get the whole json data based on the search_parameter provided to the query via the user interface as far as the search_parameter is a regular expression (see the manual) :
SELECT query
FROM plans_json p
CROSS JOIN LATERAL jsonb_path_query(p.json :: jsonb , FORMAT('$ ? (#.data.plans.records[*].activities[*].venue.name like_regex "%s")', search_parameter) :: jsonpath) AS query
If you need to retrieve part of the json data only, then you transfer in the jsonb_path_query function the corresponding part of the jsonpath which is in the '(#.jsonpath)' section to the '$' section. For instance, if you just want to retrieve the jsonb object {"name": "test2", ...} then the query is :
SELECT query
FROM plans_json p
CROSS JOIN LATERAL jsonb_path_query(p.json :: jsonb , FORMAT('$.data.plans.records[*].activities[*] ? (#.venue.name like_regex "%s")', search_parameter) :: jsonpath) AS query

How can I get the count of JSON array in ADF?

I'm using Azure data factory to retrieve data and copy into a database... the Source looks like this:
{
"GroupIds": [
"4ee1a-0856-4618-4c3c77302b",
"21259-0ce1-4a30-2a499965d9",
"b2209-4dda-4e2f-029384e4ad",
"63ac6-fcbc-8f7e-36fdc5e4f9",
"821c9-aa73-4a94-3fc0bd2338"
],
"Id": "w5a19-a493-bfd4-0a0c8djc05",
"Name": "Test Item",
"Description": "test item description",
"Notes": null,
"ExternalId": null,
"ExpiryDate": null,
"ActiveStatus": 0,
"TagIds": [
"784083-4c77-b8fb-0135046c",
"86de96-44c1-a497-0a308607",
"7565aa-437f-af36-8f9306c9",
"d5d841-1762-8c14-d8420da2",
"bac054-2b6e-a19b-ef5b0b0c"
],
"ResourceIds": []
}
In my ADF pipeline, I am trying to get the count of GroupIds and store that in a database column (along with the associated Id from the JSON above).
Is there some kind of syntax I can use to tell ADF that I just want the count of GroupIds or is this going to require some kind of recursive loop activity?
You can use the length function in Azure Data Factory (ADF) to check the length of json arrays:
length(json(variables('varSource')).GroupIds)
If you are loading the data to a SQL database then you could use OPENJSON, a simple example:
DECLARE #json NVARCHAR(MAX) = '{
"GroupIds": [
"4ee1a-0856-4618-4c3c77302b",
"21259-0ce1-4a30-2a499965d9",
"b2209-4dda-4e2f-029384e4ad",
"63ac6-fcbc-8f7e-36fdc5e4f9",
"821c9-aa73-4a94-3fc0bd2338"
],
"Id": "w5a19-a493-bfd4-0a0c8djc05",
"Name": "Test Item",
"Description": "test item description",
"Notes": null,
"ExternalId": null,
"ExpiryDate": null,
"ActiveStatus": 0,
"TagIds": [
"784083-4c77-b8fb-0135046c",
"86de96-44c1-a497-0a308607",
"7565aa-437f-af36-8f9306c9",
"d5d841-1762-8c14-d8420da2",
"bac054-2b6e-a19b-ef5b0b0c"
],
"ResourceIds": []
}'
SELECT *
FROM OPENJSON( #json, '$.GroupIds' )
SELECT COUNT(*) countOfGroupIds
FROM OPENJSON( #json, '$.GroupIds' );
My results:
If your data is stored in a table the code is similar. Make sense?
Another funky way to approach it if you really need the count in-line, is to convert the JSON to XML using the built-in functions and then run some XPath on it. It's not as complicated as it sounds and would allow you to get the result inside the pipeline.
The Data Factory XML function converts JSON to XML, but that JSON must have a single root property. We can fix up the json with concat and a single line of code. In this example I'm using a Set Variable activity, where varSource is your original JSON:
#concat('{"root":', variables('varSource'), '}')
Next, we can just apply the XPath with another simple expression:
#string(xpath(xml(json(variables('varIntermed1'))), 'count(/root/GroupIds)'))
My results:
Easy huh. It's a shame there isn't more built-in support for JPath unless I'm missing something, although you can use limited JPath in the Copy activity.
You can use Data flow activity in the Azure data factory pipeline to get the count.
Step1:
Connect the Source to JSON dataset, and in Source options under JSON settings, select single document.
In the source preview, you can see there are 5 GroupIDs per ID.
Step2:
Use flatten transformation to deformalize the values into rows for GroupIDs.
Select GroupIDs array in Unroll by and Unroll root.
Step3:
Use Aggregate transformation, to get the count of GroupIDs group by ID.
Under Group by: Select a column from the drop-down for your aggregation.
Under Aggregate: You can build the expression to get the count of the column (GroupIDs).
Aggregate Data preview:
Step4: Connect the output to Sink transformation to load the final output to database.

Postgres add additional field to json payload index within an array prior to inserting as record set into database

I'm creating a stored procedure in postgres which takes a json and inserts into a table by performing a json_to_recordset command.
Prior to inserting the record set into a SQL table with that command I want to add an additional field as a index to the json in the json array.
Is this possible?
For each index in the array I want to add "current_status": "pending"
[
{
"batch_id": "40",
"state_id": "10"
},
{
"batch_id": "40",
"state_id": "10"
}
]
after
[
{
"batch_id": "40",
"state_id": "10",
"current_status": "pending"
},
{
"batch_id": "40",
"state_id": "10"
"current_status": "pending"
}
]
Another option is updating the only NEW records in the table after the fact.
I'm new to postgres and have been reading through the documentation.
Based on your added comment, the current_status = 'pending' should be added as part of the insert into your target table instead of appending the key to the json objects.
insert into target_table (batch_id, state_id, current_status)
select batch_id, state_id, 'pending' as current_status
from json_to_recordset(<json>) as x(batch_id text, state_id text);
Adding to Mike's correct answer, in the case we want to modify existing tables with json arrays to include the status we may:
-- First create the table
CREATE TABLE myJson AS
SELECT '[
{
"batch_id": "40",
"state_id": "10"
},
{
"batch_id": "50",
"state_id": "60"
}
]'::json js;
WITH unnest_and_concat AS (
SELECT json_array_elements(js)::jsonb || json_build_object('current_status', 'pending')::jsonb jee
FROM myJson)
SELECT json_agg(jee)::json
FROM unnest_and_concat;
Of course this is only intended to work for a table with these rows (for illustration). If the objective is to update the entire table then we can do that (ideally with a LATERAL JOIN) mixed with an update statement. Looks like:
UPDATE myJson
SET old_col=new_col
FROM <insert subquery or table>
WHERE myJson.id = new_table.id;
Nevertheless, I would recommend modifying upon insert rather than updating.

How do I use ADF copy activity with multiple rows in source?

I have source which is JSON array, sink is SQL server. When I use column mapping and see the code I can see mapping is done to first element of array so each run produces single record despite the fact that source has multiple records. How do I use copy activity to import ALL the rows?
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"schemaMapping": {
"['#odata.context']": "BuyerFinancing",
"['#odata.nextLink']": "PropertyCondition",
"value[0].AssociationFee": "AssociationFee",
"value[0].AssociationFeeFrequency": "AssociationFeeFrequency",
"value[0].AssociationName": "AssociationName",
Use * as the source field to indicate all elements in json format. For example, with json:
{
"results": [
{"field1": "valuea", "field2": "valueb"},
{"field1": "valuex", "field2": "valuey"}
]
}
and a database table with a column result to store the json. The mapping with results as the collection and * and the sub element will create two records with:
{"field1": "valuea", "field2": "valueb"}
{"field1": "valuex", "field2": "valuey"}
in the result field.
Copy Data Field Mapping
ADF support cross apply for json array. Please check the example in this doc. https://learn.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs#jsonformat-example
For schema mapping: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping#schema-mapping