Coalesce value bound to object's key into parent's value - postgresql

I have a PostgreSQL 12.x database. There is a column data in a table typename that contains JSON. The actual JSON data is not fixed to a particular structure; these are some examples:
{"emt": {"key": " ", "source": "INPUT"}, "id": 1, "fields": {}}
{"emt": {"key": "Stack Overflow", "source": "INPUT"}, "id": 2, "fields": {}}
{"emt": {"key": "https://www.domain.tld/index.html", "source": "INPUT"}, "description": {"key": "JSONB datatype", "source": "INPUT"}, "overlay": {"id": 5, "source": "bOv"}, "fields": {"id": 1, "description": "Themed", "recs ": "1"}}
Basically, I'm trying to write a (database migration) script that finds any object with the keys key and source, takes the value of key, and assigns it directly to the parent key that the object was originally bound to. For instance:
{"emt": " ", "id": 1, "fields": {}}
{"emt": "Stack Overflow", "id": 2, "fields": {}}
{"emt": "https://www.domain.tld/index.html", "description": "JSONB datatype", "overlay": {"id": 5, "source": "bOv"}, "fields": {"id": 1, "description": "Themed", "recs ": "1"}}
I started by finding the rows that contain "source": "INPUT" using:
select * from typename
where jsonb_path_exists(data, '$.** ? (@.type() == "string" && @ like_regex "INPUT")');
...but then I'm not sure how to update the returned subset or to loop through it :/
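For intuition, here is a rough Python analogue of that jsonpath filter. It is a sketch, and the function name any_string_matches is hypothetical. $.** visits values at every nesting level, and like_regex performs an unanchored regular-expression match, so the filter is true when any string value anywhere in the document contains INPUT. It matches values only, not member names.

```python
import re
from typing import Any


def any_string_matches(node: Any, pattern: str) -> bool:
    """Rough analogue of '$.** ? (@.type() == "string" && @ like_regex pattern)'.

    Walks every nesting level and returns True if any string value
    contains a match for the (unanchored) regular expression.
    """
    if isinstance(node, str):
        return re.search(pattern, node) is not None
    if isinstance(node, dict):
        # Only member values are inspected, never member names.
        return any(any_string_matches(v, pattern) for v in node.values())
    if isinstance(node, list):
        return any(any_string_matches(v, pattern) for v in node)
    return False  # numbers, booleans and null never match


row = {"emt": {"key": "Stack Overflow", "source": "INPUT"}, "id": 2, "fields": {}}
print(any_string_matches(row, "INPUT"))  # True
```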

It took me a while, but here is the update statement:
update typename
set data = jsonb_set(data, '{emt}', jsonb_extract_path(data, 'emt', 'key')::jsonb, false)
where jsonb_typeof(data -> 'emt') = 'object'
  and jsonb_path_exists(data, '$.emt.key ? (@.type() == "string")')
  and jsonb_path_exists(data, '$.emt.source ? (@.type() == "string" && @ like_regex "INPUT")');
There are probably better ways to implement that where clause, but that one works ;)
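As a sketch of what that UPDATE does to a single document, here is the same rule in Python. coalesce_one is a hypothetical helper, not part of PostgreSQL. It mirrors the WHERE clause: the named member must be an object whose key is a string and whose source is a string matching INPUT. Like the SQL, it has to be applied once per member name.

```python
import re
from typing import Any, Dict


def coalesce_one(data: Dict[str, Any], name: str) -> Dict[str, Any]:
    """Return data with data[name] replaced by data[name]["key"], but only when
    the same conditions as the WHERE clause hold; otherwise return data as-is."""
    obj = data.get(name)
    if not isinstance(obj, dict):  # jsonb_typeof(data -> name) = 'object'
        return data
    key, source = obj.get("key"), obj.get("source")
    if isinstance(key, str) and isinstance(source, str) and re.search("INPUT", source):
        return {**data, name: key}  # like jsonb_set(data, '{name}', key, false)
    return data


row = {"emt": {"key": "https://www.domain.tld/index.html", "source": "INPUT"},
       "description": {"key": "JSONB datatype", "source": "INPUT"},
       "overlay": {"id": 5, "source": "bOv"}}
for name in ("emt", "description"):  # one pass per member, like one UPDATE per key
    row = coalesce_one(row, name)
print(row)
```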
One downside is that I had to work out how many keys are involved in the update and write one update statement per key; e.g., the original example has two such keys, emt and description, so it needs two update statements.
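One way around counting keys is to do the transformation in the migration script itself rather than in SQL. The sketch below assumes the script reads each data value, transforms it in Python, and writes it back. That database step is not shown, and coalesce_key_objects is a hypothetical name. The function replaces every object that has both a key and a source member with the value of key, at any depth, so no per-key statement is needed. It is looser than the WHERE clause above because it does not check that source matches INPUT.

```python
from typing import Any


def coalesce_key_objects(node: Any) -> Any:
    """Replace every object that has both "key" and "source" members with the
    value of its "key" member, recursing through nested objects and arrays."""
    if isinstance(node, dict):
        if "key" in node and "source" in node:
            return node["key"]
        return {k: coalesce_key_objects(v) for k, v in node.items()}
    if isinstance(node, list):
        return [coalesce_key_objects(v) for v in node]
    return node  # scalars are returned unchanged


row = {"emt": {"key": "https://www.domain.tld/index.html", "source": "INPUT"},
       "description": {"key": "JSONB datatype", "source": "INPUT"},
       "overlay": {"id": 5, "source": "bOv"},
       "fields": {"id": 1, "description": "Themed", "recs ": "1"}}
print(coalesce_key_objects(row))
```

The overlay object is left alone because it has a source member but no key member, which matches the expected output in the question.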

Related

PostgreSQL nested jsonb update value of complex key/value pairs

I'm just starting out with the JSONB data type, and I'm hoping someone can help me out.
I have a table (properties) with two columns (id as primary key and data as jsonb).
The data structure is:
{
  "ProductType": "ABC",
  "ProductName": "XYZ",
  "attributes": [
    {
      "name": "Color",
      "type": "STRING",
      "value": "Silver"
    },
    {
      "name": "Case",
      "type": "STRING",
      "value": "Shells"
    },
    ...
  ]
}
I would like to update the value of a specific element of attributes, selected by name, for a row with a given id. For example, for the element with "name": "Case", change the value to "Glass", so that it ends up like:
{
  "ProductType": "ABC",
  "ProductName": "XYZ",
  "attributes": [
    {
      "name": "Color",
      "type": "STRING",
      "value": "Silver"
    },
    {
      "name": "Case",
      "type": "STRING",
      "value": "Glass"
    },
    ...
  ]
}
Is this possible with this structure using SQL?
I have created the table structure, in case any of you would like to give it a shot:
dbfiddle
Use the jsonb concatenation operator, ||, to replace keys on the fly:
WITH properties (id, data) AS (
  values
    (1, '{"ProductType": "ABC","ProductName": "XYZ","attributes": [{"name": "Color","type": "STRING","value": "Silver"},{"name": "Case","type": "STRING","value": "Shells"}]}'::jsonb),
    (2, '{"ProductType": "ABC","ProductName": "XYZ","attributes": [{"name": "Color","type": "STRING","value": "Red"},{"name": "Case","type": "STRING","value": "Shells"}]}'::jsonb)
)
SELECT id,
       data ||
       jsonb_build_object(
         'attributes',
         jsonb_agg(
           case
             when attribs->>'name' = 'Case' then attribs || '{"value": "Glass"}'::jsonb
             else attribs
           end
         )
       ) as data
FROM properties m
CROSS JOIN LATERAL JSONB_ARRAY_ELEMENTS(data->'attributes') as a(attribs)
GROUP BY id, data;
Updated fiddle
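Here is the same transformation, sketched in Python for a single document. set_attribute_value is a hypothetical helper. Merging a dict with ** plays the role of jsonb ||, where keys on the right-hand side replace keys on the left. The sketch assumes the document has an attributes array.

```python
from typing import Any, Dict


def set_attribute_value(data: Dict[str, Any], name: str, value: Any) -> Dict[str, Any]:
    """Return a copy of data in which every element of data["attributes"] whose
    "name" equals name gets {"value": value} merged into it."""
    attributes = [
        # attribs || '{"value": ...}': keys on the right replace keys on the left
        {**attrib, "value": value} if attrib.get("name") == name else attrib
        for attrib in data["attributes"]
    ]
    # data || jsonb_build_object('attributes', ...): the rebuilt array
    # replaces the top-level "attributes" key; other keys are kept.
    return {**data, "attributes": attributes}


doc = {"ProductType": "ABC", "ProductName": "XYZ", "attributes": [
    {"name": "Color", "type": "STRING", "value": "Silver"},
    {"name": "Case", "type": "STRING", "value": "Shells"}]}
updated = set_attribute_value(doc, "Case", "Glass")
print(updated["attributes"][1])
```

The list comprehension keeps the array order. In the SQL version, jsonb_agg without an ORDER BY does not formally guarantee element order. One way to pin it down is to add WITH ORDINALITY to the jsonb_array_elements call and then ORDER BY that ordinality inside jsonb_agg.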

Differentiate dropdown multi select (without options defined) from regular text columns

Is there any way to differentiate columns of type dropdown multi select from regular text columns?
This is supposed to be a multi select dropdown list without any options:
"id": 5414087443146628,
"version": 2,
"index": 2,
"title": "Column3",
"type": "TEXT_NUMBER",
"validation": false,
"width": 150
The same question goes for a multi contact list without contact options defined.
If you think of multi-contact and multi-dropdown as new versions of the various GET requests, then it's easier to return the correct values. For multi-dropdown, use the query parameters "level=3" and "include=objectValue" together; you'll then see the column type change to MULTI_PICKLIST instead of TEXT_NUMBER. (The TEXT_NUMBER value is there to maintain backwards compatibility.)
So, essentially, your request would look something like GET /sheets/{sheetId}?level=3&include=objectValue.
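Here is a small sketch of building that request URL in Python with urllib.parse.urlencode. The base URL and sheet id come from the Postman example in this thread. get_sheet_url is a hypothetical helper, authentication headers are omitted, and the request is not sent. level is left as a parameter because the follow-up comment in this thread reports that level=3 was rejected while level=2 worked.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.smartsheet.com/2.0"


def get_sheet_url(sheet_id: int, level: int, include: str = "objectValue") -> str:
    """Build a Get Sheet URL carrying the level and include query parameters."""
    query = urlencode({"level": level, "include": include})  # keeps insertion order
    return f"{BASE_URL}/sheets/{sheet_id}?{query}"


print(get_sheet_url(5831916227192708, level=2))
# https://api.smartsheet.com/2.0/sheets/5831916227192708?level=2&include=objectValue
```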
To test the scenario you've described, I created the following sheet structure in Smartsheet, where the column names indicate the type of each column:
Then I used Postman to issue a Get Sheet request for that sheet:
GET https://api.smartsheet.com/2.0/sheets/5831916227192708
The columns portion of the API response looks like this:
{
  "id": 5831916227192708,
  ...
  "columns": [
    {
      "id": 1256050323154820,
      "version": 0,
      "index": 0,
      "title": "Description",
      "type": "TEXT_NUMBER",
      "primary": true,
      "validation": false,
      "width": 124
    },
    {
      "id": 5759649950525316,
      "version": 0,
      "index": 1,
      "title": "Type=Text/Number",
      "type": "TEXT_NUMBER",
      "validation": false,
      "width": 128
    },
    {
      "id": 1323283741206404,
      "version": 0,
      "index": 2,
      "title": "Type=Dropdown (single select)",
      "type": "PICKLIST",
      "validation": false,
      "width": 111
    },
    {
      "id": 7741495861110660,
      "version": 2,
      "index": 3,
      "title": "Type=Dropdown (multiple select)",
      "type": "TEXT_NUMBER",
      "validation": false,
      "width": 113
    },
    {
      "id": 3048711514285956,
      "version": 0,
      "index": 4,
      "title": "Type=Contact List (single select)",
      "type": "CONTACT_LIST",
      "validation": false,
      "width": 122
    },
    {
      "id": 3992195570132868,
      "version": 1,
      "index": 5,
      "title": "Type=Contact List (multiple select)",
      "type": "TEXT_NUMBER",
      "validation": false,
      "width": 125
    }
  ],
  ...
}
In this response, we see the following:
If column type is specified as Text/Number, the type attribute value is TEXT_NUMBER
If column type is specified as Dropdown (single select), the type attribute value is PICKLIST
If column type is specified as Dropdown (multiple select), the type attribute value is TEXT_NUMBER
If column type is specified as Contact List (single select), the type attribute value is CONTACT_LIST
If column type is specified as Contact List (multiple select), the type attribute value is TEXT_NUMBER
Therefore, it doesn't seem possible to programmatically differentiate a Dropdown (multiple select) column from a Text/Number column, or a Contact List (multiple select) column from a Text/Number column, based on column metadata alone. IMO, it seems like a bug for the Dropdown (multiple select) and Contact List (multiple select) column types to return type: TEXT_NUMBER. Perhaps someone from Smartsheet can comment here to provide more insight into this behavior.
I did a few tests, and level 3 isn't available; https://api.smartsheet.com/2.0/sheets/{sheetId}?level=3 returns:
{
  "errorCode": 1018,
  "message": "The value '3' was not valid for the parameter 'level'.",
  "refId": "1godowa5cigf1"
}
However, I tried level 2 and got the info:
https://api.smartsheet.com/2.0/sheets/{sheetId}?level=2&include=objectValue
Result for a multi-select dropdown list:
{
  "id": 5414087443146628,
  "version": 2,
  "index": 2,
  "title": "Column3",
  "type": "MULTI_PICKLIST",
  "options": [
    "a",
    "b"
  ],
  "validation": false,
  "width": 150
}
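With a response fetched with level=2&include=objectValue, picking out the multi-select dropdown columns becomes a simple filter on type. This is a sketch, and it assumes the columns list has already been parsed from the response JSON. multi_select_titles is a hypothetical helper, and the sample list is assembled from the responses shown above. This thread does not show whether a multi-select contact list reports a distinct type under the same parameters, so only MULTI_PICKLIST is checked.

```python
from typing import Any, Dict, List


def multi_select_titles(columns: List[Dict[str, Any]]) -> List[str]:
    """Return the titles of columns reported as MULTI_PICKLIST.

    Only meaningful for a response fetched with level=2&include=objectValue;
    without those parameters the same column is reported as TEXT_NUMBER.
    """
    return [c["title"] for c in columns if c.get("type") == "MULTI_PICKLIST"]


columns = [
    {"id": 5414087443146628, "title": "Column3", "type": "MULTI_PICKLIST", "options": ["a", "b"]},
    {"id": 1256050323154820, "title": "Description", "type": "TEXT_NUMBER"},
]
print(multi_select_titles(columns))  # ['Column3']
```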

Parsing Really Messy Nested JSON Strings

I have a series of deeply nested JSON strings in a PySpark DataFrame column. I need to explode and filter based on the contents of these strings, and I would like to add them as columns. I've tried defining the StructTypes, but each time it returns an empty DataFrame.
I also tried parsing with json_tuple, but there are no common keys to rejoin the DataFrames, and the row numbers don't match up. I think it might have to do with some null fields.
The subfields can be null.
Sample JSON:
{
  "TIME": "datatime",
  "SID": "yjhrtr",
  "ID": {
    "Source": "Person",
    "AuthIFO": {
      "Prov": "Abc",
      "IOI": "123",
      "DETAILS": {
        "Id": "12345",
        "SId": "ABCDE"
      }
    }
  },
  "Content": {
    "User1": "AB878A",
    "UserInfo": "False",
    "D": "ghgf64G",
    "T": "yjuyjtyfrZ6",
    "Tname": "WE ARE THE WORLD",
    "ST": null,
    "TID": "BPV 1431: 1",
    "src": "test",
    "OT": "test2",
    "OA": "test3",
    "OP": "test34"
  },
  "Test": false
}
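As a plain-Python illustration of the target shape (not a PySpark solution), the sketch below flattens one parsed record into one column per leaf path. Null leaves are kept as None instead of dropping the record. flatten is a hypothetical helper, and arrays and empty objects are kept as leaf values. In PySpark, the comparable step is usually parsing the string column with from_json against an explicit schema and then selecting nested fields by dotted path. That step is not shown here.

```python
import json
from typing import Any, Dict


def flatten(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into {"A.B.C": leaf} pairs; null leaves stay None."""
    if isinstance(node, dict) and node:
        out: Dict[str, Any] = {}
        for key, value in node.items():
            out.update(flatten(value, f"{prefix}.{key}" if prefix else key))
        return out
    return {prefix: node}  # scalar, null, list or empty object: treat as a leaf


# A subset of the sample record from the question.
record = json.loads("""
{"SID": "yjhrtr",
 "ID": {"Source": "Person", "AuthIFO": {"Prov": "Abc", "DETAILS": {"Id": "12345"}}},
 "Content": {"ST": null, "TID": "BPV 1431: 1"},
 "Test": false}
""")
print(flatten(record))
```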

Apache Druid SQL query conversion to JSON-based query

I am trying to convert the following Druid SQL query to a Druid JSON (native) query, because one of my columns is a multi-value dimension, for which Druid does not support an SQL-style query.
My SQL query:
SELECT date_dt, source, type_labels, COUNT(DISTINCT unique_p_hll)
FROM "test"
WHERE
type_labels = 'z' AND
(a_id IN ('a', 'b', 'c') OR b_id IN ('m', 'n', 'p'))
GROUP BY date_dt, source, type_labels;
unique_p_hll is an HLL (HyperLogLog) column with uniques.
The Druid JSON query I came up with is the following:
{
  "queryType": "groupBy",
  "dataSource": "test",
  "granularity": "day",
  "dimensions": ["source", "type_labels"],
  "limitSpec": {},
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "type_labels", "value": "z" },
      { "type": "or", "fields": [
        { "type": "in", "dimension": "a_id", "values": ["a", "b", "c"] },
        { "type": "in", "dimension": "b_id", "values": ["m", "n", "p"] }
      ]}
    ]
  },
  "aggregations": [
    { "type": "longSum", "name": "unique_p_hll", "fieldName": "p_id" }
  ],
  "intervals": [ "2018-08-01/2018-08-02" ]
}
But the JSON query seems to return an empty result set.
I can see the output correctly in the Pivot UI, though the values of the array column type_labels show up as {"array_element": "z"} instead of simply "z".
Does the query return an empty string, or does it return formatted JSON with zero records?
If the former, I can suggest a couple of leads for debugging this issue:
Make sure that the query is properly sent to the Broker, as shown in Druid's query tutorial:
curl -X 'POST' -H 'Content-Type:application/json' -d @query-file.json http://<BROKER-IP>:<BROKER-PORT>/druid/v2?pretty
Also, check the Broker's log for errors.
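For reference, the same POST can be built with Python's standard library. This is a sketch: localhost:8082 is an assumed Broker address (substitute your own <BROKER-IP>:<BROKER-PORT>), build_druid_request is a hypothetical helper, and the query dict is trimmed from the one in the question for brevity. Reading the raw response body makes it easy to tell an empty reply apart from a JSON result with zero records.

```python
import json
import urllib.request


def build_druid_request(query: dict, host: str, port: int) -> urllib.request.Request:
    """Build (but do not send) a POST of a native JSON query to the Broker,
    like: curl -X POST -H 'Content-Type:application/json' -d @query-file.json ..."""
    url = f"http://{host}:{port}/druid/v2?pretty"
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


query = {"queryType": "groupBy", "dataSource": "test", "granularity": "day",
         "dimensions": ["source", "type_labels"],
         "intervals": ["2018-08-01/2018-08-02"]}
req = build_druid_request(query, "localhost", 8082)
# To send it (not run here): raw = urllib.request.urlopen(req).read()
# Inspecting raw distinguishes an empty body from a JSON result with no rows.
```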

How do I select only a specific key's value from jsonb type in Postgres

I have a jsonb column which has data as below.
[
{"key": "unit_type", "value": "Tablet", "display_name": "Unit Type"},
{"key": "pack_type", "value": "Packet", "display_name": "Pack Type"},
{"key": "units_in_pack", "value": "60", "display_name": "Units in Pack"},
{"key": "item_unit", "value": "", "display_name": "Item unit"},
{"key": "item_size", "value": "1", "display_name": "Item Size"},
{"key": "details", "value": "", "display_name": "Details"},
{"key": "slug", "value": "otc7087", "display_name": "Slug"}
]
I want to get the value field from the array element whose key is slug, so that when I run a select query over the table, I get this particular value from the column. For the above row, when I do select name, slug, price from table, I should get med1, otc7087, 100 as the output. I am unable to build a query for this. I can get all the keys or all the values, but how do I select a particular one in the same select query?
Or, put simply: how do I select just the slugs from the table? That would answer it.
I believe your JSON is quite structured;
just try jsonb_to_recordset.
For example:
select * from json_to_recordset('[
{"key": "unit_type", "value": "Tablet", "display_name": "Unit Type"},
{"key": "pack_type", "value": "Packet", "display_name": "Pack Type"},
{"key": "units_in_pack", "value": "60", "display_name": "Units in Pack"},
{"key": "item_unit", "value": "", "display_name": "Item unit"},
{"key": "item_size", "value": "1", "display_name": "Item Size"},
{"key": "details", "value": "", "display_name": "Details"},
{"key": "slug", "value": "otc7087", "display_name": "Slug"}
]') as x(key text, value text, display_name text);
This converts the JSON into a table with key, value, and display_name as columns, and then you can run any kind of query over it; it works for extracting keys too. With the approach @Craig Ringer suggested, you can't turn the data into a table-like form, so complex select queries (NOT IN, !=, range queries, ILIKE) would be really difficult and might perform worse.
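Here is a Python sketch that makes the recordset idea concrete. Each array element becomes a row with key, value and display_name fields, and ordinary filters then play the role of =, NOT IN and ILIKE. Attribute and to_recordset are hypothetical names, and the sample is a subset of the array above.

```python
from typing import List, NamedTuple, Optional


class Attribute(NamedTuple):
    """One recordset row, mirroring x(key text, value text, display_name text)."""
    key: Optional[str]
    value: Optional[str]
    display_name: Optional[str]


def to_recordset(items: List[dict]) -> List[Attribute]:
    # .get() mirrors json_to_recordset, which yields NULL for a missing field
    return [Attribute(i.get("key"), i.get("value"), i.get("display_name")) for i in items]


items = [
    {"key": "unit_type", "value": "Tablet", "display_name": "Unit Type"},
    {"key": "units_in_pack", "value": "60", "display_name": "Units in Pack"},
    {"key": "item_unit", "value": "", "display_name": "Item unit"},
    {"key": "slug", "value": "otc7087", "display_name": "Slug"},
]
rows = to_recordset(items)

slug = [r.value for r in rows if r.key == "slug"]                     # WHERE key = 'slug'
not_in = [r.key for r in rows if r.key not in ("slug", "item_unit")]  # WHERE key NOT IN (...)
ilike = [r.key for r in rows if "unit" in (r.display_name or "").lower()]  # ILIKE '%unit%'
```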
You seem to want to search all elements of a json array for an object with a particular value for a given key, then return the value of another key if matched.
Something like this will do the trick:
WITH my_table(jsonblob) AS (VALUES('[
{"key": "unit_type", "value": "Tablet", "display_name": "Unit Type"},
{"key": "pack_type", "value": "Packet", "display_name": "Pack Type"},
{"key": "units_in_pack", "value": "60", "display_name": "Units in Pack"},
{"key": "item_unit", "value": "", "display_name": "Item unit"},
{"key": "item_size", "value": "1", "display_name": "Item Size"},
{"key": "details", "value": "", "display_name": "Details"},
{"key": "slug", "value": "otc7087", "display_name": "Slug"}
]'::jsonb))
SELECT elem ->> 'value'
FROM my_table
CROSS JOIN LATERAL jsonb_array_elements(jsonblob) elem
WHERE (elem ->> 'key') = 'slug';
That is: select from the table, unpack the array into a join, filter the joined rows for the object whose JSON key "key" has the value "slug", and return that object's "value" in the select list.
If you want multiple different values from the same json object you need multiple joins, one per desired value.
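The lookup that query performs can be sketched in Python as a scan over the array. value_for is a hypothetical helper. One difference is that the SQL returns one row per matching element, whereas this returns only the first match, or None when nothing matches.

```python
from typing import Any, Dict, List, Optional


def value_for(items: List[Dict[str, Any]], wanted: str) -> Optional[Any]:
    """Scan the array like jsonb_array_elements + WHERE elem->>'key' = wanted,
    returning the first matching element's "value" (None if there is none)."""
    for elem in items:
        if elem.get("key") == wanted:
            return elem.get("value")
    return None


items = [
    {"key": "unit_type", "value": "Tablet", "display_name": "Unit Type"},
    {"key": "pack_type", "value": "Packet", "display_name": "Pack Type"},
    {"key": "slug", "value": "otc7087", "display_name": "Slug"},
]
print(value_for(items, "slug"))  # otc7087
# Several values: one lookup per wanted key, much like one join per value in SQL.
wanted = {k: value_for(items, k) for k in ("slug", "unit_type", "missing")}
```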
This is a pretty ugly way to store variable key/value format data. I'd suggest storage like:
{"unit_type": {"value": "Tablet", "display_name": "Unit Type"}, ...}
where you can actually look up the keys.
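Converting the existing array into that keyed shape takes a single dictionary comprehension. Here is a Python sketch; to_keyed is a hypothetical helper, and if a key appears twice, the last entry wins. With data stored this way, the SQL lookup becomes a plain jsonblob -> 'slug' ->> 'value', with no join.

```python
from typing import Any, Dict, List


def to_keyed(items: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Convert [{"key": k, "value": v, "display_name": d}, ...] into
    {k: {"value": v, "display_name": d}, ...}; a repeated key keeps the last entry."""
    return {
        i["key"]: {"value": i["value"], "display_name": i["display_name"]}
        for i in items
    }


keyed = to_keyed([
    {"key": "unit_type", "value": "Tablet", "display_name": "Unit Type"},
    {"key": "slug", "value": "otc7087", "display_name": "Slug"},
])
print(keyed["slug"]["value"])  # otc7087
```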