Convert jsonb comma-separated values into a json object using a psql script - postgresql

I have a table in postgresql that has two columns:
Table "schemaname.tablename"
Column | Type | Collation | Nullable | Default
--------+-------------------+-----------+----------+---------
_key | character varying | | not null |
value | jsonb | | |
Indexes:
"tablename_pkey" PRIMARY KEY, btree (_key)
and I'd like to convert a nested property value of the jsonb that looks like this:
{
  "somekey": "[k1=v1, k2=v2, k3=v2]"
}
into this:
{
  "somekey": [
    "java.util.LinkedHashMap",
    {
      "k1": "v1",
      "k2": "v2",
      "k3": "v2"
    }
  ]
}
I've managed to parse the comma-separated string into an array of strings, but aside from still having to apply another split on '=', I don't really know how to do the actual UPDATE on all rows of the table and generate the proper jsonb value for the "somekey" key.
select regexp_split_to_array(RTRIM(LTRIM(value->>'somekey','['),']'),',') from schemaname.tablename;
Any ideas?

Try this one (self-contained test data):
WITH tablename (_key, value) AS (
    VALUES
        ('test', '{"somekey":"[k1=v1, k2=v2, k3=v2]"}'::jsonb),
        ('second', '{"somekey":"[no one=wants to, see=me, with garbage]"}'::jsonb),
        ('third', '{"somekey":"[some,key=with a = in it''s value, some=more here]"}'::jsonb)
)
SELECT
    tab._key,
    jsonb_insert(
        '{"somekey":["java.util.LinkedHashMap"]}', -- basic JSON structure
        '{somekey,0}',                             -- path to insert after
        jsonb_object(             -- create a JSONB object on-the-fly from the key-value array
            array_agg(key_values) -- aggregate all key-value rows into one array
        ),
        true -- we want to insert after the matching element, not before it
    ) AS json_transformed
FROM
    tablename AS tab,
    -- the following is an implicit LATERAL join (the function is evaluated for each row of the preceding table)
    regexp_matches(                         -- produces multiple rows
        btrim(tab.value->>'somekey', '[]'), -- as you started with
        '(\w[^=]*)=([^,]*)',                -- define regular expression groups for keys and values
        'g'                                 -- we want all key-value sets
    ) AS key_values
GROUP BY 1
;
...resulting in:
_key | json_transformed
--------+-------------------------------------------------------------------------------------------------------
second | {"somekey": ["java.util.LinkedHashMap", {"see": "me", "no one": "wants to"}]}
third | {"somekey": ["java.util.LinkedHashMap", {"some": "more here", "some,key": "with a = in it's value"}]}
test | {"somekey": ["java.util.LinkedHashMap", {"k1": "v1", "k2": "v2", "k3": "v2"}]}
(3 rows)
I hope the inline comments explain how it works in enough detail.
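For reference, jsonb_object accepts either a flat text array of alternating keys and values or a two-dimensional text array of key-value pairs; the latter is exactly what array_agg over the regexp_matches rows produces above. A minimal standalone illustration:
SELECT jsonb_object(ARRAY[ARRAY['k1','v1'], ARRAY['k2','v2']]);
-- returns {"k1": "v1", "k2": "v2"}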
Without requiring aggregate/group by:
The following variant requires no grouping, since it avoids the aggregate function array_agg, but it is a little less strict about the key-value format, and malformed data will easily break the query (the previous variant would just drop the offending key-value pairs):
WITH tablename (_key, value) AS (
    VALUES
        ('test', '{"somekey":"[k1=v1, k2=v2, k3=v2]"}'::jsonb),
        ('second', '{"somekey":"[no one=wants to, see=me, with garbage]"}'::jsonb)
)
SELECT
    _key,
    jsonb_insert(
        '{"somekey":["java.util.LinkedHashMap"]}', -- basic JSON structure
        '{somekey,0}',                             -- path to insert after
        jsonb_object(  -- create a JSONB object on-the-fly from the key-value array
            key_values -- take the keys + values as split by the function
        ),
        true -- we want to insert after the matching element, not before it
    ) AS json_transformed
FROM
    tablename AS tab,
    -- the following is an implicit LATERAL join (the function is evaluated for each row of the preceding table)
    regexp_split_to_array(                  -- produces an array of keys and values: [k, v, k, v, ...]
        btrim(tab.value->>'somekey', '[]'), -- as you started with
        '(=|,\s*)'                          -- regex to match both separators
    ) AS key_values
;
...resulting in:
_key | json_transformed
--------+--------------------------------------------------------------------------------
test | {"somekey": ["java.util.LinkedHashMap", {"k1": "v1", "k2": "v2", "k3": "v2"}]}
second | {"somekey": ["java.util.LinkedHashMap", {"see": "me", "no one": "wants to"}]}
(2 rows)
Feeding it with garbage (as in the "second" row before) or with an = character in the value (as in the "third" row before) would result in the following error here:
ERROR: array must have even number of elements
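Since the question asked how to run this as an UPDATE over all rows, here is a sketch that applies the first (stricter) variant to the schemaname.tablename table from the question; it is untested against real data, so try it on a copy first:
UPDATE schemaname.tablename AS t
SET value = jsonb_set(
    t.value,
    '{somekey}', -- replace the old string value in place
    jsonb_build_array('java.util.LinkedHashMap', kv.obj)
)
FROM (
    -- same transformation as the first query above, per _key
    SELECT
        tab._key,
        jsonb_object(array_agg(key_values)) AS obj
    FROM
        schemaname.tablename AS tab,
        regexp_matches(btrim(tab.value->>'somekey', '[]'), '(\w[^=]*)=([^,]*)', 'g') AS key_values
    GROUP BY 1
) AS kv
WHERE t._key = kv._key;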

Related

PostgreSQL create materialized view that sums objects in JSON array into specific columns based on field and table column

Given the following table:
create table entries (
  user_id integer,
  locations jsonb
);
I want to create a materialized view containing the following structure. These columns should start at zero and add up based on what's in locations.
create table entries_locations_extracted (
  user_id integer,
  location_1_a integer,
  location_1_b integer,
  location_2_a integer,
  location_2_b integer
);
Locations will always be a JSON array with the following structure; multiple locations may exist in the array, and multiple entries may exist per user.
insert into entries (user_id, locations) values (123, '[
  {"location": 1, "a": 1, "b": 2},
  {"location": 2, "a": 3, "b": 1},
  {"location": 2, "a": 10, "b": 20},
  {"location": 1, "a": 2, "b": 3}
]');
insert into entries (user_id, locations) values (123, '[
  {"location": 1, "a": 100, "b": 200}
]');
Given the inserts above. The materialized view should have the following row:
| user_id | location_1_a | location_1_b | location_2_a | location_2_b |
-----------------------------------------------------------------------
|     123 |          103 |          205 |           13 |           21 |
You can use an aggregate with a filter and a lateral query to expand the array for this:
SELECT
user_id,
SUM((loc->>'a')::int) FILTER (WHERE loc->'location' = '1') AS location_1_a,
SUM((loc->>'b')::int) FILTER (WHERE loc->'location' = '1') AS location_1_b,
SUM((loc->>'a')::int) FILTER (WHERE loc->'location' = '2') AS location_2_a,
SUM((loc->>'b')::int) FILTER (WHERE loc->'location' = '2') AS location_2_b
FROM
entries,
jsonb_array_elements(locations) AS loc
GROUP BY
user_id;
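To get the materialized view the question asks for, wrap that query in CREATE MATERIALIZED VIEW; a sketch using the names from the question, with COALESCE added because the question wants the columns to start at zero (SUM ... FILTER yields NULL when no element matches):
CREATE MATERIALIZED VIEW entries_locations_extracted AS
SELECT
    user_id,
    COALESCE(SUM((loc->>'a')::int) FILTER (WHERE loc->'location' = '1'), 0) AS location_1_a,
    COALESCE(SUM((loc->>'b')::int) FILTER (WHERE loc->'location' = '1'), 0) AS location_1_b,
    COALESCE(SUM((loc->>'a')::int) FILTER (WHERE loc->'location' = '2'), 0) AS location_2_a,
    COALESCE(SUM((loc->>'b')::int) FILTER (WHERE loc->'location' = '2'), 0) AS location_2_b
FROM entries, jsonb_array_elements(locations) AS loc
GROUP BY user_id;
-- re-run after entries changes:
REFRESH MATERIALIZED VIEW entries_locations_extracted;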

PostgreSQL: Filter and Aggregate on JSONB Array type

Consider the following table definition:
CREATE TABLE keys
(
  id bigint NOT NULL DEFAULT nextval('id_seq'::regclass),
  key_value jsonb[] NOT NULL DEFAULT ARRAY[]::jsonb[]
);
The table now contains the following values:
id | key_value
---|-----------
1 | {"{\"a\": \1\", \"b\": \"2\", \"c\": \"3\"}","{\"a\": \"4\", \"b\": \"5\", \"c\": \"6\"}","{\"a\": \"7\", \"b\": \"8\", \"c\": \"9\"}"} |
How do I:
Select all rows where the value of b is NOT 2? (I tried using the #> operator.)
For the returned rows, for each key_value object, return c - a.
My confusion stems from the fact that all functions dealing with JSONB in Postgres seem to accept JSON or JSONB, but none seem to work with JSONB[]. What am I missing?
Thanks in advance
What could be better than doing this with unnest and normal relational operations?
array types and json are abominations in the face of the perfection that is relational sets. The first rule of holes is that when you find yourself in one, stop digging and climb out of the hole.
with unwind as (
select id, unnest(key_value) as kvjson
from keys
)
select id, (kvjson->>'c')::int - (kvjson->>'a')::int as difference
from unwind
where kvjson->>'b' != '2';
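Note that this filters out individual array elements with b = 2. If "all rows where b is NOT 2" should instead exclude an entire row as soon as any of its elements has b = 2, a NOT EXISTS over the unnested array does that; a sketch against the same table:
SELECT k.id
FROM keys AS k
WHERE NOT EXISTS (
    SELECT 1
    FROM unnest(k.key_value) AS kvjson
    WHERE kvjson->>'b' = '2'
);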

How to break apart a column that includes keys and values into separate columns in postgres

I am new to Postgres and basically have no experience. I have a table with a column that includes keys and values. I need to write a query that returns a table with all the columns of the table, plus additional columns using each key as the column name and its value underneath.
My input is like:
id    | name | message
12478 | A    | {img_type:=png,key_id:=f235, client_status:=active, request_status:=open}
12598 | B    | {img_type:=none,address_id:=c156, client_status:=active, request_status:=closed}
output will be:
id    | name | img_type | key_id | address_id | client_status | request_status
12478 | A    | png      | f235   | NULL       | active        | open
12598 | B    | none     | NULL   | c156       | active        | closed
Any help would be greatly appreciated.
The only thing I can think of is a regular expression to extract the key/value pairs.
select id, name,
(regexp_match(message, '(img_type:=)([^,}]+),{0,1}'))[2] as img_type,
(regexp_match(message, '(key_id:=)([^,}]+),{0,1}'))[2] as key_id,
(regexp_match(message, '(client_status:=)([^,}]+),{0,1}'))[2] as client_status,
(regexp_match(message, '(request_status:=)([^,}]+),{0,1}'))[2] as request_status
from the_table;
regexp_match returns an array of matches. As the regex contains two groups (one for the "key" and one for the "value"), the [2] takes the second element of the array.
This is quite expensive and error prone (e.g. if any of the values contains a , or if you need to deal with quoted values). If you have any chance to change the application that stores the value, you should seriously consider changing your code to store a proper JSON value, e.g.
{"img_type": "png", "key_id": "f235", "client_status": "active", "request_status": "open"}
then you can use e.g. message ->> 'img_type' to retrieve the value for the key img_type.
You might also want to consider a properly normalized table, where each of those keys is a real column.
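For illustration, here is what the extraction would look like with the proper JSON value suggested above, assuming message were stored as a jsonb column:
SELECT id, name,
    message ->> 'img_type'       AS img_type,
    message ->> 'key_id'         AS key_id,
    message ->> 'address_id'     AS address_id,
    message ->> 'client_status'  AS client_status,
    message ->> 'request_status' AS request_status
FROM the_table;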
I can do it with a function.
I am not sure about the performance, but here is my suggestion:
CREATE TYPE log_type AS (img_type TEXT, key_id TEXT, address_id TEXT, client_status TEXT, request_status TEXT);

CREATE OR REPLACE FUNCTION populate_log(data TEXT)
RETURNS log_type AS
$func$
DECLARE
    r log_type;
BEGIN
    select x.* into r
    from
    (
        select
            -- build a single JSON object from the flat key/value text array
            json_object(array_agg(array_data)) as json_data
        from (
            -- strip the braces, split the pairs on ',', trim them, then split
            -- each pair on ':=' into a flat list of alternating keys and values
            select unnest(string_to_array(trim(unnest(string_to_array(substring(populate_log.data, '[^{}]+'), ','))), ':=')) as array_data
        ) d
    ) d2,
    lateral json_to_record(json_data) as x(img_type text, key_id text, address_id text, client_status text, request_status text);
    RETURN r;
END
$func$ LANGUAGE plpgsql;
with log_data (id, name, message) as (
    values
        (12478, 'A', '{img_type:=png,key_id:=f235, client_status:=active, request_status:=open}'),
        (12598, 'B', '{img_type:=none,address_id:=c156, client_status:=active, request_status:=closed}')
)
select id, name, l.*
from log_data, lateral populate_log(message) as l;
What you finally write in your query will be something like this (imagine the data is in a table named log_data):
select id, name, l.*
from log_data, lateral populate_log(message) as l;
I assume the message column is text; in Postgres it might instead be an array, in which case you have to remove some of the conversions, i.e. string_to_array(substring(populate_log.data, '[^{}]+'), ',') becomes just populate_log.data.

KDB: How to assign string datatype to all columns

When I created the table Tab, I specified the columns as string,
Tab: ([Key1:string()] Col1:string();Col2:string();Col3:string())
But the column datatype (t) is empty. I suppose specifying the column as string has no effect.
meta Tab
c t f a
--------------------
Key1
Col1
Col2
Col3
After I do a bulk upsert in Java...
c.Dict dict = new c.Dict((Object[]) columns.toArray(new String[columns.size()]), data);
c.Flip flip = new c.Flip(dict);
conn.c.ks("upsert", table, flip);
The datatypes are all symbols:
meta Tab
c t f a
--------------------
Key1 s
Col1 s
Col2 s
Col3 s
How can I specify the datatype of the columns as string and have it remain as string?
You can't define a column of an empty table as string, because strings are merely lists of characters (so a string column is a list of lists).
You can just set the columns as empty lists, which is what your code is doing.
But a column will then take on the type of whatever data is first inserted into it.
The real question is why your Java process is sending symbols when it should be sending strings. You need to make that change there before publishing to KDB.
Note that even if you define the columns as chars, you still won't be able to upsert strings:
q)Tab: ([Key1:`char$()] Col1:`char$();Col2:`char$();Col3:`char$())
q)Tab upsert ([Key1:enlist"test"] Col1:enlist"test";Col2:enlist"test";Col3:enlist "test")
'rank
[0] Tab upsert ([Key1:enlist"test"] Col1:enlist"test";Col2:enlist"test";Col3:enlist "test")
^
q)Tab: ([Key1:()] Col1:();Col2:();Col3:())
q)Tab upsert ([Key1:enlist"test"] Col1:enlist"test";Col2:enlist"test";Col3:enlist "test")
Key1 | Col1 Col2 Col3
------| --------------------
"test"| "test" "test" "test"
KDB does not allow you to define column types as lists when creating a table, which means you cannot define a column type as string, because a string is also a list.
The only way is to define the column as an empty list, like:
q) t:([]id:`int$();val:())
Then when you insert data to this table the column will automatically take type of that data.
q)`t insert (4;"row1")
q) meta t
c | t f a
---| -----
id | i
val| C
In your case, one option is to send string data from your Java process, as mentioned by user 'emc211'; the other option is to convert your data to strings in the KDB process before insertion.

aggregate json field names into array

I have a table that holds JSON data. In a query with a GROUP BY clause, I'd like to get an array of all of the JSON field names in the result set.
I tried a query like this:
SELECT array_agg(jsonb_object_keys(data))
FROM table
WHERE some_id = 3
GROUP BY some_id
For input data like
some_id | data
--------|---------------
3 | {"foo": "bar"}
4 | {"baz": 3}
3 | {"bar": 4}
I'd like to receive:
array_agg
--------------
{'foo', 'bar'}
But it returns an error: ERROR: set-valued function called in context that cannot accept a set
It seems like I need to somehow convert the setof text that jsonb_object_keys returns into an array, but I don't know how.
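One way to do that is to move the set-returning function into the FROM clause as an (implicit LATERAL) join, so it produces rows that array_agg can consume. A sketch, assuming the table is literally named table (hence the quoting), with DISTINCT added to collapse keys that appear in several rows:
SELECT array_agg(DISTINCT k) AS array_agg
FROM "table", jsonb_object_keys(data) AS k
WHERE some_id = 3;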