Azure Data Flow: Array to columns

In my data flow I have a column with an array and I need to map it to columns.
Here is an example of the data:
["title:mr","name:jon","surname:smith"]
["surname:jane"]
["title:mrs","surname:peters"]
["title:mr"]
and here is an example of the desired result:
What's the best approach to achieve this?

You can do this with a combination of derived column, rank and pivot transformations.
Let's say the sample data (an array of strings) is in a column named mycol.
First, I used a rank transformation, naming the rank column id and using mycol as the sort condition (ascending order). This gives each row an identifier that the pivot can group by later.
Next, I used a derived column to create a new column, new, with the expression unfold(mycol).
For some reason the new column's type was not rendered properly, so I used a cast transformation to make it a complex type, with string[] as the complex type definition.
Then I created two new columns, key and value, with the following expressions:
key: split(new[1],':')[1]
value: split(new[1],':')[2]
Finally, I used a pivot transformation: group by id, with key as the pivot column and max(value) as the pivoted column expression (an aggregate is required).
This produces the required result. The following is the entire data flow JSON (the actual transformations start from rank, since you already have the array column):
{
"name": "dataflow1",
"properties": {
"type": "MappingDataFlow",
"typeProperties": {
"sources": [
{
"dataset": {
"referenceName": "csv1",
"type": "DatasetReference"
},
"name": "source1"
}
],
"sinks": [
{
"dataset": {
"referenceName": "dest",
"type": "DatasetReference"
},
"name": "sink1"
}
],
"transformations": [
{
"name": "derivedColumn1"
},
{
"name": "rank1"
},
{
"name": "derivedColumn2"
},
{
"name": "cast1"
},
{
"name": "derivedColumn3"
},
{
"name": "pivot1"
}
],
"scriptLines": [
"source(output(",
" mycol as string",
" ),",
" allowSchemaDrift: true,",
" validateSchema: false,",
" ignoreNoFilesFound: false) ~> source1",
"source1 derive(mycol = split(replace(replace(replace(mycol,'[',''),']',''),'\"',''),',')) ~> derivedColumn1",
"derivedColumn1 rank(asc(mycol, true),",
" output(id as long)) ~> rank1",
"rank1 derive(new = unfold(mycol)) ~> derivedColumn2",
"derivedColumn2 cast(output(",
" new as string[]",
" ),",
" errors: true) ~> cast1",
"cast1 derive(key = split(new[1],':')[1],",
" value = split(new[1],':')[2]) ~> derivedColumn3",
"derivedColumn3 pivot(groupBy(id),",
" pivotBy(key),",
" {} = max(value),",
" columnNaming: '$N$V',",
" lateral: true) ~> pivot1",
"pivot1 sink(allowSchemaDrift: true,",
" validateSchema: false,",
" partitionFileNames:['op.csv'],",
" umask: 0022,",
" preCommands: [],",
" postCommands: [],",
" skipDuplicateMapInputs: true,",
" skipDuplicateMapOutputs: true,",
" saveOrder: 1,",
" partitionBy('hash', 1)) ~> sink1"
]
}
}
}
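To make the data flow above easier to reason about, here is a minimal Python sketch of the same idea (unfold each array, split every element on ':', then pivot keys into columns). It only mimics the logic and is not data flow script. The name array_to_columns is made up for illustration, and the id is assumed to be the input position rather than the sorted rank that the rank transformation produces.

```python
from collections import defaultdict

# Sample rows from the question, one array of "key:value" strings per row.
rows = [
    ["title:mr", "name:jon", "surname:smith"],
    ["surname:jane"],
    ["title:mrs", "surname:peters"],
    ["title:mr"],
]


def array_to_columns(rows):
    """Mimic unfold -> split -> pivot on a list of 'key:value' arrays."""
    pivoted = defaultdict(dict)
    for row_id, items in enumerate(rows, start=1):  # assumption: id = input position
        for item in items:                          # unfold: one record per element
            key, _, value = item.partition(":")     # split on the first ':'
            pivoted[row_id][key] = value            # pivot: each key becomes a column
    # Every key seen in any row becomes a column; missing values stay None.
    columns = sorted({k for cols in pivoted.values() for k in cols})
    return [
        {"id": rid, **{c: cols.get(c) for c in columns}}
        for rid, cols in sorted(pivoted.items())
    ]


result = array_to_columns(rows)
for record in result:
    print(record)
```

Each output record has the columns id, name, surname and title, with None where a row had no entry for that key.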

Related

PostgreSQL: Json function

I have a PostgreSQL column that looks like this:
{
"table": false,
"time": {
"user": {
"type": "admin"
},
"end": {
"Always": null
},
"sent": {
"Never": 1356
},
"increments": 5,
"increment_type": "weeks",
"type": "days"
}
}
I would like to extract "increments" (5) and "increment_type" (weeks) from the JSON. The result would be: column_a = 5 weeks

Use the dereferencing operators -> and ->> to get what you need:
select concat(
colname->'time'->>'increments',
' ',
colname->'time'->>'increment_type'
) as column_a
from tablename;
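If it helps to see the same lookup outside the database, here is a small Python sketch of what the query does: step into "time", read the two values as text, and join them with a space. The function name column_a is just taken from the alias above, and the document is a trimmed copy of the one in the question. Treating a missing key as an empty string is meant to resemble how concat() ignores NULL arguments.

```python
import json

# Trimmed copy of the JSON from the question.
doc = json.loads("""
{
  "table": false,
  "time": {
    "increments": 5,
    "increment_type": "weeks",
    "type": "days"
  }
}
""")


def column_a(doc):
    """Join time.increments and time.increment_type with a single space."""
    time = doc.get("time") or {}
    parts = [time.get("increments"), time.get("increment_type")]
    # A missing value becomes an empty string instead of raising an error.
    return " ".join("" if p is None else str(p) for p in parts)


print(column_a(doc))
```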

How to add new property in an object nested in 2 arrays (JSONB postgresql)

I am looking to you for help in adding a property to a json object nested in 2 arrays.
Table Example :
CREATE TABLE events (
seq_id BIGINT PRIMARY KEY,
data JSONB NOT NULL,
evt_type TEXT NOT NULL
);
example of my JSONB data column in my table:
{
"Id":"1",
"Calendar":{
"Entries":[
{
"Id": 1,
"SubEntries":[
{
"ExtId":{
"Id":"10",
"System": "SampleSys"
},
"Country":"FR",
"Details":[
{
"ItemId":"1",
"Quantity":10
},
{
"ItemId":"2",
"Quantity":3
}
],
"RequestId":"222",
"TypeId":1
}
],
"OrderResult":null
}
],
"OtherThingsArray":[
]
}
}
So I need to add new properties to a SubEntries object, selected by the Id value of its ExtId object (the WHERE clause).
How can I do that, please?
Thanks a lot

You can use jsonb_set() for this. It takes the path to assign to as a text[] (an array of text values):
SELECT jsonb_set(
input_jsonb,
-- the starting jsonb document
'{i,j,k[, ...]}'::text[],
-- path_array: the series {i, j, k} progresses one level at a time, each element being either a (string) key or an (integer) index (starting at zero); the last element denotes the key (or index) to populate
new_jsonb_value,
-- if adding a key-value pair, you can use something like to_jsonb('new_value_string'::text) here to force things to format correctly
create_if_not_exists_boolean
-- if adding new keys/indexes, give this as true so they'll be appended; otherwise you'll be limited to overwriting existing keys
)
Example
json
{
"array1": [
{
"id": 1,
"detail": "test"
}
]
}
SQL
SELECT
jsonb_set('{"array1": [{"id": 1, "detail": "test"}]}'::jsonb,
'{array1,0,upd}'::TEXT[],
to_jsonb('new'::text),
true
)
Output
{
"array1": [
{
"id": 1,
"upd": "new",
"detail": "test"
}
]
}
Note that you can only add 1 nominal level of depth at a time (i.e. either a new key or a new index entry), but you can circumvent this by providing the full depth in the assignment value, or by using jsonb_set() iteratively:
select
jsonb_set(
jsonb_set('{"array1": [{"id": 1, "detail": "test"}]}'::jsonb, '{array1,0,upd}'::TEXT[], '[{"new": "val"}]'::jsonb, true),
'{array1,0,upd,0,check}'::TEXT[],
'"test_val"',
true)
would be required to produce
{
"array1": [
{
"id": 1,
"upd": [
{
"new": "val",
"check": "test_val"
}
],
"detail": "test"
}
]
}
If you need other, more complex logic to evaluate which values need to be added to which objects, you can try:
dynamically creating a set of jsonb_set() statements for execution
using the outputs from queries of jsonb_each() and jsonb_array_elements() to evaluate the row logic down at the SubEntries level, and then using jsonb_object_agg() and jsonb_agg() as appropriate to build the document back up to the root level from the resultant object-arrays and key-value collections
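The second option is easier to picture as plain code. Below is a rough Python sketch of that row logic for the document in the question: walk Calendar.Entries, then each SubEntries array, and merge new properties into the objects whose ExtId.Id matches. The function name add_to_subentries and the property "NewProp" are hypothetical, and the sample document is trimmed. This only illustrates the traversal you would have to reproduce with jsonb_array_elements() and jsonb_agg(); it is not SQL.

```python
import copy


def add_to_subentries(doc, ext_id, new_props):
    """Return a copy of doc with new_props merged into every
    SubEntries object whose ExtId.Id equals ext_id."""
    out = copy.deepcopy(doc)  # leave the input untouched
    for entry in out.get("Calendar", {}).get("Entries", []):
        for sub in entry.get("SubEntries", []):
            if sub.get("ExtId", {}).get("Id") == ext_id:
                sub.update(new_props)
    return out


# Trimmed version of the document from the question.
doc = {
    "Id": "1",
    "Calendar": {
        "Entries": [
            {
                "Id": 1,
                "SubEntries": [
                    {
                        "ExtId": {"Id": "10", "System": "SampleSys"},
                        "Country": "FR",
                        "RequestId": "222",
                    }
                ],
                "OrderResult": None,
            }
        ],
        "OtherThingsArray": [],
    },
}

# "NewProp" is a made-up property name for illustration.
updated = add_to_subentries(doc, "10", {"NewProp": "value"})
print(updated["Calendar"]["Entries"][0]["SubEntries"][0])
```

Once you know the matching array indexes, the same change can be written as a single jsonb_set() call with a path such as '{Calendar,Entries,0,SubEntries,0,NewProp}'.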

Postgresql Search for a value in jsonb object

I have a jsonb field in a table having values of the form
{
"message": {
"sender": {
"from": "91**********"
},
"channel": "some kind of text",
"content": {
"text": "some kind of text",
"type": "text"
},
"recipient": {
"to": "91**********",
"recipient_type": "some kind of text"
},
"preferences": {
"webHookDNId": "some kind of text"
}
},
"metaData": {
"version": "some kind of text"
}
}
Now I want to find all rows where the "to" key of the object holds a certain phone number. I am using the following query, but it is not working:
select * from table_name where (column1::jsonb ? '91**********') ;

? looks for a top-level key. The JSON you show only has two top-level keys, "message" and "metaData", so of course neither matches '91**********'.
You probably want the containment operator @>:
@> '{"message":{"recipient":{"to":"91**********"}}}'
This will be supported by either type of JSONB GIN index on your column.
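The difference between a top-level key test and containment can be shown with a short Python sketch. The helpers has_top_level_key and contains are made-up names, and contains is deliberately simplified: it handles nested objects and compares everything else (including arrays) by plain equality, which is narrower than what PostgreSQL's @> does.

```python
def has_top_level_key(doc, key):
    """Roughly what the ? operator checks: is key a top-level key?"""
    return isinstance(doc, dict) and key in doc


def contains(doc, pattern):
    """Simplified containment: every key in pattern must be present
    in doc with a value that in turn contains the pattern's value."""
    if isinstance(pattern, dict):
        return isinstance(doc, dict) and all(
            k in doc and contains(doc[k], v) for k, v in pattern.items()
        )
    return doc == pattern  # scalars (and, in this sketch, arrays) must be equal


# Trimmed version of the document from the question.
doc = {
    "message": {
        "sender": {"from": "91**********"},
        "recipient": {"to": "91**********", "recipient_type": "some kind of text"},
    },
    "metaData": {"version": "some kind of text"},
}

print(has_top_level_key(doc, "91**********"))  # the phone number is a value, not a key
print(contains(doc, {"message": {"recipient": {"to": "91**********"}}}))
```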

You can use the -> and ->> operators to extract the value of a key. Note that "recipient" is nested under "message":
select *
from the_table
where (the_column -> 'message' -> 'recipient' ->> 'to') = '91**********';
Or the #>> operator:
select *
from the_table
where the_column #>> '{message,recipient,to}' = '91**********';

Is there a magic function that can extract all selected keys/nested keys, including arrays, from jsonb

Given a jsonb value and a set of keys, how can I get a new jsonb with only the required keys?
I've tried extracting the key-values, assigning them to a text[] and then using jsonb_object(text[]). It works well, but the problem comes when a key holds an array of JSON objects.
create table my_jsonb_table
(
data_col jsonb
);
insert into my_jsonb_table (data_col) Values ('{
"schemaVersion": "1",
"Id": "20180601550002",
"Domains": [
{
"UID": "29aa2923",
"quantity": 1,
"item": "book",
"DepartmentDomain": {
"type": "paper",
"departId": "10"
},
"PriceDomain": {
"Price": 79.00,
"taxA": 6.500,
"discount": 0
}
},
{
"UID": "bbaa2923",
"quantity": 2,
"item": "pencil",
"DepartmentDomain": {
"type": "wood",
"departId": "11"
},
"PriceDomain": {
"Price": 7.00,
"taxA": 1.5175,
"discount": 1
}
}
],
"finalPrice": {
"totalTax": 13.50,
"total": 85.0
},
"MetaData": {
"shopId": "1405596346",
"locId": "95014",
"countryId": "USA",
"regId": "255",
"Date": "20180601"
}
}
');
This is what I am trying to achieve:
SELECT some_magic_fun(data_col,'Id,Domains.UID,Domains.DepartmentDomain.departId,finalPrice.total')::jsonb FROM my_jsonb_table;
I am trying to create that magic function, which extracts the given keys as jsonb. So far I can extract scalar items, put them in a text[] and use jsonb_object, but I don't know how to extract all the elements of an array.
Expected output:
{
"Id": "20180601550002",
"Domains": [
{
"UID": "29aa2923",
"DepartmentDomain": {
"departId": "10"
}
},
{
"UID": "bbaa2923",
"DepartmentDomain": {
"departId": "11"
}
}
],
"finalPrice": {
"total": 85.0
}
}

I don't know of any magic. You have to rebuild it yourself.
select jsonb_build_object(
-- Straightforward
'Id', data_col->'Id',
'Domains', (
-- Aggregate all the "rows" back together into an array.
select jsonb_agg(
-- Turn each array element into a new object
jsonb_build_object(
'UID', domain->'UID',
'DepartmentDomain', jsonb_build_object(
'departId', domain#>'{DepartmentDomain,departId}'
)
)
)
-- Turn each element of the Domains array into a row
from jsonb_array_elements( data_col->'Domains' ) d(domain)
),
-- Also pretty straightforward
'finalPrice', jsonb_build_object(
'total', data_col#>'{finalPrice,total}'
)
) from my_jsonb_table;
This probably is not a good use of a JSON column. Your data is relational and would better fit traditional relational tables.
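If doing this in application code is an option, the generic "pick these dotted paths" behaviour is short to write there. Here is a minimal Python sketch. The function name pick and the dotted-path convention are assumptions taken from the call shape in the question, not an existing PostgreSQL feature, and the sample document is trimmed.

```python
def pick(value, paths):
    """Keep only the dotted paths in `paths`; lists are handled
    by applying the same paths to every element."""
    if isinstance(value, list):
        return [pick(item, paths) for item in value]
    # Group the remaining path segments by their first key.
    grouped = {}
    for path in paths:
        head, _, rest = path.partition(".")
        grouped.setdefault(head, []).append(rest)
    result = {}
    for key, rests in grouped.items():
        if not isinstance(value, dict) or key not in value:
            continue  # silently skip paths that do not exist
        if "" in rests:
            result[key] = value[key]  # the path ends here: keep the whole value
        else:
            result[key] = pick(value[key], rests)
    return result


# Trimmed version of the document from the question.
doc = {
    "schemaVersion": "1",
    "Id": "20180601550002",
    "Domains": [
        {"UID": "29aa2923", "quantity": 1,
         "DepartmentDomain": {"type": "paper", "departId": "10"}},
        {"UID": "bbaa2923", "quantity": 2,
         "DepartmentDomain": {"type": "wood", "departId": "11"}},
    ],
    "finalPrice": {"totalTax": 13.5, "total": 85.0},
}

picked = pick(doc, ["Id", "Domains.UID",
                    "Domains.DepartmentDomain.departId", "finalPrice.total"])
print(picked)
```

The same recursion could be ported to a PL/pgSQL function, but that is more work than the hand-written jsonb_build_object() query when the set of keys is fixed.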

Postgresql get keys from array of objects in JSONB field

Here is some dummy data for the jsonb column:
[ { "name": [ "sun11", "sun12" ], "alignment": "center", "more": "fields" }, { "name": [ "sun12", "sun13" ], "alignment": "center" }, { "name": [ "sun14", "sun15" ] }]
I want to fetch the value of the name key from every object in the jsonb array. Expected output:
[ [ "sun11", "sun12" ], [ "sun12", "sun13" ], [ "sun14", "sun15" ] ]
The problem is that I can only fetch the name value by giving an index such as 0, 1, etc.:
SELECT data->0->'name' FROM public."user";
[ "sun11", "sun12" ]
But I'm not able to get the name values from all the objects in the array at once. Any help is appreciated. Thanks.

WITH data AS (
SELECT '[ { "name": [ "sun11", "sun12" ], "alignment": "center", "more": "fields" }, { "name": [ "sun12", "sun13" ], "alignment": "center" }, { "name": [ "sun14", "sun15" ] }]'::jsonb AS jsondata
)
SELECT
jsonb_agg(elems.value -> 'name') -- 2
FROM
data,
jsonb_array_elements(jsondata) AS elems -- 1
1. jsonb_array_elements() expands every array element into one row.
2. The -> operator gives the array for the attribute name; after that, jsonb_agg() puts all the extracted arrays back into one array.
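For comparison, the same two steps look like this in Python: iterate over the array elements, take "name" from each one, and collect the results into a single list. This is only an illustration of the logic using the sample data from the question. Using .get means an element without "name" contributes None, much like a missing key ends up as a JSON null in jsonb_agg().

```python
import json

data = json.loads("""
[ { "name": [ "sun11", "sun12" ], "alignment": "center", "more": "fields" },
  { "name": [ "sun12", "sun13" ], "alignment": "center" },
  { "name": [ "sun14", "sun15" ] } ]
""")

# Step 1: one iteration per array element; step 2: collect each "name".
names = [elem.get("name") for elem in data]
print(names)
```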

My example:
SELECT DISTINCT sub.name FROM (
SELECT
jsonb_build_object('name', u.data->'name') AS name
FROM "user" AS u
WHERE u.data IS NOT NULL
) sub
WHERE sub.name != '{"name": null}';