Jolt - recursive process

I am trying to flatten some JSON arrays using Jolt. I found how to do it on the first level, but I need to do it "recursively".
About the input:
the arrays I want to flatten are named "objects"
the inner objects all contain a "name" attribute, but their other attributes are not common
Here is a simple sample:
{
"objects": [
{
"name": "first",
"type": "FIRST",
"1st_attr": 0
},
{
"name": "second",
"type": "SECOND",
"2nd_attr": 0
}
],
"sub": {
"objects": [
{
"name": "third",
"type": "THIRD",
"3rd_attr": 0
}
]
}
}
Here is the output I want:
{
"first" : {
"1st_attr" : 0,
"name" : "first",
"type" : "FIRST"
},
"second" : {
"2nd_attr" : 0,
"name" : "second",
"type" : "SECOND"
},
"sub" : {
"third" : {
"3rd_attr" : 0,
"name" : "third",
"type" : "THIRD"
}
}
}
The spec I have flattens the first level, but I would like it to flatten every level (meaning, not just the second ;)...):
[
{
"operation": "shift",
"spec": {
"objects": {
"*": {
"#": "#(1,name)"
}
},
"*": "&0"
}
}
]
Thanks for your help

With Jolt you have to know where the "objects" arrays are in your input. There isn't a way to "just find and flatten all the arrays, no matter where they are in the input doc".
Spec for the input and output you provided:
[
{
"operation": "shift",
"spec": {
"objects": {
"*": "#(0,name)"
},
"sub": {
"objects": {
"*": "&2.#(0,name)"
}
}
}
}
]
Edit: added the "&2." prefix so the result is nested as sub.third instead of just third
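Since Jolt cannot walk arbitrary nesting on its own, the recursive behaviour the question asks for has to live outside Jolt. Below is a minimal, illustrative Python version of the same flattening (just the equivalent logic, not a Jolt feature):

```python
def flatten_objects(node):
    """Recursively replace each "objects" array with its items keyed by "name"."""
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key == "objects" and isinstance(value, list):
            # Hoist each inner object up one level, keyed by its "name" attribute.
            for item in value:
                out[item["name"]] = flatten_objects(item)
        else:
            out[key] = flatten_objects(value)
    return out

doc = {
    "objects": [
        {"name": "first", "type": "FIRST", "1st_attr": 0},
        {"name": "second", "type": "SECOND", "2nd_attr": 0},
    ],
    "sub": {"objects": [{"name": "third", "type": "THIRD", "3rd_attr": 0}]},
}
print(flatten_objects(doc))
```

This handles any depth of nesting, which is exactly what a single shift spec cannot express.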

Related

Need help with Jolt spec for dynamic key

My input looks like the below:
{
"honda" : {
"accord" : [ "30", "20" ],
"plus" : [ "20", "10", "" ]
},
"tesla" : {
"modelY" : [ "50", "20", "" ],
"modelX" : [ "20", "" ]
}
}
I want to write a spec to get only the first value in the array for each model. This is dynamic data and may vary depending on the query. The output must look something like this. I have tried and had no luck with the dynamic key.
{
"honda" : {
"accord" : "30",
"plus" : "20"
},
"tesla" : {
"modelY" : "50",
"modelX" : "20"
}
}
Check this spec:
[
{
"operation": "shift",
"spec": {
"*": {
"*": {
"0": "&2.&1"
}
}
}
}
]
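For comparison, the same "first element per model" reshaping written as a plain Python sketch (illustration only; the Jolt spec above is the actual answer):

```python
# Keep only the first array element under each make/model, mirroring the
# "0": "&2.&1" shift: match index 0 and write it back under the same two keys.
cars = {
    "honda": {"accord": ["30", "20"], "plus": ["20", "10", ""]},
    "tesla": {"modelY": ["50", "20", ""], "modelX": ["20", ""]},
}

firsts = {make: {model: values[0] for model, values in models.items()}
          for make, models in cars.items()}
print(firsts)
```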

JSON conversion using JOLT

I am trying to convert JSON to a different format using JOLT (the NiFi JoltTransformJson processor). For a single JSON record, the spec I am using works fine in the JOLT demo app, but when I execute it with multiple JSON records I do not get the expected output there. Could anyone tell me what additional changes I need to make to the JOLT spec to handle multiple JSON records?
Sample input JSON:
[
{
"pool": {
"field": [
{
"name": "BillingDay",
"value": "12"
},
{
"name": "Custom1",
"value": "POOL_BASE_PLAN_3GB"
}
]
},
"usage": {
"version": "3",
"quota": {
"name": "POOL_TOP_UP_1GB_2",
"cid": "5764888998010953848"
}
},
"state": {
"version": "1",
"property": [
{
"name": "SMS_RO_TOP",
"value": "1"
},
{
"name": "BillingTimeStamp",
"value": "2020-06-12T01:00:05"
},
{
"name": "timereset",
"value": "2020-01-12T00:35:53"
}
]
}
},
{
"pool": {
"field": [
{
"name": "PoolID",
"value": "111100110000003505209"
},
{
"name": "BillingDay",
"value": "9"
}
]
},
"usage": {
"version": "3"
},
"state": {
"version": "1",
"property": [
{
"name": "BillingTimeStamp",
"value": "2020-06-09T01:00:05"
},
{
"name": "timereset",
"value": "2019-03-20T17:10:38"
}
]
}
}
]
JOLT spec I am using:
[
{
"operation": "modify-default-beta",
"spec": {
"state": {
"property": {
"name": "NOTAVAILABLE"
}
},
"usage": {
"quota": {
"name": "NOTAVAILABLE"
}
}
}
},
{
"operation": "shift",
"spec": {
"pool": {
"field": {
"*": {
"value": "pool_item.#(1,name)"
}
}
},
// pass the remaining elements through as-is
"*": "&"
}
}
]
Expected output JSON:
[
{
"pool_item" : {
"BillingDay" : "12",
"Custom1" : "POOL_BASE_PLAN_3GB"
},
"usage" : {
"version" : "3",
"quota" : {
"name" : "POOL_TOP_UP_1GB_2",
"cid" : "5764888998010953848"
}
},
"state" : {
"version" : "1",
"property" : [ {
"name" : "SMS_RO_TOP",
"value" : "1"
}, {
"name" : "BillingTimeStamp",
"value" : "2020-06-12T01:00:05"
}, {
"name" : "timereset",
"value" : "2020-01-12T00:35:53"
} ]
}
},
{
"pool_item" : {
"BillingDay" : "9",
"PoolID" : "111100110000003505209"
},
"usage" : {
"version" : "3",
"quota" : {
"name" : "NOTAVAILABLE"
}
},
"state" : {
"version" : "1",
"property" : [ {
"name" : "SMS_RO_TOP",
"value" : "1"
}, {
"name" : "BillingTimeStamp",
"value" : "2020-06-12T01:00:05"
}, {
"name" : "timereset",
"value" : "2020-01-12T00:35:53"
} ]
}
}
]
The shift specification below will work for the multiple JSON records in your input array.
[
{
"operation": "shift",
"spec": {
"*": {
"pool": {
"field": {
"*": {
"value": "[&4].pool_item.#(1,name)"
}
}
},
"usage": "[&1].usage",
"state": "[&1].state"
}
}
}
]
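The core move in that spec is turning each record's pool.field name/value pairs into a flat pool_item map while passing usage and state through. A hedged Python sketch of the same reshaping, using a trimmed-down record for brevity (the modify-default-beta defaulting step is omitted here):

```python
# Reshape one record: collapse pool.field [{name, value}, ...] pairs into a
# flat pool_item map, and copy usage/state through unchanged.
def reshape(record):
    out = {"pool_item": {f["name"]: f["value"] for f in record["pool"]["field"]}}
    for key in ("usage", "state"):
        if key in record:
            out[key] = record[key]
    return out

records = [
    {"pool": {"field": [{"name": "BillingDay", "value": "12"},
                        {"name": "Custom1", "value": "POOL_BASE_PLAN_3GB"}]},
     "usage": {"version": "3"},
     "state": {"version": "1"}},
]
print([reshape(r) for r in records])
```

Mapping over the outer list is what the leading "*" wildcard in the spec does for each element of the input array.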

JOLT spec for ternary operation

I am trying to write a Jolt spec to convert the following input into the expected output mentioned below.
INPUT:
{
"city": "Seattle",
"state": "WA",
"country": "US",
"date": "10/20/2018",
"userList": [
{
"name": "David",
"age": "22",
"sex": "M",
"company": "good"
},
{
"name": "Tom",
"age": "30",
"sex": "M",
"company": "good"
},
{
"name": "Annie",
"age": "25",
"sex": "F",
"company": "bad"
},
{
"name": "Aaron",
"age": "27",
"sex": "M",
"company": "bad"
}
]
}
EXPECTED OUTPUT:
{
"users" : [ {
"date" : "10/20/2018",
"username" : "David",
"age" : "22",
"sex" : "M",
"organization" : "good"
}, {
"date" : "10/20/2018",
"username" : "Tom",
"age" : "30",
"sex" : "M",
"organization" : "good"
} ],
"Date" : "10/20/2018",
"State" : "WA",
"Country" : "US"
}
I want to filter out all the elements in the list where company = bad or sex = F. Or, alternatively, keep only the elements where company = good and sex = M.
I need help removing the elements from the list based on these specific conditions.
Is Jolt recommended for such data-driven conversions?
The spec that I have written so far is:
[{
"operation": "shift",
"spec": {
"userList": {
"*": {
"name": "users.[&1].username",
"age": "users.[&1].age",
"sex": "users.[&1].sex",
"company": "users.[&1].organization",
"#(2,date)": "users.[&1].date"
}
},
"date": "Date",
"state": "State",
"country": "Country"
}
}
]
OK, here is the process I followed to come up with a solution for the "take all M+good" case:
copy the top-level elements as-is, without impacting the scope
for the users content, go to the first field used for filtering ("users -> * -> sex -> M") and copy all the fields in that case
do the same for the other field used for filtering
in steps 2 and 3, I used a map to avoid "null" entries in the array; the last operation switches back to an array
[{
"operation": "shift",
"spec": {
"userList": {
"*": {
"name": "users.[&1].username",
"age": "users.[&1].age",
"sex": "users.[&1].sex",
"company": "users.[&1].organization",
"#(2,date)": "users.[&1].date"
}
},
"date": "Date",
"state": "State",
"country": "Country"
}
}
,
{
"operation": "shift",
"spec": {
"*": "&",
"users": {
"*": {
"sex": {
"M": {
"#(2,username)": "users.&3.username",
"#(2,age)": "users.&3.age",
"#(2,sex)": "users.&3.sex",
"#(2,organization)": "users.&3.organization",
"#(2,date)": "users.&3.date"
}
}
}
}
}
}
,
{
"operation": "shift",
"spec": {
"*": "&",
"users": {
"*": {
"organization": {
"good": {
"#(2,username)": "users.&3.username",
"#(2,age)": "users.&3.age",
"#(2,sex)": "users.&3.sex",
"#(2,organization)": "users.&3.organization",
"#(2,date)": "users.&3.date"
}
}
}
}
}
},
{
"operation": "shift",
"spec": {
"*": "&",
"users": {
"*": "users[]"
}
}
}
]
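The four-operation chain above can be compressed outside Jolt: in plain Python the same "keep only M + good, attach the top-level date" filter is a single comprehension (a sketch for comparison, not a replacement for the spec):

```python
# Filter userList down to sex == "M" and company == "good", renaming the
# fields and attaching the top-level date, as the Jolt chain does.
doc = {
    "date": "10/20/2018", "state": "WA", "country": "US",
    "userList": [
        {"name": "David", "age": "22", "sex": "M", "company": "good"},
        {"name": "Tom", "age": "30", "sex": "M", "company": "good"},
        {"name": "Annie", "age": "25", "sex": "F", "company": "bad"},
        {"name": "Aaron", "age": "27", "sex": "M", "company": "bad"},
    ],
}

users = [
    {"date": doc["date"], "username": u["name"], "age": u["age"],
     "sex": u["sex"], "organization": u["company"]}
    for u in doc["userList"]
    if u["sex"] == "M" and u["company"] == "good"
]
print(users)
```

This kind of value-conditional filtering is awkward in Jolt precisely because shift matches on structure, not on data, which is why the spec needs one pass per condition.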
For the other case, you just have to change the filter rules to do nothing when the rule matches and to copy the fields in all other cases ("*"). That gives the following spec (I did it only on the first field; updating it for the second field should not be an issue):
[{
"operation": "shift",
"spec": {
"userList": {
"*": {
"name": "users.[&1].username",
"age": "users.[&1].age",
"sex": "users.[&1].sex",
"company": "users.[&1].organization",
"#(2,date)": "users.[&1].date"
}
},
"date": "Date",
"state": "State",
"country": "Country"
}
}
,
{
"operation": "shift",
"spec": {
"*": "&",
"users": {
"*": {
"sex": {
"F": null,
"*": {
"#(2,username)": "users.&3.username",
"#(2,age)": "users.&3.age",
"#(2,sex)": "users.&3.sex",
"#(2,organization)": "users.&3.organization",
"#(2,date)": "users.&3.date"
}
}
}
}
}
},
{
"operation": "shift",
"spec": {
"*": "&",
"users": {
"*": "users[]"
}
}
}
]

Implicit array creation in Jolt

Input:
{
"categories": {
"1": {
"name": "Books"
},
"2": {
"name": "Games"
}
}
}
Spec:
[
{
"operation": "shift",
"spec": {
"categories": {
"*": {
"#": "categories"
}
}
}
}
]
Output (array of categories):
{
"categories" : [ {
"name" : "Books"
}, {
"name" : "Games"
} ]
}
Another input, with just one element:
{
"categories": {
"1": {
"name": "Books"
}
}
}
Output:
{
"categories" : {
"name" : "Books"
}
}
I expected the output to be a categories array containing just one element. Why does this spec not create an array when there is a single element?
Jolt only wraps an output location in an array when more than one value lands there; with a single match, "#": "categories" writes a lone object. The explicit array notation categories[#1] forces an array even for one element. I managed the problem with the following transformation:
[
{
"operation": "shift",
"spec": {
"*": {
"*": "categories[#1]"
}
}
}
]
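A small Python sketch of that always-an-array behaviour (illustrative only):

```python
# Always emit "categories" as a list, regardless of how many entries the
# input map holds - mirroring what the categories[#1] output path produces.
def to_category_list(doc):
    return {"categories": list(doc["categories"].values())}

one = {"categories": {"1": {"name": "Books"}}}
two = {"categories": {"1": {"name": "Books"}, "2": {"name": "Games"}}}
print(to_category_list(one))
print(to_category_list(two))
```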

How to validate data type in a Jolt transformation

I'm new to Jolt transformations. I was wondering if there is a way to validate a data type before proceeding.
I'm processing JSON to insert records into HBase. From the source I get the timestamp repeated for the same resource id, which I want to use for the row key.
So I just retrieve the first timestamp and concatenate it with the resource id to create the row key. But I have an issue when there is only one timestamp in the record, i.e. when it is not a list. I'd appreciate it if someone could help me handle this situation.
Input data:
{ "resource": {
"id": "200629068",
"name": "resource_name_1)",
"parent": {
"id": 200053744,
"name": "parent_name"
},
"properties": {
"AP_ifSpeed": "0",
"DisplaySpeed": "0 (NotApplicable)",
"description": "description"
}
},
"data": [
{
"metric": {
"id": "2215",
"name": "metric_name 1"
},
"timestamp": 1535064595000,
"value": 0
},
{
"metric": {
"id": "2216",
"name": "metric_name_2"
},
"timestamp": 1535064595000,
"value": 1
}
]
}
Jolt transformation:
[{
"operation": "shift",
"spec": {
"resource": {
// "id": "resource_&",
"name": "resource_&",
"id": "resource_&",
"parent": {
"id": "parent_&",
"name": "parent_&"
},
"properties": {
"*": "&"
}
},
"data": {
"*": {
"metric": {
"id": {
"*": {
"#(3,value)": "&1"
}
},
"name": {
"*": {
"#(3,value)": "&1"
}
}
},
"timestamp": "timestamp"
}
}
}
}, {
"operation": "shift",
"spec": {
"timestamp": {
// get first element from list
"0": "&1"
},
"*": "&"
}
},
{
"operation": "modify-default-beta",
"spec": {
"rowkey": "=concat(#(1,resource_id),'_',#(1,timestamp))"
}
}
]
Output I'm getting:
{ "resource_name" : "resource_name_1)",
"resource_id" : "200629068",
"parent_id" : 200053744,
"parent_name" : "parent_name",
"AP_ifSpeed" : "0",
"DisplaySpeed" : "0 (NotApplicable)",
"description" : "description",
"2215" : 0,
"metric_name 1" : 0,
"timestamp" : 1535064595000,
"2216" : 1,
"metric_name_2" : 1,
"rowkey" : "200629068_1535064595000"
}
When there is only one timestamp, I get:
"rowkey" : "200629068_"
In your shift spec, make the output "timestamp" always be an array, even if the incoming data array only has one element in it:
"timestamp": "timestamp[]"