JSON conversion using JOLT

I am trying to convert JSON to a different format using JOLT (via the NiFi JoltTransformJson processor). For a single JSON record, the JOLT spec I am using works fine in the JOLT demo app, but when I run it with multiple JSON records I do not get the expected output. Could anyone tell me what changes I need to make to the JOLT spec to handle multiple JSON records?
Sample input JSON:
[
{
"pool": {
"field": [
{
"name": "BillingDay",
"value": "12"
},
{
"name": "Custom1",
"value": "POOL_BASE_PLAN_3GB"
}
]
},
"usage": {
"version": "3",
"quota": {
"name": "POOL_TOP_UP_1GB_2",
"cid": "5764888998010953848"
}
},
"state": {
"version": "1",
"property": [
{
"name": "SMS_RO_TOP",
"value": "1"
},
{
"name": "BillingTimeStamp",
"value": "2020-06-12T01:00:05"
},
{
"name": "timereset",
"value": "2020-01-12T00:35:53"
}
]
}
},
{
"pool": {
"field": [
{
"name": "PoolID",
"value": "111100110000003505209"
},
{
"name": "BillingDay",
"value": "9"
}
]
},
"usage": {
"version": "3"
},
"state": {
"version": "1",
"property": [
{
"name": "BillingTimeStamp",
"value": "2020-06-09T01:00:05"
},
{
"name": "timereset",
"value": "2019-03-20T17:10:38"
}
]
}
}
]
JOLT spec I am using:
[
{
"operation": "modify-default-beta",
"spec": {
"state": {
"property": {
"name": "NOTAVAILABLE"
}
},
"usage": {
"quota": {
"name": "NOTAVAILABLE"
}
}
}
},
{
"operation": "shift",
"spec": {
"pool": {
"field": {
"*": {
"value": "pool_item.@(1,name)"
}
}
},
// copy the remaining elements through as they are
"*": "&"
}
}
]
Expected output JSON:
[
{
"pool_item" : {
"BillingDay" : "12",
"Custom1" : "POOL_BASE_PLAN_3GB"
},
"usage" : {
"version" : "3",
"quota" : {
"name" : "POOL_TOP_UP_1GB_2",
"cid" : "5764888998010953848"
}
},
"state" : {
"version" : "1",
"property" : [ {
"name" : "SMS_RO_TOP",
"value" : "1"
}, {
"name" : "BillingTimeStamp",
"value" : "2020-06-12T01:00:05"
}, {
"name" : "timereset",
"value" : "2020-01-12T00:35:53"
} ]
}
},
{
"pool_item" : {
"BillingDay" : "9",
"PoolID" : "111100110000003505209"
},
"usage" : {
"version" : "3",
"quota" : {
"name" : "NOTAVAILABLE"
}
},
"state" : {
"version" : "1",
"property" : [ {
"name" : "SMS_RO_TOP",
"value" : "1"
}, {
"name" : "BillingTimeStamp",
"value" : "2020-06-12T01:00:05"
}, {
"name" : "timereset",
"value" : "2020-01-12T00:35:53"
} ]
}
}
]

The Jolt shift spec below handles multiple JSON records in the input array:
[
{
"operation": "shift",
"spec": {
"*": {
"pool": {
"field": {
"*": {
"value": "[&4].pool_item.@(1,name)"
}
}
},
"usage": "[&1].usage",
"state": "[&1].state"
}
}
}
]
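To make the intent of the shift spec easier to check, here is a minimal plain-Python sketch of the same reshaping: for each record in the array, the name/value pairs under pool.field become keys of pool_item, and usage and state are copied through. The function name reshape_records and the trimmed sample data are illustrative assumptions, not part of Jolt or NiFi. Like the shift spec above, which has no modify-default step, the sketch does not add the "NOTAVAILABLE" default.

```python
def reshape_records(records):
    """Fold pool.field name/value pairs into pool_item for every record."""
    out = []
    for rec in records:
        fields = rec.get("pool", {}).get("field", [])
        item = {"pool_item": {f["name"]: f["value"] for f in fields}}
        # usage and state are copied unchanged, like "[&1].usage" / "[&1].state"
        for key in ("usage", "state"):
            if key in rec:
                item[key] = rec[key]
        out.append(item)
    return out


# Trimmed version of the sample input from the question
sample = [
    {"pool": {"field": [{"name": "BillingDay", "value": "12"},
                        {"name": "Custom1", "value": "POOL_BASE_PLAN_3GB"}]},
     "usage": {"version": "3"}, "state": {"version": "1"}},
    {"pool": {"field": [{"name": "PoolID", "value": "111100110000003505209"},
                        {"name": "BillingDay", "value": "9"}]},
     "usage": {"version": "3"}, "state": {"version": "1"}},
]

result = reshape_records(sample)
for record in result:
    print(record["pool_item"])
```

Comparing this output with the Jolt demo output for the same input is a quick way to confirm the spec handles each array element independently.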

Related

Elasticsearch search by time range

I am trying to search by date and by time separately.
The Elasticsearch document format is:
{
"id": "101",
"name": "Tom",
"customers": ["Jerry", "Nancy", "soli"],
"start_time": "2021-12-13T06:57:29.420198Z",
"end_time": "2021-12-13T07:00:23.511722Z"
}
I need to search by date and by time separately, for example:
{
"query": {
"bool" : {
"must" : [
{
"match" : { "customers" : "Jerry" }
},
{
"range": {
"start_time": {"gte" : "2021-12-13", "lte" : "2021-12-15" }}
}
]}
}
}
Output: I get the above document as the result, which is expected.
But when I use the query below, I get this error:
"failed to parse date field [6:57:29] with format [strict_date_optional_time||epoch_millis]: [failed to parse date field [6:57:29] with format [strict_date_optional_time||epoch_millis]]"
{
"query": {
"bool" : {
"must" : [
{
"match" : { "customers" : "Jerry" }
},
{
"range": {
"start_time": {"gte" : "6:57:29", "lte" : "6:59:35" }}
}
]}
}
}
Why am I not able to get results based on time?
Is there a way to search on both date and time using a single field?
For example:
{
"query": {
"bool" : {
"must" : [
{
"match" : { "customers" : "Jerry" }
},
{
"range": {
"start_time": {"gte" : "2021-12-13", "lte" : "2021-12-15" }}
},
{
"range": {
"start_time": {"gte" : "6:57:29", "lte" : "6:59:35" }}
}
]}
}
}
I also tried to achieve this using regular expressions, but it didn't help me.
This is the mapping:
{
"settings": {
"number_of_shards": 2,
"number_of_replicas": 1
},
"mappings": {
"dynamic": "true",
"_source": {
"enabled": "true"
},
"runtime": {
"start_time": {
"type": "keyword",
"script": {
"source": "doc.start_time.start_time.getHourOfDay() >=
params.min && doc.start_time.start_time.getHourOfDay()
<= params.max"
}
}
},
"properties": {
"name": {
"type": "keyword"
},
"customers": {
"type": "text"
}
}
}
}
The mapping above gives the error: "not a statement: result not used from boolean and operation [&&]"
This is the search query, which I will try once the index is created:
{
"query": {
"bool" : {
"must" : [
{
"match" : { "customers" : "Jerry" }
},
{
"match" : { "name" : "Tom" }
},
{
"range": {
"start_time": {
"gte": "2015-11-01",
"lte": "2015-11-30"
}
}
},
{
"script": {
"source":
"doc.start_time.start_time.getHourOfDay()
>= params.min &&
doc.start_time.start_time.getHourOfDay() <= params.max",
"params": {
"min": 6,
"max": 7
}
}
}
]}
}
}

Need help with Jolt spec for dynamic key

My input looks like this:
{
"honda" : {
"accord" : [ "30", "20" ],
"plus" : [ "20", "10", "" ]
},
"tesla" : {
"modelY" : [ "50", "20", "" ],
"modelX" : [ "20", "" ]
}
}
I want to write a spec that keeps only the first value in the array for each model. The data is dynamic and may vary depending on the query. The output should look something like this. I have tried, but had no luck with the dynamic keys.
{
"honda" : {
"accord" : "30",
"plus" : "20"
},
"tesla" : {
"modelY" : "50",
"modelX" : "20"
}
}
Try this spec:
[
{
"operation": "shift",
"spec": {
"*": {
"*": {
"0": "&2.&1"
}
}
}
}
]
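As a cross-check on what the spec should produce for the sample input, here is a small plain-Python sketch of the same idea: walk the two dynamic key levels and keep element 0 of each list. The name first_values is an illustrative assumption, and edge cases such as empty lists may not match Jolt's behaviour exactly.

```python
def first_values(data):
    """For every top-level key and nested key, keep only the first list element."""
    return {
        make: {model: values[0] for model, values in models.items() if values}
        for make, models in data.items()
    }


cars = {
    "honda": {"accord": ["30", "20"], "plus": ["20", "10", ""]},
    "tesla": {"modelY": ["50", "20", ""], "modelX": ["20", ""]},
}

result = first_values(cars)
print(result)
```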

Filter nested array with conditions based on multi-level object values and update them - MongoDB aggregate + update

Considering I have the following documents in a collection (ignoring the _id) :
[
{
"Id": "OP01",
"Sessions": [
{
"Id": "Session01",
"Conversations": [
{
"Id": "Conversation01",
"Messages": [
{
"Id": "Message01",
"Status": "read",
"Direction": "inbound"
},
{
"Id": "Message02",
"Status": "delivered",
"Direction": "internal"
},
{
"Id": "Message03",
"Status": "delivered",
"Direction": "inbound"
},
{
"Id": "Message04",
"Status": "sent",
"Direction": "outbound"
}
]
},
{
"Id": "Conversation02",
"Messages": [
{
"Id": "Message05",
"Status": "sent",
"Direction": "outbound"
}
]
}
]
},
{
"Id": "Session02",
"Conversations": [
{
"Id": "Conversation03",
"Messages": [
{
"Id": "Message06",
"Status": "read",
"Direction": "inbound"
},
{
"Id": "Message07",
"Status": "delivered",
"Direction": "internal"
}
]
},
{
"Id": "Conversation04",
"Messages": []
}
]
}
]
},
{
"Id": "OP02",
"Sessions": [
{
"Id": "Session03",
"Conversations": []
}
]
},
{
"Id": "OP03",
"Sessions": []
}
]
First query — aggregate (+$project)
I want to get the list of Messages grouped by their Conversations where:
Sessions.Id: "Session01"
and
Sessions.Conversations.Messages.Direction $in ["inbound", "outbound"]
and
Sessions.Conversations.Messages.Status $in ["sent", "delivered"]
The expected result is:
[
{
"Id": "Conversation01",
"Messages": [
{
"Id": "Message03",
"Status": "delivered",
"Direction": "inbound"
},
{
"Id": "Message04",
"Status": "sent",
"Direction": "outbound"
}
]
},
{
"Id": "Conversation02",
"Messages": [
{
"Id": "Message05",
"Status": "sent",
"Direction": "outbound"
}
]
}
]
A side note:
If the Sessions.Id: "Session01" condition also matches in other documents (or in other Sessions; "Session01" is not a unique key), the Messages there that match the other conditions should be included as well.
The result output does not include the document or Sessions levels.
Second query — update
I want to update the Sessions.Conversations.Messages.Status of all those messages (same condition as before) to "read".
The collection should then contain the following documents.
Please note the changes to:
Sessions.Conversations.Messages.Id = "Message03"
Sessions.Conversations.Messages.Id = "Message04"
Sessions.Conversations.Messages.Id = "Message05"
at Sessions.Id = "Session01"
[
{
"Id": "OP01",
"Sessions": [
{
"Id": "Session01",
"Conversations": [
{
"Id": "Conversation01",
"Messages": [
{
"Id": "Message01",
"Status": "read",
"Direction": "inbound"
},
{
"Id": "Message02",
"Status": "delivered",
"Direction": "internal"
},
{
"Id": "Message03",
"Status": "read",
"Direction": "inbound"
},
{
"Id": "Message04",
"Status": "read",
"Direction": "outbound"
}
]
},
{
"Id": "Conversation02",
"Messages": [
{
"Id": "Message05",
"Status": "read",
"Direction": "outbound"
}
]
}
]
},
{
"Id": "Session02",
"Conversations": [
{
"Id": "Conversation03",
"Messages": [
{
"Id": "Message06",
"Status": "read",
"Direction": "inbound"
},
{
"Id": "Message07",
"Status": "delivered",
"Direction": "internal"
}
]
},
{
"Id": "Conversation04",
"Messages": []
}
]
}
]
},
{
"Id": "OP02",
"Sessions": [
{
"Id": "Session03",
"Conversations": []
}
]
},
{
"Id": "OP03",
"Sessions": []
}
]
How can I accomplish these results with aggregate and update_one queries?
Here comes a visual explanation of both queries:
Here is the aggregation query:
db.session.aggregate([
{
$unwind:"$Sessions"
},
{
$unwind:"$Sessions.Conversations"
},
{
$unwind:"$Sessions.Conversations.Messages"
},
{
$match:{
"Sessions.Id" : "Session01",
"Sessions.Conversations.Messages.Direction":{
$in:[
"inbound", "outbound"
]
},
"Sessions.Conversations.Messages.Status":{
$in:[
"sent", "delivered"
]
}
}
},
{
$group:{
"_id":"$Sessions.Conversations.Id",
"Messages":{
$push:"$Sessions.Conversations.Messages"
}
}
}
]).pretty()
Output
{
"_id" : "Conversation02",
"Messages" : [
{
"Id" : "Message05",
"Status" : "sent",
"Direction" : "outbound"
}
]
}
{
"_id" : "Conversation01",
"Messages" : [
{
"Id" : "Message03",
"Status" : "delivered",
"Direction" : "inbound"
},
{
"Id" : "Message04",
"Status" : "sent",
"Direction" : "outbound"
}
]
}
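For readers who want to verify the pipeline's logic without a database, the following plain-Python sketch applies the same conditions in memory: restrict to the given session, keep messages whose Direction and Status are in the allowed sets, and group them by conversation Id. The helper names (make_msg, matching_messages) and the trimmed sample are illustrative assumptions; this is not a replacement for the aggregation, only a reference for the expected grouping.

```python
def matching_messages(docs, session_id, directions, statuses):
    """Group matching messages by conversation Id, mirroring $unwind + $match + $group."""
    grouped = {}
    for doc in docs:
        for session in doc.get("Sessions", []):
            if session["Id"] != session_id:
                continue
            for conv in session.get("Conversations", []):
                for message in conv.get("Messages", []):
                    if message["Direction"] in directions and message["Status"] in statuses:
                        grouped.setdefault(conv["Id"], []).append(message)
    return [{"Id": cid, "Messages": msgs} for cid, msgs in grouped.items()]


def make_msg(mid, status, direction):
    return {"Id": mid, "Status": status, "Direction": direction}


# Trimmed version of the sample collection from the question
docs = [
    {"Id": "OP01", "Sessions": [
        {"Id": "Session01", "Conversations": [
            {"Id": "Conversation01", "Messages": [
                make_msg("Message01", "read", "inbound"),
                make_msg("Message02", "delivered", "internal"),
                make_msg("Message03", "delivered", "inbound"),
                make_msg("Message04", "sent", "outbound")]},
            {"Id": "Conversation02", "Messages": [
                make_msg("Message05", "sent", "outbound")]}]},
        {"Id": "Session02", "Conversations": [
            {"Id": "Conversation03", "Messages": [
                make_msg("Message06", "read", "inbound"),
                make_msg("Message07", "delivered", "internal")]}]}]},
    {"Id": "OP03", "Sessions": []},
]

result = matching_messages(docs, "Session01",
                           {"inbound", "outbound"}, {"sent", "delivered"})
for conv in result:
    print(conv["Id"], [m["Id"] for m in conv["Messages"]])
```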
Now to update the document, I used the filtered positional operator with arrayFilters:
db.session.update(
{},
{
$set:{
"Sessions.$[session].Conversations.$[].Messages.$[message].Status":"read"
}
},
{
"arrayFilters": [{"session.Id":"Session01"},{ "message.Id": "Message05" }]
}
)
This sets the status to "read" for the message with "message.Id": "Message05" inside the session with "session.Id": "Session01".
Hope this will help you. :)
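UPDATE: to match on Direction and Status (the same conditions as the aggregation) rather than on a single message Id: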
db.session.update(
{},
{
$set:{
"Sessions.$[session].Conversations.$[].Messages.$[message].Status":"read"
}
},
{
"arrayFilters": [
{
"session.Id":"Session01"
},
{
"message.Direction": {
$in :[
"inbound",
"outbound"
]
},
"message.Status": {
$in :[
"sent",
"delivered"
]
}
}
]
}
)
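The way the array filters combine can be sketched in plain Python: the session filter selects which Sessions elements are entered, every conversation is visited (the $[] part), and a message is changed only if it satisfies both conditions of the message filter. The function name mark_read and the trimmed sample are illustrative assumptions; the sketch mutates an in-memory list and says nothing about how many documents a single update call touches on the server.

```python
def mark_read(docs, session_id, directions, statuses):
    """In-memory analogue of the arrayFilters update; returns the number of changed messages."""
    changed = 0
    for doc in docs:
        for session in doc.get("Sessions", []):
            if session["Id"] != session_id:                  # filter on session.Id
                continue
            for conv in session.get("Conversations", []):    # every conversation
                for message in conv.get("Messages", []):
                    # both conditions share one filter document, so both must hold
                    if message["Direction"] in directions and message["Status"] in statuses:
                        message["Status"] = "read"
                        changed += 1
    return changed


docs = [
    {"Id": "OP01", "Sessions": [
        {"Id": "Session01", "Conversations": [
            {"Id": "Conversation01", "Messages": [
                {"Id": "Message01", "Status": "read", "Direction": "inbound"},
                {"Id": "Message02", "Status": "delivered", "Direction": "internal"},
                {"Id": "Message03", "Status": "delivered", "Direction": "inbound"},
                {"Id": "Message04", "Status": "sent", "Direction": "outbound"}]}]},
        {"Id": "Session02", "Conversations": [
            {"Id": "Conversation03", "Messages": [
                {"Id": "Message07", "Status": "delivered", "Direction": "internal"}]}]}]},
]

changed = mark_read(docs, "Session01", {"inbound", "outbound"}, {"sent", "delivered"})
statuses_after = {
    m["Id"]: m["Status"]
    for d in docs for s in d["Sessions"] for c in s["Conversations"] for m in c["Messages"]
}
print(changed, statuses_after)
```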

JOLT spec on adding default values based on a condition

If my input contains "WorkflowCategory" in "metadata", then the output should contain workflow.workflowInputProperties with the specified default values, some of which repeat (such as the empty string "" and 3). If not, workflow.workflowInputProperties should not be added.
Input 1
{
"template": false,
"active": true,
"metadata": [
{
"value": "bank_",
"key": "AssetNamePrefix"
},
{
"value": "-BERG",
"key": "SuffixForPublicId"
},
{
"value": "false",
"key": "CORSEnabled"
},
{
"value": "Capture",
"key": "WorkflowCategory"
},
{
"value": "HD",
"key": "Features"
}
],
"description": "Template for working with PRI",
"name": "prof_name",
"type": "Live",
"id": "BNK056003413",
"version": 6
}
Input 2
{
"template": false,
"active": true,
"metadata": [
{
"value": "HD",
"key": "Features"
}
],
"description": "Live Template",
"name": "Live_HD",
"type": "Live",
"id": "BNK007596994",
"version": 1
}
For Input 1, output should be
{
"id" : "BNK056003413",
"name" : "prof_name",
"metadataSet" : {
"description" : "Template for working with PRI",
"type" : "Live"
},
"workflow" : {
"workflowInputProperties" : {
"assetNamePrefix" : "bank_",
"recordId" : "",
"sourceUri": "",
"processingUri": "",
"recorderType": "ABC",
"completionTimeout": 600,
"loopBackTimer": 10,
"numberOfRetries": 3,
"numberOfRetriesForScheduling": 3,
"scheduleDelay" : 3600
}
}
}
For Input 2, output should be as follows, without workflow.workflowInputProperties
{
"id" : "BNK007596994",
"name" : "Live_HD",
"metadataSet" : {
"description" : "Live Template",
"type" : "Live"
},
"features" : "HD"
}
Add the spec below to the rest of your spec. The one limitation is that I could not emit an empty string, so I used a single space instead. I have not found a way around that yet.
[
{
"operation": "shift",
"spec": {
"metadata": {
"*": {
"key": {
"WorkflowCategory": {
"#bank_": "workflow.workflowInputProperties.assetNamePrefix",
"# ": [
"workflow.workflowInputProperties.recordId",
"workflow.workflowInputProperties.sourceUri",
"workflow.workflowInputProperties.processingUri"
],
"#ABC": "workflow.workflowInputProperties.recorderType",
"#600": "workflow.workflowInputProperties.completionTimeout",
"#10": "workflow.workflowInputProperties.loopBackTimer",
"#3": ["workflow.workflowInputProperties.numberOfRetries",
"workflow.workflowInputProperties.numberOfRetriesForScheduling"],
"#3600": "workflow.workflowInputProperties.scheduleDelay"
}
}
}
}
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"workflow": {
"workflowInputProperties": {
"completionTimeout": "=toInteger",
"loopBackTimer": "=toInteger",
"numberOfRetries": "=toInteger",
"numberOfRetriesForScheduling": "=toInteger",
"scheduleDelay": "=toInteger"
}
}
}
}
]
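The conditional logic of the spec above can be summarised in a short plain-Python sketch: the defaults are emitted only when some metadata entry has the key "WorkflowCategory". The names WORKFLOW_DEFAULTS and workflow_section are illustrative assumptions, "bank_" is hard-coded just as the spec hard-codes it, and the sketch covers only the workflow part, not the rest of the output. Outside Jolt, the empty strings can be represented directly.

```python
# Default values copied from the expected output in the question.
WORKFLOW_DEFAULTS = {
    "assetNamePrefix": "bank_",   # hard-coded, as in the spec
    "recordId": "",
    "sourceUri": "",
    "processingUri": "",
    "recorderType": "ABC",
    "completionTimeout": 600,
    "loopBackTimer": 10,
    "numberOfRetries": 3,
    "numberOfRetriesForScheduling": 3,
    "scheduleDelay": 3600,
}


def workflow_section(doc):
    """Return the workflow block only if metadata contains a WorkflowCategory key."""
    keys = {entry.get("key") for entry in doc.get("metadata", [])}
    if "WorkflowCategory" not in keys:
        return {}
    return {"workflow": {"workflowInputProperties": dict(WORKFLOW_DEFAULTS)}}


input_1 = {"metadata": [{"value": "bank_", "key": "AssetNamePrefix"},
                        {"value": "Capture", "key": "WorkflowCategory"}]}
input_2 = {"metadata": [{"value": "HD", "key": "Features"}]}

with_defaults = workflow_section(input_1)
without_defaults = workflow_section(input_2)
print(with_defaults)
print(without_defaults)
```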

Jolt - recursive process

I am trying to flatten some JSON arrays using Jolt. I found out how to do it at the first level, but I need to do it "recursively".
About the input:
the arrays I want to flatten are named "objects"
the inner objects all contain a "name" attribute, but their other attributes are not common
Here is a simple sample:
{
"objects": [
{
"name": "first",
"type": "FIRST",
"1st_attr": 0
},
{
"name": "second",
"type": "SECOND",
"2nd_attr": 0
}
],
"sub": {
"objects": [
{
"name": "third",
"type": "THIRD",
"3rd_attr": 0
}
]
}
}
Here is the output I want:
{
"first" : {
"1st_attr" : 0,
"name" : "first",
"type" : "FIRST"
},
"second" : {
"2nd_attr" : 0,
"name" : "second",
"type" : "SECOND"
},
"sub" : {
"third" : {
"3rd_attr" : 0,
"name" : "third",
"type" : "THIRD"
}
}
}
The spec I have flattens the first level, but I would like it to flatten every level (meaning, not just the second ;)...):
[
{
"operation": "shift",
"spec": {
"objects": {
"*": {
"@": "@(1,name)"
}
},
"*": "&0"
}
}
]
Thanks for your help
With Jolt you have to know where in your input the "objects" arrays are. There isn't a way to "just find and flatten all the arrays, no matter where they are in the input doc".
Spec for the input and output you provided:
[
{
"operation": "shift",
"spec": {
"objects": {
"*": "@(0,name)"
},
"sub": {
"objects": {
"*": "&2.@(0,name)"
}
}
}
}
]
Edit: added &2. to have sub.third instead of just third
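Since the spec has to name each level explicitly, one option when the nesting depth is not known in advance is to do the recursive part outside Jolt, for instance in a scripting step before or after the transform. Below is a minimal Python sketch of that idea. It assumes every element of an "objects" array is a dict with a unique "name"; the function name flatten_objects is illustrative.

```python
def flatten_objects(node):
    """Recursively replace every "objects" array with entries keyed by each element's name."""
    if isinstance(node, list):
        return [flatten_objects(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key == "objects" and isinstance(value, list):
            # hoist each element up one level, keyed by its "name"
            for obj in value:
                out[obj["name"]] = flatten_objects(obj)
        else:
            out[key] = flatten_objects(value)
    return out


sample = {
    "objects": [
        {"name": "first", "type": "FIRST", "1st_attr": 0},
        {"name": "second", "type": "SECOND", "2nd_attr": 0},
    ],
    "sub": {
        "objects": [
            {"name": "third", "type": "THIRD", "3rd_attr": 0},
        ]
    },
}

result = flatten_objects(sample)
print(result)
```

Because the function recurses into every dict value, an "objects" array nested three or more levels deep is handled the same way as the one under "sub".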