Summarise keys and values from all JSON objects using Jolt

I have the following JSON input for a Jolt transformation:
[
{
"System": {
"Provider": {
"Name": "Microsoft-Windows-Eventlog",
"Guid": "{sdada}"
}
},
"EventID": "3434",
"EventData": {
"SubjectUserSid": "3455",
"SubjectUserName": "abc",
"SubjectDomainName": "def",
"SubjectLogonId": "e4545",
"ObjectServer": "dggg",
"ObjectType": "eet"
}
},
{
"System": {
"Provider": {
"Name": "Microsoft-Windows-Eventlog",
"Guid": "{sdada1}"
},
"EventID": "3435"
}
}
]
As you can see, EventData is present in the first JSON object in the array but not in the second.
My desired output is:
[
{
"winlog": {
"provider_name": "Microsoft-Windows-Eventlog",
"provider_guid": "{sdada}",
"EventID": "3434",
"event_data": {
"SubjectUserSid": "3455",
"SubjectUserName": "abc",
"SubjectDomainName": "def",
"SubjectLogonId": "e4545",
"ObjectServer": "dggg",
"ObjectType": "eet"
}
}
},
{
"winlog": {
"provider_name": "Microsoft-Windows-Eventlog",
"provider_guid": "{sdada1}",
"EventID": "3435",
"event_data": {
"SubjectUserSid": null,
"SubjectUserName": null,
"SubjectDomainName": null,
"SubjectLogonId": null,
"ObjectServer": null,
"ObjectType": null
}
}
}
]
I want the event_data key in both objects, with null values for the second object. My current spec is:
[
{
"operation": "shift",
"spec": {
"*": {
"System": {
"Provider": {
"Name": "[&3].winlog.provider_name",
"Guid": "[&3].winlog.provider_guid"
}
},
"EventData": "[&1].winlog.event_data"
}
}
}]
My output is
[ {
"winlog" : {
"provider_name" : "Microsoft-Windows-Eventlog",
"provider_guid" : "{sdada}",
"event_data" : {
"SubjectUserSid" : "3455",
"SubjectUserName" : "abc",
"SubjectDomainName" : "def",
"SubjectLogonId" : "e4545",
"ObjectServer" : "dggg",
"ObjectType" : "eet"
}
}
}, {
"winlog" : {
"provider_name" : "Microsoft-Windows-Eventlog",
"provider_guid" : "{sdada1}"
}
} ]
In short, how can I make every object in the array carry the same set of keys?

Firstly, you can put default values into the tree like this:
{
"operation": "modify-default-beta",
"spec": {
"*": {
"EventData": {
"SubjectUserSid": null,
"SubjectUserName": null,
"SubjectDomainName": null,
"SubjectLogonId": null,
"ObjectServer": null,
"ObjectType": null
}
}
}
}
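To make the intent of that defaulting step concrete, here is a minimal Python sketch of the same idea outside Jolt: every record gets an EventData object, and any key it lacks is set to null (None). The function name and the trimmed sample records are assumptions for illustration; this only imitates what the spec above is meant to achieve and is not how Jolt implements it.

```python
import json

# Keys every EventData object should expose (taken from the question's sample).
EVENT_DATA_KEYS = [
    "SubjectUserSid", "SubjectUserName", "SubjectDomainName",
    "SubjectLogonId", "ObjectServer", "ObjectType",
]

def fill_event_data_defaults(records):
    """Return copies of the records where missing EventData keys are set to None."""
    filled = []
    for record in records:
        # Start from the existing EventData, or an empty dict when it is absent.
        event_data = dict(record.get("EventData") or {})
        for key in EVENT_DATA_KEYS:
            event_data.setdefault(key, None)  # existing values are never overwritten
        filled.append({**record, "EventData": event_data})
    return filled

# Trimmed, hypothetical versions of the two records from the question.
records = [
    {"EventID": "3434", "EventData": {"SubjectUserSid": "3455", "SubjectUserName": "abc"}},
    {"EventID": "3435"},
]
result = fill_event_data_defaults(records)
print(json.dumps(result, indent=2))
```

Once every record has the same EventData keys, the shift from the question can copy EventData across unchanged.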

Related

Elasticsearch search by time range

I am trying to search by date and by time of day separately.
The Elasticsearch document format is:
{
"id": "101",
"name": "Tom",
"customers": ["Jerry", "Nancy", "soli"],
"start_time": "2021-12-13T06:57:29.420198Z",
"end_time": "2021-12-13T07:00:23.511722Z"
}
For example, this query filters on the date only:
{
"query": {
"bool" : {
"must" : [
{
"match" : { "customers" : "Jerry" }
},
{
"range": {
"start_time": {"gte" : "2021-12-13", "lte" : "2021-12-15" }}
}
]}
}
}
Output: I get the document above, which is expected.
But when I use the query below, I get this error:
"failed to parse date field [6:57:29] with format [strict_date_optional_time||epoch_millis]: [failed to parse date field [6:57:29] with format [strict_date_optional_time||epoch_millis]]"
{
"query": {
"bool" : {
"must" : [
{
"match" : { "customers" : "Jerry" }
},
{
"range": {
"start_time": {"gte" : "6:57:29", "lte" : "6:59:35" }}
}
]}
}
}
Why am I not able to get results based on time?
Is there a way to search by both date and time of day on a single field?
For example:
{
"query": {
"bool" : {
"must" : [
{
"match" : { "customers" : "Jerry" }
},
{
"range": {
"start_time": {"gte" : "2021-12-13", "lte" : "2021-12-15" }}
},
{
"range": {
"start_time": {"gte" : "6:57:29", "lte" : "6:59:35" }}
}
]}
}
}
I also tried to achieve this using regular expressions, but that didn't help.
This is the mapping:
{
"settings": {
"number_of_shards": 2,
"number_of_replicas": 1
},
"mappings": {
"dynamic": "true",
"_source": {
"enabled": "true"
},
"runtime": {
"start_time": {
"type": "keyword",
"script": {
"source": "doc.start_time.start_time.getHourOfDay() >= params.min && doc.start_time.start_time.getHourOfDay() <= params.max"
}
}
},
"properties": {
"name": {
"type": "keyword"
},
"customers": {
"type": "text"
}
}
}
}
The mapping above gives this error: "not a statement: result not used from boolean and operation [&&]"
This is the search query, which I'll try once the index has been created:
{
"query": {
"bool" : {
"must" : [
{
"match" : { "customers" : "Jerry" }
},
{
"match" : { "name" : "Tom" }
},
{
"range": {
"start_time": {
"gte": "2015-11-01",
"lte": "2015-11-30"
}
}
},
{
"script": {
"source": "doc.start_time.start_time.getHourOfDay() >= params.min && doc.start_time.start_time.getHourOfDay() <= params.max",
"params": {
"min": 6,
"max": 7
}
}
}
]}
}
}
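The error quoted earlier says that "6:57:29" cannot be parsed with the field's date format, so a bare time of day cannot be used directly as a range bound on start_time. The intended semantics, a date range combined with a time-of-day window on the same timestamp, can be sketched in plain Python as below. This is not Elasticsearch code; the helper names are assumptions, and it only illustrates the logic that a script-based filter would need to express.

```python
from datetime import date, datetime, time

def parse_utc(timestamp):
    """Parse an ISO-8601 timestamp with a trailing 'Z' into an aware datetime."""
    # Older Python versions do not accept 'Z' in fromisoformat, so swap it for an offset.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))

def matches(doc, first_day, last_day, earliest, latest):
    """True when start_time falls inside both the date range and the daily time window."""
    start = parse_utc(doc["start_time"])
    in_date_range = first_day <= start.date() <= last_day
    in_time_window = earliest <= start.time() <= latest  # time() drops the tzinfo
    return in_date_range and in_time_window

doc = {"name": "Tom", "start_time": "2021-12-13T06:57:29.420198Z"}

hit = matches(doc, date(2021, 12, 13), date(2021, 12, 15), time(6, 57, 29), time(6, 59, 35))
miss = matches(doc, date(2021, 12, 13), date(2021, 12, 15), time(7, 0, 0), time(8, 0, 0))
print(hit, miss)
```

The two conditions are independent, which is why the question's attempt to express both as range clauses on one date field runs into the parsing error.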

JSON conversion using JOLT

I am trying to convert JSON to a different format using JOLT (with the NiFi JoltTransformJson processor). For a single JSON record, the spec I am using works fine in the JOLT demo app, but with multiple JSON records I do not get the expected output. Could anyone tell me what changes I need to make to the JOLT spec to handle multiple JSON records?
Sample input JSON:
[
{
"pool": {
"field": [
{
"name": "BillingDay",
"value": "12"
},
{
"name": "Custom1",
"value": "POOL_BASE_PLAN_3GB"
}
]
},
"usage": {
"version": "3",
"quota": {
"name": "POOL_TOP_UP_1GB_2",
"cid": "5764888998010953848"
}
},
"state": {
"version": "1",
"property": [
{
"name": "SMS_RO_TOP",
"value": "1"
},
{
"name": "BillingTimeStamp",
"value": "2020-06-12T01:00:05"
},
{
"name": "timereset",
"value": "2020-01-12T00:35:53"
}
]
}
},
{
"pool": {
"field": [
{
"name": "PoolID",
"value": "111100110000003505209"
},
{
"name": "BillingDay",
"value": "9"
}
]
},
"usage": {
"version": "3"
},
"state": {
"version": "1",
"property": [
{
"name": "BillingTimeStamp",
"value": "2020-06-09T01:00:05"
},
{
"name": "timereset",
"value": "2019-03-20T17:10:38"
}
]
}
}
]
The JOLT spec I am using:
[
{
"operation": "modify-default-beta",
"spec": {
"state": {
"property": {
"name": "NOTAVAILABLE"
}
},
"usage": {
"quota": {
"name": "NOTAVAILABLE"
}
}
}
},
{
"operation": "shift",
"spec": {
"pool": {
"field": {
"*": {
"value": "pool_item.@(1,name)"
}
}
},
// remaining elements print as it is
"*": "&"
}
}
]
Expected output JSON:
[
{
"pool_item" : {
"BillingDay" : "12",
"Custom1" : "POOL_BASE_PLAN_3GB"
},
"usage" : {
"version" : "3",
"quota" : {
"name" : "POOL_TOP_UP_1GB_2",
"cid" : "5764888998010953848"
}
},
"state" : {
"version" : "1",
"property" : [ {
"name" : "SMS_RO_TOP",
"value" : "1"
}, {
"name" : "BillingTimeStamp",
"value" : "2020-06-12T01:00:05"
}, {
"name" : "timereset",
"value" : "2020-01-12T00:35:53"
} ]
}
},
{
"pool_item" : {
"BillingDay" : "9",
"PoolID" : "111100110000003505209"
},
"usage" : {
"version" : "3",
"quota" : {
"name" : "NOTAVAILABLE"
}
},
"state" : {
"version" : "1",
"property" : [ {
"name" : "SMS_RO_TOP",
"value" : "1"
}, {
"name" : "BillingTimeStamp",
"value" : "2020-06-12T01:00:05"
}, {
"name" : "timereset",
"value" : "2020-01-12T00:35:53"
} ]
}
}
]
The JOLT shift specification below will work when the input is an array of multiple JSON records.
[
{
"operation": "shift",
"spec": {
"*": {
"pool": {
"field": {
"*": {
"value": "[&4].pool_item.@(1,name)"
}
}
},
"usage": "[&1].usage",
"state": "[&1].state"
}
}
}
]
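As a cross-check of what that spec is meant to produce, here is a minimal Python sketch of the same per-record reshaping: each record's pool.field list of name/value pairs becomes a pool_item map, and usage and state are copied through unchanged. The function names and the trimmed sample data are assumptions for illustration, not part of JOLT or NiFi.

```python
def shift_record(record):
    """Reshape one record: pool.field [{name, value}, ...] becomes pool_item {name: value}."""
    fields = record.get("pool", {}).get("field", [])
    pool_item = {field["name"]: field["value"] for field in fields}
    out = {}
    if pool_item:
        out["pool_item"] = pool_item
    # Copy the remaining top-level objects through unchanged when they exist.
    for key in ("usage", "state"):
        if key in record:
            out[key] = record[key]
    return out

def shift_all(records):
    """Apply the reshaping to every record, keeping the array order."""
    return [shift_record(record) for record in records]

# Trimmed, hypothetical versions of the two input records from the question.
records = [
    {
        "pool": {"field": [{"name": "BillingDay", "value": "12"},
                           {"name": "Custom1", "value": "POOL_BASE_PLAN_3GB"}]},
        "usage": {"version": "3", "quota": {"name": "POOL_TOP_UP_1GB_2"}},
    },
    {
        "pool": {"field": [{"name": "PoolID", "value": "111100110000003505209"},
                           {"name": "BillingDay", "value": "9"}]},
        "usage": {"version": "3"},
    },
]
result = shift_all(records)
print(result)
```

The key point, in both the spec and this sketch, is that the output position is tied to the index of the record being processed, so values from different records never end up merged together.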

JOLT - filtering array based on object value

How can I filter an array based on the value of a nested object? This is the input:
{
"results": {
"data": [
{
"name": "xx",
"typeRelationship": [
{
"relationship": "parent",
"type": {
"id": "yyyyy",
}
}
],
"id": "xxxxxxxx"
},
{
"name": "yy",
"typeRelationship": [
{
"relationshipType": "parent",
"type": {
"id": "CCCC"
}
},
{
"relationshipType": "child",
"service": {
"id": "DDDD"
}
},
{
"relationshipType": "child",
"service": {
"id": "xxxxxxxx"
}
}
],
"id": "yyyyy"
}
]
}}
Expected output:
{
"data" : [ {
"id" : "xxxx",
"href" : "xxxxxx",
"relation":"parent"
} ]
}
This works.
[
{
"operation": "shift",
"spec": {
"data": {
"*": {
"type": {
"id": {
"xxxx": {
"@3": "data[]"
}
}
}
}
}
}
}
]
Edit 1
The spec below moves all the values that have id=xxxx to the data array.
[
{
"operation": "shift",
"spec": {
"data": {
"*": {
"type": {
"*": {
"id": {
"xxxx": {
"@(2)": "data[]",
"@(4,relation)": "data[&3].relation"
}
}
}
}
}
}
}
}
]
This works, thanks.
Can you please explain what the numbers 2, 3 and 4 mean?
My array is a bit different, and I tried adjusting those numbers but could not get it to work:
{
"results": {
"data": [
{
"name": "xx",
"typeRelationship": [
{
"relationship": "parent",
"type": {
"id": "yyyyy"
}
}
],
"id": "xxxxxxxx"
},
{
"name": "yy",
"typeRelationship": [
{
"relationshipType": "parent",
"type": {
"id": "CCCC"
}
},
{
"relationshipType": "child",
"service": {
"id": "DDDD"
}
},
{
"relationshipType": "child",
"service": {
"id": "xxxxxxxx"
}
}
],
"id": "yyyyy"
}
]
}
}
Expected output:
{
"rows" : [ {
"rowdata" : {
"relationshipType" : "child",
"Name" : "yy",
"id" : "yyyyy"
}
} ]
}

Need JOLT spec file for transfer of complex JSON

I have a complex JSON object (I've simplified it for this example) that I cannot figure out the JOLT transform JSON for. Does anybody have any ideas of what the JOLT spec file should be?
Original JSON
[
{
"date": {
"isoDate": "2019-03-22"
},
"application": {
"name": "SiebelProject"
},
"applicationResults": [
{
"reference": {
"name": "Number of Code Lines"
},
"result": {
"value": 44501
}
},
{
"reference": {
"name": "Transferability"
},
"result": {
"grade": 3.1889542208002064
}
}
]
},
{
"date": {
"isoDate": "2019-03-21"
},
"application": {
"name": "SiebelProject"
},
"applicationResults": [
{
"reference": {
"name": "Number of Code Lines"
},
"result": {
"value": 45000
}
},
{
"reference": {
"name": "Transferability"
},
"result": {
"grade": 3.8
}
}
]
}
]
Desired JSON after transformation and sorting by "Name" ASC, "Date" DESC
[
{
"Name": "SiebelProject",
"Date": "2019-03-22",
"Number of Code Lines": 44501,
"Transferability" : 3.1889542208002064
},
{
"Name": "SiebelProject",
"Date": "2019-03-21",
"Number of Code Lines": 45000,
"Transferability" : 3.8
}
]
I couldn't find a way to do the sort (I'm not even sure you can sort descending in JOLT) but here's a spec to do the transform:
[
{
"operation": "shift",
"spec": {
"*": {
"date": {
"isoDate": "[#3].Date"
},
"application": {
"name": "[#3].Name"
},
"applicationResults": {
"*": {
"reference": {
"name": {
"Number of Code Lines": {
"@(3,result.value)": "[#7].Number of Code Lines"
},
"Transferability": {
"@(3,result.grade)": "[#7].Transferability"
}
}
}
}
}
}
}
}
]
After that there are some tools (like jq I think) that could do the sort.
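As one possible post-processing step, here is a minimal Python sketch of the requested ordering ("Name" ascending, then "Date" descending), applied to the output of the transform. It assumes the dates stay in ISO yyyy-mm-dd form, so they can be compared as plain strings; the extra "AnotherProject" row is hypothetical and only there to show the name ordering.

```python
rows = [
    {"Name": "SiebelProject", "Date": "2019-03-21", "Number of Code Lines": 45000},
    {"Name": "SiebelProject", "Date": "2019-03-22", "Number of Code Lines": 44501},
    {"Name": "AnotherProject", "Date": "2019-03-20", "Number of Code Lines": 100},
]

# Python's sort is stable, so sorting by the secondary key first (Date, descending)
# and then by the primary key (Name, ascending) yields the combined ordering.
by_date_desc = sorted(rows, key=lambda row: row["Date"], reverse=True)
rows_sorted = sorted(by_date_desc, key=lambda row: row["Name"])

for row in rows_sorted:
    print(row["Name"], row["Date"])
```

The two-pass approach avoids having to negate a string key, which is the usual obstacle when mixing ascending and descending order in a single sort key.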

how to validate datatype in jolt transformation

I'm new to jolt transformation. I was wondering if there is a way to do a validation on data type then proceed.
I'm processing JSON to insert records into HBase. From the source, the timestamp is repeated for the same resource id, and I want to use both for the row key.
So I retrieve the first timestamp and concatenate it with the resource id to create the row key. But I have an issue when there is only one timestamp in the record, i.e. when it's not a list. I'd appreciate it if someone could help me handle this situation.
Input data:
{ "resource": {
"id": "200629068",
"name": "resource_name_1)",
"parent": {
"id": 200053744,
"name": "parent_name"
},
"properties": {
"AP_ifSpeed": "0",
"DisplaySpeed": "0 (NotApplicable)",
"description": "description"
}
},
"data": [
{
"metric": {
"id": "2215",
"name": "metric_name 1"
},
"timestamp": 1535064595000,
"value": 0
},
{
"metric": {
"id": "2216",
"name": "metric_name_2"
},
"timestamp": 1535064595000,
"value": 1
}
]
}
Jolt transformation
[{
"operation": "shift",
"spec": {
"resource": {
// "id": "resource_&",
"name": "resource_&",
"id": "resource_&",
"parent": {
"id": "parent_&",
"name": "parent_&"
},
"properties": {
"*": "&"
}
},
"data": {
"*": {
"metric": {
"id": {
"*": {
"@(3,value)": "&1"
}
},
"name": {
"*": {
"@(3,value)": "&1"
}
}
},
"timestamp": "timestamp"
}
}
}
}, {
"operation": "shift",
"spec": {
"timestamp": {
// get first element from list
"0": "&1"
},
"*": "&"
}
},
{
"operation": "modify-default-beta",
"spec": {
"rowkey": "=concat(@(1,resource_id),'_',@(1,timestamp))"
}
}
]
Output I'm getting
{ "resource_name" : "resource_name_1)",
"resource_id" : "200629068",
"parent_id" : 200053744,
"parent_name" : "parent_name",
"AP_ifSpeed" : "0",
"DisplaySpeed" : "0 (NotApplicable)",
"description" : "description",
"2215" : 0,
"metric_name 1" : 0,
"timestamp" : 1535064595000,
"2216" : 1,
"metric_name_2" : 1,
"rowkey" : "200629068_1535064595000"
}
When there is only one timestamp, I get:
"rowkey" : "200629068_"
In your first shift, make the output "timestamp" always be an array, even if the incoming data array has only one element in it:
"timestamp": "timestamp[]"
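The reasoning is that the "take the first element" step only works on a list: with a single data entry the collected timestamp is a scalar, so there is no element "0" to pick and the row key ends up with an empty suffix. A minimal Python sketch of the same normalisation, with a hypothetical helper name and trimmed input, looks like this:

```python
def build_rowkey(record):
    """Build '<resource id>_<first timestamp>' from a record with a data array."""
    # Always collect timestamps into a list, even when there is only one data entry.
    timestamps = [entry["timestamp"] for entry in record.get("data", [])]
    first = timestamps[0] if timestamps else ""
    return f'{record["resource"]["id"]}_{first}'

# Trimmed, hypothetical record with a single data entry.
single = {
    "resource": {"id": "200629068"},
    "data": [{"metric": {"id": "2215"}, "timestamp": 1535064595000, "value": 0}],
}
# The same record with two data entries that share a timestamp.
double = {
    "resource": {"id": "200629068"},
    "data": single["data"] + [{"metric": {"id": "2216"}, "timestamp": 1535064595000, "value": 1}],
}

print(build_rowkey(single))
print(build_rowkey(double))
```

Because the timestamps are always held in a list, the single-entry and multi-entry cases go through exactly the same code path.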