Jolt JSON Conversion from String value to Long - jolt

I am using Jolt to convert one JSON document to another. Everything works except that I want to convert a String value to a Long. My spec and input are below. I have tried modify-overwrite-beta, but with no luck.
Spec:
[
{
"operation": "modify-overwrite-beta",
"spec": {
"timestamp": "=toLong(@(1,time))"
}
},
{
"operation": "shift",
"spec": {
"key1": "outputText1",
"key2": "outputText2",
"key3": "outputText3",
"time": "timestamp"
}
}
]
Input Json
{
"key1": "test1",
"time": "1499967627",
"key2": "test2",
"key3": "test3"
}
So, in the input JSON above, how can I convert the time value to a Long?
Expected JSON:
{
"outputText1": "test1",
"timestamp": 1499967627,
"outputText2": "test2",
"outputText3": "test3"
}

Spec
[
{
"operation": "modify-overwrite-beta",
"spec": {
"timestamp": "=toLong(@(1,time))"
}
},
{
"operation": "shift",
"spec": {
"key1": "outputText1",
"key2": "outputText2",
"key3": "outputText3",
// pass timestamp thru
"timestamp": "timestamp"
}
}
]
The first operation (modify-overwrite-beta) already writes "timestamp" as a Long. Your second operation then built the output "timestamp" from the String value in "time" instead of passing the converted "timestamp" through, so the Long never reached the output.
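To make the difference concrete, here is a minimal plain-Python sketch of the data flow. It is not Jolt itself: the helper names `modify_overwrite` and `shift` are made up for illustration, and `shift` only models simple key-to-key renames.

```python
def modify_overwrite(doc):
    """Model of the first operation: add "timestamp" as an integer parsed from "time"."""
    out = dict(doc)
    out["timestamp"] = int(out["time"])
    return out


def shift(doc, mapping):
    """Model of a flat shift: copy each mapped input key to its output key."""
    return {dst: doc[src] for src, dst in mapping.items() if src in doc}


doc = {"key1": "test1", "time": "1499967627", "key2": "test2", "key3": "test3"}
renames = {"key1": "outputText1", "key2": "outputText2", "key3": "outputText3"}

# Original shift: reads the untouched String from "time".
wrong = shift(modify_overwrite(doc), {**renames, "time": "timestamp"})

# Corrected shift: passes the converted "timestamp" through.
right = shift(modify_overwrite(doc), {**renames, "timestamp": "timestamp"})
```

In this model `wrong["timestamp"]` is still a string, while `right["timestamp"]` is an integer.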

Related

Add Array Index using JOLT

I want to add a kind of line number or identifier with jolt for each array element.
Given Array:
[
{
"key1": "value1",
"key2": "value2",
"neyN": "valueN"
},
{
"key1": "value1",
"key2": "value2",
"neyN": "valueN"
}
]
Expected Result:
[
{
"key1": "value1",
"key2": "value2",
"neyN": "valueN",
"id": 0
},
{
"key1": "value1",
"key2": "value2",
"neyN": "valueN",
"id": 1
}
]
I have tried default, shift, and more, but could not find a working solution.
Can someone help me?
Thanks in advance
Marcus
Spec 1: Key each element by its array index and add that index as the id field.
Spec 2: Drop the index keys and collect the elements back into an array.
[
{
"operation": "shift",
"spec": {
"*": {
"@": "&1",
"$": "&1.id"
}
}
},
{
"operation": "shift",
"spec": {
"*": {
"@": "[]"
}
}
}
]
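The two passes can be modeled in plain Python as a rough sketch (this is not Jolt, and the function names are hypothetical). One assumption worth flagging: as far as I can tell, `$` emits the matched key as a String, so the id would come out as "0" and "1" rather than the numbers shown in the expected result. The sketch keeps the id as a string to mirror that assumption.

```python
def key_by_index(items):
    """Pass 1: store each element under its index and add the index as "id"."""
    grouped = {}
    for index, item in enumerate(items):
        key = str(index)  # assumed: the matched key is a string
        grouped[key] = dict(item)
        grouped[key]["id"] = key
    return grouped


def drop_index_keys(grouped):
    """Pass 2: discard the index keys and collect the values into a list."""
    return list(grouped.values())


def add_index(items):
    return drop_index_keys(key_by_index(items))
```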

Map Jolt Transformation

I am trying to transform the following input:
{
"test_types": {
"3": "Type3",
"21": "type21",
"16": "Type16",
"15": "type15"
},
"name": "BlackSquid",
"obj_name": "malware",
"value": "2341743",
"type": "16"
}
Into this output:
{
"test_types": {
"3": "Type3",
"21": "type21",
"16": "Type16",
"15": "type15"
},
"name": "BlackSquid",
"obj_name": "malware",
"value": "2341743",
"type": "16",
"type_name": "Type16"
}
I have tried to use a "modify-overwrite-beta" spec by hardcoding the key value which gives me the result I need:
[
{
"operation": "modify-overwrite-beta",
"spec": {
"type_name": "@(1,test_types.16)"
}
}
]
but I would like to use the "type" value dynamically. The following throws an exception but maybe something like this:
[
{
"operation": "modify-overwrite-beta",
"spec": {
"type_name": "@(1,test_types.@(1,type))"
}
}
]
You don't need modify-overwrite-beta for this. A shift spec that matches on the value of "type" is enough, for example:
[{
"operation": "shift",
"spec": {
"*": "&",
"type": {
"*": {
"@(2,type)": "type",
"@(2,test_types.&)": "type_name"
}
}
}
}]
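In that spec, the inner "*" matches the value of "type" (here "16"), so "&" stands for that value and the lookup resolves to test_types.16. The same idea, sketched in plain Python rather than Jolt (the function name `add_type_name` is hypothetical):

```python
def add_type_name(doc):
    """Copy the document and add "type_name" by looking up "type" in "test_types"."""
    out = dict(doc)
    name = doc.get("test_types", {}).get(doc.get("type"))
    if name is not None:  # skip the field when no entry matches
        out["type_name"] = name
    return out
```

With the input from the question, the result would carry "type_name": "Type16" alongside the original fields.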

JOLT Transformation Merge Array of Objects

I am trying to create a Jolt transformation for the input below:
{
"group1": [
{
"schema": "schemaA"
},
{
"key1": "val1",
"key2": "val2"
}
],
"group2": [
{
"schema": "schemaA"
},
{
"key1": "val1",
"key2": "val2"
}
]}
With the desired output of:
{
"group1": {
"schema": "schemaA",
"key1": "val1",
"key2": "val2"
},
"group2": {
"schema": "schemaA",
"key1": "val1",
"key2": "val2"
}}
The key 'schema' will always be present but I won't know what the key1,key2,etc values are. So I can't explicitly map them. Any help would be much appreciated!
Spec,
[
{
"operation": "shift",
"spec": {
"group*": {
"*": {
"key*": "&2.&",
"schema": "&2.&"
}
}
}
}
]
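A plain-Python sketch of the same merge (not Jolt; `merge_groups` is a made-up name). It mirrors the spec by only picking up keys that are "schema" or start with "key". If the real key names do not share a prefix, a bare "*" in place of "key*" in the spec is probably the closer fit, though that is an untested suggestion.

```python
def merge_groups(doc):
    """Merge the list of objects under each "group*" key into a single object."""
    merged = {}
    for group, parts in doc.items():
        if not group.startswith("group"):
            continue
        combined = {}
        for part in parts:
            for key, value in part.items():
                # Same filter as the spec: "schema" or anything matching "key*".
                if key == "schema" or key.startswith("key"):
                    combined[key] = value
        if combined:
            merged[group] = combined
    return merged
```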

How to change a name of one field in large Json using Jolt

I have a big Json document:
{ "field1": "value1",
"field2": "value2",
"field3": "value3",
...
"field1000": "value1000"
}
I want to change the name of one field (field3) to third_field
How to do it without writing specification like that:
[
{
"operation": "shift",
"spec": {
"field1": "field1",
"field2": "field2",
"field3": "third_field",
...
"field1000": "field1000"
}
}
]
This should work; it is essentially an if/else:
[
{
"operation": "shift",
"spec": {
//if
"field3": {
//@ - current value
"@": "third_field"
},
//else
"*": {
//@ - current value
//& - current key
"@": "&"
}
}
}
]
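The if/else above amounts to renaming one key and copying every other key unchanged. As a plain-Python sketch of that intent (not Jolt; `rename_field` is a hypothetical helper):

```python
def rename_field(doc, old, new):
    """Return a copy of doc with key `old` renamed to `new`; other keys are kept."""
    return {(new if key == old else key): value for key, value in doc.items()}
```

For example, `rename_field(doc, "field3", "third_field")` leaves the other fields untouched however many there are.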

how to validate datatype in jolt transformation

I'm new to Jolt transformations. I was wondering whether there is a way to validate a data type before proceeding.
I'm processing JSON to insert records into HBase. The source repeats the timestamp for the same resource id, and I want to use that timestamp in the row key.
So I retrieve just the first timestamp and concatenate it with the resource id to create the row key. But I have an issue when there is only one timestamp in the record, i.e. when it is not a list. I would appreciate help handling this situation.
input data
{ "resource": {
"id": "200629068",
"name": "resource_name_1)",
"parent": {
"id": 200053744,
"name": "parent_name"
},
"properties": {
"AP_ifSpeed": "0",
"DisplaySpeed": "0 (NotApplicable)",
"description": "description"
}
},
"data": [
{
"metric": {
"id": "2215",
"name": "metric_name 1"
},
"timestamp": 1535064595000,
"value": 0
},
{
"metric": {
"id": "2216",
"name": "metric_name_2"
},
"timestamp": 1535064595000,
"value": 1
}
]
}
Jolt transformation
[{
"operation": "shift",
"spec": {
"resource": {
// "id": "resource_&",
"name": "resource_&",
"id": "resource_&",
"parent": {
"id": "parent_&",
"name": "parent_&"
},
"properties": {
"*": "&"
}
},
"data": {
"*": {
"metric": {
"id": {
"*": {
"@(3,value)": "&1"
}
},
"name": {
"*": {
"@(3,value)": "&1"
}
}
},
"timestamp": "timestamp"
}
}
}
}, {
"operation": "shift",
"spec": {
"timestamp": {
// get first element from list
"0": "&1"
},
"*": "&"
}
},
{
"operation": "modify-default-beta",
"spec": {
"rowkey": "=concat(@(1,resource_id),'_',@(1,timestamp))"
}
}
]
Output I'm getting
{ "resource_name" : "resource_name_1)",
"resource_id" : "200629068",
"parent_id" : 200053744,
"parent_name" : "parent_name",
"AP_ifSpeed" : "0",
"DisplaySpeed" : "0 (NotApplicable)",
"description" : "description",
"2215" : 0,
"metric_name 1" : 0,
"timestamp" : 1535064595000,
"2216" : 1,
"metric_name_2" : 1,
"rowkey" : "200629068_1535064595000"
}
when there is only one timestamp then I get
"rowkey" : "200629068_"
In your first shift, make the output "timestamp" always an array, even when the incoming "data" array has only one element:
"timestamp": "timestamp[]"