How to validate data type in a JOLT transformation

I'm new to JOLT transformations. I was wondering if there is a way to validate the data type of a field and then proceed accordingly.
I'm processing JSON to insert records into HBase. From the source I get the same timestamp repeated for a resource id, and I want to use it, together with the resource id, for the row key.
So I just retrieve the first timestamp and concatenate it with the resource id to create the row key. But I have an issue when there is only one timestamp in the record, i.e. when it is not a list. I'd appreciate it if someone could help me handle this situation.
input data
{ "resource": {
"id": "200629068",
"name": "resource_name_1)",
"parent": {
"id": 200053744,
"name": "parent_name"
},
"properties": {
"AP_ifSpeed": "0",
"DisplaySpeed": "0 (NotApplicable)",
"description": "description"
}
},
"data": [
{
"metric": {
"id": "2215",
"name": "metric_name 1"
},
"timestamp": 1535064595000,
"value": 0
},
{
"metric": {
"id": "2216",
"name": "metric_name_2"
},
"timestamp": 1535064595000,
"value": 1
}
]
}
Jolt transformation
[{
"operation": "shift",
"spec": {
"resource": {
// "id": "resource_&",
"name": "resource_&",
"id": "resource_&",
"parent": {
"id": "parent_&",
"name": "parent_&"
},
"properties": {
"*": "&"
}
},
"data": {
"*": {
"metric": {
"id": {
"*": {
"#(3,value)": "&1"
}
},
"name": {
"*": {
"#(3,value)": "&1"
}
}
},
"timestamp": "timestamp"
}
}
}
}, {
"operation": "shift",
"spec": {
"timestamp": {
// get first element from list
"0": "&1"
},
"*": "&"
}
},
{
"operation": "modify-default-beta",
"spec": {
"rowkey": "=concat(#(1,resource_id),'_',#(1,timestamp))"
}
}
]
Output I'm getting
{ "resource_name" : "resource_name_1)",
"resource_id" : "200629068",
"parent_id" : 200053744,
"parent_name" : "parent_name",
"AP_ifSpeed" : "0",
"DisplaySpeed" : "0 (NotApplicable)",
"description" : "description",
"2215" : 0,
"metric_name 1" : 0,
"timestamp" : 1535064595000,
"2216" : 1,
"metric_name_2" : 1,
"rowkey" : "200629068_1535064595000"
}
When there is only one timestamp, I get:
"rowkey" : "200629068_"

In your first shift, make the output "timestamp" always be an array, even if the incoming data array has only one element in it:
"timestamp": "timestamp[]"

Related

Facing issue with JOLT transformation with nested array

I have a JSON input:
{
"id": "Root_ID",
"Item": [
{
"id": "ID_1",
"characteristic": [
{
"name": "char1",
"value": "PRE1"
},
{
"name": "char2",
"value": "2050-01-01"
}
]
},
{
"id": "ID_2",
"characteristic": [
{
"name": "char1",
"value": "PRE2"
},
{
"name": "char2",
"value": "2050-01-02"
}
]
}
]
}
which needs to be converted, using a JOLT transformation spec, to the following output:
{
"id": "Root_ID",
"Item": [
{
"id": "ID_1",
"char1": "PRE1",
"char2": "2050-01-01"
},
{
"id": "ID_2",
"char1": "PRE2",
"char2": "2050-01-02"
}
]
}
Currently, I'm using this spec:
[
{
"operation": "shift",
"spec": {
"id": "id",
"Item": {
"*": {
"characteristic": {
"*": {
"name": {
"char1": {
"#(2,value)": "item[#3].char1"
},
"char2": {
"#(2,value)": "item[#3].char2"
}
}
}
}
}
}
}
}
]
which does not produce the desired result.
Can you please help me prepare a correct spec to handle this issue?
Edit: What if I'd like to get the following JSON result?
{
"id": "Root_ID",
"Item": [
{
"id": "ID_1",
"char1": "PRE1"
},
{
"id": "ID_2",
"char1": "PRE2",
"char2": "2050-01-02"
}
]
}
You can use the following shift transformation spec, in which #value and #name are matched reciprocally:
[
{
"operation": "shift",
"spec": {
"id": "&",
"Item": {
"*": {
"id": "&2[&1].&",// you can reduce two levels based on the inner identifier 4,3 -> 2,1
"char*": {
"*": {
"#value": "&4[&3].#name" // &4 copies the literal "Item" after going four levels up the tree, [&3] generates array-wise result(array of objects) after going three levels up the tree to reach the level of the indexes of the "Item" array
}
}
}
}
}
}
]
Edit: Alternatively, you can use the following spec for the case in which you need to return char2 within all objects but the first:
[
{
"operation": "shift",
"spec": {
"id": "&",
"Item": {
"0": {// stands for the first index
"id": "&2[&1].&",
"char*": {
"*": {
"name": {
"char1": {
"#2,value": "&6[&5].#3,name"
}
}
}
}
},
"*": {
"id": "&2[&1].&",
"char*": {
"*": {
"#value": "&4[&3].#name"
}
}
}
}
}
}
]

JOLT Nested if condition

I have this input JSON:
{
"user": "123456",
"product": "television",
"category": "electronics",
"tag": "summer"
}
And this transformation:
[
{
"operation": "shift",
"spec": {
"product": {
"#(1,product)": "item",
"#(1,user)": {
"#2": "userBias"
}
},
"user": {
"#(1,user)": "user"
},
"category": {
"#category": "rules.[0].name",
"#(1,category)": "rules.[0].values[0]"
},
"tag": {
"rules": "rules",
"#tag": "rules.[1].name",
"#(2,tag)": "rules.[1].values[0]"
}
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"userBias?": "=toInteger"
}
}
]
Which works fine and produces the following JSON:
{
"item": "television",
"userBias": 2,
"user": "123456",
"rules": [
{
"name": "category",
"values": [
"electronics"
]
},
{
"name": "tag",
"values": [
"summer"
]
}
]
}
If, though, I delete "category": "electronics" from the input so that it becomes:
{
"user": "123456",
"product": "television",
"tag": "summer"
}
Then I get back the following result:
{
"item": "television",
"userBias": 2,
"user": "123456",
"rules": [
null,
{
"name": "tag",
"values": [
"summer"
]
}
]
}
The problem with the above is that it contains a null element inside the array, and I do not know how to get rid of it. I have tried recursivelySquashNulls but it does not work.
Also, basically, what I am looking for is: if both category and tag exist, then tag should go to rules[1]; if only tag exists, then tag should go to rules[0].
Thanks in advance,
giannis
That happens because you match the tag and category elements individually. Instead, prefer handling them through the remaining-keys match by combining them under the asterisk ("*") wildcard, such as:
[
{
"operation": "shift",
"spec": {
"product": {
"#(1,product)": "item",
"#(1,user)": {
"#2": "userBias"
}
},
"user": {
"#(1,user)": "user"
},
"*": {
"$": "r[0].&.name",
"#(1,&)": "r[0].&.values[]"
}
}
},
{
"operation": "shift",
"spec": {
"*": "&",
"r": {
"*": { "*": "rules" }
}
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"userBias?": "=toInteger"
}
}
]
Result 1 (full input) and Result 2 (input without the "category": "electronics" pair) both come out without any null entries.
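If you also want rules to stay an array even when only one of category/tag is present (so a lone tag lands at rules[0]), one option is the trailing [] notation on the right-hand side of the second shift. A small sketch, assuming the rest of the chain stays unchanged:
{
  "operation": "shift",
  "spec": {
    "*": "&",
    "r": {
      "*": {
        // the trailing [] keeps "rules" a list even with a single entry
        "*": "rules[]"
      }
    }
  }
}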

JOLT spec on adding default values based on a condition

If my input contains "WorkflowCategory" in "metadata", then the output should contain workflow.workflowInputProperties with the specified default values, some of which are duplicated (like the empty string "", 3, etc.). If not, workflow.workflowInputProperties should not be added.
Input 1
{
"template": false,
"active": true,
"metadata": [
{
"value": "bank_",
"key": "AssetNamePrefix"
},
{
"value": "-BERG",
"key": "SuffixForPublicId"
},
{
"value": "false",
"key": "CORSEnabled"
},
{
"value": "Capture",
"key": "WorkflowCategory"
},
{
"value": "HD",
"key": "Features"
}
],
"description": "Template for working with PRI",
"name": "prof_name",
"type": "Live",
"id": "BNK056003413",
"version": 6
}
Input 2
{
"template": false,
"active": true,
"metadata": [
{
"value": "HD",
"key": "Features"
}
],
"description": "Live Template",
"name": "Live_HD",
"type": "Live",
"id": "BNK007596994",
"version": 1
}
For Input 1, output should be
{
"id" : "BNK056003413",
"name" : "prof_name",
"metadataSet" : {
"description" : "Template for working with PRI",
"type" : "Live"
},
"workflow" : {
"workflowInputProperties" : {
"assetNamePrefix" : "bank_",
"recordId" : "",
"sourceUri":"",
"processingUri": "",
"recorderType": "ABC",
"completionTimeout": 600
"loopBackTimer": 10,
"numberOfRetries": 3,
"numberOfRetriesForScheduling": 3,
"scheduleDelay" : 3600
}
}
}
For Input 2, output should be as follows, without workflow.workflowInputProperties
{
"id" : "BNK007596994",
"name" : "Live_HD",
"metadataSet" : {
"description" : "Live Template",
"type" : "Live"
},
"features" : "HD"
}
You should add the code below to the rest of your spec. The only issue is that I could not put an empty string, so I changed it into a space; I'll try to figure that out.
[
{
"operation": "shift",
"spec": {
"metadata": {
"*": {
"key": {
"WorkflowCategory": {
"#bank_": "workflow.workflowInputProperties.assetNamePrefix",
"# ": [
"workflow.workflowInputProperties.recordId",
"workflow.workflowInputProperties.sourceUri",
"workflow.workflowInputProperties.processingUri"
],
"#ABC": "workflow.workflowInputProperties.recorderType",
"#600": "workflow.workflowInputProperties.completionTimeout",
"#10": "workflow.workflowInputProperties.loopBackTimer",
"#3": ["workflow.workflowInputProperties.numberOfRetries",
"workflow.workflowInputProperties.numberOfRetriesForScheduling"],
"#3600": "workflow.workflowInputProperties.scheduleDelay"
}
}
}
}
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"workflow": {
"workflowInputProperties": {
"completionTimeout": "=toInteger",
"loopBackTimer": "=toInteger",
"numberOfRetries": "=toInteger",
"numberOfRetriesForScheduling": "=toInteger",
"scheduleDelay": "=toInteger"
}
}
}
}
]
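For reference, here is a rough sketch (an assumption, not the asker's actual spec) of what that "rest of your spec" might look like for the common fields, merged into the same shift operation as the WorkflowCategory block above. The target field names are taken from the desired outputs, and @(2,value) is the standard transpose wildcard that grabs the sibling "value" of a matched metadata key:
{
  "operation": "shift",
  "spec": {
    "id": "id",
    "name": "name",
    "description": "metadataSet.description",
    "type": "metadataSet.type",
    "metadata": {
      "*": {
        "key": {
          // ... the "WorkflowCategory" block from the answer above sits here as well ...
          "Features": {
            // copy the sibling "value" (e.g. "HD") to a top-level "features" field
            "@(2,value)": "features"
          }
        }
      }
    }
  }
}
Keeping everything in one shift matters: a second chained shift would only see the output of the first one, and the original metadata array would already be gone.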

JOLT spec for ternary operation

I am trying to write a JOLT spec to convert the following input into the expected output mentioned below.
INPUT:
{
"city": "Seattle",
"state": "WA",
"country": "US",
"date": "10/20/2018",
"userList": [
{
"name": "David",
"age": "22",
"sex": "M",
"company": "good"
},
{
"name": "Tom",
"age": "30",
"sex": "M",
"company": "good"
},
{
"name": "Annie",
"age": "25",
"sex": "F",
"company": "bad"
},
{
"name": "Aaron",
"age": "27",
"sex": "M",
"company": "bad"
}
]
}
EXPECTED OUTPUT:
{
"users" : [ {
"date" : "10/20/2018",
"username" : "David",
"age" : "22",
"sex" : "M",
"organization" : "good"
}, {
"date" : "10/20/2018",
"username" : "Tom",
"age" : "30",
"sex" : "M",
"organization" : "good"
} ],
"Date" : "10/20/2018",
"State" : "WA",
"Country" : "US"
}
I want to filter out all the elements in the list where company = bad or sex = F, or alternatively keep only the elements where company = good and sex = M.
I need help removing the elements from the list based on these specific conditions.
Is JOLT recommended for such data-driven conversions?
The spec that I have written so far is:
[{
"operation": "shift",
"spec": {
"userList": {
"*": {
"name": "users.[&1].username",
"age": "users.[&1].age",
"sex": "users.[&1].sex",
"company": "users.[&1].organization",
"#(2,date)": "users.[&1].date"
}
},
"date": "Date",
"state": "State",
"country": "Country"
}
}
]
OK, here is the process I followed to come up with a solution for the "take all M + good" case:
1. copy the top-level elements as-is, without impacting the rest;
2. for the users content, go to the first field used for filtering ("users -> * -> sex -> M") and copy all fields in that case;
3. do the same for the other field used for filtering;
4. since steps 2 and 3 use a map to avoid "null" entries in the array, switch back to an array at the end.
[{
"operation": "shift",
"spec": {
"userList": {
"*": {
"name": "users.[&1].username",
"age": "users.[&1].age",
"sex": "users.[&1].sex",
"company": "users.[&1].organization",
"#(2,date)": "users.[&1].date"
}
},
"date": "Date",
"state": "State",
"country": "Country"
}
}
,
{
"operation": "shift",
"spec": {
"*": "&",
"users": {
"*": {
"sex": {
"M": {
"#(2,username)": "users.&3.username",
"#(2,age)": "users.&3.age",
"#(2,sex)": "users.&3.sex",
"#(2,organization)": "users.&3.organization",
"#(2,date)": "users.&3.date"
}
}
}
}
}
}
,
{
"operation": "shift",
"spec": {
"*": "&",
"users": {
"*": {
"organization": {
"good": {
"#(2,username)": "users.&3.username",
"#(2,age)": "users.&3.age",
"#(2,sex)": "users.&3.sex",
"#(2,organization)": "users.&3.organization",
"#(2,date)": "users.&3.date"
}
}
}
}
}
},
{
"operation": "shift",
"spec": {
"*": "&",
"users": {
"*": "users[]"
}
}
}
]
For the other case, you just have to change the filter rules to do nothing when the rule matches and to copy the fields in all other cases ("*"). That gives the following spec (I only did it on the first field; updating it for the second field should not be an issue, and a sketch for it follows after the spec):
[{
"operation": "shift",
"spec": {
"userList": {
"*": {
"name": "users.[&1].username",
"age": "users.[&1].age",
"sex": "users.[&1].sex",
"company": "users.[&1].organization",
"#(2,date)": "users.[&1].date"
}
},
"date": "Date",
"state": "State",
"country": "Country"
}
}
,
{
"operation": "shift",
"spec": {
"*": "&",
"users": {
"*": {
"sex": {
"F": null,
"*": {
"#(2,username)": "users.&3.username",
"#(2,age)": "users.&3.age",
"#(2,sex)": "users.&3.sex",
"#(2,organization)": "users.&3.organization",
"#(2,date)": "users.&3.date"
}
}
}
}
}
},
{
"operation": "shift",
"spec": {
"*": "&",
"users": {
"*": "users[]"
}
}
}
]
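For completeness, a sketch of that second-field update, obtained by mirroring the sex filter above with organization/bad; it would be chained in right before the final shift that rebuilds the users array:
{
  "operation": "shift",
  "spec": {
    "*": "&",
    "users": {
      "*": {
        "organization": {
          // drop the whole user when the company was "bad"
          "bad": null,
          "*": {
            "#(2,username)": "users.&3.username",
            "#(2,age)": "users.&3.age",
            "#(2,sex)": "users.&3.sex",
            "#(2,organization)": "users.&3.organization",
            "#(2,date)": "users.&3.date"
          }
        }
      }
    }
  }
}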

How can I use CloudKit web services to query based on a reference field?

I've got two CloudKit data objects that look somewhat like this:
Parent Object:
{
"records": [
{
"recordName": "14102C0A-60F2-4457-AC1C-601BC628BF47-184-000000012D225C57",
"recordType": "ParentObject",
"fields": {
"fsYear": {
"value": "2015",
"type": "STRING"
},
"displayOrder": {
"value": 2015221153856287200,
"type": "INT64"
},
"fjpFSGuidForReference": {
"value": "14102C0A-60F2-4457-AC1C-601BC628BF47-184-000000012D225C57",
"type": "STRING"
},
"fsDateSearch": {
"value": "2015221153856287158",
"type": "STRING"
}
},
"recordChangeTag": "id4w7ivn",
"created": {
"timestamp": 1439149087571,
"userRecordName": "_0d26968032e31bbc72c213037b6cb35d",
"deviceID": "A19CD995FDA3093781096AF5D818033A241D65C1BFC3D32EC6C5D6B3B4A9AA6B"
},
"modified": {
"timestamp": 1439149087571,
"userRecordName": "_0d26968032e31bbc72c213037b6cb35d",
"deviceID": "A19CD995FDA3093781096AF5D818033A241D65C1BFC3D32EC6C5D6B3B4A9AA6B"
}
}
],
"total":
}
Child Object:
{
"records": [
{
"recordName": "2015221153856287168",
"recordType": "ChildObject",
"fields": {
"District": {
"value": "002",
"type": "STRING"
},
"ZipCode": {
"value": "12345",
"type": "STRING"
},
"InspecReference": {
"value": {
"recordName": "14102C0A-60F2-4457-AC1C-601BC628BF47-184-000000012D225C57",
"action": "NONE",
"zoneID": {
"zoneName": "_defaultZone"
}
},
"type": "REFERENCE"
}
},
"recordChangeTag": "id4w7lew",
"created": {
"timestamp": 1439149090856,
"userRecordName": "_0d26968032e31bbc72c213037b6cb35d",
"deviceID": "A19CD995FDA3093781096AF5D818033A241D65C1BFC3D32EC6C5D6B3B4A9AA6B"
},
"modified": {
"timestamp": 1439149090856,
"userRecordName": "_0d26968032e31bbc72c213037b6cb35d",
"deviceID": "A19CD995FDA3093781096AF5D818033A241D65C1BFC3D32EC6C5D6B3B4A9AA6B"
}
}
],
"total": 1
}
I'm trying to write a query to directly access the CloudKit web service and return the Child Object based on the reference of the parent object.
My test JSON looks something like this:
{"query":{"recordType":"ChildObject","filterBy":{"fieldName":"InspecReference","fieldValue":{ "value" : "14102C0A-60F2-4457-AC1C-601BC628BF47-184-000000012D225C57", "type" : "string" },"comparator":"EQUALS"}},"zoneID":{"zoneName":"_defaultZone"}}
However, I'm getting the following error from CloudKit:
{"uuid":"33db91f3-b768-4a68-9056-216ecc033e9e","serverErrorCode":"BAD_REQUEST","reason":"BadRequestException:
Unexpected input"}
I'm guessing I have the Record Field Dictionary in the query wrong. However, the documentation isn't clear on what this should look like on a reference object.
You have to recreate the actual reference object inside the filter value. In this particular case, the JSON looks like this:
{
"query": {
"recordType": "ChildObject",
"filterBy": {
"fieldName": "InspecReference",
"fieldValue": {
"value": {
"recordName": "14102C0A-60F2-4457-AC1C-601BC628BF47-184-000000012D225C57",
"action": "NONE"
},
"type": "REFERENCE"
},
"comparator": "EQUALS"
}
},
"zoneID": {
"zoneName": "_defaultZone"
}
}
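As a usage note (an assumption based on the standard CloudKit Web Services layout, not something stated in the question): this JSON is the body of a POST to the records/query route, typically https://api.apple-cloudkit.com/database/1/[container ID]/[environment]/public/records/query?ckAPIToken=[API token], with your own container ID, environment, and API token substituted in.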