How would I use the attribute as input for Value with JOLT?

For a specific function that I am building, I need to parse my JSON and, in some cases, have the attribute name itself, rather than the value, be used as a value. But how do I manage that with JOLT?
Let's say this is my input:
{
  "Results": [
    {
      "FirstName": "John",
      "LastName": "Doe"
    },
    {
      "FirstName": "Mary",
      "LastName": "Joe"
    },
    {
      "FirstName": "Thomas",
      "LastName": "Edison"
    }
  ]
}
And this should be the outcome:
{
  "Results": [
    {
      "Name": "FirstName",
      "Value": "John"
    },
    {
      "Name": "FirstName",
      "Value": "Mary"
    },
    {
      "Name": "FirstName",
      "Value": "Thomas"
    },
    {
      "Name": "LastName",
      "Value": "Doe"
    },
    {
      "Name": "LastName",
      "Value": "Joe"
    },
    {
      "Name": "LastName",
      "Value": "Edison"
    }
  ]
}
For those interested: I'm building a JSON-to-Excel export functionality in Mendix, and it has to be completely dynamic, regardless of the input. To accomplish this, I need an array where each attribute (equal to a column in Excel) becomes its own object with a column name and a value. If each column's data is its own object, I can simply say "create a column for each object with the same Name". A little difficult to explain, but it should work.

Arrays and Jolt are not the best of friends. Basically, there are three ways to deal with arrays in Shift:
you explicitly assign data to an array position, aka foo[0] and foo[1]
you reference a "number" that exists in the input data, aka foo[&2] and foo[&3]
you "accumulate" data into a list, aka foo[]
Your input data is an array of size 3. Your desired output is an array of size 6, and you want this to be flexible enough to handle variable inputs.
This means option 3. So you have to "fix" / process your data into its "final form" while maintaining the original input JSON structure (a list with 3 items), and then accumulate all the "built" items into a list.
This means you are building a list of lists, and then finally "squashing" it down to a single list. A hand-traced sketch of the intermediate data follows the spec below.
Spec
[
  {
    // Step 1 : Pivot the data into parallel lists of keys and values,
    // maintaining the original outer input list structure.
    // "$" grabs the matched key (FirstName / LastName),
    // "@" grabs the matched value (e.g. "John").
    "operation": "shift",
    "spec": {
      "Results": {
        "*": { // results index
          "*": { // FirstName / LastName
            "$": "temp[&2].keys[]",
            "@": "temp[&2].values[]"
          }
        }
      }
    }
  },
  {
    // Step 2 : Un-pivot the data into the desired
    // Name/Value pairs, using the inner array index to
    // keep things organized/separated.
    "operation": "shift",
    "spec": {
      "temp": {
        "*": { // temp index
          "keys": {
            "*": "temp[&2].[&].Name"
          },
          "values": {
            "*": "temp[&2].[&].Value"
          }
        }
      }
    }
  },
  {
    // Step 3 : Accumulate the "finished" Name/Value pairs
    // into the final "one big list" output.
    "operation": "shift",
    "spec": {
      "temp": {
        "*": { // outer array
          "*": "Results[]"
        }
      }
    }
  }
]
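To make the three steps concrete, here is that hand-traced sketch of the intermediate data for the sample input (worth verifying by actually running the spec). After step 1, temp holds parallel key/value lists, one entry per input item:
{
  "temp": [
    { "keys": ["FirstName", "LastName"], "values": ["John", "Doe"] },
    { "keys": ["FirstName", "LastName"], "values": ["Mary", "Joe"] },
    { "keys": ["FirstName", "LastName"], "values": ["Thomas", "Edison"] }
  ]
}
After step 2, each temp entry is a list of Name/Value pairs:
{
  "temp": [
    [ { "Name": "FirstName", "Value": "John" }, { "Name": "LastName", "Value": "Doe" } ],
    [ { "Name": "FirstName", "Value": "Mary" }, { "Name": "LastName", "Value": "Joe" } ],
    [ { "Name": "FirstName", "Value": "Thomas" }, { "Name": "LastName", "Value": "Edison" } ]
  ]
}
Step 3 then squashes these inner lists into the single Results list. Note that the accumulated order interleaves FirstName/LastName per person rather than grouping all FirstNames first; for the "create a column per distinct Name" use case, that ordering should not matter.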

Related

JSON Schema - can array / list validation be combined with anyOf?

I have a JSON document I'm trying to validate with this form:
...
"products": [{
    "prop1": "foo",
    "prop2": "bar"
}, {
    "prop3": "hello",
    "prop4": "world"
},
...
There are multiple different forms an object may take. My schema looks like this:
...
"definitions": {
"products": {
"type": "array",
"items": { "$ref": "#/definitions/Product" },
"Product": {
"type": "object",
"oneOf": [
{ "$ref": "#/definitions/Product_Type1" },
{ "$ref": "#/definitions/Product_Type2" },
...
]
},
"Product_Type1": {
"type": "object",
"properties": {
"prop1": { "type": "string" },
"prop2": { "type": "string" }
},
"Product_Type2": {
"type": "object",
"properties": {
"prop3": { "type": "string" },
"prop4": { "type": "string" }
}
...
On top of this, certain properties of the individual product array objects may be indirected via further usage of anyOf or oneOf.
I'm running into issues in VSCode using the built-in schema validation, where it throws errors for every item in the products array that doesn't match Product_Type1.
So it seems the validator latches onto the first oneOf entry it finds and won't validate against any of the other types.
I didn't find any limitations to the oneOf mechanism on jsonschema.org. And there is no mention of it being used in the page specifically dealing with arrays here: https://json-schema.org/understanding-json-schema/reference/array.html
Is what I'm attempting possible?
Your general approach is fine. Let's take a slightly simpler example to illustrate what's going wrong.
Given this schema
{
  "oneOf": [
    { "properties": { "foo": { "type": "integer" } } },
    { "properties": { "bar": { "type": "integer" } } }
  ]
}
And this instance
{ "foo": 42 }
At first glance, this looks like it matches /oneOf/0 and not /oneOf/1. It actually matches both schemas, which violates the one-and-only-one constraint imposed by oneOf, so the oneOf fails.
Remember that every keyword in JSON Schema is a constraint. Anything that is not explicitly excluded by the schema is allowed. There is nothing in the /oneOf/1 schema that says a "foo" property is not allowed. Nor does it say that "foo" is required. It only says that if the instance has a property "foo", then it must be an integer.
To fix this, you will need required and maybe additionalProperties, depending on the situation. I show here how you would use additionalProperties, but I recommend you don't use it unless you need to, because it does have some problematic behaviors.
{
  "oneOf": [
    {
      "properties": { "foo": { "type": "integer" } },
      "required": ["foo"],
      "additionalProperties": false
    },
    {
      "properties": { "bar": { "type": "integer" } },
      "required": ["bar"],
      "additionalProperties": false
    }
  ]
}
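As a quick sanity check against the corrected schema (a sketch, reasoned by hand rather than run through a validator): the instance
{ "foo": 42 }
now matches only /oneOf/0, because /oneOf/1 requires "bar" and its "additionalProperties": false rejects "foo". Conversely, an instance like
{ "foo": 42, "bar": 7 }
fails both branches (each branch forbids the other property), so the oneOf fails as expected.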

Ingesting multi-valued dimension from comma sep string

I have event data from Kafka with the following structure that I want to ingest in Druid
{
  "event": "some_event",
  "id": "1",
  "parameters": {
    "campaigns": "campaign1, campaign2",
    "other_stuff": "important_info"
  }
}
Specifically, I want to transform the dimension "campaigns" from a comma-separated string into an array / multi-valued dimension so that it can be nicely filtered and grouped by.
My ingestion spec so far looks as follows:
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "event-data",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "timestamp",
          "format": "posix"
        },
        "flattenSpec": {
          "fields": [
            {
              "type": "root",
              "name": "parameters"
            },
            {
              "type": "jq",
              "name": "campaigns",
              "expr": ".parameters.campaigns"
            }
          ]
        }
      },
      "dimensionsSpec": {
        "dimensions": [
          "event",
          "id",
          "campaigns"
        ]
      }
    },
    "metricsSpec": [
      {
        "type": "count",
        "name": "count"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      ...
    }
  },
  "tuningConfig": {
    "type": "kafka",
    ...
  },
  "ioConfig": {
    "topic": "production-tracking",
    ...
  }
}
This, however, leads to campaigns being ingested as a string.
I could neither find a way to generate an array out of it with a jq expression in the flattenSpec, nor did I find something like a string-split expression that could be used in a transformSpec.
Any suggestions?
Try setting useFieldDiscovery: false in your ingestion spec. When this flag is set to true (the default), Druid interprets all fields with singular values (not a map or list) and flat lists (lists of singular values) at the root level as columns.
Here is a good reference with examples for the flatten spec:
https://druid.apache.org/docs/latest/ingestion/flatten-json.html
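For placement, the flag sits at the top level of the flattenSpec (a sketch reusing the jq field from the question's spec; note that with discovery off, every desired column must be listed explicitly in fields):
"flattenSpec": {
  "useFieldDiscovery": false,
  "fields": [
    {
      "type": "jq",
      "name": "campaigns",
      "expr": ".parameters.campaigns"
    }
  ]
}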
It looks like, since Druid 0.17.0, Druid expressions support typed constructors for creating arrays, so the string_to_array expression should do the trick! A sketch follows.
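Something along these lines inside the dataSchema might work (a sketch, not tested: the column name and the ", " delimiter come from the question, and the exact expression syntax should be verified against your Druid version):
"transformSpec": {
  "transforms": [
    {
      "type": "expression",
      "name": "campaigns",
      "expression": "string_to_array(campaigns, ', ')"
    }
  ]
}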

Newbie: Transform json repetitions to Single Arrays and add Entry Inside

I'm looking for some Jolt help. I'm officially a newbie, trying to work out whether Jolt will suffice for our solution, and I can't seem to get the output I expect.
I was wondering if anyone could help me, and whether this template would also cover our other needs for structures of the same fashion. Also, if anyone can suggest good reading on Jolt, I'm happy to hear of that as well.
In a nutshell, I'm trying to use JOLT to handle the following needs:
From a JSON input, if there are multiple repetitions of a structure, output these repetitions as an array of the independent JSON structures.
From the parent level, take a named match SetIDAL1 and place it into the top created JSON structure as its own element, not as part of the array.
Here is an example of my input I need to transform:
"AL1": {
"0": {
"AllergenCodeMnemonicDescription": {
"1": {
"Text": "ASPIRIN",
"ID": "TEST1"
},
"2": {
"Text": "TYLENOL",
"ID": "TEST2"
}
},
"SetIDAL1": "1"
},
"1": {
"AllergenCodeMnemonicDescription": {
"1": {
"Text": "ADVIL"
}
},
"SetIDAL1": "2"
}
}
}
My current spec seems to get me pretty close. However, I can't get the arrays up a level to remove the "0"/"1" keys, nor can I get the SetIDAL1 value placed inside the newly created array object; instead it becomes its own array object. I've played around with various other options that only lead me further away. Any help with the solution, and general "smart ways" to look at this issue, would be appreciated.
Unfortunately, I do not have a copy of my previous attempt, which performed the matching on all groups and mapped them as expected. I started matching each individual 0/1 object underneath my input to see if I could "bury" the SetIDAL1 properly, to no avail. I really do not want to code for each level, but I'm hoping there's a solution to the problem at hand that someone can assist me with.
[
  {
    "operation": "shift",
    "spec": {
      "AL1": {
        "0": {
          "AllergenCodeMnemonicDescription": {
            "#": "AL1.[].AllergenCodeMnemonicDescription.[]"
          },
          "SetIDAL1": "AL1.[].SetIDAL1"
        },
        "1": {
          "AllergenCodeMnemonicDescription": {
            "*": "AL1.[].AllergenCodeMnemonicDescription.[]"
          },
          "SetIDAL1": "AL1.[].SetIDAL1"
        }
      }
    }
  }
]
Here is the output I am getting. I assume I potentially need yet another shift after this to bring the "1"/"2" levels up again somehow, but I can't seem to get the SetIDAL1 into the correct place, as stated before.
{
  "AL1": [
    {
      "AllergenCodeMnemonicDescription": [
        {
          "1": {
            "Text": "ASPIRIN",
            "ID": "TEST1"
          },
          "2": {
            "Text": "TYLENOL",
            "ID": "TEST2"
          }
        }
      ]
    },
    {
      "SetIDAL1": "1"
    },
    {
      "AllergenCodeMnemonicDescription": [
        {
          "Text": "ADVIL"
        }
      ]
    },
    {
      "SetIDAL1": "2"
    }
  ]
}
Here is the output I need:
{
  "AL1": [
    {
      "AllergenCodeMnemonicDescription": [
        {
          "Text": "ASPIRIN",
          "ID": "TEST1"
        },
        {
          "Text": "TYLENOL",
          "ID": "TEST2"
        }
      ],
      "SetIDAL1": "1"
    },
    {
      "AllergenCodeMnemonicDescription": [
        {
          "Text": "ADVIL"
        }
      ],
      "SetIDAL1": "2"
    }
  ]
}
As the indexes start at 1 (not 0), I was getting a null first element in each Allergen array and had to remove it at the end (not sure if there's a better way to do this), but this spec will do the trick:
[
  {
    "operation": "shift",
    "spec": {
      "AL1": {
        "*": {
          "AllergenCodeMnemonicDescription": {
            "*": {
              "Text": "AL1.[&3].AllergenCodeMnemonicDescription.[&1].&",
              "ID": "AL1.[&3].AllergenCodeMnemonicDescription.[&1].&"
            }
          },
          "SetIDAL1": "AL1.[&1].&"
        }
      }
    }
  },
  {
    "operation": "remove",
    "spec": {
      "AL1": {
        "*": {
          "AllergenCodeMnemonicDescription": {
            "0": ""
          }
        }
      }
    }
  }
]
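For reference, a hand-traced sketch of the intermediate output after the shift step (before the remove) would look roughly like this, with the null placeholders at index 0 that the remove operation then strips:
{
  "AL1": [
    {
      "AllergenCodeMnemonicDescription": [
        null,
        { "Text": "ASPIRIN", "ID": "TEST1" },
        { "Text": "TYLENOL", "ID": "TEST2" }
      ],
      "SetIDAL1": "1"
    },
    {
      "AllergenCodeMnemonicDescription": [
        null,
        { "Text": "ADVIL" }
      ],
      "SetIDAL1": "2"
    }
  ]
}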

MongoDb Query returning unwanted documents

I have a database containing documents of two structures:
{
  "name": "",
  "name_ar": "",
  "description": "",
  "bla1": {
    "name": "",
    "link": "",
    "Logo": ""
  },
  "bla2": {
    "name": "",
    "id": ""
  }
}
and
{
  "name": "",
  "name_ar": "",
  "description": "",
  "bla1": {
    "name": [],
    "link": "",
    "Logo": ""
  },
  "bla2": {
    "name": "",
    "id": ""
  }
}
I want to query my collection for documents whose "bla1.name" is exactly equal to some string. However, the following query:
{$and: [{'bla1.name': {'$type': 'string'}}, {"bla1.name": 'something'}]}
returns all documents containing the name 'something', even those where "bla1.name" is an array.
What am I doing wrong?
From the MongoDB docs:
$type now works with arrays in the same way it works with other BSON types. Previous versions only matched documents where the field contained a nested array.
That means: if an array has at least one element of the given type, the document gets selected.
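For example (a sketch based on the structures above), this document still matches {"bla1.name": {"$type": "string"}} even though the field is an array, because the array contains string elements:
{ "bla1": { "name": ["something", "other"] } }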
If you want to exclude arrays, you have to extend your query. Since matching the string 'something' already implies the string type, you can drop the explicit type selection for string:
$and: [
  // not necessary any more, as this selection is already implied by the last part
  // {
  //   "bla1.name": {
  //     "$type": "string"
  //   }
  // },
  {
    "bla1.name": {
      $not: {
        "$type": "array"
      }
    }
  },
  {
    "bla1.name": "something"
  }
]
See the official docs: https://docs.mongodb.com/manual/reference/operator/query/type/#behavior
Here is a working demo on the Mongo playground: https://mongoplayground.net/p/3ri7Bjfrae8

How can I count all possible subdocument elements for a given top element in Mongo?

Not sure I am using the right terminology here, but assume the following oversimplified JSON structure available in Mongo:
{
  "_id": 1234,
  "labels": {
    "label1": {
      "id": "l1",
      "value": "abc"
    },
    "label3": {
      "id": "l2",
      "value": "def"
    },
    "label5": {
      "id": "l3",
      "value": "ghi"
    },
    "label9": {
      "id": "l4",
      "value": "xyz"
    }
  }
}
{
  "_id": 5678,
  "labels": {
    "label1": {
      "id": "l1",
      "value": "hjk"
    },
    "label5": {
      "id": "l5",
      "value": "def"
    },
    "label10": {
      "id": "l10",
      "value": "ghi"
    },
    "label24": {
      "id": "l24",
      "value": "xyz"
    }
  }
}
I know my base element name (labels in the example), but I do not know the various sub-elements it can have (in this case, the labelx names).
How can I group / count the existing elements (as if I were using a wildcard) so I get a distinct overview like
"label1":2
"label3":1
"label5":2
"label9":1
"label10":1
"label24":1
as a result? So far I only found examples where you actually need to know the element names. But I don't know them and want to find some way to get all possible sub element names for a given top element for easy review.
In reality the label names can be pretty wild, I used labelx for readability in the example.
You can try the below aggregation in MongoDB 3.4+.
Use $objectToArray to transform the object into an array of key/value pairs, followed by $unwind and a $group on the key to count occurrences.
db.col.aggregate([
  {"$project": {"labels": {"$objectToArray": "$labels"}}},
  {"$unwind": "$labels"},
  {"$group": {"_id": "$labels.k", "count": {"$sum": 1}}}
])
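Hand-computed against the two sample documents above, the pipeline should return one document per label key, along these lines (output order is not guaranteed):
{ "_id": "label1", "count": 2 }
{ "_id": "label3", "count": 1 }
{ "_id": "label5", "count": 2 }
{ "_id": "label9", "count": 1 }
{ "_id": "label10", "count": 1 }
{ "_id": "label24", "count": 1 }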