MongoDB data model for fast reads using array data

I have a dataset that returns an array named "data_arr" containing anywhere from 5 to 200 subitems, each with a labelspace and a key-value pair, as follows:
{
"other_fields": "other_values",
...
"data_arr":[
{
"labelspace": "A",
"color": "red"
},
{
"labelspace": "A",
"size": 500
},
{
"labelspace": "B",
"shape": "round"
}
]
}
The question is how to store this data in MongoDB so that it is optimized for fast reads. Specifically, there would be queries:
Comparing key-values (e.g. average size of objects which are both red and round).
Returning all documents which meet a criterion (e.g. red objects larger than 300).
Label space is important because some key names are reused.
I've contemplated keeping the existing structure and indexing labelspace.
I've considered grouping all labelspace key/values into a single sub-document as follows:
{
"other_fields": "other_values",
...
"data_a":
{
"color": "red",
"size": 500
},
"data_b":
{
"shape": "round"
}
}
Or modeling it as follows, with a multikey index:
{
"other_fields": "other_values",
...
"data_arr":[
{
"labelspace": "A",
"key": "color",
"value": "red"
},
{
"labelspace": "A",
"key": "size",
"value": 500
},
{
"labelspace": "B",
"key": "shape",
"value": "round"
}
]
}
This is a new data set that needs to be collected. So it's difficult for me to build up enough of a sample only to discover I've ventured down the wrong path.
I think the last one is best suited for indexing, so is it possibly the best approach?
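A minimal sketch of how the third layout could be produced and queried, assuming the collection is accessed through a driver that accepts plain filter, pipeline and index documents (for example PyMongo's `find`, `aggregate` and `create_index`). The helper names `to_attributes` and `attr` are hypothetical, and the labelspaces "A" and "B" are taken from the sample above. `$elemMatch` is used so that labelspace, key and value must all match on the same array element.

```python
def to_attributes(data_arr):
    """Turn [{"labelspace": "A", "color": "red"}, ...] into
    [{"labelspace": "A", "key": "color", "value": "red"}, ...]."""
    out = []
    for item in data_arr:
        labelspace = item["labelspace"]
        for key, value in item.items():
            if key != "labelspace":
                out.append({"labelspace": labelspace, "key": key, "value": value})
    return out


def attr(labelspace, key, value):
    """Filter clause: all three conditions must hold on one array element."""
    return {"data_arr": {"$elemMatch": {
        "labelspace": labelspace, "key": key, "value": value}}}


# Compound multikey index keys; every path goes through the same array.
index_keys = [("data_arr.labelspace", 1), ("data_arr.key", 1), ("data_arr.value", 1)]

# "Red objects larger than 300"
red_and_large = {"$and": [attr("A", "color", "red"),
                          attr("A", "size", {"$gt": 300})]}

# "Average size of objects which are both red and round"
avg_size_pipeline = [
    {"$match": {"$and": [attr("A", "color", "red"),
                         attr("B", "shape", "round")]}},
    {"$unwind": "$data_arr"},
    {"$match": {"data_arr.labelspace": "A", "data_arr.key": "size"}},
    {"$group": {"_id": None, "avg_size": {"$avg": "$data_arr.value"}}},
]
```

Whether this layout actually reads faster than the grouped sub-document layout depends on the real data and query mix, so it is worth checking the query plans on a small sample before committing.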

Related

MongoDB - different structure for error "cannot index parallel arrays"

I have the following document structure:
{
"_id": "123",
"timestamp": 1628632419,
"propertyA": "A",
"propertyB": "B",
"propertyC": "C",
"propertyD": "D",
"propertyE": "E",
"myArray": [
{
"myNestedArray": [
{
"name": "NestedName1",
"value": "1"
},
{
"name": "NestedName2",
"value": "2"
},
{
"name": "NestedName3",
"value": "3"
}
],
"type": "MyType",
"name": "MyName",
"nestedPropertyA": "A",
"nestedPropertyB": "B",
"nestedPropertyC": "C",
"nestedPropertyD": "D",
"nestedPropertyE": "E"
},
...
]
}
With that, I want to create an index like this:
collection.createIndex({
'myArray.type': 1,
'myArray.myNestedArray.name': 1,
'myArray.myNestedArray.value': 1,
})
This results in:
cannot index parallel arrays
I read through MongoDB's documentation and I understand where the problem is. Now my question is: what is a good structure for my document so that this indexing works?
I found an approach that restructures the document from:
{a:[1,2], b:[8,9]}
to:
{ab:[[1,8], [1,9], [2,8], [2,9]]}
But as I see it, the objects under myArray are too complex for this approach in my situation.
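For reference, the cross-product restructuring described above can be sketched in a few lines of Python. The function name `cross_arrays` and its parameters are hypothetical; it only illustrates how two parallel arrays become one array of pairs.

```python
from itertools import product


def cross_arrays(doc, first, second, merged):
    """Replace two parallel array fields with one array holding every pair."""
    result = {k: v for k, v in doc.items() if k not in (first, second)}
    result[merged] = [list(pair) for pair in product(doc[first], doc[second])]
    return result


print(cross_arrays({"a": [1, 2], "b": [8, 9]}, "a", "b", "ab"))
# {'ab': [[1, 8], [1, 9], [2, 8], [2, 9]]}
```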
I was also thinking about turning the array indices into their own properties, like:
"type": "MyType",
"name": "MyName",
"myNestedArray0": {
"name": "NestedName1",
"value": "1"
},
"myNestedArray1": {
"name": "NestedName2",
"value": "2"
},
...
But this feels wrong and is not really flexible. Furthermore, the index would have to cover a fixed number of fields, like:
collection.createIndex({
'myArray.type': 1,
'myArray.myNestedArray0.name': 1,
'myArray.myNestedArray0.value': 1,
'myArray.myNestedArray1.name': 1,
'myArray.myNestedArray1.value': 1,
...
})
Another thought would be to refactor myNestedArray into an independent collection. My problem here is that I need properties like "propertyA", "propertyB", etc. Furthermore, myNestedArray could have many entries, so this could multiply the number of documents immensely.
Can someone advise me on how to proceed here?
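One option that stays within a single document is to keep a denormalized copy of the values to be indexed in one flat array. A compound multikey index allows at most one indexed array field per document, and three paths that all go through the same array of sub-documents satisfy that rule. The sketch below assumes a hypothetical extra field named `flatIndex` that is maintained on every write; the function name is also hypothetical.

```python
def flatten_for_index(doc):
    """Copy each (type, nested name, nested value) triple into one flat array."""
    flat = []
    for item in doc.get("myArray", []):
        for nested in item.get("myNestedArray", []):
            flat.append({
                "type": item["type"],
                "name": nested["name"],
                "value": nested["value"],
            })
    # Keep the original document and add the flat array next to it.
    return {**doc, "flatIndex": flat}


# Every indexed path goes through the single array "flatIndex".
index_keys = [("flatIndex.type", 1), ("flatIndex.name", 1), ("flatIndex.value", 1)]
```

The trade-off is duplicated data and a larger document, so this only makes sense if the number of nested entries per document stays moderate.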

How can I count all possible subdocument elements for a given top element in Mongo?

Not sure I am using the right terminology here, but assume the following oversimplified JSON structure stored in Mongo:
{
"_id": 1234,
"labels": {
"label1": {
"id": "l1",
"value": "abc"
},
"label3": {
"id": "l2",
"value": "def"
},
"label5": {
"id": "l3",
"value": "ghi"
},
"label9": {
"id": "l4",
"value": "xyz"
}
}
}
{
"_id": 5678,
"labels": {
"label1": {
"id": "l1",
"value": "hjk"
},
"label5": {
"id": "l5",
"value": "def"
},
"label10": {
"id": "l10",
"value": "ghi"
},
"label24": {
"id": "l24",
"value": "xyz"
}
}
}
I know my base element name (labels in the example), but I do not know which sub-elements it can have (in this case, the labelx names).
How can I group / count the existing elements (as if I were using a wildcard) so that I get a distinct overview like
"label1":2
"label3":1
"label5":2
"label9":1
"label10":1
"label24":1
as a result? So far I only found examples where you actually need to know the element names. But I don't know them and want to find some way to get all possible sub element names for a given top element for easy review.
In reality the label names can be pretty wild; I used labelx for readability in the example.

You can try the aggregation below in MongoDB 3.4.
Use $objectToArray to transform the object into an array of key-value pairs, followed by $unwind and a $group on the key to count occurrences.
db.col.aggregate([
{"$project":{"labels":{"$objectToArray":"$labels"}}},
{"$unwind":"$labels"},
{"$group":{"_id":"$labels.k","count":{"$sum":1}}}
])
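To make it clearer what the pipeline above computes, here is a plain-Python stand-in that does the same counting client-side. It is only an illustration (the function name `count_label_keys` is hypothetical); the aggregation itself runs on the server and does not need this code.

```python
from collections import Counter


def count_label_keys(docs, field="labels"):
    """Count how many documents contain each key under `field`."""
    counts = Counter()
    for doc in docs:
        # Equivalent to $objectToArray + $unwind: one entry per key.
        counts.update(doc.get(field, {}).keys())
    # Equivalent to $group on the key with {"$sum": 1}.
    return dict(counts)
```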

How would I use the attribute as input for Value with JOLT?

For a specific function that I am building, I need to parse my JSON and, in some cases, use the attribute name itself, rather than its value, as a value in the output. How do I manage that with JOLT?
Let's say this is my input:
{
"Results": [
{
"FirstName": "John",
"LastName": "Doe"
},
{
"FirstName": "Mary",
"LastName": "Joe"
},
{
"FirstName": "Thomas",
"LastName": "Edison"
}
]
}
And this should be the outcome:
{
"Results": [
{
"Name": "FirstName",
"Value": "John"
},
{
"Name": "FirstName",
"Value": "Mary"
},
{
"Name": "FirstName",
"Value": "Thomas"
},
{
"Name": "LastName",
"Value": "Doe"
},
{
"Name": "LastName",
"Value": "Joe"
},
{
"Name": "LastName",
"Value": "Edison"
}
]
}
For those interested: I'm building a JSON to Excel export functionality in Mendix, and it has to be completely dynamic, regardless of the input. To accomplish this, I need an array where each attribute (equal to a column in Excel) is its own object with a column name and a value. If each column value is its own object, I can simply say "create a column for each object with the same Name". It's a little bit difficult to explain, but it 'should' work.

Arrays and Jolt are not the best combination. Basically, there are 3 ways to deal with arrays in Shift:
1. You explicitly assign data to an array position, e.g. foo[0] and foo[1].
2. You reference a "number" that exists in the input data, e.g. foo[&2] and foo[&3].
3. You "accumulate" data into a list, e.g. foo[].
Your input data is an array of size 3. Your desired output is an array of size 6. You want this to be flexible and able to handle variable inputs.
This means option 3. So you have to "fix" / process your data into its "final form" while maintaining the original input JSON structure (a list with 3 items), and then accumulate all the "built" items into a list.
This means that you are building a list of lists, and then finally "squashing" it down to a single list.
Spec
[
{
// Step 1 : Pivot the data into parallel lists of keys and values
// maintaining the original outer input list structure.
"operation": "shift",
"spec": {
"Results": {
"*": { // results index
"*": { // FirstName / LastName
"$": "temp[&2].keys[]",
"@": "temp[&2].values[]"
}
}
}
}
},
{
// Step 2 : Un-pivot the data into the desired
// Name/Value pairs, using the inner array index to
// keep things organized/separated.
"operation": "shift",
"spec": {
"temp": {
"*": { // temp index
"keys": {
"*": "temp[&2].[&].Name"
},
"values": {
"*": "temp[&2].[&].Value"
}
}
}
}
},
{
// Step 3 : Accumulate the "finished" Name/Value pairs
// into the final "one big list" output.
"operation": "shift",
"spec": {
"temp": {
"*": { // outer array
"*": "Results[]"
}
}
}
}
]
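As a way to cross-check what the three shift steps are meant to produce, the same pivot can be sketched in plain Python. This is not Jolt, and the function name `to_name_value_pairs` is hypothetical. Note that it emits the pairs row by row (all attributes of the first result, then the second, and so on), not grouped by attribute name as in the question's desired outcome.

```python
def to_name_value_pairs(payload):
    """Turn each attribute of each result into its own {"Name", "Value"} object."""
    pairs = []
    for row in payload["Results"]:
        for name, value in row.items():
            pairs.append({"Name": name, "Value": value})
    return {"Results": pairs}
```

If the grouped order matters for the Excel export, sorting the resulting list by "Name" afterwards would be one way to get it.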

C# MongoDB driver - Change array item position?

I wonder if there is a way to change an item's position in an array using C# MongoDB driver?
For example, I have this document:
{
"Item": "X",
"Values": [
{
"Value": 1
},
{
"Value": 2
}
]
}
and I want to change the order of the items in the Values array let's say:
{
"Item": "X",
"Values": [
{
"Value": 2
},
{
"Value": 1
}
]
}
What I'm currently doing is using PullFilter to remove the item with "Value": 2 and then using PushEach to insert it at the specific position I need.
But in my case the array items are large objects, so I was wondering if there is a way to just change the item's position without having to remove and re-insert it.
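For readers unfamiliar with the approach described above, this is roughly what the two update documents look like at the MongoDB level, sketched in Python rather than C#. The helper names are hypothetical. Two separate updates are assumed because a single update cannot apply both `$pull` and `$push` to the same field; `simulate_move` is only a local stand-in that shows the effect on the array.

```python
def move_item_updates(array_field, item, match, position):
    """Return the two update documents for 'pull, then push at position'."""
    pull = {"$pull": {array_field: match}}
    push = {"$push": {array_field: {"$each": [item], "$position": position}}}
    return pull, push


def simulate_move(values, match, position):
    """Local stand-in: move the elements matching `match` to `position`."""
    moved = [v for v in values if all(v.get(k) == x for k, x in match.items())]
    rest = [v for v in values if v not in moved]
    return rest[:position] + moved + rest[position:]


values = [{"Value": 1}, {"Value": 2}]
print(simulate_move(values, {"Value": 2}, 0))
# [{'Value': 2}, {'Value': 1}]
```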

MongoDB and Nested Arrays

I have a collection with data like this:
{
"Name": "Steven",
"Children": [
{
"Name": "Liv",
"Children": [
{
"Name": "Milo"
}
]
},
{
"Name": "Mia"
},
{
"Name": "Chelsea"
}
]
},
{
"Name": "Ozzy",
"Children": [
{
"Name": "Jack",
"Children": [
{
"Name": "Pearl"
}
]
},
{
"Name": "Kelly"
}
]
}
Two questions:
1. Can MongoDB flatten the arrays into a structure like this: [Steven, Liv, Milo, Mia, Chelsea, Ozzy, Jack, Pearl, Kelly]?
2. How can I find a document where the name is Jack, no matter where in the structure it is placed?

In general, MongoDB does not perform recursive or arbitrary-depth operations on nested fields. To accomplish objectives 1 and 2, I would reconsider the structure of the data, as an arbitrarily nested document is not a good way to model a tree in MongoDB. The MongoDB docs have a good section on modeling tree structures that presents several options with examples. Pick the one that best suits your entire use case; they will all make 1 and 2 very easy.
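As an illustration of one of those options (parent references), the nested documents could be converted into one document per person with a small script. This is a sketch under assumptions: names are assumed to be unique so they can serve as `_id`, and the function name `to_parent_refs` and the field name `parent` are hypothetical.

```python
def to_parent_refs(trees):
    """Flatten nested {"Name", "Children"} documents into one document per person."""
    flat = []

    def visit(node, parent):
        flat.append({"_id": node["Name"], "parent": parent})
        for child in node.get("Children", []):
            visit(child, node["Name"])

    for root in trees:
        visit(root, None)
    return flat
```

With that shape, question 1 becomes a plain query over all documents and question 2 becomes an equality filter such as `{"_id": "Jack"}`.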