I have something like this:
{
  "_id": ...,
  "section": {
    "a": "101",
    "b": "101",
    "c": "000",
    "d": "101"
  }
}
How can I unset all fields with the value "101"? I know nothing about the keys ("a", "b", "c"), only the value.
Query
In general the schema should be known, even if some fields are missing, so having an unknown schema is not a good idea.
An unknown schema causes many problems and slower, more complicated queries, but you can still do it with $objectToArray, $arrayToObject, etc.
Playmongo
aggregate(
  [{"$set":
     {"section":
       {"$arrayToObject":
         [{"$filter":
            {"input": {"$objectToArray": "$section"},
             "cond": {"$ne": ["$$this.v", "101"]}}}]}}}])
Related
I have the following document structure:
{
  "_id": "123",
  "timestamp": 1628632419,
  "propertyA": "A",
  "propertyB": "B",
  "propertyC": "C",
  "propertyD": "D",
  "propertyE": "E",
  "myArray": [
    {
      "myNestedArray": [
        {
          "name": "NestedName1",
          "value": "1"
        },
        {
          "name": "NestedName2",
          "value": "2"
        },
        {
          "name": "NestedName3",
          "value": "3"
        }
      ],
      "type": "MyType",
      "name": "MyName",
      "nestedPropertyA": "A",
      "nestedPropertyB": "B",
      "nestedPropertyC": "C",
      "nestedPropertyD": "D",
      "nestedPropertyE": "E"
    },
    ...
  ]
}
With that, I want to create an index like this:
collection.createIndex({
  'myArray.type': 1,
  'myArray.myNestedArray.name': 1,
  'myArray.myNestedArray.value': 1,
})
This results in:
cannot index parallel arrays
I read through MongoDB's documentation and I understand where the problem is. Now my question is: what is a good structure for my document so that my indexing works?
I found an approach that restructures from:
{a:[1,2], b:[8,9]}
to:
{ab:[[1,8], [1,9], [2,8], [2,9]]}
But as I see it, for my situation the objects under myArray are too complex for this approach.
I was also thinking about moving the array entries into properties of their own, like:
"type": "MyType",
"name": "MyName",
"myNestedArray0": {
"name": "NestedName1",
"value": "1"
},
"myNestedArray1": {
"name": "NestedName2",
"value": "1"
},
...
But this feels wrong and is also not really flexible; furthermore, the index would be fixed to a set number of fields, like:
collection.createIndex({
  'myArray.type': 1,
  'myArray.myNestedArray0.name': 1,
  'myArray.myNestedArray0.value': 1,
  'myArray.myNestedArray1.name': 1,
  'myArray.myNestedArray1.value': 1,
  ...
})
Another thought would be refactoring myNestedArray into an independent collection. My problem here is that I need the properties like "propertyA", "propertyB", etc. Furthermore, myNestedArray could have many entries, so it could multiply the number of documents immensely.
Can someone give me advice on how to proceed here?
I have an existing collection of ~12 million documents. I want to update one field in all the documents to hold a running number within each group of documents that share a common "ref" field. This would be a one-time operation. Is there any way I can achieve this in MongoDB 4.4?
Simplified documents example:
{"_id": 1, "ref": "REF_A", "description": "aaaa"}
{"_id": 2, "ref": "REF_A", "description": "bbbb"}
{"_id": 3, "ref": "REF_A", "description": "cccc"}
{"_id": 4, "ref": "REF_B", "description": "dddd"}
{"_id": 5, "ref": "REF_B", "description": "eeee"}
...
Desired modified output:
{"_id": 1, "ref": "REF_A", "description": "aaaa1"}
{"_id": 2, "ref": "REF_A", "description": "bbbb2"}
{"_id": 3, "ref": "REF_A", "description": "cccc3"}
{"_id": 4, "ref": "REF_B", "description": "dddd1"} <- reset count because ref changed
{"_id": 5, "ref": "REF_B", "description": "eeee2"}
...
The running number is concatenated to the description field here. As soon as the "ref" changes, the counter should reset and start from 1 again. When sorted by "_id", all documents with the same ref are already together. Order matters.
I've been looking at aggregations to solve this, but it seems I would need a way to refer to previous documents, and I could not figure it out yet.
The best I could find is this thread:
Add some kind of row number to a mongodb aggregate command / pipeline
But it does not seem to suit my case, where the row number is reset under a condition.
Query1
sort by ref and description
group by ref and collect the docs for each ref-group
map to add the index of each member to its description
unwind and $replaceRoot, to restore the initial structure
You need to add an $out stage at the end of the pipeline (see your driver's documentation for how to do this) to save the result to a new collection, and then replace the one you have now; see the sketch after the pipeline below. (We can't use $group in any update method, even with a pipeline; the only way is $out or $merge, but $merge will be slower.)
Also set {allowDiskUse: true}.
Test code here
aggregate(
  [{"$sort": {"ref": 1, "description": 1}},
   {"$group": {"_id": "$ref", "docs": {"$push": "$$ROOT"}}},
   {"$set":
     {"docs":
       {"$map":
         {"input": {"$range": [0, {"$size": "$docs"}]},
          "in":
            {"$let":
              {"vars": {"doc": {"$arrayElemAt": ["$docs", "$$this"]}},
               "in":
                 {"$mergeObjects":
                   ["$$doc",
                    {"description":
                      {"$concat":
                        ["$$doc.description",
                         {"$toString": {"$add": ["$$this", 1]}}]}}]}}}}}}},
   {"$unwind": "$docs"},
   {"$replaceRoot": {"newRoot": "$docs"}}])
*In MongoDB 5 we have $setWindowFields for this, but you have MongoDB 4.4, so I think $group is all we have. Give it a try, but keep in mind that you have many documents.
Query2
requires MongoDB >= 5, which you don't have now
again, you need $out
Test code here
aggregate(
  [{"$setWindowFields":
     {"partitionBy": "$ref",
      "sortBy": {"description": 1},
      "output": {"rank": {"$rank": {}}}}},
   {"$set":
     {"description": {"$concat": ["$description", {"$toString": "$rank"}]},
      "rank": "$$REMOVE"}}])
I have an aggregate call in MongoDB (v4.2) where I'm doing a $lookup and $unwind of related sub-docs but I'm having problems figuring out how to sort. The sub-doc has a field called created and so does the top-level doc:
Wine.aggregate(
  [
    {
      "$match": {
        "user": ObjectId("<userId>")
      }
    },
    {
      "$sort": {
        "created": -1
      }
    },
    {
      "$lookup": {
        "from": "wineuniversals",       // the collection name
        "localField": "wineUniversal",  // field from the wines model
        "foreignField": "_id",          // field by which the two collections are linked
        "as": "uWineData"               // the property where the universal wine data is stored
      }
    },
    {
      "$unwind": {
        "path": "$uWineData"
      }
    }
  ]
).exec(function(err, wines) {...});
And here is an example of the docs it returns:
{
  "_id": "5dbbc408e78d867664213147",
  "photoURL": "51972ee99dec8f31cdb6ff8025a0d3d3",
  "user": "554f99352ee62248071b4d0f",
  "mode": "past",
  "wineUniversal": "5dce6038e78d8676642131fd",
  "hidden": false,
  "deleted": false,
  "eventBlindTasting": false,
  "quantity": 1,
  "groupDescription": "none",
  "comment": "Surprisingly tasty. You'd think it'd be plonk based on the gimmicky label.",
  "scoreTotal": 91,
  "scoreOverallImpression": 4.2,
  "scoreFinish": 4,
  "scoreTaste": 4,
  "scoreAroma": 4.3,
  "lastUpdated": "2020-05-21T23:06:54.497Z",
  "created": "2020-05-21T23:06:54.498Z", // FIRST INSTANCE OF CREATED
  "uWineData": {
    "_id": "5dce6038e78d8676642131fd",
    "scoreCount": 1,
    "averageScore": 0,
    "userWines": ["5dbbc408e78d867664213147"],
    "expertScore2": "",
    "expertReviewer2": null,
    "expertScore1": "",
    "expertReviewer1": null,
    "currency": "USD",
    "commonPrice": 12,
    "additionalDetails": "",
    "designation": "",
    "category": "Red",
    "varietal": "Cabernet Sauvignon",
    "vineyard": "",
    "appellation": "California",
    "subRegion": "",
    "region": "California",
    "country": "United States",
    "wineryUrl": "https://www.thewalkingdeadwine.com",
    "winery": "The Walking Dead Wines",
    "vintage": "2016",
    "deleted": false,
    "lastUpdated": "2019-11-15T08:22:32.437Z",
    "created": "2019-11-15T08:22:16.579Z", // SECOND INSTANCE OF CREATED IN SUBDOC
    "__v": 1
  }
}
It looks like the $sort step is keying off the created field in the subdoc instead of the top-level doc's created field.
Is there a way to reference the created in the top-level doc so my $sort step orders the docs by the top level created field?
Ah ha! Turns out that adding preserveNullAndEmptyArrays: true to the $unwind fixed the issue. Some of the top-level docs didn't have a sub-doc available yet (e.g., the most recent docs hadn't yet had an _id linkage added to them), so there wasn't anything to $unwind, hence the null/empty issue.
{
  "path": "$uWineData",
  "preserveNullAndEmptyArrays": true
}
I've written a $redact operation to filter my documents:
db.test.aggregate([
  { $redact: {
      $cond: {
        if: { "$ifNull": ["$_acl.READ", false] },
        then: { $cond: {
          if: { $anyElementTrue: {
            $map: {
              input: "$_acl.READ",
              as: "myfield",
              in: { $setIsSubset: [ "$$myfield", ["user1"] ] }
            }
          }},
          then: "$$DESCEND",
          else: "$$PRUNE"
        }},
        else: "$$DESCEND"
      }
  }}
])
This removes all (sub)documents where _acl.READ doesn't contain user1, but it keeps all (sub)documents where _acl.READ is not set.
After the aggregation I can't tell whether some information was removed or whether it simply wasn't part of the document.
I'd like to remove sensitive information, but keep some hint that access was denied, i.e.:
{
  id: ...,
  subDoc1: {
    foo: "bar",
    _acl: {
      READ: [ ["user1"] ]
    }
  },
  subDoc2: {
    _error: "ACCESS DENIED"
  }
}
I just can't figure out how to modify the document while using $redact.
Thank you!
The $redact pipeline stage is quite unique in the aggregation framework: it is not only capable of recursively descending into the nested structure of a document, but can also traverse across all of the keys at any level. It does however still require a concept of "depth", in that a key must contain either a sub-document object or an array which is itself composed of sub-documents.
But what it cannot do is "replace" or "swap out" content. The only actions allowed here are fairly fixed; more specifically, from the documentation:
The argument can be any valid expression as long as it resolves to $$DESCEND, $$PRUNE, or $$KEEP system variables. For more information on expressions, see Expressions.
The possibly misleading statement there is "The argument can be any valid expression", which is in fact true; however, the expression must return exactly the same content as what would resolve to one of those system variables anyhow.
So in order to give some sort of "Access Denied" response in place of the "redacted" content, you would have to process the document differently. You would also need to consider the limitations of other operators, which simply cannot work "recursively" or in a manner that requires "traversal", as mentioned earlier.
Keeping with the example from the documentation:
{
  "_id": 1,
  "title": "123 Department Report",
  "tags": [ "G", "STLW" ],
  "year": 2014,
  "subsections": [
    {
      "subtitle": "Section 1: Overview",
      "tags": [ "SI", "G" ],
      "content": "Section 1: This is the content of section 1."
    },
    {
      "subtitle": "Section 2: Analysis",
      "tags": [ "STLW" ],
      "content": "Section 2: This is the content of section 2."
    },
    {
      "subtitle": "Section 3: Budgeting",
      "tags": [ "TK" ],
      "content": {
        "text": "Section 3: This is the content of section3.",
        "tags": [ "HCS" ]
      }
    }
  ]
}
If we want to process this so that content is "replaced" wherever it does not match the "role tags" of [ "G", "STLW" ], then you would do something like this instead:
var userAccess = [ "STLW", "G" ];

db.sample.aggregate([
  { "$project": {
      "title": 1,
      "tags": 1,
      "year": 1,
      "subsections": { "$map": {
        "input": "$subsections",
        "as": "el",
        "in": { "$cond": [
          { "$gt": [
            { "$size": { "$setIntersection": [ "$$el.tags", userAccess ] }},
            0
          ]},
          "$$el",
          {
            "subtitle": "$$el.subtitle",
            "label": { "$literal": "Access Denied" }
          }
        ]}
      }}
  }}
])
That's going to produce a result like this:
{
  "_id": 1,
  "title": "123 Department Report",
  "tags": [ "G", "STLW" ],
  "year": 2014,
  "subsections": [
    {
      "subtitle": "Section 1: Overview",
      "tags": [ "SI", "G" ],
      "content": "Section 1: This is the content of section 1."
    },
    {
      "subtitle": "Section 2: Analysis",
      "tags": [ "STLW" ],
      "content": "Section 2: This is the content of section 2."
    },
    {
      "subtitle": "Section 3: Budgeting",
      "label": "Access Denied"
    }
  ]
}
Basically, we are instead using the $map operator to process the array of items and apply a condition to each element. In this case the $cond operator first looks at the condition to decide whether the "tags" field has any $setIntersection result with the userAccess variable we defined earlier.
Where that condition is true, the element is returned unaltered. In the false case, rather than removing the element (not possible with $map itself; that would take another step), and since $map returns exactly as many elements as it receives in its "input", you just replace the returned content with something else: in this case, an object with a single key and a $literal value, being "Access Denied".
So keep in mind what you cannot do:
You cannot actually traverse document keys. Any processing needs to be explicit about the keys specifically mentioned.
The content therefore cannot be in any other form than an array, as MongoDB cannot traverse across keys; you would otherwise need to evaluate conditionally at each key path.
Filtering the "top-level" document is right out, unless you really want to add an additional stage at the end that does this:
{ "$project": {
"doc": { "$cond": [
{ "$gt": [
{ "$size": { "$setIntersection": [ "$tags", userAccess ] }},
0
]},
"$ROOT",
{
"title": "$title",
"label": { "$literal": "Access Denied" }
}
]}
}}
With all said and done, there really is not a lot of purpose in any of this unless you are indeed intending to actually "aggregate" something at the end of the day. Just making the server do exactly the same filtering of document content that you can do in client code is usually not the best use of expensive CPU cycles.
Even in the basic examples given, it makes a lot more sense to just do this in client code, unless you are really getting a major benefit out of removing entries that do not meet your conditions from being transferred over the network. In your case there is no such benefit, so it is better to do this in client code instead.
I'm trying to store a multitude of documents which are doubly linked, i.e. they can have a predecessor and a successor. Since the collection consists of different documents, I'm not sure whether I can create a feasible index on it:
{"_id": "1234", "title": "Document1", "content":"...", "next": "1236"}
{"_id": "1235", "title": "Document2", "content":"...", "next": "1238"}
{"_id": "1236", "title": "Document1a", "content":"...", "prev": "1234"}
{"_id": "1237", "title": "Document2a", "content":"...", "prev": "1235", "next": "1238"}
{"_id": "1238", "title": "Document2b", "content":"...", "prev": "1237", "next": "1239"}
...
Since I'll need the whole 'history' of a document, including previous and next documents, I guess I'll have to perform a multitude of queries depending on the size of the list?
Any suggestions on how to create a performant index? A different structure for storing doubly linked lists would also be interesting.
If you want to optimize reading, you can use arrays to store previous and next documents.
{
  "_id": "1237",
  "title": "Document1",
  "content": "...",
  "next": "1238",
  "prev": "1235",
  "parents": [1000, 1235],
  "children": [1238, 1239]
}
You can then get all the documents whose _id is in either the children or the parents array, as sketched below. This solution is good if you only need the parents or the children of a document. To get the whole list, you can't efficiently use indexes with $or and two $in operators.
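For illustration, a sketch of that lookup (the collection name docs is an assumption):
// Load one document, then fetch its relatives via the stored id arrays.
var doc = db.docs.findOne({"_id": "1237"})
db.docs.find(
  {"$or": [{"_id": {"$in": doc.parents}},      // documents before it in the list
           {"_id": {"$in": doc.children}}]})   // documents after it in the list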
An alternative, and probably a better solution, is to store the entire list for each document, i.e. children and parents in one array:
{
  "_id": "1237",
  "title": "Document1",
  "content": "...",
  "next": "1238",
  "prev": "1235",
  "list_ids": [1000, 1235, 1238, 1239, 1237]
}
That way you can have an index on list_ids and get all the documents with a simple $in query that will be fast.
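A sketch of that query (again assuming a docs collection); since every member stores the full list, one indexed query returns the whole chain:
db.docs.createIndex({"list_ids": 1})             // multikey index on the array
db.docs.find({"list_ids": {"$in": ["1237"]}})    // every document in 1237's list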
The problem with both of these solutions is that you will need to update all related documents whenever you add a new document, so this is probably not a good fit if you're going to have a write-heavy app.