MongoDB running numbers within subsets of documents - mongodb

I have an existing collection of ~12 million documents. I want to update one field in all the documents to hold a running number within each group of documents that share a common "ref" field. This would be a one-time operation. Is there any way I can achieve this in MongoDB 4.4?
Simplified documents example:
{"_id": 1, "ref": "REF_A", "description": "aaaa"}
{"_id": 2, "ref": "REF_A", "description": "bbbb"}
{"_id": 3, "ref": "REF_A", "description": "cccc"}
{"_id": 4, "ref": "REF_B", "description": "dddd"}
{"_id": 5, "ref": "REF_B", "description": "eeee"}
...
Desired modified output:
{"_id": 1, "ref": "REF_A", "description": "aaaa1"}
{"_id": 2, "ref": "REF_A", "description": "bbbb2"}
{"_id": 3, "ref": "REF_A", "description": "cccc3"}
{"_id": 4, "ref": "REF_B", "description": "dddd1"} <- reset count because ref changed
{"_id": 5, "ref": "REF_B", "description": "eeee2"}
...
The running number is concatenated to description field here. As soon as the "ref" changes, the concat number counter should reset and start from 1 again. When sorted by "_id" all the same refs are already together. Order matters.
I've been looking at aggregations to solve this, but it seems I would need a way to refer to previous documents and could not figure it out yet.
The best I could find is this thread:
Add some kind of row number to a mongodb aggregate command / pipeline
But it does not seem to suit my case, where the row number has to reset under a condition.

Query1
sort by ref and description
group by ref and collect the docs for each ref group
map over the docs to append each member's index (starting at 1) to its description
unwind and replace root to restore the original structure
you need to add an $out stage at the end of the pipeline (check how your driver does this) to save the result to a new collection, and then replace the collection you have now ($group can't be used in any update method, even with a pipeline)
(the only ways are $out or $merge, but $merge will be slower)
also set {allowDiskUse: true}
Test code here
aggregate(
[{"$sort": {"ref": 1, "description": 1}},
{"$group": {"_id": "$ref", "docs": {"$push": "$$ROOT"}}},
{"$set":
{"docs":
{"$map":
{"input": {"$range": [0, {"$size": "$docs"}]},
"in":
{"$let":
{"vars": {"doc": {"$arrayElemAt": ["$docs", "$$this"]}},
"in":
{"$mergeObjects":
["$$doc",
{"description":
{"$concat":
["$$doc.description",
{"$toString": {"$add": ["$$this", 1]}}]}}]}}}}}}},
{"$unwind": "$docs"},
{"$replaceRoot": {"newRoot": "$docs"}}])
*MongoDB 5 has $setWindowFields for this kind of problem, but you have MongoDB 4.4, so I think $group is the only option. Give it a try, though with this many documents it may be slow.
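The notes for Query1 mention $out and allowDiskUse without showing them. A minimal sketch of the complete 4.4 pipeline could look like the following. It assumes the numbering should follow _id order, as the question states, and it uses placeholder collection names (items for the source, items_numbered for the output); the helper name buildRunningNumberPipeline is made up for this example.

```javascript
// Builds the MongoDB 4.4 pipeline: number documents within each "ref" group
// and write the result to a separate collection (names are placeholders).
function buildRunningNumberPipeline(outCollection) {
  return [
    // Sort by ref, then _id, so $push collects each group in _id order.
    { $sort: { ref: 1, _id: 1 } },
    { $group: { _id: "$ref", docs: { $push: "$$ROOT" } } },
    {
      $set: {
        docs: {
          $map: {
            // Iterate over array positions 0..n-1 instead of the documents.
            input: { $range: [0, { $size: "$docs" }] },
            as: "i",
            in: {
              $let: {
                vars: { doc: { $arrayElemAt: ["$docs", "$$i"] } },
                in: {
                  $mergeObjects: [
                    "$$doc",
                    {
                      description: {
                        $concat: [
                          "$$doc.description",
                          { $toString: { $add: ["$$i", 1] } }
                        ]
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    },
    { $unwind: "$docs" },
    { $replaceRoot: { newRoot: "$docs" } },
    // Assumption: the result goes to a new collection that is swapped in later.
    { $out: outCollection }
  ];
}

const groupPipeline = buildRunningNumberPipeline("items_numbered");
// In mongosh (not run here):
// db.items.aggregate(groupPipeline, { allowDiskUse: true });
```

Checking the output collection on a small copy of the data before replacing the original is probably worthwhile.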
Query2
requires MongoDB 5 or later, which you don't have now
again, you need $out at the end
Test code here
aggregate(
[{"$setWindowFields":
{"partitionBy": "$ref",
"sortBy": {"description": 1},
"output": {"rank": {"$rank": {}}}}},
{"$set":
{"description": {"$concat": ["$description", {"$toString": "$rank"}]},
"rank": "$$REMOVE"}}])

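One caveat with $rank: documents in the same partition that have equal sort values share the same rank, so duplicate descriptions within a ref would get the same number. A hedged variant, assuming MongoDB 5.0 or later and numbering in _id order, uses $documentNumber instead. The helper name, the temporary field n, and the output collection name are placeholders for this sketch.

```javascript
// Builds a MongoDB 5.0+ pipeline that numbers documents per "ref" partition.
function buildWindowPipeline(outCollection) {
  return [
    {
      $setWindowFields: {
        partitionBy: "$ref",
        sortBy: { _id: 1 },
        // $documentNumber gives 1, 2, 3, ... with no ties inside a partition.
        output: { n: { $documentNumber: {} } }
      }
    },
    {
      $set: {
        description: { $concat: ["$description", { $toString: "$n" }] },
        n: "$$REMOVE" // drop the temporary counter field
      }
    },
    { $out: outCollection }
  ];
}

const windowPipeline = buildWindowPipeline("items_numbered");
// In mongosh (not run here):
// db.items.aggregate(windowPipeline, { allowDiskUse: true });
```
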
Related

MongoDB delete fields with unknown names, based on value

I have something like this:
{
"_id": ...,
"section": {
"a": "101",
"b": "101",
"c": "000",
"d": "101"
}
}
How can I unset all fields with value "101"? I know nothing about keys ("a", "b", "c"), only value.
Query
In general the schema should be known, even if some fields are missing, so having an unknown schema is not a good idea.
An unknown schema causes many problems and leads to slower, more complicated queries, but you can still do it with $objectToArray, $arrayToObject, etc.
Playmongo
aggregate(
[{"$set":
{"section":
{"$arrayToObject":
[{"$filter":
{"input": {"$objectToArray": "$section"},
"cond": {"$ne": ["$$this.v", "101"]}}}]}}}])

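To see what the $objectToArray / $filter / $arrayToObject chain computes, here is the same transformation in plain JavaScript. It is an illustration only, not MongoDB code, and the function name dropFieldsByValue is made up for this sketch.

```javascript
// Mirrors the pipeline: object -> [key, value] pairs -> filter -> object.
function dropFieldsByValue(section, unwanted) {
  return Object.fromEntries(
    Object.entries(section).filter(([, value]) => value !== unwanted)
  );
}

const cleaned = dropFieldsByValue(
  { a: "101", b: "101", c: "000", d: "101" },
  "101"
);
console.log(cleaned); // { c: '000' }
```

The aggregate call itself only returns the modified documents. If the goal is to persist the change, MongoDB 4.2+ accepts a $set stage in an update pipeline, along the lines of updateMany({}, [ <the $set stage above> ]); treat that as something to try on test data rather than as tested code.
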
MongoDB: How do I $sort on top-level doc field instance of created?

I have an aggregate call in MongoDB (v4.2) where I'm doing a $lookup and $unwind of related sub-docs but I'm having problems figuring out how to sort. The sub-doc has a field called created and so does the top-level doc:
Wine.aggregate(
[
{
"$match": {
"user": ObjectId("<userId>")
}
},
{
"$sort": {
"created": -1
}
},
{
"$lookup": {
"from": "wineuniversals", // the collection name
"localField": "wineUniversal", // field from the wines model
"foreignField": "_id", // field how the two collections are linked
"as": "uWineData" // the object property where the universal wine data is stored
}
},
{
"$unwind": {
"path": "$uWineData"
}
}
]
).exec(function(err, wines) {...});
And here is an example of the docs it returns:
{
"_id": "5dbbc408e78d867664213147",
"photoURL": "51972ee99dec8f31cdb6ff8025a0d3d3",
"user": "554f99352ee62248071b4d0f",
"mode": "past",
"wineUniversal": "5dce6038e78d8676642131fd",
"hidden": false,
"deleted": false,
"eventBlindTasting": false,
"quantity": 1,
"groupDescription": "none",
"comment": "Surprisingly tasty. You'd think it's be olonk based on the gimmicky label.",
"scoreTotal": 91,
"scoreOverallImpression": 4.2,
"scoreFinish": 4,
"scoreTaste": 4,
"scoreAroma": 4.3,
"lastUpdated": "2020-05-21T23:06:54.497Z",
"created": "2020-05-21T23:06:54.498Z", // FIRST INSTANCE OF CREATED
"uWineData": {
"_id": "5dce6038e78d8676642131fd",
"scoreCount": 1,
"averageScore": 0,
"userWines": ["5dbbc408e78d867664213147"],
"expertScore2": "",
"expertReviewer2": null,
"expertScore1": "",
"expertReviewer1": null,
"currency": "USD",
"commonPrice": 12,
"additionalDetails": "",
"designation": "",
"category": "Red",
"varietal": "Cabernet Sauvignon",
"vineyard": "",
"appellation": "California",
"subRegion": "",
"region": "California",
"country": "United States",
"wineryUrl": "https://www.thewalkingdeadwine.com",
"winery": "The Walking Dead Wines",
"vintage": "2016",
"deleted": false,
"lastUpdated": "2019-11-15T08:22:32.437Z",
"created": "2019-11-15T08:22:16.579Z", // SECOND INSTANCE OF CREATED IN SUBDOC
"__v": 1
}
}
It looks like the $sort step is keying off the created field in the subdoc instead of the top-level doc's created field.
Is there a way to reference the created in the top-level doc so my $sort step orders the docs by the top level created field?
Ah ha! Turns out that adding preserveNullAndEmptyArrays: true to the $unwind fixed the issue. Some of the top-level docs didn't have a sub-doc available yet (e.g., the most recent docs hadn't yet had an _id linkage added to them), so there was nothing to $unwind and those docs were dropped from the results.
{
"path": "$uWineData",
"preserveNullAndEmptyArrays": true
}
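The effect can be imitated in plain JavaScript. After $lookup, a wine with no matching universal record carries uWineData: [], and a plain $unwind leaves such documents out, which is why the newest wines were missing and the sort looked wrong. The unwind function below is a hypothetical helper written for illustration, not a driver or Mongoose function.

```javascript
// Imitates $unwind on one field, with and without preserveNullAndEmptyArrays.
function unwind(docs, field, preserveNullAndEmptyArrays = false) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    const isEmpty =
      value === undefined ||
      value === null ||
      (Array.isArray(value) && value.length === 0);
    if (isEmpty) {
      if (preserveNullAndEmptyArrays) {
        if (Array.isArray(value)) {
          // An empty array field is omitted from the preserved document.
          const { [field]: _emptyArray, ...rest } = doc;
          out.push(rest);
        } else {
          out.push(doc);
        }
      }
      continue; // without the option, the document is dropped
    }
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) out.push({ ...doc, [field]: item });
  }
  return out;
}

const wines = [
  { _id: 1, created: "2020-05-21", uWineData: [{ winery: "The Walking Dead Wines" }] },
  { _id: 2, created: "2020-06-01", uWineData: [] } // no linked record yet
];
console.log(unwind(wines, "uWineData").map((d) => d._id));       // [ 1 ]
console.log(unwind(wines, "uWineData", true).map((d) => d._id)); // [ 1, 2 ]
```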

MongoDB upsert array document with golang

I have a document like below:
{
"_id": "1.0",
files: [
{"name": "file_1", "size": 1024, "create_ts": 1570862776426},
{"name": "file_2", "size": 2048, "create_ts": 1570862778426}
]
}
And I want to upsert "files" with "file_x":
1. If "file_x" is already in "files", update it. For example, if "file_x" is:
{"name": "file_2", "size": 4096, "create_ts": 1570862779426}
the document after the upsert is:
{
"_id": "1.0",
files: [
{"name": "file_1", "size": 1024, "create_ts": 1570862776426},
{"name": "file_2", "size": 4096, "create_ts": 1570862779426}
]
}
2. If "file_x" is not in "files", insert it. For example, if "file_x" is:
{"name": "file_3", "size": 4096, "create_ts": 1570862779426}
the document after the upsert is:
{
"_id": "1.0",
files: [
{"name": "file_1", "size": 1024, "create_ts": 1570862776426},
{"name": "file_2", "size": 2048, "create_ts": 1570862778426},
{"name": "file_3", "size": 4096, "create_ts": 1570862779426}
]
}
So can I use one function to achieve this?
You will need to do this manually. There's no upsert mechanism for embedded structures inside a document.
First fetch the document and check whether file_x is in the files list: if it is, update that entry, and if not, insert it. Then save the document back.
You should make sure that at any given time, only one program / goroutine is trying to do this, otherwise you will run into race conditions and file_x might get inserted multiple times.
There is no single update operation in the MongoDB update language that will do what you want. You can get close with $addToSet, which adds an item to a set if the item is not already there, but it will not update an item based on a match of a subset of its fields. Your best option is to read the document, update it in memory, and write it back.
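A commonly used pattern that avoids loading the whole document is a pair of targeted updates: first try to overwrite the matching element with the positional $ operator, and only if nothing matched, $push the new element guarded by a $ne filter. This is still two operations rather than one, and it is a sketch, not the answerers' method. The helper name buildFileUpsertOps and the collection name docs are assumptions; the filter and update documents should carry over to the Go driver as bson.M values.

```javascript
// Builds the filter/update pairs for a two-step "upsert" of one array element.
function buildFileUpsertOps(docId, file) {
  return {
    // Step 1: replace the element whose name matches, if there is one.
    updateExisting: {
      filter: { _id: docId, "files.name": file.name },
      update: { $set: { "files.$": file } }
    },
    // Step 2 (only when step 1 matched nothing): append the file.
    // The $ne guard keeps this step from adding a duplicate name if
    // another writer inserted the same file in the meantime.
    pushIfMissing: {
      filter: { _id: docId, "files.name": { $ne: file.name } },
      update: { $push: { files: file } }
    }
  };
}

const ops = buildFileUpsertOps("1.0", {
  name: "file_2",
  size: 4096,
  create_ts: 1570862779426
});
// In mongosh (not run here):
// const res = db.docs.updateOne(ops.updateExisting.filter, ops.updateExisting.update);
// if (res.matchedCount === 0) {
//   db.docs.updateOne(ops.pushIfMissing.filter, ops.pushIfMissing.update);
// }
```

If step 2 also matches nothing, another writer added the file between the two steps, and step 1 can be retried.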

How to update a property of a sub-document in an embedded array?

Given the following document in the database, I want to update the pincode in the address array.
I'm using the $ positional operator in MongoDB, but it does not find the document embedded multiple levels deep.
"_id": ObjectId("58b91ccf3dc9021191b256ff"),
"phone": 9899565656,
"Email": "sumit#mail.com",
"Organization": "xyz",
"Name": "sumit",
"address": [{
"city": "chennai",
"pincode": 91,
"_id": ObjectId("58b91db48682ab11ede79b28"),
"choice": [{
"_id": ObjectId("58b91fa6901a74124fd70d89")
}]
}]
Using this query to update.
db.presenters.update({"Email":"sumit#mail.com","address.city":"chennai"},{$set:{"address.$.pincode.": 95 }})
You seem to have an incorrect field name in the update: there is an extra dot at the end of "address.$.pincode.". Try the following:
db.presenters.update({"Email":"sumit#mail.com","address.city":"chennai"},
{$set:{"address.$.pincode": 95 }})
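Note that the positional $ operator updates only the first array element that matches the query. If several addresses could match, a filtered positional update ($[<identifier>] with arrayFilters, available since MongoDB 3.6) is one option. The sketch below only builds the arguments; the identifier addr and the helper name buildPincodeUpdate are arbitrary choices for this example.

```javascript
// Builds arguments for an update that touches every address in the given city.
function buildPincodeUpdate(email, city, pincode) {
  return {
    filter: { Email: email, "address.city": city },
    // $[addr] refers to each array element accepted by arrayFilters below.
    update: { $set: { "address.$[addr].pincode": pincode } },
    options: { arrayFilters: [{ "addr.city": city }] }
  };
}

const pincodeUpdate = buildPincodeUpdate("sumit#mail.com", "chennai", 95);
// In mongosh (not run here):
// db.presenters.updateMany(pincodeUpdate.filter, pincodeUpdate.update, pincodeUpdate.options);
```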

How to create a (double) linked list structure in MongoDB?

I'm trying to store a multitude of documents which are doubly linked, i.e. they can have a predecessor and a successor. Since the collection consists of different documents, I'm not sure if I can create a feasible index on it:
{"_id": "1234", "title": "Document1", "content":"...", "next": "1236"}
{"_id": "1235", "title": "Document2", "content":"...", "next": "1238"}
{"_id": "1236", "title": "Document1a", "content":"...", "prev": "1234"}
{"_id": "1237", "title": "Document2a", "content":"...", "prev": "1235", "next": "1238"}
{"_id": "1238", "title": "Document2b", "content":"...", "prev": "1237", "next": "1239"}
...
Since I'll need the whole 'history' of a document, including prev and next documents, I guess I'll have to perform a multitude of queries depending on the size of the list?
Any suggestions on how to create a performant index? A different structure for storing double linked lists would also be interesting.
If you want to optimize reading you can use arrays to store previous and next documents.
{
"_id": "1237",
"title": "Document1",
"content":"...",
"next": "1238",
"prev": "1235",
"parents": ["1000", "1235"],
"children": ["1238", "1239"]
}
You can then get all the documents whose _id is in either the children or the parents array. This solution is good if you only need the parents or the children of a document. To get the whole list you would have to combine both with $or and two $in operators, which can't use indexes efficiently.
An alternative, and probably better, solution is to store the entire list for each document, i.e. children and parents in one array:
{
"_id": "1237",
"title": "Document1",
"content":"...",
"next": "1238",
"prev": "1235",
"list_ids": ["1000", "1235", "1238", "1239", "1237"]
}
That way you can have an index on list_ids and get all the documents with a simple $in query that will be fast.
The problem with both solutions is that you need to update all related documents whenever you add a new document, so this is probably not a good choice for a write-heavy app.
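With the list_ids layout, one query returns every member of a list, but not in any guaranteed list order. A minimal sketch of restoring the order on the client by following the next pointers is shown below. It assumes each fetched document has _id plus optional prev and next fields, as in the question; the helper name orderLinkedDocs is made up for this example.

```javascript
// Orders already-fetched documents by walking "next" pointers from the head.
function orderLinkedDocs(docs) {
  const byId = new Map(docs.map((d) => [d._id, d]));
  // The head has no prev, or its prev is not part of the fetched set.
  let current = docs.find((d) => d.prev === undefined || !byId.has(d.prev));
  const ordered = [];
  const seen = new Set(); // guards against cycles in bad data
  while (current && !seen.has(current._id)) {
    ordered.push(current);
    seen.add(current._id);
    current = byId.get(current.next);
  }
  return ordered;
}

// In mongosh the documents might be fetched with (not run here):
// db.docs.find({ list_ids: someId }).toArray()
const fetched = [
  { _id: "c", prev: "b" },
  { _id: "a", next: "b" },
  { _id: "b", prev: "a", next: "c" }
];
console.log(orderLinkedDocs(fetched).map((d) => d._id)); // [ 'a', 'b', 'c' ]
```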