I'm trying to store a multitude of documents which are doubly linked, i.e. they can have a predecessor and a successor. Since the collection consists of different documents, I'm not sure if I can create a feasible index on it:
{"_id": "1234", "title": "Document1", "content":"...", "next": "1236"}
{"_id": "1235", "title": "Document2", "content":"...", "next": "1238"}
{"_id": "1236", "title": "Document1a", "content":"...", "prev": "1234"}
{"_id": "1237", "title": "Document2a", "content":"...", "prev": "1235", "next": "1238"}
{"_id": "1238", "title": "Document2b", "content":"...", "prev": "1237", "next": "1239"}
...
Since I'll need the whole 'history' of a document, including the prev and next documents, I guess I'll have to perform a multitude of queries depending on the size of the list?
Any suggestions on how to create a performant index? A different structure for storing doubly linked lists would also be interesting.
If you want to optimize reading you can use arrays to store previous and next documents.
{
"_id": "1237",
"title": "Document1",
"content":"...",
"next": "1238",
"prev": "1235",
"parents" : ["1000", "1235"],
"children" : ["1238", "1239"]
}
You can then get all the documents whose _id is in either the children or the parents array. This solution is good if you only need the parents or the children of a document. To get the whole list you would need $or with two $in operators, which can't use indexes efficiently.
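As a rough sketch of the queries this describes (an interpretation, since the answer shows no query): matching an array field against a single value selects the documents whose array contains that value, so `{ parents: id }` behaves like `{ parents: { $in: [id] } }`. The helper names and the `documents` collection name are hypothetical.

```javascript
// Hypothetical helpers; field names follow the example document above.

// Descendants of `id`: documents that list `id` in their "parents" array.
const descendantsFilter = (id) => ({ parents: id });

// Ancestors of `id`: documents that list `id` in their "children" array.
const ancestorsFilter = (id) => ({ children: id });

// The whole chain needs both conditions combined with $or.
const wholeChainFilter = (id) => ({
  $or: [descendantsFilter(id), ancestorsFilter(id)],
});

// In mongosh, assuming a collection named "documents":
//   db.documents.find(descendantsFilter("1237"))
console.log(JSON.stringify(wholeChainFilter("1237")));
```

Note that the ids stored in the arrays must have the same type as _id (strings here), otherwise nothing matches.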
An alternative, and probably better, solution is to store the entire list for each document, i.e. children and parents in one array:
{
"_id": "1237",
"title": "Document1",
"content":"...",
"next": "1238",
"prev": "1235",
"list_ids" : ["1000", "1235", "1238", "1239", "1237"]
}
That way you can have an index on list_ids and get all the documents with a simple $in query that will be fast.
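A minimal sketch of the read path, assuming a collection named `documents` with an index created by `db.documents.createIndex({ list_ids: 1 })`. The query returns the list members in no particular order, so the hypothetical `orderChain` helper below walks the `next` pointers on the client to restore the order. The sample array stands in for a real query result.

```javascript
// Filter that an index on the "list_ids" array can serve.
const chainFilter = (id) => ({ list_ids: { $in: [id] } });

// Restore list order from an unordered result by following "next" pointers.
function orderChain(docs) {
  const byId = new Map(docs.map((d) => [d._id, d]));
  // The head is the document whose "prev" is missing or not in the result.
  let current = docs.find((d) => d.prev === undefined || !byId.has(d.prev));
  const ordered = [];
  const seen = new Set(); // guards against cycles in broken data
  while (current && !seen.has(current._id)) {
    ordered.push(current);
    seen.add(current._id);
    current = byId.get(current.next);
  }
  return ordered;
}

// Stand-in for db.documents.find(chainFilter("1237")).toArray()
const fetched = [
  { _id: "1238", prev: "1237", next: "1239" },
  { _id: "1235", next: "1237" },
  { _id: "1239", prev: "1238" },
  { _id: "1237", prev: "1235", next: "1238" },
];
console.log(orderChain(fetched).map((d) => d._id).join(" -> "));
```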
The problem with both solutions is that you will need to update all related documents whenever you add a new document. So this is probably not a good solution if you're going to have a write-heavy app.
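To make the write cost concrete, here is a sketch of the operations needed to append one document to a list stored with `list_ids`. The helper name, the `documents` collection and the sample values are assumptions; the operation objects follow the shape accepted by `bulkWrite`.

```javascript
// Hypothetical helper: operations for appending `newDoc` after the current
// tail `tailDoc` when every member carries the full "list_ids" array.
function appendOperations(tailDoc, newDoc) {
  const listIds = [...tailDoc.list_ids, newDoc._id];
  return [
    // 1. Insert the new document with the complete list and a back pointer.
    { insertOne: { document: { ...newDoc, prev: tailDoc._id, list_ids: listIds } } },
    // 2. Point the old tail at the new document.
    { updateOne: { filter: { _id: tailDoc._id }, update: { $set: { next: newDoc._id } } } },
    // 3. Touch every member of the list: this is the expensive part.
    {
      updateMany: {
        filter: { list_ids: tailDoc._id },
        update: { $addToSet: { list_ids: newDoc._id } },
      },
    },
  ];
}

const operations = appendOperations(
  { _id: "1239", list_ids: ["1235", "1237", "1238", "1239"] },
  { _id: "1240", title: "Document2c", content: "..." }
);
// In mongosh: db.documents.bulkWrite(operations)
console.log(operations.length);
```

The third operation grows with the length of the list, which is why long lists with frequent inserts suit this layout poorly.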
I have an existing collection of ~12 million documents. I want to update one field in all the documents to have a running number within all groups of documents that share a common "ref" field. This would be a one-time operation. Is there any way I can achieve this in MongoDB 4.4?
Simplified documents example:
{"_id": 1, "ref": "REF_A", "description": "aaaa"}
{"_id": 2, "ref": "REF_A", "description": "bbbb"}
{"_id": 3, "ref": "REF_A", "description": "cccc"}
{"_id": 4, "ref": "REF_B", "description": "dddd"}
{"_id": 5, "ref": "REF_B", "description": "eeee"}
...
Desired modified output:
{"_id": 1, "ref": "REF_A", "description": "aaaa1"}
{"_id": 2, "ref": "REF_A", "description": "bbbb2"}
{"_id": 3, "ref": "REF_A", "description": "cccc3"}
{"_id": 4, "ref": "REF_B", "description": "dddd1"} <- reset count because ref changed
{"_id": 5, "ref": "REF_B", "description": "eeee2"}
...
The running number is concatenated to description field here. As soon as the "ref" changes, the concat number counter should reset and start from 1 again. When sorted by "_id" all the same refs are already together. Order matters.
I've been looking at aggregations to solve this, but it seems I would need a way to refer to previous documents and could not figure it out yet.
The best I could find is this thread:
Add some kind of row number to a mongodb aggregate command / pipeline
But it does not seem to suit my case, where the row number is reset under a condition.
Query1
sort by ref and description
group by ref and collect the docs of each ref group
map over the collected docs to append each member's index to its description
unwind and replace root to restore the original document structure
You need to add an $out stage at the end of the pipeline (check your driver for how to do this) to save the result to a new collection, and then replace the collection you have now. (We can't use $group in any update method, even one that takes a pipeline.)
(The only ways are $out or $merge, but $merge will be slower.)
Also set {allowDiskUse: true}.
aggregate(
[{"$sort": {"ref": 1, "description": 1}},
{"$group": {"_id": "$ref", "docs": {"$push": "$$ROOT"}}},
{"$set":
{"docs":
{"$map":
{"input": {"$range": [0, {"$size": "$docs"}]},
"in":
{"$let":
{"vars": {"doc": {"$arrayElemAt": ["$docs", "$$this"]}},
"in":
{"$mergeObjects":
["$$doc",
{"description":
{"$concat":
["$$doc.description",
{"$toString": {"$add": ["$$this", 1]}}]}}]}}}}}}},
{"$unwind": "$docs"},
{"$replaceRoot": {"newRoot": "$docs"}}])
*In MongoDB 5 we have $setWindowFields for this, but you have MongoDB 4.4, so I think $group is the only option. Give it a try, but bear in mind that you have many documents.
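A sketch of how the $out stage and the allowDiskUse option could be attached to Query1. The collection names `docs` and `docs_numbered` and the helper `withOut` are hypothetical. One caveat worth checking on real data: $push collects every document of a ref into a single array, and that grouped document is subject to the 16 MB BSON limit, so very large groups may fail.

```javascript
// Appends an $out stage without modifying the original stage list.
const withOut = (stages, targetCollection) => [
  ...stages,
  { $out: targetCollection },
];

const stages = [
  { $sort: { ref: 1, description: 1 } },
  // ...the remaining Query1 stages go here, unchanged...
];
const pipeline = withOut(stages, "docs_numbered");
const options = { allowDiskUse: true };

// In mongosh, assuming the source collection is called "docs":
//   db.docs.aggregate(pipeline, options)
//   db.docs_numbered.renameCollection("docs", true) // true = drop the old "docs"
console.log(JSON.stringify(pipeline[pipeline.length - 1]));
```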
Query2
requires MongoDB >= 5, which you don't have now
again you need $out
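For comparison with the Query2 pipeline that follows, here is a hedged variant that ends in $out and numbers by _id order, as the question asks. It uses $documentNumber instead of $rank because $rank gives tied documents the same number, while $documentNumber always increments. The target collection name and the field name `n` are assumptions. The plain JavaScript function `numberWithinRef` is only a reference for checking results on a small sample.

```javascript
const pipeline = [
  {
    $setWindowFields: {
      partitionBy: "$ref",
      sortBy: { _id: 1 },
      output: { n: { $documentNumber: {} } },
    },
  },
  {
    $set: {
      description: { $concat: ["$description", { $toString: "$n" }] },
      n: "$$REMOVE", // drop the helper field again
    },
  },
  { $out: "docs_numbered" }, // hypothetical target collection
];

// Plain JavaScript reference of the intended result, for spot checks only.
function numberWithinRef(docs) {
  const counters = new Map();
  return [...docs]
    .sort((a, b) => a._id - b._id)
    .map((d) => {
      const n = (counters.get(d.ref) || 0) + 1;
      counters.set(d.ref, n);
      return { ...d, description: d.description + n };
    });
}

const sample = [
  { _id: 1, ref: "REF_A", description: "aaaa" },
  { _id: 2, ref: "REF_A", description: "bbbb" },
  { _id: 3, ref: "REF_A", description: "cccc" },
  { _id: 4, ref: "REF_B", description: "dddd" },
  { _id: 5, ref: "REF_B", description: "eeee" },
];
console.log(numberWithinRef(sample).map((d) => d.description).join(","));
```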
aggregate(
[{"$setWindowFields":
{"partitionBy": "$ref",
"sortBy": {"description": 1},
"output": {"rank": {"$rank": {}}}}},
{"$set":
{"description": {"$concat": ["$description", {"$toString": "$rank"}]},
"rank": "$$REMOVE"}}])
Let's assume we have the following collections:
Users
{
"id": MongoId,
"username": "jsloth",
"first_name": "John",
"last_name": "Sloth",
"display_name": "John Sloth"
}
Places
{
"id": MongoId,
"name": "Conference Room",
"description": "Some longer description of this place"
}
Meetings
{
"id": MongoId,
"name": "Very important meeting",
"place": <?>,
"timestamp": "1506493396",
"created_by": <?>
}
Later on, we want to return (e.g. from a REST web service) a list of upcoming events like this:
[
{
"id": MongoId(Meetings),
"name": "Very important meeting",
"created_by": {
"id": MongoId(Users),
"display_name": "John Sloth",
},
"place": {
"id": MongoId(Places),
"name": "Conference Room",
}
},
...
]
It's important to return the basic information that needs to be displayed on the main page in the web UI (so no additional calls are needed to render the table). That's why each entry contains the display_name of the user who created it and the name of the place. I think that's a pretty common scenario.
Now my question is: how should I store this information in the db (the question mark values in the Meeting document)? I see 2 options:
1) Store references to other collections:
place: MongoId(Places)
(+) data is always consistent
(-) additional calls to db have to be made in order to construct the response
2) Denormalize data:
"place": {
"id": MongoId(Places),
"name": "Conference room",
}
(+) no need for additional calls (response can be constructed based on one document)
(-) data must be updated each time related documents are modified
What is the proper way of dealing with such a scenario?
If I use option 1), how should I query the other documents? Querying each related document separately seems like overkill. How about getting the last 20 meetings, aggregating the list of related document ids and then performing a query like db.users.find({_id: { $in: <id list> }})?
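The batching idea for option 1) can be sketched as follows: collect the distinct ids from the page of meetings, fetch each referenced collection once with $in, and stitch the results together in memory. The helper names are hypothetical, and the arrays stand in for real query results, with plain strings in place of ObjectIds.

```javascript
// Distinct ids referenced by a page of meetings in the given field.
// Values are deduplicated by their string form but returned unchanged.
function referencedIds(meetings, field) {
  const unique = new Map(meetings.map((m) => [String(m[field]), m[field]]));
  return [...unique.values()];
}

// Replace each reference with { id, ...selected fields } of the fetched document.
function embed(meetings, field, docs, keep) {
  const byId = new Map(docs.map((d) => [String(d._id), d]));
  return meetings.map((m) => {
    const ref = byId.get(String(m[field]));
    if (!ref) return { ...m, [field]: null }; // dangling reference
    const picked = { id: ref._id };
    for (const k of keep) picked[k] = ref[k];
    return { ...m, [field]: picked };
  });
}

// Stand-ins for query results.
const meetings = [
  { _id: "m1", name: "Very important meeting", place: "p1", created_by: "u1" },
  { _id: "m2", name: "Weekly sync", place: "p1", created_by: "u1" },
];
// One query per referenced collection, not one per meeting:
//   db.users.find({ _id: { $in: userIds } })
const userIds = referencedIds(meetings, "created_by");
const users = [{ _id: "u1", username: "jsloth", display_name: "John Sloth" }];

const page = embed(meetings, "created_by", users, ["display_name"]);
console.log(JSON.stringify(page[0].created_by));
```

With this shape a page of 20 meetings costs three queries in total (meetings, users, places) regardless of how many meetings share a user or place.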
If I go for option 2), how should I keep the data in sync?
Thanks in advance for any advice!
You can keep the DB model you already have and still do only a single query, as MongoDB introduced the $lookup aggregation stage in version 3.2. It is similar to a join in an RDBMS.
$lookup
Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing. The $lookup stage does an equality match between a field from the input documents with a field from the documents of the “joined” collection.
So instead of storing a reference to other collections, just store the document ID.
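A sketch of what such a pipeline might look like for the meetings example. It assumes collections named `meetings`, `users` and `places`, and that `created_by` and `place` hold the `_id` of the referenced document; none of these names are confirmed by the question beyond `db.users`.

```javascript
// Builds a pipeline returning meetings with the user and place joined in.
const meetingsPipeline = (limit) => [
  { $sort: { timestamp: 1 } },
  { $limit: limit },
  // Join the creator; $lookup always produces an array, hence the $unwind.
  { $lookup: { from: "users", localField: "created_by", foreignField: "_id", as: "created_by" } },
  { $unwind: "$created_by" },
  // Join the place the same way.
  { $lookup: { from: "places", localField: "place", foreignField: "_id", as: "place" } },
  { $unwind: "$place" },
  // Keep only what the overview table needs.
  {
    $project: {
      name: 1,
      "created_by._id": 1,
      "created_by.display_name": 1,
      "place._id": 1,
      "place.name": 1,
    },
  },
];

// In mongosh: db.meetings.aggregate(meetingsPipeline(20))
console.log(meetingsPipeline(20).length);
```

Note that a plain $unwind drops meetings whose user or place no longer exists; `{ $unwind: { path: "$place", preserveNullAndEmptyArrays: true } }` keeps them.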
I am trying to figure out a specific MongoDB query, so far unsuccessfully.
Documents in my collection look something like this (they contain more attributes, which are irrelevant for this query):
[{
"_id": ObjectId("596e01b6f4f7cf137cb3d096"),
"code": "A",
"name": "name1",
"sys": {
"cts": ISODate("2017-07-18T12:40:22.772Z"),
}
},
{
"_id": ObjectId("596e01b6f4f7cf137cb3d097"),
"code": "A",
"name": "name2",
"sys": {
"cts": ISODate("2017-07-19T12:40:22.772Z"),
}
},
{
"_id": ObjectId("596e01b6f4f7cf137cb3d098"),
"code": "B",
"name": "name3",
"sys": {
"cts": ISODate("2017-07-16T12:40:22.772Z"),
}
},
{
"_id": ObjectId("596e01b6f4f7cf137cb3d099"),
"code": "B",
"name": "name3",
"sys": {
"cts": ISODate("2017-07-10T12:40:22.772Z"),
}
}]
What I need is to get the current versions of documents, filtered by code or name, or both. Current version means that from two (or more) documents with the same code, I want to pick the one which has the latest sys.cts date value.
So the result of this query executed with the filter name="name3" would be the 3rd document from the previous list. The result of the query without any filter would be the 2nd and 3rd documents.
I have an idea how to construct this query with a changed data model, but I was hoping someone could point me the right way without doing so.
Thank you
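One possible approach without changing the data model, offered as a sketch and not as a confirmed answer: sort by sys.cts descending, group by code and keep the first document of each group. It assumes the filter should apply to the current versions only, so the $match comes after the grouping. The function names are hypothetical, and `currentVersions` is a plain JavaScript equivalent for checking the logic on the sample data.

```javascript
// Aggregation pipeline: latest document per code, then the optional filter.
function currentVersionsPipeline(filter = {}) {
  const stages = [
    { $sort: { "sys.cts": -1 } },
    { $group: { _id: "$code", latest: { $first: "$$ROOT" } } },
    { $replaceRoot: { newRoot: "$latest" } },
  ];
  if (Object.keys(filter).length > 0) stages.push({ $match: filter });
  return stages;
}

// Plain JavaScript equivalent of the same logic.
function currentVersions(docs, filter = {}) {
  const latest = new Map();
  for (const d of docs) {
    const seen = latest.get(d.code);
    if (!seen || d.sys.cts > seen.sys.cts) latest.set(d.code, d);
  }
  return [...latest.values()].filter((d) =>
    Object.entries(filter).every(([key, value]) => d[key] === value)
  );
}

const docs = [
  { _id: "596e01b6f4f7cf137cb3d096", code: "A", name: "name1", sys: { cts: new Date("2017-07-18T12:40:22.772Z") } },
  { _id: "596e01b6f4f7cf137cb3d097", code: "A", name: "name2", sys: { cts: new Date("2017-07-19T12:40:22.772Z") } },
  { _id: "596e01b6f4f7cf137cb3d098", code: "B", name: "name3", sys: { cts: new Date("2017-07-16T12:40:22.772Z") } },
  { _id: "596e01b6f4f7cf137cb3d099", code: "B", name: "name3", sys: { cts: new Date("2017-07-10T12:40:22.772Z") } },
];
// In mongosh: db.collection.aggregate(currentVersionsPipeline({ name: "name3" }))
console.log(currentVersions(docs, { name: "name3" }).map((d) => d._id).join(","));
```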
Given the following document in the database, I want to update the pincode in the address array.
I'm using the $ positional operator in MongoDB, but it does not find the document embedded multiple levels deep.
"_id": ObjectId("58b91ccf3dc9021191b256ff"),
"phone": 9899565656,
"Email": "sumit#mail.com",
"Organization": "xyz",
"Name": "sumit",
"address": [{
"city": "chennai",
"pincode": 91,
"_id": ObjectId("58b91db48682ab11ede79b28"),
"choice": [{
"_id": ObjectId("58b91fa6901a74124fd70d89")
}]
}]
I'm using this query to update:
db.presenters.update({"Email":"sumit#mail.com","address.city":"chennai"},{$set:{"address.$.pincode.": 95 }})
You seem to have an incorrect field name in the update: there is an extra dot at the end. Try the following:
db.presenters.update({"Email":"sumit#mail.com","address.city":"chennai"},
{$set:{"address.$.pincode": 95 }})
In my scenario, there are authors in a collection; each author has messages, and each message of an author can have events. Each actor is allowed to perform each kind of action only once.
db.people.ensureIndex({"messages.messageEvents.eventName": 1, "messages.messageEvents.actorId": 1}, {unique: true});
I added the index but it has no effect. As you can see below, my document has three elements with "eventName":"Vote" and "actorId":"1234", which should violate my constraint.
How can I ensure unique items in the messageEvents array based on the eventName and actorId fields?
Actually, I need to update the existing item instead of rejecting the event, without a second search and update.
{
"_id": "1234567",
"authorPoint": 0,
"messages": [
{
"messageId": "112",
"messageType": "Q",
"messagePoint": 0,
"messageEvents": [
{
"eventName": "Add",
"actorId": "1234",
"detail": ""
},
{
"eventName": "Vote",
"actorId": "1234",
"detail": "up"
},
{
"eventName": "Vote",
"actorId": "1234",
"detail": "down"
},
{
"eventName": "Vote",
"actorId": "1234",
"detail": "cork"
}
]
}
]
}
Mustafa, unique constraints are not enforced within a single array, although they're enforced among documents in a collection. This is a known bug that won't be fixed for a while:
https://jira.mongodb.org/browse/SERVER-1068
There's a workaround, though. Keep your unique index in place, and:
1) Ensure your application does not insert new documents with duplicate values in the array. You can check for uniqueness in your application code before inserting.
2) When updating existing documents use $addToSet instead of $push.
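One caveat with $addToSet is that it compares whole array elements, so two events that differ only in detail still count as different. As a sketch of an alternative for the "update the existing item" case, the hypothetical helper below builds two updates to try in order: the first overwrites the detail of an existing (eventName, actorId) entry using arrayFilters (available from MongoDB 3.6), and the second pushes the event only if no such entry exists. This still takes up to two round trips, so it is an assumption-laden workaround, not a true single-step upsert.

```javascript
// Hypothetical helper: returns the two updates to try, in order.
function upsertEventUpdates(authorId, messageId, event) {
  const key = { eventName: event.eventName, actorId: event.actorId };
  return {
    // Step 1: overwrite the detail of an existing (eventName, actorId) entry.
    updateExisting: {
      filter: {
        _id: authorId,
        messages: { $elemMatch: { messageId, messageEvents: { $elemMatch: key } } },
      },
      update: { $set: { "messages.$[m].messageEvents.$[e].detail": event.detail } },
      options: {
        arrayFilters: [
          { "m.messageId": messageId },
          { "e.eventName": event.eventName, "e.actorId": event.actorId },
        ],
      },
    },
    // Step 2: run only if step 1 matched nothing. The filter makes the push a
    // no-op when an entry with the same key already exists.
    pushIfMissing: {
      filter: {
        _id: authorId,
        messages: {
          $elemMatch: { messageId, messageEvents: { $not: { $elemMatch: key } } },
        },
      },
      update: { $push: { "messages.$.messageEvents": event } },
    },
  };
}

const updates = upsertEventUpdates("1234567", "112", {
  eventName: "Vote",
  actorId: "1234",
  detail: "down",
});
// In mongosh:
//   const r = db.people.updateOne(updates.updateExisting.filter,
//     updates.updateExisting.update, updates.updateExisting.options)
//   if (r.matchedCount === 0)
//     db.people.updateOne(updates.pushIfMissing.filter, updates.pushIfMissing.update)
console.log(JSON.stringify(updates.updateExisting.update));
```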