MongoDb updating old documents - mongodb

I am new to MongoDB and to the NoSQL way of storing information. In SQL Server, adding or renaming a column in a table affects all of the records, so I can assume that when I pull a record, that column will be there; it may or may not have a value depending on how it was set up.
But how would you handle a situation in MongoDB where you have an outdated document and you want to pull it, and now your code is looking for a property that did not exist or was just recently renamed?

Here are a few examples of different techniques.
Assuming we have a collection pets with documents like:
{
kind: "cat",
age: 2
}
and at some point we added a field "nick", so new documents look like:
{
kind: "cat",
age: 2,
nick: "Tom"
}
Your app requires the "nick" field to list pets.
Use defaults
Update old documents with default values (you may know it as 'ALTER TABLE' magic in SQL terms):
db.pets.updateMany({nick: {$exists: false}}, {$set: {nick: "NONAME"}});
If you need to support both versions, you have to handle it at runtime.
On application side:
db.pets.find({}).forEach(pet => print(pet.nick || "NONAME"));
On db side:
db.pets.aggregate([
{$project: {
kind: 1,
age: 1,
nick: { $ifNull: [ "$nick", "NONAME" ] }
}}
])
Ignore invalid documents
Remove them:
db.pets.remove({nick: {$exists: false}})
If you need to support both versions, you have to filter them out at runtime:
db.pets.find({
kind: {$exists: true},
age: {$exists: true},
nick: {$exists: true}
});
You can make it more defensive by specifying type:
db.pets.find({
kind: {$type: "string"},
age: {$type: "int"},
nick: {$type: "string"}
});
Stop program execution
db.pets.find({}).forEach(pet => { if (!pet.nick) { throw new Error("Pet " + pet._id + " has no nick!"); } });

Related

updating a referenced collection in mongo

If a model schema (created with Mongoose), say collection A, has a reference to another collection, say collection B, how do you upsert into this referenced collection so that the documents respect the referenced relationship?
e.g.
A = [..., {
name: "a",
b_id: "EXISTENT_ID"
}]
B = [..., {
_id: "EXISTENT_ID",
"subject": "science",
"age": 23
}]
I tried to bulk update like this:
A.bulkWrite([{
updateOne: {
filter: {_id: "EXISTENT_ID"},
update: {$set: {"A.b_id": {subject: "maths", "age": 22}}},
upsert: true
}
}])
and I get a write error that says: "Updating the path 'a.b_id' would create a conflict at 'b_id'". I was expecting the associated reference to be updated, since the schema of A is defined as:
Schema({
name: String,
b_id: {
type: mongoose.Types.ObjectId,
ref: 'B',
required: true,
}
})
The reason I have to bulkWrite A is that the record is to be created if it doesn't exist and the reference linked.
For now I'm using JavaScript to do things manually, making multiple round trips to the database, but I'd like to use queries if possible. Is something I'm doing currently wrong, or is there a mechanism to do this sort of referenced update? I'd appreciate a pointer on how to proceed. Thanks in advance.
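Since a single MongoDB update statement can only write to one collection, the usual pattern is two writes: upsert the referenced document in B first, then upsert A pointing at it. A minimal sketch, assuming Mongoose models A and B as defined in the question (the filter on name is an illustrative assumption, not from the thread):

```javascript
// Sketch: a referenced "upsert" needs one write per collection,
// because no single MongoDB update can span two collections.
const bId = "EXISTENT_ID"; // known id of the B document

// 1. Upsert the referenced document in B.
await B.updateOne(
  { _id: bId },
  { $set: { subject: "maths", age: 22 } },
  { upsert: true }
);

// 2. Upsert the A document and link its b_id to the B document.
await A.updateOne(
  { name: "a" },
  { $set: { b_id: bId } },
  { upsert: true }
);
```

This is still two round trips, but each one is an idempotent upsert, so there is no read-modify-write race between them.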

How to build a MongoDB query that combines two fields temporarily?

I have a schema with one field named ownerId and an array field named participantsIds. In the frontend, users can select participants, and I use these ids to filter documents by querying participantsIds with the $all operator and the list of ids from the frontend. This works perfectly, except that participantsIds in the document doesn't include the ownerId. I thought about using aggregate to add a new field consisting of a list like [participantsIds, ownerId], querying against this new field with $all, and afterwards deleting the field again, since it isn't needed in the frontend.
What would such a query look like, or is there a better way to achieve this behavior? I'm really lost right now, since I've been trying to implement this with mongo_dart for the last 3 hours.
This is what the schema looks like:
{
_id: ObjectId(),
title: 'Title of the Event',
startDate: '2020-09-09T00:00:00.000',
endDate: '2020-09-09T00:00:00.000',
startHour: 1,
durationHours: 1,
ownerId: '5f57ff55202b0e00065fbd10',
participantsIds: ['5f57ff55202b0e00065fbd14', '5f57ff55202b0e00065fbd15', '5f57ff55202b0e00065fbd13'],
classesIds: [],
categoriesIds: [],
roomsIds: [],
creationTime: '2020-09-10T16:42:14.966',
description: 'Some Desc'
}
Tl;dr: I want to query documents with the $all operator on the participantsIds field, but the ownerId should be included in this query.
What I want is instead of querying against:
participantsIds: ['5f57ff55202b0e00065fbd14', '5f57ff55202b0e00065fbd15', '5f57ff55202b0e00065fbd13']
I want to query against:
participantsIds: ['5f57ff55202b0e00065fbd14', '5f57ff55202b0e00065fbd15', '5f57ff55202b0e00065fbd13', '5f57ff55202b0e00065fbd10']
Having fun here. By the way, it's better to use Joe's answer if you are running the query frequently, or even better an "all" field set on insertion.
Additional note: use projection at the start/end to get only what you need.
https://mongoplayground.net/p/UP_-IUGenGp
db.collection.aggregate([
{
"$addFields": {
"all": {
$setUnion: [
"$participantsIds",
[
"$ownerId"
]
]
}
}
},
{
$match: {
all: {
$all: [
"5f57ff55202b0e00065fbd14",
"5f57ff55202b0e00065fbd15",
"5f57ff55202b0e00065fbd13",
"5f57ff55202b0e00065fbd10"
]
}
}
}
])
Didn't fully understand what you want to do but maybe this helps:
db.collection.find({
ownerId: "5f57ff55202b0e00065fbd10",
participantsIds: {
$all: ['5f57ff55202b0e00065fbd14',
'5f57ff55202b0e00065fbd15',
'5f57ff55202b0e00065fbd13']
}
})
You could use the pipeline form of update to either add the owner to the participant list or add a new consolidated field:
db.collection.update({},[{$set:{
allParticipantsIds: {$setUnion: [
"$participantsIds",
["$ownerId"]
]}
}}])
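Once the consolidated field is materialized by the update above, the runtime query becomes a plain find against it; the index creation below is an assumption, not part of the answer:

```javascript
// One-time index on the materialized field (assumed name from above),
// so the $all filter can use it instead of a collection scan.
db.collection.createIndex({ allParticipantsIds: 1 });

// The runtime query no longer needs an aggregation pipeline.
db.collection.find({
  allParticipantsIds: {
    $all: [
      "5f57ff55202b0e00065fbd14",
      "5f57ff55202b0e00065fbd15",
      "5f57ff55202b0e00065fbd13",
      "5f57ff55202b0e00065fbd10"
    ]
  }
});
```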

Pull Document By ID from Triple Nested Array MongoDB

I'd like to be able to pull a document by id from a triple nested array of documents. DB looks something like this:
[{
type: "Foods",
fruit: [{
name: "Apple",
kinds: [{
name: "Red Delicious"
details: [{
_id: ObjectId("123123123"),
colors: ["red", "yellow"]
}]
}]
}]
}]
I'd like to be able to pull the document with _id: 123123123. I've tried many different ways, but every attempt reports a match yet doesn't modify the document.
I've tried:
db.stuff.update({}, {$pull: {fruits: {kinds: {details: {_id: "123123123"}}}}}),
db.stuff.update({}, {$pull: {"fruits.kinds.details": {_id: "123123123"}}}),
db.stuff.update({}, {$pull: {"fruits.$[].kinds.$[].details": {_id: "123123123"}}})
But every time it matches, yet won't delete the document.
Please help.
The last attempt is almost correct; however, you need to fix two things: fruit instead of fruits (according to your sample data), and the types need to match, so you have to convert the string to an ObjectId:
db.stuff.update({}, {$pull: {"fruit.$[].kinds.$[].details": {_id: mongoose.Types.ObjectId("123123123")}}})

MongoDB Event Driven Database Design

Goal
Zero Conflict System: Having this be a write-only system would save us from conflicts. People are creating and updating documents both offline and online and being able to figure out what update trumps what is important.
Deep Historical reference: I want to know at any period of time, what that document looked like. On top of that, I need a deep historical analysis of how each item changes over time.
I was thinking of the following architecture:
Reference Document
{
_id: "u12345",
type: "user",
createdAt: 1584450565 // UNIX timestamp
}
Revision Documents
{
_id: "<random>",
type: "user-name-revision", // {type}-{key}-revision
referenceId: "u12345",
value: "John Doe Boy",
updatedAt: 1584450565
}
{
_id: "<random>",
type: "user-name-revision",
referenceId: "u12345",
value: "John Doe",
updatedAt: 1584450566 // 1 second higher than the above
}
{
_id: "<random>",
type: "user-email-revision",
referenceId: "u12345",
value: "john@gmail.com",
updatedAt: 1584450565
}
If you want to get the user, you would:
Get all documents with referenceId of u12345.
Only get the most recent of each type
Then combine and output the user like so:
{
_id: "u12345",
type: "user",
createdAt: 1584450565,
name: "John Doe",
email: "john@gmail.com",
updatedAt: 1584450566 // highest timestamp
}
The only issue I see is if I wanted to sort all users by name let's say - If I have 1000 users, I don't see a clean way of doing this.
I was wondering if anyone had any suggestions for a pattern I could use. I'm using MongoDB so I have the power of that at my disposal.
You can try the aggregation below.
Project the key field from the type field, sort by updatedAt and group to pick latest value and keep the reference and updatedAt.
Group all documents and merge the different key values and keep the updatedAt and post processing to format the document.
Lookup to pull in user value and followed by replaceRoot to merge the main document with lookup document.
Sort the documents by name.
db.collectionname.aggregate([
{"$addFields":{"key":{"$arrayElemAt":[{"$split":["$type","-"]},1]}}},
{"$sort":{"updatedAt":-1}},
{"$group":{
"_id":{"referenceId":"$referenceId","key":"$key"},
"value":{"$first":"$$ROOT"},
"referenceId":{"$first":"$referenceId"},
"updatedAt":{"$first":"$updatedAt"}
}},
{"$sort":{"updatedAt":-1}},
{"$group":{
"_id":"$_id.referenceId",
"data":{
"$mergeObjects":{"$arrayToObject":[[["$_id.key","$value"]]]}
},
"updatedAt":{"$first":"$updatedAt"}
}},
{"$addFields":{
"data.referenceId":"$_id",
"data.updatedAt":"$updatedAt"
}},
{"$project":{"data":1}},
{"$lookup":{
"from":"othercollectionname",
"localField":"data.referenceId",
"foreignField":"_id",
"as":"reference"
}},
{"$replaceRoot":{
"newRoot":{
"$mergeObjects":[{"$arrayElemAt":["$reference",0]},"$data"]}
}},
{"$project":{"_id":0}},
{"$sort":{"name":1}}
])
Alternate approach:
With all the transformation, your query will be a little slower. You can make a few tweaks.
Input
{
_id: "<random>",
type: "user",
key: "name",
referenceId: "u12345",
value: "John Doe Boy",
updatedAt: 1584450565
}
Query
db.collectionname.aggregate([
{"$sort":{"updatedAt":-1}},
{"$group":{
"_id":{"referenceId":"$referenceId","key":"$key"},
"top":{"$first":"$$ROOT"}
}},
{"$sort":{"top.updatedAt":-1}},
{"$group":{
"_id":"$_id.referenceId",
"max":{"$max":{"$cond":[{"$eq":["$_id.key","name"]},"$top.value",null]}},
"key-values":{"$push":{"k":"$_id.key","v":"$top.value"}},
"updatedAt":{"$first":"$top.updatedAt"}
}},
{"$lookup":{
"from":"othercollectionname",
"localField":"_id",
"foreignField":"_id",
"as":"reference"
}},
{"$project":{"_id":0}},
{"$sort":{"max":1}}
])
We can refine our schema further to remove a few other stages. We make sure we add the latest value at the end of the array. Something like:
Input
{
_id: "<random>",
type: "user",
key: "name",
referenceId: "u12345",
updates: [
{"value": "John Doe Boy", updatedAt: 1584450565},
{"value": "John Doe", updatedAt: 1584450566}
]
}
Query
db.collectionname.aggregate([
{"$addFields":{"latest":{"$arrayElemAt":["$updates",-1]}}},
{"$group":{
"_id":"$referenceId",
"max":{"$max":{"$cond":[{"$eq":["$key","name"]},"$latest.value",null]}},
"key-values":{"$push":{"k":"$key","v":"$latest.value"}},
"updatedAt":{"$first":"$latest.updatedAt"}
}},
{"$lookup":{
"from":"othercollectionname",
"localField":"_id",
"foreignField":"_id",
"as":"reference"
}},
{"$project":{"_id":0}},
{"$sort":{"max":1}}
])
Your question does not have enough requirements for a specific answer, so I'll try to give an answer that should cover many cases.
I doubt you'll find detailed published use cases, however, I can give you a few tips from my personal experience.
High throughput:
If you are using high-throughput event streaming, it would be better to store your data in an event log, where IDs are not unique and there are no updates, only inserts. This could be done for instance with Kafka, which is meant to be used for event streaming. You could then process the events in bulk into a searchable database, e.g. MongoDB.
Low throughput:
For a lower throughput, you could insert documents directly into MongoDB, however, still only insert, not update data.
Storing data in an event-log style in MongoDB:
In both cases, within MongoDB, you'll want a random _id (e.g. UUID), so each event has a unique _id. To access a logical document, you'll need another field, e.g. docId, which along with eventTimestamp will be indexed (with eventTimestamp sorted desc for faster access to latest version).
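A minimal sketch of that layout, with the collection name and field values assumed for illustration:

```javascript
// Compound index: docId plus descending eventTimestamp, so the
// latest version of a logical document is a cheap indexed lookup.
db.events.createIndex({ docId: 1, eventTimestamp: -1 });

// Insert-only event log: every write is a new event with a unique _id.
db.events.insertOne({
  _id: UUID(),
  docId: "u12345",
  eventTimestamp: new Date(),
  value: { name: "John Doe" }
});

// Latest version of one logical document:
db.events.find({ docId: "u12345" }).sort({ eventTimestamp: -1 }).limit(1);
```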
Searching:
To search by other fields, you can use additional indexes as necessary; however, if your searches take significant CPU time, make sure you only run them against secondary instances of MongoDB (read preference secondary), so that the event inserts won't get delayed. Make yourself familiar with MongoDB's aggregation pipeline.
To prevent invalid states due to out-of-order updates:
Since you want to enable updates, you should consider only saving the changes in each document, e.g. +1 to field A, set value to x for field B. In this case you will need an index with docId and ascending eventTimestamp instead, and every now and then aggregate your events into summary documents in a different collection, to enable faster reading of the latest state. Use the eventTimestamp of the latest document per docId for the aggregated document, plus the aggregationTimestamp and versionCount. If at any point you receive a document with an eventTimestamp lower than the latest eventTimestamp in the aggregated collection, you'll need to partially recalculate that collection. In other cases, you can update the aggregated collection incrementally.
Use this and you will get the desired output; make sure you have an index on referenceId and updatedAt and enough memory to sort.
db.columnName.aggregate([
{
$match:{
referenceId:"u12345"
}
},
{
$project:{
type: { $arrayElemAt: [ {$split: [ "$type", "-" ]}, 0 ] },
referenceId:true,
createdAt:true,
name:true,
email:true,
updatedAt:true
}
},
{
$sort:{
updatedAt:-1
}
},
{
$group:{
_id:"$referenceId",
type:{
$first:"$type"
},
createdAt:{
$last:"$updatedAt"
},
name:{
$first:"$name"
},
email:{
$first:"$email"
},
updatedAt:{
$first:"$updatedAt"
}
}
}
])

MongoDB: Upsert document in array field

Suppose, I have the following database:
{
_id: 1,
name: 'Alice',
courses: [
{
_id: 'DB103',
credits: 6
},
{
_id: 'ML203',
credits: 4
}
]
},
{
_id: 2,
name: 'Bob',
courses: []
}
I now want to 'upsert' the document with the course id 'DB103' in both documents. Although the _id field should remain the same, the credits field value should change (i.e. to 4). In the first document, the respective field should be changed, in the second one, {_id: 'DB103', credits: 4} should be inserted into the courses array.
Is there any possibility in MongoDB to handle both cases?
Sure, I could search with $elemMatch in courses for 'DB103' and insert if it isn't found, otherwise update the value. But those are two steps, and I would like to do both in just one.
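For what it's worth, a pipeline-style update (MongoDB 4.2+) can cover both cases in one statement; this is a sketch under that version assumption, and the collection name students is invented for illustration: rewrite courses by mapping the matching element to the new credits, or append the course if no element matched.

```javascript
// Sketch: single-statement "upsert into array" via a pipeline update
// (requires MongoDB 4.2+). Sets 'DB103' to 4 credits where present,
// otherwise appends {_id: 'DB103', credits: 4}.
const course = { _id: "DB103", credits: 4 };

db.students.updateMany({}, [
  { $set: {
      courses: {
        $cond: [
          { $in: [course._id, "$courses._id"] },
          // Course present: map the matching element to the new credits.
          { $map: {
              input: "$courses",
              as: "c",
              in: {
                $cond: [
                  { $eq: ["$$c._id", course._id] },
                  { _id: "$$c._id", credits: course.credits },
                  "$$c"
                ]
              }
          }},
          // Course absent: append it to the array.
          { $concatArrays: ["$courses", [course]] }
        ]
      }
  }}
]);
```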