How can I upsert a record and array element at the same time? - mongodb

That is meant to be read as a dual upsert operation: upsert the document, then the array element.
So MongoDB is a denormalized store for me (we're event sourced), and one of the things I'm trying to deal with is the concurrent nature of that. The problem is this:
Events can come in out of order, so each update to the database needs to be an upsert.
I need to be able to upsert not only the parent document but also an element in an array property of that document.
For example:
If the document doesn't exist, create it. All events in this stream have the document's ID but only part of the information depending on the event.
If the document does exist, then update it. This is the easy part. The update command is just written as UpdateOneAsync and as an upsert.
If the event is actually to update a list, then that list element needs to be upserted. So if the document doesn't exist, it needs to be created and the list item will be upserted (resulting in an insert); if the document does exist, then we need to find the element and update it as an upsert: if the element exists it is updated, otherwise it is inserted.
If at all possible, having it execute as a single atomic operation would be ideal, but if it can only be done in multiple steps, then so be it. I'm seeing a number of mixed examples on the net due to the large changes in the 2.x driver, and I'm not sure what I'm looking for beyond UpdateOneAsync. Currently using 2.4.x. Explained examples would be appreciated. TIA
Note:
Reiterating that this is a question regarding the MongoDB C# driver 2.4.x

Took some tinkering, but I got it.
var notificationData = new NotificationData
{
    ReferenceId = e.ReferenceId,
    NotificationId = e.NotificationId,
    DeliveredDateUtc = e.SentDate.DateTime
};

var matchDocument = Builders<SurveyData>.Filter.Eq(s => s.SurveyId, e.EntityId);

// first upsert the document to make sure that you have a collection to write to
var surveyUpsert = new UpdateOneModel<SurveyData>(
    matchDocument,
    Builders<SurveyData>.Update
        .SetOnInsert(f => f.SurveyId, e.EntityId)
        .SetOnInsert(f => f.Notifications, new List<NotificationData>()))
{ IsUpsert = true };

// then push a new element if none of the existing elements match
var noMatchReferenceId = Builders<SurveyData>.Filter
    .Not(Builders<SurveyData>.Filter.ElemMatch(s => s.Notifications, n => n.ReferenceId.Equals(e.ReferenceId)));

var insertNewNotification = new UpdateOneModel<SurveyData>(
    matchDocument & noMatchReferenceId,
    Builders<SurveyData>.Update
        .Push(s => s.Notifications, notificationData));

// then update the element that does match the reference ID (if any)
var matchReferenceId = Builders<SurveyData>.Filter
    .ElemMatch(s => s.Notifications, Builders<NotificationData>.Filter.Eq(n => n.ReferenceId, notificationData.ReferenceId));

var updateExistingNotification = new UpdateOneModel<SurveyData>(
    matchDocument & matchReferenceId,
    Builders<SurveyData>.Update
        // apparently the mongo C# driver will convert any negative index into an index symbol ('$')
        .Set(s => s.Notifications[-1].NotificationId, e.NotificationId)
        .Set(s => s.Notifications[-1].DeliveredDateUtc, notificationData.DeliveredDateUtc));

// execute these as a batch and in order
var result = await _surveyRepository.DatabaseCollection
    .BulkWriteAsync(
        new[] { surveyUpsert, insertNewNotification, updateExistingNotification },
        new BulkWriteOptions { IsOrdered = true })
    .ConfigureAwait(false);
The post linked as being a dupe was absolutely helpful, but it was not the answer. There were a few things that needed to be discovered.
The 'second statement' in the linked example didn't work correctly, at least when translated literally. To get it to work, I had to match on the element and then invert the logic by wrapping it in the Not() filter.
In order to use 'this index' on the match, you have to use a negative index on the array. As it turns out, the C# driver will convert any negative index to the '$' character when the query is rendered.
In order to ensure they are run in order, you must include bulk write options with IsOrdered set to true.
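For what it's worth, the rendered update behind that negative-index trick should look roughly like the following; this is a sketch of the MQL I'd expect, not captured from the driver's actual output, and the angle-bracket placeholders stand in for the values:

{ "$set": { "Notifications.$.NotificationId": <notification id>,
            "Notifications.$.DeliveredDateUtc": <delivered date> } }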

Related

DynamoDB - How to upsert nested objects with updateItem

Hi, I am a newbie to DynamoDB. Below is the schema of the DynamoDB table:
{
    "user_id": 1,          // partition key
    "dob": "1991-09-12",   // sort key
    "movies_watched": {
        "1": {
            "movie_name": "twilight",
            "movie_released_year": "1990",
            "movie_genre": "action"
        },
        "2": {
            "movie_name": "harry potter",
            "movie_released_year": "1996",
            "movie_genre": "action"
        },
        "3": {
            "movie_name": "lalaland",
            "movie_released_year": "1998",
            "movie_genre": "action"
        },
        "4": {
            "movie_name": "serendipity",
            "movie_released_year": "1999",
            "movie_genre": "action"
        }
    }
    ..... 6 more attributes
}
I want to insert a new item if the item (that user_id with dob) does not exist; otherwise, add the movies to the existing movies_watched map, checking that each movie is not already present in the movies_watched map.
Currently, I am trying to use update(params) method.
Below is my approach:
function getInsertQuery (item) {
    const exp = {
        UpdateExpression: 'set',
        ExpressionAttributeNames: {},
        ExpressionAttributeValues: {}
    }
    // top-level attributes (everything except the keys and the nested map)
    Object.entries(item).forEach(([key, value]) => {
        if (key !== 'user_id' && key !== 'dob' && key !== 'movies_watched') {
            exp.UpdateExpression += ` #${key} = :${key},`
            exp.ExpressionAttributeNames[`#${key}`] = key
            exp.ExpressionAttributeValues[`:${key}`] = value
        }
    })
    // nested attributes of the movies_watched map, addressed by document path
    let i = 0
    Object.entries(item.movies_watched).forEach(([key, value]) => {
        exp.UpdateExpression += ` movies_watched.#uniqueID${i} = :uniqueID${i},`
        exp.ExpressionAttributeNames[`#uniqueID${i}`] = key
        exp.ExpressionAttributeValues[`:uniqueID${i}`] = value
        i++
    })
    exp.UpdateExpression = exp.UpdateExpression.slice(0, -1) // drop the trailing comma
    return exp
}
The above method just creates an update expression with expression names and values for all top-level attributes as well as nested attributes (with document paths).
It works well if the item already exists, updating the movies_watched map. But it throws an exception if the item does not exist and is being inserted. Below is the exception:
The document path provided in the update expression is invalid for update
However, I am still not sure how to check for duplicate movies in the movies_watched map.
Could someone guide me in right direction, any help is highly appreciated!
Thanks in advance
There is no way to do this, given your model, without reading the item from DDB before the update (at that point the process is trivial). If you don't want to impose this additional read capacity on your table for updates, then you would need to re-design your data model:
You can change movies_watched to be a Set holding references to movies. The caveat is that a Set can contain only Numbers or Strings, so you would store the movie id or name, or keep the data as JSON Strings in your Set and parse them back into JSON on read. With a Set you can perform the ADD operation on the movies_watched attribute (see the sketch after these options). https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.UpdateExpressions.html#Expressions.UpdateExpressions.ADD
You can go with a single-table design approach and have the watched movies as separate items with (PK: userId and SK: movie_id). To get a user you would perform a query and specify only PK=userId; you will get a collection where one item is your user record and the others are the movies watched. If you are new to DynamoDB and are learning the ropes, I would suggest going with this approach. https://www.alexdebrie.com/posts/dynamodb-single-table/
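For the first option, here is a minimal sketch, assuming the AWS SDK v2 DocumentClient; the table name, function name, and key values are hypothetical and not from the question:

const AWS = require('aws-sdk')
const docClient = new AWS.DynamoDB.DocumentClient()

// ADD on a String Set is idempotent: adding an id that is already in the
// set leaves the set unchanged, so duplicate movies are handled for free.
// UpdateItem also creates the item if it does not exist, covering the insert case.
async function addWatchedMovie (userId, dob, movieId) {
    await docClient.update({
        TableName: 'users',                        // hypothetical table name
        Key: { user_id: userId, dob: dob },
        UpdateExpression: 'ADD movies_watched :m',
        ExpressionAttributeValues: {
            ':m': docClient.createSet([movieId])   // String Set with one element
        }
    }).promise()
}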

Getting ElasticSearch document fields inside of loaded records in searchkick

Is it possible to get ElasticSearch document fields inside of loaded AR records?
Here is a gist that illustrates what I mean: https://gist.github.com/allomov/39c30905e94c646fb11637b45f43445d
In this case I want to avoid the additional computation of total_price after getting the response from ES. The solution I currently see is to include the relationship and run the total_price computation for each record, which is not an optimal way to perform this operation, as I see it.
result = Product.search("test", includes: :product_components).response
products_with_total_prices = result.map do |product|
{
product: product
total_price: product.product_components.map(&:price).compact.sum
}
end
Could you please tell if it is possible to mix ES document fields into AR loaded record?
As far as I'm aware it isn't possible to get a response that merges the document fields into the loaded record.
Usually I prefer to completely rely on the data in the indexed document where possible (using load: false as a search option), and only load the AR record(s) as a second step if necessary. For example:
result = Product.search("test", load: false).response
# If you also need AR records, could do something like:
product_ids = result.map(&:id)
products_by_id = {}
Product.where(id: product_ids).find_each do |ar_product|
products_by_id[ar_product.id] = ar_product
end
merged_result = result.map do |es_product|
es_product[:ar_product] = products_by_id[es_product.id]}
end
Additionally, it may be helpful to retrieve the document stored in the ES index for a specific record, which I would normally do by defining the following method in your Product class:
def es_document
  return nil unless doc = Product.search_index.retrieve(self).presence
  Hashie::Mash.new doc
end
You can use select: true and the with_hit method to get the record and the search document together. For your example:
result = Product.search("test", select: true)
products_with_total_prices =
result.with_hit.map do |product, hit|
{
product: product,
total_price: hit["_source"]["total_price"]
}
end

MongoDB: Performing actions on a field of a collection

I am very new to MongoDB (only got started out of interest last week). I am trying to figure out something and I'm not entirely sure of the terminology to go about searching for it. So I decided to make a SO question.
I created a collection called Students. Students has the fields id, name, undergrad (which is a boolean), classes (which is an array) and units (which is 3 times the number of classes the student has).
Now, I wanted to see how I could perform actions on a particular field of Students. What I did was insert a couple of documents and purposefully not include the units field. Then I wanted to $set the units field for each document/student that did not have the field. I did the following:
var studentDoc = db.students.find({units: {$exists: false}})
studentDoc.forEach(function(stu){
    db.student.update({_id: stu._id}, {$set: {units: {$size: "$classes"}}})
})
Question 1: Is what I've done even remotely correct?
Question 2: When I type studentDoc after setting the var studentDoc, it doesn't print anything. But when I write
var studentDoc = db.students.find({units:{$exists:false}}).toArray()
it prints studentDoc as an array but still doesn't seem to do anything in the forEach loop.
Question 3: How do I $set the units field as 3 * (size of classes array)
I hope I have been clear in my question. I have tried searching on the MongoDB docs and google, but haven't had any luck (probably because of my lack of knowledge to search for the correct things).
Any help would be great! You can even point me in the right direction, and that'll be great!
Thank you in advance for all your help!!
I'm not sure why the forEach isn't being entered. You can do the same with the working code below.
// studentDoc here is the array produced by .toArray(), so it has a length property
for (var i = 0; i < studentDoc.length; i++) {
    var stud_id = studentDoc[i]._id;
    var doc = db.students.findOne({"_id": stud_id})["classes"];
    if (doc) {
        var len = doc.length;
        db.students.update({"_id": stud_id}, {$set: {"units": len * 3}})
    }
}
As far as I know, you can't use $size:"$..." expressions in an update statement. You will get an error like the one below:
The dollar ($) prefixed field '$size' in 'units.$size' is not valid for storage.
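As a side note, on a MongoDB version that supports update-with-aggregation-pipeline (4.2 or later, which is newer than what this thread used), Question 3 can be done server-side in one statement. A minimal sketch, with $ifNull guarding against documents that have no classes array:

db.students.updateMany(
    { units: { $exists: false } },
    [ { $set: { units: { $multiply: [ 3, { $size: { $ifNull: [ "$classes", [] ] } } ] } } } ]
)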
The return value of db.collection.find() is a cursor.
In the mongo shell, if the returned cursor is not assigned to a variable using the var keyword, the cursor is automatically iterated up to 20 times to access up to the first 20 documents that match the query. To iterate manually, assign the returned cursor to a variable using the var keyword. So with
var studentDoc = db.students.find({units: {$exists:false}})
you should iterate studentDoc manually.
Whereas cursor.toArray() returns an array that contains all the documents from the cursor. The method iterates the cursor completely, loading all the documents into RAM and exhausting the cursor. Thus with
var studentDoc = db.students.find({units:{$exists:false}}).toArray()
it prints studentDoc as an array.
If you want to use forEach, here is cursor.forEach():
db.students.find({units: {$exists: false}}).forEach(printjson)
var studentDoc = db.students.find({units: {$exists:false}})
Here studentDoc is a cursor; it is not printed automatically.
You can use forEach:
studentDoc.forEach(printjson);
or iterate the cursor:
while (studentDoc.hasNext()) {
    printjson(studentDoc.next());
}

Composite views in couchbase

I'm new to Couchbase and am struggling to get a composite index to do what I want it to. The use-case is this:
I have a set of "Enumerations" being stored as documents
Each has a "last_updated" field which -- as you may have guessed -- stores the last time that the field was updated
I want to be able to show only those enumerations which have been updated since some given date but still sort the list by the name of the enumeration
I've created a Couchbase View like this:
function (doc, meta) {
    var time_array;
    if (doc.doc_type === "enum") {
        if (doc.last_updated) {
            time_array = doc.last_updated.split(/[- :]/);
        } else {
            time_array = [0, 0, 0, 0, 0, 0];
        }
        for (var i = 0; i < time_array.length; i++) { time_array[i] = parseInt(time_array[i], 10); }
        time_array.unshift(meta.id);
        emit(time_array, null);
    }
}
I have one record that doesn't have the last_updated field set and therefore has its time fields all set to zero. I thought that as a first test I could filter out that result, so I put in the following:
startkey = ["a",2012,0,0,0,0,0]
endkey = ["Z",2014,0,0,0,0,0]
While the list is sorted by the 'id' it isn't filtering anything! Can anyone tell me what I'm doing wrong? Is there a better composite view to achieve these results?
In Couchbase, when you query a view by startkey/endkey, you're unable to filter results by 2 or more properties independently. Couchbase has only one index, so it will filter your results only by the first element of the key. So your query will be identical to a query with:
startkey = ["a"]
endkey = ["Z"]
Here is a link to the complete answer by Filipe Manana on why it can't be filtered by those dates.
Here is a quote from it:
For composite keys (arrays), elements are compared from left to right and comparison finishes as soon as an element is different from the corresponding element in the other key (same as what happens when comparing strings à la memcmp() or strcmp()).
So if you want a view that filters by date, the date array should go first in the composite key, with the id emitted after the date components (see the sketch below).
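For illustration, here is a minimal sketch of the reordered view, assuming the same document shape as in the question; the id is still emitted, just after the date components:

function (doc, meta) {
    if (doc.doc_type === "enum") {
        var time_array = doc.last_updated
            ? doc.last_updated.split(/[- :]/)
            : [0, 0, 0, 0, 0, 0];
        for (var i = 0; i < time_array.length; i++) { time_array[i] = parseInt(time_array[i], 10); }
        time_array.push(meta.id);   // id last, so the date components drive the range filter
        emit(time_array, null);
    }
}

A range query along the lines of startkey = [2012] and endkey = [2014, 12, 31, 23, 59, 59, {}] would then return only enumerations updated in that window; sorting by the enumeration's name would presumably have to happen client-side (or via a separate view), since this index is now ordered by date.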

How does Mongoengine decide if 2 EmbeddedDocuments are equal or not?

I have the following Mongoengine document:
class MyEmbed(EmbeddedDocument):
    embedField = StringField(primary_key=True)
    varField = StringField()

class TestDoc(Document):
    myField = StringField()
    embed_list = ListField(EmbeddedDocumentField(MyEmbed))
So I keep a list of embedded documents, to which I wish to add new documents if they don't exist already. The problem is that when I use the atomic update operator add_to_set things don't turn out the way I want them to.
This is what I am trying to do:
embed1 = models.MyEmbed(embedField="F1")
parent = models.TestDoc(myField="ParentField")
embed_list = []
embed_list.append(embed1)
parent.embed_list = embed_list
parent.save()
embed2 = models.MyEmbed(embedField="F1", varField="varField")
TestDoc.objects(id=parent.id).update_one(add_to_set__embed_list=embed2)
The problem is that after doing this, I have in the DB a list of embedded documents with 2 elements. What I want is to decide based on one field (embedField in this case) whether 2 EmbeddedDocuments are equal, rather than taking all the properties into account. My questions are:
What are the default criteria according to which Mongoengine decides whether 2 EmbeddedDocuments are equal or not?
How can I redefine the function that makes Mongoengine decide when 2 EmbeddedDocuments are equal or not?
Thanks!
The actual checking is done inside MongoDB and not mongoengine.
The object sent to MongoDB should be the same, but this is where it gets tricky: with BSON, order is important, and in Python dictionaries it is not. When converting the document to send to MongoDB, mongoengine just passes a dictionary. This is a bug, so I've added #296 and will fix it for 0.8.
See https://github.com/MongoEngine/mongoengine/blob/master/mongoengine/document.py#L51 and https://github.com/MongoEngine/mongoengine/blob/master/mongoengine/base/document.py#L52:
def __eq__(self, other):
    if isinstance(other, self.__class__):
        return self._data == other._data
    return False
It compares the dicts of the embedded documents' data, so you can override this method.
If you look at Document update, which calls QuerySet update (find add_to_set and addToSet), you can see that mongoengine doesn't check whether the document already exists in the list and just calls the mongo $addToSet operation: https://github.com/MongoEngine/mongoengine/blob/master/mongoengine/queryset/transform.py#L156.
In your code you have the document MyEmbed(embedField="F1") and try to add another document MyEmbed(embedField="F1", varField="varField"), so the logic is right: it adds a new document. If you try the following code:
embed1 = models.MyEmbed(embedField="F1")
parent = models.TestDoc(myField="ParentField")
embed_list = []
embed_list.append(embed1)
parent.embed_list = embed_list
parent.save()
embed2 = models.MyEmbed(embedField="F1", varField="varField")
TestDoc.objects(id=parent.id).update_one(add_to_set__embed_list=embed2)
embed3 = models.MyEmbed(embedField="F1")
TestDoc.objects(id=parent.id).update_one(add_to_set__embed_list=embed3)
embed4 = models.MyEmbed(embedField="F1", varField="varField")
TestDoc.objects(id=parent.id).update_one(add_to_set__embed_list=embed4)
you can see that parent contains only embed1 and embed2.
So, to resolve your problem you can override the __eq__ method and check for the document in the list yourself, but you must find another solution for updating the document list, because add_to_set is a direct call to the mongo method.
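For illustration, a minimal sketch of such an override, assuming equality should be decided by embedField alone; the __ne__/__hash__ definitions, the parent_id variable, and the read-before-write update at the end are assumptions added for the example, not part of the thread:

class MyEmbed(EmbeddedDocument):
    embedField = StringField(primary_key=True)
    varField = StringField()

    def __eq__(self, other):
        # two embedded documents are "the same" if their embedField matches
        if isinstance(other, self.__class__):
            return self.embedField == other.embedField
        return False

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash(self.embedField)

# add_to_set is executed by MongoDB and ignores __eq__, so the duplicate check
# has to happen in application code (a non-atomic read-before-write):
parent = TestDoc.objects.get(id=parent_id)   # parent_id is hypothetical
if embed2 not in parent.embed_list:          # uses the __eq__ defined above
    parent.embed_list.append(embed2)
    parent.save()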