Firestore, why use "update" instead of "set merge"? - google-cloud-firestore

set with merge will update fields in the document, or create it if it doesn't exist
update will update fields but will fail if the document doesn't exist
Wouldn't it be much easier to always use set with merge?
Is the pricing slightly different?

Between set merge and update there is a difference in use case.
You may find detailed information regarding this on this post.
Regarding the pricing, as stated here:
Each set or update operation counts as a single write and is billed according to the region.
=========================================================================
EDIT:
The choice of which operation to use depends greatly on the use case: if you use "set merge" for a batch update, your request will successfully update all existing documents, but it will also create dummy documents for non-existent IDs, which is sometimes not what you want.
After investigating a bit further, we can add another difference:
set merge will always overwrite the data with the data you pass, while
update is specifically designed to let you perform a partial update of a document without the possibility of creating incomplete documents that your code isn't otherwise prepared to handle. Please check this answer, as well as this scenario.

The difference is that .set(data, {merge:true}) will update the document if it exists, or create the document if it doesn't.
.update() fails if the document doesn't exist.
But why does .update() still exist? Probably for backward compatibility: I believe .set() with merge:true was introduced later than .update(). As you have pointed out, set with merge is more versatile. I use it instead of .update() and instead of .add().
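For illustration, a minimal sketch with the Node.js Admin SDK ("users", "alice", and the age field are placeholders) showing the one behavioral difference:
const ref = db.collection("users").doc("alice");

// Merges { age: 30 } into the document, creating it if it doesn't exist.
await ref.set({ age: 30 }, { merge: true });

// Updates the field, but throws NOT_FOUND if the document doesn't exist.
await ref.update({ age: 30 });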

Related

Query Couchbase for missing document where no index exists

I need to query a Couchbase collection for all documents that are missing a particular field.
e.g. SELECT * FROM Bucket01 WHERE Field IS MISSING;
However, I have some annoying limitations:
There is no index on the field.
There is no shared field between affected documents that does have an index.
I am not allowed to create any new indexes.
I do not know any DocKeys.
There is no primary index on the collection.
Can this be accomplished? If so, how?
You could do a one-off via Eventing:
function OnUpdate(doc, meta) {
  // Note: !doc.somefield also matches documents where somefield exists
  // but is falsy (null, 0, "", false); use !("somefield" in doc) to match
  // only strictly missing fields.
  if (!doc.somefield) {
    // The field is missing: log it, or write
    // the meta.id to another collection in KV.
    log("Missing somefield id", meta.id);
  }
}
Given the restrictions you've supplied, this sounds very nearly impossible.
The eventing option in Jon's answer is probably your best bet, but given the restrictions, I'm going to guess they won't allow that either.
Another option (again, probably going to be restricted) is to query the Analytics service. The Analytics service provides workload isolation, so your queries would not affect the normal operations of your production workload.
This would require you adding at least one more node to the cluster with the Analytics service, setting up a data link, and then running your SQL++ against that Analytics data set.
Of course, if this is a temporary, one-off, then adding (and later removing) an Analytics node for a single query might be more trouble than it's worth. But since so many other paths have been closed off to you, it is an option.
Another option (again, probably going to be restricted) is to use a Couchbase (map/reduce) View. These are deprecated in Couchbase 7, but still technically available.
The syntax would look similar to the eventing function:
function (doc, meta) {
  if (!doc.someField) {
    emit(meta.id, [doc.foo, doc.bar, doc.baz]);
  }
}
This would emit the ID of each document where the field is missing, along with the foo, bar, and baz fields (if you need them).
I must stress that creating Views is generally not a good option compared to SQL++, Eventing, K/V, etc. But if this is a one-off, temporary situation, then it might be okay.

MongoDB/Mongoose atomic read & write on single Document

I need to update a Document based on certain criteria with mongoDB/mongoose.
So I search for the Document in the Collection based on ID.
Once I have the document, I check whether it satisfies the condition I'm looking for (the values of one of the nested properties).
Once I've confirmed the Document satisfies the criteria, I perform certain actions on the document and then Save the document.
The whole process of finding the Document, checking for criteria, making adjustments and updating, takes some time to complete.
The problem is that the API that runs this process can be triggered multiple times.
While the process is still running for the first API call, the API can be called again (I can't control the calls), running the entire process again for the same document.
So now I end up making the same updates twice on the document.
If the first call completes successfully, the next call will not, because the first update ensures the criteria are no longer met. But since the later calls arrive while the first one hasn't finished updating, they all end up going through successfully.
Any way I can perform all the steps as one atomic action?
Well, I'm a year or two late to this party, and the answer I'm about to write has some issues depending on your use case, but for my situation it has worked pretty well.
Here's the solution with Mongoose / JavaScript:
const findCriteria1 = { _id: theId };
myModel.findOne(findCriteria1, function(error, dataSet) {
  if (dataSet.fieldInQuestion === valueIWantFieldToBe) {
    const findCriteria2 = { _id: theId, fieldInQuestion: dataSet.fieldInQuestion };
    const updateObject = { fieldInQuestion: updateValue };
    myModel.findOneAndUpdate(findCriteria2, updateObject, function(error, dataSet) {
      if (!error) {
        console.log('success');
      }
    });
  }
});
So basically, you find() the document with the value you want, and if it meets conditions, you do a findOneAndUpdate(), making sure that the document value did not change from what it was when you found it (as seen in findCriteria2).
One major issue with this solution is that the operation can fail because the document value was updated by another user in between this operation's two DB calls. This is unlikely, but may not be acceptable, especially if your DB is being pounded with frequent calls to the same document. A much better solution, if it exists, would be a document lock and update queue, much like most SQL databases provide.
One way to mitigate that issue is to wrap the whole solution above in a loop and, if the findOneAndUpdate fails, try again until it succeeds. You could cap how many times the loop retries. An infinite loop is also possible, but that could be really dangerous, because it has the potential to hammer, and effectively disable, the DB.
Another issue my loop idea doesn't solve is ordering: if you need a "first come, first served" model, that might not always hold, because a later DB request can thwart the request before it and get "served" first.
And a better idea altogether might be to change how you model your data. This depends on what checks you need to run on your data, but you mentioned "values of nested properties" in your question: what if those values were in a separate document, so you could simply check what you need in the findOneAndUpdate() criteria parameter?
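For completeness, here is a rough sketch of that bounded retry idea, reusing the placeholder names from the snippet above (myModel, fieldInQuestion, valueIWantFieldToBe, updateValue) with modern promise-based Mongoose calls; treat it as an illustration, not battle-tested code:
async function updateWithRetry(theId, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const doc = await myModel.findOne({ _id: theId });
    // Criteria no longer met (or document gone): nothing to do.
    if (!doc || doc.fieldInQuestion !== valueIWantFieldToBe) return null;
    const updated = await myModel.findOneAndUpdate(
      { _id: theId, fieldInQuestion: doc.fieldInQuestion }, // compare-and-swap guard
      { fieldInQuestion: updateValue },
      { new: true }
    );
    if (updated) return updated; // the guard matched, so our update won
    // Another writer changed the field between our two calls; loop and retry.
  }
  throw new Error("Too many concurrent modifications, giving up");
}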
To operate on a consistent snapshot of the entire database, use a transaction with read concern snapshot.
Transactions do not magically prevent concurrency. The withTransaction helper handles the mechanics of many common concurrent-modification cases transparently to the application, but you still need to understand concurrent operations on databases in general to write correct code.
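A hedged sketch with the MongoDB Node.js driver (the collection, field names, and criteria are placeholders; transactions also require a replica set or sharded cluster):
const session = client.startSession();
try {
  await session.withTransaction(async () => {
    // Re-check the criteria and update inside one transaction, so two
    // concurrent API calls cannot both pass the check.
    const doc = await coll.findOne({ _id: theId }, { session });
    if (doc && doc.nested.status === "pending") { // placeholder criteria
      await coll.updateOne(
        { _id: theId },
        { $set: { "nested.status": "processed" } },
        { session }
      );
    }
  }, { readConcern: { level: "snapshot" }, writeConcern: { w: "majority" } });
} finally {
  await session.endSession();
}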

Bad practice to willfully allow errors using updateData?

I only want to update a document if it exists, and I don't want to use a transaction because it's not offline-capable. Therefore, I use updateData(). However, this task is common in the UX and is likely to fail about half of the time because the document won't exist. I shudder at the idea of allowing errors that I know will happen, but I see no other way to preserve offline capability and update documents only when they exist. Is this frowned upon by Firestore?
Firestore doesn't really care if your update fails when a document doesn't exist.
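If it helps, a sketch of treating that expected failure as a normal outcome (shown with the web SDK's updateDoc rather than the iOS updateData the question uses; "items" and itemId are placeholders):
import { doc, updateDoc } from "firebase/firestore";

try {
  await updateDoc(doc(db, "items", itemId), { lastSeen: Date.now() });
} catch (err) {
  // "not-found" is the expected, harmless case: the document doesn't exist.
  if (err.code !== "not-found") throw err;
}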

Why does MongoDB no longer allow using $set and $unset with an empty document?

I just updated from MongoDB version 2.2 to version 2.6 and discovered that you can no longer use the $set and $unset operators in the update method with an empty document. For example, calling db.mytable.update({field:value}, {$set:{}}) used to just leave the document unmodified, but now it raises an error saying that the value of $set can't be empty.
Can someone explain why this is an improvement over the old behavior? For me, it just creates a need for extra logic, such as if statements to make sure the value isn't empty before attempting an update.
SERVER-12266 contains an official explanation. In particular this comment:
I spoke to Scott Hernandez about this today, and he explained the new strictness around empty modifiers is intended to alert users that were inadvertently sending over empty updates. [...]
Whether that's reasonable or not, I can't say. I suppose you could work around it by appending _id (or another constant field) to the $set value by default.
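Alternatively, the if-statement guard the question mentions is small to write. A minimal shell-style sketch (buildSetFields is a hypothetical helper producing the fields to set):
const setFields = buildSetFields(input); // hypothetical helper
if (Object.keys(setFields).length > 0) {
  db.mytable.update({ field: value }, { $set: setFields });
}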
This was basically a usability change that:
is intended to alert users that were inadvertently sending over empty updates
https://jira.mongodb.org/browse/SERVER-12266?focusedCommentId=485843&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-485843

Mongoose versioning: when is it safe to disable it?

From the docs:
The versionKey is a property set on each document when first created by Mongoose. This key's value contains the internal revision of the document. The name of this document property is configurable. The default is __v. If this conflicts with your application you can configure it as such:
[...]
Document versioning can also be disabled by setting the versionKey to false. DO NOT disable versioning unless you know what you are doing.
But I'm curious, in which cases it should be safe to disable this feature?
The version key's purpose is optimistic locking.
When enabled, the version value is atomically incremented whenever a document is updated.
This allows your application code to test whether changes have been made between a fetch (bringing in version key 42, for example) and a subsequent update (ensuring the version value is still 42).
If the version key has a different value (e.g. 43, because an update has been made to the document), your application code can handle the concurrent modification.
The very same concept is often used in relational databases instead of pessimistic locking, which can hurt performance badly. All decent ORMs provide such a feature. For example, it's nicely described in the ObjectDB documentation; ObjectDB is an object database implemented in Java, but the same concept applies.
The blog post linked in Behlül's comment demonstrates the usefulness of optimistic locking with a concrete example, but only for array changes; see below.
Conversely, here is a simple case where it's useless: a user profile that can be edited only by its owner. Here you can get rid of optimistic locking and assume that the last edit always wins.
So, only you know whether your application needs optimistic locking or not. Decide use case by use case.
The Mongoose situation is somewhat special.
Optimistic locking is enabled only for arrays, because the internal storage format uses positional indexes. This is the issue described by the blog post linked in the question's comment. I found the explanation given on the mongoose-orm mailing list pretty clear: if you need optimistic locking for other fields, you need to handle it yourself.
Here is a gist showing how to implement a retry strategy for an add operation. Again, how you want to handle it depends on your use case, but it should be enough to get you started.
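For non-array fields, a do-it-yourself version can use the version key itself as the compare-and-swap guard. A minimal sketch (Profile and displayName are placeholders):
const doc = await Profile.findById(id);

const res = await Profile.updateOne(
  { _id: doc._id, __v: doc.__v }, // only match the revision we read
  { $set: { displayName: "new name" }, $inc: { __v: 1 } }
);
if (res.modifiedCount === 0) {
  // Someone else updated the document first: reload and retry,
  // or surface a conflict to the caller.
}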
I hope this clears things up.
Cheers