Firestore transaction based on non existent document (Collection level locking not available) - google-cloud-firestore

From this SO answer I learned that Firestore does not support collection-level locking in a transaction.
In my case, I have to ensure that the username field in users collection is unique before I write to a collection.
For that, I write a transaction that does this:
Executes a query on users collection to check if a document exists where username=something
If it does exist, fail and return error from transaction
If it does not exist, just run the write operation for the userId I want to update/create.
Now the issue here is that if two clients try to run this transaction simultaneously, both might query the collection, and since the collection is not locked, one client might insert/update a document in the collection while the other won't see it.
Is my assumption correct? And if yes, then how to deal with such scenarios?

What you're trying to do is actually not possible to do atomically, as it's not possible to transact safely on a document that you can't identify with an ID. The problem here is that a transaction is only "safe" if you can get() the specific document to add or modify. Since you can't get() a document using a field value in the document, you're at a loss.
If you want to ensure uniqueness of anything in Firestore, that uniqueness will need to be coded into the document ID itself. In the simplest case, you can use the username as the ID of a document in a new collection. If you do that, your transaction can simply get() the required document by username, check to see if it exists, then write the document if it doesn't. Else, the transaction can fail.
Bear in mind that because there are limitations to document IDs in Firestore, you might need to escape or encode that username if your usernames could possibly violate the rules.
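A minimal sketch of that transaction, assuming the namespaced JavaScript SDK (where `snapshot.exists` is a property) and a hypothetical `usernames` collection keyed by an encoded username. The names `usernameToDocId` and `claimUsername` and the `uid` field are illustrative, not part of any API. The `u:` prefix plus `encodeURIComponent` is just one possible encoding: it keeps `/` out of the ID and prevents IDs such as `.`, `..` or `__name__`.

```javascript
// Hypothetical helper: turn a username into a document ID.
// encodeURIComponent escapes "/", and the "u:" prefix avoids reserved IDs.
function usernameToDocId(username) {
  return 'u:' + encodeURIComponent(username);
}

// Claim `username` for `uid`; rejects if the username is already taken.
// `db` is assumed to be a Firestore instance (e.g. firebase.firestore()).
function claimUsername(db, uid, username) {
  const ref = db.collection('usernames').doc(usernameToDocId(username));
  return db.runTransaction(async (tx) => {
    const snap = await tx.get(ref); // read by ID, so the transaction can guard it
    if (snap.exists) {
      throw new Error('username already taken');
    }
    tx.set(ref, { uid: uid });      // write only if the document was absent
  });
}
```

If two clients run this at the same time for the same username, both read the same document reference, so the transaction machinery can detect the conflict and only one write should succeed.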

An alternative to coding this data into the doc id is to use a separate collection as a sort of manual index. Security rules can then enforce uniqueness on the index. So something like this:
/docs/${documentId} => {uniqueField: "foo", ...}
/docmap/${uniqueField} => {docId: "doc2"}
The idea here is that one must first write the docmap entry containing the new doc id before they are allowed to write the doc. Since the docmap is keyed on our unique field, it enforces uniqueness.
Security rules would look roughly like so:
function getPath(childPath) {
  return path('/databases/' + database + '/documents/' + childPath);
}
// we can only write to our doc if the unique field exists in docmap/
// and matches our doc id
match /docs/{docId} {
  allow write: if get(getPath('docmap/' + request.resource.data.uniqueField)).data.docId == docId;
  // todo validate data schema
}
// It is only possible to add a uniqueField to the docmap
// if it doesn't already exist for another doc
// (resource is null when the document does not exist yet)
// we also validate that the doc id matches our schema
match /docmap/{uniqueField} {
  allow write: if resource == null &&
    request.resource.data.docId is string &&
    request.resource.data.docId.size() < 100;
}
And a write would look roughly like so:
const db = firebase.firestore();
// claim the unique value first, then write the doc that uses it
db.doc('docmap/foo').set({docId: 'doc2'})
  .then(() => db.doc('docs/doc2').set({uniqueField: 'foo'}))
  .then(() => console.log("success"))
  .catch(e => console.error(e));
You could also do this in a transaction or even a batch operation to make it atomic, but it's probably not necessary to add complexity to the process; the security rules will enforce the constraints.
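For the atomic variant, here is a sketch using a batched write, assuming the namespaced JavaScript SDK; `writeUniqueDoc` is a hypothetical helper name. One caveat, as far as I know: when both writes are in the same batch, the rule on docs would need `getAfter()` rather than `get()` to see the docmap entry being written alongside it.

```javascript
// Hypothetical helper: write the docmap entry and the doc in one atomic batch.
// `db` is assumed to be a Firestore instance (e.g. firebase.firestore()).
function writeUniqueDoc(db, docId, uniqueField, data) {
  const batch = db.batch();
  // index entry keyed on the unique value, pointing back at the doc
  batch.set(db.doc('docmap/' + uniqueField), { docId: docId });
  // the doc itself, carrying the unique value
  batch.set(db.doc('docs/' + docId), Object.assign({}, data, { uniqueField: uniqueField }));
  return batch.commit(); // both writes are applied together or not at all
}
```

With this shape, a rejected docmap entry cannot leave a stray doc behind, which the two-step chained version cannot promise if the second write fails.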

Related

Is this firebase security rule redundant?

I have a collection of users, and I have a separate collection of usernames. In my collection usernames I store different usernames as doc_ids. That is, under collection usernames I can have doc_ids as first, second, third, and so on. Under each doc_id I store the following info:
{
ownerId: id,
dateUpdated: someDate
}
When I change a user's username, I execute a batch write in which I first delete the oldUsername doc and then insert the newUsername doc with the appropriate fields. My question is about one of the security rules for the usernames collection. Do I need to check whether such a username (that is, such a doc_id) already exists? In other words, do I need the following rule:
match /usernames/{username} {
allow create: if !exists(/databases/$(database)/documents/usernames/$(username))
}
I think this rule is redundant, since document IDs are already unique within a collection, but I have seen it in a few other posts, so I wanted to check other people's opinions.
Yup, that rule does nothing as the create will only be triggered when the document doesn't exist yet. If the document already exists, its .update will be triggered.
This type of check is common in a .write, but not needed when you're using the more granular .create.

Does a wildcard have to be used for the document segment in a matching path of a Firestore collection group security rule?

db.collectionGroup('private')
.where('members', 'array-contains', userId)
.get()
.then(...)
This query fetches documents successfully if the relevant security rule is set like:
match /{path=**}/private/{document} {
allow read: if request.auth.uid in resource.data.members;
}
However, the similar rule below prevents the same query unexpectedly.
match /{path=**}/private/allowed {
allow read: if request.auth.uid in resource.data.members;
}
In this database,
private subcollections exist only under documents in the rooms collection.
Every private has only a single document with the ID "allowed".
This means /rooms/xxxxxxxx/private/allowed is the only possible path existing, where xxxxxxxx is an auto-assigned document ID.
Therefore specifying the path as /{path=**}/private/allowed looks correct to me.
In fact, "get" queries work in simulations in the playground, so is it a restriction only for collection group queries, or am I doing anything wrong?
FYI, more detailed database structure is described in another question of mine here.
Yes, it is required.
When you perform a collection group query, it's not possible to call out a specific document id in the query (e.g. "allowed"). The query is explicitly asking to consider all of the documents in all of the subcollections of the given name ("private"). Therefore, the rules must allow for those documents to be considered by adding the trailing wildcard.
You can certainly add a filter to the query if you want to get only certain documents with certain field values, but that filter can't be enforced in the rules.

When are mongodb indexes updated?

Question
Are mongodb indexes updated before the success of a write operation is reported to the application or do index updates run in the background? If they run in the background: is there a way to wait for an index update to complete?
Background
I have a document
person1obj = {
email: 'user#domain.tld',
[...]
}
in a people collection where a unique index is applied to the email field. Now I'd like to insert another document
person2obj = {
email: 'user#domain.tld',
[...]
}
Obviously, I have to change the email field of person1 before person2 can be inserted. With mongoose, the code looks like
mongoose.model('Person').create(person1obj, function (err, person1) {
// person1 has been saved to the db and 'user#domain.tld' is
// added to the *unique* email field index
// change email for person1 and save
person1.email = 'otheruser#domain.tld';
person1.save(function(err, person1) {
// person1 has been updated in the db
// QUESTION: is it guaranteed that 'user#domain.tld' has been removed from
// the index?
// inserting person2 could fail if the index has not yet been updated
mongoose.model('Person').create(person2obj, function (err, person2) {
// ...
});
});
});
I have seen my unit tests fail at random with the error E11000 duplicate key error index, which made me wonder whether index updates run in the background.
This question probably is related to mongodb's write concern but I couldn't find any documentation on the actual process for index updates.
From the FAQ (emphasis mine):
How do write operations affect indexes?
Any write operation that alters an indexed field requires an update to the index in addition to the document itself. If you update a document that causes the document to grow beyond the allotted record size, then MongoDB must update all indexes that include this document as part of the update operation.
Therefore, if your application is write-heavy, creating too many indexes might affect performance.
At the very least in the case of unique indexes, the indexing does not run in the background. This is evident from the fact that when you try to write a new document with a duplicate key that is supposed to be unique, you get a duplicate key error.
If indexing were to happen asynchronously in the background, Mongo would not be able to tell whether the write actually succeeded. Thus the indexing must happen during the write sequence.
While I have no evidence for this (though Mongo is open source, so if you have enough time you can look it up), I believe that all indexing is done during the write sequence, even if it's not a unique index. It wouldn't make sense to have special logic for writes that affect a unique index.

How to insert new document only if it doesn't already exist in MongoDB

I have a collection of users with the following schema:
{
_id:ObjectId("123...."),
name:"user_name",
field1:"field1 value",
field2:"field2 value",
etc...
}
The users are looked up by the user.name, which must be unique. When a new user is added, I first perform a search, and if no such user is found, I add the new user document to the collection. The search and the insert are not atomic. So when multiple application servers are connected to the DB server, two add_user requests with the same user name can arrive at the same time, both searches find no such user, and the result is two documents with the same "user.name". In fact this happened (due to a bug on the client) with just a single app server running NodeJS and using the Async library.
I was thinking of using findAndModify, but that doesn't work, since I'm not simply updating a field (that exists or doesn't exist) of a document that already exists and can use upsert, but want to insert a new document only if the search criteria fails. I can't make the query to be not equal to "user.name", since it will find other users.
First of all, you should maintain a unique index on the name field of the users collection. This can be specified in the schema if you are using Mongoose or by using the statement:
collection.createIndex('name', {unique: true}, callback); // ensureIndex is the older, deprecated name for this call
This ensures that the name field remains unique and solves the concurrent-request problem described in your question. With the index in place you do not need to search first: just insert, and the second of two concurrent inserts with the same name will fail with a duplicate key error.
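As a sketch, the add-user path then becomes a plain insert that treats a duplicate key error as "name taken". This assumes the promise-based Node.js driver, where that error carries `code === 11000`; `addUser` is a hypothetical helper name.

```javascript
// Hypothetical helper: insert a user, relying on the unique index on `name`.
// `users` is assumed to be a MongoDB driver collection.
async function addUser(users, user) {
  try {
    await users.insertOne(user);
    return true;    // inserted: the name was free
  } catch (err) {
    if (err.code === 11000) {
      return false; // duplicate key: the name already exists
    }
    throw err;      // anything else is a real failure
  }
}
```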

Find the collection name from document._id in meteor (mongodb)

From the looks of the syntax for handling mongodb-related things in meteor, it seems that you always need to know the collection's name to update, insert, remove, or do anything else to a document.
What I am wondering is if it's possible to get the collection's name from the _id field of a document in meteor.
For example, say you have a document whose _id is TNTco3bHzoSFMXKJT. Knowing only the _id of the document, you want to find which collection the document is located in. Is this possible through meteor's implementation of mongodb or vanilla mongodb?
As taken from the official docs:
idGeneration String
The method of generating the _id fields of new documents in this collection. Possible values:
'STRING': random strings
'MONGO': random Meteor.Collection.ObjectID values
The default id generation technique is 'STRING'.
Your best option would be to insert records within a pseudo transaction where the second step is to take the id and collection name to feed it into a reference collection. Then, you can do your lookups from that.
It would be pretty costly to construct your finds that way, though, but it might be a pattern worth exploring if you are building an app where your users will be creating arbitrary data patterns.
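A rough sketch of that reference-collection idea, assuming Meteor's classic synchronous collection API (`insert` returning the new `_id`, `findOne` accepting an `_id`). The helper names `insertWithRef` and `collectionNameFor` and the shape of the reference documents are hypothetical. The two inserts are not atomic, which is why it is only a pseudo transaction.

```javascript
// Insert `doc` into `collection`, then record which collection owns the new _id.
// `refs` is assumed to be a dedicated Meteor collection used only as an index.
function insertWithRef(refs, collection, collectionName, doc) {
  var id = collection.insert(doc);                          // step 1: the real insert
  refs.insert({ _id: id, collectionName: collectionName }); // step 2: the reference entry
  return id;
}

// Look up the owning collection's name from a document _id.
function collectionNameFor(refs, id) {
  var ref = refs.findOne(id);
  return ref ? ref.collectionName : undefined;
}
```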
You could accomplish this by doing a findOne on all of the collections:
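var collectionById = function(id) {
  // `this` must be the global object, where the Meteor collections are defined
  var root = this;
  return _.find(_.keys(root), function(name) {
    // keep the first global that is a collection containing a document with this _id
    return root[name] instanceof Meteor.Collection && !!root[name].findOne(id);
  });
};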
I tested this on both the client and the server and it seemed to work when run in the global context.