Difference between .where() and .doc() related to number of reads - google-cloud-firestore

If I get a document with .collection('users').where('uid', isEqualTo: widget.uid) or with .collection('users').doc(widget.uid), will I read the database the same number of times?

You are charged one read for every document that matches your query when it is executed. It does not matter how you construct the query - what matters is the number of documents returned to the app.
If your where query returns exactly one document, it is no different in price or performance from a lookup using doc.
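To make the comparison concrete, here is a minimal sketch using the Firebase modular Web SDK (v9+); the collection name and the uid field come from the question, and someUid is a placeholder for widget.uid:

import { initializeApp } from "firebase/app";
import { getFirestore, collection, query, where, getDocs, doc, getDoc } from "firebase/firestore";

const db = getFirestore(initializeApp({ /* your project config */ }));
const someUid = "abc123"; // placeholder for widget.uid

// Query form: matches every document whose 'uid' field equals the value.
// Billed one read per matching document (here, presumably exactly one).
const snap = await getDocs(query(collection(db, "users"), where("uid", "==", someUid)));

// Direct-reference form: fetches one document by its ID.
// Billed one read if the document exists.
const docSnap = await getDoc(doc(db, "users", someUid));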

The former builds a query (in the modular JavaScript SDK, where() produces a QueryFieldFilterConstraint), while the latter returns a DocumentReference. The DocumentReference is preferable because it can be used to write, read, or listen to that single location.
Naturally, a query must match against every id in the referenced collection (via its index), while a reference points to exactly one id. On pricing, the aggregateQueries documentation is currently returning a page-not-found error, but see Understand Cloud Firestore billing:
For aggregation queries such as count(), you are charged one document read for each batch of up to 1000 index entries matched by the query. For aggregation queries that match 0 index entries, there is a minimum charge of one document read.
For example, count() operations that match between 0 and 1000 index entries are billed for one document read. For a count() operation that matches 1500 index entries, you are billed 2 document reads.
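As a hedged illustration of that billing rule, a count() aggregation in the modular Web SDK (v9.11+) looks like this, reusing the users collection from above:

import { getFirestore, collection, getCountFromServer } from "firebase/firestore";

const db = getFirestore();
const snapshot = await getCountFromServer(collection(db, "users"));
console.log(snapshot.data().count);
// Reads billed per the quote above: ceil(matched index entries / 1000),
// with a minimum of 1 - e.g. 1500 matched index entries cost 2 reads.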

Related

Is sharding necessary for subcollections in Cloud Firestore?

I have read in the documentation that writes to a collection can be limited to 500 per second if the collection contains sequential values with an index.
I am saving sequential timestamps in a subcollection.
I would like to know whether I need a shard field in this specific case to increase the maximum writes per second.
I am only using "normal" collection indexes, no collection group index.
Some additional explanations:
I have a top-level collection called "companies", and under every document is a subcollection called "orders". Every order document has a timestamp "created". This field is indexed, and I need this index. These orders could be created very frequently.

I am sure that the 500-writes-per-second limit would apply to this setup. But I wonder whether every "orders" subcollection would have its own limit of 500 writes per second, or whether all subcollections share one limit. I could add a shard field to avoid the write limit, as stated in the documentation (see the sketch below), but if every subcollection gets its own limit, this would not be necessary in my specific case; 500 writes per second per subcollection would be more than enough.

I think that every subcollection has its own index as long as I am not using a collection group index, and therefore the server should be able to split the data across multiple servers if necessary. But maybe I am wrong. I can't find any concrete information on that in the documentation or on the internet.
Screenshots from the database show the root collection "companies" with a subcollection "orders".
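A minimal sketch of the shard-field pattern mentioned in the documentation, assuming the collection layout above; companyId is a placeholder and five shards is an arbitrary choice (whether this is needed here is exactly what the question asks):

import { getFirestore, collection, addDoc, serverTimestamp } from "firebase/firestore";

const db = getFirestore();
const companyId = "acme"; // placeholder

// Writing each order with a random shard value breaks up the monotonically
// increasing index entries produced by the sequential timestamp.
await addDoc(collection(db, "companies", companyId, "orders"), {
  created: serverTimestamp(),
  shard: Math.floor(Math.random() * 5), // 5 shards, per the docs' sharded-timestamps guidance
});

Note that queries ordered by "created" then have to account for the shard field (the documentation describes combining results across shard values with a composite index).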

Firebase read calculations for document queries?

Quick question about how firestore reads are calculated. Say I have a collection with 100 items in it, and I do
citiesRef.order(by: "name").limit(to: 3)
This would technically have to look at all 100 items, order them by name, and then return 3. Would this count as 3 reads, or as 100 reads, since we're looking at 100 items?
Thanks.
If the above query returns 3 documents then it would count as 3 reads.
You are charged for each document read, write, and delete that you perform with Cloud Firestore.
Charges for writes and deletes are straightforward. For writes, each set or update operation counts as a single write.
Charges for reads have some nuances that you should keep in mind. The following sections explain these nuances in detail.
https://firebase.google.com/docs/firestore/pricing#operations
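For reference, a hedged TypeScript equivalent of the Swift query above (modular Web SDK; the collection name comes from the question):

import { getFirestore, collection, query, orderBy, limit, getDocs } from "firebase/firestore";

const db = getFirestore();
const snap = await getDocs(query(collection(db, "cities"), orderBy("name"), limit(3)));
// snap.size === 3 -> billed 3 reads. The ordering is resolved against the
// index, so the other 97 documents are never fetched or billed.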

What does nscannedObjects = 0 actually mean?

As far as I understand, the nscannedObjects entry in the explain() output means the number of documents that MongoDB had to fetch from disk.
My question is: when this value is 0, what does that actually mean beyond the explanation above? Does MongoDB keep a cache with some documents stored in it?
nscannedObjects=0 means that there was no fetching or filtering needed to satisfy your query; the query was resolved solely from indexes. So, for example, if you were to query for {_id:10} and there were no matching documents, you would get nscannedObjects=0.
It has nothing to do with the data being in memory, there is no such distinction with the query plan.
Note that in MongoDB 3.0 and later nscanned and nscannedObjects are now called totalKeysExamined and totalDocsExamined, which is a little more self-explanatory.
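A minimal sketch of checking this with the Node.js driver (the database and collection names are placeholders); in MongoDB 3.0+ the renamed fields appear under executionStats:

import { MongoClient } from "mongodb";

interface UserDoc { _id: number; name?: string; }

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const db = client.db("test"); // placeholder database

// A query on _id that matches nothing is resolved entirely from the index:
const stats = await db.collection<UserDoc>("users").find({ _id: 10 }).explain("executionStats");
console.log(stats.executionStats.totalKeysExamined); // index entries examined
console.log(stats.executionStats.totalDocsExamined); // documents fetched - the old nscannedObjects

await client.close();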
Mongo is a document database, which means that it can interpret the structure of the stored documents (unlike for example key-value stores).
One particular advantage of that approach is that you can build indices on the documents in the database.
An index is a data structure (usually a variant of a b-tree) that allows fast searching of documents based on some of their attributes (for example id (!= _id) or some other distinctive feature). Indexes are usually stored in memory, allowing very fast access to them.
When you search for documents based on indexed attributes (let's say id > 50), Mongo doesn't need to fetch the documents from memory/disk/whatever - it can see which documents match the criteria based solely on the index (note that fetching something from disk is several orders of magnitude slower than a memory lookup, even with no cache). The only time it actually goes to the disk is when you need to fetch the document for further processing (which is not covered by the statistic you cited).
Indexes are crucial to achieving high performance, but they also have drawbacks (for example, a rarely used index can slow down inserts without being worth it - after each insertion the index has to be updated).

How to paginate query results in NoSQL databases when there are no unique fields included in the projection?

I've heard using MongoDB's skip() to batch query results is a bad idea because it can lead to the server becoming IO bound as it has to 'walk through' all the results. I want to only return a maximum of 200 documents at a time, and then the user will be able to fetch the next 200 if they want (assuming they haven't limited it to less).
Initially I read up on paginating results and most things I read said the easiest way in MongoDB at least is to modify the query criteria to simulate skipping through.
For example, if a field called accNumber on the last document is 28022004, then the next query should include "accNumber > 28022004" in its criteria. But what if no unique fields are included in the projection? And what if the user wants to sort the records by a non-unique field?
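One common way to handle the non-unique case is to add _id as a tiebreaker, making the compound sort key unique. A hedged sketch with the Node.js driver (collection and field names are placeholders):

import { MongoClient, ObjectId } from "mongodb";

interface Account { _id: ObjectId; name: string; accNumber: number; }

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const accounts = client.db("test").collection<Account>("accounts"); // placeholders

// lastName / lastId come from the final document of the previous page.
const lastName = "Smith";
const lastId = new ObjectId("64a000000000000000000000"); // placeholder

const nextPage = await accounts
  .find({
    $or: [
      { name: { $gt: lastName } },               // strictly after the last name
      { name: lastName, _id: { $gt: lastId } },  // tiebreak equal names by _id
    ],
  })
  .sort({ name: 1, _id: 1 }) // _id makes the compound sort key unique
  .limit(200)
  .toArray();

await client.close();

Because _id always appears last in the sort, the page boundary is deterministic and skip() is never needed.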

Limit the number of documents in a MongoDB collection, without FIFO policy

I'm building an application to handle ticket sales and expect to have really high demand. I want to try using MongoDB with multiple concurrent client nodes serving a node.js website (and gracefully handle failure of clients).
I've read "Limit the number of documents in a collection in mongodb" (which is completely unrelated) and "Is there a way to limit the number of records in certain collection" (but that talks about capped collections, where the new documents overwrite the oldest documents).
Is it possible to limit the number of documents in a collection to some maximum size, and have documents beyond that limit simply be rejected? The simple example is adding ticket sales to the database, and failing if all the tickets are already sold out.
I considered having a NumberRemaining document, which I could atomically decrement until it reaches 0, but that leaves me with a problem if a node crashes between decrementing that number and saving the purchase of the ticket.
Store the tickets in a single MongoDB document. Updates to a single document are atomic, so you avoid the cross-document consistency problems that would otherwise call for a traditional transactional database system.
Since a document can be up to 16MB, by storing only a ticket_id per entry in a master document you should be able to hold plenty of tickets without any extra complex document management. While this could introduce a hot spot, the document likely won't be very large. If it does get large, you could use more than one document: as one document "fills", activate another.
If that doesn't work, 10gen has a pattern that might fit.
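A minimal sketch of this pattern with the Node.js driver (event and field names are placeholders). The counter and the sold ticket ids live in one document, so the decrement and the ticket record are written in a single atomic operation, closing the crash window described in the question:

import { MongoClient } from "mongodb";

interface EventDoc { _id: string; remaining: number; ticketIds: string[]; }

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const events = client.db("test").collection<EventDoc>("events"); // placeholders

// One-time setup: a master document holding the capacity and the sold tickets.
await events.insertOne({ _id: "concert", remaining: 500, ticketIds: [] });

// Selling a ticket: the filter rejects the update once remaining hits 0,
// and the decrement plus the push happen in one atomic document update.
const result = await events.updateOne(
  { _id: "concert", remaining: { $gt: 0 } },
  { $inc: { remaining: -1 }, $push: { ticketIds: "ticket-0001" } }
);
if (result.modifiedCount === 0) console.log("sold out");

await client.close();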
My only solution so far (I'm hoping someone can improve on this):
Insert documents into an un-capped collection as they arrive. Keep the implicit ObjectId _id value, which can be sorted and will therefore order the documents by when they were added.
Run all queries ordered by _id and limited to the max number of documents.
To determine whether an insert was "successful", run an additional query that checks that the newly inserted document is within the maximum number of documents.
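A hedged sketch of that rank check with the Node.js driver (the collection name and MAX are placeholders):

import { MongoClient, ObjectId } from "mongodb";

interface Ticket { _id?: ObjectId; user: string; }

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const tickets = client.db("test").collection<Ticket>("tickets"); // placeholder
const MAX = 500; // maximum number of accepted documents

// ObjectId _ids increase (roughly) monotonically, so sorting by _id orders
// documents by when they were added.
const { insertedId } = await tickets.insertOne({ user: "alice" });

// Accept the insert only if this document ranks within the first MAX by _id;
// otherwise treat it as rejected (and optionally delete it again).
const rank = await tickets.countDocuments({ _id: { $lte: insertedId } });
console.log(rank <= MAX ? "ticket accepted" : "sold out");

await client.close();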
My solution was: I use an extra count variable in another collection. That collection has a validation rule that prevents the count from becoming negative; the count must always be a non-negative integer:
"count": { "$gte": 0 }
The algorithm is simple: decrement the count by one. If that succeeds, insert the document. If it fails, there is no space left.
Vice versa for deletion.
You can also use transactions to guard against partial failures (where the count is decremented but the service fails just before the insert operation).
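A minimal sketch of this counter-with-validator approach (names are placeholders; the validator makes the decrement fail once the count would go negative):

import { MongoClient } from "mongodb";

interface Counter { _id: string; count: number; }

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const db = client.db("test"); // placeholder

// One-time setup: the validator rejects any write that would make count negative.
await db.createCollection("counters", { validator: { count: { $gte: 0 } } });
const counters = db.collection<Counter>("counters");
await counters.insertOne({ _id: "tickets", count: 500 });

try {
  // Throws a document-validation error once the count would drop below 0.
  await counters.updateOne({ _id: "tickets" }, { $inc: { count: -1 } });
  await db.collection("tickets").insertOne({ user: "alice" }); // record the sale
} catch (err) {
  console.log("sold out");
}

await client.close();

Wrapping the decrement and the insert in session.withTransaction() would close the remaining gap the answer mentions, where the count is decremented but the process dies before the insert.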