Firestore Pricing for deletion - google-cloud-firestore

If I perform a query to delete a document that does not exist in my firestore database, will I be charged for the delete operation, read operation or will not be charged at all?

Please refer to the documentation.
If you make a query, you will be charged a read for each document returned by the request, regardless of what you do with it.
If you attempt to delete the document from a query, you will be charged a delete if the delete actually succeeds in deleting that document. Document delete operations that don't actually delete a document are not billed. You will not be charged for an additional read, as described by the docs:
when a document is deleted, you are not charged for a read
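To make that concrete, here is a minimal sketch with the Python client (google-cloud-firestore); the collection and document id are made up for illustration:

    from google.cloud import firestore

    db = firestore.Client()

    # Deleting by direct reference involves no query, so no read is charged.
    # If the document does not exist, the call still succeeds and, per the
    # answer above, a delete that removes nothing is not billed either.
    db.collection("users").document("some-user-id").delete()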

Based on the documentation, it sounds like you will be charged. Even though it doesn't mention "delete" specifically, this line suggests a charge applies: "There is a minimum charge of one document read for each query that you perform, even if the query returns no results." If you receive an error while performing the operation, you should not be charged. I would recommend testing this with very small amounts of data at first just to make sure, though; the fee is only $0.02 per 100,000 deletions.

Related

Firestore pricing on duplicate listeners

If I create multiple onSnapshot listeners for the same document in different places in my code, will I be charged once (one document) or multiple times (for each listener)?
Does it make sense to write a wrapper around Firestore that does this or is this built-in?
As per documentation:
Cloud Firestore allows you to listen to the results of a query and get realtime updates when the query results change.
When you listen to the results of a query, you are charged for a read each time a document in the result set is added or updated. You are also charged for a read when a document is removed from the result set because the document has changed. (In contrast, when a document is deleted, you are not charged for a read.)
Also, if the listener is disconnected for more than 30 minutes (for example, if the user goes offline), you will be charged for reads as if you had issued a brand-new query.
What you decide to do afterwards will heavily depend on your use case and your application needs.
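If you do decide to share one listener among several callers, a rough sketch of such a wrapper with the Python client could look like the following; the DocumentWatcher class and its method names are invented for illustration and are not part of the SDK:

    from google.cloud import firestore

    class DocumentWatcher:
        """Fans one on_snapshot listener out to many local callbacks."""

        def __init__(self, doc_ref):
            self._callbacks = []
            # A single underlying listener, no matter how many callers subscribe.
            self._watch = doc_ref.on_snapshot(self._dispatch)

        def _dispatch(self, snapshots, changes, read_time):
            for callback in self._callbacks:
                callback(snapshots, changes, read_time)

        def subscribe(self, callback):
            self._callbacks.append(callback)

        def close(self):
            self._watch.unsubscribe()

    db = firestore.Client()
    watcher = DocumentWatcher(db.collection("users").document("alice"))
    watcher.subscribe(lambda snapshots, changes, read_time: print(snapshots))

Whether this is worth the extra code depends, as noted above, on your use case.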

Is Google Firestore get() request for a non existing document, charged?

I recently realized that even if a Firestore query doesn't match any document, I will still be charged for 1 read.
In my case, there could be lots of queries for non-existing docs, and I want to avoid this cost.
In my case, the client already has (or can generate locally) the relevant document Id beforehand, but the client still doesn't know if this document exists or not.
So instead of querying and receiving the doc, I can do get(docId)
Question: Does the Firestore charge for replying error to a get() request of the non-existing document?
A get() call for a document that requires the server to read data is charged as a document read. Since the server needs to check whether the document exists, that is a charged read operation (as far as I know).
The documentation on Firestore pricing says:
Minimum charge for queries
There is a minimum charge of one document read for each query that you perform, even if the query returns no results.
So it sounds like you will be charged. The important thing to realize is that the indexes Firestore uses to manage your documents take time and space to maintain, so if you make use of an index, it's reasonable to expect that it's going to cost money because of the resources consumed.
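As a concrete illustration with the Python client, a direct get() on a known document id (the collection and id here are made up) avoids a full query, but per the minimum-charge rule quoted above you should still expect the lookup to be billed as a read:

    from google.cloud import firestore

    db = firestore.Client()
    snapshot = db.collection("orders").document("order-123").get()

    if snapshot.exists:
        print("found:", snapshot.to_dict())
    else:
        # The document is missing, but the lookup is still expected to be
        # billed as a read, per the pricing documentation quoted above.
        print("no such document")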

Firestore - Delete Documents without Read first

Is there a way to mass delete documents in a Collection without getting charged for a 'read' first?
Let's say I have a collection with 1000 documents. I decide I want to delete every document that's older than 1 day. I could use a Query to return a QuerySnapshot that returns [300] documents (DocumentReference). I don't need to read the documents contents (DocumentSnapshot), I just need to delete them.
From what I understand from the pricing documentation, because I returned a QuerySnapshot first, it will charge me for 300 reads, then 300 deletes. It doesn't distinguish between "reading" a DocumentReference and "reading" data in a DocumentSnapshot.
Is there any way to avoid the 300 reads? I can understand that getting back these 300 documents involves effort on Firestore's end to figure out that appropriate subset of documents. But the arbitrary charge of a read no matter if you actually try to get document data (DocumentSnapshot) or not (e.g. just a DocumentReference to delete) seems like it should be possible to avoid.
To delete a document you must have or create a DocumentReference to that document. This requires that you know the complete and exact path to the document.
If you want to delete documents that match a certain condition without already knowing their paths, you will first need to query for those documents to determine those paths/DocumentReferences. This involves reading them. There is no way to avoid this at the moment.
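For reference, here is a rough Python sketch of that query-then-delete flow, assuming a hypothetical items collection with a created field; every matched document is billed as a read, and every delete is billed separately:

    import datetime
    from google.cloud import firestore

    db = firestore.Client()
    cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=1)

    # The query is charged one read per matched document...
    docs = db.collection("items").where("created", "<", cutoff).stream()

    batch = db.batch()
    for doc in docs:
        # ...and each delete is charged as a delete operation.
        batch.delete(doc.reference)
    # Note: a batched write holds at most 500 operations, so chunk larger sets.
    batch.commit()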

mongodb read, copy, process and delete

I have to write an app that constantly polls a MongoDB collection in a given db. If it finds documents, it reads them, copies them to another db, does some extra processing, and deletes them from the original db.
What is the most efficient way to implement this? What are the best practices?
Is it better to process one doc at a time: read one document, copy the document then delete it
or is it better to read all documents, copy all of them, then delete all of them?
What would be the best way to handle failures in the middle of one of these read, write, delete sequences?
Bulk reads, inserts and deletes are almost always more performant than single document actions. But try to limit it to a maximum number of documents, e.g. in our setup 500 seemed to be optimal.
For handling errors, you could use the following pseudo transaction pattern:
findAndModify while setting "state":"pending" for all read documents
process documents
bulk insert
delete all documents with "state":"pending"
If something goes wrong in the processing part or the bulk insert, you can unlock all locked documents and try again.
A more elaborate example of this kind of pseudo transaction can be found in the MongoDB tutorial:
http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
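A rough pymongo sketch of that pattern might look like the following; the database and collection names, the state field, and the batch size are all made up for illustration:

    from pymongo import MongoClient, ReturnDocument

    client = MongoClient()
    source = client.src_db.items
    target = client.dst_db.items

    BATCH = 500  # bulk sizes of a few hundred tended to be optimal per the answer

    # 1. Claim documents one by one with findAndModify, setting state=pending.
    claimed = []
    for _ in range(BATCH):
        doc = source.find_one_and_update(
            {"state": {"$exists": False}},
            {"$set": {"state": "pending"}},
            return_document=ReturnDocument.AFTER,
        )
        if doc is None:
            break
        claimed.append(doc)

    if claimed:
        # 2. Process the documents (omitted) and 3. bulk insert into the target db.
        target.insert_many(claimed)
        # 4. Delete the claimed documents from the source db.
        source.delete_many({"_id": {"$in": [d["_id"] for d in claimed]}})

    # If anything fails before the delete, release the soft lock and retry:
    # source.update_many({"state": "pending"}, {"$unset": {"state": ""}})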

Document DB and simulating ACID

See results at the end
I want to use a document DB (for various reasons) - probably CouchDB or MongoDB. However, I also need ACID on my multiple-document transactions.
However, I do plan on working with an "add-only" model - changes are added as new documents (an add is an add, an update adds a copy + transformed data, a delete adds an empty document with the same ID + a delete flag). Periodically, I'll run compaction on the database to remove non-current documents.
With that in mind, are there any holes in the following idea:
Maintain a collection for current transactions in progress. This collection will hold documents with transaction IDs (GUIDs + timestamp) of transactions in progress.
Atomicity:
On a transaction:
Add a document to the transactions in progress collection.
Add the new documents (add is add, update is copy+add, delete is add with ID and “deleted” flag).
Each added document will have the following management fields:
Transaction ID.
Previous document ID (linked list).
Remove the document added to the transactions in progress collection.
On transaction fail:
Remove all added documents
Remove the document from the transactions in progress collection.
Periodically:
Go over all transactions in progress, get the ones that have been abandoned (>10 minutes?), remove the associated documents in the DB (index on transaction ID) and then remove the transaction in progress.
Read transaction consistency (read only committed transactions):
On data retrieval:
Load transactions in progress set.
Load needed documents.
For all documents, if the document transaction ID is in “transactions in progress” or later (using timestamp), load the previous document in the linked list (recursive).
It’s a bit like MVCC, a bit like Git. I set the retrieval context by the transactions I know managed to finish before I started. I avoid a single sequence (hence single execution) by keeping a list of “ongoing transactions” and not a “transaction revision”. And, of course, I avoid reading non-committed transactions and provide rollback on conflict.
So - are there any holes in this? Will my performance suffer horribly?
Edit1: Please please please - don't hammer the "don't use document database if you need multi-document transactions". I know, I need a document database anyway for other reasons.
Edit2: added timestamp to avoid data from transactions that start after retrieval transaction has started. Possibly could change timestamp to sequence ID.
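For concreteness, a very rough pymongo sketch of the atomicity bookkeeping in the original algorithm above (collection names are made up, and this is not the actual implementation discussed later):

    import datetime
    import uuid

    from pymongo import MongoClient

    db = MongoClient().mydb

    def commit(new_docs):
        txid = str(uuid.uuid4())
        # Register the transaction as in progress.
        db.tx_in_progress.insert_one({"_id": txid,
                                      "started": datetime.datetime.utcnow()})
        try:
            # Add the new document versions, each tagged with the transaction id
            # (previous-document links omitted for brevity).
            for doc in new_docs:
                doc["transaction_id"] = txid
            db.documents.insert_many(new_docs)
            # Success: drop the in-progress marker; the transaction is committed.
            db.tx_in_progress.delete_one({"_id": txid})
        except Exception:
            # Failure: remove whatever was added, then the in-progress marker.
            db.documents.delete_many({"transaction_id": txid})
            db.tx_in_progress.delete_one({"_id": txid})
            raise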
Edit3: Here's another algorithm I thought about - it may be better than the one above:
New algorithm - easier to understand (and possibly correct this time :) )
Support structures:
transaction_support_template {
    _created-by-transaction: <txid>
    _made-obsolete-by-transaction: <txid>
}
transaction_record {
    transaction_id: <txid>
    timestamp: <tx timestamp>
    updated_documents: [doc1_id, doc2_id...]
}
transaction_number { // atomic counter - used for ordering transactions
    _id: "transaction_number"
    next_transaction_id: 0 // initial
}
Note: all IDs are model object IDs, not DB IDs (don't confuse with logical IDs, which are different).
DB ID - different for each document - but multiple DB documents are revisions of one model object.
Model object ID - same for all revisions of the model object.
Logical ID - client-facing ID.
First time setup:
1. Create the transaction_number document.
Commit process:
1. Get new transaction ID by atomic increment on the transaction number counter.
2. Insert a new transaction record with the transaction id, the timestamp and the updated documents.
3. Create the new version for each document. Make sure the _created-by-transaction is set.
4. Mark the old version of each updated or deleted document by setting
"_made-obsolete-by-transaction" to the transaction id.
This is the time to detect conflicts! If a conflict is seen, roll back.
Note - this can be done as find-and-modify rather than by serializing the entire document again.
5. Remove the transaction record.
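A rough pymongo sketch of that commit flow (the counters, transactions, and documents collection names are invented; this is only an illustration of the steps, not the author's implementation):

    import datetime

    from pymongo import MongoClient, ReturnDocument

    db = MongoClient().mydb

    def next_transaction_id():
        # Step 1: atomic increment on the counter document created at setup time.
        counter = db.counters.find_one_and_update(
            {"_id": "transaction_number"},
            {"$inc": {"next_transaction_id": 1}},
            return_document=ReturnDocument.AFTER,
        )
        return counter["next_transaction_id"]

    def commit(new_versions, old_version_ids):
        txid = next_transaction_id()
        # Step 2: record the transaction.
        db.transactions.insert_one({
            "transaction_id": txid,
            "timestamp": datetime.datetime.utcnow(),
            "updated_documents": old_version_ids,
        })
        # Step 3: insert the new versions, tagged with the creating transaction.
        for doc in new_versions:
            doc["_created-by-transaction"] = txid
        db.documents.insert_many(new_versions)
        # Step 4: seal the old versions; doing it with find-and-modify also
        # detects conflicts (another transaction already sealed the document).
        for doc_id in old_version_ids:
            sealed = db.documents.find_one_and_update(
                {"_id": doc_id,
                 "_made-obsolete-by-transaction": {"$exists": False}},
                {"$set": {"_made-obsolete-by-transaction": txid}},
            )
            if sealed is None:
                raise RuntimeError("conflict detected - roll back")
        # Step 5: commit is complete; remove the transaction record.
        db.transactions.delete_one({"transaction_id": txid})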
Cleanup process:
1. Go over the transaction records, sorted by id, ascending (oldest transaction first).
2. For each transaction, if it expired (by timestamp), do rollback(txid).
Rollback(txid) process:
1. Get the transaction record for the given transaction id.
2. For each document id in the "updated documents":
2.1 If the document exists and has "_made-obsolete-by-transaction" with
the correct transaction id, remove the _made-obsolete-by-transaction data.
3. For each document whose _created-by-transaction is the given transaction id:
3.1 remove the document.
4. Remove the transaction record document.
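Continuing the same sketch, the rollback steps might look like this (same invented collection names as above):

    def rollback(txid):
        # Step 1: fetch the transaction record.
        record = db.transactions.find_one({"transaction_id": txid})
        if record is None:
            return
        # Step 2: unseal the old versions this transaction had marked obsolete.
        db.documents.update_many(
            {"_id": {"$in": record["updated_documents"]},
             "_made-obsolete-by-transaction": txid},
            {"$unset": {"_made-obsolete-by-transaction": ""}},
        )
        # Step 3: remove any versions created by this transaction.
        db.documents.delete_many({"_created-by-transaction": txid})
        # Step 4: drop the transaction record itself.
        db.transactions.delete_one({"transaction_id": txid})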
Retrieval process:
1. Top-transaction-id = transaction ID counter.
2. Read all transactions from the transactions collection.
Current-transaction-ids[] = Get all transaction IDs.
3. Retrieve documents as needed. Always use "sort by transaction_id, desc" as the last sort clause.
3.1 If a document "_created-by-transaction-id" is in the Current-transaction-ids[]
or is >= Top-transaction-id - ignore it (not yet committed).
3.2 If a document "_made-obsolete-by-transaction" is not in the Current-transaction-ids[]
and is < Top-transaction-id - ignore it (a newer version was committed).
4. We may have to retrieve more chunks to satisfy original requests if documents were ignored.
Was the document committed when we started?
If we see a document with transaction ID in the current executing transactions - it's a transaction that
started before we started the retrieval but was not yet committed at that time - so we don't want it.
If we see a document with transaction ID >= top transaction ID - it's a transaction that started after
we started the retrieval - so we don't want it.
Is the document up-to-date (latest version)?
If we see a document with made-obsolete that is not in the current transaction IDs (transactions started
before we started) and is < top transaction ID (transactions started after we started) - then
there was a transaction that finished commit in our past that made this document obsolete - so we don't want it.
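Restated as code, the visibility test from steps 3.1 and 3.2 might look like this small helper (a sketch only; field names follow the support structures above):

    def visible(doc, current_tx_ids, top_tx_id):
        created = doc["_created-by-transaction"]
        # 3.1: created by a transaction that was still in progress, or that
        # started after we did - not yet committed from our point of view.
        if created in current_tx_ids or created >= top_tx_id:
            return False
        obsoleted = doc.get("_made-obsolete-by-transaction")
        # 3.2: made obsolete by a transaction that committed before we started -
        # a newer committed version exists, so skip this one.
        if obsoleted is not None and obsoleted not in current_tx_ids and obsoleted < top_tx_id:
            return False
        return True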
Why is sorting not harmed?
Because we add the sort as a last clause, we'll always see the real sorting work first. For each real
sorting "bucket" we might get multiple documents that represent the model object at different versions.
However, the sort order between model objects remains.
Why doesn't the counter make the transactions execute serially (one at a time)?
Because this is not an RDBMS - we don't really have transactions, so we don't wait for the transaction
to commit as we do with "select for update".
Another transaction can make the atomic change as soon as we're done with it.
Compaction:
Once in a while a compaction will have to take place - take all really old documents and move them to another data store.
This shouldn't affect any running retrieval or transaction.
Optimization:
Put the conditions into the query itself.
Add transaction ID to all indexes.
Make sure documents with the same model object ID don't get sharded into different nodes.
What's the cost?
Assuming we want multiple document versions for history and audit anyway, the extra cost is
atomically updating the counter, creating the transaction record, "sealing" the previous version of each model object
(mark obsolete) and removing the transaction document. This shouldn't be too big.
Note that if the above assumption is not valid, the extra cost is quite high, especially for retrieval.
Results:
I've implemented the above algorithm (the revised one, with minor changes). Functionally, it works. However, the performance (at least on MongoDB with 3 nodes in a master-slave replication topology, no fsync but replication required before "commit" ends) is atrocious. I'm constantly reading, from different threads, things I've just written. I'm getting constant collection locks on the transactions collection, and my indexes can't keep up with the constant rollover. Performance is capped at 20 TPS for tiny, tiny transactions with 10 feeder threads.
In short - not a good general purpose solution.
Without going into the specifics of your plan, I thought it might first be useful to go over MongoDB's support for the ACID requirements.
Atomicity: Mongo supports atomic changes for individual documents. Typically, the most significant atomic operations are "$set" and findAndModify. Some documentation on these operations and atomicity in MongoDB in general:
http://www.mongodb.org/display/DOCS/Atomic+Operations
http://www.mongodb.org/display/DOCS/Updating#Updating-%24set
http://www.mongodb.org/display/DOCS/findAndModify+Command
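As a small illustration of those single-document atomic operations (pymongo, with a made-up collection and fields):

    from pymongo import MongoClient, ReturnDocument

    inventory = MongoClient().shop.inventory

    # $set: atomic in-place update of a single document.
    inventory.update_one({"_id": "sku-1"}, {"$set": {"reserved": True}})

    # findAndModify: atomically read and update a single document.
    doc = inventory.find_one_and_update(
        {"_id": "sku-1", "stock": {"$gt": 0}},
        {"$inc": {"stock": -1}},
        return_document=ReturnDocument.AFTER,
    )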
Consistency: Difficult to achieve and quite complex. I won't try to summarize in this post, but there is a great series of posts on the subject:
http://blog.mongodb.org/post/475279604/on-distributed-consistency-part-1
http://blog.mongodb.org/post/498145601/on-distributed-consistency-part-2-some-eventual
Isolation: Isolation in MongoDB does exist for documents, but not for any higher levels. Again, this is a complicated subject; besides the Atomic Operations link above, the best resource I have found is the following Stack Overflow thread:
Why doesn't MongoDB use fsync()? (the top answer is a bit of a goldmine for this subject in general, though some of the information regarding durability is out of date)
Durability: The main way that users ensure data durability is by using the getLastError command (see link below for more info) to confirm that a majority of nodes in a replica set have written the data before the call returns.
http://www.mongodb.org/display/DOCS/getLastError+Command#getLastErrorCommand-majority
http://docs.mongodb.org/manual/core/replication-internals/ (linked to in the above document)
Knowing all this about ACID in MongoDB, it would be very useful to look over some examples of similar problems that have already been worked out in MongoDB. The following two links should be really useful to you, as they are very complete and right on subject.
Two-Phase Commits: http://cookbook.mongodb.org/patterns/perform-two-phase-commits/
Transactions for e-commerce work: http://www.slideshare.net/spf13/mongodb-ecommerce-and-transactions-10524960
Finally, I have to ask: Why do you want to have transactions? It is rare that users of MongoDB find they truly need ACID to achieve their goals. It might be worthwhile stepping back and trying to approach the problem from another perspective before you go ahead and implement a whole layer on top of Mongo just to get transactions.