Getting a doc read count of a sub-collection in Firebase? - google-cloud-firestore

I am trying to change the structure of part of my DB. It is a part that is getting called pretty frequently, so I am afraid the new nesting is going to cost more than one read per call. I want to nest it under a sub-collection so I can protect the whole sub-collection with a premiumUntil value. Will this nesting increase the price, since I am querying one level deep plus checking the premiumUntil value in rules? Would my single call now be considered a double call, one for the 'week' document and one for the 'Weeks' collection?
// Call
let collectionRef = collection(db, 'Weeks', id, 'week');
// get a specific doc in that sub-collection.
// Db Model

The nesting of a document doesn't affect how much it costs to read it. One document read is always going to cost the same, no matter what collection it comes from.
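For example, fetching one document from the 'week' sub-collection in the snippet above is billed as exactly one document read, the same as fetching a top-level document. A minimal sketch with the modular v9 SDK (weekDocId and the placeholder IDs are hypothetical; id comes from the question's snippet):

import { getFirestore, collection, doc, getDoc } from 'firebase/firestore';

const db = getFirestore();
const id = 'someWeeksDocId';        // parent document id, as in the question's snippet
const weekDocId = 'someWeekDocId';  // hypothetical id of a doc inside the sub-collection

// Reference a document at Weeks/{id}/week/{weekDocId}.
const weekDocRef = doc(collection(db, 'Weeks', id, 'week'), weekDocId);

// One getDoc() call on a nested document = one billed document read.
const snapshot = await getDoc(weekDocRef);
console.log(snapshot.exists() ? snapshot.data() : 'No such document');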

Related

MongoDB/Mongoose atomic read & write on single Document

I need to update a Document based on certain criteria with MongoDB/Mongoose.
So I search for the Document in the Collection based on ID.
Once I have the document, I check if it satisfies the condition I'm looking for (values of one of the nested properties).
Once I've confirmed the Document satisfies the criteria, I perform certain actions on the document and then save it.
The whole process of finding the Document, checking for criteria, making adjustments and updating, takes some time to complete.
The problem is that I can trigger the API that runs this process multiple times.
While the entire process is running for the first API call, I can call the API multiple times (I can't control the API calls) and have the entire process running again for the same document.
So now I end up making the same updates twice on the Document.
If the first call runs through successfully, the next call will not, because the first update will ensure the criteria is no longer met. But since the calls arrive while the first one hasn't finished updating, the second one ends up going through successfully as well.
Any way I can perform all the steps as one atomic action?
Well, I'm a year or two late to this party, and this answer I'm about to write has some issues depending on your use case, but for my situation, it has worked pretty well.
Here's the solution with Mongoose / JavaScript:
const findCriteria1 = { _id: theId };
myModel.findOne(findCriteria1, function(error, dataSet) {
  // Only proceed if the document currently has the value we expect.
  if (dataSet.fieldInQuestion === valueIWantFieldToBe) {
    // Include the expected value in the filter so the update only applies
    // if the field has not changed since the findOne() above.
    const findCriteria2 = { _id: theId, fieldInQuestion: dataSet.fieldInQuestion };
    const updateObject = { fieldInQuestion: updateValue };
    myModel.findOneAndUpdate(findCriteria2, updateObject, function(error, updatedDataSet) {
      if (!error) {
        console.log('success');
      }
    });
  }
});
So basically, you find() the document with the value you want, and if it meets conditions, you do a findOneAndUpdate(), making sure that the document value did not change from what it was when you found it (as seen in findCriteria2).
One major issue with this solution is that it is possible for this operation to fail because the document value was updated by another user in between this operation's DB calls. This is unlikely, but may not be acceptable, especially if your DB is being pounded with frequent calls to the same document. A much better solution, if it exists, would be a document lock and update queue, much like most SQL databases can do.
One way to help with that issue would be to wrap the whole solution I gave in a loop, and if the findOneAndUpdate fails, to run the loop again until it succeeds. You could cap how many times the loop retries. An infinite loop is also possible, but that could be really dangerous because it has the potential to totally disable the DB.
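A rough sketch of that retry idea, reusing the names from the snippet above (myModel, theId, valueIWantFieldToBe, updateValue) and assuming Mongoose with async/await; the maxRetries cap is a made-up parameter:

async function updateWithRetry(maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const dataSet = await myModel.findOne({ _id: theId });
    if (!dataSet || dataSet.fieldInQuestion !== valueIWantFieldToBe) {
      return null; // criteria no longer met, nothing to do
    }
    // Conditional update: only matches if the field still holds the value we just read.
    const updated = await myModel.findOneAndUpdate(
      { _id: theId, fieldInQuestion: dataSet.fieldInQuestion },
      { fieldInQuestion: updateValue },
      { new: true }
    );
    if (updated) {
      return updated; // nobody changed the document in between
    }
    // Another writer got in between the read and the update; loop and try again.
  }
  throw new Error('Gave up after ' + maxRetries + ' attempts');
}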
Another issue that my loop idea doesn't solve is that if you need a "first come, first served" model, that is not guaranteed: a DB request that beats the request before it may end up being the one that is "first served".
And, a better idea altogether might just be to change how you model your data. Of course, this depends on what checks you need to run on your data, but you mentioned "values of nested properties" in your question... what if those values were just in a separate document and you could simply check what you needed to on the findOneAndUpdate() criteria parameter?
To operate on a consistent snapshot of the entire database, use a transaction with read concern snapshot.
Transactions do not magically prevent concurrency. The withTransaction helper handles the mechanics of many cases of concurrent modification transparently to the application, but you still need to understand concurrent operations on databases in general to write working, correct code.
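A minimal sketch of that with the Node.js MongoDB driver; the database and collection names and the fieldInQuestion field are placeholders carried over from the answer above, not something prescribed by the driver:

const { MongoClient } = require('mongodb');

async function updateIfCriteriaMet(client, theId, expectedValue, newValue) {
  const session = client.startSession();
  try {
    await session.withTransaction(async () => {
      const coll = client.db('mydb').collection('documents');
      // The read and the write see the same snapshot inside the transaction;
      // conflicting concurrent writes make withTransaction retry the callback.
      const doc = await coll.findOne({ _id: theId }, { session });
      if (doc && doc.fieldInQuestion === expectedValue) {
        await coll.updateOne(
          { _id: theId },
          { $set: { fieldInQuestion: newValue } },
          { session }
        );
      }
    }, { readConcern: { level: 'snapshot' }, writeConcern: { w: 'majority' } });
  } finally {
    await session.endSession();
  }
}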

Firestore Geopoint in an Arrays

I am working with an interesting scenario that I am not sure can work, or will work well. In my current project I am trying to find an efficient way of working with geopoints in Firestore. The straightforward approach, where a document contains a geopoint field, is pretty self-explanatory and easy to query. However, I have to work with a varying number of geopoints for a single document (Article), because a specific piece of content may need to be available in more than one geographic area.
For example, An article may need to be available only in NYC, Denver and Seattle. Using a geopoint for each location and searching by radius, in general, is a pretty standard task if I only wanted the article to be available in Seattle, but now it needs to be available in two more places.
The solution as I see it currently is to use an array and fill it with geopoints. The structure would look something like this:
articleText (String),
sortTime (Timestamp),
tags (Array)
- ['tagA','tagB','tagC','tagD'],
availableLocations (Array)
- [(Geopoint), (Geopoint), (Geopoint), (Geopoint)]
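For reference, writing a document with that shape could look something like this (a sketch with the modular v9 SDK; the articles collection name and the coordinates are made up):

import { getFirestore, collection, addDoc, GeoPoint, Timestamp } from 'firebase/firestore';

const db = getFirestore();

await addDoc(collection(db, 'articles'), {
  articleText: '...',
  sortTime: Timestamp.now(),
  tags: ['tagA', 'tagB', 'tagC', 'tagD'],
  // One entry per geographic area the article should be available in.
  availableLocations: [
    new GeoPoint(40.7128, -74.0060),   // NYC
    new GeoPoint(39.7392, -104.9903),  // Denver
    new GeoPoint(47.6062, -122.3321),  // Seattle
  ],
});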
Then I would perform a query to get all content within 10 miles of a specific geopoint, starting at a specific sortTime.
What I don't know is whether putting the geopoints in an array works well or should be avoided in favor of another data structure.
I have considered replicating an article document for each geopoint, but that does not scale very well if more than a handful of locations need defining. I've also considered creating a "reference" collection where each point is a document that contains the documentID of an article, but this leads to reading each reference document and then reading the actual document: essentially two document reads for one piece of content, which can get expensive under the Firestore pricing model and may slow things down unnecessarily.
Am I approaching this in an acceptable way? And are there other methods that can work more efficiently?

nosql wishlist models - Struggle between reference and embedded document

I have a question about modeling wishlists using MongoDB and Mongoose. The idea is that a user can have many different wishlists which contain many wishes, each wish making a reference to a single article.
I was thinking about it, and because a wishlist only belongs to a single user, I thought about using an embedded document for that.
Same for the wish being embedded in a wishlist.
So I have something like this:
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

// sub-schemas need to be defined before they are embedded
var WishSchema = new Schema({
  // ...
});

var WishlistSchema = new Schema({
  // ...
  wishes: [WishSchema],
  // ...
});

var UserSchema = new Schema({
  // ...
  wishlists: [WishlistSchema],
  // ...
});
But my question is: what do I do with the article? Should I use a reference, or should I copy the article's data into an embedded document?
If I use an embedded document I have an update problem: when the article's price changes, updating every wish referencing this article becomes a struggle. But accessing those wishes' article is a piece of cake.
If I use a reference, the update is not a problem anymore, but I have a problem when I filter the wishes depending on their article criteria (when I filter the wishes by price, category, etc.).
I think the second way is probably the best, but I don't know if it's possible to build a query that filters the wishes depending on the article's fields. I tried a lot of things using population, but nothing works very well when you need to populate depending on a nested object field (for example, getting the wishes whose article satisfies certain conditions).
Is this kind of query doable?
Sorry for the long question and for my bad English, but any advice would be great!
In my experience dealing with NoSQL databases (Mongo, mainly), when designing a collection, do not think of the relations. Instead, think of how you would display, page, and retrieve the documents.
I would prefer embedding, and updating the multiple embedded copies when there's a change, as opposed to using a ref, for several reasons:
- Gets are fast and easy, and filtering is not a problem (like you've said).
- Retrieve operations usually happen a lot more often than updates, and with proper indexing you wouldn't really have to worry about performance.
- It leverages NoSQL's schema-less nature, and you'll be less prone to restructuring due to requirement changes (new sorting, new filters, etc.).
- Paging would be a lot less of a hassle, and the UI would not be restricted in its design by paging and limits.
- Joining could become expensive. Redundant data might be a hassle to update, but that is always better than not being able to display data in a particular way because your schema is normalized and joining is difficult.
I'd say that the rule of thumb is to only split them when you do not need to display them together. It is not impossible to join them back if you do, but it is definitely more troublesome.
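As a sketch of what the embedding approach could look like here, reusing the schemas from the question; the shape of the embedded article snapshot (title, price, category) and the User model are assumptions, not a fixed recipe:

var WishSchema = new Schema({
  // Denormalized snapshot of the referenced article; keeping the article _id
  // makes it possible to find and refresh the copies when the article changes.
  article: {
    _id: Schema.Types.ObjectId,
    title: String,
    price: Number,
    category: String
  }
});

// Filtering by article criteria is then a plain query on the embedded fields,
// no populate() needed. E.g. users having a wish on an article cheaper than 20:
User.find({ 'wishlists.wishes.article.price': { $lt: 20 } });

// The trade-off: when an article's price changes, every embedded copy has to be
// updated (a multi-document update across the users collection).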

Detecting concurrent data modification of document between read and write

I'm interested in a scenario where a document is fetched from the database, some computations are run based on some external conditions, one of the fields of the document gets updated and then the document gets saved, all in a system that might have concurrent threads accessing the DB.
To make it easier to understand, here's a very simplistic example. Suppose I have the following document:
{
  ...
  items_average: 1234,
  last_10_items: [10, 2187, 2133, ...],
  ...
}
Suppose a new item (X) comes in; five things will need to be done:
1. read the document from the DB
2. remove the first (oldest) item in last_10_items
3. add X to the end of the array
4. re-compute the average* and save it in items_average
5. write the document to the DB
* NOTE: the average computation was chosen as a very simple example, but the question should take into account more complex operations based on data existing in the document and on new data (i.e. not something solvable with the $inc operator)
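In code, the single-threaded version of those five steps might look like this (a sketch assuming a Node.js MongoDB driver collection; coll and statsId are made-up names):

async function addItem(coll, statsId, x) {
  // 1. read the document
  const doc = await coll.findOne({ _id: statsId });

  // 2-3. drop the oldest item, append the new one
  const items = doc.last_10_items.slice(1).concat(x);

  // 4. re-compute the derived value (stands in for any more complex computation)
  const average = items.reduce((sum, v) => sum + v, 0) / items.length;

  // 5. write the document back
  await coll.updateOne(
    { _id: statsId },
    { $set: { last_10_items: items, items_average: average } }
  );
}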
This certainly is something easy to implement in a single-threaded system, but in a concurrent system, if 2 threads follow the above steps, inconsistencies might occur, since both will update the last_10_items and items_average values without taking the concurrent changes into account, overwriting each other's work.
So, my question is how such a scenario can be handled. Is there a way to check, or react upon, the fact that the underlying document was changed between steps 1 and 5? Is there such a thing as WATCH from Redis or a 'Concurrent Modification Error' from relational DBs?
Thanks
In database systems, this is handled with a memory-inspection and rollback scheme similar to transactional memory.
Briefly speaking, it monitors the shared memory parts you specified and does something like compare-and-swap, load-link/store-conditional, or test-and-set.
Therefore, if any monitored content is changed during the transaction, it will abort and try again until there is no conflicting operation on that shared memory.
For example,GCC implements the following:
https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
type __sync_lock_test_and_set (type *ptr, type value, ...)
type __sync_val_compare_and_swap (type *ptr, type oldval, type newval, ...)
For more info about transactional memory,
http://en.wikipedia.org/wiki/Software_transactional_memory
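The same compare-and-swap idea can be applied in application code: make the write conditional on the value you read still being in place, and retry on conflict. A sketch against the example document above, assuming a MongoDB driver collection (all names are made up):

async function addItemCAS(coll, statsId, x, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const doc = await coll.findOne({ _id: statsId });
    const items = doc.last_10_items.slice(1).concat(x);
    const average = items.reduce((sum, v) => sum + v, 0) / items.length;

    // "Compare and swap": the filter only matches if last_10_items is still
    // exactly what we read, so a concurrent change makes this update a no-op.
    const result = await coll.updateOne(
      { _id: statsId, last_10_items: doc.last_10_items },
      { $set: { last_10_items: items, items_average: average } }
    );
    if (result.matchedCount === 1) {
      return; // no conflicting write happened between our read and write
    }
    // Someone else modified the document in between; re-read and try again.
  }
  throw new Error('Too many conflicting concurrent writers');
}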

MongoDB Schema Design With or Without Null Placeholders

I am still getting used to using a schema-less, document-oriented database, and I am wondering what the generally accepted practice is regarding schema design within an application model.
Specifically, I'm wondering whether it is good practice to enforce a schema within the application model when saving to MongoDB, like this:
{
  _id: "foobar",
  name: "John",
  billing: {
    address: "8237 Landeau Lane",
    city: "Eden Prairie",
    state: "MN",
    postal: null
  },
  balance: null,
  last_activity: null
}
versus only storing the fields that are used like this:
{
  _id: "foobar",
  name: "John",
  billing: {
    address: "8237 Landeau Lane",
    city: "Eden Prairie",
    state: "MN"
  }
}
The former is self-descriptive, which I like, while the latter makes no assumptions about the mutability of the model schema.
I like the first option because it makes it easy to see at a glance what fields are used by the model yet currently unspecified, but it seems like it would be a hassle to update every document to reflect a new schema design if I wanted to add an extra field, like favorite_color.
How do most veteran mongodb users handle this?
I would suggest the second approach.
You can always see the intended structure if you look at your entity class in the source code. Or do you use a dynamic language and don't create an entity?
You save a lot of space per record, because you don't have to store the names of the null columns. This may not matter on small collections, but on large ones, with millions of records, I would even go as far as shortening the names of the fields.
As you already mentioned, by specifying optional column names you create a pattern which, if you want to follow it, forces you to update all existing records whenever you add a new field. This is, again, a bad idea for a big DB.
In any case, it all comes down to your DB size. If you aren't targeting many GBs or TBs of data, then both approaches are fine. But if you predict that your DB may grow really large, I would do anything to cut the size. Spending 30-40% of storage on column names is a bad idea.
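To illustrate the field-name-shortening point, the trimmed document from the question might end up looking something like this (the abbreviations are just an example):

{
  _id: "foobar",
  n: "John",                   // name
  b: {                         // billing
    a: "8237 Landeau Lane",    // address
    c: "Eden Prairie",         // city
    s: "MN"                    // state
  }
}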
I prefer the first option: it is easier to code within the application and requires far fewer state holders and functions to understand how things should work.
As for adding a new field over time: you don't need to update all your records to support the new field like you would in SQL. All you need to do is add the new field to your application-side model and support it being null if it is not returned from MongoDB.
A good example is in PHP.
I have a class of user at first with only one field, name
class User {
    public $name;
}
6 months down the line and 60,000 users later I want to add, say, address. All I do is add that variable to my application model:
class User {
    public $name;
    public $address = array();
}
This now works exactly like adding a new null field to SQL without having to actually add it to every row on-demand.
It is a very reactive design, don't update what you don't need to. If that row gets used it will get updated, if not then who cares.
So eventually your rows actually become a mix and match between option 1 and 2 but it is really a reactive option 1.
Edit
On the storage side you have also got to think of pre-allocation and movement of documents.
Say the fields that are actually set now make up only a third of the doc, but then suddenly, because the user updates the doc with all of the fields, you get extra fragmentation from the movement of your docs.
Normally when you are defining a schema like this you are defining one that will eventually grow and apply to that user in most cases (much like an SQL schema does).
This is something to take into consideration: even though storage might be lower in the short term, it could cause fragmentation and slow querying due to that fragmentation, and you could easily find yourself having to run compact or repairDatabase because of the problems you now face.
I should mention that both of those commands are not designed to be run regularly and cause a significant performance hit while they run on a production environment.
So really with the structure above you don't need to add a new field across all documents and you will most likely get less movement and problems in the long run.
You can fix the performance problems of consistently growing documents by using power-of-2-sizes padding, but this is collection-wide, which means that even your fully filled documents will use up at least double their previous space, and your small documents will probably be using as much space as your full documents would have at a padding factor of 1.
Aka you lose space, not gain it.
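If you do want to try that, the setting is applied per collection with collMod from the mongo shell (this applies to the MMAPv1-era storage engine; 'users' is a placeholder collection name):

// Switch an existing collection to power-of-2 record allocation.
db.runCommand({ collMod: "users", usePowerOf2Sizes: true });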