I'd like to know what cost a reads on Cloud Firestore.
For example, the app is loading an object of a collection and all fields into it. Does it cost 1 read for the object or does it cost 10 reads since there are 10 fields in this object (name, image link, description, uuid, createDates, price , price, price 3 etc) ?
If the answer is 10 (which I supposed it is), there is a possibility to reduce reads by deleting the fields I don't need when using my app (createdates, uuid for example).
Is there any problems doing that?
Also, can I group some of the fields together? (let's say price(string)=price1/price2/price3 and then in my app I say price1 is the first number of price, price2 is the one in the middle and so on.
Will this reduce the reads by 3 for the price?
Thank you very much for theses explanations
Firestore pricing is based on document (object) reads: https://cloud.google.com/firestore/pricing with a minimum charge of one document for every query, even if there are no results.
Since documents contain the key/value pair fields (https://cloud.google.com/firestore/docs/data-model) you should only get charged per document, not per field.
Of course, other costs may come into play, as the documentation notes that larger documents can be slower to retrieve (a cost of latency) and of course larger documents will use more network bandwidth, which can incur a cost in some cases.
There is other guidance on the pricing page about how to reduce costs for large result sets, via the use of cursors, but the costs are still based on documents.
Related
My application is used for creating production budgets for complex projects (construction, media productions etc.)
The structure of the budget is as follows:
The budget contains "sections",
the sections contain "accounts"
the accounts contains "subaccounts"
the subaccounts contain line items.
Line items have a number of fields, (units, rate, currency, tax etc.) and a calculated total
Or perhaps using Firestore to do these cascading calculations is the wrong approach? I should just load a single complex budget document into my app, do all the cacluations and updates on the clients, and then write back the entire budget as a single document when the user presses "save budget"?
Certain fields in line items may have alpha numeric codes which represent numeric values, which a user can use instead of a hard-coded number, e.g. user can enter "=build-weeks" and define that with a formula that evaluates to say "7" which is then used in the calculation of a total.
Line items bubble up their totals, so subaccounts have total equal to the sum of their line items,
Accounts total equals the total of their subaccounts,
sections total equals sum of accounts totals,
and budget total is total of section totals.
The question si how to aggregate into this data into documents comprising the budget.
Budgets may be sort of long, say 5,000 linesitems or more in total. Single accounts may have hundreds of line items.
Users will most likely look at a all of the line items for a given account, so it occurred to me
to make individual documents for sections, accounts and subaccounts, and make line items a map within a sub account.
The problem main concern I have with this approach is that when the user changes, say the exchange rate of currency of a line item, or changes the calculated value of a named value like "build-weeks" I will ahve to retrieve all the individual line items containing that curency or named value, recalculate the total, and then bubble up the changes through the hierarchy.
This seems not that complicated if each line item is its own document, I can just search the collection for the presence of the code in question, recalculate the line item, and use a cloud function to bubble up teh changes maybe.
But if all the lineitems are contained in an array of maps within each subaccount map item,
it seems like it will be quite tedious to find and change them when necessary..
On the other hand -- keeping these documents so small seems like a lot of document reads when somebody is reviewing a budget, or say, printing it, If somebody just clicks on a bunch of accounts, it might be 100's of reads per click to retrieve all the line items and hundreds or a thousand writes when somebody changes the value of a often used named value like "build-weeks".
Does anybody have any thoughts on the obvious "right" answer to this? Or does it just depend on what I want to optimize for - firestore costs, responsiveness of app, complexity of code?
From my stand point, there is no obvious answer to your problem and indeed it does depend on what you want to optimize for.
However there are a few points that you need to consider on your decision:
Documents in Firestore have a limit of 1Mb/Document;
Documents in Firestore have a limit of 20000 fields;
Queries are shallow, so you don't get data from subcollections on the same query;
For considerations 1 and 2, this means that if you choose the design you database to a big document containing everything, even though you said that your app will have lots of data, I doubt that it will be more than the limits mentioned, still, do consider those. Also, how necessary is it to get all the data at once, this could represent performance and user battery/data usage issues (if you are making a mobile app).
For consideration 3, it means that you would have to make many reads if you choose to get all the data for your sections divided in subdocuments, this will mean more cost to you but better performance for users.
To make the right call on this problem I suggest that you talk to possible users of your solution and understand the problem that you are trying to fix and what they expect of the app. Also, it might be interesting to take a look at the How to Structure Your Data and Maps, Arrays and Subcollections videos, as they explain in a more visual way how Firestore behaves and it could be helpful to antecipate problems that the approach you choose could cause.
Hope I was able to help with these considerations.
I am having some trouble which schema design to pick, i have a document which holds user info each user have a very big set of items that can be up to 20k items.
an item have a date and an id and 19 other fields and also an internal array which can have 20-30 items , and it can be modified,deleted and of course newly inserted and queried by any property that it holds.
so i came up with 2 possible schemas.
1.Putting everything into a single docment
{_id:ObjectId("") type:'user' name:'xxx' items:[{.......,internalitems:[]},{.......,internalitems:[]},...]}
{_id:ObjectId("") type:'user' name:'yyy' items:[{.......,internalitems:[]},{.......,internalitems:[]},...]}
2.Seperating the items from the user and letting eachitem have its own
document
{_id:ObjectId(""), type:'user', username:'xxx'}
{_id:ObjectId(""), type:'user', username:'yyy'}
{_id:ObjectId(""), type:'useritem' username:'xxx' item:{.......,internalitems:[]}]}
{_id:ObjectId(""), type:'useritem' username:'xxx' item:{.......,internalitems:[]}]}
{_id:ObjectId(""), type:'useritem' username:'yyy' item:{.......,internalitems:[]}]}
{_id:ObjectId(""), type:'useritem' username:'yyy' item:{.......,internalitems:[]}]}
as i explained before a single user can have thousands of items and i have tens of users, internalitems can have 20-30 items, and it has 9 fields
considering that a single item can be queried by different users and can be modified only by the owner and another process.
if performance is really important which design would you pick?
if you pick neither of them what schema can you suggest?
on a side note i will be sharding and i have a single collection for everything.
I wouldn't recommend the first approach, there is a limit to the maximum document size:
"The maximum BSON document size is 16 megabytes.
The maximum document size helps ensure that a single document cannot use excessive amount of RAM or, during transmission, excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API. See mongofiles and the documentation for your driver for more information about GridFS."
Source: http://docs.mongodb.org/manual/reference/limits/
There is also a performance implication if you exceed the current allocated document space when updating (http://docs.mongodb.org/manual/core/write-performance/ "Document Growth").
Your first solution is susceptible to both of these issues.
The second one is (Disclaimer: In the case of 20-30 internal items) is less susceptible of reaching the limit but still might require reallocation when doing updates. I haven't had this issue with a similar scenario, so this might be the way to go. And you might wanna look into Record Padding(http://docs.mongodb.org/manual/core/record-padding/) for some more details.
And, if all else fails, you can always split the internal items out as well.
Hope this helps!
I'm building an application that uses MongoDB as a database. I have a lot of products, and I want to log what products a user looks at to the user's database entry. For instance, a user profile looks like this:
{
"email" : "foo#bar.com",
"name" : "John Snow",
"_id" : ObjectId("51ecbcc6896652a008000001"),
"productsViewed" : [
product1,
product2,
product3,
product4
]
}
I have two options here. I can log just the _id of each product, or I could log entire objects representing the product (name, price, ~100 word description, categories, that sort of thing). The difference in object size is 1 line of text per product vs about 30 lines per product.
I realise that this is probably a trivial amount of data to be concerned about, but if a user has 10,000 productsViewed entries, will the ~30x larger difference make any sort of impact? Logging more data is far more useful for my purposes but I'd like to avoid my database calls lagging if the user profile becomes quite large.
Question is: At what point (in character length, I guess?) is too much data to store with one MongoDB record?
16 Meg is the limitation for the entire document. This means that all strings etc have to fit within 16 meg. However, before that there are more limitation on your schema which you, yourself hint at:
but if a user has 10,000 productsViewed entries, will the ~30x larger difference make any sort of impact?
And the answer is yes. First off with the added data of the root user you will probably be over the 16 meg limit, however, further on from this the in-memory $pull, $push and other sub document operators might have a hard time keeping peformance up. You can sort of mitigate that problem by batching your subdocuments into groups of 100.
However, yet again, you have an even bigger problem: Fragmentation. Since MongoDB stores the record in a single contigeous space on the disk, hence it has settings like padding, you could see considerable fragmentation from odd sized record objects not being reused here.
I would personally say that you should factor off this relation to a separate collection.
we store apple app data in a database (http://www.apple.com/itunes/affiliates/resources/documentation/itunes-enterprise-partner-feed.html).
we want to optimize for one type of query: find all apps that meet some criteria. criteria: (1) avg rating of app; (2) number of app ratings; (3) devices supported by app; (4) countries where app is sold; (5) current price of app; and (6) date when app went free. the query should be as fast as possible. example query: "find all apps with > 600 ratings, averages 5 stars, supports iPads and iPhones, is sold in the US, and dropped their price to $0.00 two days ago."
based on the apple schema, there is price information for every country. assuming apple supports 100 countries, each app will have 100 prices -- one for each country. we also need to store the historical prices for each app, meaning an app with 10 price changes will have 1000 prices (assuming 100 countries).
three questions:
1) how do you advise we store the price data in mongo to make queries fast? right now, we're thinking of storing prices as an array of objects. each object consists of three elements: (1) date; (2) country; (3) price.
2) if we store price data as objects in an array, what do we need to do to make searches against price data very fast. again, the common price search is something like, "find all apps that dropped their price to $0.00 2 days again in the USA store."
3) any gotchas we should be mindful of in storing the data?
Personally, I would have a separate collection for the daily price data -- 1 record per day per app (the compound natural key), with that day's set of 100 numbers for that app. This way the records will never need to grow or relocate -- that's a big win. With proper indexes, most any query against this collection can be made to perform well. Keep the field names small for more efficient storage.
I would keep a separate collection for the app "master data" -- 1 record per app. In those records you can memoize the most recent date the app went free, a snapshot of the most recent by-country price vector, and similar snapshot values of any other "summary" data that may form the selection criteria for an app search. Aggregations to compute and record such values, should they may become costly, can then be performed in the background at convenient times.
Hope that's a help! Great that you're asking these questions up front. :)
This is related to my last question.
We have an app where we are storing large amounts of data per user. Because of the nature of data, previously we decided to create a new database for each user. This would have required a large no. of databases (probably millions) -- and as someone pointed out in a comment, that this indicated wrong design.
So we changed the design and now we are thinking about storing each user's entire information in one collection. This means one collection exactly maps to one user. Since there are 12,000 collections available per database, we can store 12,000 users per DB (and this limit could be increased).
But, now my question is -- is there any limit on the no. of documents a collection can have. Because of the way we need to store data per user, we expect to have a huge (tens of millions in extreme cases) no. of document per documents. Is that OK for MongoDB and design-wise?
EDIT
Thanks for the answers. I guess then it's OK to use large no of documents per collection.
The app is a specialized inventory control system. Each user has a large no. of little pieces of information related to them. Each piece of information has a category and some related stuff under that category. Moreover, no two collections need to see each other's data -- hence an index that touch more than one collection is not needed.
To adjust the number of collections/indexes you can have (~24k is the limit--~12k is what they say for collections because you have the _id index by default, but keep in mind, if you have more indexes on the collections, that will use namespace up as well), you can use the --nssize option when you start up mongod.
There are plenty of implementations around with billions of documents in a collection (and I'm sure there are several with trillions), so "tens of millions" should be fine. There are some numbers such as counts returned that have constraints of 64 bits, so after you hit 2^64 documents you might find some issues.
What sort of query and update load are you going to be looking at?
Your design still doesn't make much sense. Why store each user in a separate collection?
What indexes do you have on the data? If you are indexing by some field that has content that's common across all the users you'll get a significant saving in total index size by having a single collection with one index.
Index size is often the limiting factor not total database size when it comes to performance.
Why do you have so many documents per user? How large are they?
Craigslist put 2+ billion documents in MongoDB so that shouldn't be an issue if you have the hardware to support it and aren't being inefficient with your indexes.
If you posted more of your schema here you'd probably get better advice.