I want to restrict access to sensitive attributes of my Users documents to a smaller set of clients. My current understanding is that there are two ways to split the data, so that we can make security rules for each part:
Create a Users collection and a top level SensitiveUserData collection that both use the same document ID, and only retrieve the SensitiveUserData for a user when needed and allowed.
Create a SensitiveUserData subcollection within the User document. This collection will always contain just a single document, but the ID won't matter.
Which of these (or a third) is preferred in general?
Neither of these approaches is inherently better than the other, and both have valid use-cases. In the end it comes down to a combination of personal preference and a (typically evolving) insight into the use-cases of your app.
In many scenarios, using subcollections is preferred as it allows the data to be better spread out over the physical storage, which in turn helps throughput. But in this case I doubt that makes a difference, as you're likely to use the user ID as keys in both SensitiveUserData and Users collections, so they'll be similarly distributed anyway.
For me personally, I often end up with a top-level collection. But that may well be related to my long history of modeling data in the Firebase Realtime Database, where access permission is inherited, so you can't hide a subcollection there.
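To make the two layouts from the question concrete, here is a minimal sketch of the document paths each one produces. The helper names and the default subcollection document ID "data" are hypothetical, chosen only for illustration.

```python
def top_level_paths(uid: str) -> dict[str, str]:
    """Option 1: two top-level collections that share the same document ID."""
    return {
        "public": f"Users/{uid}",
        "sensitive": f"SensitiveUserData/{uid}",
    }


def subcollection_paths(uid: str, doc_id: str = "data") -> dict[str, str]:
    """Option 2: a single-document subcollection under the user document.

    The document ID inside the subcollection does not matter, so "data"
    is just a placeholder.
    """
    return {
        "public": f"Users/{uid}",
        "sensitive": f"Users/{uid}/SensitiveUserData/{doc_id}",
    }
```

In both layouts the sensitive path is derived from the user ID, which is why the two are likely to be distributed similarly.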
Related
I'm new to NoSQL modeling and I am currently confronted with a problem that I do not know how to solve.
Say I have a calendar and some people are allowed to see certain events. These people are categorised into 3 groups. In SQL, I would've given each event an integer and I would've made a bit-wise comparator. In NoSQL (Firestore in this case), I need to specify certain rules but, somehow I can't forbid someone to view a certain entry in a document. I have an idea on how to solve this, but it seems very... ineffective. Namely, make a collection where all the events are stored (only accessible by the admin) and based on the entries, update 3 documents in which the events are stored as well.
Is there a better method? I'm a bit new to this but it feels very bad.
Reads in Cloud Firestore are performed at the document level, meaning you either retrieve the full document, or you retrieve nothing. There is no way to retrieve a partial document. You cannot rely solely on the security rules to prevent users from reading a specific field.
If you want certain fields to be hidden from some users, then you have to put them in a separate document. You might consider creating a document in a private subcollection. And then write security rules that have different levels of access for the two collections.
You can refer to Control Access to Specific Field for more information and an example.
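As a rough sketch of the split described above, the helper below separates one event into a public part and a private part before the two are written to separate documents. The field names and the function name are assumptions made for illustration; the actual writes and the security rules are left out.

```python
def split_document(doc: dict, private_fields: set[str]) -> tuple[dict, dict]:
    """Split one logical record into a public and a private document."""
    public = {k: v for k, v in doc.items() if k not in private_fields}
    private = {k: v for k, v in doc.items() if k in private_fields}
    return public, private


# Hypothetical event; "attendee_notes" should only be readable by some users.
event = {"title": "Board meeting", "date": "2024-05-01", "attendee_notes": "confidential"}
public_part, private_part = split_document(event, {"attendee_notes"})
```

The public part would go into the main document and the private part into a document in the private subcollection, each covered by its own rule.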
What are the general limits, if any, of a very large frequently used collection (loads of writes and reads to the collection at the same time) in firestore?
Say you have an app where a user can scroll through a list of users. The information about each user is stored in a document in a collection. Now imagine that a lot of new users are constantly being created, and a lot of users are scrolling through the current list of users at the same time, i.e. reading from the collection. Furthermore, a lot of users search the collection of users using different fields (name, interests, pets, etc.) and use the indexes of Firestore.
If all of these things happened at the same time, would it affect the performance of firestore clientwise? Would it be necessary to create multiple smaller collections containing users? My question is to be understood as an extreme case.
If all of these things happened at the same time, would it affect the performance of firestore clientwise? Would it be necessary to create multiple smaller collections containing users?
Yes. And yes.
Firestore is susceptible to hotspots on writes. If many clients write to the same collection and/or to the same index, you may experience delays in processing of those writes. For the exact limits, see the documentation on limits on writes and transactions.
Creating separate collections to shard out those writes is the common solution for this, although you can also accept the delay in processing.
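One possible way to spread writes over several collections is to derive the collection name from a hash of the document ID. This is only a sketch: the shard count, the "users_" prefix, and the choice of hashing are assumptions, not something the answer prescribes.

```python
import hashlib

NUM_SHARDS = 4  # assumed value; pick it based on your expected write rate


def shard_collection(user_id: str, num_shards: int = NUM_SHARDS) -> str:
    """Map a user ID to one of num_shards collection names, deterministically."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % num_shards
    return f"users_{index}"


def all_shard_collections(num_shards: int = NUM_SHARDS) -> list[str]:
    """List every shard, since a query over all users must read each one."""
    return [f"users_{i}" for i in range(num_shards)]
```

The trade-off is on the read side: listing or searching all users now means querying every shard and merging the results in the client.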
I'm writing an application that gathers statistics of users across multiple social networks accounts. I have a collection of users and I would like to store the statistics information of each user.
Now, I have two options:
Create a collection that stores user statistics documents, and add a reference to each user document that links it to the corresponding document in the statistics collection.
Embed a statistics document in each user document.
Besides query performance (which I'm less concerned about):
what are the pros and cons of each of these approaches?
What should I take into account if I choose to use references rather than embedding the information inside the user document?
The shape of the data is determined by the application itself.
There’s a good chance that when you are working with the user data, you also need the statistics details.
The decision about what to put in the document is pretty much determined by how the data is used by the application.
Data that is used together with the user documents is a good candidate to be pre-joined or embedded.
One of the limitations of this approach is the size of the document: a single document can be at most 16 MB.
Another approach is to split data between multiple collections.
One of the limitations of this approach is that MongoDB does not enforce constraints between collections, so there are no foreign key constraints either.
The database does not guarantee consistency of the data. It is up to you as a programmer to take care that your data has no orphans.
Data from multiple collections can be joined with the $lookup aggregation stage. But a collection is a separate file on disk, so reading from multiple collections means seeking in multiple files, and that is, as you are probably guessing, slow.
Generally speaking, embedded data is the preferable approach.
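The two shapes, and the orphan problem that comes with references, can be sketched with plain dictionaries. All field names here (statistics, statistics_id, followers, posts) are hypothetical, and the orphan check runs in application code because the database will not do it for you.

```python
# Embedded: the statistics live inside the user document.
embedded_user = {"_id": 1, "name": "Ann", "statistics": {"followers": 120, "posts": 14}}

# Referenced: the user document only stores the ID of a statistics document.
users = [{"_id": 1, "name": "Ann", "statistics_id": 10}]
statistics = [
    {"_id": 10, "followers": 120, "posts": 14},
    {"_id": 11, "followers": 5, "posts": 1},  # no user points at this one
]


def find_orphans(users: list[dict], statistics: list[dict]) -> list[dict]:
    """Return statistics documents that no user document references."""
    referenced = {u["statistics_id"] for u in users if "statistics_id" in u}
    return [s for s in statistics if s["_id"] not in referenced]
```

With the embedded shape there is nothing to check, because deleting the user deletes the statistics with it.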
I am new to MongoDB so I apologize if these questions are simple.
I am developing an application that will track specific user interactions and put information about the user and the interactions into a MongoDB. There are several types of interactions that will all collect different information from the user.
My first question is: Should all of these interactions be in the same collection or should I separate them out by types (as you would do in a RDBMS)?
Additionally I would like to be able to look up:
All the interactions a specific user has made
All the users that have made a specific interaction
I was thinking of putting a Manual reference to an interaction document for each interaction a user performs in his document and a manual reference to the user that performed the interaction in each interaction document.
My second question is: Does this "doubling up" of Manual references make sense or is there a better way to do this?
Any thoughts would be greatly appreciated.
Thank you!
My first question is: Should all of these interactions be in the same collection or should I separate them out by types (as you would do in a RDBMS)?
Without knowing too much about your data size, write volume, read volume, querying needs etc., I would say: yes, all in one collection.
I am not sure if separating them out is how I would design this in a RDBMS either.
"Does this "doubling up" of Manual references make sense or is there a better way to do this?"
No, it doesn't seem like sound database design to me.
Putting a user_id on each document in the interactions collection sounds good enough.
So when you want to get all of a user's interactions, you just query the interactions collection by user_id.
When you want to do it the other way around, you query for all interactions that match your criteria, pull out their user_ids, and then run an $in query on the user collection.
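The second lookup is a two-step process, which might look roughly like this. The sketch only builds the filter document for the second query; the interaction documents are assumed to have already been fetched, and the function name is hypothetical.

```python
def users_filter_from_interactions(interactions: list[dict]) -> dict:
    """Build the filter for the user collection from matching interactions.

    Step 1 (not shown) fetched the interactions. Step 2 collects the distinct
    user_ids and wraps them in an $in filter on _id.
    """
    user_ids = sorted({i["user_id"] for i in interactions})
    return {"_id": {"$in": user_ids}}


# Hypothetical result of step 1: every interaction of type "share".
matching = [
    {"type": "share", "user_id": 7},
    {"type": "share", "user_id": 3},
    {"type": "share", "user_id": 7},
]
user_filter = users_filter_from_interactions(matching)
```

The resulting filter can then be passed to a find on the user collection.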
My first question is: Should all of these interactions be in the same collection or should I separate them out by types (as you would do in a RDBMS)?
The greatest advantage of a document store over a relational database is precisely that you can do that. Put all different interactions into one collection and don't be afraid to give them different sets of fields.
Additionally I would like to be able to look up:
All the interactions a specific user has made
I was thinking of putting a Manual reference to an interaction document for each interaction a user performs in his document and a manual reference to the user that performed the interaction in each interaction document.
Note that it's usually not a good idea to have documents which grow indefinitely. MongoDB has an upper limit for document size (16 MB). MongoDB isn't good at handling large documents, because documents are loaded completely into the RAM cache. When you have many large objects, not much will fit into the cache. Also, when documents grow, they sometimes need to be moved to another location on disk, which slows down updates (that also interferes with natural ordering, but you shouldn't rely on that anyway).
All the users that have made a specific interaction
Are you referring to a specific interaction instance (assuming that multiple users can be part of one interaction) or all users which already performed a specific interaction type?
In the latter case I would add an array of performed interaction types to the user document, because otherwise you would have to perform a join-like operation, which would either require a MapReduce or some application-sided logic.
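A minimal sketch of that array approach, expressed as the filter and update documents you would send to MongoDB. The field name interaction_types and both function names are assumptions. $addToSet keeps the array free of duplicates, and a plain equality filter on an array field matches documents whose array contains that value.

```python
def record_interaction_type(user_id, interaction_type: str) -> tuple[dict, dict]:
    """Filter and update documents that add a type to the user's array once."""
    filter_doc = {"_id": user_id}
    update_doc = {"$addToSet": {"interaction_types": interaction_type}}
    return filter_doc, update_doc


def users_with_interaction_type(interaction_type: str) -> dict:
    """Filter matching every user whose array contains the given type."""
    return {"interaction_types": interaction_type}
```

Because the array holds interaction types rather than individual interactions, its size stays bounded by the number of types.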
In the first case I would, contrary to what Sammaye suggests, recommend referencing not the _id field of the user collection, but rather the username. When you use an index with the unique flag on user.username, it's just as fast as searching by user._id and uniqueness is guaranteed.
The reason is that when you search for the interactions of a specific user, it's more likely that you know the username and not the id. When you only have the username and you are referencing the user by id, you first have to search the users collection to get the _id for that username, which is an additional database query.
This of course assumes that you don't always have the user._id at hand. When you do, you can of course use _id as reference.
We have two very similar data types that are both "users". The first one consists of active users and the other has users that are automatically extracted and pulled into our system and have a much lower priority (in terms of speed of access) than active users.
Every active user has the potential to bring in at least 1000 data-mined users. We'll be using the active users much more frequently and performance is our primary concern. With the data-mined users, performance is secondary but we will be storing large quantities of them.
Any input on how we should be handling this? Either one collection for every user (both active and data-mined), or two collections (one for active, one for data-mined users)?
Mongo is great for storing similar, but different, objects in the same collection as long as your app can handle it.
Are the data-mined users a child of the active users? If so, then you would probably want to keep them embedded in the active users documents. You don't need to access them all the time: MongoDB allows you to fetch parts of a document if you don't need the whole thing.
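Fetching only part of a document is done with a projection. The sketch below imitates a top-level inclusion projection locally so it can run without a database; the field names are hypothetical, and the helper is a stand-in for what the server does, not a MongoDB API.

```python
def apply_inclusion_projection(doc: dict, projection: dict) -> dict:
    """Local stand-in for a top-level inclusion projection.

    Fields flagged with 1 are kept; _id is kept unless explicitly set to 0.
    """
    keep = {field for field, flag in projection.items() if flag}
    if projection.get("_id", 1):
        keep.add("_id")
    return {k: v for k, v in doc.items() if k in keep}


# Hypothetical active user with embedded data-mined users.
active_user = {
    "_id": 1,
    "name": "Ann",
    "mined_users": [{"name": "x1"}, {"name": "x2"}],
}
# Ask only for the name, so the large embedded array is not returned.
light_view = apply_inclusion_projection(active_user, {"name": 1})
```

With a real driver the same projection document would be passed alongside the query filter.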
Will you be querying them differently? If so, you may want to keep them separate so that your indexes do not become bloated.
Will you be querying either of them with queries that will not hit indexes? If so, you will want to separate them so that you don't need to do full collection scans every time.