I have a companies collection that has three companies (at the moment). There is also a second collection called campaigns. Each campaign belongs to a company, and each campaign has different fields depending on which company it belongs to. Because MongoDB is schemaless, does that mean I can add whatever fields I want to the collection? For example, if I created a template form for each company, could I save whichever template was selected?
Short answer
Yes.
Longer answer
While MongoDB does not enforce a schema (save for the required _id field), it is good to have at least some schema to your documents. If you store extremely different documents in a single collection, it can make it very challenging to handle all the different document forms in your code. This usually makes development much harder.
Typically companies have common data associated with them, so you will still have a common schema between them, and each company can expand upon that base schema. Just be sure your code handles fields that may be missing or null.
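As a rough sketch of what "base schema plus company-specific extras" can look like in application code: plain Python dicts stand in for the documents here, and every field name (discount_pct, channels, and so on) is made up for illustration rather than taken from the question.

```python
# Plain dicts stand in for MongoDB documents; all field names are hypothetical.
campaigns = [
    {"_id": 1, "company": "acme", "name": "Spring sale", "discount_pct": 15},
    {"_id": 2, "company": "globex", "name": "Launch", "channels": ["email", "sms"]},
    {"_id": 3, "company": "initech", "name": "Survey"},
]


def describe(campaign):
    """Build a one-line summary that tolerates missing company-specific fields."""
    # Fields from the shared base schema are read directly.
    parts = [f"{campaign['company']}: {campaign['name']}"]
    # Company-specific fields may be absent, so use .get() instead of indexing.
    discount = campaign.get("discount_pct")
    if discount is not None:
        parts.append(f"{discount}% off")
    channels = campaign.get("channels", [])
    if channels:
        parts.append("via " + ", ".join(channels))
    return " | ".join(parts)


summaries = [describe(c) for c in campaigns]
```

The point is only that the shared fields can be relied on, while anything template-specific is treated as optional.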
Related
I'm new to NoSQL modeling and I'm currently facing a problem that I don't know how to solve.
Say I have a calendar and some people are allowed to see certain events. These people are categorised into 3 groups. In SQL, I would've given each event an integer and I would've made a bit-wise comparator. In NoSQL (Firestore in this case), I need to specify certain rules but, somehow I can't forbid someone to view a certain entry in a document. I have an idea on how to solve this, but it seems very... ineffective. Namely, make a collection where all the events are stored (only accessible by the admin) and based on the entries, update 3 documents in which the events are stored as well.
Is there a better method? I'm a bit new to this but it feels very bad.
Reads in Cloud Firestore are performed at the document level, meaning you either retrieve the full document, or you retrieve nothing. There is no way to retrieve a partial document. You cannot rely solely on the security rules to prevent users from reading a specific field.
If you want certain fields to be hidden from some users, then you have to put them in a separate document. You might consider creating a document in a private subcollection. And then write security rules that have different levels of access for the two collections.
You can refer to Control Access to Specific Field for more information and an example.
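A minimal sketch of the split described above, with dicts standing in for Firestore documents. The field names, the set of private fields, and the document paths are all assumptions for illustration. The security rules themselves are not shown.

```python
# Hypothetical field names; dicts stand in for Firestore documents.
PRIVATE_FIELDS = {"attendee_notes", "budget"}


def split_event(event):
    """Return (public_doc, private_doc) so each can live at its own path."""
    public = {k: v for k, v in event.items() if k not in PRIVATE_FIELDS}
    private = {k: v for k, v in event.items() if k in PRIVATE_FIELDS}
    return public, private


event = {"title": "Board meeting", "date": "2024-05-01", "budget": 5000}
public_doc, private_doc = split_event(event)

# Assumed layout: the public part at events/{id}, the restricted part in a
# private subcollection, each covered by its own security rule.
writes = {
    "events/e1": public_doc,
    "events/e1/private/details": private_doc,
}
```

Because reads are per document, a user who is allowed to read events/e1 but not the private subcollection never receives the restricted fields.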
OK, so the more I develop in MongoDB, the more I wonder about the need for multiple collections versus one large collection with indexes (since fields can differ from document to document, unlike tabular data). If I am trying to develop in the most efficient way possible (meaning less code and more reusable code), can I use one collection for all documents and just index on a field? With all documents in one collection, I can reuse my form-processing code and other code, since it will all be inserting into the same collection.
For Example:
Let's say I am developing a contact manager and I have two types of contacts, "individuals" and "businesses". My original thought was to create a collection called individuals and a second collection called businesses. But that was because I'm used to developing in SQL, where this would be appropriate since the columns would be different for each table. The more I thought about the flexibility of document databases, the more I started to ask, "do I really need two collections for this?" If I just add a field called "contact type" to each document and index on that, do I really need two collections? Since the fields in each document do not have to be the same (unlike in SQL), each document can have its own fields as long as it has a "document type" field with an index on it.
So then I took that concept further: if I only need one collection for "individuals" and "businesses", do I even need a separate collection for "Users" or "Contact History" or any other data? In theory, couldn't I build the entire solution in one collection and just have an indexed field in each document that specifies the "type", such as "Users", "Individual Contact", "Business Contacts", "Contact History", etc.? And if a document is related to another document, I can index on the "parent key/foreign ID" field...
This would allow me to code the front end dynamically, since the form-processing code would all be the same (inserting into the same collection). This would save a lot of coding, but I want to make sure that, with indexes and secondary indexes, the database would still run fast and not cause problems as the collection grew. As you can imagine, if everything were in one collection there might be hundreds of thousands or even millions of documents in it as the user base grows, but it would have indexes and secondary indexes to optimize performance.
My question is: is this a common method among MongoDB developers? Why or why not? What are the downsides, if any? If it is commonly used, please also give any positives of this method. Thank you.
This is a really big point in Mongo and the answer is a little bit more of an art than science. Having one collection full of gigantic documents is definitely an anti-pattern because it works against many of Mongo's features.
For instance, when retrieving documents, you can only retrieve a whole document out of a collection (not entirely true, but mostly). So if you have huge documents, you're retrieving huge documents each time. Also, having huge documents makes sharding less flexible since only the top level documents are indexed (and hence, sharded) in each collection. You can index values deep into a document, but the index value is associated with the top level document.
At the same time, going purely relational is also an anti-pattern because you've lost a lot of the referential integrity by going to Mongo in the first place. Also, all joins are done in application memory, so each one requires a full round-trip (slow).
So the answer is to do something in between. I'm thinking you'll probably want a collection for individuals and a different collection for businesses in this case. I say this because it seems like businesses have enough metadata associated with them that they could bulk up a lot. (Also, the individual-business relationship seems like a many-to-many.) However, an individual might have a Name object (with first and last properties). It would be a bad idea to make Name into a separate collection.
Some info from 10gen about schema design: http://www.mongodb.org/display/DOCS/Schema+Design
EDIT
Also, Mongo has limited support for transactions, in the form of atomic aggregates. When you insert an object into Mongo, the entire object is either inserted or not inserted. So if your application domain requires consistency between certain objects, you probably want to keep them in the same document.
For example, consider an application that requires that a User always has a Name object (containing FirstName, LastName, and MiddleInitial). If a User was somehow inserted with no corresponding Name, the data would be considered to be corrupted. In an RDBMS you would wrap a transaction around the operations to insert User and Name. In Mongo, we make sure Name is in the same document (aggregate) as User to achieve the same effect.
Your example is a little less clear, since I don't understand the business cases. One thing that does come to mind is that Mongo has excellent support for inheritance. It might make sense to put all users, individuals, and potentially businesses into the same collection (depending on how the application is modeled). If one individual has many contacts, you probably want individuals to have an array of IDs. If your application requires that you get a quick preview of contacts, you might consider duplicating part of an individual and storing an array of contact objects.
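To make the last two options concrete, here is a small sketch with dicts standing in for documents. The names and fields are hypothetical. Option A keeps only an array of contact IDs on the individual, and option B duplicates just enough of each contact to render a preview without a second read.

```python
# Dicts stand in for documents; names and fields are hypothetical.
contacts = {
    "c1": {"_id": "c1", "name": "Ada Lovelace", "email": "ada@example.com"},
    "c2": {"_id": "c2", "name": "Alan Turing", "email": "alan@example.com"},
}

# Option A: the individual stores only contact IDs (needs a second lookup).
individual_by_ids = {"_id": "i1", "name": "Grace Hopper", "contact_ids": ["c1", "c2"]}

# Option B: the individual stores a small duplicated preview of each contact.
individual_with_preview = {
    "_id": "i1",
    "name": "Grace Hopper",
    "contacts": [{"_id": c["_id"], "name": c["name"]} for c in contacts.values()],
}


def preview_names_by_ids(individual, contact_store):
    """Resolve the ID array against the contact store (the extra read)."""
    return [contact_store[cid]["name"] for cid in individual["contact_ids"]]


def preview_names_embedded(individual):
    """Read names straight from the duplicated preview (no extra read)."""
    return [c["name"] for c in individual["contacts"]]
```

The duplicated preview is faster to read but has to be kept in sync whenever a contact's name changes, which is the consistency trade-off discussed above.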
If you're used to RDBMS thinking, you probably think all your data always has to be consistent. The truth is, that's probably not entirely true. This concept of applying atomic aggregates to the domain has been preached heavily by the DDD community recently. When you look at your domain in depth, like your business users do, the consistency boundaries should become distinct.
MongoDB, and NoSQL in general, is about de-normalising data and about reducing joins. It goes against normal SQL thinking.
In your case, I don't see any reason why you would want to have separate collections, because it introduces unnecessary complexity and performance overhead. Consider, for example, a screen that displays all contacts in alphabetical order. If you have one single collection for contacts, then it's really easy, but if you have two collections it becomes a more complicated proposition.
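A quick sketch of the single-collection idea, using a list of dicts in place of a real collection. The contact_type field and the other field names are assumptions for illustration.

```python
# A list of dicts stands in for one "contacts" collection; fields are hypothetical.
contacts = [
    {"name": "Zed Shaw", "contact_type": "individual"},
    {"name": "Acme Corp", "contact_type": "business", "tax_id": "12-3456789"},
    {"name": "Mia Wong", "contact_type": "individual", "birthday": "1990-04-02"},
]


def all_contacts_sorted(collection):
    """One sort over one collection gives the combined alphabetical list."""
    return sorted(collection, key=lambda c: c["name"].lower())


def contacts_of_type(collection, contact_type):
    """Filtering on the type field recovers the per-type view."""
    return [c for c in collection if c["contact_type"] == contact_type]


names = [c["name"] for c in all_contacts_sorted(contacts)]
businesses = [c["name"] for c in contacts_of_type(contacts, "business")]
```

With two collections, the combined alphabetical list would need two queries and a merge in application code.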
Where I would have multiple collections is if your application had multiple users storing contacts. I would then have one collection for each user. This makes it easy to extract that user's contacts.
I'm creating a web application with a dynamic survey creation & submission component. I'm using MongoDB to store the schema of the form and the form submissions.
I can imagine organizing this in several different ways:
1. Having all form submissions and form schemas as documents in a single collection.
2. Have separate collections for all form schemas and all form submissions.
3. Have a separate collection for all form schemas, and create a new collection for all submissions of a form for each schema.
I'm still researching this. I come from the world of RDBMS and I'm a noob to NoSQL databases. Does anyone have any advice?
EDIT 1
Forgot to address embedding the responses as a property within the form schema document.
Having all form submissions and form schemas as documents in a single collection.
You will want to avoid this one (#1). The simple reason here is that the form submission has a different role than the form schema. Mixing these in the same collection will make it more difficult to query.
Have separate collections for all form schemas and all form submissions
To clarify, it sounds like you're suggesting two collections: schema and submission.
This is a logical way to proceed. You will have one small schema collection and one large submission collection.
The key limitation will be the queries you make against that submission collection. Are you planning to query "across types"? Or are major queries centered about "submission type"?
If you end up including "submission type" on every query, then it makes sense to...
Have a separate collection for all form schemas, and create a new collection for all submissions of a form for each schema.
The reason for this is simply the indexes. If you have one collection, you will need an index on "type". So by making separate collections, you can save an index. However, if you ever end up needing the sharding features, this can make for lots of collections to manage.
Of course, you can work around this "extra index", by being creative with the _id. MongoDB has an auto-generated ObjectId that it will use by default, kind of like an auto-increment ID. However, you can override this and create a smarter _id, something like submissionid_userid.
My preference is honestly for the last option. But really #2 & #3 are both good options, really just an issue of trade-offs in terms of code complexity and management complexity.
I'd go for two collections: form and submissions.
This approach scales well horizontally, as you only have two collections to worry about.
I agree with @Gates VP about providing a custom _id rather than the default ObjectId, as you are spared the need for an extra index.
On the submissions collection, if you set the _id format to formID_userID, then to get all the submissions for a form all you'd need to do is:
db.submissions.find({'_id': /^formID_/})
The bonus here is that the anchored regex will use the _id_ index, so it's efficient.
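To see why a prefix match on a composite _id is cheap, here is a small sketch that imitates an ordered index with a sorted Python list and two binary searches. The ids are made up, and this only illustrates the idea of a range scan over sorted keys, not how MongoDB implements it internally.

```python
import bisect

# Hypothetical _id values in the formID_userID format, kept sorted the way an
# ordered index would keep them.
ids = sorted(["form1_alice", "form1_bob", "form10_carol", "form2_alice", "form2_dave"])


def ids_with_prefix(sorted_ids, prefix):
    """Return all ids starting with prefix using two binary searches."""
    lo = bisect.bisect_left(sorted_ids, prefix)
    # "\uffff" sorts after any character used in these ids, so it bounds the range.
    hi = bisect.bisect_right(sorted_ids, prefix + "\uffff")
    return sorted_ids[lo:hi]


# Including the "_" separator in the prefix keeps form10 out of form1's results.
form1_submissions = ids_with_prefix(ids, "form1_")
without_separator = ids_with_prefix(ids, "form1")
```

Note the second call: without the trailing separator, the prefix "form1" also matches "form10_carol", so the separator belongs in the prefix.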
For general reference and others stumbling upon this: there are some good presentations about schema design - that are worth checking out:
http://www.10gen.com/presentations/mongodb-tokyo-2012/basic-application-and-schema-design
http://www.10gen.com/presentations/mongosv-2011/schema-design-principles-and-practice
I'd like to code a web app where most of the sections depend on the user profile (for example, different to-do lists per person) and I'd love to use MongoDB. I was thinking of creating about 10 embeddable documents for the main profile document and keeping everything related to one user inside their own document.
I don't see a clear way of using foreign keys for mongodb, the only way would be to create a field to_do_id with the type of ObjectId for example, but they would be totally unrelated internally, just happen to have the same Ids I'd have to query for.
Is there a limit on the number of embedded document types inside a top level document that could degrade performance?
How do you guys solve the issue of having a central profile document that most of the documents have to relate to in presenting a view per person?
Do you use semi foreign keys inside MongoDb and have fields with ObjectId types that would have some other document's unique Id instead of embedding them?
I can't get a feel for which approach should be taken when. Thank you very much!
There is no special limit with respect to performance. The larger the document, though, the longer it takes to transmit over the wire. The whole document is always retrieved.
I do it with references. You can choose between simple manual references and the database DBRef as per this page: http://www.mongodb.org/display/DOCS/Database+References
The link above documents how to have references in a document in a semi-foreign key way. The DBRef might be good for what you are trying to do, but the simple manual reference is very efficient.
I am not sure a general rule of thumb exists for which reference approach is best. Since I use Java or Groovy mostly, I like the fact that I get a DBRef object returned. I can check for this datatype and use that to decide how to handle the reference in a generic way.
So I tend to use a simple manual reference for references to different documents in the same collection, and a DBRef for references across collections.
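For illustration, here is a rough sketch of the two reference styles using an in-memory dict in place of a database. A manual reference is just the other document's _id stored in a field, while a DBRef-style value also names the target collection through its $ref and $id fields. The collection and field names are hypothetical, and the resolve helper is my own, not a driver API.

```python
# In-memory stand-in for a database: {collection_name: {_id: document}}.
# Collection and field names are hypothetical.
db = {
    "users": {
        "u1": {"_id": "u1", "name": "Ada", "manager_id": "u2"},  # manual reference
        "u2": {"_id": "u2", "name": "Grace"},
    },
    "todos": {
        "t1": {
            "_id": "t1",
            "title": "Write report",
            # DBRef-style reference: names the target collection as well.
            "owner": {"$ref": "users", "$id": "u1"},
        },
    },
}


def is_dbref(value):
    """Detect a DBRef-style value by its $ref and $id keys."""
    return isinstance(value, dict) and "$ref" in value and "$id" in value


def resolve(value, default_collection):
    """Follow either a DBRef-style dict or a bare id (manual reference)."""
    if is_dbref(value):
        return db[value["$ref"]][value["$id"]]
    return db[default_collection][value]


owner = resolve(db["todos"]["t1"]["owner"], default_collection="todos")
manager = resolve(owner["manager_id"], default_collection="users")
```

The type check in is_dbref is what allows references to be handled generically: the code does not need to know in advance which collection a DBRef points to.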
I hope that helps.
I'm slightly embarrassed to admit it, but I'm having trouble conceptualizing how to architect data in a non-relational world. Especially given that most document/KV stores have slightly different features.
I'd like to learn from a concrete example, but I haven't been able to find anyone discussing how you would architect, for example, a blog using CouchDB/Redis/MongoDB/Riak/etc.
There are a number of questions which I think are important:
Which bits of data should be denormalised (e.g. tags probably live with the document, but what about users)
How do you link between documents?
What's the best way to create aggregate views, especially ones which require sorting (such as a blog index)
First of all, I think you would want to remove Redis from the list, as it is a key-value store rather than a document store. Riak is also a key-value store, but it can be used as a document store with a library like Ripple.
In brief, to model an application with document store is to figure out:
What data you would store in its own document and have other documents relate to. If a piece of data is going to be used by many other documents, then it makes sense to model it as its own document. You must also consider how you will query the documents. If you are going to query something often, it might be a good idea to store it in its own document, as embedded documents are harder to query.
For example, assuming you have multiple Blog instances, Blog and Article should each be in their own document, even though an Article could be embedded inside a Blog document.
Another example is User and Role. It makes sense to have separate documents for these. In my case I often query over users, and that is easier when each user is its own document.
What data you would want to store (embed) inside another document. If a piece of data belongs solely to one document, then it 'might' be a good option to embed it in that document.
Comments sometimes make more sense embedded inside another document:
{ article : { comments : [{ content: 'yada yada', timestamp: '20/11/2010' }] } }
Another caveat to consider is how large the embedded data will grow, because MongoDB caps the size of a single document, including everything embedded in it, at 16 MB.
What data should be a plain Array. e.g:
Tags would make sense to be stored as an array. { article: { tags: ['news','bar'] } }
Or if you want to store multiple ids, i.e User with multiple roles { user: { role_ids: [1,2,3]}}
This is a brief overview about modelling with document store. Good luck.
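Putting the three cases side by side, a minimal sketch with dicts standing in for documents might look like this. Every name here is made up for illustration.

```python
# Dicts stand in for documents; every name here is made up for illustration.
article = {
    "_id": "a1",
    "title": "Hello world",
    "tags": ["news", "bar"],  # plain array of values
    "comments": [  # embedded: comments belong to exactly one article
        {"content": "yada yada", "timestamp": "2010-11-20"},
    ],
}

# Roles are shared by many users, so they live in their own documents.
roles = {
    1: {"_id": 1, "name": "admin"},
    2: {"_id": 2, "name": "editor"},
    3: {"_id": 3, "name": "reader"},
}
user = {"_id": "u1", "name": "Ada", "role_ids": [1, 3]}  # array of ids


def add_comment(doc, content, timestamp):
    """Append a comment to the article's embedded comment list."""
    doc["comments"].append({"content": content, "timestamp": timestamp})


def role_names(user_doc, role_store):
    """Resolve the user's role ids against the separate role documents."""
    return [role_store[rid]["name"] for rid in user_doc["role_ids"]]


add_comment(article, "nice post", "2010-11-21")
names = role_names(user, roles)
```

Reading the article brings its tags and comments along for free, while the user's roles cost one extra lookup per id.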
Deciding which objects should be independent and which should be embedded as part of other objects is mostly a matter of balancing read/write performance/effort - If a child object is independent, updating it means changing only one document but when reading the parent object you have only ids and need additional queries to get the data. If the child object is embedded, all the data is right there when you read the parent document, but making a change requires finding all the documents that use that object.
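The write side of that trade-off can be sketched as follows, again with dicts standing in for documents and hypothetical names. Renaming a referenced author touches one document, while renaming an embedded author touches every post that carries a copy.

```python
# Hypothetical data; dicts and lists stand in for collections.
authors = {"au1": {"_id": "au1", "name": "Ada"}}

# Referenced: posts hold only the author's id.
posts_ref = [
    {"_id": "p1", "author_id": "au1"},
    {"_id": "p2", "author_id": "au1"},
]

# Embedded: each post carries its own copy of the author.
posts_emb = [
    {"_id": "p1", "author": {"_id": "au1", "name": "Ada"}},
    {"_id": "p2", "author": {"_id": "au1", "name": "Ada"}},
]


def rename_referenced(author_store, author_id, new_name):
    """Update the single author document; returns documents written."""
    author_store[author_id]["name"] = new_name
    return 1


def rename_embedded(posts, author_id, new_name):
    """Update every embedded copy; returns documents written."""
    touched = 0
    for post in posts:
        if post["author"]["_id"] == author_id:
            post["author"]["name"] = new_name
            touched += 1
    return touched


written_ref = rename_referenced(authors, "au1", "Ada L.")
written_emb = rename_embedded(posts_emb, "au1", "Ada L.")
```

The read side is the mirror image: the embedded posts already contain the name, while the referenced posts need a lookup in authors.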
Linking between documents isn't much different from SQL - you store an ID which is used to find the appropriate record. The key difference is that instead of filtering the child table to find records by parent id, you have a list of child ids in the parent document. For many-many relationships you would have a list of ids on both sides rather than a table in the middle.
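As a sketch of the many-to-many case, here each side keeps a list of the other side's ids instead of a join table. Users and groups are hypothetical stand-ins, and the helper names are my own.

```python
# Hypothetical users and groups; dicts stand in for two collections.
users = {
    "u1": {"_id": "u1", "name": "Ada", "group_ids": []},
    "u2": {"_id": "u2", "name": "Grace", "group_ids": []},
}
groups = {
    "g1": {"_id": "g1", "name": "admins", "user_ids": []},
    "g2": {"_id": "g2", "name": "editors", "user_ids": []},
}


def add_member(user_id, group_id):
    """Record the link on both sides, since there is no table in the middle."""
    if group_id not in users[user_id]["group_ids"]:
        users[user_id]["group_ids"].append(group_id)
    if user_id not in groups[group_id]["user_ids"]:
        groups[group_id]["user_ids"].append(user_id)


def groups_of(user_id):
    """Follow the id list stored on the user."""
    return [groups[gid]["name"] for gid in users[user_id]["group_ids"]]


def members_of(group_id):
    """Follow the id list stored on the group."""
    return [users[uid]["name"] for uid in groups[group_id]["user_ids"]]


add_member("u1", "g1")
add_member("u1", "g2")
add_member("u2", "g1")
```

Both directions can be read without scanning the other collection, at the cost of writing the link twice.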
Query capabilities vary a lot between platforms so there isn't a clear answer for how to approach this. However as a general rule you will usually be setting up views/indexes when the document is written rather than just storing the document and running ad-hoc queries later as you would with SQL.
Ryan Bates made a screencast a couple of weeks ago about Mongoid, and he uses the example of a blog application (http://railscasts.com/episodes/238-mongoid). This might be a good place for you to get started.