Quote from MongoDB Architecture Guide
Developers are working with applications that create massive volumes
of new, rapidly changing data types — structured, semi-structured,
unstructured and polymorphic data.
What is polymorphic data? Please explain for someone with a SQL background.
Document-oriented databases are schemaless. That means the database doesn't enforce a schema on the data, but each document has its own schema/structure. Polymorphic data means that within one collection you have many versions of the document schema (e.g. different field types, fields that occur only in some documents, etc.).
For example, in the documents below the email field is either a string or an array of strings:
{
    "user": "Anna",
    "email": "anna@gmail.com"
}
{
    "user": "Jon",
    "email": [
        "jon@gmail.com",
        "jon@yahoo.com"
    ]
}
Related
If I have a collection and put some data in it, just like
{
"_id" : ObjectId("xxxxxxxxxxxxxxxxxxxxx"),
"name" : "Tom",
"age" : 22,
"job":"engineer"
}
When I use MySQL, I can use the command 'desc table' to list all fields. So when I switch to MongoDB, how can I list all the fields, so that I get something like '_id, name, age, job'?
db.collection.insertOne({"name" : "Tom","age" : 22,"job":"engineer"})
This is a simple insert; run it in the command prompt and it will add the data exactly as above.
That's one of the major conceptual differences between relational databases like MySQL and NoSQL databases such as MongoDB.
In relational databases you have tables with columns; each record in such a table has exactly those columns.
In NoSQL databases you just have documents, and (by default) these documents do not enforce any particular structure. Thus there is no way to retrieve the field structure of such documents, as in principle it can be different for each document.
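There is no exact equivalent of 'desc table', but a common workaround is to inspect the keys of one sample document; in the mongo shell that would be Object.keys(db.collection.findOne()). A plain-JavaScript sketch of the same idea (the _id string here is a made-up placeholder, and of course the result only reflects that one document, not the whole collection):

```javascript
// Stand-in for a document returned by db.collection.findOne().
const doc = { _id: "507f1f77bcf86cd799439011", name: "Tom", age: 22, job: "engineer" };

// Object.keys lists the top-level field names of this one document.
const fields = Object.keys(doc);
console.log(fields.join(", ")); // _id, name, age, job
```

Because other documents in the same collection may have different fields, tools that need the full field set typically scan many documents and merge the key sets.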
I use Mongo for saving product attributes. Now the question is, I want to save the attributes like this:
{
    title: "new product",
    ...
    ...
    attr: [
        { name: "color", value: "red" },
        { name: "size", value: 6 },
        ....
    ]
}
Now if I create an index on the value field, is that bad index design?
Should I separate the integer fields from the string fields and then index them separately?
MongoDB's dynamic schema supports variance of data types across the documents in a collection. Regarding the concern above about storing values with different data types in one field: yes, it can be stored that way, and this flexibility is part of what MongoDB is designed for.
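As a sketch of how such mixed-type attributes are typically queried, here is an in-memory stand-in for db.products.find({ attr: { $elemMatch: { name: "color", value: "red" } } }) (the collection name products and the second document are made up for illustration):

```javascript
// Two product documents following the attribute pattern from the question.
const products = [
  { title: "new product", attr: [{ name: "color", value: "red" }, { name: "size", value: 6 }] },
  { title: "other product", attr: [{ name: "color", value: "blue" }] },
];

// $elemMatch semantics: a document matches if ONE array element
// satisfies all the conditions at once.
const matches = products.filter(p =>
  p.attr.some(a => a.name === "color" && a.value === "red"));
console.log(matches.map(p => p.title)); // [ 'new product' ]
```

With this pattern, the usual design is a single compound multikey index covering both the name and the value, e.g. db.products.createIndex({ "attr.name": 1, "attr.value": 1 }), rather than separate indexes per value type.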
I have a relational SQL DB that's being changed to MongoDB. In SQL there are 3 tables that are relevant: Farm, Division, Wombat (names and purpose changed for this question). There's also a Farmer table which is the equivalent of a users table.
Using Mongoose I've come up with this new schema:
var mongoose = require('mongoose');
var farmSchema = new mongoose.Schema({
// reference to the farmer collection's _id key
farmerId: mongoose.Schema.ObjectId,
name: String, // name of farm
division: [{
divisionId: mongoose.Schema.ObjectId,
name: String,
wombats: [{
wombatId: mongoose.Schema.ObjectId,
name: String,
weight: Number
}]
}]
});
Each of the (now) nested collections has a unique field in it. This will allow me to use Ajax to send just the uniqueId and the weight (for example) to adjust that value instead of updating the entire document when only the weight changes.
This feels like an incorrect SQL adaptation for MongoDB. Is there a better way to do this?
In general, I believe that people tend to embed way too much when using MongoDB.
The most important argument is that having different writers to the same objects makes things a lot more complicated. Working with arrays and embedded objects can be tricky and some modifications are impossible, for instance because there's no positional operator matching in nested arrays.
For your particular scenario, take note that unique array keys might not behave as expected, and that behavior might change in future releases.
It's often desirable to opt for a simple SQL-like schema such as
Farm {
    _id : ObjectId("...")
}
Division {
    _id : ObjectId("..."),
    FarmId : ObjectId("..."),
    ...
}
Wombat {
    _id : ObjectId("..."),
    DivisionId : ObjectId("..."),
    ...
}
Whether embedding is the right approach or not very much depends on usage patterns, data size, concurrent writes, etc. - a key difference to SQL is that there is no one right way to model 1:n or n:n relationships, so you'll have to carefully weigh the pros and cons for each scenario. In my experience, having a unique ID is a pretty strong indicator that the document should be a 'first-class citizen' and have its own collection.
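To illustrate the trade-off with the flat schema: the nested view is assembled in application code (or via an aggregation $lookup) only when you need it, while a weight change stays a single targeted update such as db.wombats.updateOne({ _id: wombatId }, { $set: { weight: 26 } }). A minimal in-memory sketch, with made-up ids and names:

```javascript
// In-memory stand-ins for the three flat collections.
const farms = [{ _id: "f1", name: "Sunny Farm" }];
const divisions = [{ _id: "d1", farmId: "f1", name: "North" }];
const wombats = [{ _id: "w1", divisionId: "d1", name: "Wanda", weight: 25 }];

// Rebuild the nested document shape on demand by joining on the id fields.
const nested = farms.map(f => ({
  ...f,
  divisions: divisions
    .filter(d => d.farmId === f._id)
    .map(d => ({ ...d, wombats: wombats.filter(w => w.divisionId === d._id) })),
}));

console.log(nested[0].divisions[0].wombats[0].name); // Wanda
```

The join cost moves into the read path, but writes no longer need positional operators into nested arrays.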
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I am creating a project that requires data storage, and I am considering using MongoDB but am having trouble finding the logical/optimal way of organising the data.
My simplified data needs to store Place information like so:
{place_city : "London",
place_owner: "Tim",
place_name: "Big Ben"}
{place_city : "Paris",
place_owner: "Tim",
place_name: "Eifel Tower"}
{place_city : "Paris",
place_owner: "Ben",
place_name: "The Louvre"}
And here are the main operations I need
Retrieve all my places
Retrieve all my friends places
Retrieve all my friends cities
If I use MongoDB, the max size is 16MB, right? If that is correct, then I can't store all the information in a PLACES collection similar to my example above, right?
I would probably need to create an "OWNER" collection, like so:
{
    owner: "Tim",
    cities: [
        {
            name: "London",
            places: [ { name: "Big Ben" } ]
        },
        {
            name: "Paris",
            places: [ { name: "Eifel Tower" }, { name: "The Louvre" } ]
        }
    ]
}
But the problem now is that retrieving my friends' places becomes cumbersome, and my friends' cities even more so.
Any advice from a cunning DB architect would be much appreciated.
The size limit of 16MB is per document, not per collection.
{place_city : "London", place_owner: "Tim", place_name: "Big Ben"}
is a very little document, so don't worry. The design of your collections depends heavily on how you query your data.
The data size limitation is per document and not per collection. Collections can easily become several GB (or even TB) large.
I would suggest you keep your data as simple as you have, like:
{place_city : "London",
place_owner: "Tim",
place_name: "Big Ben"}
{place_city : "Paris",
place_owner: "Tim",
place_name: "Eifel Tower"}
{place_city : "Paris",
place_owner: "Ben",
place_name: "The Louvre"}
I am thinking that friends are stored like this:
{
username: "Ben",
friends: [ "Tim", "Bob" ]
}
Then your three queries can be done as:
All your places: db.places.find( { place_owner: "Ben" } );
All your friends' places with two queries (pseudo code):
friends = db.friends.find( { username: "Ben" } );
// friends = [ "Tim", "Bob" ], you do need to do some code to make this change
db.places.find( { place_owner: { $in: [ "Tim", "Bob" ] } } );
All your friends' cities with two queries (pseudo code):
friends = db.friends.find( { username: "Ben" } );
db.places.distinct( 'place_city', { place_owner: { $in: [ "Tim", "Bob" ] } } );
Even with millions of documents, this should work fine, providing you have an index on the fields that you query for: { place_owner: 1 } and { username: 1 }.
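The two-query pattern above can be sketched in memory like this (plain JavaScript standing in for the friends and places collections, using the sample data from the question):

```javascript
// Stand-ins for the friends and places collections.
const friendsColl = [{ username: "Ben", friends: ["Tim", "Bob"] }];
const places = [
  { place_city: "London", place_owner: "Tim", place_name: "Big Ben" },
  { place_city: "Paris", place_owner: "Tim", place_name: "Eifel Tower" },
  { place_city: "Paris", place_owner: "Ben", place_name: "The Louvre" },
];

// Query 1: fetch Ben's friends list.
const friends = friendsColl.find(d => d.username === "Ben").friends;

// Query 2: $in-style filter on place_owner, then distinct cities.
const friendPlaces = places.filter(p => friends.includes(p.place_owner));
const friendCities = [...new Set(friendPlaces.map(p => p.place_city))];
console.log(friendCities); // [ 'London', 'Paris' ]
```

The application code glues the result of the first query into the second, which is exactly the "change" step mentioned in the pseudo code.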
I love MongoDB, but this data is not a good candidate for MongoDB. MongoDB does not support relationships, and that is basically all you are tracking here. Use a relational database to store the relationships.
Think of it like this: under the skin of the DBMS, MongoDB or SQL, an index is an index and a table is a table (basically). You get more performance from MongoDB not because it does the same things faster, but because you are able to use it to make your DB server do less (e.g. pull an entire document containing nested arrays and subdocuments instead of joining a bunch of tables together). There are some fundamental differences in the way MongoDB handles updates, but for querying simple data sets most systems are going to be relatively similar. One big difference, rooted in the way MongoDB works, is that it cannot use data in one collection as parameters for another query, which is basically the whole point of a relational database. Since two of your use cases require "joins" (to "all my friends"), you need two queries.
So what you're doing with two queries is the same thing as a join, except relational databases are optimized to do this extremely efficiently; I promise you it will be slower to do this join manually, plus you're sending all data (friends' IDs) over the wire and making an extra DB connection. Now, if you could store all your friends' cities and places in a single document, MongoDB will probably be (slightly) faster than joining, but now you've got a new problem, because you now have to manage all this, anytime anyone adds a city or place all their friends have to be modified--this is unrealistic.
But there is even more to the story than that, because SQL DBMSs are extremely mature applications with lots of features to improve query performance. They let you do things like create "materialized views" that store all your friends' cities and places and update themselves automatically any time one of their source tables is updated, so you don't have to do anything special: you just query and get your data without actually executing any joins. (A materialized view is not the right fit here, but just as an example, it is possible if you needed it.)
ALSO, when you are using MongoDB, a guideline I've found helpful is this: anytime you are asking yourself whether your document will be large enough to store all the data you will EVENTUALLY have, you probably have a design problem on your hands. If a document's growth is unbounded, the data should probably be enumerated within a collection instead. Or put another way: your collections should grow as your application is used, not your documents' size (much).
If breaking apart your schema like this means that for primary operations you are doing a lot of manual joins, it is worth considering the question of whether or not you should be using a relational database instead.
Let's say I have a collection of documents such as:
{ "_id" : 0, "owner" : 0, "name" : "Doc1" }, { "_id" : 1, "owner" : 1, "name" : "Doc1" }, etc.
And, on the other hand the owners are represented as a separate collection:
{ "_id" : 0 , "username":"John"}, { "_id" : 1 , "username":"Sam"}
How can I make sure that, when I insert a document, it references the user correctly? In an old-school RDBMS this could easily be done using a foreign key.
I know that I can check the correctness of the insertion from my business code, BUT what if an attacker tampers with my request to the server and puts "owner" : 100? Mongo doesn't throw any exception back.
I would like to know how this situation should be handled in a real-word application.
Thank you in advance!
MongoDB doesn't have foreign keys (as you have presumably noticed). Fundamentally the answer is therefore, "Don't let users tamper with the requests. Only let the application insert data that follows your referential integrity rules."
MongoDB is great in lots of ways... but if you find that you need foreign keys, then it's probably not the correct solution to your problem.
To answer your specific question - while MongoDB encourages handling foreign-key relationships on the client side, they also provide the idea of "Database References" - See this help page.
That said, I don't recommend using a DBRef. Either let your client code manage the associations or (better yet) link the documents together from the start. You may want to consider embedding the owner's "documents" inside the owner object itself. Assemble your documents to match your usage patterns and MongoDB will shine.
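If you do keep separate collections, the referential check the answers describe lives in application code: look the owner up before writing, and reject the write if it is missing. A minimal in-memory sketch of that check (standing in for db.owners.findOne({ _id: owner }) followed by the insert):

```javascript
// Stand-ins for the owners collection and the documents collection.
const owners = [{ _id: 0, username: "John" }, { _id: 1, username: "Sam" }];
const docs = [];

function insertDoc(doc) {
  // Application-side "foreign key": refuse the write if the
  // referenced owner does not exist.
  if (!owners.some(o => o._id === doc.owner)) {
    throw new Error(`owner ${doc.owner} does not exist`);
  }
  docs.push(doc);
}

insertDoc({ _id: 0, owner: 0, name: "Doc1" }); // accepted
try {
  insertDoc({ _id: 2, owner: 100, name: "Tampered" }); // rejected
} catch (e) {
  console.log(e.message);
}
```

Note that a check-then-insert like this is not atomic on its own; to make it race-free against concurrent deletes of the owner you would need a multi-document transaction (available since MongoDB 4.0, as a later answer notes).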
This is a one-to-one relationship. It's better to embed one document in the other instead of maintaining separate collections. Check here on how to model them in MongoDB and their advantages.
Although it's not explicitly mentioned in the docs, embedding gives you the same effect as foreign key constraints. Just to make this idea clear: say you have two collections like this:
C1:
{ "_id" : 0, "owner" : 0, "name" : "Doc1" }, { "_id" : 1, "owner" : 1, "name" : "Doc1" }, etc.
C2:
{ "_id" : 0 , "username":"John"}, { "_id" : 1 , "username":"Sam"}
And if you could declare a foreign key constraint so that C1.owner references C2._id (assuming MongoDB allowed it), it would mean that you cannot insert a document into C1 whose owner does not exist in C2. Compare this with an embedded document:
{
    "_id" : 0,
    "owner" : 0,
    "name" : "Doc1",
    "owner_details" : {
        "username" : "John"
    }
}
Now the owner_details field represents the data from the C2 collection, and the remaining fields represent the data from C1. You can't add an owner_details field to a non-existent document. You're essentially achieving the same effect.
This question was originally answered in 2011, so I decided to post an update here.
Starting with MongoDB 4.0 (released in June 2018), MongoDB supports multi-document ACID transactions.
Relations now can be modeled in two approaches:
Embedded
Referenced (NEW!)
You can model a referenced relationship like so:
{
    "_id": ObjectId("52ffc33cd85242f436000001"),
    "contact": "987654321",
    "dob": "01-01-1991",
    "name": "Tom Benzamin",
    "address_ids": [
        ObjectId("52ffc4a5d85242602e000000")
    ]
}
And the corresponding sample structure of the address document:
{
    "_id": ObjectId("52ffc4a5d85242602e000000"),
    "building": "22 A, Indiana Apt",
    "pincode": 123456,
    "city": "Los Angeles",
    "state": "California"
}
If you really want to enforce foreign keys in your project/web app, then you should go with a mixed approach, i.e. SQL + NoSQL.
Bulky data that doesn't have many references can live in the NoSQL store, e.g. hotel or place data.
But for more critical things, like OAuth module tables (TokenStore, UserDetails, the UserRole mapping table, etc.), you can go with SQL.
I would also recommend that if usernames are unique, you use them as the _id; you will save an index. In the document being stored, set the value of 'owner' in the application to the value of 'username' when the document is created, and never let any other piece of code update it.
If there are requirements to change the owner, then provide appropriate APIs with business rules implemented.
There wouldn't be any need for foreign keys.