What is the ideal way to reference a collection in MongoDB?

I am new to MongoDB. I have only one collection, called "stores". A "company" has many "stores". Should I make company a separate collection and reference it from "stores", or is the structure below OK?
{
"_id" : ObjectId("yyyyyyyyyyyyyyyyy"),
"type" : "Feature",
"properties" : {
"name" : "aa",
"address" : "bb",
"company" : "AAA"
},
"geometry" : {
"type" : "Point",
"coordinates" : [
yyyy,
xxxx
]
}
}
What is the ideal way to reference the company in MongoDB?

With MongoDB, you can either embed related data in a single document or store it in a normalized way, with references between documents. Which one to choose depends on your requirements.
The guidelines below are from the MongoDB documentation:
In general, use embedded data models when:
you have “contains” relationships between entities. See Model One-to-One Relationships with Embedded Documents.
you have one-to-many relationships between entities. In these relationships the “many” or child documents always appear with or are viewed in the context of the “one” or parent documents. See Model One-to-Many Relationships with Embedded Documents.
In general, use normalized data models:
when embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
to represent more complex many-to-many relationships.
to model large hierarchical data sets.
More details can be found here and here.
This Stack Overflow answer might help you as well.
In your case, if the relationship between stores and company is one-to-many then use the embedded structure.
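To make the two options concrete, here is a minimal Python sketch that uses plain dictionaries in place of real collections. Only the store document shape comes from the question; the `companies` collection, the `company_id` field, the string `_id` values, and the `company_name` helper are assumptions made for illustration.

```python
# Option A: embedded. The company name lives inside each store document.
store_embedded = {
    "_id": "store-1",
    "type": "Feature",
    "properties": {"name": "aa", "address": "bb", "company": "AAA"},
}

# Option B: referenced. Companies get their own (hypothetical) collection,
# and each store keeps only the company's _id.
companies = {"company-1": {"_id": "company-1", "name": "AAA"}}
store_referenced = {
    "_id": "store-1",
    "type": "Feature",
    "properties": {"name": "aa", "address": "bb", "company_id": "company-1"},
}


def company_name(store, companies):
    """Return the company name for either document shape."""
    props = store["properties"]
    if "company" in props:
        # Embedded: one read is enough.
        return props["company"]
    # Referenced: a second lookup is needed to resolve the reference.
    return companies[props["company_id"]]["name"]
```

The embedded shape answers the question in one read, while the referenced shape needs a second lookup but keeps the company name in a single place.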


MongoDB Embedding alongside referencing

There is a lot of content about which kind of relationship to use in a database schema. However, I have not seen anything about mixing both techniques.
The idea is to embed only the necessary attributes, together with a reference. This way the application has the data it needs for rendering, plus the reference for the update methods.
The problem I see is that the logic for handling CRUD operations becomes trickier, because it is mandatory to update multiple collections. On the other hand, I get all the information in a single read.
Basic schema for a page that only wants the students' names of a classroom:
CLASSROOM COLLECTION
{"_id": ObjectID(),
"students": [{"studentId" : ObjectID(),
"name" : "John Doe"
},
...
]
}
STUDENTS COLLECTION
{"_id": ObjectID(),
"name" : "John Doe",
"address" : "...",
"age" : "...",
"gender": "..."
}
I use the students collection on a different page, and there I do not want any information about the classroom. That is the reason not to embed the students fully.
I started learning MongoDB a few days ago and I don't know whether this kind of schema causes problems.
You can embed some fields and store other fields in a different collection as you are suggesting.
The issues with such an arrangement in my opinion would be:
What is the authority for a field? For example, what if a field like name is both embedded and stored in the separate collection, and the values differ?
Both updating and querying become awkward as you need to do it differently depending on which field is being worked with. If you make a mistake and go in the wrong place, you create/compound the first issue.
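Both issues can be seen in a small Python sketch that uses in-memory lists in place of the two collections. It treats the students collection as the authority for `name`; that choice, the integer `_id` values, and the function names `rename_student` and `stale_copies` are assumptions for illustration, not something the schema above prescribes.

```python
# In-memory stand-ins for the two collections from the question.
students = [{"_id": 1, "name": "John Doe", "address": "...", "age": "..."}]
classrooms = [
    {"_id": 10, "students": [{"studentId": 1, "name": "John Doe"}]},
]


def rename_student(student_id, new_name):
    """Update the authoritative record, then every embedded copy."""
    for student in students:
        if student["_id"] == student_id:
            student["name"] = new_name
    for classroom in classrooms:
        for embedded in classroom["students"]:
            if embedded["studentId"] == student_id:
                embedded["name"] = new_name


def stale_copies():
    """Return (classroom _id, studentId) pairs whose embedded name has drifted."""
    names = {s["_id"]: s["name"] for s in students}
    return [
        (c["_id"], e["studentId"])
        for c in classrooms
        for e in c["students"]
        if names.get(e["studentId"]) != e["name"]
    ]
```

Any code path that changes a name without going through something like `rename_student` leaves a stale embedded copy, which a periodic check such as `stale_copies` can at least detect.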

What is polymorphic data in NoSQL databases?

Quote from MongoDB Architecture Guide
Developers are working with applications that create massive volumes
of new, rapidly changing data types — structured, semi-structured,
unstructured and polymorphic data.
What is polymorphic data? Please explain for someone with a SQL background.
Document-oriented databases are schemaless, meaning the database does not enforce a schema on the data. Each document still has its own schema or structure. Polymorphic data means that a single collection holds documents with several different schemas (e.g. a field whose type varies, or fields that occur only in some documents).
For example, in the documents below the email field is either a string or an array of strings:
{
"user": "Anna",
"email" : "anna@gmail.com"
}
{
"user": "Jon",
"email" : [
"jon@gmail.com",
"jon@yahoo.com"
]
}
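For someone coming from SQL, the practical consequence is that application code has to cope with every shape a field can take. A minimal Python sketch follows; the helper name `emails_of` is hypothetical.

```python
def emails_of(doc):
    """Return the document's email addresses as a list, whatever shape was stored."""
    value = doc.get("email")
    if value is None:
        # The field is absent in this document.
        return []
    if isinstance(value, str):
        # A single address stored as a plain string.
        return [value]
    # Already a list (or other iterable) of addresses.
    return list(value)
```

Calling `emails_of` on either document above yields a list, so the rest of the code can ignore the difference.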

Denormalizing data in MongoDB with Doctrine and Symfony 2

I'm following this doc:
http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/tutorials/getting-started.html
and
http://symfony.com/doc/current/bundles/DoctrineMongoDBBundle/index.html
When I save my document, I end up with two collections, and the main document looks like this:
{
"_id" : ObjectId("5458e370d16fb63f250041a7"),
"name" : "A Foo Bar",
"price" : 19.99,
"posts" : [
{
"$ref" : "Embedd",
"$id" : ObjectId("5458e370d16fb63f250041a8"),
"$db" : "test_database"
}
]
}
I'd like to have:
{
"_id" : ObjectId("5458e370d16fb63f250041a7"),
"name" : "A Foo Bar",
"price" : 19.99,
"posts" : [
{
"mycomment" : "dsdsds",
"date" : date
}
]
}
I want to denormalize my data. How can I do it?
Can I use MongoDB operators like $push, $addToSet, etc.?
Thanks
Doctrine ODM supports both references and embedded documents.
In your first example, you're using references. The main document (let's assume it's called Product) references many Post documents. Those Post documents live in their own collection (for some reason this is named Embedd -- I would suggest renaming that if you keep this schema). By default, ODM uses the DBRef convention for references, so each reference is itself a small embedded document with $ref, $id, and $db fields.
Denormalization can be achieved by using embedded documents (an @EmbedMany mapping in your case). If you were embedding a Post document, the Post class should be mapped as an @EmbeddedDocument. This tells ODM that it's not a first-class document (belonging to its own collection), so it won't have to worry about tracking it by _id and the like (in fact, embedded documents won't even need identifiers unless you want to map one).
My rule of thumb for deciding to embed or references has generally been asking myself, "Will I need this document outside of the context of the parent document?" If a Post will not have an identity outside of the Product record, I'm comfortable embedding it; however, if I find later that my application also wants to show users a list of all of their Posts, or that I need to query by Posts (e.g. a feed of all recent Posts, irrespective of Product), then I may want to reference documents in a Posts collection (or simply duplicate embedded Posts as needed).
Alternatively, you may decide that Posts should exist in both their own collection and be embedded on Product. In that case, you can create an AbstractPost class as a @MappedSuperclass and define common fields there. Then, extend this with both Post and EmbeddedPost sub-classes (mapped accordingly). You'll be responsible for creating some code to generate an EmbeddedPost from a Post document, which will be suitable for embedding in the Product.posts array. Furthermore, you'll need to handle data synchronization between the top-level and embedded Posts (e.g. if someone edits a Post comment, you may want all the corresponding embedded versions updated as well).
On the subject of references: ODM also supports a simple option for reference mappings, in which case it will just store the referenced document's _id instead of the larger DBRef object. In most cases, having DBRef store the collection and database name for each referenced document is quite redundant; however, DBRef is actually useful if you're using single-collection inheritance, as ODM uses the object to store extra discriminator information (i.e. the class of the referenced object).

Storing structured data in Lucene

I have seen many references pointing to the use of Lucene or Solr as a NoSQL data store, not just the indexing engine:
NoSQL (MongoDB) vs Lucene (or Solr) as your database
http://searchhub.org/2010/04/29/for-the-guardian-solr-is-the-new-database/
However, because Lucene only provides a "flat" document structure, where each field can be multi-value (scalar), I can't seem to fully understand how people are mapping complex structured data into Lucene for index and store. For example:
{
"firstName": "Joe",
"lastName": "Smith",
"addresses" : [
{
"type" : "home",
"line1" : "1 Main Street",
"city" : "New York",
},
{
"type" : "office",
"line1" : "P.O. Box 1234",
"zip:“10000”
}
]
}
Things can obviously get more complex. For example, what if the object has two collections, addresses and phone numbers? What if an address itself has a collection?
I can think of two ways to map this to a Lucene "document":
Create a stored but not indexed field to hold a JSON/BSON version of the object, and then create other fields that are indexed but not stored for indexing/searching.
Find a smart way to fit the object into Lucene's way of storing data, i.e. use dot notation to flatten the fields, use multi-value fields to store the individual collection values, and then somehow recreate the object on its way back...
I wonder whether people have dealt with similar problems before, and what solutions you have used?
Take a look at my Stupid Lucene Tricks: Hierarchies for one approach.
It depends what the usage is.
If you only need them for display, you can serialize the complex value (addresses) as a JSON string and store it in a multi-valued field. If you need to index them, you can choose the following structure:
"addresses_type": [
"home",
"office"
],
"addresses_line1": [
"1 Main Street",
"P.O. Box 1234"
],
"addresses_city": [
"New York",
""
],
"addresses_zip": [
"",
"10000"
]
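The flattening above, and the reverse step of rebuilding the nested object, can be sketched in plain Python. This is independent of any Lucene or Solr API; the function names `flatten` and `unflatten` are hypothetical, and the sketch follows the convention above of padding missing values with an empty string so that positions stay aligned across fields.

```python
def flatten(prefix, items, keys):
    """Turn a list of dicts into parallel multi-valued fields, padding gaps with ""."""
    return {f"{prefix}_{key}": [item.get(key, "") for item in items] for key in keys}


def unflatten(prefix, fields):
    """Rebuild the list of dicts from the parallel fields, dropping the "" padding."""
    plen = len(prefix) + 1
    columns = {
        name[plen:]: values
        for name, values in fields.items()
        if name.startswith(prefix + "_")
    }
    length = max((len(values) for values in columns.values()), default=0)
    return [
        {key: values[i] for key, values in columns.items() if values[i] != ""}
        for i in range(length)
    ]


addresses = [
    {"type": "home", "line1": "1 Main Street", "city": "New York"},
    {"type": "office", "line1": "P.O. Box 1234", "zip": "10000"},
]
fields = flatten("addresses", addresses, ["type", "line1", "city", "zip"])
```

One limitation of this convention is that a genuinely empty value cannot be told apart from padding. Also, a query such as type "home" and zip "10000" matches this document even though no single address has both, because the per-address grouping is lost in the index.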

How to enforce foreign keys in NoSql databases (MongoDB)?

Let's say I have a collection of documents such as:
{ "_id" : 0 , "owner":0, "name":"Doc1"},{ "_id" : 1 , "owner":1, "name":"Doc1"}, etc
And, on the other hand the owners are represented as a separate collection:
{ "_id" : 0 , "username":"John"}, { "_id" : 1 , "username":"Sam"}
How can I make sure that, when I insert a document, it references an existing user? In an old-school RDBMS this could easily be done using a foreign key.
I know that I can check the correctness of the insertion in my business code, but what if an attacker tampers with my request to the server and puts "owner" : 100? Mongo doesn't throw any exception back.
I would like to know how this situation should be handled in a real-world application.
Thank you in advance!
MongoDB doesn't have foreign keys (as you have presumably noticed). Fundamentally the answer is therefore, "Don't let users tamper with the requests. Only let the application insert data that follows your referential integrity rules."
MongoDB is great in lots of ways... but if you find that you need foreign keys, then it's probably not the correct solution to your problem.
To answer your specific question - while MongoDB encourages handling foreign-key relationships on the client side, they also provide the idea of "Database References" - See this help page.
That said, I don't recommend using a DBRef. Either let your client code manage the associations or (better yet) link the documents together from the start. You may want to consider embedding the owner's "documents" inside the owner object itself. Assemble your documents to match your usage patterns and MongoDB will shine.
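As a rough illustration of letting the application enforce the rule, the Python sketch below takes the owner from the authenticated session rather than from the request body, and refuses owners that do not exist. It uses in-memory dictionaries in place of collections; `insert_document`, `UnknownOwnerError`, and the `authenticated_owner_id` parameter are assumptions for illustration, not part of any MongoDB driver.

```python
class UnknownOwnerError(ValueError):
    """Raised when a document points at an owner that does not exist."""


# In-memory stand-ins for the owners and documents collections.
owners = {0: {"_id": 0, "username": "John"}, 1: {"_id": 1, "username": "Sam"}}
documents = {}


def insert_document(doc, authenticated_owner_id):
    """Insert `doc`, taking the owner from the session, never from the request body."""
    if authenticated_owner_id not in owners:
        raise UnknownOwnerError(f"no owner with _id {authenticated_owner_id}")
    # Overwrite any client-supplied "owner" with the trusted value.
    stored = {**doc, "owner": authenticated_owner_id}
    documents[stored["_id"]] = stored
    return stored
```

With this shape, a tampered `"owner" : 100` in the request is simply ignored, because the stored value never comes from the client.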
This is a one-to-one relationship. It's better to embed one document in the other instead of maintaining separate collections. Check here for how to model such relationships in MongoDB and their advantages.
Although it's not explicitly mentioned in the docs, embedding gives you the same effect as foreign key constraints. I just want to make this idea clear. Say you have two collections like this:
C1:
{ "_id" : 0 , "owner":0, "name":"Doc1"},{ "_id" : 1 , "owner":1, "name":"Doc1"}, etc
C2:
{ "_id" : 0 , "username":"John"}, { "_id" : 1 , "username":"Sam"}
And if you were to declare foreign key constraint on C2._id to reference C1._id (assuming MongoDB allows it), it would mean that you cannot insert a document into C2 where C2._id is non-existent in C1. Compare this with an embedded document:
{
"_id" : 0 ,
"owner" : 0,
"name" : "Doc1",
"owner_details" : {
"username" : "John"
}
}
Now the owner_details field represents the data from the C2 collection, and the remaining fields represent the data from C1. You can't add an owner_details field to a non-existent document. You're essentially achieving the same effect.
This question was originally answered in 2011, so I decided to post an update here.
Starting with version 4.0 (released in June 2018), MongoDB supports multi-document ACID transactions.
Relations can now be modeled with two approaches:
Embedded
Referenced (references can now be kept consistent with multi-document transactions)
You can model a referenced relationship like so:
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"contact": "987654321",
"dob": "01-01-1991",
"name": "Tom Benzamin",
"address_ids": [
ObjectId("52ffc4a5d85242602e000000")
]
}
where the referenced address document has the following structure:
{
"_id":ObjectId("52ffc4a5d85242602e000000"),
"building": "22 A, Indiana Apt",
"pincode": 123456,
"city": "Los Angeles",
"state": "California"
}
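Note that MongoDB itself does not check that each entry in address_ids points at an existing address, so the application still has to resolve references and deal with dangling ones. A minimal Python sketch over in-memory data follows; string ids stand in for ObjectId values, and the function name `resolve_addresses` is hypothetical.

```python
def resolve_addresses(person, addresses_by_id):
    """Return (address documents that were found, ids with no matching address)."""
    found, dangling = [], []
    for address_id in person["address_ids"]:
        address = addresses_by_id.get(address_id)
        if address is None:
            # Nothing in the database prevents this; the application must handle it.
            dangling.append(address_id)
        else:
            found.append(address)
    return found, dangling


addresses_by_id = {
    "52ffc4a5d85242602e000000": {
        "_id": "52ffc4a5d85242602e000000",
        "building": "22 A, Indiana Apt",
        "city": "Los Angeles",
    },
}
person = {
    "_id": "52ffc33cd85242f436000001",
    "name": "Tom Benzamin",
    "address_ids": ["52ffc4a5d85242602e000000", "missing-address-id"],
}
```

Here `resolve_addresses(person, addresses_by_id)` returns the one address that exists and reports the other id as dangling.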
If you really need to enforce foreign keys in your project or web app, consider a mixed approach, i.e. SQL + NoSQL.
I would store bulky data that has few references, such as hotels or places, in the NoSQL store.
For more critical data, such as OAuth module tables, a token store, user details, and user-role mapping tables, you can go with SQL.
I would also recommend that, if usernames are unique, you use them as the _id. You will save on an index. In the document being stored, have the application set 'owner' to the 'username' when the document is created, and never let any other piece of code update it.
If there are requirements to change the owner, provide appropriate APIs with the business rules implemented.
There wouldn't be any need for foreign keys.