Performing 'find all documents that ...' queries - mongodb

I'm coming from a Postgres background, and I am currently contemplating whether I should use a noSQL database such as mongoDB in my next project. For this I have a few questions
Is it possible to perform queries in noSQL that can fetch all the documents that have some common subdocument/attribute, example "select all users where country = italy"
Also, how is redundancy handled in noSQL? say I have a document that represents a given car model that multiple people can own. Would I then have to insert the same exact data in all these People documents, describing the given car model?
Thanks

Sure you can do queries with where clause in MongoDB (and other NoSQL engine), if I take your example, you will store the users into a "collection" named "users", and query it more or less the same way.
db.users.find( { "country" : "Italy" } );
MongoDB has a very rich and powerful query and aggregation engine ( http://docs.mongodb.org/manual/tutorial/query-documents/ ) , I am inviting your to follow the tutorials ( http://docs.mongodb.org/manual/tutorial/getting-started/ )or free online training ( http://university.mongodb.com ), to learn more about it.
To insert the document it is also really easy:
db.users.insert( {"first_name" : "John", "last_name" : "Doe", "country" : "USA" } );
that's it!
You talk about redundancy, like in your SQL world it depends a lot of the design. In MongoDB you will organize your document and link between them (linked or embedded documents) based on your business needs. It is hard to give an answer about document design in this context so I will invite you to read some interesting articles:
MongoDB Documentation : http://docs.mongodb.org/manual/data-modeling/
MongoDB Blog, 6 Rules of Thumb for MongoDB Schema Design (part 1,2,3)
http://blog.mongodb.org/post/87200945828/6-rules-of-thumb-for-mongodb-schema-design-part-1
http://blog.mongodb.org/post/87892923503/6-rules-of-thumb-for-mongodb-schema-design-part-2
http://blog.mongodb.org/post/88473035333/6-rules-of-thumb-for-mongodb-schema-design-part-3
Answer to the question about users and cars, will be "It depends of your application".
If your application is mostly read, and you need most of the data about cars & users, duplication (denormalization), will be a good approach to make the development easy. (and yes you will need more work when you have to update the information...). The blog post and documentation should help you to find your way.

Yes, you can do queries in MongoDB on documents that have common subdocument/attribute.
MongoDB encourages embedding(de-normalization) of data as disk space is cheap and embedding docs can result in better query performance.Yes, embedding documents would mean inserting the same data in all People documents in a Car model. But if you want to avoid duplication/de-normalization , then you can go for 'referencing' which is a normalized model which stores data something like this: People collection will contain people docs and Car collection will contain car docs, similar to what we have in rdbms. But the primary/foreign key relationship is not imposed by MongoDB. You will end up doing joins in code and hence query performance will get degraded.

Related

Are there technical downsides to using a single collection over multiple collections in MongoDB?

Since MongoDB is schemaless, I could just drop all my documents into a single collection, with a key collection and an index on that key.
For example this:
db.getCollection('dogs').find()
db.getCollection('cars').find()
Would become this:
db.getCollection('all').find({'collection': 'dogs'})
db.getCollection('all').find({'collection': 'cars'})
Is there any technical downside to doing this?
There are multiple reasons to have different collections, maybe the two most importants are:
Performance: even if MongoDB has been designed to be flexible, it doesn't prevent the need to have indexes on fields that will be used during the search. You would have dramatic response times if the collection is too heterogeneous.
Maintenability/evolutivity: design should be driven by the usecases (usually you'll store the data as it's received by the application) and the design should be explicit to anyone looking at the database collections
MongoDB University is a great e-learning platform, it is free and there is in particular this course:
M320: Data Modeling
schema questions are often better understood by working backwards from the queries you'll rely on and how the data will get written.... if you were going to query Field1 AND Field2 together in 1 query statement you do want them in the same collection....dogs and cars don't sound very related while dogs and cats do...so really look at how you're going to want to query.....joining collections is not really ideal - doable via $lookup but not ideal....

Decide create multiple database and relational in cloudant

I'm currently learning NOSQL cloudant and trying to design the database, as I'm learning I'm cloudant treat all records as a documents and using denormalise for table. so I'm currently a bit confuse to how to decide which one need to be in one document and which one is need to be separated.
Below are my test cases :
let's say i'm designing store book tables structure, for simplicity I'll be having this tables BOOK, STORE, STORE_BRANCH
BOOK field : _id, book_name, author
STORE field : _id, store_name
STORE_BRANCH field : _id, store_branch_name, address, store_id_fk
with above case, I not able to decide where should i put the "price" field to ? as for normal RDBMS i will just create another table and having fields : ( store book_id, store_branch_id and prices), this with the assumption the price of the book is different for each branch. so i wondering how I put this in cloudant ?
any suggestion is appreciated
Your doubts are pretty common for RDMBS user.
In NoSQL generally you use the everything-in-one-document approach. In fact, in some cases, approximating JOINs in a document-oriented database like Cloudant is outright trivial. For example if you want to model a one-to-n relationship, you can put all n-related documents into the document they belong to. In your case you should put all the store_branch in the related store. This strategy is OK if:
The document does not get so big that it impairs performance. This can be mitigated somewhat by using database views or show functions.
The information in the inner document only appears there and does not need to be duplicated into other documents, or such duplication is acceptable for your application.
The document does not get updated concurrently. If it does, there will likely be unnecessary conflicts that will need to be resolved by the application.
If the above strategy is not applicable you can use an approach that more closely mimics how you solve this problem in a relational database: you can create a document for each "relational table". In your case you should create a document having fields : ( store book_id, store_branch_id and prices), too.
This Cloudant article explains very deeply these possibilities: Cloudant - Join the fun, have fun with JOINs.

many-to-many relationships for social app: Mongodb or graph databases like Neo4j

I have tried to understand embedding in Mongodb but could not find good enough documentation. Linking is not advised as writes are not atomic across documents and also there are two lookups. Does someone know how to solve this or would you suggest me to go to graph dbs like neo4j.
I am trying to build an application which would need many-to-many relationships. To explain, I will take the example of a library. It can suggest books to user based on books his friends are reading and neighbors (like minded) users are reading.
There are Users and Books. Users borrow books and have friends who are other users
Given a user, I need all books he is reading and number of mutual
friends for the book
Given a book, I need all the people who are reading it. May be given
a user A, this would return the intersection of people reading book
and friends of user A. This is mutual friendship
Users = [
{ name: 'xyz', 'id':'000000', friend_ids:['949583','958694']}
{ name: 'abc', 'id':'000001', friend_ids:['949582','111111']}
]
Books = [
{'book':'da vinci code', 'author': 'dan brown', 'readers'=['949583', '000000']}
{'book':'iCon', 'author': 'Young', 'readers'=['000000', '000001']}
]
As seen above, generally I need two documents if I take mongo DB as I might two way lookup. Duplicating (embedding) on document into another could lead to lot of duplicity (these schemas could store much more information than shown).
Am I modeling my data correctly? Can this be effectively done in mongodb or should I look at graph dbs.
A disclaimer: I work for Neo4j
From your outline, requirements and type of data it seems that your app is rather in a sweetspot for graph databases.
I'd suggest you just do a quick spike with a graph database and see how it is going.
There will be no duplication
you have transactions for atomic operations
following links is the natural operation
local queries (e.g. from a user or a book) are cheap and fast
you can use graph algorithms like shortest path to find interesting information about your data
recommendations and similar operations are natural to graph databases
Some Questions:
Why did you choose MongoDB in the first place?
What implementation language do you use?
Your basic schema proposal above would work fine for MongoDB, with a few suggestions:
Use integers for identifiers, rather than strings. Integers will often be stored more compactly by MongoDB (they will always be 8 bytes, whereas strings' stored size will depend on the length of the string). You can use findAndModify to emulate unique sequence generators (like auto_increment in some relational databases) -- see Mongoengine's SequenceField for an example of how this is done. You could also use ObjectIds which are always 12 bytes, but are virtually guaranteed to be unique without having to store any coordination information in the database.
You should use the _id field instead of id, as this field is always present in MongoDB and has a default unique index created on it. This means your _ids are always unique, and lookups by _id is very fast.
You are right that using this sort of schema will require multiple find()s, and will incur network round-trip overhead each time. However, for each of the queries you have suggested above, you need no more than 2 lookups, combined with some straightforward application code:
"Given a user, I need all books he is reading and number of mutual friends for the book"
a. Look up the user in question, thenb. query the books collection using db.books.find({_id: {$in: [list, of, books, for, the, user]}}), thenc. For each book, compute a set union for that book's readers plus the user's friends
"Given a book, I need all the people who are reading it."a. Look up the book in question, thenb. Look up all the users who are reading that book, again using $in like db.users.find({_id: {$in: [list, of, users, reading, book]}})
"May be given a user A, this would return the intersection of people reading book and friends of user A."a. Look up the user in question, thenb. Look up the book in question, thenc. Compute the set union of the user's friends and the book's readers
I should note that $in can be slow if you have very long lists, as it is effectively equivalent to doing N number of lookups for a list of N items. The server does this for you, however, so it only requires one network round-trip rather than N.
As an alternative to using $in for some of these queries, you can create an index on the array fields, and query the collection for documents with a specific value in the array. For instance, for query #1 above, you could do:
// create an index on the array field "readers"
db.books.ensureIndex({readers: 1})
// now find all books for user whose id is 1234
db.books.find({readers: 1234})
This is called a multi-key index and can perform better than $in in some cases. Your exact experience will vary depending on the number of documents and the size of the lists.

MongoDB - How to Handle Relationship

I just start learning about nosql database, specially MongoDB (no specific reason for mongodb). I browse few tutorial sites, but still cant figure out, how it handle relationship between two documents/entity
Lets say for example:
1. One Employee works in one department
2. One Employee works in many department
I dont know the term 'relationship' make sense for mongodb or not.
Can somebody please give something about joins, relationship.
The short answer: with "nosql" you wouldn't do it that way.
What you'd do instead of a join or a relationship is add the departments the user is in to the user object.
You could also add the user to a field in the "department" object, if you needed to see users from that direction.
Denormalized data like this is typical in a "nosql" database.
See this very closely related question: How do I perform the SQL Join equivalent in MongoDB?
in general, you want to denormalize your data in your collections (=tables). Your collections should be optimized so that you don't need to do joins (joins are not possible in NoSQL).
In MongoDB you can either reference other collections (=tables), or you can embed them into each other -- whatever makes more sense in your domain. There are size limits to entries in a collection, so you can't just embed the encyclopedia britannica ;-)
It's probably best if you look for API documentation and examples for the programming language of your choice.
For Ruby, I'd recommend the Mondoid library: http://mongoid.org/docs/relations.html
Generally, if you decided to learn about NoSql databases you should follow the "NoSql way", i.e. learn the principles beyond the movement and the approach to design and not simply try to map RDBMS to your first NoSql project.
Simply put - you should learn how to embed and denormalize data (like Will above suggested), and not simply copy the id to simulate foreign keys.
If you do this the "foreign _id way", next step is to search for transactions to ensure that two "rows" are consistently inserted/updated. Few steps after Oracle/MySql is waiting. :)
There are some instances in which you want/need to keep the documents separate in which case you would take the _id from the one object and add it as a value in your other object.
For Example:
db.authors
{
_id:ObjectId(21EC2020-3AEA-1069-A2DD-08002B30309D)
name:'George R.R. Martin'
}
db.books
{
name:'A Dance with Dragons'
authorId:ObjectId(21EC2020-3AEA-1069-A2DD-08002B30309D)
}
There is no official relationship between books and authors its just a copy of the _id from authors into the authorId value in books.
Hope that helps.

MongoDB and "joins" [duplicate]

This question already has answers here:
How do I perform the SQL Join equivalent in MongoDB?
(19 answers)
Closed 6 years ago.
I'm sure MongoDB doesn't officially support "joins". What does this mean?
Does this mean "We cannot connect two collections(tables) together."?
I think if we put the value for _id in collection A to the other_id in collection B, can we simply connect two collections?
If my understanding is correct, MongoDB can connect two tables together, say, when we run a query. This is done by "Reference" written in http://www.mongodb.org/display/DOCS/Schema+Design.
Then what does "joins" really mean?
I'd love to know the answer because this is essential to learn MongoDB schema design. http://www.mongodb.org/display/DOCS/Schema+Design
It's no join since the relationship will only be evaluated when needed. A join (in a SQL database) on the other hand will resolve relationships and return them as if they were a single table (you "join two tables into one").
You can read more about DBRef here:
http://docs.mongodb.org/manual/applications/database-references/
There are two possible solutions for resolving references. One is to do it manually, as you have almost described. Just save a document's _id in another document's other_id, then write your own function to resolve the relationship. The other solution is to use DBRefs as described on the manual page above, which will make MongoDB resolve the relationship client-side on demand. Which solution you choose does not matter so much because both methods will resolve the relationship client-side (note that a SQL database resolves joins on the server-side).
The database does not do joins -- or automatic "linking" between documents. However you can do it yourself client side. If you need to do 2, that is ok, but if you had to do 2000, the number of client/server turnarounds would make the operation slow.
In MongoDB a common pattern is embedding. In relational when normalizing things get broken into parts. Often in mongo these pieces end up being a single document, so no join is needed anyway. But when one is needed, one does it client-side.
Consider the classic ORDER, ORDER-LINEITEM example. One order and 8 line items are 9 rows in relational; in MongoDB we would typically just model this as a single BSON document which is an order with an array of embedded line items. So in that case, the join issue does not arise. However the order would have a CUSTOMER which probably is a separate collection - the client could read the cust_id from the order document, and then go fetch it as needed separately.
There are some videos and slides for schema design talks on the mongodb.org web site I belive.
one kind of join a query in mongoDB, is ask at one collection for id that match , put ids in a list (idlist) , and do find using on other (or same) collection with $in : idlist
u = db.friends.find({"friends": something }).toArray()
idlist= []
u.forEach(function(myDoc) { idlist.push(myDoc.id ); } )
db.family.find({"id": {$in : idlist} } )
The first example you link to shows how MongoDB references behave much like lazy loading not like a join. There isn't a query there that's happening on both collections, rather you query one and then you lookup items from another collection by reference.
The fact that mongoDB is not relational have led some people to consider it useless.
I think that you should know what you are doing before designing a DB. If you choose to use noSQL DB such as MongoDB, you better implement a schema. This will make your collections - more or less - resemble tables in SQL databases. Also, avoid denormalization (embedding), unless necessary for efficiency reasons.
If you want to design your own noSQL database, I suggest to have a look on Firebase documentation. If you understand how they organize the data for their service, you can easily design a similar pattern for yours.
As others pointed out, you will have to do the joins client-side, except with Meteor (a Javascript framework), you can do your joins server-side with this package (I don't know of other framework which enables you to do so). However, I suggest you read this article before deciding to go with this choice.
Edit 28.04.17:
Recently Firebase published this excellent series on designing noSql Databases. They also highlighted in one of the episodes the reasons to avoid joins and how to get around such scenarios by denormalizing your database.
If you use mongoose, you can just use(assuming you're using subdocuments and population):
Profile.findById profileId
.select 'friends'
.exec (err, profile) ->
if err or not profile
handleError err, profile, res
else
Status.find { profile: { $in: profile.friends } }, (err, statuses) ->
if err
handleErr err, statuses, res
else
res.json createJSON statuses
It retrieves Statuses which belong to one of Profile (profileId) friends. Friends is array of references to other Profiles. Profile schema with friends defined:
schema = new mongoose.Schema
# ...
friends: [
type: mongoose.Schema.Types.ObjectId
ref: 'Profile'
unique: true
index: true
]
I came across lot of posts searching for the same - "Mongodb Joins" and alternatives or equivalents. So my answer would help many other who are like me. This is the answer I would be looking for.
I am using Mongoose with Express framework. There is a functionality called Population in place of joins.
As mentioned in Mongoose docs.
There are no joins in MongoDB but sometimes we still want references to documents in other collections. This is where population comes in.
This StackOverflow answer shows a simple example on how to use it.