NoSQL db schema design - mongodb

I'm trying to find a way to create the db schema. Most operations to the database will be Read.
Say I'm selling books on the app so the schema might look like this
{
{ title : "Adventures of Huckleberry Finn"
author : ["Mark Twain", "Thomas Becker", "Colin Barling"],
pageCount : 366,
genre: ["satire"] ,
release: "1884",
},
{ title : "The Great Gatsby"
author : ["F.Scott Fitzgerald"],
pageCount : 443,
genre: ["Novel, "Historical drama"] ,
release: "1924"
},
{ title : "This Side of Paradise"
author : ["F.Scott Fitzgerald"],
pageCount : 233,
genre: ["Novel] ,
release: "1920"
}
}
So most operations would be something like
1) Grab all books by "F.Scott Fitzgerald"
2) Grab books under genre "Novel"
3) Grab all book with page count less than 400
4) Grab books with page count more than 100 no later than 1930
Should I create separate collections just for authors and genre and then reference them like in a relational database or embed them like above? Because it seems like if I embed them, to store data in the db I have to manually type in an author name, I could misspell F.Scott Fitzgerald in a document and I wouldn't get back the result.

First of all i would say a nice DB choice.
As far as mongo is concerned the schema should be defined such that it serves your access patterns best. While designing schema we also must observe that mongo doesn't support joins and transactions like SQL. So considering all these and other attributes i would suggest that your choice of schema is best as it serves your access patterns. Usually whenever we pull any book detail, we need all information like author, pages, genre, year, price etc. It is just like object oriented programming where a class must have all its properties and all non- class properties should be kept in other class.
Taking author in separate collection will just add an extra collection and then you need to take care of joins and transactions by your code. Considering your concern about manually typing the author name, i don't get actually. Let's say user want to see books by author "xyz" so he clicks on author name "xyz" (like some tag) and you can fetch a query to bring all books having that selected name as one of the author. If user manually types user name then also it is just finding the document by entered string. I don't see anything manual here.
Just adding on, a price key shall also fit in to every document.

Related

nosql inconsistent data structure

I'm new to nosql (MongoDB) so go easy on me.
I'm scraping json-ld from various web pages and want to store/recall the data. However the value types keep changing. For instance sometimes the "author" field uses an "organization" type, other times it's a "person" type sometimes it's simply a string, and sometimes it's just missing.
Should I convert the data to some type of standard?
Should each object be put into it's own collection and referenced?
How do you deal with displays being different.
Looking for words of experience or links to good articles on how to deal with inconsistent data structure.
The whole point of No-Sql database is that its schema less, and the structure can vary from document to other, so I see no issue in here.
I think you are asking on how you should deal with it in your application business logic, so here is my suggestion:
You can save the author as an embedded sub-document which always have a field called “type” (as an enum of values: String, Person, Organization, etc…) and act accordingly when you fetch the data.
For example, if the author is simply a String then the document would look like something like:
{
…,
“author”: {
“type”: “String”,
“text”: <text>
}
}
If its a Person type then:
{
…,
“author”: {
“type”: “Person”,
“first_name”: <first name>,
“last_name”: <last name>
}
}

NoSQL - wrapping my head around the duplication

I come from a strong RDBMS background and I'm trying to wrap my head around how I'd structure a NoSQL project. From what I can see, normalisation is disregarded in favour of speed.
Say for example I have a blog and categories. In MySQL, I would have a blog table and a categories table joined by a blog_categories table to maintain the many-to-many relationship. I would simply use JOINs to display the data on the website accordingly.
Easy enough.
But with NoSQL:
Would I store the blog category along with the blog article in the same document? So if categories "general", "news" and "articles" were assigned to an article, I would save this data along with the article for each article applicable?
Does this then mean that if I wanted to rename "news" to "latest news", I would need to cycle through all blog articles and their categories and manually update each instance of the "news" category?
You can store blog categories with the blog but it depends on how your blog will work. It will be faster but you can not:
1) Easily get a list of all tags
2) Change the name of a tag
How i tend to do it is have a table of tags and use a technique called manual referencing to populate data on screen.
In short it means having a collection of tags like :
{
_id : tag1,
tagName : "news"
}
Then in your blog:
{
_id : blog1,
blogTitle : "Title",
blogBody : "..."
tags : [
tag1,
....
]
}
That way you get the functionality you desire. for more information on manual referencing see mongodb documentation:
https://docs.mongodb.com/manual/reference/database-references/#manual-references

How to design product category and products models in mongodb?

I am new to mongo db database design,
I am currently designing a restaurant products system, and my design is similar to a simple eCommerce database design where each productcategory has products so in a relational database system this will be a one (productcategory) to many products.
I have done some research I do understand that in document databses.
Denationalization is acceptable and it results in faster database reads.
therefore in a nosql document based database I could design my models this way
//product
{
name:'xxxx',
price:xxxxx,
productcategory:
{
productcategoryName:'xxxxx'
}
}
my question is this, instead of embedding the category inside of each product, why dont we embed the products inside productcategory, then we can have all the products once we query the category, resulting in this model.
//ProductCategory
{
name:'categoryName',
//array or products
products:[
{
name:'xxxx',
price:xxxxx
},
{
name:'xxxx',
price:xxxxx
}
]
}
I have researched on this issue on this page http://www.slideshare.net/VishwasBhagath/product-catalog-using-mongodb and here http://www.stackoverflow.com/questions/20090643/product-category-management-in-mongodb-and-mysql both examples I have found use the first model I described (i.e they embed the productCategory inside product rather than embed array of products inside productCategory), I do not understand why, please explain. thanks
For the DB design, you have to consider the cardinality (one-to-few, one-to-many & one-to-gazillions) and also your data access patterns (your frequent queries, updates) while designing your DB schema to make sure that you get optimum performance for your operations.
From your scenario, it looks like each Product has a category, however it also looks like you need to query to find out Products for each category.
So, in this case, you could do with something like :
Product = { _id = "productId1", name : "SomeProduct", price : "10", category : ObjectId("111") }
ProductCategory = { _id = ObjectId("111"), name : "productCat1", products : ["productId1", productId2", productId3"]}
As i said about data access patterns, if you always read the Category-name and "Category-name" is something which is very infrequently updated then you can go for denormalizing with this two-way referencing by adding the product-category-name in the product:
Product = { _id = "productId1", name : "SomeProduct", price : "10", category : { ObjectId("111"), name:"productCat1" }
So, with embedding the documents, specific queries would be faster if no joins would be required, however other queries in which you need to access embedded details as stand-alone entities would be difficult.
This is a link from MongoDB which explains DB design for one-to-many scenario like you have with very nice examples in which you would realize that there are much more ways of doing it and many other things to think about.
http://blog.mongodb.org/post/88473035333/6-rules-of-thumb-for-mongodb-schema-design-part-3 (also has links for parts 1 & 2)
This link also describes the pros & cons for each scenario enabling you to reach a narrow down on a DB schema design.
It all depends on what queries you have in mind.
Suppose a product can only belong to one category, and one category applies to many products. Then, if you expect to retrieve the category together with the product, it makes sense to store it directly:
// products
{_id:"aabbcc", name:"foo", category:"bar"}
and if you expect to query all the products in a given category then it makes sense to create a separate collection
// categories
{_id:"bar", products=["aabbcc"]}
Remember that you cannot atomically update both the products and categories database (MongoDB is eventually consistent), but you can occasionally run a batch job which will make sure all categories are up-to-date.
I would recommend to think in terms of what kind of information will I often need, as opposed to how to normalize/denormalize this data, and make your collections reflect what you actually want.

mongodb: Embedded only id or both id and name

I'm new to mongodb, please suggest me how to correct design schema for situation like below:
I have User collection and Product collection. Product contain info like id, title, description, price... User can bookmark or like Product. Currently, in User collection, I'm store 1 array for liked products, and 1 array for bookmarked products. So when I need to view info about 1 user, I have to read out these 2 array, then search in Product collection to get title of liked and bookmarked products.
//User collection
{
_id : 12345,
name: "John",
liked: [123, 456, 789],
bkmark: [123, 125]
}
//Product collection
{
_id : 123,
title: "computer",
desc: "awesome computer",
price: 12
}
Now I think I can speed up this process by embedded both product id and title in User collection, so that I don't have to search in Product collection, just read it out and display. But if I choose this way, whenever Product's title get updated, I have to search and update in User collection too. I can't evaluate update cost in 2nd way, so I don't know which way is correct. Please help me to choose between them.
Thanks & Regards.
You should consider what happens more often: A product gets renamed or the information of a user is requested.
You should also consider what's a bigger problem: Some time lag in which users see an outdated product name (we are talking about seconds, maybe minutes when you have a really large number of users) or always a longer response time when requesting a user profile.
Without knowing your actual usage patterns and requirements, I would guess that it's the latter in both cases, so you should rather optimize for this situation.
In general it is not recommended to normalize a MongoDB as radical as you would normalize a relational database. The reason is that MongoDB can not perform JOINs. So it's usually not such a bad idea to duplicate some relevant information in multiple documents, while accepting a higher cost for updates and a potential risk of inconsistencies.

How do I describe a collection in Mongo?

So this is Day 3 of learning Mongo Db. I'm coming from the MySql universe...
A lot of times when I need to write a query for a MySql table I'm unfamiliar with, I would use the "desc" command - basically telling me what fields I should include in my query.
How would I do that for a Mongo db? I know, I know...I'm searching for a schema in a schema-less database. =) But how else would users know what fields to use in their queries?
Am I going at this the wrong way? Obviously I'm trying to use a MySql way of doing things in a Mongo db. What's the Mongo way?
Type the below query in editor / mongoshell
var col_list= db.emp.findOne();
for (var col in col_list) { print (col) ; }
output will give you name of columns in collection :
_id
name
salary
There is no good answer here. Because there is no schema, you can't 'describe' the collection. In many (most?) MongoDb applications, however, the schema is defined by the structure of the object hierarchy used in the writing application (java or c# or whatever), so you may be able to reflect over the object library to get that information. Otherwise there is a bit of trial and error.
This is my day 30 or something like that of playing around with MongoDB. Unfortunately, we have switched back to MySQL after working with MongoDB because of my company's current infrastructure issues. But having implemented the same model on both MongoDB and MySQL, I can clearly see the difference now.
Of course, there is a schema involved when dealing with schema-less databases like MongoDB, but the schema is dictated by the application, not the database. The database will shove in whatever it is given. As long as you know that admins are not secretly logging into Mongo and making changes, and all access to the database is controller through some wrapper, the only place you should look at for the schema is your model classes. For instance, in our Rails application, these are two of the models we have in Mongo,
class Consumer
include MongoMapper::Document
key :name, String
key :phone_number, String
one :address
end
class Address
include MongoMapper::EmbeddedDocument
key :street, String
key :city, String
key :state, String
key :zip, String
key :state, String
key :country, String
end
Now after switching to MySQL, our classes look like this,
class Consumer < ActiveRecord::Base
has_one :address
end
class Address < ActiveRecord::Base
belongs_to :consumer
end
Don't get fooled by the brevity of the classes. In the latter version with MySQL, the fields are being pulled from the database directly. In the former example, the fields are right there in front of our eyes.
With MongoDB, if we had to change a particular model, we simply add, remove, or modify the fields in the class itself and it works right off the bat. We don't have to worry about keeping the database tables/columns in-sync with the class structure. So if you're looking for the schema in MongoDB, look towards your application for answers and not the database.
Essentially I am saying the exactly same thing as #Chris Shain :)
While factually correct, you're all making this too complex. I think the OP just wants to know what his/her data looks like. If that's the case, you can just
db.collectionName.findOne()
This will show one document (aka. record) in the database in a pretty format.
I had this need too, Cavachon. So I created an open source tool called Variety which does exactly this: link
Hopefully you'll find it to be useful. Let me know if you have questions, or any issues using it.
Good luck!
AFAIK, there isn't a way and it is logical for it to be so.
MongoDB being schema-less allows a single collection to have a documents with different fields. So there can't really be a description of a collection, like the description of a table in the relational databases.
Though this is the case, most applications do maintain a schema for their collections and as said by Chris this is enforced by your application.
As such you wouldn't have to worry about first fetching the available keys to make a query. You can just ask MongoDB for any set of keys (i.e the projection part of the query) or query on any set of keys. In both cases if the keys specified exist on a document they are used, otherwise they aren't. You will not get any error.
For instance (On the mongo shell) :
If this is a sample document in your people collection and all documents follow the same schema:
{
name : "My Name"
place : "My Place"
city : "My City"
}
The following are perfectly valid queries :
These two will return the above document :
db.people.find({name : "My Name"})
db.people.find({name : "My Name"}, {name : 1, place :1})
This will not return anything, but will not raise an error either :
db.people.find({first_name : "My Name"})
This will match the above document, but you will have only the default "_id" property on the returned document.
db.people.find({name : "My Name"}, {first_name : 1, location :1})
print('\n--->', Object.getOwnPropertyNames(db.users.findOne())
.toString()
.replace(/,/g, '\n---> ') + '\n');
---> _id
---> firstName
---> lastName
---> email
---> password
---> terms
---> confirmed
---> userAgent
---> createdAt
This is an incomplete solution because it doesn't give you the exact types, but useful for a quick view.
const doc = db.collectionName.findOne();
for (x in doc) {
print(`${x}: ${typeof doc[x]}`)
};
If you're OK with running a Map / Reduce, you can gather all of the possible document fields.
Start with this post.
The only problem here is that you're running a Map / Reduce on which can be resource intensive. Instead, as others have suggested, you'll want to look at the code that writes the actual data.
Just because the database doesn't have a schema doesn't mean that there is no schema. Generally speaking the schema information will be in the code.
I wrote a small mongo shell script that may help you.
https://gist.github.com/hkasera/9386709
Let me know if it helps.
You can use a UI tool mongo compass for mongoDb. This shows all the fields in that collection and also shows the variation of data in it.
If you are using NodeJS and want to get the all the field names using the API request, this code works for me-
let arrayResult = [];
db.findOne().exec(function (err, docs)){
if(err)
//show error
const JSONobj = JSON.parse(JSON.stringify(docs));
for(let key in JSONobj) {
arrayResult.push(key);
}
return callback(null, arrayResult);
}
The arrayResult will give you entire field/ column names
Output-
[
"_id",
"emp_id",
"emp_type",
"emp_status",
"emp_payment"
]
Hope this works for you!
Consider you have collection called people and you want to find the fields and it's data-types. you can use below query
function printSchema(obj) {
for (var key in obj) {
print( key, typeof obj[key]) ;
}
};
var obj = db.people.findOne();
printSchema(obj)
The result of this query will be like below,
you can use Object.keys like in JavaScript
Object.keys(db.movies.findOne())