How can Mongo store infinitely long comments in a blog post example - mongodb

I am looking to build a blogging system and came across the following blog post:
http://blog.mongolab.com/2012/08/why-is-mongodb-wildly-popular/
While it's nice to see how we can store everything in one Mongo document as a JSON-type object (example JSON from the blog pasted below) rather than distributing data across multiple tables, I'm having trouble understanding how this can accommodate a hypothetically super-long comment thread.
{
    _id: 1234,
    author: { name: "Bob Davis", email: "bob@bob.com" },
    post: "In these troubled times I like to …",
    date: { $date: "2010-07-12 13:23UTC" },
    location: [ -121.2322, 42.1223222 ],
    rating: 2.2,
    comments: [
        { user: "jgs32@hotmail.com",
          upVotes: 22,
          downVotes: 14,
          text: "Great point! I agree" },
        { user: "holly.davidson@gmail.com",
          upVotes: 421,
          downVotes: 22,
          text: "You are a moron" }
    ],
    tags: [ "Politics", "Virginia" ]
}
Aside from the comments key, which is an array of comment objects and lets us store an endless number of comments inside this document (rather than in a separate comments table that would need a join to relate in a relational database), the rest of the fields (i.e. author, post, date, location, rating, tags) could just as well be columns on a relational database table.
Since there is a limit of 16MB per document, what happens when this blog attracts a lot of comments?
Also, why can't I store a JSON object in a relational database column? After all, it's just text, isn't it?

First, a clarification: MongoDB actually stores BSON, which is essentially a superset of JSON that supports more data types.
Since there is a limit of 16MB per document, what happens when this blog attracts a lot of comments?
You won't be able to grow the document past 16MB, so you'll lose the ability to add more comments. But you don't need to store all the comments on the blog post document. You could store the first N there and retire older comments to a comments collection as new ones are added, or store comments in another collection altogether with a parent reference back to the post. The way comments are stored should match how you expect them to be used. 16MB of comments would really be a lot; you might even have a special solution for the occasional post that attracts that kind of activity, an approach that's totally different from the normal way of handling comments.
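As a minimal shell sketch of the "separate collection with a parent reference" idea (the comments collection name and index are just illustrative, not something from the question):
// Store each comment as its own document, pointing back to the post
db.comments.insert({
    post_id: 1234,                    // reference to the parent post's _id
    user: "jgs32@hotmail.com",
    upVotes: 22,
    downVotes: 14,
    text: "Great point! I agree",
    created: new Date()
})

// Fetch one page of comments for a post, newest first
db.comments.find({ post_id: 1234 }).sort({ created: -1 }).limit(20)

// An index on the parent reference keeps these lookups fast
db.comments.createIndex({ post_id: 1, created: -1 })
With this layout a post can accumulate far more than 16MB of comments in total, because no single document ever holds them all.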
We can store JSON in a relational database, so what value is Mongo giving me?
Here are two ways of storing JSON (in MongoDB).
> db.test.drop()
> db.test.insert({ "name" : { "first" : "Yogi", "last" : "Bear" }, "location" : "Yellowstone", "likes" : ["picnic baskets", "PBJ", "the great outdoors"] })
> db.test.findOne()
{
    "_id" : ObjectId("54f9f41f245e945635f2137b"),
    "name" : {
        "first" : "Yogi",
        "last" : "Bear"
    },
    "location" : "Yellowstone",
    "likes" : [
        "picnic baskets",
        "PBJ",
        "the great outdoors"
    ]
}
> var jsonstring = '{ "name" : { "first" : "Yogi", "last" : "Bear" }, "location" : "Yellowstone", "likes" : ["picnic baskets", "PBJ", "the great outdoors"] }'
> db.test.drop()
> db.test2.insert({ "myjson" : jsonstring })
> db.test2.findOne()
{
    "_id" : ObjectId("54f9f535245e945635f2137d"),
    "myjson" : "{ \"name\" : { \"first\" : \"Yogi\", \"last\" : \"Bear\" }, \"location\" : \"Yellowstone\", \"likes\" : [\"picnic baskets\", \"PBJ\", \"the great outdoors\"] }"
}
Can you store and use JSON the first way using a relational database? How useful is JSON stored in the second way compared to the first?
There are lots of other differences between MongoDB and relational databases that make one a better fit than the other for various use cases, but going further into that is too broad for an SO answer.
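To make "use JSON the first way" concrete, here is a rough sketch of queries that only work when the JSON is stored as a real document rather than an opaque string (run against the db.test collection above; the index is just an illustration):
// Query on a nested field
db.test.find({ "name.first" : "Yogi" })

// Match inside an array and project a single nested field
db.test.find({ "likes" : "PBJ" }, { "name.last" : 1 })

// An index on a nested field can back those queries
db.test.createIndex({ "name.first" : 1 })
None of this is possible against db.test2, where the JSON is just a string in the myjson field.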

Can you store and use JSON the first way using a relational database?
How useful is JSON stored in the second way compared to the first?
Sorry, are you suggesting that with Mongo, JSON documents can be stored without using escape characters, whereas with an RDBMS I must use escape characters to escape the double quotes? I wasn't aware that was the case.

Related

Best way to structure my firebase database

I'm working on a project where users can post things, but I'm wondering if my Firebase database structure is efficient. Below is how my database looks so far. The posts child contains all the posts that users will create, and each user can track their own posts through a posts child under their uid. Is there a better way of structuring my data, or am I good to go? Any advice would be appreciated!
{
    "posts" : {
        "-KVRT-4z1AUoztWnF-pe" : {
            "caption" : "",
            "likes" : 0,
            "pictureUrl" : "https://firebasestorage.googleapis.com/v0/b/cloub-4fdbd.appspot.com/o/users%2FufTgaqudXeUciW5bGgCSfoTRUw92%2F208222E1-8E20-42A0-9EEF-8AF34F523878.png?alt=media&token=9ec5301e-d913-44ee-81d0-e0ec117017de",
            "timestamp" : 1477946376629,
            "writer" : "ufTgaqudXeUciW5bGgCSfoTRUw92"
        }
    },
    "users" : {
        "ufTgaqudXeUciW5bGgCSfoTRUw92" : {
            "email" : "Test1@gmail.com",
            "posts" : {
                "-KVRT-4z1AUoztWnF-pe" : {
                    "timestamp" : 1477946376677
                }
            },
            "profileImageUrl" : "https://firebasestorage.googleapis.com/v0/b/cloub-4fdbd.appspot.com/o/profile_images%2F364DDC66-BDDB-41A4-969E-397A79ECEA3D.png?alt=media&token=c135d337-a139-475c-b7a4-d289555b94ca",
            "username" : "Test1"
        }
    }
}
When working with NoSQL data, you need to take care of a few things:
Avoid nesting the data too deep
Flatten your data structure as much as possible
Prefer dictionaries
Security rules [Firebase]
Try this structure:
users: {
    userID1: { ..//Users info..
        posts: {
            postID1: true,
            postID2: true,
            postID3: true
        }
    },
    userID2: { ..//Users info.. },
    userID3: { ..//Users info.. },
    userID4: { ..//Users info.. }
},
posts: {
    userID1: {
        postID1: { ..//POST CONTENT },
        postID2: { ..//POST CONTENT },
        postID3: { ..//POST CONTENT }
    }
}
Keep the data flat and shallow. Store data elsewhere in the tree rather than nest a branch of data under a node that is simply related, duplicating the data if that helps to keep the tree shallow.
The goal is to have fast requests that return only the data you need. Consider that every time the tree changes and the client-side listener fires, the node and all its children are sent to the client. Duplicating data across the tree facilitates quick requests with minimal data.
This process of flattening the data is known as "denormalization" and this section of the Firebase Doc does a nice job of providing guidance:
https://firebase.google.com/docs/database/android/structure-data
In your example above I see post metadata nested under "users", a nested list that grows. Every time something changes under "users", the listener fires to update the client and all of this data is transmitted in each response. You could instead fetch the post data from the "posts" node based on the writer's uid, as sketched below.
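A rough sketch of that fetch with the legacy namespaced web SDK (the uid value is taken from the example above; everything else is illustrative, not a prescribed implementation):
// Fetch only the posts written by one user from the top-level "posts" node
var uid = "ufTgaqudXeUciW5bGgCSfoTRUw92";
firebase.database()
    .ref("posts")
    .orderByChild("writer")
    .equalTo(uid)
    .once("value")
    .then(function (snapshot) {
        snapshot.forEach(function (child) {
            console.log(child.key, child.val().caption);
        });
    });
For this query to stay efficient you would also want an ".indexOn": "writer" rule on the posts node in your Firebase security rules.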

Storing a query in Mongo

This is the case: a webshop in which I want to configure which items should be listed in the shop based on a set of parameters.
I want this to be configurable, because that allows me to experiment with different parameters and change their values easily.
I have a Product collection that I want to query based on multiple parameters.
A couple of these are found here:
within product:
"delivery" : {
"maximum_delivery_days" : 30,
"average_delivery_days" : 10,
"source" : 1,
"filling_rate" : 85,
"stock" : 0
}
but also other parameters exist.
An example of such a query to decide whether or not to include a product could be:
"$or" : [
{
"delivery.stock" : 1
},
{
"$or" : [
{
"$and" : [
{
"delivery.maximum_delivery_days" : {
"$lt" : 60
}
},
{
"delivery.filling_rate" : {
"$gt" : 90
}
}
]
},
{
"$and" : [
{
"delivery.maximum_delivery_days" : {
"$lt" : 40
}
},
{
"delivery.filling_rate" : {
"$gt" : 80
}
}
]
},
{
"$and" : [
{
"delivery.delivery_days" : {
"$lt" : 25
}
},
{
"delivery.filling_rate" : {
"$gt" : 70
}
}
]
}
]
}
]
Now to make this configurable, I need to be able to handle boolean logic, parameters and values.
So, since such a query is itself JSON, I got the idea to store it in Mongo and have my Java app retrieve it.
Next thing is using it in the filter (e.g. find, or whatever) and work on the corresponding selection of products.
The advantage of this approach is that I can actually analyse the data and the effectiveness of the query outside of my program.
I would store it by name in the database. E.g.
{
    "name": "query1",
    "query": { the thing printed above starting with "$or"... }
}
using:
db.queries.insert({
    "name" : "query1",
    "query": { the thing printed above starting with "$or"... }
})
Which results in:
2016-03-27T14:43:37.265+0200 E QUERY Error: field names cannot start with $ [$or]
at Error (<anonymous>)
at DBCollection._validateForStorage (src/mongo/shell/collection.js:161:19)
at DBCollection._validateForStorage (src/mongo/shell/collection.js:165:18)
at insert (src/mongo/shell/bulk_api.js:646:20)
at DBCollection.insert (src/mongo/shell/collection.js:243:18)
at (shell):1:12 at src/mongo/shell/collection.js:161
But I CAN STORE it using Robomongo, though not always. Obviously I am doing something wrong, but I have NO IDEA what it is.
If it fails and I create a brand new collection and try again, it succeeds. Weird stuff that goes beyond what I can comprehend.
But when I try updating values in the "query", the changes never go through. Not even sometimes.
I can, however, create a new object and discard the previous one, so there is a workaround:
db.queries.update(
    { "name": "query1" },
    { "$set": {
        ... update goes here ...
    }}
)
doing this results in:
WriteResult({
    "nMatched" : 0,
    "nUpserted" : 0,
    "nModified" : 0,
    "writeError" : {
        "code" : 52,
        "errmsg" : "The dollar ($) prefixed field '$or' in 'action.$or' is not valid for storage."
    }
})
which seems pretty close to the other message above.
Needless to say, I am pretty clueless about what is going on here, so I hope some of the wizards here are able to shed some light on the matter.
I think the error message contains the important info you need to consider:
QUERY Error: field names cannot start with $
Since you are trying to store a query (or part of one) in a document, you'll end up with attribute names that contain Mongo operator keywords (such as $or, $ne, $gt). The MongoDB documentation actually references this exact scenario (emphasis added):
Field names cannot contain dots (i.e. .) or null characters, and they must not start with a dollar sign (i.e. $)...
I wouldn't trust 3rd party applications such as Robomongo in these instances. I suggest debugging/testing this issue directly in the mongo shell.
My suggestion would be to store an escaped version of the query in your document so as not to interfere with reserved operator keywords. You can use the available JSON.stringify(my_obj) to encode your partial query into a string, and then parse/decode it when you retrieve it later with JSON.parse(escaped_query_string_from_db).
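A minimal shell sketch of that round trip (the queries collection name follows the question; the products collection and the shortened query are just illustrative):
// Encode the query object into a plain string before storing it
var q = { "$or" : [ { "delivery.stock" : 1 } ] };
db.queries.insert({ name: "query1", query: JSON.stringify(q) });

// Later: read it back, decode it, and use it as a filter
var doc = db.queries.findOne({ name: "query1" });
var filter = JSON.parse(doc.query);
db.products.find(filter);
Because the stored value is a string, none of its keys start with $, so the field-name restriction no longer applies.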
Your approach of storing the query as a JSON object in MongoDB is not viable.
You could potentially store your query logic and fields in MongoDB, but you have to have an external app build the query with the proper MongoDB syntax.
MongoDB queries contain operators, and some of those have special characters in them.
There are rules for MongoDB field names, and these rules do not allow for special characters.
Look here: https://docs.mongodb.org/manual/reference/limits/#Restrictions-on-Field-Names
The probable reason you can sometimes successfully create the doc using Robomongo is because Robomongo is transforming your query into a string and properly escaping the special characters as it sends it to MongoDB.
This also explains why your attempt to update them never works. You tried to create a document, but instead created something that is a string object, so your update conditions are probably not retrieving any docs.
I see two problems with your approach.
In following query
db.queries.insert({
"name" : "query1",
"query": { the thing printed above starting with "$or"... }
})
Valid JSON expects key/value pairs; here, in "query", you are storing an object without a key. You have two options: either store the query as text or create another key inside the curly braces.
The second problem is that you are storing query values without wrapping them in quotes. All string values must be wrapped in quotes.
so your final document should appear as
db.queries.insert({
    "name" : "query1",
    "query": 'the thing printed above starting with "$or"... '
})
Now try, it should work.
Obviously my attempt to store a query in Mongo the way I did was foolish, as became clear from the answers from both @bigdatakid and @lix. So what I finally did was this: I altered the naming of the fields to comply with the Mongo requirements.
E.g. instead of $or I used _$or etc., and instead of using a . inside a name I used a #. Both of these I am replacing in my Java code.
This way I can still easily try and test the queries outside of my program. In my Java program I just change the names back and use the query, using just two lines of code. It simply works now. Thanks guys for the suggestions you made.
String documentAsString = query.toJson().replaceAll("_\\$", "\\$").replaceAll("#", ".");
Object q = JSON.parse(documentAsString);

Reference same Document in Multiple Websites

I have one admin interface website where a user can create news articles and also select which websites the article shall appear on. (Many websites are connected to the same Mongo database.)
Each website has an array of article IDs. When going to one of these websites, I loop over this array and fetch each article (from the Articles collection) belonging to this site with:
Articles.findOne({_id:id});
However, this becomes a problem if I want to do more advanced queries such as sorting on date, putting a limit, etc.
At the same time, I don't want to filter all articles for a specific site directly from the Articles collection, since that seems expensive (it contains all articles from all websites), and saving each article locally on each website would create duplicates.
I wonder what is a good way of storing these news articles while still fetching them quickly for each website?
------------------------
I am currently doing it like this to fetch all articles from a site and sort them by date. But now I also need to put a limit, only fetch articles from a specific category, and so on, and it becomes very cumbersome:
var websites = Websites.find({name: "SITENAME"}, {}).fetch();
var now = new Date();
var articles = [];

websites[0].articles.forEach(function(id) {
    var article = Articles.findOne({_id: id});
    if (article != undefined && article.publishedDate < now) {
        articles.push(article);
    }
});

articles.sort(function(a, b) {
    a = a.publishedDate;
    b = b.publishedDate;
    return a > b ? -1 : a < b ? 1 : 0;
});

return articles;
Edit to clarify:
This is the current database structure. Each article in the articles collection looks like this:
{
    "_id" : "CdHWxgq75yjcgQoDZ",
    "category" : "Nyheter",
    "tags" : [
        "ZaifTyGGouPwdrGur"
    ],
    "data" : [
        "Hello this is some random content"
    ],
    "publishedOnSites" : [
        "ZaifTyGGouPwdrGur"
    ],
    "publishedDate" : ISODate("2015-11-20T07:22:09.799Z"),
    "userId" : "B3t6QFgG7MfNkvzR5"
}
Each website in the websites collection like this:
{
    "_id" : "ZaifTyGGouPwdrGur",
    "name" : "SITENAME",
    "categories" : [
        "News",
        "Life",
        "TV",
        "Sport",
        "Quizzes",
        "Video"
    ],
    "tags" : [
        "batman",
        "bil",
        "polis",
        "flicka",
        "cool",
        "byrå",
        "förvandling"
    ],
    "articles" : [
        "PgGetxkC9KynaPNLc",
        "ZaifTyGGouPwdrGur",
        "oPQHh3u2CGhRwYp2a",
        "a5ZkhbxRcLEpggTuF",
        "t3n8Zp6Cve6e88Gmt",
        "eYQmaavt6tAwbbmzf",
        "F9LzZFcFxSpejseHn",
        "NLWb5NahoPjgAt7eN",
        "pwkTtFN8gZCsnKDGg",
        "o62uCK7S6qauJfyYa",
        "pivJGzo4CFw3QRb3v",
        "H2EHv7rX5GQmyqiDk",
        "tGfrv82NMwJEpuThK",
        "CvjGPKmsCqmd9o5oP",
        "29hoZxnmfovTnC8TM",
        "NXHXhaXDYgKLagamJ",
        "9EjfABeK5akDLeZJT",
        "5q5zeYRkPHMJXtEpT",
        "eWGwWq3J7JqtQi2fK",
        "7W27ufZ4qDyX4mJnC",
        "oBhGpNCBTrMcb3qvq",
        "7pRorBYbZ8Mx6jYX3",
        "d2PoAFGTcbQzapXpW",
        "qDRiB65vcpMu6KTTe"
    ]
}
I save the article IDs in each website document to fetch them quickly without having to filter all articles. However, this becomes a problem when I want to make queries such as sorting on date, putting a limit, skipping the first elements, only fetching articles with a certain category, etc.
I need suggestions for a better database structure.
Usually, it's better to let MongoDB handle the filtering, sorting etc. It knows how to do it well and how to do it fast.
So, what you'd want to do is this:
var articles_ids = Websites.findOne({name: "SITENAME"}).articles;
var articlesCursor = Articles.find({_id: {$in: articles_ids}}, {sort: {publishedDate: -1}});
On the second line, you can add a limit etc. If you're worried about performance, add indexes, e.g.:
db.articles.createIndex({_id: 1, publishedDate: -1});
Note: Do not just add this index to your database. Analyze what kind of queries you have and add indexes based off of that. The above was just an example.
Also, you might want to consider adding a field to the Articles collection that stores all the websites that this article belongs to. E.g:
article: {
    someField: someValue,
    websites_ids: [1, 5, 8, 10]
}
This is useful if you want to make your query reactive. E.g:
var articlesCursor = Articles.find({websites_ids: website_id}, {sort: {publishedDate: -1}});
This way, if the cursor is reactive and an article is added to a website, the client immediately receives this information about the article. If done your way, the cursor would only track the specific IDs of the articles. Something to consider.

The correct way of storing document reference in one-to-one relationship in MongoDB

I have two MongoDB collections user and customer which are in one-to-one relationship. I'm new to MongoDB and I'm trying to insert documents manually although I have Mongoose installed. I'm not sure which is the correct way of storing document reference in MongoDB.
I'm using normalized data model and here is my Mongoose schema snapshot for customer:
/** Parent user object */
user: {
    type: Schema.Types.ObjectId,
    ref: "User",
    required: true
}
user
{
    "_id" : ObjectId("547d5c1b1e42bd0423a75781"),
    "name" : "john",
    "email" : "test@localhost.com",
    "phone" : "01022223333"
}
I want to make a reference to this user document from the customer document. Which of the following is correct - (A) or (B)?
customer (A)
{
    "_id" : ObjectId("547d916a660729dd531f145d"),
    "birthday" : "1983-06-28",
    "zipcode" : "12345",
    "address" : "1, Main Street",
    "user" : ObjectId("547d5c1b1e42bd0423a75781")
}
customer (B)
{
    "_id" : ObjectId("547d916a660729dd531f145d"),
    "birthday" : "1983-06-28",
    "zipcode" : "12345",
    "address" : "1, Main Street",
    "user" : {
        "_id" : ObjectId("547d5c1b1e42bd0423a75781")
    }
}
Remember these things
Embedding is better for...
Small subdocuments
Data that does not change regularly
When eventual consistency is acceptable
Documents that grow by a small amount
Data that you’ll often need to perform a second query to fetch
Fast reads
References are better for...
Large subdocuments
Volatile data
When immediate consistency is necessary
Documents that grow a large amount
Data that you’ll often exclude from the results
Fast writes
Variant A is better.
You can also use populate() with Mongoose.
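A rough sketch of what that looks like with classic callback-style Mongoose (the Customer model and the zipcode filter are assumptions based on the schema and documents above, not the asker's actual code):
// Find a customer and resolve its "user" reference in one call
Customer.findOne({ zipcode: "12345" })
    .populate("user")   // swaps the stored ObjectId for the referenced User document
    .exec(function (err, customer) {
        if (err) throw err;
        console.log(customer.user.name);   // "john"
    });
This only works cleanly with variant A, where the user field is a plain ObjectId matching the schema's ref.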
Use variant A. As long as you don't want to denormalize any other data (like the user's name), there's no need to create a child object.
This also avoids unexpected complexities with the index, because indexing an object might not behave like you expect.
Even if you were to embed an object, _id would be a weird name - _id is only a reserved name for a first-class database document.
One to one relations
One-to-one relations are relations where each item corresponds to exactly one other item, e.g.:
an employee has a resume and vice versa
a building has a floor plan and vice versa
a patient has a medical history and vice versa
// employee
{
    _id : '25',
    name: 'john doe',
    resume: 30
}

// resume
{
    _id : '30',
    jobs: [....],
    education: [...],
    employee: 25
}
We can model the employee-resume relation by having a collection of employees and a collection of resumes and having the employee point to the resume through linking, where we have an ID that corresponds to an ID in the resume collection. Or, if we prefer, we can link in the other direction, with an employee key inside the resume collection pointing back to the employee. Or, if we want, we can embed: we could take the entire resume document and embed it right inside the employee collection, or vice versa.
Whether to embed depends on how the data is accessed by the application and how frequently it is accessed. We need to consider:
frequency of access
the size of the items: what is growing all the time and what is not. Every time we add something to a document, there is a point beyond which the document needs to be moved within the collection. Growing beyond the 16MB document size limit, though, is mostly unlikely.
atomicity of data: there are no transactions in MongoDB, but there are atomic operations on individual documents. So if we knew that we couldn't withstand any inconsistency and wanted to be able to update the entire employee plus the resume all the time, we might decide to put them into the same document and embed them one way or the other so that we can update it all at once.
In MongoDB it is generally recommended to embed documents where you can, especially in your case where you have a 1-to-1 relation.
Why? You can't use atomic join operations in your queries (though that is not the main concern). The better reason is that each join (theoretically) needs a hard seek, which takes about 20 ms, while embedding your sub-document needs just one hard seek.
I believe the best DB schema for you is a single document per person, under one _id, holding all of your entities:
{
    _id : ObjectId("547d5c1b1e42bd0423a75781"),
    userInfo :
    {
        "name" : "john",
        "email" : "test@localhost.com",
        "phone" : "01022223333"
    },
    customerInfo :
    {
        "birthday" : "1983-06-28",
        "zipcode" : "12345",
        "address" : "1, Main Street"
    },
    staffInfo :
    {
        ........
    }
}
Now if you just want the userInfo you can use
db.users.findOne({_id : ObjectId("547d5c1b1e42bd0423a75781")},{userInfo : 1}).userInfo;
it will give you just the userInfo:
/* 0 */
{
    "name" : "john",
    "email" : "test@localhost.com",
    "phone" : "01022223333"
}
And if you just want the customerInfo you can use
db.users.findOne({_id : ObjectId("547d5c1b1e42bd0423a75781")},{customerInfo : 1}).customerInfo;
it will give you just the customerInfo:
/* 0 */
{
    "birthday" : "1983-06-28",
    "zipcode" : "12345",
    "address" : "1, Main Street"
}
and so on.
This schema has the minimum of hard round trips, and you are actually using MongoDB's document-based features with the best performance you can achieve.

MongoDB Schema Design for language database

I need some advice on MongoDB schema design for a natural language database.
For each language I need to store texts and words, like:
lang: {
    _id: "English",
    texts : [
        { text : "This is a first text",
          date : Date("2011-09-19T04:00:10.112Z"),
          tag : "test1"
        },
        { text : "Second One",
          date : Date("2011-09-19T04:00:10.112Z"),
          tag : "test2"
        }
    ],
    words : [
        { word : "This" },
        { word : "is" },
        { word : "a" },
        { word : "first" },
        { word : "text" },
        { word : "second" },
        { word : "one" }
    ]
}
And then I need to know which words and texts a user has associated. The number of words and texts tends to be huge, and I need to list all words in a language as well as all words a user has associated for that language.
From my perspective, I think storing the user_ids associated with a given word in an array on that word is maybe a good approach, like:
lang: {
    _id: "English",
    texts : [
        ...
    ],
    words : [
        {
            word : "This",
            users: [user1, user2, user3]
        },
        {
            word : "is",
            users: [user1, user2]
        },
        ...
    ]
}
Bearing in mind that a word can be associated with hundreds of thousands of users, that the document limit (as I read) is 4MB, and that I need to:
List all words for a given user and language
Is this a good approach? Or can you think of a better one?
Hope this question is clear enough and that someone can give me a hand with this ;)
Thank you all!
I don't think this is a good approach, for just the reason you mention: the document size limit. It looks like with your approach, you are definitely going to run up against the limit. I would go for a flatter approach (which should also make your collection easier to query). Something like this:
[
    {
        user: "user1",
        word: "This",
        lang: "en"
    },
    {
        user: "user1",
        word: "is",
        lang: "en"
    },
    // et cetera...
]
In other words, grow vertically by adding documents rather than horizontally by adding more data to one document. You can query words for a given user with db.find( { user: "user1", lang: "en" });.
This approach isn't "normalized", of course, so if you're concerned about space then you might want to create a separate collection for users, words, and languages and reference them in the main collection by an ID. But since there are no join queries in MongoDB, you have to weigh query performance against space efficiency.
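A rough shell sketch of that normalized variant (the words, users, and user_words collection names are illustrative, not a prescription):
// One document per word and per user
var wordId = ObjectId();
var userId = ObjectId();
db.words.insert({ _id: wordId, word: "This", lang: "en" });
db.users.insert({ _id: userId, name: "user1" });

// The association collection stores only references
db.user_words.insert({ user: userId, word: wordId, lang: "en" });

// Listing a user's words then takes a second query, since MongoDB has no joins
var wordIds = db.user_words.find({ user: userId, lang: "en" }).map(function (d) { return d.word; });
db.words.find({ _id: { $in: wordIds } });
That second query is the space-versus-query-performance trade-off mentioned above.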
dbaseman is correct (and upvoted), but a couple of other points:
First, the document limit is now 16MB (Max Document Size) as of this writing, assuming you are running a recent version of MongoDB.
Second, unbounded growth is generally a bad idea in MongoDB. This type of document size expansion can cause MongoDB to have to move the document if it exceeds the space currently allocated to it. You can read more about this in the Padding Factor section of the documentation.
Those types of moves are relatively expensive, especially if they happen frequently. Therefore, if you do go with this type of design, limiting the size of the array (essentially bounding that growth) in your main collection (most recent X, most popular X, etc.) and perhaps even pre-populating that document field (essentially manual padding) to beyond the average size will reduce the moves caused by additions and changes.
This is the reason why tip #6 in the MongoDB Developers tips and tricks book from O'Reilly is:
Tip #6: Do not embed fields that have unbound growth
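As a minimal sketch of the "most recent X" bounding mentioned above, recent MongoDB versions let you cap an embedded array with the $slice modifier to $push (collection name and the newComment object are illustrative):
// Keep only the 50 most recent entries embedded in the parent document
var newComment = { user: "someone@example.com", text: "Nice post", created: new Date() };
db.posts.update(
    { _id: 1234 },
    { $push: { comments: { $each: [ newComment ], $slice: -50 } } }
)
Older entries can then be archived in a separate collection if they still need to be queryable.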