Relation N to N mongodb. Embedded or reference? - mongodb

I'm a newbie in MongoDB. Suppose I have the following:
In this case, I have an N-to-N relation between posts and tags, because one post can have several tags associated with it and vice versa.
Post
{
"_id" : ObjectId("508d27069cc1ae293b36928d"),
"title" : "This is the title",
"body" : "This is the body text.",
"tags" : [
"chocolate",
"spleen",
"piano",
"spatula"
],
"created_date" : ISODate("2012-10-28T12:41:39.110Z"),
"author_id" : ObjectId("508d280e9cc1ae293b36928e"),
"category_id" : ObjectId("508d29709cc1ae293b369295"),
"comments" : [
{
"subject" : "This is coment 1",
"body" : "This is the body of comment 1.",
"author_id" : ObjectId("508d345f9cc1ae293b369296"),
"created_date" : ISODate("2012-10-28T13:34:23.929Z")
},
{
"subject" : "This is coment 2",
"body" : "This is the body of comment 2.",
"author_id" : ObjectId("508d34739cc1ae293b369297"),
"created_date" : ISODate("2012-10-28T13:34:43.192Z")
},
{
"subject" : "This is coment 3",
"body" : "This is the body of comment 3.",
"author_id" : ObjectId("508d34839cc1ae293b369298"),
"created_date" : ISODate("2012-10-28T13:34:59.336Z")
}
]
}
Let's say I have to change a specific tag, for instance replace "chocolate" with "choco". There will probably be a lot of posts with the tag "chocolate".
Is this embedded approach the correct one, or do I have to use references?

If you expect that a value used in multiple places may need to change, you'll need to store it by reference. In your case, if you have a ton of posts with the tag chocolate and you need to replace it with choco, it will be very inefficient (and a little bit risky) if the tags are embedded, because every matching post has to be rewritten. However, if it's all done by reference, you just need to change chocolate to choco in one place.
StackOverflow actually does this. Question tags may be renamed, and this instantly renames all instances of the tag. This is easy for StackOverflow, though, because they use a relational SQL database in the backend.
Which leads me to my final point: if you expect to need a lot of reference data, you should really reconsider using Mongo. When your data has a lot of reference relations, you probably want to use a relational database.
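As a rough sketch of what "by reference" means here, the snippet below uses plain JavaScript arrays as stand-ins for two collections rather than a live MongoDB. The `tags` collection, the `tag_ids` field and the helper names are assumptions made up for illustration; they are not part of the original post.

```javascript
// Hypothetical in-memory stand-ins for a "tags" and a "posts" collection.
const tags = [
  { _id: 1, name: "chocolate" },
  { _id: 2, name: "piano" },
];
const posts = [
  { _id: "a", title: "First post", tag_ids: [1, 2] },
  { _id: "b", title: "Second post", tag_ids: [1] },
];

// Renaming a referenced tag touches exactly one document.
function renameTag(oldName, newName) {
  const tag = tags.find((t) => t.name === oldName);
  if (tag) tag.name = newName;
}

// The price: reading a post's tag names needs an extra lookup.
function tagNames(post) {
  return post.tag_ids.map((id) => tags.find((t) => t._id === id).name);
}

renameTag("chocolate", "choco");
console.log(tagNames(posts[0])); // [ 'choco', 'piano' ]
console.log(tagNames(posts[1])); // [ 'choco' ]
```

With embedded tags, the same rename has to rewrite every post that contains the string. In the shell that would be a multi-document update, something along the lines of db.posts.updateMany({tags: "chocolate"}, {$set: {"tags.$": "choco"}}).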

Related

MongoDB: Is it possible to index(unique) subarrays from documents in an isolated way?

I recently encountered an issue that I'd like to solve. I'd be grateful for any suggestions.
I have documents that represent "users", and each document has a subarray that stores some codes; there can be many for each user. The point is that a user cannot have duplicate codes in its own array, but each document should be isolated: two or more identical codes are allowed as long as they belong to different documents (users).
In short, the subarray ("codes") cannot contain duplicate codes ("code") within one document, but that shouldn't interfere with other documents.
I could do that in the application, but I think enforcing that guarantee directly in the DB is safer.
Is it possible to create indexes for this specific situation?
Example of two documents representing their respective users:
{ // Document of user 1
"_id" : "1", //user 1 and its codes
"codes" : [
{
"code" : "1111",
"description" : "code 1",
},
{
"code" : "2222",
"description" : "code 2",
},
{
"code" : "3333",
"description" : "code 3",
}
]
},
{ // Document of user 2
"_id" : "2", //user 2 and its codes
"codes" : [
{
"code" : "1111",
"description" : "code 1",
},
{
"code" : "4444",
"description" : "code 2",
},
{
"code" : "2222",
"description" : "code 3",
}
]
}
Thank you!
Use $addToSet (https://docs.mongodb.com/manual/reference/operator/update/addToSet/) to maintain uniqueness of the code subdocuments. $addToSet compares whole subdocuments, including field order, so you will need to ensure that you always specify the fields in the same order (e.g. code, description).
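To make the field-order caveat concrete, here is a small plain-JavaScript model of that comparison. It is a simplified sketch, not MongoDB's implementation: `sameSubdocument` and `addToSet` are hypothetical helpers, and comparing via JSON.stringify only approximates an exact, order-sensitive match for flat objects like these.

```javascript
// Treat two subdocuments as equal only if they have the same fields,
// in the same order, with the same values (simplified model).
function sameSubdocument(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Hypothetical model of $addToSet on a single array field.
function addToSet(array, item) {
  if (!array.some((existing) => sameSubdocument(existing, item))) {
    array.push(item);
  }
  return array;
}

const codes = [{ code: "1111", description: "code 1" }];

addToSet(codes, { code: "1111", description: "code 1" });  // exact duplicate: ignored
addToSet(codes, { description: "code 1", code: "1111" });  // other field order: added
addToSet(codes, { code: "1111", description: "renamed" }); // other description: added

console.log(codes.length); // 3
```

The last call shows the limit of this approach: uniqueness applies to the whole subdocument, not to code alone. If code by itself has to be unique within a user, one option is a guarded update, for example filtering on {_id: "1", "codes.code": {$ne: "1111"}} and using $push, so that the push only happens when no element already has that code.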

Relationship between 2 collections in MongoDB and show data between them

I am making a small database for library with MongoDB. I have 2 collections, first one is called 'books' which stores information about books. The second collection is called 'publishers' which stores information about the publishers and the IDs of the books which they published.
This is the document structure for 'books'. It has 3 documents
{
"_id" : ObjectId("565f2481104871a4a235ba00"),
"book_id" : 1,
"book_name" : "C++",
"book_detail" : "This is details"
},
{
"_id" : ObjectId("565f2492104871a4a235ba01"),
"book_id" : 2,
"book_name" : "JAVA",
"book_detail" : "This is details"
},
{
"_id" : ObjectId("565f24b0104871a4a235ba02"),
"book_id" : 3,
"book_name" : "PHP",
"book_detail" : "This is details"
}
This is the document structure for 'publishers'. It has 1 document.
{
"_id" : ObjectId("565f2411104871a4a235b9ff"),
"pub_id" : 2,
"pub_name" : "Publisher 2",
"pub_details" : "This is publishers details",
"book_id" : [2,3]
}
I want to write a query to show all the details of the books which are published by this publisher. I have written this query but it does not work. When I run it, it displays this message "Script executed successfuly, but there are no results to show.".
db.getCollection('publishers').find({"pub_id" : 2}).forEach(
function (functionName) {
functionName.books = db.books.find( { "book_id": functionName.book_id } ).toArray();
}
)
I think that your data structure is flawed. The publisher is a property of a book, not the other way around. You should add pub_id to each book, and remove book_id from the publisher:
{
"_id" : ObjectId("565f2481104871a4a235ba00"),
"book_id" : 1,
"book_name" : "C++",
"book_detail" : "This is details",
"pub_id" : 1
},
{
"_id" : ObjectId("565f2492104871a4a235ba01"),
"book_id" : 2,
"book_name" : "JAVA",
"book_detail" : "This is details",
"pub_id" : 2
},
{
"_id" : ObjectId("565f24b0104871a4a235ba02"),
"book_id" : 3,
"book_name" : "PHP",
"book_detail" : "This is details",
"pub_id" : 2
}
Then, select your books like so:
db.getCollection('books').find({"pub_id" : 2});
Try this way,
db.getCollection('publishers').find({"pub_id" : 2}).forEach(function (publisher) {
    // pub_id is stored as a number, so match on 2 rather than "2".
    // book_id is an array of ids, so match any of them with $in.
    publisher.books = db.books.find({ "book_id": { $in: publisher.book_id } }).toArray();
    // forEach returns nothing, so print the combined document explicitly.
    printjson(publisher);
});
I would suggest reading the official documentation, because the relationship between books and publishers is precisely the example which is used there: https://docs.mongodb.org/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/
In MongoDB and NoSQL at large, it is not true that the publisher must be a property of the book. This is only the case in an RDBMS, where in one-to-many relationships the foreign key is stored on the "many" side and points to the "one" side. The same works the other way round: books don't have to be a property of the publisher. The clue here is in the absence of "must."
It all depends on how many the "many" in one-to-many is. In this case, I'd say it's also about what type of library we're talking about, the size of the catalogue and whether new book purchases are common:
Is the number of books per publisher small AND data about publisher is often accessed with data about the book? Then, embed publisher info in the book document.
Is the number of books per publisher fairly big but reasonably stable (e.g.: historical library where acquisitions are rare)? Then, create a publishers collection with an array of books per publisher.
Is the number of books per publisher fairly big and catalogue grows at a reasonable pace? Then, include reference in the book document and fetch publisher info with it.
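For the second option (a publishers collection holding an array of book ids, which is what the question already has), fetching the books could be sketched as below. This is plain JavaScript over in-memory arrays, not a live database; `booksOfPublisher` is a hypothetical helper that models a find with $in.

```javascript
// In-memory stand-ins for the two collections from the question.
const books = [
  { book_id: 1, book_name: "C++" },
  { book_id: 2, book_name: "JAVA" },
  { book_id: 3, book_name: "PHP" },
];
const publishers = [
  { pub_id: 2, pub_name: "Publisher 2", book_id: [2, 3] },
];

// Models db.books.find({ book_id: { $in: publisher.book_id } }):
// keep every book whose book_id is one of the publisher's ids.
function booksOfPublisher(pubId) {
  const publisher = publishers.find((p) => p.pub_id === pubId);
  if (!publisher) return [];
  return books.filter((b) => publisher.book_id.includes(b.book_id));
}

console.log(booksOfPublisher(2).map((b) => b.book_name)); // [ 'JAVA', 'PHP' ]
```

The query in the question compared book_id with the whole array [2,3], which is an equality match against the array itself; no book has that value, so nothing came back. Matching any element of the array needs $in.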
Side note
Although not related to the question, I think your document structure is flawed. _id and book_id are redundant. If you want to follow the RDBMS pattern of incremental integer IDs, then it's absolutely OK that you specify your own _id at the time of inserting the document with 1, 2, 3, etc. ObjectID() is a great thing, but, again, there's no obligation to use it.

ALPS sample implementation

I'm looking for a sample client implementation utilizing ALPS (not the mountains but the Application-Level Profile Semantics).
Do YOU! have one?
I've looked into the related RFC draft and discussions but still can't quite figure it out.
Specifically I would like to know how my client should know what the descriptor describes, given that my client supposedly knows nothing about the structure and semantics of the REST API as the REST principle demands?
As a human I know that a descriptor with an id tag called "users" is likely to describe how to interact with users but how is my client to know without me telling him explicitly?
I know I could insert some kind of keyword to show up in the descriptor and tell my client to match the appropriate ones but this seems hardly the right way.
I'll happily provide a more detailed example if somebody is willing to read it.
I'm exploring ALPS for the first time too, and my understanding from that RFC draft wasn't immediate either.
Here is a slideshow (166 slides, so it's not possible to copy it all into this answer) from the author of the RFC which I think gives a much better understanding of the role ALPS plays.
As a human I know that a descriptor with an id tag called users is likely to describe how to interact with users but how is my client to know this without me telling him explicitly?
From this slideshow, I deduce this answer to your question: He doesn't.
In the slideshow, a sample ALPS profile is compared with equivalent HTML code for a form submit. The browser knows how to render the HTML to the screen, but only the human knows what it means to POST that form with those input fields, using that submit button.
Here is an example "Complete JSON Representation", taken from alps.io:
{
"alps" : {
"version" : "1.0",
"doc" : {
"href" : "http://example.org/samples/full/doc.html"
},
"descriptor" : [
{
"id" : "search",
"type" : "safe",
"doc" : {"value" :
"A search form with a two inputs"
},
"descriptor" : [
{
"id" : "value",
"name" : "search",
"type" : "descriptor",
"doc" : { "value" : "input for search" }
},
{ "href" : "#resultType" }
]
},
{
"id" : "resultType",
"type" : "descriptor",
"description" : {"value" : "results format"},
"ext" : [
{
"href" : "http://alps.io/ext/range",
"value" : "summary,detail"
}
]
}
]
}
}
Take, for example, a generic mobile phone app which displays screens to the user based on REST responses. Say a HAL+JSON response contains a reference to a search entity. The app can look up in this ALPS document what a search entity is, and can be coded to represent that. Namely, a search is something which has a name/value pair (with an id) and an href. The href refers to the second descriptor, with id resultType, which lets the app know the format to expect for search results. The actual URLs and data involved would come from the REST responses.
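A minimal sketch of that lookup, in plain JavaScript, might look like this. The profile is a trimmed copy of the alps.io sample above; `findDescriptor` and `resolveChildren` are hypothetical helper names, not part of any ALPS library, and only local "#id" hrefs are handled.

```javascript
// Trimmed copy of the sample profile (doc elements omitted).
const profile = {
  alps: {
    version: "1.0",
    descriptor: [
      {
        id: "search",
        type: "safe",
        descriptor: [
          { id: "value", name: "search", type: "descriptor" },
          { href: "#resultType" },
        ],
      },
      {
        id: "resultType",
        type: "descriptor",
        ext: [{ href: "http://alps.io/ext/range", value: "summary,detail" }],
      },
    ],
  },
};

// Find a descriptor by id anywhere in the (possibly nested) list.
function findDescriptor(descriptors, id) {
  for (const d of descriptors || []) {
    if (d.id === id) return d;
    const nested = findDescriptor(d.descriptor, id);
    if (nested) return nested;
  }
  return undefined;
}

// Replace local "#fragment" hrefs with the descriptor they point at.
function resolveChildren(descriptor, root) {
  return (descriptor.descriptor || []).map((child) =>
    child.href && child.href.startsWith("#")
      ? findDescriptor(root, child.href.slice(1))
      : child
  );
}

const root = profile.alps.descriptor;
const search = findDescriptor(root, "search");
const children = resolveChildren(search, root);
console.log(children.map((c) => c.id)); // [ 'value', 'resultType' ]
```

Note that the code only tells the client what shape a search has. What "search" means still has to be known to whoever wrote the client, which is the point made above.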
From July 2014, here is a Spring blog article describing the ALPS for an app which manages a "To Do List". The ALPS document describes
What is a todo entity
What actions can be done with a todo entity
An abridged version of the ALPS profile for that small app:
{
"version" : "1.0",
"descriptors" : [ {
"id" : "todo-representation",
"descriptors" : [ {
"name" : "description",
"doc" : {
"value" : "Details about the TODO item",
"format" : "TEXT"
},
"type" : "SEMANTIC"
}, {
"name" : "title",
"doc" : {
"value" : "Title for the TODO item",
"format" : "TEXT"
},
"type" : "SEMANTIC"
}, {
"name" : "id",
"type" : "SEMANTIC"
}, {
"name" : "completed",
"doc" : {
"value" : "Is it completed?",
"format" : "TEXT"
},
"type" : "SEMANTIC"
} ]
}, {
"id" : "create-todos",
"name" : "todos",
"type" : "UNSAFE",
"rt" : "#todo-representation"
}, {
"id" : "get-todos",
"name" : "todos",
"type" : "SAFE",
"rt" : "#todo-representation"
}, {
"id" : "delete-todo",
"name" : "todo",
"type" : "IDEMPOTENT",
"rt" : "#todo-representation"
} ]
}
I guess one way to think of it might be as a kind of "schema", but instead of database tables, it describes the scope of REST responses.
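To illustrate the "schema" reading, a client could walk the abridged profile above and list, for each action, its type and the fields of the representation it returns. This is only a sketch in plain JavaScript: `describeActions` is a hypothetical helper, the doc elements are left out, and the key names follow the Spring output shown above.

```javascript
// Abridged to-do profile from above, without the doc elements.
const todoProfile = {
  version: "1.0",
  descriptors: [
    {
      id: "todo-representation",
      descriptors: [
        { name: "description", type: "SEMANTIC" },
        { name: "title", type: "SEMANTIC" },
        { name: "id", type: "SEMANTIC" },
        { name: "completed", type: "SEMANTIC" },
      ],
    },
    { id: "create-todos", name: "todos", type: "UNSAFE", rt: "#todo-representation" },
    { id: "get-todos", name: "todos", type: "SAFE", rt: "#todo-representation" },
    { id: "delete-todo", name: "todo", type: "IDEMPOTENT", rt: "#todo-representation" },
  ],
};

// For every descriptor that names a return type (rt), list the action,
// its type, and the field names of the representation it points to.
function describeActions(profile) {
  const byId = new Map(profile.descriptors.map((d) => [d.id, d]));
  return profile.descriptors
    .filter((d) => d.rt)
    .map((d) => {
      const target = byId.get(d.rt.slice(1)); // drop the leading "#"
      return {
        action: d.id,
        type: d.type,
        fields: target ? (target.descriptors || []).map((f) => f.name) : [],
      };
    });
}

for (const a of describeActions(todoProfile)) {
  console.log(`${a.action} (${a.type}): ${a.fields.join(", ")}`);
}
```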

mongodb: taking a set of keys from one collection and matching with another

I'm new to MongoDB and JavaScript and have been reading the manual, but I can't seem to put the pieces together to solve the following problem. I was wondering if you could kindly help.
I have two collections "places" and "reviews".
One document in "places" collection is as follows:
{
"_id" : "004571a7-afe4-4124-996e-b6ec779db494",
"name" : "wakawaka place",
"address" : {
"address" : "12 ad avenue",
"city" : "New York",
},
"review" : [
{
"id" : "i32347",
"review_list" : [
"r123456",
"r123457"
],
}
]
}
The "review" array can be empty for some documents.
And in the "reviews" collection, every document in the collection represents a review:
{
"_id" : ObjectId("53c913689c8e91a5a9c4047f"),
"user_id" : "useridhere",
"review_id" : "r123456",
"attraction_id" : "i32347",
"content" : "review content here"
}
What I would like to achieve is, for each place that has reviews, to get the content of each review from the "reviews" collection and store them together in a new collection.
I'd be grateful for any suggestions on how to go about this.
Thanks
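One possible way to approach this, sketched in plain JavaScript over in-memory arrays rather than a live database: collect the review ids of each place, look them up in the reviews, and build one combined document per place. The second place, the second review and the `buildPlaceReviews` helper are assumptions added for illustration.

```javascript
// In-memory stand-ins for the two collections.
const places = [
  {
    _id: "004571a7-afe4-4124-996e-b6ec779db494",
    name: "wakawaka place",
    review: [{ id: "i32347", review_list: ["r123456", "r123457"] }],
  },
  { _id: "hypothetical-place-without-reviews", name: "quiet place", review: [] },
];
const reviews = [
  { review_id: "r123456", attraction_id: "i32347", content: "review content here" },
  { review_id: "r123457", attraction_id: "i32347", content: "another review (made up)" },
];

// Build one combined document per place that has at least one review id.
function buildPlaceReviews(places, reviews) {
  const byReviewId = new Map(reviews.map((r) => [r.review_id, r]));
  const result = [];
  for (const place of places) {
    // Flatten the review_list arrays of all entries in "review".
    const ids = (place.review || []).flatMap((entry) => entry.review_list || []);
    if (ids.length === 0) continue; // skip places without reviews
    result.push({
      _id: place._id,
      name: place.name,
      reviews: ids
        .filter((id) => byReviewId.has(id))
        .map((id) => byReviewId.get(id).content),
    });
  }
  return result;
}

console.log(JSON.stringify(buildPlaceReviews(places, reviews), null, 2));
```

In the shell, the lookup step would correspond to something like db.reviews.find({review_id: {$in: ids}}), and each combined document could then be written to the new collection with insertOne.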

mongo updating array element and race condition?

I am imagining foo updating the third comment, comments.2.value, while bar is $pull-ing (removing) the first comment.
If foo finishes first, the third comment is updated successfully, since the index is still correct.
But if bar finishes first, the index has changed, and foo's comments.2.value no longer refers to the third comment.
Is this scenario possible, and if it is, are there common solutions for array element updates and race conditions?
Thank you !
The situation that you described is theoretically possible if multiple applications are accessing the database simultaneously. For this reason, it is best, if possible, to give each member of the array some unique identifier, rather than accessing elements in the array by position.
For example,
> db.myComments.save({_id:1,
comments:[
{cid:1, author:"Marc", comment:"Marc's Comment"},
{cid:2, author:"Mike", comment:"Mike's Comment"},
{cid:3, author:"Barrie", comment:"Barrie's Comment"}
]})
If we want to modify Mike's Comment, but we don't necessarily know that it will appear second in the array, we can update it like so:
> db.myComments.update({_id:1, "comments.cid":2}, {$set:{"comments.$.comment":"Mike's NEW Comment"}})
> db.myComments.find().pretty()
{
"_id" : 1,
"comments" : [
{
"cid" : 1,
"author" : "Marc",
"comment" : "Marc's Comment"
},
{
"author" : "Mike",
"cid" : 2,
"comment" : "Mike's NEW Comment"
},
{
"cid" : 3,
"author" : "Barrie",
"comment" : "Barrie's Comment"
}
]
}
We could even change the entire sub-document, like so:
> db.myComments.update({_id:1, "comments.cid":2}, {$set:{"comments.$":{cid:4, author:"someone else", comment:"A completely new comment!"}}})
> db.myComments.find().pretty()
{
"_id" : 1,
"comments" : [
{
"cid" : 1,
"author" : "Marc",
"comment" : "Marc's Comment"
},
{
"cid" : 4,
"author" : "someone else",
"comment" : "A completely new comment!"
},
{
"cid" : 3,
"author" : "Barrie",
"comment" : "Barrie's Comment"
}
]
}
The query document will find the first value in the array that matches, and the "$" in the update document references that position.
More information on the "$" operator may be found in the "The $ positional operator" section of the "Updating" documentation.
http://www.mongodb.org/display/DOCS/Updating#Updating-The%24positionaloperator
Hopefully this will give you an idea of how your application can modify values in an array without referencing their position. Good luck!
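To see why the identifier matters, here is a small plain-JavaScript simulation of the scenario from the question, with bar's removal landing before foo's update. It runs on an in-memory object, not on MongoDB, and the three helpers are hypothetical models of the corresponding update operators. Four comments are used so that index 2 still exists after the removal.

```javascript
function makeDoc() {
  return {
    _id: 1,
    comments: [
      { cid: 1, comment: "first" },
      { cid: 2, comment: "second" },
      { cid: 3, comment: "third" },
      { cid: 4, comment: "fourth" },
    ],
  };
}

// bar: models {$pull: {comments: {cid: 1}}}
function pullByCid(doc, cid) {
  doc.comments = doc.comments.filter((c) => c.cid !== cid);
}

// foo, by position: models {$set: {"comments.2.comment": text}}
function setByIndex(doc, index, text) {
  doc.comments[index].comment = text;
}

// foo, by identifier: models a query on "comments.cid"
// combined with {$set: {"comments.$.comment": text}}
function setByCid(doc, cid, text) {
  const target = doc.comments.find((c) => c.cid === cid);
  if (target) target.comment = text;
}

// bar finishes first, then foo updates "the third comment" by position.
const byPosition = makeDoc();
pullByCid(byPosition, 1);
setByIndex(byPosition, 2, "edited");

// Same order of events, but foo addresses the comment by its cid.
const byId = makeDoc();
pullByCid(byId, 1);
setByCid(byId, 3, "edited");

console.log(byPosition.comments.find((c) => c.comment === "edited").cid); // 4
console.log(byId.comments.find((c) => c.comment === "edited").cid); // 3
```

The positional version silently edits the fourth comment, while the cid-based version still reaches the intended one.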