mongodb data model design - embedded document only?


I'm not sure if I have to use only embedded documents for this example:
I have a basic model for a user
Every user has multiple pages
Every page has multiple sessions
Every session has multiple actions
So it might look like this:
user = {
  'email': 'test@test.com',
  'pages': [
    {
      'name': 'best page',
      'sessions': [
        {
          'session_name': 'abc',
          'actions': [
            {'abc': 'def'},
            {'abc': 'def'}
          ]
        },
      ]
    }, ..
  ]
};
Basically there are 3 nested arrays. Sessions data will only be read (no update operations). I was thinking about making sessions a separate model that references its page. Is that a good idea?

Your architecture is fine. This is a one-to-many relationship, so you should keep the sessions data nested as in your example. If you stored sessions separately, then even a query that only reads sessions data might later need multiple queries to resolve the references. For that reason it is better not to keep the sessions data separately. For more information, see Mongo: Model One-to-Many Relationships
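For comparison, the referenced layout the question asks about could look roughly like the sketch below. It is only an illustration in plain JavaScript: the helper name splitSessions and the reference keys userEmail and pageName are assumptions, not part of the original model. It shows why reading a user together with its sessions then takes a second lookup.

```javascript
// Hypothetical helper: split one nested user document into a user document
// plus one document per session that points back to its user and page.
function splitSessions(user) {
  const sessions = [];
  const pages = user.pages.map((page) => {
    for (const session of page.sessions) {
      sessions.push({
        userEmail: user.email, // assumed reference key back to the user
        pageName: page.name,   // assumed reference key back to the page
        session_name: session.session_name,
        actions: session.actions,
      });
    }
    // The page keeps every field except its sessions array.
    const { sessions: _omitted, ...rest } = page;
    return rest;
  });
  return { user: { ...user, pages }, sessions };
}

const nested = {
  email: 'test@test.com',
  pages: [
    {
      name: 'best page',
      sessions: [
        { session_name: 'abc', actions: [{ abc: 'def' }, { abc: 'def' }] },
      ],
    },
  ],
};

const result = splitSessions(nested);
console.log(JSON.stringify(result, null, 2));
```

With this shape, the user document stays small, but showing a page with its sessions needs one read for the user and another for the matching session documents.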

Related

Good DB-design to reference different collections in MongoDB

I regularly face the same problem of how to reference several different collections in the same property in MongoDB (or any other NoSQL database). I usually use Meteor.js for my projects.
Let's take an example for a notes collection that includes some tagIds:
{
  _id: "XXXXXXXXXXXXXXXXXXXXXXXX",
  message: "This is an important message",
  dateTime: "2018-03-01T00:00:00.000Z",
  tagIds: [
    "123456789012345678901234",
    "abcdefabcdefabcdefabcdef"
  ]
}
So a certain id referenced in tagIds might either be a person, a product or even another note.
Of course, the most obvious solution for this, imo, is to save the type as well:
...
tagIds: [
  {
    type: "note",
    id: "123456789012345678901234",
  },
  {
    type: "person",
    id: "abcdefabcdefabcdefabcdef",
  }
]
...
Another solution I'm also thinking about is to use several fields for each collection, but I'm not sure if this has any other benefits (apart from the clear separation):
...
tagIdsNotes: ["123456789012345678901234"],
tagIdsPersons: ["abcdefabcdefabcdefabcdef"],
...
But somehow both solutions feel strange to me, as they need a lot of extra information (it would be nice to have this information implicit). So I wanted to ask whether this is the way to go, or whether you know of any other solution.

If you use Meteor Methods to pull this data, you have a chance to run some code, get from DB, run some mappings, pull again from DB etc and return a result. However, if you use pub/sub, things are different, you need to keep it really simple and light.
So, first question: method or pub/sub?
Your question is really more like: should I embed (and how much), should I not embed and build relations instead (keep only the id of a tag in the message object) and later use aggregations, or should I denormalize (duplicate data)? See http://highscalability.com/building-scalable-databases-denormalization-nosql-movement-and-digg
All these are ok in Mongo depending on your case: https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-3
The way I do this is to keep a tags Collection indexed by messageId and possibly date (for sorting). When you have a message, you get all its tags by querying the Tags Collection, rather than mapping over the tags in your message object and sending 3 different queries to 3 different Collections (person, product, note).
If you embed your tags data in the message object, let's say in your UX you want to show there are 3 tags and on click you get those 3 tags. You can basically pull those tags when you pulled the message (and might not need that data) or pull the tags on an action such as click. So, you might want to consider what data you need in your view and only pull that. You could keep an Integer as number of tags on the message object and save the tags in either a tags Collection or embed in your message object.
Following the principles of NoSQL it is ok and advisable to save some data multiple times in different collections to make your queries super fast.
So in a Tags Collection you could also save data related to your original objects. Let's say
// Tags
{
  ...
  messageId: 'xxx',
  createdAt: Date,
  person: {
    firstName: 'John',
    lastName: 'Smith',
    userId: 'yyyy',
    ...etc
  },
},
{
  ...
  messageId: 'xxy',
  createdAt: Date,
  product: {
    name: 'product_name',
    productId: 'yyzz',
    ...etc
  },
}
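A minimal sketch of how such tag documents might be built and looked up, assuming the shape shown above. The helper names buildTagDoc and tagsForMessage are hypothetical, and the sort order (newest first) is an assumption; the returned filter and sort objects are what a find() on the tags Collection could take, ideally backed by an index on messageId and createdAt.

```javascript
// Hypothetical helper: build one denormalized tag document for a message.
// `type` is 'person', 'product' or 'note'; `cached` is the duplicated data.
function buildTagDoc(messageId, type, cached, createdAt = new Date()) {
  return { messageId, createdAt, [type]: cached };
}

// Hypothetical helper: filter and sort specs for fetching all tags of one
// message with a single query, newest first (the ordering is an assumption).
function tagsForMessage(messageId) {
  return { filter: { messageId }, sort: { createdAt: -1 } };
}

const tag = buildTagDoc('xxx', 'person', {
  firstName: 'John',
  lastName: 'Smith',
  userId: 'yyyy',
});
const query = tagsForMessage('xxx');

console.log(tag);
console.log(query);
```

Because each tag document already carries the cached person, product or note data, one query by messageId is enough to render the tags without touching the other Collections.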

Embedding duplicate "friendship" subdocuments to model mutual friendship edges in MongoDB

I'm trying to model bidirectional friendships in MongoDB. "Bidirectional" means that, like Facebook but unlike Twitter, if you're friends with Sam then Sam must also be friends with you. In MongoDB, the usually-recommended solution (example) seems to be something like this:
Create a User collection (aka nodes to use the proper graph theory term) and a Friendship collection (aka edges)
Each User document contains an embedded friends array, each element of which contains the ObjectID of each friend. Each element can also cache read-only info about each friend (e.g. name, photo URL) that can avoid cross-document queries for common use-cases like "display my friend list".
Adding friendships involves inserting a new Friendship document and then $push-ing the Friendship's Object ID into a friends array in both users, as well as read-only cached info about friends (e.g. name) to avoid multi-document queries when displaying friend lists.
I'm considering a different design where, instead of a separate Friendship collection, edge data will be stored (duplicated) in both nodes of the bidirectional relationship. Like this:
{
  _id: new ObjectID("111111111111111111111111"),
  name: "Joe",
  pictureUrl: "https://foo.com/joe.jpg",
  invites: [
    ... // similar schema to friends array below
  ],
  friends: [
    {
      friendshipId: new ObjectID("123456789012345678901234"),
      lastMeeting: new Date("2019-02-07T20:35:55.256+00:00"),
      user1: {
        userId: new ObjectID("111111111111111111111111"),
        name: "Joe", // cached, read-only data to avoid multi-doc reads
        pictureUrl: "https://foo.com/joe.jpg",
      },
      user2: {
        userId: new ObjectID("222222222222222222222222"),
        name: "Bill", // cached, read-only data to avoid multi-doc reads
        pictureUrl: "https://foo.com/bill.jpg",
      },
    }
  ]
},
{
  _id: new ObjectID("222222222222222222222222"),
  name: "Bill",
  pictureUrl: "https://foo.com/bill.jpg",
  invites: [
    ... // similar schema to friends array below
  ],
  friends: [
    {
      friendshipId: new ObjectID("123456789012345678901234"),
      lastMeeting: new Date("2019-02-07T20:35:55.256+00:00"), // shared data about the edge
      user1: { // data specific to each friend
        userId: new ObjectID("111111111111111111111111"),
        name: "Joe", // cached, read-only data to avoid multi-doc reads
        pictureUrl: "https://foo.com/joe.jpg",
      },
      user2: { // data specific to each friend
        userId: new ObjectID("222222222222222222222222"),
        name: "Bill", // cached, read-only data to avoid multi-doc reads
        pictureUrl: "https://foo.com/bill.jpg",
      },
    }
  ]
}
Here's how I'm planning to deal with the following:
Reads - all reads for common operations happen only from individual User documents. High-level info about friends (e.g. name, picture URL) is cached inside the friends array.
Inviting a friend - add a new document to an invites array embedded into both users (not shown above) that's similar in structure and functionality to the friends array shown above
Invitation accepted - using updateMany, add a new identical embedded document to the friends array of both users, and $pull an element from invites array of both users. Initially I'll use multi-document transactions for these updates, but because adding friendships isn't time-critical, this could be adapted to use eventual consistency.
Friendship revoked - use updateMany with a filter for {'friends.friendshipId': new ObjectID("123456789012345678901234")} to $pull the friendship subdocument from both users' friends arrays. Like above, this could use multi-document transactions initially, and eventual consistency later if needed for scale.
Updates to cached data - if a user changes info cached in friends (e.g. name or picture URL), this is an uncommon operation that can proceed slowly and one-document-at-a-time, so eventual consistency is fine.
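The accept and revoke steps above could be sketched as plain filter and update documents for updateMany. This is only an illustration: the helper names are hypothetical, string ids stand in for ObjectIDs so the snippet runs without a driver, and it assumes each invites element carries the same friendshipId as the friendship it turns into.

```javascript
// Hypothetical helper: arguments for one updateMany call that accepts an
// invitation by adding the edge to both users and removing the invite.
function buildAcceptOps(friendship) {
  const userIds = [friendship.user1.userId, friendship.user2.userId];
  return {
    filter: { _id: { $in: userIds } },
    update: {
      $push: { friends: friendship },
      // Assumption: the invite is identified by the same friendshipId.
      $pull: { invites: { friendshipId: friendship.friendshipId } },
    },
  };
}

// Hypothetical helper: arguments for one updateMany call that removes the
// friendship subdocument from every user that still holds a copy.
function buildRevokeOps(friendshipId) {
  return {
    filter: { 'friends.friendshipId': friendshipId },
    update: { $pull: { friends: { friendshipId } } },
  };
}

const accept = buildAcceptOps({
  friendshipId: 'f1',
  lastMeeting: new Date('2019-02-07T20:35:55.256Z'),
  user1: { userId: 'u1', name: 'Joe' },
  user2: { userId: 'u2', name: 'Bill' },
});
const revoke = buildRevokeOps('f1');

console.log(JSON.stringify(accept, null, 2));
console.log(JSON.stringify(revoke, null, 2));
```

Each result would be passed as the filter and update arguments of updateMany on the users collection, inside a transaction at first if both copies must change together.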
I have two basic concerns that I'd like your advice about:
What are the problems and pitfalls with the approach described above? I know about the obvious things: extra storage, slower updates, need to add queuing and retry logic to support eventual consistency, and the risk of edge data getting out-of-sync between its two copies. I think I'm OK with these problems. But are there other, non-obvious problems that I will likely run into?
Instead of having a user1 and user2 field for each node of the edge, would it be better to use a 2-element array instead? Why or why not? Here's an example of what I mean:
friends: [
  {
    friendshipId: new ObjectID("123456789012345678901234"),
    lastMeeting: new Date("2019-02-07T20:35:55.256+00:00"),
    users: [
      {
        userId: new ObjectID("111111111111111111111111"),
        name: "Joe", // cached, read-only data to avoid multi-doc reads
        pictureUrl: "https://foo.com/joe.jpg",
      },
      {
        userId: new ObjectID("222222222222222222222222"),
        name: "Bill", // cached, read-only data to avoid multi-doc reads
        pictureUrl: "https://foo.com/bill.jpg",
      },
    ],
  }
]
BTW, I know that graph databases and even relational databases are better at modeling relationships compared to MongoDB. But for a variety of reasons I've settled on MongoDB for now, so please limit answers to MongoDB solutions rather than pointing me to using a graph or relational database. Thanks!

What is the advantage of structuring collections with userID as document keys vs as documents with the userID attribute?

Specifically for Firestore, why would one structure root collections as userId documents that each contain a userPosts sub-collection holding that user's posts? Why wouldn't one just hold the posts in a root collection and filter the query by userId?
For example, many StackOverflow Q/As about structuring data suggest having something like this:
'users':{
  'user1@gmail.com': {
    'username': 'user1'
  }
},
'posts':{
  'user1@gmail.com': {
    'userPosts':{
      'post1': {
        'content': 'post1 content'
      },
      'post2': {
        'content': 'post2 content'
      },
    }
  }
}
where 'users' and 'posts' are collections and 'userPosts' is a sub-collection and the query is:
db.collection('posts').doc('user1@gmail.com').collection('userPosts').get()
What is the advantage of organizing the 'posts' collection by userId (email in this case) and userPosts, rather than keeping a collection full of posts and querying by matching userId with the post's userId, like so:
'users':{
  'user1@gmail.com': {
    'username': 'user1'
  }
},
'posts':{
  'post1': {
    'userId': 'user1@gmail.com',
    'content': 'post1 content'
  },
  'post2': {
    'userId': 'user1@gmail.com',
    'content': 'post2 content'
  },
}
where the query is:
db.collection('posts').where('userId', '==', 'user1@gmail.com')

When dealing with nosql type databases, all decisions about data structure should be based on the queries you intend to perform. Without knowing your queries, a certain structure may not actually meet all the requirements of your app. This is why data is often duplicated in nosql databases - to suit queries that may not be possible otherwise.
In your example, for the one use case that you've provided, there is no real advantage to structuring the data either way. The advantage to choosing one or the other is more likely based on other queries that you might want to perform.
For example, if you wanted to construct some query across all posts for all users, that was not possible with your first structure when this was written; Firestore has since added collection group queries, which cover that case.
There may also be the issue of security rules. Some database structures are easier to protect with some security rules. But again, it depends on the requirements for your rules.
Bottom line is that the advantageous structure is the one that meets all the needs of all your queries and rules. But a single structure may not meet all your needs, and you may end up keeping two or more in sync.

How to get a distinct list of values and, below each, all the documents that have that value?

I'm getting my head around MongoDB document design and trying to figure out how to do something, or whether I'm barking up the wrong tree.
I'm creating a mini CMS. The site will contain either documents or URLs that are grouped by category. For example, there's a group called 'shop' that has a list of links to items on another site, and there's a category called 'art' that has a list of works of art, each of which has a title, summary and images for a slideshow.
So one possible way to do this would be to have a collection that would look something like:
[{
  category: 'Products',
  title: 'Thong',
  href: 'http://www.thongs.com'
},{
  category: 'Products',
  title: 'Incredible Sulk',
  href: 'http://www.sulk.com'
},{
  category: 'Art',
  title: 'Cool art',
  summary: 'This is a summary to display',
  images: [...]
}]
But here's the question: when I'm building the web page, this structure isn't much use to me. The homepage contains lists of 'things' grouped by their category: lists, menus, stuff like that. To do that easily, I need something that looks more like:
[
  {'Products': [
      {title: 'thong', href: 'http://www.thongs.com'},
      {title: 'Incredible Sulk'}
    ]
  },
  {'Art': [
      {title: 'Cool art', summary: 'This is a summary to display', images: [...]}
    ]
  }
]
So the question is, can I somehow do this transformation in MongoDB? If I can't, is it bad to do it in my app server layer (I'd get a list of unique categories and then loop through them, querying Mongo for the documents of each category)? I'm guessing the app server layer is bad; after all, MongoDB has it all in memory if I'm lucky. If neither of these is good, am I doing it all wrong, and should I actually store the data in this structure in the first place?
I need to make it easy for the user to create categories on the fly. I also need to consider what happens if they start to add lots of documents: I would either need to restrict how many documents I pull back for each category or limit the fields returned, so that a query doesn't return a relatively big chunk of data, which is slow and wasteful, but only the minimum I need to create the desired page.

I figured out a group query that gives me almost the structure I want, which is good enough to use for templates.
db.things.group({
  key: {category: true},
  initial: {articles: []},
  reduce: function(doc, aggregator) {
    aggregator.articles.push(doc);
  }
})
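The group() shell helper used above was deprecated in MongoDB 3.4 and removed in 4.2, so on current servers the same grouping would probably be written as an aggregation. The sketch below shows a $group pipeline that could be passed to db.things.aggregate(), plus a plain JavaScript function (groupByCategory, a hypothetical name) that does the equivalent grouping in the app layer so the resulting shape can be checked without a database. Note the pipeline names the key field _id, whereas the helper keeps the category name used by group().

```javascript
// Aggregation equivalent of the group() call: one output document per
// category, with every matching document pushed into `articles`.
const pipeline = [
  { $group: { _id: '$category', articles: { $push: '$$ROOT' } } },
];

// Hypothetical app-layer equivalent, useful for small result sets.
function groupByCategory(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.category)) groups.set(doc.category, []);
    groups.get(doc.category).push(doc);
  }
  // Map preserves insertion order, so categories appear in first-seen order.
  return [...groups].map(([category, articles]) => ({ category, articles }));
}

const things = [
  { category: 'Products', title: 'Thong', href: 'http://www.thongs.com' },
  { category: 'Products', title: 'Incredible Sulk', href: 'http://www.sulk.com' },
  { category: 'Art', title: 'Cool art', summary: 'This is a summary to display' },
];

const grouped = groupByCategory(things);
console.log(JSON.stringify(grouped, null, 2));
```

To limit how much data comes back per category, a $project or $limit stage could be placed before the $group stage, depending on which fields the page needs.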

MongoDB data structure with large number internal documents

I am relatively new to MongoDB, and so far am really impressed. I am struggling with the best way to set up my document stores, though. I am trying to do some summary analytics using Twitter data, and I am not sure whether to put the tweets into the user document or to keep them as a separate collection. It seems like putting the tweets inside the user model would quickly hit the size limit. If that is the case, what is a good way to run MapReduce across a group of users' tweets?
I hope I am not being too vague but I don't want to get too specific and too far down the wrong path as far as setting up my domain model.
As I am sure you are all bored of hearing, I am used to RDB land where I would lay out my schema like
+--------+
| USER   |
+--------+
| ID     |
| Name   |
| Etc.   |
+--------+
+--------+
| TWEET  |
+--------+
| ID     |
| UserID |
| Etc    |
+--------+
It seems like the logical schema in Mongo would be
User
|-Tweet (0..3000)
|-Entities
|-Hashtags (0..10+)
|-urls (0..5)
|-user_mentions (0..12)
|-GeoData (0..20)
|-somegroupID
but wouldn't that quickly bloat the User document beyond capacity? But I would like to run analysis on tweets belonging to users with a similar somegroupID. It conceptually makes sense to lay out the model as above, but at what point does that become too unwieldy? And what are viable alternatives?

You're right that you'll probably run into the 16MB MongoDB document limit here. You don't say what sort of analysis you'd like to run, so it is difficult to recommend a schema. MongoDB schemas are designed with the data-query (and insertion) patterns in mind.
Instead of putting your tweets in a user, you can quite easily do the opposite: add a user id and group id to the tweet document itself. Then, if you need additional fields from the user, you can always pull them in with a second query upon display.
I mean a design for a tweet document as:
{
  'hashtags': [ '#foo', '#bar' ],
  'urls': [ 'http://url1.example.com', 'http://url2.example.com' ],
  'user_mentions': [ 'queen_uk' ],
  'geodata': { ... },
  'userid': 'derickr',
  'somegroupid': 40
}
And then for a user collection, the documents could look like:
{
  'userid': 'derickr',
  'realname': 'Derick Rethans',
  ...
}

All credit to the fine folks at MongoHQ.com. My question was answered over on https://groups.google.com/d/msg/mongodb-user/OtEOD5Kt4sI/qQg68aJH4VIJ
Chris Winslett @ MongoHQ
You will find this video interesting:
http://www.10gen.com/presentations/mongosv-2011/schema-design-at-scale
Essentially, in one document, store one day of tweets for one person. The reasoning:
Querying typically consists of days and users
Therefore, you can have the following index:
{user_id: 1, date: 1} # Date needs to be last because you will range and sort on the date
Have fun!
Chris MongoHQ
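The one-document-per-user-per-day idea quoted above could be sketched as an upsert that pushes each tweet into its day bucket. This is an assumption-laden illustration, not the approach from the linked talk verbatim: the helper names, the tweets array field, the created_at field and the choice of UTC day boundaries are all hypothetical.

```javascript
// Truncate a timestamp to midnight UTC so all tweets of one day share a key.
function dayOf(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

// Hypothetical helper: arguments for an updateOne call that appends a tweet
// to the bucket for (user, day), creating the bucket if it does not exist.
function buildBucketUpsert(userId, tweet) {
  return {
    filter: { user_id: userId, date: dayOf(tweet.created_at) },
    update: { $push: { tweets: tweet } },
    options: { upsert: true },
  };
}

const op = buildBucketUpsert(123123, {
  text: 'MongoDB is pretty sweet',
  created_at: new Date('2012-02-20T15:04:05Z'),
});

console.log(JSON.stringify(op, null, 2));
```

The filter fields match the {user_id: 1, date: 1} index from the quoted message, so both the upsert and later range queries over a user's days can use that index.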
I think it makes the most sense to implement the following:
user
{
  user_id: 123123,
  screen_name: 'cledwyn',
  misc_bits: {...},
  groups: ['123123_group_tall_people', '123123_group_techies'],
  groups_in: ['123123_group_tall_people']
}
tweet
{
  tweet_id: 98798798798987987987987,
  user_id: 123123,
  tweet_date: 20120220,
  text: 'MongoDB is pretty sweet',
  misc_bits: {...},
  groups_in: ['123123_group_tall_people']
}
