Consider two collections:
users (can be both, organizer and participant in other meetings)
meetings
For the sake of simplicity, I show only the basic data here; in my code I also have emails, passwords, etc.
User (easier part)
{ "_id": "ObjectId('user0_id')", "username": "Paul" }
and model:
type User struct {
	Id       primitive.ObjectID `bson:"_id" json:"id,omitempty"`
	Username string             `json:"username"`
}
Meeting:
{
"_id": "ObjectId('meeting0_id')",
"organizer": "ObjectId('user0_id')",
"participants": [ "ObjectId('user1_id')", "ObjectId('user2_id')", "ObjectId('user3_id')"]
}
and model:
type Meeting struct {
	Id           primitive.ObjectID   `bson:"_id" json:"id,omitempty"`
	Organizer    primitive.ObjectID   `json:"organizer"`
	Participants []primitive.ObjectID `json:"participants,omitempty"`
}
Everything works if I extract only the basic data from mongo, but it seems inefficient. If I want to present this data in a readable form, ObjectIDs alone are not enough, so I would like to use mongo's "$lookup" to get more data about the users right away.
Another problem: in some cases I need a different dataset. To show a list, I only need the names of the users assigned to the meeting.
However, for data administration or sending notifications, I need more (all?) User data.
What are the best practices for storing data like this in Go models?
Create one super-struct "meeting" with all possible data? E.g. with Participants []User instead of IDs?
But what next? Get a complete set of data from the database each time, then filter it on the code side? Or filter on the mongo side, in which case almost all fields in the struct will usually be empty (e.g. LastPasswordChangeDate in a simple list of meeting participants)? Especially since there may be more "lookup" data, e.g. meeting place, invitations, etc.
And finally, how do I save this super-struct to two collections?
P.S. Creating different models for different "views" of meetings seems super stupid...
Related
I regularly face a similar problem: how to reference several different collections from the same property in MongoDB (or any other NoSQL database). I usually use Meteor.js for my projects.
Let's take an example for a notes collection that includes some tagIds:
{
_id: "XXXXXXXXXXXXXXXXXXXXXXXX",
message: "This is an important message",
dateTime: "2018-03-01T00:00:00.000Z",
tagIds: [
"123456789012345678901234",
"abcdefabcdefabcdefabcdef"
]
}
So a certain id referenced in tagIds might either be a person, a product or even another note.
Of course, the most obvious solution for this, imo, is to save the type as well:
...
tagIds: [
{
type: "note",
id: "123456789012345678901234",
},
{
type: "person",
id: "abcdefabcdefabcdefabcdef",
}
]
...
Another solution I'm also thinking about is to use several fields for each collection, but I'm not sure if this has any other benefits (apart from the clear separation):
...
tagIdsNotes: ["123456789012345678901234"],
tagIdsPersons: ["abcdefabcdefabcdefabcdef"],
...
But somehow both solutions feel strange to me as they need a lot of extra information (it would be nice to have this information implicit) and so I wanted to ask, if this is the way to go, or if you know any other solution for this?
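To compare the two layouts above: the typed {type, id} entries can be regrouped into per-collection id lists whenever they are needed, so storing the type does not rule out one lookup per collection. The following is only a sketch of that regrouping in Go, working on in-memory values; TagRef and groupByType are hypothetical names, not part of Meteor or MongoDB.

```go
package main

import "fmt"

// TagRef mirrors one {type, id} entry of the tagIds array (hypothetical name).
type TagRef struct {
	Type string
	ID   string
}

// groupByType buckets the ids per referenced collection,
// keeping the original order inside each bucket.
func groupByType(refs []TagRef) map[string][]string {
	groups := make(map[string][]string)
	for _, r := range refs {
		groups[r.Type] = append(groups[r.Type], r.ID)
	}
	return groups
}

func main() {
	refs := []TagRef{
		{Type: "note", ID: "123456789012345678901234"},
		{Type: "person", ID: "abcdefabcdefabcdefabcdef"},
	}
	// Each bucket could then feed one query against its own collection.
	for typ, ids := range groupByType(refs) {
		fmt.Println(typ, ids)
	}
}
```

The result has the same shape as the separate tagIdsNotes / tagIdsPersons fields, so the typed array can be seen as the more general of the two representations.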
If you use Meteor Methods to pull this data, you have a chance to run some code, get from DB, run some mappings, pull again from DB etc and return a result. However, if you use pub/sub, things are different, you need to keep it really simple and light.
So, first question: method or pub/sub?
Your question is really: should I embed (and how much), should I keep relations (store only the id of a tag in the message object) and use aggregations later, or should I denormalize (duplicate data)? See http://highscalability.com/building-scalable-databases-denormalization-nosql-movement-and-digg
All these are ok in Mongo depending on your case: https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-3
The way I do this is to keep a Tags Collection indexed by messageId and possibly by date (for sorting). When you have a message, you get all its tags by querying the Tags Collection, rather than mapping over the tags in your message object and sending 3 different queries to 3 different Collections (person, product, note).
If you embed your tags data in the message object, let's say in your UX you want to show there are 3 tags and on click you get those 3 tags. You can basically pull those tags when you pulled the message (and might not need that data) or pull the tags on an action such as click. So, you might want to consider what data you need in your view and only pull that. You could keep an Integer as number of tags on the message object and save the tags in either a tags Collection or embed in your message object.
Following the principles of NoSQL it is ok and advisable to save some data multiple times in different collections to make your queries super fast.
So in a Tags Collection you could also save things related to your original objects. Let's say:
// Tags
{
  ...
  messageId: 'xxx',
  createdAt: Date,
  person: {
    firstName: 'John',
    lastName: 'Smith',
    userId: 'yyyy',
    ...etc
  }
},
{
  ...
  messageId: 'xxy',
  createdAt: Date,
  product: {
    name: 'product_name',
    productId: 'yyzz',
    ...etc
  }
}
I'm new to nosql (MongoDB) so go easy on me.
I'm scraping JSON-LD from various web pages and want to store and recall the data. However, the value types keep changing. For instance, the "author" field sometimes uses an "organization" type, other times a "person" type, sometimes it's simply a string, and sometimes it's missing altogether.
Should I convert the data to some type of standard?
Should each object be put into its own collection and referenced?
How do you deal with displays being different?
Looking for words of experience or links to good articles on how to deal with inconsistent data structure.
The whole point of a NoSQL database is that it's schemaless and the structure can vary from one document to another, so I see no issue here.
I think you are asking how you should deal with it in your application's business logic, so here is my suggestion:
You can save the author as an embedded sub-document which always has a field called "type" (an enum of values: String, Person, Organization, etc.) and act accordingly when you fetch the data.
For example, if the author is simply a String, then the document would look something like:
{
  …,
  "author": {
    "type": "String",
    "text": <text>
  }
}
If it's a Person type, then:
{
  …,
  "author": {
    "type": "Person",
    "first_name": <first name>,
    "last_name": <last name>
  }
}
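As a rough illustration of "act accordingly when you fetch the data", here is a sketch in Go that decodes such a sub-document and picks what to display based on the "type" tag. It uses encoding/json as a stand-in for whatever driver decoding you use. The names Author, parseAuthor and displayName are made up for this example, and the "name" field for organizations is an assumption, since the answer does not show that shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Author is the embedded sub-document. Type says which other fields are set.
type Author struct {
	Type      string `json:"type"`
	Text      string `json:"text,omitempty"`       // set when Type == "String"
	FirstName string `json:"first_name,omitempty"` // set when Type == "Person"
	LastName  string `json:"last_name,omitempty"`  // set when Type == "Person"
	Name      string `json:"name,omitempty"`       // assumed field for "Organization"
}

// parseAuthor decodes one stored author sub-document.
func parseAuthor(data []byte) (Author, error) {
	var a Author
	err := json.Unmarshal(data, &a)
	return a, err
}

// displayName chooses the fields to show based on the type tag.
func displayName(a Author) string {
	switch a.Type {
	case "String":
		return a.Text
	case "Person":
		return a.FirstName + " " + a.LastName
	case "Organization":
		return a.Name
	default:
		return "" // missing or unknown author type
	}
}

func main() {
	a, err := parseAuthor([]byte(`{"type": "Person", "first_name": "Jane", "last_name": "Doe"}`))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(displayName(a)) // prints "Jane Doe"
}
```

A missing author simply decodes to an empty Author, which falls through to the default branch, so the display code needs no special case for it.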
Currently our system has two separate collections: invites and users. We can send an invite to someone, and that invite will have some information attached to it and is stored in the invites collection. If the user registers, his account information is stored in the users collection.
Not every user has to have an invite, and not every invite has to have a user. We check whether a user has an invite (or vice versa) based on the email address, which in that case is stored in both collections.
Originally our dashboard had a user overview, with a page where you can see the current users and paginate between them.
Now we want to have one single page (and single table) in which we can view both the invites and the users and paginate through them.
Let's say our data looks like this:
invites: [
{ _id: "5af42e75583c25300caf5e5b", email: "john#doe.com", name: "John" },
{ _id: "53fbd269bde85f02007023a1", email: "jane#doe.com", name: "Jane" },
...
]
users: [
{ _id: "53fe288be081540200733892", email: "john#doe.com", firstName: "John" },
{ _id: "53fd103de08154020073388d", email: "steve#doe.com", firstName: "Steve" },
...
]
Points to note.
Some users can be matched with an invite based on the email (but that is not required)
Some invites never register
The field names are not always exactly the same
Is it possible to make a paginated list of all emails and sort on them? So if there is an email that starts with an a in collection invites, that is picked before the email that starts with a b in collection users etc. And then use offset / limit to paginate through it.
Basically, I want to "merge" the two collections in something that would be akin to a MySQL view and be able to query on that as if the entire thing was one collection.
Preferably, I'm looking for a solution that does not change the data structure in the collections (a projected view or something similar is fine), because parts of the code already rely on the given structure, even if, in light of this new requirement, that structure might not be the best approach.
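To make the desired behaviour concrete, here is a small in-memory sketch in Go of "merge both collections, sort by email, then apply offset / limit". It only illustrates the semantics and is not a database-side solution: it assumes both collections have already been loaded, and the names Row and mergedPage are hypothetical. It also assumes that an email present in both collections produces two rows, which may or may not be what the table should show.

```go
package main

import (
	"fmt"
	"sort"
)

type Invite struct{ ID, Email, Name string }
type User struct{ ID, Email, FirstName string }

// Row is the common shape displayed in the merged table (hypothetical).
type Row struct {
	Email  string
	Name   string
	Source string // "invite" or "user"
}

// mergedPage maps both collections to Row, sorts by email and
// returns the slice selected by offset and limit.
func mergedPage(invites []Invite, users []User, offset, limit int) []Row {
	rows := make([]Row, 0, len(invites)+len(users))
	for _, in := range invites {
		rows = append(rows, Row{Email: in.Email, Name: in.Name, Source: "invite"})
	}
	for _, u := range users {
		rows = append(rows, Row{Email: u.Email, Name: u.FirstName, Source: "user"})
	}
	// Stable sort: for equal emails the invite stays ahead of the user.
	sort.SliceStable(rows, func(i, j int) bool { return rows[i].Email < rows[j].Email })

	if offset < 0 {
		offset = 0
	}
	if offset > len(rows) {
		offset = len(rows)
	}
	end := offset + limit
	if limit < 0 || end > len(rows) {
		end = len(rows)
	}
	return rows[offset:end]
}

func main() {
	invites := []Invite{
		{ID: "5af42e75583c25300caf5e5b", Email: "john#doe.com", Name: "John"},
		{ID: "53fbd269bde85f02007023a1", Email: "jane#doe.com", Name: "Jane"},
	}
	users := []User{
		{ID: "53fe288be081540200733892", Email: "john#doe.com", FirstName: "John"},
		{ID: "53fd103de08154020073388d", Email: "steve#doe.com", FirstName: "Steve"},
	}
	for _, r := range mergedPage(invites, users, 0, 3) {
		fmt.Println(r.Email, r.Name, r.Source)
	}
}
```

Loading everything into memory will not scale to large collections. On the database side, an aggregation using the $unionWith stage (available from MongoDB 4.4) followed by a projection, sort, skip and limit may express the same idea, but that is not shown here.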
I'm trying to build an event website that will host videos and such. I've set up a collection with the event name, event description, and an object with some friendly info about the people "attending". If things go well, there might be 100-200k people attending, and those people should have access to whoever else is in the event (clicking on the friendly name will find the user's id and subsequently their full profile). Is that asking too much of mongo? Or is there a better way to go about doing something like that? It seems like that could get rather large rather quickly.
{
  _id : ...., // event Id
  'name' : ...., // event name
  'description' : ...., // event description
  'attendees' : [
    {'username': "user's friendly name", 'avatarlink': 'avatar url'},
    {'username': "user's friendly name", 'avatarlink': 'avatar url'},
    {'username': "user's friendly name", 'avatarlink': 'avatar url'},
    {'username': "user's friendly name", 'avatarlink': 'avatar url'}
  ]
}
Thanks for the suggestions!
In MongoDB many-to-many (or one-to-many) modeling in general, you should take a different approach depending on whether the "many" are few (usually up to a few dozen) or really many, as in your case.
In your case it is better not to embed, and to normalize instead. If you embed users in your events collection, adding attendees to an event will increase the array size. Since documents are updated in place, a document that no longer fits its allocated space has to be moved on disk, a very expensive operation which also causes fragmentation. There are a few techniques to deal with moves, but none is ideal.
Having an array of ObjectIds as attendees is better in that documents will grow much less dramatically, but it still poses a few problems. How will you find all events a user has participated in? You can have a multi-key index on attendees, but once a document moves, the index has to be updated for each user entry (the index contains a pointer to the document's place on disk). In your case, where you plan to have up to 200K users, that will be very, very painful.
Embedding is a very cool feature of MongoDB and other document-oriented databases, but it's naive to think it never comes at a price.
I think you should really rethink your schema: having an events collection, a users collection and a user_event collection with a structure similar to this:
{
_id : ObjectId(),
user_id : ObjectId(),
event_id : ObjectId()
}
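With such a link collection, "all events of a user" and "all attendees of an event" become two symmetric lookups on the same documents. The Go sketch below shows this on an in-memory slice standing in for the user_event collection; in MongoDB each function would correspond to a find on user_id or event_id, ideally backed by an index on that field. The type and function names are hypothetical.

```go
package main

import "fmt"

// UserEvent is one document of the user_event link collection.
type UserEvent struct {
	UserID  string
	EventID string
}

// eventsForUser returns the ids of all events the user attends.
func eventsForUser(links []UserEvent, userID string) []string {
	var ids []string
	for _, l := range links {
		if l.UserID == userID {
			ids = append(ids, l.EventID)
		}
	}
	return ids
}

// usersForEvent returns the ids of all users attending the event.
func usersForEvent(links []UserEvent, eventID string) []string {
	var ids []string
	for _, l := range links {
		if l.EventID == eventID {
			ids = append(ids, l.UserID)
		}
	}
	return ids
}

func main() {
	links := []UserEvent{
		{UserID: "u1", EventID: "e1"},
		{UserID: "u2", EventID: "e1"},
		{UserID: "u1", EventID: "e2"},
	}
	fmt.Println(eventsForUser(links, "u1")) // prints "[e1 e2]"
	fmt.Println(usersForEvent(links, "e1")) // prints "[u1 u2]"
}
```

Adding an attendee is then a single small insert, and neither the event document nor the user document grows.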
Normalization is not a dirty word
Perhaps you should consider modeling your data in two collections and your attendees field in an event document would be an array of user ids.
Here's a sample of the schema:
db.events
{
_id : ...., // event Id,
'name' : // event name
'description' : //event description
'attendees' :[ObjectId('userId1'), ObjectId('userId2') ...]
}
db.users
{
_id : ObjectId('userId1'),
username: 'user friendly name',
avatarLink: 'url to avatar'
}
Then you could do 2 separate queries
db.events.find({_id: ObjectId('eventId')});
db.users.find({_id: {$in: [ObjectId('userId1'), ObjectId('userId2')]}});
Suppose I have Student and Teacher in a many to many relationship. If I just want to find out all the teachers for a given student or vice versa I generally model it by using embedded Object Ids. For example if teacher has a property studentIds which is an array of student Object Ids then that is enough information to do all the queries you need.
However suppose that a student can give a teacher a rating. How should this rating fit into the model? At the moment I do the following:
Inside teacher, instead of storing an array of student ids, I store an array of json objects {studentId: ObjectId, rating: String}.
When doing the query, I transform the array of json objects into an array of studentIds and extract the full information as usual.
So now I have an array of student objects and an array of json objects with the ratings.
However, since the $in operator in MongoDB does not preserve ordering, I need to do my own sorting.
At the last step, with everything in order, I can combine student objects with ratings to get what I want.
It works, but it seems somewhat convoluted. Is there a better way of doing this?
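One way to avoid the sorting step in the procedure above, sketched here in Go on in-memory values: index the fetched students by id in a map, then walk the ratings array in its stored order and look each student up. This is only a sketch under the assumption that both arrays are already loaded; RatingEntry, Student, RatedStudent and combine are hypothetical names.

```go
package main

import "fmt"

// RatingEntry mirrors one {studentId, rating} object stored inside a teacher.
type RatingEntry struct {
	StudentID string
	Rating    string
}

// Student stands for a full student document fetched with $in.
type Student struct {
	ID   string
	Name string
}

// RatedStudent is the combined result.
type RatedStudent struct {
	Student
	Rating string
}

// combine joins ratings with students via a map keyed by id, so the
// order in which the students were returned does not matter.
func combine(entries []RatingEntry, students []Student) []RatedStudent {
	byID := make(map[string]Student, len(students))
	for _, s := range students {
		byID[s.ID] = s
	}
	out := make([]RatedStudent, 0, len(entries))
	for _, e := range entries {
		s, ok := byID[e.StudentID]
		if !ok {
			continue // the referenced student was not found
		}
		out = append(out, RatedStudent{Student: s, Rating: e.Rating})
	}
	return out
}

func main() {
	entries := []RatingEntry{{StudentID: "s1", Rating: "A"}, {StudentID: "s2", Rating: "B"}}
	// Students arrive in a different order than the ratings array.
	students := []Student{{ID: "s2", Name: "Bob"}, {ID: "s1", Name: "Alice"}}
	for _, r := range combine(entries, students) {
		fmt.Println(r.Name, r.Rating)
	}
}
```

The result follows the order of the ratings array, and the join costs one pass over each slice instead of a sort.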
Here are some considerations. In the end, it depends on your requirements:
Rating is optional, right?
If so, ask yourself whether you want to combine a required feature (storing teacher/student association) with a nice-to-have one. Code that implements a nice-to-have feature now writes to your most important collection. I think you can improve separation of concerns in your code with a different db schema.
Will you need more features?
Let's say you want to provide students with a list of ratings they gave, the average rating a student has given to teachers, and you want to show a development of ratings over time. This will be very messy with embedded documents. Embedded documents are less flexible.
If you need top read performance, you need to denormalize more data
If you want to stick to the embedded documents, you might want to copy more data. Let's say there's an overview of ratings per teacher where you can see the students' names. It would be helpful to embed an object
{ studentId : ObjectId,
rating: string,
studentName: string,
created: dateTime }
As alternatives, consider
TeacherRating {
StudentId: id
TeacherId: id
Rating: number
Created: DateTime
}
The teacher/student association will still be stored in the teacher object, but the ratings are in a different collection. A rating can't be created if no association between teacher and student can be found.
or
TeacherStudentClass {
StudentId: id
TeacherId: id
Class: id
ClassName: string // (denormalized, just an example)
Rating: number // (optional)
Created: DateTime
}
To find all students for a given teacher, you'd have to query the linker document first, then do an $in query on the students, and vice versa. That is one more query, but it comes with a huge gain in flexibility.
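The two-step lookup can be sketched in Go as follows, with in-memory slices standing in for the linker and student collections. Step 1 corresponds to querying the linker documents by TeacherId, step 2 to the $in query on students. The names Link, Student and studentsOfTeacher are hypothetical, and Link keeps only the fields needed here.

```go
package main

import "fmt"

// Link is a trimmed-down linker document (see TeacherStudentClass above).
type Link struct {
	StudentID string
	TeacherID string
}

// Student stands for a document of the students collection.
type Student struct {
	ID   string
	Name string
}

// studentsOfTeacher resolves a teacher's students in two steps.
func studentsOfTeacher(links []Link, students []Student, teacherID string) []Student {
	// Step 1: collect the student ids linked to this teacher.
	wanted := make(map[string]bool)
	for _, l := range links {
		if l.TeacherID == teacherID {
			wanted[l.StudentID] = true
		}
	}
	// Step 2: fetch the matching students (the $in query).
	var out []Student
	for _, s := range students {
		if wanted[s.ID] {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	links := []Link{
		{StudentID: "s1", TeacherID: "t1"},
		{StudentID: "s2", TeacherID: "t2"},
		{StudentID: "s3", TeacherID: "t1"},
	}
	students := []Student{{ID: "s1", Name: "Alice"}, {ID: "s2", Name: "Bob"}, {ID: "s3", Name: "Carol"}}
	for _, s := range studentsOfTeacher(links, students, "t1") {
		fmt.Println(s.Name)
	}
}
```

Swapping the roles of StudentID and TeacherID gives the reverse direction, all teachers of a given student.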