MongoDB Design For Events Calendar

We have an events calendar, and here is what I am thinking for our basic Mongo schema:

users
  username
  password
  salt

events
  name
  description
  tags[]
  category
  venue_id
  ...

venues
  name
  address
  loc

Queries that will be done are:

Listing of distinct tags (this might be hard given the current design)
Events in a given tag
Event information along with the venue location
All event and venue information for a particular event
All events near me. Need to use a geo index on loc.

Any feedback/ideas on whether we should nest venues inside events, or use MySQL instead?

Ok, that looks pretty good. Let's construct some of the queries.
Assuming the collections are named users, events, venues:
Insert some dummy events:
db.events.insert({tags:["awesome","fun","cool"]})
db.events.insert({tags:["sweet","fun","rad"]})
Make an index (like a boss)
db.events.createIndex({ tags: 1 })
Listing of distinct tags (this might be hard given current design):
Nope, not hard.
db.events.distinct("tags")
[ "awesome", "cool", "fun", "rad", "sweet" ]
Events in a given tag (you meant "with a given tag" right?)
db.events.find({tags: "fun"})
{ "_id" : ObjectId("4ecc08c62477605df6522c97"), "tags" : [ "awesome", "fun", "cool" ] }
{ "_id" : ObjectId("4ecc08d92477605df6522c98"), "tags" : [ "sweet", "fun", "rad" ] }
Event information along with venue location
You can do this a couple of different ways. One way would be to query for the event and subsequently query for the venue. With both documents in hand, join (combine) the data you want manually.
OR
You can denormalize a bit and store cached venue names + locations (but not venue details like hours of operation, max occupancy, website, phone number, etc.) for a speed boost (one query instead of two). That method comes with the standard denormalization caveat: you can no longer update your data in one place.
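The first option (the two-query "manual join") can be sketched as below. The collection contents and field values are invented for illustration; in a real app the two `find` calls would be `db.events.findOne` and `db.venues.findOne`.

```javascript
// Simulated collections (made-up data).
const venues = [
  { _id: "v1", name: "Town Hall", loc: [40.7, -74.0] }
];
const events = [
  { _id: "e1", name: "Concert", venue_id: "v1", tags: ["fun"] }
];

function findEventWithVenue(eventId) {
  // Query 1: fetch the event.
  const event = events.find(e => e._id === eventId);
  if (!event) return null;
  // Query 2: fetch its venue, then combine the two documents client-side.
  const venue = venues.find(v => v._id === event.venue_id);
  return { ...event, venue };
}

const result = findEventWithVenue("e1");
```

After the two queries, `result` carries both the event fields and the full venue document under `venue`.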
All event and venue information for a particular event
See above
All events near me. Need to use a geo index on loc
Two queries again, same concept as above just reverse the order.
Get the venues:
db.venues.find( { loc : { $near : [lon, lat] } } )  // note: MongoDB expects [longitude, latitude]
Get the events using the venue ids:
db.events.find( { venue_id : { $in : [id1,id2,id3...] } } )
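The glue between those two queries is just pulling the `_id`s out of the venue result and feeding them into `$in`. A simulation with plain arrays (venue and event data invented; field name `venue_id` assumed from the schema above):

```javascript
// Pretend result of db.venues.find({loc: {$near: ...}}):
const nearbyVenues = [
  { _id: "v1", name: "Park" },
  { _id: "v2", name: "Arena" }
];
const events = [
  { _id: "e1", name: "Picnic",   venue_id: "v1" },
  { _id: "e2", name: "Game",     venue_id: "v2" },
  { _id: "e3", name: "Far away", venue_id: "v9" }
];

// Extract the ids for the second query.
const venueIds = nearbyVenues.map(v => v._id);
// Equivalent of db.events.find({ venue_id: { $in: venueIds } }):
const nearbyEvents = events.filter(e => venueIds.includes(e.venue_id));
```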
Some of this stuff can be done automatically for you if you use an ODM.
Good luck!

Related

Reference same Document in Multiple Websites

I have one admin-interface website where a user can create news articles and also select which websites the article shall appear on (many websites are connected to the same Mongo database).
Each website has an array with article IDs. When going to one of these websites, I loop over this array and fetch every article (from the Articles collection) belonging to this site with:
Articles.findOne({_id: id});
However, this becomes a problem if I'd like to do more advanced queries, such as sorting on date, putting on a limit, etc.
At the same time, I don't want to filter all articles for a specific site directly from the Articles collection, since that seems expensive (it contains all articles from all websites). And saving each article locally on each website would create duplicates.
I wonder what a good way of storing these news articles is while still fetching them quickly for each website?
------------------------
I am currently doing it like this to fetch all articles from a site and sort them by date. But now I also need to put on a limit, fetch only articles from a specific category, and so on, and it becomes very cumbersome:
var websites = Websites.find({name: "SITENAME"}, {}).fetch();
var now = new Date();
var articles = [];
websites[0].articles.forEach(function(id) {
  var article = Articles.findOne({_id: id});
  if (article !== undefined && article.publishedDate < now) {
    articles.push(article);
  }
});
articles.sort(function(a, b) {
  a = a.publishedDate;
  b = b.publishedDate;
  return a > b ? -1 : a < b ? 1 : 0;
});
return articles;
Edit to clarify:
This is the current database structure. Each article in the articles collection looks like this:
{
  "_id" : "CdHWxgq75yjcgQoDZ",
  "category" : "Nyheter",
  "tags" : [
    "ZaifTyGGouPwdrGur"
  ],
  "data" : [
    "Hello this is some random content"
  ],
  "publishedOnSites" : [
    "ZaifTyGGouPwdrGur"
  ],
  "publishedDate" : ISODate("2015-11-20T07:22:09.799Z"),
  "userId" : "B3t6QFgG7MfNkvzR5"
}
Each website in the websites collection like this:
{
  "_id" : "ZaifTyGGouPwdrGur",
  "name" : "SITENAME",
  "categories" : [
    "News",
    "Life",
    "TV",
    "Sport",
    "Quizzes",
    "Video"
  ],
  "tags" : [
    "batman",
    "bil",
    "polis",
    "flicka",
    "cool",
    "byrå",
    "förvandling"
  ],
  "articles" : [
    "PgGetxkC9KynaPNLc",
    "ZaifTyGGouPwdrGur",
    "oPQHh3u2CGhRwYp2a",
    "a5ZkhbxRcLEpggTuF",
    "t3n8Zp6Cve6e88Gmt",
    "eYQmaavt6tAwbbmzf",
    "F9LzZFcFxSpejseHn",
    "NLWb5NahoPjgAt7eN",
    "pwkTtFN8gZCsnKDGg",
    "o62uCK7S6qauJfyYa",
    "pivJGzo4CFw3QRb3v",
    "H2EHv7rX5GQmyqiDk",
    "tGfrv82NMwJEpuThK",
    "CvjGPKmsCqmd9o5oP",
    "29hoZxnmfovTnC8TM",
    "NXHXhaXDYgKLagamJ",
    "9EjfABeK5akDLeZJT",
    "5q5zeYRkPHMJXtEpT",
    "eWGwWq3J7JqtQi2fK",
    "7W27ufZ4qDyX4mJnC",
    "oBhGpNCBTrMcb3qvq",
    "7pRorBYbZ8Mx6jYX3",
    "d2PoAFGTcbQzapXpW",
    "qDRiB65vcpMu6KTTe"
  ]
}
I save the article IDs in each website document so they can be fetched quickly without filtering the whole Articles collection. However, this becomes a problem when I want to run queries such as sorting on date, putting on a limit, skipping the first elements, fetching only articles in a certain category, etc.
I need suggestions for a better database structure.
Usually, it's better to let MongoDB handle the filtering, sorting etc. It knows how to do it well and how to do it fast.
So, what you'd want to do is this:
var articleIds = Websites.findOne({name: "SITENAME"}).articles;
var articlesCursor = Articles.find({_id: {$in: articleIds}}, {sort: {publishedDate: -1}});
On the second line, you can add a limit etc. If you're worried about performance, add indexes, e.g.:
db.articles.createIndex({_id: 1, publishedDate: -1});
Note: Do not just add this index to your database. Analyze what kind of queries you have and add indexes based off of that. The above was just an example.
Also, you might want to consider adding a field to the Articles collection that stores all the websites that the article belongs to, e.g.:
article: {
  someField: someValue,
  websites_ids: [1, 5, 8, 10]
}
This is useful if you want to make your query reactive, e.g.:
var articlesCursor = Articles.find({websites_ids: website_id}, {sort: {publishedDate: -1}});
This way, if the cursor is reactive and an article is added to a website, the client immediately receives this information about the article. If done your way, the cursor would only track the specific IDs of the articles. Something to consider.
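With a `websites_ids` field on each article, the limit, skip, and category filtering from the question all collapse into a single query, roughly `Articles.find({websites_ids: siteId, category: "Nyheter", publishedDate: {$lt: new Date()}}, {sort: {publishedDate: -1}, limit: 10})`. Here is that query simulated on a plain array (article data invented):

```javascript
const articles = [
  { _id: "a1", category: "Nyheter", websites_ids: ["s1"],       publishedDate: new Date("2015-11-01") },
  { _id: "a2", category: "Sport",   websites_ids: ["s1"],       publishedDate: new Date("2015-11-02") },
  { _id: "a3", category: "Nyheter", websites_ids: ["s2"],       publishedDate: new Date("2015-11-03") },
  { _id: "a4", category: "Nyheter", websites_ids: ["s1", "s2"], publishedDate: new Date("2015-11-04") }
];

const siteId = "s1", limit = 10;
const page = articles
  .filter(a => a.websites_ids.includes(siteId) && a.category === "Nyheter") // selector
  .sort((a, b) => b.publishedDate - a.publishedDate)                        // sort: newest first
  .slice(0, limit);                                                         // limit
```

Pagination is then just a matter of adding a `skip` before the `limit`, all inside the one database query instead of a client-side loop.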

Need help with a MongoDB query

There is an existing person collection in the system which is like:
{
  "_id" : ObjectId("536378bcc9ecd7046700001f"),
  "engagements" : {
    "5407357013875b9727000111" : {
      "role" : "ADMINISTRATOR"
    },
    "5407357013875b9727000222" : {
      "role" : "DEVELOPER"
    }
  }
}
Multiple person objects can have the same engagement with a specific role. I need to fire a query on this hierarchy that returns all the persons which have a specific engagement in the engagements property of the person collection.
I want to get all the persons which have
5407357013875b9727000222 in their engagements.
I know the $in operator could be used, but the problem is that I need to compare against the keys of the engagements sub-object.
I think it's as simple as this:
db.users.find({'engagements.5407357013875b9727000222': {$exists: true}})
If you want to match against multiple engagement ids, then you'll have to use $or. Sorry, no $in for you here.
Note, however, that you may need to restructure your data, because the keys of an embedded object can't be indexed to help this concrete query. I assume here that you care about performance and this query is used often enough to have an impact on the database.
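One indexable restructuring (a sketch, not the only option) is to store engagements as an array of subdocuments instead of keying an object by engagement id. Then `db.person.find({"engagements.id": someId})` can use an index on `"engagements.id"`, and `$in` works on a list of ids. Simulated on in-memory documents (person data invented, ids taken from the question):

```javascript
const persons = [
  { _id: "p1", engagements: [
      { id: "5407357013875b9727000111", role: "ADMINISTRATOR" },
      { id: "5407357013875b9727000222", role: "DEVELOPER" }
  ]},
  { _id: "p2", engagements: [
      { id: "5407357013875b9727000111", role: "DEVELOPER" }
  ]}
];

const wanted = "5407357013875b9727000222";
// Equivalent of db.person.find({"engagements.id": wanted}):
const matches = persons.filter(p => p.engagements.some(e => e.id === wanted));
```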

Reference an _id in a Subdocument in another Collection Mongodb

I am developing an application with MongoDB and Node.js.
I should also mention that I am new to both, so please help me through this question.
My database has a collection categories, and in each category I am storing products in a subdocument array, just like below:
{
  _id: ObjectId(),
  name: String,
  type: String,
  products: [{
    _id: ObjectId(),
    name: String,
    description: String,
    price: String
  }]
}
When it comes to store the orders in database the orders collection will be like this:
{
  receiver: String,
  status: String,
  subOrders: [
    {
      products: [{
        productId: String,
        name: String,
        price: String,
        status: String
      }],
      tax: String,
      total: String,
      status: String,
      orderNote: String
    }
  ]
}
As you can see, we are storing the _id of products (a subdocument of categories) in orders.
When storing there is no issue, obviously. When fetching this data, if we just need a limited set of fields like name or price there is no issue either; but if later on we need extra fields from products, like description, they are not stored in orders.
My question is this:
Is there any easy way to access the other fields of products, apart from looping through all of categories? In other words, I need sample code for querying the description of a product in MongoDB when I only have its _id.
Or was our design and implementation wrong, so that I have to re-design it from scratch and separate the products from categories into another collection?
please don't put links to websites or weblogs that generally talks about mongodb and its collections implementations unless they focus on a very similar issue to mine
thanks in advance
I'd assume that you'd want to return as many product descriptions as match the current list of products. Note that there isn't a query that returns all matching array elements: $elemMatch, used as a projection operator, can return the first matching element, but not every matching element.
db.categories.find(
  { "products._id" : "PID1" },
  { "products" : { $elemMatch : { "_id" : "PID1" } } })
You'd definitely want to index the "products._id" field to achieve reasonable performance.
You might consider instead creating a products collection where each document contains a category identifier, much like you would in a relational database. This is a common pattern in MongoDb when embedding doesn't make sense, or complicates queries and aggregations.
Assuming that is true:
You'll need to load the data from the second collection manually. There are no joins in MongoDb. You might consider using $in, which takes a list of values for a field and loads all matching documents.
Depending on the driver you're using to access MongoDb, you should be able to use the projection feature of find, which can limit the fields returned for a document to just those you've specified.
As product descriptions aren't likely to change frequently, you might also consider caching the values for a period on the client (a web server, for example).
db.products.find({ _id: { $in : [ 'PID1', 'PID2'] } }, { description : 1 })
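Putting those pieces together, enriching an order with product descriptions is: collect the `productId`s out of the order, then run the single `$in` query above. A client-side simulation (product and order data invented):

```javascript
// Pretend contents of a separate products collection.
const products = [
  { _id: "PID1", description: "Red shoes" },
  { _id: "PID2", description: "Blue hat" }
];
const order = {
  subOrders: [{ products: [{ productId: "PID1" }, { productId: "PID2" }] }]
};

// Collect every referenced productId from the order...
const ids = order.subOrders.flatMap(s => s.products.map(p => p.productId));
// ...then one lookup, i.e. db.products.find({_id: {$in: ids}}, {description: 1}):
const descriptions = products
  .filter(p => ids.includes(p._id))
  .map(p => ({ _id: p._id, description: p.description }));
```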

One document with updates vs. many smaller documents with inserts

I need to develop a data set for users which stores their favourite items. Maybe 5% of users will have favourites, and those will have perhaps 5-10 favourites on average, with a max of 50. Almost every user will trigger a "get favourites" call, regardless of whether they have any, but they will probably add favourites infrequently.
My assumption is: There will probably be 100x more "get favourites" than "add/post favourite".
Would it be better to have this structure in Mongo, which may slow inserts (since it needs to update one document per user) but could be faster for retrieving everything?
{
  _id: 123456,  // the user id
  favourites: [
    { item_id: 43563, created_date: ... },
    { item_id: 31232, created_date: ... },
    { item_id: 23472, created_date: ... }
  ]
}
Or one document per favourite:
{
  _id: ...,
  user_id: 123456,
  item_id: 43563,
  created_date: ...
}
{
  _id: ...,
  user_id: 123456,
  item_id: 31232,
  created_date: ...
}
{
  _id: ...,
  user_id: 123456,
  item_id: 23472,
  created_date: ...
}
The second structure is probably more flexible for future requirements changes, but I assume the first structure would localise all the data in one area on disk and may be much quicker to read.
Then again, I'm not sure whether changing the size of a document (through many updates) may have a detrimental effect (i.e., at a low level, would it have to move the document around on disk, or would the data fragment anyway, since not enough space may be preallocated for it on the first insert?).
The question is: is one method recommended, or significantly more performant than the other?
One way to design a Mongo collection is to think of the way in which the data is most likely to be used, and design it for that purpose. In your case, your users will query favourites much more frequently than they add them. Therefore the collection should be designed to optimise this query.
With this in mind, the first option is the more optimal of the two. However, you might want to consider a slight modification to that structure.
As you have said, the getFavourites method will be called for all users but will only return a list of favourites for 5% of them. This call has to retrieve the favourites array and determine whether it has content. While this does not cost too much, you could pre-compute the answer by adding an additional field that is true only if the user has favourites. Then it is only necessary to query this field, and to query for the favourites themselves only when the value returned is true.
I imagine a structure as follows:
{
  _id: 123456,  // the user id
  hasFavourites: 1,
  favourites: [
    { item_id: 43563, created_date: ... },
    { item_id: 31232, created_date: ... },
    { item_id: 23472, created_date: ... }
  ]
}
This document has favourites, so the field hasFavourites is 1; if it had none, it would be 0.
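The flag can be kept in sync inside the same write that appends a favourite; in the shell that would be roughly `db.users.update({_id: userId}, {$push: {favourites: fav}, $set: {hasFavourites: 1}})`. A simulation of that update on an in-memory document (user data invented):

```javascript
const user = { _id: 123456, hasFavourites: 0, favourites: [] };

function addFavourite(doc, itemId) {
  // $push equivalent: append the favourite subdocument.
  doc.favourites.push({ item_id: itemId, created_date: new Date() });
  // $set equivalent: keep the pre-computed flag in sync.
  doc.hasFavourites = 1;
}

addFavourite(user, 43563);
```

Because both changes happen in one update on one document, the flag can never disagree with the array.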

How do I do a 'not-in' operation in MongoDB?

I have two collections: shoppers (everyone in the shop on a given day) and beach-goers (everyone on the beach on a given day). There are entries for each day, and on any day a person can be on the beach, shopping, doing both, or doing neither. I now want to query: all shoppers in the last 7 days who did not go to the beach.
I am new to Mongo, so it might be that my schema design is not appropriate for NoSQL DBs. I saw similar questions around joins, and in most cases it was suggested to denormalize. So one solution I could think of is to create a collection activity, index it on date, and embed the actions of the user. Something like:
{
  user_id,
  date,
  actions: [action_type, ...]
}
Insertion now becomes costly, as now I will have to query before insert.
A few suggestions:
Figure out all the queries you'll be running, and all the types of data you will need to store. For example, do you expect to add activities in the future or will beach and shop be all?
Consider how many writes vs. reads you will have and which has to be faster.
Determine how your documents will grow over time to make sure your schema is scalable in the long term.
Here is one possible approach, if you will only have these two activities ever. One record per user per day.
{
  user: "user1",
  date: "2012-12-01",
  shopped: 0,
  beached: 1
}
Now your query becomes even simpler, whether you have two or ten activities.
When a new activity comes in, you always have to update the correct record for it.
If you were thinking you could just append a record to your collection indicating user, date, and activity, then your inserts are much faster, but your queries now have to do a LOT of work across users, dates, and activities.
With proposed schema, here is the insert/update statement:
db.coll.update({"user": "username", "date": "somedate"}, {$inc: {"shopped": 1}}, true)
What that says is: "for username on somedate, increment their shopped attribute by 1", and create the document if it doesn't exist, aka an "upsert" (that's the last true argument).
Here is the query for all users on a particular day who shopped more than once but didn't go to the beach at all:
db.coll.find({"date": "somedate", "beached": 0, "shopped": {$gt: 1}})
Be wary of picking a schema where a single document can have continuous and unbounded growth.
For example, storing everything in a users collection where the array of dates and activities keeps growing will run into this problem. Keep in mind that large documents will keep getting pulled into your working set, and if they are huge and full of useless (old) data, that will hurt the performance of your application, as will fragmentation of data on disk.
Remember, you don't have to put all the data into a single collection. It may be best to have a users collection with a fixed set of attributes, where you track semi-stable information such as how many friends a user has, and also a user_activity collection where you add a record per user per day for the activities they did. The amount of normalizing or denormalizing of your data is very tightly coupled to the types of queries you will be running on it, which is why figuring out what those are was my first suggestion.
Insertion now becomes costly, as now I will have to query before insert.
Keep in mind that even with an RDBMS, insertion can be (relatively) costly when there are indices on the table (i.e., usually). I don't think using embedded documents in Mongo is much different in this respect.
For the query, as Asya Kamsky suggests, you can use the $nin operator to find everyone who didn't go to the beach, e.g.:
db.people.find({
  actions: { $nin: ["beach"] }
});
Using embedded documents probably isn't the best approach in this case though. I think the best would be to have a "flat" activities collection with documents like this:
{
  user_id,
  date,
  action
}
Then you could run a query like this:
var start = new Date(2012, 5, 27);  // months are 0-indexed: June 27
var end = new Date(2012, 6, 3);     // July 3
db.activities.find({
  date: { $gte: start, $lt: end },
  action: { $in: ["beach", "shopping"] }
});
The last step would be on your client driver, to find user ids where records exist for "shopping", but not for "beach" activities.
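That last client-side step can be sketched as a set difference over the returned records (sample records invented):

```javascript
// Pretend result of the db.activities.find(...) query above.
const activities = [
  { user_id: 1, action: "shopping" },
  { user_id: 1, action: "beach" },
  { user_id: 2, action: "shopping" },
  { user_id: 3, action: "beach" }
];

// Partition the user ids by activity.
const shoppers = new Set(
  activities.filter(a => a.action === "shopping").map(a => a.user_id));
const beachGoers = new Set(
  activities.filter(a => a.action === "beach").map(a => a.user_id));

// Shoppers who never went to the beach.
const shoppedNotBeached = [...shoppers].filter(id => !beachGoers.has(id));
```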
One possible structure is to use an embedded array of documents (a users collection):
{
  user_id: 1234,
  actions: [
    { action_type: "beach", date: ISODate("2012-07-05") },
    { action_type: "shopping", date: ISODate("2012-07-06") }
  ]
},
{ another user }
Then you can do a query like this, using $elemMatch to find users matching certain criteria (in this case, people who went shopping after a given date):
var start = new Date(2012, 6, 1);
db.people.find( {
  actions: {
    $elemMatch: {
      action_type: { $in: ["shopping"] },
      date: { $gt: start }
    }
  }
});
Expanding on this, you can use the $and operator to find all people who went shopping, but did not go to the beach, in the same period (note that each element of the $and array must be its own document):
var start = new Date(2012, 6, 1);
db.people.find( {
  $and: [
    { actions: {
        $elemMatch: {
          action_type: { $in: ["shopping"] },
          date: { $gt: start }
        }
    }},
    { actions: {
        $not: {
          $elemMatch: {
            action_type: { $in: ["beach"] },
            date: { $gt: start }
          }
        }
    }}
  ]
});