Reference same Document in Multiple Websites - mongodb

I have one Admin interface website where a user can create news articles and also select which websites this article shall appear on. (Have many websites connected to the same Mongo database)
Each website has an array of article IDs. When visiting one of these websites, I loop over this array and fetch all articles (from the Articles collection) belonging to this site with:
Articles.findOne({_id:id});
However, this becomes a problem if I want to do more advanced queries, such as sorting on date, applying a limit, etc.
At the same time, I don't want to filter all articles for a specific site directly from the Articles collection, since it seems expensive (it contains all articles from all websites). And saving each article locally on each website would create duplicates.
I wonder what is a good way of storing these news articles and still fetch them quickly for each website?
------------------------
This is how I currently fetch all articles for a site and sort them by date. But now I also need to apply a limit, fetch only articles from a specific category and so on, and it becomes very cumbersome:
var websites = Websites.find({name : "SITENAME"},{}).fetch();
var now = new Date();
var articles = [];
websites[0].articles.forEach(function(id) {
  // one round trip per article ID
  var article = Articles.findOne({_id:id});
  if (article != undefined && article.publishedDate < now) {
    articles.push(article);
  }
});
// newest first
articles.sort(function(a, b) {
  a = a.publishedDate;
  b = b.publishedDate;
  return a > b ? -1 : a<b ? 1 : 0;
});
return articles;
Edit to clarify:
This is the current database structure. Each article in the articles collection looks like this:
{
"_id" : "CdHWxgq75yjcgQoDZ",
"category" : "Nyheter",
"tags" : [
"ZaifTyGGouPwdrGur"
],
"data" : [
"Hello this is some random content"
],
"publishedOnSites" : [
"ZaifTyGGouPwdrGur"
],
"publishedDate" : ISODate("2015-11-20T07:22:09.799Z"),
"userId" : "B3t6QFgG7MfNkvzR5"
}
Each website in the websites collection like this:
{
"_id" : "ZaifTyGGouPwdrGur",
"name" : "SITENAME",
"categories" : [
"News",
"Life",
"TV",
"Sport",
"Quizzes",
"Video"
],
"tags" : [
"batman",
"bil",
"polis",
"flicka",
"cool",
"byrå",
"förvandling"
],
"articles" : [
"PgGetxkC9KynaPNLc",
"ZaifTyGGouPwdrGur",
"oPQHh3u2CGhRwYp2a",
"a5ZkhbxRcLEpggTuF",
"t3n8Zp6Cve6e88Gmt",
"eYQmaavt6tAwbbmzf",
"F9LzZFcFxSpejseHn",
"NLWb5NahoPjgAt7eN",
"pwkTtFN8gZCsnKDGg",
"o62uCK7S6qauJfyYa",
"pivJGzo4CFw3QRb3v",
"H2EHv7rX5GQmyqiDk",
"tGfrv82NMwJEpuThK",
"CvjGPKmsCqmd9o5oP",
"29hoZxnmfovTnC8TM",
"NXHXhaXDYgKLagamJ",
"9EjfABeK5akDLeZJT",
"5q5zeYRkPHMJXtEpT",
"eWGwWq3J7JqtQi2fK",
"7W27ufZ4qDyX4mJnC",
"oBhGpNCBTrMcb3qvq",
"7pRorBYbZ8Mx6jYX3",
"d2PoAFGTcbQzapXpW",
"qDRiB65vcpMu6KTTe",
]
}
I save the article ID's in each website to fetch it quickly without having to filter all articles. However this becomes a problem when I want to make queries such as sorting on date, putting a limit, skipping the first elements, only fetch articles with a certain category etc.
I need suggestions for a better database structure.

Usually, it's better to let MongoDB handle the filtering, sorting etc. It knows how to do it well and how to do it fast.
So, what you'd want to do is this:
var articles_ids = Websites.findOne({name: "SITENAME"}).articles;
var articlesCursor = Articles.find({_id: {$in: articles_ids}}, {sort: {publishedDate: -1}});
On the second line, you can add a limit etc. If you're worried about performance, add indexes, e.g.:
db.articles.createIndex({_id: 1, publishedDate: -1});
Note: Do not just add this index to your database. Analyze what kind of queries you have and add indexes based off of that. The above was just an example.
Also, you might want to consider adding a field to the Articles collection that stores all the websites that this article belongs to. E.g:
article: {
someField: someValue,
websites_ids: [1, 5, 8, 10]
}
This is useful if you want to make your query reactive. E.g:
var articlesCursor = Articles.find({websites_ids: website_id}, {sort: {publishedDate: -1}});
This way, if the cursor is reactive and an article is added to a website, the client immediately receives this information about the article. If done your way, the cursor would only track the specific IDs of the articles. Something to consider.
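As a rough sketch of how the sorting, limiting and category filtering could be folded into one query, the helper below builds the selector and options for a single find call. `buildArticleQuery` is a hypothetical name, not part of Meteor or MongoDB. It assumes the `publishedOnSites`, `category` and `publishedDate` fields from the article document in the question, and Meteor-style `sort`/`limit`/`skip` options.

```javascript
// Hypothetical helper: builds the arguments for Articles.find(selector, options).
// Field names are taken from the article document shown in the question.
function buildArticleQuery(siteId, opts) {
  opts = opts || {};
  var selector = {
    publishedOnSites: siteId,                       // matches when the array contains siteId
    publishedDate: { $lt: opts.now || new Date() }  // only articles already published
  };
  if (opts.category) {
    selector.category = opts.category;
  }
  var options = { sort: { publishedDate: -1 } };    // newest first
  if (opts.limit) { options.limit = opts.limit; }
  if (opts.skip) { options.skip = opts.skip; }
  return { selector: selector, options: options };
}

// Usage in Meteor (not run here):
// var q = buildArticleQuery(siteId, { category: "Nyheter", limit: 10 });
// var cursor = Articles.find(q.selector, q.options);
```

With this shape the per-site ID array is no longer needed for reads. An index that starts with `publishedOnSites` would be the natural candidate, but check it against your real queries first.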

Related

Best way to structure my firebase database

I'm working on a project where users can post things. But, I'm wondering if my firebase database structure is efficient. Below is how my database looks like so far. I have posts child contains all the post that users will post. and each user will be able to track their own posts by having posts child in uid. Is there a better way of structuring my data? or am I good to go? Any advice would be appreciated!
{
"posts" : {
"-KVRT-4z1AUoztWnF-pe" : {
"caption" : "",
"likes" : 0,
"pictureUrl" : "https://firebasestorage.googleapis.com/v0/b/cloub-4fdbd.appspot.com/o/users%2FufTgaqudXeUciW5bGgCSfoTRUw92%2F208222E1-8E20-42A0-9EEF-8AF34F523878.png?alt=media&token=9ec5301e-d913-44ee-81d0-e0ec117017de",
"timestamp" : 1477946376629,
"writer" : "ufTgaqudXeUciW5bGgCSfoTRUw92"
}
},
"users" : {
"ufTgaqudXeUciW5bGgCSfoTRUw92" : {
"email" : "Test1#gmail.com",
"posts" : {
"-KVRT-4z1AUoztWnF-pe" : {
"timestamp" : 1477946376677
}
},
"profileImageUrl" : "https://firebasestorage.googleapis.com/v0/b/cloub-4fdbd.appspot.com/o/profile_images%2F364DDC66-BDDB-41A4-969E-397A79ECEA3D.png?alt=media&token=c135d337-a139-475c-b7a4-d289555b94ca",
"username" : "Test1"
}
}
}
When working with NoSQL data, you need to take care of a few things:
Avoid nesting the data too deep
Flatten your data structure as much as possible
Prefer dictionaries
Security Rules [Firebase]
Try this structure:-
users:{
userID1: {..//Users info..
posts:{
postID1: true,
postID2: true,
postID3: true,
}
},
userID2: {..//Users info..},
userID3: {..//Users info..},
userID4: {..//Users info..},
},
posts: {
userID1 :{
postID1: {..//POST CONTENT },
postID2: {..//POST CONTENT },
postID3: {..//POST CONTENT },
}
}
Keep the data flat and shallow. Store data elsewhere in the tree rather than nest a branch of data under a node that is simply related, duplicating the data if that helps to keep the tree shallow.
The goal is to have fast requests that return only data you need. Consider that every time the tree changes and the client-side listener fires the node and all its children are communicated to the client. Duplication of data across the tree facilitates quick requests with minimal data.
This process of flattening the data is known as "denormalization" and this section of the Firebase Doc does a nice job of providing guidance:
https://firebase.google.com/docs/database/android/structure-data
In your example above I see posts metadata nested under "users", a nested list that grows. Every time something changes under "users", the listener will fire to update the client, and all of this data will be transmitted in each response. You could instead consider fetching the posts data from the "posts" node based on the writer's uid.
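To keep the flat structure consistent, both locations can be written in one multi-path update. The sketch below only builds the update object. `buildNewPostUpdate` is a hypothetical helper, and the paths assume the `posts/<uid>/<postId>` and `users/<uid>/posts/<postId>` layout suggested in the first answer.

```javascript
// Hypothetical helper: builds one multi-path update for a new post.
// Paths follow the suggested layout: posts/<uid>/<postId> and users/<uid>/posts/<postId>.
function buildNewPostUpdate(uid, postId, post) {
  var updates = {};
  updates["posts/" + uid + "/" + postId] = post;        // full post content, stored once
  updates["users/" + uid + "/posts/" + postId] = true;  // lightweight index entry under the user
  return updates;
}

// Usage with the Realtime Database SDK (not run here):
// rootRef.update(buildNewPostUpdate(uid, postId, { caption: "", likes: 0 }));
```

Because the user node only holds `true` flags, a listener on a user no longer pulls post content along with it.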

How can Mongo store infinitely long comments in a blog post example

Am looking to build a blogging system and came across the following blog.
http://blog.mongolab.com/2012/08/why-is-mongodb-wildly-popular/
While it's nice to see how we can store everything in one Mongo document as a JSON-type object (example JSON from the blog pasted below) rather than distributing data across multiple tables, I'm having trouble understanding how this can accommodate a hypothetically super long comment thread.
{
_id: 1234,
author: { name: "Bob Davis", email : "bob#bob.com" },
post: "In these troubled times I like to …",
date: { $date: "2010-07-12 13:23UTC" },
location: [ -121.2322, 42.1223222 ],
rating: 2.2,
comments: [
{ user: "jgs32#hotmail.com",
upVotes: 22,
downVotes: 14,
text: "Great point! I agree" },
{ user: "holly.davidson#gmail.com",
upVotes: 421,
downVotes: 22,
text: "You are a moron" }
],
tags: [ "Politics", "Virginia" ]
}
Aside from the comments key which is represented as an array of comment objects, allowing us to store an endless number of comments within this document rather than on a separate comments table requiring a join operation to relate if we are to do this with a relational database, the rest of the fields (ie author, post, date, location, rating, tags) can all be done as columns on a relational database table as well.
Since there is a limit of 16MB per document, what happens when this blog attracts a lot of comments?
Also, why can't I store a JSON object in a relational database column? After all, it's text, isn't it?
First, a clarification: MongoDB actually stores BSON, which is essentially a superset of JSON that supports more data types.
Since there is a limit of 16MB per document, what happens when this blog attracts a lot of comments?
You won't be able to increase the size past 16MB, so you'll lose the ability to add more comments. But you don't need to store all the comments on the blog post document. You could store the first N, then retire old comments to a comments collection as new ones are added. You could store comments in another collection with a parent reference. The way comments are stored should jibe with how you expect them to be used. 16MB of comments would really be a lot - you might even have a special solution to handle the occasional post that gets that kind of activity, an approach that's totally different from the normal way of handling comments.
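A minimal sketch of the "keep N embedded, retire the rest" idea, in plain JavaScript. `addComment` and `maxEmbedded` are hypothetical names, and the retired documents carry a `postId` parent reference, which is an assumed field name. Persisting the two results (updating the post, inserting into a comments collection) is left out.

```javascript
// Hypothetical helper: keep at most maxEmbedded recent comments on the post and
// return the older ones as documents destined for a separate comments collection.
function addComment(postId, embedded, comment, maxEmbedded) {
  var all = embedded.concat([comment]);            // oldest first, newest last
  var overflow = Math.max(0, all.length - maxEmbedded);
  var retired = all.slice(0, overflow).map(function (c) {
    return Object.assign({ postId: postId }, c);   // parent reference back to the post
  });
  return { embedded: all.slice(overflow), retired: retired };
}
```

The post document then stays bounded in size no matter how long the thread gets, while older comments remain queryable by `postId`.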
We can store json in a relational database. So what is the value of Mongo I'm getting?
Here are two ways of storing JSON (in MongoDB).
> db.test.drop()
> db.test.insert({ "name" : { "first" : "Yogi", "last" : "Bear" }, "location" : "Yellowstone", "likes" : ["picnic baskets", "PBJ", "the great outdoors"] })
> db.test.findOne()
{
"_id" : ObjectId("54f9f41f245e945635f2137b"),
"name" : {
"first" : "Yogi",
"last" : "Bear"
},
"location" : "Yellowstone",
"likes" : [
"picnic baskets",
"PBJ",
"the great outdoors"
]
}
> var jsonstring = '{ "name" : { "first" : "Yogi", "last" : "Bear" }, "location" : "Yellowstone", "likes" : ["picnic baskets", "PBJ", "the great outdoors"] }'
> db.test.drop()
> db.test2.insert({ "myjson" : jsonstring })
> db.test2.findOne()
{
"_id" : ObjectId("54f9f535245e945635f2137d"),
"myjson" : "{ \"name\" : { \"first\" : \"Yogi\", \"last\" : \"Bear\" }, \"location\" : \"Yellowstone\", \"likes\" : [\"picnic baskets\", \"PBJ\", \"the great outdoors\"] }"
}
Can you store and use JSON the first way using a relational database? How useful is JSON stored in the second way compared to the first?
There's lots of other differences between MongoDB and relational databases that make one better than the other for various use cases - but going further into that is too broad for an SO answer.
Can you store and use JSON the first way using a relational database?
How useful is JSON stored in the second way compared to the first?
Sorry, are you suggesting that with Mongo, JSON documents can be stored without using escape characters, whereas with an RDBMS I must use escape characters to escape the double quotes? I wasn't aware that's the case.

Document References query example

If I choose to use Document References with a structure of Materialized Paths instead of the simple Embedded Documents how can I display the same results?
For example, if I had embedded docs I would simply run:
db.col.find({'user' : 'foo'})
and return:
{'user' : 'foo',
'posts' : [ {},
{},
{}
]
}
Which command should I use to display posts as an embedded array of that user? Or can this only happen client-side?
If it's document references,
users collection will contain:
{
_id : "foo",
// users details
}
and posts collection:
{
_id: "postid",
author: "foo"
// other fields
}
In this case,
1) First, query the users collection to get the user.
2) Then pass the user id to the posts collection to get all posts.
var user = db.users.findOne({_id : "foo"});
// Use this to get the user details or validate the user; fetch the posts only after validation, if you need them.
var posts = db.posts.find({author: user._id });
Because the documents are referenced, this costs an extra round trip to the server.
I am not sure how you have used materialized paths for this scenario. Let me know the data structure and I will be able to suggest a query based on that.
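If the server runs MongoDB 3.2 or later, the `$lookup` aggregation stage can produce the embedded-style result server-side instead of in two round trips. The sketch below only builds the pipeline. `userWithPostsPipeline` is a hypothetical helper, and it assumes the `users` and `posts` collections and the `author` field from the example above.

```javascript
// Hypothetical helper: pipeline that returns the user with a "posts" array attached.
// Assumes posts reference their user through the "author" field.
function userWithPostsPipeline(userId) {
  return [
    { $match: { _id: userId } },
    { $lookup: { from: "posts", localField: "_id", foreignField: "author", as: "posts" } }
  ];
}

// Usage in the mongo shell (not run here):
// db.users.aggregate(userWithPostsPipeline("foo"))
```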

how do I do 'not-in' operation in mongodb?

I have two collections: shoppers (everyone in a shop on a given day) and beach-goers (everyone on the beach on a given day). There are entries for each day, and a person can be on the beach, shopping, doing both, or doing neither on any given day. I now want to query for all shoppers in the last 7 days who did not go to the beach.
I am new to Mongo, so it might be that my schema design is not appropriate for nosql DBs. I saw similar questions around join and in most cases it was suggested to denormalize. So one solution, I could think of is to create collection - activity, index on date, embed actions of user. So something like
{
user_id
date
actions {
[action_type, ..]
}
}
Insertion now becomes costly, as now I will have to query before insert.
A few suggestions.
Figure out all the queries you'll be running, and all the types of data you will need to store. For example, do you expect to add activities in the future or will beach and shop be all?
Consider how many writes vs. reads you will have and which has to be faster.
Determine how your documents will grow over time to make sure your schema is scalable in the long term.
Here is one possible approach, if you will only have these two activities ever. One record per user per day.
{ user: "user1",
date: "2012-12-01",
shopped: 0,
beached: 1
}
Now your query becomes even simpler, whether you have two or ten activities.
When new activity comes in you always have to update the correct record based on it.
If you were thinking you could just append a record to your collection indicating user, date, activity then your inserts are much faster but your queries now have to do a LOT of work querying for both users, dates and activities.
With the proposed schema, here is the insert/update statement:
db.coll.update({"user":"username", "date": "somedate"}, {$inc: {"shopped": 1}}, true)
What that's saying is: "for username on somedate, increment their shopped attribute by 1, and create the record if it doesn't exist" (aka an "upsert"; that's the last 'true' argument).
Here is the query for all users on a particular day who did activity1 more than once but didn't do any of activity2.
db.coll.find({"date":"somedate","shopped":0,"danced":{$gt:1}})
Be wary of picking a schema where a single document can have continuous and unbounded growth.
For example, storing everything in a users collection where the array of dates and activities keeps growing will run into this problem. See the highlighted section here for explanation of this - and keep in mind that large documents will keep getting into your working data set and if they are huge and have a lot of useless (old) data in them, that will hurt the performance of your application, as will fragmentation of data on disk.
Remember, you don't have to put all the data into a single collection. It may be best to have a users collection with a fixed set of attributes of that user, where you track how many friends they have or other semi-stable information about them, and also have a user_activity collection where you add records for each day per user recording what activities they did. The amount of normalizing or denormalizing of your data is very tightly coupled to the types of queries you will be running on it, which is why figuring out what those are is the first suggestion I made.
Insertion now becomes costly, as now I will have to query before insert.
Keep in mind that even with RDBMS, insertion can be (relatively) costly when there are indices in place on the table (ie, usually). I don't think using embedded documents in Mongo is much different in this respect.
For the query, as Asya Kamsky suggests, you can use the $nin operator to find everyone who didn't go to the beach. E.g.:
db.people.find({
actions: { $nin: ["beach"] }
});
Using embedded documents probably isn't the best approach in this case though. I think the best would be to have a "flat" activities collection with documents like this:
{
user_id
date
action
}
Then you could run a query like this:
// JavaScript months are zero-based: June 27 to July 3, 2012
var start = new Date(2012, 5, 27);
var end = new Date(2012, 6, 3);
db.activities.find({
  date: {$gte: start, $lt: end },
  action: { $in: ["beach", "shopping" ] }
});
The last step would be on your client driver, to find user ids where records exist for "shopping", but not for "beach" activities.
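That client-side step is a set difference. A minimal sketch, assuming the query results have been loaded into an array of plain objects shaped like the flat `{ user_id, date, action }` documents above (`shoppersWhoSkippedBeach` is a hypothetical name):

```javascript
// records: array of { user_id, date, action } objects, e.g. the fetched query results.
// Returns the ids of users who have a "shopping" record but no "beach" record.
function shoppersWhoSkippedBeach(records) {
  var shopped = new Set();
  var beached = new Set();
  records.forEach(function (r) {
    if (r.action === "shopping") { shopped.add(r.user_id); }
    if (r.action === "beach") { beached.add(r.user_id); }
  });
  return Array.from(shopped).filter(function (id) { return !beached.has(id); });
}
```

This relies on `user_id` values being comparable by value (strings or numbers); driver-specific id objects would need to be converted to strings first.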
One possible structure is to use an embedded array of documents (a users collection):
{
user_id: 1234,
actions: [
{ action_type: "beach", date: "6/1/2012" },
{ action_type: "shopping", date: "6/2/2012" }
]
},
{ another user }
Then you can do a query like this, using $elemMatch to find users matching certain criteria (in this case, people who went shopping in the last three days):
var start = new Date(2012, 6, 1);
db.people.find( {
actions : {
$elemMatch : {
action_type : { $in: ["shopping"] },
date : { $gt : start }
}
}
});
Expanding on this, you can use the $and operator to find all people who went shopping but did not go to the beach in the past three days:
var start = new Date(2012, 6, 1);
db.people.find( {
  // each $and clause must be its own document
  $and: [
    { actions : {
      $elemMatch : {
        action_type : { $in: ["shopping"] },
        date : { $gt : start }
      }
    } },
    { actions : {
      $not: {
        $elemMatch : {
          action_type : { $in: ["beach"] },
          date : { $gt : start }
        }
      }
    } }
  ]
});

Mongo DB Design For Events Calendar

We have an events calendar and here is what I am thinking for our basic mongo schema:
users
username
password
salt
events
name
description
tags[]
category
venue_id
...
venues
name
address
loc
Queries that will be done are:
Listing of distinct tags (this might be hard given current design)
events in a given tag
Event information along with the venue location
All event and venue information for a particular event
All events near me. Need to use a geo index on loc
Any feedback/ideas on if we should be nesting venues inside events or use mysql instead?
Ok, that looks pretty good. Let's construct some of the queries.
Assuming the collections are named users, events, venues:
Insert some dummy events:
db.events.insert({tags:["awesome","fun","cool"]})
db.events.insert({tags:["sweet","fun","rad"]})
Make an index (like a boss)
db.events.createIndex({ tags: 1 })
Listing of distinct tags (this might be hard given current design):
Nope, not hard.
db.events.distinct("tags")
[ "awesome", "cool", "fun", "rad", "sweet" ]
Events in a given tag (you meant "with a given tag" right?)
db.events.find({tags: "fun"})
{ "_id" : ObjectId("4ecc08c62477605df6522c97"), "tags" : [ "awesome", "fun", "cool" ] }
{ "_id" : ObjectId("4ecc08d92477605df6522c98"), "tags" : [ "sweet", "fun", "rad" ] }
Event information along with venue location
You can do this a couple different ways. One way would be to query for the event and subsequently query for the venue. With both documents, join (combine) the data you want manually.
OR
You can denormalize a bit and store cached venue names + locations (but not venue details like hours of operation, max occupancy, website, phone number, etc.) for a speed boost (1 query instead of 2). That method comes with the standard denormalization caveat of not being able to update your data in one place.
All event and venue information for a particular event
See above
All events near me. Need to use a geo index on loc
Two queries again, same concept as above just reverse the order.
Get the venues:
db.venues.find( { loc : { $near : [lat,lon] } } )
Get the events using the venue ids:
db.events.find( { venue_id : { $in : [id1,id2,id3...] } } )
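The glue between the two queries, and the manual join afterwards, could look roughly like this. Both helpers are hypothetical. They assume events reference venues through the `venue_id` field from the question's schema, and that `_id` values turn into unique strings via `String()`.

```javascript
// Hypothetical helper: selector for the second query, built from the venues found by $near.
function eventsForVenuesSelector(venues) {
  var ids = venues.map(function (v) { return v._id; });
  return { venue_id: { $in: ids } };
}

// Hypothetical helper: manual join, attaching each event's venue document (or null).
function attachVenues(events, venues) {
  var byId = {};
  venues.forEach(function (v) { byId[String(v._id)] = v; });
  return events.map(function (e) {
    return Object.assign({}, e, { venue: byId[String(e.venue_id)] || null });
  });
}
```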
Some of this stuff can be done automatically for you if you use an ODM.
Good luck!