Parse DB Design: How to get all the posts for particular category - mongodb

I'm creating a discussion system using Parse.com
In my [simplified] system, there are Posts, Categorys, and Comments.
As you probably imagined, Posts can belong to one or more Categorys and can have multiple Comments.
However, often users will want to see all the Posts in a Category. If I set up my database like this
Post (name, content, categories)
Category(name)
I am worried that querying for all the Posts in a Category will be very inefficient (since it will have to check the categories field of every Post).
However, if I design the database like
Post (name, content)
Category(name, posts)
it will be inefficient for me to query which Categorys a Post belongs to, since it will have to search the posts arrays of all the Categorys.
I'm sure this must be a common Database design dilemma but I am still new at this. What is the best way to approach and solve this problem?

What you're looking for is a bi-directional, many-to-many relationship between Post and Category. With Parse, there are at least three approaches you can take.
You can add a column as a PFRelation to the Post table. You can ask a Post for its categories relation, create a query from that and run it. Inversely, if you have a category you can create a Post query with a where clause on the categories key. PFRelations are good if you will have big collections.
If you think better as a relational model, just create a "join" table called CategoryPosts. It would have two pointer columns, one for the Post and another for the Category. This is also very efficient.
Lastly, you could add an array column to either class. Since all of the results are loaded at once, this works best for smaller collections.
These options are described in a little more detail in the Parse Relations Documentation.
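To make the "join" table option concrete, here is a minimal in-memory sketch in Python. It is not Parse SDK code: the CategoryPost row type, the helper functions and the sample ids are all hypothetical, and a real backend would index both pointer columns instead of scanning a list.

```python
from typing import NamedTuple


class CategoryPost(NamedTuple):
    """One row of the hypothetical join table: two pointer columns."""
    post_id: str
    category_id: str


# Sample rows (made up for illustration). post1 belongs to two categories.
rows = [
    CategoryPost("post1", "parse"),
    CategoryPost("post1", "swift"),
    CategoryPost("post2", "parse"),
]


def posts_in_category(rows, category_id):
    """All posts for one category: filter on the category pointer."""
    return [r.post_id for r in rows if r.category_id == category_id]


def categories_of_post(rows, post_id):
    """All categories for one post: filter on the post pointer."""
    return [r.category_id for r in rows if r.post_id == post_id]


print(posts_in_category(rows, "parse"))
print(categories_of_post(rows, "post1"))
```

Both directions of the relationship are answered from the same rows, which is why neither Post nor Category needs an array column in this layout.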

Related

directus cms how to join

I'm using directus for the first time. According to the documentation, database joins are possible. However, there is nothing about usage in the documentation, just a note to add this in future. Does anyone of you know how to use it anyway?
You can set up a relational interface (like a many-to-one) to connect two collections. Once that's set up, you can use the fields parameter to select how many "levels" deep you want to retrieve the relational data.
Let's say you have a collection books and a collection authors. In this example, each book has a single author. Using a many-to-one interface in the books collection, you can now select what author wrote the book.
To fetch the books, you'd normally use /items/books. To retrieve the title of the book, and the name of the author, you can get /items/books?fields=title,author.name.
If you want all the data, you can also use the * flag: ?fields=*.* will retrieve all fields 2 'levels' deep.
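As a small sketch of how those request URLs could be assembled, the Python helper below builds the query string from a list of field paths. The items_url function and the example.com host are assumptions made for illustration; only the /items/books path and the fields parameter come from the answer above.

```python
from urllib.parse import urlencode


def items_url(base_url, collection, fields):
    """Build an /items/<collection> URL with a comma-separated fields parameter."""
    # Keep "," and "*" unescaped so the URL reads like the examples above.
    query = urlencode({"fields": ",".join(fields)}, safe=",*")
    return f"{base_url.rstrip('/')}/items/{collection}?{query}"


# Title of each book plus the name of its related author.
print(items_url("https://example.com", "books", ["title", "author.name"]))
# All fields, two levels deep.
print(items_url("https://example.com", "books", ["*.*"]))
```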

Sails.js one to many embedded associations with mongo

On the sails documentation here it shows modeling one to many associations with what looks like high level referencing.
Let's say I want to use mongo to make a post that has a lot of comments on it. I will take the post as the document and embed all the comments in it in one attribute.
If I did it like the documentation, would the mongo adapter automatically create a document with the comments embedded? Or would it do something relational and reference the comments?
If it doesn't embed, how would I go about putting the embedded comments in my model?
Thanks
Mongo doesn't provide associations on its own. Sails uses Waterline for ORM.
You need to create your Comment object yourself and just add its id to the appropriate attribute in the Post instance (which should be a collection), using post.comments.add(comment.id).
Removal is similar, just call post.comments.remove(comment.id).
Note that at some point you might not like having thousands of comment ids fetched every time you retrieve a Post (or worse, thousands of Comment documents if you populate and fetch). This, of course, is only a concern if you're expecting thousands of comments per post in the first place.
Oh, and don't forget to save your document to finalize the changes.
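To picture what "reference, don't embed" means here, the following Python sketch keeps comments in their own collection and stores only their ids on the post. It is a simplified illustration and not the Waterline API: the dictionaries and the add_comment, remove_comment and populate helpers are all hypothetical.

```python
# Hypothetical in-memory collections; not what the Sails adapter actually stores.
comments = {
    "c1": {"text": "First!"},
    "c2": {"text": "Great post"},
}
post = {"title": "Hello", "comments": []}  # the post holds comment ids only


def add_comment(post, comment_id):
    """Associate an existing comment with the post by id."""
    if comment_id not in post["comments"]:
        post["comments"].append(comment_id)


def remove_comment(post, comment_id):
    """Drop the association; the comment document itself is untouched."""
    if comment_id in post["comments"]:
        post["comments"].remove(comment_id)


def populate(post, comments):
    """Return a copy of the post with ids replaced by full comment documents."""
    return {**post, "comments": [comments[cid] for cid in post["comments"]]}


add_comment(post, "c1")
add_comment(post, "c2")
remove_comment(post, "c1")
print(post["comments"])                      # cheap: ids only
print(populate(post, comments)["comments"])  # costlier: full documents
```

The two print calls mirror the trade-off mentioned above: reading the post alone returns a list of ids, while populating pulls every referenced comment document.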

Is it possible to group multiple collections in mongodb

I'm working with a database that has multiple collections, and some of the data overlaps between collections. In particular, I have a collection called app-launches, which contains a field called userId, and one called users, where the _id of a particular object is actually the same as the userId in app-launches. Is it possible to group the two collections together so I can analyze the data? Or maybe match the userId in app-launches with the _id in users?
There is no definite answer to your question, Jeffrey, and none of the experts here can tell you which technique to choose over another from this information alone.
After going through various web pages and the Mongo documentation, and coming to understand the design patterns used in Mongo over time, I can say that how I would design it depends on a few things, which I will try to explain briefly here.
1. If you have a One-To-One relation, always prefer Embedding over Linking, e.g. a User and its address (assuming the user has only one address). This way you can rely on atomicity (without worrying about transactions) and fetch the records easily, without the back-and-forth needed to bring in other information as with Linking (as in DBRef).
2. If you have a One-To-Many relation, consider whether you can do the job with Embedding (prefer this, for the benefits explained in point 1). However, Embedding helps only if you always want the information together, e.g. Post/Comments where the requirement is to get a post and all of its comments by postId. Now think of a situation where you need all the comments (and their related posts) that contain some specific tags. In this case you should prefer Linking, because with Embedding you would end up fetching the whole collection of comments for each post and then have to filter out the desired comments yourself.
3. For Many-To-Many relations, I would prefer two separate entities plus another collection for linking them, e.g. Product-Category.
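The One-To-Many trade-off can be sketched with plain Python data. The documents, field names and helper functions below are made up for illustration and do not use a MongoDB driver; they only show where the filtering work ends up in each layout.

```python
# Embedded layout: each post document carries its own comments.
embedded_posts = [
    {"_id": "p1", "title": "Intro", "comments": [
        {"text": "Nice", "tags": ["praise"]},
        {"text": "Typo in paragraph 2", "tags": ["bug"]},
    ]},
    {"_id": "p2", "title": "Follow-up", "comments": [
        {"text": "Broken link", "tags": ["bug"]},
    ]},
]

# Linked layout: comments live in their own collection and point back at a post.
linked_comments = [
    {"_id": "c1", "post_id": "p1", "text": "Nice", "tags": ["praise"]},
    {"_id": "c2", "post_id": "p1", "text": "Typo in paragraph 2", "tags": ["bug"]},
    {"_id": "c3", "post_id": "p2", "text": "Broken link", "tags": ["bug"]},
]


def tagged_embedded(posts, tag):
    """Walk every post, load all of its comments, then filter them by tag."""
    return [(p["_id"], c["text"])
            for p in posts for c in p["comments"] if tag in c["tags"]]


def tagged_linked(comments, tag):
    """Query the comment collection directly; each hit already names its post."""
    return [(c["post_id"], c["text"]) for c in comments if tag in c["tags"]]


print(tagged_embedded(embedded_posts, "bug"))
print(tagged_linked(linked_comments, "bug"))
```

Both functions return the same pairs, but the embedded version has to pull whole posts to reach their comments, whereas the linked version touches only the comment collection.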

MongoDB Data Model Design for Meteor.js App

I'm not much of a backend guy and even worse when it comes to MongoDB, however, I've been taken with Meteor.js so I'm giving it a try as I play around.
I'm creating a project management/ticketing app and would like your opinion on the data model design. In my app you create a ticket, assign other team members to the ticket and allow people to access it and manipulate the data like a todo list, attachments, comments, etc. Pretty basic.
From my research, it appears that a normalized data model with references makes sense. In that case, is a good model:
A collection for all my users.
A collection for tickets (each ticket/project its own document) with a field for team members in which I insert them into an array using a reference. Then I'd have fields for comments, todos, etc.
Or would this be best:
A collection for all my users.
A unique collection for each ticket with a field for team members kept in an array.
Sorry if this seems rather basic. I'm taking the MongoDB University classes for Node, so I hope I don't have to rely on too many basic questions for too long.
Thanks everyone!
You should store each ticket/project in its own document in a single collection (the first option).
If you give each ticket its own collection you have no effective way to index and query tickets.
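A rough Python sketch of why the single collection is easier to query: with every ticket in one place, one index over the team-member field answers "which tickets is this user on?". The ticket documents and the build_member_index helper are hypothetical stand-ins, not Meteor or MongoDB code.

```python
from collections import defaultdict

# One collection holding every ticket; members are references to user ids.
tickets = [
    {"_id": "t1", "title": "Fix login", "members": ["u1", "u2"]},
    {"_id": "t2", "title": "Write docs", "members": ["u2"]},
    {"_id": "t3", "title": "Deploy", "members": ["u3"]},
]


def build_member_index(tickets):
    """Map each user id to the ids of the tickets they are assigned to."""
    index = defaultdict(list)
    for ticket in tickets:
        for member in ticket["members"]:
            index[member].append(ticket["_id"])
    return index


member_index = build_member_index(tickets)
print(member_index["u2"])
```

With one collection per ticket there would be no single place to build such an index; the application would have to open every collection in turn.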

Structuring cassandra database

I don't understand one thing about Cassandra. Say, I have similar website to Facebook, where people can share, like, comment, upload images and so on.
Now, let's say, I want to get all of the things my friends did:
Username1 liked your comment
username 2 updated his profile picture
And so on.
So after a lot of reading, I guess what I would need to do is create a new Column Family for each single thing, for example: user_likes, user_comments, user_shares. Basically, anything you can think of. And even after I do that, I would still need to create secondary indexes for most of the columns just so I could search for data? And even so, how would I know which users are my friends? Would I need to first get all of my friends' ids and then search through all of those Column Families for each user id?
EDIT
OK, so I did some more reading and now I understand things a little better, but I still can't really figure out how to structure my tables, so I will set a bounty. I want a clear example of what my tables should look like if I want to store and retrieve data in this kind of order:
All
Likes
Comments
Favourites
Downloads
Shares
Messages
So let's say I want to retrieve the last ten uploaded files of all my friends or the people I follow. This is how it would look:
John uploaded song AC/DC - Back in Black 10 mins ago
And every thing like comments and shares would be similar to that...
Now probably the biggest challenge would be to retrieve the last 10 things from all categories together, so the list would be a mix of all the things...
I don't need an answer with fully detailed tables; I just need a really clear example of how I would structure and retrieve data like I would in MySQL with joins.
With SQL, you structure your tables to normalize your data, and use indexes and joins to query. With Cassandra, you can't do that, so you structure your tables to serve your queries, which requires denormalization.
You want to query items which your friends uploaded. One way to do this is to have a single table per user, and write to this table whenever a friend of that user uploads something.
friendUploads { #column family
userid { #column
timestamp-upload-id : null #key : no value
}
}
as an example,
friendUploads {
userA {
12313-upload5 : null
12512-upload6 : null
13512-upload8 : null
}
}
friendUploads {
userB {
11313-upload3 : null
12512-upload6 : null
}
}
Note that upload 6 is duplicated to two different columns, as whoever did upload6 is a friend of both User A and user B.
Now, to build the friends' uploads display for a user, do a getSlice with a limit of 10 on that user's userid column. This returns the first 10 items, sorted by key.
To put newest items first, use a reverse comparator that sorts larger timestamps before smaller timestamps.
The drawback to this scheme is that when User A uploads a song, you have to do N writes to update the friendUploads columns, where N is the number of people who are friends of User A.
For the value associated with each timestamp-upload-id key, you can store enough information to display the results (probably in a json blob), or you can store nothing, and fetch the upload information using the uploadid.
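The write-heavy scheme can be sketched in Python as below. This is an in-memory stand-in and not Cassandra client code: the friends map, the record_upload and latest_uploads helpers and the sample timestamps are assumptions made for illustration.

```python
import bisect
from collections import defaultdict

# Hypothetical friendship data: uploader -> users who should see the upload.
friends = {
    "userC": ["userA"],
    "userD": ["userA", "userB"],
}

# friendUploads: one entry per user, holding (timestamp, upload_id) keys in sorted order.
friend_uploads = defaultdict(list)


def record_upload(uploader, timestamp, upload_id):
    """Fan out on write: N inserts, one per friend of the uploader."""
    for friend in friends.get(uploader, []):
        bisect.insort(friend_uploads[friend], (timestamp, upload_id))


def latest_uploads(user, limit=10):
    """Single read: newest entries first, like a reversed slice with a limit."""
    entries = friend_uploads.get(user, [])
    return entries[::-1][:limit]


record_upload("userC", 12313, "upload5")
record_upload("userD", 12512, "upload6")  # written twice: userA and userB
record_upload("userC", 13512, "upload8")

print(latest_uploads("userA", limit=2))
print(latest_uploads("userB"))
```

The read path touches exactly one user's entry, which is the point of paying for the duplicated writes.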
To avoid duplicating writes, you can use a structure like,
userUploads { #column family
userid { #column
timestamp-upload-id : null #key : no value
}
}
This stores the uploads for a particular user. Now when you want to display the uploads of User B's friends, you have to do N queries, one for each friend of User B, and merge the results in your application. This is slower to query, but faster to write.
Most likely, if users can have thousands of friends, you would use the first scheme, and do more writes rather than more queries, as you can do the writes in the background after the user uploads, but the queries have to happen while the user is waiting.
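For comparison, the read-heavy scheme might look like this in Python, again as an in-memory sketch and not Cassandra client code. The user_uploads data, the friends map and the friends_latest helper are hypothetical; each per-user list is assumed to be stored newest first already.

```python
import heapq
from itertools import islice

# userUploads: each user's own uploads, stored once, newest first.
user_uploads = {
    "userC": [(13512, "upload8"), (12313, "upload5")],
    "userD": [(12512, "upload6")],
}

# Hypothetical friendship data: viewer -> friends whose uploads are shown.
friends = {"userB": ["userC", "userD"]}


def friends_latest(user, limit=10):
    """Fan out on read: one query per friend, merged in the application."""
    per_friend = [user_uploads.get(f, []) for f in friends.get(user, [])]
    # heapq.merge with reverse=True expects inputs sorted largest to smallest.
    merged = heapq.merge(*per_friend, reverse=True)
    return list(islice(merged, limit))


print(friends_latest("userB", limit=2))
```

Each upload is written once, but every page view costs one lookup per friend plus the merge, which is the latency the answer above suggests avoiding for users with thousands of friends.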
As an example of denormalization, look at how many writes Twitter's Rainbird does when a single click occurs. Each write is used to support a single query.
In some regards, you "can" treat noSQL as a relational store. In others, you can denormalize to make things faster. For instance, PlayOrm's #OneToMany stores the many like so
user1 -> friend.user23, friend.user25, friend.user56, friend.user87
This is the wide row approach, so when you find your user, you have all the foreign keys to his friends. Rows can have different lengths. You may also store a reverse reference, so the user might have references to the people who marked him as a friend but whom he did not mark back (let's call that a buddy), so you might have
user1 -> friend.user23, friend.user25, buddy.user29, buddy.user37
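As a loose illustration of the wide row idea, the Python sketch below models one row as a mapping from prefixed column names to empty values and reads one relationship by prefix. It is not PlayOrm code; the row contents and the columns_with_prefix helper are hypothetical.

```python
# One wide row for user1: the column name carries the foreign key, the value is unused.
user1_row = {
    "friend.user23": None,
    "friend.user25": None,
    "buddy.user29": None,
    "buddy.user37": None,
}


def columns_with_prefix(row, prefix):
    """Return the foreign keys stored under one relationship prefix."""
    return sorted(name[len(prefix):] for name in row if name.startswith(prefix))


print(columns_with_prefix(user1_row, "friend."))
print(columns_with_prefix(user1_row, "buddy."))
```

Once the row for user1 is found, both relationships are available without a further search.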
Notice that if designed right, you may NOT need to "search" for the data. That said, with PlayOrm, you can still do Scalable SQL and do joins (you just have to figure out how to partition your tables so they can scale to trillions of rows).
A row can have millions of columns in it, or it could have just 10. We are actually in the process of updating a lot of the documentation in PlayOrm and the noSQL patterns this month, so if you keep an eye on that, you can also learn more about general noSQL there as well.
Dean
Think of each DB query as a request to a service running on another machine. Your goal is to minimize the number of these requests (because each request requires a network round trip).
Here comes the main difference from the RDBMS paradigm: in SQL you would typically use joins and secondary indexes. In Cassandra joins aren't possible, since related data would reside on different servers. Things like materialized views are used in Cassandra for the same purpose (to fetch all related data with a single query).
I'd recommend reading this article:
http://maxgrinev.com/2010/07/12/do-you-really-need-sql-to-do-it-all-in-cassandra/
And looking into the twissandra sample project: https://github.com/twissandra/twissandra
It is a nice collection of optimization techniques for the kind of project you described.