I have an existing PostgreSQL database, which contains roughly 500,000 entries each of which is essentially a category in a huge tree of categories (each category has different schemas of elements).
I also have a MySQL database, which contains roughly 100,000 documents, each of which can be filed under one or more categories.
I need to be able to search for documents that match attribute filters set on the categories the document is linked to.
As I understand it, I'd have to store all the data relating to all the categories a document links to, in each document, in mongo, and that just seems insane. How can I make this work?
As an example, imagine a category which represents a red car made in 1964, and a document which was written in 1990 about that red car. I need to be able to search for 1964 and find the document about the car, as well as the car itself.
n:m relations in MongoDB can be expressed with arrays of database references (DBRef) or arrays of object IDs.
So each document would have a field "categories" which has an array with the IDs or database references of the categories it belongs to.
See this article for further information:
http://docs.mongodb.org/manual/applications/database-references/
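As a rough illustration of the array-of-IDs idea, here is a minimal in-memory Python sketch. It does not talk to MongoDB; the plain dicts and lists stand in for two collections, and every name in it (categories, documents, find_categories, find_documents, the sample IDs and attributes) is made up for the example. The search runs in two steps: first find the categories that match the attribute filter, then find the documents whose "categories" array contains any of those IDs.

```python
# In-memory stand-ins for two collections; all names and values are hypothetical.
categories = {
    "cat_car": {"_id": "cat_car", "name": "red car", "color": "red", "year": 1964},
    "cat_boat": {"_id": "cat_boat", "name": "blue boat", "color": "blue", "year": 1971},
}

documents = [
    {"_id": "doc_1", "title": "Article about the red car", "written": 1990,
     "categories": ["cat_car"]},
    {"_id": "doc_2", "title": "Boats and cars compared", "written": 1995,
     "categories": ["cat_car", "cat_boat"]},
    {"_id": "doc_3", "title": "Boat maintenance", "written": 2001,
     "categories": ["cat_boat"]},
]


def find_categories(filters):
    """Step 1: ids of categories whose attributes match every filter."""
    return {
        cat_id
        for cat_id, cat in categories.items()
        if all(cat.get(key) == value for key, value in filters.items())
    }


def find_documents(filters):
    """Step 2: documents whose "categories" array contains any matching id."""
    matching = find_categories(filters)
    return [doc for doc in documents if matching.intersection(doc["categories"])]


hits = find_documents({"year": 1964})
print([doc["_id"] for doc in hits])
```

Searching for 1964 returns both documents linked to the car category, without copying the category attributes into each document.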
An alternative that avoids performing multiple database queries just to show the names of the categories would be to put the category names in that array instead of the IDs. Then you should also add an index (with the ensureIndex function, called createIndex in current MongoDB versions) to the name field of your category collection for faster lookup (you might want to create a unique index on this field anyway to avoid duplicate category names).
About the data an object has because it belongs to a category, like cars having a manufacturer and a document having a list of other objects mentioned in the document: this data should be put directly into the document of the object. The advantage of a document-oriented database is that not every entity must have the same fields.
Related
I am making an app where users can follow each other. To decide how to model it in Firestore I would like to know how collection size affects query performance.
I first thought of making it like this:
relationships(coll.)
----{userId_1}(document)
--------following(coll)
------------{someId1}(document)
------------{someId2}(document)
.....
--------followers(coll)
------------{someId5}(document)
------------{someId7}(document)
.....
----{userId_2}(document)
--------following(coll)
------------{someId11}(document)
------------{someId24}(document)
.....
--------followers(coll)
------------{someId56}(document)
------------{someId72}(document)
.....
So I would have a main collection relationships, where each document represents one user and has two subcollections, following and followers. In those subcollections I would store documents with data like id, name, email, ...
Then when user1 wants to see his followers, I would get all documents under relationships/userId_1/followers, and if he wants to see whom he follows I would get the documents under relationships/userId_1/following.
I also thought about doing it like this:
relationships(coll)
----{user5id_user4id}(document)
--------user1:"user5id" (field)
--------user2:"user4id" (field)
.........(other fields)
----{user4id_user5id}(document)
--------user1:"user4id" (field)
--------user2:"user5id" (field)
.........(other fields)
I would have one main collection relationships where each document represents one following relationship. The document name would be firstUserId_secondUserId (meaning firstUserId follows secondUserId), and I would also have two fields, user1 and user2, that store the ids of the two users, where user1 follows user2.
So if I am {myUserId} and I would like to get all the people I follow, I would query the relationships collection where user1 = myUserId.
And if I would like to get all the people who follow me, I would query the relationships collection where user2 = myUserId,
since each document represents the relation user1 follows user2.
So my question is which way would be more efficient for querying the data.
In the first case each user would have a collection of his followers/following and I would just get the documents; in the second case the relationships collection would have many documents, each representing a user1->follows->user2 relation.
I know that I would be billed by the number of documents that the query returns, but how fast would it be if it needed to search through a large collection?
Collection size has no bearing on the performance or cost of a query. Both are determined entirely by the size of the result set (number of documents). So, a query for 10 documents out of 100 performs and costs the same as a query for 10 documents out of 100,000. The result size of 10 is the only thing that matters here.
See also: Queries scale with the size of your result set, not the size of your data set
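To make the intuition concrete, here is a toy Python model of the second layout (one document per user1-follows-user2 relation). It is not Firestore code and not a description of how Firestore is implemented; it only assumes that equality queries are answered from an index on the queried field, so the work done is proportional to the number of matches. All names (relationships, index_user1, index_user2, following, followers) and the sample users are hypothetical.

```python
from collections import defaultdict

# Hypothetical follow relationships: user1 follows user2.
relationships = [
    {"user1": "alice", "user2": "bob"},
    {"user1": "alice", "user2": "carol"},
    {"user1": "bob", "user2": "alice"},
    {"user1": "carol", "user2": "alice"},
]

# One index per queried field: field value -> positions of matching documents.
index_user1 = defaultdict(list)
index_user2 = defaultdict(list)
for position, rel in enumerate(relationships):
    index_user1[rel["user1"]].append(position)
    index_user2[rel["user2"]].append(position)


def following(user_id):
    """Users that user_id follows (relationships where user1 == user_id)."""
    return [relationships[p]["user2"] for p in index_user1.get(user_id, [])]


def followers(user_id):
    """Users that follow user_id (relationships where user2 == user_id)."""
    return [relationships[p]["user1"] for p in index_user2.get(user_id, [])]


print(following("alice"))
print(followers("alice"))
```

Each lookup touches only the index entry for the requested user and the matching documents, no matter how many other relationships the list holds.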
I have several drop-down lists (HTML select elements) that need to be populated in the admin page. I wanted to know what the recommended way is to store them in MongoDB.
Should I store each data (e.g. company list, country list) in a single document called for example Globals, and retrieve those by querying that single document?
If you need the lists for dropdowns only, just store the lists in a single document as arrays. The collection will contain only this document, so you can use findOne({}). But if you need to search within the lists (autocomplete), the ideal design will be quite different.
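A minimal sketch of what such a single document could look like, written as a plain Python dict rather than an actual MongoDB call. The field names (companies, countries), the sample values, and the helper dropdown_options are assumptions made for the example.

```python
# A hypothetical "globals" document holding every dropdown list as an array.
globals_doc = {
    "_id": "globals",
    "companies": ["Acme", "Globex", "Initech"],
    "countries": ["Canada", "Germany", "Japan"],
}


def dropdown_options(doc, field):
    """Return (value, label) pairs for one select element; unknown fields give []."""
    return [(item, item) for item in doc.get(field, [])]


print(dropdown_options(globals_doc, "countries"))
```

One read of the document is enough to populate every dropdown on the page.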
I am trying to fetch the documents from a collection based on the existence of a reference to these documents in another collection.
Let's say I have two collections Users and Courses and the models look like this:
User: {_id, name}
Course: {_id, name, user_id}
Note: this is just a hypothetical example and not the actual use case. So let's assume that duplicates are fine in the name field of Course. Let's think of Course as CourseRegistrations.
Here, I am maintaining a reference to User in the Course, with user_id holding the _id of the User. And note that it's stored as a string.
Now I want to retrieve all users who are registered to a particular set of courses.
I know that it can be done with two queries. That is, first run a query and get the user_id field from the Course collection for the set of courses. Then query the User collection by using $in and the user ids retrieved in the previous query. But this may not be good if the number of documents is in the tens of thousands or more.
Is there a better way to do this in just one query?
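What you are describing is a typical SQL join, which a plain MongoDB query cannot do. As you already suggested, you can do it with two separate queries. (MongoDB 3.2 and later add a $lookup aggregation stage that performs a left outer join in one pipeline.)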
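The two-query approach can be sketched in plain Python, with lists standing in for the two collections. This is an in-memory illustration only; the sample data and the function name users_registered_for are invented, and "registered to a set of courses" is interpreted here as registered to at least one of them.

```python
# In-memory stand-ins for the User and Course collections (hypothetical data).
users = [
    {"_id": "u1", "name": "Ann"},
    {"_id": "u2", "name": "Ben"},
    {"_id": "u3", "name": "Cy"},
]
courses = [
    {"_id": "c1", "name": "math", "user_id": "u1"},
    {"_id": "c2", "name": "math", "user_id": "u2"},
    {"_id": "c3", "name": "art", "user_id": "u2"},
    {"_id": "c4", "name": "music", "user_id": "u3"},
]


def users_registered_for(course_names):
    """Return users registered for at least one of the given courses."""
    wanted = set(course_names)
    # Query 1: collect the distinct user_id values of the matching Course docs.
    user_ids = {course["user_id"] for course in courses if course["name"] in wanted}
    # Query 2: the equivalent of an $in filter on the User collection's _id.
    return [user for user in users if user["_id"] in user_ids]


print([user["name"] for user in users_registered_for(["math", "art"])])
```

Collecting the ids into a set first removes duplicates, so a user registered for several of the courses appears only once in the second query.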
There is one more way to handle it. It's not exactly a solution, but it is a valid workaround in NoSQL databases: store the most frequently accessed fields inside the same collection.
You can store some of the user collection's fields inside the course collection as an embedded field.
Course : {
  _id : 'xx',
  name: 'yy',
  user: {
    fname : 'r',
    lname : 'v',
    pic: 's'
  }
}
This is a good approach if the subset of fields you intend to retrieve from the user collection is small. You might be wondering about the redundant user data stored in the course collection, but that's exactly what makes MongoDB powerful. It's a one-time insert, but your queries will be a lot faster.
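Here is a small Python sketch of that embedding step, using plain dicts instead of a real database. The names user, EMBEDDED_FIELDS, embed_user and build_course, as well as the extra email field, are hypothetical. The last two lines show the cost of the redundancy: when the user record changes, the embedded copy has to be refreshed too.

```python
# Hypothetical source-of-truth user record; only some fields get copied.
user = {"_id": "u1", "fname": "r", "lname": "v", "pic": "s", "email": "r@example.com"}

# The subset of user fields that course listings actually display.
EMBEDDED_FIELDS = ("fname", "lname", "pic")


def embed_user(user_doc):
    """Copy only the frequently read fields into an embeddable sub-document."""
    return {field: user_doc[field] for field in EMBEDDED_FIELDS}


def build_course(course_id, name, user_doc):
    """Create a course document that carries its own copy of the user fields."""
    return {"_id": course_id, "name": name, "user": embed_user(user_doc)}


course = build_course("xx", "yy", user)
print(course["user"])  # readable without touching the user collection

# The price of redundancy: when the user changes, every copy must be refreshed.
user["pic"] = "t"
course["user"] = embed_user(user)
```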
I have a collection of categories, each category document containing a link to its parent (except the root categories). Pretty simple so far.
I want to list the categories, and add a subcategory_count field to every document with the count of direct descendants.
How should I go about doing this? Could Map/Reduce be of use?
There are no "calculated columns" in MongoDB, so you can't select data and count subdocuments at the same time.
This is also the reason why most people store array length along with the array.
{friends_list: [1, 3, 234, 555],
friends_count: 4}
This helps for easier retrieval, filtering, sorting, etc. But it requires a little bit more of manual work.
So, you are basically limited to these options:
Store everything in one document.
Store subcategory count in the category.
Count subcategories on the client-side.
find() to get all of them, count the number of subcategories on each level and then update().
But it seems like your domain object should be doing this for you, so you end up with one category object that can contain categories (which could also contain categories...), and hence one MongoDB document (you're listing all of them anyway, so it makes sense to retrieve the whole thing in one query).
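The client-side counting option can be sketched as follows, assuming the category documents have already been fetched into a list and that each one stores its parent's _id in a field called parent (None for root categories). The field name, the sample data and the function with_subcategory_count are assumptions for the example.

```python
from collections import Counter

# Hypothetical category documents as they might come back from a find().
categories = [
    {"_id": 1, "name": "root", "parent": None},
    {"_id": 2, "name": "cars", "parent": 1},
    {"_id": 3, "name": "boats", "parent": 1},
    {"_id": 4, "name": "red cars", "parent": 2},
]


def with_subcategory_count(cats):
    """Return copies of the categories with a direct-descendant count added."""
    # Each non-root category contributes one to its parent's count.
    counts = Counter(cat["parent"] for cat in cats if cat.get("parent") is not None)
    return [{**cat, "subcategory_count": counts.get(cat["_id"], 0)} for cat in cats]


for cat in with_subcategory_count(categories):
    print(cat["name"], cat["subcategory_count"])
```

A single pass over the list builds all counts, so the cost grows linearly with the number of categories fetched.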
I'm building a database with several collections. I have unique strings that I plan on using for all the documents in the main collection. Documents in other collections will reference documents in the main collection, which means I'll have to save said id's in the other collections. However, if _id's only need to be unique across a collection and not across an entire database, then I would just make the _id's in the other collections also use the aforementioned unique strings.
Also, I assume that in order to set my own _id's, all I have to do is have an "_id":"unique_string" property as part of the document that I insert, correct? I wouldn't need to convert the "unique_string" into another format, right?
Also, hypothetically speaking, would I be able to have a variable save the string "_id" and use that instead? Just to be clear, something as follows: var id = "_id" and then later on in the code (during an insert or a query for example) have id:"unique_string".
Best, and thanks,Sami
_ids only have to be unique within a collection, not across the whole database. You can quickly verify this by inserting two documents with the same _id into two different collections.
Your other assumptions are correct, just try them and see whether they work (they will). The proof of the pudding is in the eating.
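Note: use _id directly; var id = "_id" just complicates the code. Also be aware that in a JavaScript object literal, id:"unique_string" creates a field literally named id; to use the variable's value as the key you would have to write [id]:"unique_string".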
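The per-collection uniqueness rule can be modelled with a few lines of Python. The Collection class below is a made-up stand-in, not a MongoDB driver API; it only mimics the behaviour that a duplicate _id is rejected within one collection while the same _id is accepted in another. The last lines show that a Python dict literal, unlike a JavaScript object literal, evaluates the key expression, so a variable holding "_id" works as the key.

```python
class Collection:
    """Toy stand-in for a collection that enforces unique _id values."""

    def __init__(self):
        self._docs = {}

    def insert_one(self, doc):
        if doc["_id"] in self._docs:
            raise KeyError(f"duplicate _id: {doc['_id']!r}")
        self._docs[doc["_id"]] = doc

    def find_one(self, doc_id):
        return self._docs.get(doc_id)


main = Collection()
other = Collection()

main.insert_one({"_id": "unique_string", "kind": "main"})
other.insert_one({"_id": "unique_string", "kind": "other"})  # different collection: fine

try:
    main.insert_one({"_id": "unique_string", "kind": "again"})
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True  # same collection: rejected

# In Python the key is an expression, so the variable's value becomes the key.
id_key = "_id"
doc = {id_key: "another_string"}
main.insert_one(doc)
```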