MongoDB chained query

I am new to MongoDB and I wonder if a chained query like the following is possible (somewhat like a join):
db.places.insert({
"_id": original_id,
"place_name": "Broadway Center",
"url": "bc.example.net"})
db.people.insert({
"name": "Erin",
"places_id": original_id,
"url": "bc.example.net/Erin"})
So given a place name string, I want to select the people associated with that place.
But the people collection only references the place id, not the place name.

You cannot use joins in MongoDB.
The idiomatic solution is to retrieve all place ids for that place_name from your places collection and then use those ids to query your people collection.
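A minimal sketch of that two-step query in the shell, assuming the documents above:
// step 1: collect the _ids of every place with that name
var placeIds = db.places.find(
    { place_name: "Broadway Center" },
    { _id: 1 }
).toArray().map(function(p) { return p._id; });
// step 2: find the people referencing any of those ids
db.people.find({ places_id: { $in: placeIds } });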
Another option is embedding, for example, the place data inside the people collection (this makes more sense to me than people inside places but, of course, it depends on your domain). You then have to take into account that when a single place changes, you must update every people document sharing that place. With separate collections this doesn't happen, so the choice depends on whether your data is static and whether you want to optimize reads or writes.
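For illustration, the embedded variant could look like this (a sketch only; whether it fits depends on your domain, as noted above):
// duplicate the place data inside each person document
db.people.insert({
"name": "Erin",
"place": { "place_name": "Broadway Center", "url": "bc.example.net" },
"url": "bc.example.net/Erin"})
// a single query now answers "who is associated with this place?"
db.people.find({ "place.place_name": "Broadway Center" });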

MongoDB Querying Large Datasets

Let's say I have a simple document structure like:
{
"item": {
"name": "Skittles",
"category": "Candies & Snacks"
}
}
On my search page, whenever a user searches for a product name, I want to offer filter options by category.
Since there can be many categories (around 50 types), I cannot display all of the checkboxes in the sidebar beside the search results. I want to show only those that have products associated with them in the results. So if no product in the search results has a given category, that category option should not be shown.
Now, the item search by name itself is paginated. I only show 30 items in a page. And we have tens of thousands of items in our database.
I could search and retrieve all items from all pages, then parse out the categories. But retrieving tens of thousands of items in one go would be really slow.
Is there a way to optimize this query?
You can use different approaches based on your workflow and see what works best in your situation. Some good candidates are:
Use distinct, with the search filter as its query, prior to running the main query on the large dataset
Use the aggregation pipeline, as @Lucia suggested:
[{$group: { _id: "$item.category" }}]
Use another datastore (either Redis or MongoDB itself) to store precomputed intelligence on categories
Finally, based on the approach you choose and the volume of filter requests, you may want to consider indexing some fields
P.S. You're right about how aggregation works: unless you have a $match filter as the first stage, it will fetch all the documents and then apply the next stage.
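Putting those two points together, a sketch of such a pipeline might look like this (the name filter is hypothetical; substitute whatever your search page actually applies):
db.items.aggregate([
    // first stage: the same filter the search page uses, so the
    // pipeline only touches the matching documents
    { $match: { "item.name": /skittles/i } },
    // second stage: collapse the matches to their distinct categories
    { $group: { _id: "$item.category" } }
]);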

How to tag documents in MongoDB?

I need to tag documents in a collection, let's call it 'Contacts'.
The first idea I had was to create an attribute called "tags" for each document.
Well, in this case we have something like:
{
_id:'1',
contact_name:'Asya Kamsky',
tags:['mongodb', 'maths', 'travels']
}
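A single query then finds contacts by tag, and a multikey index on the array keeps it fast (a quick sketch):
db.contacts.createIndex({ tags: 1 }); // multikey index over the tags array
db.contacts.find({ tags: "mongodb" });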
Now, let's suppose that we have users that want to tag any document in 'Contacts'.
If we keep the decision to save a tags attribute in each document then, since tags are personal, we need to store the userId with each set of tags.
So our document would look something like this:
{
_id:'1',
contact_name:'Asya Kamsky',
tags:[
{userId:'alex',tags:['mongodb', 'maths', 'travels']},
{userId:'eric',tags:['databases', 'friends', 'japan']},
]
}
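With this shape, searching by one user's tag needs $elemMatch so that both conditions apply to the same array element (a sketch):
db.contacts.find({
    tags: { $elemMatch: { userId: "alex", tags: "travels" } }
});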
Now, let's complicate it a bit. Let's imagine that we have A LOT of users and each one wants to tag documents with their personal tags.
How do we deal with that?
OK, we could create thousands of tag entries for each document:
{
_id:'1',
contact_name:'Asya Kamsky',
tags:[
{userId:'alex',tags:['mongodb', 'maths', 'travels']},
{userId:'eric',tags:['databases', 'friends', 'japan']},
{.....................................................}
{.....................................................}
{......................................................}
]
}
But what if we have millions of users? In that case we hit the 16MB size limit for each document, as far as I know...
At this point, worrying about the future growth of my application, I decided to create a separate collection called 'tags' that would contain documents similar to:
{
"contact_name" : "Asya Kamsky",
"useriId" : "alex",
"tags" : ['mongodb', 'maths', 'travels'],
"timestamp" : "2017-08-08 14:33:28"
},
{
"contact_name" : "Asya Kamsky",
"useriId" : "eric",
"tags" : ['databases', 'friends', 'japan'],
"timestamp" : "2017-08-08 14:33:28"
}
That is, we have separate documents, each representing one user's tags for a contact.
Cool and clean, right?
Well, in this case, we face 2 problems:
Minor problem: we return to the SQL logic that I don't like anymore, but accept in some cases.
Big (for me) problem: how to search a contact by PERSONAL tags? In this case we have a nice 'JOIN' problem that MongoDB resolves well using $lookup.
"Resolves well" for 10000, 20000, or even 500000 documents. But as I want to ensure a good performance in the future, I think about 10000000 contacts. So, as I researched recently, the $lookup works well for a "small part" of universe and, even with indexes, this search would take a lot of time to be executed.
How to resolve this challenge?
Thanks all
If your usage is such that the number of users times the number/size of tags per contact (plus whatever other data is in a contact document) is likely to bring you near the 16MB document size limit, then storing the tags in a separate collection seems valid. But before you go down that route, are you sure this is likely? Have you tried creating contact documents to see how many tags, and how many users per contact, would get you near the 16MB limit? If the answer implies a number of users and/or tags you are unlikely ever to reach, then maybe your concerns are strictly theoretical and you could consider sticking with the simplest solution, which is to embed the user-specific tags inside contacts.
The rest of this answer assumes that the size estimates and your knowledge about the likely number of tags and users per contact are such that the size constraints are valid. On this basis, you stated this specific concern about join performance ...
But as I want to ensure good performance in the future, I am thinking about 10,000,000 contacts. From what I have researched recently, $lookup works well for a "small part" of the universe and, even with indexes, this search would take a lot of time to execute.
Have you tried measuring this performance? Generate seed documents for contacts and tags, persist variations of these, and then run queries using $lookup, measuring the performance. You could do this for a few benchmarks, for example:
1,000 contacts and 10,000 tags
100,000 contacts and 1,000,000 tags
1,000,000 contacts and 10,000,000 tags
10,000,000 contacts and 100,000,000 tags
When running your benchmark tests you can additionally use explain() to understand what's going on inside MongoDB.
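A sketch of the kind of query to benchmark, assuming the tags documents reference contacts by contact_name as in your example (the collection names are assumptions):
db.tags.aggregate([
    // find this user's tag documents containing the tag of interest
    { $match: { userId: "alex", tags: "travels" } },
    // server side join back to the contacts collection
    { $lookup: {
        from: "contacts",
        localField: "contact_name",
        foreignField: "contact_name",
        as: "contact"
    } }
]);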
You might find that the performance is acceptable; only you can know this, since you understand what expectations your system's users have with respect to performance.
One last point: if the use case here is that a given user wants to find all of their contacts and tags, then this could be handled with a 'client side join', i.e. two queries: (1) get the tags for "userId" : "..." and (2) find the contacts referenced by those tags. Depending on your use cases, this could be more performant than a server side join (aka $lookup).
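A sketch of that client side join, under the same naming assumptions as above:
// (1) get this user's tag documents matching the tag
var tagDocs = db.tags.find({ userId: "alex", tags: "travels" }).toArray();
var names = tagDocs.map(function(t) { return t.contact_name; });
// (2) fetch the contacts referenced by those tag documents
db.contacts.find({ contact_name: { $in: names } });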

What is a good way to handle "complex" MongoDB documents in Meteor

I want to store "carpool_debts" which is basically going to hold the number of days owed to other users. It looks like this:
carpool_debts{
_id,
owner,
owner_id,
creditors:[{name,
id,
amount},
{name,
id,
amount}
]}
Does that data structure look reasonable for what I want to store? Implementing it also seemed cumbersome to maintain, mainly because there isn't an upsert type of function available in Meteor yet. Instead of creditors being a list of subdocuments, would I be better off storing the creditors as a delimited string? I would like to know if I am on the right path or if I am missing something. Thanks.
You can structure mongo documents just like you would in a relational database, for example by having separate collections for creditors and owners and using carpool_debts as a link table with the amount attached:
carpool_debts{
_id,
owner_id,
creditor_id,
amount}
creditors{
_id,
name}
owners{
_id,
name}
However, this is not using MongoDB to its full potential. Especially if this is a database with masses of data, you may want to optimize it for the most-used queries, otherwise it'll be slow. For example, to optimize for looking up an owner's debt, you can put the data needed right there in the owners collection, using subdocuments for creditors, and subdocuments again for individual debts, similar to what you've already done:
owners{
_id,
name,
creditors: {id,
name,
debts: {
amount,
due_date}
}
}
and similarly, add the debt information on the creditors collection if you often look up the outstanding debt of creditors:
creditors{
_id,
name,
debtors: {
owner_id,
owner_name,
debts: {
amount,
due_date
}
}
}
This way, you only need to look up one record to get all the information you need. Of course, there are catches. First of all, this is not very DRY, but that's intentional: you have to remember to update the other collection(s) when something changes. If you change the name of a creditor, for example, you'll need to update every owner document that has debts with this creditor (make sure you index that field). This of course makes updates much slower (and the database bigger), but if you rarely update and look up much more often, that is not going to be a problem.
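For example, renaming a creditor under this design touches every affected owner document; a hypothetical sketch (field names follow the structure above):
var creditorId = ObjectId();        // stand-in for the real creditor _id
var newName = "New Creditor Name";  // hypothetical new value
// update the creditor itself
db.creditors.update({ _id: creditorId }, { $set: { name: newName } });
// ...and every owner document embedding that creditor; the positional $
// updates the matching element of the creditors array in each document
db.owners.update(
    { "creditors.id": creditorId },
    { $set: { "creditors.$.name": newName } },
    { multi: true }
);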
Also, if for example creditors can have thousands of individual outstanding debts, you may have to separate that into a link table, or rather a link collection, like this, so you don't exceed MongoDB's maximum document size:
creditors{
_id,
name,
}
debtors{
owner_id,
creditor_id,
debts: {
amount,
due_date
}
}
Then you have one document for each creditor-owner connection. This means more documents to look up when looking at a creditor, but still just one for looking up an owner.
This looks fine, but you could also consider separating creditors into its own collection and just storing an array of creditor_id's in the debts collection. That would reduce complexity and make finding and filtering information easier. And it would be more DRY since if there are multiple debts with the same creditor, you only have the creditor stored in a single place.
You could also consider just having each document in the debts collection be a single debt by an owner to a single creditor. Then you'd just have id, owner_id and creditor_id - like a link table in a relational database.
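A hypothetical shape for such a single-debt document:
var ownerId = ObjectId();     // stand-ins for real ids
var creditorId = ObjectId();
db.debts.insert({
    owner_id: ownerId,
    creditor_id: creditorId,
    amount: 3
});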

Mongoid: retrieving documents whose _id exists in another collection

I am trying to fetch the documents from a collection based on the existence of a reference to these documents in another collection.
Let's say I have two collections Users and Courses and the models look like this:
User: {_id, name}
Course: {_id, name, user_id}
Note: this is just a hypothetical example, not an actual use case. So let's assume that duplicates are fine in the name field of Course. Think of Course as CourseRegistrations.
Here, I am maintaining a reference to User in the Course document, with user_id holding the _id of the User. Note that it is stored as a string.
Now I want to retrieve all users who are registered to a particular set of courses.
I know that it can be done with two queries: first run a query to get the user_id values from the Course collection for the set of courses, then query the User collection using $in with the user ids retrieved in the first query. But this may not be good if the number of documents is in the tens of thousands or more.
Is there a better way to do this in just one query?
What you are describing is a typical SQL join, but that's not possible in MongoDB. As you already suggested, you can do it in 2 different queries.
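For reference, the two-query version might look roughly like this in the mongo shell (Mongoid syntax will differ; the course names are hypothetical):
// (1) collect the user ids registered to the courses of interest
var userIds = db.courses.distinct("user_id", { name: { $in: ["Algebra", "Biology"] } });
// (2) fetch those users; since user_id is stored as a string, convert it
// here if User._id is a real ObjectId
db.users.find({ _id: { $in: userIds } });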
There is one more way to handle it. It's not exactly a solution, but a valid workaround used in NoSQL databases: store the most frequently accessed fields inside the same collection.
You can store some of the User fields inside the Course collection as an embedded document.
Course : {
_id : 'xx',
name: 'yy',
user: {
fname : 'r',
lname :'v',
pic: 's'
}
}
This is a good approach if the subset of fields you intend to retrieve from the User collection is small. You might be wondering about the redundant user data stored in the Course collection, but that's exactly what makes MongoDB powerful: it's a one-time cost at insert, and your queries will be a lot faster.
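With the embedding in place, a single query returns the user data directly (a sketch, assuming the embedded field above):
db.courses.find(
    { name: { $in: ["Algebra", "Biology"] } },
    { user: 1, _id: 0 }
);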

Unique vote, disable revote

I'm building a simple web app where users can vote.
What is the fastest way to check whether a user has already voted? I'm interested in both relational databases and document-based databases (MongoDB, ...).
I have a few ideas, but I am sure they can be improved:
Relational databases
Create a separate table for voting:
|userid|articleid|
Before incrementing an article's votes, check if there is a row containing both userid and articleid. That makes two queries. Is it possible to improve this with triggers? For example:
|useridarticleid| unique column
Before the vote, generate useridarticleid on the application side and try to insert it. The trigger will fire if the value is new, and it will increment our vote column in article.
Document-based databases
This is a bit trickier. Say we have a document structured like so:
{
"id": "123",
"content": "something",
"num_votes": 2,
"votes" : [
"userid1",
"userid2"
]
}
First "query" - check if userid is in votes array. Second "query" - Increment num_votes if not.
Again two queries. So I thought we can change this but I don't know really if it will increase performance:
Insert userid in votes array. When user want to check article "count" votes in array. But I think it possible that performance will drop because if traffic is high counting every article is a bit of waste. Imagine Reddit here.
Actually, it's a lot simpler in a document database. Your document structure is perfect for it.
{
"id": "123",
"content": "something",
"num_votes": 2,
"votes" : [
"userid1",
"userid2"
]
}
db.collection.update(
{id:"123", votes:{$ne:"userid"}},
{$push:{"votes":"userid"},$inc:{"num_votes":1}}
);
This will atomically update the record with id=123, adding userid to the list of voters and incrementing num_votes by one, but only if userid is not already in the votes list of this document.
So there is only one query and one update - and they are actually the same operation.
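As a usage note, the WriteResult tells you whether the vote was actually recorded, so you can report "already voted" without a second query (a sketch):
var res = db.collection.update(
    { id: "123", votes: { $ne: "userid" } },
    { $push: { votes: "userid" }, $inc: { num_votes: 1 } }
);
if (res.nModified === 0) {
    // the $ne filter matched nothing: this user had already voted
    print("already voted");
}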
In a relational database, |userid|articleid| would be the best approach, using both fields together as the primary key.
For the document-based approach, you might also consider whether to put the votes in the user document or in the article document.
Anyway, I'd suggest you really focus on creating a design where changing all these decisions later is easy.
The different ways of designing this favor things like "a lot of users at the same article at the same time" or "a lot of users in different articles", etc. Until you can see the real usage, you won't have enough information to decide which approach will work best and fastest, so create something that you can easily adapt to whatever you learn later.
BTW: you might also consider not counting the votes synchronously. I remember an article (which I can't find) mentioning that YouTube's vote numbers weren't actually "accurate": they displayed an estimate of the current votes and calculated the real number in a background worker thread.