Does Elasticsearch or MongoDB suit my needs? - mongodb

I'll have a ReactJS web app with a Node.js server. Besides the usual things like user profiles, I'll need the most effective and scalable solution for the purposes listed below. I understand that I'll need to rewrite some parts again and again over time, but the DB choice is fundamental, so I hope to select the right one.
Full-text search. I'll have this json structure:
items: {
[guid]: { // txt
parent: [guid],
path: '/root/dir/subdir/',
created: timestamp,
updated: timestamp,
access: 'rwa', // rwa/rw-/r--/---
owner: user_name,
title: 'string',
text: 'string',
comments: {...}
},
},
Items will potentially contain millions of records. Each record's text property may contain anything from a few words to, say, 100k characters. Users will be making small updates to these records and growing them all the time.
I'll need to search on the title and text properties, and use the path property to restrict the search to items under a specific path. For example: "find the first 20 records where title or text contains some words and path begins with /root/dir, sorted by the title/path/created/updated property".
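As a rough sketch of how the example query might look on the MongoDB side, the snippet below builds the filter, sort and limit for a find() call. It assumes each item is stored as its own document in a collection (called items here, a hypothetical name), that a text index exists on title and text, and that a regular index exists on path. The helper names escapeRegex and buildSearchQuery are made up for illustration.

```javascript
// Escape regex metacharacters so the path prefix is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build the pieces of a find() call: filter, sort and limit.
// sortField is one of title/path/created/updated, as in the question.
function buildSearchQuery(words, pathPrefix, sortField) {
  return {
    filter: {
      // $text requires a text index covering title and text (assumed).
      $text: { $search: words },
      // An anchored, case-sensitive prefix regex can use an index on path.
      path: { $regex: '^' + escapeRegex(pathPrefix) },
    },
    sort: { [sortField]: 1 },
    limit: 20,
  };
}

const q = buildSearchQuery('some words', '/root/dir', 'updated');
console.log(JSON.stringify(q, null, 2));
// In the MongoDB shell this would be used roughly as:
// db.items.find(q.filter).sort(q.sort).limit(q.limit)
```

Whether this performs well enough at millions of large documents is something only a benchmark on realistic data can tell.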
Viewed-like statuses. The next thing I need is to mark each viewed page as "viewed" for a user in order to gray out URLs pointing to it. A user might go through thousands of pages each time, and there might be a lot of users. I guess Bloom filters might help me with that, but I have no idea yet how I'll implement that.
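For reference, a minimal Bloom filter sketch is shown here to make the idea concrete. It is not tied to any database; the sizes (8192 bits, 4 hashes) and the FNV-1a-style hash are arbitrary choices for illustration. A Bloom filter can answer "definitely not viewed" or "probably viewed", so an occasional page may be grayed out by mistake, but a viewed page is never missed.

```javascript
// Minimal Bloom filter: each key sets hashCount bit positions.
class BloomFilter {
  constructor(bitCount, hashCount) {
    this.bitCount = bitCount;
    this.hashCount = hashCount;
    this.bits = new Uint8Array(Math.ceil(bitCount / 8));
  }

  // FNV-1a-style 32-bit hash; the seed is mixed into the offset basis.
  static hash(key, seed) {
    let h = (2166136261 ^ seed) >>> 0;
    for (let i = 0; i < key.length; i++) {
      h ^= key.charCodeAt(i);
      h = Math.imul(h, 16777619) >>> 0;
    }
    return h;
  }

  // Double hashing: position_i = (h1 + i * h2) mod bitCount.
  positions(key) {
    const h1 = BloomFilter.hash(key, 0);
    const h2 = (BloomFilter.hash(key, 0x9e3779b9) | 1) >>> 0; // odd step
    const out = [];
    for (let i = 0; i < this.hashCount; i++) {
      out.push((h1 + i * h2) % this.bitCount);
    }
    return out;
  }

  add(key) {
    for (const p of this.positions(key)) {
      this.bits[p >> 3] |= 1 << (p & 7);
    }
  }

  // false means "definitely not added"; true means "probably added".
  mightContain(key) {
    return this.positions(key).every(
      p => (this.bits[p >> 3] & (1 << (p & 7))) !== 0
    );
  }
}

const viewed = new BloomFilter(8192, 4); // 1 KB of bits for one user
viewed.add('/root/dir/page-1');
console.log(viewed.mightContain('/root/dir/page-1')); // true
```

The bits array could be stored per user as a binary field, at the cost of not being able to "un-view" a page.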
News feeds. Nothing unusual here: a regular news feed with subscriptions, trending items and recommendations. I like the solution described here.
Thank you for help!

Related

How to tag documents in MongoDB?

I need to tag documents in a collection, let's call it 'Contacts'.
The first idea I had was to create an attribute called "tags" for each document.
Well, in this case we have something like:
{
_id:'1',
contact_name:'Asya Kamsky',
tags:['mongodb', 'maths', 'travels']
}
Now, let's suppose that we have users that want to tag any document in 'Contacts'.
If we keep the decision to save the tags attribute for each document, as the tags are personal, we need to use the userId for each tag.
So our document would be something like that (or not):
{
_id:'1',
contact_name:'Asya Kamsky',
tags:[
{userId:'alex',tags:['mongodb', 'maths', 'travels']},
{userId:'eric',tags:['databases', 'friends', 'japan']},
]
}
Now, let's complicate it a bit. Let's imagine that we have A LOT of users and each one wants to tag documents with their own personal tags.
How to deal with that?
Ok, we could create thousands of tags for each document:
{
_id:'1',
contact_name:'Asya Kamsky',
tags:[
{userId:'alex',tags:['mongodb', 'maths', 'travels']},
{userId:'eric',tags:['databases', 'friends', 'japan']},
{.....................................................}
{.....................................................}
{......................................................}
]
}
But what if we have millions of users? In this case we hit the 16MB size limit for each document, as far as I know...
At this point, worrying about the future growth of my application, I decided
to create a nice separated collection called 'tags' that would contain documents similar to:
{
"contact_name" : "Asya Kamsky",
"userId" : "alex",
"tags" : ['mongodb', 'maths', 'travels'],
"timestamp" : "2017-08-08 14:33:28"
},
{
"contact_name" : "Asya Kamsky",
"userId" : "eric",
"tags" : ['databases', 'friends', 'japan'],
"timestamp" : "2017-08-08 14:33:28"
}
That is, we have separate documents, each representing one user's tags for a contact.
Cool and clean, right?
Well, in this case, we face 2 problems:
Minor problem: We return to the SQL logic that I don't like anymore but accept in some cases.
Big (for me) problem: how to search for a contact by PERSONAL tags? In this case we have a nice 'JOIN' problem that MongoDB resolves well using $lookup.
"Resolves well" for 10000, 20000, or even 500000 documents. But as I want to ensure a good performance in the future, I think about 10000000 contacts. So, as I researched recently, the $lookup works well for a "small part" of universe and, even with indexes, this search would take a lot of time to be executed.
How to resolve this challenge?
Thanks all
If your usage is such that the number of users × the number/size of tags per contact (plus whatever other data is in a contacts document) is likely to bring you near the 16MB document size limit, then storing the tags in a separate collection seems valid. But before you go down that route, are you sure this is likely? Have you tried creating contact documents to see how many tags and how many users per contact would get you near the 16MB limit? If the answer implies a number of users and/or tags which you are unlikely ever to reach, then maybe your concerns are strictly theoretical and you could consider sticking with the simplest solution, which is to embed the user-specific tags inside contacts.
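A back-of-the-envelope version of that size check could look like the sketch below. It uses the length of the JSON text as a stand-in for the BSON size, which is only an approximation (BSON has its own per-field overhead), so treat the result as an order of magnitude. The shell can report the exact BSON size of a real document (Object.bsonsize in the legacy shell). The helper name approxBytes is made up for illustration.

```javascript
const MAX_DOC_BYTES = 16 * 1024 * 1024; // MongoDB's document size limit

// Rough size of a value, using its JSON text as a stand-in for BSON.
function approxBytes(value) {
  return Buffer.byteLength(JSON.stringify(value), 'utf8');
}

// Shapes taken from the question: a contact and one user's tag entry.
const baseContact = { _id: '1', contact_name: 'Asya Kamsky', tags: [] };
const oneUserEntry = { userId: 'alex', tags: ['mongodb', 'maths', 'travels'] };

const bytesPerEntry = approxBytes(oneUserEntry) + 1; // +1 for the separating comma
const maxEntries = Math.floor(
  (MAX_DOC_BYTES - approxBytes(baseContact)) / bytesPerEntry
);

console.log(
  `about ${bytesPerEntry} bytes per user entry, ` +
  `so roughly ${maxEntries} users could tag one contact`
);
```

If the number printed is far above the number of users who would realistically tag the same contact, embedding stays an option.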
The rest of this answer assumes that the size estimates and your knowledge about the likely number of tags and users per contact are such that the size constraints are valid. On this basis, you stated this specific concern about join performance ...
But as I want to ensure a good performance in the future, I think about 10000000 contacts. So, as I researched recently, the $lookup works well for a "small part" of universe and, even with indexes, this search would take a lot of time to be executed.
Have you tried measuring this performance? Generate seed documents for contacts and tags and then persist variations of these and then run queries using $lookup and measure the performance. You could do this for a few benchmarks, for example:
1,000 contacts and 10,000 tags
100,000 contacts and 1,000,000 tags
1,000,000 contacts and 10,000,000 tags
10,000,000 contacts and 100,000,000 tags
When running your benchmark tests you can additionally use explain() to understand what's going on inside MongoDB.
You might find that performance is acceptable; only you can know this, since you understand what expectations the users of your system have with respect to performance.
One last point: if the use case here is that a given user wants to find all of their contacts and tags, then this could be handled with a 'client side join', i.e. two queries: (1) get the tags for "userId" : "..." and (2) find the contacts referenced by those tags. Depending on what your use cases are, this could be more performant than a server side join (aka $lookup).
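To illustrate the two-step 'client side join', here is a small sketch that uses in-memory arrays as stand-ins for the tags and contacts collections, so it runs without a database. The shell equivalents are given in the comments. The 'John Doe' contact and the function name findContactsByPersonalTag are made up for illustration, and linking by contact_name simply follows the tag documents in the question (linking by the contact's _id would be more robust).

```javascript
// In-memory stand-ins for the two collections.
const tags = [
  { contact_name: 'Asya Kamsky', userId: 'alex', tags: ['mongodb', 'maths', 'travels'] },
  { contact_name: 'Asya Kamsky', userId: 'eric', tags: ['databases', 'friends', 'japan'] },
  { contact_name: 'John Doe', userId: 'alex', tags: ['travels'] },
];
const contacts = [
  { _id: '1', contact_name: 'Asya Kamsky' },
  { _id: '2', contact_name: 'John Doe' },
];

function findContactsByPersonalTag(userId, tag) {
  // Query 1: this user's tag documents that carry the tag.
  // Shell: db.tags.find({ userId: userId, tags: tag })
  const matching = tags.filter(
    t => t.userId === userId && t.tags.includes(tag)
  );
  const names = new Set(matching.map(t => t.contact_name));

  // Query 2: the contacts referenced by those tag documents.
  // Shell: db.contacts.find({ contact_name: { $in: [...names] } })
  return contacts.filter(c => names.has(c.contact_name));
}

console.log(findContactsByPersonalTag('alex', 'travels'));
```

With an index on { userId: 1, tags: 1 } in the tags collection, the first query only touches one user's tag documents, which is what keeps this approach cheap.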

MongoDB - Tag based search with autocomplete

I am looking to implement a tag search feature and was looking for some advice in terms of efficiency. I am new to MongoDB so I am unsure of best practices for performance.
Okay, so I want to create a link sharing app in which users tag links based on their content. For instance, a funny dog image would be tagged with "funny" and "dog". A link would have a:
title,
url,
user_id,
tags: array of tags
Now in order for me to allow users to search for links I need a list of all the tags used. For usability this needs to have auto-complete functionality. So I researched a bit and tested out using a collection of tags where I index the tag value e.g. "funny" and then use a regex.
db.tags.find({value:/^search/})
With a collection of 600,000 documents it searched for all documents beginning with "s" in 63 milliseconds. As the length of the search term increases the execution time decreases.
Now comes the part I'm unsure of. Say, for instance, I want to find all the links which have both the tags "funny" and "dog" (an intersection). How should I store the tags? Should I store the object id of each tag? Can I index these object ids? Is there another way to structure the whole database?
Also, I'd like to be able to suggest tags based on the tags a user has already entered. I was thinking of just having a related field in the tag document, for instance:
tag
----
id
value
related: [{
tag_id
count
}]
(again unsure as it would suggest tags that could be related to one of the already entered tags and not to another. With an intersect this would return no results.)
Any advice would be much appreciated.
Edit: mistake
Create a text index on the tag array. This will enable you to search quickly for funny, dog, and funny or dog.
https://docs.mongodb.com/manual/core/index-text/
db.tags.createIndex( { tags: "text" }, {background:true} )
As to the related tags, I don't think that you want to reference the _id values. You can probably embed an array of related tags such as:
relatedTags: [{tag1}, {tag2}]
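For the "funny" AND "dog" intersection specifically, an alternative to the text index (not what the answer above proposes) would be to store the tag strings directly in each link's tags array, index that array, and query with $all. The sketch below builds that filter and mimics its matching rule on an in-memory array so it can run without a database. The sample links, the links collection name and the helper names are made up for illustration.

```javascript
// Sample link documents with the tag strings embedded in each link.
const links = [
  { title: 'Funny dog', url: 'http://example.com/1', tags: ['funny', 'dog'] },
  { title: 'Funny cat', url: 'http://example.com/2', tags: ['funny', 'cat'] },
  { title: 'Dog training', url: 'http://example.com/3', tags: ['dog'] },
];

// Filter document meaning "the tags array contains every wanted tag".
function allTagsFilter(wanted) {
  return { tags: { $all: wanted } };
}

// Local equivalent of the $all match, to show the semantics.
function matchAllTags(docs, wanted) {
  return docs.filter(doc => wanted.every(tag => doc.tags.includes(tag)));
}

console.log(JSON.stringify(allTagsFilter(['funny', 'dog'])));
console.log(matchAllTags(links, ['funny', 'dog']).map(l => l.title));
// In the MongoDB shell, roughly:
// db.links.createIndex({ tags: 1 })
// db.links.find({ tags: { $all: ['funny', 'dog'] } })
```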

how to join a collection and sort it, while limiting results in MongoDB

Let's say I have 2 collections wherein each document may look like this:
Collection 1:
target:
_id,
comments:
[
{ _id,
message,
full_name
},
...
]
Collection 2:
user:
_id,
full_name,
username
I am paging through comments via $slice; let's say I take the first 25 entries.
From these entries I need the corresponding usernames, which I get from the second collection. What I want is to get the comments sorted by their referenced username. The problem is that I can't add the username to the comments, because usernames may change often and, if one does, I would need to update every target document that contains the old username.
I can only imagine one way to solve this: read out all the full_names and query them in the user collection. The result would be sortable, but it is not paged, so it takes a lot of resources with large documents.
Is there anything I am missing with this problem?
Thanks in advance
If comments are an embedded array, you will have to do work on the client side to sort the comments array unless you store it in sorted order. Your application requirements for username force you to either read out all of the usernames of the users who commented to do the sort, or to store the username in the comments and have (much) more difficult and expensive updates.
Sorting and pagination don't work unless you can return the documents in sorted order. You should consider a different schema where comments form a separate collection so that you can return them in sorted order and paginate them. Store the username in each comment to facilitate the sort on the MongoDB side. Depending on your application's usage pattern this might work better for you.
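A small sketch of what the separate-collection layout buys you: with one document per comment and the username stored on it, sorting and paging become a single sorted query. The code below imitates that on an in-memory array so it runs without a database. The targetId field name and the sample data are assumptions for illustration.

```javascript
// One document per comment, with the username denormalized onto it.
const comments = [
  { _id: 1, targetId: 't1', username: 'carol', message: 'third' },
  { _id: 2, targetId: 't1', username: 'alice', message: 'first' },
  { _id: 3, targetId: 't2', username: 'dave', message: 'other target' },
  { _id: 4, targetId: 't1', username: 'bob', message: 'second' },
];

// Return one page of a target's comments, sorted by username.
// Shell equivalent:
// db.comments.find({ targetId: targetId })
//   .sort({ username: 1 }).skip(page * pageSize).limit(pageSize)
function pageOfComments(targetId, page, pageSize) {
  return comments
    .filter(c => c.targetId === targetId) // filter() copies, so sort() is safe
    .sort((a, b) => a.username.localeCompare(b.username))
    .slice(page * pageSize, (page + 1) * pageSize);
}

console.log(pageOfComments('t1', 0, 2).map(c => c.username));
```

A compound index on { targetId: 1, username: 1 } would let the database return the page in sorted order without sorting in memory.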
It also seems strange to sort on usernames and expect/allow usernames to change frequently. If you could drop these requirements it'd make your life easier :D

Many to many in MongoDB

I decided to give MongoDB a try and see how well we get along. I do have some questions though.
Premise
I have users(id, name, address, password, email, etc)
I have stamps(id, type, value, price, etc)
Users browse through a stamp archive and filter it in various ways (pagination, filter by price, type, name, etc), select a stamp, then add it to their collection.
Users can add more than one copy of a stamp to their collection (1 mint piece and 1 used, or just 2 used pieces).
Users can flag some of their stamps for sale or trade and perhaps specify a price.
So far
Here's what I have so far:
{
_id : objectid,
Name: "bob",
Email: "bob@bob.com",
...
Stamps: [stampid-1, stampid-543,...,stampid-23]
}
Questions
How should I add the state of the owned stamp, the quantity and condition?
What would be some sample queries for the situations described earlier?
As far as I know, ensureIndex reduces the number of "scanned" entries.
The accepted answer here keeps changing the index. Is that just for the purpose of explaining it, or is this the way to do it? I mean, it does make sense somehow, but I keep thinking of it in SQL terms and... it does not make ANY sense...
The only change I would make is to how you store the stamps that a user owns. I would store an array of objects representing the stamps, duplicating the values that are accessed most often.
For example, something like this:
{
_id : objectid,
Name: "bob",
Email: "bob@bob.com",
...
Stamps : [
{
_id: id,
type: 'type',
price: 20,
forSale: true/false,
quantity: 2
},
{
_id: id2,
type: 'type2',
price: 5,
forSale: false,
quantity: 10
}
]
}
You can see that some data is duplicated between the stamps collection and the Stamps array in the user collection. You do that with the properties that you access most often, because otherwise you would have to do a findOne for each stamp, and in MongoDB it is better to read the data directly than to do that. This also lets you add other properties here, such as quantity and forSale.
The goal of the duplication is to avoid running a query for each stamp in the array.
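As for sample queries against this embedded layout, the sketch below builds the filter and update documents for two of the situations from the question: adding a stamp to a user's collection and flagging an owned stamp for sale. It only constructs plain objects, so it runs without a database. The function names and the users collection name are assumptions; $push, $set and the positional $ are standard MongoDB update operators.

```javascript
// Filter + update for adding a new stamp entry to a user's Stamps array.
function addStampUpdate(userId, stamp) {
  return {
    filter: { _id: userId },
    update: { $push: { Stamps: stamp } },
  };
}

// Filter + update for flagging an owned stamp for sale at a given price.
// "Stamps.$" is the positional operator: it targets the array element
// that was matched by the 'Stamps._id' condition in the filter.
function flagForSaleUpdate(userId, stampId, price) {
  return {
    filter: { _id: userId, 'Stamps._id': stampId },
    update: { $set: { 'Stamps.$.forSale': true, 'Stamps.$.price': price } },
  };
}

const add = addStampUpdate('user-1', {
  _id: 'stamp-543', type: 'type', price: 20, forSale: false, quantity: 1,
});
const flag = flagForSaleUpdate('user-1', 'stamp-543', 25);
console.log(JSON.stringify(add), JSON.stringify(flag));
// In the MongoDB shell, roughly:
// db.users.updateOne(add.filter, add.update)
// db.users.updateOne(flag.filter, flag.update)
// Users selling a given type of stamp:
// db.users.find({ Stamps: { $elemMatch: { type: 'type', forSale: true } } })
```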
Here is a link to a video that discusses MongoDB design and also explains what I tried to explain here.
http://lacantine.ubicast.eu/videos/3-mongodb-deployment-strategies/
I come from a SQL background and am struggling with NoSQL too. It seems to me that a lot hinges on how unchanging the various types of data may or may not be. One thing that puzzles me in RDBMS systems is why it is not possible to declare a particular column/field "immutable". If you know a field is immutable (or nearly so) in a NoSQL context, it seems to me that this makes it more acceptable to duplicate the info. Is it complete heresy to suggest that in many contexts you might actually want a combination of SQL and NoSQL structures?

Nested Comments in MongoDB

I'm quite new to MongoDB and trying to build a nested comment system with it.
On the net you can find various document structures to achieve that, but I'm looking for proposals that would make it easy to do the following things with the comments:
Mark comments as spam/approved and retrieve comments by these attributes
Retrieve comments by user
Retrieve comment count for an object/user
Besides, of course, displaying the comments as is normally done. If you have any suggestions on how to handle these things with MongoDB, or want to tell me to look for an alternative, it'd be much appreciated!
Have you considered storing the comments in all documents that need a reference to them? If you have a document for the user, store all of that user's comments in it. If you have a separate document for objects, store all comments there also. It feels sort of wrong after coming from a relational world where you try to have exactly one copy of a given piece of data, and then reference it by ID, but even with relational databases you have to start duplicating data if you want queries to run quickly.
With this design, each document that you load would be "complete". It would have all the data you need, and indexes on that collection would keep reads fast. The price would be slightly slower writes, and more of a headache when you need to update the comment text, since you need to update more than one document.
Because you need to retrieve comments by certain attributes, by user, etc., you can't embed the comments in each object that users can comment on (even though embedding is always faster for document databases). So you need to create a separate collection for the comments. I suggest the following structure:
comment
{
_id : ObjectId,
status: int (spam =1, approved =2),
userId: ObjectId,
commentedObjectId: ObjectId,
commentedObjectType: int(for example question =1, answer =2, user =3),
commentText
}
With the above structure you can easily do the things you want:
// Mark comments as spam/approved and retrieve comments by these attributes
// Mark a specific comment as spam; $set changes only the status field
// instead of replacing the whole document
db.comments.update({ _id: someCommentId }, { $set: { status: 1 } });
db.comments.find({ status: 1 }); // get all comments marked as spam
// Retrieve comments by user (the field is userId, as in the structure above)
db.comments.find({ userId: someUserId });
// Retrieve comment count for an object/user
db.comments.find({ commentedObjectId: someId, commentedObjectType: 1 })
.count();
Also, I suppose that for comment counting it would be better to create an extra field in each object and increment it on comment add/delete.
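The counter idea can be sketched as follows. The snippet builds the $inc update for the commented object and applies it to a plain object to show the effect, so it runs without a database. The commentCount field name, the questions collection and both helper names are assumptions for illustration.

```javascript
// Filter + update that adjusts the stored comment count of an object.
// delta is +1 when a comment is added and -1 when one is deleted.
function commentCountUpdate(commentedObjectId, delta) {
  return {
    filter: { _id: commentedObjectId },
    update: { $inc: { commentCount: delta } },
  };
}

// Local imitation of $inc: a missing field is treated as 0.
function applyInc(doc, update) {
  const out = { ...doc };
  for (const [field, delta] of Object.entries(update.$inc)) {
    out[field] = (out[field] || 0) + delta;
  }
  return out;
}

let question = { _id: 'q1', title: 'Nested Comments in MongoDB' };
question = applyInc(question, commentCountUpdate('q1', 1).update);
question = applyInc(question, commentCountUpdate('q1', 1).update);
question = applyInc(question, commentCountUpdate('q1', -1).update);
console.log(question.commentCount); // 1
// In the MongoDB shell, roughly:
// const u = commentCountUpdate(someId, 1);
// db.questions.update(u.filter, u.update);
```

The counter and the comment insert are two separate writes, so the count can drift if one of them fails; recomputing it occasionally from the comments collection is one way to correct that.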