MongoDB 3 (text search) or Elasticsearch? - mongodb

MongoDB 3 offers us text indexes (http://docs.mongodb.org/manual/core/index-text/). My question is, should I use Elasticsearch or MongoDB 3 with the text index feature? Which is the best for searching through lots of entries? Which is the one with the best performance (5 million+ entries) in 2015?
I googled for this information, but I only found outdated answers.
Thanks a lot!
EDIT: My use case is searching titles, descriptions and profiles for keywords. Is MongoDB 3 with the text index feature capable of searching these as fast as (or close to as fast as) Elasticsearch?

It depends on your use case. If you want full text search capabilities, such as finding a document based on keywords, or finding a product based on keywords that may be present in the title, description, reviews or tags of the product, then Elasticsearch is the thing to go for.
You may also want to evaluate Lucene/Solr for the above use cases.
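To make the keyword use case concrete, here is a minimal sketch of an Elasticsearch request body that searches several fields at once with a `multi_match` query. The field names are taken from the example above; the function name `build_keyword_query` and the `title^2` boost are illustrative assumptions, and the body would still need to be sent to a real index with whichever client you use.

```python
import json


def build_keyword_query(keywords, fields=("title^2", "description", "review", "tags")):
    """Build an Elasticsearch search body that matches `keywords` in several fields.

    The field list and the ^2 boost on title are assumptions for illustration.
    """
    return {
        "query": {
            "multi_match": {
                "query": keywords,
                "fields": list(fields),
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be posted to the index's _search endpoint.
    print(json.dumps(build_keyword_query("wireless headphones"), indent=2))
```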

Related

Non-exact/related searches with MongoDB: find() vs $search

Hope everyone is doing great.
I had a bit of a "weird" question regarding doing non-exact/related searches with MongoDB.
I'm building a web application with a sort of "search engine" search bar, if you will (i.e., people input stuff and the results are documents related to that search instead of exact results), and I'm having a difficult time deciding on the best approach.
Recently I discovered MongoDB's full text search and it's been amazing so far in terms of what I want to achieve. However, as my search functionality gets more complex (adding stuff like sorting, pagination, etc.) I notice a lack of documentation on best practices in comparison to using find() queries. I mean, I know there are aggregation pipeline stages for that kind of functionality, but I have found proper examples kinda lacking.
Taking that into consideration, I've started to consider changing my approach to using find() queries, but I can't seem to find examples of people using them for non-exact/related matches in the way full text search can. How would you even do that with find()? Would you use a more elaborate regex or something similar? Is it even worth a try?
I would love to hear your anecdotes, especially about how you kept the app performant as your search features became more complex. Do you swear by full text search? Or have you achieved search engine-like search using the good old find()? If so, how?
Thank you everyone!
As far as I know, MongoDB full text search comes in two types:
MongoDB Atlas Search
On-premise text search
To learn more about performing text search, see the reference docs below.
REFERENCE: https://www.mongodb.com/docs/manual/core/link-text-indexes/
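For the on-premise `$text` route, sorting by relevance and pagination can be sketched roughly as follows. This assumes `collection` is a PyMongo collection that already has a text index (for example one created with `collection.create_index([("title", "text"), ("description", "text")])`); the function name and the field names are hypothetical.

```python
def text_search_page(collection, keywords, page=0, page_size=20):
    """Return one page of $text results, best matches first.

    `collection` is assumed to be a PyMongo collection with a text index.
    """
    # Ask MongoDB to expose the relevance score of each match.
    score = {"$meta": "textScore"}
    cursor = (
        collection.find({"$text": {"$search": keywords}}, {"score": score})
        .sort([("score", score)])   # most relevant documents first
        .skip(page * page_size)     # simple offset-based pagination
        .limit(page_size)
    )
    return list(cursor)
```

Offset-based paging with `skip` is the simplest option, but it tends to get slower on deep pages because the skipped documents still have to be walked.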

How well do the search engines of databases like mongoDB and Postgres do compared to something like Elasticsearch? (in 2020)

I am working on a website where users can create entries in a database. These entries are all of the same form, so I have been using SQLite (with FTS5) so far, which is the only database I know ;)
However it is crucial that users can search these entries on the website. The full text search is working decently well (the users know approximately what they are looking for) but I need two improvements:
Wrong spelling should return the correct entry (I have seen the spellfix extension for SQLite for that, but I don't know how well it works).
More importantly, if a user enters a few query words on the website, I try to MATCH those with an SQL query. If a user enters too many words, it will return 0 matches:
for example: if a user inputs "sars covid 19" into the search-bar:
CREATE VIRTUAL TABLE TEST USING FTS5(SomeText);
INSERT INTO TEST(SomeText) VALUES
('Covid 19');
SELECT SomeText
FROM TEST
WHERE SomeText MATCH 'sars covid 19';
=> 0 matches, but I would want it to return the 'covid 19' entry.
I realise that SQLite might be too simplistic for a database that needs to handle searches well. Now my question is: do Postgres or MongoDB have search engines that include the functionality that I need? Or is it worth diving into solutions with Elasticsearch etc.?
Most articles I found on this are 5-10 years old, so I am wondering what the current state of affairs is regarding search engines in databases. Any hints are greatly appreciated
The combination of ES + MongoDB works well: you index and perform full text search in ES, and you keep the original documents, with some key fields indexed, in MongoDB.
Elasticsearch will work for sure. You only have to think about how you will index your documents, because you will be able to find them the way you index them. In your context, it seems that the default text mapping will work with a match query:
https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html
MongoDB will work too in this simple case: https://docs.mongodb.com/manual/text-search/ , but MongoDB does not work with tokenizers, so if you need to upgrade your text search later, MongoDB will be limiting.
PostgreSQL could do it using LIKE, but I am not familiar enough with it. If you have 10k entries, it will be fine for sure; if you expect 1 million, MongoDB or ES would be better.
If you have to choose between MongoDB and ES, you will have to be more specific in your question. For full text search, ES is really nice and has a lot of features; MongoDB offers some nice database tools too. Sometimes ES will be better, sometimes MongoDB, depending on what you need. If you only want full text search, ES is a must.
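As a side note on the SQLite example in the question: FTS5 treats space-separated terms as an implicit AND, which is why 'sars covid 19' finds nothing. A minimal sketch of one workaround, assuming you are happy with "any word matches" semantics, is to rewrite the user input into an explicit OR query before passing it to MATCH. The helper names below are hypothetical, and `search` assumes the TEST table from the question and an SQLite build with FTS5 enabled.

```python
import sqlite3


def to_or_query(user_input):
    """Turn 'sars covid 19' into '"sars" OR "covid" OR "19"' for FTS5 MATCH."""
    terms = user_input.split()
    # Double-quote each term (doubling embedded quotes) so FTS5 reads it as a plain string.
    quoted = ['"' + term.replace('"', '""') + '"' for term in terms]
    return " OR ".join(quoted)


def search(conn, user_input):
    """Run the OR query against the TEST table from the question, best match first."""
    query = to_or_query(user_input)
    if not query:
        return []  # nothing to search for
    sql = "SELECT SomeText FROM TEST WHERE TEST MATCH ? ORDER BY rank"
    return [row[0] for row in conn.execute(sql, (query,))]
```

With this rewrite, the 'Covid 19' row should come back for the input "sars covid 19", since two of the three terms match. Misspellings are a separate problem that this does not address.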

advanced searching mongodb using mongomapper, sunspot/solr or sphinx?

I am using mongodb with mongomapper to store all my products. Each product belongs to multiple categories that have many levels, i.e. category, sub category etc.
Each product has many search fields that are embedded documents in product.
All this is working and I now want to add search to the app.
The search system needs text search: multiple, dynamic, faceted search including min/max range search.
I have been looking into the sunspot gem but am having difficulty setting it up on dev, let alone trying to run it in production! I have also looked at sphinx.
But I am wondering if using just mongomapper / mongodb will be quick enough and the best way, as it's quite a complex search system?
Any help / suggestions / experiences / tutorials and examples on this would be most appreciated.
Thanks a lot,
Rick
I've been involved with a very large Sphinx-powered search and I think it's awful. It is very difficult to configure if you want anything past a very simple full-text search. Solr/Lucene, on the other hand, is incredibly flexible and was unbelievably easier to set up and get running.
I am now using Solr in conjunction with MongoDB to power full text search with all the extra goodies, like facets, etc. Depending on how you configure Solr, you may not even need to hit your MongoDB for data. Or, you may tell Solr to index fields but not to store them, and instead just store the ObjectIds that correspond to data inside MongoDB.
If your search truly is a complex search system, I very strongly recommend that you do not use MongoDB for search and go with Solr. One big reason is that MongoDB doesn't have a full text feature; instead, it has regular expression matches. The regex matches work wonderfully but will only use indexes in certain cases.
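To illustrate the "certain cases" remark: the form of regex that MongoDB can serve efficiently from an ordinary index is a case-sensitive pattern anchored at the start of the string (a prefix match). A minimal sketch of building such a filter, with a hypothetical helper name and field name:

```python
import re


def prefix_query(field, prefix):
    """Build a MongoDB filter matching values of `field` that start with `prefix`.

    The pattern is anchored with ^ and case-sensitive, which is the kind of
    regex that can make use of a regular index on the field.
    """
    # re.escape keeps characters such as '.' from being read as regex syntax.
    return {field: {"$regex": "^" + re.escape(prefix)}}


if __name__ == "__main__":
    print(prefix_query("name", "iPhone"))
```

Unanchored or case-insensitive patterns generally cannot narrow the index range this way, which is where regex-based search starts to hurt on large collections.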

Please advise an optimal solution to full text search in mongoDB

The documents in my database have names and descriptions among other fields. I would like to allow the users to search for those documents by providing some keywords. The keywords should be used to lookup in both the name and the description field. I've read the mongoDB documentation on full text search and it looks really nice and easy if I want to search for keywords in the name field of my documents. However, the description field contains free form text and can take up to 2000 characters, so potentially there are a few hundred words per document. I could treat them the same way as names and just split the whole description into separate words and store it as another tag-like array (as per the Mongo example), but it seems like a terrible idea - each document's size could be almost doubled, plus there are characters like dots, commas, etc.
I know there are specialized solutions for exactly this kind of problem and I was just looking at Lucene.Net. I also saw Solr mentioned here and there.
Should I be looking to implement this search feature in mongoDB or should I use a specialized solution? Currently I just have one instance of mongod and one instance of a web server. We might need to scale later, but for now that is all I use. I'd appreciate any suggestions on how to implement this feature.
If storing the text split out into an array per the documented approach is not viable (I can understand your concerns), then I think you should look into a specialised solution.
Quote from the MongoDB documentation:
"MongoDB has interesting functionality that makes certain search functions easy. That said, it is not a dedicated full text search engine."
So, for more advanced full text search functionality I think a dedicated engine would be more suited. I have no experience in this area so I can't offer much in the way of suggestions from here, other than what my thoughts would be if I was in the same boat:
how much work is involved in using a dedicated full-text search engine instead of MongoDB's functionality?
does that add more complexity / is it worth it?
would it be quicker/simpler to use MongoDB and just take the hit on the extra disk space?
maybe MongoDB will support better full-text functionality in future (it is rapidly evolving after all)
Fulltext search support is planned for the future. However right now you have to go with Solr & friends. Using the built-in "fulltext" functionality is not really suitable for real world usage.

Sphinx search engine and related tags

I'm using the Sphinx search engine to index all my Intranet documents using tags. With that I don't have any trouble finding specific documents with one or more tags.
I want to go further with a new feature like the StackOverflow "related tags" feature.
Does anybody know the best way to do this with Sphinx?
Thanks
You run a boolean OR query on all terms in the document you want to find related items for. It can be fairly slow because all documents in the database have to be ranked on similarity, unless you limit the search using ANDed terms. See my text here: https://stackoverflow.com/questions/3121266/efficient-item-similarity-search-using-sphinx
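The idea behind the OR query can be sketched in plain Python, without any Sphinx API: every document sharing at least one term with the target is a candidate, and candidates are ranked by how much they share. The function name and the data layout (a dict mapping document ids to tag sets) are assumptions for illustration, and Sphinx's actual ranking is more sophisticated than a raw overlap count.

```python
def rank_related(target_terms, documents):
    """Rank documents by how many terms they share with `target_terms`.

    `documents` maps a document id to its collection of terms (tags).
    """
    target = set(target_terms)
    scored = []
    for doc_id, terms in documents.items():
        overlap = len(target & set(terms))
        if overlap:  # OR semantics: one shared term is enough to qualify
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


if __name__ == "__main__":
    docs = {"a": {"hr", "policy"}, "b": {"hr"}, "c": {"finance"}}
    print(rank_related(["hr", "policy"], docs))
```

The loop touches every document, which mirrors the performance caveat above: requiring one or more terms to be present (the ANDed terms) shrinks the candidate set before ranking.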