Does MongoDB support soundex or fuzzy matching? I want to spot dupes of basic contact name and address fields. I'm using the official C# driver. Thanks
MongoDB doesn't support soundex matching, but it does have full text search.
Also,
You can always just store the soundex-encoded string in a separate field in mongo and search against that. Soundex is a really trivial algorithm and should only take a handful of lines.
-- from mongodb-user
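To make the quoted suggestion concrete, here is a minimal sketch of American Soundex in Python (the question uses the C# driver, so treat this as an illustration to port, not driver code). The function name `soundex` and the idea of saving its result in a separate field such as `name_soundex` are assumptions made for this example.

```python
# Letter groups that share a Soundex digit.
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, or "" if no letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    # The first letter's own digit still suppresses an identical neighbour.
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        code = _CODES.get(ch, "")
        if code and code != previous:
            encoded += code
        previous = code  # vowels (and Y) reset this to ""
        if len(encoded) == 4:
            break
    return encoded.ljust(4, "0")
```

You would compute the code once when saving a contact, store it next to the original value, index that field, and query it with a plain equality match.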
MongoDB does not support real full-text search, and nothing like soundex (which is a very poor choice for matching terms; something like Levenshtein distance calculation is much better).
In addition look at my last comment here:
Full-text search in NoSQL databases
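For reference, a minimal sketch of the Levenshtein distance mentioned in this answer, in Python. It would have to run in application code over candidate pairs, since nothing in this thread suggests MongoDB computes it for you. The function name is an assumption made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character inserts, deletes or substitutions
    needed to turn a into b (classic two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]
```

A small distance relative to the string length (for example 1 or 2 on a surname) is one possible signal for a duplicate contact; the threshold is something you would have to tune on your own data.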
Hope everyone is doing great.
I had a bit of a "weird" question regarding doing non-exact/related searches with MongoDB.
I'm building a web application with a sort of "search engine" search bar if you will (I.e.: people input stuff and the results are documents related to that search instead of exact results), and I'm having a difficult time deciding the best approach.
Recently I discovered MongoDB's full text search and it's been amazing so far in terms of what I want to achieve. However, as my search functionality gets more complex (adding stuff like sorting, pagination, etc.) I notice a lack of documentation on best practices in comparison to using find() queries. I mean, I know there are aggregation pipeline stages for doing those kinds of things, but I have found the number of proper examples kinda lacking.
Taking that into consideration, I've started to consider changing my approach to using find() queries, but I can't seem to find examples of people using them for non-exact/related matches in the same way as what full text search can achieve. How would you even do that with find()? Would you use a more elaborate regex or something similar? Is it even worth a try?
I would love to hear your anecdotes, especially as your search features became more complex, to ensure that the app remains performant. Do you swear by full text search? Or have you achieved search engine-like search using the good old find()? If so, how?
Thank you everyone!
As far as I know, MongoDB full text search comes in 2 types:
MongoDB Atlas Search
On-premise text search
To perform text search, you can learn more in the reference docs below.
REFERENCE: https://www.mongodb.com/docs/manual/core/link-text-indexes/
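As a rough illustration of the sorting and pagination asked about above, here is a sketch of an aggregation pipeline for the on-premise text search, built as plain Python data. It assumes a text index already exists on the collection; the function name, the 1-based `page` convention, and the `_id` tie-breaker are choices made for this example, not an official recipe.

```python
def text_search_pipeline(query: str, page: int, page_size: int = 20) -> list:
    """Build an aggregation pipeline: text match, relevance sort, one page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return [
        # A $match that uses $text has to be the first stage of the pipeline.
        {"$match": {"$text": {"$search": query}}},
        # Sort by relevance; _id is added so that ties keep a stable order
        # from one page to the next.
        {"$sort": {"score": {"$meta": "textScore"}, "_id": 1}},
        {"$skip": (page - 1) * page_size},
        {"$limit": page_size},
    ]
```

With pymongo the list would be passed to `collection.aggregate(...)`. Skip-based paging gets slower on deep pages, so this is only a starting point.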
I want to query using part of an id to get all the matching documents. So I tried "starts with" and "contains", which work fine, but is there any performance issue for a large collection?
The best way to optimize this search:
Add a text index on the fields you want to search. This is really important because internally it tokenizes your strings, so that you can search for individual words within them.
Use a regex, which is also quicker to implement.
If you are using aggregate, read the official MongoDB doc about aggregation optimization, which might help you implement this in an efficient manner: https://docs.mongodb.com/manual/core/aggregation-pipeline-optimization/
Last but not least, if you are not yet fully committed to MongoDB and the project is fresh, look at Elasticsearch, which is based on Lucene. It's extremely powerful for these kinds of searches.
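To illustrate the regex option for the "starts with" and "contains" cases in the question, here is a sketch that builds the two query filters in Python. The helper names and the field name `code` are hypothetical. The relevant difference is that a case-sensitive regex anchored with `^` can use an index on the field as a prefix scan, while an unanchored "contains" pattern cannot narrow the scan that way.

```python
import re


def starts_with_filter(field: str, prefix: str) -> dict:
    """Filter for values beginning with prefix (anchored, index-friendly)."""
    # re.escape keeps user input from being read as regex syntax. It follows
    # Python's rules, which is assumed to be close enough to the server's
    # regex dialect for ordinary input.
    return {field: {"$regex": "^" + re.escape(prefix)}}


def contains_filter(field: str, fragment: str) -> dict:
    """Filter for values containing fragment anywhere (unanchored)."""
    return {field: {"$regex": re.escape(fragment)}}
```

For example, `starts_with_filter("code", "AB12")` could be passed to `find()`. On a large collection the prefix form is the one to prefer whenever the requirements allow it.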
I want to perform a LIKE search (e.g. all words containing 'abc', i.e. %abc%) using the Hibernate Search API.
Is there a way to do it using the existing analyzers?
If so, which one performs better in this case: SQL or Hibernate Search?
Maybe have a look at this:
http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/RegexpQuery.html?is-external=true
But note this:
"Note this query can be slow, as it needs to iterate over many terms. In order to prevent extremely slow RegexpQueries, a Regexp term should not start with the expression .*"
This should be included in Hibernate-Search
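The warning quoted above can be pictured with a toy model in Python. This is not how Lucene is implemented; it only assumes that index terms are kept in sorted order. A known prefix lets you jump straight to the matching range, while a pattern that starts with `.*` forces a pass over every term. The function names are made up for this sketch.

```python
from bisect import bisect_left


def terms_with_prefix(sorted_terms: list, prefix: str) -> list:
    """Binary-search to the first candidate, then read the contiguous run."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(term)
    return matches


def terms_containing(sorted_terms: list, fragment: str) -> list:
    """A %abc% style match has no shortcut: every term is inspected."""
    return [term for term in sorted_terms if fragment in term]
```

With a few million distinct terms, the second function is the "iterate over many terms" case the documentation warns about.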
Correct, Hibernate Search is much more efficient for this than using a SQL LIKE criterion.
The StandardAnalyzer (org.apache.lucene.analysis.standard.StandardAnalyzer) is a good fit; other analyzers will do more advanced text splitting.
I have read a few articles recently on the combination of mongodb for storage and elasticsearch for indexing/search. I feel like I'm missing something though. Why would you go this route as opposed to just using mongo to index the data? What benefits does elasticsearch bring and is it worth the added complexity?
ElasticSearch implements a lot more features, such as custom splitting of text into words, custom stemming, faceted search and a whole lot more. While MongoDB's (rather simple) text search does some of this, it is not nearly as powerful as ElasticSearch.
If all you ever do is look for a single string in a single field, then MongoDB's normal query system will work excellently for that. If you need to look for words in multiple fields, then MongoDB's text search will work. If you need anything more than that, ElasticSearch is the way to go.
A search engine and a database do some fundamentally different things. A good search engine (like ElasticSearch) supports far more elaborate and complex indexing, facets, highlighting etc. In the case of ElasticSearch, you also get your replies 'real-time'. On the other hand, a search engine doesn't return every single document that matches your query. Instead, it will score documents according to how much they match, and return the top scoring ones. When you query a database such as MongoDB, you should expect it to return everything that matches your query.
You can store the entire document in ElasticSearch, but it is usually not the optimal solution. Normally you will have it configured to return the document IDs, which you use to fetch the documents from a database. MongoDB is a database optimized for document-based storage. This is why you hear about people using them together.
edit:
When this was posted, it matched the recommendations, but this may no longer be the case.
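The division of labour described in this answer (the search engine scores documents and returns the top IDs, the database returns the full documents) can be sketched with in-memory stand-ins in Python. Nothing here calls ElasticSearch or MongoDB; `index` and `store` are plain dicts and the scoring is a deliberately naive term count, all assumed for illustration.

```python
def score(query: str, text: str) -> int:
    """Naive relevance: how many times the query terms occur in the text."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def search_ids(index: dict, query: str, limit: int = 10) -> list:
    """Search-engine side: rank the matching documents, return top IDs only."""
    hits = []
    for doc_id, text in index.items():
        s = score(query, text)
        if s > 0:
            hits.append((s, doc_id))
    hits.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [doc_id for _, doc_id in hits[:limit]]


def fetch(store: dict, ids: list) -> list:
    """Database side: look up the full documents by ID, keeping the ranking."""
    return [store[doc_id] for doc_id in ids if doc_id in store]
```

Note that `search_ids` truncates to `limit`, which mirrors the point that a search engine returns the top-scoring matches instead of every match.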
Derick's answer pretty much nails it. The question behind all this is:
What are the features you want to implement in your application?
If you rely on heavy searching capabilities in large chunks of text, ElasticSearch is probably a good thing to use. If you want to have a flexible datastore that can cope with complex ad-hoc queries, Mongo might be a good fit. If you have different requirements for a datastore, it is often a good thing to combine two tools instead of implementing all kinds of workarounds to make it work with just one datastore.
Choose the right tool for the job.
The documents in my database have names and descriptions among other fields. I would like to allow the users to search for those documents by providing some keywords. The keywords should be looked up in both the name and the description fields. I've read the MongoDB documentation on full text search and it looks really nice and easy if I want to search for keywords in the name field of my documents. However, the description field contains free-form text and can take up to 2000 characters, so potentially there are a few hundred words per document. I could treat them the same way as names and just split the whole description into separate words and store it as another tag-like array (as per the Mongo example), but it seems like a terrible idea: each document's size could be almost doubled, plus there are characters like dots, commas, etc.
I know there are specialized solutions for exactly this kind of problem and I was just looking at Lucene.Net. I also saw Solr mentioned here and there.
Should I be looking to implement this search feature in MongoDB or should I use a specialized solution? Currently I just have one instance of mongod and one instance of a web server. We might need to scale later, but for now that is all I use. I'd appreciate any suggestions on how to implement this feature.
If storing the text split out into an array per the documented approach is not viable (I can understand your concerns), then I think you should look into a specialised solution.
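For reference, the array approach being discussed could look roughly like this in Python. It is a sketch under assumptions: the field name `keywords` and the helper names are hypothetical, and the tokenizer only handles ASCII letters and digits. It strips the dots and commas the question worries about and stores each distinct word once, which limits (but does not remove) the size overhead.

```python
import re


def extract_keywords(text: str) -> list:
    """Lower-case the text, drop punctuation, keep each word once, in order."""
    seen = set()
    result = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if word not in seen:
            seen.add(word)
            result.append(word)
    return result


def with_keywords(doc: dict) -> dict:
    """Return a copy of doc with a 'keywords' array built from both fields."""
    combined = doc.get("name", "") + " " + doc.get("description", "")
    enriched = dict(doc)
    enriched["keywords"] = extract_keywords(combined)
    return enriched
```

An index on the array field would then let a query such as `{"keywords": {"$all": [...]}}` match documents containing all the given words.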
Quote from the MongoDB documentation:
MongoDB has interesting functionality that makes certain search functions easy. That said, it is not a dedicated full text search engine.
So, for more advanced full text search functionality I think a dedicated engine would be more suited. I have no experience in this area so I can't offer much in the way of suggestions from here, other than what my thoughts would be if I was in the same boat:
how much work is involved in using a dedicated full-text search engine instead of MongoDB's functionality?
does that add more complexity / is it worth it?
would it be quicker/simpler to use MongoDB and just take the hit on the extra disk space?
maybe MongoDB will support better full-text functionality in future (it is rapidly evolving after all)
Full-text search support is planned for the future. However, right now you have to go with Solr & friends. The built-in "fulltext" functionality is not really suitable for real-world usage.