MongoDB Text Search Processed Query

I'm using the text search feature and I couldn't find a way to get the stemmed terms of the query. Is there a way to return the list of stemmed words together with the query results, and also the parts of each document that matched? This would help to understand and identify which part of the document matches.
Cheers!

As of MongoDB 2.6, the only metadata a text search exposes is a score indicating the strength of the match (projected with { $meta: "textScore" }). You can submit a ticket on the Core Server project to request this feature (I looked, and I don't think one exists at the moment).
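For reference, a minimal sketch of the one piece of metadata that is available: the textScore, used in both the projection and the sort. It only builds the query documents in JavaScript (mongo shell syntax); the collection name `articles` and the helper `buildTextSearch` are hypothetical. Nothing in the result identifies the stemmed terms or the matching passages.

```javascript
// Hypothetical helper: builds the filter, projection and sort for a text search.
// Assumes the collection has a text index (MongoDB 2.6+).
function buildTextSearch(searchString) {
  // $text runs the search against the text index; stemming happens server-side
  const filter = { $text: { $search: searchString } };
  // The relevance score is the only text-search metadata exposed via $meta
  const projection = { score: { $meta: "textScore" } };
  // Sorting on the same $meta expression returns the strongest matches first
  const sort = { score: { $meta: "textScore" } };
  return { filter, projection, sort };
}

const textQuery = buildTextSearch("running shoes");
// Mongo shell equivalent (assumed collection "articles"):
//   db.articles.find(textQuery.filter, textQuery.projection).sort(textQuery.sort)
console.log(JSON.stringify(textQuery));
```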

Related

MongoDB sorting by documents with more fields filled out

I'm currently trying to write a query that finds documents by zip code (very easy),
e.g. db.col.find({zip: 60010}),
but then sorts them by how many fields on the document are filled out, i.e. not null (not so easy).
How can I do this in a fast and efficient way?
You can do it using $addFields with $objectToArray, $filter (to drop null values) and then $size.
You can refer to this answer.
But it's a costlier option than keeping one field that stores the count of filled fields in each document.
You can decide based on whether your application is read- or write-intensive.
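A minimal sketch of the $objectToArray approach, assuming MongoDB 3.4.4 or later (where $objectToArray was introduced). The field name `filledCount` and both helper functions are hypothetical; the second function reproduces the same ranking in plain JavaScript so the logic can be checked without a server.

```javascript
// Aggregation pipeline (mongo shell / Node driver syntax).
function buildFilledFieldsPipeline(zip) {
  return [
    { $match: { zip: zip } },
    {
      $addFields: {
        filledCount: {
          $size: {
            $filter: {
              // $objectToArray turns the whole document into [{ k, v }, ...]
              input: { $objectToArray: "$$ROOT" },
              as: "field",
              // keep only fields whose value is not null (missing fields never appear)
              cond: { $ne: ["$$field.v", null] },
            },
          },
        },
      },
    },
    // most filled-in documents first; _id is counted for every document, so it
    // does not change the order
    { $sort: { filledCount: -1 } },
  ];
}

// Plain-JavaScript equivalent of the same ranking, for checking the logic.
function rankByFilledFields(docs, zip) {
  return docs
    .filter((d) => d.zip === zip)
    .map((d) => ({
      ...d,
      filledCount: Object.values(d).filter((v) => v !== null && v !== undefined).length,
    }))
    .sort((a, b) => b.filledCount - a.filledCount);
}

const sampleDocs = [
  { _id: 1, zip: 60010, name: "A", phone: null },
  { _id: 2, zip: 60010, name: "B", phone: "555-0100", email: "b@example.com" },
  { _id: 3, zip: 90210, name: "C" },
];
console.log(rankByFilledFields(sampleDocs, 60010).map((d) => d._id)); // [ 2, 1 ]
// After pasting the function into the mongo shell:
//   db.col.aggregate(buildFilledFieldsPipeline(60010))
```

Because filledCount is computed at query time, the $sort cannot use an index, so every document with that zip is processed. Storing the count in each document lets you index something like { zip: 1, filledCount: -1 } instead, at the cost of keeping the count up to date on every write.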

MongoDB: How to remove all documents matching a query and return their ids

With MongoDB, is it possible to specify a query that removes all matching documents while returning their ids (or, alternatively, the whole documents)?
I found "How to get removed document in MongoDB?", which explains how to remove a single document and return it using findAndModify. However, I need to remove many documents at once.
I'm OK with a solution involving the aggregation pipeline, as long as it fulfils the requirements.
I'm using the official C# driver, but solutions in JS are OK.

MongoDB 3 (text search) or Elasticsearch?

MongoDB 3 offers text indexes (http://docs.mongodb.org/manual/core/index-text/). My question is: should I use Elasticsearch or MongoDB 3 with the text index feature? Which is better for searching through lots of entries, and which performs best with 5 million+ entries, as of 2015?
I googled for this information, but I only found outdated answers.
Thanks a lot!
EDIT: My use case is searching titles, descriptions and profiles for keywords. Can MongoDB 3's text index search these fields as fast as (or close to) Elasticsearch?
It depends on your use case. If you need full-text search capabilities, such as finding a document by keywords, or finding a product by keywords that may appear in its title, description, reviews or tags, then Elasticsearch is the thing to go for.
You may also want to evaluate Lucene/Solr for the above use cases.

Full Text Search & Inverted Indexes in MongoDB

I’m playing around with MongoDB for the moment to see what nice features it has. I’ve created a small test suite representing a simple blog system with posts, authors and comments, very basic.
I’ve experimented with a search function that uses the MongoRegex class (PHP driver) to search all post content and post titles for the phrase ‘lorem ipsum’, case-insensitively via the /i flag.
My code looks like this:
$regex = new MongoRegex('/lorem ipsum/i');
// Both conditions must hold: fields listed in one query array are AND-ed
$query = array('post' => $regex, 'post_title' => $regex);
But I’m confused by what happens. I time every query (calling microtime() before and after the query and printing the elapsed time to 15 decimal places).
For my first test I’ve added 110,000 blog documents and 5,000 authors, all randomly generated. When I run my search, it finds 6824 posts containing the phrase “lorem ipsum” and takes 0.000057935714722 seconds. This is after restarting the MongoDB service (on Windows), and without any index other than the default one on _id.
MongoDB uses B-tree indexes, which are definitely not very efficient for full-text search. If I create an index on the post content field, the same query runs in 0.000150918960571 seconds, which, funnily enough, is slower than without any index (by 0.000092983245849 seconds). This can happen for several reasons, since the query now uses a B-tree cursor.
I’ve searched for an explanation of how it can run the query so fast. My guess is that it keeps everything in RAM (I have 4 GB and the database is about 500 MB), which is why I restart the MongoDB service before measuring, to get a full result.
Can anyone with MongoDB experience help me understand what is going on with this kind of full-text search, with or without an index, and certainly without an inverted index?
Sincerely
- Mestika
I think you simply didn't iterate over the results. With just a find(), the driver does not send the query to the server; you need to fetch at least one result for that. I don't believe MongoDB is this fast, and I believe the error is in your benchmark.
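To illustrate the benchmarking point, here is a sketch in plain JavaScript in which a generator stands in for a driver cursor (an assumption for illustration, not the PHP driver's code). As with a lazy cursor, creating it does no work; the scan only happens while iterating. `makeLazyCursor` and `benchmark` are hypothetical names.

```javascript
// A generator models a lazy cursor: its body runs only when results are pulled.
function makeLazyCursor(docs, predicate, stats) {
  return (function* () {
    for (const doc of docs) {
      stats.scanned += 1; // one document examined
      if (predicate(doc)) yield doc;
    }
  })();
}

// Times cursor creation and full iteration separately.
function benchmark(docs, predicate) {
  const stats = { scanned: 0 };

  let start = process.hrtime.bigint();
  const cursor = makeLazyCursor(docs, predicate, stats);
  const createNs = process.hrtime.bigint() - start;
  const scannedAfterCreate = stats.scanned; // still 0: nothing has run yet

  start = process.hrtime.bigint();
  let matches = 0;
  for (const _doc of cursor) matches += 1; // iterating triggers the actual scan
  const iterateNs = process.hrtime.bigint() - start;

  return { createNs, iterateNs, scannedAfterCreate, scannedAfterIterate: stats.scanned, matches };
}

const demo = benchmark(
  [{ post: "Lorem ipsum dolor" }, { post: "other text" }, { post: "lorem ipsum" }],
  (d) => /lorem ipsum/i.test(d.post)
);
console.log(demo.scannedAfterCreate, demo.scannedAfterIterate, demo.matches); // 0 3 2
```

Timing only the cursor creation measures close to nothing; the time that matters is spent in the iteration.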
Second, a regular expression that is not anchored at the beginning of the field's value with ^ cannot use an index efficiently; the same is true of case-insensitive matches such as /i. You should play with explain() to see what is actually happening.
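The anchoring rule follows from how a B-tree orders its keys. In this sketch a sorted array stands in for the index keys (an illustrative assumption, not MongoDB's implementation): keys starting with a fixed prefix sit in one contiguous run that binary search can find, while an unanchored pattern such as /lorem ipsum/ can occur anywhere in any key, so every key must be tested. `lowerBound`, `prefixScan` and `fullScan` are hypothetical helpers.

```javascript
// Index of the first key >= target in a sorted array (binary search).
function lowerBound(sorted, target) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Like /^prefix/ on an indexed field: jump to the start of the run, stop at its end.
// keysChecked does not count the O(log n) binary-search probes.
function prefixScan(sortedKeys, prefix) {
  const matches = [];
  let keysChecked = 0;
  for (let i = lowerBound(sortedKeys, prefix); i < sortedKeys.length; i++) {
    keysChecked += 1;
    if (!sortedKeys[i].startsWith(prefix)) break; // past the contiguous run
    matches.push(sortedKeys[i]);
  }
  return { matches, keysChecked };
}

// Like /lorem ipsum/: a match may be anywhere, so every key is tested.
// Pass a regex without the g flag, since test() with g is stateful.
function fullScan(sortedKeys, re) {
  const matches = sortedKeys.filter((k) => re.test(k));
  return { matches, keysChecked: sortedKeys.length };
}

const indexKeys = ["zeta lorem ipsum", "lorem ipsum a", "alpha", "lorem x", "lorem ipsum b"].sort();
console.log(prefixScan(indexKeys, "lorem ipsum").keysChecked); // 3
console.log(fullScan(indexKeys, /lorem ipsum/).keysChecked); // 5
```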

Sphinx search engine and related tags

I'm using the Sphinx search engine to index all my intranet documents using tags. With that, I have no trouble finding specific documents with one or more tags.
I want to go further with a new feature like Stack Overflow's "related tags" feature.
Does anybody know the best way to do this with Sphinx?
Thanks
You run a boolean OR query on all terms in the document you want to find related items for. It can be fairly slow because every document in the database has to be ranked on similarity, unless you limit the search using AND-ed terms. See my text here: https://stackoverflow.com/questions/3121266/efficient-item-similarity-search-using-sphinx
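A sketch of the query construction described above, assuming Sphinx's extended query syntax (extended matching mode or SphinxQL's MATCH()), where | means OR, whitespace means AND and parentheses group terms. The escaping is only roughly modelled on the PHP client's EscapeString(); `escapeSphinx`, `buildRelatedQuery` and `rankBySharedTags` are hypothetical, and the shared-count ranking is a simplification of Sphinx's real ranker, meant only to show why every candidate has to be scored.

```javascript
// Backslash-escape characters that act as operators in Sphinx's extended syntax
// (character list roughly modelled on the PHP client's EscapeString()).
function escapeSphinx(term) {
  return term.replace(/[\\()|\-!@~"&\/^$=<]/g, (ch) => "\\" + ch);
}

// OR together the source document's terms; optional required terms are AND-ed
// (whitespace) in front to shrink the set of documents that must be ranked.
function buildRelatedQuery(terms, requiredTerms = []) {
  const anyOf = "(" + terms.map(escapeSphinx).join(" | ") + ")";
  return [...requiredTerms.map(escapeSphinx), anyOf].join(" ");
}

// Simplified stand-in for ranking: score each candidate by shared tags.
function rankBySharedTags(sourceTags, docs) {
  const wanted = new Set(sourceTags);
  return docs
    .map((d) => ({ id: d.id, shared: d.tags.filter((t) => wanted.has(t)).length }))
    .filter((r) => r.shared > 0) // an OR query only returns docs with at least one term
    .sort((a, b) => b.shared - a.shared);
}

console.log(buildRelatedQuery(["mongodb", "full-text"], ["search"]));
// search (mongodb | full\-text)
```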