MongoDB indexing full text search

I have a collection named "posts" whose documents look like this:
{
"_id" : ObjectId("5afc22290c06a67f081fa463"),
"title" : "Cool",
"description" : "this is amazing"
}
And I have created a text index on title and description:
db.posts.createIndex( { title: "text", description: "text" } )
The problem is that when I search for, say, "amaz", it returns the document containing "this is amazing" above, whereas it should only return that document when I search for "amazing":
db.posts.find({ $text: { $search: 'amaz' } }, (err, results) => {
return res.json(results);
});

Credit to @amenadiel for the original data here:
https://stackoverflow.com/a/24316510/7948962
From the MongoDB docs:
https://docs.mongodb.com/manual/core/index-text/
Index Entries
text index tokenizes and stems the terms in the indexed fields for the index entries. text index stores one index entry for each unique stemmed term in each indexed field for each document in the collection. The index uses simple language-specific suffix stemming.
This lets you search on a stemmed term and have the database return all related results. In your scenario, amaz looks like an odd stem because it is not a word on its own, unlike the stems of other words: talking and talked both stem to talk, and walking and walked both stem to walk.
In your case, the word amazing in your text is stemmed to amaz. If the field contained a word such as amazed, it would receive the same amaz stem, and those documents would also be returned by a search for amaz.
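If only whole words should match, one option is to turn stemming off. The sketch below is a minimal, untested illustration that assumes the documented default_language: "none" index option (simple tokenization, no stemming, no stop words) and assumes the existing index kept its default generated name; verify the name with db.posts.getIndexes() before dropping anything.

```javascript
// Index keys from the question, plus an option that disables stemming.
const postsKeys = { title: "text", description: "text" };
const postsOptions = { default_language: "none" }; // no stemming, no stop words

// Runs only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  // A collection can hold only one text index, so the old one has to go first.
  // "title_text_description_text" is an assumed name; check getIndexes().
  db.posts.dropIndex("title_text_description_text");
  db.posts.createIndex(postsKeys, postsOptions);

  // With stemming off, "amaz" is no longer expected to match "amazing".
  printjson(db.posts.find({ $text: { $search: "amaz" } }).toArray());
}
```

The trade-off is that searches for amazed or amazes would then no longer find documents containing amazing.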

Related

How to write a MongoDB query for search/filter

I want to filter database records by an input text and show the matching data. As shown below, I have tried a $text query. It works for text values, but how do I filter on numeric fields such as id or date?
find({$text :{$search:<input text>, $caseSensitive: false}})
I want the output to be the data that matches the input text/value (which can be text, a date, or a number).
MongoDB provides text indexes to support text search queries on string content. text indexes can include any field whose value is a string or an array of string elements.
To perform text search queries, you must have a text index on your collection:
db.stores.createIndex( { name: "text", description: "text" } )
You can then pass your search string in the $search parameter:
db.stores.find( { $text: { $search: "coffee", $caseSensitive: false } } )
Full documentation is available here : https://docs.mongodb.com/manual/text-search/
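Text indexes only cover string content, so numeric or date input needs its own clause. Here is a minimal sketch of one way to build such a filter. The field names id and createdAt are hypothetical (the question does not name them), and the helper buildSearchFilter is made up for illustration. Note that MongoDB requires every clause of an $or that contains $text to be backed by an index, so id and createdAt would each need their own regular index.

```javascript
// Builds a find() filter from raw user input.
// `id` and `createdAt` are hypothetical field names, not taken from the question.
function buildSearchFilter(input) {
  const text = String(input).trim();
  const clauses = [{ $text: { $search: text, $caseSensitive: false } }];

  // Numeric input: also match the numeric field exactly.
  if (text !== "" && Number.isFinite(Number(text))) {
    clauses.push({ id: Number(text) });
  }

  // Input shaped like YYYY-MM-DD: also match that whole UTC day.
  if (/^\d{4}-\d{2}-\d{2}$/.test(text) && !Number.isNaN(Date.parse(text))) {
    const start = new Date(text + "T00:00:00Z");
    const end = new Date(start.getTime() + 24 * 60 * 60 * 1000);
    clauses.push({ createdAt: { $gte: start, $lt: end } });
  }

  return clauses.length === 1 ? clauses[0] : { $or: clauses };
}
```

Usage in the shell would be along the lines of db.stores.find(buildSearchFilter(userInput)).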

Mongo full text search doesn't find

I'm trying to implement full text search in my Mongo database. It's a database of audio track metadata. I want to search by the artistName and title of a track. I have these records in the tracks collection (showing only the important fields):
db.tracks.find({},{artistName: 1, title: 1})
{ "_id" : "A10328E00047516670", "artistName" : "Tapani Kansa", "title" : "Tuulia" }
{ "_id" : "A10328E00047516661", "artistName" : "Tapani Kansa", "title" : "Rakkautemme valssi" }
{ "_id" : "A10328E0004751669W", "artistName" : "Tapani Kansa", "title" : "Täysikuu" }
{ "_id" : "A10328E0004751668Y", "artistName" : "Tapani Kansa", "title" : "Muista minua" }
I've created the text index on this collection:
db.tracks.createIndex({artistName: 'text', title: 'text', lyrics: 'text'})
But when I try to search the tracks, no results are returned:
rs-ds047345:PRIMARY> db.tracks.find({$text: {$search: 'Tapani'}}).size()
0
rs-ds047345:PRIMARY> db.tracks.find({$text: {$search: 'Rakkautemme valssi'}}).size()
0
I accidentally noticed that when I crop some letters from the end of the searched word, I start getting some results, so full text search somehow works, just not in the way I would like and expect.
db.tracks.find({$text: {$search: 'Tapa'}}).size()
12
rs-ds047345:PRIMARY> db.tracks.find({$text: {$search: 'Rakkaute'}}).size()
1
Could someone please tell me how I can search the database using full words, or what I'm doing wrong?
I've tried that on MongoDB versions 3.0.8 and 3.2.1
According to the spec:
For case insensitive and diacritic insensitive text searches, the
$text operator matches on the complete stemmed word. So if a document
field contains the word blueberry, a search on the term blue will not
match. However, blueberry or blueberries will match.
What I would suggest is a normal index and a regex search:
db.tracks.createIndex({"artistName": 1})
db.tracks.createIndex({ "title" : 1})
db.tracks.createIndex({ "lyrics": 1})
db.tracks.find({artistName: /^Tap/}).explain()
A case-sensitive regex anchored with ^ (a prefix expression) lets MongoDB use the index on artistName instead of doing a collection scan (COLLSCAN).
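When the search term comes from user input, it has to be passed as a real regular expression (not a quoted string) and any regex metacharacters in it should be escaped. A minimal sketch, where escapeRegExp and prefixFilter are hypothetical helper names introduced only for this example:

```javascript
// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds a case-sensitive "starts with" filter for one field.
// Anchoring with ^ keeps the regex a prefix expression.
function prefixFilter(field, prefix) {
  return { [field]: new RegExp("^" + escapeRegExp(prefix)) };
}
```

In the shell this would be used as db.tracks.find(prefixFilter("artistName", "Tap")).explain(), which is a convenient way to check whether the plan reports an index scan.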
was testing on 3.0.6 and 3.2.3 with no luck :(
So, the problem was in the documents stored in the database. I hadn't noticed that they contain a field named language, which changes full text search behaviour, even though I had tried to disable word stemming by setting language: 'none' in the index and in the queries.
When I renamed the language field to a different name, the full text search started to work exactly as I expect.
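If renaming the field is not practical, the text index documentation also describes a language_override option that tells the index to read the per-document language from a different field. The following is a sketch only: the field name textSearchLanguage is a made-up placeholder, and any existing text index on the collection would have to be dropped first.

```javascript
// Same keys as in the question.
const trackKeys = { artistName: "text", title: "text", lyrics: "text" };

// Assumption: no document has a field called "textSearchLanguage",
// so the documents' own `language` field is ignored by the index.
const trackOptions = {
  default_language: "none",               // no stemming, no stop words
  language_override: "textSearchLanguage" // hypothetical placeholder name
};

// Runs only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.tracks.createIndex(trackKeys, trackOptions);
  printjson(db.tracks.find({ $text: { $search: "Tapani" } }).count());
}
```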

Get text words from query

I've read the MongoDB documentation on getting the indexes within a collection, and have also searched SO and Google for my question. I want to get the actual indexed values.
Or maybe my understanding of how MongoDB indexes work is incorrect. If I've been indexing a field called text that contains paragraphs, am I right in thinking that what gets indexed is each word in the paragraph?
In either case, I want to retrieve the values that were indexed, which db.collection.getIndexes() doesn't seem to return.
Well yes and no, in summary.
Indexes work on the "values" of the fields they are supplied to index, and are much like a "card index" in that there is a point of reference to look at to find the location of something that matches that term.
What you seem to be asking about here is "text indexes". This is a special index format, in MongoDB and in other databases as well, that looks at the "text" content of a field and breaks down every "word" in that content into a value in the index.
Typically we do:
db.collection.createIndex({ "text": "text" })
Where the "field name" here is "text" as you asked, but more importantly the type of index here is "text".
This allows you to then insert data like this:
db.collection.insert({ "text": "The quick brown fox jumped over the lazy dog" })
And then search like this, using the $text operator:
db.collection.find({ "$text": { "$search": "brown fox" } })
This returns the documents whose indexed text matches any of the terms in your query. Each match also gets a relevance score, which you can project and sort on with { "$meta": "textScore" } to "rank" the results.
Note that a "text" index and its query do not target a specific field, although the index itself can be built over multiple fields. The main constraint is that there can be only one text index on any given collection; trying to create a second one results in an error.
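To make the ranking explicit, the relevance score can be projected and used as the sort key. A minimal sketch, reusing the placeholder collection name from above:

```javascript
// Filter on the text index and expose the relevance score as `score`.
const foxFilter = { $text: { $search: "brown fox" } };
const byTextScore = { score: { $meta: "textScore" } };

// Runs only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.collection
    .find(foxFilter, byTextScore) // projection adds the score to each result
    .sort(byTextScore)            // highest score first
    .forEach(printjson);
}
```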
As per MongoDB's docs:
"db.collection.getIndexes() returns an array of documents that hold index information for the collection. Index information includes the keys and options used to create the index. For information on the keys and index options, see db.collection.createIndex()."
You first have to create the index on the collection, using the createIndex() method:
db.records.createIndex( { userid: 1 } )
Queries on the userid field are supported by the index:
Example:
db.records.find( { userid: 2 } )
db.records.find( { userid: { $gt: 10 } } )
Indexes help you avoid scanning the whole collection. They are basically references or pointers to specific documents in your collection.
The docs explain it better:
http://docs.mongodb.org/manual/tutorial/create-an-index/

mongoDB text index on subdocuments

I have a collection that looks something like this
{ "text1" : "text",
"url" : "http:....",
"title" : "the title",
......,
"search_metadata" : { "tags" : [ "tag1", "tag2", "tag3" ],
"title" : "the title",
"topcis": [ "topic1", "topic2"]
}
}
I want to be able to add a text index to search_metadata and all its subdocuments.
ensureIndex({search_metadata:"text"}) gives me no results
and:
ensureIndex({"$**":"text"}) gives me irrelevant data
How can I make it happen?
From the text indexes page:
text indexes can include any field whose value is a string or an array
of string elements. To perform queries that access the text index, use
the $text query operator
Your search_metadata field is a sub-document, not a string or an array of strings, so as currently structured it is not in the right format to make use of a text index in MongoDB.
Now, embedded in search_metadata you have both strings and arrays of strings, so you could use a text index on those. An index on { "search_metadata.tags": "text" }, for example, fits the criteria and should work just fine.
Hence, it's a choice between restructuring the field to meet the text index criteria, or a matter of indexing the relevant sub-fields. If you take the latter approach you may find that you don't need text indexes on each of the fields and a simpler (and far smaller) index may serve you just as well (using a normal index on tags and then $elemMatch for example).
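Indexing the relevant sub-fields could look like the sketch below. The collection name is a placeholder, and the keys are copied from the sample document, including the topcis spelling it uses.

```javascript
// One text index spanning the string and string-array sub-fields.
const metadataKeys = {
  "search_metadata.tags": "text",
  "search_metadata.title": "text",
  "search_metadata.topcis": "text" // spelled as in the sample document
};

// Runs only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.collection.createIndex(metadataKeys);
  db.collection.find({ $text: { $search: "tag1" } }).forEach(printjson);
}
```

Dotted field names have to be quoted when used as keys, which is why each one is written as a string here.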

mongodb full text search index all fields vs index one field

The latest stable MongoDB version is currently v2.4.6, and full text search is in beta.
Which one will be faster:
1. Create one field named "text", extract all text from all the other fields, concatenate it into that field, and index it ("text").
That would look like:
db.collection.ensureIndex( { "text": "text" } );
2. Just index all fields:
db.collection.ensureIndex( { "$**": "text" } );
Which one is better in performance and in disk storage requirements?