I have created the following collection and text index:
db.test1.insert({"name":"kumar goyal","email":'rakesh.goyal#gmail.com',"phoneNo":9742140651});
db.test1.ensureIndex({ "$**": "text" },{ name: "TextIndex" })
Partial search is working for name and email field e.g.
db.test1.runCommand("text",{search:'rakesh'});
returns the record properly but its not working on the phoneNo field.
db.test1.runCommand("text",{search:'9742'});
is not working.
I guess text index is not working on number fields. Is there anyway to make it work in mongodb?
You are inserting the telephone number as a number, but then searching for it as a string.
Perhaps you should consider inserting the telephone number as a string. You're not planning to perform any arithmetic actions on that field are you?
This way mongo will be able to perform textual searches (like the one you gave as an example) on the "numbers" but will treat them as strings.
Mongo DB full text does not support partial search. Hence it was not working, nothing to do with numbers.
Related
When trying to search for part of an email address, I see an issue -
When I search for a part of the email address containing the domain (eg- "prateeknar#gma" when email ID is "prateeknar#gmail.com"), I get all the records in the Collection.
db.getCollection('Employee').find({$text:{$search:"prateeknar#gm"}}) -. returns all records in the collection
However, when I search for just the username or username followed by "2" (eg - "prateeknar" or "prateeknar#"), I get the right results.
db.getCollection('Employee').find({$text:{$search:"prateeknar"}}) -. returns the results properly
As pointed out by #AlexBlex - delimiting with double quotes is a solution.This solves the issue but it adds a lot of latency. Can we reduce the latency in some other way?
What is the issue? How can I fix this ?
Use regex For searching
Ref Link : mongoRegex
db.Employee.find({email:{$regex:"text"}})
try to remove the text index and do normal index on your email filed.
then you can do a simple regex to find it
db.getCollection('Employee').find({email:/^prateeknar#gm/})
Mongo DB search by index returns true if some of the string matches something in the collection
when you search for "prateeknar", only one document matches that
however when you search for "prateeknar#gm" probably most of your documents match #gm and mongo is returning those also, try adding some emails that aren't gmail and you should get no return from those
Is there a difference between a wildcard search index like $** and text indexes that I create for each of the fields in the collection ?
I do see a small difference in response time when I individually create text indexes. Using individual indexes, returns a better response. I am not able to post an example now, but will try to.
A wildcard text search will index every field that contains string data for each document in the collection (https://docs.mongodb.com/manual/core/index-text/#wildcard-text-indexes).
Because you are essentially increasing the number of fields indexed with a wild card text index, it would take longer to run compared to targeting specific fields for a text index.
Since you can only have one text index per collection (https://docs.mongodb.com/manual/core/index-text/#create-text-index), its worth considering which fields you plan on querying against beforehand.
Example:
{
shortName: "KITT",
longName: "Knight Industries Two Thousand",
fromZeroToSixty: 2,
year: 1982,
manufacturer: "Pontiac",
/* 25 more fields */
}
Ability to query by at least 20 fields which means that only 10 fields are left unindexed
There's 3 fields (all number) that could be used for sorting (both ways)
This leaves me wondering that how does sites with lots of searchable fields do it: e.g real estate or car sale sites where you can filter by every small detail and can choose between several sort options.
How could I pull this off with MongoDB? How should I index that kind of collection?
Im aware that there are dbs specifically made for searching but there must be general rules of thumb to do this (even if less performant) in every db. Im sure not everybody uses Elasticsearch or similar.
---
Optional reading:
My reasoning is that index could be huge but the index order matters. You'll always make sure that fields that return the least results are first and most generic fields are last in index. However, what if user chooses only generic fields? Should I include non-generic fields to query anyway? How to solve ordering in both ways? Or index intersection saves the day and I should just add 20 different indexes?
text index is your friend.
Read up on it here: https://docs.mongodb.com/v3.2/core/index-text/
In short, it's a way to tell mongodb that you want full text search over a specific field, multiple fields, or all fields (yay!)
To allow text indexing of all fields, use the special symbol $**, and define it of type 'text':
db.collection.createIndex( { "$**": "text" } )
you can also configure it with Case Insensitivity or Diacritic Insensitivity, and more.
To perform text searches using the index, use the $text query helper, see: https://docs.mongodb.com/v3.2/reference/operator/query/text/#op._S_text
Update:
In order to allow user to select specific fields to search on, it's possible to use weights when creating the text-index: https://docs.mongodb.com/v3.2/core/index-text/#specify-weights
If you carefully select your fields' weights, for example using different prime numbers only, and then add the $meta text score to your results you may be able to figure out from the "textScore" which field was matched on this query, and so filter out the results that didn't get a hit from a selected search field.
Read more here: https://docs.mongodb.com/v3.2/tutorial/control-results-of-text-search/
Since it is not possible to find "blueberry" by the word "blue" by using a mongodb full text search, I want to help my users to complete the word "blue" to "blueberry". To do so, is it possible to query all the words in a mongodb full text index -> that I can use the words as suggestions i.e. for typeahead.js?
Language stemming in text search uses an algorithm to try to relate words derived from a common base (eg. "running" should match "run"). This is different from the prefix match (eg. "blue" matching "blueberry") that you want to implement for an autocomplete feature.
To most effectively use typeahead.js with MongoDB text search I would suggest focusing on the prefetch support in typeahead:
Create a keywords collection which has the common words (perhaps with usage frequency count) used in your collection. You could create this collection by running a Map/Reduce across the collection you have the text search index on, and keep the word list up to date using a periodic Incremental Map/Reduce as new documents are added.
Have your application generate a JSON document from the keywords collection with the unique keywords (perhaps limited to "popular" keywords based on word frequency to keep the list manageable/relevant).
You can then use the generated keywords JSON for client-side autocomplete with typeahead's prefetch feature:
$('.mysearch .typeahead').typeahead({
name: 'mysearch',
prefetch: '/data/keywords.json'
});
typeahead.js will cache the prefetch JSON data in localStorage for client-side searches. When the search form is submitted, your application can use the server-side MongoDB text search to return the full results in relevance order.
A simple workaround I am doing right now is to break the text into individual chars stored as a text indexed array.
Then when you do the $search query you simply break up the query into chars again.
Please note that this only works for short strings say length smaller than 32 otherwise the indexing building process will take really long thus performance will be down significantly when inserting new records.
You can not query for all the words in the index, but you can of course query the original document's fields. The words in the search index are also not always the full words, but are stemmed anyway. So you probably wouldn't find "blueberry" in the index, but just "blueberri".
Don't know if this might be useful to some new people facing this problem.
Depending on the size of your collection and how much RAM you have available, you can make a search by $regex, by creating the proper index. E.g:
db.collection.find( {query : {$regex: /querywords/}}).sort({'criteria': -1}).limit(limit)
You would need an index as follows:
db.collection.ensureIndex( { "query": 1, "criteria" : -1 } )
This could be really fast if you have enough memory.
Hope this helps.
For those who have not yet started implementing any database architecture and are here for a solution, go for Elasticsearch. Its a json document driven database similar to mongodb structurally. It has "edge-ngram" analyzer which is really really efficient and quick in giving you did you mean for mis-spelled searches. You can also search partially.
I spent several hours reading through docs and forums, trying to find a solution for the following problem:
In A Mongo database, I have a collection with some unstructured data:
{"data" : "some data" , "_id" : "497ce96f395f2f052a494fd4"}
{"more_data" : "more data here" ,"recursive_data": {"some_data": "even more data here", "_id" : "497ce96f395f2f052a4323"}
{"more_unknown_data" : "string or even dictionaries" , "_id" : "497ce96f395f2f052a494fsd2"}
...
The catch is that the elements in this collections don't have a predefined structure and they can be unlimited levels.
My goal is to create a query, that searches through the collection and finds all the elements that match a regular expression( in both the keys and the values ).
For example, if I have a regex: '^even more' - It should return all the elements that have the string "even more" somewhere in the structure. In this case - that will be the second one.
Simply add an array to each object and populate it with the strings you want to be able to search on. Typically I'd lowercase those values to make case-insensitive search easy.
e.g. Tags : ["copy of string 1", "copy of string 2", ...]
You can extend this technique to index every word of every element. Sometimes I also add the field with an identifier in front of it, e.g. "genre:rock" which allows searches for values in specific fields (choose the ':' character carefully).
Add an index on this array and now you have the ability to search for any word or phrase in any document in the collection and you can search for "genre:rock" to search for that value in a specific field.
Ever if you will find a way to do this you will still face problem of slow searches, becouse there are no indexes
I had similar problem and solution was to create additional database(on same engine or any other more suitable for search) and to fill it with mongo keys and combined to one text field data. And to update it whenever mongodb data updates.
If it suitable you can also try to go this way...At least search works very fast. (I used postgresql as search backend)