Text Indexes MongoDB, Minimum length of search string - mongodb

I have created a text index for collection X from the mongo shell:
db.X.ensureIndex({name: 'text', cusines: 'text', 'address.city': 'text'})
Now, suppose a document's name property has the value seasons (length 7). If I run the find query with a search string of length <= 5:
db.X.find({$text: {$search: 'seaso'}})
it does not return anything. If I change the search string to season (length >= 6), it returns the document.
Now my question is: does the search string have some minimum length constraint to fetch the records?
If yes, is there any way to change it?

MongoDB $text searches do not support partial matching. MongoDB supports text search queries on string content, with support for case insensitivity and stemming.
Looking at your examples:
// this returns nothing because there is no inferred association between
// the value: 'seasons' and your input: 'seaso'
db.X.find({$text: {$search: 'seaso'}})
// this returns a match because 'seasons' stems to 'season', which matches your input
db.X.find({$text: {$search: 'season'}})
So this is not an issue with the length of your input. Searching on seaso returns no matches because:
Your text index does not contain the whole word: seaso
Your text index does not contain a whole word for which seaso is a recognised stem
This presumes that the language of your text index is English. You can confirm this by running db.X.getIndexes(); you'll see this in the definition of your text index:
"default_language" : "english"
FWIW, text search is case insensitive, so the following will also return matches:
db.X.find({$text: {$search: 'SEaSON'}})
db.X.find({$text: {$search: 'SEASONs'}})
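To illustrate why the match depends on whole (stemmed) words rather than on the length of the input, here is a toy model in Python. It rests on a loud simplifying assumption: the hypothetical toy_stem function only lower-cases and strips one trailing "s", whereas MongoDB uses a Snowball stemmer for English and a real tokenizer that also handles punctuation and stop words.

```python
import re


def toy_stem(word):
    """Hypothetical, heavily simplified stemmer: lower-case, then drop one trailing 's'.
    MongoDB's real English stemmer (Snowball) is far more thorough."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def index_terms(text):
    """Split a field value into words and stem each one, as a text index would."""
    return {toy_stem(w) for w in re.findall(r"\w+", text)}


def text_search(field_value, search):
    """True if any stemmed search term equals a stemmed indexed term (OR semantics)."""
    terms = index_terms(field_value)
    return any(toy_stem(w) in terms for w in re.findall(r"\w+", search))


doc_name = "seasons"
print(text_search(doc_name, "seaso"))    # False: 'seaso' is neither an indexed word nor its stem
print(text_search(doc_name, "season"))   # True
print(text_search(doc_name, "SEaSON"))   # True: comparison is case-insensitive
print(text_search(doc_name, "SEASONs"))  # True
```

The only thing that matters is whether the normalized search term equals a normalized indexed term, which is why no length setting would make seaso match.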
Update 1: in response to the question "is it possible to use RegExp?":
Assuming the name attribute contains the value seasons and you are searching with seaso, the following will match your document:
db.X.find({name: {$regex: /^seaso/}})
More details are in the docs, but in brief:
This will not use your text index, so if you proceed with the $regex operator you won't need the text index.
Index coverage with the $regex operator is probably not what you expect. The short summary: if your pattern is anchored to the start of the string (i.e. /^seaso/ rather than /easons/) and case sensitive, MongoDB can use an index on the field to scan only a narrow range of keys; otherwise it has to examine every index key or every document.
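Python's re module makes the anchored/unanchored difference easy to see locally. In this sketch the helper name prefix_filter is hypothetical; it builds the pymongo form of a prefix query on the name field from the question (it only constructs the filter dict, nothing is sent to a server). re.escape is used so that characters such as . in user input are matched literally.

```python
import re


def prefix_filter(field, prefix):
    """Hypothetical helper: build a pymongo filter matching values of `field` that start
    with `prefix`. The leading ^ anchors the pattern to the start of the string."""
    return {field: {"$regex": "^" + re.escape(prefix)}}


query = prefix_filter("name", "seaso")
print(query)  # {'name': {'$regex': '^seaso'}}

# Local check of the same patterns against the stored value.
value = "seasons"
print(bool(re.search(query["name"]["$regex"], value)))  # True: 'seasons' starts with 'seaso'
print(bool(re.search("^easons", value)))                # False: anchored, and 'seasons' does not start with 'easons'
print(bool(re.search("easons", value)))                 # True, but an unanchored pattern can't narrow an index scan
```

With pymongo, the resulting dict would be passed straight to Collection.find().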

Related

Is there a ts (text search) function would return found string instead of boolean?

I am using PostgreSQL to find the matched strings in an article, using tsvector and tsquery.
I read section 12.3, Controlling Text Search, of the PostgreSQL manual, but nothing there helped me get the exact output I wanted.
Query:
SELECT ts_headline('english',
'The most common type of search
is to find all documents containing given query terms
and return them in order of their similarity to the
query.',
to_tsquery('query & similarity'),
'StartSel = <, StopSel = >');
ts_headline output
The most common type of search
is to find all documents containing given <query> terms
and return them in order of their <similarity> to the
<query>.
I'm looking for only the matched strings, as shown below:
query, similarity
If you pick delimiters for StartSel and StopSel that you are sure do not exist elsewhere in the string, then it is pretty easy to do this with a regexp.
SELECT distinct regexp_matches[1] from
regexp_matches(
ts_headline('english',
'The most common type of search
is to find all documents containing given query terms
and return them in order of their similarity to the
query.',
to_tsquery('query & similarity'),
'StartSel = <, StopSel = >'
),
'<(.*?)>','g'
);
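If you would rather do the extraction in application code, the same non-greedy pattern works client-side. A Python sketch, assuming the ts_headline result has already been fetched as a string (hard-coded here) and that < and > appear nowhere else in the text. Unlike SELECT DISTINCT, which does not guarantee any order, dict.fromkeys keeps the first-seen order.

```python
import re

# ts_headline output from the query above, as a client would receive it
# (assumes StartSel = < and StopSel = >, and no other < or > in the text).
headline = """The most common type of search
is to find all documents containing given <query> terms
and return them in order of their <similarity> to the
<query>."""

# Non-greedy capture between the delimiters, then de-duplicate in first-seen order.
matches = re.findall(r"<(.*?)>", headline)
distinct_terms = list(dict.fromkeys(matches))
print(", ".join(distinct_terms))  # query, similarity
```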

Pymongo find document whose field is a substring of a given string

Let's say we have a collection with the following documents:
{_id : 1, str : 'hello'}
{_id : 2, str : 'hello world'}
{_id : 3, str : 'world'}
I would like to find documents whose str field is a substring of "hello world!". Is there a way to do this in pymongo?
I know how to do the opposite: getting documents whose field contains a string can be done with $regex. What I want is the documents whose field is contained in a string.
You can use text indexes for this, which support text search queries on string content. Text indexes can include any field whose value is a string or an array of string elements.
Here's a minimal example using pymongo:
import pymongo

# Get database connection
conn = pymongo.MongoClient('mongodb://localhost:27017/')
coll = conn.get_database('test').get_collection('test')
# Create text index
coll.create_index([('str', pymongo.TEXT)])
# Text search: matches documents containing the word 'hello' OR the word 'world'
print(list(coll.find({'$text': {'$search': 'hello world'}})))
With your example documents, this will result in:
[{'_id': 3.0, 'str': 'world'},
{'_id': 2.0, 'str': 'hello world'},
{'_id': 1.0, 'str': 'hello'}]
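Note that $text matches on words, not on substrings: a document with str "hello there" would also be returned for the search "hello world", because it shares the word "hello". If you need strict containment, one option is a client-side check on the $text results. A plain-Python sketch, with the documents hard-coded in place of a real cursor; the fourth document and the helper name contained_in are hypothetical.

```python
# Sample documents from the question; in practice these would come from coll.find(...).
candidates = [
    {"_id": 1, "str": "hello"},
    {"_id": 2, "str": "hello world"},
    {"_id": 3, "str": "world"},
    {"_id": 4, "str": "hello there"},  # hypothetical: shares a word but is not a substring
]

target = "hello world!"


def contained_in(docs, text):
    """Keep only documents whose 'str' value is a string occurring verbatim inside `text`."""
    return [d for d in docs if isinstance(d.get("str"), str) and d["str"] in text]


print([d["_id"] for d in contained_in(candidates, target)])  # [1, 2, 3]
```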
For more information, please see:
Text Indexes
$text operator

How can I query several words in indexed fields in pymongo?

When I want to execute an indexed text search, I use the following command:
text_results = db.command('text', 'foo', search=query)
I am now wondering how I can query several words. I already tried setting query = ['word1', 'word2'], but that does not work.
Using the search string "word1 word2" searches for the term word1 OR the term word2:
text_results = db.command('text', 'foo', search='word1 word2')
Also, here's a quote from docs:
If the search string includes phrases, the search performs an AND with
any other terms in the search string; e.g. search for "\"twinkle
twinkle\" little star" searches for "twinkle twinkle" and ("little" or
"star").
So to search where the field contains "word1" AND "word2", use:
text_results = db.command('text', 'foo', search="\"word1\" \"word2\"")
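On current MongoDB versions the same search-string syntax is used with the $text query operator rather than the older text command. A minimal sketch, assuming the quoted-phrase rule from the docs excerpt above applies to $text as well: the hypothetical helper all_words_search quotes each word so that every word is required, and the code only builds the filter you would pass to pymongo's Collection.find() on a collection that has a text index.

```python
def all_words_search(words):
    """Hypothetical helper: wrap each word in double quotes so that all of them are
    required (AND). Any embedded double quotes are stripped for simplicity."""
    return " ".join('"{}"'.format(w.replace('"', "")) for w in words)


search = all_words_search(["word1", "word2"])
print(search)  # "word1" "word2"

# Filter for Collection.find(); a plain " ".join(words) would give OR semantics instead.
text_filter = {"$text": {"$search": search}}
print(text_filter)
```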

Mongodb count query to search for strings containing either one string or the other

Hi, I'm trying to get a count of the documents in a MongoDB collection that contain any one of several strings (words). I have around 50 words. I am aware that I need to use an "or" query here.
Here is the query I tried, but I am not sure if it is correct:
db.collection.find({"created_at": /^sep 23.*/i, "$and": [{ "text": /.*abc.*/i },{ "text": /.*efg.*/i }]}).count()
You can do this by using $in which acts as an OR match against a single field:
db.collection.find({created_at: /^sep 23/i, text: {$in: [/abc/i, /efg/i] }}).count()
And you can simplify your regular expressions a bit to remove the .* parts because those are already implied.
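With around 50 words, writing each regex by hand gets tedious. A minimal sketch, assuming pymongo (which accepts compiled Python patterns in filters): build the $in list from a Python list of words. The word list is a placeholder, and re.escape guards against words that contain regex metacharacters. The resulting dict could be passed to pymongo's Collection.count_documents(); here it is only built and checked locally.

```python
import re

# Placeholder list; in practice this would hold the ~50 words to look for.
words = ["abc", "efg", "c++"]

query = {
    # Same case-insensitive prefix pattern as the question's created_at condition.
    "created_at": re.compile(r"^sep 23", re.IGNORECASE),
    # $in with regexes: a document matches if "text" matches ANY of the patterns (OR).
    "text": {"$in": [re.compile(re.escape(w), re.IGNORECASE) for w in words]},
}

# Local sanity check of the OR semantics against a sample value.
sample = "Learning C++ today"
print(any(p.search(sample) for p in query["text"]["$in"]))  # True
```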
Given you have not specified that you want to search a specific field in a document, I would suggest the text search option described here: http://docs.mongodb.org/manual/reference/operator/query/text/
With reference to the doc mentioned above you could use:
db.collection.find( { $text: { $search: "word1 word2 word3" } } )
Space-delimited terms are treated as having a logical OR between them...

mongodb , wildcard in $in

I have a MongoDB query where I want to get documents if a field has a particular value.
db.collection.find({key:{$in:['value1','value2']}})
If I run the above command, I get documents containing either 'value1' or 'value2'. But let's say there are no values, and I search with db.collection.find({key:{$in:[]}}): nothing is displayed. And db.collection.find({key:{$in:[*]}}) gives "unexpected token *". Which wildcard do I use in $in to show all results?
I think this is logically consistent behavior for $in. The query
db.collection.find({ "key" : { "$in" : [] } })
could be translated as "find all the documents where the value of key is one of the values contained in the array []". Since there are no values in the array [], there are no matching documents. If you want to find all of the extant values for key, use .distinct to return them as an array:
db.collection.distinct("key")
.distinct will use an index if possible.
If you want a query to match all documents, omit the query selector from .find:
db.collection.find()
as suggested in the comments.
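If the list of values comes from user input and may be empty, one common pattern is to add the $in condition only when there is something to match. A sketch with a hypothetical helper name; the returned dict is what you would pass to find() (for example, pymongo's Collection.find()).

```python
def build_filter(values):
    """Hypothetical helper: an empty or missing list means 'no restriction on key', so return
    an empty filter (matches every document) instead of {'$in': []} (matches none)."""
    if not values:
        return {}
    return {"key": {"$in": list(values)}}


print(build_filter(["value1", "value2"]))  # {'key': {'$in': ['value1', 'value2']}}
print(build_filter([]))                    # {}
```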