Pymongo find document whose field is a substring of a given string - mongodb

Let's say we have a collection with the following documents:
{_id : 1, str : 'hello'}
{_id : 2, str : 'hello world'}
{_id : 3, str : 'world'}
And I would like to find documents whose str field is a substring of hello world!. Is there a way to do this in pymongo?
I know the opposite - getting documents whose field contains a string can be done using $regex, but what I want is getting documents whose field is contained by a string.

You can use text indexes for this, which support text search queries on string content. Text indexes can include any field whose value is a string or an array of string elements.
Here's a minimal example using pymongo:
import pymongo

# Get database connection
conn = pymongo.MongoClient('mongodb://localhost:27017/')
coll = conn.get_database('test').get_collection('test')
# Create text index
coll.create_index([('str', pymongo.TEXT)])
# Text search (matches documents containing any of the search terms)
print(list(coll.find({'$text': {'$search': 'hello world'}})))
With your example documents, this will result in:
[{u'_id': 3.0, u'str': u'world'},
{u'_id': 2.0, u'str': u'hello world'},
{u'_id': 1.0, u'str': u'hello'}]
For more information, please see:
Text Indexes
$text operator
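Note that $text matches whole (stemmed) words rather than arbitrary substrings. If a strict "field is contained in the given string" test is needed, one possible approach is an $expr filter built on $indexOfCP, which returns -1 when the substring is absent. The following is only a minimal sketch and not part of the original answer: it assumes MongoDB 3.6 or later (for $expr in find) and that every document has a string str field. The helper names substring_of_filter and matches_locally, and the extra document with _id 4, are hypothetical and added for illustration. The pure-Python check mirrors the intended semantics so the sketch runs without a server.

```python
def substring_of_filter(haystack, field="str"):
    """Build a filter for documents whose `field` value occurs inside `haystack`."""
    return {
        "$expr": {
            "$gte": [
                # $literal keeps the given string from being read as a field path
                {"$indexOfCP": [{"$literal": haystack}, "$" + field]},
                0,
            ]
        }
    }


def matches_locally(doc, haystack, field="str"):
    """Pure-Python mirror of the filter's intended semantics."""
    return doc[field] in haystack


docs = [
    {"_id": 1, "str": "hello"},
    {"_id": 2, "str": "hello world"},
    {"_id": 3, "str": "world"},
    {"_id": 4, "str": "goodbye"},  # hypothetical non-matching document
]

query = substring_of_filter("hello world!")
matched_ids = [d["_id"] for d in docs if matches_locally(d, "hello world!")]
print(matched_ids)

# With a live connection (assumed): list(coll.find(query))
```

An $expr filter like this cannot use a text index, so it would likely scan the collection; that trade-off is an assumption worth checking with explain() on real data.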

Related

Text Indexes MongoDB, Minimum length of search string

I have created a text index for collection X from mongo shell
db.X.ensureIndex({name: 'text', cusines: 'text', 'address.city': 'text'})
Now suppose a document's name property has the value seasons, which has length 7.
If I run the find query with a search string of length <= 5:
db.X.find({$text: {$search: 'seaso'}})
it does not return anything. If I change the search string to season (length >= 6), it returns the document.
My question: does the search string have a minimum length constraint for fetching records?
If yes, is there any way to change it?
MongoDB $text searches do not support partial matching. MongoDB supports text search queries on string content, with case insensitivity and stemming.
Looking at your examples:
// this returns nothing because there is no inferred association between
// the value: 'seasons' and your input: 'seaso'
db.X.find({$text: {$search: 'seaso'}})
// this returns a match because 'season' is seen as a stem of 'seasons'
db.X.find({$text: {$search: 'season'}})
So, this is not an issue with the length of your input. Searching on seaso returns no matches because:
Your text index does not contain the whole word: seaso
Your text index does not contain a whole word for which seaso is a recognised stem
This presumes that the language of your text index is English. You can confirm this by running db.X.getIndexes(); you'll see this in the definition of your text index:
"default_language" : "english"
FWIW, if your index is case insensitive then the following will also return matches:
db.X.find({$text: {$search: 'SEaSON'}})
db.X.find({$text: {$search: 'SEASONs'}})
Update 1: in response to the question "is it possible to use RegExp?"
Assuming the name attribute contains the value seasons and you are searching with seaso, the following will match your document:
db.X.find({name: {$regex: /^seaso/}})
More details are in the docs, but note:
This will not use your text index, so if you proceed with the $regex operator you won't need the text index.
Index coverage with the $regex operator is probably not what you expect. In brief: if your search value is anchored to the start of the string (i.e. /^seaso/ rather than /easons/), MongoDB can use an index on that field; otherwise it cannot.
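For pymongo users, the anchored prefix search described above might be built as in the following sketch. The helper name prefix_filter is hypothetical, and the local re.search calls only illustrate what the pattern accepts; they do not involve the server.

```python
import re


def prefix_filter(field, prefix):
    """Build a filter matching values of `field` that start with `prefix`."""
    # re.escape neutralises regex metacharacters in user-supplied input
    return {field: {"$regex": "^" + re.escape(prefix)}}


query = prefix_filter("name", "seaso")
print(query)

# Local check of the same pattern against the value from the question
pattern = query["name"]["$regex"]
print(bool(re.search(pattern, "seasons")))      # starts with the prefix
print(bool(re.search(pattern, "off-seasons")))  # prefix is not at the start

# With a live connection (assumed): list(db["X"].find(query))
```

Keeping the pattern case-sensitive and anchored with ^ is what lets an ordinary index on the field narrow the scan.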

MongoDB: How to get all the documents with specific value in a field?

I have an array of 10 unique Object IDs named Arr
I have 10,000 documents in a collection named xyz.
How can I find documents using Object IDs in the array Arr from the collection xyz with only one request?
There are $all and $in operators but are used to query fields with an array.
Or do I need to make requests equal to the length of Arr and get individual document using findOne?
EDIT:
I'm expecting something like this:
db.getCollection("xyz").find({"_id" : [array containing 10 unique IDs]})
....for which the result callback will contain an array of all the matched IDs of query array.
According to the documentation here: https://docs.mongodb.com/manual/reference/operator/query/in/
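You should use the following query. $in matches any field whose value equals one of the listed values, so it also works on a scalar field such as _id:
db.getCollection("xyz").find({"_id" : { $in: Arr }});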
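From Python, the same single-request lookup could be sketched as below. The helper name ids_filter and the placeholder values in arr are assumptions for illustration; in real code arr would hold the ObjectId instances from the question.

```python
def ids_filter(ids):
    """Build a filter matching documents whose _id is any of `ids`."""
    return {"_id": {"$in": list(ids)}}


arr = [123, 456, 789]  # placeholder values standing in for real ObjectIds
query = ids_filter(arr)
print(query)

# With a live connection (assumed): docs = list(db["xyz"].find(query))
```

The matched documents come back in no particular order relative to arr, so any ordering has to be restored on the client side.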

Return array index from $elemMatch query

Say I have a document like this
{
title : 'myTitle',
favorites : [{name : 'text', number : 6}, {name : 'other', number : 4}]
}
I would like to return the array index (if any) from where the embedded document was retrieved within the favorites array.
Say I have the following query
{title : 'myTitle'}
and the projection
{favorites : {$elemMatch : {name : 'text', number : 6} }}
if the projection returns the document AND it contains a favorites array with the sub document, is there a way to know at which index that sub document was found? Which in this case is index 0.
The reason I would like the index is because as soon as I retrieve the document, I proceed to update it, and it would boost the performance if I had a specific index to update instead of using $elemMatch again which would cause mongo to iterate through all the array entries until finding the document that matches.
Unfortunately, this is not possible with the $elemMatch operator. If you are using MongoDB 3.2 or later, you can use aggregate with the $unwind stage instead of find():
db.collection.aggregate([
{$match:{"favorites.name":"text","favorites.number":6}},
{$unwind:{"path":"$favorites","includeArrayIndex":"index"}},
{$match:{"favorites.name":"text","favorites.number":6}}
])
The document is returned with its array index in the field named index. If you want the entire document, including the other array elements, you would have to $group after $unwind instead of applying the second $match.
For earlier versions, iterating over the array on the client side and finding the subdocument's index is the way to go.
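The client-side approach can be sketched in a few lines of Python. The helper name find_favorite_index and the new value 7 in the update are hypothetical; the document is the one from the question.

```python
def find_favorite_index(doc, name, number):
    """Return the index of the first matching favorites entry, or None."""
    for i, fav in enumerate(doc.get("favorites", [])):
        if fav.get("name") == name and fav.get("number") == number:
            return i
    return None


doc = {
    "title": "myTitle",
    "favorites": [{"name": "text", "number": 6}, {"name": "other", "number": 4}],
}

idx = find_favorite_index(doc, "text", 6)
print(idx)

# Once the index is known, an update can address the element directly
# with dot notation (the new value 7 is a made-up example).
update = None
if idx is not None:
    update = {"$set": {"favorites.%d.number" % idx: 7}}
print(update)
```

An index obtained this way is only valid as long as no other writer reorders or removes array elements between the read and the update.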

mongo search sub string

Hello all, I am working on an application using MongoDB and I need to find a substring in a collection.
I have a collection Query, as shown below.
> db.Query.find();
>{ "_id" : ObjectId("54c9ec8ead38d420d87743b0"), "QueryID" : 1, "QueryString" : "List my games", "QueryFrequency" : 9, "QueryResultset" : 3 }
>...
Now I want to search for a substring in QueryString,
e.g. "games" in "QueryString" : "List my games".
For this I enabled indexing on QueryString, and after running the following command I do get some results.
> db.Query.runCommand("text" , {search : "games"});
"text" is name given to my index.
Now the problem is that I get results only when the word I am searching for has more than 3 characters.
For my example, I get results when I search for the word "List" or "games",
but searching for "my" or any other word with fewer than 4 characters gives no result.
Is there any way to solve this, or am I missing some setting?
You can use a regex for this. Your query would look like
db.Query.find({"QueryString": {$regex: your_regex}})
For a case-insensitive search you could use this
db.Query.find({"QueryString": /my/i})
where the i flag makes the match case-insensitive.
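In pymongo, a case-insensitive substring search on the question's QueryString field could be expressed with $regex and $options, as in this sketch. The helper name contains_filter is hypothetical, and the local check with re only illustrates what the pattern accepts.

```python
import re


def contains_filter(field, needle, ignore_case=True):
    """Build a filter matching values of `field` that contain `needle`."""
    spec = {"$regex": re.escape(needle)}  # escape so input is matched literally
    if ignore_case:
        spec["$options"] = "i"
    return {field: spec}


query = contains_filter("QueryString", "my")
print(query)

# Local illustration using the sample value from the question
found = re.search(query["QueryString"]["$regex"], "List my games", re.IGNORECASE)
print(found is not None)

# With a live connection (assumed): list(db["Query"].find(query))
```

An unanchored pattern like this also matches inside longer words and generally cannot use an index efficiently, so it tends to suit small collections.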

mongodb , wildcard in $in

I have a MongoDB query where I want to get documents if a field has a particular value.
db.collection.find({key:{$in:['value1','value2']}})
If I run the above command, I get documents containing either 'value1' or 'value2'. But let's say there are no values, and I search with
db.collection.find({key:{$in:[]}})
Nothing is displayed. And
db.collection.find({key:{$in:[*]}})
gives "unexpected token *". Which wildcard do I use in $in to show all results?
I think this is logically consistent behavior for $in. The query
db.collection.find({ "key" : { "$in" : [] } })
could be translated as "find all the documents where the value of key is one of the values contained in the array []". Since there are no values in the array [], there are no matching documents. If you want to find all of the extant values for key, use .distinct to return them as an array:
db.collection.distinct("key")
.distinct will use an index if possible.
If you want a query to match all documents, omit the query selector from .find:
db.collection.find()
as suggested in the comments.
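When the list of values is built at runtime, one way to get "match everything when the list is empty" is to drop the $in clause on the client side. The following is a minimal sketch; the helper name key_filter is hypothetical, and treating an empty list as "no restriction" is an assumption about what the asker wants.

```python
def key_filter(values, field="key"):
    """Use $in only when there are values; an empty list means no restriction."""
    values = list(values)
    if not values:
        return {}  # an empty filter matches every document
    return {field: {"$in": values}}


print(key_filter(["value1", "value2"]))
print(key_filter([]))

# With a live connection (assumed): list(db["collection"].find(key_filter(values)))
```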