mongodb get all keys within a string - mongodb

Is it possible to search a string if I have some data stored like
Names:
{
name: 'john'
},
{
name: 'pete'
},
{
name: 'jack smith'
}
Then I perform a query like
{ $stringContainsKeys: 'pete said hi to jack smith' }
and it would return
{
name: 'pete'
},
{
name: 'jack smith'
}
I'm not sure that this is even possible in mongoDB or if this kind of searching has a specific name.

Yes, quite possible indeed through the use of the $text operator which performs a text search on the content of the fields indexed with a text index.
Suppose you have the following test documents:
db.collection.insert([
{
_id: 1, name: 'john'
},
{
_id: 2, name: 'pete'
},
{
_id: 3, name: 'jack smith'
}
])
First you need to create a text index on the name field of your document:
db.collection.createIndex( { "name": "text" } )
Then run a $text query. A space-delimited $search string is treated as a logical OR of its terms, so the query returns documents that contain any of the terms.
The following query specifies a $search string of six space-delimited terms, "pete said hi to jack smith":
db.collection.find( { "$text": { "$search": "pete said hi to jack smith" } } )
This query returns documents that contain either pete or said or hi or to or jack or smith in the indexed name field:
/* 0 */
{
"_id" : 3,
"name" : "jack smith"
}
/* 1 */
{
"_id" : 2,
"name" : "pete"
}
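The OR behaviour described above can be illustrated without a database. The following is only a rough sketch in plain JavaScript: the helper names tokenize and matchAnyTerm are made up for this example, and real $text matching also applies stemming and stop-word handling, which are not modelled here.

```javascript
// Simplified illustration only: real $text matching also applies stemming,
// stop-word removal and language rules, none of which are modelled here.
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Returns the documents whose name shares at least one term with the search string.
function matchAnyTerm(docs, search) {
  const terms = new Set(tokenize(search));
  return docs.filter((doc) => tokenize(doc.name).some((t) => terms.has(t)));
}

const sampleDocs = [
  { _id: 1, name: 'john' },
  { _id: 2, name: 'pete' },
  { _id: 3, name: 'jack smith' },
];
const matched = matchAnyTerm(sampleDocs, 'pete said hi to jack smith');
```

Note that under this any-term logic a document named 'jack jones' would also match, because it shares the term jack with the search string.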

Starting with MongoDB 2.6, you can run a $text query against a collection that has a text index to match any of the search terms.
db.names.find( { $text: { $search: "pete said hi to jack smith" } } )
This will search for each of the terms separated by space.
You can find more information about this at
http://docs.mongodb.org/manual/reference/operator/query/text/#match-any-of-the-search-terms
However, this matches individual terms only. If you need an exact phrase rather than single terms, e.g. you want to find "jack smith" but not "smith jack", matching any term will not work, and you will have to search for a phrase instead:
http://docs.mongodb.org/manual/reference/operator/query/text/#search-for-a-phrase
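As a minimal sketch of the phrase search mentioned above: $text treats a part of the $search string that is wrapped in double quotes as a phrase. The helper name buildPhraseQuery is hypothetical and only builds the query object; it does not talk to a database.

```javascript
// Hypothetical helper: wraps a phrase in double quotes so that $text
// treats it as an exact phrase instead of OR-ing its terms.
function buildPhraseQuery(phrase) {
  // Drop embedded double quotes so they cannot end the phrase early.
  const cleaned = phrase.replace(/"/g, '').trim();
  return { $text: { $search: `"${cleaned}"` } };
}

const phraseQuery = buildPhraseQuery('jack smith');
// In the mongo shell this object would be passed to find(), e.g.
// db.names.find(phraseQuery)
```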
If you need more advanced text-based search features in your application, you might consider using something like Elasticsearch https://www.elastic.co/guide/en/elasticsearch/reference/1.3/query-dsl-mlt-field-query.html.
Zoran

Related

MongoDB Atlas Search - Multiple terms in search-string with 'and' condition (not 'or')

In the documentation of MongoDB Atlas search, it says the following for the autocomplete operator:
query: String or strings to search for. If there are multiple terms in
a string, Atlas Search also looks for a match for each term in the
string separately.
For the text operator, the same thing applies:
query: The string or strings to search for. If there are multiple
terms in a string, Atlas Search also looks for a match for each term
in the string separately.
Matching each term separately seems odd behaviour to me. We need multiple searches in our app, and for each we expect fewer results the more words you type, not more.
Example: When searching for "John Doe", I expect only results with both "John" and "Doe". Currently, I get results that match either "John" or "Doe".
Is this not possible using MongoDB Atlas Search, or am I doing something wrong?
Update
Currently, I have solved it by splitting the search-term on space (' ') and adding each individual keyword to a separate must-sub-clause (with the compound operator). However, then the search query no longer returns any results if there is one keyword with only one character. To account for that, I split keywords with one character from those with multiple characters.
The snippet below works, but for this I need to save two generated fields on each document:
searchString: a string with all the searchable fields concatenated. For example, "John Doe Man Streetstreet Citycity"
searchArray: the above string uppercased & split on space (' ') into an array
const must = [];
const searchTerms = 'John D'.split(' ');
for (let i = 0; i < searchTerms.length; i += 1) {
if (searchTerms[i].length === 1) {
must.push({
regex: {
path: 'searchArray',
query: `${searchTerms[i].toUpperCase()}.*`,
},
});
} else if (searchTerms[i].length > 1) {
must.push({
autocomplete: {
query: searchTerms[i],
path: 'searchString',
fuzzy: {
maxEdits: 1,
prefixLength: 4,
maxExpansions: 20,
},
},
});
}
}
db.getCollection('someCollection').aggregate([
{
$search: {
compound: { must },
},
},
]).toArray();
Update 2 - Full example of unexpected behaviour
Create collection with following documents:
db.getCollection('testing').insertMany([{
"searchString": "John Doe ExtraTextHere"
}, {
"searchString": "Jane Doe OtherName"
}, {
"searchString": "Doem Sarah Thisistestdata"
}])
Create search index 'default' on this collection:
{
"mappings": {
"dynamic": false,
"fields": {
"searchString": {
"type": "autocomplete"
}
}
}
}
Do the following query:
db.getCollection('testing').aggregate([
{
$search: {
autocomplete: {
query: "John Doe",
path: 'searchString',
fuzzy: {
maxEdits: 1,
prefixLength: 4,
maxExpansions: 20,
},
},
},
},
]).toArray();
When a user searches for "John Doe", this query returns all the documents that have either "John" OR "Doe" in the path "searchString". In this example, that means all 3 documents. The more words the user types, the more results are returned. This is not expected behaviour. I would expect more words to match fewer results because the search term gets more precise.
An edgeGram tokenization strategy might be better for your use case because it works left-to-right.
Try this index definition, taken from the docs:
{
"mappings": {
"dynamic": false,
"fields": {
"searchString": [
{
"type": "autocomplete",
"tokenization": "edgeGram",
"minGrams": 3,
"maxGrams": 10,
"foldDiacritics": true
}
]
}
}
}
Also, change your query clause from must to filter. That will exclude the documents that do not contain all the tokens.
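A minimal sketch of what that could look like, assuming one autocomplete clause per whitespace-separated term is acceptable for your data. The helper name buildAllTermsSearch is hypothetical, and the fuzzy options from the question are left out for brevity. Terms shorter than the index's minGrams are not expected to match, so very short keywords may still need separate handling.

```javascript
// Hypothetical helper: one autocomplete clause per whitespace-separated term,
// all placed under compound.filter so every term has to match.
function buildAllTermsSearch(input, path) {
  const terms = input.trim().split(/\s+/).filter(Boolean);
  const filter = terms.map((term) => ({
    autocomplete: { query: term, path },
  }));
  return { $search: { compound: { filter } } };
}

const searchStage = buildAllTermsSearch('John Doe', 'searchString');
// The stage would be used as:
// db.getCollection('testing').aggregate([searchStage]).toArray()
```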

MongoDB text index wildcard weight

Suppose I have sample db structure
[
{ name: 'hello world', description: { key: 'something' } },
{ name: 'user', description: { key: 'hello world' } },
]
with index
db.fulltext.createIndex({ name: 'text', '$**': 'text' }, { weights: { name: 10, '$**': 5 } })
I am finding documents with the query
db.fulltext.find({ $text: { $search: 'hello world' } }, { score: { $meta: 'textScore' } })
But it gives me a score of 15.0 for both documents. Is it impossible to add a weight to the wildcard specifier? Why does the second document get the multiplied score from the name key?
The wildcard specifier "$**" includes all string fields of the document in the text index. In the scenario above, name is a string field that was given a weight of 10, while the wildcard assigns a weight of 5 to all string fields, including name. So one weight overrides the other.
When the text search runs, all string fields are given equal weight. The score is therefore the same for both documents, because the indexed fields have no relative significance to each other (the wildcard was used when creating the index).
The $text operator assigns a score to each document that contains the
search term in the indexed fields. The score represents the relevance
of a document to a given text search query.
When different weights are needed for different fields, you need to name the fields explicitly when creating the index. In other words, you should not provide a weight for one string field and also a wildcard weight for all string fields, because one weight will override the other.
If you can change the index as mentioned below, you can see the difference.
Create Index:-
db.fulltext.createIndex({ name: 'text', 'description.key' : 'text' }, { weights: { name: 10, 'description.key' : 5 } })
Search:-
db.fulltext.find({ $text: { $search: 'hello world' } }, { score: { $meta: 'textScore' } })
Result:-
{
"_id" : ObjectId("57e119cbf522cc85b5595797"),
"name" : "hello world",
"description" : {
"key" : "something"
},
"score" : 15
}
{
"_id" : ObjectId("57e119cbf522cc85b5595798"),
"name" : "user",
"description" : {
"key" : "hello world"
},
"score" : 7.5
}
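If the set of weighted fields changes often, both createIndex arguments can be derived from a single weights map so that every indexed field is always named explicitly. This is only a sketch; the helper name textIndexFromWeights is made up for this example.

```javascript
// Hypothetical helper: derives both createIndex arguments from one weights map,
// so every indexed field is named explicitly and no wildcard is involved.
function textIndexFromWeights(weights) {
  const keys = {};
  for (const field of Object.keys(weights)) {
    keys[field] = 'text';
  }
  return { keys, options: { weights } };
}

const indexSpec = textIndexFromWeights({ name: 10, 'description.key': 5 });
// Would be used as: db.fulltext.createIndex(indexSpec.keys, indexSpec.options)
```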

Text query through referenced objects with MongoDB

I have the following structure.
books collection:
{
_id: "book_1",
title: "How to build a house",
authorId: "author_1"
}
{
_id: "book_2",
title: "How to plant a tree",
authorId: "author_2"
}
authors collection:
{
_id: "author_1",
name: "Adam Adamson"
}
{
_id: "author_2",
name: "Brent Brentson"
}
I want to make a case-insensitive free text search with the string "b" through the books collection and find all books that either have a "b" in the title or have an author with a "b" in the name.
I can embed the author in the book object just to be able to make the query. But if the author name changes in the authors collection, the embedded authors object will have the wrong name.
{
_id: "book_2",
title: "How to plant a tree",
authorId: "author_2",
author:
{
name: "Brent Brentson"
}
}
What would be a good way to solve this problem?
You could use two queries. The first gets the array of author ids whose name matches the regular expression, using the map() method of the find() cursor on the authors collection. The second applies that array to the books collection with the $in operator, combined with the same regex pattern to find books that have "b" in the title:
var authorIds = db.authors.find({"name": /b/i}).map(function (doc) {return doc._id});
db.books.find({$or: [{"title": /b/i}, {"authorId": {"$in": authorIds} }]})
Result:
/* 0 */
{
"_id" : "book_1",
"title" : "How to build a house",
"authorId" : "author_1"
}
/* 1 */
{
"_id" : "book_2",
"title" : "How to plant a tree",
"authorId" : "author_2"
}
-- UPDATE --
Thanks to @yogesh for suggesting another approach which uses the distinct() method to get the author ids list:
var authorIds = db.authors.distinct("_id", {"name": /b/i})
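The logic of the two-step lookup can be checked without a database. The sketch below uses in-memory arrays as stand-ins for the two collections; the helper name findBooks is hypothetical.

```javascript
// In-memory stand-ins for the two collections from the question.
const authors = [
  { _id: 'author_1', name: 'Adam Adamson' },
  { _id: 'author_2', name: 'Brent Brentson' },
];
const books = [
  { _id: 'book_1', title: 'How to build a house', authorId: 'author_1' },
  { _id: 'book_2', title: 'How to plant a tree', authorId: 'author_2' },
];

// Hypothetical helper mirroring the two queries: first collect matching
// author ids, then keep books matching on title OR on one of those ids.
function findBooks(pattern) {
  const authorIds = new Set(
    authors.filter((a) => pattern.test(a.name)).map((a) => a._id)
  );
  return books.filter(
    (b) => pattern.test(b.title) || authorIds.has(b.authorId)
  );
}

const foundBooks = findBooks(/b/i);
```

Here book_1 matches through its title and book_2 matches through its author, which is why both appear in the result shown above.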

How to return the results of a query using find() for a deep search level with MongoDB?

I have the following structure of a collection:
teams: [{
name: String,
address: String,
players: [{
name: String,
age: Integer
}]
}]
I am trying to search by 'name' of the player. I am trying this:
db.teams.find({ players: { $elemMatch: { name: /john/ } } }, { 'players.$':1 })
This returns some results, but not all. And this:
db.teams.find({ players: { $elemMatch: { name: /john/ } } })
This last returns all the records of the collection.
I need to find by a player with its name as parameter.
What is the right query?
This is a common misconception about MongoDB. Queries in MongoDB do not match or return subdocuments. Queries match and return documents in the collection, with returned fields adjustable via projection. With your document structure, you cannot write a query that matches and returns one of the subdocuments in the array. You can write (and in your examples have written) a query that matches documents containing an array element that satisfies a condition, and then projects just the first matching element in the array. It's important to understand the difference. If you want to match and return player documents, your collection should contain player documents:
{
"name" : "John Smalls",
"age" : 29,
"team_name" : "The Flying Sharks",
"team_address" : "123 Fake Street"
}
The player documents reference their team. To find a player by name,
db.players.find({ "name" : /^John/ })
To find all the players on a team,
db.players.find({ "team_name" : "The Flying Sharks" })
Your life will be much easier if you search for players by querying for player documents instead of trying to match team documents.
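Moving from the team structure to player documents could be sketched as follows. This is an illustration in plain JavaScript, not a migration script: the helper name toPlayerDocs and the second player are made up, and the field names follow the player document shown above.

```javascript
// Hypothetical migration step: turn each team document into one player
// document per array element, copying the team fields onto the player.
function toPlayerDocs(teams) {
  return teams.flatMap((team) =>
    team.players.map((player) => ({
      name: player.name,
      age: player.age,
      team_name: team.name,
      team_address: team.address,
    }))
  );
}

const exampleTeams = [
  {
    name: 'The Flying Sharks',
    address: '123 Fake Street',
    players: [
      { name: 'John Smalls', age: 29 },
      { name: 'Pat Example', age: 31 }, // made-up second player
    ],
  },
];
const playerDocs = toPlayerDocs(exampleTeams);
// playerDocs could then be inserted with db.players.insertMany(playerDocs)
```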

Query for any nested subdocuments

I would like to perform a query for a given nested value on multiple subdocuments.
In the example below, I would like to perform the search across multiple "product_types" objects.
{
"product_types": {
"type_1": [
{
name: "something",
price: 100
},
{
name: "something else",
price: 50
}
],
"type_2": [
{
name: "another one",
price: 20
},
{
name: "and a last one",
price: 30
}
]
}
}
I understood that the dollar sign matches any subdocument. Here is what I came up with to get all the products with a "price" value of 100. But it doesn't work. Any idea?
db.inventory.find( { product_types.$.price : 100 } )
PS: I anticipate some answers saying that such a db design to store products would be very bad, and I agree; this is just an example to illustrate the kind of query I want to perform.
MongoDB doesn't support any sort of wildcard property like you're trying to do here with $. However, you can search multiple properties using an $or operator:
db.inventory.find({ $or: [
{ "product_types.type_1.price": 100 },
{ "product_types.type_2.price": 100 }
]})
But this is going to return the matching documents in full rather than just the matched array elements, so you'll also have to post-process the docs in your code to pull those out.
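Both steps can be sketched in plain JavaScript, assuming the list of type keys is known in advance. The helper names buildPriceQuery and matchingProducts are hypothetical; the first builds the $or query, the second does the post-processing on a returned document.

```javascript
// Hypothetical helper: builds the $or query for a known list of type keys.
function buildPriceQuery(typeKeys, price) {
  return {
    $or: typeKeys.map((key) => ({ [`product_types.${key}.price`]: price })),
  };
}

// Post-processing step: pull the matching array elements out of a returned document.
function matchingProducts(doc, price) {
  return Object.values(doc.product_types)
    .flat()
    .filter((product) => product.price === price);
}

const priceQuery = buildPriceQuery(['type_1', 'type_2'], 100);
// Would be used as: db.inventory.find(priceQuery)

// The example document from the question.
const inventoryDoc = {
  product_types: {
    type_1: [
      { name: 'something', price: 100 },
      { name: 'something else', price: 50 },
    ],
    type_2: [
      { name: 'another one', price: 20 },
      { name: 'and a last one', price: 30 },
    ],
  },
};
const hits = matchingProducts(inventoryDoc, 100);
```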
Searching with embedded documents can be done as
db.inventory.find({"product_types.type_1.price":100})
The field name must be enclosed in quotes; otherwise the shell throws a syntax error.