searching for a phrase in Mongodb: weird behaviour


I have the following collection:
books> db.books.find()
[
{
_id: ObjectId("61ab85b0056b5357b5e23e6b"),
fields: 'hello good morning'
},
{ _id: ObjectId("61ab85b5056b5357b5e23e6c"), fields: 'good morning' },
{
_id: ObjectId("61ab8679056b5357b5e23e6d"),
fields: 'hello good morning guys'
},
{
_id: ObjectId("61ab8684056b5357b5e23e6e"),
fields: 'good morning guys'
}
]
Then I run this query:
db.books.find({$text : {$search : "\"good morning\" hello"}})
and I get:
[
{ _id: ObjectId("61ab85b5056b5357b5e23e6c"), fields: 'good morning' },
{
_id: ObjectId("61ab8684056b5357b5e23e6e"),
fields: 'good morning guys'
},
{
_id: ObjectId("61ab85b0056b5357b5e23e6b"),
fields: 'hello good morning'
},
{
_id: ObjectId("61ab8679056b5357b5e23e6d"),
fields: 'hello good morning guys'
}
]
Can you help me understand the output? The first result (document _id : ...23e6c) doesn't make much sense to me, as it doesn't contain the string "hello".
I read the first answer to this question. Does it mean that, in my case, MongoDB is searching for
(good morning) AND (good OR morning OR hello)
That would explain the result, but I can't find an exact reference to this in the MongoDB documentation.
Thanks in advance.

Directly from the docs:
A string of terms that MongoDB parses and uses to query the text index. MongoDB performs a logical OR search of the terms unless specified as a phrase. See Behavior for more information on the field.
This is simply how a text index behaves. What you can do is mark every single expression as a "phrase", like so:
{$text : {$search : "\"good morning\" \"hello\""}}
This will enforce $and logic.
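To make the difference between the two search strings concrete, here is a minimal sketch in plain JavaScript that imitates the matching rule described above. It is a simplified model, not MongoDB's implementation: it assumes that every quoted phrase is required and that unquoted terms do not filter anything out once a phrase is present, and it ignores stemming, stop words and scoring. The function names parseSearch and matches are made up for this illustration.

```javascript
// Simplified model of how a $text search string is interpreted.
// Assumption: quoted phrases are all required; unquoted terms are OR'ed and
// only decide the match when no phrase is present.
function parseSearch(search) {
  const phrases = [];
  // Pull out every "quoted phrase" and leave the rest as loose terms.
  const rest = search.replace(/"([^"]+)"/g, (_, phrase) => {
    phrases.push(phrase.toLowerCase());
    return " ";
  });
  const terms = rest.toLowerCase().split(/\s+/).filter(Boolean);
  return { phrases, terms };
}

function matches(text, search) {
  const { phrases, terms } = parseSearch(search);
  const haystack = text.toLowerCase();
  if (phrases.length > 0) {
    // Every phrase must appear; loose terms do not exclude any document.
    return phrases.every((p) => haystack.includes(p));
  }
  const words = haystack.split(/\s+/);
  return terms.some((t) => words.includes(t));
}

const docs = [
  "hello good morning",
  "good morning",
  "hello good morning guys",
  "good morning guys",
];

// One phrase plus a loose term: every document containing the phrase matches.
const mixed = docs.filter((d) => matches(d, '"good morning" hello'));
// Two phrases: both must be present.
const allPhrases = docs.filter((d) => matches(d, '"good morning" "hello"'));

console.log(mixed.length);
console.log(allPhrases);
```

Under this model the first search string keeps all four documents, which is the result the question shows, while the second keeps only the two that also contain "hello".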

Related

How does 'fuzzy' work in MongoDB's $searchBeta stage of aggregation?

I'm not quite understanding how fuzzy works in the $searchBeta stage of aggregation. I'm not getting the desired result that I want when I'm trying to implement full-text search on my backend. Full text search for MongoDB was released last year (2019), so there really aren't many tutorials and/or references to go by besides the documentation. I've read the documentation, but I'm still confused, so I would like some clarification.
Let's say I have these 5 documents in my db:
{
"name": "Lightning Bolt",
"set_name": "Masters 25"
},
{
"name": "Snapcaster Mage",
"set_name": "Modern Masters 2017"
},
{
"name": "Verdant Catacombs",
"set_name": "Modern Masters 2017"
},
{
"name": "Chain Lightning",
"set_name": "Battlebond"
},
{
"name": "Battle of Wits",
"set_name": "Magic 2013"
}
And this is my aggregation in MongoDB Compass:
db.cards.aggregate([
{
$searchBeta: {
search: { //search has been deprecated, but it works in MongoDB Compass; replace with 'text'
query: 'lightn',
path: ["name", "set_name"],
fuzzy: {
maxEdits: 1,
prefixLength: 2,
maxExpansion: 100
}
}
}
}
]);
What I'm expecting my result to be:
[
{
"name": "Lightning Bolt", //lightn is in 'Lightning'
"set_name": "Masters 25"
},
{
"name": "Chain Lightning", //lightn is in 'Lightning'
"set_name": "Battlebond"
}
]
What I actually get:
[] //empty array
I don't really understand why my result is empty, so it would be much appreciated if someone explained what I'm doing wrong.

What I think is happening:
db.cards.aggregate... is looking in the "name" and "set_name" fields for words that are at most one character edit away from the "lightn" query. The closest word in the cards collection, "lightning", is three edits away from "lightn", which is more than the maxEdits of 1, and therefore your actual result is an empty array. "Fuzzy is used to find strings which are similar to the search term or terms"; it is used with maxEdits and prefixLength.
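The edit counting can be checked with a small, self-contained sketch. It uses plain Levenshtein distance (insertions, deletions and substitutions each cost one edit), which is an assumption on my part about how the edits are counted; the search engine may treat transpositions differently, but that does not change the numbers for this query. The name editDistance is just for illustration.

```javascript
// Levenshtein distance: the minimum number of single-character insertions,
// deletions or substitutions needed to turn string a into string b.
function editDistance(a, b) {
  // prev[j] is the distance between the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i chars into the empty string costs i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // delete a[i-1]
        curr[j - 1] + 1, // insert b[j-1]
        prev[j - 1] + cost // substitute, or keep when the chars are equal
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(editDistance("lightn", "lightning")); // 3
console.log(editDistance("lightnin", "lightning")); // 1
```

So a query such as 'lightnin' would fall within maxEdits: 1 of "lightning", whereas 'lightn' behaves like a prefix rather than a near-miss spelling.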
Have you tried the term operator with the wildcard option? I think the below aggregation would get you the results you were actually expecting.
e.g.
db.cards.aggregate([
{$searchBeta:
{"term":
{"path":
["name","set_name"],
"query": "l*h*",
"wildcard":true}
}}]).pretty()

You need to provide an index to use with your search query.
The index determines the analyzer that your query uses to process the text, which controls whether you get a full match of the text, a partial match, and so on.
You can read more about analyzers here.
In your case, an index based on the STANDARD analyzer will help.
After you create your index, your code, modified as below, will work:
db.cards.aggregate([
{
$search: {
text: { // 'search' has been deprecated; use 'text' instead
index: 'index_name_for_analyzer (STANDARD in your case)',
query: 'lightn',
path: ["name"], // since you only want to search in one field
fuzzy: {
maxEdits: 1,
prefixLength: 2,
maxExpansions: 100
}
}
}
}
]);

Mongo query array.length within an array

I tried to use length on an array that is within another array and got an error:
db.quotes.find( { $where: "this.booksWithQuote.authors.length>0" } )
// fails with this.booksWithQuote.authors is undefined
I'm basing this syntax on this example MongoDB – find all documents where an array / list size is greater than N that works:
db.domain.find( {booksWithQuote: {$exists:true}, $where:'this.booksWithQuote.length>0'} )
// Above works
So I'm wondering if this is possible, how can I find all documents that have array1.array2 where array2 length is greater than zero.
I've tried the following but it fails with a syntax error:
db.quotes.find( {
"booksWithQuote": {$exists: true} },
"booksWithQuote.authors": {$exists: true} },
$where: "booksWithQuote.authors.length>0" } )
// fails with this.booksWithQuote.authors is undefined
It's worth pointing out that if I knew the author of a book, the nested array with array searching worked. Pretty cool!
db.quotes.find( { "booksWithQuote.authors" : "Sam Fisher" } )
// Returns all quotes that have a book that has the author Sam Fisher
But in my case I'm just trying to find all of the quotes that have more than one author on any given book.
To follow along, consider this example.
I have a collection of quotes, and each quote has a list of books where the quote was used, and each book has an array of authors.
Below is some sample data so you can understand the structure of the data. It shows one quote with no books, another quote with a book but no authors, and a third quote with books and several authors.
[
{
quoteText: "Love to code",
booksWithQuote: [ ]
},
{
quoteText: "Another Quotesake",
booksWithQuote: [
{
title: "Where is elmo",
authors: [ ]
}
]
},
{
quoteText: "For goodness sake",
booksWithQuote: [
{
title: "The search for Elmo",
authors: [
"John Smith",
"Sam Fisher",
"Jim Wiggins"
]
},
{
title: "Finding Elmo",
authors: [ "Sam Fisher" ]
},
{
title: "Mercy me",
authors: [ ]
}
]
}
]
So to reiterate, How can I find all documents that have an array within another array where the 2nd array has one or more elements?

You can use $exists operator and refer to first element of an array using dot notation:
db.col.find({ "booksWithQuote.authors.0": { $exists: true } })
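For readers wondering why the $where attempt failed: booksWithQuote is an array, and a JavaScript array has no authors property, so this.booksWithQuote.authors is undefined. Only the individual elements have authors. The sketch below shows the per-element check in plain JavaScript against the sample data from the question; the function name hasBookWithAuthors is made up for this illustration, and the same idea is what the "booksWithQuote.authors.0" query expresses on the server side.

```javascript
// True when at least one book in the quote has a non-empty authors array.
function hasBookWithAuthors(quote) {
  return (quote.booksWithQuote || []).some(
    (book) => Array.isArray(book.authors) && book.authors.length > 0
  );
}

const quotes = [
  { quoteText: "Love to code", booksWithQuote: [] },
  {
    quoteText: "Another Quotesake",
    booksWithQuote: [{ title: "Where is elmo", authors: [] }],
  },
  {
    quoteText: "For goodness sake",
    booksWithQuote: [
      { title: "The search for Elmo", authors: ["John Smith", "Sam Fisher", "Jim Wiggins"] },
      { title: "Finding Elmo", authors: ["Sam Fisher"] },
      { title: "Mercy me", authors: [] },
    ],
  },
];

// The outer array itself has no `authors` property, hence the original error.
console.log(quotes[2].booksWithQuote.authors); // undefined

const found = quotes.filter(hasBookWithAuthors).map((q) => q.quoteText);
console.log(found);
```

Only the third quote passes the check, because it is the only one with a book that has at least one author.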

MongoDB documentation: https://www.mongodb.com/docs/manual/reference/operator/query/expr/
You can use $expr to achieve this, where selectedArray is the array field and size is the minimum length:
$expr:{$gte: [{$size: "$selectedArray"}, size]}

mongodb get all keys within a string

Is it possible to search a string if I have some data stored like
Names:
{
name: 'john'
},
{
name: 'pete'
},
{
name: 'jack smith'
}
Then I perform a query like
{ $stringContainsKeys: 'pete said hi to jack smith' }
and it would return
{
name: 'pete'
},
{
name: 'jack smith'
}
I'm not sure that this is even possible in mongoDB or if this kind of searching has a specific name.

Yes, quite possible indeed through the use of the $text operator which performs a text search on the content of the fields indexed with a text index.
Suppose you have the following test documents:
db.collection.insert([
{
_id: 1, name: 'john'
},
{
_id: 2, name: 'pete'
},
{
_id: 3, name: 'jack smith'
}
])
First you need to create a text index on the name field of your document:
db.collection.createIndex( { "name": "text" } )
Then perform a logical OR search on each term of a space-delimited search string, which returns the documents that contain any of the terms.
The following query specifies a $search string of six terms delimited by spaces, "pete said hi to jack smith":
db.collection.find( { "$text": { "$search": "pete said hi to jack smith" } } )
This query returns documents that contain either pete or said or hi or to or jack or smith in the indexed name field:
/* 0 */
{
"_id" : 3,
"name" : "jack smith"
}
/* 1 */
{
"_id" : 2,
"name" : "pete"
}

Starting from MongoDB 2.6, you can search a MongoDB collection to match any of the search terms.
db.names.find( { $text: { $search: "pete said hi to jack smith" } } )
This will search for each of the terms separated by space.
You can find more information about this at
http://docs.mongodb.org/manual/reference/operator/query/text/#match-any-of-the-search-terms
However, it will work only with individual terms. If you have to search for an exact phrase rather than a single term, e.g. you want to find "jack smith" but not "smith jack", it will not work, so you will have to search for a phrase:
http://docs.mongodb.org/manual/reference/operator/query/text/#search-for-a-phrase which searches for exact phrases in the text.
If you need more advanced text-based search features in your application, you might consider using something like Elasticsearch https://www.elastic.co/guide/en/elasticsearch/reference/1.3/query-dsl-mlt-field-query.html.
Zoran

Query running very slow on big MongoDB db

I have a MongoDB db with a single rather large collection of documents (13GB for about 2M documents) sitting on a single server with 8GB RAM. Each document has a text field that can be relatively large (it can be a whole blog post) and the other fields are data about the text content and the text author. Here's what the schema looks like:
{
text: "Last night there was a storm in San Francisco...",
author: {
name: "Firstname Lastname",
website_url: "http://..."
},
date: "201403075612",
language: "en",
concepts: [
{name: "WeatherConcept", hit: "storm", start: 23, stop: 28},
{name: "LocationConcept", hit: "San Francisco", start: 32, stop: 45}
],
location: "us",
coordinates: []
}
I'm planning to query the data in different ways:
Full-text search on the "text" field. So let's say my text search query is q:
db.coll.aggregate([
{
$match:{
$text: {
$search:q
}
}
}
])
Aggregate documents by author:
db.coll.aggregate([
{
$project: {
name: "$author.name",
url: "$author.website_url"
}
},
{
$group: {
_id: "$name",
size: {
$sum:1
},
url: {
$first: "$url"
}
}
},
{
$sort:{
size:-1
}
}
])
Aggregate documents by concepts:
db.coll.aggregate([
{
$unwind: "$concepts"
},
{
$group: {
_id: "$concepts.name",
size: {
$sum:1
}
}
},
{
$sort:{
size:-1
}
}
])
These three queries may also include filtering on the following fields: date, location, coordinates, language, author.
I don't have indexes yet in place, so the queries run very slow. But since the indexes would be very different for the different ways I hit the data, does that rule out indexing as a solution? Or is there a way to index for all these cases and not have to shard the collection? Basically my questions are:
What would be a good indexing strategy in this case?
Do I need to create separate collections for authors and concepts?
Should I somehow restructure my data?
Do I need to shard my collection or is my 8GB single-server powerful enough to handle that data?

Do you have any indexes on your collection?
Have a look at the following
http://docs.mongodb.org/manual/indexes/
If you do have indexes, make sure they are being hit by running the following:
db.CollectionName.find({"Concept":"something"}).explain();
You also need to give us more information about your setup. How much RAM does the server have? I've worked with a MongoDB deployment that had 200GB sitting on 3 shards, so 13GB on one server shouldn't be an issue.

How can I retrieve all the fields when using $elemMatch?

Consider the following posts collection:
{
_id: 1,
title: "Title1",
category: "Category1",
comments: [
{
title: "CommentTitle1",
likes: 3
},
{
title: "CommentTitle2",
likes: 4
}
]
}
{
_id: 2,
title: "Title2",
category: "Category2",
comments: [
{
title: "CommentTitle3",
likes: 1
},
{
title: "CommentTitle4",
likes: 4
}
]
}
{
_id: 3,
title: "Title3",
category: "Category2",
comments: [
{
title: "CommentTitle5",
likes: 1
},
{
title: "CommentTitle6",
likes: 3
}
]
}
I want to retrieve all the posts, and if one post has a comment with 4 likes I want to retrieve this comment only under the "comments" array. If I do this:
db.posts.find({}, {comments: { $elemMatch: {likes: 4}}})
...I get this (which is exactly what I want):
{
_id: 1,
comments: [
{
title: "CommentTitle2",
likes: 4
}
]
}
{
_id: 2,
comments: [
{
title: "CommentTitle4",
likes: 4
}
]
}
{
_id: 3
}
But how can I retrieve the remaining fields of the documents without having to declare each of them like below? That way, if I added more fields to the post document, I wouldn't have to change the find query.
db.posts.find({}, {title: 1, category: 1, comments: { $elemMatch: {likes: 4}}})
Thanks

--EDIT--
Sorry for the misread of your question. I think you'll find my response to this question here to be what you are looking for. As people have commented, you cannot project this way in a find, but you can use aggregation to do so:
https://stackoverflow.com/a/21687032/2313887
The rest of the answer stands as useful. So I think I'll leave it here
You must specify all of the fields you want or nothing at all when using projection.
What you are essentially asking is whether, once you choose to alter the output of the document and limit how one field is displayed, you can avoid specifying the rest. The bottom line is to think of the projection argument to find as being just like SQL SELECT. By default it behaves like * (all fields); otherwise you give a list of fields, possibly with some manipulation of a field's format. The only difference is _id, which is always included unless you explicitly exclude it, i.e. { _id: 0 }.
Alternatively, if you want to filter the collection, you need to place your $elemMatch in the query itself. Its usage in the projection is to explicitly limit the returned document to contain only the matching elements in the array.
Alter your query:
db.posts.find(
{ comments: { $elemMatch: {likes: 4}}},
{ title: 1, category: 1, "comments.$": 1 }
)
And to get what you want we use the positional $ operator in the projection portion of the find.
See the documentation for the difference between the two usages:
http://docs.mongodb.org/manual/reference/operator/query/elemMatch/
http://docs.mongodb.org/manual/reference/operator/projection/elemMatch/

This question is pretty old, but I just faced the same issue and I didn't want to use the aggregation pipeline as it was simple query and I only needed to get all fields applying an $elemMatch to one field.
I'm using Mongoose (which was not part of the original question, but it's very common these days), and to get exactly what the question asked for (How can I retrieve all the fields when using $elemMatch?) I did this:
const projection = {};
Object.keys(Model.schema.paths).forEach(key => {
projection[key] = 1;
});
projection.subfield = { $elemMatch: { _id: subfieldId } };
Model.find({}, projection).then((result) => console.log({ result }));
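The projection-building step above can be tried on its own, without a database connection. The sketch below is a standalone version under a few assumptions: the paths array stands in for Object.keys(Model.schema.paths), the field names come from the posts collection in the question, and buildProjection is a hypothetical helper name rather than anything Mongoose provides.

```javascript
// Build a projection that includes every known field, then narrow one
// array field with $elemMatch.
function buildProjection(paths, arrayField, condition) {
  const projection = {};
  for (const key of paths) {
    projection[key] = 1; // include each field explicitly
  }
  // Overwrite the array field so only the first matching element is returned.
  projection[arrayField] = { $elemMatch: condition };
  return projection;
}

// Stand-in for Object.keys(Model.schema.paths) on the posts schema.
const paths = ["_id", "title", "category", "comments"];
const projection = buildProjection(paths, "comments", { likes: 4 });

console.log(JSON.stringify(projection));
// {"_id":1,"title":1,"category":1,"comments":{"$elemMatch":{"likes":4}}}
```

Because the list of fields is read from the schema at run time, adding a field to the schema later means the projection picks it up without any change to the query.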
