Run text search on unwinded documents in MongoDB - mongodb

I have a collection that looks something like this:
{
"strings" : [
"Hello world",
"Error connecting to server",
"Dbus error"
],
"file" : "test.c"
}
{
"strings" : [
"Could not configure supporting library.",
"Assertion failed",
"Error in server response"
],
"file" : "run.c"
}
...
There are ~250k such documents, each with its own array of strings and a unique file name. At query time I have a string that may be a substring of one of the stored strings, or may contain one or more of the stored strings; it never matches any stored string exactly.
Examples:
Hello world hi there // Should match test.c
Error connecting // test.c
Error connecting to server: 107 // test.c
Assertion failed: Dbus error // Should match both test.c and run.c
I need to retrieve the document whose "strings" field contains the string that most closely matches my query string.
I have tried indexed text search:
db.testcoll.find( { $text: { $search: "Could not configure supporting library." } } , {"file": 1, "_id": 0, score: {$meta: "textScore"}}).sort({score: {$meta: "textScore"}})
However, a document with several strings that each partially match my query can outscore the document that contains the single closest-matching string.
I tried unwinding the 'strings' field in an aggregation so that each document would hold only one string, expecting the closest match to get the highest score. However, MongoDB only allows a $match containing $text as the first pipeline stage, so I couldn't put the $unwind stage before it.
db.testcoll.aggregate([{$project: {file: 1, strings: 1}}, {$unwind: "$strings"}, {$match: { $text: { $search: "Could not configure supporting library." } }}])
Output:
2019-05-27T11:36:56.671+0530 E QUERY [js] Error: command failed: {
"ok" : 0,
"errmsg" : "$match with $text is only allowed as the first pipeline stage",
"code" : 17313,
"codeName" : "Location17313"
} : aggregate failed
Is there a way to perform text search on the documents returned by unwind? If not, is there any way to perform this kind of a text search in mongodb?
Thanks in advance for any help.
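One possible workaround, sketched under assumptions: keep $text (or any cheap filter) as the first stage to fetch candidate documents, then score each string individually in application code and rank documents by their best single string. The helpers below (tokenize, similarity, rankByBestString) are hypothetical names, and the score is a simple word-overlap (Jaccard) ratio, not MongoDB's textScore.

```javascript
// Sample documents copied from the question.
const docs = [
  { file: "test.c", strings: ["Hello world", "Error connecting to server", "Dbus error"] },
  { file: "run.c", strings: ["Could not configure supporting library.", "Assertion failed", "Error in server response"] }
];

// Hypothetical helper: lowercase word tokens as a Set.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Jaccard overlap between two strings: shared words / all distinct words.
function similarity(a, b) {
  const ta = tokenize(a);
  const tb = tokenize(b);
  let shared = 0;
  for (const t of ta) {
    if (tb.has(t)) shared++;
  }
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Score every document by its single best-matching string, best first.
function rankByBestString(candidates, query) {
  return candidates
    .map(d => ({
      file: d.file,
      score: Math.max(0, ...d.strings.map(s => similarity(s, query)))
    }))
    .sort((x, y) => y.score - x.score);
}

console.log(rankByBestString(docs, "Error connecting to server: 107"));
```

Because each document is scored by its best string only, several weak partial matches in one document cannot add up to beat a single close match in another. A query such as "Assertion failed: Dbus error" gives both files the same score, which fits the expectation that it should match both.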

Related

Bad Value needs an array in mongoDB

I am trying to use $in inside $match condition of aggregate. I have SegmentList as literal array and Segment as a literal value. I am trying to match with $in condition as in query below.
db.collection.aggregate([
{'$match': {
'MaterialNumber': '867522'
}},
{'$project':{
'SegmentList': {'$literal':['A']},
'Segment':{'$literal':'A'},
'value':{'$literal':'B'},
}
},
{'$match': {
'Segment':{'$in':'$SegmentList'}
}
}
])
But I am getting an error.
assert: command failed: {
"ok" : 0,
"errmsg" : "bad query: BadValue: $in needs an array",
"code" : 16810
} : aggregate failed
I am not able to understand what the problem is.
I'm not sure what you intend to achieve here: all the query does is check whether 'A' is in an array containing 'A', which is always true.
$match in its regular form compares a field against a value, not against a field reference, so '$SegmentList' is treated as a plain string rather than an array.
There is no need for $literal; you can pass the array directly.
Use one of the following queries, depending on your MongoDB version.
3.4 & below:
{'$match': {'Segment':{'$in':['A']}}}
3.6 and above, using $expr (not needed for your use case):
{'$match': {'$expr':{'$in':['$Segment', ['A']]}}}
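If the array really does come from a field of the document rather than a constant, the $expr form (MongoDB 3.6+) can reference both fields. A minimal sketch of the question's pipeline rewritten that way; the variable name segmentPipeline is only illustrative:

```javascript
// The question's pipeline, with the final $match using $expr so that
// "$Segment" and "$SegmentList" are resolved as field references.
const segmentPipeline = [
  { $match: { MaterialNumber: "867522" } },
  {
    $project: {
      SegmentList: { $literal: ["A"] },
      Segment: { $literal: "A" },
      value: { $literal: "B" }
    }
  },
  // Aggregation $in takes [ <expression>, <array expression> ].
  { $match: { $expr: { $in: ["$Segment", "$SegmentList"] } } }
];

// In the mongo shell: db.collection.aggregate(segmentPipeline)
console.log(JSON.stringify(segmentPipeline[2]));
```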

Mongo DB find() query error

I am new to MongoDB. I have a collection called person. I'm trying to get all the records without an _id field with this query:
db.person.find({}{_id:0})
but the error is
syntax error: unexpected {
But if I write
db.person.find()
it works perfectly.
Consider the following documents inserted into the person collection:
db.person.insert({"name":"abc"})
db.person.insert({"name":"xyz"})
If you want an exact match, use:
db.person.find({"name":"abc"})
This returns only the documents whose name matches.
If you want all names without _id, pass a projection as the second argument:
db.person.find({},{"_id":0})
which returns
{ "name" : "abc" }
{ "name" : "xyz" }
According to the MongoDB manual, your syntax is slightly wrong: you forgot the comma after {}.
Try this:
db.person.find({}, { _id: 0 } )

compare multiple value in same document in mongodb

My question is how to find the titles of all releases whose title contains the artist name.
Here are two example documents:
`{"release" : {"title" : "DEF day",
"artists" : {"artist" : {"role" : "1","name" : "DEF"}}}
}
{ "release" : {"title" : "XYZ day",
"artists" : {"artist" : {"role" : "1","name" : "KYC"}}}
}`
When I run the following query:
`db.test.find({$where:
"this.release.title.indexOf(this.release.artists.artist.name) > -1"
})`
I get the result "DEF day", so it works.
But when I run the same query against my original data (same format as above), I get this:
`error: {
"$err" : "TypeError: Object 30 has no method 'indexOf'\n at _funcs1 (_funcs1:1:49) near 'his.release.artists.a' ",
"code" : 16722
}`
My collection is big (8GB+).
Just from looking at the error above, can anyone tell what the problem with the query might be? Any answer is appreciated.
The problem is in the data itself. Somewhere you have a document whose title field is not a String (the error shows the value 30), so when indexOf is invoked on that particular document you get the error.
To solve this, first find the erroneous document(s) with a simple query.
MongoDB has the $type operator, which matches fields by their BSON type; the MongoDB documentation lists all BSON types a document field can have.
The type number of the String BSON type is 2, so you need all documents whose title field is not of type 2:
db.test.find({"release.title": {$not: {$type: 2}}})
After finding the document(s), you may fix or remove them.
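If fixing the data is not possible right away, the predicate itself can be made defensive. The sketch below is plain JavaScript run on in-memory samples, not against the collection; titleContainsArtist is a hypothetical helper, and the third sample document (numeric title 30) is an assumed reconstruction of the kind of document that triggers the error.

```javascript
// Only call indexOf when both values are actually strings.
function titleContainsArtist(doc) {
  const release = doc.release || {};
  const artist = (release.artists || {}).artist || {};
  const title = release.title;
  const name = artist.name;
  return typeof title === "string" &&
         typeof name === "string" &&
         title.indexOf(name) > -1;
}

const samples = [
  { release: { title: "DEF day", artists: { artist: { role: "1", name: "DEF" } } } },
  { release: { title: "XYZ day", artists: { artist: { role: "1", name: "KYC" } } } },
  // Assumed bad document: a numeric title has no indexOf method.
  { release: { title: 30, artists: { artist: { role: "1", name: "KYC" } } } }
];

const matchingTitles = samples.filter(titleContainsArtist).map(d => d.release.title);
console.log(matchingTitles);
```

The same typeof checks can be added to the $where expression so that non-string titles are skipped instead of raising a TypeError.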

In MongoDB, can I project whether a value has a substring to a new field?

In MongoDB, I want to group my documents based on whether a certain field has a certain substring. I was trying to project each document to a boolean that says whether the field has that substring/matches that pattern, and then group by that field.
How can I do this?
Edit:
1) I have tried the aggregation pipeline as
db.aggregate([
{
'$match': {
'$regex': '.*world'
}
}
])
But I get an error:
pymongo.errors.OperationFailure: command SON([('aggregate', u'test'), ('pipeline', [{'$match': {'$regex': '.*world'}}])]) failed: exception: bad query: BadValue unknown top level operator: $regex
2) I am NOT trying to select only the words that match the pattern. I would like to select every document. I would like to make a field called 'matched' which says whether the pattern was matched for each element.
The $regex operator needs to appear after the field that you are matching on:
db.test.aggregate([
{
"$match": {
"fieldName": {"$regex": ".*world"}
}
}
])
The "unknown top level operator" message means that the operator has to appear after the field to which it applies.
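For the original goal (keep every document, add a boolean field, then group on it), one option on MongoDB 4.2 or later is the $regexMatch aggregation operator. This is a sketch only: "fieldName" is the placeholder field from above, "matched" is the field name from the question, and countByMatch is a hypothetical plain-JavaScript imitation of what the pipeline computes.

```javascript
// Pipeline sketch: flag each document, then count documents per flag value.
const groupByMatchPipeline = [
  { $addFields: { matched: { $regexMatch: { input: "$fieldName", regex: "world" } } } },
  { $group: { _id: "$matched", count: { $sum: 1 } } }
];

// Plain-JavaScript imitation of the same logic on in-memory documents.
function countByMatch(docs, pattern) {
  const counts = { true: 0, false: 0 };
  for (const doc of docs) {
    const matched = typeof doc.fieldName === "string" && pattern.test(doc.fieldName);
    counts[matched] += 1; // boolean key is coerced to "true" / "false"
  }
  return counts;
}

const sampleDocs = [
  { fieldName: "hello world" },
  { fieldName: "goodbye" },
  { fieldName: "world cup" }
];
console.log(countByMatch(sampleDocs, /world/));
```

An unanchored regex already matches anywhere in the string, so "world" behaves like ".*world" here.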

Mongo db pulling part of the document

I have a document that contains an array of nested documents representing changes (a log). Each change document has a timestamp, a field name, and the old and new values. As you can see, this array will grow a lot, and I don't want to fetch all the changes, only the recent ones.
I want to query for only those changes that fall between two timestamps. I am not interested in any other information in the document, so I don't want to pull it, just the recent changes.
{
.......
"adwordsChanges":[
{
"timestamp":NumberLong("1400162491325"),
"field":"syncState",
"old":null,
"new":"OK"
},
{
"timestamp":NumberLong("1400162491325"),
"field":"keywordId",
"old":null,
"new":NumberLong("23918779329")
},
{
"timestamp":NumberLong("1400162491325"),
"field":"adGroupId",
"old":null,
"new":NumberLong("16972286472")
}
]
}
This is what I've tried!
db.keywords.find(
{
$and :[{"_id" : ObjectId("5374c7a7ac58b0d3b5e970fa")}, {"adwordsChanges.field" : "keywordId"}, {"adwordsChanges.timestamp" : {$gte:NumberLong("11111111"), $lte:NumberLong(99999999999999) }}]
})
Running Andrei's query, I get the following error:
assert: command failed: {
"errmsg" : "exception: The top-level _id field is the only field currently supported for exclusion",
"code" : 16406,
"ok" : 0
} : aggregate failed
Error: command failed: {
"errmsg" : "exception: The top-level _id field is the only field currently supported for exclusion",
"code" : 16406,
"ok" : 0
} : aggregate failed
at Error (<anonymous>)
at doassert (src/mongo/shell/assert.js:11:14)
at Function.assert.commandWorked (src/mongo/shell/assert.js:244:5)
at DBCollection.aggregate (src/mongo/shell/collection.js:1149:12)
at (shell):1:13
2014-05-27T15:23:54.945+0100 Error: command failed: {
"errmsg" : "exception: The top-level _id field is the only field currently supported for exclusion",
"code" : 16406,
"ok" : 0
Thanks for any help!
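You could use the aggregation framework to filter subdocuments, like this:
db.keywords.aggregate([
  // Emit one document per entry of the adwordsChanges array
  {$unwind: "$adwordsChanges"},
  // Keep only changes to the wanted field inside the time window
  // (bounds taken from the question; $gte must not exceed $lte)
  {$match: {"adwordsChanges.timestamp":
              {$gte: NumberLong("11111111"), $lte: NumberLong("99999999999999")},
            "adwordsChanges.field": "keywordId"}},
  // Lift the subdocument fields to the root level
  {$project: {_id: 0,
              timestamp: "$adwordsChanges.timestamp",
              field: "$adwordsChanges.field",
              old: "$adwordsChanges.old",
              new: "$adwordsChanges.new"}}
]);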
Explanation of the difference between $project and $group
Initially I thought the original document should be returned, so $group was used to restore the original document structure; in this case $group is the inverse operation of $unwind. After clarification it became clear that only the subdocuments were needed, so I replaced the $group stage with $project, which projects the subdocument fields to the root level.
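For comparison, a sketch of the $group variant, which rebuilds one document per _id with only the matching changes. The pipeline shape is an assumption about what the earlier version looked like, not a copy of it, and filterChangesKeepingDocs is a hypothetical plain-JavaScript imitation used only to show the resulting shape.

```javascript
// $unwind splits the array, $match filters the pieces, and $group pushes
// the surviving pieces back into one array per original document.
const restorePipeline = [
  { $unwind: "$adwordsChanges" },
  { $match: { "adwordsChanges.field": "keywordId" } },
  { $group: { _id: "$_id", adwordsChanges: { $push: "$adwordsChanges" } } }
];

// In-memory imitation: documents with no matching change disappear,
// just as they would after $unwind + $match + $group.
function filterChangesKeepingDocs(docs, field) {
  return docs
    .map(d => ({
      _id: d._id,
      adwordsChanges: d.adwordsChanges.filter(c => c.field === field)
    }))
    .filter(d => d.adwordsChanges.length > 0);
}

const keywordDocs = [
  {
    _id: 1,
    adwordsChanges: [
      { timestamp: 1400162491325, field: "syncState", old: null, new: "OK" },
      { timestamp: 1400162491325, field: "keywordId", old: null, new: 23918779329 }
    ]
  },
  { _id: 2, adwordsChanges: [{ timestamp: 1400162491325, field: "syncState", old: null, new: "OK" }] }
];
console.log(JSON.stringify(filterChangesKeepingDocs(keywordDocs, "keywordId")));
```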