MongoDB Query doesn't return with a sort - mongodb

I have the query:
db.changes.find(
{
$or: [
{ _id: ObjectId("60b1e8dc9d0359001bb80441") },
{ _oid: ObjectId("60b1e8dc9d0359001bb80441") },
],
},
{
_id: 1,
}
);
which returns almost instantly.
But the moment I add a sort, the query doesn't return. The query just runs. The longest I could tolerate the query running was over 30 Min, so I'm not entirely sure if it does eventually return.
db.changes
.find(
{
$or: [
{ _id: ObjectId("60b1e8dc9d0359001bb80441") },
{ _oid: ObjectId("60b1e8dc9d0359001bb80441") },
],
},
{
_id: 1,
}
)
.sort({ _id: -1 });
I have the following indexes:
[
{
"_oid" : 1
},
{
"_id" : 1
}
]
and this is what db.currentOp() returns:
{
"host": "xxxx:27017",
"desc": "conn387",
"connectionId": 387,
"client": "xxxx:55802",
"appName": "MongoDB Shell",
"clientMetadata": {
"application": {
"name": "MongoDB Shell"
},
"driver": {
"name": "MongoDB Internal Client",
"version": "4.0.5-18-g7e327a9017"
},
"os": {
"type": "Linux",
"name": "Ubuntu",
"architecture": "x86_64",
"version": "20.04"
}
},
"active": true,
"currentOpTime": "2021-09-24T15:26:54.286+0200",
"opid": 71111,
"secs_running": NumberLong(23),
"microsecs_running": NumberLong(23860504),
"op": "query",
"ns": "myDB.changes",
"command": {
"find": "changes",
"filter": {
"$or": [
{
"_id": ObjectId("60b1e8dc9d0359001bb80441")
},
{
"_oid": ObjectId("60b1e8dc9d0359001bb80441")
}
]
},
"sort": {
"_id": -1.0
},
"projection": {
"_id": 1.0
},
"lsid": {
"id": UUID("38c4c09b-d740-4e44-a5a5-b17e0e04f776")
},
"$readPreference": {
"mode": "secondaryPreferred"
},
"$db": "myDB"
},
"numYields": 1346,
"locks": {
"Global": "r",
"Database": "r",
"Collection": "r"
},
"waitingForLock": false,
"lockStats": {
"Global": {
"acquireCount": {
"r": NumberLong(2694)
}
},
"Database": {
"acquireCount": {
"r": NumberLong(1347)
}
},
"Collection": {
"acquireCount": {
"r": NumberLong(1347)
}
}
}
}
This wasn't always a problem, it's only recently started. I've also rebuilt the indexes, and nothing seems to work. I've tried using .explain(), and that also doesn't return.
Any suggestions would be welcome. For my situation, it's going to be much easier to make changes to the DB than it is to change the query.

This is happening due to the way Mongo chooses what's called a "winning plan", I recommend you read more on this in my other answer which explains this behavior. However it is interesting to see if the Mongo team will consider this specific behavior a feature or a bug.
Basically the $or operator has some special qualities, as specified:
When evaluating the clauses in the $or expression, MongoDB either performs a collection scan or, if all the clauses are supported by indexes, MongoDB performs index scans. That is, for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes. Otherwise, MongoDB will perform a collection scan.
It seems that the addition of the sort is disrupting the usage this quality, meaning you're running a collection scan all of a sudden.
What I recommend you do is use the aggregation pipeline instead of the query language, I personally find it has more stable behavior and it might work there. If not maybe just do the sorting in code ..

The server can use a separate index for each branch of the $or, but in order to avoid doing an in-memory sort the indexes used would have to find the documents in the sort order so a merge-sort can be used instead.
For this query, an index on {_id:1} would find documents matching the first branch, and return them in the proper order. For the second branch, and index on {oid:1, _id:1} would do the same.
If you have both of those indexes, the server should be able to find the matching documents quickly, and return them without needing to perform an explicit sort.

Related

MongoDB Atlas Search not showing results when typing few characters

The problem I am facing is that I want to develop an autocomplete search bar using Mean Stack like the one in this site, but when I type, for example, 'ag' it's not returning the right location that should be 'Aguascalientes'.
I have two different search indexes set up and a different query for each.
First Index:
{
"mappings": {
"dynamic": false,
"fields": {
"name": {
"foldDiacritics": false,
"maxGrams": 7,
"minGrams": 3,
"tokenization": "edgeGram",
"type": "autocomplete"
},
"searchName": {
"foldDiacritics": false,
"maxGrams": 7,
"minGrams": 3,
"tokenization": "edgeGram",
"type": "autocomplete"
}
}
}
}
First Query:
[
{
$search: {
index: "autocomplete2",
compound: {
must: [
{
text: {
query: search,
path: "searchName",
fuzzy: {
maxEdits: 2,
},
},
},
],
},
},
},
{
$limit: 10,
},
]
The first ones are not returning any document at all. But the second example is:
{
"mappings": {
"dynamic": false,
"fields": {
"name": {
"analyzer": "lucene.standard",
"type": "string"
},
"searchName": {
"analyzer": "lucene.standard",
"type": "string"
}
}
}
}
Query:
[
{
$search: {
index: 'default',
compound: {
must: [
{
text: {
query: search,
path: 'name',
fuzzy: {
maxEdits: 1,
},
},
},
{
text: {
query: search,
path: 'searchName',
fuzzy: {
maxEdits: 1,
},
},
},
],
},
},
},
{
$limit: 5,
},
]
The second example is only returning documents if the search term 'aguascalient' but is not returning any document if the search term is shorter like the site. Maybe it has something to do with the fuzzy edits but if I set it up to greater than 2 I get an error.
Also the order is not right, it returns first the CITY and second the STATE but I need the STATE first because the search term is more similar than the city. Let me explain, search field for STATE is only 'Aguascalientes' but search field cities is 'Aguascalientes Aguascalientes' so I don't know why is not working properly. Maybe in that case I should give weights accordingly but I'm not sure if it's the right approach to solve this.
My data structure:
{
"_id": "638d0ffc34ad076c6bd12cb6",
"depth": 2,
"label": "CITY",
"location_id": "V1-C-247",
"name": "Aguascalientes",
"parent": "Aguascalientes",
"fullName": "Aguascalientes, Aguascalientes",
"parentId": "V1-B-61",
"searchName": "Aguascalientes Aguascalientes",
}
{
"_id": "638d0ffc34ad076c6bd12cb6",
"depth": 1,
"label": "STATE",
"location_id": "V1-C-248",
"name": "Aguascalientes",
"parent": null,
"fullName": "Aguascalientes",
"parentId": null,
"searchName": "Aguascalientes",
}
For the first index + query setup:
First, you are indexing the name field but are not searching on it. I will remove it from the code snippets for readability, but you can add it back to your index definition if you find you need to search on it.
There are two problems with the this index + query setup if you want to return results with a query for "ag". You have searchName defined as a field mapping of type autocomplete, but you also need to use the autocomplete operator in your query:
[
{
$search: {
index: "autocomplete2",
compound: {
must: [
{
autocomplete: {
query: search,
path: "searchName",
},
},
],
},
},
},
{
$limit: 10,
},
]
Second, in your index definition field mapping for searchName, you have minGram set to 3 and maxGram set to 7. Based on the documentation for the autocomplete field mapping, this means that your data will be tokenized into sequences of character lengths between 3 to 7, using the selected tokenization strategy. Since you have selected edgeGram, the tokens generated by the text "Aguascalientes" will be tokenized starting from the left edge, resulting in tokens "agu", "agua", "aguas", "aguasc", "aguasca". Since the search term "ag" does not match any of the tokens, nothing is returned. So, you must change the minGram to 2 to get the token "ag":
{
"mappings": {
"dynamic": false,
"fields": {
"searchName": {
"foldDiacritics": false,
"maxGrams": 7,
"minGrams": 2,
"tokenization": "edgeGram",
"type": "autocomplete"
}
}
}
}
Finally, if you want the document with an exact match to return over a partial match, ie. "Aguascalientes" should return before "Aguascalientes Aguascalientes", you need to implement exact matching. Here is a MongoDB blog post outlining a few options.
One option that I tried: In the index, use a keyword analyzer on the "searchName" field typed as a string data type. In the query, use the text operator nested in a should clause so that exact matches will return higher than other results.
Index:
{
"mappings": {
"dynamic": false,
"fields": {
"searchName": [
{
"foldDiacritics": false,
"maxGrams": 7,
"type": "autocomplete"
},
{
"analyzer": "lucene.keyword",
"searchAnalyzer": "lucene.keyword",
"type": "string"
}
]
}
}
}
Query:
[
{
$search: {
compound: {
must: [
{
autocomplete: {
query: search,
path: "searchName"
}
}
],
should:[
{
text: {
query: search,
path: "searchName"
}
}
],
},
},
},
]

MongoDB: $set specific fields for a document array elements only if not null

I have a collection with the following documents (for example):
{
"_id": {
"$oid": "61acefe999e03b9324czzzzz"
},
"matchId": {
"$oid": "61a392cc54e3752cc71zzzzz"
},
"logs": [
{
"actionType": "CREATE",
"data": {
"talent": {
"talentId": "qq",
"talentVersion": "2.10",
"firstName": "Joelle",
"lastName": "Doe",
"socialLinks": [
{
"type": "FACEBOOK",
"url": "https://www.facebook.com"
},
{
"type": "LINKEDIN",
"url": "https://www.linkedin.com"
}
],
"webResults": [
{
"type": "VIDEO",
"date": "2021-11-28T14:31:40.728Z",
"link": "http://placeimg.com/640/480",
"title": "Et necessitatibus",
"platform": "Repellendus"
}
]
},
"createdBy": "DEVELOPER"
}
},
{
"actionType": "UPDATE",
"data": {
"talent": {
"firstName": "Joelle new",
"webResults": [
{
"type": "VIDEO",
"date": "2021-11-28T14:31:40.728Z",
"link": "http://placeimg.com/640/480",
"title": "Et necessitatibus",
"platform": "Repellendus"
}
]
}
}
}
]
},
{
"_id": {
"$oid": "61acefe999e03b9324caaaaa"
},
"matchId": {
"$oid": "61a392cc54e3752cc71zzzzz"
},
"logs": [....]
}
a brief breakdown: I have many objects like this one in the collection. they are a kind of an audit log for actions takes on other documents, 'Match(es)'. for example CREATE + the data, UPDATE + the data, etc.
As you can see, logs field of the document is an array of objects, each describing one of these actions.
data for each action may or may not contain specific fields, that in turn can also be an array of objects: socialLinks and webResults.
I'm trying to remove sensitive data from all of these documents with specified Match ids.
For each document, I want to go over the logs array field, and change the value of specific fields only if they exist, for example: change firstName to *****, same for lastName, if those appear. also, go over the socialLinks array if exists, and for each element inside it, if a field url exists, change it to ***** as well.
What I've tried so far are many minor variations for this query:
$set: {
'logs.$[].data.talent.socialLinks.$[].url': '*****',
'logs.$[].data.talent.webResults.$[].link': '*****',
'logs.$[].data.talent.webResults.$[].title': '*****',
'logs.$[].data.talent.firstName': '*****',
'logs.$[].data.talent.lastName': '*****',
},
and some play around with this kind of aggregation query:
[{
$set: {
'talent.socialLinks.$[el].url': {
$cond: [{ $ne: ['el.url', null] },'*****', undefined],
},
},
}]
resulting in errors like: message: "The path 'logs.0.data.talent.socialLinks' must exist in the document in order to apply array updates.",
But I just cant get it to work... :(
Would love an explanation on how to exactly achieve this kind of set-only-if-exists behaviour.
A working example would also be much appreciated, thx.
Would suggest using $\[<indentifier>\] (filtered positional operator) and arrayFilters to update the nested document(s) in the array field.
In arrayFilters, with $exists to check the existence of the certain document which matches the condition and to be updated.
db.collection.update({},
{
$set: {
"logs.$[a].data.talent.socialLinks.$[].url": "*****",
"logs.$[b].data.talent.webResults.$[].link": "*****",
"logs.$[b].data.talent.webResults.$[].title": "*****",
"logs.$[c].data.talent.firstName": "*****",
"logs.$[d].data.talent.lastName": "*****",
}
},
{
arrayFilters: [
{
"a.data.talent.socialLinks": {
$exists: true
}
},
{
"b.data.talent.webResults": {
$exists: true
}
},
{
"c.data.talent.firstName": {
$exists: true
}
},
{
"d.data.talent.lastName": {
$exists: true
}
}
]
})
Sample Mongo Playground

What is the best way to query an array of subdocument in MongoDB?

let's say I have a collection like so:
{
"id": "2902-48239-42389-83294",
"data": {
"location": [
{
"country": "Italy",
"city": "Rome"
}
],
"time": [
{
"timestamp": "1626298659",
"data":"2020-12-24 09:42:30"
}
],
"details": [
{
"timestamp": "1626298659",
"data": {
"url": "https://example.com",
"name": "John Doe",
"email": "john#doe.com"
}
},
{
"timestamp": "1626298652",
"data": {
"url": "https://www.myexample.com",
"name": "John Doe",
"email": "doe#john.com"
}
},
{
"timestamp": "1626298652",
"data": {
"url": "http://example.com/sub/directory",
"name": "John Doe",
"email": "doe#johnson.com"
}
}
]
}
}
Now the main focus is on the array of subdocument("data.details"): I want to get output only of relevant matches e.g:
db.info.find({"data.details.data.url": "example.com"})
How can I get a match for all "data.details.data.url" contains "example.com" but won't match with "myexample.com". When I do it with $regex I get too many results, so if I query for "example.com" it also return "myexample.com"
Even when I do get partial results (with $match), It's very slow. I tried this aggregation stages:
{ $unwind: "$data.details" },
{
$match: {
"data.details.data.url": /.*example.com.*/,
},
},
{
$project: {
id: 1,
"data.details.data.url": 1,
"data.details.data.email": 1,
},
},
I really don't understand the pattern, with $match, sometimes Mongo do recognize prefixes like "https://" or "https://www." and sometime it does not.
More info:
My collection has dozens of GB, I created two indexes:
Compound like so:
"data.details.data.url": 1,
"data.details.data.email": 1
Text Index:
"data.details.data.url": "text",
"data.details.data.email": "text"
It did improve the query performance but not enough and I still have this issue with the $match vs $regex. Thanks for helpers!
Your mistake is in the regex. It matches all URLs because the substring example.com is in all URLs. For example: https://www.myexample.com matches the bolded part.
To avoid this you have to use another regex, for example that just start with that domain.
For example:
(http[s]?:\/\/|www\.)YOUR_SEARCH
will check that what you are searching for is behind an http:// or www. marks.
https://regex101.com/r/M4OLw1/1
I leave you the full query.
[
{
'$unwind': {
'path': '$data.details'
}
}, {
'$match': {
'data.details.data.url': /(http[s]?:\/\/|www\.)example\.com/)
}
}
]
Note: you must scape special characters from the regex. A dot matches any character and the slash will close your regex causing an error.

MongoDB - Project specific element from array (big data)

I got a big array with data in the following format:
{
"application": "myapp",
"buildSystem": {
"counter": 2361.1,
"hostname": "host.com",
"jobName": "job_name",
"label": "2361",
"systemType": "sys"
},
"creationTime": 1517420374748,
"id": "123",
"stack": "OTHER",
"testStatus": "PASSED",
"testSuites": [
{
"errors": 0,
"failures": 0,
"hostname": "some_host",
"properties": [
{
"name": "some_name",
"value": "UnicodeLittle"
},
<MANY MORE PROPERTIES>,
{
"name": "sun",
"value": ""
}
],
"skipped": 0,
"systemError": "",
"systemOut": "",
"testCases": [
{
"classname": "IdTest",
"name": "has correct representation",
"status": "PASSED",
"time": "0.001"
},
<MANY MORE TEST CASES>,
{
"classname": "IdTest",
"name": "normalized values",
"status": "PASSED",
"time": "0.001"
}
],
"tests": 8,
"time": 0.005,
"timestamp": "2018-01-31T17:35:15",
"title": "IdTest"
}
<MANY MORE TEST SUITES >,
]}
Where I can distinct three main structures with big data: TestSuites, Properties, and TestCases. My task is to sum all times from each TestSuite so that I can get the total duration of the test. Since the properties and TestCases are huge, the query cannot complete. I would like to select only the "time" value from TestSuites, but it kind of conflicts with the "time" of TestCases in my query:
db.my_tests.find(
{
application: application,
creationTime:{
$gte: start_date.valueOf(),
$lte: end_date.valueOf()
}
},
{
application: 1,
creationTime: 1,
buildSystem: 1,
"testSuites.time": 1,
_id:1
}
)
Is it possible to project only the "time" properties from TestSuites without loading the whole schema? I already tried testSuites: 1, testSuites.$.time: 1 without success. Please notice that TestSuites is an array of one element with a dictionary.
I already checked this similar post without success:
Mongodb update the specific element from subarray
Following code prints duration of each TestSuite:
query = db.my_collection.aggregate(
[
{$match: {
application: application,
creationTime:{
$gte: start_date.valueOf(),
$lte: end_date.valueOf()
}
}
},
{ $project :
{ duration: { $sum: "$testSuites.time"}}
}
]
).forEach(function(doc)
{
print(doc._id)
print(doc.duration)
}
)
Is it possible to project only the "time" properties from TestSuites
without loading the whole schema? I already tried testSuites: 1,
testSuites.$.time
Answering to your problem of prejecting only the time property of the testSuites document you can simply try projecting it with "testSuites.time" : 1 (you need to add the quotes for the dot notation property references).
My task is to sum all times from each TestSuite so that I can get the
total duration of the test. Since the properties and TestCases are
huge, the query cannot complete
As for your task, i suggest you try out the mongodb's aggregation framework for your calculations documents tranformations. The aggregations framework option {allowDiskUse : true} will also help you if you are proccessing "large" documents.

removing an entire subdocument mongodb

This should be really easy but I cant seem to get it working;
I just need to remove a sub-document that I had accidentally run and introduced undesirable info in the sub-document.
I tried - db.test.remove({}, {dcoll10:{"$exists": true}, {multi: true}); but this didnt work.
Example of document and sub-doc (dcoll10) is given below;
{
"_id": "SSS",
"ts": { "$date": 1395927614611 },
"dcoll10": [
{
"_id": "SSS",
"type": "1813",
"gro": "0.1",
},
{
"_id": "SSS",
"type": "1813",
"gro": "0.1",
}
],
"assima" : [
{......}
]
}
It sounds like you want the $unset operator which is documented here: http://docs.mongodb.org/manual/reference/operator/update/unset/