Slow query in DocumentDB - aws-documentdb

I have a DocumentDB cluster with some really slow queries and CPU peaks. Based on the profiler, one of the slowest queries is the following. I'm trying to understand what kind of issue could cause this query to be so freaking slow:
{
  "op": "update",
  "ts": 1597203362481,
  "ns": "db.registrations",
  "command": {
    "q": { "_id": "5ca3f02edb3fb733eb2f0f46" },
    "u": {
      "$set": {
        "last_active_at": "2020-08-12T03:33:43.278Z",
        "updated_at": "2020-08-12T03:33:43.278Z"
      }
    }
  },
  "nMatched": 1,
  "nModified": 1,
  "protocol": "op_query",
  "millis": 139201,
  "planSummary": "IXSCAN",
  "execStats": {
    "stage": "UPDATE",
    "nReturned": "0",
    "executionTimeMillisEstimate": "138855.197",
    "inputStages": [
      {
        "stage": "LIMIT_SKIP",
        "nReturned": "1",
        "executionTimeMillisEstimate": "138822.167",
        "inputStage": {
          "stage": "IXSCAN",
          "nReturned": "1",
          "executionTimeMillisEstimate": "0.305",
          "indexName": "_id_",
          "direction": "forward"
        }
      },
      {
        "stage": "IXSCAN",
        "nReturned": "1",
        "executionTimeMillisEstimate": "138841.412",
        "indexName": "_id_",
        "direction": "forward"
      }
    ]
  },
  "client": "10.70.13.125:42436",
  "user": "db"
}
Neither of those fields (last_active_at or updated_at) has an index. So why could this be so slow?
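Worth noting when reading the profiler output above: the IXSCAN stage itself reports only ~0.3 ms while the whole update reports ~139 s, so the time is most likely spent waiting (for example on a lock or CPU contention) rather than scanning. One way to look for concurrent long-running operations while the slowness occurs is a sketch like the following, using the currentOp admin command from the mongo shell (output fields and filter support may differ between DocumentDB and MongoDB):

```javascript
// Sketch: list operations that have been running for at least 5 seconds.
// Run in the mongo shell while the slow update is in flight; the
// secs_running threshold is illustrative.
db.adminCommand({ currentOp: 1, secs_running: { $gte: 5 } })
```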


Aggregations Select does not work in the MongoDB Compass Windows tool

All of a sudden, Select doesn't work. My query doesn't seem to be wrong, but Select doesn't work and it's hard to insert a pipeline. I need it for debugging. Why doesn't it work?
In the Windows MongoDB Compass management tool, adding an Aggregations -> Select query does not work.
Image link: https://i.stack.imgur.com/z609k.png
My code:
[
  { '$match': { 'month': 202202 } },
  {
    '$lookup': {
      'from': 'stat_ad_conversion',
      'let': { 'master_id': '$master_id', 'campaign_id': '$campaign_id', 'biz_id': '$biz_id', 'yyyy': '$yyyy', 'mm': '$mm' },
      'pipeline': [
        {
          '$match': {
            '$expr': {
              '$and': [
                { '$eq': [ '$master_id', '$$master_id' ] },
                { '$eq': [ '$campaign_id', '$$campaign_id' ] },
                { '$eq': [ '$biz_id', '$$biz_id' ] },
                { '$eq': [ '$yyyy', '$$yyyy' ] },
                { '$eq': [ '$mm', '$$mm' ] }
              ]
            }
          }
        }
      ],
      'as': 't1'
    }
  },
  { '$unwind': { 'path': '$t1', 'preserveNullAndEmptyArrays': true } },
  {
    '$group': {
      '_id': { 'master_id': '$master_id', 'campaign_id': '$campaign_id', 'biz_id': '$biz_id', 'yyyy': '$yyyy', 'mm': '$mm' },
      'exposecount': { '$sum': '$exposecount' },
      'clickcount': { '$sum': '$clickcount' },
      'cost': { '$sum': '$cost' },
      'mm': { '$first': '$mm' }
    }
  },
  {
    '$group': {
      '_id': { 'master_id': '$t1.master_id', 'campaign_id': '$t1.campaign_id', 'biz_id': '$t1.biz_id', 'keyword_id': '$t1.keyword_id', 'yyyy': '$t1.yyyy', 'mm': '$t1.mm' },
      'count': { '$sum': '$conversion_count' },
      'keywordcount': { '$sum': 1 },
      'directcount': { '$sum': { '$switch': { 'branches': [ { 'case': { '$eq': [ '$t1.conversion_method', 1 ] }, 'then': '$t1.conversion_count' } ], 'default': 0 } } },
      'indirectcount': { '$sum': { '$switch': { 'branches': [ { 'case': { '$eq': [ '$t1.conversion_method', 2 ] }, 'then': '$t1.conversion_count' } ], 'default': 0 } } },
      'conversioncost': { '$sum': '$t1.sales_conversion' },
      'conversiondirectcost': { '$sum': { '$switch': { 'branches': [ { 'case': { '$eq': [ '$t1.conversion_method', 1 ] }, 'then': '$t1.sales_conversion' } ], 'default': 0 } } },
      'conversionindirectcost': { '$sum': { '$switch': { 'branches': [ { 'case': { '$eq': [ '$t1.conversion_method', 2 ] }, 'then': '$t1.sales_conversion' } ], 'default': 0 } } }
    }
  },
  {
    '$merge': {
      'into': 'finish_period_month_1',
      'on': '_id',
      'whenMatched': 'replace',
      'whenNotMatched': 'insert'
    }
  }
]
When I create and execute a normal pipeline query, Select does not work.
Only "No Preview Documents" appears; no results come out.

Search is slow in Node Mongo API?

I have created an API to search documents in MongoDB.
My MongoDB has more than 60 million documents.
So I created an API to search, and the code is:
let regexexp = new RegExp(searchquery.split('').join('\\s*'), 'i');
let s = { "name": regexexp, "status": 'AVAILABLE' };
db.collection(collectionName).find(s).count().then(function(totalcount) {
  db.collection(collectionName).find(s).limit(18).toArray(function(err, res) {
    if (err) throw err;
    let responsedata = {
      status: true,
      error: false,
      data: res,
      'totalproductcount': totalcount
    };
    resolve(responsedata);
  });
});
But it takes around 1 min 30 sec, sometimes 2 min, to return 18 documents, which is far too long for any user. Is there anything I have missed?
I have used these values for the MongoDB connection:
const option = {
  useUnifiedTopology: true,
  useNewUrlParser: true,
  socketTimeoutMS: 300000,
  poolSize: 500,
  keepAlive: 300000,
  connectTimeoutMS: 300000,
  authSource: 'admin',
};
Then I found out about aggregation in MongoDB and prepared this query:
db.collection('customers').aggregate([
  { $match: { name: "abc" } }
]).toArray(function(err, results) {
  console.log(results);
});
But it only works for an exact match. I want it to match when "abc" appears anywhere in the name field, but that doesn't work.
Is there any way to get the data in minimal time? When I run the query from the mongo shell it returns data in milliseconds, but it takes a long time when I hit it through my API.
If anything more is required, please ask here. Any help is appreciated.
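For what it's worth, a case-insensitive substring match can be expressed in the aggregation's $match stage with $regex instead of an exact string. Below is a minimal sketch (the substringMatchStage helper is our name, not a driver API) that also escapes the input so regex metacharacters are treated literally. Note that an unanchored, case-insensitive regex generally cannot use an ordinary index, which is one likely reason a 60-million-document search is slow; a text index may be a better fit.

```javascript
// Sketch: build a $match stage that matches `input` anywhere in `field`,
// case-insensitively. Helper name is illustrative, not a driver API.
function substringMatchStage(field, input) {
  // Escape regex metacharacters so user input is matched literally.
  const escaped = input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return { $match: { [field]: { $regex: escaped, $options: "i" } } };
}

// Usage with the aggregate call from the question:
//   db.collection('customers').aggregate([ substringMatchStage('name', 'abc') ])
const stage = substringMatchStage("name", "abc");
const re = new RegExp(stage.$match.name.$regex, stage.$match.name.$options);
console.log(re.test("xyzABCdef")); // true
```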

MongoDB View vs Function to abstract query and variable/parameter passed

I hate to risk asking a duplicate question, but perhaps this is different from Passing Variables to a MongoDB View which didn't have any clear solution.
Below is a query to find the country for IP Address 16778237. (Outside the scope of this query, there is a formula that turns an IPV4 address into a number.)
I was wondering if we could abstract away this query out of NodeJS code, and make a view, so the view could be called from NodeJS. But the fields ipFrom and ipTo are indexed to get the query to run fast against millions of documents in the collection, so we can't return all the rows to NodeJS and filter there.
In MSSQL maybe this would have to be a stored procedure, instead of a view. Just trying to learn what is possible in MongoDB. I know there are functions, which are written in JavaScript. Is that where I need to look?
db['ip2Locations'].aggregate([
  {
    $match: {
      $and: [
        { "ipFrom": { $lte: 16778237 } },
        { "ipTo": { $gte: 16778237 } },
        { "active": true }
      ],
      $comment: "where 16778237 between startIPRange and stopIPRange and the row is 'Active', sort by createdDateTime, limit to the top 1 row, and return the country"
    }
  },
  { $sort: { 'createdDateTime': -1 } },
  { $project: { 'countryCode': 1 } },
  { $limit: 1 }
])
Part 2: after more research and experimenting, I found this is possible and runs successfully, but then see my attempt to make a view below this query.
var ipaddr = 16778237
db['ip2Locations'].aggregate([
  {
    $match: {
      $and: [
        { "ipFrom": { $lte: ipaddr } },
        { "ipTo": { $gte: ipaddr } },
        { "active": true }
      ],
      $comment: "where 16778237 between startIPRange and stopIPRange and the row is 'Active', sort by createdDateTime, limit to the top 1 row, and return the country"
    }
  },
  { $sort: { 'createdDateTime': -1 } },
  { $project: { 'countryCode': 1 } },
  { $limit: 1 }
])
If I try to create a view with a "var" in it, like this:
db.createView("ip2Locations_vw-lookupcountryfromip","ip2Locations",[
var ipaddr = 16778237
db['ip2Locations'].aggregate(
I get error:
[Error] SyntaxError: expected expression, got keyword 'var'
In the link I provided above, I think the poster was trying to figure out how the $$user-variables work (there is no example here: https://docs.mongodb.com/manual/reference/aggregation-variables/). That page refers to $let, but never shows how the two work together. I found one example on variables here: https://www.tutorialspoint.com/mongodb-query-to-set-user-defined-variable-into-query, but not on $$variables. So I'm trying something like:
db.createView("ip2Locations_vw-lookupcountryfromip","ip2Locations",[
db['ip2Locations'].aggregate(
...etc...
"ipFrom": {
$lte: $$ipaddr
}
I tried ipaddr, $ipaddr, and $$ipaddr, and they all give a variation of this error:
[Error] ReferenceError: $ipaddr is not defined
In a perfect world, one would be able to do something like:
get['ip2Locations_vw-lookupcountryfromip'].find({$let: {'ipaddr': 16778237})
or similar.
I gather that it's possible with JavaScript stored in MongoDB (How to use variables in MongoDB query?), but I'll have to re-read that; some blogs seemed to warn against it.
I have yet to find a working example using $$user-variables, still looking.
Interpretation
You want to query a view from some server side code, passing a variable to it.
Context
Can we use an external variable to recompute a View? Take the following pipeline:
var pipeline = [{ $group:{ _id:null, useless:{ $push:"$$NOW" } } }]
We can pass system variables using $$. We can define user variables too, but the user defined variables are made out of:
Collection Data
System Variables.
Also, with respect to your Part 2:
A variable declared with var variable = "what" is computed only once. Redefining variable = "whatever" makes no difference to the view; it still uses "what".
Conclusion
Views can only be re-computed with system variables, or with user variables that depend on those system variables or on collection data.
I added an answer to the post you linked, too.
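Since a view cannot receive a parameter, the usual workaround is to keep the pipeline in application code and build it per request. A minimal sketch (buildIpLookupPipeline is our name; the collection and fields follow the question):

```javascript
// Sketch: build the question's pipeline around a runtime ipaddr value.
function buildIpLookupPipeline(ipaddr) {
  return [
    { $match: { ipFrom: { $lte: ipaddr }, ipTo: { $gte: ipaddr }, active: true } },
    { $sort: { createdDateTime: -1 } },
    { $project: { countryCode: 1 } },
    { $limit: 1 }
  ];
}

// In Node.js this takes the place of the view:
//   db.collection('ip2Locations').aggregate(buildIpLookupPipeline(16778237))
console.log(buildIpLookupPipeline(16778237)[0].$match.ipFrom.$lte); // 16778237
```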

Update with upsert, but only update if the date field of the document in the db is less than the updated document's

I am having a bit of an issue trying to come up with the logic for this. What I want to do is:
Bulk update a bunch of posts to my remote MongoDB instance, BUT
if updating, only update when the lastModified field in the remote collection is less than the lastModified field in the same document that I am about to update/insert.
Basically, I want to update my list of documents if they have been modified since the last time I updated them.
I can think of two brute-force ways to do it...
First: query my entire collection, manually remove and replace the documents that match the criteria, add the new ones, and then mass insert everything back into the remote collection after deleting everything in remote.
Second: query each item and then decide, if there is one in remote, whether I want to update it. This seems like it would be very taxing when dealing with remote collections.
If relevant, I am working in a NodeJS environment, using the mongodb npm package for database operations.
You can use the bulkWrite API to carry out the updates based on the logic you specified, since it handles this well.
For example, the following snippet shows how to go about it, assuming you already have the data from the web service that you need to update the remote collection with:
mongodb.connect(mongo_url, function(err, db) {
  if (err) console.log(err);
  else {
    var mongo_remote_collection = db.collection("remote_collection_name");
    /* data is from an http call to an external service; ideally
       place this within the service callback */
    mongoUpsert(mongo_remote_collection, data, function() {
      db.close();
    });
  }
});
function mongoUpsert(collection, data_array, cb) {
  var ops = data_array.map(function(data) {
    return {
      "updateOne": {
        "filter": {
          "_id": data._id, // or any other filtering mechanism to identify a doc
          "lastModified": { "$lt": data.lastModified }
        },
        "update": { "$set": data },
        "upsert": true
      }
    };
  });
  collection.bulkWrite(ops, function(err, r) {
    // do something with the result, then signal completion
    cb(err);
  });
}
If the data from the external service is huge, consider sending the writes to the server in batches of, say, 500; that gives better performance because you are not sending every request to the server individually, just one request per 500 operations.
For bulk operations MongoDB imposes a default internal limit of 1000 operations per batch, so choosing 500 documents gives you some control over the batch size rather than letting MongoDB impose the default, which matters for larger operations on the order of more than 1000 documents. For the case above you could just send the whole array at once since it is small; the 500-document batching is for larger arrays.
var ops = [],
    counter = 0;

data_array.forEach(function(data) {
  ops.push({
    "updateOne": {
      "filter": {
        "_id": data._id,
        "lastModified": { "$lt": data.lastModified }
      },
      "update": { "$set": data },
      "upsert": true
    }
  });
  counter++;

  if (counter % 500 === 0) {
    collection.bulkWrite(ops, function(err, r) {
      // do something with result
    });
    ops = [];
  }
});

// flush the remaining operations (fewer than 500)
if (counter % 500 !== 0) {
  collection.bulkWrite(ops, function(err, r) {
    // do something with result
  });
}
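The modulo bookkeeping above can also be factored into a small self-contained helper that splits the operations array into fixed-size batches (a sketch; chunkOps is our name), with each batch then passed to collection.bulkWrite:

```javascript
// Sketch: split an array of write operations into batches of `size`.
function chunkOps(ops, size) {
  const batches = [];
  for (let i = 0; i < ops.length; i += size) {
    batches.push(ops.slice(i, i + size));
  }
  return batches;
}

// e.g. 1200 queued operations become batches of 500, 500 and 200:
const sizes = chunkOps(new Array(1200).fill(null), 500).map(b => b.length);
console.log(sizes); // [ 500, 500, 200 ]
```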

How to search for text or expression in multiple fields

db.movies.find({ "original_title": { $regex: input_data, $options: 'i' } }, function (err, datares) {
  if (err || datares == false) {
    db.movies.find({ "release_date": { $regex: input_data + ".*", $options: 'i' } }, function (err, datares) {
      if (err || datares == false) {
        db.movies.find({ "cast": { $regex: input_data, $options: 'i' } }, function (err, datares) {
          if (err || datares == false) {
            db.movies.find({ "writers": { $regex: input_data, $options: 'i' } }, function (err, datares) {
              if (err || datares == false) {
                db.movies.find({ "genres.name": { $regex: input_data, $options: 'i' } }, function (err, datares) {
                  if (err || datares == false) {
                    db.movies.find({ "directors": { $regex: input_data, $options: 'i' } }, function (err, datares) {
                      if (err || datares == false) {
                        res.status(451);
                        res.json({
                          "status": 451,
                          "error code": "dataNotFound",
                          "description": "Invalid Data Entry."
                        });
                        return;
                      } else {
                        res.json(datares);
                        return;
                      }
                    });
                  } else {
                    res.json(datares);
                    return;
                  }
                });
              } else {
                res.json(datares);
                return;
              }
            });
          } else {
            res.json(datares);
            return;
          }
        });
      } else {
        res.json(datares);
        return;
      }
    });
  } else {
    res.json(datares);
    return;
  }
});
I am trying to implement a so-called "all-in-one" search, so that whenever a user types in any kind of movie-related information, my application tries to return all relevant information. However, I have noticed that this transaction can be expensive on the backend, and sometimes the host is really slow.
How do I smoothly close the db connection, and where should I do it?
I read here that it is best not to close a MongoDB connection in Node.js: Why is it recommended not to close a MongoDB connection anywhere in Node.js code?
Is there a proper way to implement an all-in-one search using nested find commands?
Your current approach is full of problems and is not necessary. From what I can gather, all you are trying to do is search for a plain string within a number of fields in the same collection. It may possibly be a regular-expression construct, but I'm basing the two main possibilities below on a plain, case-insensitive text search.
Now, I am not sure whether you came to running one query depending on the results of another because you didn't know another way, or because you thought it would be better. Trust me on this: it is not a better approach than anything listed here, nor is it really required, as will be shown:
Regex query all at once
The first basic option here is to continue your $regex search but just in a singular query with the $or operator:
db.movies.find(
  {
    "$or": [
      { "original_title": { "$regex": input_data, "$options": "i" } },
      { "release_date": { "$regex": input_data, "$options": "i" } },
      { "cast": { "$regex": input_data, "$options": "i" } },
      { "writers": { "$regex": input_data, "$options": "i" } },
      { "genres.name": { "$regex": input_data, "$options": "i" } },
      { "directors": { "$regex": input_data, "$options": "i" } }
    ]
  },
  function(err, result) {
    if (err) {
      // respond error
    } else {
      // respond with data or empty
    }
  }
);
The $or condition here effectively works like "combining queries", as each argument is treated as a query in itself for document selection. Since it is one query, all the results naturally come back together.
Full text Query, multiple fields
If you are not really using a "regular expression" built from regular-expression operators, e.g. ^(\d+)\bword$, then you are probably better off using the "text search" capabilities of MongoDB. This approach is fine as long as you are not looking for terms that text search would generally exclude, and your data structure and subject matter actually suggest this is the best option for what you are likely doing here.
In order to be able to perform a text search, you first need to create a "text index", specifically here you want the index to span multiple fields in your document. Dropping into the shell for this is probably easiest:
db.movies.createIndex({
  "original_title": "text",
  "release_date": "text",
  "cast": "text",
  "writers": "text",
  "genres.name": "text",
  "directors": "text"
})
There is also an option to assign a "weight" to fields within the index, as you can read in the documentation. Assigning a weight gives "priority" to a match on that field over matches on other fields. For example, "directors" might be assigned more "weight" than "cast", and matches for "Quentin Tarantino" would therefore "rank higher" in the results where he was a director (and also a cast member) of the movie, rather than just a cast member (as in most Robert Rodriguez films).
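A weighted variant of the index might look like the following sketch (the weight values are purely illustrative; fields left out of weights default to a weight of 1, and because a collection can have only one text index this would be created instead of, not in addition to, the plain index above):

```javascript
// Sketch: same six fields, with "directors" weighted above the rest.
// Weight values are illustrative only; run in the mongo shell.
db.movies.createIndex(
  {
    "original_title": "text",
    "release_date": "text",
    "cast": "text",
    "writers": "text",
    "genres.name": "text",
    "directors": "text"
  },
  { "weights": { "directors": 10, "cast": 1 } }
)
```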
But with this in place, performing the query itself is very simple:
db.movies.find(
  { "$text": { "$search": input_data } },
  function(err, result) {
    if (err) {
      // respond error
    } else {
      // respond with data or empty
    }
  }
);
Almost too simple really, but that is all there is to it. The $text query operator knows to use the required index (there can only be one text index per collection) and will then look through all of the defined fields.
This is why I think this is the best fit for your use case here.
Parallel Queries
The final alternative I'll give here is for when you still insist on running separate queries. I still deny that you need to query only when the previous query returns no results, and I re-assert that the above options should be considered first, with preference for the text search.
Writing dependent or chained asynchronous functions is a pain, and very messy. Therefore I suggest leaning on a little help from another library dependency and using the node-async module here.
This provides an async.map() method, which is perfectly suited to "combining" results by running things in parallel:
var fields = [
  "original_title",
  "release_date",
  "cast",
  "writers",
  "genres.name",
  "directors"
];

async.map(
  fields,
  function(field, callback) {
    var search = {},
        cond = { "$regex": input_data, "$options": "i" };
    search[field] = cond; // assigns the field to search
    db.movies.find(search, callback);
  },
  function(err, result) {
    if (err) {
      // respond error
    } else {
      // respond with data or empty
    }
  }
);
And again, that is it. The .map() operator takes each field and transposes it into a query, which in turn returns its results. Those results are then accessible after all queries have run in the final section, "combined" as if they were a single result set, just as in the other alternatives here.
There is also a .mapSeries() variant that runs each query in series, or .mapLimit() if you are otherwise worried about database connections and concurrent tasks, but for this small size that should not be a problem.
I really don't think this option is necessary; however, if the Case 1 regular-expression statements still apply, it "may" provide a little performance benefit from running the queries in parallel, but at the cost of increased memory and resource consumption in your application.
Anyhow, the round-up here is "don't do what you are doing": you don't need to, and there are better ways to handle the task you want to achieve. All of them are cleaner and easier to code.