MongoDB $maxScan does not behave like limit() - mongodb

This is my first question on Stack Overflow.
When I use MongoDB query selectors, I want to limit the number of results, but $maxScan does not work the way I expected.
This is the result I want:
db.post.find({query:{status:"publish"},$orderby:{date:-1}},{status:1,name:1,date:1,$slice:2}).limit(3)
{ "_id" : ObjectId("519262580cf21fb1647fb765"), "date" : ISODate("2013-05-14T16:12:08.600Z"), "status" : "publish", "name" : "关于多说" }
{ "_id" : ObjectId("519254ad0cf2f064f6ecef82"), "date" : ISODate("2013-05-14T15:13:49.017Z"), "status" : "publish", "name" : "回顾<蜗居>的100句经典台词" }
{ "_id" : ObjectId("519254690cf2f064f6ecef81"), "date" : ISODate("2013-05-14T15:12:41.462Z"), "status" : "publish", "name" : "女人脱光了是什么" }
This is the result I get when I use $maxScan:
db.post.find({query:{status:"publish"},$maxScan:3,$orderby:{date:-1}},{status:1,name:1,date:1})
{ "_id" : ObjectId("518e6c690cf21a363df2956e"), "date" : ISODate("2013-05-11T16:06:01.341Z"), "status" : "publish", "name" : "淘宝新店,充值任务" }
It seems $maxScan does not work like limit(): it first restricts which documents are read from the collection and only then applies the query. That is not what I want. Am I doing something wrong? Please help, thanks.
All documents in the collection, for reference:
db.post.find({query:{},$orderby:{date:-1}},{status:1,name:1,date:1})
{ "_id" : ObjectId("519262580cf21fb1647fb765"), "date" : ISODate("2013-05-14T16:12:08.600Z"), "status" : "publish", "name" : "关于多说" }
{ "_id" : ObjectId("519254ad0cf2f064f6ecef82"), "date" : ISODate("2013-05-14T15:13:49.017Z"), "status" : "publish", "name" : "回顾<蜗居>的100句经典台词" }
{ "_id" : ObjectId("519254690cf2f064f6ecef81"), "date" : ISODate("2013-05-14T15:12:41.462Z"), "status" : "publish", "name" : "女人脱光了是什么" }
{ "_id" : ObjectId("518ee61a0cf22bd326d60215"), "date" : ISODate("2013-05-12T00:45:14.295Z"), "status" : "publish", "name" : "JSTL日期格式化用法(转载)" }
{ "_id" : ObjectId("518e6c690cf21a363df2956e"), "date" : ISODate("2013-05-11T16:06:01.341Z"), "status" : "publish", "name" : "淘宝新店,充值任务" }
{ "_id" : ObjectId("518e21c90cf21a363df2956d"), "date" : ISODate("2013-05-11T10:47:37.803Z"), "status" : "draft", "name" : "一夜没睡" }
{ "_id" : ObjectId("518df75d0cf21a363df2956c"), "date" : ISODate("2013-05-11T07:46:37.726Z"), "status" : "draft", "name" : "飞娥入侵" }
{ "_id" : ObjectId("518d80630cf21a363df2956b"), "date" : ISODate("2013-05-10T23:18:59.323Z"), "status" : "publish", "name" : "Java的日期格式化常用方法" }

To return only the top results, you should use limit(), which limits the number of results returned from the cursor. It is commonly used with skip() to paginate the results.
It's not explained very clearly in the docs, but $maxScan, as the name suggests, limits the number of documents the query will examine. Presumably your query is examining some documents that don't meet the criteria (status other than "publish") and then discarding them.
Do you have an index on status? It's possible that could help the query return the results you want while scanning fewer documents, but I still think limit() is what you want.
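The difference between capping the scan and capping the results can be sketched in plain JavaScript. This is only a simplified model under my own assumptions, not MongoDB's implementation: the documents, findWithScanCap and findWithLimit are all hypothetical names made up for illustration.

```javascript
// Simplified model of the two caps; this is not how MongoDB is implemented.
// Hypothetical documents, listed in the order they would be examined.
const docs = [
  { name: "a", status: "draft" },
  { name: "b", status: "publish" },
  { name: "c", status: "draft" },
  { name: "d", status: "publish" },
  { name: "e", status: "publish" },
];
const isPublished = (doc) => doc.status === "publish";

// maxScan-style cap: examine at most n documents, then keep the matches.
function findWithScanCap(collection, predicate, n) {
  return collection.slice(0, n).filter(predicate);
}

// limit-style cap: keep examining until n matches have been collected.
function findWithLimit(collection, predicate, n) {
  const results = [];
  for (const doc of collection) {
    if (results.length >= n) break;
    if (predicate(doc)) results.push(doc);
  }
  return results;
}

const scanCapped = findWithScanCap(docs, isPublished, 3).map((d) => d.name);
const limited = findWithLimit(docs, isPublished, 3).map((d) => d.name);
console.log(scanCapped); // [ 'b' ]
console.log(limited); // [ 'b', 'd', 'e' ]
```

With a scan cap of 3, only one of the first three examined documents matches, so a single result comes back; a limit of 3 keeps going until three matches are found.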

Related

Query data in MongoDB vs Filtering in Code

This question is performance based.
If I have a collection that I want to query on multiple fields (fieldValue < x < fieldValue, status = 'pending', etc.), is it better to do all of the filtering in a MongoDB query, or to retrieve the part of the collection that fits a simpler query such as status = 'pending' and then filter the data further in server code?
Which approach would you recommend, and when?
Thank you for taking the time.
Regards,
Emir
Go for the single-query option: filtering out the unwanted data and fetching only the required data should be done in the database itself. Any additional step in application code takes its own time and resources to do the same job. In this case we can use $and, $gt, $lt and $eq. Performance is generally better when the data is filtered at the data layer.
Sample Collection
{ "_id" : ObjectId("5a13c7e08e1b021d0f556c29"), "value" : 10, "status" : "pending" }
{ "_id" : ObjectId("5a13c7e58e1b021d0f556c2a"), "value" : 20, "status" : "completed" }
{ "_id" : ObjectId("5a13c7e88e1b021d0f556c2b"), "value" : 40, "status" : "In Progress" }
{ "_id" : ObjectId("5a13c7ec8e1b021d0f556c2c"), "value" : 50, "status" : "pending" }
{ "_id" : ObjectId("5a13c7f08e1b021d0f556c2d"), "value" : 750, "status" : "completed" }
{ "_id" : ObjectId("5a13c7f68e1b021d0f556c2e"), "value" : 90, "status" : "pending" }
{ "_id" : ObjectId("5a13c7fb8e1b021d0f556c2f"), "value" : 190, "status" : "pending" }
{ "_id" : ObjectId("5a13c7fe8e1b021d0f556c30"), "value" : 120, "status" : "completed" }
{ "_id" : ObjectId("5a13c8038e1b021d0f556c31"), "value" : 220, "status" : "completed" }
{ "_id" : ObjectId("5a13c8078e1b021d0f556c32"), "value" : 720, "status" : "pending" }
{ "_id" : ObjectId("5a13c80b8e1b021d0f556c33"), "value" : 7420, "status" : "In Progress" }
Sample Query: 20 < x < 300 and status = pending
db.collection.find({$and:[{value:{$gt:20}}, {value:{$lt:300}}, {status:{$eq:"pending"}}]})
The result will be
{ "_id" : ObjectId("5a13c7ec8e1b021d0f556c2c"), "value" : 50, "status" : "pending" }
{ "_id" : ObjectId("5a13c7f68e1b021d0f556c2e"), "value" : 90, "status" : "pending" }
{ "_id" : ObjectId("5a13c7fb8e1b021d0f556c2f"), "value" : 190, "status" : "pending" }
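One pitfall is worth keeping in mind when writing range conditions like this in the shell: a JavaScript object literal keeps only the last value for a repeated key, so two bounds on the same field must either share one sub-document or sit in separate elements of $and. The sketch below uses plain JavaScript only (no database), and the variable names are made up for illustration.

```javascript
// A repeated key in an object literal keeps only the last value,
// so the $gt bound is silently lost here.
const repeated = { value: { $gt: 20 }, value: { $lt: 300 }, status: "pending" };
console.log(JSON.stringify(repeated)); // {"value":{"$lt":300},"status":"pending"}

// Both bounds under a single key survive intact.
const combined = { value: { $gt: 20, $lt: 300 }, status: "pending" };
console.log(JSON.stringify(combined));

// Applying both bounds in memory to the "pending" values from the sample collection.
const pendingValues = [10, 50, 90, 190, 720];
const inRange = pendingValues.filter(
  (v) => v > combined.value.$gt && v < combined.value.$lt
);
console.log(inRange); // [ 50, 90, 190 ]
```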
Hope it helps!

Updating two level sub-document which is list in mongoDB?

I have a hotel collection in which a document looks like this:
{
"_id" : "HOTEL_1",
"name" : "Decent hotel",
"chainId" : "CHN123",
"rooms" : [
{
"id" : "ROM1",
"name" : "decent rooms",
"ratePlans" : [
{
"ratePlanId" : "RPNB1191989873C2G",
"status" : "INACTIVE",
"marginPart" : {
"marginType" : "PERCENTAGE",
"margin" : "32"
}
},
{
"ratePlanId" : "RPNE0992HBG6I0GE8",
"status" : "INACTIVE",
"marginPart" : {
"marginType" : "PERCENTAGE",
"margin" : "32"
}
}
]
},
{
"id" : "ROM2",
"name" : "another decent rooms",
"ratePlans" : []
}
]
}
I need to set status to ACTIVE on all the rate plans of all the rooms, for hotels matching a certain condition such as chainId.
I tried this, but it failed:
db.hotel.updateMany({ "chainId" : "CHN_123"},{$set : {"rooms.$ratePlans.$status" : "ACTIVE" }});
I also want to set margin to a common value, say 50%, on all such rate plans.
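The path "rooms.$ratePlans.$status" is not valid positional syntax, so it fails with update() as well as updateMany(). On MongoDB 3.6 or later, the all-positional operator $[] reaches every element of both arrays:
db.hotel.updateMany(
{ "chainId" : "CHN_123" },
{ $set : { "rooms.$[].ratePlans.$[].status" : "ACTIVE", "rooms.$[].ratePlans.$[].marginPart.margin" : "50" } }
);
This sets the status and the margin on every rate plan of every room in the matching hotels.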
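To make the intended effect concrete, here is a plain JavaScript sketch of the change applied to one in-memory document. It models the desired result only and is not a MongoDB call; setAllRatePlans is a hypothetical helper, and the document is a trimmed copy of the one in the question.

```javascript
// Hypothetical helper: visit every rate plan of every room and overwrite
// its status and margin. Models the desired outcome, not a database update.
function setAllRatePlans(hotel, status, margin) {
  for (const room of hotel.rooms) {
    for (const plan of room.ratePlans) {
      plan.status = status;
      plan.marginPart.margin = margin;
    }
  }
  return hotel;
}

// Trimmed copy of the document from the question.
const hotel = {
  _id: "HOTEL_1",
  chainId: "CHN123",
  rooms: [
    {
      id: "ROM1",
      ratePlans: [
        { ratePlanId: "RPNB1191989873C2G", status: "INACTIVE",
          marginPart: { marginType: "PERCENTAGE", margin: "32" } },
        { ratePlanId: "RPNE0992HBG6I0GE8", status: "INACTIVE",
          marginPart: { marginType: "PERCENTAGE", margin: "32" } },
      ],
    },
    { id: "ROM2", ratePlans: [] }, // a room without rate plans is left untouched
  ],
};

setAllRatePlans(hotel, "ACTIVE", "50");
const updatedPlans = hotel.rooms.flatMap((room) => room.ratePlans);
console.log(updatedPlans.map((p) => `${p.ratePlanId}: ${p.status}, ${p.marginPart.margin}`));
```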

mongodb aggregate with extra info

I have a mongo collection containing docs such as this:
{
"_id" : ObjectId("57697321c22d3917acd66513"),
"parent" : "AlphaNumericID",
"signature" : "AnotherAlphaNumericID",
"price" : 1638,
"url" : "http://www.thecompany.com/path/to/page1",
"date" : ISODate("2016-06-21T17:02:20.352Z"),
"valid" : true
}
What I am trying to do is run one query that groups on the signature field and returns the min and max price AND the corresponding url:
{
"signature" : "AnotherAlphaNumericID",
"min_price" : 1504,
"min_rent_listing" : "http://www.thecompany.com/path/to/page1",
"max_price" : 1737,
"max_price_listing" : "http://www.thecompany.com/path/to/page2",
}
Running a $group on the signature field to obtain $min and $max is straightforward, but to get the actual urls I split the work in two: a first query returns the docs for a signature sorted by price from min to max, and then (in Python code) I take the first and last elements. This works fine, but it would be nice to have one query.
Thoughts?
P.S.
I also toyed with running one query for min and one for max and zipping the results.
You can do this with the help of $group and $project. Assuming the dataset is
{
"_id" : ObjectId("57db28dc705af235a826873a"),
"parent" : "AlphaNumericID",
"signature" : "AnotherAlphaNumericID",
"price" : 1638.0,
"url" : "http://www.thecompany.com/path/to/page1",
"date" : ISODate("2016-06-21T17:02:20.352+0000"),
"valid" : true
}
{
"_id" : ObjectId("57db28dc705af235a826873b"),
"parent" : "AlphaNumericID",
"signature" : "AnotherAlphaNumericID",
"price" : 168.0,
"url" : "http://www.thecompany.com/path/to/page2",
"date" : ISODate("2016-06-21T17:02:20.352+0000"),
"valid" : true
}
{
"_id" : ObjectId("57db28dc705af235a826873c"),
"parent" : "AlphaNumericID",
"signature" : "AnotherAlphaNumericID",
"price" : 163.0,
"url" : "http://www.thecompany.com/path/to/page3",
"date" : ISODate("2016-06-21T17:02:20.352+0000"),
"valid" : true
}
{
"_id" : ObjectId("57db28dc705af235a826873d"),
"parent" : "AlphaNumericID",
"signature" : "AnotherAlphaNumericID",
"price" : 1680.0,
"url" : "http://www.thecompany.com/path/to/page4",
"date" : ISODate("2016-06-21T17:02:20.352+0000"),
"valid" : true
}
Try the following query in the shell
db.collection.aggregate([
{$sort:{price:1}},
{$group:{
_id:"$signature",
_first:{$first:"$url"},
_last:{$last:"$url"},
_min:{$first:"$price"},
_max:{$last:"$price"}}
},
{$project:{
_id:0,
min:{
url:"$_first",
price:"$_min"},
max:{
url:"$_last",
price:"$_max"}}
}
])
The output contains the minimum and maximum price with the corresponding url
{
"min" : {
"url" : "http://www.thecompany.com/path/to/page3",
"price" : 163.0
},
"max" : {
"url" : "http://www.thecompany.com/path/to/page4",
"price" : 1680.0
}
}
What I changed from the original answer:
_min:{$min:"$price"}, --> to use $first
_max:{$max:"$price"}} --> to use $last
Reason: the documents enter the $group stage sorted by price in ascending order, so within each group the first record is the min and the last record is the max.
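The sort-then-take-first-and-last idea can be checked outside the database with a small in-memory sketch. This is plain JavaScript over the four sample documents (trimmed to the fields that matter), not the aggregation framework itself; minMaxBySignature is a hypothetical helper name.

```javascript
// The four sample documents, trimmed to the fields used below.
const listings = [
  { signature: "AnotherAlphaNumericID", price: 1638, url: "http://www.thecompany.com/path/to/page1" },
  { signature: "AnotherAlphaNumericID", price: 168, url: "http://www.thecompany.com/path/to/page2" },
  { signature: "AnotherAlphaNumericID", price: 163, url: "http://www.thecompany.com/path/to/page3" },
  { signature: "AnotherAlphaNumericID", price: 1680, url: "http://www.thecompany.com/path/to/page4" },
];

// Hypothetical helper: sort ascending by price, then for each signature
// remember the first document seen (min) and the last one seen (max).
function minMaxBySignature(docs) {
  const sorted = [...docs].sort((a, b) => a.price - b.price);
  const groups = new Map();
  for (const doc of sorted) {
    const entry = { url: doc.url, price: doc.price };
    if (!groups.has(doc.signature)) {
      groups.set(doc.signature, { min: entry, max: entry });
    } else {
      groups.get(doc.signature).max = entry;
    }
  }
  return groups;
}

const priceRange = minMaxBySignature(listings).get("AnotherAlphaNumericID");
console.log(priceRange.min); // page3 at 163
console.log(priceRange.max); // page4 at 1680
```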

Inconsistent query results with embedded documents on MongoDB

I've got a collection called payments; an example document is shown below:
{
"_id" : ObjectId("579b5ee817e3aaac2f0aebc1"),
"updatedAt" : ISODate("2016-07-29T11:04:01.209-03:00"),
"createdAt" : ISODate("2016-07-29T10:49:28.113-03:00"),
"createdBy" : ObjectId("5763f56010cd7b03008147d4"),
"contract" : ObjectId("578cb907f1575f0300d84d09"),
"recurrence" : [
{
"when" : ISODate("2016-05-29T11:03:45.606-03:00"),
"_id" : ObjectId("579b6241ea945e3631f64e2d"),
"transaction" : {
"createdAt" : ISODate("2016-05-29T11:03:45.608-03:00"),
"tid" : "9999999999999999B01A",
"status" : 4,
"code" : "00",
"message" : "Transação autorizada"
},
"status" : "PAGO"
},
{
"when" : ISODate("2016-06-29T11:03:45.608-03:00"),
"_id" : ObjectId("579b6241ea945e3631f64e2c"),
"transaction" : {
"createdAt" : ISODate("2016-06-29T11:03:45.608-03:00"),
"tid" : "9999999999999999B01A",
"status" : 4,
"code" : "00",
"message" : "Transação autorizada"
},
"status" : "PAGO"
},
{
"when" : ISODate("2016-07-29T11:03:45.608-03:00"),
"_id" : ObjectId("579b6241ea945e3631f64e2b"),
"status" : "ERRO",
"transaction" : {
"code" : "56",
"createdAt" : ISODate("2016-07-29T11:04:01.196-03:00"),
"message" : "Autorização negada",
"status" : 5,
"tid" : "1006993069000730B88A"
}
},
{
"when" : ISODate("2016-07-30T11:03:45.608-03:00"),
"_id" : ObjectId("579b6241ea945e3631f64e2a"),
"status" : "PENDENTE"
},
{
"when" : ISODate("2016-07-31T11:03:45.608-03:00"),
"_id" : ObjectId("579b6241ea945e3631f64e29"),
"status" : "PENDENTE"
},
{
"when" : ISODate("2016-08-01T11:03:45.608-03:00"),
"_id" : ObjectId("579b6241ea945e3631f64e28"),
"status" : "PENDENTE"
}
],
"status" : "PAGO",
"conditions" : {
"originalValue" : 7406.64,
"totalValue" : 7400,
"upfrontValue" : 1500,
"upfrontInstallments" : 3,
"balanceInstallments" : 9
},
"__v" : 0,
"transaction" : {
"code" : "00",
"createdAt" : ISODate("2016-07-29T10:49:46.610-03:00"),
"message" : "Transação autorizada",
"status" : 6,
"tid" : "1006993069000730AF5A"
}
}
If I run the query below, I get the desired document shown above:
db.payments.find({ "recurrence.transaction.tid": "1006993069000730B88A" })
However, if I run this other query, MongoDB returns my entire collection (presumably because it didn't match the subdocument's id):
db.payments.find({ "recurrence._id": ObjectId("579b6241ea945e3631f64e2b") })
Both queries should return the same result! I also checked some other questions, including this one, so unless I'm going crazy I'm doing the same thing. I'm not sure why the results are inconsistent, though.
Try this:
db.payments.find({ recurrence : { $elemMatch: { "transaction.tid": "1006993069000730B88A"} } }).pretty()
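For reference, a condition on an array of sub-documents selects a document when at least one array element satisfies it, and the query then returns the whole document rather than just the matching element. The sketch below models that rule in plain JavaScript; it is not MongoDB's matcher, anyElementMatches is a hypothetical helper, and the short _id strings are placeholders standing in for the ObjectIds.

```javascript
// Hypothetical helper: true when at least one element of the array passes the test.
function anyElementMatches(doc, arrayField, test) {
  return Array.isArray(doc[arrayField]) && doc[arrayField].some(test);
}

// Trimmed model of a payment; the _id strings are placeholders, not real ObjectIds.
const payment = {
  _id: "payment-1",
  recurrence: [
    { _id: "rec-a", status: "PAGO", transaction: { tid: "9999999999999999B01A" } },
    { _id: "rec-b", status: "ERRO", transaction: { tid: "1006993069000730B88A" } },
    { _id: "rec-c", status: "PENDENTE" }, // no transaction yet
  ],
};

const matchesTid = anyElementMatches(
  payment,
  "recurrence",
  (el) => Boolean(el.transaction) && el.transaction.tid === "1006993069000730B88A"
);
const matchesId = anyElementMatches(payment, "recurrence", (el) => el._id === "rec-b");
const matchesMissing = anyElementMatches(payment, "recurrence", (el) => el._id === "rec-z");
console.log(matchesTid, matchesId, matchesMissing); // true true false
```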

MongoDB - How to remove duplicates

I have a collection which has many duplicates due to the routines that populated it in the first place. How can I dedupe these?
e.g.
{ "_id" : ObjectId("531a5fe448757e00244096fa"), "code" : "ap", "name" : "[Almost Perfect]", "value" : "[u'*']" }
{ "_id" : ObjectId("531a731148757e17587a6e04"), "code" : "ap", "name" : "[Almost Perfect]", "value" : "[u'*']" }
{ "_id" : ObjectId("531a7bb848757e1f7c0ca702"), "code" : "ap", "name" : "[Almost Perfect]", "value" : "[u'*']" }
I want it to be just one (I don't care which ObjectId gets picked):
{ "_id" : ObjectId("531a5fe448757e00244096fa"), "code" : "ap", "name" : "[Almost Perfect]", "value" : "[u'*']" }
You should use a unique index on your code field:
db.<collection>.ensureIndex({'code' : 1}, {unique : true, dropDups : true})
unique ensures you will not have duplicates anymore.
dropDups deletes all your duplicate documents when the ensureIndex operation is run. Note that the dropDups option was removed in MongoDB 3.0, so this approach only works on older versions.
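The effect being asked for, keeping one document per code value, can be sketched in memory with plain JavaScript. This only models the outcome and is not what the server does internally; dedupeByKey is a hypothetical helper, and the _id values are written as plain strings for simplicity.

```javascript
// Hypothetical helper: keep the first document seen for each distinct key value.
function dedupeByKey(docs, key) {
  const seen = new Set();
  const kept = [];
  for (const doc of docs) {
    if (!seen.has(doc[key])) {
      seen.add(doc[key]);
      kept.push(doc);
    }
  }
  return kept;
}

// The three duplicates from the question, with _id shown as plain strings.
const duplicates = [
  { _id: "531a5fe448757e00244096fa", code: "ap", name: "[Almost Perfect]", value: "[u'*']" },
  { _id: "531a731148757e17587a6e04", code: "ap", name: "[Almost Perfect]", value: "[u'*']" },
  { _id: "531a7bb848757e1f7c0ca702", code: "ap", name: "[Almost Perfect]", value: "[u'*']" },
];

const deduped = dedupeByKey(duplicates, "code");
console.log(deduped.length); // 1
console.log(deduped[0]._id); // 531a5fe448757e00244096fa
```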