mongodb $near query is slow

I have a MongoDB collection:
{
"_id" : ObjectId("574bbae4d009b5364abaebe5"),
"cityid" : 406,
"location" : {
"type" : "Point",
"coordinates" : [
118.602355,
24.89083
]
},
"shopid" : "a"
}
with about 50,000 documents,
and indexes:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "pingan-test.shop_actinfo_collection_0530"
},
{
"v" : 1,
"key" : {
"location" : "2dsphere"
},
"name" : "location_2dsphere",
"ns" : "pingan-test.shop_actinfo_collection_0530",
"2dsphereIndexVersion" : 3
},
{
"v" : 1,
"key" : {
"shopid" : 1,
"cityid" : 1
},
"name" : "shopid_1_cityid_1",
"ns" : "pingan-test.shop_actinfo_collection_0530"
}
]
I query this collection like:
body = {'cityid': 2, 'location': {'$near': {'$geometry': {'type': 'Point', 'coordinates': [122.0, 31.0]}}}, 'shopid': {'$in': ['a','b']}}
results = collection.find(body, {'shopid': 1, '_id':0},).batch_size(20).limit(20)
shops = list(results)
The problem is that this query takes about 400 ms, but only about 30 ms if we leave out the location condition.
Why, and how can I fix it? Please.

You have an index on shopid and cityid, but your query filters on cityid. Since the index is ordered by shopid first, it cannot be used to search by cityid alone. If you change the index to cityid: 1, shopid: 1, you will see a performance improvement, because your query will then be able to search using the index.
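The compound-index prefix rule behind this answer can be illustrated with a small plain-JavaScript sketch (a toy model for intuition, not MongoDB internals): an index can only serve the leading keys the query actually filters on, and matching stops at the first gap.

```javascript
// Toy model of the prefix rule: given the index key order and the fields a
// query filters on, return the leading index keys the query can use.
// Matching stops at the first index key the query does not constrain.
function usableIndexPrefix(indexKeys, queryFields) {
  const fields = new Set(queryFields);
  let n = 0;
  while (n < indexKeys.length && fields.has(indexKeys[n])) n++;
  return indexKeys.slice(0, n);
}

// Index {shopid: 1, cityid: 1} vs. a filter on cityid alone: no usable prefix.
console.log(usableIndexPrefix(["shopid", "cityid"], ["cityid"])); // []
// Index {cityid: 1, shopid: 1} vs. the same filter: the cityid key is usable.
console.log(usableIndexPrefix(["cityid", "shopid"], ["cityid"])); // [ 'cityid' ]
```
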

In the end I got it.
I just created a compound index on cityid: 1, shopid: 1, "location": "2dsphere",
and then, world peace.
Thanks again, #tiramisu.

Related

Geonear sort by distance and time

I have the following data:
{
"_id" : ObjectId("55a8c1ba3996c909184d7a22"),
"uid" : "1db82e8a-2038-4818-b805-76a46ba62639",
"createdate" : ISODate("2015-07-17T08:50:02.892Z"),
"palce" : "aa",
"sex" : 1,
"longdis" : 1,
"location" : [ 106.607312, 29.575281 ]
}
{
"_id" : ObjectId("55a8c1ba3996c909184d7a24"),
"uid" : "1db82e8a-2038-4818-b805-76a46ba62639",
"createdate" : ISODate("2015-07-17T08:50:02.920Z"),
"palce" : "bbb",
"sex" : 1,
"longdis" : 1,
"location" : [ 106.589896, 29.545098 ]
}
{
"_id" : ObjectId("55a8c1ba3996c909184d7a25"),
"uid" : "1db82e8a-2038-4818-b805-76a46ba62639",
"createdate" : ISODate("2015-07-17T08:50:02.922Z"),
"palce" : "ccc",
"sex" : 1,
"longdis" : 1,
"location" : [ 106.590758, 29.566713 ]
}
{
"_id" : ObjectId("55a8c1ba3996c909184d7a26"),
"uid" : "1db82e8a-2038-4818-b805-76a46ba62639",
"createdate" : ISODate("2015-07-17T08:50:02.923Z"),
"palce" : "ddd",
"sex" : 1,
"longdis" : 1,
"location" : [ 106.637039, 29.561436 ]
}
{
"_id" : ObjectId("55a8c1bc3996c909184d7a27"),
"uid" : "1db82e8a-2038-4818-b805-76a46ba62639",
"createdate" : ISODate("2015-07-17T08:50:04.499Z"),
"palce" : "eee",
"sex" : 1,
"longdis" : 1,
"location" : [ 106.539522, 29.57929 ]
}
{
"_id" : ObjectId("55a8d12e78292fa3837ebae4"),
"uid" : "1db82e8a-2038-4818-b805-76a46ba62639",
"createdate" : ISODate("2015-07-17T09:55:58.947Z"),
"palce" : "fff",
"sex" : 1,
"longdis" : 1,
"location" : [ 106.637039, 29.561436 ]
}
I want to sort by distance first; if the distance is the same, sort by time.
My command:
db.runCommand( {
geoNear: "paging",
near: [106.606033,29.575897 ],
spherical : true,
maxDistance : 1/6371,
minDistance:0/6371,
distanceMultiplier: 6371,
num:2,
query: {'_id': {'$nin': []}}
})
or
db.paging.find({
'location':{
$nearSphere: [106.606033,29.575897],
$maxDistance:1
}
}).limit(5).skip((2 - 1) * 2).sort({createdate:-1})
How can I sort on both "nearest" and "createdate"?
The correct query to use here uses the aggregation framework, which has the $geoNear pipeline stage to assist with this. It's also the only place you get to "sort" by multiple keys, as unfortunately the geospatial $nearSphere does not have a "meta" projection for "distance" like $text has a "score".
Also, the geoNear database command you are using cannot be combined with a cursor .sort() in that way either.
db.paging.aggregate([
{ "$geoNear": {
"near": [106.606033,29.575897 ],
"spherical": true,
"distanceField": "distance",
"distanceMultiplier": 6371,
"maxDistance": 1/6371
}},
{ "$sort": { "distance": 1, "createdate": -1 } },
{ "$skip": ( 2-1 ) * 2 },
{ "$limit": 5 }
])
That is the equivalent of what you are trying to do.
With the aggregation framework you use "pipeline operators" instead of "cursor modifiers" to do things like $sort, $skip and $limit. These must also appear in a logical order, whereas the cursor modifiers generally work the order out for themselves.
It's a "pipeline", just like a Unix "|" pipe.
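The stage ordering can be mimicked in plain JavaScript (synthetic data, not the aggregation engine itself): sort by distance then date, then skip, then limit, strictly in that order.

```javascript
// Stages apply in the order written, like a Unix pipe: $sort, $skip, $limit.
const docs = [
  { distance: 2, createdate: 1 },
  { distance: 1, createdate: 3 },
  { distance: 1, createdate: 5 },
];

const page = docs
  .slice() // work on a copy, leave the input unmodified
  .sort((a, b) => a.distance - b.distance || b.createdate - a.createdate) // $sort
  .slice((2 - 1) * 2)  // $skip
  .slice(0, 5);        // $limit

console.log(page); // [ { distance: 2, createdate: 1 } ]
```
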
Also, be careful with "maxDistance" and "distanceMultiplier". Since your coordinates are stored as "legacy coordinate pairs" and not in GeoJSON format, the distances are measured in "radians". If you store GeoJSON location data instead, the results are returned in "meters".
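The unit conversion is just division by the Earth's radius (~6371 km, the figure already used in the question's command); a quick sketch:

```javascript
// With legacy coordinate pairs, spherical distances are expressed in radians.
// Dividing km by the Earth's radius gives radians (hence the 1/6371 above);
// distanceMultiplier: 6371 converts the returned radian distances back to km.
const EARTH_RADIUS_KM = 6371;

function kmToRadians(km)  { return km / EARTH_RADIUS_KM; }
function radiansToKm(rad) { return rad * EARTH_RADIUS_KM; }

console.log(kmToRadians(1));              // ~0.000157 rad (the 1/6371 in the question)
console.log(radiansToKm(kmToRadians(5))); // ~5 km round-trip
```
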

Optimizing mongo query for better response

I am trying to optimize this MongoDB query for a better response time:
db.myReports.find({
"CheckInDate": {
"$gte" : ISODate("2015-01-12T00:00:00Z"),
"$lte" : ISODate("2015-03-31T00:00:00Z")
},
"SubscriberPropertyId": NumberLong(47984),
"ChannelId": {
"$in": [701, 8275]
},
"PropertyId": {
"$in": [47984, 3159, 5148, 61436, 66251, 70108]
},
"LengthOfStay": 1
}, {
"CheckInDate": 1,
"SubscriberPropertyId": 1,
"ChannelId": 1,
"PropertyId": 1
});
Currently it is taking around 3 minutes just to find data from 3 million records.
One Document from collection
{
"_id" : ObjectId("54dba46c320caf5a08473074"),
"OptimisationId" : NumberLong(1),
"ScheduleLogId" : NumberLong(3),
"ReportId" : NumberLong(4113235),
"SubscriberPropertyId" : NumberLong(10038),
"PropertyId" : NumberLong(18166),
"ChannelId" : 701,
"CheckInDate" : ISODate("2014-09-30T18:30:00Z"),
"LengthOfStay" : 1,
"OccupancyIndex" : 1.0,
"CreatedDate" : ISODate("2014-09-11T06:31:08Z"),
"ModifiedDate" : ISODate("2014-09-11T06:31:08Z"),
}
The indexes created are:
db.myReports.getIndexes();
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "db.myReports"
},
{
"v" : 1,
"key" : {
"CheckInDate" : 1,
"SubscriberPropertyId" : 1,
"ReportId" : 1,
"ChannelId" : 1,
"PropertyId" : 1
},
"name" : "CheckInDate_1_SubscriberPropertyId_1_ReportId_1_ChannelId_1_PropertyId_1",
"ns" : "db.myReports"
},
{
"v" : 1,
"key" : {
"CheckInDate" : 1
},
"name" : "CheckInDate_1",
"ns" : "db.myReports"
}
]
I have created indexes on the relevant fields.
Put equality queries first, then range queries:
db.myReports.find({
"SubscriberPropertyId": NumberLong(47984),
"ChannelId": {
"$in": [701, 8275]
},
"PropertyId": {
"$in": [47984, 3159, 5148, 61436, 66251, 70108]
},
"CheckInDate": {
"$gte" : ISODate("2015-01-12T00:00:00Z"),
"$lte" : ISODate("2015-03-31T00:00:00Z")
},
"LengthOfStay": 1 // low selectivity, move to the end
}, {
"CheckInDate": 1,
"SubscriberPropertyId": 1,
"ChannelId": 1,
"PropertyId": 1
});
Make sure the index fits, i.e. make the index SubscriberPropertyId, ChannelId, PropertyId, CheckInDate. LengthOfStay probably has too low selectivity to be worth indexing; it depends on your data.
That should reduce nscanned significantly, but getting 300k results will still take its time (actually reading them, I mean).
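Why equality keys belong before range keys can be sketched with a toy scan count (entirely synthetic data; a real B-tree traversal is more involved, but the proportions are the point):

```javascript
// Synthetic index entries: 10 subscribers x 100 check-in days each.
const entries = [];
for (let sub = 0; sub < 10; sub++)
  for (let day = 0; day < 100; day++)
    entries.push({ sub, day });

// Range-first key order (day, sub): the whole date range must be scanned
// across every subscriber before the equality condition prunes anything.
const rangeFirst = entries.filter(e => e.day < 50).length;

// Equality-first key order (sub, day): narrow to one subscriber first,
// then scan only that subscriber's slice of the date range.
const equalityFirst = entries.filter(e => e.sub === 3 && e.day < 50).length;

console.log(rangeFirst, equalityFirst); // 500 50
```
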

MongoDB does not combine 1d and 2d indexes; geo queries scan all documents irrespective of filters applied to limit the number of records

Below is the output from explain for one of the queries:
{
"cursor" : "GeoSearchCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 199564,
"nscanned" : 199564,
"nscannedObjectsAllPlans" : 199564,
"nscannedAllPlans" : 199564,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 1234,
"indexBounds" : {
},
"server" : "MongoDB",
"filterSet" : false
}
This query scans all 199564 records, whereas the constraints applied in the query filter should match only a few hundred records.
Any pointers would be much appreciated.
Adding the query and indexes applied:
Query
{
"isfeatured" : 1 ,
"status" : 1 ,
"isfesturedseq" : 1 ,
"loc_long_lat" : {
"$near" : [ 76.966438 , 11.114906]
} ,
"city_id" : "40" ,
"showTime.0" : { "$exists" : true}}
Indexes
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"loc_long_lat" : "2d"
},
"name" : "loc_long_lat_2d",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"georand" : "2d"
},
"name" : "georand_2d",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"city_id" : 1
},
"name" : "city_id_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"endDatetime" : 1
},
"name" : "endDatetime_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"movieid" : 1
},
"name" : "movieid_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"theaterid" : 1
},
"name" : "theaterid_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"status" : 1
},
"name" : "status_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"isfeatured" : 1
},
"name" : "isfeatured_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"isfesturedseq" : 1
},
"name" : "isfesturedseq_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"is_popular" : 1
},
"name" : "is_popular_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"loc_name" : 1
},
"name" : "loc_name_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"est_city_id" : 1
},
"name" : "est_city_id_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"isfeatured" : 1,
"status" : 1,
"city_id" : 1
},
"name" : "isfeatured_1_status_1_city_id_1",
"ns" : "test_live.movies_theater_map",
"background" : true
},
{
"v" : 1,
"key" : {
"movieid" : 1,
"endDatetime" : 1,
"city_id" : 1,
"status" : 1
},
"name" : "movieid_1_endDatetime_1_city_id_1_status_1",
"ns" : "test_live.movies_theater_map",
"background" : 2
},
{
"v" : 1,
"key" : {
"movieid" : 1,
"endDatetime" : 1,
"city_id" : 1,
"status" : 1,
"georand" : 1
},
"name" : "movieid_1_endDatetime_1_city_id_1_status_1_georand_1",
"ns" : "test_live.movies_theater_map",
"background" : 2
},
{
"v" : 1,
"key" : {
"rand" : 1
},
"name" : "rand_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"isfeatured" : 1,
"city_id" : 1,
"status" : 1
},
"name" : "isfeatured_1_city_id_1_status_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"movieid" : 1,
"city_id" : 1
},
"name" : "movieid_1_city_id_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"loc_long_lat" : 1,
"is_popular" : 1,
"movieid" : 1,
"status" : 1
},
"name" : "loc_long_lat_1_is_popular_1_movieid_1_status_1",
"ns" : "test_live.movies_theater_map"
},
{
"v" : 1,
"key" : {
"status" : 1,
"city_id" : 1,
"theaterid" : 1,
"endDatetime" : 1
},
"name" : "status_1_city_id_1_theaterid_1_endDatetime_1",
"ns" : "test_live.movies_theater_map",
"background" : true
}
The $near operator uses a 2d or 2dsphere index to return documents in order from nearest to furthest. For a 2d index, at most 100 documents are returned. Your query scanned every document because there were no matching documents: every document, from nearest to furthest, had to be scanned to check whether it met all the conditions.
I would suggest the following to improve the query:
Use the $maxDistance option, which is specified in radians for legacy coordinates, to limit the maximum number of documents scanned.
Use a 2dsphere index, ideally with GeoJSON points instead of legacy coordinates. A 2dsphere index can be part of a compound index with other keys, so you could index the query in part on all the other conditions to reduce the number of documents that need to be scanned. What version of MongoDB are you using? You may not have all of these features available in an old version.
Use limit to cap the maximum number of documents scanned. However, when the query has fewer results than the value of limit, you'll still scan every document.
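The scan-everything behaviour can be modelled in a few lines of plain JavaScript (a toy sketch, not the real query engine): $near yields documents nearest-first and tests the remaining conditions on each, so the scan can only stop early once enough matches have been found.

```javascript
// Toy model: walk documents in nearest-first order, applying the other query
// conditions to each; stop early only once `limit` matches have been found.
function nearScan(docsByDistance, predicate, limit) {
  let scanned = 0;
  const results = [];
  for (const doc of docsByDistance) {
    scanned++;
    if (predicate(doc)) results.push(doc);
    if (results.length >= limit) break;
  }
  return { scanned, results };
}

// 1000 documents, none in the requested city: every single one gets scanned.
const docs = Array.from({ length: 1000 }, (_, i) => ({ dist: i, city_id: "41" }));
const out = nearScan(docs, d => d.city_id === "40", 100);
console.log(out.scanned, out.results.length); // 1000 0
```
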

Right index for slow distinct query

I'm using a distinct query with a filter defined, and the query is quite slow on a database with 73K items. The query looks like this:
db.runCommand({ "distinct": "companies", "key": "approver",
"query": { "doc_readers": { "$in": [ "ROLE_USER", "ROLE_MODUL_CUST", "ROLE_MODUL_PROJECTS" ] } } })
Query stats is here:
"stats" : {
"n" : 73394,
"nscanned" : 146788,
"nscannedObjects" : 73394,
"timems" : 292,
"cursor" : "BtreeCursor doc_readers_1"
},
It shows that every item is checked to produce the distinct list of approvers. Is there a way to create a better index to speed things up? I have three similar queries on one web page, so together they take about 1 second to fetch the data.
Update 1: I have the following indexes
Only the first one is being used, as the stats show...
{
"v" : 1,
"key" : {
"doc_readers" : 1
},
"name" : "doc_readers_1",
"ns" : "netnotes.companies",
"sparse" : false
},
{
"v" : 1,
"key" : {
"doc_readers" : 1,
"approver" : 1
},
"name" : "doc_readers_1_schvalovatel_1",
"ns" : "netnotes.companies"
},
{
"v" : 1,
"key" : {
"approver" : 1,
"doc_readers" : 1
},
"name" : "schvalovatel_1_doc_readers_1",
"ns" : "netnotes.companies"
},

mongodb Embedded document search on parent and child field

I have a nested embedded document CompanyProduct; below is its structure:
{
"_id" : ObjectId("53d213c5ddbb1912343a8ca3"),
"CompanyID" : 90449,
"Name" : "Company1",
"CompanyDepartment" : [
{
"_id" : ObjectId("53d213c5ddbb1912343a8ca4")
"DepartmentID" : 287,
"DepartmentName" : "Stores",
"DepartmentInventory" : [
{
"_id" : ObjectId("53b7b92eecdd765430d763bd"),
"ProductID" : 1,
"ProductName" : "abc",
"Quantity" : 100
},
{
"_id" : ObjectId("53b7b92eecdd765430d763bd"),
"ProductID" : 2,
"ProductName" : "xyz",
"Quantity" : 1
}
],
}
],
}
There can be any number of companies, each company can have any number of departments, and each department can have any number of products.
I want to do a search to find a particular product's quantity under a particular company.
I tried the query below, but it does not work: it returns all the products for the specified company, and the less-than-20 condition has no effect.
db.CompanyProduct.find({$and : [{"CompanyDepartment.DepartmentInventory.Quantity":{$lt :20}},{"CompanyID":90449}]})
How should the query be?
You are searching within companyProduct's subdocuments, so the query returns the whole matching companyProduct document. MongoDB is a NoSQL database, and often you do not need to normalize a collection, but in your case it has to be normalized: if you want to edit or delete a single subdocument, and there are thousands or millions of subdocuments, what will you do? You need to make another collection named CompanyDepartment, and the companyProduct collection should become:
productCompany
{
"_id" : ObjectId("53d213c5ddbb1912343a8ca3"),
"CompanyID" : 90449,
"Name" : "Company1",
"CompanyDepartment" : ['53d213c5ddbb1912343a8ca4'],
}
and other collection companyDepartment
{
"_id" : ObjectId("53d213c5ddbb1912343a8ca4")
"DepartmentID" : 287,
"DepartmentName" : "Stores",
"DepartmentInventory" : [
{
"_id" : ObjectId("53b7b92eecdd765430d763bd"),
"ProductID" : 1,
"ProductName" : "abc",
"Quantity" : 100
},
{
"_id" : ObjectId("53b7b92eecdd765430d763bd"),
"ProductID" : 2,
"ProductName" : "xyz",
"Quantity" : 1
}
],
}
After this, productCompany holds an array of companyDepartment IDs, and only $push and $pull updates are needed on productCompany.
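In plain-JavaScript terms (a sketch of the update semantics only, not actual driver calls), maintaining that ID array looks like this:

```javascript
// After normalization, the parent document only tracks department IDs.
const productCompany = {
  CompanyID: 90449,
  CompanyDepartment: ["53d213c5ddbb1912343a8ca4"],
};

// $push equivalent: attach another department by ID.
productCompany.CompanyDepartment.push("53d213c5ddbb1912343a8ca5");

// $pull equivalent: detach a department by ID.
productCompany.CompanyDepartment =
  productCompany.CompanyDepartment.filter(id => id !== "53d213c5ddbb1912343a8ca4");

console.log(productCompany.CompanyDepartment); // [ '53d213c5ddbb1912343a8ca5' ]
```
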
A solution can be:
db.YourCollection.aggregate([
{
$project:{
"CompanyDepartment.DepartmentInventory":1,
"CompanyID" : 1
}
},{
$unwind: "$CompanyDepartment"
},{
$unwind: "$CompanyDepartment.DepartmentInventory"
},{
$match:{$and : [{"CompanyDepartment.DepartmentInventory.Quantity":{$lt :20}},{"CompanyID":90449}]}
}
])
The result is:
{
"result" : [
{
"_id" : ObjectId("53d213c5ddbb1912343a8ca3"),
"CompanyID" : 90449,
"CompanyDepartment" : {
"DepartmentInventory" : {
"_id" : ObjectId("53b7b92eecdd765430d763bd"),
"ProductID" : 2,
"ProductName" : "xyz",
"Quantity" : 1
}
}
}
],
"ok" : 1
}
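What the pipeline does can be mimicked in plain JavaScript (a rough model of $unwind and $match, using the sample data from the question): flattening the array elements into individual records is what lets the $lt condition finally bite.

```javascript
// $unwind turns each array element into its own record; $match then filters
// on the flattened fields, so the Quantity < 20 condition now works.
const company = {
  CompanyID: 90449,
  CompanyDepartment: [{
    DepartmentInventory: [
      { ProductID: 1, ProductName: "abc", Quantity: 100 },
      { ProductID: 2, ProductName: "xyz", Quantity: 1 },
    ],
  }],
};

const unwound = company.CompanyDepartment.flatMap(dept =>
  dept.DepartmentInventory.map(item => ({ CompanyID: company.CompanyID, ...item })));

const matched = unwound.filter(r => r.Quantity < 20 && r.CompanyID === 90449);
console.log(matched); // only the ProductID 2 / Quantity 1 record survives
```
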