MongoDB 2.4 2dsphere queries very slow

I have converted my old collection using the MongoDB "2d" index to a collection using the GeoJSON specification with a "2dsphere" index. The problem is that the query now takes about 11 seconds to execute on a collection of about 200,000 (2 lakh) objects. Previously it took about 100 ms per query. My document is as follows.
{
    "_id": ObjectId("4f9c2aa2d142b9882f02a3b3"),
    "geonameId": NumberInt(1106542),
    "name": "Chitungwiza",
    "feature code": "PPL",
    "country code": "ZW",
    "state": "Harare Province",
    "population": NumberInt(340360),
    "elevation": "",
    "timezone": "Africa\/Harare",
    "geolocation": {
        "type": "Point",
        "coordinates": {
            "0": 31.07555,
            "1": -18.01274
        }
    }
}
My explain query output is given below.
db.city_info.find({"geolocation":{'$near':{ '$geometry': { 'type':"Point",coordinates:[73,23] } }}}).explain()
{
"cursor" : "S2NearCursor",
"isMultiKey" : true,
"n" : 172980,
"nscannedObjects" : 172980,
"nscanned" : 1121804,
"nscannedObjectsAllPlans" : 172980,
"nscannedAllPlans" : 1121804,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 13,
"nChunkSkips" : 0,
"millis" : 13841,
"indexBounds" : {
},
"nscanned" : 1121804,
"matchTested" : NumberLong(191431),
"geoMatchTested" : NumberLong(191431),
"numShells" : NumberLong(373),
"keyGeoSkip" : NumberLong(930373),
"returnSkip" : NumberLong(933610),
"btreeDups" : NumberLong(0),
"inAnnulusTested" : NumberLong(191431),
"server" : "..."
}
Please let me know how I can correct the problem and reduce the query time.

Contrary to your suggestion, the $near command does not require a $maxDistance argument for "2dsphere" indexes. Adding $maxDistance just specified a range that reduced the number of query results to a manageable number. The reason for the difference in your experience changing from "2d" to "2dsphere" style indexes is that "2d" indexes impose a default limit of 100 if none is specified. As you can see, the default query plan for 2dsphere indexes does not impose such a limit, so the query is scanning the entire index ("nscannedObjects" : 172980). If you ran the same query on a "2d" index you would see that "n" and "nscannedObjects" are only 100, which explains the cost discrepancy.
If all of your items were within the $maxDistance range (try it with a $maxDistance of 20 million meters, for instance), you would see the query performance degrade back to where it was without it. In either case, it is very important to use limit() to tell the query planner to only scan the necessary results within the index and prevent runaways, especially with larger data sets.
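For example (a minimal sketch based on the query in the question; the 100 km radius and the limit of 100 are illustrative values, not something from the original post), combining $maxDistance with limit() keeps the index scan bounded:
db.city_info.find({
    "geolocation": {
        "$near": {
            "$geometry": { "type": "Point", "coordinates": [73, 23] },
            "$maxDistance": 100000   // meters, i.e. roughly a 100 km radius
        }
    }
}).limit(100)   // without a limit, the 2dsphere plan keeps walking the index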

I have solved the problem. The $near command requires a $maxDistance argument, as described here: http://docs.mongodb.org/manual/applications/2dsphere/ . As soon as I supplied $maxDistance, the query time dropped to under 100 ms.

Related

In MongoDB, how can I sort a 2d indexed $near query when there are over 100 records?

I have a query that uses $near to filter records down to a proximity. It is then supposed to be sorting the results by a separate field. However I'm running into a situation where records are missing even though they match the criteria.
I suspect this is due to the fact that using $near with 2d indexes has a 100 record limit. What I believe is happening is that the geospatial sort is occurring first and mine is only then being applied to the top 100 records of that result.
Is there any way to overcome this behavior? Can I disregard the sort of $near and use my own as the primary sort or, alternatively, circumvent the 100 record limit so that my sort applies to the entire set?
Here is the explain() from the query I'm using:
db.properties.find({
loc: {
$near: [-80.173366, 34.07868],
$maxDistance: 5
}}).sort({mls: -1}).explain()
{
"cursor" : "GeoSearchCursor",
"isMultiKey" : false,
"n" : 100,
"nscannedObjects" : 211,
"nscanned" : 700,
"nscannedObjectsAllPlans" : 211,
"nscannedAllPlans" : 700,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 1,
"nChunkSkips" : 0,
"millis" : 2,
"indexBounds" : {
},
"server" : "slate:27017",
"filterSet" : false
}
I ran into the same problem a while ago; you can use aggregate with a $geoWithin inside $match. I used the following snippet at a hackathon.
db.kickstarter.aggregate(
    [
        // keep only documents whose geo2 point falls inside the circle;
        // radius is in kilometres, so divide by the Earth's radius (6371 km) to get radians
        { $match : {
            geo2 : {
                $geoWithin : {
                    $centerSphere : [ [ parseFloat(lng), parseFloat(lat) ], radius / 6371 ]
                }
            }
        } },
        // apply your own sort instead of the $near distance sort
        { $sort : { pledged : -1 } },
        { $limit : 1000 } // you can set your limit here
    ],
    // Node.js driver style: the callback receives the aggregation result
    function (err, data) {
        if (err) console.log(err);
    }
);

MongoDB fulltext search not using index

We use MongoDB full-text search to find products in our database.
Unfortunately it is incredibly slow.
The collection contains 89.114.052 documents and I have the suspicion that the full-text index is not being used.
Performing a search with explain(), nscannedObjects returns 133212.
Shouldn't this be 0 if an index is used?
My index:
{
"v" : 1,
"key" : {
"_fts" : "text",
"_ftsx" : 1
},
"name" : "textIndex",
"ns" : "search.products",
"weights" : {
"brand" : 1,
"desc" : 1,
"ean" : 1,
"name" : 3,
"shop_product_number" : 1
},
"default_language" : "german",
"background" : false,
"language_override" : "language",
"textIndexVersion" : 2
}
The complete test search:
> db.products.find({ $text: { $search: "playstation" } }).limit(100).explain()
{
"cursor" : "TextCursor",
"n" : 100,
"nscannedObjects" : 133212,
"nscanned" : 133212,
"nscannedObjectsAllPlans" : 133212,
"nscannedAllPlans" : 133212,
"scanAndOrder" : false,
"nYields" : 1041,
"nChunkSkips" : 0,
"millis" : 105,
"server" : "search2:27017",
"filterSet" : false
}
Please have a look at the question you asked:
".... The collection contains 89.114.052 documents and I have the suspicion, that the full text index is not used ...."
Your nscanned is only 133,212 documents, so of course the index is used. If it were not, then all 89,114,052 documents (writing the count with English rather than German digit grouping) would have been reported in nscanned, and that is what would indicate that an index was not used.
Your query is slow? Well, it seems your hardware is not up to the task of keeping 133,212 documents in memory, or otherwise does not have a disk fast enough to "page" effectively. But that is a hardware problem, not a MongoDB problem.
You have over 100,000 documents that match your query, and even if you just want 100 of them you need to accept that this is how it works: MongoDB does not "give up" and return control once you have matched 100 documents. The query pattern here finds all of the matches and then applies the "limit" to the cursor in order to return just the requested number.
Maybe some time in the future the "text" functionality will allow you to do things like you can in the aggregate version of $geoNear and specify "minimum" and "maximum" values for a "score" in order to improve results. But right now it does not.
So either upgrade your hardware or use an external text search solution if your problem is slow results when matching over 100,000 documents out of more than 89,000,000.
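If you stay with the built-in text search, one thing that is already available in 2.6 (a hedged aside; it does not reduce nscanned, it only ranks what comes back) is projecting and sorting by the text score:
db.products.find(
    { $text: { $search: "playstation" } },
    { score: { $meta: "textScore" } }   // expose the relevance score
).sort({ score: { $meta: "textScore" } }).limit(100)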

Projection makes query slower

I have over 600k records in MongoDB.
My user schema looks like this:
{
"_id" : ObjectId,
"password" : String,
"email" : String,
"location" : Object,
"followers" : Array,
"following" : Array,
"dateCreated" : Number,
"loginCount" : Number,
"settings" : Object,
"roles" : Array,
"enabled" : Boolean,
"name" : Object
}
The following query:
db.users.find(
{},
{
name:1,
settings:1,
email:1,
location:1
}
).skip(656784).limit(10).explain()
results in this:
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 10,
"nscannedObjects" : 656794,
"nscanned" : 656794,
"nscannedObjectsAllPlans" : 656794,
"nscannedAllPlans" : 656794,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 5131,
"nChunkSkips" : 0,
"millis" : 1106,
"server" : "shreyance:27017",
"filterSet" : false
}
and after removing the projection, the same query db.users.find().skip(656784).limit(10).explain()
results in this:
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 10,
"nscannedObjects" : 656794,
"nscanned" : 656794,
"nscannedObjectsAllPlans" : 656794,
"nscannedAllPlans" : 656794,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 5131,
"nChunkSkips" : 0,
"millis" : 209,
"server" : "shreyance:27017",
"filterSet" : false
}
As far as I know, projection always increases the performance of a query, so I am unable to understand why MongoDB is behaving like this. Can someone explain this? When should projection be used and when not, and how is projection actually implemented in MongoDB?
You are correct that projection makes this skip query slower in MongoDB 2.6.3. This is related to an optimisation issue with the 2.6 query planner tracked as SERVER-13946.
The 2.6 query planner (as at 2.6.3) is adding SKIP (and LIMIT) stages after projection analysis, so the projection is being unnecessarily applied to results that get thrown out during the skip for this query. I tested a similar query in MongoDB 2.4.10 and the nScannedObjects was equal to the number of results returned by my limit rather than skip + limit.
There are several factors contributing to your query performance:
1) You haven't specified any query criteria ({}), so this query is doing a collection scan in natural order rather than using an index.
2) The query cannot be covered because there is no projection.
3) You have an extremely large skip value of 656,784.
There is definitely room for improvement on the query plan, but I wouldn't expect skip values of this magnitude to be reasonable in normal usage. For example, if this was an application query for pagination with 50 results per page your skip() value would be the equivalent of page number 13,135.
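As an aside, and not something covered in this answer: a common way to avoid very large skip() values in pagination is to range on an indexed field such as _id and remember where the previous page ended. A rough, hypothetical sketch:
// lastId is the _id of the final document on the previous page (placeholder value for illustration)
var lastId = ObjectId("53b2c2f7e4b0a8f0deadbeef");
db.users.find(
    { _id: { $gt: lastId } },                          // start where the previous page stopped
    { name: 1, settings: 1, email: 1, location: 1 }
).sort({ _id: 1 }).limit(10)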
Unless the result of your projection produces an "index only" (covered) query, meaning that the fields "projected" in the result are all present in the index, you are always producing more work for the query engine.
You have to consider the process:
How do I match? On document or index? Find appropriate primary or other index.
Given the index, scan and find things.
Now what do I have to return? Is all of the data in the index? If not, go back to the collection and pull the documents.
That is the basic process. So unless one of those stages is optimized in some way, things will of course "take longer".
You need to look at this as designing a "server engine" and understand the steps that need to be undertaken. Considering that none of your conditions did anything to make those steps optimal, you need to accept that.
Your "best" case is where the projected fields are exactly the fields present in the chosen index. But even that has the overhead of loading the index.
So choose wisely, and understand the constraints and memory requirements of what you are writing your query for. That is what "optimization" is all about.
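To illustrate the "index only" case described above, here is a minimal sketch; the index and the e-mail value are hypothetical and would need to match your actual access pattern:
// hypothetical covering index on two scalar fields from the schema above
db.users.ensureIndex({ email: 1, dateCreated: 1 })
// both the filter and the projection use only indexed fields, and _id is excluded,
// so explain() should report indexOnly: true
db.users.find(
    { email: "someone@example.com" },
    { _id: 0, email: 1, dateCreated: 1 }
).explain()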

Slow range query on a multikey index

I have a MongoDB collection named post with 35 million objects. The collection has two secondary indexes defined as follows.
> db.post.getIndexKeys()
[
    {
        "_id" : 1
    },
    {
        "namespace" : 1,
        "domain" : 1,
        "post_id" : 1
    },
    {
        "namespace" : 1,
        "post_time" : 1,
        "tags" : 1 // this is an array field
    }
]
I expect the following query, which simply filters by namespace and post_time, to run in a reasonable time without scanning all objects.
>db.post.find({post_time: {"$gte" : ISODate("2013-04-09T00:00:00Z"), "$lt" : ISODate("2013-04-09T01:00:00Z")}, namespace: "my_namespace"}).count()
7408
However, it takes MongoDB at least ten minutes to retrieve the result and, curiously, it manages to scan 70 million objects to do the job according to the explain function.
> db.post.find({post_time: {"$gte" : ISODate("2013-04-09T00:00:00Z"), "$lt" : ISODate("2013-04-09T01:00:00Z")}, namespace: "my_namespace"}).explain()
{
    "cursor" : "BtreeCursor namespace_1_post_time_1_tags_1",
    "isMultiKey" : true,
    "n" : 7408,
    "nscannedObjects" : 69999186,
    "nscanned" : 69999186,
    "nscannedObjectsAllPlans" : 69999186,
    "nscannedAllPlans" : 69999186,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 378967,
    "nChunkSkips" : 0,
    "millis" : 290048,
    "indexBounds" : {
        "namespace" : [
            [
                "my_namespace",
                "my_namespace"
            ]
        ],
        "post_time" : [
            [
                ISODate("2013-04-09T00:00:00Z"),
                ISODate("292278995-01--2147483647T07:12:56.808Z")
            ]
        ],
        "tags" : [
            [
                {
                    "$minElement" : 1
                },
                {
                    "$maxElement" : 1
                }
            ]
        ]
    },
    "server" : "localhost:27017"
}
The difference between the number of objects in the collection and the number of index entries scanned must be caused by the lengths of the tags arrays (which are all equal to 2). Still, I don't understand why the post_time filter does not make use of the index.
Can you tell me what I might be missing?
(I am working on a decent machine with 24 cores and 96 GB RAM. I am using MongoDB 2.2.3.)
Found my answer in this question: Order of $lt and $gt in MongoDB range query
My index is a multikey index (on tags) and I am running a range query (on post_time). Apparently, MongoDB cannot use both sides of the range as a filter in this case, so it just picks the $gte clause, which comes first. As my lower limit happens to be the lowest post_time value, MongoDB starts scanning all the objects.
Unfortunately, this is not the whole story. Trying to solve the problem, I created non-multikey indexes too but MongoDB insisted on using the bad one. That made me think that the problem was elsewhere. Finally, I had to drop the multikey index and create one without the tags field. Everything is fine now.
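For reference, the fix described above amounts to something like this (a sketch reconstructing the steps; the collection and field names follow the question):
// drop the multikey index that kept winning plan selection
db.post.dropIndex({ namespace: 1, post_time: 1, tags: 1 })
// recreate it without the array field so both bounds of the post_time range can be used
db.post.ensureIndex({ namespace: 1, post_time: 1 })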

Indexing with mongodb: bad performance / indexOnly=false

I have MongoDB running on an 8 GB Linux machine. Currently it's in test mode, so there are very few other requests coming in, if any at all.
I have a collection items with 1 million documents in it. I am creating an index on the fields PeerGroup and CategoryIds (an array of 3-6 elements, which yields a multikey index): db.items.ensureIndex({PeerGroup:1, CategoryIds:1}).
When I am querying
db.items.find({"CategoryIds" : new BinData(3,"xqScEqwPiEOjQg7tzs6PHA=="), "PeerGroup" : "anonymous"}).explain()
I have the following results:
{
    "cursor" : "BtreeCursor PeerGroup_1_CategoryIds_1",
    "isMultiKey" : true,
    "n" : 203944,
    "nscannedObjects" : 203944,
    "nscanned" : 203944,
    "nscannedObjectsAllPlans" : 203944,
    "nscannedAllPlans" : 203944,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 1,
    "nChunkSkips" : 0,
    "millis" : 680,
    "indexBounds" : {
        "PeerGroup" : [
            [
                "anonymous",
                "anonymous"
            ]
        ],
        "CategoryIds" : [
            [
                BinData(3,"BXzpwVQozECLaPkJy26t6Q=="),
                BinData(3,"BXzpwVQozECLaPkJy26t6Q==")
            ]
        ]
    },
    "server" : "db02:27017"
}
I think 680 ms is not very fast. Or is this acceptable?
Also, why does it say "indexOnly" : false?
I think 680 ms is not very fast. Or is this acceptable?
That kind of depends on how big these objects are and whether this was a first run. Assuming the whole data set you are returning (including the index) fits into memory, then the next time you run this it will be an in-memory query and will return basically as fast as possible. The nscanned is high, meaning that this query is not very selective; are most records going to have an "anonymous" value in PeerGroup? If so, and CategoryIds is more selective, then you might try an index on {CategoryIds:1, PeerGroup:1} instead (use hint() to try one versus the other).
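For example (a hedged sketch of the hint() comparison suggested above; the alternative index would have to be created first):
// assumes an index { CategoryIds: 1, PeerGroup: 1 } has also been built
db.items.find(
    { "CategoryIds": new BinData(3, "xqScEqwPiEOjQg7tzs6PHA=="), "PeerGroup": "anonymous" }
).hint({ CategoryIds: 1, PeerGroup: 1 }).explain()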
Also, why does it say "indexOnly" : false?
This simply indicates that not all of the fields you wish to return are in the index; the BtreeCursor indicates that the index was used for the query (a BasicCursor would mean it had not been). For this to be an indexOnly query, you would need to be returning only the two fields in the index (that is, a projection of {_id : 0, PeerGroup:1, CategoryIds:1}), so that it would never have to touch the data itself and could return everything you need from the index alone. (Note, though, that because CategoryIds is an array this index is multikey, and multikey indexes cannot cover queries, so indexOnly will remain false here in practice.)