Query Result Mismatch in MongoDB (Sharded Environment)

Here is what I am doing:
1) Created a new sharded database.
2) Created two new sharded collections (source and target) and loaded data into them.
3) Ran two map-reduce jobs, one on each sharded collection, one after the other, writing the results of both into the same collection (say j_30052014125600).
This works fine. I am running all of this on a sharded cluster with:
1 shard server
1 config server
2 replicas
The problem appears when I count the matching documents:
db.j_30052014125600.find({"value.ID": 305763}).count()
I get a count of 2, which is correct, but when I run the find query itself:
db.j_30052014125600.find({"value.ID": 305763})
I get just one document.
I have attached a small screenshot to help illustrate the problem.
Why is this so? Can anyone please explain?
Thanks
Update:
mongos> db.j_30052014125600.find({"value.ID": 305763}).explain()
{
"clusteredType" : "ParallelSort",
"shards" : {
"firstset/192.168.1.1:10002,192.168.1.2:10001" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 519034,
"nscanned" : 519034,
"nscannedObjectsAllPlans" : 519034,
"nscannedAllPlans" : 519034,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 1,
"millis" : 508,
"indexBounds" : {
},
"server" : "VM1:10002"
}
],
"secondset/192.168.1.1:10003,192.168.1.3:10004" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 280944,
"nscanned" : 280944,
"nscannedObjectsAllPlans" : 280944,
"nscannedAllPlans" : 280944,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 289,
"indexBounds" : {
},
"server" : "VM2:10004"
}
]
},
"cursor" : "BasicCursor",
"n" : 1,
"nChunkSkips" : 1,
"nYields" : 0,
"nscanned" : 799978,
"nscannedAllPlans" : 799978,
"nscannedObjects" : 799978,
"nscannedObjectsAllPlans" : 799978,
"millisShardTotal" : 797,
"millisShardAvg" : 398,
"numQueries" : 2,
"numShards" : 2,
"millis" : 533
}
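One possible reading of the explain output above, offered as a hypothesis rather than a confirmed diagnosis: the first shard reports "nChunkSkips" : 1 and "n" : 0, which suggests it found a matching document but skipped it because the document lies in a chunk range that shard does not own. The MongoDB documentation notes that count() on a sharded cluster can be inaccurate when such documents exist, because it does not apply this ownership filter. The toy model below is plain JavaScript, not MongoDB code; the `owned` flag and the sample documents are invented purely for illustration.

```javascript
// Toy model only: plain JavaScript, not MongoDB internals.
// `owned` marks whether the shard owns the chunk range the document falls in.
const shards = {
  firstset: [
    { _id: 1, value: { ID: 305763 }, owned: false }, // find() skips this one (a chunk skip)
    { _id: 2, value: { ID: 111 }, owned: true },
  ],
  secondset: [
    { _id: 3, value: { ID: 305763 }, owned: true },
  ],
};

const matches = (doc, id) => doc.value.ID === id;

// count(): every shard reports how many documents match; ownership is not checked.
function shardedCount(id) {
  return Object.values(shards)
    .reduce((total, docs) => total + docs.filter((d) => matches(d, id)).length, 0);
}

// find(): every shard drops matching documents that sit in chunks it does not own.
function shardedFind(id) {
  return Object.values(shards)
    .flatMap((docs) => docs.filter((d) => matches(d, id) && d.owned));
}

console.log(shardedCount(305763));       // 2
console.log(shardedFind(305763).length); // 1
```

If this is what is happening, the documentation suggests counting through the aggregation framework (a $match followed by a $group with $sum) instead of count(), since the aggregation applies the same filtering as find().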

Related

MongoDB index is not being used

I have a collection of questions with index on modified.
{
"v" : 1,
"key" : {
"modified" : 1
},
"name" : "modified_1",
"ns" : "app23568387.questions",
"background" : true,
"safe" : null
}
But when I query the questions with modified field, mongo does not use this index.
db.questions.find({modified: ISODate("2016-07-20T20:58:20.662Z")}).explain(true);
It returns
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 19315626,
"nscanned" : 19315626,
"nscannedObjectsAllPlans" : 19315626,
"nscannedAllPlans" : 19315626,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 384889,
"nChunkSkips" : 0,
"millis" : 43334,
"allPlans" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 19315626,
"nscanned" : 19315626,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0
}
],
"server" : "c387.candidate.37:10387",
"filterSet" : false,
"stats" : {
"type" : "COLLSCAN",
"works" : 19624020,
"yields" : 384889,
"unyields" : 384889,
"invalidates" : 3,
"advanced" : 0,
"needTime" : 19315627,
"needFetch" : 0,
"isEOF" : 1,
"docsTested" : 19315626,
"children" : []
}
}
When I use hint(), mongo throws a "bad hint" error.
I have another collection, folders, which has exactly the same index, and there the query does use it (explain() returns "cursor" : "BtreeCursor modified_1").
What could be the difference between questions and folders? Is it possible that the index is "broken" even though getIndexes() returns it? If so, what can I do to fix it?
It seems your index has not finished building in the background. You can check this with the db.currentOp() command:
https://docs.mongodb.com/v3.0/reference/method/db.currentOp/#currentop-index-creation
Also check mongod.log for any errors from the index build.
The simplest fix is to drop the index and create it again.
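A minimal sketch of the drop-and-recreate step, assuming the index name and key from the question (modified_1 on { modified: 1 }). The helper name rebuildIndex is made up for this example. It uses ensureIndex, which matches shells of that era; on newer versions createIndex would be the equivalent call.

```javascript
// Sketch: drop a possibly half-built index and create it again.
// `collection` is assumed to expose dropIndex(name) and ensureIndex(keys, options),
// as a mongo shell collection such as db.questions does.
function rebuildIndex(collection, name, keys, options) {
  collection.dropIndex(name);
  return collection.ensureIndex(keys, options);
}

// In the mongo shell this would be called as:
//   rebuildIndex(db.questions, "modified_1", { modified: 1 }, { background: true });
```

On a collection of roughly 19 million documents the rebuild will take a while, so it may be worth running explain() again only after db.currentOp() no longer shows the build.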

MongoDB - Index scan low performance

I'm very new to MongoDB and I'm trying to test performance in order to understand whether my structure is fine.
I have a collection with 5 fields (3 dates, one Int, and one pointer to another ObjectId).
In this collection I've created an index on two fields:
_p_monitor_ref Asc (this is the pointer)
collected Desc (this is one of the Date fields)
The index name is: _p_monitor_ref_1_collected_-1
I created this index at the beginning and populated the collection with some records. After that, I duplicated the records many times with this script:
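// Queue a copy of each existing document under a fresh _id, then insert them all at once.
var bulk = db.measurements.initializeUnorderedBulkOp();
db.measurements.find().limit(1483570).forEach(function(document) {
    document._id = new ObjectId(); // new _id so the copy does not collide with the original
    bulk.insert(document);
});
bulk.execute();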
Now the collection has 3 million documents.
Next, I run explain to see whether the query uses the index and how long it takes to execute. This is the query:
db.measurements.find({ "_p_monitor_ref": "Monitors$iKNoB6Ga5P" }).sort({collected: -1}).explain()
As you can see, I filter on _p_monitor_ref to find all documents for a pointer, and then sort by collected descending (matching the index).
This is the result of the first run. MongoDB uses the index (BtreeCursor _p_monitor_ref_1_collected_-1), but the execution time is very high ("millis" : 120286):
{
"cursor" : "BtreeCursor _p_monitor_ref_1_collected_-1",
"isMultiKey" : false,
"n" : 126862,
"nscannedObjects" : 126862,
"nscanned" : 126862,
"nscannedObjectsAllPlans" : 126862,
"nscannedAllPlans" : 126862,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 23569,
"nChunkSkips" : 0,
"millis" : 120286,
"indexBounds" : {
"_p_monitor_ref" : [
[
"Monitors$iKNoB6Ga5P",
"Monitors$iKNoB6Ga5P"
]
],
"collected" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "my-pc",
"filterSet" : false
}
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 2967141,
"nscannedObjects" : 2967141,
"nscanned" : 2967141,
"nscannedObjectsAllPlans" : 2967141,
"nscannedAllPlans" : 2967141,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 27780,
"nChunkSkips" : 0,
"millis" : 11501,
"server" : "my-pc",
"filterSet" : false
}
If I run the explain again, this is the result, and the time drops to "millis" : 201:
{
"cursor" : "BtreeCursor _p_monitor_ref_1_collected_-1",
"isMultiKey" : false,
"n" : 126862,
"nscannedObjects" : 126862,
"nscanned" : 126862,
"nscannedObjectsAllPlans" : 126862,
"nscannedAllPlans" : 126862,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 991,
"nChunkSkips" : 0,
"millis" : 201,
"indexBounds" : {
"_p_monitor_ref" : [
[
"Monitors$iKNoB6Ga5P",
"Monitors$iKNoB6Ga5P"
]
],
"collected" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "my-pc",
"filterSet" : false
}
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 2967141,
"nscannedObjects" : 2967141,
"nscanned" : 2967141,
"nscannedObjectsAllPlans" : 2967141,
"nscannedAllPlans" : 2967141,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 23180,
"nChunkSkips" : 0,
"millis" : 651,
"server" : "my-pc",
"filterSet" : false
}
Why do I get these two very different results? Maybe the second execution takes the data from some kind of cache...
The collection now has 3 million records. What happens if it grows to 10, 20, or 30 million?
I don't know if I'm doing something wrong. Admittedly, I'm running this on my laptop (which doesn't have an SSD).
The second attempt is faster because the first attempt forced MongoDB to load the data into memory, and that data was still in memory when the second attempt ran.
As your collection grows, the index grows as well. At some point it may become too big to fit in free memory, and MongoDB will then have to load and unload parts of the index, so performance will vary.
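One rough way to keep an eye on this is to compare the index sizes reported by the collection stats with the memory you expect to have available. The sketch below assumes an object shaped like the result of db.measurements.stats(), with an indexSizes map and a totalIndexSize value in bytes. The helper name indexMemoryReport and all the numbers are hypothetical, not measurements from the collection in the question.

```javascript
// Sketch: compare index sizes reported by collection stats with a memory budget.
// `stats` is assumed to have the shape returned by db.collection.stats():
// an `indexSizes` map (index name -> bytes) and a `totalIndexSize` in bytes.
function indexMemoryReport(stats, availableBytes) {
  var indexes = Object.keys(stats.indexSizes).map(function (name) {
    var bytes = stats.indexSizes[name];
    return { name: name, bytes: bytes, fits: bytes <= availableBytes };
  });
  return {
    totalIndexSize: stats.totalIndexSize,
    allFit: stats.totalIndexSize <= availableBytes,
    indexes: indexes
  };
}

// Hypothetical numbers, for illustration only.
var MB = 1024 * 1024;
var sampleStats = {
  indexSizes: { "_id_": 100 * MB, "_p_monitor_ref_1_collected_-1": 300 * MB },
  totalIndexSize: 400 * MB
};
var report = indexMemoryReport(sampleStats, 256 * MB);
console.log(report.allFit); // false: 400 MB of indexes against a 256 MB budget
```

In the mongo shell the first argument would be db.measurements.stats(). This only covers the indexes; the documents fetched through the index also compete for the same memory.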

MongoDB $or + sort + sharding = no index used

Consider the following query, which is a fairly simple use case:
db.Transactions.find({
$or: [
{ "from.addresses" : "name#domain.com" },
{ "to.addresses" : "name#domain.com" }
]
}).sort({ "time" : -1 });
"from.addresses" and "to.addresses" are indexed fields (arrays). Those indexes aren't compound. There is currently no index on "time".
Note that I'm using sharding on this collection and this might influence the behaviour of the query.
The issues are:
If I'm sorting on "time" (to paginate correctly the transactions to the user), no index is used and the whole collection is scanned (tens of millions of documents): .explain() => "cursor" : "BasicCursor" on all shards
If I remove the .sort(), then the indexes are correctly used
If I remove the $or, then the indexes are correctly used
Is it possible to make MongoDB use the indexes?
I'm currently considering running 2 separate queries (one for each side of the $or) and merging them myself (MUCH faster than the $or behaviour).
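The client-side merge mentioned above could look roughly like this. It is a sketch under assumptions: each of the two queries (one on "from.addresses", one on "to.addresses") is run separately with .sort({ "time" : -1 }) and a limit at least as large as the page size, and their results are passed in as arrays. The function name mergeByTimeDesc and the sample documents are invented for illustration.

```javascript
// Merge two arrays that are each already sorted by `time` descending.
// A transaction can match both sides of the $or, so duplicates are dropped by _id.
function mergeByTimeDesc(fromDocs, toDocs, limit) {
  const seen = new Set();
  const merged = [];
  let i = 0;
  let j = 0;
  while (merged.length < limit && (i < fromDocs.length || j < toDocs.length)) {
    // Take from `fromDocs` when `toDocs` is exhausted or its head is not newer.
    const takeFrom =
      j >= toDocs.length ||
      (i < fromDocs.length && fromDocs[i].time >= toDocs[j].time);
    const doc = takeFrom ? fromDocs[i++] : toDocs[j++];
    const key = String(doc._id);
    if (!seen.has(key)) {
      seen.add(key);
      merged.push(doc);
    }
  }
  return merged;
}

// Invented sample data: _id 2 appears in both result sets.
const fromResults = [{ _id: 1, time: 50 }, { _id: 2, time: 30 }, { _id: 4, time: 10 }];
const toResults = [{ _id: 3, time: 40 }, { _id: 2, time: 30 }, { _id: 5, time: 5 }];
console.log(mergeByTimeDesc(fromResults, toResults, 4).map((d) => d._id)); // [ 1, 3, 2, 4 ]
```

Note that with no index on "time", each individual query still has to sort its own matches in memory; the merge only avoids the full collection scan caused by combining $or with sort.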
Please find below the full .explain() (running on a smaller collection than the real one, running on the real one would take hours):
{
"clusteredType" : "ParallelSort",
"shards" : {
"rs/mongo-a:27017,mongo-b:27017" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 1356,
"nscannedObjects" : 45589,
"nscanned" : 45589,
"nscannedObjectsAllPlans" : 45589,
"nscannedAllPlans" : 45589,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 356,
"nChunkSkips" : 8014,
"millis" : 44726,
"indexBounds" : {
},
"server" : "mongo-a:27017"
}
],
"rs1/mongo-a1:27018,mongo-b1:27018" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 3435,
"nscannedObjects" : 15663,
"nscanned" : 15663,
"nscannedObjectsAllPlans" : 15663,
"nscannedAllPlans" : 15663,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 505,
"indexBounds" : {
},
"server" : "mongo-a1:27018"
}
],
"rs2/mongo-a2:27018,mongo-b2:27018" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 2208,
"nscannedObjects" : 10489,
"nscanned" : 10489,
"nscannedObjectsAllPlans" : 10489,
"nscannedAllPlans" : 10489,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 2,
"nChunkSkips" : 0,
"millis" : 329,
"indexBounds" : {
},
"server" : "mongo-a2:27018"
}
],
"rs3/mongo-a3:27018,mongo-b3:27018" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 2249,
"nscannedObjects" : 10500,
"nscanned" : 10500,
"nscannedObjectsAllPlans" : 10500,
"nscannedAllPlans" : 10500,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 7,
"nChunkSkips" : 0,
"millis" : 439,
"indexBounds" : {
},
"server" : "mongo-a3:27018"
}
],
"rs4/mongo-a4:27018,mongo-b4:27018" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 2251,
"nscannedObjects" : 10488,
"nscanned" : 10488,
"nscannedObjectsAllPlans" : 10488,
"nscannedAllPlans" : 10488,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 336,
"indexBounds" : {
},
"server" : "mongo-a4:27018"
}
],
"rs5/mongo-a5:27018,mongo-b5:27018" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 1175,
"nscannedObjects" : 5220,
"nscanned" : 5220,
"nscannedObjectsAllPlans" : 5220,
"nscannedAllPlans" : 5220,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 2,
"nChunkSkips" : 0,
"millis" : 376,
"indexBounds" : {
},
"server" : "mongo-a5:27018"
}
]
},
"cursor" : "BasicCursor",
"n" : 12674,
"nChunkSkips" : 8014,
"nYields" : 375,
"nscanned" : 97949,
"nscannedAllPlans" : 97949,
"nscannedObjects" : 97949,
"nscannedObjectsAllPlans" : 97949,
"millisShardTotal" : 46711,
"millisShardAvg" : 7785,
"numQueries" : 6,
"numShards" : 6,
"millis" : 44939
}
There is a JIRA you might want to watch: https://jira.mongodb.org/browse/SERVER-1205

MongoDB outside range query

I am trying to query MongoDB to obtain something like:
"get persons with age not in the range [30,40]"
I am doing:
db.persons.find({'age' : {$nin : [{$lt : 30},{$gt : 40}]}})
which is not working for me. I know that I could combine "people with age < 30" and "people with age > 40", but I was wondering if I can use the "not in" operator...
Thanks
What about using the $or operator like this:
db.persons.find({$or: [{'age': {$lt: 30}},{'age': {$gt : 40}}]})
$in / $nin are operators for matching against discrete values in a list and cannot be used for range searches.
In your example, the query with $nin would have to be
db.persons.find({age:{$nin:[30,31,32,33,34,35,36,37,38,39,40]}})
which is not at all practical and, furthermore, would not make use of an index:
db.persons.ensureIndex({age:1})
db.persons.find({age:{$nin:[30,31,32,33,34,35,36,37,38,39,40]}}).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
},
"server" : "Aspire-5750:27017"
}
Sgoettschkes' answer above is correct and would use the index:
db.persons.find({$or: [{'age': {$lt: 30}},{'age': {$gt : 40}}]}).explain()
{
"clauses" : [
{
"cursor" : "BtreeCursor age_1",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 0,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 0,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 12,
"indexBounds" : {
"age" : [
[
-1.7976931348623157e+308,
30
]
]
}
},
{
"cursor" : "BtreeCursor age_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"age" : [
[
40,
1.7976931348623157e+308
]
]
}
}
],
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"millis" : 12,
"server" : "Aspire-5750:27017"
}
For more information on querying effectively, see http://docs.mongodb.org/manual/core/read-operations/
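To make the semantics of the $or form concrete, here is a small plain-JavaScript sketch. The helper outsideRange builds the same filter document as the query above for any field and bounds. The helper matchesFilter is a deliberately tiny stand-in for the server's matching, supporting only $or with $lt and $gt. Both names and the sample ages are assumptions made for this illustration.

```javascript
// Build a filter matching values strictly below `low` or strictly above `high`.
function outsideRange(field, low, high) {
  return { $or: [{ [field]: { $lt: low } }, { [field]: { $gt: high } }] };
}

// Toy matcher for illustration: understands only $or, $lt and $gt.
function matchesFilter(doc, filter) {
  return filter.$or.some((clause) =>
    Object.entries(clause).every(([field, cond]) =>
      (!("$lt" in cond) || doc[field] < cond.$lt) &&
      (!("$gt" in cond) || doc[field] > cond.$gt)
    )
  );
}

const filter = outsideRange("age", 30, 40);
const persons = [25, 30, 35, 40, 45].map((age) => ({ age }));
console.log(persons.filter((p) => matchesFilter(p, filter)).map((p) => p.age)); // [ 25, 45 ]
```

In the mongo shell the generated filter would be passed straight to find, as in db.persons.find(outsideRange("age", 30, 40)). The bounds 30 and 40 themselves are excluded from the result, which matches "not in the range [30,40]".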

slow Mongodb $near search with additional criteria

I have a collection, the data look like this:
{
"_id" : ObjectId("4e627655677c27cf24000000"),
"gps" : {
"lng" : 116.343079,
"lat" : 40.034283
},
"lat" : 1351672296
}
And I build a compound index:
{
"v" : 1,
"key" : {
"gps" : "2d",
"lat" : 1
},
"ns" : "test.user",
"name" : "gps__lat_1"
}
A pure $near query like the one below is very fast (around 20 ms):
>db.user.find({"gps":{"$near":{"lng":116.343079,"lat":40.034283}}}).explain()
{
"cursor" : "GeoSearchCursor",
"nscanned" : 100,
"nscannedObjects" : 100,
"n" : 100,
"millis" : 23,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
But the same query with an additional "lat" criterion is very slow (900 ms or more):
>db.user.find({"gps":{"$near":{"lng":116.343079,"lat":40.034283}},"lat":{"$gt":1351413167}}).explain()
{
"cursor" : "GeoSearchCursor",
"nscanned" : 3,
"nscannedObjects" : 3,
"n" : 3,
"millis" : 665,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
Can anybody explain this? Many thanks!
After I upgraded MongoDB to 2.2.0, the problem disappeared.
127.0.0.1/test> db.user.find({gps:{$near:[116,40]},lat:{$gt:1351722342}}).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 0,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 0,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
},
"server" : "zhangshenjiamatoMacBook-Air.local:27017"
}
From the explain above, it doesn't look like the geoIndex is being used at all - also, it looks like the above query didn't return any results!
If your query is using the 2d index, the explain should contain:
"cursor" : "GeoSearchCursor"
Can you check if upgrading to 2.2.0 really solved your issue? :)
Sundar