2dsphere compound query performance slow? - mongodb

I've read the docs, but it does not work as expected. I have 64 million of these documents, all with different "millisecos" values:
{
    "_id" : ObjectId("5396e85c12f43f5d1bafbd13"),
    "author" : ObjectId("5396ca2b0fe95cf96599d881"),
    "location" : {
        "type" : "Point",
        "coordinates" : [
            12.52891929999998,
            16.620259
        ]
    },
    "name" : "Jordan",
    "description" : "aDescription"
}
And I have the following 3 indexes in Mongo:
{ "_id" : 1 }                                     // default index on the _id
{ "location" : "2dsphere" }                       // 2dsphere index on location
{ "location" : "2dsphere", "millisecos" : 1 }     // compound index on location and time_1 (presumably millisecos)
How do I quickly query on the compound index to get distinct "names"? My query to get all applicable documents is this:
db.mycollection.find({
    "location" : { "$geoWithin" : { "$centerSphere" : [ [ -83.3434343, 24.34343 ], 0.5 ] } },
    "millisecos" : { "$gt" : 1399522511000, "$lt" : 1399526111000 }
})
I am not sure how to get to the unique "names" of authors I want.
It is slow; is indexing supposed to help, and how do I make it fast (without sharding)? If the query is bad, I am open to other recommendations. Do limit or skip help?
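One possible sketch for the distinct part (not from the original post, and assuming the goal is just the unique values of "name" within that filter): distinct() accepts a query document as its second argument, so the same filter can be reused.
db.mycollection.distinct(
    "name",   // field to collect unique values of
    {
        "location" : { "$geoWithin" : { "$centerSphere" : [ [ -83.3434343, 24.34343 ], 0.5 ] } },
        "millisecos" : { "$gt" : 1399522511000, "$lt" : 1399526111000 }
    }
)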
Query Plan:
{
    "cursor" : "S2Cursor",
    "isMultiKey" : true,
    "n" : 70394,
    "nscannedObjects" : 70394,
    "nscanned" : 10479843,
    "nscannedObjectsAllPlans" : 70394,
    "scanAndOrder" : false,
    "nYields" : 141,
    "millis" : 175250,
    "indexBounds" : { },
    "matchTested" : NumberLong(10024353),
    "geoTested" : NumberLong(60348),
    "cellsInCover" : NumberLong(26),
    "server" : "mongo-cluster.excelmicro:27017"
}
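One hypothetical follow-up, not from the thread: nscanned (10,479,843) dwarfs n (70,394), which suggests the planner is scanning far more index entries than the query returns. Forcing the compound index with hint() and re-running explain() would show whether that index is being used at all; the index name below is a guess based on the usual naming convention.
db.mycollection.find({
    "location" : { "$geoWithin" : { "$centerSphere" : [ [ -83.3434343, 24.34343 ], 0.5 ] } },
    "millisecos" : { "$gt" : 1399522511000, "$lt" : 1399526111000 }
}).hint("location_2dsphere_millisecos_1").explain()   // index name is a guess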

Related

Complex MongoDB query?

I'm still pretty new to Mongo and queries, so, that said, I'm trying to build a query that will find results matching any of these three dog breeds and, in addition to that, check two additional specs. And finally, sort all by age. All the data comes from a CSV file (scrnshot); there aren't any sub-categories to any of the entries.
db.animals.find({
    "animal_id" : 1,
    "breed" : "Labrador Retriever Mix",
    "breed" : "Chesapeake Bay Retriever",
    "breed" : "Newfoundland",
    $and : [ { "age_upon_outcome_in_weeks" : { "$lt" : 156, "$gte" : 26 } ],
    $and : { "sex_upon_outcome" : "Intact Female" }
}).sort({ "age_upon_outcome_in_weeks" : 1 })
This is throwing a number of errors, such as:
Error: error: {
    "ok" : 0,
    "errmsg" : "$and must be an array",
    "code" : 2,
    "codeName" : "BadValue"
}
What am I messing up? Or is there a better way to do it?
As mentioned by Takis in the comments, you cannot repeat a key in a mongo query - you have to imagine that your query document becomes a JSON object, and each time a key is repeated it replaces the previous one. To get around this problem, MongoDB supports the $or and $and operators. For complex queries like this one, I would recommend starting with a global $and, with each element containing either a single constraint or a $or constraint. Your query becomes this:
db.coll.find({
    "$and" : [
        { "animal_id" : 1 },
        { "age_upon_outcome_in_weeks" : { "$lt" : 156, "$gte" : 26 } },
        { "sex_upon_outcome" : "Intact Female" },
        { "$or" : [
            { "breed" : "Labrador Retriever Mix" },
            { "breed" : "Chesapeake Bay Retriever" },
            { "breed" : "Newfoundland" }
        ] }
    ]
})
.sort({ "age_upon_outcome_in_weeks" : 1 })
--- edit
You can also consider using $in instead of $or:
db.coll.find({
    "animal_id" : 1,
    "age_upon_outcome_in_weeks" : { "$lt" : 156, "$gte" : 26 },
    "sex_upon_outcome" : "Intact Female",
    "breed" : { "$in" : [
        "Labrador Retriever Mix",
        "Chesapeake Bay Retriever",
        "Newfoundland"
    ] }
})
.sort({ "age_upon_outcome_in_weeks" : 1 })

How to combine Documents in aggregation pipeline with MongoDB Java driver 3.6?

I am using an aggregation pipeline with the MongoDB Java driver version 3.6. If I have documents that look something like:
doc1 --
{
    "CAR" : {
        "VIN" : "ASDF1234",
        "YEAR" : "2018",
        "MAKE" : "Honda",
        "MODEL" : "Accord"
    },
    "FEATURES" : [
        {
            "AUDIO" : "MP3",
            "TIRES" : "All Season",
            "BRAKES" : "ABS"
        }
    ]
}
doc2 --
{
    "CAR" : {
        "VIN" : "ASDF1234",
        "AVAILABILITY" : "In Stock"
    }
}
And if I submit a query like:
// imports needed for the filter helpers used below
import static com.mongodb.client.model.Filters.and;
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Filters.in;
import static com.mongodb.client.model.Filters.or;
import com.mongodb.client.model.Aggregates;
import java.util.Arrays;

collection.aggregate(
    Arrays.asList(
        Aggregates.match(
            and(
                in("CAR.VIN", vinList),
                or(
                    eq("CAR.MAKE", carMake),
                    eq("CAR.AVAILABILITY", carAvailability)
                )
            )
        )
    )
)
Let us assume that there are exactly two different records for which the "CAR.VIN" criteria match for every VIN, and I am going to get two results. Rather than deal with two results each time, I would like to merge the documents so that the result looks like this:
{
    "CAR" : {
        "VIN" : "ASDF1234",
        "YEAR" : "2018",
        "MAKE" : "Honda",
        "MODEL" : "Accord",
        "AVAILABILITY" : "In Stock"
    },
    "FEATURES" : [
        {
            "AUDIO" : "MP3",
            "TIRES" : "All Season",
            "BRAKES" : "ABS"
        }
    ]
}
The example where I have two and only two results trivializes my need for this. Imagine that vinList is a list of 10000 values, and it might return 2 x 10000 documents. When I return an AggregateIterable to the client that is calling my code, I do not want to impose the requirement that they have to group or collate the results in any way, but that they will receive one document for each result that has all of the information that they will want to parse, cleanly and easily.
Of course, people will suggest that the data is simply combined into one document with all of the data in the MongoDB collection. For reasons that I cannot control, there are two separate documents corresponding to each VIN in the same collection, and that is something that I am unable to change. There is a value in our system that makes this more reasonable than it might seem, so please don't focus on this apparent problem with the data.
I am trying, with not much luck, to utilize the Aggregates.group() operation to merge the fields in my aggregation pipeline. Accumulators.push seems to be the closest operation to what I need, but I do not want to complicate the document structure with extra arrays, etc. Is there a straightforward approach that I am not seeing?
You can try $mergeObjects, added in MongoDB v3.6:
db.cc.aggregate([
    {
        $group : {
            _id : "$CAR.VIN",
            CAR : { $mergeObjects : "$CAR" },
            FEATURES : { $mergeObjects : { $arrayElemAt : [ "$FEATURES", 0 ] } }
        }
    }
]).pretty()
result
{
    "_id" : "ASDF1234",
    "CAR" : {
        "VIN" : "ASDF1234",
        "YEAR" : "2018",
        "MAKE" : "Honda",
        "MODEL" : "Accord",
        "AVAILABILITY" : "In Stock"
    },
    "FEATURES" : {
        "AUDIO" : "MP3",
        "TIRES" : "All Season",
        "BRAKES" : "ABS"
    }
}
To get FEATURES as an array:
db.cc.aggregate([
    {
        $group : {
            _id : "$CAR.VIN",
            CAR : { $mergeObjects : "$CAR" },
            FEATURES : { $push : { $arrayElemAt : [ "$FEATURES", 0 ] } }
        }
    }
]).pretty()
result
{
    "_id" : "ASDF1234",
    "CAR" : {
        "VIN" : "ASDF1234",
        "YEAR" : "2018",
        "MAKE" : "Honda",
        "MODEL" : "Accord",
        "AVAILABILITY" : "In Stock"
    },
    "FEATURES" : [
        {
            "AUDIO" : "MP3",
            "TIRES" : "All Season",
            "BRAKES" : "ABS"
        },
        null
    ]
}
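In the full pipeline, the $match stage from the question would presumably come first, so that only the requested VINs get grouped. A shell sketch (the $in list is a placeholder for vinList):
db.cc.aggregate([
    { $match : { "CAR.VIN" : { $in : [ "ASDF1234" ] } } },   // placeholder for vinList
    { $group : {
        _id : "$CAR.VIN",
        CAR : { $mergeObjects : "$CAR" },
        FEATURES : { $push : { $arrayElemAt : [ "$FEATURES", 0 ] } }
    } }
])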

Complex Sort on multiple very large MongoDB Collections

I have a MongoDB database with currently about 30 collections, ranging from 1.5 GB to 2.5 GB, and I need to reformat and sort the data into nested groups and dump them to a new collection. This database will eventually have about 2000 collections of the same type and format of data.
Data is currently available like this:
{
    "_id" : ObjectId("598392d6bab47ec75fd6aea6"),
    "orderid" : NumberLong("4379116282"),
    "regionid" : 10000068,
    "systemid" : 30045305,
    "stationid" : 60015036,
    "typeid" : 7489,
    "bid" : 0,
    "price" : 119999.91,
    "minvolume" : 1,
    "volremain" : 6,
    "volenter" : 8,
    "issued" : "2015-12-31 09:12:29",
    "duration" : "14 days, 0:00:00",
    "range" : 65535,
    "reportedby" : 0,
    "reportedtime" : "2016-01-01 00:22:42.997926"
}
{...}
{...}
I need to group these by regionid > typeid > bid like this:
{"regionid": 10000176,
"orders": [
{
"typeid": 34,
"buy": [document, document, document, ...],
"sell": [document, document, document, ...]
},
{
"typeid": 714,
"buy": [document, document, document, ...],
"sell": [document, document, document, ...]
}]
}
Here's a more verbose sample of my ideal output format: https://gist.github.com/BatBrain/cd3426c29ce8ca8152efd1fa06ca1392
I have been trying to use db.collection.aggregate() to do this, running this command as an initial test step:
db.day_2016_01_01.aggregate([
    { $group : { _id : "$regionid", entries : { $push : "$$ROOT" } } },
    { $out : "test_group" }
], { allowDiskUse : true, cursor : {} })
But I have been getting this message, "errmsg" : "BufBuilder attempted to grow() to 134217728 bytes, past the 64MB limit."
I tried looking into how to use the cursor object, but I'm pretty confused about how to apply it in this situation, or even if that is a viable option. Any advice or solutions would be great.
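One hedged idea, not from the original thread: the BufBuilder error comes from single group documents having to hold an entire region's orders, so grouping at the finest granularity first keeps each output document small; a second pass could then roll those groups up per region. A sketch using the field names from the question:
db.day_2016_01_01.aggregate([
    { $group : {
        // group by regionid + typeid + bid so no single output document
        // has to hold a whole region's worth of orders
        _id : { regionid : "$regionid", typeid : "$typeid", bid : "$bid" },
        orders : { $push : "$$ROOT" }
    } },
    { $out : "test_group_fine" }   // hypothetical output collection name
], { allowDiskUse : true })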

Mongo DB - Slower With Index

I have about 1,000,000 documents in a collection (randomly generated).
Sample document:
{
    "loc" : {
        "lat" : 39.828475,
        "lon" : 116.273542
    },
    "phone" : "",
    "keywords" : [
        "big",
        "small",
        "biggest",
        "smallest"
    ],
    "prices" : [
        {
            "category" : "uRgpiamOVTEQ",
            "list" : [
                { "price" : 29, "name" : "ehLYoPpntlil" }
            ]
        },
        {
            "category" : "ozQNmdwpwhXPnabZ",
            "list" : [
                { "price" : 96, "name" : "ITTaLHf" },
                { "price" : 88, "name" : "MXVgJFBgtwLYk" }
            ]
        },
        {
            "category" : "EDkfKGZSou",
            "list" : [
                { "price" : 86, "name" : "JluoCLnenOqDllaEX" },
                { "price" : 35, "name" : "HbmgMDfxCOk" },
                { "price" : 164, "name" : "BlrUD" },
                { "price" : 106, "name" : "LOUcsMDeaqVm" },
                { "price" : 14, "name" : "rDkwhN" }
            ]
        }
    ]
}
Search without indexes:
> db.test1.find({"prices.list.price": { $gt: 190 } }).explain()
{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 541098,
    "nscannedObjects" : 1005584,
    "nscanned" : 1005584,
    "nscannedObjectsAllPlans" : 1005584,
    "nscannedAllPlans" : 1005584,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 8115,
    "nChunkSkips" : 0,
    "millis" : 13803,
    "server" : "localhost:27017",
    "filterSet" : false
}
With indexes:
> db.test1.ensureIndex({ "prices.list.price" : 1, "prices.list.name" : 1 })
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
> db.test1.find({"prices.list.price": { $gt: 190 } }).explain()
{
    "cursor" : "BtreeCursor prices.list.price_1_prices.list.name_1",
    "isMultiKey" : true,
    "n" : 541098,
    "nscannedObjects" : 541098,
    "nscanned" : 868547,
    "nscannedObjectsAllPlans" : 541098,
    "nscannedAllPlans" : 868547,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 16852,
    "nChunkSkips" : 0,
    "millis" : 66227,
    "indexBounds" : {
        "prices.list.price" : [
            [
                190,
                Infinity
            ]
        ],
        "prices.list.name" : [
            [
                { "$minElement" : 1 },
                { "$maxElement" : 1 }
            ]
        ]
    },
    "server" : "localhost:27017",
    "filterSet" : false
}
Any idea why the indexed search is slower than the search without an index?
Also I will use:
db.test1.find({ loc : { $near : [ 39.876045, 32.862245 ] } })     (needs a 2d index)
db.test1.find({ keywords : { $in : [ "small", "another" ] } })    (uses the index on keywords)
db.test1.find({ "prices.list.name" : /.s./ })                     (no need for an index because I will use a regex)
An index allows faster access to the location of the documents that satisfy the query.
In your example, the query selects half of all the documents in the collection. So even though the index scan provides faster access to know which documents will match the query predicate, it actually creates a lot more work overall.
In a collection scan, the query scans all of the documents and checks the field you are querying to see if it matches. Half the time it ends up selecting the document.
In an index scan, the query traverses half of all the index entries and then jumps from them directly to the documents that satisfy the query predicate. That's more operations in your case.
In addition, while doing this, the operations yield the read mutex when they need to wait for the document they have to read to be brought into RAM, or when there is a write waiting to go, and the index scan shows double the number of yields of the collection scan. If you don't have enough RAM for your working set, then adding an index will put more pressure on the existing resources and make things slower, rather than faster.
Try the same query with the price compared to a much larger number, like 500 (or whatever is a lot more selective in your data set). If the query is still slower with an index, then you are likely seeing a lot of page faulting on the system. But if there is enough RAM for the index, then the indexed query will be a lot faster, while the unindexed query will be just as slow.
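For instance, a quick check along those lines (hypothetical threshold):
// a far more selective predicate touches far fewer index entries and
// documents, so the index should win here
db.test1.find({ "prices.list.price" : { $gt : 500 } }).explain()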
First, as a suggestion: you will get faster queries on arrays with $elemMatch.
http://docs.mongodb.org/manual/reference/operator/query/elemMatch/
In your case ($elemMatch has to be applied to the array field, not the scalar price path):
db.test1.find({ "prices.list" : { $elemMatch : { price : { $gte : 190 } } } })
The second thing is:
To index a field that holds an array value, MongoDB adds index items for each item in the array. These multikey indexes allow MongoDB to return documents from queries using the value of an array. MongoDB automatically determines whether to create a multikey index if the indexed field contains an array value; you do not need to explicitly specify the multikey type.
Consider the following illustration of a multikey index:
[Diagram of a multikey index on the addr.zip field; the addr field contains an array of address documents, and the address documents contain the zip field.]
Multikey indexes support all operations supported by other MongoDB indexes; however, applications may use multikey indexes to select documents based on ranges of values for the value of an array. Multikey indexes support arrays that hold both values (e.g. strings, numbers) and nested documents.
from http://docs.mongodb.org/manual/core/index-multikey/
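Applied to the documents in this question (a sketch): keywords is an array of strings, so an index on it automatically becomes multikey, and each element gets its own index entry.
db.test1.ensureIndex({ keywords : 1 })                            // automatically multikey
db.test1.find({ keywords : { $in : [ "small", "another" ] } })    // can use that index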

mongodb index an array's key (not the values)

In MongoDB, I have the following document
{
    "_id" : { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
    "concepts" : {
        "blabla" : 20,
        "blibli" : 100,
        "blublu" : 250,
        ... (many more here)
    }
}
And I would like to index it to be able to query for the "keys" of the "concepts" subdocument (I know it's not really a MongoDB array...):
db.things.find({ concepts : "blabla" })
Is it possible with the above schema? Or shall I refactor my documents to something like:
{
    "_id" : { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
    "concepts" : [ "blabla", "blibli", "blublu", ... (many more here) ]
}
I'll answer your actual question. No you cannot index on the field names given your current schema. $exists uses an index but that is an existence check only.
There are a lot of problems with a schema like the one you're using, and I would suggest a refactor to:
{
    "_id" : { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
    "concepts" : [
        { name : "blabla", value : 20 },
        { name : "blibli", value : 100 },
        { name : "blublu", value : 250 },
        ... (many more here)
    ]
}
then index { "concepts.name" : 1 } and you can actually query on the concept names rather than just checking for existence.
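As a sketch against that refactored schema (the $gt threshold is just an example):
db.things.ensureIndex({ "concepts.name" : 1 })   // index the embedded name field
db.things.find({ "concepts.name" : "blabla" })   // query by concept name
// match name and value on the same array element:
db.things.find({ concepts : { $elemMatch : { name : "blabla", value : { $gt : 50 } } } })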
TL;DR : No you can't.
You can query for field presence with a specific query:
db.your_collection.find({ "concepts.yourfield" : { $exists : true } })
(notice the $exists)
It will return all your documents where yourfield is a field of the concepts subdocument.
edit:
This solution is only about querying. Indexes contain values, not field names.
MongoDB indexes each value of the array so you can query for individual items, as you can find here.
But for nested fields you need to tell MongoDB to index each sub-field:
db.col1.ensureIndex({ 'concepts.blabla' : 1 })
db.col1.ensureIndex({ 'concepts.blublu' : 1 })
db.col1.find({ 'concepts.blabla' : 20 }).explain()
{
    "cursor" : "BtreeCursor concepts.blabla_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "concepts.blabla" : [
            [
                20,
                20
            ]
        ]
    }
}
After creating the index, the cursor type changes from BasicCursor to BtreeCursor.
If you create your document as you stated at the end of your question:
{
    "_id" : { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
    "concepts" : [ "blabla", "blibli", "blublu", ... (many more here) ]
}
just indexing will be enough, as below:
db.col1.ensureIndex({'concepts':1})
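With that schema, each array element is indexed, so the lookup from the question becomes a plain equality match (sketch):
db.col1.find({ concepts : "blabla" })   // uses the multikey index on concepts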