I'm not able to optimize a distinct query using indexes.
My collection looks like this:
{
"_id" : ObjectId("592ed92296232608d00358bd"),
"measurement" : ObjectId("592ed92196232608d0034c23"),
"loc" : {
"coordinates" : [
2.65939299848366,
50.4380671935187
],
"type" : "Point"
},
"elements" : [
ObjectId("592ed92196232608d0034c24"),
ObjectId("592ed92196232608d0034c26"),
ObjectId("592ed92196232608d0034c28")
]
}
I'm trying to execute a query like
db.mycol.distinct('elements', {
$and:[
{
measurement:{
$in:[
ObjectId("592ed92196232608d0034c23"),
ObjectId("592ed92196232608d0034c24")
]
}
},
{
loc:{
$geoWithin:{
$geometry:{
type:'Polygon',
coordinates:[[
[
2.0214843750000004,
50.25071752130677
],
[
2.0214843750000004,
50.65294336725709
],
[
3.0487060546875004,
50.65294336725709
],
[
3.0487060546875004,
50.25071752130677
],
[
2.0214843750000004,
50.25071752130677
]
]]
}
}
}
}
]
})
And I have this index:
{
measurement: 1,
loc: '2dsphere',
elements: 1
}
The query plan (db.mycol.explain().distinct(...)) shows an IXSCAN, but the query is taking ages. I added the index hoping that it could use a Mongo covered query. The doc states that
all the fields in the query are part of an index,
and all the fields returned in the results are in the same index.
So I guessed I needed an index that also includes elements. But judging by the query execution time, it's not being used.
What is the best way to index a collection for such a query?
Covered queries don't work with arrays.
From the same page referenced in the question:
Restrictions on Indexed Fields
An index cannot cover a query if:
any of the indexed fields in any of the documents in the collection includes an array. If an indexed field is an array, the index becomes a multi-key index and cannot support a covered query.
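To see why, here is a simplified plain-JavaScript sketch of how a multikey index generates its entries (the real key-generation logic is more involved, and the ObjectIds are abbreviated to short strings): each array element becomes its own index entry, so the index never stores the whole elements array that distinct() needs to return.

```javascript
// Sketch of multikey index key generation: one index entry per array
// element. The index stores elements individually, never the whole
// "elements" array, which is why it cannot cover distinct('elements').
function multikeyEntries(doc, field) {
  const value = doc[field];
  return Array.isArray(value) ? value.slice() : [value];
}

// ObjectIds abbreviated to short strings for the sketch.
const doc = { elements: ["c24", "c26", "c28"] };
const entries = multikeyEntries(doc, "elements");
console.log(entries.length); // 3 index entries for a single document
```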
Related
I have a MongoDB collection which contains a location (GeoJSON Point) and other fields to filter on.
{
"Location" : {
"type" : "Point",
"coordinates" : [
-118.42359,
33.974563
]
},
"Filters" : [
{
"k" : 1,
"v" : 5
},
{
"k" : 2,
"v" : 8
}
]
}
My query uses the aggregate function because it performs a sequence of filtering, sorting, grouping, and so on. The first stage, where the filtering happens, is where I'm having trouble performing the geo-near operation.
$geoNear: {
spherical: true,
near: [-118.236391, 33.782092],
distanceField: 'Distance',
query: {
// Filter by other fields.
Filters: {
$all: [
{ $elemMatch: { k: 1 /* Bedrooms */, v: 5 } }
]
}
},
maxDistance: 8046
},
For indexing I tried two approaches:
Approach #1: Create two separate indexes, one with the Location field and one with the fields we subsequently filter on. This approach is slow: with very little data in my collection, it takes 3+ seconds to query within a 5-mile radius.
db.ResidentialListing.ensureIndex( { Location: '2dsphere' }, { name: 'ResidentialListingGeoIndex' } );
db.ResidentialListing.ensureIndex( { "Filters.k": 1, "Filters.v": 1 }, { name: 'ResidentialListingGeoQueryIndex' } );
Approach #2: Create one index with both the Location and other fields we filter on. Creating the index never completed, as it generated a ton of warnings about "Insert of geo object generated a high number of keys".
db.ResidentialListing.ensureIndex( { Location: '2dsphere', "Filters.k": 1, "Filters.v": 1 }, { name: 'ResidentialListingGeoIndex' } );
The geo index itself seems to work fine: if I perform only the $geoNear operation without the query filter, it executes in 60 ms. However, as soon as I filter on the other fields, it gets slow. Any ideas on how to set up the query and indexes so that they perform well would be appreciated.
I am trying to do a find request in MongoDB with the condition
"the document contains a list that contains exactly these elements".
It makes more sense with an example:
{
"categories" : [
[
"dogs",
"cats"
],
[
"dogs",
"octopus"
]
]
}
I want to find an element with a category containing only "dogs" and "octopus".
find({ 'categories' : ['dogs','octopus']}) finds the document
find({ 'categories' : ['octopus','dogs']}) doesn't find it, and that's where my issue is, since I don't care about the order within the list
The output should be all the documents with a category containing only "dogs" and "octopus".
I am not sure it's possible, but if it isn't, the two solutions I see would be to store the elements in alphabetical order (fine, but what if I need the original order afterwards?) or to store/search all the possible orders (very ugly).
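For what it's worth, the first workaround mentioned in the question (storing in sorted order) can be sketched in plain JavaScript; the normalize step would run in application code before each insert, and the query side sorts its search list the same way so exact matching becomes order-insensitive.

```javascript
// Sketch of the "store in sorted order" workaround: normalize each
// inner array at write time, then exact match is order-insensitive.
function normalize(categories) {
  return categories.map(function (c) { return c.slice().sort(); });
}

const stored = normalize([["dogs", "cats"], ["octopus", "dogs"]]);

// Query side: sort the search list the same way before matching.
const query = ["octopus", "dogs"].slice().sort(); // ["dogs", "octopus"]
const found = stored.some(function (c) {
  return JSON.stringify(c) === JSON.stringify(query);
});
console.log(found); // true
```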
You can use an aggregation pipeline:
db.collection.aggregate([
{ "$unwind": "$categories" },
{ "$match": { "categories" : { "$all" : [ "dogs", "octopus" ]}}}
])
This gives you the following document
{
"_id" : ObjectId("54c6685e7cdaa3f3e4dd8def"),
"categories" : [ "dogs", "octopus" ]
}
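One caveat: $all on its own only checks containment, so after $unwind it would also match an inner array such as ["dogs", "octopus", "cats"]. Since the question asks for arrays containing only "dogs" and "octopus", adding "$size": 2 to the same $match (i.e. { "categories": { "$all": ["dogs", "octopus"], "$size": 2 } }) tightens it. A plain-JavaScript sketch of that combined predicate:

```javascript
// Simulates { "$all": ["dogs", "octopus"], "$size": 2 } against each
// unwound inner array: a containment check plus an exact length check.
function matchesExactly(subArray, wanted) {
  return subArray.length === wanted.length &&
         wanted.every(function (w) { return subArray.indexOf(w) !== -1; });
}

const doc = { categories: [["dogs", "cats"], ["dogs", "octopus"]] };

// After $unwind, each inner array is tested on its own:
const hits = doc.categories.filter(function (c) {
  return matchesExactly(c, ["octopus", "dogs"]); // order doesn't matter
});
console.log(hits); // [ [ 'dogs', 'octopus' ] ]
```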
I have a very large collection (more than 800k documents) and I need to implement a query for auto-complete functionality (based on word beginnings only) driven by tags. My documents look like this:
{
"_id": "theid",
"somefield": "some value",
"tags": [
{
"name": "abc tag1",
"vote": 5
},
{
"name": "hij tag2",
"vote": 22
},
{
"name": "abc tag3",
"vote": 5
},
{
"name": "hij tag4",
"vote": 77
}
]
}
If, for example, my query asked for all tags that start with "ab" in documents whose "somefield" is "some value", the result would be "abc tag1", "abc tag3" (names only).
I care about the speed of the queries much more than the speed of the inserts and updates.
I assume that the aggregation framework would be the right way to go here, but what would be the best pipeline and indexes for very fast querying?
The documents are not 'tag' documents; they represent a client object and contain many more data fields that I left out for simplicity. Each client has several tags and another field (I changed its name so it won't be confused with the tags array). I need to get a set, without duplicates, of all tags that a group of clients have.
Your document structure doesn't make sense; I'm assuming tags is an array and not an object. Try queries like this:
db.tags.find({ "somefield" : "some value", "tags.name" : /^abc/ })
with an index on { "somefield" : 1, "tags.name" : 1 }. MongoDB optimizes left-anchored regex queries into range queries, which can be fulfilled efficiently using an index (see the $regex docs).
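As a sketch of that optimization (simplified; MongoDB's real bound computation handles more edge cases), a left-anchored regex such as /^abc/ is rewritten into the half-open index range ["abc", "abd"), which an ordinary B-tree index can walk:

```javascript
// Derive the index range for a string prefix: bump the last character
// of the prefix to get the exclusive upper bound.
function prefixRange(prefix) {
  const upper = prefix.slice(0, -1) +
    String.fromCharCode(prefix.charCodeAt(prefix.length - 1) + 1);
  return { lower: prefix, upper: upper };
}

const r = prefixRange("abc");
console.log(r); // { lower: 'abc', upper: 'abd' }
console.log("abc tag1" >= r.lower && "abc tag1" < r.upper); // true
console.log("abd tag"  >= r.lower && "abd tag"  < r.upper); // false
```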
You can get just the tags from this document structure using an aggregation pipeline:
db.tags.aggregate([
{ "$match" : { "somefield" : "some value", "tags.name" : /^abc/ } },
{ "$unwind" : "$tags" },
{ "$match" : { "tags.name" : /^abc/ } },
{ "$project" : { "_id" : 0, "tag_name" : "$tags.name" } }
])
The index only helps with the first $match, so the same index serves the pipeline as well as the plain query.
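Since the question asks for a set without duplicates, a $group stage (e.g. { "$group": { "_id": "$tags.name" } }) could be appended to de-duplicate the names. A plain-JavaScript simulation of the whole pipeline including that extra step, run against the sample document extended with a hypothetical duplicate tag:

```javascript
// Simulates: first $match, $unwind, second $match, then de-duplication
// (the Set plays the role of the $group stage).
const docs = [
  {
    somefield: "some value",
    tags: [
      { name: "abc tag1", vote: 5 },
      { name: "hij tag2", vote: 22 },
      { name: "abc tag3", vote: 5 },
      { name: "abc tag1", vote: 9 } // hypothetical duplicate
    ]
  }
];

const names = new Set();
docs.forEach(function (d) {
  if (d.somefield !== "some value") return;      // first $match
  d.tags.forEach(function (t) {                  // $unwind
    if (/^abc/.test(t.name)) names.add(t.name);  // second $match + $group
  });
});
console.log([...names]); // [ 'abc tag1', 'abc tag3' ]
```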
Here is an example of a document from the collection I am querying
meteor:PRIMARY> db.research.findOne({_id: 'Z2zzA7dx6unkzKiSn'})
{
"_id" : "Z2zzA7dx6unkzKiSn",
"_userId" : "NtE3ANq2b2PbWSEqu",
"collaborators" : [
{
"userId" : "aTPzFad8DdFXxRrX4"
}
],
"name" : "new one",
"pending" : {
"collaborators" : [ ]
}
}
I want to find all documents within this collection that have either _userId: 'aTPzFad8DdFXxRrX4' or, in the collaborators array, an object with userId: 'aTPzFad8DdFXxRrX4'.
So I want to look though the collection and check if the _userId field is 'aTPzFad8DdFXxRrX4'. If not then check the collaborators array on the document and check if there is an object with userId: 'aTPzFad8DdFXxRrX4'.
Here is the query I am trying to use:
db.research.find({$or: [{_userId: 'aTPzFad8DdFXxRrX4'}, {collaborators: {$in: [{userId: 'aTPzFad8DdFXxRrX4'}]}}] })
It does not find the document and gives me a syntax error. What is my issue here? Thanks
The $in operator is basically a simplified version of $or, but you only have one value here, so you should not even need it. Use dot notation instead:
db.research.find({
'$or': [
{ '_userId': 'aTPzFad8DdFXxRrX4'},
{ 'collaborators.userId': 'aTPzFad8DdFXxRrX4'}
]
})
If you need more than one value then use $in:
db.research.find({
'$or': [
{ '_userId': 'aTPzFad8DdFXxRrX4'},
{ 'collaborators.userId': {
'$in': ['aTPzFad8DdFXxRrX4','aTPzFad8DdFXxRrX5']
}}
]
})
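A short plain-JavaScript illustration of why dot notation is the safer choice: whole-subdocument equality (which is what the original {$in: [{userId: ...}]} attempt expresses) compares every field of the array element, so it silently stops matching as soon as an element carries extra fields. The role field below is made up for illustration.

```javascript
// One collaborator that has grown an extra (hypothetical) field.
const collaborators = [{ userId: "aTPzFad8DdFXxRrX4", role: "editor" }];

// Whole-document equality, like { collaborators: { userId: "..." } }:
// compares the entire element, so the extra field breaks the match.
const exact = collaborators.some(function (c) {
  return JSON.stringify(c) === JSON.stringify({ userId: "aTPzFad8DdFXxRrX4" });
});

// Dot notation, like { "collaborators.userId": "..." }:
// tests only the one field, so extra fields don't matter.
const dotted = collaborators.some(function (c) {
  return c.userId === "aTPzFad8DdFXxRrX4";
});

console.log(exact, dotted); // false true
```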
I have a collections of objects with structure like this:
{
"_id" : ObjectId("5233a700bc7b9f31580a9de0"),
"id" : "3df7ce4cc2586c37607a8266093617da",
"published_at" : ISODate("2013-09-13T23:59:59Z"),
...
"topic_id" : [
284,
9741
],
...
"date" : NumberLong("1379116800055")
}
I'm trying to use the following query:
db.collection.find({"topic_id": { $in: [ 9723, 9953, 9558, 9982, 9833, 301, ... 9356, 9990, 9497, 9724] }, "date": { $gte: 1378944001000, $lte: 1378954799000 }, "_id": { $gt: ObjectId('523104ddbc7b9f023700193c') }}).sort({ "_id": 1 }).limit(1000)
The above query uses the { topic_id: 1, date: 1 } index, but then it does not keep the returned results in order.
Forcing it to use hint({_id:1}) makes the results ordered, but the nscanned is 1 million documents even though limit(1000) is specified.
What am I missing?