MongoDB query by one index, sort by another - mongodb

The relevant fields of the documents in my collection are the following:
{
  point: {
    type: "Point",
    coordinates: [15.6446464, 45.231323]
  },
  score: 24
}
I have a 2dsphere index on point and a "normal", descending index on score.
I want to run the following query:
db.properties.find({point: {$geoWithin: <some polygon> }}).sort({score: -1}).limit(2000)
Is there any way to make mongo use the index on point for the find part, and then the index on score for sorting?
The collection has about 700k documents, the find part can return tens of thousands of documents, each of which has up to a MB.
The current problem is that, when using the point index, the size of the returned collection is too big for sorting in memory. When using the score index, the query is too slow because of a sequential scan on coordinates.

When executing your current query, MongoDB will only use the index on point. After running the find you will have a subset of the data and therefore Mongo can no longer use the index on score. You could instead create a composite index on point and score with score indexed in descending order. Even though the first values are unique, it helps to speed up the sorting as MongoDB can use the index to sort on score rather than having to scan through the entire document (which can be up to a MB in size).
The composite index follows the general rule of thumb when indexing. In general the order of an index should be:
Fields on which you will query for exact values.
Fields on which you will sort.
Fields on which you will query for a range of values.
However, as per your comment, this composite index is not very fast, which suggests that MongoDB can't load the entire index into memory. How much RAM have you allocated to MongoDB? Is there any chance you can increase it?

Related

Is it possible to pre-sort a text index in MongoDB?

My understanding is that, in MongoDB, regular (not text) indexes are pre-sorted based on the parameters passed to createIndex(). For example, db.collection.createIndex({ name: 1 }) will create an index with documents sorted by name, in ascending order.
Is it possible to do this with text indexes? I have a large MongoDB collection (millions of documents) with a text index. When I perform a text search on the collection, I'd like to sort the results by created date... but the sort operation always runs out of memory. Can I set up the text index so that it's pre-sorted by created date (ie. no need to perform a sort operation after the results are retrieved)?
According to the text index docs it's not possible:
Sort operations cannot obtain sort order from a text index, even from
a compound text index; i.e. sort operations cannot use the ordering in
the text index.
Unfortunately, it looks like sorting on a text index is a real problem in MongoDB. There are multiple related issues on their tracker: SERVER-36087, SERVER-24375, and SERVER-36794.

Fundamental misunderstanding of MongoDB indices

So, I read the following definition of indexes in the MongoDB docs.
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
Indexes are special data structures that store a small portion of the
collection’s data set in an easy to traverse form. The index stores
the value of a specific field or set of fields, ordered by the value
of the field. The ordering of the index entries supports efficient
equality matches and range-based query operations. In addition,
MongoDB can return sorted results by using the ordering in the index.
I have a sample database with a collection called pets. Pets have the following structure.
{
"_id": ObjectId(123abc123abc)
"name": "My pet's name"
}
I created an index on the name field using the following code.
db.pets.createIndex({"name":1})
What I expect is that the documents in the collection, pets, will be indexed in ascending order based on the name field during queries. The result of this index can potentially reduce the overall query time, especially if a query is strategically structured with available indices in mind. Under that assumption, the following query should return all pets sorted by name in ascending order, but it doesn't.
db.pets.find({},{"_id":0})
Instead, it returns the pets in the order that they were inserted. My conclusion is that I lack a fundamental understanding of how indices work. Can someone please help me to understand?
Yes, this is a misunderstanding of how indexes work.
Indexes don't change the output of a query; they change the way the query is processed by the database engine. So db.pets.find({},{"_id":0}) will always return the documents in natural order, irrespective of whether there is an index or not.
Indexes will be used only when you make use of them in your query. Thus,
db.pets.find({name : "My pet's name"},{"_id":0}) and db.pets.find({}, {_id : 0}).sort({name : 1}) will use the {name : 1} index.
You should run explain on your queries to check if indexes are being used or not.
You may want to refer to the documentation on how indexes work.
https://docs.mongodb.com/manual/indexes/
https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/
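
The distinction in this answer can be sketched in plain JavaScript, without a MongoDB server. This is only a toy model under stated assumptions: the `collection` array stands in for documents in natural order, and `nameIndex` is a hypothetical sorted list standing in for the {name: 1} index. None of these names are MongoDB APIs, and the pet names are made up.

```javascript
// Toy model only: `collection`, `nameIndex`, `findAll` and `findSortedByName`
// are hypothetical names, not MongoDB APIs.

// Documents in the order they were inserted (natural order).
const collection = [
  { name: "Rex" },
  { name: "Bella" },
  { name: "Max" },
];

// A stand-in for the {name: 1} index: entries sorted by name, each pointing
// back to a position in the collection. The collection itself is untouched.
const nameIndex = collection
  .map((doc, position) => ({ key: doc.name, position }))
  .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

// Like find({}): no filter or sort mentions `name`, so the index is ignored.
function findAll() {
  return collection.map((doc) => doc.name);
}

// Like find({}).sort({name: 1}): walk the index in order, no in-memory sort.
function findSortedByName() {
  return nameIndex.map((entry) => collection[entry.position].name);
}

console.log(findAll());          // [ 'Rex', 'Bella', 'Max' ]
console.log(findSortedByName()); // [ 'Bella', 'Max', 'Rex' ]
```

Building the index leaves the stored order alone; only a query that asks for the indexed field (as a filter or a sort) reads through the sorted structure.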

MongoDB Indexing: Multiple single-field vs single compound?

I have a collection of geospatial+temporal data with a few additional properties, which I'll be displaying on a map. The collection has a few million documents at this point, and will grow over time.
Each document has the following fields:
Location: [geojson object]
Date: [Date object]
ZoomLevel: [int32]
EntryType: [ObjectID]
I need to be able to rapidly query this collection by any combination of Location (generally a $geoWithin query), Date (generally $gte/$lt), ZoomLevel and EntryType.
What I'm wondering is: Should I make a compound index containing all four fields, or a single index for each field, or some combination thereof? I read in the MongoDB docs the following:
For a compound index that includes a 2dsphere index key along with
keys of other types, only the 2dsphere index field determines whether
the index references a document.
...Which sounds like it means having the 2dsphere index for Location be part of a compound index might be pointless?
Any clarity on this would be much appreciated.
For your use case you will need to use multiple indexes.
If you create one index covering all fields of your documents your queries will only be able to use it when they include the first field in the index.
Since you need to query by any combination of these four fields, I suggest you analyze your data access patterns, see exactly which filters you actually use, and create a specific index for each one or for each group of them.
EDIT: For your question about 2dsphere, it does make sense to make them compound.
This note refers to the 'sparse' option. A sparse index references only documents that contain the indexed fields; for 2dsphere indexes, the only documents left out are the ones that do not contain the GeoJSON object/point array.
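
The rule stated in this answer (a compound index is only usable when the query includes its first field) can be written down as a tiny check. This is a deliberately simplified sketch, not how MongoDB's query planner is implemented: `canUseIndex` is a hypothetical helper, and the field order of the example index is an assumption chosen for illustration.

```javascript
// Simplified model: `canUseIndex` is a hypothetical helper, not a MongoDB API.
// It encodes only the rule from the answer and ignores sorts and other
// planner details.
function canUseIndex(indexFields, queryFields) {
  // The query must filter on the first field of the compound index.
  return queryFields.includes(indexFields[0]);
}

// Assumed field order for a single four-field compound index.
const compoundIndex = ["Location", "Date", "ZoomLevel", "EntryType"];

console.log(canUseIndex(compoundIndex, ["Location", "Date"]));  // true
console.log(canUseIndex(compoundIndex, ["Date", "EntryType"])); // false
```

Running the check against the filter combinations you actually issue is one way to see which of them a single compound index would leave uncovered.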

How is an index helping in this scenario in MongoDB?

Let's say I have a collection with this structure,
student_id,score,score_type
I have an index on score, and I want to query the scores of the student with id=1000 and order the results by score.
I ran the query on my dataset and this is what the query plan is,
1: First the db uses the index on score to sort the documents.
2: Then it does the filter on the document with id:1000
Even though we use an index here, all the docs are examined for the match (since there is no index on student_id). My question is: if all the documents have to be examined, why doesn't the db consider this alternate plan:
1: Do a collection wide search and do the filtering.
2: Then use the index on score to do the sorting.
Here sorting will be done on a smaller dataset, so it should be faster.
What is wrong with the second plan?
In general, MongoDB uses only one index per query.
So if you want to filter on one key and sort on another, you need a compound index:
db.collection.createIndex({student_id:1,score:1})
db.collection.find({student_id: 1000}).sort({score:1})
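
Why an index on {student_id: 1, score: 1} serves both the filter and the sort can be illustrated with a toy model in plain JavaScript. The sample entries and the names `entries`, `compareKeys` and `scoresForStudent` are assumptions made for this sketch; they are not MongoDB APIs, and a real index is a B-tree rather than a sorted array.

```javascript
// Toy model only: `entries`, `compareKeys` and `scoresForStudent` are
// hypothetical names used for illustration, not MongoDB APIs.

// Index entries for {student_id: 1, score: 1}, listed in arbitrary order.
const entries = [
  { student_id: 1001, score: 70 },
  { student_id: 1000, score: 88 },
  { student_id: 999, score: 95 },
  { student_id: 1000, score: 42 },
  { student_id: 1000, score: 67 },
];

// Order by student_id first, then by score, as the index keys specify.
function compareKeys(a, b) {
  return a.student_id - b.student_id || a.score - b.score;
}
entries.sort(compareKeys);

// All entries for one student_id are contiguous and already sorted by score,
// so the filter and the sort are both answered by reading that slice in
// order. (A real B-tree would seek to the slice; this sketch filters the
// sorted array for brevity.)
function scoresForStudent(studentId) {
  return entries
    .filter((entry) => entry.student_id === studentId)
    .map((entry) => entry.score);
}

console.log(scoresForStudent(1000)); // [ 42, 67, 88 ]
```

No separate sort step runs after the filter: the order of the index keys already delivers the scores in ascending order for the selected student.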

how to build index in mongodb in this situation

I have a mongodb database, which has following fields:
{"word":"ipad", "date":20140113, "docid": 324, "score": 98}
which is a reverse index for a lot of docs (about 120 million).
There are two kinds of queries in my system.
One of them is:
db.index.find({"word":"ipad", "date":20140113}).sort({"score":-1})
This query fetches the entries for the word "ipad" on date 20140113 and sorts all the docs by score.
another query is:
db.index.find({"word":"ipad", "date":20140113, "docid":324})
to speed up these two kinds of query, what index should I build?
Should I build two indexes like this?:
db.index.createIndex({"word":1, "date":1, "docid":1}, {"unique":true})
db.index.createIndex({"word":1, "date":1, "score":1})
But I think building these two indexes would use too much disk space.
So do you have some good ideas?
You are sorting by score descending (.sort({"score":-1})), which means that your index should also be descending on the score-field so it can support the sorting:
db.index.createIndex({"word":1, "date":1, "score":-1});
The other index looks good to speed up that query, but you still might want to confirm that by running the query in the mongo shell followed with .explain().
Indexes are always a tradeoff of space and write-performance for read-performance. When you can't afford the space, you can't have the index and have to deal with it. But usually the write-performance is the larger concern, because drive space is usually cheap.
But maybe you could save one of the three indexes you have. "Wait, three indexes?" Yes, keep in mind that every collection must have a unique index on the _id field, which is created implicitly when the collection is created.
But the _id field doesn't have to be an auto-generated ObjectId. It can be anything you want. When you have another index with a uniqueness constraint and you have no use for the _id field, you can move that unique constraint to the _id field to save an index. Your documents would then look like this:
{
  _id: {
    "word": "ipad",
    "date": 20140113,
    "docid": 324
  },
  "score": 98
}
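
One practical detail with an embedded document as _id: MongoDB compares embedded documents including their field order, so the key should always be built the same way. The following plain JavaScript sketch models that idea without a server. The helpers `makeId`, `insert` and `findById` and the `store` map are hypothetical, and the Map keyed by the serialized _id is only a stand-in for the unique _id index.

```javascript
// Toy model only: `makeId`, `store`, `insert` and `findById` are hypothetical
// helpers, not MongoDB APIs.

// Always build the _id with the same field order. MongoDB treats embedded
// documents with the same fields in a different order as different values.
function makeId(word, date, docid) {
  return { word, date, docid };
}

// A Map keyed by the serialized _id stands in for the unique _id index.
const store = new Map();

function insert(doc) {
  const key = JSON.stringify(doc._id);
  if (store.has(key)) {
    throw new Error("duplicate _id: " + key);
  }
  store.set(key, doc);
}

// Like the second query in the question, answered from the _id index alone.
function findById(word, date, docid) {
  return store.get(JSON.stringify(makeId(word, date, docid)));
}

insert({ _id: makeId("ipad", 20140113, 324), score: 98 });
console.log(findById("ipad", 20140113, 324).score); // 98
```

With a real collection, the equivalent lookup passes the whole embedded document as the _id value, with the fields in the same order used at insert time.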