MongoDB index: how to index a single object on a document, nested in an array

I have the following document:
{
    'date': date,
    '_id': ObjectId,
    'Log': [
        {
            'lat': float,
            'lng': float,
            'date': float,
            'speed': float,
            'heading': float,
            'fix': float
        }
    ]
}
For a single document, the Log array can contain several hundred entries.
I need to query the first and last date elements of Log on each document. I know how to query them, but I need to do it fast, so I would like to build an index for that. I don't want to index Log.date since it is too big... how can I index them?

It's hard to advise without knowing how you work with the documents. One solution could be to use a sparse index. You just need to add a new field to every first and last array element; let's call it shouldIndex. Then create a sparse index that includes the shouldIndex and date fields. Here's a short example:
Assume we have this document
{"Log":
[{'lat': 1, 'lng': 2, 'date': new Date(), shouldIndex : true},
{'lat': 3, 'lng': 4, 'date': new Date()},
{'lat': 5, 'lng': 6, 'date': new Date()},
{'lat': 7, 'lng': 8, 'date': new Date(), shouldIndex : true}]}
Please note that the first element and the last one contain the shouldIndex field.
db.testSparseIndex.ensureIndex( { "Log.shouldIndex": 1, "Log.date": 1 }, { sparse: true } )
This index should contain entries only for your first and last elements.
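A query that targets only the flagged elements can then use this index; a minimal sketch (someDate is a placeholder):
db.testSparseIndex.find(
    // Only the first and last array elements carry shouldIndex
    { "Log.shouldIndex": true, "Log.date": { "$gte": someDate } }
)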
Alternatively, you may store the first and last elements' date field in a separate array.
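A minimal sketch of that alternative, using a hypothetical boundaryDates field that mirrors just the two boundary dates, so the index stays small:
// Keep only the first and last Log dates in a small top-level array
db.testSparseIndex.update(
    { _id: id },
    { "$set": { "boundaryDates": [ firstLogDate, lastLogDate ] } }
)
// Index the small mirror array instead of Log.date
db.testSparseIndex.ensureIndex( { "boundaryDates": 1 } )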
For more info on sparse indexes please refer to this article.
Hope it helps!

There was already an answer about indexing that is fundamentally correct. As written, though, it seems a little unclear whether you are asking about indexing at all. It almost seems like what you want to do is get the first and last date from the elements in your array.
With that in mind there are a few approaches:
1. The elements in your array have been naturally inserted in increasing date values
If all writes to this field are made with the $push operator over a period of time, and you never update these items (at least not in a way that changes a date), then your items are already in date order.
That means you can just get the first and last element from the array:
db.collection.find({ _id: id },{ Log: {$slice: 1 }}); // gets the first element
db.collection.find({ _id: id },{ Log: {$slice: -1 }}); // gets the last element
Now of course that is two queries but it's a relatively simple operation and not costly.
2. For some reason your elements are not naturally ordered by date
If this is the case, or indeed if you just can't live with the two-query form, then you can get the first and last values in aggregation, using the $min and $max operators:
db.collection.aggregate([
    // You might want to match first. Just doing one _id here. (commented)
    //{ "$match": { "_id": id } },
    // Unwind the array
    { "$unwind": "$Log" },
    // Group per document, taking the min and max of the date
    { "$group": {
        "_id": "$_id",
        "firstDate": { "$min": "$Log.date" },
        "lastDate": { "$max": "$Log.date" }
    }}
])
Finally, if your use case here is getting the details of the array elements that hold the first and last dates, we can do that as well, somewhat mirroring the initial two-query form, using $first and $last:
db.collection.aggregate([
    // You might want to match first. Just doing one _id here. (commented)
    //{ "$match": { "_id": id } },
    // Unwind the array
    { "$unwind": "$Log" },
    // Sort the results on the date
    { "$sort": { "_id": 1, "Log.date": 1 } },
    // Group using $first and $last
    { "$group": {
        "_id": "$_id",
        "firstLog": { "$first": "$Log" },
        "lastLog": { "$last": "$Log" }
    }}
])
Your mileage may vary, but these approaches may obviate the need for an index if this would indeed be the only usage of that index.

Related

Can you have a collection that's randomly distributed in mongodb?

I have a collection that's essentially just a collection of unique IDs, and I want to store them randomly distributed so I can quickly just findOne instead of sampling, since that's quicker than an aggregation.
I ran the following aggregation to sort it randomly:
db.my_coll.aggregate([{"$sample": {"size": 1200000}}, {"$out": {db: "db", coll: "my_coll"}}], {allowDiskUse: true})
it seems to work?
db.my_coll.find():
{ _id: 581848, schema_version: 1 },
{ _id: 1184557, schema_version: 1 },
{ _id: 213688, schema_version: 1 },
....
Is this allowed? I thought _id has a default index, and that the collection should always be sorted by that index. I'm only ever removing elements from this collection, so it's fine if they don't get inserted randomly, but I don't know if this is just a hack that might at some point behave differently.

MongoDB match on document and subdocuments, what to use as indexes?

I have a lot of documents looking like this:
[{
    "title": "Luxe [daagse] [verzorging] # Egypte! Incl. vluchten, transfers & 4* ho",
    "price": 433,
    "automatic": false,
    "destination": "5d26fc92f72acc7a0b19f2c4",
    "date": "2020-01-19T00:00:00.000+00:00",
    "days": 8,
    "arrival_airport": "5d1f5b407ec7385fa2963623",
    "departure_airport": "5d1f5adb7ec7385fa2963307",
    "board_type": "5d08e1dfff6c4f13f6db1e6c"
},
{
    "title": "Luxe [daagse] [verzorging] # Egypte! Incl. vluchten, transfers & 4* ho",
    "automatic": true,
    "destination": "5d26fc92f72acc7a0b19f2c4",
    "prices": [{
        "price": 433,
        "date_from": "2020-01-19T00:00:00.000+00:00",
        "date_to": "2020-01-28T00:00:00.000+00:00",
        "day_count": 8,
        "arrival_airport": "5d1f5b407ec7385fa2963623",
        "departure_airport": "5d1f5adb7ec7385fa2963307",
        "board_type": "5d08e1dfff6c4f13f6db1e6c"
    },
    {
        "price": 899,
        "date_from": "2020-04-19T00:00:00.000+00:00",
        "date_to": "2020-04-28T00:00:00.000+00:00",
        "day_count": 19,
        "arrival_airport": "5d1f5b407ec7385fa2963623",
        "departure_airport": "5d1f5adb7ec7385fa2963307",
        "board_type": "5d08e1dfff6c4f13f6db1e6c"
    }]
}]
As you can see, automatic deals have multiple prices (there can be a lot, between 1000 and 4000) and do not have the original top-level fields available.
Now I need to search the original document as well as the subdocuments to look for a match.
This is the aggregation I use to search through the documents:
[{
    "$match": {
        "destination": { "$in": ["5d26fc9af72acc7a0b19f313"] }
    }
}, {
    "$match": {
        "$or": [{
            "prices": {
                "$elemMatch": {
                    "price": { "$lte": 1500, "$gte": 400 },
                    "date_to": { "$lte": "2020-04-30T22:00:00.000Z" },
                    "date_from": { "$gte": "2020-03-31T22:00:00.000Z" },
                    "board_type": { "$in": ["5d08e1bfff6c4f13f6db1e68"] }
                }
            }
        }, {
            "price": { "$lte": 1500, "$gte": 400 },
            "date": {
                "$lte": "2020-04-30T22:00:00.000Z",
                "$gte": "2020-03-31T22:00:00.000Z"
            },
            "board_type": { "$in": ["5d08e1bfff6c4f13f6db1e68"] }
        }]
    }
}, {
    "$limit": 20
}]
I would like to speed things up, because it can be quite slow. I was wondering, what is the best index strategy for this aggregate, what fields do I use? Is this the best way of doing it or is there a better way?
From Mongo's $or docs:
When evaluating the clauses in the $or expression, MongoDB either performs a collection scan or, if all the clauses are supported by indexes, MongoDB performs index scans. That is, for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes. Otherwise, MongoDB will perform a collection scan.
So, with that in mind, in order to avoid a collection scan in this pipeline you have to create a compound index containing both the price and prices fields.
Remember that order matters in compound indexes, so the order of the fields should vary depending on how you use them.
It seems to me that the index you want to create looks something like:
{destination: 1, date: 1, board_type: 1, price: 1, prices: 1}
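A minimal shell sketch of creating that index (the collection name deals is hypothetical):
// Compound index covering both the top-level and the array branch of the $or
db.deals.createIndex(
    { destination: 1, date: 1, board_type: 1, price: 1, prices: 1 }
)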
A compound index including the match filter fields is required to make the aggregation run fast. In aggregation queries, having the $match stage early in the pipeline (preferably as the first stage) utilizes indexes, if any are defined on the filter fields. That is the case in the posted query, so defining the indexes is all that is needed for a fast query. But an index on which fields?
The index is going to be a compound index, i.e., an index on multiple fields of the query criteria. The index prefix starts with the destination field. The remaining index fields are to be determined. What are the remaining fields?
Most of these fields are the prices array's sub-document fields - price, date_from, date_to and board_type. There is also the date field on the main document. Which of these fields should be used in the compound index?
Defining indexes on array elements (or on fields of sub-documents in an array) creates lots of index keys. This means lots of storage and, when the index is used, lots of memory (RAM). This is an important consideration. Indexes on array elements are called multikey indexes. For an index to be properly utilized, the collection's documents and the index being used by the query (together called the working set) must fit into RAM.
Another aspect to consider is query selectivity: how many documents get selected by a filter that uses an index field. It is imperative that the filter field select a small subset of the input documents to be effective. See Create Queries that Ensure Selectivity.
It is difficult to determine which other fields should be included (surely some of the fields of prices) based on the above two factors. So, the index is going to be something like this:
{ destination: 1, fld1: 1, fld2: 1, ... }
The fld1, fld2, ... are going to be prices array sub-document fields and/or the date field. I think only one set of date fields can be used in the index. An example index can be one of these:
{ destination: 1, date: 1, "prices.price": 1, "prices.board_type": 1 }
{ destination: 1, "prices.price": 1, "prices.date_from": 1, "prices.date_to": 1, "prices.board_type": 1 }
Note that the index key order, and whether price, date_from, date_to and board_type are necessary, is to be determined based upon the two main factors - the working-set requirement and query selectivity - this is important.
NOTES: A test on a small sample data set with a similar structure showed usage of the compound index with the primary destination field and two fields from prices (one with an equality condition and one with a range condition). The query plan from explain showed an IXSCAN (index scan) on the compound index, and using an index will surely improve the query performance.
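A minimal sketch of defining one of the candidate indexes and checking the plan (the deals collection name and the pipeline variable are hypothetical):
db.deals.createIndex(
    { destination: 1, date: 1, "prices.price": 1, "prices.board_type": 1 }
)
// Look for IXSCAN on the new index in the winning plan
db.deals.explain("executionStats").aggregate(pipeline)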

How can I search by a feature of _id in MongoDB

For example, I want to update the half of the data whose _id is an odd number, such as:
db.col.updateMany({"$where": "this._id is an odd number"})
Instead of an integer, _id is Mongo's ObjectId, which can be regarded as a hexadecimal string. So code like this is not supported:
db.col.updateMany(
    { "$where": "this._id % 2 = 1" },
    { "$set": { "a": 1 } }
)
So, what is the correct format?
And what about taking a modulus of _id?
This operation can also be done using two database calls:
1. Get the list of _id values from the collection.
2. Push only the odd _id values into an array.
3. Update the collection using that array (a complete sketch follows the example below).
Updating the collection:
db.collection.update(
{ _id: { $in: ['id1', 'id2', 'id3'] } }, // Array with odd _id
{ $set: { urkey : urValue } }
)
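A minimal shell sketch of all three steps, assuming the legacy mongo shell where an ObjectId exposes its hex value via the .str property, and taking the last hex digit as the (hypothetical) odd/even criterion:
var oddIds = [];
db.col.find({}, { _id: 1 }).forEach(function (doc) {
    // Parse the last hex digit of the 24-character ObjectId string
    var lastDigit = parseInt(doc._id.str.charAt(23), 16);
    if (lastDigit % 2 === 1) oddIds.push(doc._id);
});
// Apply the update only to documents whose _id passed the odd test
db.col.updateMany({ _id: { $in: oddIds } }, { $set: { a: 1 } });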

Meteor + Mongo (2.6.7) Pushing Document to Array in Sorted Order

I have a document with an array (which should be denormalised, but can't be because the reactive events will fire "add" too many times at client startup).
I need to be able to push a document to that array, and keep it in sorted (or roughly sorted) order. I've tried this query:
{ $push: {
    'events': {
        $each: [{ 'id': new Mongo.ObjectID, 'start': startDate, ... }],
        $sort: { 'start': 1 },
        $slice: -1
    }
} }
But it requires the $slice operator to be present... I don't want to delete all my old data; I just want to be able to insert data into an array and have that array sorted, so that I can query the array later and say "slice greater than or equal to time X".
Is this possible?
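For reference, a minimal sketch of the update shape without the trimming, assuming a MongoDB 2.6+ server where $sort in $push no longer requires an accompanying $slice (Coll and docId are hypothetical; Meteor's mongo layer may still be the constraint):
Coll.update(
    { _id: docId },
    { $push: {
        events: {
            // Insert the new event and keep the whole array sorted by start
            $each: [{ id: new Mongo.ObjectID(), start: startDate }],
            $sort: { start: 1 }
        }
    } }
)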
Edit:
This mongo aggregate query nearly works, except for one level of document in the result array, but aggregation is not reactive (probably because these are expensive computations). Here is the aggregate query, in case anyone can see how to translate it to a find, or why it can't be translated:
Coll.aggregate(
    { $unwind: '$events' },
    { $sort: { 'events.start': 1 } },
    { $match: { 'events.start': { $gte: new Date() } } },
    { $group: { _id: '$_id', 'events': { $push: '$events' } } }
)

Mongodb aggregate query help - grouping with multiple fields and converting to an array

I have the following documents in a MongoDB collection:
[{ quarter: 'Q1', project: 'project1', user: 'u1', cost: '100' },
 { quarter: 'Q2', project: 'project1', user: 'u2', cost: '100' },
 { quarter: 'Q3', project: 'project1', user: 'u1', cost: '200' },
 { quarter: 'Q1', project: 'project2', user: 'u2', cost: '200' },
 { quarter: 'Q2', project: 'project2', user: 'u1', cost: '300' },
 { quarter: 'Q3', project: 'project2', user: 'u2', cost: '300' }]
I need to generate output that sums the cost by quarter and project, in a format that can be rendered in an ExtJS chart:
[{quarter:'Q1','project1':100,'project2':200,'project3':300},
{quarter:'Q2','project1':100,'project2':200,'project3':300},
{quarter:'Q3','project1':100,'project2':200,'project3':300}]
I have tried various permutations and combinations of aggregations but couldn't really come up with a pipeline. Your help or direction is greatly appreciated.
Your cost data appears to be strings, which isn't helping, but assuming you can get around that:
The main component is the $cond operator in the document projection. Assuming your data is larger and you want to group the results:
db.mstats.aggregate([
    // Optionally match first, depending on what you are doing
    // Sum up cost for each quarter and project
    { $group: {
        _id: { quarter: "$quarter", project: "$project" },
        cost: { $sum: "$cost" }
    }},
    // Change the "projection" in $group, using $cond to add a key per "project" value.
    // We use $sum and the false case of 0 to fill in values not in the row.
    // These will then group on the key, adding the real cost and 0 together.
    { $group: {
        _id: "$_id.quarter",
        project1: { $sum: { $cond: [ { $eq: [ "$_id.project", "project1" ] }, "$cost", 0 ] } },
        project2: { $sum: { $cond: [ { $eq: [ "$_id.project", "project2" ] }, "$cost", 0 ] } }
    }},
    // Change the document to have the "quarter" key
    { $project: { _id: 0, quarter: "$_id", project1: 1, project2: 1 } },
    // Optionally sort by quarter
    { $sort: { quarter: 1 } }
])
So, after the initial grouping, the document is reshaped using $cond to determine whether the value of a key goes into the new key name. Essentially this asks: if the current value of project is "project1", then put the cost value into this project1 key, and so on.
As we put a 0 into each new key when there was no match, we need to group the results again in order to merge the documents. Sorting is optional, but probably what you want for a chart.
Naturally, you will have to build this up dynamically, and probably query first for the project keys that you want. But otherwise this should be what you are looking for.
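A minimal sketch of building the second $group stage dynamically in the shell (hypothetical, using distinct() to discover the project names):
// Discover the project keys present in the collection
var projects = db.mstats.distinct("project");
var group2 = { $group: { _id: "$_id.quarter" } };
projects.forEach(function (p) {
    // One conditional $sum per project, as in the static pipeline above
    group2.$group[p] = {
        $sum: { $cond: [ { $eq: [ "$_id.project", p ] }, "$cost", 0 ] }
    };
});
// group2 can then be dropped into the pipeline in place of the hand-written stage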