Find document with closest integer value

Let's assume I have a collection of documents with a ratio attribute that is a floating-point number.
{'ratio':1.437}
How do I write a query to find the single document with the value closest to a given integer, without loading them all into memory via a driver and finding the one with the smallest abs(x - ratio)?

Interesting problem. I don't know if you can do it in a single query, but you can do it in two:
var x = 1; // given integer
closestBelow = db.test.find({ratio: {$lte: x}}).sort({ratio: -1}).limit(1);
closestAbove = db.test.find({ratio: {$gt: x}}).sort({ratio: 1}).limit(1);
Then you just check which of the two docs has the ratio closest to the target integer.
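For example, a minimal sketch of picking the winner in the shell (an illustration only; it assumes each cursor yields at most one document):
var below = closestBelow.toArray()[0];
var above = closestAbove.toArray()[0];
var closest;
if (!below) {
    closest = above;  // nothing at or below x
} else if (!above) {
    closest = below;  // nothing above x
} else {
    // prefer the smaller absolute difference; ties go to the lower value
    closest = (x - below.ratio <= above.ratio - x) ? below : above;
}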
MongoDB 3.2 Update
The 3.2 release adds support for the $abs absolute value aggregation operator, which allows this to be done in a single aggregate query:
var x = 1;
db.test.aggregate([
// Project a diff field that's the absolute difference along with the original doc.
{$project: {diff: {$abs: {$subtract: [x, '$ratio']}}, doc: '$$ROOT'}},
// Order the docs by diff
{$sort: {diff: 1}},
// Take the first one
{$limit: 1}
])
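If you want the original document back rather than the {diff, doc} wrapper, one option (a sketch assuming MongoDB 3.4+, which is newer than this answer) is to append a $replaceRoot stage:
var x = 1;
db.test.aggregate([
{$project: {diff: {$abs: {$subtract: [x, '$ratio']}}, doc: '$$ROOT'}},
{$sort: {diff: 1}},
{$limit: 1},
// Promote the saved original document back to the top level (3.4+)
{$replaceRoot: {newRoot: '$doc'}}
])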

I have another idea, but it's very tricky and requires changing your data structure.
You can use a geospatial index, which is supported by MongoDB.
First, change your data to this structure, keeping the second value at 0:
{'ratio':[1.437, 0]}
Then you can use the $near operator to find the closest ratio value. Because the operator returns results sorted by distance from the point you give, you have to use limit to get only the closest value.
db.places.find( { ratio : { $near : [50,0] } } ).limit(1)
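Note that $near requires a geospatial index on the field first; a minimal setup sketch (assuming the legacy 2d index type, which matches the [x, 0] coordinate pairs above):
db.places.createIndex({ratio: "2d"})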
If you don't want to do this, I think you can just use @JohnnyHK's answer :)

Related

MongoDB fast count of subdocuments - maybe through index

I'm using MongoDB 4.0 on a MongoDB Atlas cluster (3 replicas - 1 shard).
Assume I have a collection that contains multiple documents.
Each of these documents holds an array of subdocuments that represent cities in a certain year, with additional information. An example document would look like this (I removed unnecessary information to simplify the example):
{_id:123,
cities:[
{name:"vienna",
year:1985
},
{name:"berlin",
year:2001
},
{name:"vienna",
year:1985
}
]}
I have a compound index on cities.name and cities.year. What is the fastest way to count the occurrences of name and year combinations?
I already tried the following aggregation:
[{$unwind: {
path: '$cities'
}}, {$group: {
_id: {
name: '$cities.name', // note the $ prefix; without it this groups on the literal string "cities.name"
year: '$cities.year'
},
count: {
$sum: 1
}
}}, {$project: {
count: 1,
name: '$_id.name',
year: '$_id.year',
_id: 0
}}]
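One practical note (an addition, not from the original post): on a very big dataset the $group stage can exceed the aggregation pipeline's memory limit, and the allowDiskUse option lets it spill to disk:
db.test.aggregate(pipeline, {allowDiskUse: true}) // pipeline = the stages above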
Another approach I tried was a map-reduce in the following form - the map-reduce performed a bit better, needing ~30% less time.
map function:
function m() {
// emit one ({name, year}, 1) pair per city subdocument
for (var i in this.cities) {
emit({
name: this.cities[i].name,
year: this.cities[i].year
},
1);
}
}
reduce function (I also tried replacing sum with length, but surprisingly sum is faster):
function r(id, counts) {
return Array.sum(counts);
}
function call in the mongo shell:
db.test.mapReduce(m,r,{out:"mr_test"})
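The output collection then holds one document per combination, shaped as {_id: {name, year}, value: count}; for example, to peek at the most frequent combinations (a sketch):
db.mr_test.find().sort({value: -1}).limit(5)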
Now I was asking myself: is it possible to access the index directly? As far as I know it is a B+ tree that holds the pointers to the relevant documents on disk, so from a technical point of view I think it would be possible to iterate through all leaves of the index tree and just count the pointers. Does anybody know if this is possible?
Does anybody know another high-performance way to solve this? (It is not possible to change the design because of other dependencies in the software, and we are running this on a very big dataset.) Does anybody have experience solving such a task via shards?
The index will not be very helpful in this situation.
MongoDB indexes were designed for identifying documents that match a given criteria.
If you create an index on {cities.name: 1, cities.year: 1}, this document:
{_id:123,
cities:[
{name:"vienna",
year:1985
},
{name:"berlin",
year:2001
},
{name:"vienna",
year:1985
}
]}
will have only 2 entries in the b-tree that refer to it:
vienna|1985
berlin|2001
Even if it were possible to count the incidence of a specific key in the index, that count would not necessarily correspond to the number of occurrences in the documents: the duplicated {vienna, 1985} subdocument above produces just one index entry, because a multikey index stores each distinct value only once per document.
MongoDB does not provide a method to examine the raw entries in an index, and it explicitly refuses to use an index on a field containing an array for counting.
The MongoDB count command and helper functions all count documents, not elements inside of them. As you noticed, you can unwind the array and count the items in an aggregation pipeline, but at that point you've already loaded all of the documents into memory, so it's too late to make use of an index.
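A small sketch of that distinction, using the example document above (collection name test assumed): a document-level count sees the duplicated vienna entry once, while unwinding counts every array element:
db.test.count({"cities.name": "vienna"}) // 1: counts matching documents
db.test.aggregate([
{$unwind: "$cities"},
{$match: {"cities.name": "vienna"}},
{$count: "n"}
]) // n = 2: counts matching array elements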

Finding an object in MongoDb that has a value within a range

MongoDb 3.4.9
I have objects that look like this:
{startIpNum: 16779264, endIpNum: 16781311, locId: 47667}
{startIpNum: 16781312, endIpNum: 16785407, locId: 879228}
etc.
How can I find just the object whose range (between startIpNum and endIpNum) contains 16779300?
db.collection.find({ startIpNum: { $lte: 16779300 }, endIpNum: { $gte: 16779300 } })
It includes both the lower and upper limits (the start must be at or below the target, the end at or above it).
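For a large collection, a compound index on the two bound fields keeps this lookup from scanning every document (a sketch; the field names are from the question):
db.collection.createIndex({startIpNum: 1, endIpNum: 1})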

Meteor/Mongo query: find every nth document

I use timestamps in my collection, so every document has a timestamp. The user wants to get documents from "ts1" (timestamp 1) to "ts2" (timestamp 2); however, there are too many documents in that interval, so I want to return only every nth. For example, if there are 100000 documents and I need to display 1000, then 100000/1000 = 100: every 100th document.
Is this possible, and how could I achieve it?
PS. I need to run this query inside a Meteor publish method.
Here's what I've got so far:
Meteor.publish('documents-chunk', function (from, to) {
// get the document count in the interval and compute nth
var count = Documents.find({time: {$gte: from, $lte: to}}).count();
if (count > 2000) {
var nth = Math.round(count / 1000);
return Documents.find(/*query every nth*/);
}
return Documents.find({time: {$gte: from, $lte: to}});
});
SOLUTION:
I ~solved this problem using the answer from Matt K.
This is what I've done: first I modified my collection and added an additional "id" field:
**1.**
Document.find({}, {sort: {time: 1}}).forEach(function (c, i) {
Document.update(c, {$set: {id: i + 1}});
console.log(i + 1);
});
This collection had a little less than 1.5M records, so it took some time. (Also note: I had to add an index {time: 1} to this collection, otherwise it would crash the database.)
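(A possible speed-up, not from the original post: batching those 1.5M updates with a shell bulk operation; the shell collection name db.documents is an assumption.)
var bulk = db.documents.initializeUnorderedBulkOp();
var i = 0;
db.documents.find({}, {_id: 1}).sort({time: 1}).forEach(function (c) {
// queue one update per document instead of a round-trip each
bulk.find({_id: c._id}).updateOne({$set: {id: ++i}});
});
bulk.execute();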
**2.**
Meteor.publish('documents-chunk', function (from, to) {
var nth = Math.max(1, Math.round(Documents.find({time: {$gte: from, $lte: to}}, {sort: {time: 1}}).count() / 1000)); // guard: $mod by 0 throws
return Documents.find({time: {$gte: from, $lte: to}, id: {$mod: [nth, 0]}}, {sort: {time: 1}}); // $mod goes on the added id field, not inside time
});
This worked for me, and now I get the result I needed.
I've read at http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/ that this kind of approach is not recommended, but at this point in time I could not find any other solution to this problem. I did find that it is requested at https://jira.mongodb.org/browse/SERVER-2397, so maybe in the future there will be a cleaner solution, but for now it works.
You can't, at least not to my knowledge. You've got four options:
1. Publish and subscribe to all 100,000, then display every 1000th. Logically speaking, your query is based on the number of results returned from a query; that is a 2-step process no matter how you look at it.
2. If you wanted to be cute, you could have _id (or another field) be an auto-incrementing number. Then set var qCount = cursor.count() and query for _id % nth === 0, where nth = qCount / 1000 (see the sketch after this list).
3. Add a sample field to every 1000th record when it's created, then query for {sample: {$exists: true}}.
4. Rethink the business logic. What's the value-add of every 1000th record? If it's to "eyeball the data" you should probably be using an aggregate on the data anyway to get rid of outliers. (This is the right choice, but convincing the client is another story...)
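A minimal sketch of the second option (assuming an auto-incrementing numeric id field already exists, as in the asker's solution above):
var count = Documents.find({time: {$gte: from, $lte: to}}).count();
var nth = Math.max(1, Math.round(count / 1000)); // guard: $mod by 0 throws
Documents.find({time: {$gte: from, $lte: to}, id: {$mod: [nth, 0]}});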
If you believe that MongoDB _id values are truly randomly assigned, then you could simply order by _id and pick the first N of a set. This would give you N random values from the interval.
Meteor.publish('documents-chunk', function (from, to) {
return Documents.find({time: {$gte: from, $lte: to}}, {sort: {_id: 1}, limit: 1000});
});
I'd recommend running some statistics on the randomness of what you get back.

How to $add together a subset of elements of an array in mongodb aggregation?

Here is the problem I want to resolve:
each document contains an array of 30 integers
the documents are grouped under a certain condition (not relevant here)
while grouping them, I want to:
add together the 29 last elements of the array (skipping the first one) of each document
sum the previous result among the same group, and return it
The data structure is very difficult to change and I cannot afford a migration, plus I still need the 30 values for another purpose. Here is what I tried, unsuccessfully:
db.collection.aggregate([
{$match: {... some matching query ...}},
{$project: {total_29_last_values: {$add: ["$my_array.1", "$my_array.2", ..., "$my_array.29"]}}},
{$group: {
... some grouping here ...
my_result: {$sum: "$total_29_last_values"}
}}
])
Theoretically (IMHO) this should work, given the definition of $add in the MongoDB documentation, but for some reason it fails:
exception: $add only supports numeric or date types, not Array
Maybe there is no support for adding together elements of an array, but this seems strange...
Thanks for your help!
From the docs,
The $add expression has the following syntax:
{ $add: [ <expression1>, <expression2>, ... ] }
The arguments can be any valid expression as long as they resolve to either all numbers or to numbers and a date.
It clearly states that the $add operator accepts only numbers or dates.
$my_array.1 resolves to an empty array, e.g. []. (You can always look for a match at a particular index, such as {$match:{"a.0":1}}, but you cannot derive the value from a particular index of an array; for that you need the $ or $slice projection operators. This is currently an unresolved issue: JIRA1, JIRA2.)
So the $add expression becomes $add:[[],[],[],..].
$add does not take an array as input, and hence you get the error stating that it does not support Array as input.
What you need to do is:
1. Match the documents.
2. Unwind the my_array field.
3. Group based on the _id of each document to get the sum of all the elements in the array, keeping the first element separately.
4. Project the summed field for each grouped document, subtracting the first element so only the last 29 count.
5. Again group the documents based on the condition to get the final sum.
Stage operators:
db.collection.aggregate([
{$match:{}}, // condition
{$unwind:"$my_array"},
{$group:{"_id":"$_id",
"first_element":{$first:"$my_array"},
"sum_of_all":{$sum:"$my_array"}}},
{$project:{"_id":"$_id",
"sum_of_29":{$subtract:["$sum_of_all","$first_element"]}}},
{$group:{"_id":" ", // whatever condition
"my_result":{$sum:"$sum_of_29"}}}
])
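As a side note beyond this answer's MongoDB version: from 3.2 onward, the $slice and $sum aggregation operators can compute the same result without $unwind. A hedged sketch:
db.collection.aggregate([
{$match: {}}, // condition
// $slice takes up to 29 elements starting at index 1 (skipping the first);
// $sum then adds up that array's elements (both require 3.2+)
{$project: {sum_of_29: {$sum: {$slice: ["$my_array", 1, 29]}}}},
{$group: {"_id": " ", // whatever condition
"my_result": {$sum: "$sum_of_29"}}}
])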