I am currently working on a MongoDB database for a class project. However, I'm very new to it.
I must find all documents where the latitude is greater than a certain value: 39. The longitude can be anything.
Below is an example of one document, showing the overall structure of the database.
Looking at MongoDB's documentation, I have two hints:
Trying a nested document query ("end station location.coordinates.0": {$gt: 39})
However, it is not working.
Trying some kind of geometric intersection. As I'm very new to MongoDB, I don't know if it would be the easiest way to find what I'm looking for.
Could someone help me improve?
Regards,
I think your problem is that you are looking for values greater than 39 in the first position of the array. GeoJSON stores coordinates as [longitude, latitude], so the first position is the longitude, and in your example those values are negative numbers, so none of them is greater than 39.
If you run the same query using $lt, you will get results.
To find latitudes with $gt: 39, you have to query the second position in the array (index 1):
db.collection.find({
"end station location.coordinates.1": {
"$gt": 39
}
})
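A plain-JavaScript sketch makes the difference between the two positions visible. It mimics the filter on a few in-memory documents; the documents and coordinates are hypothetical, and the only assumption is that the field holds a GeoJSON Point, whose coordinates are ordered [longitude, latitude].

```javascript
// Hypothetical sample documents shaped like the question's data.
// GeoJSON points store coordinates as [longitude, latitude].
const docs = [
  { _id: 1, "end station location": { type: "Point", coordinates: [-73.99, 40.73] } },
  { _id: 2, "end station location": { type: "Point", coordinates: [-74.05, 38.9] } },
  { _id: 3, "end station location": { type: "Point", coordinates: [-73.95, 39.5] } },
];

// Mimics { "<field>.coordinates.<index>": { $gt: value } } for documents
// that all have the field (no missing-field handling in this sketch).
function coordinateGreaterThan(documents, field, index, value) {
  return documents.filter((d) => d[field].coordinates[index] > value);
}

// Index 0 is the longitude: every sample is negative, so nothing matches.
const byLongitude = coordinateGreaterThan(docs, "end station location", 0, 39);
// Index 1 is the latitude: documents 1 and 3 match.
const byLatitude = coordinateGreaterThan(docs, "end station location", 1, 39);

console.log(byLongitude.length); // 0
console.log(byLatitude.map((d) => d._id)); // [ 1, 3 ]
```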
If you want documents where at least one of the two fields, end station location or start station location, has a latitude greater than 39, you need the $or operator, like this:
db.collection.find({
"$or": [
{
"end station location.coordinates.1": {
"$gt": 39
}
},
{
"start station location.coordinates.1": {
"$gt": 39
}
}
]
})
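$or matches a document when at least one of its branches matches. Here is a plain-JavaScript sketch of that logic on hypothetical trip documents (the field names come from the question; the coordinates are made up):

```javascript
// Hypothetical trips with a start and an end station, GeoJSON order [lng, lat].
const trips = [
  { _id: "t1", "start station location": { coordinates: [-73.9, 40.7] }, "end station location": { coordinates: [-74.0, 38.5] } },
  { _id: "t2", "start station location": { coordinates: [-74.1, 38.0] }, "end station location": { coordinates: [-73.8, 39.2] } },
  { _id: "t3", "start station location": { coordinates: [-77.0, 38.9] }, "end station location": { coordinates: [-77.1, 38.8] } },
];

const fields = ["end station location", "start station location"];

// Mimics the $or query: keep a trip if either station's latitude
// (coordinates[1]) is greater than the given value.
function latitudeAbove(documents, value) {
  return documents.filter((d) => fields.some((f) => d[f].coordinates[1] > value));
}

console.log(latitudeAbove(trips, 39).map((t) => t._id)); // [ 't1', 't2' ]
```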
And remember: the latitude is the second position in the array, so you have to use coordinates.1.
Related
I'm using MongoDB 4.0 on a MongoDB Atlas cluster (3 replicas, 1 shard).
Assume I have a collection that contains multiple documents.
Each of these documents holds an array of subdocuments that represent cities in a certain year, with additional information. An example document looks like this (I removed unnecessary information to simplify the example):
{_id:123,
cities:[
{name:"vienna",
year:1985
},
{name:"berlin",
year:2001
},
{name:"vienna",
year:1985
}
]}
I have a compound index on name and year. What is the fastest way to count the occurrences of each name and year combination?
I already tried the following aggregation:
[{$unwind: {
path: '$cities'
}}, {$group: {
_id: {
name: '$cities.name',
year: '$cities.year'
},
count: {
$sum: 1
}
}}, {$project: {
count: 1,
name: '$_id.name',
year: '$_id.year',
_id: 0
}}]
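For reference, this is what the $unwind/$group/$project pipeline computes (with the group key using '$cities.name'), written as a plain-JavaScript sketch over the example document. It only illustrates the semantics and says nothing about MongoDB's performance; the function name is hypothetical.

```javascript
// The example document from the question, as a one-document collection.
const collection = [
  {
    _id: 123,
    cities: [
      { name: "vienna", year: 1985 },
      { name: "berlin", year: 2001 },
      { name: "vienna", year: 1985 },
    ],
  },
];

function countNameYear(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // $unwind: one entry per array element
    for (const city of doc.cities) {
      // $group: the key is the (name, year) combination, count += 1
      const key = JSON.stringify([city.name, city.year]);
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  // $project: turn each group key back into name/year fields
  return [...counts].map(([key, count]) => {
    const [name, year] = JSON.parse(key);
    return { count, name, year };
  });
}

console.log(countNameYear(collection));
```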
Another approach I tried was map-reduce in the following form; it performed a bit better, needing about 30% less time.
map function:
function m() {
for (var i in this.cities) {
emit({
name: this.cities[i].name,
year: this.cities[i].year
},
1);
}
}
Reduce function (I also tried replacing sum with length, but surprisingly sum is faster):
function r(id, counts) {
return Array.sum(counts);
}
Function call in the mongo shell:
db.test.mapReduce(m,r,{out:"mr_test"})
Now I was asking myself: is it possible to access the index? As far as I know, it is a B+ tree that holds the pointers to the relevant documents on disk, so from a technical point of view I think it would be possible to iterate through all the leaves of the index tree and just count the pointers. Does anybody know if this is possible?
Does anybody know another way to solve this with high performance? (It is not possible to change the design because of other dependencies of the software, and we are running this on a very big dataset.) Does anybody have experience solving such a task via shards?
The index will not be very helpful in this situation.
MongoDB indexes were designed for identifying documents that match a given criteria.
If you create an index on {cities.name:1, cities.year:1}, this document:
{_id:123,
cities:[
{name:"vienna",
year:1985
},
{name:"berlin",
year:2001
},
{name:"vienna",
year:1985
}
]}
will have only 2 entries in the B-tree that refer to it:
vienna|1985
berlin|2001
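The mismatch can be shown with a small plain-JavaScript model. It assumes, as described here, that a multikey index stores each distinct key once per document; the helper names are hypothetical and this is not an interface MongoDB exposes.

```javascript
// Models the keys a multikey index on {cities.name: 1, cities.year: 1}
// holds for one document. Assumption: duplicate keys within a single
// document are stored only once.
function multikeyEntries(doc) {
  return [...new Set(doc.cities.map((c) => `${c.name}|${c.year}`))];
}

// The real number of occurrences of each combination inside the array.
function occurrences(doc) {
  const counts = {};
  for (const c of doc.cities) {
    const key = `${c.name}|${c.year}`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

const doc = {
  _id: 123,
  cities: [
    { name: "vienna", year: 1985 },
    { name: "berlin", year: 2001 },
    { name: "vienna", year: 1985 },
  ],
};

console.log(multikeyEntries(doc)); // [ 'vienna|1985', 'berlin|2001' ]
console.log(occurrences(doc)); // { 'vienna|1985': 2, 'berlin|2001': 1 }
```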
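Even if it were possible to count the incidence of a specific key in the index, this would not necessarily correspond to the number of occurrences: vienna|1985 appears twice in the array, but has only one index entry for this document.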
MongoDB does not provide a method to examine the raw entries in an index, and it explicitly refuses to use an index on a field containing an array for counting.
The MongoDB count command and helper functions all count documents, not elements inside of them. As you noticed, you can unwind the array and count the items in an aggregation pipeline, but at that point you've already loaded all of the documents into memory, so it's too late to make use of an index.
Let's assume I have a collection with documents with a ratio attribute that is a floating point number.
{'ratio':1.437}
How do I write a query that finds the single document whose ratio is closest to a given integer, without loading all the documents into memory with a driver and picking the one with the smallest abs(x - ratio)?
Interesting problem. I don't know if you can do it in a single query, but you can do it in two:
var x = 1; // given integer
var closestBelow = db.test.find({ratio: {$lte: x}}).sort({ratio: -1}).limit(1);
var closestAbove = db.test.find({ratio: {$gt: x}}).sort({ratio: 1}).limit(1);
Then you just check which of the two docs has the ratio closest to the target integer.
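The comparison step can be sketched in plain JavaScript, with a hypothetical array standing in for the collection: take the largest ratio <= x and the smallest ratio > x (what the two sorted, limited queries return) and keep whichever is closer. Ties go to the value below x here, which is an arbitrary choice.

```javascript
// Hypothetical ratios standing in for db.test.
const ratios = [0.2, 0.95, 1.437, 2.6];

function closestTo(values, x) {
  // closestBelow: largest value <= x (sort descending, limit 1)
  const below = values.filter((r) => r <= x).sort((a, b) => b - a)[0];
  // closestAbove: smallest value > x (sort ascending, limit 1)
  const above = values.filter((r) => r > x).sort((a, b) => a - b)[0];
  if (below === undefined) return above;
  if (above === undefined) return below;
  return x - below <= above - x ? below : above;
}

console.log(closestTo(ratios, 1)); // 0.95
console.log(closestTo(ratios, 2)); // 1.437
```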
MongoDB 3.2 Update
The 3.2 release adds support for the $abs absolute value aggregation operator which now allows this to be done in a single aggregate query:
var x = 1;
db.test.aggregate([
// Project a diff field that's the absolute difference along with the original doc.
{$project: {diff: {$abs: {$subtract: [x, '$ratio']}}, doc: '$$ROOT'}},
// Order the docs by diff
{$sort: {diff: 1}},
// Take the first one
{$limit: 1}
])
I have another idea, but it is very tricky and requires changing your data structure.
You can use a geospatial index, which MongoDB supports.
First, change your data to this structure, keeping the second value at 0:
{'ratio':[1.437, 0]}
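Then you can use the $near operator to find the closest ratio value. Because the operator returns a list sorted by distance from the integer you give, use limit to get only the closest value. Note that $near on legacy coordinate pairs like this needs a 2d index (e.g. db.places.createIndex({ratio: "2d"})), which by default only accepts values within the bounds -180 to 180, adjustable with the index's min and max options.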
db.places.find( { ratio : { $near : [50,0] } } ).limit(1)
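Why this works: with the second coordinate fixed at 0, the planar distance between [ratio, 0] and [x, 0] is just |x - ratio|, so ordering by distance is the same as ordering by closeness of the ratio. A plain-JavaScript sketch of that ordering, with hypothetical values and assuming flat (2d-index style) distance:

```javascript
// Each ratio stored as a legacy pair [ratio, 0] (hypothetical values).
const points = [[0.2, 0], [0.95, 0], [1.437, 0], [2.6, 0]];

// Planar distance between two legacy coordinate pairs.
function planarDistance([x1, y1], [x2, y2]) {
  return Math.hypot(x1 - x2, y1 - y2);
}

// Mimics find({ ratio: { $near: target } }).limit(1): nearest point first.
function nearestPair(pts, target) {
  return [...pts].sort((a, b) => planarDistance(a, target) - planarDistance(b, target))[0];
}

console.log(nearestPair(points, [2, 0])); // [ 1.437, 0 ]
```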
If you don't want to do this, I think you can just use @JohnnyHK's answer :)
I want to ask about findAndModify in MongoDB.
As far as I know, the operation is "isolated by document".
This means that if I run two findAndModify calls like this:
{a:1},{$set:{status:"processing", engine:1}}
{a:1},{$set:{status:"processing", engine:2}}
and this query can potentially affect 2,000 documents, then, because there are two queries (two engines), some documents may end up with "engine:1" and others with "engine:2".
I don't think findAndModify will isolate the "first query".
In order to isolate the first query, I need to use $isolated.
Is everything I have written correct?
UPDATE - scenario
The idea is to write a proximity engine.
The User collection has 1,000, 2,000 or 3,000 users, or millions.
1 - Order by nearest to the point "lng,lat"
2 - In Node.js I do some computation that I CAN'T do in MongoDB
3 - Then I group the users into "UserGroup" and write a bulk update
When I have 2,000-3,000 users, this process (steps 1 to 3) takes time.
So I want to have multiple threads in parallel.
Parallel threads mean parallel queries.
This can be a problem, since Query 3 can take some users of Query 1.
If this happens, then at step 2 I don't have the nearest users overall, only the nearest "for this query", because another query may have taken the rest of the users. This could result in some users in New York being grouped with users in Los Angeles.
UPDATE 2 - scenario
I have a collection like this:
{location:[lng,lat], name:"1",gender:"m", status:'undone'}
{location:[lng,lat], name:"2",gender:"m", status:'undone'}
{location:[lng,lat], name:"3",gender:"f", status:'undone'}
{location:[lng,lat], name:"4",gender:"f", status:'done'}
What I need to do is create 'Groups' of users by grouping the nearest ones. Each group has 1 male + 1 female. In the example above, I'm expecting only 1 group (user-1 + user-3), since they are male + female and very near each other (user-2 is also male but is far away from user-3, and user-4 is female but has status 'done', so she is already processed).
Now the group is created (only 1 group), so those 2 users are marked as 'done' and user-2 stays 'undone' for future operations.
I want to be able to handle 1,000, 2,000 or 3,000 users very fast.
UPDATE 3: from the community
Okay now. Can I please try to summarise your case. Given your data, you want to "pair" male and female entries together based on their proximity to each other. Presumably you don't want to do every possible match but just set up a list of general "recommendations", and let's say 10 for each user by the nearest location. Now I'd have to be stupid to not see the full direction of where this is going, but does this sum up the basic initial problem statement. Process each user, find their "pairs", mark them as "done" once paired and exclude them from other pairings by combination where complete?
This is a non-trivial problem and cannot be solved easily.
First of all, an iterative approach (which admittedly was my first one) may lead to wrong results.
Given the following documents:
{
_id: "A",
gender: "m",
location: { longitude: 0, latitude: 1 }
}
{
_id: "B",
gender: "f",
location: { longitude: 0, latitude: 3 }
}
{
_id: "C",
gender: "m",
location: { longitude: 0, latitude: 4 }
}
{
_id: "D",
gender: "f",
location: { longitude: 0, latitude: 9 }
}
With an iterative approach, we would start with "A" and calculate the closest female, which of course would be "B", at a distance of 2. However, the closest distance between a male and a female is actually 1 (from "B" to "C"). But even if we found this, that would leave the other match, "A" and "D", at a distance of 8, whereas with our previous solution "A" had a distance of only 2 to "B".
So we need to decide which way to go:
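A small sketch makes the trade-off in this example concrete. It reduces the four documents to their latitudes (valid here because every longitude is 0) and compares the iterative pairing with a hypothetical closest-pair-first strategy; neither function is a MongoDB implementation.

```javascript
// The four example documents, reduced to latitude (all longitudes are 0).
const people = [
  { _id: "A", gender: "m", lat: 1 },
  { _id: "B", gender: "f", lat: 3 },
  { _id: "C", gender: "m", lat: 4 },
  { _id: "D", gender: "f", lat: 9 },
];
const dist = (p, q) => Math.abs(p.lat - q.lat);

// Iterative approach: walk the list in order and pair each unpaired
// person with the nearest unpaired person of the other gender.
function iterative(list) {
  const done = new Set();
  const pairs = [];
  for (const p of list) {
    if (done.has(p._id)) continue;
    const candidates = list.filter((q) => q.gender !== p.gender && !done.has(q._id));
    if (candidates.length === 0) continue;
    const partner = candidates.reduce((best, q) => (dist(p, q) < dist(p, best) ? q : best));
    done.add(p._id).add(partner._id);
    pairs.push([p._id, partner._id, dist(p, partner)]);
  }
  return pairs;
}

// Closest-pair-first: repeatedly take the globally closest free m/f pair.
function closestFirst(list) {
  const free = [...list];
  const pairs = [];
  while (true) {
    let best = null;
    for (const p of free) {
      for (const q of free) {
        if (p.gender === "m" && q.gender === "f" && (best === null || dist(p, q) < best[2])) {
          best = [p, q, dist(p, q)];
        }
      }
    }
    if (best === null) break;
    pairs.push([best[0]._id, best[1]._id, best[2]]);
    free.splice(free.indexOf(best[0]), 1);
    free.splice(free.indexOf(best[1]), 1);
  }
  return pairs;
}

const total = (pairs) => pairs.reduce((sum, p) => sum + p[2], 0);
console.log(total(iterative(people))); // 7 (A-B: 2, C-D: 5)
console.log(total(closestFirst(people))); // 9 (C-B: 1, A-D: 8)
```

Taking the closest pair first is locally best but leaves a worse total here, which is why the choice of strategy matters.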
Naively iterate over the documents
Find the lowest sum of distances between matching individuals (which itself isn't trivial to solve), so that all participants together have the shortest travel.
Matching only participants within an acceptable distance
Do some sort of divide and conquer and match participants within a certain radius of a common landmark (say cities, for example)
Solution 1: Naively iterate over the documents
var users = db.collection.find(yourQueryToFindThe1000users);
// We can safely use an unordered op here,
// which has greater performance.
// Since we use the "done" array to keep track of
// the processed members, there is no drawback.
var pairs = db.pairs.initializeUnorderedBulkOp();
var pairCount = 0;
var done = [];
users.forEach(
  function(currentUser){
    // Skip users that were already paired as someone's partner.
    if( done.indexOf(currentUser._id) !== -1 ) { return; }
    var genderToLookFor = ( currentUser.gender === "m" ) ? "f" : "m";
    // Using the $near operator, the returned documents are
    // automatically sorted from nearest to farthest, and since
    // findAndModify returns only one document, we get the
    // closest matching partner.
    var nearPartner = db.collection.findAndModify({
      query: {
        // Exclude users paired earlier in this run: their status
        // is only set to "done" at the very end.
        _id: { $nin: done },
        status: "undone",
        gender: genderToLookFor,
        location: {
          $near: {
            $geometry: {
              type: "Point" ,
              coordinates: currentUser.location
            }
          }
        }
      },
      update: { $set: { "status":"done" } },
      fields: { _id: 1 }
    });
    // No partner left for this user.
    if( nearPartner === null ) { return; }
    // Obviously, the current user is already processed.
    // However, we store it to simplify setting the
    // processed users to done.
    done.push(currentUser._id, nearPartner._id);
    // We have a pair, so we store it in a bulk operation
    pairs.insert({
      _id:{
        a: currentUser._id,
        b: nearPartner._id
      }
    });
    pairCount++;
  }
)
// Write the found pairs (execute() fails on an empty bulk op)
if( pairCount > 0 ) { pairs.execute(); }
// Mark all that are unmarked by now as done
db.collection.update(
  {
    _id: { $in: done },
    status: "undone"
  },
  {
    $set: { status: "done" }
  },
  { multi: true }
)
Solution 2: Find the smallest sum of distances between matches
This would be the ideal solution, but it is extremely complex to solve. We need to take all members of one gender, calculate all distances to all members of the other gender and iterate over all possible sets of matches. In our example it is quite simple, since there are only 4 combinations for any given gender. On second thought, this might be at least a variant of the traveling salesman problem (MTSP?). If I am right about that, the number of combinations should be
$$\frac{n!}{2}$$
for all $n > 2$, where $n$ is the number of possible pairs, and hence
$$\frac{10!}{2} = 1{,}814{,}400$$
for $n = 10$, and an astonishing
$$\frac{25!}{2} \approx 7.755 \times 10^{24}$$
for $n = 25$. That's 7.755 quadrillion (long scale) or 7.755 septillion (short scale).
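These figures can be reproduced with exact integer arithmetic, assuming the count is n!/2 (an assumption, but it matches the quoted 7.755 septillion for n = 25). A small sketch using JavaScript BigInt:

```javascript
// Assumption: number of combinations = n! / 2.
function combinations(n) {
  let factorial = 1n;
  for (let i = 2n; i <= BigInt(n); i++) factorial *= i;
  return factorial / 2n;
}

console.log(combinations(10).toString()); // 1814400
console.log(combinations(25).toString()); // 7755605021665492992000000
```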
While there are approaches to solving this kind of problem, the world record is somewhere in the range of 25,000 nodes using massive amounts of hardware and quite tricky algorithms. I think for all practical purposes, this "solution" can be ruled out.
Solution 3
To prevent people from being matched across unacceptable distances, and depending on your use case, you might want to match people by their distance to a common landmark (where they are going to meet, for example the nearest bigger city).
For our example, assume we have cities at [0,2] and [0,7]. The distance between the cities (5) is hence our acceptable range for matches. So we run a query for each city:
db.collection.find({
  // Assumes a 2d index on "location" (longitude first), so that
  // $maxDistance is in the same units as the coordinates
  location: {
    $near: [ 0, 2 ],
    $maxDistance: 5
  },
  status: "undone"
})
and iterate over the results naively. Since "A" and "B" would be first in the result set, they would be matched and marked done. Bad luck for "C" here, as no girl is left for him. But when we run the same query for the second city, he gets a second chance. OK, his travel gets a bit longer, but hey, he got a date with "D"!
To find the respective distances, take a fixed set of cities (towns, metropolitan areas, whatever your scale is), order them by location, and set each city's radius to the bigger of the two distances to its immediate neighbors. This way you get overlapping areas, so even when a match cannot be found in one place, it may be found in another.
IIRC, Google Maps lets you grab the cities of a nation based on their size. An easier way would be to let people choose their city.
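The landmark approach can be simulated on the four example documents. This is a plain-JavaScript sketch, not a MongoDB query: it reduces every point to its latitude (all longitudes are 0 in the example), computes each city's radius from its neighbors as described above, and runs a naive pairing within each city's result set. The function names are hypothetical.

```javascript
// The four example people, reduced to latitude (every longitude is 0).
const people = [
  { _id: "A", gender: "m", lat: 1 },
  { _id: "B", gender: "f", lat: 3 },
  { _id: "C", gender: "m", lat: 4 },
  { _id: "D", gender: "f", lat: 9 },
];
// Landmark latitudes from the example.
const cities = [2, 7];

// Radius of each city: the larger of the distances to its immediate neighbors.
function cityRadii(sorted) {
  return sorted.map((c, i) =>
    Math.max(i > 0 ? c - sorted[i - 1] : 0, i < sorted.length - 1 ? sorted[i + 1] - c : 0)
  );
}

function matchByLandmark(list, landmarks) {
  const sorted = [...landmarks].sort((a, b) => a - b);
  const radii = cityRadii(sorted);
  const done = new Set();
  const pairs = [];
  sorted.forEach((city, i) => {
    // Like the $near/$maxDistance query: free people within the radius,
    // nearest to the city first.
    const nearby = list
      .filter((p) => !done.has(p._id) && Math.abs(p.lat - city) <= radii[i])
      .sort((a, b) => Math.abs(a.lat - city) - Math.abs(b.lat - city));
    // Naive iteration: pair each person with the nearest free partner in the result set.
    for (const p of nearby) {
      if (done.has(p._id)) continue;
      const partners = nearby
        .filter((q) => q.gender !== p.gender && !done.has(q._id))
        .sort((a, b) => Math.abs(a.lat - p.lat) - Math.abs(b.lat - p.lat));
      if (partners.length === 0) continue;
      done.add(p._id).add(partners[0]._id);
      pairs.push([p._id, partners[0]._id]);
    }
  });
  return pairs;
}

console.log(cityRadii(cities)); // [ 5, 5 ]
console.log(matchByLandmark(people, cities)); // [ [ 'A', 'B' ], [ 'D', 'C' ] ]
```

As in the text, "C" finds nobody at the first city and is paired with "D" at the second.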
Notes
The code shown is not production ready and needs to be refined.
Instead of using "m" and "f" to denote gender, I suggest using 1 and 0: they can still be easily mapped, but need less storage space.
The same goes for status.
I think the last solution is the best, optimizing distances to some degree while keeping the chances of a match high.
I have a collection of addresses. I would like to filter it to keep the 10 nearest addresses, then sort them from the farthest to the nearest.
Is it possible to achieve this within a single find request in Meteor?
The following gives me the 10 nearest addresses:
Addresses.find({}, {sort:{distance:1}, limit:10});
but they are ordered by increasing distance. Obviously, if I set distance:-1 they come in decreasing order, but then I also get only the 10 farthest addresses…
You need the aggregation framework:
db.collection.aggregate([
  { $sort: { distance: 1 } },
  { $limit: 10 },
  { $sort: { distance: -1 } }
])
I hope the query is self-explanatory.
If you can't run an aggregation or a native Mongo query in Meteor, you'll probably have to reverse the results you got from the DB query programmatically.
If you fetch the result of your search and reverse it, it should work.
Addresses.find({}, {sort:{distance:1}, limit:10}).fetch().reverse()
The only drawback is that it's now an array and no longer a cursor.
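Fetch-then-reverse amounts to: sort ascending, keep the first ten, reverse. A plain-JavaScript sketch on hypothetical documents that already carry a distance field, as in the question:

```javascript
// Hypothetical address documents with a precomputed distance field.
const addresses = [12, 3, 7, 1, 15, 9, 4, 11, 2, 14, 6, 10, 8, 13, 5].map((d, i) => ({
  _id: `addr${i}`,
  distance: d,
}));

function tenNearestFarthestFirst(docs) {
  return [...docs]
    .sort((a, b) => a.distance - b.distance) // sort: { distance: 1 }
    .slice(0, 10) // limit: 10
    .reverse(); // like .fetch().reverse()
}

console.log(JSON.stringify(tenNearestFarthestFirst(addresses).map((a) => a.distance)));
// [10,9,8,7,6,5,4,3,2,1]
```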
The database is nearly 5GB. I have documents like:
{
_id: ..,
user: "a",
hobbies: [{
_id: ..,
name: "football"
},
{
_id: ..,
name: "beer"
}
...
]
}
I want to return users who have more than 0 "hobbies".
I've tried
db.collection.find({"hobbies" : { > : 0}}).limit(10)
and it takes all the RAM and returns no result.
How do I conduct this select?
And how do I return only id, name and count?
How do I do it with the official C# driver?
TIA
P.S.
I've also found this advice:
"Add new field to handle category size. It's a usual practice in mongo world."
Is this true?
In this specific case, you can use list indexing to solve your problem:
db.collection.find({"hobbies.0" : {$exists : true}}).limit(10)
This just makes sure a 0th element exists. You can do the same to make sure the list is shorter than n, or between x and y in length, by checking for the existence of elements at the ends of the range.
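A sketch of that range trick as a hypothetical helper that builds the filter object: an element at index min - 1 must exist (so there are at least min elements) and none at index max (so there are at most max). It assumes 1 <= min <= max; since it is plain JavaScript, the same function could be pasted into the mongo shell and used as db.collection.find(arrayLengthBetween("hobbies", 1, 3)). The lengthInRange function only mirrors the two conditions for checking the logic.

```javascript
// Hypothetical helper: filter for arrays whose length is in [min, max].
// Assumes 1 <= min <= max.
function arrayLengthBetween(field, min, max) {
  return {
    [`${field}.${min - 1}`]: { $exists: true }, // at least min elements
    [`${field}.${max}`]: { $exists: false }, // at most max elements
  };
}

// In-memory mirror of the two conditions above.
const lengthInRange = (arr, min, max) => arr.length > min - 1 && arr.length <= max;

console.log(JSON.stringify(arrayLengthBetween("hobbies", 1, 3)));
// {"hobbies.0":{"$exists":true},"hobbies.3":{"$exists":false}}
```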
Have you tried using hobbies.length? I haven't tested this, but I believe this is the right way to query the range of the array in MongoDB:
db.collection.find({$where: 'this.hobbies && this.hobbies.length > 0'})
You can (sort of) check for a range of array lengths with the $size operator using a logical $not:
db.collection.find({array: {$not: {$size: 0}}})
That's somewhat true.
According to the manual
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24size
$size
The $size operator matches any array with the specified number of
elements. The following example would match the object {a:["foo"]},
since that array has just one element:
db.things.find( { a : { $size: 1 } } );
You cannot use $size to find a range of sizes (for example: arrays
with more than 1 element). If you need to query for a range, create an
extra size field that you increment when you add elements
So you can check for array size 0, but not for things like 'larger than 0'
Earlier answers explain how to handle the array count issue. However, in your case, if zero really is the only value you want to test for, you could set the array to null when it's empty and configure the serializer not to write null fields; then you can test for the existence of that field. Remember to test for null, and to create the array when you want to add a hobby to a user.
For #2, provided you added the count field it's easy to select the fields you want back from the database and include the count field.
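If you only need to separate users with zero hobbies from the rest, and the hobbies key is not set for someone with zero hobbies, use the $exists operator.
Add an index on "hobbies" to improve performance:
db.collection.find( { hobbies : { $exists : true } } ); // users with at least one hobby
db.collection.find( { hobbies : { $exists : false } } ); // users with zero hobbies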
However, if a person with zero hobbies has an empty array and a person with 1 hobby has an array with 1 element, use this generic solution:
Maintain a field called "hcount" (hobby count), and always set it equal to the size of the hobbies array in every update.
Index the field "hcount".
Then you can run queries like:
db.collection.find( { hcount : 0 } ) // people with 0 hobbies
db.collection.find( { hcount : 5 } ) // people with 5 hobbies
db.collection.find( { hcount : { $gt : 0 } } ) // people with at least 1 hobby
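One way to keep hcount in sync is to change the array and the counter in the same update. Below is a minimal sketch with two hypothetical helpers that build filter/update pairs ($push, $pull and $inc are standard update operators); each pair would be passed to db.collection.updateOne(op.filter, op.update). It assumes hobby _ids are unique within a user's array.

```javascript
// Hypothetical helper: add a hobby and bump the counter in one update,
// so hcount stays equal to the size of the hobbies array.
function addHobby(userId, hobby) {
  return {
    filter: { _id: userId },
    update: { $push: { hobbies: hobby }, $inc: { hcount: 1 } },
  };
}

// Hypothetical helper: remove a hobby by its _id. The filter only matches
// when that hobby is present, so hcount is not decremented for a no-op $pull.
function removeHobby(userId, hobbyId) {
  return {
    filter: { _id: userId, "hobbies._id": hobbyId },
    update: { $pull: { hobbies: { _id: hobbyId } }, $inc: { hcount: -1 } },
  };
}

const op = addHobby("a", { _id: 1, name: "football" });
console.log(JSON.stringify(op.update));
// {"$push":{"hobbies":{"_id":1,"name":"football"}},"$inc":{"hcount":1}}
```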
3 - From @JohnP's answer, "$size" is also a good operator for this purpose.
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24size