mongodb geoNear vs near

It looks like mongodb offers two similar functions for geospatial queries: $near and $geoNear. According to the mongo docs:
The geoNear command provides an alternative to the $near operator. In
addition to the functionality of $near, geoNear returns additional
diagnostic information.
It looks like geoNear provides a superset of the near functionality. For example, near seems to only return the closest 100 documents, whereas geoNear lets you specify a maximum. Is there a reason to use near instead of geoNear? Is one more efficient than the other?

Efficiency should be identical for either.
geoNear's major limitation is that, as a command, it returns all of the matched documents in a single result document, so the entire result set is capped at the maximum BSON document size. It also adds a distance field to each result document, which may or may not be an issue depending on your usage.
$near is a query operator, so the results can be larger than a single document (they are still returned in a single response, but not in a single document). You can also set the maximum number of documents via the query's limit().
I tend to recommend that users stick with $near unless they need the diagnostics (e.g., distance or location matched) from the geoNear command.
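For illustration, here is how the two shapes compare in the mongo shell (a sketch, assuming a places collection with a 2dsphere index on a location field):

// $near is a query operator: it returns a cursor, so you can chain limit()
db.places.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [ -73.97, 40.77 ] },
      $maxDistance: 5000
    }
  }
}).limit(10)

// geoNear is a database command (removed in MongoDB 4.2): every match comes
// back inside one result document, annotated with a "dis" distance field
db.runCommand({
  geoNear: "places",
  near: { type: "Point", coordinates: [ -73.97, 40.77 ] },
  spherical: true
})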

These are the major differences:
$geoNear also gives you the distance from the point, but $near doesn't.
The $geoNear command requires that the collection have at most one 2d index and/or one 2dsphere index, whereas geospatial query operators like $near and $geoWithin permit collections to have multiple geospatial indexes.
This is because the $geoNear command has no option to specify the field on which you want to search, whereas with the $near operator you can specify the field name.

The main difference is that $near is a query operator, but $geoNear is an aggregation stage. Both return documents in order of nearest to farthest from the given point.
What this means is that $near can be used in find() queries, but not inside the $match aggregation stage; $geoNear, in turn, must be used as its own aggregation stage, and only as the first stage of the pipeline.
The options each feature provides also differ. I invite you to review the details in the corresponding documentation sections:
$near documentation
$geoNear documentation
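For example, a minimal sketch of the aggregation form (collection and field names assumed); note that $geoNear must be the first stage of the pipeline:

db.places.aggregate([
  {
    $geoNear: {
      near: { type: "Point", coordinates: [ -73.97, 40.77 ] },
      distanceField: "dist.calculated",  // required: where to write the computed distance
      maxDistance: 5000,
      spherical: true
    }
  }
])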

The 100 documents limit with geoNear is the default behaviour, but you can simply set the num option as described in the MongoDB documentation (http://docs.mongodb.org/manual/reference/command/geoNear/).
The default is 100, but you can set more. Unfortunately, a skip parameter is missing for the moment
(see https://jira.mongodb.org/browse/SERVER-3925)
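A sketch of raising the cap (collection name assumed); as the ticket above notes, there is no corresponding skip option:

db.runCommand({
  geoNear: "places",
  near: { type: "Point", coordinates: [ -73.97, 40.77 ] },
  spherical: true,
  num: 500  // overrides the default cap of 100 results
})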

Related

Does mongodb use index search in lookup stage?

I'm querying a collection with the aggregate function in MongoDB, and I have to look up some other collections in the aggregation. But I have a question about it:
does MongoDB use indexes for the foreignField? I wasn't able to figure this out, and I searched everywhere but didn't get my answer. It must certainly use indexes for it, but I just want to be sure.
The best way to determine how the database is executing a query is to generate and examine the explain output for the operation. With aggregations that include the $lookup stage specifically you will want to use the more verbose .explain("executionStats") mode. You may also utilize the $indexStats operator to confirm that the usage count of the intended index is increasing.
The best answer we can give based on the limited information in the question is: MongoDB will probably use the index. Query execution behavior, including index usage, depends on the situation and the version. If you provide more information in your question, then we can provide more specific information. There are also some details about index usage on the $lookup documentation page.
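For instance (collection and field names assumed), you could inspect the plan and the index usage counters like this:

// Verbose explain output shows how each pipeline stage is executed
db.orders.explain("executionStats").aggregate([
  {
    $lookup: {
      from: "inventory",
      localField: "item",
      foreignField: "sku",
      as: "inventory_docs"
    }
  }
])

// $indexStats on the foreign collection reports per-index usage counts
db.inventory.aggregate([ { $indexStats: {} } ])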

How to modify 2dsphere index without downtime for $geoNear queries?

$geoNear queries require a geospatial index, and they also require that the collection have only one geospatial index.
From the docs:
https://docs.mongodb.com/v3.4/reference/operator/aggregation/geoNear/#behavior
$geoNear requires a geospatial index.
The $geoNear requires that a collection have at most only one 2d index and/or only one 2dsphere index.
If I need to make changes to an existing geospatial index on a production system with frequent (one every few seconds) $geoNear queries, how would I apply this change without downtime?
I'm using Mongo 3.4 if that matters, and could upgrade to 3.6 if that would make this easier.
I just tried this on MongoDB 4.2.x and it appears to no longer be an issue. I don't know in which version this issue was resolved/improved. I had two 2dsphere indexes on the same collection and no queries were having issues.
According to the docs, this is still an issue, but only for $geoNear queries, and you can work around it by telling it which index to use:
If your collection has multiple 2dsphere indexes and/or multiple 2d indexes, you must use the key option to specify the indexed field path to use.
If you do not specify the key, you cannot have multiple 2dsphere indexes and/or multiple 2d indexes, since without the key, index selection among multiple 2d indexes or 2dsphere indexes is ambiguous.
https://docs.mongodb.com/manual/core/2dsphere/#geonear-and-geonear-restrictions
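So one hedged approach to changing the index without downtime: build the new 2dsphere index alongside the old one, point queries at it explicitly via key, then drop the old index. A sketch (the key option appeared around MongoDB 4.0; collection and field names are hypothetical):

db.places.aggregate([
  {
    $geoNear: {
      near: { type: "Point", coordinates: [ -73.97, 40.77 ] },
      distanceField: "dist",
      spherical: true,
      key: "location_v2"  // disambiguates when multiple geospatial indexes exist
    }
  }
])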

Difference between count() and find().count() in MongoDB

What is the difference between the two? I basically want to count all the documents in mycollection.
db.mycollection.count() vs
db.mycollection.find().count()?
They both return the same result. Is there any reason why somebody would choose count() over find().count()? Apart from the fact that find() applies a default batch size in the shell (correct me if I'm wrong), so you have to type "it" to see more results.
db.collection.count() and cursor.count() are simply wrappers around the count command, so running db.collection.count() or cursor.count() with the same query argument will return the same result. However, the count result can be inaccurate on a sharded cluster.
MongoDB drivers compatible with the 4.0 features deprecate their
respective cursor and collection count() APIs in favor of new APIs for
countDocuments() and estimatedDocumentCount(). For the specific API
names for a given driver, see the driver documentation.
The db.collection.countDocuments method internally uses an aggregation query to return the document count, while db.collection.estimatedDocumentCount returns the document count based on metadata.
It is worth mentioning that the estimatedDocumentCount output can be inaccurate as mentioned in the documentation.
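A minimal sketch of the replacement APIs (the filter is assumed for illustration):

// Accurate: runs an aggregation under the hood and honours the filter
db.orders.countDocuments({ status: "A" })

// Fast but approximate: reads the count from collection metadata,
// so it takes no filter and can be stale after an unclean shutdown
db.orders.estimatedDocumentCount()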
db.collection.count() without parameters counts all documents in a collection. db.collection.find() without parameters matches all documents in a collection, and appending count() counts them, so there is no difference.
This is confirmed explicitly in the db.collection.count() documentation:
To count the number of all documents in the orders collection, use the
following operation:
db.orders.count()
This operation is equivalent to the following:
db.orders.find().count()
As is mentioned in another answer by sheilak, the two are equivalent - except that db.collection.count() can be inaccurate for sharded clusters.
The latest documentation says:
count() is equivalent to the db.collection.find(query).count()
construct.
And then,
Sharded Clusters
On a sharded cluster, db.collection.count() can result in an
inaccurate count if orphaned documents exist or if a chunk migration
is in progress.
The documentation explains how to mitigate this bug (use an aggregate).
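The documented workaround counts with a $group/$sum aggregation, which does not rely on collection metadata; a minimal sketch:

db.orders.aggregate([
  { $group: { _id: null, n: { $sum: 1 } } }
])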
db.collection.count() is equivalent to the db.collection.find(query).count() construct.
Examples
Count all Documents in a Collection
db.orders.count()
This operation is equivalent to the following:
db.orders.find().count()
Count all Documents that Match a Query
Count the number of the documents in the orders collection with the field ord_dt greater than new Date('01/01/2012'):
db.orders.count( { ord_dt: { $gt: new Date('01/01/2012') } } )
The query is equivalent to the following:
db.orders.find( { ord_dt: { $gt: new Date('01/01/2012') } } ).count()
As per the documentation, db.collection.count() can be inaccurate in the following scenarios:
On a sharded cluster, db.collection.count() without a query predicate can result in an inaccurate count if orphaned documents exist or if a chunk migration is in progress.
After an unclean shutdown of a mongod using the WiredTiger storage engine, count statistics reported by count() may be inaccurate.
I believe that if you are using some kind of pagination, like
find(query).skip(n).limit(m)
the number of documents returned will not be the same as
count(query)
Note, though, that cursor.count() ignores skip() and limit() by default, so appending .count() to the paginated query still returns the total; you have to pass count(true) to apply the modifiers. So in cases like this, if you want the total as well as the page, I think you might have to use both.
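For example, with the legacy shell API (query and skip/limit values assumed):

db.orders.find(query).skip(20).limit(10).count()      // ignores skip/limit: the total match count
db.orders.find(query).skip(20).limit(10).count(true)  // applySkipLimit: the page size, at most 10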

MongoDB {aggregation $match} vs {find} speed

I have a MongoDB collection with millions of documents and I'm trying to optimize my queries. I'm currently using the aggregation framework to retrieve data and group it as I want. My typical aggregation query is something like: $match > $group > $group > $project
However, I noticed that the last parts take only a few ms; the beginning is the slowest.
I tried to perform a query with only the $match filter, and then to perform the same query with collection.find. The aggregation query takes ~80ms while the find query takes 0 or 1ms.
I have indexes on pretty much every field, so I guess this isn't the problem. Any idea what could be going wrong? Or is it just a "normal" drawback of the aggregation framework?
I could use find queries instead of aggregation queries, but I would have to perform a lot of processing after the request, and this processing can be done quickly with $group etc., so I would rather keep the aggregation framework.
Thanks,
EDIT:
Here are my criteria:
{
    "action": "click",
    "timestamp": {
        "$gt": ISODate("2015-01-01T00:00:00Z"),
        "$lt": ISODate("2015-02-01T00:00:00Z")
    },
    "itemId": "5"
}
The main purpose of the aggregation framework is to ease the query of a big number of entries and generate a low number of results that hold value to you.
As you have said, you can also use multiple find queries, but remember that you cannot create new fields with find queries. On the other hand, the $group stage allows you to define new fields.
If you would like to achieve the functionality of the aggregation framework, you would most likely have to run an initial find (or chain several ones), pull that information and further manipulate it with a programming language.
The aggregation pipeline might seem to take longer, but at least you know you only have to take into account the performance of one system: the MongoDB engine.
When it comes to manipulating the data returned from a find query, on the other hand, you would most likely have to process it further in a programming language, increasing the complexity depending on the intricacies of the language of choice.
Have you tried using explain() on your find queries? It will give you a good idea of how much time the find() query actually takes. You can do the same for $match with explain and see whether there is any difference in index access and other parameters.
Also, the $group stage of the aggregation framework doesn't utilize indexes, so it has to process all the records returned by the $match stage. So, to better understand how your query works, look at the result set it returns and whether it fits into memory to be processed by MongoDB.
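For instance, comparing the two plans side by side (collection name assumed, criteria taken from the question):

var criteria = {
  action: "click",
  timestamp: {
    $gt: ISODate("2015-01-01T00:00:00Z"),
    $lt: ISODate("2015-02-01T00:00:00Z")
  },
  itemId: "5"
};

// Plan for the plain find
db.events.find(criteria).explain("executionStats");

// Plan for the equivalent $match pipeline
db.events.explain("executionStats").aggregate([ { $match: criteria } ]);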
If you are concerned with performance, then no doubt aggregation is a more time-consuming task than a find clause.
When you are fetching records on multiple conditions, with lookups, grouping, and a limited (paginated) record set, aggregate is the best approach; meanwhile, a find query is fast when you have to fetch a very big data set. If you only have some population and projection and no pagination, I suggest using a find query, which is fast.

MongoDb index intersection usage

I have trouble understanding what MongoDB is doing with my queries. My documents contain almost exclusively array fields, keeping me from using compound indexes.
Every field is indexed with ensureIndex({FieldName: 1}).
The queries are AND-concatenated like this:
{$and: [{FIELD1: "field1Val"}, {FIELD2: "field2Val"}, {FIELD3: "field3Val"}]}
If I run this query, MongoDB appears to be using only one index.
Why isn't MongoDB using all the indexes in parallel and then intersecting them?
The same problem solved with Lucene runs 8 times faster than my MongoDB implementation does now.
(Before v2.6, one of MongoDB's well-known limitations was that it could use only one index per query, except in some special cases using $or.
To improve query speed, you can use hint() to enforce the index used; choose the most selective index, as sketched below.)
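A sketch using the field names from the question, forcing the index on the most selective field:

db.collection.find(
  { FIELD1: "field1Val", FIELD2: "field2Val", FIELD3: "field3Val" }
).hint({ FIELD1: 1 })  // force the single-field index on FIELD1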
As the comments say, it's no longer true: use index intersection. It seems that you can use at most 2 indexes intersected. See: When are Compound Indexes still relevant in MongoDB 2.6, given the new Index Intersection feature?
@JohnnyHK, thanks for the comments; they taught me new things.