SELECT avg(rate) FROM ratings WHERE sid=1 in MongoDB

How do I implement the equivalent of this SQL query in MongoDB?
SELECT avg(rate) FROM ratings WHERE sid=1
No grouping is needed.

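Yes, MongoDB has an aggregation framework in which you build a pipeline of the stages your query needs.
db.collection.aggregate([
  {
    $match: {
      "sid": 1
    }
  },
  {
    $group: {
      _id: null,
      avgRate: { $avg: "$rate" }
    }
  }
])
In SQL the WHERE clause is applied before the average is computed, which is why $match is the first stage of the pipeline. $match is roughly equivalent to WHERE in SQL, and $avg works like AVG in SQL. Even though no grouping is wanted, $avg has to be used as an accumulator in a $group stage with _id: null so that it averages over all matched documents at once; inside a $project stage, $avg operates on each document separately.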
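For readers who want to check what this pipeline computes without a running server, here is a plain-JavaScript sketch that mimics $match followed by $group with _id: null and $avg on an in-memory array. It illustrates the semantics only and is not the MongoDB API; the ratings array is made-up sample data. It mirrors two behaviours of $avg in $group: non-numeric values are ignored, and when $match lets nothing through, the pipeline returns no document at all rather than an average of 0.

```javascript
// Plain-JavaScript emulation (no MongoDB involved) of:
//   db.collection.aggregate([
//     { $match: { sid: 1 } },
//     { $group: { _id: null, avgRate: { $avg: "$rate" } } }
//   ])
function avgRateForSid(docs, sid) {
  // $match: keep only documents whose sid equals the requested value
  const matched = docs.filter((doc) => doc.sid === sid);

  // $group with _id: null emits nothing when there is no input
  if (matched.length === 0) return [];

  // $avg ignores non-numeric and missing values
  const numbers = matched
    .map((doc) => doc.rate)
    .filter((rate) => typeof rate === "number");

  // If no numeric value is left, $avg yields null
  const avgRate = numbers.length === 0
    ? null
    : numbers.reduce((sum, rate) => sum + rate, 0) / numbers.length;

  return [{ _id: null, avgRate }];
}

// Made-up sample data
const ratings = [
  { sid: 1, rate: 4 },
  { sid: 1, rate: 2 },
  { sid: 2, rate: 5 },
  { sid: 1, rate: "n/a" }, // non-numeric, ignored by the average
];

console.log(avgRateForSid(ratings, 1)); // [ { _id: null, avgRate: 3 } ]
```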

To solve this, use $avg within the $group aggregation pipeline stage. The basic pipeline flow is:
1. $match on sid = 1 (your WHERE clause).
2. $group by sid (only one sid remains at this point, because $match has filtered out the others) and compute the average within the grouped documents.
Your pipeline would look something like:
db.ratings.aggregate(
  [
    { $match: { "sid": 1 } },
    { $group: { _id: "$sid", rateAvg: { $avg: "$rate" } } }
  ]
)
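To see why grouping by sid still yields a single result here, the following plain-JavaScript sketch emulates a $group stage keyed on sid with $avg, on an in-memory array of made-up documents (no MongoDB involved). Without a preceding $match it produces one document per distinct sid; after filtering to sid 1, only one group is left. The order of groups here is simply Map insertion order; a real $group does not guarantee any output order.

```javascript
// Plain-JavaScript emulation of
//   { $group: { _id: "$sid", rateAvg: { $avg: "$rate" } } }
// on an in-memory array (no MongoDB server involved).
function groupAvgBySid(docs) {
  // Running sum and count of numeric rates per sid
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.sid)) groups.set(doc.sid, { sum: 0, count: 0 });
    if (typeof doc.rate === "number") {
      const g = groups.get(doc.sid);
      g.sum += doc.rate;
      g.count += 1;
    }
  }
  // One output document per distinct sid, like $group
  return [...groups].map(([sid, { sum, count }]) => ({
    _id: sid,
    rateAvg: count === 0 ? null : sum / count,
  }));
}

// Made-up sample data
const rates = [
  { sid: 1, rate: 4 },
  { sid: 1, rate: 2 },
  { sid: 2, rate: 5 },
];

// Without $match: one group per sid
console.log(groupAvgBySid(rates));
// With $match { sid: 1 } applied first: only one group remains
console.log(groupAvgBySid(rates.filter((doc) => doc.sid === 1)));
```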

Related

How does mongodb use an index to count documents?

According to the docs, db.collection.countDocuments() wraps this aggregation:
db.collection.aggregate([
{ $match: <query> },
{ $group: { _id: null, n: { $sum: 1 } } }
])
Even if there is an index, won't all of the matched documents be passed into the $group stage to be counted?
If not, how is MongoDB able to count the documents without processing every matching document?
The MongoDB query planner can make some optimizations.
In that sample aggregation, the planner can see that no fields are required except those referenced in <query>, so it can add an implicit $project stage that selects only those fields.
If those fields and the _id are all included in a single index, there is no need to fetch the documents to execute the query: all the necessary information is available from the index.
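As a conceptual illustration only (this is not how MongoDB's storage engine is implemented), the sketch below models an index as an array of { key, recordId } entries sorted by key, and counts the entries equal to a value by locating the range boundaries with binary search. The point is that the count comes entirely from index entries; the made-up documents array, including its large body field, is never read after the index is built.

```javascript
// Conceptual sketch, not MongoDB internals: an index is an array of
// { key, recordId } entries sorted by key. Counting uses only the index.

// First position whose key is >= value
function lowerBound(entries, value) {
  let lo = 0, hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].key < value) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// First position whose key is > value
function upperBound(entries, value) {
  let lo = 0, hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].key <= value) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// Number of index entries equal to value; no document is fetched
function countFromIndex(entries, value) {
  return upperBound(entries, value) - lowerBound(entries, value);
}

// Made-up collection and a single-field index on "status"
const documents = [
  { _id: 1, status: "A", body: "large payload" },
  { _id: 2, status: "B", body: "large payload" },
  { _id: 3, status: "A", body: "large payload" },
  { _id: 4, status: "C", body: "large payload" },
];
const statusIndex = documents
  .map((doc) => ({ key: doc.status, recordId: doc._id }))
  .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

console.log(countFromIndex(statusIndex, "A")); // 2
```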

MongoDB: ordering results by multiple fields in a $group stage

I need to perform the same task as the SQL query below, but in MongoDB. Here is the SQL query:
last_value(status) over (partition by entity_entry_id order by entity_id, entity_ingest_date)
As can be seen, I am partitioning the data by check_in_able_id and ordering the partitioned results by two fields and then selecting the last value in each partition.
How can I perform the same task in MongoDB? I wrote the query below:
db.products.aggregate([
{
$group: {
_id: {
status_id: "$STATUS_ID"
},
last_entity_status: {
$last: "$ENTITY_ID, $ENTITY_INGEST_DATE"
}
}
}
])
but the above query doesn't work, because $last takes only a single expression.
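One approach worth checking in MongoDB is a $sort stage on both ordering fields followed by a $group on the partition key with $last on a single field, since $last takes one expression and is only meaningful on sorted input. The plain-JavaScript code below is a hedged sketch that emulates that idea on an in-memory array; the field names entity_entry_id, entity_id, entity_ingest_date and status are taken from the SQL and assumed to exist as document fields, and the rows are made up.

```javascript
// Plain-JavaScript sketch of "last status per partition after ordering by
// two fields". Field names are assumptions taken from the SQL above.

// Order by entity_id, then entity_ingest_date (both ascending)
function compareByOrderFields(a, b) {
  if (a.entity_id !== b.entity_id) return a.entity_id < b.entity_id ? -1 : 1;
  return a.entity_ingest_date - b.entity_ingest_date; // Date difference in ms
}

function lastStatusPerPartition(docs) {
  // Like $sort: { entity_id: 1, entity_ingest_date: 1 }
  const sorted = [...docs].sort(compareByOrderFields);
  // Like $group by entity_entry_id with $last: "$status":
  // later documents overwrite earlier ones, so the last one wins
  const last = new Map();
  for (const doc of sorted) last.set(doc.entity_entry_id, doc.status);
  return [...last].map(([id, status]) => ({ _id: id, lastStatus: status }));
}

// Made-up rows
const rows = [
  { entity_entry_id: "e1", entity_id: 1, entity_ingest_date: new Date("2021-01-02"), status: "closed" },
  { entity_entry_id: "e1", entity_id: 1, entity_ingest_date: new Date("2021-01-01"), status: "open" },
  { entity_entry_id: "e2", entity_id: 2, entity_ingest_date: new Date("2021-01-01"), status: "pending" },
];

console.log(lastStatusPerPartition(rows));
```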

Why is sorting documents by _id slower with $match than without it in MongoDB?

So, I tried this query:
db.collection('collection_name').aggregate([
{
$match: { owner_id: '5be9b2f03ef77262c2bd49e6' }
},
{
$sort: { _id: -1 }
}])
The query above takes 20s,
but if I run this query instead:
db.collection('collection_name').aggregate([{$sort : {_id : -1}}])
it takes only 0.7s.
Why is the query without $match faster than the one with $match?
Update:
When I try this query:
db.getCollection('callbackvirtualaccounts').aggregate([
{
$match: { owner_id: '5860457640b4fe652bd9c3eb' }
},
{
$sort: { created: -1 }
}
])
it takes only 0.781s.
Why is sorting by _id slower than sorting by the created field?
Note: I'm using MongoDB v3.0.0.
db.collection('collection_name').aggregate([
{
$match: { owner_id: '5be9b2f03ef77262c2bd49e6' }
},
{
$sort: { _id: -1 }
}])
This collection probably doesn't have an index on owner_id. Try creating one with either of the commands below and rerun your previous code.
db.collection('collection_name').createIndex({ owner_id: 1 }) // single-field index
or
db.collection('collection_name').createIndex({ owner_id: 1, _id: -1 }) // compound index
Note: if you don't know how to create a compound index yet, you can create single-field indexes on each key used in $match or $sort, which should also make the query more efficient.
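To illustrate why the compound index { owner_id: 1, _id: -1 } suits this particular query, here is a conceptual plain-JavaScript sketch (not MongoDB internals, made-up data). Index entries are kept sorted by owner_id ascending and then _id descending, so all entries for one owner form a contiguous run that is already in the order that $sort: { _id: -1 } asks for, and no separate sort step is needed. The linear findIndex stands in for the seek a real index would perform.

```javascript
// Conceptual sketch: a compound index on { owner_id: 1, _id: -1 } modelled
// as entries sorted by owner_id ascending, then _id descending.
function buildCompoundIndex(docs) {
  return docs
    .map((doc) => ({ owner_id: doc.owner_id, _id: doc._id }))
    .sort((a, b) => {
      if (a.owner_id !== b.owner_id) return a.owner_id < b.owner_id ? -1 : 1;
      return a._id < b._id ? 1 : a._id > b._id ? -1 : 0; // _id descending
    });
}

function matchThenSortDesc(index, ownerId) {
  // Find the start of the owner's run (a real index would seek here)
  const start = index.findIndex((e) => e.owner_id === ownerId);
  if (start === -1) return [];
  const ids = [];
  // Walk forward while owner_id still matches: the run is contiguous and
  // already ordered by _id descending, so no extra sort is performed.
  for (let i = start; i < index.length && index[i].owner_id === ownerId; i++) {
    ids.push(index[i]._id);
  }
  return ids;
}

// Made-up documents
const docs = [
  { _id: 1, owner_id: "a" },
  { _id: 5, owner_id: "b" },
  { _id: 3, owner_id: "a" },
  { _id: 7, owner_id: "a" },
];
const index = buildCompoundIndex(docs);
console.log(matchThenSortDesc(index, "a")); // [ 7, 3, 1 ]
```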
Query speed depends on many factors: the size of the collection, the size of the documents, the indexes defined on the collection (and whether the queries use them properly), the hardware (CPU, RAM, network), and other processes running at the same time as the query.
For further analysis, please tell us which indexes are defined on the collection in question. This command retrieves them: db.collection.getIndexes()
Note that the unique index on the _id field is created by default and cannot be modified or deleted.
(i)
But if I tried to query: db.collection.aggregate( [ { $sort : { _id : -1 } } ] ) it only takes 0.7s.
The query is fast because there is an index on the _id field and it is used for the sort. An aggregation can use an index for a $sort stage when that sort happens early in the pipeline. You can verify whether the index is used by generating a query plan (run explain in executionStats mode); the plan will then contain an index scan (IXSCAN).
(ii)
db.collection.aggregate([
{
$match: { owner_id: '5be9b2f03ef77262c2bd49e6' }
},
{
$sort: { _id: -1 }
}
])
The query above takes 20s.
When I try this query, it takes only 0.781s.
db.collection.aggregate([
{
$match: { owner_id: '5860457640b4fe652bd9c3eb' }
},
{
$sort: { created: -1 }
}
])
Why is sorting by _id slower than sorting by the created field?
No conclusion can be drawn from the information available. In general, $match and $sort stages that appear early in an aggregation pipeline can use indexes created on the fields they reference.
Generating a query plan will reveal what the issue is.
Please run explain in executionStats mode and post the query plan details for all the queries in question. The documentation for MongoDB v3.0.0 describes how to generate query plans with explain: db.collection.explain()

MongoDB Compass: select distinct field values

I am using MongoDB Compass and don't have the mongo shell. I need to build a query in MongoDB Compass that selects the distinct values of the "genre" field in my collection.
Sample Input:
{"_id":{"$oid":"58c59c6a99d4ee0af9e0c34e"},"title":"Bateau-mouche sur la Seine","year":{"$numberInt":"1896"},"imdbId":"tt0000042","genre":["Documentary","Short"],"viewerRating":{"$numberDouble":"3.8"},"viewerVotes":{"$numberInt":"17"},"director":"Georges Méliès"}
{"_id":{"$oid":"58c59c6a99d4ee0af9e0c340"},"title":"Watering the Flowers","year":{"$numberInt":"1896"},"imdbId":"tt0000035","genre":["Short"],"viewerRating":{"$numberDouble":"5.3"},"viewerVotes":{"$numberInt":"33"},"director":"Georges Méliès"}
{"_id":{"$oid":"58c59c6a99d4ee0af9e0c34a"},"title":"The Boxing Kangaroo","year":{"$numberInt":"1896"},"imdbId":"tt0000048","genre":["Short"],"viewerRating":{"$numberDouble":"5.2"},"viewerVotes":{"$numberInt":"48"},"director":"Birt Acres"}
Expected output: Documentary, Short
You can do this with the aggregation framework in Compass, using $unwind and $group. $unwind outputs a separate document for each element of the target array, which lets the $addToSet operator in the $group stage collect the genres as distinct values.
Pipeline:
[
{
$unwind: {
path: '$genre',
preserveNullAndEmptyArrays: true
}
},
{
$group: {
_id: null,
uniqueGenres: { $addToSet: '$genre' }
}
}
]
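The effect of this pipeline can be sketched in plain JavaScript on the sample documents (abridged to title and genre). This is an emulation, not Compass or MongoDB code, and it is simplified: documents without a genre array are skipped, which does not reproduce everything preserveNullAndEmptyArrays: true does. A JavaScript Set plays the role of $addToSet; note that $addToSet itself does not guarantee any element order.

```javascript
// Plain-JavaScript emulation of $unwind on "genre" followed by
// { $group: { _id: null, uniqueGenres: { $addToSet: "$genre" } } }.
// Simplification: documents without a genre array are skipped.
function uniqueGenres(docs) {
  // $unwind: one output document per array element
  const unwound = docs.flatMap((doc) =>
    Array.isArray(doc.genre) ? doc.genre.map((g) => ({ ...doc, genre: g })) : []
  );
  // $group with _id: null emits nothing when there is no input
  if (unwound.length === 0) return [];
  // $addToSet: each distinct value is kept once
  const distinct = new Set(unwound.map((doc) => doc.genre));
  return [{ _id: null, uniqueGenres: [...distinct] }];
}

// Sample input from the question, abridged
const movies = [
  { title: "Bateau-mouche sur la Seine", genre: ["Documentary", "Short"] },
  { title: "Watering the Flowers", genre: ["Short"] },
  { title: "The Boxing Kangaroo", genre: ["Short"] },
];

console.log(uniqueGenres(movies)[0].uniqueGenres); // [ 'Documentary', 'Short' ]
```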
See screenshot below for Compass example:

Trying to select single documents from mongo collection

We have a rudimentary versioning system in a collection that uses a field (pageId) as a root key. Subsequent versions of this page have the same pageId. This allows us to very easily find all versions of a single page.
How do I run a query that returns only the most recently modified document (by lastModified) for each distinct pageId?
In pseudocode:
For each distinct pageId
sort documents based on lastModified descending
and return only the first document
You can use an aggregation pipeline for that.
$sort - Sorts all input documents and returns them to the pipeline in sorted order.
$group - Groups documents by some specified expression and outputs to the next stage a document for each distinct grouping.
$first - Returns the value that results from applying an expression to the first document in a group of documents that share the same group by key.
Example:
db.getCollection('t01').aggregate([
{
$sort: {'lastModified': -1}
},
{
$group: {
_id: "$pageId",
element1: { $first: "$element1" },
element2: { $first: "$element2" },
elementN: { $first: "$elementN" },
}
}
]);
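Here is a plain-JavaScript emulation of the same two steps on made-up page versions (no MongoDB involved): sort newest first, then keep the first document seen for each pageId, which is what $first does after the $sort. It keeps whole documents; in the pipeline itself, { $first: "$$ROOT" } is one option worth trying if you want the entire latest document instead of listing element1 through elementN individually.

```javascript
// Plain-JavaScript emulation of
//   { $sort: { lastModified: -1 } } followed by
//   { $group: { _id: "$pageId", ... $first ... } }
function latestPerPage(docs) {
  // $sort: newest first
  const sorted = [...docs].sort((a, b) => b.lastModified - a.lastModified);
  const latest = new Map();
  for (const doc of sorted) {
    // $first: keep only the first document seen for each pageId
    if (!latest.has(doc.pageId)) latest.set(doc.pageId, doc);
  }
  return [...latest.values()];
}

// Made-up page versions
const versions = [
  { pageId: "home", lastModified: new Date("2020-01-01"), element1: "v1" },
  { pageId: "home", lastModified: new Date("2020-03-01"), element1: "v2" },
  { pageId: "about", lastModified: new Date("2020-02-01"), element1: "v1" },
];

console.log(latestPerPage(versions).map((d) => `${d.pageId}:${d.element1}`));
// [ 'home:v2', 'about:v1' ]
```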