Mongo order by length of array - mongodb

Let's say I have mongo documents like this:
Question 1
{
answers:[
{content: 'answer1'},
{content: '2nd answer'}
]
}
Question 2
{
answers:[
{content: 'answer1'},
{content: '2nd answer'},
{content: 'The third answer'}
]
}
Is there a way to order the collection by size of answers?
After a little research I saw suggestions to add another field that would contain the number of answers and use it as a reference, but maybe there is a native way to do it?

I thought you might be able to use $size, but that's only to find arrays of a certain size, not ordering.
From the mongo documentation:
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24size
You cannot use $size to find a range of sizes (for example: arrays with more than 1 element). If you need to query for a range, create an extra size field that you increment when you add elements. Indexes cannot be used for the $size portion of a query, although if other query expressions are included indexes may be used to search for matches on that portion of the query expression.
Looks like you can probably fairly easily do this with the new aggregation framework, edit: which isn't out yet.
http://www.mongodb.org/display/DOCS/Aggregation+Framework
Update Now the Aggregation Framework is out...
> db.test.aggregate([
{$unwind: "$answers"},
{$group: {_id:"$_id", answers: {$push:"$answers"}, size: {$sum:1}}},
{$sort:{size:1}}]);
{
"result" : [
{
"_id" : ObjectId("5053b4547d820880c3469365"),
"answers" : [
{
"content" : "answer1"
},
{
"content" : "2nd answer"
}
],
"size" : 2
},
{
"_id" : ObjectId("5053b46d7d820880c3469366"),
"answers" : [
{
"content" : "answer1"
},
{
"content" : "2nd answer"
},
{
"content" : "The third answer"
}
],
"size" : 3
}
],
"ok" : 1
}

I use $project for this:
db.test.aggregate([
{
$project : { answers_count: {$size: { "$ifNull": [ "$answers", [] ] } } }
},
{
$sort: {"answers_count":1}
}
])
This also includes documents whose answers field is missing or null, counting them as 0.
It has one disadvantage (or sometimes advantage): you have to list every field you need in the $project step yourself.
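As a rough illustration of what that pipeline computes, here is a plain-JavaScript sketch that runs without a database. The sample documents and the helper name answersCount are made up for this example; the logic only mirrors the $ifNull, $size and $sort steps and is not how MongoDB implements them.

```javascript
// Hypothetical sample data: the third document has no answers field at all.
const docs = [
  { _id: 1, answers: [{ content: "answer1" }, { content: "2nd answer" }] },
  { _id: 2, answers: [{ content: "a" }, { content: "b" }, { content: "c" }] },
  { _id: 3 },
];

// Mirrors {$size: {$ifNull: ["$answers", []]}}:
// a missing or null array is replaced by [] and therefore counts as 0.
function answersCount(doc) {
  const answers = doc.answers == null ? [] : doc.answers;
  return answers.length;
}

// Mirrors $project (only _id and the computed field survive)
// followed by $sort on answers_count in ascending order.
const sorted = docs
  .map((doc) => ({ _id: doc._id, answers_count: answersCount(doc) }))
  .sort((a, b) => a.answers_count - b.answers_count);

console.log(sorted);
```

Note how the original answers array is gone from the output, which is the "manually add all needed fields" point made above.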

You can use the MongoDB aggregation stage $addFields, which adds an extra field to store the count, followed by a $sort stage. Unlike $project, it keeps all the other fields of the document.
db.test.aggregate([
{
$addFields: { answers_count: {$size: { "$ifNull": [ "$answers", [] ] } } }
},
{
$sort: {"answers_count":1}
}
])

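You can use the $size operator to order by array length. Note that $size raises an error if answers is missing or not an array; wrap it in $ifNull as in the answers above if that can happen.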
db.getCollection('test').aggregate([
{$project: { "answers": 1, "answer_count": { $size: "$answers" } }},
{$sort: {"answer_count": -1}}])

Related

MongoDB- Subtract

Basically I want to subtract finish-start for each object.
So in the example below, the output should be 2 and 4 ((7-5=2) and (8-4=4)), and then I want to add both values to my existing documents.
How can I do this in MongoDB?
{
"title" : "The Hobbit",
"rating_average" : "???",
"ratings" : [
{
"title" : "best book ever",
"start" : 5,
"finish" : 7
},
{
"title" : "good book",
"start" : 4,
"finish" : 8
}
]
}
Not sure what exactly are the requirements for query output but this might help..
Query:
db.books.aggregate([
{ $unwind: "$ratings" },
{ $addFields: { "ratings.diff": { $subtract: [ "$ratings.finish", "$ratings.start" ] } } }
])
Pipeline Explanation:
Unwind the array because the $subtract cannot act on array data.
Add a new field called diff in the 'ratings' sub documents that has
the calculated value
EDIT:
OP asks for the results to be stored in the same subdocument. After discussions with friends and family I discovered this is possible but only with MongoDB version 4.2 or later. Here is an update statement that can achieve the desired results. This is an update not an aggregation.
Update: (MongoDB 4.2 or later specific)
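db.books.update(
{},
[
{
// $set only overwrites ratings, so title and rating_average are kept
$set: {
ratings: {
$map: {
input: "$ratings",
in: {
// $mergeObjects keeps every existing field of the rating and adds diff
$mergeObjects: [
"$$this",
{ diff: { $subtract: [ "$$this.finish", "$$this.start" ] } }
]
}
}
}
}
}
],
// without multi: true only the first matching document is updated
{ multi: true }
)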
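To make the intended end result concrete, here is a plain-JavaScript sketch of the same per-element computation on the sample document from the question. The helper name withDiffs is hypothetical; it only shows the shape the updated document should have (every rating gains a diff, everything else is kept) and does not talk to MongoDB.

```javascript
// Sample document copied from the question.
const book = {
  title: "The Hobbit",
  rating_average: "???",
  ratings: [
    { title: "best book ever", start: 5, finish: 7 },
    { title: "good book", start: 4, finish: 8 },
  ],
};

// Returns a copy of the document in which every rating has an extra
// diff field (finish - start); all other fields are carried over unchanged.
function withDiffs(doc) {
  return {
    ...doc,
    ratings: doc.ratings.map((rating) => ({
      ...rating,
      diff: rating.finish - rating.start,
    })),
  };
}

const updated = withDiffs(book);
console.log(updated.ratings.map((rating) => rating.diff)); // [ 2, 4 ]
```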

Select latest document after grouping them by a field in MongoDB

I got a question that I would expect to be pretty simple, but I cannot figure it out. What I want to do is this:
Find all documents in a collection and:
sort the documents by a certain date field
apply distinct on one of its other fields, but return the whole document
Best shown in an example.
This is a mock input:
[
{
"commandName" : "migration_a",
"executionDate" : ISODate("1998-11-04T18:46:14.000Z")
},
{
"commandName" : "migration_a",
"executionDate" : ISODate("1970-05-09T20:16:37.000Z")
},
{
"commandName" : "migration_a",
"executionDate" : ISODate("2005-11-08T11:58:52.000Z")
},
{
"commandName" : "migration_b",
"executionDate" : ISODate("2016-06-02T19:48:34.000Z")
}
]
The expected output is:
[
{
"commandName" : "migration_a",
"executionDate" : ISODate("2005-11-08T11:58:52.000Z")
},
{
"commandName" : "migration_b",
"executionDate" : ISODate("2016-06-02T19:48:34.000Z")
}
]
Or, in other words:
Group the input data by the commandName field
Inside each group sort the documents
Return the newest document from each group
My attempts to write this query have failed:
The distinct() function will only return the value of the field I am distinct-ing on, not the whole document. That makes it unsuitable for my case.
I tried writing an aggregate query, but ran into the issue of how to sort and select a single document from inside each group. The $sort aggregation stage will sort the groups among one another, which is not what I want.
I am not too well-versed in Mongo and this is where I hit a wall. Any ideas on how to continue?
For reference, this is the work-in-progress aggregation query I am trying to expand on:
db.getCollection('some_collection').aggregate([
{ $group: { '_id': '$commandName', 'docs': {$addToSet: '$$ROOT'} } },
{ $sort: {'_id.docs.???': 1}}
])
Post-resolved edit
Thank you for the answers. I got what I needed. For future reference, this is the full query that will do what was requested and also return a list of the filtered documents, not groups.
db.getCollection('some_collection').aggregate([
{ $sort: {'executionDate': 1}},
{ $group: { '_id': '$commandName', 'result': { $last: '$$ROOT'} } },
{ $replaceRoot: {newRoot: '$result'} }
])
The query result without the $replaceRoot stage would be:
[
{
"_id": "migration_a",
"result": {
"commandName" : "migration_a",
"executionDate" : ISODate("2005-11-08T11:58:52.000Z")
}
},
{
"_id": "migration_b",
"result": {
"commandName" : "migration_b",
"executionDate" : ISODate("2016-06-02T19:48:34.000Z")
}
}
]
The outer _id and result keys are just "group wrappers" around the actual document I want, which is nested under the result key. Moving the nested document to the root of the result is done using the $replaceRoot stage. The query result when using that stage is:
[
{
"commandName" : "migration_a",
"executionDate" : ISODate("2005-11-08T11:58:52.000Z")
},
{
"commandName" : "migration_b",
"executionDate" : ISODate("2016-06-02T19:48:34.000Z")
}
]
Try this:
db.getCollection('some_collection').aggregate([
{ $sort: {'executionDate': -1}},
{ $group: { '_id': '$commandName', 'doc': {$first: '$$ROOT'} } }
])
I believe this will result in what you're looking for:
db.collection.aggregate([
{
$group: {
"_id": "$commandName",
"executionDate": {
"$last": "$executionDate"
}
}
}
])
You can check it out here
Of course, if you want to match your expected output exactly, you can add a sort (this may not be necessary since your goal is to simply return the newest document from each group):
{
$sort: {
"executionDate": 1
}
}
You can check this version out here.
The use-case the question presents is nearly covered in the $last aggregation operator documentation.
In summary: the $group stage should follow a $sort stage so that the input documents arrive in a defined order, since $last simply picks the last document it sees in each group.
Query:
db.collection.aggregate([
{
$sort: {
executionDate: 1
}
},
{
$group: {
_id: "$commandName",
executionDate: {
$last: "$executionDate"
}
}
}
]);
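The sort-then-pick-last idea can be sketched in plain JavaScript on the mock input from the question. The function name latestPerCommand is hypothetical, and this only illustrates why the sort has to come before the grouping; it is not how the server executes the pipeline.

```javascript
// Mock input from the question, with ISODate replaced by JavaScript Date.
const input = [
  { commandName: "migration_a", executionDate: new Date("1998-11-04T18:46:14.000Z") },
  { commandName: "migration_a", executionDate: new Date("1970-05-09T20:16:37.000Z") },
  { commandName: "migration_a", executionDate: new Date("2005-11-08T11:58:52.000Z") },
  { commandName: "migration_b", executionDate: new Date("2016-06-02T19:48:34.000Z") },
];

function latestPerCommand(docs) {
  // Step 1, like {$sort: {executionDate: 1}}: oldest first (copy, do not mutate).
  const sorted = [...docs].sort((a, b) => a.executionDate - b.executionDate);

  // Step 2, like $group with {$last: "$$ROOT"}: within each group the
  // document seen last overwrites the earlier ones.
  const lastSeen = new Map();
  for (const doc of sorted) {
    lastSeen.set(doc.commandName, doc);
  }

  // Step 3, like $replaceRoot: return the documents themselves, not the groups.
  return [...lastSeen.values()];
}

const latest = latestPerCommand(input);
console.log(latest);
```

Skipping step 1 would make the result depend on whatever order the documents happen to arrive in.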

Remove http/https from documents of mongodb data

How to remove http:// or https:// from the beginning and '/' from the end of the tags.Domain in MongoDB aggregation?
Sample document:
{
"_id" : ObjectId("5d9f074f5833c8cd1f685e05"),
"tags" : [
{
"Domain" : "http://www.google.com",
"rank" : 1
},
{
"Domain" : "https://www.stackoverflow.com/",
"rank" : 2
}
]
}
This assumes the Domain field in tags contains valid URLs with the usual prefixes and suffixes (https, http, //, /, com/, org/, /in).
The $trim operator is used to remove https://, http://, and / from tags.Domain. Its chars argument is treated as a set of individual characters (h, t, p, s, :, /), not as a prefix, and any of them are stripped from both ends of the string.
NOTE: Because of that, this does not work for a value that is already clean but happens to start or end with one of those characters. Example: 'hello.com' would become 'ello.com', 'xyz.ins' would become 'xyz.in', etc.
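The character-set behaviour is easy to reproduce outside MongoDB. The sketch below is plain JavaScript with two hypothetical helpers: trimChars imitates stripping a set of characters from both ends (an approximation of $trim, not its implementation), and stripScheme is a regex-based alternative that only removes a real http:// or https:// prefix and one trailing slash.

```javascript
// Strips any character contained in `chars` from both ends of `input`.
function trimChars(input, chars) {
  const set = new Set(chars);
  let start = 0;
  let end = input.length;
  while (start < end && set.has(input[start])) start++;
  while (end > start && set.has(input[end - 1])) end--;
  return input.slice(start, end);
}

// Removes a leading http:// or https:// and a single trailing slash only.
function stripScheme(url) {
  return url.replace(/^https?:\/\//, "").replace(/\/$/, "");
}

console.log(trimChars("http://www.google.com", "https://"));       // www.google.com
console.log(trimChars("hello.com", "https://"));                   // ello.com
console.log(trimChars("https://stackoverflow.com/", "https://"));  // ackoverflow.com
console.log(stripScheme("https://stackoverflow.com/"));            // stackoverflow.com
console.log(stripScheme("hello.com"));                             // hello.com
```

The third call shows a case the note does not mention: a host that starts with s or t loses those letters too once the scheme has been stripped.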
Aggregation Query:
db.collection.aggregate([
{
$addFields:{
"tags":{
$map:{
"input":"$tags",
"as":"tag",
"in":{
$mergeObjects:[
"$$tag",
{
"Domain":{
$trim: {
"input": "$$tag.Domain",
"chars": "https://"
}
}
}
]
}
}
}
}
}
]).pretty()
Output:(demo)
{
"_id" : 2, //ObjectId
"tags" : [
{
"rank" : 1,
"Domain" : "www.google.com"
},
{
"rank" : 2,
"Domain" : "www.stackoverflow.com"
}
]
}
The solution ended up being longer than I expected it to be (I hope someone can find a more concise solution), but here you go:
db.test.aggregate([
{$unwind:"$tags"}, //unwind tags so that we can separately deal with http and https
{
$facet: {
"https": [{ // the first stage will...
$match: { // only contain documents...
"tags.Domain": /^https.*/ // that match the regex /^https.*/
}
}, {
$addFields: { // for all matching documents...
"tags.Domain": {"$substr": ["$tags.Domain",8,-1]} // we change the tags.Domain field to required substring (skip 8 characters and go on till the last character)
}
}],
"http": [{ // similar as above except we're doing the inverse filter using $not
$match: {
"tags.Domain": { $not: /^https.*/ }
}
}, {
$addFields: { // for all matching documents...
"tags.Domain": {"$substr": ["$tags.Domain",7,-1]} // we change the tags.Domain field to required substring (skip 7 characters and go on till the last character)
}
}
]
}
},
{ $project: { all: { $concatArrays: [ "$https", "$http" ] } } }, //we have two arrays at this point, so we just concatenate them both to have one array called "all"
//unwind and group the array by _id to get the document back in the original format
{$unwind: "$all"},
{$group: {
_id: "$all._id",
tags: {$push: "$all.tags"}
}}
])
For removing the / from the end, you can have another facet with a regex that matches the url (something like /.*\/$/ should work), and use that facet in the concat as well.
With help from: https://stackoverflow.com/a/49660098/5530229 and https://stackoverflow.com/a/44729563/5530229
As dnickless said in the first answer referenced above, as always with the aggregation framework, it may help to remove individual stages from the end of the pipeline and run the partial query in order to get an understanding of what each individual stage does.

MongoDB: How to match on the elements of an array?

I have two collections as follows:
db.qnames.find()
{ "_id" : ObjectId("5a4da53f97a9ca769a15d49e"), "domain" : "mail.google.com", "tldOne" : "google.com", "clients" : 10, "date" : "2016-12-30" }
{ "_id" : ObjectId("5a4da55497a9ca769a15d49f"), "domain" : "mail.google.com", "tldOne" : "google.com", "clients" : 9, "date" : "2017-01-30" }
and
db.dropped.find()
{ "_id" : ObjectId("5a4da4ac97a9ca769a15d49c"), "domain" : "google.com", "dropDate" : "2017-01-01", "regStatus" : 1 }
I would like to join the two collections and choose the documents for which the 'dropDate' field (from the dropped collection) is larger than the 'date' field (from the qnames collection). So I used the following query:
db.dropped.aggregate( [{$lookup:{ from:"qnames", localField:"domain",foreignField:"tldOne",as:"droppedTraffic"}},
{$match: {"droppedTraffic":{$ne:[]} }},
{$unwind: "$droppedTraffic" } ,
{$match: {dropDate:{$gt:"$droppedTraffic.date"}}} ])
but this query does not filter out the records where dropDate < date. Can anyone give me a clue as to why this happens?
There are two separate issues here.
First, the dates are stored as strings. Strings in the YYYY-MM-DD format happen to compare in chronological order, but if you want real date semantics, convert the values in both collections with new ISODate("your existing date in the collection").
Second, and this is why nothing is filtered: a $match stage compares a field against a constant, so "$droppedTraffic.date" is treated as a literal string rather than as a reference to the other field. Even after converting the dates you need to change the aggregate query, because the final $match has to compare two values from the same document.
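A small plain-JavaScript sketch of the literal-string problem, using the values from the question. It assumes plain binary string comparison, which is what JavaScript does and what MongoDB does when no collation is set; the variable names are made up for the illustration.

```javascript
const dropDate = "2017-01-01";
const qnameDates = ["2016-12-30", "2017-01-30"];

// What the original $match effectively did: compare dropDate with the
// literal text "$droppedTraffic.date". "2" sorts after "$", so this is
// true for any date string that starts with a digit.
const literal = "$droppedTraffic.date";
const comparedWithLiteral = qnameDates.map(() => dropDate > literal);
console.log(comparedWithLiteral); // [ true, true ] -> nothing is filtered out

// What was intended: compare dropDate with the actual date of each
// joined document. YYYY-MM-DD strings order chronologically.
const comparedWithField = qnameDates.map((date) => dropDate > date);
console.log(comparedWithField); // [ true, false ]
```

Operators that evaluate aggregation expressions, such as the $filter and $redact approaches below, resolve "$field" references and therefore avoid this problem.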
Sample query to get the desired documents
db.dropped.aggregate([
{$lookup: {
from:"qnames",
localField:"domain",
foreignField:"tldOne",
as:"droppedTraffic"}
},
{$project: {
_id:1, domain:1,
regStatus:1,
droppedTraffic: {
$filter: {
input: "$droppedTraffic",
as:"droppedTraffic",
cond:{ $gt: ["$$droppedTraffic.date", "$dropDate"]}
}
}
}}
])
In this approach given above we have used $filter which avoids the $unwind operation
You should use $redact to compare two fields of the same document. Following example should work:
db.dropped.aggregate( [
{$lookup:{ from:"qnames", localField:"domain",foreignField:"tldOne",as:"droppedTraffic"}},
{$match: {"droppedTraffic":{$ne:[]} }},
{$unwind: "$droppedTraffic" },
{
"$redact": {
"$cond": [
{ "$lte": [ "$dropDate", "$droppedTraffic.date" ] },
"$$KEEP",
"$$PRUNE"
]
}
}
])

Compare a date of two elements

My problem is difficult to explain :
In my website I save every action of my visitors (view, click, buy etc).
I have a simple collection named "flow" where my data is registered
{
"_id" : ObjectId("534d4a9a37e4fbfc0bf20483"),
"profile" : ObjectId("534bebc32939ffd316a34641"),
"activities" : [
{
"id" : ObjectId("534bebc42939ffd316a3af62"),
"date" : ISODate("2013-12-13T22:39:45.808Z"),
"verb" : "like",
"product" : "5"
},
{
"id" : ObjectId("534bebc52939ffd316a3f480"),
"date" : ISODate("2013-12-20T19:19:10.098Z"),
"verb" : "view",
"product" : "6"
},
{
"id" : ObjectId("534bebc32939ffd316a3690f"),
"date" : ISODate("2014-01-01T07:11:44.902Z"),
"verb" : "buy",
"product" : "5"
},
{
"id" : ObjectId("534bebc42939ffd316a3741b"),
"date" : ISODate("2014-01-11T08:49:02.684Z"),
"verb" : "favorite",
"product" : "26"
}
]
}
I would like to aggregate this data to retrieve the number of people who performed one action (for example "view") and then another one later in time (for example "buy"). To do that I need to compare the "date" values inside my "activities" array...
I tried to use the aggregation framework to do that, but I do not see how to write this request.
This is what I have so far:
db.flows.aggregate([
{ $project: { profile: 1, activities: 1, _id: 0 } },
{ $match: { $and: [{'activities.verb': 'view'}, {'activities.verb': 'buy'}] }}, //First verb + second verb
{ $unwind: '$activities' },
{ $match: { 'activities.verb': {$in:['view', 'buy']} } }, //First verb + second verb,
{
$group: {
_id: '$profile',
view: { $push: { $cond: [ { $eq: [ "$activities.verb", "view" ] } , "$activities.date", null ] } },
buy: { $push: { $cond: [ { $eq: [ "$activities.verb", "buy" ] } , "$activities.date", null ] } }
}
}
])
Maybe the format of my collection "flow" is not the best for what I want... If you have a better idea, don't hesitate.
Thank you for your help !
Here is the aggregation that will give you the total number of buyers who viewed first and then bought (though not necessarily the same product that they viewed).
db.flow.aggregate(
{$match: {"activities.verb":{$all:["view","buy"]}}},
{$unwind :"$activities"},
{$match: {"activities.verb":{$in:["view","buy"]}}},
{$group: {
_id:"$_id",
firstViewed:{$min:{$cond:{
if:{$eq:["$activities.verb","view"]},
then : "$activities.date",
else : new Date(9999,0,1)
}}},
lastBought: {$max:{$cond:{
if:{$eq:["$activities.verb","buy"]},
then:"$activities.date",
else:new Date(1900,0,1)}
}}}
},
{$project: {viewedThenBought:{$cond:{
if:{$gt:["$lastBought","$firstViewed"]},
then:1,
else:0
}}}},
{$group:{_id:null,totalViewedThenBought:{$sum:"$viewedThenBought"}}}
)
Here you first pass through the pipeline only the documents that have all the "verbs" you are interested in. When you group the first time, you want to use the earliest "view" and the last "buy" and the next project compares them to see if they viewed before they bought.
The last step gives you the count of all the people who satisfied your criteria.
Be careful to leave out all $project phases that don't actually compute any new fields (like your very first $project). The aggregation framework is smart enough to never pass through any fields that it sees are not used in any later stages, so there is never a need to $project just to "eliminate" fields as that will happen automatically.
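The earliest-view versus latest-buy comparison described above can be sketched in plain JavaScript. The first document reuses the dates from the question; the second one is invented to show a non-matching case, and the function name viewedThenBought is hypothetical.

```javascript
const flows = [
  {
    profile: "534bebc32939ffd316a34641",
    activities: [
      { date: new Date("2013-12-13T22:39:45.808Z"), verb: "like", product: "5" },
      { date: new Date("2013-12-20T19:19:10.098Z"), verb: "view", product: "6" },
      { date: new Date("2014-01-01T07:11:44.902Z"), verb: "buy", product: "5" },
      { date: new Date("2014-01-11T08:49:02.684Z"), verb: "favorite", product: "26" },
    ],
  },
  {
    // Invented profile that bought first and only viewed afterwards.
    profile: "hypothetical-profile",
    activities: [
      { date: new Date("2014-02-01T00:00:00.000Z"), verb: "buy", product: "1" },
      { date: new Date("2014-02-02T00:00:00.000Z"), verb: "view", product: "1" },
    ],
  },
];

function viewedThenBought(doc) {
  // Timestamps (in ms) of all activities with the given verb.
  const timesFor = (verb) =>
    doc.activities.filter((a) => a.verb === verb).map((a) => a.date.getTime());
  const views = timesFor("view");
  const buys = timesFor("buy");

  // Like the first $match: both verbs must be present at all.
  if (views.length === 0 || buys.length === 0) return false;

  // Like $min / $max followed by $gt: latest buy after earliest view.
  return Math.max(...buys) > Math.min(...views);
}

// Like the final $group with $sum: count the matching documents.
const totalViewedThenBought = flows.filter(viewedThenBought).length;
console.log(totalViewedThenBought); // 1
```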
For your query:
I would like to aggregate these data to retrieve the number of people who made an action
Try this:
db.flows.aggregate([
// De-normalize the array into individual documents
{"$unwind" : "$activities"},
// Match for the verbs you are interested in
{"$match" : {"activities.verb":{$in:["buy", "view"]}}},
// Group by verb to get the count
{"$group" : {_id:"$activities.verb", count:{$sum:1}}}
])
The above query would produce an output like:
{
"result" : [
{
"_id" : "buy",
"count" : 1
},
{
"_id" : "view",
"count" : 1
}
],
"ok" : 1
}
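Note: when the conditions are on different fields, the $and operator is not required, because multiple conditions in one query document are combined with a logical AND by default; only a logical OR needs an explicit $or operator. In your query ({ $match: { $and: [{'activities.verb': 'view'}, {'activities.verb': 'buy'}] }}) both conditions test the same field, so they cannot simply be merged into one document, where the second key would overwrite the first. Either keep the $and or use {'activities.verb': {$all: ['view', 'buy']}}.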
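One caveat when dropping $and for two conditions on the same field: a query document is an ordinary object, and an object cannot hold the same key twice. The plain-JavaScript sketch below shows what happens to such a literal before it is ever sent to the server; the variable names are made up for the illustration.

```javascript
// Two conditions on the same key in one object literal:
// the second value silently replaces the first.
const merged = { "activities.verb": "view", "activities.verb": "buy" };
console.log(Object.keys(merged).length); // 1
console.log(merged["activities.verb"]);  // buy

// Two ways of writing the query that keep both conditions.
const withAnd = {
  $and: [{ "activities.verb": "view" }, { "activities.verb": "buy" }],
};
const withAll = { "activities.verb": { $all: ["view", "buy"] } };
console.log(withAnd.$and.length, withAll["activities.verb"].$all.length); // 2 2
```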
If you want to use the date in the aggregation query to do queries like how many "views by day", etc.. the Date Aggregation Operators will come in handy.
I see where you are going with this and I think you are basically on the right track. So here it is more or less unaltered (except for formatting preferences), with a few tweaks at the end:
db.flows.aggregate([
// Try to $match "first" always to make sure you can get an index
{ "$match": {
"$and": [
{"activities.verb": "view"},
{"activities.verb": "buy"}
]
}},
// Don't worry, the optimizer "sees" this and will sort of "blend" it
// with the first stage.
{ "$project": {
"profile": 1,
"activities": 1,
"_id": 0
}},
{ "$unwind": "$activities" },
{ "$match": {
"activities.verb": { "$in":["view", "buy"] }
}},
{ "$group": {
"_id": "$profile",
"view": { "$min": { "$cond": [
{ "$eq": [ "$activities.verb", "view" ] },
"$activities.date",
null
]}},
"buy": { "$max": { "$cond": [
{ "$eq": [ "$activities.verb", "buy" ] },
"$activities.date",
null
]}}
}},
{ "$project": {
"viewFirst": { "$lt": [ "$view", "$buy" ] }
}}
])
So essentially the $min and $max operators should be self-explanatory in this context: you are looking for the "first" view to compare against the "last" purchase. To me it would make more sense to match these by product as well (hint: "grouping"), but I'll leave that part up to you.
The other advantage here is that $min and $max ignore the null placeholders whenever the group contains an actual date for that "verb", and the initial $match makes sure both verbs are present.
That matters because the next thing you do is $project to "compare" the values and ask the question "Did the 'view' happen before the 'buy'?", which is a logical evaluation using the "less than" $lt operator.
As for the schema itself: if you are storing a lot of these "events" then you are probably better off flattening things out into separate documents and finding some way to mark each with the same "session" identifier if that is separate from "profile".
Getting away from large arrays (which this seems to lead to) is likely going to help performance and, with care, makes little difference to the aggregation process.