I've a collection in MongoDB of objects with this structure:
{
"_id": "ID",
"email": "EMAIL",
"name": "Foo",
"surname": "Bar",
"orders": [
{
"createdAt": "2019-09-09T07:30:25.575Z"
},
{
"createdAt": "2019-10-30T14:20:04.849Z"
},
{
"createdAt": "2019-10-30T16:38:27.271Z"
},
{
"createdAt": "2020-01-03T15:49:39.614Z"
},
],
}
I need to count the orders that share the same "createdAt" day and collapse them into one entry per day, with the date reformatted.
The result should be like below:
{
"_id": "ID",
"email": "EMAIL",
"name": "Foo",
"surname": "Bar",
"orders": [
{
"date": "2019-09-09",
"total": 1,
},
{
"date": "2019-10-30",
"total": 2,
},
{
"date": "2020-01-03",
"total": 1,
},
],
}
I tried using $unwind on orders.createdAt in db.collection.aggregate(), but I have no idea how to get this result.
Thanks in advance.
Try this on for size. Given this data:
db.foo.insert([
{
"_id": "ID",
"email": "EMAIL", "name": "Foo", "surname": "Bar",
"orders": [
{ "createdAt": new Date("2019-09-09T07:30:25.575Z") },
{ "createdAt": new Date("2019-10-30T14:20:04.849Z") },
{ "createdAt": new Date("2019-10-30T16:38:27.271Z") },
{ "createdAt": new Date("2020-01-03T15:49:39.614Z") }
]
},
{
"_id": "ID2",
"email": "EMAIL2", "name": "Bin", "surname": "Baz",
"orders": [
{ "createdAt": new Date("2019-09-09T07:30:25.575Z") },
{ "createdAt": new Date("2020-10-30T14:20:04.849Z") },
{ "createdAt": new Date("2020-10-30T16:38:27.271Z") },
{ "createdAt": new Date("2020-10-30T15:49:39.614Z") }
]
}
]);
This agg:
db.foo.aggregate([
{$unwind: "$orders"}
// First $group is on just the Y-M-D part of the date plus the id.
// This will produce the basic info the OP seeks -- but not in the desired
// data structure:
,{$group: {
_id: {orig_id: "$_id", d: {$dateToString: {date: "$orders.createdAt", format: "%Y-%m-%d"}} },
n:{$sum:1} ,
email: {$first: "$email"},
name: {$first: "$name"},
surname: {$first: "$surname"}
}}
// The group is not guaranteed to preserve the order of the dates. So now that
// the basic agg is done, reorder by DATE. _id.d is a Y-M-D string but fortunately
// that sorts correctly for our purposes:
,{$sort: {"_id.d":1}}
// ...so in the second $group, we pluck just the id from the id+YMD_date key and
// take the YMD_date+n and *push* it onto a new orders array to arrive at the
// desired data structure. We are not guaranteed the order of orig_id (e.g.
// ID or ID2) but for each id, the push *will* happen in the order of arrival -- which was
// sorted correctly in the prior stage! As an experiment, try changing the
// sort to -1 (reverse) and see what happens.
,{$group: {_id: "$_id.orig_id",
email: {$first: "$email"},
name: {$first: "$name"},
surname: {$first: "$surname"},
orders: {$push: {date: "$_id.d", total: "$n"}} }}
]);
yields this output:
{
"_id" : "ID",
"email" : "EMAIL",
"name" : "Foo",
"surname" : "Bar",
"orders" : [
{
"date" : "2019-09-09",
"total" : 1
},
{
"date" : "2019-10-30",
"total" : 2
},
{
"date" : "2020-01-03",
"total" : 1
}
]
}
{
"_id" : "ID2",
"email" : "EMAIL2",
"name" : "Bin",
"surname" : "Baz",
"orders" : [
{
"date" : "2019-09-09",
"total" : 1
},
{
"date" : "2020-10-30",
"total" : 3
}
]
}
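If it helps to see what the unwind / group / sort / group steps are doing, here is a minimal plain-JavaScript sketch of the same logic that runs without MongoDB. The helper name summarizeOrders and the sample array are made up for illustration. One difference to be aware of: unlike $unwind, this sketch keeps documents whose orders array is empty or missing.

```javascript
// Hypothetical helper (not a MongoDB API): per document, count orders per UTC day.
function summarizeOrders(docs) {
  return docs.map(({ orders = [], ...rest }) => {
    const counts = new Map();
    for (const { createdAt } of orders) {
      // Same idea as $dateToString with "%Y-%m-%d": keep only the UTC calendar day.
      const day = new Date(createdAt).toISOString().slice(0, 10);
      counts.set(day, (counts.get(day) || 0) + 1);
    }
    // Y-M-D strings sort chronologically, which is what the $sort stage relies on.
    const summary = [...counts.keys()]
      .sort()
      .map((date) => ({ date, total: counts.get(date) }));
    return { ...rest, orders: summary };
  });
}

// Sample data copied from the question (assumed shape).
const sample = [
  {
    _id: "ID", email: "EMAIL", name: "Foo", surname: "Bar",
    orders: [
      { createdAt: "2019-09-09T07:30:25.575Z" },
      { createdAt: "2019-10-30T14:20:04.849Z" },
      { createdAt: "2019-10-30T16:38:27.271Z" },
      { createdAt: "2020-01-03T15:49:39.614Z" },
    ],
  },
];

console.log(JSON.stringify(summarizeOrders(sample), null, 2));
```

The other top-level fields (email, name, surname) are carried over with a rest spread, so they do not have to be listed one by one as in the $first accumulators.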
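If you are willing to accept a slightly more complex return structure and some duplicated data in return for more dynamic behavior, i.e. not having to enumerate each field (e.g. field: {$first: "$field"}), then you can do this. Note that this version has no $sort stage, so the orders array is not guaranteed to come back in date order: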
db.foo.aggregate([
{$unwind: "$orders"}
,{$group: {
_id: {orig_id: "$_id", d: {$dateToString: {date: "$orders.createdAt", format: "%Y-%m-%d"}} },
n:{$sum:1} ,
ALL: {$first: "$$CURRENT"}
}}
,{$group: {_id: "$_id.orig_id",
ALL: {$first: "$ALL"},
orders: {$push: {date: "$_id.d", total: "$n"}} }}
]);
to yield this:
{
"_id" : "ID2",
"ALL" : {
"_id" : "ID2",
"email" : "EMAIL2",
"name" : "Bin",
"surname" : "Baz",
"orders" : {
"createdAt" : ISODate("2019-09-09T07:30:25.575Z")
}
},
"orders" : [
{
"date" : "2019-09-09",
"total" : 1
},
{
"date" : "2020-10-30",
"total" : 3
}
]
}
{
"_id" : "ID",
"ALL" : {
"_id" : "ID",
"email" : "EMAIL",
"name" : "Foo",
"surname" : "Bar",
"orders" : {
"createdAt" : ISODate("2019-10-30T14:20:04.849Z")
}
},
"orders" : [
{
"date" : "2019-10-30",
"total" : 2
},
{
"date" : "2020-01-03",
"total" : 1
},
{
"date" : "2019-09-09",
"total" : 1
}
]
}
I have some data within a mongodb collection which looks like this:
[
{
"name" : "Apple",
"quantity" : "4",
},
{
"name" : "Apple",
"quantity" : "6",
},
{
"name" : "Orange",
"quantity" : "2",
},
{
"name" : "Orange",
"quantity" : "3",
},
]
I am trying to figure out a MongoDB query, and then its Mongoose counterpart, that uses $sum to get each unique name with its summed quantity. The correct output of the query should look like this:
[
{
name: "Apple",
totalQuantity: "10"
},
{
name: "Orange",
totalQuantity: "5"
}
]
The $group stage groups documents by the specified field:
$group by name
$toInt converts the string quantity to an integer, and $sum adds it into totalQuantity
db.collection.aggregate([
{
$group: {
_id: "$name",
totalQuantity: {
$sum: { $toInt: "$quantity" }
}
}
},
{
$project: {
_id: 0,
name: "$_id",
totalQuantity: 1
}
}
])
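For anyone who wants to check the arithmetic without a database, here is a small plain-JavaScript sketch of the same grouping. The helper name totalByName is hypothetical, and parseInt only stands in for $toInt. As with the pipeline, the totals come out as numbers rather than the strings shown in the question.

```javascript
// Hypothetical helper mirroring $group by name with $sum over $toInt(quantity).
function totalByName(items) {
  const totals = new Map();
  for (const { name, quantity } of items) {
    // parseInt plays the role of $toInt; base 10 is stated explicitly.
    totals.set(name, (totals.get(name) || 0) + parseInt(quantity, 10));
  }
  // Reshape like the $project stage: name + totalQuantity, no _id.
  return [...totals].map(([name, totalQuantity]) => ({ name, totalQuantity }));
}

// Sample data copied from the question.
const fruit = [
  { name: "Apple", quantity: "4" },
  { name: "Apple", quantity: "6" },
  { name: "Orange", quantity: "2" },
  { name: "Orange", quantity: "3" },
];

console.log(totalByName(fruit));
```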
I have a query question about MongoDB.
There are two collections in my database, named status and menu.
The primary key _id in status is the foreign key referenced by the values in the bought list of the menu collection.
For status collection:
{
"_id": "green", "description": "running"
}
{
"_id": "yellow", "description": "prepareing"
}
{
"_id": "black", "description": "closing"
}
{
"_id": "red", "description": "repairing"
}
For menu collection:
{
"name": "tony",
"bought": [
{
"notebook": "green"
},
{
"cellphone": "red"
}
]
}
{
"name": "andy",
"bought": [
{
"fan": "black"
}
]
}
How can I query to get the following answer?
(Just replace description for _id)
{
"name": "tony",
"bought": [
{
"notebook": "running"
},
{
"cellphone": "repairing"
}
]
}
Is this a subquery problem in NoSQL? What keywords should I search for?
Here is a version using aggregate:
We start with an $unwind stage to put each bought entry in a separate document.
Then $objectToArray normalizes the bought field into key/value pairs.
We can then perform a $lookup to join on status.
Then we use $group to regroup by name.
And $arrayToObject converts bought back to its key/value object style.
> db.menu.find()
{ "_id" : ObjectId("5a102b0b49b317e3f8d6268b"), "name" : "tony", "bought" : [ { "notebook" : "green" }, { "cellphone" : "red" } ] }
{ "_id" : ObjectId("5a102b0b49b317e3f8d6268c"), "name" : "andy", "bought" : [ { "fan" : "black" } ] }
> db.status.find()
{ "_id" : "green", "description" : "running" }
{ "_id" : "yellow", "description" : "prepareing" }
{ "_id" : "black", "description" : "closing" }
{ "_id" : "red", "description" : "repairing" }
> db.menu.aggregate([
{$unwind: '$bought'},
{$project: {name: 1, bought: {$objectToArray: '$bought'}}}, {$unwind: '$bought'},
{$lookup: {from: 'status', localField: 'bought.v', foreignField: '_id', as: "status"}},
{$project: {name: 1, bought: ["$bought.k", { $arrayElemAt: ["$status.description", 0]}]}},
{$addFields: {b: {v: {$arrayElemAt: ['$bought', 1]}, k: { $arrayElemAt: ['$bought', 0]}}}},
{$group: {_id: { name: '$name', _id: "$_id"}, b: {$push: "$b"}}},
{$project: {_id: "$_id._id", name: "$_id.name", bought: {$arrayToObject: "$b"}}}
])
{ "_id" : ObjectId("5a102b0b49b317e3f8d6268c"), "name" : "andy", "bought" : { "fan" : "closing" } }
{ "_id" : ObjectId("5a102b0b49b317e3f8d6268b"), "name" : "tony", "bought" : { "notebook" : "running", "cellphone" : "repairing" } }
I think it can be done in a simpler way, but I don't know how (and I would be glad to know).
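Not a simpler pipeline, but if both collections are small enough to load into the application, the same join can be sketched in plain JavaScript. This is only an illustration under that assumption: resolveBought is a made-up helper name, and unknown status ids are left untouched as a design choice. Unlike the aggregate output above, it keeps bought as an array of single-key objects, which is the shape the question asked for.

```javascript
// Hypothetical client-side join: replace each status id in `bought` with its description.
function resolveBought(menuDocs, statusDocs) {
  // Index status documents by _id for constant-time lookups.
  const descriptionById = new Map(statusDocs.map((s) => [s._id, s.description]));
  return menuDocs.map((doc) => ({
    ...doc,
    bought: doc.bought.map((item) =>
      Object.fromEntries(
        // Keep the key (e.g. "notebook"); swap the value if a description exists.
        Object.entries(item).map(([k, v]) => [k, descriptionById.get(v) ?? v])
      )
    ),
  }));
}

// Sample data copied from the question (spelling of descriptions kept as-is).
const statusDocs = [
  { _id: "green", description: "running" },
  { _id: "yellow", description: "prepareing" },
  { _id: "black", description: "closing" },
  { _id: "red", description: "repairing" },
];
const menuDocs = [
  { name: "tony", bought: [{ notebook: "green" }, { cellphone: "red" }] },
  { name: "andy", bought: [{ fan: "black" }] },
];

console.log(JSON.stringify(resolveBought(menuDocs, statusDocs), null, 2));
```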
OK I am very new to Mongo, and I am already stuck.
The DB has the following structure (much simplified, of course):
{
{
"_id" : ObjectId("57fdfbc12dc30a46507044ec"),
"keyterms" : [
{
"score" : "2",
"value" : "AA",
},
{
"score" : "2",
"value" : "AA",
},
{
"score" : "4",
"value" : "BB",
},
{
"score" : "3",
"value" : "CC",
}
]
},
{
"_id" : ObjectId("57fdfbc12dc30a46507044ef"),
"keyterms" : [
...
There are some objects. Each object has an array "keyterms", and each entry of that array has a score and a value. There are some duplicates, though (not really, since in the real DB the keyterms entries have many more fields, but with respect to value and score they are duplicates).
Now I need a query, which
selects one object by id
groups its keyterms by value
and counts the duplicates
sorts them by score
So I want to have something like this as the result:
// for Object 57fdfbc12dc30a46507044ec
"keyterms": [
{
"score" : "4",
"value" : "BB",
"count" : 1
},
{
"score" : "3",
"value" : "CC",
"count" : 1
},
{
"score" : "2",
"value" : "AA",
"count" : 2
}
]
In SQL I would have written something like this
select
score, value, count(*) as count
from
all_keywords_table_or_some_join
group by
value
order by
score
But, sadly enough, it's not SQL.
In Mongo I managed to write this:
db.getCollection('tests').aggregate([
{$match: {'_id': ObjectId('57fdfbc12dc30a46507044ec')}},
{$unwind: "$keyterms"},
{$sort: {"keyterms.score": -1}},
{$group: {
'_id': "$_id",
'keyterms': {$push: "$keyterms"}
}},
{$project: {
'keyterms.score': 1,
'keyterms.value': 1
}}
])
But there is something missing: the grouping of the keyterms by their value. I cannot shake the feeling that this is the wrong approach altogether. How can I select the keyterms array, continue with just that, and use an aggregate function only on it? That would be easy.
BTW I read this
(Mongo aggregate nested array)
but I can't figure it out for my example unfortunately...
You'd want an aggregation pipeline where, after you $unwind the array, you group the flattened documents by the array's value and score keys, aggregate the counts using the $sum accumulator operator, and retain the main document's _id with the $first operator.
After sorting by score, a second $group stage should then group the documents from the previous stage by that _id so as to preserve the original schema, recreating the keyterms array with the $push operator.
The following pipeline demonstrates the above aggregation operation:
db.tests.aggregate([
{ "$match": { "_id": ObjectId("57fdfbc12dc30a46507044ec") } },
{ "$unwind": "$keyterms" },
{
"$group": {
"_id": {
"value": "$keyterms.value",
"score": "$keyterms.score"
},
"doc_id": { "$first": "$_id" },
"count": { "$sum": 1 }
}
},
{ "$sort": {"_id.score": -1 } },
{
"$group": {
"_id": "$doc_id",
"keyterms": {
"$push": {
"value": "$_id.value",
"score": "$_id.score",
"count": "$count"
}
}
}
}
])
Sample Output
{
"_id" : ObjectId("57fdfbc12dc30a46507044ec"),
"keyterms" : [
{
"value" : "BB",
"score" : "4",
"count" : 1
},
{
"value" : "CC",
"score" : "3",
"count" : 1
},
{
"value" : "AA",
"score" : "2",
"count" : 2
}
]
}
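As a rough cross-check of the grouping logic, here is a plain-JavaScript sketch that can be run without MongoDB. The helper name countKeyterms is made up for illustration. One assumption that differs from the pipeline: because the scores are stored as strings, a $sort on them compares lexicographically (so "10" would sort below "2"), while this sketch converts them to numbers before sorting.

```javascript
// Hypothetical helper: group keyterms by (value, score), count duplicates,
// then order by score descending.
function countKeyterms(keyterms) {
  const groups = new Map();
  for (const { value, score } of keyterms) {
    // Compound key, playing the role of the {value, score} _id in $group.
    const key = JSON.stringify([value, score]);
    const entry = groups.get(key) || { value, score, count: 0 };
    entry.count += 1;
    groups.set(key, entry);
  }
  // Assumption: scores are numeric strings, so compare them as numbers.
  return [...groups.values()].sort((a, b) => Number(b.score) - Number(a.score));
}

// Sample data copied from the question.
const keyterms = [
  { score: "2", value: "AA" },
  { score: "2", value: "AA" },
  { score: "4", value: "BB" },
  { score: "3", value: "CC" },
];

console.log(countKeyterms(keyterms));
```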
Meanwhile, I solved it myself:
aggregate([
{$match: {'_id': ObjectId('57fdfbc12dc30a46507044ec')}},
{$unwind: "$keyterms"},
{$sort: {"keyterms.score": -1}},
{$group: {
'_id': "$keyterms.value",
'keyterms': {$push: "$keyterms"},
'escore': {$first: "$keyterms.score"},
'evalue': {$first: "$keyterms.value"}
}},
{$limit: 15},
{$project: {
"score": "$escore",
"value": "$evalue",
"count": {$size: "$keyterms"}
}}
])
Is there an easy solution in MongoDB to find some objects that match a query and then to modify the result without modifying the persistent data depending on if a certain value is contained in an array?
Let me explain using an example:
students = [
{
name: "Alice",
age: 25,
courses: [ { name: "Databases", credits: 6 },{ name: "Java", credits: 4 }]
},
{
name: "Bob",
age: 22,
courses: [ { name: "Java", credits: 4 } ]
},
{
name: "Carol",
age: 19,
courses: [ { name: "Databases", credits: 6 } ]
},
{
name: "Dave", age: 18
}
]
Now, I want to query all students. The result should return all their data except 'courses'. Instead, I want to output a flag 'participant' indicating whether that person participates in the Databases course:
result = [
{ name: "Alice", age: 25, participant: 1 },
{ name: "Bob", age: 22, participant: 0 },
{ name: "Carol", age: 19, participant: 1 },
{ name: "Dave", age: 18, participant: 0}
]
without changing anything in the database.
I've already found a solution using aggregate, but it's complicated and unwieldy, so I would like to know if there is a handier solution for this problem.
My current solution looks like the following:
db.students.aggregate([
{$project: {"courses": {$ifNull: ["$courses", [{name: 0}]]}, name: 1, _id: 1, age: 1}},
{$unwind: "$courses"},
{$project: {name: 1, age: 1, participant: {$cond: [{$eq: ["$courses.name", "Databases"]}, 1, 0]}}},
{$group: {_id: {_id: "$_id", age: "$age", name: "$name"}, participant: {$sum: "$participant"}}},
{$project: {_id: "$_id._id", age: "$_id.age", name: "$_id.name", participant: 1}}
]);
One point I don't like in this solution is that I have to specify the output fields exactly three times. Also, this pipeline is quite long.
Run the following aggregation pipeline to get the desired result:
db.students.aggregate([
{
"$project": {
"name": 1,
"age": 1,
"participant": {
"$size": {
"$ifNull" : [
{
"$setIntersection" : [
{
"$map": {
"input": "$courses",
"as": "el",
"in": {
"$eq": [ "$$el.name", "Databases" ]
}
}
},
[true]
]
},
[]
]
}
}
}
}
])
Output:
{
"result" : [
{
"_id" : ObjectId("564f1bb67d3c273d063cd216"),
"name" : "Alice",
"age" : 25,
"participant" : 1
},
{
"_id" : ObjectId("564f1bb67d3c273d063cd217"),
"name" : "Bob",
"age" : 22,
"participant" : 0
},
{
"_id" : ObjectId("564f1bb67d3c273d063cd218"),
"name" : "Carol",
"age" : 19,
"participant" : 1
},
{
"_id" : ObjectId("564f1bb67d3c273d063cd219"),
"name" : "Dave",
"age" : 18,
"participant" : 0
}
],
"ok" : 1
}
The above pipeline uses only one stage, $project, in which the new field participant is created via a series of nested operators.
Crucial to the operation is the deeply nested $map operator, which in essence creates a new array holding the result of evaluating a subexpression against each element of the input array. Let's demonstrate this by executing the pipeline with just the $map part:
db.students.aggregate([
{
"$project": {
"name": 1,
"age": 1,
"participant": {
"$map": {
"input": "$courses",
"as": "el",
"in": {
"$eq": [ "$$el.name", "Databases" ]
}
}
}
}
}
])
Output
{
"result" : [
{
"_id" : ObjectId("564f1bb67d3c273d063cd216"),
"name" : "Alice",
"age" : 25,
"participant" : [
true,
false
]
},
{
"_id" : ObjectId("564f1bb67d3c273d063cd217"),
"name" : "Bob",
"age" : 22,
"participant" : [
false
]
},
{
"_id" : ObjectId("564f1bb67d3c273d063cd218"),
"name" : "Carol",
"age" : 19,
"participant" : [
true
]
},
{
"_id" : ObjectId("564f1bb67d3c273d063cd219"),
"name" : "Dave",
"age" : 18,
"participant" : null
}
],
"ok" : 1
}
Probe the array further by introducing the $setIntersection operator, which returns a set with the elements that appear in all of the input sets. Intersecting with [true] yields an array containing true when the student participates in the Databases course; otherwise it yields an empty array, or null when courses is missing. Let's see how adding that operator affects the previous result:
db.students.aggregate([
{
"$project": {
"name": 1,
"age": 1,
"participant": {
"$setIntersection" : [
{
"$map": {
"input": "$courses",
"as": "el",
"in": {
"$eq": [ "$$el.name", "Databases" ]
}
}
},
[true]
]
}
}
}
])
Output:
{
"result" : [
{
"_id" : ObjectId("564f1bb67d3c273d063cd216"),
"name" : "Alice",
"age" : 25,
"participant" : [
true
]
},
{
"_id" : ObjectId("564f1bb67d3c273d063cd217"),
"name" : "Bob",
"age" : 22,
"participant" : []
},
{
"_id" : ObjectId("564f1bb67d3c273d063cd218"),
"name" : "Carol",
"age" : 19,
"participant" : [
true
]
},
{
"_id" : ObjectId("564f1bb67d3c273d063cd219"),
"name" : "Dave",
"age" : 18,
"participant" : null
}
],
"ok" : 1
}
To handle nulls, apply the $ifNull operator, the equivalent of COALESCE in SQL, to substitute null values with an empty array:
db.students.aggregate([
{
"$project": {
"name": 1,
"age": 1,
"participant": {
"$ifNull" : [
{
"$setIntersection" : [
{
"$map": {
"input": "$courses",
"as": "el",
"in": {
"$eq": [ "$$el.name", "Databases" ]
}
}
},
[true]
]
},
[]
]
}
}
}
])
After this you can wrap the $ifNull expression in the $size operator to return the number of elements in the participant array, which yields the final output shown above.
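To make the operator chain easier to follow, here is a plain-JavaScript sketch in which each step mirrors one operator from the pipeline. It is only an emulation for illustration; participantFlag is a made-up name, and the null for a missing courses field follows the $map output shown earlier for Dave.

```javascript
// Hypothetical emulation of $map -> $setIntersection -> $ifNull -> $size.
function participantFlag(student, courseName) {
  // $map: one boolean per course; a missing courses field gives null.
  const mapped = Array.isArray(student.courses)
    ? student.courses.map((el) => el.name === courseName)
    : null;
  // $setIntersection with [true]: [true] if any element matched, else [].
  const intersected =
    mapped === null ? null : mapped.includes(true) ? [true] : [];
  // $ifNull: substitute an empty array for null.
  const safe = intersected ?? [];
  // $size: 1 when the course was found, otherwise 0.
  return safe.length;
}

// Sample data copied from the question.
const students = [
  { name: "Alice", age: 25, courses: [{ name: "Databases", credits: 6 }, { name: "Java", credits: 4 }] },
  { name: "Bob", age: 22, courses: [{ name: "Java", credits: 4 }] },
  { name: "Carol", age: 19, courses: [{ name: "Databases", credits: 6 }] },
  { name: "Dave", age: 18 },
];

for (const s of students) {
  console.log(s.name, participantFlag(s, "Databases"));
}
```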
Based on what you said about the small number of objects, how about simply pulling the documents out of the database and using JavaScript map to transform them? You're not saving much in terms of transfer and the code will be way more readable than the pipeline.
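A minimal sketch of that client-side approach might look like the following. It assumes the documents have already been fetched into an array (in the shell, for example, with db.students.find().toArray()); withParticipantFlag is a hypothetical helper name, and nothing is written back to the database.

```javascript
// Hypothetical helper: drop `courses` and add a 0/1 participant flag.
function withParticipantFlag(students, courseName) {
  return students.map(({ courses = [], ...rest }) => ({
    ...rest,
    // `some` stops at the first matching course; a missing array counts as none.
    participant: courses.some((c) => c.name === courseName) ? 1 : 0,
  }));
}

// Sample data copied from the question, standing in for the fetched documents.
const fetched = [
  { name: "Alice", age: 25, courses: [{ name: "Databases", credits: 6 }, { name: "Java", credits: 4 }] },
  { name: "Bob", age: 22, courses: [{ name: "Java", credits: 4 }] },
  { name: "Carol", age: 19, courses: [{ name: "Databases", credits: 6 }] },
  { name: "Dave", age: 18 },
];

console.log(withParticipantFlag(fetched, "Databases"));
```

The output fields never have to be listed by name here, which was one of the complaints about the original pipeline.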