How to select and count distinct values of an embedded array in a MongoDB collection?

Collection A has:
[
{
name: 'peter',
types: ['human', 'male', 'young']
},
{
name: 'mice',
types: ['male', 'young']
},
{
name: 'hen',
types: ['female', 'old']
}
]
I know how to get all distinct values of types, but how do I get the number of times each one appears? How can I extract that with a Mongo query?
It would be great if you could show the solution using the Doctrine QueryBuilder.
Thanks

With the aggregation framework you can count the appearances of every array element using the query below:
db.collection.aggregate([{
$project : {
_id : 0,
types : 1
}
}, {
$unwind : "$types"
}, {
$group : {
_id : "$types",
count : {
$sum : 1
}
}
}
])
and output:
{
"_id" : "human",
"count" : 1
}, {
"_id" : "old",
"count" : 1
}, {
"_id" : "male",
"count" : 2
}, {
"_id" : "young",
"count" : 2
}, {
"_id" : "female",
"count" : 1
}
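For readers who want to check the logic without a database, here is a minimal in-memory sketch of what the $unwind and $group stages compute. It is plain JavaScript rather than a MongoDB or Doctrine call, and countTypes is a hypothetical helper name introduced only for illustration.

```javascript
// In-memory sketch of the $unwind + $group idea (no database involved).
const typeDocs = [
  { name: "peter", types: ["human", "male", "young"] },
  { name: "mice", types: ["male", "young"] },
  { name: "hen", types: ["female", "old"] },
];

// countTypes is a hypothetical helper, not part of any MongoDB API.
function countTypes(documents) {
  const counts = new Map();
  for (const doc of documents) {
    // Like $unwind: visit every element of the embedded array.
    for (const type of doc.types || []) {
      // Like $group with $sum: 1, keyed on the array element.
      counts.set(type, (counts.get(type) || 0) + 1);
    }
  }
  return [...counts].map(([type, count]) => ({ _id: type, count }));
}

console.log(countTypes(typeDocs));
```

The order of the resulting entries follows first appearance here; the real $group stage makes no ordering promise, so add a $sort stage if order matters.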

Related

How can I split a MongoDB collection into 3 and assign a new field?

I have a json collection with 300 records like this:
{
salesNumber: 23839,
batch: null
},
{
salesNumber: 389230,
batch: null
}
...etc.
I need to divide this collection into 3 different batches. So, when sorted by salesNumber, the first 100 would be in batch 1, the next 100 would be batch 2, and the last 100 would be batch 3. How do I do this?
I wrote a script to select the first 100, but when I tried to turn it into an array to use in an update, the result was 0 records.
var firstBatchCompleteRecords = db.properties.find(
{
"auction": ObjectId("50")
}
).sort("saleNumber").limit(100);
// This returned 174 records as expected with all the fields
var firstBatch = firstBatchCompleteRecords.distinct( "saleNumber", {});
// This returned 0 records
I was going to take the results of that last query and use it in an update statement:
db.properties.update(
{
"saleNumber":
{
"$in": firstBatch
}
}
,
{
$set:
{
batch: "1"
}
}
,
{
multi: true
}
);
...then I would have created an array using distinct of the next 100 and update those, but I never got that far.
It is possible to get the results using the aggregation framework and store them in a new collection; you can then iterate over that collection and update the fields in the source collection.
Have fun!
db.sn.aggregate([{
$sort : {
salesNumber : 1
}
}, {
$group : {
_id : null,
arrayOfData : {
$push : "$$ROOT"
},
}
}, {
$project : {
_id : 0,
firstHundred : {
$slice : ["$arrayOfData", 0, 100]
},
secondHundred : {
$slice : ["$arrayOfData", 100, 100] // positions are 0-based: skip the first 100
},
thirdHundred : {
$slice : ["$arrayOfData", 200, 100] // skip the first 200
},
}
}, {
$project : {
"firstHundred.batch" : {
$literal : 1
},
"firstHundred.salesNumber" : 1,
"firstHundred._id" : 1,
"secondHundred.batch" : {
$literal : 2
},
"secondHundred.salesNumber" : 1,
"secondHundred._id" : 1,
"thirdHundred.batch" : {
$literal : 3
},
"thirdHundred.salesNumber" : 1,
"thirdHundred._id" : 1,
}
}, {
$project : {
allValues : {
$setUnion : ["$firstHundred", "$secondHundred", "$thirdHundred"]
}
}
}, {
$unwind : "$allValues"
}, {
$project : {
_id : "$allValues._id",
salesNumber : "$allValues.salesNumber",
batch : "$allValues.batch",
}
}, {
$out : "collectionName"
}
])
db.collectionName.find()
Output generated for 6 documents split into batches of 2:
{
"_id" : ObjectId("5733ade7eeeccba2bd546121"),
"salesNumber" : 389230,
"batch" : 2
}, {
"_id" : ObjectId("5733ade7eeeccba2bd546120"),
"salesNumber" : 23839,
"batch" : 1
}, {
"_id" : ObjectId("5733ade7eeeccba2bd546122"),
"salesNumber" : 43839,
"batch" : 1
}, {
"_id" : ObjectId("5733ade7eeeccba2bd546124"),
"salesNumber" : 63839,
"batch" : 2
}, {
"_id" : ObjectId("5733ade7eeeccba2bd546123"),
"salesNumber" : 589230,
"batch" : 3
}, {
"_id" : ObjectId("5733ade7eeeccba2bd546125"),
"salesNumber" : 789230,
"batch" : 3
}
Any comments welcome!
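The batching rule itself (sort by salesNumber, then give every consecutive run of N records the same batch number) can be sketched in plain JavaScript as below. This is only an illustration of the arithmetic, assuming the records fit in memory; assignBatches is a hypothetical helper name, and the sample values are the six salesNumber values from the output above.

```javascript
// Sketch: sort records by salesNumber and number the batches from 1.
// assignBatches is a hypothetical helper, not a MongoDB API.
function assignBatches(records, batchSize) {
  return [...records] // copy so the caller's array is not reordered
    .sort((a, b) => a.salesNumber - b.salesNumber)
    .map((rec, index) => ({
      ...rec,
      // Index 0..batchSize-1 -> batch 1, the next batchSize -> batch 2, etc.
      batch: Math.floor(index / batchSize) + 1,
    }));
}

const salesSample = [
  { salesNumber: 389230 },
  { salesNumber: 23839 },
  { salesNumber: 43839 },
  { salesNumber: 63839 },
  { salesNumber: 589230 },
  { salesNumber: 789230 },
];

const batchedSales = assignBatches(salesSample, 2);
console.log(batchedSales);
```

With 300 records and a batch size of 100 the same function would yield batches 1 to 3; each resulting batch value could then be written back with an update keyed on _id.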

MongoDB filtering out subdocuments with lookup aggregation

Our project database has a capped collection called values which gets updated every few minutes with new data from sensors. These sensors all belong to a single sensor node, and I would like to query the last data from these nodes in a single aggregation. The problem I am having is filtering out just the last of ALL the types of sensors while still having only one (efficient) query. I looked around and found the $group argument, but I can't seem to figure out how to use it correctly in this case.
The database is structured as follows:
nodes:
{
"_id": 681,
"sensors": [
{
"type": "foo"
},
{
"type": "bar"
}
]
}
values:
{
"_id" : ObjectId("570cc8b6ac55850d5740784e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"type" : "foo",
"nodeid" : 681,
"value" : 10
}
{
"_id" : ObjectId("190ac8b6ac55850d5740776e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"type" : "bar",
"nodeid" : 681,
"value" : 20
}
{
"_id" : ObjectId("167bc997bb66750d5740665e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"type" : "bar",
"nodeid" : 200,
"value" : 20
}
{
"_id" : ObjectId("110cc9c6ac55850d5740784e"),
"timestamp" : ISODate("2016-04-09T12:06:46.344Z"),
"type" : "foo",
"nodeid" : 681,
"value" : 12
}
So let's say I want the data from node 681; I would want a structure like this:
nodes:
{
"_id": 681,
"sensors": [
{
"_id" : ObjectId("570cc8b6ac55850d5740784e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"type" : "foo",
"nodeid" : 681,
"value" : 10
},
{
"_id" : ObjectId("190ac8b6ac55850d5740776e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"type" : "bar",
"nodeid" : 681,
"value" : 20
}
]
}
Notice how one value of foo is not queried, because I want to only get the latest value possible if there are more than one value (which is always going to be the case). The ordering of the collection is already according to the timestamp because the collection is capped.
I have this query, but it just gets all the values from the database (which is waaay too much to do in a lifetime, let alone one request of the web app), so I was wondering how I would filter it before it gets aggregated.
query:
db.nodes.aggregate(
[
{
$unwind: "$sensors"
},
{
$match:{
nodeid: 681
}
},
{
$lookup:{
from: "values", localField: "sensors.type", foreignField: "type", as: "sensors"
}
}
]
)
Try this
// Pipeline
[
// Stage 1 - sort newest first so that $first picks the latest reading
{
$sort: {
"timestamp":-1
}
},
// Stage 2 - group by type & nodeid, then keep the first (latest) item in each group
{
$group: {
"_id":{type:"$type",nodeid:"$nodeid"},
"sensors": {"$first":"$$CURRENT"} // use $last instead if you sort ascending
}
},
// Stage 3 - project the desired fields
{
$project: {
"_id":"$sensors._id",
"timestamp":"$sensors.timestamp",
"type":"$sensors.type",
"nodeid":"$sensors.nodeid",
"value":"$sensors.value"
}
},
// Stage 4 - group and push it to array sensors
{
$group: {
"_id":{nodeid:"$nodeid"},
"sensors": {"$addToSet":"$$CURRENT"}
}
}
]
As far as I understand the document structure, there is no need to use $lookup, because all the data is in the readings (values) collection.
Please see the proposed solution:
db.readings.aggregate([{
$match : {
nodeid : 681
}
},
{
$group : {
_id : {
type : "$type",
nodeid : "$nodeid"
},
readings : {
$push : {
timestamp : "$timestamp",
value : "$value",
id : "$_id"
}
}
}
}, {
$project : {
_id : "$_id",
readings : {
$slice : ["$readings", -1]
}
}
}, {
$unwind : "$readings"
}, {
$project : {
_id : "$readings.id",
type : "$_id.type",
nodeid : "$_id.nodeid",
timestamp : "$readings.timestamp",
value : "$readings.value",
}
}, {
$group : {
_id : "$nodeid",
sensors : {
$push : {
_id : "$_id",
timestamp : "$timestamp",
value : "$value",
type:"$type"
}
}
}
}
])
and output:
{
"_id" : 681,
"sensors" : [
{
"_id" : ObjectId("110cc9c6ac55850d5740784e"),
"timestamp" : ISODate("2016-04-09T12:06:46.344Z"),
"value" : 12,
"type" : "foo"
},
{
"_id" : ObjectId("190ac8b6ac55850d5740776e"),
"timestamp" : ISODate("2016-04-12T12:06:46.344Z"),
"value" : 20,
"type" : "bar"
}
]
}
Any comments welcome!
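As a cross-check on the "latest reading per sensor type" idea, here is a small in-memory sketch in plain JavaScript using the four sample documents from the question. It compares timestamps explicitly instead of relying on insertion order, so for foo it picks the 2016-04-12 reading; latestPerType is a hypothetical helper name.

```javascript
// Sample readings copied from the question (ObjectIds omitted for brevity).
const sensorValues = [
  { timestamp: new Date("2016-04-12T12:06:46.344Z"), type: "foo", nodeid: 681, value: 10 },
  { timestamp: new Date("2016-04-12T12:06:46.344Z"), type: "bar", nodeid: 681, value: 20 },
  { timestamp: new Date("2016-04-12T12:06:46.344Z"), type: "bar", nodeid: 200, value: 20 },
  { timestamp: new Date("2016-04-09T12:06:46.344Z"), type: "foo", nodeid: 681, value: 12 },
];

// latestPerType is a hypothetical helper, not a MongoDB API.
function latestPerType(readings, nodeid) {
  const latest = new Map();
  for (const reading of readings) {
    if (reading.nodeid !== nodeid) continue; // same role as the $match stage
    const seen = latest.get(reading.type);
    // Keep the reading with the greatest timestamp for each sensor type.
    if (!seen || reading.timestamp > seen.timestamp) {
      latest.set(reading.type, reading);
    }
  }
  return { _id: nodeid, sensors: [...latest.values()] };
}

const node681 = latestPerType(sensorValues, 681);
console.log(JSON.stringify(node681, null, 2));
```

In a pipeline, the equivalent safeguard is an explicit $sort on timestamp before the $group stage, since the order in which $group sees documents is otherwise not something to rely on.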

compare two collection in mongodb

I have two different collections, book and music, in JSON. First, here is an example of the book collection:
{
"_id" : ObjectId("b1"),
"author" : [
"Mary",
],
"title" : "Book1",
}
{
"_id" : ObjectId("b2"),
"author" : [
"Joe",
"Tony",
"Mary"
],
"title" : "Book2",
}
{
"_id" : ObjectId("b3"),
"author" : [
"Joe",
"Mary"
],
"title" : "Book3",
}
.......
Mary wrote 3 books, Joe wrote 2 books, and Tony wrote 1 book. Second, here is an example of the music collection:
{
"_id" : ObjectId("m1"),
"author" : [
"Tony"
],
"title" : "Music1",
}
{
"_id" : ObjectId("m2"),
"author" : [
"Joe",
"Tony"
],
"title" : "Music2",
}
.......
Tony has 2 music entries, Joe has 1, and Mary has 0.
I want to get the number of authors who have written more books than music.
So Mary (3 > 0) and Joe (2 > 1) should be counted, but not Tony (1 < 2). The final result should therefore be 2 (Mary and Joe).
I wrote the following code, but I don't know how to do the comparison:
db.book.aggregate([
{ $project:{ _id:0, author:1}},
{ $unwind:"$author" },
{$group:{_id:"$author", count:{$sum:1}}}
]
)
db.music.aggregate([
{ $project:{ _id:0, author:1}},
{ $unwind:"$author" },
{$group:{_id:"$author", count:{$sum:1}}}
]
)
Is this right so far? How do I do the comparison? Thanks.
To solve this problem, we need to use the $out stage to store the result of both queries in intermediate collections, and then use an aggregation query with $lookup to join them.
db.book.aggregate([{
$project : {
_id : 0,
author : 1
}
}, {
$unwind : "$author"
}, {
$group : {
_id : "$author",
count : {
$sum : 1
}
}
}, {
$project : {
_id : 0,
author : "$_id",
count : 1
}
}, {
$out : "bookAuthors"
}
])
db.music.aggregate([{
$project : {
_id : 0,
author : 1
}
}, {
$unwind : "$author"
}, {
$group : {
_id : "$author",
count : {
$sum : 1
}
}
}, {
$project : {
_id : 0,
author : "$_id",
count : 1
}
}, {
$out : "musicAuthors"
}
])
db.bookAuthors.aggregate([{
$lookup : {
from : "musicAuthors",
localField : "author",
foreignField : "author",
as : "music"
}
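}, {
// keep authors with no music at all (e.g. Mary); a plain $unwind would drop them
$unwind : { path : "$music", preserveNullAndEmptyArrays : true }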
}, {
$project : {
_id : "$author",
result : {
$gt : ["$count", "$music.count"]
},
count : 1,
}
}, {
$match : {
result : true
}
}
])
EDIT CHANGES:
used author field instead of _id
added a logical expression embedded in the document in the $project stage:
result : { $gt : ["$count", "$music.count"] }
Any questions welcome!
Have fun!
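To make the expected answer concrete, the comparison can be sketched in memory with the sample documents from the question. This is plain JavaScript, not a replacement for the pipelines above; countByAuthor and authorsWithMoreBooks are hypothetical helper names, and an author missing from the music collection is treated as having a count of 0.

```javascript
// Sample documents from the question (ObjectIds omitted).
const bookDocs = [
  { author: ["Mary"], title: "Book1" },
  { author: ["Joe", "Tony", "Mary"], title: "Book2" },
  { author: ["Joe", "Mary"], title: "Book3" },
];
const musicDocs = [
  { author: ["Tony"], title: "Music1" },
  { author: ["Joe", "Tony"], title: "Music2" },
];

// Hypothetical helper: same idea as $unwind on author + $group with $sum: 1.
function countByAuthor(items) {
  const counts = new Map();
  for (const item of items) {
    for (const name of item.author) {
      counts.set(name, (counts.get(name) || 0) + 1);
    }
  }
  return counts;
}

// Hypothetical helper: authors whose book count exceeds their music count.
function authorsWithMoreBooks(books, music) {
  const bookCounts = countByAuthor(books);
  const musicCounts = countByAuthor(music);
  return [...bookCounts.keys()].filter(
    (name) => bookCounts.get(name) > (musicCounts.get(name) || 0)
  );
}

const moreBooks = authorsWithMoreBooks(bookDocs, musicDocs);
console.log(moreBooks, moreBooks.length);
```

The final number the question asks for is simply the length of that list; in a pipeline, a closing $group with $sum: 1 would serve the same purpose.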

MongoDB $sum and $avg of sub documents

I need to get the $sum and $avg of subdocuments. I would like to get the $sum and $avg of Channels[0], and of the other channels as well.
My data structure looks like this:
{
_id : ... Location : 1,
Channels : [
{ _id: ...,
Value: 25
},
{
_id: ... ,
Value: 39
},
{
_id: ..,
Value: 12
}
]
}
To get the sum and average of the Channels.Value elements for each document in your collection, you will need to use MongoDB's aggregation framework. Further, since Channels is an array, you will need to use the $unwind operator to deconstruct it.
Assuming that your collection is called example, here's how you could get both the per-document sum and average of Channels.Value:
db.example.aggregate( [
{
"$unwind" : "$Channels"
},
{
"$group" : {
"_id" : "$_id",
"documentSum" : { "$sum" : "$Channels.Value" },
"documentAvg" : { "$avg" : "$Channels.Value" }
}
}
] )
The output from your post's data would be:
{
"_id" : SomeObjectIdValue,
"documentSum" : 76,
"documentAvg" : 25.333333333333332
}
If you have more than one document in your collection then you will see a result row for each document containing a Channels array.
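The arithmetic behind that result can be checked with a short in-memory sketch in plain JavaScript. The document below reuses the three Value entries from the question, with the elided _id fields left out; sumAndAvg is a hypothetical helper name.

```javascript
// One document shaped like the question's data (elided _id fields omitted).
const channelDoc = {
  Location: 1,
  Channels: [{ Value: 25 }, { Value: 39 }, { Value: 12 }],
};

// Hypothetical helper: per-document sum and average of Channels.Value,
// mirroring $unwind followed by $group on the document _id.
function sumAndAvg(doc) {
  const values = doc.Channels.map((channel) => channel.Value);
  const documentSum = values.reduce((total, v) => total + v, 0);
  // Guard against an empty array to avoid dividing by zero.
  const documentAvg = values.length ? documentSum / values.length : null;
  return { documentSum, documentAvg };
}

const channelStats = sumAndAvg(channelDoc);
console.log(channelStats);
```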
Solution 1: Using two groups, based on the example from a previous question:
db.records.aggregate(
[
{ $unwind: "$Channels" },
{ $group: {
_id: {
"loc" : "$Location",
"cId" : "$Channels.Id"
},
"value" : {$sum : "$Channels.Value" },
"average" : {$avg : "$Channels.Value"},
"maximum" : {$max : "$Channels.Value"},
"minimum" : {$min : "$Channels.Value"}
}},
{ $group: {
_id : "$_id.loc",
"ChannelsSummary" : { $push :
{ "channelId" : '$_id.cId',
"value" :'$value',
"average" : '$average',
"maximum" : '$maximum',
"minimum" : '$minimum'
}}
}
}
]
)
Solution 2:
There is a property I didn't show in my original question that might help: "Channels.Id", which is independent of "Channels._id".
db.records.aggregate( [
{
"$unwind" : "$Channels"
},
{
"$group" : {
"_id" : "$Channels.Id",
"documentSum" : { "$sum" : "$Channels.Value" },
"documentAvg" : { "$avg" : "$Channels.Value" }
}
}
] )

mongodb aggregation find min value and other fields in nested array

Is it possible to find the max date in a nested array and show its price, along with a parent field such as the actual price?
I want the result to look like this:
{
"_id" : ObjectId("5547e45c97d8b2c816c994c8"),
"actualPrice":19500,
"lastModifDate" :ISODate("2015-05-04T22:53:50.583Z"),
"price":"16000"
}
The data :
db.adds.findOne()
{
"_id" : ObjectId("5547e45c97d8b2c816c994c8"),
"addTitle" : "Clio pack luxe",
"actualPrice" : 19500,
"fistModificationDate" : ISODate("2015-05-03T22:00:00Z"),
"addID" : "1746540",
"history" : [
{
"price" : 18000,
"modifDate" : ISODate("2015-05-04T22:01:47.272Z"),
"_id" : ObjectId("5547ec4bfeb20b0414e8e51b")
},
{
"price" : 16000,
"modifDate" : ISODate("2015-05-04T22:53:50.583Z"),
"_id" : ObjectId("5547f87e83a1dae00bc033fa")
},
{
"price" : 19000,
"modifDate" : ISODate("2015-04-04T22:53:50.583Z"),
"_id" : ObjectId("5547f87e83a1dae00bc033fe")
}
],
"__v" : 1
}
My query:
db.adds.aggregate(
[
{ $match:{addID:"1746540"}},
{ $unwind:"$history"},
{ $group:{
_id:0,
lastModifDate:{$max:"$history.modifDate"}
}
}
])
I don't know how to include the other fields. I used $project, but I get errors.
Thanks for helping.
You could try the following aggregation pipeline which does not need to make use of the $group operator stage as the $project operator takes care of the fields projection:
db.adds.aggregate([
{
"$match": {"addID": "1746540"}
},
{
"$unwind": "$history"
},
{
"$project": {
"actualPrice": 1,
"lastModifDate": "$history.modifDate",
"price": "$history.price"
}
},
{
"$sort": { "lastModifDate": -1 }
},
{
"$limit": 1
}
])
Output
/* 1 */
{
"result" : [
{
"_id" : ObjectId("5547e45c97d8b2c816c994c8"),
"actualPrice" : 19500,
"lastModifDate" : ISODate("2015-05-04T22:53:50.583Z"),
"price" : 16000
}
],
"ok" : 1
}
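The same selection (take the history entry with the greatest modifDate and carry the parent's actualPrice along) can be sketched in memory as follows. This is plain JavaScript over a trimmed copy of the sample document; latestChange is a hypothetical helper name, and it assumes history is a non-empty array.

```javascript
// Trimmed copy of the sample document from the question.
const adDoc = {
  addID: "1746540",
  actualPrice: 19500,
  history: [
    { price: 18000, modifDate: new Date("2015-05-04T22:01:47.272Z") },
    { price: 16000, modifDate: new Date("2015-05-04T22:53:50.583Z") },
    { price: 19000, modifDate: new Date("2015-04-04T22:53:50.583Z") },
  ],
};

// Hypothetical helper: pick the most recent history entry and keep the
// parent's actualPrice, like $unwind + $sort (descending) + $limit 1.
function latestChange(doc) {
  const last = doc.history.reduce((best, entry) =>
    entry.modifDate > best.modifDate ? entry : best
  );
  return {
    actualPrice: doc.actualPrice,
    lastModifDate: last.modifDate,
    price: last.price,
  };
}

const latestAdChange = latestChange(adDoc);
console.log(latestAdChange);
```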