Aggregate distinct values in MongoDB - mongodb

I have a MongoDB collection with 18625 documents. Each document has the following keys:
"_id" : ObjectId("5aab14d2fc08b46adb79d99c"),
"game_id" : NumberInt(4),
"score_phrase" : "Great",
"title" : "NHL 13",
"url" : "/games/nhl-13/ps3-128181",
"platform" : "PlayStation 3",
"score" : 8.5,
"genre" : "Sports",
"editors_choice" : "N",
"release_year" : NumberInt(2012),
"release_month" : NumberInt(9),
"release_day" : NumberInt(11)
Now, I wish to create another dimension collection containing only the genres.
If I use the following query:
db.ign.aggregate([ {$project: {"genre":1}}, { $out: "dimen_genre" } ]);
it generates 18625 documents, even though there are only 113 distinct genres.
How can I apply distinct here and get a genre collection with only the 113 distinct values?
I googled, but the results suggested that aggregate and distinct don't work together in Mongo.
I also tried: db.dimen_genre.distinct('genre').length
This showed that dimen_genre contains 113 distinct genres.
To be precise: how do I make a collection from an existing one with only the distinct values?
I am really new to NoSQL databases.

You can use $addToSet to group unique values in one document and then $unwind to get back multiple docs:
db.ign.aggregate([
{
$group: {
_id: null,
genre: { $addToSet: "$genre" }
}
},
{
$unwind: "$genre"
},
{
$project: {
_id: 0
}
},
{ $out: "dimen_genre" }
]);
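To make the effect of those stages concrete, here is a minimal plain-JavaScript sketch of what $addToSet followed by $unwind computes. It runs on a small in-memory array rather than on MongoDB; the sample documents and the variable names are made up for illustration.

```javascript
// Hypothetical in-memory stand-in for the ign collection (not real data).
const ign = [
  { title: "NHL 13", genre: "Sports" },
  { title: "Game B", genre: "Action" },
  { title: "Game C", genre: "Sports" },
];

// $group with $addToSet: collect every genre once into a single set.
const genreSet = new Set(ign.map((doc) => doc.genre));

// $unwind + $project: emit one { genre } document per distinct value.
const dimenGenre = [...genreSet].map((genre) => ({ genre }));

console.log(dimenGenre.length); // number of distinct genres
```

Three input documents collapse to two output documents here, which is the same reduction the pipeline performs from 18625 documents to 113.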

You can try
db.names.aggregate(
[
{ $group : { _id : "$genre", books: { $push: "$$ROOT" } } }
]
)
I tried it with Test and Sports as genres.
It gives output something like this:
{
"_id" : "Test",
"books" : [
{
"_id" : ObjectId("5aaea6150cc1403ee9a02e0c"),
"game_id" : 4,
"score_phrase" : "Great",
"title" : "NHL 13",
"url" : "/games/nhl-13/ps3-128181",
"platform" : "PlayStation 3",
"score" : 8.5,
"genre" : "Test",
"editors_choice" : "N",
"release_year" : 2012,
"release_month" : 9,
"release_day" : 11
}
]
}
/* 2 */
{
"_id" : "Sports",
"books" : [
{
"_id" : ObjectId("5aaea3be0cc1403ee9a02d97"),
"game_id" : 4,
"score_phrase" : "Great",
"title" : "NHL 13",
"url" : "/games/nhl-13/ps3-128181",
"platform" : "PlayStation 3",
"score" : 8.5,
"genre" : "Sports",
"editors_choice" : "N",
"release_year" : 2012,
"release_month" : 9,
"release_day" : 11
},
{
"_id" : ObjectId("5aaea3c80cc1403ee9a02d9b"),
"game_id" : 4,
"score_phrase" : "Great",
"title" : "NHL 13",
"url" : "/games/nhl-13/ps3-128181",
"platform" : "PlayStation 3",
"score" : 8.5,
"genre" : "Sports",
"editors_choice" : "N",
"release_year" : 2012,
"release_month" : 9,
"release_day" : 11
},
{
"_id" : ObjectId("5aaea3cf0cc1403ee9a02d9f"),
"game_id" : 4,
"score_phrase" : "Great",
"title" : "NHL 13",
"url" : "/games/nhl-13/ps3-128181",
"platform" : "PlayStation 3",
"score" : 8.5,
"genre" : "Sports",
"editors_choice" : "N",
"release_year" : 2012,
"release_month" : 9,
"release_day" : 11
}
]
}

Related

Mongodb aggregate match value in array

I'm working with the restaurants db in Mongo:
{
"_id" : ObjectId("5c66fcf59e184ea712adfba6"),
"address" : {
"building" : "97-22",
"coord" : [
-73.8601152,
40.7311739
],
"street" : "63 Road",
"zipcode" : "11374"
},
"borough" : "Queens",
"cuisine" : "Jewish/Kosher",
"grades" : [
{
"date" : ISODate("2014-11-24T00:00:00.000Z"),
"grade" : "Z",
"score" : 20
},
{
"date" : ISODate("2013-01-17T00:00:00.000Z"),
"grade" : "A",
"score" : 13
},
{
"date" : ISODate("2012-08-02T00:00:00.000Z"),
"grade" : "A",
"score" : 13
},
{
"date" : ISODate("2011-12-15T00:00:00.000Z"),
"grade" : "B",
"score" : 25
}
],
"name" : "Tov Kosher Kitchen",
"restaurant_id" : "40356068"
}
I'm trying to filter with $match in an aggregation. I want to check whether any score in grades is greater than 5:
db.runCommand({
aggregate: "restaurants",
pipeline : [
{$match: {"grades": {$anyElementTrue: {"score": {$gt:5}}}}}
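but I'm getting this error:
"errmsg" : "unknown operator: $anyElementTrue",
Thanks.
Try $elemMatch instead ($anyElementTrue is an aggregation expression operator, not a query operator, so $match does not recognize it here):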
db.restaurants.aggregate([{$match: {"grades": {$elemMatch: {"score": {$gt:5}}}}}])
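As a rough illustration of what $elemMatch checks, the sketch below applies the same condition to an in-memory array in plain JavaScript. The two sample restaurants are trimmed, hypothetical documents, and nothing here talks to MongoDB.

```javascript
// Hypothetical sample documents, trimmed to the fields that matter here.
const restaurants = [
  { name: "Tov Kosher Kitchen", grades: [{ grade: "Z", score: 20 }, { grade: "A", score: 13 }] },
  { name: "Low Scores Only", grades: [{ grade: "A", score: 2 }, { grade: "A", score: 5 }] },
];

// In-memory equivalent of { grades: { $elemMatch: { score: { $gt: 5 } } } }:
// a document matches when at least one array element satisfies the condition.
const matched = restaurants.filter((r) => r.grades.some((g) => g.score > 5));

console.log(matched.map((r) => r.name));
```

With a single condition like this one, the dotted form {"grades.score": {$gt: 5}} should select the same documents; $elemMatch starts to matter once several conditions have to hold on the same array element.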

Aggregation in mongo

Below is a document from my database:
{
"_id" : ObjectId("58635ac32c9592064471cf5b"),
"agency_code" : "v5global",
"client_code" : "whirlpool",
"project_code" : "whirlpool",
"date" : {
"datetime" : 1464739200000.0,
"date" : 1464739200000.0,
"datejs" : ISODate("2016-06-01T00:00:00.000+0000"),
"datetimejs" : ISODate("2016-06-01T00:00:00.000+0000"),
"month" : NumberInt(5),
"year" : NumberInt(2016),
"day" : NumberInt(1)
},
"user" : {
"promoter_id" : NumberInt(19),
"promoter_name" : "Hira Singh Pawar",
"empcode" : "519230"
},
"counter" : {
"store_id" : NumberInt(4),
"store_name" : "Maya Sales ",
"chain_type" : "BS",
"address" : "6 Filamingo Market , Hissar",
"city" : "Hissar",
"state" : "Faridabad",
"region" : "North",
"sap_code" : "N_Far_91103948_1",
"unique_tp_code" : "91103948",
"location" : "6"
},
"insertedon" : {
"date" : 1464739200000.0,
"datejs" : ISODate("2016-06-01T00:00:00.000+0000"),
"datetimejs" : ISODate("2016-06-01T00:00:00.000+0000")
},
"insertedby" : "akshay",
"manager" : {
"manager_id" : NumberInt(5943),
"manager_name" : "Sonu Singh"
},
"type" : "display",
"data" : {
"brand" : "whirlpool",
"sku" : "60",
"model_name" : "Icemagic Fresh",
"sub_cat_name" : "DC",
"cat_name" : "Refrigerator",
"value" : NumberInt(1)
},
"IsDeleted" : false
}
I want to apply an aggregation that groups by city, state and region and, if a counter has sold refrigerators, includes those details in the result. For example, if a counter has sold 2 Whirlpool refrigerators, I want that reflected in the result.
A counter can also sell other things, such as washing machines. So if it has sold 2 washing machines, I want a result with { washingMachine: 2 }.
I have tried everything and nothing seems to work. Here is my current attempt:
db.display_mop.aggregate(
  // Pipeline
  [
    // Stage 1
    { $match: { "project_code": "whirlpool" } },
    // Stage 2
    {
      $group: {
        _id: {
          "userid": "$user.promoter_id",
          "userName": "$user.promoter_name",
          "usercode": "$user.empcode",
          "storename": "$counter.store_name",
          "address": "$counter.address",
          "city": "$counter.city",
          "state": "$counter.state",
          "region": "$counter.region"
        }
      }
    }
  ],
  // Options
  { allowDiskUse: true }
);
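One possible direction, offered only as a sketch: add data.cat_name to the group key and sum data.value, so each city/state/region gets one count per category. This assumes data.value holds the quantity for the document, which the question does not confirm. The pipeline object below is untested against the real collection, and the in-memory function and sample documents are hypothetical stand-ins that show the intended result shape.

```javascript
// A possible $group stage (assumption: data.value is a quantity to be summed).
const groupStage = {
  $group: {
    _id: {
      city: "$counter.city",
      state: "$counter.state",
      region: "$counter.region",
      category: "$data.cat_name",
    },
    total: { $sum: "$data.value" },
  },
};

// Hypothetical trimmed documents used to illustrate the computation.
const docs = [
  { counter: { city: "Hissar", state: "Faridabad", region: "North" }, data: { cat_name: "Refrigerator", value: 1 } },
  { counter: { city: "Hissar", state: "Faridabad", region: "North" }, data: { cat_name: "Refrigerator", value: 1 } },
  { counter: { city: "Hissar", state: "Faridabad", region: "North" }, data: { cat_name: "Washing Machine", value: 2 } },
];

// In-memory version: one entry per city/state/region with per-category totals.
function countByLocation(documents) {
  const byLocation = new Map();
  for (const { counter, data } of documents) {
    const key = JSON.stringify([counter.city, counter.state, counter.region]);
    if (!byLocation.has(key)) {
      byLocation.set(key, { city: counter.city, state: counter.state, region: counter.region, counts: {} });
    }
    const counts = byLocation.get(key).counts;
    counts[data.cat_name] = (counts[data.cat_name] || 0) + data.value;
  }
  return [...byLocation.values()];
}

const summary = countByLocation(docs);
console.log(JSON.stringify(summary));
```

Keeping the category inside the group key avoids hard-coding one field per product type, at the cost of a second step if the counts need to be folded into a single document per location.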

Simple update not working using positional $ operator

I'm just starting out with Mongo, and following the documentation here, I can't seem to update a value in a nested array when I apply the same technique.
This is my document:
{
"_id" : ObjectId("56d2cf8ee2b075667d4f0545"),
"address" : {
"building" : "522",
"coord" : [
-73.95171,
40.767461
],
"street" : "East 74 Street",
"zipcode" : "10021"
},
"borough" : "Manhattan",
"cuisine" : "American ",
"grades" : [
{
"date" : ISODate("2014-09-02T00:00:00Z"),
"grade" : "A",
"score" : 12
},
{
"grade" : "B",
"score" : 16,
"date" : ISODate("2013-12-19T00:00:00Z")
},
{
"date" : ISODate("2013-05-28T00:00:00Z"),
"grade" : "A",
"score" : 9
},
{
"date" : ISODate("2012-12-07T00:00:00Z"),
"grade" : "A",
"score" : 13
},
{
"date" : ISODate("2012-03-29T00:00:00Z"),
"grade" : "A",
"score" : 11
}
],
"name" : "Glorious Food",
"restaurant_id" : "40361521"
}
and this is my query:
db.restaurants.update(
{
_id: 'ObjectId("56d2cf8ee2b075667d4f0545")',
'grades.date': 'ISODate("2014-09-02T00:00:00Z")'
},
{
$set: { 'grades.$.score': 1 }
}
)
I'm sure I must have missed something obvious.
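Remove the quotes around the ObjectId and the date value. In quotes they are plain strings, which never equal the stored ObjectId and Date values, so the query matches nothing. Please see below: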
db.restaurants.update(
{
_id: ObjectId("56d2cf8ee2b075667d4f0545"),
"grades.date": ISODate("2014-09-02T00:00:00Z")
},
{
$set: { 'grades.$.score': 1 }
}
)
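The following plain-JavaScript sketch mimics what the positional $ operator does for this update, and why the quoted value found nothing. It works on a trimmed, hypothetical copy of the document, and setScoreForDate is an illustrative helper, not a MongoDB API.

```javascript
// Hypothetical trimmed copy of the document from the question.
const doc = {
  name: "Glorious Food",
  grades: [
    { date: new Date("2014-09-02T00:00:00Z"), grade: "A", score: 12 },
    { date: new Date("2013-12-19T00:00:00Z"), grade: "B", score: 16 },
  ],
};

// Mimics { $set: { "grades.$.score": newScore } } with a "grades.date" filter:
// only the first array element that matches the filter is updated.
function setScoreForDate(document, date, newScore) {
  const index = document.grades.findIndex(
    (g) => date instanceof Date && g.date.getTime() === date.getTime()
  );
  if (index === -1) return false; // nothing matched, so nothing is updated
  document.grades[index].score = newScore;
  return true;
}

// A quoted value is just a string, so it matches no stored date.
const withString = setScoreForDate(doc, 'ISODate("2014-09-02T00:00:00Z")', 1);
// A real date value matches the first grade, whose score is then set to 1.
const withDate = setScoreForDate(doc, new Date("2014-09-02T00:00:00Z"), 1);

console.log(withString, withDate, doc.grades[0].score);
```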

Query a collection for the latest unique object

Question: Let's say I have the following objects in a collection.
How would I return one record per "product_id", and only the one with the highest "version" number? And is this possible to do within Mongoose?
{
"_id" : ObjectId("54f765564b10883c1800002a"),
"total_invoice_fob_case" : 86.70999999999999,
"status" : "Draft",
"discount" : "3.40",
"effective_date" : ISODate("2013-08-01T06:00:00.000Z"),
"version" : 2,
"controlstate" : "AB",
"controlstate_id" : ObjectId("54d510e9e3d793f581b6bb27"),
"product" : "Product A",
"product_id" : ObjectId("54f75b5e4b1088801a000627"),
"size" : "1.75LTR",
"size_id" : ObjectId("5418a3dd750b4294c2cb3a47"),
"vendor" : "BEAM SUNTORY",
"vendor_id" : ObjectId("54ef5aa74b1088781b000169"),
"product_state_code" : "123",
"net_fob_cost" : 86.70999999999999,
"change_reason" : [
"Other"
],
"submitted" : {
"submitted_date" : ISODate("2014-05-16T06:00:00.000Z")
}
},
{
"_id" : ObjectId("54f765564b10883c1800002b"),
"total_invoice_fob_case" : 86.70999999999999,
"status" : "Draft",
"discount" : "4.40",
"effective_date" : ISODate("2013-08-01T06:00:00.000Z"),
"version" : 3,
"controlstate" : "AB",
"controlstate_id" : ObjectId("54d510e9e3d793f581b6bb27"),
"product" : "Product A",
"product_id" : ObjectId("54f75b5e4b1088801a000627"),
"size" : "1.75LTR",
"size_id" : ObjectId("5418a3dd750b4294c2cb3a47"),
"vendor" : "BEAM SUNTORY",
"vendor_id" : ObjectId("54ef5aa74b1088781b000169"),
"product_state_code" : "123",
"net_fob_cost" : 86.70999999999999,
"change_reason" : [
"Other"
],
"submitted" : {
"submitted_date" : ISODate("2014-05-16T06:00:00.000Z")
}
},
{
"_id" : ObjectId("54f765564b10883c1800002c"),
"total_invoice_fob_case" : 86.70999999999999,
"status" : "Draft",
"discount" : "3.40",
"effective_date" : ISODate("2013-08-01T06:00:00.000Z"),
"version" : 2,
"controlstate" : "AB",
"controlstate_id" : ObjectId("54d510e9e3d793f581b6bb27"),
"product" : "Product B",
"product_id" : ObjectId("54f75b5e4b1088801a000628"),
"size" : "1.75LTR",
"size_id" : ObjectId("5418a3dd750b4294c2cb3a47"),
"vendor" : "BEAM SUNTORY",
"vendor_id" : ObjectId("54ef5aa74b1088781b000169"),
"product_state_code" : "123",
"net_fob_cost" : 86.70999999999999,
"change_reason" : [
"Other"
],
"submitted" : {
"submitted_date" : ISODate("2014-05-16T06:00:00.000Z")
}
}
Sounds like a job for Mongo's aggregation framework. You can extrapolate from this example how to approach the problem.
Update: To retrieve one per product_id with the highest version you would need to also use $first:
db.products.aggregate([
{$sort: {product_id: 1, version: -1}}, // sort first so that $first pulls the correct record
{$group: {
_id: {product_id: '$product_id'}, // group by the product_id
product: {$first: '$$ROOT'} // only return the first document per group
}}
]);
You need an aggregation pipeline that first sorts the documents by version number in descending order with a $sort stage, then groups the sorted documents by product_id with a $group stage. Within the group, use the $first operator on $$ROOT to return the first document of each sorted group:
var pipeline = [
{
"$sort": { "version": -1 }
},
{
"$group": {
"_id": "$product_id",
"value": {
"$first": "$$ROOT"
}
}
},
{
"$project": {
"_id": 0,
"product_id": "$_id",
"status": "$value.status",
"version": "$value.version",
"product" : "$value.product"
}
}
];
// Mongoose aggregation
Model.aggregate(pipeline, function (err, res) {
if (err) return handleError(err);
console.log(res);
});
Console Output:
[
{
"product_id" : ObjectId("54f75b5e4b1088801a000628"),
"status" : "Draft",
"version" : 2,
"product" : "Product B"
},
{
"product_id" : ObjectId("54f75b5e4b1088801a000627"),
"status" : "Draft",
"version" : 3,
"product" : "Product A"
}
]
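The sort-then-take-first idea can be checked without a database. The sketch below reproduces it in plain JavaScript on trimmed, hypothetical documents; the shortened product_id strings are placeholders for the real ObjectId values.

```javascript
// Hypothetical trimmed documents modelled on the question's data.
const products = [
  { product_id: "627", product: "Product A", version: 2 },
  { product_id: "627", product: "Product A", version: 3 },
  { product_id: "628", product: "Product B", version: 2 },
];

// $sort by version descending (on a copy, so the input stays untouched).
const sorted = [...products].sort((a, b) => b.version - a.version);

// $group by product_id with $first: keep the first document seen per key.
const latest = new Map();
for (const doc of sorted) {
  if (!latest.has(doc.product_id)) latest.set(doc.product_id, doc);
}

console.log([...latest.values()]);
```

Because the documents arrive with the highest version first, the first one stored for each product_id is the latest version, which is the property the $first accumulator relies on.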
-- UPDATE --
To project the full document, replace the $project pipeline with the following:
{
"$project": {
"_id": 0,
"product": "$value"
}
}
Output:
/* 1 */
{
"result" : [
{
"product" : {
"_id" : ObjectId("54f765564b10883c1800002c"),
"total_invoice_fob_case" : 86.7099999999999940,
"status" : "Draft",
"discount" : "3.40",
"effective_date" : ISODate("2013-08-01T06:00:00.000Z"),
"version" : 2,
"controlstate" : "AB",
"controlstate_id" : ObjectId("54d510e9e3d793f581b6bb27"),
"product" : "Product B",
"product_id" : ObjectId("54f75b5e4b1088801a000628"),
"size" : "1.75LTR",
"size_id" : ObjectId("5418a3dd750b4294c2cb3a47"),
"vendor" : "BEAM SUNTORY",
"vendor_id" : ObjectId("54ef5aa74b1088781b000169"),
"product_state_code" : "123",
"net_fob_cost" : 86.7099999999999940,
"change_reason" : [
"Other"
],
"submitted" : {
"submitted_date" : ISODate("2014-05-16T06:00:00.000Z")
}
}
},
{
"product" : {
"_id" : ObjectId("54f765564b10883c1800002b"),
"total_invoice_fob_case" : 86.7099999999999940,
"status" : "Draft",
"discount" : "4.40",
"effective_date" : ISODate("2013-08-01T06:00:00.000Z"),
"version" : 3,
"controlstate" : "AB",
"controlstate_id" : ObjectId("54d510e9e3d793f581b6bb27"),
"product" : "Product A",
"product_id" : ObjectId("54f75b5e4b1088801a000627"),
"size" : "1.75LTR",
"size_id" : ObjectId("5418a3dd750b4294c2cb3a47"),
"vendor" : "BEAM SUNTORY",
"vendor_id" : ObjectId("54ef5aa74b1088781b000169"),
"product_state_code" : "123",
"net_fob_cost" : 86.7099999999999940,
"change_reason" : [
"Other"
],
"submitted" : {
"submitted_date" : ISODate("2014-05-16T06:00:00.000Z")
}
}
}
],
"ok" : 1
}

Get document based on multiple criteria of embedded collection

I have the following document, and I need to search for multiple items in the embedded collection "items".
Here's an example of a single SKU
db.sku.findOne()
{
"_id" : NumberLong(1192),
"description" : "Uploaded via CSV",
"items" : [
{
"_id" : NumberLong(2),
"category" : DBRef("category", NumberLong(1)),
"description" : "840 tag visual",
"name" : "840 Visual Mini Round",
"version" : NumberLong(0)
},
{
"_id" : NumberLong(7),
"category" : DBRef("category", NumberLong(2)),
"description" : "Maxi",
"name" : "Maxi",
"version" : NumberLong(0)
},
{
"_id" : NumberLong(11),
"category" : DBRef("category", NumberLong(3)),
"description" : "Button",
"name" : "Button",
"version" : NumberLong(0)
},
{
"_id" : NumberLong(16),
"category" : DBRef("category", NumberLong(4)),
"customizationFields" : [
{
"_class" : "CustomizationField",
"_id" : NumberLong(1),
"displayText" : "Custom Print 1",
"fieldName" : "customPrint1",
"listOrder" : 1,
"maxInputLength" : 12,
"required" : false,
"version" : NumberLong(0)
},
{
"_class" : "CustomizationField",
"_id" : NumberLong(2),
"displayText" : "Custom Print 2",
"fieldName" : "customPrint2",
"listOrder" : 2,
"maxInputLength" : 17,
"required" : false,
"version" : NumberLong(0)
}
],
"description" : "2 custom lines of farm print",
"name" : "Custom 2",
"version" : NumberLong(2)
},
{
"_id" : NumberLong(20),
"category" : DBRef("category", NumberLong(5)),
"description" : "Color Red",
"name" : "Red",
"version" : NumberLong(0)
}
],
"skuCode" : "NF-USDA-XC2/SM-BC-R",
"version" : 0,
"webCowOptions" : "840miniwithcust2"
}
Item ids repeat throughout the embedded collection. Each Sku is made up of multiple items; all combinations are unique, but one item will be part of many Skus.
I'm struggling with the query structure to get what I'm looking for.
Here are a few things I have tried:
db.sku.find({'items._id':2},{'items._id':7})
That one only returns items with the id of 7.
db.sku.find({items:{$all:[{_id:5}]}})
That one doesn't return anything, but it came up when I was looking for solutions. I found out about it in the MongoDB manual.
Here's an example of an expected result:
sku:{ "_id" : NumberLong(1013),
"items" : [ { "_id" : NumberLong(5) },
{ "_id" : NumberLong(7) },
{ "_id" : NumberLong(12) },
{ "_id" : NumberLong(16) },
{ "_id" :NumberLong(2) } ] },
sku:
{ "_id" : NumberLong(1014),
"items" : [ { "_id" : NumberLong(5) },
{ "_id" : NumberLong(7) },
{ "_id" : NumberLong(2) },
{ "_id" : NumberLong(16) },
{ "_id" :NumberLong(24) } ] },
sku:
{ "_id" : NumberLong(1015),
"items" : [ { "_id" : NumberLong(5) },
{ "_id" : NumberLong(7) },
{ "_id" : NumberLong(12) },
{ "_id" : NumberLong(2) },
{ "_id" :NumberLong(5) } ] }
Each Sku that comes back has both an item with id 7 and an item with id 2, along with any other items it has.
To further clarify, my purpose is to determine how many remaining combinations exist after entering the first couple of items.
Basically a customer will start specifying items, and we'll weed it down to the remaining valid combinations. So Sku.items[0].id=5 can only be combined with items[1].id=7 or items[1].id=10 …. Then items[1].id=7 can only be combined with items[2].id=20 … and so forth
The goal was to simplify my rules for purchase, and drive it all from the Sku codes. I don't know if I dug a deeper hole instead.
Thank you,
As for extracting the Skus with item ids 2 and 7: if I recall correctly, you have to use $elemMatch:
db.sku.find({'items' :{ '$all' :[{ '$elemMatch':{ '_id' : 2 }},{'$elemMatch': { '_id' : 7 }}]}} )
This selects every Sku that has both an item with _id 2 and an item with _id 7.
You can also use an aggregation pipeline (note that the $unwind path is "$items", the array field on each sku document):
db.sku.aggregate([
  {"$unwind": "$items"},
  {"$group": {"_id": "$_id", "items": {"$addToSet": {"_id": "$items._id"}}}},
  {"$match": {"items._id": {"$all": [2, 7]}}}
])
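To illustrate the "remaining combinations" idea from the question, here is a small plain-JavaScript sketch that keeps only the Skus containing every selected item id. The Sku documents are trimmed and partly invented, and skusWithAllItems is a hypothetical helper name.

```javascript
// Hypothetical trimmed Sku documents.
const skus = [
  { _id: 1013, items: [{ _id: 5 }, { _id: 7 }, { _id: 12 }, { _id: 16 }, { _id: 2 }] },
  { _id: 1192, items: [{ _id: 2 }, { _id: 7 }, { _id: 11 }, { _id: 16 }, { _id: 20 }] },
  { _id: 2000, items: [{ _id: 7 }, { _id: 10 }] },
];

// Keep only the Skus whose items include every required item id.
function skusWithAllItems(documents, requiredIds) {
  return documents.filter((sku) => {
    const ids = new Set(sku.items.map((item) => item._id));
    return requiredIds.every((id) => ids.has(id));
  });
}

const remaining = skusWithAllItems(skus, [2, 7]);
console.log(remaining.map((sku) => sku._id));
```

In the shell, the dotted form db.sku.find({ "items._id": { $all: [2, 7] } }) should express the same condition without $elemMatch, since each clause tests a single field of the embedded items.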