Mongodb : get whether a document is the latest with a field value and filter on the result - mongodb

I am trying to port an existing SQL schema into Mongo.
We have document tables, with sometimes several times the same document, with a different revision but the same reference. I want to get only the latest revisions of the documents.
A sample input data:
{
"Uid" : "xxx",
"status" : "ACCEPTED",
"reference" : "DOC305",
"code" : "305-D",
"title" : "Document 305",
"creationdate" : ISODate("2011-11-24T15:13:28.887Z"),
"creator" : "X"
},
{
"Uid" : "xxx",
"status" : "COMMENTED",
"reference" : "DOC306",
"code" : "306-A",
"title" : "Document 306",
"creationdate" : ISODate("2011-11-28T07:23:18.807Z"),
"creator" : "X"
},
{
"Uid" : "xxx",
"status" : "COMMENTED",
"reference" : "DOC306",
"code" : "306-B",
"title" : "Document 306",
"creationdate" : ISODate("2011-11-28T07:26:49.447Z"),
"creator" : "X"
},
{
"Uid" : "xxx",
"status" : "ACCEPTED",
"reference" : "DOC501",
"code" : "501-A",
"title" : "Document 501",
"creationdate" : ISODate("2011-11-19T06:30:35.757Z"),
"creator" : "X"
},
{
"Uid" : "xxx",
"status" : "ACCEPTED",
"reference" : "DOC501",
"code" : "501-B",
"title" : "Document 501",
"creationdate" : ISODate("2011-11-19T06:40:32.957Z"),
"creator" : "X"
}
Given this data, I want this result set (sometimes I want only the last revision, sometimes I want all revisions with an attribute telling me whether it's the latest):
{
"Uid" : "xxx",
"status" : "ACCEPTED",
"reference" : "DOC305",
"code" : "305-D",
"title" : "Document 305",
"creationdate" : ISODate("2011-11-24T15:13:28.887Z"),
"creator" : "X",
"lastrev" : true
},
{
"Uid" : "xxx",
"status" : "COMMENTED",
"reference" : "DOC306",
"code" : "306-B",
"title" : "Document 306",
"creationdate" : ISODate("2011-11-28T07:26:49.447Z"),
"creator" : "X",
"lastrev" : true
},
{
"Uid" : "xxx",
"status" : "ACCEPTED",
"reference" : "DOC501",
"code" : "501-B",
"title" : "Document 501",
"creationdate" : ISODate("2011-11-19T06:40:32.957Z"),
"creator" : "X",
"lastrev" : true
}
I already have a bunch of filters, sorting, and skip/limit (for pagination of data), so the final result set should be mindful of these constraints.
The current "find" query (built with the .Net driver), which filters fine but gives me all revisions of each document:
coll.find(
{ "$and" : [
{ "$or" : [
{ "deletedid" : { "$exists" : false } },
{ "deletedid" : null }
] },
{ "$or" : [
{ "taskid" : { "$exists" : false } },
{ "taskid" : null }
] },
{ "objecttypeuid" : { "$in" : ["xxxxx"] } }
] },
{ "_id" : 0, "Uid" : 1, "lastrev" : 1, "title" : 1, "code" : 1, "creator" : 1, "owner" : 1, "modificator" : 1, "status" : 1, "reference": 1, "creationdate": 1 }
).sort({ "creationdate" : 1 }).skip(0).limit(10);
Using another question, I have been able to build this aggregation, which gives me the latest revision of each document, but with not enough attributes in the result:
coll.aggregate([
{ $sort: { "creationdate": 1 } },
{
$group: {
"_id": "$reference",
result: { $last: "$creationdate" },
creationdate: { $last: "$creationdate" }
}
}
]);
I would like to integrating the aggregate with the find query.

I have found the way to mix aggregation and filtering:
coll.aggregate(
[
{ $match: {
"$and" : [
{ "$or" : [
{ "deletedid" : { "$exists" : false } },
{ "deletedid" : null }
] },
{ "$or" : [
{ "taskid" : { "$exists" : false } },
{ "taskid" : null }
] },
{ "objecttypeuid" : { "$in" : ["xxx"] } }
]
}
},
{ $sort: { "creationdate": 1 } },
{ $group: {
"_id": "$reference",
"doc": { "$last": "$$ROOT" }
}
},
{ $sort: { "doc.creationdate": 1 } },
{ $skip: skip },
{ $limit: limit }
],
{ allowDiskUse: true }
);
For each result node, this gives me a "doc" node with the document data. It has too much data still (it's missing projections), but it's a start.
Translated in .Net:
FilterDefinitionBuilder<BsonDocument> filterBuilder = Builders<BsonDocument>.Filter;
FilterDefinition<BsonDocument> filters = filterBuilder.Empty;
filters = filters & (filterBuilder.Not(filterBuilder.Exists("deletedid")) | filterBuilder.Eq("deletedid", BsonNull.Value));
filters = filters & (filterBuilder.Not(filterBuilder.Exists("taskid")) | filterBuilder.Eq("taskid", BsonNull.Value));
foreach (var f in fieldFilters) {
filters = filters & filterBuilder.In(f.Key, f.Value);
}
var sort = Builders<BsonDocument>.Sort.Ascending(orderby);
var group = new BsonDocument {
{ "_id", "$reference" },
{ "doc", new BsonDocument("$last", "$$ROOT") }
};
var aggregate = coll.Aggregate(new AggregateOptions { AllowDiskUse = true })
.Match(filters)
.Sort(sort)
.Group(group)
.Sort(sort)
.Skip(skip)
.Limit(rows);
return aggregate.ToList();
I'm pretty sure there are better ways to do this, though.

You answer is pretty close. Instead of $last, $max is better.
About $last operator:
Returns the value that results from applying an expression to the last document in a group of documents that share the same group by a field. Only meaningful when documents are in a defined order.
Get the last revision in each group, see code below in mongo shell:
db.collection.aggregate([
{
$group: {
_id: '$reference',
doc: {
$max: {
"creationdate" : "$creationdate",
"code" : "$code",
"Uid" : "$Uid",
"status" : "$status",
"title" : "$title",
"creator" : "$creator"
}
}
}
},
{
$project: {
_id: 0,
Uid: "$doc.Uid",
status: "$doc.status",
reference: "$_id",
code: "$doc.code",
title: "$doc.title",
creationdate: "$doc.creationdate",
creator: "$doc.creator"
}
}
]).pretty()
The output as your expect:
{
"Uid" : "xxx",
"status" : "ACCEPTED",
"reference" : "DOC501",
"code" : "501-B",
"title" : "Document 501",
"creationdate" : ISODate("2011-11-19T06:40:32.957Z"),
"creator" : "X"
}
{
"Uid" : "xxx",
"status" : "COMMENTED",
"reference" : "DOC306",
"code" : "306-B",
"title" : "Document 306",
"creationdate" : ISODate("2011-11-28T07:26:49.447Z"),
"creator" : "X"
}
{
"Uid" : "xxx",
"status" : "ACCEPTED",
"reference" : "DOC305",
"code" : "305-D",
"title" : "Document 305",
"creationdate" : ISODate("2011-11-24T15:13:28.887Z"),
"creator" : "X"
}

Related

Problems aggregating MongoDB

I am having problems aggregating my Product Document in MongoDB.
My Product Document is:
{
"_id" : ObjectId("5d81171c2c69f45ef459e0af"),
"type" : "T-Shirt",
"name" : "Panda",
"description" : "Panda's are cool.",
"image" : ObjectId("5d81171c2c69f45ef459e0ad"),
"created_at" : ISODate("2019-09-17T18:25:48.026+01:00"),
"is_featured" : false,
"sizes" : [
"XS",
"S",
"M",
"L",
"XL"
],
"tags" : [ ],
"pricing" : {
"price" : 26,
"sale_price" : 8
},
"categories" : [
ObjectId("5d81171b2c69f45ef459e086"),
ObjectId("5d81171b2c69f45ef459e087")
],
"sku" : "5d81171c2c69f45ef459e0af"
},
And my Category Document is:
{
"_id" : ObjectId("5d81171b2c69f45ef459e087"),
"name" : "Art",
"description" : "These items are our artsy options.",
"created_at" : ISODate("2019-09-17T18:25:47.196+01:00")
},
My aim is to perform aggregation on the Product Document in order to count the number of items within each Category. So I have the Category "Art", I need to count the products are in the "Art" Category:
My current aggregate:
db.product.aggregate(
{ $unwind : "$categories" },
{
$group : {
"_id" : { "name" : "$name" },
"doc" : { $push : { "category" : "$categories" } },
}
},
{ $unwind : "$doc" },
{
$project : {
"_id" : 0,
"name" : "$name",
"category" : "$doc.category"
}
},
{
$group : {
"_id" : "$category",
"name": { "$first": "$name" },
"items_in_cat" : { $sum : 1 }
}
},
{ "$sort" : { "items_in_cat" : -1 } },
)
Which does actually work but not as I need:
{
"_id" : ObjectId("5d81171b2c69f45ef459e082"),
"name" : null, // Why is the name of the category no here?
"items_in_cat" : 4
},
As we can see the name is null. How can I aggregate the output to be:
{
"_id" : ObjectId("5d81171b2c69f45ef459e082"),
"name" : "Art",
"items_in_cat" : 4
},
We need to use $lookup to fetch the name from Category collection.
The following query can get us the expected output:
db.product.aggregate([
{
$unwind:"$categories"
},
{
$group:{
"_id":"$categories",
"items_in_cat":{
$sum:1
}
}
},
{
$lookup:{
"from":"category",
"let":{
"id":"$_id"
},
"pipeline":[
{
$match:{
$expr:{
$eq:["$_id","$$id"]
}
}
},
{
$project:{
"_id":0,
"name":1
}
}
],
"as":"categoryLookup"
}
},
{
$unwind:{
"path":"$categoryLookup",
"preserveNullAndEmptyArrays":true
}
},
{
$project:{
"_id":1,
"name":{
$ifNull:["$categoryLookup.name","NA"]
},
"items_in_cat":1
}
}
]).pretty()
Data set:
Collection: product
{
"_id" : ObjectId("5d81171c2c69f45ef459e0af"),
"type" : "T-Shirt",
"name" : "Panda",
"description" : "Panda's are cool.",
"image" : ObjectId("5d81171c2c69f45ef459e0ad"),
"created_at" : ISODate("2019-09-17T17:25:48.026Z"),
"is_featured" : false,
"sizes" : [
"XS",
"S",
"M",
"L",
"XL"
],
"tags" : [ ],
"pricing" : {
"price" : 26,
"sale_price" : 8
},
"categories" : [
ObjectId("5d81171b2c69f45ef459e086"),
ObjectId("5d81171b2c69f45ef459e087")
],
"sku" : "5d81171c2c69f45ef459e0af"
}
Collection: category
{
"_id" : ObjectId("5d81171b2c69f45ef459e086"),
"name" : "Art",
"description" : "These items are our artsy options.",
"created_at" : ISODate("2019-09-17T17:25:47.196Z")
}
{
"_id" : ObjectId("5d81171b2c69f45ef459e087"),
"name" : "Craft",
"description" : "These items are our artsy options.",
"created_at" : ISODate("2019-09-17T17:25:47.196Z")
}
Output:
{
"_id" : ObjectId("5d81171b2c69f45ef459e087"),
"items_in_cat" : 1,
"name" : "Craft"
}
{
"_id" : ObjectId("5d81171b2c69f45ef459e086"),
"items_in_cat" : 1,
"name" : "Art"
}

MongoDB perform join with aggregate

I have two collections User and Post.
User
user_id
region
is_join
Post
post_id
user_id
body
is_block
From perspective of rdb, User:Post relationship is 1:N.
Each user can write multiple post.
For example, currently documents is inserted like this.
User
> db.user.find()
{ "_id" : ObjectId("5d15e41a1b48d9417ebc28d2"), "user_id" : NumberLong(1), "region" : "US", "is_join" : true }
{ "_id" : ObjectId("5d15e41a1b48d9417ebc28d5"), "user_id" : NumberLong(2), "region" : "KR", "is_join" : true }
{ "_id" : ObjectId("5d15e41a1b48d9417ebc28d8"), "user_id" : NumberLong(3), "region" : "US", "is_join" : true }
{ "_id" : ObjectId("5d15e41a1b48d9417ebc28da"), "user_id" : NumberLong(4), "region" : "KR", "is_join" : false }
{ "_id" : ObjectId("5d15e41a1b48d9417ebc28dc"), "user_id" : NumberLong(5), "region" : "US", "is_join" : true }
Post
> db.post.find()
{ "_id" : ObjectId("5d15e41a1b48d9417ebc28d3"), "post_id" : NumberLong(1), "user_id" : NumberLong(1), "body" : "first", "is_block" : false }
{ "_id" : ObjectId("5d15e41a1b48d9417ebc28d4"), "post_id" : NumberLong(4), "user_id" : NumberLong(1), "body" : "fourth", "is_block" : false }
{ "_id" : ObjectId("5d15e41a1b48d9417ebc28d6"), "post_id" : NumberLong(2), "user_id" : NumberLong(2), "body" : "second", "is_block" : false }
{ "_id" : ObjectId("5d15e41a1b48d9417ebc28d7"), "post_id" : NumberLong(3), "user_id" : NumberLong(2), "body" : "third", "is_block" : false }
{ "_id" : ObjectId("5d15e41a1b48d9417ebc28d9"), "post_id" : NumberLong(5), "user_id" : NumberLong(3), "body" : "fifth", "is_block" : true }
{ "_id" : ObjectId("5d15e41a1b48d9417ebc28db"), "post_id" : NumberLong(6), "user_id" : NumberLong(4), "body" : "sixth", "is_block" : false }
{ "_id" : ObjectId("5d15e41a1b48d9417ebc28dd"), "post_id" : NumberLong(7), "user_id" : NumberLong(5), "body" : "seven", "is_block" : true }
{ "_id" : ObjectId("5d15e41a1b48d9417ebc28de"), "post_id" : NumberLong(8), "user_id" : NumberLong(5), "body" : "eight", "is_block" : false }
To perform join via aggregate, there is more condition that have to apply.
User region='US', is_join=true
Posts created by user with the result of number 1
Post is_block=false
Sort by post_id
(Optional) If user do not wrote any post, except it. I know it can be perform through preserveNullAndEmptyArrays, but I think it cause performance issue.
Desired result
{
"posts" : [
{
"post_id" : NumberLong(1),
"body" : "first",
"is_block" : false
},
{
"post_id" : NumberLong(2),
"body" : "second",
"is_block" : false
},
{
"post_id" : NumberLong(3),
"body" : "third",
"is_block" : false
},
{
"post_id" : NumberLong(4),
"body" : "fourth",
"is_block" : false
},
{
"post_id" : NumberLong(8),
"body" : "eight",
"is_block" : false
}
]
}
post_id = 5 was excluded by is_block=true
post_id = 6 was excluded by is_join=false
post_id = 7 was excluded by is_block=true
And whole result is sorted by post_id.
I'm new at mongodb So, maybe I thinking too much in the form of a relational database.
And I don't know it can be perform on NoSQL.
Is there any way about it?
Any suggestion, very appreciate.
Thanks.
You can achieve such result using lookup pipeline operator.
const region = 'US';
const is_join = true;
const is_block = false;
const query = [
{
$match: {
region: region,
is_join: is_join
}
},
{
$lookup: {
let: { user_id: "$user_id" },
from: 'posts',
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ["$$user_id", "$user_id"], },
{ $eq: ["$is_block", is_block] }
]
}
}
},
{
$sort: {
post_id: 1
}
}
],
as: "posts"
}
},
{
$unwind: "$posts"
},
{
$group:{
_id: "mygroup",
posts: {
$push: {
post_id: "$posts.post_id",
body: "$posts.body",
is_block: "$posts.is_block",
}
}
}
},
{
$project:{
_id: false
}
}
]
db.users.aggregate(query)
OUTPUT
{
"posts" : [
{
"post_id" : 1,
"body" : "first",
"is_block" : false
},
{
"post_id" : 4,
"body" : "fourth",
"is_block" : false
},
{
"post_id" : 8,
"body" : "eight",
"is_block" : false
}
]
}
Try this -
To join two collection, you can use aggregation as -
User.aggregate([{
'$match': { 'region':'US', 'is_join': true } // match from users collection
}, {
$lookup: { // it will aggregate the result from both collection
from: 'posts',
localField: '_id',
foreignField: 'user_id',
as: 'posts'
}
},
{"$unwind":"$posts"},
{$match : { "posts.is_block" : false } }, // check inside post collection
{$sort : { "posts.post_id" : 1}}, // sort the data
], (err, users) => {
if (err) return callback(err, null);
console.log('users :', users);
});

Mongo aggregation, project a subfield of the first element in the array

I have a sub collection of elements and I want to project a certain subfield of the FIRST item in this collection. I have the following but it only projects the field for ALL elements in the array.
Items is the subcollection of Orders and each Item object has a Details sub object and an ItemName below that. I want to only return the item name of the FIRST item in the list. This returns the item name of every item in the list.
How can I tweak it?
db.getCollection('Orders').aggregate
([
{ $match : { "Instructions.1" : { $exists : true }}},
{ $project: {
_id:0,
'UserId': '$User.EntityId',
'ItemName': '$Items.Details.ItemName'
}
}
]);
Updated:
{
"_id" : "order-666156",
"State" : "ValidationFailed",
"LastUpdated" : {
"DateTime" : ISODate("2017-09-26T08:54:16.241Z"),
"Ticks" : NumberLong(636420128562417375)
},
"SourceOrderId" : "666156",
"User" : {
"EntityId" : NumberLong(34450),
"Name" : "Bill Baker",
"Country" : "United States",
"Region" : "North America",
"CountryISOCode" : "US",
},
"Region" : null,
"Currency" : null,
"Items" : [
{
"ClientOrderId" : "18740113",
"OrigClientOrderId" : "18740113",
"Quantity" : NumberDecimal("7487.0"),
"TransactDateTime" : {
"DateTime" : Date(-62135596800000),
"Ticks" : NumberLong(0)
},
"Text" : null,
"LocateRequired" : false,
"Details" : {
"ItemName" : "Test Item 1",
"ItemCost" : 1495.20
}
},
{
"ClientOrderId" : "18740116",
"OrigClientOrderId" : "18740116",
"Quantity" : NumberDecimal("241.0"),
"TransactDateTime" : {
"DateTime" : Date(-62135596800000),
"Ticks" : NumberLong(0)
},
"Text" : null,
"LocateRequired" : false,
"Details" : {
"ItemName" : "Test Item 2",
"ItemCost" : 2152.64
}
}
]
}
In case your are using at least MongoDB v3.2 you can use the $arrayElemAt operator for that. The below query does what you want. It will, however, not return any data for the sample you provided because the "Instructions.1": { $exists: true } filter removes the sample document.
db.getCollection('Orders').aggregate([{
$match: {
"Instructions.1": {
$exists: true
}
}
}, {
$project: {
"_id": 0,
"UserId": "$User.EntityId",
"ItemName": { $arrayElemAt: [ "$Items.Details.ItemName", 0 /* first item! */] }
}
}])

How to return all project employees?

I have datas of following format collection(projects) inside my database:
{ "_id" : ObjectId("5981a80f223e491a58230e5d"), "id" : 2, "name" : "gbqplhlqxzwl", "managerId" : 65151, "startDate" : "03.11.1999", "finishDate" : "02.01.2003", "projectStatus" : "POSTPONED", "participants" : [ ], "estimatedBudget" : 6017891.811079914 }
{ "_id" : ObjectId("5981a80f223e491a58230e5e"), "id" : 3, "name" : "erfekfsdgryu", "managerId" : 83749, "startDate" : "07.07.2007", "finishDate" : "26.12.2027", "projectStatus" : "POSTPONED", "participants" : [ 19229, 81856, 79270, 5509, 70344, 39424 ], "estimatedBudget" : 3086213.8981674756 }
{ "_id" : ObjectId("5981a80f223e491a58230e5f"), "id" : 1, "name" : "jvbzobhppntd", "managerId" : 18925, "startDate" : "29.04.1999", "finishDate" : "13.10.2008", "projectStatus" : "OPEN", "participants" : [ 46100, 96968, 6676, 56121, 4716, 68901, 43990, 48587, 62547, 30292, 65153, 17551, 27083, 20261, 27097, 50036, 86585, 69890, 18790, 22592, 60774, 93709, 78471, 27157, 4328, 36501, 47296, 16831 ], "estimatedBudget" : 3581496.7068344904 }
{ "_id" : ObjectId("5981a80f223e491a58230e60"), "id" : 4, "name" : "cdspkkqwvwld", "managerId" : 62042, "startDate" : "13.03.1998", "finishDate" : "20.06.2007", "projectStatus" : "OPEN", "participants" : [ 53480, 60897, 23677, 22064, 60807, 66637, 84609, 28378, 87143, 27675, 79283, 94992, 20429, 48769, 91671, 41747, 21651, 91134, 41684, 57228, 51949, 18756, 45679, 87781, 67287, 6902, 27526 ], "estimatedBudget" : 2126283.953787842 }
....
I need to find the busiest employee and list all his projects.
participants array contains employee ids who participate in the project.
I use the following query to find the busiest employee:
db.projects.aggregate(
{
$unwind: '$participants'
},
{
$addFields: {
count: 1
}
},
{
$group: {
_id : '$participants',
participation_count : {
'$sum':'$count'
}
}
},
{
$sort:{participation_count:-1}
},
{
$limit:1
}
)
and this work correctly. But I have no ideas how to list all his projects.
any ideas?
db.projects.aggregate(
[
{
$unwind: '$participants'
},
{
$addFields: {
count: 1
}
},
{
$group: {
_id : '$participants',
participation_count : {'$sum':'$count'},
projectId : {$push: '$id'}
}
},
{
$sort:{participation_count:-1}
},
{
$limit:1
}
],
{
allowDiskUse:true
}
)

Map aggregation results in Mongo

Here is the data set:
{ "_id" : "1", "key" : "111", "payload" : 100, "type" : "foo", "createdAt" : ISODate("2016-07-08T11:59:18.000Z") }
{ "_id" : "2", "key" : "111", "payload" : 100, "type" : "bar", "createdAt" : ISODate("2016-07-09T11:59:19.000Z") }
{ "_id" : "3", "key" : "222", "payload" : 100, "type" : "foo", "createdAt" : ISODate("2016-07-10T11:59:20.000Z") }
{ "_id" : "4", "key" : "222", "payload" : 100, "type" : "foo", "createdAt" : ISODate("2016-07-11T11:59:21.000Z") }
{ "_id" : "5", "key" : "222", "payload" : 100, "type" : "bar", "createdAt" : ISODate("2016-07-12T11:59:22.000Z") }
I have to group them by key:
db.items.aggregate([{$group: {_id: {key: '$key'}}}])
that produces the next set:
{ "_id" : { "key" : "111" } }
{ "_id" : { "key" : "222" } }
And after that I have to retrieve the most recent values of foo and bar per each group record.
My question is what is the most optimal way to do it? I can iterate the items in javascript and perform additional roundtrip to DB per each group result. But I'm not sure if it's time-efficient.
I am not sure about the most optimal way to do it, but the easy one will be to expand your aggregation pipeline like
db.items.aggregate([
{
$group:
{
_id: { key: "$key", type: "$type" },
last: { $max: "$createdAt" }
}
},
{
$group:
{
_id: { key: "$_id.key" },
mostRecent: { $push: { type: "$_id.type", createdAt: "$last" } }
}
}
]);
that for your collection of documents will result into
{ "_id" : { "key" : "222" }, "mostRecent" : [ { "type" : "bar", "createdAt" : ISODate("2016-07-12T11:59:22Z") }, { "type" : "foo", "createdAt" : ISODate("2016-07-11T11:59:21Z") } ] }
{ "_id" : { "key" : "111" }, "mostRecent" : [ { "type" : "bar", "createdAt" : ISODate("2016-07-09T11:59:19Z") }, { "type" : "foo", "createdAt" : ISODate("2016-07-08T11:59:18Z") } ] }