MongoDB: Find one article for each author ID - mongodb

I have a user with an array of authors that he follows, like this:
"authors" : [
ObjectId("5a66d368486631e55a4ed05c"),
ObjectId("5a6765f5486631e55a564ae2")
]
And I have articles with author ID, like this:
"authorId" : ObjectId("5a66d368486631e55a4ed05c"),
I want to get the latest article for each author without making multiple calls to the database recursively.
Any ideas?
PS: I'm using the MongoDB driver; I don't want to use Mongoose for this. Thanks.

In MongoDB 3.6+ you can use a custom pipeline in the $lookup operator. In your case, use $in (inside $expr) in a $match stage to get the matching articles. Then $group those articles by authorId and take the latest one of each group, using $sort in descending order followed by $first. You can add $replaceRoot to restore the original shape of the documents from the articles collection. The example assumes each article has a createdAt field to sort on.
db.user.aggregate([
{
$match: { userId: "some user Id" }
},
{
$lookup: {
from: "articles",
let: { authors: "$authors" },
pipeline: [
{
$match: {
$expr: {
$in: [ "$authorId", "$$authors" ]
}
}
},
{
$sort: { createdAt: -1 }
},
{
$group: {
_id: "$authorId",
article: { $first: "$$ROOT" }
}
},
{
$replaceRoot: { newRoot: "$article" }
}
],
as: "articles"
}
}
])
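To make clear what the $lookup sub-pipeline returns, here is a minimal in-memory sketch in plain JavaScript that imitates its stages on made-up data. No MongoDB connection is involved. It assumes ObjectIds are represented as hex strings (real ObjectId instances would have to be compared by value, not by reference) and that every article has a createdAt date. The function name latestArticlePerAuthor and the sample articles are hypothetical.

```javascript
// In-memory imitation of the $lookup sub-pipeline above; no database involved.
// Assumptions: ObjectIds are plain hex strings, and every article has a
// createdAt Date (the field the answer sorts on).
function latestArticlePerAuthor(articles, authors) {
  const followed = new Set(authors); // plays the role of $$authors

  const newestFirst = articles
    .filter((a) => followed.has(a.authorId)) // $match with $in
    .sort((a, b) => b.createdAt - a.createdAt); // $sort: { createdAt: -1 }

  const byAuthor = new Map(); // $group: { _id: "$authorId", ... }
  for (const article of newestFirst) {
    if (!byAuthor.has(article.authorId)) {
      byAuthor.set(article.authorId, article); // $first: "$$ROOT"
    }
  }
  return [...byAuthor.values()]; // $replaceRoot: the article itself
}

// Hypothetical sample data, for illustration only.
const authors = ["5a66d368486631e55a4ed05c", "5a6765f5486631e55a564ae2"];
const articles = [
  { title: "A1", authorId: "5a66d368486631e55a4ed05c", createdAt: new Date("2018-01-01") },
  { title: "A2", authorId: "5a66d368486631e55a4ed05c", createdAt: new Date("2018-02-01") },
  { title: "B1", authorId: "5a6765f5486631e55a564ae2", createdAt: new Date("2018-01-15") },
  { title: "X1", authorId: "5a0000000000000000000000", createdAt: new Date("2018-03-01") },
];

const latest = latestArticlePerAuthor(articles, authors);
console.log(latest.map((a) => a.title));
```

MongoDB does not guarantee the order of the documents that $group produces. If the order of the articles array matters, add a final $sort stage to the sub-pipeline.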

Related

How to find duplicate records based on an id and a datetime field in MongoDB?

I have a MongoDB collection with millions of records. Sample records are shown below:
[
{
_id: ObjectId("609977b0e8e1c615cb551bf5"),
activityId: "123456789",
updateDateTime: "2021-03-24T20:12:02Z"
},
{
_id: ObjectId("739177b0e8e1c615cb551bf5"),
activityId: "123456789",
updateDateTime: "2021-03-24T20:15:02Z"
},
{
_id: ObjectId("805577b0e8e1c615cb551bf5"),
activityId: "123456789",
updateDateTime: "2021-03-24T20:18:02Z"
}
]
Multiple records can have the same activityId; in that case I want only the record with the largest updateDateTime.
I have tried the following. It works fine on a smaller collection but times out on a large one.
[
{
$lookup: {
from: "MY_TABLE",
let: {
existing_date: "$updateDateTime",
existing_sensorActivityId: "$activityId"
},
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ["$activityId", "$$existing_sensorActivityId"] },
{ $gt: ["$updateDateTime", "$$existing_date"] }
]
}
}
}
],
as: "matched_records"
}
},
{ $match: { "matched_records.0": { $exists: true } } },
{ $project: { _id: 1 } }
]
This gives me the _ids of all records that share their activityId with a record that has a larger updateDateTime.
The slowness occurs at this step: "matched_records.0": {$exists: true}
Is there a way to speed up this step, or is there another approach to this problem?
Instead of finding duplicate documents and deleting them, you can find the unique documents and write the result to a new collection using $out.
How to find the unique documents:
$sort by updateDateTime in descending order
$group by activityId and take the first ($$ROOT) record of each group
$replaceRoot to promote that record to the root
$out to write the query result to a new collection
[
{ $sort: { updateDateTime: -1 } },
{
$group: {
_id: "$activityId",
record: { $first: "$$ROOT" }
}
},
{ $replaceRoot: { newRoot: "$record" } },
{ $out: "newCollectionName" } // set new collection name
]
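The sketch below checks the intended result on the question's three sample records in plain JavaScript, with no database. It does not sort. It keeps a running maximum per activityId, which selects the same records as $sort followed by $group/$first (apart from ties). It assumes updateDateTime values are ISO-8601 strings in one uniform format, as in the sample, so string comparison matches chronological order. The function name latestPerActivity is hypothetical.

```javascript
// Single-pass, in-memory equivalent of "$sort by updateDateTime desc, then
// $group by activityId with $first": keep the newest record per activityId.
function latestPerActivity(records) {
  const latest = new Map(); // activityId -> newest record seen so far
  for (const record of records) {
    const current = latest.get(record.activityId);
    // Uniform ISO-8601 UTC strings ("...Z") compare correctly as plain strings.
    if (current === undefined || record.updateDateTime > current.updateDateTime) {
      latest.set(record.activityId, record);
    }
  }
  return [...latest.values()];
}

// The three sample records from the question (ObjectIds shown as strings).
const records = [
  { _id: "609977b0e8e1c615cb551bf5", activityId: "123456789", updateDateTime: "2021-03-24T20:12:02Z" },
  { _id: "739177b0e8e1c615cb551bf5", activityId: "123456789", updateDateTime: "2021-03-24T20:15:02Z" },
  { _id: "805577b0e8e1c615cb551bf5", activityId: "123456789", updateDateTime: "2021-03-24T20:18:02Z" },
];

const unique = latestPerActivity(records);
console.log(unique);
```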

Return only the matching item from an array of objects (mongoose)

[
{
item: "journal",
instock: [
{
warehouse: "A",
qty: 5,
items: null
},
{
warehouse: "C",
qty: 15,
items: [
{
name: "alexa",
age: 26
},
{
name: "Shawn",
age: 26
}
]
}
]
}
]
db.collection.find({
"instock.items": {
$elemMatch: {
name: "alexa"
}
}
})
This returns the whole items array, whereas I only want an items array containing the single item {name: 'alexa', age: 26}.
Playground Link : https://mongoplayground.net/p/0gB4hNswA6U
You can use the .aggregate(pipeline) function.
Your code will look like:
db.collection.aggregate([{
$unwind: {
path: "$instock"
}
}, {
$unwind: {
path: "$instock.items"
}
}, {
$replaceRoot: {
newRoot: "$instock.items"
}
}, {
$match: {
name: "alexa"
}
}])
Commands used in this pipeline:
$unwind - deconstructs an array field into multiple documents, one per array element. Each output document keeps all the fields of the original document, except that the unwound field now holds a single element of the array.
$replaceRoot - promotes the embedded document referenced by newRoot to the top level, replacing the whole document.
$match - filters the documents produced so far by a condition. It takes the same kind of query as the first argument of .find().
For more information, see MongoDB's documentation on Aggregation, $unwind, $replaceRoot, and $match.
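Here is a minimal plain-JavaScript imitation of that first pipeline (two $unwind stages, $replaceRoot, then $match), run on the question's sample document with no database. One detail worth noting: by default, $unwind emits nothing for a null, missing, or empty array, so warehouse "A" with items: null simply disappears. The function name matchingItems is hypothetical.

```javascript
// In-memory imitation of $unwind -> $unwind -> $replaceRoot -> $match.
function matchingItems(docs, name) {
  return docs
    // $unwind: "$instock". Parent fields are dropped early because
    // $replaceRoot discards them anyway.
    .flatMap((doc) => doc.instock)
    // $unwind: "$instock.items". Null or missing items produce no output.
    .flatMap((stock) => stock.items ?? [])
    // Each item is now the root ($replaceRoot); keep the ones that $match.
    .filter((item) => item.name === name);
}

// The sample document from the question.
const collection = [
  {
    item: "journal",
    instock: [
      { warehouse: "A", qty: 5, items: null },
      { warehouse: "C", qty: 15, items: [{ name: "alexa", age: 26 }, { name: "Shawn", age: 26 }] },
    ],
  },
];

const found = matchingItems(collection, "alexa");
console.log(found);
```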
EDIT
The desired result was single-item arrays in the response. To get that, remove the $replaceRoot stage from the pipeline and match on the full path instock.items.name instead of name.
Final pipeline:
db.collection.aggregate([{
$unwind: {
path: "$instock"
}
}, {
$unwind: {
path: "$instock.items"
}
}, {
$match: {
"instock.items.name": "alexa"
}
}])
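You can use $elemMatch in the projection. Note that projection $elemMatch returns only the first matching array element and is intended for a top-level array field. On a nested path such as instock.items, newer MongoDB versions may reject it.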
db.collection.find({
// whatever your query is
}, {
"instock.items": {
$elemMatch: {
name: "alexa"
}
}
});
UPDATE
The official MongoDB documentation has a good example of this usage.
An alternative approach uses $filter in the $project stage:
db.collection.aggregate([
{
$match: {
"instock.items.name": "alexa"
}
},
{
$unwind: "$instock"
},
{
$project: {
"item": "$item",
qty: "$instock.qty",
warehouse: "$instock.warehouse",
items: {
$filter: {
input: "$instock.items",
as: "item",
cond: {
$eq: [
"$$item.name",
"alexa"
]
}
}
}
}
},
{
$group: {
_id: "$_id",
instock: {
$push: "$$ROOT"
}
}
}
])
The idea is to:
use $match as a top-level filter
then $unwind the instock array to prepare for $filter
use $project to keep the other fields as they are, and apply $filter to the items array field
finally, $group the documents back together, since $unwind split them apart
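The four steps above can be imitated in plain JavaScript to see the output shape, with no database involved. The sample document is the question's document plus a hypothetical _id: 1, because that pipeline groups on _id. One detail the sketch reproduces: $filter returns null when its input is null or missing, so warehouse "A" keeps items: null. Also, every element that $group pushes is a whole $$ROOT, so each one carries its own _id. The function name filterItemsByName is hypothetical.

```javascript
// In-memory imitation of: $match -> $unwind "$instock" -> $project with
// $filter -> $group back by _id with $push: "$$ROOT".
function filterItemsByName(docs, name) {
  const hasMatch = (doc) =>
    doc.instock.some((stock) => (stock.items ?? []).some((i) => i.name === name)); // $match
  return docs.filter(hasMatch).map((doc) => ({
    _id: doc._id, // $group: { _id: "$_id", ... }
    // Each pushed $$ROOT is one unwound, projected document.
    instock: doc.instock.map((stock) => ({
      _id: doc._id, // $project keeps _id by default
      item: doc.item,
      qty: stock.qty,
      warehouse: stock.warehouse,
      // $filter yields null when its input is null or missing.
      items: stock.items == null ? null : stock.items.filter((i) => i.name === name),
    })),
  }));
}

const inventory = [
  {
    _id: 1, // hypothetical _id; the question's sample omits it
    item: "journal",
    instock: [
      { warehouse: "A", qty: 5, items: null },
      { warehouse: "C", qty: 15, items: [{ name: "alexa", age: 26 }, { name: "Shawn", age: 26 }] },
    ],
  },
];

const filtered = filterItemsByName(inventory, "alexa");
console.log(JSON.stringify(filtered));
```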

How do I recombine unwinded documents?

I have the following document:
{
"_id" : ObjectId("5881cfa62189aa40268b458a"),
"description" : "Document A",
"companies" : [
{"code" : "0001"},
{"code" : "0002"},
{"code" : "0003"}
]
}
I want to filter the companies array to remove some objects based on the code field.
I've tried to use unwind and then match to filter out the companies, but I don't know how to recombine the objects. Is there another way of doing this?
Here's what I've tried so far:
db.getCollection('test').aggregate([
{
$unwind: {
'path': '$companies'
}
},
{
$match: {
'companies.code': {$in: ['0001', '0003']}
}
}
// How do I merge them back into a single document?
]);
A better way would be to just use the $filter operator on the array.
db.getCollection('test').aggregate([
{
$project:
{
companies: {
$filter: {
input: '$companies',
as: 'company',
cond: {$in: ['$$company.code', ['0001', '0003']]}
}
}
}
}
])
You can use $group to rebuild the document structure, but it's tedious because you have to specify every field you want to preserve.
Instead of unwinding, I recommend using $filter to match the companies, like so:
db.getCollection('test').aggregate([
{
$addFields: {
companies: {
$filter: {
input: "$companies",
as: "company",
cond: {$in: ["$$company.code", ['0001', '0003']]}
}
}
}
},
{ // we now need this match as documents with no matched companies might exist
$match: {
"companies.0": {$exists: true}
}
}
])
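As a quick plain-JavaScript check of what this pipeline keeps (no database), the sketch below applies the same $filter/$in logic and then drops documents whose companies array ended up empty. Unlike the $project version in the previous answer, $addFields keeps every other field, such as description. "Document B" is a hypothetical extra document added to show the empty-array case. The function name filterCompanies is hypothetical.

```javascript
// In-memory imitation of $addFields + $filter with $in, followed by the
// { "companies.0": { $exists: true } } match that drops emptied documents.
function filterCompanies(docs, codes) {
  return docs
    .map((doc) => ({
      ...doc, // $addFields keeps every other field
      companies: doc.companies.filter((c) => codes.includes(c.code)), // $filter + $in
    }))
    .filter((doc) => doc.companies.length > 0); // "companies.0" exists
}

const docs = [
  {
    _id: "5881cfa62189aa40268b458a",
    description: "Document A",
    companies: [{ code: "0001" }, { code: "0002" }, { code: "0003" }],
  },
  // Hypothetical second document whose companies are all filtered out.
  { _id: "hypothetical-b", description: "Document B", companies: [{ code: "0002" }] },
];

const kept = filterCompanies(docs, ["0001", "0003"]);
console.log(JSON.stringify(kept));
```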
If you want to keep your current $unwind-based aggregation pipeline, regroup the documents with $group:
db.getCollection('testcol').aggregate([
{$unwind: {'path': '$companies'}},
{$match: {'companies.code': {$in: ['0001', '0003']}}},
{$group: {_id: "$_id", description: { "$first": "$description" } , "companies": { $push: "$companies" }}} ,
])
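The sketch below imitates, in plain JavaScript, how that $group reassembles the unwound documents. It also shows why the first answer calls this approach tedious: every field you want to keep (here only description) has to be copied explicitly, just like a $first accumulator. The function name recombine is hypothetical, and no database is involved.

```javascript
// In-memory imitation of $unwind -> $match ($in) -> $group with $first/$push.
function recombine(docs, codes) {
  const groups = new Map(); // _id -> regrouped document
  for (const doc of docs) {
    for (const company of doc.companies) { // $unwind: one step per array element
      if (!codes.includes(company.code)) continue; // $match: { "companies.code": { $in: codes } }
      if (!groups.has(doc._id)) {
        // $first: "$description" (every field to keep must be listed explicitly)
        groups.set(doc._id, { _id: doc._id, description: doc.description, companies: [] });
      }
      groups.get(doc._id).companies.push(company); // $push: "$companies"
    }
  }
  return [...groups.values()];
}

const regrouped = recombine(
  [
    {
      _id: "5881cfa62189aa40268b458a",
      description: "Document A",
      companies: [{ code: "0001" }, { code: "0002" }, { code: "0003" }],
    },
  ],
  ["0001", "0003"],
);
console.log(JSON.stringify(regrouped));
```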

Complex aggregation query with in clause from document array

Below is a sample document from the MongoDB user collection:
{
"_id": ObjectId('58842568c706f50f5c1de662'),
"userId": "123455",
"user_name": "Bob",
"interestedTags": [
"music",
"cricket",
"hiking",
"F1",
"Mobile",
"racing"
],
"listFriends": [
"123456",
"123457",
"123458"
]
}
listFriends is an array of the userIds of other users.
For a particular userId, I need to extract listFriends (a list of userIds) and then, for those users, aggregate the interestedTags and their counts.
I can achieve this by splitting the query into two parts:
1.) Extract the listFriends for a particular userId.
2.) Use this list in an aggregate() call, something like this:
db.user.aggregate([
{ $match: { userId: { $in: [ "123456","123457","123458" ] } } },
{ $unwind: '$interestedTags' },
{ $group: { _id: '$interestedTags', countTags: { $sum : 1 } } }
])
My question: is there a way to achieve both steps 1 and 2 in a single aggregate call?
You could use $lookup to fetch the friend documents. This stage is usually used to join two different collections, but it can also join a collection with itself, which should work in your case:
db.user.aggregate([{
$match: {
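userId: '123455', // the user whose friends we want
}
}, {
$unwind: '$listFriends', // one document per friend userId
}, {
$lookup: {
from: 'user', // self-join on the same collection
localField: 'listFriends',
foreignField: 'userId', // listFriends holds userId values, not _id values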
as: 'friend',
}
}, {
$project: {
friend: {
$arrayElemAt: ['$friend', 0]
}
}
}, {
$unwind: '$friend.interestedTags'
}, {
$group: {
_id: '$friend.interestedTags',
count: {
$sum: 1
}
}
}]);
Note: $lookup and $arrayElemAt are only available in MongoDB 3.2 or newer, so check your MongoDB version before using this pipeline.
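Here is a plain-JavaScript imitation of that self-join and tag count (no database), useful for checking the expected counts. Bob's document comes from the question. The two friend documents are hypothetical, and friend "123458" is left out on purpose. In the pipeline, an unmatched $lookup leaves friend missing, and the following $unwind then drops that document. The function name friendTagCounts is hypothetical.

```javascript
// In-memory imitation of the self-join: $match -> $unwind listFriends ->
// $lookup on userId -> $unwind tags -> $group with $sum: 1.
function friendTagCounts(users, userId) {
  const byUserId = new Map(users.map((u) => [u.userId, u]));
  const counts = new Map();
  const me = byUserId.get(userId); // $match: { userId }
  if (!me) return counts;
  for (const friendId of me.listFriends) { // $unwind: "$listFriends"
    const friend = byUserId.get(friendId); // $lookup + $arrayElemAt: [..., 0]
    if (!friend) continue; // no match: $unwind on the missing tags path drops it
    for (const tag of friend.interestedTags ?? []) { // $unwind: "$friend.interestedTags"
      counts.set(tag, (counts.get(tag) ?? 0) + 1); // $group: { _id: tag, count: { $sum: 1 } }
    }
  }
  return counts;
}

const users = [
  {
    userId: "123455",
    user_name: "Bob",
    interestedTags: ["music", "cricket", "hiking", "F1", "Mobile", "racing"],
    listFriends: ["123456", "123457", "123458"],
  },
  // Hypothetical friends; "123458" is deliberately absent.
  { userId: "123456", interestedTags: ["music", "F1"], listFriends: [] },
  { userId: "123457", interestedTags: ["music", "hiking"], listFriends: [] },
];

const tagCounts = Object.fromEntries(friendTagCounts(users, "123455"));
console.log(tagCounts);
```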

MongoDB - objects? Why do I need _id in aggregate

Here is an example from the MongoDB tutorial, using the ZIP code collection:
db.zipcodes.aggregate( [
{ $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
{ $match: { totalPop: { $gte: 10*1000*1000 } } }
] )
If I replace _id with something else, like the word Test, I get this error message:
"errmsg" : "exception: the group aggregate field 'Test' must be defined as an expression inside an object",
"code" : 15951,
"ok" : 0
Could anybody help me understand why I need _id in my command? I thought MongoDB assigns IDs automatically if the user does not provide one.
In a $group stage, _id designates the grouping key, so it is required.
If you're familiar with the SQL world, think of it as the GROUP BY clause.
Note that in this context, too, _id really is a unique identifier in the output, since by definition $group cannot produce two documents with the same value for that field.
The _id field is mandatory, but you can set it to null if you do not want to group by a key or keys. Setting it to null produces a single aggregate value computed over all input documents. In this context _id acts as a reserved word that indicates the resulting identifier (key) of each group.
In your case, grouping by _id: "$state" produces n totalPop results, provided there are n distinct values of state (akin to SELECT SUM() FROM table GROUP BY state). Whereas
{ $group: { _id: null, totalPop: { $sum: "$pop" } } }
would provide a single result for totalPop (akin to SELECT SUM() FROM table).
This behaviour is well described in the group operator documentation.
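The difference between _id: "$state" and _id: null can be seen in a few lines of plain JavaScript, with no database: each distinct key becomes one output document, and a constant null key collapses everything into a single result. The ZIP code documents and the function name groupSum are hypothetical.

```javascript
// In-memory imitation of { $group: { _id: <key>, totalPop: { $sum: "$pop" } } }.
function groupSum(docs, keyOf) {
  const totals = new Map(); // group key -> running total
  for (const doc of docs) {
    const key = keyOf(doc);
    totals.set(key, (totals.get(key) ?? 0) + doc.pop);
  }
  // One output document per distinct key; the key becomes the _id.
  return [...totals].map(([_id, totalPop]) => ({ _id, totalPop }));
}

// Hypothetical ZIP code documents.
const zips = [
  { state: "NY", pop: 100 },
  { state: "NY", pop: 50 },
  { state: "CA", pop: 200 },
];

const byState = groupSum(zips, (z) => z.state); // _id: "$state" -> one result per state
const overall = groupSum(zips, () => null); // _id: null -> a single result
console.log(byState, overall);
```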
Let's look at the _id field within the $group stage, and at some best practices for constructing _id values in $group stages. Consider this query:
db.companies.aggregate([{
$match: {
founded_year: {
$gte: 2010
}
}
}, {
$group: {
_id: {
founded_year: "$founded_year"
},
companies: {
$push: "$name"
}
}
}, {
$sort: {
"_id.founded_year": 1
}
}]).pretty()
One thing that might not be clear is why the _id field is constructed as a document. We could have written it this way as well:
db.companies.aggregate([{
$match: {
founded_year: {
$gte: 2010
}
}
}, {
$group: {
_id: "$founded_year",
companies: {
$push: "$name"
}
}
}, {
$sort: {
"_id": 1
}
}]).pretty()
We don't write it this way because, in the output documents, it isn't explicit what the number in _id means, which can cause confusion when interpreting the results. A document-valued _id also extends naturally to grouping on multiple fields:
db.companies.aggregate([{
$match: {
founded_year: {
$gte: 2010
}
}
}, {
$group: {
_id: {
founded_year: "$founded_year",
category_code: "$category_code"
},
companies: {
$push: "$name"
}
}
}, {
$sort: {
"_id.founded_year": 1
}
}]).pretty()
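A compound _id groups on the combination of its fields. The plain-JavaScript sketch below imitates the query above (no database) on hypothetical company documents. JavaScript Maps compare objects by reference, so the sketch uses the serialized _id as the lookup key. This works because the fields are always written in the same order. The function name groupByYearAndCategory is hypothetical.

```javascript
// In-memory imitation of $match (founded_year >= 2010) -> $group on a compound
// _id document -> $sort on "_id.founded_year".
function groupByYearAndCategory(companies) {
  const groups = new Map();
  for (const c of companies.filter((x) => x.founded_year >= 2010)) {
    const _id = { founded_year: c.founded_year, category_code: c.category_code };
    // Serialized key: same field order for every document, so equal _ids match.
    const key = JSON.stringify(_id);
    if (!groups.has(key)) groups.set(key, { _id, companies: [] });
    groups.get(key).companies.push(c.name); // $push: "$name"
  }
  return [...groups.values()].sort((a, b) => a._id.founded_year - b._id.founded_year);
}

// Hypothetical company documents.
const companies = [
  { name: "Alpha", founded_year: 2011, category_code: "web" },
  { name: "Beta", founded_year: 2010, category_code: "web" },
  { name: "Gamma", founded_year: 2011, category_code: "mobile" },
  { name: "Delta", founded_year: 2011, category_code: "web" },
  { name: "Older", founded_year: 2005, category_code: "web" },
];

const grouped = groupByYearAndCategory(companies);
console.log(JSON.stringify(grouped, null, 2));
```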
$push simply appends each value to an array in the output document. Often you may also need to group on a field nested inside an embedded document, promoting it to a top-level key of _id:
db.companies.aggregate([{
$group: {
_id: {
ipo_year: "$ipo.pub_year"
},
companies: {
$push: "$name"
}
}
}, {
$sort: {
"_id.ipo_year": 1
}
}]).pretty()
It's also perfectly fine to use an expression that resolves to a document as the _id key:
db.companies.aggregate([{
$match: {
"relationships.person": {
$ne: null
}
}
}, {
$project: {
relationships: 1,
_id: 0
}
}, {
$unwind: "$relationships"
}, {
$group: {
_id: "$relationships.person",
count: {
$sum: 1
}
}
}, {
$sort: {
count: -1
}
}])
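When _id resolves to an embedded document, as with "$relationships.person", documents are grouped by the content of that document, not by identity. The plain-JavaScript sketch below imitates the $unwind, $group, and $sort stages on hypothetical data and mimics content equality by serializing the person document. The initial $match is omitted because the sample data contains no null persons. Like BSON document comparison, the serialized key depends on field order. The function name countByPerson and the sample documents are hypothetical.

```javascript
// In-memory imitation of $unwind "$relationships" -> $group on the embedded
// person document -> $sort: { count: -1 }.
function countByPerson(companies) {
  const counts = new Map();
  for (const company of companies) {
    for (const rel of company.relationships) { // $unwind: "$relationships"
      const key = JSON.stringify(rel.person); // group by document content
      const entry = counts.get(key) ?? { _id: rel.person, count: 0 };
      entry.count += 1; // $sum: 1
      counts.set(key, entry);
    }
  }
  return [...counts.values()].sort((a, b) => b.count - a.count); // $sort: { count: -1 }
}

// Hypothetical data: Ann appears in two companies as two distinct but equal objects.
const ann = { first_name: "Ann", last_name: "Lee", permalink: "ann-lee" };
const bo = { first_name: "Bo", last_name: "Kim", permalink: "bo-kim" };
const sampleCompanies = [
  { name: "Acme", relationships: [{ person: ann }, { person: bo }] },
  { name: "Globex", relationships: [{ person: { ...ann } }] },
];

const ranking = countByPerson(sampleCompanies);
console.log(JSON.stringify(ranking));
```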