Doctrine Mongo ODM Duplicate Search - mongodb

I currently have a mongo shell command to run a duplicate search in a mailing list collection:
var duplicates = [];
db.mailing_entries.aggregate([
{ $group: {
_id: { full_name: "$full_name", business: "$business", address_line_1: "$address_line_1", postal_code: "$postal_code" },
dups: { $addToSet: "$_id" },
count: { $sum: 1 }
}},
{ $match: {
count: { $gt: 1 }
}}
])
.result
.forEach(function(doc) {
doc.dups.shift();
doc.dups.forEach( function(dupId){
duplicates.push(dupId);
}
)
});
printjson(duplicates);
The shell code works perfectly for me. However, after much searching, I cannot find a way to translate it properly to Doctrine Mongo ODM, whether with a map-reduce function or any other method.
I'm currently using the Doctrine Mongo ODM module integrated with Zend Framework 2.
I have searched far and wide but to no avail.

The underlying doctrine/mongodb package introduced an aggregation builder in version 1.2. While there are no docs written yet, you can take a look at the examples in the linked pull request and take advantage of your IDE's autocompletion.
We aim to allow hydrating aggregation results into documents in the ODM, but that feature is not ready yet. For now you need to obtain the builder like this (supposing MailingEntry is your mapped document):
$builder = $documentManager->getDocumentCollection(MailingEntry::class)->createAggregationBuilder();
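Whichever way the pipeline ends up being executed (shell, aggregation builder, or a driver), the two stages and the post-processing from the question stay the same. A minimal sketch in plain JavaScript, assuming the results arrive as an array of group documents; `collectDuplicateIds` and the sample ids are hypothetical and only illustrate the logic:

```javascript
// The $group/$match stages from the question, kept as plain data.
const duplicatePipeline = [
  { $group: {
      _id: { full_name: "$full_name", business: "$business",
             address_line_1: "$address_line_1", postal_code: "$postal_code" },
      dups: { $addToSet: "$_id" },
      count: { $sum: 1 }
  } },
  { $match: { count: { $gt: 1 } } }
];

// Hypothetical helper: keep the first id of every group and report the rest.
// Unlike shift(), slice(1) leaves the input documents untouched.
function collectDuplicateIds(groups) {
  const duplicates = [];
  for (const group of groups) {
    duplicates.push(...group.dups.slice(1));
  }
  return duplicates;
}

// Made-up rows shaped like the pipeline's output.
const sampleGroups = [
  { _id: { full_name: "A" }, dups: ["id1", "id2", "id3"], count: 3 },
  { _id: { full_name: "B" }, dups: ["id4", "id5"], count: 2 }
];
console.log(collectDuplicateIds(sampleGroups)); // [ 'id2', 'id3', 'id5' ]
```

Which id counts as the "original" in each group is arbitrary here, because $addToSet does not guarantee any order.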

Related

MongoDB: Updating a document with existing fields using aggregation

I am new to MongoDB, I am learning from some Udemy courses, and I want to know how I can update an existing field of a document without overwriting it.
I have the following collection with these documents:
[screenshot of the collection's documents]
I want to add new warehouses in the "item":"drafts" within the stock field.
What I am trying is:
[screenshot of the attempted aggregation and its output]
Judging by the output it seems to work, but when I run db.matrices.find() again, I get exactly the same output as in the first image.
How can I update it? I have also tried the update method, but it does not do what I want.
Thanks!
PS: I am using Linux Mint, with mongo v5.0.3 and mongosh v1.1.1
You are using the aggregation pipeline; this does not update the document in the DB, it only returns the result. Starting in MongoDB 4.2 you can use an aggregation pipeline (with some limitations) to update a document, like so:
db.collection.updateOne({
item: "drafts"
},
[
{
$set: {
stock: {
$concatArrays: [
"$stock",
[
{
"warehouse": "A",
qty: 20
}
]
]
}
}
}
])
That said, this specific update is very simple and does not need an aggregation pipeline. Using the $push operator in a "normal" update is enough:
db.collection.updateOne({
item: "drafts"
},
{
$push: {
stock: {
"warehouse": "A",
qty: 20
}
}
})
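If the same update might run more than once, $addToSet can be swapped in for $push so that an identical entry is not appended twice. A small sketch, assuming the entries are compared as whole subdocuments; `buildAddWarehouseUpdate` is a hypothetical helper that only builds the filter and update documents:

```javascript
// Hypothetical helper: returns the two arguments for updateOne().
// unique = true uses $addToSet, which skips an entry that is already
// present with exactly the same fields and values.
function buildAddWarehouseUpdate(item, warehouse, qty, { unique = false } = {}) {
  const operator = unique ? "$addToSet" : "$push";
  return {
    filter: { item },
    update: { [operator]: { stock: { warehouse, qty } } }
  };
}

const { filter, update } = buildAddWarehouseUpdate("drafts", "A", 20, { unique: true });
console.log(JSON.stringify(filter)); // {"item":"drafts"}
console.log(JSON.stringify(update)); // {"$addToSet":{"stock":{"warehouse":"A","qty":20}}}
```

The result would be passed on as `db.collection.updateOne(filter, update)`.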

MongoDB paginate 2 collections together on common field

I've two mongo collections - File and Folder.
Both have some common fields like name, createdAt etc. I've a resources API that returns a response having items from both collections, with a type property added. type can be file or folder
I want to support pagination and sorting in this list, for example sort by createdAt. Is it possible with aggregation, and how?
Moving them to a container collection is not a preferred option, as I would then have to maintain the container collection on each create/update/delete in either collection.
I'm using mongoose too, if it has got any utility function for this, or a plugin.
In this case, you can use $unionWith (available from MongoDB 4.4). Something like:
Folder.aggregate([
{ $project: { name: 1, createdAt: 1 } },
{
$unionWith: {
coll: "files", pipeline: [ { $project: { name: 1, createdAt: 1 } } ]
}
},
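{ $sort: { createdAt: -1 } }, // sort the combined list, newest first
{ $skip: 0 }, // example value: (page - 1) * pageSize
{ $limit: 20 } // example page size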
])
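To cover the type property and paging from the question, the pipeline can be produced by a small function. This is a sketch under assumptions: `buildResourcePipeline` and its parameters are hypothetical, the files collection is assumed to be named "files" as above, and $addFields is used to tag each side before the sort:

```javascript
// Hypothetical helper: builds the pipeline for one page of mixed results.
function buildResourcePipeline({ sortField = "createdAt", sortDir = -1, page = 1, pageSize = 20 } = {}) {
  const fields = { name: 1, createdAt: 1 };
  return [
    { $project: fields },
    { $addFields: { type: "folder" } },        // tag documents from Folder
    { $unionWith: {
        coll: "files",
        pipeline: [{ $project: fields }, { $addFields: { type: "file" } }]
    } },
    // _id as a tiebreaker keeps the order stable between pages
    { $sort: { [sortField]: sortDir, _id: 1 } },
    { $skip: (page - 1) * pageSize },
    { $limit: pageSize }
  ];
}

const pagePipeline = buildResourcePipeline({ page: 3, pageSize: 10 });
console.log(pagePipeline[4]); // { '$skip': 20 }
// With Mongoose this would be run as: Folder.aggregate(pagePipeline)
```

Sorting has to come after $unionWith so that it applies to the combined list rather than to the folders alone.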

MongoDB querying aggregation in one single document

I have a short but important question. I am new to MongoDB and querying.
My database looks like the following: I only have one document stored in my database (sorry for blurring).
The document consists of different fields:
two are blurred and not important
datum -> date
instance -> Array with an Embedded Document Object; Our instance has an id, two not important fields and a code.
Now I want to query how many objects in my instance array have the group "a" and the text "sample".
Is this even possible?
I only found methods to count how many documents have something...
I am using Mongo Compass, but I can also use PyMongo, MongoEngine, or any other tool for querying MongoDB.
Thank you in advance and if you have more questions please leave a comment!
You can try this
db.collection.aggregate([
{
$unwind: "$instance"
},
{
$unwind: "$instance.label"
},
{
$match: {
"instance.label.group": "a",
"instance.label.text": "sample",
}
},
{
$group: {
_id: {
group: "$instance.label.group",
text: "$instance.label.text"
},
count: {
$sum: 1
}
}
}
])
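To make clear what each stage contributes, the same count can be written in plain JavaScript over one in-memory document. The document shape is an assumption taken from the pipeline above (instance is an array and each entry holds a label array of { group, text } objects); the sample data and `countLabels` are made up:

```javascript
// Hypothetical document shaped the way the pipeline above assumes.
const sampleDoc = {
  instance: [
    { id: 1, label: [{ group: "a", text: "sample" }, { group: "b", text: "sample" }] },
    { id: 2, label: [{ group: "a", text: "sample" }] },
    { id: 3, label: [{ group: "a", text: "other" }] }
  ]
};

function countLabels(document, group, text) {
  return document.instance
    .flatMap((entry) => entry.label)                                 // the two $unwind stages
    .filter((label) => label.group === group && label.text === text) // $match
    .length;                                                         // $group with $sum: 1
}

console.log(countLabels(sampleDoc, "a", "sample")); // 2
```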

Refine/Restructure data from Mongodb query

I'm using Node.js with the MongoDB native driver 2.0+.
The following query fetches one client document containing arrays of embedded staff and services.
db.collection('clients').findOne({_id: sessId}, {"services._id": 1, "staff": {$elemMatch: {_id: reqId}}}, callback)
It returns a result like this:
{
_id: "5422c33675d96d581e09e4ca",
staff:[
{
name: "Anders",
_id: "5458d0aa69d6f72418969428"
// More fields not relevant to the question...
}
],
services: [
{
_id: "54578da02b1c54e40fc3d7c6"
},
{
_id: "54578da42b1c54e40fc3d7c7"
},
{
_id: "54578da92b1c54e40fc3d7c9"
}
]
}
Note that each embedded object in services actually contains several fields, but _id is the only field returned by means of the projection of the query.
From this returned data I start by "plucking" all ids from services and saving them in an array that is later used for validation. This is by no means a difficult operation, but I'm curious: is there an easy way to use some kind of aggregation instead of find, to get an array of already plucked ObjectIds directly from the DB? Something like this:
{
_id: "5422c33675d96d581e09e4ca",
staff:[
{
name: "Anders",
_id: "5458d0aa69d6f72418969428"
// More fields not relevant to the question...
}
],
services: [
"54578da02b1c54e40fc3d7c6",
"54578da42b1c54e40fc3d7c7",
"54578da92b1c54e40fc3d7c9"
]
}
One way of doing it:
First, $unwind the document on the staff field so that the intended staff member can be selected with $match. This step is required because the $elemMatch projection operator is not available in the aggregation framework.
There is an open ticket here: Jira
Once the document with the correct staff member is selected, $unwind on the services field.
Then $group, $pushing all the services' _id values together into an array.
This is followed by a $project stage to show the intended fields.
db.clients.aggregate([
{$match:{"_id":sessId}},
{$unwind:"$staff"},
{$match:{"staff._id":reqId}},
{$unwind:"$services"},
{$group:{"_id":"$_id","services_id":{$push:"$services._id"},"staff":{$first:"$staff"}}},
{$project:{"services_id":1,"staff":1}}
])
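On MongoDB 3.2 or newer, the $filter and $map array operators allow the same result without any $unwind. A sketch under that version assumption; `buildPluckPipeline` is a hypothetical helper that only builds the pipeline, and the ids in the usage line are the sample values from the question:

```javascript
// Hypothetical helper: one $project does the selection and the plucking.
function buildPluckPipeline(sessId, reqId) {
  return [
    { $match: { _id: sessId } },
    { $project: {
        // keep only the staff member whose _id equals reqId
        staff: { $filter: {
          input: "$staff",
          as: "member",
          cond: { $eq: ["$$member._id", reqId] }
        } },
        // replace each service subdocument by its _id
        services: { $map: { input: "$services", as: "service", in: "$$service._id" } }
    } }
  ];
}

const pluckPipeline = buildPluckPipeline("5422c33675d96d581e09e4ca", "5458d0aa69d6f72418969428");
console.log(JSON.stringify(pluckPipeline[1].$project.services));
// {"$map":{"input":"$services","as":"service","in":"$$service._id"}}
```

Here the output keeps the field name services, matching the shape asked for in the question, and staff stays an array with the single matching member.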

Combine two fields from different documents in Mongodb

I have these documents in a collection :
{topic : "a",
messages : [ObjectId("21312321321323"),ObjectId("34535345353"),...]
},
{topic : "b",
messages : [ObjectId("1233232323232"),ObjectId("6556565656565"),...]
}
Is there a possibility to get a result that combines the messages fields? I would like to get this, for example:
{[
ObjectId(""),ObjectId(""),ObjectId(""),ObjectId("")
]}
I thought that this was possible with MapReduce, but in my case the documents don't have anything in common. Right now I'm doing this in the backend using JavaScript and loops, but I think that this isn't the best option. Thanks.
You could use the $group operator in the Aggregation Framework, which requires MongoDB 2.2 or newer. Note that $group needs an _id; use _id: null to collapse all documents into a single group.
If you $unwind the messages array first and then $push, you get all the lists of messages concatenated together (pushing '$messages' directly would give an array of arrays).
db.myCollection.aggregate([{ $unwind: '$messages' }, { $group: { _id: null, messages: { $push: '$messages' } } }]);
If used with $addToSet instead, you get only the distinct values.
db.myCollection.aggregate([{ $unwind: '$messages' }, { $group: { _id: null, messages: { $addToSet: '$messages' } } }]);
And if you want to filter down the candidate documents first, you can use $match.
db.myCollection.aggregate([
{ $match: { topic: { $in: [ 'a', 'b' ] } } },
{ $unwind: '$messages' },
{ $group: { _id: null, matches: { $sum: 1 }, messages: { $push: '$messages' } } } // after $unwind, matches counts messages
]);
One option is to use the aggregation framework.
However, if you expect more than a "lightweight" result (a result document exceeding 16MB in size, or excessive use of system memory), you'll need to loop through the documents in the collection and concatenate the results manually (as you suggest you are doing now), or risk MongoDB throwing an exception.
Aggregation limits may be found at the bottom of this page:
http://docs.mongodb.org/manual/applications/aggregation/
Given the limitations, you may want to just use find with a projection to return just messages.
(And with anything like this, I'd strongly recommend you do some performance benchmarks to compare options with your data on your servers as the "Internet" would suggest right now that some people have found the Aggregation support to be slower than other techniques).
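The find-with-projection route leaves the concatenation to the application. A minimal sketch of that step in plain JavaScript, assuming the documents have already been fetched with only the messages field; `mergeMessages` and the sample ids are hypothetical:

```javascript
// Hypothetical helper: concatenates the messages arrays of all documents.
// With distinct = true, ids are compared by their string form, since two
// ObjectId instances with the same value are still different objects.
function mergeMessages(docs, { distinct = false } = {}) {
  const all = docs.flatMap((doc) => doc.messages || []);
  if (!distinct) return all;
  const seen = new Set();
  return all.filter((id) => {
    const key = String(id);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Made-up documents; plain strings stand in for ObjectIds.
const topicDocs = [
  { topic: "a", messages: ["m1", "m2"] },
  { topic: "b", messages: ["m2", "m3"] }
];
console.log(mergeMessages(topicDocs));                     // [ 'm1', 'm2', 'm2', 'm3' ]
console.log(mergeMessages(topicDocs, { distinct: true })); // [ 'm1', 'm2', 'm3' ]
```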