How to: $merge and on-demand materialized views - MongoDB

In the MongoDB docs on materialized views I saw this example:
updateMonthlySales = function(startDate) {
  db.bakesales.aggregate( [
    { $match: { date: { $gte: startDate } } },
    { $group: { _id: { $dateToString: { format: "%Y-%m", date: "$date" } }, sales_quantity: { $sum: "$quantity" }, sales_amount: { $sum: "$amount" } } },
    { $merge: { into: "monthlybakesales", whenMatched: "replace" } }
  ] );
};
I wanted to understand:
where do I write this function,
and how can I invoke it?
It would be extra amazing if anyone has a solution from a Play Framework/ReactiveMongo point of view. ReactiveMongo's way of writing aggregations is quite difficult in my opinion, and I wanted to find a different way to trigger an aggregation that updates a view, which will then be read by my application.
Any help would be very much appreciated :)

The page uses the interactive shell to explain how materialized views work.
In this example they define the function in the mongo shell and then call it in the same shell: https://docs.mongodb.com/manual/core/materialized-views/#perform-initial-run
This is for documentation purposes only. You need to run the function to refresh the view, and they do that by invoking the same function in the same shell: https://docs.mongodb.com/manual/core/materialized-views/#refresh-materialized-view
You won't do this manually in production. It should be a daemon that runs the pipeline on a schedule, on an event, or on demand. Depending on the language you use to write the daemon, the syntax may be completely different. The essential part is that it must be an aggregation pipeline with a $merge stage at the end.
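Sketched in plain Node.js, the daemon's job reduces to building the same pipeline and running it on a schedule. The function name below is illustrative, not from the docs; the stages mirror the bakesales example above:

```javascript
// Illustrative sketch: the pipeline a refresh daemon would run periodically.
// Collection and field names follow the bakesales example above.
function buildRefreshPipeline(startDate) {
  return [
    { $match: { date: { $gte: startDate } } },
    { $group: {
        _id: { $dateToString: { format: "%Y-%m", date: "$date" } },
        sales_quantity: { $sum: "$quantity" },
        sales_amount: { $sum: "$amount" }
    } },
    // $merge must be the last stage: it upserts results into the view collection
    { $merge: { into: "monthlybakesales", whenMatched: "replace" } }
  ];
}

// With the official Node.js driver the daemon would run something like:
//   await db.collection("bakesales")
//           .aggregate(buildRefreshPipeline(start)).toArray();
// wrapped in setInterval() or triggered by a cron job.
```

The same shape carries over to ReactiveMongo or any other driver: only the syntax for expressing the stages changes, not the pipeline itself.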

Related

Morphia Aggregation Lookup syntax?

I'm trying to achieve a join on three String fields between two collections on MongoDB v. 4.4.3: one containing the original documents, the other the translations.
Both document types look like this:
{
  "_id" : ObjectId("60644367b521563be8044f07"),
  "dsId" : "2051918",
  "lcId" : "data_euscreenXL_EUS_15541BBE705033639D4E06691D7A5D2E",
  "pgId" : "1",
  (...)
This MongoDB query does what I need, embedding the Translations in the result:
db.Original.aggregate([
  { $match: { query parameters } },
  { $lookup: {
      from: "Translation",
      let: { "origDsId": "$dsId", "origLcId": "$lcId", "origPgId": "$pgId" },
      pipeline: [
        { $match:
          { $expr:
            { $and: [
                { $eq: [ "$dsId", "$$origDsId" ] },
                { $eq: [ "$lcId", "$$origLcId" ] },
                { $eq: [ "$pgId", "$$origPgId" ] }
            ] }
          }
        },
        { $project: { dsId: 0, _id: 0 } }
      ],
      as: "translations"
  } }
])
However, I can't figure out how to write the equivalent Morphia query. I updated to Morphia v.2.2, which adds the required features, but it's all very new and hasn't yet been documented on morphia.dev; I couldn't find much more on Javadoc either. This Morphia unit test on Github looked interesting and I tried copying that approach:
Aggregation<Original> query = datastore.aggregate(Original.class)
    .match(eq("dsId", datasetId), eq("lcId", localId))
    .lookup(Lookup.lookup(Translation.class)
        .let("origDsId", value("$dsId"))
        .let("origLcId", value("$lcId"))
        .let("origPgId", value("$pgId"))
        .pipeline(Match.match(expr(Expressions.of()
            .field("$and", array(
                Expressions.of().field("$eq", array(field("dsId"), field("$origDsId"))),
                Expressions.of().field("$eq", array(field("lcId"), field("$origLcId"))),
                Expressions.of().field("$eq", array(field("pgId"), field("$origPgId"))))))))
        .as("translations"));
...
This returns the Original documents, but fails to join the Translations.
The problem is that the syntax of the pipeline stage is rather puzzling. I wonder if anyone can shed some light on this?
the unit test example does not use (or need?) the double-$ form seen in "$$origDsId". From the MongoDB documentation I understand that this form is used to refer to externally defined variables (e.g. in the "let" assignment before the "pipeline"), but they don't work in the quoted example either;
what is the role of the static ArrayExpression "array"? It looks as if it's a kind of assignment container, where Expressions.of().field("$eq", array(field("dsId"), field("$origDsId"))) might mean something like "dsId" = "$origDsId", which would be what I need (if it worked ;) )
I tried all sorts of combinations, using field("$origDsId"), value("$origDsId"), field("$$origDsId"), value("$$origDsId"), etcetera, but have had no luck so far.
Thanks in advance!

Delete all but one duplicate from a mongo db

So I made the mistake of saving a lot of documents twice because I messed up my document id. Because I did an insert, I multiplied my documents every time I saved them. So I want to delete all duplicates except the first one that I wrote. Luckily the documents have an implicit unique key (match._id) and I should be able to tell which the first one was, because I am using the object id.
The documents look like this:
{
  _id: "5e8e2d28ca6e660006f263e6",
  match : {
    _id: 2345
    ...
  }
  ...
}
So, right now I have an aggregation that tells me which elements are duplicated and stores them in a collection. There is surely a more elegant way, but I am still learning.
[
  { $sort: { _id: 1 } },
  { $group: {
      _id: "$match._id",
      duplicateIds: { $push: "$_id" },
      count: { $sum: 1 }
  } },
  { $match: {
      count: { $gt: 1 }
  } },
  { $addFields: {
      deletableIds: { $slice: ["$duplicateIds", 1, 1000] }
  } },
  { $out: 'DeleteableIds' }
]
Now I do not know how to proceed, as aggregations do not seem to have a "delete" operation, and I do not want to write that temp data to the database just so I can run a delete command with it, because I want to delete them in one go. Is there any other way to do this? I am still learning MongoDB and feel a little overwhelmed :/
Rather than doing all of that, you can just pick the first document in each group for _id: "$match._id" and make it the root document. Also, I don't think you need the sort in your case:
db.collection.aggregate([
  {
    $group: {
      _id: "$match._id",
      doc: { $first: "$$ROOT" }
    }
  },
  {
    $replaceRoot: { newRoot: "$doc" }
  },
  { $out: 'DeleteableIds' }
])
Test : MongoDB-Playground
I think you're on the right track; however, to delete the duplicates you've found you can use a bulk write on the collection.
So if we imagine your aggregation query saved the following in the DeleteableIds collection:
> db.DeleteableIds.insertMany([
... {deletableIds: [1,2,3,4]},
... {deletableIds: [103,35,12]},
... {deletableIds: [345,311,232,500]}
... ]);
We can now take them and write a bulk write command:
const bulkwrite = db.DeleteableIds.find().map(x => ({ deleteMany : { filter: { _id: { $in: x.deletableIds } } } }))
Then we can execute that against the database:
> db.collection1.bulkWrite(bulkwrite)
This will then delete all the duplicates.
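The selection step itself can also be reasoned about in plain JavaScript. This hypothetical helper (not part of any driver) mirrors what the aggregation does: sort by _id (ObjectIds sort roughly by creation time), keep the first document per match._id, and collect the rest for deletion:

```javascript
// Hypothetical helper mirroring the $sort / $group / $slice aggregation:
// given the documents, return the _ids of every duplicate except the
// first one per match._id.
function deletableIds(docs) {
  const sorted = [...docs].sort((a, b) => (a._id < b._id ? -1 : 1));
  const seen = new Set();
  const toDelete = [];
  for (const doc of sorted) {
    const key = doc.match._id;
    if (seen.has(key)) {
      toDelete.push(doc._id);   // duplicate: not the first for this key
    } else {
      seen.add(key);            // first occurrence is kept
    }
  }
  return toDelete;
}
```

The returned array is exactly what you would feed into a deleteMany filter such as { _id: { $in: toDelete } }.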

Passing Variables to a MongoDB View

I'm trying to make views in MongoDB to avoid unnecessary returns. The documentation says that aggregation pipelines can take variables with a double dollar sign. With this in mind, I have created a view that in this example should take one variable to filter customerIds, and group the results to sum the payments of different documents.
Example:
db.createView(
  "viewName",
  "myCollection",
  [{
    $match: { "customerId": "$$customerId" }
  }, {
    $group: {
      _id: null,
      total: "$amount",
    }
  }]
)
The view is created OK, and if I put some valid customerId in the aggregation pipeline it works, but I don't have the slightest idea how to execute the view and pass the customerId that I need.
Any ideas? The MongoDB documentation does not help me in this situation, and I really need to create this as a view, since there are many applications that will connect to this view (or views).
I have tried:
db.viewName.find({customerId: "some valid id"});
You can access it just like a collection. For example, I am creating a view via:
db.runCommand({
  create: 'AuthorsView',
  viewOn: 'authors',
  pipeline: [{
    "$group": {
      "_id": "$email",
      "count": { "$sum": 1 }
    }
  }]
})
Since this is now an existing view I can simply do:
db.getCollection('AuthorsView').find({})
to see all the documents, or add more parameters to the find.
I'm not sure what you mean by passing variables, since views are just like collections: you run queries against them via find & aggregate.
First, you can't pass variables to $match without $expr. There is no error because "$$..." is interpreted as a string.
Second, even if we fix it like this:
db.createView(
  "viewName",
  "myCollection",
  [{
    $match: { $expr: { $eq: [ "$customerId", "$$customerId" ] } }
  }, {
    $group: {
      _id: null,
      total: "$amount",
    }
  }]
)
it still won't work, because $$customerId is not a system variable. You can only use a system variable such as $$ROOT, or a field path such as $field.path; user-defined variables have to be built from system variables or from the collection's data.
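Since a view's pipeline is frozen at creation time, the usual workaround is to keep the parameterized stage out of the view and supply it at query time. A minimal sketch, with an illustrative function name, assuming the same collection and fields as the question:

```javascript
// Illustrative sketch: instead of baking customerId into the view, build
// the pipeline at query time and run it with aggregate().
function customerTotalsPipeline(customerId) {
  return [
    { $match: { customerId: customerId } },   // plain equality, no $$ needed
    { $group: { _id: null, total: { $sum: "$amount" } } }
  ];
}

// db.myCollection.aggregate(customerTotalsPipeline("some valid id"))
//
// Alternatively, create the view WITHOUT the $match stage and filter it:
//   db.viewName.find({ customerId: "some valid id" })
// which works as long as customerId survives the view's pipeline
// (here it would not survive the $group, so the filter belongs before it).
```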

Mongo aggregation and MongoError: exception: BufBuilder attempted to grow() to 134217728 bytes, past the 64MB limit

I'm trying to aggregate data from my Mongo collection to produce some statistics for FreeCodeCamp by making a large json file of the data to use later.
I'm running into the error in the title. There doesn't seem to be a lot of information about this, and the other posts here on SO don't have an answer. I'm using the latest version of MongoDB and drivers.
I suspect there is probably a better way to run this aggregation, but it runs fine on a subset of my collection. My full collection is ~7GB.
I'm running the script via node aggScript.js > ~/Desktop/output.json
Here is the relevant code:
MongoClient.connect(secrets.db, function(err, database) {
  if (err) {
    throw err;
  }
  database.collection('user').aggregate([
    {
      $match: {
        'completedChallenges': { $exists: true }
      }
    },
    {
      $match: {
        'completedChallenges': { $ne: '' }
      }
    },
    {
      $match: {
        'completedChallenges': { $ne: null }
      }
    },
    {
      $group: {
        '_id': 1,
        'completedChallenges': { $addToSet: '$completedChallenges' }
      }
    }
  ], {
    allowDiskUse: true
  }, function(err, results) {
    if (err) { throw err; }
    var aggData = results.map(function(camper) {
      return _.flatten(camper.completedChallenges.map(function(challenges) {
        return challenges.map(function(challenge) {
          return {
            name: challenge.name,
            completedDate: challenge.completedDate,
            solution: challenge.solution
          };
        });
      }), true);
    });
    console.log(JSON.stringify(aggData));
    process.exit(0);
  });
});
Aggregate returns a single document containing all the result data, which limits how much data can be returned to the maximum BSON document size.
Assuming that you do actually want all this data, there are two options:
Use aggregateCursor instead of aggregate. This returns a cursor rather than a single document, which you can then iterate over.
Add a $out stage as the last stage of your pipeline. This tells MongoDB to write your aggregation data to the specified collection; the aggregate command itself returns no data, and you then query that collection as you would any other.
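As a rough sketch of the second option (the output collection name `user_challenges` is made up for illustration), the pipeline simply gains an $out stage; the other stages match the aggregation in the question, with the three $match stages collapsed into one:

```javascript
// Illustrative: append $out so the server writes the results to a collection
// instead of returning them in one oversized response document.
// "user_challenges" is a made-up output collection name.
const pipeline = [
  { $match: { completedChallenges: { $exists: true, $nin: [null, ""] } } },
  { $group: { _id: 1, completedChallenges: { $addToSet: "$completedChallenges" } } },
  { $out: "user_challenges" }   // must be the final stage
];

// db.collection('user').aggregate(pipeline, { allowDiskUse: true }).toArray()
// then read the results back with db.collection('user_challenges').find(...)
```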
It just means that the result object you are building became too large. This kind of issue should not be affected by the version; the fix implemented for 2.5.0 only prevents the crash from occurring.
You need to filter ($match) properly so that only the data you need ends up in the result, and group on the proper fields. The results are put into a 64MB buffer, so reduce your data: $project only the columns you require in the result, not whole documents.
You can also combine your 3 $match objects into a single one to reduce pipeline stages.
{
  $match: {
    'completedChallenges': {
      $exists: true,
      $nin: [ null, "" ]
    }
  }
}
I had this issue and couldn't debug the problem, so I ended up abandoning the aggregation approach. Instead I just iterated through each entry and created a new collection. Here's a stripped-down shell script which might help you see what I mean:
db.new_collection.ensureIndex({my_key: 1}); // for performance, not a necessity
db.old_collection.find({}).noCursorTimeout().forEach(function(doc) {
  db.new_collection.update(
    { my_key: doc.my_key },
    {
      $push: { stuff: doc.stuff, other_stuff: doc.other_stuff },
      $inc: { thing: doc.thing }
    },
    { upsert: true }
  );
});
I don't imagine that this approach would suit everyone, but hopefully that helps anyone who was in my particular situation.

Order_by length of listfield in mongoengine

I want to run a query to get all Articles that have more than 6 com entries and then sort them according to the length of the com list. For this I am doing:
ArticleModel.objects.filter(com__6__exists=True).order_by('-com.length')[:50]
com is a ListField, but the ordering does not work. How can I fix it? Thanks!
Standard queries cannot do this, as the "sort" needs to be done on a physical field present in the document. The best way to do this is to actually keep a count of your "list" as another field in the document. That also makes your query more efficient, as that "counter" field can be indexed, so the basic query becomes (raw MongoDB syntax):
{ "comLength": { "$gt": 6 } }
If you cannot or do not want to change the document structure then the only way to otherwise sort on the length of your list is to $project it via .aggregate():
ArticleModel._get_collection().aggregate([
  { "$match": { "com.6": { "$exists": true } } },
  { "$project": {
      "com": 1,
      "otherField": 1,
      "comLength": { "$size": "$com" }
  } },
  { "$sort": { "comLength": -1 } }
])
And that assumes you have at least MongoDB 2.6, for the use of the $size aggregation operator. If you don't, then you have to $unwind and $group in order to calculate the length of the arrays:
ArticleModel._get_collection().aggregate([
  { "$match": { "com.6": { "$exists": true } } },
  { "$unwind": "$com" },
  { "$group": {
      "_id": "$_id",
      "otherField": { "$first": "$otherField" },
      "com": { "$push": "$com" },
      "comLength": { "$sum": 1 }
  } },
  { "$sort": { "comLength": -1 } }
])
So if you are going to go down that route, take a good look at the documentation, since you are possibly not used to the raw MongoDB syntax, having been using the query DSL that MongoEngine provides.
Overall, only the aggregation pipeline in .aggregate() or .mapReduce() can actually "create a field" that is not present in the document. There is also no test for the "current" length of an array available to standard projection or sorting of documents.
Your best option is to add another field and keep it in sync with the actual array length. Failing that, the above shows you the general approach.
If you're creating the database and you know such requests will be made a lot, it's recommended to add a "com_length" field to ArticleModel and have it filled in automatically on every save by overriding the save() method.
Add this inside your ArticleModel in models.py:
def save(self, *args, **kwargs):
    self.com_length = len(self.com)
    return super(ArticleModel, self).save(*args, **kwargs)
Then, for the query in the question:
ArticleModel.objects.filter(com__6__exists=True).order_by('-com_length')[:50]