MongoDB save aggregation result and retrieve saved result

My MongoDB version is 3.4.
I'm running a complex aggregation and writing the result to a collection using a $out stage.
What I wish to do is: if that aggregation has been run before and its result exists in the target collection, just return the saved result without rerunning the whole pipeline.
Problem 1 is that when $out is used, the aggregation doesn't return any results, so I have to send a separate query to the target collection to retrieve them.
Problem 2 is that I need another query to check whether the aggregation has been run before.
Is it possible to combine all 3 queries into a single pipeline, so it would:
check outCollection, and if a result with the given user_id exists, return it
if it doesn't exist, run the pipeline and save the result to outCollection
return the aggregation result
My current pipeline looks like:
db.getCollection('sourceCollection').aggregate([
  {
    $match: {
      user_id: "myUserId"
    }
  },
  // ...perform various steps
  {
    $out: "outCollection"
  }
], {
  allowDiskUse: true
})
Result is saved to outCollection in a document like:
{ user_id: "myUserId", aggResult: [...] }
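As far as I know there is no way to express this conditional logic in a single 3.4 pipeline (the check has to happen application-side), so the usual workaround is a small helper. A sketch, assuming the Node driver and the collection names from the question; getOrComputeAggResult is a hypothetical name:

```javascript
// Sketch: return the cached result if this aggregation already ran for the
// user; otherwise run the $out pipeline and read the result back.
async function getOrComputeAggResult(db, userId, pipeline) {
  const out = db.collection('outCollection');
  // 1. Return the cached result if it exists.
  const cached = await out.findOne({ user_id: userId });
  if (cached) return cached.aggResult;
  // 2. Otherwise run the pipeline; $out writes to outCollection and the
  //    cursor itself yields no documents, so drain it and re-read.
  await db.collection('sourceCollection')
    .aggregate(pipeline, { allowDiskUse: true })
    .toArray();
  const saved = await out.findOne({ user_id: userId });
  return saved ? saved.aggResult : null;
}
```

This still costs two round trips on a cache miss, but only one on a hit.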

Related

Mongodb aggregate $count

I would like to count the number of documents returned by an aggregation.
I'm sure my initial aggregation works, because I use it later in my program. To do so I created a pipeline variable (here called pipelineTest; ask me if you want to see it in detail, but it's quite long, which is why I don't give the lines here).
To count the number of documents returned, I push onto my pipeline with:
{$count: "totalCount"}
Now I would like to get (or log) the totalCount value. What should I do?
Here is the aggregation :
pipelineTest.push({$count: "totalCount"});
cursorTest = collection.aggregate(pipelineTest, options)
console.log(cursorTest.?)
Thanks for your help. I've read lots and lots of documentation about aggregation and I still don't understand how to read the result of an aggregation...
Assuming you're using async/await syntax - you need to await on the result of the aggregation.
You can convert the cursor to an array, get the first element of that array and access totalCount.
pipelineTest.push({$count: "totalCount"});
cursorTest = await collection.aggregate(pipelineTest, options).toArray();
console.log(cursorTest[0].totalCount);
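If you only need the single $count document, the cursor's next() method is an alternative to toArray(). A sketch, assuming the Node driver; getTotalCount is a hypothetical helper, and pipeline/options are your own:

```javascript
// Hypothetical helper: appends $count and reads the single result document.
// cursor.next() resolves with the first document, or null when the pipeline
// produced no documents (e.g. $count after a $match with no hits).
async function getTotalCount(collection, pipeline, options) {
  const doc = await collection
    .aggregate([...pipeline, { $count: "totalCount" }], options)
    .next();
  return doc ? doc.totalCount : 0;
}
```

Note that handling the null case matters: $count emits no document at all when its input is empty.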
Aggregation
db.mycollection.aggregate([
  { $count: "totalCount" }
])
Result
[ { totalCount: 3 } ]
Your Particulars
Try the following (note that you must await the cursor and read the first document; the cursor object itself has no totalCount property):
pipelineTest.push({$count: "totalCount"});
cursorTest = await collection.aggregate(pipelineTest, options).toArray();
console.log(cursorTest[0].totalCount);

what is the difference between MongoDB find and aggregate in below queries?

Select records using aggregate:
db.getCollection('stock_records').aggregate([
  {
    "$project": {
      "info.created_date": 1,
      "info.store_id": 1,
      "info.store_name": 1,
      "_id": 1
    }
  },
  {
    "$match": {
      "$and": [
        { "info.store_id": "563dcf3465512285781608802a" },
        { "info.created_date": { $gt: ISODate("2021-07-18T21:07:42.313+00:00") } }
      ]
    }
  }
])
Select records using find:
db.getCollection('stock_records').find({
  'info.store_id': '563dcf3465512285781608802a',
  'info.created_date': { $gt: ISODate('2021-07-18T21:07:42.313+00:00') }
})
What is the difference between these queries, and which is better for selecting by id and date condition?
I think your question should be rephrased to "what's the difference between find and aggregate?".
Before I dive into that, I will say that both commands are similar and will generally perform the same at scale. One specific difference is that you did not add a projection to your find query, so it will return the full documents.
Regarding which is better: generally speaking, unless you need a specific aggregation operator, it's best to use find, as it performs better.
Now, why is the aggregation framework's performance "worse"? It's simple: it just does more.
Any pipeline stage requires the aggregation framework to fetch the BSON for the documents, convert them to internal objects for pipeline processing, and at the end of the pipeline convert them back to BSON to be sent to the client.
This, especially for large queries, has very significant overhead compared to a find, where the BSON is just sent back to the client.
Because of this, if you can express your aggregation as a find query, you should.
Aggregation is slower than find.
In your example aggregation:
In the first stage, you return all the documents, with only the projected fields.
For example, if your collection has 1000 documents, you return all 1000 of them, each with the specified projection fields. This impacts the performance of your query.
In the second stage, you filter the documents that match the query filter.
For example, out of the 1000 documents from stage 1, you select only a few.
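In other words, the cost of the first stage goes away if the $match comes first. A sketch of the reordered pipeline, built as a plain array with the values from the question (using the driver's new Date in place of the shell's ISODate):

```javascript
// Same stages as the question's aggregation, but filtering before projecting,
// so only the matching documents flow through the rest of the pipeline.
const pipeline = [
  {
    $match: {
      "info.store_id": "563dcf3465512285781608802a",
      "info.created_date": { $gt: new Date("2021-07-18T21:07:42.313Z") }
    }
  },
  {
    $project: {
      "info.created_date": 1,
      "info.store_id": 1,
      "info.store_name": 1,
      _id: 1
    }
  }
];
// usage: db.getCollection('stock_records').aggregate(pipeline)
```

With $match first, the stage can also use an index on the filtered fields.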
In your example find:
First, you filter the documents that match the query filter.
For example, if your collection has 1000 documents, you return only the documents that match the query condition.
Here you did not specify the fields to return in the matching documents, so the returned documents have all their fields.
You can use projection in find instead of using aggregation:
db.getCollection('stock_records').find(
  {
    'info.store_id': '563dcf3465512285781608802a',
    'info.created_date': { $gt: ISODate('2021-07-18T21:07:42.313+00:00') }
  },
  {
    "info.created_date": 1,
    "info.store_id": 1,
    "info.store_name": 1,
    "_id": 1
  }
)

Spring WebFlux + MongoDB: Tailable Cursor and Aggregation

I'm new to WebFlux and MongoDB. I'm trying to use aggregation on a capped collection with a tailable cursor, but I'm not having any success.
I'd like to execute this MongoDB query:
db.structures.aggregate([
  {
    $match: {
      id: { $in: [8244, 8052] }
    }
  },
  { $sort: { id: 1, lastUpdate: 1 } },
  {
    $group: {
      _id: { id: "$id" },
      lastUpdate: { $last: "$lastUpdate" }
    }
  }
])
ReactiveMongoOperations gives me the option to "tail" or to "aggregate".
I'm able to execute the aggregation:
MatchOperation match = new MatchOperation(Criteria.where("id").in(8244, 8052));
GroupOperation group = Aggregation.group("id", "$id").last("$lastUpdate").as("lastUpdate");
Aggregation aggregate = Aggregation.newAggregation(match, group);
Flux<Structure> result = mongoOperation.aggregate(aggregate, "structures", Structure.class);
Or a tailable cursor:
Query query = new Query();
query.addCriteria(Criteria.where("id").in(8244, 8052));
Flux<Structure> result = mongoOperation.tail(query, Structure.class);
Is it possible to use tail and aggregation together?
Using aggregation was the way I found to get only the last inserted document for each id.
Without aggregation I get: (screenshot: query without aggregation)
With aggregation: (screenshot: query with aggregation)
Thanks in advance
The tailable cursor query creates a Flux that never completes (never emits an onComplete event) and that emits records as they are inserted into the database. Because of that, I would think aggregations are not allowed by the database engine on a tailable cursor.
An aggregation doesn't quite make sense here anyway, because on every newly inserted record it would need to be recomputed. Technically, you can do a running aggregation where, for every returned record, you compute the wanted aggregate record and send it downstream.
One possible solution would be to do the aggregations programmatically on the returned "infinite" Flux:
mongoOperation.tail(query, Structure.class)
.groupBy(Structure::id) // create independent Fluxes based on id
.flatMap(groupedFlux ->
groupedFlux.scan((result, nextStructure) -> { // scan is like reduce but emits intermediate results
log.info("intermediate result is: {}", result);
if (result.getLastUpdate() > nextStructure.getLastUpdate()) {
return result;
} else {
result.setLastUpdate(nextStructure.getLastUpdate());
return result;
}
}));
On the other hand, you should probably revisit your use case and what you need to accomplish here, and see whether something other than a capped collection should be used, or whether the aggregation part is redundant (i.e. if newly inserted records always have a lastUpdate larger than the previous record's).

MongoDB Shell: Join a single field from another collection

I have a collection Spots:
- spotId
- name
and Codes:
- codeId
- spotId
- code
I want to introduce the field
- spotName
to Codes (and join by spotId).
How do I update all my existing documents within mongo shell?
When I try $lookup I always get the whole Spot document, not just the name field.
If you want a format where you get spotName in each doc, you can use an aggregation pipeline. $lookup was a good start. Here it generates the full joined array, which you can then $unwind and project the necessary fields from:
[
  {$lookup: {
    from: 'Spots',
    localField: 'spotId',
    foreignField: 'spotId',
    as: 'new_field'
  }},
  {$unwind: '$new_field'},
  {$project: {
    codeId: 1,
    spotId: 1,
    code: 1,
    spotName: '$new_field.name'
  }}
]
The solution above covers the case where you just need to use the results.
If you indeed want to update the collection, the only way would be to write a find that goes through all docs in Codes; for each doc it looks up the corresponding doc in Spots with findOne, and then issues an update statement:
Codes.find().each(function (err, doc) {
  Spots.findOne({ spotId: doc.spotId }, function (err, item) {
    Codes.update({ codeId: doc.codeId }, { $set: { spotName: item.name } }, function (err, res) {});
  });
});
You will have to run a script:
var spotMap = {};
Spots.find({}, { spotId: 1, name: 1 }).each(function (spot) {
  spotMap[spot.spotId] = spot.name;
});
Codes.find({}).each(function (code) {
  code.spotName = spotMap[code.spotId];
  db.Codes.save(code);
});
Also note that if the collection is too big, there can be a problem fetching all the data at once due to MongoDB's 16MB limit on data in memory, in which case you have to run the script in batches, multiple times:
Codes.find({}).limit(1000).each(function(code){//do normal stuff}
Codes.find({}).skip(1000).limit(1000).each(function(code){//do normal stuff}
Codes.find({}).skip(2000).limit(1000).each(function(code){//do normal stuff}
and so on....
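The hand-written batches above can be generated with a loop. A sketch, where runInBatches and processBatch are hypothetical names and processBatch would wrap the find().skip().limit().each() call:

```javascript
var BATCH_SIZE = 1000;

// Invokes processBatch(skip, limit) once per window until totalDocs is
// covered, mirroring the manual skip/limit calls above.
function runInBatches(totalDocs, processBatch) {
  var results = [];
  for (var skip = 0; skip < totalDocs; skip += BATCH_SIZE) {
    results.push(processBatch(skip, BATCH_SIZE));
  }
  return results;
}
```

totalDocs would come from something like Codes.count() before the loop starts.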

Within a mongodb $match, how to test for field MATCHING , rather than field EQUALLING

Can anyone tell me how to add a $match stage to an aggregation pipeline to filter for cases where a field MATCHES a query (and may have other data in it too), rather than limiting results to entries where the field EQUALS the query?
The query specification...
var query = {hello:"world"};
...can be used to retrieve the following documents using the find() operation of MongoDb's native node driver, where the query 'map' is interpreted as a match...
{hello:"world"}
{hello:"world", extra:"data"}
...like...
collection.find(query);
The same query map can also be interpreted as a match when used with $elemMatch to retrieve documents with matching entries contained in arrays like these documents...
{
greetings:[
{hello:"world"},
]
}
{
greetings:[
{hello:"world", extra:"data"},
]
}
{
greetings:[
{hello:"world"},
{aloha:"mars"},
]
}
...using an invocation like [PIPELINE1] ...
collection.aggregate([
{$match:{greetings:{$elemMatch:query}}},
]).toArray()
However, trying to get a list of the matching greetings with unwind [PIPELINE2] ...
collection.aggregate([
{$match:{greetings:{$elemMatch:query}}},
{$unwind:"$greetings"},
]).toArray()
...produces all the array entries inside the documents with any matching entries, including the entries which don't match (simplified result)...
[
{greetings:{hello:"world"}},
{greetings:{hello:"world", extra:"data"}},
{greetings:{hello:"world"}},
{greetings:{aloha:"mars"}},
]
I have been trying to add a second match stage, but I was surprised to find that it limited results only to those where the greetings field EQUALS the query, rather than where it MATCHES the query [PIPELINE3].
collection.aggregate([
{$match:{greetings:{$elemMatch:query}}},
{$unwind:"$greetings"},
{$match:{greetings:query}},
]).toArray()
Unfortunately PIPELINE3 produces only the following entries, excluding the matching hello world entry with the extra:"data", since that entry is not strictly 'equal' to the query (simplified result)...
[
{greetings:{hello:"world"}},
{greetings:{hello:"world"}},
]
...where what I need as the result is rather...
[
{greetings:{hello:"world"}},
{greetings:{hello:"world"}},
{greetings:{"hello":"world","extra":"data"}
]
How can I add a second $match stage to PIPELINE2, to filter for where the greetings field MATCHES the query, (and may have other data in it too), rather than limiting results to entries where the greetings field EQUALS the query?
What you're seeing in the results is correct; your approach is just a bit off. If you want the results you're expecting, then you should use this approach:
collection.aggregate([
{$match:{greetings:{$elemMatch:query}}},
{$unwind:"$greetings"},
{$match:{"greetings.hello":"world"}},
]).toArray()
With this, you should get the following output:
[
{greetings:{hello:"world"}},
{greetings:{hello:"world"}},
{greetings:{"hello":"world","extra":"data"}
]
Whenever you're using aggregation in MongoDB and want to build a pipeline that yields the documents you expect, you should start your query with the first stage alone, and then add stages one at a time, checking the output after each.
The output of your $unwind stage would be:
[{
greetings:{hello:"world"}
},
{
greetings:{hello:"world", extra:"data"}
},
{
greetings:{hello:"world"}
},
{
greetings:{aloha:"mars"}
}]
Now if we include the third stage that you used, it would match for a greetings key whose value is exactly {hello:"world"}, and with that exact value it finds only two documents in the pipeline. So you would only be getting:
{ "greetings" : { "hello" : "world" } }
{ "greetings" : { "hello" : "world" } }
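The fix above hard-codes the dotted path "greetings.hello". If the query object varies at runtime, the dotted-path $match can be built from it programmatically. A sketch; toDottedMatch is a hypothetical helper and only flat (non-nested) query objects are handled:

```javascript
// Converts {hello: "world"} under field "greetings" into
// {"greetings.hello": "world"}, i.e. dotted-path form, so the $match
// filters on sub-fields rather than on whole-document equality.
function toDottedMatch(field, query) {
  const match = {};
  for (const key of Object.keys(query)) {
    match[field + '.' + key] = query[key];
  }
  return match;
}

// usage: {$match: toDottedMatch('greetings', query)} as the stage after $unwind
```

This reproduces the MATCH semantics of $elemMatch after the $unwind, for any flat query object.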