How does concurrency/locking work on documents while they pass through a MongoDB aggregation pipeline?

Consider two collections, coll1 and coll2. I am applying some aggregation stages to coll1:
db.coll1.aggregate([
    { $match: { ... } },
    { $lookup: {
        from: "coll2",
        localField: "_id",
        foreignField: "_id",
        as: "coll2"
    } },
    // followed by other stages,
    // the last stage being $merge
    { $merge: { into: "coll3", on: "_id" } }
])
So, my questions are:
While the aggregation is in progress, can the underlying collection, in this case coll1, be modified/updated? In either case, please help me understand how it works (I went through the MongoDB docs but could not understand).
How does it write the final coll3? That is, does it write everything in one shot, or one document at a time as it finishes the pipeline?
Regarding spring-data-mongodb, I am able to call mongoOperation.aggregate() for the above aggregation pipeline successfully, but it returns an AggregationResults object with zero mappedResults (when I check in the db, coll3 is getting created).
Does $merge not return any such details?
I am using MongoDB 4.2.

Related

MongoDB purposely return only users that have no matching $lookup results

I have a users schema and a votes schema. I'm trying to return only users who haven't voted (have no returned votes).
I found this answer, and using $lookup I have the code below to find each user and return all their votes as well, which is halfway to what I'm trying to achieve.
How would I build a query so it only returns a user if they have no votes?
db.users.aggregate([
    {
        $addFields: { "_id": { "$toString": "$_id" } }
    },
    {
        $lookup: {
            from: "votes",
            localField: "_id",
            foreignField: "voterId",
            as: "votes"
        }
    }
])
Another question: once I have a working solution, how would I go about scaling this up? Running this query in Robo 3T already takes 9.05 seconds just to load 50 users, and I have almost 40,000 users and over 200,000 votes in my database (which will only grow). Is there a more efficient way to do this? The final code will run on a Node.js server.
Update
As silencedogood said in a deleted answer, I don't need to use $addFields because user._id is automatically converted to a string (I thought it would be an ObjectId() initially). This however only saves 1 second off of loading 50 users (8.14s).
db.users.aggregate([
    {
        $lookup: {
            from: "votes",
            localField: "_id",
            foreignField: "voterId",
            as: "votes"
        }
    }
])
I still need to figure out how to only return users who haven't voted.
An example snapshot of your data, and the expected result, would help. The $addFields stage is likely what is killing your performance. Why do you need it?
If voterId is stored as a string in the votes collection but the _id is an ObjectId in the users collection (which I'm guessing is the case), you'll need to permanently cast it to ObjectId if you want maximum performance. Nonetheless, this is roughly what you're looking for:
db.users.aggregate([
    {
        $lookup: {
            from: "votes",
            localField: "_id",
            foreignField: "voterId",
            as: "votes"
        }
    },
    { "$match": { "votes.0": { "$exists": false } } }
])
This will return only users who don't have a vote entry; essentially the equivalent of a left anti-join (users with no matching votes).
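An equivalent way to write that filter, if you find it more readable, is to match on the array size (my addition, not part of the original answer):
{ "$match": { "votes": { "$size": 0 } } }
Both forms keep only the users whose votes array came back empty from the $lookup.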
Update
Since they are both strings, you can disregard that aspect of the answer. As for your performance issue... I'm not sure at the moment. That seems very unrealistic; I've never experienced query times that lengthy with a simple $lookup.
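One thing that usually helps a plain $lookup like this is an index on the foreignField in the looked-up collection, so each user's lookup becomes an index seek instead of a collection scan (a suggestion on top of the original answer):
db.votes.createIndex({ voterId: 1 })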

$lookup: computed foreignField workaround?

For an existing Mongo database, the link between two collections is done like this:
collA: field collB_id
collB: field _id = ObjectId("a string value")
where collB_id is _id.valueOf(), i.e. the value of collB_id in collA is "a string value".
But in a $lookup:
localField: "collB_id",
foreignField: _id.valueOf(),
doesn't work, so what can I do?
MongoDB v3.6
If I understood you correctly, you have two collections where documents from the first collection (collA) reference documents from the second collection (collB). The problem is that you store the reference as the string value of that ObjectId, so you can't use $lookup to join those docs.
collA:
{
"_id" : ObjectId(...),
"collB_id" : "123456...",
...
}
collB:
{
"_id" : ObjectId("123456..."),
...
}
If you are using Mongo 4.0+ you can do it with the following aggregation:
db.getCollection('collA').aggregate([
    {
        $addFields: {
            converted_collB_id: { $toObjectId: "$collB_id" }
        }
    },
    {
        $lookup: {
            from: 'collB',
            localField: 'converted_collB_id',
            foreignField: '_id',
            as: 'joined'
        }
    }
]);
Mongo 4.0 introduces the new aggregation pipeline operators $toObjectId and $toString.
They allow you to add a new field which is an ObjectId created from the string value stored in collB_id, and to use that new field as the localField in the $lookup.
I would strongly advise you not to store ObjectIds as strings.
You already experienced the problem with $lookup.
Also, an ObjectId is 12 bytes while its hex string representation is 24 bytes (twice the size). You will probably want to index that field as well, so you want it to be as small as possible.
An ObjectId also contains a timestamp, which you can get by calling getTimestamp().
Make your life easier by using native types when possible!
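For reference, a one-time migration in the mongo shell could look roughly like this (a sketch; it assumes every collB_id holds a valid 24-character hex string):
db.collA.find({ collB_id: { $type: "string" } }).forEach(function (doc) {
    // replace the string reference with a real ObjectId
    db.collA.updateOne(
        { _id: doc._id },
        { $set: { collB_id: ObjectId(doc.collB_id) } }
    );
});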
Hope this helps!

MongoDB query execution takes too much time

I am working on a Go project and I am using MongoDB to store my data, but suddenly query execution started taking too much time to return data.
I have a collection named "cars" with around 25,000 documents, each document containing around 200 fields (about 4.385 KB). I have an aggregate query like this:
db.cars.aggregate([
    {
        $lookup: {
            from: "users",
            localField: "uid",
            foreignField: "_id",
            as: "customer_info"
        }
    },
    { $unwind: "$customer_info" },
    {
        $lookup: {
            from: "user_addresses",
            localField: "uid",
            foreignField: "_id",
            as: "address"
        }
    },
    { $unwind: "$address" },
    {
        $lookup: {
            from: "models",
            localField: "_id",
            foreignField: "car_id",
            as: "model_info"
        }
    },
    {
        $match: {
            purchased_on: { $gt: 1538392491 },
            status: { $in: [1, 2, 3, 4] },
            "customer_info.status": { $ne: 9 },
            "model_info.status": { $ne: 9 }
        }
    },
    {
        $sort: { arrival_time: 1 }
    },
    { $skip: 0 },
    { $limit: 5 }
])
My document structure is like: https://drive.google.com/file/d/1hM-lPwvE45_213rQDYaYuYYbt3LRTgF0/view.
Now, if I run this query without indexing, it takes around 10 minutes to load the data. Can anyone suggest how I can reduce its execution time?
There are many things you can do to optimize your query. What I would try:
As Anthony Winzlet said in the comments, use a $match stage as the first stage whenever possible. This way you reduce the number of documents passed to the following stages and can take advantage of indexes.
Assuming you use at least Mongo 3.6, rewrite your lookup stages using the 'let/pipeline' syntax (see here, and the sketch below). This way you can integrate your 'external filters' ("customer_info.status": {$ne: 9}, "model_info.status": {$ne: 9}) into a $match stage inside the lookup's pipeline. With indexes on the right fields / collections, you will gain time and memory in your $lookup stages.
Do your $unwind stages as late as possible, to restrict the number of documents passed to the following stages.
It's important to understand how the aggregation pipeline works: each stage receives data, does its work, and passes data on to the next stage. The less data passed through the pipeline, the faster your query will be.
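For example, the users lookup from the question rewritten with the let/pipeline form, so the customer status filter is applied inside the lookup itself (a sketch based on the field names in the question; it assumes the status filter lives on the users documents, and the same pattern applies to the models lookup):
{
    $lookup: {
        from: "users",
        let: { uid: "$uid" },
        pipeline: [
            // join condition plus the former "customer_info.status" filter
            { $match: { $expr: { $eq: ["$_id", "$$uid"] }, status: { $ne: 9 } } }
        ],
        as: "customer_info"
    }
}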

MongoDB — How to sort lookup based on matching field?

In my Mongo database, I have two collections — speakers and speeches, and I'm able to "join" them (getting a speaker's speeches) using the following code block:
// I've already found the 'speaker' in the database
db.collection('speakers').aggregate([
    { $match: { "uuid": speaker.uuid } },
    { $lookup: {
        from: "speeches",
        localField: "uuid",
        foreignField: "speaker_id",
        as: "speeches"
    } }
]).toArray();
It works well, but I'd like to sort the speeches array by a date field or a title field, and nothing I do seems to make it happen. None of the examples I've seen here do what I need. I've tried adding { $sort: { "speech.title": -1 } } after the $lookup block, but it did nothing. Is this even possible?
You can use the $lookup pipeline variant below, available from 3.6:
{"$lookup":{
"from":"speeches",
"let":{"uuid":"$uuid"},
"pipeline":[
{"$match":{"$expr":{"$eq":["$$uuid","$speaker_id"]}}},
{"$sort":{"title":1}}
],
"as":"speeches"
}}

MongoDB query, filter using cursor

I have two collections, one that has _id and UserId, and another that has UserId (same unique identifier) and "other data".
I want to filter the latter collection based on a list of _ids from the former collection.
Can someone provide an example query for this scenario?
The only way to 'join' collections in MongoDB is the $lookup aggregation stage (available since version 3.2).
firstCollection.aggregate([
    { $match: { _id: { $in: [1, 2, 3] } } }, // filter by _ids
    {
        $lookup: {
            from: "secondCollection",
            localField: "UserId",
            foreignField: "UserId",
            as: "data"
        }
    }
])
That will add a 'data' field to the documents from the first collection, containing all related documents from the second collection. If the relation is not 1:1, you can add a $unwind stage to flatten the results:
{$unwind: "$data"}