MongoDB: Delete 'orphaned' documents? - mongodb

I am trying to delete orphaned documents in mongodb, which cross collections.
In the collection 'values', I have documents like this:
value:
{
resultId: <ObjectId>
...other data..
}
which reference documents in the collection 'results':
result:
{
_id: <ObjectId> //the resultId
}
A number of 'result' documents have been deleted, resulting in orphaned 'value' documents. How can I find all orphans and delete them?

What you will want to do is build up an aggregate pipeline and use the $lookup operator to fetch the corresponding result document. Then, add a $match operator to your aggregate pipeline to filter those that don't have corresponding result object.
db.values.aggregate([
{
$lookup: {
from: "results",
localField: "resultId",
foreignField: "id",
as: "resultDocument"
}
},
{ $match: { resultDocument: {$size:0} }}
])
This way, you have identified your orphaned documents and can delete them afterwards.

Related

MongoDB $merge with two fields being unique before insert?

Say I have a collection of elements with several fields, including userId and questionId. If I'm in an aggregation pipeline and I have a list of documents that all have userId and questionId as fields, but their values might already be in the collection (ie. a document of
{userId:1, questionId:1, score: 1}
but a similar document already exists in the collection
{userId:1, questionId:1, score:0}
How do I do a $merge into the collection while checking both fields? The $merge function does have an 'on: [field]' field to check overlap, but I don't think it can check two.
You can specify multiple fields in the on clause using an array.
db.toBe.aggregate([
{
"$merge": {
"into": "asIs",
"on": [
"userId",
"questionId"
],
"whenMatched": "merge"
}
}
])
Mongo Playground

Mongo lookup returning all values

I have 2 collections as shown below:
branches
{
_id: ...,
custId: "abc123",
branchCode: "AA",
...other fields
}
branchHolidays
{
_id: ...,
custId: "abc123",
holidayDate: ISODate("2019-06-01T00:00:00:0000"),
holidayStatus: "PROCESSED",
..other fields
}
Need to get all branchHolidays with the custId available in branches collection along with the branchCode from branches collection. (branches.custId = branchHolidays.custId)
For the first part of join I tried the below query but I'm getting all the fields from branchHolidays collection.
db.getCollection('branchHolidays').aggregate([
{
$lookup: {
localField: "custId",
from: "branches",
foreignField: "custId",
as: "holidays"
}
},
$match: { holidayStatus: "PROCESSED" }
])
The above query returns all the documents from branchHolidays collection.
I'm new to mongo but I'm not able to figure out what the problem is. Have gone through most of the SO queries but haven't found anything which helped.
Note: There are multiple branchCodes mapped to 1 custId in branches collection.
The $lookup stage is similar to a left outer join. The sample aggregation should return all documents from the branchHolidays collection that have holidayStatus: "PROCESSED", and each document will have an added field holidays containing all documents from the branches collection that have the same custId. For those documents that do not match any braches, the holidays field will contain an empty array.
If you want to return only document that have matching branches, match on size, like:
holidays:{$not:{$size:0}}
Also note placing the $match: { holidayStatus: "PROCESSED" } before the $lookup will avoid querying the branches collection for documents that would be eliminated, which may improve performance.

MongoDB create a filtered collection with 2 collections data

I have a mongo database that consist of huge github data (users, issues, repos, etc).
I want to create small collections from this big data.
I sorted "users" collection according to "followers" count of users.
Then I got the first 1000 users from this query.
db.getCollection("users").find({}).sort({followers:-1}).limit(1000).forEach(function(doc){
db.usersnew.insert(doc);});
There is another collection called "repos" that consists of info about users' repository. (user key field :"owner.id" )
I want to create a new filtered repos collection which consists only users who present in usersnew collection.
I tried to use $look_up but it works like join.
db.getCollection('reposnew').aggregate([{
$lookup:
{
from: "users",
localField: "owner:id",
foreignField : "id",
as: "filteredRepo"
}
}])
It creates users collection + repos in a one collection.
I want only filtered repos collection with specific users' data.
You're on the right track, you just need to add an $out stage.
db.getCollection('reposnew').aggregate([
{
$lookup:
{
from: "users",
localField: "owner.id",
foreignField : "id",
as: "filteredRepo"
}
},
{
$match: {
"filteredRepo.0": {$exists: true}
}
},
{
$project: {
filteredRepo: 0
}
},
{
$out: "newCollectionName"
}
])

MongoDB query, filter using cursor

I have two collections, one that has _id and UserId, and another that has UserId (same unique identifier) and "other data".
I want to filter the latter collection based on a list of _ids from the former collection.
Can someone provide an example query for this scenario?
The only way to 'join' collections in MongoDB is a $lookup aggregation stage (available in version 3.2).
firstCollection.aggregate([
{ $match: { _id: {$in: [1,2,3] }}}, // filter by _ids
{
$lookup:
{
from: "secondCollection",
localField: "UserId",
foreignField: "UserId",
as: "data"
}
}
])
That will add 'data' field to the documents from the first collection which will contain all related documents from second collection. If relation is not 1:1, you can add $unwind stage to flatten results:
{$unwind: "$data"}

List of Id's with no dependency in Mongo Collection

I have a scenario in spring-mongo query. Mongo version is 3.2
Application have two collections (Collection A and Collection B).
**Sample contents**
Collection A :: {"_id":1, "name":"content 1"}...{"_id":100, "name":"content 100"}
Collection B :: {"_id":1, "name":"parent 1", "a":[1,2,58,67]}
{"_id":2, "name":"parent 2", "a":[2,85,96,99]}
Collection B holds reference ids of Collection A as a array.
Scenario:
I will past list of ids of Collection A to the query: I need to get list of ids of Collection A which are not associated anywhere in Collection B.
how to achieve this?
I am planning to proceed with Aggregation with following query.. Looking up with
preserveNullAndEmptyArrays saving my day.
db.a.aggregate([
{
$match: { "_id":{$in:["1","5","10"]} }
},
{
$lookup:
{
from: "b",
localField: "_id",
foreignField: "a",
as: "moneyid"
}
},
{
$unwind:
{"path":"$moneyid", "preserveNullAndEmptyArrays":true}
},
{
$match:{
"moneyid":{$eq:null}
}
}
])
It can be done using the following approach
1) Find Unique Id's in Collection B which are in array "a":[]
2) Execute a find query in Collection A - Use $nin in the find query and pass all the ids which you obtained from array "a":[] from Collection B
Note:If you need to find matching documents between two collections then you can use $lookup which works like a left outer join.