Delete all but one duplicate from a mongo db - mongodb

So I made the mistake of saving a lot of documents twice because I messed up my document id. Because I did an insert, I multiplied my documents every time I saved them. So I want to delete all duplicates except the first one that I wrote. Luckily the documents have an implicit unique key (match._id), and I should be able to tell which one was first, because I am using the object id.
The documents look like this:
{
_id: "5e8e2d28ca6e660006f263e6"
match : {
_id: 2345
...
}
...
}
So, right now I have an aggregation that tells me which elements are duplicated and stores them in a collection. There is surely a more elegant way, but I am still learning.
[{$sort: {_id: 1}},
{$group: {
_id: "$match._id",
duplicateIds: {$push: "$_id"},
count: {$sum: 1}
}},
{$match: {
count: { $gt: 1 }
}}, {$addFields: {
deletableIds: { $slice: ["$duplicateIds", 1, 1000 ] }
}},
{$out: 'DeleteableIds'}]
Now I do not know how to proceed, as there does not seem to be a "delete" operation in aggregations, and I do not want to write that temporary data to a db just so I can write a delete command with it, as I want to delete them in one go. Is there any other way to do this? I am still learning MongoDB and feel a little bit overwhelmed :/

Rather than doing all of that, you can just pick the first document in each group (_id: "$match._id") and make it the root document. Also, I don't think you need to sort in your case:
db.collection.aggregate([
{
$group: {
_id: "$match._id",
doc: {
$first: "$$ROOT"
}
}
},
{
$replaceRoot: {
newRoot: "$doc"
}
}, {$out: 'DeleteableIds'}
])
Test : MongoDB-Playground

I think you're on the right track. To delete the duplicates you've found, you can use a bulk write on the collection.
So if we imagine your aggregation query saved the following in the DeleteableIds collection:
> db.DeleteableIds.insertMany([
... {deletableIds: [1,2,3,4]},
... {deletableIds: [103,35,12]},
... {deletableIds: [345,311,232,500]}
... ]);
We can now take them and write a bulk write command:
const bulkwrite = db.DeleteableIds.find().map(x => ({ deleteMany : { filter: { _id: { $in: x.deletableIds } } } }))
Then we can execute that against the database:
> db.collection1.bulkWrite(bulkwrite)
This will then delete all the duplicates.
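If you would rather skip the temporary collection, the same delete operations can probably be built straight from the aggregation result. The sketch below is plain JavaScript: `buildDeleteOps` is a hypothetical helper name, and the sample groups stand in for the output of the question's pipeline without the `$addFields` and `$out` stages. It assumes the first id in each group is the document to keep.

```javascript
// Hypothetical helper: turn grouped duplicate ids into bulkWrite operations.
// Assumes each group has the shape { duplicateIds: [keepId, dupId, ...] },
// i.e. the output of the $sort + $group + $match stages.
function buildDeleteOps(groups) {
  return groups
    .map(g => g.duplicateIds.slice(1))   // drop the first id, which is kept
    .filter(ids => ids.length > 0)       // nothing to delete for this group
    .map(ids => ({ deleteMany: { filter: { _id: { $in: ids } } } }));
}

// Sample data standing in for the aggregation result.
const groups = [
  { _id: 2345, duplicateIds: ["a1", "a2", "a3"], count: 3 },
  { _id: 2346, duplicateIds: ["b1", "b2"], count: 2 },
];

const ops = buildDeleteOps(groups);
console.log(JSON.stringify(ops, null, 2));

// In the shell this could then be run as follows (the collection name and
// the `pipeline` variable are assumptions):
//   db.collection1.bulkWrite(buildDeleteOps(db.collection1.aggregate(pipeline).toArray()))
```

Building the operations in a separate function keeps the part that decides what gets deleted easy to inspect before anything is sent to the database.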

Related

Merge Names From Data For Message Application

Hello guys, I'm writing a message application with Node.js and Mongoose. I keep the data in MongoDB like this:
I want to list the users that the logged-in user has exchanged messages with, so I need to filter my 'Messages' collection, but I can't get exactly what I want. If he sent a message to a person, I need that person's name; if he received a message from a person, I need that person's name as well. In the first case the name is in reciever, in the second case it is in sender. I made a table to explain it more easily: I have the left table and I need the 3 names shown in the second table (one of John's entries must be eliminated).
Sorry if this has been asked before, but I don't know how to search for this problem.
I tried this, but it also returns the name of the logged-in user and duplicates some names.
Message.find({$or: [{sender: req.user.username}, {reciever: req.user.username}]})
One option is to use an aggregation pipeline to create two sets and simply union them:
db.collection.aggregate([
{$match: {$or: [{sender: req.user.username}, {reciever: req.user.username}]}},
{$group: {
_id: 0,
recievers: {$addToSet: "$reciever"},
senders: {$addToSet: "$sender"}
}},
{$project: {
_id: req.user.username,
previousChats: {"$setDifference":
[
{$setUnion: ["$recievers", "$senders"]},
[req.user.username]
]
}
}}
])
See how it works on the playground example
This is a tricky one, but can be solved with a fairly simple aggregation pipeline.
Explanation
On our first stage of the pipeline, we will want to get all the messages sent or received by the user (in our case David), for that we will use a $match stage:
{
$match: {
$or: [
{sender: 'David'},
{receiver: 'David'}
]
}
}
After we found all the messages from or to David, we can start collecting the people he talks to, for that we will use a $group stage and use 2 operations that will help us to achieve this:
$addToSet - This will add all the names to a set. Sets only contain one instance of the same value and ignore any other instance trying to be added to the set of the same value.
$cond - This will be used to add either the receiver or the sender, depending which one of them is David.
The stage will look like this:
{
$group: {
_id: null,
chats: {$addToSet: {$cond: {
if: {$eq: ['$sender', 'David']},
then: '$receiver',
else: '$sender'
}}}
}
}
Combining these 2 stages together will give us the expected result, one document looking like this:
{
"_id": null, // We don't care about this
"chats": [
"John",
"James",
"Daniel"
]
}
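To check the logic against sample data without a database, the two stages can be mirrored in plain JavaScript. This is only a sketch: `chatPartners` is a hypothetical name, the messages are made up, and the order of the result is not something MongoDB's $addToSet guarantees.

```javascript
// Plain-JavaScript mirror of the $match + $group stages.
function chatPartners(messages, user) {
  const partners = new Set();               // plays the role of $addToSet
  for (const m of messages) {
    if (m.sender !== user && m.receiver !== user) continue;   // $match
    partners.add(m.sender === user ? m.receiver : m.sender);  // $cond
  }
  return [...partners];
}

// Hypothetical sample messages.
const messages = [
  { sender: "David", receiver: "John" },
  { sender: "John", receiver: "David" },
  { sender: "James", receiver: "David" },
  { sender: "David", receiver: "Daniel" },
  { sender: "James", receiver: "Daniel" },  // does not involve David
];

console.log(chatPartners(messages, "David"));
```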
Final Solution
Message.aggregate([{
$match: {
$or: [
{
sender: req.user.username
},
{
receiver: req.user.username
}
]
}
}, {
$group: {
_id: null,
chats: {
$addToSet: {
$cond: {
'if': {
$eq: [
'$sender',
req.user.username
]
},
then: '$receiver',
'else': '$sender'
}
}
}
}
}])

MongoDB querying aggregation in one single document

I have a short but important question. I am new to MongoDB and querying.
My database looks like the following: I only have one document stored in my database (sorry for blurring).
The document consists of different fields:
two are blurred and not important
datum -> date
instance -> Array with an Embedded Document Object; Our instance has an id, two not important fields and a code.
Now I want to query how many times an object in my instance array has the group "a" and a text "sample"?
Is this even possible?
I only found methods to count how many documents have something...
I am using MongoDB Compass, but I can also use PyMongo, MongoEngine or any other tool for querying MongoDB.
Thank you in advance and if you have more questions please leave a comment!
You can try this
db.collection.aggregate([
{
$unwind: "$instance"
},
{
$unwind: "$instance.label"
},
{
$match: {
"instance.label.group": "a",
"instance.label.text": "sample",
}
},
{
$group: {
_id: {
group: "$instance.label.group",
text: "$instance.label.text"
},
count: {
$sum: 1
}
}
}
])
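What the pipeline counts can be sketched in plain JavaScript. The document below is hypothetical (the real one is only shown as a screenshot), `countLabels` is a made-up name, and the sketch assumes that both instance and label are arrays, as the two $unwind stages suggest.

```javascript
// Count matching labels inside one document, mirroring the two $unwind
// stages, the $match, and the $sum: 1 in the $group stage.
function countLabels(doc, group, text) {
  let count = 0;
  for (const inst of doc.instance || []) {     // $unwind: "$instance"
    for (const label of inst.label || []) {    // $unwind: "$instance.label"
      if (label.group === group && label.text === text) count += 1;
    }
  }
  return count;
}

// Hypothetical document shape.
const doc = {
  datum: "2020-04-01",
  instance: [
    { id: 1, label: [{ group: "a", text: "sample" }, { group: "b", text: "sample" }] },
    { id: 2, label: [{ group: "a", text: "sample" }] },
    { id: 3, label: [] },
  ],
};

console.log(countLabels(doc, "a", "sample")); // 2
```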

MongoDb - add index on 'calculated' fields

I have a query that includes an $expr-operator with a $cond in it.
Basically, I want to have objects with a timestamp from a certain year. If the timestamp is not set, I'll use the creation date instead.
{
$expr: {
$eq: [
{
$cond: {
'if': {
TimeStamp: {
$type: 'null'
}
},
then: {
$year: '$Created'
},
'else': {
$year: '$TimeStamp'
}
}
},
<wanted-year>
]
}
}
It would be nice to have this query use an index. But is that possible? Should I just add an index on both the TimeStamp and Created fields? Or is it possible to create an index on a Year field that doesn't really exist on the document itself...?
This is not possible.
Indexes are built and stored on disk ahead of time, so they cannot cover a value that is only computed while the query runs.
Workaround: On-Demand Materialized Views
You store your calculated data in a separate collection, which you can then index.
This can't be done today without precomputing that information and storing it in a field on the document. The closest alternative would probably be to use MongoDB 4.2's aggregation pipeline-powered updates to precompute and store a createdOrTimestamp field whenever your documents are updated. You could then create an index on createdOrTimestamp that would be used when querying for documents that match a certain year.
What this would look like when updating or after inserting your document:
db.collection.update({ _id: ObjectId("5e8523e7ea740b14fb16b5c3") }, [
{
$set: {
createdOrTimestamp: {
$cond: {
if: {$gt: ['$TimeStamp', null]},
then: '$TimeStamp',
else: '$Created'
}
}
}
}
])
If documents already exist, you could also send off an updateMany operation with that aggregation to get that computed field into all your existing documents.
It would be really nice to be able to define computed fields declaratively on a collection just like indexes, so that they take care of keeping themselves up to date!
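Once createdOrTimestamp is stored, the year lookup can be written as a range on that field instead of a $year expression, which is the form an index on createdOrTimestamp can serve. A minimal sketch, assuming the field holds Date values and that years are meant in UTC; `yearRangeFilter` is a hypothetical helper name.

```javascript
// Hypothetical helper: build a range filter that selects one calendar year.
// Assumes createdOrTimestamp holds Date values and years are meant in UTC.
function yearRangeFilter(year) {
  return {
    createdOrTimestamp: {
      $gte: new Date(Date.UTC(year, 0, 1)),     // 1 January, 00:00 UTC
      $lt: new Date(Date.UTC(year + 1, 0, 1)),  // 1 January of the next year
    },
  };
}

const filter = yearRangeFilter(2020);
console.log(filter.createdOrTimestamp.$gte.toISOString()); // 2020-01-01T00:00:00.000Z
console.log(filter.createdOrTimestamp.$lt.toISOString());  // 2021-01-01T00:00:00.000Z

// In the shell this might be used as:
//   db.collection.createIndex({ createdOrTimestamp: 1 })
//   db.collection.find(yearRangeFilter(2020))
```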

Most efficient way to put fields of an embedded document in its parent for an entire MongoDB collection?

I am looking for the most efficient way to modify all the documents of a collection from this structure:
{
[...]
myValues:
{
a: "any",
b: "content",
c: "can be found here"
}
[...]
}
so it becomes this:
{
[...]
a: "any",
b: "content",
c: "can be found here"
[...]
}
Basically, I want everything under the field myValues to be put in its parent document for all the documents of a collection.
I have been looking for a way to do this in a single query using dbCollection.updateMany(), but it does not seem possible to do such a thing unless the content of myValues is the same for all documents. But in my case the content of myValues changes from one document to the other. For example, I tried:
db.getCollection('myCollection').updateMany({ myValues: { $exists: true } }, { $set: '$myValues' });
thinking it would perhaps resolve the myValues object and use that object to set it in the document. But it returns an error saying it's illegal to assign a string to the $set field.
So what would be the most efficient approach for what I am trying to do? Is there a way to update all the documents of the collection as I need in a single command?
Or do I need to iterate on each document of the collection, and update them one by one?
For now, I iterate on all documents with the following code:
var documents = await myCollection.find({ myValues: { $exists: true } });
for (var document = await documents.next(); document != null; document = await documents.next())
{
await myCollection.updateOne({ _id: document._id }, { $set: document.myValues, $unset: { myValues: 1} });
}
Since my collection is very large, it takes really long to execute.
You can consider using $out as an alternative, single-command solution. It can be used to replace an existing collection with the output of an aggregation. Knowing that, you can write the following aggregation pipeline:
db.myCollection.aggregate([
{
$replaceRoot: {
newRoot: {
$mergeObjects: [ "$$ROOT", "$myValues" ]
}
}
},
{
$project: {
myValues: 0
}
},
{
$out: "myCollection"
}
])
$replaceRoot lets you promote an object to the root level; here that object is the old $$ROOT merged with myValues.
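For a single document, the effect of the $mergeObjects and $project stages can be sketched in plain JavaScript. `flatten` is a hypothetical name and the sample document is made up; note that a field in myValues overwrites a parent field with the same name, because the later argument of $mergeObjects wins.

```javascript
// Plain-JavaScript mirror of $mergeObjects: ["$$ROOT", "$myValues"]
// followed by $project: { myValues: 0 }.
function flatten(doc) {
  const merged = { ...doc, ...(doc.myValues || {}) }; // later keys win
  delete merged.myValues;                             // $project: { myValues: 0 }
  return merged;
}

// Hypothetical document with a name clash on field "a".
const before = { _id: 1, a: "old", myValues: { a: "any", b: "content" } };
const after = flatten(before);
console.log(JSON.stringify(after));
```

If such a clash matters for your data, it may be worth checking for overlapping field names before running the pipeline on the whole collection.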

MongoDB query to find property of first element of array

I have the following data in MongoDB (simplified for what is necessary to my question).
{
_id: 0,
actions: [
{
type: "insert",
data: "abc, quite possibly very very large"
}
]
}
{
_id: 1,
actions: [
{
type: "update",
data: "def"
},{
type: "delete",
data: "ghi"
}
]
}
What I would like is to find the first action type for each document, e.g.
{_id:0, first_action_type:"insert"}
{_id:1, first_action_type:"update"}
(It's fine if the data is structured differently, but I need those values present, somehow.)
EDIT: I've tried db.collection.find({}, {'actions.action_type':1}), but obviously that returns all elements of the actions array.
NoSQL is quite new to me. Before, I would have stored all this in two tables in a relational database and done something like SELECT id, (SELECT type FROM action WHERE document_id = d.id ORDER BY seq LIMIT 1) action_type FROM document d.
You can use the $slice operator in a projection. (Keep in mind, though, that I am not sure the order of the array stays the same when you update it.)
db.collection.find({},{'actions':{$slice:1},'actions.type':1})
You can also use the Aggregation Pipeline introduced in version 2.2:
db.collection.aggregate([
{ $unwind: '$actions' },
{ $group: { _id: "$_id", first_action_type: { $first: "$actions.type" } } }
])
Using the $arrayElemAt operator is actually the most elegant way, although the syntax may be unintuitive:
db.collection.aggregate([
{ $project: { first_action_type: { $arrayElemAt: ["$actions.type", 0] } } }
])
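The syntax becomes less surprising once you see that "$actions.type" first collects the type of every array element and $arrayElemAt then takes index 0 of that list. A plain-JavaScript sketch of that evaluation, using the question's sample data; `firstActionType` is a hypothetical name.

```javascript
// What $arrayElemAt: ["$actions.type", 0] computes for one document:
// "$actions.type" yields the type of every array element, then index 0 is taken.
function firstActionType(doc) {
  const types = (doc.actions || []).map(a => a.type);
  return { _id: doc._id, first_action_type: types[0] };
}

const docs = [
  { _id: 0, actions: [{ type: "insert", data: "abc" }] },
  { _id: 1, actions: [{ type: "update", data: "def" }, { type: "delete", data: "ghi" }] },
];

console.log(docs.map(firstActionType));
```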