MongoDB - How to remove duplicates

I have a collection that contains many duplicates because of the routines that originally populated it. How can I dedupe it?
For example:
{ "_id" : ObjectId("531a5fe448757e00244096fa"), "code" : "ap", "name" : "[Almost Perfect]", "value" : "[u'*']" }
{ "_id" : ObjectId("531a731148757e17587a6e04"), "code" : "ap", "name" : "[Almost Perfect]", "value" : "[u'*']" }
{ "_id" : ObjectId("531a7bb848757e1f7c0ca702"), "code" : "ap", "name" : "[Almost Perfect]", "value" : "[u'*']" }
I want to end up with just one (I don't care which ObjectId gets picked):
{ "_id" : ObjectId("531a5fe448757e00244096fa"), "code" : "ap", "name" : "[Almost Perfect]", "value" : "[u'*']" }

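You should create a unique index on your code field:
db.<collection>.ensureIndex({'code' : 1}, {unique : true, dropDups : true})
unique ensures that no new duplicates can be inserted.
dropDups deletes all duplicate documents when the ensureIndex operation runs. Note that the dropDups option was removed in MongoDB 3.0, so on later versions the duplicates have to be removed before the unique index can be built.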
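Where dropDups is not accepted, one common approach is to group by the field, keep one _id per group, and delete the rest before building the unique index. A minimal sketch, assuming duplicates are defined by code alone; the helper names buildDuplicateGroupsPipeline and idsToRemove are hypothetical, and the commented shell usage assumes coll stands for db.<collection>:

```javascript
// Hypothetical helper: builds an aggregation pipeline that groups documents
// by `field` and keeps only the groups containing more than one document.
function buildDuplicateGroupsPipeline(field) {
  return [
    { $group: { _id: "$" + field, ids: { $push: "$_id" }, count: { $sum: 1 } } },
    { $match: { count: { $gt: 1 } } }
  ];
}

// Hypothetical helper: given the groups produced by the pipeline above,
// returns every _id except the first one in each group (the one that is kept).
function idsToRemove(groups) {
  const remove = [];
  for (const group of groups) {
    remove.push(...group.ids.slice(1));
  }
  return remove;
}

// Assumed usage in the mongo shell, with `coll` standing for db.<collection>:
//   const groups = coll.aggregate(buildDuplicateGroupsPipeline("code")).toArray();
//   coll.deleteMany({ _id: { $in: idsToRemove(groups) } });
//   coll.createIndex({ code: 1 }, { unique: true });
```

Which document survives in each group is arbitrary here, which matches the "don't care which ObjectId gets picked" requirement in the question.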

Related

MongoDB addToSet in nested array

I'm struggling to insert data inside a nested array in MongoDB.
My schema looks like this:
{
"_id" : ObjectId("5c0c55642440311ff0353846"),
"name" : "Test",
"email" : "test#gmail.com",
"username" : "test",
"password" : "$2a$10$RftzGtgM.DqIiaSvH4LqOO6RnLgQfLY3nk7UIAH4OAvvxo0ZMSaHu",
"created" : ISODate("2018-12-08T23:36:04.464Z"),
"classes" : [
{
"_id" : ObjectId("5c0c556e2440311ff0353847"),
"cName" : "1A",
"student" : [
{
"grades" : [ ],
"_id" : ObjectId("5c0c55812440311ff0353848"),
"name" : "StudentName",
"lname" : "StudenteLastName",
"gender" : "M"
}
]
}
],
"__v" : 0
}
What I want to do is insert a grade for the student into the "grades" array.
Expected result is:
{
"_id" : ObjectId("5c0c55642440311ff0353846"),
"name" : "Test",
"email" : "test#gmail.com",
"username" : "test",
"password" : "$2a$10$RftzGtgM.DqIiaSvH4LqOO6RnLgQfLY3nk7UIAH4OAvvxo0ZMSaHu",
"created" : ISODate("2018-12-08T23:36:04.464Z"),
"classes" : [
{
"_id" : ObjectId("5c0c556e2440311ff0353847"),
"cName" : "1A",
"student" : [
{
"grades" : [6],
"_id" : ObjectId("5c0c55812440311ff0353848"),
"name" : "StudentName",
"lname" : "StudenteLastName",
"gender" : "M"
}
]
}
],
"__v" : 0
}
I tried several queries and searched a lot, but none of them worked.
db.teachers.update({"_id": ObjectId("5c0c55642440311ff0353846"), "classes._id": ObjectId("5c0c556e2440311ff0353847"), "classes.student._id": ObjectId("5c0c55812440311ff0353848")},{$addToSet: {"classes.$.student.grades":6}})
Basically, the first argument finds the student (running db.teachers.find() with the same three conditions returns the correct document), and the second is meant to add the value 6 to the grades array (of integers). At this point I get errors, so I think the mistake is in the "adding" part.
I also need to do the same thing in Mongoose.
Any help is appreciated, thanks in advance!
Edit: I solved it. I'm posting my solution in the hope that it will be useful to others.
To push into an array nested three levels deep:
db.teachers.update({"_id":ObjectId("5c0c59985ae5981c58937e12"),"classes":{ $elemMatch : { _id : ObjectId("5c0c59a35ae5981c58937e13") }},"classes.student": { $elemMatch : { _id : ObjectId("5c0c59aa5ae5981c58937e14")} }},{$addToSet:{"classes.$.student.0.grades":3}})
https://docs.mongodb.com/manual/tutorial/query-array-of-documents/
Try using $elemMatch
"classes":{ $elemMatch : { _id : ObjectId("5c0c556e2440311ff0353847") }},
"classes.student": { $elemMatch : { _id : ObjectId("5c0c55812440311ff0353848")} }

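The positional $ operator addresses only one array level, which is why the asker's solution hardcodes index 0 for the student. On MongoDB 3.6 or later, the filtered positional operator with arrayFilters can address both levels by _id. A minimal sketch, assuming that server version; the helper name buildAddGradeUpdate and the identifiers c and s are hypothetical:

```javascript
// Hypothetical helper: builds the arguments for an update that adds `grade`
// to one student's "grades" array, addressing both nesting levels by _id.
function buildAddGradeUpdate(teacherId, classId, studentId, grade) {
  return {
    filter: { _id: teacherId },
    update: { $addToSet: { "classes.$[c].student.$[s].grades": grade } },
    options: {
      // `c` matches the class element, `s` matches the student element.
      arrayFilters: [{ "c._id": classId }, { "s._id": studentId }]
    }
  };
}

// Assumed usage in the mongo shell (MongoDB 3.6 or later):
//   const u = buildAddGradeUpdate(
//     ObjectId("5c0c55642440311ff0353846"),
//     ObjectId("5c0c556e2440311ff0353847"),
//     ObjectId("5c0c55812440311ff0353848"),
//     6);
//   db.teachers.updateOne(u.filter, u.update, u.options);
```

In Mongoose, a model's updateOne should accept the same filter, update and options arguments, although that is worth checking against the Mongoose version in use.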
MongoDB PartialFilterExpression filter issue

I have a collection of items where a document looks something like this:
{
"source" : "rest",
"serviceCode" : "fluff",
"fluff" : "puff",
"systemEntryTime" : ISODate("2018-05-16T09:04:00.585Z")
}
I have a TTL index that expires documents after two weeks:
{
"v" : 1,
"key" : {
"systemEntryTime" : 1
},
"name" : "systemEntryTime_1",
"ns" : "storage.item",
"expireAfterSeconds" : NumberLong(1209600)
}
Now I want certain documents where source = "ftp" to have a different TTL. For this purpose I created the following index with a partialFilterExpression:
{
"v" : 1,
"key" : {
"systemEntryTime" : 1,
"source" : 1
},
"name" : "systemEntryTime_1_source_1",
"ns" : "storage.item",
"expireAfterSeconds" : NumberLong(1),
"partialFilterExpression" : {
"source" : {
"$eq" : "ftp"
}
}
}
Unfortunately this is not working. What am I doing wrong here? I have experimented with dropping the old index and using only this one, but no documents are dropped according to the TTL (or any documents at all, for that matter).
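One thing that may explain this: TTL indexes are single-field indexes, and expireAfterSeconds is ignored on a compound index such as { systemEntryTime: 1, source: 1 }. A minimal sketch of a single-field alternative, assuming the server supports combining partialFilterExpression with expireAfterSeconds; the helper name buildPartialTtlIndex is hypothetical, and the collection name item is taken from the ns value above:

```javascript
// Hypothetical helper: builds the key and options for a TTL index that only
// applies to documents matching `filter`. The key is a single date field,
// because TTL expiry is not applied through compound indexes.
function buildPartialTtlIndex(dateField, seconds, filter) {
  return {
    keys: { [dateField]: 1 },
    options: { expireAfterSeconds: seconds, partialFilterExpression: filter }
  };
}

const ftpTtl = buildPartialTtlIndex("systemEntryTime", 1, { source: { $eq: "ftp" } });

// Assumed usage in the mongo shell:
//   db.item.createIndex(ftpTtl.keys, ftpTtl.options);
```

Depending on the server version, the existing two-week index on the same key may have to be dropped or made partial itself before this index can be created.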

mongodb aggregate with extra info

I have a mongo collection containing docs such as this:
{
"_id" : ObjectId("57697321c22d3917acd66513"),
"parent" : "AlphaNumericID",
"signature" : "AnotherAlphaNumericID",
"price" : 1638,
"url" : "http://www.thecompany.com/path/to/page1",
"date" : ISODate("2016-06-21T17:02:20.352Z"),
"valid" : true
}
What I am trying to do is run one query that groups on the signature field and returns the min and max price AND the corresponding url:
{
"signature" : "AnotherAlphaNumericID",
"min_price" : 1504,
"min_rent_listing" : "http://www.thecompany.com/path/to/page1",
"max_price" : 1737,
"max_price_listing" : "http://www.thecompany.com/path/to/page2",
}
Running a $group on the signature field to obtain $min and $max is straightforward, but to get the actual urls I split the work into two steps: a first query returns the documents for each signature sorted by price from min to max, and then Python code takes the first and last elements. This works fine, but it would be nice to have a single query.
Thoughts?
P.S. I also toyed with running one query for the min and one for the max and zipping the results.
You can do this with $group and $project after sorting by price. Assuming the dataset is:
{
"_id" : ObjectId("57db28dc705af235a826873a"),
"parent" : "AlphaNumericID",
"signature" : "AnotherAlphaNumericID",
"price" : 1638.0,
"url" : "http://www.thecompany.com/path/to/page1",
"date" : ISODate("2016-06-21T17:02:20.352+0000"),
"valid" : true
}
{
"_id" : ObjectId("57db28dc705af235a826873b"),
"parent" : "AlphaNumericID",
"signature" : "AnotherAlphaNumericID",
"price" : 168.0,
"url" : "http://www.thecompany.com/path/to/page2",
"date" : ISODate("2016-06-21T17:02:20.352+0000"),
"valid" : true
}
{
"_id" : ObjectId("57db28dc705af235a826873c"),
"parent" : "AlphaNumericID",
"signature" : "AnotherAlphaNumericID",
"price" : 163.0,
"url" : "http://www.thecompany.com/path/to/page3",
"date" : ISODate("2016-06-21T17:02:20.352+0000"),
"valid" : true
}
{
"_id" : ObjectId("57db28dc705af235a826873d"),
"parent" : "AlphaNumericID",
"signature" : "AnotherAlphaNumericID",
"price" : 1680.0,
"url" : "http://www.thecompany.com/path/to/page4",
"date" : ISODate("2016-06-21T17:02:20.352+0000"),
"valid" : true
}
Try the following query in the shell:
db.collection.aggregate([
    {$sort:{price:1}},
    {$group:{
        _id:"$signature",
        _first:{$first:"$url"},
        _last:{$last:"$url"},
        _min:{$first:"$price"},
        _max:{$last:"$price"}
    }},
    {$project:{
        _id:0,
        min:{
            url:"$_first",
            price:"$_min"},
        max:{
            url:"$_last",
            price:"$_max"}
    }}
])
The output contains the minimum and maximum price with the corresponding url:
{
"min" : {
"url" : "http://www.thecompany.com/path/to/page3",
"price" : 163.0
},
"max" : {
"url" : "http://www.thecompany.com/path/to/page4",
"price" : 1680.0
}
}
What I changed from the original answer:
_min:{$min:"$price"}, --> to use $first
_max:{$max:"$price"}} --> to use $last
Reason: the documents enter the $group stage sorted by price in ascending order, so within each group the first document has the minimum price and the last has the maximum.
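The sort-then-first/last reasoning can be checked outside the database. The following is a plain JavaScript model of the pipeline, not MongoDB code; the function name minMaxBySignature is hypothetical, the sample documents are reduced to the fields the pipeline uses, and the signature is kept in the output as the question asked:

```javascript
// In-memory model of the pipeline: sort by price ascending, group by
// signature, then take the first and last document of each group.
function minMaxBySignature(docs) {
  const sorted = [...docs].sort((a, b) => a.price - b.price);
  const groups = new Map();
  for (const doc of sorted) {
    if (!groups.has(doc.signature)) groups.set(doc.signature, []);
    groups.get(doc.signature).push(doc);
  }
  const result = [];
  for (const [signature, group] of groups) {
    const first = group[0];
    const last = group[group.length - 1];
    result.push({
      signature,
      min: { url: first.url, price: first.price },
      max: { url: last.url, price: last.price }
    });
  }
  return result;
}

// The four sample documents from the answer, reduced to the relevant fields.
const sample = [
  { signature: "AnotherAlphaNumericID", price: 1638, url: "http://www.thecompany.com/path/to/page1" },
  { signature: "AnotherAlphaNumericID", price: 168, url: "http://www.thecompany.com/path/to/page2" },
  { signature: "AnotherAlphaNumericID", price: 163, url: "http://www.thecompany.com/path/to/page3" },
  { signature: "AnotherAlphaNumericID", price: 1680, url: "http://www.thecompany.com/path/to/page4" }
];

console.log(JSON.stringify(minMaxBySignature(sample), null, 2));
```

For the sample data this yields page3 at 163 as the minimum and page4 at 1680 as the maximum, the same pair the shell output shows.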

Inconsistent query results with embedded documents on MongoDB

I've got a collection called payments with an example of its document shown below:
{
"_id" : ObjectId("579b5ee817e3aaac2f0aebc1"),
"updatedAt" : ISODate("2016-07-29T11:04:01.209-03:00"),
"createdAt" : ISODate("2016-07-29T10:49:28.113-03:00"),
"createdBy" : ObjectId("5763f56010cd7b03008147d4"),
"contract" : ObjectId("578cb907f1575f0300d84d09"),
"recurrence" : [
{
"when" : ISODate("2016-05-29T11:03:45.606-03:00"),
"_id" : ObjectId("579b6241ea945e3631f64e2d"),
"transaction" : {
"createdAt" : ISODate("2016-05-29T11:03:45.608-03:00"),
"tid" : "9999999999999999B01A",
"status" : 4,
"code" : "00",
"message" : "Transação autorizada"
},
"status" : "PAGO"
},
{
"when" : ISODate("2016-06-29T11:03:45.608-03:00"),
"_id" : ObjectId("579b6241ea945e3631f64e2c"),
"transaction" : {
"createdAt" : ISODate("2016-06-29T11:03:45.608-03:00"),
"tid" : "9999999999999999B01A",
"status" : 4,
"code" : "00",
"message" : "Transação autorizada"
},
"status" : "PAGO"
},
{
"when" : ISODate("2016-07-29T11:03:45.608-03:00"),
"_id" : ObjectId("579b6241ea945e3631f64e2b"),
"status" : "ERRO",
"transaction" : {
"code" : "56",
"createdAt" : ISODate("2016-07-29T11:04:01.196-03:00"),
"message" : "Autorização negada",
"status" : 5,
"tid" : "1006993069000730B88A"
}
},
{
"when" : ISODate("2016-07-30T11:03:45.608-03:00"),
"_id" : ObjectId("579b6241ea945e3631f64e2a"),
"status" : "PENDENTE"
},
{
"when" : ISODate("2016-07-31T11:03:45.608-03:00"),
"_id" : ObjectId("579b6241ea945e3631f64e29"),
"status" : "PENDENTE"
},
{
"when" : ISODate("2016-08-01T11:03:45.608-03:00"),
"_id" : ObjectId("579b6241ea945e3631f64e28"),
"status" : "PENDENTE"
}
],
"status" : "PAGO",
"conditions" : {
"originalValue" : 7406.64,
"totalValue" : 7400,
"upfrontValue" : 1500,
"upfrontInstallments" : 3,
"balanceInstallments" : 9
},
"__v" : 0,
"transaction" : {
"code" : "00",
"createdAt" : ISODate("2016-07-29T10:49:46.610-03:00"),
"message" : "Transação autorizada",
"status" : 6,
"tid" : "1006993069000730AF5A"
}
}
If I run the query below, I get the desired document shown above:
db.payments.find({ "recurrence.transaction.tid": "1006993069000730B88A" })
However, if I run this other query, MongoDB returns my entire collection (presumably because it didn't match the subdocument's id):
db.payments.find({ "recurrence._id": ObjectId("579b6241ea945e3631f64e2b") })
Both queries should return the same result! I also checked some other questions including this one so unless I'm going crazy I'm doing the same thing. Not sure why the inconsistent results though.
Try this:
db.payments.find({ recurrence : { $elemMatch: { "transaction.tid": "1006993069000730B88A"} } }).pretty()

MongoDB $maxScan does not behave like limit()

This is my first question on Stack Overflow, and I look forward to your answers. My question is:
When I use MongoDB query selectors, I want to limit the results, but $maxScan does not work the way I expect.
This is the result I want:
db.post.find({query:{status:"publish"},$orderby:{date:-1}},{status:1,name:1,date:1,$slice:2}).limit(3)
{ "_id" : ObjectId("519262580cf21fb1647fb765"), "date" : ISODate("2013-05-14T16:12:08.600Z"), "status" : "publish", "name" : "关于多说" }
{ "_id" : ObjectId("519254ad0cf2f064f6ecef82"), "date" : ISODate("2013-05-14T15:13:49.017Z"), "status" : "publish", "name" : "回顾<蜗居>的100句经典台词" }
{ "_id" : ObjectId("519254690cf2f064f6ecef81"), "date" : ISODate("2013-05-14T15:12:41.462Z"), "status" : "publish", "name" : "女人脱光了是什么" }
This is the result when I use $maxScan:
db.post.find({query:{status:"publish"},$maxScan:3,$orderby:{date:-1}},{status:1,name:1,date:1})
{ "_id" : ObjectId("518e6c690cf21a363df2956e"), "date" : ISODate("2013-05-11T16:06:01.341Z"), "status" : "publish", "name" : "淘宝新店,充值任务" }
It seems that $maxScan does not work like limit(): it first limits the collection data and then executes the query, which is not what I want. Am I doing something wrong? Please help. Thanks.
All results:
db.post.find({query:{},$orderby:{date:-1}},{status:1,name:1,date:1})
{ "_id" : ObjectId("519262580cf21fb1647fb765"), "date" : ISODate("2013-05-14T16:12:08.600Z"), "status" : "publish", "name" : "关于多说" }
{ "_id" : ObjectId("519254ad0cf2f064f6ecef82"), "date" : ISODate("2013-05-14T15:13:49.017Z"), "status" : "publish", "name" : "回顾<蜗居>的100句经典台词" }
{ "_id" : ObjectId("519254690cf2f064f6ecef81"), "date" : ISODate("2013-05-14T15:12:41.462Z"), "status" : "publish", "name" : "女人脱光了是什么" }
{ "_id" : ObjectId("518ee61a0cf22bd326d60215"), "date" : ISODate("2013-05-12T00:45:14.295Z"), "status" : "publish", "name" : "JSTL日期格式化用法(转载)" }
{ "_id" : ObjectId("518e6c690cf21a363df2956e"), "date" : ISODate("2013-05-11T16:06:01.341Z"), "status" : "publish", "name" : "淘宝新店,充值任务" }
{ "_id" : ObjectId("518e21c90cf21a363df2956d"), "date" : ISODate("2013-05-11T10:47:37.803Z"), "status" : "draft", "name" : "一夜没睡" }
{ "_id" : ObjectId("518df75d0cf21a363df2956c"), "date" : ISODate("2013-05-11T07:46:37.726Z"), "status" : "draft", "name" : "飞娥入侵" }
{ "_id" : ObjectId("518d80630cf21a363df2956b"), "date" : ISODate("2013-05-10T23:18:59.323Z"), "status" : "publish", "name" : "Java的日期格式化常用方法" }
To return only the top results, you should use limit(), which limits the number of results returned from the cursor. It is commonly used with skip() to paginate the results.
It's not explained very clearly in the docs, but $maxScan, as the name suggests, limits the number of documents the query will examine. Presumably your query examines some documents that don't meet the criteria (status != publish) and then discards them.
Do you have an index on status? It's possible that could help the query return the results you want while scanning fewer documents, but I still think limit() is what you want.
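The difference can be illustrated with a toy in-memory model. This is not how the server is implemented, only a sketch of the two behaviours described above; the function names and the sample data are hypothetical:

```javascript
// Toy model: `maxScan` caps how many documents are examined, whether or not
// they match; `limit` caps how many matching documents are returned.
function findWithMaxScan(docs, predicate, maxScan) {
  return docs.slice(0, maxScan).filter(predicate);
}

function findWithLimit(docs, predicate, limit) {
  return docs.filter(predicate).slice(0, limit);
}

// Hypothetical data, listed in the order the server happens to scan it.
const posts = [
  { name: "a", status: "draft" },
  { name: "b", status: "draft" },
  { name: "c", status: "publish" },
  { name: "d", status: "publish" },
  { name: "e", status: "publish" }
];
const isPublished = (doc) => doc.status === "publish";

const scanned = findWithMaxScan(posts, isPublished, 3); // examines a, b, c
const limited = findWithLimit(posts, isPublished, 3);   // keeps the first 3 matches

console.log(scanned.map((d) => d.name), limited.map((d) => d.name));

// Assumed cursor-style form of the query the asker wants, in the mongo shell:
//   db.post.find({status: "publish"}, {status: 1, name: 1, date: 1})
//     .sort({date: -1}).limit(3)
```

In this model a scan cap of 3 returns a single match because two of the three examined documents are drafts, which is the kind of result the question shows.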