In the MongoDB aggregation framework, I was hoping to use the $unwind operator on an object (i.e. a JSON object used as a key/value collection). It doesn't look like this is possible; is there a workaround? Are there plans to implement this?
For example, take the article collection from the aggregation documentation. Suppose there is an additional field "ratings" that is a map from user -> rating. Could you calculate the average rating for each user?
Other than this, I'm quite pleased with the aggregation framework.
Update: here's a simplified version of my JSON collection per request. I'm storing genomic data. I can't really make genotypes an array, because the most common lookup is to get the genotype for a random person.
variants: [
  {
    name: 'variant1',
    genotypes: {
      person1: 2,
      person2: 5,
      person3: 7
    }
  },
  {
    name: 'variant2',
    genotypes: {
      person1: 3,
      person2: 3,
      person3: 2
    }
  }
]

It is not possible to do the type of computation you are describing with the aggregation framework - and it's not because there is no $unwind method for non-arrays. Even if the person:value objects were documents in an array, $unwind would not help.
The "group by" functionality (whether in MongoDB or in any relational database) is done on the value of a field or column. We group by value of field and sum/average/etc based on the value of another field.
A simple example is a variant of what you suggest: a ratings field added to the example article collection, not as a map from user to rating but as an array like this:
{ title : "title of article", ...
  ratings: [
    { voter: "user1", score: 5 },
    { voter: "user2", score: 8 },
    { voter: "user3", score: 7 }
  ]
}
Now you can aggregate this with:
[ {$unwind: "$ratings"},
  {$group : {_id : "$ratings.voter", averageScore: {$avg:"$ratings.score"} } }
]
But this example structured as you describe it would look like this:
{ title : "title of article", ...
  ratings: {
    user1: 5,
    user2: 8,
    user3: 7
  }
}
or even this:
{ title : "title of article", ...
  ratings: [
    { user1: 5 },
    { user2: 8 },
    { user3: 7 }
  ]
}
Even if you could $unwind this, there is nothing to aggregate on here. Unless you know the complete list of all possible keys (users) you cannot do much with this. [*]
An analogous relational DB schema to what you have would be:
user1: integer,
user2: integer,
user3: integer
That's not what would be done; instead we would do this:
username: varchar(32),
score: integer
and now we aggregate using SQL:
select username, avg(score) from T group by username;
There is an enhancement request for MongoDB that may allow you to do this in the aggregation framework in the future: the ability to project values to keys or vice versa. Meanwhile, there is always map/reduce.
[*] There is a complicated way to do this if you know all unique keys (you can find all unique keys with a method similar to this) but if you know all the keys you may as well just run a sequence of queries of the form db.articles.find({"ratings.user1":{$exists:true}},{_id:0,"ratings.user1":1}) for each userX which will return all their ratings and you can sum and average them simply enough rather than do a very complex projection the aggregation framework would require.

Since MongoDB 3.4.4, you can transform an object into an array using $objectToArray.
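A minimal sketch for the genotype example in the question, assuming (hypothetically) that each variant is stored as its own document in a collection named `variants`. $objectToArray turns `genotypes` into `{ k, v }` pairs, $unwind emits one document per pair, and $group averages `v` per person. If the variants stay embedded in one document's `variants` array instead, an extra `{ $unwind: "$variants" }` stage and the path `$variants.genotypes` would be needed first. `averageByKey` is not a MongoDB API; it is plain JavaScript that mirrors what the three stages compute (assuming numeric values), so the expected result can be checked without a server.

```javascript
// Aggregation pipeline (MongoDB 3.4.4+), e.g. db.variants.aggregate(perPersonAverage).
// "variants" is a hypothetical collection name.
const perPersonAverage = [
  // { person1: 2, person2: 5 } -> [ { k: "person1", v: 2 }, { k: "person2", v: 5 } ]
  { $project: { genotypes: { $objectToArray: "$genotypes" } } },
  // one output document per { k, v } pair
  { $unwind: "$genotypes" },
  // group on the former key (the person) and average the former value
  { $group: { _id: "$genotypes.k", avgGenotype: { $avg: "$genotypes.v" } } }
];

// Plain-JavaScript mirror of the three stages above, for checking results offline.
function averageByKey(docs, field) {
  const sums = new Map(); // key -> { total, n }
  for (const doc of docs) {
    // equivalent of $objectToArray followed by $unwind
    for (const [k, v] of Object.entries(doc[field] || {})) {
      const acc = sums.get(k) || { total: 0, n: 0 };
      acc.total += v;
      acc.n += 1;
      sums.set(k, acc);
    }
  }
  // equivalent of $group with $avg
  return [...sums].map(([k, { total, n }]) => ({ _id: k, avgGenotype: total / n }));
}

const variants = [
  { name: "variant1", genotypes: { person1: 2, person2: 5, person3: 7 } },
  { name: "variant2", genotypes: { person1: 3, person2: 3, person3: 2 } }
];

console.log(averageByKey(variants, "genotypes"));
```

On the sample data this gives 2.5 for person1, 4 for person2 and 4.5 for person3. Note that the server does not guarantee the order of $group output documents; add a $sort stage if order matters.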

This is an old question, but I've run across a tidbit of information through trial and error that people may find useful.
It's actually possible to unwind on a dummy value by fooling the parser this way:
{ $project: {
    Field1: 1, Field2: 1, Field3: 1,
    DummyUnwindField: { $ifNull: [null, [1.0]] }
} },
{ $unwind: "$DummyUnwindField" }
This will produce one row per document, regardless of whether or not the value exists. You may be able to tinker with this to generate the results you want. I had hoped to combine this with multiple $unwinds to emulate something like emit() in map/reduce, but alas, the last $unwind wins, or they combine as an intersection rather than a union, which makes it impossible to achieve the results I was looking for. I am disappointed with the aggregation framework's functionality, as it doesn't fit the one use case I was hoping to use it for (and a lot of the questions on StackOverflow in this area seem to ask the same thing): ordering results based on match rate. Improving the poor map/reduce performance would have made this entire feature unnecessary.

This is what I found and extended.
Let's create an experimental database in mongo:
db.copyDatabase('livedb' , 'experimentdb')
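Now use experimentdb and convert the object into an array in your experimentcollection:
db.experimentcollection.find().forEach(function(e) {
    e.ratings = [e.ratings]; // name of the object field to be converted to an array, e.g. ratings
    db.experimentcollection.save(e);
});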
Some JavaScript to convert each document into a flat object:
var flatArray = [];
var data = db.experimentcollection.find().toArray();
for (var index = 0; index < data.length; index++) {
    var flatObject = {};
    for (var prop in data[index]) {
        var value = data[index][prop];
        if (Array.isArray(value) && prop === 'ratings') {
            // copy every user -> rating pair up to the top level
            for (var i = 0; i < value.length; i++) {
                for (var inProp in value[i]) {
                    flatObject[inProp] = value[i][inProp];
                }
            }
        } else {
            flatObject[prop] = value;
        }
    }
    flatArray.push(flatObject);
}


How to drop duplicate embedded documents

I have a users collection whose documents contain lists of subdocuments. The schema is something like this:
{
  _id: ObjectId(),
  name: "aaa",
  age: 20,
  transactions: [
    {
      trans_id: 1,
      product: "mobile",
      price: 30
    },
    {
      trans_id: 2,
      product: "tv",
      price: 10
    }
  ]
}
So I have one question. trans_id in the transactions list is unique across all the products, but I may have copied the same transaction again with the same trans_id (due to bad ETL programming). Now I want to drop those duplicate subdocuments. I have indexed trans_id, though not as unique. I read about the dropDups option, but will it delete only the particular duplicate that exists in the DB, or will it drop the whole document (which I definitely don't want)? If not, how do I do it?
PS: I am using MongoDB 2.6.6 version.
The nearest case to what is presented here is that you need a way of defining the "distinct" items within the array, where some items are in fact an "exact copy" of other items in the array.
The best case is to use $addToSet along with the $each modifier within a looping operation for the collection. Ideally you use the Bulk Operations API to take advantage of the reduced traffic when doing so:
var bulk = db.collection.initializeOrderedBulkOp();
var count = 0;

// Read the docs
db.collection.find({}).forEach(function(doc) {
    // Blank the array
    bulk.find({ "_id": doc._id })
        .updateOne({ "$set": { "transactions": [] } });
    // Resend as a "set"
    bulk.find({ "_id": doc._id })
        .updateOne({
            "$addToSet": {
                "transactions": { "$each": doc.transactions }
            }
        });
    count++;

    // Execute once every 500 statements ( actually 1000 )
    if ( count % 500 == 0 ) {
        bulk.execute();
        bulk = db.collection.initializeOrderedBulkOp();
    }
});

// If a remainder then execute the remaining stack
if ( count % 500 != 0 )
    bulk.execute();
So as long as the "duplicate" content is "entirely the same" then this approach will work. If the only thing that is actually "duplicated" is the "trans_id" field then you need an entirely different approach, since none of the "whole documents" are "duplicated" and this means you need more logic in place to do this.
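For the case where only `trans_id` repeats, one possible approach is sketched below, assuming the first occurrence of each `trans_id` is the one to keep. `dedupeByKey` and `dedupeTransactions` are hypothetical helpers, not MongoDB APIs, and the collection name `users` is an assumption. The script reads each document that has at least two transactions, filters the array client-side and writes it back with `$set`; that read-then-write is not atomic, so it should run while nothing else is writing to those documents.

```javascript
// Keep the first element for each value of `key`; drop later elements with the same value.
// Values are compared as object keys (i.e. as strings), so 1 and "1" count as the same id.
function dedupeByKey(items, key) {
  var seen = {};
  return items.filter(function(item) {
    var k = String(item[key]);
    if (Object.prototype.hasOwnProperty.call(seen, k)) {
      return false; // a later copy of an id we have already kept
    }
    seen[k] = true;
    return true;
  });
}

// Shell usage: call dedupeTransactions(db) with the shell's `db`.
function dedupeTransactions(db) {
  // "transactions.1" exists only when the array has at least two elements
  db.users.find({ "transactions.1": { $exists: true } }).forEach(function(doc) {
    var unique = dedupeByKey(doc.transactions, "trans_id");
    if (unique.length !== doc.transactions.length) {
      db.users.update({ _id: doc._id }, { $set: { transactions: unique } });
    }
  });
}
```

Only documents that actually contained a repeated `trans_id` are rewritten, which keeps the number of writes down on a large collection.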

How to remove duplicates based on a key in Mongodb?

I have a collection in MongoDB with around 3 million records. A sample record looks like this:
{ "_id" : ObjectId("50731xxxxxxxxxxxxxxxxxxxx"),
  "source_references" : [
    {
      "_id" : ObjectId("5045xxxxxxxxxxxxxx"),
      "name" : "xxx",
      "key" : 123
    }
  ]
}
I have a lot of duplicate records in the collection that share the same source_references.key. (By duplicate I mean the same source_references.key, not the same _id.)
I want to remove duplicate records based on source_references.key. I'm thinking of writing some PHP code to traverse each record and remove it if a duplicate exists.
Is there a way to remove the duplicates from the mongo shell?
This answer is obsolete: the dropDups option was removed in MongoDB 3.0, so a different approach will be required in most cases. For example, you could use aggregation as suggested on: MongoDB duplicate documents even after adding unique key.
If you are certain that the source_references.key identifies duplicate records, you can ensure a unique index with the dropDups:true index creation option in MongoDB 2.6 or older:
db.things.ensureIndex({'source_references.key' : 1}, {unique : true, dropDups : true})
This will keep the first unique document for each source_references.key value, and drop any subsequent documents that would otherwise cause a duplicate key violation.
Important Note: Any documents missing the source_references.key field will be considered as having a null value, so subsequent documents missing the key field will be deleted. You can add the sparse:true index creation option so the index only applies to documents with a source_references.key field.
Obvious caution: Take a backup of your database, and try this in a staging environment first if you are concerned about unintended data loss.
This is the easiest query I used on MongoDB 3.2:
db.myCollection.find({}, {myCustomKey:1}).sort({_id:1}).forEach(function(doc){
    db.myCollection.remove({_id:{$gt:doc._id}, myCustomKey:doc.myCustomKey});
});
Index myCustomKey before running this to speed it up.
While @Stennie's is a valid answer, it is not the only way. In fact, the MongoDB manual asks you to be very cautious when doing that. There are two other options:
Let MongoDB do it for you using Map Reduce (another way)
Do it programmatically, which is less efficient.
Here is a slightly more 'manual' way of doing it:
Essentially, first get a list of all the unique keys you are interested in.
Then perform a search using each of those keys and delete documents if that search returns more than one.
db.collection.distinct("key").forEach((num) => {
    var i = 0;
    db.collection.find({key: num}).forEach((doc) => {
        if (i) db.collection.remove({key: num}, { justOne: true });
        i++;
    });
});
I had a similar requirement but I wanted to retain the latest entry. The following query worked with my collection which had millions of records and duplicates.
/** Create an array to store all duplicate record ids */
var duplicates = [];

/** Start aggregation pipeline */
db.collection.aggregate([
  { $match: { /** Add any filter here. Add an index for the filter keys */
      filterKey: { $exists: false }
  } },
  { $sort: { /** Sort so that the element you want to retain comes first */
      createdAt: -1
  } },
  { $group: {
      _id: {
        key1: "$key1", key2: "$key2" /** Documents with the same key1 and key2 values are considered duplicates */
      },
      dups: { $push: { _id: "$_id" } },
      count: { $sum: 1 }
  } },
  { $match: {
      count: { "$gt": 1 }
  } }
], {
  allowDiskUse: true
}).forEach(function(doc) {
  doc.dups.shift(); /** Keep the first (i.e. latest) document of each group */
  doc.dups.forEach(function(dup) {
    duplicates.push(dup._id);
  });
});

/** Delete the duplicates in chunks */
var i, j, temparray, chunk = 100000;
for (i = 0, j = duplicates.length; i < j; i += chunk) {
  temparray = duplicates.slice(i, i + chunk);
  db.collection.remove({ _id: { $in: temparray } });
}
Expanding on Fernando's answer, I found that it was taking too long, so I modified it.
var x = 0;
db.collection.distinct("field").forEach(fieldValue => {
  var i = 0;
  db.collection.find({ "field": fieldValue }).forEach(doc => {
    if (i) {
      db.collection.remove({ _id: doc._id });
    }
    i++;
    x += 1;
    if (x % 100 === 0) {
      print(x); // Every time we process 100 docs.
    }
  });
});
The improvement is basically that documents are removed by their _id, which should be faster, and that the progress of the operation is printed; you can change the reporting interval (100 here) to whatever you want.
Also, indexing the field before the operation helps.
pip install mongo_remove_duplicate_indexes
Create a script in any language.
Iterate over your collection.
Create a new collection and create a new index in it with unique set to true. Remember that this index has to be the same as the index you wish to remove duplicates from in your original collection, with the same name.
For example, say you have a collection gaming, and in this collection you have a field genre which contains duplicates that you wish to remove; just create a new collection.
Create the new index.
Now, when you insert documents with the same genre, only the first will be accepted; the others will be rejected with a duplicate key error.
Now just insert the JSON documents you read into the new collection and handle the exception using exception handling, for example pymongo.errors.DuplicateKeyError.
Check out the source code of the mongo_remove_duplicate_indexes package for a better understanding.
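A sketch of the same idea using the Node.js MongoDB driver (an assumption; the answer itself points to Python, where the equivalent exception is `pymongo.errors.DuplicateKeyError`). It creates the unique index on the target collection first, copies every document, and counts the inserts that the server rejects with duplicate-key error code 11000. `copyWithoutDuplicates` and its parameters are hypothetical names; `source` and `target` are assumed to be driver `Collection` objects. "First" here means first in the cursor's order, so add a sort to `find` if a specific document should survive.

```javascript
// Copy documents from `source` into `target`, letting a unique index on `field`
// reject every document whose value was already inserted (first one wins).
async function copyWithoutDuplicates(source, target, field) {
  // Create the index before copying, otherwise duplicates are not rejected.
  await target.createIndex({ [field]: 1 }, { unique: true });

  let kept = 0;
  let skipped = 0;
  for await (const doc of source.find({})) {
    try {
      await target.insertOne(doc);
      kept += 1;
    } catch (err) {
      if (err.code === 11000) {
        skipped += 1; // duplicate key: an earlier document already has this value
      } else {
        throw err; // anything else is a real failure
      }
    }
  }
  return { kept, skipped };
}
```

Once the counts look right, the original collection can be dropped and the new one renamed in its place.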
If you have enough memory, you can do something like this in Scala:
.foreach(x => cole.remove({id $eq x}))

MongoDB: Unique index on array element's property

I have a structure similar to this:
class Cat {
    int id;
    List<Kitten> kittens;
}

class Kitten {
    int id;
}
I'd like to prevent users from creating a cat with more than one kitten with the same id. I've tried creating an index as follows:
db.Cats.ensureIndex({'id': 1, 'kittens.id': 1}, {unique:true})
But when I attempt to insert a badly-formatted cat, Mongo accepts it.
Am I missing something? Can this even be done?
As far as I know, unique indexes only enforce uniqueness across different documents, so this would throw a duplicate key error:
db.cats.insert( { id: 123, kittens: [ { id: 456 } ] } )
db.cats.insert( { id: 123, kittens: [ { id: 456 } ] } )
But this is allowed:
db.cats.insert( { id: 123, kittens: [ { id: 456 }, { id: 456 } ] } )
I'm not sure if there's any way to enforce the constraint you need at the Mongo level; maybe it's something you could check in the application logic when you insert or update?
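A minimal application-side check along those lines, assuming kitten ids are plain numbers or strings (it compares them with a `Set`, so ObjectId values would first need converting, e.g. with `String(...)`). `hasUniqueKittenIds` and `assertValidCat` are hypothetical helper names; the idea is to call them before any insert or update that sets `kittens`.

```javascript
// True when no two kittens in cat.kittens share the same id.
function hasUniqueKittenIds(cat) {
  const ids = (cat.kittens || []).map(k => k.id);
  return new Set(ids).size === ids.length;
}

// Reject the write in application code before it reaches MongoDB.
function assertValidCat(cat) {
  if (!hasUniqueKittenIds(cat)) {
    throw new Error("Cat " + cat.id + " has duplicate kitten ids");
  }
  return cat;
}
```

This only protects writes that go through this code path; other clients writing to the same collection are not checked.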
Ensuring uniqueness of the individual values in an array field
In addition to the example above, MongoDB has an operator that, when you add a new object/value to an array field, only performs the update if the value/object doesn't already exist.
So if you have a document that looks like this:
{ _id: 123, kittens: [456] }
This would be allowed:
db.cats.update({_id:123}, {$push: {kittens:456}})
resulting in
{ _id: 123, kittens: [456, 456] }
However, using the $addToSet operator (as opposed to $push) checks whether the value already exists before adding it.
So, starting with:
{ _id: 123, kittens: [456] }
then executing:
db.cats.update({_id:123}, {$addToSet: {kittens:456}})
Would not have any effect.
So, long story short, unique constraints don't validate uniqueness within the value items of an array field, just that two documents can't have identical values in the indexed fields.
There is an equivalent of insert with uniqueness in an array attribute. The following command essentially does an insert while ensuring the uniqueness of kittens (upsert creates the document for you if the object with id 123 doesn't already exist).
db.cats.update(
  { id: 123 },
  { $addToSet: { kittens: { $each: [ 456, 456 ] } }, $set: { 'otherfields': 'extraval', "field2": "value2" } },
  { upsert: true }
)
The resulting value of the object will be
{
  "id": 123,
  "kittens": [456],
  "otherfields": "extraval",
  "field2": "value2"
}
What seems important here is ensuring that no more than one item with the same id (or some other field that must be treated as unique) exists in a MongoDB object array. A simple update query like this, using $addToSet, will suffice.
Forgive me, I am not a mongo shell expert; I am using the Java MongoDB driver version 4.0.3:
MongoCollection<Cat> collection = database.getCollection("cat", Cat.class);
UpdateResult result = collection.updateOne(and(eq("Id", 1), nin("kittens.id", newKittenId)), addToSet("kittens", new Kitten(newKittenId)));
The query adds an extra condition to the match: where cat.id is 1 and newKittenId is not already used by any of the previously added kittens. So if the cat is found and no kitten has taken the new kitten id, the query goes ahead and updates the cat's kittens by adding the new one. But if newKittenId has already been taken by one of the kittens, it simply returns an UpdateResult with a matched and modified count of 0 (nothing happens).
Note: this does not enforce a unique constraint on kitten.id. MongoDB does not support uniqueness within object arrays in a document, and addToSet does not really handle duplicate items in an object array unless the object is a 100% replica of what is in the database; check here for more explanation about addToSet.
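For shell or Node.js users, the same conditional update can be written as below. This is a sketch assuming a `cats` collection whose documents use `id` and `kittens.id` as in the question; `addKittenIfNew` is a hypothetical helper. `$push` is enough here because the `$ne` condition in the filter already excludes cats that have a kitten with that id, and the check and the write happen in one update operation rather than a read followed by a write.

```javascript
// Push `kitten` onto the cat's kittens array only if no existing kitten has the same id.
// `cats` is a shell or driver collection; returns the update result.
function addKittenIfNew(cats, catId, kitten) {
  return cats.updateOne(
    // matches only when no element of kittens has kitten.id
    { id: catId, "kittens.id": { $ne: kitten.id } },
    { $push: { kittens: kitten } }
  );
}
```

A `modifiedCount` of 0 in the result tells the caller that the kitten was rejected (or that the cat does not exist).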
There is a workaround you can use with a document validator.
Here is an example validator where "a" is an array and within "a" subdocument field "b" value must be unique. This assumes the collection is either empty or already complies with the rule:
> db.runCommand({collMod:"coll", validator: {$expr:{$eq:[{$size:"$a.b"},{$size:{$setUnion:"$a.b"}}]}}})
/* test it */
> db.coll.insert({a:[{b:1}]}) /* success */
> db.coll.update({},{ '$push' : { 'a':{b:1}}})
{
	"nMatched" : 0,
	"nUpserted" : 0,
	"nModified" : 0,
	"writeError" : {
		"code" : 121,
		"errmsg" : "Document failed validation"
	}
}
See more info about this solution in the original post.
You can write a custom Mongoose validation method in this case. You can hook into post validation. Mongoose has validation and you can implement a hook before (pre) or after(post) validation. In this case, you can use post validation to see if the array is valid. Then just make sure the array has no duplications. There may be efficiency improvements you can make based upon your details. If you only have '_id' for example you could just use the JS includes function.
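catSchema.post('validate', function(doc) {
    return new Promise((resolve, reject) => {
        for (var i = 0; i < this.kittens.length; i++) {
            let kitten = this.kittens[i];
            for (var p = 0; p < this.kittens.length; p++) {
                if (p == i) {
                    continue; // don't compare a kitten with itself
                }
                // compare as strings: ObjectIds are objects, so == would compare references
                if (String(kitten._id) == String(this.kittens[p]._id)) {
                    return reject(new Error('Duplicate Kitten Ids not allowed'));
                }
            }
        }
        return resolve();
    });
});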
I like to use promises in validation because it's easier to specify errors.