MongoDB: count and remove duplicate values

I have a large MongoDB collection with a lot of duplicate inserts, like this:
{ "_id" : 1, "val" : "222222", "val2" : "37"}
{ "_id" : 2, "val" : "222222", "val2" : "37" }
{ "_id" : 3, "val" : "222222", "val2" : "37" }
{ "_id" : 4, "val" : "333333", "val2" : "66" }
{ "_id" : 5, "val" : "111111", "val2" : "22" }
{ "_id" : 6, "val" : "111111", "val2" : "22" }
{ "_id" : 7, "val" : "111111", "val2" : "22" }
{ "_id" : 8, "val" : "111111", "val2" : "22" }
I want to count the duplicates of each entry and keep only one unique entry in the DB, with the count stored on it, like this:
{ "_id" : 1, "val" : "222222", "val2" : "37", "count" : "3"}
{ "_id" : 2, "val" : "333333", "val2" : "66", "count" : "1"}
{ "_id" : 3, "val" : "111111", "val2" : "22", "count" : "4" }
I already looked at MapReduce and the aggregation framework, but they never output the full document and only do one calculation for the whole collection.
It would be good to save the new data to a new collection.

If you use MongoDB 2.6 or later, here is an example with the aggregation framework:
db.duplicate.aggregate([
  {$group: {_id: {val: "$val", val2: "$val2"}, count: {$sum: 1}}},
  {$project: {_id: 0, val: "$_id.val", val2: "$_id.val2", count: 1}},
  {$out: "deduplicate"}
])
$group by val and val2, counting the documents in each group
$project to copy the grouped values out of _id and hide the _id field
$out to write the result to a new collection (here named deduplicate)
Hope it fits your case.
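To make the grouping step concrete, here is a minimal plain-JavaScript sketch (runnable in Node.js, no MongoDB server involved) of what a $group on the val/val2 pair with count: {$sum: 1} produces for the sample data. The helper name countDuplicates is hypothetical; it only mirrors the result of the pipeline, not how MongoDB executes it.

```javascript
// In-memory mirror of {$group: {_id: {val: "$val", val2: "$val2"}, count: {$sum: 1}}}.
// countDuplicates is a hypothetical helper, not a MongoDB API.
const docs = [
  { _id: 1, val: "222222", val2: "37" },
  { _id: 2, val: "222222", val2: "37" },
  { _id: 3, val: "222222", val2: "37" },
  { _id: 4, val: "333333", val2: "66" },
  { _id: 5, val: "111111", val2: "22" },
  { _id: 6, val: "111111", val2: "22" },
  { _id: 7, val: "111111", val2: "22" },
  { _id: 8, val: "111111", val2: "22" },
];

function countDuplicates(docs) {
  // A Map keeps first-seen order; MongoDB's $group makes no ordering guarantee.
  const groups = new Map();
  for (const d of docs) {
    // Compound key, like _id: {val: "$val", val2: "$val2"}
    const key = JSON.stringify([d.val, d.val2]);
    if (!groups.has(key)) {
      groups.set(key, { val: d.val, val2: d.val2, count: 0 });
    }
    groups.get(key).count += 1; // like count: {$sum: 1}
  }
  return [...groups.values()];
}

console.log(countDuplicates(docs));
```

Each output entry corresponds to one document that $out would write, minus the new _id MongoDB assigns.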

It might be easier with an incremental map-reduce:
mapper = function() {
  emit({'val1': this.val, 'val2': this.val2}, {'count': 1});
}
reducer = function(k, v) {
  // declare locals with var so they don't leak into the global scope
  var counter = 0;
  for (var i = 0; i < v.length; i++) {
    counter += v[i].count;
  }
  return {'count': counter};
}
Then in the shell you'll need to run:
db.bigcollection.mapReduce(mapper, reducer, {out: {reduce: 'reducedcollection'}})
This should produce a new collection called reducedcollection. Each emitted key (the val1/val2 pair) becomes a document's _id, and the count is stored under value. Note the use of two values as the key. If you want to find a specific entry you can do:
db.reducedcollection.findOne({'_id.val1': '333333', '_id.val2': '66'})
The interesting thing is that you can now drop the old collection, and as new data comes in, map-reduce it on top of reducedcollection: the reduce output mode re-reduces the new counts with the existing ones, so the counts are incremented.
Might be handy?
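The incremental trick relies on the reduce function being safe to re-apply to its own output. Below is a plain-JavaScript simulation of that behaviour, assuming Node.js; mapReduceInto and the Map standing in for reducedcollection are hypothetical stand-ins, not MongoDB APIs. It groups emitted values by key and, like out: {reduce: ...}, re-reduces each new result with the stored one.

```javascript
// Simulation of map-reduce with out: {reduce: ...} in plain JavaScript.
// mapReduceInto and the Map used as "reducedcollection" are hypothetical stand-ins.

function mapper(doc, emit) {
  // Compound key and a count of 1, same shape as the shell mapper
  emit({ val1: doc.val, val2: doc.val2 }, { count: 1 });
}

function reducer(key, values) {
  let counter = 0;
  for (const v of values) counter += v.count;
  return { count: counter };
}

function mapReduceInto(docs, output) {
  // Group emitted values by (stringified) key
  const emitted = new Map();
  for (const doc of docs) {
    mapper(doc, (key, value) => {
      const k = JSON.stringify(key);
      if (!emitted.has(k)) emitted.set(k, { key, values: [] });
      emitted.get(k).values.push(value);
    });
  }
  for (const [k, { key, values }] of emitted) {
    const fresh = reducer(key, values);
    const existing = output.get(k);
    // out: {reduce: ...} re-reduces the new result with the stored one
    const value = existing ? reducer(key, [existing.value, fresh]) : fresh;
    output.set(k, { _id: key, value });
  }
  return output;
}

const reducedcollection = new Map();
// First batch
mapReduceInto(
  [{ val: "111111", val2: "22" }, { val: "111111", val2: "22" }],
  reducedcollection
);
// A later batch containing only the new data
mapReduceInto(
  [{ val: "111111", val2: "22" }, { val: "333333", val2: "66" }],
  reducedcollection
);
console.log([...reducedcollection.values()]);
```

Because the reducer just sums count fields, reducing {count: 2} with a fresh {count: 1} gives the same total as reducing all three original values at once.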

Related

How to query element in array of arrays in MongoDB?

I'm learning MongoDB on my own. I have a collection with entries that look like this:
{
"_id" : ObjectId("5d0c13fbfdca455311248d6f"),
"borough" : "Brooklyn",
"grades" :
[
{ "date" : ISODate("2014-04-16T00:00:00Z"), "grade" : "A", "score" : 5 },
{ "date" : ISODate("2013-04-23T00:00:00Z"), "grade" : "B", "score" : 2 },
{ "date" : ISODate("2012-04-24T00:00:00Z"), "grade" : "A", "score" : 5 }
],
"name" : "C & C Catering Service",
"restaurant_id" : "40357437"
}
And I want to find all restaurants in Brooklyn with at least one grades.grade of A.
I've figured out the first half of the puzzle:
db.restaurants.find({borough:{$eq:"Brooklyn"}})
But how do I query in the "grades" array for grade A?
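Use dot notation (.) to access and query fields of nested objects; on an array such as grades, "grades.grade": "A" matches if at least one element has grade "A". Either of these works: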
db.restaurants.find({'borough':{$eq:"Brooklyn"}, 'grades.grade': 'A'})
db.restaurants.find({"borough" : "Brooklyn","grades.grade":"A"})
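The "at least one element" rule has a subtlety once a second condition on the same array is added. Below is a minimal plain-JavaScript sketch of the matching semantics; the document and helper names are hypothetical and only mimic how the server evaluates these filters, with no driver involved.

```javascript
// Hypothetical document shaped like the restaurants example (values are illustrative).
const doc = {
  borough: "Brooklyn",
  grades: [
    { grade: "A", score: 5 },
    { grade: "B", score: 12 },
  ],
};

// {borough: "Brooklyn", "grades.grade": "A"}: true if ANY element has grade "A".
function hasGradeA(d) {
  return d.borough === "Brooklyn" && d.grades.some((g) => g.grade === "A");
}

// {"grades.grade": "A", "grades.score": {$gt: 10}}: each condition may be
// satisfied by a DIFFERENT element of the array.
function separateConditions(d) {
  return d.grades.some((g) => g.grade === "A") && d.grades.some((g) => g.score > 10);
}

// {grades: {$elemMatch: {grade: "A", score: {$gt: 10}}}}: both conditions
// must hold for the SAME element.
function elemMatch(d) {
  return d.grades.some((g) => g.grade === "A" && g.score > 10);
}

console.log(hasGradeA(doc), separateConditions(doc), elemMatch(doc)); // true true false
```

For the question as asked (a single condition on grade), plain dot notation is enough; $elemMatch only matters when several conditions must apply to one element.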

MongoDB addToSet in nested array

I'm struggling to insert data inside a nested array in MongoDB.
My schema looks like this:
{
"_id" : ObjectId("5c0c55642440311ff0353846"),
"name" : "Test",
"email" : "test#gmail.com",
"username" : "test",
"password" : "$2a$10$RftzGtgM.DqIiaSvH4LqOO6RnLgQfLY3nk7UIAH4OAvvxo0ZMSaHu",
"created" : ISODate("2018-12-08T23:36:04.464Z"),
"classes" : [
{
"_id" : ObjectId("5c0c556e2440311ff0353847"),
"cName" : "1A",
"student" : [
{
"grades" : [ ],
"_id" : ObjectId("5c0c55812440311ff0353848"),
"name" : "StudentName",
"lname" : "StudenteLastName",
"gender" : "M"
}
]
}
],
"__v" : 0
}
What I want to do is insert a grade for the student into the "grades" array.
Expected result is:
{
"_id" : ObjectId("5c0c55642440311ff0353846"),
"name" : "Test",
"email" : "test#gmail.com",
"username" : "test",
"password" : "$2a$10$RftzGtgM.DqIiaSvH4LqOO6RnLgQfLY3nk7UIAH4OAvvxo0ZMSaHu",
"created" : ISODate("2018-12-08T23:36:04.464Z"),
"classes" : [
{
"_id" : ObjectId("5c0c556e2440311ff0353847"),
"cName" : "1A",
"student" : [
{
"grades" : [6],
"_id" : ObjectId("5c0c55812440311ff0353848"),
"name" : "StudentName",
"lname" : "StudenteLastName",
"gender" : "M"
}
]
}
],
"__v" : 0
}
I tried several queries, and searched a lot, but none of them worked:
db.teachers.update({"_id": ObjectId("5c0c55642440311ff0353846"), "classes._id": ObjectId("5c0c556e2440311ff0353847"), "classes.student._id": ObjectId("5c0c55812440311ff0353848")},{$addToSet: {"classes.$.student.grades":6}})
Basically, the first set of curly brackets finds the student (if I run db.teachers.find() with those three conditions, the result is correct), and then I add the value 6 to the grades array (of integers). But at this point I get errors; I think I'm making a mistake in the "adding" part.
I also need to do the same thing in Mongoose.
Any help is appreciated, thanks in advance!
Edit: I solved it. I'm posting my solution in the hope it'll be useful to others.
To push into a triple-nested array, do:
db.teachers.update({"_id":ObjectId("5c0c59985ae5981c58937e12"),"classes":{ $elemMatch : { _id : ObjectId("5c0c59a35ae5981c58937e13") }},"classes.student": { $elemMatch : { _id : ObjectId("5c0c59aa5ae5981c58937e14")} }},{$addToSet:{"classes.$.student.0.grades":3}})
https://docs.mongodb.com/manual/tutorial/query-array-of-documents/
Try using $elemMatch
"classes":{ $elemMatch : { _id : ObjectId("5c0c556e2440311ff0353847") }},
"classes.student": { $elemMatch : { _id : ObjectId("5c0c55812440311ff0353848")} }
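Note that student.0 in the posted solution always targets the first student of the matched class, whichever student the filter found. On MongoDB 3.6 or later, the filtered positional operator can select the student by _id instead, e.g. db.teachers.updateOne({_id: ObjectId("5c0c55642440311ff0353846")}, {$addToSet: {"classes.$[c].student.$[s].grades": 6}}, {arrayFilters: [{"c._id": ObjectId("5c0c556e2440311ff0353847")}, {"s._id": ObjectId("5c0c55812440311ff0353848")}]}). As a hedged illustration of what that update does, here is a plain-JavaScript sketch on an in-memory copy of the document; ids are plain strings instead of ObjectIds, addGrade is a hypothetical helper, and a hypothetical second student is placed first so that index 0 would be the wrong target.

```javascript
// Hypothetical in-memory model of the update
//   {$addToSet: {"classes.$[c].student.$[s].grades": grade}}
// with arrayFilters [{"c._id": classId}, {"s._id": studentId}].
// Ids are plain strings here, not ObjectIds.
function addGrade(teacher, classId, studentId, grade) {
  for (const cls of teacher.classes) {
    if (cls._id !== classId) continue; // array filter "c"
    for (const student of cls.student) {
      if (student._id !== studentId) continue; // array filter "s"
      // $addToSet appends only when the value is not already present
      // (this sketch compares primitives only; MongoDB also compares documents)
      if (!student.grades.includes(grade)) student.grades.push(grade);
    }
  }
  return teacher;
}

const teacher = {
  _id: "5c0c55642440311ff0353846",
  classes: [
    {
      _id: "5c0c556e2440311ff0353847",
      cName: "1A",
      student: [
        { _id: "other-student", name: "OtherStudent", grades: [] }, // hypothetical
        { _id: "5c0c55812440311ff0353848", name: "StudentName", grades: [] },
      ],
    },
  ],
};

addGrade(teacher, "5c0c556e2440311ff0353847", "5c0c55812440311ff0353848", 6);
// Running it again changes nothing, because 6 is already in the set
addGrade(teacher, "5c0c556e2440311ff0353847", "5c0c55812440311ff0353848", 6);
console.log(JSON.stringify(teacher.classes[0].student.map((s) => s.grades))); // [[],[6]]
```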

Getting an array of objects with limit and offset doesn't work in MongoDB

First, let me say that I am new to MongoDB. I am trying to get data from a collection.
Here is the document in my student collection:
{
"_id" : ObjectId("5979e0473f00003717a9bd62"),
"id" : "l_7c0e37b9-132e-4054-adbf-649dbc29f43d",
"name" : "Raj",
"class" : "10th",
"assignments" : [
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc571",
"name" : "1"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc572",
"name" : "2"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc573",
"name" : "3"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc574",
"name" : "4"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc575",
"name" : "5"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc576",
"name" : "6"
}
]
}
The output I require is:
{
"assignments" : [
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc571",
"name" : "1"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc572",
"name" : "2"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc573",
"name" : "3"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc574",
"name" : "4"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc575",
"name" : "5"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc576",
"name" : "6"
}
]
}
To get this response I used the following query:
db.getCollection('student').find({},{"assignments":1})
Now I am trying to apply a limit and offset to the assignments list. I tried $slice: [0, 3], but it gives me the whole document with the sliced array, not the assignments alone. How can I combine these two to get only the assignments, with a limit and offset?
You'll need to use aggregate rather than find, because aggregate lets you project just the sliced array.
Given the document from your question, the following command ...
db.getCollection('student').aggregate([
// project on assignments and apply a slice to the projection
{$project: {assignments: {$slice: ['$assignments', 2, 5]}}}
])
... returns:
{
"_id" : ObjectId("5979e0473f00003717a9bd62"),
"assignments" : [
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc573",
"name" : "3"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc574",
"name" : "4"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc575",
"name" : "5"
},
{
"id" : "v_539f65c2-9f45-4d92-b05e-973cf08cc576",
"name" : "6"
}
]
}
This returns the assignments array (and only the assignments array), starting at index 2 and containing at most 5 elements; since the array has 6 elements, the last 4 are returned. Change the slice arguments (2 and 5 in the example above) to apply your own offset and limit: the first number is the offset (how many elements to skip) and the second is the limit (the maximum number of elements to return).
If you want to add a match condition (to address specific documents) to the above then you'd do something like this:
db.getCollection('student').aggregate([
// match a specific document
{$match: {"_id": ObjectId("5979e0473f00003717a9bd62")}},
// project on assignments and apply a slice to the projection
{$project: {assignments: {$slice: ['$assignments', 2, 5]}}}
])
More details on the $match stage are in the MongoDB aggregation pipeline documentation.
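To see how offset and limit map onto the three-argument aggregation form {$slice: [<array>, <position>, <n>]}, here is a minimal plain-JavaScript sketch of its semantics. mongoSlice is a hypothetical helper, and the assignments are reduced to their name field for brevity.

```javascript
// Sketch of the aggregation operator {$slice: [<array>, <position>, <n>]}:
// skip `position` elements (counted from the end if negative), return at most `n`.
// mongoSlice is a hypothetical helper, not a MongoDB API.
function mongoSlice(arr, position, n) {
  if (!(n > 0)) throw new RangeError("n must be a positive number");
  // A negative position counts from the end; past the start it clamps to 0.
  const start = position < 0 ? Math.max(arr.length + position, 0) : position;
  // A start beyond the array length yields an empty array, as $slice does.
  return arr.slice(start, start + n);
}

const assignments = ["1", "2", "3", "4", "5", "6"].map((name) => ({ name }));

// offset 2, limit 5: only 4 elements remain after skipping 2
console.log(JSON.stringify(mongoSlice(assignments, 2, 5).map((a) => a.name))); // ["3","4","5","6"]

// offset 0, limit 3: the first page
console.log(JSON.stringify(mongoSlice(assignments, 0, 3).map((a) => a.name))); // ["1","2","3"]
```

For page-based pagination, the offset is page * pageSize and the limit is pageSize (page and pageSize being values your application supplies).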

Mongo Aggregation / Grouping Query

I have records of this form:
{
"_id" : ObjectId("57993e64498e9bebb535154f"),
"fooKey" : "123|a|b|c|||d",
"locationId" : 1,
"type" : "FOO"
}
{
"_id" : ObjectId("579e0a3d498e9bebb545ff96"),
"fooKey" : "123|x|y|z|||v",
"locationId" : 1,
"type" : "FOO"
}
{
"_id" : ObjectId("57a5443b498e381a40a26afb"),
"fooKey" : "123|a|b|c|||d",
"locationId" : 2,
"type" : "FOO"
}
{
"_id" : ObjectId("57a63fef498e381a40a60347"),
"fooKey" : "123|x|y|z|||v",
"locationId" : 2,
"type" : "FOO"
}
{
"_id" : ObjectId("579ab3ce498e9538125052ca"),
"fooKey" : "456|h|j|j|||k",
"locationId" : 2,
"type" : "BAR"
}
I went through the documentation, and this seems like it could be complex given that I need it today (and I am not an expert in Mongo). What I need is an aggregation query (for only records with "type" : "FOO") that returns groups grouped by:
The first field in the pipe-delimited string in the "fooKey" field (for example "123")
The locationId
and then only the groups where the resulting count of records (with "type" specifically equal to "FOO") is greater than 1.
That is, given the records above, I need records 1 and 2 returned as one group and records 3 and 4 as another, along with the size of each group.
Expected Output
Something like this:
{
"foo": "123",
"locationId": 1,
"type": "FOO",
"total": 2
},
{
"foo": "123",
"locationId": 2,
"type": "FOO",
"total": 2
}
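One way to express these requirements, sketched under the assumption of MongoDB 3.4 or later (for $split): match type "FOO", group on the first pipe-delimited field of fooKey plus locationId, count with {$sum: 1}, then keep the groups whose total is greater than 1. The pipeline is written as a plain JavaScript value that could be passed to db.collection.aggregate(pipeline) in the shell (the collection name is a placeholder). It is followed by a hypothetical in-memory helper, groupFoo, that mirrors the same steps so the expected output can be checked in Node.js without a server.

```javascript
// Aggregation pipeline as a plain JavaScript value (assumes MongoDB 3.4+ for $split).
const pipeline = [
  // Only records with type "FOO"
  { $match: { type: "FOO" } },
  // Group on the first pipe-delimited field of fooKey plus locationId, and count
  {
    $group: {
      _id: {
        foo: { $arrayElemAt: [{ $split: ["$fooKey", "|"] }, 0] },
        locationId: "$locationId",
        type: "$type",
      },
      total: { $sum: 1 },
    },
  },
  // Keep only groups with more than one record
  { $match: { total: { $gt: 1 } } },
  // Flatten the group key into top-level fields
  {
    $project: {
      _id: 0,
      foo: "$_id.foo",
      locationId: "$_id.locationId",
      type: "$_id.type",
      total: 1,
    },
  },
];

// Hypothetical in-memory mirror of the same steps, for checking without a server.
function groupFoo(records) {
  const groups = new Map();
  for (const r of records) {
    if (r.type !== "FOO") continue; // first $match
    const foo = r.fooKey.split("|")[0]; // $split + $arrayElemAt 0
    const key = JSON.stringify([foo, r.locationId, r.type]);
    if (!groups.has(key)) {
      groups.set(key, { foo, locationId: r.locationId, type: r.type, total: 0 });
    }
    groups.get(key).total += 1; // $sum: 1
  }
  return [...groups.values()].filter((g) => g.total > 1); // second $match
}

// The five sample records from the question (ObjectIds omitted)
const records = [
  { fooKey: "123|a|b|c|||d", locationId: 1, type: "FOO" },
  { fooKey: "123|x|y|z|||v", locationId: 1, type: "FOO" },
  { fooKey: "123|a|b|c|||d", locationId: 2, type: "FOO" },
  { fooKey: "123|x|y|z|||v", locationId: 2, type: "FOO" },
  { fooKey: "456|h|j|j|||k", locationId: 2, type: "BAR" },
];

console.log(JSON.stringify(pipeline));
console.log(groupFoo(records));
```

Including type in the group key is redundant after the first $match, but it lets $project emit the type field without a literal.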

Saving the result of a MongoDB query

When doing research in the mongo shell I often write quite complex queries and want the result to be stored in another collection. I know how to do it with .forEach():
db.documents.find(query).forEach(function(d){db.results.insert(d)})
But it's kind of tedious to write that stuff each time. Is there a cleaner way? I'd like the syntax to be something like db.documents.find(query).dumpTo('collectionName').
Here's a solution I'll use: db.results.insert(db.docs.find(...).toArray())
There is still too much noise, though.
UPD: Another option is to rewrite the find as an aggregation pipeline; then you can use the $out stage.
It looks like you are running your queries from the mongo shell, which lets you write code in JavaScript. You can assign the result of a query to a variable:
result = db.mycollection.findOne(my_query)
And save the result to another collection:
db.result.save(result)
You might have to remove the _id of the result if you want to append it to the result collection: save() replaces any existing document with the same _id, and insert() would fail with a duplicate key error.
Edit:
db.mycollection.findOne({'_id':db.mycollection.findOne()['_id']})
db.foo.save(db.bar.findOne(...))
If you want to save an array, you can write a JavaScript function. Something like the following should work (I haven't tested it):
function save_array(arr) {
  for (var i = 0; i < arr.length; i++) {
    db.result.save(arr[i])
  }
}
...
// find() returns a cursor, so convert it to an array first
result = db.mycollection.find(...).toArray()
save_array(result)
If you want the function to be available every time you start the mongo shell, you can include it in your .mongorc.js file.
As far as I know, there isn't built-in functionality to do this in MongoDB.
Other options are mongoexport/mongoimport or mongodump/mongorestore.
In both mongoexport and mongodump you can filter the results by adding query options using --query <JSON> or -q <JSON>.
If your query uses the aggregation framework, the solution is as simple as using $out.
I created a sample collection named "tester" containing the following records:
{ "_id" : ObjectId("4fb36bfd3d1c88bfa15103b1"), "name" : "bob", "value" : 5, "state" : "b"}
{ "_id" : ObjectId("4fb36c033d1c88bfa15103b2"), "name" : "bob", "value" : 3, "state" : "a"}
{ "_id" : ObjectId("4fb36c063d1c88bfa15103b3"), "name" : "bob", "value" : 7, "state" : "a"}
{ "_id" : ObjectId("4fb36c0c3d1c88bfa1a03b4"), "name" : "john", "value" : 2, "state" : "a"}
{ "_id" : ObjectId("4fb36c103d1c88bfa5103b5"), "name" : "john", "value" : 4, "state" : "b"}
{ "_id" : ObjectId("4fb36c143d1c88bfa15103b"), "name" : "john", "value" : 8, "state" : "b"}
{ "_id" : ObjectId("4fb36c163d1c88bfa15103a"), "name" : "john", "value" : 6, "state" : "a"}
Now, using aggregate, I perform a group-by and then save the result into a new collection with the "$out" stage:
db.tester.aggregate([{$group:{
_id:{name:"$name",state:"$state"},
min:{$min:"$value"},
max:{$max:"$value"},
} },
{$out:"tester_max_min"}
])
Basically, the query groups by name and state, finds the min and max values for each group, and then saves the result into a new collection named "tester_max_min":
db.tester_max_min.find();
The new collection will contain the following documents:
{ "_id" : { "name" : "john", "state" : "b" }, "min" : 4, "max" : 8 }
{ "_id" : { "name" : "john", "state" : "a" }, "min" : 2, "max" : 6 }
{ "_id" : { "name" : "bob", "state" : "a" }, "min" : 3, "max" : 7 }
{ "_id" : { "name" : "bob", "state" : "b" }, "min" : 5, "max" : 5 }
I still need to explore how useful $out is, but it works like a charm with any aggregation.
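For reference, the same min/max grouping can be mirrored in plain JavaScript to check the listed output without a server. minMaxByNameState is a hypothetical helper, and the sample _id values are omitted. Keep in mind that $out has to be the last stage of the pipeline and that it replaces the target collection if it already exists.

```javascript
// In-memory mirror of
//   {$group: {_id: {name: "$name", state: "$state"}, min: {$min: "$value"}, max: {$max: "$value"}}}
// minMaxByNameState is a hypothetical helper, not a MongoDB API.
const tester = [
  { name: "bob", value: 5, state: "b" },
  { name: "bob", value: 3, state: "a" },
  { name: "bob", value: 7, state: "a" },
  { name: "john", value: 2, state: "a" },
  { name: "john", value: 4, state: "b" },
  { name: "john", value: 8, state: "b" },
  { name: "john", value: 6, state: "a" },
];

function minMaxByNameState(docs) {
  const groups = new Map();
  for (const d of docs) {
    const key = JSON.stringify([d.name, d.state]);
    const g = groups.get(key);
    if (!g) {
      // First document of the group initialises both accumulators
      groups.set(key, { _id: { name: d.name, state: d.state }, min: d.value, max: d.value });
    } else {
      g.min = Math.min(g.min, d.value); // $min
      g.max = Math.max(g.max, d.value); // $max
    }
  }
  return [...groups.values()];
}

console.log(minMaxByNameState(tester));
```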