grouping mongo documents using elements of array field - mongodb

I have the 3 documents below. Each represents a contact for a user:
{
"_id" : ObjectId("57f9f9f3b91d070315273d0d"),
"profileId" : "test",
"displayName" : "duplicateTest",
"email" : [
{
"emailId" : "a#a.com"
},
{
"emailId" : "b#b.com"
},
{
"emailId" : "c#c.com"
}
]
}
{
"_id" : ObjectId("57f9fab2b91d070315273d11"),
"profileId" : "test",
"displayName" : "duplicateTest2",
"email" : [
{
"emailId" : "a#a.com"
}
]
}
{
"_id" : ObjectId("57f9fcefb91d070315273d15"),
"profileId" : "test",
"displayName" : "duplicateTest2",
"email" : [
{
"emailId" : "b#b.com"
}
]
}
I need to aggregate/group them by array elements so that I can identify duplicate contacts (based on email id). Since docs 1 & 2 share an email id, and docs 1 & 3 share another, these 3 documents represent one contact and should be merged into one.
I tried doing this using $unwind and $group in Java as below:
List<DBObject> aggList = new ArrayList<DBObject>();
BasicDBObject dbo = new BasicDBObject("$match", new BasicDBObject("profileId", "0fb72dcf-292b-4343-a0e7-1d613a803b1e"));
aggList.add(dbo);
BasicDBObject dboUnwind = new BasicDBObject("$unwind", "$email");
aggList.add(dboUnwind);
BasicDBObject dboGroup = new BasicDBObject("$group",
new BasicDBObject().append("_id", new BasicDBObject("name", "$email.emailId"))
.append("uniqueIds", new BasicDBObject("$addToSet", "$_id"))
.append("count", new BasicDBObject("$sum", 1)));
aggList.add(dboGroup);
BasicDBObject dboCount = new BasicDBObject("$match", new BasicDBObject("count", new BasicDBObject("$gte", 2)));
aggList.add(dboCount);
BasicDBObject dboSort = new BasicDBObject("$sort", new BasicDBObject("count",-1));
aggList.add(dboSort);
BasicDBObject dboLimit = new BasicDBObject("$limit", 10);
aggList.add(dboLimit);
AggregationOutput output = collection.aggregate(aggList);
System.out.println(output.results());
This groups docs by email id (and rightly so) but doesn't serve the purpose.
Any help would be highly appreciated.
I need to implement a feature where the user is prompted about possible duplicate contacts in their repository. I need the aggregation result to be something like:
[
{
"_id":{
"name":[
{
"emailId" : "a#a.com"
},
{
"emailId" : "b#b.com"
},
{
"emailId" : "c#c.com"
}
]
},
"uniqueIds":[
{
"$oid":"57f9fcefb91d070315273d15"
},
{
"$oid":"57f9fcefb91d070315273d11"
},
{
"$oid":"57f9f9f3b91d070315273d0d"
}
],
"count":3
}
]
So basically, I need the _ids of all possible duplicate contacts (there could be other groups of duplicates, each with a list of _ids like the one above) so that I can show them to the user, who can then merge them at will.
Hope it's clearer now. Thanks!

Well, your question differs a bit from the result you are seeking. Your initial question pointed me to the following aggregation:
db.table.aggregate(
    [
        {
            $unwind: "$email"
        },
        {
            $group: {
                _id : "$email.emailId",
                duplicates : { $addToSet : "$_id" }
            }
        }
    ]
);
This results in:
{
"_id" : "c#c.com",
"duplicates" : [
ObjectId("57f9f9f3b91d070315273d0d")
]
}
{
"_id" : "b#b.com",
"duplicates" : [
ObjectId("57f9fcefb91d070315273d15"),
ObjectId("57f9f9f3b91d070315273d0d")
]
}
{
"_id" : "a#a.com",
"duplicates" : [
ObjectId("57f9fab2b91d070315273d11"),
ObjectId("57f9f9f3b91d070315273d0d")
]
}
That is, the contacts grouped by email.
But the sample output you added to your question led me to this aggregation instead:
db.table.aggregate(
    [
        {
            $unwind: "$email"
        },
        {
            $group: {
                _id : "$profileId",
                emails : { $addToSet : "$email.emailId" },
                duplicates : { $addToSet : "$_id" }
            }
        }
    ]
);
Which results in:
{
"_id" : "test",
"emails" : [
"c#c.com",
"b#b.com",
"a#a.com"
],
"duplicates" : [
ObjectId("57f9fcefb91d070315273d15"),
ObjectId("57f9fab2b91d070315273d11"),
ObjectId("57f9f9f3b91d070315273d0d")
]
}
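Note that neither pipeline follows chains of shared addresses: the first groups per email, and the second collects every contact of a profile into one group. If the goal is groups of contacts connected through any shared email, one option is to post-process the per-email groups in application code. Below is a minimal sketch in plain JavaScript, assuming the input is the output of the first aggregation with the ObjectIds shortened to strings for readability; mergeDuplicateGroups is a hypothetical helper name, not part of MongoDB.

```javascript
// Hypothetical helper: merges per-email groups into connected groups of contacts.
// Input shape: [{ _id: <email>, duplicates: [<contact id>, ...] }, ...]
function mergeDuplicateGroups(emailGroups) {
  const parent = new Map();

  // Union-find "find" with path compression.
  function find(x) {
    if (!parent.has(x)) parent.set(x, x);
    const p = parent.get(x);
    if (p === x) return x;
    const root = find(p);
    parent.set(x, root);
    return root;
  }

  // Contacts that share an email end up in the same set.
  for (const group of emailGroups) {
    if (group.duplicates.length === 0) continue;
    const [first, ...rest] = group.duplicates;
    find(first);
    for (const id of rest) parent.set(find(id), find(first));
  }

  // Collect the members of each set under its root.
  const byRoot = new Map();
  for (const id of parent.keys()) {
    const root = find(id);
    if (!byRoot.has(root)) byRoot.set(root, []);
    byRoot.get(root).push(id);
  }

  // Only sets with at least two contacts are possible duplicates.
  return [...byRoot.values()]
    .filter(ids => ids.length >= 2)
    .map(ids => ids.sort());
}

// Shortened ids standing in for the three ObjectIds above (assumption).
const emailGroups = [
  { _id: "c#c.com", duplicates: ["0d"] },
  { _id: "b#b.com", duplicates: ["15", "0d"] },
  { _id: "a#a.com", duplicates: ["11", "0d"] },
];
const merged = mergeDuplicateGroups(emailGroups);
console.log(JSON.stringify(merged)); // prints [["0d","11","15"]]
```

Each inner array is one candidate group to show to the user. Contacts that share no email with any other contact are left out.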

Related

I am new to mongoDB need a query to delete the collections

I have two collections.
1. Equipment
db.getCollection("Equipment").find({
$and: [
{ $where: 'this._id.length <= 7' },
{ "model": "A505"}
]})
{
"_id" : "1234567",
"locationId" : "DATALOAD",
"model" : "A505",
"subscriberId" : "",
"status" : "Stock",
"headendNumber" : "4"
}
{
"_id" : "P13050I",
"locationId" : "1423110302801",
"model" : "A505",
"subscriberId" : "37",
"status" : "Stock",
"headendNumber" : "4"
}
This query returns more than 100 documents (rows) from the Equipment collection.
2. Subscriber
db.getCollection('Subscriber').find({})
{
"_id" : "5622351",
"equipment" : [
"0018015094E6",
"1234567",
"ADFB70878422",
"M10610TCB052",
"MA1113FHQ151"
]
}
{
"_id" : "490001508063",
"equipment" : [
"17616644510288",
"P13050I",
"M91416EA4251",
"128552270280560"
]
}
In the Subscriber collection, I need to remove from the equipment field only the entries that match an _id returned by the Equipment query above (i.e. collect all the ids from the Equipment collection and loop over them).
For example, from the above result, I need to remove only "1234567" and "P13050I".
Expected output:
db.getCollection('Subscriber').find({})
{
"_id" : "5622351",
"equipment" : [
"0018015094E6",
"ADFB70878422",
"M10610TCB052",
"MA1113FHQ151"
]
}
{
"_id" : "490001508063",
"equipment" : [
"17616644510288",
"M91416EA4251",
"128552270280560"
]
}
Please help me, anyone.
You can use the following to update records.
First, find the records that need to be deleted and store their ids in an array:
var equipments = [];
db.getCollection("Equipment").find({ $and: [
    { $where: 'this._id.length <= 7' },
    { "model": "A505" }
]}).forEach(function (item) {
    equipments.push(item._id);
});
Now iterate over the records of the second collection and update them where required:
db.getCollection('Subscriber').find({}).forEach(function (document) {
    // Keep only the ids that are not in the equipments array.
    var filtered = document.equipment.filter(id => equipments.indexOf(id) < 0);
    // Write back only when something was actually removed.
    if (filtered.length < document.equipment.length) {
        db.getCollection('Subscriber').update({ "_id": document._id }, { $set: { 'equipment': filtered } });
    }
});
.filter(id => equipments.indexOf(id) < 0) keeps the entries that are not present in the equipments array populated earlier, and the document is persisted only if something changed.
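As an alternative to rewriting each array on the client, the server can remove the matching entries itself with the $pull operator combined with $in. Below is a minimal sketch, assuming a shell or driver that provides updateMany; buildPullUpdate is a hypothetical helper that only builds the filter and update documents and does not talk to the database.

```javascript
// Hypothetical helper: builds the arguments for an updateMany call that
// removes every id in `ids` from the `equipment` array.
function buildPullUpdate(ids) {
  return {
    // Only touch subscribers that reference at least one of the ids.
    filter: { equipment: { $in: ids } },
    // $pull removes all array elements that match the condition.
    update: { $pull: { equipment: { $in: ids } } },
  };
}

const op = buildPullUpdate(["1234567", "P13050I"]);
console.log(JSON.stringify(op.update));

// In the mongo shell this would be used roughly as:
// db.getCollection("Subscriber").updateMany(op.filter, op.update);
```

In practice the ids would come from the equipments array collected in the first step rather than being hard-coded.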

How to update Meteor array element inside a document

I have a Meteor Mongo document as shown below
{
"_id" : "zFndWBZTvZPgSKXHP",
"activityId" : "aRDABihAYFoAW7jbC",
"activityTitle" : "Test Mongo Document",
"users" : [
{
"id" : "b1#gmail.com",
"type" : "free"
},
{
"id" : "JqKvymryNaCjjKrAR",
"type" : "free"
},
],
}
I want to replace the email stored in a specific array element's id with a custom generated id, using a Meteor query.
For instance, I want to update the document like this:
if 'users.id' == "b1#gmail.com" then update it to users.id = 'SomeIDXXX'
So the updated document should look like this:
{
"_id" : "zFndWBZTvZPgSKXHP",
"activityId" : "aRDABihAYFoAW7jbC",
"activityTitle" : "Test Mongo Document",
"users" : [
{
"id" : "SomeIDXXX",
"type" : "free"
},
{
"id" : "JqKvymryNaCjjKrAR",
"type" : "free"
},
],
}
I have tried the query below, but it didn't work.
Divisions.update(
{ activityId: activityId, "users.id": emailId },
{ $set: { "users": { id: _id } } }
);
Can someone help me with the relevant Meteor query ? Thanks !
Your query is almost right. The missing piece is the positional $ operator, which refers to the index of the array element matched by the query, so that only that element is updated.
Divisions.update({
"activityId": "aRDABihAYFoAW7jbC",
"users.id": "b1#gmail.com"
}, {
$set: {"users.$.id": "b2#gmail.com"}
})
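To make the effect of the positional operator concrete, here is a plain-JavaScript illustration of what "users.$.id" does to a single document: it changes only the first array element matched by the query. setFirstMatchingUserId is a hypothetical helper for illustration only and does not talk to MongoDB.

```javascript
// Hypothetical in-memory equivalent of
//   { $set: { "users.$.id": newId } } combined with the query { "users.id": oldId }
function setFirstMatchingUserId(doc, oldId, newId) {
  const index = doc.users.findIndex(user => user.id === oldId);
  if (index === -1) return doc; // the query would not match this document
  const users = doc.users.map((user, i) =>
    i === index ? { ...user, id: newId } : user
  );
  return { ...doc, users };
}

const before = {
  activityId: "aRDABihAYFoAW7jbC",
  users: [
    { id: "b1#gmail.com", type: "free" },
    { id: "JqKvymryNaCjjKrAR", type: "free" },
  ],
};
const after = setFirstMatchingUserId(before, "b1#gmail.com", "SomeIDXXX");
console.log(after.users.map(user => user.id));
```

By contrast, the attempt in the question sets the whole users field to a single object, which discards the array.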
You might need the arrayFilters option.
Divisions.update(
{ activityId: activityId },
{ $set: { "users.$[elem].id": "SomeIDXXX" } },
{ arrayFilters: [ { "elem.id": "b1#gmail.com" } ], multi: true }
);
https://docs.mongodb.com/manual/reference/operator/update/positional-filtered/
You need to use the $push operator instead of $set.
{ $push: { <field1>: <value1>, ... } }

Spring data MongoDb query based on last element of nested array field

I have the following data (Cars):
[
    {
        "make" : "Ferrari",
        "model" : "F40",
        "services" : [
            {
                "type" : "FULL",
                "date_time" : ISODate("2019-10-31T09:00:00.000Z"),
            },
            {
                "type" : "FULL",
                "scheduled_date_time" : ISODate("2019-11-04T09:00:00.000Z"),
            }
        ],
    },
    {
        "make" : "BMW",
        "model" : "M3",
        "services" : [
            {
                "type" : "FULL",
                "scheduled_date_time" : ISODate("2019-10-31T09:00:00.000Z"),
            },
            {
                "type" : "FULL",
                "scheduled_date_time" : ISODate("2019-11-04T09:00:00.000Z"),
            }
        ],
    }
]
Using Spring Data MongoDB, I would like a query that retrieves all the Cars where the scheduled_date_time of the last item in the services array falls within a certain date range.
A query which I used previously when using the first item in the services array is like:
mongoTemplate.find(Query.query(
where("services.0.scheduled_date_time").gte(fromDate)
.andOperator(
where("services.0.scheduled_date_time").lt(toDate))),
Car.class);
Note the 0 index since it's first one as opposed to the last one (for my current requirement).
I thought using an aggregate along with a projection and .arrayElementAt(-1) would do the trick but I haven't quite got it to work. My current effort is:
Aggregation agg = newAggregation(
project().and("services").arrayElementAt(-1).as("currentService"),
match(where("currentService.scheduled_date_time").gte(fromDate)
.andOperator(where("currentService.scheduled_date_time").lt(toDate)))
);
AggregationResults<Car> results = mongoTemplate.aggregate(agg, Car.class, Car.class);
return results.getMappedResults();
Any help or suggestions appreciated.
Thanks,
This mongo aggregation retrieves all the Cars where the scheduled_date_time of the last item in the services array is in-between a specific date range.
[{
$addFields: {
last: {
$arrayElemAt: [
'$services',
-1
]
}
}
}, {
$match: {
'last.scheduled_date_time': {
$gte: ISODate('2019-10-26T04:06:27.307Z'),
$lt: ISODate('2019-12-15T04:06:27.319Z')
}
}
}]
I was trying to write it in spring-data-mongodb without luck.
They do not support $addFields yet, see here.
Since version 2.2.0.RELEASE, spring-data-mongodb includes Aggregation Repository Methods.
With those, the above query should be:
interface CarRepository extends MongoRepository<Car, String> {
    @Aggregation(pipeline = {
        "{ $addFields : { last:{ $arrayElemAt: [$services,-1] }} }",
        "{ $match: { 'last.scheduled_date_time' : { $gte : '$?0', $lt: '$?1' } } }"
    })
    List<Car> getCarsWithLastServiceDateBetween(LocalDateTime start, LocalDateTime end);
}
This method logs this query
[{ "$addFields" : { "last" : { "$arrayElemAt" : ["$services", -1]}}}, { "$match" : { "last.scheduled_date_time" : { "$gte" : "$2019-11-03T03:00:00Z", "$lt" : "$2019-11-05T03:00:00Z"}}}]
The date parameters are not parsing correctly. I didn't spend much time making it work.
If you want the Car Ids this could work.
public List<String> getCarsIdWithServicesDateBetween(LocalDateTime start, LocalDateTime end) {
return template.aggregate(newAggregation(
unwind("services"),
group("id").last("services.date").as("date"),
match(where("date").gte(start).lt(end))
), Car.class, Car.class)
.getMappedResults().stream()
.map(Car::getId)
.collect(Collectors.toList());
}
Query Log
[{ "$unwind" : "$services"}, { "$group" : { "_id" : "$_id", "date" : { "$last" : "$services.scheduled_date_time"}}}, { "$match" : { "date" : { "$gte" : { "$date" : 1572750000000}, "$lt" : { "$date" : 1572922800000}}}}]
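The intended semantics of the $addFields + $match pipeline (take the last element of services, then compare its date against a half-open range) can be checked against sample data in plain JavaScript. Below is a minimal sketch; carsWithLastServiceBetween is a hypothetical helper and the sample data is adapted from the question. It also uses Date values for the bounds, whereas the first logged query above passes strings such as "$2019-11-03T03:00:00Z".

```javascript
// Hypothetical in-memory equivalent of the $addFields + $match pipeline.
function carsWithLastServiceBetween(cars, from, to) {
  return cars.filter(car => {
    const services = car.services || [];
    // Same idea as { $arrayElemAt: ["$services", -1] }.
    const last = services[services.length - 1];
    if (!last || !(last.scheduled_date_time instanceof Date)) return false;
    // Half-open range: from <= date < to, like $gte / $lt.
    return last.scheduled_date_time >= from && last.scheduled_date_time < to;
  });
}

// Sample data adapted from the question (assumption: BMW has one service only).
const cars = [
  {
    make: "Ferrari",
    model: "F40",
    services: [
      { type: "FULL", scheduled_date_time: new Date("2019-10-31T09:00:00.000Z") },
      { type: "FULL", scheduled_date_time: new Date("2019-11-04T09:00:00.000Z") },
    ],
  },
  {
    make: "BMW",
    model: "M3",
    services: [
      { type: "FULL", scheduled_date_time: new Date("2019-10-31T09:00:00.000Z") },
    ],
  },
];

const hits = carsWithLastServiceBetween(
  cars,
  new Date("2019-11-03T00:00:00.000Z"),
  new Date("2019-11-05T00:00:00.000Z")
);
console.log(hits.map(car => car.model));
```

Only the F40 matches here, because the last service of the M3 in this sample falls before the start of the range.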

How to get last array element while Projection mongodb

I have the following document structure (these are dummy documents, for illustration only):
{
"id" : "p1245",
"Info" : [
{
"cloth_name" : "ABC",
"cloth_type" : "C"
},
{
"cloth_name" : "PQR",
"cloth_type" : "J"
},
{
"cloth_name" : "SAM",
"cloth_type" : "T"
}
]
},
{
"id" : "p124576",
"Info" : [
{
"cloth_name" : "HTC",
"cloth_type" : "C"
}
]
}
From these documents I want to project the "cloth_type", so I tried the following Java code:
DBObject fields = new BasicDBObject("id", 1);
fields.put("ClothType","$Info.cloth_type");
DBObject project = new BasicDBObject("$project", fields);
List<DBObject> pipeline = Arrays.asList(project);
AggregationOptions aggregationOptions = AggregationOptions.builder().batchSize(100).outputMode(AggregationOptions.OutputMode.CURSOR).allowDiskUse(true).build();
Cursor cursor = collection.aggregate(pipeline, aggregationOptions);
while (cursor.hasNext())
{
System.out.println(cursor.next());
}
(I don't want to use "$unwind" here)
and got the following output:
{ "id" : "p1245" , "ClothType" : [ "C" , "J" , "T"]}
{ "id" : "p124576" , "ClothType" : [ "C"]}
If there are multiple "cloth_type" values for a single id, I want only the last cloth_type from this array.
For example, if the "ClothType" array is [ "C", "J", "T"], I want to project only [ "T"], i.e. the last element of the array.
Is there any way to achieve this without using "$unwind"?
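One way to do this without $unwind, assuming MongoDB 3.2 or later, is the $slice aggregation operator with a count of -1, which keeps only the last element of the projected array ($arrayElemAt with index -1 would return the bare value instead of a one-element array). Below is a minimal sketch: the $project stage is shown as a plain object, and lastClothType is a hypothetical in-memory equivalent used only to illustrate the result.

```javascript
// $project stage that could be passed to aggregate().
const projectStage = {
  $project: {
    id: 1,
    // "$Info.cloth_type" resolves to the array of all cloth_type values;
    // $slice with -1 keeps only the last one.
    ClothType: { $slice: ["$Info.cloth_type", -1] },
  },
};

// Hypothetical in-memory equivalent, for illustration only.
function lastClothType(doc) {
  return {
    id: doc.id,
    ClothType: doc.Info.map(item => item.cloth_type).slice(-1),
  };
}

const sample = {
  id: "p1245",
  Info: [
    { cloth_name: "ABC", cloth_type: "C" },
    { cloth_name: "PQR", cloth_type: "J" },
    { cloth_name: "SAM", cloth_type: "T" },
  ],
};
console.log(JSON.stringify(lastClothType(sample)));
// prints {"id":"p1245","ClothType":["T"]}
```

In the Java code from the question, the same expression should translate to a nested BasicDBObject with the key "$slice" and a two-element list as its value, put under "ClothType" in place of the plain "$Info.cloth_type" string.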

Updating an array of objects with a new key in mongoDB

Similar to this question
Borrowing its data set, I have something similar to this:
{
'user_id':'{1231mjnD-32JIjn-3213}',
'name':'John',
'campaigns':
[
{
'campaign_id':3221,
'start_date':'12-01-2012',
},
{
'campaign_id':3222,
'start_date':'13-01-2012',
}
]
}
And I want to add a new key in the campaigns like so:
{
'user_id':'{1231mjnD-32JIjn-3213}',
'name':'John',
'campaigns':
[
{
'campaign_id':3221,
'start_date':'12-01-2012',
'worker_id': '00000'
},
{
'campaign_id':3222,
'start_date':'13-01-2012',
'worker_id': '00000'
}
]
}
How to insert/update a new key into an array of objects?
I want to add a new key into every object inside the array with a default value of 00000.
I have tried:
db.test.update({}, {$set: {'campaigns.worker_id': 00000}}, true, true)
db.test.update({}, {$set: {campaigns: {'worker_id': 00000}}}, true, true)
Any suggestions?
I'm assuming that this operation will occur only once, so you can use a script to handle it:
db.test.find().forEach(function (document) {
    // Add the default worker_id to every campaign of this document.
    (document.campaigns || []).forEach(function (campaign) {
        campaign.worker_id = '00000';
    });
    // Persist the modified document.
    db.test.save(document);
});
The script iterates over all documents in your collection, then over all campaigns in each document, setting the worker_id property.
At the end, each document is persisted.
db.test.update({}, {$set: {'campaigns.0.worker_id': 00000}}, true, true)
This will update only the element at index 0.
If you want to add a new key to every object inside the array, you can use $unwind to flatten the array first.
Example:
{
title : "this is my title" ,
author : "bob" ,
posted : new Date() ,
pageViews : 5 ,
tags : [ "fun" , "good" , "fun" ] ,
comments : [
{ author :"joe" , text : "this is cool" } ,
{ author :"sam" , text : "this is bad" }
],
other : { foo : 5 }
}
Unwinding tags:
db.article.aggregate(
{ $project : {
author : 1 ,
title : 1 ,
tags : 1
}},
{ $unwind : "$tags" }
);
result:
{
"result" : [
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "good"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
}
],
"OK" : 1
}
After that, you can write a simple update query.
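On MongoDB 3.6 or later, an alternative that avoids both the script and $unwind is the all-positional operator $[], which applies an update to every element of an array. Below is a minimal sketch: the update document uses that operator, while addWorkerId is a hypothetical in-memory equivalent that only illustrates the resulting shape.

```javascript
// Update document using the all-positional operator (MongoDB 3.6+).
// In the shell it would be used roughly as: db.test.updateMany({}, update)
const update = { $set: { "campaigns.$[].worker_id": "00000" } };

// Hypothetical in-memory equivalent, for illustration only.
function addWorkerId(doc, workerId) {
  return {
    ...doc,
    campaigns: doc.campaigns.map(campaign => ({ ...campaign, worker_id: workerId })),
  };
}

const user = {
  user_id: "{1231mjnD-32JIjn-3213}",
  name: "John",
  campaigns: [
    { campaign_id: 3221, start_date: "12-01-2012" },
    { campaign_id: 3222, start_date: "13-01-2012" },
  ],
};
const updated = addWorkerId(user, "00000");
console.log(updated.campaigns.map(campaign => campaign.worker_id));
```

The value is written as the string "00000" on purpose: the unquoted 00000 in the attempts above is a number and would be stored as 0.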