Mongoose remove duplicate social security customer - mongodb

I have customers with duplicate SocialSecurity values, and I want to remove the duplicates and keep the newest customer. I am doing this by comparing _id and keeping the one with the largest value. Unfortunately, when I test with dummy data, my code does not always delete the ones with the smaller _id. Any idea why? I thought the $sort would take care of that.
let hc = db.getSiblingDB('customer');
hc.customers.aggregate([
  {
    "$group": {
      _id: {socialsecurity: "$socialsecurity"},
      imeis: { $addToSet: "$_id" },
      count: { $sum : 1 }
    }
  },
  {
    "$match": {
      count: { "$gt": 1 }
    }
  },
  {
    $sort: {_id: 1}
  }
]).forEach(function(doc) {
  doc.socialsecurity.shift();
  hc.customers.remove({
    _id: {$in: doc.socialsecurity}
  });
})

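Problem
{ $sort: { _id: 1 } } comes after $group, so it sorts the grouped documents by their group key (the socialsecurity value), not the ids inside each imeis array.
$addToSet also makes no guarantee about the order of the elements in the array it builds, so the first element is not reliably the smallest or the largest _id, which would explain why the behaviour is inconsistent.
Note too that the ids are collected in doc.imeis; doc has no socialsecurity field (the key is at doc._id.socialsecurity), so the forEach should work on doc.imeis.
Solution
Sort before grouping and preserve that order: make { $sort: { _id: -1 } } the first stage (descending, newest first) and replace imeis: { $addToSet: "$_id" } with imeis: { $push: "$_id" }. doc.imeis.shift(); then removes the first (largest, newest) _id from the array, so that customer is kept, and remove({ _id: { $in: doc.imeis } }) deletes the rest (deleteMany() does the same and is preferred, since remove() is deprecated in mongosh).
OR
make { $sort: { _id: 1 } } the first stage (ascending) and change doc.imeis.shift(); to doc.imeis.pop(); to remove the last (largest) element from the array instead.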
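To see why the order of the stages matters, here is a plain-JavaScript sketch (no database involved) that mimics sorting by _id descending before grouping, collecting the ids with push, and shifting off the newest id. It assumes each customer has a comparable _id where a larger value means a newer record, as the question does with ObjectIds; the function name idsToDelete and the sample data are hypothetical.

```javascript
// Simulates: $sort { _id: -1 } -> $group with $push -> $match count > 1 -> shift()
function idsToDelete(customers) {
  // { $sort: { _id: -1 } }: newest first, BEFORE grouping
  const sorted = [...customers].sort((a, b) =>
    a._id < b._id ? 1 : a._id > b._id ? -1 : 0
  );

  // { $group: { _id: "$socialsecurity", imeis: { $push: "$_id" } } }
  // push keeps the sorted order, unlike $addToSet.
  const groups = new Map();
  for (const c of sorted) {
    if (!groups.has(c.socialsecurity)) groups.set(c.socialsecurity, []);
    groups.get(c.socialsecurity).push(c._id);
  }

  // { $match: { count: { $gt: 1 } } }, then drop the newest id (the one kept)
  const toDelete = [];
  for (const ids of groups.values()) {
    if (ids.length > 1) {
      ids.shift();
      toDelete.push(...ids);
    }
  }
  return toDelete;
}

// Hypothetical dummy data
const customers = [
  { _id: 3, socialsecurity: "111" },
  { _id: 1, socialsecurity: "111" },
  { _id: 7, socialsecurity: "111" },
  { _id: 2, socialsecurity: "222" },
  { _id: 5, socialsecurity: "333" },
  { _id: 4, socialsecurity: "333" },
];
console.log(idsToDelete(customers)); // [ 3, 1, 4 ]
```

Customers 7 and 5 (the largest _id per duplicate group) survive, and the unique customer 2 is never touched.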

Related

How to sort a dictionary keys and pick the first in MongoDb?

I'm running the following query as described in the docs.
db.getCollection('things')
  .find(
    { _id: UUID("...") },
    { _id: 0, history: 1 }
  )
It produces a single element that, when unfolded in the GUI, shows the dictionary history. When I unfold that, I see its contents: a bunch of keys and their values.
Now I'd like to sort the keys alphabetically and pick the first n. Please note that history is stored as a dictionary (an embedded object), not an array. It would also be great if I could flatten the structure and promote history to be the root of the returned document.
I understand it's about projection and slicing. However, I'm not getting anywhere despite many attempts: I get either syntax errors or the full list of elements. Being rather new to this, I fear I need a few pointers on how to diagnose my issue to begin with.
Based on the comments, I tried aggregate and $sort. Regrettably, I only seem to be sorting the current output (a single document, because of the match condition). I want to access the elements inside history.
db.getCollection('things')
  .aggregate([
    { $match: { _id: UUID("...") } },
    { $sort: { history: 1 } }
  ])
I sense that I should use a projection to pull out the list of elements under history, but I'm not having any success with the following:
db.getCollection('things')
  .aggregate([
    { $match: { _id: UUID("...") } },
    { $project: { history: 1, _id: 0 } }
  ])
Sorting an object's properties alphabetically takes several stages:
$objectToArray converts the history object to an array in key-value (k/v) format
$unwind deconstructs the generated array
$sort by the history key in ascending order (1 = ascending, -1 = descending)
$group by _id and reconstruct the history key-value array
$slice takes the first n properties of the dictionary; I used 1
$arrayToObject converts the key-value array back to an object
db.getCollection('things').aggregate([
  { $match: { _id: UUID("...") } },
  { $project: { history: { $objectToArray: "$history" } } },
  { $unwind: "$history" },
  { $sort: { "history.k": 1 } },
  {
    $group: {
      _id: "$_id",
      history: { $push: "$history" }
    }
  },
  {
    $project: {
      history: {
        $arrayToObject: { $slice: ["$history", 1] }
      }
    }
  }
])
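As a rough in-memory illustration of what the pipeline does to one history object, the sketch below maps each stage onto plain JavaScript. It assumes history is a plain object with string keys and that n is the number of keys to keep; firstKeys is a hypothetical helper. JavaScript's string comparison is by UTF-16 code units, which matches MongoDB's default binary ordering for plain ASCII keys but not under a custom collation.

```javascript
// In-memory analogue of $objectToArray -> $sort on k -> $slice -> $arrayToObject
function firstKeys(history, n) {
  return Object.fromEntries(                          // $arrayToObject
    Object.entries(history)                           // $objectToArray: [[k, v], ...]
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)) // $sort: { "history.k": 1 }
      .slice(0, n)                                    // $slice: ["$history", n]
  );
}

// Hypothetical sample data
const history = { c: 3, a: 1, b: 2 };
console.log(firstKeys(history, 2)); // { a: 1, b: 2 }
```

One caveat of the JavaScript side only: object keys that look like integers are always enumerated in numeric order, so the result's key order is only meaningful for non-numeric keys.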
There is another option, but MongoDB does not guarantee that it will produce this exact result:
$objectToArray converts the history object to an array in key-value format
$setUnion returns the unique elements of an array; in practice it returns them sorted by key in ascending order, but MongoDB documents the order of the output as unspecified, so this is not guaranteed
$slice takes the first n properties of the dictionary; I used 1
$arrayToObject converts the key-value array back to an object
db.getCollection('things').aggregate([
  { $match: { _id: UUID("...") } },
  {
    $project: {
      history: {
        $arrayToObject: {
          $slice: [
            { $setUnion: { $objectToArray: "$history" } },
            1
          ]
        }
      }
    }
  }
])

Count nested wildcard array mongodb query

I have the following data of users and car models:
[
  {
    "user_id":"ebebc012-082c-4e7f-889c-755d2679bdab",
    "car_1a58db0b-5449-4d2b-a773-ee055a1ab24d":1,
    "car_37c04124-cb12-436c-902b-6120f4c51782":0,
    "car_b78ddcd0-1136-4f45-8599-3ce8d937911f":1
  },
  {
    "user_id":"f3eb2a61-5416-46ba-bab4-459fbdcc7e29",
    "car_1a58db0b-5449-4d2b-a773-ee055a1ab24d":1,
    "car_0d15eae9-9585-4f49-a416-46ff56cd3685":1
  }
]
Using MongoDB, I want to see how many users have each car_ field set to the value 1. For this example, the result would be something like:
{"car_1a58db0b-5449-4d2b-a773-ee055a1ab24d": 2}
The issue is that I never know in advance what the car_ field names will be; they follow a random (wildcard) pattern.
Notes:
car_id and user_id are at the same level.
The car_id is not given; I simply want to know, across the entire database, which car_ fields most commonly have the value 1.
$group by _id and convert the root object to an array using $objectToArray
$unwind deconstructs the root array
$match keeps only the entries where root.v is 1
$group by root.k and get the total count
db.collection.aggregate([
  {
    $group: {
      _id: "$_id",
      root: { $first: { $objectToArray: "$$ROOT" } }
    }
  },
  { $unwind: "$root" },
  { $match: { "root.v": 1 } },
  {
    $group: {
      _id: "$root.k",
      count: { $sum: 1 }
    }
  }
])
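The same counting logic can be sketched in plain JavaScript over the sample users. Two things here are assumptions beyond the pipeline above: the k.startsWith("car_") check (which keeps any non-car field whose value happens to be 1 out of the counts) and the final sort by count descending, which corresponds to appending { $sort: { count: -1 } } so the most common cars come first. countCars is a hypothetical name.

```javascript
// Counts, across all documents, how many have each car_ field equal to 1.
function countCars(docs) {
  const counts = {};
  for (const doc of docs) {
    for (const [k, v] of Object.entries(doc)) {   // $objectToArray + $unwind
      if (k.startsWith("car_") && v === 1) {      // $match: { "root.v": 1 } (+ assumed prefix check)
        counts[k] = (counts[k] || 0) + 1;         // $group by k with $sum: 1
      }
    }
  }
  // Most common first, like { $sort: { count: -1 } }
  return Object.entries(counts)
    .map(([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count);
}

// Sample data from the question
const users = [
  {
    user_id: "ebebc012-082c-4e7f-889c-755d2679bdab",
    "car_1a58db0b-5449-4d2b-a773-ee055a1ab24d": 1,
    "car_37c04124-cb12-436c-902b-6120f4c51782": 0,
    "car_b78ddcd0-1136-4f45-8599-3ce8d937911f": 1,
  },
  {
    user_id: "f3eb2a61-5416-46ba-bab4-459fbdcc7e29",
    "car_1a58db0b-5449-4d2b-a773-ee055a1ab24d": 1,
    "car_0d15eae9-9585-4f49-a416-46ff56cd3685": 1,
  },
];
console.log(countCars(users));
```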

MongoDB aggregation: How to get the index of a document in a collection depending sorted by a document property

Assume I have a collection with millions of documents. Below is a sample of what the documents look like:
[
  { _id:"1a1", points:[2,3,5,6] },
  { _id:"1a2", points:[2,6] },
  { _id:"1a3", points:[3,5,6] },
  { _id:"1b1", points:[1,5,6] },
  { _id:"1c1", points:[5,6] },
  // ... more documents
]
I want to query a document by _id and return a document like the one below:
{
  _id:"1a1",
  totalPoints: 16,
  rank: 29
}
I know I can query the whole collection, sort it in descending order, then find the index of the document I want by _id and add one to get its rank. But I have concerns about this method.
If there are millions of documents, won't this be overdoing it, querying a whole collection just to get one document? Is there a way to achieve this without querying the whole collection, or does the whole collection have to be involved because of the ranking?
I cannot store the documents ranked because the points keep changing. The actual code is more complex, but the takeaway is that I cannot store the ranks.
Total points is the sum of the values in the points array. The rank is calculated by sorting all documents by total points in descending order: the first document is rank 1, and so on.
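An aggregation pipeline like the following can produce the result you want, but how it performs on a collection of millions of documents remains to be seen. Replace '1a3' (it appears twice) with the _id you are looking up. Note that the $group stages collect every document into a single document, which may exceed MongoDB's 16 MB document size limit on a very large collection.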
db.collection.aggregate(
  [
    {
      $group: {
        _id: null,
        docs: {
          $push: { _id: '$_id', totalPoints: { $sum: '$points' } }
        }
      }
    },
    {
      $unwind: '$docs'
    },
    {
      $replaceWith: '$docs'
    },
    {
      $sort: { totalPoints: -1 }
    },
    {
      $group: {
        _id: null,
        docs: { $push: '$$ROOT' }
      }
    },
    {
      $set: {
        docs: {
          $map: {
            input: {
              $filter: {
                input: '$docs',
                as: 'x',
                cond: { $eq: ['$$x._id', '1a3'] }
              }
            },
            as: 'xx',
            in: {
              _id: '$$xx._id',
              totalPoints: '$$xx.totalPoints',
              rank: {
                $add: [{ $indexOfArray: ['$docs._id', '1a3'] }, 1]
              }
            }
          }
        }
      }
    },
    {
      $unwind: '$docs'
    },
    {
      $replaceWith: '$docs'
    }
  ])
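A different technique, swapped in here rather than taken from the answer above, avoids building one giant array: a document's rank is 1 plus the number of documents with a strictly higher total. The plain-JavaScript sketch below shows the arithmetic on the question's sample data; rankOf is a hypothetical helper. In MongoDB terms it corresponds to fetching the target document and counting documents with a higher total; because totalPoints is computed, that count still scans the collection unless the total were stored in an indexed field. On MongoDB 5.0+, the $setWindowFields stage with the $rank operator is another option worth checking.

```javascript
// Count-based rank: rank = 1 + number of documents with more total points.
// Ties share a rank (1, 2, 2, 4, ...), unlike the $indexOfArray approach,
// which orders tied documents arbitrarily.
const sum = (xs) => xs.reduce((acc, x) => acc + x, 0);

function rankOf(docs, id) {
  const target = docs.find((d) => d._id === id);
  if (!target) return null;
  const totalPoints = sum(target.points);
  const higher = docs.filter((d) => sum(d.points) > totalPoints).length;
  return { _id: id, totalPoints, rank: higher + 1 };
}

// Sample data from the question
const docs = [
  { _id: "1a1", points: [2, 3, 5, 6] },
  { _id: "1a2", points: [2, 6] },
  { _id: "1a3", points: [3, 5, 6] },
  { _id: "1b1", points: [1, 5, 6] },
  { _id: "1c1", points: [5, 6] },
];
console.log(rankOf(docs, "1a3")); // { _id: '1a3', totalPoints: 14, rank: 2 }
```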

Mongo Query to return common values in array

I need a Mongo query that returns the values of an array field that are common to all matched documents.
So if 4 documents match, a value is returned only if it is present in all 4 documents.
Suppose I have the following documents in my db:
{
  "id":"0",
  "merchants":["1","2"]
}
{
  "id":"1",
  "merchants":["1","2","4"]
}
{
  "id":"2",
  "merchants":["4","5"]
}
Input: a list of ids
(i) Input with ids "0" and "1"
Then it should return merchants: ["1","2"], as both are present in the documents with id "0" and id "1"
(ii) Input with ids "1" and "2"
Then it should return merchants: ["4"], as it is the only merchant present in both documents with id "1" and id "2"
(iii) Input with ids "0" and "2"
It should return an empty merchants: [], as these 2 documents have no merchants in common
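You can try the aggregation below. Note that it only works for exactly two ids, since it compares the merchants of the $first and $last matched documents.
db.collection.aggregate([
  { $match: { id: { $in: ["1", "2"] } } },
  { $group: { _id: null, first: { $first: "$merchants" }, second: { $last: "$merchants" } } },
  { $project: { commonToBoth: { $setIntersection: ["$first", "$second"] }, _id: 0 } }
])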
Say you have a function query that runs the required DB query for you, and you call it with idsToMatch, an array containing all the ids you want to match. I have used JS as the driver language here; replace it with whatever you are using.
The following code is dynamic and works for any number of input ids:
const query = (idsToMatch) => {
  return db.collectionName.aggregate([
    { $match: { id: { $in: idsToMatch } } },
    { $unwind: "$merchants" },
    // One group per (document, merchant) pair removes in-document repeats
    { $group: { _id: { id: "$id", data: "$merchants" } } },
    // Count how many matched documents contain each merchant
    { $group: { _id: "$_id.data", count: { $sum: 1 } } },
    { $match: { count: { $gte: idsToMatch.length } } },
    { $group: { _id: 0, result: { $push: "$_id" } } },
    { $project: { _id: 0, result: "$result" } }
  ]);
};
The first $group stage makes sure there are no repeated merchants within a single document. If you are certain that your individual documents never contain a repeated merchants value, you can leave it out.
The real work happens up to the second $match stage. The last two stages ($group and $project) only prettify the result; you may choose not to use them and instead transform the result into the form you want in your language of choice.
If you drop those stages as described above, the code reduces to:
db.collectionName.aggregate([
  { $match: { id: { $in: idsToMatch } } },
  { $unwind: "$merchants" },
  { $group: { _id: "$merchants", count: { $sum: 1 } } },
  { $match: { count: { $gte: idsToMatch.length } } }
])
Your required values will be at the _id attribute of each element of the result array.
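The counting idea can be checked without a database: a merchant is common to all matched documents if it appears in exactly idsToMatch.length of them, once per-document repeats are removed. This plain-JavaScript sketch assumes idsToMatch has no duplicate ids; commonMerchants is a hypothetical helper, and the data is the question's sample.

```javascript
// A merchant is common if it appears in every matched document.
function commonMerchants(docs, idsToMatch) {
  const counts = new Map();
  for (const doc of docs) {
    if (!idsToMatch.includes(doc.id)) continue;  // $match: { id: { $in: idsToMatch } }
    for (const m of new Set(doc.merchants)) {    // $unwind, deduplicated per document
      counts.set(m, (counts.get(m) || 0) + 1);   // $group with $sum: 1
    }
  }
  return [...counts]
    .filter(([, count]) => count === idsToMatch.length) // $match on count
    .map(([merchant]) => merchant);
}

// Sample data from the question
const merchantDocs = [
  { id: "0", merchants: ["1", "2"] },
  { id: "1", merchants: ["1", "2", "4"] },
  { id: "2", merchants: ["4", "5"] },
];
console.log(commonMerchants(merchantDocs, ["1", "2"])); // [ '4' ]
```

If one of the requested ids does not exist, no merchant can reach the required count, so the result is empty.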
The answer provided by @jgr0 is correct to some extent. The only mistake is in the intermediate $match stage.
(i) So if the input ids are "1" and "0", the query becomes
db.collection.aggregate([
  {"$match":{"id":{"$in":["1","0"]}}},
  {"$unwind":"$merchants"},
  {"$group":{"_id":"$merchants","count":{"$sum":1}}},
  {"$match":{"count":{"$eq":2}}},
  {"$group":{"_id":null,"merchants":{"$push":"$_id"}}},
  {"$project":{"_id":0,"merchants":1}}
])
(ii) If the input ids are "1", "0" and "2", the query becomes
db.collection.aggregate([
  {"$match":{"id":{"$in":["1","0", "2"]}}},
  {"$unwind":"$merchants"},
  {"$group":{"_id":"$merchants","count":{"$sum":1}}},
  {"$match":{"count":{"$eq":3}}},
  {"$group":{"_id":null,"merchants":{"$push":"$_id"}}},
  {"$project":{"_id":0,"merchants":1}}
])
The count in the intermediate $match stage should equal the number of input ids: 2 in case (i) and 3 in case (ii).

mongodb aggregation framework group + project

I have the following issue:
This query returns one result, which is what I want:
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } } }])
{
  "result" : [
    {
      "_id" : "b91e51e9-6317-4030-a9a6-e7f71d0f2161",
      "version" : 1.2000000000000002
    }
  ],
  "ok" : 1
}
This query (I just added a projection so that I can later query for the entire document) returns multiple results. What am I doing wrong?
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } }, $project: { _id : 1 } }])
{
  "result" : [
    {
      "_id" : ObjectId("5139310a3899d457ee000003")
    },
    {
      "_id" : ObjectId("513931053899d457ee000002")
    },
    {
      "_id" : ObjectId("513930fd3899d457ee000001")
    }
  ],
  "ok" : 1
}
Found the answer:
1. First, get all the _ids:
db.items.aggregate([
  { '$match': { 'owner.id': '9e748c81-0f71-4eda-a710-576314ef3fa' } },
  { '$group': { _id: '$item.id', dbid: { $max: "$_id" } } }
]);
2. Then query the documents:
db.items.find({ _id: { '$in': "IDs returned from aggregate" } });
which will look like this:
db.items.find({ _id: { '$in': [ '1', '2', '3' ] } });
(I know it's late, but I'm still answering so that other people don't have to search for the right answer elsewhere.)
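The goal in this thread is, for each id, the whole document with the highest version. The plain-JavaScript sketch below shows that selection in one pass; latestById and the sample items are hypothetical. Within MongoDB, a commonly used single-pipeline pattern for the same shape is { $sort: { version: -1 } } followed by { $group: { _id: "$id", doc: { $first: "$$ROOT" } } }, which avoids the second find() round trip; treat it as an alternative to verify against your data, not as the answer's method.

```javascript
// Keeps, for each `id`, the whole document with the highest `version`.
function latestById(items) {
  const best = new Map();
  for (const item of items) {
    const current = best.get(item.id);
    // Replace the stored document only when this one has a higher version
    if (!current || item.version > current.version) best.set(item.id, item);
  }
  return [...best.values()];
}

// Hypothetical sample data
const items = [
  { id: "a", version: 1, payload: "first" },
  { id: "a", version: 1.2, payload: "second" },
  { id: "a", version: 1.1, payload: "third" },
  { id: "b", version: 2, payload: "only" },
];
console.log(latestById(items));
```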
See Deka's answer; it will do the job.
Not all accumulators are available in the $project stage. We need to consider what we can do in $project with respect to accumulators, and what we can do in $group. Let's take a look at this:
db.companies.aggregate([{
  $match: {
    funding_rounds: {
      $ne: []
    }
  }
}, {
  $unwind: "$funding_rounds"
}, {
  $sort: {
    "funding_rounds.funded_year": 1,
    "funding_rounds.funded_month": 1,
    "funding_rounds.funded_day": 1
  }
}, {
  $group: {
    _id: {
      company: "$name"
    },
    funding: {
      $push: {
        amount: "$funding_rounds.raised_amount",
        year: "$funding_rounds.funded_year"
      }
    }
  }
}]).pretty()
Here the $match stage keeps only companies whose funding_rounds array is not empty. The array is then unwound, so the $sort and later stages see one document for each element of the funding_rounds array of every company. The first thing we do is $sort based on:
funding_rounds.funded_year
funding_rounds.funded_month
funding_rounds.funded_day
In the $group stage, which groups by company name, the array is built using $push. $push is specified as the value of a field we name in the $group stage, and we can push any valid expression. Here we push documents built from raised_amount and funded_year, and each pushed document is added to the end of the array being accumulated. The output of the $group stage is a stream of documents, each with an _id that specifies the company name.
Notice that $push is available in $group stages but not in the $project stage. This is because $group stages are designed to take a sequence of documents and accumulate values based on that stream.
$project, on the other hand, works with one document at a time. So we can calculate an average over an array within an individual document in a $project stage. But accumulating across documents, seeing them one at a time and pushing a new value for each document that passes through, is not something $project is designed to do. For that type of operation we want $group.
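The difference can be pictured with ordinary array operations: a $project-like step is a function applied to one document at a time (like Array.map), while a $group-like step keeps state across documents. The sketch below uses hypothetical, already unwound and sorted funding rounds; it only illustrates the idea and is not how MongoDB executes the stages.

```javascript
// Hypothetical input: one document per funding round, already sorted by date
const rounds = [
  { name: "Acme", funding_rounds: { raised_amount: 100, funded_year: 2005 } },
  { name: "Acme", funding_rounds: { raised_amount: 250, funded_year: 2007 } },
  { name: "Beta", funding_rounds: { raised_amount: 50, funded_year: 2006 } },
];

// $project-like: each output depends only on its own input document
const projected = rounds.map((d) => ({
  company: d.name,
  amount: d.funding_rounds.raised_amount,
}));

// $group-like with $push: state is carried across documents, per company
const grouped = new Map();
for (const d of rounds) {
  if (!grouped.has(d.name)) grouped.set(d.name, []);
  grouped.get(d.name).push({
    amount: d.funding_rounds.raised_amount,
    year: d.funding_rounds.funded_year,
  });
}

console.log(projected.length, grouped.get("Acme"));
```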
Let's take a look at another example:
db.companies.aggregate([{
  $match: {
    funding_rounds: {
      $exists: true,
      $ne: []
    }
  }
}, {
  $unwind: "$funding_rounds"
}, {
  $sort: {
    "funding_rounds.funded_year": 1,
    "funding_rounds.funded_month": 1,
    "funding_rounds.funded_day": 1
  }
}, {
  $group: {
    _id: {
      company: "$name"
    },
    first_round: {
      $first: "$funding_rounds"
    },
    last_round: {
      $last: "$funding_rounds"
    },
    num_rounds: {
      $sum: 1
    },
    total_raised: {
      $sum: "$funding_rounds.raised_amount"
    }
  }
}, {
  $project: {
    _id: 0,
    company: "$_id.company",
    first_round: {
      amount: "$first_round.raised_amount",
      article: "$first_round.source_url",
      year: "$first_round.funded_year"
    },
    last_round: {
      amount: "$last_round.raised_amount",
      article: "$last_round.source_url",
      year: "$last_round.funded_year"
    },
    num_rounds: 1,
    total_raised: 1
  }
}, {
  $sort: {
    total_raised: -1
  }
}]).pretty()
In the $group stage, we use the $first and $last accumulators. As with $push, we can't use $first and $last as accumulators in $project stages, because $project is not designed to accumulate values across multiple documents; it reshapes documents one at a time. (Since MongoDB 4.4, $first and $last also exist as array operators usable in $project, but they return the first or last element of an array within a single document, which is a different operation.) The total number of rounds is calculated with $sum: adding 1 for every document grouped under a given _id value counts those documents. The $project stage may look complex, but it only makes the output pretty: it reshapes first_round and last_round and passes num_rounds and total_raised through unchanged from the $group output.
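To make the accumulator semantics concrete, here is a plain-JavaScript sketch of what this $group stage computes per company on hypothetical, already-sorted rounds, followed by the final sort on total_raised. It mirrors one documented detail of $sum, namely that non-numeric values such as null are ignored, by counting them as 0; summarize and the sample data are hypothetical.

```javascript
// $sum skips non-numeric values such as null, so treat them as 0 here.
const amount = (r) => (typeof r.raised_amount === "number" ? r.raised_amount : 0);

function summarize(sortedRounds) {
  const byCompany = new Map();
  for (const { name, funding_rounds: r } of sortedRounds) {
    const acc = byCompany.get(name);
    if (!acc) {
      // First document seen for this company: $first captures it
      byCompany.set(name, {
        company: name,
        first_round: r,
        last_round: r,
        num_rounds: 1,
        total_raised: amount(r),
      });
    } else {
      acc.last_round = r;            // $last: latest document seen so far
      acc.num_rounds += 1;           // $sum: 1
      acc.total_raised += amount(r); // $sum: "$funding_rounds.raised_amount"
    }
  }
  // Final stage: { $sort: { total_raised: -1 } }
  return [...byCompany.values()].sort((a, b) => b.total_raised - a.total_raised);
}

// Hypothetical rounds, already unwound and sorted by date
const sortedRounds = [
  { name: "Acme", funding_rounds: { raised_amount: 100, funded_year: 2005 } },
  { name: "Beta", funding_rounds: { raised_amount: 50, funded_year: 2006 } },
  { name: "Acme", funding_rounds: { raised_amount: 250, funded_year: 2007 } },
  { name: "Beta", funding_rounds: { raised_amount: null, funded_year: 2008 } },
];
console.log(summarize(sortedRounds));
```

The result only reflects chronological first and last rounds because the input was sorted before grouping, exactly as the $sort stage does in the pipeline.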