MongoDB Aggregation: Dedupe by array in subdocuments - mongodb

I have an aggregation query that calculates records by tag combination. It works well apart from one issue: it produces duplicate documents for tag combinations that differ only in order. For example, I could have one document with the tags ['one', 'two'] and a second document with ['two', 'one'], and the rest of the data would be exactly the same.
My first thought was to use a $group stage and to look up how to sort the arrays in a $project stage, but I cannot find out how to do this anywhere. I did see that update queries can use '$push' for this, but that feature doesn't seem to exist for $project.
An example document at this stage looks something like this:
_id: "sadasdsad"
tags: ['one', 'two'],
total_count:37,
second_count:14,
What would be the best approach to solving this issue?

You can sort your array using $unwind, $sort and finally $group, so that equal tag sets end up in the same order before grouping. Example: https://mongoplayground.net/p/EZi04LfY1ff
However, I would try to store the tags already sorted, so you can avoid these steps.
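db.collection.aggregate([
  // One document per tag
  { "$unwind": "$tag" },
  // Order the tags within each document (`key` is the per-document field used in this example)
  { "$sort": { "key": 1, "tag": 1 } },
  // Rebuild each document's tag array, now in sorted order
  { "$group": {
    "_id": "$key",
    "tag": { "$push": "$tag" }
  }},
  // Group the documents that share the same sorted tag array
  { "$group": {
    "_id": "$tag",
    "field": { "$push": "$$ROOT" }
  }}
])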
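If you go the "store the tags already sorted" route, the same idea can be sketched in plain JavaScript on the application side. This is only an illustration, not part of the pipeline above: normalizeTags and dedupeByTags are hypothetical helper names, and it assumes the tags are plain strings.

```javascript
// Hypothetical helper: return a sorted copy so that ['two', 'one']
// and ['one', 'two'] end up identical.
function normalizeTags(tags) {
  return [...tags].sort();
}

// Hypothetical helper: keep the first document seen for each tag combination,
// treating tag arrays that differ only in order as the same combination.
function dedupeByTags(docs) {
  const seen = new Map();
  for (const doc of docs) {
    const tags = normalizeTags(doc.tags);
    const key = JSON.stringify(tags); // canonical grouping key
    if (!seen.has(key)) {
      seen.set(key, { ...doc, tags });
    }
  }
  return [...seen.values()];
}

const sampleDocs = [
  { _id: "sadasdsad", tags: ["one", "two"], total_count: 37, second_count: 14 },
  { _id: "other", tags: ["two", "one"], total_count: 37, second_count: 14 },
];
console.log(dedupeByTags(sampleDocs)); // a single document with tags ['one', 'two']
```

Calling normalizeTags before every insert or update would keep the stored arrays in a canonical order, so a plain $group on the tags field is enough later.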

Related

Project nested array element to top level using MongoDB aggregation pipeline

I have a groups collection with documents of the form
{
"_id": "g123",
...,
"invites": [
{
"senderAccountId": "a456",
"recipientAccountId": "a789"
},
...
]
}
I want to be able to list all the invites received by a user.
I thought of using an aggregation pipeline on the groups collection that filters the groups to return only those to which the user has been invited.
db.groups.aggregate([
{
$match: {
"invites.recipientAccountId": "<user-id>"
}
}
])
Lastly I want to project this array of groups to end up with an array of the form
[
{
"senderAccountId": "a...",
"recipientAccountId": "<user-id>",
"groupId": "g...", // Equal to "_id" field of document.
},
...
]
But I'm missing the "project" step in my aggregation pipeline to bring to the top-level the nested senderAccountId and recipientAccountId fields. I have seen examples online of projections in MongoDB queries and aggregation pipelines but I couldn't find examples for projecting the previously matched element of an array field of a document to the top-level.
I've thought of using Array Update Operators to reference the matched element but couldn't get any meaningful progress using this method.
There are multiple ways to do this; a combination of $unwind and $project works as well. $unwind creates one document for each element of the invites array, and $project lets you choose how to structure the result from the current fields.
db.collection.aggregate([
{
"$unwind": "$invites"
},
{
"$match": {
"invites.recipientAccountId": "a789"
}
},
{
"$project": {
recipientAccountId: "$invites.recipientAccountId",
senderAccountId: "$invites.senderAccountId",
groupId: "$_id",
_id: 0 // don't show _id key:value
}
}
])
You can also use nimrod serok's $replaceRoot stage in place of the $project stage:
{$replaceRoot: {newRoot: {$mergeObjects: ["$invites", {groupId: "$_id"}]}}}
nimrod serok's solution might be a bit better because mine unwinds the array first and only then matches, but I believe mine is more readable.
I think what you want is $replaceRoot:
db.collection.aggregate([
{$match: {"invites.recipientAccountId": "a789"}},
{$set: {
invites: {$first: {
$filter: {
input: "$invites",
cond: {$eq: ["$$this.recipientAccountId", "a789"]}
}
}}
}},
{$replaceRoot: {newRoot: {$mergeObjects: ["$invites", {groupId: "$_id"}]}}}
])
See how it works on the playground example
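To make the target shape concrete, here is a rough in-memory JavaScript equivalent of the $unwind / $match / $project approach. It is a sketch only: invitesFor is a hypothetical helper and the sample groups are made up to match the document shape in the question.

```javascript
// Sample data shaped like the question's `groups` documents (values are made up).
const groups = [
  {
    _id: "g123",
    invites: [
      { senderAccountId: "a456", recipientAccountId: "a789" },
      { senderAccountId: "a456", recipientAccountId: "a111" },
    ],
  },
  {
    _id: "g124",
    invites: [{ senderAccountId: "a222", recipientAccountId: "a789" }],
  },
];

// Hypothetical helper mirroring $unwind + $match + $project:
// flatten every invite, keep those for the user, and attach the group id.
function invitesFor(allGroups, userId) {
  return allGroups.flatMap((group) =>
    group.invites
      .filter((invite) => invite.recipientAccountId === userId)
      .map((invite) => ({ ...invite, groupId: group._id }))
  );
}

console.log(invitesFor(groups, "a789"));
```

Note that this keeps every matching invite in a group, like the $unwind version, whereas the $filter + $first version keeps only the first matching invite per group.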

How to group documents of a collection to a map with unique field values as key and count of documents as mapped value in mongodb?

I need a MongoDB query that returns a list or map in which each unique value of a field (f) in the collection is a key, and the mapped value is the count of documents having that value in the field. How can I achieve this?
Example:
Document1: {"id":"1","name":"n1","city":"c1"}
Document2: {"id":"2","name":"n2","city":"c2"}
Document3: {"id":"3","name":"n1","city":"c3"}
Document4: {"id":"4","name":"n1","city":"c5"}
Document5: {"id":"5","name":"n2","city":"c2"}
Document6: {"id":"6","name":"n1","city":"c8"}
Document7: {"id":"7","name":"n3","city":"c9"}
Document8: {"id":"8","name":"n2","city":"c6"}
Query result should be something like this if group by field is "name":
{"n1":"4",
"n2":"3",
"n3":"1"}
It would be nice if the list were also sorted by count in descending order.
It's worth noting, using data points as field names (keys) is somewhat considered an anti-pattern and makes tooling difficult. Nonetheless if you insist on having data points as field names you can use this complicated aggregation to perform the query output you desire...
Aggregation
db.collection.aggregate([
  { $group: { _id: "$name", "count": { "$sum": 1 } } },
  { $sort: { "count": -1 } },
  { $group: { _id: null, "values": { "$push": { "name": "$_id", "count": "$count" } } } },
  { $project: {
    _id: 0,
    results: {
      $arrayToObject: {
        $map: {
          input: "$values",
          as: "pair",
          in: ["$$pair.name", "$$pair.count"]
        }
      }
    }
  }},
  { $replaceRoot: { newRoot: "$results" } }
])
Aggregation Explanation
This is a 5 stage aggregation consisting of the following...
$group - get the count of the data as required by name.
$sort - sort the results with count descending.
$group - place results into an array for the next stage.
$project - use the $arrayToObject and $map to pivot the data such that a data point can be a field name.
$replaceRoot - make results the top level fields.
Sample Results
{ "n1" : 4, "n2" : 3, "n3" : 1 }
For whatever reason, you show desired results having count as a string, but my results show the count as an integer. I assume that is not an issue, and may actually be preferred.
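For comparison, the same count-then-pivot logic can be sketched client-side in plain JavaScript. This is an illustration rather than a replacement for the aggregation; countBy is a hypothetical helper and the data is copied from the question.

```javascript
const people = [
  { id: "1", name: "n1", city: "c1" },
  { id: "2", name: "n2", city: "c2" },
  { id: "3", name: "n1", city: "c3" },
  { id: "4", name: "n1", city: "c5" },
  { id: "5", name: "n2", city: "c2" },
  { id: "6", name: "n1", city: "c8" },
  { id: "7", name: "n3", city: "c9" },
  { id: "8", name: "n2", city: "c6" },
];

// Hypothetical helper: count documents per distinct value of `field`,
// then build an object whose keys are inserted in descending count order.
function countBy(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  const sorted = [...counts.entries()].sort((a, b) => b[1] - a[1]);
  return Object.fromEntries(sorted);
}

console.log(countBy(people, "name")); // { n1: 4, n2: 3, n3: 1 }
```

One caveat with the object form: JavaScript iterates integer-like keys (such as "10") in numeric order before other keys, so the sort order only survives for keys like the names here. That is one more reason to prefer an array of name/count pairs when order matters.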

Unwind dictionary values in mongodb aggregation framework

I need to create some plots from single documents existing in mongodb. I can only use the mongodb aggregation framework (so for example I cannot just pull the documents into python and work with it there). I am using the query builder of metabase, so I am limited from this regard.
In order to do this, I am first using some $match queries in order to identify the documents that I need to look at (these are predefined and static). After the $match stage, I am left with one document (this is ok) with the following structure.
{
"id": 1,
"locs": {
"a":1,
"b":2,
"c":3
}
}
I need to change this structure to something like this:
[{"a":1}, {"b":2}, {"c":3}]
or any other form that would allow me to create pie charts out of the structure.
Thanks!
You can convert the locs object to an array using $objectToArray, then $unwind that array to split it into multiple documents. Use $group with the $push accumulator to put each split entry back into k and v format, and $arrayToObject to turn each pair into an object. Finally, use $replaceRoot with the data field to move it to the $$ROOT position.
db.collection.aggregate([
{ "$project": { "data": { "$objectToArray": "$locs" }}},
{ "$unwind": "$data" },
{ "$group": {
"_id": "$data",
"data": { "$push": { "k": "$data.k", "v": "$data.v" }}
}},
{ "$project": {
"data": { "$arrayToObject": "$data" }
}},
{ "$replaceRoot": { "newRoot": "$data" }}
])
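If it helps to see the reshaping outside the database, this is roughly what the pipeline does to the locs object, written as a plain JavaScript sketch. splitObject is a hypothetical helper; Object.entries plays the role of $objectToArray.

```javascript
// Hypothetical helper: turn { a: 1, b: 2 } into [ { a: 1 }, { b: 2 } ].
function splitObject(obj) {
  return Object.entries(obj).map(([k, v]) => ({ [k]: v }));
}

const doc = { id: 1, locs: { a: 1, b: 2, c: 3 } };
console.log(splitObject(doc.locs)); // [ { a: 1 }, { b: 2 }, { c: 3 } ]
```

For charting, the intermediate k/v form (one document per key with fixed field names) is often easier to consume than single-key objects, so it may be worth stopping after the $unwind stage.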

MapReduce: aggregate in map function?

Suppose you have a DB where every document is a tweet from Twitter, and you want, with MapReduce, to generate another document that contains:
The number of tweets published in every country.
The list of words contained in those tweets, with a counter of the total hits of each word. This, for every country too.
My question: is it fine to aggregate and count the words on the map function, and then again on the reduce function? Doing it like this, the output of the map function represents the information of a single tweet, and the reduce function aggregates the info from several tweets, all from the same country, but I don't know if this is a good practice with the MapReduce algorithm...
Thank you in advance!
In MongoDB 3.4 you can do this with the aggregation framework.
For the first bullet, you just have to use the $group operator on the country field and count the tweets.
For the second bullet, use the $split operator (new in 3.4) on the tweet text field, then $unwind the generated array, and finally $group with the word as _id, or country + word as _id.
If you have an older version of MongoDB you have to use map-reduce, but keep in mind that the aggregation framework is generally much faster than map-reduce in MongoDB.
$split: https://docs.mongodb.com/manual/reference/operator/aggregation/split/#exp._S_split
$unwind: https://docs.mongodb.com/manual/reference/operator/aggregation/unwind/
$group: https://docs.mongodb.com/manual/reference/operator/aggregation/group/
Building from the great answer above by Moi Syme, you ideally would run the following aggregate operation to get the desired result:
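db.tweets.aggregate([
  // Split each tweet's text into words
  { "$project": { "wordList": { "$split": [ "$text", " " ] }, "user.country": 1 } },
  { "$unwind": "$wordList" },
  // Count each word per country, remembering which tweets contained it
  { "$group": {
    "_id": { "country": "$user.country", "word": "$wordList" },
    "count": { "$sum": 1 },
    "tweetIds": { "$addToSet": "$_id" }
  }},
  // Collect the per-word counts for each country
  { "$group": {
    "_id": "$_id.country",
    "tweetIdSets": { "$push": "$tweetIds" },
    "counts": { "$push": { "word": "$_id.word", "count": "$count" } }
  }},
  // Count distinct tweets per country; a plain { "$sum": 1 } in the previous
  // stage would count distinct words instead of tweets
  { "$project": {
    "counts": 1,
    "numberOfTweets": {
      "$size": {
        "$reduce": {
          "input": "$tweetIdSets",
          "initialValue": [],
          "in": { "$setUnion": [ "$$value", "$$this" ] }
        }
      }
    }
  }}
])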
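As a sanity check on what the result should contain, here is a small in-memory JavaScript sketch of the same computation: tweets counted once per country, words counted once per occurrence. summarizeTweets and the sample tweets are made up for illustration; it assumes each tweet has text and user.country fields, as in the pipeline above.

```javascript
// Hypothetical helper: per country, count tweets and word occurrences.
function summarizeTweets(tweets) {
  const byCountry = new Map();
  for (const tweet of tweets) {
    const country = tweet.user.country;
    if (!byCountry.has(country)) {
      byCountry.set(country, { numberOfTweets: 0, counts: new Map() });
    }
    const entry = byCountry.get(country);
    entry.numberOfTweets += 1; // once per tweet, regardless of its length
    for (const word of tweet.text.split(" ")) {
      entry.counts.set(word, (entry.counts.get(word) || 0) + 1);
    }
  }
  return byCountry;
}

const sampleTweets = [
  { text: "hello world", user: { country: "ES" } },
  { text: "hello again", user: { country: "ES" } },
  { text: "hello", user: { country: "FR" } },
];
const summary = summarizeTweets(sampleTweets);
console.log(summary.get("ES").numberOfTweets);      // 2
console.log(summary.get("ES").counts.get("hello")); // 2
```

The inner loop is the per-tweet work a map function would do, and the merging into byCountry is the reduce side, which is the split the question asks about.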

How to do SQL INTERSECT OPERATION IN MONGODB

SELECT SOME_COLUMN
FROM TABLE
WHERE SOME_COLUMN_NAME = 'VALUE'
INTERSECT
SELECT SOME_COLUMN
FROM TABLE
WHERE SOME_COLUMN_NAME_VALUE = 'NEW_VALUE'
How to get the common or intersection values for the 2 queries (using INTERSECT operator in SQL) in MongoDB?
INTERSECT is a keyword for SQL, how is it done for MongoDB?
As with so many things from SQL, there is no exact counterpart for SQL INTERSECT in MongoDB, but depending on the actual problem there might be an alternative solution.
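When this answer was written, MongoDB had no operations that affected more than one collection, so an intersection between two collections could not be computed entirely in the database. (Later versions added the $lookup aggregation stage, introduced in MongoDB 3.2, which can read from a second collection.)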
When both queries come from the same collection, you could maybe do something with aggregation. What you could do would depend on what you actually want to do.
Your question seems a little off with the "VALUE" and "NEW_VALUE" conditions in each sub-query. The point of INTERSECT is matching on the column(s) with the "same" value.
But as long as you are talking about the same collection, you can get the intersection of two columns using the aggregation framework like so:
db.collection.aggregate([
  // Get the "sets" for each field
  { "$group": {
    "_id": null,
    "field1": { "$addToSet": "$field1" },
    "field2": { "$addToSet": "$field2" }
  }},
  // Intersect the "sets"
  { "$project": {
    "same": { "$setIntersection": [ "$field1", "$field2" ] }
  }},
  // Unwind the result set
  { "$unwind": "$same" },
  // Just project the wanted field
  { "$project": { "_id": 0, "same": 1 } }
])
That does make use of the $setIntersection operator introduced in MongoDB 2.6 in order to return a "set" with the common elements from the two "sets" being compared. The $addToSet operation constructs the two sets from the "unique" values in each field.
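The roles of $addToSet and $setIntersection can be mimicked with JavaScript Set objects. This is a sketch for intuition only; intersectFields is a hypothetical helper and the sample rows are invented.

```javascript
// Hypothetical helper: collect the distinct values of two fields across all
// documents (like $addToSet), then keep the values present in both sets
// (like $setIntersection).
function intersectFields(docs, fieldA, fieldB) {
  const setA = new Set(docs.map((doc) => doc[fieldA]));
  const setB = new Set(docs.map((doc) => doc[fieldB]));
  return [...setA].filter((value) => setB.has(value));
}

const rows = [
  { field1: "x", field2: "y" },
  { field1: "y", field2: "z" },
  { field1: "x", field2: "x" },
];
console.log(intersectFields(rows, "field1", "field2")); // [ 'x', 'y' ]
```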
You can essentially do the same thing if your available MongoDB version is prior to 2.6, but just with a little more work:
db.collection.aggregate([
// Group each "set"
{ "$group": {
"_id": null,
"field1": { "$addToSet": "$field1" },
"field2": { "$addToSet": "$field2" }
}},
// Unwind each set
{ "$unwind": "$field1" },
{ "$unwind": "$field2" },
// Group on the compared values
{ "$group": {
"_id": null,
"same": {
"$addToSet": {
"$cond": [
{ "$eq": [ "$field1", "$field2" ] },
"$field1",
false
]
}
}
}},
// Unwind again, should be compacted now
{ "$unwind": "$same" },
// Filter out the "false" values
{ "$match": { "same": { "$ne": false } } },
// Just project the wanted field
{ "$project": { "_id": 0, "same": 1 } }
])
Lacking support for the "set operators" in earlier versions, you just emulate the behavior by comparing the values of the two "sets". This works because when you $unwind an array, a new document is produced for each of its values. So "unwinding" one array on top of another results in one document per pair of elements, where each element can be compared against the other.
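The double $unwind is effectively a nested loop over every pair of values. A minimal JavaScript sketch of that emulation (intersectByPairs is a hypothetical name, and the inputs are assumed to be arrays of primitive values):

```javascript
// Hypothetical helper: compare every element of one array against every
// element of the other, keeping the values that match. The Set plays the role
// of the final $addToSet, which collapses repeated matches.
function intersectByPairs(valuesA, valuesB) {
  const same = new Set();
  for (const a of valuesA) {
    for (const b of valuesB) {
      if (a === b) {
        same.add(a);
      }
    }
  }
  return [...same];
}

console.log(intersectByPairs(["x", "y"], ["y", "z", "x"])); // [ 'x', 'y' ]
```

Like the double $unwind, this does work proportional to the product of the two set sizes, so it is only reasonable for fairly small sets.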
So with the single collection form this is a perfectly valid operation in order to get the "intersection". Like all things in MongoDB though, the general gearing is towards working with a single collection at a time. The general onus is on your design to structure the data so that comparisons are made on a single collection.
Similar results can be obtained with an incremental mapReduce process over multiple collections, but as your general question seems to refer to a single table source then this would in fact be a different question to the one you appear to be asking. Also of course, it is not a single operation and involves multiple processing steps.
You would generally be advised to take a good look at the manual section on SQL to aggregation mapping. This gives many common examples and is getting better over time to add additional use cases.