Query for documents that match contiguous array elements - MongoDB

I have a MongoDB collection with documents in the following format:
{ "_id" : 1, "tokens": [ "I", "have", "a", "dream" ] },
{ "_id" : 2, "tokens": [ "dream", "a", "little", "dream" ] },
{ "_id" : 3, "tokens": [ "dream", "a", "dream" ] },
{ "_id" : 4, "tokens": [ "a" , "little", "dream" ] },
...
I need to get all documents whose "tokens" array includes the contiguous elements "a", "dream".
So the following are the matched documents:
{ "_id" : 1, "tokens": [ "I", "have", "a", "dream" ] },
{ "_id" : 3, "tokens": [ "dream", "a", "dream" ] },
Is there a way to get the right results?

A trick is to use a regular expression:
$match to get all documents whose tokens contain both input values ($all)
$addFields to duplicate the tokens array and add the input array
$reduce to concatenate each array into a single string joined with "-"
$regexMatch to check whether the input string occurs in the tokens string
$match to eliminate documents that did not match
$project to keep only the necessary fields
The code is:
[{
$match: {
tokens: { $all: ["a", "dream"] }
}
}, {
$addFields: {
duplicate: "$tokens",
inputData: ["a", "dream"]
}
}, {
$addFields: {
duplicate: {
$reduce: {
input: "$duplicate",
initialValue: "",
in: { $concat: ["$$value", "-", "$$this"] }
}
},
inputData: {
$reduce: {
input: "$inputData",
initialValue: "",
in: { $concat: ["$$value", "-", "$$this"] }
}
}
}
}, {
$addFields: {
match: {
$regexMatch: { input: "$duplicate", regex: '$inputData' }
}
}
}, {
$match: {
match: true
}
}, {
$project: { _id: 1, tokens: 1 }
}]
Working Mongo playground
Note: it works for this scenario, but do test multiple scenarios before relying on it.
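Because every token is only prefixed with "-", a token such as "dreamer" would also satisfy the pattern "-a-dream". A minimal sketch of a safer variant (my own adjustment, not part of the answer above) appends a trailing delimiter as well so each token is bounded on both sides; it still assumes the tokens contain no "-" and no regex metacharacters:
db.collection.aggregate([
  { $match: { tokens: { $all: ["a", "dream"] } } },
  { $addFields: {
      // join tokens as "-I-have-a-dream-" so every token is delimited on both sides
      joined: {
        $concat: [
          { $reduce: { input: "$tokens", initialValue: "", in: { $concat: ["$$value", "-", "$$this"] } } },
          "-"
        ]
      },
      // build the search string the same way, e.g. "-a-dream-"
      needle: {
        $concat: [
          { $reduce: { input: ["a", "dream"], initialValue: "", in: { $concat: ["$$value", "-", "$$this"] } } },
          "-"
        ]
      }
  } },
  { $match: { $expr: { $regexMatch: { input: "$joined", regex: "$needle" } } } },
  { $project: { _id: 1, tokens: 1 } }
])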

Related

Concat an int array field inside an array of objects into one string field in MongoDB aggregate

I would like to concatenate int array field values inside an array of objects into one string field after dividing them (by 10).
Here's the existing document format:
{
"no" : "2020921008981",
"date" : ISODate("2020-04-01T05:19:02.263+0000"),
"sale" : {
"soldItems" : [
{
"itemRefId" : "5b55ac7f0550de00210a3b24",
"soldPrice" : NumberInt(800),
},
{
"itemRefId" : "5b55ac7f0550de00210a3b25",
"soldPrice" : NumberInt(1000),
}
]
}
}
Expected result :
{
"no" : "2020921008981",
"date" : ISODate("2020-04-01T05:19:02.263+0000"),
"priceList" : "8.0 \n 10.0"
}
My attempt with $reduce:
priceList: {
$reduce: {
input: "$sale.soldItems.soldPrice",
initialValue: "",
in: {
$cond: [ { "$eq": [ { $toString: { $divide: [ "$$value", 10 ] } }, "" ] }, "$$this", { $concat: [ { $toString: { $divide: [ "$$value", 10 ] } }, "\n", "$$this" ] } ]
}
}
}
But I end up getting an "errmsg" : "$divide only supports numeric types, not string and double" error. Any ideas would be appreciated.
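The $divide fails because $$value inside the $reduce is the accumulated string, so the division ends up being applied to a string. Convert each soldPrice to a string first with $map, and only then join the results with $reduce: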
db.case.aggregate([
{
$set: {
priceList: {
$reduce: {
input: {
$map: {
input: "$sale.soldItems.soldPrice",
in: { $toString: { $divide: ["$$this", 10] } }
}
},
initialValue: "",
in: { $concat: ["$$value", "$$this", " \n "] }
}
}
}
},
{
$project: {
_id: 0,
no: 1,
date: 1,
priceList: 1
}
}
])
Try the following aggregation query, where the idea is to:
First divide the soldPrice field by 10 (or the required divisor) using $divide
Convert it into a string and concatenate using $toString and $concat
The separator \n gets appended after each reduce step; remove it from the end using $rtrim
Create the new field using $addFields
Query:
db.collection.aggregate([
{
$addFields: {
"itemPriceList": {
$rtrim: {
input: {
$reduce: {
input: "$salesOrder.purchaseItems",
initialValue: "",
in: {
$concat: [
"$$value",
{
$toString: {
$divide: [
"$$this.soldPrice",
10
]
}
},
"\n"
]
}
}
},
chars: "\n"
}
}
}
}
]);
Result:
[
{
"_id": ObjectId("5a934e000102030405000000"),
"caseNumber": "2020921008981",
"itemPriceList": "80\n100",
"salesOrder": {
"purchaseItems": [
{
"itemRefId": "5b55ac7f0550de00210a3b24",
"soldPrice": 800
},
{
"itemRefId": "5b55ac7f0550de00210a3b25",
"soldPrice": 1000
}
]
},
"startTime": ISODate("2016-05-18T16:00:00Z")
}
]
Playground Test Link
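The result above uses the answer's own field names (salesOrder.purchaseItems, itemPriceList). A minimal sketch of the same $reduce + $rtrim idea against the question's original field names (sale.soldItems.soldPrice, priceList); only the field names come from the question, the rest is the same technique:
db.case.aggregate([
  {
    $addFields: {
      priceList: {
        $rtrim: {
          input: {
            $reduce: {
              input: "$sale.soldItems",
              initialValue: "",
              in: {
                $concat: [
                  "$$value",
                  // divide each price and stringify it before concatenating
                  { $toString: { $divide: ["$$this.soldPrice", 10] } },
                  "\n"
                ]
              }
            }
          },
          chars: "\n"   // strip the trailing separator
        }
      }
    }
  },
  { $project: { _id: 0, no: 1, date: 1, priceList: 1 } }
])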

Mongo remove duplicates in array of objects based on field

New to Mongo, and I have found lots of examples of removing dupes from arrays of strings using the aggregation framework, but I am wondering if it is possible to remove dupes from an array of objects based on a field in the object. E.g.
{
"_id" : ObjectId("5e82661d164941779c2380ca"),
"name" : "something",
"values" : [
{
"id" : 1,
"val" : "x"
},
{
"id" : 1,
"val" : "x"
},
{
"id" : 2,
"val" : "y"
},
{
"id" : 1,
"val" : "xxxxxx"
}
]
}
Here I'd like to remove dupes based on the id field, so I would end up with:
{
"_id" : ObjectId("5e82661d164941779c2380ca"),
"name" : "something",
"values" : [
{
"id" : 1,
"val" : "x"
},
{
"id" : 2,
"val" : "y"
}
]
}
Picking the first/any object with a given id works; I just want to end up with one per id. Is this doable in the aggregation framework? Or even outside the aggregation framework, I'm just looking for a clean way to do this. I need to do this type of thing across many documents in the collection, which seems like a good use case for the aggregation framework, but as I mentioned, newbie here... thanks.
Well, you may get the desired result in two ways.
Classic
Flatten ($unwind) - remove duplicates (pick the first occurrence) - group back
db.collection.aggregate([
{
$unwind: "$values"
},
{
$group: {
_id: "$values.id",
values: {
$first: "$values"
},
id: {
$first: "$_id"
},
name: {
$first: "$name"
}
}
},
{
$group: {
_id: "$id",
name: {
$first: "$name"
},
values: {
$push: "$values"
}
}
}
])
MongoPlayground
Modern
We need to use the $reduce operator.
Pseudocode:
values : {
  var tmp = [];
  for (var value of values) {
    if (!(value.id in tmp))   // i.e. tmp does not yet contain an object with this id
      tmp.push(value);
  }
  return tmp;
}
db.collection.aggregate([
{
$addFields: {
values: {
$reduce: {
input: "$values",
initialValue: [],
in: {
$concatArrays: [
"$$value",
{
$cond: [
{
$in: [
"$$this.id",
"$$value.id"
]
},
[],
[
"$$this"
]
]
}
]
}
}
}
}
}
])
MongoPlayground
You can use $reduce. Try the query below:
db.collection.aggregate([
{
$addFields: {
values: {
$reduce: {
input: "$values",
initialValue: [],
in: {
$cond: [
{ $in: ["$$this.id", "$$value.id"] }, /** Check if 'id' exists in holding array if yes push same array or concat holding array with & array of new object */
"$$value",
{ $concatArrays: ["$$value", ["$$this"]] }
]
}
}
}
}
}
]);
Test : MongoDB-Playground
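If the goal is to persist the de-duplicated array rather than just read it back, the same $reduce expression can also be used in an update with an aggregation pipeline. A minimal sketch, assuming MongoDB 4.2+ (the updateMany usage is my own suggestion, not part of the answers above):
db.collection.updateMany(
  {},  // every document in the collection
  [
    {
      $set: {
        values: {
          $reduce: {
            input: "$values",
            initialValue: [],
            in: {
              $cond: [
                { $in: ["$$this.id", "$$value.id"] },          // id already kept?
                "$$value",                                     // yes: keep the accumulator as is
                { $concatArrays: ["$$value", ["$$this"]] }     // no: append this object
              ]
            }
          }
        }
      }
    }
  ]
)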

Count both outer and inner embedded arrays in a single query

{
_id: ObjectId("5dbdacc28cffef0b94580dbd"),
"comments" : [
{
"_id" : ObjectId("5dbdacc78cffef0b94580dbf"),
"replies" : [
{
"_id" : ObjectId("5dbdacd78cffef0b94580dc0")
},
]
},
]
}
How do I count the number of elements in comments and sum it with the number of replies?
My approach is to run 2 queries like this:
1. Count the total elements of replies:
db.posts.aggregate([
{$match: {_id:ObjectId("5dbdacc28cffef0b94580dbd")}},
{ $unwind: "$comments",},
{$project:{total:{$size:"$comments.replies"} , _id: 0} }
])
2. Count the total elements of comments:
db.posts.aggregate([
{$match: {_id:ObjectId("5dbdacc28cffef0b94580dbd")}},
{$project:{total:{$size:"$comments.replies"} , _id: 0} }
])
Then I sum up both. Is there a better solution, e.g. a single query that returns the sum of the total elements of comments + replies?
You can use $reduce and $concatArrays to "merge" an inner "array of arrays" into a single list and measure the $size of that. Then simply $add the two results together:
db.posts.aggregate([
{ "$match": { _id:ObjectId("5dbdacc28cffef0b94580dbd") } },
{ "$addFields": {
"totalBoth": {
"$add": [
{ "$size": "$comments" },
{ "$size": {
"$reduce": {
"input": "$comments.replies",
"initialValue": [],
"in": {
"$concatArrays": [ "$$value", "$$this" ]
}
}
}}
]
}
}}
])
Note that an "array of arrays" is what an expression like $comments.replies produces, hence the operation to merge these into a single array where you can measure all elements.
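An equivalent sketch that avoids building the flattened intermediate array, summing the $size of each replies array with $map and $sum (this variant is my own, assuming the same document shape as above):
db.posts.aggregate([
  { "$match": { _id: ObjectId("5dbdacc28cffef0b94580dbd") } },
  { "$addFields": {
    "totalBoth": {
      "$add": [
        { "$size": "$comments" },        // number of comments
        { "$sum": {                      // number of replies across all comments
          "$map": { "input": "$comments", "in": { "$size": "$$this.replies" } }
        } }
      ]
    }
  }}
])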
Try using the $unwind to flatten the list you get from the $project before using $count.
This is another way of getting the result.
Input documents:
{ "_id" : 1, "array1" : [ { "array2" : [ { id: "This is a test!"}, { id: "test1" } ] }, { "array2" : [ { id: "This is 2222!"}, { id: "test 222" }, { id: "222222" } ] } ] }
{ "_id" : 2, "array1" : [ { "array2" : [ { id: "aaaa" }, { id: "bbbb" } ] } ] }
The query:
db.arrsizes2.aggregate( [
{ $facet: {
array1Sizes: [
{ $project: { array1Size: { $size: "$array1" } } }
],
array2Sizes: [
{ $unwind: "$array1" },
{ $project: { array2Size: { $size: "$array1.array2" } } },
],
} },
{ $project: { result: { $concatArrays: [ "$array1Sizes", "$array2Sizes" ] } } },
{ $unwind: "$result" },
{ $group: { _id: "$result._id", total1: { $sum: "$result.array1Size" }, total2: { $sum: "$result.array2Size" } } },
{ $addFields: { total: { $add: [ "$total1", "$total2" ] } } },
] )
The output:
{ "_id" : 2, "total1" : 1, "total2" : 2, "total" : 3 }
{ "_id" : 1, "total1" : 2, "total2" : 5, "total" : 7 }

In MongoDB aggregation pipeline, how to project indices of embedded array that matched?

In a MongoDB aggregation pipeline, I want to $project the indices of an embedded array (a sub-document) that matched a previous $match stage.
Say, I have the following docs.
{_id: '1', tags: ['aaa', 'bbb', 'ccc']},
{_id: '2', tags: ['baa', 'aaa', 'aaa']},
{_id: '3', tags: ['aac', 'cbb', 'aca']},
Now, if I query with {tags: 'aaa'}, I want to output something similar to
{_id: '1', tags: [0]},
{_id: '2', tags: [1, 2]}
db.inventory.aggregate([
{ $match : {tags : 'aaa' }},
{ $unwind : { path: "$tags", includeArrayIndex: "arrayIndex"}},
{ $match : {tags : 'aaa' }},
{ $group : {
_id : '$_id',
tags : { $push : '$arrayIndex'}
}
}
])
Output :
{ "_id" : "2", "tags" : [ NumberLong(1), NumberLong(2) ] }
{ "_id" : "1", "tags" : [ NumberLong(0) ] }
Another way :
db.inventory.aggregate([
{ $match : {tags : 'aaa' }},
{ $project : {
tags: {
$filter: {
input: {
$zip: {
inputs: [ "$tags", { $range: [0, { $size: "$tags" }] } ]
}
},
as: "tagWithIndex",
cond: {
$let: {
vars: {
tag : { $arrayElemAt: [ "$$tagWithIndex", 0 ] }
},
in: { $eq: [ "$$tag", 'aaa' ] }
}
}
}
}
}},
{ $unwind : '$tags'},
{ $group : {
_id : '$_id',
tags : {
$push : { $arrayElemAt: [ "$tags", 1]}
}
}
}
])
Output :
{ "_id" : "2", "tags" : [ 1, 2 ] }
{ "_id" : "1", "tags" : [ 0 ] }
hope this helps.
You need to $map over a $range of the $size of the tags array to pair each element with its index, and then you can easily use $filter to keep only the elements that equal "aaa".
db.collection.aggregate([
{ "$match": { "tags": "aaa" }},
{ "$project": {
"tags": {
"$filter": {
"input": {
"$map": {
"input": { "$range": [0, { "$size": "$tags" }] },
"in": {
"string": { "$arrayElemAt": ["$tags", "$$this"] },
"index": "$$this"
}
}
},
"cond": { "$eq": ["$$this.string", "aaa"] }
}
}
}},
{ "$project": { "tags": "$tags.index" }}
])
Output
[
{
"_id": "1",
"tags": [0]
},
{
"_id": "2",
"tags": [1, 2]
}
]
If you're searching for a value in an array, you should use $in.
db.inventory.find( { tags: { $in: [ 'aaa' ] } } )
You can also write the same condition in a $match stage; the syntax is the same.
See the documentation for details.
Source: https://docs.mongodb.com/manual/reference/operator/query/in/
db.inventory.find( { "tags": { $in: [ 'aaa' ] } },
{ "tags.$": 1 } )
This is probably what you want.

Intersection of several arrays

I have some documents with an array property items.
I want to get the intersection between n documents.
db.things.insert({name:"A", items:[1,2,3,4,5]})
db.things.insert({name:"B", items:[2,4,6,8]})
db.things.insert({name:"C", items:[1,2]})
db.things.insert({name:"D", items:[5,6]})
db.things.insert({name:"E", items:[9,10]})
db.things.insert({name:"F", items:[1,5]})
Data:
{ "_id" : ObjectId("57974a0d356baff265710a1c"), "name" : "A", "items" : [ 1, 2, 3, 4, 5 ] },
{ "_id" : ObjectId("57974a0d356baff265710a1d"), "name" : "B", "items" : [ 2, 4, 6, 8 ] },
{ "_id" : ObjectId("57974a0d356baff265710a1e"), "name" : "C", "items" : [ 1, 2 ] },
{ "_id" : ObjectId("57974a0d356baff265710a1f"), "name" : "D", "items" : [ 5, 6 ] },
{ "_id" : ObjectId("57974a0d356baff265710a20"), "name" : "E", "items" : [ 9, 10 ] },
{ "_id" : ObjectId("57974a1a356baff265710a21"), "name" : "F", "items" : [ 1, 5 ] }
For example:
things.name.A intersect things.name.C intersect things.name.F:
[ 1, 2, 3, 4, 5 ] intersect [ 1, 2 ] intersect [ 1, 5 ]
must be: [1]
I think it's doable using $setIntersection but I can't find the way.
I can do it with two documents, but how do I do it with more?
db.things.aggregate({$match:{"name":{$in:["A", "F"]}}},
{$group:{_id:null, "setA":{$first:"$items"}, "setF":{$last:"$items"} } },
{
"$project": {
"set1": 1,
"set2": 1,
"commonToBoth": { "$setIntersection": [ "$setA", "$setF" ] },
"_id": 0
}
}
)
{ "commonToBoth" : [ 5, 1 ] }
A solution that is not specific to the number of input items could look like this:
db.things.aggregate(
{
$match: {
"name": {
$in: ["A", "F"]
}
}
},
{
$group: {
_id: "$items",
count: {
$sum: 1
}
}
},
{
$group: {
_id: null,
totalCount: {
$sum: "$count"
},
items: {
$push: "$_id"
}
}
},
{
$unwind: {
path: "$items"
}
},
{
$unwind: {
path: "$items"
}
},
{
$group: {
_id: "$items",
totalCount: {
$first: "$totalCount"
},
count: {
$sum: 1
}
}
},
{
$project: {
_id: 1,
presentInAllDocs: {
$eq: ["$totalCount", "$count"]
}
}
},
{
$match: {
presentInAllDocs: true
}
},
{
$group: {
_id: null,
items: {
$push: "$_id"
}
}
}
)
which will output this
{
"_id" : null,
"items" : [
5,
1
]
}
Of course you can add a last $project stage to bring the result into the desired shape.
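A minimal sketch of such a stage (the exact shape is up to you; this one just drops _id and keeps items):
{
  $project: { _id: 0, items: 1 }
}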
Explanation
The basic idea behind this is that when we count the number of documents and we count the number of occurrences of each item, then the items with a count equal to the total document count appeared in each document and are therefore in the intersection result.
This idea has one important assumption: your items arrays have no duplicates in them (i.e. they are sets). If this assumption is wrong, then you would have to insert an additional stage at the beginning of the pipeline to turn the arrays into sets.
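A minimal sketch of such a stage, assuming de-duplication via $setUnion with an empty array is acceptable (this stage is my addition, not part of the pipeline above):
{
  $addFields: {
    items: { $setUnion: [ "$items", [] ] }
  }
}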
One could also build this pipeline in a different and probably shorter way but I tried to keep the resource usage as low as possible and therefore added possibly unnecessary (from the functional point of view) stages. For example, the second stage groups by the items array as my assumption is that there are far fewer different values/arrays than documents so the rest of the pipeline has to work with a fraction of the initial document count. However, from the functional point of view, we just need the total count of documents and therefore we could skip that stage and just make a $group stage counting all documents and pushing them into an array for later usage - which of course is a big hit for memory consumption as we have now an array of all possible documents.
If you are using Mongo 3.2, you can use $arrayElemAt to specify all arguments of $setIntersection:
db.things.aggregate([{
$match: {
"name": {
$in: ["A", "B", "C"]
}
}
}, {
$group: {
_id: 0,
elements: {
$push: "$items"
}
}
}, {
$project: {
intersect: {
$setIntersection: [{
"$arrayElemAt": ["$elements", 0]
}, {
"$arrayElemAt": ["$elements", 1]
}, {
"$arrayElemAt": ["$elements", 2]
}]
},
}
}]);
You would have to dynamically add the required number of JSON objects with an index, such as:
{
"$arrayElemAt": ["$elements", <index>]
}
It should match the number of elements of your input items in ["A", "B", "C"].
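A minimal sketch of building those arguments dynamically in the mongo shell (the names variable and the map call are my own illustration, not part of the original answer):
var names = ["A", "B", "C"];
// one { $arrayElemAt: ["$elements", i] } per requested name
var intersectArgs = names.map(function (_, i) {
  return { "$arrayElemAt": ["$elements", i] };
});

db.things.aggregate([
  { $match: { "name": { $in: names } } },
  { $group: { _id: 0, elements: { $push: "$items" } } },
  { $project: { intersect: { $setIntersection: intersectArgs } } }
]);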
If you want to deal with duplicates (some names are present multiple times), regroup all your items by name, $unwind twice and $addToSet to merge all arrays for a specific name before executing the previous aggregation:
db.things.aggregate([{
$match: {
"name": {
$in: ["A", "B", "C"]
}
}
}, {
$group: {
_id: "$name",
"items": {
"$push": "$items"
}
}
}, {
"$unwind": "$items"
}, {
"$unwind": "$items"
}, {
$group: {
_id: "$_id",
items: {
$addToSet: "$items"
}
}
}, {
$group: {
_id: 0,
elements: {
$push: "$items"
}
}
}, {
$project: {
intersect: {
$setIntersection: [{
"$arrayElemAt": ["$elements", 0]
}, {
"$arrayElemAt": ["$elements", 1]
}, {
"$arrayElemAt": ["$elements", 2]
}]
},
}
}]);
It isn't a clean solution, but it works.