MongoDB - transform array field to set - mongodb
I have documents with a field containing an array of values, some of which are duplicated. I want to add an extra field to each document holding the unique values of this array. I tried aggregate + $addToSet without success.
Data:
{..., "random_integers" : [1, 1, 2, 2, 3, 3]},
{..., "random_integers" : [2, 3, 4, 4, 5, 6]},
{..., "random_integers" : [9, 9, 8, 8, 7, 7]}
Expecting:
{
...
"random_integers" : [1, 1, 2, 2, 3, 3],
"unique_integers" : [1, 2, 3],
},
{
...
"random_integers" : [2, 3, 4, 4, 5, 6],
"unique_integers" : [2, 3, 4, 5, 6],
},
{
...
"random_integers" : [9, 9, 8, 8, 7, 7],
"unique_integers" : [7, 8, 9],
}
Attempt with aggregate + $addToSet:
# Query
db.getCollection().aggregate([
{
$group: {
_id: '$_id',
unique_integers: {$addToSet: '$random_integers' }
}
}
])
# Results
{..., "unique_integers" : [[1, 1, 2, 2, 3, 3]]},
{..., "unique_integers" : [[2, 3, 4, 4, 5, 6]]},
{..., "unique_integers" : [[9, 9, 8, 8, 7, 7]]}
$addToSet adds the whole array to the set as a single element, instead of adding each element of the array. I tried to combine $addToSet with $each, but it is not recognized by MongoDB inside $group:
# Query
db.getCollection().aggregate([
{
$group: {
_id: '$_id',
unique_integers: {$addToSet: { $each: '$random_integers' }}
}
}
])
# Error
Error: command failed: {
"ok" : 0,
"errmsg" : "Unrecognized expression '$each'",
"code" : 168,
"codeName" : "InvalidPipelineOperator"
} : aggregate failed
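Use a set operator in a $project stage instead. Set operators treat their input arrays as sets, so intersecting the array with itself returns its distinct values (the order of the result is not guaranteed):
db.ints.aggregate( [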
{ $project: {
random_integers: 1,
unique_integers: { $setIntersection: [ "$random_integers", "$random_integers" ] },
_id: 0
} }
] )
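As a variation on the answer above, the following sketch keeps every existing field by using $addFields and deduplicates with $setUnion against an empty array. The pipeline is only defined here, not run. The helper uniqueValues is hypothetical (it is not a MongoDB API): it mimics the effect on a single document in plain JavaScript so the result can be checked without a server, and it sorts its output only to make the result predictable, which MongoDB itself does not promise for set operators.

```javascript
// Pipeline sketch for mongosh: db.ints.aggregate(pipeline)
// $addFields keeps all existing fields and adds unique_integers.
const pipeline = [
  {
    $addFields: {
      unique_integers: { $setUnion: ["$random_integers", []] }
    }
  }
];

// Hypothetical helper (not a MongoDB API): emulates the deduplication
// on a single document in plain JavaScript.
function uniqueValues(doc) {
  // Sorting is an assumption made for readability; set operators
  // in MongoDB do not guarantee element order.
  const unique = [...new Set(doc.random_integers)].sort((a, b) => a - b);
  return { ...doc, unique_integers: unique };
}

const sample = { random_integers: [9, 9, 8, 8, 7, 7] };
console.log(JSON.stringify(uniqueValues(sample)));
```

If the original field order inside unique_integers matters, sort the array explicitly afterwards rather than relying on the set operator.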
Related
Seq sortWith function with strange behaviour
I was trying to sort elements of a Seq object with the sortWith function when I got an exception. I didn't use the sorted function because the code below is a simplification of the real code where the seq has tuples instead of ints. See below that in the last two cases, when comparing with (v1 <= v2) an exception is thrown, but when comparing with (v1 < v2) no exception is thrown.
heitor#heitor-340XAA-350XAA-550XAA:~$ sbt console
[info] welcome to sbt 1.6.2 (Ubuntu Java 11.0.11)
[info] loading settings for project global-plugins from sbt-updates.sbt ...
[info] loading global plugins from /home/heitor/.sbt/1.0/plugins
[info] loading project definition from /home/heitor/project
[info] loading settings for project root from build.sbt ...
[info] set current project to example (in build file:/home/heitor/)
[info] Starting scala interpreter...
Welcome to Scala 2.13.8 (OpenJDK 64-Bit Server VM, Java 11.0.11).
Type in expressions for evaluation. Or try :help.
scala> val lst69 = List(1, 10, 4, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 3, 1, 4, 10, 1, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1)
val lst69: List[Int] = List(1, 10, 4, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 3, 1, 4, 10, 1, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1)
scala> lst69.size
val res0: Int = 69
scala> val lst68 = lst69.take(68)
val lst68: List[Int] = List(1, 10, 4, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 3, 1, 4, 10, 1, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1)
scala> lst68.size
val res1: Int = 68
scala> lst68.sorted
val res2: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 10, 10)
scala> lst68.sortWith{ case (v1,v2) => (v1 <= v2) }
val res3: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 10, 10)
scala> lst69.sorted
val res4: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 10, 10)
scala> lst69.sortWith{ case (v1,v2) => (v1 <= v2) }
java.lang.IllegalArgumentException: Comparison method violates its general contract!
at java.base/java.util.TimSort.mergeHi(TimSort.java:903)
at java.base/java.util.TimSort.mergeAt(TimSort.java:520)
at java.base/java.util.TimSort.mergeForceCollapse(TimSort.java:461)
at java.base/java.util.TimSort.sort(TimSort.java:254)
at java.base/java.util.Arrays.sort(Arrays.java:1441)
at scala.collection.SeqOps.sorted(Seq.scala:700)
at scala.collection.SeqOps.sorted$(Seq.scala:692)
at scala.collection.immutable.List.scala$collection$immutable$StrictOptimizedSeqOps$$super$sorted(List.scala:79)
at scala.collection.immutable.StrictOptimizedSeqOps.sorted(StrictOptimizedSeqOps.scala:78)
at scala.collection.immutable.StrictOptimizedSeqOps.sorted$(StrictOptimizedSeqOps.scala:78)
at scala.collection.immutable.List.sorted(List.scala:79)
at scala.collection.SeqOps.sortWith(Seq.scala:727)
at scala.collection.SeqOps.sortWith$(Seq.scala:727)
at scala.collection.AbstractSeq.sortWith(Seq.scala:1161)
... 59 elided
scala> lst69.sortWith{ case (v1,v2) => (v1 < v2) }
val res6: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 10, 10)
scala>
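sortWith expects a strict "less than" predicate. With (v1 <= v2), two equal elements each compare as less than the other, which breaks the comparator contract that TimSort checks; writing it as (v1 < v2 || v1 == v2) is the same predicate and fails the same way. Use the strict form instead: lst69.sortWith{ case (v1,v2) => (v1 < v2) }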
What does $$this mean in MongoDB Aggregation?
In a MongoDB aggregation example I've encountered the expression $$this, but I cannot find a reference to it in the MongoDB documentation (not even in the documentation about aggregation variables).
Here is the sample data:
{ "_id" : 1, "actions" : [ 2, 6, 3, 8, 5, 3 ] }
{ "_id" : 2, "actions" : [ 6, 4, 2, 8, 4, 3 ] }
{ "_id" : 3, "actions" : [ 6, 4, 6, 4, 3 ] }
{ "_id" : 4, "actions" : [ 6, 8, 3 ] }
{ "_id" : 5, "actions" : [ 6, 8 ] }
{ "_id" : 6, "actions" : [ 6, 3, 11, 8, 3 ] }
{ "_id" : 7, "actions" : [ 6, 3, 8 ] }
Here is the code I'm looking at:
db.test.aggregate([
{$match:{actions:{$all:[6,3,8]}}},
{$project:{actions638:{$map:{
input:{$range:[0,{$subtract:[{$size:"$actions"},2]}]},
in:{$slice:["$actions","$$this",3]}
}}}}
])
and here is the output:
{ "_id" : 1, "actions638" : [ [ 2, 6, 3 ], [ 6, 3, 8 ], [ 3, 8, 5 ], [ 8, 5, 3 ] ] }
{ "_id" : 2, "actions638" : [ [ 6, 4, 2 ], [ 4, 2, 8 ], [ 2, 8, 4 ], [ 8, 4, 3 ] ] }
{ "_id" : 4, "actions638" : [ [ 6, 8, 3 ] ] }
{ "_id" : 6, "actions638" : [ [ 6, 3, 11 ], [ 3, 11, 8 ], [ 11, 8, 3 ] ] }
{ "_id" : 7, "actions638" : [ [ 6, 3, 8 ] ] }
$$this refers to the current element of the array being processed by the $map operator; it is the default variable name when the as option is omitted. Alternatively, set as to a name of your choice and refer to that name instead of $$this. For example (from the docs):
db.grades.aggregate( [
{ $project: {
adjustedGrades: {
$map: {
input: "$quizzes",
as: "grade",
in: { $add: [ "$$grade", 2 ] }
}
}
} }
] )
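To make the role of $$this in the question's pipeline concrete, here is a rough plain-JavaScript sketch of what the $map over $range computes for one document. The function name slidingTriples is hypothetical and not part of MongoDB; the loop variable start plays the part of $$this, which in that pipeline is the current number produced by $range.

```javascript
// Hypothetical helper (not a MongoDB API): emulates
// $map over $range: [0, size - 2] with $slice: ["$actions", "$$this", 3].
function slidingTriples(actions) {
  const windows = [];
  // $range: [0, n] yields 0 .. n - 1; here n = actions.length - 2.
  for (let start = 0; start < actions.length - 2; start++) {
    // "start" corresponds to $$this, the current element of the $map input.
    windows.push(actions.slice(start, start + 3));
  }
  return windows;
}

console.log(JSON.stringify(slidingTriples([2, 6, 3, 8, 5, 3])));
```

For the first sample document this yields the same four triples as the actions638 field in the question's output.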
Find entry's count of array elements in each array in documents
I have a collection with documents like these:
[
{p: [1, 2, 3, 4]},
{p: [1, 2, 7, 9, 10]},
{p: [3, 5]}
]
I want to count how many times each element of p appears across the p arrays of all documents. The expected result is a collection with these elements:
[
{pElement: 1, count: 2},
{pElement: 2, count: 2},
{pElement: 3, count: 2},
{pElement: 4, count: 1},
{pElement: 7, count: 1},
{pElement: 9, count: 1},
{pElement: 10, count: 1},
{pElement: 5, count: 1}
]
How can I achieve that?
You should use an aggregation pipeline with the following stages:
Decompose the p arrays and generate one document for each element. You can use the $unwind operator to do that.
Group the generated documents by the p value and count the occurrences of each one using the $group operator and the $sum accumulator.
Reshape the result of the previous stage to look like {pElement: p, count: c} using the $project operator.
Sort the documents by the count value using the $sort operator.
The final aggregation code would look like:
db.collectionName.aggregate([
{ $unwind: "$p" },
{ $group: { _id: "$p", count: { $sum: 1 } } },
{ $project: { _id: 0, pElement: "$_id", count: 1 } },
{ $sort: { count: -1 } }
])
The result would be:
{ "count" : 2, "pElement" : 3 }
{ "count" : 2, "pElement" : 2 }
{ "count" : 2, "pElement" : 1 }
{ "count" : 1, "pElement" : 5 }
{ "count" : 1, "pElement" : 10 }
{ "count" : 1, "pElement" : 9 }
{ "count" : 1, "pElement" : 7 }
{ "count" : 1, "pElement" : 4 }
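The stages described in this answer can also be traced in plain JavaScript, which may help when checking the expected counts without a server. The helper countElements below is hypothetical (not a MongoDB API), and the tie-break on pElement is an assumption added only to make the order deterministic; the pipeline itself sorts by count alone.

```javascript
// Hypothetical helper (not a MongoDB API): emulates
// $unwind -> $group with $sum -> $project -> $sort on in-memory documents.
function countElements(docs) {
  const counts = new Map();
  for (const doc of docs) {
    for (const value of doc.p) {                        // $unwind: "$p"
      counts.set(value, (counts.get(value) || 0) + 1);  // $group with $sum: 1
    }
  }
  return [...counts]
    .map(([pElement, count]) => ({ pElement, count }))  // $project
    // $sort by count descending; the pElement tie-break is an assumption.
    .sort((a, b) => b.count - a.count || a.pElement - b.pElement);
}

const docs = [{ p: [1, 2, 3, 4] }, { p: [1, 2, 7, 9, 10] }, { p: [3, 5] }];
console.log(JSON.stringify(countElements(docs)));
```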
How to ensureIndex a list of geometry
I have a MongoDB collection with documents like the one below:
{
"_id" : ObjectId(...),
"gemetryCollectionId" : 1,
"geometry" : [{ "type" : "Polygon", "coordinates" : [[[2, 3], [4, 4], [4, 3], [2, 3]]] }]
}
How do I create an index on the geometry list? It doesn't work if I do it like this:
db.collectionName.ensureIndex({"geometry":"2dsphere"});
You are storing geometry as an array. Try storing it as a single object and then creating the index, something like this:
{
"_id" : ObjectId(...),
"gemetryCollectionId" : 1,
"geometry" : { "type" : "Polygon", "coordinates" : [[[2, 3], [4, 4], [4, 3], [2, 3]]] }
}
It will work then.
MongoDB - select document where all values in field array are present in a given array
I have documents like:
{ foo : [1, 2] }
{ foo : [2, 3] }
Given an array like [2, 3, 4], how would I select only the second document? That is, select only the documents where all the values in foo match values in the given array.
Basically, there are several ways to match an array, but there is no exact solution for your need.
Consider documents like:
{ "_id" : ObjectId("51b05a712961f4704684d901"), "x" : [ 6, 7, 8, 9 ] }
{ "_id" : ObjectId("51b05a712961f4704684d902"), "x" : [ 7, 8, 9, 10 ] }
{ "_id" : ObjectId("51b05a712961f4704684d903"), "x" : [ 8, 9, 10, 11 ] }
query1: db.collection.find({x:[3,4,5,6]})
This matches only documents whose x is exactly that array. result1:
{ "_id" : ObjectId("51b05a712961f4704684d8fe"), "x" : [ 3, 4, 5, 6 ] }
query1 will not match:
{ "_id" : ObjectId("51b05a712961f4704684d8fe"), "x" : [ 3, 4, 5] }
{ "_id" : ObjectId("51b05a712961f4704684d8fe"), "x" : [ 3, 4, 5, 6, 7] }
query2: db.t.find({x:{$all:[3,4]}})
This matches arrays that contain all of the listed values. result2 can be:
{ "_id" : ObjectId("51b05a722961f4704684daf1"), "x" : [ 3, 4, 5, 6 ] }
{ "_id" : ObjectId("51b05c332961f4704684dce4"), "x" : [ 3, 4, 5 ] }
{ "_id" : ObjectId("51b05c772961f4704684dce5"), "x" : [ 3, 4, 5, 6, 7 ] }
query3: db.t.find({x:{$in:[3,4]}})
This matches arrays that contain at least one of the listed values. result3 would look like:
{ "_id" : ObjectId("51b05a722961f4704684daf1"), "x" : [ 3, 4, 5, 6 ] }
{ "_id" : ObjectId("51b05a722961f4704684daf2"), "x" : [ 4, 5, 6, 7 ] }
See also this question: mongodb array matching
There is an open, unresolved ticket for a $subset operator, which would likely do what you want.
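In the absence of a dedicated operator, one workaround that is sometimes used is to negate an $elemMatch on $nin, which reads as "no element of foo lies outside the given array". The sketch below only defines such a query object; the helper isSubset is hypothetical (not a MongoDB API) and restates the same condition in plain JavaScript so it can be tried without a server. Note that, as an assumption worth verifying on your data, a query of this shape would also match documents where foo is empty or missing.

```javascript
const allowed = [2, 3, 4];

// Query sketch for mongosh: db.collection.find(subsetQuery)
// Meaning: there is no element of foo that is not in the allowed list.
const subsetQuery = { foo: { $not: { $elemMatch: { $nin: allowed } } } };

// Hypothetical helper (not a MongoDB API): the same condition
// expressed in plain JavaScript for in-memory documents.
function isSubset(doc, allowedValues) {
  return doc.foo.every((value) => allowedValues.includes(value));
}

const docs = [{ foo: [1, 2] }, { foo: [2, 3] }];
console.log(JSON.stringify(docs.filter((doc) => isSubset(doc, allowed))));
```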