MongoDB - transform array field to set

I have documents with a field containing an array of values, some of which may be duplicated. I want to add an extra field to each document holding the unique values of that array. I tried aggregate + $addToSet without success.
Data:
{..., "random_integers" : [1, 1, 2, 2, 3, 3]},
{..., "random_integers" : [2, 3, 4, 4, 5, 6]},
{..., "random_integers" : [9, 9, 8, 8, 7, 7]}
Expecting:
{
...
"random_integers" : [1, 1, 2, 2, 3, 3],
"unique_integers" : [1, 2, 3],
},
{
...
"random_integers" : [2, 3, 4, 4, 5, 6],
"unique_integers" : [2, 3, 4, 5, 6],
},
{
...
"random_integers" : [9, 9, 8, 8, 7, 7],
"unique_integers" : [7, 8, 9],
}
Attempt with aggregate + $addToSet:
# Query
db.getCollection().aggregate([
{
$group: {
_id: '$_id',
unique_integers: {$addToSet: '$random_integers' }
}
}
])
# Results
{..., "unique_integers" : [[1, 1, 2, 2, 3, 3]]},
{..., "unique_integers" : [[2, 3, 4, 4, 5, 6]]},
{..., "unique_integers" : [[9, 9, 8, 8, 7, 7]]}
$addToSet adds the whole list to the set as a single element, instead of adding each element of the array. I tried to combine $addToSet with $each, but MongoDB does not recognize it inside $group:
# Query
db.getCollection().aggregate([
{
$group: {
_id: '$_id',
unique_integers: {$addToSet: { $each: '$random_integers' }}
}
}
])
# Error
Error: command failed: {
"ok" : 0,
"errmsg" : "Unrecognized expression '$each'",
"code" : 168,
"codeName" : "InvalidPipelineOperator"
} : aggregate failed

db.ints.aggregate( [
{ $project: {
random_integers: 1,
unique_integers: { $setIntersection: [ "$random_integers", "$random_integers" ] },
_id: 0
} }
] )
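A variation on the pipeline above, as a minimal sketch: $addFields keeps every existing field and adds unique_integers, and $setUnion with an empty array drops duplicates much like intersecting the array with itself. The collection name ints comes from the answer; the variable name pipeline is only for illustration. Set operators do not guarantee the order of the resulting elements, so sort afterwards if order matters.

```javascript
// Sketch: add a de-duplicated copy of random_integers to each document.
// $setUnion treats its inputs as sets, so duplicates are dropped;
// the order of the resulting elements is not guaranteed.
const pipeline = [
  {
    $addFields: {
      unique_integers: { $setUnion: ["$random_integers", []] }
    }
  }
];

// Only runs inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  db.ints.aggregate(pipeline).forEach(doc => printjson(doc));
}
```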

Seq sortWith function with strange behaviour

I was trying to sort elements of a Seq object with the sortWith function when I got an exception. I didn't use the sorted function because the code below is a simplification of the real code where the seq has tuples instead of ints.
See below that in the last two cases, when comparing with (v1 <= v2) an exception is thrown, but when comparing with (v1 < v2) no exception is thrown.
heitor@heitor-340XAA-350XAA-550XAA:~$ sbt console
[info] welcome to sbt 1.6.2 (Ubuntu Java 11.0.11)
[info] loading settings for project global-plugins from sbt-updates.sbt ...
[info] loading global plugins from /home/heitor/.sbt/1.0/plugins
[info] loading project definition from /home/heitor/project
[info] loading settings for project root from build.sbt ...
[info] set current project to example (in build file:/home/heitor/)
[info] Starting scala interpreter...
Welcome to Scala 2.13.8 (OpenJDK 64-Bit Server VM, Java 11.0.11).
Type in expressions for evaluation. Or try :help.
scala> val lst69 = List(1, 10, 4, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 3, 1, 4, 10, 1, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1)
val lst69: List[Int] = List(1, 10, 4, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 3, 1, 4, 10, 1, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1)
scala> lst69.size
val res0: Int = 69
scala> val lst68 = lst69.take(68)
val lst68: List[Int] = List(1, 10, 4, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 3, 1, 4, 10, 1, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1)
scala> lst68.size
val res1: Int = 68
scala> lst68.sorted
val res2: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 10, 10)
scala> lst68.sortWith{ case (v1,v2) => (v1 <= v2) }
val res3: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 10, 10)
scala> lst69.sorted
val res4: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 10, 10)
scala> lst69.sortWith{ case (v1,v2) => (v1 <= v2) }
java.lang.IllegalArgumentException: Comparison method violates its general contract!
at java.base/java.util.TimSort.mergeHi(TimSort.java:903)
at java.base/java.util.TimSort.mergeAt(TimSort.java:520)
at java.base/java.util.TimSort.mergeForceCollapse(TimSort.java:461)
at java.base/java.util.TimSort.sort(TimSort.java:254)
at java.base/java.util.Arrays.sort(Arrays.java:1441)
at scala.collection.SeqOps.sorted(Seq.scala:700)
at scala.collection.SeqOps.sorted$(Seq.scala:692)
at scala.collection.immutable.List.scala$collection$immutable$StrictOptimizedSeqOps$$super$sorted(List.scala:79)
at scala.collection.immutable.StrictOptimizedSeqOps.sorted(StrictOptimizedSeqOps.scala:78)
at scala.collection.immutable.StrictOptimizedSeqOps.sorted$(StrictOptimizedSeqOps.scala:78)
at scala.collection.immutable.List.sorted(List.scala:79)
at scala.collection.SeqOps.sortWith(Seq.scala:727)
at scala.collection.SeqOps.sortWith$(Seq.scala:727)
at scala.collection.AbstractSeq.sortWith(Seq.scala:1161)
... 59 elided
scala> lst69.sortWith{ case (v1,v2) => (v1 < v2) }
val res6: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 10, 10)
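sortWith expects a strict "less than" function. With (v1 <= v2) the function returns true for two equal elements in both directions, so the derived ordering is inconsistent and TimSort can report a contract violation. TimSort only detects the problem on some inputs, which is why lst68 happens to sort without an exception. Keep the comparison strict:
lst69.sortWith{ case (v1,v2) => (v1 < v2) }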

What does $$this mean in MongoDB Aggregation?

In a MongoDB aggregation example I've encountered the expression $$this, but I cannot find a reference to it in the MongoDB documentation (not even in the documentation about aggregation variables).
Here is the sample data:
{ "_id" : 1, "actions" : [ 2, 6, 3, 8, 5, 3 ] }
{ "_id" : 2, "actions" : [ 6, 4, 2, 8, 4, 3 ] }
{ "_id" : 3, "actions" : [ 6, 4, 6, 4, 3 ] }
{ "_id" : 4, "actions" : [ 6, 8, 3 ] }
{ "_id" : 5, "actions" : [ 6, 8 ] }
{ "_id" : 6, "actions" : [ 6, 3, 11, 8, 3 ] }
{ "_id" : 7, "actions" : [ 6, 3, 8 ] }
Here is the code I'm looking at:
db.test.aggregate([
{$match:{actions:{$all:[6,3,8]}}},
{$project:{actions638:{$map:{
input:{$range:[0,{$subtract:[{$size:"$actions"},2]}]},
in:{$slice:["$actions","$$this",3]}
}}}}
])
And here is the output:
{ "_id" : 1, "actions638" : [ [ 2, 6, 3 ], [ 6, 3, 8 ], [ 3, 8, 5 ], [ 8, 5, 3 ] ] }
{ "_id" : 2, "actions638" : [ [ 6, 4, 2 ], [ 4, 2, 8 ], [ 2, 8, 4 ], [ 8, 4, 3 ] ] }
{ "_id" : 4, "actions638" : [ [ 6, 8, 3 ] ] }
{ "_id" : 6, "actions638" : [ [ 6, 3, 11 ], [ 3, 11, 8 ], [ 11, 8, 3 ] ] }
{ "_id" : 7, "actions638" : [ [ 6, 3, 8 ] ] }
$$this refers to the current element of the input array being processed by the $map operator. It is the default variable name when no as is given.
An alternative is to set the as property, so that instead of $$this you refer to the name you provided. For example (from the docs):
db.grades.aggregate(
[
{ $project:
{ adjustedGrades:
{
$map:
{
input: "$quizzes",
as: "grade",
in: { $add: [ "$$grade", 2 ] }
}
}
}
}
]
)
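To make the role of $$this concrete, here is a plain JavaScript mirror of the projection from the question, offered as a rough sketch rather than what the server actually executes. The function name slidingWindows and its width parameter are made up for this illustration; the callback parameter start stands in for $$this.

```javascript
// Plain JavaScript mirror of the $map/$range/$slice projection from the question.
// The callback parameter `start` plays the role of $$this:
// it is bound to each element of the input array in turn.
function slidingWindows(actions, width) {
  // $range: [0, size - 2] yields 0, 1, ..., size - 3 (the end is exclusive)
  const count = Math.max(actions.length - (width - 1), 0);
  const starts = Array.from({ length: count }, (_, i) => i);
  // $slice: ["$actions", "$$this", 3] takes `width` elements from position `start`
  return starts.map(start => actions.slice(start, start + width));
}

console.log(JSON.stringify(slidingWindows([2, 6, 3, 8, 5, 3], 3)));
// prints [[2,6,3],[6,3,8],[3,8,5],[8,5,3]]
```

The printed value matches the actions638 array shown for _id 1 in the question's output.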

Find entry's count of array elements in each array in documents

I have a collection with documents like these:
[
{p: [1, 2, 3, 4]},
{p: [1, 2, 7, 9, 10]},
{p: [3, 5]}
]
I want to know how many times each element of p appears across the p arrays of all documents. The result should be a collection with elements like these:
[
{pElement: 1, count: 2},
{pElement: 2, count: 2},
{pElement: 3, count: 2},
{pElement: 4, count: 1},
{pElement: 7, count: 1},
{pElement: 9, count: 1},
{pElement: 10, count: 1},
{pElement: 5, count: 1}
]
How can I achieve that?
You should use an aggregation pipeline with the following stages:
Decompose the p arrays and generate one document for each element. You can use the $unwind operator to do that.
Group the generated documents by their p value and count the occurrences of each one using the $group operator and the $sum accumulator.
Reshape the result of the previous stage to look like {pElement: p, count: c} using the $project operator.
Sort the documents by count using the $sort operator.
The final aggregation code would look like:
db.collectionName.aggregate([
{ $unwind: "$p" },
{ $group: { _id: "$p", count: { $sum: 1 } } },
{ $project: { _id: 0, pElement: "$_id", count: 1 } },
{ $sort: { count: -1 } }
])
The result would be:
{ "count" : 2, "pElement" : 3 }
{ "count" : 2, "pElement" : 2 }
{ "count" : 2, "pElement" : 1 }
{ "count" : 1, "pElement" : 5 }
{ "count" : 1, "pElement" : 10 }
{ "count" : 1, "pElement" : 9 }
{ "count" : 1, "pElement" : 7 }
{ "count" : 1, "pElement" : 4 }
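For intuition, the stages above can be mirrored in memory with plain JavaScript. This is only a sketch of what each stage computes, not how the server runs it; the function name countElements is invented for the example and the documents are the sample data from the question.

```javascript
// In-memory mirror of the pipeline stages, using the sample documents from the question.
const docs = [
  { p: [1, 2, 3, 4] },
  { p: [1, 2, 7, 9, 10] },
  { p: [3, 5] }
];

function countElements(documents) {
  // $unwind: one value per array element
  const values = documents.flatMap(doc => doc.p);
  // $group with $sum: 1 -> tally per distinct value
  const tally = new Map();
  for (const value of values) {
    tally.set(value, (tally.get(value) || 0) + 1);
  }
  // $project + $sort: reshape, then order by count, highest first
  return [...tally]
    .map(([pElement, count]) => ({ pElement, count }))
    .sort((a, b) => b.count - a.count);
}

console.log(countElements(docs));
```

As in the pipeline, elements with the same count come back in no particular guaranteed order relative to the server's output.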

How to ensureIndex a list of geometry

I have a mongodb collection as below
{
"_id" : ObjectId(...),
"gemetryCollectionId" : 1,
"geometry" : [{
"type" : "Polygon",
"coordinates" : [[[2, 3], [4, 4], [4, 3], [2, 3]]]
}]
}
How do I create an index on the geometry list?
It doesn't work if I do it like this:
db.collectionName.ensureIndex({"geometry":"2dsphere"});
You are storing geometry as an array. Try storing it as a single object and then creating the index, something like this:
{
"_id" : ObjectId(...),
"gemetryCollectionId" : 1,
"geometry" : {
"type" : "Polygon",
"coordinates" : [[[2, 3], [4, 4], [4, 3], [2, 3]]]
}
}
The index creation should then work.

MongoDB - select document where all values in field array are present in a given array

I have documents like
{
foo : [1, 2]
}
{
foo : [2, 3]
}
Given an array like
[2, 3, 4]
How would I select only the second document? i.e. select only the documents where all the values in foo match values in a given array.
There are several ways to match arrays, but no single operator does exactly what you need.
Suppose you have documents like:
{ "_id" : ObjectId("51b05a712961f4704684d901"), "x" : [ 6, 7, 8, 9 ] }
{ "_id" : ObjectId("51b05a712961f4704684d902"), "x" : [ 7, 8, 9, 10 ] }
{ "_id" : ObjectId("51b05a712961f4704684d903"), "x" : [ 8, 9, 10, 11 ] }
You can use query1, an exact array match:
db.collection.find({x:[3,4,5,6]})
This matches only documents whose x array is exactly equal to the given array.
result1:
{ "_id" : ObjectId("51b05a712961f4704684d8fe"), "x" : [ 3, 4, 5, 6 ] }
query1 will not match:
{ "_id" : ObjectId("51b05a712961f4704684d8fe"), "x" : [ 3, 4, 5] }
{ "_id" : ObjectId("51b05a712961f4704684d8fe"), "x" : [ 3, 4, 5, 6, 7] }
You can use query2, which matches arrays containing all of the given values:
db.t.find({x:{$all:[3,4]}})
result2 can be:
{ "_id" : ObjectId("51b05a722961f4704684daf1"), "x" : [ 3, 4, 5, 6 ] }
{ "_id" : ObjectId("51b05c332961f4704684dce4"), "x" : [ 3, 4, 5 ] }
{ "_id" : ObjectId("51b05c772961f4704684dce5"), "x" : [ 3, 4, 5, 6, 7 ] }
You can use query3, which matches arrays containing at least one of the given values:
db.t.find({x:{$in:[3,4]}})
Result3 would look like:
{ "_id" : ObjectId("51b05a722961f4704684daf1"), "x" : [ 3, 4, 5, 6 ] }
{ "_id" : ObjectId("51b05a722961f4704684daf2"), "x" : [ 4, 5, 6, 7 ] }
See also this question: mongodb array matching
There is also an open, unresolved ticket for a $subset operator, which would do what you likely want.
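Until such an operator exists, one workaround that is often suggested is to ask for documents in which no element of the array falls outside the given list, by combining $not, $elemMatch and $nin. The sketch below is one way to write it; the helper name subsetFilter and the collection name coll are hypothetical. Note that this filter should also match documents where the field is missing or is an empty array, so add an extra condition if those need to be excluded.

```javascript
// Hypothetical helper: build a filter matching documents whose `field` array
// contains no element outside `allowed`.
function subsetFilter(field, allowed) {
  // $nin inside $elemMatch finds an element that is NOT in `allowed`;
  // $not then keeps only documents that have no such element.
  return { [field]: { $not: { $elemMatch: { $nin: allowed } } } };
}

const filter = subsetFilter("foo", [2, 3, 4]);

// Only runs inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  db.coll.find(filter).forEach(doc => printjson(doc));
}
```

With the two documents from the question, { foo: [1, 2] } contains 1, which is outside [2, 3, 4], so only { foo: [2, 3] } would be expected to match.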