How to specify how much of the prior in t-1 goes onto the prior for t - winbugs

I want to be able to specify how much of the "alpha" in t-1 goes onto the prior for t. I tried alpha[t,1:k] <- theta[t-1,1:k]*alpha0*n[t-1]; alpha0<-.5, but it didn't work.
model {
for(t in 1:T){
alpha[t,1:k] ~ ddirch(theta[1:k])
#alpha[t,1:k] <- theta[t-1,1:k]*alpha0*n[t-1]; alpha0<-.5
y[t,1:k] ~ dmulti(alpha[t,1:k], n[t])
#N <- sum(y[1:k])
delta[t]<-step(alpha[t,1]-alpha[t,2]-alpha[t,3]-alpha[t,4]-alpha[t,6])
}
#gamma prior for the alpha vector
for(i in 1:k){
theta[i] ~ dgamma(0,.01)
}
}
# DATA
list(k=6, T=4,
n = c(600,600,600, 600),
y= structure(.Data=c(
100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100),
.Dim = c(4, 6)) )
# INITIALIZATION: Informative Prior
list(theta=c(.270,.160,.270,.09,.02,.08))

Related

In Spark, how do I transform my RDD into a list of differences between RDD items?

Suppose I have an RDD of integers that looks like this:
10, 20, 30, 40, 50, 60, 70, 80 ...
(ie there is a stream of different integers)
and modify the RDD so it looks like this:
15, 25, 35, 45, 55, 65, 75, 85...
(ie each item on the RDD is the difference of of the two RDDs above.)
My question is: In Spark, how do I transform my RDD into a list of differences between RDD items?
You can take help of rdd's sliding function. like below
import org.apache.spark.mllib.rdd.RDDFunctions._
val rdd=sc.parallelize(List(10, 20, 30, 40, 50, 60, 70, 80))
rdd.sliding(2).map(_.sum/2).collect
//output
res14: Array[Int] = Array(15, 25, 35, 45, 55, 65, 75)

Nested Looping over dynamic list of list in scala [duplicate]

Given the following list:
List(List(1,2,3), List(4,5))
I would like to generate all the possible combinations. Using yield, it can be done as follows:
scala> for (x <- l.head; y <- l.last) yield (x,y)
res17: List[(Int, Int)] = List((1,4), (1,5), (2,4), (2,5), (3,4), (3,5))
But the problem I have is that the List[List[Int]] is not fixed; it can grow and shrink in size, so I never know how many for loops I will need in advance. What I would like is to be able to pass that list into a function which will dynamically generate the combinations regardless of the number of lists I have, so:
def generator (x : List[List[Int]) : List[List[Int]]
Is there a built-in library function that can do this. If not how do I go about doing this. Any pointers and hints would be great.
UPDATE:
The answer by #DNA blows the heap with the following (not so big) nested List structure:
List(
List(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300),
List(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300),
List(0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300),
List(0, 50, 100, 150, 200, 250, 300),
List(0, 100, 200, 300),
List(0, 200),
List(0)
)
Calling the generator2 function as follows:
generator2(
List(
List(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300),
List(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300),
List(0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300),
List(0, 50, 100, 150, 200, 250, 300),
List(0, 100, 200, 300),
List(0, 200),
List(0)
)
)
Is there a way to generate the cartesian product without blowing the heap?
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at scala.LowPriorityImplicits.wrapRefArray(LowPriorityImplicits.scala:73)
at recfun.Main$.recfun$Main$$generator$1(Main.scala:82)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at recfun.Main$.recfun$Main$$generator$1(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at recfun.Main$.recfun$Main$$generator$1(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at recfun.Main$.recfun$Main$$generator$1(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
Here's a recursive solution:
def generator(x: List[List[Int]]): List[List[Int]] = x match {
case Nil => List(Nil)
case h :: _ => h.flatMap(i => generator(x.tail).map(i :: _))
}
which produces:
val a = List(List(1, 2, 3), List(4, 5))
val b = List(List(1, 2, 3), List(4, 5), List(6, 7))
generator(a) //> List(List(1, 4), List(1, 5), List(2, 4),
//| List(2, 5), List(3, 4), List(3, 5))
generator(b) //> List(List(1, 4, 6), List(1, 4, 7), List(1, 5, 6),
//| List(1, 5, 7), List(2, 4, 6), List(2, 4, 7),
//| List(2, 5, 6), List(2, 5, 7), Listt(3, 4, 6),
//| List(3, 4, 7), List(3, 5, 6), List(3, 5, 7))
Update: the second case can also be written as a for comprehension, which may be a little clearer:
def generator2(x: List[List[Int]]): List[List[Int]] = x match {
case Nil => List(Nil)
case h :: t => for (j <- generator2(t); i <- h) yield i :: j
}
Update 2: for larger datasets, if you run out of memory, you can use Streams instead (if it makes sense to process the results incrementally). For example:
def generator(x: Stream[Stream[Int]]): Stream[Stream[Int]] =
if (x.isEmpty) Stream(Stream.empty)
else x.head.flatMap(i => generator(x.tail).map(i #:: _))
// NB pass in the data as Stream of Streams, not List of Lists
generator(input).take(3).foreach(x => println(x.toList))
>List(0, 0, 0, 0, 0, 0, 0)
>List(0, 0, 0, 0, 0, 200, 0)
>List(0, 0, 0, 0, 100, 0, 0)
Feels like your problem can be described in terms of recursion:
If you have n lists of int:
list1 of size m and list2, ... list n
generate the X combinations for list2 to n (so n-1 lists)
for each combination, you generate m new ones for each value of list1.
the base case is a list of one list of int, you just split all the elements in singleton lists.
so with List(List(1,2), List(3), List(4, 5))
the result of your recursive call is List(List(3,4),List(3,5)) and for each you add 2 combinations: List(1,3,4), List(2,3,4), List(1,3,5), List(2,3,5).
Ezekiel has exactly what I was looking for. This is just a minor tweak of it to make it generic.
def generateCombinations[T](x: List[List[T]]): List[List[T]] = {
x match {
case Nil => List(Nil)
case h :: _ => h.flatMap(i => generateCombinations(x.tail).map(i :: _))
}
}
Here is another solution based on Ezekiel's one. More verbose, but it's tail recursion (stack-safe).
def generateCombinations[A](in: List[List[A]]): List[List[A]] =
generate(in, List.empty)
#tailrec
private def generate[A](in: List[List[A]], acc: List[List[A]]): List[List[A]] = in match {
case Nil => acc
case head :: tail => generate(tail, generateAcc(acc, head))
}
private def generateAcc[A](oldAcc: List[List[A]], as: List[A]): List[List[A]] = {
oldAcc match {
case Nil => as.map(List(_))
case nonEmptyAcc =>
for {
a <- as
xs <- nonEmptyAcc
} yield a :: xs
}
}
I realize this is old, but it seems like no other answer provided the non-recursive solution with fold.
def generator[A](xs: List[List[A]]): List[List[A]] = xs.foldRight(List(List.empty[A])) { (next, combinations) =>
for (a <- next; as <- combinations) yield a +: as
}

How to get first 100 prime numbers in scala as i got result but it dispays blank where it's not found

output : prime numbers
2
3
()
5
()
7
()
()
i want as
2
3
5
7
def primeNumber(range: Int): Unit ={
val primeNumbers: immutable.IndexedSeq[AnyVal] =
for (number :Int <- 2 to range) yield{
var isPrime = true
for(checker : Int <- 2 to Math.sqrt(number).toInt if number%checker==0 if isPrime) isPrime = false
if(isPrime) number
}
println("prime numbers")
for(prime <- primeNumbers)
println(prime)
}
so the underlying problem here is that your yield block effectively will return an Int or a Unit depending on isPrime this leads your collection to be of type AnyVal because that's pretty much the least upper bound that can represent both types. Unit is a type only inhabited by one value which is represented as an empty set of round brackets in scala () so that's what you see in your list.
As Puneeth Reddy V said you can use collect to filter out all the non-Int values but I think that is a suboptimal approach (partial functions are often considered a code-smell depending on what type of scala-style you do). More idiomatic would be to rethink your loop (such for loops are scarcely used in scala) and this could be definitely be done using a foldLeft operation maybe even something else.
You can use collect on your output
primeNumbers.collect{
case i : Int => i
}
res2: IndexedSeq[Int] = Vector(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97)
The reason is if else returning two types of value one is prime value and another is empty. You could return 0 in else case and filter it before printing it.
scala> def primeNumber(range: Int): Unit ={
|
| val primeNumbers: IndexedSeq[Int] =
|
| for (number :Int <- 2 to range) yield{
|
| var isPrime = true
|
| for(checker : Int <- 2 to Math.sqrt(number).toInt if number%checker==0 if isPrime) isPrime = false
|
| if(isPrime) number
| else
| 0
| }
|
| println("prime numbers" + primeNumbers)
| for(prime <- primeNumbers.filter(_ > 0))
| println(prime)
| }
primeNumber: (range: Int)Unit
scala> primeNumber(10)
prime numbersVector(2, 3, 0, 5, 0, 7, 0, 0, 0)
2
3
5
7
But we should not write the code the way you have written it. You are using mutable variables. Try to write code in an immutable way.
For example
scala> def isPrime(number: Int) =
| number > 1 && !(2 to number - 1).exists(e => e % number == 0)
isPrime: (number: Int)Boolean
scala> def generatePrimes(starting: Int): Stream[Int] = {
| if(isPrime(starting))
| starting #:: generatePrimes(starting + 1)
| else
| generatePrimes(starting + 1)
| }
generatePrimes: (starting: Int)Stream[Int]
scala> generatePrimes(2).take(100).toList
res12: List[Int] = List(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101)

Generating all possible combinations from a List[List[Int]] in Scala

Given the following list:
List(List(1,2,3), List(4,5))
I would like to generate all the possible combinations. Using yield, it can be done as follows:
scala> for (x <- l.head; y <- l.last) yield (x,y)
res17: List[(Int, Int)] = List((1,4), (1,5), (2,4), (2,5), (3,4), (3,5))
But the problem I have is that the List[List[Int]] is not fixed; it can grow and shrink in size, so I never know how many for loops I will need in advance. What I would like is to be able to pass that list into a function which will dynamically generate the combinations regardless of the number of lists I have, so:
def generator (x : List[List[Int]) : List[List[Int]]
Is there a built-in library function that can do this. If not how do I go about doing this. Any pointers and hints would be great.
UPDATE:
The answer by #DNA blows the heap with the following (not so big) nested List structure:
List(
List(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300),
List(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300),
List(0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300),
List(0, 50, 100, 150, 200, 250, 300),
List(0, 100, 200, 300),
List(0, 200),
List(0)
)
Calling the generator2 function as follows:
generator2(
List(
List(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300),
List(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300),
List(0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300),
List(0, 50, 100, 150, 200, 250, 300),
List(0, 100, 200, 300),
List(0, 200),
List(0)
)
)
Is there a way to generate the cartesian product without blowing the heap?
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at scala.LowPriorityImplicits.wrapRefArray(LowPriorityImplicits.scala:73)
at recfun.Main$.recfun$Main$$generator$1(Main.scala:82)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at recfun.Main$.recfun$Main$$generator$1(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at recfun.Main$.recfun$Main$$generator$1(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at recfun.Main$.recfun$Main$$generator$1(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
Here's a recursive solution:
def generator(x: List[List[Int]]): List[List[Int]] = x match {
case Nil => List(Nil)
case h :: _ => h.flatMap(i => generator(x.tail).map(i :: _))
}
which produces:
val a = List(List(1, 2, 3), List(4, 5))
val b = List(List(1, 2, 3), List(4, 5), List(6, 7))
generator(a) //> List(List(1, 4), List(1, 5), List(2, 4),
//| List(2, 5), List(3, 4), List(3, 5))
generator(b) //> List(List(1, 4, 6), List(1, 4, 7), List(1, 5, 6),
//| List(1, 5, 7), List(2, 4, 6), List(2, 4, 7),
//| List(2, 5, 6), List(2, 5, 7), Listt(3, 4, 6),
//| List(3, 4, 7), List(3, 5, 6), List(3, 5, 7))
Update: the second case can also be written as a for comprehension, which may be a little clearer:
def generator2(x: List[List[Int]]): List[List[Int]] = x match {
case Nil => List(Nil)
case h :: t => for (j <- generator2(t); i <- h) yield i :: j
}
Update 2: for larger datasets, if you run out of memory, you can use Streams instead (if it makes sense to process the results incrementally). For example:
def generator(x: Stream[Stream[Int]]): Stream[Stream[Int]] =
if (x.isEmpty) Stream(Stream.empty)
else x.head.flatMap(i => generator(x.tail).map(i #:: _))
// NB pass in the data as Stream of Streams, not List of Lists
generator(input).take(3).foreach(x => println(x.toList))
>List(0, 0, 0, 0, 0, 0, 0)
>List(0, 0, 0, 0, 0, 200, 0)
>List(0, 0, 0, 0, 100, 0, 0)
Feels like your problem can be described in terms of recursion:
If you have n lists of int:
list1 of size m and list2, ... list n
generate the X combinations for list2 to n (so n-1 lists)
for each combination, you generate m new ones for each value of list1.
the base case is a list of one list of int, you just split all the elements in singleton lists.
so with List(List(1,2), List(3), List(4, 5))
the result of your recursive call is List(List(3,4),List(3,5)) and for each you add 2 combinations: List(1,3,4), List(2,3,4), List(1,3,5), List(2,3,5).
Ezekiel has exactly what I was looking for. This is just a minor tweak of it to make it generic.
def generateCombinations[T](x: List[List[T]]): List[List[T]] = {
x match {
case Nil => List(Nil)
case h :: _ => h.flatMap(i => generateCombinations(x.tail).map(i :: _))
}
}
Here is another solution based on Ezekiel's one. More verbose, but it's tail recursion (stack-safe).
def generateCombinations[A](in: List[List[A]]): List[List[A]] =
generate(in, List.empty)
#tailrec
private def generate[A](in: List[List[A]], acc: List[List[A]]): List[List[A]] = in match {
case Nil => acc
case head :: tail => generate(tail, generateAcc(acc, head))
}
private def generateAcc[A](oldAcc: List[List[A]], as: List[A]): List[List[A]] = {
oldAcc match {
case Nil => as.map(List(_))
case nonEmptyAcc =>
for {
a <- as
xs <- nonEmptyAcc
} yield a :: xs
}
}
I realize this is old, but it seems like no other answer provided the non-recursive solution with fold.
def generator[A](xs: List[List[A]]): List[List[A]] = xs.foldRight(List(List.empty[A])) { (next, combinations) =>
for (a <- next; as <- combinations) yield a +: as
}

Distinct() command used with skip() and limit()

I have those items in my MongoDB collection:
{x: 1, y: 60, z:100}
{x: 1, y: 60, z:100}
{x: 1, y: 60, z:100}
{x: 2, y: 60, z:100}
{x: 2, y: 60, z:100}
{x: 3, y: 60, z:100}
{x: 4, y: 60, z:100}
{x: 4, y: 60, z:100}
{x: 5, y: 60, z:100}
{x: 6, y: 60, z:100}
{x: 6, y: 60, z:100}
{x: 6, y: 60, z:100}
{x: 7, y: 60, z:100}
{x: 7, y: 60, z:100}
I want to query the distinct values of x (i.e. [1, 2, 3, 4, 5, 6, 7]) ... but I only want a part of them (similar to what we can obtain with skip(a) and limit(b)).
How do I do that with the java driver of MongoDB (or with spring-data-mongodb if possible) ?
in mongo shell is simple with aggregate framework:
db.collection.aggregate([{$group:{_id:'$x'}}, {$skip:3}, {$limit:5}])
for java look: use aggregation framework in java
Depending on your use case, you may find this approach to be more performant than aggregation. Here's a mongo shell example function.
function getDistinctValues(skip, limit) {
var q = {x:{$gt: MinKey()}}; // query
var s = {x:1}; // sort key
var results = [];
for(var i = 0; i < skip; i++) {
var result = db.test.find(q).limit(1).sort(s).toArray()[0];
if(!result) {
return results;
}
q.x.$gt = result.x;
}
for(var i = 0; i < limit; i++) {
var result = db.test.find(q).limit(1).sort(s).toArray()[0];
if(!result) {
break;
}
results.push(result.x);
q.x.$gt = result.x;
}
return results;
}
We are basically just finding the values one at a time, using the query and sort to skip past values we have already seen. You can easily improve on this by adding more arguments to make the function more flexible. Also, creating an index on the property you want to find distinct values for will improve performance.
A less obvious improvement would be to skip the "skip" phase all together and specify a value to continue from. Here's a mongo shell example function.
function getDistinctValues(limit, lastValue) {
var q = {x:{$gt: lastValue === undefined ? MinKey() : lastValue}}; // query
var s = {x:1}; // sort key
var results = [];
for(var i = 0; i < limit; i++) {
var result = db.test.find(q).limit(1).sort(s).toArray()[0];
if(!result) {
break;
}
results.push(result.x);
q.x.$gt = result.x;
}
return results;
}
If you do decide to go with the aggregation technique, make sure you add a $sort stage after the $group stage. Otherwise your results will not show up in a predictable order.