I have a stream of unordered measurements that I'd like to group into batches of a fixed size, so that I can persist them efficiently later:
val measurements = for {
id <- Seq("foo", "bar", "baz")
value <- 1 to 5
} yield (id, value)
fs2.Stream.emits(scala.util.Random.shuffle(measurements)).toVector
That is, instead of:
(bar,4)
(foo,5)
(baz,3)
(baz,5)
(baz,4)
(foo,2)
(bar,2)
(foo,4)
(baz,1)
(foo,1)
(foo,3)
(bar,1)
(bar,5)
(bar,3)
(baz,2)
I'd like to have the following structure for a batch size equal to 3:
(bar,[4,2,1])
(foo,[5,2,4])
(baz,[3,5,4])
(baz,[1,2])
(foo,[1,3])
(bar,[5,3])
Is there a simple, idiomatic way to achieve this in FS2? I know there's a groupAdjacentBy function, but that only takes neighbouring items into account.
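For example (a quick sketch in fs2 1.x syntax, where groupAdjacentBy emits (key, Chunk) pairs):
import fs2.Stream

// Only consecutive elements with the same key get merged,
// so interleaved keys stay fragmented instead of forming batches:
Stream(("foo", 1), ("bar", 2), ("foo", 3)).groupAdjacentBy(_._1).toList
// List((foo,Chunk((foo,1))), (bar,Chunk((bar,2))), (foo,Chunk((foo,3))))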
I'm on 0.10.5 at the moment.
This can be achieved with fs2 Pull:
import cats.data.{NonEmptyList => Nel}
import fs2._
object GroupingByKey {
def groupByKey[F[_], K, V](limit: Int): Pipe[F, (K, V), (K, Nel[V])] = {
require(limit >= 1)
def go(state: Map[K, List[V]]): Stream[F, (K, V)] => Pull[F, (K, Nel[V]), Unit] = _.pull.uncons1.flatMap {
case Some(((key, num), tail)) =>
val prev = state.getOrElse(key, Nil)
if (prev.size == limit - 1) {
val group = Nel.ofInitLast(prev.reverse, num)
Pull.output1(key -> group) >> go(state - key)(tail)
} else {
go(state.updated(key, num :: prev))(tail)
}
case None =>
val chunk = Chunk.vector {
state
.toVector
.collect { case (key, last :: revInit) =>
val group = Nel.ofInitLast(revInit.reverse, last)
key -> group
}
}
Pull.output(chunk) >> Pull.done
}
go(Map.empty)(_).stream
}
}
Usage:
import cats.data.{NonEmptyList => Nel}
import cats.implicits._
import cats.effect.{ExitCode, IO, IOApp}
import fs2._
import GroupingByKey.groupByKey
object Answer extends IOApp {
type Key = String
override def run(args: List[String]): IO[ExitCode] = {
require {
Stream('a -> 1).through(groupByKey(2)).compile.toList ==
List('a -> Nel.one(1))
}
require {
Stream('a -> 1, 'a -> 2).through(groupByKey(2)).compile.toList ==
List('a -> Nel.of(1, 2))
}
require {
Stream('a -> 1, 'a -> 2, 'a -> 3).through(groupByKey(2)).compile.toList ==
List('a -> Nel.of(1, 2), 'a -> Nel.one(3))
}
val infinite = (for {
prng <- Stream.eval(IO { new scala.util.Random() })
keys <- Stream(Vector[Key]("a", "b", "c", "d", "e", "f", "g"))
key = Stream.eval(IO {
val i = prng.nextInt(keys.size)
keys(i)
})
num = Stream.eval(IO { 1 + prng.nextInt(9) })
} yield (key zip num).repeat).flatten
infinite
.through(groupByKey(3))
.showLinesStdOut
.compile
.drain
.as(ExitCode.Success)
}
}
I am trying to solve the Two Sum problem using Scala:
val list = List(1,2,3,4,5)
val map = collection.mutable.Map.empty[Int, Int]
val sum = 9
for {
i <- 0 until list.size
} yield {
map.get(sum - list(i)) match {
case None => map += (list(i) -> i)
case Some(previousIndex) => println(s" Indexes $previousIndex $i")
}
}
Can anyone suggest an O(n) solution in Scala that doesn't use a mutable map?
If you are trying to solve the "Two Sum" problem - meaning you need to find two numbers in the given list whose sum equals the given target - you can go with:
val list = List(1,2,3,4,5)
val sum = 9
val set = list.toSet
val solution = list.flatMap { item =>
val rest = sum - item
val min = Math.min(item, rest)
val max = Math.max(item, rest)
if (set(rest)) Some(min, max) else None
}.toSet
println(solution)
Printed result:
Set((4,5))
ScalaFiddle: https://scalafiddle.io/sf/LA6P3eh/0
UPDATE
The result is required to return indices, not values:
val list = List(1,2,3,4,5)
val sum = 9
val inputMap = list.zipWithIndex.toMap
val solution = list.zipWithIndex.flatMap { case (item, itemIndex) =>
inputMap.get(sum - item).map { restIndex =>
val minIndex = Math.min(itemIndex, restIndex)
val maxIndex = Math.max(itemIndex, restIndex)
minIndex -> maxIndex
}
}.toSet
println(solution)
Printout: Set((3,4))
ScalaFiddle: https://scalafiddle.io/sf/LA6P3eh/1
You can try something like the following to get the first result:
object Solution extends App {
def twoSums(xs: List[Int], target: Int): Option[(Int,Int)] = {
@annotation.tailrec def go(zipped: List[(Int,Int)], map: Map[Int,Int] = Map.empty): Option[(Int,Int)] = {
zipped match {
case Nil => None
case (ele, idx) :: tail =>
map.get(target - ele) match {
case Some(prevIdx) => Some((prevIdx, idx))
case None => go(tail, map + (ele -> idx))
}
}
}
go(xs.zipWithIndex)
}
val res = twoSums(List(1,2,3,4,5), 9)
println(res)
}
Or via foldLeft for all results:
object Solution extends App {
def twoSums(xs: List[Int], target: Int): List[(Int, Int)] = {
xs.zipWithIndex.foldLeft((Map.empty[Int,Int], List.empty[(Int,Int)])) {
case ((map, results), (ele, idx)) =>
map.get(target - ele) match {
case Some(prevIdx) =>(map, (prevIdx, idx) :: results)
case None => (map + (ele -> idx), results)
}
}
}._2
val res = twoSums(List(1,2,3,4,5), 9)
println(res)
}
I want to group a large Stream[F, A] into a Stream[Stream[F, A]] with at most n elements in each inner stream.
This is what I did: basically, I pipe chunks into a Queue[F, Queue[F, Chunk[A]]] and then yield the queue elements as the result stream.
implicit class StreamSyntax[F[_], A](s: Stream[F, A])(
implicit F: Concurrent[F]) {
def groupedPipe(
lastQRef: Ref[F, Queue[F, Option[Chunk[A]]]],
n: Int): Pipe[F, A, Stream[F, A]] = { in =>
val initQs =
Queue.unbounded[F, Option[Queue[F, Option[Chunk[A]]]]].flatMap { qq =>
Queue.bounded[F, Option[Chunk[A]]](1).flatMap { q =>
lastQRef.set(q) *> qq.enqueue1(Some(q)).as(qq -> q)
}
}
Stream.eval(initQs).flatMap {
case (qq, initQ) =>
def newQueue = Queue.bounded[F, Option[Chunk[A]]](1).flatMap { q =>
qq.enqueue1(Some(q)) *> lastQRef.set(q).as(q)
}
val evalStream = {
in.chunks
.evalMapAccumulate((0, initQ)) {
case ((i, q), c) if i + c.size >= n =>
val (l, r) = c.splitAt(n - i)
q.enqueue1(Some(l)) >> q.enqueue1(None) >> q
.enqueue1(None) >> newQueue.flatMap { nq =>
nq.enqueue1(Some(r)).as(((r.size, nq), c))
}
case ((i, q), c) if (i + c.size) < n =>
q.enqueue1(Some(c)).as(((i + c.size, q), c))
}
.attempt ++ Stream.eval {
lastQRef.get.flatMap { last =>
last.enqueue1(None) *> last.enqueue1(None)
} *> qq.enqueue1(None)
}
}
qq.dequeue.unNoneTerminate
.map(
q =>
q.dequeue.unNoneTerminate
.flatMap(Stream.chunk)
.onFinalize(
q.dequeueChunk(Int.MaxValue).unNoneTerminate.compile.drain))
.concurrently(evalStream)
}
}
def grouped(n: Int) = {
Stream.eval {
Queue.unbounded[F, Option[Chunk[A]]].flatMap { empty =>
Ref.of[F, Queue[F, Option[Chunk[A]]]](empty)
}
}.flatMap { ref =>
val p = groupedPipe(ref, n)
s.through(p)
}
}
}
But it is very complicated; is there any simpler way?
fs2 has chunkN and chunkLimit methods that can help with grouping:
stream.chunkN(n).map(Stream.chunk)
stream.chunkLimit(n).map(Stream.chunk)
chunkN produces chunks of size n until the end of the stream (the final chunk may be smaller), while chunkLimit only splits the existing chunks and can therefore produce chunks of varying size:
scala> Stream(1,2,3).repeat.chunkN(2).take(5).toList
res0: List[Chunk[Int]] = List(Chunk(1, 2), Chunk(3, 1), Chunk(2, 3), Chunk(1, 2), Chunk(3, 1))
scala> (Stream(1) ++ Stream(2, 3) ++ Stream(4, 5, 6)).chunkLimit(2).toList
res0: List[Chunk[Int]] = List(Chunk(1), Chunk(2, 3), Chunk(4, 5), Chunk(6))
In addition to the already mentioned chunkN, also consider using groupWithin (fs2 1.0.1):
def groupWithin[F2[x] >: F[x]](n: Int, d: FiniteDuration)(implicit timer: Timer[F2], F: Concurrent[F2]): Stream[F2, Chunk[O]]
Divides this stream into groups of elements received within a time window, or limited by the number of elements, whichever happens first. Empty groups, which can occur if no elements can be pulled from upstream in a given time window, will not be emitted.
Note: a time window starts each time downstream pulls.
I'm not sure why you'd want this to be nested streams, since the requirement is to have "at most n elements" in one batch - which implies that you're keeping track of a finite number of elements (which is exactly what a Chunk is for). Either way, a Chunk can always be represented as a Stream with Stream.chunk:
val chunks: Stream[F, Chunk[O]] = ???
val streamOfStreams: Stream[F, Stream[F, O]] = chunks.map(Stream.chunk)
Here's a complete example of how to use groupWithin:
import cats.implicits._
import cats.effect.{ExitCode, IO, IOApp}
import fs2._
import scala.concurrent.duration._
object GroupingDemo extends IOApp {
override def run(args: List[String]): IO[ExitCode] = {
Stream('a, 'b, 'c).covary[IO]
.groupWithin(2, 1.second)
.map(_.toList)
.showLinesStdOut
.compile.drain
.as(ExitCode.Success)
}
}
Outputs:
List('a, 'b)
List('c)
Finally, I use a more reliable version (using Hotswap to ensure queue termination), like this:
def grouped(
innerSize: Int
)(implicit F: Async[F]): Stream[F, Stream[F, A]] = {
type InnerQueue = Queue[F, Option[Chunk[A]]]
type OuterQueue = Queue[F, Option[InnerQueue]]
def swapperInner(swapper: Hotswap[F, InnerQueue], outer: OuterQueue) = {
val innerRes =
Resource.make(Queue.unbounded[F, Option[Chunk[A]]])(_.offer(None))
swapper.swap(innerRes).flatTap(q => outer.offer(q.some))
}
def loopChunk(
gathered: Int,
curr: Queue[F, Option[Chunk[A]]],
chunk: Chunk[A],
newInnerQueue: F[InnerQueue]
): F[(Int, Queue[F, Option[Chunk[A]]])] = {
if (gathered + chunk.size > innerSize) {
val (left, right) = chunk.splitAt(innerSize - gathered)
curr.offer(left.some) >> newInnerQueue.flatMap { nq =>
loopChunk(0, nq, right, newInnerQueue)
}
} else if (gathered + chunk.size == innerSize) {
curr.offer(chunk.some) >> newInnerQueue.tupleLeft(
0
)
} else {
curr.offer(chunk.some).as(gathered + chunk.size -> curr)
}
}
val prepare = for {
outer <- Resource.eval(Queue.unbounded[F, Option[InnerQueue]])
swapper <- Hotswap.create[F, InnerQueue]
} yield outer -> swapper
Stream.resource(prepare).flatMap {
case (outer, swapper) =>
val newInner = swapperInner(swapper, outer)
val background = Stream.eval(newInner).flatMap { initQueue =>
s.chunks
.filter(_.nonEmpty)
.evalMapAccumulate(0 -> initQueue) { (state, chunk) =>
val (gathered, curr) = state
loopChunk(gathered, curr, chunk, newInner).tupleRight({})
}
.onFinalize(swapper.clear *> outer.offer(None))
}
val foreground = Stream
.fromQueueNoneTerminated(outer)
.map(i => Stream.fromQueueNoneTerminatedChunk(i))
foreground.concurrently(background)
}
}
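A usage sketch (assuming the grouped method above is exposed as an extension on Stream[F, A], e.g. via an implicit class like the StreamSyntax in the question, and cats-effect 3's IOApp):
import cats.effect.{IO, IOApp}
import fs2.Stream

object GroupedDemo extends IOApp.Simple {
  def run: IO[Unit] =
    Stream.range(0, 10)
      .covary[IO]
      .grouped(3)                             // Stream[IO, Stream[IO, Int]], hypothetical extension method
      .evalMap(inner => inner.compile.toList) // drain each inner stream into a List
      .evalMap(batch => IO.println(batch))
      .compile
      .drain
}
// Should print one batch per line: List(0, 1, 2), List(3, 4, 5), List(6, 7, 8), List(9)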
In the following Scala code I have a sequence of curried functions with different signatures. I want to iterate through them and invoke them:
def intCheck(b: Int)(a: Int) = a == b
def stringCheck(b: String)(a: String) = a == b
def doubleCheck(b: Double)(a: Double) = a == b
val list = Seq(intCheck(1) _, stringCheck("a") _, doubleCheck(2.3) _)
for (f <- list) {
//if f is 1st function
f(2) // LINE 1
//if f is 2nd function
f("a") // LINE 2
//if f is 3rd function
f(2.0) // LINE 3
}
But for LINES 1, 2 & 3 I get a compilation error: "Type mismatch, expected: String with Int with Double, actual: Int". How can I force the compiler to skip the type check here if I am sure about the types?
I think this is one way...
sealed abstract class AnyChecker(val a:Any,val b:Any) {
def eval = a==b
}
class IntChecker(override val a:Int,override val b:Int) extends AnyChecker(a,b)
class DoubleChecker(override val a:Double,override val b:Double) extends AnyChecker(a,b)
class StringChecker(override val a:String,override val b:String) extends AnyChecker(a,b)
object IntChecker {
def apply(a:Int,b:Int) = new IntChecker(a,b)
def unapply(intChecker: IntChecker) = Some(intChecker.a,intChecker.b)
}
object DoubleChecker {
def apply(a:Double,b:Double) = new DoubleChecker(a,b)
def unapply(doubleChecker: DoubleChecker) = Some(doubleChecker.a,doubleChecker.b)
}
object StringChecker {
def apply(a:String,b:String) = new StringChecker(a,b)
def unapply(stringChecker: StringChecker) = Some(stringChecker.a,stringChecker.b)
}
val list = List(IntChecker(1,3), StringChecker("a","a"), DoubleChecker(2.3,3.1), StringChecker("a","b"), StringChecker("a","c"), StringChecker("x","x"))
for (f <- list) {
f match {
case a:IntChecker => println(s"a:${a.a}, b:${a.b}, ${a.eval}")
case a:DoubleChecker => println(s"a:${a.a}, b:${a.b}, ${a.eval}")
case StringChecker("a","a") => println("equals")
case StringChecker("a","b") => println("not equals")
case StringChecker("a",b) => println( StringChecker("a",b).eval)
case StringChecker(a,b) => println(StringChecker(a,b).eval)
}
}
Please check this; I am sure it will help you:
http://bplawler.tumblr.com/post/7493366722/scala-programming-unapply-and-case-classes
Also, another simple way is to use only AnyChecker:
class AnyChecker[T](val a: T, val b: T) {
def eval = a == b
override def toString = s"Checker($a,$b)"
}
object AnyChecker {
def apply[T](a: T, b: T) = new AnyChecker(a, b)
def unapply[T](doubleChecker: AnyChecker[T]) = Some(doubleChecker.a, doubleChecker.b)
}
val list2 = List(AnyChecker(1, 3), AnyChecker("a", "a"), AnyChecker(2.3, 3.1), AnyChecker("a", "b"), AnyChecker("a", "c"), AnyChecker("x", "x"))
for (checker <- list2) {
checker match {
case AnyChecker(1, 3) => println("ints")
case AnyChecker(2.3, 3.1) => println("doubles")
case AnyChecker("a","a") => println("double a")
case checker1: AnyChecker[Any] =>println(checker1)
}
}
Or, if you only want to write them as tuples:
val list3 = List((1, "1"), ("a", "a"), (2.3, 3.1), ("a", "b"), ("a", "c"), ("x", "x")).map(element=>AnyChecker(element._1,element._2))
for (checker <- list3) {
checker match {
case AnyChecker(1, 3) => println("ints")
case AnyChecker(2.3, 3.1) => println("doubles")
case AnyChecker("a","a") => println("double a")
case checker1: AnyChecker[Any] =>println(checker1.eval)
}
}
If I were splitting a string, I would be able to do
"123,456,789".split(",")
to get
Seq("123","456","789")
Thinking of a string as a sequence of characters, how could this be generalized to other sequences of objects?
val x = Seq(One(),Two(),Three(),Comma(),Five(),Six(),Comma(),Seven(),Eight(),Nine())
x.split {
  case _: Comma => true
  case _ => false
}
split in this case doesn't exist, but it reminds me of span, partition, and groupBy. Only span seems close, but it doesn't handle leading/trailing commas gracefully.
implicit class SplitSeq[T](seq: Seq[T]){
import scala.collection.mutable.ListBuffer
def split(sep: T): Seq[Seq[T]] = {
val buffer = ListBuffer(ListBuffer.empty[T])
seq.foreach {
case `sep` => buffer += ListBuffer.empty
case elem => buffer.last += elem
}; buffer.filter(_.nonEmpty)
}
}
It can then be used like x.split(Comma()).
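For instance, a quick sketch with Ints, using 0 as the separator (note that the final filter drops the empty groups produced by consecutive separators):
// Uses the implicit class above; 0 plays the role of the separator.
Seq(1, 2, 0, 3, 0, 0, 4).split(0)
// ListBuffer(ListBuffer(1, 2), ListBuffer(3), ListBuffer(4))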
The following is 'a' solution, not the most elegant -
def split[A](x: Seq[A], edge: A => Boolean): Seq[Seq[A]] = {
val init = (Seq[Seq[A]](), Seq[A]())
val (result, last) = x.foldLeft(init) { (cum, n) =>
val (total, prev) = cum
if (edge(n)) {
(total :+ prev, Seq.empty)
} else {
(total, prev :+ n)
}
}
result :+ last
}
Example result -
scala> split(Seq(1,2,3,0,4,5,0,6,7), (_:Int) == 0)
res53: Seq[Seq[Int]] = List(List(1, 2, 3), List(4, 5), List(6, 7))
This is how I've solved it in the past, but I suspect there is a better / more elegant way.
def break[A](xs:Seq[A], p:A => Boolean): (Seq[A], Seq[A]) = {
if (p(xs.head)) {
xs.span(p)
}
else {
xs.span(a => !p(a))
}
}
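A quick sketch of how it behaves (it assumes a non-empty input, since it calls xs.head):
val isSep = (_: Int) == 0
break(Seq(1, 2, 0, 3), isSep)  // (List(1, 2), List(0, 3)) - splits off the leading run of non-separators
break(Seq(0, 0, 1, 2), isSep)  // (List(0, 0), List(1, 2)) - splits off the leading run of separators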
I have multiple Options. I want to check whether they all hold a value. If an Option is None, I want to reply to the user about it; otherwise, proceed.
This is what I have done:
val name:Option[String]
val email:Option[String]
val pass:Option[String]
val i = List(name,email,pass).find(x => x match{
case None => true
case _ => false
})
i match{
case Some(x) => Ok("Bad Request")
case None => {
//move forward
}
}
Above, I could replace find with contains(None), but this is a very dirty way. How can I make it elegant and monadic?
Edit: I would also like to know which element was None.
Another way is as a for-comprehension:
val outcome = for {
nm <- name
em <- email
pwd <- pass
result = doSomething(nm, em, pwd) // where def doSomething(name: String, email: String, password: String): ResultType = ???
} yield (result)
This will produce outcome as Some(result) when name, email and pass are all defined, and None if any of them is missing; you can then interrogate it in the usual ways (all the methods available on the collection classes: map, filter, foreach, etc.). E.g.:
outcome.map(result => Ok(result)).getOrElse(Ok("Bad Request"))
val ok = Seq(name, email, pass).forall(_.isDefined)
If you want to reuse the code, you can do
def allFieldValueProvided(fields: Option[_]*): Boolean = fields.forall(_.isDefined)
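For example (a hypothetical call with the name/email/pass options from the question):
// Hypothetical usage with the Options from the question:
if (allFieldValueProvided(name, email, pass)) {
  // move forward
} else {
  Ok("Bad Request") // reply to the user, as in the question
}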
If you want to know all the missing values, you can collect them; if there are none, you are good to go:
def findMissingValues(v: (String, Option[_])*) = v.collect {
case (name, None) => name
}
val missingValues = findMissingValues(("name1", option1), ("name2", option2), ...)
if(missingValues.isEmpty) {
Ok(...)
} else {
BadRequest("Missing values for " + missingValues.mkString(", "))
}
val response = for {
n <- name
e <- email
p <- pass
} yield {
/* do something with n, e, p */
}
response getOrElse { /* bad request */ }
Or, with Scalaz:
val response = (name |#| email |#| pass) { (n, e, p) =>
/* do something with n, e, p */
}
response getOrElse { /* bad request */ }
if ((name :: email :: pass :: Nil) forall(!_.isEmpty)) {
} else {
// bad request
}
I think the most straightforward way would be this:
(name,email,pass) match {
case (Some(name), Some(email), Some(pass)) => // proceed
case _ => // Bad request
}
A version with stone knives and bear skins:
import util._
object Test extends App {
val zero: Either[List[Int], Tuple3[String,String,String]] = Right((null,null,null))
def verify(fields: List[Option[String]]) = {
(zero /: fields.zipWithIndex) { (acc, v) => v match {
case (Some(s), i) => acc match {
case Left(_) => acc
case Right(t) =>
val u = i match {
case 0 => t copy (_1 = s)
case 1 => t copy (_2 = s)
case 2 => t copy (_3 = s)
}
Right(u)
}
case (None, i) =>
val fails = acc match {
case Left(f) => f
case Right(_) => Nil
}
Left(i :: fails)
}
}
}
def consume(name: String, email: String, pass: String) = Console println s"$name/$email/$pass"
def fail(is: List[Int]) = is map List("name","email","pass") foreach (Console println "Missing: " + _)
val name:Option[String] = Some("Bob")
val email:Option[String]= None
val pass:Option[String] = Some("boB")
val res = verify(List(name,email,pass))
res.fold(fail, (consume _).tupled)
val res2 = verify(List(name, Some("bob@bob.org"), pass))
res2.fold(fail, (consume _).tupled)
}
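For the sample values above, this should print something like:
Missing: email
Bob/bob@bob.org/boB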
The same thing, using reflection to generalize the tuple copy.
The downside is that you must tell it what tuple to expect back. In this form, reflection is like one of those Stone Age advances that were so magical they trended on twitter for ten thousand years.
def verify[A <: Product](fields: List[Option[String]]) = {
import scala.reflect.runtime._
import universe._
val MaxTupleArity = 22
def tuple = {
require (fields.length <= MaxTupleArity)
val n = fields.length
val tupleN = typeOf[Tuple2[_,_]].typeSymbol.owner.typeSignature member TypeName(s"Tuple$n")
val init = tupleN.typeSignature member nme.CONSTRUCTOR
val ctor = currentMirror reflectClass tupleN.asClass reflectConstructor init.asMethod
val vs = Seq.fill(n)(null.asInstanceOf[String])
ctor(vs: _*).asInstanceOf[Product]
}
def zero: Either[List[Int], Product] = Right(tuple)
def nextProduct(p: Product, i: Int, s: String) = {
val im = currentMirror reflect p
val ts = im.symbol.typeSignature
val copy = (ts member TermName("copy")).asMethod
val args = copy.paramss.flatten map { x =>
val name = TermName(s"_$i")
if (x.name == name) s
else (im reflectMethod (ts member x.name).asMethod)()
}
(im reflectMethod copy)(args: _*).asInstanceOf[Product]
}
(zero /: fields.zipWithIndex) { (acc, v) => v match {
case (Some(s), i) => acc match {
case Left(_) => acc
case Right(t) => Right(nextProduct(t, i + 1, s))
}
case (None, i) =>
val fails = acc match {
case Left(f) => f
case Right(_) => Nil
}
Left(i :: fails)
}
}.asInstanceOf[Either[List[Int], A]]
}
def consume(name: String, email: String, pass: String) = Console println s"$name/$email/$pass"
def fail(is: List[Int]) = is map List("name","email","pass") foreach (Console println "Missing: " + _)
val name:Option[String] = Some("Bob")
val email:Option[String]= None
val pass:Option[String] = Some("boB")
type T3 = Tuple3[String,String,String]
val res = verify[T3](List(name,email,pass))
res.fold(fail, (consume _).tupled)
val res2 = verify[T3](List(name, Some("bob@bob.org"), pass))
res2.fold(fail, (consume _).tupled)
I know this doesn't scale well, but would this suffice?
(name, email, pass) match {
case (None, _, _) => "name"
case (_, None, _) => "email"
case (_, _, None) => "pass"
case _ => "Nothing to see here"
}