Generating lazy scala streams by iteration - scala

I'm looking for a way to generate a scala stream (the equivalent of F#'s sequence) of this form:
let allRows resultSet : seq<Row> =
seq {
while resultSet.next() do
yield new Row(resultSet)
}
Is there any way to easily do this in scala? The only way I found involved (non-tailrecursive) recursion, which for large amounts of rows in a resultSet would mean certain stackoverflow.
Thanks

You can implement it like this:
def toStream(rs:ResultSet):Stream[Row] =
if(!rs.next) Stream.Empty
else new Row(rs) #:: toStream(rs)
Note that since toStream is defined using def (in opposite to definition with val) this solution will no keep whole stream in memory and head of stream will be garbage collected.
Another option you can use is to define new Iterator:
def toIterator(rs:ResultSet) = new Iterator[Row] {
override def hasNext: Boolean = rs.next()
override def next(): Row = new Row(rs)
}

Suppose you have something like
trait ResultSet {
def next: Boolean
}
class Row(rs: ResultSet)
You can define your function as
def allRows(rs: ResultSet): Stream[Row] =
Stream.continually(if (rs.next) Some(new Row(rs)) else None)
.takeWhile(_.isDefined).map(_.get)

Related

Scala: Avoid var or mutable type

I have a pseudo-code like shown below. The ItemChurner.churn() is abstracted component which generates objects until x times, where x is unknown. :
def func: MyList = {
var list: MyList = MyList()
while(ItemChurner.canChurn) {
list = new MyList(ItemChurner.churn(), list)
}
list
}
Is there a way to avoid use of var?
If canChurn works as it should:
def func(churner: ItemChurner) = {
val iterator = new Iterator {
def hasNext = churner.canChurn
def next = churner.churn()
}
iterator.toList
}
About version (of the question) that contained catched exception check for churn():
If really expect some exceptions, what's the point of canChurn then?
Anyway, if you care about exceptions:
Iterator.continually(Try(churner.churn)).takeWhile(_.isSuccess).map(_.get).toList
This actually is not much precise, as churn might throw some other exception that has to be propagated, so here the scala's Exception helpers) come in hand:
def step = catching(classOf[NoMoreElementsException]) opt churner.churn()
Iterator.continually(step).takeWhile(_.nonEmpty).map(_.get).toList
If you want to do using Simple recursion and avoid var. This is the general strategy used in functional programming.
Use Vector instead of List for effective append
def func[T](): List[T] = {
#tailrec
def helper(result: List[T]): List[T] = {
if (ItemChurner.canChurn) helper(result ++ List(ItemChurner.churn))
else result
}
helper(List.Empty[T])
}
Assuming ItemChurner.canChurn does not throw any exception. if it throws exceptions simply wrap it inside Try
Improving upon pamu's answer, you can do something on the lines of;
def func: MyList = {
def process(cur: MyList): MyList = {
if (ItemChurner.canChurn) process(new MyList(ItemChurner.churn(), cur))
else cur
}
process(new MyList())
}

Thread-safely transforming a value in a mutable map

Suppose I want to use a mutable map in Scala to keep track of the number of times I've seen some strings. In a single-threaded context, this is easy:
import scala.collection.mutable.{ Map => MMap }
class Counter {
val counts = MMap.empty[String, Int].withDefaultValue(0)
def add(s: String): Unit = counts(s) += 1
}
Unfortunately this isn't thread-safe, since the get and the update don't happen atomically.
Concurrent maps add a few atomic operations to the mutable map API, but not the one I need, which would look something like this:
def replace(k: A, f: B => B): Option[B]
I know I can use ScalaSTM's TMap:
import scala.concurrent.stm._
class Counter {
val counts = TMap.empty[String, Int]
def add(s: String): Unit = atomic { implicit txn =>
counts(s) = counts.get(s).getOrElse(0) + 1
}
}
But (for now) that's still an extra dependency. Other options would include actors (another dependency), synchronization (potentially less efficient), or Java's atomic references (less idiomatic).
In general I'd avoid mutable maps in Scala, but I've occasionally needed this kind of thing, and most recently I've used the STM approach (instead of just crossing my fingers and hoping I don't get bitten by the naïve solution).
I know there are a number of trade-offs here (extra dependencies vs. performance vs. clarity, etc.), but is there anything like a "right" answer to this problem in Scala 2.10?
How about this one? Assuming you don't really need a general replace method right now, just a counter.
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger
object CountedMap {
private val counts = new ConcurrentHashMap[String, AtomicInteger]
def add(key: String): Int = {
val zero = new AtomicInteger(0)
val value = Option(counts.putIfAbsent(key, zero)).getOrElse(zero)
value.incrementAndGet
}
}
You get better performance than synchronizing on the whole map, and you also get atomic increments.
The simplest solution is definitely synchronization. If there is not too much contention, performance might not be that bad.
Otherwise, you could try to roll up your own STM-like replace implementation. Something like this might do:
object ConcurrentMapOps {
private val rng = new util.Random
private val MaxReplaceRetryCount = 10
private val MinReplaceBackoffTime: Long = 1
private val MaxReplaceBackoffTime: Long = 20
}
implicit class ConcurrentMapOps[A, B]( val m: collection.concurrent.Map[A,B] ) {
import ConcurrentMapOps._
private def replaceBackoff() {
Thread.sleep( (MinReplaceBackoffTime + rng.nextFloat * (MaxReplaceBackoffTime - MinReplaceBackoffTime) ).toLong ) // A bit crude, I know
}
def replace(k: A, f: B => B): Option[B] = {
m.get( k ) match {
case None => return None
case Some( old ) =>
var retryCount = 0
while ( retryCount <= MaxReplaceRetryCount ) {
val done = m.replace( k, old, f( old ) )
if ( done ) {
return Some( old )
}
else {
retryCount += 1
replaceBackoff()
}
}
sys.error("Could not concurrently modify map")
}
}
}
Note that collision issues are localized to a given key. If two threads access the same map but work on distinct keys, you'll have no collisions and the replace operation will always succeed the first time. If a collision is detected, we wait a bit (a random amount of time, so as to minimize the likeliness of threads fighting forever for the same key) and try again.
I cannot guarantee that this is production-ready (I just tossed it right now), but that might do the trick.
UPDATE: Of course (as Ionuț G. Stan pointed out), if all you want is increment/decrement a value, java's ConcurrentHashMap already provides thoses operations in a lock-free manner.
My above solution applies if you need a more general replace method that would take the transformation function as a parameter.
You're asking for trouble if your map is just sitting there as a val. If it meets your use case, I'd recommend something like
class Counter {
private[this] myCounts = MMap.empty[String, Int].withDefaultValue(0)
def counts(s: String) = myCounts.synchronized { myCounts(s) }
def add(s: String) = myCounts.synchronized { myCounts(s) += 1 }
def getCounts = myCounts.synchronized { Map[String,Int]() ++ myCounts }
}
for low-contention usage. For high-contention, you should use a concurrent map designed to support such use (e.g. java.util.concurrent.ConcurrentHashMap) and wrap the values in AtomicWhatever.
If you are ok to work with future based interface:
trait SingleThreadedExecutionContext {
val ec = ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())
}
class Counter extends SingleThreadedExecutionContext {
private val counts = MMap.empty[String, Int].withDefaultValue(0)
def get(s: String): Future[Int] = future(counts(s))(ec)
def add(s: String): Future[Unit] = future(counts(s) += 1)(ec)
}
Test will look like:
class MutableMapSpec extends Specification {
"thread safe" in {
import ExecutionContext.Implicits.global
val c = new Counter
val testData = Seq.fill(16)("1")
await(Future.traverse(testData)(c.add))
await(c.get("1")) mustEqual 16
}
}

Scala Data Modeling and Generics

I'm using the Play Framework and Squeryl to make a fairly basic front end for a database, but I know I'm rewriting too much code. I have different models to represent data in my db, and they all do the same six functions
object ModelType{
def add(model:ModelType):Option[ModelType] = Option(AppDB.tablename.insert(model))
def remove(id: Long) = AppDB.tablename.delete(id)
def getAll():List[ModelType] = from(AppDB.tablename)(model => select(model) orderBy(model.aDifferentFieldForEachModel)) toList
def toJson(model:ModelType):JsValue ={
Json.toJson(
Map("field" -> Json.toJson(model.field))
)
}
def allToJson() = {
val json:List[JsValue] = getAll.map{toJson(_)}
Json.toJson(json.toSeq)
}
def validate(different values for each model) = // is fairly different for each one. Validates the submitted fields from a user
}
So I'm using case classes for each of the models, and using an accompanying object for these commands. How can I use generics or traits in scala to make my life easier and not type all of these methods out every time?
EDIT: Mostly solved with gzm0's answer, but the problem is now how would I implement getAll in the trait? I want to be able to save some variable for each model that resembles the model.aDifferentFieldForEachModel as above.
You could try the following:
trait ModelOps[T] {
def table: AppDB.Table // not sure about type
def order: AppDB.OrderByPredicate // not sure about type
def toJson(model: T): JsValue
def add(model: T): Option[T] = Option(AppDB.categories.insert(model))
def remove(id: Long) = AppDB.categories.delete(id)
def getAll(): List[T] = from(table)(model => select(model) orderBy(order)) toList
def allToJson() = {
val json:List[JsValue] = getAll.map{toJson(_)}
Json.toJson(json.toSeq)
}
}
Then you can for each model type:
object ModelType extends ModelOps[ModelType] {
def table = AppDB.examples
def order = yourPredicate
def toJson(model:ModelType):JsValue = {
Json.toJson(Map("field" -> Json.toJson(model.field)))
}
def validate(different values for each model) = // is fairly different for each one. Validates the submitted fields from a user
}
UPDATE About the true type of AppDB.OrderByPredicate:
Calling select on PrimitiveTypeMode returns a SelectState. On this SelectState, you will call orderBy which takes a List[BaseQueryYield#O] (or multiple of those in the same argument list). Hence you should define:
def order(model: T): List[BaseQueryYield#O]
and
def getAll() = from(table)(model => select(model) orderBy(order(model))) toList
By the way, BaseQueryYield#O resolves to ExpressionNode.

Traversable => Java Iterator

I have a Traversable, and I want to make it into a Java Iterator. My problem is that I want everything to be lazily done. If I do .toIterator on the traversable, it eagerly produces the result, copies it into a List, and returns an iterator over the List.
I'm sure I'm missing something simple here...
Here is a small test case that shows what I mean:
class Test extends Traversable[String] {
def foreach[U](f : (String) => U) {
f("1")
f("2")
f("3")
throw new RuntimeException("Not lazy!")
}
}
val a = new Test
val iter = a.toIterator
The reason you can't get lazily get an iterator from a traversable is that you intrinsically can't. Traversable defines foreach, and foreach runs through everything without stopping. No laziness there.
So you have two options, both terrible, for making it lazy.
First, you can iterate through the whole thing each time. (I'm going to use the Scala Iterator, but the Java Iterator is basically the same.)
class Terrible[A](t: Traversable[A]) extends Iterator[A] {
private var i = 0
def hasNext = i < t.size // This could be O(n)!
def next: A = {
val a = t.slice(i,i+1).head // Also could be O(n)!
i += 1
a
}
}
If you happen to have efficient indexed slicing, this will be okay. If not, each "next" will take time linear in the length of the iterator, for O(n^2) time just to traverse it. But this is also not necessarily lazy; if you insist that it must be you have to enforce O(n^2) in all cases and do
class Terrible[A](t: Traversable[A]) extends Iterator[A] {
private var i = 0
def hasNext: Boolean = {
var j = 0
t.foreach { a =>
j += 1
if (j>i) return true
}
false
}
def next: A = {
var j = 0
t.foreach{ a =>
j += 1
if (j>i) { i += 1; return a }
}
throw new NoSuchElementException("Terribly empty")
}
}
This is clearly a terrible idea for general code.
The other way to go is to use a thread and block the traversal of foreach as it's going. That's right, you have to do inter-thread communication on every single element access! Let's see how that works--I'm going to use Java threads here since Scala is in the middle of a switch to Akka-style actors (though any of the old actors or the Akka actors or the Scalaz actors or the Lift actors or (etc.) will work)
class Horrible[A](t: Traversable[A]) extends Iterator[A] {
private val item = new java.util.concurrent.SynchronousQueue[Option[A]]()
private class Loader extends Thread {
override def run() { t.foreach{ a => item.put(Some(a)) }; item.put(None) }
}
private val loader = new Loader
loader.start
private var got: Option[A] = null
def hasNext: Boolean = {
if (got==null) { got = item.poll; hasNext }
else got.isDefined
}
def next = {
if (got==null) got = item.poll
val ans = got.get
got = null
ans
}
}
This avoids the O(n^2) disaster, but ties up a thread and has desperately slow element-by-element access. I get about two million accesses per second on my machine, as compared to >100M for a typical traversable. This is clearly a horrible idea for general code.
So there you have it. Traversable is not lazy in general, and there is no good way to make it lazy without compromising performance tremendously.
I've run into this problem before and as far as I can tell, no one's particularly interested in making it easier to get an Iterator when all you've defined is foreach.
But as you've noted, toStream is the problem, so you could just override that:
class Test extends Traversable[String] {
def foreach[U](f: (String) => U) {
f("1")
f("2")
f("3")
throw new RuntimeException("Not lazy!")
}
override def toStream: Stream[String] = {
"1" #::
"2" #::
"3" #::
Stream[String](throw new RuntimeException("Not lazy!"))
}
}
Another alternative would be to define an Iterable instead of a Traversable, and then you'd get the iterator method directly. Could you explain a bit more what your Traversable is doing in your real use case?

Some help needed to help the type inferring engine

I've problems understanding where to put type informations in scala, and how to put it. Here I create several sequences of Actors and I don't type them. Even if I had to, I wouldn't know which type of sequence map produces to give them the proper type.
Then later when the compiler yells at me because I'm trying to sum Anys, I've no idea where to begin filling in the gaps.
Here is my code, I tried to minimize it while still letting the necessary info available.
object Actors {
def main(args: Array[String]) {
val array = randomArray(5)
val master = new Master(array, 5)
master.start
}
def randomArray(length: Int): Array[Int] = {
val generator = new Random
new Array[Int](length) map((_:Int) => generator nextInt)
}
}
class Master(array: Array[Int], slavesNumber: Int) extends Actor {
def act () {
val slaves = (1 to slavesNumber).map(_ => new Slave)
slaves.foreach(s => s.start)
val futures = slaves.map(s => s !! Work(array))
val results = awaitAll(3000, futures:_*)
val res2 = results.flatMap(x => x)
println((0 /: res2)(_+_))
}
}
class Slave() extends Actor {
def act () {
Actor.loop {
receive {
case Work(slice) =>
reply((slice :\ 0)(_+_))
}
}
}
}
I'd appreciate too some good pointers towards comprehensive doc on the matter.
The object that are passed between actors are not typed, actors have to filter the object themselves -- as you already do in the Slave actor. As you can see, !! is defined as
def !!(msg: Any): Future[Any]
so there is no type information in the returned Future. Probably the easiest solution is to replace the line var res2 .. with
val res2 = results collect {case Some(y:Int) => y}
this filters out just those Some results that are of type Int.