I am integrating an application using ReactiveMongo with a legacy application.
Since I must maintain the legacy application's interfaces, at some point I must block and/or transform my code into the specified interface types. I have that code distilled down to the example below.
Is there a better way than getChunks to consume all of an Enumerator with the output type being a List? What is the standard practice?
implicit def legacyAdapter[TInput, TResult]
    (block: Future[Enumerator[TInput]])
    (implicit translator: TInput => TResult,
              executionContext: ExecutionContext,
              timeOut: Duration): List[TResult] = {
  // Collect every chunk the Enumerator produces into a List.
  val iter = Iteratee.getChunks[TResult]
  val exhaustFuture = block.flatMap { enumy =>
    enumy.map(i => translator(i)).run(iter)
  }
  // Block until the Enumerator is exhausted, as the legacy interface requires.
  Await.result(exhaustFuture, timeOut)
}
Iteratee.getChunks is the only utility offered by Play Framework that builds a list by consuming all chunks of an enumerator. You can of course do the same thing using Iteratee.fold, but you would be reinventing the wheel, as Iteratee.getChunks is itself implemented with Iteratee.fold, as sketched below.
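For illustration, a minimal sketch of that equivalence (assuming Play's play.api.libs.iteratee package and an implicit ExecutionContext): accumulate chunks by prepending, then reverse once at the end, which is essentially what getChunks does internally.
import play.api.libs.iteratee.Iteratee
import scala.concurrent.ExecutionContext.Implicits.global

// Build the list in reverse with fold, then flip it once at the end.
def collectAll[E]: Iteratee[E, List[E]] =
  Iteratee.fold[E, List[E]](Nil)((acc, chunk) => chunk :: acc).map(_.reverse)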
We found that using the collect method of reactivemongo.api.Cursor (http://reactivemongo.org/releases/0.10/api/index.html#reactivemongo.api.Cursor) was more performant than collecting the chunks of the Enumerator. So in a sense we solved our problem by changing the problem.
That did mean an API change for our data layer, but the performance improvements allowed for it.
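For reference, a hedged sketch of what that looks like, assuming the ReactiveMongo 0.10 API, a BSONCollection, and a BSONDocumentReader for the target type (the names here are illustrative, not our actual code):
import reactivemongo.api.collections.default.BSONCollection
import reactivemongo.bson.{BSONDocument, BSONDocumentReader}
import scala.concurrent.{ExecutionContext, Future}

def fetchAll[T](collection: BSONCollection, query: BSONDocument)
               (implicit reader: BSONDocumentReader[T],
                         ec: ExecutionContext): Future[List[T]] =
  // collect[List] folds the whole cursor into a List inside the driver,
  // skipping the Enumerator round-trip entirely.
  collection.find(query).cursor[T].collect[List]()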
Related
I have a service that returns a ZIO[Has[MyCustomHeader]], and I'm having trouble testing it.
Other services in our organisation are tested by converting the ZIO to a Twitter Future using runtime.unsafeRunToFuture (where runtime is a Runtime[ZEnv]) and then awaiting the future, thus running the tests in blocking mode.
However, this service has a Has[] requirement, and runtime.unsafeRunToFuture doesn't handle those. So far my approach has been to try to convert my ZIO[Has[MyCustomHeader]] to a ZIO[ZEnv], but I've yet to succeed at this.
From what I gather, I need to provide a ZLayer via ZIO.provideSomeLayer(), but I'm simply too stupid to understand how to construct a ZLayer properly.
Am I even on the right path here? And if so, how do I construct a ZLayer with a static value for MyCustomHeader to use in my tests?
This is how far along I am in trying to add a header for testing purposes: it doesn't work, but it might illustrate what I'm trying to achieve... maybe... I'm pretty confused myself:
object effectAwait {
  implicit class ZioEffect[A](private val value: ZIO[Has[EnvironmentHeader], RequestFailure, A]) extends AnyVal {
    final def await(implicit runtime: Runtime[ZEnv] = Runtime.default): A = {
      val zmanaged = ZManaged.fromEffect(value).provide(Has(EnvironmentHeader("test")))
      val layered = value.provideSomeLayer(zmanaged.toLayer)
      val sf = runtime.unsafeRunToFuture(layered)
      Await.result(sf, 10.seconds)
    }
  }
}
this however gives me the error:
could not find implicit value for izumi.reflect.Tag[A]. Did you
forget to put on a Tag, TagK or TagKK context bound on one of the
parameters in A? e.g. def x[T: Tag, F[_]: TagK] = ...
<trace>:
deriving Tag for A, dealiased: A:
could not find implicit value for Tag[A]: A is a type parameter without an implicit Tag!
val layered = value.provideSomeLayer(zmanaged.toLayer)
I think you can just use ZIO.provideLayer (instead of provideSomeLayer) here :)
Also, there's runtime.unsafeRun, which will wait for the result as well, so you don't necessarily have to convert it to a Future. Also, instead of relying on an implicit runtime, there's always zio.Runtime.default that you can use anywhere (it's a Runtime[ZEnv], so it should work just as well, unless you've otherwise customized the runtime's behavior).
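Putting those suggestions together, a minimal sketch of the test helper, assuming ZIO 1.x and the EnvironmentHeader / RequestFailure types from the question:
import zio._

object effectAwait {
  implicit class ZioEffect[A](private val value: ZIO[Has[EnvironmentHeader], RequestFailure, A]) extends AnyVal {
    final def await: A = {
      // ZLayer.succeed lifts the static test header into a ULayer[Has[EnvironmentHeader]].
      val layered = value.provideLayer(ZLayer.succeed(EnvironmentHeader("test")))
      // unsafeRun blocks until the effect completes; no Future round-trip needed.
      Runtime.default.unsafeRun(layered)
    }
  }
}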
I understand that, generally speaking, there is a lot to say about deciding what one wants to model as an effect. This discussion is introduced in Functional Programming in Scala, in the chapter on IO.
Nonetheless, I have not finished the chapter; I was just browsing it end to end before tackling it together with Cats IO.
In the meantime, I have a bit of a situation with some code I need to deliver soon at work.
It relies on a Java library that is all about mutation. That library was started a long time ago, and for legacy reasons I don't see that changing.
Anyway, long story short: is modeling every mutating function as IO a viable way to encapsulate a mutating Java library?
Edit 1 (at request, I add a snippet)
Reading into a model mutates the model rather than creating a new one. I would contrast Jena with Gremlin, for instance, which is a functional library over graph data.
def loadModel(paths: String*): Model =
  paths.foldLeft(ModelFactory.createOntologyModel(new OntModelSpec(OntModelSpec.OWL_MEM)).asInstanceOf[Model]) {
    case (model, path) ⇒
      val input = getClass.getClassLoader.getResourceAsStream(path)
      val lang = RDFLanguages.filenameToLang(path).getName
      model.read(input, "", lang)
  }
That was my Scala code, but the Java API, as documented on the website, looks like this:
// create the resource
Resource r = model.createResource();
// add the property
r.addProperty(RDFS.label, model.createLiteral("chat", "en"))
 .addProperty(RDFS.label, model.createLiteral("chat", "fr"))
 .addProperty(RDFS.label, model.createLiteral("<em>chat</em>", true));
// write out the Model
model.write(System.out);
// create a bag
Bag smiths = model.createBag();
// select all the resources with a VCARD.FN property
// whose value ends with "Smith"
StmtIterator iter = model.listStatements(
    new SimpleSelector(null, VCARD.FN, (RDFNode) null) {
        public boolean selects(Statement s) {
            return s.getString().endsWith("Smith");
        }
    });
// add the Smiths to the bag
while (iter.hasNext()) {
    smiths.add(iter.nextStatement().getSubject());
}
So, there are three solutions to this problem.
1. Simple and dirty
If all the usage of the impure API is contained in a single / small part of the code base, you may just "cheat" and do something like:
def useBadJavaAPI(args): IO[Foo] = IO {
  // Everything inside this block can be imperative and mutable.
}
I said "cheat" because the idea of IO is composition, and a big IO chunk is not really composition. But, sometimes you only want to encapsulate that legacy part and do not care about it.
2. Towards composition.
Basically, the same as above but dropping some flatMaps in the middle:
// Instead of:
def useBadJavaAPI(args): IO[Foo] = IO {
  val a = createMutableThing()
  a.add(args)
  val b = a.bar()
  b.computeFoo()
}

// You do something like this:
def useBadJavaAPI(args): IO[Foo] =
  for {
    a      <- IO(createMutableThing())
    _      <- IO(a.add(args))
    b      <- IO(a.bar())
    result <- IO(b.computeFoo())
  } yield result
There are a couple of reasons for doing this:
Because the imperative / mutable API is not contained in a single method / class but spread across a couple of them, and encapsulating the small steps in IO helps you reason about it.
Because you want to slowly migrate the code to something better.
Because you want to feel better with yourself :p
3. Wrap it in a pure interface
This is basically the same thing that many third-party libraries (e.g. Doobie, fs2-blobstore, neotypes) do: wrapping a Java library in a pure interface.
Note that, as such, the amount of work required is much more than in the previous two solutions. It is worth it if the mutable API is "infecting" many places in your codebase, or worse, multiple projects; if so, it makes sense to do this and publish it as an independent module.
(It may also be worth publishing that module as an open-source library; you may end up helping other people and receiving help from other people as well.)
Since this is a bigger task, it is not easy to provide a complete answer covering all you would have to do. It may help to see how those libraries are implemented and to ask more questions, either here or in the Gitter channels.
But I can give you a quick snippet of how it would look:
// First define a pure interface of the operations you want to provide.
trait PureModel[F[_]] { // You may forget about the abstract F and just use IO instead.
  def op1: F[Int]
  def op2(data: List[String]): F[Unit]
}

// Then, in the companion object, you define factories.
object PureModel {
  // If the underlying java object has a close or release action,
  // use a Resource[F, PureModel[F]] instead.
  def apply[F[_]](args)(implicit F: Sync[F]): F[PureModel[F]] = ???
}
Now, how to create the implementation is the tricky part.
Maybe you can use something like Sync to initialize the mutable state.
def apply[F[_]](args)(implicit F: Sync[F]): F[PureModel[F]] =
  F.delay(createMutableState()).map { mutableThing =>
    new PureModel[F] {
      override def op1: F[Int] = F.delay(mutableThing.foo())
      override def op2(data: List[String]): F[Unit] = F.delay(mutableThing.bar(data))
    }
  }
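For completeness, a hypothetical usage sketch, assuming cats-effect's IO and that args stands for whatever the factory actually needs:
import cats.effect.IO
import cats.syntax.all._

def program(args: String): IO[Int] =
  PureModel[IO](args).flatMap { model =>
    // Each step is a pure description; the mutation only happens when the IO is run.
    model.op2(List("a", "b")) *> model.op1
  }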
As a personal project, I am writing yet another Scala library for DynamoDB. It contains many interesting aspects, such as reading and writing from an AST (just like JSON), handling HTTP requests, streaming data…
In order to communicate with DynamoDB, one needs to be able to read from / write to the DynamoDB format (the "AST"). I extracted this reading / writing from / to the AST into a minimalist library: dynamo-ast. It contains two main type classes: DynamoReads[_] and DynamoWrites[_] (deeply inspired by Play JSON).
I successfully coded the reading part of the library, ending up with very simple code such as:
trait DynamoRead[A] { self =>
  def read(dynamoType: DynamoType): DynamoReadResult[A]
}

case class TinyImage(url: String, alt: String)

val dynamoReads: DynamoReads[TinyImage] =
  (for {
    url <- read[String].at("url")
    alt <- read[String].at("alt")
  } yield (url, alt)).map((TinyImage.apply _).tupled)

dynamoReads.reads(dynamoAst) // yields DynamoReadResult[TinyImage]
At that point, I thought I had written the most complicated part of the library and that the DynamoWrites[_] part would be a piece of cake. I am, however, stuck on writing the DynamoWrites part. I was a fool.
My goal is to provide a very similar "user experience" with DynamoWrites[_] and keep it as simple as possible, such as:
val dynamoWrites: DynamoWrites[TinyImage] = {
  for {
    url <- write[String].at("url")
    alt <- write[String].at("alt")
  } yield (url, alt) map (TinyImage.unapply _) // I am not sure what to yield here nor how to code it
}

dynamoWrites.write(TinyImage("http://fake.url", "The alt desc")) // yields DynamoWriteResult[DynamoType]
Since this library is deeply inspired by the Play JSON library (because I like its simplicity), I had a look at its sources several times. I rather dislike the way the writer part is coded because, to me, it adds a lot of overhead: each time a field is written, a new JsObject is created with one field, and the resulting JsObject for a complete class is the merge of all the single-field JsObjects.
From my understanding, the DynamoReads part can be written with only one trait (DynamoRead[_]). The DynamoWrites part, however, requires at least two, such as:
trait DynamoWrites[A] {
  def write(a: A): DynamoWriteResult[DynamoType]
}

trait DynamoWritesPath[A] {
  def write(path: String, a: A): DynamoWriteResult[(String, DynamoType)]
}
DynamoWrites[_] is for writing plain String, Int… and DynamoWritesPath[_] is for writing a tuple of (String, WhateverTypeHere) (to simulate a "field").
So writing write[String].at("url") would yield a DynamoWritesPath[String]. Now I have several issues:
I have no clue how to write flatMap for my DynamoWritesPath[_]
I don't know what a for-comprehension should yield in order to obtain a DynamoWrites[TinyImage]
What I wrote so far is totally fuzzy and does not compile at all; I am looking for some help on this. It is not committed at the moment (gist): https://gist.github.com/louis-forite/cad97cc0a47847b2e4177192d9dbc3ae
To sum up, I am looking for some guidance on how to write the DynamoWrites[_] part. My goal is to provide the client with the most straightforward way to code a DynamoWrites[_] for a given type. My non-goal is to write the perfect library; I also want to keep it a zero-dependency library.
Link to the library: https://github.com/louis-forite/dynamo-ast
A Reads is a covariant functor, which means it has map. It can also be seen as a monad, which means it has flatMap (although a monad is overkill unless you need the previous field in order to know how to process the next):
trait Reads[A] {
  def map[B](f: A => B): Reads[B]
  def flatMap[B](f: A => Reads[B]): Reads[B] // not necessary, but available
}
The reason for this is that to transform a Reads[Int] into a Reads[String], you need to first read the Int, then apply the Int => String function.
But a Writes is a contravariant functor. It has contramap where the direction of the types is reversed:
trait Writes[A] {
  def contramap[B](f: B => A): Writes[B]
}
The type on the function is reversed because, to transform a Writes[Int] into a Writes[String], you must receive the String from the caller, apply the String => Int transformation, and then write the Int.
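To make that concrete, here is a minimal sketch of contramap for the DynamoWrites trait from the question (DynamoType and DynamoWriteResult are the question's types; this implementation is a sketch, not the library's code):
trait DynamoWrites[A] { self =>
  def write(a: A): DynamoWriteResult[DynamoType]

  // To write a B, first turn it into an A, then reuse the existing writer.
  def contramap[B](f: B => A): DynamoWrites[B] = new DynamoWrites[B] {
    def write(b: B): DynamoWriteResult[DynamoType] = self.write(f(b))
  }
}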
I don't think it makes sense to provide for-comprehension syntax (flatMap) for the Writes API.
// here it is clear that you're extracting a string value
url <- read[String].at("url")
// but what does this mean for the write method?
url <- write[String].at("url")
// what is `url`?
That's probably why Play doesn't provide one either, and why they focus on their combinator syntax instead (using the and function, their version of an applicative functor builder).
For reference: http://blog.tmorris.net/posts/functors-and-things-using-scala/index.html
You can achieve a more consistent API by using something like the and method in Play JSON:
(write[String]("url") and write[String]("alt"))(unlift(TinyImage.unapply))
(read[String]("url") and read[String]("alt"))(TinyImage.apply)
// unfortunately, the type ascription is necessary in this case
(write[String]("url") and write[String]("alt")) {(x: TinyImage) =>
(x.url, x.alt)
}
// transforming
val instantDynamoType: DynamoFormat[Instant] =
format[String].xmap(Instant.parse _)((_: Instant).toString)
You can still use a for-comprehension for the reads, although it's a bit overpowered: it sort of implies that fields must be processed in sequence, while that's not technically necessary.
I would like to use the State monad to implement caching for data provided by a third-party API. Let's imagine a method getThirdPartyData(key: String) which first checks the cache and then, if the data is not present there, makes a request to the API. The first and most naive implementation which came to my mind was enclosing the State type within a Future:
Future[State[Cache, ThirdPartyData]]
But that's not correct, because when the request fails you lose your cache (getThirdPartyData will return a Failure).
The second option which came to my mind was to extend, or rather redefine, the State monad as s => Future[(s, a)] instead of s => (s, a), but I thought this is quite a common problem, so scalaz probably has some already-defined way to solve it.
Any help greatly appreciated!
Is this what you are looking for: StateT[Future, Cache, ThirdPartyData]?
implicit val m: Monoid[ThirdPartyData] = ...
val startState: Cache = ...
val l: List[StateT[Future, Cache, ThirdPartyData]] = ...
val result = l.sequenceU
  .map(_.foldMap(identity))
  .eval(startState)
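And a hedged sketch of the cached lookup itself, assuming Cache is a Map[String, ThirdPartyData] and fetchFromApi is your real (here hypothetical) HTTP call; if the request fails, the Future fails before any new state is produced, so the previous cache is not lost:
import scala.concurrent.{ExecutionContext, Future}
import scalaz.StateT
import scalaz.std.scalaFuture._

type Cache = Map[String, ThirdPartyData]

// Assumed to exist elsewhere:
// def fetchFromApi(key: String): Future[ThirdPartyData]
def getThirdPartyData(key: String)
                     (implicit ec: ExecutionContext): StateT[Future, Cache, ThirdPartyData] =
  StateT { cache =>
    cache.get(key) match {
      case Some(hit) => Future.successful((cache, hit)) // cache hit: state unchanged
      case None => fetchFromApi(key).map(d => (cache + (key -> d), d)) // miss: fetch and record
    }
  }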
Given rowParser of type RowParser[Photo], this is how you would parse a list of rows coming from a table photo, according to the code samples I have seen so far:
def getPhotos(album: Album): List[Photo] = DB.withConnection { implicit c =>
  SQL("select * from photo where album = {album}").on(
    'album -> album.id
  ).as(rowParser *)
}
Where the * operator creates a parser of type ResultSetParser[List[Photo]]. Now, I was wondering if it was equally possible to get a parser that yields a Stream (thinking that being more lazy is always better), but I only came up with this:
def getPhotos(album: Album): Stream[Photo] = DB.withConnection { implicit c =>
  SQL("select * from photo where album = {album}").on(
    'album -> album.id
  )() collect (rowParser(_) match { case Success(photo) => photo })
}
It works, but it seems overly complicated. I could of course just call toStream on the List I get from the first function, but my goal was to apply rowParser only to rows that are actually read. Is there an easier way to achieve this?
EDIT: I know that limit should be used in the query if the number of rows of interest is known beforehand. I am also aware that, in many cases, you are going to use the whole result anyway, so being lazy will not improve performance. But there might be a case where you save a few cycles, e.g. if, for some reason, you have search criteria that you cannot or do not want to express in SQL. So I thought it was odd that, given the fact that Anorm provides a way to obtain a Stream of SqlRow, I didn't find a straightforward way to apply a RowParser to it.
I ended up creating my own stream method which corresponds to the list method:
def stream[A](p: RowParser[A]) = new ResultSetParser[Stream[A]] {
  def apply(rows: SqlParser.ResultSet): SqlResult[Stream[A]] = rows.headOption.map(p(_)) match {
    case None => Success(Stream.empty[A])
    case Some(Success(a)) =>
      // Lazily parse the remaining rows, silently dropping rows that fail to parse.
      val s: Stream[A] = a #:: rows.tail.flatMap(row => p(row) match {
        case Success(value) => Some(value)
        case _              => None
      })
      Success(s)
    case Some(Error(msg)) => Error(msg)
  }
}
Note that the Play SqlResult can only be either Success/Error while each row can also be Success/Error. I handle this for the first row only, assuming the rest will be the same. This may or may not work for you.
You're better off making smaller (paged) queries using limit and offset.
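For illustration, a hedged sketch of that paging approach, reusing the question's own getPhotos code (the page and size parameters are mine):
def getPhotosPage(album: Album, page: Int, size: Int): List[Photo] =
  DB.withConnection { implicit c =>
    SQL("select * from photo where album = {album} limit {size} offset {offset}").on(
      'album -> album.id,
      'size -> size,
      'offset -> page * size
    ).as(rowParser *)
  }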
Anorm would need some modification if you're going to keep your (large) result around in memory and stream it from there. Then the other concern would be the new memory requirements for your JVM. And how would you deal with caching on the service level? See, previously you could easily cache something like photos?page=1&size=10, but now you just have photos, and the caching technology would have no idea what to do with the stream.
Even worse, and possibly at the JDBC level, you could wrap a Stream around limit-ed and offset-ed execute statements, making multiple calls to the database behind the scenes. But that sounds like it would need a fair bit of work to port the Stream code that Scala generates to Java land (to work with Groovy, JRuby, etc.), and then to get it approved for the JDBC 5 or 6 roadmap. This idea would probably be shunned as being too complicated, which it is.
You could wrap Stream around your entire DAO (where the limit and offset trickery would happen), but this almost sounds like more trouble than it's worth :-)
I ran into a similar situation but hit a stack overflow error when the built-in Anorm function for converting to Streams attempted to parse the result set.
In order to get around this, I elected to abandon the Anorm ResultSetParser paradigm and fall back to the java.sql.ResultSet object.
I wanted to use Anorm's internal classes for parsing result set rows, but, ever since version 2.4, they have made all of the pertinent classes and methods private to their package and have deprecated several other methods that would have been more straightforward to use.
I used a combination of Promises and Futures to work around the ManagedResource that Anorm now returns. I avoided all deprecated functions.
import anorm._
import java.sql.ResultSet
import scala.concurrent._

def SqlStream[T](sql: SqlQuery)(parse: ResultSet => T)(implicit ec: ExecutionContext): Future[Stream[T]] = {
  val conn = db.getConnection()
  val mr = sql.preparedStatement(conn, false)
  val p = Promise[Unit]()
  val p2 = Promise[ResultSet]()
  Future {
    mr.map { stmt =>
      p2.success(stmt.executeQuery)
      // Keep the statement open until the stream has been fully consumed.
      Await.ready(p.future, duration.Duration.Inf)
    }.acquireAndGet(identity).andThen { case _ => conn.close() }
  }
  def _stream(rs: ResultSet): Stream[T] = {
    if (rs.next()) parse(rs) #:: _stream(rs)
    else {
      // Signal that the statement and connection can be released.
      p.success(())
      Stream.empty
    }
  }
  p2.future.map { rs =>
    rs.beforeFirst()
    _stream(rs)
  }
}
A rather trivial usage of this function would be something like this:
def getText(implicit ec: ExecutionContext): Future[Stream[String]] = {
  SqlStream(SQL("select FIELD from TABLE")) { rs => rs.getString("FIELD") }
}
There are, of course, drawbacks to this approach; however, it got around my problem and did not require the inclusion of any other libraries.