Alpakka MongoDB - specify type in MongoSource

I'm currently playing around with Akka Streams and the Alpakka MongoDB connector.
Is it possible to specify the type for MongoSource?
val codecRegistry = fromRegistries(fromProviders(classOf[TodoMongo]), DEFAULT_CODEC_REGISTRY)
private val todoCollection: MongoCollection[TodoMongo] = mongoDb
  .withCodecRegistry(codecRegistry)
  .getCollection("todo")
I would like to do something like this:
val t: FindObservable[Seq[TodoMongo]] = todoCollection.find()
MongoSource(t) // Stuck here
But I get the following error:
Expected Observable[scala.Document], Actual FindObservable[Seq[TodoMongo]].
I can't find the correct documentation about this part.

This is not published yet, but in Alpakka's master branch, MongoSource.apply takes a type parameter:
object MongoSource {
  def apply[T](query: Observable[T]): Source[T, NotUsed] =
    Source.fromPublisher(ObservableToPublisher(query))
}
Therefore, with the upcoming 0.18 release of Alpakka, you'll be able to do the following:
val source: Source[TodoMongo, NotUsed] = MongoSource[TodoMongo](todoCollection.find())
Note that source here assumes that todoCollection.find() returns an Observable[TodoMongo]; adjust the types as needed.
In the meantime, you could simply add the above code manually. For example:
package akka.stream.alpakka.mongodb.scaladsl

import akka.NotUsed
import akka.stream.alpakka.mongodb.ObservableToPublisher
import akka.stream.scaladsl.Source
import org.mongodb.scala.Observable

object MyMongoSource {
  def apply[T](query: Observable[T]): Source[T, NotUsed] =
    Source.fromPublisher(ObservableToPublisher(query))
}
Note that MyMongoSource is defined to reside in the akka.stream.alpakka.mongodb.scaladsl package (like MongoSource), because ObservableToPublisher is a package-private class. You would use MyMongoSource in the same way that you would use MongoSource:
val source: Source[TodoMongo, NotUsed] = MyMongoSource[TodoMongo](todoCollection.find())
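For completeness, here is a usage sketch of running the resulting source (this assumes an implicit ActorSystem and materializer are in scope, and that todoCollection.find() yields an Observable[TodoMongo]):
import akka.stream.scaladsl.Sink
import scala.concurrent.Future

// materialize the stream and collect all documents into a sequence
val todos: Future[Seq[TodoMongo]] =
  MyMongoSource[TodoMongo](todoCollection.find()).runWith(Sink.seq)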

Related

Could not find implicit value for parameter writer error, yet I defined the handler using the macro

I have the following:
Account.scala
package modules.accounts

import java.time.Instant

import reactivemongo.api.bson._

case class Account(id: String, name: String)

object Account {
  type ID = String

  implicit val accountHandler: BSONDocumentHandler[Account] = Macros.handler[Account]
  // implicit def accountWriter: BSONDocumentWriter[Account] = Macros.writer[Account]
  // implicit def accountReader: BSONDocumentReader[Account] = Macros.reader[Account]
}
AccountRepo.scala
package modules.accounts

import java.time.Instant

import reactivemongo.api.collections.bson.BSONCollection

import scala.concurrent.ExecutionContext

final class AccountRepo(
    val coll: BSONCollection
)(implicit ec: ExecutionContext) {
  import Account.{ accountHandler, ID }

  def insertTest() = {
    val doc = Account(s"account123", "accountName") //, Instant.now)
    coll.insert.one(doc)
  }
}
The error I am getting is:
could not find implicit value for parameter writer: AccountRepo.this.coll.pack.Writer[modules.accounts.Account]
[error] coll.insert.one(doc)
From what I understand, the implicit handler generated by the macro should be enough to provide the Writer. What am I doing wrong?
Reference: http://reactivemongo.org/releases/1.0/documentation/bson/typeclasses.html
The code is mixing different, incompatible versions.
The macro-generated handler uses the new BSON API, as can be seen from the import reactivemongo.api.bson, whereas the collection uses the old driver, as can be seen from the import reactivemongo.api.collections.bson instead of reactivemongo.api.bson.collection.
It's recommended to have a look at the documentation and not to mix incompatible versions of related libraries.
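For illustration, a minimal sketch of the repository written against the new API only (assuming ReactiveMongo 1.0, where the collection type comes from reactivemongo.api.bson.collection; the Future[WriteResult] return type is just what insert.one yields):
package modules.accounts

import scala.concurrent.{ ExecutionContext, Future }

import reactivemongo.api.bson.collection.BSONCollection
import reactivemongo.api.commands.WriteResult

final class AccountRepo(
    val coll: BSONCollection
)(implicit ec: ExecutionContext) {
  import Account.accountHandler // BSONDocumentHandler from reactivemongo.api.bson

  def insertTest(): Future[WriteResult] = {
    val doc = Account("account123", "accountName")
    coll.insert.one(doc) // the handler now matches the collection's serialization pack
  }
}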

How to use TypeInformation in a generic method using Scala

I'm trying to create a generic method in Apache Flink to parse a DataSet[String] (JSON strings) using case classes. I tried to use TypeInformation as mentioned here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/types_serialization.html#generic-methods
I'm using liftweb to parse the JSON strings; this is my code:
import net.liftweb.json._
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala._
class Loader(settings: Map[String, String])(implicit environment: ExecutionEnvironment) {

  val env: ExecutionEnvironment = environment

  def load[T: TypeInformation](): DataSet[T] = {
    val data: DataSet[String] = env.fromElements(
      """{"name": "name1"}""",
      """{"name": "name2"}"""
    )

    implicit val formats = DefaultFormats

    data.map(item => parse(item).extract[T])
  }
}
But I got the error:
No Manifest available for T
data.map(item => parse(item).extract[T])
Then I tried to add a Manifest and delete the TypeInformation like this:
def load[T: Manifest](): DataSet[T] = { ...
And I got the following error:
could not find implicit value for evidence parameter of type org.apache.flink.api.common.typeinfo.TypeInformation[T]
I'm very confused about this; I'd really appreciate your help.
Thanks.
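For what it's worth, a sketch that keeps both constraints (the two errors above suggest the method needs a TypeInformation for Flink's map and a Manifest for liftweb's extract; since Manifest also provides the ClassTag that Flink's Scala API wants, the two context bounds can simply be combined):
def load[T: TypeInformation: Manifest](): DataSet[T] = {
  val data: DataSet[String] = env.fromElements(
    """{"name": "name1"}""",
    """{"name": "name2"}"""
  )

  implicit val formats: DefaultFormats.type = DefaultFormats

  // extract[T] uses the Manifest; the result type of map uses the TypeInformation
  data.map(item => parse(item).extract[T])
}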

Change a materialized value in a source using the contents of the stream

Alpakka provides a great way to access dozens of different data sources. File-oriented sources such as HDFS and FTP sources are delivered as Source[ByteString, Future[IOResult]]. However, HTTP requests via Akka HTTP are delivered as entity streams of Source[ByteString, NotUsed]. In my use case, I would like to retrieve content from HTTP sources as Source[ByteString, Future[IOResult]] so I can build a unified resource fetcher that works across multiple schemes (hdfs, file, ftp and S3 in this case).
In particular, I would like to convert the Source[ByteString, NotUsed] source to Source[ByteString, Future[IOResult]], where I am able to calculate the IOResult from the incoming byte stream. There are plenty of methods like flatMapConcat and viaMat, but none seem to be able to extract details from the input stream (such as the number of bytes read) or initialise the IOResult structure properly. Ideally, I am looking for a method with the following signature that will update the IOResult as the stream comes in:
def matCalc(src: Source[ByteString, Any]): Source[ByteString, Future[IOResult]] = {
  src.someMatFoldMagic[ByteString, IOResult](IOResult.createSuccessful(0))((m, b) => m.withCount(m.count + b.length))
}
I can't recall any existing functionality which can do this out of the box, but you can use the alsoToMat flow function (surprisingly I didn't find it in the Akka Streams docs, although you can look it up in the source code documentation & Java API) together with Sink.fold to accumulate some value and provide it at the very end. E.g.:
def magic(source: Source[Int, Any]): Source[Int, Future[Int]] =
  source.alsoToMat(Sink.fold(0)((acc, _) => acc + 1))((_, f) => f)
The thing is that alsoToMat combines the input materialized value with the one provided in alsoToMat. At the same time, the values produced by the source are not affected by the sink in alsoToMat:
def alsoToMat[Mat2, Mat3](that: Graph[SinkShape[Out], Mat2])(matF: (Mat, Mat2) ⇒ Mat3): ReprMat[Out, Mat3] =
  viaMat(alsoToGraph(that))(matF)
It's not that hard to adapt this function to return an IOResult, which according to the source code is:
final case class IOResult(count: Long, status: Try[Done]) { ... }
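For example, a rough sketch of that adaptation for ByteString sources (the helper name withIoResult is mine; the implicit ExecutionContext is only needed to map the fold's Future into an IOResult):
import akka.stream.IOResult
import akka.stream.scaladsl.{ Sink, Source }
import akka.util.ByteString
import scala.concurrent.{ ExecutionContext, Future }

def withIoResult(src: Source[ByteString, Any])(
    implicit ec: ExecutionContext): Source[ByteString, Future[IOResult]] =
  src.alsoToMat(
    // count the bytes flowing through the stream
    Sink.fold(0L)((acc, chunk: ByteString) => acc + chunk.length)
  )((_, byteCount) => byteCount.map(IOResult.createSuccessful))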
One more thing you need to pay attention to: you want your source to be
Source[ByteString, Future[IOResult]]
but if you want to carry this materialized value to the very end of the stream definition and then do something based on the completion of that future, that might be an error-prone approach. E.g., in this example I finish the work based on that future, so the last values will not be processed:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}
object App extends App {
  private implicit val sys: ActorSystem = ActorSystem()
  private implicit val mat: ActorMaterializer = ActorMaterializer()
  private implicit val ec: ExecutionContext = sys.dispatcher

  val source: Source[Int, Any] = Source((1 to 5).toList)

  def magic(source: Source[Int, Any]): Source[Int, Future[Int]] =
    source.alsoToMat(Sink.fold(0)((acc, _) => acc + 1))((_, f) => f)

  val f = magic(source).throttle(1, 1.second).toMat(Sink.foreach(println))(Keep.left).run()

  f.onComplete(t => println(s"f1 completed - $t"))

  Await.ready(f, 5.minutes)

  mat.shutdown()
  sys.terminate()
}
This can be done by using a Promise for the materialized value propagation.
val completion = Promise[IOResult]()
val httpWithIoResult = http.mapMaterializedValue(_ => completion.future)
What is left now is to complete the completion promise when the relevant data becomes available.
An alternative approach would be to drop down to the GraphStage API, where you get lower-level control of materialized value propagation. But even there, using Promises is often the chosen implementation for materialized value propagation. Take a look at built-in operator implementations like Ignore.
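To make that concrete, here is an illustrative sketch (the counting sink and the helper name are mine, not an established API): the promise is completed from a side channel attached with alsoTo, and its future is exposed as the materialized value.
import akka.NotUsed
import akka.stream.IOResult
import akka.stream.scaladsl.{ Sink, Source }
import akka.util.ByteString
import scala.concurrent.{ ExecutionContext, Future, Promise }

def withPromisedIoResult(http: Source[ByteString, NotUsed])(
    implicit ec: ExecutionContext): Source[ByteString, Future[IOResult]] = {
  val completion = Promise[IOResult]()
  // side channel that counts the bytes and completes the promise at the end
  val counter = Sink
    .fold(0L)((acc, chunk: ByteString) => acc + chunk.length)
    .mapMaterializedValue(_.onComplete(count => completion.complete(count.map(IOResult.createSuccessful))))
  http.alsoTo(counter).mapMaterializedValue(_ => completion.future)
}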

How to create a WSClient in Scala?

Hello, I'm writing Scala code to pull data from an API.
The data is paginated, so I'm pulling it sequentially.
Now I'm looking for a solution to pull multiple pages in parallel, and I'm stuck on creating a WSClient programmatically instead of injecting it.
Does anyone have a solution for creating a WSClient?
I found AhcWSClient(), but it requires an implicit actor system.
When you cannot Inject one as suggested in the other answer, you can create a Standalone WS client using:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import play.api.libs.ws._
import play.api.libs.ws.ahc.StandaloneAhcWSClient
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
val ws = StandaloneAhcWSClient()
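A small usage sketch (the URL is only a placeholder): note that the standalone client and the ActorSystem should be closed when you are done, otherwise their threads are leaked.
implicit val ec: scala.concurrent.ExecutionContext = system.dispatcher

ws.url("https://example.com/api?page=1").get().andThen {
  case _ =>
    ws.close()
    system.terminate()
}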
No need to reinvent the wheel here. And I'm not sure why you say you can't inject a WSClient. If you can inject a WSClient, then you could do something like this to run the requests in parallel:
class MyClient @Inject() (wsClient: WSClient)(implicit ec: ExecutionContext) {

  def getSomething(urls: Vector[String]): Future[Something] = {
    // wsClient.url(url).get() returns a Future right away, so mapping over the
    // URLs already fires all the requests concurrently; Future.sequence then
    // gathers the results.
    val futures = urls.map { url =>
      wsClient.url(url).get()
    }
    Future.sequence(futures).map { responses =>
      // process responses here; you might want to fold them together
      ???
    }
  }
}

Scalacache with redis support

I am trying to integrate Redis with ScalaCache. Keys are usually strings, but values can be objects, Set[String], etc. The cache is initialized like this:
val cache: RedisCache = RedisCache(config.host, config.port)
private implicit val scalaCache: ScalaCache[Array[Byte]] = ScalaCache(cacheService.cache)
But while calling put, I am getting the error "Could not find any Codecs for type Set[String] and Repr". Looks like I need to provide a codec for my cache input as suggested here, so I added:
class A extends Codec[Set[String], Array[Byte]] with GZippingBinaryCodec[Set[String]]
Even after adding my class A, it is throwing the same error. What am I missing?
As you mentioned in the link, you can either serialize values in a binary format:
import scalacache.serialization.binary._
or as JSON using circe:
import scalacache.serialization.circe._
import io.circe.generic.auto._
Looks like it's solved in the next release by binary and circe serialization. I am on version 10 and solved it with the following:
import java.io.{ ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream }

implicit object SetBinaryCodec extends Codec[Any, Array[Byte]] {

  // serialize using plain Java serialization
  override def serialize(value: Any): Array[Byte] = {
    val stream: ByteArrayOutputStream = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(stream)
    oos.writeObject(value)
    oos.close()
    stream.toByteArray
  }

  override def deserialize(data: Array[Byte]): Any = {
    val ois = new ObjectInputStream(new ByteArrayInputStream(data))
    val value = ois.readObject
    ois.close()
    value
  }
}
Perks of being up to date. I will upgrade the version; I posted this just in case somebody needs it.