How do I use infinite Scala streams as a source in Spark Streaming?

Suppose I essentially want Stream.from(0) as InputDStream. How would I go about this? The only way I can see is to use StreamingContext#queueStream, but I'd have to either enqueue elements from another thread or subclass Queue to create a queue that behaves like an infinite stream, both of which feel like a hack.
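For concreteness, the "enqueue from another thread" variant would look roughly like this (the batch size is arbitrary, and this assumes an existing StreamingContext ssc):

import scala.collection.mutable
import org.apache.spark.rdd.RDD

val queue = mutable.Queue.empty[RDD[Int]]
val numbers = ssc.queueStream(queue) // DStream fed by whatever is in the queue
new Thread(new Runnable {
  def run(): Unit =
    Stream.from(0).grouped(100).foreach { batch =>
      queue.synchronized { queue += ssc.sparkContext.makeRDD(batch) }
    }
}).start()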
What's the correct way to do this?

I don't think it's available in Spark out of the box, but it's easy to implement with a custom ReceiverInputDStream.
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.receiver.Receiver

class InfiniteStreamInputDStream[T](
    @transient ssc_ : StreamingContext,
    stream: Stream[T],
    storageLevel: StorageLevel
) extends ReceiverInputDStream[T](ssc_) {
  override def getReceiver(): Receiver[T] = {
    new InfiniteStreamReceiver(stream, storageLevel)
  }
}

class InfiniteStreamReceiver[T](stream: Stream[T], storageLevel: StorageLevel) extends Receiver[T](storageLevel) {
  // Stateful iterator over the (possibly infinite) stream
  private val streamIterator = stream.iterator

  private class ReadAndStore extends Runnable {
    def run(): Unit = {
      while (streamIterator.hasNext) {
        val next = streamIterator.next()
        store(next)
      }
    }
  }

  override def onStart(): Unit = {
    // start(), not run(): onStart must return quickly, so pushing happens on its own thread
    new Thread(new ReadAndStore).start()
  }

  override def onStop(): Unit = { }
}

Slightly modified code that works with Spark 2.0 (note the added ClassTag context bound and the daemon thread):
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.receiver.Receiver
import scala.reflect.ClassTag

class InfiniteDStream[T: ClassTag](
    @transient ssc_ : StreamingContext,
    stream: Stream[T],
    storageLevel: StorageLevel
) extends ReceiverInputDStream[T](ssc_) {
  override def getReceiver(): Receiver[T] = {
    new InfiniteStreamReceiver(stream, storageLevel)
  }
}

class InfiniteStreamReceiver[T](stream: Stream[T], storageLevel: StorageLevel) extends Receiver[T](storageLevel) {
  private class ReadAndStore extends Runnable {
    def run(): Unit = {
      stream.foreach(store)
    }
  }

  override def onStart(): Unit = {
    val t = new Thread(new ReadAndStore)
    t.setDaemon(true)
    t.start() // start(), not run(), so the receiver thread does not block onStart
  }

  override def onStop(): Unit = {}
}

Related

Scala Generic Repository Class for Reactive Mongo Repository (Alpakka) - Expected Class, Found T

I'm trying to create a generic class in Scala so I can build a repository for different collections without repeating myself.
The problem is that if I make it a generic class (as in this example), I get an error on this line:
val codecRegistry = fromRegistries(fromProviders(classOf[T]), DEFAULT_CODEC_REGISTRY)
Expected Class but Found [T]
But if I replace T with a concrete class (let's say User) everywhere in the code, it works.
This is my class:
package persistence.repository.impl

import akka.stream.Materializer
import akka.stream.alpakka.mongodb.scaladsl.{MongoSink, MongoSource}
import akka.stream.scaladsl.{Sink, Source}
import akka.{Done, NotUsed}
import com.mongodb.reactivestreams.client.MongoClients
import constants.MongoConstants._
import org.bson.codecs.configuration.CodecRegistries.{fromProviders, fromRegistries}
import org.mongodb.scala.MongoClient.DEFAULT_CODEC_REGISTRY
import org.mongodb.scala.bson.codecs.Macros._
import org.mongodb.scala.model.Filters
import persistence.entity.ProductItem
import persistence.repository.Repository
import scala.concurrent.{ExecutionContext, Future}

class UserMongoDatabase[T](implicit materializer: Materializer,
                           executionContext: ExecutionContext)
    extends Repository[T] {

  val codecRegistry = fromRegistries(fromProviders(classOf[T]), DEFAULT_CODEC_REGISTRY)

  val client = MongoClients.create(HOST)
  val db = client.getDatabase(DATABASE)
  val requestedCollection = db
    .getCollection(USER_COLLECTION, classOf[T])
    .withCodecRegistry(codecRegistry)

  val source: Source[T, NotUsed] =
    MongoSource(requestedCollection.find(classOf[T]))
  val rows: Future[Seq[T]] = source.runWith(Sink.seq)

  override def getAll: Future[Seq[T]] = rows

  override def getById(id: AnyVal): Future[Option[T]] = rows.map { list =>
    list.filter { user =>
      user.asInstanceOf[{ def _id: AnyVal }]._id == id
    }.headOption
  }

  override def getByEmail(email: String): Future[Option[T]] = rows.map { list =>
    list.filter { user =>
      user.asInstanceOf[{ def email: AnyVal }].email == email
    }.headOption
  }

  override def save(obj: T): Future[T] = {
    val source = Source.single(obj)
    source.runWith(MongoSink.insertOne(requestedCollection)).map(_ => obj)
  }

  override def delete(id: AnyVal): Future[Done] = {
    val source = Source.single(id).map(i => Filters.eq("_id", id))
    source.runWith(MongoSink.deleteOne(requestedCollection))
  }
}
This is my repository trait:
package persistence.repository

import akka.Done
import scala.concurrent.Future

trait Repository[T] {
  def getAll: Future[Seq[T]]
  def getById(id: AnyVal): Future[Option[T]]
  def save(user: T): Future[T]
  def delete(id: AnyVal): Future[Done]
  def getByEmail(email: String): Future[Option[T]]
}
As said in the comments, this is the perfect example of a use case for ClassTag in Scala. It allows the actual runtime class of a generic/parameterized type to be retained despite erasure.
class DefaultMongoDatabase[T](implicit ..., ct: ClassTag[T])
    extends Repository[T] {

  // ct.runtimeClass recovers the erased runtime class of T
  val codecRegistry = fromRegistries(fromProviders(ct.runtimeClass), ...)
(You can move the ClassTag logic into the trait if you want.)
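To illustrate what the ClassTag buys you, here is a minimal, self-contained sketch (printRuntimeClass is a made-up name):

import scala.reflect.ClassTag

// Without the ClassTag, T is erased at runtime and there is no classOf[T].
def printRuntimeClass[T](implicit ct: ClassTag[T]): Unit =
  println(ct.runtimeClass)

printRuntimeClass[String] // prints: class java.lang.String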

How to implement a generic Slick DAO with OAuth2 in Scala

To work with OAuth2, Slick, and Akka HTTP in Scala, I'm using the code from cdiniz: https://github.com/cdiniz/slick-akka-http-oauth2.
I tried to create a DAO (UserDao) and a corresponding API (UserApi) to fetch all records (as JSON) from the database after successful OAuth2 verification.
BaseDao Code:
trait BaseDao[T, A] {
  ...
  def getAll(): Future[Seq[A]]
}

class BaseDaoImpl[T <: BaseTable[A], A <: BaseEntity]()(
    implicit val tableQ: TableQuery[T],
    implicit val db: JdbcProfile#Backend#Database,
    implicit val profile: JdbcProfile
) extends BaseDao[T, A] with Profile with DbModule {
  import profile.api._
  ...
  override def getAll(): Future[Seq[A]] = {
    db.run(tableQ.result)
  }
}
UserDao Code:
object UserService {
  val modules = new ConfigurationServiceImpl with ActorServiceImpl with PersistenceServiceImpl
  import modules._

  class UserDao extends BaseDaoImpl[UserTable, UserEntity1] {}
}
UserApi Code:
// using io.circe for the JSON encoder
class UserApi(val modules: Configuration with PersistenceService, val db: DatabaseService)(implicit executionContext: ExecutionContext) extends BaseApi {
  override val oauth2DataHandler = modules.oauth2DataHandler
  val userService = new UserDao

  val testApi = pathPrefix("auth") {
    path("users") {
      pathEndOrSingleSlash {
        get {
          authenticateOAuth2Async[AuthInfo[OauthAccount]]("realm", oauth2Authenticator) {
            auth => complete(userService.getAll().map(_.asJson))
          }
        }
      }
    }
  }
}
Supporting Codes:
trait Profile {
  val profile: JdbcProfile
}

trait DbModule extends Profile {
  val db: JdbcProfile#Backend#Database
}

trait PersistenceService {
  val accountsDao: AccountsDao
  val oAuthAuthorizationCodesDao: OAuthAuthorizationCodesDao
  val oauthClientsDao: OAuthClientsDao
  val oauthAccessTokensDao: OAuthAccessTokensDao
  val oauth2DataHandler: DataHandler[OauthAccount]
}

trait PersistenceServiceImpl extends PersistenceService with DbModule {
  this: Configuration =>

  private val dbConfig: DatabaseConfig[JdbcProfile] = DatabaseConfig.forConfig[JdbcProfile]("mysqldb")
  override implicit val profile: JdbcProfile = dbConfig.driver
  override implicit val db: JdbcProfile#Backend#Database = dbConfig.db
  override val accountsDao = new AccountsDaoImpl
  override val oAuthAuthorizationCodesDao = new OAuthAuthorizationCodesDaoImpl
  override val oauthClientsDao = new OAuthClientsDaoImpl(this)
  override val oauthAccessTokensDao = new OAuthAccessTokensDaoImpl(this)
  override val oauth2DataHandler = new OAuth2DataHandler(this)
}
The code runs without errors, but the response contains no records, even though the table has some.
What is the problem in this code? Any suggestion would be appreciated.

Unit testing in Spark with SQLContext implicits

I'm trying to run multiple unit tests in Spark and have copied (and slightly adapted) the relevant bit from the Spark source code:
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, Suite}

trait SharedSparkContext extends BeforeAndAfterAll {
  self: Suite =>

  @transient private var _sc: SparkContext = _
  @transient private var _sqlContext: SQLContext = _

  def sc: SparkContext = _sc
  def sqlContext: SQLContext = _sqlContext

  private var conf = new SparkConf(false)

  override def beforeAll() {
    super.beforeAll()
    _sc = new SparkContext("local[*]", "Test Suites", conf)
    _sqlContext = new SQLContext(_sc)
  }

  override def afterAll() {
    try {
      LocalSparkContext.stop(_sc)
      _sc = null
    } finally {
      super.afterAll()
    }
  }
}
The LocalSparkContext class and its companion object are simply copied from the Spark source.
I thought about using it as follows, but that fails with "stable identifier required", because the def sqlContext is not a stable identifier from which implicits can be imported:
class MySuite extends FlatSpec with SharedSparkContext {
  import sqlContext.implicits._
  // ...
}
I have tried replacing it with the following, but that gives me null pointer exceptions:
class MySuite extends FlatSpec with SharedSparkContext {
  val sqlCtxt = sqlContext
  import sqlCtxt.implicits._
  // ...
}
I am using Spark 1.4.1 and I have set parallelExecution in test := false.
How can I get this to work (without using additional packages)?
Instead of using a trait, you can use a simple object that holds all your variables; here's what I do for my tests. (The NullPointerException above happens because val sqlCtxt is evaluated when the suite is constructed, before beforeAll has created the context.)
object TestConfiguration extends Serializable {
  private val sparkConf = new SparkConf()
    .setAppName("Tests")
    .setMaster("local")

  private lazy val sparkContext = new SparkContext(sparkConf)
  private lazy val sqlContext = new SQLContext(sparkContext)

  def getSqlContext() = {
    sqlContext
  }
}
Then, you'll be able to use the sqlContext in a test suite.
class MySuite extends FlatSpec with SharedSparkContext {
  val sqlCtxt = TestConfiguration.getSqlContext
  import sqlCtxt.implicits._
  // ...
}
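If you'd rather keep the trait, an alternative sketch is to defer the lookup with a lazy val (lazy vals are stable identifiers, so they can be imported from); this only works as long as the implicits are used inside test bodies, which run after beforeAll:

class MySuite extends FlatSpec with SharedSparkContext {
  lazy val sqlCtxt = sqlContext // not evaluated at construction time
  import sqlCtxt.implicits._
  // ...
}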

Scala/Akka/Guice dynamically injecting child actors

I would like to be able to create multiple instances of the same parent actor, but with different child actors. I assume this has to be possible with Guice, but I haven't found the solution.
Here is what I have in mind:
Controller:
class Application @Inject()(@Named(ParentActor.parentActor1) parentActor1: ActorRef,
                            @Named(ParentActor.parentActor2) parentActor2: ActorRef)
    extends Controller {

  def index = Action {
    parentActor1 ! "Message"
    parentActor2 ! "Message"
    Ok()
  }
}
Parent Actor:
object ParentActor {
  final val parentActor1 = "parentActor1"
  final val parentActor2 = "parentActor2"
}

class ParentActor @Inject() (childActor: ActorRef) extends Actor {
  def receive = {
    case "Message" =>
      println(s"ParentActor ${self.path} received message...")
      childActor ! "Message"
  }
}
Child Actor A:
class ChildActorA extends Actor {
  def receive = {
    case "Message" =>
      println("ChildActorA received message...")
  }
}
Child Actor B:
class ChildActorB extends Actor {
  def receive = {
    case "Message" =>
      println("ChildActorB received message...")
  }
}
Module:
class Modules extends AbstractModule with AkkaGuiceSupport {
  override def configure() = {
    bindActor[ParentActor](ParentActor.parentActor1)
    bindActor[ParentActor](ParentActor.parentActor2)
  }
}
What if I wanted "parentActor1" to have its "childActor" ref point to an instance of ChildActorA, and "parentActor2" to have its "childActor" ref point to an instance of ChildActorB? Is this possible to achieve with Guice?
I'm using some code based on https://github.com/rocketraman/activator-akka-scala-guice to accomplish something similar. I'm not using Play, so I have to initialize Guice and bootstrap the actor system myself:
import akka.actor._
import javax.inject.{Inject, Provider, Singleton}
import com.google.inject.AbstractModule
import net.codingwell.scalaguice.InjectorExtensions._
import com.google.inject.Guice
import com.google.inject.Injector
import scala.concurrent.Await
import scala.concurrent.duration.Duration

object Bootstrap extends App {
  val injector = Guice.createInjector(
    new AkkaModule(),
    new ServiceModule()
  )

  implicit val system = injector.instance[ActorSystem]

  val parentActor1 = system.actorOf(ParentActor.props(ChildActorA.name))
  val parentActor2 = system.actorOf(ParentActor.props(ChildActorB.name))

  parentActor1 ! "Message"
  parentActor2 ! "Message"

  system.terminate()
  Await.result(system.whenTerminated, Duration.Inf)
}
To initialize Guice there are two classes/objects: one to initialize the extension and inject the actor system where required:
import akka.actor.ActorSystem
import AkkaModule.ActorSystemProvider
import com.google.inject.{AbstractModule, Injector, Provider}
import com.typesafe.config.Config
import net.codingwell.scalaguice.ScalaModule
import javax.inject.Inject

object AkkaModule {
  class ActorSystemProvider @Inject() (val injector: Injector) extends Provider[ActorSystem] {
    override def get() = {
      val system = ActorSystem("actor-system")
      GuiceAkkaExtension(system).initialize(injector)
      system
    }
  }
}

class AkkaModule extends AbstractModule with ScalaModule {
  override def configure() {
    bind[ActorSystem].toProvider[ActorSystemProvider].asEagerSingleton()
  }
}
and another one to create the providers for the children:
import javax.inject.Inject
import akka.actor.{Actor, ActorRef, ActorSystem}
import com.google.inject.name.{Named, Names}
import com.google.inject.{AbstractModule, Provides, Singleton}
import net.codingwell.scalaguice.ScalaModule

class ServiceModule extends AbstractModule with ScalaModule with GuiceAkkaActorRefProvider {
  override def configure() {
    bind[Actor].annotatedWith(Names.named(ChildActorA.name)).to[ChildActorA]
    bind[Actor].annotatedWith(Names.named(ChildActorB.name)).to[ChildActorB]
  }

  @Provides
  @Named(ChildActorA.name)
  def provideChildActorARef(@Inject() system: ActorSystem): ActorRef = provideActorRef(system, ChildActorA.name)

  @Provides
  @Named(ChildActorB.name)
  def provideChildActorBRef(@Inject() system: ActorSystem): ActorRef = provideActorRef(system, ChildActorB.name)
}
The extension:
import akka.actor._
import com.google.inject.Injector

class GuiceAkkaExtensionImpl extends Extension {
  private var injector: Injector = _

  def initialize(injector: Injector) {
    this.injector = injector
  }

  def props(actorName: String) = Props(classOf[GuiceActorProducer], injector, actorName)
}

object GuiceAkkaExtension extends ExtensionId[GuiceAkkaExtensionImpl] with ExtensionIdProvider {
  override def lookup() = GuiceAkkaExtension
  override def createExtension(system: ExtendedActorSystem) = new GuiceAkkaExtensionImpl
  override def get(system: ActorSystem): GuiceAkkaExtensionImpl = super.get(system)
}

trait NamedActor {
  def name: String
}

trait GuiceAkkaActorRefProvider {
  def propsFor(system: ActorSystem, name: String) = GuiceAkkaExtension(system).props(name)
  def provideActorRef(system: ActorSystem, name: String): ActorRef = system.actorOf(propsFor(system, name))
}
The producer:
import akka.actor.{IndirectActorProducer, Actor}
import com.google.inject.name.Names
import com.google.inject.{Key, Injector}

class GuiceActorProducer(val injector: Injector, val actorName: String) extends IndirectActorProducer {
  override def actorClass = classOf[Actor]
  override def produce() = injector.getBinding(Key.get(classOf[Actor], Names.named(actorName))).getProvider.get()
}
And your actors:
import javax.inject.Inject
import akka.actor._

object ParentActor {
  def props(childName: String)(implicit @Inject() system: ActorSystem) =
    Props(classOf[ParentActor], system.actorOf(GuiceAkkaExtension(system).props(childName)))
}

class ParentActor(childActor: ActorRef) extends Actor {
  def receive = {
    case "Message" =>
      println(s"ParentActor ${self.path} received message...")
      childActor ! "Message"
  }
}

object ChildActorA extends NamedActor {
  override final val name = "ChildActorA"
  def props() = Props(classOf[ChildActorA])
}

class ChildActorA extends Actor {
  def receive = {
    case "Message" =>
      println("ChildActorA received message...")
  }
}

object ChildActorB extends NamedActor {
  override final val name = "ChildActorB"
  def props() = Props(classOf[ChildActorB])
}

class ChildActorB extends Actor {
  def receive = {
    case "Message" =>
      println("ChildActorB received message...")
  }
}
The output from sbt:
> run
[info] Running Bootstrap
ParentActor akka://actor-system/user/$b received message...
ParentActor akka://actor-system/user/$d received message...
ChildActorB received message...
ChildActorA received message...
[success] Total time: 1 s, completed Jun 14, 2016 1:23:59 AM
You have to explicitly name the children. It's not the purest or most elegant answer, and I'm sure the code can be optimized, but it allows you to create instances of the same parent with different children.
I'm thinking that you can also use BindingAnnotations, along the lines of the sketch below.
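An untested sketch of that idea, reusing ParentActor.props and the NamedActor objects from above (the module and provider names are made up, and it assumes the ActorSystem binding from AkkaModule): expose each parent as a distinctly @Named ActorRef, each one constructed with a different child.

import akka.actor.{ActorRef, ActorSystem}
import com.google.inject.name.Named
import com.google.inject.{AbstractModule, Provides}

class ParentsModule extends AbstractModule {
  override def configure(): Unit = {}

  // Each provider builds a parent whose child is resolved by a different name.
  @Provides @Named("parentWithChildA")
  def parentWithChildA(system: ActorSystem): ActorRef =
    system.actorOf(ParentActor.props(ChildActorA.name)(system))

  @Provides @Named("parentWithChildB")
  def parentWithChildB(system: ActorSystem): ActorRef =
    system.actorOf(ParentActor.props(ChildActorB.name)(system))
}

// The injection site would then look like:
// class Application @Inject()(@Named("parentWithChildA") parentA: ActorRef, ...)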

How to correctly schedule task in Play Framework 2.4.2 scala?

I'm trying to schedule tasks like this in Play Framework 2.4.2 (Scala), without luck:
import akka.actor.Actor
import play.api.Logger
import play.api.libs.concurrent.Akka
import scala.concurrent.duration._
import play.api.Play.current
import scala.concurrent.ExecutionContext.Implicits.global

class Scheduler extends Actor {
  override def preStart() {
    val dbupdate = Akka.system.scheduler.schedule(
      0.microseconds, 5.minutes, self, "update")
    val pictureClean = Akka.system.scheduler.schedule(
      0.microseconds, 30.minutes, self, "clean")
  }

  def receive = {
    case "update" => updateDB()
    case "clean" => clean()
  }

  def updateDB(): Unit = {
    Logger.debug("updates running")
  }

  def clean(): Unit = {
    Logger.debug("cleanup running")
  }
}
Nothing is printed to the console. What am I doing wrong?
OK, here is the working scheduler code I've built:
Module:
class JobModule extends AbstractModule with AkkaGuiceSupport {
  def configure() = {
    bindActor[SchedulerActor]("scheduler-actor")
    bind(classOf[Scheduler]).asEagerSingleton()
  }
}
Scheduler:
class Scheduler @Inject() (val system: ActorSystem,
                           @Named("scheduler-actor") val schedulerActor: ActorRef)
                          (implicit ec: ExecutionContext) {
  system.scheduler.schedule(
    0.microseconds, 5.minutes, schedulerActor, "update")
  system.scheduler.schedule(
    30.minutes, 30.days, schedulerActor, "clean")
}
Actor:
@Singleton
class SchedulerActor @Inject() (updater: Updater) extends Actor {
  def receive = {
    case "update" => updateDB()
    case "clean" => clean()
  }

  def updateDB(): Unit = {
    Logger.debug("updates running")
  }

  def clean(): Unit = {
    Logger.debug("cleanup running")
  }
}
You also need to add your module in application.conf:
play.modules.enabled += "modules.JobModule"
Hope this will help someone.
Try context.system.scheduler.schedule instead.
Also make sure you have logging at debug level; otherwise those messages won't make it to the console. If you're unsure, try changing them to Logger.error temporarily.
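Applied to the original actor, that suggestion would look roughly like this (an untested sketch; the Cancellable bookkeeping is an addition, and the actor still has to be instantiated, e.g. through a module binding as shown above, for preStart to run):

import akka.actor.{Actor, Cancellable}
import play.api.Logger
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

class Scheduler extends Actor {
  private var tasks: Seq[Cancellable] = Nil

  override def preStart(): Unit = {
    // context.system is available inside the actor, so Akka.system isn't needed
    tasks = Seq(
      context.system.scheduler.schedule(0.microseconds, 5.minutes, self, "update"),
      context.system.scheduler.schedule(0.microseconds, 30.minutes, self, "clean")
    )
  }

  override def postStop(): Unit = tasks.foreach(_.cancel())

  def receive = {
    case "update" => Logger.debug("updates running")
    case "clean"  => Logger.debug("cleanup running")
  }
}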