Spark throws Not Serializable Exception inside a foreachRDD operation

I'm trying to implement an observer pattern using Scala and Spark Streaming. The idea is that whenever I receive a record from the stream (from Kafka), I notify the observers by calling the method notifyObservers inside the closure.
The stream is provided by KafkaUtils.
The method notifyObservers is defined in an abstract class, following the rules of the pattern.
I think the error is related to the fact that methods can't be serialized.
Am I thinking correctly? And if so, what kind of solution should I follow?
Thanks. Here's the code:
def onMessageConsumed() = {
  stream.foreachRDD(rdd => {
    rdd.foreach(consumerRecord => {
      val record = new Record[T](consumerRecord.topic(),
        consumerRecord.value())
      // notify observers with the record to compute
      notifyObservers(record)
    })
  })
}

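Essentially, yes. Every class whose instances are referenced by code that is sent to the executors (the closures passed to foreach, map, etc.) must implement the Serializable interface. Because notifyObservers is a method of the enclosing class, calling it inside rdd.foreach captures that enclosing object, so the class that defines it (and the observers it holds) must be serializable.
Also, if your notification code requires a connection to some external resource, wrap foreach in foreachPartition so that the connection is opened once per partition instead of once per record, something like this: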
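To see why the closure fails, the sketch below reproduces the problem without Spark by Java-serializing a closure, which is roughly what Spark does before shipping a task to the executors. It assumes Scala 2.12 or later (where function literals compile to serializable lambdas); PlainSubject, SerializableSubject, Observer, and LoggingObserver are hypothetical stand-ins for the question's classes, not part of any library.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureCaptureDemo {
  // Hypothetical observer type. Observers are reachable from the closure,
  // so they have to be serializable as well.
  trait Observer extends Serializable {
    def update(record: String): Unit
  }

  final case class LoggingObserver(name: String) extends Observer {
    def update(record: String): Unit = println(s"$name received $record")
  }

  // Like the question's abstract subject class: NOT Serializable.
  class PlainSubject(observers: List[Observer]) {
    def notifyObservers(record: String): Unit = observers.foreach(_.update(record))
    // Calling an instance method inside the lambda captures `this` (the subject).
    def handler: String => Unit = record => notifyObservers(record)
  }

  // Same logic, but the enclosing class itself can be serialized.
  class SerializableSubject(observers: List[Observer]) extends Serializable {
    def notifyObservers(record: String): Unit = observers.foreach(_.update(record))
    def handler: String => Unit = record => notifyObservers(record)
  }

  // Roughly the check performed before a task is shipped: Java-serialize the closure.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

The lambda is identical in both subjects; only `extends Serializable` on the enclosing class changes the outcome. Another common option is to avoid capturing the subject at all, for example by copying what the closure needs into a local val of a serializable type before calling rdd.foreach.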
stream.foreachRDD(rdd => {
  rdd.foreachPartition(rddPartition => {
    // set up connection to external component
    rddPartition.foreach(consumerRecord => {
      val record = new Record[T](consumerRecord.topic(),
        consumerRecord.value())
      notifyObservers(record)
    })
    // close connection to external component
  })
})
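Here is a runnable sketch of the per-partition pattern, with plain iterators standing in for RDD partitions so it runs without Spark. FakeConnection and processPartition are hypothetical; in a real job, the body of processPartition would go inside rdd.foreachPartition, and the (topic, value) pairs would come from consumerRecord.topic() and consumerRecord.value(). It also uses try/finally so the connection is closed even when sending a record fails.

```scala
object PerPartitionDemo {
  // Hypothetical stand-in for a connection to an external system.
  final class FakeConnection {
    private var sentMessages = Vector.empty[String]
    private var isClosed = false

    def send(msg: String): Unit = {
      require(!isClosed, "connection already closed")
      sentMessages :+= msg
    }
    def close(): Unit = isClosed = true
    def sent: Vector[String] = sentMessages
    def closed: Boolean = isClosed
  }

  // The body you would pass to rdd.foreachPartition: open one connection per
  // partition, send every record, and always close the connection afterwards.
  def processPartition(records: Iterator[(String, String)],
                       open: () => FakeConnection): FakeConnection = {
    val conn = open()
    try records.foreach { case (topic, value) => conn.send(s"$topic:$value") }
    finally conn.close()
    conn
  }
}
```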

Related

Lagom: How to use topics without Event Sourcing

I am new to Lagom and I want to deal with Event Sourcing at some later point. So I'm not using the persistentEntityRegistry; instead, I'm using a simple repository for CRUD operations.
Anyway, I would like to be able to notify other services when a create operation has happened.
In the hello-lagom project the topic is implemented like this:
override def greetingsTopic(): Topic[GreetingMessage] =
  TopicProducer.singleStreamWithOffset { fromOffset =>
    persistentEntityRegistry.eventStream(HelloEventTag.INSTANCE, fromOffset)
      .map(ev => (convertEvent(ev), ev.offset))
  }
This obviously won't work when I'm not using event sourcing, so I wonder whether there is another way to publish events to the topic.
I'm thinking about something like this:
override def createSth = ServiceCall { createCommandData =>
  val id = UUID.randomUUID()
  repository.addSth(Sth(id, createCommandData.someValue)) map { createdItem =>
    myTopic.publish(SthWasCreated(id))
  }
}

Make DB calls through different models atomic

Let's say there are two models, Model1 and Model2, each with a set of basic methods that call the DB to retrieve or write data. One Model1 can have multiple Model2 instances, and when inserting a (Model1, List[Model2]), all of this data comes from the same form. The current implementation does the following:
1. Insert the Model1 instance using Model1's insert method.
2. Once Model1 has been inserted successfully, insert the List[Model2] using Model2's insert method.
The problem is that if an error occurs while inserting one of the Model2 instances, Model1 remains in the DB. One solution would be to catch any exception that Anorm throws and undo whatever was executed before by doing the exact opposite. But is there an existing solution for this? Something that captures all DB calls that were executed and reverts them if needed?

What you're looking for is DB.withTransaction. It works exactly the same as DB.withConnection, except that autocommit is set to false, so that if any exceptions are thrown, the entire transaction will be rolled back.
Example:
case class Model1(id: Long, something: String, children: List[Model2])
case class Model2(id: Long, name: String)
object Model1 {
  def create(model: Model1): Option[Model1] = {
    DB.withTransaction { implicit c =>
      SQL(...).executeInsert().map { id =>
        model.copy(
          id = id,
          children = Model2.create(model.children)
        )
      }
    }
  }
}
object Model2 {
  def create(models: List[Model2])(implicit c: java.sql.Connection): List[Model2] = {
    ...
  }
}
Note how Model2.create accepts an implicit Connection parameter. This is so that it will use the same Connection as the Model1.create transaction, and be allowed to roll back on failure. I've left out the fine implementation details, as the key is just using withTransaction, and running each query on the same Connection.
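For intuition, here is a minimal sketch of what a helper like DB.withTransaction does with plain JDBC: disable autocommit, run every statement on one Connection, then commit, or roll back if anything throws. TransactionSketch.withTransaction is a hypothetical name; this is not Play's actual implementation, and unlike Play's helper it neither obtains nor closes the connection itself.

```scala
import java.sql.Connection

object TransactionSketch {
  // Runs `block` on a single Connection as one transaction (hypothetical helper).
  def withTransaction[A](conn: Connection)(block: Connection => A): A = {
    conn.setAutoCommit(false) // statements are no longer committed one by one
    try {
      val result = block(conn)
      conn.commit() // everything succeeded: make all writes visible together
      result
    } catch {
      case e: Throwable =>
        conn.rollback() // any failure undoes every statement run on `conn`
        throw e
    }
  }
}
```

Model1.create would then run its insert and Model2.create inside withTransaction(conn) { implicit c => ... }, so both models share c, and an exception from any Model2 insert also rolls back the Model1 row.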

Scala design suggestion needed

I would like to design a client that talks to a REST API. I have implemented the part that actually calls the HTTP methods on the server; I call this layer the API layer. Each operation the server exposes is encapsulated as one method in this layer. This method takes as input a ClientContext, which contains all the information needed to make the HTTP call on the server.
I'm now trying to set up the interface to this layer; let's call it ClientLayer. This is the interface that users of my client library should use to consume the services. When calling the interface, the user should create the ClientContext and set up the request parameters depending on the operation they want to invoke. With the traditional Java approach, I would have state on my ClientLayer object that represents the ClientContext:
For example:
public class ClientLayer {
    private final ClientContext clientContext;
    ...
}
I would then have some constructors that set up my ClientContext. A sample call would look like this:
ClientLayer client = ClientLayer.getDefaultClient();
client.executeMyMethod(client.getClientContext(), new MyMethodParameters(...));
Coming to Scala, any suggestions on how to have the same level of simplicity with respect to the ClientContext instantiation while avoiding having it as a state on the ClientLayer?

I would use the factory pattern here:
object RestClient {
  class ClientContext
  class MyMethodParameters
  trait Client {
    def operation1(params: MyMethodParameters): Unit
  }
  class MyClient(val context: ClientContext) extends Client {
    def operation1(params: MyMethodParameters): Unit = {
      // do something here based on the context
    }
  }
  object ClientFactory {
    val defaultContext: ClientContext = new ClientContext // set it up here
    def build(context: ClientContext): Client = {
      // builder logic here
      // object caching can be used to avoid instantiation of duplicate objects
      context match {
        case _ => new MyClient(context)
      }
    }
    def getDefaultClient: Client = build(defaultContext)
  }
  def main(args: Array[String]): Unit = {
    val client = ClientFactory.getDefaultClient
    client.operation1(new MyMethodParameters())
  }
}
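The comment in build mentions object caching. A minimal sketch of that idea, assuming ClientContext is an immutable case class (so equal settings compare equal and can serve as a map key) and using scala.collection.concurrent.TrieMap as a thread-safe cache. The fields baseUrl and timeoutMillis and the string returned by operation1 are hypothetical placeholders for the real HTTP call.

```scala
import scala.collection.concurrent.TrieMap

object CachedRestClient {
  // Hypothetical context: a case class, so equal settings are equal cache keys.
  final case class ClientContext(baseUrl: String, timeoutMillis: Int)
  final case class MyMethodParameters(query: String)

  trait Client {
    def context: ClientContext
    def operation1(params: MyMethodParameters): String
  }

  final class MyClient(val context: ClientContext) extends Client {
    // Placeholder: a real client would delegate to the API layer here.
    def operation1(params: MyMethodParameters): String =
      s"GET ${context.baseUrl}/op1?q=${params.query}"
  }

  object ClientFactory {
    val defaultContext: ClientContext = ClientContext("http://localhost:8080", 5000)

    // One client per distinct context, kept in a thread-safe map.
    private val cache = TrieMap.empty[ClientContext, Client]

    def build(context: ClientContext): Client =
      cache.getOrElseUpdate(context, new MyClient(context))

    def getDefaultClient: Client = build(defaultContext)
  }
}
```

Callers only see ClientFactory.getDefaultClient or ClientFactory.build(ctx); the context lives inside an immutable client rather than as mutable state on a shared object.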

New Relic async tracing in Akka without Play

I have an application written using the Akka library, and I want to use New Relic to monitor it. I've noticed that in Play applications, all async actions during request handling are handled properly, and all actors involved are shown in the web transaction trace.
But when I try to instrument a pure Akka application using custom Java transaction traces, I can't achieve the same result: every trace consists of just one line with the doJob method name. Code below:
case class NewRelicRequest(...) extends Request { ... }
case class NewRelicResponse(...) extends Response { ... }
class MyApiActor extends Actor {
  def receive = {
    case MyRequest(_) => doJob(...)
    case MyOtherRequest(_) => doOtherJob(...)
  }
  @Trace(dispatcher = true)
  private def doJob(...) {
    NewRelic.setRequestAndResponse(NewRelicRequest("/doJob"), NewRelicResponse(...))
    fooActor ! msg
  }
  @Trace(dispatcher = true)
  private def doOtherJob(...) {
    NewRelic.setRequestAndResponse(NewRelicRequest("/doOtherJob"), NewRelicResponse(...))
    (barActor ? msg).pipeTo(sender)
  }
}
Can someone explain which cases are supported, and how I can achieve async traces similar to those I see for Play apps?

Using Akka with Scalatra

My goal is to build a highly concurrent backend for my widgets. I'm currently exposing the backend as a web service (using Scalatra) that receives requests to run a specific widget, fetches the widget's code from the DB, and runs it in an actor (using Akka), which then replies with the results. So imagine I'm doing something like:
get("/run/:id") {
  ...
  val actor = Actor.actorOf("...").start
  val result = actor !! (("Run", id), 10000)
  ...
}
Now I believe this is not the best concurrent solution and I should somehow combine listening for requests and running widgets in one actor implementation. How would you design this for maximum concurrency? Thanks.

You can start your actors in an Akka boot file or in your own ServletContextListener, so that they are started without being tied to a servlet.
Then you can look them up with the Akka registry:
Actor.registry.actorFor[MyActor] foreach { _ !! (("Run", id), 10000) }
Apart from that, there is currently no real integration between Akka and Scalatra.
So for now, the best you can do is make blocking requests to a bunch of actors.
I'm not sure, but I wouldn't necessarily spawn an actor for each request; rather, I'd keep a pool of widget actors to which you send those requests. If you use a supervisor hierarchy, you can use a supervisor to resize the pool when it is too big or too small.
class MyContextListener extends ServletContextListener {
  def contextInitialized(sce: ServletContextEvent) {
    val factory = SupervisorFactory(
      SupervisorConfig(
        OneForOneStrategy(List(classOf[Exception]), 3, 1000),
        Supervise(actorOf[WidgetPoolSupervisor], Permanent) :: Nil))
    // then start the supervisor built by this factory
  }
  def contextDestroyed(sce: ServletContextEvent) {
    Actor.registry.shutdownAll()
  }
}
class WidgetPoolSupervisor extends Actor {
  self.faultHandler = OneForOneStrategy(List(classOf[Exception]), 3, 1000)
  override def preStart() {
    (1 to 5) foreach { _ =>
      self.spawnLink[MyWidgetProcessor]
    }
    Scheduler.schedule(self, 'checkPoolSize, 5, 5, TimeUnit.MINUTES)
  }
  protected def receive = {
    case 'checkPoolSize => {
      // Implement logic that checks how quickly the actors respond, and if
      // it takes too long, add some actors to the pool.
      // As a bonus, you can keep downsizing the actor pool until it reaches 1
      // or until the messages start returning too late.
    }
  }
}
class ScalatraApp extends ScalatraServlet {
  get("/run/:id") {
    val id = params("id")
    // The !! construct should not appear anywhere else in your code except
    // in the Scalatra action. You don't want to block anywhere else, but in a
    // Scalatra action it's OK, as the web request itself is synchronous too and
    // needs to wait for the full response to come back anyway.
    // flatMap (not foreach) keeps the Option returned by !!, so getOrElse applies.
    Actor.registry.actorFor[MyWidgetProcessor] flatMap {
      _ !! (("Run", id), 10000)
    } getOrElse {
      throw new HeyIExpectedAResultException()
    }
  }
}
Please regard the code above as pseudocode that happens to look like Scala; I just wanted to illustrate the concept.
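The same shape (a fixed pool of workers, with blocking confined to the HTTP action and bounded by a timeout) can be sketched with a fixed thread pool and scala.concurrent.Future instead of Akka 1.x actors. This swaps in a different technique and is not the Akka API used above; WidgetPool, runWidget, and runBlocking are hypothetical, and the pool size of 5 and the 10-second timeout simply mirror the numbers in the pseudocode.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object WidgetPool {
  // Fixed pool of 5 workers, mirroring the five MyWidgetProcessor actors.
  private val executor = Executors.newFixedThreadPool(5)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(executor)

  // Hypothetical stand-in for fetching a widget's code from the DB and running it.
  def runWidget(id: String): Future[String] = Future {
    s"result of widget $id"
  }

  // The only blocking call, as in the Scalatra action: wait at most 10 seconds.
  def runBlocking(id: String): String = Await.result(runWidget(id), 10.seconds)

  // Stop the worker threads, e.g. from contextDestroyed.
  def shutdown(): Unit = executor.shutdown()
}
```

A Scalatra action would then call WidgetPool.runBlocking(params("id")), and WidgetPool.shutdown() would belong in contextDestroyed.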
