I am streaming data from mysql using Slick 3 and Akka Streams.
This is how I build my source
import slick.jdbc.MySQLProfile.api._
val enableJdbcStreaming: (java.sql.Statement) => Unit = {statement =>
if (statement.isWrapperFor(classOf[com.mysql.cj.jdbc.StatementImpl])) {
statement.unwrap(classOf[com.mysql.cj.jdbc.StatementImpl]).enableStreamingResults()
}
}
val query = Tables.Foo.filter(r => r.isActive === true)
.map(r => r.id).result.withStatementParameters(statementInit = enableJdbcStreaming)
Source.fromPublisher(db.stream(query))
My application runs for like 20 minutes and then shuts down with the following error
[error] Exception in thread "abhipool network timeout executor" java.lang.NullPointerException
[info] 15:31:46 INFO [HikariPool] - abhipool - Close initiated...
[error] at com.mysql.cj.mysqla.io.MysqlaProtocol.setSocketTimeout(MysqlaProtocol.java:1397)
[error] at com.mysql.cj.mysqla.MysqlaSession$1.run(MysqlaSession.java:401)
[error] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[error] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[error] at java.lang.Thread.run(Thread.java:745)
I have a feeling that because my query is running for a very long time there is some kind of timeout occurring which is initiating this shutdown.
My connection
mysql {
profile = "slick.jdbc.MySQLProfile$"
dataSourceClass = "slick.jdbc.DatabaseUrlDataSource"
properties {
driver = "com.mysql.cj.jdbc.Driver"
url = "jdbc:mysql://foo:3306/bar?useLegacyDatetimeCode=false&serverTimezone=America/Chicago"
user = "foo"
password = "bar"
}
connectionTimeout = 0
idleTimeout = 0
maxLifetime = 0
maxConnections = 40
minConnections = 10
poolName = "abhipool"
numThreads = 10
}
Dependencies
"com.typesafe.slick" %% "slick" % "3.2.1",
"com.typesafe.slick" %% "slick-hikaricp" % "3.2.1",
"mysql" % "mysql-connector-java" % "6.0.6",
How can I configure my application database connections so that even if my streaming application streams data for several days... it keeps running.
There is an extremely lengthy conversation about this same issue here but it doesn't tell me how to really fix this issue. This issues makes it totally impossible to write long running streaming tasks which use Mysql as a source.
You can configure the MySQL driver by adding parameters in the URL
url = "jdbc:mysql://foo:3306/bar?useLegacyDatetimeCode=false&serverTimezone=America/Chicago&socketTimeout=30000"
I put 30000 for the sake of the example, put the right value that fits your need
Related
I am using Play (2.6) with Slick, my database connection times out every second or third try. The only way to get the connection up again is by restarting the app with sbt run. It's driving me crazy, any help appreciated.
To be clear, I'm using Slick with a local MySQL database for very lightweight usage.
build.sbt
libraryDependencies += "mysql" % "mysql-connector-java" % "8.0.11"
libraryDependencies += "com.typesafe.slick" %% "slick" % "3.2.3"
libraryDependencies += "com.typesafe.slick" %% "slick-hikaricp" % "3.2.3"
application.conf
# db connections = ((physical_core_count * 2) + effective_spindle_count)
fixedConnectionPool = 5
repository.dispatcher {
executor = "thread-pool-executor"
throughput = 1
thread-pool-executor {
fixed-pool-size = ${fixedConnectionPool}
}
}
Error message:
java.sql.SQLTransientConnectionException: <dbconfig> - Connection is not available, request timed out after 1001ms.
Stack Trace:
com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:548)
com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186)
com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:145)
com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:83)
slick.jdbc.hikaricp.HikariCPJdbcDataSource.createConnection(HikariCPJdbcDataSource.scala:14)
slick.jdbc.JdbcBackend$BaseSession.<init>(JdbcBackend.scala:453)
slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:46)
slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:37)
slick.basic.BasicBackend$DatabaseDef.acquireSession(BasicBackend.scala:249)
slick.basic.BasicBackend$DatabaseDef.acquireSession$(BasicBackend.scala:248)
slick.jdbc.JdbcBackend$DatabaseDef.acquireSession(JdbcBackend.scala:37)
slick.basic.BasicBackend$DatabaseDef$$anon$2.run(BasicBackend.scala:274)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
Update
I didn't include the specific db config in application.conf:
<name> {
slick.driver = scala.slick.driver.MySQLDriver
driver = "com.mysql.cj.jdbc.Driver"
url = __
user = __
password = __
}
I've added to the config:
keepAliveConnection = true
connectionPool = disabled
And it's working fine.
I have following environment scala2.11.8 / akka 2.4.8 / slick 3.1.1 / postgreSQL 9.6
I have done following configuration in application.conf
mydb {
driver = "slick.driver.PostgresDriver$"
db {
url = "jdbc:postgresql://localhost:5432/mydb"
driver = org.postgresql.Driver
user="postgres"
password="postgres"
numThreads = 10
connectionPool = disabled
keepAliveConnection = true
}
}
The DB access is done in class
package mib
import slick.driver.PostgresDriver.api._
import scala.concurrent.ExecutionContext.Implicits.global
class DBAccess {
import scala.concurrent.Future
import scala.concurrent._
import scala.concurrent.duration._
import slick.backend.DatabaseConfig
import slick.driver.JdbcProfile
import slick.driver.PostgresDriver
import slick.driver.PostgresDriver.api._
import slick.jdbc.JdbcBackend.Database
println("creating database")
val dbConfig: DatabaseConfig[PostgresDriver] = DatabaseConfig.forConfig("mydb")
val db = dbConfig.db
try{
val accesspoints = TableQuery[mibPoint]
// SELECT * FROM users WHERE username='john'
val q = for (a <- accesspoints) yield a.mib_id
val dbAction = q.result
val f: Future[Seq[String]] = db.run(dbAction)
Await.result(f, Duration.Inf)
f.onSuccess { case s => println(s"Result: $s") }
}
catch
{
case _: Throwable =>println("got some exception")
}
finally
db.close
}
// this is a class that represents the table I've created in the database
class mibPoint(tag: Tag) extends Table[(String, Double,Double)](tag, "mib_non_info") {
def mac_id = column[String]("mib_id",O.PrimaryKey)
def lat = column[Double]("lat")
def lng = column[Double]("lng")
def * = (mib_id, lat,lng)
}
This class is called from APP object as
object wmib extends App {
val mWBootStrapper = new bootStrap
mWBootStrapper.ReadProperties();
val mdB = new DBAccess
}
However after running, I always get the output as "got some exception"
I have tried to enable logging using slf4j/logback but still i do not see much in the logs.
The above seems like very trivial and probably i am missing something obvious.
Thanks in advance,
Vishal
I added the exception handling as suggested by sarvesh. That was cool and thank you.
However my problem vanished and there was no exception.
What happened?
Earlier in the day, I had attempted to access the DB using the java JDBC way.
i.e. just to check that there is nothing wrong with DB and DB access.
In the process, I downloaded and added the postgresDriver in the classpath. Earlier that was not the case.
Since the driver was now in the path, the code just worked.
Since I was not printing the exception, i was not realizing the error.
I then removed the driver jar AND i got the following error.
01:44:08.224 [mydb.db-1] DEBUG slick.jdbc.JdbcBackend.statement - Preparing statement: select "mib_id" from "mibpoint"
01:44:08.224 [mydb.db-1] DEBUG slick.jdbc.DriverDataSource - Driver org.postgresql.Driver not already registered; trying to load it
java.lang.ClassNotFoundException: org.postgresql.Driver
at java.lang.ClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at slick.util.ClassLoaderUtil$$anon$1.loadClass(ClassLoaderUtil.scala:12)
at slick.jdbc.DriverDataSource$$anonfun$init$2.apply(DriverDataSource.scala:60)
at slick.jdbc.DriverDataSource$$anonfun$init$2.apply(DriverDataSource.scala:58)
at scala.Option.getOrElse(Option.scala:121)
Thanks to all for helping.
Vishal
I was running into the same connection issues when first using Slick. I submitted this PR with details on how to connect up a local Postgres server.
https://github.com/slick/slick/issues/1861#issuecomment-387616310.
But basically try edit your build.sbt and application.conf files:
The 2020 answer:
You have to make sure of two things:
Add the driver to the build.sbt's libraryDependencies: "org.postgresql" % "postgresql" % "42.2.5". That will cause java.sql.DriverManager's method getDrivers (which is used by slick in class DriverDataSource) to find the driver org.postgresql.Driver
Make sure that the database url in application.conf is following the JDBC's full-url pattern, as described in the source code: https://github.com/slick/slick/blob/42d787b4950fe876569b5fd68e98c4e0379ac83c/slick/src/main/scala/slick/jdbc/DatabaseUrlDataSource.scala#L9. For example: postgresql://user:password#localhost:5432/postgres.
My full configuration is:
build.sbt
libraryDependencies ++= Seq(
...,
"org.postgresql" % "postgresql" % "42.2.5"
)
application.conf
slick-postgres {
profile = "slick.jdbc.PostgresProfile$"
db {
dataSourceClass = "slick.jdbc.DatabaseUrlDataSource"
properties = {
driver = "org.postgresql.Driver"
url = "postgresql://postgres:postgres#localhost:5432/postgres"
}
}
}
I added the exception handling as suggested by sarvesh. That was cool and thank you. However my problem vanished and there was no exception. What happened? Earlier in the day, I had attempted to access the DB using the java JDBC way. i.e. just to check that there is nothing wrong with DB and DB access. In the process, I downloaded and added the postgresDriver in the classpath. Earlier that was not the case. Since the driver was now in the path, the code just worked. Since I was not printing the exception, i was not realizing the error. I then removed the driver jar AND i got the following error.
01:44:08.224 [mydb.db-1] DEBUG slick.jdbc.JdbcBackend.statement - Preparing statement: select "mib_id" from "mibpoint"
01:44:08.224 [mydb.db-1] DEBUG slick.jdbc.DriverDataSource - Driver org.postgresql.Driver not already registered; trying to load it
java.lang.ClassNotFoundException: org.postgresql.Driver
at java.lang.ClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at slick.util.ClassLoaderUtil$$anon$1.loadClass(ClassLoaderUtil.scala:12)
at slick.jdbc.DriverDataSource$$anonfun$init$2.apply(DriverDataSource.scala:60)
at slick.jdbc.DriverDataSource$$anonfun$init$2.apply(DriverDataSource.scala:58)
at scala.Option.getOrElse(Option.scala:121)
Thanks to all for helping. Vishal
mydb {
dataSourceClass = "slick.jdbc.DatabaseUrlDataSource"
properties = {
driver = "slick.driver.PostgresDriver$"
url = "postgres://postgresql:postgresql#localhost:5432/mydb"
}
}
Or.. you can try something like,
mydb = {
dataSourceClass = "org.postgresql.ds.PGSimpleDataSource"
properties = {
url = "jdbc:postgresql://localhost:5432/mydb"
user = "postgres"
password = "postgres"
}
numThreads = 10
}
You need the Postgres Driver on the classpath:
Try adding "org.postgresql" % "postgresql" % "42.1.4" to your libraryDependencies.
In case anyone gets this weird error, which doesn't help explain what the problem is:
CreationException: Unable to create injector, see the following errors: 1) Error in custom provider, java.lang.IllegalStateException: when specifying driverClassName, jdbcUrl must also be specified while locating play.api.db.evolutions.ApplicationEvolutionsProvider at play.api.db.evolutions.EvolutionsModule.bindings(EvolutionsModule.scala:22): Binding(class play.api.db.evolutions.ApplicationEvolutions to ProviderConstructionTarget(class play.api.db.evolutions.ApplicationEvolutionsProvider) eagerly) (via modules: com.google.inject.util.Modules$OverrideModule -> play.api.inject.guice.GuiceableModuleConversions$$anon$1) while locating play.api.db.evolutions.ApplicationEvolutions 1 error
What I found strange was that the error goes away if you remove
"com.typesafe.play" %% "play-slick-evolutions" % "2.0.0"
from your build.sbt file.
Anyway, the problem was that I had my application.conf file look like this:
slick.dbs.default.driver = "slick.driver.PostgresDriver$"
slick.dbs.default.db.driver = "org.postgresql.Driver"
slick.dbs.default.url = "jdbc:postgresql://localhost:5432/pusdienodb"
slick.dbs.default.user = "pusdieno"
slick.dbs.default.password = "password"
Turns out that both url, user and password also need the .db. part.
So your configuration should look something like this in the end:
slick.dbs.default.driver = "slick.driver.PostgresDriver$"
slick.dbs.default.db.driver = "org.postgresql.Driver"
slick.dbs.default.db.url = "jdbc:postgresql://localhost:5432/pusdienodb"
slick.dbs.default.db.user = "pusdieno"
slick.dbs.default.db.password = "password"
I have a public available Amazon s3 resource (text file) and want to access it from spark. That means - I don't have any Amazon credentials - it works fine if I want to just download it:
val bucket = "<my-bucket>"
val key = "<my-key>"
val client = new AmazonS3Client
val o = client.getObject(bucket, key)
val content = o.getObjectContent // <= can be read and used as input stream
However, when I try to access the same resource from spark context
val conf = new SparkConf().setAppName("app").setMaster("local")
val sc = new SparkContext(conf)
val f = sc.textFile(s"s3a://$bucket/$key")
println(f.count())
I receive the following error with stacktrace:
Exception in thread "main" com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1781)
at org.apache.spark.rdd.RDD.count(RDD.scala:1099)
at com.example.Main$.main(Main.scala:14)
at com.example.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
I don't want to provide any AWS credentials - I just want to access resource anonymously (for now) - how to achieve this? I probably need to make it use something like AnonymousAWSCredentialsProvider - but how to put it inside spark or hadoop?
P.S. My build.sbt just in case
scalaVersion := "2.11.7"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.4.1",
"org.apache.hadoop" % "hadoop-aws" % "2.7.1"
)
UPDATED: After I did some investigations - I see the reason why itsn't working.
First of all, S3AFileSystem creates AWS client with the following order of credentials:
AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
new BasicAWSCredentialsProvider(accessKey, secretKey),
new InstanceProfileCredentialsProvider(),
new AnonymousAWSCredentialsProvider()
);
"accessKey" and "secretKey" values are taken from the spark conf instance (keys must be "fs.s3a.access.key" and "fs.s3a.secret.key" or org.apache.hadoop.fs.s3a.Constants.ACCESS_KEY and org.apache.hadoop.fs.s3a.Constants.SECRET_KEY constants, which is more convenient).
Second - you probably see that AnonymousAWSCredentialsProvider is the third option (last priority) - what could possible be wrong with that? See the implementation of AnonymousAWSCredentials:
public class AnonymousAWSCredentials implements AWSCredentials {
public String getAWSAccessKeyId() {
return null;
}
public String getAWSSecretKey() {
return null;
}
}
It simply returns null for both access key and secret key. Sounds reasonable. But look inside AWSCredentialsProviderChain:
AWSCredentials credentials = provider.getCredentials();
if (credentials.getAWSAccessKeyId() != null &&
credentials.getAWSSecretKey() != null) {
log.debug("Loading credentials from " + provider.toString());
lastUsedProvider = provider;
return credentials;
}
It doesn't choose provider in case both keys are null - that means anonymous credentials can't work. Looks like a bug inside aws-java-sdk-1.7.4. I tried to use latest version - but it's incompatible with hadoop-aws-2.7.1.
Any other ideas?
I personally never accessed public data from Spark. You can try to use dummy credentials, or to create ones just for this usage. Set them directly on the SparkConf object.
val sparkConf: SparkConf = ???
val accessKeyId: String = ???
val secretAccessKey: String = ???
sparkConf.set("spark.hadoop.fs.s3.awsAccessKeyId", accessKeyId)
sparkConf.set("spark.hadoop.fs.s3n.awsAccessKeyId", accessKeyId)
sparkConf.set("spark.hadoop.fs.s3.awsSecretAccessKey", secretAccessKey)
sparkConf.set("spark.hadoop.fs.s3n.awsSecretAccessKey", secretAccessKey)
As an alternative, read the documentation of DefaultAWSCredentialsProviderChain to see where the credentials are looked for. The list (order is important) is:
Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY
Java System Properties - aws.accessKeyId and aws.secretKey
Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI
Instance profile credentials delivered through the Amazon EC2 metadata service
This is what helped me:
val session = SparkSession.builder()
.appName("App")
.master("local[*]")
.config("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider")
.getOrCreate()
val df = session.read.csv(filesFromS3:_*)
Versions:
"org.apache.spark" %% "spark-sql" % "2.4.0",
"org.apache.hadoop" % "hadoop-aws" % "2.8.5",
Documentation:
https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Authentication_properties
It seems you can now use the aws.credentials.provider config key to use anonymous access given by org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider, which correctly special case the anonymous provider. However, you need a newer hadoop-aws than 2.7, which means you also need a spark installation without a bundled hadoop.
Here is how I did it colab:
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q http://apache.osuosl.org/spark/spark-2.3.1/spark-2.3.1-bin-without-hadoop.tgz
!tar xf spark-2.3.1-bin-without-hadoop.tgz
!pip install -q findspark
!pip install -q pyarrow
Now we install hadoop on the side, and set the output of hadoop classpath to SPARK_DIST_CLASSPATH, so spark can see it.
import os
!wget -q http://mirror.nbtelecom.com.br/apache/hadoop/common/hadoop-2.8.4/hadoop-2.8.4.tar.gz
!tar xf hadoop-2.8.4.tar.gz
os.environ['HADOOP_HOME']= '/content/hadoop-2.8.4'
os.environ["SPARK_DIST_CLASSPATH"] = "/content/hadoop-2.8.4/etc/hadoop:/content/hadoop-2.8.4/share/hadoop/common/lib/*:/content/hadoop-2.8.4/share/hadoop/common/*:/content/hadoop-2.8.4/share/hadoop/hdfs:/content/hadoop-2.8.4/share/hadoop/hdfs/lib/*:/content/hadoop-2.8.4/share/hadoop/hdfs/*:/content/hadoop-2.8.4/share/hadoop/yarn/lib/*:/content/hadoop-2.8.4/share/hadoop/yarn/*:/content/hadoop-2.8.4/share/hadoop/mapreduce/lib/*:/content/hadoop-2.8.4/share/hadoop/mapreduce/*:/content/hadoop-2.8.4/contrib/capacity-scheduler/*.jar"
Then we do like in https://mikestaszel.com/2018/03/07/apache-spark-on-google-colaboratory/, but add s3a and anonymous reading support, which is what the question is about.
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.3.1-bin-without-hadoop"
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.10.6,org.apache.hadoop:hadoop-aws:2.8.4 --conf spark.sql.execution.arrow.enabled=true --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider pyspark-shell'
And finally we can create the session.
import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()
I have been playing around with Akka Persistence and have written the following program to test my understanding. The problem is that I get different results each time I run this program. The correct answer is 49995000 but I don't always get that. I have cleaned out the journal directory between each run but it does not make any difference. Can anyone see what's going wrong? The program simply sums all the numbers from 1 to n (where n is 9999 in the code below).
The correct answer is : (n * (n+1)) / 2. For n=9999 that's 49995000.
EDIT: Seems to work more consistently with JDK 8 than with JDK 7. Should I be using JDK 8 only?
package io.github.ourkid.akka.aggregator.guaranteed
import akka.actor.Actor
import akka.actor.ActorPath
import akka.actor.ActorSystem
import akka.actor.Props
import akka.actor.actorRef2Scala
import akka.persistence.AtLeastOnceDelivery
import akka.persistence.PersistentActor
case class ExternalRequest(updateAmount : Int)
case class CountCommand(deliveryId : Long, updateAmount : Int)
case class Confirm(deliveryId : Long)
sealed trait Evt
case class CountEvent(updateAmount : Int) extends Evt
case class ConfirmEvent(deliveryId : Long) extends Evt
class TestGuaranteedDeliveryActor(counter : ActorPath) extends PersistentActor with AtLeastOnceDelivery {
override def persistenceId = "persistent-actor-ref-1"
override def receiveCommand : Receive = {
case ExternalRequest(updateAmount) => persist(CountEvent(updateAmount))(updateState)
case Confirm(deliveryId) => persist(ConfirmEvent(deliveryId)) (updateState)
}
override def receiveRecover : Receive = {
case evt : Evt => updateState(evt)
}
def updateState(evt:Evt) = evt match {
case CountEvent(updateAmount) => deliver(counter, id => CountCommand(id, updateAmount))
case ConfirmEvent(deliveryId) => confirmDelivery(deliveryId)
}
}
class FactorialActor extends Actor {
var count = 0
def receive = {
case CountCommand(deliveryId : Long, updateAmount:Int) => {
count = count + updateAmount
sender() ! Confirm(deliveryId)
}
case "print" => println(count)
}
}
object GuaranteedDeliveryTest extends App {
val system = ActorSystem()
val factorial = system.actorOf(Props[FactorialActor])
val delActor = system.actorOf(Props(classOf[TestGuaranteedDeliveryActor], factorial.path))
import system.dispatcher
system.scheduler.schedule(0 seconds, 2 seconds) { factorial ! "print" }
for (i <- 1 to 9999)
delActor ! ExternalRequest(i)
}
SBT file
name := "akka_aggregator"
organization := "io.github.ourkid"
version := "0.0.1-SNAPSHOT"
scalaVersion := "2.11.4"
scalacOptions ++= Seq("-unchecked", "-deprecation")
resolvers ++= Seq(
"Typesafe Repository" at "http://repo.typesafe.com/typesafe/releases/"
)
val Akka = "2.3.7"
val Spray = "1.3.2"
libraryDependencies ++= Seq(
// Core Akka
"com.typesafe.akka" %% "akka-actor" % Akka,
"com.typesafe.akka" %% "akka-cluster" % Akka,
"com.typesafe.akka" %% "akka-persistence-experimental" % Akka,
"org.iq80.leveldb" % "leveldb" % "0.7",
"org.fusesource.leveldbjni" % "leveldbjni-all" % "1.8",
// For future REST API
"io.spray" %% "spray-httpx" % Spray,
"io.spray" %% "spray-can" % Spray,
"io.spray" %% "spray-routing" % Spray,
"org.typelevel" %% "scodec-core" % "1.3.0",
// CSV reader
"net.sf.opencsv" % "opencsv" % "2.3",
// Logging
"com.typesafe.akka" %% "akka-slf4j" % Akka,
"ch.qos.logback" % "logback-classic" % "1.0.13",
// Testing
"org.scalatest" %% "scalatest" % "2.2.1" % "test",
"com.typesafe.akka" %% "akka-testkit" % Akka % "test",
"io.spray" %% "spray-testkit" % Spray % "test",
"org.scalacheck" %% "scalacheck" % "1.11.6" % "test"
)
fork := true
mainClass in assembly := Some("io.github.ourkid.akka.aggregator.TestGuaranteedDeliveryActor")
application.conf file
##########################################
# Akka Persistence Reference Config File #
##########################################
akka {
# Loggers to register at boot time (akka.event.Logging$DefaultLogger logs
# to STDOUT)
loggers = ["akka.event.slf4j.Slf4jLogger"]
# Log level used by the configured loggers (see "loggers") as soon
# as they have been started; before that, see "stdout-loglevel"
# Options: OFF, ERROR, WARNING, INFO, DEBUG
loglevel = "DEBUG"
# Log level for the very basic logger activated during ActorSystem startup.
# This logger prints the log messages to stdout (System.out).
# Options: OFF, ERROR, WARNING, INFO, DEBUG
stdout-loglevel = "INFO"
# Filter of log events that is used by the LoggingAdapter before
# publishing log events to the eventStream.
logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"
# Protobuf serialization for persistent messages
actor {
serializers {
akka-persistence-snapshot = "akka.persistence.serialization.SnapshotSerializer"
akka-persistence-message = "akka.persistence.serialization.MessageSerializer"
}
serialization-bindings {
"akka.persistence.serialization.Snapshot" = akka-persistence-snapshot
"akka.persistence.serialization.Message" = akka-persistence-message
}
}
persistence {
journal {
# Maximum size of a persistent message batch written to the journal.
max-message-batch-size = 200
# Maximum size of a deletion batch written to the journal.
max-deletion-batch-size = 10000
# Path to the journal plugin to be used
plugin = "akka.persistence.journal.leveldb"
# In-memory journal plugin.
inmem {
# Class name of the plugin.
class = "akka.persistence.journal.inmem.InmemJournal"
# Dispatcher for the plugin actor.
plugin-dispatcher = "akka.actor.default-dispatcher"
}
# LevelDB journal plugin.
leveldb {
# Class name of the plugin.
class = "akka.persistence.journal.leveldb.LeveldbJournal"
# Dispatcher for the plugin actor.
plugin-dispatcher = "akka.persistence.dispatchers.default-plugin-dispatcher"
# Dispatcher for message replay.
replay-dispatcher = "akka.persistence.dispatchers.default-replay-dispatcher"
# Storage location of LevelDB files.
dir = "journal"
# Use fsync on write
fsync = on
# Verify checksum on read.
checksum = off
# Native LevelDB (via JNI) or LevelDB Java port
native = on
# native = off
}
# Shared LevelDB journal plugin (for testing only).
leveldb-shared {
# Class name of the plugin.
class = "akka.persistence.journal.leveldb.SharedLeveldbJournal"
# Dispatcher for the plugin actor.
plugin-dispatcher = "akka.actor.default-dispatcher"
# timeout for async journal operations
timeout = 10s
store {
# Dispatcher for shared store actor.
store-dispatcher = "akka.persistence.dispatchers.default-plugin-dispatcher"
# Dispatcher for message replay.
replay-dispatcher = "akka.persistence.dispatchers.default-plugin-dispatcher"
# Storage location of LevelDB files.
dir = "journal"
# Use fsync on write
fsync = on
# Verify checksum on read.
checksum = off
# Native LevelDB (via JNI) or LevelDB Java port
native = on
}
}
}
snapshot-store {
# Path to the snapshot store plugin to be used
plugin = "akka.persistence.snapshot-store.local"
# Local filesystem snapshot store plugin.
local {
# Class name of the plugin.
class = "akka.persistence.snapshot.local.LocalSnapshotStore"
# Dispatcher for the plugin actor.
plugin-dispatcher = "akka.persistence.dispatchers.default-plugin-dispatcher"
# Dispatcher for streaming snapshot IO.
stream-dispatcher = "akka.persistence.dispatchers.default-stream-dispatcher"
# Storage location of snapshot files.
dir = "snapshots"
}
}
view {
# Automated incremental view update.
auto-update = on
# Interval between incremental updates
auto-update-interval = 5s
# Maximum number of messages to replay per incremental view update. Set to
# -1 for no upper limit.
auto-update-replay-max = -1
}
at-least-once-delivery {
# Interval between redelivery attempts
redeliver-interval = 5s
# Maximum number of unconfirmed messages that will be sent in one redelivery burst
redelivery-burst-limit = 10000
# After this number of delivery attempts a `ReliableRedelivery.UnconfirmedWarning`
# message will be sent to the actor.
warn-after-number-of-unconfirmed-attempts = 5
# Maximum number of unconfirmed messages that an actor with AtLeastOnceDelivery is
# allowed to hold in memory.
max-unconfirmed-messages = 100000
}
dispatchers {
default-plugin-dispatcher {
type = PinnedDispatcher
executor = "thread-pool-executor"
}
default-replay-dispatcher {
type = Dispatcher
executor = "fork-join-executor"
fork-join-executor {
parallelism-min = 2
parallelism-max = 8
}
}
default-stream-dispatcher {
type = Dispatcher
executor = "fork-join-executor"
fork-join-executor {
parallelism-min = 2
parallelism-max = 8
}
}
}
}
}
Correct output:
18:02:36.684 [default-akka.actor.default-dispatcher-3] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
18:02:36.684 [default-akka.actor.default-dispatcher-3] DEBUG akka.event.EventStream - logger log1-Slf4jLogger started
18:02:36.684 [default-akka.actor.default-dispatcher-3] DEBUG akka.event.EventStream - Default Loggers started
0
18:02:36.951 [default-akka.actor.default-dispatcher-14] DEBUG a.s.Serialization(akka://default) - Using serializer[akka.persistence.serialization.MessageSerializer] for message [akka.persistence.PersistentImpl]
18:02:36.966 [default-akka.actor.default-dispatcher-3] DEBUG a.s.Serialization(akka://default) - Using serializer[akka.serialization.JavaSerializer] for message [io.github.ourkid.akka.aggregator.guaranteed.CountEvent]
3974790
24064453
18:02:42.313 [default-akka.actor.default-dispatcher-11] DEBUG a.s.Serialization(akka://default) - Using serializer[akka.serialization.JavaSerializer] for message [io.github.ourkid.akka.aggregator.guaranteed.ConfirmEvent]
49995000
49995000
49995000
49995000
Incorrect run:
17:56:22.493 [default-akka.actor.default-dispatcher-4] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
17:56:22.508 [default-akka.actor.default-dispatcher-4] DEBUG akka.event.EventStream - logger log1-Slf4jLogger started
17:56:22.508 [default-akka.actor.default-dispatcher-4] DEBUG akka.event.EventStream - Default Loggers started
0
17:56:22.750 [default-akka.actor.default-dispatcher-2] DEBUG a.s.Serialization(akka://default) - Using serializer[akka.persistence.serialization.MessageSerializer] for message [akka.persistence.PersistentImpl]
17:56:22.765 [default-akka.actor.default-dispatcher-7] DEBUG a.s.Serialization(akka://default) - Using serializer[akka.serialization.JavaSerializer] for message [io.github.ourkid.akka.aggregator.guaranteed.CountEvent]
3727815
22167811
17:56:28.391 [default-akka.actor.default-dispatcher-3] DEBUG a.s.Serialization(akka://default) - Using serializer[akka.serialization.JavaSerializer] for message [io.github.ourkid.akka.aggregator.guaranteed.ConfirmEvent]
49995000
51084018
51084018
52316760
52316760
52316760
52316760
52316760
Another incorrect run:
17:59:12.122 [default-akka.actor.default-dispatcher-3] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
17:59:12.137 [default-akka.actor.default-dispatcher-3] DEBUG akka.event.EventStream - logger log1-Slf4jLogger started
17:59:12.137 [default-akka.actor.default-dispatcher-3] DEBUG akka.event.EventStream - Default Loggers started
0
17:59:12.387 [default-akka.actor.default-dispatcher-7] DEBUG a.s.Serialization(akka://default) - Using serializer[akka.persistence.serialization.MessageSerializer] for message [akka.persistence.PersistentImpl]
17:59:12.402 [default-akka.actor.default-dispatcher-13] DEBUG a.s.Serialization(akka://default) - Using serializer[akka.serialization.JavaSerializer] for message [io.github.ourkid.akka.aggregator.guaranteed.CountEvent]
2982903
17710176
49347145
17:59:18.204 [default-akka.actor.default-dispatcher-13] DEBUG a.s.Serialization(akka://default) - Using serializer[akka.serialization.JavaSerializer] for message [io.github.ourkid.akka.aggregator.guaranteed.ConfirmEvent]
51704199
51704199
55107844
55107844
55107844
55107844
You're using AtLeastOnceDelivery semantics. As it said here:
Note At-least-once delivery implies that original message send order
is not always preserved and the destination may receive duplicate
messages. That means that the semantics do not match those of a normal
ActorRef send operation:
it is not at-most-once delivery message order for the same
sender–receiver pair is not preserved due to possible resends after a
crash and restart of the destination messages are still delivered—to
the new actor incarnation These semantics is similar to what an
ActorPath represents (see Actor Lifecycle), therefore you need to
supply a path and not a reference when delivering messages. The
messages are sent to the path with an actor selection.
So some numbers may be received more than once. You can just ignore duplicate numbers inside FactorialActor or don't use this semantic.