Fetch size in PGConnection.getNotifications - postgresql

A function in my PostgreSQL database sends a notification when a table is updated.
I'm polling that PostgreSQL database with ScalikeJDBC to get all the notifications and then do something with them.
The process is explained here. It is a typical reactive system for SQL table updates.
I get the PGConnection from the java.sql.Connection. And, after that, I get the notifications in this way:
val notifications = Option(pgConnection.getNotifications).getOrElse(Array[PGNotification]())
I'm trying to get the notifications in chunks of 1000 by setting the fetch size to 1000 and disabling auto-commit, but the fetch size property is ignored.
Any ideas how I could do that?
I wouldn't want to handle hundreds of thousands of notifications in a single map over my notifications dataset.
pgConnection.getNotifications.size could be huge, and therefore, this code wouldn't scale well.
Thanks!!!

To better scale, consider using postgresql-async and Akka Streams: the former is a library that can obtain PostgreSQL notifications asynchronously, and the latter is a Reactive Streams implementation that provides backpressure (which would obviate the need for paging). For example:
import akka.actor._
import akka.pattern.pipe // needed for pipeTo
import akka.stream._
import akka.stream.scaladsl._
import com.github.mauricio.async.db.postgresql.PostgreSQLConnection
import com.github.mauricio.async.db.postgresql.util.URLParser
import scala.concurrent.duration._
import scala.concurrent.Await
class DbActor(implicit materializer: ActorMaterializer) extends Actor with ActorLogging {
  private implicit val ec = context.system.dispatcher
  // Queue-backed source: payloads offered to the queue flow into the sink
  val queue =
    Source.queue[String](Int.MaxValue, OverflowStrategy.backpressure)
      .to(Sink.foreach(println))
      .run()
  val configuration = URLParser.parse("jdbc:postgresql://localhost:5233/my_db?user=dbuser&password=pwd")
  val connection = new PostgreSQLConnection(configuration)
  Await.result(connection.connect, 5.seconds)
  connection.sendQuery("LISTEN my_channel")
  // Forward every notification payload to this actor
  connection.registerNotifyListener { message =>
    val msg = message.payload
    log.debug("Sending the payload: {}", msg)
    self ! msg
  }
  def receive = {
    case payload: String =>
      // Offer the payload to the stream and pipe the offer result back to this actor
      queue.offer(payload).pipeTo(self)
    case QueueOfferResult.Dropped =>
      log.warning("Dropped a message.")
    case QueueOfferResult.Enqueued =>
      log.debug("Enqueued a message.")
    case QueueOfferResult.Failure(t) =>
      log.error("Stream failed: {}", t.getMessage)
    case QueueOfferResult.QueueClosed =>
      log.debug("Stream closed.")
  }
}
The code above simply prints notifications from PostgreSQL as they occur; you can replace the Sink.foreach(println) with another Sink. To run it:
import akka.actor._
import akka.stream.ActorMaterializer
object Example extends App {
  implicit val system = ActorSystem()
  implicit val materializer = ActorMaterializer()
  system.actorOf(Props(classOf[DbActor], materializer))
}
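If the downstream logic still prefers to work on chunks of notifications (such as the batches of 1000 mentioned in the question), one option is to batch inside the stream instead of paging at the JDBC level. The following is only a sketch and not part of the original answer: it assumes Akka 2.6 or later (where an implicit ActorSystem supplies the materializer), it uses a synthetic source in place of the queue-backed notification source, and BatchingSketch and batchSizes are hypothetical names.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Future
import scala.concurrent.duration._

object BatchingSketch {
  // Hypothetical helper: streams `total` fake payloads and returns the size of every emitted batch.
  def batchSizes(total: Int, maxBatch: Int)(implicit system: ActorSystem): Future[Seq[Int]] =
    Source(1 to total)
      .map(i => s"payload-$i")           // stand-in for notification payloads
      .groupedWithin(maxBatch, 1.second) // emit a batch when it is full or when 1 second has passed
      .map(_.size)                       // a real pipeline would process the whole batch here
      .runWith(Sink.seq)
}
```

In the DbActor above, the same groupedWithin step could be placed between Source.queue and the sink, so that the sink receives a sequence of payloads rather than one payload at a time.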

Related

How to emit messages from a Sink and pass them to another function?

I am currently building a client-side WebSockets consumer using Akka-HTTP. Instead of trying to do the parsing in the Sink, I wanted to wrap the code in a function which emits the outcome from the Sink, and then use this output (from the function) later for further processing (more parsing...etc.).
I am currently able to print every message from the Sink; however, the return type of the function remains Unit. My objective is to emit a String from the function for each item that lands in the sink, and then use the returned string for further parsing. Here is the code I have so far (note: it's mostly boilerplate).
import java.util.concurrent.atomic.AtomicInteger
import akka.Done
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.StatusCodes
import akka.http.scaladsl.model.ws.{Message, TextMessage, WebSocketRequest, WebSocketUpgradeResponse}
import akka.http.scaladsl.settings.ClientConnectionSettings
import akka.stream.Materializer
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import akka.util.ByteString
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scala.util.{Failure, Success, Try}
object client extends App {
def parseData(uri: String)(implicit system: ActorSystem, materializer: Materializer): Unit = {
val defaultSettings = ClientConnectionSettings(system)
val pingCounter = new AtomicInteger()
val customWebsocketSettings = defaultSettings.websocketSettings.withPeriodicKeepAliveData(
() => ByteString(s"debug-${pingCounter.incrementAndGet()}")
)
val customSettings = defaultSettings.withWebsocketSettings(customWebsocketSettings)
val outgoing = Source.maybe[Message]
val sink: Sink[Message, Future[Done]] = Sink.foreach[Message] {
case message: TextMessage.Strict => message.text // I Want to emit/stream this message as a String from the function (or get a handle on it from the outside)
case _ => println("Other")
}
val webSocketFlow: Flow[Message, Message, Future[WebSocketUpgradeResponse]] =
Http().webSocketClientFlow(WebSocketRequest(uri), settings = customSettings)
val (upgradeResponse, closed) =
outgoing
.viaMat(webSocketFlow)(Keep.right)
.toMat(sink)(Keep.both)
.run()
val connected = upgradeResponse.flatMap { upgrade =>
if (upgrade.response.status == StatusCodes.SwitchingProtocols) {
Future.successful(Done)
} else {
throw new RuntimeException(
s"Connection failed: ${upgrade.response.status}"
)
}
}
connected.onComplete {
case Success(value) => value
case Failure(exception) => throw exception
}
closed.onComplete { _ =>
println("Retrying...")
parseData(uri)
}
upgradeResponse.onComplete {
case Success(value) => println(value)
case Failure(exception) => throw exception
}
}
}
And in a separate object, I would like to do the parsing, so something like:
import akka.actor.ActorSystem
import akka.stream.Materializer
import api.client.parseData
object Application extends App {
implicit val system: ActorSystem = ActorSystem()
implicit val materializer: Materializer = Materializer(system)
val uri = "ws://localhost:8080/foobar"
val res = parseData(uri) // I want to handle the function output here
// parse(res)
println(res)
}
Is there a way I can get a handle on the Sink from outside the function, or do I need to do all the parsing in the Sink? I am mainly trying not to overcomplicate the Sink.
Update: I am also considering if adding another Flow element to the stream (which handles the parsing) is a better practice than getting values outside of the stream.
Adding a flow element seems to solve your problem while being totally idiomatic.
Keep in mind that a sink's semantics describe how to terminate the stream: while a sink can describe very complex computations, it always produces a single value, and that value becomes available only once the stream ends.
Said differently, a sink does not return a value per stream element, it returns a value per whole stream.
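As a rough illustration of the "add a flow element" idea, the sketch below moves the per-message work into a Flow and leaves the sink to collect the results. It assumes Akka 2.6 or later with Akka HTTP on the classpath; ParsingFlowSketch, parse and parseAll are hypothetical names, and parse is only a placeholder for the real parsing logic.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.http.scaladsl.model.ws.{Message, TextMessage}
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Future

object ParsingFlowSketch {
  // Hypothetical parsing step; replace with the real parser.
  def parse(text: String): Int = text.trim.length

  // Keeps only strict text frames and parses each one as it passes through the stream.
  val parsing: Flow[Message, Int, NotUsed] =
    Flow[Message]
      .collect { case TextMessage.Strict(text) => text }
      .map(parse)

  // The sink still yields one value for the whole stream: here, all parsed results.
  def parseAll(messages: List[Message])(implicit system: ActorSystem): Future[Seq[Int]] =
    Source(messages).via(parsing).runWith(Sink.seq)
}
```

In the question's graph, the same flow could be attached with .via(ParsingFlowSketch.parsing) between webSocketFlow and the sink, with the sink's element type changed from Message to the parsed type.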

How should I test akka-streams RestartingSource usage

I'm working on an application that has a couple of long-running streams going, where it subscribes to data about a certain entity and processes that data. These streams should be up 24/7, so we needed to handle failures (network issues etc).
For that purpose, we've wrapped our sources in RestartSource.
I'm now trying to verify this behaviour, and while it looks like it functions, I'm struggling to create a test where I push in some data, verify that it processes correctly, then send an error, and verify that it reconnects after that and continues processing.
I've boiled that down to this minimal case:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{RestartSource, Sink, Source}
import akka.stream.testkit.TestPublisher
import org.scalatest.concurrent.Eventually
import org.scalatest.{FlatSpec, Matchers}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext
class MinimalSpec extends FlatSpec with Matchers with Eventually {
"restarting a failed source" should "be testable" in {
implicit val sys: ActorSystem = ActorSystem("akka-grpc-measurements-for-test")
implicit val mat: ActorMaterializer = ActorMaterializer()
implicit val ec: ExecutionContext = sys.dispatcher
val probe = TestPublisher.probe[Int]()
val restartingSource = RestartSource
.onFailuresWithBackoff(1 second, 1 minute, 0d) { () => Source.fromPublisher(probe) }
var last: Int = 0
val sink = Sink.foreach { l: Int => last = l }
restartingSource.runWith(sink)
probe.sendNext(1)
eventually {
last shouldBe 1
}
probe.sendNext(2)
eventually {
last shouldBe 2
}
probe.sendError(new RuntimeException("boom"))
probe.expectSubscription()
probe.sendNext(3)
eventually {
last shouldBe 3
}
}
}
This test consistently fails on the last eventually block with Last failure message: 2 was not equal to 3. What am I missing here?
Edit: akka version is 2.5.31
I figured it out after looking at the TestPublisher code. Its subscription is a lazy val, so when RestartSource detects the error and executes the factory method () => Source.fromPublisher(probe) again, it gets a new Source, but the probe's subscription still points to the old Source. Changing the code so that the factory creates a new TestPublisher together with each new Source works.
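One possible shape of that fix is sketched below. It is not the poster's actual code: it assumes Akka 2.6 or later with akka-stream-testkit (on 2.5.x an implicit ActorMaterializer would also be needed), and RestartProbeSketch, run and the probes queue are hypothetical names. The factory registers a fresh probe on every (re)start, and the test picks up whichever probe is current.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{RestartSource, Source}
import akka.stream.testkit.TestPublisher
import akka.stream.testkit.scaladsl.TestSink
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}
import scala.concurrent.duration._

object RestartProbeSketch {
  // Runs the scenario and returns the elements seen downstream; probe expectations throw on failure.
  def run()(implicit system: ActorSystem): Seq[Int] = {
    // Every call of the factory puts a fresh probe into this queue.
    val probes = new LinkedBlockingQueue[TestPublisher.Probe[Int]]()

    val restartingSource = RestartSource.onFailuresWithBackoff(1.second, 1.minute, 0d) { () =>
      val probe = TestPublisher.probe[Int]()
      probes.put(probe)
      Source.fromPublisher(probe)
    }

    val sink = restartingSource.runWith(TestSink.probe[Int])
    sink.request(3)

    val first = probes.poll(5, TimeUnit.SECONDS)
    require(first != null, "the source factory was not called")
    first.sendNext(1)
    val a = sink.expectNext()
    first.sendNext(2)
    val b = sink.expectNext()

    first.sendError(new RuntimeException("boom"))

    // After the backoff the factory runs again and registers a second probe.
    val second = probes.poll(5, TimeUnit.SECONDS)
    require(second != null, "the source was not restarted")
    second.sendNext(3)
    val c = sink.expectNext()

    sink.cancel()
    Seq(a, b, c)
  }
}
```

Using a TestSink probe instead of a mutable variable plus eventually is a design choice of this sketch, not something the original test requires.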

items fail to be processed in Akka streams app that uses Source.queues and Sink.queues in a flow

I am trying to create an (Akka HTTP) stream processing flow using the classes akka.stream.scaladsl.Source and Sink queues.
I am using a queue because I have a processing step in my flow that issues http requests and I want this step to take as many
items off the queue as there are max-open-requests, and stop taking off the queue once max-open-requests are in flight.
The result is that backpressure is applied when my connection pool is overloaded.
Below, I have a very simplified test that reflects the main logic of my app. In the test 'Stress Spec' (below)
I am simulating a number of simultaneous connections via which I will send a 'Source' of 'Requesto' objects
to the getResponses method of the class ServiceImpl.
In the processing step 'pullOffSinkQueue' you will note that I am incrementing a counter to see how many items
I have pulled off the queue.
The test will send ServiceImpl a set of requests whose cardinality is set to equal
streamedRequestsPerConnection * numSimultaneousConnections.
When I send 20 requests my test passes fine. In particular the count of requests pulled off the
Sink.queue will be equal to the number of requests I send out. However, if
I increase the number of requests I send to above 50 or so, I see consistent failures in the test.
I get a message such as the one below
180 was not equal to 200
ScalaTestFailureLocation: com.foo.StressSpec at (StressSpec.scala:116)
Expected :200
Actual :180
<Click to see difference>
This indicates that the number of items pulled off the queue does not equal the number of items put on the queue.
I have a feeling this might be because my test is not properly waiting for all items put into the stream
to be processed. If anyone has any suggestions, I'd be all ears! Code is below.
package com.foo
import java.util.concurrent.atomic.AtomicInteger
import akka.stream.ActorAttributes.supervisionStrategy
import akka.stream.{Attributes, Materializer, QueueOfferResult}
import akka.stream.Supervision.resumingDecider
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import scala.concurrent.{ExecutionContext, Future}
import akka.NotUsed
import akka.actor.ActorSystem
import akka.event.{Logging, LoggingAdapter}
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import org.scalatest.mockito.MockitoSugar
import org.scalatest.{FunSuite, Matchers}
import scala.collection.immutable
import scala.concurrent.duration._
import scala.concurrent.{Await, Future, _}
final case class Responso()
final case class Requesto()
object Handler {
val dbRequestCounter = new AtomicInteger(0)
}
class Handler(implicit ec: ExecutionContext, mat: Materializer) {
import Handler._
private val source =
Source.queue[(Requesto, String)](8, akka.stream.OverflowStrategy.backpressure)
private val sink =
Sink.queue[(Requesto, String)]().withAttributes(Attributes.inputBuffer(8, 8))
private val (sourceQueue, sinkQueue) = source.toMat(sink)(Keep.both).run()
def placeOnSourceQueue(ar: Requesto): Future[QueueOfferResult] = {
sourceQueue.offer((ar, "foo"))
}
def pullOffSinkQueue(qofr: QueueOfferResult): Future[Responso] = {
dbRequestCounter.incrementAndGet()
qofr match {
case QueueOfferResult.Enqueued =>
sinkQueue.pull().flatMap { maybeRequestPair: Option[(Requesto, String)] =>
Future.successful(Responso())
}
case error =>
println("enqueuing error: " + error)
Future.failed(new RuntimeException("enqueuing error: " + error))
}
}
}
class ServiceImpl(readHandler: Handler, writeHandler: Handler)
(implicit log: LoggingAdapter, mat: Materializer) {
private val readAttributeFlow: Flow[Requesto, Responso, NotUsed] = {
Flow[Requesto]
.mapAsyncUnordered(1)(readHandler.placeOnSourceQueue)
.mapAsyncUnordered(1)(readHandler.pullOffSinkQueue)
}
def getResponses(request: Source[Requesto, NotUsed]): Source[Responso, NotUsed] =
request
.via(readAttributeFlow)
.withAttributes(supervisionStrategy(resumingDecider))
}
class StressSpec
extends FunSuite
with MockitoSugar
with Matchers {
val streamedRequestsPerConnection = 10
val numSimultaneousConnections = 20
implicit val actorSystem: ActorSystem = ActorSystem()
implicit val materializer: ActorMaterializer = ActorMaterializer()
implicit val log: LoggingAdapter = Logging(actorSystem.eventStream, "test")
implicit val ec: ExecutionContext = actorSystem.dispatcher
import Handler._
lazy val requestHandler = new Handler()
lazy val svc: ServiceImpl =
new ServiceImpl(requestHandler, requestHandler)
test("can handle lots of simultaneous read requests") {
val totalExpected = streamedRequestsPerConnection * numSimultaneousConnections
def sendRequestAndAwaitResponse(): Unit = {
def getResponses(i: Integer) = {
val requestStream: Source[Requesto, NotUsed] =
Source(1 to streamedRequestsPerConnection)
.map { i =>
Requesto()
}
svc.getResponses(requestStream).runWith(Sink.seq)
}
val responses: immutable.Seq[Future[immutable.Seq[Responso]]] =
(1 to numSimultaneousConnections).map { getResponses(_) }
val flattenedResponses: Future[immutable.Seq[Responso]] =
Future.sequence(responses).map(_.flatten)
Await.ready(flattenedResponses, 1000.seconds).value.get
}
sendRequestAndAwaitResponse()
dbRequestCounter.get shouldBe(totalExpected)
}
}

Limit number of messages sent within time interval

Using the code below, I'm attempting to limit the number of messages sent to an actor within a specified time frame. But the messages are not being throttled and are sent as quickly as possible. The downstream actor just makes an HTTP request to the Google home page.
The throttler code, where I attempt to limit sending to 3 messages per second:
val throttler: ActorRef =
  Source.actorRef(bufferSize = 1000, OverflowStrategy.dropNew)
    .throttle(3, 1.second)
    .to(Sink.actorRef(printer, NotUsed))
    .run()
How can I limit the number of messages sent within loop :
for( a <- 1 to 10000){
// Create the 'greeter' actors
val howdyGreeter: ActorRef =
system.actorOf(Greeter.props(String.valueOf(a), printer), String.valueOf(a))
howdyGreeter ! RequestActor("RequestActor")
howdyGreeter ! Greet
}
to 3 per second ?
entire code :
//https://developer.lightbend.com/guides/akka-quickstart-scala/full-example.html
import akka.NotUsed
import akka.stream.{OverflowStrategy, ThrottleMode}
import akka.stream.scaladsl.{Sink, Source}
import org.apache.http.client.methods.HttpGet
import org.apache.http.entity.StringEntity
import org.apache.http.impl.client.DefaultHttpClient
import net.liftweb.json._
import net.liftweb.json.Serialization.write
import org.apache.http.util.EntityUtils
//import akka.contrib.throttle.TimerBasedThrottler
import akka.actor.{Actor, ActorLogging, ActorRef, ActorSystem, Props}
import scala.concurrent.duration._
import akka.NotUsed
import akka.actor.ActorRef
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.OverflowStrategy
import akka.stream.ThrottleMode
import akka.stream.scaladsl.Sink
import akka.stream.scaladsl.Source
object Greeter {
def props(message: String, printerActor: ActorRef): Props = Props(new Greeter(message, printerActor))
final case class RequestActor(who: String)
case object Greet
}
class Greeter(message: String, printerActor: ActorRef) extends Actor {
import Greeter._
import Printer._
var greeting = ""
def receive = {
case RequestActor(who) =>
val get = new HttpGet("http://www.google.com")
val response = (new DefaultHttpClient).execute(get)
// val responseString = EntityUtils.toString(response.getEntity, "UTF-8")
// System.out.println(responseString)
greeting = String.valueOf(response.getStatusLine.getStatusCode)
println("message is "+message)
// greeting = message + ", " + who
case Greet =>
printerActor ! Greeting(greeting)
}
}
object Printer {
def props: Props = Props[Printer]
final case class Greeting(greeting: String)
}
class Printer extends Actor with ActorLogging {
import Printer._
def receive = {
case Greeting(greeting) =>
log.info("Greeting received (from " + sender() + "): " + greeting)
}
}
object AkkaQuickstart extends App {
import Greeter._
// Create the 'helloAkka' actor system
val system: ActorSystem = ActorSystem("helloAkka")
// Create the printer actor,this is also the target actor
val printer: ActorRef = system.actorOf(Printer.props, "printerActor")
implicit val materializer = ActorMaterializer.create(system)
val throttler: ActorRef =
Source.actorRef(bufferSize = 1000, OverflowStrategy.dropNew)
.throttle(3, 1.second)
.to(Sink.actorRef(printer, NotUsed))
.run()
//Create a new actor for each request thread
for( a <- 1 to 10000){
// Create the 'greeter' actors
val howdyGreeter: ActorRef =
system.actorOf(Greeter.props(String.valueOf(a), printer), String.valueOf(a))
howdyGreeter ! RequestActor("RequestActor")
howdyGreeter ! Greet
}
}
An actor cannot influence what other actors do, in particular it has no control over who puts messages in its mailbox and when — this is how the actor model works. An actor only gets to decide what to do with the messages it finds in its mailbox, and over this it has full control. It can for example drop them, send back error replies, buffer them, etc.
If you want throttling and back-pressure, I recommend not using Actors at all for this part, but only Akka Streams. The code that generates your request messages should be a Source, not a for-loop. Which source is most appropriate depends entirely on your real use case, e.g. creating a stream from a strict collection with Source.from() (Source(...) in the Scala DSL), asynchronously pulling new elements out of a data structure with Source.unfoldAsync, and many more. Doing it this way ensures that requests are emitted only when the time is right according to the downstream capacity or rate throttling.
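A minimal sketch of that approach, under stated assumptions: Akka 2.6 or later, a strict range as the source, and a hypothetical fakeRequest function standing in for the real HTTP call. ThrottledRequestsSketch and run are made-up names as well.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Future
import scala.concurrent.duration._

object ThrottledRequestsSketch {
  // Hypothetical stand-in for the HTTP request made by the Greeter actor.
  def fakeRequest(id: Int): Future[String] =
    Future.successful(s"request $id done")

  // Emits the ids at a rate of at most 3 per second and runs one request at a time.
  def run(count: Int)(implicit system: ActorSystem): Future[Seq[String]] =
    Source(1 to count)
      .throttle(3, 1.second)
      .mapAsync(1)(fakeRequest)
      .runWith(Sink.seq)
}
```

Because the source is only pulled when downstream is ready, the rate limit applies to the creation of requests, not just to their delivery.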
It does not appear to me that you're actually using the throttler:
val throttler: ActorRef =
  Source.actorRef(bufferSize = 1000, OverflowStrategy.dropNew)
    .throttle(3, 1.second)
    .to(Sink.actorRef(printer, NotUsed))
    .run()
But I don't see any messages being sent to throttler in your code: throttler will only throttle messages sent to throttler.
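To make the point concrete, here is a sketch in which the messages really do go through the throttler's ActorRef. It assumes Akka 2.6 or later and uses the same two-argument Source.actorRef form as the question (newer Akka versions mark that overload as deprecated); ThrottlerUsageSketch and run are hypothetical names, and take plus Sink.seq are used only so that the stream ends and its output can be inspected.

```scala
import akka.actor.ActorSystem
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.Future
import scala.concurrent.duration._

object ThrottlerUsageSketch {
  def run(count: Int)(implicit system: ActorSystem): Future[Seq[String]] = {
    // Materializes both the entry point (an ActorRef) and the collected output.
    val (throttler, received) =
      Source.actorRef[String](1000, OverflowStrategy.dropNew)
        .throttle(3, 1.second)
        .take(count) // complete the stream after `count` elements
        .toMat(Sink.seq)(Keep.both)
        .run()

    // Only messages sent to `throttler` are rate limited.
    (1 to count).foreach(i => throttler ! s"message $i")
    received
  }
}
```

In the question's program this would mean sending RequestActor and Greet to the throttler, with the stream's sink forwarding them to the target actor, instead of sending them to each Greeter directly.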

Akka Flow hangs when making http requests via connection pool

I'm using Akka 2.4.4 and trying to move from Apache HttpAsyncClient (unsuccessfully).
Below is a simplified version of the code that I use in my project.
The problem is that it hangs if I send more than 1-3 requests to the flow. So far after 6 hours of debugging I couldn't even locate the problem. I don't see exceptions, error logs, events in Decider. NOTHING :)
I tried reducing connection-timeout setting to 1s thinking that maybe it's waiting for response from the server but it didn't help.
What am I doing wrong ?
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.headers.Referer
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import akka.http.scaladsl.settings.ConnectionPoolSettings
import akka.stream.Supervision.Decider
import akka.stream.scaladsl.{Sink, Source}
import akka.stream.{ActorAttributes, Supervision}
import com.typesafe.config.ConfigFactory
import scala.collection.immutable.{Seq => imSeq}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.util.Try
object Main {
implicit val system = ActorSystem("root")
implicit val executor = system.dispatcher
val config = ConfigFactory.load()
private val baseDomain = "www.google.com"
private val poolClientFlow = Http()(system).cachedHostConnectionPool[Any](baseDomain, 80, ConnectionPoolSettings(config))
private val decider: Decider = {
case ex =>
ex.printStackTrace()
Supervision.Stop
}
private def sendMultipleRequests[T](items: Seq[(HttpRequest, T)]): Future[Seq[(Try[HttpResponse], T)]] =
Source.fromIterator(() => items.toIterator)
.via(poolClientFlow)
.log("Logger")(log = myAdapter)
.recoverWith {
case ex =>
println(ex)
null
}
.withAttributes(ActorAttributes.supervisionStrategy(decider))
.runWith(Sink.seq)
.map { v =>
println(s"Got ${v.length} responses in Flow")
v.asInstanceOf[Seq[(Try[HttpResponse], T)]]
}
def main(args: Array[String]) {
val headers = imSeq(Referer("https://www.google.com/"))
val reqPair = HttpRequest(uri = "/intl/en/policies/privacy").withHeaders(headers) -> "some req ID"
val requests = List.fill(10)(reqPair)
val qwe = sendMultipleRequests(requests).map { case responses =>
println(s"Got ${responses.length} responses")
system.terminate()
}
Await.ready(system.whenTerminated, Duration.Inf)
}
}
Also what's up with proxy support ? Doesn't seem to work for me either.
You need to consume the body of the response fully so that the connection is made available for subsequent requests. If you don't care about the response entity at all, then you can just drain it to a Sink.ignore, something like this:
resp.entity.dataBytes.runWith(Sink.ignore)
With the default configuration, a host connection pool has its max connections set to 4. Each pool has its own queue where requests wait until one of the open connections becomes available. If that queue ever goes over 32 (the default, which can be changed but must be a power of 2), you will start seeing failures. In your case, you only make 10 requests, so you don't hit that limit. But by not consuming the response entity you don't free up the connection, and everything else just queues up behind it, waiting for the connections to free up.
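One way to apply this inside the stream is sketched below: each response entity is drained as soon as it leaves the pool flow, and only the status code is kept. This is an illustration under assumptions rather than the poster's code: DrainSketch and drain are hypothetical names, and keeping only the status is a choice made for the example.

```scala
import akka.http.scaladsl.model.{HttpResponse, StatusCode}
import akka.stream.Materializer
import akka.stream.scaladsl.Sink
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success, Try}

object DrainSketch {
  // Consumes the response body so the pooled connection is released, keeping only the status code.
  def drain[T](result: (Try[HttpResponse], T))(
      implicit mat: Materializer, ec: ExecutionContext): Future[(Try[StatusCode], T)] =
    result match {
      case (Success(resp), ctx) =>
        resp.entity.dataBytes.runWith(Sink.ignore).map(_ => (Success(resp.status), ctx))
      case (Failure(ex), ctx) =>
        // Nothing to drain when the request itself failed.
        Future.successful((Failure(ex), ctx))
    }
}
```

In the question's pipeline this could be attached right after .via(poolClientFlow), for example as .mapAsync(4)(DrainSketch.drain(_)), with the result type of sendMultipleRequests adjusted to match.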