Are Iteratees safe for managing resources? - scala

Suppose I were reading from an InputStream.
How I would normally do it:
val inputStream = ...
try {
doStuff(inputStream)
} finally {
inputStream.close()
}
Whether or not doStuff throws an exception, we will close the InputStream.
How I would do it with iteratees:
val inputStream ...
Enumerator.fromStream(inputStream)(Iteratee.foreach(doStuff))
Will the InputStream be closed (even if doStuff throws an exception)?
A little test:
val inputStream = new InputStream() { // returns 10, 9, ... 0, -1
private var i = 10
def read() = {
i = math.max(0, i) - 1
i
}
override def close() = println("closed") // looking for this
}
Enumerator.fromStream(inputStream)(Iteratee.foreach(a => 1 / 0)).onComplete(println)
We only see:
Failure(java.lang.ArithmeticException: / by zero)
The stream was never closed. Replace 1 / 0 with 1 / 1 and you'll see that the stream closes.
Of course, I could maintain a reference to the original stream and close it in case of failure, but AFAIK the idea of using iteratees is creating composable iteration without having to do that.
Is this expected behavior?
Is there a way to use iteratees so the resources are always disposed correctly?

Iteratees were specially designed for safe resource management. See the first sentence of the Iteratee IO: safe, practical, declarative input processing:
Iteratee IO is a style of incremental input processing with precise resource control.
The idea is that when your resource is only accessed through iteratees then the resource-owning code can tell exactly when the iteratee is finished with the resource and close it immediately. On the other hand, when iteration is managed manually (as with a traditional InputStream) the user of the resource is responsible for closing it. This can lead to leaks.
Saying that, there was a bug in Play 2.1 where fromStream didn't manage the closing of its underlying InputStream! This bug was fixed in Play 2.2.
You can see the fromStream code to see how the Enumerator was fixed by using onDoneEnumerating to close the resource when the iteratee is finished.

Related

Wrapping Pub-Sub Java API in Akka Streams Custom Graph Stage

I am working with a Java API from a data vendor providing real time streams. I would like to process this stream using Akka streams.
The Java API has a pub sub design and roughly works like this:
Subscription sub = createSubscription();
sub.addListener(new Listener() {
public void eventsReceived(List events) {
for (Event e : events)
buffer.enqueue(e)
}
});
I have tried to embed the creation of this subscription and accompanying buffer in a custom graph stage without much success. Can anyone guide me on the best way to interface with this API using Akka? Is Akka Streams the best tool here?
To feed a Source, you don't necessarily need to use a custom graph stage. Source.queue will materialize as a buffered queue to which you can add elements which will then propagate through the stream.
There are a couple of tricky things to be aware of. The first is that there's some subtlety around materializing the Source.queue so you can set up the subscription. Something like this:
def bufferSize: Int = ???
Source.fromMaterializer { (mat, att) =>
val (queue, source) = Source.queue[Event](bufferSize).preMaterialize()(mat)
val subscription = createSubscription()
subscription.addListener(
new Listener() {
def eventsReceived(events: java.util.List[Event]): Unit = {
import scala.collection.JavaConverters.iterableAsScalaIterable
import akka.stream.QueueOfferResult._
iterableAsScalaIterable(events).foreach { event =>
queue.offer(event) match {
case Enqueued => () // do nothing
case Dropped => ??? // handle a dropped pubsub element, might well do nothing
case QueueClosed => ??? // presumably cancel the subscription...
}
}
}
}
)
source.withAttributes(att)
}
Source.fromMaterializer is used to get access at each materialization to the materializer (which is what compiles the stream definition into actors). When we materialize, we use the materializer to preMaterialize the queue source so we have access to the queue. Our subscription adds incoming elements to the queue.
The API for this pubsub doesn't seem to support backpressure if the consumer can't keep up. The queue will drop elements it's been handed if the buffer is full: you'll probably want to do nothing in that case, but I've called it out in the match that you should make an explicit decision here.
Dropping the newest element is the synchronous behavior for this queue (there are other queue implementations available, but those will communicate dropping asynchronously which can be really bad for memory consumption in a burst). If you'd prefer something else, it may make sense to have a very small buffer in the queue and attach the "overall" Source (the one returned by Source.fromMaterializer) to a stage which signals perpetual demand. For example, a buffer(downstreamBufferSize, OverflowStrategy.dropHead) will drop the oldest event not yet processed. Alternatively, it may be possible to combine your Events in some meaningful way, in which case a conflate stage will automatically combine incoming Events if the downstream can't process them quickly.
Great answer! I did build something similar. There are also kamon metrics to monitor queue size exc.
class AsyncSubscriber(projectId: String, subscriptionId: String, metricsRegistry: CustomMetricsRegistry, pullParallelism: Int)(implicit val ec: Executor) {
private val logger = LoggerFactory.getLogger(getClass)
def bufferSize: Int = 1000
def source(): Source[(PubsubMessage, AckReplyConsumer), Future[NotUsed]] = {
Source.fromMaterializer { (mat, attr) =>
val (queue, source) = Source.queue[(PubsubMessage, AckReplyConsumer)](bufferSize).preMaterialize()(mat)
val receiver: MessageReceiver = {
(message: PubsubMessage, consumer: AckReplyConsumer) => {
metricsRegistry.inputEventQueueSize.update(queue.size())
queue.offer((message, consumer)) match {
case QueueOfferResult.Enqueued =>
metricsRegistry.inputQueueAddEventCounter.increment()
case QueueOfferResult.Dropped =>
metricsRegistry.inputQueueDropEventCounter.increment()
consumer.nack()
logger.warn(s"Buffer is full, message nacked. Pubsub should retry don't panic. If this happens too often, we should also tweak the buffer size or the autoscaler.")
case QueueOfferResult.Failure(ex) =>
metricsRegistry.inputQueueDropEventCounter.increment()
consumer.nack()
logger.error(s"Failed to offer message with id=${message.getMessageId()}", ex)
case QueueOfferResult.QueueClosed =>
logger.error("Destination Queue closed. Something went terribly wrong. Shutting down the jvm.")
consumer.nack()
mat.shutdown()
sys.exit(1)
}
}
}
val subscriptionName = ProjectSubscriptionName.of(projectId, subscriptionId)
val subscriber = Subscriber.newBuilder(subscriptionName, receiver).setParallelPullCount(pullParallelism).build
subscriber.startAsync().awaitRunning()
source.withAttributes(attr)
}
}
}

Strange timeout with ScalaTest's Selenium DSL

I'm writing Selenium tests with ScalaTest's Selenium DSL and I'm running into timeouts I can't explain. To make matters more complicated, they only seem to happen some of the time.
The problem occurs whenever I access an Element after a page load or some Javascript rendering. It looks like this:
click on "editEmployee"
eventually {
textField(name("firstName")).value = "Steve"
}
My PatienceConfig is configured like this:
override implicit val patienceConfig: PatienceConfig =
PatienceConfig(timeout = Span(5, Seconds), interval = Span(50, Millis))
The test fails with the following error:
- should not display the old data after an employee was edited *** FAILED ***
The code passed to eventually never returned normally. Attempted 1 times over 10.023253653000001 seconds.
Last failure message: WebElement 'firstName' not found.. (EditOwnerTest.scala:24)
It makes sense that it doesn't succeed immediately, because the click causes some rendering, and the textfield may not be available right away. However, it shouldn't take 10 seconds to make an attempt to find it, right?
Also, I find it very interesting that the eventually block tried it only once, and that it took almost precisely 10 seconds. This smells like a timeout occurred somewhere, and it's not my PatienceConfig, because that was set to time out after 5 seconds.
With this workaround, it does work:
click on "editEmployee"
eventually {
find(name("firstName")).value // from ScalaTest's `OptionValues`
}
textField(name("firstName")).value = "Steve"
I did some digging in the ScalaTest source, and I've noticed that all calls that have this problem (it's not just textField), eventually call webElement at some point. The reason why the workaround works, is because it doesn't call webElement. webElement is defined like this:
def webElement(implicit driver: WebDriver, pos: source.Position = implicitly[source.Position]): WebElement = {
try {
driver.findElement(by)
}
catch {
case e: org.openqa.selenium.NoSuchElementException =>
// the following is avoid the suite instance to be bound/dragged into the messageFun, which can cause serialization problem.
val queryStringValue = queryString
throw new TestFailedException(
(_: StackDepthException) => Some("WebElement '" + queryStringValue + "' not found."),
Some(e),
pos
)
}
}
I've copied that code into my project and played around with it, and it looks like constructing and/or throwing the exception is where most of the 10 seconds are spent.
(EDIT Clarification: I've actually seen the code actually spend its 10 seconds inside the catch block. The implicit wait is set to 0, and besides, if I remove the catch block everything simply works as expected.)
So my question is, what can I do to avoid this strange behaviour? I don't want to have to insert superfluous calls to find all the time, because it's easily forgotten, especially since, as I said, the error occurs only some of the time. (I haven't been able to determine when the behaviour occurs and when it doesn't.)
It is clear that the textField(name("firstName")).value = "Steve" ends up calling the WebElement as you have found out.
Since the issue in the op is happening where ever web elements are involved (which in turn implies that webdriver is involved), I think it is safe to assume that the issue is related to the implicit wait on the Web driver.
implicitlyWait(Span(0, Seconds))
The above should ideally fix the issue. Also, making implicit wait to be 0 is a bad practice. Any web page might have some loading issues. The page load is handled by Selenium outside its wait conditions. But slow element load (may be due to ajax calls) could result in failure. I usually keep 10 seconds as my standard implicit wait. For scenarios which require more wait, explicit waits can be used.
def implicitlyWait(timeout: Span)(implicit driver: WebDriver): Unit = {
driver.manage.timeouts.implicitlyWait(timeout.totalNanos, TimeUnit.NANOSECONDS)
}
Execution Flow:
name("firstName") ends up having value as Query {Val by = By.className("firstName") }.
def name(elementName: String): NameQuery = new NameQuery(elementName)
case class NameQuery(queryString: String) extends Query { val by = By.name(queryString) }
Query is fed to the textField method which calls the Query.webElement as below.
def textField(query: Query)(implicit driver: WebDriver, pos: source.Position): TextField = new TextField(query.webElement)(pos)
sealed trait Query extends Product with Serializable {
val by: By
val queryString: String
def webElement(implicit driver: WebDriver, pos: source.Position = implicitly[source.Position]): WebElement = {
try {
driver.findElement(by)
}
catch {
case e: org.openqa.selenium.NoSuchElementException =>
// the following is avoid the suite instance to be bound/dragged into the messageFun, which can cause serialization problem.
val queryStringValue = queryString
throw new TestFailedException(
(_: StackDepthException) => Some("WebElement '" + queryStringValue + "' not found."),
Some(e),
pos
)
}
}
}
I don't know ScalaTest's specifics, but such strange timeouts usually occur when you're mixing up implicit and explicit waits together.
driver.findElement uses implicit waits internally. And depending on specified explicit waits timeout, you may face with summing both together.
Ideally, implicit waits should be set to 0 to avoid such issues.

How to properly use spray.io LruCache

I am quite an unexperienced spray/scala developer, I am trying to properly use spray.io LruCache. I am trying to achieve something very simple. I have a kafka consumer, when it reads something from its topic I want it to put the value it reads to cache.
Then in one of the routings I want to read this value, the value is of type string, what I have at the moment looks as follows:
object MyCache {
val cache: Cache[String] = LruCache(
maxCapacity = 10000,
initialCapacity = 100,
timeToLive = Duration.Inf,
timeToIdle = Duration(24, TimeUnit.HOURS)
)
}
to put something into cache i use following code:
def message() = Future { new String(singleMessage.message()) }
MyCache.cache(key, message)
Then in one of the routings I am trying to get something from the cache:
val res = MyCache.cache.get(keyHash)
The problem is the type of res is Option[Future[String]], it is quite hard and ugly to access the real value in this case. Could someone please tell me how I can simplify my code to make it better and more readable ?
Thanks in advance.
Don't try to get the value out of the Future. Instead call map on the Future to arrange for work to be done on the value when the Future is completed, and then complete the request with that result (which is itself a Future). It should look something like this:
path("foo") {
complete(MyCache.cache.get(keyHash) map (optMsg => ...))
}
Also, if singleMessage.message does not do I/O or otherwise block, then rather than creating the Future like you are
Future { new String(singleMessage.message) }
it would be more efficient to do it like so:
Future.successful(new String(singleMessage.message))
The latter just creates an already completed Future, bypassing the use of an ExecutionContext to evaluate the function.
If singleMessage.message does do I/O, then ideally you would do that I/O with some library (like Spray client, if it's an HTTP request) that returns a Future (rather than using Future { ... } to create another thread which will block).

Scala Queue and NoSuchElementException

I am getting an infrequent NoSuchElementException error when operating over my Scala 2.9.2 Queue. I don't understand the exception becase the Queue has elements in it. I've tried switching over to a SynchronizedQueue, thinking it was a concurrency issue (my queue is written and read to from different threads) but that didn't solve it.
The reduced code looks like this:
val window = new scala.collection.mutable.Queue[Packet]
...
(thread 1)
window += packet
...
(thread 2)
window.dequeueAll(someFunction)
println(window.size)
window.foreach(println(_))
Which results in
32
java.util.NoSuchElementException
at scala.collection.mutable.LinkedListLike$class.head(LinkedListLike.scala:76)
at scala.collection.mutable.LinkedList.head(LinkedList.scala:78)
at scala.collection.mutable.MutableList.head(MutableList.scala:53)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
at scala.collection.mutable.MutableList.foreach(MutableList.scala:30)
The docs for LinkedListLike.head() say
Exceptions thrown
`NoSuchElementException`
if the linked list is empty.
but how can this exception be thrown if the Queue is not empty?
You should have window (mutable data structure) accessed from only a single thread. Other threads should send messages to that one.
There is Akka that allows relatively easy concurrent programming.
class MySource(windowHolderRef:ActorRef) {
def receive = {
case MyEvent(packet:Packet) =>
windowHolderRef ! packet
}
}
case object CheckMessages
class WindowHolder {
private val window = new scala.collection.mutable.Queue[Packet]
def receive = {
case packet:Packet =>
window += packet
case CheckMessages =>
window.dequeueAll(someFunction)
println(window.size)
window.foreach(println(_))
}
}
To check messages periodically you can schedule periodic messages.
// context.schedule(1 second, 1 second, CheckMessages)

shutdown hook won't start upon ^C (scala)

i'm trying to get a clean and gracefull shutdown, and for some reason, it wont execute. iv'e tried:
sys addShutdownHook{
logger.warn("SHUTTING DOWN...")
// irrelevant logic here...
}
and also:
Runtime.getRuntime.addShutdownHook(ThreadOperations.delayOnThread{
logger.warn("SHUTTING DOWN...")
// irrelevant logic here...
}
)
where ThreadOperations.delayOnThread definition is:
object ThreadOperations {
def startOnThread(body: =>Unit) : Thread = {
onThread(true, body)
}
def delayOnThread(body: =>Unit) : Thread = {
onThread(false, body)
}
private def onThread(runNow : Boolean, body: =>Unit) : Thread = {
val t=new Thread {
override def run=body
}
if(runNow){t.start}
t
}
// more irrelevant operations...
}
but when i run my program (executable jar, double activation), the hook does not start. so what am i doing wrong? what is the right way to add a shutdown hook in scala? is it in any way related to the fact i'm using double activation?
double activation is done like that:
object Gate extends App {
val givenArgs = if(args.isEmpty){
Array("run")
}else{
args
}
val jar = Main.getClass.getProtectionDomain().getCodeSource().getLocation().getFile;
val dir = jar.dropRight(jar.split(System.getProperty("file.separator")).last.length + 1)
val arguments = Seq("java", "-cp", jar, "boot.Main") ++ givenArgs.toSeq
Process(arguments, new java.io.File(dir)).run();
}
(scala version: 2.9.2 )
thanks.
In your second attempt, your shutdown hook you seems to just create a thread and never start it (so it just gets garbage collected and does nothing). Did I miss something? (EDIT: yes I did, see comment. My bad).
In the first attempt, the problem might just be that the underlying log has some caching, and the application exits before the log is flushed.
Solved it.
For some reason, I thought that run as opposed to ! would detach the process. It actually hangs on because there are open streams left to the Process, which is returned from run (or maybe it just hangs for another reason, 'cause exec doesn't hang, but returns a Process with open streams to and from the child process, much like run). For this reason, the original process was still alive, and I accidentally sent the signals to it. Of course, it did not contain a handler, or a shutdown hook, so nothing happened.
The solution was to use Runtime.getRuntime.exec(arguments.toArray) instead of Process(arguments, new java.io.File(dir)).run();, close the streams in the Gate object, and send the ^C signal to the right process.