Embedded Kafka throws exception when producing within multiple ScalaTest suites - apache-kafka

Here is how my test suite is configured:
"test payments" should {
"Add 100 credits" in {
runTeamTest { team =>
withRunningKafka {
val addCreditsRequest = AddCreditsRequest(team.id.stringify, member1Email, 100)
TestCommon.makeRequestAndCheck(
member1Email,
TeamApiGenerated.addCredits().url,
Helpers.POST,
Json.toJson(addCreditsRequest),
OK
)
val foundTeam = TestCommon.waitForFuture(TeamDao.findOneById(team.id))
foundTeam.get.credits mustEqual initialCreditAmount + 100
}
}
}
"deduct 100 credits" in {
runTeamTest { team =>
withRunningKafka {
val deductCreditsRequest = DeductCreditsRequest(team.id.stringify, member1Email, 100)
TestCommon.makeRequestAndCheck(
member1Email,
TeamApiGenerated.deductCredits().url,
Helpers.POST,
Json.toJson(deductCreditsRequest),
OK
)
val foundTeam = TestCommon.waitForFuture(TeamDao.findOneById(team.id))
foundTeam.get.credits mustEqual initialCreditAmount - 100
}
}
}
Within ScalaTest, the overarching suite name is "test payments", and the tests inside it have issues after the first one runs. If I run each of the two tests individually, they succeed, but if I run the entire suite, the first succeeds and the second fails with an org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. The code above doesn't show the controllers being tested, but within the controller I have a Kafka consumer that is constantly polling, and close() isn't called on it within the tests.

I'd suggest you use the companion object methods EmbeddedKafka.start() and EmbeddedKafka.stop() in the beforeAll and afterAll sections. That way you also avoid stopping and starting Kafka again for every test within a single test class.
Also make sure you're not trying to start two or more instances of Kafka on the same port at the same time.
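For illustration, a minimal sketch of that setup, assuming the scalatest-embedded-kafka library (net.manub.embeddedkafka) and ScalaTest's BeforeAndAfterAll; the class name, ports, and test body are placeholders:

import net.manub.embeddedkafka.{EmbeddedKafka, EmbeddedKafkaConfig}
import org.scalatest.{BeforeAndAfterAll, WordSpec}

class PaymentsSpec extends WordSpec with BeforeAndAfterAll {
  implicit val kafkaConfig: EmbeddedKafkaConfig =
    EmbeddedKafkaConfig(kafkaPort = 6001, zooKeeperPort = 6000)

  override def beforeAll(): Unit = EmbeddedKafka.start() // one broker for the whole suite
  override def afterAll(): Unit = EmbeddedKafka.stop()   // torn down once, after every test has run

  "test payments" should {
    "Add 100 credits" in {
      // ... test body; no withRunningKafka wrapper needed, the broker is already up
    }
  }
}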

Related

Why am I occasionally getting an InvalidStateStoreException PARTITIONS_REVOKED, not RUNNING when retrieving a store to query it?

I am accessing a state store to query it, and I have had to wrap the store() call in a try/catch block to retry it, because sometimes I get this exception:
org.apache.kafka.streams.errors.InvalidStateStoreException: Cannot get state store customers-store because the stream thread is PARTITIONS_REVOKED, not RUNNING
at org.apache.kafka.streams.state.internals.StreamThreadStateStoreProvider.stores(StreamThreadStateStoreProvider.java:49)
at org.apache.kafka.streams.state.internals.QueryableStoreProvider.getStore(QueryableStoreProvider.java:57)
at org.apache.kafka.streams.KafkaStreams.store(KafkaStreams.java:1053)
at com.codependent.kafkastreams.customer.service.CustomerService.getCustomer(CustomerService.kt:75)
at com.codependent.kafkastreams.customer.service.CustomerServiceKt.main(CustomerService.kt:108)
This is the code used to retrieve the store (the full code is on a github repo):
fun getCustomer(id: String): Customer? {
    var keyValueStore: ReadOnlyKeyValueStore<String, Customer>? = null
    while (keyValueStore == null) {
        try {
            keyValueStore = streams.store(CUSTOMERS_STORE, QueryableStoreTypes.keyValueStore<String, Customer>())
        } catch (ex: InvalidStateStoreException) {
            ex.printStackTrace()
        }
    }
    val customer = keyValueStore.get(id)
    return customer
}
And this is the main program:
fun main(args: Array<String>) {
    val customerService = CustomerService("main", "localhost:9092")
    customerService.initializeStreams()
    customerService.createCustomer(Customer("53", "Joey"))
    val customer = customerService.getCustomer("53")
    println(customer)
    customerService.stopStreams()
}
The exception happens randomly when running the program several times, after the previous executions have finished. Note: I don't do anything to the running Kafka cluster and use its default config.
When you access the store, the Kafka Streams application is going through a rebalance, and state stores aren't accessible at that time. You want to make sure you only query the stores when the application state is RUNNING and not REBALANCING.
What you could do is check the state of the application before attempting to read from the store like this:
if (streams.state() == State.RUNNING) {
    keyValueStore = streams.store(...)
    val customer = keyValueStore.get(id)
    return customer
}
There is also a KafkaStreams.setStateListener method you can use to register a KafkaStreams.StateListener implementation. The StateListener.onChange method is called each time the application changes its state.
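A sketch of how that might look, written in Scala here to match the rest of this page (Kafka Streams exposes a Java API); `streams` is assumed to be your KafkaStreams instance:

import java.util.concurrent.atomic.AtomicBoolean
import org.apache.kafka.streams.KafkaStreams

val storeReady = new AtomicBoolean(false)

streams.setStateListener(new KafkaStreams.StateListener {
  override def onChange(newState: KafkaStreams.State, oldState: KafkaStreams.State): Unit =
    storeReady.set(newState == KafkaStreams.State.RUNNING) // flips back to false during REBALANCING
})

// elsewhere: only call streams.store(...) while storeReady.get() is true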

Strange timeout with ScalaTest's Selenium DSL

I'm writing Selenium tests with ScalaTest's Selenium DSL and I'm running into timeouts I can't explain. To make matters more complicated, they only seem to happen some of the time.
The problem occurs whenever I access an Element after a page load or some Javascript rendering. It looks like this:
click on "editEmployee"
eventually {
  textField(name("firstName")).value = "Steve"
}
My PatienceConfig is configured like this:
override implicit val patienceConfig: PatienceConfig =
  PatienceConfig(timeout = Span(5, Seconds), interval = Span(50, Millis))
The test fails with the following error:
- should not display the old data after an employee was edited *** FAILED ***
The code passed to eventually never returned normally. Attempted 1 times over 10.023253653000001 seconds.
Last failure message: WebElement 'firstName' not found.. (EditOwnerTest.scala:24)
It makes sense that it doesn't succeed immediately, because the click causes some rendering, and the textfield may not be available right away. However, it shouldn't take 10 seconds to make an attempt to find it, right?
Also, I find it very interesting that the eventually block tried it only once, and that it took almost precisely 10 seconds. This smells like a timeout occurred somewhere, and it's not my PatienceConfig, because that was set to time out after 5 seconds.
With this workaround, it does work:
click on "editEmployee"
eventually {
  find(name("firstName")).value // from ScalaTest's `OptionValues`
}
textField(name("firstName")).value = "Steve"
I did some digging in the ScalaTest source, and I noticed that all calls that have this problem (it's not just textField) eventually call webElement at some point. The reason the workaround works is that it doesn't call webElement. webElement is defined like this:
def webElement(implicit driver: WebDriver, pos: source.Position = implicitly[source.Position]): WebElement = {
  try {
    driver.findElement(by)
  }
  catch {
    case e: org.openqa.selenium.NoSuchElementException =>
      // the following is to avoid the suite instance being bound/dragged into the messageFun, which can cause a serialization problem
      val queryStringValue = queryString
      throw new TestFailedException(
        (_: StackDepthException) => Some("WebElement '" + queryStringValue + "' not found."),
        Some(e),
        pos
      )
  }
}
I've copied that code into my project and played around with it, and it looks like constructing and/or throwing the exception is where most of the 10 seconds are spent.
(EDIT Clarification: I've actually seen the code spend its 10 seconds inside the catch block. The implicit wait is set to 0, and besides, if I remove the catch block everything simply works as expected.)
So my question is, what can I do to avoid this strange behaviour? I don't want to have to insert superfluous calls to find all the time, because it's easily forgotten, especially since, as I said, the error occurs only some of the time. (I haven't been able to determine when the behaviour occurs and when it doesn't.)
It is clear that textField(name("firstName")).value = "Steve" ends up calling webElement, as you have found out.
Since the issue in the OP happens wherever web elements are involved (which in turn implies that the WebDriver is involved), I think it is safe to assume the issue is related to the implicit wait on the WebDriver.
implicitlyWait(Span(0, Seconds))
The above should ideally fix the issue. That said, setting the implicit wait to 0 is bad practice: any web page can have loading delays. The page load itself is handled by Selenium outside its wait conditions, but slow element rendering (for example, due to AJAX calls) can still cause failures. I usually keep 10 seconds as my standard implicit wait; for scenarios that require a longer wait, explicit waits can be used, as sketched after the snippet below.
def implicitlyWait(timeout: Span)(implicit driver: WebDriver): Unit = {
  driver.manage.timeouts.implicitlyWait(timeout.totalNanos, TimeUnit.NANOSECONDS)
}
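For the explicit waits mentioned above, a sketch using Selenium's WebDriverWait directly; webDriver is assumed to be the WebDriver in scope, and the locator is just an example:

import org.openqa.selenium.By
import org.openqa.selenium.support.ui.{ExpectedConditions, WebDriverWait}

// block up to 10 seconds for this one element only, instead of a driver-wide implicit wait
val explicitWait = new WebDriverWait(webDriver, 10)
explicitWait.until(ExpectedConditions.presenceOfElementLocated(By.name("firstName")))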
Execution Flow:
name("firstName") ends up having value as Query {Val by = By.className("firstName") }.
def name(elementName: String): NameQuery = new NameQuery(elementName)
case class NameQuery(queryString: String) extends Query { val by = By.name(queryString) }
The Query is fed to the textField method, which calls Query.webElement as shown below.
def textField(query: Query)(implicit driver: WebDriver, pos: source.Position): TextField = new TextField(query.webElement)(pos)
sealed trait Query extends Product with Serializable {
  val by: By
  val queryString: String

  def webElement(implicit driver: WebDriver, pos: source.Position = implicitly[source.Position]): WebElement = {
    try {
      driver.findElement(by)
    }
    catch {
      case e: org.openqa.selenium.NoSuchElementException =>
        // the following is to avoid the suite instance being bound/dragged into the messageFun, which can cause a serialization problem
        val queryStringValue = queryString
        throw new TestFailedException(
          (_: StackDepthException) => Some("WebElement '" + queryStringValue + "' not found."),
          Some(e),
          pos
        )
    }
  }
}
I don't know ScalaTest's specifics, but such strange timeouts usually occur when you mix implicit and explicit waits together.
driver.findElement uses the implicit wait internally, and depending on the explicit wait timeout you specify, you may end up with the two being summed.
Ideally, the implicit wait should be set to 0 to avoid such issues.
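Put together, a sketch of the combination both answers converge on: zero out the driver-level implicit wait and let ScalaTest's eventually (driven by your PatienceConfig) own all the retrying, so the two timeouts cannot stack:

implicitlyWait(Span(0, Seconds)) // findElement now fails fast instead of blocking

click on "editEmployee"
eventually { // retries per PatienceConfig: up to 5 seconds, every 50 millis
  textField(name("firstName")).value = "Steve"
}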

Apache Spark: how to cancel job in code and kill running tasks?

I am running a Spark application (version 1.6.0) on a Hadoop cluster with Yarn (version 2.6.0) in client mode. I have a piece of code that runs a long computation, and I want to kill it if it takes too long (and then run some other function instead).
Here is an example:
val conf = new SparkConf().setAppName("TIMEOUT_TEST")
val sc = new SparkContext(conf)
val lst = List(1, 2, 3)
// setting up an infinite action
val future = sc.parallelize(lst).map { _ => while (true) {} }.collectAsync()
try {
  Await.result(future, Duration(30, TimeUnit.SECONDS))
  println("success!")
} catch {
  case _: Throwable =>
    future.cancel()
    println("timeout")
}
// sleep for 1 hour to allow inspecting the application in yarn
Thread.sleep(60 * 60 * 1000)
sc.stop()
The timeout is set to 30 seconds, but of course the computation is infinite, so awaiting the result of the future throws an exception, which is caught; the future is then cancelled and the backup function executes.
This all works perfectly well, except that the cancelled job doesn't terminate completely: looking at the application's web UI, the job is marked as failed, but I can see there are still running tasks inside it.
The same thing happens when I use SparkContext.cancelAllJobs or SparkContext.cancelJobGroup. The problem is that even though I manage to get on with my program, the running tasks of the cancelled job are still hogging valuable resources (which will eventually slow me down to a near stop).
To sum things up: how do I kill a Spark job in a way that also terminates all running tasks of that job? (As opposed to what happens now, which is stopping the job from launching new tasks while letting the currently running tasks finish.)
UPDATE:
After ignoring this problem for a long time, we found a messy but efficient little workaround. Instead of trying to kill the appropriate Spark job/stage from within the Spark application, we simply log the stage IDs of all active stages when the timeout occurs, and issue an HTTP GET request to the kill URL exposed by the Spark Web UI for each of those stages.
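A sketch of what that workaround might look like, assuming the Spark 1.x UI's stage-kill endpoint (/stages/stage/kill/) with spark.ui.killEnabled set to true; the host and port are placeholders for the driver UI:

import scala.io.Source

// hit the Web UI's kill URL for every stage that is still active
def killActiveStages(sc: org.apache.spark.SparkContext, uiUrl: String = "http://localhost:4040"): Unit =
  for (stageId <- sc.statusTracker.getActiveStageIds)
    Source.fromURL(s"$uiUrl/stages/stage/kill/?id=$stageId&terminate=true").mkString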
I don't know if this answers your question.
My need was to kill jobs hanging for too long (my jobs extract data from Oracle tables, but for some unknown reason the connection occasionally hangs forever).
After some study, I came to this solution:
import org.apache.spark.JobExecutionStatus
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global // needed for the Future below

val MAX_JOB_SECONDS = 100
val statusTracker = sc.statusTracker

val sparkListener = new SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    val jobId = jobStart.jobId
    val f = Future {
      var c = MAX_JOB_SECONDS
      var mustCancel = false
      var running = true
      while (!mustCancel && running) {
        Thread.sleep(1000)
        c = c - 1
        mustCancel = c <= 0
        // getJobInfo returns an Option[SparkJobInfo], so test isDefined rather than null
        val jobInfo = statusTracker.getJobInfo(jobId)
        running = jobInfo.isDefined && jobInfo.get.status() == JobExecutionStatus.RUNNING
      }
      if (mustCancel) {
        sc.cancelJob(jobId)
      }
    }
  }
}

sc.addSparkListener(sparkListener)
try {
  val df = spark.sql("SELECT * FROM VERY_BIG_TABLE") // just an example of a long-running job
  println(df.count)
} catch {
  case exc: org.apache.spark.SparkException =>
    if (exc.getMessage.contains("cancelled"))
      throw new Exception("Job forcibly cancelled")
    else
      throw exc
  case ex: Throwable =>
    println(s"Another exception: $ex")
} finally {
  sc.removeSparkListener(sparkListener)
}
For the sake of future visitors: since version 2.0.3, Spark has shipped the task reaper, a built-in mechanism that addresses this scenario (more or less).
Note that it can eventually kill an executor if the task is not responsive.
Moreover, some of Spark's built-in data sources have been refactored to be more responsive to task cancellation.
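A sketch of turning it on; these spark.task.reaper.* keys come from the Spark configuration docs, and the timeout value here is just an example:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.task.reaper.enabled", "true")     // monitor tasks that were killed or interrupted
  .set("spark.task.reaper.killTimeout", "120s") // as a last resort, kill the executor JVM if a task won't stop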
For the 1.6.0 version, Zohar's solution is a "messy but efficient" one.
According to setJobGroup:
"If interruptOnCancel is set to true for the job group, then job cancellation will result in Thread.interrupt() being called on the job's executor threads."
So the anonymous function in your map must be interruptible, like this:
val future = sc.parallelize(lst).map { _ => while (!Thread.interrupted()) {} }.collectAsync()
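Combined with the question's timeout pattern, a sketch of how that could look (the group id is arbitrary):

sc.setJobGroup("timeout-group", "long computation", interruptOnCancel = true)
val future = sc.parallelize(lst).map { _ => while (!Thread.interrupted()) {} }.collectAsync()
try {
  Await.result(future, Duration(30, TimeUnit.SECONDS))
} catch {
  case _: Throwable => sc.cancelJobGroup("timeout-group") // interrupts the running task threads
}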

spray and actor non-deterministic tests

Hello,
at the beginning I would like to apologize for my English :)
akka = 2.3.6
spray = 1.3.2
scalatest = 2.2.1
I encountered strange behavior when testing routes that ask actors inside the handleWith directive.
I have a route with a handleWith directive:
pathPrefix("firstPath") {
pathEnd {
get(complete("Hello from this api")) ~
post(handleWith { (data: Data) =>{ println("receiving data")
(dataCalculator ? data).collect {
case Success(_) =>
Right(Created -> "")
case throwable: MyInternalValidatationException =>
Left(BadRequest -> s"""{"${throwable.subject}" : "${throwable.cause}"}""")
}
}})
}
}
and a simple actor which always responds when it receives a Data object; its receive block is wrapped in LoggingReceive, so I should see log entries when a message is received by the actor.
And I test it using (I think) simple code:
class SampleStarngeTest extends WordSpec with ThisAppTestBase with OneInstancePerTest
  with routeTestingSugar {

  val url = "/firstPath/"
  implicit val routeTestTimeout = RouteTestTimeout(5 seconds)

  def postTest(data: String) = Post(url).withJson(data) ~> routes

  "posting" should {
    "pass" when {
      "data is valid and comes from the identified user" in {
        postTest(correctData.copy(createdAt = System.currentTimeMillis()).asJson) ~> check {
          print(entity)
          status shouldBe Created
        }
      }
      "report is valid and comes from the anonymous" in {
        postTest(correctData.copy(createdAt = System.currentTimeMillis(), adid = "anonymous").asJson) ~> check {
          status shouldBe Created
        }
      }
    }
  }
}
and the behavior:
When I run either all tests in the package (using IntelliJ IDEA 14 Ultimate) or sbt test, I encounter the same results:
one execution -> all tests pass
and the next one -> not all pass; for those which fail I can see:
1. failure because Request was neither completed nor rejected within X seconds (X up to 60)
2. console output from the route, from the line post(handleWith { (data: Data) => { println("receiving data"), so the code inside handleWith was executed
3. an ask timeout exception from the route code, but not always (among the failed tests)
4. no logs from the actor's LoggingReceive, so the actor had no chance to respond
5. when I rerun the tests, the results differ again from the previous run
Is there a problem with threading? Or with the test modules, or thread blocking inside the libraries? Or something else? I've no idea why it doesn't work :(

How can one verify messages sent to self are delivered when testing Akka actors?

I have an Actor that is similar in function to the following one.
case class SupervisingActor() extends Actor {
  protected val processRouter = ??? // round-robin router to remote workers

  override def receive = {
    case StartProcessing => // sent from main or someplace else
      for (i <- 1 to numberOfProcessActions) { // pseudocode: some specified number of process actions
        processRouter ! WorkInstructions
      }
    case ProcessResults(resultDetails) => // sent from the remote workers when they complete their work
      // do something with the results
      if (allResultsReceived) { // pseudocode: all of the results have been received
        //*********************
        self ! EndProcess // This is the line in question
        //*********************
      }
    case EndProcess =>
      // do some reporting
      // shut down the ActorSystem
  }
}
How can I verify the EndProcess message is sent to self in tests?
I'm using scalatest 2.0.M4, Akka 2.0.3 and Scala 2.9.2.
An actor sending to itself is very much an intimate detail of how that actor performs a certain function, hence I would rather test the effect of that message than whether or not the message has been delivered. I'd argue that sending to self is the same as having a private helper method on an object in classical OOP: you also do not test whether that one is invoked, you test whether the right thing happened in the end.
As a side note: you could implement your own message queue type (see https://doc.akka.io/docs/akka/snapshot/mailboxes.html#creating-your-own-mailbox-type) and have it allow inspection or tracing of message sends. The beauty of this approach is that it can be inserted purely by configuration into the actor under test.
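As an illustration only, a rough sketch of such a mailbox against the later Akka 2.x mailbox API (newer than the 2.0.3 in the question); the RecordingMailbox and MessageRecorder names are made up, and you would wire it in via a mailbox configuration entry pointing at the class:

import akka.actor.{ActorRef, ActorSystem}
import akka.dispatch.{Envelope, MailboxType, MessageQueue, ProducesMessageQueue, UnboundedMailbox}
import com.typesafe.config.Config

// records every enqueued Envelope so a test can assert on message sends
object MessageRecorder {
  val sent = new java.util.concurrent.ConcurrentLinkedQueue[Envelope]()
}

class RecordingMailbox(settings: ActorSystem.Settings, config: Config)
    extends MailboxType with ProducesMessageQueue[UnboundedMailbox.MessageQueue] {

  override def create(owner: Option[ActorRef], system: Option[ActorSystem]): MessageQueue =
    new UnboundedMailbox.MessageQueue {
      override def enqueue(receiver: ActorRef, handle: Envelope): Unit = {
        MessageRecorder.sent.add(handle) // trace the send, then deliver as usual
        super.enqueue(receiver, handle)
      }
    }
}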
In the past, I have overridden the implementation for ! so that I could add debug/logging. Just call super.! when you're done, and be extra careful not to do anything that would throw an exception.
I had the same issue with an FSM actor. I tried setting up a custom mailbox as per the accepted answer, but a few minutes' effort didn't get it working. I also attempted to override the tell operator as per another answer, but that was not possible because self is a final val. Eventually I just replaced:
self ! whatever
with:
sendToSelf(whatever)
and added that method into the actor as:
// test can override this
protected def sendToSelf(msg: Any) {
  self ! msg
}
then in the test overrode the method to capture the self sent message and sent it back into the fsm to complete the work:
@transient var sent: Seq[Any] = Seq.empty

val fsm = TestFSMRef(new MyActor(x, yz) {
  override def sendToSelf(msg: Any) {
    sent = sent :+ msg
  }
})
// yes this is clunky but it works
var wait = 100
while (sent.isEmpty && wait > 0) {
  Thread.sleep(10)
  wait = wait - 10
}
fsm ! sent.head