Right now I'm having difficulty with a custom Gatling feeder, despite the fact that it's circular. I'm getting this error:
java.lang.IllegalStateException: Feeder is now empty, stopping engine
I've read that this is the default behavior. However, I want to make sure each user uses a different refUrl from the feeder: refUrlFeederBuffer.
Also, why isn't it running my after method? I need my cleanup procedures to run regardless of the success or failure of the simulation. If I don't cleanup I can't restart the test!
var refUrlFeeder: Array[Map[String, String]] = Array()
before {
//create stuff and put the refUrls from it in a map
refUrlFeeder = refUrlFeeder :+ Map("refUrl" -> simpleUrl)
}
after {
// delete whatever I created in the before method
// THIS METHOD DOES NOT EXECUTE if the feeder is emptied
// I need it to execute regardless of errors during the scenario
}
object ImportRecords {
val someXml = "<some xml>"
val feeder = RecordSeqFeederBuilder(refUrlFeeder).circular
val update =
feed(feeder)
exec(http("Update stuff")
.put("${refUrl}")
.body(StringBody(someXml))
.asXML
.check(status.is(200))
)
}
val adminUpdaters = scenario("Admins who do updates").exec(ImportRecords.update)

setUp(adminUpdaters.inject(atOnceUsers(1)).protocols(httpConf))
When the feeder runs out of items, Gatling stops the whole engine. It is an exceptional situation, which is also stated in the exception itself:
[error] java.lang.IllegalStateException: Feeder is now empty, stopping engine
The after hook is called only when the simulation completes. It still runs when your simulation hits errors in its own logic, but not when there is a developer bug, which is what we have here.
Simply running out of feeder records is a bug, because it means that the setUp part of your simulation is not in line with the data you provided, in this case your feeder.
By the way, what does the setUp part of your simulation look like?
EDIT: Just looking at your code structure, I'm guessing (without seeing the whole simulation) that the initialisation of your ImportRecords object happens before the before hook is called, so your val feeder is built from an empty array. Making an empty array circular just gives you another empty array, hence the exception when Gatling tries to take an element from the feeder. Try adding:
println(refUrlFeeder)
into the initialisation of your ImportRecords object to find out if this is the case.
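If that turns out to be the case, one workaround is to build the feeder data before any scenario or feeder definition is evaluated, i.e. in the body of the simulation class rather than in the before hook. Here is a minimal sketch of that idea; createRefUrls is a hypothetical helper standing in for whatever your before hook currently does:
// Sketch: populate the data at construction time so the feeder is never built from an empty array
val refUrlFeeder: Array[Map[String, String]] =
  createRefUrls().map(simpleUrl => Map("refUrl" -> simpleUrl)).toArray

println(s"feeder size: ${refUrlFeeder.length}") // sanity check: must be non-empty

object ImportRecords {
  // an Array[Map[String, String]] becomes a feeder via Gatling's implicits
  val feeder = refUrlFeeder.circular
  // ... rest of the chain unchanged
}
The after hook can still do the cleanup; only the creation work has to move out of before so the feeder sees non-empty data.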
Good luck
I'm developing a shared library in Jenkins to help with interacting with an internal API. I can make a single call which starts a long-running process to create an object, and I then have to keep querying the API to check for the process' completion.
I'm trying to get this done using a simple loop, but I keep getting stuck. Here's my function to query the API until it's completed:
def url = new URL("http://myapi/endpoint")
HttpURLConnection http = (HttpURLConnection) url.openConnection()
http.setDoOutput(true)
http.setRequestMethod('POST')
http.setRequestProperty("Content-Type", "application/x-www-form-urlencoded")
def body = ["string", "anotherstring"].join('=')
OutputStreamWriter osw = new OutputStreamWriter(http.outputStream)
osw.write(body)
osw.flush()
osw.close()
for(int i = 0; i < 30; i++) {
Integer counter = 0
http.connect()
response = http.content.text
def status = new JsonSlurperClassic().parseText(response)
// Code to check values here
}
When I run this through a pipeline, the first iteration through the loop works fine. The next iteration bombs with this error:
Caused: java.io.NotSerializableException: sun.net.www.protocol.http.HttpURLConnection
I just started with Groovy, so I feel like I'm going about this the wrong way. I've looked all over trying to find answers and tried several things without any luck.
Thanks in advance.
When running a pipeline job, Jenkins constantly saves the state of the execution so that it can be paused and later resumed. This means that Jenkins must be able to serialize the state of the script, and therefore all of the objects that you create in your pipeline must be serializable.
If an object is not serializable, you will get a NotSerializableException whenever Jenkins attempts to serialize your non-serializable object while saving the state.
To overcome this issue you can use the @NonCPS annotation, which causes Jenkins to execute the function without trying to serialize it. Read more on this issue at pipeline-best-practice:
While normal Pipeline is restricted to serializable local variables (see appendix at bottom), @NonCPS functions can use more complex, nonserializable types internally (for example regex matchers, etc). Parameters and return types should still be Serializable, however.
There are, however, some limitations, so read the documentation carefully: for example, the return value types of @NonCPS methods must be serializable, and you can't use any pipeline steps or CPS-transformed methods inside a @NonCPS-annotated function. Additional info can be found here.
One last thing: to avoid all of these issues you can also use the Jenkins HTTP Plugin, which wraps all the HTTP abilities you will probably need in an easy-to-use, built-in interface.
I have been researching the proper way of handling exceptions in Apache Spark jobs. I have read through different questions on Stack Overflow, but I still haven't reached a conclusion. From my point of view there are three ways of handling exceptions:
A try/catch block surrounding the lambda function that is going to perform the computation. This is tricky because the block has to be placed around the code that triggers the lazy computation. If an error happens, I assume there won't be any RDD to work with (taken from this blog entry):
val lines: RDD[String] = sc.textFile("large_file.txt")
val tokens =
lines.flatMap(_ split " ")
.map(s => s(10))
try {
// This try-catch block catch all the exceptions thrown by the
// preceding transformations.
tokens.saveAsTextFile("/some/output/file.txt")
} catch {
case e : StringIndexOutOfBoundsException =>
// Doing something in response of the exception
}
A try/catch block inside the lambda function: this implies deciding, inside the lambda function, what the correct output is for a record whose exception was caught.
// requires: import scala.util.{Try, Success, Failure}
rdd.map { record =>
  Try(fn(record)) match {
    case Success(value) => Right(value)      // normal output
    case Failure(e)     => Left((record, e)) // record with error flag
  }
}.filter(_.isRight)
Let the exception propagate. The task will fail and the Spark framework will relaunch the task. This works when the error is caused by something outside the scope of the code, e.g. a memory issue or a momentarily lost connection to another service.
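For completeness, option 3 is mostly a matter of configuration rather than code; the sketch below just raises Spark's built-in task retry limit (spark.task.maxFailures, 4 by default; the value shown is illustrative):
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: let transient failures be handled by Spark's own task retries
val conf = new SparkConf()
  .setAppName("exception-handling-example")
  .set("spark.task.maxFailures", "8") // retry each failing task up to 8 times before failing the job
val sc = new SparkContext(conf)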
What's the correct way of handling exceptions? I guess it depends on what you want to achieve with the RDD operation. If an error in one of the RDD records means that the output is not valid, then option 1 is the way to go. If we expect some of the records to fail, we go for option 2. Option 3 doesn't even involve a choice, as it is the normal behaviour of the platform.
In the past we did not bother with the try/catch approach except for input parameter checking.
For the rest we just relied on checking the return code, as in:
spark-submit --master yarn ... bla bla
ret_val=$?
...
Why? Because in general you need to correct something and then start over, and it's hard to correct certain things dynamically. Your scheduling tool can pick the return code up as well: Rundeck, Airflow, et al.
More advanced restart options are possible, but they quickly get convoluted. It could be done, as you allude to in option 2, but I've never seen it done.
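To make the return-code approach useful, the driver just has to exit non-zero when it hits a fatal error; here is a minimal sketch reusing the tokens RDD from option 1 (the error message is illustrative):
// Sketch: surface a fatal error as a non-zero exit code, so that spark-submit's
// return code ($? above) and the scheduling tool see the failure
try {
  tokens.saveAsTextFile("/some/output/file.txt")
} catch {
  case e: Exception =>
    System.err.println(s"Job failed, aborting run: ${e.getMessage}")
    sys.exit(1)
}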
I have some async (ZIO) code which I need to test. If I write the testing part using Thread.sleep(), it works fine and I always get a response:
for {
saved <- database.save(smth)
result <- eventually {
Thread.sleep(20000)
database.search(...)
}
} yield result
But if I implement the same logic using timeout and interval from eventually, then it never works correctly (I get timeouts):
for {
saved <- database.save(smth)
result <- eventually(timeout(Span(20, Seconds)), interval(Span(20, Seconds))) {
database.search(...)
}
} yield result
I do not understand why timeout and interval work differently than Thread.sleep; they should be doing exactly the same thing. Can someone explain this to me and tell me how I should change the code so that I don't need Thread.sleep()?
Assuming database.search(...) returns a ZIO value:
eventually { database.search(...) } most probably succeeds immediately on the first try:
it has successfully created a task to query the database,
and the database is then queried without any retry logic.
Regarding how to make it work:
val search: ZIO[Any, Throwable, String] = ???
val retried: ZIO[Any with Clock, Throwable, Option[String]] =
  search
    .retry(Schedule.spaced(Duration.fromMillis(1000)))
    .timeout(Duration.fromMillis(20000))
Something like that should work, but I believe more elegant solutions exist.
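For illustration, here is how the retry/timeout combination could be wired back into the original for-comprehension. This is a sketch assuming ZIO 1.x; database, smth and query stand in for the values elided in the question:
import zio._
import zio.clock.Clock
import zio.duration._

// Sketch: poll the database with a ZIO schedule instead of sleeping the thread
def searchEventually(query: String): ZIO[Clock, Throwable, Option[String]] =
  database.search(query)
    .retry(Schedule.spaced(1.second)) // retry every second while the search fails
    .timeout(20.seconds)              // give up after 20 seconds overall

for {
  _      <- database.save(smth)
  result <- searchEventually(query)
} yield result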
The other answer from #simpadjo addresses the "what" quite succinctly. I'll add some additional context as to why you might see this behavior.
for {
saved <- database.save(smth)
result <- eventually {
Thread.sleep(20000)
database.search(...)
}
} yield result
There are three different technologies being mixed here, which is causing some confusion.
First is ZIO, an asynchronous programming library that uses its own custom runtime and execution model to perform tasks. The second is eventually, which comes from ScalaTest and is useful for checking asynchronous computations by effectively polling the state of a value. And third, there is Thread.sleep, a Java API that literally suspends the current thread and prevents task progression until the timer expires.
eventually uses a simple retry mechanism that differs based on whether you are using a normal value or a Future from the Scala standard library. Basically it runs the code in the block; if it throws, it sleeps the current thread and then retries based on some interval configuration, eventually timing out. Notably, in this case the behavior is entirely synchronous, meaning that as long as the value in the {} doesn't throw an exception it won't keep retrying.
Thread.sleep is a heavyweight operation, and in this case it effectively blocks the function passed to eventually from progressing for 20 seconds, meaning that by the time database.search is called the save operation has likely completed.
The second variant is different: it executes the code in the eventually block immediately; if that code throws an exception, it is attempted again based on the interval/timeout logic that you provide. In this scenario the save may not have completed (or propagated, if the store is eventually consistent). And because you are returning a ZIO, which is designed not to throw, and eventually doesn't understand ZIO, it simply returns the search effect with no retry logic.
The accepted answer:
val retried: ZIO[Any with Clock, Throwable, Option[String]] =
  search
    .retry(Schedule.spaced(Duration.fromMillis(1000)))
    .timeout(Duration.fromMillis(20000))
works because retry and timeout are built-in ZIO operators that do understand how to actually retry and time out a ZIO, meaning that if search fails, the retry will handle it until it succeeds.
I've seen this question asked here, but the answers essentially focus on Spark Streaming and I can't find a proper solution that works for batch. The idea is to loop through several days and, at each iteration/day, update the information about the previous day (which is used for the current iteration). The code looks like the following:
var prevIterDataRdd = // some RDD
days.foreach(folder => {
val previousData : Map[String, Double] = parseResult(prevIterDataRdd)
val broadcastMap = sc.broadcast(previousData)
val (result, previousStatus) =
processFolder(folder, broadcastMap)
// store result
result.write.csv(outputPath)
// updating the RDD that enables me to extract previousData to update broadcast
val passingPrevStatus = prevIterDataRdd.subtractByKey(previousStatus)
prevIterDataRdd = previousStatus.union(passingPrevStatus)
broadcastMap.unpersist(true)
broadcastMap.destroy()
})
Using broadcastMap.destroy() does not work, because it does not let me use broadcastMap again (which I actually don't understand, since the new broadcast should be totally unrelated and broadcasts are immutable).
How should I run this loop and update the broadcast variable at each iteration?
When using the unpersist method I pass the true argument in order to make it blocking. Is sc.broadcast() also blocking?
Do I really need unpersist() if I'm anyway broadcasting again?
Why can't I use the broadcast again after calling destroy, given that I'm creating a new broadcast variable?
Broadcast variables are immutable but you can create a new broadcast variable.
This new broadcast variable can be used in the next iteration.
All you need to do is change the reference to the newly created broadcast variable, unpersist the old broadcast from the executors, and destroy it from the driver.
Define the variable at class level; this allows you to change the reference of the broadcast variable in the driver and to use the destroy method.
object Main {
  // defined and initialized at class level to allow the reference to change
  var previousData: Map[String, Double] = null

  def main(args: Array[String]): Unit = {
    // your code
  }
}
You were not allowed to use the destroy method on the variable because the reference no longer exists in the driver. Changing the reference to the new broadcast variable resolves the issue.
Unpersist only removes the data from the executors, so when the variable is re-accessed the driver resends it to the executors.
blocking = true makes the call wait until the data has been completely removed from the executors before the next access.
sc.broadcast() - there is no official documentation saying that it is blocking. However, as soon as it is called the application starts broadcasting the data to the executors before running the next line of code, so if the data is very large it may slow down your application. Be careful about how you use it.
It is good practice to call unpersist before destroy. This helps you get rid of the data completely from both the executors and the driver.
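Putting this together, here is a minimal sketch of the loop from the question with the broadcast swapped on every iteration; sc, days, parseResult, processFolder, prevIterDataRdd and outputPath are all taken from the question's code:
// Sketch: create a fresh broadcast each iteration, then clean up the old one
var broadcastMap = sc.broadcast(parseResult(prevIterDataRdd))

days.foreach { folder =>
  val (result, previousStatus) = processFolder(folder, broadcastMap)
  result.write.csv(outputPath)

  // update the RDD that feeds the next iteration's broadcast
  val passingPrevStatus = prevIterDataRdd.subtractByKey(previousStatus)
  prevIterDataRdd = previousStatus.union(passingPrevStatus)

  // change the reference first, then unpersist and destroy the old broadcast
  val oldBroadcast = broadcastMap
  broadcastMap = sc.broadcast(parseResult(prevIterDataRdd))
  oldBroadcast.unpersist(true) // blocking: remove the copies held by the executors
  oldBroadcast.destroy()       // then release it on the driver as well
}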
Let's say I have code like the following:
var model = initializeModel(some_params)
dstream.foreachRDD { rdd =>
model = model.update(rdd)
println(model)
}
println(model) // or doing something with the model
My problem is that even though the first println gives the desired result, i.e. the up-to-date model, the second println displays the initialized model and not the updated one!
My question is: how can I propagate the updated model outside the foreachRDD block?
I also suspect a synchronization problem, because the second println is run before the first one!
Thanks for the help!
You have a common misconception here. In general, when you call map, filter, foreach, or any other transformation, you are not executing anything just yet. Your closures are sent to the executors and the stages are configured, but everything is evaluated lazily. Your main program proceeds ahead, either adding more configuration or doing other things, without waiting for all the computations to be done. Thus, when your program reaches the second println (milliseconds later), the model has not changed, nor has any other println been called.
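To make this concrete, here is a sketch of the code from the question with the streaming lifecycle calls added; it assumes a StreamingContext named ssc, which the question does not show. The model is only guaranteed to be up to date inside the foreachRDD callback, or after the streaming context has terminated:
var model = initializeModel(some_params)

dstream.foreachRDD { rdd =>
  model = model.update(rdd) // runs on the driver once per batch, after ssc.start()
  println(model)            // up to date here
}

ssc.start()
ssc.awaitTermination() // blocks until the stream is stopped
println(model)         // only now reflects the updates made by the batches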
In Scala I have no idea, but in Java you can enclose your foreach and the model variable within a class as static members, and then use the model variable from another class after the foreach has succeeded.
Accumulators are global in Spark. You can update an accumulator variable anywhere in the program and the change is reflected everywhere, regardless of whether it is on a different executor or in the driver program.
val sparkConf = new SparkConf()
val sc = new SparkContext(sparkConf)
Creating and initializing the accumulator:
val accumulator = sc.accumulator(0)
Incrementing the accumulator:
accumulator.add(1)
Accessing the latest value:
accumulator.value
Hope this helps.