Need advice on how to optimize my Scala tests

I am using ScalaTest for automation.
A typical test has two logical parts: a test body, which performs some checks of the application logic, and a cleanup part that runs afterwards. If the test body fails I would like to see that in the test report. If the test body succeeds but the cleanup part fails, I would also like the test report to show that the test ended with an error.
So I've come up with the following structure (example is the most simple I can provide):
"Admin" should "be able to create a new team" in{
val tempTeam = Team("Temp QA Team")
val attempt=Try{
When("Admin opens the Teams view")
TeamsPage.open
And("creates a new team")
TeamsPage.createNewTeam(tempTeam)
Then("this team is shown in the list")
TeamsPage.isParticularTeamShownInTeamList(tempTeam.name) shouldBe true
}
val cleanUp = Try(TeamsPage.cleanUpTeam(tempTeam))
attempt match{
case Failure(e) => throw e
case Success(r) =>{
if(cleanUp.isFailure) cleanUp.get
r
}
}
}
Please note here that I need the cleanup part to always execute, not only when the test body part is successful.
It works as I expect, but I see two problems:
1. IntelliJ IDEA tells me that cleanUp.get is a useless expression. How do I write that part more correctly? I could rewrite it as if(cleanUp.isFailure) throw cleanUp.failed.get, which would stop the IDE complaining, but that is just a longer way to write the same statement.
2. The last part of the test, which compares the results of the test body and the cleanup and decides what to return, looks a bit bloated. Can you advise me how to make it more concise and clear?

If I understand what you're trying to do correctly, the answer is flatMap and map, as laid out in the documentation for scala.util.Try.
In your case (taking your code as is), you would want:
"Admin" should "be able to create a new team" in{
val tempTeam = Team("Temp QA Team")
val attempt=Try{
When("Admin opens the Teams view")
TeamsPage.open
And("creates a new team")
TeamsPage.createNewTeam(tempTeam)
Then("this team is shown in the list")
TeamsPage.isParticularTeamShownInTeamList(tempTeam.name) shouldBe true
}
val cleanUp = Try(TeamsPage.cleanUpTeam(tempTeam))
attempt.flatMap(r => cleanup.map(c => r)).get
}
This will return the result of attempt, unless it fails, in which case it will throw attempt's exception. It ignores the successful result of cleanUp (as your code did), but if cleanUp throws an exception, you'll throw that exception.
N.B. I didn't actually try this in an IDE, so I can't say whether it will address your question #1 about IntelliJ reporting that get is a useless expression.
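If this pattern recurs across many tests, the same idea could also be pulled into a small reusable helper. The following is only a sketch: the withCleanup name is invented here, and the page objects are the ones from the question.

import scala.util.Try

// Hypothetical helper: always runs the cleanup, then rethrows the body's
// failure first, or the cleanup's failure if only the cleanup failed.
def withCleanup[A](body: => A)(cleanup: => Unit): A = {
  val attempt = Try(body)
  val cleanUp = Try(cleanup)
  attempt.flatMap(r => cleanUp.map(_ => r)).get
}

"Admin" should "be able to create a new team" in {
  val tempTeam = Team("Temp QA Team")
  withCleanup {
    When("Admin opens the Teams view")
    TeamsPage.open
    And("creates a new team")
    TeamsPage.createNewTeam(tempTeam)
    Then("this team is shown in the list")
    TeamsPage.isParticularTeamShownInTeamList(tempTeam.name) shouldBe true
  } {
    TeamsPage.cleanUpTeam(tempTeam)
  }
}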

How to pass and get attributes in Gatling session from and to "exec" blocks

I'm pretty new to Scala/Gatling, so forgive me if you see an anti-pattern or something wrong. I have a Gatling scenario in which I have to run some external bash scripts and save some variables for use in another exec block. I've tried calling the .exec right after the exec(session => { ... }) block, and I've also tried calling it as a method in another object.
exec(session => {
  val scriptOutput = s"src/main/resources/thepath/myscript.sh ${arg1} ${arg2}".!!
  val x_variable = "123" + scriptOutput
  session.set("x_variable", x_variable)
})
.exec(MyClient.calling)
In "MyClient", I need to use the value of "x_variable", I currently have something like this:
def calling() = {
  exec(http("POST to ${x_variable}")
    .post("/${x_variable}"))
}
But when doing so, it doesn't work: the POST call is made but the variable "x_variable" is empty. To summarize, the question is how to pass that session information to any following "exec" block (right after, or in another object), and how to consume it from that session.
The code described is working now; it seems I had some "trash" in the environment. After doing an mvn clean install it worked as expected.
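For reference, besides the ${x_variable} EL syntax, a session attribute set in one exec block can also be read explicitly inside a later session function. A minimal sketch, assuming Gatling's standard Session API and the attribute name from the question:

.exec(session => {
  // Read back the attribute stored earlier in the chain
  val xVariable = session("x_variable").as[String]
  println(s"x_variable is $xVariable")
  session // the session function must return a session
})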

How do I ensure that my Apache Spark setup code runs only once?

I'm writing a Spark job in Scala that reads in parquet files on S3, does some simple transforms, and then saves them to a DynamoDB instance. Each time it runs we need to create a new table in Dynamo so I've written a Lambda function which is responsible for table creation. The first thing my Spark job does is generates a table name, invokes my Lambda function (passing the new table name to it), waits for the table to be created, and then proceeds normally with the ETL steps.
However it looks as though my Lambda function is consistently being invoked twice. I cannot explain that. Here's a sample of the code:
def main(spark: SparkSession, pathToParquet: String) {
  // generate a unique table name
  val tableName = generateTableName()
  // call the lambda function
  val result = callLambdaFunction(tableName)
  // wait for the table to be created
  waitForTableCreation(tableName)
  // normal ETL pipeline
  val parquetRDD = spark.read.parquet(pathToParquet)
  val transformedRDD = parquetRDD.map((row: Row) => transformData(row))(kryo[(Text, DynamoDBItemWritable)])
  transformedRDD.saveAsHadoopDataset(getConfiguration(tableName))
  spark.sparkContext.stop()
}
The code to wait for table creation is pretty straightforward, as you can see:
def waitForTableCreation(tableName: String) {
  val client: AmazonDynamoDB = AmazonDynamoDBClientBuilder.defaultClient()
  val waiter: Waiter[DescribeTableRequest] = client.waiters().tableExists()
  try {
    waiter.run(new WaiterParameters[DescribeTableRequest](new DescribeTableRequest(tableName)))
  } catch {
    case ex: WaiterTimedOutException =>
      LOGGER.error("Timed out waiting to create table: " + tableName)
      throw ex
    case t: Throwable => throw t
  }
}
And the lambda invocation is equally simple:
def callLambdaFunction(tableName: String) {
  val myLambda = LambdaInvokerFactory.builder()
    .lambdaClient(AWSLambdaClientBuilder.defaultClient)
    .lambdaFunctionNameResolver(new LambdaByName(LAMBDA_FUNCTION_NAME))
    .build(classOf[MyLambdaContract])
  myLambda.invoke(new MyLambdaInput(tableName))
}
Like I said, when I run spark-submit on this code, it definitely does hit the Lambda function. But I can't explain why it hits it twice. The result is that I get two tables provisioned in DynamoDB.
The waiting step also seems to fail within the context of running this as a Spark job. But when I unit-test my waiting code it seems to work fine on its own. It successfully blocks until the table is ready.
At first I theorized that perhaps spark-submit was sending this code to all of the worker nodes and they were independently running the whole thing. Initially I had a Spark cluster with 1 master and 2 workers. However I tested this out on another cluster with 1 master and 5 workers, and there again it hit the Lambda function exactly twice, and then apparently failed to wait for table creation, because it died shortly after invoking the Lambdas.
Does anyone have any clues as to what Spark might be doing? Am I missing something obvious?
UPDATE: Here are my spark-submit args, which are visible on the Steps tab of EMR:
spark-submit --deploy-mode cluster --class com.mypackage.spark.MyMainClass s3://my-bucket/my-spark-job.jar
And here's the code for my getConfiguration function:
def getConfiguration(tableName: String): JobConf = {
  val conf = new Configuration()
  conf.set("dynamodb.servicename", "dynamodb")
  conf.set("dynamodb.input.tableName", tableName)
  conf.set("dynamodb.output.tableName", tableName)
  conf.set("dynamodb.endpoint", "https://dynamodb.us-east-1.amazonaws.com")
  conf.set("dynamodb.regionid", "us-east-1")
  conf.set("mapred.output.format.class", "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat")
  conf.set("mapred.input.format.class", "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat")
  new JobConf(conf)
}
Also here is a Gist containing some of the exception logs I see when I try to run this.
Thanks @soapergem for adding the logging and options. I'm adding an answer (a tentative one) since it may be a bit longer than a comment :)
To wrap up:
nothing strange with spark-submit and the configuration options
in https://gist.github.com/soapergem/6b379b5a9092dcd43777bdec8dee65a8#file-stderr-log you can see that the application is executed twice: it passes twice from an ACCEPTED to a RUNNING state. That is consistent with EMR defaults (How to prevent EMR Spark step from retrying?). To confirm it, you can check whether you have 2 tables created after executing the step (I assume here that you're generating tables with dynamic names, a different name per execution, which in case of a retry should give 2 different names); one common mitigation is sketched below.
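If that second run really is a YARN application retry, a commonly used mitigation (an assumption on my side, not something visible in the gist) is to cap the number of application attempts when submitting the step, for example:

spark-submit --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=1 \
  --class com.mypackage.spark.MyMainClass \
  s3://my-bucket/my-spark-job.jar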
For your last question:
It looks like my code might work if I run it in "client" deploy mode, instead of "cluster" deploy mode? Does that offer any hints to anyone here?
For more information about the difference, please check https://community.hortonworks.com/questions/89263/difference-between-local-vs-yarn-cluster-vs-yarn-c.html. In your case, it looks like the machine executing spark-submit in client mode has different IAM policies than the EMR jobflow. My supposition here is that your jobflow role is not allowed dynamodb:Describe*, and that's why you're getting the exception with the 400 code (from your gist):
Caused by: com.amazonaws.services.dynamodbv2.model.ResourceNotFoundException: Requested resource not found: Table: EmrTest_20190708143902 not found (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ResourceNotFoundException; Request ID: V0M91J7KEUVR4VM78MF5TKHLEBVV4KQNSO5AEMVJF66Q9ASUAAJG)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:4243)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:4210)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeDescribeTable(AmazonDynamoDBClient.java:1890)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.describeTable(AmazonDynamoDBClient.java:1857)
at org.apache.hadoop.dynamodb.DynamoDBClient$1.call(DynamoDBClient.java:129)
at org.apache.hadoop.dynamodb.DynamoDBClient$1.call(DynamoDBClient.java:126)
at org.apache.hadoop.dynamodb.DynamoDBFibonacciRetryer.runWithRetry(DynamoDBFibonacciRetryer.java:80)
To confirm this hypothesis, you can execute the part that creates the table and waits for its creation locally (no Spark code here, just a simple run of your main function, as sketched below) and:
for the first execution, ensure that you have all permissions. IMO it will be dynamodb:Describe* on Resources: * (if that's the reason, AFAIK you should use something like Resources: Test_Emr* in production, for the principle of least privilege)
for the 2nd execution, remove dynamodb:Describe* and check whether you get the same stack trace as in the gist
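A minimal sketch of such a local check, simply reusing the helpers from the question (assuming they compile outside the Spark job):

object TableCreationCheck {
  def main(args: Array[String]): Unit = {
    // Runs only the table-creation part, with the local credentials and IAM policies
    val tableName = generateTableName()
    callLambdaFunction(tableName)
    waitForTableCreation(tableName)
    println(s"Table $tableName is ready")
  }
}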
I encountered the same problem in cluster mode too (v2.4.0). I worked around it by launching my apps programmatically using SparkLauncher instead of spark-submit.sh. You could move your lambda logic into the main method that starts your Spark app, like this:
def main(args: Array[String]) = {
  // generate a unique table name
  val tableName = generateTableName()
  // call the lambda function
  val result = callLambdaFunction(tableName)
  // wait for the table to be created
  waitForTableCreation(tableName)

  val latch = new CountDownLatch(1)
  val handle = new SparkLauncher(env)
    .setAppResource("/path/to/spark-app.jar")
    .setMainClass("com.company.SparkApp")
    .setMaster("yarn")
    .setDeployMode("cluster")
    .setConf("spark.executor.instances", "2")
    .setConf("spark.executor.cores", "2")
    // other conf ...
    .setVerbose(true)
    .startApplication(new SparkAppHandle.Listener {
      override def stateChanged(sparkAppHandle: SparkAppHandle): Unit = {
        latch.countDown()
      }
      override def infoChanged(sparkAppHandle: SparkAppHandle): Unit = {
      }
    })

  println("app is launching...")
  latch.await()
  println("app exited")
}
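This presumably needs the spark-launcher artifact on the classpath of the launching process. An sbt line for it might look like the following (the exact version is an assumption; match it to your Spark distribution):

// build.sbt (hypothetical version, align with your Spark version)
libraryDependencies += "org.apache.spark" %% "spark-launcher" % "2.4.0"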
Your Spark job starts before the table is actually created, because defining the operations one after another doesn't by itself mean they wait for the previous one to finish.
You need to change the code so that the Spark-related block starts only after the table is created. To achieve that, either use a for-comprehension that ensures every step is finished, or put your Spark pipeline into the callback of the waiter that is invoked after the table is created (if you have one, it's hard to tell).
You can also use andThen or a simple map.
The main point is that all the lines of code written in your main are executed one by one, immediately, without waiting for the previous one to finish.

Taking webpage screenshot on completion of a Cucumber Step Definition

I'm currently researching a way to implement a screen-capture method in my Scala Cucumber acceptance test suite, so that a screenshot is taken after each step definition in a scenario completes.
I have already implemented a method that takes a screenshot of the web page if one of the automation tests fails, by invoking the method in the after-hooks class. This works fine, but it only captures the web page once the entire scenario has completed.
I wasn't sure if there was something like before and after hooks that could be applied to the steps instead of the scenario.
Hooks.scala
@After
def tearDown(result: Scenario) {
  if (result.isFailed) {
    ifCurrentDriverTakesSnapshot {
      takesSnapshot =>
        Snapshotter.takeErrorSnapshot(takesSnapshot, result)
    }
  }
}
Snapshotter.scala
def takeErrorSnapshot(takesScreenshot: TakesScreenshot, result: Scenario) = {
  try {
    val screenshot = takesScreenshot.getScreenshotAs(OutputType.BYTES)
    result.embed(screenshot, "image/png")
  } catch {
    case e: WebDriverException =>
      e.printStackTrace(System.err)
  }
}
I would like to be able to do this in a class or method that can be called after each new page is opened, or after each step definition. I could write a step definition that handles the screen capture, but with hundreds of test scenarios I'd rather avoid adding such a step between every step definition, as they have done in the link below.
Cucumber Java screenshots
If anyone could shed some light on the matter I'd greatly appreciate it, as I'm struggling to find much on the subject.
Thanks!
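Depending on the cucumber-jvm / cucumber-scala version in use, there may be a step-level hook (newer releases expose an AfterStep hook alongside the scenario-level ones); whether it is available in this suite's version is an assumption on my part. A rough sketch of how it could reuse the snapshot logic from the question:

// Hypothetical: requires a cucumber-scala version that provides the AfterStep DSL hook.
AfterStep { scenario: Scenario =>
  ifCurrentDriverTakesSnapshot { takesSnapshot =>
    val screenshot = takesSnapshot.getScreenshotAs(OutputType.BYTES)
    scenario.embed(screenshot, "image/png")
  }
}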

Is there a way to chain two arbitrary specs2 tests (in Scala)?

Every now and then I run into a situation where I need to make absolutely sure that one test executes (successfully) before another one.
For example:
"The SecurityManager" should {
"make sure an administrative user exists" in new WithApplication with GPAuthenticationTestUtility {
checkPrerequisiteAccounts(prerequisiteAccounts)
}
"get an account id from a token" in new WithApplication with GPAuthenticationTestUtility {
val token = authenticate(prerequisiteAccounts.head)
token.isSuccess must beTrue
myId = GPSecurityController.getAccountId(token.get)
myId != None must beTrue
myId.get.toInt > 0 must beTrue
}
The first test will create the admin user if it doesn't exist. The second test uses that account to perform a test.
I am aware I can do a Before/After treatment in specs2 (though I've never done one). But I really don't want checkPrerequisiteAccounts to run before every test, just before that first test executes... sort of a "before you start doing anything at all, do this one thing..."
Anyone know if there is a way to tag a particular test as "do first" or "do before anything else?"
You can just add a "Step" in between tests to enforce some sequentiality:
"make sure an administrative user exists" in ok
step("admin is created".pp)
"get an account id from a token" in ok
You can also add sequential to your spec, to force sequential execution of the tests:
class MySpec extends mutable.Specification {
  sequential
  // rest follows
  // behaviour one
  // behaviour two
}
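Applied to the spec from the question, the step approach could look roughly like this (a sketch only, reusing the question's helpers and assuming a mutable specification):

class SecuritySpec extends org.specs2.mutable.Specification {
  "The SecurityManager" should {
    "make sure an administrative user exists" in new WithApplication with GPAuthenticationTestUtility {
      checkPrerequisiteAccounts(prerequisiteAccounts)
    }

    // step() acts as a barrier: everything declared above it finishes
    // before anything declared below it starts
    step("admin account prepared".pp)

    "get an account id from a token" in new WithApplication with GPAuthenticationTestUtility {
      val token = authenticate(prerequisiteAccounts.head)
      token.isSuccess must beTrue
    }
  }
}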

Deadbolt 2 Restrict function has only one possible failure code

This question may have a bit of philosophical aspect to it.
I have been using Deadbolt 2 (Scala) in my Play application and it works quite well.
In looking at the Restrict function definition (line 47), I noticed that it will invoke onAuthFailure for one of the following reasons:
No user in session (no subject)
Action specified no roles.
User attempted an action for which they did not possess one or more required roles.
In my application UI, I would like to receive a different status code for each of these, so that a user who is not logged in (condition 1) is redirected to the login page, while condition 3 is handled more gracefully with just a warning (since they can do no harm anyway and might have accidentally tried to edit while having 'read-only' access - perhaps a UI bug, but logging in again is a bit draconian).
If I had to settle for just 2 status codes, however, I would want to differentiate between 1 and the other 2. I can see how this could be accomplished, but I would like to get other opinions on the merits of even doing this.
If I were to implement this change, it looks like I could just override the Restrict function in my own extension of the DeadboltActions trait.
I'm a little new to scala, so I'm open to additional ideas on how to best accomplish these goals.
I decided to just add the code to differentiate between condition 1 and either 2 or 3 as follows:
In MyDeadboltHandler:
class MyDeadboltHandler(dynamicResourceHandler: Option[DynamicResourceHandler] = None) extends DeadboltHandler {
  ...

  def onAuthFailure[A](request: Request[A]): Result = {
    Logger.error("authentication failure")
    val json = new JsonStatus("Failed to authenticate", -1).toJson.toString
    if (noUserInSession(request)) {
      Results.Forbidden(json).withHeaders("Access-Control-Allow-Origin" -> "*")
    } else {
      Results.Unauthorized(json).withHeaders("Access-Control-Allow-Origin" -> "*")
    }
  }

  def noUserInSession(request: RequestHeader) = {
    username(request) match {
      case Some(u: String) => false
      case _ => true
    }
  }
}
This works well for me and does not impose upon the basic Deadbolt-2 functionality.