Passing arguments between Gatling scenarios and simulation - scala

I'm current creating some Gatling simulation to test a REST API. I don't really understand Scala.
I've created a scenario with several exec and pause;
object MyScenario {
val ccData = ssv("cardcode_fr.csv").random
val nameData = ssv("name.csv").random
val mobileData = ssv("mobile.csv").random
val emailData = ssv("email.csv").random
val itemData = ssv("item_fr.csv").random
val scn = scenario("My use case")
.feed(ccData)
.feed(nameData)
.feed(mobileData)
.feed(emailData)
.feed(itemData)
.exec(
http("GetCustomer")
.get("/rest/customers/${CardCode}")
.headers(Headers.headers)
.check(
status.is(200)
)
)
.pause(3, 5)
.exec(
http("GetOffers")
.get("/rest/offers")
.queryParam("customercode", "${CardCode}")
.headers(Headers.headers)
.check(
status.is(200)
)
)
}
And I've a simple Simulation :
class MySimulation extends Simulation {
setUp(MyScenario.scn
.inject(
constantUsersPerSec (1 ) during (1)))
.protocols(EsbHttpProtocol.httpProtocol)
.assertions(
global.successfulRequests.percent.is(100))
}
The application I'm trying to simulate is a multilocation mobile App, so I've prepared a set of samples data for each Locale (US, FR, IT...)
My REST API handles all the locales, therefore I want to make the simulation concurrently execute several instances of MyScenario, each with a different locale sample, to simulate the global load.
Is it possible to execute my simulation without having to create/duplicate the scenario and change the val ccData = ssv("cardcode_fr.csv").random for each one?
Also, each locale has its own load, how can I create a simulation that takes a single scenario and executes it several times concurrently with a different load and feeders?
Thanks in advance.

From what you've said, I think this may be a good approach:
Start by grouping your data in such a way that you can look up each item you want to send based on the current locale. For this, I would recommend using a Map that matches a locale string (such as "FR") to the item that matches that locale for the field you're looking to fill in. Then, at the start of each iteration of the scenario, you just pick which locale you want to use for the current iteration from a list. It would look something like this:
val locales = List("US", "FR", "IT")
val names = Map( "US" -> "John", "FR" -> "Pierre", "IT" -> "Guillame")
object MyScenario {
//These two lines pick a random locale from your list
val random_index = rand.nextInt(locales.length);
val currentLocale = locales(random_index);
//This line gets the name
val name = names(currentLocale)
//Do the rest of your logic here
}
This is a very simplified example - you'll have to figure out how you actually want to retrieve the data from files and put it into a Map structure, as I assume you don't want to hard code every item for every field into your code.

Related

Problems joining 2 kafka streams (using custom timestampextractor)

I'm having problems joining 2 kafka streams extracting the date from the fields of my event. The join is working fine when I do not define a custom TimeStampExtractor but when I do the join does not work anymore. My topology is quite simple:
val builder = new StreamsBuilder()
val couponConsumedWith = Consumed.`with`(Serdes.String(),
getAvroCouponSerde(schemaRegistryHost, schemaRegistryPort))
val couponStream: KStream[String, Coupon] = builder.stream(couponInputTopic, couponConsumedWith)
val purchaseConsumedWith = Consumed.`with`(Serdes.String(),
getAvroPurchaseSerde(schemaRegistryHost, schemaRegistryPort))
val purchaseStream: KStream[String, Purchase] = builder.stream(purchaseInputTopic, purchaseConsumedWith)
val couponStreamKeyedByProductId: KStream[String, Coupon] = couponStream.selectKey(couponProductIdValueMapper)
val purchaseStreamKeyedByProductId: KStream[String, Purchase] = purchaseStream.selectKey(purchaseProductIdValueMapper)
val couponPurchaseValueJoiner = new ValueJoiner[Coupon, Purchase, Purchase]() {
#Override
def apply(coupon: Coupon, purchase: Purchase): Purchase = {
val discount = (purchase.getAmount * coupon.getDiscount) / 100
new Purchase(purchase.getTimestamp, purchase.getProductid, purchase.getProductdescription, purchase.getAmount - discount)
}
}
val fiveMinuteWindow = JoinWindows.of(TimeUnit.MINUTES.toMillis(10))
val outputStream: KStream[String, Purchase] = couponStreamKeyedByProductId.join(purchaseStreamKeyedByProductId,
couponPurchaseValueJoiner,
fiveMinuteWindow
)
outputStream.to(outputTopic)
builder.build()
As I said this code works like a charm when I do not use a custom TimeStampExtractor but when I do by setting the StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG to my custom extractor class (I've double checked that the class is extracting the date properly) the join does not work anymore.
I'm testing the topology by running a unit test and passing the following events to it:
val coupon1 = new Coupon("Dec 05 2018 09:10:00.000 UTC", "1234", 10F)
// Purchase within the five minutes after the coupon - The discount should be applied
val purchase1 = new Purchase("Dec 05 2018 09:12:00.000 UTC", "1234", "Green Glass", 25.00F)
val purchase1WithDiscount = new Purchase("Dec 05 2018 09:12:00.000 UTC", "1234", "Green Glass", 22.50F)
val couponRecordFactory1 = couponRecordFactory.create(couponInputTopic, "c1", coupon1)
val purchaseRecordFactory1 = purchaseRecordFactory.create(purchaseInputTopic, "p1", purchase1)
testDriver.pipeInput(couponRecordFactory1)
testDriver.pipeInput(purchaseRecordFactory1)
val outputRecord1 = testDriver.readOutput(outputTopic,
new StringDeserializer(),
JoinTopologyBuilder.getAvroPurchaseSerde(
schemaRegistryHost,
schemaRegistryPort).deserializer())
OutputVerifier.compareKeyValue(outputRecord1, "1234", purchase1WithDiscount)
Not sure if the step of selecting a new key is getting rid of the proper date. I have tested a lot of combinations with no luck :(
Any help would be really appreciated!
I'm not sure of that because I don't know how much you test your code, but my guess will be that :
1) your code work with the default timestamp extractor because it's using the time when you're sending record into the pipes as timestamps records, so basically it will work because in your test you're sending data one after another without a pause.
2) you are using the TopologyTestDriver to do your tests !
Note that it's very useful for testing your business code and the topology as a unit (what I have as inputs and what is the correct according outputs) but there isn't a Kafka Stream app running in thoses tests.
In your case you can play with the method advanceWallClockTime(long) in the TopologyTestDriver class to simulate the system time walking.
If you want to start the topology you will have to do an integration test with an embedded kafka cluster (there is one on kafka libraries that's working just fine !).
Let me know if that's help :-)
Thank you for replying. I was working on this yesterday and I think I found the problem. As you said I am using the TopologyTestDriver to run my tests and when you initialize the TopologyTestDriver class it uses an initialWallClockTime, if you do not provide a value, the TopologyTestDriver will pick up the currentTimeMillis:
public TopologyTestDriver(Topology topology, Properties config) {
this(topology, config, System.currentTimeMillis());
}
There is another constructor that allows you to pass-in an initialWallClockTime. I've been testing this method but for some reason it does not work for me.
So to sum up my solution has been to create the Purchase and Coupon objects with the current timestamp. I'm still using my custom timestamp extractor but instead of hardcoding a date I am always getting the current timestamp and this way the join works fine.
Not fully happy with my end solution because I don't know why the initialWallClockTime does not work for me, but at least the tests are working fine now.

Making gatling generate random data per request using a feeder

I'm trying to get gatling to create random data per POST request. I've followed a few posts on stackoverflow and other places. I came up with this scenario -
def randomUuid = UUID.randomUUID().toString
val feeder = Iterator.continually(Map("user" -> randomUuid))
def createPostRequest = {
http("createuser")
.post("http://jsonplaceholder.typicode.com/posts")
.body(StringBody("${user}"))
.check(status.is(201))
}
val scn = scenario("some load test")
.feed(feeder)
.forever(exec(createPostRequest))
setUp(scn.inject(atOnceUsers(1)))
.maxDuration(20 minutes)
However, when I run this code it just calls my feeder once to create a single UUID and just re-uses the same UUID throughout the load test.
I created the code above after following this thread. I'm using gatling 2.2.5. Here's my sbt config -
import sbt._
object Dependencies {
private val gatlingHighcharts = "io.gatling.highcharts" % "gatling-
charts-highcharts" % "2.2.5" % "test"
private val gatlingTest = "io.gatling" % "gatling-test-framework" % gatlingHighcharts.revision % "test"
val gatlingDependencies = Seq(gatlingHighcharts, gatlingTest)
}
As you don't call feed inside a loop, typically your forever one, you will indeed only generate one single value per virtual user.
If what you want is to have unique values per loop iteration, move the feed call inside the loop.
in your setUp, you're only creating one user - so your scenario is only getting executed once, meaning that 'feed' only occurs once before you start looping over your request.
change your scenario to be
val scn = scenario("some load test")
.feed(feeder)
.exec(createPostRequest)
and make your setUp (replacing 100 with whatever number of users you want)
setUp(scn.inject(atOnceUsers(100)))

SparkSQL performance issue with collect method

We are currently facing a performance issue in sparksql written in scala language. Application flow is mentioned below.
Spark application reads a text file from input hdfs directory
Creates a data frame on top of the file using programmatically specifying schema. This dataframe will be an exact replication of the input file kept in memory. Will have around 18 columns in the dataframe
var eqpDF = sqlContext.createDataFrame(eqpRowRdd, eqpSchema)
Creates a filtered dataframe from the first data frame constructed in step 2. This dataframe will contain unique account numbers with the help of distinct keyword.
var distAccNrsDF = eqpDF.select("accountnumber").distinct().collect()
Using the two dataframes constructed in step 2 & 3, we will get all the records which belong to one account number and do some Json parsing logic on top of the filtered data.
var filtrEqpDF =
eqpDF.where("accountnumber='" + data.getString(0) + "'").collect()
Finally the json parsed data will be put into Hbase table
Here we are facing performance issues while calling the collect method on top of the data frames. Because collect will fetch all the data into a single node and then do the processing, thus losing the parallel processing benefit.
Also in real scenario there will be 10 billion records of data which we can expect. Hence collecting all those records in to driver node will might crash the program itself due to memory or disk space limitations.
I don't think the take method can be used in our case which will fetch limited number of records at a time. We have to get all the unique account numbers from the whole data and hence I am not sure whether take method, which takes
limited records at a time, will suit our requirements
Appreciate any help to avoid calling collect methods and have some other best practises to follow. Code snippets/suggestions/git links will be very helpful if anyone have had faced similar issues
Code snippet
val eqpSchemaString = "acoountnumber ....."
val eqpSchema = StructType(eqpSchemaString.split(" ").map(fieldName =>
StructField(fieldName, StringType, true)));
val eqpRdd = sc.textFile(inputPath)
val eqpRowRdd = eqpRdd.map(_.split(",")).map(eqpRow => Row(eqpRow(0).trim, eqpRow(1).trim, ....)
var eqpDF = sqlContext.createDataFrame(eqpRowRdd, eqpSchema);
var distAccNrsDF = eqpDF.select("accountnumber").distinct().collect()
distAccNrsDF.foreach { data =>
var filtrEqpDF = eqpDF.where("accountnumber='" + data.getString(0) + "'").collect()
var result = new JSONObject()
result.put("jsonSchemaVersion", "1.0")
val firstRowAcc = filtrEqpDF(0)
//Json parsing logic
{
.....
.....
}
}
The approach usually take in this kind of situation is:
Instead of collect, invoke foreachPartition: foreachPartition applies a function to each partition (represented by an Iterator[Row]) of the underlying DataFrame separately (the partition being the atomic unit of parallelism of Spark)
the function will open a connection to HBase (thus making it one per partition) and send all the contained values through this connection
This means the every executor opens a connection (which is not serializable but lives within the boundaries of the function, thus not needing to be sent across the network) and independently sends its contents to HBase, without any need to collect all data on the driver (or any one node, for that matter).
It looks like you are reading a CSV file, so probably something like the following will do the trick:
spark.read.csv(inputPath). // Using DataFrameReader but your way works too
foreachPartition { rows =>
val conn = ??? // Create HBase connection
for (row <- rows) { // Loop over the iterator
val data = parseJson(row) // Your parsing logic
??? // Use 'conn' to save 'data'
}
}
You can ignore collect in your code if you have large set of data.
Collect Return all the elements of the dataset as an array at the driver program. This is usually useful after a filter or other operation that returns a sufficiently small subset of the data.
Also this can cause the driver to run out of memory, though, because collect() fetches the entire RDD/DF to a single machine.
I have just edited your code, which should work for you.
var distAccNrsDF = eqpDF.select("accountnumber").distinct()
distAccNrsDF.foreach { data =>
var filtrEqpDF = eqpDF.where("accountnumber='" + data.getString(0) + "'")
var result = new JSONObject()
result.put("jsonSchemaVersion", "1.0")
val firstRowAcc = filtrEqpDF(0)
//Json parsing logic
{
.....
.....
}
}

Flink: ProcessWindowFunction

I am recently studying ProcessWindowFunction in Flink's new release. It says the ProcessWindowFunction supports global state and window state. I use Scala API to give it a try. I can so far get the global state working but I do no have any luck to make it for the window state. What I'm doing is to process system logs and count the number of logs keyed by hostname and severity level. I would like to calculate the difference in log count between two adjacent windows. Here is my code implementing ProcessWindowFunction.
class LogProcWindowFunction extends ProcessWindowFunction[LogEvent, LogEvent, Tuple, TimeWindow] {
// Create a descriptor for ValueState
private final val valueStateWindowDesc = new ValueStateDescriptor[Long](
"windowCounters",
createTypeInformation[Long])
private final val reducingStateGlobalDesc = new ReducingStateDescriptor[Long](
"globalCounters",
new SumReduceFunction(),
createTypeInformation[Long])
override def process(key: Tuple, context: Context, elements: Iterable[LogEvent], out: Collector[LogEvent]): Unit = {
// Initialize the per-key and per-window ValueState
val valueWindowState = context.windowState.getState(valueStateWindowDesc)
val reducingGlobalState = context.globalState.getReducingState(reducingStateGlobalDesc)
val latestWindowCount = valueWindowState.value()
println(s"lastWindowCount: $latestWindowCount ......")
val latestGlobalCount = if (reducingGlobalState.get() == null) 0L else reducingGlobalState.get()
// Compute the necessary statistics and determine if we should launch an alarm
val eventCount = elements.size
// Update the related state
valueWindowState.update(eventCount.toLong)
reducingGlobalState.add(eventCount.toLong)
for (elem <- elements) {
out.collect(elem)
}
}
}
I always get 0 value from the window state instead of the previous updated count it should be. I've been struggling with such problem for several days. Can someone please help me to figure it out? Thanks.
The scope of the per-window state is a single window instance. In the case of your process method above, every time it is called a new window is in scope, and so the latestWindowCount is always zero.
For a normal, vanilla window that is only going to fire once, per-window state is useless. Only if a window somehow has multiple firings (e.g., late firings) can you make good use of the per-window state. If you are trying to remember something from one window to the next, then you can do this with the global window state.
For an example of using per-window state to remember data to use in late firings, see slides 13-19 in Flink's advanced window training.

Saving session attribute in a list

I would like to save a session attribute in a list in my gatling simulation. What am trying to do is to get all the values of my JSON who are defined in a CV file and write it in a file. In my example below "test" is always equal to the value of the first jsonPath.
Here what I am doing:
val scn1 = scenario("[SCENARIO] GET")
.repeat(Nbproduct-1, "counter") (
feed(csv(CSV).circular)
.exec(http("get JSON")
.get(url_1")
.check(jsonPath("""$.${meta_ref}""").find.saveAs("test")))
.pause(1)
.exec(session => {
writer.write("\""+session("meta_cts").as[String]+"\":\"" + session("test").as[String]+"\",\n")
session
}
)
I also tried this but it get the value of the counter...
.check(jsonPath("""$.${meta_ref}""").find.saveAs("""jdd_value("${counter}")""")))
Thanks for the help!
Feeders are shared datasources, so first user will pop the first record, second user the second record, etc...
Then, it's not possible to define checks at runtime (depending on some entries in a file). All DSL components are builders that are only resolved once when the Simulation is loaded.