Gatling & Scala: How to split values in a loop?

I want to split some values in a loop. I used the split method in a check and it works for me, but there are more than 25 values of two different types.
So I am trying to implement a loop in Scala and am struggling.
Consider the following scenario:
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._
class testSimulation extends Simulation {
val httpProtocol = http
val uri1 = ""
val scn = scenario("EditAttribute")
.post(uri1 + "/web/guest/")
.check(jsonPath("$.data[0].pid").transform(_.split('#').toSeq).saveAs("pID")) // Saving the split value
.check(jsonPath("$.data[*].AdId").findAll.saveAs("aID")) // All values are collected in vector
// .check(jsonPath("$.data[*].AdId").transform(_.split('#').toSeq).saveAs("aID")) // Split method Not working for batch
// .check(jsonPath("$.data[*].AdId").findAll.saveAs("aID")) // To verify the length of array (vector)
.formParam("entityTypeId", "${pID(0)}") // passing the split value, works perfectly
.formParam("action_id", "${aID(0)},${aID(1)},${aID(2)},..and so on") // need to pass the split values, which is not happening
.formParam("userId", "${rID}")
// To verify the values on the console (what I am getting after splitting)...
.exec( session => {
  val abc = session("pID").as[Seq[String]]
  val xyz = session("aID").as[Seq[String]]
  println("Separated pId ===> " + abc(0)) // output - first split value
  println("Separated pId ===> " + abc(1)) // split separator
  println("Separated pId ===> " + abc(2)) // second split value
  println("Length ===> " + abc.length) // output - 3
  println("Length ===> " + xyz.length) // output - 25
  session // a session function has to return the session
})
.get("https://" + uri1 + "/logout")
I want to implement a loop which splits all 25 values in the session. I do not want to hard-code the indexes.
I am a newbie to Scala and Gatling as well.

Since it is a session function, the snippet below should give you a direction to continue. Use split just like you do in Java:
exec { session =>
var requestIdValue = new scala.util.Random().nextInt(Integer.MAX_VALUE).toString();
var length = jobsQue.length
try {
var reportElement = jobsQue.pop()
jobData = reportElement.getData;
xml = Configuration.XML.replaceAll("requestIdValue", requestIdValue);
println(s"For Request Id : $requestIdValue .Data Value from feeder is : $jobData Current size of jobsQue : $length");
} catch {
case e: NoSuchElementException => print("Error")
}
"xmlRequest" -> xml)

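To avoid hard-coding the indexes, the same split can be applied to every element of the saved vector with map. Below is a minimal sketch with plain Scala collections, assuming each value looks like "12#A" and that only the part before '#' is needed (the sample values and the object name SplitAll are made up for illustration). Inside Gatling, the input would come from session("aID").as[Seq[String]] within a session function, and the joined string would be stored back with session.set.

```scala
object SplitAll {
  // Hypothetical sample standing in for the vector saved by findAll
  val adIds: Seq[String] = Seq("12#A", "34#B", "56#C")

  // Split every entry on '#' and keep the part before the separator
  def firstParts(values: Seq[String]): Seq[String] =
    values.map(_.split('#').head)

  def main(args: Array[String]): Unit = {
    // Join the split values into one comma-separated parameter value
    val joined = firstParts(adIds).mkString(",")
    println(joined) // prints 12,34,56
  }
}
```

The joined string can then be passed as a single form parameter instead of listing each index by hand.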

Crafting the body for a request does not work concurrently

I would like to send simultaneous requests through Gatling for some duration.
Below is the snippet of my code where I am crafting the requests.
The jsonFileContents function is used for crafting the JSON; it is used in the main request.
TestDevice_dev.csv has a list of 30 devices; after 30 the list is reused.
val dFeeder = csv("TestDevice_dev.csv").circular
val trip_dte_tunnel_1 = scenario("TripSimulation")
.exec(session => {
val key = conf.getString("config.env.sign_key")
var bodyTrip = CannedRequests.jsonFileContents("${deviceID}")
//deviceId comes from the feeder
session.set("trip_sign", SignatureGeneration.getSignature(key, bodyTrip))
the scenario is started as below
val scn_trip = scenario("trip simulation")
.repeat{1} {
setUp(scn_trip.inject(constantUsersPerSec(5) during (5 seconds)))
It runs fine if there is 1 user for 5 seconds, but not with simultaneous users.
The function which crafts the JSON request looks like the below:
def jsonFileContents(deviceId: String): String = {
val fileName = "trip-data.json"
var stringBuilder=""
var timeStamp1:Long ="America/Chicago")).toInstant().toEpochMilli().toLong - 10000.toLong
for (line <- (Source fromFile fileName).getLines) {
if (line.contains("eventDateTime")) {
var lineReplace=line.replaceAll("<timeStamp>", timeStamp1.toString())
timeStamp1 = timeStamp1+1000.toLong
else if (line.contains("onTimeStamp")) {
var lineReplace1=line.replaceAll("<onTimeStamp>", timeStamp1.toString)
else if (line.contains("deviceID")){
var lineReplace2=line.replace("<deviceID>", deviceId)
else {
stringBuilder =stringBuilder+line
Best guess: your feeder contains one single entry and you're using the default queue strategy. Either add more entries in your feeder file to match the number of users, or use a different strategy.
This really is explained in the documentation, including the tutorials. I recommend you take some time to read the documentation before rushing into the code, you'll save lots of time in the end.
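As a rough illustration of the difference between the strategies, here is a plain Scala model (not Gatling's implementation): a queue-style feeder behaves like an iterator that runs out once every record has been handed to a user, while a circular one starts again from the top. The device names and the object name FeederStrategies are hypothetical.

```scala
object FeederStrategies {
  // Hypothetical records standing in for the rows of the CSV file
  val devices: Vector[String] = Vector("dev-1", "dev-2", "dev-3")

  // Queue-like: each record is handed out once, then the iterator is empty
  def queue: Iterator[String] = devices.iterator

  // Circular-like: wraps around to the first record when the end is reached
  def circular: Iterator[String] = Iterator.continually(devices).flatten

  def main(args: Array[String]): Unit = {
    println(queue.take(5).toList)    // only three records are available
    println(circular.take(5).toList) // five records, reusing the first two
  }
}
```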
You don't need to do your own parameter substitution of values in the JSON file. Gatling supports passing an ElFileBody as the body, where you can have a JSON file with Gatling EL expressions like ${deviceId}.

Transforming specific field of the RDD

I am new to Spark. I have a question about transforming a specific field of an RDD.
I have a file like below:
And I want output like below: in the third field I want to remove all letters and keep the integers, separated by |.
How can I do that?
I tried the code below.
val inputRdd =sc.textFile("file:///home/arun/Desktop/inputcsv.txt");
val result =inputRdd.flatMap(line=>line.split("\\|")).collect;
def ghi(arr:Array[String]):Array[String]=
var outlist=scala.collection.mutable.Buffer[String]();
for( i <-0 to arr.length-1){
var io=arr(i); var arru=scala.collection.mutable.Buffer[String]();
var ki=io.split("/");
for(st <-0 to ki.length-1 )
var ion =ki(st).split("-");
var strui="";
for(in <-0 to arru.length-1)
var ion =arr(i).split("-");
return outlist.toArray;
var output=ghi(result);
val finalrdd=sc.parallelize(out, 1);
Please help me.
What we need to do is to extract the numbers from that field and add them as new entries to the Array being processed.
Something like this should do:
// use data provided as sample
val dataSample = """2016-11-10T07:01:37|AAA|S16.12|MN-MN/AAA-329044|288364|2|3"""
val data = sparkContext.parallelize(dataSample.split("\n").toSeq)
val records = data.map(line => line.split("\\|"))
// this regex can find and extract the contiguous digits in a mixed string.
val numberExtractor = "\\d+".r.unanchored
// we replace field#3 with the results of the regex
val field3Exploded = records.map { arr => arr.take(3) ++ numberExtractor.findAllIn(arr.drop(3).head) ++ arr.drop(4) }
// Let's visualize the result
field3Exploded.collect.foreach(arr=> println(arr.mkString(",")))
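To check the field handling without a Spark context, the same per-record logic can be tried on a single line in plain Scala. This is only a sketch of the idea above; the object and method names (ExplodeField, explode) are made up for illustration, and the function could be passed to map on the RDD of split records.

```scala
object ExplodeField {
  // Finds each run of contiguous digits in a mixed string
  private val numberExtractor = "\\d+".r

  // Replace the field at index 3 with the numbers found in it
  def explode(arr: Array[String]): Array[String] =
    arr.take(3) ++ numberExtractor.findAllIn(arr(3)).toSeq ++ arr.drop(4)

  def main(args: Array[String]): Unit = {
    val line = "2016-11-10T07:01:37|AAA|S16.12|MN-MN/AAA-329044|288364|2|3"
    println(explode(line.split("\\|")).mkString(","))
    // prints 2016-11-10T07:01:37,AAA,S16.12,329044,288364,2,3
  }
}
```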

How to extract nouns from a German text with the StanfordNLP tool in Scala?

I want to extract the nouns of a German text with the StanfordNLP tool. Therefore I added the dependencies for German texts.
My dependencies:
<!--BEGIN: NLP For German Text -->
<!--END: NLP For German Text -->
<!-- Other dependencies -->
In my Scala class I want to extract the nouns of a tweet. Here is the code snippet:
// Start a new processor to use the NLP tools
val proc: Processor = new FastNLPProcessor
// TODO: Explain the val doc
val doc = proc.annotate(text)
// Is a String where the keywords are stored that we want to extract (e.g. Nouns - "N"; etc.)
var keywords: String = ""
// Iterate through each sentence
for (sentence <- doc.sentences) {
// i - contains the word of each sentence in a text in the current loop
// x - saves the position of the word in the current loop
// E.g. for tweet text :: "new scala update xyz" -> first loop: i = new ; x = 0
for ((i, x) <- sentence.tags.get.view.zipWithIndex) {
// "N" - is the abbreviation for Nouns
if (i.toString().startsWith("N")) {
// Append to keywords the noun of a text
keywords = keywords + " " + sentence.words.array(x)
// Print the current state of the keyword string
But it only works for English texts. These are my imports:
import org.clulab.processors.Processor
import org.clulab.processors.fastnlp.FastNLPProcessor
import play.api.libs.json._
import scala.util.parsing.json.JSONObject
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import com.mongodb.casbah.Imports._
import com.mongodb.casbah.MongoConnection
import com.mongodb.casbah.commons.MongoDBObject
import org.clulab.struct.DirectedGraphEdgeIterator
The complete Scala class:
object KeywordExtractor {
// Creates a connection to the MongoDB client
val mongoConn = MongoClient("localhost", 27017)
// Names the DB where the data should be saved
val mongoDB = mongoConn("dbtest")
// Defines the collection in which the text should be stored
val mongoColl = mongoDB("testcollection")
def extractKey(tweet: String) = {
// tweet is sent as a Json-String. jsonObject -> stores the sent tweet as a Json Object
val jsonObject = Json.parse(tweet)
// text String to store the text from the tweet
var text = ""
try {
// try to parse the text from the jsonObject into the text variable
text = (jsonObject \ "text").as[String]
} catch {
case e: JsResultException => println("Limit reached")
// Start a new processor to use the NLP tools
val proc: Processor = new FastNLPProcessor
// TODO: Explain the val doc
val doc = proc.annotate(text)
// Is a String where the keywords are stored that we want to extract (e.g. Nouns - "N"; etc.)
var keywords: String = ""
// Iterate through each sentence
for (sentence <- doc.sentences) {
// i - contains the word of each sentence in a text in the current loop
// x - saves the position of the word in the current loop
// E.g. for tweet text :: "new scala update xyz" -> first loop: i = new ; x = 0
for ((i, x) <- sentence.tags.get.view.zipWithIndex) {
// "N" - is the abbreviation for Nouns
if (i.toString().startsWith("N")) {
// Append to keywords the noun of a text
keywords = keywords + " " + sentence.words.array(x)
// Print the current state of the keyword string
// Create a new MongoDB builder
val builder = MongoDBObject.newBuilder
// Creates a MongoDB object to store the keywords (a BSON document is created, with text as the key for the keywords value)
builder += "text" -> keywords
// TODO: .result not clear
val newObj = builder.result
// Insert the new object into MongoDB
// Clear the memory of the doc to avoid a out of memory error (doc allocates a lot of memory)
It appears that, as of 8/2016, the CLU Lab Processors library does not support multiple languages (see this issue).
The authors have not prioritized adding new languages but are interested in adding them if others have the time.
Note that you can tell the vanilla CoreNLP POS Taggers to use the German model by changing the properties of the StanfordCoreNLP object.
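A minimal sketch of what such properties could look like, built with plain java.util.Properties so it runs without CoreNLP on the classpath. The keys annotators and pos.model are CoreNLP property names; the model path below is only a placeholder (an assumption) and has to be replaced with the tagger file shipped in the German models jar you depend on. The object name GermanTaggerConfig is made up for illustration.

```scala
import java.util.Properties

object GermanTaggerConfig {
  // Placeholder, not a real path: point this at your German tagger model
  val germanTaggerPath: String = "path/to/german-pos-tagger-model"

  def build(): Properties = {
    val props = new Properties()
    // Tokenize, split into sentences, then run the POS tagger
    props.setProperty("annotators", "tokenize,ssplit,pos")
    // Tell the POS tagger which model to load instead of the English default
    props.setProperty("pos.model", germanTaggerPath)
    props
  }

  def main(args: Array[String]): Unit =
    println(build())
}
```

The resulting object would then be handed to the pipeline constructor, e.g. new StanfordCoreNLP(props).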

java heap space error when converting csv to json but no error with d3.csv()

Platform being used: Apache Zeppelin
Language: scala, javascript
I use d3js to read a csv file of size ~40MB and it works perfectly fine with the below code:
<script type="text/javascript">
d3.csv("test.csv", function(data) {
// data is JSON array. Do something with data;
Now, the idea is to avoid d3js and instead construct the JSON array in Scala and access this variable in the JavaScript code through z.angularBind(). Both snippets below work for smaller files, but give a Java heap space error for the 40MB CSV file. What I am unable to understand is: when d3.csv() can do the job without any heap space error, why can't the two snippets below?
Edited Code 1: Using scala's
import java.io.{BufferedReader, FileReader}
import org.json._
val br = new BufferedReader(new FileReader("/root/test.csv"))
// The first line holds the column names
var contentLine = br.readLine()
val keys = contentLine.split(",")
contentLine = br.readLine()
val ja = new JSONArray()
while (contentLine != null) {
  val splits = contentLine.split(",")
  val jo = new JSONObject()
  for (i <- 0 until splits.length) {
    jo.put(keys(i), splits(i))
  }
  // Add the row object to the array
  ja.put(jo)
  contentLine = br.readLine()
}
//z.angularBind("ja",ja.toString()) //ja can be accessed now in javascript (EDITED-10/11/15)
Edited Code 2:
I thought the heap space issue might go away if I used Apache Spark to construct the JSON array as in the code below, but this one gives a heap space error too:
// Builds the JSON string for one CSV row
def myf(keys: Array[String], value: String): String = {
  val splits = value.split(",")
  val jo = new JSONObject()
  for (i <- 0 until splits.length) {
    jo.put(keys(i), splits(i))
  }
  jo.toString()
}
val csv = sc.textFile("/root/test.csv")
val firstrow = csv.first
val header = firstrow.split(",")
val data = csv.filter(x => x != firstrow)
val g = data.map(value => myf(header, value)).collect()
// EDITED BELOW 2 LINES-10/11/15
//var ja= g.mkString("[", ",", "]")
//z.angularBind("ja",ja) //ja can be accessed now in javascript
You are creating JSON objects. They are not native to Java/Scala and will therefore take up more space in that environment. What does z.angularBind() really do?
Also, what is the heap size of your JavaScript environment (see for chrome) and of your Java environment (see How is the default java heap size determined?)?
Update: Removed the original part of the answer where I misunderstood the question
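For the Java side, the heap limit of the running JVM can be read with the standard Runtime API, for example from a Scala paragraph in the notebook. A small sketch (the object name HeapInfo is made up for illustration):

```scala
object HeapInfo {
  private val Mb: Long = 1024L * 1024L

  // Upper bound the JVM will try to use for the heap
  def maxHeapMb: Long = Runtime.getRuntime.maxMemory / Mb

  // Heap currently in use: allocated heap minus the free part of it
  def usedHeapMb: Long = {
    val rt = Runtime.getRuntime
    (rt.totalMemory - rt.freeMemory) / Mb
  }

  def main(args: Array[String]): Unit =
    println(s"max heap: $maxHeapMb MB, used: $usedHeapMb MB")
}
```

Comparing the used value before and after building the JSON array gives a rough idea of how much memory the objects take.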

Apache-Spark: method in foreach doesn't work

I read a file from HDFS which contains x1,x2,y1,y2 values, each line representing an envelope in JTS.
I would like to use this data to build an STRtree in foreach.
val inputData = sc.textFile(inputDataPath).cache()
val strtree = new STRtree
inputData.foreach(line => {
  val array = line.split(",").map(_.toDouble)
  val e = new Envelope(array(0), array(1), array(2), array(3))
  println("envelope is " + e)
  strtree.insert(e, new Rectangle(array(0), array(1), array(2), array(3)))
})
As you can see, I also print the e object.
To my surprise, when I log the size of strtree, it is zero! It seems that the insert method has no effect here.
By the way, if I hard-code some test data line by line, the strtree is built correctly.
One more thing: the project is packaged into a jar and submitted in the spark-shell.
So, why does the method in foreach not work?
You will have to collect() to do this:
inputData.collect().foreach(line => {
  ... // your code
})
You can do this (for avoiding collecting all data):
val pairs = inputData.map(line => {
  val array = line.split(",").map(_.toDouble)
  val e = new Envelope(array(0), array(1), array(2), array(3))
  println("envelope is " + e)
  (e, new Rectangle(array(0), array(1), array(2), array(3)))
})
pairs.collect().foreach(pair => {
  strtree.insert(pair._1, pair._2)
})
Use .map() instead of .foreach() and assign the result to a value.
foreach does not return the result of the applied function. It can be used for sending data somewhere, storing to a DB, printing, and so on.
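The map-then-insert pattern can be tried with plain Scala collections. In this sketch Box is a hypothetical stand-in for the JTS Envelope and an ArrayBuffer stands in for the STRtree, so nothing here uses the real JTS or Spark APIs; with an RDD, the mapped values would first be brought back with collect() before the inserts.

```scala
import scala.collection.mutable.ArrayBuffer

object MapThenInsert {
  // Hypothetical stand-in for a JTS Envelope
  final case class Box(x1: Double, x2: Double, y1: Double, y2: Double)

  // Parse one "x1,x2,y1,y2" line into a Box
  def parse(line: String): Box = {
    val a = line.split(",").map(_.toDouble)
    Box(a(0), a(1), a(2), a(3))
  }

  // map returns the transformed values; the inserts happen afterwards, locally
  def buildIndex(lines: Seq[String]): ArrayBuffer[Box] = {
    val boxes = lines.map(parse)
    val index = ArrayBuffer.empty[Box] // stand-in for the STRtree
    boxes.foreach(b => index += b)
    index
  }

  def main(args: Array[String]): Unit =
    println(buildIndex(Seq("0,1,0,1", "2,3,2,3")).size) // prints 2
}
```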