Gatling feeder via iteration over a map - scala

I wish to create a custom feeder in Gatling (Scala) which fills the parameter element by iterating through a map.
I have the following code snippet:
val idPostFeeder = Iterator.continually(
  Map("postId" -> getValues())
)
getValues() returns a Collection containing String elements.
I tried also the following way:
val idPostFeeder = for (i <- getFile().get(l.get(b))) yield {
  Map("postId" -> s"$i")
}
val l = getFile().keysIterator.toList
var b = l.indexOf() until l.indexOf(mapLenght)
getFile() returns a Map[String, String], from which I need the values passed to the feeder.
Is there a way to fill a feeder via an iteration over a Collection or a Map?
Thank you!

For a collection of Strings, you just need to convert it to a Map:
getValues.map(s => Map("postId" -> s)).toIterator
You now have an iterator that maps "postId" to each value of your collection.
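The same approach works for the Map case. As a minimal sketch, assuming getFile() returns the Map[String, String] you described and you want its values fed into postId:
val idPostFeeder = getFile().values
  .map(v => Map("postId" -> v))
  .toIterator
Use .keys instead of .values if it is the keys you need to feed.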

Related

Scala create immutable nested map

I have a situation here
I have two strings
val keyMap = "androidApp,key1;iosApp,key2;xyz,key3"
val tentMap = "androidApp,tenant1; iosApp,tenant1; xyz,tenant2"
So what I want is to create a nested immutable map like this
tenant1 -> (androidApp -> key1, iosApp -> key2),
tenant2 -> (xyz -> key3)
So basically I want to group by tenant and create a map of the keyMap entries.
Here is what I tried, but it uses a mutable map, which I don't want. Is there a way to create this using an immutable map?
case class TenantSetting() {
  val requesterKeyMapping = new mutable.HashMap[String, String]()
}
val requesterKeyMapping = keyMap.split(";")
  .map { keyValueList => keyValueList.split(',')
    .filter(_.size==2)
    .map(keyValuePair => (keyValuePair[0],keyValuePair[1]))
    .toMap
  }.flatten.toMap
val config = new mutable.HashMap[String, TenantSetting]
tentMap.split(";")
  .map { keyValueList => keyValueList.split(',')
    .filter(_.size==2)
    .map { keyValuePair =>
      val requester = keyValuePair[0]
      val tenant = keyValuePair[1]
      if (!config.contains(tenant)) config.put(tenant, new TenantSetting)
      config.get(tenant).get.requesterKeyMapping.put(requester, requesterKeyMapping.get(requester).get)
    }
  }
The logic to break the strings into a map can be the same for both as it's the same syntax.
What you had for the first string was not quite right: the filter was being applied to each string from the split result, not to the array result itself. That also showed in your use of [] on keyValuePair, which was of type String, not Array[String] as I think you were expecting. You also needed a trim to cope with the spaces in the second string; you might want to trim the key and value as well to avoid other whitespace issues.
Additionally, in this case the combination of map and filter can be done more succinctly with collect, as shown here:
How to convert an Array to a Tuple?
The use of the pattern with 2 elements ensures you filter out anything with length other than 2 as you wanted.
The iterator is to make the combination of map and collect more efficient by only requiring one iteration of the collection returned from the first split (see comments below).
With both strings turned into a map, it just needs the right use of groupBy to group the first map by the value of the second based on the same key to get what you wanted. Obviously this only works if the same key is always present in the second map.
def toMap(str: String): Map[String, String] =
  str
    .split(";")
    .iterator
    .map(_.trim.split(','))
    .collect { case Array(key, value) => (key.trim, value.trim) }
    .toMap
val keyMap = toMap("androidApp,key1;iosApp,key2;xyz,key3")
val tentMap = toMap("androidApp,tenant1; iosApp,tenant1; xyz,tenant2")
val finalMap = keyMap.groupBy { case (k, _) => tentMap(k) }
Printing out finalMap gives:
Map(tenant2 -> Map(xyz -> key3), tenant1 -> Map(androidApp -> key1, iosApp -> key2))
Which is what you wanted.

Extracting data from RDD in Scala/Spark

So I have a large dataset that is a sample of a stackoverflow userbase. One line from this dataset is as follows:
<row Id="42" Reputation="11849" CreationDate="2008-08-01T13:00:11.640" DisplayName="Coincoin" LastAccessDate="2014-01-18T20:32:32.443" WebsiteUrl="" Location="Montreal, Canada" AboutMe="A guy with the attention span of a dead goldfish who has been having a blast in the industry for more than 10 years.
Mostly specialized in game and graphics programming, from custom software 3D renderers to accelerated hardware pipeline programming." Views="648" UpVotes="337" DownVotes="40" Age="35" AccountId="33" />
I would like to extract the number from Reputation (in this case "11849") and the number from Age (in this example "35"), and I would like to have them as floats.
The file is located in HDFS, so it comes in as an RDD.
val linesWithAge = lines.filter(line => line.contains("Age=")) // this filters out data which doesn't have Age
val repSplit = linesWithAge.flatMap(line => line.split("\"")) // here I am trying to split the data where there is a "
So when I split it on quotation marks, the reputation is at index 3 and age at index 23. But how do I assign these to a map or a variable so I can use them as floats? Also, I need to do this for every line in the RDD.
EDIT:
val linesWithAge = lines.filter(line => line.contains("Age=")) //transformations from the original input data
val repSplit = linesWithAge.flatMap(line => line.split("\""))
val withIndex = repSplit.zipWithIndex
val indexKey = withIndex.map{case (k,v) => (v,k)}
val b = indexKey.lookup(3)
println(b)
So I added an index to the array, and now I've successfully managed to assign it to a variable, but I can only do it for one item in the RDD. Does anyone know how I could do it for all items?
What we want to do is to transform each element in the original dataset (represented as an RDD) into a tuple containing (Reputation, Age) as numeric values.
One possible approach is to transform each element of the RDD using String operations in order to extract the values of the elements "Age" and "Reputation", like this:
// define a function to extract the value of an element, given the name
def findElement(src: Array[String], name: String): Option[String] = {
  for {
    entry <- src.find(_.startsWith(name))
    value <- entry.split("\"").lift(1)
  } yield value
}
We then use that function to extract the interesting values from every record:
val reputationByAge = lines.flatMap { line =>
  val elements = line.split(" ")
  for {
    age <- findElement(elements, "Age")
    rep <- findElement(elements, "Reputation")
  } yield (rep.toInt, age.toInt)
}
Note how we don't need to filter on "Age" before doing this. If we process a record that does not have "Age" or "Reputation", findElement will return None, hence the result of the for-comprehension will be None and the record will be dropped by the flatMap operation.
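As a quick sanity check, here is a hypothetical REPL session using tokens hand-picked from the sample row:
val tokens = Array("Reputation=\"11849\"", "Age=\"35\"")
findElement(tokens, "Age")     // Some("35")
findElement(tokens, "Gender")  // None, so the for-comprehension yields None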
A better way to approach this problem is by realizing that we are dealing with structured XML data. Scala provides built-in support for XML, so we can do this:
import scala.xml.XML
// helper function to map Strings to Option, where empty strings become None
def emptyStrToNone(str: String): Option[String] = if (str.isEmpty) None else Some(str)
val xmlReputationByAge = lines.flatMap { line =>
  val record = XML.loadString(line)
  for {
    rep <- emptyStrToNone((record \ "@Reputation").text)
    age <- emptyStrToNone((record \ "@Age").text)
  } yield (rep.toInt, age.toInt)
}
This method relies on the structure of the XML record to extract the right attributes. As before, we use the combination of Option values and flatMap to remove records that do not contain all the information we require.
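For illustration, a minimal check against a trimmed-down version of the sample row (keeping only the two attributes of interest):
val row = XML.loadString("""<row Id="42" Reputation="11849" Age="35"/>""")
(row \ "@Reputation").text // "11849"
(row \ "@Age").text        // "35"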
First, you need a function which extracts the value for a given key from your line (getValueForKeyAs below), then do:
val rdd = linesWithAge.map(line => (getValueForKeyAs(line, "Age")(_.toFloat), getValueForKeyAs(line, "Reputation")(_.toFloat)))
This should give you an RDD of type RDD[(Float, Float)].
getValueForKeyAs could be implemented like this:
def getValueForKeyAs[A](line: String, key: String)(parse: String => A): A = {
  val res = line.split(key + "=")
  if (res.size == 1) throw new RuntimeException(s"no value for key $key")
  // the attribute value sits between the first pair of quotes after key=;
  // an explicit parse function is taken because a plain asInstanceOf cast
  // from String would fail at runtime for numeric types
  parse(res(1).split("\"")(1))
}
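A quick check (sampleRow here is hypothetical and stands for the <row .../> line shown in the question):
val rep = getValueForKeyAs(sampleRow, "Reputation")(_.toFloat) // 11849.0
val age = getValueForKeyAs(sampleRow, "Age")(_.toFloat)        // 35.0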

How to efficiently delete subset in spark RDD

When conducting research, I find it somewhat difficult to delete all the subsets in a Spark RDD.
The data structure is RDD[(key,set)]. For example, it could be:
RDD[ ("peter",Set(1,2,3)), ("mike",Set(1,3)), ("jack",Set(5)) ]
Since the set of mike (Set(1,3)) is a subset of peter's (Set(1,2,3)), I want to delete "mike", which will end up with
RDD[ ("peter",Set(1,2,3)), ("jack",Set(5)) ]
It is easy to implement locally in Python with two for loops. But when I want to scale out to the cloud with Scala and Spark, it is not that easy to find a good solution.
Thanks
I doubt we can avoid comparing each element to each other (the equivalent of a double loop in a non-distributed algorithm). The subset operation between sets is not symmetric, meaning that we need to check both whether "alice" is a subset of "bob" and whether "bob" is a subset of "alice".
To do this using the Spark API, we can resort to multiplying the data with itself using a cartesian product and verifying each entry of the resulting matrix:
val data = Seq(("peter",Set(1,2,3)), ("olga",Set(1,2,3)), ("mike",Set(1,3)), ("anne",Set(7)), ("jack",Set(5,4,1)), ("lizza",Set(5,1)), ("bart",Set(5,4)), ("maggie",Set(5)))
// expected result from this dataset = peter, olga, anne, jack
val userSet = sparkContext.parallelize(data)
val prod = userSet.cartesian(userSet)
val subsetMembers = prod.collect {
  case ((name1, set1), (name2, set2)) if (name1 != name2) && set2.subsetOf(set1) && (set1 -- set2).nonEmpty => (name2, set2)
}
val superset = userSet.subtract(subsetMembers)
// let's see the results:
superset.collect()
// Array[(String, scala.collection.immutable.Set[Int])] = Array((olga,Set(1, 2, 3)), (peter,Set(1, 2, 3)), (anne,Set(7)), (jack,Set(5, 4, 1)))
This can be achieved by using the RDD.fold function.
In this case the required output is a "List" (ItemList) of superset items. For this, the input should also be converted to a "List" (an RDD of ItemList).
import org.apache.spark.rdd.RDD

// type aliases for convenience
type Item = Tuple2[String, Set[Int]]
type ItemList = List[Item]

// source RDD
val lst: RDD[Item] = sc.parallelize(List(("peter", Set(1, 2, 3)), ("mike", Set(1, 3)), ("jack", Set(5))))

// Wrap each element in a List. This is needed for using the fold function
// on the RDD, since the data type of the parameters must be the same as the
// output data type of the fold function.
val listOflst: RDD[ItemList] = lst.map(x => List(x))

// for each element in the second ItemList:
// - if it is not a subset of any element in the first ItemList, prepend it
// - then remove any existing elements that are subsets of the newly added one
def combiner(first: ItemList, second: ItemList): ItemList = {
  def helper(lst: ItemList, i: Item): ItemList = {
    val isSubset: Boolean = lst.exists(x => i._2.subsetOf(x._2))
    if (isSubset) lst else i :: lst.filterNot(x => x._2.subsetOf(i._2))
  }
  second.foldLeft(first)(helper)
}

listOflst.fold(List())(combiner)
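For the sample RDD above this should evaluate to List((jack,Set(5)), (peter,Set(1, 2, 3))) (element order may vary), since mike's set is absorbed by peter's.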
You can use filter after a map.
You can build a map step that returns the line for what you want to keep and None for what you want to delete. First build a function:
def filterMike(line: (String, Set[Int])): Option[(String, Set[Int])] =
  if (line._2 != Set(1, 3)) Some(line) else None
Then you can filter like this:
your_rdd.map(filterMike).filter(_.isDefined).map(_.get)
This will work.
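(As a one-step alternative, not in the original answer: the same effect can be had with a single filter, your_rdd.filter { case (_, set) => set != Set(1, 3) }.)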

Creating a Map by reading elements of List in Scala

I have some records in a List.
Now I want to create a new Map (mutable Map) from that List, with a unique key for each record. I want to achieve this by reading the List and calling the higher-order method map in Scala.
records.txt is my input file
100,Surender,2015-01-27
100,Surender,2015-01-30
101,Raja,2015-02-19
Expected Output :
Map(0-> 100,Surender,2015-01-27, 1 -> 100,Surender,2015-01-30,2 ->101,Raja,2015-02-19)
Scala Code :
object SampleObject {
  def main(args: Array[String]) = {
    val mutableMap = scala.collection.mutable.Map[Int, String]()
    var i: Int = 0
    val myList = Source.fromFile("D:\\Scala_inputfiles\\records.txt").getLines().toList
    println(myList)
    val resultList = myList.map { x =>
      {
        mutableMap(i) = x.toString()
        i = i + 1
      }
    }
    println(mutableMap)
  }
}
But I am getting output like below
Map(1 -> 101,Raja,2015-02-19)
I want to understand why it is keeping the last record alone .
Could some one help me?
val mm: Map[Int, String] = Source.fromFile(filename).getLines
  .zipWithIndex
  .map({ case (line, i) => i -> line })(collection.breakOut)
Here (collection.breakOut) avoids the extra pass over the data that a trailing toMap would cause.
Consider
(for {
  (line, i) <- Source.fromFile(filename).getLines.zipWithIndex
} yield i -> line).toMap
where we read each line, associate an index value starting from zero and create a map out of each association.
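Either way, for the records.txt shown in the question this should produce the expected
Map(0 -> 100,Surender,2015-01-27, 1 -> 100,Surender,2015-01-30, 2 -> 101,Raja,2015-02-19)
(modulo map iteration order).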

Create Map in Scala using loop

I am trying to create a map after getting a result for each item in the list. Here is what I tried so far:
val sourceList: List[(Int, Int)] = ....
val resultMap: Map[Int, Int] = for (srcItem <- sourceList) {
  val result: Int = someFunction(srcItem._1)
  Map(srcItem._1 -> result)
}
But I am getting a type mismatch error in IntelliJ, and I am definitely not writing the proper syntax here. I don't think I can use yield, as I don't want a List of Maps. What is the correct way to create a Map using a for loop? Any suggestion?
The simplest way is to create the map out of a list of tuples:
val resultMap = sourceList.map(item => (item._1, someFunction(item._1))).toMap
Or, in the monadic way:
val listOfTuples = for {
  (value, _) <- sourceList
} yield (value, someFunction(value))
val resultMap = listOfTuples.toMap
Alternatively, if you want to avoid the creation of listOfTuples you can make the transformation a lazy one by calling .view on sourceList and then call toMap:
val resultMap = sourceList.view
  .map(item => (item._1, someFunction(item._1)))
  .toMap
Finally, if you really want to avoid generating extra objects, you can use a mutable Map instead and append the keys and values to it using += or .put:
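A minimal sketch of that mutable alternative (reusing sourceList and someFunction from the question):
import scala.collection.mutable

val resultMap = mutable.Map.empty[Int, Int]
for ((key, _) <- sourceList)
  resultMap += key -> someFunction(key)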