Using a CSV feeder to check a count, i.e. convert to Int - Scala

I am using Gatling 2.0.0-SNAPSHOT (from a few months ago) and I have a CSV file:
search,min_results
test,1000
testing,50
...
I would, ideally, like to check that the number of results is greater than or equal to the min_results column, something like:
.get(...)
.check(
jsonPath("$..results")
.count
.is(>= "${min_results}".toInt)
The last line doesn't work, as it probably isn't valid Scala; but "${min_results}".toInt also tries to convert the literal string ${min_results}, rather than its resolved value, to an Int.
I would settle for just fixing the toInt conversion problem, even though this might result in a slightly less robust script. I could then also add a max_results column and use .in(xx to yy).
Thanks

In this case, toInt definitely won't work, as it would be called before the Gatling EL string has been resolved.
However, this code snippet would achieve what you want:
.get(...)
.check(
jsonPath("$...results")
.count
.greaterThanOrEqual(session => session("min_results").as[String].toInt))
Here, using as[String].toInt works as expected, since the min_results attribute has already been resolved from the session at this point.
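For context, here is a minimal sketch of how that check might sit in a complete simulation; the feeder file name, base URL and request path are illustrative assumptions, not taken from the question:
import io.gatling.core.Predef._
import io.gatling.http.Predef._

class SearchSimulation extends Simulation {
  // hypothetical base URL for the service under test
  val httpConf = http.baseURL("http://example.com")

  val scn = scenario("search with minimum result count")
    .feed(csv("search.csv")) // supplies the search and min_results columns
    .exec(
      http("search")
        .get("/search?q=${search}")
        .check(
          jsonPath("$..results")
            .count
            // min_results is resolved from the session before the conversion
            .greaterThanOrEqual(session => session("min_results").as[String].toInt)))

  setUp(scn.inject(atOnceUsers(1)).protocols(httpConf))
}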

Related

Removing Data Type From Tuple When Printing In Scala

I currently have two maps: -
mapBuffer = Map[String, ListBuffer[(Int, String, Float)]]
personalMapBuffer = Map[mapBuffer, String]
The idea of what I'm trying to do is create a list of something, and then allow a user to create a personalised list which includes a comment, so they'd have their own list of maps.
I am simply trying to print information; everything above works as expected.
To print the Key from mapBuffer, I use: -
mapBuffer.foreach(line => println(line._1))
This returns: -
Sample String 1
Sample String 2
To print the same thing from personalMapBuffer, I am using: -
personalMapBuffer.foreach(line => println(line._1.map(_._1)))
However, this returns: -
List(Sample String 1)
List(Sample String 2)
I obviously would like it to just return "Sample String" and remove the List() aspect. I'm assuming this has something to do with the .map function, although this was the only way I could find to access a tuple within a tuple. Is there a simple way to remove the data type? I was hoping for something simple like: -
line._1.map(_._1).removeDataType
But obviously no such pre-built function exists. I'm very new to Scala, so this might be something extremely simple (which I hope it is haha) or it could be a bit more complex. Any help would be great.
Thanks.
What you see is the default List.toString behaviour. You can build your own string with the mkString operation:
val separator = ","
personalMapBuffer.foreach(line => println(line._1.map(_._1).mkString(separator)))
which will produce the desired result of Sample String 1, or Sample String 1, Sample String 2 if there are two strings.
Hope this helps!
I have found a way to get the result I was looking for, however I'm not sure if it's the best way.
The .map() method just returns a collection. You can see more info on that here: https://www.geeksforgeeks.org/scala-map-method/
By using any sort of specific element finder at the end, I'm able to return only the element and not the data type. For example: -
line._1.map(_._1).head
As I was writing this, Ivan Kurchenko replied above suggesting I use .mkString. This also works and looks a little bit better than .head in my mind.
line._1.map(_._1).mkString("")
Again, I'm not 100% sure this is the most efficient way, but if it is necessary for something, this approach has worked for me for now.
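To make the difference concrete, here is a small self-contained sketch; the sample data is made up purely for illustration:
import scala.collection.mutable.{ListBuffer, Map}

val mapBuffer = Map(
  "Sample String 1" -> ListBuffer((1, "a", 1.0f)),
  "Sample String 2" -> ListBuffer((2, "b", 2.0f)))
val personalMapBuffer = Map(mapBuffer -> "my comment")

personalMapBuffer.foreach { line =>
  // line._1.map(_._1) is a collection of the inner map's keys;
  // mkString joins its elements, dropping the List(...) wrapper
  println(line._1.map(_._1).mkString(", "))
}
// prints something like: Sample String 1, Sample String 2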

How to convert "like each" into a functional form?

Let's say I have a column of a table whose data type is a character array. I want to pass in a functional select where clause, where the column is in a list of given strings. However, I cannot simply use (in; `col; myList) for reasons. Instead, I need to do the equivalent of:
max col like/: myList
which effectively gives the same result. However, I have tried to put this in functional form
(max; (like/:; `col; myList))
And I am getting a type error. Any ideas on how I could make this work?
A nice trick when dealing with this problem is using parse on a string of the select statement you want to functionalize. For example:
q)parse"select from t where max col like/: myList"
?
`t
,,(max;((/:;like);`col;`myList))
0b
()
Or, specifically in your case, you want the third element of the result list (the where clause):
q)(parse"select from t where max col like/: myList")2
max ((/:;like);`col;`myList)
I even think using this pattern in your actual code can be a good idea, as functionalized statements like max ((/:;like);`col;`myList) can get pretty unreadable pretty quickly!
Hope that helps!
An alternative using any, joining enlist onto myList so that the list's value (rather than a reference to the variable name) is embedded in the parse tree:
(any; ((/:;like); `col; enlist,myList))
It should be: (max;((/:;like);`col;`myList))

How do I print the contents of an Apache Spark RDD in my terminal?

This is my first time using Scala and Apache Spark for a project. I'm trying to print the contents of a matrix when I run my code in the terminal, but nothing I've tried has worked so far.
Instead I only get this printed:
[Lorg.apache.spark.mllib.linalg.distributed.MatrixEntry;@71870da7
org.apache.spark.mllib.linalg.distributed.CoordinateMatrix@1dcca8d3
I'm just using println(), but even when I use collect(), that doesn't give a good result either.
The default toString prints the name of the class followed by its hexadecimal hash code:
[Lorg.apache.spark.mllib.linalg.distributed.MatrixEntry;@71870da7
You're going to want to find a way to iterate through your matrix and print each element.
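For a CoordinateMatrix specifically, a minimal sketch (assuming mat is the CoordinateMatrix from your code) is to pull a bounded sample of its entries onto the driver and print each one explicitly:
// entries is an RDD[MatrixEntry]; take(10) bounds how much data
// is brought back to the driver before printing
mat.entries.take(10).foreach { e =>
  println(s"(${e.i}, ${e.j}) -> ${e.value}")
}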
Building on @zero323's comment (as an aside: would you like to put an answer out there?): given an RDD[SomeType], you can call
rdd.collect()
or
rdd.take(k)
Then you can print out the results using the normal toString() methods, which depend on the type of the RDD's contents. So if SomeType were a List[Double], then
println(rdd.collect().mkString(","))
would give you a single-line, comma-separated output of the results.
As @zero323 also noted, another consideration is: do you really want to print out the contents of your RDD? More likely you only want a summary, such as
println(s"Number of entries in RDD is ${rdd.count()}")
Iterate over the RDD like this (note that on a cluster, foreach prints to the executors' stdout, not the driver's):
rdd.foreach(println)
scala> val rdd1 = sc.parallelize(List(1, 2, 3, 4)).map(_ * 2)
To print the data within the RDD:
scala> rdd1.collect().foreach(println)
Output:
2
4
6
8

Parsing options that take more than one value with scopt in Scala

I am using scopt to parse command-line arguments in Scala. I want it to be able to parse options with more than one value. For instance, the range option, if specified, should take exactly two values.
--range 25 45
Coming from a Python background, I am basically looking for a way to do the following with scopt instead of Python's argparse:
parser.add_argument("--range", default=None, nargs=2, type=float,
metavar=('start', 'end'),
help=(" Foo bar start and stop "))
I don't think minOccurs and maxOccurs solve my problem exactly, nor does the key:value example in its help.
Looking at the source code, this is not possible. The Read type class used has a member tuplesToRead, but it doesn't seem to work when you force it to 2 instead of 1. You will have to make a feature request, I guess, or work around this by using --min 25 --max 45, or --range '25 45' with a custom Read instance that splits the string into two parts (a sketch of that workaround follows). As @roterl noted, this is not a standard way of parsing.
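For example, here is a sketch of that custom Read workaround, assuming scopt 3.x; the Interval case class and the option wiring are illustrative, not part of scopt:
// illustrative container for the two values
case class Interval(start: Double, end: Double)

// a custom Read that splits the single quoted argument "25 45" in two
implicit val intervalRead: scopt.Read[Interval] =
  scopt.Read.reads { s =>
    val Array(start, end) = s.trim.split("\\s+")
    Interval(start.toDouble, end.toDouble)
  }

case class Config(range: Option[Interval] = None)

val parser = new scopt.OptionParser[Config]("app") {
  opt[Interval]("range")
    .valueName("'start end'")
    .action((r, c) => c.copy(range = Some(r)))
    .text("start and end of the range, quoted as a single argument")
}
// usage on the command line: --range '25 45'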
It should be OK as long as your values are delimited by something other than a space...
--range 25-45
... although you need to split them manually. Parse it with something like:
opt[String]('r', "range").action { (x, c) =>
  // match strings like "25-45" and capture both endpoints
  val rx = "([0-9]+)-([0-9]+)".r
  val rx(from, to) = x
  c.copy(from = from.toInt, to = to.toInt)
}
// ...
println(s" Got range ${parsedArgs.from}..${parsedArgs.to}")

DataFrame keying using the pandas groupby method

I'm new to pandas and trying to learn how to work with it. I'm having a problem when trying to use, on my own data, an example I saw in one of Wes's videos and notebooks. I have a CSV file that looks like this:
filePath,vp,score
E:\Audio\7168965711_5601_4.wav,Cust_9709495726,-2
E:\Audio\7168965711_5601_4.wav,Cust_9708568031,-80
E:\Audio\7168965711_5601_4.wav,Cust_9702445777,-2
E:\Audio\7168965711_5601_4.wav,Cust_7023544759,-35
E:\Audio\7168965711_5601_4.wav,Cust_9702229339,-77
E:\Audio\7168965711_5601_4.wav,Cust_9513243289,25
E:\Audio\7168965711_5601_4.wav,Cust_2102513187,18
E:\Audio\7168965711_5601_4.wav,Cust_6625625104,-56
E:\Audio\7168965711_5601_4.wav,Cust_6073165338,-40
E:\Audio\7168965711_5601_4.wav,Cust_5105831247,-30
E:\Audio\7168965711_5601_4.wav,Cust_9513082770,-55
E:\Audio\7168965711_5601_4.wav,Cust_5753907026,-79
E:\Audio\7168965711_5601_4.wav,Cust_7403410322,11
E:\Audio\7168965711_5601_4.wav,Cust_4062144116,-70
I load it into a DataFrame and then group it by "filePath" and "vp"; the code is:
res = df.groupby(['filePath','vp']).size()
res.index
and the output is:
[E:\Audio\7168965711_5601_4.wav Cust_2102513187,
Cust_4062144116, Cust_5105831247,
Cust_5753907026, Cust_6073165338,
Cust_6625625104, Cust_7023544759,
Cust_7403410322, Cust_9513082770,
Cust_9513243289, Cust_9702229339,
Cust_9702445777, Cust_9708568031,
Cust_9709495726]
Now I'm trying to access the index like a dict, as I saw in examples, but when I do
res['Cust_4062144116']
I get an error:
KeyError: 'Cust_4062144116'
I do succeed in getting a result when I put in the filePath, but as I understand it and saw in previous examples, I should be able to use the vp keys as well, isn't that so?
Sorry if it's a trivial one; I just can't understand why it works in one example but not in the other.
Rutger, you are not correct. It is possible to partially index a MultiIndex Series; I simply did it the wrong way.
The index's first level is the file name (e.g. E:\Audio\7168965711_5601_4.wav above) and the second level is vp. Meaning, for each file name, I have multiple vps.
Now, this is correct:
res['E:\Audio\7168965711_5601_4.wav']
and will return:
Cust_2102513187 2
Cust_4062144116 8
....
but trying to index by the inner index (the Cust_ indexes) will fail.
You group by two columns and therefore get a MultiIndex in return. This means you also have to slice using those two columns, not a single index value.
Your .size() on the groupby object converts the result into a Series. If you force it into a DataFrame, you can use the .xs method to slice a single level:
res = pd.DataFrame(df.groupby(['filePath','vp']).size())
res.xs('Cust_4062144116', level=1)
That works. If you want to keep it as a Series, boolean indexing can help; something like:
res[res.index.get_level_values(1) == 'Cust_4062144116']
The last option is a bit less readable but sometimes more flexible; you could test for multiple values at once, for example:
res[res.index.get_level_values(1).isin(['Cust_4062144116', 'Cust_6073165338'])]