I'm really new to Scala and I'm not even able to concatenate Strings. Here is my code:
object RandomData {
  private[this] val bag = new scala.util.Random

  def apply(sensorId: String, stamp: Long, size: Int): String = {
    var cpt: Int = 0
    var data: String = "test"
    repeat(10) {
      data += "_test"
    }
    return data
  }
}
I got the error:
type mismatch;
found : Unit
required: com.excilys.ebi.gatling.core.structure.ChainBuilder
What am I doing wrong ??
repeat is offered by Gatling in order to repeat Gatling tasks, e.g., querying a website. If you have a look at the documentation (I wasn't able to find a link to the API doc of repeat), you'll see that repeat expects a chain, which is why your error message says "required: com.excilys.ebi.gatling.core.structure.ChainBuilder". However, all you do is append to a string, which does not produce a value of type ChainBuilder.
Moreover, appending to a string is not something that should be done via Gatling. It looks to me as if you are confusing Gatling's repeat with a Scala for loop. If you only want to append "_test" to data 10 times, use one of Scala's loops (for, while) or a functional approach with e.g. foldLeft. Here are a few examples:
/* Imperative style loop */
for (i <- 1 to 10) {
  data += "_test"
}
/* Functional style with lazy streams */
data += Stream.continually("_test").take(10).mkString("")
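And, for the foldLeft approach mentioned above, a minimal sketch (assuming data is still a var as in the question):
/* Functional style with foldLeft */
data = (1 to 10).foldLeft(data)((acc, _) => acc + "_test")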
Your problem is that the block
{
data += "_test"
}
evaluates to Unit, whereas the repeat method seems to want it to evaluate to a ChainBuilder.
Check out the documentation for the repeat method. I was unable to find it, but it's probably reasonable to assume that it looks something like
def repeat(numTimes: Int)(thunk: => ChainBuilder): Unit
I'm not sure if the repeat method does anything special, but with your usage, you could just use this block instead of the repeat(10){...}
for(i <- 1 to 10) data += "_test"
Also, as a side note, you don't need the return keyword with Scala. You can just say data instead of return data.
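Putting those suggestions together, a minimal sketch of how the method could look without Gatling's repeat (keeping the question's parameters, even though they are unused here):
object RandomData {
  def apply(sensorId: String, stamp: Long, size: Int): String = {
    var data: String = "test"
    for (i <- 1 to 10) data += "_test"
    data // the last expression is the returned value
  }
}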
I have a file where each row is a JSON array.
I am reading each line of the file, converting each row into a JSON array, and then converting each element of the array to a case class using spray-json.
I have this so far:
for (line <- source.getLines().take(10)) {
  val jsonArr = line.parseJson.convertTo[JsArray]
  for (ele <- jsonArr.elements) {
    val tryUser = Try(ele.convertTo[User])
  }
}
How could I convert this entire process into a single line statement?
val users: Seq[User] = source.getLines.take(10).map(line => line.parseJson.convertTo[JsonArray].elements.map(ele => Try(ele.convertTo[User])
The error is:
found : Iterator[Nothing]
Note: I used Scala 2.13.6 for all my examples.
There is a lot to unpack in these few lines of code. First of all, I'll share some code that we can use to generate some meaningful input to play around with.
object User {
  import scala.util.Random

  private def randomId: Int = Random.nextInt(900000) + 100000

  private def randomName: String = Iterator
    .continually(Random.nextInt(26) + 'a')
    .map(_.toChar)
    .take(6)
    .mkString

  def randomJson(): String = s"""{"id":$randomId,"name":"$randomName"}"""

  def randomJsonArray(size: Int): String =
    Iterator.continually(randomJson()).take(size).mkString("[", ",", "]")
}
final case class User(id: Int, name: String)
import scala.util.{Try, Success, Failure}
import spray.json._
import DefaultJsonProtocol._
implicit val UserFormat = jsonFormat2(User.apply)
This is just some scaffolding to define some User domain object and come up with a way to generate a JSON representation of an array of such objects so that we can then use a JSON library (spray-json in this case) to parse it back into what we want.
Now, going back to your question. This is a possible way to massage your data into its parsed representation. It may not fit 100% what you are trying to do, but there's some nuance in the data types involved and how they work:
val parsedUsers: Iterator[Try[User]] =
  for {
    line <- Iterator.continually(User.randomJsonArray(4)).take(10)
    element <- line.parseJson.convertTo[JsArray].elements
  } yield Try(element.convertTo[User])
First difference: notice that I use the for comprehension in a form in which the "outcome" of an iteration is not a side effect (for (something) { do something }) but an actual value (for (something) yield { a value }).
Second difference: I explicitly asked for an Iterator[Try[User]] rather than a Seq[User]. We could go down a very deep rabbit hole on why the types are what they are here, but the simple explanation is that a for ... yield expression:
returns the same type as its first generator -- if you start with a val ns: Iterator[Int]; for (n <- ns) ... you'll get an Iterator at the end
if you nest generators, they need to be of the same type as the "outermost" one
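A tiny sketch of that first point (my own illustration, not from the question):
val fromList = for (n <- List(1, 2, 3); m <- List(10, 20)) yield n * m
// fromList: List[Int] = List(10, 20, 20, 40, 30, 60), because the first generator is a List

val fromIter = for (n <- Iterator(1, 2, 3); m <- List(10, 20)) yield n * m
// fromIter: Iterator[Int], same elements, but the first generator makes it an Iterator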
You can read more on for comprehensions on the Tour of Scala and the Scala Book.
One possible way of consuming this is the following:
for (user <- parsedUsers) {
  user match {
    case Success(user) => println(s"parsed object $user")
    case Failure(error) => println(s"ERROR: '${error.getMessage}'")
  }
}
As for how to turn this into a "one liner": for comprehensions are syntactic sugar applied by the compiler, which turns every generator except the last into a flatMap call and the last one into a map, as in the following example (which yields a result equivalent to the for comprehension above and is very close to what the compiler does automatically):
val parsedUsers: Iterator[Try[User]] = Iterator
  .continually(User.randomJsonArray(4))
  .take(10)
  .flatMap(line =>
    line.parseJson
      .convertTo[JsArray]
      .elements
      .map(element => Try(element.convertTo[User]))
  )
One note that I would like to add is that you should be mindful of readability. Some teams prefer for comprehensions, others prefer rolling their own flatMap/map chains by hand. Coder's discretion is advised.
You can play around with this code here on Scastie (and here is the version with the flatMap/map calls).
I wrote the code below to find even numbers, together with the number just before each of them, in an RDD. In it I first convert the partition's iterator to a List and try to use my own function to find the even numbers and the numbers just before them. The following is my code; in it I make an empty list to which I try to append the numbers one by one.
object EvenandOdd
{
  def mydef(nums:Iterator[Int]):Iterator[Int]=
  {
    val mylist=nums.toList
    val len= mylist.size
    var elist=List()
    var i:Int=0
    var flag=0
    while(flag!=1)
    {
      if(mylist(i)%2==0)
      {
        elist.++=List(mylist(i))
        elist.++=List(mylist(i-1))
      }
      if(i==len-1)
      {
        flag=1
      }
      i=i+1
    }
  }

  def main(args:Array[String])
  {
    val myrdd=sc.parallelize(List(1,2,3,4,5,6,7,8,9,10),2)
    val myx=myrdd.mapPartitions(mydef)
    myx.collect
  }
}
I am not able to execute this code in the Scala shell or in Eclipse, and I am not able to figure out the error, as I am just a beginner in Scala.
The following are the errors I got in Scala Shell.
<console>:35: error: value ++= is not a member of List[Nothing]
elist.++=List(mylist(i))
^
<console>:36: error: value ++= is not a member of List[Nothing]
elist.++=List(mylist(i-1))
^
<console>:31: error: type mismatch;
found : Unit
required: Iterator[Int]
while(flag!=1)
^
Your code looks too complicated and not functional. It also introduces a potential memory problem: you take an Iterator as a parameter and return an Iterator as output. Knowing that an Iterator can be lazy and back a huge amount of data, materializing it inside the method as a List could cause an OOM. So your task is to pull only as much data from the initial iterator as is needed to answer the two methods of the new Iterator: hasNext and next.
For example (based on your implementation, which outputs duplicates in the case of a run of consecutive even numbers), it could be:
def mydef(nums: Iterator[Int]): Iterator[Int] = {
  var before: Option[Int] = None
  val helperIterator = new Iterator[(Option[Int], Int)] {
    override def hasNext: Boolean = nums.hasNext
    override def next(): (Option[Int], Int) = {
      val result = (before, nums.next())
      before = Some(result._2)
      result
    }
  }
  helperIterator.withFilter(_._2 % 2 == 0).flatMap {
    case (None, next) => Iterator(next)
    case (Some(prev), next) => Iterator(prev, next)
  }
}
Here you have two iterators. One is a helper, which just prepares the data by providing the previous element alongside each next one. The resulting iterator is built on the helper: it keeps only the pairs whose second element is even, and outputs both elements when required (or just the even one, if it is the first element of the sequence).
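A quick usage sketch (my own addition, reusing the sample numbers from the question):
mydef(Iterator(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)).toList
// List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10): every even number here is preceded by an odd one

mydef(Iterator(3, 8, 5, 1, 6)).toList
// List(3, 8, 1, 6): 8 is preceded by 3, 6 is preceded by 1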
For the initial code
In addition to @pedrorijo91's answer: in the initial code you also did not return anything (presumably you wanted to convert elist to an Iterator).
It will be easier if you use a functional coding style rather than an iterative coding style. In functional style the basic operation is straightforward.
Given a list of numbers, the following code will find all the even numbers and the values that precede them:
nums.sliding(2,1).filter(_(1) % 2 == 0)
The sliding operation creates an iterator over all pairs of adjacent values in the original list.
The filter operation takes only those pairs where the second value is even.
The result is an Iterator[List[Int]] where each List[Int] has two elements. You should be able to use this in your RDD framework.
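As a quick check on a plain List (not an RDD), using the sample input from the question:
val nums = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
nums.sliding(2, 1).filter(_(1) % 2 == 0).toList
// List(List(1, 2), List(3, 4), List(5, 6), List(7, 8), List(9, 10))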
It's marked part of the developer API, so there's no guarantee it'll stick around, but the RDDFunctions object actually defines sliding for RDDs. You will have to make sure it sees elements in the order you want.
But this becomes something like
rdd.sliding(2).filter(x => x(1) % 2 == 0) // pairs of (preceding number, even number)
For the first 2 errors:
there's no ++= method on List, which is why elist.++= does not compile. You will have to do list = list ++ element instead.
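A small sketch of both options (my own illustration; ListBuffer is just one possible mutable alternative):
var elist: List[Int] = List()   // give the list an element type; a bare List() infers List[Nothing]
elist = elist ++ List(4)        // reassign the var instead of calling ++= as a method

// or use a mutable buffer that supports in-place appends
import scala.collection.mutable.ListBuffer
val buf = ListBuffer[Int]()
buf += 4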
I've read a lot of code snippets in Scala that make use of the symbol =>, but I've never really been able to comprehend it. I've tried to search on the internet, but couldn't find anything comprehensive. Any pointers/explanation about how the symbol is/can be used will be really helpful.
(More specifically, I also want to know how the operator comes into the picture in function literals.)
More than passing values/names, => is used to define a function literal, which is an alternate syntax for defining a function.
Example time. Let's say you have a function that takes in another function. The collections are full of them, but we'll pick filter. filter, when used on a collection (like a List), will take out any element that causes the function you provide to return false.
val people = List("Bill Nye", "Mister Rogers", "Mohandas Karamchand Gandhi", "Jesus", "Superman", "The newspaper guy")
// Let's only grab people who have short names (less than 10 characters)
val shortNamedPeople = people.filter(<a function>)
We could pass in an actual function from somewhere else (def isShortName(name: String): Boolean, perhaps), but it would be nicer to just place it right there. Luckily, we can, with function literals.
val shortNamedPeople = people.filter( name => name.length < 10 )
What we did here is create a function that takes in a String (since people is of type List[String]), and returns a Boolean. Pretty cool, right?
This syntax is used in many contexts. Let's say you want to write a function that takes in another function. This other function should take in a String, and return an Int.
def myFunction(f: String => Int): Int = {
  val myString = "Hello!"
  f(myString)
}
// And let's use it. First way:
def anotherFunction(a: String): Int = {
  a.length
}
myFunction(anotherFunction)
// Second way:
myFunction((a: String) => a.length)
That's what function literals are. Going back to by-name and by-value, there's a trick where you can force a parameter to not be evaluated until you want to. The classic example:
def logger(message: String) = {
  if (loggingActivated) println(message)
}
This looks alright, but message is actually evaluated when logger is called, whether or not logging is active. What if message takes a while to evaluate? For example, logger(veryLongProcess()), where veryLongProcess() returns a String. That work is wasted whenever logging is off. We can use our knowledge about function literals to force veryLongProcess() not to be called until it is actually needed.
def logger(message: => String) = {
  if (loggingActivated) println(message)
}
logger(veryLongProcess()) // Fixed!
logger now takes message by name, effectively a function that takes no parameters (hence the naked => on the left side). You can still use it as before, but now message is only evaluated when it's used (in the println).
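A small demonstration of the difference, assuming a made-up slowString() helper (not part of the original question):
var loggingActivated = false
def slowString(): String = { println("evaluating..."); "hello" }

def eagerLogger(message: String) = { if (loggingActivated) println(message) }
def lazyLogger(message: => String) = { if (loggingActivated) println(message) }

eagerLogger(slowString())  // prints "evaluating..." even though nothing is logged
lazyLogger(slowString())   // prints nothing: message is never evaluated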
I came across this sentence explaining Scala's functional behavior.
operation of a program should map input of values to output values rather than change data in place
Could somebody explain it with a good example?
Edit: Please explain or give an example for the above sentence in its context; please do not make it more complicated and cause more confusion.
The most obvious pattern that this is referring to is the difference between how you would write code which uses collections in Java when compared with Scala. If you were writing Scala but in the idiom of Java, then you would be working with collections by mutating data in place. The idiomatic Scala code to do the same would favour the mapping of input values to output values.
Let's have a look at a few things you might want to do to a collection:
Filtering
In Java, if I have a List<Trade> and I am only interested in those trades executed with Deutsche Bank, I might do something like:
for (Iterator<Trade> it = trades.iterator(); it.hasNext();) {
  Trade t = it.next();
  if (t.getCounterparty() != DEUTSCHE_BANK) it.remove(); // MUTATION
}
Following this loop, my trades collection only contains the relevant trades. But, I have achieved this using mutation - a careless programmer could easily have missed that trades was an input parameter, an instance variable, or is used elsewhere in the method. As such, it is quite possible their code is now broken. Furthermore, such code is extremely brittle for refactoring for this same reason; a programmer wishing to refactor a piece of code must be very careful to not let mutated collections escape the scope in which they are intended to be used and, vice-versa, that they don't accidentally use an un-mutated collection where they should have used a mutated one.
Compare with Scala:
val db = trades filter (_.counterparty == DeutscheBank) //MAPPING INPUT TO OUTPUT
This creates a new collection! It doesn't affect anyone who is looking at trades and is inherently safer.
Mapping
Suppose I have a List<Trade> and I want to get a Set<Stock> for the unique stocks which I have been trading. Again, the idiom in Java is to create a collection and mutate it.
Set<Stock> stocks = new HashSet<Stock>();
for (Trade t : trades) stocks.add(t.getStock()); //MUTATION
Using Scala, the correct thing to do is to map the input collection and then convert it to a set:
val stocks = (trades map (_.stock)).toSet //MAPPING INPUT TO OUTPUT
Or, if we are concerned about performance:
(trades.view map (_.stock)).toSet
(trades.iterator map (_.stock)).toSet
What are the advantages here? Well:
My code can never observe a partially-constructed result
The application of a function A => B to a Coll[A] to get a Coll[B] is clearer.
Accumulating
Again, in Java the idiom has to be mutation. Suppose we are trying to sum the decimal quantities of the trades we have done:
BigDecimal sum = BigDecimal.ZERO;
for (Trade t : trades) {
  sum.add(t.getQuantity()); //MUTATION
}
Again, we must be very careful not to accidentally observe a partially-constructed result! In Scala, we can do this in a single expression:
val sum = (0 /: trades)(_ + _.quantity) //MAPPING INPUT TO OUTPUT
Or the various other forms:
trades.foldLeft(0)(_ + _.quantity)
(trades.iterator map (_.quantity)).sum
(trades.view map (_.quantity)).sum
Oh, by the way, there is a bug in the Java implementation! Did you spot it?
I'd say it's the difference between:
var counter = 0
def updateCounter(toAdd: Int): Unit = {
  counter += toAdd
}
updateCounter(8)
println(counter)
and:
val originalValue = 0
def addToValue(value: Int, toAdd: Int): Int = value + toAdd
val firstNewResult = addToValue(originalValue, 8)
println(firstNewResult)
This is a gross oversimplification, but fuller examples are things like using foldLeft to build up a result rather than doing the hard work yourself: foldLeft example
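For instance, a minimal foldLeft sketch (my own illustration, assuming a list of Ints):
val numbers = List(1, 2, 3, 4)
val total = numbers.foldLeft(0)((acc, n) => acc + n)  // total = 10, with no mutable accumulator in sight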
What it means is that if you write pure functions like this you always get the same output from the same input, and there are no side effects, which makes it easier to reason about your programs and ensure that they are correct.
So, for example, the function:
def times2(x:Int) = x*2
is pure, while
import scala.collection.mutable.MutableList // assuming Scala 2.12 or earlier; MutableList was dropped in 2.13

def add5ToList(xs: MutableList[Int]) {
  xs += 5
}
is impure because it edits data in place as a side effect. This is a problem because that same list could be in use elsewhere in the program, and now we can't guarantee the behaviour because it has changed.
A pure version would use immutable lists and return a new list
def add5ToList(xs: List[Int]) = {
  5 :: xs
}
There are plenty of examples with collections, which are easy to come by but might give the wrong impression. This concept works at all levels of the language (it doesn't at the VM level, however). One example is case classes. Consider these two alternatives:
// Java-style
class Person(initialName: String, initialAge: Int) {
  def this(initialName: String) = this(initialName, 0)

  private var name = initialName
  private var age = initialAge

  def getName = name
  def getAge = age

  def setName(newName: String) { name = newName }
  def setAge(newAge: Int) { age = newAge }
}
val employee = new Person("John")
employee.setAge(40) // we changed the object
// Scala-style
case class Person(name: String, age: Int) {
  def this(name: String) = this(name, 0)
}
val employee = new Person("John")
val employeeWithAge = employee.copy(age = 40) // employee still exists!
This concept is applied in the construction of the immutable collections themselves: a List never changes. Instead, new List objects are created when necessary. The use of persistent data structures reduces the copying that creating those new objects would otherwise require.
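A tiny illustration of that persistence (my own sketch):
val rest = List(2, 3)
val full = 1 :: rest   // full is List(1, 2, 3); rest is untouched and is reused as full's tail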
Perhaps this is just my background in more imperative programming, but I like having return statements in my code.
I understand that in Scala, returns are not necessary in many methods, because the last computed value is returned by default. I understand that this makes perfect sense for a "one-liner", e.g.
def square(x: Int) = x * x
I also understand the definitive case for using explicit returns (when you have several branches your code could take, and you want to break out of the method for different branches, e.g. if an error occurs). But what about multiline functions? Wouldn't it be more readable and make more sense if there was an explicit return, e.g.
def average(x: List[Int]): Float = {
  var sum = 0
  x.foreach(sum += _)
  return sum / x.length.toFloat
}
def average(x: List[Int]): Float =
  x.foldLeft(0)(_ + _) / x.length.toFloat
UPDATE: While I was aiming to show how iterative code can be turned into a functional expression, @soc rightly commented that an even shorter version is x.sum / x.length.toFloat
I usually find that in Scala I have less need to "return in the middle". Furthermore, a large function can be broken into smaller expressions that are easier to reason about. So instead of
var x = ...
if (some condition) x = ...
else x = ...
I would write
val x = if (some condition) ...
        else ...
A similar thing happens with match expressions. And you can always have helper nested classes.
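For example, a small sketch of a match used as an expression (my own illustration):
val someInt = 42
val label = someInt match {
  case n if n < 0 => "negative"
  case 0          => "zero"
  case _          => "positive"
}
// label: String = "positive", with no var assignment and no return needed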
Once you are comfortable with many forms of expressions that evaluate to a result (e.g., 'if') without a return statement, then having one in your methods looks out of place.
At one place of work we had a rule of not having 'return' in the middle of a method since it is easy for a reader of your code to miss it. If 'return' is only at the last line, what's the point of having it at all?
The return doesn't tell you anything extra, so I find it actually clutters my understanding of what goes on. Of course it returns; there are no other statements!
And it's also a good idea to get away from the habit of using return because it's really unclear where you are returning to when you're using heavily functional code:
def manyFutures = xs.map(x => Futures.future { if (x<5) return Nil else x :: Nil })
Where should the return leave execution? Outside of manyFutures? Just the inner block?
So although you can use explicit returns (at least if you annotate the return type explicitly), I suggest that you try to get used to the last-statement-is-the-return-value custom instead.
Your sense that the version with the return is more readable is probably a sign that you are still thinking imperatively rather than declaratively. Try to think of your function from the perspective of what it is (i.e. how it is defined), rather than what it does (i.e. how it is computed).
Let's look at your average example, and pretend that the standard library doesn't have the features that make an elegant one-liner possible. Now in English, we would say that the average of a list of numbers is the sum of the numbers in the list divided by its length.
def average(xs: List[Int]): Float = {
  var sum = 0
  xs.foreach(sum += _)
  sum / xs.length.toFloat
}
This description is fairly close to the English version, ignoring the first two lines of the function. We can factor out the sum function to get a more declarative description:
def average(xs: List[Int]): Float = {
  def sum(xs: List[Int]): Int = // ...
  sum(xs) / xs.length.toFloat
}
In the declarative definition, a return statement reads quite unnaturally because it explicitly puts the focus on doing something.
I don't miss the return-statement at the end of a method because it is unnecessary. Furthermore, in Scala each method does return a value - even methods which should not return something. Instead of void there is the type Unit which is returned:
def x { println("hello") }
can be written as:
def x { println("hello"); return Unit }
But println already returns Unit, so if you want to use return statements consistently, you would have to use them in methods which return Unit, too; otherwise your code is not written in one uniform style.
But in Scala you also have the possibility to break your code into a lot of small methods:
def x(...): ReturnType = {
  def func1 = ... // use funcX
  def func2 = ... // use funcX
  def func3 = ... // use funcX
  funcX
}
An explicitly used return-statement does not help to understand the code better.
Also Scala has a powerful core library which allows you to solve many problems with less code:
def average(x: List[Int]): Float = x.sum.toFloat / x.length
I believe you should not use return; it somehow goes against the elegant concept that in Scala everything is an expression that evaluates to something.
Like your average method: it's just an expression that evaluates to Float, no need to return anything to make that work.