Is string concatenation in scala as costly as it is in Java? - scala

In Java, it's a common best practice to do string concatenation with StringBuilder due to the poor performance of appending strings using the + operator. Is the same practice recommended for Scala or has the language improved on how java performs its string concatenation?

Scala uses Java strings (java.lang.String), so its string concatenation is the same as Java's: the same thing is taking place in both. (The runtime is the same, after all.) There is a special StringBuilder class in Scala, that "provides an API compatible with java.lang.StringBuilder"; see http://www.scala-lang.org/api/2.7.5/scala/StringBuilder.html.
But in terms of "best practices", I think most people would generally consider it better to write simple, clear code than maximally efficient code, except when there's an actual performance problem or a good reason to expect one. The + operator doesn't really have "poor performance", it's just that s += "foo" is equivalent to s = s + "foo" (i.e. it creates a new String object), which means that, if you're doing a lot of concatenations to (what looks like) "a single string", you can avoid creating unnecessary objects — and repeatedly recopying earlier portions from one string to another — by using a StringBuilder instead of a String. Usually the difference is not important. (Of course, "simple, clear code" is slightly contradictory: using += is simpler, using StringBuilder is clearer. But still, the decision should usually be based on code-writing considerations rather than minor performance considerations.)

Scalas String concatenation works the same way as Javas does.
val x = 5
"a"+"b"+x+"c"
is translated to
new StringBuilder()).append("ab").append(BoxesRunTime.boxToInteger(x)).append("c").toString()
StringBuilder is scala.collection.mutable.StringBuilder. That's the reason why the value appended to the StringBuilder is boxed by the compiler.
You can check the behavior by decompile the bytecode with javap.

I want to add: if you have a sequence of strings, then there is already a method to create a new string out of them (all items, concatenated). It's called mkString.
Example: (http://ideone.com/QJhkAG)
val example = Seq("11111", "2222", "333", "444444")
val result = example.mkString
println(result) // prints "111112222333444444"

Scala uses java.lang.String as the type for strings, so it is subject to the same characteristics.

Related

Is it Scala style to use a for loop in Scala/Spark?

I have heard that it is a good practice in Scala to eliminate for loops and do things "the Scala way". I even found a Scala style checker at http://www.scalastyle.org. Are for loops a no-no in Scala? In a course at https://www.udemy.com/course/apache-spark-with-scala-hands-on-with-big-data/learn/lecture/5363798#overview I found this example, which makes me thing that for looks are okay to use, but using the Scala format and syntax of course, in a single line and not like the traditional Java for looks in multiple lines of code. See this example I found from that Udemy course:
val shipList = List("Enterprise", "Defiant", "Voyager", "Deep Space Nine")
for (ship <- shipList) {println(ship)}
That for loop prints this result, as expected:
Enterprise Defiant Voyager Deep Space Nine
I was wondering if using for as in the example above is acceptable Scala style code, or it if is a no-no and why. Thank you!
There is no problem in this for loop, but you can use functions form List object for your work in more functional way.
e.g. instead of using
val shipList = List("Enterprise", "Defiant", "Voyager", "Deep Space Nine")
for (ship <- shipList) {println(ship)}
You can use
val shipList = List("Enterprise", "Defiant", "Voyager", "Deep Space Nine")
shipList.foreach(element => println(element) )
or
shipList.foreach(println)
You can use for loops in Scala, there is no problem with that. But the difference is that this for-loop is not an expression and does not return a value, so you need to use a variable in order to return any value. Scala gives preference to work with immutable types.
In your example you print messages in the console, you need to perform a "side effect" to extract the value breaking the referencial transparency, I mean, you depend on the IO operation to extract a value, or you have mutate a variable which is in the scope which maybe is being accessed by another thread or another concurrent task thereby there is no guarantee that the value that you collect wont be what you are expecting. Obviously, all these hypothesis are related to concurrent/parallel programming and there is where Scala and the immutable style help.
To show the elements of a collection you can use a for loop, but if you want to count the total number of chars in Scala you do that using a expression like:
val chars = shipList.foldLeft(0)((a, b) => a + b.length)
To sum up, most of the times the Scala code that you will read uses immutable style of programming although not always because Scala supports the other way of coding too, but it is weird to find something using a classic Java OOP style, mutating object instances and using getters and setters.

Are these lines of scala code equivalent?

Given an object myObject which has a method getSomething which takes in a String parameter and returns a String
Are #1 and #2 equivalent?
val foo = myOjbect.getSomething("foo")
val foo = myOjbect getSomething "foo"
And are either acceptable / preferred over the other? When would you use 1 vs 2 and vica versa?
They are strictly equivalent.
Regarding your second question, Stack Overflow is not really meant to decide what is acceptable or preferred. Yet you can refer to the scala documentation on method invocation that states this:
Scala has a special punctuation-free syntax for invoking methods that take one argument. Many Scala programmers use this notation for symbolic-named methods:
// recommended
a + b
// legal, but less readable
a+b
// legal, but definitely strange
a.+(b)
but avoid it for almost all alphabetic-named methods:
// recommended
names.mkString(",")
// also sometimes seen; controversial
names mkString ","
Yes they are identical.
There is no consensus (I don't think it is possible to achieve one) which form are preferable (such questions are offtopic here).
Yes, they are identical.
I mostly use the 2nd version when developing tests like:
result mustEqual "Hello"

Merging a list of Strings using mkString vs foldRight

I am currently trying out things in Scala, trying to get accustomed to functional programming as well as leaning a new language again (it's been a while since last time).
Now given a list of strings if I want to merge them into one long string (e.g. "scala", "is", "fun" => "scalaisfun") I figured one way to do it would be to do a foldRight and apply concatenation on the respective elements. Another way, admittedly much simpler, is to call mkString.
I checked on github but couldn't really find the source code for the respective functions (any help on that would be appreciated), so I am not sure how the functions are implemented. From the top of my head, I think the mkString is more flexible but it feels that there might be a foldRight in the implementation somewhere. Is there any truth to it?
Otherwise the scaladocs mention that mkString calls on toString for each respective element. Seeing that they are already strings to start with, that could be one negative point for mkStringin this particular case. Any comments on the pros and cons of both methods, with respect to performance, simplicity/elegance etc?
Simple answer: use mkString.
someString.toString returns the same object.
mkString is implemented with a single StringBuilder and it creates only 1 new string. With foldLeft you'll create N-1 new strings.
You could use StringBuilder in foldLeft, it will be as fast as mkString, but mkString is shorter:
strings.foldLeft(new StringBuilder){ (sb, s) => sb append s }.toString
strings.mkString // same result, at least the same speed
Don't use foldRight unless you really need it, as it will overflow your stack for large collections (for some types of collections). foldLeft or fold will work (does not store intermediate data on the stack), but will be slower and more awkward than mkString. If the list is nonempty, reduce and reduceLeft will also work.
Im memory serves, mkString uses a StringBuilder to build the String which is efficient. You could accomplish the same thing using a Scala StringBuilder as the accumulator to foldRight, but why bother if mkString can already do all that good stuff for you. Plus mkString gives you the added benefit of also including an optional delimiter. You could do that in foldRight but it's already done for you with mkString

The case for point free style in Scala

This may seem really obvious to the FP cognoscenti here, but what is point free style in Scala good for? What would really sell me on the topic is an illustration that shows how point free style is significantly better in some dimension (e.g. performance, elegance, extensibility, maintainability) than code solving the same problem in non-point free style.
Quite simply, it's about being able to avoid specifying a name where none is needed, consider a trivial example:
List("a","b","c") foreach println
In this case, foreach is looking to accept String => Unit, a function that accepts a String and returns Unit (essentially, that there's no usable return and it works purely through side effect)
There's no need to bind a name here to each String instance that's passed to println. Arguably, it just makes the code more verbose to do so:
List("a","b","c") foreach {println(_)}
Or even
List("a","b","c") foreach {s => println(s)}
Personally, when I see code that isn't written in point-free style, I take it as an indicator that the bound name may be used twice, or that it has some significance in documenting the code. Likewise, I see point-free style as a sign that I can reason about the code more simply.
One appeal of point-free style in general is that without a bunch of "points" (values as opposed to functions) floating around, which must be repeated in several places to thread them through the computation, there are fewer opportunities to make a mistake, e.g. when typing a variable's name.
However, the advantages of point-free are quickly counterbalanced in Scala by its meagre ability to infer types, a fact which is exacerbated by point-free code because "points" serve as clues to the type inferencer. In Haskell, with its almost-complete type inferencing, this is usually not an issue.
I see no other advantage than "elegance": It's a little bit shorter, and may be more readable. It allows to reason about functions as entities, without going mentally a "level deeper" to function application, but of course you need getting used to it first.
I don't know any example where performance improves by using it (maybe it gets worse in cases where you end up with a function when a method would be sufficient).
Scala's point-free syntax is part of the magic Scala operators-which-are-really-functions. Even the most basic operators are functions:
For example:
val x = 1
val y = x + 1
...is the same as...
val x = 1
val y = x.+(1)
...but of course, the point-free style reads more naturally (the plus appears to be an operator).

Scala multiline string placeholder

This question is related to ( Why is there no string interpolation in Scala? ), but deals more specifically with multi-line strings.
I've just about bought into Martin's suggestion for simple string placeholder where
msg = "Hello {name}!"
can be be represented without much difference in Scala today like this:
msg = "Hello"+name+"!"
However, I don't think that approach holds with multi-line strings. And, in some cases it may be encouraging other poor practices in favor of readability. Note that in the Scala Play ANORM database mapping how the framework tries to preserve readability in plain SQL (using placeholders), but at the expense of duplicating the {countryCode} variable name and in a non-type-safe way, see...
.on("countryCode" -> "FRA")
SQL(
"""
select * from Country c
join CountryLanguage l on l.CountryCode = c.Code
where c.code = {countryCode};
"""
).on("countryCode" -> "FRA")
Additionally, assuming no change in Scala to address this, what would be the implication of using inline XML? How would performance, memory, etc. with something like:
val countryCode = "FRA"
SQL(<c>
select * from Country c
join CountryLanguage l on l.CountryCode = c.Code
where c.code = {countryCode};
</c>.text)
A scala.xml.Elem would be constructed which had the string contents represented as an ArrayBuffer, chopped up for every { } substitution. I'm certainly no authority but I believe what would happen is that there's a little extra overhead in construction the object and then getting the children and concatenating them together at runtime but, at least in this example, as soon as it's passed to the SQL function which then extracts the string it wants (or perhaps this would be done with an implicit) the Elem object would be discarded so there'd be a little extra memory usage, but only briefly.
But in the bigger picture, I don't think it's performance that would hinder the adoption of this solution but I guess a lot of people would be uncomfortable abusing XML in this way by using a made-up tag. The problem would be with other users reading the code later trying to figure out the semantic meaning of the tag... only to find there isn't one.
The example you give is almost certainly not doing string concatenation, it's creating parameterized SQL statements (probably via JDBC's PreparedStatement).
Ironically, the lack of easy string concatenation is probably slightly encouraging best practices in this case (although I certainly wouldn't use that as an argument either way on the topic).
If you are coming to this question from the future, multi-line string interpolation is now a thing.
val when = "now"
println(s"""this is $when a thing.""")
// this is now a thing