Is there a substring proxy in scala? - scala

Working with strings as String objects is convenient for many string-processing tasks.
I need to extract some substrings for processing, and Scala's String class provides that functionality. But it is rather expensive: a new String object is created every time substring is called. Using tuples (string: String, start: Int, stop: Int) solves the performance problem but makes the code much more complicated.
Is there a library for creating string proxies that store the original string and the range bounds and are compatible with other string functions?

Java 7u6 and later now implement #substring as a copy, not a view, making this answer obsolete.
If you're running your Scala program on the Sun/Oracle JVM, you shouldn't need to perform this optimization, because java.lang.String already does it for you.
A string is stored as a reference to a char array, together with an offset and a length. Substrings share the same underlying array, but with a different offset and/or length.

Look at the implementation of String (in particular substring(int beginIndex, int endIndex)): it's already represented as you wish.
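
Since substring copies on newer JVMs (see the note at the top of this answer), one possible way to get a non-copying view is the standard java.nio.CharBuffer.wrap(CharSequence, int, int), which returns a read-only buffer over the original string. The following is only a sketch: the names SubstringView and view are made up for illustration, and the result is a CharSequence rather than a String, so APIs that insist on String still need toString (which copies).

```scala
import java.nio.CharBuffer

// Hypothetical helper, not part of any library.
object SubstringView {
  // Returns a read-only view of s[start, end) without copying the characters.
  def view(s: String, start: Int, end: Int): CharSequence =
    CharBuffer.wrap(s, start, end)
}
```

For example, SubstringView.view("hello world", 6, 11) behaves like the sequence "world": length() is 5 and charAt(0) is 'w'.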

Related

Most common string in list with counter

I am trying to write a function that
takes a List read from a file as input
outputs the most frequently used string as well as an integer that shows the number of times that it was used.
example output:
("Cat",5)
function signature:
def mostFreq(info: List[List[String]]): (String, Int) =
First, I thought about creating a
Map variable and a counter variable
iterating over my list to fill the map
then iterate over the map
However, there must be a simpler way to do this in Scala, but I'm not used to Scala's library just yet.
I have seen this as one way to do it that uses only integers.
Finding the most frequent/common element in a collection?
But I was wondering how it could be done using string and integers.
The solution from the linked post has just about everything you need for this.
def mostFreq(info: List[List[String]]): (String, Int) =
info.flatten.groupBy(identity).mapValues(_.size).maxBy(_._2)
It doesn't handle ties terribly well, but you haven't stated how ties should be handled.
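
If ties or empty input matter, a variant along the following lines might help. It is a sketch assuming Scala 2.13 or later (for groupMapReduce and maxByOption); the object name MostFreq and the method mostFreqAll are invented here, and returning an Option instead of a bare tuple is a deliberate change from the signature in the question.

```scala
object MostFreq {
  // Count occurrences of every string across all inner lists.
  private def counts(info: List[List[String]]): Map[String, Int] =
    info.flatten.groupMapReduce(identity)(_ => 1)(_ + _)

  // One most frequent string with its count, or None for empty input.
  def mostFreq(info: List[List[String]]): Option[(String, Int)] =
    counts(info).maxByOption(_._2)

  // Every string tied for the highest count, sorted by name.
  def mostFreqAll(info: List[List[String]]): List[(String, Int)] = {
    val c = counts(info)
    if (c.isEmpty) Nil
    else {
      val top = c.values.max
      c.filter(_._2 == top).toList.sortBy(_._1)
    }
  }
}
```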

Working with opaque types (Char and Long)

I'm trying to export a Scala implementation of an algorithm for use in JavaScript. I'm using @JSExport. The algorithm works with Scala Char and Long values, which are marked as opaque in the interoperability guide.
I'd like to know (a) what this means; and (b) what the recommendation is for dealing with this.
I presume it means I should avoid Char and Long and work with String plus a run-time check on length (or perhaps use a shapeless Sized collection) and Int instead.
But other ideas welcome.
More detail...
The kind of code I'm looking at is:
import scala.scalajs.js.annotation.JSExport

@JSExport("Foo")
class Foo(val x: Int) {
  @JSExport("add")
  def add(n: Int): Int = x + n
}
...which works just as expected: new Foo(1).add(2) produces 3.
Replacing the types with Long the same call reports:
java.lang.ClassCastException: 1 is not an instance of scala.scalajs.runtime.RuntimeLong (and something similar with methods that take and return Char).
Being opaque means that
There is no corresponding JavaScript type
There is no way to create a value of that type from JavaScript (except if there is an @JSExported constructor)
There is no way of manipulating a value of that type (other than calling @JSExported methods and fields)
It is still possible to receive a value of that type from Scala.js code, pass it around, and give it back to Scala.js code. It is also always possible to call .toString(), because java.lang.Object.toString() is @JSExported. Besides toString(), neither Char nor Long export anything, so you can't do anything else with them.
Hence, as you have experienced, a JavaScript 1 cannot be used as a Scala.js Long, because it's not of the right type. Neither is 'a' a valid Char (but it's a valid String).
Therefore, as you have inferred yourself, you must indeed avoid opaque types, and use other types instead if you need to create/manipulate them from JavaScript. The Scala.js side can convert back and forth using the standard tools in the language, such as someChar.toInt and someInt.toChar.
The choice of which type is best depends on your application. For Char, it could be Int or String. For Long, it could be String, a pair of Ints, or possibly even Double if the possible values never use more than 52 bits of precision.
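
The conversions mentioned above could look roughly like this on the Scala side. This is plain Scala with no Scala.js dependency, so it only illustrates the type conversions, not the export itself; the object name OpaqueBoundary and all method names are hypothetical.

```scala
// Hypothetical helpers for converting at the JavaScript boundary.
object OpaqueBoundary {
  // Char <-> Int (UTF-16 code unit)
  def charToInt(c: Char): Int = c.toInt
  def intToChar(i: Int): Char = i.toChar

  // One-character String -> Char, with a run-time length check
  def stringToChar(s: String): Char = {
    require(s.length == 1, "expected a single-character string")
    s.charAt(0)
  }

  // Long <-> decimal String
  def longToString(l: Long): String = l.toString
  def stringToLong(s: String): Long = s.toLong

  // Long <-> (high 32 bits, low 32 bits)
  def longToInts(l: Long): (Int, Int) = ((l >>> 32).toInt, l.toInt)
  def intsToLong(hi: Int, lo: Int): Long =
    (hi.toLong << 32) | (lo.toLong & 0xFFFFFFFFL)
}
```

An exported method would then take and return Int or String, and call helpers like these internally to get back to Char and Long.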

Is there any simple way to extract multiple values from Guava's HashCode?

With Guava, hashing can be as simple as
byte[] byteHash = Hashing.md5().hashBytes(aByteArray).asBytes();
but seemingly only as long as all you want is a byte[] (possibly converted to a hex string), or a single int or long. But in one place I need two longs and in another I need five ints from SHA-1.
I can see some solutions like reading from new DataInputStream(new ByteArrayInputStream(byteHash)), using a ByteBuffer, or converting manually from the byte[]. However, all of them are extremely ugly (e.g. swallowing an impossible IOException) and long (and also inefficient, but this doesn't bother me here).
So is there any simple way to extract multiple (non-byte) values from Guava's HashCode?
There's nothing built in to HashCode for this, no.
Doing what you need with ByteBuffer seems really easy though, and neither long nor especially inefficient:
ByteBuffer buf = ByteBuffer.wrap(byteHash);
long l1 = buf.getLong();
long l2 = buf.getLong();
(I suppose an asReadOnlyByteBuffer() method could avoid the need for cloning a byte array, but I don't know if that's really necessary.)
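
The same ByteBuffer idea can be wrapped into a small helper that works for any number of values. The sketch below is written in Scala and takes the raw byte array (for instance the result of asBytes()); HashParts, ints and longs are names made up for this example. It relies on ByteBuffer's default big-endian byte order and ignores trailing bytes that do not fill a whole value.

```scala
import java.nio.ByteBuffer

// Hypothetical helper, not part of Guava.
object HashParts {
  // Splits the bytes into consecutive big-endian 32-bit values.
  def ints(hash: Array[Byte]): Vector[Int] = {
    val buf = ByteBuffer.wrap(hash)
    Vector.tabulate(hash.length / 4)(i => buf.getInt(i * 4))
  }

  // Splits the bytes into consecutive big-endian 64-bit values.
  def longs(hash: Array[Byte]): Vector[Long] = {
    val buf = ByteBuffer.wrap(hash)
    Vector.tabulate(hash.length / 8)(i => buf.getLong(i * 8))
  }
}
```

A 20-byte SHA-1 digest gives five Ints from HashParts.ints, and a 16-byte MD5 digest gives two Longs from HashParts.longs.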

Is string concatenation in scala as costly as it is in Java?

In Java, it's a common best practice to do string concatenation with StringBuilder due to the poor performance of appending strings using the + operator. Is the same practice recommended for Scala or has the language improved on how java performs its string concatenation?
Scala uses Java strings (java.lang.String), so its string concatenation is the same as Java's: the same thing is taking place in both. (The runtime is the same, after all.) There is a special StringBuilder class in Scala, that "provides an API compatible with java.lang.StringBuilder"; see http://www.scala-lang.org/api/2.7.5/scala/StringBuilder.html.
But in terms of "best practices", I think most people would generally consider it better to write simple, clear code than maximally efficient code, except when there's an actual performance problem or a good reason to expect one. The + operator doesn't really have "poor performance", it's just that s += "foo" is equivalent to s = s + "foo" (i.e. it creates a new String object), which means that, if you're doing a lot of concatenations to (what looks like) "a single string", you can avoid creating unnecessary objects — and repeatedly recopying earlier portions from one string to another — by using a StringBuilder instead of a String. Usually the difference is not important. (Of course, "simple, clear code" is slightly contradictory: using += is simpler, using StringBuilder is clearer. But still, the decision should usually be based on code-writing considerations rather than minor performance considerations.)
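
To make the difference concrete, here is a small sketch of the two styles side by side. ConcatDemo and its method names are invented for the example; both methods return the same string, and the only difference is how many intermediate String objects are created along the way.

```scala
object ConcatDemo {
  // Each += builds a new String and copies everything accumulated so far.
  def joinWithPlus(parts: Seq[String]): String = {
    var s = ""
    for (p <- parts) s += p
    s
  }

  // A single StringBuilder grows in place; one String is created at the end.
  def joinWithBuilder(parts: Seq[String]): String = {
    val sb = new StringBuilder
    for (p <- parts) sb.append(p)
    sb.toString
  }
}
```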
Scala's String concatenation works the same way as Java's does.
val x = 5
"a"+"b"+x+"c"
is translated to
(new StringBuilder()).append("ab").append(BoxesRunTime.boxToInteger(x)).append("c").toString()
StringBuilder is scala.collection.mutable.StringBuilder. That's the reason why the value appended to the StringBuilder is boxed by the compiler.
You can check the behavior by decompiling the bytecode with javap.
I want to add: if you have a sequence of strings, then there is already a method to create a new string out of them (all items, concatenated). It's called mkString.
Example: (http://ideone.com/QJhkAG)
val example = Seq("11111", "2222", "333", "444444")
val result = example.mkString
println(result) // prints "111112222333444444"
Scala uses java.lang.String as the type for strings, so it is subject to the same characteristics.

In Scala, is it possible to simultaneously extend a library and have a default conversion?

For example in the following article
http://www.artima.com/weblogs/viewpost.jsp?thread=179766
Two separate examples are given:
Automatic string conversion
Addition of append method
Suppose I want to have automatic string conversion AND a new append method. Is this possible? I have been trying to do both at the same time but I get compile errors. Does that mean the two implicits are conflicting?
You can have any number of implicit conversions from a class, provided that each one can be determined unambiguously from its usage. So array-to-string and array-to-rich-array-class-containing-append are fine together, since String doesn't have an append method. But you can't also convert to StringBuffer, whose append methods would conflict with your rich array's append.
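
A minimal sketch of the two conversions living side by side, assuming Scala 2 style implicits and Array[Int] as the element type. ArrayExtensions, arrayToString and RichIntArray are illustrative names, not the ones used in the linked article.

```scala
import scala.language.implicitConversions

object ArrayExtensions {
  // Conversion 1: used where a String is expected but an Array[Int] is given.
  implicit def arrayToString(a: Array[Int]): String =
    a.mkString("[", ", ", "]")

  // Conversion 2: adds an append method to Array[Int].
  implicit class RichIntArray(val a: Array[Int]) {
    def append(other: Array[Int]): Array[Int] = a ++ other
  }

  // Both conversions fire in one expression without ambiguity:
  // append comes from RichIntArray, the result is then converted to String.
  def demo(): String = {
    val s: String = Array(1, 2).append(Array(3))
    s
  }
}
```

Because String has no append method, only RichIntArray can supply append, so the compiler never has to choose between the two conversions.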