Understanding Sets and Sequences using String checking as an example - scala

I have a string which I would like to cross check if it is purely made of letters and space.
val str = "my long string to test"
val purealpha = " abcdefghijklmnopqrstuvwxyz".toSet
if (str.forall(purealpha(_))) println("PURE") else println("NOTPURE")
The above CONCISE code does the job. However, if I run it this way:
val str = "my long string to test"
val purealpha = " abcdefghijklmnopqrstuvwxyz" // not converted toSet
str.forall(purealpha(_)) // CONCISE code
I get an error (found: Char ... required: Boolean) and it can only work using the contains method this way:
str.forall(purealpha.contains(_))
My question is: how can I use the CONCISE form without converting the string to a Set? Any suggestions on having my own String class with the right combination of methods to enable the nice code, or maybe some pure function(s) working on strings?
It's just a fun exercise I'm doing, so I can understand the intricate details of various methods on collections (including apply method) and how to write nice concise code and classes.

A slightly different approach is to use a regex pattern.
val str = "my long string to test"
val purealpha = "[ a-z]+"
str matches purealpha // res0: Boolean = true

If we look at the source code we can see that both these implementations are doing different things, although giving the same result.
When you convert it to a Set and use forall, you are ultimately calling the apply method of the set. Here is how apply is called explicitly in your code, with a named parameter in the anonymous functions:
if (str.forall(s => purealpha.apply(s))) println("PURE") else println("NOTPURE") // first example
str.forall(s => purealpha.apply(s)) // second example
Anyway, let's take a look at the source code for apply for Set (gotten from GenSetLike.scala):
/** Tests if some element is contained in this set.
*
* This method is equivalent to `contains`. It allows sets to be interpreted as predicates.
* @param elem the element to test for membership.
* @return `true` if `elem` is contained in this set, `false` otherwise.
*/
def apply(elem: A): Boolean = this contains elem
When you leave the String literal, you have to specifically call the .contains (this is the source code for that gotten from SeqLike.scala):
/** Tests whether this $coll contains a given value as an element.
* $mayNotTerminateInf
*
* @param elem the element to test.
* @return `true` if this $coll has an element that is equal (as
* determined by `==`) to `elem`, `false` otherwise.
*/
def contains[A1 >: A](elem: A1): Boolean = exists (_ == elem)
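As you can imagine, calling apply on a String does not do the same thing as calling apply on a Set. On a String, apply expects an Int index and returns the Char at that position; a Char argument is widened to an Int, so the result is a Char rather than the Boolean that forall needs, which is the error you saw.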
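A small sketch of the two apply methods side by side (the names ApplyDemo, asString and asSet are only for illustration):

```scala
object ApplyDemo {
  val asString: String = " abcdefghijklmnopqrstuvwxyz"
  val asSet: Set[Char] = asString.toSet

  // String.apply takes an index and returns the character stored there.
  val charAtOne: Char = asString(1)

  // Set.apply takes an element and reports whether it is a member.
  val hasA: Boolean = asSet('a')
  val hasBang: Boolean = asSet('!')
}
```

Here charAtOne is 'a' because index 0 holds the leading space.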
A suggestion on having more conciseness is to omit the (_) entirely in the second example (compiler type inference will pick that up):
val str = "my long string to test"
val purealpha = " abcdefghijklmnopqrstuvwxyz" // not converted toSet
str.forall(purealpha.contains)
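If you still want the purealpha(_) form without building a Set, one option is a thin wrapper of your own whose apply delegates to contains. This is only a sketch: the names PredicateDemo and CharPredicate are made up for the example and are not part of the standard library.

```scala
object PredicateDemo {
  // Hypothetical wrapper: lets a String be used as a Char => Boolean predicate.
  final class CharPredicate(allowed: String) extends (Char => Boolean) {
    // Delegates to the same contains call used in the answer above.
    def apply(c: Char): Boolean = allowed.contains(c)
  }

  val purealpha = new CharPredicate(" abcdefghijklmnopqrstuvwxyz")
  val str = "my long string to test"

  // Same call shape as the Set-based version from the question.
  val isPure: Boolean = str.forall(purealpha(_))
}
```

Because CharPredicate extends Char => Boolean, str.forall(purealpha) works as well. Note that each lookup is still a linear scan of the string, unlike a Set lookup.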

Related

Contains of Option[String] in Scala not working as expected?

I just discovered something weird. This statement:
Some("test this").contains("test")
Evaluates to false. While this evaluates to true:
Some("test this").contains("test this")
How does this make sense? I thought the Option would run the contains on the wrapped object if possible.
EDIT:
I'm also thinking about this from a code readability perspective. Imagine you are seeing this code:
person.name.contains("Roger")
Must name be equal to Roger? Or can it contain Roger? The behavior depends if it's a String or Option[String].
There's a principle in typed functional programming called "parametric reasoning". Broadly stated, the principle is that it's desirable to be able to have intuitions about what a function does just from looking at its type signature.
If we "devirtualize" Option's contains method (effectively turning it into a static method, which is actually a fairly common optimization step in object-oriented runtimes), it has the signature:
def contains[A, A1 >: A](opt: Option[A], elem: A1): Boolean
That is, it takes an Option[A] and an A1 (where A1 is a supertype of A, if it's not an A) and returns a Boolean. Implicitly in Scala's typesystem, of course, we know that A and A1 are both subtypes of Any.
Without knowing anything more about what the types A and A1 are (A might be String and A1 might be AnyRef, or A and A1 might both be Int: whatever our intuition, it has to apply as much in either situation), what could we possibly do? We're basically limited to combinations of operations involving an Option[Any] and an Any which eventually get us to a Boolean (and, ideally, won't throw an exception).
For instance, opt.nonEmpty && opt.get == elem works: we can always call nonEmpty on an Option[Any] and then compare the contents using equality. We could also do something like opt.isEmpty || (opt.get.## % 43) == (elem.## % 57), but knowing that the contents of the Option and some other object have equal remainders in two different bases doesn't strike one as useful.
Note that the behavior you expected in your specific case (delegating to the wrapped value's own contains) cannot work in general, because there's no contains method on an Any. What should the behavior be if we have an Option[Int]?
It might actually be useful, since we do have the ability to convert arbitrary objects into Strings via the toString method (thank you Java!), to implement a containsSubstring method on Option[A]:
def containsSubstring(substring: String): Boolean =
nonEmpty && get.toString.contains(substring)
You could implement an enrichment class along these lines:
object Enrichments {
implicit class OptionOps[A](opt: Option[A]) extends AnyVal {
def containsSubstring(substring: String): Boolean =
opt.nonEmpty && opt.get.toString.contains(substring)
}
}
then you only need:
import Enrichments.OptionOps
Some("test this").containsSubstring("test") // evaluates true
case class Person(name: Option[String], age: Int)
// Option(p).containsSubstring("Roger") would also work, assuming Person doesn't override toString...
def isRoger(p: Person): Boolean = p.name.containsSubstring("Roger")
I recommend checking the docs, and eventually the code, of the API that you are using. The docs detail what Option's contains does and how it works:
/** Tests whether the option contains a given value as an element.
*
* This is equivalent to:
*
* option match {
* case Some(x) => x == elem
* case None => false
* }
*
* // Returns true because Some instance contains string "something" which equals "something".
* Some("something") contains "something"
*
* // Returns false because "something" != "anything".
* Some("something") contains "anything"
*
* // Returns false when method called on None.
* None contains "anything"
*
* @param elem the element to test.
* @return `true` if the option has an element that is equal (as
* determined by `==`) to `elem`, `false` otherwise.
*/
final def contains[A1 >: A](elem: A1): Boolean =
!isEmpty && this.get == elem
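For the substring case from the question, the standard exists method on Option may be enough without any enrichment. A short sketch contrasting it with contains (the object and value names are only for illustration):

```scala
object OptionContainsDemo {
  val name: Option[String] = Some("test this")
  val missing: Option[String] = None

  // Option.contains compares the wrapped value with == against the argument.
  val equalsWhole: Boolean = name.contains("test this")
  val equalsPart: Boolean = name.contains("test")

  // Option.exists applies a predicate to the wrapped value, if there is one.
  val hasSubstring: Boolean = name.exists(_.contains("test"))
  val noneHasSubstring: Boolean = missing.exists(_.contains("test"))
}
```

Written as person.name.exists(_.contains("Roger")), the call site also makes it explicit which contains is meant.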

Scala : StringBuilder class (append vs ++=) performance

I was using '++=' in scala for combining the string values using StringBuilder class instance.
StringBuilder class also provides append method that takes string parameter to combine string values.
I could see both methods in the Scala StringBuilder class documentation.
I was told that ++= is slower than append while combining multiple string values but I am not able to find any documentation that says it would be slower.
If anyone can give some link to documentation or explanation on why ++= is slower than append that would help me understand the concept better.
Operation that I am performing or the code is as below:
val sourceAlias = "source"
val destinationAlias = "destination"
val compositeKeys = Array("Id","Name")
val initialUpdateExpression = new StringBuilder("")
for (key <- compositeKeys) {
initialUpdateExpression ++= s"$sourceAlias.$key = $destinationAlias.$key and "
}
initialUpdateExpression ++= s"$sourceAlias.Valid = $destinationAlias.Valid"
val updateExpression = initialUpdateExpression.toString()
They seem to be the same, as far as I can tell directly from the code in StringBuilder.scala for Scala 2.13.8.
++= looks like this:
/** Alias for `addAll` */
def ++= (s: String): this.type = addAll(s)
which calls:
/** Overloaded version of `addAll` that takes a string */
def addAll(s: String): this.type = { underlying.append(s); this }
That in the end, calls:
@Override
@HotSpotIntrinsicCandidate
public StringBuilder append(String str) {
super.append(str);
return this;
}
Whereas, append is overloaded, but assuming you want the String version:
/** Appends the given String to this sequence.
*
* @param s a String.
* @return this StringBuilder.
*/
def append(s: String): StringBuilder = {
underlying append s
this
}
Which in turn, calls:
@Override
@HotSpotIntrinsicCandidate
public StringBuilder append(String str) {
super.append(str);
return this;
}
Which is exactly the same code, as ++= calls. So no, they should have the same performance for your particular use case. I also tried decompiling an example with both method calls, and I did not see any difference between them.
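As a quick sanity check of the equivalence (not a benchmark), the sketch below builds the expression from the question once with ++= and once with append and compares the results. The object and value names are only for illustration, and the aliases are hard-coded as source and destination.

```scala
object BuilderDemo {
  val compositeKeys = Seq("Id", "Name")

  // Variant using the ++= alias.
  val withAlias: String = {
    val sb = new StringBuilder
    for (key <- compositeKeys) sb ++= s"source.$key = destination.$key and "
    sb ++= "source.Valid = destination.Valid"
    sb.toString
  }

  // Variant using append; both end up in java.lang.StringBuilder.append(String).
  val withAppend: String = {
    val sb = new StringBuilder
    for (key <- compositeKeys) sb.append(s"source.$key = destination.$key and ")
    sb.append("source.Valid = destination.Valid")
    sb.toString
  }
}
```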
EDIT:
Perhaps you were told about the better performance of combining multiple string values through chained append calls versus using String concatenation with +. For example:
sb.append("some" + "thing")
sb.append("some").append("thing")
The second line is slightly more efficient, since in the first one you create an additional String and an additional unnamed StringBuilder. If this is the case, check this post for clarification on this matter.
There is no difference. But you should not be using StringBuilder to begin with. Mutable structures and vars are evil, you should pretend they do not exist at all, at least until you acquire enough command of the language to be able to identify the 0.1% of use cases where they are actually necessary.
val updateExpression = Seq("Id", "Name", "Valid")
  .map { key => s"source.$key = destination.$key" }
  .mkString(" and ")

Scala - Collection comparison - Why is Set(1) == ListSet(1)?

Why does this comparison output true?
import scala.collection.immutable.ListSet
Set(1) == ListSet(1) // Expect false
//Output
res0: Boolean = true
And in a more general sense, how is the comparison actually done?
Since the inheritance chain Set <: GenSet <: GenSetLike is a bit lengthy, it might be not immediately obvious where to look for the code of equals, so I thought maybe I quote it here:
GenSetLike.scala:
/** Compares this set with another object for equality.
*
* '''Note:''' This operation contains an unchecked cast: if `that`
* is a set, it will assume with an unchecked cast
* that it has the same element type as this set.
* Any subsequent ClassCastException is treated as a `false` result.
* @param that the other object
* @return `true` if `that` is a set which contains the same elements
* as this set.
*/
override def equals(that: Any): Boolean = that match {
case that: GenSet[_] =>
(this eq that) ||
(that canEqual this) &&
(this.size == that.size) &&
(try this subsetOf that.asInstanceOf[GenSet[A]]
catch { case ex: ClassCastException => false })
case _ =>
false
}
Essentially, it checks whether the other object is also a GenSet, and if yes, it attempts to perform some fail-fast checks (like comparing size and invoking canEqual), and if the sizes are equal, it checks whether this set is a subset of another set, presumably by checking each element.
So, the exact class used to represent the set at runtime is irrelevant, what matters is that the compared object is also a GenSet and has the same elements.
From Scala collections equality:
The collection libraries have a uniform approach to equality and hashing. The idea is, first, to divide collections into sets, maps, and sequences.
...
On the other hand, within the same category, collections are equal if and only if they have the same elements
In your case, both collections are considered sets and they contain the same elements, hence, they're equal.
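A short sketch of the category rule (the object and value names are only for illustration). The values are typed as Iterable[Int] here purely so that the cross-category comparison compiles without a warning; equality still dispatches on the runtime collection.

```scala
import scala.collection.immutable.ListSet

object EqualityDemo {
  val plainSet: Iterable[Int] = Set(1, 2, 3)
  val listSet: Iterable[Int] = ListSet(3, 2, 1)
  val list: Iterable[Int] = List(1, 2, 3)
  val vector: Iterable[Int] = Vector(1, 2, 3)

  // Same category (sets), same elements: equal, whatever the implementation or order.
  val setVsListSet: Boolean = plainSet == listSet

  // Same category (sequences), same elements in the same order: equal.
  val listVsVector: Boolean = list == vector

  // Different categories (set vs. sequence): never equal.
  val setVsList: Boolean = plainSet == list
}
```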
Scala 2.12.8 documentation:
This class implements immutable sets using a list-based data
structure.
So ListSet is a set too but with concrete (list-based) implementation.

String interpolation in Scala 2.10 - How to interpolate a String variable?

String interpolation is available in Scala starting Scala 2.10
This is the basic example
val name = "World" //> name : String = World
val message = s"Hello $name" //> message : String = Hello World
I was wondering if there is a way to do dynamic interpolation, e.g. the following (doesn't compile, just for illustration purposes)
val name = "World" //> name : String = World
val template = "Hello $name" //> template : String = Hello $name
//just for illustration:
val message = s(template) //> doesn't compile (not found: value s)
Is there a way to "dynamically" evaluate a String like that? (or is it inherently wrong / not possible)
And what is s exactly? it's not a method def (apparently it is a method on StringContext), and not an object (if it was, it would have thrown a different compile error than not found I think)
s is actually a method on StringContext (or something which can be implicitly converted from StringContext). When you write
whatever"Here is text $identifier and more text"
the compiler desugars it into
StringContext("Here is text ", " and more text").whatever(identifier)
By default, StringContext gives you s, f, and raw* methods.
As you can see, the compiler itself picks out the name and gives it to the method. Since this happens at compile time, you can't sensibly do it dynamically--the compiler doesn't have information about variable names at runtime.
You can use vars, however, so you can swap in values that you want. And the default s method just calls toString (as you'd expect) so you can play games like
class PrintCounter {
var i = 0
override def toString = { val ans = i.toString; i += 1; ans }
}
val pc = new PrintCounter
def pr[A](a: A): Unit = { println(s"$pc: $a") }
scala> List("salmon","herring").foreach(pr)
1: salmon
2: herring
(0 was already called by the REPL in this example).
That's about the best you can do.
*raw is broken and isn't slated to be fixed until 2.10.1; only text before a variable is actually raw (no escape processing). So hold off on using that one until 2.10.1 is out, or look at the source code and define your own. By default, there is no escape processing, so defining your own is pretty easy.
Here is a possible solution to #1 in the context of the original question based on Rex's excellent answer
val name = "World" //> name: String = World
val template = (name: String) => s"Hello $name" //> template: String => String = <function1>
val message = template(name) //> message: String = Hello World
String interpolation happens at compile time, so the compiler does not generally have enough information to interpolate s(str). It expects a string literal, according to the SIP.
Under Advanced Usage in the documentation you linked, it is explained that an expression of the form id"Hello $name ." is translated at compile time to new StringContext("Hello ", " .").id(name).
Note that id can be a user-defined interpolator introduced through an implicit class. The documentation gives an example for a json interpolator,
implicit class JsonHelper(val sc: StringContext) extends AnyVal {
def json(args: Any*): JSONObject = {
...
}
}
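To see the mechanism end to end, here is a minimal, self-contained interpolator in the same style. It is only a sketch: the names InterpolatorDemo, UpperHelper and upper are invented for this example, and unlike s it does no escape processing on the literal parts.

```scala
object InterpolatorDemo {
  implicit class UpperHelper(val sc: StringContext) extends AnyVal {
    // Hypothetical interpolator: upper-cases each argument and stitches it
    // between the literal parts. There is one more part than arguments,
    // so zipAll pads the argument side with "".
    def upper(args: Any*): String =
      sc.parts
        .zipAll(args.map(_.toString.toUpperCase), "", "")
        .map { case (part, arg) => part + arg }
        .mkString
  }

  val name = "World"

  // The sugared form and the explicit form the compiler rewrites it to.
  val viaSugar: String = upper"Hello $name!"
  val viaExplicit: String = StringContext("Hello ", "!").upper(name)
}
```

Both values come out as "Hello WORLD!", which shows that the name after the interpolator prefix is resolved at compile time.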
This is inherently impossible in the current implementation: local variable names are not available at execution time -- may be kept around as debug symbols, but can also have been stripped. (Member variable names are, but that's not what you're describing here).
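If the template really has to be a runtime String, a common workaround is plain substitution from an explicit Map instead of interpolation. The sketch below assumes placeholders of the form $identifier; the names RuntimeTemplate and render are hypothetical.

```scala
import scala.util.matching.Regex

object RuntimeTemplate {
  // Matches $identifier placeholders; group 1 is the identifier.
  private val Placeholder: Regex = """\$(\w+)""".r

  // Replaces each known placeholder with its value and leaves unknown ones as-is.
  // quoteReplacement keeps '$' and '\' in the replacement text literal.
  def render(template: String, values: Map[String, String]): String =
    Placeholder.replaceAllIn(template, m =>
      Regex.quoteReplacement(values.getOrElse(m.group(1), m.matched)))
}
```

For example, RuntimeTemplate.render("Hello $name", Map("name" -> "World")) yields "Hello World". The values have to be passed in explicitly because, as noted above, local variable names are not available at runtime.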

Can anyone explain how the symbol "=>" is used in Scala

I've read a lot of code snippets in Scala that make use of the symbol =>, but I've never really been able to comprehend it. I've tried searching the internet, but couldn't find anything comprehensive. Any pointers/explanation about how the symbol is/can be used will be really helpful.
(More specifically, I also want to know how the operator comes into the picture in function literals)
More than passing values/names, => is used to define a function literal, which is an alternate syntax used to define a function.
Example time. Let's say you have a function that takes in another function. The collections are full of them, but we'll pick filter. filter, when used on a collection (like a List), will take out any element that causes the function you provide to return false.
val people = List("Bill Nye", "Mister Rogers", "Mohandas Karamchand Gandhi", "Jesus", "Superman", "The newspaper guy")
// Let's only grab people who have short names (less than 10 characters)
val shortNamedPeople = people.filter(<a function>)
We could pass in an actual function from somewhere else (def isShortName(name: String): Boolean, perhaps), but it would be nicer to just place it right there. Luckily, we can, with function literals.
val shortNamedPeople = people.filter( name => name.length < 10 )
What we did here is create a function that takes in a String (since people is of type List[String]), and returns a Boolean. Pretty cool, right?
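For comparison, here is a sketch of three equivalent ways to pass that predicate (the object and value names are only for illustration; isShortName is the method suggested above):

```scala
object FilterDemo {
  val people = List("Bill Nye", "Mister Rogers", "Mohandas Karamchand Gandhi",
    "Jesus", "Superman", "The newspaper guy")

  def isShortName(name: String): Boolean = name.length < 10

  // Function literal with a named parameter.
  val viaLiteral: List[String] = people.filter(name => name.length < 10)

  // Placeholder syntax: shorthand for the same literal.
  val viaPlaceholder: List[String] = people.filter(_.length < 10)

  // An existing method passed where a function is expected.
  val viaMethod: List[String] = people.filter(isShortName)
}
```

All three produce List("Bill Nye", "Jesus", "Superman").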
This syntax is used in many contexts. Let's say you want to write a function that takes in another function. This other function should take in a String, and return an Int.
def myFunction(f: String => Int): Int = {
val myString = "Hello!"
f(myString)
}
// And let's use it. First way:
def anotherFunction(a: String): Int = {
a.length
}
myFunction(anotherFunction)
// Second way:
myFunction((a: String) => a.length)
That's what function literals are. Going back to by-name and by-value, there's a trick where you can force a parameter to not be evaluated until you want to. The classic example:
def logger(message: String) = {
if(loggingActivated) println(message)
}
This looks alright, but message is actually evaluated when logger is called. What if message takes a while to evaluate? For example, logger(veryLongProcess()), where veryLongProcess() returns a String. Whoops? Not really. We can use our knowledge about function literals to force veryLongProcess() not to be called until it is actually needed.
def logger(message: => String) = {
if(loggingActivated) println(message)
}
logger(veryLongProcess()) // Fixed!
logger now takes a by-name parameter, which behaves like a function that takes no parameters (hence the naked => with nothing on its left side). You can still call it as before, but now message is only evaluated when it's used (in the println).
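The difference can be observed by counting how often the argument is evaluated. In this sketch the counter, the explicit loggingActivated parameter and the names eagerLogger and lazyLogger are additions for the demonstration; they are not part of the code above.

```scala
object ByNameDemo {
  var evaluations = 0

  // Stand-in for an expensive computation; counts how often it runs.
  def veryLongProcess(): String = { evaluations += 1; "done" }

  // By-value: the argument is evaluated before the method body runs.
  def eagerLogger(loggingActivated: Boolean, message: String): Unit =
    if (loggingActivated) println(message)

  // By-name: the argument is evaluated each time message is referenced.
  def lazyLogger(loggingActivated: Boolean, message: => String): Unit =
    if (loggingActivated) println(message)
}
```

With logging turned off, eagerLogger(false, veryLongProcess()) still increments the counter, while lazyLogger(false, veryLongProcess()) leaves it untouched.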