Accessing values in an Nx2 matrix of Strings in Scala - scala

I have the following:
var data = Array[Array[String]]()
data :+= Array("item1", "")
data :+= Array("item2", "")
data :+= Array("item3", "")
I would like to somehow access the second element (the value lets say) of a certain "key" in data. I have to do this right now:
data(2)(1) = "test"
But I would like to do it without having to worry about indexes, something where I can just call the "key" and modify the value, like data("item1") = "test".
Unfornately I do have to use this matrix Array of Strings, which is obviously not optimal, but this is what I have to work with. How do I do this with my datastructure?
Also if I was to change datastructure, what Scala datastrucure would be the best? (I am considering changing it, but unlikely)

Use Map
val items = mutable.Map("foo" -> "bar", "bar" -> "bat")
items.update("foo", "baz")
If you have to stick with array, something like this will work
data.find(_.head == "foo").foreach { _(1) = "baz" }
You can make it look like a function call with a little implicit trickery:
object ArrayAsMap {
implicit class Wrapper(val it:Array[String]) extends AnyVal {
def <<-(foo: String) = it(1) = foo
}
implicit class Wrapper2(val it: Array[Array[String]]) extends AnyVal {
def apply(s: String) = Wrapper(it.find(_.head == s).get)
}
implicit def toarray(w: Wrapper) = w.it
}
Now, you can write that assignment like:
import ArrayAsMap._
data("foo") <<- "bar"
A few things about this:
the approach you are using is really bad: not only you have to scan the entire array to find the key every time, you are also making implicit assumptions about the contents of the (outer) array, which is especially bad because it is mutable. Something like this data("foo") = Array.empty will cause that code to crash badly at runtime.
you should avoid using mutable state (var) and mutable containers (like Array or mutable.Map ... Arrays are kinda inevitable legacy from java, but you should not normally mutate them). Code that uses immutable and referentially transparent statements is much easier to read, maintain and reason about, and much less error-prone. 99% of real-life use cases in scala do not require mutable state if implemented correctly. So, it might be a good idea for you to just pretend that vars and mutable containers do not exist at all, and you cannot change any value once it is assigned, until you have learned enough scala to be able to tell that 1% of cases when mutation is really necessary.

Related

How to design abstract classes if methods don't have the exact same signature?

This is a "real life" OO design question. I am working with Scala, and interested in specific Scala solutions, but I'm definitely open to hear generic thoughts.
I am implementing a branch-and-bound combinatorial optimization program. The algorithm itself is pretty easy to implement. For each different problem we just need to implement a class that contains information about what are the allowed neighbor states for the search, how to calculate the cost, and then potentially what is the lower bound, etc...
I also want to be able to experiment with different data structures. For instance, one way to store a logic formula is using a simple list of lists of integers. This represents a set of clauses, each integer a literal. We can have a much better performance though if we do something like a "two-literal watch list", and store some extra information about the formula in general.
That all would mean something like this
object BnBSolver[S<:BnBState]{
def solve(states: Seq[S], best_state:Option[S]): Option[S] = if (states.isEmpty) best_state else
val next_state = states.head
/* compare to best state, etc... */
val new_states = new_branches ++ states.tail
solve(new_states, new_best_state)
}
class BnBState[F<:Formula](clauses:F, assigned_variables) {
def cost: Int
def branches: Seq[BnBState] = {
val ll = clauses.pick_variable
List(
BnBState(clauses.assign(ll), ll :: assigned_variables),
BnBState(clauses.assign(-ll), -ll :: assigned_variables)
)
}
}
case class Formula[F<:Formula[F]](clauses:List[List[Int]]) {
def assign(ll: Int) :F =
Formula(clauses.filterNot(_ contains ll)
.map(_.filterNot(_==-ll))))
}
Hopefully this is not too crazy, wrong or confusing. The whole issue here is that this assign method from a formula would usually take just the current literal that is going to be assigned. In the case of two-literal watch lists, though, you are doing some lazy thing that requires you to know later what literals have been previously assigned.
One way to fix this is you just keep this list of previously assigned literals in the data structure, maybe as a private thing. Make it a self-standing lazy data structure. But this list of the previous assignments is actually something that may be naturally available by whoever is using the Formula class. So it makes sense to allow whoever is using it to just provide the list every time you assign, if necessary.
The problem here is that we cannot now have an abstract Formula class that just declares a assign(ll:Int):Formula. In the normal case this is OK, but if this is a two-literal watch list Formula, it is actually an assign(literal: Int, previous_assignments: Seq[Int]).
From the point of view of the classes using it, it is kind of OK. But then how do we write generic code that can take all these different versions of Formula? Because of the drastic signature change, it cannot simply be an abstract method. We could maybe force the user to always provide the full assigned variables, but then this is a kind of a lie too. What to do?
The idea is the watch list class just becomes a kind of regular assign(Int) class if I write down some kind of adapter method that knows where to take the previous assignments from... I am thinking maybe with implicit we can cook something up.
I'll try to make my answer a bit general, since I'm not convinced I'm completely following what you are trying to do. Anyway...
Generally, the first thought should be to accept a common super-class as a parameter. Obviously that won't work with Int and Seq[Int].
You could just have two methods; have one call the other. For instance just wrap an Int into a Seq[Int] with one element and pass that to the other method.
You can also wrap the parameter in some custom class, e.g.
class Assignment {
...
}
def int2Assignment(n: Int): Assignment = ...
def seq2Assignment(s: Seq[Int]): Assignment = ...
case class Formula[F<:Formula[F]](clauses:List[List[Int]]) {
def assign(ll: Assignment) :F = ...
}
And of course you would have the option to make those conversion methods implicit so that callers just have to import them, not call them explicitly.
Lastly, you could do this with a typeclass:
trait Assigner[A] {
...
}
implicit val intAssigner = new Assigner[Int] {
...
}
implicit val seqAssigner = new Assigner[Seq[Int]] {
...
}
case class Formula[F<:Formula[F]](clauses:List[List[Int]]) {
def assign[A : Assigner](ll: A) :F = ...
}
You could also make that type parameter at the class level:
case class Formula[A:Assigner,F<:Formula[A,F]](clauses:List[List[Int]]) {
def assign(ll: A) :F = ...
}
Which one of these paths is best is up to preference and how it might fit in with the rest of the code.

How to efficiently select a random element from a Scala immutable HashSet

I have a scala.collection.immutable.HashSet that I want to randomly select an element from.
I could solve the problem with an extension method like this:
implicit class HashSetExtensions[T](h: HashSet[T]) {
def nextRandomElement (): Option[T] = {
val list = h.toList
list match {
case null | Nil => None
case _ => Some (list (Random.nextInt (list.length)))
}
}
}
...but converting to a list will be slow. What would be the most efficient solution?
WARNING This answer is for experimental use only. For real project you probably should use your own collection types.
So i did some research in the HashSet source and i think there is little opportunity to someway extract the inner structure of most valuable class HashTrieSet without package violation.
I did come up with this code, which is extended Ben Reich's solution:
package scala.collection
import scala.collection.immutable.HashSet
import scala.util.Random
package object random {
implicit class HashSetRandom[T](set: HashSet[T]) {
def randomElem: Option[T] = set match {
case trie: HashSet.HashTrieSet[T] => {
trie.elems(Random.nextInt(trie.elems.length)).randomElem
}
case _ => Some(set.size) collect {
case size if size > 0 => set.iterator.drop(Random.nextInt(size)).next
}
}
}
}
file should be created somewhere in the src/scala/collection/random folder
note the scala.collection package - this thing makes the elems part of HashTrieSet visible. This is only solution i could think, which could run better than O(n). Current version should have complexity O(ln(n)) as any of immutable.HashSet's operation s.
Another warning - private structure of HashSet is not part of scala's standard library API, so it could change any version making this code erroneous (though it's didn't changed since 2.8)
Since size is O(1) on HashSet, and iterator is as lazy as possible, I think this solution would be relatively efficient:
implicit class RichHashSet[T](val h: HashSet[T]) extends AnyVal {
def nextRandom: Option[T] = Some(h.size) collect {
case size if size > 0 => h.iterator.drop(Random.nextInt(size)).next
}
}
And if you're trying to get every ounce of efficiency you could use match here instead of the more concise Some/collect idiom used here.
You can look at the mutable HashSet implementation to see the size method. The iterator method defined there basically just calls iterator on FlatHashTable. The same basic efficiencies of these methods apply to immutable HashSet if that's what you're working with. As a comparison, you can see the toList implementation on HashSet is all the way up the type hierarchy at TraversableOnce and uses far more primitive elements which are probably less efficient and (of course) the entire collection must be iterated to generate the List. If you were going to convert the entire set to a Traversable collection, you should use Array or Vector which have constant-time lookup.
You might also note that there is nothing special about HashSet in the above methods, and you could enrich Set[T] instead, if you so chose (although there would be no guarantee that this would be as efficient on other Set implementations, of course).
As a side note, when implementing enriched classes for extension methods, you should always consider making an implicit, user-defined value class by extending AnyVal. You can read about some of the advantages and limitations in the docs, and on this answer.

Scala: Thread safe mutable lazy Iterator with append

For an immutable flavour, Iterator does the job.
val x = Iterator.fill(100000)(someFn)
Now I want to implement a mutable version of Iterator, with three guarantees:
thread-safe on all transformations(fold, foldLeft, ..) and append
lazy evaluated
traversable only once! Once used, an object from this Iterator should be destroyed.
Is there an existing implementation to give me these guarantees? Any library or framework example would be great.
Update
To illustrate the desired behaviour.
class SomeThing {}
class Test(val list: Iterator[SomeThing]) {
def add(thing: SomeThing): Test = {
new Test(list ++ Iterator(thing))
}
}
(new Test()).add(new SomeThing).add(new SomeThing);
In this example, SomeThing is an expensive construct, it needs to be lazy.
Re-iterating over list is never required, Iterator is a good fit.
This is supposed to asynchronously and lazily sequence 10 million SomeThing instances without depleting the executor(a cached thread pool executor) or running out of memory.
You don't need a mutable Iterator for this, just daisy-chain the immutable form:
class SomeThing {}
case class Test(val list: Iterator[SomeThing]) {
def add(thing: => SomeThing) = Test(list ++ Iterator(thing))
}
(new Test()).add(new SomeThing).add(new SomeThing)
Although you don't really need the extra boilerplate of Test here:
Iterator(new SomeThing) ++ Iterator(new SomeThing)
Note that Iterator.++ takes a by-name param, so the ++ operation is already lazy.
You might also want to try this, to avoid building intermediate Iterators:
Iterator.continually(new SomeThing) take 2
UPDATE
If you don't know the size in advance, then I'll often use a tactic like this:
def mkSomething = if(cond) Some(new Something) else None
Iterator.continually(mkSomething) takeWhile (_.isDefined) map { _.get }
The trick is to have your generator function wrap its output in an Option, which then gives you a way to flag that the iteration is finished by returning None
Of course... If you're really pushing out the boat, you can even use the dreaded null:
def mkSomething = if(cond) { new Something } else null
Iterator.continually(mkSomething) takeWhile (_ != null)
Seems like you need to hide the fact that the iterator is mutable but at the same time allow it to grow mutably. What I'm going to propose is the same sort of trick I've used to speed up ::: in the past:
abstract class AppendableIterator[A] extends Iterator[A]{
protected var inner: Iterator[A]
def hasNext = inner.hasNext
def next() = inner next ()
def append(that: Iterator[A]) = synchronized{
inner = new JoinedIterator(inner, that)
}
}
//You might need to add some more things, this is a skeleton
class JoinedIterator[A](first: Iterator[A], second: Iterator[A]) extends Iterator[A]{
def hasNext = first.hasNext || second.hasNext
def next() = if(first.hasNext) first next () else if(second.hasNext) second next () else Iterator.next()
}
So what you're really doing is leaving the Iterator at whatever place in its iteration you might have it while still preserving the thread safety of the append by "joining" another Iterator in non-destructively. You avoid the need to recompute the two together because you never actually force them through a CanBuildFrom.
This is also a generalization of just adding one item. You can always wrap some A in an Iterator[A] of one element if you so choose.
Have you looked at the mutable.ParIterable in the collection.parallel package?
To access an iterator over elements you can do something like
val x = ParIterable.fill(100000)(someFn).iterator
From the docs:
Parallel operations are implemented with divide and conquer style algorithms that parallelize well. The basic idea is to split the collection into smaller parts until they are small enough to be operated on sequentially.
...
The higher-order functions passed to certain operations may contain side-effects. Since implementations of bulk operations may not be sequential, this means that side-effects may not be predictable and may produce data-races, deadlocks or invalidation of state if care is not taken. It is up to the programmer to either avoid using side-effects or to use some form of synchronization when accessing mutable data.

What are good examples of: "operation of a program should map input values to output values rather than change data in place"

I came across this sentence in Scala in explaining its functional behavior.
operation of a program should map input of values to output values rather than change data in place
Could somebody explain it with a good example?
Edit: Please explain or give example for the above sentence in its context, please do not make it complicate to get more confusion
The most obvious pattern that this is referring to is the difference between how you would write code which uses collections in Java when compared with Scala. If you were writing scala but in the idiom of Java, then you would be working with collections by mutating data in place. The idiomatic scala code to do the same would favour the mapping of input values to output values.
Let's have a look at a few things you might want to do to a collection:
Filtering
In Java, if I have a List<Trade> and I am only interested in those trades executed with Deutsche Bank, I might do something like:
for (Iterator<Trade> it = trades.iterator(); it.hasNext();) {
Trade t = it.next();
if (t.getCounterparty() != DEUTSCHE_BANK) it.remove(); // MUTATION
}
Following this loop, my trades collection only contains the relevant trades. But, I have achieved this using mutation - a careless programmer could easily have missed that trades was an input parameter, an instance variable, or is used elsewhere in the method. As such, it is quite possible their code is now broken. Furthermore, such code is extremely brittle for refactoring for this same reason; a programmer wishing to refactor a piece of code must be very careful to not let mutated collections escape the scope in which they are intended to be used and, vice-versa, that they don't accidentally use an un-mutated collection where they should have used a mutated one.
Compare with Scala:
val db = trades filter (_.counterparty == DeutscheBank) //MAPPING INPUT TO OUTPUT
This creates a new collection! It doesn't affect anyone who is looking at trades and is inherently safer.
Mapping
Suppose I have a List<Trade> and I want to get a Set<Stock> for the unique stocks which I have been trading. Again, the idiom in Java is to create a collection and mutate it.
Set<Stock> stocks = new HashSet<Stock>();
for (Trade t : trades) stocks.add(t.getStock()); //MUTATION
Using scala the correct thing to do is to map the input collection and then convert to a set:
val stocks = (trades map (_.stock)).toSet //MAPPING INPUT TO OUTPUT
Or, if we are concerned about performance:
(trades.view map (_.stock)).toSet
(trades.iterator map (_.stock)).toSet
What are the advantages here? Well:
My code can never observe a partially-constructed result
The application of a function A => B to a Coll[A] to get a Coll[B] is clearer.
Accumulating
Again, in Java the idiom has to be mutation. Suppose we are trying to sum the decimal quantities of the trades we have done:
BigDecimal sum = BigDecimal.ZERO
for (Trade t : trades) {
sum.add(t.getQuantity()); //MUTATION
}
Again, we must be very careful not to accidentally observe a partially-constructed result! In scala, we can do this in a single expression:
val sum = (0 /: trades)(_ + _.quantity) //MAPPING INTO TO OUTPUT
Or the various other forms:
(trades.foldLeft(0)(_ + _.quantity)
(trades.iterator map (_.quantity)).sum
(trades.view map (_.quantity)).sum
Oh, by the way, there is a bug in the Java implementation! Did you spot it?
I'd say it's the difference between:
var counter = 0
def updateCounter(toAdd: Int): Unit = {
counter += toAdd
}
updateCounter(8)
println(counter)
and:
val originalValue = 0
def addToValue(value: Int, toAdd: Int): Int = value + toAdd
val firstNewResult = addToValue(originalValue, 8)
println(firstNewResult)
This is a gross over simplification but fuller examples are things like using a foldLeft to build up a result rather than doing the hard work yourself: foldLeft example
What it means is that if you write pure functions like this you always get the same output from the same input, and there are no side effects, which makes it easier to reason about your programs and ensure that they are correct.
so for example the function:
def times2(x:Int) = x*2
is pure, while
def add5ToList(xs: MutableList[Int]) {
xs += 5
}
is impure because it edits data in place as a side effect. This is a problem because that same list could be in use elsewhere in the the program and now we can't guarantee the behaviour because it has changed.
A pure version would use immutable lists and return a new list
def add5ToList(xs: List[Int]) = {
5::xs
}
There are plenty examples with collections, which are easy to come by but might give the wrong impression. This concept works at all levels of the language (it doesn't at the VM level, however). One example is the case classes. Consider these two alternatives:
// Java-style
class Person(initialName: String, initialAge: Int) {
def this(initialName: String) = this(initialName, 0)
private var name = initialName
private var age = initialAge
def getName = name
def getAge = age
def setName(newName: String) { name = newName }
def setAge(newAge: Int) { age = newAge }
}
val employee = new Person("John")
employee.setAge(40) // we changed the object
// Scala-style
case class Person(name: String, age: Int) {
def this(name: String) = this(name, 0)
}
val employee = new Person("John")
val employeeWithAge = employee.copy(age = 40) // employee still exists!
This concept is applied on the construction of the immutable collection themselves: a List never changes. Instead, new List objects are created when necessary. Use of persistent data structures reduce the copying that would happen on a mutable data structure.

Pros and Cons of choosing def over val

I'm asking a slight different question than this one. Suppose I have a code snippet:
def foo(i : Int) : List[String] = {
val s = i.toString + "!" //using val
s :: Nil
}
This is functionally equivalent to the following:
def foo(i : Int) : List[String] = {
def s = i.toString + "!" //using def
s :: Nil
}
Why would I choose one over the other? Obviously I would assume the second has a slight disadvantages in:
creating more bytecode (the inner def is lifted to a method in the class)
a runtime performance overhead of invoking a method over accessing a value
non-strict evaluation means I could easily access s twice (i.e. unnecesasarily redo a calculation)
The only advantage I can think of is:
non-strict evaluation of s means it is only called if it is used (but then I could just use a lazy val)
What are peoples' thoughts here? Is there a significant dis-benefit to me making all inner vals defs?
1)
One answer I didn't see mentioned is that the stack frame for the method you're describing could actually be smaller. Each val you declare will occupy a slot on the JVM stack, however, the whenever you use a def obtained value it will get consumed in the first expression you use it in. Even if the def references something from the environment, the compiler will pass .
The HotSpot should optimize both these things, or so some people claim. See:
http://www.ibm.com/developerworks/library/j-jtp12214/
Since the inner method gets compiled into a regular private method behind the scene and it is usually very small, the JIT compiler might choose to inline it and then optimize it. This could save time allocating smaller stack frames (?), or, by having fewer elements on the stack, make local variables access quicker.
But, take this with a (big) grain of salt - I haven't actually made extensive benchmarks to backup this claim.
2)
In addition, to expand on Kevin's valid reply, the stable val provides also means that you can use it with path dependent types - something you can't do with a def, since the compiler doesn't check its purity.
3)
For another reason you might want to use a def, see a related question asked not so long ago:
Functional processing of Scala streams without OutOfMemory errors
Essentially, using defs to produce Streams ensures that there do not exist additional references to these objects, which is important for the GC. Since Streams are lazy anyway, the overhead of creating them is probably negligible even if you have multiple defs.
The val is strict, it's given a value as soon as you define the thing.
Internally, the compiler will mark it as STABLE, equivalent to final in Java. This should allow the JVM to make all sorts of optimisations - I just don't know what they are :)
I can see an advantage in the fact that you are less bound to a location when using a def than when using a val.
This is not a technical advantage but allows for better structuring in some cases.
So, stupid example (please edit this answer, if you’ve got a better one), this is not possible with val:
def foo(i : Int) : List[String] = {
def ret = s :: Nil
def s = i.toString + "!"
ret
}
There may be cases where this is important or just convenient.
(So, basically, you can achieve the same with lazy val but, if only called at most once, it will probably be faster than a lazy val.)
For a local declaration like this (with no arguments, evaluated precisely once and with no code evaluated between the point of declaration and the point of evaluation) there is no semantic difference. I wouldn't be surprised if the "val" version compiled to simpler and more efficient code than the "def" version, but you would have to examine the bytecode and possibly profile to be sure.
In your example I would use a val. I think the val/def choice is more meaningful when declaring class members:
class A { def a0 = "a"; def a1 = "a" }
class B extends A {
var c = 0
override def a0 = { c += 1; "a" + c }
override val a1 = "b"
}
In the base class using def allows the sub class to override with possibly a def that does not return a constant. Or it could override with a val. So that gives more flexibility than a val.
Edit: one more use case of using def over val is when an abstract class has a "val" for which the value should be provided by a subclass.
abstract class C { def f: SomeObject }
new C { val f = new SomeObject(...) }