In Scala, how do I access a specific index in a tuple?

I am implementing a function that takes a random index and returns the element at that index of a tuple.
I know that for a tuple like val a = (1,2,3), a._1 gives the first element, 1.
However, when I use a random index, val index = random_index(an integer smaller than the size of the tuple), a._index doesn't work.

You can use productElement; note that it is zero-based and has a return type of Any:
val a = (1,2,3)
a.productElement(1) // returns the 2nd element, 2
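Since productElement returns Any, you may want the static type back. A minimal sketch (the elementAt helper is hypothetical, not part of the original answer), assuming every element of the tuple shares one type:
val a = (1, 2, 3)
// Hypothetical helper: recover Int from productElement's Any result.
// The cast is safe here only because every element of this tuple is an Int.
def elementAt(t: Product, i: Int): Int =
  t.productElement(i).asInstanceOf[Int]
val index = scala.util.Random.nextInt(a.productArity) // 0, 1, or 2
val picked = elementAt(a, index) // a typed Int, chosen at runtime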

If you only know random_index at runtime, the best you can do is (as @GuruStron answered):
val a = (1,2,3)
val i = 1
val x = a.productElement(i)
x: Any // 2
If you know random_index at compile time you can do
import shapeless.syntax.std.tuple._
val a = (1,2,3)
val x = a(1)
x: Int // 2 // not just Any
// a(4) // doesn't compile
val i = 1
// a(i) // doesn't compile
https://github.com/milessabin/shapeless/wiki/Feature-overview:-shapeless-2.0.0#hlist-style-operations-on-standard-scala-tuples
Although this a(1) looks pretty similar to the standard a._1 (note that the shapeless version is zero-based).


Broadcasting of Scala options on sub-elements [duplicate]

I am new to Scala, please help me with the question below.
Can we call the map method on an Option (e.g. Option[Int].map())?
If yes, could you help me with an example?
Here's a simple example:
val x = Option(5)
val y = x.map(_ + 10)
println(y)
This will result in Some(15).
If x were None instead, y would also be None.
Yes:
val someInt = Some(2)
val noneInt: Option[Int] = None
val someIntRes = someInt.map(_ * 2) // Some(4)
val noneIntRes = noneInt.map(_ * 2) // None
See docs
You can view an Option as a collection that contains exactly 0 or 1 items. Mapping over the collection gives you a container with the same number of items, holding the result of applying the mapping function to every item in the original.
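A small sketch of that view, using toList to make the 0-or-1-element collection explicit:
val some = Option(3)
val none = Option.empty[Int]
some.toList     // List(3): a collection with one item
none.toList     // List(): a collection with zero items
some.map(_ * 2) // Some(6): one item in, one item out
none.map(_ * 2) // None: zero items in, zero items out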
Sometimes it's more convenient to use fold instead of map on Options. Consider this example:
scala> def printSome(some: Option[String]) = some.fold(println("Nothing provided"))(println)
printSome: (some: Option[String])Unit
scala> printSome(Some("Hi there!"))
Hi there!
scala> printSome(None)
Nothing provided
You can easily work with the real value inside fold, e.g. map it or do whatever you want, and you are safe with the default value, which fold returns whenever Option#isEmpty is true.
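For instance, fold can map the inner value and supply a default in one step (a minimal sketch; lengthOrZero is a hypothetical helper):
// Returns the string's length, or 0 when no string was provided.
def lengthOrZero(s: Option[String]): Int = s.fold(0)(_.length)
lengthOrZero(Some("Hi there!")) // 9
lengthOrZero(None)              // 0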

Adding Sparse Vectors 3.0.0 Apache Spark Scala

I am trying to create a function like the following to add
two org.apache.spark.ml.linalg.Vectors, i.e. two sparse vectors.
Such a vector could look like the following:
(28,[1,2,3,4,7,11,12,13,14,15,17,20,22,23,24,25],[0.13028398104008743,0.23648605632753023,0.7094581689825907,0.13028398104008743,0.23648605632753023,0.0,0.14218861229025295,0.3580566057240087,0.14218861229025295,0.13028398104008743,0.26056796208017485,0.0,0.14218861229025295,0.06514199052004371,0.13028398104008743,0.23648605632753023])
For example:
def add_vectors(x: org.apache.spark.ml.linalg.Vector, y: org.apache.spark.ml.linalg.Vector): org.apache.spark.ml.linalg.Vector = {
}
Let's look at a use case
val x = Vectors.sparse(2, List(0), List(1)) // [1, 0]
val y = Vectors.sparse(2, List(1), List(1)) // [0, 1]
I want the output to be
Vectors.sparse(2, List(0,1), List(1,1))
Here's another case where they share the same indices
val x = Vectors.sparse(2, List(1), List(1))
val y = Vectors.sparse(2, List(1), List(1))
The output should be
Vectors.sparse(2, List(1), List(2))
I've realized doing this is harder than it seems. I looked into one possible solution: converting the vectors to Breeze, adding them in Breeze, and then converting back, e.g. Addition of two RDD[mllib.linalg.Vector]'s. So I tried implementing this:
def add_vectors(x: org.apache.spark.ml.linalg.Vector, y: org.apache.spark.ml.linalg.Vector) = {
  val dense_x = x.toDense
  val dense_y = y.toDense
  val bv1 = new DenseVector(dense_x.toArray)
  val bv2 = new DenseVector(dense_y.toArray)
  val vectout = Vectors.dense((bv1 + bv2).toArray)
  vectout
}
However, this gave me an error on the last line:
val vectout = Vectors.dense((bv1 + bv2).toArray)
Cannot resolve the overloaded method 'dense'.
I'm wondering why this error occurs and how to fix it.
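The original imports aren't shown, so this is an assumption: a common cause of that overload error is picking up Spark's DenseVector (which has no + operator) instead of Breeze's. A sketch with Breeze aliased explicitly:
import breeze.linalg.{DenseVector => BDV}
import org.apache.spark.ml.linalg.{Vector, Vectors}
// Assumes the intent of the snippet above: densify, add in Breeze,
// convert back. BDV(...) builds a breeze.linalg.DenseVector[Double].
def add_vectors(x: Vector, y: Vector): Vector = {
  val bv1 = BDV(x.toArray)
  val bv2 = BDV(y.toArray)
  Vectors.dense((bv1 + bv2).toArray) // Array[Double] matches the dense overload
}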
To answer my own question, I had to think about how sparse vectors are represented. A sparse vector requires three arguments: the number of dimensions, an array of indices, and an array of values. For example:
val indices: Array[Int] = Array(1,2)
val norms: Array[Double] = Array(0.5,0.3)
val num_int = 4
val vector: Vector = Vectors.sparse(num_int, indices, norms)
If I converted this SparseVector to an array I would get the following.
code:
val choiced_array = vector.toArray
choiced_array.foreach(element => print(element + " "))
Output:
0.0 0.5 0.3 0.0
This is a denser representation of the same vector. So once you convert the two vectors to arrays you can add them with the following code:
val add: Array[Double] = (vector.toArray, vector_2.toArray).zipped.map(_ + _)
This gives you another array with the element-wise sums. Next, to create your new sparse vector, you want to build the indices array as follows:
var i = -1
val new_indices_pre = add.map { (element: Double) =>
  i = i + 1
  if (element > 0.0) i
  else -1
}
Then let's filter out all the -1 entries, which indicate a zero at that index:
val new_indices = new_indices_pre.filter(element => element != -1)
Remember to keep only the non-zero values from the array holding the sum of the two vectors:
val final_add = add.filter(element => element > 0.0)
Lastly, we can build the new sparse vector:
Vectors.sparse(num_int,new_indices,final_add)
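Putting the steps together, here is a minimal sketch of the whole function (using != 0.0 rather than > 0.0 so that negative values also survive):
import org.apache.spark.ml.linalg.{Vector, Vectors}
def add_vectors(x: Vector, y: Vector): Vector = {
  // Element-wise sum of the dense representations.
  val summed = (x.toArray, y.toArray).zipped.map(_ + _)
  // Keep only the non-zero entries, remembering their positions.
  val nonZero = summed.zipWithIndex.filter { case (value, _) => value != 0.0 }
  Vectors.sparse(summed.length, nonZero.map(_._2), nonZero.map(_._1))
}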

Create a PriorityQueue of triples that dequeues the triple with the minimum third element in Scala?

I have a PriorityQueue in Scala that I define below. My goal is that when I call dequeue I get the triple with the minimum third element. I figured that using Ordering is the way to go, but I cannot seem to get it to work.
import scala.collection.mutable.PriorityQueue
def orderByWeight(lst : (Int, Int, Int)) = lst._3
val pq = new PriorityQueue[(Int, Int, Int)]()(Ordering.by(orderByWeight))
var x = ListBuffer((0,1,2), (0,2,3), (0,3,4))
x.foreach(i => pq.enqueue(i))
I am confused about what my orderByWeight function should be. For the code above, the desired output when I call pq.dequeue is (0, 1, 2). Note that x is in random order. Any ideas?
If you want all the 3-tuples dequeued in order from smallest 3rd element to largest, I think this is all you need:
val pq = PriorityQueue[(Int, Int, Int)]()(Ordering.by(-_._3))
If you need an ordered output in case of 3rd-element ties, you can expand it.
var x = ListBuffer((0,1,2), (0,2,3), (0,3,4), (1,0,2))
val pq = PriorityQueue(x:_*)(Ordering[(Int, Int)].on(x => (-x._3, -x._2)))

Updating a 2d table of counts

Suppose I want a Scala data structure that implements a 2-dimensional table of counts that can change over time (i.e., individual cells in the table can be incremented or decremented). What should I be using to do this?
I could use a 2-dimensional array:
val x = Array.fill(2, 3)(0)
x(1)(2) += 1
But Arrays are mutable, and I guess I should slightly prefer immutable data structures.
So I thought about using a 2-dimensional Vector:
val x = Vector.fill(2, 3)(0)
// how do I update this? I want to write something like val newX : Vector[Vector[Int]] = x.add((1, 2), 1)
// but I'm not sure how
But I'm not sure how to get a new vector with only a single element changed.
What's the best approach?
Best depends on what your criteria are. The simplest immutable variant is to use a map from (Int,Int) to your count:
var c = (for (i <- 0 to 99; j <- 0 to 99) yield (i,j) -> 0).toMap
Then you access your values with c(i,j) and set them with c += ((i,j) -> n); c += ((i,j) -> (c(i,j)+1)) is a little bit annoying, but it's not too bad.
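A small increment helper makes that pattern read better (a sketch; inc is a hypothetical helper, not part of the original answer):
var c = (for (i <- 0 to 99; j <- 0 to 99) yield (i, j) -> 0).toMap
// Hypothetical helper: bump the count at (i, j) by one.
def inc(i: Int, j: Int): Unit = c += ((i, j) -> (c((i, j)) + 1))
inc(3, 4)
c((3, 4)) // 1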
Faster is to use nested Vectors--by about a factor of 2 to 3, depending on whether you tend to re-set the same element over and over or not--but it has an ugly update method:
var v = Vector.fill(100,100)(0)
v(82)(49) // Easy enough
v = v.updated(82, v(82).updated(49, v(82)(49)+1)) // Ouch!
Faster yet (by about 2x) is to have only one vector which you index into:
var u = Vector.fill(100*100)(0)
u(82*100 + 49) // Um, you think I can always remember to do this right?
u = u.updated(82*100 + 49, u(82*100 + 49)+1) // Well, that's actually better
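If the index arithmetic worries you, a thin wrapper (a sketch, not from the original answer) keeps it in one place while still using the flat Vector underneath:
// Hypothetical wrapper around the flat-Vector representation.
class Counts(rows: Int, cols: Int) {
  private var u = Vector.fill(rows * cols)(0)
  private def idx(i: Int, j: Int) = i * cols + j
  def apply(i: Int, j: Int): Int = u(idx(i, j))
  def inc(i: Int, j: Int): Unit = u = u.updated(idx(i, j), u(idx(i, j)) + 1)
}
val t = new Counts(100, 100)
t.inc(82, 49)
t(82, 49) // 1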
If you don't need immutability and your table size isn't going to change, just use an array as you've shown. It's ~200x faster than the fastest vector solution if all you're doing is incrementing and decrementing an integer.
If you want to do this in a very general and functional (but not necessarily performant) way, you can use lenses. Here's an example of how you could use Scalaz 7's implementation, for example:
import scalaz._
def at[A](i: Int): Lens[Seq[A], A] = Lens.lensg(a => a.updated(i, _), (_(i)))
def at2d[A](i: Int, j: Int) = at[Seq[A]](i) andThen at(j)
And a little bit of setup:
val table = Vector.tabulate(3, 4)(_ + _)
def show[A](t: Seq[Seq[A]]) = t.map(_ mkString " ") mkString "\n"
Which gives us:
scala> show(table)
res0: String =
0 1 2 3
1 2 3 4
2 3 4 5
We can use our lens like this:
scala> show(at2d(1, 2).set(table, 9))
res1: String =
0 1 2 3
1 2 9 4
2 3 4 5
Or we can just get the value at a given cell:
scala> val v: Int = at2d(2, 3).get(table)
v: Int = 5
Or do a lot of more complex things, like apply a function to a particular cell:
scala> show(at2d(2, 2).mod(((_: Int) * 2), table))
res8: String =
0 1 2 3
1 2 3 4
2 3 8 5
And so on.
There isn't a built-in method for this, perhaps because it would require the Vector to know that it contains Vectors, or Vectors of Vectors, etc., whereas most methods are generic. It would also require a separate method for each number of dimensions, because you need to specify a coordinate argument for each dimension.
However, you can add these yourself; the following will take you up to 4D, although you could just add the bits for 2D if that's all you need:
object UpdatableVector {
  implicit def vectorToUpdatableVector2[T](v: Vector[Vector[T]]) = new UpdatableVector2(v)
  implicit def vectorToUpdatableVector3[T](v: Vector[Vector[Vector[T]]]) = new UpdatableVector3(v)
  implicit def vectorToUpdatableVector4[T](v: Vector[Vector[Vector[Vector[T]]]]) = new UpdatableVector4(v)

  class UpdatableVector2[T](v: Vector[Vector[T]]) {
    def updated2(c1: Int, c2: Int)(newVal: T) =
      v.updated(c1, v(c1).updated(c2, newVal))
  }

  class UpdatableVector3[T](v: Vector[Vector[Vector[T]]]) {
    def updated3(c1: Int, c2: Int, c3: Int)(newVal: T) =
      v.updated(c1, v(c1).updated2(c2, c3)(newVal))
  }

  class UpdatableVector4[T](v: Vector[Vector[Vector[Vector[T]]]]) {
    def updated4(c1: Int, c2: Int, c3: Int, c4: Int)(newVal: T) =
      v.updated(c1, v(c1).updated3(c2, c3, c4)(newVal))
  }
}
In Scala 2.10 you don't need the implicit defs and can just add the implicit keyword to the class definitions.
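For example, the 2D conversion above collapses to a single implicit class (a sketch; it still needs to live inside an object such as UpdatableVector):
implicit class UpdatableVector2[T](v: Vector[Vector[T]]) {
  // Same behavior as the implicit def + class pair above.
  def updated2(c1: Int, c2: Int)(newVal: T): Vector[Vector[T]] =
    v.updated(c1, v(c1).updated(c2, newVal))
}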
Test:
import UpdatableVector._
val v2 = Vector.fill(2,2)(0)
val r2 = v2.updated2(1,1)(42)
println(r2) // Vector(Vector(0, 0), Vector(0, 42))
val v3 = Vector.fill(2,2,2)(0)
val r3 = v3.updated3(1,1,1)(42)
println(r3) // etc
Hope that's useful.

Why can't I define a variable recursively in a code block?

Why can't I define a variable recursively in a code block?
scala> {
| val test: Stream[Int] = 1 #:: test
| }
<console>:9: error: forward reference extends over definition of value test
val test: Stream[Int] = 1 #:: test
^
scala> val test: Stream[Int] = 1 #:: test
test: Stream[Int] = Stream(1, ?)
The lazy keyword solves this problem, but I can't understand why it works outside a code block yet throws a compilation error inside one.
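For reference, a minimal sketch of the lazy workaround inside a block:
val s = {
  // lazy defers evaluation, so the self-reference is legal here
  lazy val test: Stream[Int] = 1 #:: test
  test.take(3).toList // List(1, 1, 1)
}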
Note that in the REPL
scala> val something = "a value"
is evaluated more or less as follows:
object REPL$1 {
val something = "a value"
}
import REPL$1._
So, any val (or def, etc.) is a member of an internal REPL helper object.
Now the point is that classes (and objects) allow forward references on their members:
object ForwardTest {
def x = y // val x would also compile but with a more confusing result
val y = 2
}
ForwardTest.x == 2
This is not true for vals inside a block. In a block, everything must be defined in linear order, so vals are no longer members but plain local values. The following does not compile either:
def plainMethod = { // could as well be a simple block
def x = y
val y = 2
x
}
<console>: error: forward reference extends over definition of value y
def x = y
^
It is not recursion which makes the difference. The difference is that classes and objects allow forward references, whereas blocks do not.
I'll add that when you write:
object O {
val x = y
val y = 0
}
You are actually writing this:
object O {
val x = this.y
val y = 0
}
That little this is what is missing when you declare this stuff inside a definition.
The reason for this behavior is the different val initialization times. If you type val x = 5 directly into the REPL, x becomes a member of an object, whose fields can be initialized with a default value (null, 0, 0.0, false). In contrast, values in a block cannot be initialized with default values.
This leads to different behavior:
scala> class X { val x = y+1; val y = 10 }
defined class X
scala> (new X).x
res17: Int = 1
scala> { val x = y+1; val y = 10; x } // compiles only with 2.9.0
res20: Int = 11
In Scala 2.10 the last example does not compile anymore. In 2.9.0 the values are reordered by the compiler to get it to compile. There is a bug report which describes the different initialization times.
I'd like to add that, in this respect, a Scala worksheet in the Eclipse-based Scala IDE (v4.0.0) does not behave like the REPL as one might expect (e.g. https://github.com/scala-ide/scala-worksheet/wiki/Getting-Started says "Worksheets are like a REPL session on steroids"), but rather like the definition of one long method: that is, forward-referencing val definitions (including recursive val definitions) in a worksheet must be made members of some object or class.