I would like to write a Matrix class in Scala from which I can instantiate objects like this:
val m1 = new Matrix( (1.,2.,3.),(4.,5.,6.),(7.,8.,9.) )
val m2 = new Matrix( (1.,2.,3.),(4.,5.,6.) )
val m3 = new Matrix( (1.,2.),(3.,4.) )
val m4 = new Matrix( (1.,2.),(3.,4.),(5.,6.) )
I have tried this:
class Matrix(a: (List[Double])* ) { }
but then I get a type mismatch because the matrix rows are not of type List[Double].
Furthermore, it would be nice to be able to just type integers (1,2,3) instead of (1.,2.,3.) but still get a Double matrix.
How can I solve this?
Thanks!
Malte
(1.0, 2.0) is a Tuple2[Double, Double] not a List[Double]. Similarly (1.0, 2.0, 3.0) is a Tuple3[Double, Double, Double].
If you need to handle a fixed row cardinality, the simplest solution in plain vanilla Scala would be to have
class Matrix2(rows: Tuple2[Double, Double]*)
class Matrix3(rows: Tuple3[Double, Double, Double]*)
and so on.
Since there exists an implicit conversion from Int to Double, you can pass a tuple of Ints and it will be converted automatically.
new Matrix2((1, 2), (3, 4))
If you instead need to abstract over the row cardinality, enforcing an NxM shape using types, you would have to resort to more complex solutions, perhaps using the shapeless library.
Alternatively, you can use an actual list, but then you cannot restrict the cardinality, i.e. you cannot ensure that all rows have the same length (again, in vanilla Scala; shapeless can help):
class Matrix(rows: List[Double]*)
new Matrix(List(1, 2), List(3, 4))
Finally, the 1. literal syntax is deprecated since Scala 2.10 and removed in Scala 2.11. Use 1.0 instead.
If you need support for very large matrices, consider using an existing implementation like Breeze. Breeze has a DenseMatrix which probably meets your requirements. For performance reasons, Breeze offloads more complex operations into native code.
Getting matrix algebra right is a difficult exercise, and unless you are implementing a matrix specifically to learn or for an assignment, it is better to go with a proven library.
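As a quick illustration (a minimal sketch, assuming the breeze.linalg artifact is on the classpath), Breeze even lets you write rows as tuples, much like the syntax asked for in the question:
import breeze.linalg.DenseMatrix

// 2x3 matrix built from tuple rows
val m = DenseMatrix((1.0, 2.0, 3.0), (4.0, 5.0, 6.0))
// product with the transpose gives a 2x2 matrix
val p = m * m.t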
Edited based on comment below:
You can consider the following design.
case class Row(n: Int*)
class Matrix(rows: Row*) { ... }
Usage:
val m = new Matrix(Row(1, 2, 3), Row(2,3,4))
You need to validate that all Rows are of the same length and reject the input if they are not.
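A minimal sketch of that validation (just one possible shape; the values are stored as Doubles since the question asked for a double matrix):
case class Row(n: Int*)

class Matrix(rows: Row*) {
  // reject ragged input: every row must have the same number of columns
  require(rows.nonEmpty, "a matrix needs at least one row")
  require(rows.map(_.n.size).distinct.size == 1, "all rows must have the same length")

  // store the values as Doubles
  val values: Array[Array[Double]] = rows.map(_.n.map(_.toDouble).toArray).toArray
}

val m = new Matrix(Row(1, 2, 3), Row(2, 3, 4))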
I have hacked together a solution in what is, I think, a somewhat un-Scala-ish way:
class Matrix(values: Object*) { // values is of type WrappedArray[Object]
  var arr: Array[Double] = null
  val rows: Integer = values.size
  var cols: Integer = 0
  var _arrIndex = 0

  for (row <- values) {
    // if Tuple (Tuple extends Product)
    if (row.isInstanceOf[Product]) {
      var colcount = row.asInstanceOf[Product].productIterator.length
      assert(colcount > 0, "empty row")
      assert(cols == 0 || colcount == cols, "varying number of columns")
      cols = colcount
      if (arr == null) {
        arr = new Array[Double](rows * cols)
      }
      for (el <- row.asInstanceOf[Product].productIterator) {
        var x: Double = 0.0
        if (el.isInstanceOf[Integer]) {
          x = el.asInstanceOf[Integer].toDouble
        } else {
          assert(el.isInstanceOf[Double], "unknown element type")
          x = el.asInstanceOf[Double]
        }
        arr(_arrIndex) = x
        _arrIndex = _arrIndex + 1
      }
    }
  }
}
It is used like this:
object ScalaMatrix extends App {
  val m1 = new Matrix((1.0, 2.0, 3.0), (5, 4, 5))
  val m2 = new Matrix((9, 8), (7, 6))
  println(m1.toString())
  println(m2.toString())
}
I don't really like it. What do you think about it?
The following is a minimal example of the problem I am facing. I have an array that I want to modify in-place as it has about a million elements. The following code works except for the very last statement.
import spark.implicits._
case class Frame(x: Double, var y: Array[Double]) {
  def total(): Double = {
    return y.sum
  }

  def modifier(): Unit = {
    for (i <- 0 until y.length) {
      y(i) += 10
    }
    return
  }
}

val df = Seq(
  (1.0, Array(0, 2, 1)),
  (8.0, Array(1, 2, 3)),
  (9.0, Array(11, 21, 23))
).toDF("x", "y")
val ds = df.as[Frame]
ds.show
ds.map(_.total()).show // works
ds.map(_.modifier()).show // does not work
The error is as follows:
scala> ds.map(_.modifier()).show
<console>:50: error: Unable to find encoder for type Unit. An implicit Encoder[Unit] is needed to store Unit instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
ds.map(_.modifier()).show
I cannot see the origin of the problem. I would be grateful for any help in fixing the bug.
Actually, this has nothing to do with var or val; it's about mutable data structures. The problem is that modifier returns Unit (i.e. nothing), so you cannot map over its result. You can run it using:
case class Frame(x: Double, var y: Array[Double]) {
  def total(): Double = {
    return y.sum
  }

  def modifier(): Frame = {
    for (i <- 0 until y.length) {
      y(i) += 10
    }
    return this
  }
}
But it does not make much sense in my opinion; you should avoid mutable state. In addition, I would keep case classes simple (i.e. without logic) in Spark and use them as data containers only. If you must increase every element by 10, you can also do it like this:
case class Frame(x: Double, val y: Array[Double])
ds.map(fr => fr.copy(y = fr.y.map(_+10.0))).show
I have been trying to write a ScalaCheck property to verify the Codility TapeEquilibrium problem. For those who do not know the problem, see the following link: https://app.codility.com/programmers/lessons/3-time_complexity/tape_equilibrium/.
So far I have written the following, still incomplete, code:
test("Lesson 3 property") {
  val left = Gen.choose(-1000, 1000).sample.get
  val right = Gen.choose(-1000, 1000).sample.get
  val expectedSum = Math.abs(left - right)
  val leftArray = Gen.listOfN(???, left) retryUntil (_.sum == left)
  val rightArray = Gen.listOfN(???, right) retryUntil (_.sum == right)
  val property = forAll(leftArray, rightArray) { (r: List[Int], l: List[Int]) =>
    val array = (r ++ l).toArray
    Lesson3.solution3(array) == expectedSum
  }
  property.check()
}
The idea is as follows. I choose two random numbers (left and right) and calculate their absolute difference. Then I generate two arrays: each array will contain random numbers whose sum is either "left" or "right". By concatenating these arrays, I should be able to verify the property.
My issue is then generating leftArray and rightArray. This is itself a complex problem for which I would have to code a solution, so writing this property seems over-complicated.
Is there any way to code this? Is coding this property an overkill?
Best.
My issue is then generating the leftArray and rightArray
One way to generate these arrays (or lists) is to provide a generator of non-empty lists whose elements sum to a given number; in other words, something defined by a method like this:
import org.scalacheck.{Gen, Properties}
import org.scalacheck.Prop.forAll
def listOfSumGen(expectedSum: Int): Gen[List[Int]] = ???
That verifies the property:
forAll(Gen.choose(-1000, 1000)) { sum: Int =>
  forAll(listOfSumGen(sum)) { listOfSum: List[Int] =>
    (listOfSum.sum == sum) && listOfSum.nonEmpty
  }
}
Building such a list only constrains one element of the list, so basically here is a way to generate one:
Generate a list
The extra, constrained element will be given by expectedSum minus the sum of the list
Insert the constrained element at a random index of the list (because obviously any permutation of the list would work)
So we get:
def listOfSumGen(expectedSum: Int): Gen[List[Int]] =
  for {
    list <- Gen.listOf(Gen.choose(-1000, 1000))
    constrainedElement = expectedSum - list.sum
    index <- Gen.oneOf(0 to list.length)
  } yield list.patch(index, List(constrainedElement), 0)
Now, with the above generator, leftArray and rightArray could be defined as follows:
val leftArray = listOfSumGen(left)
val rightArray = listOfSumGen(right)
However, I think that the overall approach of the property described is incorrect: it builds an array where one specific partition of the array yields the expectedSum, but it doesn't ensure that no other partition of the array produces a smaller difference.
Here is a counter-example run-through:
val left = Gen.choose(-1000, 1000).sample.get   // --> 4
val right = Gen.choose(-1000, 1000).sample.get  // --> 9
val expectedSum = Math.abs(left - right)        // --> |4 - 9| = 5
val leftArray = listOfSumGen(left)   // let's assume one of its samples is List(3,1) (whose sum equals 4)
val rightArray = listOfSumGen(right) // let's assume one of its samples is List(2,4,3) (whose sum equals 9)
val property = forAll(leftArray, rightArray) { (l: List[Int], r: List[Int]) =>
  // l = List(3,1)
  // r = List(2,4,3)
  val array = (l ++ r).toArray // --> Array(3,1,2,4,3), which is the array from the example given in the exercise
  Lesson3.solution3(array) == expectedSum
  // according to the example, Lesson3.solution3(array) equals 1, which is different from 5
}
Here is an example of a correct property that essentially applies the definition:
def tapeDifference(index: Int, array: Array[Int]): Int = {
  val (left, right) = array.splitAt(index)
  Math.abs(left.sum - right.sum)
}

forAll(Gen.nonEmptyListOf(Gen.choose(-1000, 1000))) { list: List[Int] =>
  val array = list.toArray
  forAll(Gen.oneOf(array.indices)) { index =>
    Lesson3.solution3(array) <= tapeDifference(index, array)
  }
}
This property definition might coincide with the way the actual solution has been implemented (which is one of the potential pitfalls of ScalaCheck); however, that would be a slow / inefficient solution, so this is better seen as a way to check an optimized, fast implementation against a slow but correct one (see this presentation).
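If you do want to test against a reference, a hedged sketch (slowSolution below is a hypothetical brute-force implementation, not part of the original code) could compare the optimized solution against it:
// hypothetical brute-force reference: try every split point and take the minimum difference
def slowSolution(array: Array[Int]): Int =
  (1 until array.length).map(i => tapeDifference(i, array)).min

forAll(Gen.listOfN(2, Gen.choose(-1000, 1000)), Gen.listOf(Gen.choose(-1000, 1000))) {
  (prefix: List[Int], rest: List[Int]) =>
    val array = (prefix ++ rest).toArray // guarantees at least two elements
    Lesson3.solution3(array) == slowSolution(array)
}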
Try this with C#:
using System;
using System.Collections.Generic;
using System.Linq;

private static int TapeEquilibrium(int[] A)
{
    var sumA = A.Sum();
    var size = A.Length;
    var take = 0;
    var res = new List<int>();
    for (int i = 1; i < size; i++)
    {
        take = take + A[i - 1];
        var resp = Math.Abs((sumA - take) - take);
        res.Add(resp);
        if (resp == 0) return resp;
    }
    return res.Min();
}
As mentioned in the title, I have a SortedSet with a custom ordering. The set holds objects of class Edge (representing an edge in a graph). Each Edge has a cost associated with it, as well as its start and end points.
case class Edge(firstId : Int, secondId : Int, cost : Int) {}
My ordering for the SortedSet of edges looks like this (it's for the A* algorithm):
object Ord {
  val edgeCostOrdering: Ordering[Edge] = Ordering.by { edge: Edge =>
    if (edge.secondId == goalId) graphRepresentation.calculateStraightLineCost(edge.firstId, goalId)
    else edge.cost + graphRepresentation.calculateStraightLineCost(edge.secondId, goalId)
  }
}
However, after I apply said ordering to the set and try to add edges that have different start/end points but the same cost, only the last encountered edge is retained in the set.
For example:
val testSet : SortedSet[Edge] = SortedSet[Edge]()(edgeOrder)
val testSet2 = testSet + Edge(1,4,2)
val testSet3 = testSet2 + Edge(3,2,2)
println(testSet3)
Only prints (3,2,2)
Aren't these distinct objects? They only share the same value for one field so shouldn't the Set be able to handle this?
Consider using a mutable.PriorityQueue instead; it can keep multiple elements that have the same order. Here is a simpler example, where we order pairs by the second component:
import collection.mutable.PriorityQueue
implicit val twoOrd = math.Ordering.by{ (t: (Int, Int)) => t._2 }
val p = new PriorityQueue[(Int, Int)]()(twoOrd)
p += ((1, 2))
p += ((42, 2))
Even though both pairs are mapped to 2, and therefore have the same priority, the queue does not lose any elements:
p foreach println
(1,2)
(42,2)
To retain all the distinct Edges with the same ordering cost value in the SortedSet, you can modify your Ordering.by's function to return a Tuple that includes the edge Ids as well:
val edgeCostOrdering: Ordering[Edge] = Ordering.by { edge: Edge =>
  val cost = if (edge.secondId == goalId) ... else ...
  (cost, edge.firstId, edge.secondId)
}
A quick proof of concept below:
import scala.collection.immutable.SortedSet
case class Foo(a: Int, b: Int)
val fooOrdering: Ordering[Foo] = Ordering.by(_.b)
val ss = SortedSet(Foo(2, 2), Foo(2, 1), Foo(1, 2))(fooOrdering)
// ss: scala.collection.immutable.SortedSet[Foo] = TreeSet(Foo(2,1), Foo(1,2))
val fooOrdering: Ordering[Foo] = Ordering.by(foo => (foo.b, foo.a))
val ss = SortedSet(Foo(2, 2), Foo(2, 1), Foo(1, 2))(fooOrdering)
// ss: scala.collection.immutable.SortedSet[Foo] = TreeSet(Foo(1,2), Foo(2,1), Foo(2,2))
I am new to Scala and I am practicing it with the k-means algorithm, following the k-means tutorial.
I am confused by this part of the tutorial:
var newCentroids = pointsGroup.mapValues(ps => average(ps)).collectAsMap()
This causes a type mismatch error because the average function expects a Seq, while we give it an Iterable. How can I fix this? What causes this error?
Well, Seq is a subtype of Iterable but not vice versa, so an Iterable cannot simply be used where a Seq is expected.
There is an explicit conversion available by writing average(ps.toSeq). This conversion will iterate the Iterable and collect the items into a Seq.
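Applied to the line from the tutorial, that would look like this (a sketch, assuming the tutorial's average(ps: Seq[Vector]) signature):
var newCentroids = pointsGroup.mapValues(ps => average(ps.toSeq)).collectAsMap()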
Alternatively, we could simply replace Seq with Iterable in the provided average function:
def average(ps: Iterable[Vector]): Vector = {
  val numVectors = ps.size
  var out = new Vector(ps.head.elements) // start with a copy of the first vector...
  ps.tail foreach (out += _)             // ...and add the remaining ones
  out / numVectors
}
Or even in constant space:
def average(ps: Iterable[Vector]): Vector = {
  val numVectors = ps.size
  val vSize = ps.head.elements.length
  def element(index: Int): Double = ps.map(_(index)).sum / numVectors
  new Vector((0 until vSize).map(element).toArray)
}
Newbie Scala Question:
Say I want to do this [Java code] in Scala:
public static double[] abs(double[] r, double[] im) {
    double[] t = new double[r.length];
    for (int i = 0; i < t.length; ++i) {
        t[i] = Math.sqrt(r[i] * r[i] + im[i] * im[i]);
    }
    return t;
}
and also make it generic (since Scala handles generic primitives efficiently, I have read). Relying only on the core language (no library objects/classes, methods, etc.), how would one do this? Truthfully, I don't see how to do it at all, so I guess that's just a pure bonus-point question.
I ran into sooo many problems trying to do this simple thing that I have given up on Scala for the moment. Hopefully once I see the Scala way I will have an 'aha' moment.
UPDATE:
Discussing this with others, this is the best answer I have found so far.
def abs[T](r: Iterable[T], im: Iterable[T])(implicit n: Numeric[T]) = {
  import n.mkNumericOps
  r zip im map (t => math.sqrt((t._1 * t._1 + t._2 * t._2).toDouble))
}
Doing generic/performant primitives in Scala actually involves two related mechanisms which Scala uses to avoid boxing/unboxing (i.e. wrapping an int in a java.lang.Integer and vice versa):
@specialized type annotations
Using Manifest with arrays
@specialized is an annotation that tells the Scala compiler to create "primitive" versions of code (akin to C++ templates, so I am told). Check out the type declaration of Tuple2 (which is specialized) compared with List (which isn't). It was added in 2.8 and means that, for example, code like CC[Int].map(f: Int => Int) is executed without ever boxing any ints (assuming CC is specialized, of course!).
Manifests are a way of doing reified types in Scala (which is limited by the JVM's type erasure). This is particularly useful when you want a method genericized on some type T and then need to create an array of T (i.e. T[]) within the method. In Java this is not possible because new T[] is illegal. In Scala this is possible using Manifests. In particular, and in this case, it allows us to construct a primitive T-array, like double[] or int[]. (This is awesome, in case you were wondering.)
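As a tiny illustration (a hedged sketch with a hypothetical zeros helper, not from the original post):
// the Manifest context bound is what makes new Array[T] legal here;
// for T = Double the resulting array is backed by a primitive double[]
def zeros[T: Manifest](n: Int): Array[T] = new Array[T](n)

zeros[Double](3) // Array(0.0, 0.0, 0.0)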
Boxing is so important from a performance perspective because it creates garbage, unless all of your ints are < 127. It also, obviously, adds a level of indirection in terms of extra process steps/method calls etc. But consider that you probably don't give a hoot unless you are absolutely positively sure that you definitely do (i.e. most code does not need such micro-optimization)
So, back to the question: in order to do this with no boxing/unboxing, you must use Array (List is not specialized yet, and would be more object-hungry anyway, even if it were!). The zipped function on a pair of collections will return a collection of Tuple2s (which will not require boxing, as this is specialized).
In order to do this generically (i.e. across various numeric types) you must require a context bound on your generic parameter that it is Numeric and that a Manifest can be found (required for array creation). So I started along the lines of...
def abs[T : Numeric : Manifest](rs: Array[T], ims: Array[T]): Array[T] = {
  import math._
  val num = implicitly[Numeric[T]]
  (rs, ims).zipped.map { (r, i) => sqrt(num.plus(num.times(r, r), num.times(i, i))) }
  //                               ^^^^ no sqrt function for Numeric
}
...but it doesn't quite work. The reason is that a "generic" Numeric value does not have an operation like sqrt, so you can only do that at the point where you know you have a Double. For example:
scala> def almostAbs[T : Manifest : Numeric](rs : Array[T], ims : Array[T]) : Array[T] = {
| import math._
| val num = implicitly[Numeric[T]]
| (rs, ims).zipped.map { (r, i) => num.plus(num.times(r,r), num.times(i,i)) }
| }
almostAbs: [T](rs: Array[T],ims: Array[T])(implicit evidence$1: Manifest[T],implicit evidence$2: Numeric[T])Array[T]
Excellent - now see this purely generic method do some stuff!
scala> val rs = Array(1.2, 3.4, 5.6); val is = Array(6.5, 4.3, 2.1)
rs: Array[Double] = Array(1.2, 3.4, 5.6)
is: Array[Double] = Array(6.5, 4.3, 2.1)
scala> almostAbs(rs, is)
res0: Array[Double] = Array(43.69, 30.049999999999997, 35.769999999999996)
Now we can take the square root of the result, because we have an Array[Double]:
scala> res0.map(math.sqrt(_))
res1: Array[Double] = Array(6.609841147864296, 5.481788029466298, 5.980802621722272)
And to prove that this would work even with another Numeric type:
scala> import math._
import math._
scala> val rs = Array(BigDecimal(1.2), BigDecimal(3.4), BigDecimal(5.6)); val is = Array(BigDecimal(6.5), BigDecimal(4.3), BigDecimal(2.1))
rs: Array[scala.math.BigDecimal] = Array(1.2, 3.4, 5.6)
is: Array[scala.math.BigDecimal] = Array(6.5, 4.3, 2.1)
scala> almostAbs(rs, is)
res6: Array[scala.math.BigDecimal] = Array(43.69, 30.05, 35.77)
scala> res6.map(d => math.sqrt(d.toDouble))
res7: Array[Double] = Array(6.609841147864296, 5.481788029466299, 5.9808026217222725)
Use zip and map:
scala> val reals = List(1.0, 2.0, 3.0)
reals: List[Double] = List(1.0, 2.0, 3.0)
scala> val imags = List(1.5, 2.5, 3.5)
imags: List[Double] = List(1.5, 2.5, 3.5)
scala> reals zip imags
res0: List[(Double, Double)] = List((1.0,1.5), (2.0,2.5), (3.0,3.5))
scala> (reals zip imags).map {z => math.sqrt(z._1*z._1 + z._2*z._2)}
res2: List[Double] = List(1.8027756377319946, 3.2015621187164243, 4.6097722286464435)
scala> def abs(reals: List[Double], imags: List[Double]): List[Double] =
| (reals zip imags).map {z => math.sqrt(z._1*z._1 + z._2*z._2)}
abs: (reals: List[Double],imags: List[Double])List[Double]
scala> abs(reals, imags)
res3: List[Double] = List(1.8027756377319946, 3.2015621187164243, 4.6097722286464435)
UPDATE
It is better to use zipped because it avoids creating a temporary collection:
scala> def abs(reals: List[Double], imags: List[Double]): List[Double] =
| (reals, imags).zipped.map {(x, y) => math.sqrt(x*x + y*y)}
abs: (reals: List[Double],imags: List[Double])List[Double]
scala> abs(reals, imags)
res7: List[Double] = List(1.8027756377319946, 3.2015621187164243, 4.6097722286464435)
There isn't an easy way in Java to create generic numeric computational code; the libraries aren't there, as you can see from oxbow's answer. Collections are also designed to take arbitrary types, which means that there's an overhead in working with primitives through them. So the fastest code (without careful bounds checking) is either:
def abs(re: Array[Double], im: Array[Double]) = {
  val a = new Array[Double](re.length)
  var i = 0
  while (i < a.length) {
    a(i) = math.sqrt(re(i) * re(i) + im(i) * im(i))
    i += 1
  }
  a
}
or, tail-recursively:
def abs(re: Array[Double], im: Array[Double]) = {
  def recurse(a: Array[Double], i: Int = 0): Array[Double] = {
    if (i < a.length) {
      a(i) = math.sqrt(re(i) * re(i) + im(i) * im(i))
      recurse(a, i + 1)
    }
    else a
  }
  recurse(new Array[Double](re.length))
}
So, unfortunately, this code ends up not looking super-nice; the niceness comes once you package it in a handy complex number array library.
If it turns out that you don't actually need highly efficient code, then
def abs(re: Array[Double], im: Array[Double]) = {
  (re, im).zipped.map((i, j) => math.sqrt(i*i + j*j))
}
will do the trick compactly and conceptually clearly (once you understand how zipped works). The penalty in my hands is that this is about 2x slower. (Using List makes it 7x slower than while or tail recursion in my hands; List with zip makes it 20x slower; generics with arrays are 3x slower even without computing the square root.)
(Edit: fixed timings to reflect a more typical use case.)
After Edit:
OK, I have got what I wanted running. It takes two Lists of any numeric type and returns an Array of Doubles.
def abs[A](r: List[A], im: List[A])(implicit numeric: Numeric[A]): Array[Double] = {
  val t = new Array[Double](r.length)
  for (i <- r.indices) {
    t(i) = math.sqrt(numeric.toDouble(r(i)) * numeric.toDouble(r(i)) + numeric.toDouble(im(i)) * numeric.toDouble(im(i)))
  }
  t
}