Using the Kronecker product on complex matrices with ScalaNLP Breeze - scala
I had a piece of code:
def this(vectors: List[DenseVector[Double]]) {
  this(vectors.length)
  var resultVector = vectors.head
  for (vector <- vectors) {
    resultVector = kron(resultVector.toDenseMatrix, vector.toDenseMatrix).toDenseVector
  }
  _vector = resultVector
}
It worked just the way I wanted it to. The problem is that I needed complex values instead of doubles. After importing breeze.math.Complex, I changed the code to:
def this(vectors: List[DenseVector[Complex]]) {
  this(vectors.length)
  var resultVector = vectors.head
  for (vector <- vectors) {
    resultVector = kron(resultVector.toDenseMatrix, vector.toDenseMatrix).toDenseVector
  }
  _vector = resultVector
}
This, however, results in the following errors:
Error:(42, 26) could not find implicit value for parameter impl: breeze.linalg.kron.Impl2[breeze.linalg.DenseMatrix[breeze.math.Complex],breeze.linalg.DenseMatrix[breeze.math.Complex],VR]
resultVector = kron(resultVector.toDenseMatrix, vector.toDenseMatrix).toDenseVector
^
Error:(42, 26) not enough arguments for method apply: (implicit impl: breeze.linalg.kron.Impl2[breeze.linalg.DenseMatrix[breeze.math.Complex],breeze.linalg.DenseMatrix[breeze.math.Complex],VR])VR in trait UFunc.
Unspecified value parameter impl.
resultVector = kron(resultVector.toDenseMatrix, vector.toDenseMatrix).toDenseVector
^
Is this a bug or am I forgetting to do something?
I found the problem in the following way:
I first rewrote the function to use fewer matrix conversions.
As there was a problem with the implicit impl parameter of kron, I also rewrote the function call to state explicitly which implementation to use:
def this(vectors: List[DenseVector[Complex]]) {
  this(vectors.length)
  var resultMatrix = vectors.head.toDenseMatrix
  for (i <- 1 until vectors.length) {
    resultMatrix = kron(resultMatrix, vectors(i).toDenseMatrix)(kron.kronDM_M[Complex, Complex, DenseMatrix[Complex], Complex])
  }
  _vector = resultMatrix.toDenseVector
}
This showed me that there was no ScalarMulOp for V2, M, DenseMatrix[RV], where M is a Matrix[V1], V1 and V2 are the input types, and RV is the output type of the ScalarMulOp.
Digging through the Breeze source code, I found in DenseMatrixOps that an implicit ScalarMulOp for the above types existed only when V1, V2 and RV are Int, Long, Float or Double. By copying the function and making it specific to Complex numbers, I was able to get the Kronecker product to work. I could then also remove the explicit (kron.kronDM_M[Complex, Complex, DenseMatrix[Complex], Complex]) argument. The ScalarMulOp function in question is:
implicit def s_dm_op_Complex_OpMulScalar(implicit op: OpMulScalar.Impl2[Complex, Complex, Complex]):
    OpMulScalar.Impl2[Complex, DenseMatrix[Complex], DenseMatrix[Complex]] =
  new OpMulScalar.Impl2[Complex, DenseMatrix[Complex], DenseMatrix[Complex]] {
    def apply(b: Complex, a: DenseMatrix[Complex]): DenseMatrix[Complex] = {
      val res: DenseMatrix[Complex] = DenseMatrix.zeros[Complex](a.rows, a.cols)
      val resd: Array[Complex] = res.data
      val ad: Array[Complex] = a.data
      var c = 0
      var off = 0
      while (c < a.cols) {
        var r = 0
        while (r < a.rows) {
          resd(off) = op(b, ad(a.linearIndex(r, c)))
          r += 1
          off += 1
        }
        c += 1
      }
      res
    }
    implicitly[BinaryRegistry[Complex, Matrix[Complex], OpMulScalar.type, Matrix[Complex]]].register(this)
  }
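For reference, what the constructor computes can be sketched without Breeze at all. The following is a minimal, library-free illustration and not Breeze's implementation: Cplx, kronVec and kronAll are hypothetical names introduced only for this sketch. For vectors a (length m) and b (length n), the Kronecker product has length m * n, with a(i) * b(j) at index i * n + j. Folding from the head over the tail, like the rewritten loop that starts at index 1, uses the first vector exactly once.

```scala
object KronSketch {
  // Minimal complex number; a hypothetical stand-in for breeze.math.Complex.
  final case class Cplx(re: Double, im: Double) {
    def *(that: Cplx): Cplx =
      Cplx(re * that.re - im * that.im, re * that.im + im * that.re)
  }

  // Kronecker product of two vectors: entry i * b.length + j is a(i) * b(j).
  def kronVec(a: Vector[Cplx], b: Vector[Cplx]): Vector[Cplx] =
    for (x <- a; y <- b) yield x * y

  // Assumes a non-empty list. Starts from the head and folds over the tail,
  // so the first vector is not multiplied with itself.
  def kronAll(vectors: List[Vector[Cplx]]): Vector[Cplx] =
    vectors.tail.foldLeft(vectors.head)(kronVec)
}
```

With n input vectors of length 2, kronAll returns a vector of length 2^n, which is the shape expected for a product state built from two-component vectors.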
Related
What is a good implementation of generic 2D Array literals in scala3 based on tuples?
The following code is a proof-of-concept for an uncluttered way to declare 2D array literals in scala3. It's based on this answer to a related question, implemented for scala2: https://stackoverflow.com/a/13863525/666886 The non-generic code below provides a clean Array literal declaration, but is inflexible w.r.t. type and dimension. It would be better to derive array dimensions from tuple arity and number of rows. #!/usr/bin/env scala3 object Array2d { def main(args: Array[String]): Unit = { prettyPrintArray() } type X=Int type Tuple26 = (X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X) type Array2d = Array[Array[X]] def apply(tuples: Tuple26 *): Array2d = { for { tupe <- tuples row = for ( i <- tupe.toList ) yield i } yield row.toArray }.toArray lazy val letterFrequencies = Array2d( (46,615,763,839,1745,325,628,651,1011,128,573,1319,797,1123,884,726,49,1642,2241,1162,631,299,408,97,659,184), (15,103,128,202,597,49,126,107,358,32,123,321,171,226,483,38,6,439,565,233,340,21,58,21,227,40), (63,128,106,218,689,86,75,407,526,14,241,369,197,307,627,208,5,507,675,343,361,60,66,24,226,28), (39,202,218,149,1257,108,183,175,634,35,140,407,212,418,692,194,10,596,763,272,409,97,184,52,348,46), (745,597,689,1257,919,396,568,583,1214,111,504,1315,726,1083,1217,763,34,1876,2323,1223,662,437,455,199,655,187), (25,49,86,108,396,118,75,72,295,11,68,258,58,129,248,26,7,299,415,208,205,14,50,20,151,24), (28,126,75,183,568,75,99,141,424,24,55,324,146,416,476,129,3,423,536,206,326,48,66,4,244,28), (51,107,407,175,583,72,141,53,399,21,145,282,198,238,488,174,6,342,715,454,285,44,158,14,207,23), (11,358,526,634,1214,295,424,399,169,71,392,853,518,884,594,467,50,970,1471,853,335,246,215,104,323,130), (28,32,14,35,111,11,24,21,71,2,31,41,28,52,98,26,1,45,112,33,75,9,12,1,38,3), (73,123,241,140,504,68,55,145,392,31,61,247,102,300,376,152,8,338,744,184,240,21,86,2,224,20), (319,321,369,407,1315,258,324,282,853,41,247,250,341,369,875,395,10,468,1237,491,566,173,201,61,463,50), 
(97,171,197,212,726,58,146,198,518,28,102,341,102,257,555,195,6,429,769,273,401,33,61,40,260,44), (123,226,307,418,1083,129,416,238,884,52,300,369,257,159,849,265,18,533,1027,537,499,113,220,45,374,65), (84,483,627,692,1217,248,476,488,594,98,376,875,555,849,525,566,18,1096,1658,856,462,175,329,85,521,130), (26,38,208,194,763,26,129,174,467,26,152,395,195,265,566,127,8,488,874,347,353,35,83,24,333,28), (9,6,5,10,34,7,3,6,50,1,8,10,6,18,18,8,1,16,37,25,96,0,2,0,7,1), (642,439,507,596,1876,299,423,342,970,45,338,468,429,533,1096,488,16,246,1461,789,673,189,234,52,491,77), (241,565,675,763,2323,415,536,715,1471,112,744,1237,769,1027,1658,874,37,1461,717,1367,1088,221,484,86,598,105), (162,233,343,272,1223,208,206,454,853,33,184,491,273,537,856,347,25,789,1367,254,548,72,202,49,384,52), (31,340,361,409,662,205,326,285,335,75,240,566,401,499,462,353,96,673,1088,548,73,61,41,32,317,51), (99,21,60,97,437,14,48,44,246,9,21,173,33,113,175,35,0,189,221,72,61,20,25,10,63,7), (8,58,66,184,455,50,66,158,215,12,86,201,61,220,329,83,2,234,484,202,41,25,11,12,133,19), (7,21,24,52,199,20,4,14,104,1,2,61,40,45,85,24,0,52,86,49,32,10,12,1,34,2), (59,227,226,348,655,151,244,207,323,38,224,463,260,374,521,333,7,491,598,384,317,63,133,34,43,54), (84,40,28,46,187,24,28,23,130,3,20,50,44,65,130,28,1,77,105,52,51,7,19,2,54,42), ) def prettyPrintArray(): Unit = { val alphabet = "abcdefghijklmnopqrstuvwxyz" val toprow = alphabet.map { "%4s".format(_) }.mkString(",") printf("// %s\n", toprow) for (a <- alphabet){ val freqs: Seq[String] = for { b <- alphabet (x,y) = (alphabet.indexOf(a), alphabet.indexOf(b)) freq = letterFrequencies(x)(y) } yield "%4d".format(freq) printf("/* %s */ (%s),\n", a, freqs.mkString(",")) } } }
The following works, but the same thing based on the new scala 3 type IArray would be nice to have as well. #!/usr/bin/env scala3 // Example of declaring a generic NxN Array[Array[_]] in scala3 by using // scala 3 tuple extensions, for a concise declaration of a 2d array. // An improved version would avoid the deprecated ClassTag reference, // and the use of `asInstanceOf[List[T]]`. // Perhaps `shapeless 3` would help in this regard? object Array2d { def main(args: Array[String]): Unit = { prettyPrintArray() } // it would be nice if a Tuple with a single homogeneous field Type // automatically converted to an Array of that type, rather // than an Array[AnyRef] import scala.reflect.ClassTag def apply[T:ClassTag](tuples: Tuple *): Array[Array[T]] = { val rows: Seq[Array[T]] = for { tupe <- tuples row: Seq[T] = tupe.toList.asInstanceOf[List[T]] arr: Array[T] = (for (i:T <- row) yield i).toArray } yield arr rows.toArray } // this table is useful for analyzing nytimes Wordle guesses. lazy final val letterFrequencies = Array2d[Int]( (46,615,763,839,1745,325,628,651,1011,128,573,1319,797,1123,884,726,49,1642,2241,1162,631,299,408,97,659,184), (15,103,128,202,597,49,126,107,358,32,123,321,171,226,483,38,6,439,565,233,340,21,58,21,227,40), (63,128,106,218,689,86,75,407,526,14,241,369,197,307,627,208,5,507,675,343,361,60,66,24,226,28), (39,202,218,149,1257,108,183,175,634,35,140,407,212,418,692,194,10,596,763,272,409,97,184,52,348,46), (745,597,689,1257,919,396,568,583,1214,111,504,1315,726,1083,1217,763,34,1876,2323,1223,662,437,455,199,655,187), (25,49,86,108,396,118,75,72,295,11,68,258,58,129,248,26,7,299,415,208,205,14,50,20,151,24), (28,126,75,183,568,75,99,141,424,24,55,324,146,416,476,129,3,423,536,206,326,48,66,4,244,28), (51,107,407,175,583,72,141,53,399,21,145,282,198,238,488,174,6,342,715,454,285,44,158,14,207,23), (11,358,526,634,1214,295,424,399,169,71,392,853,518,884,594,467,50,970,1471,853,335,246,215,104,323,130), 
(28,32,14,35,111,11,24,21,71,2,31,41,28,52,98,26,1,45,112,33,75,9,12,1,38,3), (73,123,241,140,504,68,55,145,392,31,61,247,102,300,376,152,8,338,744,184,240,21,86,2,224,20), (319,321,369,407,1315,258,324,282,853,41,247,250,341,369,875,395,10,468,1237,491,566,173,201,61,463,50), (97,171,197,212,726,58,146,198,518,28,102,341,102,257,555,195,6,429,769,273,401,33,61,40,260,44), (123,226,307,418,1083,129,416,238,884,52,300,369,257,159,849,265,18,533,1027,537,499,113,220,45,374,65), (84,483,627,692,1217,248,476,488,594,98,376,875,555,849,525,566,18,1096,1658,856,462,175,329,85,521,130), (26,38,208,194,763,26,129,174,467,26,152,395,195,265,566,127,8,488,874,347,353,35,83,24,333,28), (9,6,5,10,34,7,3,6,50,1,8,10,6,18,18,8,1,16,37,25,96,0,2,0,7,1), (642,439,507,596,1876,299,423,342,970,45,338,468,429,533,1096,488,16,246,1461,789,673,189,234,52,491,77), (241,565,675,763,2323,415,536,715,1471,112,744,1237,769,1027,1658,874,37,1461,717,1367,1088,221,484,86,598,105), (162,233,343,272,1223,208,206,454,853,33,184,491,273,537,856,347,25,789,1367,254,548,72,202,49,384,52), (31,340,361,409,662,205,326,285,335,75,240,566,401,499,462,353,96,673,1088,548,73,61,41,32,317,51), (99,21,60,97,437,14,48,44,246,9,21,173,33,113,175,35,0,189,221,72,61,20,25,10,63,7), (8,58,66,184,455,50,66,158,215,12,86,201,61,220,329,83,2,234,484,202,41,25,11,12,133,19), (7,21,24,52,199,20,4,14,104,1,2,61,40,45,85,24,0,52,86,49,32,10,12,1,34,2), (59,227,226,348,655,151,244,207,323,38,224,463,260,374,521,333,7,491,598,384,317,63,133,34,43,54), (84,40,28,46,187,24,28,23,130,3,20,50,44,65,130,28,1,77,105,52,51,7,19,2,54,42), ) def prettyPrintArray(): Unit = { val alphabet = "abcdefghijklmnopqrstuvwxyz" val toprow = alphabet.map { "%4s".format(_) }.mkString(",") printf("// %s\n", toprow) for (a <- alphabet){ val freqs: Seq[String] = for { b <- alphabet (x,y) = (alphabet.indexOf(a), alphabet.indexOf(b)) freq = letterFrequencies(x)(y) } yield "%4d".format(freq) printf("/* %s */ (%s),\n", a, freqs.mkString(",")) } } }
Scala Spark: How do I bootstrap sample from a column of a Spark Dataframe?
I am looking to sample values, with replacement, from a column of a Spark DataFrame, using the Scala programming language in a Jupyter Notebook setting in a cluster environment. How do I do this?
I tried the following function that I found online:
import scala.util
def bootstrapMean(originalData: Array[Double]): Double = {
  val n = originalData.length
  def draw: Double = originalData(util.Random.nextInt(n))
  // a tail recursive loop to randomly draw and add a value to the accumulating sum
  def drawAndSumValues(current: Int, acc: Double = 0D): Double = {
    if (current == 0) acc else drawAndSumValues(current - 1, acc + draw)
  }
  drawAndSumValues(n) / n
}
Like so:
val data = stack.select("column_with_values").collect.map(_.toSeq).flatten
val m = 10
val bootstraps = Vector.fill(m)(bootstrapMean(data))
But I get the error:
An error was encountered:
<console>:47: error: type mismatch;
found : Array[Any]
required: Array[Double]
Note: Any >: Double, but class Array is invariant in type T.
You may wish to investigate a wildcard type such as `_ >: Double`. (SLS 3.2.10)
val bootstraps = Vector.fill(m)(bootstrapMean(data))
Not sure how to debug this, and whether I should bother to or try another approach. I'm looking for ideas/documentation/code. Thanks.
Update: How do I put the user mck's solution below, in a for loop? I tried the following:
var bootstrap_container = Seq()
var a = 1
for( a <- 1 until 3){
  var sampled = stack_b.select("diff_hours").sample(withReplacement = true, fraction = 0.5, seed = a)
  var smpl_average = sampled.select(avg("diff_hours")).collect()(0)(0)
  var bootstrap_smpls = bootstrap_container.union(Seq(smpl_average)).collect()
}
bootstrap_smpls
but that gives an error:
<console>:49: error: not enough arguments for method collect: (pf: PartialFunction[Any,B])(implicit bf: scala.collection.generic.CanBuildFrom[Seq[Any],B,That])That.
Unspecified value parameter pf.
var bootstrap_smpls = bootstrap_container.union(Seq(smpl_average)).collect()
You can use the sample method of dataframes, for example, if you want to sample with replacement and with a fraction of 0.5:
val sampled = stack.select("column_with_values").sample(true, 0.5)
To get the mean, you can do:
val col_average = sampled.select(avg("column_with_values")).collect()(0)(0)
EDIT:
var bootstrap_container = List[Double]()
var a = 1
for( a <- 1 until 3){
  var sampled = stack_b2.select("diff_hours").sample(withReplacement = true, fraction = 0.5, seed = a)
  var smpl_average = sampled.select(avg("diff_hours")).collect()(0)(0)
  bootstrap_container = bootstrap_container :+ smpl_average.asInstanceOf[Double]
}
var mean_bootstrap = bootstrap_container.reduce(_ + _) / bootstrap_container.length
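If the column is small enough to collect to the driver, the bootstrap itself can also be done in plain Scala. The sketch below assumes the values are already available as an Array[Double] (for example by mapping each collected Row with getDouble(0) instead of toSeq, assuming the column has a double type, which also avoids the Array[Any] mismatch from the question). BootstrapSketch, bootstrapMeans and the seed parameter are names made up for this illustration.

```scala
import scala.util.Random

object BootstrapSketch {
  // Mean of one bootstrap resample: n draws with replacement from `data`.
  def bootstrapMean(data: Array[Double], rng: Random): Double = {
    val n = data.length
    Iterator.fill(n)(data(rng.nextInt(n))).sum / n
  }

  // `m` bootstrap means from one seeded generator, so a run can be reproduced.
  def bootstrapMeans(data: Array[Double], m: Int, seed: Long): Vector[Double] = {
    val rng = new Random(seed)
    Vector.fill(m)(bootstrapMean(data, rng))
  }
}
```

Unlike sample(withReplacement = true, fraction = 0.5), each resample here has exactly as many elements as the original data, which is the usual bootstrap convention.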
Type parameter issue in Scala with generic function
I'm trying to come up with a generic function (toBitSet) using a type parameter T.
def toBitSet[T:Integral](x:T, valueBitwidth:Int, filterBitwidth:Int, bigEndian:Boolean = true, shift:Int = 0) = {
  BitSet((for (i <- 0 to (valueBitwidth - 1) if (((x & 0xFF) >> i) & 1) == 1) yield (i + shift)): _*)
}
The byteToBitSet and shortToBitSet functions are specializations of the generic function.
def byteToBitSet(x:Byte, filterBitwidth:Int, bigEndian:Boolean = true, shift:Int = 0) = {
  toBitSet[Byte](x = x, valueBitwidth = 8, filterBitwidth = filterBitwidth, bigEndian = bigEndian, shift = shift)
}
def shortToBitSet(x:Short, filterBitwidth:Int, bigEndian:Boolean = true, shift:Int = 0) = {
  toBitSet[Short](x = x, valueBitwidth = 16, filterBitwidth = filterBitwidth, bigEndian = bigEndian, shift = shift)
}
However, Scala doesn't understand the operators (>>, &, ==, +) on type T and shows an error message. I specified that T is an Integral type, but it doesn't work. How can I solve this issue?
The type signature
def func[T: Integral](arg: T) = {}
is actually a syntactic shorthand for:
def func[T](arg: T)(implicit ev: Integral[T]) = {}
("ev" is often chosen as the name for this "evidence" argument.)
The Integral trait outlines what operations you can then use on elements of type T. Example: addition is
ev.plus(t1, t2)
If you import Integral.Implicits._ then you can use the more natural infix notation:
t1 + t2
Unfortunately, the Integral trait doesn't include bit-wise operations like & and >>.
If you can modify your algorithm to use only those available ops you'll get the function you're after.
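One way to stay within what Integral offers is to convert the value to a Long with ev.toLong (inherited from Numeric) and do the bit tests on that Long. This is only a sketch under a few assumptions: the filterBitwidth and bigEndian parameters are left out because the question's body never uses them, valueBitwidth is assumed to be at most 64, and the object name BitSetSketch is made up for the example.

```scala
import scala.collection.immutable.BitSet

object BitSetSketch {
  // Generic over any Integral type; the bit tests run on a Long copy of x.
  def toBitSet[T](x: T, valueBitwidth: Int, shift: Int = 0)(implicit ev: Integral[T]): BitSet = {
    val v = ev.toLong(x) // sign-extended, so the low bits match the original value
    val setBits = (0 until valueBitwidth).filter(i => ((v >> i) & 1L) == 1L)
    BitSet(setBits.map(_ + shift): _*)
  }

  def byteToBitSet(x: Byte, shift: Int = 0): BitSet = toBitSet(x, 8, shift)
  def shortToBitSet(x: Short, shift: Int = 0): BitSet = toBitSet(x, 16, shift)
}
```

Because only the lowest valueBitwidth bits are inspected, the sign extension done by toLong does not affect the result for Byte or Short inputs.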
Type mismatch from partition in Scala (expected (Set[String]...), actual (Set[String]...) )
I have a partition method that creates a tuple of two sets of strings.
def partition(i:Int) = {
  dictionary.keySet.partition(dictionary(_)(i) == true)
}
I also have a map that maps an integer to the return value from the partition method.
val m = Map[Int, (Set[String], Set[String])]()
for (i <- Range(0, getMaxIndex())) {
  m(i) = partition(i)
}
The issue is that I have a type mismatch error, but the error message does not make sense to me. What might be wrong?
This is the code:
import scala.collection.mutable.Map
import scala.collection.{BitSet}
case class Partition(dictionary:Map[String, BitSet]) {
  def max(x:Int, y:Int) = if (x > y) x else y
  def partition(i:Int) = {
    dictionary.keySet.partition(dictionary(_)(i) == true)
  }
  def getMaxIndex() = {
    val values = dictionary.values
    (0 /: values) ((m, bs) => max(m, bs.last))
  }
  def get() = {
    val m = Map[Int, (Set[String], Set[String])]()
    for (i <- Range(0, getMaxIndex())) {
      m(i) = partition(i)
    }
    m
  }
}
When I compile your example, the error is clear:
<console>:64: error: type mismatch;
found : (scala.collection.Set[String], scala.collection.Set[String])
required: (scala.collection.immutable.Set[String], scala.collection.immutable.Set[String])
m(i) = partition(i)
^
Looking into the API, the keySet method of a mutable map does not guarantee that the returned set is immutable. Compare this with keySet on an immutable Map, which does indeed return an immutable set.
Therefore, you could do one of the following:
use an immutable Map and a var
force the result of your partition method to return an immutable set (e.g. toSet)
define the value type of your map to be collection.Set instead of Predef.Set, which is an alias for collection.immutable.Set.
To clarify these types, it helps to specify an explicit return type for your public methods (get and partition).
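The toSet option can be sketched as follows. This is an illustrative rewrite rather than the asker's final code: the object name PartitionSketch and the method name maxIndex are made up here, an immutable Map is built directly instead of filling a mutable one, and the index range is inclusive, which is an assumption (the question used an exclusive Range).

```scala
import scala.collection.mutable
import scala.collection.BitSet

object PartitionSketch {
  case class Partition(dictionary: mutable.Map[String, BitSet]) {
    // Explicit return type: both halves are immutable sets thanks to toSet.
    def partition(i: Int): (Set[String], Set[String]) = {
      val (in, out) = dictionary.keySet.partition(k => dictionary(k)(i))
      (in.toSet, out.toSet)
    }

    // Largest bit index used by any entry; empty bit sets are skipped.
    def maxIndex: Int =
      dictionary.values.foldLeft(0)((m, bs) => if (bs.isEmpty) m else math.max(m, bs.last))

    // Inclusive range (an assumption, see the note above).
    def get(): Map[Int, (Set[String], Set[String])] =
      (0 to maxIndex).map(i => i -> partition(i)).toMap
  }
}
```

Declaring the return types makes the compiler report any mismatch at the method definition instead of at the assignment into the map.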
How to add 'Array[Ordered[Any]]' as a method parameter
Below is an implementation of selection sort written in Scala.
The line ss.sort(arr) causes this error:
type mismatch; found : Array[String] required: Array[Ordered[Any]]
Since the type Ordered is inherited by StringOps, should this type not be inferred? How can I pass the array of Strings to the sort() method?
Here is the complete code:
object SelectionSortTest {
  def main(args: Array[String]){
    val arr = Array("Hello","World")
    val ss = new SelectionSort()
    ss.sort(arr)
  }
}
class SelectionSort {
  def sort(a : Array[Ordered[Any]]) = {
    var N = a.length
    for (i <- 0 until N) {
      var min = i
      for(j <- i + 1 until N){
        if( less(a(j) , a(min))){
          min = j
        }
        exchange(a , i , min)
      }
    }
  }
  def less(v : Ordered[Any] , w : Ordered[Any]) = {
    v.compareTo(w) < 0
  }
  def exchange(a : Array[Ordered[Any]] , i : Integer , j : Integer) = {
    var swap : Ordered[Any] = a(i)
    a(i) = a(j)
    a(j) = swap
  }
}
Array is invariant. You cannot use an Array[A] as an Array[B] even if A is a subtype of B. See here for why: Why are Arrays invariant, but Lists covariant?
Ordered is invariant as well, so your implementation of less will not work either.
You should make your implementation generic in the following way:
object SelectionSortTest {
  def main(args: Array[String]){
    val arr = Array("Hello","World")
    val ss = new SelectionSort()
    ss.sort(arr)
  }
}
class SelectionSort {
  def sort[T <% Ordered[T]](a : Array[T]) = {
    var N = a.length
    for (i <- 0 until N) {
      var min = i
      for(j <- i + 1 until N){
        if(a(j) < a(min)){ // call less directly on Ordered[T]
          min = j
        }
      }
      // swap once per outer iteration, after the minimum has been found
      exchange(a , i , min)
    }
  }
  def exchange[T](a : Array[T] , i : Integer , j : Integer) = {
    var swap = a(i)
    a(i) = a(j)
    a(j) = swap
  }
}
The somewhat bizarre statement T <% Ordered[T] means "any type T that can be implicitly converted to Ordered[T]". This ensures that you can still use the less-than operator.
See this for details: What are Scala context and view bounds?
The answer by @gzm0 (with some very nice links) suggests Ordered. I'm going to complement it with an answer covering Ordering, which provides equivalent functionality without imposing as much on your classes.
Let's adjust the sort method to accept an array of type 'T' for which an implicit Ordering instance is defined.
def sort[T : Ordering](a: Array[T]) = {
  val ord = implicitly[Ordering[T]]
  import ord._ // now comparison operations such as '<' are available for 'T'
  // ...
  if (a(j) < a(min))
  // ...
}
The [T : Ordering] and implicitly[Ordering[T]] combo is equivalent to an implicit parameter of type Ordering[T]:
def sort[T](a: Array[T])(implicit ord: Ordering[T]) = {
  import ord._
  // ...
}
Why is this useful? Imagine you are provided with a case class Account(balance: Int) by some third party. You can now add an Ordering for it like so:
// somewhere in scope
implicit val accountOrdering = new Ordering[Account] {
  def compare(x: Account, y: Account) = x.balance - y.balance
}
// or, more simply
implicit val accountOrdering: Ordering[Account] = Ordering by (_.balance)
As long as that instance is in scope, you should be able to use sort(accounts).
If you want to use some different ordering, you can also provide it explicitly, like so: sort(accounts)(otherOrdering).
Note that this isn't very different from providing an implicit conversion to Ordering (at least not within the context of this question).
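Putting the pieces together, a complete in-place selection sort with an implicit Ordering might look like the sketch below. It calls ord.lt explicitly, which is the same comparison that a(j) < a(min) performs after import ord._. The names SelectionSortSketch and byBalance are made up for the example, and Account is the hypothetical third-party class from the text.

```scala
object SelectionSortSketch {
  // In-place selection sort for any T with an Ordering in implicit scope.
  def sort[T](a: Array[T])(implicit ord: Ordering[T]): Unit = {
    for (i <- 0 until a.length) {
      var min = i
      // find the index of the smallest remaining element
      for (j <- i + 1 until a.length)
        if (ord.lt(a(j), a(min))) min = j
      // swap once per outer iteration, after the inner scan is finished
      val tmp = a(i)
      a(i) = a(min)
      a(min) = tmp
    }
  }

  // Hypothetical third-party class and an Ordering supplied from outside it.
  case class Account(balance: Int)
  implicit val byBalance: Ordering[Account] = Ordering.by(_.balance)
}
```

Passing a different ordering explicitly, for example sort(accounts)(byBalance.reverse), sorts the same array in descending order without touching Account itself.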
Even though, when coding in Scala, I usually prefer a functional programming style (via combinators or recursion) over an imperative style (via variables and iterations), this time, for this specific problem, old-school imperative nested loops result in simpler and more performant code.
I don't think falling back to imperative style is a mistake for certain classes of problems, such as sorting algorithms, which usually transform the input buffer in place (more like a procedure) rather than returning a new sorted collection.
Here is my solution:
package bitspoke.algo
import scala.math.Ordered
import scala.collection.mutable.Buffer
abstract class Sorter[T <% Ordered[T]] {
  // algorithm provided by subclasses
  def sort(buffer : Buffer[T]) : Unit
  // check if the buffer is sorted (non-decreasing, so duplicates are allowed)
  def sorted(buffer : Buffer[T]) = buffer.isEmpty || buffer.view.zip(buffer.tail).forall { t => t._2 >= t._1 }
  // swap elements in buffer
  def swap(buffer : Buffer[T], i:Int, j:Int) {
    val temp = buffer(i)
    buffer(i) = buffer(j)
    buffer(j) = temp
  }
}
class SelectionSorter[T <% Ordered[T]] extends Sorter[T] {
  def sort(buffer : Buffer[T]) : Unit = {
    for (i <- 0 until buffer.length) {
      var min = i
      for (j <- i until buffer.length) {
        if (buffer(j) < buffer(min))
          min = j
      }
      swap(buffer, i, min)
    }
  }
}
As you can see, to achieve parametric polymorphism, rather than using java.lang.Comparable, I preferred scala.math.Ordered and Scala view bounds rather than upper bounds. That certainly works thanks to Scala's implicit conversions of primitive types to rich wrappers.
You can write a client program as follows:
import bitspoke.algo._
import scala.collection.mutable._
val sorter = new SelectionSorter[Int]
val buffer = ArrayBuffer(3, 0, 4, 2, 1)
sorter.sort(buffer)
assert(sorter.sorted(buffer))