How do I use the forall loop to check if all rows have the same number of columns in a Vector in Scala? - scala

So I have a generic vector: 1:vec: Vector[Vector[T]]
Now I wanna use the require and forall to check if the length of each row is the same.
This is how far I've gotten:
2:require(vec.forall(row => data(row).length == ???))
So essentially I wanna make sure that each row has same number of columns, I don't wanna use 3:data(row + 1).length since I could probably use a for loop in that case. Can anyone give a tip on how to resolve code 2?

If all the rows must have the same length, you can compare each row with any of the others.
if (vec.size < 2) true // vacuous
else {
val firstLength = data(vec.head).length
vec.forall(row => data(row).length == firstLength)
}
Strictly speaking, the first comparison in the forall will always be true, but Vector.tail is probably more work than performing the comparison; if data(row).length is particularly expensive, making it
vec.tail.forall(...)
might be worth it.
(If instead of a Vector, we were dealing with a List, tail is most definitely cheaper than data(row).length, though other cautions around using List may apply)
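As a minimal sketch of the same idea applied directly to the question's Vector[Vector[T]] (without the data(...) lookup), the check can be folded into a single expression. The names isRectangular and Matrix are hypothetical and are not part of the original post.

```scala
// Hypothetical helper: true when every row has the same length as the first row.
// An empty outer vector is treated as vacuously rectangular.
def isRectangular[T](vec: Vector[Vector[T]]): Boolean =
  vec.headOption.forall(first => vec.forall(_.length == first.length))

// Hypothetical wrapper class showing where require would typically sit.
final class Matrix[T](val rows: Vector[Vector[T]]) {
  require(isRectangular(rows), "all rows must have the same number of columns")
}

val ok = isRectangular(Vector(Vector(1, 2), Vector(3, 4)))   // rows of equal length
val ragged = isRectangular(Vector(Vector(1), Vector(1, 2)))  // rows of different length
```

Using headOption avoids the explicit size check, since forall on an empty Option is true.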

Consider a 3x4 vector of vectors, for instance
val xs = Vector.tabulate(3) { _ => Vector.tabulate(4) { _ => 1 }}
namely,
Vector(Vector(1, 1, 1, 1),
Vector(1, 1, 1, 1),
Vector(1, 1, 1, 1))
Collect the size for each nested vector,
val s = xs.map {_.size}
// Vector(4, 4, 4)
Now we can compare consecutive sizes with Vector.forall by pairing them with
s.zip(s.drop(1))
// Vector((4,4), (4,4))
where the first pair corresponds to the first and second vector sizes, and the second pair to the second and third vector sizes; thus
s.zip(s.drop(1)).forall { case(a,b) => a == b }
// true
With this approach we can define other predicates in Vector.forall, such as requiring strictly increasing sizes,
val xs = Vector(Vector(1), Vector(1,2), Vector(1,2,3))
val s = xs.map {_.size}
// Vector(1, 2, 3)
s.zip(s.drop(1))
// Vector((1,2), (2,3))
s.zip(s.drop(1)).forall { case(a,b) => a == b }
// false
s.zip(s.drop(1)).forall { case(a,b) => a < b }
// true
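The zip-with-drop pattern above could be wrapped in a small reusable function. This is only a sketch; allAdjacent is a hypothetical name, and the predicate is passed in a second parameter list so its argument types can be inferred.

```scala
// Hypothetical helper: checks a predicate on every pair of neighbouring elements.
def allAdjacent[A](xs: Seq[A])(p: (A, A) => Boolean): Boolean =
  xs.zip(xs.drop(1)).forall { case (a, b) => p(a, b) }

val sizes = Vector(Vector(1), Vector(1, 2), Vector(1, 2, 3)).map(_.size)
val sameSize = allAdjacent(sizes)(_ == _)    // all rows equally long?
val increasing = allAdjacent(sizes)(_ < _)   // each row longer than the previous one?
```

For sequences with fewer than two elements the zipped list is empty, so the result is true.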

Related

Avoid ListBuffer while preparing an element-wise multiplication of two SparseVectors

I'm trying to implement an element-wise multiplication of two ml.linalg.SparseVector instances (also called a Hadamard product).
A SparseVector represents a vector, but rather than having space taken up by all the "0" values, they are omitted. The vector is represented as two lists of Indices and Values.
For example: SparseVector(indices: [0, 100, 100000], values: [0.25, 1, 0.8]) concisely represents an array of 100,000 elements, where only 3 values are non-zero.
I now need an element-wise multiplication of two of these, and there seems to be no built-in. Conceptually, it should be simple - any indices they don't have in common are dropped, and for the indices in common, the numbers are multiplied together.
For example: SparseVector(indices: [0, 500, 100000], values: [10, 1, 10]) when multiplied with the above should return: SparseVector(indices: [0, 100000], values: [2.5, 8])
Sadly, I've found no built-in for this. I have an approach for doing this in a single pass, but it isn't very scala-y, it has to build up the list in a loop as it discovers which indices are in common, and then grab the corresponding values for each index (which have the same cardinal position, but in a second array).
import org.apache.spark.ml.linalg._
import org.apache.spark.sql.functions.udf
import scala.collection.mutable.ListBuffer
// Return a new SparseVector whose values are the element-wise product (Hadamard product)
val multSparseVectors = udf((v1: SparseVector, v2: SparseVector) => {
// val commonIndexes = v1.indices.intersect(v2.indices); // Missing scale factors are assumed to have a value of 0, so only common elements remain
// TODO: No clear way to map common indices to the values that go with those indices. E.g. no "valueForIndex" method
// new SparseVector(v1.size, commonIndexes, commonIndexes.map(i => v1.valueForIndex(i) * v2.valueForIndex(i)).toArray);
val indices = ListBuffer[Int](); // TODO: Some way to do this without mutable lists?
val values = ListBuffer[Double]();
var v1Pos = 0; // Current index of SparseVector v1 (we will be making a single pass)
var v2pos = 0; // Current index of SparseVector v2 (we will be making a single pass)
while(v1Pos < v1.indices.length && v2pos < v2.indices.length) {
while(v2pos < v2.indices.length && v2.indices(v2pos) < v1.indices(v1Pos))
  v2pos += 1; // Advance our position in SparseVector 2 until we've matched or passed the current SparseVector 1 index
if(v2pos < v2.indices.length && v1.indices(v1Pos) == v2.indices(v2pos)) {
indices += v1.indices(v1Pos);
values += v1.values(v1Pos) * v2.values(v2pos);
}
v1Pos += 1;
}
new SparseVector(v1.size, indices.toArray, values.toArray);
})
spark.udf.register("multSparseVectors", multSparseVectors)
Can anyone think of a way that I can do this using a map or similar? My main goal is I want to avoid having to make multiple O(N) passes over the second vector to "lookup" the position of a value in the indices list so that I can grab the corresponding values entry, because this would take O(K + N*2) time when I know there's an O(K + N) solution possible.
I've come up with a solution by boiling this problem into a more general one:
Finding the indices at which two arrays intersect
Given an answer to the above question (where do the two arrays v1.indices and v2.indices intersect), we can trivially use those positions to extract the new SparseVector indices, and the values from each vector to be multiplied together.
The solution is given below:
%scala
import scala.annotation.tailrec
import org.apache.spark.ml.linalg._
import org.apache.spark.sql.functions.udf
// This fanciness from https://stackoverflow.com/a/71928709/529618 finds the indices at which two lists intersect
@tailrec
def indicesOfIntersection(left: List[Int], right: List[Int], lidx: Int = 0, ridx: Int = 0, result: List[(Int, Int)] = Nil): List[(Int, Int)] = (left, right) match {
case (Nil, _) | (_, Nil) => result.reverse
case (l::tail, r::_) if l < r => indicesOfIntersection(tail, right, lidx+1, ridx, result)
case (l::_, r::tail) if l > r => indicesOfIntersection(left, tail, lidx, ridx+1, result)
case (l::ltail, r::rtail) => indicesOfIntersection(ltail, rtail, lidx+1, ridx+1, (lidx, ridx) :: result)
}
// Return a new SparseVector whose values are the element-wise product (Hadamard product)
val multSparseVectors = udf((v1: SparseVector, v2: SparseVector) => {
val intersection = indicesOfIntersection(v1.indices.toList, v2.indices.toList);
new SparseVector(v1.size,
intersection.map{case (x1,_) => v1.indices(x1)}.toArray,
intersection.map{case (x1,x2) => v1.values(x1) * v2.values(x2)}.toArray);
})
spark.udf.register("multSparseVectors", multSparseVectors)
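For comparison, here is a sketch of the same single-pass merge that walks the index arrays by position instead of converting them to List first. It deliberately avoids the Spark classes so it can run on its own: plain arrays stand in for SparseVector.indices and SparseVector.values, and the name hadamard is hypothetical. It assumes both index arrays are sorted in ascending order.

```scala
import scala.annotation.tailrec

// Hypothetical stand-in for the UDF body: i1/x1 and i2/x2 play the role of
// the indices/values arrays of two sparse vectors (indices sorted ascending).
def hadamard(i1: Array[Int], x1: Array[Double],
             i2: Array[Int], x2: Array[Double]): (Array[Int], Array[Double]) = {
  @tailrec
  def loop(p1: Int, p2: Int, acc: List[(Int, Double)]): List[(Int, Double)] =
    if (p1 >= i1.length || p2 >= i2.length) acc.reverse          // one side exhausted
    else if (i1(p1) < i2(p2)) loop(p1 + 1, p2, acc)              // advance the left cursor
    else if (i1(p1) > i2(p2)) loop(p1, p2 + 1, acc)              // advance the right cursor
    else loop(p1 + 1, p2 + 1, (i1(p1), x1(p1) * x2(p2)) :: acc)  // common index: multiply

  val (idx, vals) = loop(0, 0, Nil).unzip
  (idx.toArray, vals.toArray)
}

val (idx, vals) = hadamard(
  Array(0, 100, 100000), Array(0.25, 1.0, 0.8),
  Array(0, 500, 100000), Array(10.0, 1.0, 10.0))
```

Each cursor only moves forward, so the merge touches every index at most once.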

How to trick Scala map method to produce more than one output per each input item?

A fairly complex algorithm is applied to a list of rows from a Spark Dataset (the list was obtained using groupByKey and flatMapGroups). Most rows are transformed 1:1 from input to output, but some scenarios require more than one output per input. The input row schema can change at any time. map() fits the 1:1 transformation well, but is there a way to use it to produce 1:n output?
The only workaround I found relies on the foreach method, which has unpleasant overhead caused by creating the initial empty list (remember that, unlike in the simplified example below, the real-life list structure changes unpredictably).
My original problem is too complex to share here, but this example demonstrates the concept. Let's have a list of integers. Each should be transformed into its square value and if the input is even it should also transform into one half of the original value:
val X = Seq(1, 2, 3, 4, 5)
val y = X.map(x => x * x) //map is intended for 1:1 transformation so it works great here
val z = X.map(x => for(n <- 1 to 5) (n, x * x)) //this attempt FAILS - generates list of five rows with empty tuples
// this work-around works, but newX definition is problematic
var newX = List[Int]() //in reality defining as head of the input list and dropping result's tail at the end
val za = X.foreach(x => {
newX = x*x :: newX
if(x % 2 == 0) newX = (x / 2) :: newX
})
newX
Is there a better way than foreach construct?
.flatMap produces any number of outputs from a single input.
val X = Seq(1, 2, 3, 4, 5)
X.flatMap { x =>
  if (x % 2 == 0) Seq(x * x, x / 2) else Seq(x * x)
}
// => Seq[Int] = List(1, 4, 1, 9, 16, 2, 25)
flatMap in more detail
In X.map(f), f is a function that maps each input to a single output. By contrast, in X.flatMap(g), the function g maps each input to a sequence of outputs. flatMap then takes all the sequences produced (one for each element of X) and concatenates them.
The neat thing is .flatMap works not just for sequences, but for all sequence-like objects. For an option, for instance, Option(x)#flatMap(g) will allow g to return an Option. Similarly, Future(x)#flatMap(g) will allow g to return a Future.
Whenever the number of elements you return depends on the input, you should think of flatMap.
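A short sketch of the points above: flatMap on a sequence behaves like map followed by flatten, and on an Option it chains steps that may each produce no result. The helper names squareAndMaybeHalf and half are made up for this illustration.

```scala
val xs = Seq(1, 2, 3, 4, 5)

// Hypothetical helper: the square of x, plus x / 2 when x is even.
def squareAndMaybeHalf(x: Int): Seq[Int] =
  if (x % 2 == 0) Seq(x * x, x / 2) else Seq(x * x)

val viaFlatMap = xs.flatMap(x => squareAndMaybeHalf(x))
val viaMapFlatten = xs.map(x => squareAndMaybeHalf(x)).flatten  // same elements, same order

// Hypothetical helper: halves even numbers, yields nothing for odd ones.
def half(x: Int): Option[Int] = if (x % 2 == 0) Some(x / 2) else None

val quarterOfEight = Some(8).flatMap(x => half(x)).flatMap(x => half(x)) // 8 -> 4 -> 2
val quarterOfSix = Some(6).flatMap(x => half(x)).flatMap(x => half(x))   // 6 -> 3 -> stops
```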

How to traverse array from both left to right and from right to left?

Suppose I have an imperative algorithm that keeps two indices left and right and moves them from left to right and from right to left
var left = 0
var right = array.length - 1
while (left < right) { .... } // move left and right inside the loop
Now I would like to write this algorithm without mutable indices.
How can I do that ? Do you have any examples of such algorithms ? I would prefer a non-recursive approach.
You can map pairs of elements between your list and its reverse, then go from left to right through that list of pairs and keep taking as long as your condition is satisfied:
val list = List(1, 2, 3, 4, 5)
val zipped = list zip list.reverse
val filtered = zipped takeWhile { case (a, b) => (a < b) }
Value of filtered is List((1, 5), (2, 4)).
Now you can do whatever you need with those elements:
val result = filtered map {
case (a, b) =>
// do something with each left-right pair, e.g. sum them
a + b
}
println(result) // List(6, 6)
If you need some kind of context-dependent operation (that is, each iteration depends on the result of the previous one) then you have to use a more powerful abstraction (monad), but let's not go there if this is enough for you. Even better would be to simply use recursion, as pointed out by others, but you said that's not an option.
EDIT:
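Version without the extra pass for reversing. It looks up the mirrored element with list(list.length - 1 - index); note that this lookup is constant-time only on an indexed sequence such as Vector or Array, and linear on a List: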
val list = List(1, 2, 3, 4, 5)
val zipped = list.view.zipWithIndex
val filtered = zipped takeWhile { case (a, index) => (a < list(list.length - 1 - index)) }
println(filtered.toList) // List((1, 0), (2, 1))
val result = filtered map {
case (elem, index) => // do something with each left-right pair, e.g. sum them
val (a, b) = (elem, list(list.length - 1 - index))
a + b
}
println(result.toList) // List(6, 6)
Use reverseIterator:
scala> val arr = Array(1,2,3,4,5)
arr: Array[Int] = Array(1, 2, 3, 4, 5)
scala> arr.iterator.zip(arr.reverseIterator).foreach(println)
(1,5)
(2,4)
(3,3)
(4,2)
(5,1)
This function is efficient on IndexedSeq collections, which Array is implicitly convertible to.
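Building on the iterator zip shown above, the traversal can stop at the middle so that each left/right pair is visited once, which mirrors the while (left < right) loop from the question. This is a sketch only; mirroredPairs and isPalindrome are hypothetical names, and a palindrome check is used purely as an example of a two-pointer algorithm.

```scala
// Hypothetical helper: pairs element i with element (length - 1 - i), up to the middle.
def mirroredPairs[A](xs: IndexedSeq[A]): List[(A, A)] =
  xs.iterator.zip(xs.reverseIterator).take(xs.length / 2).toList

// Example two-pointer algorithm expressed without mutable indices.
def isPalindrome[A](xs: IndexedSeq[A]): Boolean =
  mirroredPairs(xs).forall { case (l, r) => l == r }

val pairs = mirroredPairs(Vector(1, 2, 3, 4, 5))
```

For an odd-length input the middle element is paired with nothing, just as the imperative loop would skip it once left equals right.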
It really depends on what needs to be done at each iteration, but here's something to think about.
array.foldRight(0){case (elem, index) =>
if (index < array.length/2) {
/* array(index) and elem are opposite elements in the array */
/* do whatever (note: requires side effects) */
index+1
} else index // do nothing
} // ignore result
Upside: Traverse the array only once and no mutable variables.
Downside: Requires side effects (but that was implied in your example). Also, it'd be better if it traversed only half the array, but that would require early breakout and Scala doesn't offer an easy/elegant solution for that.
myarray = [1,2,3,4,5,6]
rmyarray = myarray[::-1]
Final_Result = []
for i in range(len(myarray)//2):
    Final_Result.append(myarray[i])
    Final_Result.append(rmyarray[i])
print(Final_Result)
# This is the simple approach I think 😉.

How do I populate a list of objects with new values

Apologies: I'm well noob
I have an items class
class item(ind:Int,freq:Int,gap:Int){}
I have an ordered list of ints
val listVar = a.toList
where a is an array
I want a list of items called metrics where
ind is the (unique) integer
freq is the number of times that ind appears in list
gap is the minimum gap between ind and the number in the list before it
so far I have:
def metrics = for {
  n <- 0 until 255
  if listVar.count(_ == n) > 0
} yield new item(n, listVar.count(_ == n), 0)
It's crap and I know it - any clues?
Well, some of it is easy:
val freqMap = listVar groupBy identity mapValues (_.size)
This gives you ind and freq. To get gap I'd use a fold:
val gapMap = listVar.sliding(2).foldLeft(Map[Int, Int]()) {
case (map, List(prev, ind)) =>
map + (ind -> (map.getOrElse(ind, Int.MaxValue) min ind - prev))
}
Now you just need to unify them:
freqMap.keys.map( k => new item(k, freqMap(k), gapMap.getOrElse(k, 0)) )
Ideally you want to traverse the list only once and, for each distinct Int along the way, increment a counter (the frequency) while keeping track of the minimum gap.
You can use a case class to store the frequency and the minimum gap, the value stored will be immutable. Note that minGap may not be defined.
case class Metric(frequency: Int, minGap: Option[Int])
In the general case you can use a Map[Int, Metric] to lookup the Metric immutable object. Looking for the minimum gap is the harder part. To look for gap, you can use the sliding(2) method. It will traverse the list with a sliding window of size two allowing to compare each Int to its previous value so that you can compute the gap.
Finally you need to accumulate and update the information as you traverse the list. This can be done by folding each element of the list into your temporary result until you traverse the whole list and get the complete result.
Putting things together:
listVar.sliding(2).foldLeft(
Map[Int, Metric]().withDefaultValue(Metric(0, None))
) {
case (map, List(a, b)) =>
val metric = map(b)
val newGap = metric.minGap match {
case None => math.abs(b - a)
case Some(gap) => math.min(gap, math.abs(b - a))
}
val newMetric = Metric(metric.frequency + 1, Some(newGap))
map + (b -> newMetric)
case (map, List(a)) =>
map + (a -> Metric(1, None))
case (map, _) =>
map
}
Result for listVar: List[Int] = List(2, 2, 4, 4, 0, 2, 2, 2, 4, 4)
scala.collection.immutable.Map[Int,Metric] = Map(2 -> Metric(4,Some(0)),
4 -> Metric(4,Some(0)), 0 -> Metric(1,Some(4)))
You can then turn the result into your desired item class using map.toSeq.map { case (i, m) => new item(i, m.frequency, m.minGap.getOrElse(-1)) }.
You could also create your item objects directly in the process, but I thought the code would be harder to read.
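As an alternative sketch of the single-pass idea, the fold can carry the previous element alongside the map instead of using sliding(2). Note one deliberate difference: this variant also counts the very first element of the list, so its frequencies can differ from the output shown above. The function name metricsOf is hypothetical, and Metric is redeclared here only to keep the example self-contained.

```scala
case class Metric(frequency: Int, minGap: Option[Int])

// Hypothetical single-pass variant: the accumulator is (previous element, metrics so far).
def metricsOf(xs: List[Int]): Map[Int, Metric] = {
  val start = (Option.empty[Int], Map.empty[Int, Metric])
  val (_, result) = xs.foldLeft(start) { case ((prev, acc), x) =>
    val old = acc.getOrElse(x, Metric(0, None))
    val gapHere = prev.map(p => math.abs(x - p))   // no gap for the first element
    val minGap = (old.minGap.toList ++ gapHere.toList).reduceOption(_ min _)
    (Some(x), acc.updated(x, Metric(old.frequency + 1, minGap)))
  }
  result
}

val metrics = metricsOf(List(2, 2, 4, 4, 0, 2, 2, 2, 4, 4))
```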

help rewriting in functional style

I'm learning Scala as my first functional-ish language. As one of the problems, I was trying to find a functional way of generating the sequence S up to n places. S is defined so that S(1) = 1, and S(x) = the number of times x appears in the sequence. (I can't remember what this is called, but I've seen it in programming books before.)
In practice, the sequence looks like this:
S = 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7 ...
I can generate this sequence pretty easily in Scala using an imperative style like this:
def genSequence(numItems: Int) = {
require(numItems > 0, "numItems must be >= 1")
var list: List[Int] = List(1)
var seq_no = 2
var no = 2
var no_nos = 0
var num_made = 1
while(num_made < numItems) {
if(no_nos < seq_no) {
list = list :+ no
no_nos += 1
num_made += 1
} else if(no % 2 == 0) {
no += 1
no_nos = 0
} else {
no += 1
seq_no += 1
no_nos = 0
}
}
list
}
But I don't really have any idea how to write this without using vars and the while loop.
Thanks!
Pavel's answer has come closest so far, but it's also inefficient. Two flatMaps and a zipWithIndex are overkill here :)
My understanding of the required output:
The results contain all the positive integers (starting from 1) at least once
each number n appears in the output (n/2) + 1 times
As Pavel has rightly noted, the solution is to start with a Stream then use flatMap:
Stream from 1
This generates a Stream, a potentially never-ending sequence that only produces values on demand. In this case, it's generating 1, 2, 3, 4... all the way up to Infinity (in theory) or Integer.MAX_VALUE (in practice)
Streams can be mapped over, as with any other collection. For example: (Stream from 1) map { 2 * _ } generates a Stream of even numbers.
You can also use flatMap on Streams, allowing you to map each input element to zero or more output elements; this is key to solving your problem:
val strm = (Stream from 1) flatMap { n => Stream.fill(n/2 + 1)(n) }
So... How does this work? For the element 3, the lambda { n => Stream.fill(n/2 + 1)(n) } will produce the output stream 3,3. For the first 5 integers you'll get:
1 -> 1
2 -> 2, 2
3 -> 3, 3
4 -> 4, 4, 4
5 -> 5, 5, 5
etc.
and because we're using flatMap, these will be concatenated, yielding:
1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, ...
Streams are memoised, so once a given value has been calculated it'll be saved for future reference. However, all the preceding values have to be calculated at least once. If you want the full sequence then this won't cause any problems, but it does mean that generating S(10796) from a cold start is going to be slow! (a problem shared with your imperative algorithm). If you need to do this, then none of the solutions so far is likely to be appropriate for you.
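If memoisation is not needed, the same flatMap idea can be sketched with an Iterator, which produces values on demand but does not retain them. The name genSequenceLazy is hypothetical; the rule "n appears n/2 + 1 times" is the one stated above.

```scala
// Hypothetical variant of genSequence built on Iterator instead of Stream.
def genSequenceLazy(numItems: Int): List[Int] = {
  require(numItems > 0, "numItems must be >= 1")
  Iterator.from(1)
    .flatMap(n => Iterator.fill(n / 2 + 1)(n))  // emit n exactly n/2 + 1 times
    .take(numItems)
    .toList
}

val first19 = genSequenceLazy(19)
```

Because an Iterator can be consumed only once, a fresh one is created on each call.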
The following code produces exactly the same sequence as yours:
val seq = Stream.from(1)
.flatMap(Stream.fill(2)(_))
.zipWithIndex
.flatMap(p => Stream.fill(p._1)(p._2))
.tail
However, if you want to produce the Golomb sequence (that complies with the definition, but differs from your sample code result), you may use the following:
val seq = 1 #:: a(2)
def a(n: Int): Stream[Int] = (1 + seq(n - seq(seq(n - 2) - 1) - 1)) #:: a(n + 1)
You may check my article for more examples of how to deal with number sequences in functional style.
Here is a translation of your code to a more functional style:
def genSequence(numItems: Int): List[Int] = {
genSequenceR(numItems, 2, 2, 0, 1, List[Int](1))
}
def genSequenceR(numItems: Int, seq_no: Int, no:Int, no_nos: Int, numMade: Int, list: List[Int]): List[Int] = {
if(numMade < numItems){
if(no_nos < seq_no){
genSequenceR(numItems, seq_no, no, no_nos + 1, numMade + 1, list :+ no)
}else if(no % 2 == 0){
genSequenceR(numItems, seq_no, no + 1, 0, numMade, list)
}else{
genSequenceR(numItems, seq_no + 1, no + 1, 0, numMade, list)
}
}else{
list
}
}
genSequenceR is the recursive function that accumulates values in the list and calls itself with new values based on the conditions. Like the while loop, it keeps going while numMade is less than numItems; once that is no longer the case, it returns the list to genSequence.
This is a fairly rudimentary functional translation of your code. It can be improved and there are better approaches typically used. I'd recommend trying to improve it with pattern matching and then work towards the other solutions that use Stream here.
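One possible direction for such an improvement, sketched under the assumption that the rule "n appears n/2 + 1 times" (which the imperative loop implements) is what is wanted: a tail-recursive helper that prepends to the accumulator and reverses once at the end, instead of appending with :+ on every step. The names genSequenceTailrec and loop are hypothetical.

```scala
import scala.annotation.tailrec

// Hypothetical tail-recursive variant; prepending keeps each step cheap.
def genSequenceTailrec(numItems: Int): List[Int] = {
  require(numItems > 0, "numItems must be >= 1")

  @tailrec
  def loop(n: Int, remaining: Int, acc: List[Int]): List[Int] =
    if (remaining <= 0) acc.reverse
    else {
      val copies = math.min(n / 2 + 1, remaining)  // do not overshoot numItems
      loop(n + 1, remaining - copies, List.fill(copies)(n) ::: acc)
    }

  loop(1, numItems, Nil)
}

val firstEleven = genSequenceTailrec(11)
```

The @tailrec annotation asks the compiler to reject the method if the recursive call is not in tail position.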
Here's an attempt from a Scala tyro. Keep in mind I don't really understand Scala, I don't really understand the question, and I don't really understand your algorithm.
def genX_Ys[A](howMany : Int, ofWhat : A) : List[A] = howMany match {
case 1 => List(ofWhat)
case _ => ofWhat :: genX_Ys(howMany - 1, ofWhat)
}
def makeAtLeast(startingWith : List[Int], nextUp : Int, howMany : Int, minimumLength : Int) : List[Int] = {
if (startingWith.size >= minimumLength)
startingWith
else
makeAtLeast(startingWith ++ genX_Ys( howMany, nextUp),
nextUp +1, howMany + (if (nextUp % 2 == 1) 1 else 0), minimumLength)
}
def genSequence(numItems: Int) = makeAtLeast(List(1), 2, 2, numItems).slice(0, numItems)
This seems to work, but re-read the caveats above. In particular, I am sure there is a library function that performs genX_Ys, but I couldn't find it.
EDIT Could be
def genX_Ys[A](howMany : Int, ofWhat : A) : Seq[A] =
(1 to howMany) map { x => ofWhat }
Here is a very direct "translation" of the definition of the Golomb sequence:
val it = Iterator.iterate((1,1,Map(1->1,2->2))){ case (n,i,m) =>
val c = m(n)
if (c == 1) (n+1, i+1, m + (i -> n) - n)
else (n, i+1, m + (i -> n) + (n -> (c-1)))
}.map(_._1)
println(it.take(10).toList)
The triple (n,i,m) contains the actual number n, the index i and a Map m, which contains how often an n must be repeated. When the counter in the Map for our n reaches 1, we increase n (and can drop n from the map, as it is no longer needed), else we just decrease n's counter in the map and keep n. In every case we add the new pair i -> n into the map, which will be used as counter later (when a subsequent n reaches the value of the current i).
[Edit]
Thinking about it, I realized that I don't need indexes and not even a lookup (because the "counters" are already in the "right" order), which means that I can replace the Map with a Queue:
import collection.immutable.Queue
val it = 1 #:: Iterator.iterate((2, 2, Queue[Int]())){
case (n,1,q) => (n+1, q.head, q.tail :+ (n+1))
case (n,c,q) => (n,c-1,q :+ n)
}.map(_._1).toStream
The Iterator works correctly when starting from 2, so I had to add a 1 at the beginning. The second tuple element is now the counter for the current n (taken from the Queue). The current counter could be kept in the Queue as well, so we have only a pair, but then it's less clear what's going on due to the complicated Queue handling:
val it = 1 #:: Iterator.iterate((2, Queue[Int](2))){
case (n,q) if q.head == 1 => (n+1, q.tail :+ (n+1))
case (n,q) => (n, ((q.head-1) +: q.tail) :+ n)
}.map(_._1).toStream
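For cross-checking the variants above, the Golomb sequence can also be sketched from its well-known recurrence a(1) = 1, a(n) = 1 + a(n - a(a(n - 1))), building the prefix in a Vector with a fold. The function name golomb is hypothetical, and this is an illustration rather than a replacement for the lazy versions.

```scala
// Hypothetical helper: the first `count` terms of the Golomb sequence.
def golomb(count: Int): Vector[Int] = {
  require(count >= 1, "count must be >= 1")
  (2 to count).foldLeft(Vector(1)) { (g, n) =>
    def a(k: Int): Int = g(k - 1)      // 1-based access into the terms built so far
    g :+ (1 + a(n - a(a(n - 1))))      // a(n) = 1 + a(n - a(a(n - 1)))
  }
}

val firstTwelve = golomb(12)
```

Vector gives effectively constant-time indexed reads and appends, which suits a recurrence that looks back at earlier terms.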