How to trick Scala map method to produce more than one output per each input item? - scala

Quite complex algorith is being applied to list of Spark Dataset's rows (list was obtained using groupByKey and flatMapGroups). Most rows are transformed 1 : 1 from input to output, but in some scenarios require more than one output per each input. The input row schema can change anytime. The map() fits the requirements quite well for the 1:1 transformation, but is there a way to use it producing 1 : n output?
The only work-around I found relies on foreach method which has unpleasant overhed cause by creating the initial empty list (remember, unlike the simplified example below, real-life list structure is changing randomly).
My original problem is too complex to share here, but this example demonstrates the concept. Let's have a list of integers. Each should be transformed into its square value and if the input is even it should also transform into one half of the original value:
val X = Seq(1, 2, 3, 4, 5)
val y = X.map(x => x * x) //map is intended for 1:1 transformation so it works great here
val z = X.map(x => for(n <- 1 to 5) (n, x * x)) //this attempt FAILS - generates list of five rows with emtpy tuples
// this work-around works, but newX definition is problematic
var newX = List[Int]() //in reality defining as head of the input list and dropping result's tail at the end
val za = X.foreach(x => {
newX = x*x :: newX
if(x % 2 == 0) newX = (x / 2) :: newX
})
newX
Is there a better way than foreach construct?

.flatMap produces any number of outputs from a single input.
val X = Seq(1, 2, 3, 4, 5)
X.flatMap { x =>
if (x % 2 == 0) Seq(x*x, x / 2) else Seq(x / 2)
}
#=> Seq[Int] = List(0, 4, 1, 1, 16, 2, 2)
flatMap in more detail
In X.map(f), f is a function that maps each input to a single output. By contrast, in X.flatMap(g), the function g maps each input to a sequence of outputs. flatMap then takes all the sequences produced (one for each element in f) and concatenates them.
The neat thing is .flatMap works not just for sequences, but for all sequence-like objects. For an option, for instance, Option(x)#flatMap(g) will allow g to return an Option. Similarly, Future(x)#flatMap(g) will allow g to return a Future.
Whenever the number of elements you return depends on the input, you should think of flatMap.

Related

Functional way to build a matrix from the list in scala

I've asked this question on https://users.scala-lang.org/ but haven't got a concrete answer yet. I am given a vector v and I would like to construct a matrix m based on this vector according to the rules specified below. I would like to write the following code in a purely functional way, i.e. m = v.map(...) or similar. I can do it easily in procedural way like this
import scala.util.Random
val v = Vector.fill(50)(Random.nextInt(100))
println(v)
val m = Array.fill[Int](10, 10)(0)
def populateMatrix(x: Int): Unit = m(x/10)(x%10) += 1
v.map(x => populateMatrix(x))
m foreach { row => row foreach print; println }
In words, I am iterating through v, getting a pair of indices (i,j) from each v(k) and updating the matrix m at these positions, i.e., m(i)(j) += 1. But I am seeking a functional way. It is clear for me how to implement this in, e.g. Mathematica
v=RandomInteger[{99}, 300]
m=SparseArray[{Rule[{Quotient[#, 10] + 1, Mod[#, 10] + 1}, 1]}, {10, 10}] & /# v // Total // Normal
But how to do it in scala, which is functional language, too?
Your populate matrix approach can be "reversed" - map vector into index tuples, group them, count size of each group and turn it into map (index tuple -> size) which will be used to populate corresponding index in array with Array.tabulate:
val v = Vector.fill(50)(Random.nextInt(100))
val values = v.map(i => (i/10, i%10))
.groupBy(identity)
.view
.mapValues(_.size)
val result = Array.tabulate(10,10)( (i, j)=> values.getOrElse((i,j), 0))

How do I use the forall loop to check if all rows have the same number of columns in a Vector in Scala?

So I have a generic vector: 1:vec: Vector[Vector[T]]
Now I wanna use the require and forall to check if the length of each row is the same.
This is how far I've gotten:
2:require(vec.forall(row => data(row).length == ???)
So essentially I wanna make sure that each row has same number of columns, I don't wanna use 3:data(row + 1).length since I could probably use a for loop in that case. Can anyone give a tip on how to resolve code 2?
If all the rows must have the same length, you can compare each row with any of the others.
if (vec.size < 2) true // vacuous
else {
val firstLength = data(vec.head).length
vec.forall(row => data(row).length == firstLength)
}
Strictly speaking, the first comparison in the forall will always be true, but Vector.tail is probably more work than performing the comparison; if data(row).length is particularly expensive, making it
vec.tail.forall(...)
might be worth it.
(If instead of a Vector, we were dealing with a List, tail is most definitely cheaper than data(row).length, though other cautions around using List may apply)
Consider a 3x4 vector of vectors such as for instance
val xs = Vector.tabulate(3) { _ => Vector.tabulate(4) { _ => 1 }}
namely,
Vector(Vector(1, 1, 1, 1),
Vector(1, 1, 1, 1),
Vector(1, 1, 1, 1))
Collect the size for each nested vector,
val s = xs.map {_.size}
// Vector(4, 4, 4)
Now we can compare consecutive sizes with Vector.forall by pairing them with
s.zip(s.drop(1))
// Vector((4,4), (4,4))
where the first pair corresponds to the first and second vector sizes, and the second pair to the second and third vector sizes; thus
s.zip(s.drop(1)).forall { case(a,b) => a == b }
// true
With this approach we can define other predicates in Vector.forall, such as monotonically increasing pairs,
val xs = Vector(Vector(1), Vector(1,2), Vector(1,2,3))
val s = xs.map {_.size}
// Vector(1, 2, 3)
s.zip(s.drop(1))
// Vector((1,2), (2,3))
s.zip(s.drop(1)).forall { case(a,b) => a == b }
// false
s.zip(s.drop(1)).forall { case(a,b) => a < b }
// true

Scala - create a new list and update particular element from existing list

I am new to Scala and new OOP too. How can I update a particular element in a list while creating a new list.
val numbers= List(1,2,3,4,5)
val result = numbers.map(_*2)
I need to update third element only -> multiply by 2. How can I do that by using map?
You can use zipWithIndex to map the list into a list of tuples, where each element is accompanied by its index. Then, using map with pattern matching - you single out the third element (index = 2):
val numbers = List(1,2,3,4,5)
val result = numbers.zipWithIndex.map {
case (v, i) if i == 2 => v * 2
case (v, _) => v
}
// result: List[Int] = List(1, 2, 6, 4, 5)
Alternatively - you can use patch, which replaces a sub-sequence with a provided one:
numbers.patch(from = 2, patch = Seq(numbers(2) * 2), replaced = 1)
I think the clearest way of achieving this is by using updated(index: Int, elem: Int). For your example, it could be applied as follows:
val result = numbers.updated(2, numbers(2) * 2)
list.zipWithIndex creates a list of pairs with original element on the left, and index in the list on the right (indices are 0-based, so "third element" is at index 2).
val result = number.zipWithIndex.map {
case (n, 2) => n*2
case n => n
}
This creates an intermediate list holding the pairs, and then maps through it to do your transformation. A bit more efficient approach is to use iterator. Iterators a 'lazy', so, rather than creating an intermediate container, it will generate the pairs one-by-one, and send them straight to the .map:
val result = number.iterator.zipWithIndex.map {
case (n, 2) => n*2
case n => n
}.toList
1st and the foremost scala is FOP and not OOP. You can update any element of a list through the keyword "updated", see the following example for details:
Signature :- updated(index,value)
val numbers= List(1,2,3,4,5)
print(numbers.updated(2,10))
Now here the 1st argument is the index and the 2nd argument is the value. The result of this code will modify the list to:
List(1, 2, 10, 4, 5).

Using `quick-look` to find certain element

First of all, I want to say that this is a school assignment and I am only seeking for some guidance.
My task was to write an algorithm that finds the k:th smallest element in a seq using quickselect. This should be easy enough but when running some tests I hit a wall. For some reason if I use input (List(1, 1, 1, 1), 1) it goes into infinite loop.
Here is my implementation:
val rand = new scala.util.Random()
def find(seq: Seq[Int], k: Int): Int = {
require(0 <= k && k < seq.length)
val a: Array[Int] = seq.toArray[Int] // Can't modify the argument sequence
val pivot = rand.nextInt(a.length)
val (low, high) = a.partition(_ < a(pivot))
if (low.length == k) a(pivot)
else if (low.length < k) find(high, k - low.length)
else find(low, k)
}
For some reason (or because I am tired) I cannot spot my mistake. If someone could hint me where I go wrong I would be pleased.
Basically you are depending on this line - val (low, high) = a.partition(_ < a(pivot)) to split the array into 2 arrays. The first one containing the continuous sequence of elements smaller than pivot-element and the second contains the rest.
Then you say that if the first array has length k that means you have already seen k elements smaller that your pivot-element. Which means pivot-element is actually k+1th smallest and you are actually returning k+1th smallest element instead of kth. This is your first mistake.
Also... A greater problem occurs when you have all elements which are same because your first array will always have 0 elements.
Not only that... your code will give you wrong answer for inputs where you have repeating elements among k smallest ones like - (1, 3, 4, 1, 2).
The solution lies in obervation that in the sequence (1, 1, 1, 1) the 4th smallest element is the 4th 1. Meaning you have to use <= instead of <.
Also... Since the partition function will not split the array until your boolean condition is false, you can not use partition for achieving this array split. you will have to write the split yourself.

How to Map Partial Elements in Scala/Spark

I have a list of integers:
val mylist = List(1, 2, 3, 4)
What I want to do is to map the element which are even numbers in mylist, and multiply them by 2.
Maybe the code should be:
mylist.map{ case x%2==2 => x*2 }
I expect the result to be List(4, 8) but it's not. What is the correct code?
I know I could realize this function by using filter + map
a.filter(_%2 == 0).map(_*2)
but is there some way to realize this function by only using map()?
map does not reduce number of elements in transformation.
filter + map is right approach.
But if single method is needed, use collect:
mylist.collect{ case x if x % 2 == 0 => 2 * x }
Edit:
withFilter + map is more efficient than filter + map (as withFilter does not create intermediate collection, i.e. it works lazily):
mylist.withFilter(_ % 2 == 0).map(_ * 2)
which is same as for :
for { e <- mylist if (e % 2 == 0) } yield 2 * e