I have a list of lists (List[List[Int]]) whose inner List[Int]s are of different sizes, and I want to multiply the values of all of them with each other, taking one element from each list.
For List(List(1,2,3), List(4,5), List(6)) the products would be
1*4*6, 1*5*6, 2*4*6, 2*5*6, 3*4*6, 3*5*6
and the result should be returned as a list of the resulting values: List(24, 30, 48, 60, 72, 90)
You can do it using foldLeft this way:
list.foldLeft(List(1)) {
  case (acc, item) => acc.flatMap(v => item.map(_ * v))
}
Explanation
Let's define a method producing all possible multiplications of pairs from two lists:
def mul(a: List[Int], b: List[Int]) = b.flatMap(item => a.map(_ * item))
For each item in b, this method produces the list of elements of a multiplied by that value.
Now we can apply this procedure to all the elements of the list of lists, giving it the initial value List(1).
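A minimal, self-contained run of this approach (the val name list is illustrative):
val list = List(List(1, 2, 3), List(4, 5), List(6))

// Each fold step multiplies every accumulated product by every element of the next list:
// List(1) -> List(1, 2, 3) -> List(4, 5, 8, 10, 12, 15) -> List(24, 30, 48, 60, 72, 90)
val result = list.foldLeft(List(1)) {
  case (acc, item) => acc.flatMap(v => item.map(_ * v))
}

println(result) // List(24, 30, 48, 60, 72, 90)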
I'm trying to implement an element-wise multiplication of two ml.linalg.SparseVector instances (also called a Hadamard product).
A SparseVector represents a vector, but rather than having space taken up by all the "0" values, they are omitted. The vector is represented as two arrays, one of indices and one of values.
For example: SparseVector(indices: [0, 100, 100000], values: [0.25, 1, 0.8]) concisely represents an array of 100,000 elements, where only 3 values are non-zero.
I now need an element-wise multiplication of two of these, and there seems to be no built-in. Conceptually, it should be simple - any indices they don't have in common are dropped, and for the indices in common, the numbers are multiplied together.
For example: SparseVector(indices: [0, 500, 100000], values: [10, 1, 10]) when multiplied with the above should return: SparseVector(indices: [0, 100000], values: [2.5, 8])
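For reference, such vectors can be constructed directly with the public SparseVector constructor (an illustrative sketch; note that an index of 100000 implies a size of at least 100001):
import org.apache.spark.ml.linalg.SparseVector

// The two example vectors from above
val v1 = new SparseVector(100001, Array(0, 100, 100000), Array(0.25, 1.0, 0.8))
val v2 = new SparseVector(100001, Array(0, 500, 100000), Array(10.0, 1.0, 10.0))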
Sadly, I've found no built-in for this. I have an approach for doing it in a single pass, but it isn't very Scala-y: it has to build up the lists in a loop as it discovers which indices are in common, and then grab the corresponding values for each index (which have the same cardinal position, but in a second array).
import org.apache.spark.ml.linalg._
import org.apache.spark.sql.functions.udf
import scala.collection.mutable.ListBuffer
// Return a new SparseVector whose values are the element-wise product (Hadamard product)
val multSparseVectors = udf((v1: SparseVector, v2: SparseVector) => {
  // val commonIndexes = v1.indices.intersect(v2.indices); // Missing scale factors are assumed to have a value of 0, so only common elements remain
  // TODO: No clear way to map common indices to the values that go with those indices. E.g. no "valueForIndex" method
  // new SparseVector(v1.size, commonIndexes, commonIndexes.map(i => v1.valueForIndex(i) * v2.valueForIndex(i)).toArray);
  val indices = ListBuffer[Int]() // TODO: Some way to do this without mutable lists?
  val values = ListBuffer[Double]()
  var v1Pos = 0 // Current position in v1.indices (we will be making a single pass)
  var v2Pos = 0 // Current position in v2.indices (we will be making a single pass)
  while (v1Pos < v1.indices.length && v2Pos < v2.indices.length) {
    // Advance our position in v2 until we've matched or passed the current v1 index
    while (v2Pos < v2.indices.length && v2.indices(v2Pos) < v1.indices(v1Pos))
      v2Pos += 1
    if (v2Pos < v2.indices.length && v1.indices(v1Pos) == v2.indices(v2Pos)) {
      indices += v1.indices(v1Pos)
      values += v1.values(v1Pos) * v2.values(v2Pos)
    }
    v1Pos += 1
  }
  new SparseVector(v1.size, indices.toArray, values.toArray)
})
spark.udf.register("multSparseVectors", multSparseVectors)
Can anyone think of a way I can do this using a map or similar? My main goal is to avoid making multiple O(N) passes over the second vector to "look up" the position of a value in the indices list so that I can grab the corresponding values entry, because that would take O(K + N²) time when I know an O(K + N) solution is possible.
I've come up with a solution by boiling this problem down to a more general one:
Finding the indices at which two arrays intersect
Given an answer to the above question (where the two arrays v1.indices and v2.indices intersect), we can trivially use those indices to extract the new SparseVector indices, and the values from each vector to be multiplied together.
The solution is given below:
%scala
import scala.annotation.tailrec
import org.apache.spark.ml.linalg._
import org.apache.spark.sql.functions.udf
// This fanciness from https://stackoverflow.com/a/71928709/529618 finds the indices at which two lists intersect
@tailrec
def indicesOfIntersection(left: List[Int], right: List[Int], lidx: Int = 0, ridx: Int = 0, result: List[(Int, Int)] = Nil): List[(Int, Int)] = (left, right) match {
  case (Nil, _) | (_, Nil) => result.reverse
  case (l :: tail, r :: _) if l < r => indicesOfIntersection(tail, right, lidx + 1, ridx, result)
  case (l :: _, r :: tail) if l > r => indicesOfIntersection(left, tail, lidx, ridx + 1, result)
  case (l :: ltail, r :: rtail) => indicesOfIntersection(ltail, rtail, lidx + 1, ridx + 1, (lidx, ridx) :: result)
}
// Return a new SparseVector whose values are the element-wise product (Hadamard product)
val multSparseVectors = udf((v1: SparseVector, v2: SparseVector) => {
  val intersection = indicesOfIntersection(v1.indices.toList, v2.indices.toList)
  new SparseVector(v1.size,
    intersection.map { case (x1, _) => v1.indices(x1) }.toArray,
    intersection.map { case (x1, x2) => v1.values(x1) * v2.values(x2) }.toArray)
})
spark.udf.register("multSparseVectors", multSparseVectors)
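As a quick sanity check of the helper on its own (illustrative values taken from the question's example):
val lhs = List(0, 100, 100000)
val rhs = List(0, 500, 100000)

// The two sorted index lists hold equal values at positions (0, 0) and (2, 2)
println(indicesOfIntersection(lhs, rhs)) // List((0,0), (2,2))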
I am new to Scala and new to OOP too. How can I update a particular element in a list while creating a new list?
val numbers= List(1,2,3,4,5)
val result = numbers.map(_*2)
I need to update the third element only (multiply it by 2). How can I do that using map?
You can use zipWithIndex to map the list into a list of tuples, where each element is paired with its index. Then, using map with pattern matching, you single out the third element (index = 2):
val numbers = List(1,2,3,4,5)
val result = numbers.zipWithIndex.map {
  case (v, i) if i == 2 => v * 2
  case (v, _) => v
}
// result: List[Int] = List(1, 2, 6, 4, 5)
Alternatively, you can use patch, which replaces a sub-sequence with a provided one:
numbers.patch(from = 2, patch = Seq(numbers(2) * 2), replaced = 1)
I think the clearest way of achieving this is by using updated(index: Int, elem: Int). For your example, it could be applied as follows:
val result = numbers.updated(2, numbers(2) * 2)
list.zipWithIndex creates a list of pairs, with the original element on the left and its index in the list on the right (indices are 0-based, so the "third element" is at index 2).
val result = numbers.zipWithIndex.map {
  case (n, 2) => n * 2
  case (n, _) => n
}
This creates an intermediate list holding the pairs, and then maps over it to do your transformation. A slightly more efficient approach is to use an iterator. Iterators are 'lazy', so rather than creating an intermediate collection, the pairs are generated one by one and sent straight to .map:
val result = numbers.iterator.zipWithIndex.map {
  case (n, 2) => n * 2
  case (n, _) => n
}.toList
First and foremost, Scala is primarily a functional (FP) language, not just OOP. You can update any element of a list with the updated method; see the following example for details:
Signature: updated(index, value)
val numbers= List(1,2,3,4,5)
print(numbers.updated(2,10))
Here the 1st argument is the index and the 2nd argument is the new value. Note that updated does not modify the list in place; it returns a new list:
List(1, 2, 10, 4, 5)
I am new to Spark with Scala and very much confused by the notation (x, y) in some scenarios versus x._1, y._1, especially when they are used one over the other in Spark transformations.
Could someone explain whether there is a specific rule of thumb for when to use each of these syntaxes?
Basically there are 2 ways to access a tuple parameter in an anonymous function. They're functionally equivalent; use whichever method you prefer.
Through the attributes _1, _2, ...
Through pattern matching into variables with meaningful names
val tuples = Array((1, 2), (2, 3), (3, 4))
// Attributes
tuples.foreach { t =>
  println(s"${t._1} ${t._2}")
}
// Pattern matching
tuples.foreach { t =>
  t match {
    case (first, second) =>
      println(s"$first $second")
  }
}
// Pattern matching can also be written as
tuples.foreach { case (first, second) =>
  println(s"$first $second")
}
The notation (x, y) is a tuple of 2 elements, x and y. There are different ways to get access to the individual values in a tuple. You can use the ._1, ._2 notation to get at the elements:
val tup = (3, "Hello") // A tuple with two elements
val number = tup._1 // Gets the first element (3) from the tuple
val text = tup._2 // Gets the second element ("Hello") from the tuple
You can also use pattern matching. One way to extract the two values is like this:
val (number, text) = tup
Unlike a collection (for example, a List), a tuple has a fixed number of values (not necessarily two), and the values can have different types (such as an Int and a String in the example above).
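For instance (an illustrative snippet), a three-element tuple mixing types:
val triple = (1, "one", 1.0) // a Tuple3[Int, String, Double]
val (i, s, d) = triple       // pattern matching works for any arity
println(s"$i $s $d")         // prints: 1 one 1.0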
There are many tutorials about Scala tuples, for example: Scala tuple examples and syntax.
I have a list details of this type :
case class Detail(point: List[Double], cluster: Int)
val details = List(Detail(List(2.0, 10.0),1), Detail(List(2.0, 5.0),3),
Detail(List(8.0, 4.0),2), Detail(List(5.0, 8.0),2))
I want to filter this list into a tuple that contains the sum of each corresponding point where the cluster is 2.
So I filter this List :
details.filter(detail => detail.cluster == 2)
which returns :
List(Detail(List(8.0, 4.0),2), Detail(List(5.0, 8.0),2))
It's the summing of the corresponding values I'm having trouble with. In this example the tuple should contain (8+5, 4+8) = (13, 12)
I'm thinking of flattening the List and then summing each corresponding value, but
List(details).flatten
just returns the same List
How can I sum the corresponding values in the List into a tuple?
I could achieve this easily using a for loop and just extracting the details I need into a counter, but what is the functional solution?
What do you want to happen if the lists for different Details have different lengths?
Or the same length which is different from 2? Tuples are generally only used when you need a number of elements that is fixed in advance; you won't even be able to write a return type if you need tuples of different lengths.
Assuming that all of them are lists of the same length and you get a list in return, something like this should work (untested):
details.filter(_.cluster == 2).map(_.point).transpose.map(_.sum)
I.e. first get all points as a list of lists, transpose it so you get a list for each "coordinate", and sum each of these lists.
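A quick check with the data from the question (names reused from there):
case class Detail(point: List[Double], cluster: Int)

val details = List(Detail(List(2.0, 10.0), 1), Detail(List(2.0, 5.0), 3),
  Detail(List(8.0, 4.0), 2), Detail(List(5.0, 8.0), 2))

// filter    -> List(List(8.0, 4.0), List(5.0, 8.0))
// transpose -> List(List(8.0, 5.0), List(4.0, 8.0))
println(details.filter(_.cluster == 2).map(_.point).transpose.map(_.sum)) // List(13.0, 12.0)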
If you do know that each point has two coordinates, this should likely be reflected in your point type, by using (Double, Double) instead of List[Double]; then you can just fold over the list of points, which should be a bit more efficient. Look at the definition of foldLeft and the standard implementation of sum in terms of foldLeft:
def sum(list: List[Int]): Int = list.foldLeft(0)((acc, x) => acc + x)
and it should be easy to do what you want.
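A sketch of that fold, assuming the point field is changed to a (Double, Double) pair:
case class Detail(point: (Double, Double), cluster: Int)

val details = List(Detail((2.0, 10.0), 1), Detail((2.0, 5.0), 3),
  Detail((8.0, 4.0), 2), Detail((5.0, 8.0), 2))

// Fold the filtered points into a running (x, y) sum, starting from (0, 0)
val sums = details.filter(_.cluster == 2).map(_.point)
  .foldLeft((0.0, 0.0)) { case ((ax, ay), (x, y)) => (ax + x, ay + y) }

println(sums) // (13.0,12.0)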
You can use just one foldLeft with a pattern-matching anonymous function, without the filter:
details.foldLeft((0.0, 0.0)) {
  case ((accX, accY), Detail(x :: y :: Nil, 2)) => (accX + x, accY + y)
  case (acc, _) => acc
}
res1: (Double, Double) = (13.0,12.0)
Apologies: I'm a total noob.
I have an item class:
class item(ind: Int, freq: Int, gap: Int) {}
I have an ordered list of ints
val listVar = a.toList
where a is an array
I want a list of items called metrics where
ind is the (unique) integer
freq is the number of times that ind appears in the list
gap is the minimum gap between ind and the number in the list before it
so far I have:
def metrics = for {
  n <- 0 until 255
  if listVar.count(_ == n) > 0
} yield new item(n, listVar.count(_ == n), 0)
It's crap and I know it - any clues?
Well, some of it is easy:
val freqMap = listVar groupBy identity mapValues (_.size)
This gives you ind and freq. To get gap I'd use a fold:
val gapMap = listVar.sliding(2).foldLeft(Map[Int, Int]()) {
  case (map, List(prev, ind)) =>
    map + (ind -> (map.getOrElse(ind, Int.MaxValue) min (ind - prev)))
  case (map, _) => map // guards against a single-element list, whose only window has size 1
}
Now you just need to unify them:
freqMap.keys.map( k => new item(k, freqMap(k), gapMap.getOrElse(k, 0)) )
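For example, with the illustrative input List(2, 2, 4, 4, 0, 2, 2, 2, 4, 4) (the same sample used in the next answer):
val listVar = List(2, 2, 4, 4, 0, 2, 2, 2, 4, 4)

// freqMap: Map(2 -> 5, 4 -> 4, 0 -> 1)
// gapMap:  Map(2 -> 0, 4 -> 0, 0 -> 4)
// result:  item(2, 5, 0), item(4, 4, 0), item(0, 1, 4)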
Ideally you want to traverse the list only once and, along the way, for each distinct Int, increment a counter (the frequency) as well as keep track of the minimum gap.
You can use a case class to store the frequency and the minimum gap; the stored values will be immutable. Note that minGap may not be defined.
case class Metric(frequency: Int, minGap: Option[Int])
In the general case you can use a Map[Int, Metric] to look up the immutable Metric object. Finding the minimum gap is the harder part. To compute gaps, you can use the sliding(2) method. It traverses the list with a sliding window of size two, allowing you to compare each Int to its previous value so that you can compute the gap.
Finally you need to accumulate and update the information as you traverse the list. This can be done by folding each element of the list into your temporary result until you have traversed the whole list and obtained the complete result.
Putting things together:
listVar.sliding(2).foldLeft(
  // Seed the accumulator with the first element counted once: the fold below only
  // updates the second element of each window, so the head would otherwise be missed
  (Map[Int, Metric]() ++ listVar.headOption.map(_ -> Metric(1, None)).toList)
    .withDefaultValue(Metric(0, None))
) {
  case (map, List(a, b)) =>
    val metric = map(b)
    val newGap = metric.minGap match {
      case None => math.abs(b - a)
      case Some(gap) => math.min(gap, math.abs(b - a))
    }
    val newMetric = Metric(metric.frequency + 1, Some(newGap))
    map + (b -> newMetric)
  case (map, List(a)) => // a single-element list yields one window of size 1
    map + (a -> Metric(1, None))
  case (map, _) =>
    map
}
Result for listVar: List[Int] = List(2, 2, 4, 4, 0, 2, 2, 2, 4, 4)
scala.collection.immutable.Map[Int,Metric] = Map(2 -> Metric(5,Some(0)),
4 -> Metric(4,Some(0)), 0 -> Metric(1,Some(4)))
You can then turn the result into your desired item class using map.toSeq.map { case (i, m) => new item(i, m.frequency, m.minGap.getOrElse(-1)) }.
You could also create your item objects directly in the process, but I thought the code would be harder to read.