I have a Scala function that makes 2-3 recursive calls over its lifetime. I want to save the variable inside the second element of the tuple in a list. Is there a smart way to do this?
Just passing the variable around would give me a List[String], when what I actually want is a List[List[String]].
Would I need a variable inside the function that is updated with each iteration?
def someRecursiveFunction(listOfWords: List[String]): List[List[String]] = {
  val textSplitter = listOfWords.lastIndexOf("Some Word")
  if (textSplitter != -1) {
    val someTuple = listOfWords.splitAt(textSplitter)
    val valueIWant = someTuple._2
    someRecursiveFunction(someTuple._1)
  }
  List(someTuple._2, someTuple._2(1), someTuple._2(2)) // What I want back -- but someTuple is out of scope here
}
Is there a way to extract the second tuple out of the recursive function so that I can use it further on in my program?
If the return type is fixed to List[List[String]], the following changes need to be made to the code:
Because someTuple._2 is accessed as someTuple._2(2), there should be at least 3 strings in the someTuple._2 list.
The last expression must be of the return type, i.e. List[List[String]]. Because someTuple._2(1) and someTuple._2(2) are just strings and not List[String], List(someTuple._2, List(someTuple._2(1), someTuple._2(2))) has the return type List[List[String]].
The value of "Some Word" changes in the recursive process, duly noting that someTuple._2.size is always >= 3.
Because someTuple needs to be accessed after the if block and is reassigned on each call, it is declared as a var at the top of the function.
With this understanding drawn from your requirement, the following code may be what you are looking for:
def someRecursiveFunction(listOfWords: List[String], sw: String): List[List[String]] = {
  var someTuple: (List[String], List[String]) = (List(), List())
  val textSplitter = listOfWords.lastIndexOf(sw)
  if (textSplitter != -1 && listOfWords.size - 3 >= textSplitter) {
    someTuple = listOfWords.splitAt(textSplitter)
    println(someTuple._1, someTuple._2) // for checking recursion
    if (someTuple._1.size >= 3)
      return someRecursiveFunction(someTuple._1, someTuple._1(textSplitter - 3))
  }
  // assumes sw was found; otherwise someTuple._2 is empty and indexing throws
  List(someTuple._2, List(someTuple._2(1), someTuple._2(2))) // What I want back
}
In Scala REPL:
scala> val list = List("a","b","c","x","y","z","k","j","g","Some Word","d","e","f","u","m","p")
list: List[String] = List(a, b, c, x, y, z, k, j, g, Some Word, d, e, f, u, m, p)
scala> someRecursiveFunction(list,"d")
(List(a, b, c, x, y, z, k, j, g, Some Word),List(d, e, f, u, m, p))
(List(a, b, c, x, y, z, k),List(j, g, Some Word))
(List(a, b, c, x),List(y, z, k))
(List(a),List(b, c, x))
res70: List[List[String]] = List(List(b, c, x), List(c, x))
scala> someRecursiveFunction(list,"Some Word")
(List(a, b, c, x, y, z, k, j, g),List(Some Word, d, e, f, u, m, p))
(List(a, b, c, x, y, z),List(k, j, g))
(List(a, b, c),List(x, y, z))
(List(),List(a, b, c))
res71: List[List[String]] = List(List(a, b, c), List(b, c))
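For comparison, a sketch of a more idiomatic approach (not from the answer above; `collectTails` is a hypothetical name): thread an accumulator through the recursion instead of mutating a var, collecting the second half of every split.

```scala
// Each recursive step prepends the second half of the split to `acc`, so the
// caller gets every intermediate tail as a List[List[String]] with no shared
// mutable state. Result is ordered outermost split first.
def collectTails(listOfWords: List[String], sw: String,
                 acc: List[List[String]] = Nil): List[List[String]] = {
  val textSplitter = listOfWords.lastIndexOf(sw)
  if (textSplitter == -1) acc.reverse
  else {
    val (front, back) = listOfWords.splitAt(textSplitter)
    collectTails(front, sw, back :: acc)
  }
}
```

Because the accumulated list is the return value, nothing needs to be "extracted" from inner calls; the caller simply receives all the tails at once.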
I am wondering whether the following is possible in Scala:
Given some vector x = (x_1, x_2, ..., x_n) in R^n and a function f that maps R^n to R, I would like to replicate this concept in Scala. The idea of partial application/currying should hold here (i.e. when applying a single value x_i, return a function that is defined only for a subset of its input domain). For example, when n = 2, define f(x, y) = sin(x + y); then trivially, f(2, y) = sin(2 + y).
However, the dimension (n > 0) may vary from case to case and may even be provided in input.
Partial application for n = 2 is:
def leftPartialFunction(f: (Double, Double) => Double)(x: Double): Double => Double = f(x, _)
but how can this be generalized for arbitrary n?
For example, how can I apply the function in position i?
Something like this I assume would not work:
def partialFunction(f: IndexedSeq[Double] => Double)(xi: Double): IndexedSeq[Double] => Double = .... // cannot work well with indexed seq as they are not "disjoint"
Try the following implementation of partialFunction:
import shapeless.{::, HList, HNil, Nat, Succ}
import shapeless.ops.function.{FnFromProduct, FnToProduct}
import shapeless.ops.hlist.{At, Drop, Prepend, Take}
def partialFunction[N <: Nat, F, X,
L <: HList, Y, L1 <: HList, L2 <: HList, L3 <: HList
](i: N)(f: F)(xi: X)(implicit
fnToProduct: FnToProduct.Aux[F, L => Y],
at: At.Aux[L, N, X],
take: Take.Aux[L, N, L1],
drop: Drop.Aux[L, Succ[N], L2],
prepend: Prepend.Aux[L1, L2, L3],
fnFromProduct: FnFromProduct[L3 => Y],
take1: Take.Aux[L3, N, L1],
drop1: Drop.Aux[L3, N, L2],
prepend1: Prepend.Aux[L1, X :: L2, L],
): fnFromProduct.Out =
fnFromProduct(l3 => fnToProduct(f)(prepend1(take1(l3), xi :: drop1(l3))))
Testing:
import shapeless.Nat._1
val f: (Int, Boolean, Double) => String = (i, b, d) => s"i=$i, b=$b, d=$d"
f(1, true, 2.0) // i=1, b=true, d=2.0
val f1 = partialFunction(_1)(f)(true)
f1: ((Int, Double) => String)
f1(1, 2.0) // i=1, b=true, d=2.0
You can also write partialFunction(Nat(1))(f)(true) instead of partialFunction(_1)(f)(true).
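If compile-time arity and index checking are not required, the same effect can be sketched without shapeless, directly over IndexedSeq (`fixAt` is a hypothetical helper, not part of the answer above):

```scala
// Untyped alternative: fix the value at position i of a function over
// IndexedSeq[Double]. Unlike the shapeless version, nothing stops the caller
// from passing an out-of-range index or a wrong-length sequence.
def fixAt(f: IndexedSeq[Double] => Double, i: Int, xi: Double): IndexedSeq[Double] => Double =
  rest => f((rest.take(i) :+ xi) ++ rest.drop(i))

val g: IndexedSeq[Double] => Double = xs => math.sin(xs.sum)
val g2 = fixAt(g, 0, 2.0) // fixes x_1 = 2.0, so g2(ys) computes sin(2.0 + ys.sum)
```

This trades the static guarantees of the HList encoding for a much smaller amount of code, which may be acceptable when n is only known at runtime anyway.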
The data consists of two columns
A B
A C
A D
B A
B C
B D
B E
C A
C B
C D
C E
D A
D B
D C
D E
E B
E C
E D
In the first row, think of it as A is friends with B, etc.
How do I find their common friends?
(A,B) -> (C D)
Meaning A and B have common friends C and D. I came as close as doing a groupByKey with the following result.
(B,CompactBuffer(A, C, D, E))
(A,CompactBuffer(B, C, D))
(C,CompactBuffer(A, B, D, E))
(E,CompactBuffer(B, C, D))
(D,CompactBuffer(A, B, C, E))
The code:
val rdd: RDD[String] = spark.sparkContext.textFile("twocols.txt")
val splitrdd: RDD[(String, String)] = rdd.map { s =>
val str = s.split(" ")
(str(0), str(1))
}
val group: RDD[(String, Iterable[String])] = splitrdd.groupByKey()
group.foreach(println)
First swap the elements:
val swapped = splitrdd.map(_.swap)
Then self-join and swap back:
val shared = swapped.join(swapped).map(_.swap)
Finally filter out duplicates (if needed) and groupByKey:
shared.filter { case ((x, y), _) => x < y }.groupByKey
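The same swap/self-join/regroup logic can be sketched on plain Scala collections, so the transformation can be checked without a Spark cluster (this just mirrors the shape of the RDD version; a groupBy stands in for the self-join):

```scala
// (person, friend) edges, as in the question's two-column data.
val edges = Seq(
  "A" -> "B", "A" -> "C", "A" -> "D",
  "B" -> "A", "B" -> "C", "B" -> "D", "B" -> "E",
  "C" -> "A", "C" -> "B", "C" -> "D", "C" -> "E",
  "D" -> "A", "D" -> "B", "D" -> "C", "D" -> "E",
  "E" -> "B", "E" -> "C", "E" -> "D"
)
val swapped = edges.map(_.swap)          // (friend, person)
val byFriend = swapped.groupBy(_._1)     // what the self-join sees per key
val shared = for {
  (friend, ps) <- byFriend.toSeq
  (_, x) <- ps
  (_, y) <- ps if x < y                  // keep x < y to drop duplicates
} yield ((x, y), friend)
val common = shared.groupBy(_._1).map { case (pair, fs) => pair -> fs.map(_._2).sorted }
// e.g. common(("A", "B")) is the common friends of A and B
```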
This is just an ugly attempt:
Suppose you have converted your two columns into Array[Array[String]] (or List[List[String]], it's really the same), say
val pairs=Array(
Array("A","B"),
Array("A","C"),
Array("A","D"),
Array("B","A"),
Array("B","C"),
Array("B","D"),
Array("B","E"),
Array("C","A"),
Array("C","B"),
Array("C","D"),
Array("C","E"),
Array("D","A"),
Array("D","B"),
Array("D","C"),
Array("D","E"),
Array("E","B"),
Array("E","C"),
Array("E","D")
)
Define the group for which you want to find their common friends:
val group=Array("C","D")
The following will find the friends for each member in your group
val friendsByMemberOfGroup=group.map(
i => pairs.filter(x=> x(1) contains i)
.map(x=>x(0))
)
For example, pairs.filter(x=>x(1) contains "C").map(x=>x(0)) returns the friends of "C" where "C" is being taken from the second column and its friends are taken from the first column:
scala> pairs.filter(x=> x(1) contains "C").map(x=>x(0))
res212: Array[String] = Array(A, B, D, E)
And the following loop will find the common friends of all the members in your group
var commonFriendsOfGroup=friendsByMemberOfGroup(0).toSet
for(i <- 1 to friendsByMemberOfGroup.size-1){
commonFriendsOfGroup=
commonFriendsOfGroup.intersect(friendsByMemberOfGroup(i).toSet)
}
So you get
scala> commonFriendsOfGroup.toArray
res228: Array[String] = Array(A, B, E)
If you change your group to val group=Array("A","B","E") and apply the previous lines then you will get
scala> commonFriendsOfGroup.toArray
res230: Array[String] = Array(C, D)
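As an aside, the mutable intersection loop above can be collapsed into a single reduce over the friend sets (shown here with the friend lists computed for group Array("C","D") in the example):

```scala
// Intersect all friend sets without a var: map each member's friends to a Set
// and fold with intersect.
val friendsByMemberOfGroup = Array(Array("A", "B", "D", "E"), Array("A", "B", "C", "E"))
val commonFriends = friendsByMemberOfGroup.map(_.toSet).reduce(_ intersect _)
```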
Continuing from where you left off:
val group: RDD[(String, Iterable[String])] = splitrdd.groupByKey()
val group_map = group.collectAsMap
val common_friends = group
.flatMap{case (x, friends) =>
friends.map{y =>
((x,y),group_map.get(y).get.toSet.intersect(friends.toSet))
}
}
scala> common_friends.foreach(println)
((B,A),Set(C, D))
((B,C),Set(A, D, E))
((B,D),Set(A, C, E))
((B,E),Set(C, D))
((D,A),Set(B, C))
((D,B),Set(A, C, E))
((D,C),Set(A, B, E))
((D,E),Set(B, C))
((A,B),Set(C, D))
((A,C),Set(B, D))
((A,D),Set(B, C))
((C,A),Set(B, D))
((C,B),Set(A, D, E))
((C,D),Set(A, B, E))
((C,E),Set(B, D))
((E,B),Set(C, D))
((E,C),Set(B, D))
((E,D),Set(B, C))
Note: this assumes your data has the relationship in both directions like in your example: (A B and B A). If it's not the case you need to add some code to deal with the fact that group_map.get(y) might return None.
So I ended up doing this on the client side. DO NOT DO THIS
val arr: Array[(String, Iterable[String])] = group.collect()
//arr.foreach(println)
var arr2 = scala.collection.mutable.Set[((String, String), List[String])]()
for (i <- arr)
for (j <- arr)
if (i != j) {
val s1 = i._2.toSet
val s2 = j._2.toSet
val s3 = s1.intersect(s2).toList
//println(s3)
val pair = if (i._1 < j._1) (i._1, j._1) else (j._1, i._1)
arr2 += ((pair, s3))
}
arr2.foreach(println)
The result is
((B,E),List(C, D))
((A,C),List(B, D))
((A,B),List(C, D))
((A,D),List(B, C))
((B,D),List(A, C, E))
((C,D),List(A, B, E))
((B,C),List(A, D, E))
((C,E),List(B, D))
((D,E),List(B, C))
((A,E),List(B, C, D))
I am wondering if I can do this using transformations within Spark.
Postgres doesn't accept all kinds of symbols that ScalaCheck's arbString generates. Is there a way to generate human-readable strings with ScalaCheck?
If you take a look at the Gen object you can see a few generators, including alphaChar and identifier.
scala> import org.scalacheck.Gen._
import org.scalacheck.Gen._
scala> identifier.sample
res0: Option[String] = Some(vxlgvihQeknhe4PolpsJas1s0gx3dmci7z9i2pkYlxhO2vdrkqpspcaUmzrxnnb)
scala> alphaChar.sample
res1: Option[Char] = Some(f)
scala> listOf(alphaChar).sample
res2: Option[List[Char]] = Some(List(g, n, x, Y, h, a, c, e, a, j, B, d, m, a, r, r, Z, a, z, G, e, i, i, v, n, Z, x, z, t))
scala> listOf(alphaChar).map(_.mkString).sample
res3: Option[String] = Some(oupwJfqmmqebcsqbtRxzmgnJvdjzskywZiwsqnkzXttLqydbaahsfrjqdyyHhdaNpinvnxinhxhjyzvehKmbuejaeozytjyoyvb)
You can do so by adding a case class ReadableChar(c: Char), and defining an instance of arbitrary for it. Maybe something like
case class ReadableChar(c: Char)
implicit val arbReadable: Arbitrary[ReadableChar] = Arbitrary {
val legalChars = 'a' to 'z' // inclusive; Range('a', 'z') would exclude 'z'
for {
c <- Gen.oneOf(legalChars)
} yield ReadableChar(c)
}
Then you can use the instance for Arbitrary[Array[ReadableChar]] to generate an array of readable chars and turn it into a string via .map(_.c).mkString.
This works if you want to define "human readable strings" by the chars they are allowed to contain. If you need additional restrictions, you can write a second case class ReadableString(s: String) and define an instance of Arbitrary for it, too.
I want to do k-fold cross validation. Essentially we are given a bunch of data allData. Suppose we partition our input into "k" clusters and put them in groups.
The desired output is a trainAndTestDataList: List[(Iterable[T], Iterable[T])], where the List is of size "k". The "i"th element of the trainAndTestDataList is a tuple like (A, B), where A should be the "i"th element of groups and B should be all elements of groups except the "i"th one, concatenated.
Any ideas on implementing this efficiently?
val allData: Iterable[T] = ... // we get the data from somewhere
val groupSize = Math.ceil(allData.size.toDouble / k).toInt // toDouble avoids integer division
val groups = allData.grouped(groupSize).toList
val trainAndTestDataList = ... // fill out this part
One thing to keep in mind is that allData can be very long, whereas "k" is very small (say 5). So it is crucial to keep the data vectors lazy, as Iterators/Iterables (and not materialized as List, Seq, etc.).
Update: Here is how I did it (and I am not happy about it):
val trainAndTestDataList = {
(0 until k).map{ fold =>
val (a,b) = groups.zipWithIndex.partition{case (g, idx) => idx == fold}
(a.unzip._1.flatten.toIterable, b.unzip._1.flatten.toIterable)
}
}
Reasons I don't like it:
It's too twisted, especially after the partition, where I do an unzip, then ._1, then flatten. I think one should be able to do a better job.
Although a is an Iterable[T], the output of a.unzip._1.flatten is a List[T], I think. This is no good, since the number of elements in this list might be very large.
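The same folds can be built more directly, skipping the partition/unzip dance entirely (a sketch; `folds` is a hypothetical helper name):

```scala
// For each index i, group i is the test fold and the remaining groups,
// flattened, are the training data.
def folds[T](groups: List[List[T]]): List[(Iterable[T], Iterable[T])] =
  groups.indices.map { i =>
    val test: Iterable[T]  = groups(i)
    val train: Iterable[T] = groups.indices.filter(_ != i).flatMap(groups)
    (test, train)
  }.toList
```

Note this still materializes the training data per fold; the answers below address keeping everything lazy.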
You could try this operation:
implicit class TeeSplitOp[T](data: Iterable[T]) {
def teeSplit(count: Int): Stream[(Iterable[T], Iterable[T])] = {
val size = data.size
def piece(i: Int) = i * size / count
Stream.range(0, count) map { i =>
val (prefix, rest) = data.splitAt(piece(i))
val (test, postfix) = rest.splitAt(piece(i + 1) - piece(i))
val train = prefix ++ postfix
(test, train)
}
}
}
This split will be as lazy as splitAt and ++ are within your collection type.
You can try it with
(1 to 10).teeSplit(3).force
I believe this should work. It also takes care of the randomization (don't neglect this!) in a reasonably efficient manner, i.e. O(n) instead of O(n log(n)) required for the more naive approach using a random shuffle/permutation of the data.
import scala.util.Random
def testTrainDataList[T](
data: Seq[T],
k: Int,
seed: Long = System.currentTimeMillis()
): Seq[(Iterable[T], Iterable[T])] = {
def createKeys(n: Int, k: Int) = {
val groupSize = n/k
val rem = n % k
val cumCounts = Array.tabulate(k){ i =>
if (i < rem) (i + 1)*(groupSize + 1) else (i + 1)*groupSize + rem
}
val rng = new Random(seed)
for (count <- n to 1 by -1) yield {
val j = rng.nextInt(count)
val i = cumCounts.iterator.zipWithIndex.find(_._1 > j).map(_._2).get
for (s <- i until k) cumCounts(s) -= 1
}
}
val keys = createKeys(data.length, k)
for (i <- 0 until k) yield {
val testIterable = new Iterable[T] {
def iterator = (keys.iterator zip data.iterator).filter(_._1 == i).map(_._2)
}
val trainIterable = new Iterable[T] {
def iterator = (keys.iterator zip data.iterator).filter(_._1 != i).map(_._2)
}
(testIterable, trainIterable)
}
}
Note the way I define testIterable and trainIterable. This makes your test/train sets lazy and non-memoized, which I gathered is what you wanted.
Example usage:
val data = 'a' to 'z'
for (((testData, trainData), index) <- testTrainDataList(data, 4).zipWithIndex) {
println(s"index = $index")
println("test: " + testData.mkString(", "))
println("train: " + trainData.mkString(", "))
}
//index = 0
//test: i, l, o, q, v, w, y
//train: a, b, c, d, e, f, g, h, j, k, m, n, p, r, s, t, u, x, z
//
//index = 1
//test: a, d, e, h, n, r, z
//train: b, c, f, g, i, j, k, l, m, o, p, q, s, t, u, v, w, x, y
//
//index = 2
//test: b, c, m, t, u, x
//train: a, d, e, f, g, h, i, j, k, l, n, o, p, q, r, s, v, w, y, z
//
//index = 3
//test: f, g, j, k, p, s
//train: a, b, c, d, e, h, i, l, m, n, o, q, r, t, u, v, w, x, y, z
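The lazy, non-memoized Iterable trick used in that answer can be illustrated in isolation: wrapping an iterator-producing expression in `new Iterable` means each traversal recomputes the pipeline and nothing is cached.

```scala
// Minimal sketch of a lazy, non-memoized Iterable: every call to .iterator
// re-runs the filter over the source; no intermediate collection is stored.
val data = 1 to 10
val evens = new Iterable[Int] {
  def iterator = data.iterator.filter(_ % 2 == 0)
}
// evens can be traversed any number of times; each traversal recomputes
```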
I have a Scala Map:
x: [b,c]
y: [b,d,e]
z: [d,f,g,h]
I want inverse of this map for look-up.
b: [x,y]
c: [x]
d: [y,z] and so on.
Is there a way to do it without using in-between mutable maps?
If it's not a multimap, then the following works:
typeMap.flatMap { case (k, v) => v.map(vv => (vv, k)) }
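A quick check of that one-liner, including its caveat: because the result is a plain Map, a value that appears under several keys keeps only the last key encountered, which is why it only suits the non-multimap case.

```scala
// Inversion via flatMap: each (key, values) entry becomes one (value, key)
// pair per value. Duplicate values would silently overwrite earlier keys.
val typeMap = Map("x" -> Set("b", "c"), "y" -> Set("d"))
val inverted = typeMap.flatMap { case (k, v) => v.map(vv => (vv, k)) }
```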
EDIT: fixed answer to include what Marth rightfully pointed out. My answer is a bit more lengthy than his, as I try to go through each step and avoid the magic provided by flatMap for educational purposes; his is more straightforward :)
I'm unsure about your notation. I assume that what you have is something like:
val myMap = Map[T, Set[T]] (
x -> Set(b, c),
y -> Set(b, d, e),
z -> Set(d, f, g, h)
)
You can achieve the reverse lookup as follows:
val instances = for {
keyValue <- myMap.toList
value <- keyValue._2
}
yield (value, keyValue._1)
At this point, your instances variable is a List of the type:
(b, x), (c, x), (b, y) ...
If you now do:
val groupedLookups = instances.groupBy(_._1)
You get:
b -> ((b, x), (b, y)),
c -> ((c, x)),
d -> ((d, y), (d, z)) ...
Now we want to reduce the values so that they only contain the second part of each pair. Therefore we do:
val reverseLookup = groupedLookups.map(kv => kv._1 -> kv._2.map(_._2))
Which means that for every pair we maintain the original key, but we map the list of arguments to something that only has the second value of the pair.
And there you have your result.
(You can also avoid assigning to an intermediate result, but I thought it was clearer like this)
Here is my simplification as a function:
def reverseMultimap[T1, T2](map: Map[T1, Seq[T2]]): Map[T2, Seq[T1]] =
map.toSeq
.flatMap { case (k, vs) => vs.map((_, k)) }
.groupBy(_._1)
.mapValues(_.map(_._2))
The above was derived from #Diego Martinoia's answer, corrected and reproduced below in function form:
def reverseMultimap[T1, T2](myMap: Map[T1, Seq[T2]]): Map[T2, Seq[T1]] = {
val instances = for {
keyValue <- myMap.toList
value <- keyValue._2
} yield (value, keyValue._1)
val groupedLookups = instances.groupBy(_._1)
val reverseLookup = groupedLookups.map(kv => kv._1 -> kv._2.map(_._2))
reverseLookup
}
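A usage sketch of the same function, with .map in place of .mapValues (which returns a lazy view and is deprecated on Scala 2.13):

```scala
// Invert a multimap: emit (value, key) pairs, group by value, keep the keys.
def reverseMultimap[T1, T2](map: Map[T1, Seq[T2]]): Map[T2, Seq[T1]] =
  map.toSeq
    .flatMap { case (k, vs) => vs.map((_, k)) }
    .groupBy(_._1)
    .map { case (v, pairs) => v -> pairs.map(_._2) }

val m = Map("x" -> Seq("b", "c"), "y" -> Seq("b", "d"))
// reverseMultimap(m) maps "b" to both "x" and "y", and "c" to just "x"
```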