Transform Array[Seq[(Int, String)]] to Seq[(Int, String)] in Scala

I'm pretty new to Scala and I can't find a way to flatten my Array[Seq[(Int, String)]] into one big Seq[(Int, String)] containing the (Int, String) pairs of each Seq[(Int, String)].
Here is a more explicit example:
Array[Seq[(Int, String)]]:
ArrayBuffer((1,a), (1,group), (1,of))
ArrayBuffer((2,following), (2,clues))
ArrayBuffer((3,three), (3,girls))
And here is what I want my Seq[(Int, String)] to look like:
Seq((1,a), (1,group), (1,of), (2,following), (2,clues), (3,three), (3,girls))

You are looking for flatten: val flat: Array[(Int, String)] = originalArray.flatten
If you want it to be a Seq rather than an Array (good choice), just tack a .toSeq on the end: originalArray.flatten.toSeq
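For example, a minimal self-contained sketch (the name originalArray is assumed, matching the answer above):
val originalArray: Array[Seq[(Int, String)]] = Array(
  Seq((1, "a"), (1, "group"), (1, "of")),
  Seq((2, "following"), (2, "clues")),
  Seq((3, "three"), (3, "girls"))
)
val flat: Seq[(Int, String)] = originalArray.flatten.toSeq
// Seq((1,a), (1,group), (1,of), (2,following), (2,clues), (3,three), (3,girls))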

Related

Scala Spark reduceByKey with a custom function

I want to use reduceByKey, but when I try to use it, it shows the error:
type mismatch; required: Nothing
Question: how can I create a custom function for reduceByKey?
{(key,value)}
key:string
value: map
example:
rdd = {("a", "weight"->1), ("a", "weight"->2)}
expect{("a"->3)}
def combine(x: mutable.map[string,Int],y:mutable.map[string,Int]):mutable.map[String,Int]={
x.weight = x.weithg+y.weight
x
}
rdd.reducebykey((x,y)=>combine(x,y))
Let's say you have an RDD[(K, V)] (or PairRDD[K, V] to be more accurate) and you want to somehow combine values with the same key; then you can use reduceByKey, which expects a function (V, V) => V and gives you the modified RDD[(K, V)] (or PairRDD[K, V]).
Here, your rdd = {("a", "weight"->1), ("a", "weight"->2)} is not real Scala, and similarly the whole combine function is wrong both syntactically and logically (it will not compile). But I am guessing that what you have is something like the following,
val rdd = sc.parallelize(List(
  ("a", "weight" -> 1),
  ("a", "weight" -> 2)
))
This means your rdd is of type RDD[(String, (String, Int))] (a PairRDD[String, (String, Int)]), so reduceByKey wants a function of type ((String, Int), (String, Int)) => (String, Int).
def combine(x: (String, Int), y: (String, Int)): (String, Int) =
  (x._1, x._2 + y._2)
val rdd2 = rdd.reduceByKey(combine)
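Collecting rdd2 should then give the summed weight (a sketch, assuming the parallelized input above):
rdd2.collect()
// Array(("a", ("weight", 3)))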
If your problem is something else, then please update the question with real code, so that others can actually understand it.

How to access a value of a Scala tuple

I have a sequence of tuples, each holding a value and its square:
val fields3: Seq[(Int, Int)] = Seq((3, 9), (5, 25))
What I want to know is whether there is a way to access a value of the same tuple directly when I create the object, without using a foreach:
val fields3: Seq[(Int, Int)] = Seq((3, 3 * 3 ), (5, 5 * 5))
My idea is something like this:
val fields3: Seq[(Int, Int)] = Seq((3, _1 * _1 ), (5, _1 * _1)) //like this doesn't compile
You can do something like this:
Seq(2,3,4).map(i => (i, i*i))
You could wrap the tuple in a case class potentially:
case class TupleInt(base: Int) {
  val tuple: (Int, Int) = (base, base * base)
}
Then you could create the sequence like this:
val fields3: Seq[(Int, Int)] = Seq(TupleInt(3), TupleInt(5)).map(_.tuple)
I would prefer the answer #geek94 gave; this is too verbose for what you want to do.
An equally valid way to express this is:
val fields3: Seq[(Int, Int)] = Seq(3, 5).map(i => i -> i*i)

Parallelize a collection in the Spark Scala shell

I am trying to parallelize a tuple and am getting the error below. Please let me know what is wrong with the syntax.
Thank you
The parallelize method needs a Seq; each item in the Seq will be one record.
def parallelize[T](seq: Seq[T],
numSlices: Int = defaultParallelism)
(implicit arg0: ClassTag[T]): RDD[T]
In your example, you need to wrap the tuple in a Seq, and in this case the RDD has only ONE record:
scala> val rdd = sc.parallelize(Seq(("100", List("5", "-4", "2", "NA", "-1"))))
rdd: org.apache.spark.rdd.RDD[(String, List[String])] = ParallelCollectionRDD[2] at parallelize at <console>:24
scala> rdd.count
res4: Long = 1
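If you want one record per tuple instead, pass a Seq containing several tuples (a sketch with made-up data, assuming the same shell session):
val rdd2 = sc.parallelize(Seq(
  ("100", List("5", "-4", "2", "NA", "-1")),
  ("200", List("3", "7", "NA"))
))
rdd2.count // 2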

How to return a Map in Scala

I just started playing with Scala and I ran into the following issue. I want to simply return a Map with Int as the key and a List of tuples as the values. This is my method:
def findOpenTiles(board: Array[Array[Int]]): Map[Int, List[(Int, Int)]] = {
  val openTilesMap = Map[Int, List[(Int, Int)]]
  for (x <- 0 until Constant.boardWidth; y <- 0 until Constant.boardHeight) yield {
    if (hasOpenTile(board, x, y)) {
      // add values to openTilesMap
    }
  }
  openTilesMap
}
However, my IDE shows an error:
Expression of type (Seq[(Int, List[Int, Int])]) => Map[Int, List[(Int, Int)]] doesn't conform to expected type Map[Int, List[(Int, Int)]]
Does it mean that val openTilesMap = Map[Int, List[(Int, Int)]] creates a Seq of tuples (Int, List[(Int, Int)]) instead of a Map? If so, how can I make it return a Map?
// edit
I'm trying to write a bot for a JavaScript game. I'm mapping a board of tiles. In the mentioned method I am trying to find all "open tiles" (tiles which are not fully surrounded by other tiles and thus can be moved), and in the return I would like a Map where the key is a tile number and the values are its coordinates. In the next step I want to find out whether a path exists between "open" tiles with the same number.
I think the problem is the line
val openTilesMap = Map[Int, List[(Int, Int)]]
You should try this:
val openTilesMap: Map[Int, List[(Int, Int)]] = Map()
Your version refers to the Map.apply method itself (a function from a Seq of pairs to a Map, which is exactly what the error message shows) rather than constructing an empty Map and assigning it to openTilesMap.
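A fuller sketch of how the method could build the Map immutably (assuming hasOpenTile, Constant.boardWidth, and Constant.boardHeight from the question, and that board(x)(y) holds the tile number):
def findOpenTiles(board: Array[Array[Int]]): Map[Int, List[(Int, Int)]] = {
  val open = for {
    x <- 0 until Constant.boardWidth
    y <- 0 until Constant.boardHeight
    if hasOpenTile(board, x, y)
  } yield (board(x)(y), (x, y)) // tile number paired with its coordinates
  // group the coordinates by tile number
  open.groupBy(_._1).map { case (tile, pairs) => tile -> pairs.map(_._2).toList }
}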

How to add optional entries to Map in Scala?

Suppose I am adding an optional entry of type Option[(Int, String)] to a Map[Int, String]:
def foo(oe: Option[(Int, String)], map: Map[Int, String]) = oe.fold(map)(map + _)
Now I wonder how to add a few optional entries:
def foo(oe1: Option[(Int, String)],
oe2: Option[(Int, String)],
oe3: Option[(Int, String)],
map: Map[Int, String]): Map[Int, String] = ???
How would you implement it ?
As I mention in a comment above, Scala provides an implicit conversion (option2Iterable) that allows you to use Option as a collection of one or zero objects in the context of other types in the collection library.
This has some annoying consequences, but it does provide the following nice syntax for your operation:
def foo(oe1: Option[(Int, String)],
oe2: Option[(Int, String)],
oe3: Option[(Int, String)],
map: Map[Int, String]): Map[Int, String] = map ++ oe1 ++ oe2 ++ oe3
This works because the ++ on Map takes a GenTraversableOnce[(A, B)], and the Iterable that you get from option2Iterable is a subtype of GenTraversableOnce.
There are lots of variations on this approach. You could also write map ++ Seq(oe1, oe2, oe3).flatten, for example. I find that less clear, and it involves the creation of an extra collection, but if you like it, go for it.
map ++ Seq(oe1, oe2, oe3).flatten
If the number of optional entries is variable, I would use variable-length arguments (varargs):
def foo(map: Map[Int, String], os: Option[(Int, String)]*) = map ++ os.flatten
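Hypothetical usage of the varargs version (names and values invented for illustration):
val base = Map(1 -> "one")
foo(base, Some(2 -> "two"), None, Some(3 -> "three"))
// Map(1 -> "one", 2 -> "two", 3 -> "three")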