How to get n items sequentially in a Scala sequence - scala

Is there a way to get n items sequentially until the end in Scala? Like get (1..n), get (n+1..2n), get (2n+1..3n), ...
An example when n is 3:
val items = Seq(1,2,3,4,5,6,7,8)
items
.myFunc(3)
.map(subSeq => println(subSeq))
Output:
[1,2,3]
[4,5,6]
[7,8]
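One way to get this behaviour is the standard library's `grouped` method, which plays the role of the hypothetical `myFunc` above. The following is a minimal sketch; the object name `GroupedExample` is only there for illustration.

```scala
object GroupedExample {
  val items = Seq(1, 2, 3, 4, 5, 6, 7, 8)

  // grouped(n) yields consecutive chunks of at most n elements;
  // the last chunk holds whatever is left over.
  val chunks: List[Seq[Int]] = items.grouped(3).toList
}
```

`grouped` returns an `Iterator`, so the chunks are produced lazily; `toList` is used here only to materialise them. For side effects such as printing, `foreach` is the usual choice rather than `map`.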

Related

Split rdd and access subgroup of elements

I would like to split each line of my RDD on commas and access a predefined set of elements.
For example, I have an RDD like this:
a, b, c, d
e, f, g, h
and I need to split then access the first and fourth element on the first line and the second and third element on the second line to get this resulting RDD:
a, d
f, g
I can't hard-code "1" and "4" in my code, which is why a solution like this won't work:
rdd.map { line => val words = line.split(","); (words(0), words(3)) }
Let's assume I have a second RDD with the same number of lines, which contains the positions of the elements I want to get for each line:
1,4
2,3
Is there a way to get my elements?
If you have a second RDD that already has the numbers of the groups you want for each line, you could zip them.
From Spark docs:
<U> RDD<scala.Tuple2<T,U>> zip(RDD<U> other, scala.reflect.ClassTag<U> evidence$13)
Zips this RDD with another one, returning key-value pairs with the first element in each RDD, second element in each RDD, etc.
So in your example, a, b, c, d would be in a key-value pair with 1,4 and e, f, g, h with 2,3. So you could do something like:
val groupNumbers = lettersRDD zip numbersRDD
groupNumbers.map { tuple =>
  val numbers: Seq[Int] = ??? // get the numbers from tuple._2, as 0-based indices
  val words = tuple._1.split(",")
  (words(numbers.head), words(numbers(1)))
}
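To make the zip idea concrete, here is a minimal sketch that uses plain Scala collections in place of the two RDDs, so it runs without Spark. It assumes the positions are 1-based, as in the question, and that every position is valid for its line. The helper name `select` and the object name are hypothetical.

```scala
object ZipSelect {
  // Hypothetical helper: pick the 1-based positions listed in `positions`
  // (e.g. "1,4") out of a comma-separated line (e.g. "a, b, c, d").
  def select(line: String, positions: String): Seq[String] = {
    val words = line.split(",").map(_.trim)
    positions.split(",").map(_.trim.toInt).map(p => words(p - 1)).toSeq
  }

  // Stand-ins for the two RDDs; they must have the same number of lines.
  val letters = Seq("a, b, c, d", "e, f, g, h")
  val numbers = Seq("1,4", "2,3")

  // Pair each line with its positions, then select the requested words.
  val result: Seq[Seq[String]] =
    letters.zip(numbers).map { case (line, pos) => select(line, pos) }
}
```

With real RDDs the same `select` function could be passed to `map` after `zip`. Note that `RDD.zip` requires both RDDs to have the same number of partitions and the same number of elements in each partition.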

Scala Shuffle A List Randomly And repeat it

I want to shuffle a Scala list randomly.
I know I can do this using scala.util.Random.shuffle.
But each call gives me a new ordering. What I really want is for the shuffle to give me the same output in some cases. How can I achieve that?
Basically, I want to shuffle a list randomly at first and then repeat that shuffle in the same order: the first time I generate the ordering randomly, and then, based on some parameter, I repeat the same shuffling.
Use setSeed() to seed the generator before shuffling. Then if you want to repeat a shuffle reuse the seed.
For example:
scala> util.Random.setSeed(41L) // some seed chosen for no particular reason
scala> util.Random.shuffle(Seq(1,2,3,4))
res0: Seq[Int] = List(2, 4, 1, 3)
That shuffled: 1st -> 3rd, 2nd -> 1st, 3rd -> 4th, 4th -> 2nd
Now we can repeat that same shuffle pattern.
scala> util.Random.setSeed(41L) // same seed
scala> util.Random.shuffle(Seq(2,4,1,3)) // result of previous shuffle
res1: Seq[Int] = List(4, 3, 2, 1)
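A variation on the same idea, sketched below, builds a dedicated generator from the seed instead of reseeding the global `scala.util.Random` object, so other code that uses the global generator is unaffected. The helper name `shuffleWith` is hypothetical.

```scala
import scala.util.Random

object SeededShuffle {
  // Hypothetical helper: shuffle with a fresh generator built from `seed`,
  // so the global Random's state is neither read nor changed.
  def shuffleWith[A](seed: Long, xs: Seq[A]): Seq[A] =
    new Random(seed).shuffle(xs)

  val first  = shuffleWith(41L, Seq(1, 2, 3, 4))
  val second = shuffleWith(41L, Seq(1, 2, 3, 4)) // same seed, same order
}
```

The exact ordering produced for a given seed may differ between Scala versions, so it is safer to rely only on the two results being equal to each other.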
Let a be the seed parameter.
Let b be how many times you want to shuffle.
There are two ways to do this.
You can call scala.util.Random.setSeed(a), where a can be any integer (a Long). After you finish shuffling b times, set the seed to a again, and the next b shuffles will come out in the same order as the first b.
The other way is to shuffle the list of indices (0 until arr.size) b times, save the results as a nested list or vector, and then map each shuffled index list onto your collection:
val arr = List("Bob", "Knight", "John")
val randomer = (0 until b).map(_ => scala.util.Random.shuffle(arr.indices.toList))
randomer.map(x => x.map(y => arr(y)))
You can reuse the same randomer for any other list of the same size that you want to shuffle, by mapping it in the same way.
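The index-shuffling approach can be sketched for a single stored permutation as follows. The second list `ages` and the helper `applyPermutation` are hypothetical and only illustrate reusing one ordering across lists of the same size.

```scala
import scala.util.Random

object ReusablePermutation {
  // Apply a stored permutation of indices to any sequence of matching size.
  def applyPermutation[A](perm: Seq[Int], xs: Seq[A]): Seq[A] =
    perm.map(i => xs(i))

  val names = Vector("Bob", "Knight", "John")
  val ages  = Vector(31, 42, 27) // hypothetical second list, same size

  // One random ordering of the indices 0, 1, 2, generated once and kept.
  val perm: Seq[Int] = Random.shuffle(names.indices.toList)

  // Both lists are reordered in exactly the same way.
  val shuffledNames = applyPermutation(perm, names)
  val shuffledAges  = applyPermutation(perm, ages)
}
```

Because the permutation is stored as data, it can be replayed at any later point without touching the generator's seed.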

Group pair of elements in a List

I have a list (in Scala).
val seqRDD = sc.parallelize(Seq(("a","b"),("b","c"),("c","a"),("d","b"),("e","c"),("f","b"),("g","a"),("h","g"),("i","e"),("j","m"),("k","b"),("l","m"),("m","j")))
I group by the second element for a particular statistic and flatten the result into one list.
val checkItOut = seqRDD.groupBy(each => (each._2))
.map(each => each._2.toList)
.collect
.flatten
.toList
The output looks like this:
checkItOut: List[(String, String)] = List((c,a), (g,a), (a,b), (d,b), (f,b), (k,b), (m,j), (b,c), (e,c), (i,e), (j,m), (l,m), (h,g))
Now, what I'm trying to do is "group" all elements (not pairs) that are connected to other elements in any pair to one list.
For example:
c is with a in one pair, a is with g in its next, so (a,c,g) are connected. Then, c is also with b and e, that b is with a, d, f, k and these are with other characters in some other pair. I want to have them in a list.
I know this can be done with a BFS traversal, but I am wondering whether there is an API in Spark that does this.
GraphX, Connected Components: http://spark.apache.org/docs/latest/graphx-programming-guide.html#connected-components
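For comparison, the BFS traversal mentioned in the question can be sketched in plain Scala, without Spark. This is not the GraphX API; it is a local, single-machine illustration of what a connected-components computation returns for the pairs above, treating each pair as an undirected edge. The object and function names are hypothetical.

```scala
import scala.annotation.tailrec

object Components {
  // Plain-Scala sketch (no Spark): group vertices into connected components via BFS.
  def connectedComponents(edges: Seq[(String, String)]): Set[Set[String]] = {
    // Undirected adjacency map: each vertex -> set of its neighbours.
    val adjacency: Map[String, Set[String]] =
      edges
        .flatMap { case (u, v) => Seq(u -> v, v -> u) }
        .groupBy(_._1)
        .map { case (k, vs) => k -> vs.map(_._2).toSet }

    // Expand the frontier level by level until no unseen vertices remain.
    @tailrec
    def bfs(frontier: Set[String], seen: Set[String]): Set[String] =
      if (frontier.isEmpty) seen
      else {
        val next = frontier.flatMap(v => adjacency(v)) -- seen
        bfs(next, seen ++ next)
      }

    // Start a new BFS from every vertex not already placed in a component.
    adjacency.keys.foldLeft(Set.empty[Set[String]]) { (components, vertex) =>
      if (components.exists(_.contains(vertex))) components
      else components + bfs(Set(vertex), Set(vertex))
    }
  }

  val pairs = Seq(("a","b"),("b","c"),("c","a"),("d","b"),("e","c"),("f","b"),("g","a"),
                  ("h","g"),("i","e"),("j","m"),("k","b"),("l","m"),("m","j"))

  val groups: Set[Set[String]] = connectedComponents(pairs)
}
```

For the data in the question this yields two groups: one containing a, b, c, d, e, f, g, h, i and k, and one containing j, l and m. For data that does not fit on one machine, the GraphX algorithm linked above is the appropriate tool.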

How to compare two list using Scala?

I have two lists
val firstList = List(("A","B",12),("P","Q",13),("L","M",21))
val secondList = List(("A",11),("P",34),("L",43))
I want output as below
val outPutList = List(("P","Q",13,34),("L","M",21,43))
I want to compare the third member of each element of firstList to the second member of the corresponding element of secondList. That is, I want to keep the entries where the secondList value (secondList.map(_._2)) is greater than the firstList value (firstList.map(_._3)).
Using a for comprehension as follows,
for ( ((a,b,m), (c,n)) <- (firstList zip secondList) if n > m) yield (a,b,m,n)
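The for comprehension above pairs elements by position, so it relies on both lists being in the same order. The sketch below runs it on the question's data and, as an assumption about what may be wanted, also shows a variant that matches on the first member ("A", "P", "L") instead of on position. The names `byPosition`, `byKey` and `lookup` are hypothetical.

```scala
object CompareLists {
  val firstList  = List(("A", "B", 12), ("P", "Q", 13), ("L", "M", 21))
  val secondList = List(("A", 11), ("P", 34), ("L", 43))

  // Positional pairing, as in the answer: the i-th elements are compared.
  val byPosition =
    for (((a, b, m), (_, n)) <- firstList zip secondList if n > m) yield (a, b, m, n)

  // Variant: look up the second list by key, so element order does not matter.
  val lookup: Map[String, Int] = secondList.toMap
  val byKey =
    for {
      (a, b, m) <- firstList
      n         <- lookup.get(a) if n > m
    } yield (a, b, m, n)
}
```

Both versions give `List(("P","Q",13,34), ("L","M",21,43))` for this input; they differ only when the lists are ordered differently or have different lengths.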

simple function to return list of integers

I am trying to write a simple function that takes a list of pairs of integers, representing a graph, and returns a list of integers: all the nodes in the graph.
E.g. if the input is [(1,2),(3,4),(5,6),(1,5)]
the output should be [1,2,3,4,5,6,1,5]
The function simply returns the list of nodes; values in the returned list may repeat, as above.
I wrote the following function
fun listofnodes ((x:int,y:int)::xs) = if xs=nil then [x::y] else [[x::y]@listofnodes(xs)]
stdIn:15.12-15.18 Error: operator and operand don't agree [tycon mismatch]
operator domain: int * int list
operand: int * int
in expression:
x :: y.
I am not able to figure out what is wrong.
First of all, you should know what each operator does:
:: puts an individual element onto the front of an existing list, so that: 1::2::3::[] = [1,2,3]
@ puts two lists together, so that: [1,2] @ [3,4] = [1,2,3,4]
You can also use :: to put a list in front of a list of lists, in which case the result is a list of lists:
[1,2] :: [[3,4]] = [[1,2],[3,4]]
So x::y only type-checks when y is a list; here y is an int, which is exactly what the error message reports. And even with a list on the right, wrapping the result in brackets as in [x::y] would produce a list inside a list.
You also shouldn't use an if expression to check for the end of the list; instead you can use patterns, like this:
fun listofnodes [] = []
  | listofnodes ((x,y)::xs) = x :: y :: listofnodes(xs);
The first pattern ensures that the recursion stops: when you extract the final tuple, xs is bound to the empty list, and the function calls itself with it, which returns an empty list onto which all the elements are consed. So [(1,2),(3,4),(5,6),(1,5)] evaluates like this:
1 :: 2 :: 3 :: 4 :: 5 :: 6 :: 1 :: 5 :: [] = [1,2,3,4,5,6,1,5]
You could also write it like this:
fun listofnodes [] = []
  | listofnodes ((x,y)::xs) = [x,y] @ listofnodes(xs);
This way you make a small two-element list out of each tuple and then append all of these small lists into one big list. You don't really need the empty list at the end, but the base case is the only way to ensure that the recursion stops at the end of the list, and you have to put something on the right-hand side of the equals sign. It evaluates like this:
[1,2] @ [3,4] @ [5,6] @ [1,5] @ [] = [1,2,3,4,5,6,1,5]
Also, you annotated x and y as ints, but you don't really have to. If you don't, the function gets the type ('a * 'a) list -> 'a list, which just means that it works for all element types, including ints (as long as both components of the tuple have the same type, so not a char and an int).
I'm guessing you know this, but in case you don't: what you call pairs, such as (1,2), are also called tuples.
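Since the rest of this page is about Scala, here is how the two definitions above might look when transliterated to Scala. This is only a sketch for comparison: the names `listOfNodes` and `listOfNodesFlat` are hypothetical, and Scala's `::` and `flatMap` stand in for the cons and append operators used in the answer.

```scala
object ListOfNodes {
  // Recursive version: mirrors the pattern-matching definition built with :: above.
  def listOfNodes(edges: List[(Int, Int)]): List[Int] = edges match {
    case Nil            => Nil                              // end of the list: stop
    case (x, y) :: rest => x :: y :: listOfNodes(rest)      // cons both nodes, recurse
  }

  // Library version: build a two-element list per pair and concatenate them all.
  def listOfNodesFlat(edges: List[(Int, Int)]): List[Int] =
    edges.flatMap { case (x, y) => List(x, y) }
}
```

Both functions return `List(1, 2, 3, 4, 5, 6, 1, 5)` for `List((1,2), (3,4), (5,6), (1,5))`. The recursive version is not tail-recursive, so for very long lists the `flatMap` version is the safer choice.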