Spark - Remove intersecting elements between two array type columns - scala

I have a dataframe like this:
+----------+--------------------+---------------------------+
|      Name|                rem1|                      quota|
+----------+--------------------+---------------------------+
|Customer_3|[258, 259, 260, 2...|[1, 2, 3, 4, 5, 6, 7,..500]|
|Customer_4|[18, 19, 20, 27, ...|[1, 2, 3, 4, 5, 6, 7,..500]|
|Customer_5|[16, 17, 51, 52, ...|[1, 2, 3, 4, 5, 6, 7,..500]|
|Customer_6|[6, 7, 8, 9, 10, ...|[1, 2, 3, 4, 5, 6, 7,..500]|
|Customer_7|[0, 30, 31, 32, 3...|[1, 2, 3, 4, 5, 6, 7,..500]|
+----------+--------------------+---------------------------+
I would like to remove the values listed in rem1 from quota and store the result in a new column. I have tried:
val dfleft = dfpci_remove2.withColumn("left",$"quota".filter($"rem1"))
<console>:123: error: value filter is not a member of org.apache.spark.sql.ColumnName
Please advise.

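You can't call filter on a Column like that. You can write a UDF instead, as below:
import org.apache.spark.sql.functions.udf
val filterList = udf((a: Seq[Int], b: Seq[Int]) => a diff b)
// pass quota first so that the values in rem1 are removed from it
df.withColumn("left", filterList($"quota", $"rem1"))
This should give you the expected result.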
Hope this helps!

Related

Combine two different RDD's with two different sets of data but the same key

RDD_1 contains rows like the following:
(u'id2875421', 2, datetime.datetime(2016, 3, 14, 17, 24, 55), datetime.datetime(2016, 3, 14, 17, 32, 30), 1, -73.9821548461914, 40.76793670654297, -73.96463012695312, 40.765602111816406, u'N', 455)
RDD_2 contains rows like the following:
(u'id2875421', 1.9505895451732258)
What I'm trying to do is get an rdd in the form of
(u'id2875421', 2, datetime.datetime(2016, 3, 14, 17, 24, 55), datetime.datetime(2016, 3, 14, 17, 32, 30), 1, 1.9505895451732258, u'N', 455)
So I'm trying to replace the location columns with a distance column.
rdd1.join(rdd2) gives me:
(u'id1585324', (1, 0.9773030754631484))
and rdd1.union(rdd2) gives me:
(u'id2875421', 2, datetime.datetime(2016, 3, 14, 17, 24, 55), datetime.datetime(2016, 3, 14, 17, 32, 30), 1, -73.9821548461914, 40.76793670654297, -73.96463012695312, 40.765602111816406, u'N', 455)
IIUC, just convert the first RDD into a paired RDD and then join:
rdd1.keyBy(lambda x: x[0]) \
.join(rdd2) \
.map(lambda x: x[1][0][:5] + (x[1][1],) + x[1][0][9:]) \
.collect()
#[(u'id2875421',
# 2,
# datetime.datetime(2016, 3, 14, 17, 24, 55),
# datetime.datetime(2016, 3, 14, 17, 32, 30),
# 1,
# 1.9505895451732258,
# u'N',
# 455)]
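Here keyBy() turns each element of rdd1 into a (key, value) pair, with x[0] as the key and the original tuple as the value. After the join with rdd2, each element has the form (key, (row, distance)), and map() slices the row to keep its first five fields, insert the distance, and append the last two fields.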
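The slicing in the map() step can be checked without a Spark cluster. Below is a minimal sketch in plain Python, assuming the row layout shown in the question. key_by and join_pairs are hypothetical stand-ins that only mimic what RDD.keyBy and an inner RDD.join produce; they are not Spark APIs.

```python
import datetime

def key_by(rows, f):
    """Mimic RDD.keyBy: pair each element with a key computed by f."""
    return [(f(row), row) for row in rows]

def join_pairs(left, right):
    """Mimic an inner join on (key, value) pairs: emit (key, (lv, rv))."""
    right_by_key = {}
    for k, v in right:
        right_by_key.setdefault(k, []).append(v)
    return [(k, (lv, rv)) for k, lv in left for rv in right_by_key.get(k, [])]

# One sample row per data set, copied from the question.
rows1 = [('id2875421', 2,
          datetime.datetime(2016, 3, 14, 17, 24, 55),
          datetime.datetime(2016, 3, 14, 17, 32, 30),
          1, -73.9821548461914, 40.76793670654297,
          -73.96463012695312, 40.765602111816406, 'N', 455)]
rows2 = [('id2875421', 1.9505895451732258)]

joined = join_pairs(key_by(rows1, lambda x: x[0]), rows2)

# Each joined element is (key, (original_row, distance)).
# Keep fields 0-4, drop the four coordinates, insert the distance,
# then append the last two fields.
result = [row[:5] + (dist,) + row[9:] for _, (row, dist) in joined]
print(result)
```

The key itself is discarded in the last step because the id is already the first field of the original row.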

How to convert a List[List[Long]] to a List[List[Int]]?

What's the best way to convert a List[List[Long]] to a List[List[Int]] in Scala?
For example, given the following list of type List[List[Long]]
val l: List[List[Long]] = List(List(11, 10, 11, 10, 11), List(8, 19, 24, 0, 2))
how can it be converted to List[List[Int]]?
You can also use the cats library for this and compose the List functors:
import cats.Functor
import cats.implicits._
import cats.data._
val l: List[List[Long]] = List(List(11, 10, 11, 10, 11), List(8, 19, 24, 0, 2))
Functor[List].compose[List].map(l)(_.toInt)
//or
Nested(l).map(_.toInt).value
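and one more pure Scala approach. It is not safe: because of type erasure the cast itself succeeds, but the elements are still boxed Long values, so reading one as an Int later throws a ClassCastException.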
val res:List[List[Int]] = l.asInstanceOf[List[List[Int]]]
Try l.map(_.map(_.toInt)) like so
val l: List[List[Long]] = List(List(11, 10, 11, 10, 11), List(8, 19, 24, 0, 2))
l.map(_.map(_.toInt))
which should give
res2: List[List[Int]] = List(List(11, 10, 11, 10, 11), List(8, 19, 24, 0, 2))
Do this only if you are completely sure that the values fit in an Int; toInt silently truncates anything outside the Int range.
val l1: List[List[Long]] = List(List(11, 10, 11, 10, 11), List(8, 19, 24, 0, 2))
val l2: List[List[Int]] = l1.map(list => list.map(long => long.toInt))
(Basically, every time you want to transform a List into another List, use map).
This can be achieved with a simple transformation on the collection using the map function.
map works by applying a function to each element in the list. In your case the lists are nested, so you need to apply map twice, as in the example below:
val x : List[List[Long]] = List(List(11, 10, 11, 10, 11), List(8, 19, 24, 0, 2))
println(x)
val y :List[List[Int]]= x.map(a => a.map(_.toInt))
println(y)
Output:
List(List(11, 10, 11, 10, 11), List(8, 19, 24, 0, 2))
List(List(11, 10, 11, 10, 11), List(8, 19, 24, 0, 2))

Can you merge two Flux, without blocking, such that the result only contains unique elements?

Is there a way to merge two Flux such that the result only contains unique elements? I can block on the output and then convert it to a set, but is there a way that does not depend on blocking?
Source (Kotlin)
val set1 = Flux.just(1, 2, 3, 4, 5)
val set2 = Flux.just(2, 4, 6, 8, 10)
val mergedSet = set1.mergeWith(set2)
println(mergedSet.collectList().block())
Output
[1, 2, 3, 4, 5, 2, 4, 6, 8, 10]
Desired Output (order is not important)
[1, 2, 3, 4, 5, 6, 8, 10]
You can use Flux's merge method and then apply distinct() to the result:
Flux.merge(Flux.just(1, 2, 3, 4, 5), Flux.just(2, 4, 6, 8, 10)).distinct();
This way you get a flux which produces only distinct values.

Unable to replicate shuffle in an array in Swift

I have two arrays that I want to shuffle. These are the two arrays:
var allCards = ["2_of_clubs", "2_of_spades", "2_of_diamonds", "2_of_hearts", "3_of_clubs", "3_of_spades", "3_of_diamonds", "3_of_hearts", "4_of_clubs", "4_of_spades", "4_of_diamonds", "4_of_hearts", "5_of_clubs", "5_of_spades", "5_of_diamonds", "5_of_hearts", "6_of_clubs", "6_of_spades", "6_of_diamonds", "6_of_hearts", "7_of_clubs", "7_of_spades","7_of_diamonds","7_of_hearts", "8_of_clubs", "8_of_spades", "8_of_diamonds", "8_of_hearts", "9_of_clubs", "9_of_spades", "9_of_diamonds", "9_of_hearts", "10_of_clubs", "10_of_spades", "10_of_diamonds", "10_of_hearts", "jack_of_clubs", "jack_of_spades", "jack_of_diamonds", "jack_of_hearts", "queen_of_clubs", "queen_of_spades", "queen_of_diamonds", "queen_of_hearts", "king_of_clubs", "king_of_spades", "king_of_diamonds", "king_of_hearts", "ace_of_clubs", "ace_of_spades", "ace_of_diamonds", "ace_of_hearts"]
var allValues = [2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14]
I want to shuffle them identically, so that the value 2 stays with 2 of clubs, 2 of spades and so on. I tried using the answers from "Shuffle array swift 3" and "How do I shuffle an array in Swift?", which stated this should work:
let randomIndex = UInt64(arc4random_uniform(UInt32(1000)))
let randomShuffle = GKLinearCongruentialRandomSource(seed: randomIndex)
let shuffledValues = randomShuffle.arrayByShufflingObjects(in: allValues)
let shuffledCards = randomShuffle.arrayByShufflingObjects(in: allCards)
print(shuffledValues)
print(shuffledCards)
This prints:
[3, 6, 5, 5, 9, 10, 11, 11, 8, 6, 5, 3, 14, 12, 3, 8, 2, 3, 10, 4, 13, 12, 7, 12, 10, 5, 12, 13, 14, 11, 2, 6, 9, 7, 10, 14, 7, 8, 6, 14, 4, 9, 13, 2, 11, 9, 4, 7, 8, 2, 13, 4]
[jack_of_clubs, 6_of_hearts, 10_of_hearts, 6_of_spades, king_of_hearts, 5_of_spades, 5_of_hearts, ace_of_diamonds, queen_of_diamonds, 10_of_spades, 7_of_hearts, queen_of_spades, 9_of_clubs, 2_of_diamonds, 3_of_hearts, 3_of_diamonds, 9_of_spades, queen_of_clubs, 8_of_clubs, 9_of_diamonds, 7_of_clubs, 3_of_spades, 8_of_spades, 8_of_hearts, 5_of_clubs, 6_of_diamonds, ace_of_spades, 2_of_spades, ace_of_clubs, 10_of_diamonds, 4_of_spades, 2_of_clubs, 10_of_clubs, king_of_diamonds, 7_of_diamonds, 6_of_clubs, 8_of_diamonds, queen_of_hearts, 9_of_hearts, jack_of_diamonds, 2_of_hearts, king_of_clubs, jack_of_spades, 4_of_hearts, 7_of_spades, 3_of_clubs, 4_of_diamonds, 4_of_clubs, king_of_spades, jack_of_hearts, ace_of_hearts, 5_of_diamonds]
Both arrays have the same count, so I am curious why this does not work. Is it possible to edit this code to make it work? If not, I would like to know how to shuffle one array and replicate the same shuffle on the other.
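The random source is stateful: after it shuffles allValues it continues from where it left off, so the second call produces a different permutation. Instead, you can pair up your elements with zip, then shuffle, then unzip.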
let pairs = Array(zip(allCards, allValues))
let randomIndex = UInt64(arc4random_uniform(UInt32(1000)))
let randomShuffle = GKLinearCongruentialRandomSource(seed: randomIndex)
let shuffledPairs = randomShuffle.arrayByShufflingObjects(in: pairs) as! [(String, Int)]
let shuffledCards = shuffledPairs.map { $0.0 }
let shuffledValues = shuffledPairs.map { $0.1 }
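The zip, shuffle, unzip idea does not depend on GameplayKit. Here is a minimal sketch of the same technique, written in Python rather than Swift purely as an illustration. The four-card deck and the seed 42 are assumptions made to keep the example short.

```python
import random

# A shortened deck for illustration; the question uses all 52 cards.
cards = ["2_of_clubs", "2_of_spades", "3_of_clubs", "ace_of_hearts"]
values = [2, 2, 3, 14]

rng = random.Random(42)  # 42 is an arbitrary seed chosen for this sketch

# Zip: tie each card to its value so they move together.
pairs = list(zip(cards, values))

# Shuffle the pairs once, in place.
rng.shuffle(pairs)

# Unzip: split the shuffled pairs back into two parallel lists.
shuffled_cards = [card for card, _ in pairs]
shuffled_values = [value for _, value in pairs]

for card, value in zip(shuffled_cards, shuffled_values):
    print(card, value)
```

Because only one shuffle happens, each card keeps its value no matter which permutation the generator picks.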

FoldRight function scala

I have this function that uses foldRight to append two lists:
def append[T](l1: List[T], l2: List[T]): List[T] = (l1 :\ l2) ((a,b) => a::b)
Running it in Scala gives:
val l1 = List(1,2,3,4,5)
val l2 = List(6,7,8,9,10)
println(append(l1,l2))
Result: List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Since the fold starts from the right and moves left, shouldn't the result come out in the opposite order? Why is it returned this way?
foldRight executes from right to left, and each step prepends the current element a to the accumulated list b, so the iteration is:
1: a is 5, b is 6, 7, 8, 9, 10, result is 5, 6, 7, 8, 9, 10
2: a is 4, b is 5, 6, 7, 8, 9, 10, result is 4, 5, 6, 7, 8, 9, 10
...
final result is 1, 2, 3, ..., 8, 9, 10
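The steps above can be reproduced with a small trace. This is a sketch in Python, not Scala: fold_right and prepend are hypothetical helpers that mirror foldRight and :: for illustration only.

```python
def fold_right(items, init, op):
    """Right fold: computes op(x1, op(x2, ... op(xn, init)))."""
    acc = init
    for item in reversed(items):
        acc = op(item, acc)
    return acc

def prepend(a, b):
    """Counterpart of Scala's a :: b for Python lists."""
    return [a] + b

steps = []

def traced_prepend(a, b):
    """Prepend a to b and record the step for printing."""
    out = prepend(a, b)
    steps.append((a, b, out))
    return out

l1 = [1, 2, 3, 4, 5]
l2 = [6, 7, 8, 9, 10]
appended = fold_right(l1, l2, traced_prepend)

for a, b, out in steps:
    print(f"a is {a}, b is {b}, result is {out}")
print(appended)
```

The last element of l1 is prepended first and the first element last, which is why the elements end up in their original order in front of l2.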