I executed the following code:
temp = rdd.map( lambda p: ( p[0], (p[1],p[2],p[3],p[4],p[5]) ) ).groupByKey().mapValues(list).collect()
print(temp)
and I could get data:
[ ("A", [("a", 1, 2, 3, 4), ("b", 2, 3, 4, 5), ("c", 4, 5, 6, 7)]) ]
I'm trying to turn the list in the second position into a dictionary.
For example, I want to reshape temp into this format:
("A", {"a": [1, 2, 3, 4], "b":[2, 3, 4, 5], "c":[4, 5, 6, 7]})
Is there any clear way to do this?
If I understood you correctly, you need something like this:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
data = [
["A", "a", 1, 2, 5, 6],
["A", "b", 3, 4, 6, 9],
["A", "c", 7, 5, 6, 0],
]
rdd = spark.sparkContext.parallelize(data)
temp = (
rdd.map(lambda x: (x[0], ({x[1]: [x[2], x[3], x[4], x[5]]})))
.groupByKey()
.mapValues(list)
.mapValues(lambda x: {k: v for y in x for k, v in y.items()})
)
print(temp.collect())
# [('A', {'a': [1, 2, 5, 6], 'b': [3, 4, 6, 9], 'c': [7, 5, 6, 0]})]
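As a quick local check that does not need a Spark session, the same reshaping can be sketched in plain Python. The helper name `rows_to_nested` is hypothetical, and the sketch assumes each row holds the group key first, the inner key second, and the values after that.

```python
from collections import defaultdict


def rows_to_nested(rows):
    """Group rows by row[0]; map row[1] to the list of remaining values."""
    grouped = defaultdict(dict)
    for row in rows:
        # Split each row into outer key, inner key, and the trailing values.
        outer_key, inner_key, *values = row
        grouped[outer_key][inner_key] = values
    # Return (key, dict) pairs, mirroring the shape of the collected RDD.
    return list(grouped.items())


data = [
    ["A", "a", 1, 2, 5, 6],
    ["A", "b", 3, 4, 6, 9],
    ["A", "c", 7, 5, 6, 0],
]
result = rows_to_nested(data)
print(result)
```

This runs on the driver only, so it is meant for verifying small samples rather than replacing the RDD pipeline.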
This is easily doable with a custom Python function once you obtain the temp object. You just need to use tuple, list and dict manipulation.
def my_format(l):
    # get tuple inside list
    tup = l[0]
    # create dictionary with key equal to first value of each sub-tuple
    dct = {}
    for e in tup[1]:
        dct2 = {e[0]: list(e[1:])}
        dct.update(dct2)
    # combine first element of list with dictionary
    return (tup[0], dct)
my_format(temp)
# ('A', {'a': [1, 2, 3, 4], 'b': [2, 3, 4, 5], 'c': [4, 5, 6, 7]})
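Note that my_format only reads the first group (l[0]). If temp can hold several keys, a list comprehension covers all of them. The following is a sketch only; the name `format_all` and the second group "B" are made up for illustration.

```python
def format_all(groups):
    """Turn [(key, [(name, v1, v2, ...), ...]), ...]
    into [(key, {name: [v1, v2, ...]}), ...]."""
    return [
        (key, {row[0]: list(row[1:]) for row in rows})
        for key, rows in groups
    ]


temp = [
    ("A", [("a", 1, 2, 3, 4), ("b", 2, 3, 4, 5), ("c", 4, 5, 6, 7)]),
    ("B", [("d", 0, 0, 0, 1)]),  # hypothetical extra group, not in the question
]
formatted = format_all(temp)
print(formatted)
```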
I was given a list of apps along with their ratings:
let appRatings = [
"Calendar Pro": [1, 5, 5, 4, 2, 1, 5, 4],
"The Messenger": [5, 4, 2, 5, 4, 1, 1, 2],
"Socialise": [2, 1, 2, 2, 1, 2, 4, 2]
]
I want to write a func that takes appRatings as input and returns each app's name and average rating, like this:
["Calendar Pro": 3,
"The Messenger": 3,
"Socialise": 2]
Does anyone know how to implement a method that takes (name, [rating]) as input and outputs (name, avgRating), using a closure inside the func?
This is what I have so far.
func calculate( appName: String, ratings : [Int]) -> (String, Double ) {
let avg = ratings.reduce(0,+)/ratings.count
return (appName, Double(avg))
}
Fundamentally, what you're trying to achieve is a mapping from one set of values to another. Dictionary has a method for this, Dictionary.mapValues(_:), which transforms only the values and keeps them under the same keys.
let appRatings = [
"Calendar Pro": [1, 5, 5, 4, 2, 1, 5, 4],
"The Messenger": [5, 4, 2, 5, 4, 1, 1, 2],
"Socialise": [2, 1, 2, 2, 1, 2, 4, 2]
]
let avgAppRatings = appRatings.mapValues { allRatings in
return computeAverage(of: allRatings) // Dummy function we'll implement later
}
So now, it's a matter of figuring out how to average all the numbers in an Array. Luckily, this is very easy:
We need to sum all the ratings
We can easily achieve this with a reduce expression. We'll reduce all numbers by simply adding them into the accumulator, which will start at 0:
allRatings.reduce(0, { accumulator, rating in accumulator + rating })
From here, we can notice that the closure, { accumulator, rating in accumulator + rating }, has type (Int, Int) -> Int, and just adds the numbers together. Well hey, that's exactly what + does! We can just use it directly:
allRatings.reduce(0, +)
We need to divide the ratings by the number of ratings
There's a catch here. In order for the average to be of any use, it can't be truncated to a mere Int. So we need both the sum and the count to be converted to Double first.
You also need to guard against empty arrays: their sum and count are both 0, so the division would be 0.0 / 0.0, which yields Double.nan rather than a meaningful average.
Putting it all together, we get:
let appRatings = [
"Calendar Pro": [1, 5, 5, 4, 2, 1, 5, 4],
"The Messenger": [5, 4, 2, 5, 4, 1, 1, 2],
"Socialise": [2, 1, 2, 2, 1, 2, 4, 2]
]
let avgAppRatings = appRatings.mapValues { allRatings -> Double? in
if allRatings.isEmpty { return nil }
return Double(allRatings.reduce(0, +)) / Double(allRatings.count)
}
Add in some nice printing logic:
extension Dictionary {
var toDictionaryLiteralString: String {
return """
[
\t\(self.map { k, v in "\(k): \(v)" }.joined(separator: "\n\t"))
]
"""
}
}
... and boom:
print(avgAppRatings.toDictionaryLiteralString)
/* prints:
[
Socialise: 2.0
The Messenger: 3.0
Calendar Pro: 3.375
]
*/
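To double-check the numbers above independently of Swift, here is a Python sketch of the same sum-then-divide logic. It contrasts the floating-point average with the truncating integer division that the answer warns about. The names `average`, `averages` and `truncated` are illustrative only.

```python
app_ratings = {
    "Calendar Pro": [1, 5, 5, 4, 2, 1, 5, 4],
    "The Messenger": [5, 4, 2, 5, 4, 1, 1, 2],
    "Socialise": [2, 1, 2, 2, 1, 2, 4, 2],
}


def average(ratings):
    """Return the floating-point mean, or None for an empty list."""
    if not ratings:
        return None
    return sum(ratings) / len(ratings)


# Floating-point averages keep the fractional part (27 / 8 = 3.375).
averages = {name: average(r) for name, r in app_ratings.items()}

# Integer division truncates instead (27 // 8 = 3).
truncated = {name: sum(r) // len(r) for name, r in app_ratings.items()}

print(averages)
print(truncated)
```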
Comments on your attempt
You had some questions as to why your attempt didn't work:
func calculate( appName: String, ratings : [Int]) -> (String: Int ) {
var avg = ratings.reduce(0,$0+$1)/ratings.count
return appName: sum/avg
}
$0+$1 isn't within a closure ({ }), as it needs to be.
appName: sum/avg isn't valid Swift.
The variable sum doesn't exist.
avg is a var variable, even though it's never mutated. It should be a let constant.
You're doing integer division, which discards the fractional part. You'll need to convert your sum and count into a floating-point type, like Double, first.
A fixed version might look like:
func calculateAverage(of numbers: [Int]) -> Double {
let sum = Double(numbers.reduce(0, +))
let count = Double(numbers.count)
return sum / count
}
To make a function that processes your whole dictionary, incorporating my solution above, you might write something like:
func calculateAveragesRatings(of appRatings: [String: [Int]]) -> [String: Double?] {
return appRatings.mapValues { allRatings in
if allRatings.isEmpty { return nil }
return Double(allRatings.reduce(0, +)) / Double(allRatings.count)
}
}
This is a simple solution that takes into account that a rating is an integer:
let appRatings = [
"Calendar Pro": [1, 5, 5, 4, 2, 1, 5, 4],
"The Messenger": [5, 4, 2, 5, 4, 1, 1, 2],
"Socialise": [2, 1, 2, 2, 1, 2, 4, 2]
]
let appWithAverageRating: [String: Int] = appRatings.mapValues { $0.reduce(0, +) / $0.count}
print("appWithAverageRating =", appWithAverageRating)
prints appWithAverageRating = ["The Messenger": 3, "Calendar Pro": 3, "Socialise": 2]
If you'd like to check whether an app has enough ratings before returning an average rating, then the rating would be an optional Int:
let minimumNumberOfRatings = 0 // You can change this
var appWithAverageRating: [String: Int?] = appRatings.mapValues { ratingsArray in
guard ratingsArray.count > minimumNumberOfRatings else {
return nil
}
return ratingsArray.reduce(0, +) / ratingsArray.count
}
If you'd like the ratings to go by half stars (0, 0.5, 1, ..., 4.5, 5) then we could use this extension:
extension Double {
func roundToHalf() -> Double {
let n = 1/0.5
let numberToRound = self * n
return numberToRound.rounded() / n
}
}
Then the rating will be an optional Double. Let's add an AppWithoutRatings and test our code:
let appRatings = [
"Calendar Pro": [1, 5, 5, 4, 2, 1, 5, 4],
"The Messenger": [5, 4, 2, 5, 4, 1, 1, 2],
"Socialise": [2, 1, 2, 2, 1, 2, 4, 2],
"AppWithoutRatings": []
]
let minimumNumberOfRatings = 0
var appWithAverageRating: [String: Double?] = appRatings.mapValues { ratingsArray in
guard ratingsArray.count > minimumNumberOfRatings else {
return nil
}
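// Convert to Double before dividing so the fractional part survives until rounding
let rating = Double(ratingsArray.reduce(0, +)) / Double(ratingsArray.count)
return rating.roundToHalf()
}
print("appWithAverageRating =", appWithAverageRating)
And this prints (dictionary order may vary):
appWithAverageRating = ["Calendar Pro": Optional(3.5), "Socialise": Optional(2.0), "The Messenger": Optional(3.0), "AppWithoutRatings": nil]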
I decided to make a Dictionary extension for this, so it is easy to reuse in the future.
Here is the code:
extension Dictionary where Key == String, Value == [Float] {
func averageRatings() -> [String : Float] {
// Calculate average
func average(ratings: [Float]) -> Float {
return ratings.reduce(0, +) / Float(ratings.count)
}
// Go through every item in the ratings dictionary
return self.mapValues { $0.isEmpty ? 0 : average(ratings: $0) }
}
}
let appRatings: [String : [Float]] = ["Calendar Pro": [1, 5, 5, 4, 2, 1, 5, 4],
"The Messenger": [5, 4, 2, 5, 4, 1, 1, 2],
"Socialise": [2, 1, 2, 2, 1, 2, 4, 2]]
print(appRatings.averageRatings())
which will print the result of ["Calendar Pro": 3.375, "Socialise": 2.0, "The Messenger": 3.0].
Just to make the post complete, here is another approach that uses reduce(into:) to avoid a dictionary with an optional value type:
extension Dictionary where Key == String, Value: Collection, Value.Element: BinaryInteger {
var averageRatings: [String : Value.Element] {
return reduce(into: [:]) {
if !$1.value.isEmpty {
$0[$1.key] = $1.value.reduce(0,+) / Value.Element($1.value.count)
}
}
}
}
let appRatings2 = ["Calendar Pro" : [1, 5, 5, 4, 2, 1, 5, 4],
"The Messenger": [5, 4, 2, 5, 4, 1, 1, 2],
"Socialise" : [2, 1, 2, 2, 1, 2, 4, 2] ]
let keySorted = appRatings2.averageRatings.sorted(by: {$0.key<$1.key})
keySorted.forEach { print($0.key, $0.value) }
Calendar Pro 3
Socialise 2
The Messenger 3
I have a nested map like so:
val m: Map[Int, Map[String, Seq[Int]]] =
Map(
1 -> Map(
"A" -> Seq(1, 2, 3),
"B" -> Seq(4, 5, 6)
),
2 -> Map(
"C" -> Seq(7, 8, 9),
"D" -> Seq(10, 11, 12),
"E" -> Seq(13, 14, 15)
),
3 -> Map(
"F" -> Seq(16, 17, 18)
)
)
I want the desired output to show every possible combination of the integers in the Seqs. For example:
List((1, "A", 1),
(1, "A", 2),
(1, "A", 3),
(1, "B", 4),
(1, "B", 5),
(1, "B", 6),
(2, "C", 7),
(2, "C", 8),
(2, "C", 9),
(2, "D", 10),
(2, "D", 11),
(2, "D", 12),
(2, "E", 13),
(2, "E", 14),
(2, "E", 15),
(3, "F", 16),
(3, "F", 17),
(3, "F", 18))
I have been trying different combinations of map and flatMap, but nothing has been working. Any ideas?
Here is a possibility using a for comprehension:
for {
(k1, v1) <- m
(k2, v2) <- v1
v3 <- v2
} yield (k1, k2, v3)
This iterates over all top-level key/value pairs of m. For each top-level value, it iterates over the nested key/value pairs. Finally, for each nested value (a Seq), it goes through each element and yields the requested tuple.
A for comprehension is essentially equivalent to nested flatMap calls ending in a map, such as:
m.flatMap{
case (k1, v1) => v1.flatMap {
case (k2, v2) => v2.map(v3 => (k1, k2, v3))
}
}
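For comparison, the same three-level traversal can be sketched in Python as a single nested comprehension, which reads in the same order as the for comprehension above. The dictionary below mirrors m from the question; the name `triples` is illustrative.

```python
m = {
    1: {"A": [1, 2, 3], "B": [4, 5, 6]},
    2: {"C": [7, 8, 9], "D": [10, 11, 12], "E": [13, 14, 15]},
    3: {"F": [16, 17, 18]},
}

# Outer key, then inner key, then each element of the innermost list.
triples = [
    (k1, k2, v3)
    for k1, inner in m.items()
    for k2, seq in inner.items()
    for v3 in seq
]
print(triples)
```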
NSMutableArray *sample;
I have an NSMutableArray and I want to split it into chunks. I searched online but didn't find a solution; I only found a link about splitting an integer array.
How about this, which is more Swifty?
let integerArray = [1,2,3,4,5,6,7,8,9,10]
let stringArray = ["a", "b", "c", "d", "e", "f"]
let anyObjectArray: [Any] = ["a", 1, "b", 2, "c", 3]
extension Array {
func chunks(_ chunkSize: Int) -> [[Element]] {
return stride(from: 0, to: self.count, by: chunkSize).map {
Array(self[$0..<Swift.min($0 + chunkSize, self.count)])
}
}
}
integerArray.chunks(2) //[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
stringArray.chunks(3) //[["a", "b", "c"], ["d", "e", "f"]]
anyObjectArray.chunks(2) //[["a", 1], ["b", 2], ["c", 3]]
To Convert NSMutableArray to Swift Array:
let nsarray = NSMutableArray(array: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
if let swiftArray = nsarray as NSArray as? [Int] {
swiftArray.chunks(2) //[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
}
If you insist on using NSArray, then:
let nsarray = NSMutableArray(array: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
extension NSArray {
func chunks(_ chunkSize: Int) -> [[Any]] {
return stride(from: 0, to: self.count, by: chunkSize).map {
self.subarray(with: NSRange(location: $0, length: Swift.min(chunkSize, self.count - $0)))
}
}
}
nsarray.chunks(3) //[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
You can use the subarray method.
let array = NSArray(array: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
let left = array.subarray(with: NSMakeRange(0, 5))
let right = array.subarray(with: NSMakeRange(5, 5))
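The stride-and-slice idea used in the answers above is language independent. As a rough sketch in Python (the function name `chunks` simply mirrors the Swift extension, and the check for a non-positive size is an added assumption):

```python
def chunks(items, chunk_size):
    """Split items into consecutive slices of at most chunk_size elements."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the start indices 0, chunk_size, 2 * chunk_size, ...
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


numbers = list(range(1, 11))
print(chunks(numbers, 2))
print(chunks(numbers, 3))  # the last chunk is shorter
```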
I have an RDD of student grades. I need to group the records by the first column (university) and then show the average number of students per course, like this. What is the easiest way to write this query?
+----------+---------------+
|university|avg of students|
+----------+---------------+
|       MIT|              3|
| Cambridge|           2.66|
+----------+---------------+
Here is the dataset.
case class grade(university: String, courseId: Int, studentId: Int, grade: Double)
val grades = List(
grade("Cambridge", 1, 1001, 4),
grade("Cambridge", 1, 1004, 4),
grade("Cambridge", 2, 1006, 3.5),
grade("Cambridge", 2, 1004, 3.5),
grade("Cambridge", 2, 1002, 3.5),
grade("Cambridge", 3, 1006, 3.5),
grade("Cambridge", 3, 1007, 5),
grade("Cambridge", 3, 1008, 4.5),
grade("MIT", 1, 1001, 4),
grade("MIT", 1, 1002, 4),
grade("MIT", 1, 1003, 4),
grade("MIT", 1, 1004, 4),
grade("MIT", 1, 1005, 3.5),
grade("MIT", 2, 1009, 2))
1) First groupBy university
2) then get course count per university
3) then groupBy courseId
4) then get student count per course
grades.groupBy(_.university).map { case (k, v) =>
val courseCount = v.map(_.courseId).distinct.length
val studentCountPerCourse = v.groupBy(_.courseId).map { case (k, v) => v.length }.sum
k -> (studentCountPerCourse.toDouble / courseCount.toDouble)
}
Scala REPL
scala> val grades = List(
grade("Cambridge", 1, 1001, 4),
grade("Cambridge", 1, 1004, 4),
grade("Cambridge", 2, 1006, 3.5),
grade("Cambridge", 2, 1004, 3.5),
grade("Cambridge", 2, 1002, 3.5),
grade("Cambridge", 3, 1006, 3.5),
grade("Cambridge", 3, 1007, 5),
grade("Cambridge", 3, 1008, 4.5),
grade("MIT", 1, 1001, 4),
grade("MIT", 1, 1002, 4),
grade("MIT", 1, 1003, 4),
grade("MIT", 1, 1004, 4),
grade("MIT", 1, 1005, 3.5),
grade("MIT", 2, 1009, 2))
// grades: List[grade] = List(...)
scala> grades.groupBy(_.university).map { case (k, v) =>
val courseCount = v.map(_.courseId).distinct.length
val studentCountPerCourse = v.groupBy(_.courseId).map { case (k, v) => v.length }.sum
k -> (studentCountPerCourse.toDouble / courseCount.toDouble)
}
// res2: Map[String, Double] = Map("MIT" -> 3.0, "Cambridge" -> 2.6666666666666665)
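The arithmetic behind those two numbers can be checked with a small Python sketch: count students per (university, course), then divide the total by the number of distinct courses. The tuple layout and the name `avg_students_per_course` are assumptions made for this illustration.

```python
from collections import defaultdict

# (university, course_id, student_id, grade), copied from the question
grades = [
    ("Cambridge", 1, 1001, 4.0), ("Cambridge", 1, 1004, 4.0),
    ("Cambridge", 2, 1006, 3.5), ("Cambridge", 2, 1004, 3.5),
    ("Cambridge", 2, 1002, 3.5), ("Cambridge", 3, 1006, 3.5),
    ("Cambridge", 3, 1007, 5.0), ("Cambridge", 3, 1008, 4.5),
    ("MIT", 1, 1001, 4.0), ("MIT", 1, 1002, 4.0), ("MIT", 1, 1003, 4.0),
    ("MIT", 1, 1004, 4.0), ("MIT", 1, 1005, 3.5), ("MIT", 2, 1009, 2.0),
]


def avg_students_per_course(rows):
    """Return {university: students / distinct courses}."""
    counts = defaultdict(lambda: defaultdict(int))
    for university, course_id, _student_id, _grade in rows:
        counts[university][course_id] += 1
    return {
        university: sum(per_course.values()) / len(per_course)
        for university, per_course in counts.items()
    }


result = avg_students_per_course(grades)
print(result)
```

MIT has 6 enrolments over 2 courses and Cambridge has 8 over 3, which matches the REPL output.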
gradesRdd.map({ case Grade(university: String, courseId: Int, studentId: Int, gpa: Int) =>
((university),(courseId))}).mapValues(x => (x, 1))
.reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))
.mapValues(y => 1.0 * y._1 / y._2).collect
res73: Array[(String, Double)] = Array((Cambridge,2.125), (MIT,1.1666666666666667))