I am using dropWhile in Scala; below is my problem.
Problem:
val list = List(87, 44, 5, 4, 200)
list.dropWhile(_ < 100) should be(/*result*/)
My Answer:
val list = List(87, 44, 5, 4, 200)
list.dropWhile(_ < 100) should be(List(44,5,4,200))
As per the documentation, dropWhile continually drops elements until the predicate is no longer satisfied.
In my list the first element satisfies the predicate, so I removed the first element from the list.
val list = List(87, 44, 5, 4, 200)
list.dropWhile(_ < 100) should be(/*result*/)
I am expecting a result of List(44, 5, 4, 200), but that is not the answer.
You are kind of going in the wrong direction. The head of the list is 87, the next element is 44, and so on. dropWhile will keep dropping elements from the front of the list until it hits that 200. If you initialize the list with more elements to the right of the 200, say
val list = List(87, 44, 5, 4, 200, 54, 60)
Then list.dropWhile(_ < 100) will return List(200, 54, 60).
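A minimal sketch of the behavior (the extended list is from the example above; the object name is just for illustration):

```scala
object DropWhileDemo extends App {
  val list = List(87, 44, 5, 4, 200, 54, 60)

  // dropWhile removes elements from the front only while the predicate holds;
  // it stops for good at the first failing element (200), even though
  // 54 and 60 would also satisfy _ < 100.
  println(list.dropWhile(_ < 100)) // List(200, 54, 60)

  // Contrast with filter, which tests every element independently:
  println(list.filter(_ >= 100)) // List(200)
}
```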
I'm updating a variable within a for loop in Scala that I need outside the loop. I tested the code below and got the message "ERROR: undefined", even though the variable is not empty and the values are printed inside the loop. Thank you.
val example = List(0, 0, 1, 0.7, 10, 2, 5, 7, 4, 1, -9, 0, 0, 0, 0, 3, 3, 0, 0, 0, -80, -6.6, -1, 0)
var b = scala.collection.mutable.MutableList.empty[Double]
var b_val: Double = 0
for (i <- 1 to 24) {
  if (example(i) != 0) { b_val = b_val + example(i) } else { b_val = 0 }
  b += b_val
}
println(b)
You're getting an error because example only has 24 items, so its valid indices are 0 through 23. Therefore example(i) will throw an IndexOutOfBoundsException when i reaches 24.
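As a quick sanity check (a sketch using the question's list; `indices` is the standard-library way to get the valid index range):

```scala
object IndexDemo extends App {
  val example = List(0, 0, 1, 0.7, 10, 2, 5, 7, 4, 1, -9, 0, 0, 0, 0, 3, 3, 0, 0, 0, -80, -6.6, -1, 0)

  // Valid indices run from 0 to example.length - 1, i.e. 0 to 23, so the
  // loop `for (i <- 1 to 24)` skips index 0 and overruns the end.
  println(example.indices) // the range 0 until 24

  // Iterating over example.indices instead never overruns:
  for (i <- example.indices) print(example(i) + " ")
}
```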
You are also trying to add b_val, which is a Double, to elements typed as AnyVal, which won't work.
So, to add them you need to cast the elements of the list to Double as below, or define example as List[Double]. Your iteration also won't work: you loop with 1 to 24, but list indices run from 0 to 23, so it blows up at the end.
val example = List(0, 0, 1, 0.7, 10, 2, 5, 7, 4, 1, -9, 0, 0, 0, 0, 3, 3, 0, 0, 0, -80, -6.6, -1, 0)
var b = scala.collection.mutable.MutableList.empty[Double]
var b_val: Double = 0
for (i <- 0 until example.length) {
  if (example(i) != 0) {
    b_val = b_val + example(i).asInstanceOf[Double]
  } else {
    b_val = 0
  }
  b += b_val
}
println(b)
Result is : MutableList(0.0, 0.0, 1.0, 1.7, 11.7, 13.7, 18.7, 25.7, 29.7, 30.7, 21.7, 0.0, 0.0, 0.0, 0.0, 3.0, 6.0, 0.0, 0.0, 0.0, -80.0, -86.6, -87.6, 0.0)
But only you know what you are trying to achieve. A small refactor I would make, in a more Scala-like style:
val example = List(0, 0, 1, 0.7, 10, 2, 5, 7, 4, 1, -9, 0, 0, 0, 0, 3, 3, 0, 0, 0, -80, -6.6, -1, 0)
var b = scala.collection.mutable.MutableList.empty[Double]
var b_val: Double = 0
example.foreach { elem =>
  if (elem != 0) {
    b_val = b_val + elem.asInstanceOf[Double]
  } else {
    b_val = 0
  }
  b += b_val
}
println(b)
I think you are looking for something like this:
val b = example.scanLeft(0.0) {
  case (_, 0) => 0
  case (l, r) => l + r
}
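One detail worth noting: scanLeft also emits its initial accumulator, so the result has a leading 0.0 and 25 elements instead of 24; a drop(1) lines it up with the loop versions (a sketch on the question's data):

```scala
object ScanDemo extends App {
  val example = List(0, 0, 1, 0.7, 10, 2, 5, 7, 4, 1, -9, 0, 0, 0, 0, 3, 3, 0, 0, 0, -80, -6.6, -1, 0)

  // Running sum that resets to 0 on every 0 element; drop(1) removes the seed.
  val b = example.scanLeft(0.0) {
    case (_, 0) => 0
    case (l, r) => l + r
  }.drop(1)

  println(b) // same 24 values as the MutableList result, ending ... -86.6, -87.6, 0.0
}
```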
If you are going to be writing Scala code anyway, learn to do it the Scala way. There is no point otherwise.
I have user data from the MovieLens ML-100K dataset.
Sample rows are:
1|24|M|technician|85711
2|53|F|other|94043
3|23|M|writer|32067
4|24|M|technician|43537
5|33|F|other|15213
I have read the data as an RDD as follows:
scala> val user_data = sc.textFile("/home/user/Documents/movielense/ml-100k/u.user").map(x=>x.split('|'))
user_data: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[5] at map at <console>:29
scala> user_data.take(5)
res0: Array[Array[String]] = Array(Array(1, 24, M, technician, 85711), Array(2, 53, F, other, 94043), Array(3, 23, M, writer, 32067), Array(4, 24, M, technician, 43537), Array(5, 33, F, other, 15213))
# encode distinct professions with zipWithIndex -
scala> val indexed_profession = user_data.map(x=>x(3)).distinct().sortBy[String](x=>x).zipWithIndex()
indexed_profession: org.apache.spark.rdd.RDD[(String, Long)] = ZippedWithIndexRDD[18] at zipWithIndex at <console>:31
scala> indexed_profession.collect()
res1: Array[(String, Long)] = Array((administrator,0), (artist,1), (doctor,2), (educator,3), (engineer,4), (entertainment,5), (executive,6), (healthcare,7), (homemaker,8), (lawyer,9), (librarian,10), (marketing,11), (none,12), (other,13), (programmer,14), (retired,15), (salesman,16), (scientist,17), (student,18), (technician,19), (writer,20))
I want to do one hot encoding for Occupation column.
Expected output is -
userId Age Gender Occupation Zipcodes technician other writer
1 24 M technician 85711 1 0 0
2 53 F other 94043 0 1 0
3 23 M writer 32067 0 0 1
4 24 M technician 43537 1 0 0
5 33 F other 15213 0 1 0
How do I achieve this on an RDD in Scala?
I want to perform the operation on the RDD without converting it to a DataFrame.
Any help is appreciated. Thanks.
I did this in the following way:
1) Read user data -
scala> val user_data = sc.textFile("/home/user/Documents/movielense/ml-100k/u.user").map(x=>x.split('|'))
2) show 5 rows of data-
scala> user_data.take(5)
res0: Array[Array[String]] = Array(Array(1, 24, M, technician, 85711), Array(2, 53, F, other, 94043), Array(3, 23, M, writer, 32067), Array(4, 24, M, technician, 43537), Array(5, 33, F, other, 15213))
3) Create map of profession by indexing-
scala> val indexed_profession = user_data.map(x=>x(3)).distinct().sortBy[String](x=>x).zipWithIndex().collectAsMap()
scala> indexed_profession
res35: scala.collection.Map[String,Long] = Map(scientist -> 17, writer -> 20, doctor -> 2, healthcare -> 7, administrator -> 0, educator -> 3, homemaker -> 8, none -> 12, artist -> 1, salesman -> 16, executive -> 6, programmer -> 14, engineer -> 4, librarian -> 10, technician -> 19, retired -> 15, entertainment -> 5, marketing -> 11, student -> 18, lawyer -> 9, other -> 13)
4) Create an encode function which does one-hot encoding of the profession -
scala> def encode(x: String) =
     | {
     |   val encodeArray = Array.fill(21)(0)
     |   encodeArray(indexed_profession.get(x).get.toInt) = 1
     |   encodeArray
     | }
5) Apply encode function to user data -
scala> val encode_user_data = user_data.map{ x => (x(0),x(1),x(2),x(3),x(4),encode(x(3)))}
6) show encoded data -
scala> encode_user_data.take(6)
res71: Array[(String, String, String, String, String, Array[Int])] = Array(
  (1,24,M,technician,85711,Array(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0)),
  (2,53,F,other,94043,Array(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0)),
  (3,23,M,writer,32067,Array(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)),
  (4,24,M,technician,43537,Array(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0)),
  (5,33,F,other,15213,Array(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0)),
  (6,42,M,executive,98101,Array(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)))
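The per-row logic here does not depend on Spark, so it can be sanity-checked with plain collections. A sketch mirroring steps 3–5 on just the five sample rows (note the vector is length 3 here, not 21, because only three professions appear in the sample):

```scala
object OneHotSketch extends App {
  val rows = List(
    Array("1", "24", "M", "technician", "85711"),
    Array("2", "53", "F", "other", "94043"),
    Array("3", "23", "M", "writer", "32067"),
    Array("4", "24", "M", "technician", "43537"),
    Array("5", "33", "F", "other", "15213")
  )

  // Step 3: distinct professions, sorted, then indexed.
  val indexedProfession: Map[String, Int] =
    rows.map(_(3)).distinct.sorted.zipWithIndex.toMap
  // Map(other -> 0, technician -> 1, writer -> 2)

  // Step 4: a one-hot vector sized to the number of distinct professions.
  def encode(profession: String): Array[Int] = {
    val vec = Array.fill(indexedProfession.size)(0)
    vec(indexedProfession(profession)) = 1
    vec
  }

  // Step 5: attach the encoding to each row.
  rows.foreach(r => println(s"${r.mkString("|")} -> ${encode(r(3)).mkString(",")}"))
}
```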
[My solution is for DataFrames] The code below should help in converting a categorical map to one-hot columns. You have to create a catMap object with column names as keys and lists of categories as values.
var OutputDf = df
for (cat <- catMap.keys) {
  val categories = catMap(cat)
  for (oneHotVal <- categories) {
    OutputDf = OutputDf.withColumn(oneHotVal,
      when(lower(OutputDf(cat)) === oneHotVal, 1).otherwise(0))
  }
}
OutputDf