I am very new to Scala and functional programming, and I am stuck on a collection operation. I have a variable like this:
val res4: List[(List[Double], Option[Int])] =
List(
(List(4.0, 2.0, 3.0, 4.0, 3.0, 2.5, 4.0), Some(1998)),
(List(3.0, 4.0, 3.0, 3.0, 3.5, 2.0, 3.0, 3.0, 4.0), Some(2000)),
.......
)
I want a map, or something similar, that pairs each score in the list with its year:
(4.0, 1998),
(2.0, 1998),
(3.0, 1998),
(4.0, 1998),
(3.0, 1998),
(2.5, 1998),
(4.0, 1998),
(3.0, 1998),
....
How can I do that?
Furthermore, if you know how to transform Some(1998) into 1998, I would appreciate the tip.
You can use flatMap:
List(
(List(4.0, 2.0, 3.0, 4.0, 3.0, 2.5, 4.0), Some(1998)),
(List(3.0, 4.0, 3.0, 3.0, 3.5, 2.0, 3.0, 3.0, 4.0), Some(2000))
)
.flatMap(row => row._1.map(number => (number, row._2)))
.foreach(it => println(it))
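The flatMap above keeps the year wrapped in Some. As a minimal sketch of one way to unwrap it at the same time, you can pattern match on the Option inside flatMap. The sample rows, the None row, and the -1 placeholder are assumptions made for illustration, not part of the question's data.

```scala
// Sample rows shaped like the question's data; the None row is an added assumption.
val rows: List[(List[Double], Option[Int])] = List(
  (List(4.0, 2.0, 3.0), Some(1998)),
  (List(3.0, 4.0), Some(2000)),
  (List(1.0), None)
)

// Keep only rows that have a year, and pair each score with the unwrapped Int.
val pairs: List[(Double, Int)] = rows.flatMap {
  case (scores, Some(year)) => scores.map(score => (score, year))
  case (_, None)            => Nil
}

// Alternative: substitute a placeholder year (-1, purely illustrative)
// instead of dropping rows that have no year.
val withDefault: List[(Double, Int)] = rows.flatMap {
  case (scores, yearOpt) => scores.map(score => (score, yearOpt.getOrElse(-1)))
}
```

Matching on Some/None avoids calling .get, which would throw on a missing year.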
I'm working on a small personal project and am trying to rewrite the codebase from Python to Scala so that I can become a more competent functional programmer.
I am working with a Seq that contains stock data and need to create a running sum of volume traded for each day.
My code so far is:
import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat
case class SymbolData(date: DateTime, open: Double, high: Double, low: Double, close: Double, adjClose: Double, volume: Int)
def dateTimeHelper(date: String): DateTime = {
DateTimeFormat.forPattern("yyyy-MM-dd").parseDateTime(date)
}
val sampleData: Seq[SymbolData] = Seq(
SymbolData(dateTimeHelper("2019-01-01"), 1.0, 1.0, 1.0, 1.0, 1.0, 10),
SymbolData(dateTimeHelper("2019-01-02"), 3.0, 2.0, 5.0, 2.0, 8.0, 20),
SymbolData(dateTimeHelper("2019-01-03"), 1.0, 1.0, 1.0, 1.0, 1.0, 10),
SymbolData(dateTimeHelper("2019-01-04"), 4.0, 3.0, 2.5, 2.3, 5.3, 7))
Not all dates may be present, so I do not think a sliding window is appropriate. For the output I need a Seq of Ints containing the sum of the last 2 days of data, for example:
Seq(10, 30, 30, 17) // 2019-01-01 has only 1 day, with a sum of 10, since there is no data for 2018-12-31; 2019-01-02 is 30 since both the 1st and 2nd of Jan are present, etc.
This is not overly difficult to do in plain Python, but in Scala there seem to be quite a few options (recursive use of folds?) and I am struggling with the syntax and implementation. Would anyone be able to shed some light on this?
You say "not all dates may be present" but you don't specify how date gaps should be handled.
Here I guessed that output should include all 2-day sums, gap days included.
import java.time.LocalDate
import java.time.temporal.ChronoUnit.DAYS
case class SymbolData(date : LocalDate
,open : Double
,high : Double
,low : Double
,close : Double
,adjClose : Double
,volume : Int)
val sampleData: List[SymbolData] = List(
SymbolData(LocalDate.parse("2019-01-01"), 1.0, 1.0, 1.0, 1.0, 1.0, 10),
SymbolData(LocalDate.parse("2019-01-02"), 3.0, 2.0, 5.0, 2.0, 8.0, 20),
SymbolData(LocalDate.parse("2019-01-03"), 1.0, 1.0, 1.0, 1.0, 1.0, 10),
SymbolData(LocalDate.parse("2019-01-04"), 4.0, 3.0, 2.5, 2.3, 5.3, 7),
// 1 day gap
SymbolData(LocalDate.parse("2019-01-06"), 4.4, 3.3, 2.2, 2.3, 1.3, 13),
// 2 day gap
SymbolData(LocalDate.parse("2019-01-09"), 2.4, 2.2, 1.5, 3.1, 0.9, 21),
SymbolData(LocalDate.parse("2019-01-10"), 2.4, 2.2, 1.5, 3.1, 0.9, 11)
)
val volByDate = sampleData.foldLeft(Map.empty[LocalDate,Int]){
case (m,sd) => m + (sd.date -> sd.volume)
}.withDefaultValue(0)
val startDate = sampleData.head.date
val endDate = sampleData.last.date
val rslt = List.unfold(startDate){ date => //<--Scala 2.13
if (date isAfter endDate) None
else
Some(volByDate(date) + volByDate(date.minus(1L,DAYS)) -> date.plus(1L,DAYS))
}
//rslt: List[Int] = List(10, 30, 30, 17, 7, 13, 13, 0, 21, 32)
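If gap days should not appear in the output (the question's expected Seq(10, 30, 30, 17) has one entry per row), a simpler variant is possible. This is only a sketch under that assumption: DailyVolume is a hypothetical, trimmed-down stand-in for SymbolData, and each date is assumed to occur at most once.

```scala
import java.time.LocalDate

// Hypothetical cut-down record: only the fields the sum needs.
final case class DailyVolume(date: LocalDate, volume: Int)

val data: List[DailyVolume] = List(
  DailyVolume(LocalDate.parse("2019-01-01"), 10),
  DailyVolume(LocalDate.parse("2019-01-02"), 20),
  DailyVolume(LocalDate.parse("2019-01-03"), 10),
  DailyVolume(LocalDate.parse("2019-01-04"), 7),
  // 1 day gap
  DailyVolume(LocalDate.parse("2019-01-06"), 13)
)

// Lookup table from date to volume; missing dates count as 0.
val volumeByDate: Map[LocalDate, Int] =
  data.map(d => d.date -> d.volume).toMap.withDefaultValue(0)

// One output per input row: today's volume plus the previous calendar day's.
val twoDaySums: List[Int] =
  data.map(d => d.volume + volumeByDate(d.date.minusDays(1)))
```

Because the lookup is keyed by calendar date, a missing previous day contributes 0 instead of picking up the previous row.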
I have N maps (Map[String, Double]) each having the same set of keys. Let's say something like the following:
map1 = ("elem1": 2.0, "elem2": 4.0, "elem3": 3.0)
map2 = ("elem1": 4.0, "elem2": 1.0, "elem3": 1.0)
map3 = ("elem1": 3.0, "elem2": 10.0, "elem3": 2.0)
I need to return a new map with element-wise average of those input maps:
resultMap = ("elem1": 3.0, "elem2": 5.0, "elem3": 2.0)
What's the cleanest way to do that in Scala? Preferably without using extra external libraries.
This all happens in Spark, so any answers suggesting Spark-specific usage could be helpful.
One option is to convert all Maps to Seqs, union them to a single Seq, group by key and take the average of values:
val maps = Seq(map1, map2, map3)
maps.map(_.toSeq).reduce(_++_).groupBy(_._1).mapValues(x => x.map(_._2).sum/x.length)
// res6: scala.collection.immutable.Map[String,Double] = Map(elem1 -> 3.0, elem3 -> 2.0, elem2 -> 5.0)
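On Scala 2.13, mapValues on a Map is deprecated and returns a lazy view. A small sketch of the same idea that builds a strict Map with a plain map over the grouped entries is shown here; the map contents are copied from the question, and the names maps and averages are just illustrative.

```scala
val maps: Seq[Map[String, Double]] = Seq(
  Map("elem1" -> 2.0, "elem2" -> 4.0, "elem3" -> 3.0),
  Map("elem1" -> 4.0, "elem2" -> 1.0, "elem3" -> 1.0),
  Map("elem1" -> 3.0, "elem2" -> 10.0, "elem3" -> 2.0)
)

// Flatten all entries into one sequence, group them by key,
// then average the values collected under each key.
val averages: Map[String, Double] =
  maps
    .flatMap(_.toSeq)
    .groupBy { case (key, _) => key }
    .map { case (key, entries) => key -> entries.map(_._2).sum / entries.size }
```

Dividing by entries.size rather than maps.size means a key missing from some map is averaged only over the maps that contain it.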
Since your question is tagged with apache-spark you can get your desired output by combining the maps into RDD[Map[String, Double]] as
scala> val rdd = sc.parallelize(Seq(Map("elem1"-> 2.0, "elem2"-> 4.0, "elem3"-> 3.0),Map("elem1"-> 4.0, "elem2"-> 1.0, "elem3"-> 1.0),Map("elem1"-> 3.0, "elem2"-> 10.0, "elem3"-> 2.0)))
rdd: org.apache.spark.rdd.RDD[scala.collection.immutable.Map[String,Double]] = ParallelCollectionRDD[1] at parallelize at <console>:24
Then you can use flatMap to flatten the map entries into individual rows, group them by key with groupBy, and divide the sum of each group's values by the size of the group. You should get your desired output as
scala> rdd.flatMap(row => row).groupBy(kv => kv._1).mapValues(values => values.map(value => value._2).sum/values.size)
res0: org.apache.spark.rdd.RDD[(String, Double)] = MapPartitionsRDD[5] at mapValues at <console>:27
scala> res0.foreach(println)
(elem2,5.0)
(elem3,2.0)
(elem1,3.0)
Hope the answer is helpful
I have a Vector and want to remove elements from it. How can I do that in Scala? My input is Vector(2.0, 3.0, 0.3, 1.0, 4.0), of type Vector[Double], and I want Vector(2.0, 0.3, 4.0) as output, so I want to remove the elements at indices 1 and 3 from my input Vector.
def removeElementFromVector(input: Vector[Double]): Vector[Double] = {
val inputAsArray = input.toArray
// ...
inputAsArray.toVector
}
Yes, you can use filter to achieve it, but you first need to attach an index to each element (zipWithIndex) so that you can filter by position.
For example, given your vector (scala.collection.immutable.Vector[Double]):
scala> val v1 = Vector(2.2, 3.3, 4.4, 5.5, 6.6, 4.4)
Output: Vector(2.2, 3.3, 4.4, 5.5, 6.6, 4.4)
Now, we will remove element at index 2:
scala> var indexRemove=2
scala> val v2 = v1.zipWithIndex.filter(x => x._2!=indexRemove).map(x=>x._1).toVector
Output: Vector(2.2, 3.3, 5.5, 6.6, 4.4)
Now, we will remove element at index 3
scala> var indexRemove=3
scala> val v2 = v1.zipWithIndex.filter(x => x._2!=indexRemove).map(x=>x._1).toVector
Output: Vector(2.2, 3.3, 4.4, 6.6, 4.4)
Hope this helps.
My input is Vector(2.0, 3.0, 0.3, 1.0, 4.0), of type Vector[Double], and I want Vector(2.0, 3.0, 0.3, 4.0) as output, so I want to remove the element with the index 1 and 3 from my input Vector. Sorry that my question was not clear enough.
You can use the filter method to remove them.
> val reducedInputVector = input.filter(x => !(Array(1,3) contains input.indexOf(x)))
reducedInputVector: scala.collection.immutable.Vector[Double] = Vector(2.0, 0.3, 4.0)
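Note that indexOf(x) returns the position of the first element equal to x, so the filter above can remove the wrong elements when the vector contains duplicate values. A sketch that filters on the real position instead is shown here; removeIndices is a hypothetical helper name, not a library method.

```scala
// Remove every element whose position is in `indices`.
// zipWithIndex pairs each element with its actual position,
// so duplicate values are handled correctly.
def removeIndices[A](v: Vector[A], indices: Set[Int]): Vector[A] =
  v.zipWithIndex.collect { case (x, i) if !indices.contains(i) => x }

val input = Vector(2.0, 3.0, 0.3, 1.0, 4.0)
val reduced = removeIndices(input, Set(1, 3))

// A vector with duplicates: only the element at position 1 is dropped.
val dupes = removeIndices(Vector(1.0, 1.0, 2.0), Set(1))
```

Using a Set for the indices keeps each membership check cheap and makes removing several positions a single pass.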
I solved it with apply:
import org.apache.spark.mllib.linalg.Vectors
val vec1 = Vector(2.0,3.0,0.3,1.0, 4.0)
val vec2 = Vectors.dense(vec1.apply(0), vec1.apply(1),vec1.apply(2), vec1.apply(4))
the output is
vec1: scala.collection.immutable.Vector[Double] = Vector(2.0, 3.0, 0.3, 1.0, 4.0)
vec2: org.apache.spark.mllib.linalg.Vector = [2.0,3.0,0.3,4.0]
def deleteItem[A](row: Vector[A], item: Int): Vector[A] = {
val (a,b) = row.splitAt(item)
if (b.nonEmpty) a ++ b.tail else a
}
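As an alternative sketch to the splitAt version, the standard collection method patch can replace a one-element slice with nothing. deleteAt is a hypothetical name used only for this example.

```scala
// patch(from, other, replaced) swaps `replaced` elements starting at `from`
// for the elements of `other`; an empty replacement deletes them.
def deleteAt[A](row: Vector[A], index: Int): Vector[A] =
  row.patch(index, Nil, 1)

val original = Vector(2.0, 3.0, 0.3, 1.0, 4.0)
val withoutThird = deleteAt(original, 3)

// An index past the end leaves the vector unchanged.
val unchanged = deleteAt(original, 10)
```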
I tried to use (1,), but it doesn't work. What's the syntax to define a Tuple1 in Scala?
scala> val a=(1,)
<console>:1: error: illegal start of simple expression
val a=(1,)
For tuples with cardinality 2 or more, you can use parentheses; for cardinality 1, however, you need to use Tuple1:
scala> val tuple1 = Tuple1(1)
tuple1: (Int,) = (1,)
scala> val tuple2 = ('a', 1)
tuple2: (Char, Int) = (a,1)
scala> val tuple3 = ('a', 1, "name")
tuple3: (Char, Int, java.lang.String) = (a,1,name)
scala> tuple1._1
res0: Int = 1
scala> tuple2._2
res1: Int = 1
scala> tuple3._1
res2: Char = a
scala> tuple3._3
res3: String = name
To declare the type, use Tuple1[T], for example val t : Tuple1[Int] = Tuple1(22)
A tuple is, by definition, an ordered list of elements. While Tuple1 exists, I haven't seen it used explicitly given you'd normally use a single element. Nevertheless, there is no sugar, you need to use Tuple1(1).
There is a valid use case in Spark that requires Tuple1: create a dataframe with one column.
import org.apache.spark.ml.linalg.Vectors
val data = Seq(
Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0)
)
data.toDF("features").show()
It fails to compile with the error:
"value toDF is not a member of Seq[org.apache.spark.ml.linalg.Vector]"
To make it work, we have to convert each row to Tuple1:
val data = Seq(
Tuple1(Vectors.sparse(5, Seq((1, 1.0), (3, 7.0)))),
Tuple1(Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0)),
Tuple1(Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0))
)
or a better way:
val data = Seq(
Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0)
).map(Tuple1.apply)
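The Tuple1 wrapping can also be tried without Spark. The following is a small plain-Scala sketch; all value names are illustrative only.

```scala
// Parentheses around a single value are just grouping, so this is a plain Int.
val notATuple: Int = (1)

// An explicit one-element tuple, with its type written out.
val single: Tuple1[Int] = Tuple1(22)

// Read the element with ._1 or by pattern matching.
val first: Int = single._1
val Tuple1(extracted) = single

// Wrap every element of a collection, as with .map(Tuple1.apply) above.
val wrapped: Seq[Tuple1[String]] = Seq("a", "b").map(Tuple1.apply)
```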