I have this code:
val products = List()
def loadProducts(report: (Asset, Party, AssetModel, Location, VendingMachineReading)) = {
report match {
case (asset, party, assetModel, location, reading) =>
EvadtsParser.parseEvadts(reading.evadts, result)
(result.toMap).map(product => ReportData(
customer = party.name,
location = location.description,
asset = asset.`type`,
category = "",
product = product._1,
counter = product._2,
usage = 0,
period = "to be defined")).toList
}
}
results.foreach(result => products ::: loadProducts(result))
println(products)
Can you please tell me what I am doing wrong because products list is empty? If I println products inside loadProducts method, products is not empty. Is the concatenation I am doing wrong?
PS: I am a scala beginner.
As I've already said, ::: yields a new list instead of mutating the one you already have in place.
http://take.ms/WDB http://take.ms/WDB
You have two options to go: immutable and mutable
Here is what you can do in immutable and idiomatic style:
def loadProducts(report: (...)): List[...] = {
...
}
val products = result.flatMap(result => loadProducs(result))
println(products)
But also, you can tie with mutability and use ListBuffer to do the things you've wanted:
def loadProducts(report: (...)): List[T] = {
...
}
val buffer = scala.collection.mutable.ListBuffer[T]()
result.foreach(result => buffer ++ = loadProducs(result))
val products = buffer.toList
println(products)
P.S. flatMap( ...) is an analog to map(...).flatten, so don't be confused that I and Tomasz written it so differently.
List type is immutable, val implies a reference that never changes. Thus you can't really change the contents of products reference. I suggest building a "list of lists" first and then flattening:
val products = results.map(loadProducts).flatten
println(products)
Notice that map(loadProducts) is just a shorthand for map(loadProducts(_)), which is a shorthand for map(result => loadProducts(result)).
If you become more experienced, try foldLeft() approach, which continuously builds products list just like you wanted to do this:
results.foldLeft(List[Int]())((agg, result) => agg ++ loadProducts(result))
Related
Forgive me if I'm not naming things by their actual name, I've just started to learn Scala. I've been looking around for a while, but can not find a clear answer to my question.
Suppose I have a list of objects, each object has two fields: x: Int and l: List[String], where the Strings, in my case, represent categories.
The l lists can be of arbitrary length, so an object can belong to multiple categories. Furthermore, various objects can belong to the same category. My goal is to group the objects by the individual categories, where the categories are the keys. This means that if an object is linked to say "N" categories, it will occur in "N" of the key-value pairs.
So far I managed to groupBy the lists of categories through:
objectList.groupBy(x => x.l)
However, this obviously groups the objects by list of categories rather than by categories.
I'm trying to do this with immutable collections avoiding loops etc.
If anyone has some ideas that would be much appreciated!
Thanks
EDIT:
By request the actual case class and what I am trying.
case class Car(make: String, model: String, fuelCapacity: Option[Int], category:Option[List[String]])
Once again, a car can belong to multiple categories. Let's say List("SUV", "offroad", "family").
I want to group by category elements rather than by the whole list of categories, and have the fuelCapacity as the values, in order to be able to extract average fuelCapacity per category amongst other metrics.
Using your EDIT as a guide.
case class Car( make: String
, model: String
, fuelCapacity: Option[Int]
, category:Option[List[String]] )
val cars: List[Car] = ???
//all currently known category strings
val cats: Set[String] = cars.flatMap(_.category).flatten.toSet
//category -> list of cars in this category
val catMap: Map[String,List[Car]] =
cats.map(cat => (cat, cars.filter(_.category.contains(cat)))).toMap
//category -> average fuel capacity for cars in this category
val fcAvg: Map[String,Double] =
catMap.map{case (cat, cars) =>
val fcaps: List[Int] = cars.flatMap(_.fuelCapacity)
if (fcaps.lengthIs < 1) (cat, -1d)
else (cat, fcaps.sum.toDouble / fcaps.length)
}
Something like the following?
objectList // Seq[YourType]
.flatMap(o => o.l.map(c => c -> o)) // Seq[(String, YourType)]
.groupBy { case (c,_) => c } // Map[String,Seq[(String,YourType)]]
.mapValues { items => c -> items.map { case (_, o) => o } } // Map[String, Seq[YourType]]
(Deliberately "heavy" to help you understand the idea behind it)
EDIT, or as of Scala 2.13 thanks to groupMap:
objectList // Seq[YourType]
.flatMap(o => o.l.map(c => c -> o)) // Seq[(String, YourType)]
.groupMap { case (c,_) => c } { case (_, o) => o } // Map[String,Seq[YourType]]
You are very close, you just need to split each individual element in the list before the group so try with something like this:
// I used a Set instead of a List,
// since I don't think the order of categories matters
// as well I would think having two times the same category doesn't make sense.
final case class MyObject(x: Int, categories: Set[String] = Set.empty) {
def addCategory(category: String): MyObject =
this.copy(categories = this.categories + category)
}
def groupByCategories(data: List[MyObject]): Map[String, List[Int]] =
data
.flatMap(o => o.categories.map(c => c -> o.x))
.groupMap(_._1)(_._2)
What is the best way to implement a sorting function that has an internal state received else where?
Something like:
type Sorter[Item] = (Item, Item) => Boolean
type StringSorter = Sorter[String]
def customSorter : StringSorter = (i1,i2) =>
{
val i1Cnt = itemCountMap.get(i1)
val i2Cnt = itemCountMap.get(i2)
if (i1Cnt==None || i2Cnt==None ) {
i1<i2
} else {
i1Cnt.get<i2Cnt.get
}
}
And here is an example:
val l1 = List("a","b","c")
val itemCountMap = Map("a"->2,"b"->3,"c"1)
l1.sortWith(customSorter)
//The returned list will be ["c","a","b"]
I am pretty new to Scala, and in general, lambda functions are not suppose to have states (right?).
Why you ask? 'cause I am using a generic type lists in spark which later, deep in the code of the executors, I want to analyze based on a specific order, and this order may depend on some static list, and I also want to control this order function.
First, this i1Cnt==null will never be true because get on Map never returns null, it returns an Option WHICH IS NOT A NULL, it is very different.
Second, there is no problem with state in a lambda, there is a problem with mutable shared state (and not only in a lambda, but everywhere).
Third, your function would be better if it receives the Map to use instead of relying on a global variable.
Fourth, here is the fixed code.
type Sorter[Item] = (Item, Item) => Boolean
def customSorter(map: Map[String, Int]): Sorter[String] = { (s1, s2) =>
(map.get(s1), map.get(s2)) match {
case (Some(i1), Some(i2)) => i1 < i2
case _ => s1 < s2
}
}
You can see it running here)
I'm new to Scala and attempting to do some data analysis.
I have a CSV files with a few headers - lets say item no., item type, month, items sold.
I have made an Item class with the fields of the headers.
I split the CSV into a list with each iteration of the list being a row of the CSV file being represented by the Item class.
I am attempting to make a method that will create maps based off of the parameter I send in. For example if I want to group the items sold by month, or by item type. However I am struggling to send the Item.field into a method.
F.e what I am attempting is something like:
makemaps(Item.month);
makemaps(Item.itemtype);
def makemaps(Item.field):
if (item.field==Item.month){}
else (if item.field==Item.itemType){}
However my logic for this appears to be wrong. Any ideas?
def makeMap[T](items: Iterable[Item])(extractKey: Item => T): Map[T, Iterable[Item]] =
items.groupBy(extractKey)
So given this example Item class:
case class Item(month: String, itemType: String, quantity: Int, description: String)
You could have (I believe the type ascriptions are mandatory):
val byMonth = makeMap[String](items)(_.month)
val byType = makeMap[String](items)(_.itemType)
val byQuantity = makeMap[Int](items)(_.quantity)
val byDescription = makeMap[String](items)(_.description)
Note that _.month, for instance, creates a function taking an Item which results in the String contained in the month field (simplifying a little).
You could, if so inclined, save the functions used for extracting keys in the companion object:
object Item {
val month: Item => String = _.month
val itemType: Item => String = _.itemType
val quantity: Item => Int = _.quantity
val description: Item => String = _.description
// Allows us to determine if using a predefined extractor or using an ad hoc one
val extractors: Set[Item => Any] = Set(month, itemType, quantity, description)
}
Then you can pass those around like so:
val byMonth = makeMap[String](items)(Item.month)
The only real change semantically is that you explicitly avoid possible extra construction of lambdas at runtime, at the cost of having the lambdas stick around in memory the whole time. A fringe benefit is that you might be able to cache the maps by extractor if you're sure that the source Items never change: for lambdas, equality is reference equality. This might be particularly useful if you have some class representing the collection of Items as opposed to just using a standard collection, like so:
object Items {
def makeMap[T](items: Iterable[Item])(extractKey: Item => T): Map[T,
Iterable[Item]] =
items.groupBy(extractKey)
}
class Items(val underlying: immutable.Seq[Item]) {
def makeMap[T](extractKey: Item => T): Map[T, Iterable[Item]] =
if (Item.extractors.contains(extractKey)) {
if (extractKey == Item.month) groupedByMonth.asInstanceOf[Map[T, Iterable[Item]]]
else if (extractKey == Item.itemType) groupedByItemType.asInstanceOf[Map[T, Iterable[Item]]]
else if (extractKey == Item.quantity) groupedByQuantity.asInstanceOf[Map[T, Iterable[Item]]]
else if (extractKey == Item.description) groupedByDescription.asInstanceOf[Map[T, Iterable[Item]]]
else throw new AssertionError("Shouldn't happen!")
} else {
Items.makeMap(underlying)(extractKey)
}
lazy val groupedByMonth = Items.makeMap[String](underlying)(Item.month)
lazy val groupedByItemType = Items.makeMap[String](underlying)(Item.itemType)
lazy val groupedByQuantity = Items.makeMap[Int](underlying)(Item.quantity)
lazy val groupedByDescription = Items.makeMap[String](underlying)(Item.description)
}
(that is almost certainly a personal record for asInstanceOfs in a small block of code... I'm not sure if I should be proud or ashamed of this snippet)
I am trying to find an elegant way to do:
val l = List(1,2,3)
val (item, idx) = l.zipWithIndex.find(predicate)
val updatedItem = updating(item)
l.update(idx, updatedItem)
Can I do all in one operation ? Find the item, if it exist replace with updated value and keep it in place.
I could do:
l.map{ i =>
if (predicate(i)) {
updating(i)
} else {
i
}
}
but that's pretty ugly.
The other complexity is the fact that I want to update only the first element which match predicate .
Edit: Attempt:
implicit class UpdateList[A](l: List[A]) {
def filterMap(p: A => Boolean)(update: A => A): List[A] = {
l.map(a => if (p(a)) update(a) else a)
}
def updateFirst(p: A => Boolean)(update: A => A): List[A] = {
val found = l.zipWithIndex.find { case (item, _) => p(item) }
found match {
case Some((item, idx)) => l.updated(idx, update(item))
case None => l
}
}
}
I don't know any way to make this in one pass of the collection without using a mutable variable. With two passes you can do it using foldLeft as in:
def updateFirst[A](list:List[A])(predicate:A => Boolean, newValue:A):List[A] = {
list.foldLeft((List.empty[A], predicate))((acc, it) => {acc match {
case (nl,pr) => if (pr(it)) (newValue::nl, _ => false) else (it::nl, pr)
}})._1.reverse
}
The idea is that foldLeft allows passing additional data through iteration. In this particular implementation I change the predicate to the fixed one that always returns false. Unfortunately you can't build a List from the head in an efficient way so this requires another pass for reverse.
I believe it is obvious how to do it using a combination of map and var
Note: performance of the List.map is the same as of a single pass over the list only because internally the standard library is mutable. Particularly the cons class :: is declared as
final case class ::[B](override val head: B, private[scala] var tl: List[B]) extends List[B] {
so tl is actually a var and this is exploited by the map implementation to build a list from the head in an efficient way. The field is private[scala] so you can't use the same trick from outside of the standard library. Unfortunately I don't see any other API call that allows to use this feature to reduce the complexity of your problem to a single pass.
You can avoid .zipWithIndex() by using .indexWhere().
To improve complexity, use Vector so that l(idx) becomes effectively constant time.
val l = Vector(1,2,3)
val idx = l.indexWhere(predicate)
val updatedItem = updating(l(idx))
l.updated(idx, updatedItem)
Reason for using scala.collection.immutable.Vector rather than List:
Scala's List is a linked list, which means data are access in O(n) time. Scala's Vector is indexed, meaning data can be read from any point in effectively constant time.
You may also consider mutable collections if you're modifying just one element in a very large collection.
https://docs.scala-lang.org/overviews/collections/performance-characteristics.html
I have a SQL database table with the following structure:
create table category_value (
category varchar(25),
property varchar(25)
);
I want to read this into a Scala Map[String, Set[String]] where each entry in the map is a set of all of the property values that are in the same category.
I would like to do it in a "functional" style with no mutable data (other than the database result set).
Following on the Clojure loop construct, here is what I have come up with:
def fillMap(statement: java.sql.Statement): Map[String, Set[String]] = {
val resultSet = statement.executeQuery("select category, property from category_value")
#tailrec
def loop(m: Map[String, Set[String]]): Map[String, Set[String]] = {
if (resultSet.next) {
val category = resultSet.getString("category")
val property = resultSet.getString("property")
loop(m + (category -> m.getOrElse(category, Set.empty)))
} else m
}
loop(Map.empty)
}
Is there a better way to do this, without using mutable data structures?
If you like, you could try something around
def fillMap(statement: java.sql.Statement): Map[String, Set[String]] = {
val resultSet = statement.executeQuery("select category, property from category_value")
Iterator.continually((resultSet, resultSet.next)).takeWhile(_._2).map(_._1).map{ res =>
val category = res.getString("category")
val property = res.getString("property")
(category, property)
}.toIterable.groupBy(_._1).mapValues(_.map(_._2).toSet)
}
Untested, because I don’t have a proper sql.Statement. And the groupBy part might need some more love to look nice.
Edit: Added the requested changes.
There are two parts to this problem.
Getting the data out of the database and into a list of rows.
I would use a Spring SimpleJdbcOperations for the database access, so that things at least appear functional, even though the ResultSet is being changed behind the scenes.
First, some a simple conversion to let us use a closure to map each row:
implicit def rowMapper[T<:AnyRef](func: (ResultSet)=>T) =
new ParameterizedRowMapper[T]{
override def mapRow(rs:ResultSet, row:Int):T = func(rs)
}
Then let's define a data structure to store the results. (You could use a tuple, but defining my own case class has advantage of being just a little bit clearer regarding the names of things.)
case class CategoryValue(category:String, property:String)
Now select from the database
val db:SimpleJdbcOperations = //get this somehow
val resultList:java.util.List[CategoryValue] =
db.query("select category, property from category_value",
{ rs:ResultSet => CategoryValue(rs.getString(1),rs.getString(2)) } )
Converting the data from a list of rows into the format that you actually want
import scala.collection.JavaConversions._
val result:Map[String,Set[String]] =
resultList.groupBy(_.category).mapValues(_.map(_.property).toSet)
(You can omit the type annotations. I've included them to make it clear what's going on.)
Builders are built for this purpose. Get one via the desired collection type companion, e.g. HashMap.newBuilder[String, Set[String]].
This solution is basically the same as my other solution, but it doesn't use Spring, and the logic for converting a ResultSet to some sort of list is simpler than Debilski's solution.
def streamFromResultSet[T](rs:ResultSet)(func: ResultSet => T):Stream[T] = {
if (rs.next())
func(rs) #:: streamFromResultSet(rs)(func)
else
rs.close()
Stream.empty
}
def fillMap(statement:java.sql.Statement):Map[String,Set[String]] = {
case class CategoryValue(category:String, property:String)
val resultSet = statement.executeQuery("""
select category, property from category_value
""")
val queryResult = streamFromResultSet(resultSet){rs =>
CategoryValue(rs.getString(1),rs.getString(2))
}
queryResult.groupBy(_.category).mapValues(_.map(_.property).toSet)
}
There is only one approach I can think of that does not include either mutable state or extensive copying*. It is actually a very basic technique I learnt in my first term studying CS. Here goes, abstracting from the database stuff:
def empty[K,V](k : K) : Option[V] = None
def add[K,V](m : K => Option[V])(k : K, v : V) : K => Option[V] = q => {
if ( k == q ) {
Some(v)
}
else {
m(q)
}
}
def build[K,V](input : TraversableOnce[(K,V)]) : K => Option[V] = {
input.foldLeft(empty[K,V]_)((m,i) => add(m)(i._1, i._2))
}
Usage example:
val map = build(List(("a",1),("b",2)))
println("a " + map("a"))
println("b " + map("b"))
println("c " + map("c"))
> a Some(1)
> b Some(2)
> c None
Of course, the resulting function does not have type Map (nor any of its benefits) and has linear lookup costs. I guess you could implement something in a similar way that mimicks simple search trees.
(*) I am talking concepts here. In reality, things like value sharing might enable e.g. mutable list constructions without memory overhead.