Is it possible to work with a list of generic values with different type parameters in Scala?

I want to achieve the following:
There is a list of strings I need to process.
There are several different kinds of these processors, each of which knows which part of the string to read.
I need to work in 2 phases: first, processors need to see each input string to build processor-specific data; second, each input string is processed by each of the processors, and the resulting strings are combined into one.
It's easy to do this in a mutable way: there's a common base class for all processors; the different kinds of data they aggregate are encapsulated in the concrete implementations; and the interface consists of just 2 functions: "look at an input string and build internal data" and "process an input string using your internal data."
As I am writing it in Scala, I am wondering if there exists a pure functional approach. The problem is that now the base trait for these processors is parameterized by the type of their internal data, and there doesn't seem to be a way to have a list of processors of different kinds.
This problem can be demonstrated on a simpler case: say I'd stick with the mutable approach, but for some reason have parameterized the type of what the processor takes from the string:
trait F[V] {
  def get(line: String): V
  def aggregate(value: V): Unit
  def process(value: V): String
}

class F1 extends F[Int] // ...
class F2 extends F[HashMap[Int, Int]] // ...

for (s <- List("string1", "string2");
     f <- List(new F1(), new F2())) {
  f.aggregate(f.get(s)) // Whoops --- doesn't work
}
It doesn't work because f.get(s) returns Any. It looks like I need to express in Scala's type system that List(new F1(), new F2()) contains F[?] values that are different but internally consistent: if I take an element of that list, it has some concrete type parameter, f.get(s) is of that type, and f.aggregate() should accept it.
In the end, I would like to have something like this (with omissions because I don't get how to do it):
trait F[D] {
  def initData: D
  def aggregate(line: String, data: D): D
  def process(line: String, data: D): String
}

class F1 extends F[Int] // ...
class F2 extends F[HashMap[Int, Int]] // ...

// Phase 1
// datas --- a list of each f's initData; how to build it?
for (s <- List("string1", "string2")) {
  for (f <- List(new F1(), new F2())) {
    // let fdata be f's data
    // update fdata with f.aggregate(s, fdata)
  }
}

// Phase 2
for (s <- List("string1", "string2")) {
  for (f <- List(new F1(), new F2())) {
    // let fdata be f's data
    // for all fs, concatenate f.process(s, fdata) into an output string
  }
}
Questions:
Is this task solvable in a purely functional way in Scala?
Is this task solvable in other functional languages?
This situation looks like quite a general one. Is there a name for it I could search for?
Where is the best place to read about it, assuming little to no background in type theory and functional programming languages?

You can also use abstract types instead of generics:
trait F {
  type D
  def initData: D
  def aggregate(line: String, data: D): D
  def process(line: String, data: D): String
}

class F1 extends F { type D = Int } // ...
class F2 extends F { type D = Map[Int, Int] } // ...

val strings = List("string1", "string2")
for (f <- List(new F1(), new F2())) {
  val d = strings.foldLeft(f.initData) { (d, s) => f.aggregate(s, d) }
  for (s <- strings)
    f.process(s, d)
}
I'm not sure if I understood the correct order of operations, but this may be a starting point.
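Note that the results of f.process are discarded in the loop above. If you also want to collect the processed strings, you can flatMap over the processors instead; a sketch along the same lines (the explicit List[F] annotation keeps the inferred element type from becoming overly specific):

val output = List[F](new F1(), new F2()).flatMap { f =>
  val d = strings.foldLeft(f.initData) { (d, s) => f.aggregate(s, d) }
  strings.map(s => f.process(s, d))
}.mkString("\n")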

Edit: I just noticed that my former solution was overly verbose, consing up a temporary data structure without any need.
I am not sure what you mean by "purely functional". The following solution (if it is a solution to your problem) is purely functional in that it has no side effects except the final println call in main.
Note that the List[F[_]](...) ascription is important: otherwise the compiler will infer a very specific internal type for the elements of the list, which doesn't go well with the aggregateAndProcess function.
trait F[D] {
  type Data = D // Abbreviation for easier copy+paste below. Does not
                // contribute to the actual solution otherwise.
  def initData: Data
  def aggregate(line: String, data: Data): Data
  def process(line: String, aggData: Data): String
}

class F1 extends F[Int] {
  def initData: Data = 1
  def aggregate(line: String, data: Data): Data = data + 1
  def process(line: String, aggData: Data): String = line + "/F1" + aggData
}

class F2 extends F[Boolean] {
  def initData: Data = false
  def aggregate(line: String, data: Data): Data = !data
  def process(line: String, aggData: Data): String = line + "/F2" + aggData
}

object Main {
  private def aggregateAndProcess[T](line: String, processor: F[T]): String =
    processor.process(line, processor.aggregate(line, processor.initData))

  def main(args: Array[String]): Unit = {
    val r = for {
      s <- List("a", "b")
      d <- List[F[_]](new F1, new F2)
    } yield aggregateAndProcess(s, d)
    println(r.toList)
  }
}
Note, though, that I am still unsure what you actually want to accomplish. The F interface doesn't really specify which information flows from which method to which location at what time, so this is still a best-guess effort.
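If the two phases really must stay separate (all aggregation before any processing), one purely functional option is to pair each processor with its accumulated state in a wrapper whose type parameter never escapes. A sketch building on the trait above (Runner and runner are my own helper names, and F1/F2 are assumed to have concrete implementations):

final case class Runner[D](f: F[D], data: D) {
  def step(line: String): Runner[D] = copy(data = f.aggregate(line, data))
  def run(line: String): String = f.process(line, data)
}

def runner[D](f: F[D]): Runner[D] = Runner(f, f.initData)

val lines = List("string1", "string2")
val runners = List[Runner[_]](runner(new F1), runner(new F2))

// Phase 1: every processor folds its state over all input lines
val aggregated = lines.foldLeft(runners)((rs, line) => rs.map(_.step(line)))

// Phase 2: each line is processed by each processor using its final state
val output = lines.flatMap(line => aggregated.map(_.run(line))).mkString("\n")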

Related

Scala - evaluate function calls sequentially until one returns

I have a few 'legacy' endpoints that can return the Data I'm looking for.
def mainCall(id): Data {
  maybeMyDataInEndpoint1(id: UUID): DataA
  maybeMyDataInEndpoint2(id: UUID): DataB
  maybeMyDataInEndpoint3(id: UUID): DataC
}
null can be returned if no DataX is found
The return types of the methods differ. There is a convert method that turns each DataX into a unified Data.
The endpoints are not Scala-ish.
What is the best Scala approach to evaluate those method calls sequentially until I have the value I need?
In pseudo I would do something like:
val myData = maybeMyDataInEndpoint1 getOrElse maybeMyDataInEndpoint2 getOrElse maybeMyDataInEndpoint3
I'd use an easier approach, though the other answers use more elaborate language features.
Just use Option() to catch the null and chain with orElse. I'm assuming methods convertX(d: DataX): Data for explicit conversion. As the data might not be found at all, we return an Option:
def mainCall(id: UUID): Option[Data] =
  Option(maybeMyDataInEndpoint1(id)).map(convertA)
    .orElse(Option(maybeMyDataInEndpoint2(id)).map(convertB))
    .orElse(Option(maybeMyDataInEndpoint3(id)).map(convertC))
Maybe you can lift these methods into a List of functions and use collectFirst, like:

val fs = List(maybeMyDataInEndpoint1 _, maybeMyDataInEndpoint2 _, maybeMyDataInEndpoint3 _)
val f = (a: UUID) => fs.collectFirst {
  // note that u(a) is evaluated twice here: once in the guard, once in the result
  case u if u(a) != null => u(a)
}
f(myUUID)
The best Scala approach IMHO is to do things in the most straightforward way.
To handle optional values (or nulls from Java land), use Option.
To sequentially evaluate a list of methods, fold over a Seq of functions.
To convert from one data type to another, use either (1.) implicit conversions or (2.) regular functions depending on the situation and your preference.
(Edit) Assuming implicit conversions:
def legacyEndpoint[A](endpoint: UUID => A)(implicit convert: A => Data) =
  (id: UUID) => Option(endpoint(id)).map(convert)

val legacyEndpoints = Seq(
  legacyEndpoint(maybeMyDataInEndpoint1),
  legacyEndpoint(maybeMyDataInEndpoint2),
  legacyEndpoint(maybeMyDataInEndpoint3)
)

def mainCall(id: UUID): Option[Data] =
  legacyEndpoints.foldLeft(Option.empty[Data])(_ orElse _(id))
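Note that Option#orElse takes its argument by name, so even though the foldLeft walks the whole sequence, the remaining endpoints are never actually invoked once a Some has been produced.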
(Edit) Using explicit conversions:
def legacyEndpoint[A](endpoint: UUID => A)(convert: A => Data) =
  (id: UUID) => Option(endpoint(id)).map(convert)

val legacyEndpoints = Seq(
  legacyEndpoint(maybeMyDataInEndpoint1)(fromDataA),
  legacyEndpoint(maybeMyDataInEndpoint2)(fromDataB),
  legacyEndpoint(maybeMyDataInEndpoint3)(fromDataC)
)
... // same as before
Here is one way to do it.
(1) You can make your convert methods implicit (or wrap them into implicit wrappers) for convenience.
(2) Then use Stream to build a chain of the method calls. You should give type inference a hint that you want your stream to contain Data elements (not DataX as returned by the legacy methods) so that the appropriate implicit convert is applied to each result of a legacy method call.
(3) Since Stream is lazy and evaluates its tail "by name", only the first method gets called so far. At this point you can apply a lazy filter to skip null results.
(4) Now you can actually evaluate the chain, getting the first non-null result with headOption.
(HACK) Unfortunately, Scala's type inference (at the time of writing, v2.12.4) is not powerful enough to allow using the #:: stream method unless you guide it every step of the way. Using cons makes inference happy but is cumbersome. Building the stream with the vararg apply method of the companion object is not an option either, since Scala does not support "by-name" varargs yet. In my example below I use a combination of the stream and toLazyData methods: stream is a generic helper that builds streams from 0-arg functions; toLazyData is an implicit "by-name" conversion designed to interplay with the implicit convert functions that convert from DataX to Data.
Here is a demo that demonstrates the idea in more detail:
object Demo {
  case class Data(value: String)

  class DataA
  class DataB
  class DataC

  def maybeMyDataInEndpoint1(id: String): DataA = {
    println("maybeMyDataInEndpoint1")
    null
  }

  def maybeMyDataInEndpoint2(id: String): DataB = {
    println("maybeMyDataInEndpoint2")
    new DataB
  }

  def maybeMyDataInEndpoint3(id: String): DataC = {
    println("maybeMyDataInEndpoint3")
    new DataC
  }

  implicit def convert(data: DataA): Data = if (data == null) null else Data(data.toString)
  implicit def convert(data: DataB): Data = if (data == null) null else Data(data.toString)
  implicit def convert(data: DataC): Data = if (data == null) null else Data(data.toString)

  implicit def toLazyData[T](value: => T)(implicit convert: T => Data): (() => Data) = () => convert(value)

  def stream[T](xs: (() => T)*): Stream[T] = {
    xs.toStream.map(_())
  }

  def main(args: Array[String]): Unit = {
    val chain = stream(
      maybeMyDataInEndpoint1("1"),
      maybeMyDataInEndpoint2("2"),
      maybeMyDataInEndpoint3("3")
    )
    val result = chain.filter(_ != null).headOption.getOrElse(Data("default"))
    println(result)
  }
}
This prints:
maybeMyDataInEndpoint1
maybeMyDataInEndpoint2
Data(Demo$DataB@16022d9d)
Here maybeMyDataInEndpoint1 returns null, so maybeMyDataInEndpoint2 needs to be invoked, delivering a DataB; maybeMyDataInEndpoint3 never gets invoked, since we already have the result.
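(In Scala 2.13 and later, Stream is deprecated in favor of LazyList; the same approach applies.)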
I think @g.krastev's answer is perfectly good for your use case and you should accept it. I'm just expanding a bit on it to show how you can make the last step slightly nicer with cats.
First, the boilerplate:
import java.util.UUID
final case class DataA(i: Int)
final case class DataB(i: Int)
final case class DataC(i: Int)
type Data = Int
def convertA(a: DataA): Data = a.i
def convertB(b: DataB): Data = b.i
def convertC(c: DataC): Data = c.i
def maybeMyDataInEndpoint1(id: UUID): DataA = DataA(1)
def maybeMyDataInEndpoint2(id: UUID): DataB = DataB(2)
def maybeMyDataInEndpoint3(id: UUID): DataC = DataC(3)
This is basically what you have, in a way that you can copy/paste into the REPL and have it compile.
Now, let's first declare a way to turn each of your endpoints into something safe and unified:
def makeSafe[A, B](evaluate: UUID ⇒ A, f: A ⇒ B): UUID ⇒ Option[B] =
  id ⇒ Option(evaluate(id)).map(f)
With this in place, you can, for example, call the following to turn maybeMyDataInEndpoint1 into a UUID => Option[Data]:
makeSafe(maybeMyDataInEndpoint1, convertA)
The idea is now to turn your endpoints into a list of UUID => Option[Data] functions and fold over that list. Here's your list:
val endpoints = List(
  makeSafe(maybeMyDataInEndpoint1, convertA),
  makeSafe(maybeMyDataInEndpoint2, convertB),
  makeSafe(maybeMyDataInEndpoint3, convertC)
)
You can now fold over it manually, which is what @g.krastev did:
def mainCall(id: UUID): Option[Data] =
  endpoints.foldLeft(None: Option[Data])(_ orElse _(id))
If you're fine with a cats dependency, the notion of folding over a list of options is just a concrete use case of a common pattern (the interaction of Foldable and Monoid):
import cats._
import cats.implicits._
def mainCall(id: UUID): Option[Data] = endpoints.foldMap(_(id))
There are other ways to make this nicer still, but they might be overkill in this context - I'd probably declare a type class to turn any type into a Data, say, to give makeSafe a cleaner type signature.
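One caveat with foldMap here: the Monoid for Option[Data] combines two Some values with the Semigroup of Data (for the Int stand-in above, addition), so if several endpoints return data, the results get merged rather than keeping the first. If you want strict first-wins semantics, Foldable also provides collectFirstSome; a sketch, assuming a cats version that has it:

import cats.implicits._

def mainCall(id: UUID): Option[Data] = endpoints.collectFirstSome(_(id))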

Cats Writer Vector is empty

I wrote this simple program in my attempt to learn how Cats Writer works
import cats.data.Writer
import cats.syntax.applicative._
import cats.syntax.writer._
import cats.instances.vector._
object WriterTest extends App {
  type Logged2[A] = Writer[Vector[String], A]

  Vector("started the program").tell
  val output1 = calculate1(10)
  val foo = new Foo()
  val output2 = foo.calculate2(20)
  val (log, sum) = (output1 + output2).pure[Logged2].run
  println(log)
  println(sum)

  def calculate1(x: Int): Int = {
    Vector("came inside calculate1").tell
    val output = 10 + x
    Vector(s"Calculated value ${output}").tell
    output
  }
}

class Foo {
  def calculate2(x: Int): Int = {
    Vector("came inside calculate 2").tell
    val output = 10 + x
    Vector(s"calculated ${output}").tell
    output
  }
}
The program works and the output is
> run-main WriterTest
[info] Compiling 1 Scala source to /Users/Cats/target/scala-2.11/classes...
[info] Running WriterTest
Vector()
50
[success] Total time: 1 s, completed Jan 21, 2017 8:14:19 AM
But why is the vector empty? Shouldn't it contain all the strings on which I used the "tell" method?
When you call tell on your Vectors, each call creates a Writer[Vector[String], Unit]. However, you never actually do anything with these Writers; you just discard them. Furthermore, you call pure to create your final Writer, which simply creates a Writer with an empty Vector. You have to combine the Writers together in a chain that carries your value and log along.
type Logged[A] = Writer[Vector[String], A]

val (log, sum) = (for {
  _ <- Vector("started the program").tell
  output1 <- calculate1(10)
  foo = new Foo()
  output2 <- foo.calculate2(20)
} yield output1 + output2).run

def calculate1(x: Int): Logged[Int] = for {
  _ <- Vector("came inside calculate1").tell
  output = 10 + x
  _ <- Vector(s"Calculated value ${output}").tell
} yield output

class Foo {
  def calculate2(x: Int): Logged[Int] = for {
    _ <- Vector("came inside calculate2").tell
    output = 10 + x
    _ <- Vector(s"calculated ${output}").tell
  } yield output
}
Note the use of for notation. The definition of calculate1 is really
def calculate1(x: Int): Logged[Int] = Vector("came inside calculate1").tell.flatMap { _ =>
  val output = 10 + x
  Vector(s"Calculated value ${output}").tell.map { _ => output }
}
flatMap is the monadic bind operation, which means it understands how to take two monadic values (in this case Writer) and join them together to get a new one. In this case, it makes a Writer containing the concatenation of the logs and the value of the one on the right.
Note how there are no side effects. There is no global state by which Writer can remember all your calls to tell. You instead make many Writers and join them together with flatMap to get one big one at the end.
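For example, a tiny sketch of two Writers joined with flatMap:

val w = Vector("a").tell.flatMap { _ => Writer(Vector("b"), 42) }
// w.run == (Vector("a", "b"), 42): the logs are concatenated,
// and the value comes from the right-hand Writer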
The problem with your example code is that you're not using the result of the tell method.
If you take a look at its signature, you'll see this:
final class WriterIdSyntax[A](val a: A) extends AnyVal {
  def tell: Writer[A, Unit] = Writer(a, ())
}
it is clear that tell returns a Writer[A, Unit] result which is immediately discarded because you didn't assign it to a value.
The proper way to use a Writer (and any monad in Scala) is through its flatMap method. It would look similar to this:
println(
  Vector("started the program").tell.flatMap { _ =>
    15.pure[Logged2].flatMap { i =>
      Writer(Vector("ended program"), i)
    }
  }
)
The code above, when executed will give you this:
WriterT((Vector(started the program, ended program),15))
As you can see, both messages and the int are stored in the result.
Now this is a bit ugly, and Scala actually provides a better way to write it: for-comprehensions. For-comprehensions are a bit of syntactic sugar that allow us to write the same code this way:
println(
  for {
    _ <- Vector("started the program").tell
    i <- 15.pure[Logged2]
    _ <- Vector("ended program").tell
  } yield i
)
Now going back to your example, what I would recommend is for you to change the return types of calculate1 and calculate2 to Writer[Vector[String], Int] and then try to make your application compile using what I wrote above.
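For reference, here is a sketch of how the pieces fit together once calculate1 and calculate2 return Writer[Vector[String], Int] (reusing the Logged2 alias from your code; the sum is 50, as in your original run):

val program: Logged2[Int] = for {
  _       <- Vector("started the program").tell
  output1 <- calculate1(10)
  output2 <- new Foo().calculate2(20)
} yield output1 + output2

val (log, sum) = program.run
println(log) // Vector(started the program, came inside calculate1, ...)
println(sum) // 50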

How can I extend Scala collections with member values?

Say I have the following data structure:
case class Timestamped[M, CC[X] <: Seq[X]](elems: CC[M], timestamp: String)
So it's essentially a sequence with an attribute -- a timestamp -- attached to it. This works fine and I could create new instances with the syntax
val t = Timestamped(Seq(1,2,3,4),"2014-02-25")
t.elems.head // 1
t.timestamp // "2014-02-25"
The syntax is unwieldy, and instead I want to be able to do something like:
val t = Timestamped(1,2,3,4)("2014-02-25")
t.head // 1
t.timestamp // "2014-02-25"
Here Timestamped is just an extension of a Seq and its implementation SeqLike, with a single extra attribute val timestamp: String.
This seems easy to do: just use a Seq with a mixin TimestampMixin { val timestamp: String }. But I can't figure out how to create the constructor. My question is: how do I create a constructor in the companion object that creates a sequence with an extra member value? The signature is as follows:
object Timestamped {
  def apply[M](elems: M*)(timestamp: String): Seq[M] with TimestampMixin = ???
}
You'll see that it's not straightforward: collections use Builders to instantiate themselves, so I can't simply call the constructor and override some vals.
Scala collections are very complicated structures when it comes down to it. Extending Seq requires implementing apply, length, and iterator methods. In the end, you'll probably end up duplicating existing code for List, Set, or something else. You'll also probably have to worry about CanBuildFroms for your collection, which in the end I don't think is worth it if you just want to add a field.
Instead, consider an implicit conversion from your Timestamped type to Seq.
case class Timestamped[A](elems: Seq[A])(val timestamp: String)

object Timestamped {
  implicit def toSeq[A](ts: Timestamped[A]): Seq[A] = ts.elems
}
Now, whenever I try to call a method from Seq, the compiler will implicitly convert Timestamped to Seq, and we can proceed as normal.
scala> val ts = Timestamped(List(1,2,3,4))("1/2/34")
ts: Timestamped[Int] = Timestamped(List(1, 2, 3, 4))
scala> ts.filter(_ > 2)
res18: Seq[Int] = List(3, 4)
There is one major drawback here, and it's that we're now stuck with Seq after performing operations on the original Timestamped.
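For example (illustrative REPL output; the exact error position will vary):

scala> ts.filter(_ > 2).timestamp
<console>:14: error: value timestamp is not a member of Seq[Int]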
Go the other way... extend Seq, it only has 3 abstract members:
case class Stamped[T](elems: Seq[T], stamp: Long) extends Seq[T] {
  override def apply(i: Int) = elems.apply(i)
  override def iterator = elems.iterator
  override def length = elems.length
}

val x = Stamped(List(10,20,30), 15L)
println(x.head)  // 10
println(x.stamp) // 15
println(x.map { _ * 10 })  // List(100, 200, 300)
println(x.filter { _ > 20 }) // List(30)
Keep in mind, this only works as long as Seq is specific enough for your use cases, if you later find you need more complex collection behavior this may become untenable.
EDIT: Added a version closer to the signature you were trying to create. Not sure if this helps you any more:
case class Stamped[T](elems: T*)(stamp: Long) extends Seq[T] {
  def timeStamp = stamp
  override def apply(i: Int) = elems.apply(i)
  override def iterator = elems.iterator
  override def length = elems.length
}

val x = Stamped(10,20,30)(15L)
println(x.head)      // 10
println(x.timeStamp) // 15
println(x.map { _ * 10 })  // List(100, 200, 300)
println(x.filter { _ > 20 }) // List(30)
Where elems would end up being a generically created WrappedArray.

Allocation of Function Literals in Scala

I have a class that represents sales orders:
class SalesOrder(val f01:String, val f02:Int, ..., f50:Date)
The fXX fields are of various types. I am faced with the problem of creating an audit trail of my orders. Given two instances of the class, I have to determine which fields have changed. I have come up with the following:
class SalesOrder(val f01: String, val f02: Int, ..., val f50: Date) {
  def auditDifferences(that: SalesOrder): List[String] = {
    def diff[A](fieldName: String, getField: SalesOrder => A) =
      if (getField(this) != getField(that)) Some(fieldName) else None

    val diffList = diff("f01", _.f01) :: diff("f02", _.f02) :: ...
      :: diff("f50", _.f50) :: Nil
    diffList.flatten
  }
}
I was wondering what the compiler does with all the _.fXX functions: are they instanced just once (statically), and can be shared by all instances of my class, or will they be instanced every time I create an instance of my class?
My worry is that, since I will use a lot of SalesOrder instances, it may create a lot of garbage. Should I use a different approach?
One clean way of solving this problem would be to use the standard library's Ordering type class. For example:
class SalesOrder(val f01: String, val f02: Int, val f03: Char) {
  def diff(that: SalesOrder) = SalesOrder.fieldOrderings.collect {
    case (name, ord) if !ord.equiv(this, that) => name
  }
}

object SalesOrder {
  val fieldOrderings: List[(String, Ordering[SalesOrder])] = List(
    "f01" -> Ordering.by(_.f01),
    "f02" -> Ordering.by(_.f02),
    "f03" -> Ordering.by(_.f03)
  )
}
And then:
scala> val orderA = new SalesOrder("a", 1, 'a')
orderA: SalesOrder = SalesOrder@5827384f
scala> val orderB = new SalesOrder("b", 1, 'b')
orderB: SalesOrder = SalesOrder@3bf2e1c7
scala> orderA diff orderB
res0: List[String] = List(f01, f03)
You almost certainly don't need to worry about the performance of your original formulation, but this version is (arguably) nicer for unrelated reasons.
Yes, that creates 50 short lived functions. I don't think you should be worried unless you have manifest evidence that that causes a performance problem in your case.
But I would define a method that transforms a SalesOrder into a Map[String, Any]; then you would just have:
trait SalesOrder {
  def fields: Map[String, Any]
}

def diff(a: SalesOrder, b: SalesOrder): Iterable[String] = {
  val af = a.fields
  val bf = b.fields
  af.collect { case (key, value) if bf(key) != value => key }
}
If the field names are indeed just incremental numbers, you could simplify
trait SalesOrder {
  def fields: Iterable[Any]
}

def diff(a: SalesOrder, b: SalesOrder): Iterable[String] =
  (a.fields zip b.fields).zipWithIndex.collect {
    case ((av, bv), idx) if av != bv => f"f${idx + 1}%02d"
  }

Scala operator overloading with multiple parameters

In short: I am trying to write something like A <N B for a DSL in Scala, for an integer N and A, B of type T. Is there a nice way to do this?
Longer: I am writing a DSL for TGrep2 in Scala. I am currently interested in expressing
A <N B (B is the Nth child of A; the first child is <1)
in a nice way, as close as possible to the original definition. Is there a way to overload the < operator so that it can take both an N and a B as arguments?
What I tried: I tried two different possibilities, neither of which made me very happy:
scala> val N = 10
N: Int = 10
scala> case class T(n:String) {def <(i:Int,j:T) = println("huray!")}
defined class T
scala> T("foo").<(N,T("bar"))
huray!
and
scala> case class T(n:String) {def <(i:Int) = new {def apply(j:T) = println("huray!")}}
defined class T
scala> (T("foo")<N)(T("bar"))
warning: there were 1 feature warnings; re-run with -feature for details
huray!
I'd suggest you use something like nth instead of the < symbol, which makes the semantics clear. A nth N is B would make a lot of sense, to me at least. It would translate to something like:
case class T(label: String) {
  def is(j: T) = {
    label equals j.label
  }
}

case class J(i: List[T]) {
  def nth(index: Int): T = {
    i(index)
  }
}
You can easily do:
val t = T("Mice")
val t1 = T("Rats")
val j = J(List(t1,t))
j nth 1 is t //res = true
The problem is that apply doesn't work as a postfix operator, so you can't write it without the parentheses. You could write this:
case class T(n: String) {
  def <(in: (Int, T)) = {
    in match {
      case (i, t) =>
        println(s"${t.n} is the ${i} child of ${n}")
    }
  }
}

implicit class Param(lower: Int) {
  def apply(t: T) = (lower, t)
}
but then,
T("foo") < 10 T("bar")
would still fail, but you could work it out with:
T("foo") < 10 (T("bar"))
there isn't a good way of doing what you want without adding parentheses somewhere.
I think that you might want to go for a parser combinator instead if you really want to stick with this syntax. Or, as @korefn proposed, you break compatibility and do it with new operators.
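For the parser-combinator route, here is a minimal sketch (assuming the scala-parser-combinators library, which ships as a separate module since Scala 2.11; Node, NthChild, and TGrepParser are names I made up for illustration):

import scala.util.parsing.combinator.JavaTokenParsers

case class Node(label: String)
case class NthChild(parent: Node, n: Int, child: Node)

object TGrepParser extends JavaTokenParsers {
  // an identifier, wrapped as a tree node
  def node: Parser[Node] = ident ^^ Node
  // "A <N B": parent, then "<" followed by the child index, then child
  def nthChild: Parser[NthChild] =
    node ~ ("<" ~> wholeNumber) ~ node ^^ {
      case a ~ n ~ b => NthChild(a, n.toInt, b)
    }
}

// TGrepParser.parseAll(TGrepParser.nthChild, "A <1 B")
// yields NthChild(Node(A), 1, Node(B))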