Creating a modified `filter` function - scala

Consider the filter function.
I am interested in the following modifications of the filter function, if possible:
We know for a collection we can do:
case class People(val age: Int)
val a: List[People] = ...
a.filter(i => i.age ==10 )
Or more simply:
a.filter(_.age==10 )
Any simple way I can define another modified filter that works just like the following (no underline)
a.myfilter1( age==10 )
the filter function does not work when its argument is no Boolean. Suppose I want to create a modified filter that when a non-Boolean is given, it translates to equality automatically. Here is an example:
val anotherPerson: People = ...
a.myFilter2(anotherPerson)
I want the above myFilter2 to get translated as following:
a.filter(_.equals(anotherPerson))

Using implicit def:
case class MyFilterable[T](seq: Seq[T]) {
def suchAFilter(v: Any): Seq[T] = {
seq.filter(v.equals)
}
}
implicit def strongFilter[T](seq: Seq[T]): MyFilterable[T] = {
MyFilterable(seq)
}
println(List(1,2,3).suchAFilter(2))

Related

Scala: Creating Options from Seq/Tuple and add to a Sequence

Edit:
Suppose I have a Seq:
Seq(Some("Earth"),Some("Mars"))
I need to add few more elements at start of this sequence. Values to be added are generated based on an Option value.
So I try to do as:
val o = ....//Option calculated here
Seq(o.map(myFunction(_)),"Earth","Mars")
def myFunction(s: String) = s match {
case "x" => Seq(Some("Jupiter"), Some("Venus"))
case "y" => Seq(Some("PLuto"), Some("Mercury"))
}
But map would give me Some(Seq(.....)).
For this kind of problem I recommend checking the Scaladoc and following a technique called type-tetris.
You need this:
def prependIfDefined(data: Option[A], previousElements: Seq[Option[B]]): Seq[Option[B]] =
data.fold(ifEmpty = Seq.empty[Option[B]])(getNewData) ++ previousElements
def getNewData(a: A): Seq[Option[B]] = ???

Object instantiation variants in scala

I am really new to scala, and I am currently making my way through the tour (https://docs.scala-lang.org/tour/variances.html).
Now, looking at some library (akka-http), I stumbled across some code like this:
def fetchItem(itemId: Long): Future[Option[Item]] = Future {
orders.find(o => o.id == itemId)
}
And I don't quite understand the syntax, or more precisely, the = Future { part. As I learned, the syntax for methods is def [methodName]([Arguments])(:[returnType]) = [codeblock].
However the above seems to differ in that its having the Future in front of the "codeblock". Is this some kind of object instantiation? Because I could not find documentation about this syntax, I tried in my play code stuff like this:
{
val myCat:Cat = new Cat("first cat")
val myOtherCat:Cat = Cat { "second cat" }
val myThirdCat:Cat = MyObject.getSomeCat
}
...
object MyObject
{
def getSomeCat: Cat = Cat
{
"blabla"
}
}
And all of this works, in that it creates a new Cat object. So it seems like new Cat(args) is equivalent to Cat { args }.
But shouldn't def getSomeCat: Cat = Cat define a method with a code block, not the instantiate a new Cat object? I am confused.
I think there are a couple of things here:
1.
The [codeblock] in method syntax doesn't have to be enclosed in {}. If there's only one expression, it's allowed to omit them.
E.g.
def add(x: Int, y: Int) = x + y
or
def add(x: Int, y: Int) = Future { x + y }
2.
Each class can have its companion object define an apply() method, which can be invoked without explicitly saying "apply" (this is special Scala syntactic sugar). This allows us to construct instances of the class by going through the companion object, and since "apply" can be omitted, at first glance it looks like going through the class itself, just without the "new" keyword.
Without the object:
class Cat(s: String)
val myFirstCat: Cat = new Cat("first cat") // OK
val mySecondCat: Cat = Cat("second cat") // error
And now with the object:
class Cat(s: String)
object Cat {
def apply(s: String) = new Cat(s)
}
val myFirstCat: Cat = new Cat("first cat") // OK
val mySecondCat: Cat = Cat.apply("second cat") // OK
val myThirdCat: Cat = Cat("third cat") // OK (uses apply under the hood)
val myFourthCat: Cat = Cat { "fourth cat" } // OK as well
Note how fourth cat invocation works just fine with curly braces, because methods can be passed codeblocks (last evaluated value in the block will be passed, just like in functions).
3.
Case classes are another slightly "special" Scala construct in a sense that they give you convenience by automatically providing some stuff for you "behind the curtain", including an associated companion object with apply().
case class Cat(s: String)
val myFirstCat: Cat = new Cat("first cat") // OK
val mySecondCat: Cat = Cat.apply("second cat") // OK
val myThirdCat: Cat = Cat("third cat") // OK
What happens in your case with Future is number 2, identical to "fourth cat". Regarding your question about new Cat(args) being equivalent to Cat { args }, it's most likely situation number 3 - Cat is a case class. Either that, or its companion object explicitly defines the apply() method.
The short answer is Yes, that Future code is an object instanciation.
Your Cat class has a single String argument and can be created using Cat(<string>). If you want to compute a value for the string you can put it in a block using {} as you did in your example. This block can contain arbitrary code, and the value of the block will be the value of the last expression in the block which must be type String.
The Future[T] class has a single argument of type T and can be created using Future(T). You can pass an arbitrary block of code as before, as long as it returns a value of type T.
So creating a Future is just like creating any other object. The fetchItem code is just creating a Future object and returning it.
However there is a subtlety with Future in that the parameter is defined as a "call-by-name" parameter not the default "call-by-value" parameter. This means that it is not evaluated until it is used, and it is evaluated every time it is used. In the case of a Future the parameter is evaluated once at a later time and potentially on a different thread. If you use a block to compute the parameter then the whole block will be executed each time the parameter is used.
Scala has very powerful syntax and some useful shortcuts, but it can take a while to get used to it!
A typical method structure will look like:
case class Bar(name: String)
def foo(param: String): Bar = {
// code block.
}
However, Scala is quite flexible when its comes to method definition. One of flexibility is that if your method block contains single expression, then you can ignore curly braces { }.
def foo(param: String): Bar = {
Bar("This is Bar") //Block only contains single expression.
}
// Can be written as:
def foo(param: String): Bar = Bar("This is Bar")
// In case of multiple expressions:
def foo(param: String): Bar = {
println("Returning Bar...")
Bar("This is Bar")
}
def foo(param: String): Bar = println("Returning Bar...") Bar("This is Bar") //Fails
def foo(param: String): Bar = println("Returning Bar..."); Bar("This is Bar") //Fails
def foo(param: String): Bar = {println("Returning Bar..."); Bar("This is Bar")} // Works
Similarly, in your code, fetchItem contains only single expression - Future {orders.find(o => o.id == itemId)} that return a new Future (instance of Future) of Option[Item], therefore braces { } is optional. However, if you want you can write it inside braces as below:
def fetchItem(itemId: Long): Future[Option[Item]] = {
Future {
orders.find(o => o.id == itemId)
}
}
Similarly, if a method take only single parameter, you can use curly braces. i.e. you can invoke fetchItems as:
fetchItem(10)
fetchItem{10}
fetchItem{
10
}
So, why use curly braces { } instead of brackets ( )?
Because, you can provide multiple code blocks inside braces, and this situation is required when need to perform multiple computation and return a value as result of that computation. For example:
fetchItem{
val x = 2
val y = 3
val z = 2
(x + y)*z //final result is 10 which is passed as parameter.
}
// In addition, using curly braces will make your code more cleaner e.g. in case of higher ordered functions.
def test(id: String => Unit) = ???
test {
id => {
val result: List[String] = getDataById(x)
val updated = result.map(_.toUpperCase)
updated.mkString(",")
}
}
Now, coming to your case, when you invoke Future{...}, you are invoking apply(...) method of Scala future companion object that take function literal body: =>T as parameter and return a new Future.
//The below code can also be written as:
Future {
orders.find(o => o.id == itemId)
}
//Can be written as:
Future(orders.find(o => o.id == itemId))
Future.apply(orders.find(o => o.id == itemId))
// However, braces are not allowed with multiple parameter.
def foo(a:String, b:String) = ???
foo("1","2") //work
foo{"1", "2"} //won't work.

Scala - evaluate function calls sequentially until one return

I have a few 'legacy' endpoints that can return the Data I'm looking for.
def mainCall(id): Data {
maybeMyDataInEndpoint1(id: UUID): DataA
maybeMyDataInEndpoint2(id: UUID): DataB
maybeMyDataInEndpoint3(id: UUID): DataC
}
null can be returned if no DataX found
return types for each method are different. There are a convert method that converting each DataX to unified Data.
The endpoints are not Scala-ish
What is the best Scala approach to evaluate those method calls sequentially until I have the value I need?
In pseudo I would do something like:
val myData = maybeMyDataInEndpoint1 getOrElse maybeMyDataInEndpoint2 getOrElse maybeMyDataInEndpoint3
I'd use an easier approach, though the other Answers use more elaborate language features.
Just use Option() to catch the null, chain with orElse. I'm assuming methods convertX(d:DataX):Data for explicit conversion. As it might not be found at all we return an Option
def mainCall(id: UUID): Option[Data] {
Option(maybeMyDataInEndpoint1(id)).map(convertA)
.orElse(Option(maybeMyDataInEndpoint2(id)).map(convertB))
.orElse(Option(maybeMyDataInEndpoint3(id)).map(convertC))
}
Maybe You can lift these methods as high order functions of Lists and collectFirst, like:
val fs = List(maybeMyDataInEndpoint1 _, maybeMyDataInEndpoint2 _, maybeMyDataInEndpoint3 _)
val f = (a: UUID) => fs.collectFirst {
case u if u(a) != null => u(a)
}
r(myUUID)
The best Scala approach IMHO is to do things in the most straightforward way.
To handle optional values (or nulls from Java land), use Option.
To sequentially evaluate a list of methods, fold over a Seq of functions.
To convert from one data type to another, use either (1.) implicit conversions or (2.) regular functions depending on the situation and your preference.
(Edit) Assuming implicit conversions:
def legacyEndpoint[A](endpoint: UUID => A)(implicit convert: A => Data) =
(id: UUID) => Option(endpoint(id)).map(convert)
val legacyEndpoints = Seq(
legacyEndpoint(maybeMyDataInEndpoint1),
legacyEndpoint(maybeMyDataInEndpoint2),
legacyEndpoint(maybeMyDataInEndpoint3)
)
def mainCall(id: UUID): Option[Data] =
legacyEndpoints.foldLeft(Option.empty[Data])(_ orElse _(id))
(Edit) Using explicit conversions:
def legacyEndpoint[A](endpoint: UUID => A)(convert: A => Data) =
(id: UUID) => Option(endpoint(id)).map(convert)
val legacyEndpoints = Seq(
legacyEndpoint(maybeMyDataInEndpoint1)(fromDataA),
legacyEndpoint(maybeMyDataInEndpoint2)(fromDataB),
legacyEndpoint(maybeMyDataInEndpoint3)(fromDataC)
)
... // same as before
Here is one way to do it.
(1) You can make your convert methods implicit (or wrap them into implicit wrappers) for convenience.
(2) Then use Stream to build chain from method calls. You should give type inference a hint that you want your stream to contain Data elements (not DataX as returned by legacy methods) so that appropriate implicit convert will be applied to each result of a legacy method call.
(3) Since Stream is lazy and evaluates its tail "by name" only first method gets called so far. At this point you can apply lazy filter to skip null results.
(4) Now you can actually evaluate chain, getting first non-null result with headOption
(HACK) Unfortunately, scala type inference (at the time of writing, v2.12.4) is not powerful enough to allow using #:: stream methods, unless you guide it every step of the way. Using cons makes inference happy but is cumbersome. Also, building stream using vararg apply method of companion object is not an option too, since scala does not support "by-name" varargs yet. In my example below I use combination of stream and toLazyData methods. stream is a generic helper, builds streams from 0-arg functions. toLazyData is an implicit "by-name" conversion designed to interplay with implicit convert functions that convert from DataX to Data.
Here is the demo that demonstrates the idea with more detail:
object Demo {
case class Data(value: String)
class DataA
class DataB
class DataC
def maybeMyDataInEndpoint1(id: String): DataA = {
println("maybeMyDataInEndpoint1")
null
}
def maybeMyDataInEndpoint2(id: String): DataB = {
println("maybeMyDataInEndpoint2")
new DataB
}
def maybeMyDataInEndpoint3(id: String): DataC = {
println("maybeMyDataInEndpoint3")
new DataC
}
implicit def convert(data: DataA): Data = if (data == null) null else Data(data.toString)
implicit def convert(data: DataB): Data = if (data == null) null else Data(data.toString)
implicit def convert(data: DataC): Data = if (data == null) null else Data(data.toString)
implicit def toLazyData[T](value: => T)(implicit convert: T => Data): (() => Data) = () => convert(value)
def stream[T](xs: (() => T)*): Stream[T] = {
xs.toStream.map(_())
}
def main (args: Array[String]) {
val chain = stream(
maybeMyDataInEndpoint1("1"),
maybeMyDataInEndpoint2("2"),
maybeMyDataInEndpoint3("3")
)
val result = chain.filter(_ != null).headOption.getOrElse(Data("default"))
println(result)
}
}
This prints:
maybeMyDataInEndpoint1
maybeMyDataInEndpoint2
Data(Demo$DataB#16022d9d)
Here maybeMyDataInEndpoint1 returns null and maybeMyDataInEndpoint2 needs to be invoked, delivering DataB, maybeMyDataInEndpoint3 never gets invoked since we already have the result.
I think #g.krastev's answer is perfectly good for your use case and you should accept that. I'm just expending a bit on it to show how you can make the last step slightly better with cats.
First, the boilerplate:
import java.util.UUID
final case class DataA(i: Int)
final case class DataB(i: Int)
final case class DataC(i: Int)
type Data = Int
def convertA(a: DataA): Data = a.i
def convertB(b: DataB): Data = b.i
def convertC(c: DataC): Data = c.i
def maybeMyDataInEndpoint1(id: UUID): DataA = DataA(1)
def maybeMyDataInEndpoint2(id: UUID): DataB = DataB(2)
def maybeMyDataInEndpoint3(id: UUID): DataC = DataC(3)
This is basically what you have, in a way that you can copy/paste in the REPL and have compile.
Now, let's first declare a way to turn each of your endpoints into something safe and unified:
def makeSafe[A, B](evaluate: UUID ⇒ A, f: A ⇒ B): UUID ⇒ Option[B] =
id ⇒ Option(evaluate(id)).map(f)
With this in place, you can, for example, call the following to turn maybeMyDataInEndpoint1 into a UUID => Option[A]:
makeSafe(maybeMyDataInEndpoint1, convertA)
The idea is now to turn your endpoints into a list of UUID => Option[A] and fold over that list. Here's your list:
val endpoints = List(
makeSafe(maybeMyDataInEndpoint1, convertA),
makeSafe(maybeMyDataInEndpoint2, convertB),
makeSafe(maybeMyDataInEndpoint3, convertC)
)
You can now fold on it manually, which is what #g.krastev did:
def mainCall(id: UUID): Option[Data] =
endpoints.foldLeft(None: Option[Data])(_ orElse _(id))
If you're fine with a cats dependency, the notion of folding over a list of options is just a concrete use case of a common pattern (the interaction of Foldable and Monoid):
import cats._
import cats.implicits._
def mainCall(id: UUID): Option[Data] = endpoints.foldMap(_(id))
There are other ways to make this nicer still, but they might be overkill in this context - I'd probably declare a type class to turn any type into a Data, say, to give makeSafe a cleaner type signature.

Define return value in Spark Scala UDF

Imagine the following code:
def myUdf(arg: Int) = udf((vector: MyData) => {
// complex logic that returns a Double
})
How can I define the return type for myUdf so that people looking at the code will know immediately that it returns a Double?
I see two ways to do it, either define a method first and then lift it to a function
def myMethod(vector:MyData) : Double = {
// complex logic that returns a Double
}
val myUdf = udf(myMethod _)
or define a function first with explicit type:
val myFunction: Function1[MyData,Double] = (vector:MyData) => {
// complex logic that returns a Double
}
val myUdf = udf(myFunction)
I normally use the firt approach for my UDFs
Spark functions define several udf methods that have the following modifier/type: static <RT,A1, ..., A10> UserDefinedFunction
You can specify the input/output data types in square brackets as follows:
def myUdf(arg: Int) = udf[Double, MyData]((vector: MyData) => {
// complex logic that returns a Double
})
You can pass a type parameter to udf but you need to seemingly counter-intuitively pass the return type first, followed by the input types like [ReturnType, ArgTypes...], at least as of Spark 2.3.x. Using the original example (which seems to be a curried function based on arg):
def myUdf(arg: Int) = udf[Double, Seq[Int]]((vector: Seq[Int]) => {
13.37 // whatever
})
There is nothing special about UDF with lambda functions, they behave just like scala lambda function (see Specifying the lambda return type in Scala) so you could do:
def myUdf(arg: Int) = udf(((vector: MyData) => {
// complex logic that returns a Double
}): (MyData => Double))
or instead explicitly define your function:
def myFuncWithArg(arg: Int) {
def myFunc(vector: MyData): Double = {
// complex logic that returns a Double. Use arg here
}
myFunc _
}
def myUdf(arg: Int) = udf(myFuncWithArg(arg))

How can I extend Scala collections with member values?

Say I have the following data structure:
case class Timestamped[CC[M] < Seq[M]](elems : CC, timestamp : String)
So it's essentially a sequence with an attribute -- a timestamp -- attached to it. This works fine and I could create new instances with the syntax
val t = Timestamped(Seq(1,2,3,4),"2014-02-25")
t.elems.head // 1
t.timestamp // "2014-05-25"
The syntax is unwieldly and instead I want to be able to do something like:
Timestamped(1,2,3,4)("2014-02-25")
t.head // 1
t.timestamp // "2014-05-25"
Where timestamped is just an extension of a Seq and it's implementation SeqLike, with a single attribute val timestamp : String.
This seems easy to do; just use a Seq with a mixin TimestampMixin { val timestamp : String }. But I can't figure out how to create the constructor. My question is: how do I create a constructor in the companion object, that creates a sequence with an extra member value? The signature is as follows:
object Timestamped {
def apply(elems: M*)(timestamp : String) : Seq[M] with TimestampMixin = ???
}
You'll see that it's not straightforward; collections use Builders to instantiate themselves, so I can't simply call the constructor an override some vals.
Scala collections are very complicated structures when it comes down to it. Extending Seq requires implementing apply, length, and iterator methods. In the end, you'll probably end up duplicating existing code for List, Set, or something else. You'll also probably have to worry about CanBuildFroms for your collection, which in the end I don't think is worth it if you just want to add a field.
Instead, consider an implicit conversion from your Timestamped type to Seq.
case class Timestamped[A](elems: Seq[A])(timestamp: String)
object Timestamped {
implicit def toSeq[A](ts: Timestamped[A]): Seq[A] = ts.elems
}
Now, whenever I try to call a method from Seq, the compiler will implicitly convert Timestamped to Seq, and we can proceed as normal.
scala> val ts = Timestamped(List(1,2,3,4))("1/2/34")
ts: Timestamped[Int] = Timestamped(List(1, 2, 3, 4))
scala> ts.filter(_ > 2)
res18: Seq[Int] = List(3, 4)
There is one major drawback here, and it's that we're now stuck with Seq after performing operations on the original Timestamped.
Go the other way... extend Seq, it only has 3 abstract members:
case class Stamped[T](elems: Seq[T], stamp: Long) extends Seq[T] {
override def apply(i: Int) = elems.apply(i)
override def iterator = elems.iterator
override def length = elems.length
}
val x = Stamped(List(10,20,30), 15L)
println(x.head) // 10
println(x.timeStamp) // 15
println(x.map { _ * 10}) // List(100, 200, 300)
println(x.filter { _ > 20}) // List(30)
Keep in mind, this only works as long as Seq is specific enough for your use cases, if you later find you need more complex collection behavior this may become untenable.
EDIT: Added a version closer to the signature you were trying to create. Not sure if this helps you any more:
case class Stamped[T](elems: T*)(stamp: Long) extends Seq[T] {
def timeStamp = stamp
override def apply(i: Int) = elems.apply(i)
override def iterator = elems.iterator
override def length = elems.length
}
val x = Stamped(10,20,30)(15L)
println(x.head) // 10
println(x.timeStamp) // 15
println(x.map { _ * 10}) // List(100, 200, 300)
println(x.filter { _ > 20}) // List(30)
Where elems would end up being a generically created WrappedArray.