Python Type Hints for Apache Beam ValueProvider

How does one use type hints for ValueProvider value types passed to PTransform and DoFn classes?
class MyPTransform(beam.PTransform):
    def __init__(self, my_value_provider: ValueProvider):
        # How do I enforce my_value_provider has value_type of str?
        self.my_value_provider = my_value_provider
I can make this a RuntimeValueProvider or StaticValueProvider and test this explicitly:
my_value_provider.value_type == str
How do others do this? I didn't see anything here: https://beam.apache.org/documentation/sdks/python-type-safety

I don't think there is a way to enforce this with Python's type checking, although you could always add your own runtime type check to potentially improve error messages.
Alternatively, you can avoid having to use ValueProvider at all by using Flex Templates instead.

I ended up creating a wrapper value provider like so; opinions/comments welcome.
from typing import Generic, Type, TypeVar
from apache_beam.options.value_provider import ValueProvider, StaticValueProvider, RuntimeValueProvider

T = TypeVar('T')

class TypedValueProvider(ValueProvider, Generic[T]):
    """
    Provides a wrapper around Beam's ValueProvider to allow specifying value type hints.
    """
    def __init__(self, value_provider: ValueProvider, value_type: Type[T]):
        self.value_provider = value_provider
        self.value_type = value_type
        assert value_provider.value_type == value_type

    def is_accessible(self):
        return self.value_provider.is_accessible()

    def get(self):
        return self.value_provider.get()

if __name__ == '__main__':
    svp = StaticValueProvider(str, 'blah')
    dvp: TypedValueProvider[str] = TypedValueProvider(svp, str)
    print(isinstance(dvp, ValueProvider))
    assert 'blah' == dvp.get()

    rvp = RuntimeValueProvider('key', str, 'default')
    RuntimeValueProvider.set_runtime_options({'key': 'value'})
    dvp = TypedValueProvider(rvp, str)
    print(isinstance(dvp, ValueProvider))
    assert 'value' == dvp.get()

Related

How to validate string format with Scala at compile time

My function has one parameter whose type is String, but its length must be 4. Can I validate this parameter at compile time?
Haskell and F# have type-level programming that can do this kind of validation at compile time, e.g. a NonEmptyList.
How can I do this in Scala? I think Shapeless can do it, but I don't understand how.
Thanks in advance for any suggestions.
Yes, Shapeless can do this. Perhaps something like this:
def f(s: Sized[IndexedSeq[Char], Nat._4]): ...
You wouldn't be able to pass strings directly to this, though. You'd have to do something like f(Sized('a', 'b', 'c', 'd'))
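For instance, a minimal sketch of how this could look (assuming shapeless 2.x; the body of f is illustrative):
import shapeless._

// The length-4 requirement lives in the type, so a wrong arity fails to compile.
def f(s: Sized[IndexedSeq[Char], Nat._4]): String = s.unsized.mkString

f(Sized[IndexedSeq]('a', 'b', 'c', 'd'))   // compiles: exactly four elements
// f(Sized[IndexedSeq]('a', 'b', 'c'))     // rejected at compile time: _3 is not _4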
You can't with vanilla Scala.
The best approach is to create a special type for this:
case class SpecialString(string: String) {
  require(string.length == 4)
}
Then, make your function receive SpecialString as a parameter instead of a String.
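A quick sketch of the wrapper in use (process is a hypothetical consumer; note that require fires at runtime, not at compile time):
// Accepts only the validated wrapper, never a raw String:
def process(s: SpecialString): Unit = println(s.string)

process(SpecialString("abcd"))   // OK
// process(SpecialString("abc")) // throws IllegalArgumentException at runtime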
Using macros is also an option for compile-time validation. See this post by Arnout Engelen: http://blog.xebia.com/compile-time-evaluation-scala-macros/
I've modified his example to define a string-validating function:
object CompileTimeStringCheck {
  import scala.language.experimental.macros

  // This function exposed to consumers has a normal Scala type,
  // but it is implemented as a macro:
  def stringCheck(s: String): String = macro CompileTimeStringCheck.stringCheck_impl

  import scala.reflect.macros.blackbox.Context

  // The macro implementation will receive a Context and
  // the ASTs of the parameters passed to it:
  def stringCheck_impl(c: Context)(s: c.Expr[String]): c.Expr[String] = {
    import c.universe._
    // We can pattern-match on the AST:
    s match {
      case Expr(Literal(Constant(nValue: String))) =>
        // We perform the calculation:
        val result = normalStringCheck(nValue)
        // And produce an AST for the result of the computation:
        c.Expr(Literal(Constant(result)))
      case other =>
        // Yes, this will be printed at compile time:
        println("Yow!")
        ???
    }
  }

  // The actual implementation is regular old-fashioned Scala code:
  private def normalStringCheck(s: String): String =
    if (s.length == 4) s
    else throw new Exception("Baaaaaah!")
}
Here's the catch, though: This needs to be compiled before it can be used, i.e. put it into a utils jar or something. Then you can use it at a later compile time:
import CompileTimeStringCheck._

object Test extends App {
  println(stringCheck("yes!"))
}
Again, see Arnout Engelen's post for more details and the original solution (http://blog.xebia.com/compile-time-evaluation-scala-macros/).

Why does Scala need a duplicate constructor? (java.lang.NoSuchMethodException)

I was receiving this error in my Hadoop job.
java.lang.NoSuchMethodException: <PackageName>.<ClassName>.<init>(<parameters>)
In most Scala code this would be caught at compile time. But since this job is instantiated at runtime, I was not catching it at compile time.
I would have thought the default parameter would cause constructors with both signatures to be created, one of them taking a single argument.
class BasicDynamicBlocker(args: Args, evaluation: Boolean = false) extends Job(args) with HiveAccess {
  // I NEEDED THIS TOO:
  def this(args: Args) = {
    this(args, false)
  }
  ...
}
I learned the hard way that I needed to declare the overloaded constructor using this. (I wanted to write this out in case it helps someone else.)
I also have a small question: it still seems redundant to me. Is there a reason the Scala language's design requires this?
It is not the case that having a default parameter generates overloads for every possible combination. For example, given:
def method(num: Int = 4, str: String = "") = ???
you might expect the compiler to generate
def method(num: Int) = method(num, "")
def method(str: String) = method(4, str)
def method() = method(4, "")
but that is not the case.
Instead, you get generated methods (in the companion object), one for each default parameter:
def method$default$1: Int = 4
def method$default$2: String = ""
and whenever you say in your code
method(str = "a")
it will just be changed to
method(method$default$1, "a")
So in your case, a constructor with the signature this(args: Args) simply did not exist; there was only the two-parameter version.
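To make the failure concrete, here is a sketch (class name and argument are illustrative) of the reflective constructor lookup a framework like Hadoop performs:
// Frameworks resolve job constructors reflectively, by exact signature:
val clazz = Class.forName("com.example.BasicDynamicBlocker")
val ctor = clazz.getConstructor(classOf[Args]) // throws NoSuchMethodException unless
                                               // `def this(args: Args)` is declared,
                                               // since the bytecode otherwise only has
                                               // the (Args, Boolean) constructor
val job = ctor.newInstance(args)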
You can read more here: http://docs.scala-lang.org/sips/completed/named-and-default-arguments.html

Implement a parameter-conversion function with Scala

I'm trying to implement a clever parameter-converter function in Scala.
Basically, my program needs to read parameters from a properties file, so they are all strings, and I would like to convert each parameter to a specific type that I pass as a parameter.
This is the implementation that I start coding:
def getParam[T](key: String, value: String, paramClass: T): Any = {
  value match {
    paramClass match {
      case i if i == Int => value.trim.toInt
      case b if b == Boolean => value.trim.toBoolean
      case _ => value.trim
    }
  }
  /* Exception handling is missing at the moment */
}
Usage:
val convertedInt = getParam("some.int.property.key", "10", Int)
val convertedBoolean = getParam("some.boolean.property.key", "true", Boolean)
val plainString = getParam("some.string.property.key", "value", String)
Points to note:
For my program I need just three main types for now: String, Int and Boolean; if possible I would like to extend this to more types.
This is not clever, because I need to spell out the match against every possible type to convert; I would like a more reflection-like approach.
This code doesn't work; it gives me the compile error "object java.lang.String is not a value" when I try to convert (actually no conversion happens, because the property values come in as String).
Can anyone help me? I'm quite a newbie in Scala and maybe I'm missing something.
The Scala approach to the problem you are trying to solve is context bounds. Given a type T, you can require an object like ParamMeta[T], which will do all the conversions for you. So you can rewrite your code to something like this:
trait ParamMeta[T] {
  def apply(v: String): T
}

def getParam[T](key: String, value: String)(implicit meta: ParamMeta[T]): T =
  meta(value.trim)

implicit case object IntMeta extends ParamMeta[Int] {
  def apply(v: String): Int = v.toInt
}
// and so on

getParam[Int](/* ... */, "127") // = 127
There is no need to throw exceptions! If you supply an unsupported type as the getParam type argument, the code will not even compile. You can rewrite the signature of getParam using the syntactic sugar for context bounds, T : Bound, which requires an implicit value Bound[T]; you then use implicitly[Bound[T]] to access that value (because there is no parameter name for it).
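For illustration, a sketch of the same getParam written with that sugar (renamed getParamCB here to avoid clashing with the version above):
def getParamCB[T: ParamMeta](key: String, value: String): T =
  implicitly[ParamMeta[T]].apply(value.trim) // recovers the unnamed implicit

getParamCB[Int]("some.int.property.key", "127") // = 127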
Also, this code does not use reflection at all: the compiler searches for an implicit value of type ParamMeta[Int], finds it in object IntMeta, and rewrites the function call as getParam[Int](..., "127")(IntMeta), so all required values are resolved at compile time.
If you feel that writing those case objects is too much boilerplate, and you are sure you will not need other methods on these objects in the future (for example, to convert T back to String), you can simplify the declarations like this:
case class ParamMeta[T](f: String => T) {
  def apply(s: String): T = f(s)
}

implicit val stringMeta = ParamMeta(identity)
implicit val intMeta = ParamMeta(_.toInt)
To avoid importing them every time you use getParam, you can declare these implicits in the companion object of the ParamMeta trait/case class, and Scala will pick them up automatically.
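For example, a sketch of such a companion object (reusing the ParamMeta case class above, with a Boolean instance added):
object ParamMeta {
  implicit val stringMeta: ParamMeta[String] = ParamMeta(identity)
  implicit val intMeta: ParamMeta[Int] = ParamMeta(_.toInt)
  implicit val boolMeta: ParamMeta[Boolean] = ParamMeta(_.toBoolean)
}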
As for the original match approach, you can pass an implicit ClassTag[T] to your function, so you will be able to match on classes. You do not need to create any ClassTag values yourself, as the compiler will pass them automatically. Here is a simple example of class matching:
import scala.reflect._

def test[T: ClassTag] = classTag[T].runtimeClass match {
  case x if x == classOf[Int] => "I'm an int!"
  case x if x == classOf[String] => "I'm a string!"
}

println(test[Int])
println(test[String])
However, this approach is less flexible than the ParamMeta one, and ParamMeta should be preferred.
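For completeness, here is a sketch of the asker's original runtime-match getParam reworked with ClassTag (getParamRT is a hypothetical name; the type-class version remains preferable):
import scala.reflect._

def getParamRT[T: ClassTag](key: String, value: String): T =
  (classTag[T].runtimeClass match {
    case c if c == classOf[Int]     => value.trim.toInt
    case c if c == classOf[Boolean] => value.trim.toBoolean
    case _                          => value.trim
  }).asInstanceOf[T]

getParamRT[Int]("some.int.property.key", "10")           // 10
getParamRT[Boolean]("some.boolean.property.key", "true") // true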

Optional boolean parameters in Scala

I've lately been working on a DSL-style library wrapper over Apache POI functionality and have faced a challenge I can't seem to find a good solution for.
One of the goals of the library is to provide the user with the ability to build a spreadsheet model as a collection of immutable objects, e.g.
val headerStyle = CellStyle(fillPattern = CellFill.Solid, fillForegroundColor = Color.AquaMarine, font = Font(bold = true))
val italicStyle = CellStyle(font = Font(italic = true))
with the following assumptions:
The user can optionally specify any parameter (meaning you can create a CellStyle with no parameters, as well as with the full list of explicitly specified parameters);
If a parameter hasn't been specified explicitly by the user, it is considered undefined and the default environment value (the default value for the format we're converting to) will be used.
The second point is important, as I want to convert this data model into multiple formats; e.g. the default font in Excel doesn't have to be the same as the default font in an HTML browser (and if the user doesn't define the font family explicitly, I'd like them to see the data using those defaults).
To deal with the requirements I've used a variation of the null pattern described here: Pattern for optional-parameters in Scala using null, and also suggested here: Scala default parameters and null (below is a simplified example).
object ModelObject {
  def apply(modelParam: String = null): ModelObject = ModelObject(
    modelParam = Option(modelParam)
  )
}

case class ModelObject private(modelParam: Option[String])
Since null is used only internally in the companion object and is very localized, I decided to accept the null sacrifice for the sake of the simplicity of the solution. The pattern works well with all reference types.
However, null cannot be specified for Scala's primitive value types. This is an especially big problem with Boolean, for which I effectively consider three states (true, false and undefined). Wanting to provide an interface where the user is still able to write bold = true, I decided to reach for the Java wrappers, which accept nulls.
object ModelObject {
  def apply(boolParam: java.lang.Boolean = null): ModelObject = ModelObject(
    boolParam = Option(boolParam).map(_.booleanValue)
  )
}

case class ModelObject private(boolParam: Option[Boolean])
This however doesn't feel right, and I've been wondering whether there is a better approach to the problem. I've been thinking about defining union types (with an additional object denoting the undefined value): How to define "type disjunction" (union types)?, but since the undefined state shouldn't be used explicitly, the parameter type exposed by the IDE to the user would be very confusing (ideally I'd like it to be Boolean).
Is there any better approach to the problem?
Further information:
More DSL API examples: https://github.com/norbert-radyk/spoiwo/blob/master/examples/com/norbitltd/spoiwo/examples/quickguide/SpoiwoExamples.scala
Sample implementation of the full class: https://github.com/norbert-radyk/spoiwo/blob/master/src/main/scala/com/norbitltd/spoiwo/model/CellStyle.scala
You can use a variation of the pattern I described here: How to provide helper methods to build a Map
To sum it up, you can use some helper generic class to represent optional arguments (much like an Option).
abstract sealed class OptArg[+T] {
  def toOption: Option[T]
}

object OptArg {
  implicit def autoWrap[T](value: T): OptArg[T] = SomeArg(value)
  implicit def toOption[T](arg: OptArg[T]): Option[T] = arg.toOption
}

case class SomeArg[+T](value: T) extends OptArg[T] {
  def toOption = Some(value)
}

case object NoArg extends OptArg[Nothing] {
  val toOption = None
}
You can simply use it like this:
scala> case class ModelObject(boolParam: OptArg[Boolean] = NoArg)
defined class ModelObject

scala> ModelObject(true)
res12: ModelObject = ModelObject(SomeArg(true))

scala> ModelObject()
res13: ModelObject = ModelObject(NoArg)
However, as you can see, OptArg now leaks into the ModelObject class itself (boolParam is typed as OptArg[Boolean] instead of Option[Boolean]).
Fixing this (if it is important to you) just requires defining a separate factory, as you have done yourself:
scala> :paste
// Entering paste mode (ctrl-D to finish)

case class ModelObject private(boolParam: Option[Boolean])

object ModelObject {
  def apply(boolParam: OptArg[Boolean] = NoArg): ModelObject = new ModelObject(
    boolParam = boolParam.toOption
  )
}

// Exiting paste mode, now interpreting.

defined class ModelObject
defined module ModelObject

scala> ModelObject(true)
res22: ModelObject = ModelObject(Some(true))

scala> ModelObject()
res23: ModelObject = ModelObject(None)
UPDATE: The advantage of using this pattern over simply defining several overloaded apply methods (as shown by @drexin) is that in the latter case the number of overloads grows very fast with the number of arguments (2^N). If ModelObject had 4 parameters, that would mean 16 overloads to write by hand!
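For comparison, a sketch (class and field names are illustrative) of how the pattern scales to several optional parameters, one default per parameter instead of 2^N overloads:
case class StyleModel private(bold: Option[Boolean], italic: Option[Boolean])

object StyleModel {
  def apply(bold: OptArg[Boolean] = NoArg, italic: OptArg[Boolean] = NoArg): StyleModel =
    new StyleModel(bold.toOption, italic.toOption)
}

StyleModel(bold = true) // StyleModel(Some(true),None): two defaults, not four overloads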

Custom Type Mapper for Slick SQL

I've found this example from slick testing:
https://github.com/slick/slick/blob/master/slick-testkit/src/main/scala/com/typesafe/slick/testkit/tests/MapperTest.scala
sealed trait Bool
case object True extends Bool
case object False extends Bool

implicit val boolTypeMapper = MappedColumnType.base[Bool, String](
  { b =>
    assertNotNull(b)
    if (b == True) "y" else "n"
  }, { i =>
    assertNotNull(i)
    if (i == "y") True else False
  }
)
But I'm trying to create a TypeMapper for org.joda.time.DateTime to/from java.sql.Timestamp, without much success. The Bool example is very particular and I'm having trouble adapting it. Joda Time is super common, so any help would be much appreciated.
To be clear, I'm using interpolated queries, sql"""select colA, colB from tableA where id = ${id}""" and such. When doing a select the system works well, using Joda date types in the implicit GetResult converter.
However, for inserts there doesn't seem to be a way to do an implicit conversion, or it is ignoring the code provided below in Answer #1; I get the same error as before:
could not find implicit value for parameter pconv: scala.slick.jdbc.SetParameter[(Option[Int], String, String, Option[org.joda.time.DateTime])]
I'm not using the lifted style of Slick configuration with the annotated Table objects; perhaps that is why it is not finding/using the TypeMapper.
I use the following in my code, which might also work for you:
import java.sql.Timestamp
import org.joda.time.DateTime
import org.joda.time.DateTimeZone.UTC
import scala.slick.lifted.MappedTypeMapper.base
import scala.slick.lifted.TypeMapper

implicit val DateTimeMapper: TypeMapper[DateTime] =
  base[DateTime, Timestamp](
    d => new Timestamp(d.getMillis),
    t => new DateTime(t.getTime, UTC))
Edit (after your edit =^.~= ): (a bit late but I hope it still helps)
Ah, OK, since you're not using lifted embedding, you'll have to define different implicit values (as indicated by the error message from the compiler). Something like the following should work (though I haven't tried myself):
implicit val SetDateTime: SetParameter[DateTime] = new SetParameter[DateTime] {
  def apply(d: DateTime, p: PositionedParameters): Unit =
    p.setTimestamp(new Timestamp(d.getMillis))
}
For the other way round (retrieving the results of the SELECT), it looks like you'd need to define a GetResult:
implicit val GetDateTime: GetResult[DateTime] = new GetResult[DateTime] {
  def apply(r: PositionedResult) = new DateTime(r.nextTimestamp.getTime, UTC)
}
So, basically this is just the same as with the lifted embedding, just encoded with different types.
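For example, a sketch of both implicits in action with the interpolated plain-SQL API from the question (table and column names are hypothetical):
import scala.slick.jdbc.StaticQuery.interpolation

// SetDateTime is consulted when binding $created;
// GetDateTime when reading the DateTime column back out.
def insertEvent(id: Int, created: DateTime) =
  sqlu"insert into events(id, created_at) values ($id, $created)"

def eventCreated(id: Int) =
  sql"select created_at from events where id = $id".as[DateTime]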
Why not dig into something that already works great?
Look at
https://gist.github.com/dragisak/4756344
and
https://github.com/tototoshi/slick-joda-mapper
The first you can copy-paste into your project, and the second is available from Maven Central.