How to take the logarithm of an RDD in Spark (Scala) - scala

How do I take the logarithm of an RDD? I have a val rdd: RDD[Double] and I simply want to take the logarithm of it.
This is essentially the same question as this, but the solution proposed does not work. I run:
val rdd: RDD[Double] = <something>
val log_y = rdd.map(x => org.apache.commons.math3.analysis.function.Log(2.0, x))
and I get the error:
error: object org.apache.commons.math3.analysis.function.Log is not a value

As far as I can see, this class from math3 computes the natural log function, and it works like this:
new org.apache.commons.math3.analysis.function.Log().value(3)
res1: Double = 1.0986122886681098
This is the version of commons-math3 that ships with Spark 3.1.2. If you are using that same version, you can write:
val log_y = rdd.map(x => new org.apache.commons.math3.analysis.function.Log().value(x))
This is the code of this class:
public class Log implements UnivariateDifferentiableFunction, DifferentiableUnivariateFunction {
    /** {@inheritDoc} */
    public double value(double x) {
        return FastMath.log(x);
    }

    /** {@inheritDoc}
     * @deprecated as of 3.1, replaced by {@link #value(DerivativeStructure)}
     */
    @Deprecated
    public UnivariateFunction derivative() {
        return FunctionUtils.toDifferentiableUnivariateFunction(this).derivative();
    }

    /** {@inheritDoc}
     * @since 3.1
     */
    public DerivativeStructure value(final DerivativeStructure t) {
        return t.log();
    }
}
As you can see, you need to create an instance, and its value method is what actually computes the logarithm. To avoid creating a new instance per RDD element, you can create a single instance outside the map (it will be serialized to the executors along with the closure), or use the class inside a mapPartitions function so that only one instance is created per partition.
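To make the trade-off concrete, here is a minimal sketch that uses a plain Scala collection as a stand-in for the RDD (so it runs without Spark); the Log class below is a hypothetical stand-in for the commons-math3 one, with FastMath.log replaced by scala.math.log:

```scala
object LogDemo {
  // Simplest option when you only need the natural log: scala.math.log
  // is a plain function you can pass to map directly.
  def naturalLogs(xs: Seq[Double]): Seq[Double] = xs.map(math.log)

  // Stand-in for org.apache.commons.math3.analysis.function.Log.
  final class Log {
    def value(x: Double): Double = math.log(x)
  }

  // Create one Log instance outside the loop and reuse it for every
  // element; with an RDD you would do the same inside mapPartitions,
  // creating one instance per partition.
  def commonsStyleLogs(xs: Seq[Double]): Seq[Double] = {
    val log = new Log()
    xs.map(x => log.value(x))
  }
}
```

With a real RDD the second approach would look like `rdd.mapPartitions { it => val log = new Log(); it.map(log.value) }`.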

Related

How can I view the code that Scala uses to automatically generate the apply function for case classes?

When defining a Scala case class, an apply function is automatically generated which behaves similarly to the way the default constructor in java behaves. How can I see the code which automatically generates the apply function? I presume the code is a macro in the Scala compiler somewhere but I'm not sure.
To clarify I am not interested in viewing the resultant apply method of a given case class but interested in the macro/code which generates the apply method.
It's not a macro; these methods are synthesized by the compiler directly.
apply, unapply, and copy are generated in scala.tools.nsc.typechecker.Namers:
https://github.com/scala/scala/blob/2.13.x/src/compiler/scala/tools/nsc/typechecker/Namers.scala#L1839-L1862
/** Given a case class
 *    case class C[Ts] (ps: Us)
 *  Add the following methods to toScope:
 *  1. if case class is not abstract, add
 *    <synthetic> <case> def apply[Ts](ps: Us): C[Ts] = new C[Ts](ps)
 *  2. add a method
 *    <synthetic> <case> def unapply[Ts](x: C[Ts]) = <ret-val>
 *  where <ret-val> is the caseClassUnapplyReturnValue of class C (see UnApplies.scala)
 *
 *  @param cdef is the class definition of the case class
 *  @param namer is the namer of the module class (the comp. obj)
 */
def addApplyUnapply(cdef: ClassDef, namer: Namer): Unit = {
  if (!cdef.symbol.hasAbstractFlag)
    namer.enterSyntheticSym(caseModuleApplyMeth(cdef))
  val primaryConstructorArity = treeInfo.firstConstructorArgs(cdef.impl.body).size
  if (primaryConstructorArity <= MaxTupleArity)
    namer.enterSyntheticSym(caseModuleUnapplyMeth(cdef))
}

def addCopyMethod(cdef: ClassDef, namer: Namer): Unit = {
  caseClassCopyMeth(cdef) foreach namer.enterSyntheticSym
}
https://github.com/scala/scala/blob/2.13.x/src/compiler/scala/tools/nsc/typechecker/Namers.scala#L1195-L1219
private def templateSig(templ: Template): Type = {
  //...
  // add apply and unapply methods to companion objects of case classes,
  // unless they exist already; here, "clazz" is the module class
  if (clazz.isModuleClass) {
    clazz.attachments.get[ClassForCaseCompanionAttachment] foreach { cma =>
      val cdef = cma.caseClass
      assert(cdef.mods.isCase, "expected case class: "+ cdef)
      addApplyUnapply(cdef, templateNamer)
    }
  }

  // add the copy method to case classes; this needs to be done here, not in SyntheticMethods, because
  // the namer phase must traverse this copy method to create default getters for its parameters.
  // here, clazz is the ClassSymbol of the case class (not the module). (!clazz.hasModuleFlag) excludes
  // the moduleClass symbol of the companion object when the companion is a "case object".
  if (clazz.isCaseClass && !clazz.hasModuleFlag) {
    val modClass = companionSymbolOf(clazz, context).moduleClass
    modClass.attachments.get[ClassForCaseCompanionAttachment] foreach { cma =>
      val cdef = cma.caseClass
      def hasCopy = (decls containsName nme.copy) || parents.exists(_.member(nme.copy).exists)
      // scala/bug#5956 needs (cdef.symbol == clazz): there can be multiple class symbols with the same name
      if (cdef.symbol == clazz && !hasCopy)
        addCopyMethod(cdef, templateNamer)
    }
  }
equals, hashCode, toString are generated in scala.tools.nsc.typechecker.SyntheticMethods
https://github.com/scala/scala/blob/2.13.x/src/compiler/scala/tools/nsc/typechecker/SyntheticMethods.scala
/** Synthetic method implementations for case classes and case objects.
 *
 *  Added to all case classes/objects:
 *    def productArity: Int
 *    def productElement(n: Int): Any
 *    def productPrefix: String
 *    def productIterator: Iterator[Any]
 *
 *  Selectively added to case classes/objects, unless a non-default
 *  implementation already exists:
 *    def equals(other: Any): Boolean
 *    def hashCode(): Int
 *    def canEqual(other: Any): Boolean
 *    def toString(): String
 *
 *  Special handling:
 *    protected def writeReplace(): AnyRef
 */
trait SyntheticMethods extends ast.TreeDSL {
  //...
Symbols for accessors are created in scala.reflect.internal.Symbols
https://github.com/scala/scala/blob/2.13.x/src/reflect/scala/reflect/internal/Symbols.scala#L2103-L2128
/** For a case class, the symbols of the accessor methods, one for each
 *  argument in the first parameter list of the primary constructor.
 *  The empty list for all other classes.
 *
 *  This list will be sorted to correspond to the declaration order
 *  in the constructor parameter
 */
final def caseFieldAccessors: List[Symbol] = {
  // We can't rely on the ordering of the case field accessors within decls --
  // handling of non-public parameters seems to change the order (see scala/bug#7035.)
  //
  // Luckily, the constrParamAccessors are still sorted properly, so sort the field-accessors using them
  // (need to undo name-mangling, including the sneaky trailing whitespace)
  //
  // The slightly more principled approach of using the paramss of the
  // primary constructor leads to cycles in, for example, pos/t5084.scala.
  val primaryNames = constrParamAccessors map (_.name.dropLocal)
  def nameStartsWithOrigDollar(name: Name, prefix: Name) =
    name.startsWith(prefix) && name.length > prefix.length + 1 && name.charAt(prefix.length) == '$'
  caseFieldAccessorsUnsorted.sortBy { acc =>
    primaryNames indexWhere { orig =>
      (acc.name == orig) || nameStartsWithOrigDollar(acc.name, orig)
    }
  }
}

private final def caseFieldAccessorsUnsorted: List[Symbol] = info.decls.toList.filter(_.isCaseAccessorMethod)
Perhaps I could point out a few places in the codebase that might be relevant.
First, there is a way to correlate Scala Language Specification grammar directly to source code. For example, case classes rule
TmplDef ::= 'case' 'class' ClassDef
relates to Parser.tmplDef
/** {{{
* TmplDef ::= [case] class ClassDef
* | [case] object ObjectDef
* | [override] trait TraitDef
* }}}
*/
def tmplDef(pos: Offset, mods: Modifiers): Tree = {
  ...
  in.token match {
    ...
    case CASECLASS =>
      classDef(pos, (mods | Flags.CASE) withPosition (Flags.CASE, tokenRange(in.prev /*scanner skips on 'case' to 'class', thus take prev*/)))
    ...
  }
}
Specification continues
A case class definition of 𝑐[tps](ps1)…(ps𝑛) with type parameters
tps and value parameters ps implies the definition of a companion
object, which serves as an extractor object.
object 𝑐 {
  def apply[tps](ps1)…(ps𝑛): 𝑐[tps] = new 𝑐[Ts](xs1)…(xs𝑛)
  def unapply[tps](𝑥: 𝑐[tps]) =
    if (𝑥 eq null) scala.None
    else scala.Some(𝑥.xs11, …, 𝑥.xs1𝑘)
}
so let us try to hunt for the implied definition of
def apply[tps](ps1)…(ps𝑛): 𝑐[tps] = new 𝑐[Ts](xs1)…(xs𝑛)
which is another way of saying synthesised definition. Promisingly, there exists MethodSynthesis.scala
/** Logic related to method synthesis which involves cooperation between
* Namer and Typer.
*/
trait MethodSynthesis {
Thus we find two more potential clues: Namer and Typer. I wonder what is in there? But first, MethodSynthesis.scala has only about 300 LOC, so let us just skim through it a bit. We stumble across a promising line
val methDef = factoryMeth(classDef.mods & (AccessFlags | FINAL) | METHOD | IMPLICIT | SYNTHETIC, classDef.name.toTermName, classDef)
"factoryMeth"... there is a ring to it. Find usages! We are quickly led to
/** The apply method corresponding to a case class
*/
def caseModuleApplyMeth(cdef: ClassDef): DefDef = {
  val inheritedMods = constrMods(cdef)
  val mods =
    if (applyShouldInheritAccess(inheritedMods))
      (caseMods | (inheritedMods.flags & PRIVATE)).copy(privateWithin = inheritedMods.privateWithin)
    else
      caseMods
  factoryMeth(mods, nme.apply, cdef)
}
It seems we are on the right track. We also note the name
nme.apply
which is
val apply: NameType = nameType("apply")
Eagerly, we find usages of caseModuleApplyMeth and we are wormholed to Namer.addApplyUnapply
/** Given a case class
 *    case class C[Ts] (ps: Us)
 *  Add the following methods to toScope:
 *  1. if case class is not abstract, add
 *    <synthetic> <case> def apply[Ts](ps: Us): C[Ts] = new C[Ts](ps)
 *  2. add a method
 *    <synthetic> <case> def unapply[Ts](x: C[Ts]) = <ret-val>
 *  where <ret-val> is the caseClassUnapplyReturnValue of class C (see UnApplies.scala)
 *
 *  @param cdef is the class definition of the case class
 *  @param namer is the namer of the module class (the comp. obj)
 */
def addApplyUnapply(cdef: ClassDef, namer: Namer): Unit = {
  if (!cdef.symbol.hasAbstractFlag)
    namer.enterSyntheticSym(caseModuleApplyMeth(cdef))
  val primaryConstructorArity = treeInfo.firstConstructorArgs(cdef.impl.body).size
  if (primaryConstructorArity <= MaxTupleArity)
    namer.enterSyntheticSym(caseModuleUnapplyMeth(cdef))
}
Woohoo! The documentation states
<synthetic> <case> def apply[Ts](ps: Us): C[Ts] = new C[Ts](ps)
which seems eerily similar to SLS version
def apply[tps](ps1)…(ps𝑛): 𝑐[tps] = new 𝑐[Ts](xs1)…(xs𝑛)
Our stumbling-in-the-dark seems to have led us to a discovery.
I noticed that, while others have posted the pieces of code that generate the method's name, signature, type, symbol-table entries, and pretty much everything else, so far nobody has posted the piece of code that generates the actual body of the case class companion object's apply method.
That code is in scala.tools.nsc.typechecker.Unapplies.factoryMeth(mods: Global.Modifiers, name: Global.TermName, cdef: Global.ClassDef): Global.DefDef which is defined in src/compiler/scala/tools/nsc/typechecker/Unapplies.scala, and the relevant part is this:
atPos(cdef.pos.focus)(
DefDef(mods, name, tparams, cparamss, classtpe,
New(classtpe, mmap(cparamss)(gen.paramToArg)))
)
which uses the TreeDSL internal Domain Specific Language for generating Syntax Nodes in the Abstract Syntax Tree, and (roughly) means this:
At the current position in the tree (atPos(cdef.pos.focus))
Splice in a method definition node (DefDef)
Whose body is just a New node, i.e. a constructor invocation.
The description of the TreeDSL trait states:
The goal is that the code generating code should look a lot like the code it generates.
And I think that is true, and makes the code easy to read even if you are not familiar with the compiler internals.
Compare the generating code once again with the generated code:
DefDef(mods, name, tparams, cparamss, classtpe,
New(classtpe, mmap(cparamss)(gen.paramToArg)))
def apply[Tparams](constructorParams): CaseClassType =
new CaseClassType(constructorParams)
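As a quick sanity check, the documented expansion can be observed from plain user code (the Point class below is an illustrative example, not from the compiler sources):

```scala
// The compiler synthesizes Point.apply so that Point(1, 2) is sugar for
// Point.apply(1, 2), whose body is new Point(1, 2), exactly as the
// comment in Namers describes.
case class Point(x: Int, y: Int)

object ApplyDemo {
  val viaApply = Point(1, 2)     // calls the synthetic Point.apply
  val viaNew   = new Point(1, 2) // the body of the synthetic apply
  // equals is also synthesized (in SyntheticMethods), so the two compare equal
  val same: Boolean = viaApply == viaNew
}
```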

Refer to class `type` without an instance

I am just starting to learn Scala and was following this tutorial. They implement a Tree structure, as the code below shows, to create mathematical formulas. Halfway through they introduce the type keyword as
type Environment = String => Int
so that all variables can be mapped to numbers.
My question is: how do I refer to the type without having an instance of Tree? That is, how can I define the type Environment as static?
Example code:
package com.company
/**
 * The tree class which represents a formula
 */
abstract class Tree {
  /**
   * The type to map a variable to a value
   */
  type Environment = String => Int

  /**
   * Using the given Environment, evaluates this tree
   * @param env The environment which maps variables to values
   * @return The result of the formula
   */
  def eval(env: Environment): Int = this match {
    case Sum(lhs, rhs) => lhs.eval(env) + rhs.eval(env)
    case Variable(v) => env(v)
    case Constant(c) => c
  }
}
/**
 * Represents a sum of two values
 * @param lhs Left hand value
 * @param rhs Right hand value
 */
case class Sum(lhs: Tree, rhs: Tree) extends Tree

/**
 * Represents an unknown, named value (a variable)
 * @param variable The name of the variable
 */
case class Variable(variable: String) extends Tree

/**
 * An unnamed constant value
 * @param value The value of this constant
 */
case class Constant(value: Int) extends Tree

/**
 * Base object for this program
 */
object Tree {
  /**
   * Entry point of the application
   * @param args
   */
  def main(args: Array[String]): Unit = {
    //Define a tree: (x + 3) + 2
    val tree: Tree = Sum(Sum(Variable("x"), Constant(3)), Constant(2))
    //Define an environment: x=5 and y=6
    val env: tree.Environment = { case "x" => 5; case "y" => 6 }
    // ^ Refers to the object tree rather than the class Tree
    //val env: Tree.Environment = { case "x" => 5; case "y" => 6 }
    // ^ Results in the error: Error:(55, 27) type Environment is not a member of object com.company.Tree
    println(tree) //show the tree
    println(s"x=${env.apply("x")} | y=${env.apply("y")}") //show values of x and y
    println(tree.eval(env)) //evaluate the tree
  }
}
Use # (type projection) to access type members without referring to the instance:
val env: Tree#Environment = {case "x" => 5; case "y" => 6}
More explanation provided here: What does the `#` operator mean in Scala?
P.S. You cannot actually make your type "static" in the full sense of the word (a true static member), because the JVM "forgets" about such types at runtime (erasure); both tree.Environment and Tree.Environment are (perhaps surprisingly) resolved by the compiler, not at runtime. Even object Tree { type Environment = ... } only makes the type a member of the singleton object Tree (so it is still reached through a reference to Tree).
In Scala, types, just like everything else, can be made "static" by placing them inside an object:
abstract class Tree
object Tree {
  type Environment = String => Int
}
I say "static" in quotes, because Scala (the language) has no notion of static like some other programming languages do.
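A small sketch contrasting the three ways of naming the type (instance path, companion object, and type projection), assuming the alias is declared in both the class and its companion:

```scala
abstract class Tree {
  type Environment = String => Int // path-dependent: named via an instance
}
object Tree {
  type Environment = String => Int // "static": named via the companion object
}
case class Constant(value: Int) extends Tree

object EnvDemo {
  val tree: Tree = Constant(1)
  val viaInstance: tree.Environment   = { case "x" => 5 } // needs the instance
  val viaObject: Tree.Environment     = { case "x" => 5 } // no instance needed
  val viaProjection: Tree#Environment = { case "x" => 5 } // no instance needed
}
```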

Importing data as dynamically as possible

I'm looking for a small pattern:
The program should be able to support various formats as input, then apply a transformation, and in the last step load the data into a database.
Its main purpose is to provide test data.
My initial idea was to "glue" different components together like this:
We have an extractor that extracts from a generic data source [A] to an iterator of [B],
then a transformer that maps [B] to [C], and finally a step that loads [C] into a database. I'm sure there must be a better way of approaching this. Is there a better, possibly more generic way of achieving this?
trait Importer[A, B, C] {
  val extractor: Extractor[A, B]
  val transformer: Transformator[B, C]
  val loader: Loader[C]

  /**
   * this is the method call for chaining all steps together
   */
  def importAndTransformData(dataSource: A): Unit = {
    // extraction step
    val output = extractor.extract(dataSource)
    // transformation step
    val transformed = output map (transformer.transform(_))
    // loading step
    transformed.foreach(loader.load(_))
  }
}
One common approach used in Scala is self-typing (especially as used in the Cake Pattern). In your case that would look something like:
trait Importer[A, B, C] {
  self: Extractor[A, B] with Transformator[B, C] with Loader[C] =>

  /**
   * this is the method call for chaining all steps together
   */
  def importAndTransformData(dataSource: A): Unit = {
    // extraction step
    val output = extract(dataSource)
    // transformation step
    val transformed = output map (transform(_))
    // loading step
    transformed.foreach(load(_))
  }
}
You can then build your Importer with code such as:
val importer = new Importer with FooExtractor with FooBarTransformer with BarLoader {}
or
val testImporter = new Importer with MockExtractor with TestTransformer with MockLoader {}
or similar for your test cases.
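For completeness, here is a self-contained sketch of the wiring with toy components (the concrete implementations below are illustrative stand-ins for the FooExtractor-style classes mentioned above):

```scala
trait Extractor[A, B]     { def extract(a: A): Iterator[B] }
trait Transformator[B, C] { def transform(b: B): C }
trait Loader[C]           { def load(c: C): Unit }

trait Importer[A, B, C] {
  self: Extractor[A, B] with Transformator[B, C] with Loader[C] =>
  def importAndTransformData(dataSource: A): Unit =
    extract(dataSource).map(transform).foreach(load)
}

object CakeDemo {
  val sink = scala.collection.mutable.ListBuffer.empty[Int]

  // Wire all components together at instantiation time; the self-type
  // guarantees an Importer cannot be built without its three parts.
  val importer = new Importer[String, Int, Int]
      with Extractor[String, Int]
      with Transformator[Int, Int]
      with Loader[Int] {
    def extract(s: String): Iterator[Int] = s.split(",").iterator.map(_.trim.toInt)
    def transform(n: Int): Int = n * 2
    def load(n: Int): Unit = sink += n
  }
}
```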

Using Scala 2.10 implicit classes to convert to "built-in" standard library classes

I am trying to use the new Scala 2.10 implicit class mechanism to convert a java.sql.ResultSet to a scala.collection.immutable.Stream. In Scala 2.9 I use the following code, which works:
/**
 * Implicitly convert a ResultSet to a Stream[ResultSet]. The Stream can then be
 * traversed using the usual methods map, filter, etc.
 *
 * @param resultSet the ResultSet to convert
 * @return a Stream wrapped around the ResultSet
 */
implicit def resultSet2Stream(resultSet: ResultSet): Stream[ResultSet] = {
  if (resultSet.next) Stream.cons(resultSet, resultSet2Stream(resultSet))
  else {
    resultSet.close()
    Stream.empty
  }
}
I can then use it like this:
val resultSet = statement.executeQuery("SELECT * FROM foo")
resultSet.map {
row => /* ... */
}
The implicit class that I came up with looks like this:
/**
 * Implicitly convert a ResultSet to a Stream[ResultSet]. The Stream can then be
 * traversed using the usual map, filter, etc.
 */
implicit class ResultSetStream(val row: ResultSet) extends AnyVal {
  def toStream: Stream[ResultSet] = {
    if (row.next) Stream.cons(row, row.toStream)
    else {
      row.close()
      Stream.empty
    }
  }
}
However, now I must call toStream on the ResultSet, which sort of defeats the "implicit" part:
val resultSet = statement.executeQuery("SELECT * FROM foo")
resultSet.toStream.map {
row => /* ... */
}
What am I doing wrong?
Should I still be using the implicit def and import scala.language.implicitConversions to avoid the "features" warning?
UPDATE
Here is an alternative solution that converts the ResultSet into a scala.collection.Iterator (only Scala 2.10+):
/*
 * Treat a java.sql.ResultSet as an Iterator, allowing operations like filter,
 * map, etc.
 *
 * Sample usage:
 *   val resultSet = statement.executeQuery("...")
 *   resultSet.map {
 *     resultSet =>
 *       // ...
 *   }
 */
implicit class ResultSetIterator(resultSet: ResultSet) extends Iterator[ResultSet] {
  def hasNext: Boolean = resultSet.next()
  def next() = resultSet
}
I don't see a reason here to use implicit classes. Stick to your first version. Implicit classes are mainly useful (as in "concise") for adding methods to existing types (the so-called "enrich my library" pattern).
It is just syntactic sugar for a wrapper class and an implicit conversion to this class.
But here you are just converting (implicitly) from one preexisting type to another preexisting type. There is no need to define a new class at all (let alone an implicit class).
In your case, you could make it work with an implicit class by making ResultSetStream extend Stream and implementing it as a proxy to toStream, but that would really be a lot of trouble for nothing.
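Here is a runnable sketch of that advice, with a toy Cursor class standing in for java.sql.ResultSet so that no database is needed:

```scala
import scala.language.implicitConversions // silences the feature warning

// Toy stand-in for java.sql.ResultSet: a forward-only cursor over rows.
final class Cursor(private var rows: List[Int]) {
  def next(): Boolean = rows.nonEmpty
  def get(): Int = { val h = rows.head; rows = rows.tail; h }
}

object ConvDemo {
  // Plain implicit def, as in the original Scala 2.9 version: converts
  // the cursor to a Stream so that map/filter can be called on it directly.
  implicit def cursor2Stream(c: Cursor): Stream[Int] =
    if (c.next()) Stream.cons(c.get(), cursor2Stream(c)) else Stream.empty

  // The conversion fires because Cursor has no map method of its own.
  def doubled(c: Cursor): List[Int] = c.map(_ * 2).toList
}
```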

Is it possible to capture the type parameter of a trait using Manifests in Scala 2.7.7?

I'm writing a ServletUnitTest trait in Scala to provide a convenience API for ServletUnit. I have something like the following in mind:
/**
 * Utility trait for HttpUnit/ServletUnit tests
 *
 * @tparam T Type parameter for the class under test
 */
trait ServletUnitTest[T <: HttpServlet] {
  /**
   * Resource name of the servlet, used to construct the servlet URL.
   */
  val servletName: String

  /**
   * Servlet class under test
   */
  implicit val servletClass: Manifest[T]

  /**
   * ServletUnit {@link ServletRunner}
   */
  sealed lazy val servletRunner: ServletRunner = {
    val sr = new ServletRunner();
    sr.registerServlet(servletName, servletClass.erasure.getName);
    sr
  }

  /**
   * A {@link com.meterware.servletunit.ServletUnitClient}
   */
  sealed lazy val servletClient = servletRunner.newClient

  /**
   * The servlet URL, useful for constructing WebRequests
   */
  sealed lazy val servletUrl = "http://localhost/" + servletName

  def servlet(ic: InvocationContext) = ic.getServlet.asInstanceOf[T]
}

class MyServletTest extends ServletUnitTest[MyServlet] {
  val servletName = "download"
  // ... test code ...
}
This code doesn't compile as written, but hopefully my intent is clear. Is there a way to do this (with or without Manifests)?
While researching this topic, I found a solution in this scala-list post by Jorge Ortiz, which did the trick for me and is simpler than Aaron's.
In essence, his solution is (paraphrasing):
trait A[T] {
  implicit val t: Manifest[T]
}

class B[T: Manifest] extends A[T] {
  override val t = manifest[T]
}
(I'm ignoring the OP request to be 2.7.7 compatible as I'm writing this in 2011...)
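A variant of the same trick (names here are illustrative, not from the post) is to let an implicit constructor parameter implement the abstract val directly, which sidesteps any question of the evidence and the member competing during implicit search:

```scala
trait A[T] {
  implicit val t: Manifest[T]
  def runtimeClassName: String = t.runtimeClass.getName
}

// The implicit val parameter both receives the Manifest at construction
// time and implements the abstract member t.
class B[T](implicit val t: Manifest[T]) extends A[T]

object ManifestDemo {
  val name = new B[String].runtimeClassName
}
```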
For now, Scala represents traits as interfaces so this technique will work. There are some problems with this approach to implementing traits, however, in that when methods are added to a trait, the implementing class will not necessarily recompile because the interface representation only has a forwarding method pointing to another class that actually implements the method concretely. In response to this there was talk earlier this year of using interface injection into the JVM at runtime to get around this problem. If the powers that be use this approach then the trait's type information will be lost before you can capture it.
The type information is accessible with the Java reflection API. It's not pretty but it works:
import java.lang.reflect.ParameterizedType

trait A[T] {
  def typeParameter = {
    val genericType = getClass.getGenericInterfaces()(0).asInstanceOf[ParameterizedType]
    genericType.getActualTypeArguments()(0)
  }
}

class B extends A[Int]

new B().typeParameter // java.lang.Integer
Some invariant checks should be added; I've only implemented the happy path.
I found a solution that works, but it's pretty awkward since it requires the test class to call a method (clazz) on the trait before any of the trait's lazy vals are evaluated.
/**
 * Utility trait for HttpUnit/ServletUnit tests
 *
 * @tparam T Type parameter for the class under test
 */
trait ServletUnitTest[T <: HttpServlet] {
  /**
   * Resource name of the servlet, used to construct the servlet URL.
   */
  val servletName: String

  /**
   * Servlet class under test
   */
  val servletClass: Class[_] // = clazz

  protected def clazz(implicit m: Manifest[T]) = m.erasure

  /**
   * ServletUnit {@link ServletRunner}
   */
  sealed lazy val servletRunner: ServletRunner = {
    val sr = new ServletRunner();
    sr.registerServlet(servletName, servletClass.getName);
    sr
  }

  /**
   * A {@link com.meterware.servletunit.ServletUnitClient}
   */
  sealed lazy val servletClient = servletRunner.newClient

  /**
   * The servlet URL, useful for constructing WebRequests
   */
  sealed lazy val servletUrl = "http://localhost/" + servletName

  def servlet(ic: InvocationContext) = ic.getServlet.asInstanceOf[T]
}

class MyServletTest extends ServletUnitTest[MyServlet] {
  val servletName = "download"
  val servletClass = clazz
  // ... test code ...
}