Databricks SCALA UDF cannot load class when registering function - scala

I have followed this guide and this question trying to implement a decryption function to use in a SQL view.
I have compiled the Scala code from the example into a jar file and uploaded it to the Databricks File System (DBFS):
import com.macasaet.fernet.{Key, StringValidator, Token}
import org.apache.hadoop.hive.ql.exec.UDF
import java.time.{Duration, Instant}

class Validator extends StringValidator {
  // Use an effectively infinite time-to-live so tokens never expire.
  override def getTimeToLive(): java.time.temporal.TemporalAmount = {
    Duration.ofSeconds(Instant.MAX.getEpochSecond())
  }
}

class udfDecrypt extends UDF {
  def evaluate(inputVal: String, sparkKey: String): String = {
    if (inputVal != null && inputVal != "") {
      val keys: Key = new Key(sparkKey)
      val token = Token.fromString(inputVal)
      val validator = new Validator() {}
      val payload = token.validateAndDecrypt(keys, validator)
      payload
    } else inputVal
  }
}
I can declare the function as demonstrated:
%sql
CREATE OR REPLACE FUNCTION default.udfDecrypt AS 'com.nm.udf.udfDecrypt'
USING jar 'dbfs:/FileStore/jars/decryptUDF.jar';
But if I try to call it, an error is thrown:
%sql
SELECT default.udfDecrypt(field, '{key}') FROM default.encrypted_test;
Error in SQL statement: AnalysisException: Can not load class 'com.nm.udf.udfDecrypt' when registering the function 'default.udfDecrypt', please make sure it is on the classpath; line 1 pos 7
I have noticed that the function can be declared using any jar file path (even one that doesn't exist) and it will still return 'OK'.
I am using Databricks on Azure.

It seems like your UDF code is missing:
package com.nm.udf;
at the top.
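For illustration, a minimal sketch of how the top of the file would then look (same class names as in the question):
package com.nm.udf

import com.macasaet.fernet.{Key, StringValidator, Token}
import org.apache.hadoop.hive.ql.exec.UDF
import java.time.{Duration, Instant}

class udfDecrypt extends UDF {
  // ... evaluate(inputVal, sparkKey) as above ...
}
With the package declaration in place, the fully qualified name com.nm.udf.udfDecrypt used in CREATE FUNCTION matches the compiled class.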

Update as of October 2022, because the accepted solution did not work for me.
First off, the given Scala code did not compile for me as-is; make sure these imports are at the top of the file:
import java.time.Duration
import java.time.Instant
Secondly, after packaging the .scala file into a jar (using sbt package, for example), when you create the function:
CREATE OR REPLACE FUNCTION udfDecryptor AS 'udfDecrypt'
USING jar 'dbfs:/FileStore/jars/decryptUDF.jar';
Then the class name given in the AS clause must match the name of the class in the jar — with no package prefix here, since this version of the code declares no package.
Make sure the build.sbt file has the correct Scala version and dependencies, for example:
ThisBuild / version := "0.2.0-properscala"
ThisBuild / scalaVersion := "2.12.14"

lazy val root = (project in file("."))
  .settings(
    name := "the_name_of_project_here"
  )

// https://mvnrepository.com/artifact/com.macasaet.fernet/fernet-java8
libraryDependencies += "com.macasaet.fernet" % "fernet-java8" % "1.5.0"
// https://mvnrepository.com/artifact/org.apache.hive/hive-exec
libraryDependencies += "org.apache.hive" % "hive-exec" % "3.1.2"
This is the only way I managed to get it running successfully. Hope it helps.
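For reference, sbt package writes the jar under target/scala-2.12/; going by sbt's default artifact naming (name_scalaBinaryVersion-version.jar), the file produced by the build above would be something like:
target/scala-2.12/the_name_of_project_here_2.12-0.2.0-properscala.jar
That is the jar to upload to DBFS and reference in the USING jar clause.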

There could be some hidden parts in the guidelines.
Here is a complete repo that works through the details:
https://github.com/dungruoc/databricks-fernet-udf

Related

Is it possible to parse Json in build.sbt?

I want to validate a downloaded JSON file from a server during build time and fail the build if there are any errors.
Is it possible to parse/validate JSON in build.sbt?
Your build.sbt is Scala code, so it can do everything you can do in other Scala code.
You should be able to add dependencies (e.g. a json parsing library) of your build.sbt code in project/build.sbt since sbt is recursive.
Here is an example to supplement Jasper-M's answer.
For example, add lihaoyi's requests-scala HTTP client library and upickle JSON deserialisation library to project/build.sbt:
libraryDependencies ++= List(
  "com.lihaoyi" %% "requests" % "0.6.0",
  "com.lihaoyi" %% "upickle" % "1.1.0"
)
Then, under project/Preconditions.scala, add the following object, which will contain the assertions you want to check before running the build:
object Preconditions {
  import scala.util.Try
  import requests._
  import upickle.default._

  case class User(login: String, id: Int)
  implicit val userRW: ReadWriter[User] = macroRW

  def validateUserJson() = {
    // Fetch the JSON and check that it deserialises into a User.
    val result = Try(read[User](get("https://api.github.com/users/lihaoyi").text)).isSuccess
    assert(result, "User JSON should be valid")
  }
}
Now these facilities will be available to build.sbt under the root project. Let's create a task in build.sbt to run the assertions:
lazy val checkPreconditions = taskKey[Unit]("Validate pre-conditions before building")

checkPreconditions := {
  Preconditions.validateUserJson()
  println("All preconditions passed!")
}
and finally, let's make the compile task depend on the checkPreconditions task using dependsOn, like so:
Compile / compile := (Compile / compile).dependsOn(checkPreconditions).value
Now executing sbt compile should check pre-conditions before proceeding with compilation.

Simplest way to generate Verilog code from Chisel code

What is the simplest way to generate Verilog code from existing Chisel code?
Would I have to create my own build file?
For example, from a standalone Scala file (AND.scala) like the following one:
import Chisel._

class AND extends Module {
  val io = IO(new Bundle {
    val a = Bool(INPUT)
    val b = Bool(INPUT)
    val out = Bool(OUTPUT)
  })
  io.out := io.a & io.b
}
I have the complete Chisel3 toolchain installed under Ubuntu 16.04.
See answer here: Is there a simple example of how to generate verilog from Chisel3 module?
In short, create a build.sbt file at the root of your project with the following in it:
scalaVersion := "2.12.13"

resolvers ++= Seq(
  Resolver.sonatypeRepo("snapshots"),
  Resolver.sonatypeRepo("releases")
)

libraryDependencies += "edu.berkeley.cs" %% "chisel3" % "3.4.4"
Add this code to AND.scala:
object ANDDriver extends App {
  (new chisel3.stage.ChiselStage).emitVerilog(new AND, args)
}
Type sbt run on the command line at the root of your project.
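If you want to inspect the output directly, a small variant (my sketch, not part of the original answer) captures the result as a String — in chisel3 3.4.x, emitVerilog returns the generated Verilog:
object ANDPrint extends App {
  // emitVerilog elaborates the module and returns the Verilog source.
  val verilog = (new chisel3.stage.ChiselStage).emitVerilog(new AND)
  println(verilog)
}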

How to do Slick configuration via application.conf from within custom sbt task?

I want to create an sbt task which creates a database schema with Slick. For that, I have an object like the following in my project:
object CreateSchema {
  val instance = Database.forConfig("localDb")

  def main(args: Array[String]) {
    val createFuture = instance.run(createActions)
    ...
    Await.ready(createFuture, Duration.Inf)
  }
}
and in my build.sbt I define a task:
lazy val createSchema = taskKey[Unit]("CREATE database schema")
fullRunTask(createSchema, Runtime, "sbt.CreateSchema")
which gets executed as expected when I run sbt createSchema from the command line.
However, the problem is that application.conf doesn't seem to get taken into account (I've also tried different scopes like Compile or Test). As a result, the task fails due to com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'localDb'.
How can I fix this so the configuration is available?
I found a lot of questions here that deal with using the application.conf inside the build.sbt itself, but that is not what I need.
I have set up a little demo using SBT 0.13.8 and Slick 3.0.0, which is working as expected (even without modifying -Dconfig.resource).
Files
./build.sbt
name := "SO_20150915"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies ++= Seq(
"com.typesafe" % "config" % "1.3.0" withSources() withJavadoc(),
"com.typesafe.slick" %% "slick" % "3.0.0",
"org.slf4j" % "slf4j-nop" % "1.6.4",
"com.h2database" % "h2" % "1.3.175"
)
lazy val createSchema = taskKey[Unit]("CREATE database schema")
fullRunTask(createSchema, Runtime, "somefun.CallMe")
./project/build.properties
sbt.version = 0.13.8
./src/main/resources/reference.conf
hello {
  world = "buuh."
}

h2mem1 = {
  url = "jdbc:h2:mem:test1"
  driver = org.h2.Driver
  connectionPool = disabled
  keepAliveConnection = true
}
./src/main/scala/somefun/CallMe.scala
package somefun

import com.typesafe.config.Config
import com.typesafe.config.ConfigFactory
import slick.driver.H2Driver.api._

/**
 * SO_20150915
 * Created by martin on 15.09.15.
 */
object CallMe {
  def main(args: Array[String]): Unit = {
    println("Hello")
    val settings = new Settings()
    println(s"Settings read from hello.world: ${settings.hw}")

    val db = Database.forConfig("h2mem1")
    try {
      // ...
      println("Do something with your database.")
    } finally db.close
  }
}

class Settings(val config: Config) {
  // This verifies that the Config is sane and has our
  // reference config. Importantly, we specify the "hello"
  // path so we only validate settings that belong to this
  // library. Otherwise, we might throw mistaken errors about
  // settings we know nothing about.
  config.checkValid(ConfigFactory.defaultReference(), "hello")

  // This uses the standard default Config, if none is provided,
  // which simplifies apps willing to use the defaults
  def this() {
    this(ConfigFactory.load())
  }

  val hw = config.getString("hello.world")
}
Result
Running sbt createSchema from the console, I obtain the output:
[info] Loading project definition from /home/.../SO_20150915/project
[info] Set current project to SO_20150915 (in build file:/home/.../SO_20150915/)
[info] Running somefun.CallMe
Hello
Settings read from hello.world: buuh.
Do something with your database.
[success] Total time: 1 s, completed 15.09.2015 10:42:20
Ideas
Please verify that this unmodified demo project also works for you.
Then try changing the SBT version in the demo project and see if that changes anything.
Alternatively, recheck your project setup and try using a higher version of SBT.
Answer
So, even though your code resides in your src folder, it is called from within SBT. That means you are trying to load your application.conf from within the classpath context of SBT.
Slick uses Typesafe Config internally. (So the approach described under Background information below is not applicable, as you cannot modify the Config loading mechanism itself.)
Instead, try to set the path to your application.conf explicitly via config.resource; see the Typesafe Config documentation (search for config.resource; note that config.resource expects a resource name on the classpath, while config.file takes a filesystem path).
Option 1
Either set config.resource (via -Dconfig.resource=...) before starting sbt
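For example (the resource name here is an assumption — any file that ends up on the runtime classpath works):
sbt -Dconfig.resource=application.conf createSchema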
Option 2
Or from within build.sbt as Scala code
sys.props("config.resource") = "./src/main/resources/application.conf"
Option 3
Or create a Task in SBT via
lazy val configPath = TaskKey[Unit]("configPath", "Set path for application.conf")
and add
configPath := {
  sys.props("config.resource") = "./src/main/resources/application.conf"
}
to your sequence of settings.
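For the property to be set in time, the configPath task has to run before createSchema; one way to wire that up (my sketch, reusing the key names from this answer) is:
createSchema := createSchema.dependsOn(configPath).value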
Please let me know if that worked.
Background information
Recently, I was writing a custom plugin for SBT, where I also tried to access a reference.conf. Unfortunately, I was not able to access any .conf file placed within the project subfolder using the default ClassLoader.
In the end I created a testenvironment.conf in the project folder and used the following code to load the (Typesafe) config:
def getConfig: Config = {
  val classLoader = new java.net.URLClassLoader(Array(new File("./project/").toURI.toURL))
  ConfigFactory.load(classLoader, "testenvironment")
}
or, for loading a general application.conf from ./src/main/resources:
def getConfig: Config = {
  // No .conf basename given, so look for reference.conf and application.conf
  // using the specific ClassLoader.
  val classLoader = new java.net.URLClassLoader(Array(new File("./src/main/resources/").toURI.toURL))
  ConfigFactory.load(classLoader)
}

Why is my Scala class not visible to its matching test class?

I am starting out in Scala with SBT, making a Hello World program.
Here's my project layout:
I've made sure to download the very latest JDK and Scala, and configure my Project Settings. Here's my build.sbt:
name := "Coursera_Scala"
version := "1.0"
scalaVersion := "2.11.6"
libraryDependencies += "org.scalatest" % "scalatest_2.11" % "2.2.4" % "test"
Hello.scala itself compiles okay:
package demo

class Hello {
  def sayHelloTo(name: String) = s"Hello, $name!"
}
However, my accompanying HelloTest.scala does not. Here's the test:
package demo
import org.scalatest.FunSuite
class HelloTest extends FunSuite {
  test("testSayHello") {
    val result = new Hello.sayHelloTo("Scala")
    assert(result == "Hello, Scala!")
  }
}
Here's the error:
Error:(8, 22) not found: value Hello
val result = new Hello.sayHello("Scala")
^
In addition to the compile error, IntelliJ shows "Cannot resolve symbol" errors for the symbols Hello, assert, and ==. This leads me to believe that the build is set up incorrectly, but then wouldn't there be an error on the import?
The problem is this expression:
new Hello.sayHelloTo("Scala")
This creates a new instance of the class sayHelloTo defined on the value Hello. However, there is no value Hello, just the class Hello.
You want to write this:
(new Hello).sayHelloTo("Scala")
This creates a new instance of the class Hello and calls sayHelloTo on the instance.
Or you can write new Hello().sayHelloTo(...): the explicit () likewise creates a new instance before the method is called.
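Another way to sidestep the issue entirely (a hypothetical refactor, not part of the original answer) is to put the method on a companion object, so that a value named Hello actually exists and no instance is needed:
package demo

object Hello {
  // Callable as Hello.sayHelloTo("Scala"), with no new required.
  def sayHelloTo(name: String): String = s"Hello, $name!"
}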

scala.tools.nsc.IMain within Play 2.1

I googled a lot and am totally stuck now. I know that there are similar questions, but please read to the end. I have tried all the proposed solutions and none of them worked.
I am trying to use the IMain class from scala.tools.nsc within a Play 2.1 project (Using Scala 2.10.0).
Controller Code
This is the code where I try to use IMain in a WebSocket. This is only for testing.
object Scala extends Controller {

  def session = WebSocket.using[String] { request =>

    val interpreter = new IMain()
    val (out, channel) = Concurrent.broadcast[String]

    val in = Iteratee.foreach[String] { code =>
      interpreter.interpret(code) match {
        case Results.Error => channel.push("error")
        case Results.Incomplete => channel.push("incomplete")
        case Results.Success => channel.push("success")
      }
    }

    (in, out)
  }
}
As soon as something gets sent over the WebSocket, the following error gets logged by Play:
Failed to initialize compiler: object scala.runtime in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programatically, settings.usejavacp.value = true.
Build.scala
object ApplicationBuild extends Build {

  val appName = "escalator"
  val appVersion = "1.0-SNAPSHOT"

  val appDependencies = Seq(
    "org.scala-lang" % "scala-compiler" % "2.10.0"
  )

  val main = play.Project(appName, appVersion, appDependencies).settings(
  )
}
What I have tried so far
None of this worked:
I have included fork := true in the Build.scala
A Settings object with:
embeddedDefaults[MyType]
usejavacp.value = true
The solution proposed as an answer to the question Embedded Scala REPL inherits parent classpath
I don't know what to do now.
The problem here is that sbt doesn't add the scala-library to the classpath.
The following workaround works.
First create a folder lib in the top project directory (the parent of app, conf, etc.) and copy the scala-library.jar there.
Then you can use the following code to host an interpreter:
val settings = new Settings
settings.bootclasspath.value += scala.tools.util.PathResolver.Environment.javaBootClassPath + File.pathSeparator + "lib/scala-library.jar"

val in = new IMain(settings) {
  override protected def parentClassLoader = settings.getClass.getClassLoader()
}

val res = in.interpret("val x = 1")
The above builds the boot classpath by appending the Scala library jar to the Java boot classpath. It's not a problem with the Play framework; it comes from sbt. The same problem occurs for any Scala project that runs with sbt (tested with a simple project). When it runs from Eclipse it works fine.
EDIT: Link to sample project demonstrating the above.
I wonder if the reflect jar is missing. Try adding this too to appDependencies:
"org.scala-lang" % "scala-reflect" % "2.10.0"