Building Scala libraries and using them in Databricks - scala

I have fair knowledge of Scala and I use it in Databricks for my data engineering needs. I want to create some custom libraries that I can use in all other notebooks. Here is what I'm looking for:
Create a Scala notebook helperfunctions.scala which will have functions like ParseUrl(), GetUrl(), etc.
Deploy these libraries on a Databricks cluster
Call these libraries from another notebook using 'import from helperfunctions as fn' and use the functions
Can you give me an idea about how to get started? What does databricks offer?

I'd suggest not using notebooks as imports.
You can compile and package your functions as a JAR from plain JVM code using your preferred tooling, then upload it to something like JitPack or GitHub Packages, from which you can then import your utilities as a Maven reference, just like other Spark dependencies.
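As a rough sketch of that workflow (all names, versions, and packages below are made up for illustration), the helpers would live in a plain sbt project rather than a notebook:

    // build.sbt -- a minimal library project
    ThisBuild / scalaVersion := "2.12.15"  // match the Scala version of your Databricks runtime

    lazy val helpers = (project in file("."))
      .settings(
        name         := "helper-functions",
        organization := "com.example"
      )

    // src/main/scala/com/example/helpers/HelperFunctions.scala
    package com.example.helpers

    object HelperFunctions {
      // Illustrative helper; your real URL-parsing logic would go here.
      def parseUrl(url: String): Option[java.net.URI] =
        scala.util.Try(java.net.URI.create(url)).toOption
    }

After sbt package (or publishing, for example via JitPack), attach the artifact to the cluster as a JAR or as a Maven coordinate through the cluster's Libraries tab, and a notebook can then simply import com.example.helpers.HelperFunctions.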

Related

How to build an SDK in Scala which can be reused in other projects

I am new to Scala and programming, and I have written a class in Spark Scala that is generic and should be reused by other projects. How can I build an SDK in Scala with sbt so that I can reuse this class without rewriting the code? Any online documentation would help.
What you're calling an "SDK" sounds more like what we call a "library".
The high level steps are:
create your library project (regular sbt project)
configure this project for publishing
publish the project to a repository (Artifactory, Nexus, ...), or optionally publish locally if you don't need a repository
in any other project, add your library project as a dependency
I would recommend reading the sbt documentation, which should cover most of it. For instance, Publishing with sbt.
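As a minimal sketch of the "configure for publishing" and "publish" steps (the coordinates, repository URL, and credentials path are all placeholders), the relevant build.sbt settings might look like this:

    // build.sbt -- publishing settings (all values are placeholders)
    name         := "my-sdk"
    organization := "com.example"
    version      := "0.1.0"

    // Target repository (Artifactory, Nexus, ...); skip this and use publishLocal
    // if you only need the artifact on your own machine.
    publishTo   := Some("my-repo" at "https://repo.example.com/releases")
    credentials += Credentials(Path.userHome / ".sbt" / ".credentials")

Running sbt publish (or sbt publishLocal) makes the artifact available, and the consuming project then picks it up with libraryDependencies += "com.example" %% "my-sdk" % "0.1.0".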

Distributing a jar for use in pyspark

I've built a jar that I can use from pyspark by adding it to ${SPARK_HOME}/jars and calling it using
spark._sc._jvm.com.mypackage.myclass.mymethod()
However, what I'd like to do is bundle that jar into a Python wheel so someone can pip install the jar into their running pyspark/Jupyter session. I'm not very familiar with Python packaging; is it possible to distribute jars inside a wheel and have that jar be automatically available to pyspark?
I want to put a jar inside a wheel or egg (not even sure if I can do that?) and, upon installation of said wheel/egg, put that jar in a place where it will be available to the JVM.
I guess what I'm really asking is, how do I make it easy for someone to install a 3rd party jar and use it from pyspark?
As you mentioned above, I assume you have already used the --jars option and are able to use the functions from pyspark. If I understand your requirement correctly, you want to add this jar to the installed package so that the jar library is available on each node of the cluster.
There is a resource in the Databricks documentation that talks about adding third-party jar files and installing Python wheels for pyspark. See if that is the information you are looking for:
https://docs.databricks.com/libraries.html#upload-a-jar-python-egg-or-python-wheel

Importing Scala packages (no jar) in Zeppelin

This is about running Spark/Scala from a Zeppelin notebook.
In order to better modularize and reorganize code, I need to import existing Scala classes, packages or functions into the notebook, preferably without creating a jar file (much the same as in PySpark).
Something like:
import myclass
where 'myclass' is implemented in a .scala file. Probably this source code needs to reside in a specific location for Zeppelin.
Currently there's no such feature in Zeppelin.
The only way of doing what you propose is to add a jar to Spark's classpath. At least, that's how I'm using it.
I wouldn't recommend the practice of importing Scala classes straight from a .scala file somewhere on disk. That code should be packaged and made available to all cluster workers and the master.
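Once the code is packaged as a jar and put on the Spark interpreter's classpath (for example via the spark.jars interpreter property), using it from a Zeppelin paragraph is just a regular import; the package, class, and method names below are hypothetical:

    // In a Zeppelin %spark paragraph, after the jar is on the classpath
    import com.example.myproject.TextUtils

    val df = spark.range(10).toDF("id")
    df.show()

    // The imported helpers can then be called like any other library code:
    println(TextUtils.normalize("Some Raw Input"))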

Download scowl in Linux using the command line

I have to manipulate my OWL file with the Scala language. Therefore, I found that scowl (https://github.com/phenoscape/scowl) facilitates this task. I'm working with Linux on a virtual machine.
My questions are:
How do I download scowl in Linux with the command line?
How do I upload my OWL file from GitHub in Linux?
Thanks in advance
I am the author of Scowl. It is meant to be used as a library in your Scala application. So you would add it as a dependency in your build file, whether you are using SBT or Maven or something else. It's available from Maven Central, and instructions are in the Scowl readme. I'm not sure what you mean by uploading your OWL file. Scowl provides a domain-specific language in Scala to allow you to manipulate OWL axioms in a readable way.
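For reference, wiring a library like Scowl into an sbt build looks roughly like the following; the exact group ID and version should be taken from the Scowl readme rather than from this sketch:

    // build.sbt -- substitute the version listed in the Scowl readme for "x.y.z"
    libraryDependencies += "org.phenoscape" %% "scowl" % "x.y.z"

    // In your Scala code, the DSL is then brought in with an import such as:
    //   import org.phenoscape.scowl._

As for getting the OWL file itself onto the Linux machine, that is just a matter of downloading it (for example with git clone or wget) and then loading it from the local path in your Scala program.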

IntelliJ Scala console, interactive development

I have just started "coding" in Scala. Coming from F#, I am trying to find a way to have a similar environment.
Currently I am using IntelliJ IDEA 10.0.2 with the Scala plugin. On any given project I am trying to set up the following:
When opened, the Scala console loads the external libraries of the project (this works)
A command/way to load the files where code is under development. For instance, you define a few modules in your project that you would like to test, so you would do something like load "module1.scala", load "module2.scala", etc. in the console.
Is this possible?
Note:
com.intellij.rt.execution.application.AppMain.org.jetbrains.plugins.scala.compiler.rt.ConsoleRunner seems to load all the external libraries
there seems to be a :load function, but when I supply it with arguments it returns "file does not exist" (this actually was just a silly mistake: I was using C: instead of c:)
Many thanks
Google for the sbt plugin for IDEA. sbt (Simple Build Tool) is an interactive build tool for Scala. It has a "console" command that loads the dependencies and your classes when it starts up. sbt offers other goodies as well. The IDEA plugin lets you use the two together.
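As an illustration of that workflow (the file names are hypothetical): start the REPL from the project root with sbt console, which puts your compiled classes and dependencies on the classpath, and :load can then pull in individual source files you are still experimenting with:

    // Launched with `sbt console`; :load evaluates a source file into the session
    scala> :load src/main/scala/module1.scala
    scala> :load src/main/scala/module2.scala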
It's not possible as an IntelliJ IDEA feature.
You can add an appropriate issue to the Scala plugin issue tracker: http://youtrack.jetbrains.net/issues/SCL.