How to externalize protobuf files in the JVM ecosystem?

I stumbled upon this Akka gRPC tutorial, which suggests that we can create a jar from a project that has a .proto file under src/main/proto and add it as a dependency in the client and server projects to build their respective stubs.
libraryDependencies += "com.example" %% "my-grpc-service" % "1.0.0" % "protobuf-src"
But this doesn't seem to work. Are there any example projects that demonstrate how this works in practice? How can we externalize protobuf sources and reuse them in a JVM-based project?

I was able to figure out how to externalise protobuf files properly as per suggestion from akka-grpc docs.
The problem was that I was not adding sbt-akka-grpc plugin required by sbt to recognise .proto files and include them in the packaged jar. Without this plugin there won't be any .proto file made available in the packaged jar.
addSbtPlugin("com.lightbend.akka.grpc" % "sbt-akka-grpc" % "1.1.0")
Make sure to set the organization in your build.sbt so that the jar is published under the right coordinates.
organization := "com.iamsmkr"
Also, if you wish to cross-compile this jar for multiple Scala versions, add the following entries to your build.sbt:
scalaVersion := "2.13.3"
crossScalaVersions := Seq(scalaVersion.value, "2.12.14")
and then to publish:
$ sbt +publishLocal
With appropriate jars published you can now add them as dependencies in your client and server projects like so:
libraryDependencies +=
"com.iamsmkr" %% "prime-protobuf" % protobufSourceVersion % "protobuf-src"
You can check out this project I am working on to see this in action.
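For the consuming side, here is a minimal sketch of what the client or server build might look like. It assumes the consumer enables AkkaGrpcPlugin as the Akka gRPC documentation describes; the value of protobufSourceVersion is a hypothetical placeholder, so use whatever version you actually published.

```scala
// project/plugins.sbt
addSbtPlugin("com.lightbend.akka.grpc" % "sbt-akka-grpc" % "1.1.0")

// build.sbt
// Assumption: the plugin is enabled here so stubs are generated from the imported .proto files
enablePlugins(AkkaGrpcPlugin)

// Hypothetical placeholder: must match the version published with `sbt +publishLocal`
val protobufSourceVersion = "1.0.0"

// The "protobuf-src" configuration marks the jar as a source of .proto files
// to generate code from, rather than a regular compile dependency
libraryDependencies +=
  "com.iamsmkr" %% "prime-protobuf" % protobufSourceVersion % "protobuf-src"
```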
Alternate Way
An alternative I found is to keep your .proto files in a root directory and then refer to them in the client and server build.sbt like so:
PB.protoSources.in(Compile) := Seq(sourceDirectory.value / ".." / ".." / "proto")
Check out this project to see it in action.
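If the client and server are subprojects of a single sbt build, the same idea can be sketched with the sbt 1.x slash syntax. The directory names client, server and proto below are assumptions about the layout, not something taken from the linked project.

```scala
// build.sbt at the root of a multi-project build
// Assumed layout: ./proto/*.proto, ./client/, ./server/

// (ThisBuild / baseDirectory) points at the root of the build,
// so both subprojects read the same shared directory
lazy val sharedProtoSettings = Seq(
  Compile / PB.protoSources := Seq((ThisBuild / baseDirectory).value / "proto")
)

lazy val client = (project in file("client"))
  .enablePlugins(AkkaGrpcPlugin)
  .settings(sharedProtoSettings)

lazy val server = (project in file("server"))
  .enablePlugins(AkkaGrpcPlugin)
  .settings(sharedProtoSettings)
```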

Related

sbt assembly, including my jar

I want to build a 'fat' jar of my code. I mostly understand how to do this, but all the examples I have assume that the dependency jar is not local, and I am not sure how to include in my assembled jar another JAR that I built and that the Scala code uses. In which folder does the JAR I want to include have to reside?
Normally when I run my current code as a test using spark-shell it looks like this:
spark-shell --jars magellan_2.11-1.0.6-SNAPSHOT.jar -i st_magellan_abby2.scala
(the jar file is right in the same path as the .scala file)
So now I want to build a build.sbt file that does the same and includes that SNAPSHOT.jar file?
name := "PSGApp"
version := "1.0"
scalaVersion := "2.11.8"
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
// "provided" means don't include it because it is already on the cluster?
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.2.0" % "provided",
"org.apache.spark" %% "spark-sql" % "2.2.0" % "provided",
"org.apache.spark" %% "spark-streaming" % "2.2.0" % "provided",
//add magellan here somehow?
)
So where would I put the jar in the SBT project folder structure so that it gets picked up when I run sbt assembly? Is that the main/resources folder, which the reference manual says is where 'files to include in the main jar' go?
What would I put in libraryDependencies so that sbt adds that specific jar and does not go out to the internet to get it?
One last thing: I was also doing some imports in my test code that don't seem to work now that I have put this code in an object with a def main attached to it.
I had things like:
import sqlContext.implicits._ which was right in the code above where it was about to be used like so:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
import org.apache.spark.sql.functions.udf
val distance =udf {(a: Point, b: Point) =>
a.withinCircle(b, .001f); //current radius set to .0001
}
I am not sure whether I can just keep these imports inside the def main, or whether I have to move them elsewhere somehow (still learning Scala and wrangling the scoping, I guess).
One way is to build your fat jar locally using the assembly plugin (https://github.com/sbt/sbt-assembly) and run publishLocal to store the resulting jar in your local ivy2 cache.
This will make it available for inclusion in your other project based on build.sbt settings in this project, eg:
name := "My Project"
organization := "org.me"
version := "0.1-SNAPSHOT"
Will be locally available as "org.me" %% "my-project" % "0.1-SNAPSHOT"
SBT will search local cache before trying to download from external repo.
However, this is considered bad practice, because only the final project should ever be a fat jar. You should never include one as a dependency (many headaches).
There is no reason to make project magellan a fat jar if the library is included in PSGApp. Just publishLocal without assembly.
Another way is to make the projects depend on each other as code, not as a library.
lazy val projMagellan = RootProject("../magellan")
lazy val projPSGApp = project.in(file(".")).dependsOn(projMagellan)
This makes compilation in projPSGApp trigger compilation in projMagellan.
It depends on your use case though.
Just don't get into a situation where you have to manage your .jar manually.
The other question:
import sqlContext.implicits._ should always be included in the scope where dataframe actions are required, so you shouldn't put that import near the other ones in the header
Update
Based on the discussion in the comments, my advice would be:
Get the magellan repo
git clone git@github.com:harsha2010/magellan.git
Create a branch to work on, eg.
git checkout -b new-stuff
Change the code you want
Then update the versioning number, eg.
version := "1.0.7-SNAPSHOT"
Publish locally
sbt publishLocal
You'll see something like (after a while):
[info] published ivy to /Users/tomlous/.ivy2/local/harsha2010/magellan_2.11/1.0.7-SNAPSHOT/ivys/ivy.xml
Go to your other project
Change build.sbt to include
"harsha2010" %% "magellan" % "1.0.7-SNAPSHOT" in your libraryDependencies
Now you have a good (temp) reference to your library.
Your PSGApp should be built as a fat jar assembly to pass to Spark
sbt clean assembly
This will pull in the custom-built jar.
If the change in the magellan project is useful for the rest of the world, you should push your changes and create a pull request, so that in the future you can just include the latest build of this library.
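Putting the steps above together, the PSGApp build might look roughly like the sketch below. The sbt-assembly version is an assumption (pick the one that matches your sbt version); everything else is taken from the question and from the publishLocal output shown above.

```scala
// project/plugins.sbt
// Assumption: 0.14.6 is only an example version of sbt-assembly
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")

// build.sbt
name := "PSGApp"
version := "1.0"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // "provided": needed to compile, but left out of the fat jar
  // because the Spark cluster already supplies these
  "org.apache.spark" %% "spark-core" % "2.2.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.2.0" % "provided",
  // No "provided" here, so sbt assembly bundles the locally published
  // magellan build into the fat jar
  "harsha2010" %% "magellan" % "1.0.7-SNAPSHOT"
)
```

With this in place there is no need to copy magellan_2.11-1.0.6-SNAPSHOT.jar anywhere by hand; sbt resolves the library from the local Ivy repository that publishLocal wrote to.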

How to activate the sbt DockerPlugin in scala?

I have two scala projects, one is already defined to build its docker container through the sbt docker plugin. The other one I want to dockerify as well.
The working one has in its build.sbt the following lines relevant to the docker config:
organization := "com.namespace"
name := "dockerized-app"
version := sys.env.getOrElse("PIPELINE_VERSION", "0.1.0_local")
scalaVersion := "2.12.4"
enablePlugins(JavaAppPackaging)
enablePlugins(DockerPlugin)
packageName in Docker := packageName.value
dockerRepository := Some("our-docker.io:5001")
dockerExposedPorts := Seq(8080)
I thought that I could copy paste the relevant lines to the new project, change the name, and make it work.
Yet when I add the line to the about to be dockerified scala project:
enablePlugins(DockerPlugin)
I get the error:
Cannot resolve symbol DockerPlugin
I've looked through the preexisting project's libraryDependencies, yet the plugin doesn't seem to be configured there. In the preconfigured project, IntelliJ somehow knows the plugin; I can trace DockerPlugin to com.typesafe.sbt.packager.docker. This made me assume that sbt ships with it by default.
Yet apparently I have to activate it somehow.
Digging deeper I also tried adding this to my plugins.sbt to no avail:
addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.3.2")
How to activate DockerPlugin using sbt in scala?
To make it work properly, you need to add the following line:
addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.3.2")
in your project/plugins.sbt file.
Then refresh your project and it should work.
For further information, please check the Sbt Native Packager documentation.
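As a sketch of how the two files fit together (the settings are the ones from the question; the point to check is that plugins.sbt sits inside the project/ directory, not next to build.sbt):

```scala
// <your-project>/project/plugins.sbt
// Makes JavaAppPackaging and DockerPlugin available to the build definition
addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.3.2")

// <your-project>/build.sbt
// DockerPlugin is only resolvable here once the plugin above has been loaded
enablePlugins(JavaAppPackaging, DockerPlugin)

dockerRepository := Some("our-docker.io:5001")
dockerExposedPorts := Seq(8080)
```

After a reload (and a re-import of the sbt project in IntelliJ), running sbt docker:publishLocal should build the image into your local Docker daemon.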

Including a Spark Package JAR file in a SBT generated fat JAR

The spark-daria project is uploaded to Spark Packages and I'm accessing spark-daria code in another SBT project with the sbt-spark-package plugin.
I can include spark-daria in the fat JAR file generated by sbt assembly with the following code in the build.sbt file.
spDependencies += "mrpowers/spark-daria:0.3.0"
val requiredJars = List("spark-daria-0.3.0.jar")
assemblyExcludedJars in assembly := {
val cp = (fullClasspath in assembly).value
cp filter { f =>
!requiredJars.contains(f.data.getName)
}
}
This code feels like a hack. Is there a better way to include spark-daria in the fat JAR file?
N.B. I want to build a semi-fat JAR file here. I want spark-daria to be included in the JAR file, but I don't want all of Spark in the JAR file!
The README for version 0.2.6 states the following:
In any case where you really can't specify Spark dependencies using sparkComponents (e.g. you have exclusion rules) and configure them as provided (e.g. standalone jar for a demo), you may use spIgnoreProvided := true to properly use the assembly plugin.
You should then use this flag on your build definition and set your Spark dependencies as provided as I do with spark-sql:2.2.0 in the following example:
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"
Please note that by setting this your IDE may no longer have the dependency references needed to compile and run your code locally, which means you would have to add the necessary JARs to the classpath by hand. I do this often in IntelliJ: I keep a Spark distribution on my machine and add its jars directory to the IntelliJ project definition (this question may help you with that, should you need it).
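Applied to the build from the question, that would look something like the sketch below. It relies on spIgnoreProvided behaving as the README quoted above describes, so treat it as a starting point rather than a verified build.

```scala
// build.sbt (assumes the sbt-spark-package and sbt-assembly plugins are already loaded)

// Per the sbt-spark-package README: lets sbt assembly work with
// Spark dependencies that are declared as "provided"
spIgnoreProvided := true

// Not marked as provided, so it ends up in the assembled JAR
spDependencies += "mrpowers/spark-daria:0.3.0"

// Marked as provided, so Spark itself stays out of the assembled JAR
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"
```

This replaces the assemblyExcludedJars filter from the question, since the provided scope already keeps Spark out of the JAR.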

How to declare a sbt plugin?

I have cloned sbteclipse. I am reading the scala-sbt tutorial, which says:
If your project is in directory hello, and you’re adding sbt-site plugin to the build definition, create hello/project/site.sbt
I am in the /home/mil directory, and sbteclipse is in the same directory. Inside sbteclipse there is a project directory with plugins.sbt:
libraryDependencies += "org.scala-sbt" % "scripted-plugin" % sbtVersion.value
addSbtPlugin("com.github.gseitz" % "sbt-release" % "1.0.3")
addSbtPlugin("org.scalariform" % "sbt-scalariform" % "1.6.0")
Should I edit this file or not? How do I declare plugins in order to enable Eclipse to recognize them?
As a general rule, SBT plugins have to be defined in the project subfolder of your project's root. SBT aggregates definitions from all files with the .sbt extension, so it doesn't really matter whether you call it plugins.sbt or site.sbt. It's also fine to have multiple .sbt files.
So in your particular case, either add that plugin to the existing project/plugins.sbt, or follow the tutorial and create a new file, project/site.sbt, and add the plugin there.
P.S. this rule of searching and aggregating definitions from all .sbt files applies not only to meta-build (project directory), but also for the build itself. It might make sense to split big build definitions into multiple .sbt files. E.g. we have build.sbt for main definitions (library dependencies, etc.) and docker.sbt for Docker definitions.
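As a rough sketch for using sbteclipse in your own project (here the hello project from the tutorial stands in for yours, and the plugin version is an assumption, so check the sbteclipse README for the current one):

```scala
// hello/project/plugins.sbt
// Any *.sbt file name inside hello/project/ works, e.g. eclipse.sbt
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.2.4")
```

Note that this goes into the build of the project you want to import into Eclipse, not into the cloned sbteclipse sources. After reloading sbt, running sbt eclipse in hello should generate the Eclipse project files.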

How do I resolve my own test artifacts in SBT?

One of my projects will provide a jar package that is supposed to be used for unit testing in several other projects. So far I have managed to make sbt produce an objects-commons_2.10-0.1-SNAPSHOT-test.jar and have it published in my repository.
However, I can't find a way to tell sbt to use that artifact with the testing scope in other projects.
Adding the following dependencies in my build.scala will not get the test artifact loaded.
"com.company" %% "objects-commons" % "0.1-SNAPSHOT",
"com.company" %% "objects-commons" % "0.1-SNAPSHOT-test" % "test",
What I need is to use the default .jar file as compile and runtime dependency and the -test.jar as dependency in my test scope. But somehow sbt never tries to resolve the test jar.
How to use test artifacts
To enable publishing the test artifact when the main artifact is published you need to add to your build.sbt of the library:
publishArtifact in (Test, packageBin) := true
Publish your artifact. There should be at least two JARs: objects-commons_2.10.jar and objects-commons_2.10-test.jar.
To use the library at runtime and the test library at test scope add the following lines to build.sbt of the main application:
libraryDependencies ++= Seq("com.company" % "objects-commons_2.10" % "0.1-SNAPSHOT"
, "com.company" % "objects-commons_2.10" % "0.1-SNAPSHOT" % "test" classifier "tests" //for SBT 12: classifier test (not tests with s)
)
The first entry loads the runtime libraries, and the second entry makes the "tests" artifact available only in the test scope.
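On sbt 1.x the same setup can be written with the slash syntax. This is a sketch under the assumption that the library is cross-built, so %% can replace the hard-coded _2.10 suffix:

```scala
// build.sbt of the library (objects-commons)
// Also publish the jar built from src/test when the main artifact is published
Test / packageBin / publishArtifact := true

// build.sbt of the application that consumes it
libraryDependencies ++= Seq(
  // main artifact, available in compile and runtime scope
  "com.company" %% "objects-commons" % "0.1-SNAPSHOT",
  // same module and version, but the "tests" classifier selects the test jar,
  // and the Test configuration keeps it off the production classpath
  "com.company" %% "objects-commons" % "0.1-SNAPSHOT" % Test classifier "tests"
)
```

The key point is that the test jar is selected by classifier, not by appending -test to the version string.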
I created an example project:
git clone git@github.com:schleichardt/stackoverflow-answers.git --branch so15290881-how-do-i-resolve-my-own-test-artifacts-in-sbt
Or you can view the example directly in github.
Your problem is that sbt thinks that your two jars are the same artifact, but with different versions. It takes the "latest", which is 0.1-SNAPSHOT, and ignores the 0.1-SNAPSHOT-test. This is the same behaviour as you would see if, for instance you have 0.1-SNAPSHOT and 0.2-SNAPSHOT.
I don't know what is in these two jars, but if you want them both to be on the classpath, which is what you seem to want to do, then you'll need to change the name of the test artifact to objects-commons-test, as Kazuhiro suggested. It seems that this should be easy enough for you, since you're already putting it in the repo yourself.
It will work fine if you change the name like this.
"com.company" %% "objects-commons" % "0.1-SNAPSHOT",
"com.company" %% "objects-commons-test" % "0.1-SNAPSHOT" % "test",