I'm trying to use JanusGraph from a Scala script with TinkerPop 3. I'm using the gremlin-scala library (https://github.com/mpollmeier/gremlin-scala), but I get an error about HNil (see below). How can I use Gremlin with JanusGraph from a Scala script?
import gremlin.scala._
import org.apache.commons.configuration.BaseConfiguration
import org.janusgraph.core.JanusGraphFactory
import org.apache.tinkerpop.gremlin.structure.Graph
object Janus {
  def main(args: Array[String]): Unit = {
    val conf = new BaseConfiguration()
    conf.setProperty("storage.backend","inmemory")
    val graph = JanusGraphFactory.open(conf)
    val v1 = graph.graph.addV("test")
  }
}
Error:(11, 14) Symbol 'type scala.ScalaObject' is missing from the classpath.
This symbol is required by 'trait shapeless.HNil'.
Make sure that type ScalaObject is in your classpath and check for conflicting dependencies with -Ylog-classpath.
A full rebuild may help if 'HNil.class' was compiled against an incompatible version of scala.
val v1 = graph.graph.addV("test")
I'm not sure what you mean by 'Scala script', but it looks like you're missing many (all?) of the dependencies. Did you have a look at https://github.com/mpollmeier/gremlin-scala-examples/ ? It contains an example setup for JanusGraph.
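One way to read the error: as far as I know, scala.ScalaObject was removed in Scala 2.10, so a shapeless jar that still asks for it was built for an older Scala than the one compiling the script. Letting sbt resolve the dependencies usually avoids that kind of mismatch. Below is a minimal build.sbt sketch; every version number in it is an assumption for illustration, so take the real ones from the gremlin-scala-examples repository.

```scala
// build.sbt (sketch; all version numbers are assumed, not verified
// against the current gremlin-scala-examples setup)
scalaVersion := "2.12.8"

libraryDependencies ++= Seq(
  // %% appends the Scala binary version to the artifact name, so sbt picks
  // the gremlin-scala build (and its transitive shapeless) for scalaVersion
  "com.michaelpollmeier" %% "gremlin-scala" % "3.3.1.1",
  // JanusGraph is a Java library, so a single % is used here
  "org.janusgraph" % "janusgraph-core" % "0.2.0"
)
```

With a build like this, the object from the question would go under src/main/scala and be started with sbt run, instead of being compiled against hand-picked jars.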
Related
I'm running Scala code on Azure Databricks without problems. Now I want to move this code from the Azure notebook to Eclipse.
I installed the Databricks connection by following the Microsoft documentation, and the Databricks data connection test passes.
I also installed SBT and imported it into my project in Eclipse.
I created a Scala object in Eclipse and added all the jar files in pyspark as external jars.
package Student
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.SparkSession
import java.util.Properties
//import com.databricks.dbutils_v1.DBUtilsHolder.dbutils
object Test {
  def isTypeSame(df: DataFrame, name: String, coltype: String) = (df.schema(name).dataType.toString == coltype)
  def main(args: Array[String]){
    var Result = true
    val Borrowers = List(("col1", "StringType"),("col2", "StringType"),("col3", "DecimalType(38,18)"))
    val dfPcllcus22 = spark.read.format("parquet").load("/mnt/slraw/ServiceCenter=*******.parquet")
    if (Result == false) println("Test Fail, Please check") else println("Test Pass")
  }
}
When I run this code in Eclipse, it reports that it cannot find the main class. But if I comment out the line val dfPcllcus22 = spark.read.format("parquet").load("/mnt/slraw/ServiceCenter=*******.parquet"), the test passes.
So it seems spark.read.format is not recognized.
I'm new to Scala and Databricks.
I have been researching this for several days but still cannot solve it.
If anyone can help, I would really appreciate it.
The environment is a bit complicated for me, so if more information is required, please let me know.
A SparkSession is needed to run your code in Eclipse. The code you provided never creates one, so the name spark is undefined, which leads to the error. Add this line before the first use of spark:
val spark = SparkSession.builder.appName("SparkDBFSParquet").master("local[*]").getOrCreate()
Then run the code again and it should work.
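To show where that line sits, here is a minimal sketch of the Test object with the session created at the top of main. It assumes Spark SQL is on the classpath. The small in-memory DataFrame and the names df and expected are stand-ins introduced for illustration; they replace the parquet path from the question, which is not reproduced here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object Test {
  // True when the column's Spark type, rendered as a string, equals coltype.
  def isTypeSame(df: DataFrame, name: String, coltype: String): Boolean =
    df.schema(name).dataType.toString == coltype

  def main(args: Array[String]): Unit = {
    // The session must exist before any spark.read / toDF call.
    val spark = SparkSession.builder
      .appName("SparkDBFSParquet")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in data; the question loads a parquet file instead.
    val df = Seq(("a", "b")).toDF("col1", "col2")
    val expected = List(("col1", "StringType"), ("col2", "StringType"))

    // Check every expected (column, type) pair against the schema.
    val result = expected.forall { case (name, tpe) => isTypeSame(df, name, tpe) }
    println(if (result) "Test Pass" else "Test Fail, Please check")

    spark.stop()
  }
}
```

Folding the check over the list with forall also removes the need for the mutable var Result from the question.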
I would like to run Scala code in Zeppelin on a Spark cluster.
For example:
This is the code, stored in HDFS as "HelloWorldScala.scala":
object HelloWorldScala{
  def main (arg: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("myApp_Enrico")
    val spark = SparkSession.builder.config(conf).getOrCreate()
    val aList = List(1,2,3,4,5,6,7,8,9,10)
    val aRdd = spark.sparkContext.parallelize(aList)
    println("********* HELLO WORLD AND HELLO SPARK!! ******")
    println("Print even numbers")
    aRdd.filter(x=>x%2==0).map(x=>x*2).collect().foreach(println)
  }
}
I would like to import the HelloWorldScala file into Zeppelin and run main, but I get an error:
[Image: Zeppelin error output]
Unfortunately, you can't import a single file in Zeppelin. You can package your Scala files into a .jar library and add it to spark.jars (set as a property in Spark). After that, you can import your library with a line such as import your.library.packages.YourClass and use its non-private functions. If you are not familiar with jar packaging and the spark.jars property, it is worth reading a bit more about them.
UPDATE:
%dep
z.load("your_package_group:artifact:version")
%spark
import com.yourpackage.HelloWorldScala
I'm using Sublime to write my first Scala program, and I'm using the terminal to run it.
First I run scalac assignment2.scala to compile it, but it shows the error message "error: object apache is not a member of package org".
How can I fix it?
This is my code:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
object assignment2 {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("assignment2")
    val sc = new SparkContext(conf)
    val input = sc.parallelize(List(1, 2, 3, 4))
    val result = input.map(x => x * x)
    println(result.collect().mkString(","))
  }
}
Where are you trying to submit the job? To run any Spark application, you need to submit it with bin/spark-submit from your Spark installation directory, or have SPARK_HOME set in your environment so that you can refer to it when submitting.
You can't compile a Spark Scala file with scalac alone, because compiling your class requires the Spark library on the classpath. One way to execute the file is through spark-shell, using the steps below:
1. Open spark-shell with the following command: 'spark-shell --master yarn-client'
2. Load your file by its exact location: ':load File_Name_With_Absolute_Path'
3. Run your main method using the class name: 'ClassName.main(null)'
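Another option, not mentioned in the answers above, is to let sbt put the Spark jars on the compile classpath instead of calling scalac by hand. The following is a minimal build.sbt sketch; the Scala and Spark versions are assumptions and should be matched to the Spark installation that will run the job.

```scala
// build.sbt (sketch; versions are assumptions, match them to your Spark install)
name := "assignment2"
scalaVersion := "2.12.18"

// "provided": needed at compile time, but supplied by spark-submit or
// spark-shell at run time, so it is left out of the packaged jar
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.5.1" % "provided"
```

With assignment2.scala placed under src/main/scala, running sbt package should produce a jar under target/ that can be passed to bin/spark-submit with --class assignment2.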
I am receiving the following error when I try to run my code:
Error:(104, 63) type mismatch;
found : hydrant.spark.hydrant.spark.IPPortPair
required: hydrant.spark.(some other)hydrant.spark.IPPortPair
IPPortPair(it.getHeader.getDestinationIP, it.getHeader.getDestinationPort))
My code uses a case class defined in the package object spark to set up the IP/Port map for each connection.
The package object looks like this:
package object spark{
  case class IPPortPair(ip:Long,port:Long)
}
And the code that uses the package object looks like this:
package hydrant.spark
import java.io.{File,PrintStream}
object identifyCustomers{
  ……………
  def mapCustomers(suspectTraffic:RDD[Generic])={
    suspectTraffic.filter(
      it => !it.getHeader.isEmtpy
    ).map(
      it => IPPortPair(it.getHeader.getDestinationIP,it.getHeader.getDestinationPort)
    )
  }
I am puzzled by the strange way my packages are displayed: the error makes it seem that I am in hydrant.spark.hydrant.spark, which does not exist.
I am also using IntelliJ, if that makes a difference.
You need to run sbt clean (or the IntelliJ equivalent). You changed something in the project (e.g. Scala version) and this is how the incompatibility manifests.
My project uses the following jars: scala-library (2.9.2), mongo-java-driver (2.7.3), scalaj-collection (2.9.1-1.2), casbah (util, commons, core, query, gridfs) 2.9.1-3.0.0-M2, joda-time 2.1, and joda-convert 1.2.
When I enter the following hello-worldish code:
package test
import com.mongodb.casbah.Imports._
object Test {
  def main(args: Array[String]): Unit = {
    var connection = MongoConnection()
  }
}
I get an error: "not found: value MongoConnection". The error goes away if I explicitly
import com.mongodb.casbah.MongoConnection
But I thought Imports._ was supposed to be taking care of that. What could I be doing wrong?
In Casbah 3.0, Imports._ is deprecated.
What is weird though is that MongoConnection is not even imported anymore. Everything else works but deprecation warnings occur.
As those warnings state, you just need to do this instead:
import com.mongodb.casbah._