I'm trying to query an HBase table with Spark, but I get this error:
14:08:35.134 [main] DEBUG org.apache.hadoop.util.Shell - Failed to
detect a valid hadoop home directory java.io.FileNotFoundException:
HADOOP_HOME and hadoop.home.dir are unset.
I have set HADOOP_HOME in .bashrc, and echo $HADOOP_HOME gives me the path.
My code:
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object HbaseQuery {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HBaseRead").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    val conf = HBaseConfiguration.create()
    val tableName = "emp"
    System.setProperty("hadoop.home.dir", "/usr/local/hadoop-2.7.6")
    conf.set("hbase.zookeeper.quorum", "localhost")
    conf.set("hbase.master", "localhost:60000")
    conf.set(TableInputFormat.INPUT_TABLE, tableName)
    val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])
    println("Number of Records found : " + hBaseRDD.count())
    hBaseRDD.foreach(println)
  }
}
I also tried creating spark-env.sh and adding
export HADOOP_HOME="my path"
but I still get the same problem.
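One variation that might be worth trying, assuming hadoop.home.dir has to be set before the SparkContext (and with it Hadoop's Shell class) is initialized, is to set the property first. This is only a minimal sketch; the path is just the install directory from the code above:
import org.apache.spark.{SparkConf, SparkContext}

object HbaseQuery {
  def main(args: Array[String]) {
    // Assumption: set hadoop.home.dir before any Spark/Hadoop class is touched,
    // so Hadoop's Shell initialization can pick it up.
    System.setProperty("hadoop.home.dir", "/usr/local/hadoop-2.7.6")

    val sparkConf = new SparkConf().setAppName("HBaseRead").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    // ... HBase configuration and RDD creation as above ...
  }
}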
Thanks in advance.
I am passing the config file name as an argument in spark2-submit, but I am getting a "file does not exist" error even though the file exists. If I hard-code the file name, it works fine.
spark2-submit --files /data/app/Data_validation/target/input.conf --class "QualityCheck" DC_framework-jar-with-dependencies.jar "input.conf"
Code:
import java.io.File

import com.typesafe.config.ConfigFactory
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}
import org.apache.spark.sql.SparkSession

object QualityCheck {
  def main(args: Array[String]): Unit = {
    val configFile = SparkFiles.get("input.conf")
    val conf = new SparkConf().setAppName("Check Global")
    val sc = SparkContext.getOrCreate(conf)
    val spark = SparkSession.builder.getOrCreate()
    println(configFile)
    val file = new File(configFile)
    if (file.exists) {
      val config = ConfigFactory.parseFile(file)
    } else {
      println("Configuration file does not exist")
    }
  }
}
Output:
/data/app/Data_validation/target/input.conf
Configuration file does not exist
Config(SimpleConfigObject({}))
Please help!
Please check the code below.
spark2-submit --class "QualityCheck" DC_framework-jar-with-dependencies.jar "/data/app/Data_validation/target/input.conf"
import java.io.File

import com.typesafe.config.ConfigFactory
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

object QualityCheck {
  def main(args: Array[String]): Unit = {
    // The config path is now passed as the first program argument.
    val file = new File(args(0))
    if (file.exists) {
      val config = ConfigFactory.parseFile(file)
    } else {
      println("Configuration file does not exist")
    }
    val conf = new SparkConf().setAppName("Check Global")
    val sc = SparkContext.getOrCreate(conf)
    val spark = SparkSession.builder.getOrCreate()
  }
}
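Alternatively, if you want to keep the --files option rather than pass the full path as an argument, a rough sketch (assuming a file shipped with --files becomes resolvable via SparkFiles.get once the SparkContext exists) would be:
import java.io.File

import com.typesafe.config.ConfigFactory
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object QualityCheck {
  def main(args: Array[String]): Unit = {
    // Create the SparkContext first, then resolve the distributed file.
    val conf = new SparkConf().setAppName("Check Global")
    val sc = SparkContext.getOrCreate(conf)

    // Assumption: "input.conf" was shipped with --files, so SparkFiles.get
    // resolves it to a local path once the context is up.
    val file = new File(SparkFiles.get("input.conf"))
    if (file.exists) {
      val config = ConfigFactory.parseFile(file)
      println(config)
    } else {
      println("Configuration file does not exist")
    }
  }
}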
Hi, I'm trying to read configuration from my configuration file in Spark/Scala.
I've written the code below.
val conf = com.typesafe.config.ConfigFactory.load(args(0))
var url = conf.getString("parameters.spark-hive.url")
var db = conf.getString("parameters.spark-hive.dbname")
val sparksession = SparkSession.builder()
  .appName("myapp")
  .config("spark.sql.hive.hiveserver2.jdbc.url", url)
  .enableHiveSupport()
  .getOrCreate()
Below is my application.conf file (src/main/resources/application.conf):
parameters {
  spark-hive {
    url = """jdbc://xxxxxxxxxxxx""",
    dbname = """bdname"""
  }
}
and I'm using the spark-submit command below:
spark-submit \
> --conf "spark.executor.extraClassPath=-Dconfig.file=application.conf"\
> --verbose \
> --class classjarname \
> project_jar
> /path/config-1.2.0.jar \
> /path/application.conf
but I'm getting the error below.
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'parameters'
Note: I'm generating the JAR using Maven and using HDP 3.x.
You could print out the actual value of args(0) to see where the (full) path refers to. This worked for me:
com.typesafe.config.ConfigFactory.parseFile(new java.io.File(args(0)))
Additional remarks:
I'm not sure what project_jar means in your submit command.
There also seems to be a typo with the Hive URL, as the code that builds the SparkSession does not match your configuration.
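A minimal, self-contained sketch of that parseFile suggestion (the object name ConfigLoadCheck is only for illustration), assuming args(0) carries the full path to application.conf:
import java.io.File

import com.typesafe.config.{Config, ConfigFactory}

object ConfigLoadCheck {
  def main(args: Array[String]): Unit = {
    // Print the raw argument so you can see exactly what path the driver received.
    println(s"args(0) = ${args(0)}")

    // parseFile reads the file at that path directly, bypassing the classpath
    // lookup that ConfigFactory.load(...) performs.
    val conf: Config = ConfigFactory.parseFile(new File(args(0)))

    println(conf.getString("parameters.spark-hive.url"))
    println(conf.getString("parameters.spark-hive.dbname"))
  }
}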
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'parameters' means it is not able to load the key parameters. That is the top-level entry of your conf file, so the config file is either not being loaded or not being parsed properly. I would suggest reading the file first, and only then moving on to the next step, i.e. using these parameters to create the SparkSession. Try the snippet below to check whether the file contents can be read at all (note that args(0) is used as a plain path here, without ConfigFactory.load):
import scala.io.Source

val filename = args(0)
for (line <- Source.fromFile(filename).getLines) {
  println(line)
}
I want to show you an easy example of how to use the com.typesafe.config library.
This is my application.properties under the resources directory.
## Structured Streaming device
device.zookeeper = quickstart.cloudera:2181
device.bootstrap.server = quickstart.cloudera:9092
device.topic = device
device.execution.mode = local
device.data.host = quickstart.cloudera
device.data.port = 44444
## HBase
device.zookeeper.quorum = quickstart.cloudera
device.zookeeper.port = 2181
device.window = 1
And this is the code to get the properties, where args(0) == "device":
import com.typesafe.config.{Config, ConfigFactory}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

def main(args: Array[String]): Unit = {
  val conf = ConfigFactory.load // get Confs
  val envProps: Config = conf.getConfig(args(0)) // args(0) == device
  val sparkConf = new SparkConf().setMaster(envProps.getString("execution.mode")).setAppName("Device Signal") // get execution.mode conf
  val streamingContext = new StreamingContext(sparkConf, Seconds(envProps.getInt("window"))) // get window conf
  streamingContext.sparkContext.setLogLevel("ERROR")
  val broadcastConfig = streamingContext.sparkContext.broadcast(envProps)
  val topicsSet = Set(envProps.getString("topic")) // get topic conf
  val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> envProps.getString("bootstrap.server"), // get bootstrap.server conf
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "1",
    "auto.offset.reset" -> "latest",
    "enable.auto.commit" -> (false: java.lang.Boolean)
  )
  val logData: DStream[String] = KafkaUtils.createDirectStream[String, String](
    streamingContext,
    PreferConsistent,
    Subscribe[String, String](topicsSet, kafkaParams)
  ).map(record => {
    record.value
  })

  logData.print() // register an output operation so the stream actually runs

  streamingContext.start()
  streamingContext.awaitTermination()
}
Really simple Scala code fails at the first count() method call.
import java.io.File

import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]) {
  // create Spark context with Spark configuration
  val sc = new SparkContext(new SparkConf().setAppName("Spark File Count"))
  // recursiveListFiles is a helper defined elsewhere in the project
  val fileList = recursiveListFiles(new File("C:/data")).filter(_.isFile).map(file => file.getName())
  val filesRDD = sc.parallelize(fileList)
  val linesRDD = sc.textFile("file:///temp/dataset.txt")
  val lines = linesRDD.count()
  val files = filesRDD.count()
}
I don't want to set up an HDFS installation for this right now. How do I configure Spark to use the local file system? This works with spark-shell.
To read a file from the local filesystem (from a Windows directory), you need to use the pattern below.
val fileRDD = sc.textFile("C:\\Users\\Sandeep\\Documents\\test\\test.txt");
Please see the sample working program below, which reads data from the local file system.
package com.scala.example

import org.apache.spark._

object Test extends Serializable {
  val conf = new SparkConf().setAppName("read local file")
  conf.set("spark.executor.memory", "100M")
  conf.setMaster("local")
  val sc = new SparkContext(conf)
  val input = "C:\\Users\\Sandeep\\Documents\\test\\test.txt"
  def main(args: Array[String]): Unit = {
    val fileRDD = sc.textFile(input)
    val counts = fileRDD.flatMap(line => line.split(","))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.collect().foreach(println)
    // Stop the Spark context
    sc.stop
  }
}
val sc = new SparkContext(new SparkConf().setAppName("Spark File Count").setMaster("local[8]"))
might help.
I am reading an HBase table using Spark with Scala.
The code is as follows:
package HBase

import org.apache.hadoop.hbase.client.{HBaseAdmin, Result}
import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor}
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import it.nerdammer.spark.hbase._
import org.apache.spark._

object Connector {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HBaseApp").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    val conf = HBaseConfiguration.create()
    val tableName = "cars"
    conf.set("hbase.master", "10.163.12.87")
    conf.setInt("timeout", 40000)
    conf.set("hbase.zookeeper.quorum", "10.163.12.87")
    conf.set("zookeeper.znode.parent", "/hbase-unsecure")
    conf.set(TableInputFormat.INPUT_TABLE, tableName)
    val admin = new HBaseAdmin(conf)
    if (!admin.isTableAvailable(tableName)) {
      val tableDesc = new HTableDescriptor(tableName)
      admin.createTable(tableDesc)
    }
    val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])
    println("Number of Records found : " + hBaseRDD.count())
    sc.stop()
  }
}
I am getting the error below:
Exception in thread "main" java.lang.NumberFormatException: For input string: "16000��I���PBUF
HDP1.Node1�}ڞ���*
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:63)
at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:63)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:353)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:89)
at HBase.Connector$.main(Connector.scala:32)
at HBase.Connector.main(Connector.scala)
Try setting the port number in the hbase.master config.
conf.set("hbase.master", "10.163.12.87:60000")
I'm new to Spark and wrote a very simple Spark application in Scala, shown below:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
object test2object {
  def main(args: Array[String]) {
    val logFile = "src/data/sample.txt"
    val sc = new SparkContext("local", "Simple App", "/path/to/spark-0.9.1-incubating",
      List("target/scala-2.10/simple-project_2.10-1.0.jar"))
    val logData = sc.textFile(logFile, 2).cache()
    val numTHEs = logData.filter(line => line.contains("the")).count()
    println("Lines with the: %s".format(numTHEs))
  }
}
I'm coding in Scala IDE and included spark-assembly.jar in my project. I generate a JAR file from my project and submit it to my local Spark cluster using the command spark-submit --class test2object --master local[2] ./file.jar, but I get this error message:
Exception in thread "main" java.lang.NoSuchMethodException: test2object.main([Ljava.lang.String;)
at java.lang.Class.getMethod(Class.java:1665)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:649)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
What is wrong here?
P.S. My source code is under the project root directory (project/test2object.scala).
I haven't used Spark 0.9.1 before, but I believe the problem comes from this line of code:
val sc = new SparkContext("local", "Simple App", "/path/to/spark-0.9.1-incubating", List("target/scala-2.10/simple-project_2.10-1.0.jar"))
If you change it to this:
val conf = new SparkConf().setAppName("Simple App")
val sc = new SparkContext(conf)
This will work.
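Putting the suggestion together, a minimal sketch of the whole program (keeping the original file path and filter, and leaving the master to be supplied by spark-submit) might look like this:
import org.apache.spark.{SparkConf, SparkContext}

object test2object {
  def main(args: Array[String]) {
    val logFile = "src/data/sample.txt"

    // The master and any extra jars are supplied by spark-submit
    // (e.g. --master local[2]), not hard-coded in the constructor.
    val conf = new SparkConf().setAppName("Simple App")
    val sc = new SparkContext(conf)

    val logData = sc.textFile(logFile, 2).cache()
    val numTHEs = logData.filter(line => line.contains("the")).count()
    println("Lines with the: %s".format(numTHEs))

    sc.stop()
  }
}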