Scala: Reading data from Scylla throws exception

I am new to Scala and am trying to run a simple query to retrieve some data from Scylla. Here is my code:
val my_name = "test"
val cluster = ScyllaConnector.getCluster(clusterIpString, scyllaPreferredDc, scyllaUsername, scyllaPassword)
val session = cluster.connect(keySpace)
val preparedStatement: PreparedStatement = session.prepare(GOID_QUERY)
val nameResults = session.execute(preparedStatement.bind(my_name))
val nameResult = nameResults.one()
if (nameResult != null) {
  println("Here")
  val id_recent = nameResult.getSet("id_recent", classOf[String])
  println(id_recent)
}
session.close()
cluster.close()
Throws:
Exception in thread "main"
com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not
found for requested operation: [varchar <->
java.util.Set<java.lang.String>] at
com.datastax.driver.core.CodecRegistry.notFound(CodecRegistry.java:679)
at
com.datastax.driver.core.CodecRegistry.createCodec(CodecRegistry.java:526)
at
com.datastax.driver.core.CodecRegistry.findCodec(CodecRegistry.java:506)
at
com.datastax.driver.core.CodecRegistry.access$200(CodecRegistry.java:140)
at
com.datastax.driver.core.CodecRegistry$TypeCodecCacheLoader.load(CodecRegistry.java:211)
at
com.datastax.driver.core.CodecRegistry$TypeCodecCacheLoader.load(CodecRegistry.java:208)
at
shadeio.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at
shadeio.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
at
shadeio.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
at shadeio.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
at shadeio.common.cache.LocalCache.get(LocalCache.java:3937) at
shadeio.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) at
shadeio.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
at
com.datastax.driver.core.CodecRegistry.lookupCodec(CodecRegistry.java:480)
at
com.datastax.driver.core.CodecRegistry.codecFor(CodecRegistry.java:448)
at
com.datastax.driver.core.AbstractGettableByIndexData.codecFor(AbstractGettableByIndexData.java:73)
at
com.datastax.driver.core.AbstractGettableByIndexData.getSet(AbstractGettableByIndexData.java:318)
at
com.datastax.driver.core.AbstractGettableData.getSet(AbstractGettableData.java:26)
at
com.datastax.driver.core.AbstractGettableByIndexData.getSet(AbstractGettableByIndexData.java:307)
at
com.datastax.driver.core.AbstractGettableData.getSet(AbstractGettableData.java:26)
at
com.datastax.driver.core.AbstractGettableData.getSet(AbstractGettableData.java:215)
at
class.path$.main(CodeName.scala:184)
at
class.path.main(CodeName.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I am sure the problem arises on the getSet line, where it's asking for classOf[String], but I'm not sure what to replace it with.
Here is my table definition:
-- auto-generated definition
create table name_table
(
    name      text,
    id_recent text,
    primary key (name)
)

You have incompatible types: the column is text in the database, but you're trying to retrieve it as a set of strings (the [varchar <-> java.util.Set<java.lang.String>] part of the message says exactly that).
Replace getSet with getString, and if you need a set, you have to construct it yourself from the retrieved string.
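A minimal sketch of that fix, assuming id_recent packs several ids into one comma-separated string (the delimiter is a guess; adjust it to however the column is actually encoded):

// Read the column as the plain text it is declared as.
val idRecentRaw: String = nameResult.getString("id_recent")

// Build the Set yourself, guarding against a NULL column value.
val idRecentSet: Set[String] =
  Option(idRecentRaw)
    .map(_.split(",").map(_.trim).toSet)
    .getOrElse(Set.empty)

println(idRecentSet)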

Related

Scala ClosedByInterruptException using os.lib watch service

I'm new to Scala!
I'm trying to use os-lib's watch module to read JSON files when a file name change happens in the directory. The problem is that I cannot get the JSON file read when I rename the file manually to something else.
object Main extends App {
  // this works, no problem
  val jsonPath = os.Path("/users/tst.json")
  val jsonString = os.read(jsonPath)
  val data = ujson.read(jsonString)
  println(data)

  def readFileContent(file: os.Path): Unit = {
    println("Reading Input Json..")
    val jsonString = os.read(file) // fails here
    val data = ujson.read(jsonString)
    println(data)
  }

  def processFile(filePath: os.Path): Unit = {
    println("FileName: " + filePath)
    readFileContent(filePath)
  }

  os.watch.watch(Seq(os.pwd / "output"), paths => processFile(paths.last))
}
sbt:
"com.lihaoyi" %% "os-lib" % "0.7.8", "com.lihaoyi" %% "os-lib-watch" % "0.4.2"
Error when reading the json file:
JNA: Callback os.watch.FSEventsWatcher$$anon$1#82009c threw the following exception:
java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
at java.io.InputStream.read(InputStream.java:101)
at os.Internals$.transfer0(Internals.scala:15)
at os.Internals$.transfer(Internals.scala:23)
at os.read$bytes$.apply(ReadWriteOps.scala:257)
at os.read$.apply(ReadWriteOps.scala:216)
at os.read$.apply(ReadWriteOps.scala:214)
at Main$.readFileContent(Main.scala:21)
at Main$.processFile(Main.scala:28)
at Main$.$anonfun$new$1(Main.scala:33)
at Main$.$anonfun$new$1$adapted(Main.scala:33)
at os.watch.FSEventsWatcher$$anon$1.invoke(FSEventsWatcher.scala:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.jna.CallbackReference$DefaultCallbackProxy.invokeCallback(CallbackReference.java:520)
at com.sun.jna.CallbackReference$DefaultCallbackProxy.callback(CallbackReference.java:551)
at com.sun.jna.Native.invokeVoid(Native Method)
at com.sun.jna.Function.invoke(Function.java:414)
at com.sun.jna.Function.invoke(Function.java:360)
at com.sun.jna.Library$Handler.invoke(Library.java:244)
at com.sun.proxy.$Proxy3.CFRunLoopRun(Unknown Source)
at os.watch.FSEventsWatcher.run(FSEventsWatcher.scala:75)
at os.watch.package$.$anonfun$watch$1(package.scala:39)
at java.lang.Thread.run(Thread.java:748)

Spark dealing with CarbonData

Below is the code snippet I'm trying to use to create a CarbonData table in S3. However, in spite of setting the AWS credentials in the Hadoop configuration, it still complains that the secret key and access key are not set. What is the issue here?
import org.apache.spark.sql.CarbonSession._
val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("s3n://url")
carbon.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId","<accesskey>")
carbon.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey","<secretaccesskey>")
carbon.sql("CREATE TABLE IF NOT EXISTS test_table(id string,name string,city string,age Int) STORED BY 'carbondata'")
The last command yields this error:
java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively)
Spark Version : 2.2.1
Command used to start spark-shell:
$SPARK_PATH/bin/spark-shell --jars /localpath/jar/apache-carbondata-1.3.1-bin-spark2.2.1-hadoop2.7.2/apache-carbondata-1.3.1-bin-spark2.2.1-hadoop2.7.2.jar,/localpath/jar/spark-avro_2.11-4.0.0.jar --packages com.amazonaws:aws-java-sdk-pom:1.9.22,org.apache.hadoop:hadoop-aws:2.7.2,org.slf4j:slf4j-simple:1.7.21,asm:asm:3.2,org.xerial.snappy:snappy-java:1.1.7.1,com.databricks:spark-avro_2.11:4.0.0
UPDATE:
I found that S3 support is only available in 1.4.0 RC1, so I built RC1 and tested the code below against it. But I still seem to be running into issues. Any help is appreciated.
Code:
import org.apache.spark.sql.CarbonSession._
import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
import org.apache.spark.sql.SparkSession
import org.apache.carbondata.core.constants.CarbonCommonConstants
object sample4 {
  def main(args: Array[String]): Unit = {
    val (accessKey, secretKey, endpoint) = getKeyOnPrefix("s3n://")
    //val rootPath = new File(this.getClass.getResource("/").getPath
    //  + "../../../..").getCanonicalPath
    val path = "/localpath/sample/data1.csv"

    val spark = SparkSession
      .builder()
      .master("local")
      .appName("S3UsingSDKExample")
      .config("spark.driver.host", "localhost")
      .config(accessKey, "<accesskey>")
      .config(secretKey, "<secretkey>")
      //.config(endpoint, "s3-us-east-1.amazonaws.com")
      .getOrCreateCarbonSession()

    spark.sql("Drop table if exists carbon_table")

    spark.sql(
      s"""
         | CREATE TABLE if not exists carbon_table(
         | shortField SHORT,
         | intField INT,
         | bigintField LONG,
         | doubleField DOUBLE,
         | stringField STRING,
         | timestampField TIMESTAMP,
         | decimalField DECIMAL(18,2),
         | dateField DATE,
         | charField CHAR(5),
         | floatField FLOAT
         | )
         | STORED BY 'carbondata'
         | LOCATION 's3n://bucketName/table/carbon_table'
         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
       """.stripMargin)
  }

  def getKeyOnPrefix(path: String): (String, String, String) = {
    val endPoint = "spark.hadoop." + ENDPOINT
    if (path.startsWith(CarbonCommonConstants.S3A_PREFIX)) {
      ("spark.hadoop." + ACCESS_KEY, "spark.hadoop." + SECRET_KEY, endPoint)
    } else if (path.startsWith(CarbonCommonConstants.S3N_PREFIX)) {
      ("spark.hadoop." + CarbonCommonConstants.S3N_ACCESS_KEY,
        "spark.hadoop." + CarbonCommonConstants.S3N_SECRET_KEY, endPoint)
    } else if (path.startsWith(CarbonCommonConstants.S3_PREFIX)) {
      ("spark.hadoop." + CarbonCommonConstants.S3_ACCESS_KEY,
        "spark.hadoop." + CarbonCommonConstants.S3_SECRET_KEY, endPoint)
    } else {
      throw new Exception("Incorrect Store Path")
    }
  }

  def getSparkMaster(args: Array[String]): String = {
    if (args.length == 6) args(5)
    else if (args(3).contains("spark:") || args(3).contains("mesos:")) args(3)
    else "local"
  }
}
Error:
18/05/17 12:23:22 ERROR SegmentStatusManager: main Failed to read metadata of load
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.ServiceException: Request Error: Empty key
I also tried the sample code below (with the s3, s3n, and s3a protocols as well):
https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala
Ran as:
S3Example.main(Array("accesskey","secretKey","s3://bucketName/path/carbon_table","https://bucketName.s3.amazonaws.com","local"))
Error stacktrace:
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error: Empty key
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:175)
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
    at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy21.retrieveINode(Unknown Source)
    at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:340)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
    at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isFileExist(AbstractDFSCarbonFile.java:426)
    at org.apache.carbondata.core.datastore.impl.FileFactory.isFileExist(FileFactory.java:201)
    at org.apache.carbondata.core.statusmanager.SegmentStatusManager.readTableStatusFile(SegmentStatusManager.java:246)
    at org.apache.carbondata.core.statusmanager.SegmentStatusManager.readLoadMetadata(SegmentStatusManager.java:197)
    at org.apache.carbondata.core.cache.dictionary.ManageDictionaryAndBTree.clearBTreeAndDictionaryLRUCache(ManageDictionaryAndBTree.java:101)
    at org.apache.spark.sql.hive.CarbonFileMetastore.dropTable(CarbonFileMetastore.scala:460)
    at org.apache.spark.sql.execution.command.table.CarbonCreateTableCommand.processMetadata(CarbonCreateTableCommand.scala:148)
    at org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
    at org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:107)
    at org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:96)
    at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:144)
    at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:94)
    at $line19.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$S3Example$.main(<console>:68)
    at $line26.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:31)
    at $line26.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:36)
    at $line26.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:38)
    at $line26.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:40)
    at $line26.$read$$iw$$iw$$iw$$iw.<init>(<console>:42)
    at $line26.$read$$iw$$iw$$iw.<init>(<console>:44)
    at $line26.$read$$iw$$iw.<init>(<console>:46)
    at $line26.$read$$iw.<init>(<console>:48)
    at $line26.$read.<init>(<console>:50)
    at $line26.$read$.<init>(<console>:54)
    at $line26.$read$.<clinit>(<console>)
    at $line26.$eval$.$print$lzycompute(<console>:7)
    at $line26.$eval$.$print(<console>:6)
    at $line26.$eval.$print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
    at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
    at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
    at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
    at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
    at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:415)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:923)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
    at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
    at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
    at org.apache.spark.repl.Main$.doMain(Main.scala:74)
    at org.apache.spark.repl.Main$.main(Main.scala:54)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.jets3t.service.S3ServiceException: Request Error: Empty key
    at org.jets3t.service.S3Service.getObject(S3Service.java:1470)
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:163)
Are any of the arguments I'm passing wrong?
I'm able to access the S3 path using the AWS CLI, so the path does exist in S3:
aws s3 ls s3://bucketName/path
You can try it using this example: https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala
You have to provide the AWS credential properties to Spark first; only after that do you create the CarbonSession.
If you have already created the SparkContext without the AWS properties, it will not pick them up even if you give them to the CarbonContext afterwards.
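A minimal sketch of that ordering, reusing the s3n properties from the question (the store path, bucket name, and credentials are placeholders): the credentials go onto the builder as spark.hadoop.* configs before the session exists, so the CarbonSession starts with a populated Hadoop configuration.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

val carbon = SparkSession
  .builder()
  .master("local")
  .appName("CarbonS3Example")
  // Set the credentials *before* the session is created.
  .config("spark.hadoop.fs.s3n.awsAccessKeyId", "<accesskey>")
  .config("spark.hadoop.fs.s3n.awsSecretAccessKey", "<secretaccesskey>")
  .getOrCreateCarbonSession("s3n://bucketName/carbon-store")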
Hi Vikas, looking at your exception: "Empty key" simply means that your access key and secret key are not bound in the CarbonSession, because the S3 implementation is written so that if either key is not provided by the user, its value is taken as empty.
To make things easy, first build the CarbonData jar using this command:
mvn -Pspark-2.1 clean package
Then execute spark-submit with this command:
./spark-submit --jars file:///home/anubhav/Downloads/softwares/spark-2.2.1-bin-hadoop2.7/carbonlib/apache-carbondata-1.4.0-SNAPSHOT-bin-spark2.2.1-hadoop2.7.2.jar --class org.apache.carbondata.examples.S3Example /home/anubhav/Documents/carbondata/carbondata/carbondata/examples/spark2/target/carbondata-examples-spark2-1.4.0-SNAPSHOT.jar local
Replace my jar path with yours and it should work; it's working for me.

Spark-Shell error: "spark.dynamicAllocation.{min/max}Executors must be set"

I am trying to start spark-shell after setting up Spark 1.2.1 on the Cloudera QuickStart VM, and I am getting the error below. I'd appreciate any quick help in resolving this issue. The error log follows:
16/03/03 09:40:37 INFO EventLoggingListener: Logging events to hdfs://quickstart.cloudera:8020/user/spark/applicationHistory/local-1457026830824
org.apache.spark.SparkException: spark.dynamicAllocation.{min/max}Executors must be set!
at org.apache.spark.ExecutorAllocationManager.validateSettings(ExecutorAllocationManager.scala:135)
at org.apache.spark.ExecutorAllocationManager.<init>(ExecutorAllocationManager.scala:98)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:377)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:986)
at $iwC$$iwC.<init>(<console>:9)
at $iwC.<init>(<console>:18)
at <init>(<console>:20)
at .<init>(<console>:24)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:123)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:122)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:270)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:122)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:60)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:147)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:60)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:106)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:60)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:962)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
scala>
The exception is pretty clear. It seems that you've set the spark.dynamicAllocation.enabled property to true, but failed to set spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors. The Spark 1.2.1 documentation clearly states this (from the spark.dynamicAllocation.enabled description, emphasis mine):
This requires the following configurations to be set:
spark.dynamicAllocation.minExecutors,
spark.dynamicAllocation.maxExecutors, and
spark.shuffle.service.enabled
If you look at the 1.2 branch of Spark, you'll see that if you don't specify those values, they default to -1:
// Lower and upper bounds on the number of executors. These are required.
private val minNumExecutors = conf.getInt("spark.dynamicAllocation.minExecutors", -1)
private val maxNumExecutors = conf.getInt("spark.dynamicAllocation.maxExecutors", -1)
This behavior has changed. If you look at the updated 1.6 branch of Spark, you'll see that they default to 0 and Integer.MAX_VALUE, respectively:
// Lower and upper bounds on the number of executors.
private val minNumExecutors = conf.getInt("spark.dynamicAllocation.minExecutors", 0)
private val maxNumExecutors = conf.getInt("spark.dynamicAllocation.maxExecutors",
Integer.MAX_VALUE)
This simply means you need to add these settings, either to the SparkConf or to any other configuration you're providing to spark-shell:
val sparkConf = new SparkConf()
.set("spark.dynamicAllocation.minExecutors", minExecutors)
.set("spark.dynamicAllocation.maxExecutors", maxExecutors)

Spring Batch ResultSet got closed by other before all data being fetched

I am trying to set up the DB2 data source as the persistence store for the Batch metadata. I am getting this stack trace:
Caused by: org.springframework.jdbc.UncategorizedSQLException: PreparedStatementCallback; uncategorized SQLException for SQL [SELECT JOB_EXECUTION_ID, START_TIME, END_TIME, STATUS, EXIT_CODE, EXIT_MESSAGE, CREATE_TIME, LAST_UPDATED, VERSION, JOB_CONFIGURATION_LOCATION from rhall.BATCH_JOB_EXECUTION where JOB_INSTANCE_ID = ? order by JOB_EXECUTION_ID desc]; SQL state [null]; error code [-4470]; [jcc][t4][10120][10898][3.57.82] Invalid operation: result set is closed. ERRORCODE=-4470, SQLSTATE=null; nested exception is com.ibm.db2.jcc.am.SqlException: [jcc][t4][10120][10898][3.57.82] Invalid operation: result set is closed. ERRORCODE=-4470, SQLSTATE=null
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:84)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:645)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:680)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:712)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:722)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:777)
at org.springframework.batch.core.repository.dao.JdbcJobExecutionDao.findJobExecutions(JdbcJobExecutionDao.java:131)
at org.springframework.batch.core.repository.support.SimpleJobRepository.getStepExecutionCount(SimpleJobRepository.java:253)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:99)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:281)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:96)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:207)
at $Proxy32.getStepExecutionCount(Unknown Source)
at org.springframework.batch.core.job.flow.JobFlowExecutor.isStepRestart(JobFlowExecutor.java:82)
at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:63)
at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:67)
at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:169)
... 22 more
Caused by: com.ibm.db2.jcc.am.SqlException: [jcc][t4][10120][10898][3.57.82] Invalid operation: result set is closed. ERRORCODE=-4470, SQLSTATE=null
at com.ibm.db2.jcc.am.bd.a(bd.java:660)
at com.ibm.db2.jcc.am.bd.a(bd.java:60)
at com.ibm.db2.jcc.am.bd.a(bd.java:103)
at com.ibm.db2.jcc.am.zl.Db(zl.java:4219)
at com.ibm.db2.jcc.am.zl.q(zl.java:4180)
at com.ibm.db2.jcc.am.zl.c(zl.java:1009)
at com.ibm.db2.jcc.am.zl.getTimestamp(zl.java:985)
at com.ibm.ws.rsadapter.jdbc.WSJdbcResultSet.getTimestamp(WSJdbcResultSet.java:2607)
at org.springframework.batch.core.repository.dao.JdbcJobExecutionDao$JobExecutionRowMapper.mapRow(JdbcJobExecutionDao.java:425)
at org.springframework.batch.core.repository.dao.JdbcJobExecutionDao$JobExecutionRowMapper.mapRow(JdbcJobExecutionDao.java:396)
at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:93)
at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:60)
at org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:693)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:629)
... 45 more
I traced the code and found the problem around this method: JdbcJobExecutionDao.mapRow(ResultSet rs, int rowNum). (I am using Spring Batch version 3.0.6.) Let me paste the method here for your convenience:
public JobExecution mapRow(ResultSet rs, int rowNum) throws SQLException {
    Long id = rs.getLong(1);
    String jobConfigurationLocation = rs.getString(10);
    JobExecution jobExecution;

    if (jobParameters == null) {
        jobParameters = getJobParameters(id);
    }

    if (jobInstance == null) {
        jobExecution = new JobExecution(id, jobParameters, jobConfigurationLocation);
    } else {
        jobExecution = new JobExecution(jobInstance, id, jobParameters, jobConfigurationLocation);
    }

    jobExecution.setStartTime(rs.getTimestamp(2));
    jobExecution.setEndTime(rs.getTimestamp(3));
    jobExecution.setStatus(BatchStatus.valueOf(rs.getString(4)));
    jobExecution.setExitStatus(new ExitStatus(rs.getString(5), rs.getString(6)));
    jobExecution.setCreateTime(rs.getTimestamp(7));
    jobExecution.setLastUpdated(rs.getTimestamp(8));
    jobExecution.setVersion(rs.getInt(9));
    return jobExecution;
}
As I traced it, I noticed that the problem is in the getJobParameters(id) method. This method performs another query, against the JOB_EXECUTION_PARAMS table, for the parameters of the given job id. But within this method, getConnection returns the same connection as in the current context, and after the query the finally block closes the ResultSet. So when control gets back to mapRow, it fails at this line:
jobExecution.setStartTime(rs.getTimestamp(2));
This is because rs has already been closed by the getJobParameters(id) call.
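For illustration, a minimal Scala/JDBC sketch of the pattern being described (table and column references are illustrative; whether the inner query actually invalidates the outer ResultSet depends on the driver and on cursor holdability, which is why it fails here under the DB2 JCC driver):

import java.sql.{Connection, ResultSet}

def mapRowPattern(conn: Connection): Unit = {
  val outer = conn.createStatement()
  val rs: ResultSet =
    outer.executeQuery("SELECT JOB_EXECUTION_ID, START_TIME FROM BATCH_JOB_EXECUTION")

  while (rs.next()) {
    val id = rs.getLong(1)

    // Nested query on the same connection, mirroring getJobParameters(id).
    val inner = conn.prepareStatement(
      "SELECT * FROM BATCH_JOB_EXECUTION_PARAMS WHERE JOB_EXECUTION_ID = ?")
    inner.setLong(1, id)
    try inner.executeQuery() finally inner.close() // the finally block described above

    // If the nested round trip closed the outer cursor, this throws
    // "Invalid operation: result set is closed".
    println(rs.getTimestamp(2))
  }
}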
Am I doing something wrong? Please point it out.
Many thanks.
Removing @Transactional from my method that uses the Batch infrastructure classes (i.e., JobExplorer) solved this issue with the closed ResultSet.

How can I pass a URL explicitly in Scala

Hello, I am new to Scala. I tried this code:
def web(url: Any) {
  val ur = new URL("url")
  val content = fromInputStream(ur.openStream).getLines.mkString("\n")
  print(content)
}
When I pass a URL like web("http://contentexplore.com/iphone-6-amazing-looks/"), it shows this error:
java.net.MalformedURLException: no protocol: url
at java.net.URL.<init>(URL.java:585)
at java.net.URL.<init>(URL.java:482)
at java.net.URL.<init>(URL.java:431)
at .web(<console>:22)
at .<init>(<console>:23)
at .<clinit>(<console>)
at .<init>(<console>:11)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:704)
at scala.tools.nsc.interpreter.IMain$Request$$anonfun$14.apply(IMain.scala:920)
at scala.tools.nsc.interpreter.Line$$anonfun$1.apply$mcV$sp(Line.scala:43)
at scala.tools.nsc.io.package$$anon$2.run(package.scala:25)
at java.lang.Thread.run(Thread.java:722)
My question is: how can I pass a URL explicitly in Scala? Kindly suggest an idea. Thanks in advance.
As mentioned in the comments, this line is the problem:
val ur= new URL("url")
If you want to create a URL from the input param url, the code should be:
val ur= new URL(url)
With the error, the Java URL class was trying to parse a String with the literal value "url": it looks first for a recognized protocol (http, https, etc.), doesn't find one, and that's why you were seeing that MalformedURLException.
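Putting it together, a sketch of the corrected function with the imports it needs (typing the parameter as String instead of Any is an extra tidy-up, not something the fix strictly requires):

import java.net.URL
import scala.io.Source.fromInputStream

def web(url: String): Unit = {
  // Pass the parameter itself, not the literal string "url".
  val ur = new URL(url)
  val content = fromInputStream(ur.openStream).getLines.mkString("\n")
  print(content)
}

web("http://contentexplore.com/iphone-6-amazing-looks/")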