OutOfMemoryError: Java heap space and memory variables in Spark - scala

I have been trying to execute a Scala program, and the output somehow always ends up looking like this:
15/08/17 14:13:14 ERROR util.Utils: uncaught error in thread SparkListenerBus, stopping SparkContext
java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:64)
at java.lang.StringBuilder.<init>(StringBuilder.java:97)
at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:339)
at com.fasterxml.jackson.core.io.SegmentedStringWriter.getAndClear(SegmentedStringWriter.java:83)
at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2344)
at org.json4s.jackson.JsonMethods$class.compact(JsonMethods.scala:32)
at org.json4s.jackson.JsonMethods$.compact(JsonMethods.scala:44)
at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$1.apply(EventLoggingListener.scala:143)
at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$1.apply(EventLoggingListener.scala:143)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:143)
at org.apache.spark.scheduler.EventLoggingListener.onJobStart(EventLoggingListener.scala:169)
at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:34)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:56)
at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:79)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1215)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
or like this
15/08/19 11:45:11 ERROR util.Utils: uncaught error in thread SparkListenerBus, stopping SparkContext
java.lang.OutOfMemoryError: GC overhead limit exceeded
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider$Impl.createInstance(DefaultSerializerProvider.java:526)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider$Impl.createInstance(DefaultSerializerProvider.java:505)
at com.fasterxml.jackson.databind.ObjectMapper._serializerProvider(ObjectMapper.java:2846)
at com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:1902)
at com.fasterxml.jackson.core.base.GeneratorBase.writeObject(GeneratorBase.java:280)
at com.fasterxml.jackson.core.JsonGenerator.writeObjectField(JsonGenerator.java:1255)
at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:22)
at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:7)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128)
at com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:1902)
at com.fasterxml.jackson.core.base.GeneratorBase.writeObject(GeneratorBase.java:280)
at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:17)
at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:7)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128)
at com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:1902)
at com.fasterxml.jackson.core.base.GeneratorBase.writeObject(GeneratorBase.java:280)
at com.fasterxml.jackson.core.JsonGenerator.writeObjectField(JsonGenerator.java:1255)
at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:22)
at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:7)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128)
at com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:1902)
at com.fasterxml.jackson.core.base.GeneratorBase.writeObject(GeneratorBase.java:280)
at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:17)
at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:7)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128)
at com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:1902)
at com.fasterxml.jackson.core.base.GeneratorBase.writeObject(GeneratorBase.java:280)
at com.fasterxml.jackson.core.JsonGenerator.writeObjectField(JsonGenerator.java:1255)
at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:22)
at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:7)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128)
at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2881)
Are these errors on the driver or executor side?
I am a bit confused by the memory variables that Spark uses. My current settings are:
spark-env.sh
export SPARK_WORKER_MEMORY=6G
export SPARK_DRIVER_MEMORY=6G
export SPARK_EXECUTOR_MEMORY=4G
spark-defaults.conf
# spark.driver.memory 6G
# spark.executor.memory 4G
# spark.executor.extraJavaOptions ' -Xms5G -Xmx5G '
# spark.driver.extraJavaOptions ' -Xms5G -Xmx5G '
Do I need to uncomment any of the variables in spark-defaults.conf, or are they redundant?
Is, for example, setting SPARK_WORKER_MEMORY equivalent to setting spark.executor.memory?
Here is the part of my Scala code where it stops after a few iterations:
val filteredNodesGroups = connCompGraph.vertices.map { case (_, array) => array(pagerankIndex) }.distinct.collect
for (id <- filteredNodesGroups) {
    val clusterGraph = connCompGraph.subgraph(vpred = (_, attr) => attr(pagerankIndex) == id)
    val pagerankGraph = clusterGraph.pageRank(0.15)
    val completeClusterPagerankGraph = clusterGraph.outerJoinVertices(pagerankGraph.vertices) {
        case (uid, attrList, Some(pr)) =>
            attrList :+ ("inClusterPagerank:" + pr)
        case (uid, attrList, None) =>
            attrList :+ ""
    }
    val sortedClusterNodes = completeClusterPagerankGraph.vertices.toArray.sortBy(_._2(pagerankIndex + 1))
    println(sortedClusterNodes(0)._2(1) + " with rank: " + sortedClusterNodes(0)._2(pagerankIndex + 1))
}
Many questions disguised as one. Thank you in advance!

I'm not a Spark expert, but there is a line that seems suspicious to me:
val filteredNodesGroups = connCompGraph.vertices.map{ case(_, array) => array(pagerankIndex) }.distinct.collect
Basically, by using the collect method, you are bringing all the data from your executors (before even processing it) back to the driver. Do you have any idea of the size of this data?
To fix this, you should proceed in a more functional way. To extract the distinct values, you could for example use a groupBy and map:
val pairs = connCompGraph.vertices.map { case (_, array) => array(pagerankIndex) }
pairs.groupBy(_./* the property to group on */)
     .map { case (_, arrays) => /* map function */ }
Regarding the collect, there should be a way to sort each partition and then return only the (processed) result to the driver. I would like to help you more, but I need more information about what you are trying to do.
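To make that idea concrete, here is a rough sketch (not the asker's exact pipeline) of reducing per key on the executors so that only one small record per cluster reaches the driver; sc is the SparkContext from the question and the sample pairs are invented:
// A rough sketch: reduce per key on the executors so that only one record per
// cluster comes back to the driver. The (clusterId, (vertexId, rank)) pairs
// below are made-up sample data.
val scored = sc.parallelize(Seq(("c1", (1L, 0.9)), ("c1", (2L, 0.3)), ("c2", (3L, 0.7))))

val topPerCluster = scored
  .reduceByKey((a, b) => if (a._2 >= b._2) a else b)   // keep the best-ranked vertex per cluster
  .collect()                                           // only one record per cluster reaches the driver

topPerCluster.foreach { case (clusterId, (vertexId, rank)) =>
  println(s"$clusterId: vertex $vertexId with rank $rank")
}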
UPDATE
After digging a little bit, you could sort your data using shuffling as described here
UPDATE
So far, I've tried to avoid the collect and to avoid bringing the data back to the driver as much as possible, but I have no idea how to solve this:
val filteredNodesGroups = connCompGraph.vertices.map{ case(_, array) => array(pagerankIndex) }.distinct()
val clusterGraphs = filteredNodesGroups.map { id => connCompGraph.subgraph(vpred = (_, attr) => attr(pagerankIndex) == id) }
val pageRankGraphs = clusterGraphs.map(_.pageRank(0.15))
Basically, you need to join two RDD[Graph[Array[String], String]] values, but I don't know what key to use, and secondly this would necessarily return an RDD of RDDs (I don't know if you can even do that). I'll try to find something later today.

Related

Akka streams with gilt aws kinesis exception: Stream is terminated. SourceQueue is detached

I'm using the gilt aws kinesis stream consumer library (linked there) to connect to a single-shard Kinesis stream.
Specifically:
...
val streamConfig = KinesisStreamConsumerConfig[String](
  streamName = queueName,
  applicationName = kinesisConsumerApp,
  regionName = Some(awsRegion),
  checkPointInterval = 5.minutes,
  retryConfig = RetryConfig(initialDelay = 1.second, retryDelay = 1.second, maxRetries = 3),
  initialPositionInStream = InitialPositionInStream.LATEST
)

implicit val mat = ActorMaterializer()

val flow = Source.queue[String](0, OverflowStrategy.backpressure)
  .to(Sink.foreach { msgBody =>
    log.info(s"Flow got message: $msgBody")
    try {
      val workAsJson = parse(msgBody)
      frontEnd ! workAsJson
    } catch {
      case th: Throwable =>
        log.error(s"Exception thrown trying to parse message from Kinesis stream, e.cause: ${th.getCause}, e.message: ${th.getMessage}")
    }
  })
  .run()

val consumer = new KinesisStreamConsumer[String](
  streamConfig,
  KinesisStreamHandler(
    KinesisStreamSource.pumpKinesisStreamTo(flow, 10.second)
  )
)

val ec = Executors.newSingleThreadExecutor()
ec.submit(new Runnable {
  override def run(): Unit = consumer.run()
})
The application runs fine for about 24 hours (I verify occasionally by pushing records with the aws kinesis put-record command line and watching them get consumed by my application), but then the application suddenly starts receiving exceptions each time a new record is pushed to the stream.
Here is the console logging when that happens:
INFO: Sleeping ... [863/1962]
DEBUG[RecordProcessor-0000] KCLRecordProcessorFactory$IRecordProcessorFactoryImpl - Processing 1 records from shard shardId-000000000000
WARN [RecordProcessor-0000] KCLRecordProcessorFactory$IRecordProcessorFactoryImpl - Kinesis shard: shardId-000000000000 :: Stream is terminated. SourceQueue is detached
WARN [RecordProcessor-0000] KCLRecordProcessorFactory$IRecordProcessorFactoryImpl - Kinesis shard: shardId-000000000000 :: Stream is terminated. SourceQueue is detached
WARN [RecordProcessor-0000] KCLRecordProcessorFactory$IRecordProcessorFactoryImpl - Kinesis shard: shardId-000000000000 :: Stream is terminated. SourceQueue is detached
ERROR[RecordProcessor-0000] KCLRecordProcessorFactory$IRecordProcessorFactoryImpl - SKIPPING 1 records from shard shardId-000000000000 :: Kinesis shard: shardId-000000000000 :: Stream is termi
nated. SourceQueue is detached
com.gilt.gfc.aws.kinesis.client.KCLRecordProcessorFactory$KCLProcessorException: Kinesis shard: shardId-000000000000 :: Stream is terminated. SourceQueue is detached
at com.gilt.gfc.aws.kinesis.client.KCLRecordProcessorFactory$IRecordProcessorFactoryImpl$$anon$1.$anonfun$doRetry$2(KCLRecordProcessorFactory.scala:156)
at com.gilt.gfc.util.Retry$.retryWithExponentialDelay(Retry.scala:67)
at com.gilt.gfc.aws.kinesis.client.KCLRecordProcessorFactory$IRecordProcessorFactoryImpl$$anon$1.doRetry(KCLRecordProcessorFactory.scala:151)
at com.gilt.gfc.aws.kinesis.client.KCLRecordProcessorFactory$IRecordProcessorFactoryImpl$$anon$1.processRecords(KCLRecordProcessorFactory.scala:120)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.V1ToV2RecordProcessorAdapter.processRecords(V1ToV2RecordProcessorAdapter.java:42)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ProcessTask.call(ProcessTask.java:176)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:49)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:24)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Stream is terminated. SourceQueue is detached
at akka.stream.impl.QueueSource$$anon$1.$anonfun$postStop$1(Sources.scala:57)
at akka.stream.impl.QueueSource$$anon$1.$anonfun$postStop$1$adapted(Sources.scala:56)
at akka.stream.stage.CallbackWrapper.$anonfun$invoke$1(GraphStage.scala:1373)
at akka.stream.stage.CallbackWrapper.locked(GraphStage.scala:1379)
at akka.stream.stage.CallbackWrapper.invoke(GraphStage.scala:1370)
at akka.stream.stage.CallbackWrapper.invoke$(GraphStage.scala:1369)
at akka.stream.impl.QueueSource$$anon$1.invoke(Sources.scala:47)
at akka.stream.impl.QueueSource$$anon$2.offer(Sources.scala:180)
at com.gilt.gfc.aws.kinesis.akka.KinesisStreamSource$.$anonfun$pumpKinesisStreamTo$1(KinesisStreamSource.scala:20)
at com.gilt.gfc.aws.kinesis.akka.KinesisStreamSource$.$anonfun$pumpKinesisStreamTo$1$adapted(KinesisStreamSource.scala:20)
at com.gilt.gfc.aws.kinesis.akka.KinesisStreamHandler$$anon$1.onRecord(KinesisStreamHandler.scala:29)
at com.gilt.gfc.aws.kinesis.akka.KinesisStreamConsumer.$anonfun$run$1(KinesisStreamConsumer.scala:40)
at com.gilt.gfc.aws.kinesis.akka.KinesisStreamConsumer.$anonfun$run$1$adapted(KinesisStreamConsumer.scala:40)
at com.gilt.gfc.aws.kinesis.client.KCLWorkerRunner.$anonfun$runSingleRecordProcessor$2(KCLWorkerRunner.scala:159)
at com.gilt.gfc.aws.kinesis.client.KCLWorkerRunner.$anonfun$runSingleRecordProcessor$2$adapted(KCLWorkerRunner.scala:159)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:52)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at com.gilt.gfc.aws.kinesis.client.KCLWorkerRunner.$anonfun$runSingleRecordProcessor$1(KCLWorkerRunner.scala:159)
at com.gilt.gfc.aws.kinesis.client.KCLWorkerRunner.$anonfun$runSingleRecordProcessor$1$adapted(KCLWorkerRunner.scala:159)
at com.gilt.gfc.aws.kinesis.client.KCLWorkerRunner.$anonfun$runBatchProcessor$1(KCLWorkerRunner.scala:121)
at com.gilt.gfc.aws.kinesis.client.KCLWorkerRunner.$anonfun$runBatchProcessor$1$adapted(KCLWorkerRunner.scala:116)
at com.gilt.gfc.aws.kinesis.client.KCLRecordProcessorFactory$IRecordProcessorFactoryImpl$$anon$1.$anonfun$processRecords$2(KCLRecordProcessorFactory.scala:120)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at com.gilt.gfc.aws.kinesis.client.KCLRecordProcessorFactory$IRecordProcessorFactoryImpl$$anon$1.$anonfun$doRetry$2(KCLRecordProcessorFactory.scala:153)
... 11 common frames omitted
Wondering if that answer might be related. If so, I'd appreciate a simpler explanation/how-to-fix that suits a newbie like myself.
Notes:
This is still in the testing/staging phase, so there is no real load on the stream except for the occasional manual pushes I'm making.
The 24h duration in which the application runs fine has not been accurately measured, but was an observation.
I'm running the test for a third time (started at 8:42 UTC), but with the difference of increasing the Source.queue buffer size to 100.
If 24h turns out to be accurate, could that be related to Kinesis's default 24h retention period for stream records?
Update:
Application still working fine after 24+ hours of operation.
Update 2:
So the application has been running fine for the past 48+ hours; again, the only difference is increasing the stream's Source.queue buffer size to 100.
Could that be the proper fix to the issue?
Will I face a similar issue with increased load once we go to production?
Is 100 enough/too much/too few?
Can someone please explain how this change fixed/suppressed/mitigated the error?
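For reference, the change described in the updates is just the buffer size passed to Source.queue; everything else stays as in the question. A minimal sketch of the changed line, assuming the same implicit ActorMaterializer and log as above:
// Same pipeline as in the question, only the Source.queue buffer size changed from 0 to 100.
// With OverflowStrategy.backpressure, a larger buffer gives the KCL record processor room to
// offer new records while the Sink is still draining earlier ones.
val flow = Source.queue[String](100, OverflowStrategy.backpressure)
  .to(Sink.foreach { msgBody => log.info(s"Flow got message: $msgBody") })
  .run()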

ReactiveMongo/Play: `LastError: DatabaseException['<none>']`, while db still updates

Weird problem: while my Play application tries to insert/update records in some MongoDB collections using ReactiveMongo, the operation seems to fail with a mysterious message, but the record does, in fact, get inserted/updated.
More info:
Insert to problematic collections from the mongo console works well
reading from all collections works well
reading and writing to other collections in the same db works well
writing to the problematic collections used to work.
Error message is:
play.api.http.HttpErrorHandlerExceptions$$anon$1: Execution exception[[LastError: DatabaseException['<none>']]]
at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:280)
at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:206)
at play.core.server.netty.PlayRequestHandler$$anonfun$2$$anonfun$apply$1.applyOrElse(PlayRequestHandler.scala:100)
at play.core.server.netty.PlayRequestHandler$$anonfun$2$$anonfun$apply$1.applyOrElse(PlayRequestHandler.scala:99)
at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:344)
at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:343)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at play.api.libs.iteratee.Execution$trampoline$.execute(Execution.scala:70)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
Caused by: reactivemongo.api.commands.LastError: DatabaseException['<none>']
Using ReactiveMongo 0.11.14, Play 2.5.4, Scala 2.11.7, MongoDB 3.4.0.
Thanks!
UPDATE - The mystery thickens!
Based on #Yaroslav_Derman's answer, I added a .recover clause, like so:
collectionRef.flatMap(c =>
  c.update(BSONDocument("_id" -> publicationWithId.id.get), publicationWithId.asInstanceOf[PublicationItem], upsert = true))
  .map(wr => {
    Logger.warn("Write Result: " + wr)
    Logger.warn("wr.inError: " + wr.inError)
    Logger.warn("*************")
    publicationWithId
  }).recover({
    case de: DatabaseException => {
      Logger.warn("DatabaseException: " + de.getMessage())
      Logger.warn("Cause: " + de.getCause())
      Logger.warn("Code: " + de.code)
      publicationWithId
    }
  })
The recover clause does get called. Here's the log:
[info] application - Saving pub t3
[warn] application - *************
[warn] application - Saving publication Publication(Some(BSONObjectID("5848101d7263468d01ff390d")),t3,2016-12-07,desc,auth,filename,None)
[info] application - Resolving database...
[info] application - Resolving database...
[warn] application - DatabaseException: DatabaseException['<none>']
[warn] application - Cause: null
[warn] application - Code: None
So no cause, no code, message is "'<none>'", but still an error. What gives?
I tried to move to 0.12, but that caused some compilation errors across the app, plus I'm not sure that would solve the problem. So I'd like to understand what's wrong first.
UPDATE #2:
Migrated to reactive-mongo 0.12.0. Problem persists.
Problem solved by downgrading to MongoDB 3.2.8. Turns out reactiveMongo 0.12.0 is not compatible with mongoDB 3.4.
Thanks everyone who looked into this.
For Play ReactiveMongo 0.12.0 you can do it like this:
def appsDb = reactiveMongoApi.database.map(_.collection[JSONCollection](DesktopApp.COLLECTION_NAME))

def save(id: String, user: User, body: JsValue) = {
  val version = (body \ "version").as[String]
  val app = DesktopApp(id, version, user)
  appsDb.flatMap(
    _.insert(app)
      .map(_ => app)
      .recover(processError)
  )
}

def processError[T]: PartialFunction[Throwable, T] = {
  case ex: DatabaseException if ex.code.exists(Set(10054, 10056, 10058, 10107, 13435, 13436)) =>
    // Custom exception which is processed in the error handler
    throw new AppException(ResponseCode.ALREADY_EXISTS, "Entity already exists")
  case ex: DatabaseException if ex.code.exists(Set(10057, 15845, 16550)) =>
    // Custom exception which is processed in the error handler
    throw new AppException(ResponseCode.ENTITY_NOT_FOUND, "Entity not found")
  case ex: Exception =>
    // Custom exception which is processed in the error handler
    throw new InternalServerErrorException(ex.getMessage)
}
You can also add logging in the processError method.
LastError was deprecated in 0.11 and replaced by WriteResult.
A LastError does not actually mean an error; it can also represent a successful result. You need to check the inError property of the LastError object to detect whether it is a real error. As I see it, the '<none>' error message gives a good chance that this is not an error.
Here is an example of how it was done in 0.10: http://reactivemongo.org/releases/0.10/documentation/tutorial/write-documents.html
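A minimal sketch of that check, assuming a collection, selector, and modifier in scope (plus an implicit ExecutionContext), with the same Logger used in the question:
// Only treat the result as a failure when inError says so, rather than on any
// LastError-shaped value; the names here are placeholders, not the asker's code.
collection.update(selector, modifier, upsert = true).map { lastError =>
  if (lastError.inError) Logger.warn("real failure: " + lastError)
  else Logger.info("write went through, even though the result is a LastError")
}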

How to use Spark BigQuery Connector locally?

For testing purposes, I would like to use the BigQuery Connector to write Parquet Avro logs to BigQuery. As I'm writing this, there is no way to ingest Parquet directly from the UI, so I'm writing a Spark job to do so.
In Scala, for the time being, the job body is the following:
val events: RDD[RichTrackEvent] =
  readParquetRDD[RichTrackEvent, RichTrackEvent](sc, googleCloudStorageUrl)

val conf = sc.hadoopConfiguration
conf.set("mapred.bq.project.id", "myproject")

// Output parameters
val projectId = conf.get("fs.gs.project.id")
val outputDatasetId = "logs"
val outputTableId = "test"
val outputTableSchema = LogSchema.schema

// Output configuration
BigQueryConfiguration.configureBigQueryOutput(
  conf, projectId, outputDatasetId, outputTableId, outputTableSchema
)
conf.set(
  "mapreduce.job.outputformat.class",
  classOf[BigQueryOutputFormat[_, _]].getName
)

events
  .mapPartitions { items =>
    val gson = new Gson()
    items.map(e => gson.fromJson(e.toString, classOf[JsonObject]))
  }
  .map(x => (null, x))
  .saveAsNewAPIHadoopDataset(conf)
As the BigQueryOutputFormat isn't finding the Google credentials, it falls back to the metadata host to try to discover them, with the following stacktrace:
016-06-13 11:40:53 WARN HttpTransport:993 - exception thrown while executing request
java.net.UnknownHostException: metadata
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:93)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:972)
at com.google.cloud.hadoop.util.CredentialFactory$ComputeCredentialWithRetry.executeRefreshToken(CredentialFactory.java:160)
at com.google.api.client.auth.oauth2.Credential.refreshToken(Credential.java:489)
at com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:207)
at com.google.cloud.hadoop.util.CredentialConfiguration.getCredential(CredentialConfiguration.java:72)
at com.google.cloud.hadoop.io.bigquery.BigQueryFactory.createBigQueryCredential(BigQueryFactory.java:81)
at com.google.cloud.hadoop.io.bigquery.BigQueryFactory.getBigQuery(BigQueryFactory.java:101)
at com.google.cloud.hadoop.io.bigquery.BigQueryFactory.getBigQueryHelper(BigQueryFactory.java:89)
at com.google.cloud.hadoop.io.bigquery.BigQueryOutputCommitter.<init>(BigQueryOutputCommitter.java:70)
at com.google.cloud.hadoop.io.bigquery.BigQueryOutputFormat.getOutputCommitter(BigQueryOutputFormat.java:102)
at com.google.cloud.hadoop.io.bigquery.BigQueryOutputFormat.getOutputCommitter(BigQueryOutputFormat.java:84)
at com.google.cloud.hadoop.io.bigquery.BigQueryOutputFormat.getOutputCommitter(BigQueryOutputFormat.java:30)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1135)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1078)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1078)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:357)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1078)
That is of course expected, but it should be able to use my service account and its key, since GoogleCredential.getApplicationDefault() returns appropriate credentials fetched from the GOOGLE_APPLICATION_CREDENTIALS environment variable.
As the connector seems to read credentials from the Hadoop configuration, what are the keys to set so that it reads GOOGLE_APPLICATION_CREDENTIALS? Is there a way to configure the output format to use a provided GoogleCredential object?
If I understand your question correctly, you might want to set:
<name>mapred.bq.auth.service.account.enable</name>
<name>mapred.bq.auth.service.account.email</name>
<name>mapred.bq.auth.service.account.keyfile</name>
<name>mapred.bq.project.id</name>
<name>mapred.bq.gcs.bucket</name>
Here, mapred.bq.auth.service.account.keyfile should point to the full file path of the older-style "P12" keyfile; alternatively, if you're using the newer "JSON" keyfiles, you should replace the "email" and "keyfile" entries with the single mapred.bq.auth.service.account.json.keyfile key:
<name>mapred.bq.auth.service.account.enable</name>
<name>mapred.bq.auth.service.account.json.keyfile</name>
<name>mapred.bq.project.id</name>
<name>mapred.bq.gcs.bucket</name>
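A minimal sketch of wiring the JSON-keyfile variant into the same Hadoop configuration the question already uses; reading the path from GOOGLE_APPLICATION_CREDENTIALS is my assumption, and the bucket name is hypothetical:
// Hypothetical wiring of the keys above into the job's Hadoop configuration.
// The keyfile path is taken from GOOGLE_APPLICATION_CREDENTIALS purely for convenience.
val conf = sc.hadoopConfiguration
conf.set("mapred.bq.project.id", "myproject")
conf.set("mapred.bq.gcs.bucket", "my-staging-bucket")                 // hypothetical bucket
conf.setBoolean("mapred.bq.auth.service.account.enable", true)
conf.set("mapred.bq.auth.service.account.json.keyfile",
  sys.env("GOOGLE_APPLICATION_CREDENTIALS"))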
You might also want to take a look at https://github.com/spotify/spark-bigquery, which is a much more civilised way of working with BigQuery from Spark. The JSON file passed to setGcpJsonKeyFile in that case is the same one you'd set for mapred.bq.auth.service.account.json.keyfile when using the BQ connector for Hadoop.

Play 2.1 Scala SQLException Connection Timed out waiting for a free available connection

I have been working on this issue for quite a while now and I cannot find a solution...
A web app built with play framework 2.2.1 using h2 db (for dev) and a simple Model package.
I am trying to implement a REST JSON endpoint and the code works... but only once per server instance.
def createOtherModel() = Action(parse.json) { request =>
  request.body \ "name" match {
    case _: JsUndefined =>
      BadRequest(Json.obj("error" -> true,
        "message" -> "Could not match name =(")).as("application/json")
    case name: JsValue =>
      request.body \ "value" match {
        case _: JsUndefined =>
          BadRequest(Json.obj("error" -> true,
            "message" -> "Could not match value =(")).as("application/json")
        case value: JsValue =>
          // this breaks the second time
          val session = ThinkingSession.dummy
          val json = Json.obj(
            "content" -> value,
            "thinkingSession" -> session.id
          )
          Ok(Json.obj("content" -> json)).as("application/json")
      }
      // dangling else branch in the pasted code:
      // BadRequest(Json.obj("error" -> true,
      //   "message" -> "Name was not content =(")).as("application/json")
  }
}
So basically I read the JSON, echo the "value" value, create a model object, and send its id.
The ThinkingSession.dummy function does this:
def all(): List[ThinkingSession] = {
  // Tried explicitly closing connection, no difference
  //val conn = DB.getConnection()
  //try {
  //  DB.withConnection { implicit conn =>
  //    SQL("select * from thinking_session").as(ThinkingSession.DBParser *)
  //  }
  //} finally {
  //  conn.close()
  //}
  DB.withConnection { implicit conn =>
    SQL("select * from thinking_session").as(ThinkingSession.DBParser *)
  }
}

def dummy: ThinkingSession = {
  all().head
}
So this should do a SELECT * FROM thinking_session, create a list of model objects from the result, and return the first element of the list.
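As an aside, if only one row is ever needed, a sketch like the following (using Anorm's single combinator on the same parser) avoids materialising the whole table, though it does not change the connection-pool behaviour shown below:
// Hypothetical variant of dummy that fetches a single row instead of the whole table.
def dummy: ThinkingSession = DB.withConnection { implicit conn =>
  SQL("select * from thinking_session limit 1").as(ThinkingSession.DBParser.single)
}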
This works fine the first time after server start but the second time I get a
play.api.Application$$anon$1: Execution exception[[SQLException: Timed out waiting for a free available connection.]]
at play.api.Application$class.handleError(Application.scala:293) ~[play_2.10.jar:2.2.1]
at play.api.DefaultApplication.handleError(Application.scala:399) [play_2.10.jar:2.2.1]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$2$$anonfun$applyOrElse$3.apply(PlayDefaultUpstreamHandler.scala:261) [play_2.10.jar:2.2.1]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$2$$anonfun$applyOrElse$3.apply(PlayDefaultUpstreamHandler.scala:261) [play_2.10.jar:2.2.1]
at scala.Option.map(Option.scala:145) [scala-library.jar:na]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$2.applyOrElse(PlayDefaultUpstreamHandler.scala:261) [play_2.10.jar:2.2.1]
Caused by: java.sql.SQLException: Timed out waiting for a free available connection.
at com.jolbox.bonecp.DefaultConnectionStrategy.getConnectionInternal(DefaultConnectionStrategy.java:88) ~[bonecp.jar:na]
at com.jolbox.bonecp.AbstractConnectionStrategy.getConnection(AbstractConnectionStrategy.java:90) ~[bonecp.jar:na]
at com.jolbox.bonecp.BoneCP.getConnection(BoneCP.java:553) ~[bonecp.jar:na]
at com.jolbox.bonecp.BoneCPDataSource.getConnection(BoneCPDataSource.java:131) ~[bonecp.jar:na]
at play.api.db.DBApi$class.getConnection(DB.scala:67) ~[play-jdbc_2.10.jar:2.2.1]
at play.api.db.BoneCPApi.getConnection(DB.scala:276) ~[play-jdbc_2.10.jar:2.2.1]
My application.conf (db section)
db.default.driver=org.h2.Driver
db.default.url="jdbc:h2:file:database/[my_db]"
db.default.logStatements=true
db.default.idleConnectionTestPeriod=5 minutes
db.default.connectionTestStatement="SELECT 1"
db.default.maxConnectionAge=0
db.default.connectionTimeout=10000
Initially the only thing set in my config was the connection and the error occurred. I added all the other stuff while reading up on the issue on the web.
What is interesting is that when I use the H2 in-memory db, it works once after server start and after that it fails; when I use the H2 file-system db, it only works once, regardless of the server instance.
Can anyone give me some insight into this issue? I have found some posts about BoneCP problems and tried upgrading to 0.8.0-rc1, but nothing changed... I am at a loss =(
Try to set a maxConnectionAge and idle timeout
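A hedged sketch of what that suggestion could look like in application.conf (key names as used by Play 2.2's BoneCP plugin; the values are arbitrary examples, not recommendations):
db.default.maxConnectionAge=30 minutes
db.default.idleMaxAge=10 minutes
db.default.idleConnectionTestPeriod=5 minutes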
Turns out the error was somewhere else entirely... it was a good ol' stack overflow; have not seen one in a long time. I tried down-voting my question, but it's not possible ^^

Why does calling error or done in a BodyParser's Iteratee make the request hang in Play Framework 2.0?

I am trying to understand the reactive I/O concepts of the Play 2.0 framework. In order to get a better understanding from the start, I decided to skip the framework's helpers for constructing iteratees of different kinds and to write a custom Iteratee from scratch, to be used by a BodyParser to parse a request body.
Starting with the information available in the Iteratees and ScalaBodyParser docs and two presentations about Play's reactive I/O, this is what I came up with:
import play.api.mvc._
import play.api.mvc.Results._
import play.api.libs.iteratee.{Iteratee, Input}
import play.api.libs.concurrent.Promise
import play.api.libs.iteratee.Input.{El, EOF, Empty}
01 object Upload extends Controller {
02 def send = Action(BodyParser(rh => new SomeIteratee)) { request =>
03 Ok("Done")
04 }
05 }
06
07 case class SomeIteratee(state: Symbol = 'Cont, input: Input[Array[Byte]] = Empty, received: Int = 0) extends Iteratee[Array[Byte], Either[Result, Int]] {
08 println(state + " " + input + " " + received)
09
10 def fold[B](
11 done: (Either[Result, Int], Input[Array[Byte]]) => Promise[B],
12 cont: (Input[Array[Byte]] => Iteratee[Array[Byte], Either[Result, Int]]) => Promise[B],
13 error: (String, Input[Array[Byte]]) => Promise[B]
14 ): Promise[B] = state match {
15 case 'Done => { println("Done"); done(Right(received), Input.Empty) }
16 case 'Cont => cont(in => in match {
17 case in: El[Array[Byte]] => copy(input = in, received = received + in.e.length)
18 case Empty => copy(input = in)
19 case EOF => copy(state = 'Done, input = in)
20 case _ => copy(state = 'Error, input = in)
21 })
22 case _ => { println("Error"); error("Some error.", input) }
23 }
24 }
(Remark: all these things are new to me, so please forgive me if something about this is total crap.)
The Iteratee is pretty dumb: it just reads all chunks, sums up the number of received bytes, and prints out some messages. Everything works as expected when I call the controller action with some data: I can observe that all chunks are received by the Iteratee, and when all data has been read it switches to the done state and the request ends.
Now I started to play around with the code because I wanted to see the behaviour for these two cases:
Switching into state error before all input is read.
Switching into state done before all input is read and returning a Result instead of the Int.
My understanding of the documentation mentioned above is that both should be possible, but I am not able to make sense of the observed behaviour. To test the first case, I changed line 17 of the above code to:
17 case in: El[Array[Byte]] => copy(state = if(received + in.e.length > 10000) 'Error else 'Cont, input = in, received = received + in.e.length)
So I just added a condition to switch into the error state if more than 10000 bytes were received. The output I get is this:
'Cont Empty 0
'Cont El([B#38ecece6) 8192
'Error El([B#4ab50d3c) 16384
Error
Error
Error
Error
Error
Error
Error
Error
Error
Error
Error
Then the request hangs forever and never ends. My expectation from the above-mentioned docs was that when I call the error function inside fold of an Iteratee, the processing should be stopped. What is happening instead is that the Iteratee's fold method is called several times after error has been called, and then the request hangs.
When I switch into the done state before reading all input, the behaviour is quite similar. Changing line 15 to:
15 case 'Done => { println("Done with " + input); done(if (input == EOF) Right(received) else Left(BadRequest), Input.Empty) }
and line 17 to:
17 case in: El[Array[Byte]] => copy(state = if(received + in.e.length > 10000) 'Done else 'Cont, input = in, received = received + in.e.length)
produces the following output:
'Cont Empty 0
'Cont El([B#16ce00a8) 8192
'Done El([B#2e8d214a) 16384
Done with El([B#2e8d214a)
Done with El([B#2e8d214a)
Done with El([B#2e8d214a)
Done with El([B#2e8d214a)
and again the request hangs forever.
My main question is why the request is hanging in the above mentioned cases. If anybody could shed light on this I would greatly appreciate it!
Your understanding is perfectly right, and I have just pushed a fix to master:
https://github.com/playframework/Play20/commit/ef70e641d9114ff8225332bf18b4dd995bd39bcc
This fixes both cases, plus exceptions in the Iteratees.
Nice use of copy on a case class for doing an Iteratee, BTW.
Things must have changed with Play 2.1 - Promise is no longer parametric, and this example no longer compiles.