Standalone Spark Task Hangs When Inserting Into DB - scala

I have a standalone Spark 1.4.1 job that runs on a Red Hat box (submitted via spark-submit) and sometimes hangs while inserting data from an RDD. I have auto-commit turned off on the connection and commit the transactions in batches of insertions. Here is what the logs show me before it hangs:
16/03/25 14:00:05 INFO Executor: Finished task 3.0 in stage 138.0 (TID 915). 1847 bytes result sent to driver
16/03/25 14:00:05 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(StatusUpdate(915,FINISHED,java.nio.HeapByteBuffer[pos=0 lim=1847 cap=1
16/03/25 14:00:05 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(StatusUpdate(915,FINISHED,java.nio.HeapByteBuffer[pos=0 lim=1847 cap=1847
16/03/25 14:00:05 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_138, runningTasks: 1
16/03/25 14:00:05 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (0.118 ms) AkkaMessage(StatusUpdate(915,FINISHED,java.nio.HeapByteBuffer[pos=621 li
16/03/25 14:00:05 INFO TaskSetManager: Finished task 3.0 in stage 138.0 (TID 915) in 7407 ms on localhost (23/24)
16/03/25 14:00:05 TRACE DAGScheduler: Checking for newly runnable parent stages
16/03/25 14:00:05 TRACE DAGScheduler: running: Set(ResultStage 138)
16/03/25 14:00:05 TRACE DAGScheduler: waiting: Set()
16/03/25 14:00:05 TRACE DAGScheduler: failed: Set()
16/03/25 14:00:10 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@7ed52306,BlockManagerId(driver, local
16/03/25 14:00:10 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@7ed52306,BlockManagerId(driver, localhos
16/03/25 14:00:10 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (0.099 ms) AkkaMessage(Heartbeat(driver,[Lscala.Tuple2;@7ed52306,BlockManagerId(dri
And then it just repeats the last three lines, with this showing up intermittently:
16/03/25 14:01:04 TRACE HeartbeatReceiver: Checking for hosts with no recent heartbeats in HeartbeatReceiver.
The kicker is that I can't take a look at the web UI due to some firewall issues on these machines. What I noticed is that this issue was more prevalent when I was inserting in batches of 1000 than in batches of 100. This is the Scala code that looks to be the culprit:
//records should have up to INSERT_BATCH_SIZE entries
private def insertStuff(records: Seq[(String, (String, Stuff1, Stuff2, Stuff3))]) {
  if (!records.isEmpty) {
    //get statement used for insertion (instantiated in an array of statements)
    val stmt = stuffInsertArray(//stuff)
    logger.info("Starting insertions on stuff" + table + " for " + time + " with " + records.length + " records")
    try {
      records.foreach(record => {
        //get vals from record
        ...
        //perform sanity checks
        if (//validate stuff) {
          //log stuff because it didn't validate
        } else {
          stmt.setInt(1, //stuff)
          stmt.setLong(2, //stuff)
          ...
          stmt.addBatch()
        }
      })
      //check if connection is still valid
      if (!connInsert.isValid(VALIDATE_CONNECTION_TIMEOUT)) {
        logger.error("Insertion connection is not valid while inserting stuff.")
        throw new RuntimeException(s"Insertion connection not valid while inserting stuff.")
      }
      logger.debug("Stuff insertion executing batch...")
      stmt.executeBatch()
      logger.debug("Stuff insertion execution complete. Committing...")
      //commit insert batch. Either INSERT_BATCH_SIZE insertions planned or the last batch to be done
      insertCommit() //this does the commit and resets some counters
      logger.debug("Stuff insertion commit complete.")
    } catch {
      case e: Exception => throw new RuntimeException(s"insertStuff exception ${e.getMessage}")
    }
  }
}
And here's how it gets called:
//stuffData is an RDD
stuffData.foreachPartition(recordIt => {
  //new instance of the object whose member function we're currently in
  val obj = new Obj(clusterInfo)
  recordIt.grouped(INSERT_BATCH_SIZE).foreach(records => obj.insertStuff(records))
})
I put in all the extra logging and connection checking just to isolate the issue, but since I log for every batch of insertions, the logs get convoluted. If I serialize the insertions, the issue still persists. Any idea why the last task (out of 24) doesn't finish? Thanks.
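Since the web UI is unreachable, one way to tell whether the hang is in the JDBC layer rather than in Spark is to bound the batch execution so a stuck executeBatch() surfaces as an exception instead of a silent hang. A minimal sketch against the insertStuff body above (setQueryTimeout is standard JDBC, but whether it is honored depends on the driver; the 60-second value is illustrative):

stmt.setQueryTimeout(60) // seconds; fail instead of hanging forever
try {
  stmt.executeBatch()
} catch {
  case e: java.sql.SQLTimeoutException =>
    logger.error("Batch insert timed out; likely lock contention on the DB side: " + e.getMessage)
    throw e
}

Alternatively, a jstack of the JVM while it hangs should show whether the task thread is blocked inside executeBatch(), which would point at the database rather than Spark.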

Related

Exactly once semantic with spring kafka

I'm trying to test my exactly-once configuration to make sure all the configs I set are correct and the behavior is as I expect. I seem to be encountering a problem with duplicate sends:
public static void main(String[] args) {
    MessageProducer producer = new ProducerBuilder()
            .setBootstrapServers("kafka:9992")
            .setKeySerializerClass(StringSerializer.class)
            .setValueSerializerClass(StringSerializer.class)
            .setProducerEnableIdempotence(true).build();
    MessageConsumer consumer = new ConsumerBuilder()
            .setBootstrapServers("kafka:9992")
            .setIsolationLevel("read_committed")
            .setTopics("someTopic2")
            .setGroupId("bla")
            .setKeyDeserializerClass(StringDeserializer.class)
            .setValueDeserializerClass(MapDeserializer.class)
            .setConsumerMessageLogic(new ConsumerMessageLogic() {
                @Override
                public void onMessage(ConsumerRecord cr, Acknowledgment acknowledgment) {
                    producer.sendMessage(new TopicPartition("someTopic2", cr.partition()),
                            new OffsetAndMetadata(cr.offset() + 1), "something1", "im in transaction", cr.key());
                    acknowledgment.acknowledge();
                }
            }).build();
    consumer.start();
}
this is my "test", you can assume the builder puts the right configuration.
ConsumerMessageLogic is a class that handles the "process" part of the read-process-write that the exactly once semantic is supporting
inside the producer class i have a send message method like so:
public void sendMessage(TopicPartition topicPartition, OffsetAndMetadata offsetAndMetadata, String sendToTopic, V message, PK partitionKey) {
    try {
        KafkaRecord<PK, V> partitionAndMessagePair = producerMessageLogic.prepareMessage(topicPartition.topic(), partitionKey, message);
        if (kafkaTemplate.getProducerFactory().transactionCapable()) {
            kafkaTemplate.executeInTransaction(operations -> {
                sendMessage(message, partitionKey, sendToTopic, partitionAndMessagePair, operations);
                operations.sendOffsetsToTransaction(
                        Map.of(topicPartition, offsetAndMetadata), "bla");
                return true;
            });
        } else {
            sendMessage(message, partitionKey, topicPartition.topic(), partitionAndMessagePair, kafkaTemplate);
        }
    } catch (Exception e) {
        failureHandler.onFailure(partitionKey, message, e);
    }
}
I create my consumer like so:
/**
 * Start the message consumer.
 * The record event will be delegated to onMessage().
 */
public void start() {
    initConsumerMessageListenerContainer();
    container.start();
}

/**
 * Initialize the kafka message listener.
 */
private void initConsumerMessageListenerContainer() {
    // start an acknowledging message listener to allow manual commit
    messageListener = consumerMessageLogic::onMessage;
    // start and initialize the consumer container
    container = initContainer(messageListener);
    // sets the number of consumers; the topic partitions will be divided among the consumers
    container.setConcurrency(springConcurrency);
    springContainerPollTimeoutOpt.ifPresent(p -> container.getContainerProperties().setPollTimeout(p));
    if (springAckMode != null) {
        container.getContainerProperties().setAckMode(springAckMode);
    }
}

private ConcurrentMessageListenerContainer<PK, V> initContainer(AcknowledgingMessageListener<PK, V> messageListener) {
    return new ConcurrentMessageListenerContainer<>(
            consumerFactory(props),
            containerProperties(messageListener));
}
When I create my producer, I create it with a UUID as the transaction prefix, like so:
public ProducerFactory<PK, V> producerFactory(boolean isTransactional) {
    ProducerFactory<PK, V> res = new DefaultKafkaProducerFactory<>(props);
    if (isTransactional) {
        ((DefaultKafkaProducerFactory<PK, V>) res).setTransactionIdPrefix(UUID.randomUUID().toString());
        ((DefaultKafkaProducerFactory<PK, V>) res).setProducerPerConsumerPartition(true);
    }
    return res;
}
Now, after everything is set up, I bring up 2 instances on a topic with 2 partitions; each instance gets 1 partition of the consumed topic.
I send a message and wait in debug for the transaction timeout (to simulate loss of connection) in instance A. Once the timeout passes, the other instance (instance B) automatically processes the record and sends it to the target topic, since a rebalance occurred.
So far so good.
Now when I release the breakpoint on instance A, it says it's rebalancing and couldn't commit, but I still see another output record in my destination topic.
My expectation was that instance A wouldn't continue its work once I released the breakpoint, as the record was already processed.
Am I doing something wrong? Can this scenario be achieved?
Edit 2:
After Gary's remarks about executeInTransaction, I still get the duplicate record: if I freeze one of the instances until the timeout and release it after the other instance has processed the record, the frozen instance then processes and produces the same record to the output topic...
public static void main(String[] args) {
    MessageProducer producer = new ProducerBuilder()
            .setBootstrapServers("kafka:9992")
            .setKeySerializerClass(StringSerializer.class)
            .setValueSerializerClass(StringSerializer.class)
            .setProducerEnableIdempotence(true).build();
    MessageConsumer consumer = new ConsumerBuilder()
            .setBootstrapServers("kafka:9992")
            .setIsolationLevel("read_committed")
            .setTopics("someTopic2")
            .setGroupId("bla")
            .setKeyDeserializerClass(StringDeserializer.class)
            .setValueDeserializerClass(MapDeserializer.class)
            .setConsumerMessageLogic(new ConsumerMessageLogic() {
                @Override
                public void onMessage(ConsumerRecord cr, Acknowledgment acknowledgment) {
                    producer.sendMessage("something1", "im in transaction");
                }
            }).build();
    consumer.start(producer.getProducerFactory());
}
The new sendMessage method in the producer, without executeInTransaction:
public void sendMessage(V message, PK partitionKey, String topicName) {
    try {
        KafkaRecord<PK, V> partitionAndMessagePair = producerMessageLogic.prepareMessage(topicName, partitionKey, message);
        sendMessage(message, partitionKey, topicName, partitionAndMessagePair, kafkaTemplate);
    } catch (Exception e) {
        failureHandler.onFailure(partitionKey, message, e);
    }
}
I also changed the consumer container creation to have a transaction manager with the same ProducerFactory, as suggested:
/**
 * Initialize the kafka message listener.
 */
private void initConsumerMessageListenerContainer(ProducerFactory<PK, V> producerFactory) {
    // start an acknowledging message listener to allow manual commit
    acknowledgingMessageListener = consumerMessageLogic::onMessage;
    // start and initialize the consumer container
    container = initContainer(acknowledgingMessageListener, producerFactory);
    // sets the number of consumers; the topic partitions will be divided among the consumers
    container.setConcurrency(springConcurrency);
    springContainerPollTimeoutOpt.ifPresent(p -> container.getContainerProperties().setPollTimeout(p));
    if (springAckMode != null) {
        container.getContainerProperties().setAckMode(springAckMode);
    }
}

private ConcurrentMessageListenerContainer<PK, V> initContainer(AcknowledgingMessageListener<PK, V> messageListener, ProducerFactory<PK, V> producerFactory) {
    return new ConcurrentMessageListenerContainer<>(
            consumerFactory(props),
            containerProperties(messageListener, producerFactory));
}

@NonNull
private ContainerProperties containerProperties(MessageListener<PK, V> messageListener, ProducerFactory<PK, V> producerFactory) {
    ContainerProperties containerProperties = new ContainerProperties(topics);
    containerProperties.setMessageListener(messageListener);
    containerProperties.setTransactionManager(new KafkaTransactionManager<>(producerFactory));
    return containerProperties;
}
My expectation was that the broker, once it receives the processed record from the frozen instance, would know that the record was already handled by another instance, as it contains the exact same metadata (or does it? I mean, the PID will be different, but should it be?).
Maybe the scenario I'm looking for isn't even supported by the current exactly-once support Kafka and Spring provide.
If I have 2 instances of read-process-write, that means I have 2 producers with 2 different PIDs.
Now when I freeze one of the instances, and the unfrozen instance gets the record-processing responsibility due to a rebalance, it will send the record with its own PID and a sequence number in the metadata.
Now when I release the frozen instance, it sends the same record but with its own PID, so there's no way the broker will know it's a duplicate...
Am I wrong? How can I avoid this scenario? I thought the rebalance stops the instance and doesn't let it complete its processing (where it produces the duplicate record), since it no longer has responsibility for that record.
Adding the logs:
Frozen instance: you can see the freeze time at 10:53:34; I released it at 10:54:02 (rebalance timeout is 10 seconds):
2020-06-16 10:53:34,393 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296]
Created new Producer: CloseSafeProducer
[delegate=org.apache.kafka.clients.producer.KafkaProducer@5c7f5906]
2020-06-16 10:53:34,394 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296]
CloseSafeProducer
[delegate=org.apache.kafka.clients.producer.KafkaProducer@5c7f5906]
beginTransaction()
2020-06-16 10:53:34,395 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.doBegin:149] Created
Kafka transaction on producer [CloseSafeProducer
[delegate=org.apache.kafka.clients.producer.KafkaProducer@5c7f5906]]
2020-06-16 10:54:02,157 INFO [${sys:spring.application.name}] [kafka-
coordinator-heartbeat-thread | bla]
[o.a.k.c.c.i.AbstractCoordinator.:] [Consumer clientId=consumer-bla-1,
groupId=bla] Group coordinator X.X.X.X:9992 (id: 2147482646 rack:
null) is unavailable or invalid, will attempt rediscovery
2020-06-16 10:54:02,181 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1]
[o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296]
Sending offsets to transaction: {someTopic2-
0=OffsetAndMetadata{offset=23, leaderEpoch=null, metadata=''}}
2020-06-16 10:54:02,189 INFO [${sys:spring.application.name}] [kafka-
producer-network-thread | producer-b76e8aba-8149-48f8-857b-
a19195f5a20abla.someTopic2.0] [i.i.k.s.p.SimpleSuccessHandler.:] Sent
message=[im in transaction] with offset=[252] to topic something1
2020-06-16 10:54:02,193 INFO [${sys:spring.application.name}] [kafka-
producer-network-thread | producer-b76e8aba-8149-48f8-857b-
a19195f5a20abla.someTopic2.0] [o.a.k.c.p.i.TransactionManager.:]
[Producer clientId=producer-b76e8aba-8149-48f8-857b-
a19195f5a20abla.someTopic2.0, transactionalId=b76e8aba-8149-48f8-857b-
a19195f5a20abla.someTopic2.0] Discovered group coordinator
X.X.X.X:9992 (id: 1001 rack: null)
2020-06-16 10:54:02,263 INFO [${sys:spring.application.name}] [kafka-
coordinator-heartbeat-thread | bla]
[o.a.k.c.c.i.AbstractCoordinator.:] [Consumer clientId=consumer-bla-1,
groupId=bla] Discovered group coordinator 192.168.144.1:9992 (id:
2147482646 rack: null)
2020-06-16 10:54:02,295 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.processCommit:740]
Initiating transaction commit
2020-06-16 10:54:02,296 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296]
CloseSafeProducer
[delegate=org.apache.kafka.clients.producer.KafkaProducer@5c7f5906]
commitTransaction()
2020-06-16 10:54:02,299 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1]
[o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296]
Commit list: {}
2020-06-16 10:54:02,301 INFO [${sys:spring.application.name}]
[consumer-0-C-1] [o.a.k.c.c.i.AbstractCoordinator.:] [Consumer
clientId=consumer-bla-1, groupId=bla] Attempt to heartbeat failed for
since member id consumer-bla-1-b3ad1c09-ad06-4bc4-a891-47a2288a830f is
not valid.
2020-06-16 10:54:02,302 INFO [${sys:spring.application.name}]
[consumer-0-C-1] [o.a.k.c.c.i.ConsumerCoordinator.:] [Consumer
clientId=consumer-bla-1, groupId=bla] Giving away all assigned
partitions as lost since generation has been reset,indicating that
consumer is no longer part of the group
2020-06-16 10:54:02,302 INFO [${sys:spring.application.name}]
[consumer-0-C-1] [o.a.k.c.c.i.ConsumerCoordinator.:] [Consumer
clientId=consumer-bla-1, groupId=bla] Lost previously assigned
partitions someTopic2-0
2020-06-16 10:54:02,302 INFO [${sys:spring.application.name}]
[consumer-0-C-1] [o.s.k.l.ConcurrentMessageListenerContainer.info:279]
bla: partitions lost: [someTopic2-0]
2020-06-16 10:54:02,303 INFO [${sys:spring.application.name}]
[consumer-0-C-1] [o.s.k.l.ConcurrentMessageListenerContainer.info:279]
bla: partitions revoked: [someTopic2-0]
2020-06-16 10:54:02,303 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1]
[o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296]
Commit list: {}
The regular instance, which takes over the partition and produces the record after the rebalance:
2020-06-16 10:53:46,536 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296]
Created new Producer: CloseSafeProducer
[delegate=org.apache.kafka.clients.producer.KafkaProducer@26c76153]
2020-06-16 10:53:46,537 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296]
CloseSafeProducer
[delegate=org.apache.kafka.clients.producer.KafkaProducer@26c76153]
beginTransaction()
2020-06-16 10:53:46,539 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.doBegin:149] Created
Kafka transaction on producer [CloseSafeProducer
[delegate=org.apache.kafka.clients.producer.KafkaProducer@26c76153]]
2020-06-16 10:53:46,556 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1]
[o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296]
Sending offsets to transaction: {someTopic2-
0=OffsetAndMetadata{offset=23, leaderEpoch=null, metadata=''}}
2020-06-16 10:53:46,563 INFO [${sys:spring.application.name}] [kafka-
producer-network-thread | producer-1d8e74d3-8986-4458-89b7-
6d3e5756e213bla.someTopic2.0] [i.i.k.s.p.SimpleSuccessHandler.:] Sent
message=[im in transaction] with offset=[250] to topic something1
2020-06-16 10:53:46,566 INFO [${sys:spring.application.name}] [kafka-
producer-network-thread | producer-1d8e74d3-8986-4458-89b7-
6d3e5756e213bla.someTopic2.0] [o.a.k.c.p.i.TransactionManager.:]
[Producer clientId=producer-1d8e74d3-8986-4458-89b7-
6d3e5756e213bla.someTopic2.0, transactionalId=1d8e74d3-8986-4458-89b7-
6d3e5756e213bla.someTopic2.0] Discovered group coordinator
X.X.X.X:9992 (id: 1001 rack: null)
2020-06-16 10:53:46,668 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1] [o.s.k.t.KafkaTransactionManager.processCommit:740]
Initiating transaction commit
2020-06-16 10:53:46,669 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1] [o.s.k.c.DefaultKafkaProducerFactory.debug:296]
CloseSafeProducer
[delegate=org.apache.kafka.clients.producer.KafkaProducer@26c76153]
commitTransaction()
2020-06-16 10:53:46,672 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1]
[o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296]
Commit list: {}
2020-06-16 10:53:51,673 DEBUG [${sys:spring.application.name}]
[consumer-0-C-1]
[o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer.debug:296]
Received: 0 records
I noticed they both note the exact same offset to commit:
Sending offsets to transaction: {someTopic2-0=OffsetAndMetadata{offset=23, leaderEpoch=null, metadata=''}}
I thought that when they try to commit the exact same thing, the broker would abort one of the transactions...
I also noticed that if I reduce transaction.timeout.ms to just 2 seconds, it doesn't abort the transaction no matter how long I freeze the instance in debug...
Maybe the timer of transaction.timeout.ms starts only after I send the message?
You must not use executeInTransaction at all - see its Javadocs; it is used when there is no active transaction or if you explicitly don't want an operation to participate in an existing transaction.
You need to add a KafkaTransactionManager to the listener container; it must have a reference to same ProducerFactory as the template.
Then, the container will start the transaction and, if successful, send the offset to the transaction.
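For illustration, a minimal sketch of that wiring (in Scala for brevity, though the question's code is Java; the producer configs mirror the question's builder, and everything else uses stock spring-kafka API). The transaction manager shares the template's ProducerFactory, the container begins the transaction and sends the consumed offsets to it, and the listener just sends on the template, with no executeInTransaction:

import java.util.UUID
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.clients.producer.ProducerConfig
import org.apache.kafka.common.serialization.StringSerializer
import org.springframework.kafka.core.{DefaultKafkaProducerFactory, KafkaTemplate}
import org.springframework.kafka.listener.{ContainerProperties, MessageListener}
import org.springframework.kafka.transaction.KafkaTransactionManager

// Producer configs mirroring the question's builder
val producerProps = new java.util.HashMap[String, Object]()
producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9992")
producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer])
producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer])

// One ProducerFactory shared by the template and the transaction manager
val producerFactory = new DefaultKafkaProducerFactory[String, String](producerProps)
producerFactory.setTransactionIdPrefix(UUID.randomUUID().toString)
val template = new KafkaTemplate[String, String](producerFactory)

val containerProps = new ContainerProperties("someTopic2")
// The container starts the transaction; the send below participates in it.
containerProps.setTransactionManager(new KafkaTransactionManager(producerFactory))
containerProps.setMessageListener(new MessageListener[String, String] {
  override def onMessage(cr: ConsumerRecord[String, String]): Unit =
    template.send("something1", cr.key(), "im in transaction")
})

Note that broker-side zombie fencing is keyed on transactional.id, so two instances using different random UUID prefixes (as in the question) are not fenced against each other.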

Why Spark structured streaming job is not terminating even after raising exception

I am raising a custom exception to test failure in my structured streaming job, as below. I see the query gets terminated, but I'm not able to understand why the driver script is not failing with a non-zero exit code.
streamingDF.writeStream
  .trigger(Trigger.ProcessingTime(10000L))
  .foreachBatch {
    (batchDF: DataFrame, batchId: Long) => {
      val transformedDF: DataFrame = DoSomeProcessing(batchDF)
      if (batchId == 1) {
        throw new Exception("Custom Exception as batchId is 1")
      }
I get the below trace on my console, but the driver script is not exiting and no new logs are printed to the console.
Exception in thread "main" org.apache.spark.sql.streaming.StreamingQueryException: Custom Exception as batchId is 1
=== Streaming Query ===
Identifier: [id = 6f4c3b4c-bc30-46fe-93ef-8378c23380ab, runId = 1241cb37-493b-4882-ab28-9df8a8c6fb1a]
Current Committed Offsets: ...
Current Available Offsets: ...
Current State: ACTIVE
Thread State: RUNNABLE
Logical Plan:
RepartitionByExpression [timestamp#12], 10
...
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
Caused by: java.lang.Exception: Custom Exception as batchId is 1
at MySteamingApp$$anonfun$startSparkStructuredStreaming$1.apply(MySteamingApp.scala:61)
at MySteamingApp$$anonfun$startSparkStructuredStreaming$1.apply(MySteamingApp.scala:57)
at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:534)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:532)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:531)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
... 1 more
I think the number of allowed task failures was configured higher:
spark.task.maxFailures (default 4): Number of failures of any particular task before giving up on the job. The total number of failures spread across different tasks will not cause the job to fail; a particular task has to fail this number of attempts. Should be greater than or equal to 1. Number of allowed retries = this value - 1.
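For reference, it is an ordinary Spark conf; a sketch of lowering it so a single task failure kills the job (the value 1 is illustrative, not a recommendation; the app name is taken from the stack trace above):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("MySteamingApp")
  .config("spark.task.maxFailures", "1") // default is 4
  .getOrCreate()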
Further have a look at Is there a way to dynamically stop Spark Structured Streaming?
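On the non-zero exit code specifically: awaitTermination() rethrows the query's terminating failure as a StreamingQueryException, so one way to make the driver fail visibly is a sketch like the following (assuming you keep the query handle returned by start(); processBatch is the question's inline batch logic factored into a function value, named here only for illustration):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.{StreamingQueryException, Trigger}

// The question's batch logic, as a function value
val processBatch: (DataFrame, Long) => Unit = (batchDF, batchId) => {
  val transformedDF = DoSomeProcessing(batchDF)
  if (batchId == 1) throw new Exception("Custom Exception as batchId is 1")
}

val query = streamingDF.writeStream
  .trigger(Trigger.ProcessingTime(10000L))
  .foreachBatch(processBatch)
  .start()

try {
  query.awaitTermination() // rethrows the exception that terminated the query
} catch {
  case e: StreamingQueryException =>
    e.printStackTrace()
    sys.exit(1) // force a non-zero driver exit code
}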

pyspark on emr with boto3, copy of s3 object result with Futures timed out after [100000 milliseconds]

I have a PySpark application that transforms CSV to Parquet, and before this happens I'm copying some S3 objects from one bucket to another.
PySpark with Spark 2.4, EMR 5.27, maximizeResourceAllocation set to true.
I have various CSV file sizes, from 80 KB to 500 MB.
Nonetheless, my EMR cluster (it doesn't fail locally with spark-submit) fails at 70% completion on a file that is 166 MB (a previous one at 480 MB succeeded).
The job is simple:
def organise_adwords_csv():
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(S3_ORIGIN_RAW_BUCKET)
    for obj in bucket.objects.filter(Prefix=S3_ORIGIN_ADWORDS_RAW + "/"):
        key = obj.key
        copy_source = {
            'Bucket': S3_ORIGIN_RAW_BUCKET,
            'Key': key
        }
        key_tab = obj.key.split("/")
        if len(key_tab) < 5:
            print("continuing from length", obj)
            continue
        file_name = ''.join(key_tab[len(key_tab)-1:len(key_tab)])
        if file_name == '':
            print("continuing", obj)
            continue
        table = file_name.split("_")[1].replace("-", "_")
        new_path = "{0}/{1}/{2}".format(S3_DESTINATION_ORDERED_ADWORDS_RAW_PATH, table, file_name)
        print("new_path", new_path)  # <- the last print will end here
        try:
            s3.meta.client.copy(copy_source, S3_DESTINATION_RAW_BUCKET, new_path)
            print("copy done")
        except Exception as e:
            print(e)
            print("an exception occurred while copying")

if __name__ == '__main__':
    organise_adwords_csv()
    print("copy Final done")  # <- never printed
    spark = SparkSession.builder.appName("adwords_transform") \
    ...
But in stdout, no errors/exceptions are showing.
In the stderr logs:
19/10/09 16:16:57 INFO ApplicationMaster: Waiting for spark context initialization...
19/10/09 16:18:37 ERROR ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
19/10/09 16:18:37 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
)
19/10/09 16:18:37 INFO ShutdownHookManager: Shutdown hook called
I'm completely blind; I don't understand what is failing or why. How can I figure that out? Locally it works like a charm (but super slowly, of course).
Edit:
After many tries I can confirm that the function
s3.meta.client.copy(copy_source, S3_DESTINATION_RAW_BUCKET, new_path)
makes the EMR cluster time out, even though it has already processed 80% of the files.
Does anyone have a recommendation about this?
s3.meta.client.copy(copy_source, S3_DESTINATION_RAW_BUCKET, new_path)
This will fail for any source object larger than 5 GB. Please use multipart upload in AWS. See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#multipartupload

mapWithState assertion failed: Block rdd_45_0 is not locked for reading

I'm using mapWithState() to count UVs in my Spark Streaming application. After mapWithState I get a DStream and call foreachRDD on it. Inside foreachRDD there is an rdd.foreachPartition over the partition's Iterator, which is then consumed with foreach inside a Future, but I get an error in the Future.
Error log here:
> 17/07/27 10:19:54.0447 INFO Executor: Finished task 1.0 in stage 52.0 (TID 422). 1878 bytes result sent to driver
> 17/07/27 10:19:54.0454 DEBUG BlockManagerSlaveEndpoint: removing RDD 47
> 17/07/27 10:19:54.0454 INFO BlockManager: Removing RDD 47
> 17/07/27 10:19:54.0455 DEBUG BlockManagerSlaveEndpoint: Done removing RDD 47, response is 0
> 17/07/27 10:19:54.0455 DEBUG BlockManagerSlaveEndpoint: Sent response: 0 to 192.168.1.30:43968
> 17/07/27 10:19:54.0456 DEBUG BlockManagerSlaveEndpoint: removing RDD 46
> 17/07/27 10:19:54.0456 INFO BlockManager: Removing RDD 46
> 17/07/27 10:19:54.0456 DEBUG BlockManagerSlaveEndpoint: Done removing RDD 46, response is 0
> 17/07/27 10:19:54.0456 DEBUG BlockManagerSlaveEndpoint: Sent response: 0 to 192.168.1.30:43968
> 17/07/27 10:19:54.0461 WARN BoneCP: Thread close connection monitoring has been enabled. This will negatively impact on your
> performance. Only enable this option for debugging purposes!
> 17/07/27 10:19:54.0873 WARN ClickAnalysis$: before parpair data with threadName=ForkJoinPool-1-worker-5 and threadId=46
> 17/07/27 10:19:54.0873 WARN ClickAnalysis$: before parpair data with threadName=ForkJoinPool-1-worker-3 and threadId=50
> 17/07/27 10:19:54.0875 WARN ClickAnalysis$: come into foreach data with threadName=ForkJoinPool-1-worker-5 and threadId=46
> 17/07/27 10:19:54.0875 WARN ClickAnalysis$: come into foreach data with threadName=ForkJoinPool-1-worker-3 and threadId=50
> Exception: java.util.concurrent.ExecutionException: Boxed Error
> at scala.concurrent.impl.Promise$.resolver(Promise.scala:55)
> at scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:47)
> at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:244)
> at scala.concurrent.Promise$class.complete(Promise.scala:55)
> at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:23)
> at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.AssertionError: assertion failed: Block rdd_45_0 is not locked for reading
> at scala.Predef$.assert(Predef.scala:170)
> at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:299)
> at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:720)
> at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:516)
> at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:46)
> at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:35)
> at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at ClickAnalysis$.doPrepairCamAndGmtUvPs(ClickAnalysis.scala:383)
> at ClickAnalysis$$anonfun$8.apply(ClickAnalysis.scala:353)
> at ClickAnalysis$$anonfun$8.apply(ClickAnalysis.scala:345)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> ... 5 more
And my code here:
val mapState3 = pairs.mapWithState(StateSpec.function(mappingFunction).timeout(Duration(uvExpireTime.toLong))).map(x => (x._1, x._2.estimatedSize.toLong))
mapState3.foreachRDD { rdd =>
  rdd.foreachPartition { uvRecord =>
    if (!uvRecord.isEmpty) {
      doUpdateUV(uvRecord)
    }
  }
}

def doUpdateUV(data: Iterator[(String, Long)]): Unit = {
  if (data != null) {
    val f = Future {
      var connection: Connection = null
      try {
        connection = ConnectionPool.getConnection.getOrElse(null)
        connection.setAutoCommit(false)
        val camPs: PreparedStatement = connection.prepareStatement(updateUvCamCnt_sql)
        val gmtPs: PreparedStatement = connection.prepareStatement(updateUvGmtCnt_sql)
        logger.warn("before parpair data with threadName=" + Thread.currentThread().getName + " and threadId=" + Thread.currentThread().getId)
        for (uvRecord <- data) {
          logger.warn("come into foreach data with threadName=" + Thread.currentThread().getName + " and threadId=" + Thread.currentThread().getId)
        }
        logger.warn("come into batch update with threadName=" + Thread.currentThread().getName + " and threadId=" + Thread.currentThread().getId)
        camPs.executeBatch()
        gmtPs.executeBatch()
        connection.commit()
        camPs.close()
        gmtPs.close()
      } catch {
        case exception: Exception =>
          logger.error("Error in batchUpdate " + exception.getMessage + "-----------------------" + ExceptionUtils.getStackTrace(exception) + "-----------------------------")
          throw exception
      } finally {
        ConnectionPool.closeConnection(connection)
      }
      "success"
    }
    f onSuccess {
      case result => println(s"Success: $result")
    }
    f onFailure {
      case t => println(s"Exception: ${ExceptionUtils.getStackTrace(t)}")
    }
  }
}
I look forward to any useful solution to this problem.
I had the same issue:
java.lang.AssertionError: assertion failed: Block rdd_xx_xx is not locked for reading
I fixed it by just adding more clusters. It seems to have been a memory issue.
Based on what I have read in the various JIRAs, this is a race condition. Multiple attempts at fixing it have been checked in. I am experiencing this issue in 2.4.4, and it looks like 3.0.0 might have fixed it.
For me it is happening during a call to df.rdd.isEmpty().
If you want more information, here are the resources I found:
First JIRA on the issue
Later JIRA on the same issue (duplicate), but a later Spark version
More details on why this is a race condition
Very old JIRA where this seems to have first appeared
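If upgrading isn't an option, a common workaround for the pattern in the question is to drain the partition iterator on the task thread before handing the data to the Future: the CompletionIterator in the stack trace then releases its block read lock on the thread that acquired it, not on a ForkJoinPool worker. A sketch against the question's code (doUpdateUV as defined above):

rdd.foreachPartition { uvRecord =>
  // Materialize on the task thread so the read lock on the cached block
  // (e.g. rdd_45_0) is released here, not inside the Future's thread.
  val snapshot = uvRecord.toVector
  if (snapshot.nonEmpty) {
    doUpdateUV(snapshot.iterator)
  }
}

Note this buffers a whole partition in memory, so it trades the race for higher memory use.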

akka-http no stack trace or details on error

I have a structure which can basically be summarized as:
outside user makes a REST request to the akka-http server
akka-http makes a request (query?) to some data source using asynchttpclient
akka-http transforms the result from asynchttpclient and serves it back to the user
At some point I am getting an error from akka which tells me almost nothing. This error happens right after asynchttpclient returns some results. (I can in fact print the results to the log at this point; they are there, parsed from JSON etc., but akka has already errored out.)
Even at debug logging level I get no decipherable error message or stack trace from akka.
The only message I get is:
2017-03-24 17:22:55 INFO CompanyRepository:111 - search company with name:"somecompanyname"
2017-03-24 17:22:55 INFO CompanyRepository:73 - [QUERY TIME]: 527ms
[ERROR] [03/24/2017 17:22:55.951] [company-api-system-akka.actor.default-dispatcher-3] [akka.actor.ActorSystemImpl(company-api-system)] Error during processing of request: 'requirement failed'. Completing with 500 Internal Server Error response.
This error message is the only thing I get. Relevant parts of my config:
akka {
  loglevel = "DEBUG"
  # edit -- tested with sl4jlogger with no change
  #loggers = ["akka.event.slf4j.Slf4jLogger"]
  #logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"

  parsing {
    max-content-length = 800m
    max-chunk-size = 100m
  }

  server {
    server-header = akka-http/${akka.http.version}
    idle-timeout = 120 s
    request-timeout = 120 s
    bind-timeout = 10s
    max-connections = 1024
    pipelining-limit = 32
    verbose-error-messages = on
  }

  client {
    user-agent-header = akka-http/${akka.http.version}
  }

  host-connection-pool {
    max-connections = 4
  }
}

akka.http.routing {
  verbose-error-messages = on
}
Does anyone know if I can make akka spit out more details about what/where the error is occurring?
Edit: I realized I do NOT get this same error on result sets which are smaller in size. <- ignore
Edit 2:
Added akka.loglevel = DEBUG; it spits out a lot more noise, but still no detail about the actual error.
Converted asynchttpclient to akka quickly, to rule out AHC.
I already had a wrapper around my query to time it; I added some logging there, trying to pinpoint when exactly the error happens.
def queryTimer[R <: Future[Any]](block: => R): R = {
  val t0 = System.currentTimeMillis()
  val result = block
  result.onComplete { maybeResult =>
    val t1 = System.currentTimeMillis()
    logger.info("[QUERY TIME]: " + (t1 - t0) + "ms")
    maybeResult match {
      case Success(some) =>
        logger.info("successful feature:")
        logger.info(FormattedString.prettyPrint(some))
      case Failure(someFailure) =>
        logger.info("failed feature:")
        logger.debug(FormattedString.prettyPrint(someFailure))
    }
  }
  result
}
resulting log:
2017-03-28 13:19:10 INFO CompanyRepository:111 - search company with name:"some company"
[DEBUG] [03/28/2017 13:19:10.497] [company-api-system-akka.actor.default-dispatcher-2] [EventStream(akka://xca-api-actor-system)] logger log1-Logging$DefaultLogger started
[DEBUG] [03/28/2017 13:19:10.497] [company-api-system-akka.actor.default-dispatcher-2] [EventStream(akka://xca-api-actor-system)] Default Loggers started
[DEBUG] [03/28/2017 13:19:10.613] [company-api-system-akka.actor.default-dispatcher-2] [AkkaSSLConfig(akka://xca-api-actor-system)] Initializing AkkaSSLConfig extension...
[DEBUG] [03/28/2017 13:19:10.613] [company-api-system-akka.actor.default-dispatcher-2] [AkkaSSLConfig(akka://xca-api-actor-system)] buildHostnameVerifier: created hostname verifier: com.typesafe.sslconfig.ssl.DefaultHostnameVerifier#779e2339
[DEBUG] [03/28/2017 13:19:10.633] [xca-api-actor-system-akka.actor.default-dispatcher-3] [akka://xca-api-actor-system/user/pool-master/PoolInterfaceActor-0] (Re-)starting host connection pool to localhost:27474
[DEBUG] [03/28/2017 13:19:10.727] [xca-api-actor-system-akka.actor.default-dispatcher-3] [akka://xca-api-actor-system/system/IO-TCP/selectors/$a/0] Resolving localhost before connecting
[DEBUG] [03/28/2017 13:19:10.740] [xca-api-actor-system-akka.actor.default-dispatcher-4] [akka://xca-api-actor-system/system/IO-DNS] Resolution request for localhost from Actor[akka://xca-api-actor-system/system/IO-TCP/selectors/$a/0#-815754478]
[DEBUG] [03/28/2017 13:19:10.749] [xca-api-actor-system-akka.actor.default-dispatcher-4] [akka://xca-api-actor-system/system/IO-TCP/selectors/$a/0] Attempting connection to [localhost/127.0.0.1:27474]
[DEBUG] [03/28/2017 13:19:10.751] [xca-api-actor-system-akka.actor.default-dispatcher-4] [akka://xca-api-actor-system/system/IO-TCP/selectors/$a/0] Connection established to [localhost:27474]
2017-03-28 13:19:10 INFO CompanyRepository:73 - [QUERY TIME]: 376ms
2017-03-28 13:19:10 INFO CompanyRepository:77 - successful feature:
[ERROR] [03/28/2017 13:19:10.896] [company-api-system-akka.actor.default-dispatcher-7] [akka.actor.ActorSystemImpl(company-api-system)] Error during processing of request: 'requirement failed'. Completing with 500 Internal Server Error response.
2017-03-28 13:19:10 INFO CompanyRepository:78 - SearchResult(List(
(prettyprint output here!!! lots and lots of legit results, JSON parsed successfully into a bunch of case classes)
As you can see, my logging format and akka's are different; the ERROR is coming from akka with no details, while everything looks like it's working.
Edit 3: logs with sleeps in between calls.
New query timer function with sleeps:
def queryTimer[R <: Future[Any]](block: => R): R = {
  val t0 = System.currentTimeMillis()
  val result = block
  result.onComplete { maybeResult =>
    val t1 = System.currentTimeMillis()
    logger.info("[QUERY TIME]: " + (t1 - t0) + "ms")
    maybeResult match {
      case Success(some) =>
        Thread.sleep(500)
        logger.info("successful feature:")
        Thread.sleep(500)
        logger.info(FormattedString.prettyPrint(some))
        Thread.sleep(500)
        logger.info("we are there!")
      case Failure(someFailure) =>
        logger.info("failed feature:")
        logger.debug(FormattedString.prettyPrint(someFailure))
    }
  }
  result
}
Logs with sleeps:
[DEBUG] [03/30/2017 11:11:58.629] [xca-api-actor-system-akka.actor.default-dispatcher-7] [akka://xca-api-actor-system/system/IO-TCP/selectors/$a/0] Attempting connection to [localhost/127.0.0.1:27474]
[DEBUG] [03/30/2017 11:11:58.631] [xca-api-actor-system-akka.actor.default-dispatcher-7] [akka://xca-api-actor-system/system/IO-TCP/selectors/$a/0] Connection established to [localhost:27474]
11:11:59.442 [pool-2-thread-1] DEBUG o.a.netty.channel.DefaultChannelPool - Closed 0 connections out of 0 in 0 ms
11:11:59.496 [pool-1-thread-1] DEBUG o.a.netty.channel.DefaultChannelPool - Closed 0 connections out of 0 in 0 ms
11:12:00.250 [ForkJoinPool-2-worker-15] INFO c.s.s.r.neo4j.CompanyRepository - [QUERY TIME]: 1880ms
[ERROR] [03/30/2017 11:12:00.265] [company-api-system-akka.actor.default-dispatcher-3] [akka.actor.ActorSystemImpl(company-api-system)] Error during processing of request: 'requirement failed'. Completing with 500 Internal Server Error response.
11:12:00.543 [pool-2-thread-1] DEBUG o.a.netty.channel.DefaultChannelPool - Closed 0 connections out of 0 in 0 ms
11:12:00.597 [pool-1-thread-1] DEBUG o.a.netty.channel.DefaultChannelPool - Closed 0 connections out of 0 in 0 ms
11:12:00.752 [ForkJoinPool-2-worker-15] INFO c.s.s.r.neo4j.CompanyRepository - successful feature:
11:12:01.645 [pool-2-thread-1] DEBUG o.a.netty.channel.DefaultChannelPool - Closed 0 connections out of 0 in 0 ms
11:12:01.697 [pool-1-thread-1] DEBUG o.a.netty.channel.DefaultChannelPool - Closed 0 connections out of 0 in 0 ms
11:12:01.750 [ForkJoinPool-2-worker-15] INFO c.s.s.r.neo4j.CompanyRepository - SearchResult(List( "lots of legit result here"
11:12:02.281 [ForkJoinPool-2-worker-15] INFO c.s.s.r.neo4j.CompanyRepository - we are there!
Edit 4 and solution!
Apparently the default exception handler does not print a stack trace! Overriding the exception handler with a very basic catch-all:
implicit def myExceptionHandler: ExceptionHandler =
  ExceptionHandler {
    case e: Exception => {
      logger.info("---------------- exception log start")
      logger.error(e.getMessage, e)
      logger.error("cause", e.getCause)
      logger.error("cause", e.getStackTraceString)
      logger.info(FormattedString.prettyPrint(e))
      logger.info("---------------- exception log end")
      Directives.complete("server made a boo boo")
    }
  }
results in a stack trace that befuddles the sh*t out of me!!
11:42:04.634 [company-api-system-akka.actor.default-dispatcher-2] INFO c.stepweb.scarifgate.CompanyApiApp$ - ---------------- exception log start
11:42:04.640 [company-api-system-akka.actor.default-dispatcher-2] ERROR c.stepweb.scarifgate.CompanyApiApp$ - requirement failed
java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:212) ~[scala-library-2.11.8.jar:na]
at spray.json.BasicFormats$StringJsonFormat$.write(BasicFormats.scala:121) ~[spray-json_2.11-1.3.2.jar:na]
at spray.json.BasicFormats$StringJsonFormat$.write(BasicFormats.scala:119) ~[spray-json_2.11-1.3.2.jar:na]
at spray.json.ProductFormats$class.productElement2Field(ProductFormats.scala:46) ~[spray-json_2.11-1.3.2.jar:na]
at com.stepweb.scarifgate.services.CompanyService.productElement2Field(CompanyService.scala:14) ~[classes/:na]
at spray.json.ProductFormatsInstances$$anon$3.write(ProductFormatsInstances.scala:73) ~[spray-json_2.11-1.3.2.jar:na]
at spray.json.ProductFormatsInstances$$anon$3.write(ProductFormatsInstances.scala:68) ~[spray-json_2.11-1.3.2.jar:na]
at spray.json.PimpedAny.toJson(package.scala:39) ~[spray-json_2.11-1.3.2.jar:na]
at spray.json.CollectionFormats$$anon$1$$anonfun$write$1.apply(CollectionFormats.scala:26) ~[spray-json_2.11-1.3.2.jar:na]
at spray.json.CollectionFormats$$anon$1$$anonfun$write$1.apply(CollectionFormats.scala:26) ~[spray-json_2.11-1.3.2.jar:na]
at scala.collection.immutable.List.map(List.scala:273) ~[scala-library-2.11.8.jar:na]
at spray.json.CollectionFormats$$anon$1.write(CollectionFormats.scala:26) ~[spray-json_2.11-1.3.2.jar:na]
at spray.json.CollectionFormats$$anon$1.write(CollectionFormats.scala:25) ~[spray-json_2.11-1.3.2.jar:na]
at spray.json.ProductFormats$class.productElement2Field(ProductFormats.scala:46) ~[spray-json_2.11-1.3.2.jar:na]
at com.stepweb.scarifgate.services.CompanyService.productElement2Field(CompanyService.scala:14) ~[classes/:na]
at spray.json.ProductFormatsInstances$$anon$1.write(ProductFormatsInstances.scala:30) ~[spray-json_2.11-1.3.2.jar:na]
at spray.json.ProductFormatsInstances$$anon$1.write(ProductFormatsInstances.scala:26) ~[spray-json_2.11-1.3.2.jar:na]
at akka.http.scaladsl.marshallers.sprayjson.SprayJsonSupport$$anonfun$sprayJsonMarshaller$1.apply(SprayJsonSupport.scala:62) ~[akka-http-spray-json_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshallers.sprayjson.SprayJsonSupport$$anonfun$sprayJsonMarshaller$1.apply(SprayJsonSupport.scala:62) ~[akka-http-spray-json_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshalling.Marshaller$$anonfun$compose$1$$anonfun$apply$15.apply(Marshaller.scala:73) ~[akka-http_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshalling.Marshaller$$anonfun$compose$1$$anonfun$apply$15.apply(Marshaller.scala:73) ~[akka-http_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshalling.Marshaller$$anon$1.apply(Marshaller.scala:92) ~[akka-http_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshalling.GenericMarshallers$$anonfun$optionMarshaller$1$$anonfun$apply$1.apply(GenericMarshallers.scala:19) ~[akka-http_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshalling.GenericMarshallers$$anonfun$optionMarshaller$1$$anonfun$apply$1.apply(GenericMarshallers.scala:18) ~[akka-http_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshalling.Marshaller$$anon$1.apply(Marshaller.scala:92) ~[akka-http_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshalling.PredefinedToResponseMarshallers$$anonfun$fromStatusCodeAndHeadersAndValue$1$$anonfun$apply$5.apply(PredefinedToResponseMarshallers.scala:58) ~[akka-http_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshalling.PredefinedToResponseMarshallers$$anonfun$fromStatusCodeAndHeadersAndValue$1$$anonfun$apply$5.apply(PredefinedToResponseMarshallers.scala:57) ~[akka-http_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshalling.Marshaller$$anon$1.apply(Marshaller.scala:92) ~[akka-http_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshalling.Marshaller$$anonfun$compose$1$$anonfun$apply$15.apply(Marshaller.scala:73) ~[akka-http_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshalling.Marshaller$$anonfun$compose$1$$anonfun$apply$15.apply(Marshaller.scala:73) ~[akka-http_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshalling.Marshaller$$anon$1.apply(Marshaller.scala:92) ~[akka-http_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshalling.ToResponseMarshallable$$anonfun$1$$anonfun$apply$1.apply(ToResponseMarshallable.scala:29) ~[akka-http_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshalling.ToResponseMarshallable$$anonfun$1$$anonfun$apply$1.apply(ToResponseMarshallable.scala:29) ~[akka-http_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshalling.Marshaller$$anon$1.apply(Marshaller.scala:92) ~[akka-http_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshalling.GenericMarshallers$$anonfun$futureMarshaller$1$$anonfun$apply$3$$anonfun$apply$4.apply(GenericMarshallers.scala:33) ~[akka-http_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.marshalling.GenericMarshallers$$anonfun$futureMarshaller$1$$anonfun$apply$3$$anonfun$apply$4.apply(GenericMarshallers.scala:33) ~[akka-http_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.util.FastFuture$.akka$http$scaladsl$util$FastFuture$$strictTransform$1(FastFuture.scala:41) ~[akka-http-core_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.util.FastFuture$$anonfun$transformWith$extension1$1.apply(FastFuture.scala:51) [akka-http-core_2.11-10.0.0.jar:10.0.0]
at akka.http.scaladsl.util.FastFuture$$anonfun$transformWith$extension1$1.apply(FastFuture.scala:50) [akka-http-core_2.11-10.0.0.jar:10.0.0]
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) [scala-library-2.11.8.jar:na]
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55) [akka-actor_2.11-2.4.16.jar:na]
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91) [akka-actor_2.11-2.4.16.jar:na]
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91) [akka-actor_2.11-2.4.16.jar:na]
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91) [akka-actor_2.11-2.4.16.jar:na]
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) [scala-library-2.11.8.jar:na]
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90) [akka-actor_2.11-2.4.16.jar:na]
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39) [akka-actor_2.11-2.4.16.jar:na]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415) [akka-actor_2.11-2.4.16.jar:na]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [scala-library-2.11.8.jar:na]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [scala-library-2.11.8.jar:na]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library-2.11.8.jar:na]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library-2.11.8.jar:na]
11:42:04.640 [company-api-system-akka.actor.default-dispatcher-2] ERROR c.stepweb.scarifgate.CompanyApiApp$ - cause
11:42:04.641 [company-api-system-akka.actor.default-dispatcher-2] ERROR c.stepweb.scarifgate.CompanyApiApp$ - cause
11:42:04.644 [company-api-system-akka.actor.default-dispatcher-2] INFO c.stepweb.scarifgate.CompanyApiApp$ - java.lang.IllegalArgumentException: requirement failed
11:42:04.644 [company-api-system-akka.actor.default-dispatcher-2] INFO c.stepweb.scarifgate.CompanyApiApp$ - ---------------- exception log end
So... the exception is caused here, in spray.json.BasicFormats:
implicit object StringJsonFormat extends JsonFormat[String] {
  def write(x: String) = {
    require(x ne null) // <-----------------------------------
    JsString(x)
  }
  def read(value: JsValue) = value match {
    case JsString(x) => x
    case x => deserializationError("Expected String as JsString, but got " + x)
  }
}
which sort of means one of the strings in these thousands of lines of response is null. Special thanks go to the laziness of using that require without a message. Debugging which string is null, and where, will be a nightmare, but I still think akka should fail in a better way.
Well, the default akka-http ExceptionHandler doesn't print a stack trace; it prints only the error message (or its class name if the message is empty). But you can provide a custom exception handler that will print anything you want (i.e. the stack trace in your example).
Some examples of how to make a custom exception handler are provided at GitHub: ExceptionHandlerExamplesSpec.spec.
The simplest way in your case seems to be to define your own custom implicit exception handler:
import akka.http.scaladsl.model._
import akka.http.scaladsl.server._
import scala.util.control.NonFatal
import StatusCodes._
import Directives._

implicit def myExceptionHandler: ExceptionHandler =
  ExceptionHandler {
    case NonFatal(e) =>
      logger.error(s"Exception $e at\n${e.getStackTraceString}")
      complete(HttpResponse(InternalServerError, entity = "Internal Server Error"))
  }
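For completeness, the implicit handler is applied when the route is sealed, which happens automatically at the top level when the route is bound; a minimal sketch of explicit sealing, assuming a route value and the usual ActorSystem/materializer are in scope:

import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Route

val sealedRoute = Route.seal(route) // picks up the implicit ExceptionHandler (and RejectionHandler)
Http().bindAndHandle(sealedRoute, "0.0.0.0", 8080)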
Try setting the loggers as well - from your configuration it seems they're not set. Something like:
akka {
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  loglevel = "DEBUG"
  logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"
}
Also, consider using akka-slf4j along with its recommended logging backend, logback.
This should make akka spit out more details.