Spring WebClient reactor.netty.internal.shaded.reactor.pool.PoolAcquireTimeoutException - webclient

I am using Spring WebClient to call a REST service. The code for the POST call is shown below.
Mono<ClientResponse> response = client.post()
        .uri(uriBuilder -> uriBuilder.build())
        .headers(httpHeaders -> httpHeaders.setAll(getHeaders()))
        .body(BodyInserters.fromPublisher(Mono.just(message), String.class))
        .exchange();

response.subscribe(clientResponse -> {
    System.out.println(clientResponse.statusCode());
});
I am getting the following exception after posting continuously for a while (roughly 2-3 million requests in 5 minutes).
[ parallel-3] r.c.s.Schedulers : Scheduler worker in group main failed with an uncaught exception
reactor.core.Exceptions$ErrorCallbackNotImplemented: reactor.netty.internal.shaded.reactor.pool.PoolAcquireTimeoutException: Pool#acquire(Duration) has been pending for more than the configured timeout of 45000ms
Caused by: reactor.netty.internal.shaded.reactor.pool.PoolAcquireTimeoutException: Pool#acquire(Duration) has been pending for more than the configured timeout of 45000ms
at reactor.netty.internal.shaded.reactor.pool.AbstractPool$Borrower.run(AbstractPool.java:317) ~[reactor-netty-0.9.0.RELEASE.jar!/:0.9.0.RELEASE]
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Request to POST https://myrest-service-c12-1-lb-125370128.us-west-2.elb.amazonaws.com/antenna [DefaultWebClient]
Stack trace:
at reactor.netty.internal.shaded.reactor.pool.AbstractPool$Borrower.run(AbstractPool.java:317) ~[reactor-netty-0.9.0.RELEASE.jar!/:0.9.0.RELEASE]
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68) [reactor-core-3.3.0.RELEASE.jar!/:3.3.0.RELEASE]
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28) [reactor-core-3.3.0.RELEASE.jar!/:3.3.0.RELEASE]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
It looks like the pool is exhausted and I need to limit the number of requests. Could someone help me solve this issue? Thanks in advance.

Seems to be a bug in reactor-netty that has recently been fixed: https://github.com/reactor/reactor-netty/issues/1012
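If upgrading is not immediately possible, you can also relieve pressure on the pool by tuning it when building the WebClient. Below is a minimal sketch using reactor-netty's ConnectionProvider; it is not from the linked issue, the pool name and numbers are placeholders, and the builder API requires a reasonably recent reactor-netty:
ConnectionProvider provider = ConnectionProvider.builder("custom")
        .maxConnections(500)                            // cap on pooled connections
        .pendingAcquireTimeout(Duration.ofSeconds(60))  // how long an acquire may stay pending
        .pendingAcquireMaxCount(1000)                   // how many acquires may queue up
        .build();

WebClient client = WebClient.builder()
        .clientConnector(new ReactorClientHttpConnector(HttpClient.create(provider)))
        .build();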

Related

What causes the io.netty.util.internal.OutOfDirectMemoryError in Spring Cloud Gateway?

We are using Spring Cloud Gateway to proxy requests to the backend system. Recently we found the following error. Does anyone know what could possibly cause the following exception?
2019-05-21 09:18:24.792 WARN An exception 'io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 2046820359, max: 2058354688)' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 2046820359, max: 2058354688)
at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:655)
at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:610)
at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:769)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:745)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:226)
at io.netty.buffer.PoolArena.reallocate(PoolArena.java:397)
at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:120)
at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:299)
at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:278)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1103)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1096)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1087)
at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:93)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:677)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:612)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:529)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:491)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
at java.lang.Thread.run(Unknown Source)
spring-boot-version: 2.1.3.RELEASE
spring-cloud-version: Greenwich.SR1
reactor-netty: 0.8.5.RELEASE

Milo: Connect to (public) OPC-UA-Server

First of all: I'm a newbie to OPCUA. :)
I'm trying to connect a Milo client to our server but don't really understand what's going wrong. The sample client and server work fine together, but when I try to connect the sample client to one of the public OPC UA test servers I get these exceptions:
15:48:34.729 [ua-netty-event-loop-0] DEBUG org.eclipse.milo.opcua.stack.client.handlers.UaTcpClientAcknowledgeHandler - Sent Hello message on channel=[id: 0xc22800c2, L:/10.22.19.217:58947 - R:opcua.demo-this.com/52.233.134.134:51210].
15:48:34.729 [ua-netty-event-loop-0] WARN io.netty.channel.DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.io.IOException: An existing connection was forcibly closed by the remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:898)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:485)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:399)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:371)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
at java.lang.Thread.run(Thread.java:745)
15:48:39.612 [ForkJoinPool.commonPool-worker-1] DEBUG org.eclipse.milo.opcua.stack.client.ClientChannelManager - Channel bootstrap failed: timed out waiting for acknowledge
org.eclipse.milo.opcua.stack.core.UaException: timed out waiting for acknowledge
at org.eclipse.milo.opcua.stack.client.handlers.UaTcpClientAcknowledgeHandler.lambda$startHelloTimeout$4(UaTcpClientAcknowledgeHandler.java:156)
at org.eclipse.milo.opcua.stack.client.handlers.UaTcpClientAcknowledgeHandler$$Lambda$27/469017260.run(Unknown Source)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:581)
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:655)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:367)
at java.lang.Thread.run(Thread.java:745)
15:48:39.613 [main] ERROR org.eclipse.milo.examples.client.ClientExampleRunner - Error running example: UaException: status=Bad_Timeout, message=timed out waiting for acknowledge
java.util.concurrent.ExecutionException: UaException: status=Bad_Timeout, message=timed out waiting for acknowledge
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1887)
at org.eclipse.milo.examples.client.ClientExampleRunner.createClient(ClientExampleRunner.java:56)
at org.eclipse.milo.examples.client.ClientExampleRunner.run(ClientExampleRunner.java:103)
at org.eclipse.milo.examples.client.BrowseExample.main(BrowseExample.java:40)
Caused by: org.eclipse.milo.opcua.stack.core.UaException: timed out waiting for acknowledge
at org.eclipse.milo.opcua.stack.client.handlers.UaTcpClientAcknowledgeHandler.lambda$startHelloTimeout$4(UaTcpClientAcknowledgeHandler.java:156)
at org.eclipse.milo.opcua.stack.client.handlers.UaTcpClientAcknowledgeHandler$$Lambda$27/469017260.run(Unknown Source)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:581)
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:655)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:367)
at java.lang.Thread.run(Thread.java:745)
15:48:42.842 [threadDeathWatcher-2-1] DEBUG io.netty.buffer.PoolThreadCache - Freed 2 thread-local buffer(s) from thread: ua-netty-event-loop-0
I took the sample code, removed the certificate/key pair, and changed the URL to opc.tcp://opcua.demo-this.com:51210/UA/SampleServer since the public server doesn't need authorization:
SecurityPolicy securityPolicy = clientExample.getSecurityPolicy();
EndpointDescription[] endpoints = UaTcpStackClient
        .getEndpoints("opc.tcp://opcua.demo-this.com:51210/UA/SampleServer").get();
EndpointDescription endpoint = Arrays.stream(endpoints)
        .filter(e -> e.getSecurityPolicyUri().equals(securityPolicy.getSecurityPolicyUri()))
        .findFirst().orElseThrow(() -> new Exception("no desired endpoints returned"));
logger.info("Using endpoint: {} [{}]", endpoint.getEndpointUrl(), securityPolicy);
loader.load();
OpcUaClientConfig config = OpcUaClientConfig.builder()
        .setApplicationName(LocalizedText.english("eclipse milo opc-ua client"))
        .setApplicationUri("urn:eclipse:milo:examples:client")
        //.setCertificate(loader.getClientCertificate())
        //.setKeyPair(loader.getClientKeyPair())
        .setEndpoint(endpoint)
        .setIdentityProvider(clientExample.getIdentityProvider())
        .setRequestTimeout(uint(5000))
        .build();
return new OpcUaClient(config);
What am I missing?
Greetings and thanks in advance :)
Buried in that stack trace is the real problem: UaException: timed out waiting for acknowledge.
Maybe your firewall or network setup is blocking it, or maybe the server didn't send it back, but the problem is that the client never received the Acknowledge message in response to its Hello.
FWIW, I can run the ReadExample against that public server with no issue. In ReadExample I overrode getSecurityPolicy() and returned SecurityPolicy.None and in ClientExampleRunner just replaced the endpoint URL.
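A minimal sketch of that override, following the Milo client example interface (exact method names can differ between Milo versions):
@Override
public SecurityPolicy getSecurityPolicy() {
    // SecurityPolicy.None: no signing or encryption, so no client certificate or key pair is needed
    return SecurityPolicy.None;
}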

RESTEasy Exception: RESTEASY003770: Response is committed, can't handle exception

I'm getting a strange exception from my RESTEasy server. This happens when a particularly large response is being returned to the client.
The exception occurs after my code has returned the result object to RESTEasy, so it's all inside the RESTEasy layer. The server is Tomcat.
Small responses are fine, but large ones trigger the error. This happens whether I return JSON or XML, and it is triggered by the size of the response, not by the content of the objects I'm returning.
I've searched around but haven't found anything helpful yet, and I'm pretty much lost at this point.
org.jboss.resteasy.spi.UnhandledException: RESTEASY003770: Response is committed, can't handle exception
at org.jboss.resteasy.core.SynchronousDispatcher.writeException(SynchronousDispatcher.java:174)
at org.jboss.resteasy.core.SynchronousDispatcher.writeResponse(SynchronousDispatcher.java:478)
at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:422)
at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:209)
at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:221)
at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:56)
at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:51)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:522)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1095)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:672)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1502)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1458)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipe
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:393)
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:426)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:339)
at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:418)
at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:406)
at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:97)
at org.jboss.resteasy.plugins.server.servlet.HttpServletResponseWrapper$DeferredOutputStream.write(HttpServletResponseWrapper.java:46)
at org.jboss.resteasy.util.CommitHeaderOutputStream.write(CommitHeaderOutputStream.java:71)
at org.jboss.resteasy.util.DelegatingOutputStream.write(DelegatingOutputStream.java:48)
at com.fasterxml.jackson.core.json.UTF8JsonGenerator._flushBuffer(UTF8JsonGenerator.java:2003)
at com.fasterxml.jackson.core.json.UTF8JsonGenerator.close(UTF8JsonGenerator.java:1049)
at org.jboss.resteasy.plugins.providers.jackson.ResteasyJackson2Provider.writeTo(ResteasyJackson2Provider.java:209)
at org.jboss.resteasy.core.interception.AbstractWriterInterceptorContext.writeTo(AbstractWriterInterceptorContext.java:131)
at org.jboss.resteasy.core.interception.ServerWriterInterceptorContext.writeTo(ServerWriterInterceptorContext.java:60)
at org.jboss.resteasy.core.interception.AbstractWriterInterceptorContext.proceed(AbstractWriterInterceptorContext.java:120)
at org.jboss.resteasy.plugins.interceptors.encoding.GZIPEncodingInterceptor.aroundWriteTo(GZIPEncodingInterceptor.java:100)
at org.jboss.resteasy.core.interception.AbstractWriterInterceptorContext.proceed(AbstractWriterInterceptorContext.java:124)
at org.jboss.resteasy.plugins.interceptors.encoding.ServerContentEncodingAnnotationFilter.aroundWriteTo(ServerContentEncodingAnnotationFilter.java:60)
at org.jboss.resteasy.core.interception.AbstractWriterInterceptorContext.proceed(AbstractWriterInterceptorContext.java:124)
at org.jboss.resteasy.core.ServerResponseWriter.writeNomapResponse(ServerResponseWriter.java:98)
at org.jboss.resteasy.core.SynchronousDispatcher.writeResponse(SynchronousDispatcher.java:473)
... 27 more
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at org.apache.tomcat.util.net.NioChannel.write(NioChannel.java:124)
at org.apache.tomcat.util.net.NioBlockingSelector.write(NioBlockingSelector.java:101)
at org.apache.tomcat.util.net.NioSelectorPool.write(NioSelectorPool.java:172)
at org.apache.coyote.http11.InternalNioOutputBuffer.writeToSocket(InternalNioOutputBuffer.java:139)
at org.apache.coyote.http11.InternalNioOutputBuffer.flushBuffer(InternalNioOutputBuffer.java:244)
at org.apache.coyote.http11.InternalNioOutputBuffer.addToBB(InternalNioOutputBuffer.java:189)
at org.apache.coyote.http11.InternalNioOutputBuffer.access$000(InternalNioOutputBuffer.java:41)
at org.apache.coyote.http11.InternalNioOutputBuffer$SocketOutputBuffer.doWrite(InternalNioOutputBuffer.java:320)
at org.apache.coyote.http11.filters.IdentityOutputFilter.doWrite(IdentityOutputFilter.java:93)
at org.apache.coyote.http11.AbstractOutputBuffer.doWrite(AbstractOutputBuffer.java:256)
at org.apache.coyote.Response.doWrite(Response.java:501)
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:388)
... 47 more
Update: I actually had this happen again! Scott's answer below covered the first case, but the second time was due to inserting a null value for a response header. RESTEasy accepted the null header value at the time I inserted it, but then blew up when it tried to serialize the response after exiting from my code.
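A hypothetical illustration of that second failure mode (the resource method, header name, and helper calls are invented for the example; only the null header value matters):
@GET
@Produces(MediaType.APPLICATION_JSON)
public Response getReport() {
    String requestId = lookupRequestId(); // returned null in my case
    return Response.ok(buildLargePayload())
            .header("X-Request-Id", requestId) // null value accepted here, serialization failed later
            .build();
}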
OK, having been through this a couple of times now: I've personally seen this behaviour when there is an issue with the front-end reverse proxy, if you are using one. For example, I've seen issues when using nginx where the system was deployed and run with different users. A permission issue on temporary directories means you might not see the problem until the response hits a certain size. For example (it's one line; the line breaks are for readability):
2017-04-20T17:46:40.03987 [nginx] 2017/04/20 13:46:40 [crit] 64537#0:
*14195 open() "/usr/local/var/run/nginx/proxy_temp/6/43/0000000436"
failed (13: Permission denied) while reading upstream, client: 10.0.0.1,
server: foo.bar.com, request: "GET /this/awesome/endpoint HTTP/1.1",
upstream: "http://127.0.0.1:9999/this/awesome/endpoint", host: "foo.bar.com"
Caused by: org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipe
It seems the client is closing the connection before the response is completely sent to it. Can you increase the client timeout and then check?

Spring Batch Partitioned Step stopped after hours from when a non-skippable exception occured

I want to verify a behaviour of Spring Batch...
When running a partitioned step of a Job I got this exception:
org.springframework.batch.core.JobExecutionException: Partition handler returned an unsuccessful step
at org.springframework.batch.core.partition.support.PartitionStep.doExecute(PartitionStep.java:111)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:137)
at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:64)
at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:60)
at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:152)
at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:131)
at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:135)
at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:301)
at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:134)
at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:127)
Only this, with no previous exceptions that might have triggered it, and then a FAILED result for my job.
When searching the logs from the previous hours and days, I noticed these exceptions (3 of them, in different partitioned steps):
06/05/2014 21:50:51.996 [Step3TaskExecutor-12] [] ERROR AbstractStep - Line (222) Encountered an error executing the step
org.springframework.retry.RetryException: Non-skippable exception in recoverer while processing; nested exception is java.io.FileNotFoundException: Source 'blabla....pdf' does not exist
at org.springframework.batch.core.step.item.FaultTolerantChunkProcessor$2.recover(FaultTolerantChunkProcessor.java:281)
at org.springframework.retry.support.RetryTemplate.handleRetryExhausted(RetryTemplate.java:435)
at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:304)
at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:188)
at org.springframework.batch.core.step.item.BatchRetryTemplate.execute(BatchRetryTemplate.java:217)
at org.springframework.batch.core.step.item.FaultTolerantChunkProcessor.transform(FaultTolerantChunkProcessor.java:290)
at org.springframework.batch.core.step.item.SimpleChunkProcessor.process(SimpleChunkProcessor.java:192)
at org.springframework.batch.core.step.item.ChunkOrientedTasklet.execute(ChunkOrientedTasklet.java:75)
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:395)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:267)
at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:77)
at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:368)
at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215)
at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:144)
at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:253)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:139)
at org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler$1.call(TaskExecutorPartitionHandler.java:136)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.FileNotFoundException: Source 'blabla....pdf' does not exist
It seemed weird to me that the job continued to run after those exceptions, so I'm thinking that only the slave steps in which this exception occurred failed, and the master step waited for the rest of the slave steps to finish in order to return the first error mentioned.
Can someone verify that this is the problem? It's been driving me crazy for days.
That is the correct behavior for Spring Batch's partitioning. The PartitionHandler in the master step evaluates the results of all the slave steps at once, when they have all returned (or timed out). With regard to what happened in the slaves, those logged errors look like the leading cause to me. However, the definitive answer should be in the job repository (assuming you're using a database-backed implementation). When a step fails (even a partitioned slave), the exception is stored there.
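Not part of the original answer, but a minimal sketch of pulling that stored failure out of the job repository with a JobExplorer (the injected explorer and the jobExecutionId are assumed to be available):
@Autowired
private JobExplorer jobExplorer;

public void printSlaveFailures(long jobExecutionId) {
    JobExecution jobExecution = jobExplorer.getJobExecution(jobExecutionId);
    for (StepExecution stepExecution : jobExecution.getStepExecutions()) {
        if (stepExecution.getStatus() == BatchStatus.FAILED) {
            // the exit description holds the exception recorded for the failed (partitioned) step
            System.out.println(stepExecution.getStepName() + ": "
                    + stepExecution.getExitStatus().getExitDescription());
        }
    }
}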
I got this error when my CPU utilization was very high.
When I added this bean to my configuration, it worked for me:
@Bean
public TaskExecutor asyncTaskExecutor() {
    SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
    taskExecutor.setConcurrencyLimit(numberOfCores);
    return taskExecutor;
}
I use it here:
@Bean
public Step masterStep() throws Exception {
    return stepBuilderFactory.get("masterStep")
            .partitioner(slaveStep().getName(), partitioner())
            .step(slaveStep())
            .gridSize(gridSize)
            .taskExecutor(asyncTaskExecutor())
            .build();
}

Broken pipe exceptions from Redis client Jedis

We have Redis client calls from a Play Framework application. These Redis calls are made from an actor using an Akka scheduler. The scheduler runs every 60 seconds and makes Redis calls along with other JDBC calls. After the scheduler has run for a few minutes we start seeing the following in the log files, and the app stops responding to any Redis client calls. This is my first encounter with Redis, so any pointers or help are appreciated. Our Redis configuration:
redis.host = localhost
redis.port = 6379
redis.timeout = 10
redis.pool.maxActive =110
redis.pool.maxIdle = 50
redis.pool.maxWait = 3000
redis.pool.testOnBorrow = true
redis.pool.testOnReturn = true
redis.pool.testWhileIdle = true
redis.pool.timeBetweenEvictionRunsMillis = 60000
redis.pool.numTestsPerEvictionRun = 10
Exception details:
redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketException: Broken pipe
at redis.clients.jedis.Connection.flush(Connection.java:69) ~[redis.clients.jedis-2.3.0.jar:na]
at redis.clients.jedis.JedisPubSub.subscribe(JedisPubSub.java:58) ~[redis.clients.jedis-2.3.0.jar:na]
............
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) [com.typesafe.akka.akka-actor_2.10-2.2.0.jar:2.2.0]
at akka.actor.ActorCell.invoke(ActorCell.scala:456) [com.typesafe.akka.akka-actor_2.10-2.2.0.jar:2.2.0]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) [com.typesafe.akka.akka-actor_2.10-2.2.0.jar:2.2.0]
at akka.dispatch.Mailbox.run(Mailbox.scala:219) [com.typesafe.akka.akka-actor_2.10-2.2.0.jar:2.2.0]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) [com.typesafe.akka.akka-actor_2.10-2.2.0.jar:2.2.0]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [org.scala-lang.scala-library-2.10.3.jar:na]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [org.scala-lang.scala-library-2.10.3.jar:na]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [org.scala-lang.scala-library-2.10.3.jar:na]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [org.scala-lang.scala-library-2.10.3.jar:na]
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.7.0_51]
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113) ~[na:1.7.0_51]
at java.net.SocketOutputStream.write(SocketOutputStream.java:159) ~[na:1.7.0_51]
at redis.clients.util.RedisOutputStream.flushBuffer(RedisOutputStream.java:31) ~[redis.clients.jedis-2.3.0.jar:na]
at redis.clients.util.RedisOutputStream.flush(RedisOutputStream.java:223) ~[redis.clients.jedis-2.3.0.jar:na]
at redis.clients.jedis.Connection.flush(Connection.java:67) ~[redis.clients.jedis-2.3.0.jar:na]
... 15 common frames omitted
The problem is with the timeout: the client you used to do the subscribe timed out / got disconnected.
I recently encountered a similar problem with Redis, and the cause was that I was not returning the Jedis resource after subscribing, so the connections were leaking and the app stopped responding after a couple of connections. So don't forget to return the Jedis resource after whatever you do with it:
Jedis j = play.Play.application().plugin(RedisPlugin.class).jedisPool().getResource();
j.subscribe(listener, redisChannel);
play.Play.application().plugin(RedisPlugin.class).jedisPool().returnResource(j);
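A slightly safer variant of the same idea (same Play RedisPlugin assumed): return the resource in a finally block so the connection always goes back to the pool, even if subscribe() throws. In Jedis 2.x a connection that died mid-call should instead be returned with returnBrokenResource.
JedisPool pool = play.Play.application().plugin(RedisPlugin.class).jedisPool();
Jedis j = pool.getResource();
try {
    j.subscribe(listener, redisChannel);
} finally {
    pool.returnResource(j);
}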