Issue in Spring Integration mail IMAP idle inbound adapter - email

We use int-mail:imap-idle-channel-adapter to fetch emails from an Aliyun mailbox. The Spring Integration version is 5.3.1.
The adapter config is as follows:
<int-mail:inbound-channel-adapter id="mailAdapter"
        store-uri="imaps://XXX%40weikayun.com:***@imap.mxhichina.com/INBOX"
        channel="InboundChannel"
        auto-startup="true"
        should-delete-messages="false"
        should-mark-messages-as-read="true"
        search-term-strategy="unseenSearchTermStrategy"
        java-mail-properties="javaMailProperties"
        simple-content="true">
    <int:poller max-messages-per-poll="4" fixed-rate="30000"/>
</int-mail:inbound-channel-adapter>
The javaMailProperties config is as follows:
<util:properties id="javaMailProperties">
    <prop key="mail.imap.socketFactory.class">javax.net.ssl.SSLSocketFactory</prop>
    <prop key="mail.imap.socketFactory.fallback">false</prop>
    <prop key="mail.store.protocol">imaps</prop>
    <prop key="mail.transport.protocol">smtps</prop>
    <prop key="mail.smtps.auth">true</prop>
    <prop key="mail.debug">false</prop>
    <prop key="mail.smtp.starttls.enable">false</prop>
    <prop key="mail.imaps.timeout">300000</prop>
</util:properties>
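For reference, here is a minimal Java-config sketch roughly equivalent to the XML above (an IDLE-style adapter, matching the ImapIdleChannelAdapter that appears in the stack traces below; the store URI, bean names, and channel name are placeholders, not our real values). It includes the should-reconnect-automatically flag, which controls whether the idle task reconnects after the folder is dropped:
import java.util.Properties;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.mail.ImapIdleChannelAdapter;
import org.springframework.integration.mail.ImapMailReceiver;
import org.springframework.messaging.MessageChannel;

@Configuration
public class MailConfig {

    // Placeholder URI; the real one is the masked store-uri above.
    private static final String STORE_URI =
            "imaps://user:password@imap.mxhichina.com/INBOX";

    @Bean
    public ImapMailReceiver imapMailReceiver() {
        ImapMailReceiver receiver = new ImapMailReceiver(STORE_URI);
        receiver.setShouldMarkMessagesAsRead(true);   // should-mark-messages-as-read
        receiver.setShouldDeleteMessages(false);      // should-delete-messages
        receiver.setSimpleContent(true);              // simple-content
        Properties props = new Properties();
        props.setProperty("mail.store.protocol", "imaps");
        props.setProperty("mail.imaps.timeout", "300000");
        receiver.setJavaMailProperties(props);
        return receiver;
    }

    @Bean
    public ImapIdleChannelAdapter mailAdapter(MessageChannel inboundChannel) {
        ImapIdleChannelAdapter adapter = new ImapIdleChannelAdapter(imapMailReceiver());
        adapter.setOutputChannel(inboundChannel);
        // Re-open the folder and restart IDLE when the connection is dropped
        adapter.setShouldReconnectAutomatically(true);
        return adapter;
    }
}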
But recently we always get "FolderClosedException" exceptions, as follows:
ImapIdleChannelAdapter: - error occurred in idle task
javax.mail.FolderClosedException: * BYE JavaMail Exception: java.io.IOException: Connection dropped by server?
at com.sun.mail.imap.IMAPFolder.handleIdle(IMAPFolder.java:3199) ~[javax.mail-1.5.5.jar!/:1.5.5]
at com.sun.mail.imap.IMAPFolder.idle(IMAPFolder.java:3043) ~[javax.mail-1.5.5.jar!/:1.5.5]
at com.sun.mail.imap.IMAPFolder.idle(IMAPFolder.java:2995) ~[javax.mail-1.5.5.jar!/:1.5.5]
at org.springframework.integration.mail.ImapMailReceiver.waitForNewMessages(ImapMailReceiver.java:197) ~[spring-integration-mail-5.2.5.RELEASE.jar!/:5.2.5.RELEASE]
at org.springframework.integration.mail.ImapIdleChannelAdapter$IdleTask.run(ImapIdleChannelAdapter.java:277) ~[spring-integration-mail-5.2.5.RELEASE.jar!/:5.2.5.RELEASE]
at org.springframework.integration.mail.ImapIdleChannelAdapter$ReceivingTask.run(ImapIdleChannelAdapter.java:249) ~[spring-integration-mail-5.2.5.RELEASE.jar!/:5.2.5.RELEASE]
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-5.2.5.RELEASE.jar!/:5.2.5.RELEASE]
at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:93) ~[spring-context-5.2.5.RELEASE.jar!/:5.2.5.RELEASE]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_131]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_131]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_131]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
ImapIdleChannelAdapter: - Failed to execute IDLE task. Will attempt to resubmit in 10000 milliseconds.
java.lang.IllegalStateException: Failure in 'idle' task. Will resubmit.
at org.springframework.integration.mail.ImapIdleChannelAdapter$IdleTask.run(ImapIdleChannelAdapter.java:295) ~[spring-integration-mail-5.2.5.RELEASE.jar!/:5.2.5.RELEASE]
at org.springframework.integration.mail.ImapIdleChannelAdapter$ReceivingTask.run(ImapIdleChannelAdapter.java:249) ~[spring-integration-mail-5.2.5.RELEASE.jar!/:5.2.5.RELEASE]
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-5.2.5.RELEASE.jar!/:5.2.5.RELEASE]
at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:93) ~[spring-context-5.2.5.RELEASE.jar!/:5.2.5.RELEASE]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_131]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_131]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_131]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: javax.mail.FolderClosedException: * BYE JavaMail Exception: java.io.IOException: Connection dropped by server?
at com.sun.mail.imap.IMAPFolder.handleIdle(IMAPFolder.java:3199) ~[javax.mail-1.5.5.jar!/:1.5.5]
at com.sun.mail.imap.IMAPFolder.idle(IMAPFolder.java:3043) ~[javax.mail-1.5.5.jar!/:1.5.5]
at com.sun.mail.imap.IMAPFolder.idle(IMAPFolder.java:2995) ~[javax.mail-1.5.5.jar!/:1.5.5]
at org.springframework.integration.mail.ImapMailReceiver.waitForNewMessages(ImapMailReceiver.java:197) ~[spring-integration-mail-5.2.5.RELEASE.jar!/:5.2.5.RELEASE]
at org.springframework.integration.mail.ImapIdleChannelAdapter$IdleTask.run(ImapIdleChannelAdapter.java:277) ~[spring-integration-mail-5.2.5.RELEASE.jar!/:5.2.5.RELEASE]
... 10 more
After that, no emails can be fetched from the mailbox until we restart the related process flow; even then, it fetches just one batch of emails and then stops again.
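One possible workaround (a sketch only, not a confirmed fix): ImapIdleChannelAdapter publishes an ImapIdleExceptionEvent whenever its idle task fails, so an application listener could bounce the adapter to force a fresh Store/Folder. The mailAdapter bean here is assumed to be the adapter from the config above:
import org.springframework.context.event.EventListener;
import org.springframework.integration.mail.ImapIdleChannelAdapter;
import org.springframework.integration.mail.ImapIdleChannelAdapter.ImapIdleExceptionEvent;
import org.springframework.stereotype.Component;

@Component
public class MailAdapterRestarter {

    private final ImapIdleChannelAdapter mailAdapter;

    public MailAdapterRestarter(ImapIdleChannelAdapter mailAdapter) {
        this.mailAdapter = mailAdapter;
    }

    // The adapter publishes this event when the idle task throws,
    // e.g. on the FolderClosedException shown above.
    @EventListener
    public void onIdleException(ImapIdleExceptionEvent event) {
        mailAdapter.stop();   // close the (possibly stale) Store/Folder
        mailAdapter.start();  // reconnect and resume IDLE
    }
}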
We also enabled the mail debug log. In the failure scenario there are several unseen emails in the mailbox, but the adapter cannot fetch them; the search result is [UNSEEN 0]. The full log is as follows:
DEBUG IMAPS: added an Authenticated connection -- size: 1
DEBUG IMAPS: IMAPProtocol noop
A279 NOOP
A279 OK NOOP completed
A280 LIST "" INBOX
* LIST () "/" "INBOX"
A280 OK LIST completed
DEBUG IMAPS: connection available -- size: 1
* 0 EXISTS
* 0 RECENT
* OK [UNSEEN 0]
* OK [UIDNEXT 0] Predicted next UID.
* OK [UIDVALIDITY ] UIDs valid.
* FLAGS (\Answered \Seen \Deleted \Draft \Flagged)
* OK [PERMANENTFLAGS (\Answered \Seen \Deleted \Draft \Flagged)] Limited.
A281 OK [READ-WRITE] SELECT completed
A282 SEARCH UNSEEN ALL
* SEARCH
A282 OK SEARCH completed
A283 IDLE
+ idling
DEBUG IMAP: startIdle: set to IDLE
DEBUG IMAP: startIdle: return true
After restarting the related flow, the unseen emails are fetched successfully; the search result is [UNSEEN 8]. The full log is as follows:
DEBUG IMAPS: LOGIN command result: A1 OK LOGIN completed
A2 CAPABILITY
* CAPABILITY IMAP4rev1 IDLE XLIST UIDPLUS ID SASL-IR AUTH=XOAUTH AUTH=EXTERNAL
A2 OK CAPABILITY completed
DEBUG IMAPS: AUTH: XOAUTH
DEBUG IMAPS: AUTH: EXTERNAL
A3 LIST "" INBOX
* LIST () "/" "INBOX"
A3 OK LIST completed
DEBUG IMAPS: connection available -- size: 1
A4 SELECT INBOX
* 18127 EXISTS
* 0 RECENT
* OK [UNSEEN 8]
* OK [UIDNEXT 564053] Predicted next UID.
* OK [UIDVALIDITY 2] UIDs valid.
* FLAGS (\Answered \Seen \Deleted \Draft \Flagged)
* OK [PERMANENTFLAGS (\Answered \Seen \Deleted \Draft \Flagged)] Limited.
A4 OK [READ-WRITE] SELECT completed
A5 SEARCH UNSEEN ALL
* SEARCH 18120 18121 18122 18123 18124 18125 18126 18127
A5 OK SEARCH completed
A6 SEARCH UNSEEN ALL
* SEARCH 18120 18121 18122 18123 18124 18125 18126 18127
A6 OK SEARCH completed
A7 FETCH 18120:18127 (ENVELOPE INTERNALDATE RFC822.SIZE FLAGS BODYSTRUCTURE)
......
Any idea about this issue? Is it caused by the email server or by a Spring Integration configuration error? Could anyone give some advice?

Related

Problem with writing data to HBase table. Coded in IntelliJ using Spark-Scala-HBase (standalone mode)

Not able to write to HBase table from IntelliJ. Spark-Scala-HBase.
Hosts file configured as:
# Copyright (c) 1993-2009 Microsoft Corp.
#
# This is a sample HOSTS file used by Microsoft TCP/IP for Windows.
#
# This file contains the mappings of IP addresses to host names. Each
# entry should be kept on an individual line. The IP address should
# be placed in the first column followed by the corresponding host name.
# The IP address and the host name should be separated by at least one
# space.
#
# Additionally, comments (such as these) may be inserted on individual
# lines or following the machine name denoted by a '#' symbol.
#
# For example:
#
# 102.54.94.97 rhino.acme.com # source server
# 38.25.63.10 x.acme.com # x client host
# localhost name resolution is handled within DNS itself.
127.0.0.1 localhost
#::1 localhost
hbase-env.cmd properties:
set JAVA_HOME=%JAVA_HOME%
set HBASE_CLASSPATH=%HBASE_HOME%\lib\client-facing-thirdparty\*
set HBASE_HEAPSIZE=8000
set HBASE_OPTS="-XX:+UseConcMarkSweepGC" "-Djava.net.preferIPv4Stack=true"
set SERVER_GC_OPTS="-verbose:gc" "-XX:+PrintGCDetails" "-XX:+PrintGCDateStamps" %HBASE_GC_OPTS%
set HBASE_USE_GC_LOGFILE=true
set HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false"
set HBASE_MASTER_OPTS=%HBASE_JMX_BASE% "-Dcom.sun.management.jmxremote.port=10101"
set HBASE_REGIONSERVER_OPTS=%HBASE_JMX_BASE% "-Dcom.sun.management.jmxremote.port=10102"
set HBASE_THRIFT_OPTS=%HBASE_JMX_BASE% "-Dcom.sun.management.jmxremote.port=10103"
set HBASE_ZOOKEEPER_OPTS=%HBASE_JMX_BASE% "-Dcom.sun.management.jmxremote.port=10104"
set HBASE_REGIONSERVERS=%HBASE_HOME%\conf\regionservers
set HBASE_LOG_DIR=%HBASE_HOME%\logs
set HBASE_IDENT_STRING=%USERNAME%
set HBASE_MANAGES_ZK=true
hbase-site.xml:
<configuration>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>false</value>
    </property>
    <property>
        <name>hbase.tmp.dir</name>
        <value>./tmp</value>
    </property>
    <property>
        <name>hbase.unsafe.stream.capability.enforce</name>
        <value>false</value>
    </property>
    <property>
        <name>hbase.rootdir</name>
        <value>file:///C:/hbase/hbase-2.2.5/hbase</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/C:/hbase/hbase-2.2.5/zookeeper</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>localhost</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
    </property>
</configuration>
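To isolate the client path, a minimal standalone smoke test may help (a sketch assuming the stock HBase 2.x Java client; the table name grp_dev:u comes from the error log further below, and the row key, column family, and qualifier are placeholders). The client must be able to resolve and reach both the ZooKeeper quorum and the hostname the region server registers under:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientSmokeTest {

    public static void main(String[] args) throws Exception {
        // These must match hbase-site.xml on the server side.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("grp_dev:u"))) {
            // "cf"/"q" are placeholder column family/qualifier names.
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
            // This put fails the same way if the region server address the
            // master registered (hostname/IP) is unreachable from the client.
            table.put(put);
        }
    }
}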
The HMaster terminal shows the following details:
[main] zookeeper.RecoverableZooKeeper: Process identifier=master:16000 connecting to ZooKeeper ensemble=localhost:2181
[main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
[main] zookeeper.ZooKeeper: Client environment:host.name=GH-4NR7533.XXXXXXXXX.org (ipaddress:192.168.0.103)
[main] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_101
[main] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
[main] zookeeper.ZooKeeper: Client environment:java.home=C:\Program Files\Java\jdk1.8.0_144\jre
In IntelliJ I'm getting the following error:
2022-12-26 11:49:17 WARN ProcfsMetricsGetter:69 - Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped
2022-12-26 11:49:19 WARN ClientCnxn:1235 - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
2022-12-26 11:49:20 INFO ClientCnxn:1113 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2022-12-26 11:49:20 INFO ClientCnxn:948 - Socket connection established, initiating session, client: /127.0.0.1:56517, server: localhost/127.0.0.1:2181
2022-12-26 11:49:20 WARN ReadOnlyZKClient:192 - 0x0ba4f370 to localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries = 1
2022-12-26 11:49:20 INFO ClientCnxn:1381 - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x1854d1460720005, negotiated timeout = 40000
2022-12-26 11:49:33 INFO RpcRetryingCallerImpl:134 - Call exception, tries=6, retries=16, started=11476 ms ago, cancelled=false, msg=Call to 192.168.1.36:16020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: 192.168.1.36:16020, details=row 'grp_dev:u' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=GH-4NR7533.XXXXXXXXX.org,16020,1672035462187, seqNum=-1, see https://s.apache.org/timeout
2022-12-26 11:49:39 INFO RpcRetryingCallerImpl:134 - Call exception, tries=7, retries=16, started=17518 ms ago, cancelled=false, msg=Call to 192.168.1.36:16020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: 192.168.1.36:16020, details=row 'grp_dev:u' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=GH-4NR7533.XXXXXXXXX.org,16020,1672035462187, seqNum=-1, see https://s.apache.org/timeout
2022-12-26 11:49:51 INFO RpcRetryingCallerImpl:134 - Call exception, tries=8, retries=16, started=29596 ms ago, cancelled=false, msg=Call to 192.168.1.36:16020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: 192.168.1.36:16020, details=row 'grp_dev:u' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=GH-4NR7533.XXXXXXXXX.org,16020,1672035462187, seqNum=-1, see https://s.apache.org/timeout
2022-12-26 11:50:03 INFO RpcRetryingCallerImpl:134 - Call exception, tries=9, retries=16, started=41688 ms ago, cancelled=false, msg=Call to 192.168.1.36:16020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: 192.168.1.36:16020, details=row 'grp_dev:u' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=GH-4NR7533.XXXXXXXXX.org,16020,1672035462187, seqNum=-1, see https://s.apache.org/timeout
2022-12-26 11:50:15 INFO RpcRetryingCallerImpl:134 - Call exception, tries=10, retries=16, started=53794 ms ago, cancelled=false, msg=Call to 192.168.1.36:16020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: 192.168.1.36:16020, details=row 'grp_dev:u' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=GH-4NR7533.XXXXXXXXX.org,16020,1672035462187, seqNum=-1, see https://s.apache.org/timeout
2022-12-26 11:50:27 INFO RpcRetryingCallerImpl:134 - Call exception, tries=11, retries=16, started=65865 ms ago, cancelled=false, msg=Call to 192.168.1.36:16020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: 192.168.1.36:16020, details=row 'grp_dev:u' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=GH-4NR7533.XXXXXXXXX.org,16020,1672035462187, seqNum=-1, see https://s.apache.org/timeout

I can't upload files bigger than 20 MB to my S3 bucket

I recently created an S3 bucket at Scaleway.
I mounted it using s3fs without any apparent problem.
I have problems uploading some "mid-size" files.
If the size is under 20 MB it's OK, but with larger files (50 MB and more) the copy fails with the message "unable to write file, permission denied".
I contacted Scaleway support, but they said it's related to my s3fs client.
I mounted the bucket in debug mode, using:
$ sudo s3fs tellurix /mnt/scaleway/ -o passwd_file=${HOME}/.passwd-s3fs,url=https://s3.fr-par.scw.cloud,allow_other -o use_path_request_style,noatime -o dbglevel=info -f -o curldbg
I copy/pasted the last 100 lines of the log, because I don't see where the error is.
Thanks a lot for your help.
* SSL_write() returned SYSCALL, errno = 32
* Closing connection 6
[ERR] curl.cpp:RequestPerform(2546): ### CURLE_SEND_ERROR
* SSL_write() returned SYSCALL, errno = 32
* Closing connection 5
[ERR] curl.cpp:RequestPerform(2546): ### CURLE_SEND_ERROR
[INF] curl.cpp:RequestPerform(2621): ### retrying...
[INF] curl.cpp:RemakeHandle(2248): Retry request. [type=9][url=https://s3.fr-par.scw.cloud/tellurix/ant/MyHome%20automation%20guide%2072488.pdf?partNumber=5&uploadId=YmNkMmE3MWMtMDFhYi00NDhmLTlkYWItMjEyMDA1YTM1Njk1][path=/ant/MyHome automation guide 72488.pdf]
* Hostname s3.fr-par.scw.cloud was found in DNS cache
* Trying 2001:bc8:1002::30:443...
* TCP_NODELAY set
* Connected to s3.fr-par.scw.cloud (2001:bc8:1002::30) port 443 (#6)
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* SSL re-using session ID
* SSL_write() returned SYSCALL, errno = 32
* Closing connection 5
[ERR] curl.cpp:RequestPerform(2546): ### CURLE_SEND_ERROR
* old SSL session ID is stale, removing
* SSL connection using TLSv1.2 / ECDHE-RSA-CHACHA20-POLY1305
* Server certificate:
* subject: CN=s3.fr-par.scw.cloud
* start date: Feb 10 23:20:22 2020 GMT
* expire date: May 10 23:20:22 2020 GMT
* subjectAltName: host "s3.fr-par.scw.cloud" matched cert's "s3.fr-par.scw.cloud"
* issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
* SSL certificate verify ok.
> PUT /tellurix/ant/MyHome%20automation%20guide%2072488.pdf?partNumber=5&uploadId=YmNkMmE3MWMtMDFhYi00NDhmLTlkYWItMjEyMDA1YTM1Njk1 HTTP/1.1
Host: s3.fr-par.scw.cloud
User-Agent: s3fs/1.86 (commit hash 005a684; OpenSSL)
Accept: */*
Content-Length: 10485760
Expect: 100-continue
* SSL_write() returned SYSCALL, errno = 32
* Closing connection 6
[ERR] curl.cpp:RequestPerform(2546): ### CURLE_SEND_ERROR
* Mark bundle as not supporting multiuse
< HTTP/1.1 403 Forbidden
< x-amz-id-2: tx97bf2f1b3ccd47c4a5f91-005eaa999a
< x-amz-request-id: tx97bf2f1b3ccd47c4a5f91-005eaa999a
< Content-Type: application/xml
< Date: Thu, 30 Apr 2020 09:25:46 GMT
< Transfer-Encoding: chunked
* HTTP error before end of send, keep sending
<
[INF] curl.cpp:RequestPerform(2621): ### retrying...
[INF] curl.cpp:RemakeHandle(2248): Retry request. [type=9][url=https://s3.fr-par.scw.cloud/tellurix/ant/MyHome%20automation%20guide%2072488.pdf?partNumber=2&uploadId=YmNkMmE3MWMtMDFhYi00NDhmLTlkYWItMjEyMDA1YTM1Njk1][path=/ant/MyHome automation guide 72488.pdf]
[ERR] curl.cpp:RequestPerform(2639): ### giving up
[WAN] curl.cpp:MultiPerform(4340): thread failed - rc(-5)
[INF] curl.cpp:insertV4Headers(2797): computing signature [PUT] [/ant/MyHome automation guide 72488.pdf] [partNumber=6&uploadId=YmNkMmE3MWMtMDFhYi00NDhmLTlkYWItMjEyMDA1YTM1Njk1] [34ec149b334729973e407bada5e11b96774acfd1375b8009f789474ecb9bb2bb]
[INF] curl.cpp:url_to_host(99): url is https://s3.fr-par.scw.cloud
* Hostname s3.fr-par.scw.cloud was found in DNS cache
* Trying 2001:bc8:1002::30:443...
* TCP_NODELAY set
* Connected to s3.fr-par.scw.cloud (2001:bc8:1002::30) port 443 (#7)
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* SSL re-using session ID
* SSL connection using TLSv1.2 / ECDHE-RSA-CHACHA20-POLY1305
* Server certificate:
* subject: CN=s3.fr-par.scw.cloud
* start date: Feb 10 23:20:22 2020 GMT
* expire date: May 10 23:20:22 2020 GMT
* subjectAltName: host "s3.fr-par.scw.cloud" matched cert's "s3.fr-par.scw.cloud"
* issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
* SSL certificate verify ok.
> PUT /tellurix/ant/MyHome%20automation%20guide%2072488.pdf?partNumber=6&uploadId=YmNkMmE3MWMtMDFhYi00NDhmLTlkYWItMjEyMDA1YTM1Njk1 HTTP/1.1
Host: s3.fr-par.scw.cloud
User-Agent: s3fs/1.86 (commit hash 005a684; OpenSSL)
Authorization: AWS4-HMAC-SHA256 Credential=xxxxxx/20200430/fr-par/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=91bbf50cc33a1f1d1cd3f3660fcc116e857223b4f8297b6c796e7dc32f244bac
x-amz-content-sha256: 34ec149b334729973e407bada5e11b96774acfd1375b8009f789474ecb9bb2bb
x-amz-date: 20200430T092546Z
Content-Length: 1132789
Expect: 100-continue
[INF] curl.cpp:RequestPerform(2621): ### retrying...
[INF] curl.cpp:RemakeHandle(2248): Retry request. [type=9][url=https://s3.fr-par.scw.cloud/tellurix/ant/MyHome%20automation%20guide%2072488.pdf?partNumber=1&uploadId=YmNkMmE3MWMtMDFhYi00NDhmLTlkYWItMjEyMDA1YTM1Njk1][path=/ant/MyHome automation guide 72488.pdf]
[ERR] curl.cpp:RequestPerform(2639): ### giving up
* Mark bundle as not supporting multiuse
< HTTP/1.1 100 Continue
* SSL_write() returned SYSCALL, errno = 32
* Closing connection 6
[ERR] curl.cpp:RequestPerform(2546): ### CURLE_SEND_ERROR
[INF] curl.cpp:RequestPerform(2621): ### retrying...
[INF] curl.cpp:RemakeHandle(2248): Retry request. [type=9][url=https://s3.fr-par.scw.cloud/tellurix/ant/MyHome%20automation%20guide%2072488.pdf?partNumber=3&uploadId=YmNkMmE3MWMtMDFhYi00NDhmLTlkYWItMjEyMDA1YTM1Njk1][path=/ant/MyHome automation guide 72488.pdf]
[ERR] curl.cpp:RequestPerform(2639): ### giving up
[INF] curl.cpp:RequestPerform(2621): ### retrying...
[INF] curl.cpp:RemakeHandle(2248): Retry request. [type=9][url=https://s3.fr-par.scw.cloud/tellurix/ant/MyHome%20automation%20guide%2072488.pdf?partNumber=4&uploadId=YmNkMmE3MWMtMDFhYi00NDhmLTlkYWItMjEyMDA1YTM1Njk1][path=/ant/MyHome automation guide 72488.pdf]
[ERR] curl.cpp:RequestPerform(2639): ### giving up
[INF] curl.cpp:RequestPerform(2621): ### retrying...
[INF] curl.cpp:RemakeHandle(2248): Retry request. [type=9][url=https://s3.fr-par.scw.cloud/tellurix/ant/MyHome%20automation%20guide%2072488.pdf?partNumber=5&uploadId=YmNkMmE3MWMtMDFhYi00NDhmLTlkYWItMjEyMDA1YTM1Njk1][path=/ant/MyHome automation guide 72488.pdf]
[ERR] curl.cpp:RequestPerform(2639): ### giving up
* We are completely uploaded and fine
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Length: 0
< x-amz-id-2: tx64fa48b5fffb4985bee17-005eaa999a
< Last-Modified: Thu, 30 Apr 2020 09:25:46 GMT
< ETag: "30c5132a619a14608ff0a3d9bac63fe2"
< x-amz-request-id: tx64fa48b5fffb4985bee17-005eaa999a
< x-amz-version-id: 1588238746862950
< Content-Type: text/html; charset=UTF-8
< Date: Thu, 30 Apr 2020 09:25:59 GMT
<
* Connection #7 to host s3.fr-par.scw.cloud left intact
[INF] curl.cpp:RequestPerform(2455): HTTP response code 200
[WAN] curl.cpp:MultiPerform(4374): thread failed - rc(-5)
[WAN] curl.cpp:MultiPerform(4374): thread failed - rc(-5)
[WAN] curl.cpp:MultiPerform(4374): thread failed - rc(-5)
[WAN] curl.cpp:MultiPerform(4374): thread failed - rc(-5)
[WAN] curl.cpp:MultiRead(4400): error from callback function(https://s3.fr-par.scw.cloud/tellurix/ant/MyHome%20automation%20guide%2072488.pdf?partNumber=1&uploadId=YmNkMmE3MWMtMDFhYi00NDhmLTlkYWItMjEyMDA1YTM1Njk1).
[WAN] curl.cpp:MultiRead(4400): error from callback function(https://s3.fr-par.scw.cloud/tellurix/ant/MyHome%20automation%20guide%2072488.pdf?partNumber=2&uploadId=YmNkMmE3MWMtMDFhYi00NDhmLTlkYWItMjEyMDA1YTM1Njk1).
[WAN] curl.cpp:MultiRead(4400): error from callback function(https://s3.fr-par.scw.cloud/tellurix/ant/MyHome%20automation%20guide%2072488.pdf?partNumber=3&uploadId=YmNkMmE3MWMtMDFhYi00NDhmLTlkYWItMjEyMDA1YTM1Njk1).
[WAN] curl.cpp:MultiRead(4400): error from callback function(https://s3.fr-par.scw.cloud/tellurix/ant/MyHome%20automation%20guide%2072488.pdf?partNumber=4&uploadId=YmNkMmE3MWMtMDFhYi00NDhmLTlkYWItMjEyMDA1YTM1Njk1).
[WAN] curl.cpp:MultiRead(4400): error from callback function(https://s3.fr-par.scw.cloud/tellurix/ant/MyHome%20automation%20guide%2072488.pdf?partNumber=5&uploadId=YmNkMmE3MWMtMDFhYi00NDhmLTlkYWItMjEyMDA1YTM1Njk1).
[INF] curl.cpp:CompleteMultipartPostRequest(3642): [tpath=/ant/MyHome automation guide 72488.pdf][parts=6]
[ERR] curl.cpp:CompleteMultipartPostRequest(3653): 1 file part is not finished uploading.
[INF] s3fs.cpp:s3fs_release(2358): [path=/ant/MyHome automation guide 72488.pdf][fd=11]
[INF] cache.cpp:DelStat(582): delete stat cache entry[path=/ant/MyHome automation guide 72488.pdf]
[INF] fdcache.cpp:GetFdEntity(2650): [path=/ant/MyHome automation guide 72488.pdf][fd=11]
I successfully mounted and wrote a 500 MB file to scaleway using your command-line arguments. Given the CURLE_SEND_ERROR I wonder if you have some kind of network problem? Maybe try a lower value for -o parallel_count, e.g., 1? See https://github.com/s3fs-fuse/s3fs-fuse/issues/1283#issuecomment-623026911 for the resolution.
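For example, keeping the other mount options from the question unchanged and only adding parallel_count (an illustrative command, not a confirmed fix):
sudo s3fs tellurix /mnt/scaleway/ -o passwd_file=${HOME}/.passwd-s3fs,url=https://s3.fr-par.scw.cloud,allow_other -o use_path_request_style,noatime -o parallel_count=1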
From where do you mount your bucket? Is it your PC in your home or a cloud VM? How much time does it take before you receive this error?
I'm asking because "SSL_write() returned SYSCALL, errno = 32" looks like something is closing your connection. "HTTP error before end of send, keep sending" also points to that kind of problem. Maybe a timeout occurs? Do you have a NAT gateway between you and your bucket? That can also cause the problem if it does not handle keepalives, as the upload can take relatively long.
As the s3fs wiki says, 20 MB is the threshold for multipart uploads instead of a single request. Maybe Scaleway has a slightly different API for multipart uploads than Amazon? From the s3fs wiki: "Some providers do not support the full S3 API, e.g., lacking multi-part upload." Please note that s3fs is mainly intended to work with Amazon S3 and, as I see, Scaleway is not on the list of supported providers in the s3fs wiki: https://github.com/s3fs-fuse/s3fs-fuse/wiki/Non-Amazon-S3.
One last thing: what's your version of libcurl? The s3fs documentation says it should be 7.16 or 7.17. And are you using the latest version of s3fs?

Apache Flink Zookeeper Checkpoint Conflict After Upgrade

I recently upgraded an Apache Flink cluster from 1.3.2 to 1.4.2, and now I get the following exception in the Zookeeper logs:
2018-06-19 18:45:34,658 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@649] - Got user-level KeeperException when processing sessionid:0x363f3574f420001 type:create cxid:0x2284d zxid:0x900016e9d txntype:-1 reqpath:n/a Error Path:/flink/cluster_one/checkpoints/5e8ad58b9f2ef81b155d0e15b23d2365/0000000000000010305/81783429-1405-4609-9cdb-ce9dc95b8272 Error:KeeperErrorCode = NodeExists for /flink/cluster_one/checkpoints/5e8ad58b9f2ef81b155d0e15b23d2365/0000000000000010305/81783429-1405-4609-9cdb-ce9dc95b8272
Because of this the Apache Flink node where this is thrown is repeatedly kicked out of the cluster and then rejoins.
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@tm03-dev:6124/user/taskmanager#-50864731]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.messages.JobManagerMessages$LeaderSessionMessage".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
... 1 more
06/22/2018 16:11:29 Job execution switched to status FAILING.
java.lang.Exception: Cannot deploy task Source: Custom Source -> StringToJSONEvents (1/1) (a78bb6a5d7c90a7b5cee785bc4ac2426) - TaskManager (37d2c90dd316c87974dda50e0b4525d6 @ tm03-dev (dataPort=6125)) not responding after a timeout of 10000 ms
at org.apache.flink.runtime.executiongraph.Execution.lambda$deploy$3(Execution.java:529)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@tm03-dev:6124/user/taskmanager#-50864731]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.messages.JobManagerMessages$LeaderSessionMessage".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
... 1 more
06/22/2018 16:11:29 FilterLoginFailedEvents -> Sink: SendRequestToService(1/1) switched to CANCELING
06/22/2018 16:11:29 FilterLoginFailedEvents -> Sink: SendRequestToService(1/1) switched to CANCELED
In the TaskManager logs, this keeps repeating:
2018-06-22 15:59:45.168 [main-SendThread(ip-10-11-21-15.domain:2181)] DEBUG o.a.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x163f3574f380005 after 0ms
2018-06-22 15:59:58.513 [main-SendThread(ip-10-11-21-15.domain:2181)] DEBUG o.a.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x163f3574f380005 after 0ms
2018-06-22 16:00:10.658 [flink-akka.actor.default-dispatcher-8172] INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@flink-jobmanager01-flink-dev.domain.com:26229/user/jobmanager (attempt 9305, timeout: 30000 milliseconds)
2018-06-22 16:00:11.858 [main-SendThread(ip-10-11-21-15.domain:2181)] DEBUG o.a.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x163f3574f380005 after 0ms
2018-06-22 16:00:14.971 [flink-akka.actor.default-dispatcher-9313] DEBUG akka.remote.transport.ProtocolStateActor flink-akka.remote.default-remote-dispatcher-15 - Association between local [tcp://flink@tm03-flink-dev:25422] and remote [tcp://flink@flink-jobmanager01-flink-dev.domain.com:26229] was disassociated because the ProtocolStateActor failed: Unknown
2018-06-22 16:00:14.971 [flink-akka.actor.default-dispatcher-9313] DEBUG akka.remote.transport.ProtocolStateActor flink-akka.remote.default-remote-dispatcher-15 - Association between local [tcp://flink@tm03-flink-dev:25422] and remote [tcp://flink@flink-jobmanager01-flink-dev.domain.com:26229] was disassociated because the ProtocolStateActor failed: Unknown
2018-06-22 16:00:14.971 [flink-akka.actor.default-dispatcher-9313] WARN akka.remote.ReliableDeliverySupervisor flink-akka.remote.default-remote-dispatcher-15 - Association with remote system [akka.tcp://flink@flink-jobmanager01-flink-dev.domain.com:26229] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
2018-06-22 16:00:14.971 [flink-akka.actor.default-dispatcher-9313] WARN akka.remote.ReliableDeliverySupervisor flink-akka.remote.default-remote-dispatcher-15 - Association with remote system [akka.tcp://flink@flink-jobmanager01-flink-dev.domain.com:26229] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
2018-06-22 16:00:14.971 [flink-akka.actor.default-dispatcher-9313] DEBUG akka.remote.transport.netty.NettyTransport New I/O worker #2 - Remote connection to [flink-jobmanager01-flink-dev.domain.com/10.11.21.24:26229] was disconnected because of [id: 0x735f1b8b, /10.11.21.13:25422 :> flink-jobmanager01-flink-dev.domain.com/10.11.21.24:26229] DISCONNECTED
2018-06-22 16:00:14.971 [flink-akka.actor.default-dispatcher-9313] DEBUG akka.remote.transport.netty.NettyTransport New I/O worker #2 - Remote connection to [flink-jobmanager01-flink-dev.domain.com/10.11.21.24:26229] was disconnected because of [id: 0x735f1b8b, /10.11.21.13:25422 :> flink-jobmanager01-flink-dev.domain.com/10.11.21.24:26229] DISCONNECTED
2018-06-22 16:00:25.199 [main-SendThread(ip-10-11-21-15.domain:2181)] DEBUG o.a.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x163f3574f380005 after 0ms
2018-06-22 16:00:38.539 [main-SendThread(ip-10-11-21-15.domain:2181)] DEBUG o.a.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x163f3574f380005 after 0ms
2018-06-22 16:00:40.679 [flink-akka.actor.default-dispatcher-8172] INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@flink-jobmanager01-flink-dev.domain.com:26229/user/jobmanager (attempt 9306, timeout: 30000 milliseconds)
Another clue: on the leader JobManager, the following 2 messages are displayed in the log every few seconds:
2018-06-27 17:33:46.632 [jobmanager-future-thread-2] DEBUG o.a.flink.runtime.rest.handler.legacy.metrics.MetricFetcher - Could not retrieve QueryServiceGateway.
java.util.concurrent.CompletionException: akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://flink@tm03-dev:6124/), Path(/user/MetricQueryService_64bde0e9e6f3f0a906a30e88c261c9d7)]
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:442)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at scala.concurrent.Promise$class.complete(Promise.scala:55)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:157)
at scala.concurrent.Promise$class.failure(Promise.scala:104)
at scala.concurrent.impl.Promise$DefaultPromise.failure(Promise.scala:157)
at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:68)
at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:66)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:76)
at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:75)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:534)
at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:558)
at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:595)
at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:584)
at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:98)
at akka.remote.ReliableDeliverySupervisor$$anonfun$gated$1.applyOrElse(Endpoint.scala:353)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.remote.ReliableDeliverySupervisor.aroundReceive(Endpoint.scala:203)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://flink@tm03-dev:6124/), Path(/user/MetricQueryService_64bde0e9e6f3f0a906a30e88c261c9d7)]
... 27 common frames omitted
2018-06-27 17:34:01.625 [flink-akka.actor.default-dispatcher-19] DEBUG org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler - Error while handling request.
java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.NotFoundException: Could not find job 93d6fa4fb5b2355bb03253cb80d81ef5.
at org.apache.flink.runtime.rest.handler.legacy.AbstractExecutionGraphRequestHandler.lambda$handleJsonRequest$0(AbstractExecutionGraphRequestHandler.java:70)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.rest.handler.legacy.ExecutionGraphCache.lambda$getExecutionGraph$0(ExecutionGraphCache.java:130)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:444)
at akka.dispatch.OnComplete.internal(Future.scala:259)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at scala.concurrent.Promise$class.complete(Promise.scala:55)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:157)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:534)
at org.apache.flink.runtime.jobmanager.MemoryArchivist$$anonfun$handleMessage$1.applyOrElse(MemoryArchivist.scala:123)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at org.apache.flink.runtime.jobmanager.MemoryArchivist.aroundReceive(MemoryArchivist.scala:65)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.rest.NotFoundException: Could not find job 93d6fa4fb5b2355bb03253cb80d81ef5.
... 53 common frames omitted
What does this exception mean?
Are we supposed to erase the contents of the Zookeeper folders (high-availability.zookeeper.path.root) before the upgrade?
This question was related to Apache Flink 1.4.2 akka.actor.ActorNotFound.
After we:
- restarted the JobManagers, and
- increased the TaskManager memory size from 1 GB to 2 GB,
the issue seems to be gone and the cluster is working fine now.
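For reference, the memory bump corresponds to a flink-conf.yaml entry along these lines (assuming the legacy taskmanager.heap.mb key used by Flink 1.4; the value is in megabytes):
taskmanager.heap.mb: 2048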

How to investigate failing dataproc worker processes?

I'm running a PySpark job and I'm having trouble determining the cause of failure of worker processes.
While my job was running I started noticing stack traces in the job output such as:
16/04/10 03:24:21 WARN org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1460240417530_0021_01_000003 on host: cluster-2-w-0.c.my-project.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
[Stage 0:=================================> (19 + 13) / 32]16/04/10 03:26:21 WARN org.apache.spark.rpc.netty.NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(2,Container marked as failed: container_1460240417530_0021_01_000003 on host: cluster-2-w-0.c.my-project.internal. Exit status: -100. Diagnostics: Container released on a *lost* node)] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:359)
at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receive$1.applyOrElse(YarnSchedulerBackend.scala:176)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
... 11 more
16/04/10 03:26:40 WARN org.apache.spark.rpc.netty.NettyRpcEndpointRef: Error sending message [message = RequestExecutors(23,0,Map())] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Failure.recover(Try.scala:185)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at org.spark-project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at scala.concurrent.Promise$class.complete(Promise.scala:55)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.processBatch$1(Future.scala:643)
at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply$mcV$sp(Future.scala:658)
at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635)
at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at scala.concurrent.Future$InternalCallbackExecutor$Batch.run(Future.scala:634)
at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:685)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:241)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:242)
... 7 more
[Stage 0:=================================> (19 + 13) / 32]
I'll also notice the overall CPU usage of the cluster slowly drop as worker nodes fail. These nodes seem to fail permanently and do not rejoin the cluster.
I'm using preemptible machines, but when I check the status of these machines they are still running and have not been preempted. So I'm guessing it's something wrong on the workers.
It could be because of the heavy workload on the workers. Try increasing spark.network.timeout (default 120 s) to a bigger number.
If that does not resolve the error, the most likely cause is garbage collection. Try running a memory profile with the following JVM options:
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/ -XX:+CMSClassUnloadingEnabled
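For example, both suggestions can be passed at submit time (an illustrative invocation; my_job.py is a placeholder for your script):
spark-submit --conf spark.network.timeout=600s --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/ -XX:+CMSClassUnloadingEnabled" my_job.py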

Unable to perform Sqoop import

I am unable to import data from MySQL to HDFS. My .bashrc and sqoop-env.sh files are fine, and I am able to run the sqoop list-databases command successfully. The problem is with the import command: it throws an output-connection-failed exception. Please refer to the error below and help me out:
rahul@ubuntu:~$ sqoop import --connect jdbc:mysql://localhost/rahul --username root --password 123 --table emp -m1 --target-dir /sqoopimport/emp
Warning: /usr/lib/hbase does not exist! HBase imports will fail. Please set $HBASE_HOME to the root of your HBase installation.
14/09/09 01:22:45 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/09/09 01:22:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/09/09 01:22:45 INFO tool.CodeGenTool: Beginning code generation
14/09/09 01:22:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM emp AS t LIMIT 1
14/09/09 01:22:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM emp AS t LIMIT 1
14/09/09 01:22:45 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-rahul/compile/a81597835880664d34a2ff0e4c7b9b33/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/09/09 01:22:46 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-rahul/compile/a81597835880664d34a2ff0e4c7b9b33/emp.jar
14/09/09 01:22:46 WARN manager.MySQLManager: It looks like you are importing from mysql.
14/09/09 01:22:46 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
14/09/09 01:22:46 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
14/09/09 01:22:46 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
14/09/09 01:22:46 INFO mapreduce.ImportJobBase: Beginning import of emp
14/09/09 01:22:47 INFO mapred.JobClient: Running job: job_201409090100_0003
14/09/09 01:22:48 INFO mapred.JobClient: map 0% reduce 0%
14/09/09 01:22:54 INFO mapred.JobClient: Task Id : attempt_201409090100_0003_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:167)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:722)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:193)
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:162)
... 9 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1121)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:355)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2479)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2516)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2301)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:834)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:416)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:317)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getConnection(DBConfiguration.java:278)
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:187)
... 10 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.<init>(Socket.java:425)
at java.net.Socket.<init>(Socket.java:241)
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:259)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:305)
... 26 more
14/09/09 01:22:54 WARN mapred.JobClient: Error reading task outputConnection refused
14/09/09 01:22:54 WARN mapred.JobClient: Error reading task outputConnection refused
14/09/09 01:22:59 INFO mapred.JobClient: Task Id : attempt_201409090100_0003_m_000000_1, Status : FAILED
java.lang.RuntimeException: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:167)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:722)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.lang.RuntimeException:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
Communications link failure
The last packet sent successfully to the server was 0 milliseconds
ago. The driver has not received any packets from the server.
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:193)
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:162)
... 9 more Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
Communications link failure
The last packet sent successfully to the server was 0 milliseconds
ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1121)
at com.mysql.jdbc.MysqlIO.(MysqlIO.java:355)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2479)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2516)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2301)
at com.mysql.jdbc.ConnectionImpl.(ConnectionImpl.java:834)
at com.mysql.jdbc.JDBC4Connection.(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:416)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:317)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getConnection(DBConfiguration.java:278)
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:187)
... 10 more Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.(Socket.java:425)
at java.net.Socket.(Socket.java:241)
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:259)
at com.mysql.jdbc.MysqlIO.(MysqlIO.java:305)
... 26 more
14/09/09 01:22:59 WARN mapred.JobClient: Error reading task
outputConnection refused 14/09/09 01:22:59 WARN mapred.JobClient:
Error reading task outputConnection refused 14/09/09 01:23:03 INFO
mapred.JobClient: Task Id : attempt_201409090100_0003_m_000000_2,
Status : FAILED java.lang.RuntimeException:
java.lang.RuntimeException:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
The last packet sent successfully to the server was 0 milliseconds
ago. The driver has not received any packets
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:167)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:722)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.lang.RuntimeException:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
Communications l
The last packet sent successfully to the server was 0 milliseconds
ago. The driver has not received any packets
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:193)
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:162)
... 9 more Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
Communications link failure
The last packet sent successfully to the server was 0 milliseconds
ago. The driver has not received any packets
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1121)
at com.mysql.jdbc.MysqlIO.(MysqlIO.java:355)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2479)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2516)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2301)
at com.mysql.jdbc.ConnectionImpl.(ConnectionImpl.java:834)
at com.mysql.jdbc.JDBC4Connection.(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:416)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:317)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getConnection(DBConfiguration.java:278)
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:187)
... 10 more Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.(Socket.java:425)
at java.net.Socket.(Socket.java:241)
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:259)
at com.mysql.jdbc.MysqlIO.(MysqlIO.java:305)
... 26 more
14/09/09 01:23:03 WARN mapred.JobClient: Error reading task output: Connection refused
14/09/09 01:23:03 WARN mapred.JobClient: Error reading task output: Connection refused
14/09/09 01:23:09 INFO mapred.JobClient: Job complete: job_201409090100_0003
14/09/09 01:23:09 INFO mapred.JobClient: Counters: 6
14/09/09 01:23:09 INFO mapred.JobClient:   Job Counters
14/09/09 01:23:09 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=20325
14/09/09 01:23:09 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=
14/09/09 01:23:09 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/09 01:23:09 INFO mapred.JobClient:     Launched map tasks=4
14/09/09 01:23:09 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/09/09 01:23:09 INFO mapred.JobClient:     Failed map tasks=1
14/09/09 01:23:09 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 23.174 seconds (0 bytes/sec)
14/09/09 01:23:09 INFO mapreduce.ImportJobBase: Retrieved 0 records.
14/09/09 01:23:09 ERROR tool.ImportTool: Error during import: Import job failed!
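The root cause at the bottom of each failed attempt is java.net.ConnectException: Connection refused, meaning the map tasks could not even open a TCP connection to MySQL at the host named in the JDBC URL. A quick way to confirm basic reachability from a cluster node before re-running the job (the host and port below are placeholders for your MySQL server):

# succeeds only if MySQL is reachable from this node
nc -vz 192.168.202.139 3306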
I fixed it. The problem was that I was using localhost in the import statement because MySQL was running on the same system; but the import actually runs as map tasks on the cluster nodes, where "localhost" resolves to each task's own machine rather than to the MySQL host. When I used the actual IP address instead of localhost, it worked like a charm.
I was also using the root username and password to connect to MySQL, and for some reason that didn't work, so I created another user and granted all privileges to that user:
GRANT ALL PRIVILEGES ON employee.* TO 'sqoopuser'@'%' IDENTIFIED BY 'passphrase';
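Before re-running the import, it is worth confirming from a machine other than the MySQL host that the new user can actually log in remotely; a minimal check, assuming the IP and credentials from this answer:

# remote login test with the newly created user (prompts for the password)
mysql -h 192.168.202.139 -P 3306 -u sqoopuser -p
# at the mysql prompt: USE employee; SHOW TABLES;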
Mistake:
sqoop import --connect jdbc:mysql://localhost/rahul --username root --password 123 --table emp -m 1 --target-dir /sqoopimport/emp
Correction:
sqoop import --connect jdbc:mysql://192.168.202.139:3306/rahul --username sqoopuser --password 123 --table emp -m 1 --target-dir /sqoopimport/emp
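As a side note, --password 123 on the command line leaves the password in the shell history and process list; Sqoop also accepts -P to prompt for the password interactively, as the last answer below does.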
There are three checkpoints (a shell sketch of all three follows the list):
1. Make sure the MySQL service is accessible: manually connect to localhost:3306 as root.
2. Make sure there is no firewall restriction on localhost:3306.
3. Download the most recent mysql-connector JAR into Sqoop's lib directory, then run the import again.
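A minimal sketch of those three checks (tool availability, $SQOOP_HOME, and the exact connector version are assumptions):

# 1. prove the MySQL service is up and accepting logins
mysql -h 127.0.0.1 -P 3306 -u root -p -e "SELECT 1;"
# 2. check that nothing blocks port 3306
nc -vz 127.0.0.1 3306
sudo iptables -L -n | grep 3306
# 3. drop a recent MySQL JDBC driver into Sqoop's lib and retry
cp mysql-connector-java-5.1.32-bin.jar $SQOOP_HOME/lib/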
The solution in my case was to use the proper IP address instead of "localhost" in the sqoop command:
sqoop import --connect jdbc:mysql://192.168.69.69:3306/testdb --username root -P --table TESTABLE --target-dir /data/import
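If the remote IP is still refused while localhost works, the usual culprit is mysqld being bound to the loopback interface only; a quick way to check (the ss tool and the my.cnf path are assumptions about your system):

# show which address mysqld is listening on; 127.0.0.1:3306 means loopback only
ss -ltnp | grep 3306
# if so, set bind-address = 0.0.0.0 in /etc/mysql/my.cnf and restart MySQL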