Sqoop installation export and import from postgresql - postgresql

I v'e just installed sqoop and was testing it . I tried to export some data from hdfs to postgresql using sqoop. When I run it it throws the following exception : java.io.IOException: Can't export data, please check task tracker logs . I think there may also have been a problem in installation.
The File content is :
ustNU 45
MB1bA 0
gNbCO 76
iZP10 39
B2aoo 45
SI7eG 93
5sC4k 60
2IhFV 2
u2A48 16
yvy6R 51
LNhsV 26
mZ2yn 65
80Gp3 43
Wk5Ag 85
VUfyp 93
P077j 94
f1Oj5 11
LxJkg 72
0H7NP 99
Dk406 25
g4KRp 76
Fw3U0 80
6LD59 1
07KHx 91
F1S88 72
Bnb0v 85
A2qM7 79
Z6cAt 81
0M3DO 23
m0s09 44
KIvwd 13
GNUD0 78
um93a 20
19bHv 75
4Of3s 75
5hFen 16
This is the posgres table:
Table "public.mysort"
Column | Type | Modifiers
--------+---------+-----------
name | text |
marks | integer |
The sqoop command is:
sqoop export --connect jdbc:postgresql://localhost/testdb --username akshay --password akshay --table mysort -m 1 --export-dir MySort/input
Followed by the error:
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
14/06/11 18:28:06 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/06/11 18:28:06 INFO manager.SqlManager: Using default fetchSize of 1000
14/06/11 18:28:06 INFO tool.CodeGenTool: Beginning code generation
14/06/11 18:28:06 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "mysort" AS t LIMIT 1
14/06/11 18:28:06 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-hduser/compile/0402ad4b5cf7980040264af35de406cb/mysort.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/06/11 18:28:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hduser/compile/0402ad4b5cf7980040264af35de406cb/mysort.jar
14/06/11 18:28:07 INFO mapreduce.ExportJobBase: Beginning export of mysort
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hbase/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/06/11 18:28:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/06/11 18:28:22 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/06/11 18:28:23 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/06/11 18:28:23 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
14/06/11 18:28:23 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/06/11 18:28:23 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/06/11 18:28:24 INFO input.FileInputFormat: Total input paths to process : 1
14/06/11 18:28:24 INFO input.FileInputFormat: Total input paths to process : 1
14/06/11 18:28:25 INFO mapreduce.JobSubmitter: number of splits:1
14/06/11 18:28:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1402488523460_0003
14/06/11 18:28:25 INFO impl.YarnClientImpl: Submitted application application_1402488523460_0003
14/06/11 18:28:25 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1402488523460_0003/
14/06/11 18:28:25 INFO mapreduce.Job: Running job: job_1402488523460_0003
14/06/11 18:28:46 INFO mapreduce.Job: Job job_1402488523460_0003 running in uber mode : false
14/06/11 18:28:46 INFO mapreduce.Job: map 0% reduce 0%
14/06/11 18:29:04 INFO mapreduce.Job: Task Id : attempt_1402488523460_0003_m_000000_0, Status : FAILED
Error: java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:839)
at mysort.__loadFromFields(mysort.java:198)
at mysort.parse(mysort.java:147)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
14/06/11 18:29:23 INFO mapreduce.Job: Task Id : attempt_1402488523460_0003_m_000000_1, Status : FAILED
Error: java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:839)
at mysort.__loadFromFields(mysort.java:198)
at mysort.parse(mysort.java:147)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
14/06/11 18:29:42 INFO mapreduce.Job: Task Id : attempt_1402488523460_0003_m_000000_2, Status : FAILED
Error: java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:839)
at mysort.__loadFromFields(mysort.java:198)
at mysort.parse(mysort.java:147)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
14/06/11 18:30:03 INFO mapreduce.Job: map 100% reduce 0%
14/06/11 18:30:03 INFO mapreduce.Job: Job job_1402488523460_0003 failed with state FAILED due to: Task failed task_1402488523460_0003_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
14/06/11 18:30:03 INFO mapreduce.Job: Counters: 9
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=69336
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=69336
Total vcore-seconds taken by all map tasks=69336
Total megabyte-seconds taken by all map tasks=71000064
14/06/11 18:30:03 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
14/06/11 18:30:03 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 100.1476 seconds (0 bytes/sec)
14/06/11 18:30:03 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
14/06/11 18:30:03 INFO mapreduce.ExportJobBase: Exported 0 records.
14/06/11 18:30:03 ERROR tool.ExportTool: Error during export: Export job failed!
This is the log file :
2014-06-11 17:54:37,601 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-06-11 17:54:37,602 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-06-11 17:54:52,678 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-06-11 17:54:52,777 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-06-11 17:54:52,846 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2014-06-11 17:54:52,847 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2014-06-11 17:54:52,855 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2014-06-11 17:54:52,855 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1402488523460_0002, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier#971d0d8)
2014-06-11 17:54:52,901 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2014-06-11 17:54:53,165 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /tmp/hadoop-hduser/nm-local-dir/usercache/hduser/appcache/application_1402488523460_0002
2014-06-11 17:54:53,249 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-06-11 17:54:53,249 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-06-11 17:54:53,393 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2014-06-11 17:54:53,689 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2014-06-11 17:54:53,899 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: Paths:/user/hduser/MySort/input/data.txt:0+891082
2014-06-11 17:54:53,904 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
2014-06-11 17:54:53,904 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
2014-06-11 17:54:53,904 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
2014-06-11 17:54:54,028 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper:
2014-06-11 17:54:54,028 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper: Exception raised during data export
2014-06-11 17:54:54,028 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper:
2014-06-11 17:54:54,028 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper: Exception:
java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:839)
at mysort.__loadFromFields(mysort.java:198)
at mysort.parse(mysort.java:147)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
2014-06-11 17:54:54,030 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper: On input: ustNU 45
2014-06-11 17:54:54,031 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper: On input file: hdfs://localhost:9000/user/hduser/MySort/input/data.txt
2014-06-11 17:54:54,031 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper: At position 0
2014-06-11 17:54:54,031 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper:
2014-06-11 17:54:54,031 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper: Currently processing split:
2014-06-11 17:54:54,031 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper: Paths:/user/hduser/MySort/input/data.txt:0+891082
2014-06-11 17:54:54,031 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper:
2014-06-11 17:54:54,031 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper: This issue might not necessarily be caused by current input
2014-06-11 17:54:54,031 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper: due to the batching nature of export.
2014-06-11 17:54:54,031 ERROR [main] org.apache.sqoop.mapreduce.TextExportMapper:
2014-06-11 17:54:54,032 INFO [Thread-12] org.apache.sqoop.mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
2014-06-11 17:54:54,033 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hduser (auth:SIMPLE) cause:java.io.IOException: Can't export data, please check task tracker logs
2014-06-11 17:54:54,033 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:839)
at mysort.__loadFromFields(mysort.java:198)
at mysort.parse(mysort.java:147)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
2014-06-11 17:54:54,037 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
Any help in resolving the issue is appreciated.

Here is the complete procedure for installation and import and export commands for Sqoop. Hope fully it may be helpful to some one. This one is tried and tested by me and actually works.
Download : apache.mirrors.tds.net/sqoop/1.4.4/sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
sudo mv sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz /usr/lib/sqoop
copy paste followingtwo lines in .bashrc
export SQOOP_HOME=/usr/lib/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
Go to /usr/lib/sqoop/conf folder and copy sqoop-env-template.sh to new file sqoop-env.sh and modify export HADOOP_HOME ,HBASE_HOME,etc to the installation directory
Download the postgresql conector jar file from jdbc.postgresql.org/download/postgresql-9.3-1101.jdbc41.jar
create a directory manager.d in sqoop/conf/
create a file postgresql in conf/ and add the following line in it
org.postgresql.Driver=/usr/lib/sqoop/lib/postgresql-9.3-1101.jdbc41.jar
name the connector.jar file accordingly
For Export
Create a user in postgres:
createuser -P -s -e ace
Enter password for new role: ace
Enter it again: ace
CREATE DATABASE testdb OWNER ace TABLESPACE ace;
create table stud1(id int,name text);
Create a file student.txt
Add lines such as:
1,Ace
2,iloveapis
hadoop fs -put student.txt
sqoop export --connect jdbc:postgresql://localhost:5432/testdb --username ace --password ace --table stud1 -m 1 --export-dir student.txt
check in postgres: Select * from stud1;
For Import:
sqoop import --connect jdbc:postgresql://localhost:5432/testdb --username akshay --password akshay --table stud1 --m 1
hadoop fs -ls -R stud1
Expected Output:
-rw-r--r-- 1 hduser supergroup 0 2014-06-13 18:10 stud1/_SUCCESS
-rw-r--r-- 1 hduser supergroup 21 2014-06-13 18:10 stud1/part-m-00000
hadoop fs -cat stud1/part-m-00000
Expected Output:
1,Ace
2,iloveapis
hadoop fs -copyToLocal stud1/part-m-00000 $HOME/imported_data.txt

Related

Issue while setting up nifi cluster

I followed the below tutorial to setup nifi cluster. Performed everything as mentioned, but receiving the below error.
https://pierrevillard.com/2016/08/13/apache-nifi-1-0-0-cluster-setup/
But the zookeeper was unable to elect a leader and getting the following error message.
2018-04-20 09:27:40,942 INFO [main] org.wali.MinimalLockingWriteAheadLog Successfully recovered 0 records in 3 milliseconds 2018-04-20 09:27:41,007 INFO [main] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog#27caa186 checkpointed with 0 Records and 0 Swap Files in 65 milliseconds (Stop-the-world time = 1 milliseconds, Clear Edit Logs time = 7 millis), max Transaction ID -1 2018-04-20 09:27:41,300 INFO [main] o.a.z.server.DatadirCleanupManager autopurge.snapRetainCount set to 30 2018-04-20 09:27:41,300 INFO [main] o.a.z.server.DatadirCleanupManager autopurge.purgeInterval set to 24 2018-04-20 09:27:41,311 INFO [main] o.a.n.c.s.server.ZooKeeperStateServer Starting Embedded ZooKeeper Peer 2018-04-20 09:27:41,319 INFO [PurgeTask] o.a.z.server.DatadirCleanupManager Purge task started. 2018-04-20 09:27:41,343 INFO [PurgeTask] o.a.z.server.DatadirCleanupManager Purge task completed. 2018-04-20 09:27:41,621 INFO [main] o.apache.nifi.controller.FlowController Checking if there is already a Cluster Coordinator Elected... 2018-04-20 09:27:41,907 INFO [main] o.a.c.f.imps.CuratorFrameworkImpl Starting 2018-04-20 09:27:49,882 WARN [main] o.a.n.c.l.e.CuratorLeaderElectionManager Unable to determine the Elected Leader for role 'Cluster Coordinator' due to org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /nifi/leaders/Cluster Coordinator; assuming no leader has been elected 2018-04-20 09:27:49,885 INFO [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl backgroundOperationsLoop exiting 2018-04-20 09:27:49,986 INFO [main] o.apache.nifi.controller.FlowController It appears that no Cluster Coordinator has been Elected yet. Registering for Cluster Coordinator Role. 2018-04-20 09:27:49,988 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=true] Registered new Leader Selector for role Cluster Coordinator; this node is an active participant in the election. 2018-04-20 09:27:49,988 INFO [main] o.a.c.f.imps.CuratorFrameworkImpl Starting 2018-04-20 09:27:49,997 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Registered new Leader Selector for role Cluster Coordinator; this node is an active participant in the election. 2018-04-20 09:27:49,997 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] started 2018-04-20 09:27:49,997 INFO [main] o.a.n.c.c.h.AbstractHeartbeatMonitor Heartbeat Monitor started 2018-04-20 09:27:58,179 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#76d9fe26{/nifi-api,file:///root/nifi-1.6.0/work/jetty/nifi-web-api-1.6.0.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.6.0.nar-unpacked/META-INF/bundled-dependencies/nifi-web-api-1.6.0.war} 2018-04-20 09:27:59,657 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=861ms 2018-04-20 09:27:59,834 INFO [main] o.e.j.C./nifi-content-viewer No Spring WebApplicationInitializer types detected on classpath 2018-04-20 09:27:59,840 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#43b41cb9{/nifi-content-viewer,file:///root/nifi-1.6.0/work/jetty/nifi-web-content-viewer-1.6.0.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.6.0.nar-unpacked/META-INF/bundled-dependencies/nifi-web-content-viewer-1.6.0.war} 2018-04-20 09:27:59,863 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.s.h.ContextHandler#49825659{/nifi-docs,null,AVAILABLE} 2018-04-20 09:28:00,079 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=69ms 2018-04-20 09:28:00,083 INFO [main] o.e.jetty.ContextHandler./nifi-docs No Spring WebApplicationInitializer types detected on classpath 2018-04-20 09:28:00,171 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#2ab0ca04{/nifi-docs,file:///root/nifi-1.6.0/work/jetty/nifi-web-docs-1.6.0.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.6.0.nar-unpacked/META-INF/bundled-dependencies/nifi-web-docs-1.6.0.war} 2018-04-20 09:28:00,343 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=119ms 2018-04-20 09:28:00,344 INFO [main] org.eclipse.jetty.ContextHandler./ No Spring WebApplicationInitializer types detected on classpath 2018-04-20 09:28:00,454 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#67e255cf{/,file:///root/nifi-1.6.0/work/jetty/nifi-web-error-1.6.0.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.6.0.nar-unpacked/META-INF/bundled-dependencies/nifi-web-error-1.6.0.war} 2018-04-20 09:28:00,541 INFO [main] o.eclipse.jetty.server.AbstractConnector Started ServerConnector#6d9f624{HTTP/1.1,[http/1.1]}{node-1:8080} 2018-04-20 09:28:00,559 INFO [main] org.eclipse.jetty.server.Server Started #98628ms 2018-04-20 09:28:00,599 INFO [main] org.apache.nifi.web.server.JettyServer Loading Flow... 2018-04-20 09:28:00,688 INFO [main] org.apache.nifi.io.socket.SocketListener Now listening for connections from nodes on port 9999 2018-04-20 09:28:00,916 INFO [main] o.apache.nifi.controller.FlowController Successfully synchronized controller with proposed flow 2018-04-20 09:28:01,337 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: node-1:8080 2018-04-20 09:28:07,922 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED 2018-04-20 09:28:07,927 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener#188207a7 Connection State changed to SUSPENDED 2018-04-20 09:28:07,930 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-04-19 15:56:55,209 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I set the following properties on all 3 nodes:
mkdir ./state
mkdir ./state/zookeeper
echo 1 > ./state/zookeeper/myid (2 & 3 for other nodes)
in zookeeper.properties
server.1=node-1:2888:3888
server.2=node-2:2888:3888
server.3=node-3:2888:3888
in nifi.properties
nifi.state.management.embedded.zookeeper.start=true
nifi.zookeeper.connect.string=node-1:2181,node-2:2181,node-3:2181
nifi.cluster.protocol.is.secure=false
nifi.cluster.is.node=true
nifi.cluster.node.address=node-1(node-2 & node-3)
nifi.cluster.node.protocol.port=9999
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.firewall.file=
nifi.remote.input.host=node-1(node-2 & node-3)
nifi.remote.input.secure=false
nifi.remote.input.socket.port=9998
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec
nifi.web.http.host=node-1(node-2 & node-3)
After Bryan's comment, got reminded that I forgot to disable the iptables (I guess I can even open the ports but since this is for my practice I am disabling the iptables) and its working now.

Spark submit runs successfully but when submitted through oozie it fails to connect to hive

I am using CDH 5.9.0, Spark 1.6 and Scala 2.10.0. I have created a scala and spark program to create a table and load data from a file to hive. When I run it using spark submit, it completes. But the same program when submitted through oozie, it throws the below exception.
Below is the exception.
Log Type: stdout
Log Upload Time: Fri Oct 27 10:08:28 -0400 2017
Log Length: 172584
2017-10-27 10:08:20,652 INFO [main] yarn.ApplicationMaster (SignalLogger.scala:register(47)) - Registered signal handlers for [TERM, HUP, INT]
2017-10-27 10:08:21,306 INFO [main] yarn.ApplicationMaster (Logging.scala:logInfo(58)) - ApplicationAttemptId: appattempt_1507999204018_0292_000001
2017-10-27 10:08:21,952 INFO [main] spark.SecurityManager (Logging.scala:logInfo(58)) - Changing view acls to: username
2017-10-27 10:08:21,953 INFO [main] spark.SecurityManager (Logging.scala:logInfo(58)) - Changing modify acls to: username
2017-10-27 10:08:21,956 INFO [main] spark.SecurityManager (Logging.scala:logInfo(58)) - SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(username); users with modify permissions: Set(username)
2017-10-27 10:08:21,970 INFO [main] yarn.ApplicationMaster (Logging.scala:logInfo(58)) - Starting the user application in a separate Thread
2017-10-27 10:08:21,997 INFO [main] yarn.ApplicationMaster (Logging.scala:logInfo(58)) - Waiting for spark context initialization
2017-10-27 10:08:21,998 INFO [main] yarn.ApplicationMaster (Logging.scala:logInfo(58)) - Waiting for spark context initialization ...
2017-10-27 10:08:22,308 WARN [Driver] security.UserGroupInformation (UserGroupInformation.java:doAs(1701)) - PriviledgedActionException as:username (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
2017-10-27 10:08:22,309 WARN [Driver] ipc.Client (Client.java:run(682)) - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
2017-10-27 10:08:22,310 WARN [Driver] security.UserGroupInformation (UserGroupInformation.java:doAs(1701)) - PriviledgedActionException as:username (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
2017-10-27 10:08:22,391 INFO [Driver] spark.SparkContext (Logging.scala:logInfo(58)) - Running Spark version 1.6.0
2017-10-27 10:08:22,417 INFO [Driver] spark.SecurityManager (Logging.scala:logInfo(58)) - Changing view acls to: username
2017-10-27 10:08:22,418 INFO [Driver] spark.SecurityManager (Logging.scala:logInfo(58)) - Changing modify acls to: username
2017-10-27 10:08:22,418 INFO [Driver] spark.SecurityManager (Logging.scala:logInfo(58)) - SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(username); users with modify permissions: Set(username)
2017-10-27 10:08:22,572 INFO [Driver] util.Utils (Logging.scala:logInfo(58)) - Successfully started service 'sparkDriver' on port 44049.
2017-10-27 10:08:22,901 INFO [sparkDriverActorSystem-akka.actor.default-dispatcher-4] slf4j.Slf4jLogger (Slf4jLogger.scala:applyOrElse(80)) - Slf4jLogger started
2017-10-27 10:08:22,936 INFO [sparkDriverActorSystem-akka.actor.default-dispatcher-4] Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Starting remoting
2017-10-27 10:08:23,062 INFO [sparkDriverActorSystem-akka.actor.default-dispatcher-4] Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem#a.b.c.d:38305]
2017-10-27 10:08:23,064 INFO [sparkDriverActorSystem-akka.actor.default-dispatcher-4] Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem#a.b.c.d:38305]
2017-10-27 10:08:23,174 INFO [Driver] util.Utils (Logging.scala:logInfo(58)) - Successfully started service 'sparkDriverActorSystem' on port 38305.
2017-10-27 10:08:23,195 INFO [Driver] spark.SparkEnv (Logging.scala:logInfo(58)) - Registering MapOutputTracker
2017-10-27 10:08:23,207 INFO [Driver] spark.SparkEnv (Logging.scala:logInfo(58)) - Registering BlockManagerMaster
2017-10-27 10:08:23,216 INFO [Driver] storage.DiskBlockManager (Logging.scala:logInfo(58)) - Created local directory at /data/01/yarn/nm/usercache/username/appcache/application_1507999204018_0292/blockmgr-ba42749b-3498-4c1d-ba8b-dc6720e815a0
2017-10-27 10:08:23,217 INFO [Driver] storage.DiskBlockManager (Logging.scala:logInfo(58)) - Created local directory at /data/02/yarn/nm/usercache/username/appcache/application_1507999204018_0292/blockmgr-d9375d30-699d-4e40-8b42-559f79f27f85
2017-10-27 10:08:23,217 INFO [Driver] storage.DiskBlockManager (Logging.scala:logInfo(58)) - Created local directory at /data/03/yarn/nm/usercache/username/appcache/application_1507999204018_0292/blockmgr-fc2caf3b-3fa0-4f1e-be01-b33b6f6d52d5
2017-10-27 10:08:23,217 INFO [Driver] storage.DiskBlockManager (Logging.scala:logInfo(58)) - Created local directory at /data/04/yarn/nm/usercache/username/appcache/application_1507999204018_0292/blockmgr-450319a4-2d4f-4159-a633-3dd2a71bafe1
2017-10-27 10:08:23,217 INFO [Driver] storage.DiskBlockManager (Logging.scala:logInfo(58)) - Created local directory at /data/05/yarn/nm/usercache/username/appcache/application_1507999204018_0292/blockmgr-c3dbf9b3-cb95-4104-b4bf-9e7b1987e210
2017-10-27 10:08:23,217 INFO [Driver] storage.DiskBlockManager (Logging.scala:logInfo(58)) - Created local directory at /data/06/yarn/nm/usercache/username/appcache/application_1507999204018_0292/blockmgr-5d9c58a6-29bb-4e8e-a8fb-3720db0004d4
2017-10-27 10:08:23,218 INFO [Driver] storage.DiskBlockManager (Logging.scala:logInfo(58)) - Created local directory at /data/07/yarn/nm/usercache/username/appcache/application_1507999204018_0292/blockmgr-999eecaf-f183-4ede-8845-eeb57a87276b
2017-10-27 10:08:23,218 INFO [Driver] storage.DiskBlockManager (Logging.scala:logInfo(58)) - Created local directory at /data/08/yarn/nm/usercache/username/appcache/application_1507999204018_0292/blockmgr-216d2449-14b1-45aa-b6c6-d6271815f485
2017-10-27 10:08:23,221 INFO [Driver] storage.MemoryStore (Logging.scala:logInfo(58)) - MemoryStore started with capacity 491.7 MB
2017-10-27 10:08:23,283 INFO [Driver] spark.SparkEnv (Logging.scala:logInfo(58)) - Registering OutputCommitCoordinator
2017-10-27 10:08:23,394 INFO [Driver] ui.JettyUtils (Logging.scala:logInfo(58)) - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
2017-10-27 10:08:23,413 INFO [Driver] server.Server (Server.java:doStart(272)) - jetty-8.y.z-SNAPSHOT
2017-10-27 10:08:23,448 INFO [Driver] server.AbstractConnector (AbstractConnector.java:doStart(338)) - Started SelectChannelConnector#0.0.0.0:36123
2017-10-27 10:08:23,448 INFO [Driver] util.Utils (Logging.scala:logInfo(58)) - Successfully started service 'SparkUI' on port 36123.
2017-10-27 10:08:23,449 INFO [Driver] ui.SparkUI (Logging.scala:logInfo(58)) - Started SparkUI at http://a.b.c.d:36123
2017-10-27 10:08:23,498 INFO [Driver] cluster.YarnClusterScheduler (Logging.scala:logInfo(58)) - Created YarnClusterScheduler
2017-10-27 10:08:23,524 INFO [Driver] util.Utils (Logging.scala:logInfo(58)) - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44418.
2017-10-27 10:08:23,525 INFO [Driver] netty.NettyBlockTransferService (Logging.scala:logInfo(58)) - Server created on 44418
2017-10-27 10:08:23,527 INFO [Driver] storage.BlockManager (Logging.scala:logInfo(58)) - external shuffle service port = 7337
2017-10-27 10:08:23,527 INFO [Driver] storage.BlockManagerMaster (Logging.scala:logInfo(58)) - Trying to register BlockManager
2017-10-27 10:08:23,530 INFO [dispatcher-event-loop-11] storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager a.b.c.d:44418 with 491.7 MB RAM, BlockManagerId(driver, a.b.c.d, 44418)
2017-10-27 10:08:23,533 INFO [Driver] storage.BlockManagerMaster (Logging.scala:logInfo(58)) - Registered BlockManager
2017-10-27 10:08:24,106 INFO [Driver] scheduler.EventLoggingListener (Logging.scala:logInfo(58)) - Logging events to hdfs://.../user/spark/applicationHistory/application_1507999204018_0292_1
2017-10-27 10:08:24,133 INFO [Driver] cluster.YarnClusterSchedulerBackend (Logging.scala:logInfo(58)) - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
2017-10-27 10:08:24,133 INFO [Driver] cluster.YarnClusterScheduler (Logging.scala:logInfo(58)) - YarnClusterScheduler.postStartHook done
2017-10-27 10:08:24,140 INFO [dispatcher-event-loop-13] cluster.YarnSchedulerBackend$YarnSchedulerEndpoint (Logging.scala:logInfo(58)) - ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM#a.b.c.d:44049)
2017-10-27 10:08:24,191 INFO [main] yarn.YarnRMClient (Logging.scala:logInfo(58)) - Registering the ApplicationMaster
2017-10-27 10:08:24,295 INFO [main] yarn.ApplicationMaster (Logging.scala:logInfo(58)) - Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
2017-10-27 10:08:25,107 INFO [Driver] hive.HiveContext (Logging.scala:logInfo(58)) - Initializing execution hive, version 1.1.0
2017-10-27 10:08:25,146 INFO [Driver] client.ClientWrapper (Logging.scala:logInfo(58)) - Inspected Hadoop version: 2.6.0-cdh5.9.0
2017-10-27 10:08:25,147 INFO [Driver] client.ClientWrapper (Logging.scala:logInfo(58)) - Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0-cdh5.9.0
2017-10-27 10:08:25,582 INFO [Driver] metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(644)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2017-10-27 10:08:25,600 INFO [Driver] metastore.ObjectStore (ObjectStore.java:initialize(333)) - ObjectStore, initialize called
2017-10-27 10:08:25,671 WARN [Driver] DataNucleus.General (Log4JLogger.java:warn(96)) - Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/data/03/yarn/nm/usercache/username/appcache/application_1507999204018_0292/container_e69_1507999204018_0292_01_000001/datanucleus-core-3.2.2.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/data/05/yarn/nm/filecache/507/datanucleus-core-3.2.2.jar."
2017-10-27 10:08:25,687 WARN [Driver] DataNucleus.General (Log4JLogger.java:warn(96)) - Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/data/03/yarn/nm/usercache/username/appcache/application_1507999204018_0292/container_e69_1507999204018_0292_01_000001/datanucleus-api-jdo-3.2.1.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/data/07/yarn/nm/filecache/582/datanucleus-api-jdo-3.2.1.jar."
2017-10-27 10:08:25,688 WARN [Driver] DataNucleus.General (Log4JLogger.java:warn(96)) - Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/data/08/yarn/nm/filecache/554/datanucleus-rdbms-3.2.1.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/data/03/yarn/nm/usercache/username/appcache/application_1507999204018_0292/container_e69_1507999204018_0292_01_000001/datanucleus-rdbms-3.2.1.jar."
2017-10-27 10:08:25,709 INFO [Driver] DataNucleus.Persistence (Log4JLogger.java:info(77)) - Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
2017-10-27 10:08:25,710 INFO [Driver] DataNucleus.Persistence (Log4JLogger.java:info(77)) - Property datanucleus.cache.level2 unknown - will be ignored
2017-10-27 10:08:26,178 WARN [Driver] bonecp.BoneCPConfig (BoneCPConfig.java:sanitize(1537)) - Max Connections < 1. Setting to 20
2017-10-27 10:08:26,180 ERROR [Driver] Datastore.Schema (Log4JLogger.java:error(125)) - Failed initialising database.
Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=/data/03/yarn/nm/usercache/username/appcache/application_1507999204018_0292/container_e69_1507999204018_0292_01_000001/tmp/spark-633fb1f8-1f38-44ac-a54e-81465354bedc/metastore;create=true, username = APP. Terminating connection pool. Original Exception: ------
java.sql.SQLException: No suitable driver found for jdbc:derby:;databaseName=/data/03/yarn/nm/usercache/username/appcache/application_1507999204018_0292/container_e69_1507999204018_0292_01_000001/tmp/spark-633fb1f8-1f38-44ac-a54e-81465354bedc/metastore;create=true
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:254)
at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:305)
at com.jolbox.bonecp.BoneCPDataSource.maybeInit(BoneCPDataSource.java:150)
at com.jolbox.bonecp.BoneCPDataSource.getConnection(BoneCPDataSource.java:112)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:479)
at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:304)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1069)
at org.datanucleus.NucleusContext.initialise(NucleusContext.java:359)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:768)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:326)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
at java.security.AccessController.doPrivileged(Native Method)
at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:411)
at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:440)
at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:335)
at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:291)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:648)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:626)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:675)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:484)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:78)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5999)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:203)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1528)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:67)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:82)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3037)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3056)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3281)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:217)
at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:201)
at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:324)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:285)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:260)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:514)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:220)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:210)
at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:464)
at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:463)
at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
at prfrx.externaltableerror$.main(externaltableerror.scala:28)
at prfrx.externaltableerror.main(externaltableerror.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:312)
at com.jolbox.bonecp.BoneCPDataSource.maybeInit(BoneCPDataSource.java:150)
at com.jolbox.bonecp.BoneCPDataSource.getConnection(BoneCPDataSource.java:112)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:479)
at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:304)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1069)
at org.datanucleus.NucleusContext.initialise(NucleusContext.java:359)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:768)
... 62 more
Caused by: java.sql.SQLException: No suitable driver found for jdbc:derby:;databaseName=/data/03/yarn/nm/usercache/username/appcache/application_1507999204018_0292/container_e69_1507999204018_0292_01_000001/tmp/spark-633fb1f8-1f38-44ac-a54e-81465354bedc/metastore;create=true
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:254)
at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:305)
Below is the code I am using.
object externaltableerror {
def main(args: Array[String]) {
val conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://...")
conf.addResource("hdfs://.../core-site.xml");
conf.addResource("hdfs://.../hdfs-site.xml");
conf.addResource("hdfs://.../hive-site.xml");
val fs = FileSystem.get(conf)
val os = fs.create(new Path("/.../Error.txt"))
try {
//System.setProperty("hive.metastore.uris", "thrift://...");
val sc = new SparkContext(new SparkConf().setAppName("withhive"))
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val files = sc.textFile("hdfs://.../Example.txt").first()
val rdd = sc.parallelize(List(files))
val fm = rdd.flatMap(line => line.split("\t")).map(x => x.concat(" string"))
val alternative = fm.reduce((s1, s2) => s1 + "," + s2)
val ddl = "Create external table table_name(" + alternative + ") ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 'hdfs://.../' tblproperties (\"skip.header.line.count\"=\"1\")"
hiveContext.sql(ddl)
sc.stop()
} catch{
// case e : Exception => new PrintWriter("hdfs://.../Error.txt") { write(e.getStackTrace.mkString("\n")); close }
// println("H" + e.getStackTrace)
case e : Exception => os.write(e.getStackTrace.mkString("\n").getBytes)
}
}
}
Any suggestions on how to get the job running with oozie will be of great help. Thanks!
I had the same issue - I fixed it by using the parameter --files /etc/hive/conf/hive-site.xml in my spark-submit job. (first I tried it in the shell and then in oozie, because I launched a .sh file that contains the spark-submit sencence)

Storing HDFS data into MongoDB using Pig

I am new to Hadoop and having a requirement to store the Hadoop data into MongoDB. Here I am using Pig to store the data in Hadoop into MongoDB.
I downloaded and registered the following drivers to do this in Pig Grunt shell with the help of given command,
REGISTER /home/miracle/Downloads/mongo-hadoop-pig-2.0.2.jar
REGISTER /home/miracle/Downloads/mongo-java-driver-3.4.2.jar
REGISTER /home/miracle/Downloads/mongo-hadoop-core-2.0.2.jar
After this I successfully got the data from MongoDB using the following command.
raw = LOAD 'mongodb://localhost:27017/pavan.pavan.in' USING com.mongodb.hadoop.pig.MongoLoader;
Then I tried the following command to insert the data from pig bag to MongoDB and got succeeded.
STORE student INTO 'mongodb://localhost:27017/pavan.persons_info' USING com.mongodb.hadoop.pig.MongoInsertStorage('','');
Then I am trying the Mongo Update using the below command.
STORE student INTO 'mongodb://localhost:27017/pavan.persons_info1' USING com.mongodb.hadoop.pig.MongoUpdateStorage(' ','{first:"\$firstname", last:"\$lastname", phone:"\$phone", city:"\$city"}','firstname: chararray,lastname: chararray,phone: chararray,city: chararray');
But I am getting the below error while performing the above command.
2017-03-22 11:16:42,516 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-03-22 11:16:43,064 [main] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: ; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:43,180 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2017-03-22 11:16:43,306 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-03-22 11:16:43,308 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2017-03-22 11:16:43,309 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2017-03-22 11:16:43,310 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2017-03-22 11:16:43,314 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2017-03-22 11:16:43,314 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2017-03-22 11:16:43,415 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-03-22 11:16:43,419 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-03-22 11:16:43,423 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2017-03-22 11:16:43,425 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-03-22 11:16:43,438 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2017-03-22 11:16:43,603 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/miracle/Downloads/mongo-java-driver-3.0.4.jar to DistributedCache through /tmp/temp159471787/tmp643027494/mongo-java-driver-3.0.4.jar
2017-03-22 11:16:43,687 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/miracle/Downloads/mongo-hadoop-core-2.0.2.jar to DistributedCache through /tmp/temp159471787/tmp-1745369112/mongo-hadoop-core-2.0.2.jar
2017-03-22 11:16:43,822 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/miracle/Downloads/mongo-hadoop-pig-2.0.2.jar to DistributedCache through /tmp/temp159471787/tmp116725398/mongo-hadoop-pig-2.0.2.jar
2017-03-22 11:16:44,693 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0/pig-0.16.0-core-h2.jar to DistributedCache through /tmp/temp159471787/tmp499355324/pig-0.16.0-core-h2.jar
2017-03-22 11:16:44,762 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp159471787/tmp413788756/automaton-1.11-8.jar
2017-03-22 11:16:44,830 [DataStreamer for file /tmp/temp159471787/tmp-380031198/antlr-runtime-3.4.jar block BP-1303579226-127.0.1.1-1489750707340:blk_1073742392_1568] WARN org.apache.hadoop.hdfs.DFSClient - Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1249)
at java.lang.Thread.join(Thread.java:1323)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeInternal(DFSOutputStream.java:577)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:573)
2017-03-22 11:16:44,856 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp159471787/tmp-380031198/antlr-runtime-3.4.jar
2017-03-22 11:16:44,960 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp159471787/tmp1163422388/joda-time-2.9.3.jar
2017-03-22 11:16:44,996 [main] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: ; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:45,004 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2017-03-22 11:16:45,147 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2017-03-22 11:16:45,166 [JobControl] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-03-22 11:16:45,253 [JobControl] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: ; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:45,318 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2017-03-22 11:16:45,572 [JobControl] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2017-03-22 11:16:45,579 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-03-22 11:16:45,581 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2017-03-22 11:16:45,593 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2017-03-22 11:16:45,690 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2017-03-22 11:16:45,884 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local1070093788_0006
2017-03-22 11:16:47,476 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606120/mongo-java-driver-3.0.4.jar <- /home/miracle/mongo-java-driver-3.0.4.jar
2017-03-22 11:16:47,534 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp643027494/mongo-java-driver-3.0.4.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606120/mongo-java-driver-3.0.4.jar
2017-03-22 11:16:47,534 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606121/mongo-hadoop-core-2.0.2.jar <- /home/miracle/mongo-hadoop-core-2.0.2.jar
2017-03-22 11:16:47,674 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp-1745369112/mongo-hadoop-core-2.0.2.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606121/mongo-hadoop-core-2.0.2.jar
2017-03-22 11:16:48,194 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606122/mongo-hadoop-pig-2.0.2.jar <- /home/miracle/mongo-hadoop-pig-2.0.2.jar
2017-03-22 11:16:48,201 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp116725398/mongo-hadoop-pig-2.0.2.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606122/mongo-hadoop-pig-2.0.2.jar
2017-03-22 11:16:48,329 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606123/pig-0.16.0-core-h2.jar <- /home/miracle/pig-0.16.0-core-h2.jar
2017-03-22 11:16:48,337 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp499355324/pig-0.16.0-core-h2.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606123/pig-0.16.0-core-h2.jar
2017-03-22 11:16:48,338 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606124/automaton-1.11-8.jar <- /home/miracle/automaton-1.11-8.jar
2017-03-22 11:16:48,370 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp413788756/automaton-1.11-8.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606124/automaton-1.11-8.jar
2017-03-22 11:16:48,371 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606125/antlr-runtime-3.4.jar <- /home/miracle/antlr-runtime-3.4.jar
2017-03-22 11:16:48,384 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp-380031198/antlr-runtime-3.4.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606125/antlr-runtime-3.4.jar
2017-03-22 11:16:48,389 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606126/joda-time-2.9.3.jar <- /home/miracle/joda-time-2.9.3.jar
2017-03-22 11:16:48,409 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp1163422388/joda-time-2.9.3.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606126/joda-time-2.9.3.jar
2017-03-22 11:16:48,798 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606120/mongo-java-driver-3.0.4.jar
2017-03-22 11:16:48,803 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606121/mongo-hadoop-core-2.0.2.jar
2017-03-22 11:16:48,803 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606122/mongo-hadoop-pig-2.0.2.jar
2017-03-22 11:16:48,804 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606123/pig-0.16.0-core-h2.jar
2017-03-22 11:16:48,806 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606124/automaton-1.11-8.jar
2017-03-22 11:16:48,807 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606125/antlr-runtime-3.4.jar
2017-03-22 11:16:48,807 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606126/joda-time-2.9.3.jar
2017-03-22 11:16:48,807 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/
2017-03-22 11:16:48,809 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local1070093788_0006
2017-03-22 11:16:48,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases student1
2017-03-22 11:16:48,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: student1[7,11] C: R:
2017-03-22 11:16:48,889 [Thread-455] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null
2017-03-22 11:16:48,915 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2017-03-22 11:16:48,915 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_local1070093788_0006]
2017-03-22 11:16:48,999 [Thread-455] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: ; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:49,011 [Thread-455] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-03-22 11:16:49,013 [Thread-455] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2017-03-22 11:16:49,013 [Thread-455] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-03-22 11:16:49,054 [Thread-455] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter is org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter
2017-03-22 11:16:49,094 [Thread-455] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:49,104 [Thread-455] INFO com.mongodb.hadoop.output.MongoOutputCommitter - Setting up job.
2017-03-22 11:16:49,126 [Thread-455] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
2017-03-22 11:16:49,127 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local1070093788_0006_m_000000_0
2017-03-22 11:16:49,253 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:49,279 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:49,290 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.output.MongoOutputCommitter - Setting up task.
2017-03-22 11:16:49,296 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ]
2017-03-22 11:16:49,340 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
Total Length = 212
Input split[0]:
Length = 212
ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit
Locations:
-----------------------
2017-03-22 11:16:49,415 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2017-03-22 11:16:49,417 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed hdfs://localhost:9000/input/student_dir/student_Info.txt:0+212
2017-03-22 11:16:49,459 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:49,684 [LocalJobRunner Map Task Executor #0] INFO org.mongodb.driver.cluster - Cluster created with settings {hosts=[localhost:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
2017-03-22 11:16:50,484 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.output.MongoRecordWriter - Writing to temporary file: /tmp/hadoop-miracle/attempt_local1070093788_0006_m_000000_0/_MONGO_OUT_TEMP/_out
2017-03-22 11:16:50,516 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Preparing to write to com.mongodb.hadoop.output.MongoRecordWriter#1fd6ae6
2017-03-22 11:16:50,736 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (Tenured Gen) of size 699072512 to monitor. collectionUsageThreshold = 489350752, usageThreshold = 489350752
2017-03-22 11:16:50,739 [LocalJobRunner Map Task Executor #0] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2017-03-22 11:16:50,746 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: student1[7,11] C: R:
2017-03-22 11:16:50,880 [Thread-455] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2017-03-22 11:16:50,919 [Thread-455] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:50,963 [Thread-455] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1070093788_0006
java.lang.Exception: java.io.IOException: java.io.IOException: Couldn't convert tuple to bson:
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: java.io.IOException: Couldn't convert tuple to bson:
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:83)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:144)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:261)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:65)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Couldn't convert tuple to bson:
at com.mongodb.hadoop.pig.MongoUpdateStorage.putNext(MongoUpdateStorage.java:165)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:75)
... 17 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:117)
at com.mongodb.hadoop.pig.JSONPigReplace.substitute(JSONPigReplace.java:120)
at com.mongodb.hadoop.pig.MongoUpdateStorage.putNext(MongoUpdateStorage.java:142)
... 18 more
2017-03-22 11:16:53,944 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2017-03-22 11:16:53,944 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local1070093788_0006 has failed! Stop running all dependent jobs
2017-03-22 11:16:53,945 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2017-03-22 11:16:53,949 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-03-22 11:16:53,954 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-03-22 11:16:53,962 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2017-03-22 11:16:53,981 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.7.3 0.16.0 miracle 2017-03-22 11:16:43 2017-03-22 11:16:53 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local1070093788_0006 student1 MAP_ONLY Message: Job failed! mongodb://localhost:27017/pavan.persons_info1,
Input(s):
Failed to read data from "hdfs://localhost:9000/input/student_dir/student_Info.txt"
Output(s):
Failed to produce result in "mongodb://localhost:27017/pavan.persons_info1"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local1070093788_0006
2017-03-22 11:16:53,983 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2017-03-22 11:16:54,004 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias student1
Details at logfile: /home/miracle/pig_1490205716326.log
Here is the input that I have dumped here,
Input(s):
Successfully read 6 records (5378419 bytes) from: "hdfs://localhost:9000/input/student_dir/student_Info.txt"
Output(s):
Successfully stored 6 records (5378449 bytes) in: "hdfs://localhost:9000/tmp/temp-1419179625/tmp882976412"
Counters:
Total records written : 6
Total bytes written : 5378449
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local1866034015_0001
2017-03-23 02:43:37,677 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-03-23 02:43:37,681 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-03-23 02:43:37,689 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-03-23 02:43:37,736 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2017-03-23 02:43:37,748 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-03-23 02:43:37,751 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2017-03-23 02:43:37,793 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-03-23 02:43:37,793 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(Rajiv,Reddy,9848022337,Hyderabad)
(siddarth,Battacharya,9848022338,Kolkata)
(Rajesh,Khanna,9848022339,Delhi)
(Preethi,Agarwal,9848022330,Pune)
(Trupthi,Mohanthy,9848022336,Bhuwaneshwar)
(Archana,Mishra,9848022335,Chennai.)
I don't know What to do next please give me any suggestions on this.
I do not use Mongo , but from HBase experience and From the error it looks like few fields are not flatten 'ed enough for it to fit the column family. And also few of columns you are trying to STORE don't exist.
See to DUMP the data instead of STORE and check if it is matching the STORE structure you have applied.
Caused by: java.io.IOException: Couldn't convert tuple to bson: at
com.mongodb.hadoop.pig.MongoUpdateStorage.putNext(MongoUpdateStorage.java:165)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:75)
... 17 more Caused by: java.lang.IndexOutOfBoundsException: Index: 1,
Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:653) at
java.util.ArrayList.get(ArrayList.java:429) at
org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:117) at
com.mongodb.hadoop.pig.JSONPigReplace.substitute(JSONPigReplace.java:120)
at
com.mongodb.hadoop.pig.MongoUpdateStorage.putNext(MongoUpdateStorage.java:142)
... 18 more

RuntimeException when nutch generate

I'm new to nutch. I have installed nutch 2.3.1 and configure it to use mongodb. The inject operation was successful but when I try to generate it generate an exception (see below).
NB : This error is generated with a seed file containing 60K urls. So I've tried with 100 urls and everything went well.
Do you have an idea what is the cause of this error ? Thanks !!!
2016-12-30 00:01:48,446 INFO crawl.GeneratorJob - GeneratorJob: starting at 2016-12-30 00:01:48
2016-12-30 00:01:48,447 INFO crawl.GeneratorJob - GeneratorJob: Selecting best-scoring urls due for fetch.
2016-12-30 00:01:48,447 INFO crawl.GeneratorJob - GeneratorJob: starting
2016-12-30 00:01:48,448 INFO crawl.GeneratorJob - GeneratorJob: filtering: true
2016-12-30 00:01:48,448 INFO crawl.GeneratorJob - GeneratorJob: normalizing: true
2016-12-30 00:01:48,448 INFO crawl.GeneratorJob - GeneratorJob: topN: 100000
2016-12-30 00:01:48,816 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-12-30 00:01:48,857 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2016-12-30 00:01:48,867 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2016-12-30 00:01:48,867 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2016-12-30 00:01:51,568 WARN conf.Configuration - file:/tmp/hadoop-mehdi/mapred/staging/mehdi1740651658/.staging/job_local1740651658_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2016-12-30 00:01:51,573 WARN conf.Configuration - file:/tmp/hadoop-mehdi/mapred/staging/mehdi1740651658/.staging/job_local1740651658_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2016-12-30 00:01:51,753 WARN conf.Configuration - file:/tmp/hadoop-mehdi/mapred/local/localRunner/mehdi/job_local1740651658_0001/job_local1740651658_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2016-12-30 00:01:51,760 WARN conf.Configuration - file:/tmp/hadoop-mehdi/mapred/local/localRunner/mehdi/job_local1740651658_0001/job_local1740651658_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2016-12-30 00:01:52,408 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2016-12-30 00:01:52,408 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2016-12-30 00:01:52,408 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2016-12-30 00:01:52,591 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2016-12-30 00:02:03,229 ERROR mapreduce.GoraRecordReader - Error reading Gora records: Read operation to server localhost:27017 failed on database nutch
2016-12-30 00:02:04,607 WARN mapred.LocalJobRunner - job_local1740651658_0001
java.lang.Exception: java.lang.RuntimeException: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch
at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:122)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.mongodb.MongoException$Network: Read operation to server localhost:27017 failed on database nutch
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:298)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:269)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:235)
at com.mongodb.QueryResultIterator.getMore(QueryResultIterator.java:145)
at com.mongodb.QueryResultIterator.hasNext(QueryResultIterator.java:135)
at com.mongodb.DBCursor._hasNext(DBCursor.java:626)
at com.mongodb.DBCursor.hasNext(DBCursor.java:657)
at org.apache.gora.mongodb.query.MongoDBResult.nextInner(MongoDBResult.java:71)
at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:111)
at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:118)
... 12 more
Caused by: java.io.EOFException
at org.bson.io.Bits.readFully(Bits.java:75)
at org.bson.io.Bits.readFully(Bits.java:50)
at org.bson.io.Bits.readFully(Bits.java:37)
at com.mongodb.Response.<init>(Response.java:42)
at com.mongodb.DBPort$1.execute(DBPort.java:164)
at com.mongodb.DBPort$1.execute(DBPort.java:158)
at com.mongodb.DBPort.doOperation(DBPort.java:187)
at com.mongodb.DBPort.call(DBPort.java:158)
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:290)
... 21 more
2016-12-30 00:02:04,846 ERROR crawl.GeneratorJob - GeneratorJob: java.lang.RuntimeException: job failed: name=nutch-maven-1.0-SNAPSHOT.jar, jobid=job_local1740651658_0001
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:227)
at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:256)
at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:322)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:330)
I figured out that the problem becomes from mongodb version. Nutch uses mongo-java-driver-2.13.1.jar ad I've installed mongodb 3.4.1. So I've installed mongo 2.6.7 and now it works fine. I'll try to update the driver in Nutch and tell you if it works with the new version of mongodb.

sqoop error : password authentication failed

I try to import data from postgres9.3 to hadoop2.7.2 using sqoop1.4.6 on Linux.
When I use the following code, it returns the correct thing.
sqoop-list-databases --connect jdbc:postgresql://localhost:5432/ --username postgres --password "baixinghehe"
The following result shows the username and password are both OK.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/09 15:39:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/09 15:39:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/09 15:39:50 INFO manager.SqlManager: Using default fetchSize of 1000
template1
template0
postgres
learnflask
When I want to use sqoop import like the following code.
sqoop import --connect jdbc:postgresql://localhost:5432/learnflask --username postgres --password baixinghehe --table employee --target-dir /data/employee -m 1
It runs ok at first, until the map part. Here is the error. I really feel strange because the program should not go to the mapreduce part if the password is wrong. Can someone help? Thanks very much.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/09 15:35:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/09 15:35:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/09 15:35:02 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/09 15:35:02 INFO tool.CodeGenTool: Beginning code generation
16/07/09 15:35:03 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "employee" AS t LIMIT 1
16/07/09 15:35:03 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
注: /tmp/sqoop-chenxuyuan/compile/b54e25a7f52c8eb496b2b84945ecd05a/employee.java.Use replace the old api
Using -Xlint:deprecation to recompile
16/07/09 15:35:06 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-chenxuyuan/compile/b54e25a7f52c8eb496b2b84945ecd05a/employee.jar
16/07/09 15:35:06 WARN manager.PostgresqlManager: It looks like you are importing from postgresql.
16/07/09 15:35:06 WARN manager.PostgresqlManager: This transfer can be faster! Use the --direct
16/07/09 15:35:06 WARN manager.PostgresqlManager: option to exercise a postgresql-specific fast path.
16/07/09 15:35:06 INFO mapreduce.ImportJobBase: Beginning import of employee
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/07/09 15:35:06 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/07/09 15:35:07 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/07/09 15:35:07 INFO client.RMProxy: Connecting to ResourceManager at cxy10/192.168.0.110:8032
16/07/09 15:35:33 INFO db.DBInputFormat: Using read commited transaction isolation
16/07/09 15:35:33 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN("id"), MAX("id") FROM "employee"
16/07/09 15:35:33 INFO mapreduce.JobSubmitter: number of splits:2
16/07/09 15:35:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1468047672859_0003
16/07/09 15:35:35 INFO impl.YarnClientImpl: Submitted application application_1468047672859_0003
16/07/09 15:35:35 INFO mapreduce.Job: The url to track the job: http://cxy10:8088/proxy/application_1468047672859_0003/
16/07/09 15:35:35 INFO mapreduce.Job: Running job: job_1468047672859_0003
16/07/09 15:35:46 INFO mapreduce.Job: Job job_1468047672859_0003 running in uber mode : false
16/07/09 15:35:46 INFO mapreduce.Job: map 0% reduce 0%
16/07/09 15:35:55 INFO mapreduce.Job: Task Id : attempt_1468047672859_0003_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: org.postgresql.util.PSQLException: FATAL: password authentication failed for user "postgres"
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:167)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: org.postgresql.util.PSQLException: FATAL: password authentication failed for user "postgres"
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:220)
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:165)
... 9 more
Caused by: org.postgresql.util.PSQLException: FATAL: password authentication failed for user "postgres"
at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:415)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:188)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:64)
at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:143)
at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:29)
at org.postgresql.jdbc3g.AbstractJdbc3gConnection.<init>(AbstractJdbc3gConnection.java:21)
at org.postgresql.jdbc3g.Jdbc3gConnection.<init>(Jdbc3gConnection.java:24)
at org.postgresql.Driver.makeConnection(Driver.java:412)
at org.postgresql.Driver.connect(Driver.java:280)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getConnection(DBConfiguration.java:302)
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:213)
... 10 more