I am able to successfully import data with a CLOB data type from DB2 to Hive. After some processing on the table in Hive, I want to load the table back into DB2.
Command used to import:
$SQOOP_HOME/bin/sqoop import --connect jdbc:db2://192.168.145.64:50000/one --table clobtest --username db2inst1 --password dbuser --hive-import --map-column-hive CLOBB=STRING --inline-lob-limit 155578 --target-dir /tmp/1 --m 1
Command used to export:
$SQOOP_HOME/bin/sqoop export --connect jdbc:db2://192.168.145.64:50000/one --username db2inst1 --password dbuser --export-dir /user/hive/warehouse/clobtest --table clobtest --input-fields-terminated-by '\0001' --input-null-string '\\N' --input-null-non-string '\\N' --m 1
At the time of export, I am getting the error below:
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.IOException: Could not buffer record
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:218)
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84)
... 10 more
Caused by: java.lang.CloneNotSupportedException: com.cloudera.sqoop.lib.ClobRef
at java.lang.Object.clone(Native Method)
at org.apache.sqoop.lib.LobRef.clone(LobRef.java:109)
at clobtest.clone(clobtest.java:222)
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:213)
... 15 more
Any idea about this error?
Thanks.
Related
I am trying to use sqoop import with HCatalog integration to ingest data from Teradata to Hive. Below is my sqoop import command:
sqoop import -libjars /path/tdgssconfig.jar \
-Dmapreduce.job.queuename=${queue} \
-Dmapreduce.map.java.opts=-Xmx16g \
-Dmapreduce.map.memory.mb=20480 \
--driver com.teradata.jdbc.TeraDriver \
--connect jdbc:teradata:<db-url>,charset=ASCII,LOGMECH=LDAP \
--username ${srcDbUsr} \
--password-file ${srcDbPassFile} \
--verbose \
--query "${query} AND \$CONDITIONS" \
--split-by ${splitBy} \
--fetch-size ${fetchSize} \
--null-string '\\N' \
--null-non-string '\\N' \
--fields-terminated-by , \
--hcatalog-database ${tgtDbName} \
--hcatalog-table ${tgtTblName} \
--hcatalog-partition-keys ${partitionKey} \
--hcatalog-partition-values "${partitionValue}"
And I encountered the error below, "Error adding partition to metastore. Permission denied.":
18/07/03 12:14:02 INFO mapreduce.Job: Job job_1530241180113_6487 failed with state FAILED due to: Job commit failed: org.apache.hive.hcatalog.common.HCatException : 2006 : Error adding partition to metastore. Cause : org.apache.hadoop.security.AccessControlException: Permission denied. user=<usr-name> is not the owner of inode=<partition-key=partition-value>
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkOwner(DefaultAuthorizationProvider.java:195)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:181)
at org.apache.sentry.hdfs.SentryAuthorizationProvider.checkPermission(SentryAuthorizationProvider.java:178)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:3560)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:3543)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:3508)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOwner(FSNamesystem.java:6559)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermissionInt(FSNamesystem.java:1807)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1787)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:654)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.setPermission(AuthorizationProviderProxyClientProtocol.java:174)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135)
at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:969)
at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:249)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:274)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
How can I resolve this permission issue?
Figured out the issue. Sqoop with HCatalog cannot add files to a Hive internal (managed) table, because the table resides under Hive's warehouse directories and their owner is hive, not the particular user. The resolution is to create an external table so that the underlying directories have the user (not hive) as the owner.
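For reference, here is a rough sketch of such an external table created through the Hive CLI; the column names (id, payload), the partition column load_date, the ORC storage format, and the HDFS location are hypothetical placeholders, not taken from the job above:
hive -e "
  CREATE EXTERNAL TABLE ${tgtDbName}.${tgtTblName} (
    id INT,
    payload STRING
  )
  PARTITIONED BY (load_date STRING)                    -- must match --hcatalog-partition-keys
  STORED AS ORC
  LOCATION '/user/<usr-name>/external/${tgtTblName}';  -- a directory owned by the importing user, not by hive
"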
I wrote the following sqoop command:
sqoop import --connect jdbc:mysql://localhost/export --username root --password cloudera --table cust --create-hive-table --fields-terminated-by ' ' --hive-table default.cust -m 1
Then I could not find the table in the default database, but the file appeared in /user/cloudera/cust
Use --hive-import, and --hive-overwrite if it is overwriting an existing table. You can also mention the --target-dir.
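A minimal sketch of the corrected command, keeping the connection details and the space field terminator from the question; the staging --target-dir name is hypothetical, and --hive-overwrite would only be added when reloading an existing table:
sqoop import --connect jdbc:mysql://localhost/export --username root --password cloudera \
  --table cust \
  --hive-import --create-hive-table \
  --hive-table default.cust \
  --fields-terminated-by ' ' \
  --target-dir /user/cloudera/cust_staging \
  -m 1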
When I run this command on the shell, it works fine:
sqoop import --incremental append --check-column id_civilstatus --last-value -1 \
  --connect jdbc:postgresql://somehost/somedb --username someuser \
  --password-file file:///passfile.txt --table sometable --direct -m 3 \
  --target-dir /jobs/somedir -- --schema someschema
But when I try to save it as a job:
sqoop job --create myjob -- import --incremental append --check-column id_civilstatus \
  --last-value -1 --connect jdbc:postgresql://somehost/somedb --username someuser \
  --password-file file:///passfile.txt --table sometable --direct -m 3 \
  --target-dir /jobs/somedir -- --schema someschema
Then I execute:
sqoop job --exec myjob
I get this error message:
PSQLException: ERROR: relation "sometable" does not exist
This error is because 'sometable' does not exist in the default schema.
Why does the sqoop job not take the schema parameter? Am I missing something?
Thanks
You can specify or change the default schema by passing "?currentSchema=myschema" in the JDBC connection string; see the PostgreSQL JDBC driver documentation for more detail.
jdbc:postgresql://localhost:5432/mydatabase?currentSchema=myschema
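For example, the saved job from the question could embed the schema in the connection string like this (a sketch: the port 5432 is assumed, and it is worth confirming that the --direct PostgreSQL connector honors currentSchema):
sqoop job --create myjob -- import --incremental append --check-column id_civilstatus \
  --last-value -1 \
  --connect "jdbc:postgresql://somehost:5432/somedb?currentSchema=someschema" \
  --username someuser --password-file file:///passfile.txt \
  --table sometable --direct -m 3 \
  --target-dir /jobs/somedir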
You don't need to mention the schema separately; you can either keep it in the JDBC URL (not sure whether the Postgres JDBC URL has that option or not) or add it in the table option itself, something like below:
--table schemaName.tableName
Use the following as your JDBC URL
jdbc:postgresql://somehost/somedb/someschema
and remove --schema someschema from the Sqoop statement.
I found a way to make this work:
sqoop job --exec myjob -- -- --schema someschema
I'm importing external data from Oracle to HDFS using Sqoop; after that, I'm trying to import it into MongoDB. Below are the commands I'm using and the error message:
-- Import external file to HDFS
sqoop import \
--connect "jdbc:oracle:thin:#(description=(address=(protocol=tcp)(host=hostname)(port=1521))(connect_data=(service_name=SID)))" \
--username user --table schema. my_table \
--num-mappers 1 --verbose -P
-- Import HDFS file to MongoDB
hadoop fs -text /user/cloudera/mytable/part* | mongoimport -d database -c my_table --type csv --headerline
Error:
mongoimport: /usr/lib64/libcrypto.so.10: no version information
available (required by mongoimport) mongoimport:
/usr/lib64/libssl.so.10: no version information available (required by
mongoimport) 2015-12-08T07:08:24.001-0800 Failed: fields '' and '.'
are incompatible 2015-12-08T07:08:24.001-0800 imported 0 documents
text: Unable to write to output stream.
I’m using Sqoop v1.4.2 to do incremental imports with jobs. The jobs are:
sqoop job --create job_1 -- import --connect <CONNECT_STRING> --username <UNAME> --password <PASSWORD> -m <MAPPER#> --split-by <COLUMN> --target-dir <TARGET_DIR> --table <TABLE> --check-column <COLUMN> --incremental append --last-value 1
NOTES:
Incremental type is append
Job creation is successful
Job execution is successful across repeated runs
Can see new rows being imported in HDFS
sqoop job --create job_2 -- import --connect <CONNECT_STRING> --username <UNAME> --password <PASSWORD> -m <MAPPER#> --split-by <COLUMN> --target-dir <TARGET_DIR> --table <TABLE> --check-column <COLUMN> --incremental lastmodified --last-value 1981-01-01
NOTES:
Incremental type is lastmodified
Job creation is successful; the table name is different from the one used in job_1
Job execution is successful ONLY THE FIRST TIME
Can see rows being imported into HDFS for the first execution
Subsequent job executions fail with the following error:
ERROR security.UserGroupInformation: PriviledgedActionException as:<MY_UNIX_USER>(auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory <TARGET_DIR_AS_SPECIFIED_IN_job_2> already exists
ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory <TARGET_DIR_AS_SPECIFIED_IN_job_2> already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:872)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:141)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:202)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:465)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:108)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:403)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:228)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
If you want to execute job_2 again and again, then you need to use --incremental lastmodified --append:
sqoop job --create job_2 -- import --connect <CONNECT_STRING> --username <UNAME> \
  --password <PASSWORD> --table <TABLE> --incremental lastmodified --append \
  --check-column <COLUMN> --last-value "2017-11-05 02:43:43" \
  --target-dir <TARGET_DIR> -m 1
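The saved job can then be re-executed repeatedly with:
sqoop job --exec job_2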