I am trying to use sqoop import with HCatalog integration to ingest data from Teradata to Hive. Below is my sqoop import command:
sqoop import -libjars /path/tdgssconfig.jar \
-Dmapreduce.job.queuename=${queue} \
-Dmapreduce.map.java.opts=-Xmx16g \
-Dmapreduce.map.memory.mb=20480 \
--driver com.teradata.jdbc.TeraDriver \
--connect jdbc:teradata:<db-url>,charset=ASCII,LOGMECH=LDAP \
--username ${srcDbUsr} \
--password-file ${srcDbPassFile} \
--verbose \
--query "${query} AND \$CONDITIONS" \
--split-by ${splitBy} \
--fetch-size ${fetchSize} \
--null-string '\\N' \
--null-non-string '\\N' \
--fields-terminated-by , \
--hcatalog-database ${tgtDbName} \
--hcatalog-table ${tgtTblName} \
--hcatalog-partition-keys ${partitionKey} \
--hcatalog-partition-values "${partitionValue}"
And I encountered the error below - "Error adding partition to metastore. Permission denied.":
18/07/03 12:14:02 INFO mapreduce.Job: Job job_1530241180113_6487 failed with state FAILED due to: Job commit failed: org.apache.hive.hcatalog.common.HCatException : 2006 : Error adding partition to metastore. Cause : org.apache.hadoop.security.AccessControlException: Permission denied. user=<usr-name> is not the owner of inode=<partition-key=partition-value>
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkOwner(DefaultAuthorizationProvider.java:195)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:181)
at org.apache.sentry.hdfs.SentryAuthorizationProvider.checkPermission(SentryAuthorizationProvider.java:178)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:3560)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:3543)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:3508)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOwner(FSNamesystem.java:6559)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermissionInt(FSNamesystem.java:1807)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1787)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:654)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.setPermission(AuthorizationProviderProxyClientProtocol.java:174)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135)
at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:969)
at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:249)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:274)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
How can I resolve this permission issue?
Figured out the issue. Sqoop's HCatalog integration cannot add partitions to a Hive-managed (internal) table here, because the table's directories live under the Hive warehouse and are owned by the hive user rather than the submitting user. The resolution is to create an external table, so that the underlying directories are owned by the user (not hive).
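For illustration, a minimal sketch of that resolution, with made-up column names and an HDFS path owned by the importing user (the real DDL depends on the actual schema):
-- Sketch only: the column names/types, the partition key name and the LOCATION
-- path are placeholders, not taken from the original table definition.
-- The key point is that LOCATION sits under a directory owned by the importing
-- user instead of under the hive-owned warehouse directory.
CREATE EXTERNAL TABLE target_db.target_table (
  id   BIGINT,
  name STRING
)
PARTITIONED BY (partition_key STRING)
STORED AS TEXTFILE
LOCATION '/user/<usr-name>/target_db/target_table';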
As soon as my EMR cluster was ready to run, I started facing some issues when listing databases and importing with Sqoop.
Apparently, Sqoop has been installed normally and it works when I type "sqoop help" in the Linux terminal.
(screenshot: output of sqoop help)
As you can see, the command is recognized normally.
However, if I try out the sqoop import command, it fails with an error:
sqoop import \
--connect jdbc:postgres://sportsdb.cxri########.us-east-2.rds.amazonaws.com/SportsDB \
--username postgres \
--password mypassword \
--table addresses \
--target-dir s3://sqoop-table-from-rds-to-s3/sqoop-table/ \
-m 1 \
--fields-terminated-by '\t' \
--lines-terminated-by ','
(screenshot: sqoop import error)
The same goes for the second one, "sqoop list-databases", as shown below:
sqoop list-databases \
--connect jdbc:postgres://sportsdb.cxri########.us-east-2.rds.amazonaws.com \
--username postgres \
--password mypassword
(screenshot: sqoop list-databases error)
They don't really work, and nothing happens ;/
I also downloaded the PostgreSQL JDBC jar and put it into /usr/lib/sqoop/lib/, which is where Sqoop's jar files live.
To do that I ran these two commands:
1) wget -O postgresql-jdbc.jar https://jdbc.postgresql.org/download/postgresql-42.3.1.jar
2) sudo mv postgresql-jdbc.jar /usr/lib/sqoop/lib/
(screenshot: JAR file added to /usr/lib/sqoop/lib/)
Does anyone have a suggestion about what can be done to fix this issue?
The issue was solved after following a tip about a typo.
I just changed the word postgres to postgresql in the JDBC URL, as follows:
sqoop list-databases \
--connect jdbc:postgresql://sportsdb.cxri########.us-east-2.rds.amazonaws.com \
--username postgres \
--password mypassword
That is it. The issue was fixed by adjusting something pretty simple.
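For completeness, the same typo fix (postgres to postgresql) applies to the import command from the question; everything else is unchanged:
sqoop import \
--connect jdbc:postgresql://sportsdb.cxri########.us-east-2.rds.amazonaws.com/SportsDB \
--username postgres \
--password mypassword \
--table addresses \
--target-dir s3://sqoop-table-from-rds-to-s3/sqoop-table/ \
-m 1 \
--fields-terminated-by '\t' \
--lines-terminated-by ','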
I have the following code:
sqoop import --connect jdbc:mysql://localhost/export \
--username root \
--password cloudera \
--table cust \
--hive-import \
--create-hive-table \
--fields-terminated-by ' ' \
--hive-table default.cust \
--target-dir /user/hive/warehouse/cust \
--compression-codec org.apache.hadoop.io.compress.GzipCodec \
--as-avrodatafile \
-m 1
I got the following error, please help.
Hive import is not compatible with importing into AVRO format.
Currently, Sqoop does not support importing Avro format directly into a Hive table, so as a workaround you can import into HDFS and create an EXTERNAL TABLE in Hive.
Step 1: Import into HDFS
sqoop import --connect jdbc:mysql://localhost/export \
--username root --password cloudera \
--table cust \
--target-dir /user/hive/warehouse/cust \
--compression-codec org.apache.hadoop.io.compress.GzipCodec \
--as-avrodatafile -m 1
This import will create a schema file in the current (local) directory with the extension .avsc. Copy this file to some location in HDFS (PATH_TO_THE_COPIED_SCHEMA).
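For example, assuming the generated schema file for the cust table is named cust.avsc (the exact file name can vary by Sqoop version) and PATH_TO_THE_COPIED_SCHEMA is the placeholder directory referenced below, the copy could look like:
# Copy the locally generated Avro schema file to HDFS
hdfs dfs -mkdir -p /PATH_TO_THE_COPIED_SCHEMA
hdfs dfs -put cust.avsc /PATH_TO_THE_COPIED_SCHEMA/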
Step 2: Create an external table in Hive, like this:
CREATE EXTERNAL TABLE cust
STORED AS AVRO
LOCATION 'hdfs:///user/hive/warehouse/cust'
TBLPROPERTIES ('avro.schema.url'='hdfs:///PATH_TO_THE_COPIED_SCHEMA/cust.avsc');
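Once the table exists, a quick sanity check in Hive confirms the Avro data is readable (a generic query, nothing specific to the original data):
SELECT * FROM cust LIMIT 10;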
I'm trying to do an activity where I'll import a table from MSSQL and then export it back to MSSQL in another database, for the sake of testing Sqoop 1. So far, my imports are successful. My concern is with the export: if I import a table without the --hive-import option, I'm able to export it successfully. But if I include the --hive-import option, Sqoop won't be able to export it and throws an error:
17/04/02 23:08:20 ERROR sqoop.Sqoop: Got exception running Sqoop:
org.kitesdk.data.DatasetIOException: Unable to load descriptor file:hdfs://quickstart.cloudera:8020/user/hive/warehouse/customer/.metadata/descriptor.properties for dataset:customer
org.kitesdk.data.DatasetIOException: Unable to load descriptor file:hdfs://quickstart.cloudera:8020/user/hive/warehouse/customer/.metadata/descriptor.properties for dataset:customer
Upon checking, there's a difference in the metadata when --hive-import is used. Imports done with the --hive-import parameter do not have the required metadata:
Supplier/.metadata/descriptor.properties
My question is: is it possible to import a table with Sqoop using the --as-parquetfile and --hive-import options and then also be able to export it?
Here's my sample import and export code for reference:
sqoop export \
--connect "jdbc:sqlserver://192.168.1.23;database=SqoopDB;schema=dbo;" \
--username sa \
--password Password1 \
--export-dir /user/hive/warehouse/customer \
--table customer
sqoop import \
--connect "jdbc:sqlserver://192.168.1.23;database=SourceDB;schema=dbo" \
--username sa \
--password Password1 \
--table Customer \
--as-parquetfile \
--hive-import \
--hive-overwrite \
-m 1
I use Sqoop 1.4.5-cdh5.4.2 and PostgreSQL.
If Sqoop connects directly to the database, everything is fine.
But I need to use Sqoop over pgbouncer, and I have a problem with this.
In pgbouncer you cannot use prepared statements in transaction pooling mode.
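For context, a minimal pgbouncer.ini sketch showing the pooling mode in question; the database entry, host, and ports are assumptions, not taken from the actual setup:
; Sketch only - in transaction pooling mode a server connection is handed to a
; different client after each transaction, so server-side prepared statements
; created on one connection are not guaranteed to exist on the next one.
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction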
... connect command:
sqoop import \
--connect "$db_name" \
--username "$db_user" \
--password "$db_pass" \
--direct \
--hive-import \
--hive-table "$hive_schema.$t" \
--hive-overwrite \
--num-mappers 10 \
--fetch-size 100000 \
--split-by "object_id" \
--target-dir "/user/$hive_schema/$t" \
--table "$t"
... and error:
org.postgresql.util.PSQLException: ERROR: prepared statement "S_3" already exists
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2270)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1998)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
at org.postgresql.jdbc2.AbstractJdbc2Connection.executeTransactionCommand(AbstractJdbc2Connection.java:791)
at org.postgresql.jdbc2.AbstractJdbc2Connection.commit(AbstractJdbc2Connection.java:815)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:315)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:227)
at org.apache.sqoop.hive.TableDefWriter.getCreateTableStmt(TableDefWriter.java:126)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:188)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:514)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Add prepareThreshold=0 to the connection string
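A sketch of what that looks like with the command from the question; the pgbouncer host, port, and database name are placeholders, and prepareThreshold=0 tells the PostgreSQL JDBC driver not to switch to server-side prepared statements:
sqoop import \
--connect "jdbc:postgresql://<pgbouncer-host>:<pgbouncer-port>/<dbname>?prepareThreshold=0" \
--username "$db_user" \
--password "$db_pass" \
--direct \
--hive-import \
--hive-table "$hive_schema.$t" \
--hive-overwrite \
--num-mappers 10 \
--fetch-size 100000 \
--split-by "object_id" \
--target-dir "/user/$hive_schema/$t" \
--table "$t"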
Sqoop doesn't work with pgbouncer and transaction pooling :(
I am successfully able to import data with a CLOB data type from DB2 to Hive. After some processing on the table in Hive, I want to load the table back to DB2.
Command used to import:
$SQOOP_HOME/bin/sqoop import \
--connect jdbc:db2://192.168.145.64:50000/one \
--table clobtest \
--username db2inst1 \
--password dbuser \
--hive-import \
--map-column-hive CLOBB=STRING \
--inline-lob-limit 155578 \
--target-dir /tmp/1 \
--m 1
Command used to export:
$SQOOP_HOME/bin/sqoop export \
--connect jdbc:db2://192.168.145.64:50000/one \
--username db2inst1 \
--password dbuser \
--export-dir /user/hive/warehouse/clobtest \
--table clobtest \
--input-fields-terminated-by '\0001' \
--input-null-string '\\N' \
--input-null-non-string '\\N' \
--m 1
At the time of export, I am getting the error below:
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.IOException: Could not buffer record
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:218)
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84)
... 10 more
Caused by: java.lang.CloneNotSupportedException: com.cloudera.sqoop.lib.ClobRef
at java.lang.Object.clone(Native Method)
at org.apache.sqoop.lib.LobRef.clone(LobRef.java:109)
at clobtest.clone(clobtest.java:222)
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:213)
... 15 more
Any idea about this error?
Thanks.