Change data capture in Sqoop-Hive Import
I am trying to do change data capture using Sqoop, but when I add --as-parquetfile to my Sqoop import command it fails. After removing --as-parquetfile from the command it works and puts the data into the Hive table in text format, but I want it in a Parquet Hive table.
I want to perform update operations on my data.
I have written the command below:
sqoop import --connect "myoracleconntiondetails" \
--username myuser --password mypasswd \
--query 'select * from test_table where $CONDITIONS' \
--hive-import --hive-database test_dase \
--hive-table test_dase.test_table --null-string 'NULL' \
--null-non-string '-99999' --target-dir mydir/full path \
--split-by mycol --incremental append \
--merge-key could --as-parquetfile -m 10
I get this error:
Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name test_dase.test_table is not alphanumeric (plus '_')
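Judging by the message, the dotted name passed to --hive-table is the likely culprit: with --as-parquetfile the import goes through the Kite SDK, and Kite dataset names may only contain alphanumerics and underscores, so a db.table value is rejected. Below is a sketch of the same command with the database passed only via --hive-database and a dot-free table name (connection details, columns and paths are the placeholders from the question; --merge-key is left out here because Sqoop uses it with --incremental lastmodified merges rather than append):

sqoop import --connect "myoracleconntiondetails" \
--username myuser --password mypasswd \
--query 'select * from test_table where $CONDITIONS' \
--hive-import --hive-database test_dase \
--hive-table test_table --null-string 'NULL' \
--null-non-string '-99999' --target-dir mydir/full path \
--split-by mycol --incremental append \
--as-parquetfile -m 10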
I have the following code:
sqoop import --connect jdbc:mysql://localhost/export \
--username root \
--password cloudera \
--table cust \
--hive-import \
--create-hive-table \
--fields-terminated-by ' ' \
--hive-table default.cust \
--target-dir /user/hive/warehouse/cust \
--compression-codec org.apache.hadoop.io.compress.GzipCodec \
--as-avrodatafile \
-m 1
I got the following error, please help:
Hive import is not compatible with importing into AVRO format.
Currently, Sqoop does not support importing Avro format directly into a Hive table, so as a workaround you can import into HDFS and create an EXTERNAL TABLE in Hive.
Step 1: Import into HDFS
sqoop import --connect jdbc:mysql://localhost/export \
--username root --password cloudera \
--table cust \
--target-dir /user/hive/warehouse/cust \
--compression-codec org.apache.hadoop.io.compress.GzipCodec \
--as-avrodatafile -m 1
This import will create a schema file in the current (Linux) directory with the extension .avsc. Copy this file to some location in HDFS (PATH_TO_THE_COPIED_SCHEMA).
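For example, the copy step might look like this (the schema file name cust.avsc comes from the table above; the destination directory /user/hive/schemas is just an assumed location):

hdfs dfs -mkdir -p /user/hive/schemas
hdfs dfs -put cust.avsc /user/hive/schemas/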
Step 2: Create an external table in Hive:
CREATE EXTERNAL TABLE cust
STORED AS AVRO
LOCATION 'hdfs:///user/hive/warehouse/cust'
TBLPROPERTIES ('avro.schema.url'='hdfs:///PATH_TO_THE_COPIED_SCHEMA/cust.avsc');
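Once both steps are done, a quick sanity check from the Hive shell (assuming the import above produced data under that location):

SELECT COUNT(*) FROM cust;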
I wrote the following Sqoop command:
sqoop import --connect jdbc:mysql://localhost/export \
--username root --password cloudera \
--table cust \
--create-hive-table \
--fields-terminated-by ' ' \
--hive-table default.cust \
-m 1
Then I could not find the table in the default database, but the files appeared in /user/cloudera/cust.
Use --hive-import, and --hive-overwrite if you want to overwrite the table. You can also specify --target-dir.
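For example, the import from the question could be adjusted like this (a sketch only; the values are taken from the question, --create-hive-table is dropped because it makes the job fail when the table already exists, and --hive-overwrite is only needed when the table should be replaced on each run):

sqoop import --connect jdbc:mysql://localhost/export \
--username root --password cloudera \
--table cust \
--hive-import \
--hive-overwrite \
--fields-terminated-by ' ' \
--hive-table default.cust \
-m 1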
I'm trying to do an activity where I import a table from MSSQL and then export it to MSSQL again, into another database, for the sake of testing Sqoop 1. So far, my imports are successful. My concern is the export: if I import a table without the --hive-import option, I'm able to export it successfully. But if I include the --hive-import option, Sqoop won't be able to export it and prompts an error:
17/04/02 23:08:20 ERROR sqoop.Sqoop: Got exception running Sqoop:
org.kitesdk.data.DatasetIOException: Unable to load descriptor
file:hdfs://quickstart.cloudera:8020/user/hive/warehouse/customer/.metadata/descriptor.properties
for dataset:customer
As per my checking, there's a difference in the metadata with --hive-import. Imports done with the --hive-import parameter do not have the required metadata:
Supplier/.metadata/descriptor.properties
My question is: is it possible to import a table in Sqoop with the --as-parquetfile and --hive-import options and then be able to export it as well?
Here are my sample import and export commands for reference:
sqoop export \
--connect "jdbc:sqlserver://192.168.1.23;database=SqoopDB;schema=dbo;" \
--username sa \
--password Password1 \
--export-dir /user/hive/warehouse/customer \
--table customer
sqoop import \
--connect "jdbc:sqlserver://192.168.1.23;database=SourceDB;schema=dbo" \
--username sa \
--password Password1 \
--table Customer \
--as-parquetfile \
--hive-import \
--hive-overwrite \
-m 1
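One approach that is often suggested for this situation (a sketch only, not verified against your setup) is to let the export read the Hive table through HCatalog with --hcatalog-database/--hcatalog-table instead of pointing --export-dir at the Parquet files directly. Assuming the imported table sits in the default Hive database:

sqoop export \
--connect "jdbc:sqlserver://192.168.1.23;database=SqoopDB;schema=dbo;" \
--username sa \
--password Password1 \
--table customer \
--hcatalog-database default \
--hcatalog-table customer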
I am getting the below error while reading a CSV file:
Failed: error processing document #1: invalid character 'a' in literal new or null (expecting 'e' or 'u')
There are some blank fields, which I suspect need to be presented as 'null' to be read properly. Am I correct here?
SAMPLE CSV:
name,year,battle_number,attacker_king,defender_king,attacker_1,attacker_2,attacker_3,attacker_4,defender_1,defender_2,defender_3,defender_4,attacker_outcome,battle_type,major_death,major_capture,attacker_size,defender_size,attacker_commander,defender_commander,summer,location,region,note
Battle of the Golden Tooth,298,1,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Tully,,,,win,pitched battle,1,0,15000,4000,Jaime Lannister,"Clement Piper, Vance",1,Golden Tooth,The Westerlands,
Battle at the Mummer's Ford,298,2,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Baratheon,,,,win,ambush,1,0,,120,Gregor Clegane,Beric Dondarrion,1,Mummer's Ford,The Riverlands,
I guess you didn't specify the file type with --type csv, so mongoimport assumes you are importing a JSON file by default.
--> Try importing with --type csv --headerline
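A minimal sketch of that (the db, collection, and file names here are placeholders, not from the question):

mongoimport --db=test --collection=battles \
--type=csv --headerline --ignoreBlanks \
--file=battles.csv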
In the AWS documentation, the following 3 lines are missing for the CSV file import:
--type=csv \
--headerline \
--ignoreBlanks \
After I added these 3 lines to the command as shown below, the CSV file imported successfully into AWS DocumentDB.
mongoimport --ssl \
--host="sample-cluster.node.us-east-1.docdb.amazonaws.com:27017" \
--collection=sample-collection \
--db=sample-database \
--type=csv \
--headerline \
--ignoreBlanks \
--file=<yourFile> \
--numInsertionWorkers 4 \
--username=sample-user \
--password=abc0123 \
--sslCAFile rds-combined-ca-bundle.pem