Primary key error while importing a table using sqoop import

I tried to import all tables into one directory using Sqoop, but one of the tables has no primary key. This is the command I executed:
sqoop import-all-tables --connect "jdbc:mysql://quickstart.cloudera/retail_db" \
--username=retail_dba \
--password=cloudera \
--warehouse-dir /user/cloudera/sqoop_import/
I am getting the following error:
Error during import: No primary key could be found for table
departments_export. Please specify one with --split-by or perform a
sequential import with '-m 1'.
After reading
sqoop import without primary key in RDBMS
I understood that we can use --split-by for a single-table import. Is there a way to specify --split-by for the import-all-tables command? Is there a way to use more than one mapper for a multi-table import when a table has no primary key?

You need to use --autoreset-to-one-mapper:
Tables without a primary key will be imported with one mapper, and the others (with a primary key) with the default number of mappers (4, if not specified in the sqoop command).
As @JaimeCr said, you can't use --split-by with import-all-tables, but here is a quote from the Sqoop user guide in the context of the error you got:
If a table does not have a primary key defined and the --split-by <col> is not provided, then import will fail unless the number of mappers is explicitly set to one with the --num-mappers 1 or --m 1 option or the --autoreset-to-one-mapper option is used.
The option --autoreset-to-one-mapper is typically used with the import-all-tables tool to automatically handle tables without a primary key in a schema.
sqoop import-all-tables --connect "jdbc:mysql://quickstart.cloudera/retail_db" \
--username=retail_dba \
--password=cloudera \
--autoreset-to-one-mapper \
--warehouse-dir /user/cloudera/sqoop_import/
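If you do want more than one mapper for the table that has no primary key, one possible workaround (just a sketch, not part of the answer above; the split column department_id is an assumption about departments_export) is to exclude that table from the bulk import and import it separately with --split-by:
sqoop import-all-tables --connect "jdbc:mysql://quickstart.cloudera/retail_db" \
--username=retail_dba \
--password=cloudera \
--exclude-tables departments_export \
--warehouse-dir /user/cloudera/sqoop_import/

sqoop import --connect "jdbc:mysql://quickstart.cloudera/retail_db" \
--username=retail_dba \
--password=cloudera \
--table departments_export \
--split-by department_id \
--warehouse-dir /user/cloudera/sqoop_import/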

Related

Sqoop import from MySQL DB is taking a huge time to complete

I am trying to import about 5 records from a MySQL table "employees" into HDFS, creating a Hive table in the Sqoop import command. But the query runs for a long time and I am unable to see any result.
I have started the Hadoop services and they are up and running. I can also create tables in Hive manually.
sqoop import --connect jdbc:mysql://localhost:3306/DBName \
--username UserID --password Pwd --split-by EMPLOYEE_ID \
-e "select EMPLOYEE_ID,FIRST_NAME from employees where EMPLOYEE_ID <=105 and $CONDITIONS" \
--target-dir /user/hive/warehouse/mysqldb.db/employees \
--fields-terminated-by "," \
--hive-import \
--create-hive-table \
--hive-table mysqldb.employees -m2
What is going on here?
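One thing worth checking (just a sketch, assuming the command is typed into a shell): inside double quotes, $CONDITIONS is expanded by the shell before Sqoop ever sees it, so the free-form query is usually single-quoted (or the $ escaped), and the mapper count is normally written as -m 2 rather than -m2:
sqoop import --connect jdbc:mysql://localhost:3306/DBName \
--username UserID --password Pwd --split-by EMPLOYEE_ID \
-e 'select EMPLOYEE_ID,FIRST_NAME from employees where EMPLOYEE_ID <=105 and $CONDITIONS' \
--target-dir /user/hive/warehouse/mysqldb.db/employees \
--fields-terminated-by ',' \
--hive-import \
--create-hive-table \
--hive-table mysqldb.employees \
-m 2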

Sqoop - changing a Hive column data type to Postgres data type

I'm trying to export the last column of the Hive table (which is of type STRING in Hive) into a Postgres column of type date. Below is the command:
sqoop export \
--connect jdbc:postgresql://192.168.11.1:5432/test \
--username test \
--password test_password \
--table posgres_table \
--hcatalog-database hive_db \
--hcatalog-table hive_table
I have tried using this, but the column in Postgres is still empty:
--map-column-hive batch_date=date
--map-column-hive works only for Sqoop import (i.e. while fetching data from an RDBMS into HDFS/Hive).
All you need is to have your Hive STRING data in a proper date format; then the export should work.
Internally, sqoop export creates statements like
INSERT INTO posgres_table...
You can verify this by manually running an INSERT INTO posgres_table VALUES (...) statement via the JDBC driver or any client like pgAdmin, squirrel-sql, etc.
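A quick way to sanity-check that INSERT path (the table and column names are the ones from the question, the sample value is made up; adjust for any other NOT NULL columns): PostgreSQL casts an ISO-formatted string literal to a date column on its own, which is why a properly formatted Hive STRING column exports cleanly:
psql -h 192.168.11.1 -U test -d test -c \
  "INSERT INTO posgres_table (batch_date) VALUES ('2020-01-15');"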

How to create separate DDL files for create tables and constraints in DB2

For DB2 10.1 LUW, I want to create one DDL script which contains just the CREATE TABLE statements and another DDL script which contains just the constraints.
The aim is to automate the creation of the database scripts from the existing database and not rely on SQL GUIs such as Data Studio, SQLDbx, or RazorSQL to do this. Ideally we would like to trigger actions from the command line that can generate the create DDL for the schema, the insert data statements for each table in the schema, and then generate the constraints DDL. This will help us insert the data without dealing with the constraints, which will help performance and means we will not be restricted to running the inserts in a specific order.
Unfortunately we cannot use db2move, and I thought db2look would work for the DDL, but I cannot find the syntax to do this. Can db2look do this, or is there anything else out there that could help?
Thanks
db2look generates a DDL script from an existing DB and its objects.
For example:
1. Getting all DDL for a DB:
db2look -d <dbname> -a -e -l -x -o db2look.out
2. Getting DDL for a single table:
db2look -d dbname -e -z schema_name -t Table_name -o output_file -nofed
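As far as I know db2look itself cannot write tables and constraints to separate files, so here is a rough post-processing sketch (an assumption, not a db2look feature: it relies on the output using ';'-terminated statements and on constraints appearing as ALTER TABLE ... ADD CONSTRAINT; verify against your own db2look output):
db2look -d <dbname> -a -e -z schema_name -o full_ddl.sql
# send each ';'-terminated statement to one of two files
awk 'BEGIN { RS=";" }
     /ADD CONSTRAINT/ { print $0 ";" > "constraints.sql"; next }
                      { print $0 ";" > "tables.sql" }' full_ddl.sql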

How to insert and update simultaneously into PostgreSQL with a sqoop command

I am trying to insert into a PostgreSQL DB with a sqoop command:
sqoop export --connect jdbc:postgresql://10.11.12.13:1234/db \
--table table1 --username user1 --password pass1 \
--export-dir /hivetables/table/ \
--fields-terminated-by '|' --lines-terminated-by '\n' \
-- --schema schema
It works fine if there is no primary key constraint. I want to insert new records and update old records simultaneously.
I have tried:
--update-key primary_key - this updates only those rows whose primary keys exist in both DBs (Hive and PostgreSQL); no insertion happens.
--update-mode allowinsert - this only does the insert.
--update-key primary_key --update-mode allowinsert - this gives the error:
ERROR tool.ExportTool: Error during export: Mixed update/insert is not supported against the
target database yet
Can anyone help me write a sqoop command which inserts and updates data in PostgreSQL?
According to my internet search, it is not possible to perform both insert and update directly against a PostgreSQL DB with sqoop export. Instead you can create a stored procedure/function in PostgreSQL and send the data through it:
sqoop export --connect <url> --call <upsert proc> --export-dir /results/bar_data
The stored procedure/function should perform both the update and the insert.
Link 1 - https://issues.apache.org/jira/browse/SQOOP-1270
Link 2 - PLPGSQL-UPSERT-EXAMPLE
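A minimal sketch of that approach, assuming a two-column table and PostgreSQL 9.5+ (the function name, column names, and the ON CONFLICT syntax are assumptions; on older versions the function body needs an explicit UPDATE-then-INSERT as in the linked PLPGSQL upsert example):
psql -h 10.11.12.13 -p 1234 -U user1 -d db <<'SQL'
CREATE OR REPLACE FUNCTION upsert_table1(p_id integer, p_value text)
RETURNS void AS $$
    -- insert the incoming row, or update the existing one on primary-key conflict
    INSERT INTO table1 (id, value) VALUES (p_id, p_value)
    ON CONFLICT (id) DO UPDATE SET value = EXCLUDED.value;
$$ LANGUAGE sql;
SQL
sqoop export --connect jdbc:postgresql://10.11.12.13:1234/db \
--username user1 --password pass1 \
--call upsert_table1 \
--export-dir /hivetables/table/ --fields-terminated-by '|'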

PostgreSQL DB import not working well

I am trying to import a SQL dump into a PostgreSQL DB as:
sudo su postgres -c psql dbname < mydb_dump.sql
This gives errors such as:
SET
SET
SET
SET
SET
SET
ERROR: function "array_accum" already exists with same argument types
ALTER AGGREGATE
ERROR: function "c" already exists with same argument types
ALTER AGGREGATE
ERROR: duplicate key value violates unique constraint "pg_ts_config_cfgname_index"
ERROR: duplicate key value violates unique constraint "pg_ts_config_map_index"
and so on. What might be wrong here? I tried googling for it, but couldn't find any pointers.
The PostgreSQL version is 8.4.1.
Thanks!
You should remove the shared functions and objects from the database before you take the dump, or before you load it. These functions and objects are registered in template1, so when you create a new database they are already there, and you see errors when the dump tries to create them again.
This issue is handled well in PostgreSQL 9.1. For older versions, try the --clean option of pg_dump.
Pavel
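A minimal sketch of that suggestion, using the database name from the question (--clean makes pg_dump emit DROP statements before each CREATE, so re-creating objects that already exist in the target database no longer errors out):
pg_dump --clean dbname > mydb_dump.sql
sudo -u postgres psql dbname < mydb_dump.sql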