Sqoop import from MySQL DB is taking a huge time to complete

I am trying to import 5 records from a MySQL table "employees" into HDFS, creating a Hive table as part of the Sqoop import command. But the query has been running for a long time and I am unable to see any result.
I have started the Hadoop services and they are up and running. I can also create tables in Hive manually.
sqoop import --connect jdbc:mysql://localhost:3306/DBName \
--username UserID --password Pwd --split-by EMPLOYEE_ID \
-e "select EMPLOYEE_ID,FIRST_NAME from employees where EMPLOYEE_ID <=105 and $CONDITIONS" \
--target-dir /user/hive/warehouse/mysqldb.db/employees \
--fields-terminated-by "," \
--hive-import \
--create-hive-table \
--hive-table mysqldb.employees -m 2
What is going on here?
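One thing I can still try is a quick connectivity check with sqoop eval, reusing the same connection details, just to confirm the MySQL side responds at all:
sqoop eval --connect jdbc:mysql://localhost:3306/DBName --username UserID --password Pwd --query "select count(*) from employees"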

Related

How to import rows from only one table from a pgAdmin db to a table in a Heroku Postgres db?

I have a table Films with data in a pgAdmin db (a local Postgres db), and I want to import this data from the Films table into the same Films table in a Heroku Postgres db. I am using SQLAlchemy and Flask. I have read here (https://devcenter.heroku.com/articles/heroku-postgres-import-export) that it can be done through the console. I even tried it, but without any success. I am planning to do this by writing all the data from Films into a csv file and then copying it from the csv file into Films on the Heroku Postgres db, but is there a better way to do what I want? If there is, please give me an understandable example. I would be very appreciative.
P.S. I tried to create a table dump: pg_dump -Fc --no-acl --no-owner -h localhost -U oleksiy -t films --data-only fm_bot > table.dump
But I don't understand the next step: "Generate a signed URL using the aws console" - aws s3 presign s3://your-bucket-address/your-object
What is "your-bucket-address", and what is "your-object"? And the main question: is this even what I need?

How to transfer data from one schema to another schema on a different machine in a Postgres db using DBeaver

I have a Postgres database schema on machine A, and I want to copy the whole schema to machine B using DBeaver. How can I do that?
You should take the backup using pg_dump. To take a dump of one schema from a remote server, run the following:
pg_dump -h host_address -U username -n schema_name database_name > dump_file_path.sql
This will create a plain SQL dump of the selected schema. Then you can use psql or DBeaver to import it and recreate the schema on your server B or database B.
To do it with psql, run the following:
psql "sslmode=disable dbname=_db_name_ user=_user_ hostaddr=_host_" < exported_sql_file_path

Sqoop - changing a Hive column data type to Postgres data type

I'm trying to change the last column of the Hive table (which is of type STRING in Hive) to a Postgres date type. Below is the command:
sqoop export \
--connect jdbc:postgresql://192.168.11.1:5432/test \
--username test \
--password test_password \
--table posgres_table \
--hcatalog-database hive_db \
--hcatalog-table hive_table
I have tried using this, but the column in Postgres is still empty:
--map-column-hive batch_date=date
--map-column-hive works only for Sqoop import (i.e. while fetching data from an RDBMS into HDFS/Hive).
All you need is to have your Hive STRING data in a proper date format (e.g. yyyy-MM-dd); then the export should work.
Internally, sqoop export creates statements like
INSERT INTO posgres_table...
You can verify this by manually running an INSERT INTO posgres_table VALUES (...) statement via the JDBC driver or any client like pgAdmin, squirrel-sql, etc.
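For example, a quick manual check along those lines (batch_date is the column from the question; the single-column insert assumes the remaining columns of posgres_table are nullable or have defaults):
INSERT INTO posgres_table (batch_date) VALUES ('2020-01-15');
-- if this succeeds, a Hive STRING holding the same yyyy-MM-dd text should export into the date column as well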

Primary Key Error while importing table Using Sqoop

I tried to import all tables using Sqoop into one of the directories, but one of the tables has no primary key. This is the code I executed:
sqoop import-all-tables --connect "jdbc:mysql://quickstart.cloudera/retail_db" \
--username=retail_dba \
--password=cloudera \
--warehouse-dir /user/cloudera/sqoop_import/
I am getting the following error:
Error during import: No primary key could be found for table
departments_export. Please specify one with --split-by or perform a
sequential import with '-m 1'.
After reading
sqoop import without primary key in RDBMS
I understood that we can use --split-by for a single-table import. Is there a way I can specify --split-by for the import-all-tables command? Is there a way I can use more than one mapper for a multi-table import with no primary key?
You need to use --autoreset-to-one-mapper:
Tables without a primary key will be imported with one mapper, and the others (with a primary key) with the default number of mappers (4, if not specified in the Sqoop command).
As @JaimeCr said, you can't use --split-by with import-all-tables, but here is a quote from the Sqoop guide in the context of the error you got:
If a table does not have a primary key defined and --split-by <col> is not provided, then import will fail unless the number of mappers is explicitly set to one with the --num-mappers 1 or --m 1 option, or the --autoreset-to-one-mapper option is used.
The option --autoreset-to-one-mapper is typically used with the import-all-tables tool to automatically handle tables without a primary key in a schema.
sqoop import-all-tables --connect "jdbc:mysql://quickstart.cloudera/retail_db" \
--username=retail_dba \
--password=cloudera \
--autoreset-to-one-mapper \
--warehouse-dir /user/cloudera/sqoop_import/

How to insert and update simultaneously into PostgreSQL with a Sqoop command

I am trying to insert into a PostgreSQL DB with a Sqoop command.
sqoop export --connect jdbc:postgresql://10.11.12.13:1234/db --table table1 --username user1 --password pass1 --export-dir /hivetables/table/ --fields-terminated-by '|' --lines-terminated-by '\n' -- --schema schema
It works fine if there is no primary key constraint. I want to insert new records and update old records simultaneously.
I have tried:
--update-key primary_key - this updates only the rows whose primary key exists in both DBs (Hive and PostgreSQL); no insertion
--update-mode allowinsert - this only does the insert
--update-key primary_key --update-mode allowinsert - this gives the error:
ERROR tool.ExportTool: Error during export: Mixed update/insert is not supported against the
target database yet
Can anyone help me write a Sqoop command that both inserts and updates the data in PostgreSQL?
According to my internet search, it is not possible to perform both insert and update directly to a PostgreSQL DB with Sqoop. Instead, you can create a stored procedure/function in PostgreSQL and send the data through it:
sqoop export --connect <url> --call <upsert proc> --export-dir /results/bar_data
The stored procedure/function should perform both the update and the insert.
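A minimal sketch of such a function, assuming table1 has just two columns (id, the primary key, and col1; adjust the parameter list to match the columns you export) and PostgreSQL 9.5+ for ON CONFLICT; Sqoop's --call invokes it once per exported record, passing the record's fields as arguments:
CREATE OR REPLACE FUNCTION upsert_table1(_id integer, _col1 text)
RETURNS void AS $$
BEGIN
    -- insert the record, or update the existing row when the key already exists
    INSERT INTO table1 (id, col1)
    VALUES (_id, _col1)
    ON CONFLICT (id) DO UPDATE SET col1 = EXCLUDED.col1;
END;
$$ LANGUAGE plpgsql;
The export then calls the function instead of naming a table, something like:
sqoop export --connect jdbc:postgresql://10.11.12.13:1234/db --username user1 --password pass1 --call upsert_table1 --export-dir /hivetables/table/ --fields-terminated-by '|' --lines-terminated-by '\n'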
Link 1 - https://issues.apache.org/jira/browse/SQOOP-1270
Link 2 - PLPGSQL-UPSERT-EXAMPLE