How to insert and update simultaneously into PostgreSQL with a Sqoop command

I am trying to insert into a PostgreSQL DB with a Sqoop command:
sqoop export --connect jdbc:postgresql://10.11.12.13:1234/db --table table1 --username user1 --password pass1 --export-dir /hivetables/table/ --fields-terminated-by '|' --lines-terminated-by '\n' -- --schema schema
It works fine if there is no primary key constraint. I want to insert new records and update old records simultaneously.
I have tried:
--update-key primary_key - This updates only those records whose keys exist in both DBs (Hive and PostgreSQL); no insertion.
--update-mode allowinsert - This only does inserts.
--update-key primary_key --update-mode allowinsert - This gives the error:
ERROR tool.ExportTool: Error during export: Mixed update/insert is not supported against the
target database yet
Can anyone help me write a Sqoop command that both inserts and updates the data in PostgreSQL?

According to my internet research, it is not possible to perform both insert and update directly against a PostgreSQL DB with Sqoop. Instead, you can create a stored procedure/function in PostgreSQL and send the data through it:
sqoop export --connect <url> --call <upsert proc> --export-dir /results/bar_data
The stored procedure/function should perform both the update and the insert, as in the sketch below.
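A minimal sketch of such a function, assuming a target table table1 with columns id and col1 (placeholder names, adapt to your schema):
-- Upsert function that sqoop export --call can invoke once per exported row.
-- On PostgreSQL 9.5+ an INSERT ... ON CONFLICT statement could be used instead.
CREATE OR REPLACE FUNCTION upsert_table1(p_id integer, p_col1 text)
RETURNS void AS $$
BEGIN
    UPDATE table1 SET col1 = p_col1 WHERE id = p_id;
    IF NOT FOUND THEN
        INSERT INTO table1 (id, col1) VALUES (p_id, p_col1);
    END IF;
END;
$$ LANGUAGE plpgsql;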
Link 1 - https://issues.apache.org/jira/browse/SQOOP-1270
Link 2 - PLPGSQL-UPSERT-EXAMPLE

Related

Update Postgres table from a pg_dump dump?

I have 2 Postgres tables (same schema) on different servers.
One table has the latest data, and the other one has old data that must be updated with the latest data.
So I dump the table with the latest data:
pg_dump --table=table00 --data-only db00 > table00.sql
or
pg_dump --table=table00 --data-only --column-inserts db00 > table00.sql
But now when I want to read in the SQL dump, I get errors about duplicate keys.
psql postgresql://<username>:<password>@localhost:5432/db00 --file=table00.sql
The error goes away if I drop the table with the old data first, but not only is this undesirable, it is plain silly.
How can I update a Postgres table from a SQL dump then?
E.g. it'd be nice if pg_dump had a --column-updates option where, instead of INSERT statements, you got INSERT ... ON CONFLICT (column) DO UPDATE SET statements (as sketched below)...
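For reference, this is roughly the statement shape such an option would have to emit per row (a sketch; the columns id and name of table00 are placeholder assumptions):
-- Hypothetical upsert form for each dumped row: insert, or update on a
-- duplicate key, keeping the latest data (requires PostgreSQL 9.5+).
INSERT INTO table00 (id, name)
VALUES (1, 'latest value')
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;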

pg_restore --clean is not dropping and clearing the database

I am having an issue with pg_restore --clean not clearing the database.
Or do I misunderstand what --clean does? I am expecting it to truncate the database tables and reinitialize the indexes/primary keys.
I am using 9.5 on RDS.
This is the full command we use
pg_restore --verbose --clean --no-acl --no-owner -h localhost -U superuser -d mydatabase backup.dump
Basically what is happening is this.
I do a nightly backup of my production db, and restore it to an analytics db for the analyst to churn and run their reports.
I found out recently that the Rails application used to view the reports was complaining that the primary keys were missing from the restored analytics database.
So I started investigating the production db, the analytics db, etc., which was when I realized that multiple rows with the same primary key existed in the analytics database.
I ran a few short experiments and realized that every time the pg_restore script runs, it inserts duplicate data into the tables. This leads me to think that --clean is not dropping the tables before restoring, because if I drop the schema beforehand, I don't get duplicate data.
To remove all tables from a database (but keep the database itself), you have two options.
Option 1: Drop the entire schema
You will need to re-create the schema and its permissions. This is usually good enough for development machines only.
DROP SCHEMA public CASCADE;
CREATE SCHEMA public;
GRANT ALL ON SCHEMA public TO postgres;
GRANT ALL ON SCHEMA public TO public;
Applications usually use the "public" schema. You may encounter other schema names when working with a (legacy) application's database.
Note that for Rails applications, dropping and recreating the database itself is usually fine in development. You can use bin/rake db:drop db:create for that.
Option 2: Drop each table individually
Prefer this for production or staging servers. Permissions may be managed by your operations team, and you do not want to be the one who messed up permissions on a shared database cluster.
The following SQL code will find all table names and execute a DROP TABLE statement for each.
DO $$
DECLARE
    r RECORD;
BEGIN
    FOR r IN (SELECT tablename FROM pg_tables WHERE schemaname = current_schema()) LOOP
        -- DROP TABLE IF EXISTS instead of plain DROP TABLE (thanks to Yaroslav Schekin for the clarification)
        EXECUTE 'DROP TABLE IF EXISTS ' || quote_ident(r.tablename) || ' CASCADE';
    END LOOP;
END $$;
Original:
https://makandracards.com/makandra/62111-how-to-drop-all-tables-in-postgresql

Sqoop - changing a Hive column data type to Postgres data type

I'm trying to export the last column of a Hive table (which is of type STRING in Hive) to a PostgreSQL column of type date. Below is the command:
sqoop export \
--connect jdbc:postgresql://192.168.11.1:5432/test \
--username test \
--password test_password \
--table posgres_table \
--hcatalog-database hive_db \
--hcatalog-table hive_table
I have tried using this, but the column in Postgres is still empty:
--map-column-hive batch_date=date
--map-column-hive works only for Sqoop import (i.e. while fetching data from an RDBMS into HDFS/Hive).
All you need is to get your Hive string data into a proper date format; then it should work.
Internally, sqoop export creates statements like
INSERT INTO posgres_table...
You can verify this by manually issuing an INSERT INTO posgres_table VALUES (...) statement via the JDBC driver or any client like pgAdmin, squirrel-sql, etc., as sketched below.
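For example, a manual test along these lines (batch_date comes from the question; the id column and the values are placeholder assumptions):
-- If PostgreSQL accepts the string literal as a date here, Sqoop's generated
-- INSERTs will succeed too, provided the Hive strings use the same format.
INSERT INTO posgres_table (id, batch_date) VALUES (1, '2016-01-15');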

Primary Key Error while importing table Using Sqoop

I tried to import all tables using Sqoop into one of the directories, but one of the tables has no primary key. This is the command I executed:
sqoop import-all-tables --connect "jdbc:mysql://quickstart.cloudera/retail_db" \
--username=retail_dba \
--password=cloudera \
--warehouse-dir /user/cloudera/sqoop_import/
I am getting the following error:
Error during import: No primary key could be found for table
departments_export. Please specify one with --split-by or perform a
sequential import with '-m 1'.
After seeing
sqoop import without primary key in RDBMS
I understood that we can use --split-by only for a single-table import. Is there a way I can specify --split-by for the import-all-tables command? Is there a way I can use more than one mapper for a multi-table import when tables have no primary key?
You need to use --autoreset-to-one-mapper:
Tables without a primary key will be imported with one mapper, and those with a primary key with the default number of mappers (4, if not specified in the Sqoop command).
As @JaimeCr said, you can't use --split-by with import-all-tables, but here is a quote from the Sqoop guide in the context of the error you got:
If a table does not have a primary key defined and --split-by <col> is not provided, then import will fail unless the number of mappers is explicitly set to one with the --num-mappers 1 or -m 1 option, or the --autoreset-to-one-mapper option is used.
The option --autoreset-to-one-mapper is typically used with the import-all-tables tool to automatically handle tables without a primary key in a schema.
sqoop import-all-tables --connect "jdbc:mysql://quickstart.cloudera/retail_db" \
--username=retail_dba \
--password=cloudera \
--autoreset-to-one-mapper \
--warehouse-dir /user/cloudera/sqoop_import/

How to create separate DDL file for create tables and constraints in DB2

For DB2 10.1 LUW, I want to create a DDL script which just contains the create table statements and then create a DDL script which just contains the constraints.
The aim is to automate the creation of the database scripts from the existing database rather than relying on SQL GUIs such as Data Studio, SQLDbx or RazorSQL to do this. Ideally we would like to trigger actions from the command line that generate the create DDL for the schema, the insert statements for the data in each table in the schema, and then the constraints DDL. This would let us insert the data without dealing with the constraints, which helps performance and means we are not restricted to running the inserts in a specific order.
Unfortunately we cannot use db2move, and I thought db2look would work for the DDL, but I cannot find the syntax to do this. Can db2look do this, or is there anything else out there that could help?
Thanks
db2look generates DDL scripts from an existing DB and its objects.
For example:
1. Getting all the DDL for a DB:
db2look -d <dbname> -a -e -l -x -o db2look.out
2. Getting the DDL for a single table:
db2look -d dbname -e -z schema_name -t Table_name -o output_file -nofed
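As far as I can tell, db2look has no switch to emit only the constraints, so one possible workaround (an assumption on my part, not a documented db2look feature) is to script the ALTER TABLE statements yourself from the catalog. A minimal sketch listing the constraints to script, assuming your schema is named MYSCHEMA:
-- List constraints per table from the DB2 LUW catalog; TYPE is P (primary key),
-- U (unique), F (foreign key) or K (check). Use the output as input for
-- generating ALTER TABLE ... ADD CONSTRAINT statements.
SELECT tabschema, tabname, constname, type
FROM syscat.tabconst
WHERE tabschema = 'MYSCHEMA'
ORDER BY tabname, constname;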