Sqoop: how to switch off prepared statements? - postgresql

I use Sqoop 1.4.5-cdh5.4.2 and PostgreSQL.
If Sqoop connects directly to the database, everything works.
But I need to run Sqoop through pgbouncer, and there I have a problem:
pgbouncer cannot handle prepared statements in transaction pooling mode.
... the import command:
sqoop import \
--connect "$db_name" \
--username "$db_user" \
--password "$db_pass" \
--direct \
--hive-import \
--hive-table "$hive_schema.$t" \
--hive-overwrite \
--num-mappers 10 \
--fetch-size 100000 \
--split-by "object_id" \
--target-dir "/user/$hive_schema/$t" \
--table "$t"
... and the error:
org.postgresql.util.PSQLException: ERROR: prepared statement "S_3" already exists
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2270)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1998)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
at org.postgresql.jdbc2.AbstractJdbc2Connection.executeTransactionCommand(AbstractJdbc2Connection.java:791)
at org.postgresql.jdbc2.AbstractJdbc2Connection.commit(AbstractJdbc2Connection.java:815)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:315)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:227)
at org.apache.sqoop.hive.TableDefWriter.getCreateTableStmt(TableDefWriter.java:126)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:188)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:514)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

Add prepareThreshold=0 to the connection string
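For example (a sketch; the pgbouncer host, port and database name below are placeholder assumptions, the real value lives in the $db_name variable from the question):

# placeholder host/port/db; only the prepareThreshold=0 parameter is the actual change
--connect "jdbc:postgresql://pgbouncer-host:6432/mydb?prepareThreshold=0"

Setting prepareThreshold=0 tells the PostgreSQL JDBC driver not to switch to named server-side prepared statements, which is exactly what collides with pgbouncer's transaction pooling mode.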

Sqoop doesn't work with pgbouncer in transaction pooling mode :(

Related

No manager for connect string

As soon as my EMR cluster was ready to run, I started facing issues when listing databases and importing with Sqoop.
Apparently Sqoop was installed normally: it works when I type "sqoop help" in the Linux terminal, so the command is recognized.
However, when I try the sqoop import command, it fails with an error:
sqoop import \
--connect jdbc:postgres://sportsdb.cxri########.us-east-2.rds.amazonaws.com/SportsDB \
--username postgres \
--password mypassword \
--table addresses \
--target-dir s3://sqoop-table-from-rds-to-s3/sqoop-table/ \
-m 1 \
--fields-terminated-by '\t' \
--lines-terminated-by ','
The same goes for the second command, "sqoop list-databases", shown below:
sqoop list-databases \
--connect jdbc:postgres://sportsdb.cxri########.us-east-2.rds.amazonaws.com \
--username postgres \
--password mypassword
They don't really work and nothing happens ;/
I also downloaded the PostgreSQL JDBC jar and put it into /usr/lib/sqoop/lib/, where Sqoop keeps its jar files.
To do that I ran these two commands:
1) wget -O postgresql-jdbc.jar https://jdbc.postgresql.org/download/postgresql-42.3.1.jar
2) sudo mv postgresql-jdbc.jar /usr/lib/sqoop/lib/
Does anyone have a suggestion about what can be done to fix this issue?
The issue was solved after a tip about a typo:
I just changed the word postgres to postgresql in the JDBC URL, as follows:
sqoop list-databases \
--connect jdbc:postgresql://sportsdb.cxri########.us-east-2.rds.amazonaws.com \
--username postgres \
--password mypassword
That is it. The issue was fixed by adjusting something pretty simple.
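For reference, the same scheme fix (postgres → postgresql) applies to the import command; a sketch reusing the options from the question above:

# note the jdbc:postgresql:// scheme; everything else is copied from the question
sqoop import \
--connect jdbc:postgresql://sportsdb.cxri########.us-east-2.rds.amazonaws.com/SportsDB \
--username postgres \
--password mypassword \
--table addresses \
--target-dir s3://sqoop-table-from-rds-to-s3/sqoop-table/ \
-m 1 \
--fields-terminated-by '\t' \
--lines-terminated-by ','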

Command not found when using sqoop

When I use sqoop, it tells me:
'--table: command not found'
The code is as follows:
sqoop export --connect jdbc:mysql://hadoop01:3306/result \
--username root \
--password 000000 \
--table stats_event \
--export-dit hdfs://hadoop01:8020/user/hive/warehouse/eclog.db/stats_event \
--input-fields-terminated-by '\t' \
--update-node allowsert \
--update-key paltform_dimension_id,date_dimension_id,event_dimension_id \
- m 1;
Typo:
--export-dit → --export-dir
It's a typo:
--export-dit → --export-dir, and also
--update-node allowsert → --update-mode allowinsert
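Putting the corrections together, the export command would look roughly like this (a sketch based on the values from the question; column names and connection details are copied as given):

# --export-dir and --update-mode are the corrected option names
sqoop export --connect jdbc:mysql://hadoop01:3306/result \
--username root \
--password 000000 \
--table stats_event \
--export-dir hdfs://hadoop01:8020/user/hive/warehouse/eclog.db/stats_event \
--input-fields-terminated-by '\t' \
--update-mode allowinsert \
--update-key paltform_dimension_id,date_dimension_id,event_dimension_id \
-m 1

Note that -m 1 must be written without a space after the dash, and the trailing semicolon is not needed.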

Hive import is not compatible with importing into AVRO format

I have the following code:
sqoop import --connect jdbc:mysql://localhost/export \
--username root \
--password cloudera \
--table cust \
--hive-import \
--create-hive-table \
--fields-terminated-by ' ' \
--hive-table default.cust \
--target-dir /user/hive/warehouse/cust \
--compression-codec org.apache.org.io.compress.GzipCodec \
--as-avrodatafile \
-m 1
I got the following error, please help:
Hive import is not compatible with importing into AVRO format.
Currently, Sqoop does not support importing AVRO format directly into a Hive table, so as a workaround you can import into HDFS and create an EXTERNAL TABLE in Hive.
Step 1: Import into HDFS
sqoop import --connect jdbc:mysql://localhost/export \
--username root --password cloudera \
--table cust \
--target-dir /user/hive/warehouse/cust \
--compression-codec org.apache.hadoop.io.compress.GzipCodec \
--as-avrodatafile -m 1
This import will create a schema file with a .avsc extension in the current directory on the local (Linux) filesystem. Copy this file to some location in HDFS (PATH_TO_THE_COPIED_SCHEMA).
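The copy itself can be done with plain HDFS shell commands, for example (a sketch; /user/hive/schemas/cust is just an assumed example path standing in for PATH_TO_THE_COPIED_SCHEMA, and cust.avsc is the schema file Sqoop generated for the cust table):

# example location standing in for PATH_TO_THE_COPIED_SCHEMA (assumed path)
hdfs dfs -mkdir -p /user/hive/schemas/cust
hdfs dfs -put cust.avsc /user/hive/schemas/cust/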
Step 2: Create an external table in Hive:
CREATE EXTERNAL TABLE cust
STORED AS AVRO
LOCATION 'hdfs:///user/hive/warehouse/cust'
TBLPROPERTIES ('avro.schema.url'='hdfs:///PATH_TO_THE_COPIED_SCHEMA/cust.avsc');
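Once the external table exists, a quick sanity check from the shell could look like this (a sketch using the hive CLI; beeline with the appropriate JDBC URL works the same way):

hive -e "DESCRIBE cust;"             # columns should match the ones declared in cust.avsc
hive -e "SELECT * FROM cust LIMIT 10;"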

Cloudera-Sqoop import with/without --hive-import

I'm trying an exercise where I import a table from MSSQL and then export it back to MSSQL into another database, for the sake of testing Sqoop 1. So far, my imports are successful. My concern is the export: if I import a table without the --hive-import option, I am able to export it successfully. But if I include the --hive-import option, Sqoop won't be able to export it and throws an error:
17/04/02 23:08:20 ERROR sqoop.Sqoop: Got exception running Sqoop:
org.kitesdk.data.DatasetIOException: Unable to load descriptor
file:hdfs://quickstart.cloudera:8020/user/hive/warehouse/customer/.metadata/descriptor.properties
for dataset:customer org.kitesdk.data.DatasetIOException: Unable to
load descriptor
file:hdfs://quickstart.cloudera:8020/user/hive/warehouse/customer/.metadata/descriptor.properties
for dataset:customer
As per checking, there is a difference in the metadata with --hive-import: imports done with the --hive-import parameter do not have the required metadata file:
Supplier/.metadata/descriptor.properties
My question is: is it possible to import a table with Sqoop using the --as-parquetfile and --hive-import options and then be able to export it as well?
Here's my sample import and export code for reference:
sqoop export \
--connect "jdbc:sqlserver://192.168.1.23;database=SqoopDB;schema=dbo;" \
--username sa \
--password Password1 \
--export-dir /user/hive/warehouse/customer \
--table customer
sqoop import \
--connect "jdbc:sqlserver://192.168.1.23;database=SourceDB;schema=dbo" \
--username sa \
--password Password1 \
--table Customer \
--as-parquetfile \
--hive-import \
--hive-overwrite \
-m 1

Sqoop export is successful but destination postgres table is empty

I am trying to export a table from HDFS to Postgres.
Below is the command I used for the export:
sqoop export --connect jdbc:postgresql:hostname:5432/postgresDB \
--username user \
--password password \
--input-fields-terminated-by '\001' \
--fields-terminated-by ',' \
--table customer \
--export-dir /hdfs/location/customer \
--input-null-string '\\N' --input-null-non-string '\\N' \
--direct \
--update-key customer_id
The Sqoop export completes with a success message, but when I query the table, I am not finding any data.
Any help is appreciated. Thanks in advance.
sqoop export --connect jdbc:postgresql:hostname:5432/postgresDB \
--username user \
--password password \
--input-fields-terminated-by '\001' \
--fields-terminated-by ',' \
--table customer \
--export-dir /hdfs/location/customer \
--input-null-string '\\N' --input-null-non-string '\\N' \
--direct \
--update-mode allowinsert
This worked. With --update-key alone (the default --update-mode is updateonly), Sqoop only issues UPDATE statements, so rows that do not already exist in the target table are never inserted; allowing inserts fixes the empty-table symptom.
For me, this worked after I added the schema name for my table, by appending the connector argument:
-- --schema my_schema
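Applied to the export command above, that would look roughly like this (a sketch; my_schema is a placeholder for the actual Postgres schema, and everything after the bare -- separator is passed to the PostgreSQL connector):

# my_schema is a placeholder; arguments after the bare -- go to the PostgreSQL connector
sqoop export --connect jdbc:postgresql:hostname:5432/postgresDB \
--username user \
--password password \
--input-fields-terminated-by '\001' \
--fields-terminated-by ',' \
--table customer \
--export-dir /hdfs/location/customer \
--input-null-string '\\N' --input-null-non-string '\\N' \
--direct \
--update-mode allowinsert \
-- --schema my_schema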