I am trying to use Db2 external tables with the Swift or Amazon S3 object storage option. I use my own OpenStack Swift server.
Here are the steps in detail:
[i1156@lat111 ~]$ swift --auth-version 3 --os-auth-url http://myip:5000/v3 --os-project-name myproject --os-project-domain-name default --os-username user1 --os-password mypass list container1
outfile
[i1156#lat111 ~]$ db2 "CREATE EXTERNAL TABLE TB_EXTERNAL(COL1 VARCHAR(5)) USING (FORMAT TEXT DELIMITER '|' QUOTEDVALUE DOUBLE CCSID 1208 NULLVALUE 'NULL' NOLOG TRUE DATAOBJECT 'outfile' SWIFT ('http://myip:5000/v3', 'user1', 'mypass', 'container1'))"
DB20000I The SQL command completed successfully.
[i1156#lat111 ~]$ db2 "select * from TB_EXTERNAL"
COL1
-----
SQL20569N The external table operation failed due to a problem with the
corresponding data file or diagnostic files. File name: "outfile". Reason
code: "1". SQLSTATE=428IB
The result is the same when I try AWS S3 storage instead. Any ideas?
Thanks
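SQL20569N with reason code 1 generally indicates that the data file could not be opened. To rule out endpoint or credential problems independently of Db2, the same fetch can be scripted with python-swiftclient; this is only a sketch, reusing the values from the CLI call above (the user domain name is an assumption, mirroring the project domain):

from swiftclient.client import Connection

# Authenticate against the same Keystone v3 endpoint used by the CLI above
conn = Connection(
    authurl="http://myip:5000/v3",
    user="user1",
    key="mypass",
    auth_version="3",
    os_options={
        "project_name": "myproject",
        "project_domain_name": "default",
        "user_domain_name": "default",  # assumption: same domain as the project
    },
)

# Fetch the object Db2 fails on; raises ClientException on auth or 404 problems
headers, body = conn.get_object("container1", "outfile")
print(len(body), "bytes readable")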
I've updated some ETLs to Spark 3.2.1 and Delta Lake 1.1.0. After doing this, my local tests started to fail. After some debugging, I found that when I create an empty table with a specified location, it is registered in the metastore with an unexpected prefix.
Say I try to create a table in the bronze DB with spark-warehouse/users as the specified location:
spark.sql("""CREATE DATABASE IF NOT EXISTS bronze""")
spark.sql("""CREATE TABLE bronze.users (
| name string,
| active boolean
|)
|USING delta
|LOCATION 'spark-warehouse/users'""".stripMargin)
I end up with spark-warehouse/bronze.db/spark-warehouse/users registered in the metastore, but with the actual files in spark-warehouse/users! This makes any query to the table fail.
I generated a sample repository: https://github.com/adrianabreu/delta-1.1.0-table-location-error-example/blob/master/src/test/scala/example/HelloSpec.scala
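Until the cause is pinned down, one workaround is to avoid relative paths in LOCATION altogether, since the prefix above looks like the relative path being resolved against the database directory. A sketch in PySpark for brevity, assuming a session already configured for Delta and the same layout as the example above:

import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Qualify the location up front so the metastore cannot resolve it
# relative to the database directory (spark-warehouse/bronze.db/...)
location = os.path.abspath("spark-warehouse/users")

spark.sql("CREATE DATABASE IF NOT EXISTS bronze")
spark.sql(f"""
    CREATE TABLE bronze.users (
        name string,
        active boolean
    )
    USING delta
    LOCATION '{location}'
""")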
I am trying to write a dataframe into a PostgreSQL database table.
When I write it to Heroku's PostgreSQL database, everything works fine; no problems.
For Heroku PostgreSQL, I use this connection string:
connection_string = "postgresql+psycopg2://%s:%s#%s/%s" % (
conn_params['user'],
conn_params['password'],
conn_params['host'],
conn_params['dbname'])
However, when I try to write the dataframe into GCP's Cloud SQL table, I get the following error:
struct.error: 'h' format requires -32768 <= number <= 32767
The connection string I use for GCP Cloud SQL is as follows:
connection_string = \
f"postgresql+pg8000://{conn_params['user']}:{conn_params['password']}#{conn_params['host']}/{conn_params['dbname']}"
The command I use to write to the database is the same for both GCP and Heroku:
df_Output.to_sql(sql_tablename, con=conn, schema='public', index=False, if_exists=if_exists, method='multi')
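For what it's worth, this particular struct.error usually comes from the PostgreSQL wire protocol carrying the bind-parameter count as a signed 16-bit integer: with method='multi', each statement carries rows × columns parameters, and pg8000 fails once that exceeds 32767 (psycopg2 interpolates values client-side rather than binding them, which would explain why Heroku works). A sketch of a guard, using the names from the snippet above:

# pg8000 sends the number of bind parameters as a signed 16-bit int,
# so one statement can carry at most 32767 parameters. With
# method='multi', each chunk contributes rows * columns parameters.
max_params = 32767
rows_per_chunk = max_params // len(df_Output.columns)

df_Output.to_sql(
    sql_tablename,
    con=conn,
    schema='public',
    index=False,
    if_exists=if_exists,
    method='multi',
    chunksize=rows_per_chunk,
)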
I'd recommend using the Cloud SQL Python Connector to manage your connections and take care of the connection string for you. It supports the pg8000 driver and should help resolve your troubles.
from google.cloud.sql.connector import connector
import sqlalchemy

# configure Cloud SQL Python Connector properties
def getconn():
    conn = connector.connect(
        "project:region:instance",
        "pg8000",
        user="YOUR_USER",
        password="YOUR_PASSWORD",
        db="YOUR_DB"
    )
    return conn

# create connection pool to re-use connections
pool = sqlalchemy.create_engine(
    "postgresql+pg8000://",
    creator=getconn,
)

# query or insert into Cloud SQL database
with pool.connect() as db_conn:
    # query database
    result = db_conn.execute("SELECT * from my_table").fetchall()
    # Do something with the results
    for row in result:
        print(row)
For more detailed examples refer to the README of the repository.
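One caveat, which is an assumption about your environment rather than something from the connector docs: on SQLAlchemy 2.x, execute() no longer accepts a plain SQL string, so the query above would need to be wrapped in sqlalchemy.text():

from sqlalchemy import text

with pool.connect() as db_conn:
    result = db_conn.execute(text("SELECT * from my_table")).fetchall()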
I want to create an external table in Redshift Spectrum from CSV files. When I try doing so with dbt, I get a strange error. But when I manually remove some double quotes from the SQL generated by dbt and run it directly, I get no such error.
First I run this in Redshift Query Editor v2 on default database dev in my cluster:
CREATE EXTERNAL SCHEMA example_schema
FROM DATA CATALOG
DATABASE 'example_db'
REGION 'us-east-1'
IAM_ROLE 'iam_role'
CREATE EXTERNAL DATABASE IF NOT EXISTS
;
Database dev now has an external schema named example_schema (and Glue catalog registers example_db).
I then upload example_file.csv to the S3 bucket s3://example_bucket. The file looks like this:
col1,col2
1,a,
2,b,
3,c
Then I run dbt run-operation stage_external_sources in my local dbt project and get this output with an error:
21:03:03 Running with dbt=1.0.1
21:03:03 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.example_project.example_models
21:03:03 1 of 1 START external source example_schema.example_table
21:03:03 1 of 1 (1) drop table if exists "example_db"."example_schema"."example_table" cascade
21:03:04 Encountered an error while running operation: Database Error
cross-database reference to database "example_db" is not supported
I try running the generated SQL in Query Editor:
DROP TABLE IF EXISTS "example_db"."example_schema"."example_table" CASCADE
and get the same error message:
ERROR: cross-database reference to database "example_db" is not supported
But when I run this SQL in Query Editor, it works:
DROP TABLE IF EXISTS "example_db.example_schema.example_table" CASCADE
Note that the only change is the quoting: the whole dotted name is now one quoted identifier rather than three separately quoted parts.
What's going on here? Is this a bug in dbt-core, dbt-redshift, or dbt_external_tables, or just a mistake on my part?
To confirm, I can successfully create the external table by running this in Query Editor:
DROP SCHEMA IF EXISTS example_schema
DROP EXTERNAL DATABASE
CASCADE
;
CREATE EXTERNAL SCHEMA example_schema
FROM DATA CATALOG
DATABASE 'example_db'
REGION 'us-east-1'
IAM_ROLE 'iam_role'
CREATE EXTERNAL DATABASE IF NOT EXISTS
;
CREATE EXTERNAL TABLE example_schema.example_table (
col1 SMALLINT,
col2 CHAR(1)
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION 's3://example_bucket'
TABLE PROPERTIES ('skip.header.line.count'='1')
;
dbt config files
models/example/schema.yml (modeled after this example):
version: 2

sources:
  - name: example_source
    database: dev
    schema: example_schema
    loader: S3
    tables:
      - name: example_table
        external:
          location: 's3://example_bucket'
          row_format: >
            serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
            with serdeproperties (
              'strip.outer.array'='false'
            )
        columns:
          - name: col1
            data_type: smallint
          - name: col2
            data_type: char(1)
dbt_project.yml:
name: 'example_project'
version: '1.0.0'
config-version: 2
profile: 'example_profile'
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
target-path: "target"
clean-targets:
- "target"
- "dbt_packages"
models:
  example_project:
    example:
      +materialized: view
packages.yml:
packages:
  - package: dbt-labs/dbt_external_tables
    version: 0.8.0
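As a sanity check that the manually created external table is healthy independent of dbt, it can be queried directly from Python; a sketch using the redshift_connector driver, where the host and credentials are placeholders:

import redshift_connector

# Placeholder connection details for the cluster that holds the dev database
conn = redshift_connector.connect(
    host="example-cluster.abc123xyz.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="my_password",
)

# External tables are queried through the external schema on dev
cursor = conn.cursor()
cursor.execute("SELECT * FROM example_schema.example_table LIMIT 5")
print(cursor.fetchall())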
I am trying to do a differential data load from Db2 to a PostgreSQL table through InfoSphere Federation Server.
I followed the steps below and got this exception:
SQL1822N Unexpected error code "55000" received from data source "FEDSER".
Associated text and tokens are "This ResultSet is closed.".
Here are the steps I followed:
create wrapper jdbc
DB20000I The SQL command completed successfully.
CREATE SERVER FEDSER TYPE JDBC VERSION '12' WRAPPER JDBC OPTIONS( ADD DRIVER_PACKAGE 'E:\Sandhya\postgresql-8.1-415.jdbc3.jar', URL 'jdbc:postgresql://localhost:5432/SCOPEDB', DRIVER_CLASS 'org.postgresql.Driver', DB2_IUD_ENABLE 'Y', db2_char_blankpadded_comparison 'Y', db2_varchar_blankpadded_comparison 'Y', VARCHAR_NO_TRAILING_BLANKS 'Y', JDBC_LOG 'Y')
DB20000I The SQL command completed successfully.
CREATE USER MAPPING FOR SANAGARW SERVER FEDSER OPTIONS (REMOTE_AUTHID 'postgres',REMOTE_PASSWORD '*****')
DB20000I The SQL command completed successfully
SELECT COUNT(*) FROM "SCOPE".EMPLOYEE
SQL1822N Unexpected error code "55000" received from data source "FEDSER".
Associated text and tokens are "This ResultSet is closed.".
I am using Postgres version 12, Java version "1.8.0_241"
Please help me resolve this issue; only once the connection is created can I create the nickname.
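One detail that may be worth checking first (an observation, not a confirmed diagnosis): the server definition loads postgresql-8.1-415.jdbc3.jar, a driver generation far older than the Postgres 12 server it talks to. To rule out the server side, the connection can be exercised directly; a sketch with psycopg2, reusing the connection details from the CREATE SERVER statement (the password is a placeholder):

import psycopg2

# Same host/port/database as the FEDSER server definition above
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="SCOPEDB",
    user="postgres",
    password="*****",  # placeholder, as in the user mapping
)

# If this succeeds, the server side is reachable and healthy
cur = conn.cursor()
cur.execute("SELECT version()")
print(cur.fetchone())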
Consider using Db2 11.5 instead of InfoSphere Federation Server, which went out of support on 2017-09-30: https://www.ibm.com/support/lifecycle/#/search?q=InfoSphere%20Federation%20Server
Db2 11.5 includes built-in support for PostgreSQL federation in all Db2 editions, including Db2 Community Edition:
https://www.ibm.com/support/pages/data-source-support-matrix-federation-bundled-db2-luw-v115
I was trying to import a database dump, but I am encountering some errors.
This is my error:
08:49:13 PM Restoring dbDB (contact)
Running: mysql --defaults-extra-file="/tmp/tmpdwf14l/extraparams.cnf" --host=127.0.0.1 --user=root --port=3306 --default-character-set=utf8 --comments
ERROR 1046 (3D000) at line 22: No database selected
Operation failed with exitcode 1
08:49:13 PM Restoring dbDBB (course)
Running: mysql --defaults-extra-file="/tmp/tmpMW20Fb/extraparams.cnf" --host=127.0.0.1 --user=root --port=3306 --default-character-set=utf8
ERROR 1046 (3D000) at line 22: No database selected
The error means you have not selected a default target schema into which to import the data from the dump.
Create a schema/database in MySQL and select that database in MySQL Workbench while importing the data from the dump.
Or
You can edit the dump file and add SQL statements at the start, something like this:
create database test;
use test;
Based on the user's dump file, which begins:
--
-- Table structure for table `course`
--
write it as:
create database test1;
use test1;
--
-- Table structure for table `course`
--
That should do it.
The error is because you haven't selected any database. In the dump, right below CREATE SCHEMA `database_name` (or CREATE DATABASE `database_name`), add this: USE `database_name`;
Replace database_name with your DB name.
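If editing a large dump by hand is awkward, the statements can be prepended with a small script; a sketch in Python, where the file names and database name are placeholders:

# Prepend CREATE DATABASE / USE statements to a MySQL dump
# (dump.sql, dump_with_db.sql, and mydb are placeholders)
header = "CREATE DATABASE IF NOT EXISTS mydb;\nUSE mydb;\n\n"

with open("dump.sql", "r", encoding="utf8") as f:
    original = f.read()

with open("dump_with_db.sql", "w", encoding="utf8") as f:
    f.write(header + original)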