What is the simplest way to migrate data from MySQL to DB2?

I need to migrate data from MySQL to DB2. Both DBs are up and running.
I tried mysqldump with --no-create-info --extended-insert=FALSE --complete-insert and, with a few changes to the output (e.g. changing ` to "), I get a satisfactory result, but sometimes I hit odd exceptions such as:
... does not have an ending string delimiter. SQLSTATE=42603
Ideally I would want to have a routine that is as general as possible, but as an example here, let's say I have a DB2 table that looks like:
db2 => describe table "mytable"

                                Data type                     Column
Column name                     schema    Data type name      Length     Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
id                              SYSIBM    BIGINT                       8     0 No
name                            SYSIBM    VARCHAR                    512     0 No

  2 record(s) selected.
Its MySQL counterpart being
mysql> describe mytable;
+-------+--------------+------+-----+---------+----------------+
| Field | Type         | Null | Key | Default | Extra          |
+-------+--------------+------+-----+---------+----------------+
| id    | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| name  | varchar(512) | NO   |     | NULL    |                |
+-------+--------------+------+-----+---------+----------------+
2 rows in set (0.01 sec)
Let's assume the DB2 and MySQL databases are called mydb.
Now, if I do
mysqldump -uroot mydb mytable --no-create-info --extended-insert=FALSE --complete-insert | # mysqldump options: do not output the CREATE TABLE statement; one INSERT statement per record; output table column names
sed -n -e '/^INSERT/p' | # only keep lines beginning with "INSERT"
sed 's/`/"/g' | # replace ` with "
sed 's/;$//g' | # remove `;` at end of insert query
sed "s/\\\'/''/g" # replace `\'` with `''` , see http://stackoverflow.com/questions/2442205/how-does-one-escape-an-apostrophe-in-db2-sql and http://stackoverflow.com/questions/2369314/why-does-sed-require-3-backslashes-for-a-regular-backslash
, I get:
INSERT INTO "mytable" ("id", "name") VALUES (1,'record 1')
INSERT INTO "mytable" ("id", "name") VALUES (2,'record 2')
INSERT INTO "mytable" ("id", "name") VALUES (3,'record 3')
INSERT INTO "mytable" ("id", "name") VALUES (4,'record 4')
INSERT INTO "mytable" ("id", "name") VALUES (5,'" "" '' '''' \"\" ')
This output can be run as DB2 queries and it works well.
Any idea on how to solve this more efficiently/generally? Any other suggestions?

After having played around a bit, I came up with the following routine, which I believe to be fairly general, robust and scalable.
1. Run the following command:
mysqldump -uroot mydb mytable --no-create-info --extended-insert=FALSE --complete-insert | # mysqldump options: do not output the CREATE TABLE statement; one INSERT statement per record; output table column names
sed -n -e '/^INSERT/p' | # only keep lines beginning with "INSERT"
sed 's/`/"/g' | # replace ` with "
sed -e 's/\\"/"/g' | # replace `\"` with `"` (mysqldump escapes double quotes)
sed "s/\\\'/''/g" > out.sql # replace `\'` with `''` , see http://stackoverflow.com/questions/2442205/how-does-one-escape-an-apostrophe-in-db2-sql and http://stackoverflow.com/questions/2369314/why-does-sed-require-3-backslashes-for-a-regular-backslash
Note: unlike in the question, the trailing `;` are not removed here, since db2 -t uses `;` as the statement terminator.
2. Upload the file to the DB2 server:
scp out.sql user@myserver:out.sql
3. Run the queries from the file:
db2 -tvsf /path/to/query/file/out.sql
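To sanity-check the sed chain without a real dump, you can feed it a single hand-written INSERT line (the sample value below is made up, purely to exercise both escape rules):

```shell
# Fake mysqldump output line containing an escaped quote (\') and escaped
# double quotes (\"), the two cases the sed chain rewrites:
line="INSERT INTO \`mytable\` (\`id\`, \`name\`) VALUES (1,'it\\'s \\\"quoted\\\"');"
out=$(printf '%s\n' "$line" \
  | sed -n -e '/^INSERT/p' \
  | sed 's/`/"/g' \
  | sed -e 's/\\"/"/g' \
  | sed "s/\\\'/''/g")
printf '%s\n' "$out"
```

The result keeps the trailing `;` (for db2 -t) and turns `\'` into the doubled `''` that DB2 expects.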

Related

Why am I not able to get a specific column in psql?

I have a psql DB called cls and a user called cls. The user tries to get a specific column from an existing table [name: test], but I am not able to retrieve it.
Snippets below:
psql -U cls
cls=# select * from test;
name | 
 ip | 
 user | 
 password | 
 group | 
 created_on
-----------+--------+------+--------------+--------+----------------------------
server | 1.1.1.1 | test | pwd | gp1 | 2022-08-04 13:55:00.765548
cls=# select ip from test where name='server';
LINE 1: select ip from test where name='server';
^
HINT: Perhaps you meant to reference the column "test.
ip".
cls=# select test.ip from test where name='server';
LINE 1: select ip from test where name='server';
^
HINT: Perhaps you meant to reference the column "test.
ip".
cls=# select t.ip from test t;
LINE 1: select t.ip from test t;
^
HINT: Perhaps you meant to reference the column "t.
ip".
I tried double quotes and single quotes but no luck.
As the error message says, your column isn't called ip, it's called 
ip - notice the "funny" character before the i.
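One way to make such a hidden character visible is to dump the name byte by byte from the shell. This sketch simulates the broken column name with printf (the leading \n is an assumption about what the "funny" character is, based on the wrapped psql output; on a live system you would pull the name from information_schema.columns):

```shell
# Simulate a column name that *looks* like "ip" but has a leading newline:
col="$(printf '\nip')"

# od -c shows every byte, so control characters can't hide:
printf '%s' "$col" | od -c

# A plain "ip" comparison fails, which is exactly the symptom in the question:
if [ "$col" = "ip" ]; then
  echo "plain ip"
else
  echo "contains hidden characters"
fi
```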

problems with full-text search in postgres

I have the following table and data:
/* script for people table, with field tsvector and gin */
CREATE TABLE public.people (
  id       INTEGER,
  name     VARCHAR(30),
  lastname VARCHAR(30),
  complete TSVECTOR
)
WITH (oids = false);
CREATE INDEX idx_complete ON public.people
USING gin (complete);
/* data for people table */
INSERT INTO public.people ("id", "name", "lastname", "complete")
VALUES
(1, 'MICHAEL', 'BRYANT BRYANT', '''bryant'':2,3 ''michael'':1'),
(2, 'HENRY STEVEN', 'BUSH TIESSEN', '''bush'':3 ''henri'':1 ''steven'':2 ''tiessen'':4'),
(3, 'WILLINGTON STEVEN', 'STEPHENS FLINN', '''flinn'':4 ''stephen'':3 ''steven'':2 ''willington'':1'),
(4, 'BRET', 'MARTINEZ AROCH', '''aroch'':3 ''bret'':1 ''martinez'':2'),
(5, 'TERENCE BERT', 'CAVALIERE ENRON', '''bert'':2 ''cavalier'':3 ''terenc'':1');
I need to retrieve the names and last names according to the tsvector field. Currently I have this query:
SELECT * FROM people WHERE complete @@ to_tsquery('WILLINGTON & FLINN');
And the result is right (the third record). But if I try
SELECT * FROM people WHERE complete @@ to_tsquery('STEVEN & FLINN');
/* the same record! */
I get no results. Why? What can I do?
You should search your table using the same language with which the values in your 'complete' field were inserted.
Compare the results of this query for english and german:
select * ,
to_tsvector('english', concat_ws(' ', name, lastname )) as english,
to_tsvector('german', concat_ws(' ', name, lastname )) as german
from public.people
So this should work for you:
SELECT * FROM people WHERE complete @@ to_tsquery('english','STEVEN & FLINN');
You are probably using a text search configuration where either STEVEN or FLINN are modified by stemming.
I can reproduce this here:
test=> SHOW default_text_search_config;
default_text_search_config
----------------------------
pg_catalog.german
(1 row)
test=> SELECT complete FROM public.people WHERE id = 3;
complete
-------------------------------------------------
'flinn':4 'stephen':3 'steven':2 'willington':1
(1 row)
test=> SELECT * FROM ts_debug('STEVEN & FLINN');
   alias   |   description   | token  | dictionaries  | dictionary  | lexemes
-----------+-----------------+--------+---------------+-------------+---------
 asciiword | Word, all ASCII | STEVEN | {german_stem} | german_stem | {stev}
 blank     | Space symbols   |        | {}            |             |
 blank     | Space symbols   | &      | {}            |             |
 asciiword | Word, all ASCII | FLINN  | {german_stem} | german_stem | {flinn}
(4 rows)
test=> SELECT * FROM public.people
       WHERE complete @@ to_tsquery('STEVEN & FLINN');
 id | name | lastname | complete
----+------+----------+----------
(0 rows)
So you see, the German Snowball dictionary stems STEVEN to stev.
Since complete contains the unstemmed version steven, no match is found.
You should use the same text search configuration when you populate complete and in the query.
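The mechanics can be illustrated outside the database: @@ only matches when the query's lexemes appear exactly among the tsvector's lexemes. Below is a rough shell analogue using the stored lexemes from the question's row and grep's whole-word matching (a toy model, not how Postgres actually evaluates tsqueries):

```shell
# Lexemes stored in the row's tsvector (from the question):
stored="flinn stephen steven willington"

# German stemming turns the query word STEVEN into 'stev'; an exact
# whole-word lookup for it fails, while 'flinn' succeeds:
if echo "$stored" | grep -qw "stev"; then match_stev=yes; else match_stev=no; fi
if echo "$stored" | grep -qw "flinn"; then match_flinn=yes; else match_flinn=no; fi
echo "stev: $match_stev, flinn: $match_flinn"
```

Since the tsquery needs both lexemes and 'stev' is missing, the row is not returned, which is exactly the failure in the question.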

psql SQL Interpolation in a code block

In some of my scripts I use SQL Interpolation feature of psql utility:
basic.sql:
update :schema.mytable set ok = true;
> psql -h 10.0.0.1 -U postgres -f basic.sql -v schema=myschema
Now I need a bit more complicated scenario: I need to specify the schema name (and desirably some other things) inside a PL/pgSQL code block:
pg.sql
do
$$
begin
update :schema.mytable set ok = true;
end;
$$
But unfortunately this does not work, since psql does not replace :variables inside $$-quoted strings.
Is there a way to work around this in general? Or more specifically, how can I substitute schema names into a PL/pgSQL code block or function definition?
From the docs you referenced:
Variable interpolation will not be performed within quoted SQL
literals and identifiers. Therefore, a construction such as ':foo'
doesn't work to produce a quoted literal from a variable's value (and
it would be unsafe if it did work, since it wouldn't correctly handle
quotes embedded in the value).
It does not matter whether the quotes are dollar signs or single quotes; it won't work either way, e.g.:
do
'
begin
update :schema.mytable set ok = true;
end;
'
ERROR: syntax error at or near ":"
To pass a variable into the quoted statement another way, you can try using shell variables, e.g.:
MacBook-Air:~ vao$ cat do.sh; export schema_name='smth' && bash do.sh
psql -X so <<EOF
\dn+
do
\$\$
begin
execute format ('create schema %I','$schema_name');
end;
\$\$
;
\dn+
EOF
List of schemas
Name | Owner | Access privileges | Description
----------+----------+-------------------+------------------------
public | vao | vao=UC/vao +| standard public schema
| | =UC/vao |
schema_a | user_old | |
(2 rows)
DO
List of schemas
Name | Owner | Access privileges | Description
----------+----------+-------------------+------------------------
public | vao | vao=UC/vao +| standard public schema
| | =UC/vao |
schema_a | user_old | |
smth | vao | |
(3 rows)
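If you'd rather not rely on format('%I') inside the block, you can also quote the identifier on the client side before splicing it in. This is only a sketch: quote_ident below is a hypothetical shell helper that doubles embedded double quotes, mimicking what Postgres' quote_ident()/format('%I') do server-side (it does not handle every edge case, e.g. encoding issues):

```shell
# Double any embedded " and wrap the whole name in "..." so it is safe
# to splice into SQL as an identifier:
quote_ident() {
  printf '"%s"' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

schema_name='my"schema'   # hypothetical, deliberately awkward name
stmt="do \$\$ begin execute 'create schema $(quote_ident "$schema_name")'; end; \$\$;"
echo "$stmt"
```

The resulting statement can then be piped to psql just like the heredoc above, with the identifier already safely quoted.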

String includes some other strings

How do I check in postgres that a varchar contains 'aaa' or 'bbb'?
I tried myVarchar IN ('aaa', 'bbb') but, obviously, it's true when myvarchar is exactly equal to 'aaa' or 'bbb'.
For checking multiple patterns, the best fit in terms of speed and concise syntax would be:
SIMILAR TO '%(aaa|bbb|ccc)%'
You can use the ANY and LIKE operators together:
SELECT * FROM "myTable" WHERE "myColumn" LIKE ANY( ARRAY[ '%aaa%', '%bbb%' ] );
Assuming this is your table:
CREATE TABLE t
(
myVarchar varchar
) ;
INSERT INTO t (myVarchar)
VALUES
('something aaa else'),
('also some bbb'),
('maybe ccc') ;
-- (some random data, this query is PostgreSQL specific)
INSERT INTO t (myVarchar)
SELECT
random()::varchar
FROM
generate_series(1, 10000) ;
SQL Standard approach:
You can do (in all SQL standard databases):
SELECT
*
FROM
t
WHERE
myVarchar LIKE '%aaa%' or myVarchar LIKE '%bbb%' ;
and you'll get:
| myvarchar |
| :----------------- |
| something aaa else |
| also some bbb |
PostgreSQL specific approaches
Specifically for PostgreSQL, you can use a (single) regex with multiple values to look for:
SELECT
*
FROM
t
WHERE
myVarchar ~ 'aaa|bbb' ;
| myvarchar |
| :----------------- |
| something aaa else |
| also some bbb |
If you need fast lookups, you can use trigram indexes, like this:
CREATE EXTENSION pg_trgm; -- Only needed if extension not already installed
CREATE INDEX myVarchar_like_idx
ON t
USING GIST (myVarchar gist_trgm_ops);
... the query using LIKE will be much faster.
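As a quick illustration of the regex approach's semantics (a row matches if it contains any of the alternatives as a substring), here is the same filter done in the shell over the sample rows; grep -E takes the same 'aaa|bbb' pattern:

```shell
# Three sample rows, two of which contain 'aaa' or 'bbb':
matches=$(printf '%s\n' 'something aaa else' 'also some bbb' 'maybe ccc' \
  | grep -cE 'aaa|bbb')   # -c counts matching lines
echo "$matches matching rows"
```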

How to Check for Two Columns and Query Every Table When They Exist?

I'm interested in doing a COUNT(*), SUM(LENGTH(blob)/1024./1024.), and ORDER BY SUM(LENGTH(blob)) for my entire database wherever a column 'blob' exists. For tables where 'synchlevel' does not exist, I still want the output; where it does exist, I'd like to GROUP BY that column:
Example
+--------+------------+--------+-----------+
| table | synchlevel | count | size_mb |
+--------+------------+--------+-----------+
| tableA | 0 | 924505 | 3013.47 |
| tableA | 7 | 981 | 295.33 |
| tableB | 6 | 1449 | 130.50 |
| tableC | 1 | 64368 | 68.43 |
| tableD | NULL | 359 | .54 |
| tableD | NULL | 778 | .05 |
+--------+------------+--------+-----------+
I would like to do a pure SQL solution, but I'm having a bit of difficulty with that. Currently, I'm wrapping some SQL into BASH.
#!/bin/bash
USER=$1
DBNAME=$2
function psql_cmd(){
cmd=$1
prefix='\pset border 2 \\ '
echo $prefix $cmd | psql -U $USER $DBNAME | grep -v "Border\| row"
}
function synchlevels(){
echo "===================================================="
echo " SYNCH LEVEL STATS "
echo "===================================================="
tables=($(psql -U $USER -tc "SELECT table_name FROM information_schema.columns
WHERE column_name = 'blob';" $DBNAME))
for table in ${tables[@]}; do
count_size="SELECT t.synchlevel,
COUNT(t.blob) AS count,
to_char(SUM(LENGTH(t.blob)/1024./1024.),'99999D99') AS size_mb
FROM $table AS t
GROUP BY t.synchlevel
ORDER BY SUM(LENGTH(t.blob)) DESC;"
echo $table
psql_cmd "$count_size"
done
echo "===================================================="
}
I could extend this by creating a second BASH array of tables that have the 'synchlevel' column, compare, and use that list to run the SQL, but I was wondering if there is a way to do the SQL portion purely in SQL, without building these lists in BASH and doing the comparisons externally. That is, I want to avoid looping over the tables externally and making numerous queries in tables=($(psql -U $USER....
I've tried the following SQL to test on a table where I know the column doesn't exist...
SELECT
CASE WHEN EXISTS(SELECT * FROM information_schema.columns
WHERE column_name = 'synchlevel'
AND table_name = 'archivemetadata')
THEN synchlevel
END,
COUNT(blob) AS count,
to_char(SUM(LENGTH(blob)/1024./1024.),'99999D99') AS size_mb
FROM archivemetadata, information_schema.columns AS info
WHERE info.column_name = 'blob'
However, it fails on THEN synchlevel for tables where it doesn't exist. It seems really simple to do, but I just can't seem to find a way to do this which doesn't require either:
Resorting to external array comparisons in BASH.
Can be done, but I'd like to simplify my solution rather than add another layer.
Creating PL/PGSQL functions.
This script is really just to help with some database data analysis for improving performance in a third-party software. We are not a shop of DB Admins, so I would prefer not to dive into PL/PGSQL as that would require more folks from our shop to also become acquainted with the language in order to support the script. Again, simplicity is the motivation here.
Postgresql 8.4 is the engine. (We cannot upgrade due to security constraints by an overseeing IT body.)
Thanks for any suggestions you might have!
The following is untested, but how about creating some dynamic sql in one psql session and piping it to another?
psql -d <yourdb> -qtAc "
select 'select ' || (case when info.column_name = 'synchlevel' then 'synchlevel,' else '' end) ||
'count(*) as cnt,' ||
'to_char(SUM(LENGTH(blob)::NUMERIC/1024/1024),''99999D99'') AS size_mb' ||
' from ' || info.table_name ||
(case when info.column_name = 'synchlevel' then ' group by synchlevel order by synchlevel' else '' end)
from information_schema.columns as info
where info.table_name IN (select distinct table_name from information_schema.columns where column_name = 'blob')" | psql -d <yourdb>
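To see what the generator produces without touching a database, here is a dry-run sketch; gen_query is a hypothetical helper that mirrors the string concatenation above, with the information_schema check replaced by a plain yes/no flag:

```shell
# Emit the per-table statement the generator would build, given the table
# name and whether it has a 'synchlevel' column:
gen_query() {
  table=$1; has_synchlevel=$2
  sel=""; grp=""
  if [ "$has_synchlevel" = "yes" ]; then
    sel="synchlevel,"
    grp=" group by synchlevel order by synchlevel"
  fi
  printf "select %scount(*) as cnt,to_char(SUM(LENGTH(blob)::NUMERIC/1024/1024),'99999D99') AS size_mb from %s%s\n" \
    "$sel" "$table" "$grp"
}

gen_query archivemetadata no   # table without synchlevel
gen_query tableA yes           # table with synchlevel
```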