Unable to replace dash with null during COPY operation from CSV - postgresql

I have the following CSV data
"AG","Saint Philip","AG-08"
"AI","Anguilla","-"
"AL","Berat","AL-01"
I want to replace - with NULL
I use the following command
copy subdivision from '/tmp/IP2LOCATION-ISO3166-2.CSV' with delimiter as ',' NULL AS '-' csv;
The copy operation succeeds. However, the - in the 3rd column is copied as-is instead of being replaced with NULL.
Do you have any idea what the mistake in my command is? My table is
CREATE TABLE subdivision(
country_code TEXT NOT NULL,
name TEXT NOT NULL,
code TEXT
);

It comes down to the quoting. If you have this:
"AG","Saint Philip","AG-08"
"AI","Anguilla",-
"AL","Berat","AL-01"
Then the below works (using the newer COPY syntax):
copy
subdivision
from
'/home/postgres/csv_test.csv'
with(format csv, delimiter ',' , NULL '-');
COPY 3
\pset null NULL
select * from subdivision ;
 country_code |     name     | code
--------------+--------------+-------
 AG           | Saint Philip | AG-08
 AI           | Anguilla     | NULL
 AL           | Berat        | AL-01
If you maintain the original csv:
"AG","Saint Philip","AG-08"
"AI","Anguilla","-"
"AL","Berat","AL-01"
then you have to do this:
copy
subdivision
from
'/home/postgres/csv_test.csv'
with(format csv, delimiter ',' , NULL '-', FORCE_NULL (code) );
select * from subdivision ;
 country_code |     name     | code
--------------+--------------+-------
 AG           | Saint Philip | AG-08
 AI           | Anguilla     | NULL
 AL           | Berat        | AL-01
where FORCE_NULL is:
https://www.postgresql.org/docs/current/sql-copy.html
FORCE_NULL
Match the specified columns' values against the null string, even if it has been quoted, and if a match is found set the value to NULL. In the default case where the null string is empty, this converts a quoted empty string into NULL. This option is allowed only in COPY FROM, and only when using CSV format.
So to convert quoted values you have to force the conversion by specifying the column(s).
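Applied to the original command and file path from the question, that would look like this (same fix, just rewritten in the newer WITH (...) syntax; an untested sketch):
copy subdivision
from '/tmp/IP2LOCATION-ISO3166-2.CSV'
with (format csv, delimiter ',', NULL '-', FORCE_NULL (code));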

Related

How to select the part of a column between _ and a

I have a column href which contains a list of links
href
100712107%23C3.3_.pdf
100712107~%23C3.2_C00.zip
100740104_C2.3.xls
testme_aze_--b00.xls
My output would be
href                      | name              | rev
--------------------------+-------------------+-------
100712107%23C3.3_.pdf     | 100712107%23C3.3  |
100712107~%23C3.2_C00.zip | 100712107~%23C3.2 | C00
100740104_C2.3.xls        | 100740104         | C2.3
testme_aze_--b00.xls      | testme_aze        | --b00
I have tried to use reverse(split_part(reverse(href),'.',1))
You can use a regex together with substring() to extract the parts you want:
select href,
substring(href from '(.*)_') as name,
substring(href from '.*_(.*)\.') as rev
from the_table
returns:
           href            |       name        |  rev
---------------------------+-------------------+-------
 100712107%23C3.3_.pdf     | 100712107%23C3.3  |
 100712107~%23C3.2_C00.zip | 100712107~%23C3.2 | C00
 100740104_C2.3.xls        | 100740104         | C2.3
 testme_aze_--b00.xls      | testme_aze        | --b00
substring() will return the first group from the regex. The first regex matches all characters up to the last _.
The second expression matches everything after the last _ up to the last .
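If you prefer regexp_replace(), here is an equivalent sketch (my own variant, not from the original answer, assuming the same "name before the last _, rev between the last _ and the extension" convention):
select href,
       -- drop everything from the last "_" onward to get the name
       regexp_replace(href, '_[^_]*$', '') as name,
       -- capture what sits between the last "_" and the file extension
       regexp_replace(href, '^.*_(.*)\.[^.]*$', '\1') as rev
from the_table;
Note that regexp_replace() returns the input unchanged when the pattern does not match (e.g. a href without any _), whereas the substring() version returns NULL in that case.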
You want to add a column to the table?
ALTER TABLE TABLE_NAME ADD COLUMN HREF VARCHAR(255) NOT NULL;

How to update JSONB column with value coming from another table column in PostgreSQL

I have a source table, which looks like below:
public.source
Id | part_no | category
1 | 01270-4 | Landscape
2 | 01102-3 | Sports
Then, I have a target table with a jsonb column (combinations), which looks like below:
public.target
Id | part_no | combinations
7 | 01270-4 | {"subject":""}
8 | 01102-3 | {"subject":""}
My problem is: how can I update the target table's jsonb column (combinations) with the values coming from the source table, using the part_no column to match the rows?
Output like:
Id | part_no | combinations
7 | 01270-4 | {"subject":"Landscape"}
8 | 01102-3 | {"subject":"Sports"}
I tried below but giving error:
UPDATE public.target t
SET combinations = jsonb_set(combinations,'{subject}','s.category',false)
FROM public.source s
WHERE s.part_no = t.part_no;
ERROR: invalid input syntax for type json
LINE 2: SET combinations = jsonb_set(combinations,'{subject}', 's.categor...
^
DETAIL: Token "s" is invalid.
CONTEXT: JSON data, line 1: s...
SQL state: 22P02
Character: 77
You should use the to_jsonb function to convert s.category to JSON:
UPDATE public.target t
SET combinations = jsonb_set(combinations,'{subject}',to_jsonb(s.category),false)
FROM public.source s
WHERE s.part_no = t.part_no
Or you can use this structure to join the tables and merge the two jsonb values:
UPDATE public.target t
SET combinations = combinations || jsonb_build_object('subject', s.category)
FROM public.source s
WHERE s.part_no = t.part_no
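As a standalone illustration (my own sketch, not part of the original answer) of what the two expressions do:
-- jsonb_set replaces the value at the given path; with create_if_missing = false
-- a missing "subject" key would be left untouched
SELECT jsonb_set('{"subject":""}'::jsonb, '{subject}', to_jsonb('Landscape'::text), false);
-- result: {"subject": "Landscape"}

-- || merges top-level keys, overwriting "subject" or adding it when missing
SELECT '{"subject":""}'::jsonb || jsonb_build_object('subject', 'Sports');
-- result: {"subject": "Sports"}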

What is the simplest way to migrate data from MySQL to DB2

I need to migrate data from MySQL to DB2. Both DBs are up and running.
I tried mysqldump with --no-create-info --extended-insert=FALSE --complete-insert and, with a few changes to the output (e.g. changing ` to "), I get a satisfactory result, but sometimes I run into weird exceptions, like
does not have an
ending string delimiter. SQLSTATE=42603
Ideally I would want to have a routine that is as general as possible, but as an example here, let's say I have a DB2 table that looks like:
db2 => describe table "mytable"
Column name   Data type schema   Data type name   Column Length   Scale   Nulls
------------- ------------------ ---------------- --------------- ------- ------
id            SYSIBM             BIGINT           8               0       No
name          SYSIBM             VARCHAR          512             0       No

2 record(s) selected.
Its MySQL counterpart being
mysql> describe mytable;
+-------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| name | varchar(512) | NO | | NULL | |
+-------+--------------+------+-----+---------+----------------+
2 rows in set (0.01 sec)
Let's assume the DB2 and MySQL databases are called mydb.
Now, if I do
mysqldump -uroot mydb mytable --no-create-info --extended-insert=FALSE --complete-insert | # mysqldump options: no CREATE TABLE statement, one INSERT per record, output the column names
sed -n -e '/^INSERT/p' | # only keep lines beginning with "INSERT"
sed 's/`/"/g' | # replace ` with "
sed 's/;$//g' | # remove `;` at end of insert query
sed "s/\\\'/''/g" # replace `\'` with `''` , see http://stackoverflow.com/questions/2442205/how-does-one-escape-an-apostrophe-in-db2-sql and http://stackoverflow.com/questions/2369314/why-does-sed-require-3-backslashes-for-a-regular-backslash
I get:
INSERT INTO "mytable" ("id", "name") VALUES (1,'record 1')
INSERT INTO "mytable" ("id", "name") VALUES (2,'record 2')
INSERT INTO "mytable" ("id", "name") VALUES (3,'record 3')
INSERT INTO "mytable" ("id", "name") VALUES (4,'record 4')
INSERT INTO "mytable" ("id", "name") VALUES (5,'" "" '' '''' \"\" ')
This output can be used as DB2 queries and it works well.
Any idea on how to solve this more efficiently/generally? Any other suggestions?
After having played around a bit, I came up with the following routine, which I believe to be fairly general, robust and scalable.
1. Run the following command:
mysqldump -uroot mydb mytable --no-create-info --extended-insert=FALSE --complete-insert | # mysqldump options: no CREATE TABLE statement, one INSERT per record, output the column names
sed -n -e '/^INSERT/p' | # only keep lines beginning with "INSERT"
sed 's/`/"/g' | # replace ` with "
sed -e 's/\\"/"/g' | # replace `\"` with `"` (mysql escapes double quotes)
sed "s/\\\'/''/g" > out.sql # replace `\'` with `''` , see http://stackoverflow.com/questions/2442205/how-does-one-escape-an-apostrophe-in-db2-sql and http://stackoverflow.com/questions/2369314/why-does-sed-require-3-backslashes-for-a-regular-backslash
Note: unlike in the question, the trailing ; is not removed here, because the db2 command below uses ; as the statement terminator (-t).
2. Upload the file to the DB2 server:
scp out.sql user@myserver:out.sql
3. Run the queries from the file (-t: statements end with ;, -v: echo each statement, -s: stop on error, -f: read from the given file):
db2 -tvsf /path/to/query/file/out.sql

How to deal with missing values when importing CSV into Postgres?

I would like to import a csv file which has multiple occurrences of missing values. I recoded them into NULL and tried to import the file as is. I suppose that my attributes which include the NULLs are character values. However, transforming them to numeric is a bit complicated. Therefore I would like to import all of my table like this:
\copy player_allstar FROM '/Users/Desktop/Rdaten/Data/player_allstar.csv' DELIMITER ';' CSV WITH NULL AS 'NULL' ';' HEADER
There must be a syntax error. But I tried different combinations and always get:
ERROR: syntax error at or near "WITH NULL"
LINE 1: COPY player_allstar FROM STDIN DELIMITER ';' CSV WITH NULL ...
I also tried:
\copy player_allstar FROM '/Users/Desktop/Rdaten/Data/player_allstar.csv' WITH(FORMAT CSV, DELIMITER ';', NULL 'NULL', HEADER);
and get:
ERROR: invalid input syntax for integer: "NULL"
CONTEXT: COPY player_allstar, line 2, column dreb: "NULL"
I suppose it is caused by preprocessing with R. The table came with NAs, so I changed them with:
data[data==NA] <- "NULL"
I'm not aware of a different way of changing them to NULL. I think this produces strings. Is there a different way to preprocess and keep the NAs (as NULLs in Postgres, of course)?
Sample:
pts dreb oreb reb asts stl
11 NULL NULL 8 3 NULL
4 5 3 8 2 1
3 NULL NULL 1 1 NULL
The data type is integer.
Given /tmp/sample.csv:
pts;dreb;oreb;reb;asts;stl
11;NULL;NULL;8;3;NULL
4;5;3;8;2;1
3;NULL;NULL;1;1;NULL
then with a table like:
CREATE TABLE player_allstar (pts integer, dreb integer, oreb integer, reb integer, asts integer, stl integer);
it works for me:
\copy player_allstar FROM '/tmp/sample.csv' WITH (FORMAT CSV, DELIMITER ';', NULL 'NULL', HEADER);
Your syntax is fine; the problem seems to be in the formatting of your data. Using your syntax I was able to load data with NULLs successfully:
mydb=# create table test(a int, b text);
CREATE TABLE
mydb=# \copy test from stdin WITH(FORMAT CSV, DELIMITER ';', NULL 'NULL', HEADER);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> col a header;col b header
>> 1;one
>> NULL;NULL
>> 3;NULL
>> NULL;four
>> \.
mydb=# select * from test;
a | b
---+------
1 | one
|
3 |
| four
(4 rows)
mydb=# select * from test where a is null;
a | b
---+------
|
| four
(2 rows)
In your case you can substitute NULL 'NA' in the copy command, if the original value is 'NA'.
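For example, a sketch using the path from the question (assuming the file still contains the original NA tokens rather than the recoded NULL strings):
\copy player_allstar FROM '/Users/Desktop/Rdaten/Data/player_allstar.csv' WITH (FORMAT CSV, DELIMITER ';', NULL 'NA', HEADER);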
You should also make sure that there are no spaces around your data values. For example, if your NULL is represented as NA in your data and the fields are delimited with a semicolon:
1;NA <-- good
1 ; NA <-- bad
1<tab>NA <-- bad
etc.

Filter an ID Column against a range of values

I have the following SQL:
SELECT ',' + LTRIM(RTRIM(CAST(vessel_is_id as CHAR(2)))) + ',' AS 'Id'
FROM Vessels
WHERE ',' + LTRIM(RTRIM(CAST(vessel_is_id as varCHAR(2)))) + ',' IN (',1,2,3,4,5,6,')
Basically, I want to filter the vessel_is_id against a variable list of integer values (which is passed as a varchar into the stored proc). Now, the above SQL does not work. I do have rows in the table with a vessel_is_id of 1, but they are not returned.
Can someone suggest a better approach to this for me? Or, if the above is OK, what am I missing?
EDIT:
Sample data
| vessel_is_id |
| ------------ |
| 1 |
| 2 |
| 5 |
| 3 |
| 1 |
| 1 |
So I want to return all of the above rows where vessel_is_id is in a variable filter, i.e. '1,3', which should return 4 records.
Cheers.
Jas.
IF OBJECT_ID(N'dbo.fn_ArrayToTable', N'TF') IS NOT NULL
    DROP FUNCTION [dbo].[fn_ArrayToTable]
GO
CREATE FUNCTION [dbo].fn_ArrayToTable (@array VARCHAR(MAX))
-- =============================================
-- Author: Dan Andrews
-- Create date: 04/11/11
-- Description: String to Table-Valued Function
-- =============================================
RETURNS @output TABLE (data VARCHAR(256))
AS
BEGIN
    DECLARE @pointer INT
    SET @pointer = CHARINDEX(',', @array)
    WHILE @pointer != 0
    BEGIN
        -- take the piece before the next comma, trimmed
        INSERT INTO @output
        SELECT RTRIM(LTRIM(LEFT(@array, @pointer - 1)))
        -- drop that piece and find the next comma
        SELECT @array = RIGHT(@array, LEN(@array) - @pointer),
               @pointer = CHARINDEX(',', @array)
    END
    -- keep whatever is left after the last comma
    INSERT INTO @output
    SELECT RTRIM(LTRIM(@array))
    RETURN
END
Which you may apply like:
SELECT * FROM dbo.fn_ArrayToTable('2,3,4,5,2,2')
and in your case:
SELECT LTRIM(RTRIM(CAST(vessel_is_id AS CHAR(2)))) AS 'Id'
FROM Vessels
WHERE LTRIM(RTRIM(CAST(vessel_is_id AS VARCHAR(2)))) IN (SELECT data FROM dbo.fn_ArrayToTable('1,2,3,4,5,6'))
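As a side note (my own sketch, not part of the original answers): on SQL Server 2016 and later the built-in STRING_SPLIT function can do the splitting, assuming the parameter really is a plain comma-separated list of integers:
-- @filter stands in for the varchar parameter passed into the stored proc
DECLARE @filter VARCHAR(100) = '1,3';
SELECT vessel_is_id AS Id
FROM Vessels
WHERE vessel_is_id IN (SELECT CAST(value AS INT)
                       FROM STRING_SPLIT(@filter, ','));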
Since SQL Server doesn't have an array type, you may want to consider passing in a set of values as an XML parameter. You can then turn the XML into a relation and join on it, drawing on the time-tested pubs database for example. Of course, your client may or may not have an easy time generating the XML for the parameter value, but this approach is safe from SQL injection, which most "comma separated" value approaches are not.
declare @stateSelector xml
set @stateSelector = '<values>
<value>or</value>
<value>ut</value>
<value>tn</value>
</values>'
select * from authors
where state in ( select c.value('.', 'varchar(2)') from @stateSelector.nodes('//value') as t(c))