Snowflake null values quoted in CSV break PostgreSQL load

I am trying to move data from Snowflake to PostgreSQL, and to do so I first unload it to S3 in CSV format. In the table, commas can appear in text, so I use Snowflake's FIELD_OPTIONALLY_ENCLOSED_BY unloading option to quote the contents of the problematic cells. However, when this is combined with null values, I can't manage to produce a CSV that is valid for PostgreSQL.
I created a simple table to illustrate the issue. Here it is:
CREATE OR REPLACE TABLE PUBLIC.TEST(
    TEXT_FIELD VARCHAR,
    NUMERIC_FIELD INT
);
INSERT INTO PUBLIC.TEST VALUES
('A', 1),
(NULL, 2),
('B', NULL),
(NULL, NULL),
('Hello, world', NULL)
;
COPY INTO @STAGE/test
FROM PUBLIC.TEST
FILE_FORMAT = (
    COMPRESSION = NONE,
    TYPE = CSV,
    FIELD_OPTIONALLY_ENCLOSED_BY = '"',
    NULL_IF = ''
)
OVERWRITE = TRUE;
From that, Snowflake creates the following CSV:
"A",1
"",2
"B",""
"",""
"Hello, world",""
But after that, it is impossible for me to copy this CSV into a PostgreSQL table as it is.
Even though the PostgreSQL documentation says the following about the NULL option:
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format.
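In other words, in CSV format PostgreSQL distinguishes an unquoted empty field from a quoted one, for example:
,2 <-- unquoted empty string, read as NULL
"",2 <-- quoted empty string, NOT read as NULL
This is why the quoted "" that Snowflake emits cannot be read back as NULL.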
Not setting any COPY options on the PostgreSQL side results in a failed load. Indeed, it won't work because we also have to specify the quote character using QUOTE; here it'll be QUOTE '"'.
Therefore, during the PostgreSQL load, using:
FORMAT csv, HEADER false, QUOTE '"' will give:
DataError: invalid input syntax for integer: "" CONTEXT: COPY test, line 3, column numeric_field: ""
FORMAT csv, HEADER false, NULL '""', QUOTE '"' will give:
NotSupportedError: CSV quote character must not appear in the NULL specification
FYI, to test the load from S3 I use this command in PostgreSQL:
CREATE TABLE IF NOT EXISTS PUBLIC.TEST(
    TEXT_FIELD VARCHAR,
    NUMERIC_FIELD INT
);
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
SELECT aws_s3.table_import_from_s3(
    'PUBLIC.TEST',
    '',
    '(FORMAT csv, HEADER false, NULL ''""'', QUOTE ''"'')',
    'bucket',
    'test_0_0_0.csv',
    'aws_region'
);
Thanks a lot for any ideas on how to make this work. I would love to find a solution that doesn't require modifying the CSV between Snowflake and PostgreSQL. I think the issue is more on the Snowflake side, as it doesn't really make sense to quote null values, but PostgreSQL is not helping either.

When you set the NULL_IF value to '', you are actually telling Snowflake to convert NULLs to a blank string, which then gets quoted. When you are copying out of Snowflake, the copy options are "backwards" in a sense, and NULL_IF acts more like an IFNULL.
This is the code that I'd use on the Snowflake side, which will result in an unquoted empty string in your CSV file:
FILE_FORMAT = (
    COMPRESSION = NONE,
    TYPE = CSV,
    FIELD_OPTIONALLY_ENCLOSED_BY = '"',
    NULL_IF = ()
)
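With NULL_IF = (), the NULLs should unload as truly empty, unquoted fields, along the lines of:
"A",1
,2
"B",
,
"Hello, world",
Since an unquoted empty string is PostgreSQL's default NULL representation in CSV format, the import from the question should then work without any NULL option; a sketch reusing the question's placeholder bucket and region:
SELECT aws_s3.table_import_from_s3(
    'PUBLIC.TEST',
    '',
    '(FORMAT csv, HEADER false, QUOTE ''"'')',
    'bucket',
    'test_0_0_0.csv',
    'aws_region'
);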

Related

Snowflake CSV copy failures

I'm trying to load CSV data into Snowflake using Snowflake COPY. The CSV column separator is the pipe character (|). One of the columns in the input CSV contains the | character in its data, escaped by a backslash. The data type of the target Snowflake column is VARCHAR(n). With the added escape character (\) the data exceeds the target column's defined size, which causes the copy to fail.
Is there a way I can remove the escape character (\) from the data before it is loaded into the table?
copy into table_name from 's3path.csv.gz'
file_format = (type = 'csv', field_optionally_enclosed_by = '"'
    escape_unenclosed_field = NONE empty_field_as_null = true
    escape = '\' field_delimiter = '|' skip_header = 1
    NULL_IF = ('\\N', '', '\N'))
ON_ERROR = 'ABORT_STATEMENT' PURGE = FALSE
Sample data that causes the failure: "Data 50k | $200K "

Copy into snowflake table from raw data file using Perl DBI

There's not much info out there for Perl DBI and Snowflake, so I'll give this a shot. I have a raw file whose headers are contained in line 1. This exact 'copy into' command works from the Snowflake GUI. I'm not sure if I can just take this exact command and put it into a Perl prepare and execute.
COPY INTO DBTABLE.LND_LND_STANDARD_DATA FROM (
SELECT SPLIT_PART(METADATA$FILENAME,'/',4) as SEAT_ID,
$1:auction_id_64 as AUCTION_ID_64,
DATEADD(S,$1:date_time,'1970-01-01') as DATE_TIME,
$1:user_tz_offset as USER_TZ_OFFSET,
$1:creative_width as CREATIVE_WIDTH,
$1:creative_height as CREATIVE_HEIGHT,
$1:media_type as MEDIA_TYPE,
$1:fold_position as FOLD_POSITION,
$1:event_type as EVENT_TYPE
FROM @DBTABLE.lnd.S3_STAGE_READY/pr/data/standard/data_dt=20200825/00/STANDARD_FILE.gz.parquet)
pattern = '.*.parquet' file_format = (TYPE = 'PARQUET' SNAPPY_COMPRESSION = TRUE)
ON_ERROR = 'SKIP_FILE_10%'
my $SQL = "COPY INTO DBTABLE.LND_LND_STANDARD_DATA FROM (
SELECT SPLIT_PART(METADATA\$FILENAME,'/',4) as SEAT_ID,
\$1:auction_id_64 as AUCTION_ID_64,
DATEADD(S,\$1:date_time,'1970-01-01') as DATE_TIME,
\$1:user_tz_offset as USER_TZ_OFFSET,
\$1:creative_width as CREATIVE_WIDTH,
\$1:creative_height as CREATIVE_HEIGHT,
\$1:media_type as MEDIA_TYPE,
\$1:fold_position as FOLD_POSITION,
\$1:event_type as EVENT_TYPE
FROM \@DBTABLE.lnd.S3_STAGE_READY/pr/data/standard/data_dt=20200825/00/STANDARD_FILE.gz.parquet)
pattern = '.*.parquet' file_format = (TYPE = 'PARQUET' SNAPPY_COMPRESSION = TRUE)
ON_ERROR = 'SKIP_FILE_10%'";
my $sth = $dbh->prepare($SQL);
$sth->execute;
Looking at the output from Snowflake, I see these errors:
syntax error line 3 at position 4 unexpected '?'.
syntax error line 4 at position 13 unexpected '?'.
COPY INTO DBTABLE.LND_LND_STANDARD_DATA FROM (
SELECT SPLIT_PART(METADATA$FILENAME,'/',4) as SEAT_ID,
$1? as AUCTION_ID_64,
DATEADD(S,$1?,'1970-01-01') as DATE_TIME,
$1? as USER_TZ_OFFSET,
$1? as CREATIVE_WIDTH,
$1? as CREATIVE_HEIGHT,
$1? as MEDIA_TYPE
Do I need to create bind variables for each of the columns? I usually pull in the data from the file and put it into variables, but this is different, as I can't read the raw file first; it has to come directly from the COPY INTO command.
Any help would be appreciated.
DBI was interpreting the : as a bind variable, rather than as a path into the variant. I used bracket notation instead, like the following:
my $SQL = "COPY INTO DBTABLE.LND_LND_STANDARD_DATA FROM (
SELECT SPLIT_PART(METADATA\$FILENAME,'/',4) as SEAT_ID,
\$1['auction_id_64'] as AUCTION_ID_64,
DATEADD(S,\$1['date_time'],'1970-01-01') as DATE_TIME,
\$1['user_tz_offset'] as USER_TZ_OFFSET,
\$1['creative_width'] as CREATIVE_WIDTH,
etc...
That worked.
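For what it's worth, both notations are plain Snowflake SQL, so you can verify the bracket form outside Perl first. A minimal sketch against a hypothetical named stage and file format (my_stage and my_parquet_format are assumptions, not from the question):
SELECT $1:user_tz_offset AS colon_form,
       $1['user_tz_offset'] AS bracket_form
FROM @my_stage/STANDARD_FILE.gz.parquet
(FILE_FORMAT => 'my_parquet_format');
Bracket notation returns the same value from the variant while avoiding the colon that DBI mistakes for a named placeholder.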

DB2 SQL Error: SQLCODE=-302 while executing prepared statement

I have a SQL query which takes user input, hence a security flaw is present.
The existing query is:
SELECT BUS_NM, STR_ADDR_1, CITY_NM, STATE_CD, POSTAL_CD, COUNTRY_CD,
BUS_PHONE_NB,PEG_ACCOUNT_ID, GDN_ALERT_ID, GBIN, GDN_MON_REF_NB,
ALERT_DT, ALERT_TYPE, ALERT_DESC,ALERT_PRIORITY
FROM ( SELECT A.BUS_NM, AE.STR_ADDR_1, A.CITY_NM, A.STATE_CD, A.POSTAL_CD,
CC.COUNTRY_CD, A.BUS_PHONE_NB, A.PEG_ACCOUNT_ID, 'I' ||
LPAD(INTL_ALERT_DTL_ID, 9,'0') GDN_ALERT_ID,
LPAD(IA.GBIN, 9,'0') GBIN, IA.GDN_MON_REF_NB,
DATE(IAD.ALERT_TS) ALERT_DT,
XMLCAST(XMLQUERY('$A/alertTypeConfig/biqCode/text()' passing
IAC.INTL_ALERT_TYPE_CONFIG as "A") AS CHAR(4)) ALERT_TYPE,
ROW_NUMBER() OVER () AS "RN"
FROM ACCOUNT A, Other tables
WHERE IA.GDN_MON_REF_NB = '100'
AND A.PEG_ACCOUNT_ID = IAAR.PEG_ACCOUNT_ID
AND CC.COUNTRY_CD = A.COUNTRY_ISO3_CD
ORDER BY IA.INTL_ALERT_ID ASC )
WHERE ALERT_TYPE IN (" +TriggerType+ ");
I changed it to accept TriggerType from setString like:
SELECT BUS_NM, STR_ADDR_1, CITY_NM, STATE_CD, POSTAL_CD, COUNTRY_CD,
BUS_PHONE_NB,PEG_ACCOUNT_ID, GDN_ALERT_ID, GBIN, GDN_MON_REF_NB,
ALERT_DT, ALERT_TYPE, ALERT_DESC,ALERT_PRIORITY
FROM ( SELECT A.BUS_NM, AE.STR_ADDR_1, A.CITY_NM, A.STATE_CD, A.POSTAL_CD,
CC.COUNTRY_CD, A.BUS_PHONE_NB, A.PEG_ACCOUNT_ID,
'I' || LPAD(INTL_ALERT_DTL_ID, 9,'0') GDN_ALERT_ID,
LPAD(IA.GBIN, 9,'0') GBIN, IA.GDN_MON_REF_NB,
DATE(IAD.ALERT_TS) ALERT_DT,
XMLCAST(XMLQUERY('$A/alertTypeConfig/biqCode/text()' passing
IAC.INTL_ALERT_TYPE_CONFIG as "A") AS CHAR(4)) ALERT_TYPE,
ROW_NUMBER() OVER () AS "RN"
FROM ACCOUNT A, other tables
WHERE IA.GDN_MON_REF_NB = '100'
AND A.PEG_ACCOUNT_ID = IAAR.PEG_ACCOUNT_ID
AND CC.COUNTRY_CD = A.COUNTRY_ISO3_CD
ORDER BY IA.INTL_ALERT_ID ASC )
WHERE ALERT_TYPE IN (?);
Setting trigger type as below:
if (StringUtils.isNotBlank(request.getTriggerType())) {
preparedStatement.setString(1, triggerType != null ? triggerType.toString() : "");
}
I am getting this error:
Caused by: com.ibm.db2.jcc.am.SqlDataException: DB2 SQL Error: SQLCODE=-302, SQLSTATE=22001, SQLERRMC=null, DRIVER=4.19.26
The -302 SQLCODE indicates a conversion error of some sort.
SQLSTATE 22001 narrows that down a bit by telling us that you are trying to force a big string into a small variable. Given the limited information in your question, I am guessing it is the XMLCAST that is the culprit.
DB2 won't jam 30 pounds of crap into a 4 pound bag, so to speak; it gives you an error instead. Giving the XMLCAST some extra room might help. If you need to make sure the result ends up being only 4 characters long, you can explicitly do LEFT(XMLCAST( ... AS VARCHAR(64)), 4). That way the XMLCAST has the space it needs, but you cut it back to fit your variable on the fetch.
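A sketch of that fix applied to the query's expression (the VARCHAR(64) width is an arbitrary assumption; size it to whatever the biqCode element can actually hold):
LEFT(XMLCAST(XMLQUERY('$A/alertTypeConfig/biqCode/text()' passing
    IAC.INTL_ALERT_TYPE_CONFIG as "A") AS VARCHAR(64)), 4) ALERT_TYPE,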
The other thing could be that the variable being passed to the parameter marker is too long. DB2 guesses the type and length based on the length of ALERT_TYPE. Note that you can only pass a single value through a parameter marker; if you pass a comma-separated list, it will not behave as expected (unless you expect ALERT_TYPE to also contain a comma-separated list). If you are getting the comma-separated list from a table, you can use a sub-select instead, as sketched below.
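For the sub-select variant, a minimal sketch (the TRIGGER_CONFIG table and its columns are hypothetical, not from the question):
WHERE ALERT_TYPE IN (SELECT TRIGGER_CD FROM TRIGGER_CONFIG WHERE ENABLED = 'Y')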
Wrong IN predicate use with a parameter.
Do not expect that IN ('AAAA, M250, ABCD') (as you try to do passing a comma-separated string as a single parameter) works as IN ('AAAA', 'M250', 'ABCD') (as you need). These predicates are not equivalent.
You need some "string tokenizer" if you want to pass such a comma-separated string, like below.
select t.*
from
(
    select XMLCAST(XMLQUERY('$A/alertTypeConfig/biqCode/text()'
               passing IAC.INTL_ALERT_TYPE_CONFIG as "A") AS CHAR(4)) ALERT_TYPE
    from table(values xmlparse(document
               '<alertTypeConfig><biqCode>M250, really big code</biqCode></alertTypeConfig>'
           )) IAC(INTL_ALERT_TYPE_CONFIG)
) t
--WHERE ALERT_TYPE IN ('AAAA, M250, ABCD')
join xmltable('for $id in tokenize($s, ",\s?") return <i>{string($id)}</i>'
         passing cast('AAA, M250 , ABCD' as varchar(200)) as "s"
         columns token varchar(200) path '.') x
    on x.token = t.ALERT_TYPE
;
Run the statement as is. Then you may uncomment the line with the WHERE clause and comment out the join to see what you were trying to do.
P.S.:
The error you get is probably because you don't specify the data type of the parameter (you don't use something like IN (cast(? as varchar(xxx)))), and the DB2 compiler assumes that its length is equal to the length of the ALERT_TYPE expression (4 bytes).
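For instance (the VARCHAR(200) width is an assumption, sized generously for a single code):
WHERE ALERT_TYPE IN (CAST(? AS VARCHAR(200)))
The explicit cast stops DB2 from rejecting the bound string as too long for the 4 bytes it inferred from ALERT_TYPE, though the IN predicate still treats it as one value, not a list.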

How can I combine these two statements?

I'm currently trying to insert data into a database from text boxes, $enter / $enter2 being where the text is written.
The table consists of three columns: ID, name and nametwo.
ID is auto-incrementing and works fine.
Both statements work fine on their own, but because they are issued separately, the first leaves nametwo blank and the second leaves name blank.
I've tried combining both but haven't had much luck, hope someone can help.
$dbh->do("INSERT INTO $table(name) VALUES ('".$enter."')");
$dbh->do("INSERT INTO $table(nametwo) VALUES ('".$enter2."')");
To paraphrase what others have said:
my $sth = $dbh->prepare("INSERT INTO $table(name,nametwo) values (?,?)");
$sth->execute($enter, $enter2);
So you don't have to worry about quoting.
You should read the database manual.
The query should be:
$dbh->do("INSERT INTO $table(name,nametwo) VALUES ('".$enter."', '".$enter2."')");
The SQL syntax is
INSERT INTO MyTable (
    name_one,
    name_two
) VALUES (
    'value_one',
    'value_two'
)
Your way of generating SQL statements is very fragile. For example, it will fail if the table name is Values or the value is Jester's.
Solution 1:
$dbh->do("
INSERT INTO ".$dbh->quote_identifier($table_name)."
name_one,
name_two
) VALUES (
".$dbh->quote($value_one).",
".$dbh->quote($value_two)."
)
");
Solution 2: Placeholders
$dbh->do(
    "INSERT INTO ".$dbh->quote_identifier($table_name)." (
        name_one,
        name_two
    ) VALUES (
        ?, ?
    )",
    undef,
    $value_one,
    $value_two,
);

How to deal with missing values when importing CSV to Postgres?

I would like to import a CSV file which has multiple occurrences of missing values. I recoded them into NULL and tried to import the file. I suppose that my attributes which include the NULLs are read as character values, and transforming them to numeric afterwards is a bit complicated. Therefore I would like to import the whole table as:
\copy player_allstar FROM '/Users/Desktop/Rdaten/Data/player_allstar.csv' DELIMITER ';' CSV WITH NULL AS 'NULL' ';' HEADER
There must be a syntax error. But I tried different combinations and always get:
ERROR: syntax error at or near "WITH NULL"
LINE 1: COPY player_allstar FROM STDIN DELIMITER ';' CSV WITH NULL ...
I also tried:
\copy player_allstar FROM '/Users/Desktop/Rdaten/Data/player_allstar.csv' WITH(FORMAT CSV, DELIMITER ';', NULL 'NULL', HEADER);
and get:
ERROR: invalid input syntax for integer: "NULL"
CONTEXT: COPY player_allstar, line 2, column dreb: "NULL"
I suppose it is caused by preprocessing with R. The table came with NAs, so I changed them to:
data[is.na(data)] <- "NULL"
I'm not aware of a different way of changing them to NULL. I think this turns the columns into strings. Is there a different way to preprocess and keep the NAs (as NULLs in Postgres, of course)?
Sample:
pts dreb oreb reb asts stl
11 NULL NULL 8 3 NULL
4 5 3 8 2 1
3 NULL NULL 1 1 NULL
The data type of these columns is integer.
Given /tmp/sample.csv:
pts;dreb;oreb;reb;asts;stl
11;NULL;NULL;8;3;NULL
4;5;3;8;2;1
3;NULL;NULL;1;1;NULL
then with a table like:
CREATE TABLE player_allstar (pts integer, dreb integer, oreb integer, reb integer, asts integer, stl integer);
it works for me:
\copy player_allstar FROM '/tmp/sample.csv' WITH (FORMAT CSV, DELIMITER ';', NULL 'NULL', HEADER);
Your syntax is fine; the problem seems to be in the formatting of your data. Using your syntax, I was able to load data with NULLs successfully:
mydb=# create table test(a int, b text);
CREATE TABLE
mydb=# \copy test from stdin WITH(FORMAT CSV, DELIMITER ';', NULL 'NULL', HEADER);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> col a header;col b header
>> 1;one
>> NULL;NULL
>> 3;NULL
>> NULL;four
>> \.
mydb=# select * from test;
a | b
---+------
1 | one
|
3 |
| four
(4 rows)
mydb=# select * from test where a is null;
a | b
---+------
|
| four
(2 rows)
In your case you can use NULL 'NA' in the copy command, if the original value is 'NA'.
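That is, you can skip the R recoding entirely and tell COPY what the missing-value marker already looks like; a sketch against the question's file path, assuming the file still contains NA:
\copy player_allstar FROM '/Users/Desktop/Rdaten/Data/player_allstar.csv' WITH (FORMAT CSV, DELIMITER ';', NULL 'NA', HEADER);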
You should make sure that there are no spaces around your data values. For example, if your NULL is represented as NA in your data and fields are delimited with semicolons:
1;NA <-- good
1 ; NA <-- bad
1<tab>NA <-- bad
etc.