Snowflake CSV COPY failures

I'm trying to load CSV data into Snowflake using the Snowflake COPY command. The CSV column separator is the pipe character (|). One of the columns in the input CSV contains the | character in its data, escaped by a backslash. The data type of the target Snowflake column is VARCHAR(n). With the addition of the escape character (\) in the data, the data size exceeds the target column definition, which causes the COPY to fail.
Is there a way I can remove the escape character (\) from the data before it is loaded into the table?
copy into table_name from 's3path.csv.gz'
file_format = (
    type = 'csv'
    field_optionally_enclosed_by = '"'
    escape_unenclosed_field = NONE
    empty_field_as_null = true
    escape = '\'
    field_delimiter = '|'
    skip_header = 1
    NULL_IF = ('\\N', '', '\N')
)
ON_ERROR = 'ABORT_STATEMENT'
PURGE = FALSE
Sample data that causes the failure: "Data 50k | $200K "
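
One possible direction (a sketch, not a verified fix): Snowflake's COPY accepts a transformation query against a stage, so the backslashes could be stripped with REPLACE during the load. The stage name @my_stage, the path, and the two-column layout below are assumptions for illustration; adjust the SELECT list to the real table:

-- sketch: strip the escape backslashes while copying from the stage
copy into table_name
from (
    select replace(t.$1, '\\', ''), replace(t.$2, '\\', '')   -- '\\' is a single backslash
    from @my_stage/path/file.csv.gz t
)
file_format = (
    type = 'csv'
    field_optionally_enclosed_by = '"'
    field_delimiter = '|'
    skip_header = 1
)
ON_ERROR = 'ABORT_STATEMENT';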

Related

Improve SqlLdr performance for a 120 million record upload from CSV

It is taking almost 10 hours to finish loading into the tables.
Here is my ctl file:
OPTIONS (
skip=1,
ERRORS=4000000,
READSIZE=5000000,
BINDSIZE=8000000,
direct=true
)
UNRECOVERABLE
LOAD DATA
INFILE 'weeklydata1108.csv'
INSERT INTO TABLE t_location_data
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' TRAILING NULLCOLS
(f_start_ip,
f_end_ip,
f_country,
f_count_code,
f_reg,
f_stat,
f_city,
f_p_code,
f_area,
f_lat,
f_long,
f_anon_stat,
f_pro_detect date "YYYY-MM-DD",
f_date "SYSDATE")
And the sqlldr command for running it is:
sqlldr username#\"\(DESCRIPTION=\(ADDRESS=\(HOST=**mydbip***\)\(PROTOCOL=TCP\)\(PORT=1521\)\)\(CONNECT_DATA=\(SID=Real\)\)\)\"/geolocation control='myload.ctl' log='insert.log' bad='insert.bad'
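
One lever often suggested for direct-path loads of this size is deferring index maintenance and rebuilding afterwards. A minimal Oracle SQL sketch, assuming a hypothetical index named ix_location_data on the target table (the index name is an assumption; sqlldr would then be run with skip_unusable_indexes=true):

-- sketch: avoid per-row index maintenance during the direct-path load
ALTER INDEX ix_location_data UNUSABLE;
-- ... run the sqlldr direct-path load here ...
ALTER INDEX ix_location_data REBUILD;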

Snowflake NULL values quoted in CSV break the PostgreSQL load

I am trying to move data from Snowflake to PostgreSQL, and to do so I first unload it to S3 in CSV format. Commas can appear in the table's text fields, so I use Snowflake's FIELD_OPTIONALLY_ENCLOSED_BY unloading option to quote the content of the problematic cells. However, when that is combined with NULL values, I can't manage to produce a CSV that is valid for PostgreSQL.
I created a simple table to illustrate the issue. Here it is:
CREATE OR REPLACE TABLE PUBLIC.TEST(
TEXT_FIELD VARCHAR,
NUMERIC_FIELD INT
);
INSERT INTO PUBLIC.TEST VALUES
('A', 1),
(NULL, 2),
('B', NULL),
(NULL, NULL),
('Hello, world', NULL)
;
COPY INTO @STAGE/test
FROM PUBLIC.TEST
FILE_FORMAT = (
COMPRESSION = NONE,
TYPE = CSV,
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ''
)
OVERWRITE = TRUE;
From that, Snowflake will create the following CSV:
"A",1
"",2
"B",""
"",""
"Hello, world",""
But after that, it is impossible for me to copy this CSV into a PostgreSQL table as it is.
Even though the PostgreSQL documentation says, next to the NULL option:
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format.
Not setting the NULL COPY option in PostgreSQL will result in a failed load. Indeed, it won't work, as we also have to specify the quote character using QUOTE; here it'll be QUOTE '"'.
Therefore, during the PostgreSQL load, using:
FORMAT csv, HEADER false, QUOTE '"' gives:
DataError: invalid input syntax for integer: "" CONTEXT: COPY test, line 3, column numeric_field: ""
FORMAT csv, HEADER false, NULL '""', QUOTE '"' gives:
NotSupportedError: CSV quote character must not appear in the NULL specification
FYI, to test the import from S3 I use this command in PostgreSQL:
CREATE TABLE IF NOT EXISTS PUBLIC.TEST(
TEXT_FIELD VARCHAR,
NUMERIC_FIELD INT
);
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
SELECT aws_s3.table_import_from_s3(
'PUBLIC.TEST',
'',
'(FORMAT csv, HEADER false, NULL ''""'', QUOTE ''"'')',
'bucket',
'test_0_0_0.csv',
'aws_region'
)
Thanks a lot for any ideas on how to make this work. I would love to find a solution that doesn't require modifying the CSV between Snowflake and PostgreSQL. I think it is an issue more on the Snowflake side, as it doesn't really make sense to quote NULL values, but PostgreSQL is not helping either.
When you set the NULL_IF value to '', you are actually telling Snowflake to convert NULLs to a blank string, which then gets quoted. When you are copying out of Snowflake, the copy options are "backwards" in a sense, and NULL_IF acts more like an IFNULL.
This is the code that I'd use on the Snowflake side, which will result in an unquoted empty string in your CSV file:
FILE_FORMAT = (
COMPRESSION = NONE,
TYPE = CSV,
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ()
)
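
With that file format the NULLs come out as bare, unquoted empty fields, which PostgreSQL's CSV mode already treats as NULL by default (as the documentation quoted above says). For completeness, a sketch of the matching import call, reusing the placeholder bucket, file name and region from the question:

SELECT aws_s3.table_import_from_s3(
    'PUBLIC.TEST',
    '',
    '(FORMAT csv, HEADER false, QUOTE ''"'')',
    'bucket',
    'test_0_0_0.csv',
    'aws_region'
);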

MySQL 5.7.26 does not accept \N for a nullable INT column using LOAD DATA INFILE

MySQL does not accept the \N value as NULL, even though I'm sure I've used it in the past for the same purpose.
Steps:
CREATE TABLE t (i INT NULL, name varchar(50));
Create a file called test.txt with tab separated fields:
1 John
\N Jim
In MySQL:
LOAD DATA INFILE '/var/lib/mysql-files/test.txt'
INTO TABLE t
FIELDS TERMINATED BY "\t"
ESCAPED BY ''
LINES TERMINATED BY "\r\n";
Error:
ERROR 1366 (HY000): Incorrect integer value: '\N' for column 'i' at row 2
I should have done this instead (with ESCAPED BY '', the \N in the file is read as two literal characters, \ and N, rather than being interpreted as NULL):
LOAD DATA INFILE '/var/lib/mysql-files/test.txt'
INTO TABLE t
FIELDS TERMINATED BY "\t"
ESCAPED BY '\\'
LINES TERMINATED BY "\r\n";
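
If ESCAPED BY '' had to stay for some reason, another option (a sketch, not part of the original answer) would be to read the column into a user variable and map the literal \N to NULL in a SET clause:

LOAD DATA INFILE '/var/lib/mysql-files/test.txt'
INTO TABLE t
FIELDS TERMINATED BY "\t" ESCAPED BY ''
LINES TERMINATED BY "\r\n"
(@i, name)
SET i = NULLIF(@i, '\\N');  -- '\\N' in the SQL literal is the two characters \ and N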

How to ignore errors, but not skip lines, in the Redshift COPY command?

I have the COPY statement below. It skips lines up to MAXERROR. Is there any way to COPY data over to Redshift, forcing any erroneous values into the column regardless of type? I don't want to lose information.
sql_prelim = """copy table1 from 's3://dwreplicatelanding/file.csv.gz'
access_key_id 'xxxx'
secret_access_key 'xxxx'
DELIMITER '\t' timeformat 'auto'
GZIP IGNOREHEADER 1
trimblanks
CSV
BLANKSASNULL
maxerror as 100000
"""
The error I want to get past is below, but ideally I want to get past all errors and keep the data:
1207- Invalid digit, Value 'N', Pos 0, Type: Decimal
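
One pattern that avoids both skipped rows and lost values (a sketch, not a verified answer; the staging table, the two column names and the DECIMAL target type are assumptions) is to land everything as VARCHAR first and cast afterwards, so a value like 'N' never hits a DECIMAL column during the COPY and the raw text stays available in the staging table:

-- 1. land the raw file into an all-VARCHAR staging table
create table table1_staging (col1 varchar(65535), col2 varchar(65535));

copy table1_staging from 's3://dwreplicatelanding/file.csv.gz'
access_key_id 'xxxx'
secret_access_key 'xxxx'
DELIMITER '\t'
GZIP IGNOREHEADER 1
CSV
BLANKSASNULL;

-- 2. cast into the typed table; non-numeric values become NULL instead of failing the load
insert into table1
select col1,
       case when col2 ~ '^-?[0-9]+([.][0-9]+)?$' then col2::decimal(18,2) else null end
from table1_staging;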

Psycopg2 copy_from throws DataError: invalid input syntax for integer

I have a table with some integer columns. I am using psycopg2's copy_from:
import psycopg2

conn = psycopg2.connect(database=the_database,
                        user="postgres",
                        password=PASSWORD,
                        host="",
                        port="")
print("Putting data in the table: Opened database successfully")
cur = conn.cursor()
with open(the_file, 'r') as f:
    cur.copy_from(file=f, table=the_table, sep=the_delimiter)
conn.commit()
print("Successfully copied all data to the database!")
conn.close()
The error says that it expects the 8th column to be an integer and not a string. But Python's write method can only write strings to a file. So how would you import a file full of string representations of numbers into a Postgres table whose columns expect integers, when your file can only contain character representations of the integers (e.g. str(your_number))?
You either have to write numbers in integer format to the file (which Python's write method disallows), or psycopg2 should be smart enough to do the conversion as part of the copy_from procedure, which it apparently is not. Any idea is appreciated.
I ended up using the copy_expert command. Note that on Windows, you have to set the permissions on the file. This post is very useful for setting permissions.
with open(the_file, 'r') as f:
    sql_copy_statement = "copy {table} FROM '{from_file}' DELIMITER '{deli}' {file_type} HEADER;".format(
        table=the_table,
        from_file=the_file,
        deli=the_delimiter,
        file_type=the_file_type)
    print(sql_copy_statement)
    cur.copy_expert(sql_copy_statement, f)
    conn.commit()
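
For concreteness, with hypothetical values (the_table = 'the_table', the_file = '/tmp/data.csv', the_delimiter = ',', the_file_type = 'CSV'; all placeholders), the statement printed above comes out as:

copy the_table FROM '/tmp/data.csv' DELIMITER ',' CSV HEADER;

Because this form names a file path rather than STDIN, it is a server-side COPY: the PostgreSQL server process itself reads the file, which is why the file permissions mentioned above mattered.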