Redshift: Column defined as NOT NULL allows null data to load - amazon-redshift

I expected that a column defined as NOT NULL would prevent null values from being loaded.
But this time we got several null values in a column defined as NOT NULL, and they loaded into the table without any STL_LOAD_ERRORS entry, just like the other tables.
That column is the last column of the record.
I'm wondering whether something is wrong in the options of my COPY command, which are as follows:
ESCAPE
GZIP
DATEFORMAT 'auto'
TIMEFORMAT 'auto'
DELIMITER ','
ACCEPTINVCHARS '?'
COMPUPDATE TRUE
STATUPDATE TRUE
MAXERROR 0
TRUNCATECOLUMNS
NULL AS '\000'
EXPLICIT_IDS
Any advice would be appreciated. Thank you.
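For reference, one way to confirm what actually landed in the table is to count the nulls directly. A minimal psycopg2 sketch, where the connection parameters, my_table, and last_col are placeholders for the real cluster, table, and column:

import psycopg2

# Connection parameters are placeholders; substitute your cluster's values.
conn = psycopg2.connect(
    host="my-cluster.example.redshift.amazonaws.com",
    port=5439, dbname="mydb", user="myuser", password="secret",
)
with conn.cursor() as cur:
    # Count rows where the supposedly NOT NULL column is actually null.
    cur.execute("SELECT COUNT(*) FROM my_table WHERE last_col IS NULL;")
    print("null rows:", cur.fetchone()[0])
conn.close()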

Related

Assigning null value to a numeric column while importing csv

I am using the following command to import a CSV file into my PostgreSQL table:
\copy myTable from 'myTable_sample_nh.csv' with delimiter E'\t' null as '\x00';
ERROR: invalid input syntax for type numeric: ""
CONTEXT: COPY myTable, line 1, column salary: ""
How can I bypass this error? I mean, during an INSERT, if I do not specify a column, the DB does not assign a value to it. In this case the CSV file has no value for a numeric column field, so the intention is to assign no value (NULL) to that column.
'\x00' does not mean a zero byte; it means a backslash followed by x followed by 00.
Considering that, it actually works:
create table test(n numeric);
\copy test from stdin with delimiter E'\t' null as '\x00'
>> \x00
>> \.
test=> select n is null from test;
?column?
----------
t
But presumably what you meant is a null byte to represent a SQL NULL. Syntactically, it can be written as E'\x00' but in practice it's unusable: for implementation reasons, null bytes in strings are not supported by Postgres, in pretty much any context. In the context of the null specification of \copy it would be rejected even before submitting input data:
\copy test from stdin with delimiter E'\t' null as E'\x00'
ERROR: invalid byte sequence for encoding "UTF8": 0x00
The solution is to use something else that doesn't appear in the data, or the empty string: null as '' is accepted by \copy.
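The same load works through psycopg2, which drives COPY ... FROM STDIN much like \copy does. A sketch, assuming the file and table from the question:

import psycopg2

conn = psycopg2.connect("dbname=test")
with conn, conn.cursor() as cur, open("myTable_sample_nh.csv") as f:
    # Empty fields in the tab-delimited file are loaded as SQL NULL.
    cur.copy_expert(
        "COPY myTable FROM STDIN WITH (FORMAT text, DELIMITER E'\\t', NULL '')",
        f,
    )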

Postgres \copy a file with double quotes

This is what my data looks like -
"to_claim_id" "NEW_PATIENT" "from_rend" "from_bill" "to_rend" "to_bill" "from_date" "to_date" "days_diff"
"10193136348200818391" "102657" "103325" "174597" "1830139" "17497" 20180904 20181002 28
How do I import this data into my database using \copy?
I have tried \copy public.data from '/data/test' with delimiter E'\t' csv header quote '"' but I get the error ERROR: value too long for type character varying(25).
That means at least one column in the target table public.data is type varchar(25) and a corresponding value in the CSV file has more characters.
You might change the data type of such columns (temporarily) to just varchar or text, import, and then identify and trim offending values - or just live happily ever after as you probably don't need that restriction to begin with.
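A hedged sketch of that workflow with psycopg2, where the column name claim_id is an assumption; point it at whichever column overflows:

import psycopg2

conn = psycopg2.connect("dbname=test")
with conn, conn.cursor() as cur, open("/data/test") as f:
    # Temporarily lift the length restriction on the suspect column.
    cur.execute("ALTER TABLE public.data ALTER COLUMN claim_id TYPE text;")
    cur.copy_expert(
        "COPY public.data FROM STDIN WITH "
        "(FORMAT csv, HEADER, DELIMITER E'\\t', QUOTE '\"')",
        f,
    )
    # Identify the values that would not have fit in varchar(25).
    cur.execute("SELECT claim_id FROM public.data WHERE length(claim_id) > 25;")
    print(cur.fetchall())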
Related:
Any downsides of using data type "text" for storing strings?

Discarding rows containing empty string in CSV from uploading through SQL Loader control file

I am trying to upload a CSV which may or may not contain an empty value for a column in a row.
I want SQL*Loader to discard the rows that contain an empty value instead of loading them into the DB.
How can this be handled in the control file?
I have tried the conditions below in the ctl file:
when String_Value is not null
when String_Value <> ''
but the rows are still getting inserted
This worked for me using either '<>' or '!='. I suspect the order of the clauses was incorrect for you. Note colc (also the third column in the data file) matches the column name in the table.
load data
infile 'c:\temp\x_test.dat'
TRUNCATE
into table x_test
when colc <> ''
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
cola char,
colb char,
colc char,
cold integer external
)

Insert null values to postgresql timestamp data type using python

I am trying to insert a null value into a Postgres timestamp column using Python and psycopg2.
The problem is that the other data types, such as char or int, take None, whereas the timestamp column does not recognize None.
I tried inserting NULL and null as strings, because I am using a dictionary to build the values for the insert statement.
Below is the code.
queryDictOrdered[column] = queryDictOrdered[column] if isNull(queryDictOrdered[column]) is False else NULL

and the function is:

def isNull(key):
    if str(key).lower() in ('null', 'n.a', 'none'):
        return True
    else:
        return False
I get the below error messages:
DataError: invalid input syntax for type timestamp: "NULL"
DataError: invalid input syntax for type timestamp: "None"
Empty timestamps in pandas DataFrames come through as NaT (not a time), which is not compatible with Postgres NULL. A quick workaround is to send the column as varchar and then run these two queries:
update <<schema.table_name>> set <<column_name>> = NULL where <<column_name>> = 'NULL';
or (depending on what you hard-coded the empty values as):
update <<schema.table_name>> set <<column_name>> = NULL where <<column_name>> = 'NaT';
Finally run:
alter table <<schema.table_name>>
alter COLUMN <<column_name>> TYPE timestamp USING <<column_name>>::timestamp without time zone;
Surely you are adding quotes around the placeholder. Read psycopg documentation about passing parameters to queries.
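In other words, pass None itself and let the driver render it as NULL; don't interpolate the string 'None' or 'NULL' into the SQL. A minimal sketch, with a made-up table and columns:

import psycopg2

conn = psycopg2.connect("dbname=test")
with conn, conn.cursor() as cur:
    created_at = None  # Python None is rendered as SQL NULL by the driver.
    # Correct: a bare %s placeholder; psycopg2 handles quoting and escaping.
    # Wrong: '%s' in quotes, which inserts the literal string 'None' and
    # fails to parse as a timestamp.
    cur.execute(
        "INSERT INTO events (name, created_at) VALUES (%s, %s)",
        ("signup", created_at),
    )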
Dropping this here in case it's helpful for anyone.
Using psycopg2 and the cursor object's copy_from method, you can copy missing or NaT datetime values from a pandas DataFrame to a Postgres timestamp field.
The copy_from method has a null parameter that is a "textual representation of NULL in the file. The default is the two characters string \N". See the psycopg2 documentation for copy_from for more information.
Using pandas' fillna method, you can replace any missing datetime values with \N via data["my_datetime_field"].fillna("\\N"). Notice the double backslash here, where the first backslash is necessary to escape the second backslash.
Using the select_columns method from the pyjanitor module (or .loc[] and some subsetting with the column names of your DataFrame), you can coerce multiple columns at once via something akin to this, where all of your datetime fields end with an _at suffix.
data_datetime_fields = (
    data
    .select_columns("*_at")
    .apply(lambda x: x.fillna("\\N"))
)
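Putting those pieces together, a sketch of the full round trip; the DataFrame contents and the users table are assumptions:

import io
import pandas as pd
import psycopg2

data = pd.DataFrame({
    "name": ["bob", "jane"],
    "joined_at": [pd.Timestamp("2013-10-02 15:27:44"), pd.NaT],
})
# Cast to object first so newer pandas versions accept a string fill value,
# then replace NaT with the literal \N that copy_from reads as NULL.
data["joined_at"] = data["joined_at"].astype(object).fillna("\\N")

# Serialize tab-separated, no header or index, matching copy_from's defaults.
buf = io.StringIO()
data.to_csv(buf, sep="\t", header=False, index=False)
buf.seek(0)

conn = psycopg2.connect("dbname=test")
with conn, conn.cursor() as cur:
    cur.copy_from(buf, "users", columns=("name", "joined_at"))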

Load NULL TIMESTAMP with TIME ZONE using COPY FROM in PostgreSQL

I have a CSV file that I'm trying to load into a PostgreSQL 9.2.4 database using the COPY FROM command. In particular there is a timestamp field that is allowed to be null, however when I load "null values" (actually just "") I get the following error:
ERROR: invalid input syntax for type timestamp with time zone: ""
An example CSV file looks as follows:
id,name,joined
1,"bob","2013-10-02 15:27:44-05"
2,"jane",""
The SQL looks as follows:
CREATE TABLE "users"
(
    "id" BIGSERIAL NOT NULL PRIMARY KEY,
    "name" VARCHAR(255),
    "joined" TIMESTAMP WITH TIME ZONE
);
COPY "users" ("id", "name", "joined")
FROM '/path/to/data.csv'
WITH (
ENCODING 'utf-8',
HEADER 1,
FORMAT 'csv'
);
According to the documentation, null values should be represented by an empty string that cannot contain the quote character, which is double quote (") in this case:
NULL
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings. This option is not allowed when using binary format.
Note: When using COPY FROM, any data item that matches this string will be stored as a null value, so you should make sure that you use the same string as you used with COPY TO.
I've tried the option NULL '' but that seems to have no effect. Advice, please!
An empty string without quotes works normally:
id,name,joined
1,"bob","2013-10-02 15:27:44-05"
2,"jane",
select * from users;
id | name | joined
----+------+------------------------
1 | bob | 2013-10-03 03:27:44+07
2 | jane |
Maybe it would be simpler to replace the quoted "" with an unquoted empty string using sed.
The FORCE_NULL option for COPY FROM in Postgres 9.4+ would be the most elegant way to solve your problem. Per documentation:
FORCE_NULL
Match the specified columns' values against the null string, even if
it has been quoted, and if a match is found set the value to NULL. In
the default case where the null string is empty, this converts a
quoted empty string into NULL. This option is allowed only in COPY
FROM, and only when using CSV format.
Of course, it converts all matching values in the specified columns.
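For the users table from the question, that would look something like the following, sketched here through psycopg2's copy_expert (FORCE_NULL needs Postgres 9.4+):

import psycopg2

conn = psycopg2.connect("dbname=test")
with conn, conn.cursor() as cur, open("/path/to/data.csv") as f:
    # The quoted empty string "" in the joined column becomes NULL.
    cur.copy_expert(
        "COPY users (id, name, joined) FROM STDIN WITH "
        "(FORMAT csv, HEADER, FORCE_NULL (joined))",
        f,
    )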
In older versions, you can COPY to a temporary table with the same table layout - except data type text for the problem column - then fix offending values and INSERT from there; a sketch follows the related link below.
Related: single quotes appear around value after running copy in postgres 9.2
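A sketch of that staging approach, again via psycopg2; NULLIF turns the quoted empty string into NULL on its way into the real table:

import psycopg2

conn = psycopg2.connect("dbname=test")
with conn, conn.cursor() as cur, open("/path/to/data.csv") as f:
    # Stage with text for the problem column so "" loads without error.
    cur.execute(
        "CREATE TEMP TABLE users_stage "
        "(id bigint, name varchar(255), joined text);"
    )
    cur.copy_expert("COPY users_stage FROM STDIN WITH (FORMAT csv, HEADER)", f)
    # NULLIF converts '' to NULL before the cast to timestamptz.
    cur.execute(
        "INSERT INTO users (id, name, joined) "
        "SELECT id, name, NULLIF(joined, '')::timestamptz FROM users_stage;"
    )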
Could not get it to work. Ended up using this program:
http://neilb.bitbucket.org/csvfix/
With that you can replace empty fields with other values.
So, for example, in your case column 3 needs to have a timestamp value, so I give it a fake one, in this case '1900-01-01 00:00:00'. If needed, you can delete or filter those rows out once the data is imported.
$CSVFIXHOME/csvfix map -f 3 -fv '' -tv '1900-01-01 00:00:00' -rsep ',' $YOURFILE > $FILEWITHDATES
After that you can import the newly created file.
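If installing csvfix is not an option, the same substitution is a few lines of plain Python; the column index and fake timestamp mirror the csvfix command above:

import csv

# Replace empty values in column 3 (index 2) with a placeholder timestamp.
with open("data.csv", newline="") as src, \
     open("data_with_dates.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        if row[2] == "":
            row[2] = "1900-01-01 00:00:00"
        writer.writerow(row)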