Load NULL TIMESTAMP with TIME ZONE using COPY FROM in PostgreSQL - postgresql

I have a CSV file that I'm trying to load into a PostgreSQL 9.2.4 database using the COPY FROM command. In particular, there is a timestamp field that is allowed to be null; however, when I load "null values" (actually just "") I get the following error:
ERROR: invalid input syntax for type timestamp with time zone: ""
An example CSV file looks as follows:
id,name,joined
1,"bob","2013-10-02 15:27:44-05"
2,"jane",""
The SQL looks as follows:
CREATE TABLE "users"
(
"id" BIGSERIAL NOT NULL PRIMARY KEY,
"name" VARCHAR(255),
"joined" TIMESTAMP WITH TIME ZONE,
);
COPY "users" ("id", "name", "joined")
FROM '/path/to/data.csv'
WITH (
ENCODING 'utf-8',
HEADER 1,
FORMAT 'csv'
);
According to the documentation, null values should be represented by an empty string that is not enclosed in the quote character, which is the double quote (") in this case:
NULL
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings. This option is not allowed when using binary format.
Note: When using COPY FROM, any data item that matches this string will be stored as a null value, so you should make sure that you use the same string as you used with COPY TO.
I've tried the option NULL '' but that seems to have no effect. Advice, please!

An empty string without quotes works normally:
id,name,joined
1,"bob","2013-10-02 15:27:44-05"
2,"jane",
select * from users;
id | name | joined
----+------+------------------------
1 | bob | 2013-10-03 03:27:44+07
2 | jane |
Maybe it would be simpler to replace "" with an empty (unquoted) string using sed first.
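For example (a rough sketch; it assumes, as in the sample, that the empty value only ever appears in the last column):
sed 's/,""$/,/' data.csv > data_fixed.csv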

The FORCE_NULL option for COPY FROM in Postgres 9.4+ would be the most elegant way to solve your problem. Per documentation:
FORCE_NULL
Match the specified columns' values against the null string, even if it has been quoted, and if a match is found set the value to NULL. In the default case where the null string is empty, this converts a quoted empty string into NULL. This option is allowed only in COPY FROM, and only when using CSV format.
Of course, it converts all matching values in all columns.
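For the table in the question, that would look something like this (Postgres 9.4+):
COPY "users" ("id", "name", "joined")
FROM '/path/to/data.csv'
WITH (
    ENCODING 'utf-8',
    HEADER true,
    FORMAT 'csv',
    FORCE_NULL ("joined")
);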
In older versions, you can COPY to a temporary table with the same table layout - except data type text for the problem column. Then fix offending values and INSERT from there:
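A minimal sketch of that route (the temp table name is illustrative):
CREATE TEMP TABLE users_tmp (
    id     bigint,
    name   varchar(255),
    joined text          -- text instead of timestamptz
);

COPY users_tmp FROM '/path/to/data.csv'
WITH (FORMAT 'csv', HEADER true);

INSERT INTO "users" ("id", "name", "joined")
SELECT id, name, NULLIF(joined, '')::timestamptz
FROM   users_tmp;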
Related: single quotes appear around value after running COPY in Postgres 9.2

Could not get it to work. Ended up using this program:
http://neilb.bitbucket.org/csvfix/
With that you can replace empty fields with other values.
So, for example, in your case column 3 needs to have a timestamp value, so I give it a fake one, '1900-01-01 00:00:00' in this case. If needed, you can delete or filter those rows out once the data is imported.
$CSVFIXHOME/csvfix map -f 3 -fv '' -tv '1900-01-01 00:00:00' -rsep ',' $YOURFILE > $FILEWITHDATES
After that you can import the newly created file.
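For example (a sketch reusing the question's table; the fixed filename is illustrative, and the UPDATE assumes no legitimate rows carry the placeholder date):
COPY "users" ("id", "name", "joined")
FROM '/path/to/data_with_dates.csv'
WITH (FORMAT 'csv', HEADER true);

UPDATE "users"
SET    "joined" = NULL
WHERE  "joined" = '1900-01-01 00:00:00';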

Related

Postgres \copy a file with double quotes

This is what my data looks like -
"to_claim_id" "NEW_PATIENT" "from_rend" "from_bill" "to_rend" "to_bill" "from_date" "to_date" "days_diff"
"10193136348200818391" "102657" "103325" "174597" "1830139" "17497" 20180904 20181002 28
How do I import this data into my database using \copy?
I have tried \copy public.data from '/data/test' with delimiter E'\t' csv header quote '"', but I get the error ERROR: value too long for type character varying(25).
That means at least one column in the target table public.data is type varchar(25) and a corresponding value in the CSV file has more characters.
You might change the data type of such columns (temporarily) to just varchar or text, import, and then identify and trim offending values - or just live happily ever after as you probably don't need that restriction to begin with.
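A sketch of that route (the column name long_col is an assumption):
-- relax the limit (or just leave the column as text)
ALTER TABLE public.data ALTER COLUMN long_col TYPE text;

-- import with \copy as before, then trim offending values
UPDATE public.data
SET    long_col = left(long_col, 25)
WHERE  char_length(long_col) > 25;

-- optionally restore the original restriction
ALTER TABLE public.data ALTER COLUMN long_col TYPE varchar(25);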
Related:
Any downsides of using data type "text" for storing strings?

How to store string spaces as null in numeric column

I want to load records from my local txt file into a PostgreSQL table.
I have created the following table.
create table player_info
(
Name varchar(20),
City varchar(30),
State varchar(30),
DateOfTour date,
pay numeric(5),
flag char
)
And my local txt file contains the following data.
John|Mumbai| |20170203|55555|Y
David|Mumbai| |20170305| |N
Romcy|Mumbai| |20170405|55555|N
Gotry|Mumbai| |20170708| |Y
I am just executing this,
copy player_info (Name,
City,
State,
DateOfTour,
pay_id,
flag)
from local 'D:\sample_player_info.txt'
delimiter '|' null as ''
exceptions 'D:\Logs\player_info'
What I want is: for my numeric column, if the value is 3 spaces,
then insert NULL for pay; otherwise insert the 5-digit number as-is.
pay is a column in my table whose datatype is numeric.
Is this correct or possible to do?
You cannot store strings in a numeric column at all. Three spaces form a string, so they cannot be stored in the column pay, as that is defined as numeric.
A common approach to this conundrum is to create a staging table which uses less precise data types in its column definitions. Import the source data into the staging table, then process that data so that it can be reliably added to the final table, e.g. in the staging table set a column called pay_str to NULL where pay_str = '   ' (or perhaps LIKE '   %'), as sketched below.
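A minimal sketch of that approach for the table in the question (the staging table name is illustrative):
create table player_info_staging
(
    Name varchar(20),
    City varchar(30),
    State varchar(30),
    DateOfTour varchar(10),  -- text-like, so the raw value always loads
    pay varchar(10),         -- ditto: '   ' is fine here
    flag char
);

copy player_info_staging from 'D:\sample_player_info.txt'
with (delimiter '|');

insert into player_info (Name, City, State, DateOfTour, pay, flag)
select Name, City, State,
       to_date(DateOfTour, 'YYYYMMDD'),
       nullif(trim(pay), '')::numeric,  -- spaces collapse to NULL
       flag
from player_info_staging;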

How does Redshift treat guillemets?

I am trying to run a CSV import using the COPY command for some data that includes a guillemet (»). Redshift complains that the column value is too long for the varchar column I have defined. The error in the "Loads" tab in the Redshift GUI displays this character as two dots (..); had it been treated as one character, it would have fit in the varchar column. It's not clear whether there is some sort of conversion error occurring or if there is a display issue.
When trying to do plain INSERTs I run into strange behavior as well:
dev=# create table test (name varchar(3));
CREATE TABLE
dev=# insert into test values ('bla');
INSERT 0 1
3 characters treated as 4?
dev=# insert into test values ('bl»');
ERROR: value too long for type character varying(3)
dev=# insert into test values ('b»');
INSERT 0 1
Why does char_length return 2?
dev=# select char_length(name), name from test;
char_length | name
-------------+------
2 | b»
I've checked the client encoding and database encodings and those all seem to be UTF8/UNICODE.
You need to increase the length of your varchar field. Multibyte characters use more than one byte, and the length in the definition of a varchar field is byte-based. So your special character is taking more than one byte. If it still doesn't work, refer to the Redshift documentation page below:
http://docs.aws.amazon.com/redshift/latest/dg/multi-byte-character-load-errors.html
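In other words, varchar(n) in Redshift counts bytes, not characters. A quick sketch against the transcript above (the new table name is just for illustration):
-- » is U+00BB, which UTF-8 encodes as two bytes (0xC2 0xBB),
-- so 'bl»' needs 4 bytes even though it is only 3 characters
create table test2 (name varchar(4));
insert into test2 values ('bl»');   -- fits now

-- char_length counts characters (3), octet_length counts bytes (4)
select char_length(name), octet_length(name) from test2;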

Insert null values to postgresql timestamp data type using python

I am trying to insert a null value into a Postgres timestamp column using Python psycopg2.
The problem is that the other data types, such as char or int, take None, whereas the timestamp variable does not recognize None.
I tried to insert Null and null as strings, because I am using a dictionary to append the values for the insert statement.
Below is the code.
queryDictOrdered[column] = queryDictOrdered[column] if isNull(queryDictOrdered[column]) is False else NULL
and the function is
def isNull(key):
    if str(key).lower() in ('null', 'n.a', 'none'):
        return True
    else:
        return False
I get the below error messages:
DataError: invalid input syntax for type timestamp: "NULL"
DataError: invalid input syntax for type timestamp: "None"
Empty timestamps in pandas DataFrames come through as NaT (not a time), which is not compatible with Postgres NULL. A quick workaround is to send it as a varchar and then run these 2 queries:
update <<schema.table_name>> set <<column_name>> = NULL
where <<column_name>> = 'NULL';

or (depending on what you hard coded empty values as):

update <<schema.table_name>> set <<column_name>> = NULL
where <<column_name>> = 'NaT';

Finally run:

alter table <<schema.table_name>>
alter column <<column_name>> TYPE timestamp USING <<column_name>>::timestamp without time zone;
Surely you are adding quotes around the placeholder. Read the psycopg documentation about passing parameters to queries.
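For illustration, a minimal sketch (the table and column names are hypothetical) of passing None so that psycopg2 adapts it to SQL NULL; note that the %s placeholders are not quoted:
import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

joined = None  # Python None is adapted to SQL NULL by psycopg2

# correct: unquoted placeholders, values passed as a separate tuple
cur.execute(
    "INSERT INTO users (name, joined) VALUES (%s, %s)",
    ("jane", joined),
)

# wrong: VALUES ('%s', '%s') would send the literal string 'None',
# which fails for a timestamp column

conn.commit()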
Dropping this here in case it's helpful for anyone.
Using psycopg2 and the cursor object's copy_from method, you can copy missing or NaT datetime values from a pandas DataFrame to a Postgres timestamp field.
The copy_from method has a null parameter that is a "textual representation of NULL in the file. The default is the two characters string \N". See this link for more information.
Using pandas' fillna method, you can replace any missing datetime values with \N via data["my_datetime_field"].fillna("\\N"). Notice the double backslash here, where the first backslash is necessary to escape the second backslash.
Using the select_columns method from the pyjanitor module (or .loc[] and some subsetting with the column names of your DataFrame), you can coerce multiple columns at once via something akin to this, where all of your datetime fields end with an _at suffix.
# replace missing values in every *_at column with the literal \N
data_datetime_fields = (
    data
    .select_columns("*_at")
    .apply(lambda x: x.fillna("\\N"))
)
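Putting it together, a self-contained sketch (the table name my_table, the column names, and the connection string are assumptions):
import io

import pandas as pd
import psycopg2

# toy frame with one missing datetime (NaT)
data = pd.DataFrame({
    "name": ["bob", "jane"],
    "created_at": [pd.Timestamp("2013-10-02 15:27:44"), pd.NaT],
})

# cast to object first so the string fill is accepted,
# then replace NaT with the literal \N
data["created_at"] = data["created_at"].astype(object).fillna("\\N")

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

buf = io.StringIO()
data.to_csv(buf, sep="\t", header=False, index=False)
buf.seek(0)

# null='\\N' is copy_from's default, spelled out here for clarity
cur.copy_from(buf, "my_table", sep="\t", null="\\N",
              columns=list(data.columns))
conn.commit()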

PostgreSQL insert query

I am trying to insert a single row into a log table, but it throws an error message.
The log table structure is like this:
no integer NOT NULL nextval('log_no_seq'::regclass)
ip character varying(50)
country character varying(10)
region character varying(10)
city character varying(50)
postalCode character varying(10)
taken numeric
date date
and my query:
INSERT INTO log (ip,country,region,city,postalCode,taken,date) VALUES
("24.24.24.24","US","NY","Binghamton","11111",1,"2011-11-09")
=> ERROR: column "postalcode" of relation "log" does not exist
second try query : (without postalcode)
INSERT INTO log (ip,country,region,city,taken,date) VALUES
("24.24.24.24","US","NY","11111",1,"2011-11-09")
=> ERROR: column "24.24.24.24" does not exist
I don't know what I did wrong...
And does PostgreSQL not have a datetime type (e.g. 2011-11-09 11:00:10)?
Try single quotes (e.g. '2011-11-09')
PostgreSQL has a "datetime" type: timestamp. Read the manual here.
The double-qutes "" are used for identifiers if you want them as is. It's best you never have to use them as #wildplasser advised.
String literals are enclosed in single quotes ''.
Start by reading the chapter Lexical Structure. It is very informative. :)
Try it rewrite in this way:
INSERT INTO log (ip,country,region,city,"postalCode",taken,date) VALUES
('24.24.24.24','US','NY','Binghamton','11111',1,'2011-11-09');
When you use mixed case in the name of a column, or reserved words (such as "column", "row", etc.), you have to use double quotes; for values, you have to use single quotes, as you can see in the example.