How to store string spaces as null in numeric column - postgresql

I want to load records from a local txt file into a PostgreSQL table.
I have created the following table:
create table player_info
(
Name varchar(20),
City varchar(30),
State varchar(30),
DateOfTour date,
pay numeric(5),
flag char
)
My local txt file contains the following data:
John|Mumbai| |20170203|55555|Y
David|Mumbai| |20170305| |N
Romcy|Mumbai| |20170405|55555|N
Gotry|Mumbai| |20170708| |Y
I am executing this:
copy player_info (Name,
City,
State,
DateOfTour,
pay,
flag)
from local 'D:\sample_player_info.txt'
delimiter '|' null as ''
exceptions 'D:\Logs\player_info'
What I want is: for my numeric column, if the field contains 3 spaces, then insert NULL into pay; otherwise insert whatever 5-digit number is there.
pay is a column in my table whose datatype is numeric.
Is it correct or even possible to do this?

You cannot store strings in a numeric column, at all. 3 spaces is a string, so it cannot be stored in the column pay as that is defined as numeric.
A common approach to this conundrum is to create a staging table which uses less precise data types in its column definitions. Import the source data into the staging table, then process that data so that it can be reliably added to the final table: e.g. in the staging table, set a column called pay_str to NULL where pay_str = '   ' (three spaces), or perhaps where it matches LIKE ' %'.
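A minimal sketch of that approach for the table above (a sketch, not the original author's code; it assumes the file is loaded with psql's \copy rather than the COPY variant shown in the question):

create table player_info_stage
(
name       varchar(20),
city       varchar(30),
state      varchar(30),
dateoftour varchar(8),
pay_str    varchar(10),  -- loose type so '   ' loads without error
flag       char
);

\copy player_info_stage from 'D:\sample_player_info.txt' with (delimiter '|')

-- all-space pay_str becomes NULL, everything else is cast to numeric
insert into player_info (name, city, state, dateoftour, pay, flag)
select name, city, state, to_date(dateoftour, 'YYYYMMDD'),
       nullif(btrim(pay_str), '')::numeric, flag
from player_info_stage;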

Related

value too long for type character varying(512) -- why can't I import the data?

The maximum size of limited character types (e.g. varchar(n)) in Postgres is 10485760.
description on max length of postgresql's varchar
Please download the file for testing and extract it into /tmp/2019q4; we only use pre.txt to import data.
sample data
Enter your psql and create a database:
postgres=# create database edgar;
postgres=# \c edgar;
Create the table according to the webpage:
fields in pre table definitions
edgar=# create table pre(
id serial ,
adsh varchar(20),
report numeric(6,0),
line numeric(6,0),
stmt varchar(2),
inpth boolean,
rfile char(1),
tag varchar(256),
version varchar(20),
plabel varchar(512),
negating boolean
);
CREATE TABLE
Try to import data:
edgar=# \copy pre(adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating) from '/tmp/2019q4/pre.txt' with delimiter E'\t' csv header;
We analyse the error info:
ERROR: value too long for type character varying(512)
CONTEXT: COPY pre, line 1005798, column plabel: "LIABILITIES AND STOCKHOLDERS EQUITY 0
0001493152-19-017173 2 11 BS 0 H LiabilitiesAndStockholdersEqu..."
Time: 1481.566 ms (00:01.482)
1. The size I set for the field is just 512, far less than 10485760.
2. The content on line 1005798 is not the same as in the error info:
0001654954-19-012748 6 20 EQ 0 H ReclassificationAdjustmentRelatingToAvailableforsaleSecuritiesNetOfTaxEffect 0001654954-19-012748 Reclassification adjustment relating to available-for-sale securities, net of tax effect" 0
Now I drop the previous table, change the plabel field to text, and re-create it:
edgar=# drop table pre;
DROP TABLE
Time: 22.763 ms
edgar=# create table pre(
id serial ,
adsh varchar(20),
report numeric(6,0),
line numeric(6,0),
stmt varchar(2),
inpth boolean,
rfile char(1),
tag varchar(256),
version varchar(20),
plabel text,
negating boolean
);
CREATE TABLE
Time: 81.895 ms
Import the same data with the same copy command:
edgar=# \copy pre(adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating) from '/tmp/2019q4/pre.txt' with delimiter E'\t' csv header;
COPY 275079
Time: 2964.898 ms (00:02.965)
edgar=#
No error info in the psql console. Let me check the raw data '/tmp/2019q4/pre.txt', which contains 1043000 lines.
wc -l /tmp/2019q4/pre.txt
1043000 /tmp/2019q4/pre.txt
There are 1043000 lines, so how many lines were imported?
edgar=# select count(*) from pre;
count
--------
275079
(1 row)
Why was so little data imported, with no error info?
The sample data you provided is obviously not the data you are really loading. It does still show the same error, but of course the line numbers and markers are different.
That file occasionally has double quote marks where there should be single quote marks (apostrophes). Because you are using CSV mode, these stray double quotes will start multi-line strings, which span all the way until the next stray double quote mark. That is why you have fewer rows of data than lines of input, because some of the data values are giant multiline strings.
Since your data clearly isn't CSV, you probably shouldn't be using \copy in CSV format. It loads fine in text format as long as you specify "header", although that option didn't become available in text format until v15. For versions before that, you could manually remove the header line, or use PROGRAM to skip the header, like FROM PROGRAM 'tail +2 /tmp/pre.txt'. Alternatively, you could keep using CSV format but choose a different quote character, one that never shows up in your data, such as with (delimiter E'\t', format csv, header, quote E'\b').
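For concreteness, the two workarounds might look like this with the same file and column list (a sketch; E'\b' is the backspace character, chosen only because it never appears in the data):

-- text format with header (PostgreSQL 15+):
\copy pre(adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating) from '/tmp/2019q4/pre.txt' with (format text, delimiter E'\t', header)

-- CSV format with an unused quote character (also works on older versions):
\copy pre(adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating) from '/tmp/2019q4/pre.txt' with (delimiter E'\t', format csv, header, quote E'\b')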

Discarding rows containing empty string in CSV from uploading through SQL Loader control file

I am trying to upload a CSV which may or may not contain an empty value for a column in a row.
I want to discard the rows that contain an empty value from being uploaded to the DB through SQL Loader.
How can this be handled in the ctl file?
I have tried the conditions below in the ctl file:
when String_Value is not null
when String_Value <> ''
but the rows are still getting inserted.
This worked for me using either '<>' or '!='. I suspect the order of the clauses was incorrect for you. Note colc (also the third column in the data file) matches the column name in the table.
load data
infile 'c:\temp\x_test.dat'
TRUNCATE
into table x_test
when colc <> ''
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
cola char,
colb char,
colc char,
cold integer external
)
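For instance, with a hypothetical x_test.dat like the following, the second record has an empty colc (third field) and is rejected by the WHEN clause, so only the first and third rows load:

a1,b1,c1,10
a2,b2,,20
a3,b3,c3,30

Records failing the WHEN clause are written to the discard file, if one is configured.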

Creating a non-numeric Sequence in Postgres

I've come across a requirement to create a sequence in Postgres for generating a code (a string) that is unique, increments by one for each new row, and follows a six-character pattern.
For instance,
AC0001
AC0040
AC0201
AC3421
where the first two characters are letters and the remaining four are digits.
I have created a sequence first:
CREATE SEQUENCE code_sequence START WITH 1
INCREMENT BY 1
CACHE 1;
Then, I created a table:
CREATE TABLE account
(
code VARCHAR NOT NULL DEFAULT 'AC'||nextval('code_sequence'::regclass)::VARCHAR,
"desc" VARCHAR
);
This generates codes like AC1, AC2, etc. But I want codes like AC0001, AC0002, i.e. with zeros padded in just after the 'AC'.
I would appreciate it if anyone could suggest a solution or idea for this problem.
Use to_char() to format the number:
CREATE TABLE account
(
code VARCHAR NOT NULL DEFAULT 'AC'||to_char(nextval('code_sequence'), 'FM0000'),
"desc" VARCHAR
);
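Assuming a fresh sequence starting at 1, a quick check:

insert into account ("desc") values ('first'), ('second');
select code, "desc" from account;
-- AC0001 | first
-- AC0002 | second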
Try the LPAD function.
CREATE TABLE account
(
code VARCHAR NOT NULL DEFAULT 'AC' || LPAD(nextval('code_sequence')::text, 4, '0'),
"desc" VARCHAR
);
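A caveat for both defaults: the four-digit pattern assumes the sequence stays below 10000. Beyond that, to_char overflows to '#' fill characters and lpad truncates to the target length:

select to_char(10000, 'FM0000');   -- '#' fill, not the number
select lpad(10000::text, 4, '0');  -- '1000', silently truncated

Widen the pattern (e.g. 'FM000000', or a length of 6 in lpad) if more rows are expected.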

How does Redshift treat guillemets?

I am trying to run a CSV import using the COPY command for some data that includes a guillemet (»). Redshift complains that the column value is too long for the varchar column I have defined. The error in the "Loads" tab in the Redshift GUI displays this character as two dots: .. - had it been treated as one, it would have fit in the varchar column. It's not clear whether there is some sort of conversion error occurring or if there is a display issue.
When trying to do plain INSERTs I run into strange behavior as well:
dev=# create table test (name varchar(3));
CREATE TABLE
dev=# insert into test values ('bla');
INSERT 0 1
3 characters treated as 4?
dev=# insert into test values ('bl»');
ERROR: value too long for type character varying(3)
dev=# insert into test values ('b»');
INSERT 0 1
Why does char_length return 2?
dev=# select char_length(name), name from test;
char_length | name
-------------+------
2 | b»
I've checked the client encoding and database encodings and those all seem to be UTF8/UNICODE.
You need to increase the length of your varchar field. Multibyte characters use more than one byte, and the length in a Redshift varchar definition is byte-based. So your special char is taking more than one byte. If it still doesn't work, refer to the Redshift doc page below:
http://docs.aws.amazon.com/redshift/latest/dg/multi-byte-character-load-errors.html
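A quick way to confirm this (a sketch, assuming a UTF-8 client encoding; test2 is just an illustrative name): compare character and byte counts, then size the column in bytes:

select char_length('bl»') as chars, octet_length('bl»') as bytes;
-- chars = 3, bytes = 4  ('»' is U+00BB, two bytes in UTF-8)

create table test2 (name varchar(4));  -- 4 bytes, not 4 characters
insert into test2 values ('bl»');      -- now fits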

Load NULL TIMESTAMP with TIME ZONE using COPY FROM in PostgreSQL

I have a CSV file that I'm trying to load into a PostgreSQL 9.2.4 database using the COPY FROM command. In particular there is a timestamp field that is allowed to be null, however when I load "null values" (actually just "") I get the following error:
ERROR: invalid input syntax for type timestamp with time zone: ""
An example CSV file looks as follows:
id,name,joined
1,"bob","2013-10-02 15:27:44-05"
2,"jane",""
The SQL looks as follows:
CREATE TABLE "users"
(
"id" BIGSERIAL NOT NULL PRIMARY KEY,
"name" VARCHAR(255),
"joined" TIMESTAMP WITH TIME ZONE,
);
COPY "users" ("id", "name", "joined")
FROM '/path/to/data.csv'
WITH (
ENCODING 'utf-8',
HEADER 1,
FORMAT 'csv'
);
According to the documentation, null values should be represented by an empty string that cannot contain the quote character, which is double quote (") in this case:
NULL
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings. This option is not allowed when using binary format.
Note: When using COPY FROM, any data item that matches this string will be stored as a null value, so you should make sure that you use the same string as you used with COPY TO.
I've tried the option NULL '' but that seems to have no effect. Advice, please!
An empty string without quotes works normally:
id,name,joined
1,"bob","2013-10-02 15:27:44-05"
2,"jane",
select * from users;
id | name | joined
----+------+------------------------
1 | bob | 2013-10-03 03:27:44+07
2 | jane |
Maybe it would be simpler to replace "" with an empty string using sed.
The FORCE_NULL option for COPY FROM in Postgres 9.4+ would be the most elegant way to solve your problem. Per documentation:
FORCE_NULL
Match the specified columns' values against the null string, even if it has been quoted, and if a match is found set the value to NULL. In the default case where the null string is empty, this converts a quoted empty string into NULL. This option is allowed only in COPY FROM, and only when using CSV format.
Of course, it converts all matching values in the specified columns.
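Applied to the example above, the quoted empty string in joined then loads as NULL:

COPY "users" ("id", "name", "joined")
FROM '/path/to/data.csv'
WITH (FORMAT csv, HEADER, FORCE_NULL (joined));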
In older versions, you can COPY to a temporary table with the same table layout - except data type text for the problem column. Then fix offending values and INSERT from there:
single quotes appear around value after running copy in postgres 9.2
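A minimal sketch of that staging approach for the table above (users_tmp is a made-up name):

CREATE TEMP TABLE users_tmp
(
"id"     BIGINT,
"name"   VARCHAR(255),
"joined" TEXT  -- loose type so "" loads without error
);

COPY users_tmp FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER);

INSERT INTO "users" ("id", "name", "joined")
SELECT "id", "name", NULLIF("joined", '')::timestamptz
FROM users_tmp;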
I could not get it to work, and ended up using this program:
http://neilb.bitbucket.org/csvfix/
With that you can replace empty fields with other values.
So, for example, in your case column 3 needs a timestamp value, so I give it a fake one, in this case '1900-01-01 00:00:00'. If needed you can delete or filter those rows out once the data is imported.
$CSVFIXHOME/csvfix map -f 3 -fv '' -tv '1900-01-01 00:00:00' -rsep ',' $YOURFILE > $FILEWITHDATES
After that you can import the newly created file.