value too long for type character varying(512)--Why can't import the data? - postgresql

The maximum size of limited character types (e.g. varchar(n)) in Postgres is 10485760.
description on max length of postgresql's varchar
Please download the file for testing and extract it in /tmp/2019q4, we only use pre.txt to import data with.
sample data
Enter you psql and create a database:
postgres=# create database edgar;
postgres=# \c edgar;
Create table according to the webpage:
fields in pre table definations
edgar=# create table pre(
id serial ,
adsh varchar(20),
report numeric(6,0),
line numeric(6,0),
stmt varchar(2),
inpth boolean,
rfile char(1),
tag varchar(256),
version varchar(20),
plabel varchar(512),
negating boolean
);
CREATE TABLE
Try to import data:
edgar=# \copy pre(adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating) from '/tmp/2019q4/pre.txt' with delimiter E'\t' csv header;
We analyse the error info:
ERROR: value too long for type character varying(512)
CONTEXT: COPY pre, line 1005798, column plabel: "LIABILITIES AND STOCKHOLDERS EQUITY 0
0001493152-19-017173 2 11 BS 0 H LiabilitiesAndStockholdersEqu..."
Time: 1481.566 ms (00:01.482)
1.What size i set in the field is just 512 ,more less than 10485760.
2.the content in line 1005798 is not same as in error info:
0001654954-19-012748 6 20 EQ 0 H ReclassificationAdjustmentRelatingToAvailableforsaleSecuritiesNetOfTaxEffect 0001654954-19-012748 Reclassification adjustment relating to available-for-sale securities, net of tax effect" 0
Now i drop the previous table ,convert the plabel field as text,re-create it:
edgar=# drop table pre;
DROP TABLE
Time: 22.763 ms
edgar=# create table pre(
id serial ,
adsh varchar(20),
report numeric(6,0),
line numeric(6,0),
stmt varchar(2),
inpth boolean,
rfile char(1),
tag varchar(256),
version varchar(20),
plabel text,
negating boolean
);
CREATE TABLE
Time: 81.895 ms
Import the same data with same copy command:
edgar=# \copy pre(adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating) from '/tmp/2019q4/pre.txt' with delimiter E'\t' csv header;
COPY 275079
Time: 2964.898 ms (00:02.965)
edgar=#
No error info in psql console,let me check the raw data '/tmp/2019q4/pre.txt' ,which it contain 1043000 lines.
wc -l /tmp/2019q4/pre.txt
1043000 /tmp/2019q4/pre.txt
There are 1043000 lines,how much lines imported then?
edgar=# select count(*) from pre;
count
--------
275079
(1 row)
Why so less data imported without error info ?

The sample data you provided is obviously not the data you are really loading. It does still show the same error, but of course the line numbers and markers are different.
That file occasionally has double quote marks where there should be single quote marks (apostrophes). Because you are using CSV mode, these stray double quotes will start multi-line strings, which span all the way until the next stray double quote mark. That is why you have fewer rows of data than lines of input, because some of the data values are giant multiline strings.
Since your data clearly isn't CSV, you probably shouldn't be using \copy in CSV format. It loads fine in text format as long as you specify "header", although that option didn't become available in text format until v15. For versions before that, you could manually remove the header line, or use PROGRAM to skip the header like FROM PROGRAM 'tail +2 /tmp/pre.txt' Alternatively, you could keep using CSV format, but choose a different quote character, one that never shows up in your data such as with (delimiter E'\t', format csv, header, quote E'\b')

Related

Postgres is adding a space at the beginning and end of all fields

SLES 12 SP3
Postgres 10.8
I have duplicated a table to migrate data from a DB2 instance. The fields are all of type CHAR, VARCHAR, or TIMESTAMP. I originally tried to use \COPY to pull the data in from a pipe delimited file. But, it put a space at the beginning and end of all of the fields, even if this caused the field to be longer than it is defined. I found a claim online that this was a known issue with \COPY. At that point, I dropped the table, used sed and some other tools to convert the pipe delimited data into an SQL INSERT statement. I again had a leading and trailing space in every field.
There are a lot of columns but as an example of what I have follows:
FLD1 CHAR(6) PRIMARY KEY
FLD2 VARCHAR(8)
FLD3 TIMESTAMP
I am using the short form of INSERT.
INSERT INTO my_table VALUES
('123456', '12345678', '2021-01-01 12:34:56');
But when I do a SELECT, I get (note the leading and trailing spaces):
123456 | 12345678 | 2021-01-01 12:34:56 |
I would point out that the first two fields are now longer than they are defined by 2 characters.
Does anyone how I might fix this?
The -A argument to psql gives me the desired result.

Importing CSV file PostgreSQL using pgAdmin 4

I'm trying to import a CSV file to my PostgreSQL but I get this error
ERROR: invalid input syntax for integer: "id;date;time;latitude;longitude"
CONTEXT: COPY test, line 1, column id: "id;date;time;latitude;longitude"
my csv file is simple
id;date;time;latitude;longitude
12980;2015-10-22;14:13:44.1430000;59,86411203;17,64274849
The table is created with the following code:
CREATE TABLE kordinater.test
(
id integer NOT NULL,
date date,
"time" time without time zone,
latitude real,
longitude real
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
ALTER TABLE kordinater.test
OWNER to postgres;
You can use Import/Export option for this task.
Right click on your table
Select "Import/Export" option & Click
Provide proper option
Click Ok button
You should try this it must work
COPY kordinater.test(id,date,time,latitude,longitude)
FROM 'C:\tmp\yourfile.csv' DELIMITER ',' CSV HEADER;
Your csv header must be separated by comma NOT WITH semi-colon or try to change id column type to bigint
to know more
I believe the quickest way to overcome this issue is to create an intermediary temporary table, so that you can import your data and cast the coordinates as you please.
Create a similar temporary table with the problematic columns as text:
CREATE TEMPORARY TABLE tmp
(
id integer,
date date,
time time without time zone,
latitude text,
longitude text
);
And import your file using COPY:
COPY tmp FROM '/path/to/file.csv' DELIMITER ';' CSV HEADER;
Once you have your data in the tmp table, you can cast the coordinates and insert them into the test table with this command:
INSERT INTO test (id, date, time, latitude, longitude)
SELECT id, date, time, replace(latitude,',','.')::numeric, replace(longitude,',','.')::numeric from tmp;
One more thing:
Since you're working with geographic coordinates, I sincerely recommend you to take a look at PostGIS. It is quite easy to install and makes your life much easier when you start your first calculations with geospatial data.

How to store string spaces as null in numeric column

I want to get records from my local txt file to postgresql table.
I have created following table.
create table player_info
(
Name varchar(20),
City varchar(30),
State varchar(30),
DateOfTour date,
pay numeric(5),
flag char
)
And, my local txt file contains following data.
John|Mumbai| |20170203|55555|Y
David|Mumbai| |20170305| |N
Romcy|Mumbai| |20170405|55555|N
Gotry|Mumbai| |20170708| |Y
I am just executing this,
copy player_info (Name,
City,
State,
DateOfTour,
pay_id,
flag)
from local 'D:\sample_player_info.txt'
delimiter '|' null as ''
exceptions 'D:\Logs\player_info'
What I want is,
For my numeric column, If 3 spaces are there,
then I have to insert NULL as pay else whatever 5 digits numeric number.
pay is a column in my table whose datatype is numeric.
Is this correct or possible to do this ?
You cannot store strings in a numeric column, at all. 3 spaces is a string, so it cannot be stored in the column pay as that is defined as numeric.
A common approach to this conundrum is to create a staging table which uses less precise data types in the column definitions. Import the source data into the staging table. Then process that data so that it can be reliably added to the final table. e.g. in the staging table set a column called pay_str to NULL where pay_str = ' ' (or perhaps LIKE ' %')

How does Redshift treat guillemets?

I am trying to run a CSV import using the COPY command for some data that includes a guillemet (»). Redshift complains that the column value is too long for the varchar column I have defined. The error in the "Loads" tab in the Redshift GUI displays this character as two dots: .. - had it been treated as one, it would have fit in the varchar column. It's not clear whether there is some sort of conversion error occurring or if there is a display issue.
When trying to do plain INSERTs I run into strange behavior as well:
dev=# create table test (name varchar(3));
CREATE TABLE
dev=# insert into test values ('bla');
INSERT 0 1
3 characters treated as 4?
dev=# insert into test values ('bl»');
ERROR: value too long for type character varying(3)
dev=# insert into test values ('b»');
INSERT 0 1
Why does char_length return 2?
dev=# select char_length(name), name from test;
char_length | name
-------------+------
2 | b»
I've checked the client encoding and database encodings and those all seem to be UTF8/UNICODE.
You need to increase the length of your varchar field. Multibyte characters use more than one character and length in the definition of varchar field are byte based. So, your special char might be taking more than a byte. If it still doesn't work refer to the doc page for Redshift below,
http://docs.aws.amazon.com/redshift/latest/dg/multi-byte-character-load-errors.html

Importing variable number of columns into SQLite database

I have a list of synonyms in a csv file format: word,meaning1,meaning2,meaning3....
Different words have different number of synonyms which means that rows are likely to have a variable number of columns. I am trying to import the csv file into an sqlite database like so:
sqlite3 synonyms
sqlite> create table list(word text, meaning0 text, meaning1 text, meaning2 text, meaning3 text, meaning4 text, meaning5 text, meaning6 text, meaning7 text, meaning8 text, meaning9 text);
sqlite> .mode list
sqlite> .separator ,
sqlite> .import ./csv/synonyms.csv list
To be on the safe side, I assumed a max. number of 10 columns to each word. For those words with less than 10 synonyms, the other columns should be null. The error I get on executing the import command is:
Error: ./csv/synonyms.csv line 1: expected 11 columns of data but found 3
My question(s):
1. In case the number of columns is less than 10, how can I tell SQLite to substitute it with null?
2. Is there some way of specifying that I want 10 columns after word instead of typing it automatically?
You can do following:
Import all data into single column;
Update table splitting column contents into other columns.
Sample:
-- Create a table with only one column;
CREATE TABLE table_name(first);
-- Choose a separator which doesn't exist within file
.separator ~
-- Import data
.import file.csv table_name
-- Add another column to split data
ALTER TABLE table_name ADD COLUMN second;
-- Split data between first and second column
UPDATE table_name SET first=SUBSTR(first, 1, INSTR(first, ",")-1), second=SUBSTR(first, INSTR(first, ",")+1) WHERE INSTR(first, ",")>0;
-- Repeat to next column
ALTER TABLE table_name ADD COLUMN third;
-- Split data between second and third column
UPDATE table_name SET second=SUBSTR(second, 1, INSTR(second, ",")-1), third=SUBSTR(second, INSTR(second, ",")+1) WHERE INSTR(second, ",")>0;
-- And so on...
ALTER TABLE table_name ADD COLUMN fourth;
UPDATE table_name SET third=SUBSTR(third, 1, INSTR(third, ",")-1), fourth=SUBSTR(third, INSTR(third, ",")+1) WHERE INSTR(third, ",")>0;
-- Many times as needed...
Not being an optimal method, sqlite performance should render it enough fast.