I have a JSON file as follows:
[xyz#innolx20122 ~]$ cat test_cgs.json
{"technology":"AAA","vendor":"XXX","name":"RBNI","temporal_unit":"hour","regional_unit":"cell","dataset_metadata":"{\"name\": \"RBNI\", \"intervals_epoch_seconds\": [[1609941600, 1609945200]], \"identifier_column_names\": [\"CELLID\", \"CELLNAME\", \"NETWORK\"], \"vendor\": \"XXX\", \"timestamp_column_name\": \"COLLECTTIME\", \"regional_unit\": \"cell\"}","rk":1}
which I am trying to load into the table below in Postgres:
CREATE TABLE temp_test_table
(
technology character varying(255),
vendor character varying(255),
name character varying(255),
temporal_unit character varying(255),
regional_unit character varying(255),
dataset_metadata json,
rk character varying(255)
);
and here is my copy command
db-state=> \copy temp_test_table(technology,vendor,name,temporal_unit,regional_unit,dataset_metadata,rk) FROM '/home/eksinvi/test_cgs.json' WITH CSV DELIMITER ',' quote E'\b' ESCAPE '\';
ERROR: extra data after last expected column
CONTEXT: COPY temp_test_table, line 1: "{"technology":"AAA","vendor":"XXX","name":"RBNI","temporal_unit":"hour","regional_unit":"cell","data..."
I even tried loading this file into a BigQuery table, but no luck:
bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON --allow_quoted_newlines --allow_jagged_rows --ignore_unknown_values test-project:vikrant_test_dataset.cg_test_table "gs://test-bucket-01/test/test_cgs.json"
Either solution would work for me. I want to load this JSON into either a Postgres table or a BigQuery table.
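For the Postgres side, one approach that may work (a sketch only, assuming Postgres 9.3+ and that neither the \x01 byte nor the backspace character occurs in the data; the staging table name temp_json_staging is made up for illustration): since the file holds one JSON object per line, load each whole line into a single json column using a delimiter and quote character that cannot appear in the data, then project the fields into temp_test_table with the JSON operators.
CREATE TABLE temp_json_staging (doc json);
\copy temp_json_staging FROM '/home/eksinvi/test_cgs.json' WITH (FORMAT csv, DELIMITER E'\x01', QUOTE E'\b')
INSERT INTO temp_test_table (technology, vendor, name, temporal_unit, regional_unit, dataset_metadata, rk)
SELECT doc->>'technology',
       doc->>'vendor',
       doc->>'name',
       doc->>'temporal_unit',
       doc->>'regional_unit',
       (doc->>'dataset_metadata')::json,  -- unescapes the stringified JSON and re-parses it
       doc->>'rk'
FROM temp_json_staging;
The cast turns the string-encoded dataset_metadata into real JSON; if you would rather keep it exactly as the quoted string from the file, use doc->'dataset_metadata' instead.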
I had similar problems. In my case, it was related to NULL columns and the encoding of the file. I also had to specify a custom delimiter because my columns sometimes included the default delimiter, which would make the copy fail.
\copy mytable FROM 'filePath.dat' (DELIMITER E'\t', FORMAT CSV, NULL '', ENCODING 'UTF8');
In my case, I was exporting data to a CSV file from SQL Server and importing it into Postgres. In SQL Server, we had Unicode characters that would show up as "blanks" but would break the COPY command. I had to search the SQL Server table for those characters with regex queries and eliminate the invalid characters. It's an edge case, but that was part of the problem in my case.
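If you need to hunt those characters down on the SQL Server side, one rough check (a sketch; mytable and mycolumn are placeholder names, and LIKE is not real regex, just a character-class pattern) is to look for any character outside the printable ASCII range:
SELECT *
FROM mytable
WHERE mycolumn COLLATE Latin1_General_BIN LIKE '%[^ -~]%';
-- [^ -~] matches any character outside the space (0x20) to tilde (0x7E) range;
-- the binary collation keeps the range comparison byte-based.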
When I tried to copy a very large txt file into my Postgres database, I got the error below.
Note that I created a table with a single column and am not using any delimiter when importing the txt file.
db1=# create table travis_2018_data (v text);
db1=# \COPY travis_2018_data FROM 'C:\Users\testu\Downloads\travis_2018\2018-Certification\PROP.txt';
The error:
ERROR: extra data after last expected column
CONTEXT: COPY travis_2018_data, line 295032: "000000561125P 02018000000000000
I'm wondering why I still get the error about extra data (or an extra column) on line 295032?
Your text probably contains a tab character, which is the default column delimiter for the TEXT format when using \copy (or COPY) without specifying a format.
So \copy thinks the line contains two columns but expects only one, hence the error message.
You need to specify a delimiter that will not occur in the file. The character with ASCII value 1 is highly unlikely to occur in such a file, so you can try:
\COPY travis_2018_data FROM '.....' DELIMITER E'\x01'
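Once the data is loaded with the \x01 delimiter, a quick way to confirm that stray tabs really were the culprit (just a sketch against the single-column table from the question):
SELECT count(*)          -- rows whose raw text contains at least one tab character
FROM travis_2018_data
WHERE v LIKE E'%\t%';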
When trying to use the COPY command via SQL in Postgres 9.5.1 in a simple example database…
I am getting this error:
ERROR: invalid input syntax for integer: "Sally"
CONTEXT: COPY customer_, line 2, column id_: "Sally"
********** Error **********
ERROR: invalid input syntax for integer: "Sally"
SQL state: 22P02
Context: COPY customer_, line 2, column id_: "Sally"
…when importing this data in CSV (comma-separated value):
"first_name_","last_name_","phone_","email_"
"Sally","Jones","425.555.1324","s.jones#acme.com"
"Jarrod","Barkley","206.555.3454","j.barkley#example.com"
"Wendy","Melvin","415.555.2343","wendy#wendyandlisa.com"
"Lisa","Coleman","425.555.7282","lisa#wendyandlisa.com"
"Jesse","Johnson","507.555.7865","j.j#guitar.com"
"Jean-Luc","Martin","212.555.2244","jean-luc.martin#example.com"
…being imported via the following SQL executed in pgAdmin:
COPY customer_
FROM '/home/parallels/Downloads/customer_.csv'
CSV
HEADER
;
…into this table:
-- Table: public.customer_
-- DROP TABLE public.customer_;
CREATE TABLE public.customer_
(
id_ integer NOT NULL DEFAULT nextval('customer__id__seq'::regclass),
first_name_ text NOT NULL,
last_name_ text NOT NULL,
phone_ text NOT NULL DEFAULT ''::text,
email_ text NOT NULL DEFAULT ''::text,
CONSTRAINT pkey_customer_ PRIMARY KEY (id_)
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.customer_
OWNER TO postgres;
COMMENT ON TABLE public.customer_
IS 'Represents a person whose pets visit our clinic.';
So it seems the first row containing the names of the columns is being processed successfully. The failure point is the first data value in the first data line of the CSV. None of my imported data is of integer type, so I am befuddled by the error message. The only integer column is the id_ primary key, an auto-incrementing SERIAL.
I did read the Question page on PG COPY error: invalid input syntax for integer. But that question did involve integer values, and the issue of an empty quoted string being interpreted as NULL. In my case there are no integer values in the data; the only integer is the primary key SERIAL column, whose value is DEFAULT-generated (not in the data being imported).
I also found the Question, PostgreSQL ERROR: invalid input syntax for integer. But it seems irrelevant.
Try specifying the columns, without the primary key:
COPY customer_ (first_name_, last_name_, phone_, email_)
FROM '/home/parallels/Downloads/customer_.csv'
CSV
HEADER
;
Without the column list, it is looking for a value for id_.
The import data file’s first row of column names is not used for mapping to the table columns. The HEADER flag merely tells Postgres to skip over that first line, as documented:
HEADER
Specifies that… on input, the first line is ignored. …
COPY table_name FROM 'C:\path\file.csv' DELIMITER ',' CSV HEADER;
This wasn't the OP's problem, but posting because this is one of the top results when I google the error message.
I was trying to import a .csv file with no header. Adding a header to the file and changing COPY ... CSV to COPY ... CSV HEADER in the SQL command fixed the problem for me.
I am required to load an Excel file into a Teradata table which already has data in it. I have used the TPT Inserter operator to load data from CSV files, but I am not sure how to directly load an Excel file using TPT Inserter.
When I tried providing the Excel file with TextDelimiter='TAB', the parser threw an error:
data_connector: TPT19134 !ERROR! Fatal data error processing file 'd:\sample_dat
a.csv'. Delimited Data Parsing error: Too few columns in row 1.
1) Could someone explain what options are required to directly import an Excel file into Teradata?
2) How do I load a TAB-delimited file into Teradata using tptLoad / tptInserter?
The script that I have used is:
define job insert_data
description 'Load from Excel to TD table'
(
define operator insert_operator
type inserter
schema *
attributes
(
varchar logonmech='LDAP',
varchar username='username',
varchar userpassword='password',
varchar tdpid='tdpid',
varchar targettable='excel_to_table'
);
define schema upload_schema
(
quarter varchar(20),
cust_type varchar(20)
);
define operator data_connector
type dataconnector producer
schema upload_schema
attributes
(
varchar filename='d:\sample_data.xlsx',
varchar format='delimited',
varchar textdelimiter='TAB',
varchar openmode='Read'
);
apply ('insert into excel_to_table(quarter, cust_type) values(:quarter, :cust_type);')
to operator (insert_operator[1])
select quarter, cust_type
from operator (data_connector[1]);
);
Thanks!!
The script actually seems fine at a glance, apart from the fact that the error refers to delimited data while a file with a .xlsx extension is specified in the script. Are you sure that the specified file is tab-delimited?
Formats supported by TPT Dataconnector operator are:
Binary - Binary data fitting exactly in the defined Schema plus indicator bytes
Delimited - Easier for multiple column human readable files, limited to all varchar schema
Formatted - For working with data exported by Teradata TTUs
Text - For text files containing fixed width columns, also human readable, limited to all varchar schema
Unformatted - For working with data exported by Teradata TTUs
The original Excel data (in true xls or xlsx format) is not directly supported by native TPT operators. But if your data is really tab-delimited then this shouldn't be a problem; you should be able to load it. An obvious point to consider when loading a delimited file is that Char or Varchar fields must not contain the delimiter within the data. You can escape delimiter characters in the data with a '\'. A more subtle point is that you cannot specify the TAB delimiter in lower case, i.e. varchar textdelimiter='TAB' works but varchar textdelimiter='tab' doesn't. Also, no other control characters (besides TAB) can be specified as delimiters.
If you truly need to load Excel files, then you may need to pre-process them into a loadable format such as delimited, binary, or text data. You can write separate code in any language to achieve this.
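For example, if the workbook is first saved from Excel as a tab-delimited text file, the producer operator from the question only needs its filename pointed at that export (the path below is illustrative, not taken from the question):
define operator data_connector
type dataconnector producer
schema upload_schema
attributes
(
varchar filename='d:\sample_data.txt', /* the tab-delimited text export, not the .xlsx itself */
varchar format='delimited',
varchar textdelimiter='TAB',           /* must be upper case, as noted above */
varchar openmode='Read'
);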
I am trying to import the following CSV file, which was exported from pgAdmin.
CSV File:
"id";"email";"password";"permissions";"activated";"activation_code";"activated_at";"last_login";"persist_code";"reset_password_code";"first_name";"last_name";"created_at";"updated_at"
"11";"test.teset#gmail.com";"$2y$10$gNfDFVCgqhQzCgKoiLuKfeskyCUPzK1akJvk6PdXt1DmMJooCuYpi";"";"t";"XY56HNcFHeAcAwzUhleYhvVbxxmOaMr57ifYEhxiUd";"";"2014-05-27 08:47:33";"$2y$10$g0LnAStmA/kEWuhNDfWleOjQeyo9maGvvIlfiJms/KpRiPAdfBWHm";"";"";"";"2014-05-27 07:51:07";"2014-05-27 08:47:33"
"5";"test#gmail.com";"$2y$10$dXODoI520pddcmiSXcS/OuiH.4K/87LEXeQRzvUl2k/Uth2HumpNy";"";"t";"4k8s1TbgrPfAMcNozVEP19MOQkCApQ0LA8bhGkF55A";"";"2014-05-21 21:18:06";"$2y$10$CefSnbQIzAJBo5PfbMdzKuhzpW17fHqh/frWabmljzJvv0A5Vkt1O";"";"";"";"2014-05-21 21:17:45";"2014-05-22 19:12:01"
And this is the command I am using to import the CSV file:
DROP TABLE users;
CREATE TABLE users
(
id serial NOT NULL,
email character varying(255) NOT NULL,
password character varying(255) NOT NULL,
permissions text,
activated boolean NOT NULL DEFAULT true,
activation_code character varying(255),
activated_at timestamp without time zone,
last_login timestamp without time zone,
persist_code character varying(255),
reset_password_code character varying(255),
first_name character varying(255),
last_name character varying(255),
created_at timestamp without time zone,
updated_at timestamp without time zone
);
COPY users (email,password,permissions,activated,activation_code,activated_at,last_login,persist_code,reset_password_code,first_name,last_name,created_at,updated_at)
FROM 'D:/test/db_backup/data_csv.csv' WITH CSV HEADER delimiter ';'
;
But I am getting the following error, and I am not sure why.
ERROR: extra data after last expected column
SQL state: 22P04
Please let me know what the issue is here.
Because you have 14 columns of data but only 13 inside your COPY statement. Specifically, you are missing the id column in your COPY statement. If the data are in the same order as they are declared in the table, there is no need to put the column names in the COPY ... FROM statement.
COPY users FROM 'D:/test/db_backup/data_csv.csv' CSV HEADER delimiter ';' null '\N';
will work just fine in your case -- note you don't need the WITH either.
EDIT: You need to set the null value explicitly, as in null '\N'. If you use "" as the null specification, you can get an error such as: "CSV quote character must not appear in the NULL specification". Also, you can only specify HEADER in CSV mode, so if you want to use HEADER and CSV, then you cannot use "" as the null specification and must use an explicit null '\N'.
I have updated my original answer to include this and imported your sample data, after replacing all the "" with \N.
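If editing the file is not an option, another route (a sketch, assuming PostgreSQL 9.4 or newer) is to keep the quoted "" values and let COPY convert them to NULL for the columns that need it, using FORCE_NULL:
COPY users
FROM 'D:/test/db_backup/data_csv.csv'
WITH (FORMAT csv, HEADER, DELIMITER ';', NULL '',
      FORCE_NULL (activated_at, last_login, created_at, updated_at));
FORCE_NULL matches the quoted empty strings in those timestamp columns against the null string and stores NULL, instead of failing to parse '' as a timestamp.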
I have the following problem. I'm using SQLite3 to store some code table information.
There is a text file that contains all the rows I need. I've trimmed it down to one to make things easier.
The codetbls.txt file contains the one row I want to insert into the table codetbls.
Using notepad++ to view the file contents shows the following:
codetbls.txt (Encoding: UTF-8)
1A|Frequency|Fréquence
I've created the following table:
create table codetbls (
id char(2) COLLATE NOCASE PRIMARY KEY NOT NULL,
name_eng varchar(50) COLLATE NOCASE,
name_fr varchar(50) COLLATE NOCASE
);
I then execute the following:
.read codetbls.txt codetbls
Now, when I run a select, I see the following:
select * from codetbls;
id name_eng name_fr
--+---------+----------
1A|Frequency|Fr├®quence
I don't understand why it doesn't show properly.
If I execute an insert statement with 'é' using the shell prompt, it shows up correctly. However using the .read command doesn't seem to work.
Based on other suggestions, I have tried the following:
- changed datatype to 'text'
- changed character encoding to UTF-8 without BOM
I don't know why it doesn't display properly. Any help?
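For what it's worth, one way to check whether the é is actually stored correctly and is only being rendered wrong by the Windows console (a diagnostic sketch; the expected bytes assume UTF-8 storage):
SELECT hex(name_fr) FROM codetbls WHERE id = '1A';
-- 'Fréquence' stored as UTF-8 should come back as 4672C3A97175656E6365,
-- where C3A9 is the two-byte UTF-8 encoding of 'é'.
-- If those bytes are present, the data is fine and only the console code page is mangling the display.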