How to create a jsonb column on AWS Redshift - amazon-redshift

I have a table in my demo database.
CREATE TABLE airports_data (
airport_code character(3) NOT NULL,
airport_name text NOT NULL,
city text NOT NULL,
coordinates super NOT NULL,
timezone text NOT NULL
);
In the Postgres database, the coordinates column has the POINT data type, and the data looks like this:
{"x":129.77099609375,"y":62.093299865722656}
When I copy the table to a CSV file, the data is represented like this in the CSV file:
"(129.77099609375,62.093299865722656)"
Since I defined SUPER as the data type for the coordinates column in Redshift, how can I export the POINT data from the Postgres database to a CSV file and then load that CSV file into Redshift?

In Amazon Redshift the SUPER data type is used to store semi-structured data.
This is the Amazon Redshift guide for loading and manipulating semi-structured data using the SUPER data type.
As an example, if you have:
CREATE TABLE "public"."tmp_super2"
("id" VARCHAR(255) NULL, "data1" SUPER NULL, "data2" SUPER NULL)
BACKUP Yes;
With a CSV file named somefile.csv like this:
a|{}|{}
b|{\"a\":\"Hello World\", \"b\":100}|{\"z\":\"Hello Again\"}
Then you can load it with a COPY command like this:
COPY "public"."tmp_super2"
FROM 's3://yourbucket/yourfolder/somefile.csv'
REGION 'us-west-1'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftRole'
DELIMITER '|' ESCAPE
The COPY command is picky about double quotes when it is loading SUPER from CSV, hence the use of a pipe field delimiter, and escaping the double quotes.
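As for getting the POINT data out of Postgres in a SUPER-friendly shape, one option is to serialize it as JSON on the Postgres side before exporting. A minimal sketch, assuming the Postgres source table has the same columns as the airports_data definition above; the output path is just a placeholder, and a POINT exposes its x/y components via the [0] and [1] subscripts:
COPY (
    SELECT airport_code,
           airport_name,
           city,
           -- build the {"x": ..., "y": ...} object from the POINT components
           json_build_object('x', coordinates[0], 'y', coordinates[1]),
           timezone
    FROM airports_data
) TO '/tmp/airports_data.csv' WITH (DELIMITER '|');
-- server-side COPY; use psql's \copy for a client-side output file
Depending on how picky COPY is about the quotes, you may still need to backslash-escape the double quotes in the exported file, as in the somefile.csv example above.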

Related

Import CSV file via PSQL With Array Values

I have a very simple table where I would like to insert data from a CSV into it.
create table products(
id integer primary key,
images text[]
);
Here is what I am currently trying with my csv:
1,"['hello.jpg', 'world.jpg']"
2,"['hola.jpg', 'mundo.jpg']"
When I do the following, I get a syntax error from psql, with no additional information about what could have gone wrong.
\copy products 'C:\Users\z\Downloads\MOCK_DATA.csv' WITH DELIMITER ',';
Does anyone know how to format my array values properly?
If you remove the square brackets from the CSV file, then I would define the table like this (images as text rather than text[]):
create table products_raw
(
id integer primary key,
images text
);
plus this view
create view products as
select id, ('{'||images||'}')::text[] as images
from products_raw;
and use the view henceforth. Anyway I would rather have the CSV file like this, no formatting, just data:
1,"hello.jpg,world.jpg"
2,"hola.jpg,mundo.jpg"
It is also worth considering attaching the CSV file as a foreign table using file_fdw. It is a bit more complicated but usually pays off with several benefits.
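For reference, a minimal file_fdw sketch along those lines; the server name and file path are placeholders, and the file must sit where the Postgres server can read it:
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

-- exposes the raw CSV, with images as plain text (matching products_raw above)
CREATE FOREIGN TABLE products_csv
(
    id integer,
    images text
)
SERVER csv_files
OPTIONS (filename '/path/to/MOCK_DATA.csv', format 'csv');
The same '{'||images||'}' cast used in the view can then be layered on top of the foreign table, and changes to the file are picked up on every query.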

ERROR: extra data after last expected column in Postgres

I have a json file as:
[xyz#innolx20122 ~]$ cat test_cgs.json
{"technology":"AAA","vendor":"XXX","name":"RBNI","temporal_unit":"hour","regional_unit":"cell","dataset_metadata":"{\"name\": \"RBNI\", \"intervals_epoch_seconds\": [[1609941600, 1609945200]], \"identifier_column_names\": [\"CELLID\", \"CELLNAME\", \"NETWORK\"], \"vendor\": \"XXX\", \"timestamp_column_name\": \"COLLECTTIME\", \"regional_unit\": \"cell\"}","rk":1}
which I am trying to load into the table below in Postgres:
CREATE TABLE temp_test_table
(
technology character varying(255),
vendor character varying(255),
name character varying(255),
temporal_unit character varying(255),
regional_unit character varying(255),
dataset_metadata json,
rk character varying(255)
);
and here is my copy command
db-state=> \copy temp_test_table(technology,vendor,name,temporal_unit,regional_unit,dataset_metadata,rk) FROM '/home/eksinvi/test_cgs.json' WITH CSV DELIMITER ',' quote E'\b' ESCAPE '\';
ERROR: extra data after last expected column
CONTEXT: COPY temp_test_table, line 1: "{"technology":"AAA","vendor":"XXX","name":"RBNI","temporal_unit":"hour","regional_unit":"cell","data..."
I even tried loading this file into a BigQuery table, but no luck:
bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON --allow_quoted_newlines --allow_jagged_rows --ignore_unknown_values test-project:vikrant_test_dataset.cg_test_table "gs://test-bucket-01/test/test_cgs.json"
Either solution would work for me. I want to load this JSON into either the Postgres table or a BigQuery table.
I had similar problems. In my case, it was related to NULL columns and the encoding of the file. I also had to specify a custom delimiter because my columns sometimes included the default delimiter, which would make the copy fail.
\copy mytable FROM 'filePath.dat' (DELIMITER E'\t', FORMAT CSV, NULL '', ENCODING 'UTF8');
In my case, I was exporting data from SQL Server to a CSV file and importing it into Postgres. The SQL Server data contained Unicode characters that would show up as "blanks" but would break the copy command. I had to search the SQL table for those characters with regex queries and eliminate the invalid ones. It's an edge case, but it was part of the problem in my case.
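Building on the same idea of choosing a delimiter and quote that cannot appear in the data, one way to get the whole JSON line into Postgres is to stage it in a single json column and split it afterwards. A sketch, where the staging table name and the control-character choices are just examples:
CREATE TABLE temp_test_stage (doc json);

-- quote/delimiter bytes chosen so they never appear in the JSON lines
\copy temp_test_stage FROM '/home/eksinvi/test_cgs.json' WITH (FORMAT csv, QUOTE E'\x01', DELIMITER E'\x02')

-- split the staged JSON into the real columns
INSERT INTO temp_test_table
SELECT doc->>'technology',
       doc->>'vendor',
       doc->>'name',
       doc->>'temporal_unit',
       doc->>'regional_unit',
       (doc->>'dataset_metadata')::json,  -- the metadata arrives as an escaped JSON string
       doc->>'rk'
FROM temp_test_stage;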

psql copy from csv automatically generating ids

Well consider a table created like this:
CREATE TABLE public.test
(
id integer NOT NULL DEFAULT nextval('user_id_seq'::regclass),
name text,
PRIMARY KEY (id)
)
So the table has a unique 'id' column that auto-generates default values using a sequence.
Now I wish to import data from a CSV file, extending this table. However, "obviously" the ids need to be unique, so I want to let the database itself generate them; the CSV file (coming from a completely different source) therefore has an "empty column" for the ids:
,username
,username2
However if I then import this csv using psql:
\copy public."user" FROM '/home/paul/Downloads/test.csv' WITH (FORMAT csv);
The following error pops up:
ERROR: null value in column "id" violates not-null constraint
So how can I do this?
The empty column from the CSV file is interpreted as SQL NULL, and inserting that value overrides the DEFAULT and leads to the error.
You should omit the empty column from the file and use:
\copy public."user"(name) FROM '...' (FORMAT 'csv')
Then the default value will be used for id.
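In other words, the file should contain just the name values; a test.csv for this approach would look like:
username
username2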

Import excel file into teradata using tpt

I am required to load an Excel file into a Teradata table which already has data in it. I have used the TPT Inserter operator to load data from CSV files, but I am not sure how to directly load an Excel file using TPT Inserter.
When I tried providing the excel file with TextDelimiter='TAB', the parser threw an error
data_connector: TPT19134 !ERROR! Fatal data error processing file 'd:\sample_data.csv'. Delimited Data Parsing error: Too few columns in row 1.
1) Could someone explain what are the options required while directly importing excel file to teradata
2) How to load a TAB delimited file in teradata using tptLoad / tptInserter
the script that I have used is:
define job insert_data
description 'Load from Excel to TD table'
(
define operator insert_operator
type inserter
schema *
attributes
(
varchar logonmech='LDAP',
varchar username='username',
varchar userpassword='password',
varchar tdpid='tdpid',
varchar targettable='excel_to_table'
);
define schema upload_schema
(
quarter varchar(20),
cust_type varchar(20)
);
define operator data_connector
type dataconnector producer
schema upload_schema
attributes
(
varchar filename='d:\sample_data.xlsx',
varchar format='delimited',
varchar textdelimiter='TAB',
varchar openmode='Read'
);
apply ('insert into excel_to_table(quarter, cust_type) values(:quarter, :cust_type);')
to operator (insert_operator[1])
select quarter, cust_type
from operator (data_connector[1]);
);
Thanks!!
The script actually seems fine by the looks of it, apart from the fact that the error refers to delimited data while a .xlsx file is specified in the script. Are you sure that the specified file is tab delimited?
Formats supported by the TPT DataConnector operator are:
Binary - Binary data fitting exactly in the defined Schema plus indicator bytes
Delimited - Easier for multiple column human readable files, limited to all varchar schema
Formatted - For working with data exported by Teradata TTUs
Text - For text files containing fixed width columns, also human readable, limited to all varchar schema
Unformatted - For working with data exported by Teradata TTUs
The original Excel data (in true xls or xlsx format) is not directly supported by the native TPT operators. But if your data is really tab delimited then this shouldn't be a problem; you should be able to load it. An obvious point to consider when loading a delimited file is that Char or Varchar fields must not contain the delimiter within the data. You can escape delimiter characters in the data with a '\'. A more subtle point is that you cannot specify the TAB delimiter in lower case, i.e. varchar textdelimiter='TAB' works but varchar textdelimiter='tab' doesn't. Also, no control characters other than TAB can be specified as delimiters.
If you truly need to load Excel files then you may need to pre-process them into a loadable format such as delimited, binary, or text data. You can write separate code in any language to achieve this.

Using Oracle Sequence in SQL Loader?

I am using the SEQUENCE keyword in the SQL Loader control file to generate primary keys. But for a special scenario I would like to use an Oracle sequence in the control file. The Oracle documentation for SQL Loader doesn't mention anything about it. Does SQL Loader support it?
I have managed to load without using the dummy value by switching the sequence to be the last column, as in:
LOAD DATA
INFILE 'data.csv'
APPEND INTO TABLE my_data
FIELDS TERMINATED BY ','
(
name char,
ID "MY_SEQUENCE.NEXTVAL"
)
and data.csv would be like:
"dave"
"carol"
"tim"
"sue"
I have successfully used a sequence from my Oracle 10g database to populate a primary key field during an sqlldr run:
Here is my data.ctl:
LOAD DATA
INFILE 'data.csv'
APPEND INTO TABLE my_data
FIELDS TERMINATED BY ','
(
ID "MY_SEQUENCE.NEXTVAL",
name char
)
and my data.csv:
-1, "dave"
-1, "carol"
-1, "tim"
-1, "sue"
For some reason you have to put a dummy value in the CSV file, even though you'd think sqlldr would just figure out that you wanted to use a sequence.
I don't think so, but you can assign the sequence via an insert trigger, unless this is a direct path load.
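For completeness, a sketch of that trigger approach, assuming the my_data table and MY_SEQUENCE from the examples above (the trigger name is arbitrary, and row triggers do not fire during direct path loads):
-- fill ID from the sequence whenever the loaded row does not supply one
CREATE OR REPLACE TRIGGER my_data_id_trg
BEFORE INSERT ON my_data
FOR EACH ROW
WHEN (new.id IS NULL)
BEGIN
  SELECT MY_SEQUENCE.NEXTVAL INTO :new.id FROM dual;
END;
/
With the trigger in place, the control file can simply leave ID out of its field list and let the trigger populate it.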