Import Excel file into Teradata using TPT import

I am required to load an Excel file into a Teradata table which already has data in it. I have used the TPT Inserter operator to load data from CSV files, but I am not sure how to directly load an Excel file using TPT Inserter.
When I tried providing the Excel file with TextDelimiter='TAB', the parser threw an error:
data_connector: TPT19134 !ERROR! Fatal data error processing file 'd:\sample_data.csv'. Delimited Data Parsing error: Too few columns in row 1.
1) Could someone explain what options are required to directly import an Excel file into Teradata?
2) How can I load a TAB-delimited file into Teradata using tptLoad / tptInserter?
The script that I have used is:
define job insert_data
description 'Load from Excel to TD table'
(
    define operator insert_operator
    type inserter
    schema *
    attributes
    (
        varchar logonmech='LDAP',
        varchar username='username',
        varchar userpassword='password',
        varchar tdpid='tdpid',
        varchar targettable='excel_to_table'
    );

    define schema upload_schema
    (
        quarter varchar(20),
        cust_type varchar(20)
    );

    define operator data_connector
    type dataconnector producer
    schema upload_schema
    attributes
    (
        varchar filename='d:\sample_data.xlsx',
        varchar format='delimited',
        varchar textdelimiter='TAB',
        varchar openmode='Read'
    );

    apply ('insert into excel_to_table(quarter, cust_type) values(:quarter, :cust_type);')
    to operator (insert_operator[1])
    select quarter, cust_type
    from operator (data_connector[1]);
);
Thanks!!

The script actually seems fine by the looks of it, apart from the fact that the error message refers to delimited data while the script specifies a file with a .xlsx extension. Are you sure that the specified file is tab-delimited?
Formats supported by the TPT DataConnector operator are:
Binary - Binary data fitting exactly in the defined Schema plus indicator bytes
Delimited - Easier for multiple column human readable files, limited to all varchar schema
Formatted - For working with data exported by Teradata TTUs
Text - For text files containing fixed width columns, also human readable, limited to all varchar schema
Unformatted - For working with data exported by Teradata TTUs
Original Excel data (in true xls or xlsx format) is not directly supported by the native TPT operators. But if your data is really tab-delimited then this shouldn't be a problem; you should be able to load it. An obvious point to consider when loading a delimited file is that Char or Varchar fields must not contain the delimiter within the data. You can escape delimiter characters in data with a '\'. A more subtle point is that you cannot specify the TAB delimiter in lower case, i.e. varchar textdelimiter='TAB' works but varchar textdelimiter='tab' doesn't. Also, no other control characters (besides TAB) can be specified as delimiters.
If you truly need to load Excel files then you may need to pre-process them into a loadable format such as delimited, binary or text data. You can write separate code in any language to achieve this.
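For example, if the sheet is first saved from Excel as tab-delimited text, the DataConnector definition in the script above only needs to point at that export instead of the workbook (the .txt file name below is hypothetical):

define operator data_connector
type dataconnector producer
schema upload_schema
attributes
(
    varchar filename='d:\sample_data.txt',  /* tab-delimited text export, not the .xlsx workbook */
    varchar format='delimited',
    varchar textdelimiter='TAB',            /* must be upper case, 'tab' is rejected */
    varchar openmode='Read'
);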

Related

Import CSV file via PSQL With Array Values

I have a very simple table where I would like to insert data from a CSV into it.
create table products(
id integer primary key,
images text[]
);
Here is what I am currently trying with my csv:
1,"['hello.jpg', 'world.jpg']"
2,"['hola.jpg', 'mundo.jpg']"
When I do the following, I get a syntax error from psql, with no additional information about what could have gone wrong.
\copy products 'C:\Users\z\Downloads\MOCK_DATA.csv' WITH DELIMITER ',';
Does anyone know how to format my array values properly?
If you remove the square brackets from the CSV file, then I would define the table like this (images as text rather than text[]):
create table products_raw
(
id integer primary key,
images text
);
plus this view
create view products as
select id, ('{'||images||'}')::text[] as images
from products_raw;
and use the view henceforth. Anyway I would rather have the CSV file like this, no formatting, just data:
1,"hello.jpg,world.jpg"
2,"hola.jpg,mundo.jpg"
It is also worth considering attaching the CSV file as a foreign table using file_fdw. It is a bit more complicated but usually pays off with several benefits.
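A minimal sketch of the file_fdw route, reusing the path from the question and the same two columns (the server and table names are made up; the view trick from above can then sit on top of the foreign table):

create extension if not exists file_fdw;

create server csv_files foreign data wrapper file_fdw;

-- the file is read on the database server at query time
create foreign table products_csv
(
    id integer,
    images text
)
server csv_files
options (filename 'C:\Users\z\Downloads\MOCK_DATA.csv', format 'csv');

create view products as
select id, ('{'||images||'}')::text[] as images
from products_csv;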

Load data with default values into Redshift from a parquet file

I need to load data with a default value column into Redshift, as outlined in the AWS docs.
Unfortunately the COPY command doesn't allow loading data with default values from a parquet file, so I need to find a different way to do that.
My table requires a column with the getdate function from Redshift:
LOAD_DT TIMESTAMP DEFAULT GETDATE()
If I use the COPY command and add the column names as arguments I get the error:
Column mapping option argument is not supported for PARQUET based COPY
What is a workaround for this?
Can you post a reference for Redshift not supporting default values for a Parquet COPY? I haven't heard of this restriction.
As to work-arounds I can think of two:
1) Copy the file to a temp table and then insert from this temp table into your table with the default value (see the sketch below).
2) Define an external table that uses the parquet file as source and insert from this table into the table with the default value.
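A rough sketch of the first option; the staging table, column, bucket and role names here are made up, only LOAD_DT comes from the question:

-- staging table containing only the columns that exist in the parquet file
CREATE TEMP TABLE my_table_stage
(
    col_a VARCHAR(50),
    col_b INTEGER
);

COPY my_table_stage
FROM 's3://my-bucket/my-prefix/'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
FORMAT AS PARQUET;

-- LOAD_DT is omitted from the column list, so DEFAULT GETDATE() fills it in
INSERT INTO my_table (col_a, col_b)
SELECT col_a, col_b
FROM my_table_stage;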

Convert a BLOB to VARCHAR instead of VARCHAR FOR BIT

I have a BLOB field in a table that I am selecting. This field data consists only of JSON data.
If I do the following:
Select CAST(JSONBLOB as VARCHAR(2000)) from MyTable
--> this returns the value in VARCHAR FOR BIT DATA format.
I just want it as a standard string or varchar, not in bit format.
That is because I need to use JSON2BSON function to convert the JSON to BSON. JSON2BSON accepts a string but it will not accept a VarChar for BIT DATA type...
This conversion should be easy.
I am able to do the select as VARCHAR FOR BIT DATA, manually copy the value using the UI, paste it into a select literal and convert that to BSON. But I need to migrate a bunch of data in this BLOB from JSON to BSON, and doing it manually won't be fast. I just want to explain how simple a use case this should be.
What is the select command to essentially get this to work:
Select JSON2BSON(CAST(JSONBLOB as VARCHAR(2000))) from MyTable
--> Currently this fails because the CAST converts this (even though it is only text characters) to VARCHAR FOR BIT DATA and not standard VARCHAR.
What is the suggestion to fix this?
DB2 11 on Windows.
If the data is JSON, then the table column should be CLOB in the first place...
Having the table column be a BLOB might make sense if the data is actually already BSON.
You could change the BLOB into a CLOB using the CONVERTTOCLOB procedure; then you should be OK.
https://www.ibm.com/support/knowledgecenter/SSEPGG_11.5.0/com.ibm.db2.luw.apdv.sqlpl.doc/doc/r0055119.html
You can use this function to remove the "FOR BIT DATA" flag on a column
CREATE OR REPLACE FUNCTION DB_BINARY_TO_CHARACTER(A VARCHAR(32672 OCTETS) FOR BIT DATA)
RETURNS VARCHAR(32672 OCTETS)
NO EXTERNAL ACTION
DETERMINISTIC
BEGIN ATOMIC
RETURN A;
END
Or, if you are on Db2 11.5, the function SYSIBMADM.UTL_RAW.CAST_TO_VARCHAR2 will also work.
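With the table and column names from the question, the function above could then be used along these lines (a sketch, not tested against your data):

-- DB_BINARY_TO_CHARACTER strips the FOR BIT DATA attribute so JSON2BSON sees a plain VARCHAR
SELECT JSON2BSON(DB_BINARY_TO_CHARACTER(CAST(JSONBLOB AS VARCHAR(2000))))
FROM MyTable;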

Basic questions about Cloud SQL

I'm trying to populate a Cloud SQL database using a Cloud Storage bucket, but I'm getting some errors. The CSV has the headers (or column names) as the first row and does not have all the columns (some columns in the database can be null, so I'm only loading the data I need for now).
The database is PostgreSQL and this is the first database in GCP I'm trying to configure, so I'm a little bit confused.
Does it matter if the CSV file has the column names?
Does the order of the columns in the CSV file matter? (I guess it does if they are not all present in the CSV.)
The PK of the table is a serial number, which I'm not including in the CSV file. Do I need to include the PK as well? I mean, because it's a serial number it should be "auto assigned", right?
Sorry for the noob questions and thanks in advance :)
This is all covered by the COPY documentation.
It matters in that you will have to specify the HEADER option so that the first line is skipped:
[...] on input, the first line is ignored.
The order matters, and if the CSV file does not contain all the columns in the same order as the table, you have to specify them with COPY:
COPY mytable (col12, col2, col4, ...) FROM '/dir/afile' WITH (...);
Same as above: if you omit a table column from the column list, it will be filled with the default value, which in this case is the autogenerated number.
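If you were running the load yourself with psql, the three points above would combine roughly like this (table, columns and file path are made up; the serial PK is simply left out of the column list):

-- HEADER skips the column-name row; "id" is omitted so it is auto-assigned
COPY mytable (name, price)
FROM '/dir/afile'
WITH (FORMAT csv, HEADER true);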

How to insert (raw bytes from file data) using a plain text script

Database: Postgres 9.1
I have a table called logos defined like this:
create type image_type as enum ('png');
create table logos (
id UUID primary key,
bytes bytea not null,
type image_type not null,
created timestamp with time zone default current_timestamp not null
);
create index logo_id_idx on logos(id);
I want to be able to insert records into this table in 2 ways.
The first (and most common) way rows will be inserted in the table will be that a user will provide a PNG image file via an html file upload form. The code processing the request on the server will receive a byte array containing the data in the PNG image file and insert a record in the table using something very similar to what is explained here. There are plenty of examples of how to insert byte arrays into a postgresql field of type bytea on the internet. This is an easy exercise. An example of the insert code would look like this:
insert into logos (id, bytes, type, created) values (?, ?, ?, now())
And the bytes would be set with something like:
...
byte[] bytes = ... // read PNG file into a byte array.
...
ps.setBytes(2, bytes);
...
The second way rows will be inserted in the table will be from a plain text file script. The reason this is needed is only to populate test data into the table for automated tests, or to initialize the database with a few records for a remote development environment.
Regardless of how the data is entered in the table, the application will obviously need to be able to select the bytea data from the table and convert it back into a PNG image.
Question
How does one properly encode a byte array, to be able to insert the data from within a script, in such a way that only the original bytes contained in the file are stored in the database?
I can write code to read the file and spit out insert statements to populate the script. But I don't know how to encode the byte array for the plain text script such that when running the script from psql the image data will be the same as if the file was inserted using the setBytes JDBC code.
I would like to run the script with something like this:
psql -U username -d dataBase -a -f test_data.sql
The easiest way, IMO, to represent bytea data in an SQL file is to use the hex format:
8.4.1. bytea Hex Format
The "hex" format encodes binary data as 2 hexadecimal digits per byte, most significant nibble first. The entire string is preceded by the sequence \x (to distinguish it from the escape format). In some contexts, the initial backslash may need to be escaped by doubling it, in the same cases in which backslashes have to be doubled in escape format; details appear below. The hexadecimal digits can be either upper or lower case, and whitespace is permitted between digit pairs (but not within a digit pair nor in the starting \x sequence). The hex format is compatible with a wide range of external applications and protocols, and it tends to be faster to convert than the escape format, so its use is preferred.
Example:
SELECT E'\\xDEADBEEF';
Converting an array of bytes to hex should be trivial in any language that a sane person (such as yourself) would use to write the SQL file generator.
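For example, a line emitted by such a generator could look like the following; the UUID is made up and the hex string is just the 8-byte PNG file signature rather than a full image:

-- bytes in hex format: \x followed by two hex digits per byte, backslash doubled inside E''
INSERT INTO logos (id, bytes, type, created)
VALUES ('00000000-0000-0000-0000-000000000001', E'\\x89504E470D0A1A0A', 'png', now());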