Why is the Redshift UNLOAD query not able to quote columns correctly? - amazon-redshift

I have the following row in an AWS Redshift warehouse table.
name
-------------------
tokenauthserver2018
I queried it via a simple SELECT query:
SELECT name
FROM tablename
When I try to unload it using an UNLOAD query from AWS Redshift, it finishes successfully but gives weird quoting:
"name"
"tokenauthserver2018\
Here is my query
UNLOAD ($TABLE_QUERY$
SELECT name
FROM tablename
$TABLE_QUERY$)
TO 's3://bucket/folder'
MANIFEST VERBOSE HEADER DELIMITER AS ','
NULL AS '' ESCAPE GZIP ADDQUOTES ALLOWOVERWRITE PARALLEL OFF;
I tried unloading without ADDQUOTES as well, but got the following data:
name
"tokenauthserver2018
This is the query for the above:
UNLOAD ($TABLE_QUERY$
SELECT name
FROM tablename
$TABLE_QUERY$)
TO 's3://bucket/folder'
MANIFEST VERBOSE HEADER CSV NULL AS '' GZIP ALLOWOVERWRITE PARALLEL OFF;

Amazon support was able to resolve this; I am posting the answer here for anyone interested.
This was due to the presence of the NULL character \0 in my data. As I don't have control over the source data, I used the TRANSLATE function to replace the \0 character:
SELECT
TRANSLATE("name", CHR(0), '') AS "name"
FROM <tablename>
Reference: https://docs.aws.amazon.com/redshift/latest/dg/r_TRANSLATE.html
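For reference, folding that fix into the second UNLOAD from the question would look something like this (a sketch only; the table, column, and bucket names are the placeholders used above):

```sql
-- Strip NUL (\0) characters before unloading, so the CSV quoting
-- is not broken by embedded NULs.
UNLOAD ($TABLE_QUERY$
SELECT TRANSLATE("name", CHR(0), '') AS "name"
FROM tablename
$TABLE_QUERY$)
TO 's3://bucket/folder'
MANIFEST VERBOSE HEADER CSV NULL AS '' GZIP ALLOWOVERWRITE PARALLEL OFF;
```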

Related

How to add ONE column to ALL tables in postgresql schema

The question is pretty simple, but I can't seem to find a concrete answer anywhere.
I need to update all tables inside my postgresql schema to include a timestamp column with default NOW(). I'm wondering how I can do this via a query instead of having to go to each individual table. There are several hundred tables in the schema and they all just need to have the one column added with the default value.
Any help would be greatly appreciated!
The easy way is with psql: run a query to generate the commands, then save and run the results.
-- Turn off headers:
\t
-- Use SQL to build SQL:
SELECT 'ALTER TABLE public.' || table_name || ' add fecha timestamp not null default now();'
FROM information_schema.tables
WHERE table_type = 'BASE TABLE' AND table_schema='public';
-- If the output looks good, write it to a file and run it:
\g out.tmp
\i out.tmp
-- or, if you don't want the temporary file, use gexec to run it:
\gexec
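If you prefer to drive this from a script instead of psql, the same SQL-builds-SQL idea can be sketched in plain Python. This only builds the statements (the table names below are hypothetical); actually running them would require a PostgreSQL connection:

```python
# Build ALTER TABLE statements for a list of tables, mirroring the
# information_schema query above. Table names here are hypothetical.
def build_alter_statements(table_names, schema="public",
                           column="fecha", col_type="timestamp"):
    template = ("ALTER TABLE {schema}.{table} "
                "ADD {column} {col_type} NOT NULL DEFAULT now();")
    return [template.format(schema=schema, table=t,
                            column=column, col_type=col_type)
            for t in table_names]

for stmt in build_alter_statements(["employees", "orders"]):
    print(stmt)
```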

stl_load_errors returning invalid timestamp format I can't figure out

I'm trying to use the COPY function to load a table in Redshift. I've set up this particular field that keeps failing as a standard timestamp in my schema, because I don't know why it would be anything otherwise. But when I run this statement:
copy sample_table
from 's3://aws-bucket/data_push_2018-10-05.txt'
credentials 'aws_access_key_id=XXXXXXXXXXXXXXXXXXXX;aws_secret_access_key=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/XXX'
dateformat 'auto'
ignoreheader 1;
It keeps returning this error: Invalid timestamp format or value [YYYY-MM-DD HH24:MI:SS]
raw_field_value: "2018-08-29 15:04:52"
raw_line: 12039752|311525|"67daf211abbe11e8b0010a28385dd2bc"|98953|"2018-08-20"|"2018-11-30"|"active"|"risk"|||||||"sample"|15750|0|"2018-08-29 15:04:52"|"2018-08-29 16:05:01"
There is a very similar table in our database (that I did not make) which has the aforementioned error value as a timestamp, and values for that field identical to 2018-08-29 15:04:52. So what's happening when I run it that's causing the issue?
Your COPY command seems OK, but it looks like you are missing the FORMAT AS CSV, QUOTE AS '"', and DELIMITER AS '|' parameters; with those it should work.
Here I'm using some sample data and a command to prove my case. To keep it simple, I made the table small, but covered all your data points.
create table sample_table(
salesid integer not null,
category varchar(100),
created_at timestamp,
update_at timestamp );
Here is your sample data, test_file.csv:
12039752|"67daf211abbe11e8b0010a28385dd2bc"|"2018-08-29 11:04:52"|"2018-08-29 14:05:01"
12039754|"67daf211abbe11e8b0010a2838cccddbc"|"2018-08-29 15:04:52"|"2018-08-29 16:05:01"
12039755|"67daf211abbe11e8b0010a28385ff2bc"|"2018-08-29 12:04:52"|"2018-08-29 13:05:01"
12039756|"67daf211abbe11e8b0010a28385bb2bc"|"2018-08-29 10:04:52"|"2018-08-29 15:05:01"
Here is your COPY command:
COPY sample_table FROM 's3://path/to/csv/test_file.csv' CREDENTIALS 'aws_access_key_id=XXXXXXXXXXX;aws_secret_access_key=XXXXXXXXX' FORMAT as CSV QUOTE AS '"' DELIMITER AS '|';
It returns:
INFO: Load into table 'sample_table' completed, 4 record(s) loaded successfully.
COPY
This command works fine, but if there are more issues with your data you could try the MAXERROR option as well.
Hope it answers your question.
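When a load keeps failing like this, it can also help to sanity-check the raw line locally before re-running COPY. A small sketch (the line is the raw_line from the question; the timestamp column positions are assumptions read off that line):

```python
import csv
from datetime import datetime
from io import StringIO

raw_line = ('12039752|311525|"67daf211abbe11e8b0010a28385dd2bc"|98953|'
            '"2018-08-20"|"2018-11-30"|"active"|"risk"|||||||"sample"|'
            '15750|0|"2018-08-29 15:04:52"|"2018-08-29 16:05:01"')

# Parse the line the way COPY ... FORMAT AS CSV DELIMITER '|' would:
fields = next(csv.reader(StringIO(raw_line), delimiter='|', quotechar='"'))

# The two trailing fields should parse as timestamps (positions assumed
# from the raw_line above); strptime raises ValueError if they don't.
for value in fields[-2:]:
    datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
print(fields[-2])  # 2018-08-29 15:04:52
```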

SQL server Openquery equivalent to PostgresQL

Is there a query equivalent to SQL Server's OPENQUERY or OPENROWSET to use in PostgreSQL to query from Excel or CSV?
You can use PostgreSQL's COPY
As per doc:
COPY moves data between PostgreSQL tables and standard file-system
files. COPY TO copies the contents of a table to a file, while COPY
FROM copies data from a file to a table (appending the data to
whatever is in the table already). COPY TO can also copy the results
of a SELECT query
COPY works like this:
Importing a table from CSV
Assuming you already have a table in place with the right columns, the command is as follows
COPY tblemployee FROM '~/empsource.csv' DELIMITER ',' CSV;
Exporting a CSV from a table.
COPY (select * from tblemployee) TO '~/exp_tblemployee.csv' DELIMITER ',' CSV;
It's important to mention here that, generally, if your data is in Unicode or needs strict encoding, you should always set client_encoding before running any of the above commands.
To set the client_encoding parameter in PostgreSQL:
set client_encoding to 'UTF8'
or
set client_encoding to 'latin1'
Another thing to guard against is nulls while exporting: if some fields are null, PostgreSQL will write '\N' to represent a null field. This is fine, but may cause issues if you are trying to import that data into, say, SQL Server.
A quick fix is to modify the export command, specifying what you would prefer as the null placeholder in the exported CSV:
COPY (select * from tblemployee) TO '~/exp_tblemployee.csv' DELIMITER ',' NULL as E'';
Another common requirement is import or export with the header.
Import a CSV into a table, with the header for the columns present in the first row of the CSV file:
COPY tblemployee FROM '~/empsource.csv' DELIMITER ',' CSV HEADER;
Export a table to CSV with headers present in the first row:
COPY (select * from tblemployee) TO '~/exp_tblemployee.csv' DELIMITER ',' CSV HEADER;
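The header behaviour is the same as what you'd get from any CSV library; here is a quick local illustration in Python (the file name and columns are hypothetical):

```python
import csv

rows = [("Alice", "HR"), ("Bob", "IT")]

# Export with a header row, comparable to COPY ... TO ... CSV HEADER:
with open("exp_tblemployee.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "dept"])   # header in the first row
    writer.writerows(rows)

# Import, skipping the header, comparable to COPY ... FROM ... CSV HEADER:
with open("exp_tblemployee.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    data = [tuple(r) for r in reader]

print(header)  # ['name', 'dept']
```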

Unloading from redshift to s3 with headers

I already know how to unload a file from redshift into s3 as one file. I need to know how to unload with the column headers. Can anyone please help or give me a clue?
I don't want to manually have to do it in shell or python.
As of cluster version 1.0.3945, Redshift supports unloading data to S3 with a header row in each file, i.e.:
UNLOAD('select column1, column2 from mytable;')
TO 's3://bucket/prefix/'
IAM_ROLE '<role arn>'
HEADER;
Note: you can't use the HEADER option in conjunction with FIXEDWIDTH.
https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html
If any of your columns are non-character, then you need to explicitly cast them as char or varchar because the UNION forces a cast.
Here is an example of the full statement that will create a file in S3 with the headers in the first row.
The output file will be a single CSV file with quotes.
This example assumes numeric values in column_1. You will need to adjust the ORDER BY clause to a numeric column to ensure the header row is in row 1 of the S3 file.
/* Redshift export to S3 CSV single file with headers - limit 6.2GB */
UNLOAD ('
SELECT \'column_1\',\'column_2\'
UNION
SELECT
CAST(column_1 AS varchar(255)) AS column_1,
CAST(column_2 AS varchar(255)) AS column_2
FROM source_table_for_export_to_s3
ORDER BY 1 DESC
;
')
TO 's3://bucket/path/file_name_for_table_export_in_s3_' credentials
'aws_access_key_id=<key_with_no_<>_brackets>;aws_secret_access_key=<secret_access_key_with_no_<>_brackets>'
PARALLEL OFF
ESCAPE
ADDQUOTES
DELIMITER ','
ALLOWOVERWRITE
GZIP
;
There is no direct option provided by Redshift UNLOAD, but we can tweak the query to generate files with a header row added.
First we will try the PARALLEL OFF option so that it creates only one file.
"By default, UNLOAD writes data in parallel to multiple files, according to the number of slices in the cluster. The default option is ON or TRUE. If PARALLEL is OFF or FALSE, UNLOAD writes to one or more data files serially, sorted absolutely according to the ORDER BY clause, if one is used. The maximum size for a data file is 6.2 GB. So, for example, if you unload 13.4 GB of data, UNLOAD creates three files: two of 6.2 GB and one of 1.0 GB."
To get headers in the unload files, we will do as below.
Suppose you have a table as below:
create table mytable
(
name varchar(64) default NULL,
address varchar(512) default NULL
)
Then use a select command like the one below in your UNLOAD to add the headers:
( select 'name','address') union ( select name,address from mytable )
This will add the headers name and address as the first line of your output.
Just to complement the answer: to ensure the header row comes first, you don't have to order by a specific column of the data. You can enclose the UNIONed selects inside another select, add an ordinal column to them, and then, in the outer select, order by that column without including it in the list of selected columns.
UNLOAD ('
SELECT column_1, column_2 FROM (
SELECT 1 AS i, \'column_1\' AS column_1, \'column_2\' AS column_2
UNION ALL
SELECT 2 AS i, column_1::varchar(255), column_2::varchar(255)
FROM source_table_for_export_to_s3
) t ORDER BY i
')
TO 's3://bucket/path/file_name_for_table_export_in_s3_'
CREDENTIALS
'aws_access_key_id=...;aws_secret_access_key=...'
DELIMITER ','
PARALLEL OFF
ESCAPE
ADDQUOTES;
Redshift now supports UNLOAD with headers (September 19–October 10, 2018 release).
The syntax for unloading with headers is:
UNLOAD ('select-statement')
TO 's3://object-path/name-prefix'
authorization
HEADER
Unfortunately, the UNLOAD command doesn't natively support this feature (see other answers for how to do it with workarounds).
I've posted a feature request on the AWS forums, so hopefully it gets added someday.
Edit: The feature has now been implemented natively in Redshift! 🎉
Try like this:
Unload VENUE with a Header:
unload ('select * from venue where venueseats > 75000')
to 's3://mybucket/unload/'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
header
parallel off;
The following shows the contents of the output file with a header row:
venueid|venuename|venuecity|venuestate|venueseats
6|New York Giants Stadium|East Rutherford|NJ|80242
78|INVESCO Field|Denver|CO|76125
83|FedExField|Landover|MD|91704
79|Arrowhead Stadium|Kansas City|MO|79451
To make the process easier, you can use a pre-built Docker image to extract and include the header row:
https://github.com/openbridge/ob_redshift_unload
It will also do a few other things, but it seemed to make sense to package this in an easy-to-use format.
To unload a table as CSV to S3, including the headers, you simply have to do it this way:
UNLOAD ('SELECT * FROM {schema}.{table}')
TO 's3://{s3_bucket}/{s3_key}/{table}/'
with credentials
'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
CSV HEADER ALLOWOVERWRITE PARALLEL OFF;

exporting to csv from db2 with no delimiter

I need to export the content of a DB2 table to a CSV file.
I read that nochardel would prevent having the separator between each data value, but that is not happening.
Suppose I have a table
MY_TABLE
-----------------------
Field_A varchar(10)
Field_B varchar(10)
Field_C varchar(10)
I am using this command
export to myfile.csv of del modified by nochardel select * from MY_TABLE
I get this written into myfile.csv:
data1 ,data2 ,data3
but I would like no ',' separator, like below:
data1 data2 data3
Is there a way to do that?
You're asking how to eliminate the comma (,) in a comma separated values file? :-)
NOCHARDEL tells DB2 not to surround character-fields (CHAR and VARCHAR fields) with a character-field-delimiter (default is the double quote " character).
Anyway, when exporting from DB2 using the delimited format, you have to have some kind of column delimiter. There isn't a NOCOLDEL option for delimited files.
The EXPORT utility can't write fixed-length (positional) records - you would have to do this by either:
Writing a program yourself,
Using a separate utility (IBM sells the High Performance Unload utility), or
Writing an SQL statement that concatenates the individual columns into a single string.
Here's an example for the last option:
export to file.del
of del
modified by nochardel
select
cast(col1 as char(20)) ||
cast(intcol as char(10)) ||
cast(deccol as char(30))
from my_table;
This last option can be a pain, since DB2 doesn't have an sprintf() function to help format strings nicely.
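The CAST(... AS CHAR(n)) trick works because CHAR(n) is blank-padded to a fixed width, so concatenating the casts yields positional records. The same padding can be sketched in Python (the widths are the ones from the example above; values are made up):

```python
# Emulate DB2's blank-padded CHAR(n) concatenation for a fixed-width record.
def fixed_width_record(values_and_widths):
    # Each value is left-justified and padded with spaces to its width,
    # like CAST(col AS CHAR(width)) in the EXPORT statement above.
    return "".join(str(v).ljust(w) for v, w in values_and_widths)

record = fixed_width_record([("data1", 20), (42, 10), ("3.14", 30)])
print(len(record))  # 60
```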
Yes there is another way of doing this. I always do this:
Put the select statement into a file (input.sql):
select
cast(col1 as char(20)),
cast(col2 as char(10)),
cast(col3 as char(30))
from my_table;
Call db2 clp like this:
db2 -x -tf input.sql -r result.txt
This will work for you because you need to cast varchar to char. Like Ian said, casting numbers or other data types to char might bring unexpected results.
PS: I think Ian points right on the difference between CSV and fixed-length format ;-)
Use "of asc" instead of "of del". Then you can specify the fixed column locations instead of delimiting.