Creating Postgres table on AWS RDS using CSV file - postgresql

I'm having an issue with creating a table on my Postgres DB on AWS RDS by importing the raw CSV data. Here are the steps I have already taken.
CSV file has been uploaded on my S3 bucket
Followed AWS's tutorial to give RDS permission to import data from S3
Created an empty table on postgres
Tried using pgAdmin's 'import' feature to import the local CSV file into the table, but it kept giving me an error.
So I'm using this query below to import the data into the table:
SELECT aws_s3.table_import_from_s3(
   'public.bayarea_property_data',
   '',
   '(FORMAT CSV, HEADER true)',
   'cottage-prop-data',
   'clean_ta_file_edit.csv',
   'us-west-1'
);
However, I keep getting this message:
ERROR: extra data after last expected column
CONTEXT: COPY bayarea_property_data, line 2: ",2009.0,2009.0,0.0,,0,2019,13061.0,,0,0.0,0.0,,2019,0.0,6767.0,576040,172810,403230,70,1,,1.0,,6081,..."
SQL statement "copy public.bayarea_property_data from '/rdsdbdata/extensions/aws_s3/amazon-s3-fifo-6261-20200819T083314Z-0' with (FORMAT CSV, HEADER true)"
SQL state: 22P04
Can anyone help me with this? I'm an AWS noob, so I have been struggling over the past few days. Thanks!
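As an aside for anyone hitting the same thing: "extra data after last expected column" means a CSV row has more fields than the target table has columns, so the empty table has to be created with one column per CSV field, in the same order as the file header. A quick sanity check against the catalog (table name taken from the query above; counting the header fields of the CSV is left as a comment):
-- count the columns of the target table and compare with the number of fields
-- in the CSV header (e.g. open clean_ta_file_edit.csv and count the
-- comma-separated values on the first line); the two numbers must match
SELECT count(*) AS table_columns
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name = 'bayarea_property_data';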

Related

Copy data file from AWS S3 to Aurora Postgres

Trying to copy a CSV file from AWS S3 to Aurora Postgres.
I did add S3 access to the RDS instance for the S3 import.
Is there anything else I am missing?
This is the command that I tried:
SELECT aws_s3.table_import_from_s3(
   't1',
   '',
   'DELIMITER '','' CSV HEADER',
   aws_commons.create_s3_uri('testing', 'test_1.csv', 'us-west-2')
);
Error:
NOTICE: HINT: make sure your instance is able to connect with S3.
NOTICE: CURL error code: 28 when attempting to validate pre-signed URL, 0 attempt(s) remaining
NOTICE: HINT: make sure your instance is able to connect with S3.
ERROR: Unable to generate pre-signed url, look at engine log for details.
CONTEXT: SQL function "table_import_from_s3" statement 1
Can anyone help me with this, please?

Import CSV parts from S3 into RDS Aurora PostgreSQL

I spent some time fiddling with the details of the AWS S3 extension for Postgres described here: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/postgresql-s3-export.html#postgresql-s3-export-access-bucket (Postgres extension configuration, roles, policies, function input details).
I want to easily export, then import, huge tables for testing purposes (indexes, generated columns, partitions, etc.) to optimize database performance.
I am using this extension because I want to avoid using my laptop to store the file, with commands like the following, which involve a lot of network I/O and are affected by slow internet connections, broken pipes when the connection is nuked by the operating system after a while, and other problems related to huge tables:
# store CSV from S3 to local
aws s3 cp s3://my_bucket/my_sub_path/my_file.csv /my_local_directory/my_file.csv
# import from local CSV to AWS RDS Aurora PostgreSQL
psql -h my_rds.amazonaws.com -U my_username -d my_dbname -c '\COPY table FROM ''my_file.csv'' CSV HEADER'
I managed to export a very big table (160GB) into CSV files on S3 with:
SELECT * FROM aws_s3.query_export_to_s3(
   'SELECT * FROM my_schema.my_large_table',
   aws_commons.create_s3_uri(
      'my_bucket/my_subpath',
      'my_file.csv',
      'eu-central-1'
   ),
   options := 'format csv'
);
However, this ends up as lots of "part files" in S3:
the first one with that same CSV filename my_file.csv
all the others like my_file.csv_part2 ... my_file.csv_part20 and so on
Now, I don't think this is a problem as long as I am able to import the CSV data back somewhere else in AWS RDS Aurora (PostgreSQL), although I am not sure what strategies could be applied here: whether it's better to have all these CSV files, or whether I can configure the export to use only one huge CSV file (160GB).
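As a side note on the part files: aws_s3.query_export_to_s3 reports how many files it wrote (rows_uploaded, files_uploaded, bytes_uploaded), which could be captured to drive the import loop later instead of hard-coding the part count. A sketch, reusing the same names as the export above:
-- capture how many part files the export produced instead of guessing
SELECT rows_uploaded, files_uploaded, bytes_uploaded
FROM aws_s3.query_export_to_s3(
   'SELECT * FROM my_schema.my_large_table',
   aws_commons.create_s3_uri('my_bucket/my_subpath', 'my_file.csv', 'eu-central-1'),
   options := 'format csv'
);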
Now the import stuff (https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PostgreSQL.S3Import.html):
Turns out I have to import all these "part files" with PL/pgSQL, but then I get lost in the details of how to format those strings for the S3 paths, and in general I see all sorts of errors (both export and import). One file import takes around 20 minutes, so it's quite frustrating to figure out what is going wrong.
What's wrong with the source code / error below?
Is there a better way to handle all this export/import at scale (160GB tables)?
DO $$
DECLARE
   my_csv_s3_sub_path text;
BEGIN
   FOR cnt IN 2..26 LOOP
      my_csv_s3_sub_path := 'my_subpath/my_file.csv_part' || cnt;
      RAISE NOTICE '% START loading CSV file % from S3', now(), cnt;
      SELECT aws_s3.table_import_from_s3(
         'my_schema.my_large_table_new',
         '',
         '(format csv)',
         aws_commons.create_s3_uri(
            'my_bucket',
            my_csv_s3_sub_path,
            'eu-central-1'
         )
      );
      RAISE NOTICE '% STOP loading CSV file % from S3', now(), cnt;
   END LOOP;
END; $$
The code above gives:
SQL Error [42601]: ERROR: query has no destination for result data
Hint: If you want to discard the results of a SELECT, use PERFORM instead.
Where: PL/pgSQL function inline_code_block line 8 at SQL statement
I think it's related to variables and string interpolation because I need to dynamically generate the CSV file name in S3 to be used in the Postgres AWS extension.
But I had all sorts of other errors before, e.g. an export/import inconsistency in the syntax around the S3 bucket sub-path that led the Postgres AWS S3 extension to throw an HTTP 400 error:
SQL Error [XX000]: ERROR: HTTP 400. Check your arguments and try again. Where: SQL function "table_import_from_s3" statement 1
Is there a better alternative to export/import huge tables from/to AWS RDS Aurora PostgreSQL?
The solution was to:
use PERFORM instead of SELECT when running aws_s3.table_import_from_s3 inside a PL/pgSQL block,
loop over all the S3 paths to the CSV file parts, e.g. my_subpath/my_file.csv_part1 to my_subpath/my_file.csv_part26 (bear in mind there's also the "part 0", my_subpath/my_file.csv; see the full loop sketch further below),
create the table index AFTER the data ingestion above.
-- this goes into the loop for all the CSV parts
PERFORM aws_s3.table_import_from_s3(
   'my_schema.my_large_table_new',
   '',
   '(format csv)',
   aws_commons.create_s3_uri(
      'my_bucket',
      'my_subpath/my_file.csv_part26',
      'eu-central-1'
   )
);
-- then AFTER the CSV ingestion create the index on the table
CREATE INDEX my_dx ON my_schema.my_large_table_new USING btree (my_column);
This still took one day to process all the CSV files of 6GB each, which is not very practical for most scenarios.
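Putting those three points together, a sketch of the full loop as I ended up understanding it (bucket, paths, table name and the part range 2..26 come from the question above; adjust the range to however many parts the export actually produced):
DO $$
DECLARE
   my_csv_s3_sub_path text;
BEGIN
   -- the first exported file has no _partN suffix
   PERFORM aws_s3.table_import_from_s3(
      'my_schema.my_large_table_new',
      '',
      '(format csv)',
      aws_commons.create_s3_uri('my_bucket', 'my_subpath/my_file.csv', 'eu-central-1')
   );
   -- then the remaining part files (range taken from the question; adjust as needed)
   FOR cnt IN 2..26 LOOP
      my_csv_s3_sub_path := 'my_subpath/my_file.csv_part' || cnt;
      RAISE NOTICE '% START loading CSV part % from S3', now(), cnt;
      PERFORM aws_s3.table_import_from_s3(
         'my_schema.my_large_table_new',
         '',
         '(format csv)',
         aws_commons.create_s3_uri('my_bucket', my_csv_s3_sub_path, 'eu-central-1')
      );
      RAISE NOTICE '% STOP loading CSV part % from S3', now(), cnt;
   END LOOP;
END;
$$;
-- only after all the parts are loaded, build the index
CREATE INDEX my_dx ON my_schema.my_large_table_new USING btree (my_column);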
For the sake of SQL completeness, make sure the Postgres extension is installed and configured like this:
DROP EXTENSION aws_s3;
DROP EXTENSION aws_commons;
CREATE EXTENSION aws_s3 CASCADE;
You will still have to configure policies, roles and all of that on AWS.
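A quick way to double-check that both extensions actually ended up installed (plain Postgres catalog query, nothing AWS-specific):
-- verify the aws_s3 and aws_commons extensions are present
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('aws_s3', 'aws_commons');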

Error when trying to import with CSV file format in Cloud SQL

HTTPError 400: Unknown export file type was thrown when I tried to import a CSV file from my Cloud Storage bucket into my Cloud SQL DB. Any idea what I missed?
Reference:
gcloud sql import csv
CSV files are not supported in Cloud SQL for MS SQL Server. As mentioned here,
"In Cloud SQL, SQL Server currently supports importing databases using SQL and BAK files."
However, it is supported for the MySQL and PostgreSQL versions of Cloud SQL.
You could apply one of the following solutions:
Change the database engine to either PostgreSQL or MySQL (where CSV files are supported).
If the data in your CSV file came from an on-premises SQL Server DB table, you can create a SQL file from it, then use it to import into Cloud SQL for SQL Server (see the sketch below).
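For illustration only, a minimal sketch of what such a SQL file could look like once the CSV rows are converted (table and column names are made up):
-- hypothetical example.sql generated from a CSV export of an on-premises table
CREATE TABLE sales (
   id INT,
   region VARCHAR(50),
   amount DECIMAL(10, 2)
);
INSERT INTO sales (id, region, amount) VALUES
   (1, 'west', 120.50),
   (2, 'east', 98.00);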

Importing a CSV file to PostgreSQL

I am trying to import a CSV file into a Postgres database. The CSV file is on a local server while the DB is on another server. From what I've seen, the recommendation was to use \copy.
\COPY tablename from '/path/to/local/file.csv' with csv
But it shows me a syntax error, and the problem is at \copy.
Any suggestions on how I can approach this problem?
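For what it's worth, \copy is a psql meta-command rather than server-side SQL, so it only works when issued from psql (not from a GUI query tool) and has to stay on a single line; a sketch with placeholder connection details:
# run this from a shell on the machine that has the CSV file;
# \copy streams the file from there to the remote database server
psql -h my_db_host -U my_username -d my_dbname \
  -c "\COPY tablename FROM '/path/to/local/file.csv' WITH CSV"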

How do I import a .csv file from a remote server to a PostgreSQL database?

The original code is a simple MySQL import:
LOAD DATA LOCAL INFILE 'D:/FTP/foo/foo.csv'
INTO TABLE error_logs
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY ''
LINES STARTING BY ''
TERMINATED BY '\n'
IGNORE 1 LINES
(Server,Client,Error,Time);
I need to migrate a web portal (from SQL to Postgres; I know there are tools for that, but that's not the question) and the issue is that I am no longer working locally.
I didn't see anybody ask the question in this way: import a .csv from a remote server to a Postgres DB.
I think I have to use COPY but I don't get the right syntax...
Thanks for your attention.
The copy command is an option to do this.
I had to do it once.
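To make that concrete, a rough psql \copy equivalent of the LOAD DATA statement above (my own mapping of the options: FIELDS TERMINATED BY becomes DELIMITER, ENCLOSED BY becomes QUOTE, IGNORE 1 LINES becomes HEADER; it assumes the Postgres table error_logs has matching column names and that psql runs on a machine that can read D:/FTP/foo/foo.csv):
-- \copy must stay on one line; it reads the file client-side and sends it to the server
\copy error_logs (Server, Client, Error, Time) FROM 'D:/FTP/foo/foo.csv' WITH (FORMAT csv, DELIMITER ',', QUOTE '"', HEADER true)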
How to import CSV file data into a PostgreSQL table?
Copying PostgreSQL database to another server