Insert data into Redshift from Windows txt files - postgresql

I have 50 txt files on windows and I would like to insert their data into a single table on Redshift.
I created the basic table structure and now I'm having issues with inserting the data. I tried using the COPY command from SQLWorkbench/J, but it didn't work.
Here's the command:
copy feed
from 'F:\Data\feed\feed1.txt'
credentials 'aws_access_key_id=<access>;aws_secret_access_key=<key>'
Here's the error:
-----------------------------------------------
error: CREDENTIALS argument is not supported when loading from file system
code: 8001
context:
query: 0
location: xen_load_unload.cpp:333
process: padbmaster [pid=1970]
-----------------------------------------------;
Upon removing the Credentials argument, here's the error I get:
[Amazon](500310) Invalid operation: LOAD source is not supported. (Hint: only S3 or DynamoDB or EMR based load is allowed);
I'm not a UNIX user so I don't really know how this should be done. Any help in this regard would be appreciated.

#patthebug is correct in that Redshift cannot see your local Windows drive. You must push the data into an S3 bucket. There are some additional sources you can use per http://docs.aws.amazon.com/redshift/latest/dg/t_Loading_tables_with_the_COPY_command.html, but they seem outside the context you're working with. I suggest you get a copy of Cloudberry Explorer (http://www.cloudberrylab.com/free-amazon-s3-explorer-cloudfront-IAM.aspx) which you can use to copy those files up to S3.
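For example, once the files are uploaded to a bucket, the original command only needs its FROM clause pointed at S3. In the sketch below the bucket name, the AWS CLI upload step, and the tab delimiter are assumptions, not details from the question:

aws s3 cp F:\Data\feed\ s3://your-bucket/feed/ --recursive --exclude "*" --include "*.txt"

copy feed
from 's3://your-bucket/feed/feed1.txt'
credentials 'aws_access_key_id=<access>;aws_secret_access_key=<key>'
delimiter '\t';  -- the delimiter is a guess; adjust it to match the files

Pointing FROM at 's3://your-bucket/feed/' instead would load every file under that prefix in a single COPY.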

Related

Best practice for importing bulk data to AWS RDS PostgreSQL database

I have a big AWS RDS database that needs to be updated with data on a periodic basis. The data is in JSON files stored in S3 buckets.
This is my current flow:
Download all the JSON files locally
Run a ruby script to parse the JSON files to generate a CSV file matching the table in the database
Connect to RDS using psql
Use the \copy command to append the data to the table (this flow is sketched just below)
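A minimal shell sketch of that flow, in which the bucket path, the parse_to_csv.rb script name, the connection details, and the table name are all placeholders:

aws s3 cp s3://my-bucket/json/ ./json/ --recursive               # 1. download the JSON files locally
ruby parse_to_csv.rb ./json/ > import.csv                        # 2. generate a CSV matching the table
psql "host=mydb.xxxx.us-east-1.rds.amazonaws.com dbname=mydb user=me" \
     -c "\copy my_table FROM 'import.csv' WITH (FORMAT csv)"     # 3 + 4. connect and append with \copy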
I would like to switch this to an automated approach (maybe using an AWS Lambda). What are the best practices?
Approach 1:
Run a script (Ruby / JS) that parses all folders for the past period (e.g., a week) and, as each file is parsed, connect to the RDS database and execute an INSERT command. I feel this would be a very slow process with constant writes to the database, and wouldn't be optimal.
Approach 2:
I already have a Ruby script that parses local files to generate a single CSV. I can modify it to parse the S3 folders directly and create a temporary CSV file in S3. The question is - how do I then use this temporary file to do a bulk import?
Are there any other approaches that I have missed and might be better suited for my requirement?
Thanks.

Can DB2REMOTE be used to point to a file on another server?

Using the script below, I was able to load the data to the table with local files.
db2 load from SOME/LOCAL/File.txt of asc modified by reclen=123 method L \(1 11, 12 14\) REPLACE INTO schema.tablename
However, I want to load the file from another server. I don't want to transfer the files from the other server to the Db2 server just so I can use the command above. I found that DB2REMOTE can be used for remote files in this documentation, but I'm not sure how to execute it successfully.
Do I also need to do this? I don't have the right IAM role or the credentials to do so, so I would rather skip it and just connect to the other server.
This is the script I'm trying with DB2REMOTE:
db2 load from 'DB2REMOTE://centos#123.456.789.0:/folders/directory/file.txt' of asc modified by reclen=123 method L \(1 11, 12 14\) REPLACE INTO schema.tablename
Thank you in advance!
DB2REMOTE is for accessing cloud object storage (e.g. Amazon S3, IBM Cloud Object Storage) from some Db2 commands.
If you are not using cloud object storage, then mount the remote directory locally with appropriate permissions, and specify the local mountpoint in the Db2 LOAD command.
You can remote-mount with SSHFS or similar, once it is installed and properly configured. This is not programming; it is administration and configuration.
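A rough sketch of that approach, reusing the host and path from the question and assuming SSHFS is installed (the mountpoint is a placeholder):

mkdir -p /mnt/remote_feed
sshfs centos@123.456.789.0:/folders/directory /mnt/remote_feed
db2 load from /mnt/remote_feed/file.txt of asc modified by reclen=123 method L \(1 11, 12 14\) REPLACE INTO schema.tablename
fusermount -u /mnt/remote_feed    # unmount when the load is done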

uploading large file to AWS aurora postgres serverless

I have been trying for days to copy a large CSV file into a table in PostgreSQL. I am using pgAdmin 4 to access the database. The file on my system is 10 GB, so I am getting errors when trying to upload it via the UI or the \copy command.
With a 10 GB CSV file, you have a few different options:
I believe \copy should work; you did not provide any more information about the issue.
I'd personally use AWS Glue, an ETL service that can read the file from S3.
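If the pgAdmin UI is the limiting factor, running \copy from the psql command line usually copes with files of this size; a minimal sketch in which the host, table name, and CSV options are placeholders:

psql "host=mycluster.cluster-xxxx.us-east-1.rds.amazonaws.com dbname=mydb user=me" \
     -c "\copy my_table FROM '/path/to/file.csv' WITH (FORMAT csv, HEADER true)"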

Copy into running fine but not loading data from gzip file

I am using Talend bulk execution to load data from S3 to Snowflake. Talend tFileArchive converts the file to gzip format (file.csv.gz) and uploads it to the S3 bucket. The COPY INTO statement that gets executed through the Talend bulk component looks like the one below. It doesn't throw an error, but it doesn't load data either. If I try to load the CSV file without the zip, it works fine.
File: file.csv.gz
Copy into table
from 's3://bucket/'
credentials=(aws_key_id='' aws_secret_key='')
FILE_FORMAT=(type=csv compression=gzip field_delimiter=',' skip_header=1 field_optionally_enclosed_by='\"' empty_field_as_null=true)
force=true
Can someone point out where the issue is? Even if I execute the above command through the Snowflake UI, it says it ran successfully but does not load anything. The file has data.
Thank you
View your table's COPY history to see if an error was thrown or not. Then try and LIST the file in your S3 bucket to make sure your STAGE is working.
SELECT *
FROM TABLE(information_schema.copy_history(table_name=>'YourDatabase.YourSchema.YourTable'
, start_time=> dateadd(days, -1, current_timestamp())));
LIST @YourStage
OR
LIST 's3://bucket/'
Also make sure you are executing your COPY INTO statement under a role that has permission to see the s3 bucket and write to the destination table.
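Another quick check once permissions look right: name the file explicitly with the FILES option so a path or prefix mismatch is ruled out. A rough sketch, with mytable standing in for the real target table:

COPY INTO mytable
FROM 's3://bucket/'
FILES = ('file.csv.gz')
CREDENTIALS = (aws_key_id='' aws_secret_key='')
FILE_FORMAT = (type=csv compression=gzip field_delimiter=',' skip_header=1 field_optionally_enclosed_by='\"' empty_field_as_null=true);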

SQL Database + LOAD + CLOB files = error SQL3229W

I'm having trouble loading tables that have CLOB and BLOB columns in a 'SQL Database' database in Bluemix.
The error returned is:
SQL3229W The field value in row "617" and column "3" is invalid. The row was
rejected. Reason code: "1".
SQL3185W The previous error occurred while processing data from row "617" of
the input file.
The same procedure, performed in a local environment, works normally.
Below is the command used to load:
load client from /home/db2inst1/ODONTO/tmp/ODONTO.ANAMNESE.IXF OF IXF LOBS FROM /home/db2inst1/ODONTO/tmp MODIFIED BY IDENTITYOVERRIDE replace into USER12135.TESTE NONRECOVERABLE
Currently, the only way to upload LOB files to SQLDB or dashDB is to load the data and LOBs from the cloud. The options are Swift object storage in SoftLayer or Amazon S3 storage. You should have an account on one of those services.
After that, you can use the following syntax:
db2 "call sysproc.admin_cmd('load from Softlayer::softlayer_end_point::softlayer_username::softlayer_api_key::softlayer_container_name::mylobs/blob.del of del LOBS FROM Softlayer::softlayer_end_point::softlayer_username::softlayer_api_key::softlayer_container_name::mylobs/ messages on server insert into LOBLOAD')"
Where:
mylobs/ is the directory inside the Softlayer Swift object storage container, as referenced in the LOAD command
LOBLOAD is the name of the table being loaded into
Example:
db2 "call sysproc.admin_cmd('load from Softlayer::https://lon02.objectstorage.softlayer.net/auth/v1.0::SLOS424907-2:SL523907::0ac631wewqewre8af20c576ad5214ec70f163d600d247bd5d4dfef5453f72ff6::TestContainer::mylobs/blob.del of del LOBS FROM Softlayer::https://lon02.objectstorage.softlayer.net/auth/v1.0::SLOS424907-2:SL523907::0ac631wewqewre8af20c576ad5214ec70f163d600d247bd5d4dfef5453f72ff6::TestContainer::mylobs/ messages on server insert into LOBLOAD')"