COPY INTO runs fine but does not load data from gzip file - Talend

I am using Talend bulk execution to load data from S3 into Snowflake. Talend's tFileArchive converts the file to gzip format (file.csv.gz) and uploads it to an S3 bucket. The COPY INTO statement that gets executed through the Talend bulk component looks like the one below. It doesn't throw an error, but it doesn't load any data either. If I load the CSV file without zipping it, it works fine.
File: file.csv.gz
COPY INTO table
FROM 's3://bucket/'
CREDENTIALS=(AWS_KEY_ID='' AWS_SECRET_KEY='')
FILE_FORMAT=(TYPE=CSV COMPRESSION=GZIP FIELD_DELIMITER=',' SKIP_HEADER=1 FIELD_OPTIONALLY_ENCLOSED_BY='"' EMPTY_FIELD_AS_NULL=TRUE)
FORCE=TRUE
Can someone point out where the issue is? Even if I execute the above command through the Snowflake UI, it says it ran successfully but does not load anything. The file has data.
Thank you

View your table's COPY history to see whether an error was thrown. Then try to LIST the file in your S3 bucket to make sure your STAGE is working.
SELECT *
FROM TABLE(information_schema.copy_history(table_name=>'YourDatabase.YourSchema.YourTable'
, start_time=> dateadd(days, -1, current_timestamp())));
LIST @YourStage
OR
LIST 's3://bucket/'
Also make sure you are executing your COPY INTO statement under a role that has permission to read the S3 bucket and write to the destination table.
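If COPY_HISTORY shows no load at all for the file, another check (beyond the original answer) is to point COPY INTO at the gzip file explicitly and run it in validation mode; the table name, bucket path and credentials below are placeholders:
COPY INTO your_table
FROM 's3://bucket/'
CREDENTIALS=(AWS_KEY_ID='<key>' AWS_SECRET_KEY='<secret>')
FILES=('file.csv.gz')
FILE_FORMAT=(TYPE=CSV COMPRESSION=GZIP FIELD_DELIMITER=',' SKIP_HEADER=1 FIELD_OPTIONALLY_ENCLOSED_BY='"' EMPTY_FIELD_AS_NULL=TRUE)
VALIDATION_MODE=RETURN_ALL_ERRORS
VALIDATION_MODE checks the file without loading it: if it returns errors, those explain the silent failure; if it returns nothing, compare the key Talend actually uploaded with the path the COPY reads, because a COPY that matches no files also completes "successfully" with nothing loaded.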

Related

Best practice for importing bulk data to AWS RDS PostgreSQL database

I have a big AWS RDS database that needs to be updated with data on a periodic basis. The data is in JSON files stored in S3 buckets.
This is my current flow:
Download all the JSON files locally
Run a ruby script to parse the JSON files to generate a CSV file matching the table in the database
Connect to RDS using psql
Use the \copy command to append the data to the table
I would like to switch this to an automated approach (maybe using AWS Lambda). What would be the best practice?
Approach 1:
Run a script (Ruby / JS) that parses all folders in the past period (e.g., week) and within the parsing of each file, connect to the RDS db and execute an INSERT command. I feel this is a very slow process with constant writes to the database and wouldn't be optimal.
Approach 2:
I already have a Ruby script that parses local files to generate a single CSV. I can modify it to parse the S3 folders directly and create a temporary CSV file in S3. The question is - how do I then use this temporary file to do a bulk import?
Are there any other approaches that I have missed and might be better suited for my requirement?
Thanks.
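For the bulk import in approach 2, one option worth evaluating (not mentioned above, and assuming RDS for PostgreSQL with the aws_s3 extension available and an IAM role that lets the instance read the bucket) is to load the generated CSV straight from S3 with a server-side COPY, so nothing has to be streamed through a client; the table, column, bucket and key names below are placeholders:
-- one-time setup on the RDS instance
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
-- load the temporary CSV produced by the parsing script directly from S3
SELECT aws_s3.table_import_from_s3(
  'your_table',                        -- destination table
  'col1, col2, col3',                  -- column list matching the CSV
  '(FORMAT csv, HEADER true)',         -- COPY options
  aws_commons.create_s3_uri('your-bucket', 'tmp/weekly.csv', 'us-east-1')
);
A Lambda (or the existing Ruby script) can then write the CSV to S3 and issue this single statement, instead of running one INSERT per record as in approach 1.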

o110.pyWriteDynamicFrame. null

I have created a visual job in AWS Glue where I extract data from Snowflake; my target is a PostgreSQL database in AWS.
I have been able to connect to both Snowflake and PostgreSQL, and I can preview data from both.
I have also been able to get data from Snowflake, write it to S3 as CSV, and then take that CSV and load it into PostgreSQL.
However, when I try to get data from Snowflake and push it directly to PostgreSQL, I get the error below:
o110.pyWriteDynamicFrame. null
This means that you can get the data from Snowflake into a DynamicFrame, but the job fails while writing the data from that frame to PostgreSQL.
You need to check the AWS Glue logs to get a better understanding of why the write to PostgreSQL is failing.
Please check that you have the right version of the JDBC driver jars (needed by PostgreSQL) compatible with the Scala/Spark version on the AWS Glue side.

how to upload 900MB csv file from a website to postgresql

I want to do some data analysis with data from NYCopendata. The file is ~900 MB, so I am using a PostgreSQL database to store it. I am using pgAdmin 4 but could not figure out how to store the CSV directly in PostgreSQL without first downloading it to my machine. Any help is greatly appreciated.
Thanks.
You can use:
pgAdmin, to upload a CSV file from the import/export dialog
https://www.pgadmin.org/docs/pgadmin4/4.21/import_export_data.html
the COPY statement on the database server
the \copy command from psql on any client
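For the second and third options, a minimal sketch (the table name, file paths and CSV options are assumptions about your data):
-- on the database server: the file must already be there and readable by the server process
COPY nyc_data FROM '/var/lib/postgresql/import/data.csv' WITH (FORMAT csv, HEADER true);
-- from any client, inside psql: the path refers to the client machine and is streamed over the connection
\copy nyc_data FROM 'data.csv' WITH (FORMAT csv, HEADER true)
If the goal is to avoid downloading the file to your own machine, the server-side COPY only needs the file on the database server, not on the client.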

Insert data into Redshift from Windows txt files

I have 50 txt files on Windows and I would like to insert their data into a single table on Redshift.
I created the basic table structure and now I'm having issues with inserting the data. I tried using the COPY command from SQLWorkbench/J but it didn't work out.
Here's the command:
copy feed
from 'F:\Data\feed\feed1.txt'
credentials 'aws_access_key_id=<access>;aws_secret_access_key=<key>'
Here's the error:
-----------------------------------------------
error: CREDENTIALS argument is not supported when loading from file system
code: 8001
context:
query: 0
location: xen_load_unload.cpp:333
process: padbmaster [pid=1970]
-----------------------------------------------;
Upon removing the Credentials argument, here's the error I get:
[Amazon](500310) Invalid operation: LOAD source is not supported. (Hint: only S3 or DynamoDB or EMR based load is allowed);
I'm not a UNIX user so I don't really know how this should be done. Any help in this regard would be appreciated.
@patthebug is correct in that Redshift cannot see your local Windows drive. You must push the data into an S3 bucket. There are some additional sources you can use per http://docs.aws.amazon.com/redshift/latest/dg/t_Loading_tables_with_the_COPY_command.html, but they seem outside the context you're working with. I suggest you get a copy of Cloudberry Explorer (http://www.cloudberrylab.com/free-amazon-s3-explorer-cloudfront-IAM.aspx), which you can use to copy those files up to S3.
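Once the files are in S3, the COPY from the question only needs its source changed to the bucket; a sketch, where the bucket name, prefix and delimiter are assumptions about your data:
copy feed
from 's3://your-bucket/feed/'
credentials 'aws_access_key_id=<access>;aws_secret_access_key=<key>'
delimiter '\t';
Pointing COPY at the prefix loads every object under it, so all 50 files can be loaded with a single command once they are uploaded.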

Replace Existing File with Temp File:I/O Error

I have an Access 2007 database from which I call a Windows batch file, via a ribbon menu, to retrieve files from an external server. When executing the batch file manually, everything works just fine. When executing it via the Access ribbon menu, the following error appears in the command line:
Opening data connection for ...
> Replace Existing File with Temp File:I/O Error
Binary transfer complete.
I've read something about this error in relation to (admin) rights, but since the batch file actually runs when called by Access, that does not seem to be the issue.