I am facing an issue while accessing empty Redshift tables from a GlueContext DynamicFrame. The table is visible in the Glue Catalog, but when I try accessing it from a Glue ETL job, it throws an error on the line of code below:
glueContext.create_dynamic_frame.from_catalog(
    database=<redshift_database>,
    table_name=<table_name>,
    redshift_tmp_dir=<redshift_temp>,
    transformation_ctx=<transformation_ctx>
)
If I insert one row into the Redshift table and run the job, it runs successfully.
I started facing this issue on Monday, April 27, 2020. Before that, I was able to run Glue ETL jobs on empty Redshift tables.
Did anyone else face this issue?
Why is the Glue job failing for empty Redshift tables?
Error Message:
An error occurred while calling o118.getDynamicFrame. The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: BD60647947F6BA52; S3 Extended Request ID: JuIqVpL2nJuxdVtR4pgK/kH5TamNFlcFC7EfMOpdxgT/1tlBy/nnPnPcsqurIf24zaDKAcbw0Hk=)
Update: it's working fine now, without any changes on my side.
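If this comes back, one workaround sketch (not an official fix; it assumes the failure is the UNLOAD of an empty table leaving no files under redshift_tmp_dir, and the fallback schema below is a placeholder you would replace with your real one) is to catch the error and substitute an empty DynamicFrame:

from awsglue.dynamicframe import DynamicFrame
from py4j.protocol import Py4JJavaError

def read_table_or_empty(glue_context, spark, database, table_name, redshift_tmp_dir, ctx):
    # Try the normal catalog read first.
    try:
        return glue_context.create_dynamic_frame.from_catalog(
            database=database,
            table_name=table_name,
            redshift_tmp_dir=redshift_tmp_dir,
            transformation_ctx=ctx,
        )
    except Py4JJavaError as e:
        # Assumed failure mode: S3 404 NoSuchKey when the source table is empty.
        if "NoSuchKey" not in str(e):
            raise
        # Fall back to an empty DynamicFrame with a placeholder schema.
        empty_df = spark.createDataFrame([], schema="id INT")
        return DynamicFrame.fromDF(empty_df, glue_context, ctx + "_empty")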
Related
I have created a visual job in AWS Glue where I extract data from Snowflake; my target is a PostgreSQL database in AWS.
I have been able to connect to both Snowflake and PostgreSQL, and I can preview data from both.
I have also been able to get data from Snowflake, write it to S3 as CSV, and then load that CSV into PostgreSQL.
However, when I try to get data from Snowflake and push it to PostgreSQL directly, I get the error below:
o110.pyWriteDynamicFrame. null
So it means that you can get the data from Snowflake into a DynamicFrame, but the job fails while writing that frame to Postgres.
You need to check the AWS Glue logs to get a better understanding of why the write to Postgres is failing.
Please check that you have the right version of the JDBC driver JARs (needed by Postgres), compatible with the Scala/Spark version on the AWS Glue side.
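If the catalog-target write keeps failing with a null error, a rough sketch worth trying (assuming you already have a Glue JDBC connection for the Postgres database; the connection, table, and frame names below are placeholders) is to write through that connection directly so the driver error surfaces in the job logs:

# Sketch: write the Snowflake-sourced DynamicFrame to Postgres via an existing
# Glue JDBC connection. All names below are placeholders.
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=snowflake_frame,
    catalog_connection="postgres-connection",
    connection_options={
        "dbtable": "public.target_table",
        "database": "target_db",
    },
    transformation_ctx="write_postgres",
)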
I have Data Catalog tables generated by crawlers: one data source is MongoDB and the second is PostgreSQL (RDS). The crawlers run successfully and the connection tests pass.
I am trying to define an ETL job from MongoDB to PostgreSQL (a simple transform).
In the job I defined the source as the AWS Glue Data Catalog (MongoDB) table and the target as the Data Catalog PostgreSQL table.
When I run the job I get this error:
IllegalArgumentException: Missing collection name. Set via the 'spark.mongodb.input.uri' or 'spark.mongodb.input.collection' property
It looks like this is related to the MongoDB part. I tried setting the 'database' and 'collection' parameters on the Data Catalog table and it didn't help.
The script generated for the source is:
AWSGlueDataCatalog_node1653400663056 = glueContext.create_dynamic_frame.from_catalog(
    database="data-catalog-db",
    table_name="data-catalog-table",
    transformation_ctx="AWSGlueDataCatalog_node1653400663056"
)
What could be missing?
I had the same problem; just add the additional_options parameter shown below.
AWSGlueDataCatalog_node1653400663056 = glueContext.create_dynamic_frame.from_catalog(
    database="data-catalog-db",
    table_name="data-catalog-table",
    transformation_ctx="AWSGlueDataCatalog_node1653400663056",
    additional_options={"database": "data-catalog-db",
                        "collection": "data-catalog-table"}
)
Additional parameters can be found in the AWS documentation:
https://docs.aws.amazon.com/glue/latest/dg/connection-mongodb.html
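Alternatively, a sketch based on that page (the URI, credentials, and transformation_ctx below are placeholders) is to skip the catalog table and read MongoDB directly with from_options, passing the database and collection explicitly:

# Sketch: read MongoDB directly, supplying database and collection explicitly.
# All connection values are placeholders.
mongo_frame = glueContext.create_dynamic_frame.from_options(
    connection_type="mongodb",
    connection_options={
        "uri": "mongodb://<host>:27017",
        "username": "<user>",
        "password": "<password>",
        "database": "data-catalog-db",
        "collection": "data-catalog-table",
    },
    transformation_ctx="mongo_source",
)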
I'm having an issue populating a table in my PostgreSQL DB on AWS RDS by importing raw CSV data. Here are the steps I have already done:
Uploaded the CSV file to my S3 bucket
Followed AWS's tutorial to give RDS permission to import data from S3
Created an empty table in Postgres
Tried using pgAdmin's 'Import' feature to load the local CSV file into the table, but it kept giving me an error
So I'm using the query below to import the data into the table:
SELECT aws_s3.table_import_from_s3(
'public.bayarea_property_data',
'',
'(FORMAT CSV, HEADER true)',
'cottage-prop-data',
'clean_ta_file_edit.csv',
'us-west-1'
);
However, I keep getting this message:
ERROR: extra data after last expected column
CONTEXT: COPY bayarea_property_data, line 2: ",2009.0,2009.0,0.0,,0,2019,13061.0,,0,0.0,0.0,,2019,0.0,6767.0,576040,172810,403230,70,1,,1.0,,6081,..."
SQL statement "copy public.bayarea_property_data from '/rdsdbdata/extensions/aws_s3/amazon-s3-fifo-6261-20200819T083314Z-0' with (FORMAT CSV, HEADER true)"
SQL state: 22P04
Can anyone help me with this? I'm an AWS noob and have been struggling for the past few days. Thanks!
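That error usually means a row in the CSV has more fields than the target table has columns. A quick way to compare the two (a rough sketch; psycopg2, local access to the CSV, and the connection details are all assumptions):

import csv
import psycopg2  # assumed available; connection details are placeholders

# Count the columns in the CSV header.
with open("clean_ta_file_edit.csv", newline="") as f:
    header = next(csv.reader(f))
print("CSV columns:", len(header))

# Count the columns in the target table.
conn = psycopg2.connect(host="<rds-endpoint>", dbname="<db>", user="<user>", password="<password>")
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT count(*) FROM information_schema.columns "
        "WHERE table_schema = 'public' AND table_name = 'bayarea_property_data'"
    )
    print("Table columns:", cur.fetchone()[0])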
Trying to follow https://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum.html to query data in S3 from Redshift via Athena.
Running into an error when attempting to create the schema in Step 3:
"create external schema athena_schema from data catalog
database 'sampledb'
iam_role 'arn:aws:iam::<>:role/MySpectrumRole'
region 'us-east-1';"
Error: "line 1:8: no viable alternative at input 'create external'
(service: amazonathena; status code: 400; error code: invalidrequestexception;"
Any suggestions on why I'm running into this or how to resolve it?
Turns out you need to grant AthenaFullAccess and S3ReadOnlyAccess to the cluster owner, not just to the role you are logging into Redshift as.
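For reference, a minimal boto3 sketch of attaching those policies to an IAM role (the role name below is just the one from the question, and I'm assuming the AWS-managed AmazonAthenaFullAccess and AmazonS3ReadOnlyAccess policies are the ones meant; adjust to whichever principal actually owns the cluster):

import boto3

iam = boto3.client("iam")
role_name = "MySpectrumRole"  # placeholder: the role from the CREATE EXTERNAL SCHEMA statement

# Attach the two managed policies to that role.
for policy_arn in (
    "arn:aws:iam::aws:policy/AmazonAthenaFullAccess",
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
):
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)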
I'm using AWS Glue and have a crawler that reflects tables from a particular schema in my Redshift cluster to make that data accessible to my Glue jobs. This crawler has been working fine for a month or more, but now all of a sudden I'm getting the following error:
Error crawling database reporting: SQLException: SQLState: 42P01 Error Code: 500310 Message: [Amazon](500310) Invalid operation: relation "{table_name}" does not exist
But I can query the relevant schema and table with the exact same credentials used for the connection that Glue is using. I am able to restrict the crawler to particular tables in the schema and have Glue reflect those, but not the full schema or the problematic tables it runs into.
Any ideas on how Glue reflects tables from Redshift and what might be going on here? The crawlers are pretty black-box, so I've quickly run out of debugging ideas and I'm not sure what else to try.
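One way to peek inside the crawler (a rough boto3 sketch; the crawler and database names are placeholders) is to pull its last-crawl status and list the tables it actually registered, then compare that with what you expect from the schema:

import boto3

glue = boto3.client("glue")

# Last crawl status, error message, and CloudWatch log location.
crawler = glue.get_crawler(Name="redshift-reporting-crawler")["Crawler"]
print(crawler.get("LastCrawl"))

# Tables the crawler has registered in its target database so far.
paginator = glue.get_paginator("get_tables")
for page in paginator.paginate(DatabaseName="reporting"):
    for table in page["TableList"]:
        print(table["Name"])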