I'm using AWS Glue and have a crawler that reflects tables from a particular schema in my Redshift cluster to make that data accessible to my Glue jobs. This crawler had been working fine for a month or more, but now, all of a sudden, I'm getting the following error:
Error crawling database reporting: SQLException: SQLState: 42P01 Error Code: 500310 Message: [Amazon](500310) Invalid operation: relation "{table_name}" does not exist
But I can query the relevant schema and table with the exact same credentials used by the connection Glue is using. I can restrict the crawler to particular tables in the schema and have Glue reflect those, but not the full schema or the problematic tables it runs into.
Any ideas on how Glue reflects tables from Redshift and what might be going on here? The crawlers are pretty black-box, so I've quickly run out of debugging ideas and am not sure what else to try.
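One thing worth ruling out from the SQL side (a sketch, not a diagnosis; run it as the same user the Glue connection uses, substituting your own schema and table names) is whether the table is actually visible to that user and whether its name survives identifier folding:

SHOW search_path;

-- Confirm the table is visible to this user at all.
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema = 'reporting';   -- substitute your schema

-- Redshift folds unquoted identifiers to lower case, so a table created
-- with a quoted mixed-case name must be quoted on every reference.
SELECT COUNT(*) FROM reporting."Problem_Table";  -- placeholder table name

If the table only resolves when quoted, or only appears under a schema that isn't on the search_path, that would explain why the crawler's generic queries fail while your hand-written ones succeed.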
We are using AWS Babelfish on a Postgres-enabled DB; essentially it's a Postgres database. There are frequent "could not open relation" errors. The same stored procedure sometimes executes fine, and sometimes fails with this error sporadically, if not more frequently. I found an article (https://www.postgresql.org/message-id/12791.1310599941%40sss.pgh.pa.us) that discusses the error, but it doesn't pinpoint the exact issue, and other articles haven't helped me understand the pattern of the error. As possible fixes, I have added .dbo to all the table references and dropped all the temp tables at the end of the SP.
Level 16, State 1, Line 4
could not open relation with OID 54505
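For what it's worth, the shape of those two mitigations inside the procedure body is roughly the following (a sketch with placeholder procedure and table names; the idea is that every permanent table is schema-qualified and every temp table is dropped before the procedure exits):

CREATE PROCEDURE dbo.usp_refresh_targets
AS
BEGIN
    -- Schema-qualify every permanent table reference.
    SELECT c.id, c.name
    INTO #staging
    FROM dbo.customers AS c;

    UPDATE t
    SET t.name = s.name
    FROM dbo.targets AS t
    JOIN #staging AS s ON s.id = t.id;

    -- Drop temp tables explicitly before the procedure exits, so a
    -- cached plan is never left pointing at a dropped relation's OID.
    DROP TABLE IF EXISTS #staging;
END;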
I have Data Catalog tables generated by crawlers: one whose data source is MongoDB, and a second whose data source is PostgreSQL (RDS). The crawlers run successfully and the connection tests pass.
I am trying to define an ETL job from MongoDB to PostgreSQL (a simple transform).
In the job I defined the source as the AWS Glue Data Catalog table (MongoDB) and the target as the Data Catalog table (PostgreSQL).
When I run the job I get this error:
IllegalArgumentException: Missing collection name. Set via the 'spark.mongodb.input.uri' or 'spark.mongodb.input.collection' property
It looks like this is related to the MongoDB part. I tried setting the 'database' and 'collection' parameters on the Data Catalog tables, and it didn't help.
The script generated for the source is:
AWSGlueDataCatalog_node1653400663056 = glueContext.create_dynamic_frame.from_catalog(
    database="data-catalog-db",
    table_name="data-catalog-table",
    transformation_ctx="AWSGlueDataCatalog_node1653400663056",
)
What could be missing?
I had the same problem; just add the additional_options parameter shown below.
AWSGlueDataCatalog_node1653400663056 = glueContext.create_dynamic_frame.from_catalog(
    database="data-catalog-db",
    table_name="data-catalog-table",
    transformation_ctx="AWSGlueDataCatalog_node1653400663056",
    additional_options={
        "database": "data-catalog-db",
        "collection": "data-catalog-table",
    },
)
Additional parameters can be found in the AWS documentation:
https://docs.aws.amazon.com/glue/latest/dg/connection-mongodb.html
I am facing an issue while accessing empty Redshift tables from a GlueContext dynamic frame. The table is visible in the Glue Catalog, but when I try to access it from a Glue ETL job, it throws an error on the line of code below:
glueContext.create_dynamic_frame.from_catalog(
    database=<redshift_database>,
    table_name=<table_name>,
    redshift_tmp_dir=<redshift_temp>,
    transformation_ctx=<transformation_ctx>,
)
If I insert one row into the Redshift table and run the job, it runs successfully.
I started facing this issue on Monday, April 27, 2020. Before that, I was able to run Glue ETL jobs on empty Redshift tables.
Did anyone else face this issue?
Why is the Glue job failing for empty Redshift tables?
Error Message:
An error occurred while calling o118.getDynamicFrame. The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: BD60647947F6BA52; S3 Extended Request ID: JuIqVpL2nJuxdVtR4pgK/kH5TamNFlcFC7EfMOpdxgT/1tlBy/nnPnPcsqurIf24zaDKAcbw0Hk=)
It's working fine now, without my making any changes.
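For anyone who hits this while it's misbehaving, a defensive workaround is to catch the failure and skip (or special-case) the empty table. This is a sketch, not a documented fix: the helper name is mine, and it keys off the NoSuchKey text that the UNLOAD-to-S3 step surfaces in the error above.

from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())

def dynamic_frame_or_none(database, table_name, redshift_tmp_dir, transformation_ctx):
    # Returns the DynamicFrame, or None when the read fails because the
    # source table is empty. Treat this as a stopgap, not a fix.
    try:
        return glue_context.create_dynamic_frame.from_catalog(
            database=database,
            table_name=table_name,
            redshift_tmp_dir=redshift_tmp_dir,
            transformation_ctx=transformation_ctx,
        )
    except Exception as exc:
        if "NoSuchKey" in str(exc):
            return None
        raise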
I tried to build an Offer-Ready Docker container on Azure Cloud. Although I created a new (blank) database in PostgreSQL, I got this strange error message.
javax.servlet.ServletException: org.eclipse.jetty.servlet.ServletHolder$1: org.flywaydb.core.api.FlywayException: Found non-empty schema(s) "public" without schema history table! Use baseline() or set baselineOnMigrate to true to initialize the schema history table.
I double-checked the database: there is no table in schema "public". I didn't have this problem on AWS. Does anybody have an idea what is different on Azure?
I had the same experience once.
The PostgreSQL database on Azure seemed empty (\dt returned no results), but Flyway claimed the database was not empty (and therefore would not apply the migration scripts, for fear of interfering with whatever was already there).
Here is what I did (sketched in SQL below):
Create a new schema within the database e.g. myschema
Delete the default schema called public
Add the parameter currentSchema=myschema to the JDBC URL
And then it worked. I never got to find out what the root cause of this problem was.
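In SQL terms, the first two steps look roughly like this (a sketch; myschema is just an example name, and the DROP assumes nothing you still need lives in public):

CREATE SCHEMA myschema;

-- CASCADE removes whatever hidden objects the Azure provisioning
-- may have left in the default schema.
DROP SCHEMA public CASCADE;

-- Step 3: point the JDBC URL at the new schema, e.g.
-- jdbc:postgresql://<host>:5432/<db>?currentSchema=myschema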
EDIT: This link might provide more information on what objects are in the "public" schema by default on Azure PostgreSQL: https://community.atlassian.com/t5/Jira-questions/Re-quot-database-that-is-not-empty-quot-when-trying-to-use-azure/qaq-p/1308795/comment-id/410329#M410329
I'm new to Postgres, so this problem is probably a relatively easy one for someone else; however, I have spent many frustrating hours trying to figure out the solution. I have an Access database of metadata that must be kept updated for sending records to other groups. I also have a PostgreSQL database (managed through pgAdmin) that has these same metadata tables. Currently, the tables in the Postgres database get updated manually by exporting the Access tables as Excel files and then importing them into the SQL tables. It's not the most efficient process, and it could lead to errors in the SQL database if someone forgets to check, before running any queries, that they are using the most recent data from Access. So I would like to integrate some of the tables from my Access database with my Postgres database.
Originally I tried just installing drivers to export the Access tables directly to Postgres, which worked, but not in the way I wanted, since it just brings in a static copy of a table that I would still need to update manually. From my understanding, I can instead create a server connection from Postgres to Access, which would then pull in updated data through a foreign data wrapper.
I tried to use ogr_fdw.
CREATE EXTENSION ogr_fdw;
When I try:
CREATE SERVER metadata
  FOREIGN DATA WRAPPER ogr_fdw
  OPTIONS (
    datasource 'H:\Databases\20170712.accdb',
    format 'ODBC'
  );
I receive: ERROR: unable to connect to data source "H:\Databases\20170712.accdb"
SQL state: HV00D
When I try:
CREATE SERVER metadata
  FOREIGN DATA WRAPPER ogr_fdw
  OPTIONS (
    datasource 'H:\Databases\20170712.accdb',
    format 'ACCDB'
  );
I receive: ERROR: unable to find format "ACCDB"
HINT: See the formats list at http://www.gdal.org/ogr_formats.html.
I also tried MDB and received the same error. However, MDB is the driver name given by that page, but it says it needs a JDK/JRE to compile, and I'm not really sure whether that's another type of driver I would need or what it is.
When I try:
CREATE SERVER metadata
  FOREIGN DATA WRAPPER ogr_fdw
  OPTIONS (
    datasource 'H:\Databases\20170712.mdb',
    format 'ODBC'
  );
I receive: ERROR: unable to connect to data source "H:\Databases\20170712.mdb"
SQL state: HV00D
Hint: Unable to initialize ODBC connection to DSN for DRIVER=Microsoft Access Driver (*.mdb);DBQ=H:\Databases\20170712.mdb,
[Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified
However, after reading the GitHub help page for ogr_fdw, I thought it didn't need ODBC or special drivers: https://github.com/pramsey/pgsql-ogr-fdw/blob/master/FAQ.md.
A lot of this is probably due to my limited knowledge of the terminology as I read through this material. Also, my Access database is an .accdb file, but since that wasn't working I also experimented with MDB and ODBC as the "format". If anyone has any suggestions I would greatly appreciate it.
Thanks!
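For reference, a sketch of what a working setup might look like on Windows. It assumes (and this is an assumption, not something from the question) that the Access Database Engine redistributable matching PostgreSQL's bitness is installed, and that a system DSN named metadata_accdb has been created for the .accdb file in the ODBC Data Source Administrator:

-- GDAL's ODBC driver addresses a DSN as 'ODBC:<dsn-name>';
-- metadata_accdb is a placeholder DSN name.
CREATE SERVER metadata
  FOREIGN DATA WRAPPER ogr_fdw
  OPTIONS (
    datasource 'ODBC:metadata_accdb',
    format 'ODBC'
  );

-- Pull every Access table in as a foreign table; ogr_fdw's special
-- ogr_all schema name imports all layers the driver can see.
CREATE SCHEMA access_metadata;
IMPORT FOREIGN SCHEMA ogr_all
  FROM SERVER metadata
  INTO access_metadata;

A common cause of the HV00D error above is a bitness mismatch: a 64-bit PostgreSQL can only use 64-bit ODBC drivers, and the Access driver installed with 32-bit Office won't be visible to it.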