Unable to create external schema for Amazon Redshift Spectrum

I'm trying to follow https://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum.html to query data in S3 from Redshift via Athena.
I'm running into an error when attempting to create the external schema in Step 3:
"create external schema athena_schema from data catalog
database 'sampledb'
iam_role 'arn:aws:iam::<>:role/MySpectrumRole'
region 'us-east-1';"
Error: "line 1:8: no viable alternative at input 'create external'
(service: amazonathena; status code: 400; error code: invalidrequestexception;"
Any suggestions on why I'm running into this, or how to resolve it?

It turns out you need to grant AthenaFullAccess and S3ReadOnlyAccess to the cluster owner, not just the role you are logging into Redshift as.
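If you manage IAM programmatically, a minimal boto3 sketch of attaching those two AWS-managed policies to the Spectrum role might look like the following. The role name MySpectrumRole is taken from the question; everything else is an assumption about your setup, not the definitive fix.
import boto3

# Attach the Athena and S3 read-only managed policies to the IAM role
# that the Redshift cluster uses for Spectrum / Athena catalog access.
iam = boto3.client("iam")
for policy_arn in (
    "arn:aws:iam::aws:policy/AmazonAthenaFullAccess",
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
):
    iam.attach_role_policy(RoleName="MySpectrumRole", PolicyArn=policy_arn)
After the policies are attached, re-running the create external schema statement is the quickest way to confirm the permissions were the issue.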

Related

Copy data file from AWS S3 to Aurora Postgres

I'm trying to copy a CSV file from AWS S3 to Aurora Postgres.
I did add S3 access to RDS for the S3 import.
Is there anything else I am missing?
This is the command that I tried:
SELECT aws_s3.table_import_from_s3 ('t1','','DELIMITER '','' CSV HEADER',aws_commons.create_s3_uri('testing','test_1.csv','us-west-2'));
Error:
NOTICE: HINT: make sure your instance is able to connect with S3.
NOTICE: CURL error code: 28 when attempting to validate pre-signed URL, 0 attempt(s) remaining
NOTICE: HINT: make sure your instance is able to connect with S3.
ERROR: Unable to generate pre-signed url, look at engine log for details.
CONTEXT: SQL function "table_import_from_s3" statement 1
Can anyone help me with this, please?
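For what it's worth, CURL error code 28 is a timeout, which usually points to the instance not being able to reach S3 over the network (for example, no S3 VPC endpoint or NAT route) rather than to the SQL call itself. The import also needs an IAM role associated with the cluster under the s3Import feature. A minimal boto3 sketch of that association, assuming a hypothetical cluster identifier my-aurora-cluster and role MyS3ImportRole (the region matches the one used in create_s3_uri above):
import boto3

# Associate an IAM role that can read the bucket with the Aurora cluster,
# under the s3Import feature used by aws_s3.table_import_from_s3.
rds = boto3.client("rds", region_name="us-west-2")
rds.add_role_to_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",                  # hypothetical
    RoleArn="arn:aws:iam::<account-id>:role/MyS3ImportRole",  # hypothetical
    FeatureName="s3Import",
)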

Export Amazon RDS into S3 or locally

I am using Amazon RDS Aurora PostgreSQL 10.18. I need to export specific tables with more than 50,000 rows to a CSV file (either locally or into an S3 bucket). I have tried several approaches, but each one failed:
I tried the Export to CSV button in the query editor after selecting all rows, but the API responded that the data was too large to return.
I tried to use aws_s3.query_export_to_s3, but got ERROR: credentials stored with the database cluster can't be accessed. Hint: Has the IAM role Amazon Resource Name (ARN) been associated with the feature-name "s3Export"? (See the sketch below.)
I tried to take a snapshot of our instance and then export it to an S3 bucket, but ended up with the error: The specified db snapshot engine mode isn't supported and can't be exported.
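The hint in the second attempt suggests the export role has not been associated with the cluster for the s3Export feature. A minimal boto3 sketch of that association, assuming a hypothetical cluster identifier and a role that already has s3:PutObject permission on the target bucket:
import boto3

# Associate the export role with the cluster under the s3Export feature,
# which is what the aws_s3.query_export_to_s3 hint is asking about.
rds = boto3.client("rds")
rds.add_role_to_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",                  # hypothetical
    RoleArn="arn:aws:iam::<account-id>:role/MyS3ExportRole",  # hypothetical
    FeatureName="s3Export",
)
Once the association is in place, aws_s3.query_export_to_s3 should be able to use the role's credentials instead of failing with the "credentials stored with the database cluster" error.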

How do I read the Lake database in Azure Synapse in a PySpark notebook

Hi, I created a database in Azure Synapse Studio and I can see the database and table there. Now I have created a notebook where I have added the required libraries, but I am unable to read the table with the code below. Can anyone tell me what I am doing wrong here?
My database name is Utilities_66_Demo. It gives me the error:
AnalysisException: Path does not exist:
abfss://users#stcdmsynapsedev01.dfs.core.windows.net/Utilities_66_Demo.parquet
Where should I take the path from? I tried to follow the MS article. If I click on Edit Database, I get this:
%%pyspark
df = spark.read.load('abfss://users#stcdmsynapsedev01.dfs.core.windows.net/Utilities_66_Demo.parquet', format='parquet')
display(df.limit(10))
Trying to access the created Lake database table:
I selected Azure Synapse Analytics:
I select my workspace, and in the dropdown there is no table shown.
I select Edit and enter my DB name and table name, and it says Invalid details.
Now I select Azure Dedicated Synapse Pool from Linked Service; I get no option to select in SQL Pool or Table, and without a SQL Pool I am unable to create a linked service just by entering the table name.
You can go directly to your ADLS account, right-click the parquet file, and select Properties. There you will find the ABFSS path, which has the format:
abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path>
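Once you have that path from the Properties dialog, the read in the question should work with it substituted in. A minimal sketch with placeholder container, storage account, and file names (all of them assumptions; replace with the values you copied):
%%pyspark
# Read the parquet file using the ABFSS path copied from the file's
# Properties dialog in the storage account (values below are placeholders).
df = spark.read.load(
    'abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path_to_file>.parquet',
    format='parquet'
)
display(df.limit(10))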

AWS Glue JDBC Crawler - relation does not exist

I'm using AWS Glue and have a crawler that reflects tables from a particular schema in my Redshift cluster, to make that data accessible to my Glue jobs. The crawler has been working fine for a month or more, but now all of a sudden I'm getting the following error:
Error crawling database reporting: SQLException: SQLState: 42P01 Error Code: 500310 Message: [Amazon](500310) Invalid operation: relation "{table_name}" does not exist
But I can query the relevant schema and table with the exact same credentials used for the connection that Glue is using. I am able to restrict the crawler to particular tables in the schema and have Glue reflect those, but not the full schema or the problematic tables it runs into.
Any ideas on how Glue reflects tables from Redshift and what might be going on here? The crawlers are pretty black-box, so I've quickly run out of debugging ideas and I'm not sure what else to try.

SQL Database + LOAD + CLOB files = error SQL3229W

I'm having trouble loading tables that have CLOB and BLOB columns into a 'SQL Database' database in Bluemix.
The error returned is:
SQL3229W The field value in row "617" and column "3" is invalid. The row was
rejected. Reason code: "1".
SQL3185W The previous error occurred while processing data from row "617" of
the input file.
The same procedure performed in a local environment works normally.
Below is the command I use to load:
load client from /home/db2inst1/ODONTO/tmp/ODONTO.ANAMNESE.IXF OF IXF LOBS FROM /home/db2inst1/ODONTO/tmp MODIFIED BY IDENTITYOVERRIDE replace into USER12135.TESTE NONRECOVERABLE
Currently, the only way to load LOB files into SQLDB or dashDB is to load the data and LOBs from the cloud. The options are to read the data from Swift object storage in SoftLayer or from Amazon S3 storage; you need an account on one of those services.
After that, you can use the following syntax:
db2 "call sysproc.admin_cmd('load from Softlayer::softlayer_end_point::softlayer_username::softlayer_api_key::softlayer_container_name::mylobs/blob.del of del LOBS FROM Softlayer::softlayer_end_point::softlayer_username::softlayer_api_key::softlayer_container_name::mylobs/ messages on server insert into LOBLOAD')"
Where:
mylobs/ is the directory inside the SoftLayer Swift object storage container, referenced in the LOBS FROM clause
LOBLOAD is the table name to be loaded into
Example:
db2 "call sysproc.admin_cmd('load from Softlayer::https://lon02.objectstorage.softlayer.net/auth/v1.0::SLOS424907-2:SL523907::0ac631wewqewre8af20c576ad5214ec70f163d600d247bd5d4dfef5453f72ff6::TestContainer::mylobs/blob.del of del LOBS FROM Softlayer::https://lon02.objectstorage.softlayer.net/auth/v1.0::SLOS424907-2:SL523907::0ac631wewqewre8af20c576ad5214ec70f163d600d247bd5d4dfef5453f72ff6::TestContainer::mylobs/ messages on server insert into LOBLOAD')"