AWS Redshift to Postgres federated query failure - amazon-redshift

I've had a look here but can't find an answer to my problem. I'm trying to run a federated query on Redshift (to Postgres) from DataGrip but it errors out with -
[HY000][100071] [Simba]AthenaJDBC An error has been thrown from the AWS Athena client. No output location provided. An output location is required either through the Workgroup result configuration setting or as an API input. [Execution ID not available] com.simba.athena.amazonaws.services.athena.model.InvalidRequestException: No output location provided. An output location is required either through the Workgroup result configuration setting or as an API input. (Service: AmazonAthena; Status Code: 400; Error Code: InvalidRequestException; Request ID: 6b51efb8-5f90-46c0-abc9-10da282926fd; Proxy: null)
Should I set up something in the AWS Athena Console before running the SQL queries?
Others in my team are able to query the table fine, so I am at my wits' end trying to figure out what I'm doing wrong.
Appreciate any help, thank you.
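The error message itself names the two places the result location can come from: the workgroup's result configuration or the connection itself. A rough boto3 sketch of setting it on the workgroup (this assumes the driver uses the default "primary" workgroup; the region and bucket name are only placeholders):

import boto3

athena = boto3.client("athena", region_name="us-east-1")  # placeholder region

# Give the workgroup a default S3 location for query results, so clients that
# don't pass an output location of their own (like this JDBC connection) still work.
athena.update_work_group(
    WorkGroup="primary",  # assumption: the connection uses the default workgroup
    ConfigurationUpdates={
        "ResultConfigurationUpdates": {
            "OutputLocation": "s3://my-athena-query-results/datagrip/"  # placeholder bucket
        }
    },
)

The Simba Athena JDBC driver also accepts an S3OutputLocation connection property, which supplies the same setting per connection; since teammates can query the table fine, comparing their DataGrip connection properties against yours is worth doing first.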

Related

AWS Athena Federated Query - GENERIC_USER_ERROR when running DB query for PostgreSQL

Hi all,
I am trying to execute queries on a PostgreSQL database I created in AWS.
I added a data source to Athena for PostgreSQL and created the Lambda function (the connector).
In the Lambda function I set:
the default connection string
spill_bucket and spill_prefix (I set the same value, 'athena-spill', for both; on the S3 page I cannot see any athena-spill bucket)
the security group --> I set the security group I created to access the db
the subnet --> I set one of the database subnets
I deployed the Lambda function, but I received an error and had to add a new environment variable containing the connection string, named 'dbname_connection_string'.
After adding this new env variable I am able to see the database in Athena, but when I try to execute any query on this database, such as:
select * from tests_summary limit 10;
I receive this error after running query:
GENERIC_USER_ERROR: Encountered an exception[com.amazonaws.SdkClientException] from your LambdaFunction[arn:aws:lambda:eu-central-1:449809321626:function:data-production-athena-connector-nina-lambda] executed in context[retrieving meta-data] with message[Unable to execute HTTP request: Connect to s3.eu-central-1.amazonaws.com:443 [s3.eu-central-1.amazonaws.com/52.219.170.25] failed: connect timed out]
This query ran against the "public" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 3366bd80-143e-459c-a4da-5350b5ab4a77
What could be causing the problem?
Thanks a lot!
Root Cause:
The Lambda function's VPC has no route to S3 (no internet access), so the connector cannot reach S3.
Solution:
Add a VPC Gateway Endpoint for S3 (select com.amazonaws.eu-central-1.s3) in the VPC associated with the Lambda function.
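A minimal sketch of creating that gateway endpoint with boto3 (the VPC and route table IDs below are placeholders; use the VPC the Lambda connector runs in and the route tables of its subnets):

import boto3

ec2 = boto3.client("ec2", region_name="eu-central-1")

# A gateway endpoint adds an S3 route to the chosen route tables, so the Lambda
# connector in private subnets can reach S3 (spill bucket, metadata) without
# needing internet access.
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",            # placeholder: the connector's VPC
    ServiceName="com.amazonaws.eu-central-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],  # placeholder: route tables of the DB subnets
)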

RDS OptionGroup not working while creating it via CloudFormation for SQL Server

I am trying to create an RDS option group for RDS SQL Server independently via CloudFormation, but creation fails with the error below. When I create it with the same parameters from the console, it works. Any pointers would be very helpful.
SqlServerOptionGroup:
  Type: AWS::RDS::OptionGroup
  Properties:
    EngineName: "sqlserver-ex"
    MajorEngineVersion: "14.0.0"
    OptionGroupDescription: rds-sql-optiongroup
    OptionConfigurations:
      - OptionName: SQLSERVER_BACKUP_RESTORE
Error:
Cannot find major version 14.0.0 for sqlserver-ex (Service: AmazonRDS; Status Code: 400; Error Code: InvalidParameterCombination
The same option group gets created successfully when I create it via the console.
Try "14.00" for MajorEngineVersion.
I also found you need to quote the EngineName and MajorEngineVersion, which you have done.
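To double-check which major version strings RDS actually accepts for sqlserver-ex, and to try the option group outside CloudFormation, the RDS API can be called directly; a rough boto3 sketch (the option group name is just an example):

import boto3

rds = boto3.client("rds")

# List the MajorEngineVersion values RDS knows for SQL Server Express -
# this is where "14.00" (rather than "14.0.0") shows up.
versions = rds.describe_db_engine_versions(Engine="sqlserver-ex")
print(sorted({v["MajorEngineVersion"] for v in versions["DBEngineVersions"]}))

# Same option group as the template, with the accepted version string.
rds.create_option_group(
    OptionGroupName="rds-sql-optiongroup",
    EngineName="sqlserver-ex",
    MajorEngineVersion="14.00",
    OptionGroupDescription="rds-sql-optiongroup",
)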

Newbie help - how to connect to AWS Redshift cluster (currently using Aginity)

(I'm afraid I'm probably about to reveal myself as completely unfit for the task at hand!)
I'm trying to set up a Redshift cluster and database to help manage data for a class/group project.
I have a dc2.large cluster running with either default options, or what looked like the most generic choice in the couple of places where I was forced to make entries.
I have downloaded Aginity (Win64) as it is described as being specialized for Redshift. That said, I can't find any instructions for connecting using it. The connection dialog requests the following:
Server: using the endpoint for my cluster (less :57xx at the end).
UserID: the Master username for the database defined for the cluster.
Password: to match the UserID
SSL Mode (Disable, Allow, Prefer, Require): trying various options
Database: as named in cluster setup
Port: as defined in cluster setup
I can't get it to connect ("failed to establish connection") and don't know if I'm entering something wrong in Aginity or if I haven't set up my cluster properly.
Message: Failed to establish a connection to 'abc1234-smtm.crone7m2jcwv.us-east-1.redshift.amazonaws.com'.
Type : Npgsql.NpgsqlException
Source : Npgsql
Trace : at Npgsql.NpgsqlClosedState.Open(NpgsqlConnector context, Int32 timeout)
at Npgsql.NpgsqlConnector.Open()
at Npgsql.NpgsqlConnection.Open()
at Aginity.MPP.Common.BaseDataProvider.get_Connection()
at Aginity.MPP.Common.BaseDataProvider.CreateCommand(String commandText, CommandType commandType, IDataParameter[] commandParams)
at Aginity.MPP.Common.BaseDataProvider.ExecuteReader(String commandText, CommandType commandType, IDataParameter[] commandParams)
--- Inner Exception: ---
......
It seems there is not enough information going into Aginity to authorize a connection to my cluster - no account credentials are supplied. For UserID, am I meant to enter the ID of a valid user? Can I use the root account? What would the ID look like? I have set up a User with FullAccess to S3 and Redshift, then entered the UserID in this format
arn:aws:iam::600123456789:user/john
along with the matching password, but that hasn't worked either.
The only training/tutorial I have been able to find/do on this is the intro AWS directs you to, at https://qwiklabs.com/focuses/2366, which uses a web-based client that I can't find outside of the tutorial (pgweb).
Any advice what I am doing wrong, and how to do it right?
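One thing worth checking: the dialog is asking for the database's own credentials - the Master username and password chosen at cluster creation - not an IAM user or ARN, since Redshift speaks the PostgreSQL wire protocol. A rough sketch of the same connection in Python with psycopg2, where the database name, user, and port are placeholders and the host is the cluster endpoint without the port suffix:

import psycopg2

conn = psycopg2.connect(
    host="abc1234-smtm.crone7m2jcwv.us-east-1.redshift.amazonaws.com",
    port=5439,               # placeholder: use the port shown on the cluster page
    dbname="dev",            # placeholder: the database named at cluster setup
    user="masteruser",       # placeholder: the Master username, not an IAM ARN
    password="...",
    sslmode="require",
)
with conn.cursor() as cur:
    cur.execute("select current_user, current_database()")
    print(cur.fetchone())

If this connects but Aginity does not, the problem is in the Aginity settings; if neither connects, it is almost certainly the cluster's security group or public accessibility.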
Well, I think I got it working - I haven't had a chance to see if I can actually create a table yet, but it seems to be connected. I had to allow inbound traffic from outside the VPC, as per the above snapshot.
I'm guessing there's a better way than opening it up to all IP addresses, but I don't know the users' (fellow team members) IPs, and aren't they all subject to change depending on the device they're using to connect?
How does one go about getting inside the VPC to connect that way, presumably more securely?
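On the "better way than opening it up to all IP addresses" point: the usual pattern is to leave the cluster's security group closed and add one inbound rule per known address (a teammate's /32, or an office/VPN CIDR) on the cluster's port. A rough boto3 sketch with placeholder values:

import boto3

ec2 = boto3.client("ec2")

# Allow a single teammate's IP to reach the cluster port instead of 0.0.0.0/0.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # placeholder: the cluster's security group
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 5439,        # placeholder: the cluster's port
            "ToPort": 5439,
            "IpRanges": [
                {"CidrIp": "203.0.113.25/32", "Description": "teammate"}
            ],
        }
    ],
)

Connecting "from inside the VPC" generally means running the client on something that is already in the VPC (an EC2 instance, a bastion host, or a VPN into the VPC), so the cluster never has to be publicly accessible at all; for a short class project, per-IP rules like the above are usually the simpler trade-off.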

EMR Spark Fails to Save Dataframe to S3

I am using the RunJobFlow command to spin up a Spark EMR cluster. This command sets the JobFlowRole to an IAM Role which has the policies AmazonElasticMapReduceforEC2Role and AmazonRedshiftReadOnlyAccess. The first policy contains an action to allow all s3 permissions.
When the EC2 instances spin up, they assume this IAM role, and generate temporary credentials via STS.
The first thing which I do is read a table from my Redshift cluster into a Spark Dataframe using the com.databricks.spark.redshift format and using the same IAM Role to unload the data from redshift as I did for the EMR JobFlowRole.
So far as I understand, this runs an UNLOAD command on Redshift to dump into the S3 bucket I specify. Spark then loads the newly unloaded data into a Dataframe. I use the recommended s3n:// protocol for the tempdir option.
This command works great, and it always successfully loads the data into the Dataframe.
I then run some transformations and attempt to save the dataframe in the csv format to the same S3 bucket Redshift Unloaded into.
However, when I try to do this, it throws the following error
java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively)
Okay. So I don't know why this happens, but I tried to hack around it by setting the recommended Hadoop configuration parameters. I then used DefaultAWSCredentialsProviderChain to load the AWSAccessKeyID and AWSSecretKey and set them via
spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", <CREDENTIALS_ACCESS_KEY>)
spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", <CREDENTIALS_SECRET_ACCESS_KEY>)
When I run it again it throws the following error:
java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId;
Okay. So that didn't work. I then removed setting the hadoop configurations and hardcoded an IAM user's credentials in the s3 url via s3n://ACCESS_KEY:SECRET_KEY#BUCKET/KEY
When I ran this it spit out the following error:
java.lang.IllegalArgumentException: Bucket name should be between 3 and 63 characters long
So it tried to create a bucket, which is definitely not what we want it to do.
I am really stuck on this one and would really appreciate any help here! It works fine when I run it locally, but completely fails on EMR.
The problem was the following:
EC2 Instance Generated Temporary Credentials on EMR Bootstrap Phase
When I queried Redshift, I passed the aws_iam_role to the Databricks driver. The driver then re-generated temporary credentials for that same IAM role. This invalidated the credentials the EC2 instance generated.
I then tried to upload to S3 using the old credentials (the ones stored in the instance's metadata).
It failed because it was trying to use out-of-date credentials.
The solution was to remove redshift authorization via aws_iam_role and replace it with the following:
val credentials = EC2MetadataUtils.getIAMSecurityCredentials  // keyed by the instance-profile role name
...
// IAM_ROLE holds the name of the instance-profile role; reuse its existing
// temporary credentials instead of letting the driver assume the role again.
.option("temporary_aws_access_key_id", credentials.get(IAM_ROLE).accessKeyId)
.option("temporary_aws_secret_access_key", credentials.get(IAM_ROLE).secretAccessKey)
.option("temporary_aws_session_token", credentials.get(IAM_ROLE).token)
On Amazon EMR, try using the prefix s3:// to refer to an object in S3.
It's a long story.

WSO2 API MANAGER clustering Worker-Manager

This is regarding WSO2 API Manager Worker cluster configuration with an external Postgres DB. I have used 2 databases, i.e. wso2_carbon for registry and user management and wso2_am for storing APIs. The respective XMLs have been configured. The Postgres scripts have been run to create the database tables. When wso2server.sh is run, my log console shows clustering enabled and the members of the domain. However, on https://:, when I try to create APIs, it throws an error in the design phase itself.
ERROR - add:jag org.wso2.carbon.apimgt.api.APIManagementException: Error while checking whether context exists
[2016-12-13 04:32:37,737] ERROR - ApiMgtDAO Error while locating API: admin-hello-v.1.2.3 from the database
java.sql.SQLException: org.postgres.Driver cannot be found by jdbc-pool_7.0.34.wso2v2
As per the error message, the driver class name you have given is org.postgres.Driver, which is not correct. It should be org.postgresql.Driver. Double-check the master-datasources.xml config.