Cannot get AWS Data Pipeline connected to Redshift - amazon-redshift

I have a query I'd like to run regularly in Redshift. I've set up an AWS Data Pipeline for it.
My problem is that I cannot figure out how to access Redshift. I keep getting "Unable to establish connection" errors. I have an Ec2Resource and I've tried including a subnet from our cluster's VPC and using the Security Group Id that Redshift uses, while also adding that sg-id to the inbound part of the rules. No luck.
Does anyone have a from-scratch way to set up a data pipeline to run against Redshift?
How I currently have my pipeline set up
RedshiftDatabase
Connection String: jdbc:redshift://[host]:[port]/[database]
Username, Password
Ec2Resource
Resource Role: DataPipelineDefaultResourceRole
Role: DataPipelineDefaultRole
Terminate after: 20 minutes
SqlActivity
Database: [database] (from Connection String)
Runs on: Ec2Resource
Script: SQL query
Error message
Unable to establish connection to jdbc:postgresql://[host]:[port]/[database] Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.

OK, so the answer lies in Security Groups. I had to find the Security Group my Redshift cluster is in, and then add its ID as the value of the "Security Group" parameter on the Ec2Resource in the Data Pipeline.
Ec2Resource
Resource Role: DataPipelineDefaultResourceRole
Role: DataPipelineDefaultRole
Terminate after: 20 minutes
Security Group: sg-XXXXX [pull from Redshift]
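For anyone defining the pipeline as JSON rather than in the console, the setup above can be sketched roughly as follows. This is a minimal sketch, not a verified definition: the object ids (`RedshiftDb`, `Ec2Instance`, `RunQuery`) and the helper name are made up for illustration, and the field names (`securityGroupIds`, `subnetId`, `terminateAfter`, `*password`) follow my reading of the Data Pipeline definition syntax, so check them against the object reference before relying on them.

```python
import json


def build_pipeline_objects(redshift_sg_id, subnet_id, connection_string,
                           username, password, sql):
    """Return Data Pipeline definition objects as plain dicts (illustrative)."""
    return [
        {
            "id": "RedshiftDb",  # hypothetical object id
            "name": "RedshiftDb",
            "type": "RedshiftDatabase",
            "connectionString": connection_string,
            "username": username,
            "*password": password,
        },
        {
            "id": "Ec2Instance",  # hypothetical object id
            "name": "Ec2Instance",
            "type": "Ec2Resource",
            "resourceRole": "DataPipelineDefaultResourceRole",
            "role": "DataPipelineDefaultRole",
            "terminateAfter": "20 Minutes",
            # The fix from the answer: run the instance in the same
            # security group as the Redshift cluster.
            "securityGroupIds": [redshift_sg_id],
            # Assumption: a subnet in the cluster's VPC, as in the question.
            "subnetId": subnet_id,
        },
        {
            "id": "RunQuery",  # hypothetical object id
            "name": "RunQuery",
            "type": "SqlActivity",
            "database": {"ref": "RedshiftDb"},
            "runsOn": {"ref": "Ec2Instance"},
            "script": sql,
        },
    ]


if __name__ == "__main__":
    objects = build_pipeline_objects(
        "sg-XXXXX", "subnet-XXXXX",
        "jdbc:redshift://[host]:[port]/[database]",
        "my_user", "my_password", "SELECT 1;",
    )
    print(json.dumps({"objects": objects}, indent=2))
```

The security group also needs an inbound rule that allows traffic from itself on the Redshift port, which is what the question describes adding.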

Try opening inbound rules to all sources, just to narrow down possible causes. You've probably done this, but make sure you've set up your jdbc driver and configurations according to this.

Related

AWS Athena Federated Query - GENERIC_USER_ERROR when running DB query for PostgreSQL

Hi all,
I am trying to execute queries on a postgresql database I created in AWS.
I added a data source to Athena, I created the data source for postgresql and I created the lambda function.
In Lambda function I set:
default connection string
spill_bucket and spill prefix (I set the same for both: 'athena-spill'. In the S3 page I cannot see any athena-spill bucket)
the security group --> I set the security group I created to access the db
the subnet --> I set one of the database subnets
I deployed the lambda function but I received an error and I had to add a new environment variable created with the connection string but named as 'dbname_connection_string'.
After adding this new env variable I am able to see the database in Athena but when I try to execute any query on this database as:
select * from tests_summary limit 10;
I receive this error after running query:
GENERIC_USER_ERROR: Encountered an exception[com.amazonaws.SdkClientException] from your LambdaFunction[arn:aws:lambda:eu-central-1:449809321626:function:data-production-athena-connector-nina-lambda] executed in context[retrieving meta-data] with message[Unable to execute HTTP request: Connect to s3.eu-central-1.amazonaws.com:443 [s3.eu-central-1.amazonaws.com/52.219.170.25] failed: connect timed out]
This query ran against the "public" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 3366bd80-143e-459c-a4da-5350b5ab4a77
What could be causing the problem?
Thanks a lot!
Root Cause:
The VPC has no internet access, so the Lambda function cannot reach S3.
Solution:
Add a VPC gateway endpoint for S3 (select com.amazonaws.eu-central-1.s3) to the VPC associated with the Lambda function.
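If you prefer to script it, the same endpoint can probably be created with boto3 along these lines. A minimal sketch: the helper names, the VPC id, and the route table ids are placeholders of mine, and the route tables you pass should be the ones used by the Lambda function's subnets.

```python
def s3_gateway_endpoint_params(vpc_id, route_table_ids, region="eu-central-1"):
    """Build the arguments for ec2.create_vpc_endpoint (S3 gateway endpoint)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


def create_s3_gateway_endpoint(vpc_id, route_table_ids, region="eu-central-1"):
    """Create the endpoint; needs boto3 and credentials allowed to modify the VPC."""
    import boto3  # third-party, imported lazily so the helper above works without it

    ec2 = boto3.client("ec2", region_name=region)
    params = s3_gateway_endpoint_params(vpc_id, route_table_ids, region)
    return ec2.create_vpc_endpoint(**params)


if __name__ == "__main__":
    # Placeholder ids; printing the parameters makes no AWS call.
    print(s3_gateway_endpoint_params("vpc-XXXXX", ["rtb-XXXXX"]))
```

A gateway endpoint works by adding a route to the listed route tables, so the Lambda function itself needs no change once the endpoint exists.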

timeout errors from Lambda when trying to access an Amazon RDS DB instance

I am writing a Python app to run as lambda function and want to connect to an RDS DB instance without making it publicly accessible.
The RDS DB instance was already created under the default VPC with security group "sg-abcd".
So I have:
created a lambda function under the same default VPC
created a role with the following permission AWSLambdaVPCAccessExecutionRole and assigned it to the lambda function as in https://docs.aws.amazon.com/lambda/latest/dg/services-rds-tutorial.html
set sg-abcd as the lambda function's security group
added sg-abcd as source in the security group's inbound rules
added the CIDR range of the lambda function's subnet as source in the security group's inbound rules
However, when I invoke the lambda function it times out.
I can connect to the RDS DB from my laptop (after setting my IP as source in the sg's inbound rules), so I know that it is not an authentication problem. Also, for the RDS DB "Publicly Accessible" is set to "Yes".
Here's part of the app's code (where I try to connect):
rds_host = "xxx.rds.amazonaws.com"
port = xxxx
name = rds_config.db_username
password = rds_config.db_password
db_name = rds_config.db_name
logger = logging.getLogger()
logger.setLevel(logging.INFO)
try:
    conn = psycopg2.connect(host=rds_host, database=db_name, user=name, password=password, connect_timeout=20)
except psycopg2.Error as e:
    logger.error("ERROR: Unexpected error: Could not connect to PostgreSQL instance.")
    logger.error(e)
    sys.exit()
I really can't understand what I am missing. Any suggestion is welcomed, please help me figure it out!
Edit: the inbound rules that I have set look like this:
Security group rule ID: sgr-123456789
Type Info: PostgreSQL
Protocol Info: TCP
Port range Info: 5432
Source: sg-abcd OR IP or CIDR range
This document should help you out. Just make sure to get the suggestions for your specific scenario, whether the lambda function and the RDS instance are in the same VPC or not.
In my case I have the lambda function and the RDS instance in the same VPC and also both have the same subnets and SGs. But just make sure to follow the instructions in that document for the configurations needed for each scenario.
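While working through those scenarios, it can help to separate network problems from database problems. Below is a minimal sketch of a plain TCP probe that could be run from inside the Lambda function; `probe_tcp` is a name I made up, and the reading of the results is a rule of thumb rather than a guarantee. A timeout usually means packets are being dropped (security group, routing, or the function resolving the public address of the instance), while a refusal means the host was reached but nothing accepted the connection on that port.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Try a bare TCP connection and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and so on.
        return f"error: {exc}"


if __name__ == "__main__":
    # Nothing is expected to listen on this local port; substitute the
    # RDS endpoint and port when running inside the Lambda function.
    print(probe_tcp("127.0.0.1", 9, timeout=1.0))
```

If the probe reports "open" but psycopg2 still fails, the problem is more likely credentials or the database name than the network.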

Encountering error 08001 when attempting to connect to database

When I attempt to connect to an instance of a PostgreSQL database I've created as per the AWS "Create and Connect to a PostgreSQL Database with Amazon RDS" tutorial located here (https://aws.amazon.com/getting-started/tutorials/create-connect-postgresql-db/), I receive an error that reads:
[08001] The connection attempt failed.
java.net.SocketTimeoutException: connect timed out.
The database is set to allow incoming and outgoing traffic on all ports and from all IP addresses. I am completely at a loss as to how to get this working and have reached out to AWS Support for their input, but, as yet, all I've done is follow the directions prescribed by the AWS tutorial--to no avail.
Does anyone know what might be the issue?
Edit: I should mention that my host URL, port number, database name, etc. have all been entered correctly into DataGrip, so none of the above are the issue.
All right--I've figured it out.
First off, @Mark B was right--the issue was that I hadn't yet made the database itself publicly accessible via the VPC security group of which it was a member. To do this, from the database detail screen in AWS, I:
clicked (what for me was the one and only) link beneath the "VPC security groups" of the database's dashboard, which directed me to the EC2 Security Groups screen
clicked the security group link related to my database, which directed me to that group's detail page
clicked the "Edit inbound rules" button which directed me to the "Edit inbound rules" screen
clicked the "Add rule" button, which caused a table row to appear containing the following columns: "Type", "Protocol", "Port range", "Source", "Description - optional"
selected "PostgreSQL" for the "Type" column, which caused the values of "TCP" and "5432" to populate the "Protocol" and "Port range" columns respectively, entered my machine's IP address ("123.456.789.012/32"--no quotes and no parentheses), and left "Description - optional" blank, because, well, it's optional.
Finally, I guess I'd forgotten to explicitly name the database, and so my attempts to enter what for me was ostensibly the database's name (that is, "database-1") resulted in a connection error indicating that "database-1" does not exist. So, for the sake of ease and simply verifying my database connection, I entered "postgres" as the database name in my database client (I'm presently using DataGrip), because "postgres" is the name of the default database in PostgreSQL.
And that should work. I'm sure this is all no-brainer stuff to those more experienced with AWS, but it's new to me and presumably to many others.
Thanks again, @Mark B, for sending me down the right path.
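For completeness, the inbound rule from the console steps above could also be added with boto3, roughly like this. It is a sketch under assumptions: the function names are mine, the group id and address are placeholders, and the description text is arbitrary. The address is validated first because a placeholder such as 123.456.789.012 is not a real IPv4 address.

```python
import ipaddress


def postgres_ingress_permission(client_ip, port=5432):
    """Build one IpPermissions entry allowing a single IPv4 address."""
    address = ipaddress.IPv4Address(client_ip)  # raises ValueError if invalid
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": f"{address}/32",
                      "Description": "PostgreSQL from my machine"}],
    }


def allow_postgres_from(group_id, client_ip):
    """Add the rule to the security group; needs boto3 and EC2 permissions."""
    import boto3  # third-party, imported lazily

    ec2 = boto3.client("ec2")
    return ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[postgres_ingress_permission(client_ip)],
    )


if __name__ == "__main__":
    # 203.0.113.7 is a documentation address; printing makes no AWS call.
    print(postgres_ingress_permission("203.0.113.7"))
```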

Newbie help - how to connect to AWS Redshift cluster (currently using Aginity)

(I'm afraid I'm probably about to reveal myself as completely unfit for the task at hand!)
I'm trying to set up a Redshift cluster and database to help manage data for a class/group project.
I have a dc2.large cluster running with either default options, or what looked like the most generic in the couple of places I was forced to make entries.
I have downloaded Aginity (Win64) as it is described as being specialized for Redshift. That said, I can't find any instructions for connecting using it. The connection dialog requests the following:
Server: using the endpoint for my cluster (less :57xx at the end).
UserID: the Master username for the database defined for the cluster.
Password: to match the UserID
SSL Mode (Disable, Allow, Prefer, Require): trying various options
Database: as named in cluster setup
Port: as defined in cluster setup
I can't get it to connect ("failed to establish connection") and don't know if I'm entering something wrong in Aginity or if I haven't set up my cluster properly.
Message: Failed to establish a connection to 'abc1234-smtm.crone7m2jcwv.us-east-1.redshift.amazonaws.com'.
Type : Npgsql.NpgsqlException
Source : Npgsql
Trace : at Npgsql.NpgsqlClosedState.Open(NpgsqlConnector context, Int32 timeout)
at Npgsql.NpgsqlConnector.Open()
at Npgsql.NpgsqlConnection.Open()
at Aginity.MPP.Common.BaseDataProvider.get_Connection()
at Aginity.MPP.Common.BaseDataProvider.CreateCommand(String commandText, CommandType commandType, IDataParameter[] commandParams)
at Aginity.MPP.Common.BaseDataProvider.ExecuteReader(String commandText, CommandType commandType, IDataParameter[] commandParams)
--- Inner Exception: ---
......
It seems there is not enough information going into Aginity to authorize a connection to my cluster - no account credentials are supplied. For UserID, am I meant to enter the ID of a valid user? Can I use the root account? What would the ID look like? I have set up a User with FullAccess to S3 and Redshift, then entered the UserID in this format
arn:aws:iam::600123456789:user/john
along with the matching password, but that hasn't worked either.
The only training/tutorial I have been able to find/do on this is the intro AWS directs you to, at https://qwiklabs.com/focuses/2366, which uses a web-based client that I can't find outside of the tutorial (pgweb).
Any advice what I am doing wrong, and how to do it right?
Well, I think I got it working - I haven't had a chance to see if I can actually create tables yet, but it seems to be connected. I had to allow inbound traffic from outside the VPC, as per the above snapshot.
I'm guessing there's a better way than opening it up to all IP addresses, but I don't know the users' (fellow team members) IPs, and aren't they all subject to change depending on the device they're using to connect?
How does one go about getting inside the VPC to connect that way, presumably more securely?
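On the credentials question: a Redshift login uses a database user (for example the master username defined for the cluster) and its password, not an IAM user ARN. As a rough sketch of how the fields in the connection dialog map onto a client connection, here is a small helper that splits the cluster endpoint into host and port and builds keyword arguments suitable for `psycopg2.connect`. The function name and the example endpoint are hypothetical, and 5439 is only used as the fallback because it is the Redshift default port.

```python
def redshift_connect_kwargs(endpoint, database, user, password, default_port=5439):
    """Split 'host:port' and build keyword arguments for psycopg2.connect."""
    host, _, port_text = endpoint.partition(":")
    port = int(port_text) if port_text else default_port
    return {
        "host": host,          # the endpoint without the trailing :port
        "port": port,
        "dbname": database,    # the database named at cluster setup
        "user": user,          # a database user, not an IAM ARN
        "password": password,
        "connect_timeout": 10,
    }


if __name__ == "__main__":
    kwargs = redshift_connect_kwargs(
        "example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439",
        "dev", "master_user", "secret",
    )
    print(kwargs["host"], kwargs["port"])
    # With psycopg2 installed and the security group allowing your address:
    # import psycopg2
    # conn = psycopg2.connect(**kwargs)
```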

Dataworks Forge can't establish connection for sql db service

I tried to establish a connection to SQL DB using Dataworks Forge and got the following error.
Unable to establish a connection using the supplied values.Check that all values are correct and try again. Internal Details: Failed to send the request to the handler: The agent at yp-iis-dataworks-ga-wdc01-2-12-0-0-5-vm5:31531 is not available.; nested exception is: com.ibm.iis.prs.exception.CommunicationException: Failed to send the request to the handler: The agent at yp-iis-dataworks-ga-wdc01-2-12-0-0-5-vm5:31531 is not available.
I input the values based on VCAP_Service, and double checked it. How can I troubleshoot this?
Connection name sqldb1
Host 75.126.155.1xx
Database SQLDB
User user06xxx
Port 50000
Password xxx
Today I did the same thing I had done when I posted this question, and the connection was established successfully.
As Nigel mentioned, the Dataworks service status was green when I posted, but there may have been some issues at the time that have since been fixed.