Access database which is running in EC2 instance through AWS-lambda function - postgresql

I wrote a Lambda function in Python 3.6 to access a PostgreSQL database running on an EC2 instance.
psycopg2.connect(user="<USER NAME>",
                 password="<PASSWORD>",
                 host="<EC2 IP Address>",
                 port="<PORT NUMBER>",
                 database="<DATABASE NAME>")
I created a deployment package with the required dependencies as a zip file and uploaded it to AWS Lambda. To build the dependencies I followed this reference guide.
I also configured the default Virtual Private Cloud (VPC) and included the EC2 instance details, but I couldn't get a connection to the database. Trying to connect to the database from Lambda results in a timeout.
Lambda function:
from __future__ import print_function
import ast
import datetime
import json
import psycopg2

def lambda_handler(event, context):
    received_event = json.dumps(event, indent=2)
    load = ast.literal_eval(received_event)
    connection = None  # keeps the finally block safe if connect() fails
    try:
        connection = psycopg2.connect(user="<USER NAME>",
                                      password="<PASSWORD>",
                                      host="<EC2 IP Address>",
                                      # host="localhost",
                                      port="<PORT NUMBER>",
                                      database="<DATABASE NAME>")
        cursor = connection.cursor()
        postgreSQL_select_Query = "select * from test_table limit 10"
        cursor.execute(postgreSQL_select_Query)
        print("Selecting rows from mobile table using cursor.fetchall")
        mobile_records = cursor.fetchall()
        print("Print each row and its column values")
        for row in mobile_records:
            print("Id = ", row[0])
    except Exception as error:
        print("Error while fetching data from PostgreSQL", error)
    finally:
        # Close the database connection if it was opened.
        if connection:
            cursor.close()
            connection.close()
            print("PostgreSQL connection is closed")
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!'),
        'dt': str(datetime.datetime.now())
    }
I googled quite a lot, but I couldn't find any workaround for this. Is there any way to accomplish this?

Your configuration would need to be:
A database in a VPC
The Lambda function configured to use the same VPC as the database
A security group on the Lambda function (Lambda-SG)
A security group on the Database (DB-SG) that permits inbound connects from Lambda-SG on the relevant database port
That is, DB-SG refers to Lambda-SG.
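The rule "DB-SG permits inbound connections from Lambda-SG" can be sketched with boto3 as below. This is a minimal sketch, not a complete setup: the function names, the example group IDs and the default port 5432 are assumptions for illustration, and the boto3 call only runs when you invoke the second function with real security group IDs.

```python
def build_db_ingress_rule(lambda_sg_id, db_port=5432):
    """Build an inbound rule that references the Lambda security group.

    Referencing a group ID (instead of a CIDR range) means any resource that
    carries Lambda-SG is allowed in on the database port.
    """
    return {
        "IpProtocol": "tcp",
        "FromPort": db_port,
        "ToPort": db_port,
        "UserIdGroupPairs": [{"GroupId": lambda_sg_id}],
    }


def allow_lambda_to_reach_db(db_sg_id, lambda_sg_id, db_port=5432):
    """Attach the rule to DB-SG. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    ec2 = boto3.client("ec2")
    return ec2.authorize_security_group_ingress(
        GroupId=db_sg_id,
        IpPermissions=[build_db_ingress_rule(lambda_sg_id, db_port)],
    )


# Hypothetical group ID, shown only to illustrate the shape of the rule.
rule = build_db_ingress_rule("sg-0123456789abcdef0")
```

With this rule in place, the database's security group does not need to list any IP addresses for the Lambda function.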

For Lambda to connect to any resource inside a VPC, it needs to set up ENIs in the relevant private subnets of the VPC. Have you set up the VPC association and the security groups of the EC2 instance correctly?
You can refer to https://docs.aws.amazon.com/lambda/latest/dg/vpc.html
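A timeout usually points at the network path rather than at psycopg2 itself. As a rough diagnostic, assuming you can temporarily run extra code in the Lambda function, a plain TCP check like the sketch below tells you whether the database port is reachable at all. The helper name `can_reach` is hypothetical; it uses only the standard library.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False
```

Calling `can_reach("<EC2 IP Address>", 5432)` from the handler and printing the result separates "the port is blocked" from "the database rejected the login".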

Related

How to Connect Database(postgres) to Airflow composer On Google Cloud Platform?

I have Airflow set up on my local machine. The DAGs are written so that they need to access a Postgres database. I am trying to set up something similar on Google Cloud Platform, but I am not able to connect the database to Airflow in Composer. I keep getting the error "no host postgres". Any suggestions for setting up Airflow on GCP or connecting a database to Airflow in Composer?
Here is the link to my complete Airflow folder (this setup works fine on my local machine with Docker):
https://github.com/digvijay13873/airflow-docker.git
I am using GCP Composer. The Postgres database is in a SQL instance. My table creation DAG is here:
https://github.com/digvijay13873/airflow-docker/blob/main/dags/tablecreation.py
What changes should I make to my existing DAG to connect it to Postgres in the SQL instance? I tried giving the public IP address of Postgres in the host parameter.
To answer your main question, connecting to a GCP SQL instance from a Cloud Composer environment can be done in two ways:
Using Public IP
Using Cloud SQL proxy (recommended): secure access without the need of authorized networks and SSL configuration
Connecting using Public IP:
Postgres: connect directly via TCP (non-SSL)
os.environ['AIRFLOW_CONN_PUBLIC_POSTGRES_TCP'] = (
    "gcpcloudsql://{user}:{password}@{public_ip}:{public_port}/{database}?"
    "database_type=postgres&"
    "project_id={project_id}&"
    "location={location}&"
    "instance={instance}&"
    "use_proxy=False&"
    "use_ssl=False".format(**postgres_kwargs)
)
For more information, refer to GitHub.
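The connection URI for the public IP case can be assembled by a small helper, sketched here under the assumption that the query parameters are exactly the ones shown in the snippet for the public IP connection. The function name and all example values are hypothetical. The point it illustrates is that user, password and similar fields must be URL-encoded, otherwise characters such as "@" or "/" in a password break the URI.

```python
from urllib.parse import quote_plus


def build_cloudsql_uri(user, password, public_ip, public_port, database,
                       project_id, location, instance):
    """Build a gcpcloudsql:// URI for a direct (non-proxy, non-SSL) connection."""
    return (
        "gcpcloudsql://{user}:{password}@{public_ip}:{public_port}/{database}?"
        "database_type=postgres&"
        "project_id={project_id}&"
        "location={location}&"
        "instance={instance}&"
        "use_proxy=False&"
        "use_ssl=False"
    ).format(
        user=quote_plus(user),
        password=quote_plus(password),  # encodes '@', '/', ':' and similar
        public_ip=quote_plus(public_ip),
        public_port=public_port,
        database=quote_plus(database),
        project_id=quote_plus(project_id),
        location=quote_plus(location),
        instance=quote_plus(instance),
    )


# Hypothetical values for illustration only.
uri = build_cloudsql_uri("airflow", "p@ss/word", "203.0.113.10", 5432,
                         "mydb", "my-project", "europe-west1", "my-instance")
```

The result can then be assigned to an `AIRFLOW_CONN_...` environment variable in the same way as in the snippet above.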
For connecting using Cloud SQL proxy: You can connect using Auth proxy from GKE as per this documentation.
After setting up the SQL proxy you can connect Composer to your SQL instance using a proxy.
Example code:
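# Import paths below assume Airflow 1.10.x; they differ in Airflow 2.
import os
from datetime import timedelta
from os.path import expanduser
from urllib.parse import quote_plus

from airflow import DAG
from airflow.contrib.operators.gcp_sql_operator import CloudSqlQueryOperator
from airflow.operators.python_operator import PythonOperator
from google.cloud import storage

SQL = [
    'CREATE TABLE IF NOT EXISTS TABLE_TEST (I INTEGER)',
    'CREATE TABLE IF NOT EXISTS TABLE_TEST (I INTEGER)',
    'INSERT INTO TABLE_TEST VALUES (0)',
    'CREATE TABLE IF NOT EXISTS TABLE_TEST2 (I INTEGER)',
    'DROP TABLE TABLE_TEST',
    'DROP TABLE TABLE_TEST2',
]
HOME_DIR = expanduser("~")

def get_absolute_path(path):
    if path.startswith("/"):
        return path
    else:
        return os.path.join(HOME_DIR, path)

# The GCSQL_* and GCP_* values must be defined for your own environment.
postgres_kwargs = dict(
    user=quote_plus(GCSQL_POSTGRES_USER),
    password=quote_plus(GCSQL_POSTGRES_PASSWORD),
    public_port=GCSQL_POSTGRES_PUBLIC_PORT,
    public_ip=quote_plus(GCSQL_POSTGRES_PUBLIC_IP),
    project_id=quote_plus(GCP_PROJECT_ID),
    location=quote_plus(GCP_REGION),
    instance=quote_plus(GCSQL_POSTGRES_INSTANCE_NAME_QUERY),
    database=quote_plus(GCSQL_POSTGRES_DATABASE_NAME),
)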
os.environ['AIRFLOW_CONN_PROXY_POSTGRES_TCP'] = \
    "gcpcloudsql://{user}:{password}@{public_ip}:{public_port}/{database}?" \
    "database_type=postgres&" \
    "project_id={project_id}&" \
    "location={location}&" \
    "instance={instance}&" \
    "use_proxy=True&" \
    "sql_proxy_use_tcp=True".format(**postgres_kwargs)
connection_names = [
    "proxy_postgres_tcp",
]
dag = DAG(
    'con_SQL',
    default_args=default_args,
    description='A DAG that connect to the SQL server.',
    schedule_interval=timedelta(days=1),
)

def print_client(ds, **kwargs):
    client = storage.Client()
    print(client)

print_task = PythonOperator(
    task_id='print_the_client',
    provide_context=True,
    python_callable=print_client,
    dag=dag,
)
for connection_name in connection_names:
    task = CloudSqlQueryOperator(
        gcp_cloudsql_conn_id=connection_name,
        task_id="example_gcp_sql_task_" + connection_name,
        sql=SQL,
        dag=dag
    )
    print_task >> task

'h' format requires -32768 <= number <= 32767

I am trying to write a dataframe into a PostgreSQL database table.
When I write it into Heroku's PostgreSQL database, everything works fine. No problems.
For Heroku PostgreSQL, I use the connection string
connection_string = "postgresql+psycopg2://%s:%s@%s/%s" % (
    conn_params['user'],
    conn_params['password'],
    conn_params['host'],
    conn_params['dbname'])
However, when I try to write the dataframe into a GCP Cloud SQL table, I get the following error:
struct.error: 'h' format requires -32768 <= number <= 32767
The connection string I use for GCP Cloud SQL is as follows.
connection_string = \
    f"postgresql+pg8000://{conn_params['user']}:{conn_params['password']}@{conn_params['host']}/{conn_params['dbname']}"
The command I use to write to the database is the same for both GCP and Heroku:
df_Output.to_sql(sql_tablename, con=conn, schema='public', index=False, if_exists=if_exists, method='multi')
I'd recommend using the Cloud SQL Python Connector to manage your connections and take care of the connection string for you. It supports the pg8000 driver and should help resolve your troubles.
from google.cloud.sql.connector import connector
import sqlalchemy

# configure Cloud SQL Python Connector properties
def getconn():
    conn = connector.connect(
        "project:region:instance",
        "pg8000",
        user="YOUR_USER",
        password="YOUR_PASSWORD",
        db="YOUR_DB"
    )
    return conn

# create connection pool to re-use connections
pool = sqlalchemy.create_engine(
    "postgresql+pg8000://",
    creator=getconn,
)

# query or insert into Cloud SQL database
with pool.connect() as db_conn:
    # query database
    result = db_conn.execute("SELECT * from my_table").fetchall()
    # Do something with the results
    for row in result:
        print(row)
For more detailed examples refer to the README of the repository.
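As a side note on the error message itself: one plausible explanation, which nothing in this thread confirms, is that `method='multi'` builds a single INSERT with rows × columns bind parameters, and that the driver packs the parameter count into a signed 16-bit field (the `'h'` struct format). The sketch below assumes that limit of 32767 and derives a `chunksize` for `to_sql`. The helper names are hypothetical.

```python
import struct

# Assumed limit: the largest value that fits in a signed 16-bit integer.
MAX_BIND_PARAMS = 32767


def fits_int16(n):
    """Return True if n can be packed with the 'h' (16-bit signed) format."""
    try:
        struct.pack("!h", n)
        return True
    except struct.error:
        return False


def safe_chunksize(n_columns, max_params=MAX_BIND_PARAMS):
    """Largest number of rows per INSERT that keeps rows * columns in range."""
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    return max(1, max_params // n_columns)
```

If the assumption holds, passing `chunksize=safe_chunksize(len(df_Output.columns))` to `to_sql` would keep each multi-row INSERT under the limit.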

The first part before dot operator of the PostgreSQL service host name is not validated by the IBM PostgreSQL cloud service

We have an IBM Cloud PostgreSQL instance. We connect to it using the node-postgres client or an ODBC client with the Data Direct Driver. As we understand it, the PostgreSQL service instance should throw an error when an incorrect host is provided while connecting. However, the instance does not throw any error when the host is incorrect in its first part (the part of the host string before the first dot).
Steps to reproduce the issue
Create an IBM Cloud PostgreSQL instance. Below is the format of the host string:
<<1st part of host>>.<<2nd part of host>>.databases.appdomain.cloud
Connect to the instance using the node-postgres client or an ODBC client with the Data Direct Driver, with an incorrect value for the first portion (<<1st part of host>>) of the host.
It connects successfully without any issue. If we provide an incorrect value for the remaining portion (<<2nd part of host>>.databases.appdomain.cloud), it throws an error.
We used the code snippet below to validate this scenario:
const pg = require('pg')
const connectionString = 'postgres://user:password@<<1st part of host>>.<<2nd part of host>>.databases.appdomain.cloud:31974/ibmclouddb?sslmode=verify-full'
const caCert = 'Self Signed CA Certificate'
const client = new pg.Client(
  {
    connectionString: connectionString,
    ssl: {
      ca: caCert,
      rejectUnauthorized: false
    }
  }
)
client.connect()
client.query('select * from test_pg.char_test4').then(res => {
  console.log('res.rows :::::: ', res)
}).finally(() => client.end())

Is it possible writing down to RDS raw sql (PostgreSQL) using AWS/Glue/Spark shell?

I have a Glue connection for an RDS/PostgreSQL DB, pre-built via CloudFormation, which works fine in a Glue/Scala/Spark shell: I use the getJDBCSink API to write a DataFrame to that DB.
But I also need to run plain SQL against the same DB, such as create index ... or create table ... etc.
How can I send that sort of statement from the same Glue/Spark shell?
In Python, you can provide pg8000 as a dependency to the Glue Spark job and then run the SQL commands over a connection to RDS established with pg8000.
In Scala, you can establish a JDBC connection directly without any external library: as far as the driver is concerned, the Postgres driver is available in AWS Glue.
You can create the connection as follows:
import java.sql.{Connection, DriverManager, ResultSet}

object pgconn extends App {
  println("Postgres connector")
  // Make sure the PostgreSQL JDBC driver class is loaded
  classOf[org.postgresql.Driver]
  val con_str = "jdbc:postgresql://localhost:5432/DB_NAME?user=DB_USER"
  val conn = DriverManager.getConnection(con_str)
  try {
    val stm = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
    val rs = stm.executeQuery("SELECT * from Users")
    while (rs.next) {
      println(rs.getString("quote"))
    }
  } finally {
    conn.close()
  }
}
or follow this blog
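For the Python route mentioned in this answer, a minimal sketch could look like the following. It assumes pg8000 has been supplied to the Glue job as an extra Python module. The helper names, table and index names are hypothetical. `run_statements` works with any DB-API connection, so the demonstration at the bottom uses an in-memory SQLite database instead of RDS.

```python
import sqlite3  # used only for the local demonstration below


def run_statements(conn, statements):
    """Execute raw SQL statements on a DB-API connection in one transaction."""
    cursor = conn.cursor()
    try:
        for statement in statements:
            cursor.execute(statement)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cursor.close()


def connect_rds(host, database, user, password, port=5432):
    """Open a pg8000 connection to the RDS instance (needs pg8000 installed)."""
    import pg8000  # imported lazily; not needed for the local demonstration

    return pg8000.connect(user=user, password=password, host=host,
                          port=port, database=database)


# Local demonstration with SQLite standing in for PostgreSQL.
demo = sqlite3.connect(":memory:")
run_statements(demo, [
    "CREATE TABLE users (id INTEGER, name TEXT)",
    "CREATE INDEX idx_users_name ON users (name)",
])
```

In the Glue job you would call `run_statements(connect_rds(...), [...])` with the host and credentials of your own RDS instance.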

Flask-sqlalchemy losing connection after restarting of DB server

I use flask-sqlalchemy in my application. DB is postgresql 9.3.
I have simple init of db, model and view:
from config import *
from flask import Flask, request, render_template
from flask.ext.sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://%s:%s@%s/%s' % (DB_USER, DB_PASSWORD, HOST, DB_NAME)
db = SQLAlchemy(app)

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    login = db.Column(db.String(255), unique=True, index=True, nullable=False)

db.create_all()
db.session.commit()

@app.route('/users/')
def users():
    users = User.query.all()
    return '1'
It all works fine. But when the DB server restarts (sudo service postgresql restart), the first request to /users/ raises sqlalchemy.exc.OperationalError:
OperationalError: (psycopg2.OperationalError) terminating connection due to administrator command
SSL connection has been closed unexpectedly
[SQL: ....
Is there any way to renew the connection inside a view, or to set up flask-sqlalchemy differently so that the connection is renewed automatically?
UPDATE.
I ended up using plain SQLAlchemy, declaring the engine, metadata and db_session in every view where I critically need them.
It is not a solution to the question, just a 'hack'.
So the question is still open. It would be nice to find a solution for this :)
The SQLAlchemy documentation explains that the default behaviour is to handle disconnects optimistically. Did you try another request? The connection should have re-established itself. I've just tested this with a Flask/Postgres/Windows project and it works.
In a typical web application using an ORM Session, the above condition would correspond to a single request failing with a 500 error, then the web application continuing normally beyond that. Hence the approach is “optimistic” in that frequent database restarts are not anticipated.
If you want the connection state to be checked before each use of a connection, you need to write code that handles disconnects pessimistically. The following example code is provided in the documentation:
from sqlalchemy import exc
from sqlalchemy import event
from sqlalchemy.pool import Pool

@event.listens_for(Pool, "checkout")
def ping_connection(dbapi_connection, connection_record, connection_proxy):
    cursor = dbapi_connection.cursor()
    try:
        cursor.execute("SELECT 1")
    except:
        # optional - dispose the whole pool
        # instead of invalidating one at a time
        # connection_proxy._pool.dispose()
        # raise DisconnectionError - pool will try
        # connecting again up to three times before raising.
        raise exc.DisconnectionError()
    cursor.close()
The event was caught in PyCharm's debugger on the first db request and again after a db restart (screenshots omitted) in two environments:
Windows 7 (Postgres 9.4, Flask 0.10.1, SQLAlchemy 1.0.11, Flask-SQLAlchemy 2.1 and psycopg 2.6.1)
Ubuntu 14.04 (Postgres 9.4, Flask 0.10.1, SQLAlchemy 1.0.8, Flask-SQLAlchemy 2.0 and psycopg 2.5.5)
In plain SQLAlchemy you can add the pool_pre_ping=True kwarg when calling the create_engine function to fix this issue.
When using Flask-SQLAlchemy you can use the same argument, but you need to pass it as a dict in the engine_options kwarg:
app.db = SQLAlchemy(app, engine_options={"pool_pre_ping": True})
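To illustrate the idea behind a pre-ping independently of SQLAlchemy, here is a toy sketch using only the standard library. It is not how SQLAlchemy implements `pool_pre_ping`. The class `PrePingPool` is hypothetical, and SQLite stands in for PostgreSQL. The pool runs `SELECT 1` before handing out a connection and silently replaces it if the check fails.

```python
import sqlite3


class PrePingPool:
    """Toy single-connection pool that tests the connection on checkout."""

    def __init__(self, connect):
        self._connect = connect  # zero-argument factory for DB-API connections
        self._conn = None
        self.reconnects = 0

    @staticmethod
    def _is_alive(conn):
        try:
            conn.cursor().execute("SELECT 1")
            return True
        except Exception:
            return False

    def checkout(self):
        if self._conn is None:
            self._conn = self._connect()
        elif not self._is_alive(self._conn):
            # The stale connection is discarded and replaced transparently.
            self.reconnects += 1
            self._conn = self._connect()
        return self._conn


pool = PrePingPool(lambda: sqlite3.connect(":memory:"))
first = pool.checkout()
first.close()  # simulate the server dropping the connection
second = pool.checkout()
```

The cost of this approach is one extra round trip per checkout, which is the trade-off for not seeing a failed request after a database restart.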