Is it possible to write raw SQL to RDS (PostgreSQL) using an AWS Glue/Spark shell?

I have a Glue Connection for an RDS/PostgreSQL DB, pre-built via CloudFormation, which works fine in a Glue/Scala Spark shell for writing a DataFrame to that DB via the getJDBCSink API.
But I also need to run plain SQL against the same DB, such as create index ... or create table ... etc.
How can I send that sort of statement from the same Glue/Spark shell?

In Python, you can provide the pg8000 dependency to the Glue Spark job and then run the SQL commands by establishing a connection to the RDS instance with pg8000, as sketched below.
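For example, a minimal pg8000 sketch (all connection details below are placeholders, not values from the question) that sends a plain DDL statement such as CREATE INDEX:
import pg8000

# Placeholder connection details -- in a real Glue job take these from the
# Glue connection or job parameters instead of hard-coding them
conn = pg8000.connect(
    host="my-rds-endpoint.eu-west-1.rds.amazonaws.com",
    port=5432,
    database="mydb",
    user="db_user",
    password="db_password",
)
cursor = conn.cursor()
# plain SQL, including DDL such as CREATE TABLE / CREATE INDEX, can be sent directly
cursor.execute("CREATE INDEX IF NOT EXISTS users_quote_idx ON users (quote)")
conn.commit()
conn.close()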
In Scala you can establish a JDBC connection directly, without any external library as far as the driver is concerned; the PostgreSQL driver is already available in AWS Glue.
You can create the connection like this:
import java.sql.{Connection, DriverManager, ResultSet}

object pgconn extends App {
  println("Postgres connector")

  // load the PostgreSQL driver that ships with AWS Glue
  classOf[org.postgresql.Driver]

  val con_str = "jdbc:postgresql://localhost:5432/DB_NAME?user=DB_USER"
  val conn = DriverManager.getConnection(con_str)
  try {
    val stm = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
    // plain SQL such as CREATE TABLE / CREATE INDEX can be sent with stm.execute(...)
    val rs = stm.executeQuery("SELECT * from Users")
    while (rs.next) {
      println(rs.getString("quote"))
    }
  } finally {
    conn.close()
  }
}
or follow this blog

Related

'h' format requires -32768 <= number <= 32767

I am trying to write a dataframe into a PostgreSQL database table.
When I write it into Heroku's PostgreSQL database, everything works fine, no problems.
For Heroku PostgreSQL, I use the connection string
connection_string = "postgresql+psycopg2://%s:%s#%s/%s" % (
conn_params['user'],
conn_params['password'],
conn_params['host'],
conn_params['dbname'])
However, when I try to write the dataframe into GCP's Cloud SQL table, I get the following error:
struct.error: 'h' format requires -32768 <= number <= 32767
The connection string I use for GCP Cloud SQL is as follows.
connection_string = \
    f"postgresql+pg8000://{conn_params['user']}:{conn_params['password']}@{conn_params['host']}/{conn_params['dbname']}"
The command I use to write to the database is the same for both GCP and Heroku:
df_Output.to_sql(sql_tablename, con=conn, schema='public', index=False, if_exists=if_exists, method='multi')
I'd recommend using the Cloud SQL Python Connector to manage your connections and take care of the connection string for you. It supports the pg8000 driver and should help resolve your troubles.
from google.cloud.sql.connector import connector
import sqlalchemy

# configure Cloud SQL Python Connector properties
def getconn():
    conn = connector.connect(
        "project:region:instance",
        "pg8000",
        user="YOUR_USER",
        password="YOUR_PASSWORD",
        db="YOUR_DB"
    )
    return conn

# create connection pool to re-use connections
pool = sqlalchemy.create_engine(
    "postgresql+pg8000://",
    creator=getconn,
)

# query or insert into Cloud SQL database
with pool.connect() as db_conn:
    # query database
    result = db_conn.execute("SELECT * from my_table").fetchall()
    # Do something with the results
    for row in result:
        print(row)
For more detailed examples refer to the README of the repository.
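Since the goal in the question is pandas' to_sql, the engine created above can be passed to it directly; a minimal sketch (the table name and if_exists value below are placeholders):
# `pool` is the SQLAlchemy engine created above; `df_Output` is the DataFrame from the question
df_Output.to_sql("my_table", con=pool, schema="public", index=False, if_exists="append")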

Scala Slick configure with Amazon X-Ray

Has anyone tried to use this: X-Ray with Slick?
import slick.dbio.DBIO
import slick.jdbc.JdbcBackend.Database
import slick.jdbc.PostgresProfile.api._

private[database] class PostgresConnector extends DatabaseConnector {
  protected final val configurationPath = "mycompany.backend.database.postgres"
  protected lazy val database = Database.forConfig(configurationPath)
Probably there is no way, because it's based on Tomcat:
These interceptors are in the aws-xray-recorder-sql-postgres and aws-xray-recorder-sql-mysql submodules, respectively. They implement org.apache.tomcat.jdbc.pool.JdbcInterceptor and are compatible with Tomcat connection pools.
If you are trying to instrument a SQL connection using a provider other than MySQL or Postgres, you can try to use the generic JDBC-based SQL Library, documented here: https://github.com/aws/aws-xray-sdk-java#intercept-jdbc-based-sql-queries
Alternatively, you can use the X-Ray auto-instrumentation agent for Java, which automatically captures all JDBC-based SQL queries.

Access a database running in an EC2 instance through an AWS Lambda function

I wrote a Lambda function in Python 3.6 to access a PostgreSQL database that is running in an EC2 instance.
psycopg2.connect(user="<USER NAME>",
                 password="<PASSWORD>",
                 host="<EC2 IP Address>",
                 port="<PORT NUMBER>",
                 database="<DATABASE NAME>")
I created a deployment package with the required dependencies as a zip file and uploaded it to AWS Lambda. To build the dependencies I followed THIS reference guide.
I also configured the Virtual Private Cloud (VPC) as the default one and included the EC2 instance details, but I couldn't get a connection to the database: trying to connect to the database from Lambda results in a timeout.
Lambda function:
from __future__ import print_function
import json
import ast, datetime
import psycopg2

def lambda_handler(event, context):
    received_event = json.dumps(event, indent=2)
    load = ast.literal_eval(received_event)
    try:
        connection = psycopg2.connect(user="<USER NAME>",
                                      password="<PASSWORD>",
                                      host="<EC2 IP Address>",
                                      # host="localhost",
                                      port="<PORT NUMBER>",
                                      database="<DATABASE NAME>")
        cursor = connection.cursor()
        postgreSQL_select_Query = "select * from test_table limit 10"
        cursor.execute(postgreSQL_select_Query)
        print("Selecting rows from mobile table using cursor.fetchall")
        mobile_records = cursor.fetchall()
        print("Print each row and it's columns values")
        for row in mobile_records:
            print("Id = ", row[0], )
    except (Exception,) as error:
        print("Error while fetching data from PostgreSQL", error)
    finally:
        # closing database connection.
        if connection:
            cursor.close()
            connection.close()
            print("PostgreSQL connection is closed")
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!'),
        'dt': str(datetime.datetime.now())
    }
I googled quite a lot, but I couldn't find any workaround for this. Is there any way to accomplish this requirement?
Your configuration would need to be:
A database in a VPC
The Lambda function configured to use the same VPC as the database
A security group on the Lambda function (Lambda-SG)
A security group on the Database (DB-SG) that permits inbound connections from Lambda-SG on the relevant database port
That is, DB-SG refers to Lambda-SG, as in the sketch below.
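As an illustration only, here is a boto3 sketch of that DB-SG rule (the security group IDs and port 5432 are placeholders, not values from the question):
import boto3

ec2 = boto3.client("ec2")

# Hypothetical IDs -- replace with your actual Lambda-SG and DB-SG
LAMBDA_SG = "sg-lambda-placeholder"
DB_SG = "sg-db-placeholder"

# Allow inbound PostgreSQL (5432) on DB-SG from Lambda-SG,
# i.e. DB-SG refers to Lambda-SG
ec2.authorize_security_group_ingress(
    GroupId=DB_SG,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5432,
        "ToPort": 5432,
        "UserIdGroupPairs": [{"GroupId": LAMBDA_SG}],
    }],
)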
For Lambda to connect to any resources inside a VPC, it needs to set up ENIs in the related private subnets of the VPC. Have you set up the VPC association and the security groups of the EC2 instance correctly?
You can refer to https://docs.aws.amazon.com/lambda/latest/dg/vpc.html

Loading SQL script in Vertx

I have been trying to load a SQL schema script into a MySQL DB using Vertx.
I am able to run or update any single DB command, but I am unable to load the complete schema in one go.
The second challenge is that this might be blocking code for a Vertx application. If that is the case, how can it be avoided?
Here is the code snippet I have been trying to execute:
jdbcClient.getConnection(resConn -> {
    if (resConn.succeeded()) {
        SQLConnection connection = resConn.result();
        connection.execute("<Trying to load the SQL Script schema>", resSchema -> {
            connection.close();
            if (resSchema.succeeded()) {
                async.complete();
            } else {
                testContext.fail("Failed to load bootstrap schema: " + resSchema.cause().getMessage());
            }
        });
    } else {
        testContext.fail("Failed to obtain DB connection for schema write");
    }
});
The MySQL JDBC driver does not allow you to execute a file as a SQL script. You have to parse the script and execute the individual statements one by one.
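A rough, language-agnostic sketch of that approach (shown here in plain Python; `conn` is assumed to be an open DB-API connection, and in Vert.x each statement would instead be passed to connection.execute(...) in turn; note that a naive split on ';' breaks on semicolons inside string literals or stored procedures):
# "schema.sql" is a placeholder file name; `conn` is assumed to be an open DB-API connection
with open("schema.sql") as f:
    script = f.read()

# split the script into individual statements and run them one by one
statements = [s.strip() for s in script.split(";") if s.strip()]
cursor = conn.cursor()
for stmt in statements:
    cursor.execute(stmt)
conn.commit()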

Why do I get 'Database is already closed' when invoking StaticQuery updateNA "shutdown;"

import scala.slick.driver.H2Driver
import scala.slick.jdbc.StaticQuery

object Main extends App {
  val db = H2Driver.simple.Database forURL (url = s"jdbc:h2:mem:test", user = "sa", driver = "org.h2.Driver")
  StaticQuery updateNA "shutdown;" execute db.createSession()
}
Executing this with Scala 2.11.5, H2 1.4.186 and Slick 2.1.0 yields an "org.h2.jdbc.JdbcSQLException: Database is already closed". What is happening here?
After executing the "shutdown" prepared statement, the slick StatementInvoker asks the database for the updateCount of the statement.
The H2 database doesn't like being asked this because it's already shut down.
I don't know which of the two is not behaving correctly. However, if you happen to have the same problem, to close the database just use
db.createSession().createStatement() execute "shutdown;"