Connecting to a Postgres Heroku DB from AWS Glue, SSL issue - postgresql

I'm trying to connect to my Heroku DB and I'm getting the following series of errors related to SSL:
SSL connection to data store using host matching failed. Retrying without host matching.
SSL connection to data store failed. Retrying without SSL.
Check that your connection definition references your JDBC database with correct URL syntax, username, and password. org.postgresql.util.PSQLException: Connection attempt timed out.
I managed to connect to the DB with DBeaver and had similar SSL problems until I set the SSL Factory to org.postgresql.ssl.NonValidatingFactory, but Glue doesn't offer any SSL options.
The DB is actually hosted on AWS, the connection URL is:
jdbc:postgresql://ec2-52-19-160-2.eu-west-1.compute.amazonaws.com:5432/something
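I believe the same factory can also be passed as plain JDBC URL parameters (this is just the DBeaver workaround expressed in the URL; I haven't confirmed whether Glue's connection test honours it):
jdbc:postgresql://ec2-52-19-160-2.eu-west-1.compute.amazonaws.com:5432/something?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory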
(P.S. the AWS Glue forums are useless! They don't seem to be answering anyone's questions.)

I was having the same issue, and it seems that Heroku requires a newer JDBC driver than the one Amazon provides. See this thread:
AWS Data Pipelines with a Heroku Database
Also, it seems that you can use the JDBC driver directly from your Python scripts. See here:
https://dzone.com/articles/extract-data-into-aws-glue-using-jdbc-drivers-and
So it seems like you need to download a newer driver, upload it to S3, and then manually use it in your scripts as mentioned here:
https://gist.github.com/saiteja09/2af441049f253d90e7677fb1f2db50cc
Good luck!
UPDATE: I was able to use the following code snippet in a Glue Job to connect to the data. I had to upload the Postgres driver to S3 and then add it to the path for my Glue Job (see the boto3 sketch after the snippet for one way to pass that path). Also, make sure that either the JARs are public or you've configured the IAM user's policy such that they have access to the bucket.
%pyspark
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.transforms import *
glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session  # "spark" is only predefined in a notebook; define it explicitly for a Glue job script
# Read the Postgres table over JDBC, using the non-validating SSL factory
source_df = (spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://<hostname>:<port>/<database>")
    .option("dbtable", "<table>")
    .option("driver", "org.postgresql.Driver")
    .option("ssl", "true").option("sslfactory", "org.postgresql.ssl.NonValidatingFactory")
    .option("user", "<username>").option("password", "<password>")
    .load())
dynamic_dframe = DynamicFrame.fromDF(source_df, glueContext, "dynamic_df")
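To point the job at the driver uploaded to S3, the JAR's location goes into the job's "Dependent jars path", which corresponds to the --extra-jars argument. A rough sketch of creating such a job with boto3 (bucket, script, and role names below are made-up placeholders):
import boto3

glue = boto3.client("glue", region_name="eu-west-1")

# All names below are placeholders -- substitute your own bucket, script and IAM role.
glue.create_job(
    Name="heroku-postgres-extract",
    Role="MyGlueServiceRole",
    Command={"Name": "glueetl", "ScriptLocation": "s3://my-glue-scripts/heroku_extract.py"},
    DefaultArguments={
        # Tell Glue to put the newer PostgreSQL JDBC driver on the job's classpath
        "--extra-jars": "s3://my-glue-jars/postgresql-42.x.x.jar",
    },
)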

Related

GCP deploy db connection error: Can't create a connection to host

I am trying to deploy my Flask app to GCP; however, my DB connection is not working.
I have already tried different methods, but nothing worked. I also set up the DB connection locally and was able to write data to the Cloud SQL instance, but once the app is deployed the connection string doesn't work at all.
This is part of my code:
from flask import Flask, send_from_directory, render_template, request, redirect, url_for
from sqlalchemy import cast, String
import psycopg2
from model import db, Contacts, Testimonials
from forms import ContactForm, SearchContactForm
from secrets import token_hex
from werkzeug.utils import secure_filename, escape, unescape
from datetime import datetime
import os
import dateutil.parser
root_dir = os.path.abspath(os.path.dirname(__file__))
app = Flask(__name__)
db.init_app(app)
debug_mode = False
if debug_mode:
    # Dev environment
    app.config['SEND_FILE_MAX_AGE_DEFAULT'] = 0
    app.debug = True
    # Define the database
    app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://postgres:password@public_ip_address/dbname'
else:
    # Production environment
    app.debug = False
    # Define the database
    app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql+pg8000://postgres:password@public_ip_address/dbname'
To deploy I am using the command gcloud app deploy from Cloud Shell, and it works perfectly. In fact, I can render the parts of the website that are not related to the DB connection.
The error I am getting is:
pg8000.exceptions.InterfaceError: Can't create a connection to host {public_ip_address} and port 5432 (timeout is None and source_address is None).
I'd recommend using the Cloud SQL Python Connector to manage your connections and take care of the connection string for you. It supports the pg8000 driver and should help resolve your troubles, as it works both locally and in GCP (Cloud Functions, Cloud Run, etc.).
Here is example code showing how to use the connector to access a GCP Cloud SQL database.
from google.cloud.sql.connector import connector
import sqlalchemy

# configure Cloud SQL Python Connector properties
def getconn():
    conn = connector.connect(
        "project:region:instance",
        "pg8000",
        user="YOUR_USER",
        password="YOUR_PASSWORD",
        db="YOUR_DB"
    )
    return conn

# create connection pool to re-use connections
pool = sqlalchemy.create_engine(
    "postgresql+pg8000://",
    creator=getconn,
)

# query or insert into Cloud SQL database
with pool.connect() as db_conn:
    # query database
    result = db_conn.execute("SELECT * from my_table").fetchall()
    # Do something with the results
    for row in result:
        print(row)
For more detailed examples refer to the README of the repository.

Issues connecting to a Google Cloud SQL instance from Google Cloud Run

I have a PostgreSQL Google Cloud SQL instance and I'm trying to connect a FastAPI application running in Google Cloud Run to it; however, I'm getting ConnectionRefusedError: [Errno 111] Connection refused errors.
My application uses the databases package for async database connections:
database = databases.Database(sqlalchemy_database_uri)
Which then tries to connect on app startup through:
@app.on_event("startup")
async def startup() -> None:
    if not database.is_connected:
        await database.connect()  # <-- this is where the error is raised
From reading through the documentation here it suggests forming the connection string like so:
"postgresql+psycopg2://user:pass#/dbname?unix_sock=/cloudsql/PROJECT_ID:REGION:INSTANCE_NAME/.s.PGSQL.5432"
I've tried several different variations of the url, with host instead of unix_sock as the sqlalchemy docs seem to suggest, as well as removing the .s.PGSQL.5432 at the end as I've seen some other SO posts suggest, all to no avail.
I've added the Cloud SQL connection to the instance in the Cloud Run dashboard and added a Cloud SQL Client role to the service account.
I'm able to connect to the databases locally with the Cloud SQL Auth Proxy.
I'm at a bit of a loss on how to fix this, or even how to debug it, as there doesn't seem to be any easy way to ssh into the container and try things out. Any help would be greatly appreciated, thanks!
UPDATE
I'm able to connect directly with sqlalchemy with:
from sqlalchemy import create_engine
engine = create_engine(url)
engine.connect()
Where url is any of these formats:
"postgresql://user:pass#/db_name?host=/cloudsql/PROJECT_ID:REGION:INSTANCE_NAME"
"postgresql+psycopg2://user:pass#/db_name?host=/cloudsql/PROJECT_ID:REGION:INSTANCE_NAME"
"postgresql+pg8000://user:pass#/db_name?unix_sock=/cloudsql/PROJECT_ID:REGION:INSTANCE_NAME/.s.PGSQL.5432"
Is there something due to databases's async nature that's causing issues?
Turns out this was a bug with the databases package. This should now be resolved by https://github.com/encode/databases/pull/423

uWSGI (PythonAnywhere), Flask, SQLAlchemy, and PostgreSQL: SSL error: decryption failed or bad record mac

I have a Flask application running on PythonAnywhere and a PostgreSQL managed database from DigitalOcean. I'm using Flask-SQLAlchemy to connect to the database.
My app works fine locally when connecting to the remote database. However, the app fails when I run it on PythonAnywhere. I suspect that it's uWSGI's multiple workers that are causing the problem, as described on this Stack Overflow post: uWSGI, Flask, sqlalchemy, and postgres: SSL error: decryption failed or bad record mac
This is the error message that I get:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) SSL error: decryption failed or bad record mac
The error is triggered by a simple Flask-SQLAlchemy method:
user = User.query.get(id)
The other Stack Overflow post provided a solution where I can change uWSGI's configuration. But on PythonAnywhere, I am unable to modify uWSGI's configuration. Also, disposing the database engine after initializing the app did not resolve the issue. Is there an alternate fix?
Edit 1: I have a paid account.
Disposing the database engine after initializing the app DID resolve the issue. This is a link to the post that helped me: https://github.com/pallets/flask-sqlalchemy/issues/745#issuecomment-499970525
This is how I fixed it in my code.
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from tms_sola.config import Config # my custom config object
db = SQLAlchemy()
def create_app(config_class=Config):
    app = Flask(__name__)
    app.config.from_object(config_class)
    app.app_context().push()
    with app.app_context():
        db.init_app(app)
        # https://github.com/pallets/flask-sqlalchemy/issues/745#issuecomment-499970525
        db.session.remove()
        db.engine.dispose()
        from tms_sola.blueprints.some_blueprint.routes import some_blueprint
        # https://github.com/pallets/flask-sqlalchemy/issues/745#issuecomment-499970525
        db.session.remove()
        db.engine.dispose()
    return app
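For reference, a minimal way to use the factory above (adjust the import to wherever create_app lives in your project):
# run.py -- minimal usage of the application factory above
from tms_sola import create_app  # assumes create_app is importable from the package

app = create_app()

if __name__ == "__main__":
    app.run()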

How to connect cloud function to cloudsql

How can I connect a Cloud Function to Cloud SQL?
import psycopg2

def hello_gcs(event, context):
    print("Imported")
    conn = psycopg2.connect("dbname='db_bio' user='postgres' host='XXXX' password='aox199'")
    print("Connected")
    file = event
    print(f"Processing file: {file['name']}.")
I could not connect to the Postgres version of Cloud SQL, please help.
Google Cloud Functions provides a Unix socket that automatically authenticates connections to your Cloud SQL instance if it is in the same project. This socket is located at /cloudsql/[instance_connection_name].
conn = psycopg2.connect(host='/cloudsql/[instance_connection_name]', dbname='my-db', user='my-user', password='my-password')
You can find the full documentation page (including instructions for authentication from a different project) here.
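Applied to the function from the question, the connection would look roughly like this (the instance connection name is a placeholder; database name and credentials are taken from the question):
import psycopg2

def hello_gcs(event, context):
    # Connect over the Cloud SQL unix socket instead of a public IP
    conn = psycopg2.connect(
        host='/cloudsql/my-project:europe-west1:my-instance',  # placeholder [instance_connection_name]
        dbname='db_bio',
        user='postgres',
        password='aox199'
    )
    print("Connected")
    file = event
    print(f"Processing file: {file['name']}.")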
You could use the Python example mentioned on this public issue tracker, or use the Node.js code shown in this document to connect to Cloud SQL from Cloud Functions.

Connection between spark and tableau on tableau desktop

I am trying to connect Spark to Tableau. I have installed the 64-bit Simba ODBC driver, but I am facing issues while connecting to Spark.
ERROR:
Unable to connect to the ODBC Data Source. Check that the necessary drivers are installed and that the connection properties are valid.
[Simba][ODBC] (10000) General error: Unexpected exception has been caught.
In some docs, I saw that Tableau requires a special license key. Can you please explain this?
Basically, Spark works with Hive but uses a different engine (algorithm for fetching data),
so to connect to Spark from Tableau you first need to install the ODBC Hive driver and then the Spark driver.
Hive Driver:
http://www.tableau.com/support/drivers
Spark driver:
https://databricks.com/spark/odbc-driver-download
While installing Spark on your cluster, you need to configure the Thrift server alongside the Hive server and give it a new port address.
You can go through this link to install Spark with Ambari:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_spark-guide/content/ch_installing-spark.html
The corresponding port address then needs to be specified in Tableau while connecting.
If your cluster is secured with a username and password, set the Authentication to "Username and Password" and provide the credentials there.
If it still raises an error, look into the areas below:
An incorrect port and/or service defined in the connection
Web proxy or firewall settings blocking the connection from Tableau Desktop
The data server is not started.