Why does RDS proxy make performance worse? - postgresql

I deployed an RDS Aurora cluster for PostgreSQL 11 in AWS. My Lambda talks to this cluster via IAM authentication. Since Lambda is serverless, I have to create a database connection every time my Lambda is triggered and close it when it finishes. This is not great, since creating a DB connection is heavy and takes time. I used X-Ray to observe connection performance: creating a new connection takes 150ms. It also puts a lot of load on the DB cluster, since there will be many short-lived connections.
After some searching I found that RDS Proxy is designed to solve this problem. So I deployed RDS Proxy, which uses a username/password to connect to my Aurora cluster, and my Lambda connects to RDS Proxy via IAM authentication.
When I measure connection creation now, it is worse. It takes more than 500ms to create a connection, and sometimes more than 1 second.
Why is it worse when using RDS Proxy? Is there anything I failed to configure in the proxy?
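Independently of the proxy question, the per-invocation connection cost described above is often reduced by caching the connection at module scope, so that warm Lambda invocations reuse it. The following is only a minimal sketch of that idea: `get_connection`, `reset_connection`, and the `connect` factory are hypothetical names, not part of any AWS or driver API, and the real factory would wrap whatever driver call the Lambda already uses.

```python
from typing import Any, Callable, Optional

# Module-level state survives between invocations for as long as the
# Lambda execution environment stays warm.
_cached_connection: Optional[Any] = None


def get_connection(connect: Callable[[], Any]) -> Any:
    """Return the cached connection, creating it only on first use.

    `connect` is a hypothetical zero-argument factory supplied by the caller.
    """
    global _cached_connection
    if _cached_connection is None:
        _cached_connection = connect()
    return _cached_connection


def reset_connection() -> None:
    """Forget the cached connection, e.g. after the server has closed it."""
    global _cached_connection
    _cached_connection = None
```

With this shape, only a cold start pays the connection cost. A handler would call `get_connection(...)` at the start of each invocation and call `reset_connection()` when a query fails because the connection has gone away.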

Related

How to handle RDS connection drops when downscaling?

I’m thinking of using autoscaling with my Amazon Aurora PostgreSQL cluster, but I’m worried about what happens if a replica is being scaled down while a client still holds a connection to it. How can I make sure that the client can handle this situation?
The connection is based on TCP, and once it is disconnected, the JDBC driver opens a new TCP connection to the RDS instance.
I can recommend using AWS RDS Proxy - it holds the connection pool to the application and takes care of the connections to the backend.
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy.html
Also, AWS has provided its own JDBC connector, which is recommended for faster recovery: https://github.com/pgjdbc/pgjdbc

SQLAlchemy with Aurora Serverless V2 PostgreSQL - many connections

I have an AWS Serverless V2 database (PostgreSQL) that is accessed from a compute cluster. The cluster launches a large number of jobs (>1000), and each job independently puts data into or pulls data from the database. The Serverless cluster is set up to autoscale from 2 to 32 units as needed.
The code being run by each cluster job is using SQLAlchemy (either the ORM or the core). I am setting up each database connection with a null pool and pessimistic disconnect handling (i.e., pool_pre_ping=True). From my reading of the docs this should be handling disconnects due to being idle mid-connection.
Code is also written to access the DB, get the results, close the connection (to avoid idle connections), and then reopen the connection after processing (5-30 minutes). This is working well because once processing is completed, the new connections are staggered and the DB has scaled up.
Until the DB scales the available units high enough, my logs show the standard "all connections are taken" error: psycopg2.OperationalError: FATAL: remaining connection slots are reserved for non-replication superuser and rds_superuser connections
Questions:
Should I be configuring the SQLAlchemy connection differently? It feels like an anti-pattern to put in a custom retry to grab a connection while waiting for the DB to scale the number of available units as this type of capability seems to be built into SQLAlchemy usually.
Should I be using an RDS Proxy in front of the database? This also seems like an anti-pattern, adding a proxy in front of an autoscaling DB.
PG version is 10.
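For reference, the custom retry that the first question is wary of can be quite small. The sketch below retries a connection attempt with exponential backoff and jitter. `connect_with_backoff` and all of its parameters are hypothetical names chosen for illustration, and this is one possible approach under those assumptions, not something SQLAlchemy provides under this name.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or `max_attempts` is exhausted.

    The wait before retry number n is a random value between 0 and
    min(max_delay, base_delay * 2**n), so that many jobs started at the
    same moment do not all retry at the same moment.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return connect()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

With SQLAlchemy this might be called as `connect_with_backoff(engine.connect, retry_on=(sqlalchemy.exc.OperationalError,))`, assuming `engine` is the engine already created with the null pool. The jitter matters here because more than 1000 jobs reconnect in a short window.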

mongo atlas or aws - Internal or External connection

I am currently working on my next project, which runs 100% on MongoDB.
My past projects used SQL + Mongo, for which I used AWS RDS + AWS EC2 and could connect the two over AWS internal IPs, which gave me a much faster connection.
Now for Mongo there are a lot of fancy cloud services like mLab and MongoDB Atlas, which are actually cheaper than AWS.
My concern is that moving back to an external DB connection will be slower and use more network traffic than the internal connection in RDS.
Has anyone experienced such an issue? Maybe the difference isn't as big as I make it out to be, but I need it to be optimized.
This depends on your setup. Many of the "fancy" services also host stuff on AWS, so latency is minimal. Some even offer "private environments" or such, so you can hide your databases from public view.
The only thing left to care about is the amount of network traffic. But this will be your problem regardless of your database host. You can test this relatively easily (e.g. get a trial from one of the providers and test for throughput, or spin up your own MongoDB Docker cluster to use as a test) just to get an idea of the performance range you'll be in.
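A test like that can start as a simple timing loop around one representative operation. The sketch below is a minimal illustration: `measure_latency` is a hypothetical helper, and `operation` stands in for whatever single query or ping you would run against each candidate host.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(operation: Callable[[], object], runs: int = 20) -> Dict[str, float]:
    """Run `operation` `runs` times and summarise wall-clock latency in ms."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()  # e.g. one round trip to the database under test
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }
```

Running the same operation from the same application host against the internal and the external database gives directly comparable numbers. The median is usually more informative than the mean, because a single slow request can skew an average.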

real-time sync between local Postgres instance and Azure Cloud Postgres instance

I need to set up a real-time sync process between an on-premises PostgreSQL instance and a cloud PostgreSQL instance. Please let me know what options are available to achieve this.
Do I have to use a specific tool, or can it be managed through replication?
Please advise.
Use PgPool
http://www.pgpool.net/mediawiki/index.php/Main_Page
from their web page:
pgpool-II can manage multiple PostgreSQL servers. Using the replication function enables creating a realtime backup on 2 or more physical disks, so that the service can continue without stopping servers in case of a disk failure.

Replicate data from one RDS server to another

Can we replicate data from one RDS server to another? Or can we set up a master-slave relationship between two RDS servers?
Should we replicate data from a non-RDS instance to an RDS instance?
RDS can replicate from an external MySQL server and can also be the master of an external slave. Whether you "should" do it depends on your use case.
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.Procedural.Importing.External.Repl.html
While I guess you could set up replication between two RDS instances yourself, I don't see why you should, since starting an RDS read replica takes just a few clicks in the AWS console or an API call.
It is possible to replicate data from RDS to RDS. It is also possible to replicate data from RDS to some other MySQL server.
Steps:
Create your own EC2 server and install MySQL.
Change the configuration to replicate data.
That will require additional work to manage the EC2 instance if your data grows beyond the server's limits.
Then you have to do all the manual work again to replicate the data, as we can't increase storage on the EC2 server.
RDS provides an easy mechanism to create a read replica with a few clicks. (Note: a replica is a costlier option.)
But going that route saves the manual work, and the salary, of a person who would otherwise manage the database and do these setups regularly.
If you are using a PostgreSQL database on RDS, you can use Bucardo for asynchronous replication. You need to create an EC2 instance, or you can use a local system, but that will not be fast enough.
Use the following tutorial if you want to use bucardo.
https://www.installvirtual.com/how-to-install-bucardo-for-postgres-replication/
I think you can use a snapshot to clone another RDS database.