From my understanding of their documentation, the Cloud PostgreSQL service, being in beta, does not yet support external replicas, which is what I thought I could use if I wanted a database to be replicated across regions.
This could very well end up being a blocker for our setup, since we need the data in separate regions.
I thought I'd investigate all the streaming replication options out there, and perhaps find one that does not require access to the host's folders or custom configuration, which in my mind would end up looking like
master => streaming_replication_app => slave
but from what I've researched so far, there are no real streaming replication options that are non-intrusive.
Can you guys confirm or deny this and point me in the right direction?
I need to decide whether Cloud Postgres is too limited as a solution or not.
Thanks in advance,
Gabriel
Cross-region replicas are supported in Google Cloud SQL.
https://cloud.google.com/blog/products/databases/introducing-cross-region-replica-for-cloud-sql
In my GCP project, App Engine is located in the Central US region and my PostgreSQL instance is located in the East US region.
Can anyone suggest the best way to resolve the connection and latency issues this causes?
If I understand correctly, there is high latency between your App Engine app and your PostgreSQL instance. You can avoid high latency if you enable geo-replication for your instances.
You can also migrate your instances closer to each other but overall latency across the world would increase significantly.
So my suggestion would be to go for geo-replication.
Google has this cool tool, kubemci (a command-line tool to configure L7 load balancers using multiple Kubernetes clusters), with which you can basically have an HA multi-region Kubernetes setup. Which is kind of cool.
But let's say we have a basic architecture like this:
Front end is implemented as SPA and uses json API to talk to backend
Backend is a set of microservices which use PostgreSQL as a DB storage engine.
So I can create two Kubernetes Clusters on GKE, put both backend and frontend on them (e.g. let's say in London and Belgium) and all looks fine.
Until we think about the database. PostgreSQL is single-master only, so it must be placed in only one of the regions. And if the backend in the London region starts to talk to PostgreSQL in the Belgium region, the performance will be really poor considering the 6ms+ latency between those regions.
So that whole HA setup kind of doesn't make any sense? Or am I missing something? One option to slightly mitigate the issue would be to have a read-only replica in the "slave" region, and direct read-only queries there (is that even possible with PostgreSQL?)
This is a classic architecture scenario that has no easy solution. Making data available in multiple regions is a challenging problem that major companies spend a lot of time and money to solve.
PostgreSQL does not natively support multi-master writes. Your idea of a replica located in the other region, with logic in your app to read and write to the correct database, would work. This will give you fast local reads, but slower writes in one region. It also means more complicated code in your app and more work to handle failover of the master. Bandwidth and costs can also be problems with heavy updates.
Use 3rd-party solutions for multi-master Postgres (like Postgres-BDR by 2ndQuadrant) to offload the work to the database layer. This can get expensive and your application still has to manage data conflicts from two regions overwriting the same data at the same time.
Choose another database that supports multi-regional replication with multi-master writes. Cassandra (or ScyllaDB) is a good choice, or hosted options like Google Spanner, Azure CosmosDB, AWS DynamoDB Global Tables, and others. An interesting option is CockroachDB which supports the PostgreSQL protocol but is a scalable relational database and supports multiple regions.
If none of these options work, you'll have to create your own replication system. Some companies do this with an event-sourced / CQRS architecture where every write is a message sent to a central log, then applied in every location. This is more work but provides the most flexibility. At this point you're also basically building your own database replication system.
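The read-replica option above can be sketched as a small routing layer inside the application. This is only an illustration: the two connection strings are hypothetical placeholders, and the rule "statements that start with SELECT or SHOW are reads" is a deliberate simplification (it would misroute something like SELECT ... FOR UPDATE, and it ignores replication lag).

```python
# Hypothetical connection strings; the host names are placeholders.
PRIMARY_DSN = "postgresql://app@db-primary.belgium.internal/app"
REPLICA_DSN = "postgresql://app@db-replica.london.internal/app"

# Statement prefixes treated as read-only (a naive assumption).
READ_PREFIXES = ("select", "show")


def is_read_only(sql: str) -> bool:
    """Naive classification: only plain SELECT/SHOW statements count as reads."""
    return sql.lstrip().lower().startswith(READ_PREFIXES)


def choose_dsn(sql: str, in_transaction: bool = False) -> str:
    """Send reads to the local replica and everything else to the primary.

    Statements inside a transaction always go to the primary, so a read
    that follows a write in the same transaction sees that write.
    """
    if in_transaction or not is_read_only(sql):
        return PRIMARY_DSN
    return REPLICA_DSN
```

With this split, the London backend would open read connections against the replica and pay the cross-region round trip only for writes.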
If you have multi cluster ingress set up on two clusters in different regions, then the multi cluster ingress will only send traffic to the closest region to the user.
If the closest region is down, this is when traffic will be routed to the cluster in the other region.
So using the example you have provided, if there is traffic being sent to the backend and this user is closer to London, then traffic sent by this user will always be sent to London as long as the Region is up and running.
Regarding latency, you will have to live with it in this case, as you cannot create a read replica in another region.
The benefit of this functionality (multi-cluster ingress) is that if one region goes down, then you have another region to route the traffic to.
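The routing behaviour described above (closest region first, the other region only when the closest one is down) can be modelled in a few lines. This is a toy model of the decision, not how the load balancer is implemented; the region names and latency figures are made up for the example.

```python
from typing import Dict, Optional


def pick_region(latency_ms: Dict[str, float], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the healthy region with the lowest latency, or None if all are down."""
    candidates = [region for region in latency_ms if healthy.get(region, False)]
    if not candidates:
        return None
    return min(candidates, key=lambda region: latency_ms[region])


# Hypothetical measurements for a user near London.
latencies = {"london": 8.0, "belgium": 14.0}

print(pick_region(latencies, {"london": True, "belgium": True}))
print(pick_region(latencies, {"london": False, "belgium": True}))
```

The first call prints london and the second prints belgium, which is the failover case.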
I currently have a small website hosted on AWS.
The server is a micro-instance.
On this micro-instance:
I am running nginx to serve static files and error pages
I am running my node server
I am storing my mongoDB
As the website is getting more traffic, I have reached the point where I need to scale, and I am not sure what the best practices are or what the implications of each are.
I would love any referrals to reading materials
I was thinking of having:
2 dedicated micro-instances to run the website
1 micro-instance running nginx
1 micro-instance storing the db
questions:
Would having the db stored on a separate machine make the queries significantly slower?
Should I in fact store the db on S3 instead?
Is it justified to have an entire instance for nginx alone?
How would you go about scaling from 1 machine to multiple ones? I am guessing moving from one to two is harder than moving from two to 50.
Any advice will be greatly appreciated!
Would having the db stored on a separate machine make the queries significantly slower?
No, the speed impact would be very minimal, and this would be needed for scalability anyway. Just make sure you use the private IP addresses of your instances for any inter-instance communication so that the traffic stays inside your VPC (for both security and performance reasons).
Should I in fact store the db on S3 instead?
No, that wouldn't work at all. You can't store a DB on S3, only DB backups.
Is it justified to have an entire instance for nginx alone?
If you are getting enough traffic, then yes absolutely.
How would you go about scaling from 1 machine to multiple ones?
In general you need to move your DB to a separate server, create multiple instances of your web server, and place a load balancer in front of them. If you want automatic scaling based on traffic then you would also place the web servers in an auto-scaling group. If all this sounds difficult then I would recommend looking into moving your web servers into Elastic Beanstalk which will manage much of this for you.
If your database is a bottleneck then you might also need to set up a MongoDB cluster and balance the load across the cluster. You could also move your DB to something like mlab, which would greatly ease the management of that as well.
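On the earlier advice to use private IP addresses between instances: a quick sanity check at startup can catch a config that accidentally points at a public address. A minimal sketch using only the standard library; the sample addresses are placeholders, and a DNS name cannot be judged this way.

```python
import ipaddress


def is_private_address(host: str) -> bool:
    """True if host is a literal IP address in a private range."""
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        # Not an IP literal (for example a DNS name); cannot decide here.
        return False


# Placeholder values: a typical VPC address and a public one.
print(is_private_address("10.0.1.25"))
print(is_private_address("8.8.8.8"))
```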
This is a set of questions rather than one very specific question. Over the last couple of weeks I puzzled together information on how to properly host a JAVA PLAY application "in the cloud". As lots of this information is scattered over different services, I felt like gathering all these small pieces into one, because lots of things are important to see in full context. However, I moved my considerations to the bottom of the question, as they are mainly my opinions and subjective findings, which I don't want to be held responsible for. If I got something wrong, please don't hesitate to point that out.
Hosting Java PLAY + MySQL on AWS for world wide accessibility
Our Scenario: we have a quite straightforward application written within the Java PLAY framework (https://www.playframework.com/), working on iOS and Android as well as with a backend system (for administration, content management and API), storing data in a MySQL DB. While most of the users' interactions with the server are quick and easy (login, sync some data), there are also some more data-intensive tasks (download some <100 MB data zips to the mobile phone, upload a couple of MB to the server). Therefore we were looking for a solution to properly provide users far away from our servers with reasonable response times. The obvious next step was hosting in the cloud.
Hosting setup within AWS:
Horizontal scaling: to start with, only 1 EC2 instance with our app will be running in eu-1a. We will need to evaluate how many resources one instance actually requires, whether more instances are needed and whether more instances would actually lead to quicker response times.
Horizontal scaling across regions: once the app generates heavy user load from another region, the whole EC2 instance should be duplicated and put to another region, running a db read replica (see Setting up a globally available web app on amazon web services and https://aws.amazon.com/de/blogs/aws/cross-region-read-replicas-for-amazon-rds-for-mysql/ ).
Vertical scaling of EC2 instances: in recent tests of the old hosting setup, the database proved to be the bottleneck rather than the play app and its server's hardware specifications. Therefore it is not yet fully clear how much vertical scaling would affect response times. If a t2.micro instance serves as well as an m3.xlarge instance, of course we would rather climb our way up from the bottom here.
Vertical scaling of RDS: we will need to estimate how much traffic hits the DB server and what CPU/RAM/etc will be required. Probably we will work our way up here as well.
Global Redirection: done using Amazon Route 53 (?). A user from Tokyo should be redirected to the EC2 instance running in Asia; a user from Rome to the EC2 instance in Europe. This does not only affect API calls within the app, but also content delivery (in both directions).
Open Questions regarding the setup
Is this setup conclusive? Am I missing crucial components?
Regarding global redirection: is Amazon Route 53 the right tool? How does it differ from CloudFront (which strikes me as being purely for content / media distribution)?
How do I define correct data/API endpoints for my app? Of course I don't want to define the database endpoint of a db read replica during app deployment. Will this also happen during the Route 53 (question 2) setup? Same goes for API calls: of course the app should direct its calls to https://myurl.com/api and from there they should be redirected. Is this realistic?
I would highly appreciate all kinds of thoughts (!), also regarding the background info written below. If you can point me to further reading to solve my questions on my own, I am also very thankful - there is simply a huge load of information regarding this, but this makes it hard to narrow the answers down. I do have knowledge in hosting/servers, but I am pretty sure there are true experts out there waiting to slap me with knowledge. :)
Background-Information
Current Hosting Setup: a load balancer distributes the traffic on 2 root linux servers, both of them running the PLAY app, one of them also holding the MySQL installation.
The current hosting setup has 3 big flaws:
No vertical scalability: the hosting company would take money for each scaling step. Currently the servers are running idle, but if the app booms, we could run short on capacity quickly. Running idle is still paid as if permanently under full load. This is expensive!
No deployment support: currently, we connect through SSH, manually deploy the correct folders to the file system, recompile on the server, set privileges, apply database evolutions; do the same for the second server (with different db connection parameters). What could possibly go wrong. ;)
No worldwide availability: to set up another server in another region of the world would mean a huge effort. To have a synchronized replica of our DB can be done, but once again deploying would mean downtime, room for errors and therefore time and money.
Hosting Options for Java PLAY:
There are a lot of different blog posts about this. In short:
AWS: Amazon Web Services is one of the first places you start looking. Here you get everything that's possible, at a flexible price. You set yourself up an EC2 instance, a MySQL RDS and you're good to go - all of this in the free tier, so you can experiment, play around, test your stuff.
Microsoft Azure: similar to AWS regarding pricing and possibilities. However, I did not dive into setting up and deploying our application for test purposes.
Heroku: super easy deployment from within PLAY, scalable servers. However (at first glance?) it lacks the possibility to supply remote regions with high-speed content.
Jelastic: even easier deployment from within PLAY / IntelliJ IDEA. You push your app image to jelastic, jelastic distributes it further to their infrastructure providers.
RedHat OpenShift (https://www.openshift.com/): sounds promising, yet not as complete as AWS.
Lots of choices and possible setups/prices. Especially after finding out about deployment using boxfuse (https://cloudcaptain.sh/) I made my choice for AWS, as it offers absolutely all we need from 1 source. Boxfuse has low monthly costs but is perfectly integrated into AWS. Scaling is supported as well as the 3 common environments (dev/test/prod). Support is outstanding.
The setup looks good. I would however make one change: your large up- & downloads. As mobile speeds may not be ideal, having your app serve long-running requests is something you should avoid, as this will needlessly tie up server threads. Instead, consider having users upload and download straight from S3 using presigned URLs. You can then later add CloudFront to the mix when it makes financial sense to do so.
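To make the presigned URL suggestion concrete, here is a rough sketch using boto3's `generate_presigned_url` on the S3 client. Python is used only for brevity (the AWS SDK for Java has an equivalent). The helper names, the bucket and key arguments and the 15-minute expiry are assumptions for the example, and the call needs the third-party boto3 package plus AWS credentials when it actually runs.

```python
from typing import Any, Dict


def presign_args(bucket: str, key: str, upload: bool, expires_s: int = 900) -> Dict[str, Any]:
    """Build keyword arguments for the S3 client's generate_presigned_url."""
    return {
        "ClientMethod": "put_object" if upload else "get_object",
        "Params": {"Bucket": bucket, "Key": key},
        "ExpiresIn": expires_s,  # lifetime of the URL in seconds
    }


def presigned_url(bucket: str, key: str, upload: bool = False) -> str:
    """Return a time-limited URL the mobile client can use directly against S3.

    Requires boto3 and AWS credentials at call time.
    """
    import boto3  # imported lazily so presign_args works without boto3 installed

    s3 = boto3.client("s3")
    return s3.generate_presigned_url(**presign_args(bucket, key, upload))
```

The backend would hand such a URL to the app in a normal quick API response, and the slow transfer then happens between the phone and S3 without occupying a server thread.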
R53 will work just fine for picking the best server(s) for each end user.
For EC2 consider having an ELB + Auto-Scaling Group setup. Even just for a single instance you get the benefit of permanent health monitoring and auto-respawns. If you expect more load you can then auto-scale based on your expected bottleneck (cpu, network i/o). This will give you a more autonomous and robust setup than manually having to scale up and down based on your own monitoring analysis (even though the scaling part is very easy if you stick with immutable infrastructure & blue/green deployments like what Boxfuse offers).
Your focus on vertical server scaling might not serve you well on AWS. I would start thinking about horizontal scaling of app servers behind an Elastic Load Balancer, and possibly look into Elastic Beanstalk.
I'm not sure you can set up a read replica in another region via RDS; you might have to set that up via MySQL servers running on standard EC2 instances. And even if you can, that's going to be some expensive and high-latency data transfer.
If file uploads and downloads are all you are worried about, you just need to put CloudFront (Amazon's CDN service) in front of your application, and allow it to handle file uploads and downloads via its global edge servers. You could even do this without moving your entire application into AWS. I would recommend reading this blog post as a start.
I'm working on a project where a Postgresql database needs to be shared across several physical locations. Each location has limited connectivity, and may only have access to the outside world once or twice a day. So the database has to be available locally at each location, but must also synchronize with the master database when possible.
I am not yet familiar with replication or clustering. Are these good solutions? Or is there a better way of doing it? I would appreciate some advice on this. :)
NOTE: clashing of primary keys from different locations would not be an issue, this has been taken care of.
If the remote locations require read-only access to the data, you can set up asynchronous replication fairly easily using log shipping, which is a built-in feature of PostgreSQL. In this configuration, the master server drops WAL (write-ahead log) files to a shared location where the remote servers can periodically connect and read the logs to bring themselves up to date.
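The "connect periodically and pick up what is new" step can be pictured with a small script. This is only a model of the idea: in PostgreSQL itself the shipping is driven by settings such as `archive_command` on the master and `restore_command` on the standby, not by application code. The directory names are hypothetical and the files are empty stand-ins named in the style of WAL segments.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def fetch_new_segments(shared: Path, local: Path) -> List[str]:
    """Copy files from the shared location that are not yet present locally.

    Returns the names of newly copied files in sorted (oldest-first) order.
    """
    local.mkdir(parents=True, exist_ok=True)
    already = {p.name for p in local.iterdir()}
    new = sorted(p for p in shared.iterdir() if p.is_file() and p.name not in already)
    for seg in new:
        shutil.copy2(seg, local / seg.name)
    return [seg.name for seg in new]


# Demo: temporary directories stand in for the shared drop and one remote site.
with tempfile.TemporaryDirectory() as tmp:
    shared, local = Path(tmp) / "shared", Path(tmp) / "site_a"
    shared.mkdir()
    for name in ("000000010000000000000001", "000000010000000000000002"):
        (shared / name).write_bytes(b"wal")
    first = fetch_new_segments(shared, local)   # site connects: picks up both files
    (shared / "000000010000000000000003").write_bytes(b"wal")
    second = fetch_new_segments(shared, local)  # next connection: only the new file
```

A site that is online once or twice a day would run something like this on each connection, which is why this approach tolerates long gaps in connectivity.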
If all servers are performing writes independently, what you're looking for is asynchronous multi-master replication. The Postgres docs mention Bucardo and rubyrep as options for accomplishing this. According to the docs, both are limited to master-to-master replication (or master to multiple slaves), but Bucardo supposedly has true multi-master replication planned for version 5.0, and rubyrep mentions a method for keeping multiple servers synchronized.
(I have servers using PostgreSQL's log shipping and streaming replication features, but I have no direct experience with Bucardo or rubyrep.)