I am trying to figure out how to build a scalable database system. I settled on PostgreSQL and am now working out how to implement load balancing. I looked into HAProxy, which I really liked. I noticed that the PostgreSQL documentation describes several different high-availability configurations: http://www.postgresql.org/docs/8.3/static/high-availability.html. Which one would be the best to pair with HAProxy?
I have used HAProxy for MySQL, but only because there were no options tailor-made for MySQL, and HAProxy does a great job there. For PostgreSQL, there are quite a few tailor-made options. Maybe you could have a look at pgpool?
Are you looking for scalability alone, or failover too? Which version of PostgreSQL are you using?
Related
I am currently designing the architecture for an HA WebRTC installation (using Liveswitch Server). Among the requirements is the setup of an auto-failover scenario for a PostgreSQL database.
Since I already intend to deploy nginx as a load balancer in a different part of the system, I was wondering whether the above PostgreSQL scenario can be accomplished using nginx as well.
IMPORTANT: notification of the failover case to the admin via email or similar is a must (of course).
Is this possible using nginx and which setup should be chosen for the database instances (hot-standby, warm-standby, etc.)?
If not: what would be the solution of choice?
I see that Grafana Cluster can use postgres or MySQL as its metadata DB.
Can it also use cockroachDB?
(In general, I'm looking for an HA solution for Grafana, where the DB is also HA)
Thanks,
Moshe
You might be interested in following along with this issue: https://github.com/grafana/grafana/issues/8900
There are a couple of problems that prevent it from working out of the box right now. A big one is that CockroachDB only has experimental support for altering the data types of columns, which Grafana relies on.
I have an OLTP DB in Oracle and a downstream OLAP System in PostgreSQL on-premises. The data from Oracle is pumped into PostgreSQL using Oracle_FDW.
I am exploring the possibility of moving the PostgreSQL system to AWS, but none of the RDS offerings support Oracle_FDW. One way out is to install PostgreSQL on an EC2 instance, but that would mean giving up some of the features AWS provides natively, such as read replicas. Is there a better workaround?
Also is there a way to fetch the data in Oracle RDS from Postgres RDS in AWS?
With PostgreSQL on Amazon RDS your choice of extensions is limited to the extensions they explicitly support. As far as I'm aware there's no way around this limitation.
Like you mentioned, the general option in this case would be to host PostgreSQL yourself on EC2 instead of RDS. You lose automatic backup/replication/management features, but you get the power and flexibility you need. This will certainly work but will require some leg work to replace what you're losing by not using RDS.
The only alternative to this I can think of is that you may be able to host a different (otherwise empty) PostgreSQL server with the oracle_fdw extension installed and use the postgres_fdw extension (which is supported by RDS) to proxy requests from your RDS-hosted database, through your proxy PostgreSQL database, to your Oracle database and back. If the amount of data you're retrieving is substantial, or if the number of queries per minute is high, this is probably a terrible idea. But it might be worth testing to see if it works for your use case.
I did a quick search around and I haven't been able to find any references to anyone actually layering foreign data wrappers like this but I also couldn't find anything in the manual or online saying it wasn't supported either. In theory it should work, but if you do try it make sure you thoroughly test it prior to using it to do anything important.
Oracle_FDW is now supported in recent versions - https://aws.amazon.com/about-aws/whats-new/2021/07/amazon-rds-for-postgresql-supports-oracle-fdw-extension-for-accessing-data-in-oracle-databases/
With relational databases I would just pull up the W3Schools tutorial, install MySQL on my machine and start practicing! How can I learn non-relational databases in a similar way? Most tutorials I read say that these databases work with multiple nodes and data centers.
Does this mean that I will be unable to learn and practice, say, Cassandra on my own single PC?
You do it just like you do it with MySQL: you set up a database on your local machine and start experimenting.
Most database systems which focus on sharding and clustering also work as a stand-alone instance. When you want to test these features specifically, you can often run multiple instances on the same machine. When you also want to see how they behave on different machines, you can use virtualization software like VMWare or VirtualBox to set up a bunch of virtual machines and build your virtual datacenter on your desktop.
(I would recommend VMWare for business use and VirtualBox for home use)
I'm a big fan of MongoDB. It's the NoSQL equivalent of MySQL.
Go to the Try It Out link on their home page and you can actually use it in a live session on their website - no download, no configuration, no hassle! Just use it and learn the basics.
Here's the quick start for Cassandra. http://wiki.apache.org/cassandra/GettingStarted
I don't see any reason you couldn't run that from localhost. I think the point is that you can scale these NoSQL solutions, not that you have to. You might want to check out MongoDB or CouchDB as well. Both are easy to set up and are great NoSQL solutions in my experience.
I would strongly suggest using something like Amazon EC2 for testing NoSQL solutions. You can definitely install a technology like MongoDB locally and create a replica set, but you should definitely put these on different physical machines if you can.
I have installed things like AppFabric, Couchbase and Mongo locally and created clusters and they always work really well locally. It's very easy because the networking part of it always goes smoothly.
Once you introduce two physical machines and a stronger network partition things get difficult.
Last I checked, you can create instances on EC2 for free if you use their Micro instances. You'll learn a lot.
Scenario
Multiple application servers host web services written in Java, running in SpringSource dm Server. To implement a new requirement, they will need to query a read-only PostgreSQL database.
Issue
To support redundancy, at least two PostgreSQL instances will be running. Access to PostgreSQL must be load balanced and must auto-fail over to currently running instances if an instance should go down. Auto-discovery of newly running instances is desirable but not required.
Research
I have reviewed the official PostgreSQL documentation on this issue. However, it focuses on the more general case of read/write access to the database. Top Google results tend to lead to older newsgroup messages or dead projects such as Sequoia or DB Balancer, as well as one active project, PG Pool II.
Question
What are your real-world experiences with PG Pool II? What other simple and reliable alternatives are available?
PostgreSQL's wiki also lists clustering solutions, and the page on Replication, Clustering, and Connection Pooling has a table showing which solutions are suitable for load balancing.
I'm looking forward to PostgreSQL 9.0's combination of Hot Standby and Streaming Replication.
Have you looked at SQL Relay?
The standard solution for something like this is to look at Slony, Londiste or Bucardo. They all provide async replication to many slaves, where the slaves are read-only.
You then implement the load balancing independently of this, on the TCP layer, with something like HAProxy. Such a solution will be able to do failover of the read connections (though you'll still lose transaction visibility on a failover, and have to start a new transaction on the new slave, but that's fine for most people).
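As a rough illustration of what a TCP-level balancer does for the read-only slaves, here is a minimal Python sketch. It is not HAProxy and not a replacement for it. It round-robins over a list of backends and skips any that fail a plain TCP connect check. The names `is_alive` and `ReadBalancer` and the host/port values are made up for the example, and a real health check would normally also confirm that PostgreSQL answers queries rather than just that the port is open.

```python
import itertools
import socket


def is_alive(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class ReadBalancer:
    """Round-robin over read-only backends, skipping unreachable ones.

    Hypothetical helper for illustration only.
    """

    def __init__(self, backends, check=is_alive):
        self.backends = list(backends)
        self.check = check
        self._cycle = itertools.cycle(self.backends)

    def pick(self):
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            host, port = next(self._cycle)
            if self.check(host, port):
                return host, port
        raise RuntimeError("no read-only backend is reachable")
```

A client would call `pick()` whenever it needs a connection. If a slave disappears mid-transaction, the client reconnects to whatever `pick()` returns next and starts a new transaction there.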
Then all you have left is failover of the master role. There are supported ways of doing it on all these systems. None of them are automatic by default (because automatic failover of a database master role is really dangerous - consider the situation you are in once you've got split brain), but they can be automated easily if the requirement needs this for the master as well.
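To make the split-brain concern concrete: one common precaution when automating master failover is to promote a slave only when a strict majority of independent observers agree that the master is unreachable. The sketch below shows just that decision rule. It is an assumption for illustration, not how Slony, Londiste or Bucardo handle failover, and the function name and vote format are made up.

```python
def should_promote(votes):
    """Decide whether to promote a slave to master.

    `votes` maps an observer name to True if that observer
    currently considers the master unreachable.
    """
    if not votes:
        # No observers reporting: never promote blindly.
        return False
    down_votes = sum(1 for unreachable in votes.values() if unreachable)
    # Require a strict majority of all observers.
    return down_votes > len(votes) // 2
```

With an even number of observers a tie does not promote, which errs on the side of leaving the old master alone rather than risking two masters.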