Postgres Multi-Tenant setup for production and tests

I am being asked to extend our production/QA database to include an additional schema reserved for testing. My gut keeps telling me this will lead to no good.
The reasoning I've been given is that it avoids spinning up an additional RDS instance, which would cut cost and increase efficiency. I proposed running these tests on a local instance, or even a micro EC2 instance. Both were shot down due to the complexity and what I felt were other nonsense reasons.
Before I push back, I am wondering whether others have done this with some success. My experience in testing databases is that the environments should mimic one another as closely as possible and that each environment should be isolated.
My questions are:
Is a multi-tenant schema the way to go for this? Or is there another shared-schema method?
Has anyone run a multi-tenant schema that supports both production and testing in the same instance?
If so, where might I look for inspiration, examples or how-tos?
What are some of the benefits/pitfalls of taking on this approach?
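For concreteness, schema-level isolation in Postgres boils down to a dedicated schema plus a role that can touch nothing else. A minimal sketch with psycopg2 (the role, schema, and connection details are placeholder assumptions, not a recommendation):

    import psycopg2

    # Connect as the database owner; the DSN is a placeholder.
    conn = psycopg2.connect("dbname=app_db user=admin")
    cur = conn.cursor()

    # A schema reserved for test data, plus a role confined to it.
    cur.execute("CREATE SCHEMA IF NOT EXISTS testing")
    cur.execute("CREATE ROLE test_runner LOGIN PASSWORD 'changeme'")  # placeholder password
    cur.execute("REVOKE ALL ON SCHEMA public FROM test_runner")
    cur.execute("GRANT USAGE, CREATE ON SCHEMA testing TO test_runner")

    # Unqualified table names resolve into the testing schema for this role.
    cur.execute("ALTER ROLE test_runner SET search_path = testing")
    conn.commit()

Note that even with this in place, the test role still shares the instance's CPU, memory, WAL, and crash domain with production, which is exactly the isolation concern raised above.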

Related

Is it good to leave pg_stat_statements permanently on production servers?

I am fairly new to PostgreSQL (coming from SQL Server).
I came across the pg_stat_statements extension and it looks very interesting.
Is it good practice to leave it on permanently on production servers?
On one hand, I want to know what actually loads my system in production.
On the other hand, I don't want to load my server by monitoring either...
I always enable it on production databases. The benefits are well worth the small performance hit.
Particularly on a production database you want to know which statements cause the most pain and should be optimized.
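For anyone trying it out, the setup and a typical query look roughly like this; a sketch with psycopg2 (the DSN is a placeholder, and note the timing column is total_exec_time from PostgreSQL 13 on, total_time before that):

    import psycopg2

    # The module must be preloaded first, which requires a restart:
    #   shared_preload_libraries = 'pg_stat_statements'   (postgresql.conf)
    conn = psycopg2.connect("dbname=prod_db user=admin")  # placeholder DSN
    cur = conn.cursor()
    cur.execute("CREATE EXTENSION IF NOT EXISTS pg_stat_statements")
    conn.commit()

    # The ten statements costing the most total execution time.
    cur.execute("""
        SELECT query, calls, total_exec_time
        FROM pg_stat_statements
        ORDER BY total_exec_time DESC
        LIMIT 10
    """)
    for query, calls, total_ms in cur.fetchall():
        print(f"{total_ms:12.1f} ms  {calls:8d} calls  {query[:60]}")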

How to synchronize deployments (especially of database object changes) on multiple environments

I have this challenge. I am the DevOps engineer and a software engineer in a team where, months back, the developers moved from having a central Oracle DB to having the DB on a CentOS VM on each of their individual laptops. The move from a central DB was to reduce dependency on the DBAs and also to eliminate issues stemming from inconsistent data.
The plan for sharing and synchronizing the database across the team was that each person would share change scripts with everyone. The problem is that we use Skype for communication (we just set up Slack but have yet to start using it fully), and although people sometimes post the text of DB change scripts, they can be missed by some. The other problem is that some developers forget to post their changes at all. Further, new releases are deployed to Production without being deployed to the Test and Demo environments.
This has posed a serious challenge for us, especially for me, as I recently became responsible for ensuring that our Demo deployments are in sync with the Production deployments.
Most of the synchronization issues stem from the database drifting out of sync due to missing change scripts or missing DB objects. Oracle is our DB of choice.
A typical deployment to the Demo environment is a very painful process: we test the application, and as issues crop up due to missing DB table columns, functions, or stored procedures, we have to track down the missing DB objects, apply them to the DB, and continue until all issues are resolved.
How can I solve this problem to ensure smooth, painless and less time-consuming deployments? Can migrating our applications to Docker help with the DB synchronization issues and the associated lack of discipline of the developers? What process can we put into place to improve in this area?
Thank you very much in advance for your help.
Have a look at http://www.dbmaestro.com
I strongly recommend joining a live demo session.
DBmaestro TeamWork can help you merge the changes from multiple DBs into a single shared DB and safely move changes from one environment to another.
Danny
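If a commercial tool is not an option, the core idea behind most migration tools (Flyway, Liquibase, and friends) is simple enough to sketch yourself: keep numbered change scripts in version control and let each database record which ones it has applied. A minimal illustration with cx_Oracle - the table name, directory, and credentials are assumptions, and real multi-statement scripts need splitting, which the dedicated tools handle for you:

    import pathlib
    import cx_Oracle

    conn = cx_Oracle.connect("app/secret@demo-db")  # placeholder credentials
    cur = conn.cursor()

    # One row per change script ever applied to this database.
    cur.execute("""
        BEGIN
          EXECUTE IMMEDIATE 'CREATE TABLE schema_migrations (version VARCHAR2(64) PRIMARY KEY)';
        EXCEPTION WHEN OTHERS THEN
          IF SQLCODE != -955 THEN RAISE; END IF;  -- ORA-00955: it already exists
        END;""")

    cur.execute("SELECT version FROM schema_migrations")
    applied = {row[0] for row in cur}

    # Scripts are named e.g. 001_add_customer_col.sql and run in order.
    for script in sorted(pathlib.Path("migrations").glob("*.sql")):
        if script.stem in applied:
            continue
        cur.execute(script.read_text())  # assumes one statement per file
        cur.execute("INSERT INTO schema_migrations (version) VALUES (:v)", v=script.stem)
        conn.commit()
        print("applied", script.name)

Run the same runner against Test, Demo, and Production from CI, and a change script that was missed on Skype stops mattering, because each database remembers what it has seen.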

What are the pros and cons of DynamoDB with respect to other NoSQL databases?

We use MongoDB database add-on on Heroku for our SaaS product. Now that Amazon launched DynamoDB, a cloud database service, I was wondering how that changes the NoSQL offerings landscape?
Specifically for cloud based services or SaaS vendors, how will using DynamoDB be better or worse as compared to say MongoDB? Are there any cost, performance, scalability, reliability, drivers, community etc. benefits of using one versus the other?
For starters, it will be fully managed by Amazon's expert team, so you can bet that it will scale very well with virtually no input from the end user (developer).
Also, since it's built and managed by Amazon, you can assume that they have designed it to work very well with their infrastructure, so performance should be top notch. In addition, they have chosen to use SSDs as storage, so right from the start disk throughput will be significantly higher than other data stores on AWS that are HDD-backed.
I haven't seen any drivers yet, and I think it's too early to tell how the community will react, but I suspect that Amazon will provide drivers for all of the most popular languages, and the community will likely receive this well - and in turn create additional drivers and tools.
Using MongoDB through an add-on for Heroku effectively turns MongoDB into a SaaS product as well.
In reality, one would be comparing whatever service a chosen provider offers against what Amazon can offer, rather than comparing one persistence solution to another.
This is very hard to do. Each provider will have varying levels of service at different price points, and the option of running it on your own hardware locally for development purposes is a welcome one.
I think the key difference to consider is that MongoDB is software you can install anywhere (including at AWS, at another cloud service, or in-house), whereas DynamoDB is a SaaS available exclusively as a hosted service from Amazon (AWS). If you want to retain the option of hosting your application in-house, DynamoDB is not an option. If hosting outside of AWS is not a consideration, then DynamoDB should be your default choice unless very specific features weigh heavier.
There's a table in the following link that summarizes the attributes of DynamoDB and Cassandra:
http://www.datastax.com/dev/blog/amazon-dynamodb
Something that needs improvement in DynamoDB in order to become more usable is the ability to index columns other than the primary key.
UPDATE 1 (06/04/2013)
On 04/18/2013, Amazon announced support for Local Secondary Indexes, which made DynamoDB dramatically more capable:
http://aws.amazon.com/about-aws/whats-new/2013/04/18/amazon-dynamodb-announces-local-secondary-indexes/
I have to be honest; I was very excited when I heard about the new DynamoDB and did attend the webinar yesterday. However, it's difficult to make a decision right now, as everything they said was still very vague; I have no idea which functions are going to be available through their service.
The one thing I do know is that scaling is handled automatically, which is pretty awesome, yet there are still so many unknowns that it's tough to make a solid analysis until all the facts are in and we can start using it.
Thus far I still see mongo as working much better for me (personally) in the project undertaking that I've been working on.
Like most DB decisions, it's really going to come down to a project by project decision of what's best for your need.
I anxiously await more information on the product; for now, though, it is in beta and I wouldn't jump ship to adopt the latest and greatest only to be a tester :)
I think one of the key differences between DynamoDB and other NoSQL offerings is the provisioned throughput - you pay for a specific throughput level on a table, and provided you keep your data well-partitioned, you can always expect that throughput to be met. So as your application load grows, you can scale up and keep your performance more-or-less constant.
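To make the provisioned-throughput model concrete, here is how capacity is declared and later raised, sketched with today's boto3 SDK (the table and key names are made up for the example):

    import boto3

    dynamodb = boto3.client("dynamodb")

    # Throughput is declared up front, per table.
    dynamodb.create_table(
        TableName="events",  # hypothetical table
        AttributeDefinitions=[{"AttributeName": "event_id", "AttributeType": "S"}],
        KeySchema=[{"AttributeName": "event_id", "KeyType": "HASH"}],
        ProvisionedThroughput={"ReadCapacityUnits": 100, "WriteCapacityUnits": 50},
    )

    # As load grows, capacity is raised in place; requests beyond the
    # provisioned rate are throttled rather than degrading neighbours.
    dynamodb.update_table(
        TableName="events",
        ProvisionedThroughput={"ReadCapacityUnits": 500, "WriteCapacityUnits": 250},
    )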
Amazon DynamoDB seems like a pretty decent NoSQL solution. It is fast, and it is pretty easy to use. Other than having an AWS account, there really isn't any setup or maintenance required. The feature set and API are fairly small right now compared to MongoDB/CouchDB/Cassandra, but I would expect them to grow over time as feedback from the developer community is received. Right now, all of the official AWS SDKs include a DynamoDB client.
Pros
Lightning Fast (uses SSDs internally)
Really (really) reliable. (chances of write failures are lower)
Seamless scaling (no need to do manual sharding)
Works as webservices (no server, no configuration, no installation)
Easily integrated with other AWS features (you can export a whole table to S3 or process it with EMR, etc.)
Replication is managed internally, so the chance of accidental data loss is negligible.
Cons
Very (very) limited querying.
Scanning is painful (I remember a scan through the Java client once running for 6 hours)
Pre-defined throughput, which means a sudden increase beyond the set throughput will be throttled.
Throughput is partitioned as the table is sharded internally (which means that if you provisioned a throughput of 1000 and the table is split into two partitions, and you are reading only the latest data - all from one partition - then your effective read throughput is only 500).
No joins, and limited indexing (basically 2 indexes).
No views, triggers, scripts, or stored procedures.
It's really good as an alternative session store in a scalable application (see the sketch below). Another good use would be logging/auditing in an extensive system. NOT preferable for feature-rich applications with frequent enhancements or changes.
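To illustrate that session-store use case, the operations DynamoDB excels at are single-key reads and writes; a sketch with boto3 (the table name and attributes are assumptions):

    import time
    import boto3

    sessions = boto3.resource("dynamodb").Table("sessions")  # hypothetical table

    # Write: one item per session, keyed by session id.
    sessions.put_item(Item={
        "session_id": "abc123",
        "user_id": "u42",
        "expires_at": int(time.time()) + 3600,
    })

    # Read: a simple key lookup - no query planner, no joins.
    item = sessions.get_item(Key={"session_id": "abc123"}).get("Item")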

PostgreSQL Replication Tools

On the PostgreSQL wiki, the "Replication, Clustering, and Connection Pooling" page ( http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling ) gives the following example of a replication requirement:
"Your users take a local copy of the database with them on laptops when they leave the office, make changes while they are away, and need to merge those with the main database when they return. Here you'd want an asynchronous, lazy replication approach, and will be forced to consider how to handle conflicts in cases where the same record has been modified both on the master server and on a local copy"
And that's pretty much my case. But, unfortunately, the same page says: "(...) A great source for this background is in the Postgres-R Terms and Definitions for Database Replication. The main theoretical topic it doesn't mention is how to resolve conflict resolution in lazy replication cases like the laptop situation, which involves voting and similar schemes."
What I want to know is where I can find material on how to resolve this kind of situation, and which would be the best way to do it in PostgreSQL.
I will have to check into rubyrep, but it seems like Bucardo might be a more widely supported option.
Gabriel Weinberg has an EXCELLENT tutorial on his site for how he uses Bucardo. The guy runs his own search engine called DuckDuckGo and there are quite a few tips and tricks that are optimized for his use cases.
http://www.gabrielweinberg.com/blog/2011/05/replicating-postgresql-with-bucardo.html
Just answering my own question, if anyone ever finds it: I'm using Rubyrep http://www.rubyrep.org/ and it's working.

Synchronizing Applications

I have a standalone network device. It needs to be reworked to function as part of a geographically distributed group of these devices. Synchronization between devices in the group need not occur frequently - no more than hourly. The application is Rails with SQLite.
Mainly, we want to keep certain pieces of information collected on these devices in sync. Because of the deployment, it isn't feasible to add a large database cluster.
I have been considering CouchDB, since replication - and handling the conflicts that result from it - is one of its strong suits.
What do you think of CouchDB as a mechanism to keep distributed network devices synchronized? Any thoughts or suggestions for an alternative approach?
What is the particular question?
CouchDB implements master-master replication, which is exactly what you are asking for.
CouchDB would be a great fit for this, because as you say, it has master-master replication. Since you're replicating over the WAN, another huge add is that CouchDB was designed to handle going on and off the network gracefully, which will be a nice piece of fault tolerance.
A lot of people have used CouchDB for this type of situation. Take a look at some case studies (http://www.couchbase.com/customers/case-studies) and a recent blog post I wrote about using CouchDB to keep front end servers' session data synchronized (weblog.bocoup.com/storing-php-sessions-in-couchdb).
Also, it would help if you posted more information about your use case so that we can tailor our answers.
Cheers.
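For reference, kicking off replication between two devices is a single HTTP call to the _replicate endpoint; a sketch using Python's requests library (hostnames and the database name are placeholders):

    import requests

    # Continuous pull replication from a peer device into the local node.
    requests.post(
        "http://localhost:5984/_replicate",
        json={
            "source": "http://peer-device.example:5984/device_data",
            "target": "device_data",
            "continuous": True,  # keep replicating as changes arrive
        },
    ).raise_for_status()

Run the same call on each device, pointing at its peers, and you have the master-master topology; conflicting writes are kept as conflict revisions for the application to resolve later.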
CouchDB is fine. You might have some alternatives with Unix tools.
The simplest key/value database is files in a filesystem, and they work great. If you only need key/value storage with basic replication, then rsync can do that. If your conflict resolution policy is, for example, to always take the data with the latest timestamp, then you might get away with rsync (see the sketch below).
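To make that concrete: rsync's --update flag never overwrites a file that is newer on the receiving side, which is exactly a "latest timestamp wins" policy, and running it in both directions converges two nodes. A sketch (the hostname and paths are placeholders):

    import subprocess

    LOCAL = "/var/lib/kvstore/"                    # one file per key
    REMOTE = "peer.example.com:/var/lib/kvstore/"

    # --update: skip files that are newer on the receiver, so after both
    # passes each key holds its most recently modified value.
    for src, dst in [(LOCAL, REMOTE), (REMOTE, LOCAL)]:
        subprocess.run(["rsync", "-a", "--update", "-e", "ssh", src, dst], check=True)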
First of all, you're probably running Unix/Linux, so SSH and rsync will already be included, unlike CouchDB.
Another advantage of rsync (or rather, its SSH tunnel) is of course identification, authentication, and authorization. Your device is presumably Unix/Linux, and there are a million ways to wire up Unix authorization. It's not a guarantee, but nearly anything is doable: password files, NIS, LDAP, Kerberos, Samba/Active Directory. The list goes on.
With Couch you will have to figure out some kind of user management system.
Will you use oauth?
Will you have to write an authentication plugin?
Will you also replicate the _users database around? What about conflicts in the _users database?
Do you instead have a central _users database? How can you have a central users database if you can't have a central data database?
Couch, like MySQL, is a full-blown server. It will carry a maintenance load that rsync won't:
Remember to compact your databases, compact your views, and run view cleanup
Remember to rotate the log files
Possibly back up your .couch files and your .ini config
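Those chores are themselves just HTTP calls, so they are easy to put in cron; a sketch against the CouchDB 1.x-era API (the database and design-document names are placeholders):

    import requests

    BASE = "http://localhost:5984/device_data"
    HDRS = {"Content-Type": "application/json"}

    requests.post(f"{BASE}/_compact", headers=HDRS).raise_for_status()       # compact the database
    requests.post(f"{BASE}/_compact/app", headers=HDRS).raise_for_status()   # compact the 'app' design doc's views
    requests.post(f"{BASE}/_view_cleanup", headers=HDRS).raise_for_status()  # drop obsolete view indexes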
In other words, can you do a quick and dirty rsync hack, or do you need the full Couch package?
CouchDB is a uniform, consistent platform regardless of OS. That can be good or bad. Not knowing your specifics, I would guess that rsync over SSH is the best short-term, but Couch is the best long-term. (But with so many software projects, long-term never seems to arrive.)