On the PostgreSQL wiki, the "Replication, Clustering, and Connection Pooling" page (http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling) gives the following example of a replication requirement:
"Your users take a local copy of the database with them on laptops when they leave the office, make changes while they are away, and need to merge those with the main database when they return. Here you'd want an asynchronous, lazy replication approach, and will be forced to consider how to handle conflicts in cases where the same record has been modified both on the master server and on a local copy"
And that's pretty much my case. But, unfortunately, on the same page, it says: "(...) A great source for this background is in the Postgres-R Terms and Definitions for Database Replication. The main theoretical topic it doesn't mention is how to resolve conflict resolution in lazy replication cases like the laptop situation, which involves voting and similar schemes."
What I want to know is where I can find material on how to resolve this kind of situation, and which would be the best way to do it in PostgreSQL.
I will have to check into RubyRep but it seems like Bucardo might be a more widely supported option.
Gabriel Weinberg has an EXCELLENT tutorial on his site for how he uses Bucardo. The guy runs his own search engine called DuckDuckGo and there are quite a few tips and tricks that are optimized for his use cases.
http://www.gabrielweinberg.com/blog/2011/05/replicating-postgresql-with-bucardo.html
Just answering my own question, if anyone ever finds it: I'm using Rubyrep http://www.rubyrep.org/ and it's working.
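For completeness, the heart of any such setup is the conflict policy. Here is a toy sketch of the simplest one, last-write-wins, assuming every row carries a modified-at timestamp (the data layout is made up; rubyrep and Bucardo offer this style of policy among others):

    # Toy illustration of last-write-wins conflict resolution.
    # Each side maps primary key -> (modified_at, row_data).
    def merge(master_rows, laptop_rows):
        merged = dict(master_rows)
        for pk, (ts, row) in laptop_rows.items():
            if pk not in merged or ts > merged[pk][0]:
                merged[pk] = (ts, row)  # the laptop's change is newer: it wins
        return merged

    master = {1: (100, "edited at HQ"), 2: (50, "untouched")}
    laptop = {1: (90, "older laptop edit"), 2: (120, "newer laptop edit")}
    print(merge(master, laptop))
    # -> {1: (100, 'edited at HQ'), 2: (120, 'newer laptop edit')}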
I have a PostgreSQL 9.3 two-node cluster with a warm-standby (read-only) slave. There are around 30 individual databases with a few hundred total tables and 1.3 TB of raw data. I'd really like the internets to have full access to these tables and to allow folks to write arbitrary queries against them. The main reason is my ignorance and incompetence with setting up useful things like REST services, etc...
So I suppose one approach would be to simply allow PostgreSQL TCP connections to the warm-standby host as a user with very limited SELECT permissions, and perhaps that is what I should do?
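If I went that route, I imagine the locked-down user would look something like this (a sketch only; the role name, password, and host are made up, and since the standby is read-only the role has to be created on the master and left to replicate over):

    # Sketch: create a SELECT-only role on the master (it replicates to the
    # standby). psycopg2 is just the driver I happen to know; plain psql works too.
    import psycopg2

    DDL = """
    CREATE ROLE webguest LOGIN PASSWORD 'public' CONNECTION LIMIT 20;
    GRANT CONNECT ON DATABASE mydb TO webguest;
    GRANT USAGE ON SCHEMA public TO webguest;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO webguest;
    ALTER ROLE webguest SET statement_timeout = '30s';
    """

    conn = psycopg2.connect("host=master.example.com dbname=mydb user=postgres")
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute(DDL)
    conn.close()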
Another approach would be to have some simple JSON(P) emitting service that simply takes a database and query string, then returns results?
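A rough sketch of what I have in mind for that, assuming Python with Flask and psycopg2 (all names here are invented, and this is not production code): run the query as the limited user in a read-only transaction with a timeout and a row cap, then return JSON.

    # Sketch of a query-to-JSON service.
    from flask import Flask, jsonify, request
    import psycopg2

    app = Flask(__name__)
    ALLOWED_DBS = {"db1", "db2"}  # whitelist, since the caller names the database

    @app.route("/query")
    def query():
        db = request.args.get("db", "")
        sql = request.args.get("q", "")
        if db not in ALLOWED_DBS:
            return jsonify(error="unknown database"), 400
        conn = psycopg2.connect(host="standby.example.com", dbname=db,
                                user="webguest", password="public")
        try:
            with conn.cursor() as cur:
                # Belt and braces on top of the SELECT-only role; the hot
                # standby itself also rejects writes.
                cur.execute("SET TRANSACTION READ ONLY")
                cur.execute("SET statement_timeout = '30s'")
                cur.execute(sql)
                cols = [d[0] for d in cur.description]
                rows = cur.fetchmany(1000)  # cap the response size
            return jsonify(columns=cols, rows=rows)  # assumes JSON-friendly values
        except psycopg2.Error as e:
            return jsonify(error=str(e)), 400
        finally:
            conn.close()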
And I suspect you'll have a better approach, so that's why I am here :)
In general, I am not worried if the internets overrun this host with load and DoS it. I just don't want it to become a security liability or to allow some method of deleting data on the warm-standby host. The machine would be there for use, and if there are naughty users, too bad for the others, I guess. If it gets popular, I could set up more read-only hosts anyway...
Thanks in advance for your thoughts, and for those that say I just need to grit my teeth and figure out how to properly provide web services for the data. My main languages are PHP and Python, so if you have ideas of tools for those languages...
There is a site, SQL Fiddle, that allows simple querying of different databases. Its code is open source and available on GitHub.
You can try to adapt the code to your needs.
About six months ago we started using AccuRev with JIRA for our source control and issue management, but there are some obvious problems, chiefly a lack of security: everyone can pretty much do anything, like locking and unlocking streams or changing streams belonging to anyone else. On top of that, the default email trigger that ships with AccuRev is not very good.
AccuRev allows for pre-create, pre-keep, pre-promote, and server-post-promote trigger phases, and I've decided to use those to help me manage some of the Wild West of AccuRev development. I'll stick with Perl, since that is what they used for the original trigger, and will post mine here later. But before I start, I was wondering: if someone has already had this problem, how did you solve it? If you could post some of your triggers here, or ideas for triggers and what can be managed through them, it would be greatly appreciated.
Disclosure: I've worked for AccuRev for almost 7 years. It's mostly by design that AccuRev starts off in an open development model. The goal is to enable whatever process you as an organization want to adopt. Some companies flourish in this wide-open model; others have very stringent requirements and lock things down tightly using a combination of GUI features and the triggers you mentioned.
The sample triggers we ship with the product provide a solid framework for advanced process security. For example, one of the default clauses in the server_admin_trig.pl is that you can't change someone else's workspace. Typically, companies will work with AccuRev at initial implementation time (or any time later, or on their own) to determine what level of customization, if any, is needed for these triggers.
Sounds like you have the requisite Perl experience to set up whatever you need, but can you give an example of some behavior you'd like to control, and perhaps I can post a sample? As a sidebar, please feel free to contact me using my username # accurev dot com and I'm sure we can find some way to assist.
Regards,
~James
Since AccuRev 5.7, there has been a very good example server_admin_trigger.pl in the distribution, with all the commands covered and described. Once we upgraded to 5.7, getting the admin trigger done just the way I wanted was a breeze.
I want to build a web app similar to Reddit.com, with multiple levels of comments and lots of reads and writes. I was wondering whether NoSQL, and MongoDB in particular, is the right tool for this?
Comments are really a natural fit for a NoSQL database, no doubt: you avoid multiple self-joins, which means your system can scale out!
With MongoDB you can store the whole hierarchy within one document. Some people will say that there will be problems with atomic updates, but I don't think that's an issue, because you can load and save back the entire comments tree. In any case, you can easily redesign your system later to support atomic updates and avoid concurrency issues.
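A minimal sketch of that load-and-save-back approach, assuming the pymongo driver (the document layout is just an example):

    # The whole comment tree lives inside the post document.
    from pymongo import MongoClient

    posts = MongoClient()["forum"]["posts"]
    posts.insert_one({
        "_id": "post1",
        "title": "Hello",
        "comments": [
            {"author": "alice", "text": "First!", "replies": [
                {"author": "bob", "text": "Welcome.", "replies": []},
            ]},
        ],
    })

    def add_reply(post_id, path, reply):
        """path = indexes leading to the parent: [0] = first top-level
        comment, [0, 0] = its first reply, and so on."""
        doc = posts.find_one({"_id": post_id})
        node = doc["comments"][path[0]]
        for i in path[1:]:
            node = node["replies"][i]
        node["replies"].append(reply)
        # Save the entire tree back; concurrent writers can race here,
        # which is the atomicity caveat mentioned above.
        posts.replace_one({"_id": post_id}, doc)

    add_reply("post1", [0, 0],
              {"author": "carol", "text": "Hi Bob!", "replies": []})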
Reddit itself uses Cassandra. If you want something "similar to reddit.com," maybe you should look at their source -- https://github.com/reddit/reddit/wiki.
Here's what David King (ketralnis) said earlier this year about the Cassandra 0.7 release: "Running any large website is a constant race between scaling your user base and scaling your infrastructure to support it. Our traffic more than tripled this year, and the transparent scalability afforded to us by Apache Cassandra is in large part what allowed us to do it on our limited resources. Cassandra v0.7 represents the real-life operations lessons learned from installations like ours and provides further features like column expiration that allow us to scale even more of our infrastructure."
However, Rick Branson notes that Reddit doesn't take full advantage of Cassandra's features, so if you were to start from scratch, you'd want to do some things differently.
I have a standalone network device. It needs to be reworked to function as part of a geographically distributed group of these devices. Synchronization between devices in the group need not occur frequently; no more than hourly. The application is Rails with SQLite.
Mainly, we want to keep certain pieces of information collected on these devices in sync. Because of the deployment, it isn't feasible to add a large database cluster.
I have been considering CouchDB, since replication, and handling the conflicts that result from it, is one of its strong suits.
What do you think of CouchDB as a mechanism to keep distributed network devices synchronized? Any thoughts or suggestions for an alternative approach?
What is the particular question?
CouchDB implements master-master replication, which is exactly what you are asking for.
Or?
CouchDB would be a great fit for this, because as you say, it has master-master replication. Since you're replicating over the WAN, another huge add is that CouchDB was designed to handle going on and off the network gracefully, which will be a nice piece of fault tolerance.
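To make that concrete, here is a minimal sketch using CouchDB's standard /_replicate endpoint (hostnames and the database name are placeholders); run it in both directions, e.g. from an hourly cron job, and you have master-master sync:

    # Trigger one replication pass in each direction between two devices.
    import requests

    A = "http://device-a.example.com:5984"
    B = "http://device-b.example.com:5984"
    DB = "devicedata"

    def replicate(source, target):
        # Node A mediates; CouchDB happily replicates remote-to-remote.
        r = requests.post(A + "/_replicate",
                          json={"source": source + "/" + DB,
                                "target": target + "/" + DB})
        r.raise_for_status()
        return r.json()

    replicate(A, B)
    replicate(B, A)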
A lot of people have used CouchDB for this type of situation. Take a look at some case studies (http://www.couchbase.com/customers/case-studies) and a recent blog post I wrote about using CouchDB to keep front end servers' session data synchronized (weblog.bocoup.com/storing-php-sessions-in-couchdb).
Also, it would help if you posted more information about your case, so that we can tailor our answers.
Cheers.
CouchDB is fine. You might have some alternatives with Unix tools.
The simplest key/value database is files in a filesystem. They work great. If you only need key/value storage with basic replication, then rsync can do that. If your conflict resolution policy is, for example, always take the latest timestamped data, then you might get away with rsync.
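For instance, here's a sketch of that latest-timestamp-wins policy with nothing but rsync over SSH (paths and the hostname are made up): --update skips files that are newer on the receiving side, so pushing in both directions leaves the most recently modified copy of each file everywhere.

    # One sync pass between this device and a peer; run from cron.
    import subprocess

    LOCAL = "/var/lib/mydevice/data/"           # one file per key
    REMOTE = "peer.example.com:/var/lib/mydevice/data/"

    for src, dst in [(LOCAL, REMOTE), (REMOTE, LOCAL)]:
        # -a preserves mtimes, which --update relies on for "newest wins"
        subprocess.run(["rsync", "-a", "--update", src, dst], check=True)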
First of all, you're probably running Unix/Linux. SSH and rsync will be included, unlike CouchDB.
Another advantage of rsync (actually its SSH tunnel) is of course identification, authentication, and authorization. Your device is presumably Unix/Linux, and there are a million ways to wire up Unix authorization. It's not a guarantee but nearly anything is doable: password files, NIS, LDAP, Kerberos, Samba/Active Directory. The list goes on.
With Couch you will have to figure out some kind of user management system.
Will you use oauth?
Will you have to write an authentication plugin?
Will you also replicate the _users database around? What about conflicts in the _users database?
Do you instead have a central _users database? How can you have a central users database if you can't have a central data database?
Couch, like MySQL, is a full-blown server. It will have maintenance load that rsync won't:
- Remember to compact your databases, compact your views, and run view cleanup.
- Remember to rotate the log files.
- Possibly back up your .couch files and your .ini config.
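The compaction items, at least, are easy to script against CouchDB's standard HTTP API; a sketch (host and database name assumed):

    # Compact the database, compact each design document's views,
    # then clean up orphaned view index files.
    import requests

    COUCH = "http://localhost:5984"
    DB = "devicedata"
    JSON = {"Content-Type": "application/json"}

    requests.post(COUCH + "/" + DB + "/_compact", headers=JSON).raise_for_status()

    design_docs = requests.get(COUCH + "/" + DB + "/_all_docs",
                               params={"startkey": '"_design/"',
                                       "endkey": '"_design0"'}).json()["rows"]
    for row in design_docs:
        ddoc = row["id"].split("/", 1)[1]
        requests.post(COUCH + "/" + DB + "/_compact/" + ddoc,
                      headers=JSON).raise_for_status()

    requests.post(COUCH + "/" + DB + "/_view_cleanup",
                  headers=JSON).raise_for_status()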
In other words, can you do a quick and dirty rsync hack, or do you need the full Couch package?
CouchDB is a uniform, consistent platform regardless of OS. That can be good or bad. Not knowing your specifics, I would guess that rsync over SSH is the best short-term, but Couch is the best long-term. (But with so many software projects, long-term never seems to arrive.)
I have found a lot of topics about stress-testing web applications.
My goal is different: to test only the database (Sybase SQL Anywhere 9).
What I need:
A tool to diagnose all the SQL statements and find bottlenecks. I wish I could easily get a macro-level view of the entire system.
Best practices for designing and building good SQL queries.
The system's characteristics are:
20GB database size.
2-5 requests per second.
Thousands of SQL statements spread through the code (this mess can only be solved by rewriting the system).
The quickest way would actually be to upgrade your SQL Anywhere to v10 or (better) v11, as the latest releases include a complete performance diagnostic toolset. See the documentation here for more details.
Several open-source tools are listed here:
http://www.opensourcetesting.org/performance.php