Arangodb remote replication - nosql

I am reading about Arangodb and it is what I want to use for a new startup. I am confused on the asynchronous replication. Can I do asynchronous replication without having the enterprise edition? I will likely have multiple machines receiving a read only copy for backup and maybe in different locations. The enterprise edition talks about datacenter to datacenter replication and so I am confused. Can I get a read only remote asynchronous replicant with the community edition?
Thanx

You can do remote replication the way you describe using the documented methods here:
https://docs.arangodb.com/3.3/Manual/Administration/Replication/
This is not per se an enterprise feature. The enterprise feature dc2dc just does the whole operational setup and runtime for you. Is there a specific reason, why you would like to stay clear of it? It is free for evaluation.
The evaluations T&C are here: https://www.arangodb.com/customer-agreement

Related

Percona PSMDB and MongoDB nodes in single replica set?

tokumx and mongodb are incompatible; you couldn't build a mixed replica-set because they had different storage engines and spoke different replication languages. But PSMDB seems to have closed this gap (with pluggable storage engines, at least, which can allow wiredTiger). Does this mean they can now also be mixed (i.e. have the differences in replication-language also been rectified?) I ask because I've got a very old tokumx system with important data on it and MUST bring it into a mongodb cluster, but there seems to be no simple way to do this. If I can migrate tokumx->PSMDB->mongodb, that would be fantastic! Any help would be appreciated!
I've got a very old tokumx system with important data on it and MUST bring it into a mongodb cluster, but there seems to be no simple way to do this.
TokuMX's replication protocol is not compatible with MongoDB or Percona servers, so migrating from TokuMX will unfortunately require dumping and restoring your data. Outside of replication, there are also some incompatible TokuMX index options to remove before restoring into MongoDB.
See Migrate from TokuMX to Percona Server for migration approaches & scripts to help with this.
If I can migrate tokumx->PSMDB->mongodb, that would be fantastic!
If your goal is to migrate to MongoDB Community or Enterprise edition, an intermediate migration via PSMDB will not provide any benefit. PSMDB uses replication code from the upstream MongoDB community server, but does not provide any special migration path from TokuMX.

MongoDB on Azure Cloud

Is MongoDB for Azure production ready ?
Can anyone share some experience with it ?
Looks like comfort is missing for using it for prod.
What do you think ?
Edit: Since there is a misunderstanding in my question i will try to redefine it.
The information i look into from the community is sharing an info of someone who is running mongo on windows azure to share experience from it.
What i mean by experience is not how to run it in the cloud(we already have the manual on 10gens faq) nor how many bugs it have(we can see that in mongo-azure jira).
What i am looking for is that how it is going with performance ?
Are there any problems(side effects) from running mongodb on azure ?
How does mongodb handle VM recycling ?
Does anyone tried sharding ?
In the end, is the mongo-azure worker role from 10gens stable for using it in production ?
Hope this clears out.
A bit of clarification here. MongoDB itself is production-ready. And MongoDB works just fine in Windows Azure, as long as you set up the scaffolding to get it to work in the environment. This typically entails setting up an Azure Drive, to give you durable storage. Alternatively, using a replicaset, you effectively have eventual consistency across the set members. Then, you could consider going with a standalone (or standalone with hot standby). Personally, I prefer a replicaset model, and that's typical guidance for production MongoDB systems.
As far as 10gen's support for Windows Azure: While the page #SyntaxC4 points to does clarify the wrapper is in a preview state, note that the wrapper is the scaffolding code that launches MongoDB. This scaffolding was initially released in December 2011, and has had a few tweaks since then. It uses the production MongoDB bits (and works just fine with version 2.0.5 which was published on May 9). One caveat is that the MongoDB replicaset roles are deployed alongside your application's roles, since the client app needs visibility to all replica set nodes (to properly build the set). To avoid this limitation, you'd need to run mongos and the entry point (and that's not part of 10gen's scaffolding solution).
Forgetting the preview scaffolding a moment: I have customers running MongoDB in production, with custom scaffolding. One of them is running a rather large deployment, with multiple shards, using a replicaset per shard.
So... does it work in Windows Azure? Yes. Should you take advantage of 10gen's supplied scaffolding? If you're just looking for a simple way to launch a replicaset, I think it's fine. If you want a standalone model, or a shard model, or if you need a separate deployment for MongoDB, you'd currently need to do this on your own (or modify the project 10gen published).
MongoLab is now offering Mongo as a service on Azure MongoLab Blog
Free Demo account is 0.5 GB storage are available in the Windows Azure Store
The warning message on their site says that it's a preview. This would mean that there would be no support for it at a product level in Windows Azure.
If you want to form your own opinion on a comfort level, you can take a look at their bug tracking system and get a feeling for what people are currently reporting as issues.

Which PostgreSQL replication solution to use for my specific scenario

I need to replicate a PostgreSQL database server as follows:
Two servers are adjacent to each-other - one is the master and the other standby. If the master fails, the standby takes over. Replication from master to slave needs to be failsafe, hence, synchronous. The standby will not be used for any querying unless it has become a master. So, no high-availability/load-balancing is required.
There is another backup server at a remote location. Data from the master server mentioned above will be replicated to this remote server asynchronously and in batches. Time is not a factor at all in this replication - a couple of hours is just fine. This server would be used just for backup.
I've studied the currently available replication solutions from the PostgreSQL docs as well as from Google, but can't decide which combination of synchronous-asynchronous solutions would I need.
The closest I came up with is using pgpool-II for scenario 1 and Mammoth for scenario 2. However, as pgpool is statement-based, what would happen to queries containing rand() and now()?
Please note that I'd rather use free and open-source replication tools.
Also, just a side question - according to scenario 1 above, when the master fails, the standby will take over. Would the master-slave role be reversed after that, or would after the recovery of the master server the slave would go back to its standby state?
Any suggestion would be highly appreciated. Thanks.
I suggest using DRBD for scenario 1 and either 9.0 built-in replication or Slony for scenario 2.
Before PostgreSQL 9.1 (not yet released), there is no other synchronous replication solution available, and DRBD is widely established for this purpose. Together with Pacemaker or Heartbeat, which come with all the scripts needed for PostgreSQL monitoring and switchover, you have a very robust and fairly easy to manage solution. (In fact, I'd consider continuing to use DRBD even after 9.1 comes out; it's just a lot easier and has a longer track record.)
For the cross-site asynchronous, you could try the built-in replication of PostgreSQL 9.0, perhaps in conjunction with repmgr for monitoring and management. Alternatively, you could try the (now a bit) old-school Slony, but I'd guess it will more complicated for your needs.
You didn't mention if the server in question was on a specific version or if this was a new project with the freedom to choose the version. The answers vary based on that information.
If you are starting with a clean slate, I would recommend designing based on the PostgreSQL 9.1 beta. The final version will be released long before you would be ready to go into a production environment and it has binary synchronous replication built-in.
I've been using the built-in asynchronous replication in PostgreSQL for years in almost the exact same scenario you describe and it has always been rock-solid for me. It's become even better with 9.0 with Hot standby and it's become much easier to configure and maintain. 9.1 provides the only missing piece you require.
However, if you are trying to replicate an existing server, built-in asynchronous replication with aggressive settings for "checkpoint_timeout" a very frequent backup of unarchived WAL files could be sufficient until you can upgrade to 9.1.
The bottom line here is that you can get exactly what you want is with stock PostgreSQL 9.1--no third-party products required.
As for failover, it is not an automatic process, you'll need to handle that yourself. I would recommend that after a failover, switching the roles of the two machines until either the next failover event or until a controlled manual failover during a scheduled outage during a slow period of use. Again, this is not automatic and much be managed by the administrator (via shell scripts, presumably).

Does azure support things like mongodb and redis?

Can you use mongodb and redis/memcached with azure?
I'm guessing no but just want to make sure.
It turns out they do support things other than .net, are they using linux servers then?
You can very easily run mongodb in Windows Azure. I presented this at MongoSV - video here.
EDIT: In December 2011, 10gen published their official MongoDB+Azure code on github. This contains a project for replica-sets, as well as a demo ASP.NET MVC application (taken from the Windows Azure Platform Training Kit) that uses a replica set for its storage.
Standalone servers are straightforward, except you have to deal with scale-out: you can't have multiple instances of a standalone server simultaneously, so you'll need to plan for this: take all but one out of the load balancer, or only launch mongod if you can acquire the Cloud Drive lock.
Replicasets are doable, as I demonstrated at MongoSV. However, I didn't cover the intricacies of graceful shutdown of a replicaset to ensure zero data loss.
You can run memcached as well - see David Aiken's post about this. Note: Now that the AppFabric Cache service is live, you should look into the pros/cons of using that over memcached. Cost-wise, AppFabric Cache should run much less, as you don't have to pay for role instances to host your cache. More info about AppFabric Cache here.
You now also have the option of running Redis in Windows Azure on Linux virtual machines ! In the case of Redis, this would allow you to use the "official" build instead of the "unsupported" Windows build ... For MongoDB, both choices seem equally valid (running on Linux virtual machines, "plain" Windows virtual machines, or using 10gen's package to run on "managed" VMs (Cloud Services).
FYI, there's now a Redis installer for Windows Azure available from MS Open Tech (my team). Here's a tutorial on how to use it: http://ossonazure.interoperabilitybridges.com/articles/how-to-deploy-redis-to-windows-azure-using-the-command-line-tool
[UPDATE] Azure now supports MongoDB and Redis.
http://azure.microsoft.com/blog/2014/04/22/announcing-new-mongodb-instances-on-microsoft-azure/
http://azure.microsoft.com/en-us/services/cache/
In the Azure Store you can now select Redis Cloud as an add-on.
Heres the Azure store description:
"Redis Cloud is a fully-managed cloud service for hosting and running Redis in a highly-available and scalable manner, with predictable and stable top performance. Tell us how much memory you need and get started instantly with your new Redis database."
PUBLISHED DATE 3/31/2014
You can access the store by selecting the "New" button in the Azure portal then "Store". I have yet to use it but it looks promising.
Azure now has a first-party Redis service, currently in preview:
http://azure.microsoft.com/en-us/documentation/articles/cache-dotnet-how-to-use-azure-redis-cache/

Load Balancing and Failover for Read-Only PostgreSQL Database

Scenario
Multiple application servers host web services written in Java, running in SpringSource dm Server. To implement a new requirement, they will need to query a read-only PostgreSQL database.
Issue
To support redundancy, at least two PostgreSQL instances will be running. Access to PostgreSQL must be load balanced and must auto-fail over to currently running instances if an instance should go down. Auto-discovery of newly running instances is desirable but not required.
Research
I have reviewed the official PostgreSQL documentation on this issue. However, that focuses on the more general case of read/write access to the database. Top google results tend to lead to older newsgroup messages or dead projects such as Sequoia or DB Balancer, as well as one active project PG Pool II
Question
What are your real-world experiences with PG Pool II? What other simple and reliable alternatives are available?
PostgreSQL's wiki also lists clustering solutions, and the page on Replication, Clustering, and Connection Pooling has a table showing which solutions are suitable for load balancing.
I'm looking forward to PostgreSQL 9.0's combination of Hot Standby and Streaming Replication.
Have you looked at SQL Relay?
The standard solution for something like this is to look at Slony, Londiste or Bucardo. They all provide async replication to many slaves, where the slaves are read-only.
You then implement the load-balancing independent of this - on the TCP layer with something like HAProxy. Such a solution will be able to do failover of the read connections (though you'll still loose transaction visibility on a failover, and have to start new transaction on the new slave - but that's fine for most people)
Then all you have left is failover of the master role. There are supported ways of doing it on all these systems. None of them are automatic by default (because automatic failover of a database master role is really dangerous - consider the situation you are in once you've got split brain), but they can be automated easily if the requirement needs this for the master as well.