In relational databases I would just pop in W3Schools tutorial, install mysql in my machine and start practicing! How can I learn non relational databases in a similar way? In most tutorials I read that these databases work with multiple nodes and data centers.
Does this mean that I will be unable to learn and practice, say Cassandra, using my own single pc?
You do it just like you do it with mySQL: You set up a database on your local machine and start experimenting.
Most database systems which focus on sharding and clustering also work as a stand-alone instance. But when you want to test these features specifically, you can often run multiple instances on the same machine. When you also want to try how they behave when they run on different machines, you can use a virtualization software like VMWare or VirtualBox to set up a bunch of virtual machines and build your virtual datacenter on your desktop.
(I would recommend VMWare for business use and VirtualBox for home use)
I'm a big fan of MongoDB. It's the NoSQL equivalent of MySQL.
Go to the Try It Out link on their home page and you can actually use it in a live session on their website - no download, no configuration, no hassle! Just use it and learn the basics.
Here's the quick start for Cassandra. http://wiki.apache.org/cassandra/GettingStarted
I don't see any reason you couldnt run that from local host. I think the point is that you Can scale these nosql solutions. Might want to check out mongodb or couchdb as well. Easy set up and both are great nosql solutions in my experience.
I would strongly suggest using something like Amazon EC2 for testing NoSQL solutions. You can definitely install a technology like MongoDB locally and create a replica set, but you should definitely put these on different physical machines if you can.
I have installed things like AppFabric, Couchbase and Mongo locally and created clusters and they always work really well locally. It's very easy because the networking part of it always goes smoothly.
Once you introduce two physical machines and a stronger network partition things get difficult.
You can create instances on EC2 for free last I checked if you use their Micro instances. You'll learn a lot.
Related
How does PostgreSQL handle running multiple servers on different machines using a shared data directory? Does it automatically handle this under-the-hood without problems? Is it possible, but requiring some special configuration? Or is this a bad idea in general?
I'm doing some data science on high performance machine cluster, where I submit jobs, the job is run by a random machine, and each machine has access to a shared network drive. Currently, I'm using SQLite, where this use-case works fine. A single shared SQLite database file can handle multiple connections from different machines without trouble.
I'm now attempting to switch over to PostgreSQL. Intercommunication between the machines of the cluster is surprisingly not straightforward. So while the immediate solution should be having one server which all the other machines connect to, this might not end up being practical. Ideally, I could just continue doing what I've been doing with the SQLite setup. That is, have each machine run it's own PostgreSQL server, which then connects to the shared databases.
No, no, no and yes.
A PostgreSQL installation ("cluster" is the term used in the manuals) expects to be in charge of all of its files. It carefully coordinates access between multiple processes accessing those files. You are supposed to access PostgreSQL in a client/server manner over a socket (unix if local, tcp if not).
This is not supported with PostgreSQL. It will lead to corruption and data loss. If you can't simplify your networking, then you best stick to SQLite. (Assuming it is actually safe with SQLite, something I haven't verified)
How to run MongoDB on WindowsAzure? Should instance be deployed on a virtual machine? Are there any out-of-the-box solutions like images for virtual machines or anything else? How to run replica sets on WindowsAzure?
I saw this article http://docs.mongodb.org/ecosystem/platforms/windows-azure/ but I feel like it is already out of date. Is it?
Any best practices, help or info would be appreciated!
The article that you refer to describes the options quite well. You have three options:
Running MongoDB in worker roles (as linked to in the article). Before Azure VMs, worker roles were the only option, but I wouldn't recommend it.
You can try the MongoDB database as as service offerings that are available in the add-ons store. This would be a good way to try it out. For longer term, you will have to ask around for peoples' experience.
I recommend that you run MongoDB on a Linux VM. That way you have full control and support from the linux/MongoDB community. Replica sets would the be 'out the box'. The article links to a walkthrough on a CentOS image. You can also get a pre-built image from VMDepot such as this Ubuntu one. The VMDepot images seem to work very well and are a good start for people with less Linux experience.
Edit: MongoLab seems to be gaining traction, and is getting support from Scott Guthrie. As a service that has affinity with Azure datacentres, it is worth evaluating.
You can use MongoLab - Here goes the Tutorial on Azure
Using MongoLab all the maintenance (atleast in DB engine itself) will be taken care by MongoLab guys. That will remove lot of maintenance overheads on your side.
I have an account with MongoLab for MongoDB and the constant calls to this remote server from my app slow it down quite a lot. When I run the app locally on my computer with a local version of Mongod and MongoDB it's far, far faster, as would be expected.
When I deploy my app (running on Node/Express) it will be run from a VPS on CentOS. I have plenty of storage space available on my VPS, are there any major downsides to running MongoDB locally rather than remotely on Mongolab?
Specs of the VPS:
1024MB RAM
1024MB VSwap
4 CPU Cores # 3.3GHz+
60GB SSD space
1Gbps Port
3000GB Bandwidth
Nothing apart from the obvious:
You will have to worry about storage yourself. MongoDB does tend to take a lot of disk space. upgrading storage will probably be harder to manage than letting Mongolab take care of it.
You will have to worry about making sure the Mongo server doesn't crash and it's running fine.
You will have scaling issues in the future once the load on your application and your database increases.
Using a "database-as-a-service" like Mongolab frees you from worrying about a lot of hardware/OS/system level requirements and configuration. Memory optimization? Which file system? Connection limits? Visualization and IO ops issues? (thanks to Nikolay for pointing that one out)
If your VPS provider doesn't account for local traffic in your bandwidth, then you can set up another VPS for MongoDB. That way, the server will be closer so the requests will be faster, and also, it will have the benefits of being an independent server. It still won't be fully managed like MongoLab though.
[ Edit: As Chris pointed out, MongoLab also helps you with your database schema design and bundles MongoDB support with their plans, so that's also nice. ]
Also, this is a good question, but probably not appropriate for StackOverflow. dba.stackexchange.com and serverfault.com are good places for this question.
I've got a group of servers that currently use both memcached and repcached side by side (listening on different ports). The memcached service is used to store local data that doesn't need to be shared. The repcached instance is used to allow pairs of servers to collaborate.
When I found Couchbase I was really excited because it looks like it would allow me to:
Make some data persistent
Share with more than two nodes
Leave most of my code as-is since it uses the memcached API
So I installed Couchbase but I've run into a problem--it doesn't look like there's a way to setup two clusters on the same server. I'd like one cluster that doesn't share with any other server and a second cluster that does share with other servers.
Yes, I could setup several dedicated servers for Couchbase to create different clusters but I've got plenty of CPU + ram to spare on the servers that are currently running memcached + repcached so I'd prefer to just replace those services with Couchbase.
Is it possible to run two instances of Couchbase on the same host? I realize I'd have to change some ports around. I just haven't seen anyone talking about doing anything like this so I'm thinking the answer is "no"... but I had to ask because it looks like Couchbase would be perfect for my needs.
If this won't work then I'd be interested in any alternative suggestions. For example, one idea I had was using Memcached + MemcacheDB to emulate a persistent non-shared Couchbase cluster. However, I don't like the fact that MemcacheDB doesn't support expiring records and I'd rather not have to write a routine to delete millions of records each month (and then wonder if performance will degrade over time).
Any thoughts would be appreciated. :-)
The best solution here is probably to run a single instance of Couchbase and create one memcached bucket and one Couchbase bucket. The memcached bucket won't have persistence and will function exactly like memcached. The other bucket will have persistence and supports the memcached api. You can create as many buckets as you want in a single Couchbase server.
Your other option is to virtualize and run a Couchbase server on each vm.
Is MongoDB for Azure production ready ?
Can anyone share some experience with it ?
Looks like comfort is missing for using it for prod.
What do you think ?
Edit: Since there is a misunderstanding in my question i will try to redefine it.
The information i look into from the community is sharing an info of someone who is running mongo on windows azure to share experience from it.
What i mean by experience is not how to run it in the cloud(we already have the manual on 10gens faq) nor how many bugs it have(we can see that in mongo-azure jira).
What i am looking for is that how it is going with performance ?
Are there any problems(side effects) from running mongodb on azure ?
How does mongodb handle VM recycling ?
Does anyone tried sharding ?
In the end, is the mongo-azure worker role from 10gens stable for using it in production ?
Hope this clears out.
A bit of clarification here. MongoDB itself is production-ready. And MongoDB works just fine in Windows Azure, as long as you set up the scaffolding to get it to work in the environment. This typically entails setting up an Azure Drive, to give you durable storage. Alternatively, using a replicaset, you effectively have eventual consistency across the set members. Then, you could consider going with a standalone (or standalone with hot standby). Personally, I prefer a replicaset model, and that's typical guidance for production MongoDB systems.
As far as 10gen's support for Windows Azure: While the page #SyntaxC4 points to does clarify the wrapper is in a preview state, note that the wrapper is the scaffolding code that launches MongoDB. This scaffolding was initially released in December 2011, and has had a few tweaks since then. It uses the production MongoDB bits (and works just fine with version 2.0.5 which was published on May 9). One caveat is that the MongoDB replicaset roles are deployed alongside your application's roles, since the client app needs visibility to all replica set nodes (to properly build the set). To avoid this limitation, you'd need to run mongos and the entry point (and that's not part of 10gen's scaffolding solution).
Forgetting the preview scaffolding a moment: I have customers running MongoDB in production, with custom scaffolding. One of them is running a rather large deployment, with multiple shards, using a replicaset per shard.
So... does it work in Windows Azure? Yes. Should you take advantage of 10gen's supplied scaffolding? If you're just looking for a simple way to launch a replicaset, I think it's fine. If you want a standalone model, or a shard model, or if you need a separate deployment for MongoDB, you'd currently need to do this on your own (or modify the project 10gen published).
MongoLab is now offering Mongo as a service on Azure MongoLab Blog
Free Demo account is 0.5 GB storage are available in the Windows Azure Store
The warning message on their site says that it's a preview. This would mean that there would be no support for it at a product level in Windows Azure.
If you want to form your own opinion on a comfort level, you can take a look at their bug tracking system and get a feeling for what people are currently reporting as issues.