Can Couchbase Server work with RAM memory only? - nosql

We want to take advantage of the No-Sql Databases in our applications, and we found out about Couchbase.
I've read about it in another Stack Overflow question, where somebody says that you can configure Couchbase to work with memcached only (so it saves data only in memory, not also on disk).
However, I haven't found anything about this in the documentation.
Is it possible to set up Couchbase Server to work with RAM only?
Or do you specify on the client side where the data should be saved (disk or memory)?

Yes, just use memcached buckets. That's all.

Check out: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-introduction-basics.html
The Couchbase 2.0 documentation explicitly describes it as an in-memory database. In my experience the buckets all live in RAM, and you can set the size of each bucket to partition your RAM appropriately.
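For completeness, here is a hedged sketch of creating such a RAM-only memcached bucket through the Couchbase 2.x REST API (the host, credentials, and bucket name are placeholders, not values from the question):

    # Sketch: create a memcached (RAM-only) bucket via the Couchbase REST API.
    import requests

    resp = requests.post(
        "http://127.0.0.1:8091/pools/default/buckets",
        auth=("Administrator", "password"),  # cluster admin credentials (placeholder)
        data={
            "name": "cache",            # bucket name (placeholder)
            "bucketType": "memcached",  # RAM-only bucket; nothing is persisted to disk
            "ramQuotaMB": 256,          # per-node RAM quota for this bucket
            "authType": "sasl",
            "saslPassword": "",
        },
    )
    resp.raise_for_status()  # the server answers 202 Accepted on success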

Related

Does memcache use replicas?

I know memcached uses consistent hashing to do sharding.
But does memcached also do replication, the way disk-based storage does?
I assume it doesn't, since losing one of the caching servers just means cache misses for that shard; it's not a single point of failure.
However, I still want to confirm.
No. Memcached does not support replication, nor does it store any data on disk. Everything is stored in memory. This is the main reason memcached is so fast.
Also, in a way, memcached is not really distributed either: it is the client that accounts for the multiple memcached servers, not the servers themselves. Each server is unaware that the others exist.
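To illustrate that client-side distribution, here is a minimal sketch using the python-memcached library (the server addresses are placeholders):

    # The client hashes each key to pick a server; the servers never talk to each other.
    import memcache

    mc = memcache.Client(["10.0.0.1:11211", "10.0.0.2:11211"])

    mc.set("user:42", {"name": "alice"})  # hashed to exactly one of the two servers
    print(mc.get("user:42"))              # the same hash routes the read to that server
    # If that server dies, this key is simply a cache miss -- no replica exists.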
If you want replication, you can take a look at repcache.
Redis is also a good alternative, which offers many more features.
Redis and Couchbase provide persistence; I'm not sure whether they are binary compatible with the memcached protocol.
For replication I suggest having a look at mcrouter. I wrote my own script for pre-seeding memcache instances for high availability. See https://symcbean.blogspot.com/2023/01/usable-memcache.html

Optimized environment for mongo

I have a RHEL Linux server (VM) with a 4-core processor and 8 GB of RAM, running the applications below:
- an Apache Karaf container
- an Apache Tomcat server
- an ActiveMQ server
- and the mongod server (either primary or secondary).
Often I see that mongo consumes nearly 80% of CPU. My CPU and memory usage overshoot most of the time, which makes me doubt whether my hardware configuration is too low for running this many components.
Please let me know if it is OK to run mongo like this on a shared server.
The question is too broad and the answer depends on too many variables, but I'll try to give you an overall sense of it.
Can you use all these services together on the same machine at minimum load? For sure. It's not clear where the other shards reside, but it will work either way. You didn't provide your HDD specs, which are quite important for a DB server, but again, it will work at minimum load.
Can you use this setup under heavy load? Not the best idea. Perhaps it's better to have separate servers handling these services.
Monitor overall server load: CPU, memory, IO. Check the mongo logs for slow queries. If your queries are supposed to run fast and they don't, you'll need more hardware.
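As a concrete illustration (a sketch, assuming pymongo and a reachable mongod; the connection details and threshold are placeholders), you can turn on MongoDB's built-in profiler instead of grepping the logs:

    # Enable the profiler for operations slower than 100 ms, then inspect them.
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["mydb"]

    db.command("profile", 1, slowms=100)  # level 1 = record only slow operations

    # Slow operations land in the capped system.profile collection.
    for op in db.system.profile.find().sort("millis", -1).limit(5):
        print(op["op"], op.get("ns"), op["millis"], "ms")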
Nobody would really be able to tell you how much load a specific server configuration can handle. You need at least 512 MB of RAM and 1 CPU to get going these days, but you will hit the limits very soon. It all depends on how many users you have, what kinds of queries they run, and how much data they cover.
Can you run MongoDB alongside other applications on a single server? Well, it would appear that if you are having memory or CPU issues in your current configuration, then you will likely need to address something. But "can you?": if it is not going to affect you, then of course you can.
Should you do this? Most people would firmly agree that you should not, and that also holds for most of the other applications you are running on the one machine.
There are various reasons: process isolation, resource allocation, security, and far too many more for a short answer to go into. And certainly, where it becomes a problem, you should address it by seeking a new configuration.
For Mongo alone: most people would not think twice about running their SQL database on dedicated hardware. The choice for Mongo should be no different.
I have also suggested this be moved to Server Fault, as it is not a programming question suited to Stack Overflow.

Amazon AWS hints

I have to set up a server environment for a web application. I have to use AWS, and so far it looks good for that purpose.
I need:
a scalable Tomcat 7 webapp server
session replication!
a mongodb database cluster(?)
As far as I can tell, it could work like this:
The scalable Tomcat 7 I can set up easily with Elastic Beanstalk.
The session replication could work with ElastiCache.
It seems like I have to set up the mongodb cluster "manually", so I created some EC2 instances to do so.
I have some problems with that:
the costs would be quite high. The minimum setup would be two EC2 instances plus one for ElastiCache
the only thing that autoscales is Elastic Beanstalk, which means I have to take care of the rest myself. (Well, for the mongodb instances, I could use a balancer, too)
in the case of the mongodb EC2 instances, I need to set up each instance myself
Do you have any idea how to:
lower the costs (especially in the beginning, it would be a bit much, no)?
make the administration easier?
If you install mongo, have a look at AWS_NoSQL_MongoDB.pdf. It is very common to have one member of the replica set outside the availability zone you use, or even outside the same region, so you have a hot backup in case of a failure.
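As an illustration of that layout, here is a hedged pymongo sketch of initiating such a replica set; the hostnames, member placement, and set name are placeholder assumptions, not part of the question:

    # Three-member replica set: two members near the app, one off-site hot backup.
    from pymongo import MongoClient

    client = MongoClient("ec2-node-a.example.com", 27017)
    client.admin.command("replSetInitiate", {
        "_id": "rs0",
        "members": [
            {"_id": 0, "host": "ec2-node-a.example.com:27017"},  # primary AZ
            {"_id": 1, "host": "ec2-node-b.example.com:27017"},  # second AZ
            {"_id": 2, "host": "ec2-node-c.example.com:27017",   # other region
             "priority": 0},  # never elected primary; serves as the off-site copy
        ],
    })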
About prices: experiment (load test) and find the smallest instance type that fits you. Also, remember to shut down unused instances.
About management: there is the AWS console and many 3rd-party products. Also, Netflix has released some nice management tools.
Autoscaling for your web server is easily done with Elastic Beanstalk, but you can also use it independently. Check out the autoscaling documentation here: http://aws.amazon.com/autoscaling/
It has a couple of features that will help you save most of your computation costs:
one is the ability to scale out (and, most importantly, in) by automatically changing the number of web servers you are using based on your load. You can define the minimum number (for example, 1) and the maximum number. The system can watch a set of metrics (number of requests, CPU load...) and decide when to add or remove instances, as sketched after this list.
the second is the ability to change the scaling policy and increase or decrease the size of the machines based on your usage. You can use medium-size instances and switch to large or extra-large ones if you find it more cost effective. You are encouraged to try out different sizes to see what fits you best.
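A rough sketch of the first feature using boto3; the group name, launch configuration, zones, and sizes are all placeholder assumptions:

    # An Auto Scaling group that grows and shrinks between 1 and 4 instances.
    import boto3

    autoscaling = boto3.client("autoscaling", region_name="us-east-1")

    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="webapp-asg",
        LaunchConfigurationName="webapp-launch-config",  # assumed to exist already
        MinSize=1,   # scale in to a single instance when idle
        MaxSize=4,   # cap the spend under load
        AvailabilityZones=["us-east-1a", "us-east-1b"],
    )

    # A simple policy that a CloudWatch alarm (e.g. on CPU load) can trigger.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="webapp-asg",
        PolicyName="scale-out-on-cpu",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=1,  # add one instance per alarm
    )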
Using ElastiCache can help you both with session replication and with lowering the load on your DB machines. You can cache your frequent query output there (front page, main category page...), get better performance, and use fewer DB instances. It supports Memcached clients, which makes it very easy to develop against in almost any programming language.
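A minimal sketch of that caching pattern with a Memcached client; the ElastiCache endpoint and the query function are placeholders:

    # Cache an expensive query's output in ElastiCache for 60 seconds.
    import memcache

    mc = memcache.Client(["my-cache.example.cache.amazonaws.com:11211"])

    def render_front_page_from_db():
        # stand-in for the expensive DB query you want to avoid repeating
        return "<html>...</html>"

    def front_page():
        html = mc.get("front_page")
        if html is None:                          # cache miss: hit the DB once...
            html = render_front_page_from_db()
            mc.set("front_page", html, time=60)   # ...then serve from RAM for 60 s
        return html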
You should check out Couchbase instead of MongoDB (see comparison). It is more robust and more reliable at scale.
Elastic Beanstalk is the best way to reduce the management overhead for web/app servers. However, if you haven't taken the development too far, I would recommend using DynamoDB for the ease-of-administration aspect.
However, keep an eye on cost as well. Performance and management are really awesome in DynamoDB.

AWS MongoDB EC2 instance as localhost with EC2 application instance

I'm actually new to AWS, and I configured two EC2 instances:
one for my MongoDB database and another one for my application.
I'm using pymongo to make the connection, but sending data between the instances each time takes too much time. I would like to know if it's possible to have the MongoDB instance act as localhost for the application instance, using groups or something, to get better performance.
Or is it better to put the database on the same instance as my application and get more EBS?
Be sure you know where your performance bottleneck is.
If both instances are in the same Availability Zone, network latency should not be the largest performance issue. In fact if you have instances that are at least large... due to the better NIC... network latency should be a non-issue.
To know for sure, measure your network utilization with a monitoring tool.
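Besides a monitoring tool, a quick application-level check is to time trivial round trips from the app instance. A hedged pymongo sketch (the host is a placeholder):

    # Time 1000 ping commands; the server does near-zero work, so this is mostly network.
    import time
    from pymongo import MongoClient

    client = MongoClient("mongodb://10.0.1.5:27017")

    n = 1000
    start = time.time()
    for _ in range(n):
        client.admin.command("ping")
    avg_ms = (time.time() - start) / n * 1000
    print("avg round trip: %.2f ms" % avg_ms)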
If any of your working set (the MongoDB documents that are used with any frequency) cannot fit in the RAM of the instance, that means you are touching EBS. EBS is very, very slow compared to what MongoDB needs. I measured a single EBS volume using iozone recently and found it to be half as fast as my laptop's rotational hard drive.
You can improve EBS performance substantially by striping multiple EBS volumes into a software RAID configuration.
The bottom line when running MongoDB on AWS is that you need enough RAM to hold the MongoDB documents that you will touch with any frequency.
I have an application in production that uses a mongodb instance on the same machine as the web server. It works fine for me, but then I have no need for scalability right now; one instance is enough.
So to answer your question: sure, you can run it as localhost.
But if your app picks up and you need multiple instances, or sharding, or such, then you'd have to deploy instances on other machines as well.

The benefits of deploying multiple instances for serving/data/cache

Although I have a lot of experience writing code, I don't have much experience deploying things. I am writing a project that uses mongodb for persistence, redis for meta-caching, and Play for serving pages. I am deciding whether to buy a dedicated server or multiple small/medium instances from Amazon/Linode (one each for mongo, redis, and Play). I have thought of the trade-offs below; I wonder if anyone can add to the list or provide further insights. I am leaning toward (B): buying two sets of instances, one from Linode and one from Amazon, so that if one provider has an outage it will fail over to the other. Also, if anyone has tips for deploying a Scala/Maven cluster, or tools to do so, they would be much appreciated.
A. put everything in one instance
Pros:
faster communication between the database and the page servlet (same host).
cheaper.
less end points to secure.
Cons:
harder to manage (in my opinion).
harder to upgrade a single module; if there are installation issues, it might bring down the whole system.
B. put each module (mongo,redis,play) in different instances
Pros:
sharding is easier.
easier to create cluster for a single purpose. (i.e. cluster of redis)
easier to allocate resources between modules.
less likely everything will fail at once.
Cons:
bandwidth between modules -> $
securing each connection and endpoint.
I can only comment on the technical aspects (not cost, serviceability, etc.).
It is not mentioned whether the dedicated server is a physical box or simply a large VM. If the application generates a lot of round trips to MongoDB or Redis, the difference will be quite significant.
With a VM, the cost of I/O, OS scheduling, and system calls is higher. These elements tend to represent an important part of the performance cost of efficient remote data stores like MongoDB or Redis, and the virtualization toll is higher for them.
From a system point of view, I would not put MongoDB and Redis/Play on the same box if the MongoDB database is expected to be larger than the available memory. MongoDB maps data files in memory, and relies on the OS to perform memory swapping. It is designed for this. The other processes are not. Swapping induced by MongoDB will have catastrophic consequences on Redis and Play response time if they are all on the same box. So I would at least separate MongoDB from Redis/Play.
If you plan to use Redis for caching, it makes sense to keep it on the same box as the Play server. Redis will use memory but little CPU; Play will use CPU but not much memory, so it seems a good fit. Also, I'm not sure whether it is possible from Play, but if you connect to Redis over a unix domain socket instead of the TCP loopback, you can get about 50% more throughput for free.
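For reference, a minimal redis-py sketch of both connection styles; the socket path is an assumption that depends on the unixsocket setting in your redis.conf:

    # Same Redis API over two transports; the socket skips the TCP stack entirely.
    import redis

    r_tcp = redis.Redis(host="127.0.0.1", port=6379)          # TCP loopback
    r_sock = redis.Redis(unix_socket_path="/tmp/redis.sock")  # unix domain socket

    r_sock.set("hits", 0)
    r_sock.incr("hits")
    print(r_sock.get("hits"))  # b'1' -- same commands, lower per-call overhead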