Redis periodically stops responding on high load - nosql

I am using a simple redis server setup to store some values in my PHP application. Yesterday I installed phpredis module to use redis as PHP Session backend, which increased request rate on redis DB form 100 to 2000, and DB size from 60Mb to 200Mb.
And after this, redis is not avalable on every 10th request - just not responding. Log file does not show anything that could explain this.
I have more than 50% of memory free. Here are the resources used by redis:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31075 root 20 0 170m 161m 936 S 41 2.0 11:10.52 redis-server
What can be the cause of this? Maybe I should tweak some redis settings for higher load?
Here is my redis.conf:
# Redis configuration file example
# Note on units: when memory size is needed, it is possible to specifiy
# it in the usual form of 1k 5GB 4M and so forth:
#
# 1k => 1000 bytes
# 1kb => 1024 bytes
# 1m => 1000000 bytes
# 1mb => 1024*1024 bytes
# 1g => 1000000000 bytes
# 1gb => 1024*1024*1024 bytes
#
# units are case insensitive so 1GB 1Gb 1gB are all the same.
# By default Redis does not run as a daemon. Use 'yes' if you need it.
# Note that Redis will write a pid file in /var/run/redis.pid when daemonized.
daemonize no
# When running daemonized, Redis writes a pid file in /var/run/redis.pid by
# default. You can specify a custom pid file location here.
pidfile /var/run/redis.pid
# Accept connections on the specified port, default is 6379.
# If port 0 is specified Redis will not listen on a TCP socket.
port 6379
# If you want you can bind a single interface, if the bind option is not
# specified all the interfaces will listen for incoming connections.
#
# bind 127.0.0.1
# Specify the path for the unix socket that will be used to listen for
# incoming connections. There is no default, so Redis will not listen
# on a unix socket when not specified.
#
# unixsocket /tmp/redis.sock
# Close the connection after a client is idle for N seconds (0 to disable)
timeout 300
# Set server verbosity to 'debug'
# it can be one of:
# debug (a lot of information, useful for development/testing)
# verbose (many rarely useful info, but not a mess like the debug level)
# notice (moderately verbose, what you want in production probably)
# warning (only very important / critical messages are logged)
loglevel debug
# Specify the log file name. Also 'stdout' can be used to force
# Redis to log on the standard output. Note that if you use standard
# output for logging but daemonize, logs will be sent to /dev/null
logfile /var/log/redis/redis.log
# To enable logging to the system logger, just set 'syslog-enabled' to yes,
# and optionally update the other syslog parameters to suit your needs.
# syslog-enabled no
# Specify the syslog identity.
# syslog-ident redis
# Specify the syslog facility. Must be USER or between LOCAL0-LOCAL7.
# syslog-facility local0
# Set the number of databases. The default database is DB 0, you can select
# a different one on a per-connection basis using SELECT <dbid> where
# dbid is a number between 0 and 'databases'-1
databases 16
################################ SNAPSHOTTING #################################
#
# Save the DB on disk:
#
# save <seconds> <changes>
#
# Will save the DB if both the given number of seconds and the given
# number of write operations against the DB occurred.
#
# In the example below the behaviour will be to save:
# after 900 sec (15 min) if at least 1 key changed
# after 300 sec (5 min) if at least 10 keys changed
# after 60 sec if at least 10000 keys changed
#
# Note: you can disable saving at all commenting all the "save" lines.
save 900 1
save 300 10
save 60 10000
# Compress string objects using LZF when dump .rdb databases?
# For default that's set to 'yes' as it's almost always a win.
# If you want to save some CPU in the saving child set it to 'no' but
# the dataset will likely be bigger if you have compressible values or keys.
rdbcompression yes
# The filename where to dump the DB
dbfilename dump.rdb
# The working directory.
#
# The DB will be written inside this directory, with the filename specified
# above using the 'dbfilename' configuration directive.
#
# Also the Append Only File will be created inside this directory.
#
# Note that you must specify a directory here, not a file name.
dir /backups/redisdumps
################################# REPLICATION #################################
# Master-Slave replication. Use slaveof to make a Redis instance a copy of
# another Redis server. Note that the configuration is local to the slave
# so for example it is possible to configure the slave to save the DB with a
# different interval, or to listen to another port, and so on.
#
# slaveof <masterip> <masterport>
# If the master is password protected (using the "requirepass" configuration
# directive below) it is possible to tell the slave to authenticate before
# starting the replication synchronization process, otherwise the master will
# refuse the slave request.
#
# masterauth <master-password>
# When a slave lost the connection with the master, or when the replication
# is still in progress, the slave can act in two different ways:
#
# 1) if slave-serve-stale-data is set to 'yes' (the default) the slave will
# still reply to client requests, possibly with out of data data, or the
# data set may just be empty if this is the first synchronization.
#
# 2) if slave-serve-stale data is set to 'no' the slave will reply with
# an error "SYNC with master in progress" to all the kind of commands
# but to INFO and SLAVEOF.
#
slave-serve-stale-data yes
################################## SECURITY ###################################
# Require clients to issue AUTH <PASSWORD> before processing any other
# commands. This might be useful in environments in which you do not trust
# others with access to the host running redis-server.
#
# This should stay commented out for backward compatibility and because most
# people do not need auth (e.g. they run their own servers).
#
# Warning: since Redis is pretty fast an outside user can try up to
# 150k passwords per second against a good box. This means that you should
# use a very strong password otherwise it will be very easy to break.
#
# requirepass foobared
# Command renaming.
#
# It is possilbe to change the name of dangerous commands in a shared
# environment. For instance the CONFIG command may be renamed into something
# of hard to guess so that it will be still available for internal-use
# tools but not available for general clients.
#
# Example:
#
# rename-command CONFIG b840fc02d524045429941cc15f59e41cb7be6c52
#
# It is also possilbe to completely kill a command renaming it into
# an empty string:
#
# rename-command CONFIG ""
################################### LIMITS ####################################
# Set the max number of connected clients at the same time. By default there
# is no limit, and it's up to the number of file descriptors the Redis process
# is able to open. The special value '0' means no limits.
# Once the limit is reached Redis will close all the new connections sending
# an error 'max number of clients reached'.
#
# maxclients 128
# Don't use more memory than the specified amount of bytes.
# When the memory limit is reached Redis will try to remove keys with an
# EXPIRE set. It will try to start freeing keys that are going to expire
# in little time and preserve keys with a longer time to live.
# Redis will also try to remove objects from free lists if possible.
#
# If all this fails, Redis will start to reply with errors to commands
# that will use more memory, like SET, LPUSH, and so on, and will continue
# to reply to most read-only commands like GET.
#
# WARNING: maxmemory can be a good idea mainly if you want to use Redis as a
# 'state' server or cache, not as a real DB. When Redis is used as a real
# database the memory usage will grow over the weeks, it will be obvious if
# it is going to use too much memory in the long run, and you'll have the time
# to upgrade. With maxmemory after the limit is reached you'll start to get
# errors for write operations, and this may even lead to DB inconsistency.
#
# maxmemory <bytes>
# MAXMEMORY POLICY: how Redis will select what to remove when maxmemory
# is reached? You can select among five behavior:
#
# volatile-lru -> remove the key with an expire set using an LRU algorithm
# allkeys-lru -> remove any key accordingly to the LRU algorithm
# volatile-random -> remove a random key with an expire set
# allkeys->random -> remove a random key, any key
# volatile-ttl -> remove the key with the nearest expire time (minor TTL)
# noeviction -> don't expire at all, just return an error on write operations
#
# Note: with all the kind of policies, Redis will return an error on write
# operations, when there are not suitable keys for eviction.
#
# At the date of writing this commands are: set setnx setex append
# incr decr rpush lpush rpushx lpushx linsert lset rpoplpush sadd
# sinter sinterstore sunion sunionstore sdiff sdiffstore zadd zincrby
# zunionstore zinterstore hset hsetnx hmset hincrby incrby decrby
# getset mset msetnx exec sort
#
# The default is:
#
# maxmemory-policy volatile-lru
# LRU and minimal TTL algorithms are not precise algorithms but approximated
# algorithms (in order to save memory), so you can select as well the sample
# size to check. For instance for default Redis will check three keys and
# pick the one that was used less recently, you can change the sample size
# using the following configuration directive.
#
# maxmemory-samples 3
############################## APPEND ONLY MODE ###############################
# By default Redis asynchronously dumps the dataset on disk. If you can live
# with the idea that the latest records will be lost if something like a crash
# happens this is the preferred way to run Redis. If instead you care a lot
# about your data and don't want to that a single record can get lost you should
# enable the append only mode: when this mode is enabled Redis will append
# every write operation received in the file appendonly.aof. This file will
# be read on startup in order to rebuild the full dataset in memory.
#
# Note that you can have both the async dumps and the append only file if you
# like (you have to comment the "save" statements above to disable the dumps).
# Still if append only mode is enabled Redis will load the data from the
# log file at startup ignoring the dump.rdb file.
#
# IMPORTANT: Check the BGREWRITEAOF to check how to rewrite the append
# log file in background when it gets too big.
appendonly no
# The name of the append only file (default: "appendonly.aof")
# appendfilename appendonly.aof
# The fsync() call tells the Operating System to actually write data on disk
# instead to wait for more data in the output buffer. Some OS will really flush
# data on disk, some other OS will just try to do it ASAP.
#
# Redis supports three different modes:
#
# no: don't fsync, just let the OS flush the data when it wants. Faster.
# always: fsync after every write to the append only log . Slow, Safest.
# everysec: fsync only if one second passed since the last fsync. Compromise.
#
# The default is "everysec" that's usually the right compromise between
# speed and data safety. It's up to you to understand if you can relax this to
# "no" that will will let the operating system flush the output buffer when
# it wants, for better performances (but if you can live with the idea of
# some data loss consider the default persistence mode that's snapshotting),
# or on the contrary, use "always" that's very slow but a bit safer than
# everysec.
#
# If unsure, use "everysec".
# appendfsync always
appendfsync everysec
# appendfsync no
# When the AOF fsync policy is set to always or everysec, and a background
# saving process (a background save or AOF log background rewriting) is
# performing a lot of I/O against the disk, in some Linux configurations
# Redis may block too long on the fsync() call. Note that there is no fix for
# this currently, as even performing fsync in a different thread will block
# our synchronous write(2) call.
#
# In order to mitigate this problem it's possible to use the following option
# that will prevent fsync() from being called in the main process while a
# BGSAVE or BGREWRITEAOF is in progress.
#
# This means that while another child is saving the durability of Redis is
# the same as "appendfsync none", that in pratical terms means that it is
# possible to lost up to 30 seconds of log in the worst scenario (with the
# default Linux settings).
#
# If you have latency problems turn this to "yes". Otherwise leave it as
# "no" that is the safest pick from the point of view of durability.
no-appendfsync-on-rewrite no
################################## SLOW LOG ###################################
# The Redis Slow Log is a system to log queries that exceeded a specified
# execution time. The execution time does not include the I/O operations
# like talking with the client, sending the reply and so forth,
# but just the time needed to actually execute the command (this is the only
# stage of command execution where the thread is blocked and can not serve
# other requests in the meantime).
#
# You can configure the slow log with two parameters: one tells Redis
# what is the execution time, in microseconds, to exceed in order for the
# command to get logged, and the other parameter is the length of the
# slow log. When a new command is logged the oldest one is removed from the
# queue of logged commands.
# The following time is expressed in microseconds, so 1000000 is equivalent
# to one second. Note that a negative number disables the slow log, while
# a value of zero forces the logging of every command.
slowlog-log-slower-than 10000
# There is no limit to this length. Just be aware that it will consume memory.
# You can reclaim memory used by the slow log with SLOWLOG RESET.
slowlog-max-len 1024
################################ VIRTUAL MEMORY ###############################
### WARNING! Virtual Memory is deprecated in Redis 2.4
### The use of Virtual Memory is strongly discouraged.
# Virtual Memory allows Redis to work with datasets bigger than the actual
# amount of RAM needed to hold the whole dataset in memory.
# In order to do so very used keys are taken in memory while the other keys
# are swapped into a swap file, similarly to what operating systems do
# with memory pages.
#
# To enable VM just set 'vm-enabled' to yes, and set the following three
# VM parameters accordingly to your needs.
vm-enabled no
# vm-enabled yes
# This is the path of the Redis swap file. As you can guess, swap files
# can't be shared by different Redis instances, so make sure to use a swap
# file for every redis process you are running. Redis will complain if the
# swap file is already in use.
#
# The best kind of storage for the Redis swap file (that's accessed at random)
# is a Solid State Disk (SSD).
#
# *** WARNING *** if you are using a shared hosting the default of putting
# the swap file under /tmp is not secure. Create a dir with access granted
# only to Redis user and configure Redis to create the swap file there.
vm-swap-file /tmp/redis.swap
# vm-max-memory configures the VM to use at max the specified amount of
# RAM. Everything that deos not fit will be swapped on disk *if* possible, that
# is, if there is still enough contiguous space in the swap file.
#
# With vm-max-memory 0 the system will swap everything it can. Not a good
# default, just specify the max amount of RAM you can in bytes, but it's
# better to leave some margin. For instance specify an amount of RAM
# that's more or less between 60 and 80% of your free RAM.
vm-max-memory 0
# Redis swap files is split into pages. An object can be saved using multiple
# contiguous pages, but pages can't be shared between different objects.
# So if your page is too big, small objects swapped out on disk will waste
# a lot of space. If you page is too small, there is less space in the swap
# file (assuming you configured the same number of total swap file pages).
#
# If you use a lot of small objects, use a page size of 64 or 32 bytes.
# If you use a lot of big objects, use a bigger page size.
# If unsure, use the default :)
vm-page-size 32
# Number of total memory pages in the swap file.
# Given that the page table (a bitmap of free/used pages) is taken in memory,
# every 8 pages on disk will consume 1 byte of RAM.
#
# The total swap size is vm-page-size * vm-pages
#
# With the default of 32-bytes memory pages and 134217728 pages Redis will
# use a 4 GB swap file, that will use 16 MB of RAM for the page table.
#
# It's better to use the smallest acceptable value for your application,
# but the default is large in order to work in most conditions.
vm-pages 134217728
# Max number of VM I/O threads running at the same time.
# This threads are used to read/write data from/to swap file, since they
# also encode and decode objects from disk to memory or the reverse, a bigger
# number of threads can help with big objects even if they can't help with
# I/O itself as the physical device may not be able to couple with many
# reads/writes operations at the same time.
#
# The special value of 0 turn off threaded I/O and enables the blocking
# Virtual Memory implementation.
vm-max-threads 4
############################### ADVANCED CONFIG ###############################
# Hashes are encoded in a special way (much more memory efficient) when they
# have at max a given numer of elements, and the biggest element does not
# exceed a given threshold. You can configure this limits with the following
# configuration directives.
hash-max-zipmap-entries 512
hash-max-zipmap-value 64
# Similarly to hashes, small lists are also encoded in a special way in order
# to save a lot of space. The special representation is only used when
# you are under the following limits:
list-max-ziplist-entries 512
list-max-ziplist-value 64
# Sets have a special encoding in just one case: when a set is composed
# of just strings that happens to be integers in radix 10 in the range
# of 64 bit signed integers.
# The following configuration setting sets the limit in the size of the
# set in order to use this special memory saving encoding.
set-max-intset-entries 512
# Active rehashing uses 1 millisecond every 100 milliseconds of CPU time in
# order to help rehashing the main Redis hash table (the one mapping top-level
# keys to values). The hash table implementation redis uses (see dict.c)
# performs a lazy rehashing: the more operation you run into an hash table
# that is rhashing, the more rehashing "steps" are performed, so if the
# server is idle the rehashing is never complete and some more memory is used
# by the hash table.
#
# The default is to use this millisecond 10 times every second in order to
# active rehashing the main dictionaries, freeing memory when possible.
#
# If unsure:
# use "activerehashing no" if you have hard latency requirements and it is
# not a good thing in your environment that Redis can reply form time to time
# to queries with 2 milliseconds delay.
#
# use "activerehashing yes" if you don't have such hard requirements but
# want to free memory asap when possible.
activerehashing yes
################################## INCLUDES ###################################
# Include one or more other config files here. This is useful if you
# have a standard template that goes to all redis server but also need
# to customize a few per-server settings. Include files can include
# other files, so use this wisely.
#
# include /path/to/local.conf
# include /path/to/other.conf

Seems that I have found a solution here: http://redis4you.com/articles.php?id=012&name=redis
On linux you should change kernel settings, to optimize TCP operations for high load:
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
And also, make those changes persistant after reboot by adding fallowing to /etc/sysctl.conf:
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
Additionaly, it helped to set timeout to 0 in redis.conf:
# Close the connection after a client is idle for N seconds (0 to disable)
timeout 0

We found if you use aof persistance along with rdb snapshots, sometimes the rdb updating can delay the aof being updated, which in turn causes writes to redis to block. Turning off aof persistance cured the intermittant latency we were having, which was ok for us as loosing the latest few keys due to a crash was not to important.

Related

Unable to start a Zookeeper server

I am trying to initialize/start a Zookeeper server on Windows. I have create a logs folder to store the data and also modify the path inside the zoo.cfg file to the path of the logs folder. When I start to run the command .\zkServer.cmd start, I receive this error.
This is my zoo.cfg file
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=C:\zookeeper\logs
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
## Metrics Providers
#
# https://prometheus.io Metrics Exporter
#metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
#metricsProvider.httpHost=0.0.0.0
#metricsProvider.httpPort=7000
#metricsProvider.exportJvmInfo=true

Fork fails with "resource temporarily unavailable". Which resource?

I have inherited a Perl script that, depending on machine configuration, fails during calls to fork with $? == 11.
According to errno.h and various posts, 11 is EAGAIN, i.e. "try again", because some resource was temporarily unavailable.
Is there a way to determine which resource caused the fork to fail,
other than increasing various system limits one by one
(open file descriptors,
swap space, or
number of allowable threads)?
Assuming you mean $! is EAGAIN, the fork man page on my system says:
EAGAIN: fork() cannot allocate sufficient memory to copy the parent's page tables and allocate a task structure for the child.
EAGAIN: It was not possible to create a new process because the caller's RLIMIT_NPROC resource limit was encountered. To exceed this limit, the process must have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE capability.
Are you trying to create a ton of processes? Are you reaping your children when they are done?
The error is due to the user has run out of free stacks.
Check the security configuration file on RHEL Server
[root#server1 webapps]# cat /etc/security/limits.d/90-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
* soft nproc 1024
root soft nproc unlimited
[root#server1 webapps]# vi /etc/security/limits.d/90-nproc.conf
[root#server1 webapps]#
In my case, the "test" user was receiving the message "-bash: fork: retry: Resource temporarily unavailable"
Resolved the issue by adding user specific limits for stack limit
[root#server1 webapps]# vi/etc/security/limits.d/90-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
* soft nproc 1024
test soft nproc 16384
root soft nproc unlimited
[root#server1 webapps]#

Cannot Start Mesos/Marathon Cluster

Physical Machine: 192.168.10.1 ( Mesos, Zookeeper, Marathon )
Virtual Machine: 192.168.122.10 ( Mesos, Zookeeper )
Virtual Machine: 192.168.122.46 ( Mesos, Zookeeper )
OS for all three machines are Fedora 23 Server
The two networks are already inter-routed by default as the virtual machines all reside on the physical machine.
There is no firewall setup.
Mesos Election LOG:
Master bound to loopback interface! Cannot communicate with remote schedulers or slaves. You might want to set '--ip' flag to a routable IP address.
I can set this manually, however I cannot set this dynamically... the --ip_discovery_command flag is not recognized.
What I wanted to do was link the below script to that flag.
if [[ $(ip addr) == *enp8s0* ]];
then
ip addr show enp8s0 | awk -F'/| ' '/inet/ { print $6 }'
else
ip addr show eth0 | awk -F'/| ' '/inet/ { print $6 }'
fi
When I do set this manually (not what I want to do)...
the Mesos page at IP:5050 comes up... but then the mesos-master fails after 1 minute due to this...
F0427 17:03:27.975260 6914 master.cpp:1253] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
*** Check failure stack trace: ***
# 0x7f8360fa9edd (unknown)
# 0x7f8360fabc50 (unknown)
# 0x7f8360fa9ad3 (unknown)
# 0x7f8360fac61e (unknown)
# 0x7f83619a85dd (unknown)
# 0x7f83619e7c30 (unknown)
# 0x55a885ee3b2e (unknown)
# 0x7f8361a11c0e (unknown)
# 0x7f8361a5d75e (unknown)
# 0x7f8361a7077a (unknown)
# 0x7f83618f4aae (unknown)
# 0x7f8361a70768 (unknown)
# 0x7f8361a548d0 (unknown)
# 0x7f8361fc832c (unknown)
# 0x7f8361fd42a5 (unknown)
# 0x7f8361fd472f (unknown)
# 0x7f8360a5e60a start_thread
# 0x7f835fefda4d __clone Aborted (core dumped)
Zookeeper is setup like this:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/var/lib/zookeeper/data
dataLogDir=/var/lib/zookeeper/log
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1:192.168.10.1:2888:3888
server.2:192.168.122.46:2888:3888
server.3:192.168.122.10:2888:3888
and have no idea how to verify that it is working properly...
I'm honestly on the end of my rope.. pulling out my hair for the past week on this due to poor documentation and lack of proper architecture explanations (primarily Marathon) horribly organized logs (Mesos), systemd being unable to properly parse a bash and use the output as a variable, and lack of instructions all around.
Am I doing something wrong? I Appreciate any assistance I can get, Let me know if you need anything I have not yet provided and I will post it right away.
EDIT:
I fixed the issue with marathon, by adding two additional Marathon servers to the VM's so that they could form a quorum.
EDIT2:
I am now having issues where the Mesos server keeps rapidly re-electing a leader... but depending on the outcome I will look into this later...
If you follow the installation docs closely, I think you should get it to work.
For example you "Master binds to loopback" problem is IMHO related to incorrect/incomplete settings. See:
Hostname (optional)
If you're unable to resolve the hostname of the machine directly (e.g., if on a different network or using a VPN), set /etc/mesos-master/hostname to a value that you can resolve, for example, an externally accessible IP address or DNS hostname. This will ensure all links from the Mesos console work correctly.
You will also want to set this property in /etc/marathon/conf/hostname.
Furthermore, I'd recommend to also set the Master IP address in the /etc/mesos-master/ip file. Always make sure that the hostnames are resolvable to a non-local IP address, i.e. by adding entries in the /etc/hosts file on each host.
Basically, the /etc/hosts file should look similar to this (replace the hostnames with the actual ones):
127.0.0.1 localhost
192.168.10.1 host1
192.168.122.10 host2
192.168.122.46 host3
If you just want to test a Mesos cluster, you could also use a preconfigured Vagrant solution like tobilg/coreos-mesos-cluster.
Regarding the ZooKeeper setup, make sure that you created a /var/lib/zookeeper/myid on each node which contains the actual numeric id you set for each node, e.g. for 192.168.10.1 the sole content of the file needs to be 1.
Before debugging the masters, check that the ZooKeeper cluster works correctly, and that a leader is elected. Make sure that /etc/mesos/zk contains the right ZooKeeper connection string on each host, e.g.
zk://192.168.10.1:2181,192.168.122.10:2181,192.168.122.46:2181/mesos
If ZK works, then restart the services and check the Masters logs. Do the same with the Slaves.
References:
https://open.mesosphere.com/reference/mesos-master/
https://open.mesosphere.com/reference/mesos-slave/
https://mesosphere.github.io/marathon/docs/

Redis server can't run more than 1024M maxheap

I am running Redis 2.8.19 on Windows Server 2008.
I get an error saying that I have insufficient disc space for my Redis heap. (The memory mapping file instead of fork()).
I can only get Redis running, if I have 'maxheap 1024M' in the cfg, even though I have ~50GB of free space on the directory I have set 'heapdir' to.
If I try to run it with higher maxheap, or with no maxheap, I get this error (PowerShell):
PS C:\Users\admasgve> cd D:\redis-2.8.19
PS D:\redis-2.8.19> .\redis-server.exe
[7476] 25 Feb 09:32:38.419 #
The Windows version of Redis allocates a large memory mapped file for sharing
the heap with the forked process used in persistence operations. This file
will be created in the current working directory or the directory specified by
the 'heapdir' directive in the .conf file. Windows is reporting that there is
insufficient disk space available for this file (Windows error 0x70).
You may fix this problem by either reducing the size of the Redis heap with
the --maxheap flag, or by moving the heap file to a local drive with sufficient
space.
Please see the documentation included with the binary distributions for more
details on the --maxheap and --heapdir flags.
Redis can not continue. Exiting.
Screenshot: http://i.stack.imgur.com/Xae0f.jpg
Free space on D: 49,4 GB
Free space on C: 2,71 GB
Total RAM: 16 GB
Free RAM: ~9 GB
redis.windows.conf:
# Generated by CONFIG REWRITE
loglevel verbose
logfile "stdout"
save 900 1
save 300 10
save 60 10000
dir "D:\\redis-2.8.19"
maxmemory 1024M
# maxheap 2048M
heapdir "D:\\redis-2.8.19"
Everything beside the last 3 lines are generated by redis with the 'CONFIG REWRITE' cmd. I have tried various things, with maxmemory, maxheap and heapdir.
From Redis documentation:
maxmemory / maxheap - the maxheap flag controls the maximum size of this memory mapped file, as well as the total usable space for the Redis heap. Running Redis without either maxheap or maxmemory will result in a memory mapped file being created that is equal to the size of physical memory; The Redis heap must be larger than the value specified by the maxmemory
Have anybody encountered this problem before? What do I do wrong?
Redis doesn't use the conf file in its home directory by default. You have to pass the file in on the command line:
.\redis-server.exe redis.windows.conf
This is what is in my conf file:
maxheap 2048M
heapdir D:\\redisheap
These settings resolved my issue.
This is how to use the maxheap flag, which is more convenient then using a config file:
redis-server --maxheap 2gb
To back up Michael's response, I've had the same problem.
I had ~40GB of free space, and paging file set to 4G-8G.
Redis did not want to start until I set paging file to the amount recommended by Windows themselves, which was 12GB.
Really odd beahaviour.
.\redis-server.exe redis.windows.conf
This is what is in my conf file:
maxheap 2048M
heapdir D:\\redisheap
after passing the above parameters in redis-server.exe redis.windows.conf
the service has started for me thanks for the solution.
maxheap 2048M
heapdir D:\"location where your server is
This Should Solve problem Please Ping me if you have Same Question

What are some useful tips/tools for monitoring/tuning memcached health?

Yesterday, I found this cool script 'memcache-top' which nicely prints out stats of memcached live. It looks like,
memcache-top v0.6 (default port: 11211, color: on, refresh: 3 seconds)
INSTANCE USAGE HIT % CONN TIME EVICT/s READ/s WRITE/s
127.0.0.1:11211 88.8% 94.8% 20 0.8ms 9.0 311.3K 162.8K
AVERAGE: 88.8% 94.8% 20 0.8ms 9.0 311.3K 162.8K
TOTAL: 1.8GB/ 2.0GB 20 0.8ms 9.0 311.3K 162.8K
(ctrl-c to quit.)
it even makes certain text red when you should pay attention to something!
Q. Broadly, what are some useful tools/techniques you've used to check that memcached is set up well?
Good interface to accessing Memcached server instances is phpMemCacheAdmin.
I prefer access from the command line using telnet.
To make a connection to Memcached using Telnet, use the following telnet localhost 11211 command from the command line.
If at any time you wish to terminate the Telnet session, simply type quit and hit return.
You can get an overview of the important statistics of your Memcached server by running the stats command once connected.
Memory is allocated in chunks internally and constantly reused. Since memory is broken into different size slabs, you do waste memory if your items do not fit perfectly into the slab the server chooses to put it in.
So Memcached allocates your data into different "slabs" (think of these as partitions) of memory automatically, based on the size of your data, which in turn makes memory allocation more optimal.
To list the slabs in the instance you are connected to, use the stats slab command.
A more useful command is the stats items, which will give you a list of slabs which includes a count of the items store within each slab.
Now that you know how to list slabs, you can browse inside each slab to list the items contained within by using the stats cachedump [slab ID] [number of items, 0 for all items] command.
If you want to get the actual value of that item, you can use the get [key] command.
To delete an item from the cache you can use the delete [key] command.
For a production systems, you should really set up active monitoring (with downtime alerts, automated restarts etc.) of Memcache using something like Monit. Here is an example config: Monitoring Memcache with Monit
It is good to monitor overall memory usage of memcached for resource planning.
Track the eviction statistics counter to know how often cached items are getting evicted due to lack of memory.
Track cache hit/misses, reclaims(The number of expired items removed to allow space for new writes), current connections, flush cmd which is available in stats.
Memcached stats (can be read from telnet, libmemcached, language specific library)
stats
stats slabs
stats items
stats sizes
stats detail
stats settings
run the above commands using telnet
or simply run using netcat
echo "stats settings" | nc 127.0.0.1 11211
Other scripts/tools
https://github.com/memcached/memcached/tree/master/scripts
memcached top
memcached metrics per slab
This is what memcached metrics per slab looks like
desc for some fields can be found here.
Memcached Prometheus exporter - Exports metrics from memcached servers for consumption by Prometheus.