What can be used for (reliably) testing Couchbase latency? - memcached

So, testrunner is over-complicated, cbloadworkgen gives no request times, and mcsoda seems to work badly with the membase protocol (so it ends up doing direct memcached tests, avoiding the cluster layer entirely).
Is there anything else? Or is there a known way to properly benchmark Couchbase with one of those tools?
I need the 95th- and 99th-percentile request timings.

Are you referring to a development or a production server scenario? Couchbase latency itself won't be the bottleneck - it's basically only network latency and your application stack (e.g. PHP) that cause latency.
I have used the weighttp tool mainly to test the requests/second possible in a development scenario - that tool should also give you the average latency per HTTP request. You can find my setup on GitHub. It uses real HTTP requests to call a G-WAN C servlet that makes multiple Couchbase queries.
If you want to know how to achieve the best latency and throughput with Couchbase:
Use native async code built on libcouchbase. PHP, NodeJS, Java... all of them are the true bottlenecks in production stacks.
Async allows you to keep multiple Couchbase queries in flight - this way latency can "happen in parallel" (see the sketch after this list). The end user will notice only a slight improvement, but the load your server can sustain goes up considerably.
Make sure your app is using connection pools. You don't want to initialize a new connection for each user request you serve! The servlet above contains a rough implementation of one.
Keep Couchbase instances local if possible. I was running a cluster where each node ran my app server plus a Couchbase instance, and each CB node had its own bucket for caching things that can be cached (like configurations). You can use XDCR if your backup needs are not strict - that is, if you don't need automatic failover to live replicas. If you're running 2 nodes with 1 replica, 50% of your keyspace lives on the other machine.
If the data set is large, make sure to use SSDs.
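To illustrate the "latency in parallel" point, here is a minimal sketch using Python's asyncio, with asyncio.sleep standing in for a real async Couchbase get (the fetch function is a placeholder, not a Couchbase SDK call): 100 sequential 2 ms lookups cost ~200 ms, while the same lookups in flight together cost roughly the latency of one.

import asyncio
import time

async def fetch(key):
    # Placeholder for an async Couchbase get(); simulates ~2 ms of network latency.
    await asyncio.sleep(0.002)
    return "value-for-" + key

async def main():
    keys = ["key%d" % i for i in range(100)]

    start = time.perf_counter()
    for k in keys:                                   # sequential: latencies add up
        await fetch(k)
    print("sequential: %.3fs" % (time.perf_counter() - start))

    start = time.perf_counter()
    await asyncio.gather(*(fetch(k) for k in keys))  # all 100 requests in flight at once
    print("parallel:   %.3fs" % (time.perf_counter() - start))

asyncio.run(main())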
This way you can achieve incredible performance, up to 300,000 ops/second on a decent local node (i7 4770K, 32 GB DDR3-1600, 240 GB SSD, ~$75/mo here in Germany) and a bit less in a cluster (not sure why). I was running an 8-node Couchbase setup (with 24 additional load-generator nodes) using the benchmark above.
The conclusion is: don't worry about Couchbase itself. Worry only about your own app architecture/code and its latency. Couchbase is almost magic - and even if it were 'bad', you could hardly get better latency from any other product out there.

Try cbc-pillowfight, which ships as part of the libcouchbase-tools package. It supports both memcached and couchbase protocols and gives timings similar to what you want. For example:
cbc-pillowfight -h localhost -b my_bucket -i 10000 -T -d
-T displays the timings, and -d enables memcached ("dumb") mode. Make sure you change my_bucket to whatever your bucket is named. Full documentation is in the cbc man page.
Example timing output (snipped for brevity):
[1397065322.061025] Populate
+---------+---------+---------+---------+
[110 - 119]us |################ - 68
[120 - 129]us |######################################## - 163
[130 - 139]us |######################### - 105
[140 - 149]us |###################### - 93
[150 - 159]us |####################### - 94
...
+----------------------------------------
[1397065322.070389] Run
+---------+---------+---------+---------+
[110 - 119]us |################ - 69
[120 - 129]us |######################################## - 165
[130 - 139]us |########################## - 109
[140 - 149]us |###################### - 94
[150 - 159]us |####################### - 95
[160 - 169]us |################### - 81
[170 - 179]us |############### - 62
[180 - 189]us |############### - 63
[190 - 199]us |########### - 46
[200 - 209]us |######## - 35
[210 - 219]us |######### - 39
...
+----------------------------------------

Ubuntu crashing - How to diagnose

I have a dedicated server running Ubuntu 20.04, with cPanel 106.11, MySQL 8, PHP 8.1 and Elasticsearch 7.17.8, and I run Magento 2.4.5-p1. ConfigServer Security & Firewall is enabled.
Every couple of days I get a monitoring alert saying my server doesn't respond to ping, and the host has to do a hard reboot. They are getting frustrated with this and say they will turn off monitoring unless I sort it out, as they have checked all the hardware, which is fine.
This happens at different times, usually overnight.
I have looked through syslog, the MySQL log, the Elasticsearch log, the Magento 2 logs, the Apache log and kern.log, and I can't find the cause of the issue.
I have enabled sar; around the time of the crashes, RAM usage is 64% and CPU usage is between 5-10%.
What else can I look at to try and diagnose this issue?
Additional info requested by Wilson:
select count - https://justpaste.it/6zc95
show global status - https://justpaste.it/6vqvg
show global variables - https://justpaste.it/cb52m
full process list - https://justpaste.it/d41lt
status - https://justpaste.it/9ht1i
show engine innodb status - https://justpaste.it/a9uem
top -b -n 1 - https://justpaste.it/4zdbx
top -b -n 1 -H - https://justpaste.it/bqt57
ulimit -a - https://justpaste.it/5sjr4
iostat -xm 5 3 - https://justpaste.it/c37to
df -h, df -i, free -h and cat /proc/meminfo - https://justpaste.it/csmwh
htop - https://freeimage.host/i/HAKG0va
The server uses NVMe drives, 32 GB RAM and 6 cores; MySQL runs on the same server as LiteSpeed.
The server has not gone down again since I posted this, but the datacentre usually reboots it within 15-20 minutes, and 99% of the time it happens overnight. The server is not accessible over SSH when it crashes.
Rate Per Second = RPS, Rate Per Hour = RPHr
Suggestions to consider for your instance (these should be settable from your cPanel, as they are all dynamic variables):
connect_timeout=30 # from 10 seconds to reduce aborted_connects RPHr of 75
innodb_io_capacity=900 # from 200 to use more of NVME IOPS capacity
thread_cache_size=36 # from 9 to reduce threads_created RPHr of 75
read_rnd_buffer_size=32768 # from 256K to reduce handler_read_rnd_next RPS of 5,805
read_buffer_size=524288 # from 128K to reduce handler_read_next RPS of 5,063
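Since all of these are dynamic variables, they can be applied at runtime without a restart. A minimal sketch using PyMySQL (the credentials are placeholders; also persist the values in my.cnf so they survive the next restart):

import pymysql

# Placeholder credentials - use your own admin account.
conn = pymysql.connect(host="localhost", user="root", password="secret", autocommit=True)

settings = {
    "connect_timeout": 30,
    "innodb_io_capacity": 900,
    "thread_cache_size": 36,
    "read_rnd_buffer_size": 32768,
    "read_buffer_size": 524288,
}

with conn.cursor() as cur:
    for name, value in settings.items():
        cur.execute(f"SET GLOBAL {name} = %s", (value,))   # needs SUPER/SYSTEM_VARIABLES_ADMIN
        cur.execute("SHOW GLOBAL VARIABLES LIKE %s", (name,))
        print(cur.fetchone())                               # confirm each value took effect

conn.close()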
Many more opportunities exist to improve performance of your instance.

Monitor NestJS backend

In the old days, when we wanted to monitor a "daemon" / service, we would ask the software vendor for the list of all the services it runs in the background on Windows.
If a daemon/service went down, it would be restarted.
On top of that, we would use software like Nagios or Centreon to monitor that particular daemon/service.
I have a team of software developers in charge of implementing a nice NestJS backend.
Here is what we are going to implement:
2 different VMs running on a high-availability VMware cluster with a SAN
the two VMs have vMotion / high-availability settings
an HAProxy is set up to provide load balancing and additional high availability
Our questions are: how can we detect that
one of our backends is down?
one of our backends has moved from a 50 ms average response time to 800 ms?
one of our backends consumes more than 15 GB of RAM?
etc.
With "old school" daemons this was straightforward; when it comes to a JS backend, I am a bit clueless.
Cheers
Kynes
NB: the datacenter in charge of our infrastructure is not "docker / kubernetes / ansible etc." compliant.
To be fair, all of these seem doable out of the box with Centreon/Nagios. I'd say check the documentation...
one of our backends is down?
VM down: the centreon-vmware plugin provides monitoring of VM status.
VM up but backend down: use the native HTTP/HTTPS URL checks provided by Centreon/Nagios to load the web page.
Or use the native SNMP plugins to monitor the status of your node process.
one of our backends has moved from a 50 ms average response time to 800 ms?
Ping response time: use the native ping check.
Status of the VM's network interfaces: the centreon-vmware plugin has network interface checks for VMs.
Page loading time: use the native HTTP/HTTPS URL checks provided by Centreon/Nagios.
You can go even further and use a browser automation tool like Selenium to run scenarios on your pages and monitor the time for each step; a custom probe in the same spirit is sketched below.
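If the native checks don't fit, a Nagios/Centreon-compatible probe is only a few lines. A minimal sketch in Python using just the standard library (the URL is a placeholder, and the 500/800 ms thresholds are illustrative, loosely based on the figures in the question):

#!/usr/bin/env python3
# Nagios-style HTTP latency check: exit 0 = OK, 1 = WARNING, 2 = CRITICAL.
import sys
import time
import urllib.request

URL = "http://backend.example.com/health"  # placeholder endpoint
WARN_MS, CRIT_MS = 500, 800                # tune to your own SLOs

try:
    start = time.perf_counter()
    with urllib.request.urlopen(URL, timeout=5) as resp:
        status = resp.status
    elapsed_ms = (time.perf_counter() - start) * 1000
except Exception as exc:
    print(f"CRITICAL - backend unreachable: {exc}")
    sys.exit(2)

# Everything after the pipe is performance data for Nagios/Centreon graphing.
perfdata = f"|time={elapsed_ms:.0f}ms;{WARN_MS};{CRIT_MS}"
if status != 200 or elapsed_ms >= CRIT_MS:
    print(f"CRITICAL - HTTP {status} in {elapsed_ms:.0f} ms {perfdata}")
    sys.exit(2)
if elapsed_ms >= WARN_MS:
    print(f"WARNING - HTTP {status} in {elapsed_ms:.0f} ms {perfdata}")
    sys.exit(1)
print(f"OK - HTTP {status} in {elapsed_ms:.0f} ms {perfdata}")
sys.exit(0)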
one of our backends consumes more than 15 GB of RAM?
Total RAM consumed on the server: use the native SNMP memory checks from Centreon/Nagios.
RAM consumed by a specific process: possible through the native SNMP memory plugin.
Like so:
/usr/lib/centreon/plugins/centreon_linux_snmp.pl --plugin os::linux::snmp::plugin --mode processcount --hostname=127.0.0.1 --process-name="centengine" --memory --cpu
OK: Number of current processes running: 1 - Total memory usage: 8.56 MB - Average memory usage: 8.56 MB - Total CPU usage: 0.00 % | 'nbproc'=1;;;0; 'mem_total'=8978432B;;;0; 'mem_avg'=8978432.00B;;;0; 'cpu_total'=0.00%;;;0;

How to mass-insert keys into Redis running as a pod in Kubernetes

I have referred to https://redis.io/topics/mass-insert, tried the "Use the protocol, Luke" approach described there, and did
cat data.txt | redis-cli -a <pass> -h <events-k8s-service> --pipe-timeout 100 > /dev/null
The redirection to /dev/null is to ignore the replies. Redis's CLIENT REPLY can't serve that purpose here from the CLI, and it may turn into a blocking command.
data.txt has around 18 million records/commands like
SELECT 1
SET key1 '"field1":"val1","field2":"val2","field3":"val3","field4":"val4","field5":val5,"field6":val6'
SET key2 '"field1":"val1","field2":"val2","field3":"val3","field4":"val4","field5":val5,"field6":val6'
.
.
.
This command is executed from a CronJob which execs into the Redis pod and runs the above command from within the pod. To understand the footprint, the Redis pod had no resource limits, and these are the observations:
Keys loaded: 18147292
Time taken: ~31 minutes
Peak CPU: 2063 m
Peak Memory: 4745 Mi
The resources consumed are way too high, and the time taken is too long.
The questions:
How do we mass-load data on the order of 50 million keys using the Redis pipe? Is there an alternative approach to this problem?
Is there a golang/python library that does the same mass loading more efficiently (less time, smaller memory and CPU footprint)?
Do we need to fine-tune Redis here?
Any help is appreciated. Thanks in advance.
If you are using redis-cli inside the pod to move millions of keys into Redis, the pod won't be able to handle it.
Also, you have not specified any resources for Redis; since it's a memory store, it's better to give Redis proper memory, 2-3 GB depending on usage.
You can try out Redis RIOT (https://github.com/redis-developer/riot) to load data into Redis.
There is also a good video on loading the Bigfoot data set into Redis: https://www.youtube.com/watch?v=TqTg6RijfaU
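On the golang/python part of the question: with plain redis-py, batching commands through a pipeline removes most of the per-command round trips. A minimal sketch, assuming a key/value layout similar to your data.txt (the host, password, file format and batch size are placeholders to adapt):

import redis

r = redis.Redis(host="events-k8s-service", password="<pass>", db=1)

BATCH = 10_000  # trade a little client memory for far fewer round trips

def load(items):
    # items: iterable of (key, value) tuples.
    pipe = r.pipeline(transaction=False)  # no MULTI/EXEC overhead
    n = 0
    for key, value in items:
        pipe.set(key, value)
        n += 1
        if n % BATCH == 0:
            pipe.execute()   # one network round trip per batch
    pipe.execute()           # flush the remainder

def read_items(path):
    # Assumes "key<TAB>value" lines; adapt to your actual file format.
    with open(path) as f:
        for line in f:
            key, _, value = line.rstrip("\n").partition("\t")
            yield key, value

load(read_items("data.txt"))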
Do we need to fine-tune Redis here?
Increase the memory for Redis if it's getting OOMKilled.
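If you prefer to stay with redis-cli --pipe, the mass-insert doc's actual recommendation is to feed it raw RESP protocol rather than textual commands, which cuts parsing overhead. A minimal generator sketch (same file-format assumption as above; save it as, say, gen_resp.py):

import sys

def resp(*args):
    # Encode one command in the Redis wire protocol (RESP).
    parts = [b"*%d\r\n" % len(args)]
    for a in args:
        data = a.encode() if isinstance(a, str) else a
        parts += [b"$%d\r\n" % len(data), data, b"\r\n"]
    return b"".join(parts)

out = sys.stdout.buffer
out.write(resp("SELECT", "1"))
with open("data.txt") as f:        # assumed "key<TAB>value" lines
    for line in f:
        key, _, value = line.rstrip("\n").partition("\t")
        out.write(resp("SET", key, value))

Then: python gen_resp.py | redis-cli -a <pass> -h <events-k8s-service> --pipe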

Distributed tracing in Istio - expected behavior when the application does NOT propagate headers

My application (hosted in a Kubernetes cluster with Istio installed) does NOT propagate distributed tracing headers (as described here). My expectation is that istio-proxy should still generate a trace (consisting of a single call) that would be visible in Jaeger, even though of course the entire chain of calls would not be stitched together. However, that doesn't appear to be the case, as I'm not seeing any calls to my application in Jaeger.
In an attempt to troubleshoot, I have tried the following:
Logs for the istio-proxy container deployed as a sidecar to my application's container look good; I can see incoming requests to the application being registered by Envoy:
kubectl logs -f helloworld-69b7f5b6f8-chp9n -c istio-proxy
[2019-01-29T21:29:18.925Z] - 444 289 45 "127.0.0.1:80" inbound|81||helloworld.default.svc.cluster.local 127.0.0.1:45930 10.244.0.54:80 10.244.0.1:33733
[2019-01-29T21:29:29.922Z] - 444 289 25065 "127.0.0.1:80" inbound|81||helloworld.default.svc.cluster.local 127.0.0.1:46014 10.244.0.54:80 10.240.0.5:56166
[2019-01-29T21:30:05.922Z] - 444 289 15051 "127.0.0.1:80" inbound|81||helloworld.default.svc.cluster.local 127.0.0.1:46240 10.244.0.54:80 10.240.0.6:48053
[2019-01-29T21:30:31.922Z] - 444 289 36 "127.0.0.1:80" inbound|81||helloworld.default.svc.cluster.local 127.0.0.1:46392 10.244.0.54:80 10.240.0.6:47009
I have enabled tracing in Mixer's configuration, and I can now see Mixer's activity in the Jaeger UI (but still no traces of calls to my application).
I'm new to Istio, and it appears I have run out of options.
First off, is my expectation correct? Am I supposed to see traces - each consisting of a single call - in the Jaeger UI when the application doesn't propagate distributed tracing headers?
If my expectation is correct, how can I troubleshoot further? Can I somehow verify the Envoy configuration and check that it's indeed sending tracing data to Mixer?
If my expectation is incorrect, can Istio's behavior be overridden so that I get what I need?
Thank you.

Queuing in Rackspace Cloud [closed]

I've been using EC2 for deployment all this time, and now I want to give Rackspace a try. My application has to be scalable, so I use RabbitMQ as the main queuing system. Actions on the front end can lead to a very large number of jobs that need execution, which I want to queue somewhere.
Given the expected load profile of the application, it makes sense to use a scalable infrastructure like the Rackspace cloud. Now I am wondering where it would be best to queue the jobs. Queuing them on the front-end servers means the number of front-end servers can only be scaled back down once the queues are processed, which is a waste of resources: once the peak load on the front end is over, we want to scale it down and scale up the machines that process the queue items.
If we queue them on the database server, we add load to a single machine which, in the current setup, is already the most likely bottleneck. How would you design this?
Is there any built-in queuing for Rackspace, something like Amazon SQS?
They don't have anything like SQS, but there are a few good services you may be able to take advantage of:
Cloud Files
With the Akamai CDN you can push all your static content right out to your clients. I'm on the Gold Coast in Australia, and Cloud Files public content comes to me from a server in Brisbane (13 ms vs 250 ms ping times to US servers); since distance hurts download speed, that means faster downloads for your users, plus absolutely no clogging of the web server's pipes during the Christmas rush.
The way I use it is:
I create a Cloud Files container; this gets a unique hostname.
I create a CNAME DNS record (for example: cdn.supa.ws) pointing to that unique hostname.
I use cloudfuse to mount the container as a directory, both on my cloud server and on my home Linux box.
Then I just copy or upload files straight into that directory and serve them from http://cdn.yourdomain.com
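For reference, creating and publishing a container programmatically looked roughly like this with the old python-cloudfiles bindings. Treat the call names as from memory (they are assumptions, not verified) and check them against the library's documentation:

import cloudfiles

# Placeholder credentials; API calls as I remember them from python-cloudfiles.
conn = cloudfiles.get_connection('myusername', 'my_api_key')
container = conn.create_container('cdn')      # returns the container if it already exists
container.make_public(ttl=86400)              # enable CDN with a 1-day edge TTL
obj = container.create_object('logo.png')
obj.load_from_filename('/var/www/static/logo.png')
print(container.public_uri())                 # the unique hostname to point your CNAME at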
Load balancers as a service
http://www.rackspace.com/cloud/cloud_hosting_products/loadbalancers/ - basically a bunch of Zeus load balancers that you can use to push requests to your back-end servers. Cool because they're API-programmable, so you can scale on the fly and add more back-end servers as needed (see the sketch below). They also have nice weighting algorithms, so you can send more traffic to certain servers if needed.
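As an illustration of the API-programmable part: adding a back-end node to a load balancer was a single authenticated POST in the v1.0 API. A rough sketch with Python's requests; the region endpoint, account ID, load balancer ID, token and payload shape below are placeholders to verify against the current docs:

import requests

# Placeholders - fill in from your Rackspace account and auth response.
ACCOUNT_ID = "123456"
LB_ID = "78901"
TOKEN = "your-auth-token"
BASE = f"https://ord.loadbalancers.api.rackspacecloud.com/v1.0/{ACCOUNT_ID}"

# Register a new back-end server (internal VLAN address) with the load balancer.
payload = {"nodes": [{"address": "10.180.1.2", "port": 80, "condition": "ENABLED"}]}
resp = requests.post(
    f"{BASE}/loadbalancers/{LB_ID}/nodes",
    json=payload,
    headers={"X-Auth-Token": TOKEN},
)
resp.raise_for_status()
print(resp.json())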
Internal VLAN
I would recommend using the 'internal IPs' (10.x.y.z, the eth1 interface) for message queuing and DB traffic between Cloud Servers, as they give you a higher outgoing bandwidth cap.
Outgoing Bandwidth (speed) caps:
256 MB RAM - 10 Mb/s eth0 - 20 Mb/s eth1
512 MB RAM - 20 Mb/s eth0 - 40 Mb/s eth1
1 GB RAM - 30 Mb/s eth0 - 60 Mb/s eth1
2 GB RAM - 40 Mb/s eth0 - 80 Mb/s eth1
4 GB RAM - 50 Mb/s eth0 - 100 Mb/s eth1
8 GB RAM - 60 Mb/s eth0 - 120 Mb/s eth1
15.5 GB RAM - 70 Mb/s eth0 - 140 Mb/s eth1
eth1 is called an internal VLAN, but it is shared with other customers, so it's best to firewall off your eth1 as well as your eth0 - for example, only allow MySQL connections from your own Cloud Servers; and if you have really sensitive data, maybe use MySQL with SSL, just in case :)
MySQL as a service
There is also a MySQL-as-a-service private beta. I haven't tried it yet, but it looks like it has a lot of potential coolness: http://www.rackspace.com/cloud/blog/2011/12/01/announcing-the-rackspace-mysql-cloud-database-private-beta/
Rackspace doesn't offer a hosted queuing system.
I've been running RabbitMQ on their Cloud Servers for more than 2 years and things are good.
I haven't tried clustering, though, so I don't know how easy it would be to set up over there, nor how stable it would be in their environment.
Beanstalkd just rocks - tubes function as pub-sub and work like a charm on any cloud vendor. It takes 3-7 minutes to set up, and it's blazingly fast since it uses a memcached-like in-memory queue.
You can write workers in any language you choose. You cannot go wrong with this one (a minimal sketch follows the link below).
Link:
http://kr.github.com/beanstalkd/
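A producer/consumer pair is about all the code beanstalkd needs. A minimal sketch with the classic beanstalkc client (the tube name and host are placeholders; on Python 3 use a maintained client such as greenstalk, which has a very similar API):

import beanstalkc

# Both sides connect to the same beanstalkd instance (default port 11300).
conn = beanstalkc.Connection(host="localhost", port=11300)

# Producer: put jobs into a named tube.
conn.use("resize-jobs")
conn.put("image-42.jpg")

# Consumer (normally a separate worker process): take jobs from that tube.
conn.watch("resize-jobs")
conn.ignore("default")
job = conn.reserve()        # blocks until a job is available
print("working on", job.body)
job.delete()                # acknowledge completion so the job is not re-queued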