How to mass insert keys in Redis running as a pod in Kubernetes

I have referred to https://redis.io/topics/mass-insert, tried the Redis protocol / pipe-mode approach described there, and ran
cat data.txt | redis-cli -a <pass> -h <events-k8s-service> --pipe --pipe-timeout 100 > /dev/null
The redirection to /dev/null is to ignore the replies; CLIENT REPLY OFF cannot serve that purpose from the CLI here, as it may turn into a blocking command.
data.txt has around 18 million records/commands like:
SELECT 1
SET key1 '"field1":"val1","field2":"val2","field3":"val3","field4":"val4","field5":val5,"field6":val6'
SET key2 '"field1":"val1","field2":"val2","field3":"val3","field4":"val4","field5":val5,"field6":val6'
.
.
.
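For reference, the mass-insert guide linked above recommends feeding raw Redis protocol (RESP) to --pipe rather than plain inline commands; below is a minimal Python sketch of converting commands like the above (the file names data.txt/data.resp are assumptions):
def to_resp(*args):
    # Encode one command as a RESP array of bulk strings, e.g.
    # SET key1 val1 -> *3\r\n$3\r\nSET\r\n$4\r\nkey1\r\n$4\r\nval1\r\n
    out = [f"*{len(args)}\r\n"]
    for arg in args:
        out.append(f"${len(arg.encode('utf-8'))}\r\n{arg}\r\n")
    return "".join(out)

with open("data.txt") as src, open("data.resp", "w", newline="") as dst:
    for line in src:
        if not line.strip():
            continue
        parts = line.rstrip("\n").split(" ", 2)  # e.g. ["SET", "key1", "'...'"]
        dst.write(to_resp(*parts))

# then: cat data.resp | redis-cli -a <pass> -h <events-k8s-service> --pipe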
This command is executed from a CronJob that execs into the Redis pod and runs it from within the pod. To understand the footprint, the Redis pod was given no resource limits, and these are the observations:
Keys loaded: 18147292
Time taken: ~31 minutes
Peak CPU: 2063 m
Peak Memory: 4745 Mi
The resources consumed are way too high and the time taken is too long.
The questions:
How do we mass load data on the order of 50 million keys using the redis pipe, and is there an alternative approach to this problem?
Is there a Golang/Python library that does the same mass loading efficiently (less time, smaller memory and CPU footprint)?
Do we need to fine tune Redis here?
Any help is appreciated. Thanks in advance.

If you are using redis-cli inside the pod to push millions of keys, the Redis pod won't be able to handle it well.
Also, you have not specified any resources for Redis; since it is an in-memory store, it is better to give it adequate memory, around 2-3 GB depending on usage.
You can try out Redis RIOT: https://github.com/redis-developer/riot to load the data into Redis.
There is also a good video on loading the Bigfoot data into Redis: https://www.youtube.com/watch?v=TqTg6RijfaU
Do we need to fine tune redis here?
Increase the memory for Redis if it is getting OOMKilled.
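On the Golang/Python question: redis-py can push the same SET commands through a client-side pipeline so that each network round trip carries many commands. A minimal sketch, assuming data.txt keeps the SET layout shown in the question (host, password, db index and batch size are assumptions too):
import redis

BATCH = 10_000  # commands per round trip; tune for memory vs. throughput

r = redis.Redis(host="events-k8s-service", password="<pass>", db=1)

def load(path):
    pipe = r.pipeline(transaction=False)  # plain batching, no MULTI/EXEC
    pending = 0
    with open(path) as f:
        for line in f:
            if not line.startswith("SET "):
                continue  # db 1 is already selected via db=1
            _, key, value = line.rstrip("\n").split(" ", 2)
            pipe.set(key, value)
            pending += 1
            if pending >= BATCH:
                pipe.execute()
                pending = 0
    if pending:
        pipe.execute()  # flush the last partial batch

load("data.txt")
Running such a loader from the CronJob pod instead of exec'ing into the Redis pod also keeps the client's own CPU and memory out of the Redis container's footprint.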

Related

Ubuntu crashing - How to diagnose

I have a dedicated server running Ubuntu 20.04, with cPanel 106.11, MySQL 8, PHP 8.1 and Elasticsearch 7.17.8, and I run Magento 2.4.5-p1. ConfigServer Security & Firewall is enabled.
Every couple of days I get a monitoring alert saying my server doesn't respond to ping and the host has to do a hard reboot. They are getting frustrated with this and say they will turn off monitoring unless I sort this out, as they have checked all the hardware and it is fine.
This happens at different times, usually overnight.
I have looked through syslog, the MySQL log, the Elasticsearch log, the Magento 2 logs, the Apache log and kern.log, and I can't find the cause of the issue.
I have enabled sar; RAM usage around the time is 64% and CPU usage is between 5-10%.
What else can I look at to try and diagnose this issue?
Additional info requested by Wilson:
select count - https://justpaste.it/6zc95
show global status - https://justpaste.it/6vqvg
show global variables - https://justpaste.it/cb52m
full process list - https://justpaste.it/d41lt
status - https://justpaste.it/9ht1i
show engine innodb status - https://justpaste.it/a9uem
top -b -n 1 - https://justpaste.it/4zdbx
top -b -n 1 -H - https://justpaste.it/bqt57
ulimit -a - https://justpaste.it/5sjr4
iostat -xm 5 3 - https://justpaste.it/c37to
df -h, df -i, free -h and cat /proc/meminfo - https://justpaste.it/csmwh
htop - https://freeimage.host/i/HAKG0va
The server uses NVMe drives, 32 GB RAM and 6 cores; MySQL runs on the same server as LiteSpeed.
The server has not gone down again since posting this, but the datacentre usually reboots it within 15-20 minutes, and 99% of the time it happens overnight. The server is not accessible over SSH when it crashes.
Rate Per Second = RPS
Suggestions to consider for your instance (should be available in your cpanel as they are all dynamic variables)
connect_timeout=30 # from 10 seconds to reduce aborted_connects RPHr of 75
innodb_io_capacity=900 # from 200 to use more of NVME IOPS capacity
thread_cache_size=36 # from 9 to reduce threads_created RPHr of 75
read_rnd_buffer_size=32768 # from 256K to reduce handler_read_rnd_next RPS of 5,805
read_buffer_size=524288 # from 128K to reduce handler_read_next RPS of 5,063
Many more opportunities exist to improve performance of your instance.
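Since these are all dynamic variables, they can also be applied without a restart; a minimal Python sketch (connection details are placeholders, and the same values should still be persisted in my.cnf so they survive a restart):
import pymysql

changes = {
    "connect_timeout": 30,
    "innodb_io_capacity": 900,
    "thread_cache_size": 36,
    "read_rnd_buffer_size": 32768,
    "read_buffer_size": 524288,
}

# requires SUPER / SYSTEM_VARIABLES_ADMIN privilege
conn = pymysql.connect(host="localhost", user="root", password="<pass>")
with conn.cursor() as cur:
    for name, value in changes.items():
        cur.execute(f"SET GLOBAL {name} = %s", (value,))
conn.close()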
View my profile for contact info, please. We are pushing the one-question/one-answer limit planned for this platform.

Debugging What Process Most Consumed Memory on Pods

I have an issue where my running application has almost reached its limit of 1 Gi. I have checked:
describe pod, but no events show up
htop via exec, but it shows nothing heavy running in the background
memory.stat, which shows this
How can I debug which process consumes most of my memory? I don't have much idea about memory.stat; I have already read the memory.stat documentation in the kernel docs and some Stack Overflow posts, but I am still puzzled. Could you please give me a suggestion?
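For reference on memory.stat itself: it is just whitespace-separated name/value pairs, so a small sketch can rank the biggest counters (cgroup v1 path assumed; under cgroup v2 the file is /sys/fs/cgroup/memory.stat, and some counters are event counts rather than bytes):
STAT = "/sys/fs/cgroup/memory/memory.stat"  # run inside the container

entries = {}
with open(STAT) as f:
    for line in f:
        name, value = line.split()
        entries[name] = int(value)

# cache / rss / mapped_file are bytes; pgfault-style counters are event counts
for name, value in sorted(entries.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{name:25s} {value:>15,}")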
htop is a good approach to find relative memory utilization. We see on the screenshot that only apache2 is running inside the pod. Knowing Apache, I would guess that it has big log files. Can you check with kubectl describe pod whether it uses emptyDir volumes?
Another approach, from inside the pod, is to run du -sh /var/log/apache2/* (check the log location in the config file if there are no logs there); if there are big file(s), truncate them with cat /dev/null > /var/log/apache2/[name_of_file] and check memory usage again. If the volume is backed by RAM you should see a decrease in memory usage.

What is the best way to monitor Heroku Postgres memory and CPU

We're on Heroku and trying to understand if it's time to upgrade our Postgres database or not. I have two questions:
Are there any tools you know of that track Heroku Postgres logs to monitor their memory and CPU usage stats over time?
Are those (Memory and CPU usage) even the best metrics to look at to determine if we should upgrade to a larger instance or not?
The most useful tool I've found for monitoring Heroku Postgres instances is the logs associated with the database's dyno, which you can monitor using heroku logs -t -d heroku-postgres. This spits out some useful stats every 5 minutes, so if your logs fill up quickly it might not output anything right away; use -t to wait for the next log line.
Output will look something like this:
2022-06-27T16:34:49.000000+00:00 app[heroku-postgres]: source=HEROKU_POSTGRESQL_SILVER addon=postgresql-fluffy-55941 sample#current_transaction=81770844 sample#db_size=44008084127bytes sample#tables=1988 sample#active-connections=27 sample#waiting-connections=0 sample#index-cache-hit-rate=0.99818 sample#table-cache-hit-rate=0.9647 sample#load-avg-1m=0.03 sample#load-avg-5m=0.205 sample#load-avg-15m=0.21 sample#read-iops=14.328 sample#write-iops=15.336 sample#tmp-disk-used=543633408 sample#tmp-disk-available=72435159040 sample#memory-total=16085852kB sample#memory-free=236104kB sample#memory-cached=15075900kB sample#memory-postgres=223120kB sample#wal-percentage-used=0.0692420374380985
The main stats I pay attention to are table-cache-hit-rate, which is a good proxy for how much of your active dataset fits in memory, and load-avg-1m, which tells you how much load per CPU the server is experiencing.
You can read more about all these metrics here.
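If you want to track those two stats over time rather than eyeballing the log, here is a minimal sketch that parses the sample# pairs from lines like the one above (the script name pg_stats.py is an assumption):
import re
import sys

SAMPLE = re.compile(r"sample#([\w-]+)=([^ ]+)")

# usage: heroku logs -t -d heroku-postgres | python pg_stats.py
for line in sys.stdin:
    metrics = dict(SAMPLE.findall(line))
    if "table-cache-hit-rate" not in metrics:
        continue  # not a heroku-postgres sample line
    print(metrics.get("table-cache-hit-rate"), metrics.get("load-avg-1m"))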

Writing to neo4j pod takes much more time than writing to local neo4j

I have Python code where I process some data, write Neo4j queries and then commit these queries to Neo4j. When I run the code on my local machine and write the output to a local Neo4j, it doesn't take more than 15 minutes. However, when I run my code locally and write the output to the Neo4j pod in k8s, it takes double the time, and when I build my code, deploy it to k8s, run that pod and write the output to the Neo4j pod, it takes around 3 hours. Since I'm new to k8s deployment it might be something in the pod configuration or settings, so I would appreciate some hints.
There could be a few reasons for that.
I would first check how many resources your pod consumes while you are processing data; you can do that using kubectl top pod.
Second, I would check whether there are any limits set on the pod. You can read a great deal about them in Managing Compute Resources for Containers.
If you have a limit set, it might be too low, and that is what is causing the extended processing time.
If limits are not set, it might be because of how you installed minik8s. I think by default it is installed with 4G of memory; you can look at alternative methods of installing minik8s. With Multipass you can specify more memory to allocate.
There can also be an issue with page cache sizing, heap sizing or the number of open files. Please read Neo4j Performance Tuning.
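To check the limits point without digging through manifests, a minimal sketch with the official Kubernetes Python client (pod name and namespace are placeholders):
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() from inside a pod
v1 = client.CoreV1Api()

pod = v1.read_namespaced_pod(name="neo4j-0", namespace="default")
for c in pod.spec.containers:
    # limits=None means no limit; a low memory limit would explain the slowdown
    print(c.name, "requests:", c.resources.requests, "limits:", c.resources.limits)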

Cassandra: CL != ALL on write will overload cluster

I am running a series of benchmarks with Cassandra. Among others, I tried the following configuration: 1 client node, 3 server nodes (same ring). All experiments are run after cleaning up the servers:
pkill -9 java; sleep 2; rm -r /var/lib/cassandra/*; ./apache-cassandra-1.2.2/bin/cassandra -f
then I run cassandra-stress from the client node (3 replicas, consistency ANY/ALL):
[stop/clean/start servers]
./tools/bin/cassandra-stress -o INSERT -d server1,server2,server3 -l 3 -e ANY
[224 seconds]
[stop/clean/start servers]
./tools/bin/cassandra-stress -o INSERT -d server1,server2,server3 -l 3 -e ALL
[368 seconds]
One would deduce that decreasing the consistency level increases performance. However, there is no reason why this should happen. The bottleneck is the CPU on the servers and they all have to eventually do a local write. In fact, a careful read of the server logs reveals that hinted hand-off has taken place. Repeating the experiment, I sometimes get UnavailableException on the client and "MUTATION messages dropped" on the server.
Is this issue documented? Should CL != ALL be considered harmful on writes?
I'm not quite sure what your point is. Things appear to be working as designed.
Yes, if you're writing at CL.ONE it will complete the write faster than at CL.ALL, because it only has to get an ACK from one node, not all of them.
However, you're not measuring the time that will be taken to repair the data. You will get time spent queueing up and processing the hinted handoffs; however, nodes only hold onto hints for an hour.
Eventually, you'll have to run a nodetool repair to correct the consistency and delete the tombstones.
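To see the same trade-off outside cassandra-stress, a minimal sketch with the DataStax Python driver (contact points, keyspace, table and column names are placeholders):
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["server1", "server2", "server3"])
session = cluster.connect("stress_ks")

insert = "INSERT INTO standard1 (key, value) VALUES (%s, %s)"

# ANY/ONE acknowledges once the coordinator (or one replica) accepts the write;
# ALL waits for every replica, so the client sees the full replication latency.
fast = SimpleStatement(insert, consistency_level=ConsistencyLevel.ANY)
safe = SimpleStatement(insert, consistency_level=ConsistencyLevel.ALL)

session.execute(fast, ("k1", "v1"))
session.execute(safe, ("k2", "v2"))
cluster.shutdown()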