Speeding up syscall in Go - sockets

I've been working on a vpn written in go and I'm starting to try to optimize the data flow. From a cursory glance, the implementation code seems sound as there are no issues with memory leaking and CPU doesn't seem to be a constraint.
So I moved to pprof and the problem I am seeing is that most of the execution time is spent in syscall.Syscall. I did a 6 second profile of a running iperf throughput test and this is what I see:
This test is being run with both the client and server inside of docker containers with the client getting a --link to the server. Running iperf on the base bridge networking yields around 40Gbit of throughput, iperf over this vpn impl over the top of the same, nets about 500Mbit.
A simple htop shows that 3/4 of the time is spent in the system.
I've tried a couple approaches to attempt speeding up the single-client case, but I can't seem to find a way to mitigate writing packets in a vpn server... NB: iperf uses full MTU-sized packets during its test which limits some obvious optimizations.
listing Syscall:
Not sure why this is showing the CMPQ is taking all the time, I'd think that should be attributed to SYSCALL.

pprof is a process sampling profiler. It finds that the Program Counter (PC) is often waiting for CMPQ to execute while the OS is executing.
Speeding up syscall in Go
You can make the SYSCALL less often. You can improve the OS SYSCALL mechanism. You can improve the OS code that you asked the SYSCALL to execute. You can use better hardware. And so on.

Related

How do operating systems deal with fork bombs?

Consider the following fork bomb in Python (source):
import os
while 1:
os.fork()
I'm too afraid to test it out myself, but I'm somewhat skeptical that if I just took this program and ran it my computer would just freeze up and die. Assuming this is true, my question is -- what mechanisms or policies is my operating system using to fight it off?
My question can be viewed as sort of an "application" problem to what one might learn in an OS class.
As expected, when I tried it out on my machine, the computer froze and I had to hard reboot. So definitely don't do this on a regular basis.
The last error that I was able to capture from the program was:
BlockingIOError: [Errno 11] Resource temporarily unavailable
File "fork_bomb.py", line 3, in <module>
os.fork()
So at some point, the OS couldn't handle the OS fork calls and returned an error. The only other useful message I can see from /var/log/syslog is
cgroup: fork rejected by pids controller in /user.slice/user-1000.slice/session-2.scope
Cgroups are a way to restrict resources from processes within a particular cgroup. So presumably, the python processes were in a cgroup that had reached its pid/task limit. So that's one way the OS tries to deal with fork bombs, is limiting tasks using cgroups. Of course, the infinite loop of forks, even if the forks were failing, still required overhead from requesting resources from the OS, hence the system freeze.
Theoretically, another way the OS can try to limit fork bombs is through memory limits. Ignoring copy-on-write, if all the forked processes required extra memory, the Linux OOM (out of memory) killer will be called. This kernel process will be awakened when memory is tight and then its job is to start killing processes that it thinks will help free up sufficient memory to keep the system running. Memory limits can be set using cgroups or by setting the minimum free memory using /proc/sys/vm/min_free_kbytes.

Losing SSH Connectivity repeatedly on Google Compute Engine

I ve been coding on vscode remotely connected to an instance on the google Compute Engine. I have an internet connection speed of around 30-40 mbps. What I have observed is I keep losing connection to the remote machine very frequently. What I have also observed is there are times when this issue occurs especially when certain memory intensive operations are run. So,
Question 1: Is there a relationship between RAM and ssh connectivity.
Question 2: Is my internet connectivity speed a problem? If yes what is the minimum amount of speed necessary a seamless coding experience.
The only relationship between the RAM and the SSH service is that the SSH is also using RAM to be able to operate. In your case, you already got a clue that the SSH Service crashes from time to time and mostly when performing memory intensive operations. Your machine is falling short on resources and hence in order to keep the OS up, the process manager shuts down the processes. SSH is one of those processes. Once you reset the machine, all comes back to normal.
With your current speed, connection is not an issue.
One of the best ways to tackle this is:
increase the resources of your VM (RAM)
then go back to code and check the requirements and limitation of your app
You can also check this official SSH Troubleshooting guide from google. Troubleshooting SSH

Pytest live logging with parallel execution - possible?

I have a test suite that I run with
python3 -mpytest --log-cli-level=DEBUG ...
on the build server. The live logs are useful to troubleshoot if the tests get stuck or are slow for some reason (the tests use external resources).
To speed things up, it is possible to run them with e.g.
python3 -mpytest -n 4 --log-cli-level=DEBUG ...
to have four parallel test runners. Speedup is almost linear with number of processes, which is great, but unfortunately the parent process swallows all live logs. I get the captured logs in case of a test failure, but I need the live logs as well to understand what is going on in real time. I understand that the output from all four parallel runs will be intermixed and that is fine. The purpose is for the committer to just check the build server output and know roughly what is going on.
I am currently using pytest-xdist, but use none of the more advanced features from it (just the multiprocessing).

Distributed write job crashes remote machine with MongoDB server

Looking for any advice I can get.
I have 16 virtual CPUs all writing to a single remote MongoDB server. The machine that's being written to is a 64-bit machine with 32GB RAM, running Windows Server 2008 R2. After a certain amount of time, all the CPUs stop cold (no gradual performance reduction), and any attempt to get a Remote Desktop Connection hangs.
I'm writing from Python via pymongo, and the insert statement is "[collection].insert([document], safe=True)"
I decided to more actively monitor my server as the distributed write job progressed, remoting in from time to time and checking the Task Manager. What I see is a steady memory creep, from 0.0GB all the way up to 29.9GB, in a fairly linear fashion. My leading theory is therefore that my writes are filling up the memory and eventually overwhelming the machine.
Am I missing something really basic? I'm new to MongoDB, but I remember that when writing to a MySQL database, inserts are typically followed by commits, where it's the commit statement that actually makes sure the record is written. Here I'm not doing any commits...?
Thanks,
Dave
Try it with journaling turned off and see if the problem remains.

How to grab a full memory dump of a large memory usage

I am hosting IIS based web service applications on Windows 2008 64-bit system running on a Quad core 8G machine. Ran into couple of instances when W3WP was running at 7.6G of memory usage. Nothing else was responding on the system including RDP. Right click on the process from the task manager and creating the dumps, froze the system and all its threads for a long time (close to 30minutes). When the freeze up occurred during off hours, we let the dump run for a while (ran close to 1 hour) but still dump didn't complete. In the interest of getting the system up, we had to kill IIS
Tried other tools like procexp, debug diag etc to create full memory dump and all have the same results
So, what tool does the community use to grab dump files quickly? Or without freezing all the threads? I realize latter might be a rhetorical question. But what are the options for generating such a large dump file without locking up the system for a long time?
IMO you shouldn't have to wait until the process memory grows to 8 GB. I am sure with something like 3 - 4 GB you should be able to detect the memory leak.
Procdump has an option based on memory threshold
-m Memory commit threshold in MB at which to create a dump of the process.
I would you this option to dump the memory of the process.
And also SSD would help in writing faster.
WPA a.k.a xperf (http://msdn.microsoft.com/en-us/performance/cc825801.aspx) is a powerfull tool, to diagnose the applications. You will get call stack of the culprit allocation. You dont have to collect the dump and it is no-invasive and does not load much in production systems
Complete step by step information is available here. http://msdn.microsoft.com/en-us/library/ff190906(v=VS.85).aspx.