Nginx websocket proxy uses three connections per socket

I am trying to create an Nginx configuration that will serve as a proxy for incoming websocket connections (mainly for SSL offloading), but I am running into connection limits. I followed several guides and SO answers to accommodate more connections, but something odd caught my attention. I currently have 18K clients connected, and when I run ss -s on the Nginx machine, this is the report:
Total: 54417 (kernel 54537)
TCP: 54282 (estab 54000, closed 280, orphaned 0, synrecv 0, timewait 158/0), ports 18263
Transport Total IP IPv6
* 54537 - -
RAW 0 0 0
UDP 1 1 0
TCP 54002 36001 18001
INET 54003 36002 18001
FRAG 0 0 0
I understand how there can be 36K IP connections, but what I do not get is where those additional IPv6 connections come from. I am having problems scaling above 25K connections and I think part of that comes from the fact that somehow there are three connections set up for each socket. So, my question is this: does anyone know where those extra connections are coming from?
The entire system is running within a Kubernetes cluster, with the configuration as follows:
nginx.conf:
user nginx;
worker_processes auto;
worker_rlimit_nofile 500000;

error_log /dev/stdout warn;
pid /var/run/nginx.pid;

# Increase worker connections to accommodate more sockets
events {
    worker_connections 500000;
    use epoll;
    multi_accept on;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log off; # don't use it, so don't waste cpu, i/o and other resources.

    tcp_nopush on;
    tcp_nodelay on;

    include /etc/nginx/conf.d/*.conf;
}
proxy.conf (included via conf.d):
server {
    listen 0.0.0.0:443 ssl backlog=100000;

    # Set a big keepalive timeout to make sure no connections are dropped by nginx.
    # This should never be less than the MAX_CLIENT_PING_INTERVAL + MAX_CLIENT_PING_TIMEOUT in the ws-server config!
    keepalive_timeout 200s;
    keepalive_requests 0;
    proxy_read_timeout 200s;

    ssl_certificate /app/secrets/cert.chain.pem;
    ssl_certificate_key /app/secrets/key.pem;
    ssl_prefer_server_ciphers on;
    ssl_protocols TLSv1.2;

    location / {
        proxy_pass http://127.0.0.1:8443;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
I also set the following options in Unix:
/etc/sysctl.d/custom.conf:
fs.file-max = 1000000
fs.nr_open = 1000000
net.ipv4.netfilter.ip_conntrack_max = 1048576
net.core.somaxconn = 1048576
net.ipv4.tcp_max_tw_buckets = 1048576
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_max_syn_backlog = 3240000
net.nf_conntrack_max = 1048576
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
/etc/security/limits.d/custom.conf:
root soft nofile 1000000
root hard nofile 1000000
* soft nofile 1000000
* hard nofile 1000000
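As a quick sanity check (not part of the original post), the effective values can be confirmed on the running box; the PID file path below is the one set in nginx.conf above:
    sudo sysctl --system                                    # re-apply everything under /etc/sysctl.d
    sysctl net.ipv4.ip_local_port_range net.core.somaxconn  # spot-check a couple of values
    cat /proc/$(cat /var/run/nginx.pid)/limits | grep -i 'open files'   # fd limit the nginx master actually got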

With the help of some colleagues I found out that this is actually Kubernetes confusing things by joining the containers within a Pod into one IP namespace (so that each container can reach the others via localhost (link)). So what I see there is:
Incoming connections to the proxy (from the clients)
Outgoing connections from the proxy (to the local ws-server)
Incoming connections to the ws-server (the same loopback connections, seen from the server's side)
Although this does not help me achieve more connections on a single instance, it does explain the weird behaviour.
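One way to confirm that breakdown is to count established sockets per leg with ss filters; each of the three counts should land around 18K (a rough sketch using the ports from the config above: 443 for clients, 8443 for the local ws-server):
    ss -tn state established '( sport = :443 )'  | tail -n +2 | wc -l   # clients -> nginx (TLS side)
    ss -tn state established '( dport = :8443 )' | tail -n +2 | wc -l   # nginx -> ws-server, nginx's side of the loopback
    ss -tn state established '( sport = :8443 )' | tail -n +2 | wc -l   # the same loopback connections, ws-server's side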

Related

HAproxy not routing from virtual IP

I am currently trying to configure HAProxy to route between two servers using a virtual IP.
For testing I created two instances, 172.16.4.130 and 172.16.4.131. I then created a virtual IP address of 172.16.4.99 using keepalived, which bridges the two servers. Both of these servers are running apache2, which is hosting a simple index.html landing page for testing. All of the above is running.
When I go to 172.16.4.99, the page does not load, nor am I redirected to either one of the index.html pages. I can, however, ping this IP address. I feel like this is a simple configuration issue, and since I am not very experienced with HAProxy, I would like some assistance. Below are my haproxy.cfg file, as well as both keepalived.conf files.
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    #log loghost local0 info
    maxconn 4096
    #debug
    #quiet
    user haproxy
    group haproxy

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

listen webfarm 172.16.4.99:80
    mode http
    stats enable
    stats auth user:password
    balance roundrobin
    cookie JSESSIONID prefix
    option httpclose
    option forwardfor
    option httpchk HEAD /check.txt HTTP/1.0
    server webA 172.16.4.130:8080 cookie A check
    server webB 172.16.4.131:8080 cookie B check
keepalived.conf on 172.16.4.130:
vrrp_script chk_haproxy {       # Requires keepalived-1.1.13
    script "killall -0 haproxy" # cheaper than pidof
    interval 2                  # check every 2 seconds
    weight 2                    # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 101                # 101 on master, 100 on backup
    virtual_ipaddress {
        172.16.4.99
    }
    track_script {
        chk_haproxy
    }
}
keepalived.conf on 172.16.4.131:
vrrp_script chk_haproxy {       # Requires keepalived-1.1.13
    script "killall -0 haproxy" # cheaper than pidof
    interval 2                  # check every 2 seconds
    weight 2                    # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 100                # 101 on master, 100 on backup
    virtual_ipaddress {
        172.16.4.99
    }
    track_script {
        chk_haproxy
    }
}
I have built a similar structure to balance MySQL transactions, and I can reach the MySQL server behind a virtual IP. Maybe my config helps you:
https://serverfault.com/questions/857241/haproxy-dont-balancing-requests-between-nodes-of-galera-cluster
I hope it helps.
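For context, the kind of TCP-mode MySQL front end referred to there usually looks something like the sketch below (the actual config is in the linked answer; the addresses simply reuse the ones from the question, and the MySQL port 3306 is an assumption):
    listen mysql-cluster 172.16.4.99:3306
        mode tcp
        balance roundrobin
        option tcpka
        server db1 172.16.4.130:3306 check
        server db2 172.16.4.131:3306 check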

502 Bad Gateway when redirecting on nginx

I have a problem with nginx redirection. I work on nginx 1.4.4 and I have two separate redirects. It should work two ways:
First redirect: address1.com redirects to address2.com ->
address2.com redirects to address2.com:1234, where the application resides.
Second redirect is directly from address2.com:
- address2.com redirects to address2.com:1234
Now the problem:
- The redirect from address1.com to address2.com works, but address2.com to address2.com:port doesn't. It ends with a 502 Bad Gateway error. Configs and errors from the log are presented below:
Information from error.log:
[error] : *386 connect() failed (111: Connection refused) while connecting to upstream, client: {client ip addr}, server:{server name}, request:
"GET / HTTP/1.1", upstream: "https://127.0.0.1:{port}", host: "{server name}"
Nginx uses many .conf files stored in the conf.d location.
address1.conf (This works):
server {
    ### server port and name ###
    listen {ip_addr}:443;
    ssl on;
    server_name address1.com;

    access_log /var/log/nginx/address1.log;
    error_log /var/log/nginx/address1-error.log;

    ssl_certificate /etc/httpd/ssl/servercert.crt;
    ssl_certificate_key /etc/httpd/ssl/private/serverkey.key;

    location / {
        rewrite ^ $scheme://address2.com redirect;
    }
}
address2.com conf file (This doesn't):
server {
    ### server port and name ###
    listen {ip_addr}:443;
    ssl on;
    server_name address2.com;

    access_log /var/log/nginx/address2.log;
    error_log /var/log/nginx/address2-error.log;

    ssl_certificate /etc/httpd/ssl/servercert.crt;
    ssl_certificate_key /etc/httpd/ssl/private/serverkey.key;

    proxy_read_timeout 180;

    location / {
        proxy_pass https://127.0.0.1:{port};
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Ssl on;
        proxy_set_header X-Forwarded-Protocol $scheme;
        proxy_set_header X-Forwarded-HTTPS on;
    }
}
The funny thing is that I have another application working on the scheme addr3.com -> addr3.com:port, and its redirection works just fine. The only
difference between address2.conf and address3.conf is the port on which the application runs. Each address uses HTTPS, and port 443 is open on the firewall.
I hope my description is detailed enough; if not, just let me know.
I've been struggling with this problem for a couple of days and haven't found any tips or solutions that work for me.
I'd appreciate any help.
The problem might be with SELinux. Check to see if it is running with sestatus. Since some forwarding is working for you, this command might be redundant, but others might require it:
sudo setsebool -P httpd_can_network_connect 1
To enable forwarding for specific ports, which might be your problem, run this command:
sudo semanage port -a -t http_port_t -p tcp 8088
Replace 8088 with the port in question.
The semanage command might not be found. How you install it is distro dependent, but you can most likely google for a solution to that.
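For reference, a few related commands (a sketch; the package name is an assumption and varies by distro, but on RHEL/CentOS 7 semanage ships in policycoreutils-python):
    sestatus                                    # is SELinux enabled/enforcing?
    getsebool httpd_can_network_connect         # may nginx open outbound network connections?
    sudo semanage port -l | grep http_port_t    # which ports is it currently allowed to proxy to?
    sudo yum install policycoreutils-python     # provides semanage on RHEL/CentOS 7 (name differs elsewhere)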

Error in Nginx+php-fpm with keepalive & fastcgi_keep_conn on

I am trying to use Nginx + php-fpm with the nginx options 'keepalive' and 'fastcgi_keep_conn on' to keep the TCP connection between them alive, but I am facing errors after serving a few hundred requests: "104: Connection reset by peer".
These errors are visible with php-fpm started on a TCP port (9000) or a unix socket (/var/run/php5-fpm.socket).
The intention here is to reduce the new TCP/socket connection overhead between Nginx and php-fpm as much as possible and reuse connections as much as possible.
Note that I have kept nginx 'keepalive 20', whereas php-fpm has 'pm.max_requests = 0' and 'pm.start_servers = 50'.
Can anybody please help me fix this error?
Software in use:
nginx version: nginx/1.4.7
php-fpm version: 5.4.25 / 5.6.6
PHP-FPM Error log entry:
WARNING: [pool www] child 15388 exited on signal 15 (SIGTERM) after 2245.557110 seconds from start
NOTICE: [pool www] child 18701 started
Nginx Errors:
with php-fpm listening on port 9000
[error] 32310#0: *765 readv() failed (104: Connection reset by peer) while reading upstream, client: 10.10.133.xx, server: 192.168.28.xxx, request: "GET /test.php HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "10.10.133.xxx"
with php-fpm listening on socket /var/run/php5-fpm.socket
[error] 14894#0: *383 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.10.133.xx, server: 192.168.28.xxx, request: "GET /test.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.socket:", host: "10.10.133.xxx"
The following is the nginx vhost conf:
upstream fastcgi_backend {
    server 127.0.0.1:9000;
    #server unix:/var/run/php5-fpm.socket;
    keepalive 30;
}

server {
    listen 80;
    server_name 10.10.xxx.xxx;

    access_log /tmp/ngx_access_80.log;
    error_log /tmp/ngx_error_80.log;

    location ~ \.php$ {
        root /var/www/test/;
        include fastcgi_params;
        fastcgi_pass fastcgi_backend; # upstream set above
        fastcgi_keep_conn on;         # Test for keepalive connection to php-fpm
        fastcgi_buffer_size 16k;
        fastcgi_buffers 4 16k;
    }
}
The following is the php-fpm.conf:
[global]
pid = /var/run/php-fpm-9000.pid
error_log = /var/log/php-fpm-9000.log
[www]
listen = 0.0.0.0:9000
user = daemon
group = daemon
rlimit_files = 60000
pm = dynamic
pm.max_requests = 0
pm.max_children = 500
pm.start_servers = 50
pm.min_spare_servers = 40
pm.max_spare_servers = 90
You must set nginx's keepalive_requests and php-fpm's pm.max_requests to the same value to avoid getting this error:
[error] recv() failed (104: Connection reset by peer) while reading response header from upstream
If the two values do not match, then either nginx or php-fpm ends up closing the connection, triggering the error.
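As a sketch of that advice, with 500 as an arbitrary matching value (note that the upstream-level keepalive_requests directive only exists in nginx 1.15.3 and later, so it does not apply verbatim to the 1.4.7 used above):
    upstream fastcgi_backend {
        server 127.0.0.1:9000;
        keepalive 30;
        keepalive_requests 500;  # nginx 1.15.3+: retire the upstream connection after 500 requests
    }
combined with the matching php-fpm pool setting pm.max_requests = 500, so both sides recycle connections and workers at the same point.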
There is a bug in php-fpm which makes it fall over when used with nginx's
fastcgi_keep_conn on;
You need to turn that option off.
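In the location block from the vhost above, that change would look like:
    location ~ \.php$ {
        root /var/www/test/;
        include fastcgi_params;
        fastcgi_pass fastcgi_backend;
        fastcgi_keep_conn off;   # off is also the nginx default
        fastcgi_buffer_size 16k;
        fastcgi_buffers 4 16k;
    }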
This indicates that somehow the php-fpm child 15388 received a SIGTERM from the OS or from PHP-FPM itself. See https://bugs.php.net/bug.php?id=60961

Redirect users using nginx as a load balancer to save on bandwidth

I recently purchased 4 servers for my videos to even out the load. I'm currently using nginx as a load balancer, but I'm running out of bandwidth.
Is there any way I can redirect users to one of the servers to lower my bandwidth usage and still be able to detect if the server is up?
This is what I'm currently using:
upstream videos {
    server xx.xx.xxx.130:8080;
    server xx.xx.xxx.131:8080;
    server xx.xx.xxx.132:8080;
    server xx.xx.xxx.133:8080;
}

proxy_next_upstream error;

server {
    listen 80;
    server_name www.example.com;

    location / {
        proxy_pass http://videos;
        proxy_redirect off;
        proxy_set_header Host $http_host;
    }
}
You can play with the weight parameter for bandwidth balancing:
upstream videos {
    server xx.xx.xxx.130:8080 weight=5; # high bandwidth server
    server xx.xx.xxx.131:8080 weight=5; # high bandwidth server
    server xx.xx.xxx.132:8080 weight=3; # middle bandwidth server
    server xx.xx.xxx.133:8080 weight=1; # low bandwidth server
}
So, out of every 14 requests, 5 will go to each of the xx.xx.xxx.130 and .131 servers, 3 to .132, and one to .133.
Read more: http://nginx.org/en/docs/http/load_balancing.html#nginx_weighted_load_balancing

Nginx and Flask-socketio Websockets: Alive but not Messaging?

I've been having a bit of trouble getting Nginx to play nicely with the Python Flask-socketio library (which is based on gevent). Currently, since we're actively developing, I'm trying to get Nginx to just work as a proxy. For sending pages, I can get this to work, either by directly running the flask-socketio app, or by running through gunicorn. One hitch: the websocket messaging does not seem to work. The pages are successfully hosted and displayed. However, when I try to use the websockets, they do not work. They are alive enough that the websocket thinks it is connected, but they will not send a message. If I remove the Nginx proxy, they do work. Firefox gives me this error when I try to send a message:
Firefox can't establish a connection to the server at ws://<web address>/socket.io/1/websocket/<unique id>.
Where <web address> is where the server is located and <unique id> is just a bunch of randomish digits. It seems to be doing enough to keep the connection live (e.g., the client thinks it is connected), but it can't send a message over the websocket. I have to think that the issue has to do with some part of the proxy, but I am having mighty trouble debugging what the issue might be (in part because this is my first go-round with both Flask-socketIO and nginx). The configuration file I am using for nginx is:
user <user name>; ## This is set to the user name for the remote SSH session
worker_processes 5;

events {
    worker_connections 1024; ## Default: 1024
}

http {
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] $status '
                    '"$request" $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    sendfile on;
    server_names_hash_bucket_size 128; # this seems to be required for some vhosts

    server {
        listen 80;
        server_name _;

        location / {
            proxy_pass http://localhost:8000;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header Host $host;
        }
    }
}
I made the config file as an amalgam of a general example and a websocket-specific one, but fiddling with it has not solved the issue. Also, I am using the werkzeug ProxyFix call on my Flask app.wsgi_app when I use it in wsgi mode. I've tried it with and without that, to no avail, however. If anyone has some insight, I will be all ears/eyes.
I managed to fix this. The issues were not specific to flask-socketio, but they were specific to Ubuntu, Nginx, and gevent-socketio. Two significant issues were present:
Ubuntu 12.04 has a truly ancient version of nginx (1.1.19 vs 1.6.x for stable versions). Why? Who knows. What we do know is that this version does not support websockets in any useful way, as 1.3.13 is about the earliest you should be using.
By default, gevent-socketio expects your sockets to be at the location /socket.io . You can upgrade the whole HTTP connection, but I had some trouble getting that to work properly (especially after I threw SSL into the mix).
I fixed #1, but in fiddling with it I purged my nginx and apt-get installed... the default version of nginx on Ubuntu. Then, I was mysteriously confused as to why things worked even worse than before. Many .conf files valiantly lost their lives in this battle.
If you are trying to debug websockets in this configuration, I would recommend the following steps:
Check your nginx version via 'nginx -v'. If it is anything less than 1.4, upgrade it (a command sketch for the first three checks follows this list).
Check your nginx.conf settings. You need to make sure the connection upgrades.
Check that your server IP and port match your nginx.conf reverse proxy.
Check that your client (e.g., socketio.js) connects to the right location and port, with the right protocol.
Check your blocked ports. I was on EC2, so you have to manually open 80 (HTTP) and 443 (SSL/HTTPS).
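A compact way to run the first three checks from a shell (a sketch; port 8081 matches the backend used later in this answer):
    nginx -v                                 # 1. anything below 1.4 should be upgraded
    sudo nginx -t                            # 2. syntax-check the config containing the Upgrade headers
    sudo nginx -s reload                     #    reload it once the test passes
    ss -tlnp | grep -E '(:80|:443|:8081)'    # 3. confirm nginx and the backend listen where the proxy expects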
Having just checked all of these things, here are the takeaways.
Upgrading to the latest stable nginx version on Ubuntu (full ref) can be done by:
sudo apt-get install python-software-properties
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:nginx/stable
sudo apt-get update
sudo apt-get install nginx
In systems like Windows, you can use an installer and will be less likely to get a bad version.
Many config files for this can be confusing, since nginx officially added websocket support in about 2013, making earlier workaround configs obsolete. Existing config files don't tend to cover all the bases for nginx, gevent-socketio, and SSL together, but have them all separately (Nginx Tutorial, Gevent-socketio, Node.js with SSL). A config file for nginx 1.6 with flask-socketio (which wraps gevent-socketio) and SSL is:
user <user account, probably optional>;
worker_processes 2;

error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;
    access_log /var/log/nginx/access.log;

    sendfile on;
    # tcp_nopush on;
    keepalive_timeout 3;
    # tcp_nodelay on;
    # gzip on;
    client_max_body_size 20m;
    index index.html;

    map $http_upgrade $connection_upgrade {
        default upgrade;
        ''      close;
    }

    server {
        # Listen on 80 and 443
        listen 80 default;
        listen 443 ssl; # (only needed if you want SSL/HTTPS)
        server_name <your server name here, optional unless you use SSL>;

        # SSL Certificate (only needed if you want SSL/HTTPS)
        ssl_certificate <file location for your unified .crt file>;
        ssl_certificate_key <file location for your .key file>;

        # Optional: Redirect all non-SSL traffic to SSL. (if you want ONLY SSL/HTTPS)
        # if ($ssl_protocol = "") {
        #     rewrite ^ https://$host$request_uri? permanent;
        # }

        # Split off basic traffic to backends
        location / {
            proxy_pass http://localhost:8081; # 127.0.0.1 is preferred, actually.
            proxy_redirect off;
        }

        location /socket.io {
            proxy_pass http://127.0.0.1:8081/socket.io;
            proxy_redirect off;
            proxy_buffering off; # Optional
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
        }
    }
}
Checking that your Flask-socketio is using the right port is easy. This is sufficient to work with the above:
from flask import Flask, render_template, session, request, abort
import flask.ext.socketio

FLASK_CORE_APP = Flask(__name__)
FLASK_CORE_APP.config['SECRET_KEY'] = '12345' # Luggage combination

SOCKET_IO_CORE = flask.ext.socketio.SocketIO(FLASK_CORE_APP)

@FLASK_CORE_APP.route('/')
def index():
    return render_template('index.html')

@SOCKET_IO_CORE.on('message')
def receive_message(message):
    return "Echo: %s" % (message,)

SOCKET_IO_CORE.run(FLASK_CORE_APP, host='127.0.0.1', port=8081)
For a client such as socketio.js, connecting should be easy. For example:
<script type="text/javascript" src="//cdnjs.cloudflare.com/ajax/libs/socket.io/0.9.16/socket.io.min.js"></script>
<script type="text/javascript">
var url = window.location.protocol + document.domain + ':' + location.port,
socket = io.connect(url);
socket.on('message', alert);
io.emit("message", "Test")
</script>
Opening ports is really more of a server-fault or a superuser issue, since it will depend a lot on your firewall. For Amazon EC2, see here.
If trying all of this does not work, cry. Then return to the top of the list. Because you might just have accidentally reinstalled an older version of nginx.