Error in Nginx+php-fpm with keepalive & fastcgi_keep_conn on - sockets

I am trying to use Nginx + php-fpm with the nginx options 'keepalive' and 'fastcgi_keep_conn on' to keep TCP connections between them alive, but after serving a few hundred requests I start seeing "104: Connection reset by peer" errors.
These errors occur whether php-fpm listens on a TCP port ( 9000 ) or a Unix socket ( /var/run/php5-fpm.socket ).
The intention here is to reduce the overhead of new TCP/socket connections between Nginx and php-fpm as much as possible and to reuse connections wherever possible.
Note that I have set nginx 'keepalive 20', whereas php-fpm has 'pm.max_requests = 0' and 'pm.start_servers = 50'.
Can anybody please help me fix this error?
Software in use:
nginx version: nginx/1.4.7
php-fpm version: 5.4.25 / 5.6.6
PHP-FPM Error log entry:
WARNING: [pool www] child 15388 exited on signal 15 (SIGTERM) after 2245.557110 seconds from start
NOTICE: [pool www] child 18701 started
Nginx Errors:
with php-fpm listening on port 9000
[error] 32310#0: *765 readv() failed (104: Connection reset by peer) while reading upstream, client: 10.10.133.xx, server: 192.168.28.xxx, request: "GET /test.php HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "10.10.133.xxx"
with php-fpm listening on socket /var/run/php5-fpm.socket
[error] 14894#0: *383 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.10.133.xx, server: 192.168.28.xxx, request: "GET /test.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.socket:", host: "10.10.133.xxx"
Following is the nginx vhost conf
upstream fastcgi_backend {
server 127.0.0.1:9000;
#server unix:/var/run/php5-fpm.socket;
keepalive 30;
}
server {
listen 80;
server_name 10.10.xxx.xxx;
access_log /tmp/ngx_access_80.log;
error_log /tmp/ngx_error_80.log;
location ~ \.php$ {
root /var/www/test/;
include fastcgi_params;
fastcgi_pass fastcgi_backend; # upstream defined above
fastcgi_keep_conn on; #Test for keepalive connection to php-fpm
fastcgi_buffer_size 16k;
fastcgi_buffers 4 16k;
}
}
Following is the php-fpm.conf
[global]
pid = /var/run/php-fpm-9000.pid
error_log = /var/log/php-fpm-9000.log
[www]
listen = 0.0.0.0:9000
user = daemon
group = daemon
rlimit_files = 60000
pm = dynamic
pm.max_requests = 0
pm.max_children = 500
pm.start_servers = 50
pm.min_spare_servers = 40
pm.max_spare_servers = 90

You must set nginx keepalive_requests and php-fpm pm.max_requests to the same value to avoid this error:
[error] recv() failed (104: Connection reset by peer) while reading
response header from upstream
If the two values do not match, one side (nginx or php-fpm) closes the connection while the other still expects to reuse it, which triggers the error.
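As a rough sketch of what "matching" could look like, assuming a recent nginx where keepalive_requests is accepted inside the upstream block (the value 500 is arbitrary, not taken from the question's config):
upstream fastcgi_backend {
    server 127.0.0.1:9000;
    keepalive 30;
    keepalive_requests 500;    # retire an upstream connection after 500 requests
}
and on the php-fpm side:
pm.max_requests = 500          ; respawn a worker after the same number of requests
That way both sides agree on when a connection or worker is retired, rather than one side resetting a connection the other still considers reusable.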

There is a bug in php-fpm that makes it fall over when used with nginx's 'fastcgi_keep_conn on;'. You need to turn that option off.
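If you go that route, the location block from the question simplifies to the following (off is also nginx's default, so simply removing the directive has the same effect):
location ~ \.php$ {
    root /var/www/test/;
    include fastcgi_params;
    fastcgi_pass fastcgi_backend;
    fastcgi_keep_conn off;    # nginx closes the FastCGI connection after each request
    fastcgi_buffer_size 16k;
    fastcgi_buffers 4 16k;
}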

This indicates that the php-fpm child 15388 received a SIGTERM from the OS or from the PHP-FPM master process. See https://bugs.php.net/bug.php?id=60961

Meteor Error: write after end

EDIT
It seems that the second server DOES occasionally get this error too, which makes me nearly certain it's a config problem. Could it be one of:
net.ipv4.tcp_fin_timeout = 2
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse =1
version information as requested: Meteor: 1.5.0
OS: Ubuntu 16.04
Provider: AWS EC2
I'm getting the following error intermittently and seemingly randomly on both processes running on one server (of a pair). The other server never gets this error, and the error doesn't refer to any code I've written, so I can only assume it's (a) a bug in Meteor or (b) a bug in my server config. The server whose processes are crashing is also hosting two other Meteor sites, both of which occasionally get this error:
Error: write after end
at writeAfterEnd (_stream_writable.js:167:12)
at PassThrough.Writable.write (_stream_writable.js:212:5)
at IncomingMessage.ondata (_stream_readable.js:542:20)
at emitOne (events.js:77:13)
at IncomingMessage.emit (events.js:169:7)
at IncomingMessage.Readable.read (_stream_readable.js:368:10)
at flow (_stream_readable.js:759:26)
at resume_ (_stream_readable.js:739:3)
at nextTickCallbackWith2Args (node.js:511:9)
at process._tickDomainCallback (node.js:466:17)
Things I've already checked:
memory limits (nowhere near close)
connection limits - very small, around 20 per server at the time of failure, and the processes were bumped to the second server within 1 minute, which handled them plus its own just fine
process limits - both processes on server 1 failed within 7 minutes of each other.
server config - while I was trying to eke out a little extra performance during load testing, I modified sysctl.conf based on a post I saw about high-load node.js servers. Below are the contents of the faulty server's sysctl.conf; note, however, that the functioning server has an identical config.
fs.file-max = 1000000
fs.nr_open = 1000000
ifs.file-max = 70000
net.nf_conntrack_max = 1048576
net.ipv4.netfilter.ip_conntrack_max = 32768
net.ipv4.tcp_fin_timeout = 2
net.ipv4.tcp_max_orphans = 8192
net.ipv4.ip_local_port_range = 16768 61000
net.ipv4.tcp_max_syn_backlog = 10024
net.ipv4.tcp_max_tw_buckets = 360000
net.core.netdev_max_backlog = 2500
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse =1
net.core.somaxconn = 20048
I have an NGINX balancer on server1 which load balances across the 4 processes (2 per server). The NGINX error log is littered with lines as follows:
2017/08/17 16:15:01 [warn] 1221#1221: *6233472 an upstream response is buffered to a temporary file /var/lib/nginx/proxy/1/46/0000029461 while reading upstream, client: 164.68.80.47, server: server redacted, request: "GET path redacted HTTP/1.1", upstream: "path redacted", host: "host redacted", referrer: "referrer redacted"
At the time of the error, I see a pair of lines like this:
2017/08/17 15:07:19 [error] 1222#1222: *6215301 connect() failed (111: Connection refused) while connecting to upstream, client: ip redacted, server: server redacted, request: "GET /admin/sockjs/info?cb=o2ziavvsua HTTP/1.1", upstream: "http://127.0.0.1:8080/admin/sockjs/info?cb=o2ziavvsua", host: "hostname redacted", referrer: "referrer redacted"
2017/08/17 15:07:19 [warn] 1222#1222: *6215301 upstream server temporarily disabled while connecting to upstream, client: ip redacted, server: server redacted, request: "GET /admin/sockjs/info?cb=o2ziavvsua HTTP/1.1", upstream: "http://127.0.0.1:8080/admin/sockjs/info?cb=o2ziavvsua", host: "hostname redacted", referrer: "referrer redacted"
If it matters at all, I'm using a 3 node mongo replica set, where both servers are pointing at all 3 nodes.
I'm also using a custom hosted version of kadira (since it went offline).
If there is no way to stop the errors, is there any way to stop them from taking down the entire process? There are times when 50-100 users are connected per process, and booting them all because of one error seems excessive.
It's been two days without a crash, so I think the solution was changing:
net.ipv4.tcp_fin_timeout = 2
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
to
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 0
I don't know which of those was causing the problem (probably the timeout). I still think it's a "bug" that a single "write after end" error crashes the entire Meteor process. Perhaps this should simply be logged.
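If the error cannot be eliminated, one possible stopgap is to log uncaught exceptions instead of letting a single stream error kill the process. This is a plain Node.js sketch, not Meteor-specific; Node's own documentation treats this handler as a last resort, so whether continuing is safe for your app is an assumption you would have to verify:
// Last-resort handler: log the error rather than crashing every connected user.
// Assumes process state is still sane enough to keep serving, which is not guaranteed.
process.on('uncaughtException', function (err) {
  console.error('Uncaught exception:', err.stack || err);
});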

Infinite redirect - nginx

I seem to be having an issue with an infinite redirect with nginx. This has been driving me crazy for the last half hour or so because I am unable to identify where the infinite redirect is occurring.
vHost:
Server ID : SuperUser Shell : sites-available/ > # cat example.com-ssl
server {
listen 80;
server_name www.example.com example.com;
return 301 https://www.example.com$request_uri;
}
server {
listen 443 ssl;
server_name example.com;
return 301 https://www.example.com$request_uri;
# This is for troubleshooting
access_log /var/log/nginx/www.example.com/access.log;
error_log /var/log/nginx/www.example.com/error.log debug;
}
server {
listen 443 default_server ssl;
server_name www.example.com;
ssl on;
ssl_certificate /etc/ssl/certs/www.example.com/2017/www.example.com.crt;
ssl_certificate_key /etc/ssl/certs/www.example.com/2017/www.example.com.key;
ssl_trusted_certificate /etc/ssl/certs/www.example.com/2017/www.example.com.ca-bundle;
ssl_protocols TLSv1.1 TLSv1.2;
ssl_ciphers 'EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH';
ssl_prefer_server_ciphers on;
ssl_session_cache shared:SSL:10m;
ssl_dhparam /etc/ssl/certs/www.example.com/2017/dhparam.pem;
add_header Strict-Transport-Security "max-age=63072000; includeSubdomains; ";
location / {
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $http_host;
proxy_pass http://127.0.0.1:2368;
}
access_log /var/log/nginx/www.example.com/access.log;
error_log /var/log/nginx/www.example.com/error.log;
}
Server ID : SuperUser Shell : sites-available/ > #
Additional info:
I should also point out that this is a Ghost blogging server and that I have updated config.js to use https instead of http:
Server ID : SuperUser Shell : ghost/ > # cat config.js
// # Ghost Configuration
// Setup your Ghost install for various [environments](http://support.ghost.org/config/#about-environments).
// Ghost runs in `development` mode by default. Full documentation can be found at http://support.ghost.org/config/
var path = require('path'),
config;
config = {
// ### Production
production: {
url: 'https://www.example.com',
mail: {},
database: {
client: 'sqlite3',
connection: {
filename: path.join(__dirname, '/content/data/ghost.db')
},
debug: false
},
server: {
host: '0.0.0.0',
port: '2368'
}
},
// ### Development **(default)**
development: {
url: 'https://www.example.com',
database: {
client: 'sqlite3',
connection: {
filename: path.join(__dirname, '/content/data/ghost-dev.db')
},
debug: false
},
},
...
Server ID : SuperUser Shell : ghost/ > #
I also restarted the Node process using pm2 (what I use to keep Ghost running). I even went as far as stopping the process and starting it again.
cURL output:
... Same thing as below for 49 times
* Ignoring the response-body
* Connection #0 to host www.example.com left intact
* Issue another request to this URL: 'https://www.example.com/'
* Found bundle for host www.example.com: 0x263b920
* Re-using existing connection! (#0) with host www.example.com
* Connected to www.example.com (123.45.67.89) port 443 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: www.example.com
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
* Server nginx/1.4.6 (Ubuntu) is not blacklisted
< Server: nginx/1.4.6 (Ubuntu)
< Date: Sat, 26 Nov 2016 08:56:23 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 63
< Connection: keep-alive
< X-Powered-By: Express
< Location: https://www.example.com/
< Vary: Accept, Accept-Encoding
< Strict-Transport-Security: max-age=63072000; includeSubdomains;
<
* Ignoring the response-body
* Connection #0 to host www.example.com left intact
* Maximum (50) redirects followed
Questions:
Does anyone see where I am making my mistake?
I copied my nginx configuration from my RHEL 7.3 server running nginx/1.10.2 (where there are no issues) to my Ubuntu 14.04 server running nginx/1.4.6 (Ubuntu). Could this be part of my problem?
I did not receive an error from nginx -t before pushing the vHost config change.
This leads me to think that there is not a syntax issue with my vHost, but a vHost misconfiguration still seems the most logical explanation.
Something I was wondering about earlier is that this configuration is quite a bit longer in nginx than redirecting the bare domain to www and then to https://www is in Apache. Am I doing this right for nginx?
I am following documentation put forth here:
https://www.digitalocean.com/community/tutorials/how-to-create-temporary-and-permanent-redirects-with-apache-and-nginx
I hope that this is not too much information. I definitely don't want to give too little information.
Thanks for any help / pointers.
This is super embarrassing, but when comparing configs with my production server I found out that Ghost does not need to have https specified in config.js. This caused my first infinite redirect loop.
PROD : SuperUser Shell : ghost/ > # grep sitename config.js
url: 'http://www.sitename.com',
url: 'http://www.sitename.com',
PROD : SuperUser Shell : ghost/ > #
Secondly, I received another redirect loop from CloudFlare when re-enabling DNS protection.
To correct this issue, go to the Overview tab > Settings Summary > click on SSL and change SSL from "Flexible" to "Full (Strict)".
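After both changes, one quick way to confirm the loop is gone is to follow the redirects with curl and check that the chain settles on a 200 (using the www hostname from the question):
curl -sIL https://www.example.com/ | grep -E 'HTTP/|Location:'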

Nginx websocket proxy uses three connections per socket

I am trying to create an Nginx configuration that will serve as a proxy to incoming websocket connections (mainly for SSL offloading), but I am running into connection limits. I followed several guides and SO answers to accommodate more connections but something weird caught my attention. I currently have 18K clients connected and when I run ss -s on the Nginx machine, this is the report:
Total: 54417 (kernel 54537)
TCP: 54282 (estab 54000, closed 280, orphaned 0, synrecv 0, timewait 158/0), ports 18263
Transport Total IP IPv6
* 54537 - -
RAW 0 0 0
UDP 1 1 0
TCP 54002 36001 18001
INET 54003 36002 18001
FRAG 0 0 0
I understand how there can be 36K IP connections, but what I do not get is where those additional IPv6 connections come from. I am having problems scaling above 25K connections and I think part of that comes from the fact that somehow there are three connections set up for each socket. So, my question is this: does anyone know where those extra connections are coming from?
The entire system is running within a Kubernetes cluster, with the configuration as follows:
nginx.conf:
user nginx;
worker_processes auto;
worker_rlimit_nofile 500000;
error_log /dev/stdout warn;
pid /var/run/nginx.pid;
# Increase worker connections to accommodate more sockets
events {
worker_connections 500000;
use epoll;
multi_accept on;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log off; # don't use it, so don't waste cpu, i/o and other resources.
tcp_nopush on;
tcp_nodelay on;
include /etc/nginx/conf.d/*.conf;
}
proxy.conf (included via conf.d):
server {
listen 0.0.0.0:443 ssl backlog=100000;
# Set a big keepalive timeout to make sure no connections are dropped by nginx
# This should never be less than the MAX_CLIENT_PING_INTERVAL + MAX_CLIENT_PING_TIMEOUT in the ws-server config!
keepalive_timeout 200s;
keepalive_requests 0;
proxy_read_timeout 200s;
ssl_certificate /app/secrets/cert.chain.pem;
ssl_certificate_key /app/secrets/key.pem;
ssl_prefer_server_ciphers On;
ssl_protocols TLSv1.2;
location / {
proxy_pass http://127.0.0.1:8443;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
}
I also set the following options in Unix:
/etc/sysctl.d/custom.conf:
fs.file-max = 1000000
fs.nr_open = 1000000
net.ipv4.netfilter.ip_conntrack_max = 1048576
net.core.somaxconn = 1048576
net.ipv4.tcp_max_tw_buckets = 1048576
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_max_syn_backlog = 3240000
net.nf_conntrack_max = 1048576
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
/etc/security/limits.d/custom.conf:
root soft nofile 1000000
root hard nofile 1000000
* soft nofile 1000000
* hard nofile 1000000
With the help of some colleagues I found out that this is actually Kubernetes confusing things by joining all containers within a Pod into one network namespace (so that each container can reach the others via localhost). So what I see there is:
Incoming client connections to the proxy
Outgoing connections from the proxy to the server
Incoming connections on the server (the same proxy-to-server connections, counted again because both containers share the Pod's network namespace)
Although this does not help me to achieve more connections on a single instance, it does explain the weird behaviour.
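For what it's worth, the three legs can be told apart by filtering ss on the ports from the config above (443 for client TLS connections, 8443 for the backend), since everything shares the Pod's network namespace:
ss -tn state established '( sport = :443 )'                      # clients -> nginx
ss -tn state established '( sport = :8443 or dport = :8443 )'    # nginx <-> backend, seen from both ends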

502 Bad Gateway when redirecting on nginx

I have a problem with nginx redirection. I am working with nginx 1.4.4 and I have two separate redirects. It should work in two ways:
First redirect: address1.com redirects to address2.com ->
address2.com then redirects to address2.com:1234, where the application resides.
Second redirect is directly from address2.com:
- address2.com redirects to address2.com:1234
Now the problem:
- The redirect from address1.com to address2.com works, but address2.com to address2.com:port doesn't. It ends with a 502 Bad Gateway error. The configs and errors from the log are presented below:
Information from error.log:
[error] : *386 connect() failed (111: Connection refused) while connecting to upstream, client: {client ip addr}, server:{server name}, request:
"GET / HTTP/1.1", upstream: "https://127.0.0.1:{port}", host: "{server name}"
Nginx uses many .conf files stored in conf.d location.
address1.conf (This works):
server {
### server port and name ###
listen {ip_addr}:443;
ssl on;
server_name address1.com;
access_log /var/log/nginx/address1.log;
error_log /var/log/nginx/address1-error.log;
ssl_certificate /etc/httpd/ssl/servercert.crt;
ssl_certificate_key /etc/httpd/ssl/private/serverkey.key;
location / {
rewrite ^ $scheme://address2.com redirect;
}}
address2.com conf file (This doesn't):
server {
### server port and name ###
listen {ip_addr}:443;
ssl on;
server_name address2.com;
access_log /var/log/nginx/address2.log;
error_log /var/log/nginx/address2-error.log;
ssl_certificate /etc/httpd/ssl/servercert.crt;
ssl_certificate_key /etc/httpd/ssl/private/serverkey.key;
proxy_read_timeout 180;
location / {
proxy_pass https://127.0.0.1:{port};
proxy_redirect off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Ssl on;
proxy_set_header X-Forwarded-Protocol $scheme;
proxy_set_header X-Forwarded-HTTPS on;
}}
The funny thing is that I have another application working on the scheme address3.com -> address3.com:port, and that redirection works just fine. The only difference between address2.conf and address3.conf is the port the applications run on. Each address uses https, and port 443 is open on the firewall.
Hope my description is detailed enough, if not just let me know.
I've been struggling with this problem for couple of days and haven't found any tips or solutions suitable for me.
I'd appreciate any help.
The problem might be with SELinux. Check whether it is running with sestatus. Since some forwarding is already working for you, the following command might be redundant, but others might require it:
sudo setsebool -P httpd_can_network_connect 1
To enable forwarding for specific ports, which might be your problem, run this command:
sudo semanage port -a -t http_port_t -p tcp 8088
Replace 8088 with the port in question.
The semanage command might not be found. How you install it is distro-dependent, but you can most likely google for a solution to that.
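Before adding anything, it may help to check what is already allowed (sestatus, getsebool and semanage are standard SELinux tools; semanage usually ships in a policycoreutils package):
sestatus
getsebool httpd_can_network_connect
sudo semanage port -l | grep http_port_t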

Haproxy 503 Service Unavailable . No server is available to handle this request

How does haproxy deal with static files, like .css, .js, .jpeg? When I use my config file, my browser says:
503 Service Unavailable
No server is available to handle this request.
This my config :
global
daemon
group root
maxconn 4000
pidfile /var/run/haproxy.pid
user root
defaults
log global
option redispatch
maxconn 65535
contimeout 5000
clitimeout 50000
srvtimeout 50000
retries 3
log 127.0.0.1 local3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout check 10s
listen dashboard_cluster :8888
mode http
stats refresh 5s
balance roundrobin
option httpclose
option tcplog
#stats realm Haproxy \ statistic
acl url_static path_beg -i /static
acl url_static path_end -i .css .jpg .jpeg .gif .png .js
use_backend static_server if url_static
backend static_server
mode http
balance roundrobin
option httpclose
option tcplog
stats realm Haproxy \ statistic
server controller1 10.0.3.139:80 cookie controller1 check inter 2000 rise 2 fall 5
server controller2 10.0.3.113:80 cookie controller2 check inter 2000 rise 2 fall 5
Is my file wrong? What should I do to solve this problem? Thanks!
What I think is the cause:
There was no default_backend defined. The 503 is sent by HAProxy itself; this will appear as NOSRV in the logs.
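A minimal sketch of that fix, reusing the names from the question; pointing default_backend at static_server is only a placeholder, in practice it should be whatever backend handles non-static requests:
listen dashboard_cluster :8888
    mode http
    balance roundrobin
    acl url_static path_beg -i /static
    acl url_static path_end -i .css .jpg .jpeg .gif .png .js
    use_backend static_server if url_static
    default_backend static_server    # requests matching no ACL now have somewhere to go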
Another Possible Cause
Based on one of my experiences, the HTTP 503 error I received was due to two bindings I had for the same IP and port, x.x.x.x:80.
frontend test_fe
bind x.x.x.x:80
bind x.x.x.x:443 ssl blah
# more config here
frontend conflicting_fe
bind x.x.x.x:80
# more config here
The HAProxy configuration check does not warn you about it and netstat doesn't show two LISTEN entries, which is why it took a while to realize what was going on.
This can also happen if you have 2 haproxy services running. Please check the running processes and terminate the older one.
Try increasing the timeouts and check that the backend server is reachable.
It can happen for many reasons. From the HAProxy docs:
The status code is always 3-digit. The first digit indicates a general status :
- 1xx = informational message to be skipped (eg: 100, 101)
- 2xx = OK, content is following (eg: 200, 206)
- 3xx = OK, no content following (eg: 302, 304)
- 4xx = error caused by the client (eg: 401, 403, 404)
- 5xx = error caused by the server (eg: 500, 502, 503)
503 when no server was available to handle the request, or in response to
monitoring requests which match the "monitor fail" condition
When a server's maxconn is reached, connections are left pending in a queue
which may be server-specific or global to the backend. In order not to wait
indefinitely, a timeout is applied to requests pending in the queue. If the
timeout is reached, it is considered that the request will almost never be
served, so it is dropped and a 503 error is returned to the client.
If you see SC in the logs:
SC The server or an equipment between it and haproxy explicitly refused
the TCP connection (the proxy received a TCP RST or an ICMP message
in return). Under some circumstances, it can also be the network
stack telling the proxy that the server is unreachable (eg: no route,
or no ARP response on local network). When this happens in HTTP mode,
the status code is likely a 502 or 503 here.
Check ACLs, check timeouts... and check the logs; that's the most important part.
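To actually see those termination flags (SC, NOSRV and so on), HTTP-level logging needs to be enabled; a small sketch based on the question's config, with option httplog in place of option tcplog:
global
    log 127.0.0.1 local3
defaults
    log global
    mode http
    option httplog    # each log line then includes the termination state (e.g. SC--, NOSRV) and the backend/server chosen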