I'm currently testing extreme conditions on a piece of code written in Erlang.
I have implemented learnyousomeerlang.com's supervisor technique to have multiple acceptors.
Here is the supervisor's code, slightly modified to handle SSL connections:
-module(mymodule).
-behaviour(supervisor).
-export([start_link/0, start_socket/0]).
-define(SSL_OPTIONS, [{active, true},
                      {mode, list},
                      {reuseaddr, true},
                      {cacertfile, "./ssl_key/server/gd_bundle.crt"},
                      {certfile, "./ssl_key/server/cert.pem"},
                      {keyfile, "./ssl_key/server/key.pem"},
                      {password, "********"}
                     ]).
-export([init/1]).

start_link() ->
    application:start(crypto),
    crypto:start(),
    application:start(public_key),
    application:start(ssl),
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    {ok, LSocket} = ssl:listen(4242, ?SSL_OPTIONS),
    spawn_link(fun empty_listeners/0),
    {ok, {{simple_one_for_one, 60, 3600},
          [{socket,
            {mymodule_serv, start_link, [LSocket]}, % pass the socket!
            temporary, 1000, worker, [mymodule_serv]}
          ]}}.

empty_listeners() ->
    [start_socket() || _ <- lists:seq(1,100)],
    ok.

start_socket() ->
    supervisor:start_child(?MODULE, []).
Here's the code for the gen_server that will represent each connecting client:
-module(mymodule_serv).
-behaviour(gen_server).
-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2, code_change/3, handle_info/2]).

-record(client, {socket, pid, state}). % assumed record definition, fields as used below

start_link(Socket) ->
    gen_server:start_link(?MODULE, Socket, []).

init(Socket) ->
    gen_server:cast(self(), accept),
    {ok, #client{socket=Socket, pid=self()}}.

handle_call(_E, _From, Client) ->
    {noreply, Client}.

handle_cast(accept, C = #client{socket=ListenSocket}) ->
    {ok, AcceptSocket} = ssl:transport_accept(ListenSocket),
    mymodule:start_socket(),
    ssl:ssl_accept(AcceptSocket),
    ssl:setopts(AcceptSocket, [{active, true}, {mode, list}]),
    {noreply, C#client{socket=AcceptSocket, state=connecting}}.
[...]
I have the ability to launch close to 10,000 connections at once from multiple servers.
While it takes about 10 seconds for an SSL-accepting bit of C++ code to accept all of them (and it doesn't even have multiple accepts pending), in Erlang this is quite different: it accepts at most 20 connections per second (according to netstat, whereas the C++ code accepts more like 1K connections per second).
While the 10K connections are awaiting acceptance, I also try to connect manually:
openssl s_client -ssl3 -ign_eof -connect myserver.com:4242
Three cases happen when I do:
The connection simply times out
The connection succeeds after waiting at least 30 seconds
The connection succeeds almost immediately
When I try connecting manually from two consoles, the first one to finish the handshake is not always the first one that tried to connect, which I find peculiar.
The server configuration is:
2 x Intel® Xeon® E5620
8 x 2.4 GHz
24 GB RAM
I'm starting the Erlang shell with:
$erl +S 8:8
EDIT 1:
I have even tried accepting the connection with gen_tcp and then upgrading it to SSL afterwards. Still the same issue; it won't accept more than 10 connections per second... Is ssl:ssl_accept doing this? Does it lock anything that would prevent Erlang from scaling here?
EDIT 2:
After looking at other SSL servers written in Erlang (my examples being RabbitMQ and ejabberd), it seems that they use some kind of driver for the SSL/TLS connection.
There is no ssl:ssl_accept anywhere in their Erlang code. I haven't investigated a lot, but it seems they created their own drivers in order to upgrade the TCP socket to an SSL/TLS one.
Is that because there is an issue with Erlang's ssl module? Does anyone know why they use a custom driver for SSL/TLS?
Any thoughts on this?
Actually, it was not the SSL accept or the handshake that was slowing the whole thing down.
We found out on the erlang-questions mailing list that it was the backlog.
The backlog is set to 5 by default. I set it to SOMAXCONN and everything works fine now!
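For reference, here is a minimal sketch of the change: the backlog is just another socket option passed through ssl:listen/2. The value 128 stands in for SOMAXCONN here; the real constant is system-dependent (on Linux, see net.core.somaxconn).

-define(SSL_OPTIONS, [{active, true},
                      {mode, list},
                      {reuseaddr, true},
                      {backlog, 128},               % was 5 by default
                      {cacertfile, "./ssl_key/server/gd_bundle.crt"},
                      {certfile, "./ssl_key/server/cert.pem"},
                      {keyfile, "./ssl_key/server/key.pem"},
                      {password, "********"}
                     ]).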
Related
I have an issue: there is a piece of Tcl code that opens a client socket on Linux. During my tests the connection times out after 2 hours. I'm looking for Tcl commands to keep the opened socket alive. So far I have only found setsockopt with SO_KEEPALIVE, and all of that is in C. Can someone help me with how to keep sockets alive in Tcl?
I tried setsockopt but it didn't work, as it is a C API. I looked at tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes, which are 7200, 75 and 9 respectively. I tried to modify those parameters but I don't have the permissions (admin restrictions).
if [catch {socket $use_host $use_port} comIdW] {
    error "Error: Unable to connect to host $use_host port $use_port: $comIdW"
}
fconfigure $comIdW -buffering none -translation binary
set comIdR $comIdW
# I added the following code based on my understanding
# (pseudo-code; setsockopt is not a Tcl command):
set optval 1
set optlen [llength $optval]
setsockopt($comIdW, SOL_SOCKET, SO_KEEPALIVE, $optval, $optlen)
puts "SO_KEEPALIVE is ($optval ? ON : OFF)"
I want to keep this channel alive; it would be good if I could ping the server every 30 minutes or so while the channel is open.
"I tried with setsockopt but it didn't work as it is in C"
There is currently no built-in way to set SO_KEEPALIVE at the script level on a socket controlled by Tcl as a Tcl channel (i.e. via fconfigure). See also TIP #344.
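As a script-level workaround, you could implement an application-level keepalive: periodically write a small message on the channel using after, which keeps the idle connection from being timed out. A minimal sketch, assuming the peer tolerates (or answers) a "ping" line; the message and interval are placeholders:

# Application-level keepalive: write a small message every 30 minutes.
# "ping" is a placeholder; use whatever no-op your protocol allows.
proc keepalive {chan {intervalMs 1800000}} {
    if {[catch {puts $chan "ping"; flush $chan} err]} {
        puts stderr "keepalive failed: $err"
        return
    }
    after $intervalMs [list keepalive $chan $intervalMs]
}

keepalive $comIdW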
I want to simulate and test cross-platform connection failures / timeouts, starting with blocking connect()s:
#!/usr/bin/python3
import socket

s = socket.socket()
endpoint = ('localhost', 28813)
s.bind(endpoint)
# listen for connections, with 0 connections kept waiting (backlog);
# all other connect()s should block indefinitely
s.listen(0)

for i in range(1, 1000):
    c = socket.socket()
    c.connect(endpoint)
    # print the number of successfully connected sockets
    print(i)
On Linux, it prints "1" and hangs indefinitely (i.e. the behavior I want).
On Windows (Server 2012), it prints "1" and aborts with a ConnectionRefusedError.
On macOS, it prints all numbers from 1 to 128 and then hangs indefinitely.
Thus, I could accept that macOS ignores the backlog parameter and simply connect enough sockets that further clients block on new connections.
How can I get Windows to also block connect() attempts?
On Windows, the SO_CONDITIONAL_ACCEPT socket option makes incoming connections wait until they are accept()ed. The constant (SO_CONDITIONAL_ACCEPT = 0x3002) isn't exposed in Python's socket module, but it can be supplied manually:
s.bind(endpoint)
s.setsockopt(socket.SOL_SOCKET, 0x3002, 1)
s.listen(0)
It's so effective that even the first connect is kept waiting.
On macOS, backlog=0 is reset to backlog=SOMAXCONN, while backlog=1 keeps all connections except the first waiting.
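Putting the Windows part together with the original script, a minimal, Windows-only sketch (0x3002 is the value of SO_CONDITIONAL_ACCEPT; the port number is just the one from the question):

import socket

SO_CONDITIONAL_ACCEPT = 0x3002  # not exposed by Python's socket module

s = socket.socket()
endpoint = ('localhost', 28813)
s.bind(endpoint)
# Make incoming connections wait for accept() instead of being completed
# by the stack; without this, Windows refuses them once the backlog is full.
s.setsockopt(socket.SOL_SOCKET, SO_CONDITIONAL_ACCEPT, 1)
s.listen(0)

c = socket.socket()
c.connect(endpoint)  # with the option set, even this first connect() blocks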
I have written a program in Go that makes about 2000 requests per second to different remote IPs, with the local port randomly selected by Linux, and closes each connection immediately after it is established, but I still periodically encounter the bind: address already in use error.
What I have done:
net.ipv4.ip_local_port_range is 15000-65535
net.ipv4.tcp_tw_recycle=1 net.ipv4.tcp_tw_reuse=1 net.ipv4.tcp_fin_timeout=30
Here is the sockstat output:
sockets: used 1200 TCP: inuse 2302 orphan 1603 tw 40940 alloc 2325 mem 201
I can't figure out why this error still occurs when the kernel is selecting an available local port. Will the kernel return a port that is already in use?
This is a good answer from 2012:
https://serverfault.com/questions/342741/what-are-the-ramifications-of-setting-tcp-tw-recycle-reuse-to-1#434669
As of 2018, tcp_tw_recycle exists only in the sysctl binary and is otherwise gone from the kernel:
https://github.com/torvalds/linux/search?utf8=%E2%9C%93&q=tcp_tw_recycle&type=
tcp_tw_reuse is still in use as described in the above answer:
https://github.com/torvalds/linux/blob/master/net/ipv4/tcp_ipv4.c#L128
However, while TCP_TIMEWAIT_LEN is still in use:
https://github.com/torvalds/linux/search?utf8=%E2%9C%93&q=TCP_TIMEWAIT_LEN&type=
the value is hardcoded:
https://github.com/torvalds/linux/blob/master/include/net/tcp.h#L120
and tcp_fin_timeout refers to a different state:
https://github.com/torvalds/linux/blob/master/Documentation/networking/ip-sysctl.txt#L294
One can relatively safely change the local port range to 1025-65535.
For kicks, if this client were talking to servers and a network under my control, I would build a new kernel with a not-to-spec TCP_TIMEWAIT_LEN, and perhaps also fiddle with tcp_max_tw_buckets:
https://github.com/torvalds/linux/blob/master/Documentation/networking/ip-sysctl.txt#L379
But doing so in other circumstances (if this client is behind a NAT and talking to common public servers, for example) will likely be disruptive.
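For concreteness, a sketch of the relatively safe knobs mentioned above (run as root; add the settings to /etc/sysctl.conf to persist them across reboots):

# widen the ephemeral port range
sysctl -w net.ipv4.ip_local_port_range="1025 65535"
# allow reuse of TIME_WAIT sockets for new outgoing connections
# (on kernels that still honor it)
sysctl -w net.ipv4.tcp_tw_reuse=1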
I am struggling to identify the reason for gen_tcp:accept always returning an {error, closed} response.
Essentially, I have a supervisor that creates a listening socket:
gen_tcp:listen(8081, [binary, {packet, 0}, {active, false}, {reuseaddr, true}]),
This socket is then passed to a child, which is an implementation of the gen_server behaviour. The child then accepts connections on the socket.
accept(ListeningSocket, {ok, Socket}) ->
    spawn(fun() -> loop(Socket) end),
    accept(ListeningSocket);
accept(_ListeningSocket, {error, Error}) ->
    io:format("Unable to listen on socket: ~p.~n", [Error]),
    gen_server:call(self(), stop).

accept(ListeningSocket) ->
    accept(ListeningSocket, gen_tcp:accept(ListeningSocket)).

loop(Socket) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Data} ->
            io:format("~p~n", [Data]),
            process_request(Data),
            gen_tcp:send(Socket, Data),
            loop(Socket);
        {error, closed} -> ok
    end.
I load the supervisor and gen_server BEAM binaries locally, and load them on another node (which runs on the same machine) via an RPC call to code:load_binary.
Next, I start the supervisor via an RPC call, which in turn starts the server. In this scenario, gen_tcp:accept always returns {error, closed}.
If I run the supervisor and server while logged in to a node shell, the server accepts connections without issue. This includes using remsh to the remote node that had previously failed to accept connections when I started the server over RPC.
I seem to be able to replicate the issue by using the shell alone:
[Terminal 1]: erl -sname node -setcookie abc -distributed -noshell
[Terminal 2]: erl -sname rpc -setcookie abc, then in its shell:
net_adm:ping('node@verne').
{ok, ListeningSocket} = rpc:call('node@verne', gen_tcp, listen, [8081, [binary, {packet, 0}, {active, true}, {reuseaddr, true}]]).
rpc:call('node@verne', gen_tcp, accept, [ListeningSocket]).
The response to the final RPC is {error, closed}.
Could this be something to do with socket/port ownership?
In case it is of help, there are no clients waiting to connect, and I don't set timeouts anywhere.
Each rpc:call starts a new process on the target node to handle the request. In your final example, your first call creates a listen socket within such a process, and when that process dies at the end of the rpc call, the socket is closed. Your second rpc call to attempt an accept therefore fails due to the already-closed listen socket.
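One way to see this from the shell is to do the listen and the accept in the same rpc call, so that both run in the same still-alive process. A sketch (the 5000 ms accept timeout is only there so the call returns even when no client connects):

%% Listen and accept inside one rpc-spawned process, so the listen socket
%% is still owned by a live process when accept runs.
rpc:call('node@verne', erlang, apply,
         [fun() ->
              {ok, LSock} = gen_tcp:listen(8081, [binary, {packet, 0},
                                                  {active, true},
                                                  {reuseaddr, true}]),
              gen_tcp:accept(LSock, 5000)
          end, []]).

A longer-lived fix is to have a process on the target node own the listen socket for as long as you need it, rather than a throwaway rpc process.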
Your design seems unusual in several ways. For example, it's not normal to have supervisors opening sockets. You also say the child is a gen_server yet you show a manual recv loop, which if run within a gen_server would block it. You might instead explain what you're trying to accomplish and request help on coming up with a design to meet your goals.
UPDATE:
It's not memcached, it's a lot of sockets in TIME_WAIT state:
% ss -s
Total: 2494 (kernel 2784)
TCP: 43323 (estab 2314, closed 40983, orphaned 0, synrecv 0, timewait 40982/0), ports 16756
BTW, I have modified the previous version (below) to use Brad Fitz's memcache client and to reuse the same memcache connection:
http://dpaste.com/1387307/
OLD VERSION:
I have thrown together the most basic web server in Go, with a handler function that does just two things:
retrieving a key from memcached
sending it as the HTTP response to the client
Here's the code: http://dpaste.com/1386559/
The problem is that I'm getting a lot of connection resets from memcached:
2013/09/18 20:20:11 http: panic serving [::1]:19990: dial tcp 127.0.0.1:11211: connection reset by peer
goroutine 20995 [running]:
net/http.func·007()
/usr/local/go/src/pkg/net/http/server.go:1022 +0xac
main.maybe_panic(0xc200d2e570, 0xc2014ebd80)
/root/go/src/http_server.go:19 +0x4d
main.get_memc_val(0x615200, 0x7, 0x60b5c0, 0x6, 0x42ee58, ...)
/root/go/src/http_server.go:25 +0x64
main.func·001(0xc200149b40, 0xc2017b3380, 0xc201888b60)
/root/go/src/http_server.go:41 +0x35
net/http.HandlerFunc.ServeHTTP(0x65e950, 0xc200149b40, 0xc2017b3380, 0xc201888b60)
/usr/local/go/src/pkg/net/http/server.go:1149 +0x3e
net/http.serverHandler.ServeHTTP(0xc200095410, 0xc200149b40, 0xc2017b3380, 0xc201888b60)
/usr/local/go/src/pkg/net/http/server.go:1517 +0x16c
net/http.(*conn).serve(0xc201b9b2d0)
/usr/local/go/src/pkg/net/http/server.go:1096 +0x765
created by net/http.(*Server).Serve
/usr/local/go/src/pkg/net/http/server.go:1564 +0x266
I have taken care to configure Linux kernel networking so that it does not get in the way (turning off SYN flood protection, etc.).
...
...
And yet on testing with "ab" (below) I'm getting those errors.
ab -c 1000 -n 50000 "http://localhost:8000/"
There is no sign whatsoever, anywhere I looked (dmesg, /var/log), that it's the kernel.
I would guess that is because you are running out of sockets: you never close the memcache connection here. Check with netstat while your program is running.
func get_memc_val(k string) []byte {
    // a brand-new connection per request, and it is never closed
    memc, err := gomemcache.Connect(mc_ip, mc_port)
    maybe_panic(err)
    val, _, _ := memc.Get(k)
    return val
}
I'd use this Go memcache client if I were you (Brad Fitz's github.com/bradfitz/gomemcache); it was written by the author of memcached, who now works at Google on Go-related things.
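A minimal sketch of the reuse pattern with that client (import path github.com/bradfitz/gomemcache/memcache; the key name and addresses are placeholders):

package main

import (
    "net/http"

    "github.com/bradfitz/gomemcache/memcache"
)

// One client shared by all requests; it keeps its own small pool of
// connections instead of dialing memcached once per HTTP request.
var mc = memcache.New("127.0.0.1:11211")

func handler(w http.ResponseWriter, r *http.Request) {
    it, err := mc.Get("some_key") // placeholder key
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    w.Write(it.Value)
}

func main() {
    http.HandleFunc("/", handler)
    http.ListenAndServe(":8000", nil)
}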
Try the memcache client from the YBC library. Unlike gomemcache, it opens and re-uses only a few connections to the memcache server regardless of the number of concurrent requests issued via the client. It achieves high performance by pipelining concurrent requests over a small number of open connections to the memcache server.
The number of connections to the memcache server can be configured via ClientConfig.ConnectionsCount.