memcached apparently resetting connections

UPDATE:
It's not memcached, it's a lot of sockets in TIME_WAIT state:
% ss -s
Total: 2494 (kernel 2784)
TCP: 43323 (estab 2314, closed 40983, orphaned 0, synrecv 0, timewait 40982/0), ports 16756
BTW, I have modified the previous version (below) to use Brad Fitz's memcache client and to reuse the same memcache connection:
http://dpaste.com/1387307/
OLD VERSION:
I have thrown together the most basic web server in Go, with a handler function that does only one thing:
retrieving a key from memcached
sending it as the HTTP response to the client
Here's the code: http://dpaste.com/1386559/
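Roughly, the old version boils down to the sketch below (reconstructed from the stack trace and the snippet quoted in the first answer, not copied from the paste; the import path, the key name, and the mc_ip/mc_port values are assumptions):

package main

import (
    "net/http"

    "github.com/kklis/gomemcache" // assumed import path for the gomemcache package seen in the trace
)

const (
    mc_ip   = "127.0.0.1" // placeholder values
    mc_port = 11211
)

func maybe_panic(err error) {
    if err != nil {
        panic(err)
    }
}

func get_memc_val(k string) []byte {
    // A brand new TCP connection to memcached is opened on every request
    // and never closed.
    memc, err := gomemcache.Connect(mc_ip, mc_port)
    maybe_panic(err)
    val, _, _ := memc.Get(k)
    return val
}

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.Write(get_memc_val("some_key")) // key name is a placeholder
    })
    http.ListenAndServe(":8000", nil)
}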
The problem is that I'm getting a lot of connection resets from memcached:
2013/09/18 20:20:11 http: panic serving [::1]:19990: dial tcp 127.0.0.1:11211: connection reset by peer
goroutine 20995 [running]:
net/http.func·007()
/usr/local/go/src/pkg/net/http/server.go:1022 +0xac
main.maybe_panic(0xc200d2e570, 0xc2014ebd80)
/root/go/src/http_server.go:19 +0x4d
main.get_memc_val(0x615200, 0x7, 0x60b5c0, 0x6, 0x42ee58, ...)
/root/go/src/http_server.go:25 +0x64
main.func·001(0xc200149b40, 0xc2017b3380, 0xc201888b60)
/root/go/src/http_server.go:41 +0x35
net/http.HandlerFunc.ServeHTTP(0x65e950, 0xc200149b40, 0xc2017b3380, 0xc201888b60)
/usr/local/go/src/pkg/net/http/server.go:1149 +0x3e
net/http.serverHandler.ServeHTTP(0xc200095410, 0xc200149b40, 0xc2017b3380, 0xc201888b60)
/usr/local/go/src/pkg/net/http/server.go:1517 +0x16c
net/http.(*conn).serve(0xc201b9b2d0)
/usr/local/go/src/pkg/net/http/server.go:1096 +0x765
created by net/http.(*Server).Serve
/usr/local/go/src/pkg/net/http/server.go:1564 +0x266
I have taken care to set up Linux kernel networking in such a way that it does not get in the way (turning off SYN flood protection, etc.).
...
...
And yet, when testing with "ab" (below), I'm getting those errors.
ab -c 1000 -n 50000 "http://localhost:8000/"
There is no sign anywhere I've looked (dmesg, /var/log) that the kernel is the problem.

I would guess that this is because you are running out of sockets - you never close memc here. Check with netstat while your program is running.
func get_memc_val(k string) []byte {
    memc, err := gomemcache.Connect(mc_ip, mc_port)
    maybe_panic(err)
    val, _, _ := memc.Get(k)
    return val
}
I'd use Brad Fitz's Go memcache client (github.com/bradfitz/gomemcache) if I were you - it was written by the author of memcached, who now works for Google on Go-related things.
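A minimal sketch of that approach (the handler wiring and the key name are illustrative, not your exact code): create a single client at start-up and share it across all handlers, since it maintains its own pool of connections to memcached.

package main

import (
    "log"
    "net/http"

    "github.com/bradfitz/gomemcache/memcache"
)

// One shared client for the whole process; it keeps its own pool of
// connections to memcached, so handlers never dial per request.
var mc = memcache.New("127.0.0.1:11211")

func handler(w http.ResponseWriter, r *http.Request) {
    it, err := mc.Get("some_key") // key name is a placeholder
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    w.Write(it.Value)
}

func main() {
    http.HandleFunc("/", handler)
    log.Fatal(http.ListenAndServe(":8000", nil))
}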

Try the memcache client from the YBC library. Unlike gomemcache, it opens and reuses only a few connections to the memcached server, regardless of the number of concurrent requests issued via the client. It achieves high performance by pipelining concurrent requests over that small number of open connections.
The number of connections to the memcache server can be configured via ClientConfig.ConnectionsCount.
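A minimal sketch of the intended usage (the import path and the exact field and method names other than ConnectionsCount are assumptions; check the package documentation):

package main

import (
    "log"

    "github.com/valyala/ybc/libs/go/memcache" // assumed import path
)

func main() {
    // One client shared by all goroutines; ConnectionsCount bounds how many
    // TCP connections it keeps open to the memcached server, and concurrent
    // requests are pipelined over them.
    c := &memcache.Client{
        ServerAddr:       "127.0.0.1:11211",
        ConnectionsCount: 4,
    }
    c.Start()
    defer c.Stop()

    item := memcache.Item{Key: []byte("some_key")} // placeholder key
    if err := c.Get(&item); err != nil {
        log.Fatal(err)
    }
    log.Printf("%s", item.Value)
}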

Related

Simulate a TCP tarpit

I want to simulate and test cross-platform connection failures / timeouts, starting with blocking connect()s:
#!/usr/bin/python3
import socket

s = socket.socket()
endpoint = ('localhost', 28813)
s.bind(endpoint)
# listen for connections, accept 0 connections kept waiting (backlog)
# all other connect()s should block indefinitely
s.listen(0)

for i in range(1, 1000):
    c = socket.socket()
    c.connect(endpoint)
    # print number of successfully connected sockets
    print(i)
On Linux, it prints "1" and hangs indefinitely (i.e. the behavior I want).
On Windows (Server 2012), it prints "1" and aborts with a ConnectionRefusedError.
On macOS, it prints all numbers from 1 to 128 and then hangs indefinitely.
Thus, I could accept that macOS ignores the backlog parameter and simply connect enough sockets up front so that further clients block on new connections.
How can I get Windows to also block connect() attempts?
On Windows, the SO_CONDITIONAL_ACCEPT socket option makes incoming connections wait until the application accept()s them. The constant (SO_CONDITIONAL_ACCEPT=0x3002) isn't exposed in the Python socket module, but it can be supplied manually:
s.bind(endpoint)
s.setsockopt(socket.SOL_SOCKET, 0x3002, 1)
s.listen(0)
It's so effective that even the first connect is kept waiting.
On macOS, backlog=0 is reset to backlog=SOMAXCONN, while backlog=1 keeps all connections except the first waiting.

How can I make a large number of connections without errors on the client side?

I have written a program in Go that makes about 2000 requests per second to different remote IPs, with the local port randomly selected by Linux, and that closes each connection immediately after it is established, but I still periodically encounter the error bind: address already in use.
What I have done:
net.ipv4.ip_local_port_range is 15000-65535
net.ipv4.tcp_tw_recycle=1 net.ipv4.tcp_tw_reuse=1 net.ipv4.tcp_fin_timeout=30
Here is the sockstat output:
sockets: used 1200 TCP: inuse 2302 orphan 1603 tw 40940 alloc 2325 mem 201
I can't figure out why this error still occurs when the kernel is selecting an available local port. Will the kernel return a port that is already in use?
This is a good answer from 2012:
https://serverfault.com/questions/342741/what-are-the-ramifications-of-setting-tcp-tw-recycle-reuse-to-1#434669
As of 2018, tcp_tw_recycle exists only in the sysctl binary and is otherwise gone from the kernel:
https://github.com/torvalds/linux/search?utf8=%E2%9C%93&q=tcp_tw_recycle&type=
tcp_tw_reuse is still in use as described in the above answer:
https://github.com/torvalds/linux/blob/master/net/ipv4/tcp_ipv4.c#L128
However, while TCP_TIMEWAIT_LEN is still in use:
https://github.com/torvalds/linux/search?utf8=%E2%9C%93&q=TCP_TIMEWAIT_LEN&type=
the value is hardcoded:
https://github.com/torvalds/linux/blob/master/include/net/tcp.h#L120
and tcp_fin_timeout refers to a different state:
https://github.com/torvalds/linux/blob/master/Documentation/networking/ip-sysctl.txt#L294
One can relatively safely change the local port range to 1025-65535.
For kicks, if there were a situation where this client was talking to servers and network under my control, I would build a new kernel with a not-to-spec TCP_TIMEWAIT_LEN, and perhaps also fiddle with tcp_max_tw_buckets:
https://github.com/torvalds/linux/blob/master/Documentation/networking/ip-sysctl.txt#L379
But doing so in other circumstances - if this client is behind a NAT and talking to common public servers - will likely be disruptive.
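In the same only-on-your-own-network spirit, a client-side workaround is to close connections with SO_LINGER set to 0, which sends an RST instead of a FIN so the socket never enters TIME_WAIT. A minimal Go sketch (not code from the question; the target address is a placeholder):

package main

import (
    "log"
    "net"
)

// dialAndReset opens a TCP connection and closes it with an RST instead of
// the normal FIN handshake, so no TIME_WAIT entry is left on the client.
// Like shrinking TCP_TIMEWAIT_LEN, this is not-to-spec behaviour and is only
// reasonable when you control both ends of the connection.
func dialAndReset(addr string) error {
    conn, err := net.Dial("tcp", addr)
    if err != nil {
        return err
    }
    // SetLinger(0) makes Close() discard unsent data and send a reset.
    if tc, ok := conn.(*net.TCPConn); ok {
        if err := tc.SetLinger(0); err != nil {
            log.Printf("SetLinger: %v", err)
        }
    }
    return conn.Close()
}

func main() {
    // 203.0.113.10:80 is a documentation address, not one from the question.
    if err := dialAndReset("203.0.113.10:80"); err != nil {
        log.Fatal(err)
    }
}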

Haskell: Testing connection availability N times with a delay (scotty to mongodb)

I have a stupid problem with my scotty web app and the mongodb service starting in the right order.
I use systemd to start mongodb first and then the scotty web app. It does not work for some reason. The app errors out with connect: does not exist (Connection refused) from the mongodb driver, meaning that the connection is not ready.
So my question: how can I test the connection availability, say, three times with a 0.5 s interval and only then error out?
This is the application main function
main :: IO ()
main = do
    pool <- createPool (runIOE $ connect $ host "127.0.0.1") close 1 300 5
    clearSessions pool
    let r = \x -> runReaderT x pool
    scottyT 3000 r r basal

basal :: ScottyD ()
basal = do
    middleware $ staticPolicy (noDots >-> addBase "static")
    notFound $ runSession
    routes
Although the app service is ordered after the mongodb service, the connection to mongodb is still unavailable during the app's start-up, so I get the above-mentioned error.
Here is the systemd service file, to preempt questions about the service ordering.
[Unit]
Description=Basal Web Application
Requires=mongodb.service
After=mongodb.service iptables.service network-online.target
[Service]
User=http
Group=http
WorkingDirectory=/srv/http/basal/
ExecStart=/srv/http/basal/bin/basal
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
I don't know why the connection to mongodb is not available given the correct service order.
So I want to probe the connection availability within the Haskell code three times with a 0.5 s delay and then error out. How can I do it?
Thanks.
I guess from the functions you're using that you're using something like mongoDB 1.5.0.
Here, connect returns something in the IOE monad, which is an alias for ErrorT IOError IO.
So the best approach is to use the retrying mechanisms ErrorT offers. As it's an instance of MonadPlus, we can just use mplus if we don't care about checking for the specific error:
retryConnect :: Int -> Int -> Host -> IOE Pipe
retryConnect retries delayInMicroseconds host
    | retries > 0 =
        connect host `mplus`
            (liftIO (threadDelay delayInMicroseconds) >>
             retryConnect (retries - 1) delayInMicroseconds host)
    | otherwise = connect host
(threadDelay comes from Control.Concurrent).
Then replace connect with retryConnect 2 500000 and it'll retry twice after the first failure with a 500,000 microsecond gap (i.e. 0.5s).
If you do want to check for a specific error, then use catchError instead and inspect the error to decide whether to swallow it or rethrow it.

Enyim.Caching.Memcached - Failed to read from Socket

I'm currently building an environment for deploying a web application.
The web application uses Enyim.Caching.
There looks to be an issue with the sockets.
I'm unfamiliar with membase server; if there is any additional information that I can include in this post, please ask...
Any suggestions on what I can check would be greatly appreciated:
Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl - Pool has been inited for 127.0.0.1:11212 with 10 sockets
Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl - Acquiring stream from pool. 127.0.0.1:11212
Enyim.Caching.Memcached.PooledSocket - Socket 86101442-5fc2-4169-bba2-9f25f1647254 was reset
Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl - Socket was reset. 86101442-5fc2-4169-bba2-f25f1647254
Enyim.Caching.Memcached.MemcachedNode - System.IO.IOException: Failed to read from the socket '127.0.0.1:11212'. Error: ?
at Enyim.Caching.Memcached.PooledSocket.BasicNetworkStream.Read(Byte[] buffer, Int32 offset, Int32 count) in d:\d\repo\EnyimMemcached\Enyim.Caching\Memcached\BasicNetworkStream.cs:line 92
at System.IO.BufferedStream.ReadByte()
at Enyim.Caching.Memcached.PooledSocket.ReadByte() in
Enyim uses port 11211 by default. It looks like you are trying 11212 instead; try changing it to 11211.

Erlang accepting SSL connections is really slow (compared to C++)

I'm currently testing extreme conditions on a piece of code written in Erlang.
I have implemented learnyousomeerlang.com's supervisor technique to get multiple pending accepts.
Here is the supervisor code, slightly modified to handle SSL connections:
-module(mymodule).
-behaviour(supervisor).
-export([start/0, start_socket/0]).

-define(SSL_OPTIONS, [{active, true},
                      {mode, list},
                      {reuseaddr, true},
                      {cacertfile, "./ssl_key/server/gd_bundle.crt"},
                      {certfile, "./ssl_key/server/cert.pem"},
                      {keyfile, "./ssl_key/server/key.pem"},
                      {password, "********"}
                     ]).

-export([init/1]).

start_link() ->
    application:start(crypto),
    crypto:start(),
    application:start(public_key),
    application:start(ssl),
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    {ok, LSocket} = ssl:listen(4242, ?SSL_OPTIONS),
    spawn_link(fun empty_listeners/0),
    {ok, {{simple_one_for_one, 60, 3600},
          [{socket,
            {mymodule_serv, start_link, [LSocket]}, % pass the socket!
            temporary, 1000, worker, [mymodule_serv]}
          ]}}.

empty_listeners() ->
    [start_socket() || _ <- lists:seq(1, 100)],
    ok.

start_socket() ->
    supervisor:start_child(?MODULE, []).
Here's the code for the gen_server that represents every connecting client:
-module(mymodule_serv).
-behaviour(gen_server).
-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2, code_change/3, handle_info/2]).

start_link(Socket) ->
    gen_server:start_link(?MODULE, Socket, []).

init(Socket) ->
    gen_server:cast(self(), accept),
    {ok, #client{socket=Socket, pid=self()}}.

handle_call(_E, _From, Client) ->
    {noreply, Client}.

handle_cast(accept, C = #client{socket=ListenSocket}) ->
    {ok, AcceptSocket} = ssl:transport_accept(ListenSocket),
    mymodule:start_socket(),
    ssl:ssl_accept(AcceptSocket),
    ssl:setopts(AcceptSocket, [{active, true}, {mode, list}]),
    {noreply, C#client{socket=AcceptSocket, state=connecting}}.
[...]
I have the ability to launch close to 10,000 connections at once from multiple servers.
While it takes about 10 seconds for an SSL-accepting bit of C++ code to accept all of them (and it does not even have multiple accepts pending), in Erlang this is quite different: it accepts at most 20 connections a second (according to netstat), whereas the C++ code accepts more like 1K connections per second.
While the 10K connections are awaiting acceptance, I'm also trying to connect manually:
openssl s_client -ssl3 -ign_eof -connect myserver.com:4242
Three things can happen when I do:
The connection simply times out.
The connection succeeds, but only after waiting at least 30 seconds.
The connection is established almost immediately.
When I try connecting manually from 2 consoles, the one that finishes the handshake first is not always the one that tried to connect first... which I find peculiar.
The server configuration is :
2 x Intel® Xeon® E5620
8x 2.4GHz
24 GB RAM
I'm starting the Erlang shell with :
$erl +S 8:8
EDIT 1:
I have even tried to accept the connection with gen_tcp and upgrade it to an SSL connection afterwards. Still the same issue: it won't accept more than 10 connections a second... Is ssl:ssl_accept doing this? Does it lock anything that would prevent Erlang from scaling this?
EDIT 2:
After looking at other SSL servers created in Erlang, it seems that they use some kind of driver for the SSL/TLS connection; my examples are RabbitMQ and ejabberd.
There is no ssl:ssl_accept anywhere in their Erlang code. I haven't investigated a lot, but it seems they have created their own drivers in order to upgrade the TCP socket to an SSL/TLS one.
Is that because there is an issue with Erlang's ssl module? Does anyone know why they are using custom drivers for SSL/TLS?
Any thoughts on this?
Actually, it was not the SSL accept or the handshake that was slowing the whole thing down.
We found out on the erlang-questions mailing list that it was the backlog.
The backlog is set to 5 by default. I have set it to SOMAXCONN and everything works fine now!