Writing a parallel TCP server in Erlang - sockets

"Programming Erlang Software for a Concurrent World" says to write a parallel TCP server do like this:
start_parallel_server() ->
{ok, Listen} = gen_tcp:listen(...),
spawn(fun() -> par_connect(Listen) end).
par_connect(Listen) ->
{ok, Socket} = gen_tcp:accept(Listen),
spawn(fun() -> par_connect(Listen) end),
loop(Socket).
loop(...) -> %% handle request here
When start_parallel_server finishes its work it will close listen socket. Shouldn't we add something like timer:sleep(infinity) at the end of it?

If you run start_parallel_server() from the shell, the shell process will own the listening socket, so the socket will stay open as long as that shell process is alive. Note that the shell process dies on exceptions and a new shell process is respawned in its place, which can cause confusion.
But if you instead spawn a new process that in turn calls the start_parallel_server() function, you will need a sleep in that spawned process to keep it alive.
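A minimal sketch of such a wrapper (the function name start/0 is illustrative, not from the book):

```erlang
%% Sketch: the spawned process calls start_parallel_server/0 and
%% therefore owns the listen socket; blocking forever keeps the
%% process, and thus the socket, alive.
start() ->
    spawn(fun() ->
              start_parallel_server(),
              timer:sleep(infinity)
          end).
```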

Moreover, for real-world applications, https://github.com/extend/ranch is more suitable.

Related

No output from erlang tracer

I've got a module my_api with a function handle/2, which is a callback for cowboy's request handling.
So when I make an HTTP request like this:
curl http://localhost/test
to my application, this function is called, and it's working correctly because I get a response in the terminal.
But in another terminal I attach to my application with remsh and try to trace calls to that function with the dbg module, like this:
dbg:tracer().
dbg:tp(my_api, handle, 2, []).
dbg:p(all, c).
I expected that after I make an HTTP request to my API in another terminal, the function my_api:handle/2 would be called and I would get some info about the call (at least the function arguments) in the terminal attached to the node, but I get nothing there. What am I missing?
When you call dbg:tracer/0, a tracer of type process is started with a message handler that sends all trace messages to the user I/O device. Your remote shell's group leader is independent of the user I/O device, so your shell doesn't receive the output sent to user.
One approach to allow you to see trace output is to set up a trace port on the server and a trace client in a separate node. If you want traces from node foo, first remsh to it:
$ erl -sname bar -remsh foo
Then set up a trace port. Here, we set up a TCP/IP trace port on port 50000 (use any port you like, as long as it's available to you):
1> dbg:tracer(port, dbg:trace_port(ip, 50000)).
Next, set up the trace parameters as you did before:
2> dbg:tp(my_api, handle, 2, []).
{ok, ...}
3> dbg:p(all, c).
{ok, ...}
Then exit the remsh, and start a node without remsh:
$ erl -sname bar
On this node, start a TCP/IP trace client attached to host port 50000:
1> dbg:trace_client(ip, {"localhost", 50000}).
This shell will now receive dbg trace messages from foo. Here, we used "localhost" as the hostname since this node is running on the same host as the server node, but you'll need to use a different hostname if your client is running on a separate host.
Another approach, which is easier but relies on an undocumented function and so might break in the future, is to remsh to the node to be traced as you originally did but then use dbg:tracer/2 to send dbg output to your remote shell's group leader:
1> dbg:tracer(process, {fun dbg:dhandler/2, group_leader()}).
{ok, ...}
2> dbg:tp(my_api, handle, 2, []).
{ok, ...}
3> dbg:p(all, c).
{ok, ...}
Since this relies on the dbg:dhandler/2 function, which is exported but undocumented, there's no guarantee it will always work.
Lastly, since you're tracing all processes, please pay attention to the potential problems described in the dbg man page, and always be sure to call dbg:stop_clear(). when you're finished tracing.

Haskell tcp server, fd is too big error

I've been trying to write a Haskell server for a Go client. For the Haskell TCP server I'm simply using Network.Socket. Whenever I try to run hWaitForInput, I am getting this error:
fdReady: fd is too big.
Here is the server code (note that hWaitForInput returns True when input is available, so the timeout branch tests not success):
connHandler :: (Socket, SockAddr) -> IO ()
connHandler (sock, _) = do
  putStrLn "Starting Handler"
  handle <- socketToHandle sock ReadWriteMode
  hSetBuffering handle LineBuffering
  hPutStrLn handle "Hello Client!"
  putStrLn "Waiting for Input"
  success <- hWaitForInput handle (1000 * 10)
  putStrLn "Wait done"
  if not success
    then putStrLn "Client timed out"
    else do
      msg <- hGetLine handle
      putStrLn msg
  hClose handle
The Go client receives and prints the server's message ("Hello Client!"), but the Haskell server throws the error right after printing "Waiting for Input".
You aren't doing anything wrong. The specific error message you're seeing only shows up with GHC >=8.0.2 running on Windows and represents a bug/limitation in an internal GHC function fdReady that they've tried to address on non-Windows architectures but have left unfixed on Windows. (Don't feel too jealous, though -- the "fix" on non-Windows architectures is currently broken and crashes, too.) Trying an earlier version of GHC probably wouldn't help -- it would still cause an error, but the error message would be different.
Here's the problem: on Windows, the internal function fdReady uses the select() system call to poll file descriptors for sockets, and select is limited to a certain maximum numerical value for the file descriptors it can poll. It looks like the Windows default for this value is quite low (64) but can be increased at compile time (the time GHC is compiled, unfortunately, not the time when GHC compiles your program).
If you add the line:
hShow handle >>= putStrLn
just before your hWaitForInput, you should see some debug info printed for the socket, including something like loc=<socket: nnn> where nnn is the file descriptor. This may help you verify that you're seeing a file descriptor greater than 64 that's causing the problem.
If this is the case, I would suggest filing a GHC bug to see if you can get it fixed.
As an alternative/workaround, you could read the line in one thread while running a timer in another. System.Timeout.timeout does exactly that internally (note that its argument is in microseconds):
-- needs: import System.Timeout (timeout)
putStrLn "Waiting for Input"
maybeMsg <- timeout (10 * 1000 * 1000) (hGetLine handle)  -- 10 seconds
case maybeMsg of
  Nothing  -> putStrLn "Client timed out"
  Just msg -> putStrLn msg
hClose handle
This does have different behavior (can time out in the middle of reading a line) than your code (never times out if a single byte can be read, even if the line is not complete), though.

Unable to accept connections on socket, when creating sockets on remote node via RPC in Erlang

I am struggling to identify the reason for gen_tcp:accept always returning an {error, closed} response.
Essentially, I have a supervisor that creates a listening socket:
gen_tcp:listen(8081, [binary, {packet, 0}, {active, false}, {reuseaddr, true}]),
This socket is then passed to a child, which is an implementation of the gen_server behaviour. The child then accepts connections on the socket.
accept(ListeningSocket, {ok, Socket}) ->
    spawn(fun() -> loop(Socket) end),
    accept(ListeningSocket);
accept(_ListeningSocket, {error, Error}) ->
    io:format("Unable to listen on socket: ~p.~n", [Error]),
    gen_server:call(self(), stop).

accept(ListeningSocket) ->
    accept(ListeningSocket, gen_tcp:accept(ListeningSocket)).

loop(Socket) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Data} ->
            io:format("~p~n", [Data]),
            process_request(Data),
            gen_tcp:send(Socket, Data),
            loop(Socket);
        {error, closed} -> ok
    end.
I load the supervisor and gen_server BEAM binaries locally, and load them on another node (which runs on the same machine) via an RPC call to code:load_binary.
Next, I execute the supervisor via an RPC call, which in turn starts the server. {error, closed} is always returned by gen_tcp:accept in this scenario.
If I run the supervisor and server while logged in to a node shell, the server can accept connections without issue. This includes using 'remsh' to the remote node that had failed to accept connections when I had previously started the server via RPC.
I seem to be able to replicate the issue by using the shell alone:
[Terminal 1]: erl -sname node -setcookie abc -distributed -noshell
[Terminal 2]: erl -sname rpc -setcookie abc:
net_adm:ping('node@verne').
{ok, ListeningSocket} = rpc:call('node@verne', gen_tcp, listen, [8081, [binary, {packet, 0}, {active, true}, {reuseaddr, true}]]).
rpc:call('node@verne', gen_tcp, accept, [ListeningSocket]).
The response to the final RPC is {error, closed}.
Could this be something to do with socket/port ownership?
In case it is of help, there are no clients waiting to connect, and I don't set timeouts anywhere.
Each rpc:call starts a new process on the target node to handle the request. In your final example, your first call creates a listen socket within such a process, and when that process dies at the end of the rpc call, the socket is closed. Your second rpc call to attempt an accept therefore fails due to the already-closed listen socket.
Your design seems unusual in several ways. For example, it's not normal to have supervisors opening sockets. You also say the child is a gen_server yet you show a manual recv loop, which if run within a gen_server would block it. You might instead explain what you're trying to accomplish and request help on coming up with a design to meet your goals.
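One way around the rpc problem is to spawn a long-lived process on the remote node that opens the listen socket and accepts connections itself, so the socket's owner survives the rpc call. A minimal sketch (start_listener_on/2, accept_loop/1 and handle/1 are illustrative names, not a real API):

```erlang
%% Sketch: the spawned process owns the listen socket and stays
%% alive, so the socket is not closed when the rpc call returns.
start_listener_on(Node, Port) ->
    spawn(Node, fun() ->
                    {ok, LSock} = gen_tcp:listen(Port, [binary,
                                                        {active, false},
                                                        {reuseaddr, true}]),
                    accept_loop(LSock)
                end).

accept_loop(LSock) ->
    {ok, Sock} = gen_tcp:accept(LSock),
    Pid = spawn(fun() -> handle(Sock) end),  %% handle/1 left to the reader
    gen_tcp:controlling_process(Sock, Pid),
    accept_loop(LSock).
```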

Erlang accepting SSL connection is really slow (comparing to C++)

I'm currently testing extreme conditions on a piece of code written in Erlang.
I have implemented learnyousomeerlang.com's supervisor technique to get multiple-accept capability.
Here is the supervisor code, slightly modified to handle SSL connections:
-module(mymodule).
-behaviour(supervisor).
-export([start_link/0, start_socket/0]).
-export([init/1]).

-define(SSL_OPTIONS, [{active, true},
                      {mode, list},
                      {reuseaddr, true},
                      {cacertfile, "./ssl_key/server/gd_bundle.crt"},
                      {certfile, "./ssl_key/server/cert.pem"},
                      {keyfile, "./ssl_key/server/key.pem"},
                      {password, "********"}
                     ]).

start_link() ->
    application:start(crypto),
    application:start(public_key),
    application:start(ssl),
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    {ok, LSocket} = ssl:listen(4242, ?SSL_OPTIONS),
    spawn_link(fun empty_listeners/0),
    {ok, {{simple_one_for_one, 60, 3600},
          [{socket,
            {mymodule_serv, start_link, [LSocket]}, % pass the socket!
            temporary, 1000, worker, [mymodule_serv]}
          ]}}.

empty_listeners() ->
    [start_socket() || _ <- lists:seq(1, 100)],
    ok.

start_socket() ->
    supervisor:start_child(?MODULE, []).
Here's the code for the gen_server that represents each connecting client:
-module(mymodule_serv).
-behaviour(gen_server).
-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2, code_change/3, handle_info/2]).

start_link(Socket) ->
    gen_server:start_link(?MODULE, Socket, []).

init(Socket) ->
    gen_server:cast(self(), accept),
    {ok, #client{socket = Socket, pid = self()}}.

handle_call(_E, _From, Client) ->
    {noreply, Client}.

handle_cast(accept, C = #client{socket = ListenSocket}) ->
    {ok, AcceptSocket} = ssl:transport_accept(ListenSocket),
    mymodule:start_socket(),
    ssl:ssl_accept(AcceptSocket),
    ssl:setopts(AcceptSocket, [{active, true}, {mode, list}]),
    {noreply, C#client{socket = AcceptSocket, state = connecting}}.
[...]
I have the ability to launch close to 10,000 connections at once from multiple servers.
While it takes about 10 seconds for an SSL-accepting bit of C++ code to accept all of them (without even having multiple accepts pending), in Erlang it is quite different: it accepts at most 20 connections a second (according to netstat), whereas the C++ code accepts more like 1K connections per second.
While the 10K connections are waiting to be accepted, I also try to connect manually:
openssl s_client -ssl3 -ign_eof -connect myserver.com:4242
Three things can happen when I do:
The connection simply times out
The connection succeeds after waiting at least 30 seconds
The connection succeeds almost immediately
When I try connecting manually from 2 consoles, the first to finish the handshake is not always the first that tried to connect... which I found peculiar.
The server configuration is:
2 x Intel® Xeon® E5620
8 x 2.4 GHz
24 GB RAM
I'm starting the Erlang shell with:
$ erl +S 8:8
EDIT 1:
I have even tried accepting the connection with gen_tcp and upgrading it to SSL afterwards. Still the same issue: it won't accept more than 10 connections a second... Is ssl:ssl_accept doing this? Does it lock anything that would prevent Erlang from scaling?
EDIT 2:
After looking around at other SSL servers written in Erlang, such as RabbitMQ and ejabberd, it seems that they use some kind of driver for the SSL/TLS connection.
ssl:ssl_accept appears nowhere in their Erlang code. I haven't investigated a lot, but it seems they have created their own driver in order to upgrade the TCP socket to an SSL/TLS one.
Is that because there is an issue with Erlang's ssl module? Does anyone know why they use a custom driver for SSL/TLS?
Any thoughts on this ?
Actually, it was not the SSL accept or handshake that was slowing the whole thing down.
We found out on the erlang-questions mailing list that it was the backlog.
The backlog is set to 5 by default. I have set it to SOMAXCONN and everything works fine now!
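For reference, the backlog is raised with the {backlog, N} listen option, which ssl:listen/2 accepts just like gen_tcp:listen/2. A sketch against the supervisor above (1024 is just an illustrative value; Erlang has no named SOMAXCONN constant, so you pass the number directly):

```erlang
%% Sketch: {backlog, N} enlarges the kernel's pending-connection
%% queue from its default of 5.
{ok, LSocket} = ssl:listen(4242, [{backlog, 1024} | ?SSL_OPTIONS]),
```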

Erlang catch disconnect client

I have a TCP server written in Erlang and a command handler. If a client connects to my server and then the connection closes, how can I catch the network disconnect?
I presume you are using vanilla gen_tcp to implement your server.
In that case, the controlling process of the socket (the process you pass the Socket to) will receive a {tcp_closed, Socket} message when the socket is closed from the client end.
Here is sample code from the Erlang gen_tcp documentation:
start(LPort) ->
    case gen_tcp:listen(LPort, [{active, false}, {packet, 2}]) of
        {ok, ListenSock} ->
            spawn(fun() -> server(ListenSock) end);
        {error, Reason} ->
            {error, Reason}
    end.

server(LS) ->
    case gen_tcp:accept(LS) of
        {ok, S} ->
            loop(S),
            server(LS);
        Other ->
            io:format("accept returned ~w - goodbye!~n", [Other]),
            ok
    end.

loop(S) ->
    inet:setopts(S, [{active, once}]),
    receive
        {tcp, S, Data} ->
            Answer = do_something_with(Data),
            gen_tcp:send(S, Answer),
            loop(S);
        {tcp_closed, S} ->
            io:format("Socket ~w closed [~w]~n", [S, self()]),
            ok
    end.
Are you using a separate linked process to handle commands from each client?
If so, you can consider trapping exits in the main process...
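A minimal sketch of that idea (main/1 and handler/1 are hypothetical names for your main process and per-client handler):

```erlang
%% Sketch: the main process traps exits, so when a linked client
%% handler dies (crash or normal exit) it receives an 'EXIT'
%% message instead of being killed itself.
main(Socket) ->
    process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> handler(Socket) end),
    receive
        {'EXIT', Pid, Reason} ->
            io:format("client handler ~p exited: ~p~n", [Pid, Reason])
    end.
```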