how do I write my own production web server?

how do I write my own production web server? - select

I am making a unix ssl server/client. So far I have implemented FD_SET with select to handle all connections concurrently in one master server process. However due to __FD_SETSIZE the number of clients can only be 1024. I need to increase the number of clients and efficiency of the server. Changing the __FD_SETSIZE has potential problems (apparently?) so I am stuck.
So far the network includes: errno.h detection, signal detection -> atomic handling, fd_set -> select(), successful stream socket based communication.
I would really appreciate it if someone can tell me what should I do? do I fork() after 1024 (which presents its own problems, if its even doable?) do I implement threads to handle each client request, or just client data or both?
What is the best network architecture in your opinion? keep in mind its a socket stream based connection that is meant to handle as much punishment as possible and allowing as many clients to the server as possible.

Don't write your own production web server.
There are too many open source servers out there all written by people who know more about high connectivity and SSL than you do. They also have the advantage of being tested to a degree that you'd never be able to accomplish with your homebrew server.

Related

How can I automatically test a networking (TCP/IP) application?

I teach students to develop network applications, both clients and servers. At this moment, we have not yet touched existing protocols such as HTTP, SMTP, etc. The students write very simple programs on top of the plain socket API. Currently I check a students' work manually, but I want to automate this task and create an automated test bench for networking applications. The most interesting topics for testing are:
Breaking TCP segments into small parts and delivering them with a noticeable delay. A reason I need such test is that students usually just issue a read/recv call and process the received data without checking that all necessary data was received. TCP doesn't guarantee the message boundaries, so in certain circumstances it is necessary to make several read/recv calls. The problem is that in most simple network applications (for example, in a chat application) messages are small and fit into the single TCP segment, so the issue doesn't appear. My idea is to artificially break messages into several small TCP segments (i.e. several bytes of data) so the problem will appear.
Pausing the data transfer for some time to simulate multiple slow clients and check that the multithreading/async sockets are implemented properly in the students' servers.
Resetting a connection in random moments of time.
I've found several systems which simulate a bad network (dummynet, clumsy, netem). Hovewer, they all work on the IP level of the stack, so OS and it's TCP implementation will compensate the data loss. Such systems are able to solve the task number 2, but they are not able to solve tasks 1 and 3. So I think that I need to develop my own solution, which will act as a TCP proxy. My questions are:
Maybe the are any libraries or applications which can (at least partially) solve the given tasks, so I'll be able to use them as a base for my own solution?
In case there is none any suitable existing software projects, maybe there are any ideas and approaches about how to do this properly?

From WireShark mailing list - Creating and Modifying Packets:
...There's a "Tools" page on the Wireshark Wiki:
http://wiki.wireshark.org/Tools
which has a "Traffic generators" section:
https://wiki.wireshark.org/Tools#Traffic_generators
which lists some tools that might be useful...
The "Traffic generators" chapter also mentions another collection of traffic generators

If you write your own socket code, you can address all 3 tasks.
enable the socket's TCP_NODELAY option (disable the Nagle Algorithm for Send Coalescing) via setsockopt(), then you can send() small fragments of data as you wish, optionally with a delay in between (see #2).
simply put a delay in between your send() calls.
use setsockopt() to adjust the socket's SO_LINGER and SO_DONTLINGER options to control whether closing the socket performs an abortive or graceful closure, then simply close the socket at some random interval after the connection is established.

server push for millions of concurrent connections

I am building a distributed system that consists of potentially millions of clients which all need to keep an open (preferrably HTTP) connection to wait for a command from the server (which is running somewhere else). The load of messages / commmands will not be very high, maybe one message / sec / 1000 clients which means it would be 1000 msg/sec # 1 million clients. => it's basically about the concurrent connections.
The requirements are simple too. One way messaging (server->client), only 1 client per "channel".
I am pretty open in terms of technology (xmpp / websockets / comet / ...). I am using Google App Engine as server, but their "channels" won't work for me unfortunately (too low quotas and no Java client). XMPP was an option but is quite expensive. So far I was using URL Fetch & pubnub, but they just started charging for connections (big time).
So:
Does anyone know of a service out there that can do that for me in an affordable way? Most I have found restrict or heavily charge for connections.
Any experience with implementing such a server yourself? I have actually done that already and it works pretty well (based on Tomcat & NIO) but I haven't had the time yet to actually set up a large load test environment (partially because this is still a fallback solution, I'd prefer a battle hardened msg server). Any experience to how many users you get per GB? Any hard limits?
My architecture also allows to fragment the msg servers, but I'd like to maximize the concurrent connections because the msg processing CPU overhead is minimal.

I have meanwhile implemented my own message server using netty.io. Netty makes use of Java NIO and scales extremely well. For idle connections I get a memory footprint of 500 bytes per connection. I am doing only very simple message forwarding (no caching, storage or other fancy stuff) but with that am easily getting 1000 - 1500 msg / sec (each half a KB) on the small amazon instance (1ECU / 1.6GB).
Otherwise if you are looking for a (paid) service then I can recommend spire.io (they do not charge for connections but have a higher price per message) or pubnub (they do charge for connections but are cheaper per message).

You have to look more in architecture of making such environment.
First of all, if you will write sockets management by yourself, then don't use Thread per Client Socket. Use Asynchronous methods for receiving and sending data.
WebSockets might be too heavy if your messages are small. Because it implements framing, which has to be applied to each message for each socket individually (caching can be used for different versions of WebSockets protocols), that makes them slower to process both directions: for receive and for send, especially because of data masking.
It is possible to create millions of sockets, but only most advanced technologies are capable to do so. Erlang is able to handle millions connections, and is pretty scalable.
If you would like to have millions of connections using other higher level technologies, then you need to think about clustering of what you are trying to accomplish.
For example using gateway server that will keep track of all processing servers. And have data of them (IP, ports, load (if it will be one internal network, firewalling and port forwarding might be handy here).
Client software connects to that gateway server, gateway server checks the least loaded server and sends ip and port to client. Client creates connection directly to working server using provided address.
That way you will have gateway which as well can handle authorization, and wont hold connections for long, so one of them might be enough. And many workers that are doing publishing of data and keeping connections.
This is very related to your needs, and might not be suitable for your solutions.

simulate server load with BSD sockets

I'm using blocking TCP sockets in C and I want to simulate a high load on the server when there are many simultaneous connections and then I want to measure the time necessary to access the server via a browser during this high load time (the server understands HTTP headers).
Also each client request ends fast (sends a HTTP header - gets text).
How do I do this (without crashing my local machine -> I tried using fork to make many clients; also, I have a virtual machine too).
If anyone has an idea or some general directions about how to do this, it would mean a lot.
Edit: I need to run this with my own client, which uses a modified version of the OpenSSL library to connect to my SSL/TLS server, so I can't use external test tools.
I want to know how to build the client and the server. I don't know too much about other sockets than the blocking ones, I'm just skimming through the UNIX Network Programming book of Richard Stevens, but I was wondering if anyone could point out the exact solution.
Thank you !

The easiest resolution to this would be to download an existing stress testing framework such as fwptt ( http://fwptt.sourceforge.net/ ).
If you want to implemennt your own stress testing framework, I'd suggest you lose the blocking nature of your code and go with a parallel design that will scale beautifully. The limiting factor is pretty much your CPU then.
Having two physical servers would be ideal, so that then your stress test isn't affecting the CPU (and therefore the response times) of the server. Also that VM of yours drains up precious CPU time.

Per socket data transfer limitations. Are download managers useful yet?

The idea behind breaking up a download into multiple segments with different ranges is for increasing download speed. This works if the server has a per connection limit. A server without that limitation theoretically servers the same bytes with one or more connections.
My question is if download managers still speed up downloading from such a server or it's just a useless effort. In other words is there any limitations per TCP socket connection by default or not?

No. There are no limitations per socket. Most OS:es will try to share the bandwidth equally between all sockets unless QoS is specified.

While a server could throttle bandwidth usage per connection, they typically do not bother. If a response is big enough that it could be effectively throttled then it's about the same impact to slower clients if a fast download just completes sooner.
Splitting a download into pieces may actually hurt your client's performance because of the way TCP operates -- it has a "slow start" mechanism that reduces throughput on new connections.
Websites that implement throttling will typically do so between their various virtual hosts (so that the download site doesn't starve out a more interactive one) or will do so based on the remote IP address.
By far the primary benefit of a download manager is that it will simply continue the download if the connection gets broken.

Can socket connections be multiplexed?

Is it possible to multiplex sa ocket connection?
I need to establish multiple connections to yahoo messenger and i am looking for a way to do this efficiently without having to hold a socket open for each client connection.
so far i have to use one socket for each client and this does not scale well above 50,000 connections.
oh, my solution is for a TELCO, so i need to at least hit 250,000 to 500,000 connections
i'm planing to bind multiple IP addresses to a single NIC to beat the 65k port restriction per IP address.
Please i would any help, insight i can get.
**most of my other questions on this site have gone un-answered :) **
Thanks

This is an interesting question about scaling in a serious situation.
You are essentially asking, "How do I establish N connections to an internet service, where N is >= 250,000".
The only way to do this effectively and efficiently is to cluster. You cannot do this on a single host, so you will need to be able to fragment and partition your client base into a number of different servers, so that each is only handling a subset.
The idea would be for a single server to hold open as few connections as possible (spreading out the connectivity evenly) while holding enough connections to make whatever service you're hosting viable by keeping inter-server communication to a minimum level. This will mean that any two connections that are related (such as two accounts that talk to each other a lot) will have to be on the same host.
You will need servers and network infrastructure that can handle this. You will need a subnet of ip addresses, each server will have to have stateless communication with the internet (i.e. your router will not be doing any NAT in order to not have to track 250,000+ connections).
You will have to talk to AOL. There is no way that AOL will be able to handle this level of connectivity without considering cutting your connection off. Any service of this scale would have to be negotiated with AOL so both you and they would be able to handle the connectivity.
There are i/o multiplexing technologies that you should investigate. Kqueue and epoll come to mind.
In order to write this massively concurrent and teleco grade solution, I would recommend investigating erlang. Erlang is designed for situations such as these (multi-server, massively-multi-client, massively-multithreaded telecommunications grade software). It is currently used for running Ericsson telephone exchanges.

While you can listen on a socket for multiple incoming connection requests, when the connection is established, it connects a unique port on the server to a unique port on the client. In order to multiplex a connection, you need to control both ends of the pipe and have a protocol that allows you to switch contexts from one virtual connection to another or use a stateless protocol that doesn't care about the client's identity. In the former case you'd need to implement it in the application layer so that you could reuse existing connections. In the latter case you could get by using a proxy that keeps track of which server response goes to which client. Since you're connecting to Yahoo Messenger, I don't think you'll be able to do this since it requires an authenticated connection and it assumes that each connection corresponds to a single user.

You can only multiplex multiple connections over a single socket if the other end supports such an operation.
In other words it's a function protocol - sockets don't have any native support for it.
I doubt yahoo messenger protocol has any support for it.
An alternative (to multiple IPs on a single NIC) is to design your own multiplexing protocol and have satellite servers that convert from the multiplex protocol to the yahoo protocol.

I'll trow in another approach you could consider (depending on how desperate you are).
Note that operating system TCP/IP implementations need to be general purpose, but you are only interested in a very specific use-case. So it might make sense to implement a cut-down version of TCP/IP (which only handles your use-case, but does that very well) in your application code.
For example, if you are using Linux, you could route a couple of IP addresses to a tun interface and have your application handle the IP packets for that tun interface. That way you can implement TCP/IP (optimised for your use-case) entirely in your application and avoid any operating system restriction on the number of open connections.
Of course, it's quite a bit of work doing the TCP/IP yourself, but it really depends on how desperate you are - i.e. how much hardware can you afford to throw at the problem.

500,000 arbitrary yahoo messenger connections - is your telco doing this on behalf of Yahoo? It seems like whatever solution has been in place for many years now should be scalable with the help of Moore's Law - and as far as I know all the IM clients have been pretty effective for a long time, and there's no pressing increase in demand that I can think of.
Why isn't this a reasonable problem to address with hardware plus traditional solutions?

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse