A multithreaded midrange web server on Linux

by Stephan K.H. Seidl

Version 1.2, Mon, 25 May 2020 12:00:15 +0200

Introduction

It happens that one has to provide a very special Internet service. Client computers, whether or not they sit behind a NAT-capable router, should be able to exchange information with some server. This server should be stateful for at least a limited period of time. Stateless servers are not taken into account here. The protocol can, in our case, be chosen freely. There is no need to encrypt the data. The service provided should be stable and maintenance-free. The data rates are very low on average, but requests from anywhere in the world should be processed as soon as possible. What could the plan for such a project look like?

About the hardware. First-class providers today, in 2020, offer virtual servers for rent at unbeatable prices. Second-rate providers can only offer root servers, which are then correspondingly expensive. So one will certainly try to make do with a virtual server. Further, with a virtual server, all hardware problems are the responsibility of others.

About the software. If one is familiar with Linux, this operating system is certainly a good choice. Linux is extremely stable and can be configured to use very few resources for itself. In other words, the server software can use almost all of the server hardware without any problems. Linux on virtual servers is typically managed at the command line level. In our case, Debian 9 was chosen as the operating system.

With respect to the protocol, the following can be stated. The destination TCP port 80 is a port that client computers behind NAT routers normally reach unhindered. This port is therefore predestined to be the server's listen port here. And what belongs to TCP port 80? Right, it is HTTP. So we use HTTP, too. And if it is further decided to speak HTML in addition, the scenario becomes perfect. No router will think of blocking our nice packets to the server and back. And various browsers and programs such as wget and curl can then be used for debugging. So we let the server listen on TCP port 80, use HTTP/1.1, and pretend to speak HTML.

The next step is to look for server software. We find that there is no suitable candidate. So we have to write one. No sooner said than done.

Establishing a properly functioning Internet service is not that easy. It is therefore sensible to publish, for once, modern production code with all the bells and whistles, to give beginners and other interested parties a reasonable start.

To do

Currently nothing.

Change log

For the change log, see Changelog.
Comments, bug reports, and better ideas are welcome.

Remarks

The file webserver.c contains the source code of a multithreaded midrange web server on Linux. This web server is stateful. Up to 256 clients are served in parallel. Within a limited period of time, each individual client performs a synchronous handshake sequence: it reserves a slot, exchanges certain information, asks a series of questions that the server must answer as quickly as possible, and finally releases the slot it reserved. The so-called cruncher is the actual payload application; it accepts and delivers one character string per call and slot. The cruncher works sequentially inside a slot, maintaining internal state, but is reentrant with respect to different slots. Here the cruncher is only a stub. Anyone with a similar problem to solve will have a different payload application tailored to their needs, but may be able to reuse some of the ideas, algorithms, and possibly some lines of the present code.
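The reserve/release part of the handshake can be pictured with a small slot table. What follows is only a minimal sketch and is not taken from webserver.c: the names slots_init, slot_reserve and slot_release are hypothetical, and a mutex-guarded stack of free indices is an assumption made for illustration. Only the limit of 256 slots comes from the text.

```c
#include <assert.h>
#include <pthread.h>

#define SLOT_COUNT 256   /* from the text: up to 256 clients in parallel */

/* Hypothetical slot table: a stack of free slot indices, guarded by one
   mutex. Both operations do O(1) work inside the critical region. */
static int free_stack[SLOT_COUNT];
static int free_count = 0;
static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;

/* Marks all slots as free; slot 0 ends up on top of the stack. */
void slots_init(void)
{
    for (int i = 0; i < SLOT_COUNT; i++)
        free_stack[i] = SLOT_COUNT - 1 - i;
    free_count = SLOT_COUNT;
}

/* Returns a free slot index, or -1 if all slots are in use. */
int slot_reserve(void)
{
    int slot = -1;
    pthread_mutex_lock(&slot_lock);
    if (free_count > 0)
        slot = free_stack[--free_count];
    pthread_mutex_unlock(&slot_lock);
    return slot;
}

/* Gives a previously reserved slot back to the table. */
void slot_release(int slot)
{
    assert(slot >= 0 && slot < SLOT_COUNT);
    pthread_mutex_lock(&slot_lock);
    assert(free_count < SLOT_COUNT);   /* catches a double release */
    free_stack[free_count++] = slot;
    pthread_mutex_unlock(&slot_lock);
}
```

A client that receives -1 from slot_reserve() would have to be turned away or asked to retry. A real server would additionally expire slots whose clients never release them, which this sketch leaves out.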

Except for the data initialization phase at the beginning and the accept() system calls, which both run sequentially, the code follows the principles that massive parallelism requires. In particular, this means that the complexity of code passages within critical regions does not exceed the order O(log n), where n denotes the number of processing entities. So the reader can expect to find great algorithms in the code, such as iterative AVL tree insertion and deletion procedures. Another very interesting algorithm is the 0-root scattering tree algorithm, which is suitable for every type of broadcast operation. The counterpart to this algorithm is not used here, because in our case the barriers basically do the same thing. One question could still arise here. Why is this only considered a midrange server? Because although the accept() system calls do almost nothing here, they are processed in succession by a single thread and can therefore easily become a bottleneck in extreme cases.
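Why a tree-shaped broadcast stays within O(log n) can be illustrated with a generic binary tree rooted at entity 0. The sketch below is not taken from webserver.c and is not claimed to be the author's 0-root scattering tree; the function names and the child rule 2i+1, 2i+2 are assumptions chosen for illustration.

```c
#include <assert.h>

/* Hypothetical helper: parent of entity i in a binary tree rooted at
   entity 0, or -1 for the root itself. */
int tree_parent(int i)
{
    assert(i >= 0);
    return i == 0 ? -1 : (i - 1) / 2;
}

/* Hypothetical helper: stores the children of entity i among n entities
   in out[] and returns how many there are (0, 1, or 2). */
int tree_children(int i, int n, int out[2])
{
    int count = 0;
    for (int k = 1; k <= 2; k++) {
        int child = 2 * i + k;
        if (child < n)
            out[count++] = child;
    }
    return count;
}

/* Number of forwarding steps until entity i has received a message
   that started at entity 0. */
int tree_depth(int i)
{
    int depth = 0;
    while (i > 0) {
        i = (i - 1) / 2;
        depth++;
    }
    return depth;
}
```

Each entity forwards the message to at most two children, so no entity does more than constant work, and with n = 256 the last entity, number 255, is reached after 8 forwarding steps.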

The code can be compiled and started by means of

  cc -O3 webserver.c -o webserver -lpthread
  strip webserver
  ./webserver

The server will preferably run with root privileges. As an unprivileged user, entering

  curl http://localhost:12913

from another terminal of the same machine yields the GET request output, and entering

  d=""
  d="${d}statelocator="
  d="${d}00000000000000000000000000000000"
  d="${d}00000000000000000000000000000000"
  d="${d}&reqlinnum=1"
  d="${d}&reqlin0=000A3448454C4F3389C8"
  d="${d}&endofreq=1"
  curl -d "${d}" http://localhost:12913

in a Bourne-compatible shell yields the output for a valid POST request, just to play a bit. When issued by the same user that runs the server, the commands

  /bin/kill -2 <pid>

or

  /bin/kill -15 <pid>

gracefully stop the web server, where <pid> denotes the process ID of the server. 2 represents the signal INT, or ^C on the keyboard, and 15 represents the signal TERM. Gracefully stopping here means that the accounting records are properly written before the server process terminates. The command

  /bin/kill -10 <pid>

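causes the server to write out the accounting records from the buffers and then continue normal work. 10 is the number of the signal USR1 on x86 and ARM Linux; the number differs on some other architectures, where /bin/kill -USR1 <pid> can be used instead.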
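The signal behaviour described above can be sketched as follows. This is a minimal, hedged illustration and not the code from webserver.c: the names install_signal_handlers, service_pending_signals, write_accounting_records and accounting_writes are hypothetical, and the stub merely counts how often the accounting records would have been written.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for sigaction() under strict C modes */
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Flags set by the handler; sig_atomic_t is safe to write from it. */
static volatile sig_atomic_t stop_requested = 0;
static volatile sig_atomic_t flush_requested = 0;

int accounting_writes = 0;   /* hypothetical counter used by the stub below */

/* The handler only records which kind of request arrived. */
static void on_signal(int signo)
{
    if (signo == SIGUSR1)
        flush_requested = 1;
    else
        stop_requested = 1;   /* SIGINT or SIGTERM */
}

/* Installs the handler for INT, TERM and USR1; returns 0 on success. */
int install_signal_handlers(void)
{
    static const int signals[] = { SIGINT, SIGTERM, SIGUSR1 };
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof signals / sizeof signals[0]; i++)
        if (sigaction(signals[i], &sa, NULL) != 0)
            return -1;
    return 0;
}

/* Stub standing in for writing the buffered accounting records. */
static void write_accounting_records(void)
{
    accounting_writes++;
}

/* Called from the main loop: returns 1 to continue, 0 to terminate. */
int service_pending_signals(void)
{
    if (flush_requested) {
        flush_requested = 0;
        write_accounting_records();
    }
    if (stop_requested) {
        write_accounting_records();   /* final write before exit */
        return 0;
    }
    return 1;
}
```

Keeping the handler down to setting a flag leaves the actual writing to ordinary code outside the handler. In a multithreaded program one would typically also decide which thread receives these signals, for example by blocking them in the worker threads; that part is omitted here.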

That's it. Enjoy what you can see here and take away.

