Experimenting with a high-concurrency connection server

I am trying to build a server that can handle as many concurrent connections as possible (at least 100k, for a start).
Right now, when I test it over my LAN, it easily reaches 50k+ concurrent connections (I have not tested beyond that yet). However, when I test it from outside my LAN, it never goes beyond about 8k.
To be more precise, once I go past 8k, the earliest sockets no longer receive any data, as if the new ones had replaced them.
Does anyone have any idea what could cause this?
I have done some research, and it seems (although it isn't clear) that routers/modems may support only a limited number of concurrent connections. Is that true?
If so, and if that is my problem, do I have to get one that can support more? Or get rid of it somehow?

Related

Npgsql with Pgbouncer on Kubernetes - pooling & keepalives

I'm looking for more detailed guidance / other people's experience of using Npgsql in production with Pgbouncer.
Basically we have the following setup using GKE and Google Cloud SQL:
Right now, I've got Npgsql configured as if pgbouncer weren't in place, using a local connection pool. I've added pgbouncer as a deployment in my GKE cluster because Google Cloud SQL has very low maximum connection limits, and to be able to scale my application horizontally inside Kubernetes I need to protect against overwhelming it.
My problem is one of reliability when one of the pgbouncer pods dies (due to a node failure or as I'm scaling up/down).
When that happens, the existing open connections in the client-side connection pools of the application pods don't immediately close, and they basically result in exceptions in my application when it tries to execute commands. Not ideal!
As I see it (and looking at the advice at https://www.npgsql.org/doc/compatibility.html) I have three options.
1. Live with it, and handle retries of SQL commands within my application. Possible, but it seems like a lot of effort and creates lots of possible bugs if I get it wrong.
2. Turn on keepalives and let Npgsql itself "fail out" the bad connections relatively quickly when they break. I'm not even sure this will work, or whether it will cause further problems.
3. Turn off client-side connection pooling entirely. This seems to be the official advice, but I am loath to do it for performance reasons: it seems very wasteful for Npgsql to have to open a connection to pgbouncer for each session, and it runs counter to all of my experience with other RDBMSs like SQL Server.
Am I on the right track with one of those options? Or am I missing something?
You are generally on the right track and your analysis seems accurate. Some comments:
Option 2 (turning on keepalives) will help remove broken idle connections from Npgsql's pool. As you've written, your application will still see some failures (some bad idle connections may not be removed in time). There is no particular reason to think this would cause further problems; it should be pretty safe to turn on.
Option 3 is indeed problematic for perf, as a TCP connection to pgbouncer would have to be established every single time a database connection is needed. It will also not provide a 100% fail-proof mechanism, since pgbouncer may still drop out while a connection is in use.
At the end of the day, you're asking about resiliency in the face of arbitrary network/server failure, which isn't an easy thing to achieve. The only 100% reliable way to deal with this is in your application, via a dedicated layer that retries operations when a transient exception occurs. You may want to look at Polly; note that Npgsql helps out a bit by exposing an IsTransient property on its exceptions, which can be used as a trigger to retry (Entity Framework Core also includes a similar "retry strategy"). If you do go down this path, note that transactions are particularly difficult to handle correctly.
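A retry layer of the kind described above can be sketched in a few lines. This is a minimal Python illustration, not Npgsql or Polly code: `retry_transient` is a hypothetical helper, and the `is_transient` predicate stands in for a driver check such as Npgsql's IsTransient. It retries only errors the predicate classifies as transient, backing off exponentially, and it is only safe for operations that can be repeated as a whole (a failed transaction must be retried from its beginning).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_transient(
    operation: Callable[[], T],
    is_transient: Callable[[Exception], bool],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying with exponential backoff on transient errors.

    `is_transient` is a hypothetical stand-in for a driver's transient-error
    check; any error it rejects propagates immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Give up on permanent errors, or once the attempt budget is spent.
            if not is_transient(exc) or attempt == max_attempts:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            sleep(base_delay * (2 ** (attempt - 1)))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_query() -> str:
        # Hypothetical operation that fails twice, e.g. while a pgbouncer pod restarts.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ConnectionError("connection to pgbouncer lost")
        return "ok"

    print(retry_transient(flaky_query, lambda e: isinstance(e, ConnectionError)))  # ok
```

Injecting `sleep` as a parameter is only there to make the backoff testable; in production you would keep the default and probably add jitter.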

What are the possible use cases of the OrientDb Live Query feature?

I apologise if the question is naive. I wanted to understand what could be a few possible use cases of the live query feature.
Let's say my database state changes, but not every minute (or hour). If I execute a live query against my database/class/cluster, I'm not really expecting the callback to be called anytime soon. But, hey, I would still want to be notified when there's a state change.
My need with OrientDB is more along the lines of ElasticSearch's percolator bundled with a publish-subscribe system.
Is live query meant to cater to such use cases too? Or is my understanding of live query very limited? What could be a few possible use cases for the live query feature?
Thanks!
Whether or not live queries are appropriate for your use case depends on a few things. There are several reasons why live queries might make sense. A few questions to ask are:
How frequently does the data change?
How soon after the data changes do you need to know about it?
How many different groups of data (e.g. classes, clusters) do you need to deal with?
How many clients are connected to the server?
If the data does not change very often, or if you can wait a set period of time before an update, or you don't have many clients (hitting the DB directly), or if you only have one thing feeding the database, then you might want to just do polling. There is a balance between holding a connection open that you send a message on very infrequently (live queries) and polling too often.
For example, it's possible that you have an application server (Tomcat, Node, etc.) and that your clients connect to it via WebSockets. Now let's say your app server opens one live query (or a few pooled ones) to the database, and the database gets an update. The update might go only from the database to the app server (e.g. Node), and Node is then responsible for fanning that message out across 100 WebSockets (one for each connected client). In this case, the fact that Node is connected to the database persistently with a live query open is not that big of a deal.
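The fan-out step described above can be sketched with Python's asyncio, under the assumption that one upstream callback (standing in for the app server's single live query) feeds a hub, and that each per-client queue stands in for one WebSocket. The `Hub` class and the `demo` function are hypothetical illustrations, not an OrientDB or WebSocket API.

```python
import asyncio


class Hub:
    """Copies each upstream message to every subscribed client queue."""

    def __init__(self) -> None:
        self._clients: set = set()

    def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        self._clients.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        self._clients.discard(q)

    def publish(self, message: str) -> None:
        # One database notification becomes N in-process deliveries.
        for q in self._clients:
            q.put_nowait(message)


async def demo(n_clients: int) -> list:
    hub = Hub()
    queues = [hub.subscribe() for _ in range(n_clients)]
    # Stand-in for the live-query callback firing once on the app server.
    hub.publish("record updated")
    return [await q.get() for q in queues]


if __name__ == "__main__":
    print(len(asyncio.run(demo(100))))  # 100
```

The database only ever sees the app server's one subscription; the per-client cost lives in the app server, which is the point the answer makes.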
The question is: if you have thousands of clients connected, do they all need an immediate update? If so, are you planning on having them poll at a short interval? If so, you could probably benefit from a live query. Lots of clients polling at a short interval will generate a lot of unnecessary traffic and queries.
Unfortunately, at the end of the day the answer is "it depends". You probably need to prototype and then instrument under load to see what your trade-offs are. But in principle, it is less about how frequently updates come and more about how often your clients would poll and how many clients you have. If the answer is "short intervals and a lot of clients", give live queries a try.
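To make the trade-off concrete, here is a back-of-the-envelope sketch. The client counts, intervals and update rate are made-up illustrative values, not measurements; the point is that polling load scales with clients divided by interval, while live-query load scales with how often the data actually changes.

```python
def polling_queries_per_second(clients: int, poll_interval_s: float) -> float:
    """Load on the database if every client polls on its own timer."""
    # Each client issues one query per interval, whether or not anything changed.
    return clients / poll_interval_s


def live_query_pushes_per_second(subscriptions: int, updates_per_second: float) -> float:
    """Messages the database pushes if the app server holds live queries."""
    # Work scales with actual changes, not with how often clients want to check.
    return subscriptions * updates_per_second


# Illustrative numbers only: 5,000 clients polling every 2 s,
# versus one pooled live query and roughly one update per minute.
print(polling_queries_per_second(5_000, 2.0))  # 2500.0
print(live_query_pushes_per_second(1, 1 / 60))
```

This ignores the app server's own fan-out to clients, which still costs one message per client per update but does not touch the database.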

Documentation of TCP possible errors / unpredictable behaviours

Some time ago I started working with custom-made servers, and even though I have experience dealing with the actual message exchange/serialization, etc., of client/server communication, I had never coded an actual server from scratch.
In this sense, I have found raw TCP socket connections to be much trickier and more unpredictable than I'd like.
For example, I coded a simple client/server application that would establish a long-lived TCP connection, over which the clients would receive push notifications from the server. It was very simple, and it worked very well in my test environment, even with many computers.
When I actually deployed it, though, I got lots of errors, which I later found were caused by the lack of keepalive signals: the connection would be cut without giving either the client or the server any feedback or error at all. The messages simply wouldn't be delivered, and would fail silently.
I knew that TCP connections could break, but I thought I would at least receive an error or something similar so I could reconnect when the connection was lost.
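For the silent-drop problem described above, a common first step is to enable TCP keepalive on each long-lived socket so that the kernel probes idle peers; the probes can also help keep NAT/firewall state from expiring on an idle connection. A minimal Python sketch follows. The timing values are arbitrary assumptions, and `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are Linux option names that Python only exposes on platforms that support them, hence the `hasattr` checks.

```python
import socket


def enable_keepalive(
    sock: socket.socket, idle_s: int = 60, interval_s: int = 10, probes: int = 5
) -> None:
    """Turn on TCP keepalive so a silently dead peer eventually surfaces as an error.

    SO_KEEPALIVE itself is portable; the per-socket tuning options are
    applied only when this Python build exposes them (e.g. on Linux).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the kernel declares the connection dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

With these defaults a dead peer should be detected after roughly 60 + 5 × 10 = 110 seconds of silence, after which a pending recv() typically fails with an OSError such as ETIMEDOUT. Keepalive only covers idle periods; many protocols also add application-level heartbeats.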
This made me very wary of rolling my own servers, as the possible errors and scenarios seem too many and too unexpected, and I really don't want to learn about the unexpected behaviours only once the actual application is deployed. From my experience with server-side programming so far, the best way to deal with errors would be to enumerate all possible errors and make sure I cover every exceptional case when writing a program.
So, is there anywhere I could find good documentation on the possible pitfalls/exceptions I could encounter with sockets, and on how to detect them? It's been some time since I last worked with this, so I don't have any fresh examples, but I remember that, e.g., receiving an empty message meant that the connection had been closed.
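The "empty message" rule mentioned above is standard socket behaviour: a recv() that returns zero bytes means the peer performed an orderly shutdown, while a reset arrives as an error instead. A minimal Python sketch that folds both cases into one check follows; `recv_or_none` is a hypothetical helper name.

```python
import socket
from typing import Optional


def recv_or_none(sock: socket.socket, bufsize: int = 4096) -> Optional[bytes]:
    """Return received bytes, or None if the peer closed or reset the connection.

    recv() returning b"" means an orderly shutdown by the peer (FIN);
    a reset (RST) is raised as ConnectionResetError instead. A peer that
    simply vanishes produces neither, which is why keepalives matter.
    """
    try:
        data = sock.recv(bufsize)
    except ConnectionResetError:
        return None
    if not data:
        return None
    return data
```

On the sending side the equivalent failures usually show up as BrokenPipeError or ConnectionResetError from send()/sendall().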
I'd also love to hear suggestions, or about simple libs (preferably in C) that handle these cases so I can base my work on them. My main platform is Linux, but a cross-platform solution would be much appreciated!
Thank you!

What is the practical / hard limit on socket connections per server

I have a number of client devices that open socket connections to a service running on a Windows 2008 R2 server. I'm wondering what the hard limit on the number of concurrent client connections is.
According to this article, one hard limit is (was) 16,777,214. The practical limit depends on your application also: for example, if you create a thread per connection, then the practical limit comes from the limitation in the number of threads more than from the network stack. There is also a limit on the number of handles any process may have, and so on.
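The answer above is about Windows handles. On Linux and other Unix-like systems, the closest analogue is the per-process open-file limit, since every socket consumes a file descriptor, and the default is often far below the connection counts discussed here. A minimal Python sketch for checking and raising it is below; `raise_fd_limit` is a hypothetical helper, and some systems (macOS, for example) impose additional kernel caps, so `setrlimit` may still refuse a value below the hard limit.

```python
import resource


def raise_fd_limit(target: int) -> tuple:
    """Raise this process's open-file soft limit toward `target`.

    Unix-only. Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited
    # Without extra privileges the soft limit may not exceed the hard limit.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

A server would call something like `raise_fd_limit(200_000)` at startup, before accepting connections.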
Assuming you select a sensible architecture for your server then the limit will be memory and cpu related. IMHO you'll never reach the hard limit that Martin mentions :)
So, rather than worrying about a theoretical limit that you'll never hit you should, IMHO, be thinking about how you will design your application and how you will test it to determine the current maximum number of client connections that you can maintain for your application on given hardware. The important thing for me is to run your perf tests from Day 0 (see here for a blog posting where I explain this). Modern operating systems and hardware allow you to build very scalable systems but simple day to day coding and design mistakes can easily squander that scalability and so you simply MUST run perf tests all the time so that you know when you are building in road blocks to your performance. You simply cannot go back and fix these kind of mistakes at the end of the project.
As an aside, I ran some tests on Windows 2003 Server with a low-spec VM and easily achieved more than 70,000 concurrent and active connections with a simple server using an overlapped I/O (I/O completion port) design. See this answer for more details.
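I/O completion ports are Windows-specific, but the idea behind that design (a few threads multiplexing many sockets instead of one thread per connection) can be sketched portably. The following minimal echo server uses Python's `selectors` module, which is readiness-based (epoll/kqueue/select) rather than completion-based like IOCP, so treat it as an analogy, not an IOCP implementation. `serve_echo` and its `stop` event are hypothetical names; the event exists only so the sketch can shut down cleanly.

```python
import selectors
import socket
import threading


def serve_echo(listener: socket.socket, stop: threading.Event) -> None:
    """Echo bytes back on every connection using one thread and one selector."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data=None)

    def drop(conn: socket.socket) -> None:
        sel.unregister(conn)
        conn.close()

    while not stop.is_set():
        for key, mask in sel.select(timeout=0.1):
            if key.data is None:
                # New connection: register it with the selector, no new thread.
                try:
                    conn, _ = listener.accept()
                except BlockingIOError:
                    continue  # the pending connection went away
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data=bytearray())
                continue
            conn, pending = key.fileobj, key.data  # pending: bytes not yet echoed
            try:
                if mask & selectors.EVENT_READ:
                    chunk = conn.recv(4096)
                    if not chunk:  # peer closed its side
                        drop(conn)
                        continue
                    pending.extend(chunk)
                if pending:
                    sent = conn.send(pending)
                    del pending[:sent]
            except BlockingIOError:
                pass  # not actually ready; try again on the next event
            except OSError:
                drop(conn)  # reset, broken pipe, etc.
                continue
            # Watch for writability only while echoed bytes are still queued.
            want = selectors.EVENT_READ
            if pending:
                want |= selectors.EVENT_WRITE
            sel.modify(conn, want, data=pending)

    for key in list(sel.get_map().values()):
        if key.data is not None:
            key.fileobj.close()
    sel.close()
```

The per-connection state is just a byte buffer, so memory per idle connection stays small; that is what lets this style of server hold tens of thousands of sockets where thread-per-connection would run out of threads first.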
My personal approach would be to get a shell of a server put together quickly using whatever technology you decide on (I favour unmanaged C++ using I/O Completion Ports and minimal threads; see this blog posting for more details). Then build a client or series of clients that can stress test the application, and keep updating and running the test clients as you implement your server logic. You would expect to see a gradually declining curve of maximum concurrent clients as you add more complexity to your server; large drops in scalability should cause you to examine the latest check-ins for unfortunate design decisions.
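A stress client of the kind suggested above can start very small. This Python sketch assumes the server under test echoes whatever it receives (a hypothetical protocol chosen for illustration); `open_and_ping` opens up to `n` connections, keeps all of them open so the count reflects concurrent connections, pings each one, and reports how many answered correctly.

```python
import socket


def open_and_ping(host: str, port: int, n: int, timeout: float = 5.0) -> int:
    """Open up to n concurrent connections to an echo server; return how many echoed."""
    socks: list = []
    ok = 0
    try:
        for _ in range(n):
            try:
                socks.append(socket.create_connection((host, port), timeout=timeout))
            except OSError:
                break  # e.g. EMFILE or ECONNREFUSED: some limit was hit
        # All sockets are still open here, so this measures concurrency.
        for s in socks:
            try:
                s.sendall(b"ping")
                reply = b""
                while len(reply) < 4:
                    part = s.recv(4 - len(reply))
                    if not part:
                        break  # server closed the connection
                    reply += part
                if reply == b"ping":
                    ok += 1
            except OSError:
                pass  # timeout or reset counts as a failure
    finally:
        for s in socks:
            s.close()
    return ok
```

Running it with increasing `n` after each check-in gives the "maximum concurrent clients" curve the answer describes; a real harness would also spread connections across several client machines, since one client box has its own descriptor and ephemeral-port limits.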

Faulty-connection Proof File Transfer Protocol?

I frequently do website development live over an FTP connection. That is to say, I use a code editor with a built in FTP window and push/pull files to work on them, upload the changes, etc. This is mostly because it's unreasonable to try to create a local development server, and I use too many computers for that to be practical anyway without a lot of work.
My trouble is, the internet connection at our home is not exactly... stable. It's fast and mostly reliable, but it tends to glitch far more often than any other connection I've worked on (it's wireless DSL), and as a result dropped connections are far too frequent. (It's about as reliable as AT&T is with phone calls in that regard.) When working with FTP, I find that if the connection drops mid-transfer, it can be difficult to recover. First of all, when the connection drops, a blank file is saved to the server (how is this helpful?), completely breaking the page I was working on; and the icing on the cake is that, depending on the timing, vsftpd gets itself stuck in a timeout and I have to SSH in and restart it before I can access that file again.
This process alone has only been beneficial because it's taught me to build up some data protection techniques clientside, to prevent the server from eating my recent changes if the dropped connection happens to hang or crash my client. Overall though, it's a pretty failed situation, and I'm surprised I get any work done at all.
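The blank-file symptom comes from overwriting the live file in place. A widely used pattern that avoids it is to write to a temporary file in the same directory and rename it over the target only once the data is complete. The Python sketch below assumes you control the code that writes the file (for example a small upload script on the server); it is not a feature of FTP or vsftpd, and `atomic_write` is a hypothetical helper name.

```python
import os
import shutil
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see either the old or the new file.

    An interrupted write leaves only a stray temp file, never a blank target.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes hit the disk first
        if os.path.exists(path):
            # mkstemp uses mode 0600; keep the old file's permission bits.
            shutil.copymode(path, tmp_path)
        # Atomic on POSIX when source and target are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

The temp file must live in the same directory (hence on the same filesystem) as the target, otherwise the final rename is not atomic.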
Long, long context, I know, but my question is this: is there a file transfer protocol that is designed to handle "flakey" connections like mine? I'd imagine that, for example, trying to transfer files over a 3G tethered connection would yield the same results, especially while traveling. It seems like FTP and SFTP both rely on a persistent connection: they can deal with dropped packets but not with losing the entire socket and reconnecting. It seems to me that a file transfer daemon should be able to store the state of the user interacting with it, and thus detect failed transfers and be ready to "resume" if the user reconnects within a reasonable amount of time.
Thanks if anyone knows anything. I'm seriously considering trying to write such a protocol myself (I've had a lot of success coding the ajax on my page to handle faulty connections, for example) but I don't want to dive in if there's already a solution available.
You want rsync. If the connection drops, you just repeat the command and it picks up right where it left off. Built in error checking and everything. Works over SSH, Windows client exists. Somebody's probably written a GUI front end.
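What "picks up right where it left off" means can be illustrated with a much simpler scheme than rsync's own algorithm (rsync compares checksums of blocks, and with its `--partial` option it keeps a partially transferred file so the next run can reuse it). This hypothetical local-file sketch treats whatever bytes already exist at the destination as done and appends only the rest. Unlike rsync, it does not verify the part that was already transferred, so it assumes the partial file is an uncorrupted prefix of the source.

```python
import os


def resume_copy(src: str, dst: str, chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst, reusing bytes already in dst; return bytes copied this run."""
    total = os.path.getsize(src)
    done = os.path.getsize(dst) if os.path.exists(dst) else 0
    if done > total:
        done = 0  # dst cannot be a prefix of src; start over
    copied = 0
    # "r+b" keeps the existing prefix; "wb" truncates when starting from zero.
    with open(src, "rb") as fin, open(dst, "r+b" if done else "wb") as fout:
        fin.seek(done)
        fout.seek(done)
        while True:
            block = fin.read(chunk_size)
            if not block:
                break
            fout.write(block)
            copied += len(block)
    return copied
```

Calling it again after an interruption copies only the missing tail; calling it on a complete file copies nothing.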
BitTorrent works well with flakey connections. I hear that it is fast, too!