200 vs 403 server response - which degrades server's performance more? - server

Some rogue people have set up server monitoring that connects to server every 2 minutes to check if it's down (they connect from several different accounts so they ping the server every 20 seconds or so). It's a simple GET request.
I have two options:
1. Leave it as it is (ie. allow them via a normal 200 server response).
2. Block them by either IP or user-agent (giving 403 response).
My question is - what is the better solution as far as server performance is concerned (ie. what is less 'stressful' on the server) - 1 (200 response) or 2 (403 response)?
I'm inclined to #1 since there would be no IP / user-agent checking which should mean less stress on the server, correct?

It doesn't matter.
The status code and an if-check on the user-agent string are completely dominated by network I/O, GC and server subsystems.
If they just query every 2 minutes, I'd very much leave it alone. If they query a few hundred times per second; time to act.
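To get a feel for how cheap the check itself is, here is a minimal Python sketch that times a user-agent comparison. The blocked agent name and both helper functions are hypothetical (nothing here comes from the question), and the measured number will vary by machine. The point is only that the check is tiny next to the network handling around it.

```python
import timeit

# Hypothetical user-agent substrings to block; not taken from the question.
BLOCKED_AGENTS = ("ExampleMonitorBot",)


def status_for(user_agent: str) -> int:
    """Return 403 for a blocked user agent, 200 otherwise."""
    if any(blocked in user_agent for blocked in BLOCKED_AGENTS):
        return 403
    return 200


def seconds_per_check(user_agent: str, runs: int = 100_000) -> float:
    """Average wall-clock cost of one status_for() call, in seconds."""
    total = timeit.timeit(lambda: status_for(user_agent), number=runs)
    return total / runs


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; ExampleMonitorBot/1.0)"
    print(status_for(ua), f"{seconds_per_check(ua) * 1e9:.0f} ns per check")
```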

Related

What could "reason: Layer6 timeout" possibly mean?

I have a haproxy configured with two servers in the backend. Occasionally, every 16-20h one of them gets marked by haproxy as DOWN:
haproxy.log-20190731:2019-07-30T16:16:24+00:00 <local2.alert> haproxy[2716]: Server be_kibana_elastic/kibana8 is DOWN, reason: Layer6 timeout, check duration: 2000ms. 0 active and 0 backup servers left. 8 sessions active, 0 requeued, 0 remaining in queue.
I did some reading on how haproxy runs the checks, but the Layer6 timeout does not tell me much. What could be the possible reasons for that timeout? What does it actually mean?
Here is my backend configuration
backend be_kibana_elastic
    balance roundrobin
    stick on src
    stick-table type ip size 100k expire 12h
    server kibana8 172.24.0.1:5601 check ssl verify none
    server kibana9 172.24.0.2:5601 check ssl verify none
Layer 6 refers to TLS. The backend is accepting a TCP connection but isn't negotiating TLS (SSL) on the health check connection within the allowed time.
The configuration values timeout connect, timeout check, and inter all interact to determine how much time health checks are allowed to complete. The default value of inter, if not specified, is 2000 milliseconds, which is what you're seeing here. By default, inter (the health check interval) determines both how often checks run and how long each one is allowed to take.
Since you have not configured a fall count for the servers, the implication is that the default value 3 is being used, which means your server is actually failing 3 consecutive health checks before being marked down.
Consider adding option log-health-checks to the backend declaration, which will create additional log entries of those initial failing checks before the final one causes the backend to be marked down.
Increasing the allowable time may avoid the failure, but is probably valid only for testing -- not a fix -- because if your backend can't reliably respond to a check within 2000 ms, then it also can't reliably respond to client connections within that time frame, which is a long time to wait for a response.
Note that intermittent packet loss typically causes sluggish behavior in increments of 3000 ms, because TCP stacks often use a retransmission timeout (RTO) of 3 seconds. Since this is more than 2000 ms, packet loss on your network is one possible explanation for the problem.
Another possible explanation is excessive CPU load on the backend, either related to traffic or to a cron job doing something intensive, because TLS negotiation -- relatively speaking -- is an expensive process from the CPU's perspective.
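To illustrate the fall count mentioned above, here is a deliberately simplified Python model of "mark DOWN after N consecutive failed checks". It is a sketch, not HAProxy's actual implementation: the function name is made up, and it ignores rise and every other health-check setting.

```python
def checks_until_down(results, fall=3):
    """Return the 1-based index of the check that marks the server DOWN,
    or None if that never happens.

    results: iterable of booleans, True = check passed, False = check failed.
    fall: consecutive failures required (3 is the default cited above).
    """
    consecutive_failures = 0
    for index, passed in enumerate(results, start=1):
        if passed:
            # Any successful check resets the failure streak.
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= fall:
                return index
    return None
```

For example, checks_until_down([True, False, False, True, False, False, False]) returns 7: the two early failures are forgiven by the passing check in between, and only the later run of three marks the server down. This is why the single DOWN line in the log hides earlier failed checks.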

Ajax polling vs SSE (performance on server side)

I'm curious whether there is some kind of standard threshold for when it is better to use Ajax polling instead of SSE, from a server-side viewpoint.
1 request every second: I'm pretty sure SSE is better
1 request per minute: I'm pretty sure Ajax is better
But what about 1 request every 5 seconds? How can we calculate where the limit frequency is for Ajax or SSE?
No way is 1 request per minute always better for Ajax, so that assumption is flawed from the start. Any kind of frequent polling is nearly always a costly choice. It seems from our previous conversation in comments of another question that you start with a belief that an open TCP socket (whether SSE connection or webSocket connection) is somehow costly to server performance. An idle TCP connection takes zero CPU (maybe every once in a long while, a keep alive might be sent, but other than that, an idle socket does not use CPU). It does use a bit of server memory to handle the socket descriptor, but a highly tuned server can have 1,000,000 open sockets at once. So, your CPU usage is going to be more about how many connections are being established and what are they asking the server to do every time they are established than it is about how many open (and mostly idle) connections there are.
Remember, every http connection has to create a TCP socket (which takes roundtrips between client and server), then send the http request, then get the http response, then close the socket. That's a lot of roundtrips of data to do every minute. If the connection is https, it's even more work and roundtrips to establish the connection because of the crypto layer and endpoint certification. So doing all that every minute for hundreds of thousands of clients seems like a massive waste of resources and bandwidth when you could create one SSE connection and have the client just listen for data to stream from the server over that connection.
As I said in our earlier comment exchange on a different question, these types of questions are not really answerable in the abstract. You have to have specific requirements of both client and server and a specific understanding of the data being delivered and how urgent it is on the client and therefore a specific polling interval and a specific scale in order to begin to do some calculations or test harnesses to evaluate which might be the more desirable way to do things. There are simply too many variables to come up with a purely hypothetical answer. You have to define a scenario and then analyze different implementations for that specific scenario.
Number of requests per second is only one of many possible variables. For example, if most of the time you poll there's actually nothing new, then that gives even more of an advantage to the SSE case because it would have nothing to do at all (zero load on the server other than a little bit of memory used for an open socket most of the time) whereas the polling creates continual load, even when there is nothing to do.
The #1 advantage to server push (whether implemented with SSE or webSocket) is that the server only has to do anything with the client when there is actually pertinent data to send to that specific client. All the rest of the time, the socket is just sitting there idle (perhaps occasionally on a long interval, sending a keep-alive).
The #1 disadvantage to polling is that there may be lots of times that the client is polling the server and the server has to expend resources to deal with the polling request only to inform that client that it has nothing new.
How can we calculate where is the limit frequency for Ajax or SSE?
It's a pretty complicated process. Lots of variables in a specific scenario need to be defined. It's not as simple as just requests/sec. Then, you have to decide what you're attempting to measure or evaluate and at what scale? "Server performance" is the only thing you mention, but that has to be completely defined and different factors such as CPU usage and memory usage have to be weighted into whatever you're measuring or calculating. Then, you may even need to run some test harnesses if the calculations don't yield an obvious answer or if the decision is so critical that you want to verify your calculations with real metrics.
It sounds like you're looking for an answer like "at greater than x requests/min, you should use polling instead of SSE" and I don't think there is an answer that simple. It depends upon far more things than requests/min or requests/sec.
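As a starting point for that kind of scenario analysis, here is a rough Python sketch that compares how many requests polling generates with how many messages a push connection would carry. The function names and the example figures are hypothetical. It only counts messages: it ignores per-request CPU cost, TLS setup and the memory held by open sockets, all of which would have to be measured for a real decision.

```python
def polling_requests_per_hour(clients: int, poll_interval_s: float) -> float:
    """Requests the server handles per hour when every client polls."""
    return clients * 3600 / poll_interval_s


def push_messages_per_hour(clients: int, updates_per_hour_per_client: float) -> float:
    """Messages the server sends per hour when it only pushes real updates."""
    return clients * updates_per_hour_per_client


def wasted_poll_fraction(poll_interval_s: float, updates_per_hour_per_client: float) -> float:
    """Fraction of polls that come back with nothing new."""
    polls = 3600 / poll_interval_s
    useful = min(polls, updates_per_hour_per_client)
    return 1 - useful / polls


if __name__ == "__main__":
    # Hypothetical scenario: 10,000 clients, 5 s polling, 12 updates/hour each.
    print(polling_requests_per_hour(10_000, 5))
    print(push_messages_per_hour(10_000, 12))
    print(wasted_poll_fraction(5, 12))
```

In that made-up scenario polling produces 7,200,000 requests per hour against 120,000 pushed messages, and more than 98% of the polls return nothing. Change the update rate and the picture changes, which is why the scenario has to be defined first.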
"Polling" incurs overhead on all parties. If you can avoid it, don't poll.
If SSE is an option, it might be a good choice. "It depends".
Q: What (if any) kind of "event(s)" will your app need to handle?

Intermittent slowness in responses from vert.x based web server

I have a vertx webserver running on a 1x8g machine. It has about 15 routes mapped, 5 of which are blocking and 10 are non blocking. These are all part of the one standard verticle that my app consists of. The non blocking handlers just open an http connection to another downstream system (all of which are very fast - elastic search / cached data APIs). Some of the blocking handlers do take a bit of time - anywhere between 3 and 9 seconds depending on the time of the day - these also call an external system.
The API response times for my non blocking handlers are usually in the 400ms-600ms range. Occasionally, I see the response times spiking up to over 2 seconds and sometimes all the way up to 12 seconds. I'm not sure what is causing this. Is it the combination of blocking and non blocking handlers in the same verticle?
What is the best way to diagnose the root cause here ?

What is the overhead traffic of a TCP connection (plus TCP clarifications)?

We have a TCP connection.
1. Nothing is sent over it; how much traffic (in bytes) is needed each second to keep that connection open?
2. How long does it take to open a connection from a client in South America to a server in Northern Europe?
3. If I have to send a small amount of data (max 256 bytes) at an interval of x seconds, for what x is it better to close the connection and reopen it instead of keeping the connection always open?
I do not expect exact data - estimates will suffice.
1) none.
2) some time. Try it and see. For a rough estimate, ping one end from the other and double it.
3) try it. It depends on bandwidth and, more importantly, latency. These vary over wide ranges. Usually, it's better, speed-wise, to keep connections open. 256 bytes at intervals of seconds? I would keep the connection open, especially over paths with possibly high latency, (eg. intercontinental).
1. According to the TCP/IP standard, nothing. However, depending on the network conditions and any middleboxes (NAT devices, firewalls, etc.), a connection with no data going over it may be dropped. That could be a static timeout (say two minutes, or ten minutes, or an hour), or it could be based on a least-recently-used table in some device.
2. It depends on a lot of factors, and the biggest delay may be from the client's local network rather than the intercontinental connection. However, the surface of the earth between the points is about 40 light-milliseconds, so (without TCP Fast Open) that would be 120 ms for the first data packet to get from the client to the server and 40 ms for the response, 80 ms more than in an active connection.
3. Assuming no broken middleboxes, always better to keep the connection open. However, the delay to recover from a "silently dropped" connection may be a lot longer than the time to open a new one; it might be appropriate for the client to manage its own timeout (on the order of a second or so), and open a new connection and retry the last message if it hasn't gotten a response by then. Depends on what you're sending; transactional messages might merit such explicit fast retry more than a remote copy of syslog.
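The arithmetic in point 2 can be written out as a small Python sketch. It is a best-case model under stated assumptions: the function name is made up, the one-way delay is the only cost, the request rides along with the final handshake ACK, and there is no TLS, no server processing time and no TCP Fast Open.

```python
def first_response_ms(one_way_ms: float, connection_open: bool) -> float:
    """Time from the client starting to send until the response arrives."""
    # A new connection first needs SYN out and SYN-ACK back.
    handshake = 0 if connection_open else 2 * one_way_ms
    # Then the request travels out and the response travels back.
    return handshake + 2 * one_way_ms


if __name__ == "__main__":
    new = first_response_ms(40, connection_open=False)
    reused = first_response_ms(40, connection_open=True)
    print(new, reused, new - reused)
```

With the 40 ms one-way figure from above, a new connection gets its response after 160 ms and an already open one after 80 ms, which is the 80 ms difference quoted in the answer.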

Getting QuotaExceededException - What are the operation quota limitations for Azure Notification Hubs?

I was doing some latency/performance testing for sending push notifications with Azure Notification Hub by consecutively sending many notifications in a foreach loop. It worked fine for 100 "SendNotification" requests, although it was relatively slow (14s), but I got a QuotaExceededException for 1000 requests in a row:
[QuotaExceededException: The remote server returned an error: (403)
Forbidden. The request was terminated because the namespace
pushnotification-testing is being throttled. Please wait 60 seconds
and try again. TrackingId:...
Even when I don't wait for 60 seconds as advised, I can again execute 100 consecutive requests, but 1000 requests in a row always fail... Anything slightly above 100 consecutive requests fails most of the time...
I couldn't find any documentation on these limitations. This should be documented somewhere, so I can be sure Azure Notification Hubs will fit my needs.
The answer to this question says
There is throttling on the rate of CRUD operations. Quotas depend on the
tier you are on, but it is not going to be less than 2000 operations per
minute per namespace anyway. If the quota is exceeded, the service returns
403.
For me, it seems to be less than 2000 operations. By the way, I'm using "FREE" tier for testing, but I guess we would switch to "STANDARD" for production.
Has anyone similar experiences or knows where to look for more information?
In particular, what are the operation quota limitations per time frame for the different tiers of Azure Notification Hubs?
UPDATE1: It's weird, but sending 1000 requests in parallel works most of the time, while sending them consecutively fails on the 101st request.
To the best of my knowledge, NH currently has the following limits on the number of SENDS (not registrations) per namespace per minute per NH machine:
Free tier: 100
Basic tier: 900
Standard tier: 11500
Sending massively in parallel lets you send more because the calls are very likely to be routed to different machines.
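If you want a test loop to stay under a per-minute limit like the ones listed above, one option is to throttle on the client side. Below is a minimal Python sketch of a sliding-window limiter. The class and its methods are hypothetical and not part of any Azure SDK, the limit of 100 is just the free-tier figure quoted in this answer, and whether the service itself counts in a sliding or a fixed window is an assumption.

```python
import time
from collections import deque


class PerMinuteLimiter:
    """Client-side guard: allow at most `limit_per_minute` sends per 60 s."""

    def __init__(self, limit_per_minute, clock=time.monotonic):
        self.limit = limit_per_minute
        self.clock = clock          # injectable so it can be tested
        self.sent = deque()         # timestamps of sends in the last 60 s

    def seconds_until_allowed(self) -> float:
        """How long to wait before the next send fits in the window."""
        now = self.clock()
        # Forget sends that are at least 60 seconds old.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return 60 - (now - self.sent[0])

    def record_send(self) -> None:
        """Call right after each send attempt."""
        self.sent.append(self.clock())
```

In the send loop you would call seconds_until_allowed(), sleep for that long if it is non-zero, send the notification, and then call record_send().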