"missing" Haproxy sessions in stats page - haproxy

In my HAProxy config, I have a front end and several back ends. The stats page shows Sessions Cur = 1098 for the front end (CSV output below).
However, if I add up all the sessions for all its back-ends, I get nowhere close to that number (54).
Am I misunderstanding the meaning of "sessions - cur" on this page?
Or is the http-in front end discarding 95% of incoming sessions because they don't match a back end? If the latter, I would have thought I'd see a bunch of 503s being returned all the time, which I don't.
I subsequently thought these could be sessions stuck in TIME_WAIT on the client side, but that only accounts for 17%, not 95%.
In short, what's happening with these other 95% please?
Many thanks
# pxname,svname,qcur,qmax,scur,smax
http-in,FRONTEND,,,1098,1254
foo_web_zar_and_ws,bob91,0,0,0,1
foo_web_zar_and_ws,bob83,0,0,1,7
foo_web_zar_and_ws,BACKEND,0,0,1,7
foo_web_ned,bob91,0,0,0,0
foo_web_ned,bob83,0,0,0,0
foo_web_ned,BACKEND,0,0,0,0
foo_web_comms,bob91,0,0,0,2
foo_web_comms,bob83,0,0,0,2
foo_web_comms,BACKEND,0,0,0,2
bla_web_comms,bob10,0,0,9,46
bla_web_comms,bob91,0,0,3,32
bla_web_comms,bob83,0,0,3,62
bla_web_comms,BACKEND,0,0,15,85
bla_web_zar_and_ws,bob91,0,0,5,20
bla_web_zar_and_ws,bob83,0,0,7,36
bla_web_zar_and_ws,BACKEND,0,0,12,45
bla_web_ned,bob91,0,0,0,2
bla_web_ned,bob83,0,0,0,2
bla_web_ned,BACKEND,0,0,0,2
stats,FRONTEND,,,1,5
stats,BACKEND,0,0,0,1

If your site is really busy, that alone could explain that many sessions on the front end.
What comes to mind is the possibility that you have HTTP keep-alive enabled on the front end, and that there were 1098 connections being kept alive at that moment, almost all of them idle. An idle keep-alive connection still counts as a front-end session but isn't attached to any back end between requests, which would explain why the back-end totals are so much smaller.
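For reference, a keep-alive front end might look roughly like the sketch below; this is not your actual config, and the timeouts and backend name are only placeholders:
frontend http-in
    mode http
    bind *:80
    option http-keep-alive            # reuse client connections between requests
    timeout http-keep-alive 30s       # how long an idle keep-alive connection is held open
    timeout client 30s
    default_backend foo_web_zar_and_ws
If keep-alive is indeed the cause, lowering timeout http-keep-alive (or timeout client) should shrink the pool of idle front-end sessions.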

Related

REST API retrieving many subresources efficiently

Let's assume I have a REST API for a bulletin board with threads and their comments as a subresource, e.g.
/threads
/threads/{threadId}/comments
/threads/{threadId}/comments/{commentId}
The user can retrieve all threads with /threads, but what is an efficient/good way to retrieve all comments?
I know that HAL can embed subresources directly into a parent resource, but that potentially means sending a lot of data over the network, even if the client does not need the subresource. Also, I guess paging is difficult to implement (say one thread contains many hundreds of posts).
Should there be a different endpoint representing the SQL query where threadId in (..., ..., ...)? I'm having a hard time naming this endpoint in a strictly resource-oriented fashion.
Or should I just let the client retrieve each subresource individually? I guess this boils down to the N+1 problem. But maybe it's not such a big deal, as the client could start retrieving all subresources at once, and the responses should come back more or less simultaneously? A drawback I can think of is that this more or less forces the API client to use non-blocking IO (as otherwise the client may need to open 20 threads for a page size of 20, or even more), which might not be so straightforward in some frameworks. Also, with HTTP/1.1, browsers typically allow only about six simultaneous connections per host, right?
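To make that fan-out concrete, here is a minimal TypeScript sketch (the base URL is a made-up placeholder; the paths match the ones above):
// Fan out one comments request per thread and await them all together.
const BASE = "https://api.example.com"; // hypothetical
async function fetchJson<T>(path: string): Promise<T> {
  const res = await fetch(`${BASE}${path}`);
  if (!res.ok) throw new Error(`GET ${path} failed: ${res.status}`);
  return res.json() as Promise<T>;
}
async function fetchAllComments(threadIds: number[]): Promise<Map<number, unknown[]>> {
  // With HTTP/2 these requests share one connection; with HTTP/1.1 the
  // browser limits how many run in parallel per host.
  const results = await Promise.all(
    threadIds.map((id) => fetchJson<unknown[]>(`/threads/${id}/comments`))
  );
  const byThread = new Map<number, unknown[]>();
  threadIds.forEach((id, i) => byThread.set(id, results[i]));
  return byThread;
}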
I actually now lean toward the last option, with a focus on HTTP/2 and non-blocking IO (or even server push?), although some simpler clients may not support this. At least the API would be clean and would not have to be changed just to work around technical difficulties.
Is there any other option I have missed?

Is it possible to know the number of users connected by looking at HAProxy stats?

When I look at the stats page of my HAProxy, I wonder whether the term "session" represents the number of different users who are connected to my HAProxy.
Session shows the number of sessions, not users. Two sessions opened from the same IP address will be shown as two sessions. I don't know a better solution for you than to use socat and the "show sess" command (https://cbonte.github.io/haproxy-dconv/1.7/management.html#9.3-show%20sess),
then group and count sessions by unique src address.
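As a rough sketch of that counting step, using Node's net module instead of socat (the socket path is an assumption and depends on your stats socket setting):
import * as net from "net";
// Send "show sess" to the HAProxy admin socket and count distinct client addresses.
const sock = net.createConnection({ path: "/var/run/haproxy.sock" });
let output = "";
sock.on("connect", () => sock.write("show sess\n"));
sock.on("data", (chunk) => { output += chunk.toString(); });
sock.on("end", () => {
  // Each session line contains "src=<address>:<port>".
  const ips = new Set<string>();
  for (const m of output.matchAll(/src=([0-9a-fA-F.:]+):\d+/g)) {
    ips.add(m[1]);
  }
  console.log(`unique client addresses: ${ips.size}`);
});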

HAR file - access "Size" column entries from Chrome Dev Tools Network tab?

I am working on measuring the percentage of GET requests being handled / returned by a site's service worker. Within Chrome Dev Tools there is a "Size" column that shows "(from ServiceWorker)" for files matched by the cache.
When I right-click on any row, choose "Save as HAR with content", and then open the downloaded file in a text editor, searching for "service worker" does turn up some results (responses containing "statusText": "Service Worker Fallback Required"), but none of them look related to the fact that some requests were handled by the service worker.
Is this information I'm looking for accessible anywhere within the downloaded HAR file? Alternatively, could this be found out by some other means like capturing network traffic through Selenium Webdriver / ChromeDriver?
It looks like the content object defines the size of requests: http://www.softwareishard.com/blog/har-12-spec/#content
But I'm not seeing anything in a sample HAR file from airhorner.com that would help you determine that the request came from a service worker. Seems like a shortcoming in the HAR spec.
It looks like Puppeteer provides this information. See response.fromServiceWorker().
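For what it's worth, a minimal Puppeteer sketch of that approach (the URL is just the example site mentioned above; on a first visit the service worker may not be installed yet, so a second load can be needed):
import puppeteer from "puppeteer";
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Log every response that was fulfilled by a service worker.
  page.on("response", (response) => {
    if (response.fromServiceWorker()) {
      console.log("from service worker:", response.url());
    }
  });
  await page.goto("https://airhorner.com", { waitUntil: "networkidle0" });
  await page.reload({ waitUntil: "networkidle0" }); // second load, SW now active
  await browser.close();
})();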
I tried to investigate this a bit in Chrome 70. Here's a summary.
I'm tracking all requests for the https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.5/require.min.js URL, which is a critical script for my site.
TL;DR
As Kayce suggests, within a Chrome HAR file there is no explicit way of determining that an entry was handled by a service worker (as far as I can see). I also haven't been able to find a combination of existing HAR entry fields that would positively identify an entry as being handled by a service worker (but perhaps there is such a combination).
In any case, it would be useful for browsers to record any explicit relationships between HAR entries, so that tools like HAR Viewer could recognise that two entries are for the same logical request, and therefore not display two requests in the waterfall.
Setup
Clear cache, cookies, etc, using the Clear Cache extension.
First and second entries found in HAR
The first entry (below) looks like a request that is made by the page and intercepted/handled by the service worker. There is no serverIPAddress and no connection, so we can probably assume this is not a 'real' network request.
The second entry is also present as a result of the initial page load - there has been no other refresh/reload - you get 2 entries in the HAR for the same URL on initial page load (if it passes through a service worker and reaches the network).
The second entry (below) looks like a request made by the service worker to the network. We see the serverIPAddress and response.connection fields populated.
An interesting observation here is that entry#2's startedDateTime and time fall 'within' the startedDateTime and time of the 'parent' request/entry.
By this I mean entry#2's start and end time fall completely within entry#1's start and end time. Which makes sense as entry#2 is a kind of 'sub-request' of entry#1.
It would be good if the HAR spec had a way of explicitly recording this relationship. I.e. that request-A from the page resulted in request-B being sent by the service worker. Then a tool like HAR Viewer would not display two entries for what is effectively a single request (would this cover the case where a single fetch made by the page resulted in multiple service worker fetches?).
Another observation is that entry#1 records the request.httpVersion and response.httpVersion as http/1.1, whereas the 'real' request used http/2.0.
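Based on these observations, one heuristic (no guarantee it is reliable) is to treat entries that have neither serverIPAddress nor connection, and that aren't marked as cache hits, as candidates for "handled by the service worker". A sketch that scans a saved HAR file (the file name is just an example):
import { readFileSync } from "fs";
const har = JSON.parse(readFileSync("www.airhorner.com.har", "utf8"));
for (const entry of har.log.entries) {
  const noNetworkInfo = !entry.serverIPAddress && !entry.connection;
  const cacheHit = Boolean(entry._fromCache); // Chrome-specific field, discussed below
  if (noNetworkInfo && !cacheHit) {
    console.log("possible service-worker entry:", entry.request.url);
  }
}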
Third entry (from pressing enter in the address bar after initial page load)
This entry appears in the HAR as a result of pressing enter in the address bar. The _fromCache field is memory as expected, as the resource should be served from regular browser cache in this situation (the resource uses cache-control=public, max-age=30672000).
Questions:
Was this entry 'handled' by the service worker's fetch event?
Maybe when a resource is in memory cache the service worker fetch event isn't fired?
Or is service worker effectively 'transparent' here?
There are no serverIPAddress or connection fields, as expected, since there was no 'real' network request.
There is a pageref field present, unlike for entry#2 (entry#2 was a service worker initiated network request).
Fourth entry
The preparation work for this entry was:
Add the resource to a service-worker-cache (https://developer.mozilla.org/en-US/docs/Web/API/Cache).
Use Clear Cache extension to clear all caches (except service-worker-cache).
Reload page.
This entry has fromCache set to disk. I assume this is because the service-worker-cache was able to satisfy the request.
There is no serverIPAddress or connection field set, but the pageref is set.
Fifth entry
The preparation work for this entry was:
Use devtools to enter 'Offline' mode.
This entry is basically the same as entry#4.

How to design a REST API to fetch a large (ephemeral) data stream?

Imagine a request that starts a long running process whose output is a large set of records.
We could start the process with a POST request:
POST /api/v1/long-computation
The output consists of a large sequence of numbered records that must be sent to the client. Since the output is large, the server does not store all of it, and instead maintains a window of records with an upper limit on the window's size. Let's say it stores up to 1000 records (and pauses computation whenever that many records are available). When the client fetches records, the server may subsequently delete those records and continue generating more (as slots in the 1000-record window become free).
Let's say we fetch records with:
GET /api/v1/long-computation?ack=213
We can take this to mean that the server should return records starting from index 214. When the server receives this request, it can assume that the (well-behaved) client is acknowledging that records up to number 213 have been received, so it deletes them and then returns records starting from number 214, up to whatever is available at that time.
Next if the client requests:
GET /api/v1/long-computation?ack=214
the server would delete record 214 and return records starting from 215.
This seems like a reasonable design until it is noticed that GET requests need to be safe and idempotent (see section 9.1 in the HTTP RFC).
Questions:
Is there a better way to design this API?
Is it OK to keep it as GET even though it appears to violate the standard?
Would it be reasonable to make it a POST request such as:
POST /api/v1/long-computation/truncate-and-fetch?ack=213
One question that I always feel needs to be asked is: are you sure REST is the right approach for this problem? I'm a big fan and proponent of REST, but I try to apply it only to situations where it's applicable.
That being said, I don't think there's anything necessarily wrong with expiring resources after they have been used, but I think it's bad design to re-use the same url over and over again.
Instead, when I request the first set of results (maybe with):
GET /api/v1/long-computation
I'd expect that resource to give me a next link with the next set of results.
Although that particular URL design does sort of tell me that there's only one long-computation going on in the entire system at a time. If that's not the case, I would also expect a bit more uniqueness in the URL design.
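For example, the first response might carry the link to the next chunk (just a sketch of the shape, not a standard):
GET /api/v1/long-computation
200 OK
{
"records": [ /* first batch */ ],
"links": { "next": "/api/v1/long-computation/pages/2" }
}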
The best solution here is to buy a bigger hard drive. I'm assuming you've pushed back and that's not in the cards.
I would consider your operation to be "unsafe" as defined by RFC 7231, so I would suggest not using GET. I would also strongly advise you to not delete records from the server without the client explicitly requesting it. One of the principles REST is built around is that the web is unreliable. Under your design, what happens if a response doesn't make it to the client for whatever reason? If they make another request, any records from the lost response will be destroyed.
I'm going to second @Evert's suggestion that, unless you absolutely must keep this design, you instead pick a technology that's built around reliable delivery of information, such as a message queue. If you're going to stick with REST, you need to allow clients to tell you when it's safe to delete records.
For instance, is it possible to page records? You could do something like:
POST /long-running-operations?recordsPerPage=10
202 Accepted
Location: "/long-running-operations/12"
{
"status": "building next page",
"retry-after-seconds": 120
}
GET /long-running-operations/12
200 OK
{
"status": "next page available",
"current-page": "/pages/123"
}
-- or --
GET /long-running-operations/12
200 OK
{
"status": "building next page",
"retry-after-seconds": 120
}
-- or --
GET /long-running-operations/12
200 OK
{
"status": "complete"
}
GET /pages/123
{
// a page of records
}
DELETE /pages/123
// remove this page so new records can be made
You'll need to cap the page size at the number of records you can hold. If the client requests fewer than that limit, you can build up more records in the background while they process the first page.
That's just spitballing, but maybe you can start there. No promises on quality - this is totally off the top of my head. This approach is a little chatty, but it saves you from returning a 404 if the new page isn't ready yet.
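A client for this flow could be a simple polling loop; a TypeScript sketch against the hypothetical endpoints above:
// Poll the operation until a page is ready, fetch it, process it, then
// DELETE it so the server can free that window of records.
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));
async function drain(operationUrl: string): Promise<void> {
  while (true) {
    const status = await (await fetch(operationUrl)).json();
    if (status.status === "complete") return;
    if (status.status === "next page available") {
      const pageUrl = status["current-page"];
      const page = await (await fetch(pageUrl)).json();
      // ...process the page of records here...
      await fetch(pageUrl, { method: "DELETE" });
      continue;
    }
    // "building next page": wait and try again.
    await sleep((status["retry-after-seconds"] ?? 10) * 1000);
  }
}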

Avoid duplicate POSTs with REST

I have been using POST in a REST API to create objects. Every once in a while, the server will create the object, but the client will be disconnected before it receives the 201 Created response. The client only sees a failed POST request, and tries again later, and the server happily creates a duplicate object...
Others must have had this problem, right? But I google around, and everyone just seems to ignore it.
I have 2 solutions:
A) Use PUT instead, and create the (GU)ID on the client.
B) Add a GUID to all objects created on the client, and have the server enforce their UNIQUE-ness.
Option A doesn't match existing frameworks very well, and option B feels like a hack. How do other people solve this in the real world?
Edit:
With Backbone.js, you can set a GUID as the id when you create an object on the client. When it is saved, Backbone will do a PUT request. Make your REST backend handle PUTs to non-existing IDs, and you're set.
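Outside of Backbone, the same client-generated-id idea looks like this with plain fetch (a sketch; the /objects path is hypothetical):
// Option A: generate the id on the client and PUT to that id, so retrying
// the same request with the same id is idempotent.
async function createObject(data: Record<string, unknown>): Promise<void> {
  const id = crypto.randomUUID(); // available in modern browsers and Node 19+
  const res = await fetch(`/objects/${id}`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ id, ...data }),
  });
  if (!res.ok) throw new Error(`PUT /objects/${id} failed: ${res.status}`); // safe to retry with the SAME id
}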
Another solution that's been proposed for this is POST Once Exactly (POE), in which the server generates single-use POST URIs that, when used more than once, will cause the server to return a 405 response.
The downsides are that 1) the POE draft was allowed to expire without any further progress on standardization, and thus 2) implementing it requires changes to clients to make use of the new POE headers, and extra work by servers to implement the POE semantics.
By googling you can find a few APIs that are using it though.
Another idea I had for solving this problem is that of a conditional POST, which I described and asked for feedback on here.
There seems to be no consensus on the best way to prevent duplicate resource creation in cases where unique URI generation cannot be delegated to the client (so PUT cannot be used) and hence POST is needed.
I always use B: detection of duplicates, whatever problem caused them, belongs on the server side.
Detection of duplicates is a kludge, and can get very complicated. Genuine distinct but similar requests can arrive at the same time, perhaps because a network connection is restored. And repeat requests can arrive hours or days apart if a network connection drops out.
All of the discussion of identifiers in the other answers has the goal of returning an error in response to duplicate requests, but this will normally just incite a client to get or generate a new id and try again.
A simple and robust pattern to solve this problem is as follows: server applications should store all responses to unsafe requests; then, if they see a duplicate request, they can repeat the previous response and do nothing else. Do this for all unsafe requests and you will solve a bunch of thorny problems. Repeat DELETE requests will get the original confirmation, not a 404 error. Repeat POSTs do not create duplicates. Repeated updates do not overwrite subsequent changes. And so on.
"Duplicate" is determined by an application-level id (that serves just to identify the action, not the underlying resource). This can be either a client-generated GUID or a server-generated sequence number. In this second case, a request-response should be dedicated just to exchanging the id. I like this solution because the dedicated step makes clients think they're getting something precious that they need to look after. If they can generate their own identifiers, they're more likely to put this line inside the loop and every bloody request will have a new id.
Using this scheme, all POSTs are empty, and POST is used only for retrieving an action identifier. All PUTs and DELETEs are fully idempotent: successive requests get the same (stored and replayed) response and cause nothing further to happen. The nicest thing about this pattern is its Kung-Fu (Panda) quality. It takes a weakness: the propensity for clients to repeat a request any time they get an unexpected response, and turns it into a force :-)
I have a little Google doc here if anyone cares.
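A minimal sketch of the store-and-replay idea, using Express and a client-supplied key; the Idempotency-Key header, the /orders route, and the in-memory store are assumptions for illustration, not part of the pattern described above:
import express from "express";
const app = express();
app.use(express.json());
// Stored responses to unsafe requests, keyed by the action id.
const replies = new Map<string, { status: number; body: unknown }>();
app.post("/orders", (req, res) => {
  const key = req.header("Idempotency-Key");
  if (!key) {
    res.status(400).json({ error: "Idempotency-Key header required" });
    return;
  }
  const previous = replies.get(key);
  if (previous) {
    // Duplicate: repeat the stored response and do nothing else.
    res.status(previous.status).json(previous.body);
    return;
  }
  const order = { id: replies.size + 1, ...req.body }; // the actual side effect
  const reply = { status: 201, body: order };
  replies.set(key, reply);
  res.status(reply.status).json(reply.body);
});
app.listen(3000);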
You could try a two step approach. You request an object to be created, which returns a token. Then in a second request, ask for a status using the token. Until the status is requested using the token, you leave it in a "staged" state.
If the client disconnects after the first request, they won't have the token and the object stays "staged" indefinitely or until you remove it with another process.
If the first request succeeds, you have a valid token and you can grab the created object as many times as you want without it recreating anything.
There's no reason why the token can't be the ID of the object in the data store. You can create the object during the first request. The second request really just updates the "staged" field.
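As an HTTP exchange, that two-step flow might look like this (URLs and fields are illustrative only):
POST /objects
201 Created
{ "token": "abc123", "status": "staged" }
GET /objects/abc123/status
200 OK
{ "status": "active" }   // the first status request promotes it out of "staged"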
Server-issued Identifiers
If you are dealing with the case where it is the server that issues the identifiers, create the object in a temporary, staged state. (This is an inherently non-idempotent operation, so it should be done with POST.) The client then has to do a further operation on it to transfer it from the staged state into the active/preserved state (which might be a PUT of a property of the resource, or a suitable POST to the resource).
Each client ought to be able to GET a list of their resources in the staged state somehow (maybe mixed with other resources) and ought to be able to DELETE resources they've created if they're still just staged. You can also periodically delete staged resources that have been inactive for some time.
You do not need to reveal one client's staged resources to any other client; they need exist globally only after the confirmatory step.
Client-issued Identifiers
The alternative is for the client to issue the identifiers. This is mainly useful where you are modeling something like a filestore, as the names of files are typically significant to user code. In this case, you can use PUT to do the creation of the resource as you can do it all idempotently.
The down-side of this is that clients are able to create IDs, and so you have no control at all over what IDs they use.
There is another variation of this problem. Having a client generate a unique id means that we are asking a customer to solve this problem for us. Consider an environment with publicly exposed APIs and hundreds of clients integrating with them. Practically, we have no control over the client code or the correctness of its implementation of uniqueness. Hence, it would probably be better to have server-side intelligence for understanding whether a request is a duplicate. One simple approach here would be to calculate and store a checksum of every request based on attributes of the user input, define a time threshold (x minutes), and compare every new request from the same client against those received in the past x minutes. If the checksum matches, it could be a duplicate request, and we can add a challenge mechanism for the client to resolve it.
If a client is making two different requests with the same parameters within x minutes, it might be worth ensuring that this is intentional, even if they come with unique request ids.
This approach may not be suitable for every use case; however, I think it will be useful for cases where the business impact of executing the second call is high and can potentially cost a customer. Consider a payment processing engine where an intermediate layer ends up retrying a failed request, or a customer double-clicks and the client layer ends up submitting two requests.
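A rough sketch of that checksum idea (the hash, the window length, and the in-memory store are all arbitrary choices here):
import { createHash } from "crypto";
// Per-client sliding window of recent request checksums. A repeat of the
// same checksum within WINDOW_MS is flagged as a possible duplicate.
const WINDOW_MS = 5 * 60 * 1000; // "x mins"
const recent = new Map<string, Map<string, number>>(); // clientId -> checksum -> lastSeen
function isPossibleDuplicate(clientId: string, payload: unknown): boolean {
  const checksum = createHash("sha256").update(JSON.stringify(payload)).digest("hex");
  const now = Date.now();
  const seen = recent.get(clientId) ?? new Map<string, number>();
  for (const [sum, at] of seen) {
    if (now - at > WINDOW_MS) seen.delete(sum); // drop entries outside the window
  }
  const duplicate = seen.has(checksum);
  seen.set(checksum, now);
  recent.set(clientId, seen);
  return duplicate; // the caller can then challenge the client
}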
Design
Automatic (without the need to maintain a manual black list)
Memory optimized
Disk optimized
Algorithm [solution 1]
REST arrives with UUID
Web server checks whether the UUID is in the memory-cache blacklist table (if yes, answer 409)
Server writes the request to the DB (if it was not already filtered by ETS, the in-memory cache)
DB checks whether the UUID has been seen before, prior to writing
If yes, answer 409 to the server, and add the UUID to the memory-cache and disk blacklists
If not repeated, write to the DB and answer 200
Algorithm [solution 2]
REST arrives with UUID
Save the UUID in the memory-cache table (expires after 30 days)
Web server checks whether the UUID is in the memory-cache blacklist table [if found, return HTTP 409]
Server writes the request to DB [return HTTP 200]
In solution 2, the memory-cache blacklist is maintained ONLY in memory, so the DB is never checked for duplicates. The definition of 'duplicate' here is "any request that arrives within a given period of time". We also replicate the memory-cache table on disk, so we can load it before starting up the server.
In solution 1, there will never be a duplicate, because we always check the disk ONLY once before writing, and if it is a duplicate, subsequent round trips are handled by the memory cache. This solution is better for BigQuery, because requests there are not idempotent, but it's also less optimized.
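A bare-bones in-memory version of the solution 2 check (a Map with a TTL stands in for the memory-cache table; the on-disk replication mentioned above is omitted):
// Remember seen UUIDs in memory with a TTL; repeats within the window get 409
// without the DB ever being consulted.
const TTL_MS = 30 * 24 * 60 * 60 * 1000; // "expires after 30 days"
const seenUuids = new Map<string, number>(); // uuid -> first-seen timestamp
function acceptRequest(uuid: string): 200 | 409 {
  const now = Date.now();
  const firstSeen = seenUuids.get(uuid);
  if (firstSeen !== undefined && now - firstSeen < TTL_MS) {
    return 409; // duplicate within the window
  }
  seenUuids.set(uuid, now);
  // ...write the request to the DB here...
  return 200;
}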