Asynchronouos Socket Communication & Heap fragmentation - sockets

I wrote a multithreaded Socket Server application which accepts over a 1000 concurrent connections. Recently we had application crash; after analyzing the dump files came to know app has crash due to heap corruption. I found the same issue discussed in following links.
.NET Does NOT Have Reliable Asynchronouos Socket Communication?
http://support.microsoft.com/kb/947862
And also discussion suggest 3 solutions.
The network application should have an upper bound on the number of outstanding asynchronous IO that it posts.
Use Microsoft CCR
Use TPL
Due to the time factor, I thought to stick with #1, but I don't have a clear picture how to implement this. Can some one give a good starting point please?
And also has anyone used Async with TPL to solve this issue?

You mean a better starting point than the blog posting that I linked to in the answer that you refer to?
The issue is this:
Memory and other per-operation resources that are used during an async write are often "in use" until the remote peer's TCP stack acks the data and the local stack can complete your async write operation to tell you that you can reuse your buffer.
The local peer has no control over this as it's all governed by the speed at which the remote peer reads data from its socket and the congestion on the link between the two peers.
Because of the above you need to have a hard limit on the amount of async writes that you have outstanding at any one time. You can track this by incrementing a counter just before you issue an async write and decrementing it in the completion handler.
What you do once you hit that limit is up to you. In the original article I favour a queue that data to be written is placed into. This queue can then be used as a source of data as write completions occur. Once the queue is empty you can send normally again. Of course this simply moves the problem - you still have a memory resource that's controlled by the remote peer (the queued data) but you don't also have other OS resources used too (non-paged pool, I/O page lock limit, etc).
You could simply stop your peer sending when you reach your limit - and now the API that you build over the async API needs to have a 'can't sent at the moment, try again later' return from a send which previously used to always "work".
If you're doing this I would also seriously look at avoiding the pinned memory issue by allocating a large block of buffers in one contiguous block and using them from the pool.

First, that's a very old KB article. How are you sure you have that particular problem?
Then, as Hans Passant answers in the SO question, if you write bad async code, it will bite you. If you don't take care of your resources (and memory buffers are resources), a concurrent program will face memory errors
It's very hard to write good concurrent code using raw Threads and TPL does make it easier but it won't fix the bugs you already have. In fact, unless you identify your current problems you are likely to transfer them to the version that uses TPL.
Without knowing the specific problem that caused your application to crash, I can only make some suggestions:
Use BufferManager to reuse memory buffers instead of allocating new ones.
Use a queue to store requests and process them asynchronously instead of starting a new thread for each request.
There are other techniques you can use as well, depending on the type of application you are building. Eg you could use TPL DataFlow to break processing in independent steps.
As for CCR, there is not much point in using it outside Robotics Studio. TPL contains most of the relevant functionality you need to write concurrent apps.

Related

Connection pool and File handles

We use Retrofit/OkHttp3 for all network traffic from our Android application. So far everything seems to run quite smoothly.
However, we have now occasionally had our app/process run out of file handles.
Android allows for a max of 1024 file handles per process
OkHttp will create a new thread for each async call
Each thread created this way will (from our observation) be responsible for 3 new file handles (2 pipes and one socket).
We were able to debug this exactly, where each dispached async call using .enqueue() will lead to an increase of open file handles of 3.
The problem is, that the ConnectionPool in OkHttp seems to be keeping the connection threads around for much longer than they are actually needed. (This post talks of five minutes, though I haven't seen this specified anywhere.)
That means if you are quickly dispatching request, the connection pool will grow in size, and so will the number of file handles - until you reach 1024 where the app crashes.
I've seen that it is possible to limit the number of parallel calls with Dispatcher.setMaxRequests()(although it seems unclear whether this actually works, see here) - but that still doesn't quite address the issue with the open threads and file handles piling up.
How could we prevent OkHttp from creating too many file handles?
I am answering my own question here to document this issue we had. It took us a while to figure this out and I think others might encounter this too and might be glad for this answer.
Our problem was that we created one OkHttpClient per request, as we used it's builder/interceptor API to configure some per-request parameters like HTTP headers or timeouts.
By default each OkHttpClient comes with its own connection pool, which of course blows up the number of connections/threads/file handles and prevents proper reuse in the pool.
Our solution
We solved the problem by manually creating a global ConnectionPool in a singleton, and then passing that to the OkHttpClient.Builder object which builds the actual OkHttpClient.
This still allows for per-request configuration using the OkHttpClient.Builder
Makes sure all OkHttpClient instances are still using a common connection pool.
We were then able to properly size the global connection pool.

OSB: Analyzing memory of proxy service

I have multiple proxies in a message flow.Is there a way in OSB by which I can monitor the memory utilization of each proxy ? I'm getting OOM, want to investigate which proxy is eating away all/most memory.
Thanks !
If you're getting OOME then it's either because a proxy is not freeing up all the memory it uses (so will eventually fail even with one request at a time), or you use too much memory per invocation and it dies over a certain threshold but is fine under low load. Do you know which it is?
Either way, you will want to generate a heap dump on OOME so you can investigate what's going on. It's annoying but sometimes necessary. A colleague had to do that recently to fix some issues (one problem was an SB-transport platform bug, one was a thread starvation issue due to a platform work manager bug, the last one due to a Muxer bug when used in exalogic).
If it just performs poorly under load, then you'll need to do the usual OSB optimisations, like use fewer Assign steps (but assign more variables per step), do a lot more in xquery rather than proxy steps, especially loops that don't need a service callout, since they can easily be rolled into a for loop in xquery; you know, all the standard stuff.

Difference between shared memory IPC mechanism and API/system-call invocation

I am studying about operating systems(Silberscatz, Galvin et al). My programming experiences are limited to occasional coding of exercise problems given in a programing text or an algorithm text. In other words I do not have a proper application programming or system programming experience. I think my below question is a result of a lack of experience of the above and hence a lack of context.
I am specifically studying IPC mechanisms. While reading about shared memory(SM) I couldn't imagine a real life scenario where processes communicate using SM. An inspection of processes attached to the same SM segment on my linux(ubuntu) machine(using 'ipcs' in a small shell script) is uploaded here
Most of the sharing by applications seem to be with the X deamon. From what I know , X is the process responsible for giving me my GUI. I infered that these applications(mostly applets which stay on my taskbar) share data with X about what needs to change in their appearances and displayed values. Is this a reasonable inference??
If so,
my question is, what is the difference between my applications communicating with 'X' via shared memory segments versus my applications invoking certain API's provided by 'X' and communicate to 'X' about the need to refresh their appearances?? BY difference I mean, why isn't the later approach used?
Isn't that how user processes and the kernel communicate? Application invokes a system call when it wants to, say read a file, communicating the name of the file and other related info via arguments of the system call?
Also could you provide me with examples of routinely used applications which make use of shared memory and message-passing for communication?
EDIT
I have made the question more clearer. I have formatted the edited part to be bold
First, since the X server is just another user space process, it cannot use the operating system's system call mechanism. Even when the communication is done through an API, if it is between user space processes, there will be some inter-process-communication (IPC) mechanism behind that API. Which might be shared memory, sockets, or others.
Typically shared memory is used when a lot of data is involved. Maybe there is a lot of data that multiple processes need to access, and it would be a waste of memory for each process to have its own copy. Or a lot of data needs to be communicated between processes, which would be slower if it were to be streamed, a byte at a time, through another IPC mechanism.
For graphics, it is not uncommon for a program to keep a buffer containing a pixel map of an image, a window, or even the whole screen that then needs to be regularly copied to the screen. Sometimes at a very high rate...30 times a second or more. I suspect this is why X uses shared memory when possible.
The difference is that with an API you as a developer might not have access to what is happening inside these functions, so memory would not necessarily be shared.
Shared Memory is mostly a specific region of memory to which both apps can write and read from. This off course requires that access to that memory is synchronized so things don't get corrupted.
Using somebody's API does not mean you are sharing memory with them, that process will just do what you asked and perhaps return the result of that operation to you, however that doesn't necessarily go via shared memory. Although it could, it depends, as always.
The preference for one over another I'd say depends on the specifications of the particular application and what it is doing and what it needs to share. I can imagine that a big dataset of some kind or another would be shared by shared memory, but passing a file name to another app might only need an API call. However largely dependent on requirements I'd say.

How does Scala's Lift manage state?

I'm quite impressed by what Lift 2.0 brings to the table with Actors and StatefulSnippets, etc, but I'm a little worried about the memory overhead of these things. My question is twofold:
How does Lift determine when to garbage collect state objects?
What does the memory footprint of a page request look like?
If a web crawler dances across the footprint of the site, are they going to be opening up enough state objects to drown out a modest VPS (512M)? The question is very obviously application dependent, but I'm curious if anyone has any real world figures they can throw out at me.
Lift stores state information in a session, so once the session is destroyed the state associated with that session goes away.
Within the session, Lift tracks each page that state is allocated for (e.g., mapping between an ajax button in the browser and a function on the server) and have a heart-beat from the browser. Functions for pages that have not seen the heartbeat in 10 minutes are unreferenced so the JVM can garbage collection them. All of this is tunable, so you can change heart-beat frequency, function lifespan, etc., but in practice the defaults work quite well.
In terms of session explosion, yeah... that's a minor issue. Popular sites (including http://demo.liftweb.net/ ) experience it. The example code (see http://github.com/lift/lift/tree/master/examples/example/ ) detects sessions that were created by a single request and then abandoned and expires those early. I'm running demo.liftweb.net with 256MB of heap size (that'd fit in a 512MB VPS) and occasionally, the session count rises over 1,000, but that's quickly tamped down for search engine traffic.
I think the question about memory footprint was once answered somewhere on the mailing list, but I can’t find it at the moment.
Garbage collection is done after some idle time. There is, however, an example on the wiki which uses some better heuristics to kill off sessions spawned by web crawlers.
Of course, for your own project it makes sense to check memory consumption with something like VisualVM while spawning a couple of sessions yourself.

Running a socket stream in a thread

I have an application that opens a connection with 2 sockets (in and out) and I want to have them working in a thread.
The reason that I want them to be in a separate thread is that I don't want my application to freeze when I receive data, and this can happen anytime as long as the application is running.
Currently I have a class that handle also network communication and I run this class in an NSOperation, I'm not sure if it's the best solution.
I'm not very familiar with threading so guys if you could give me some help I would be very grateful.
Thanks
First, you should know that you can use the same socket to send and receive data — they're generally bi-directional. You should be able to share a reference to the same socket among multiple threads of execution.
Second, unless you'll be receiving large amounts of data and have experienced performance issues with your UI, I would delay optimizing for it. (Don't get me wrong, this is a good consideration, but premature optimization is the root of all evil, and simpler is generally better if it performs adequately.)
Third, NSOperation objects are "single-shot", meaning that once the main method completes, the operation task cannot be used again. This may or may not be conducive to your networking model. You might also look at NSThread. The fact that you already have the functionality "factored out" bodes well for your design, whatever turns out to be best.
Lastly, threading is a complex topic, but a good place to start (especially for Objective-C) is Apple's Threading Programming Guide.