How much data can a socket store in its buffer before read? - sockets

I have a client / server application where clients send messages to the server. Due to a legacy library that I use, my server cannot read immediately but must wait for a condition to come true until it reads the messages. How much data can a socket store? Is there a fixed buffer size/limit?
Thanks.

It depends on the size of the socket receive buffer, whose default value varies among operating systems. You can control it from your application via setsockopt() and the SO_RCVBUFSIZE option.

That depends on many factors of which many you can't control. This is not the correct approach.
You should read the data as soon as it is available, but only process it if the conditions are met.
Edit: I guess I misinterpreted the question, see #EJP's answer.

Related

Sending custom data to socket using eBPF

I'm trying, finally, to understand eBPF and maybe use it in an upcoming project.
For sake of simplicity I started with reading bcc documentation.
In my project I'll need to send some data over network upon some kernel function calls.
Can that be done without sending the data to userspace first?
I see that I can redirect skbs from one socket to another etc., and I see that I can submit custom data to user space. Is there a way to get the best of both worlds?
EDIT: I'm trying to log some file system events to another server that'll collect this data from multiple machines. Those machines can be fairly busy in some situations. It should be real time and with low latency.
I'd love to avoid going through userspace to prevent copying the data back and forth and to reduce sw overhead as much as possible.
Thank you all!
It seems this question can be summarized to: is it possible to send data over the network from a BPF tracing program (kprobes, tracepoints, etc.)?
The answer to that question is no. As far as I know, there are currently no way to craft and send packets over the network from BPF programs. You can resend a received packet to the network with some helpers, but they are only available to networking BPF programs.

What is the purpose of quickfix message store?

What is the purpose of the message store structure in quickfix? I understand that you can log all incoming and outgoing fix messages via the message store interface and quickfix provides multiple implementations like file store etc.
My question is why do you even care about the message store other than logging your fix messages for the record?
You are confusing the MessageStore and the Log, which are two different things.
The MessageStore is for internal engine use. It tracks the current incoming and outgoing message sequence numbers, session start time, and other stuff. If your app goes down for whatever reason, when it restarts, it uses the MessageStore to resume where it left off with regards to sequence number and whether to reset the session.
The Log, however, is just a log. The engine doesn't really care about it. It's for the developers.

design high volume MSMQ

We have many communication servers sending data packets. We would like to store these data packets coming from these server programs in MSMQ until an updater will process them. Data loss has been a concern and we would like to not lose any data packet coming from these server programs and want an efficient and performant solution.
What will be the best design approach?
Well, there are two basic things you need to do to get started. First, you'll want to modify the default installation to move the storage location to a drive that is mirrored and/or is not the same as the one that the operating system boots from on that server. Also you'll want to ensure there is enough space there to hold messages as they are queued, depending on the volume you're contemplating. This article covers that.
Second, you'll want to use transactions and journaling to ensure reliability. This is both a programming and infrastructure issue, so you can start by looking at this article, and then following up with a general guide on how to program against MSMQ correctly. This for example is a good starting point if you've never used MSMQ, although it's fairly basic. If you're going to use MSMQ as a binding/transport for WCF then you have the plumbing part pretty much covered; it's just a matter of configuring your services to handle the volume and traffic you think you're going to see.
We have many communication servers sending data packets.
When storing 'data packets', I would recommend writing [Serializable] .NET objects to WCF, mainly because WCF can read/write them transparently to MSMQ. This will be easier to work with, but if your data packets are say TCP/IP or binary packets, you will need to turn on 'Ordering', to ensure they go into the queue in the exact order they were placed.
MSMQ also has sessions, so if you want to group items together this is possible. WCF does not make this guarantee. You will need to write custom code for this, but it is only a case of assigning a unique ID to each message in a particular session.
Data loss has been a concern and we would like to not lose any data packet coming from these server programs
MSMQ can persist the data to disk, so if a server goes down, its queue is preserved. MSMQ can hold the queue in memory, which is more efficient but crashes/restarts will not retain the queue information.
and want an efficient( good performance )
MSMQ is fairly performant. The persistence to disk has a small overhead, but only due to the disk write. If performance includes multi-threading, MSMQ does not offer this feature as the queue is sequential, so must be processed in order. But this is typical of queue technologies.
MSMQ also have 4MB max message size, so keep in mind what you want to send across the network.
The only other thing is that MSMQ is not massively scalable. Its primary goal is guaranteed delivery. If you post millions of packets, they will get to their destination, but MSMQ does have a finite ability to push the messages to other machines. It operates a ThreadPool-like system, so it will not scale if this is also a requirement.
I have also added info to the #msmq-wcf wiki with a basic example of writing data.

serving large file using select, epoll or kqueue

Nginx uses epoll, or other multiplexing techniques(select) for its handling multiple clients, i.e it does not spawn a new thread for every request unlike apache.
I tried to replicate the same in my own test program using select. I could accept connections from multiple client by creating a non-blocking socket and using select to decide which client to serve. My program would simply echo their data back to them .It works fine for small data transfers (some bytes per client)
The problem occurs when I need to send a large file over a connection to the client. Since i have only one thread to serve all client till the time I am finished reading the file and writing it over to the socket i cannot resume serving other client.
Is there a known solution to this problem, or is it best to create a thread for every such request ?
When using select you should not send the whole file at once. If you e.g. are using sendfile to do this it will block until the whole file has been sent. Instead use a small buffer, and send a little data at a time to each client. Then use select to identify when the socket is again ready to be written to and send some more until all data has been sent. This will allow you to handle multiple clients in parallel.
The simplest approach is to create a thread per request, but it's certainly not the most scalable approach. I think at this time basically all high-performance web servers use various asynchronous approaches built on things like epoll (Linux), kqueue (BSD), or IOCP (Windows).
Since you don't provide any information about your performance requirements, and since all the non-threaded approaches require restructuring your application to use these often-complex asynchronous techniques (as described in the C10K article and others found from there), for now your best bet is just to use the threaded approach.
Please update your question with concrete requirements for performance and other relevant data if you need more.
For background this may be useful reading http://www.kegel.com/c10k.html
I think you are using your callback to handle a single connection. This is not how it was designed. Your callback has to handle the whatever-thousand of connections you are planning to serve, i.e from the number of file descriptor you get as parameter, you have to know (by reading the global variables) what to do with that client, either read() or send() or ... whatever

socket programming: How do I handle out of band data

I just looked into wikipedia's entry on out-of-band data and as far as I understand, OOB data is somehow flagged more important and treated as ordinary data, but transmitted in a seperate stream, which profoundly confuses me.
The actual question would be (besides "Could someone explain what OOB data is?"):
I'm writing a unix application that uses sockets and need to make use of select() and was wondering what to do with the exceptfds parameter? Do I need to put all my sockets into this parameter and react to such events? Or do I just ignore them?
I know you've decided you don't need to handle OOB data, but here are some things to keep in mind if you ever do care about OOB...
IPv4 doesn't really send OOB data on a separate channel, or at a different priority. It is just a flag on the packet.
OOB data is extremely limited -- 1 byte!
OOB data can be received either inline or separately depending on socket options
An "exception" signaling OOB data may occur even if the next read doesn't contain the OOB data (the network stack on the sender may flag any already queued data, so the other side will know there's OOB ASAP). This is often handled by entering a "drain" loop where you discard data until the actual OOB data is available.
If this seems a bit confusing and worthless, that's because it mostly is. There are good reasons to use OOB, but it's rare. One example is FTP, where the user may be in the middle of a large transfer but decide to abort. The abort is sent as OOB data. At that point the server and client just eat any further "normal" data to drain anything that's still in transit. If the abort were handled inline with the data then all the outstanding traffic would have to be processed, only to be dumped.
It's good to be aware that OOB exists and the basics of how it works, just in case you ever do need it. But don't bother learning it inside-out unless you're just curious. Chances are decent you may never use it.
I think I found the answer on this page. In short:
I don't need to handle OOB data on the receiving side if I'm not sending any OOB data. I had thought that OOB data could be generated by the OS of the sender.
You don't need to handle it at the receiving end even if you are sending it - OOB data is transparently ignored in all circumstances unless you actively go about receiving it.