I've been reading a lot about OpenSSL, specifically the TLS and DTLS APIs. Most of it makes sense; it's a pretty intuitive API once you understand it. One thing really has me scratching my head, though...
When/why would I use BIOs?
For example, this wiki page demonstrates setting up a barebones TLS server. There isn't even a mention of BIOs anywhere in the example.
Now this page uses BIOs exclusively, never using the read and write functions of the SSL struct. Granted, it's from 2013, but it's not the only one that uses BIOs.
To make it even more confusing this man page suggests that the SSL struct has an "underlying BIO" without ever needing to set it explicitly.
So why would I use BIOs if I can get away with using SSL_read() and SSL_write()? What are the advantages? Why do some examples use BIOs and others don't? What Is the Airspeed Velocity of an Unladen Swallow?
BIOs are always there, but they might be hidden by the simpler interface. Directly using the BIO interface is useful if you want more control, with more effort. If you just want to use TLS on a TCP socket, then the simple interface is usually sufficient. If you instead want to use TLS on your own underlying transport layer, or if you want to have more control over how it interacts with the transport layer, then you need BIOs.
An example of such a use case is this proposal, where TLS is tunneled as JSON inside HTTPS, i.e. the TLS frames are encoded in JSON, which is then transferred using POST requests and responses. This can be achieved by handling the TLS with memory BIOs, which are then encoded to and decoded from JSON.
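The C-level pattern behind this (two BIO_new(BIO_s_mem()) calls, then SSL_set_bio()) is mirrored almost exactly by Python's ssl.MemoryBIO, so the idea can be sketched runnably in Python for brevity. The hostname and the JSON envelope here are made up for illustration:

```python
import base64
import json
import ssl

# Two memory BIOs stand in for the network: the TLS engine reads records
# from `incoming` and writes records to `outgoing`.
incoming = ssl.MemoryBIO()
outgoing = ssl.MemoryBIO()

ctx = ssl.create_default_context()
tls = ctx.wrap_bio(incoming, outgoing, server_hostname="example.com")

try:
    tls.do_handshake()          # no data from the "server" yet...
except ssl.SSLWantReadError:
    pass                        # ...so the engine asks us to pump the BIOs

# The ClientHello now sits in the outgoing BIO. We are free to transport
# it however we like -- here, base64 inside a JSON document.
record = outgoing.read()
envelope = json.dumps({"tls": base64.b64encode(record).decode("ascii")})
print(len(record), "bytes of handshake data, tunneled as JSON")
```

The receiving side would decode the JSON, write the bytes into its own incoming memory BIO, and keep pumping until the handshake completes.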
First, your Q is not very clear. SSL is (a typedef for) a C struct type, and you can't use the dot operator on a struct type in C, only an instance. Even assuming you meant 'an instance of SSL', as people sometimes do, in older versions (through 1.0.2) it did not have members read and write, and in 1.1.0 up it is opaque -- you don't even know what its members are.
Second, there are two different levels of BIO usage applicable to the SSL library. The SSL/TLS connection (represented by the SSL object, plus some related things linked to it like the session) always uses two BIOs to respectively send and receive protocol data -- including both protocol data that contains the application data you send with SSL_write and receive with SSL_read, and the SSL/TLS handshake that is handled within the library. Much as Steffen describes, these normally are both set to a socket-BIO that sends to and receives from the appropriate remote host process, but they can instead be set to BIOs that do something else in-between, or even instead. (This normal case is automatically created by SSL_set_{,r,w}fd which it should be noted on Windows actually takes a socket handle -- but not any other file handle; only on Unix are socket descriptors semi-interchangeable with file descriptors.)
Separately, the SSL/TLS connection itself can be 'wrapped' in an ssl-BIO. This allows an application to handle an SSL/TLS connection using mostly the same API calls as a plain TCP connection (using a socket-BIO) or a local file, as well as the provided 'filter' BIOs like a digest (md) BIO or a base64 encoding/decoding BIO, and any additional BIOs you add. This is the case for the IBM webpage you linked (which is for a client not a server BTW). This is similar to the Unix 'everything is (mostly) a file' philosophy, where for example the utility program grep, by simply calling read on fd 0, can search data from a file, the terminal, a pipe from another program, or (if run under inetd or similar) from a remote system using TCP (but not SSL/TLS, because that isn't in the OS). I haven't encountered many cases where it is particularly beneficial to be able to easily interchange SSL/TLS data with some other type of source/sink, but OpenSSL does provide the ability.
Background/Context
I have two scripts: a server-side script that can handle multiple clients, and a client-side script that connects to the server. Any client that sends a message to the server has that message copied/echoed to all the other connected clients.
Where I'm stuck.
This afternoon, I have been grasping at thin air searching for a thorough explanation, with examples, covering all there is to Perl and TCP sockets. A surprisingly large number of results from Google still list articles from 2007-2012. It appears that originally there was the 'Socket' module, and over time IO::Socket was added, then IO::Select. But the Perldoc pages don't cover or reference everything in one place, or provide sufficient cross-referencing links. I gather that most of the raw calls in Socket have an equivalent in IO::Socket. And it's possible (recommended? yes/no?) to do a functional call on the socket if something isn't available via the OO modules...
Problem 1. The far-side/peer has disconnected / the socket is no longer ESTABLISHED?
I have been trying everything I ran across today, including IO::Select with calls to can_read and has_exception, but the outputs from these show no differences regardless of whether the socket is up or down. I confirmed from netstat output that the non-blocking socket is torn down instantly by the OS (macOS).
Problem 2. Is there data available to read?
For my previous Perl client scripts, I have rolled my own method using sysread (https://perldoc.perl.org/functions/sysread.html), but today I noticed that recv is listed within the synopsis near the top of https://perldoc.perl.org/IO/Socket.html, yet there is no mention of the recv method in the detailed info below...
From other C and Java documentation pages, I gather there is a convention of returning undef, 0, >0, and on some implementations -1 when doing the equivalent of sysread. Is there an official Perl spec someone can link me to that describes what Perl has implemented? Is sysread or recv the 'right' way to be reading from TCP sockets in the first place?
I haven't provided my code here because I'm asking from a 'best-practices' point of view: what is the 'right' way to do client-server communication? Is polling even the right approach to begin with? Is there an event-driven method that I've somehow missed?
My sincere apologies if what I've asked for is already available, but google keeps giving me the same old result pages and derivative blogs/articles that I've already read.
Many thanks in advance.
And it's possible (recommended? yes/no?) to do a functional call on the socket if something isn't available via the OO modules...
I'm not sure which functional calls you're referring to that are not available in IO::Socket. But in general, IO::Socket objects are also normal file handles. This means you can do things like $server->accept but also accept($server).
Problem 1. The far-side/peer has disconnected / the socket is no longer ESTABLISHED?
This problem is not specific to Perl but how select and the socket API work in general. Perl does not add its own behavior in this regard. In general: If the peer has closed the connection then select will show that the socket is available for read and if one does a read on the socket it will return no data and no error - which means that no more data are available to read from the peer since the peer has properly closed its side of the connection (connection close is not considered an error but normal behavior). Note that it is possible within TCP to still send data to the peer even if the peer has indicated that it will not send any more data.
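This behavior (the socket selects as readable, then the read returns zero bytes with no error) is easy to verify in any language with a select interface; here is a small sketch, in Python for brevity, using a connected socket pair:

```python
import select
import socket

a, b = socket.socketpair()   # a connected pair standing in for client/server
b.close()                    # the peer performs an orderly close

# select() reports the socket as readable even though no data will arrive...
readable, _, _ = select.select([a], [], [], 1.0)
assert a in readable

# ...and the read itself returns zero bytes: end-of-file, not an error.
data = a.recv(4096)
print("recv returned", len(data), "bytes")   # 0 bytes => peer closed
a.close()
```

In Perl the equivalent is can_read reporting the handle, followed by sysread returning 0.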
Problem 2. Is there data available to read?
sysread and recv differ in the same way that read and recv/recvmsg differ in the underlying libc. Specifically, recv can take flags, which for example allow peeking at data available in the system's socket buffer without consuming it. See the documentation for more information.
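The peeking behavior looks like this in practice (sketched in Python for brevity; MSG_PEEK is the same flag you would pass as the FLAGS argument to recv in Perl or C):

```python
import socket

a, b = socket.socketpair()
b.sendall(b"hello")

# MSG_PEEK returns the data but leaves it in the kernel's socket buffer...
peeked = a.recv(1024, socket.MSG_PEEK)
# ...so an ordinary recv afterwards still sees the same bytes.
data = a.recv(1024)
print(peeked, data)
a.close(); b.close()
```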
I would recommend using sysread instead of recv, since the behavior of sysread can be redefined when tying a file handle, while the behavior of recv cannot. Tying the file handle is done, for example, by IO::Socket::SSL, so that the decrypted data from the SSL socket are returned rather than the raw data from the underlying OS socket.
From other C and Java doco pages, I gather there is a convention of returning undef, 0, >0, and on some implementations -1 when doing the equivalent of sysread. Is there an official perl spec someone can link me to that describes what Perl has implemented?
The behavior of sysread is well documented. To cite from what you get when using perldoc -f sysread:
... Returns the number of bytes
actually read, 0 at end of file, or undef if there was an error
(in the latter case $! is also set).
Apart from that, you state your problem as Is there data available to read? but then you only talk about sysread and recv, not about how to check whether data is available before calling these functions. I assume that you are using select (or IO::Select, which is just a wrapper) to do this. While can_read of IO::Select can be used to get this information, in most cases it will return information only about the underlying OS socket. With plain sockets this is enough, but for example when using SSL there is some internal buffering done in the SSL stack, and can_read might return false even though there are still data available to read in the buffer. See Common Usage Errors: Polling of SSL sockets on how to handle this properly.
In the current Lua sockets implementation, I see that we have to install a timer that calls back periodically, so that we can check via a non-blocking API whether we have received anything.
This is all good and well; however, in the UDP case, if the sender has a lot of data to send, do we risk losing it? Say another device sends a 2 MB photo via UDP and we check the socket for received data every 100 msec. At 2 MB/s, the underlying system must store about 200 KB before our call queries the underlying TCP stack.
Is there a way to get an event fired when we receive the data on the particular socket instead of the polling we have to do now?
There are various ways of handling this issue; which one you select depends on how much work you want to do.*
But first, you should clarify (to yourself) whether you are dealing with UDP or TCP; there is no "underlying TCP stack" for UDP sockets. Also, UDP is the wrong protocol for sending complete data such as text or a photo; it is an unreliable protocol, so you aren't guaranteed to receive every packet unless you use a managed socket library (such as ENet).
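To see why the buffering concern is real, here is a sketch (Python for brevity) that shrinks a UDP socket's receive buffer and then sends faster than the reader polls. The buffer size and datagram counts are arbitrary, and whether the sender sees an error or the excess datagrams vanish silently is system-dependent:

```python
import socket

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Shrink the kernel receive buffer so the overflow is easy to provoke.
receiver.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4096)
receiver.bind(("127.0.0.1", 0))

sent = 0
for _ in range(100):
    try:
        sender.sendto(b"x" * 1024, receiver.getsockname())
        sent += 1
    except OSError:
        break  # some systems report ENOBUFS instead of dropping silently

# Now "poll" the socket once, draining whatever survived in the buffer.
receiver.setblocking(False)
received = 0
while True:
    try:
        receiver.recv(2048)
        received += 1
    except BlockingIOError:
        break

print(f"sent {sent} datagrams, the poll saw only {received}")
sender.close(); receiver.close()
```

Everything that did not fit in the receive buffer between polls is simply gone; no error is delivered to the reader.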
Lua51/LuaJIT + LuaSocket
Polling is the only method.
Blocking: call socket.select with no time argument and wait for the socket to be readable.
Non-blocking: call socket.select with a timeout argument of 0, and use sock:settimeout(0) on the socket you're reading from.
Then simply call these repeatedly.
I would suggest using a coroutine scheduler for the non-blocking version, to allow other parts of the program to continue executing without causing too much delay.
Lua51/LuaJIT + LuaSocket + Lua Lanes (Recommended)
Same as the above method, but the socket exists in another lane (a lightweight Lua state in another thread) made using Lua Lanes (latest source). This allows you to instantly read the data from the socket and into a buffer. Then, you use a linda to send the data to the main thread for processing.
This is probably the best solution to your problem.
I've made a simple example of this, available here. It relies on Lua Lanes 3.4.0 (GitHub repo) and a patched LuaSocket 2.0.2 (source, patch, blog post re' patch)
The results are promising, though you should definitely refactor my example code if you derive from it.
LuaJIT + OS-specific sockets
If you're a little masochistic, you can try implementing a socket library from scratch. LuaJIT's FFI library makes this possible from pure Lua. Lua Lanes would be useful for this as well.
For Windows, I suggest taking a look at William Adam's blog. He's had some very interesting adventures with LuaJIT and Windows development. As for Linux and the rest, look at tutorials for C or the source of LuaSocket and translate them to LuaJIT FFI operations.
(LuaJIT supports callbacks if the API requires them; however, there is a significant performance cost compared to polling from Lua to C.)
LuaJIT + ENet
ENet is a great library. It provides the perfect mix between TCP and UDP: reliable when desired, unreliable otherwise. It also abstracts operating system specific details, much like LuaSocket does. You can use the Lua API to bind it, or directly access it via LuaJIT's FFI (recommended).
* Pun unintentional.
I use lua-ev https://github.com/brimworks/lua-ev for all IO-multiplexing stuff.
It is very easy to use and fits into Lua (and its functional style) like a charm. It is select/poll/epoll or kqueue based and performs very well, too.
local ev = require'ev'
local loop = ev.Loop.default
local udp_sock -- your udp socket instance
udp_sock:settimeout(0) -- make non-blocking
local udp_receive_io = ev.IO.new(function(io, loop)
    local chunk, err = udp_sock:receive(4096)
    if chunk and not err then
        -- process data
    end
end, udp_sock:getfd(), ev.READ)
udp_receive_io:start(loop)
loop:loop() -- blocks forever
In my opinion Lua+luasocket+lua-ev is just a dream team for building efficient and robust networking applications (for embedded devices/environments). There are more powerful tools out there! But if your resources are limited, Lua is a good choice!
Lua is inherently single-threaded; there is no such thing as an "event". There is no way to interrupt executing Lua code. So while you could rig something up that looked like an event, you'd only ever get one if you called a function that polled which events were available.
Generally, if you're trying to use Lua for this kind of low-level work, you're using the wrong tool. You should be using C or something to access this sort of data, then pass it along to Lua when it's ready.
You are probably using a non-blocking select() to "poll" sockets for new data. LuaSocket doesn't provide any other interface to see if there is new data available (as far as I know), but if you are concerned that it's taking too much time when you do this 10 times per second, consider writing a simplified version that checks only the socket you need and avoids creating and throwing away Lua tables. If that's not an option, consider passing nil to select() instead of {} for the lists you don't need, and pass static tables instead of temporary ones:
local rset = {socket}
... later
...select(rset, nil, 0)
instead of
...select({socket}, {}, 0)
EDIT: My original title was "Use of Stub in RPC"; I edited the title just to let others know it is more than that question.
I have started developing some SOAP-based services and I cannot understand the role of stubs. To quote Wikipedia:
The client and server use different address spaces, so conversion of parameters used in a function call have to be performed, otherwise the values of those parameters could not be used, because of pointers to the computer's memory pointing to different data on each machine. The client and server may also use different data representations even for simple parameters (e.g., big-endian versus little-endian for integers.) Stubs are used to perform the conversion of the parameters, so a Remote Function Call looks like a local function call for the remote computer.
This may be dumb, but I don't understand this "practically". I have done some socket programming in Java, but I don't remember any step for "conversion of parameters" when my TCP/UDP clients interacted with my server. (I assume raw server-client communication using TCP/UDP sockets does come under RPC.)
I have had some experience with RESTful service development, but I can't recognize the stub analogue with REST either. Can someone please help me?
Stubs for calls over the network (be they SOAP, REST, CORBA, DCOM, JSON-RPC, or whatever) are just helper classes that give you a wrapper function that takes care of all the underlying details, such as:
Initializing your TCP/UDP/whatever transport layer
Finding the right address to call and doing DNS lookups if needed
Connecting to the network endpoint where the server should be
Handling errors if the server isn't listening
Checking that the server is what we're expecting it to be (security checks, versioning, etc)
Negotiating the encoding format
Encoding (or "marshalling") your request parameters in a format suitable for transmission on the network (CDR, NDR, JSON, XML, etc.)
Transmitting your encoded request parameters over the network, taking care of chunking or flow control as necessary
Receiving the response(s) from the server
Decoding (or "unmarshalling") the response details
Returning the responses to your original calling code (or throwing an error if something went wrong)
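Collapsed to its essentials, a hand-written stub doing the steps above might look like the following sketch (Python for brevity; the length-prefixed JSON wire format and the CalculatorStub/add names are invented for illustration):

```python
import json
import socket
import struct
import threading

def recv_exact(sock, n):
    """Read exactly n bytes; TCP itself has no message boundaries."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf

def serve(listener):
    """A toy server: unmarshal one JSON request per connection, reply."""
    while True:
        conn, _ = listener.accept()
        with conn:
            size, = struct.unpack("!I", recv_exact(conn, 4))
            request = json.loads(recv_exact(conn, size))
            result = sum(request["params"])        # only "add" is implemented
            payload = json.dumps({"result": result}).encode()
            conn.sendall(struct.pack("!I", len(payload)) + payload)

class CalculatorStub:
    """The stub: add() looks like a local call but runs on the server."""
    def __init__(self, address):
        self.address = address

    def add(self, a, b):
        with socket.create_connection(self.address) as s:         # connect
            payload = json.dumps({"method": "add", "params": [a, b]}).encode()
            s.sendall(struct.pack("!I", len(payload)) + payload)  # marshal, send
            size, = struct.unpack("!I", recv_exact(s, 4))
            return json.loads(recv_exact(s, size))["result"]      # unmarshal

listener = socket.create_server(("127.0.0.1", 0))
threading.Thread(target=serve, args=(listener,), daemon=True).start()

stub = CalculatorStub(listener.getsockname())
print(stub.add(2, 3))  # prints 5
```

Everything inside CalculatorStub.add is what a generated stub hides from you: the caller just sees an ordinary method.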
There's no such thing as "raw" TCP communication. If you are using it in a request/response model and infer any kind of meaning from the data sent across the TCP connection then you've encoded some form of "parameters" in there. You just happened to build yourself what stubs would normally have provided.
Stubs try to make your remote calls look just like local in-process calls, but honestly that's a really bad thing to do. They're not the same at all, and they should be considered differently by your application.
I've read that it's possible to share sockets between processes. Is this also possible in Node.js?
I saw the cluster API in Node.js, but that's not what I'm looking for. I want to be able to accept a connection in one process, maybe send & read a bit, and after a while pass this socket to another fully independent Node.js process.
I could already do this with piping, but I don't want to, since it's not as fast as directly reading/writing to the socket itself.
Any ideas?
Update
I found the following entry in the node.js documentation:
new net.Socket([options]) #
Construct a new socket object.
options is an object with the following defaults:
{ fd: null
type: null
allowHalfOpen: false
}
fd allows you to specify the existing file descriptor of socket. type specified underlying protocol. It can be 'tcp4', 'tcp6', or 'unix'. About allowHalfOpen, refer to createServer() and 'end' event.
I think it would be possible to set the "fd" property to the file descriptor of the socket and then open the socket with that. But... how can I get the file descriptor of the socket and pass it to the process that needs it?
Thanks for any help!
This is not possible at the moment, but I've added it as a feature request to the node issues page.
Update
In the mean time, I've written a module for this. You can find it here: https://github.com/VanCoding/node-ancillary
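For background, the OS mechanism a module like this can use is SCM_RIGHTS ancillary data on a Unix-domain socket, which duplicates the descriptor into the receiving process. Python 3.9+ exposes it directly as socket.send_fds/recv_fds, so the idea can be sketched within one process (Unix only; in real use the two socket ends would live in different processes):

```python
import os
import socket

# A Unix-domain pair standing in for the channel between two processes.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Some descriptor worth sharing -- here, the read end of a pipe.
r, w = os.pipe()
os.write(w, b"hello from the original owner")

# Ship the descriptor across the socket as SCM_RIGHTS ancillary data.
socket.send_fds(a, [b"fd"], [r])
msg, fds, _, _ = socket.recv_fds(b, 1024, 1)

# The receiver holds a duplicate descriptor referring to the same pipe.
data = os.read(fds[0], 1024)
print(data)

os.close(w); os.close(r); os.close(fds[0]); a.close(); b.close()
```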
You probably want to take a look at hook.io
hook.io is a distributed EventEmitter built on node.js. In addition to providing a minimalistic event framework, hook.io also provides a rich network of hook libraries for managing all sorts of input and output.
I am writing a C daemon which my web application will use as a proxy to communicate with FTP servers. My web application enables users to connect and interact with FTP sites via AJAX. The reason I need a C daemon is that I have no way of keeping FTP connections alive across AJAX calls.
My web application will need to be able to tell my daemon to do list, get, put, delete, move, and rename files to a given FTP server for a given user account. So when my application talks to the daemon, it needs to pass the following via some protocol I define: 1) action, 2) connection id, 3) user id, 4) any additional parameters for action (note: connection information is stored in a database, so the daemon will talk to that as well).
So that's what I need my daemon to do. I'm thinking communication between my web app and the daemon will take place via a TCP socket, but I don't know exactly what data I would send. I need an example. For instance, should I just send something like this over the socket to the daemon?
action=list&connection_id=345&user_id=12345&path=/some/path
or should I do something hardcore at the byte level, like this?
+-----------------+-------------------------+-------------------+-----------------------------------+
| 1 byte (action) | 4 bytes (connection id) | 4 bytes (user id) | 255 bytes (additional parameters) |
+-----------------+-------------------------+-------------------+-----------------------------------+
| 0x01            | 345                     | 12345             | /some/path                        |
+-----------------+-------------------------+-------------------+-----------------------------------+
What does such communication over a socket normally look like?
Really, it's mostly about whatever format is easiest for you to encode and parse, which is why, rather than reinventing the wheel with my own protocol, I personally would go with an existing remote procedure call solution. My second choice would be the fixed binary fields, as those are easy to pack into and out of a struct.
You don't necessarily need to implement your own protocol. Have you thought about using something like XML-RPC, or even just plain XML? There are C libraries that should let you parse it.
Binary protocols are a bit easier to deal with. Just prepend the length to the message (or just to the variable part of it); TCP doesn't know about your application-level message boundaries. Pay attention to number endianness.
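As a sketch of that advice (Python's struct module for brevity): the layout below is the one proposed in the question, but with a two-byte length prefix on the variable part instead of a fixed 255 bytes, and network (big-endian) byte order throughout:

```python
import struct

# Hypothetical frame: 1-byte action, 4-byte connection id, 4-byte user id,
# then a length-prefixed path. "!" selects network (big-endian) byte order.
HEADER = struct.Struct("!BIIH")   # action, conn_id, user_id, path length

def encode(action, conn_id, user_id, path):
    data = path.encode("utf-8")
    return HEADER.pack(action, conn_id, user_id, len(data)) + data

def decode(frame):
    action, conn_id, user_id, n = HEADER.unpack_from(frame)
    path = frame[HEADER.size:HEADER.size + n].decode("utf-8")
    return action, conn_id, user_id, path

frame = encode(0x01, 345, 12345, "/some/path")
print(decode(frame))   # (1, 345, 12345, '/some/path')
```

A C daemon would read the 11-byte header first, then read exactly the number of path bytes the length field announces.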
On the other hand, text-based protocols are more flexible.
Also, take a look at Google Protocol Buffers; they could be very useful, though I'm not sure AJAX is supported.