Converting from TClientSocket to Indy: What is equivalent to ReceiveLength? - sockets

I am using XE7 and converting an app from using the ScktComp components to Indy, using TIdTCPClient in place of TClientSocket. At present I am just putting in likely equivalents to get it to compile. Most of it has been converted, except for this snippet:
if (Socket.ReceiveLength > 0) then
begin
  s := Socket.ReceiveText;
which I have converted to
s := Socket.IOHandler.ReadLn
I have no equivalent for ReceiveLength.
Any ideas?

IOHandler.ReadLn() is not the correct equivalent of ReceiveText().
ReceiveLength() returns the number of unread bytes that are currently in the socket's buffer. ReceiveText() simply reads whatever raw bytes are currently in the socket's buffer and returns them in a string variable. It is a wrapper for a single call to ReceiveBuf() using ReceiveLength() as the buffer size.
IOHandler.ReadLn() reads from the IOHandler's own memory buffer, populating it with bytes from the socket's buffer as needed, until it encounters the specified terminator (which is a LF character by default), no matter how many reads it takes to accomplish that.
There is no direct translation of ReceiveLength() in Indy; however, the closest equivalent to your snippet would be to call IOHandler.CheckForDataOnSource() followed by IOHandler.InputBufferAsString(), e.g.:
Socket.IOHandler.CheckForDataOnSource(0);
if not Socket.IOHandler.InputBufferIsEmpty then
begin
  s := Socket.IOHandler.InputBufferAsString;
Alternatively, you can call IOHandler.ReadBytes() with its AByteCount parameter set to -1, and then convert the returned byte array to a string:
buf := nil;
Socket.IOHandler.ReadBytes(buf, -1);
if buf <> nil then
begin
  s := BytesToString(buf);
That being said, I have to ask why you are using ReceiveText() in the first place. It returns a string of arbitrary bytes, so it doesn't really lend itself well to most communication needs, can break up textual data in unpredictable ways, and is not suited for binary data at all. Network protocols usually have structure to them, and TClientSocket usage typically requires code to manually buffer bytes and parse structured data from that buffer - things Indy is designed to handle for you. You should focus more on the goal you want to achieve and less on the particulars of how to get it. If you need to read an integer, ask Indy to read an integer. If you need to read a string of a particular length or ending with a particular delimiter, ask Indy to read a string. If you need to read a block of X bytes, ask Indy to read X bytes. Indy has many read/write methods available to automate common tasks that you would normally have to do manually.
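For example, here is a rough sketch in that spirit using Indy 10's TIdIOHandler methods; the message layout (a 4-byte length, then that many bytes of text, then a LF-terminated line) is purely an assumption for illustration:
var
  Len: Integer;
  Msg, Line: string;
begin
  // Hypothetical protocol: let Indy do the buffering and waiting for each field.
  Len  := Socket.IOHandler.ReadLongInt;      // reads a 4-byte integer
  Msg  := Socket.IOHandler.ReadString(Len);  // reads exactly Len bytes as a string
  Line := Socket.IOHandler.ReadLn;           // reads up to the next LF
end;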

Related

Null pointer Exception in the case of reading in emoticons

I have a text file that looks like this:
shooting-stars 💫 "are cool"
I have a lexical analyzer that uses FileInputStream to read the characters one at a time, passing those characters to a switch statement that returns the corresponding lexeme.
In this case, 💫 represents assignment so this case passes:
case 'ð' :
return new Lexeme("ASSIGN");
For some reason, the file reader stops at that point, returning a null pointer even though it has yet to process the string (or whatever is after the 💫). Any time it reads an emoticon, it does this. If there are no emoticons, it gets to the end of the file. Any ideas?
I suspect the problem is that the character 💫 (Unicode code point U+1F4AB) is outside the range of characters that Java represents internally as single char values. Instead, Java represents characters above U+FFFF as two char values known as a surrogate pair, in this case U+D83D followed by U+DCAB.
It's hard to know exactly what's going on with the little bit of code that you presented, but my guess is that you are not handling this situation correctly. You will need to adjust your processing logic to deal with your emoticons arriving in two pieces.
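For instance, here is a minimal, hypothetical sketch (the file name and lexeme handling are invented) that decodes the file as UTF-8 and dispatches on whole code points rather than on raw bytes:
import java.io.*;

public class CodePointLexerSketch {
    public static void main(String[] args) throws IOException {
        // Decode the file as UTF-8 so multi-byte sequences become chars,
        // then reassemble surrogate pairs into full code points.
        try (Reader in = new InputStreamReader(new FileInputStream("program.txt"), "UTF-8")) {
            int c;
            while ((c = in.read()) != -1) {
                int codePoint = c;
                if (Character.isHighSurrogate((char) c)) {
                    int low = in.read();   // second half of the surrogate pair
                    if (low != -1 && Character.isLowSurrogate((char) low)) {
                        codePoint = Character.toCodePoint((char) c, (char) low);
                    }
                }
                if (codePoint == 0x1F4AB) {                  // U+1F4AB, the 💫 character
                    System.out.println("ASSIGN");
                } else {
                    System.out.println("other: " + new String(Character.toChars(codePoint)));
                }
            }
        }
    }
}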

Receiving unknown string lengths?

So I'm converting a Python program I wrote to Erlang, and it's been a long time since I used Erlang, so I guess I've moved back to beginner level. Anyway, from experience, every language I've used for sockets has send/recv functions that return the length of the data sent/received. Erlang's gen_tcp, however, doesn't seem to do that.
So when I call send/recv or inet:setopts, does it know when the packet has ended? Will I need to write a looping recvAll/sendAll function so I can find the escape character or \n in the packet (string) I wish to receive?
http://erlang.org/doc/man/gen_tcp.html#recv-2
Example code I'm using:
server(LS) ->
    case gen_tcp:accept(LS) of
        {ok, S} ->
            loop(S),
            server(LS);
        Other ->
            io:format("accept returned ~w - goodbye!~n", [Other]),
            ok
    end.

loop(S) ->
    inet:setopts(S, [{active, once}]),
    receive
        {tcp, S, Data} ->
            Answer = process(Data), % Not implemented in this example
            gen_tcp:send(S, Answer),
            loop(S);
        {tcp_closed, S} ->
            io:format("Socket ~w closed [~w]~n", [S, self()]),
            ok
    end.
Just from looking at examples and documentation, it seems like Erlang just knows. I want to confirm, because the length of the data being received can be anywhere between 20 bytes and 9216 bytes, and/or it could be sent in chunks, since the client is a PHP socket library I'm writing.
Thank you,
Ajm.
TL;DR
So when I call send/recv or inet:setopts, does it know when the packet has ended?
No, it doesn't.
Will I need to write a looping recvAll/sendAll function so I can find the escape character or \n in the packet (string) I wish to receive?
Yes, generally, you will. But Erlang can do this job for you.
HOW?
Actually, you can't rely on TCP to split messages into packets. In general, TCP will split your stream into arbitrarily sized chunks, and your program has to reassemble those chunks and parse the stream on its own. So, first, your protocol must be "self-delimiting". For example, you can:
In a binary protocol, precede each packet with its length (a fixed-size field), so a protocol frame looks like this: <<PacketLength:2/big-unsigned-integer, Packet/binary>>
In a text protocol, terminate each line with a line-feed character.
Erlang can help you with this. Take a look at http://erlang.org/doc/man/gen_tcp.html#type-option. There is an important option:
{packet, PacketType} (TCP/IP sockets)
Defines the type of packets to use for a socket. The following values are valid:
raw | 0
No packaging is done.
1 | 2 | 4
Packets consist of a header specifying the number of bytes in the packet, followed by that number of bytes. The length of header can be one, two, or four bytes; containing an unsigned integer in big-endian byte order. Each send operation will generate the header, and the header will be stripped off on each receive operation.
In current implementation the 4-byte header is limited to 2Gb.
line
Line mode, a packet is a line terminated with newline, lines longer than the receive buffer are truncated.
The last option (line) is the most interesting for you. If you set this option, Erlang will parse the input stream internally and yield packets split by lines.
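For instance, here is a minimal sketch of your loop/1 with line mode enabled (the option names come from the gen_tcp/inet documentation; the echo body is just a placeholder):
%% With {packet, line}, each {tcp, S, Data} message carries exactly one
%% '\n'-terminated line, no matter how TCP chunked the client's writes.
loop(S) ->
    inet:setopts(S, [{active, once}, {packet, line}]),
    receive
        {tcp, S, Line} ->
            gen_tcp:send(S, Line),   % placeholder: echo the line back
            loop(S);
        {tcp_closed, S} ->
            ok
    end.
For the binary variant, setting {packet, 2} on both ends makes each send prepend the 2-byte length header and each receive strip it off before handing you the payload.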

Read UTF-8 encoded string from io.Reader

I am writing a small communication protocol with TCP sockets.
I am able to read and write basic data types such as integers, but I have no idea how to read a UTF-8 encoded string from a slice of bytes.
The protocol client is written in Java and the server in Go.
From what I have read, Go runes are 32 bits long and UTF-8 characters are 1 to 4 bytes long, which makes it impossible to simply cast a byte slice to a string.
I'd like to know how can I read and write this UTF-8 stream.
Note
I have the byte buffer length at the time I read the string.
Some theory first:
A rune in Go represents a Unicode code point: a number assigned to a particular character in Unicode. It's an alias for int32.
UTF-8 is a Unicode encoding — a format of representing Unicode code points for the means of storage and transmission. UTF-8 might use 1 to 4 bytes to encode a single code point.
How this maps on Go data types:
Both []byte and string store a series of bytes (a byte in Go is an alias for uint8).
The chief difference is that strings are immutable, so while you can
b := make([]byte, 2)
b[0] = byte('a')
b[1] = byte('z')
you can't
var s string
s[0] = byte('a')
The latter fact is even underlined by the inability to set the string length explicitly (like in imaginary s := make(string, 10)).
While strings in Go contain abstract bytes (you're free to store in them, say, characters encoded using Windows-1252), certain Go statements and type conversions interpret strings as being encoded in UTF-8, in particular:
A type conversion between string and []rune parses the string as a sequence of UTF-8-encoded code points and produces a slice of them. The reverse type conversion takes the Unicode code points from the slice of runes and produces a UTF-8-encoded string.
A range loop over a string loops through Unicode code points comprising the string, not just bytes.
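For illustration, a small sketch of those two behaviors (the sample string is arbitrary):
package main

import "fmt"

func main() {
    s := "héllo"                // UTF-8 encoded; 'é' occupies two bytes
    fmt.Println(len(s))         // 6: number of bytes
    fmt.Println(len([]rune(s))) // 5: number of code points

    // A range loop yields (byte offset, rune) pairs, decoding UTF-8 as it goes.
    for i, r := range s {
        fmt.Printf("byte offset %d: %c (U+%04X)\n", i, r, r)
    }
}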
Go also supplies the type conversions between string and []byte and back. Now recall that strings are read-only, while slices of bytes are not. This means a construct like
b := make([]byte, 1000)
io.ReadFull(r, b)
s := string(b)
always copies the data, whether you convert a slice to a string or a string to a slice. This wastes space but is type-safe and enforces the immutability semantics.
Now back to your task at hand.
If you work with reasonably small strings and are not under memory pressure, just convert the byte slices filled by io.Read() (or whatever) to strings. Be sure to reuse the slice you use to read the data, to ease the pressure on the garbage collector; that is, do not allocate a new slice for each read, since you are going to copy the data placed in it by the reading code off to a string anyway.
Finally, if you absolutely must not copy the data (say, you're dealing with multi-megabyte strings and have tight memory requirements), you may try to play dirty tricks by working with memory unsafely, transplanting the memory of a byte slice directly into a string. Note that should you resort to something like this, you must understand very well that it's free to break with any new release of Go, and it's not even guaranteed to work at all.
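Putting it together for the task in the question, here is a minimal sketch that reads a length-prefixed UTF-8 string from an io.Reader. The 2-byte big-endian length prefix is an assumption about your protocol; adjust it to whatever the Java client actually writes (DataOutputStream.writeUTF, for instance, uses a 2-byte length but encodes the payload as "modified UTF-8"):
package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
    "io"
    "unicode/utf8"
)

// readString reads a 2-byte big-endian length followed by that many bytes of
// UTF-8 data and returns the payload as a Go string.
func readString(r io.Reader) (string, error) {
    var n uint16
    if err := binary.Read(r, binary.BigEndian, &n); err != nil {
        return "", err
    }
    buf := make([]byte, n)
    if _, err := io.ReadFull(r, buf); err != nil {
        return "", err
    }
    if !utf8.Valid(buf) {
        return "", fmt.Errorf("payload is not valid UTF-8")
    }
    return string(buf), nil // copies buf into an immutable string
}

func main() {
    // Fake a framed message; in a real server, r would be the net.Conn.
    payload := []byte("héllo") // 6 bytes of UTF-8
    frame := []byte{0x00, byte(len(payload))}
    frame = append(frame, payload...)
    s, err := readString(bytes.NewReader(frame))
    fmt.Println(s, err)
}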

How do I use the StackExchange API from Matlab?

How do I access data from the StackExchange API using Matlab?
The naive
sitedata = urlread('http://api.stackoverflow.com/1.1/questions?tagged=matlab')
fails since the data is compressed. However, when I write this to a file (using fprintf(fileID,'%s',sitedata)), I get a zip file that cannot be uncompressed.
Try urlwrite() instead:
urlwrite('http://api.stackoverflow.com/1.1/questions?tagged=matlab',...
    'tempfile.zip')
gunzip('tempfile.zip')
fid = fopen('tempfile');
str = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
fclose(fid);
A better version of this snippet would use tempname to dynamically generate temporary filenames.
Matlab's urlread assumes you're getting text data back, not binary. The gzip binary data is getting mangled either when urlread is decoding the character data to Unicode values to stick in Matlab chars, or when the formatted-output fprintf function is writing them out, encoding them to UTF-8 or whatever default character encoding you're using for fileID and changing the byte sequence, or maybe both.
IIRC, urlread will default to using ISO-8859-1 encoding, which means the bytes will be turned into Unicode code points with the same numeric values - effectively just a widening. So you can get the byte data back by doing sitebytes = uint8(sitedata). (That's a regular uint8() conversion, not a typecast().) (If this isn't the case, you can probably fiddle with urlread's CharSet option.)
If you can't get the right bytes out from urlread by fiddling with the encoding and casts, then you can drop down and make calls against the Java HttpAgent like urlread does and bypass the character set decoding step, or fiddle with its options. See the urlread source for how to do it.
Once you have the right bytes in memory, you can write them out to a file using the lower-level fwrite() function, which won't mangle them by doing character set encoding. Then you'll have a valid gzip file of the site's original response. (I think it'll work if you also just use fwrite(fileID, sitedata, 'uint8') directly on the char string, but it's uglier IMHO.)
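For instance, here is a minimal sketch of that route (the file names are placeholders):
% Recover the raw bytes from the char data and write them with fwrite,
% which does no character-set encoding, then decompress the result.
sitebytes = uint8(sitedata);
fid = fopen('response.gz', 'w');
fwrite(fid, sitebytes, 'uint8');
fclose(fid);
gunzip('response.gz');            % produces a file named 'response'
str = fileread('response');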
You can also unzip it in memory using Java classes and save a trip to the filesystem. Do jsitebytes = typecast(sitebytes, 'int8') to get them as Java-friendly signed bytes, then stick them into a ByteArrayInputStream and read it out through a GZIPInputStream. You'll need to build a little Java helper class because Matlab doesn't play well with passing byte[] buffers by reference like java.io wants, but it may be worthwhile if you do a lot of in-memory munging like this.
When working with web services or fancier data downloads (e.g. sites that need sessions or certificates), I've often ended up dropping down and coding directly against the HttpAgent and java.io classes from within Matlab.

Is there a wide character version of WSABUF structure in "Microsoft Visual Studio 8\VC\PlatformSDK\Include\WinSock2"

Is there a wide character version of WSABUF structure in winsock?
I want to write Japanese data on the socket.
As another answer states, WSABUF uses char * to represent bytes.
TCP provides a stream of bytes; it's up to you to decide what those bytes consist of. So, as long as you provide some kind of protocol framing so that you can read the correct amount of data at the far end, just cast your wide string to a char *.
If you were to follow your question through to its logical conclusion you'd next be asking where the WSABUF that supports PNG images is, or the WSABUF that supports your favourite data structure. It's up to you to translate the data that you have to a stream of bytes (which, in the case of a wide character string, is simply framing and casting).
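For example, here is a hedged C++ sketch of that frame-and-cast approach (the UTF-8 conversion and the 4-byte length prefix are illustrative choices, not requirements; error handling is omitted):
#include <winsock2.h>
#include <windows.h>
#include <string>
#include <vector>

// Convert the wide (UTF-16) string to UTF-8, then send a 4-byte length prefix
// followed by the payload. You could equally cast the wchar_t buffer itself to
// char* and send the UTF-16 bytes, as long as the receiver knows which
// encoding and byte order to expect.
void SendWideString(SOCKET s, const std::wstring& text)
{
    int bytes = WideCharToMultiByte(CP_UTF8, 0, text.c_str(), (int)text.size(),
                                    NULL, 0, NULL, NULL);
    std::vector<char> utf8(bytes);
    WideCharToMultiByte(CP_UTF8, 0, text.c_str(), (int)text.size(),
                        utf8.data(), bytes, NULL, NULL);

    u_long len = htonl((u_long)bytes);   // simple framing: length prefix

    WSABUF bufs[2];
    bufs[0].buf = (char*)&len;
    bufs[0].len = sizeof(len);
    bufs[1].buf = utf8.data();
    bufs[1].len = (ULONG)bytes;

    DWORD sent = 0;
    WSASend(s, bufs, 2, &sent, 0, NULL, NULL);   // check the result in real code
}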
Probably not. You most likely need to convert your wide character string into some other format, such as UTF-7, and send that over the wire, then convert back on the other side.