Reading byte stream returned from JavaEE server - iPhone

We have a JavaEE server and servlets providing data to mobile clients (first JavaME, and soon iPhone). The servlet writes out data using the following code:
DataOutputStream dos = new DataOutputStream(out);
dos.writeInt(someInt);
dos.writeUTF(someString);
... and so on
This data is returned to the client as bytes in the HTTP response body, to reduce the number of bytes transferred.
In the iPhone app, the response payload is loaded into an NSData object. Now, after spending hours and hours trying to figure out how to read the data out in the Objective-C application, I'm almost ready to give up, as I haven't found any good way to read the data into NSInteger and NSString (corresponding to the protocol above).
Would anyone have any pointers on how to read data from a binary protocol written by a Java app? Any help is greatly appreciated!
Thanks!

You'll have to do the demarshalling yourself; fortunately, it's fairly straightforward. Java's DataOutputStream class writes integers in big-endian (network) format. So, to demarshall the integer, we grab 4 bytes and unpack them into a 4-byte integer.
For strings, DataOutputStream's writeUTF first writes a 2-byte value indicating the number of bytes that follow (strictly speaking, Java writes modified UTF-8, but it matches standard UTF-8 for most text). We read that in, and then read the subsequent bytes. Then, to decode the string, we can use the NSString method initWithBytes:length:encoding: like so:
NSData *data = ...; // this comes from the HTTP response
NSUInteger length = [data length];
const uint8_t *bytes = (const uint8_t *)[data bytes];
if (length < 4) {
    // oops, handle error
}
// demarshall the big-endian integer from 4 bytes
// (memcpy is declared in <string.h>)
uint32_t myInt;
memcpy(&myInt, bytes, sizeof(myInt)); // myInt now holds the value in network (big-endian) order
// convert from (n)etwork endianness to (h)ost endianness (may be a no-op)
// ntohl is defined in <arpa/inet.h>
myInt = ntohl(myInt);
// advance to the next datum
bytes += 4;
length -= 4;
// demarshall the string length
if (length < 2) {
    // oops, handle error
}
uint16_t myStringLen;
memcpy(&myStringLen, bytes, sizeof(myStringLen));
// convert from network to host endianness
myStringLen = ntohs(myStringLen);
bytes += 2;
length -= 2;
// make sure we actually have as much data as the length field claims
if (myStringLen > length)
    myStringLen = (uint16_t)length;
// demarshall the string
NSString *myString = [[NSString alloc] initWithBytes:bytes length:myStringLen encoding:NSUTF8StringEncoding];
bytes += myStringLen;
length -= myStringLen;
You can (and probably should) write functions to demarshall, so that you don't have to repeat this code for every field you want to demarshall. Also, be extra careful about buffer overflows. You're handling data sent over the network, which you should always distrust. Always verify your data, and always check your buffer lengths.
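For example, a pair of small helper functions might look like the following. This is just a sketch under the same assumptions as the code above (big-endian wire format, UTF-8 strings with a 2-byte length prefix); the function names are my own:
#import <Foundation/Foundation.h>
#include <arpa/inet.h> // ntohl, ntohs
#include <string.h>    // memcpy
// Reads a big-endian 32-bit integer and advances the cursor.
// Returns NO if fewer than 4 bytes remain.
static BOOL readUInt32(const uint8_t **bytes, NSUInteger *remaining, uint32_t *outValue)
{
    if (*remaining < 4) return NO;
    memcpy(outValue, *bytes, 4);
    *outValue = ntohl(*outValue);
    *bytes += 4;
    *remaining -= 4;
    return YES;
}
// Reads a 2-byte length prefix followed by that many UTF-8 bytes.
// Returns nil if the buffer is too short.
static NSString *readUTF8String(const uint8_t **bytes, NSUInteger *remaining)
{
    uint16_t len;
    if (*remaining < 2) return nil;
    memcpy(&len, *bytes, 2);
    len = ntohs(len);
    *bytes += 2;
    *remaining -= 2;
    if (len > *remaining) return nil; // truncated payload
    NSString *s = [[NSString alloc] initWithBytes:*bytes length:len encoding:NSUTF8StringEncoding];
    *bytes += len;
    *remaining -= len;
    return s;
}
With these, each field in the protocol becomes a single call, and the length checks live in one place.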

The main thing is to understand the binary data format itself. It doesn't matter what wrote it, so long as you know what the bytes mean.
As such, the docs for DataOutputStream are your best bet. They specify everything (hopefully) about what the binary data will look like.
Next, I would try to come up with a class on the iPhone which reads the same format into an appropriate data structure. I don't know Objective-C at all, but I'm sure it can't be too hard to read 4 bytes, know that the first byte is the most significant (etc.) and do the appropriate bit-twiddling to get the right kind of integer. (Basically: read a byte, shift the running result left 8 bits, OR in the next byte, and repeat.) There may well be more efficient ways of doing it, but get something that works first. When you've got unit tests around it all, you can move on to optimising it.
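As a minimal sketch of that accumulate-and-shift idea in C (the function name is mine):
#include <stdint.h>
#include <stddef.h>
// Accumulates `count` big-endian bytes (most significant first) into an
// unsigned integer: shift the running result left 8 bits, then OR in the
// next byte. Works for count <= 4.
static uint32_t readBigEndian(const uint8_t *bytes, size_t count)
{
    uint32_t value = 0;
    for (size_t i = 0; i < count; i++) {
        value = (value << 8) | bytes[i];
    }
    return value;
}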

Don't forget that Objective-C is just C in a pretty dress--and C excels at this kind of bit-grovelling. To a large extent, you should be able to just define a C struct that looks like your data and cast the pointer to your data into a pointer to that struct. Now, exactly which types to use, and if you need to byte-swap anything, will depend on how Java constructs this stream; that's what you'll need to spend time with Java's documentation for.
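As a sketch of the struct-cast idea (the field names here are hypothetical): this only works for fixed-size fields, since Java's writeUTF output is variable-length, and the struct must be packed so the compiler doesn't insert padding:
#include <stdint.h>
#include <arpa/inet.h> // ntohl, ntohs
// Hypothetical fixed-size record matching what a server might write with
// DataOutputStream (all multi-byte fields are big-endian on the wire).
typedef struct __attribute__((packed)) {
    uint32_t someInt;   // written with writeInt
    uint16_t someShort; // written with writeShort
} WireRecord;
void handleRecord(const void *buffer)
{
    const WireRecord *rec = (const WireRecord *)buffer;
    uint32_t someInt   = ntohl(rec->someInt);   // swap from network to host order
    uint16_t someShort = ntohs(rec->someShort); // (a no-op on big-endian hosts)
    // ... use someInt and someShort ...
    (void)someInt; (void)someShort;
}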
Fundamentally, though, this is a design smell. You're having this problem because you made assumptions about your client platform that are no longer valid. If it's an option, I'd recommend you offer a second, more portable interface to the same functions (just adding "WithXML" wrappers or something should suffice). This will save you time if you ever end up porting to another platform that doesn't use Java.

Carl, if you really can't change what the server provides, have a look at this class. It should be the pointer you are looking for. That said: the idea of using native Java serialization as a transport format does not sound like a good idea. My first choice would have been JSON. If that's still too big, I would rather use something like Thrift or Protocol Buffers. They also provide binary serialization, but in a cross-language manner. (There is also the oldie ASN.1, but that's painful.)
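To illustrate the JSON route on the client, here's a rough sketch. It assumes (hypothetically) that the server returned a body like {"someInt": 42, "someString": "hello"} and uses Foundation's NSJSONSerialization, which is available from iOS 5 onwards:
#import <Foundation/Foundation.h>
// data holds the HTTP response body, assumed to contain a JSON object.
void handleJSONResponse(NSData *data)
{
    NSError *error = nil;
    NSDictionary *payload = [NSJSONSerialization JSONObjectWithData:data
                                                            options:0
                                                              error:&error];
    if (![payload isKindOfClass:[NSDictionary class]]) {
        NSLog(@"Failed to parse JSON: %@", error);
        return;
    }
    NSInteger someInt = [[payload objectForKey:@"someInt"] integerValue];
    NSString *someString = [payload objectForKey:@"someString"];
    NSLog(@"someInt=%ld someString=%@", (long)someInt, someString);
}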

Related

converting a string to Unicode in C

I have a string in a variable, and that string comes from the core part of the project. Now I want to convert that to a Unicode string. How can I do that?
Adding L or _T() or TEXT() is not an option.
To make things clearer, please see below:
void foo(char* string) {
// Here the contents of the variable string should be converted to Unicode.
// The solution should be possible to use in C code.
}
TIA
Naveen
L is used to create wchar_t literals.
From your comment about SafeArrayPutElement and the way you use the term 'Unicode', it's clear you're using Windows. Assuming that the char* string is in the legacy encoding Windows is using and not UTF-8 or something (a safe assumption on Windows), you can get a wchar_t string in the following ways:
// typical Win32 conversion in C
// (MultiByteToWideChar is declared in <windows.h>, malloc in <stdlib.h>, assert in <assert.h>)
int output_size = MultiByteToWideChar(CP_ACP,0,string,-1,NULL,0);
wchar_t *wstring = malloc(output_size * sizeof(wchar_t));
int size = MultiByteToWideChar(CP_ACP,0,string,-1,wstring,output_size);
assert(output_size==size);
// make use of wstring here
free(wstring);
If you're using C++ you might want to make that exception safe by using std::wstring instead (this uses a tiny bit of C++11 and so may require VS2010 or above):
std::wstring ws(output_size,L'\0');
int size = MultiByteToWideChar(CP_ACP,0,string,-1,&ws[0],(int)ws.size());
// MultiByteToWideChar tacks on a null character to mark the end of the string, but this isn't needed when using std::wstring.
ws.resize(ws.size() -1);
// make use of ws here. You can pass a wchar_t pointer to a function by using ws.c_str()
//std::wstring handles freeing the memory so no need to clean up
Here's another method that uses more of the C++ standard library (and takes advantage of VS2010 not being completely standards compliant):
#include <locale> // for wstring_convert and codecvt
std::wstring ws = std::wstring_convert<std::codecvt<wchar_t,char,std::mbstate_t>,wchar_t>().from_bytes(string);
// use ws.c_str() as before
You also imply in the comments that you tried converting to wchar_t and got the same error. If that's the case when you try these methods for converting to wchar_t then the error lies elsewhere. Probably in the actual content of your string. Perhaps it's not properly null terminated?
You can't just say "converted to Unicode". You need to specify an encoding: Unicode is not an encoding, but (roughly) a character set plus a set of encodings for expressing those characters as sequences of bytes.
Also, you must specify the input encoding: how is a character such as "å" encoded in string, for example?

NSCoding and integer arrays

How do you use NSCoding to code (and decode) an array of ten values of primitive type int? Encode each integer individually (in a for-loop). But what if my array held one million integers? Is there a more satisfying alternative to using a for-loop here?
Edit (after first answer): And decode? (@Justin: I'll then tick your answer.)
If performance is your concern here: CFData/NSData is NSCoding compliant, so just wrap your serialized representation of the array in an NSData object.
Edit, to detail encoding/decoding:
Your array of ints will need to be converted to a common endian format regardless of the machine's endianness, i.e. always store it as either little or big endian. During encoding, convert the array to integers of the chosen endianness and hand the bytes to an NSData object; then pass the NSData representation to the NSCoder instance. At decode time, you'll receive an NSData object for the key, and you conditionally convert its contents back to the native endianness of the machine. One set of byte-swapping routines available on OS X and iOS begins with OSSwap*.
Alternatively, see -[NSCoder encodeBytes:voidPtr length:numBytes forKey:key]. This routine also requires the client to swap endianness.
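A minimal sketch of the swap-then-wrap idea (the function names are mine; it assumes the ints are always stored little-endian in the archive):
#import <Foundation/Foundation.h>
#include <libkern/OSByteOrder.h> // OSSwapHostToLittleInt32, OSSwapLittleToHostInt32
// Encode: normalize every int to little-endian, then wrap the buffer in NSData.
NSData *dataFromIntArray(const int32_t *values, NSUInteger count)
{
    NSMutableData *data = [NSMutableData dataWithLength:count * sizeof(int32_t)];
    int32_t *out = (int32_t *)[data mutableBytes];
    for (NSUInteger i = 0; i < count; i++) {
        out[i] = OSSwapHostToLittleInt32(values[i]); // a no-op on little-endian hosts
    }
    return data;
}
// Decode: convert each int back from little-endian to the host's native order.
void intArrayFromData(NSData *data, int32_t *values, NSUInteger count)
{
    const int32_t *in = (const int32_t *)[data bytes];
    NSUInteger available = [data length] / sizeof(int32_t);
    for (NSUInteger i = 0; i < count && i < available; i++) {
        values[i] = OSSwapLittleToHostInt32(in[i]);
    }
}
The resulting NSData can then be handed to -[NSCoder encodeObject:forKey:] as usual.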

Writing 64 bit int value to NSOutputStream

I'm having trouble communicating position coordinates to a set of motors I have connected on the network. I can send a string just fine, and receive text back from the motor, but I can't seem to send it an int value.
Using NSLog I have determined that the actual value I'm sending is correct; however, I suspect my method of sending it via the output stream is wrong. Any ideas?
My code for sending a 64-bit int value:
uint64_t rawInt = m1;
rawInt <<= 16;
rawInt |= m2;
NSData *buffer = [NSData dataWithBytes: &rawInt length:8];
[outputStream write: [buffer bytes] maxLength:[buffer length]];
Well, your code looks kind of funny. Anyway, for sending a binary integer over the network, you have to know the endianness: the receiver expects either little-endian or big-endian integers.
Here's the code for both (the CFSwapInt64* functions are declared in CoreFoundation's CFByteOrder.h); pick the line that matches what the receiver expects:
uint64_t myInt = 42;
uint64_t netInt = CFSwapInt64HostToLittle(myInt); // use this for little endian
// uint64_t netInt = CFSwapInt64HostToBig(myInt); // use this for big endian
[outputStream write:(uint8_t*)&netInt maxLength:sizeof(netInt)];
You're probably seeing an endianness problem. Intel CPUs are little-endian, meaning that the least significant byte in a multi-byte value is stored first, the most significant byte last. Reading an 8-byte number from memory and putting it on the network without doing anything will cause the bytes to go out in that order.
Most network protocols (including all internet protocols) expect big-endian data, so the most significant byte appears on the network first and the least significant byte last. Does your network protocol expect that?

What kind of data type is this?

In a class header I have seen something like this:
enum {
kAudioSessionProperty_PreferredHardwareSampleRate = 'hwsr', // Float64
kAudioSessionProperty_PreferredHardwareIOBufferDuration = 'iobd' // Float32
};
Now I wonder what data type such a kAudioSessionProperty_PreferredHardwareSampleRate actually is.
I mean, this looks like plain old C, but in Objective-C I would write @"hwsr" if I wanted to make it a string.
I want to pass such a "constant" or "enum thing" as an argument to a method.
This converts to a UInt32 enum value using the ASCII value of each of the four characters. This style has been around for a long time in Mac OS headers.
'hwsr' has the same value as if you had written 0x68777372, but is a lot more reader-friendly. If you used the @"hwsr" style instead, you would need more than 4 bytes to represent the same thing.
The advantage of using this style is that you are able to quickly identify the content of a raw data stream if you can see the ASCII values in it.
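To illustrate (a small sketch; the helper function is mine), the constant is just a 32-bit integer you can pass to a method like any other number, and you can recover the four characters when debugging:
#include <stdio.h>
#include <stdint.h>
enum {
    kAudioSessionProperty_PreferredHardwareSampleRate = 'hwsr' // same bits as 0x68777372
};
// Prints a four-char code as its ASCII characters, e.g. for logging.
static void printFourCharCode(uint32_t code)
{
    printf("%c%c%c%c\n",
           (code >> 24) & 0xFF, (code >> 16) & 0xFF,
           (code >> 8) & 0xFF,   code & 0xFF);
}
int main(void)
{
    printFourCharCode(kAudioSessionProperty_PreferredHardwareSampleRate); // prints "hwsr"
    return 0;
}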

Parsing of binary data with scala

I need to parse some simple binary files. (The files contain n entries, each consisting of several signed/unsigned integers of different sizes, etc.)
At the moment I do the parsing "by hand". Does somebody know a library which helps with this type of parsing?
Edit: "By hand" means that I get the data byte by byte, sort it into the correct order, and convert it to an Int/Byte etc. Also, some of the data is unsigned.
I've used the sbinary library before and it's very nice. The documentation is a little sparse but I would suggest first looking at the old wiki page as that gives you a starting point. Then check out the test specifications, as that gives you some very nice examples.
The primary benefit of sbinary is that it gives you a way to describe the wire format of each object as a Format object. You can then encapsulate those formatted types in a higher level Format object and Scala does all the heavy lifting of looking up that type as long as you've included it in the current scope as an implicit object.
As I say below, I'd now recommend people use scodec instead of sbinary. As an example of how to use scodec, I'll implement how to read a binary representation in memory of the following C struct:
struct ST
{
long long ll; // # 0
int i; // # 8
short s; // # 12
char ch1; // # 14
char ch2; // # 15
} ST;
A matching Scala case class would be:
case class ST(ll: Long, i: Int, s: Short, ch1: String, ch2: String)
I'm making things a bit easier for myself by just saying we're storing Strings instead of Chars, and I'll say that they are UTF-8 characters in the struct. I'm also not dealing with endian details or the actual size of the long and int types on this architecture, and just assuming that they are 64 and 32 bits respectively.
Scodec parsers generally use combinators to build higher-level parsers from lower-level ones. So for the example below, we'll define a parser which combines an 8-byte value, a 4-byte value, a 2-byte value, a 1-byte value and one more 1-byte value. The result of this combination is a Tuple codec:
import scodec._
import scodec.codecs._
val myCodec: Codec[Long ~ Int ~ Short ~ String ~ String] =
  int64 ~ int32 ~ short16 ~ fixedSizeBits(8L, utf8) ~ fixedSizeBits(8L, utf8)
We can then transform this into the ST case class by calling the xmap function on it, which takes two functions: one to turn the Tuple codec into the destination type, and another to take the destination type and turn it back into the Tuple form:
val stCodec: Codec[ST] = myCodec.xmap[ST]({case ll ~ i ~ s ~ ch1 ~ ch2 => ST(ll, i, s, ch1, ch2)}, st => st.ll ~ st.i ~ st.s ~ st.ch1 ~ st.ch2)
Now, you can use the codec like so:
stCodec.encode(ST(1L, 2, 3.shortValue, "H", "I"))
res0: scodec.Attempt[scodec.bits.BitVector] = Successful(BitVector(128 bits, 0x00000000000000010000000200034849))
res0.flatMap(stCodec.decode)
=> res1: scodec.Attempt[scodec.DecodeResult[ST]] = Successful(DecodeResult(ST(1,2,3,H,I),BitVector(empty)))
I'd encourage you to look at the Scaladocs rather than the Guide, as there's much more detail in the Scaladocs. The guide is a good start on the very basics, but it doesn't get into the composition part much; the Scaladocs cover that pretty well.
Scala itself doesn't have a binary data input library, but the java.nio package does a decent job. It doesn't explicitly handle unsigned data--neither does Java, so you need to figure out how you want to manage it--but it does have convenience "get" methods that take byte order into account.
I don't know what you mean by "by hand", but using a simple DataInputStream (apidoc here) is quite concise and clear:
import java.io.DataInputStream
val dis = new DataInputStream(yourSource)
dis.readFloat()
dis.readDouble()
dis.readInt()
// and so on
Taken from another SO question: http://preon.sourceforge.net/ - it should be a framework for doing binary encoding/decoding; see if it has the capabilities you need.
If you are looking for a Java-based solution, then I will shamelessly plug Preon. You just annotate the in-memory Java data structure, ask Preon for a Codec, and you're done.
Byteme is a parser combinator library for doing binary. You can try to use it for your tasks.