I have the following performance problem when decoding base64 on an iPhone 4S. I'm decoding a "blob" of roughly 80K. The native function for this, atob(), is very fast and completes in a few ms. However, its result is a string in which the character code of each character (as obtained using .charCodeAt()) represents one byte of the binary data that was base64 encoded. I want to convert this to a byte[], but that seems to be a very slow process, especially on iPhones. Just running through the string, calling .charCodeAt() on each character and storing the result in e.g. a Uint8Array (a loop like the sketch shown further below) takes 1.2 seconds on an iPhone 4S, even though we are talking about only 80K and the base64 decoding has already been done; it's the character-to-byte conversion that takes this long. The performance penalty is the same if one instead passes the decoded string (i.e. the output of atob()) back to GWT and does the conversion to byte[] there (again in a loop).

So what I'm looking for is either:
1. A fast way to convert the string that results from atob() to a byte[]. Converting it character by character using charCodeAt() seems way too slow for what such an operation should take.
2. A fast base64-to-byte[] decoder which doesn't rely on atob() but can generate the byte[] directly.
Unfortunately, the alternatives I've tried for #2 were even slower (by a factor of 3) than the atob() + charCodeAt() approach.
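For reference, the conversion being timed is essentially the loop below (a minimal sketch; the variable b64 stands in for the encoded blob, which isn't shown here):

// Decode base64 with the native function, then copy the character codes
// of the resulting "binary string" into a typed array byte by byte.
var raw = atob(b64);
var bytes = new Uint8Array(raw.length);
for (var i = 0; i < raw.length; i++) {
    bytes[i] = raw.charCodeAt(i);
}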
After more experimentation it turned out that the String -> Uint8Array conversion really isn't as slow as indicated above; the slowness was due to another programming bug. The 1.2 seconds were really only around 40 ms, so from my POV this can be closed.
I'm trying to store 64 bit integers from a python program in my Firestore database. The problem is that it seems like the last digits are rounded off.
doc = db.collection('questions').document('0MPvbeTEglD9lbpDq6xm')
ints = [9223372036854775807, 9223372036854775533, 9223372036854775267]
doc.update({
    'random': ints
})
When I look in the database they are stored as:
random = [9223372036854776000, 9223372036854776000, 9223372036854775000]
According to the documentation 64 bit signed integers are supported. What could the problem be?
I am not 100% certain, but my guess is that what you're seeing is due to the fact that JavaScript integers are not 64 bits in size. They're effectively 53 bits (the largest exactly representable integer is Number.MAX_SAFE_INTEGER, 2^53 - 1). Since the Firebase console is a web app implemented with JavaScript, it probably can't understand the full 64 bits of the very large integers you write to it.
What I'd recommend is reading the values back out of your document with another Python program instead of checking them in the console. If they're the same as what you wrote, then there's no real problem here; you just can't trust the rendering of the values in the console.
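If it helps, a minimal read-back sketch (assuming the same db client and document as in the question):

# Fetch the document again and compare against the values that were written.
doc_ref = db.collection('questions').document('0MPvbeTEglD9lbpDq6xm')
snapshot = doc_ref.get()
print(snapshot.to_dict()['random'])
# If this prints the original 64-bit values, the data is stored correctly and
# only the console's JavaScript-based rendering is lossy.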
TL;DR
I'm trying to talk to a Minecraft server with a client written in Scala, using Akka I/O over TCP, and would like to know if Minecraft uses an official, standardised protocol in its network communication?
Minecraft's own documentation covers the contents of each packet, but fails to explain how the packets themselves are encoded, or even how they should be formed.
A little back story
As part of a personal project that I'm working on, written in Scala, I need to create an interface capable of mocking a Minecraft client, and performing actions against a Minecraft server. After weeks of research, I came across a lot of Java libraries that were almost what I was looking for, but none that quite suited my exact needs; long story short, I did the classic, "Oh well, why not write it myself and enjoy the learning curve"...
The issue
The Minecraft protocol documentation is thorough in some respects but lacking in others; many assumptions are made throughout, and a lot of key information is missing or even incorrect. A detailed network specification is the most notable gap in my case.
One attempt to talk to the Minecraft server had me playing around with Google's protocol buffers, using ScalaPB to compile them to usable case classes, but the data types were a pain to resolve between Google's own documentation and Minecraft's.
message Handshake {
    <type?> protocolVersion = 1;
    <type?> host = 2;
    <type?> port = 3;
    <type?> nextState = 4;
}
The host is a string, so that's an easy win, but both protocolVersion and nextState are variable-length integers (varints), which were not encoded as I expected when I compared them with valid packets generated by another client with identical contents (I've been using a third-party library to compare the hexadecimal output of encoded packets).
My hacky solution
In a last-ditch attempt to achieve my goals, I've simply written methods like the one below (this is a first iteration, so be kind!) to generate the desired encoding for each of the types declared in Minecraft's documentation that are not supported natively in Scala. Although this works, it smells like I'm missing something potentially obvious that others might know about.
import scala.collection.mutable.ArrayBuffer

def toVarint(x: Int): Array[Byte] = {
  var number = x
  val output = ArrayBuffer[Int]()
  // Emit the low 7 bits per byte, with the high bit set to signal
  // that another byte follows.
  while ((number & ~0x7F) != 0) {
    output += (number & 0x7F) | 0x80
    number >>>= 7 // unsigned shift, so negative inputs terminate as well
  }
  output += number
  output.map(_.toByte).toArray
}
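As a quick sanity check (my own addition, using the classic varint example value 300, which should encode to the two bytes 0xAC 0x02):

// 300 = 0b1_0010_1100 -> low 7 bits 0x2C with continuation bit = 0xAC, then 0x02.
assert(toVarint(300).toSeq == Seq(0xAC.toByte, 0x02.toByte))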
This is old, but I'll answer anyway! The wiki you referenced has (at least now) a section on the packet format (both before & after compression), as well as explanations including example code for how to handle their varint & varlong types!
To summarise, each packet is length prefixed with a varint (in both compression modes), so you just need to read a varint from the stream, allocate that much space & read that many bytes from the stream to the buffer.
Each byte of a Minecraft varint has an "another byte to follow?" flag bit, followed by 7 bits of actual data. Those 7 bits are just the bits of a standard int, so on receive you essentially drop the flag bits and write the remaining bits into a standard int.
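To illustrate, here is a minimal blocking-I/O sketch of my own (not code from the wiki, and using a plain java.io.InputStream rather than Akka I/O):

import java.io.{EOFException, InputStream}

// Read a varint: accumulate 7 data bits per byte until a byte arrives
// whose continuation (high) bit is clear.
def readVarint(in: InputStream): Int = {
  var result = 0
  var shift = 0
  var more = true
  while (more) {
    val b = in.read()
    if (b == -1) throw new EOFException("stream ended mid-varint")
    result |= (b & 0x7F) << shift
    shift += 7
    more = (b & 0x80) != 0
    if (more && shift >= 35) throw new IllegalStateException("varint too long")
  }
  result
}

// Read one uncompressed packet: a varint length prefix followed by that many bytes.
def readPacket(in: InputStream): Array[Byte] = {
  val length = readVarint(in)
  val buf = new Array[Byte](length)
  var off = 0
  while (off < length) {
    val n = in.read(buf, off, length - off)
    if (n == -1) throw new EOFException("stream ended mid-packet")
    off += n
  }
  buf
}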
I see that there are many ways to serialize/deserialize Haskell objects:
Data.Serialize -> encode, decode functions
Data.Binary http://code.haskell.org/binary/
MsgPack, JSON, BSON, etc
In my application, I want to setup a simple TCP client-server, where client may send serialized Haskell record objects. How does one decide between these serialization alternatives?
Additionally, when objects serialized into strings are sent over the network using Network.Socket, strings are returned. Is there a slightly higher level library, that works at the level of whole TCP messages? In other words, is there a way to avoid writing parsing code on the receive end that:
collects results of a sequence of recv() calls,
detect that a whole object has been received, and
then parse it into a Haskell type?
In my application, the objects are not expected to be too large (maybe ~1 MB max).
As for the second part of your question, two things are required:
1. An incremental parser that doesn't need to have the whole document in memory to start parsing, and which can be fed with the partial chunks of data arriving from the wire. Also, when the parsing succeeds it must return any "leftover data" along with the parsed value.
2. A source of data with "pushback capabilities", that allows you to "unread" any leftovers so that they are available to the next parsing attempt.
The most popular library providing (1) is attoparsec. As for (2), all three of the main streaming libraries (conduit, io-streams, and pipes) offer some kind of pushback functionality (the latter through the auxiliary pipes-parse package). All three libraries can integrate with attoparsec parsers as well (see here, here and here).
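To give a feel for (1), here is a tiny standalone sketch of attoparsec's incremental interface (no streaming library involved; the line-based parser is just a toy example):

import Data.Attoparsec.ByteString (IResult (Done, Partial))
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString.Char8 as BC

-- A throwaway parser: one newline-terminated line.
lineParser :: A.Parser BC.ByteString
lineParser = A.takeWhile (/= '\n') <* A.char '\n'

-- Feed the parser two chunks, as if they had arrived separately from the wire.
-- `parse` returns Partial when it needs more input; Done carries the leftover
-- bytes along with the parsed value.
demo :: (BC.ByteString, BC.ByteString)
demo =
  case A.parse lineParser (BC.pack "hel") of
    Partial k ->
      case k (BC.pack "lo\nleftover") of
        Done leftover value -> (value, leftover)  -- ("hello", "leftover")
        _                   -> error "parse failed"
    _ -> error "expected Partial"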
(Another option, of course, is to prepend each message with its length and read only that exact number of bytes.)
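A minimal sketch of that length-prefix framing (my own, assuming Network.Socket.ByteString for the socket I/O; the payload is whatever strict ByteString your serializer of choice produces):

import Control.Monad (when)
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import qualified Data.ByteString as BS
import Network.Socket (Socket)
import Network.Socket.ByteString (recv, sendAll)

-- Encode/decode a 4-byte big-endian length prefix by hand.
lenToBytes :: Int -> BS.ByteString
lenToBytes n = BS.pack [fromIntegral ((n `shiftR` s) .&. 0xFF) | s <- [24, 16, 8, 0]]

bytesToLen :: BS.ByteString -> Int
bytesToLen = BS.foldl' (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- Send: 4-byte length, then the payload itself.
sendFramed :: Socket -> BS.ByteString -> IO ()
sendFramed sock payload = do
  sendAll sock (lenToBytes (BS.length payload))
  sendAll sock payload

-- Keep calling recv until exactly n bytes have been collected.
recvExactly :: Socket -> Int -> IO BS.ByteString
recvExactly _    0 = return BS.empty
recvExactly sock n = do
  chunk <- recv sock n
  when (BS.null chunk) (ioError (userError "peer closed connection"))
  rest <- recvExactly sock (n - BS.length chunk)
  return (BS.append chunk rest)

-- Receive: read the 4-byte length, then exactly that many payload bytes.
recvFramed :: Socket -> IO BS.ByteString
recvFramed sock = recvExactly sock 4 >>= recvExactly sock . bytesToLen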
To answer the first part of your question (about data serialization), I would say that everything you listed sounds fine. Since you are dealing with pretty big (~1 MB) serializations, I think the most important thing is laziness. Note that Data.Serialize comes from the cereal library, which produces strict serializations; you may not want that here, because the whole thing would need to be built up in memory before being sent out. I'll give a shout out to aeson (http://hackage.haskell.org/package/aeson-0.8.0.2/docs/Data-Aeson.html), which you can use with GHC Generics to get something as simple as this:
{-# LANGUAGE DeriveGeneric #-}
import Data.Aeson (FromJSON, ToJSON)
import GHC.Generics (Generic)

data Shape = Rect Int Int | Circle Double | Other String Int
  deriving (Generic)
instance FromJSON Shape -- uses a default
instance ToJSON Shape   -- uses a default
And then, bam!, you've got access to the encode and decode methods. I don't know about a higher level TCP library. Hopefully, someone else will have more insight on that.
I'm using an open-source networking framework that makes it easy for developers to communicate over a service found using Bonjour in Objective-C.
There are a few lines that have had me on edge for a while now, even though they never seem to have caused any problems on any machine I've tested, regardless of whether I'm running the 32-bit or 64-bit version of my application:
int packetLength = [rawPacketData length];
[outgoingBuffer appendBytes:&packetLength length:sizeof(int)];
[outgoingBuffer appendData:rawPacketData];
[self writeToStream];
Note that the first piece of information sent is the length of the data packet, which is pretty standard, and then the data itself is sent. What scares me is the length of the length. Will one machine ever assume an int is 4 bytes, while the other machine believes an int to be 8 bytes?
If the two sizes could be different on different machines, what would cause this? Is it dependent on my compiler, or the end-user's machine architecture? And finally, if it is a problem, how can I take an 8-byte int and scrunch it down to 4 bytes to ensure backwards compatibility? (Since I'll never need more than 4 bytes to represent the size of the data packet.)
You can't assume that sizeof(int) will always be four bytes. If the size matters, you should either hard-code a size of 4 (and write code to serialize values into four-byte arrays with the proper endianness), or use types like int32_t defined in <stdint.h>.
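For example, the sending side from the question could pin the prefix to exactly four bytes in network byte order (a sketch reusing the question's variable names and assuming <arpa/inet.h> for htonl):

// Fixed-width, big-endian length prefix: the receiver reads exactly 4 bytes
// and converts back with ntohl, regardless of either machine's int size.
uint32_t packetLength = htonl((uint32_t)[rawPacketData length]);
[outgoingBuffer appendBytes:&packetLength length:sizeof(uint32_t)];
[outgoingBuffer appendData:rawPacketData];
[self writeToStream];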
(However, as a practical matter, most compiler vendors have decided that int should stay four bytes, so you probably don't need to worry about everything breaking tomorrow. Then again, it wasn't so long ago that many compiler vendors let an int be two bytes, leading to many problems when ints became four bytes, so you really ought to do things the right way to guard against future changes.)
It could be different, but this depends on the compiler more than the machine. A different compiler might define int to be 8 bytes.
The size of int depends on the machine architecture; it will almost always be the size of the data bus, unless your C compiler does something special and changes it.
That means the size of int is not 4 bytes when you compile your program for an 8-bit, 16-bit or 64-bit machine/architecture.
I would define a constant for the buffer size instead of using sizeof(int).
Hope this answers your question.
Although the iPhone supports JSON natively, AMF is a binary protocol and is supposed to use much less bandwidth. Do you think using AMF is a good idea?
Just found this AMF library in cocoa (Objective-C): http://github.com/nesium/cocoa-amf/
Here's the famous benchmark that shows AMF is smaller and faster than JSON + gzip in Flex: http://www.jamesward.com/census/
I don't think AMF would be significantly smaller than JSON. In fact, it can be slightly larger in many cases. Let me show this in an example:
AMF stores the string "asdf" in the following binary format:
0x02 /* type = string */
0x00 0x04 /* length */
'a' 's' 'd' 'f'
/* total: strlen(s)+3 bytes */
while JSON stores the string "asdf" in strlen(s) + 2 bytes if there are no quotes in the string.
AMF stores the JSON object {"key1":"asdf","key2":"foo"} in the following binary format:
0x03 /* type = object */
0x00 0x04 /* length of key1 */
'k' 'e' 'y' '1'
0x02 /* value type = string */
0x00 0x04 /* length of value1 */
'a' 's' 'd' 'f'
0x00 0x04 /* length of key2 */
'k' 'e' 'y' '2'
0x02 /* type of value2 */
0x00 0x03 /* length of value2 */
'f' 'o' 'o'
0x00 0x00 0x09 /* end of object */
/* total: 29 bytes, while the JSON string is 28 bytes */
The above examples were in AMF0, but I don't think AMF3 would be much different.
The only feature in AMF0 that can significantly reduce the bandwidth is that it contains a reference type: if you send the same large object twice, the second object will be only a back-reference to the first instance. But it is a rare case IMHO (and it works only for objects, not for strings).
So I would recommend JSON (if you really want to save bytes, you can compress it with zlib or anything): it's much simpler to read, there are many more implementations, and the specification is clear (while the Flash implementation sometimes differs from the specification - we all like Adobe ;))
Gym said:
The above examples were in AMF0, but I don't think AMF3 would be much different.
This is SO untrue.
AMF3 can produce data as much as 5 to 8 times smaller than AMF0 or JSON.
AMF3 achieves this by referencing every single item that has been used once. Not only strings. Any object, including keys, is referenced (with an offset) as soon as it has been used once.
On large datasets, it makes a huge difference.
You might take a look at Hessian or Google protocol buffers if you want to use a binary protocol. I know for a fact hessian provides very good performance on the iPhone.
http://code.google.com/p/protobuf/
http://hessian.caucho.com/
Actually it's a pretty good question, and I have it too. I attended a session today at WWDC about client/server communication with the iPhone, and they kept telling us binary plists were far more efficient than JSON and XML, especially when it comes to parsing time.

The problem is that I'm still trying to find any server-side implementation of plist as a remoting protocol, whereas AMF has plenty of great implementations on the server side: WebORB, ZendAMF, BlazeDS, etc. So integrating AMF on the server side is a breeze. Unfortunately, on the client side the only option I found was Nesium's Cocoa AMF, but it doesn't support channel set authentication and it misses a client-side stub generator. I would look into it, but since this is no small task and I'm sure plenty of iPhone developers have already faced that issue, I want to make sure there really are no other options.
Up until now, I've been using Hessian with HessianKit, but it doesn't support authentication either and it's starting to be a limitation. Apple can say all they want about Flash, but at least they make it very easy to connect to a remote server.
You could also try plist, a native binary format. Any format, including AMF or even XML plist, can be reduced further with zip compression; zlib is part of the iPhone SDK.