I'm building an iPhone app that will be sending and receiving large amounts of data to and from a server. I'm using JSON to get the data.
I was wondering if it's possible to also use some sort of compression on the received data to try to speed up the process a bit. If so, which kind of compression works best with JSON, and where can I find more info on it?
Thanks,
Late to the party, but just in case anyone is looking for this:
Use ASIHTTPRequest, which has built-in support for gzip compression.
This saves you the overhead of handling decompression yourself.
See: gzip at ASIHTTPRequest
The iPhone supports zlib, but I think it's a better idea to have your server do the compression, since NSURLRequest transparently accepts gzip-encoded responses from the server. As JSON serializes to plain text, it compresses well, so this is probably the best option for you.
With zlib you can also apply compression from the client side.
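For illustration, here is a minimal sketch of that client-side compression step (written in Python rather than Objective-C just to keep it short, with a made-up endpoint URL; your server would need to be configured to decompress the request body):

    import json
    import urllib.request
    import zlib

    payload = {"readings": [{"id": i, "value": i * 0.5} for i in range(1000)]}
    raw = json.dumps(payload).encode("utf-8")

    # Compress the request body on the client side before uploading.
    compressed = zlib.compress(raw)

    req = urllib.request.Request(
        "https://example.com/api/upload",   # hypothetical endpoint
        data=compressed,
        headers={
            "Content-Type": "application/json",
            "Content-Encoding": "deflate",  # zlib-wrapped deflate, per the HTTP spec
        },
    )
    # urllib.request.urlopen(req)  # only works if the server decompresses "deflate" bodies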
JSON itself doesn't really care what kind of compression you use on your data, so you are free to choose the compression scheme that best fits the data and gives the best size/performance trade-off.
However, JSON expects all data to be UTF-8 text, so if you embed compressed (binary) data inside a JSON document you need to encode it first, e.g. with base64.
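As a rough sketch of that encode-before-embedding step (Python here purely for illustration; any language with zlib and base64 support works the same way):

    import base64
    import json
    import zlib

    # Some binary payload we want to carry inside a JSON document.
    blob = b"aaaaaaaaaabbbbbbbbbbbcccccccccccc" * 100

    compressed = zlib.compress(blob)
    # JSON can only hold text, so base64-encode the compressed bytes first.
    wrapped = {"payload": base64.b64encode(compressed).decode("ascii")}
    doc = json.dumps(wrapped)

    # Receiving side: reverse the steps.
    decoded = base64.b64decode(json.loads(doc)["payload"])
    assert zlib.decompress(decoded) == blob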
There are at least two algorithms used for JSON compression (CJson & HPack).
If the client device supports gzip, then there is usually little benefit to using JSON compression: when using both gzip and JSON compression, the additional improvement is negligible. Using JSON compression really makes sense when gzip is disabled or not supported.
I think the HPack (also known as JSONH) algorithm combined with gzip is a good option if you are really concerned about data size.
I tried compressing a simple JSON document containing an array of objects, using two methods:
gzip
JSONH + gzip
The result of JSONH + gzip was around 7% smaller than the result of gzip alone. In my case that was a significant difference, so I went ahead with the combined implementation; a rough sketch of the idea follows below.
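As that sketch (Python for illustration; this is a simplified JSONH-style packing, not the exact JSONH/HPack spec): flatten the homogeneous array so the keys appear only once, then gzip the result and compare.

    import gzip
    import json

    # An array of homogeneous objects: the case JSONH/HPack targets.
    records = [{"id": i, "name": "user%d" % i, "active": i % 2 == 0} for i in range(500)]
    plain = json.dumps(records).encode("utf-8")

    # Simplified JSONH-style packing: key count, the key names once, then only the values.
    keys = list(records[0].keys())
    packed = json.dumps([len(keys)] + keys + [r[k] for r in records for k in keys]).encode("utf-8")

    print("gzip only:         ", len(gzip.compress(plain)))
    print("JSONH-style + gzip:", len(gzip.compress(packed)))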
Many compression algorithms take advantage of the fact that there is redundancy/patterns in data. aaaaaaaaaabbbbbbbbbbbcccccccccccc could be compressed to 10'a'11'b'12'c', for example.
But since there's no more redundancy in my compressed data, I can't really compress it further. However, I can encrypt or encode it and turn it into a different string of bytes: xyzxyzxyzxyzxyz.
If the random bits just so happened to have a pattern in them, it seems that it'd be easy to take advantage of that: 5'xyz'
Here's what our flow looks like:
Original: aaaaaaaaaabbbbbbbbbbbcccccccccccc
Compressed: 10'a'11'b'12'c'
Encrypted: xyzxyzxyzxyzxyz
Compressed again: 5'xyz'
But the more data you have and the larger your file, the more effective many forms of compression seem like they'd be. Huffman coding, especially, seems like it'd work really well on random bits of data, especially when the file gets pretty large!
I imagine this would be atrocious when you need data fast, but I think it could have merits for storing archives, or other stuff like that. Maybe downloading a movie over a network would only take 1MB of bandwidth instead of 4MB. Then, you could unpack the movie as the download happened, getting the full 4MB file on your hard drive without destroying your network's bandwidth.
So I have a few questions:
Do people ever encode data so it can be compressed better?
Do people ever "double-compress" their data?
Are there any well-known examples of "double" compression, where data is compressed, encrypted or encoded, then compressed again?
Good encryption produces high-quality random data, so it cannot be compressed. The probability that a compressible result "just so happens" to come out of an encryption step is the same as it would be for any other random data source, which is effectively never.
Double compression is like perpetual motion: an oft-discussed idea that never works. If it worked, you could compress and compress and compress and get the file down to 1 bit. See:
How many times can a file be compressed?
The fundamental problem is that most files are NOT compressible; random or encrypted files even less so. (A quick experiment below makes this concrete.)
To answer your questions:
1) Yes! See Burrows-Wheeler compression.
2) no.
3) no.
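Here is a quick way to convince yourself of this (a Python sketch; os.urandom stands in for well-encrypted data): redundant data compresses, but compressing the compressed output again makes it larger, and random bytes don't compress at all.

    import os
    import zlib

    original = b"a" * 10 + b"b" * 11 + b"c" * 12
    once = zlib.compress(original)
    twice = zlib.compress(once)            # "double compression"
    random_bytes = os.urandom(100_000)     # stand-in for well-encrypted data

    # The second pass adds overhead instead of shrinking anything further.
    print(len(original), len(once), len(twice))
    # The "compressed" random data comes out slightly larger than the input.
    print(len(random_bytes), len(zlib.compress(random_bytes)))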
From reading some stuff about TOAST I've learned Postgres uses an LZ-family compression algorithm, which it calls PGLZ. And it kicks in automatically for values larger than 2KB.
How does PGLZ compare to GZIP in terms of speed and compression ratio?
I'm curious to know if PGLZ and GZIP have similar speeds and compression rates such that doing an extra GZIP step before inserting large JSON strings as data into Postgres would be unnecessary or harmful.
It's significantly faster, but has a lower compression ratio than gzip. It's optimised for lower CPU costs.
There's definitely a place for gzip'ing large data before storing it in a bytea field, assuming you don't need to manipulate it directly in the DB, or don't mind having to use a function to un-gzip it first. You can do it with things like plpython or plperl if you must do it in the DB, but it's usually more convenient to just do it in the app.
If you're going to go to the effort of doing extra compression though, consider using a stronger compression method like LZMA.
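As a sketch of the do-it-in-the-app approach (Python with psycopg2; the connection string, table and column names are hypothetical, and you could swap the gzip module for lzma if you want the stronger compression mentioned above):

    import gzip
    import json

    import psycopg2

    doc = {"events": [{"id": i, "note": "x" * 50} for i in range(10000)]}
    compressed = gzip.compress(json.dumps(doc).encode("utf-8"))

    conn = psycopg2.connect("dbname=mydb")   # hypothetical connection string
    with conn, conn.cursor() as cur:
        # big_docs(id int, body_gz bytea) is a hypothetical table.
        cur.execute(
            "INSERT INTO big_docs (id, body_gz) VALUES (%s, %s)",
            (1, psycopg2.Binary(compressed)),
        )
    # Reading it back: SELECT body_gz, then gzip.decompress() in the app.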
There have been efforts to add support for gzip and/or LZMA compression to TOAST in PostgreSQL. The main problem has been that we need to maintain compatibility with the on-disk format of older versions, make sure it stays compatible into the future, etc. So far nobody has come up with an implementation that satisfies the relevant core team members. See e.g. pluggable compression support. It tends to get stuck in a catch-22: pluggable support gets rejected (see that thread for why), but nobody can agree on a suitable, software-patent-safe algorithm to adopt as a new default, or on how to change the format to handle multiple compression methods, etc.
Custom compression methods are becoming a reality. As reported here:
https://www.postgresql.org/message-id/20180618173045.7f734aca%40wp.localdomain
"synthetic tests showed that zlib gives more compression but usually slower than pglz"
I need to save a JSON document of about 20 MB (it includes some base64-encoded JPEG images).
Is there any performance advantage to saving it in a binary field, a JSON field, or a text field?
Any suggestions on how best to save it?
The most efficient way to store this would be to extract the image data, base64-decode it, and store it in a bytea field. Then store the rest of the json in a json or text field. Doing that is likely to save you quite a bit of storage because you're storing the highly compressed JPEG data directly, rather than a base64-encoded version.
If you can't do that, or don't want to, you should just shove the whole lot in a json field. PostgreSQL will attempt to compress it, but base64 of a JPEG won't compress too wonderfully with the fast-but-not-very-powerful compression algorithm PostgreSQL uses, so it'll likely be significantly bigger.
There is no difference in storage terms between text and json. (jsonb, in 9.4, is different - it's optimised for fast access, rather than compact storage).
For example, if I take this 17.5MB JPEG, it's 18MB as bytea. Base64-encoded it's 24MB uncompressed. If I shove that into a json field with minimal json syntax wrapping it remains 24MB - which surprised me a little, I expected to save some small amount of storage with TOAST compression. Presumably it wasn't considered compressible enough.
(BTW, base64 encoded binary isn't legal as an unmodified json value as you must escape slashes)
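A minimal sketch of the extract-and-store-separately approach from the start of this answer (Python with psycopg2; the file name, the "photo" key and the table/column names are all made up for the example):

    import base64
    import json

    import psycopg2

    with open("payload.json", "r", encoding="utf-8") as f:   # the ~20 MB document
        doc = json.load(f)

    # Pull the base64 JPEG out of the JSON (stored under a hypothetical "photo" key)
    # and decode it back to raw binary.
    jpeg_bytes = base64.b64decode(doc.pop("photo"))
    remaining_json = json.dumps(doc)

    conn = psycopg2.connect("dbname=mydb")   # hypothetical connection string
    with conn, conn.cursor() as cur:
        # documents(id int, meta json, photo bytea) is a hypothetical table.
        cur.execute(
            "INSERT INTO documents (id, meta, photo) VALUES (%s, %s, %s)",
            (1, remaining_json, psycopg2.Binary(jpeg_bytes)),
        )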
I am using the gzip algorithm in J2ME. After compressing the string I tried to send the compressed data as a text message, but the size increased drastically. So I used base64 encoding to convert the compressed binary to text, but the encoding increases the size again. Please suggest an encoding technique where the data size stays the same.
I tried sending a binary SMS, but as its limit is 134 characters I want to compress the data before sending the SMS.
You have some competing requirements here.
The fact you're considering using SMS as a transport mechanism makes me suspect that the data you have to send is quite short to start with.
Compression algorithms (in general) work best with large amounts of data and can end up creating a longer output than you started with if you start with something very short.
There are very few useful encoding changes that will leave you with output the same length as when you started. (I'm struggling to think of anything really useful right now.)
You may want to consider alternative transmission methods or alternatives to the compression techniques you have tried.
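To see both effects at once, here is a quick illustration (Python, with a made-up short message): compression overhead dominates on short inputs, and base64 then adds roughly a third on top.

    import base64
    import zlib

    msg = "Meet at the usual place at 7".encode("utf-8")   # a short, SMS-sized message

    compressed = zlib.compress(msg)
    encoded = base64.b64encode(compressed)

    print(len(msg))         # 28 bytes of plain text
    print(len(compressed))  # usually not smaller: header/overhead dominates short inputs
    print(len(encoded))     # base64 adds roughly 33% on top of that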
QUESTION:
Is it better to send large data blobs in JSON for simplicity, or send them as binary data over a separate connection?
If the former, can you offer tips on how to optimize the JSON to minimize size?
If the latter, is it worth it to logically connect the JSON data to the binary data using an identifier that appears in both, e.g., as "data" : "< unique identifier >" in the JSON and with the first bytes of the data blob being < unique identifier > ?
CONTEXT:
My iPhone application needs to receive JSON data over the 3G network. This means that I need to think seriously about efficiency of data transfer, as well as the load on the CPU.
Most of the data transfers will be relatively small packets of text data for which JSON is a natural format and for which there is no point in worrying much about efficiency.
However, some of the most critical transfers will be big blobs of binary data -- definitely at least 100 kilobytes of data, and possibly closer to 1 megabyte as customers accumulate a longer history with the product. (Note: I will be caching what I can on the iPhone itself, but the data still has to be transferred at least once.) It is NOT streaming data.
I will probably use a third-party JSON SDK -- the one I am using during development is here.
Thanks
You could try to compress the JSON (with gzip, perhaps) before you send it and then decompress it on the client side.
But I'm not sure how that affects iPhone performance.
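A rough sketch of that round trip (Python standing in for both ends; in practice the server framework would set Content-Encoding: gzip, and NSURLConnection on the iPhone will usually decompress such a response for you automatically):

    import gzip
    import json

    # Server side: serialize and compress the JSON before sending it.
    payload = {"history": [{"day": i, "events": ["tap", "swipe"]} for i in range(365)]}
    body = gzip.compress(json.dumps(payload).encode("utf-8"))
    # ...send body with the header "Content-Encoding: gzip"...

    # Client side: decompress, then parse as usual.
    received = json.loads(gzip.decompress(body).decode("utf-8"))
    assert received == payload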