How to convert binary data into string format in J2ME - encoding

I am using the gzip algorithm in J2ME. After compressing a string, I tried to send the compressed output as a text message, but the size increased drastically, so I used Base64 encoding to convert the compressed binary to text. The encoded data is still larger, though. Can anyone suggest an encoding technique that keeps the data size the same?
I also tried sending a binary SMS, but since its limit is 134 characters I want to compress the data before sending it.

You have some competing requirements here.
The fact you're considering using SMS as a transport mechanism makes me suspect that the data you have to send is quite short to start with.
Compression algorithms generally work best with large amounts of data; start with something very short and they can produce output that is longer than the input.
There are very few useful encoding changes that leave the output the same length as the input. (I'm struggling to think of anything really useful right now.)
You may want to consider alternative transmission methods or alternatives to the compression techniques you have tried.
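For reference, Base64 inflates data by design: every 3 input bytes become 4 output characters, so roughly 33% growth is unavoidable, which is exactly what you are seeing. A minimal sketch of the arithmetic in standard Java (J2ME itself has no built-in Base64 class, so this is illustrative only):

    import java.util.Base64;
    import java.util.Random;

    public class Base64Overhead {
        public static void main(String[] args) {
            byte[] binary = new byte[300];      // stand-in for gzip output
            new Random(42).nextBytes(binary);

            String encoded = Base64.getEncoder().encodeToString(binary);

            // 300 bytes -> 400 characters: ceil(n / 3) * 4
            System.out.println("binary bytes: " + binary.length);
            System.out.println("base64 chars: " + encoded.length());
        }
    }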

Related

Using the CBOR format instead of JSON in elasticsearch ingest plugin

In the documentation of the Ingest Attachment Processor Plugin in Elasticsearch, it is mentioned: "If you do not want to incur the overhead of converting back and forth between base64, you can use the CBOR format instead of JSON and specify the field as a bytes array instead of a string representation. The processor will skip the base64 decoding then." Could anyone please throw some light on this, or maybe share an example of how to achieve this? I need to index a very large number of documents of significant size, so I need to minimise latency.
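What the documentation describes is serializing the entire ingest request as CBOR, with the file content as a raw byte string instead of a base64 string, and sending it with a CBOR content type. A hedged sketch using Jackson's CBOR dataformat; the index name, pipeline name, field name, and file path below are illustrative assumptions, not part of the plugin's API:

    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.dataformat.cbor.CBORFactory;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Map;

    public class CborIngest {
        public static void main(String[] args) throws Exception {
            byte[] fileBytes = Files.readAllBytes(Paths.get("report.pdf")); // hypothetical file

            // Jackson writes byte[] as a native CBOR byte string,
            // so no base64 round trip is involved.
            ObjectMapper cbor = new ObjectMapper(new CBORFactory());
            byte[] body = cbor.writeValueAsBytes(Map.of("data", fileBytes));

            // "my-index" and the "attachment" pipeline name are placeholders.
            URL url = new URL("http://localhost:9200/my-index/_doc/1?pipeline=attachment");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("PUT");
            conn.setRequestProperty("Content-Type", "application/cbor");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }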

Is it possible to compress a piece of already-compressed-data by encrypting or encoding it?

Many compression algorithms take advantage of the fact that there's redundancy/patterns in data. aaaaaaaaaabbbbbbbbbbbcccccccccccc could be compressed to 10'a'11'b'12'c', for example.
But since there's no more redundancy in my compressed data, I can't really compress it further. However, I can encrypt or encode it and turn it into a different string of bytes: xyzxyzxyzxyzxyz.
If the resulting bits just so happened to have a pattern in them, it seems that it'd be easy to take advantage of that: 5'xyz'.
Here's what our flow looks like:
Original: aaaaaaaaaabbbbbbbbbbbcccccccccccc
Compressed: 10'a'11'b'12'c'
Encrypted: xyzxyzxyzxyzxyz
Compressed again: 5'xyz'
But the more data you have and the larger your file, the more effective many forms of encryption will be. Huffman encoding, especially, seems like it'd work really well on random bits of data, especially when the file gets pretty large!
I imagine this would be atrocious when you need data fast, but I think it could have merit for storing archives or other stuff like that. Maybe downloading a movie over a network would take only 1MB of bandwidth instead of 4MB. Then you could unpack the movie as the download happened, getting the full 4MB file on your hard drive without destroying your network's bandwidth.
So I have a few questions:
Do people ever encode data so it can be compressed better?
Do people ever "double-compress" their data?
Are there any well-known examples of "double" compression, where data is compressed, encrypted or encoded, then compressed again?
Good encryption produces high-quality random data, so it cannot be compressed. The probability of a compressible result "just so happening" to come out of an encryption is the same as it would be from any other random data source, which is to say: never in practice.
Double compression is like perpetual motion: it is an oft-discussed idea, but it never works. If it worked, you could compress and compress and compress and get the file down to 1 bit. See:
How many times can a file be compressed?
The fundamental problem is that most files are NOT compressible, and random or encrypted files even less so. (A simple counting argument shows why: a lossless compressor that made every file shorter would have to map 2^n distinct inputs of length n onto fewer than 2^n distinct shorter outputs, which is impossible.)
To answer your questions:
1) Yes! See Burrows-Wheeler compression.
2) No (see the sketch below).
3) No.
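A quick way to convince yourself of point 2 is to gzip data that is already compressed and watch it get slightly larger: the second pass finds no redundancy to remove and still has to add its own header and trailer. A minimal sketch:

    import java.io.ByteArrayOutputStream;
    import java.util.zip.GZIPOutputStream;

    public class DoubleCompress {
        static byte[] gzip(byte[] input) throws Exception {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(input);
            }
            return bos.toByteArray();
        }

        public static void main(String[] args) throws Exception {
            byte[] text = "aaaaaaaaaabbbbbbbbbbbcccccccccccc".repeat(100).getBytes("UTF-8");

            byte[] once = gzip(text);   // redundancy removed: much smaller
            byte[] twice = gzip(once);  // nothing left to remove: slightly larger

            System.out.println("original:      " + text.length);
            System.out.println("gzipped once:  " + once.length);
            System.out.println("gzipped twice: " + twice.length);
        }
    }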

Marc21 Binary Decoder with Akka-Stream

I'm trying to decode MARC 21 binary data records, which have the following specification concerning the field that provides the length of the record:
A Computer-generated, five-character number equal to the length of the
entire record, including itself and the record terminator. The number
is right justified and unused positions contain zeros.
I am trying to use Akka Stream's Framing.lengthField; however, I just don't know how to specify the size of that field. I imagine a character is 8 bits, maybe 16 for a number; I am not sure, and I wonder if that depends on the platform or language. In short, the question is: is it possible to say what the size of that field is, knowing that I am in Scala/Java?
Also, what does this mean:
"The number is right justified and unused positions contain zeros"
Does that have implications for how one reads the value, if it is collected properly?
If anyone knows anything about this, please share.
EDIT1
Context:
I am trying to build a stream-processing graph whose first stage processes the result of a sys command run against a Symphony (vendor cataloging system) server: a stream of unstructured byte chunks which, as a whole, represents all the MARC 21 records requested (full dump or partial dump).
By processing I mean chunking that unstructured stream of bytes into a stream of frames, where each frame is a record.
In other words, reading the bytes for one record at a time and emitting each record individually to the next stage.
The next stage will consist of emitting that record (bytes) to Apache Kafka.
Obviously the emission stage will be fully parallelized to speed up the process.
The Symphony server has no capability to stream a dump when requested, especially over the network. Hence this Akka Streams based processing graph, which performs that work for fast ingestion/production and overall stream processing of our dumps in our fast-data infrastructure.
EDIT2
Based on @badcook's input, I wonder if computeFrameSize could be used here. I'm not sure; I am slightly confused by that function and what it takes as parameters.
A little clarification would be much appreciated.
It looks like you're trying to parse MARC 21 records.
In that case I would recommend you just take a look at MARC4J and use that.
If you want to integrate it with Akka Streams, or even if you want to parse MARC records your own way, I would recommend breaking up your byte stream into complete MARC records with Framing.delimiter, using the MARC 21 record terminator (ASCII control character 1D), rather than trying to stream and work with fragments of MARC records. It'll be a lot easier.
As for your specific questions: the MARC 21 specification uses characters rather than raw bytes when talking about its structure. It specifies two character encodings into raw bytes, UTF-8 and MARC-8, both of which are variable-width encodings. Hence, no, it is not true that every character is a byte; there is no single answer to how many bytes a character takes up.
"[R]ight justified and unused positions contain zeroes" is another way of saying that numbers are padded from the left with 0s. In this case this line comes from a larger quote staying that the numerical string must be 5 characters long. That means if you are trying to represent the number 1, you must represent it as 00001.

iPhone, JSon and Compression

I'm building an iPhone app that's going to be sending and receiving large amounts of data to and from a server. I'm using JSON for the data.
I was wondering if it's possible to also use some sort of compression on the received data to try to speed up the process a little. If so, which kind of compression works best with JSON, and where can I find more info on it?
Thanks,
Late to the party, but just in case anyone is looking for this:
use ASIHTTPRequest, which has built-in support for gzip compression.
This will save you the overhead of handling decompression yourself.
gzip at ASIHTTPRequest
The iPhone supports zlib, but I think it's a better idea to have your server support compression, since NSURLRequest accepts gzip encoding in server responses. As JSON is serializable, this might be the best option for you.
With zlib you can also use compression from the client side.
JSON itself doesn't really care what kind of compression you use on your data so you are free to choose the compression scheme that best fits the data and provides the best size/performance.
However JSON expects all data to be in UTF-8 format so you need to encode the compressed data, e.g. by using base64 encoding.
There are at least two algorithms used for JSON compression (CJson & HPack).
If the client device supports gzip, then there is no benefit of using JSON compression. When using both: gzip compression & json compression, the improvement is negligible. Using JSON compression does make sense, when gzip is disabled or not supported.
I think the HPack (also known as JSONH) compression algorithm combined with gzip is a good option if you are very concerned about data size.
I tried compressing simple JSON data consisting of an array of objects, using two methods of compression:
gzip
JSONH+gzip
The result of JSONH+gzip was around 7% smaller than the result of using gzip alone. In my case this was a significant number, and I went ahead with the mixed implementation.
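For a feel of the numbers, here is a minimal gzip round trip (plain Java rather than Objective-C, but zlib/gzip behaves the same on the iPhone side); the repetitive array-of-objects JSON is made up, and it is exactly the shape JSONH targets:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    public class JsonGzip {
        public static void main(String[] args) throws Exception {
            // Build a repetitive array of objects, e.g. [{"id":0,"name":"user0"}, ...]
            StringBuilder json = new StringBuilder("[");
            for (int i = 0; i < 500; i++) {
                json.append("{\"id\":").append(i)
                    .append(",\"name\":\"user").append(i).append("\"},");
            }
            json.setCharAt(json.length() - 1, ']');
            byte[] plain = json.toString().getBytes("UTF-8");

            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(plain);
            }
            byte[] packed = bos.toByteArray();

            // Decompress to verify the round trip is lossless
            byte[] restored = new GZIPInputStream(
                    new ByteArrayInputStream(packed)).readAllBytes();

            System.out.println("plain:  " + plain.length + " bytes");
            System.out.println("gzip:   " + packed.length + " bytes");
            System.out.println("intact: " + new String(restored, "UTF-8").equals(json.toString()));
        }
    }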

Should I move big data blobs in JSON or in separate binary connection?

QUESTION:
Is it better to send large data blobs in JSON for simplicity, or send them as binary data over a separate connection?
If the former, can you offer tips on how to optimize the JSON to minimize size?
If the latter, is it worth it to logically connect the JSON data to the binary data using an identifier that appears in both, e.g. as "data" : "< unique identifier >" in the JSON, with the first bytes of the data blob being < unique identifier >?
CONTEXT:
My iPhone application needs to receive JSON data over the 3G network. This means that I need to think seriously about efficiency of data transfer, as well as the load on the CPU.
Most of the data transfers will be relatively small packets of text data for which JSON is a natural format and for which there is no point in worrying much about efficiency.
However, some of the most critical transfers will be big blobs of binary data -- definitely at least 100 kilobytes of data, and possibly closer to 1 megabyte as customers accumulate a longer history with the product. (Note: I will be caching what I can on the iPhone itself, but the data still has to be transferred at least once.) It is NOT streaming data.
I will probably use a third-party JSON SDK -- the one I am using during development is here.
Thanks
You could try compressing the JSON (with gzip, perhaps) before you send it and then decompressing it on the client side.
But I'm not sure how that would affect iPhone performance.
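If you do go with a separate binary connection, the identifier linking described in the question can be as simple as a UUID that appears in the JSON and as a fixed-length prefix on the blob. A hedged sketch of that pattern (field names and sizes are made-up illustrations, shown in Java for brevity):

    import java.io.ByteArrayOutputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.UUID;

    public class BlobLinking {
        public static void main(String[] args) throws Exception {
            byte[] blob = new byte[200_000];           // stand-in for the real binary payload
            String id = UUID.randomUUID().toString();  // 36 ASCII characters

            // Small message for the JSON channel
            String json = "{\"type\":\"history\",\"data\":\"" + id + "\"}";

            // Binary channel: fixed-length id prefix, then the raw blob
            ByteArrayOutputStream binary = new ByteArrayOutputStream();
            binary.write(id.getBytes(StandardCharsets.US_ASCII));
            binary.write(blob);

            // Receiver side: peel off the first 36 bytes to re-associate the blob
            byte[] frame = binary.toByteArray();
            String receivedId = new String(frame, 0, 36, StandardCharsets.US_ASCII);
            System.out.println(json);
            System.out.println("match: " + receivedId.equals(id));
        }
    }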