How to decode a 16-bit signed binary file in the IEEE 754 standard - MATLAB

I have a file format called .ogpr (openGPR, a dead format used for ground-penetrating radar data), and I'm trying to read this file and convert it into a matrix using MATLAB.
In the first part of the file there is a JSON header that describes the characteristics of the data acquisition (number of traces, position, etc.), and in the second part there are two different data blocks.
The first block contains the 'real' GPR data, which I know is formatted as follows:
Multibyte binary data are little-endian
Floating point binary data follow the IEEE 754 standard
Integer data follow the two’s complement encoding
I also know the total number of bytes, as well as the number of bytes for each single 'slice' (we have 512 samples * 10 channels * 3971 slices [x2 bytes per sample]).
Furthermore: 'A Data Block of type Radar Volume stores a 3D array of radar Samples. At the moment, each sample value is stored in a 16-bit signed integer. Each Sample value is in volts in the range [-20, 20].'
The second block contains geolocation info.
I'd like to read the Data Block and convert it from that encoding, but it isn't clear where the data bytes start and end, or how to convert them from that encoding into numbers.
I tried this piece of code:
bin_data = ogpr_data(48:(length(ogpr_data)-1), 1);
writematrix(bin_data, 'bin_data.txt');
fileID = fopen('bin_data.txt', 'r', 'ieee-le');
format = 'uint16';
Data = fread(fileID, Inf, format);
fclose(fileID);

It looks like your posted code is mixing text files and binary files. The writematrix( ) routine writes values as comma-delimited text. Then you turn around and try to use fopen( ) and fread( ) to read this back as a binary file in IEEE little-endian format. These are two totally different things. You need to pick one format and use it consistently: either human-readable comma-delimited text files, or machine-readable binary IEEE-format files.
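Since you already have the raw bytes in ogpr_data, you can skip the text-file round trip entirely. Below is a minimal sketch, assuming ogpr_data is a uint8 column vector holding the whole file, the radar volume starts at byte 48 (taken from your snippet; verify against the JSON header), and the int16-to-volts mapping is a simple linear scaling over [-20, 20] (an unverified assumption):

nSamples  = 512;                             % samples per trace (from the header)
nChannels = 10;
nSlices   = 3971;
nValues   = nSamples * nChannels * nSlices;

% Pull out the data block and reinterpret byte pairs as int16. typecast
% uses the machine's native byte order, which is little-endian on x86/x64;
% apply swapbytes(vals) if you are on a big-endian machine.
raw  = ogpr_data(48 : 48 + 2*nValues - 1);   % 2 bytes per sample
vals = typecast(uint8(raw), 'int16');
vol  = reshape(vals, [nSamples, nChannels, nSlices]);

% Map the int16 range to volts in [-20, 20] (assumed linear scaling).
volts = double(vol) * (20 / 32767);

Alternatively, read straight from the .ogpr file: open it with fopen('yourfile.ogpr', 'r', 'ieee-le'), fseek past the JSON header, then read with fread(fid, nValues, 'int16=>double').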

Related

Read huge binary numbers in Scala

I have 2 huge numbers written in a .txt file in binary format. Both of them have about 800 digits. I want to read them from this file in Scala, but I can't find a suitable type that would be able to hold all the digits. It seems that even BigInt cuts off part of the digits.
The task itself is to add two numbers in binary representation and count the zeroes/ones. I wanted to operate on a String, so it would be easier to convert from binary to decimal.
So I would be grateful for any advice on which type is best to use in Scala for such numbers.

Why use Base64?

Base64 encoding increases the size of the input by around 37% when sent over the wire. If this is the case, why not use UTF-8 to encode the contents (say, a .jpg file)? That way the size of the file does not increase, right?
e.g.: If I want to send the string "asd", the UTF-8 encoded version of this will be 3 bytes, whereas the Base64 encoded version will be 4 bytes.
The purpose of Base64 is to allow binary data to be transferred over a communication channel that cannot be relied on to transfer all possible byte values end-to-end. In particular, Base64 is used where byte values between 128 and 255 cannot be easily and reliably transferred.
In contrast, UTF-8 is used to encode Unicode across a channel that can be assumed to reliably transfer all possible byte values end-to-end (sometimes referred to as an "8-bit clean" channel).
So, you have two problems with your proposal. First, a JPEG is binary data, not Unicode, so UTF-8 isn't really appropriate: if you "encode a JPEG as UTF-8" in the obvious way (treating the JPEG as a sequence of bytes, each associated with a Unicode code point from U+0000 to U+00FF, and then encoding those code points as UTF-8), it will double the size of all byte values in 128-255, so you'll have, on average, a 50% increase in file size. Second, even if you did this, the resulting encoded JPEG would require a communication channel that's 8-bit clean, so it couldn't be used in situations where Base64 is needed anyway.
Edit: In a comment, you asked if we couldn't use "input binary -> 7 bit ASCII encoding -> send over wire" to save space. I assume you mean taking the input binary as a long stream of bits and chopping them up into 7-bit chunks and sending those as ASCII? Yes, that could be done and would only increase size by 14%, but it's not just the non-ASCII byte values 128-255 that cause problems. In MIME email, where Base64 is most frequently used, differences in line-ending convention (carriage return, line feed, or a combination) from platform to platform, certain historical line length restrictions enshrined in the standards, and so on mean that not all ASCII characters (bytes 0-127) can be safely used. Base64 is not the best trade-off possible between compatibility and efficiency, but it's pretty close.
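To put numbers on that, here is a small MATLAB sketch (MATLAB to match the rest of this page; the random bytes stand in for JPEG data, and matlab.net.base64encode requires R2016b or later):

bytes = uint8(randi([0 255], 1, 1000));          % stand-in for JPEG bytes
utf8  = unicode2native(char(bytes), 'UTF-8');    % bytes as code points U+0000..U+00FF
b64   = char(matlab.net.base64encode(bytes));    % Base64 text
fprintf('original: %d bytes\n', numel(bytes));
fprintf('UTF-8:    %d bytes (+%.0f%%)\n', numel(utf8), 100*(numel(utf8)/numel(bytes) - 1));
fprintf('Base64:   %d chars (+%.0f%%)\n', numel(b64),  100*(numel(b64)/numel(bytes) - 1));

On random input about half the bytes fall in 128-255 and expand to two UTF-8 bytes, so the UTF-8 version prints roughly +50%, versus Base64's fixed +33% or so.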
Base64 is usually used to represent arbitrary binary data in a text format. It has a 33.3% overhead, but that's better than, say, hex notation, which doubles the size (100% overhead).
UTF-8 is a text encoding and cannot represent arbitrary binary data, which is what a JPEG file is.
There is little to no reason to convert binary data to text just to transfer it over the wire, and many times people do it because they don't know any better.
The only good reason to use it is when an API or library you depend on requires it.
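For what it's worth, the overhead comparison is easy to check in MATLAB (a sketch; matlab.net.base64encode needs R2016b+):

bytes  = uint8(0:255);                           % one of every byte value
hexStr = reshape(dec2hex(bytes, 2).', 1, []);    % 2 hex chars per byte -> +100%
b64Str = char(matlab.net.base64encode(bytes));   % 4 chars per 3 bytes  -> +~33%
fprintf('%d bytes -> %d hex chars, %d base64 chars\n', ...
        numel(bytes), numel(hexStr), numel(b64Str));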

MATLAB taking a huge 350 MB to write one column vector to a txt file

I have a variable sndpwr which has 18000 rows and just one column. When I write it with fprintf, the resulting txt file is 350 MB. Even csvwrite and dlmwrite produce 200+ MB files.
Can anyone suggest a function or method that will write it to a smaller text file? I am importing it into another program which cannot import such large files.
fid = fopen('sndpwr.txt','wt');
fprintf(fid,'%0.6f\r\n',sndpwr');
fclose(fid);
Thanks!
EDIT: in the workspace it is described as 31957476x1 double. Sorry for my previously incorrect data.
Unfortunately, there is no way to shrink your data without using an actual compression algorithm or a more compact representation. You have 3x10^7 numbers, each written with six digits after the decimal point, at least one before it, and a couple of newline characters. This gives 3x10^7 * 10 = 3x10^8 bytes as a bare minimum. Since 1 MB is approximately 10^6 bytes, you are getting a file on the order of 300 MB.
If you were to write the file in binary using the double datatype, the file would likely be about 20% smaller, since doubles are 64-bit (8-byte) numbers. If you were to use the single datatype, some information might be lost, since single only holds about 7 significant decimal digits, but the file would only be 40% of its current size.
If binary is not an option, you can always split the data into smaller text files.
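If binary is acceptable after all, a minimal sketch (assuming sndpwr is the 31957476x1 double vector and the other program can read raw binary):

% Write the vector as raw binary: 4 bytes per value as single (~128 MB),
% or 8 bytes per value as double (~256 MB) with no precision loss.
fid = fopen('sndpwr.bin', 'w');
fwrite(fid, sndpwr, 'single');
fclose(fid);

% Reading it back:
fid = fopen('sndpwr.bin', 'r');
sndpwr_back = fread(fid, Inf, 'single=>double');
fclose(fid);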

How does MATLAB read and interpret binary digits from a .bin file?

I have a binary file with the .bin extension. This file is created by data-acquisition software. Basically, a Measurement Computing 16-bit data-acquisition device receives signals from a transducer (after amplification by an amplifier) and sends them to a PC over USB. A program then generates a .bin file from the serial data received from the DAQ hardware. There are several ways to read this .bin file and plot the signal in MATLAB.
When I open this .bin file with a hex editor I can see the bytes (as hex, or as ones and zeros in binary). The thing is, I don't know how to interpret them. There are 208000 bytes in the file, acquired over 16 seconds. I was thinking each 2 bytes correspond to one sample, since the DAQ device has 16-bit resolution. So I thought, for example, a 16-bit value such as 1000100111110010 is converted by MATLAB to a corresponding voltage level. But I tried opening two different .bin files with different voltage levels, such as 1 V and 9 V, and the numbers still do not seem to be related in the way I expect.
How does MATLAB read and interpret binary digits from a .bin file?
Thanks,
Assuming your .bin file is literally just a dump of the recorded values, you can read the data using fread (see the documentation for more info):
fid = fopen('path_to_your_file', 'r');    % open the file for reading
nSamples = 104000;                        % 208000 bytes / 2 bytes per sample
data = fread(fid, nSamples, 'int16');     % read 16-bit signed integers
fclose(fid);
You will also need to know whether the data is signed or unsigned; if it's unsigned, pass 'uint16' as the third argument to fread instead. You should also find out whether it's big-endian or little-endian: check the original program's source code or documentation.
It's a good idea to record the sample rate at which you make acquisitions like this, because you'll be hard pressed to do anything but trivial analysis on it afterwards without knowing this information. Often this kind of data is stored in .wav files, so that both the data and its sample rate (and the bit depth, in fact) are stored in the file. That way you don't need a separate bit of paper to go along with your file (also, reading .wav files in MATLAB is extremely easy).
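For example, a sketch along those lines. The sample rate works out to 208000 bytes / 2 bytes per sample / 16 s = 6500 Hz if there is a single channel; 'path_to_your_file' and the little-endian byte order are placeholders to verify:

fid = fopen('path_to_your_file', 'r', 'ieee-le');   % or 'ieee-be' if big-endian
data = fread(fid, Inf, 'int16');
fclose(fid);

fs = 6500;                                  % samples per second (assumed, see above)
audiowrite('signal.wav', data / 32768, fs, 'BitsPerSample', 16);   % data + rate together
[y, fsBack] = audioread('signal.wav');      % later: both come back at once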

After encoding data size is increasing

I have text data in XML format, and its length is around 816814 bytes. It contains some image data as well as some text data.
We are using the ZLIB algorithm for compression, and after compressing, the compressed data length is 487239 bytes.
After compressing, we encode the data using BASE64Encoder. But after encoding the compressed data, the size increases: the length of the encoded data is 666748 bytes.
Why does the data size increase after encoding? Is there a better encoding technique?
Regards,
Siddesh
As noted, when you encode binary 8-bit bytes with 256 possible values into a smaller set of characters, in this case 64 values, you will necessarily increase the size. For a set of n allowed characters, the expansion factor for random binary input will be log(256)/log(n), at a minimum. For n = 64 that factor is log(256)/log(64) = 4/3, and indeed 487239 * 4/3 ≈ 649652, close to your observed 666748 (the remainder is line breaks and padding).
If you would like to reduce this impact, then use more characters. Chances are that whatever medium you are using, it can handle more than 64 characters transparently. Find out how many by simply sending all 256 possible bytes, and see which ones make it through. Test the candidate set thoroughly, and then ideally find documentation of the medium that backs up that set of n < 256.
Once you have the set, then you can use a simple hard-wired arithmetic code to convert from the set of 256 to the set of n and back.
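To get a feel for how much a bigger alphabet buys you, here is a short MATLAB sketch (MATLAB to match the rest of this page) evaluating that minimum expansion factor for a few alphabet sizes:

n = [16 32 64 85 128 240];              % alphabet sizes (hex, base32, base64, ...)
factor = log(256) ./ log(n);            % minimum expansion for random input
for k = 1:numel(n)
    fprintf('n = %3d -> expansion %.3f\n', n(k), factor(k));
end

Base64 (n = 64) costs 1.333x; an alphabet of 240 usable characters would get you down to about 1.012x.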
That is perfectly normal.
Base64 is required when your transmission medium is not designed to carry binary data, only textual data (e.g. XML).
So your zipped data gets Base64-encoded.
Plainly speaking, the transcoder changes "non-ASCII" bytes into an ASCII form, while still remembering how to map them back.
As a rule of thumb, that is around a 33% size increase (http://en.wikipedia.org/wiki/Base64#Examples).
This is the downside of Base64. You are better off using a protocol that supports file transfer... but for files embedded within XML, you are pretty much out of options.