I am reading binary data from instrumentation using the Matlab udp() object.
I am surprised by the apparent lack of support for reading arbitrary length data types. How does one read a 24-bit integer? Or a 24-bit float? These are not that strange in instrumentation, and I have found only 8/16/32/64 data types in the documentation.
Have you tried help fread? The documentation shows it supports reading up to 64 bits at a time using bitN, where N is a value between 1 and 64.
fid = udp(<your parameters here>); % use fopen to open the stream.
...
A = fread(fid,1,'bit24=>int32'); % stream 24 bits to a 32 bit integer.
B = fread(fid,1,'ubit24=>uint32'); % stream 24 bits to a 32 bit unsigned integer.
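Since floating point specs vary, this may or may not work for your situation. Note that bitN always reads a signed integer, so this converts a 24-bit integer to single; it does not decode a 24-bit floating point layout:
C = fread(fid,1,'bit24=>float32'); % read a 24-bit signed integer and return it as a 32-bit float (MATLAB spec)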
UPDATE
Since the udp/fread implementation does not support this conversion, there are a couple of not-so-pretty workarounds you can try.
Read in uchar data in multiples of three and then multiply each byte by its place value (1, 256, 65536). For example:
% First determine the number of bytes on the stream and make sure you
% have at least 3 bytes to read so you can calculate thirdOfBytesExpected.
% fread fills its output column by column, so read 3 rows and transpose:
% each row of anMx3result is then one 3-byte sample.
[an3xMresult, packetCount] = fread(fid,[3,thirdOfBytesExpected],'uchar');
anMx3result = an3xMresult.';
unsigned24bitInt = anMx3result*(2.^(0:8:16))'; % assumes little-endian byte order
To be precise, unsigned24bitInt is actually stored as a MATLAB double here. So if you need to write it elsewhere, you will need to convert it back to the individual uchar bytes it came from.
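The byte-weighting trick yields an unsigned value. If the instrument sends signed 24-bit integers in two's complement (an assumption; check your device's documentation), you can sign-extend in double arithmetic. A minimal sketch, using hypothetical little-endian bytes with one sample per row:

```matlab
% Hypothetical bytes: one 24-bit sample per row, least significant byte first.
% Expected values: 1, -1 and -8388608 (the most negative 24-bit value).
raw = uint8([1 0 0; 255 255 255; 0 0 128]);
u = double(raw) * (2 .^ (0:8:16)).';   % unsigned value in 0 .. 16777215
s = u - (u >= 2^23) * 2^24;            % sign bit set -> subtract 2^24
```

This mirrors what 'bit24=>int32' would give for the same bytes, without needing fread to support bitN on the udp object.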
The other not-so-pretty option is to accept the overhead of writing the data to an interim binary file, so that you can then use the base fread method mentioned above. Not an ideal solution, but perhaps worth considering if you just need something to work.
% your original code for opening the udp handle (called udpFid below)
...
tmpFid = fopen('tmp.bin','w+'); % 'w+' opens for reading and writing ('rw' is not a valid permission)
[ucharVec, bytesRead] = fread(udpFid,bytesExpected,'uchar'); % values 0..255, returned as double
bytesWritten = fwrite(tmpFid,ucharVec,'uchar');
% Do some quality control on bytes read vs written ...
fseek(tmpFid,-bytesWritten,'cof');
% in theory you should be able to just go to the beginning
% of the file each time like this: fseek(tmpFid, 0, 'bof');
% provided you also reset to the beginning before writing or after reading
% Read in the data as described originally
num24BitIntsToRead = bytesWritten/3;
A = fread(tmpFid,num24BitIntsToRead,'bit24=>int32');
fclose(tmpFid);
Here is the code:
x = rand(5)*100;
save('pqfile.txt','x','-ascii','-tabs')
The above works, but:
x = rand(5)*100;
x = uint8(x);
save('pqfile.txt','x','-ascii','-tabs')
says:
Warning: Attempt to write an unsupported data type to an ASCII file.
Variable 'x' not written to file.
Does anyone know why this happens? How come I can't save the data when it is uint8? I have to read the data into a VHDL testbench, so I was experimenting. I guess the only option is to save my 8-bit unsigned integer values in a 2-D array using fprintf and then read them into the testbench.
ASCII option
The save function is restrictive in what it supports: with -ascii it does not write integer types such as uint8, which is why your variable is skipped. It also uses floating point notation to represent your numbers, which bloats your file when you are dealing with a limited range of integers like yours (uint8, 0 to 255).
Check out dlmwrite as an alternative.
It takes the filename to write to, the variable to store, and some additional parameters, like the delimiter you want to separate your values with.
For your example, it looks like this:
x = rand(5)*100;
x = uint8(x);
dlmwrite('pqfile.txt',x,'\t');
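If you would rather control the exact text layout yourself (the fprintf route mentioned in the question), a sketch along these lines should also work. The row-format string is built to match however many columns x has:

```matlab
x = uint8(rand(5) * 100);
fid = fopen('pqfile.txt', 'w');
if fid == -1
    error('Could not open pqfile.txt for writing.');
end
% One %d per column, separated by tabs, with a newline after the last column.
rowFmt = [repmat('%d\t', 1, size(x, 2) - 1), '%d\n'];
% fprintf consumes its arguments in column-major order, so pass x.' to
% print x row by row.
fprintf(fid, rowFmt, x.');
fclose(fid);
```

Each line of the file then holds one row of plain integers, which is usually the easiest thing for a VHDL testbench to parse with textio.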
Binary option
If you are looking to store your uint8 data as single bytes, then you probably want to go with a custom binary file instead of ASCII. (Yes, you can convert uint8 to single ASCII characters, but then you run into issues with values being interpreted as your delimiters, such as newlines or tabs.)
fid=fopen('pqfile.dat','wb');
if(fid>2)
fwrite(fid,size(x),'uint8'); % Note: change the data type here if you are dealing with more than 255 rows or columns
fwrite(fid,x','uint8'); % Transpose x (with x') so it is stored in row order.
fclose(fid);
else
fprintf(1,'Could not open the file for writing.\n');
end
I'm not sure what type of parser you are using for your VHDL, but this will pack your data into a file with a short header of the expected dimensions followed by one long row of your serialized data.
To read it back in with MATLAB, you can do this:
fid = fopen('pqfile.dat','rb');
szX = fread(fid,2,'uint8');
x = fread(fid,[szX(2) szX(1)],'*uint8')'; % read cols-by-rows, then transpose back to rows-by-cols for MATLAB
fclose(fid);
The transpose operations are necessary in MATLAB because it stores arrays (and reads and writes them) in column-major order, whereas most other languages (in my experience) use row-major order.
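A small sketch of the difference, using a hypothetical 2x3 matrix and a temporary file: x(:) lists the elements in MATLAB's column-major order, while writing x.' (or x') makes fwrite emit one row after another.

```matlab
x = [1 2 3; 4 5 6];              % hypothetical 2x3 example
colMajor = x(:).';               % [1 4 2 5 3 6]: MATLAB's storage order
rowMajor = reshape(x.', 1, []);  % [1 2 3 4 5 6]: what a row-major reader expects

% fwrite writes elements in storage order, so writing x.' puts the rows on disk.
fname = [tempname() '.dat'];
fid = fopen(fname, 'w');
fwrite(fid, x.', 'uint8');
fclose(fid);
fid = fopen(fname, 'r');
onDisk = fread(fid, Inf, 'uint8').';  % [1 2 3 4 5 6]
fclose(fid);
delete(fname);
```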
I am attempting to quantize a 16 bit .wav file to a lower bit rate using Matlab. I've opened the file using wavread() but I am unsure of how to proceed from here. I know that somehow I need to "round" each sample value to (for example) a 7 bit number. Here's the code that's reading the file:
[file,rate,bits] = wavread('smb.wav');
file is a 1 column matrix containing the values of each sample. I can loop through each item in that matrix like so:
for i=1 : length(file)
% not sure what to put here..
end
Could you point me in the right direction to quantize the data?
If you have int16 data, varying from -32768 to +32767, it can be as simple as
new_data = int8(old_data./2^8);
That won't even require a for loop.
For doubles scaled to the range [-1, 1), it would be
new_data = int8(old_data.*2^7);
The wavread documentation shows that you can even retrieve the data in its native integer format to begin with (int16 for a 16-bit file), which the int16 version above can then handle:
[file,rate,bits] = wavread('smb.wav','native');
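Since the question asks for 7 bits rather than 8, here is a minimal sketch for an arbitrary word length, assuming double samples in [-1, 1) as wavread returns by default. The sample values are made up for illustration, and rounding plus clamping is one common choice (truncation would be another):

```matlab
nBits = 7;                           % target word length from the question
y = [-1; -0.5; 0; 0.3; 0.999];       % hypothetical double samples in [-1, 1)
scale = 2^(nBits - 1);               % 64 steps per unit for 7 bits
q = round(y * scale);                % integer codes
q = min(max(q, -scale), scale - 1);  % clamp to the signed range [-64, 63]
yq = q / scale;                      % back to doubles on the coarser grid
```

yq has the same size and scaling as the input, so it can be played or written back out directly; only the number of distinct levels has changed.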
EDIT: Changing the bit rate:
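After rereading the question, I realize that you also mentioned a lower bit rate. The bit rate is the sample rate times the number of bits per sample, so you can also lower it by reducing the sample rate rather than the quantization. If that is what you want, look at the documentation for downsample, decimate, and/or resample. They are Signal Processing Toolbox functions that change the sample rate (and with it the bit rate).
downsample(file,2)
would halve the sample rate (and the bit rate), for example. Note that downsample simply keeps every second sample without low-pass filtering first, so decimate or resample is usually the better choice for audio.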
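For uncompressed PCM, the bit rate R is the sample rate f_s times the bits per sample b times the number of channels n_ch. As a worked example, assuming a mono 16-bit file at 44.1 kHz (the question does not state smb.wav's actual sample rate):

```latex
\begin{aligned}
R &= f_s \, b \, n_{\mathrm{ch}} \\
44100 \cdot 16 \cdot 1 &= 705\,600~\mathrm{bit/s} && \text{(original)} \\
22050 \cdot 16 \cdot 1 &= 352\,800~\mathrm{bit/s} && \text{(downsample by 2)} \\
44100 \cdot 7 \cdot 1 &= 308\,700~\mathrm{bit/s} && \text{(7-bit quantization)}
\end{aligned}
```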
I'd like to read heterogeneous binary data into MATLAB. I know in advance how much data there is and which data type each segment has. For example:
%double %double %int32 ...
and then this gets repeated about a million times. That is easy enough to handle with fread, since I know the number of bytes for each segment and can therefore calculate the skip value for each row.
But now the data segment looks something like this:
%double %int32%*char %double %double ...
where the int32 before the *char is the length of that string. This means I can no longer calculate the skip, and I'm stuck reading the whole file line by line, which needs many more file accesses and slows everything down.
To get at least some speed-up, I want to read all the %double %double ... (around 30 elements) at once and then fill the arrays from that buffer. In C this would be an easy task, but here I have no memcpy and no direct access to pointers...
Do you know any way to achieve this without using MEX files?
You can't solve the problem that the record size is unknown, and thus you don't know how much to read ahead of time. But you can batch up the reads, and if you have a reasonable max size for the string, you can always read that amount, and ignore the unneeded bytes at the end. typecast is the trick:
readlen = 1024;
buf = fread(fid, readlen, '*uint8'); % the asterisk keeps the returned array as uint8
rec.val1 = typecast(buf(1:8), 'double'); % typecast reinterprets the bytes in native byte order
string_len = double(typecast(buf(9:12), 'int32')); % double() keeps the index arithmetic simple
rec.str1 = char(buf(13:13+string_len-1)).'; % char (not typecast) turns the bytes into a string row
pos = 13+string_len;
rec.val2 = typecast(buf(pos:pos+8-1), 'double');
You might wrap a simple function around this technique to track the current offset automatically.
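One way to do that without writing a separate function file is to keep a cursor variable and advance it after every field. This sketch assumes the record layout from the question (double, int32 length, that many chars, double), native little-endian byte order, and a hypothetical buffer built in place of the fread result. In real use you would also stop when a record runs past the end of buf and fseek back to that record's start before reading the next batch:

```matlab
% Hypothetical helper that builds one record as bytes, for testing only.
mkrec = @(a, s, b) [typecast(a, 'uint8'), typecast(int32(numel(s)), 'uint8'), ...
                    uint8(s), typecast(b, 'uint8')];
buf = [mkrec(1.5, 'abc', -2), mkrec(3.25, 'hello', 7)];  % stands in for fread output

pos  = 1;   % 1-based read cursor into buf
recs = struct('val1', {}, 'str1', {}, 'val2', {});
while pos <= numel(buf)
    r.val1 = typecast(buf(pos:pos+7), 'double');      pos = pos + 8;
    len    = double(typecast(buf(pos:pos+3), 'int32')); pos = pos + 4;
    r.str1 = char(buf(pos:pos+len-1));                 pos = pos + len;
    r.val2 = typecast(buf(pos:pos+7), 'double');      pos = pos + 8;
    recs(end+1) = r; %#ok<AGROW>
end
```

If buf comes straight from fread it is a column vector; typecast works the same way, but wrap the string as char(...).' to get a row.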
I have decided to use memmapfile because my data (typically 30 GB to 60 GB) is too big to fit in my computer's memory.
My data files consist of two columns of data that correspond to the outputs of two sensors, and I have them in both .bin and .txt formats.
m=memmapfile('G:\E-Stress Research\Data\2013-12-18\LD101_3\EPS/LD101_3.bin','format','int32')
m.data(1)
I used the above code to memory map my data to a variable "m", but I have no idea which data format to use ('int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64', 'single', or 'double').
In fact I tried all of the data formats listed that MATLAB supports, but when I used the m.data(index number) I never get a pair of numbers (2 columns of data) which is what I expected, also the number will be different depending on the format I used.
If anyone has experience with memmapfile please help me.
Here are some smaller versions of my data files so people can understand how my data is structured:
cheers
James
memmapfile is designed for reading binary files; that's why you are having trouble with your text file. The data in there are characters, so you'll have to read them as characters and then parse them into numbers. More on that below.
The binary file appears to contain more than just a stream of floating point values written in binary format. I see identifiers (strings) and other things in the file as well. Your only hope of reading that is to contact the manufacturer of the device that created the binary file and ask them about how to read in such files. There'll probably be an SDK, or at least a description of the format. You might want to look into this as the floating point numbers in your text file might be truncated, i.e., you have lost precision compared to directly reading the binary representation of the floats.
Ok, so how to read your file with memmapfile? This post provides some hints.
So first we open your file as 'uint8' (note there is no 'char' option, so as a workaround we read the content of the file into a datatype of the same size):
m = memmapfile('RTL5_57.txt','Format','uint8'); % uint8 is default, you could leave that off
We can render the data read in as uint8 as characters by casting it to char:
c = char(m.Data(1:57)).' % read the first three lines (3*19 chars). NB: transpose just for getting nice output, don't use it in your code
c =
0.398516 0.063440
0.399611 0.063284
0.398985 0.061253
As each line in your file has the same length (2*8 chars for the numbers, 1 tab and 2 chars for newline = 19 chars), we can read N lines from the file by reading N*19 values. So m.Data(1:19) gets you the first line, m.Data(20:38), the second line, and m.Data(20:57) the second and third lines. Read as much as you want at once.
Then we'll have to parse the read-in data into floating point numbers:
f = sscanf(c,'%f')
f =
0.3985
0.0634
0.3996
0.0633
0.3990
0.0613
All that's left now is to reshape them into your two column format
d = reshape(f,2,[]).'
d =
0.3985 0.0634
0.3996 0.0633
0.3990 0.0613
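The line-offset arithmetic can be checked without the real data file. This sketch assumes fixed 19-character lines ending in CR LF, and a hypothetical three-line string stands in for m.Data:

```matlab
lineLen = 19;   % 8 + tab + 8 + CR LF, assumed fixed width
% Hypothetical stand-in for m.Data: the three lines shown above, as uint8.
bytes = uint8(sprintf('0.398516\t0.063440\r\n0.399611\t0.063284\r\n0.398985\t0.061253\r\n'));
firstLine = 2;  % read lines 2 and 3
nLines    = 2;
idx = (firstLine - 1) * lineLen + 1 : (firstLine - 1 + nLines) * lineLen;  % 20:57
c = char(bytes(idx));
d = reshape(sscanf(c, '%f'), 2, []).';
```

With the real memory map you would index m.Data(idx) instead of bytes(idx).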
Easier ways than using memmapfile:
You don't need to use memmapfile to solve your problem, and I think it makes things more complicated. You can simply use fopen followed by fread:
fid = fopen('RTL5_57.txt');
c = fread(fid,Nlines*19,'*char'); % Nlines: how many lines to read per call
% now sscanf and reshape as above
% NB: one can read the values from the text file directly with f = fscanf(fid,'%f',[2 Nlines]).
% However, in testing, I have found calling fread followed by sscanf to be faster,
% which will make a significant difference when reading such large files.
Using this, you can read Nlines pairs of values at a time, process them, and simply call fread again to read the next Nlines. fread remembers where it is in the file (as does fscanf), so simply use the same call to get the next lines. It's thus easy to write a loop that processes the whole file, testing with feof(fid) whether you are at the end of the file.
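A sketch of such a loop, assuming fixed-width 19-character lines as above. A temporary file with made-up values stands in for RTL5_57.txt, and the loop stops when fread returns nothing rather than relying on feof alone (feof only becomes true after a read has hit the end of the file):

```matlab
lineLen = 19;   % 8 + tab + 8 + CR LF, assumed fixed width
Nlines  = 10;   % lines per fread call

% Hypothetical stand-in for RTL5_57.txt: 25 lines of two values each.
data  = rand(25, 2);
fname = [tempname() '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%8.6f\t%8.6f\r\n', data.');
fclose(fid);

fid = fopen(fname, 'r');
total = zeros(0, 2);   % only collected here to show the result
while true
    c = fread(fid, Nlines * lineLen, '*char').';
    if isempty(c)
        break;         % nothing left to read
    end
    d = reshape(sscanf(c, '%f'), 2, []).';  % Nlines-by-2 (fewer on the last chunk)
    total = [total; d]; %#ok<AGROW> process d here instead
end
fclose(fid);
delete(fname);
```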
An even easier way is suggested here: use textscan. To slightly adapt their example code:
Nlines = 10000;
% describe the format of the data
% for more information, see the textscan reference page
fmt = '%f\t%f'; % called fmt rather than format, which would shadow MATLAB's format function
fid = fopen('RTL5_57.txt');
while ~feof(fid)
C = textscan(fid, fmt, Nlines, 'CollectOutput', true);
d = C{1}; % immediately clear C at this point if you need the memory!
% process d
end
fclose(fid);
Note again that fread followed by sscanf will be fastest. However, the fread method breaks as soon as a single line in the text file doesn't exactly match the fixed 19-character format. textscan, on the other hand, is forgiving of whitespace changes and thus more robust.
We were recently taught the concepts of error control coding: basic codes such as the Hamming code, the repetition code, etc.
I thought of trying out these concepts in MATLAB. My goal was to compare how an audio file plays when it is corrupted by noise versus when it is first protected by basic codes and then corrupted by noise.
So I opened a small audio clip of 20-30 seconds in MATLAB using the audioread function. I used a 16-bit PCM wave file.
If opened in 'native' format, it is in int16 format. If not, it opens in double format.
I then added two types of noise to it: AWGN (using the double format) and Binary Symmetric Channel noise (by converting the int16 data to uint16 and then converting that to binary using the dec2bin function). Converting back to the original int16 format does add a lot of noise to it.
Now my goal is to try out a basic repetition code. So I converted the 2-D matrix holding the binary audio data into a 3-D matrix by adding redundancy. I used the following command:
cat(3,x,x,x,x,x) ;
It created a 3-D matrix such that it had 5 versions of x along the 3rd dimension.
Now I wish to add noise to it using the bsc function.
Then I wish to decode the redundant data by removing the repeated bits using the mode() function on the vector that contains the redundant bits.
My whole problem in this task is that MATLAB is taking too long to do the computation. I guess a 30-second file creates quite a big matrix, so maybe that's why it's taking time. Moreover, I suspect that what I am doing is not the most efficient way to do it with regard to the various data types.
Can you suggest a way in which I may improve the computation time? Are there some functions that can help do this basic task in a better way?
Thanks.
(This is my first post on this site about MATLAB, so bear with me if the posting format is not up to the mark.)
Edit: posting the code here:
[x,Fs] = audioread('sample.wav','native'); % native loads it in int16 format; Fs of the sample is 44 kHz; size of x is 1796365x1
x1 = double(x) - double(min(x)); % make all values non-negative (in double, since int16 arithmetic would saturate)
s = dec2bin(x1); % s is a 1796365x15 char matrix: each row is the binary string of one sample. The BSC channel needs double as the data type
s1 = double(s) - 48; % to get 0s and 1s in double format
%% Now I wish to compare how noise affects s1 itself versus a matrix that is error-control coded.
s2 = bsc(s1,0.15); % this adds errors with probability 0.15
s3 = cat(3,s1,s1,s1,s1,s1); % the goal here is to add repetition redundancy. I will try out more efficient codes such as Hamming codes later.
s4 = bsc(s3,0.15); % this step is taking forever and my PC becomes unresponsive because of it.
s5 = mode(s4,3); % mode along the 3rd dimension (majority vote) to remove the redundancy and thereby reduce errors
%% Here is what I did after s1 was corrupted by bsc errors in s2:
d = char(s2 + 48);
d1 = bin2dec(d) + double(min(x));
sound(d1/32768,Fs); % this plays the noisy file (scaled to [-1, 1)). I wish to do the same with the error-control-coded matrix, but as I said in a previous step it is very unresponsive.
I suppose what is mostly wrong with my task is that I took a large sampling rate and hence the vector was very big.
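A sketch of a lighter-weight version of the same experiment, with a few hypothetical int16 samples standing in for the audio. It swaps in two choices of its own: the bits are kept as logical instead of double (for a 1796365x15x5 array that is roughly 135 MB instead of about 1.08 GB), and the binary symmetric channel is simulated with rand and xor instead of the Communications Toolbox bsc function. With an odd number of repetitions, taking the majority over the third dimension gives the same result as mode(...,3):

```matlab
p    = 0.15;   % crossover probability of the simulated channel
nRep = 5;      % repetition factor (odd, so the majority vote never ties)

% Hypothetical stand-in for audioread('sample.wav','native').
x = int16([-32768; -1; 0; 1; 1234; 32767]);

% Offset to 0..65535 in int32 so nothing saturates, then split into 16 bits.
u = uint16(int32(x) + 32768);
bits = false(numel(u), 16);
for k = 1:16
    bits(:, k) = bitget(u, k) == 1;   % column k holds bit k (LSB first)
end

coded   = repmat(bits, [1 1 nRep]);            % repetition code, still logical
noisy   = xor(coded, rand(size(coded)) < p);   % flip each bit with probability p
decoded = sum(noisy, 3) > nRep / 2;            % majority vote over the copies

% Reassemble the samples and undo the offset.
y = int16(double(decoded) * (2 .^ (0:15)).' - 32768);
```

To listen to the result, something like sound(double(y)/32768, Fs) scales it back to [-1, 1). Comparing y against x for different p and nRep gives the error-rate comparison the question is after.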