Binary Data in MatLab - matlab

So I'm trying to work with some data files that contain a text header followed by binary data, 16 bits signed integers with the least significant byte first.
I can't seem to open the binary data correctly. The text header is of variable length.
I've tried the following, but my problem is that the data is not actually stored as binary, but is already a number, but not correctly.
The header is of variable length so I can't tell it to read after so many characters without first opening the file.
fileName = 'PATH/TO/FILE/FILE_NAME.DAT';
dataFile = fopen(fileName);
header = '';
i = 1;
%dataContents = fileread(fileName);
dataContents = fread(dataFile);
while i < 115
char = dataContents(i);
header = [header char];
if char == '}'
break
end
i = i + 1;
end
header = header(2:end-1);
headerSplit = strsplit(header,',');
fileSize = str2double(headerSplit(17));
binaryData = dataContents(i:end);
data = [];
j = 1;
num = binaryData(1:50)
while j < fileSize
data = [data, bin2dec(num2str(binaryData(j:j+1)))];
j = j + 2;
end
length(data)
Any help would be great. I'm new to matlab so I'm probably missing something simple.

Without knowing the data format, it's nearly impossible to give a detailed advice. You probably need to set the precision argument of fread according to your data format.
After parsing your header, you know where your data begins. Use fseek(dataFile, numel(header)+2, 'bof') to set your file handle to the position where the file begins. Then start reading again using fread with a matching precision.

Related

Table fields in format in PDF report generator - Matlab

When I save a table as PDF (using report generator) I do not get the numeric fields (Index1, Index2, Index3) shortG format (total of 5 digits). What is the problem and how can I fix it?
The code:
function ButtonPushed(app, event)
import mlreportgen.dom.*;
import mlreportgen.report.*
format shortG;
ID = [1;2;3;4;5];
Name = {'San';'John';'Lee';'Boo';'Jay'};
Index1 = [71.1252;69.2343245;64.345345;67.345322;64.235235];
Index2 = [176.23423;163.123423654;131.45364572;133.5789435;119.63575647];
Index3 = [176.234;16.123423654;31.45364572;33.5789435;11.6647];
mt = table(ID,Name,Index1,Index2,Index3);
d = Document('myPDF','pdf');
d.OutputPath = ['E:/','temp'];
append(d,'Table 1: ');
append(d,mt);
close(d);
rptview(d.OutputPath);
end
To fix it, format your numeric arrays to character arrays with 5 significant digits before writing to PDF.
mt = table(ID,Name,f(Index1),f(Index2),f(Index3));
where,
function FivDigsStr = f(x)
%formatting to character array with 5 significant digits and then splitting.
%at each tab. categorical is needed to remove ' 's that appear around char
%in the output PDF file with newer MATLAB versions
%e.g. with R2018a, there are no ' ' in the output file but ' ' appears with R2020a
FivDigsStr = categorical(split(sprintf('%0.5G\t',x)));
%Removing the last (<undefined>) value (which is included due to \t)
FivDigsStr = FivDigsStr(1:end-1);
end
The above change gives the following result:
Edit:
To bring back the headers:
mt.Properties.VariableNames(3:end) = {'Index1', 'Index2', 'Index3'};
or in a more general way to extract the variable names instead of hardcoding them, you can use inputnames to extract the variable names.
V = #(x) inputname(1);
mt.Properties.VariableNames(3:end) = {V(Index1), V(Index2), V(Index3)};
which gives:

joining arrays in Matlab and writing to file using dlmwrite( ) adds extra space

I am generating 2500 values in Matlab in format (time,heart_rate, resp_rate) by using below code
numberOfSeconds = 2500;
time = 1:numberOfSeconds;
newTime = transpose(time);
number0 = size(newTime, 1)
% generating heart rates
heart_rate = 50 +(70-50) * rand (numberOfSeconds,1);
intHeartRate = int64(heart_rate);
number1 = size(intHeartRate, 1)
% hist(heart_rate)
% generating resp rates
resp_rate = 50 +(70-50) * rand (numberOfSeconds,1);
intRespRate = int64(resp_rate);
number2 = size(intRespRate, 1)
% hist(heart_rate)
% joining time and sensor data
joinedStream = strcat(num2str(newTime),{','},num2str(intHeartRate),{','},num2str(intRespRate))
dlmwrite('/Users/amar/Desktop/geenrated/rate.txt', joinedStream,'delimiter','');
The data shown in the console is alright, but when I save this data to a .txt file, it contains extra spaces in beginning. Hence I am not able to parse the .txt file to generate input stream. Please help
Replace the last two lines of your code with the following. No need to use strcat if you want a CSV output file.
dlmwrite('/Users/amar/Desktop/geenrated/rate.txt', [newTime intHeartRate intRespRate]);
π‘‡β„Žπ‘’ π‘ π‘œπ‘™π‘’π‘‘π‘–π‘œπ‘› 𝑠𝑒𝑔𝑔𝑒𝑠𝑑𝑒𝑑 𝑏𝑦 π‘ƒπΎπ‘œ 𝑖𝑠 π‘‘β„Žπ‘’ π‘ π‘–π‘šπ‘π‘™π‘’π‘ π‘‘ π‘“π‘œπ‘Ÿ π‘¦π‘œπ‘’π‘Ÿ π‘π‘Žπ‘ π‘’. π‘‡β„Žπ‘–π‘  π‘Žπ‘›π‘ π‘€π‘’π‘Ÿ 𝑒π‘₯π‘π‘™π‘Žπ‘–π‘›π‘  π‘€β„Žπ‘¦ π‘¦π‘œπ‘’ 𝑔𝑒𝑑 π‘‘β„Žπ‘’ 𝑒𝑛𝑒π‘₯𝑝𝑒𝑐𝑑𝑒𝑑 π‘œπ‘’π‘‘π‘π‘’π‘‘.
The data written in the file is exactly what is shown in the console.
>> joinedStream(1) %The exact output will differ since 'rand' is used
ans =
cell
' 1,60,63'
num2str basically converts a matrix into a character array. Hence number of characters in its each row must be same. So for each column of the original matrix, the row with the maximum number of characters is set as a standard for all the rows with less characters and the deficiency is filled by spaces. Columns are separated by 2 spaces. Take a look at the following smaller example to understand:
>> num2str([44, 42314; 4, 1212421])
ans =
2Γ—11 char array
'44 42314'
' 4 1212421'

Shifting a string in matlab

Ok so I have retrieved this string from the text file now I am supposed to shift it by a specified amount. so for example, if the string I retrieved was
To be, or not to be
That is the question
and the shift number was 5 then the output should be
stionTo be, or not
to beThat is the que
I was going to use circshift but the given string wouldn't of a matching dimesions. Also the string i would retrieve would be from .txt file.
So here is the code i used
S = sprintf('To be, or not to be\nThat is the question')
circshift(S,5,2)
but the output is
stionTo be, or not to be
That is the que
but i need
stionTo be, or not
to beThat is the que
By storing the locations of the new lines, removing the new lines and adding them back in later we can achieve this. This code does rely on the insertAfter function which is only available in MATLAB 2016b and later.
S = sprintf('To be, or not to be\nThat is the \n question');
newline = regexp(S,'\n');
S(newline) = '';
S = circshift(S,5,2);
for ii = 1:numel(newline)
S = insertAfter(S,newline(ii)-numel(newline)+ii,'\n');
end
S = sprintf(S);
You can do this by performing a circular shift on the indices of the non-newline characters. (The code below actually skips all control characters with ASCII code < 32.)
function T = strshift(S, k)
T = S;
c = find(S >= ' '); % shift only printable characters (ascii code >= 32)
T(c) = T(circshift(c, k, 2));
end
Sample run:
>> S = sprintf('To be, or not to be\nThat is the question')
S = To be, or not to be
That is the question
>> r = strshift(S, 5)
r = stionTo be, or not
to beThat is the que
If you want to skip only the newline characters, just change to
c = find(S != 10);

Read data to matlab with for loop

I want to read the data of a file with size about 60 MB into matlab in some variables, but I get errors. This is my code:
clear all ;
clc ;
% Reading Input File
Dataz = importdata('leak0.lis');
%Dataz = load('leak0.lis');
for k = 1:1370
foundPosition = 1 ;
for i=1:size(Dataz,1)
strp = sprintf('I%dz=',k);
fprintf(strp);
findValue = strfind(Dataz{i}, strp) ;
if ~isempty(findValue)
eval_param = strp + '(foundPosition) = sscanf(Dataz{i},''%*c%*c%*f%*c%*c%f'') ;';
disp(eval_param);
% str(foundPosition) = sscanf(Dataz{i},'%*c%*c%*f%*c%*c%f') ;
eval(eval_param);
foundPosition = foundPosition + 1 ;
end
end
end
When I debugged it, I found out that the dataz is empty & so it doesn't proceed to next lines. I replace it with fopen, load & etc, but it didn't work.
From the Matlab help files, import data is likely failing because it doesn't understand your file format.
From the help files
Name and extension of the file to import, specified as a string. If importdata recognizes the file extension, it calls the MATLAB helper function designed to import the associated file format (such as load for MAT-files or xlsread for spreadsheets). Otherwise, importdata interprets the file as a delimited ASCII file.
For ASCII files and spreadsheets, importdata expects to find numeric
data in a rectangular form (that is, like a matrix). Text headers can
appear above or to the left of the numeric data, as follows:
Assuming that your .lis files actually have delimited text.
You should adjust the delimiter in the importdata call so that Matlab can understand your file.
filename = 'myfile01.txt';
delimiterIn = ' ';
headerlinesIn = 1;
A = importdata(filename,delimiterIn,headerlinesIn);

Read textfile with a mix of floats, integers and strings in the same column

Loading a well formatted and delimited text file in Matlab is relatively simple, but I struggle with a text file that I have to read in. Sadly I can not change the structure of the source file, so I have to deal with what I have.
The basic file structure is:
123 180 (two integers, white space delimited)
1.5674e-8
.
.
(floating point numbers in column 1, column 2 empty)
.
.
100 4501 (another two integers)
5.3456e-4 (followed by even more floating point numbers)
.
.
.
.
45 String (A integer in column 1, string in column 2)
.
.
.
A simple
[data1,data2]=textread('filename.txt','%f %s', ...
'emptyvalue', NaN)
Does not work.
How can I properly filter the input data? All examples I found online and in the Matlab help so far deal with well structured data, so I am a bit lost at where to start.
As I have to read a whole bunch of those files >100 I rather not iterate trough every single line in every file. I hope there is a much faster approach.
EDIT:
I made a sample file available here: test.txt (google drive)
I've looked at the text file you supplied and tried to draw a few general conclusions -
When there are two integers on a line, the second integer corresponds to the number of rows following.
You always have (two integers (A, B) followed by "B" floats), repeated twice.
After that you have some free-form text (or at least, I couldn't deduce anything useful about the format after that).
This is a messy format so I doubt there are going to be any nice solutions. Some useful general principles are:
Use fgetl when you need to read a single line (it reads up to the next newline character)
Use textscan when it's possible to read multiple lines at once - it is much faster than reading a line at a time. It has many options for how to parse, which it is worth getting to know (I recommend typing doc textscan and reading the entire thing).
If in doubt, just read the lines in as strings and then analyse them in MATLAB.
With that in mine, here is a simple parser for your files. It will probably need some modifications as you are able to infer more about the structure of the files, but it is reasonably fast on the ~700 line test file you gave.
I've just given the variables dummy names like "a", "b", "floats" etc. You should change them to something more specific to your needs.
function output = readTestFile(filename)
fid = fopen(filename, 'r');
% Read the first line
line = '';
while isempty(line)
line = fgetl(fid);
end
nums = textscan(line, '%d %d', 'CollectOutput', 1);
a = nums{1}(1);
b = nums{1}(2);
% Read 'b' of the next lines:
contents = textscan(fid, '%f', b);
floats1 = contents{1};
% Read the next line:
line = '';
while isempty(line)
line = fgetl(fid);
end
nums = textscan(line, '%d %d', 'CollectOutput', 1);
c = nums{1}(1);
d = nums{1}(2);
% Read 'd' of the next lines:
contents = textscan(fid, '%f', d);
floats2 = contents{1};
% Read the rest:
rest = textscan(fid, '%s', 'Delimiter', '\n');
output.a = a;
output.b = b;
output.c = c;
output.d = d;
output.floats1 = floats1;
output.floats2 = floats2;
output.rest = rest{1};
end
You can read in the file line by line using the lower-level functions, then parse each line manually.
You open the file handle like in C
fid = fopen(filename);
Then you can read a line using fgetl
line = fgetl(fid);
String tokenize it on spaces is probably the best first pass, storing each piece in a cell array (because a matrix doesn't support ragged arrays)
colnum = 1;
while ~isempty(rem)
[token, rem] = strtok(rem, ' ');
entries{linenum, colnum} = token;
colnum = colnum + 1;
end
then you can wrap all of that inside another while loop to iterate over the lines
linenum = 1;
while ~feof(fid)
% getl, strtok, index bookkeeping as above
end
It's up to you whether it's best to parse the file as you read it or read it into a cell array first and then go over it afterwards.
Your cell entries are all going to be strings (char arrays), so you will need to use str2num to convert them to numbers. It does a good job of working out the format so that might be all you need.