MATLAB: skip over the header when importing a text file (follow up)

I want to be able to import a file with 46 lines of headers and then 4 x6 columns of data. I tried using this example from a previous answer, but textscan won't work.
Here is my test file, ImportTest.txt:
1.0 2.0 3.0
1.1 2.1 3.1
1.2 2.2 3.3
Here is the code. The cells of strData come out with different sizes. Why?
%open file
fid = fopen('ImportTest.txt');
strData = textscan(fid,'%s%s%s%s', 'Delimiter',',')
fclose(fid);
%# catenate, b/c textscan returns a column of cells for each column in the data
strData = cat(2,strData{:}) ;
%# convert cols 3:6 to double
doubleData = str2double(strData(:,3:end));
%# find header rows. headerRows is a logical array
headerRowsL = all(isnan(doubleData),2);
%# since I guess you know what the headers are, you can just remove the header rows
dateAndTimeCell = strData(~headerRowsL,1:2);
dataArray = doubleData(~headerRowsL,:);
%# and you're ready to start working with your data

You can use the dlmread function to read your input file:
header_rows=5;
delim_char=' ';
C=dlmread('ImportTest.txt',delim_char,header_rows,0)
In the call you can specify the number of header rows to skip (header_rows in the example) and the delimiter character (delim_char in the example).
The last parameter in the call (0) is the zero-based column at which the data starts.
The data read from the text file are stored directly in an array (C in the example).
Given this input file (with 5 header rows):
header line 1
header line 2
header line 3
header line 4
header line 5
1.0 2.0 3.0
1.1 2.1 3.1
1.2 2.2 3.3
the output will be:
C =
1.0000 2.0000 3.0000
1.1000 2.1000 3.1000
1.2000 2.2000 3.3000
As an alternative, you can use importdata:
header_rows=5;
delim_char=' ';
c=importdata('ImportTest.txt',delim_char,header_rows)
In this case, the output will be a structure:
c =
data: [3x3 double]
textdata: {5x3 cell}
colheaders: {'header' 'line' '5'}
The data are stored in the "data" field, and the 5 header rows in the "textdata" field.
The last header row is also interpreted as the actual header of the data.
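Since the question started from textscan, here is a minimal sketch of the same import done with textscan and its HeaderLines option. It assumes three space-separated numeric columns and 5 header rows, as in the example file above; the temporary file name is made up for the demonstration.

```matlab
% Sketch: skip header rows with textscan (assumes 3 numeric columns, 5 header rows)
fname = [tempname '.txt'];                 % hypothetical temporary test file
fid = fopen(fname, 'w');
fprintf(fid, 'header line %d\n', 1:5);     % write 5 header rows
fprintf(fid, '%.1f %.1f %.1f\n', [1.0 2.0 3.0; 1.1 2.1 3.1; 1.2 2.2 3.3]');
fclose(fid);

header_rows = 5;
fid = fopen(fname, 'r');
cols = textscan(fid, '%f %f %f', 'HeaderLines', header_rows);
fclose(fid);
delete(fname);

C = [cols{:}];   % concatenate the three column cells into one numeric matrix
```

The format string has one %f per column, so it has to be adjusted to the real number of columns in your file.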
Hope this helps.
Qapla'

Related

MATLAB reading to the end of a binary file

I think the solution will be quite simple for somebody with some MATLAB knowhow however I do not know how to do it.
I have a binary file that I am reading with fread and I am reading the first 4 bytes of this file followed by the next 2 bytes.
I basically want this process of reading 4 bytes followed by 2 bytes repeated till the end of the file is reached.
So the number of bytes read is 4,2,4,2,4,2......
I have the following to read the first pair of data and I want this to repeat.
fileID = fopen('MyBinaryFile');
4bytes = fread(fileID, 4);
fseek(fileID, 4, 0);
2bytes = fread(fileID, 2);
Thanks in advance for any help and suggestions
I take it this is a variant of your former question MATLAB reading a mixed data type binary file.
Your goal is to read a binary file containing mixed data type. In your case it contains 2 columns:
1x single value (4 bytes) and 1x int16 value (2 bytes).
There are several ways to read this type of file. They differ in speed because some ways minimize disk access but require more temporary memory, and other ways use just the memory needed but require more disk access (= slower).
Ultimately, the 3 ways I'm going to show you produce exactly the same result.
The direct answer to this question is version #3 below, but I encourage you to look at the 2 other options described here; both are really worth understanding.
For the purpose of the example, I had to create a binary file as you described. I did it this way:
%% // write example file
A = single(linspace(-3,1,11)) ; %// a few "float" (=single) data
B = int16(-5:5) ; %// a few "int16" data
fileID = fopen('testmixeddata.bin','w');
for il=1:11
fwrite(fileID,A(il),'single');
fwrite(fileID,B(il),'int16');
end
fclose(fileID);
This creates a 2-column binary file, the columns being:
11 values of type float going from -3 to 1.
11 values of type int16 going from -5 to +5.
For future reference:
>> disp(A)
-3.0000 -2.6000 -2.2000 -1.8000 -1.4000 -1.0000 -0.6000 -0.2000 0.2000 0.6000 1.0000
>> disp(B)
-5 -4 -3 -2 -1 0 1 2 3 4 5
In each of the solutions below, the first column is read into a variable called varSingle, and the second column into a variable called varInt16.
1) Read all data in one go - convert to proper type after
%% // SOLUTION 1 (fastest) : Read all data in one go - convert to proper type after
fileID = fopen('testmixeddata.bin');
R = fread(fileID,'uint8=>uint8') ; %// read all values, most basic data type (unsigned 8 bit integer)
fclose(fileID);
colSize = [4 2] ; %// number of byte for each column [4 byte single, 2 byte int16]
R = reshape( R , sum(colSize) , [] ) ; %// reshape data into a matrix with one 6-byte record per column (4 + 2 bytes)
temp = R(1:4,:) ; %// extract data for first column into temporary variable (OPTIONAL)
varSingle = typecast( temp(:) , 'single' ) ; %// convert into "single/float"
temp = R(5:end,:) ; %// extract data for second column
varInt16 = typecast( temp(:) , 'int16' ) ; %// convert into "int16"
This is my favourite method, especially for speed: it minimizes the read/seek operations on disk, and most of the post-processing is done in memory (much, much faster than disk operations).
Note that I used the temporary variable only for clarity; you can avoid it altogether if you get your indexing into the raw data right.
The key thing to understand is the use of the typecast function. And the good news is that it got even faster since R2014b.
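To make the role of typecast concrete, here is a small in-memory sketch (the variable names x, raw and y are only for illustration). typecast reinterprets the same bytes as another type without changing them, which is different from a numeric conversion such as single(raw).

```matlab
% Sketch: typecast reinterprets raw bytes, it does not convert values
x   = single([-3 -2.6 1]);        % three 4-byte floating point values
raw = typecast(x, 'uint8');       % the same 12 bytes, viewed as uint8
y   = typecast(raw, 'single');    % the bytes viewed as single again

disp(numel(raw))                  % 12 bytes: 4 per single value
disp(isequal(x, y))               % the round trip gives back the original values
```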
2) Read column by column (using "skipvalue") - 2 pass approach
%% // SOLUTION 2 : Read column by column (using "skipvalue") - 2 pass approach
col1size = 4 ; %// size of data in column 1 (in [byte])
col2size = 2 ; %// size of data in column 2 (in [byte])
fileID = fopen('testmixeddata.bin');
varSingle = fread(fileID,'*single',col2size) ; %// read all "float" values, skipping all "int16"
fseek(fileID,col1size,'bof') ; %// rewind to beginning of column 2 at the top of the file
varInt16 = fread(fileID,'*int16',col1size) ; %// read all "int16" values, skipping all "float"
fclose(fileID);
That works too. It works fine, but it is going to be slower than method 1 above, because you have to scan the file twice. It may be a good option if the file is very large and method 1 above fails with an out-of-memory error.
3) Read element by element
%% // SOLUTION 3 : Read element by element (slow - not recommended)
fileID = fopen('testmixeddata.bin');
varSingle=[];varInt16=[];
while ~feof(fileID)
    try
        varSingle(end+1) = fread(fileID, 1, '*single' ) ;
        varInt16(end+1) = fread(fileID, 1, '*int16' ) ;
    catch
        disp('reached End Of File')
    end
end
fclose(fileID);
That works too, and if you were writing C code it would be more than OK. But in MATLAB this is not the recommended way to go (your choice, ultimately).
As promised, the 3 methods above will give you exactly what we wrote in the file at the beginning:
>> disp(varSingle)
-3.0000 -2.6000 -2.2000 -1.8000 -1.4000 -1.0000 -0.6000 -0.2000 0.2000 0.6000 1.0000
>> disp(varInt16)
-5 -4 -3 -2 -1 0 1 2 3 4 5
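fileID = fopen('MyBinaryFile');
kk=1;
while ~feof(fileID)
    b4 = fread(fileID, 4);   % next 4 bytes, returned as a 4x1 column
    b2 = fread(fileID, 2);   % the 2 bytes that follow (fread already advances the position, no fseek needed)
    if numel(b4) < 4 || numel(b2) < 2
        break                % incomplete record: the end of the file was reached
    end
    bytes4(:,kk) = b4;
    bytes2(:,kk) = b2;
    kk=kk+1;
end
fclose(fileID);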
The while loop condition is ~feof, where feof tests for end-of-file. So the loop runs as long as you haven't reached the end of your file.
I added the kk just so you store everything and do not just overwrite the values on each loop iteration.
If you want to get the data without loops, there are MATLABish ways to do that:
%'Sizes'
T = 4; %'Time record size'
D = 2; %'Date record size'
R = T+D; %'Record size'
%'Open file'
f = fopen('MyBinaryFile', 'rb');
if f < 0
error('Could not open file.');
end;
%'Read the entire file at once, and close file'
buf = fread(f, Inf, '*uint8');
fclose(f);
%'Ignore the last unpadded bytes, and reshape by the size of 1 record'
buf = reshape(buf(1:R*fix(numel(buf)/R)), R, []);
%'Pinpoint the data'
time_bytes = buf( 1: T, :);
date_bytes = buf(T+1:T+D, :);
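The byte blocks above still have to be turned into numbers. The sketch below builds a small buffer in memory instead of reading a file, and it assumes the 4-byte field is a single and the 2-byte field an int16, as in the earlier answer; the names rec, timeVals and dateVals are made up for the example.

```matlab
% Sketch: decode fixed-size records from a byte buffer
% (assumes single + int16 fields; adjust the types to your real file)
T = 4; D = 2; R = T + D;                     % field sizes and record size in bytes
A = single([-3 -1 1]);  B = int16([-5 0 5]); % sample values to encode

% Build the interleaved byte stream: one 6-byte record per column
rec = [reshape(typecast(A, 'uint8'), T, []); reshape(typecast(B, 'uint8'), D, [])];
buf = [rec(:); uint8([7; 7])];               % plus two stray trailing bytes

% Drop the incomplete trailing record and reshape, one record per column
buf = reshape(buf(1:R*fix(numel(buf)/R)), R, []);
time_bytes = buf(1:T, :);
date_bytes = buf(T+1:T+D, :);

% Reinterpret each block of bytes as its numeric type
timeVals = typecast(time_bytes(:), 'single');
dateVals = typecast(date_bytes(:), 'int16');
```

Because typecast needs a vector, each block is flattened with (:) first; the column-major order keeps the bytes of each value together.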

How to remove an alphabet from a list of numbers in Matlab?

I have a list of numbers in a column vector. Among those numbers there is a letter M which appears at random intervals.
This link, How to delete zero components in a vector in Matlab?, shows how to remove zeros. I tried to adapt it to remove M, but in vain.
How do I replace this M by 0?
I tried this code, but to no avail.
I called all my sample data N.
N=[4.6
6.7
4.1
3.1
M
2.6]
N(N==M) = [];
I also tried this code:
sample=N(N~=M);
My real data is loaded from a text file:
filename='x.txt';
N=importdata(filename)
The problem is that your M items are never being imported by importdata in the first place!
importdata is the wrong function to use for complex data. If you put this in x.txt:
4.6
6.7
4.1
3.1
M
2.6
Then the output of N=importdata(filename) is simply the first four values. It can't handle the M. You should notice this, because the size of N should be smaller than the number of values in your file.
Instead, use textscan, telling it that M is an invalid item and should be replaced with 0:
fid = fopen('x.txt');
N = textscan(fid,'%f','treatAsEmpty',{'M'},'EmptyValue',0);
fclose(fid);
N{1}
ans =
4.6000
6.7000
4.1000
3.1000
0
2.6000
Additional note: it's probably a bad idea to put 0 in where you mean this value was bad, because it will affect the results you get from other functions. I would set EmptyValue to NaN instead.
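A minimal sketch of that NaN variant follows. It writes the sample values to a temporary file (a made-up name) so that it can run on its own, and the variable Nclean is only an illustration of how the bad entries can be dropped afterwards.

```matlab
% Sketch: read the file with 'M' mapped to NaN, then drop the NaN entries
fname = [tempname '.txt'];                 % hypothetical temporary test file
fid = fopen(fname, 'w');
fprintf(fid, '4.6\n6.7\n4.1\n3.1\nM\n2.6\n');
fclose(fid);

fid = fopen(fname, 'r');
C = textscan(fid, '%f', 'TreatAsEmpty', {'M'}, 'EmptyValue', NaN);
fclose(fid);
delete(fname);

N = C{1};                  % column vector with NaN where the file had M
Nclean = N(~isnan(N));     % same data with the bad entries removed
```

Keeping the NaN in N preserves the position of the bad sample, while Nclean is convenient for functions that cannot handle NaN.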
Assuming that you do not really have numbers, but numbers stored as a string, you can use the function strrep.
Try:
A = ['1 2 3 4 M 6'];
A = strrep(A,'M', '0')
Hey, unfortunately you haven't said what kind of datatype you have in N. As given by the OP, double doesn't make sense, because M is not a valid double value as far as I know.
So I assume that you have a cell array containing doubles or strings placed in cells. If so, this code works:
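N={ 1 2 2 42 5 12 'M' 'm' 123}
alphabet=['A':'Z','a':'z'];
for k=1:numel(N)
    % only test character entries, so that a number such as 65 is not mistaken for 'A'
    if ischar(N{k}) && all(ismember(N{k},alphabet))
        N{k}=0;
    end
end
display(N)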
resulting in following console output:
input
N =
[1] [2] [2] [42] [5] [12] 'M' 'm' [123]
output
N =
[1] [2] [2] [42] [5] [12] [0] [0] [123]
You can change what the replacement is in the if statement.
The code can be modified to fit a string as input:
N=['1 2 2 42 5 12 M m 123']
alphabet=['A':'Z','a':'z'];
for k=1:numel(N)
if ismember(N(k),alphabet)
N(k)='0';
end
end
display(N)
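For the cell-array case, a loop-free sketch is also possible. It makes a simplifying assumption that differs from the loop above: every cell that holds text is replaced, whatever the text is. The name isText is only for illustration.

```matlab
% Sketch: replace every text entry of a cell array by 0 without a loop
% (assumption: any char entry counts as invalid, not only letters)
N = {1 2 2 42 5 12 'M' 'm' 123};
isText = cellfun(@ischar, N);   % true where the cell holds text
N(isText) = {0};                % overwrite those cells with the number 0
M = cell2mat(N);                % all cells are numeric now, so this gives a plain vector
```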

reading in a file with textscan and ignoring certain lines

I have an input file which has rows of integers like this:
1 2 3
4 5 6
7 8 9
I want to read in the file, I have used the textscan function for this kind of task before.
But there are a few lines in the data (at random positions) which contain double numbers, for example
<large number of integer lines>
0.12 12.44 65.34
<large number of integer lines>
When reading in the file, I want to ignore these lines. What's the best approach to do this? Can I tell textscan to ignore certain patterns?
The formatSpec argument could be the one you're searching for:
http://www.mathworks.de/de/help/matlab/ref/textscan.html#inputarg_formatSpec
textscan stops reading when the content does not match the given format. If you call textscan a second time with the same file identifier, it resumes reading where it last stopped.
From linked site:
If you resume a text scan of a file by calling textscan with the same
file identifier (fileID), then textscan automatically resumes reading
at the point where it terminated the last read.
One option is simply to read everything in as floats: use textscan or, if your data is all numeric, dlmread or similar might be simpler.
Then just remove the lines you don't want:
data =
1.0000 2.0000 3.0000
4.0000 5.0000 6.0000
0.1200 12.4400 65.3400
7.0000 8.0000 9.0000
data(data(:,1)~=round(data(:,1)),:)=[]
data =
1 2 3
4 5 6
7 8 9
If your later code requires that the type of your data matrix is non-float, use int32 or similar to convert at this point (uint8 can only hold values from 0 to 255).
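A slightly stricter sketch of the same filter is shown below. It checks every column rather than only the first one, so a row such as 1 2 3.5 is also dropped; the sample matrix and the name isIntRow are made up for the example. Note that a line written as 1.0 2.0 3.0 cannot be told apart from 1 2 3 once it has been parsed as numbers.

```matlab
% Sketch: keep only the rows in which every value is a whole number
data = [1 2 3; 4 5 6; 0.12 12.44 65.34; 7 8 9; 1 2 3.5];
isIntRow = all(data == round(data), 2);   % true for rows without fractional parts
data = int32(data(isIntRow, :));          % drop the other rows, convert to integer type
```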
Assuming that you don't know the location and number of the lines with floats, and that you don't want lines such as 1.0 2.0 3.0 or 1 2 3.0 my idea would be to read the file line by line and not store lines which contain a . character.
fid = fopen('file.txt');
nums = [];
line = fgetl(fid);
while ischar(line) % fgetl returns the number -1 at end of file
    if isempty(strfind(line, '.'))
        vals = textscan(line, '%d %d %d');
        nums = [nums; vals{:}];
    end
    line = fgetl(fid);
end
fclose(fid);
nums is the matrix containing your data.

Specify decimal separator for .dat file in matlab [duplicate]

I've got a bunch of .dat files, where the decimal separator is comma instead of dot. Is there any function in MATLAB to set comma as the separator?
You will have to read the data in as text (with textscan, textread, etc.) and convert to numeric.
Say you have read the data into a cell array with each number in a cell:
>> C = {'1,2345','3,14159','2,7183','1,4142','0,7071'}
C =
'1,2345' '3,14159' '2,7183' '1,4142' '0,7071'
Use strrep and str2double as follows:
>> x = str2double(strrep(C,',','.'))
x =
1.2345 3.1416 2.7183 1.4142 0.7071
For your example data from comments, you have a file "1.dat" formatted similarly to:
1,2 3,4
5,6 7,8
Here you have a space as a delimiter. By default, textscan uses whitespace as a delimiter, so that is fine. All you need to change below is the format specifier for the number of columns in your data by repeating %s for each column (e.g. here we need '%s%s' for two columns):
>> fid = fopen('1.dat','r');
>> C = textscan(fid,'%s%s')
C =
{2x1 cell} {2x1 cell}
>> fclose(fid);
The output of textscan is a cell array for each column delimited by whitespace. Combine the columns into a single cell array and run the commands to convert to numeric:
>> C = [C{:}]
C =
'1,2' '3,4'
'5,6' '7,8'
>> x = str2double(strrep(C,',','.'))
x =
1.2000 3.4000
5.6000 7.8000
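An alternative sketch, assuming the file is small enough to hold in memory and that commas appear only as decimal separators: read the whole file as text, replace the commas, and let textscan parse the resulting string directly. The temporary file stands in for "1.dat".

```matlab
% Sketch: replace decimal commas in the whole file, then parse the text
% (assumes commas are never used as column delimiters)
fname = [tempname '.dat'];               % hypothetical stand-in for 1.dat
fid = fopen(fname, 'w');
fprintf(fid, '1,2 3,4\n5,6 7,8\n');
fclose(fid);

txt = fileread(fname);                   % entire file as one character vector
delete(fname);
txt = strrep(txt, ',', '.');             % decimal comma -> decimal point
cols = textscan(txt, '%f %f');           % textscan also accepts text instead of a file ID
x = [cols{:}];                           % 2x2 numeric matrix
```

This avoids the intermediate cell array of strings, at the cost of reading the file into memory in one piece.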

Use textscan in Matlab to output data

I've got a large text file with some headers and numerical data. I want to ignore the header lines and specifically output the data in columns 2 and 4.
Example data
[headers]
line1
line2
line3
[data]
1 2 3 4
5 6 7 8
9 10 11 12
I've tried using the following code:
FID = fopen('datafile.dat');
data = textscan(FID,'%f',4,'delimiter',' ','headerLines',4);
fclose(FID);
I only get an output of 0x1 cell
Try this:
FID = fopen('datafile.dat');
data = textscan(FID,'%f %f %f %f', 'headerLines', 6);
fclose(FID);
data will be a 1x4 cell array. Each cell will contain a 3x1 array of double values, which are the values in each column of your data.
You can access the 2nd and 4th columns of your data by executing data{2} and data{4}.
With your original code, the main issue is that the data file has 6 header lines but you've specified that there are only 4.
Additionally, though, you'll run into problems with the specification of the number of times to match the formatSpec. Take for instance the following code
data = textscan(FID,'%f',4);
which specifies that you will attempt to match a floating-point value 4 times. Keep in mind that after matching 4 values, textscan will stop. So for the sake of simplicity, let's imagine that your data file only contained the data (i.e. no header lines), then you would get the following results when executing that code, multiple times:
>> FID = fopen('datafile_noheaders.dat');
>> data_line1 = textscan(FID,'%f', 4)
data_line1 =
[4x1 double]
>> data_line1{1}'
ans =
1 2 3 4
>> data_line2 = textscan(FID,'%f', 4)
data_line2 =
[4x1 double]
>> data_line2{1}'
ans =
5 6 7 8
>> data_line3 = textscan(FID,'%f', 4)
data_line3 =
[4x1 double]
>> data_line3{1}'
ans =
9 10 11 12
>> data_line4 = textscan(FID,'%f', 4)
data_line4 =
[0x1 double]
>> fclose(FID);
Notice that textscan picks up where it "left off" each time it is called. In this case, the first three times that textscan is called it returns one row from your data file (in the form of a cell containing a 4x1 column of data). The fourth call returns a cell containing an empty array. For the use case you described, this format is not particularly helpful.
The example given at the top should return data in a format that is much easier to work with for what you are trying to accomplish. In this case it will match four floating point values in each of your rows of data, and will continue with each line of text until it can no longer match this pattern.
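Since only columns 2 and 4 are wanted, the unwanted fields can also be skipped in the format string with %*f. The sketch below writes its own small test file with 5 header lines (a made-up temporary file), so the HeaderLines value must be adapted to the real file.

```matlab
% Sketch: read only columns 2 and 4 by skipping the others with %*f
fname = [tempname '.dat'];                 % hypothetical temporary test file
fid = fopen(fname, 'w');
fprintf(fid, '[headers]\nline1\nline2\nline3\n[data]\n');   % 5 header lines
fprintf(fid, '%d %d %d %d\n', [1 2 3 4; 5 6 7 8; 9 10 11 12]');
fclose(fid);

nHeader = 5;                               % number of header lines in this test file
fid = fopen(fname, 'r');
data = textscan(fid, '%*f %f %*f %f', 'HeaderLines', nHeader);
fclose(fid);
delete(fname);

col2 = data{1};   % values from the 2nd column of the file
col4 = data{2};   % values from the 4th column of the file
```

With the skipped fields, textscan returns a 1x2 cell array instead of 1x4, so the wanted columns are data{1} and data{2}.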