How to open DBase files (.DBF) in Matlab? - matlab

I've googled and searched through Matlab Central, but cannot find any way to open DBF files directly in Matlab. There are some references to DBFREAD function in TMW File Exchange, but it's not available anymore. Is it really a problem?
I do have Database toolbox, but could not find dbf support there.
I don't want to use Excel or other tools for converting files outside of Matlab, since I have a lot of files to process. ODBC also is not good, I need the code to work under Mac and Unix as well.
Please help.

I contacted Brian Madsen, the author of the DBFREAD function. It was deleted from the File Exchange, probably because The MathWorks plans to include it in a future MATLAB release. Brian kindly gave me permission to publish the function here. All copyright information is left intact. I only modified lines 33-38 so that DBFREAD can read files outside the working directory.
function [dbfData, dbfFieldNames] = dbfread(filename, records2read, requestedFieldNames)
%DBFREAD Read the specified records and fields from a DBF file.
%
% [DATA, NAMES] = DBFREAD(FILE) reads numeric, float, character and date
% data and field names from a DBF file, FILE.
%
% [DATA, NAMES] = DBFREAD(FILE, RECORDS2READ) reads only the record
% numbers specified in RECORDS2READ, a scalar or vector.
%
% [DATA, NAMES] = DBFREAD(FILE, RECORDS2READ, REQUESTEDFIELDNAMES) reads
% the data from the fields, REQUESTEDFIELDNAMES, for the specified
% records. REQUESTEDFIELDNAMES must be a cell array. The fields in the
% output will follow the order given in REQUESTEDFIELDNAMES.
%
% Examples:
%
% % Get all records and a list of the field names from a DBF file.
% [DATA,NAMES] = dbfread('c:\matlab\work\mydbf')
%
% % Get data from records 3:5 and 10 from a DBF file.
% DATA = dbfread('c:\matlab\work\mydbf',[3:5,10])
%
% % Get data from records 1:10 for three of the fields in a DBF file.
% DATA = dbfread('c:\matlab\work\mydbf',1:10,{'FIELD1' 'FIELD3' 'FIELD5'})
%
% See also XLSREAD, DLMREAD, DLMWRITE, LOAD, FILEFORMATS, TEXTSCAN.
% Copyright 2008 The MathWorks, Inc.
% $Revision: 1.0 $ $Date: 2008/04/18 05:58:17 $
[pathstr,name,ext] = fileparts(filename);
dbfFileId = fopen(filename,'r','ieee-le');
if (dbfFileId == -1)
dbfFileId = fopen(fullfile(pathstr, [name '.dbf']),'r','ieee-le');
end
if (dbfFileId == -1)
dbfFileId = fopen(fullfile(pathstr, [name '.DBF']),'r','ieee-le');
end
if (dbfFileId == -1)
eid = sprintf('MATLAB:%s:missingDBF', mfilename);
msg = sprintf('Failed to open file %s.dbf or file %s.DBF.',...
name, name);
error(eid,'%s',msg)
end
info = dbfinfo(dbfFileId);
if ~exist('requestedFieldNames','var')
dbfFieldNames = {info.FieldInfo.Name};
requestedFieldNames = dbfFieldNames;
else
dbfFieldNames = {info.FieldInfo(matchFieldNames(info,requestedFieldNames)).Name};
end
fields2read = matchFieldNames(info,requestedFieldNames);
% The first byte in each record is a deletion indicator
lengthOfDeletionIndicator = 1;
if ~exist('records2read','var')
records2read = (1:info.NumRecords);
elseif max(records2read) > info.NumRecords
eid = sprintf('MATLAB:%s:invalidRecordNumber', mfilename);
msg = sprintf('Record number %d does not exist, please select from the range 1:%d.',...
max(records2read), info.NumRecords);
error(eid,'%s',msg)
end
% Loop over the requested fields, reading in the data
dbfData = cell(numel(records2read),numel(fields2read));
for k = 1:numel(fields2read),
n = fields2read(k);
fieldOffset = info.HeaderLength ...
+ sum([info.FieldInfo(1:(n-1)).Length]) ...
+ lengthOfDeletionIndicator;
fseek(dbfFileId,fieldOffset,'bof');
formatString = sprintf('%d*uint8=>char',info.FieldInfo(n).Length);
skip = info.RecordLength - info.FieldInfo(n).Length;
data = fread(dbfFileId,[info.FieldInfo(n).Length info.NumRecords],formatString,skip);
dbfData(:,k) = feval(info.FieldInfo(n).ConvFunc,(data(:,records2read)'));
% dbfData(:,k) = info.FieldInfo(n).ConvFunc(data(:,records2read)');
end
fclose(dbfFileId);
%--------------------------------------------------------------------------
function fields2read = matchFieldNames(info, requestedFieldNames)
% Determine which fields to read.
allFieldNames = {info.FieldInfo.Name};
if isempty(requestedFieldNames)
if ~iscell(requestedFieldNames)
% Default case: User omitted the parameter, return all fields.
fields2read = 1:info.NumFields;
else
% User supplied '{}', skip all fields.
fields2read = [];
end
else
% Match up field names to see which to return.
fields2read = [];
for k = 1:numel(requestedFieldNames)
index = strmatch(requestedFieldNames{k},allFieldNames,'exact');
if isempty(index)
wid = sprintf('MATLAB:%s:nonexistentDBFName',mfilename);
wrn = sprintf('DBF name ''%s'' %s\n%s',requestedFieldNames{k},...
'doesn''t match an existing DBF name.',...
' It will be ignored.');
warning(wid,wrn)
end
for l = 1:numel(index)
% Take them all in case of duplicate names.
fields2read(end+1) = index(l);
end
end
end
%--------------------------------------------------------------------------
function info = dbfinfo(fid)
%DBFINFO Read header information from DBF file.
% FID File identifier for an open DBF file.
% INFO is a structure with the following fields:
% Filename Char array containing the name of the file that was read
% DBFVersion Number specifying the file format version
% FileModDate A string containing the modification date of the file
% NumRecords A number specifying the number of records in the table
% NumFields A number specifying the number of fields in the table
% FieldInfo A 1-by-numFields structure array with fields:
% Name A string containing the field name
% Type A string containing the field type
% ConvFunc A function handle to convert from DBF to MATLAB type
% Length A number of bytes in the field
% HeaderLength A number specifying length of the file header in bytes
% RecordLength A number specifying length of each record in bytes
% Copyright 1996-2005 The MathWorks, Inc.
% $Revision: 1.1.10.4 $ $Date: 2005/11/15 01:07:13 $
[version, date, numRecords, headerLength, recordLength] = readFileInfo(fid);
fieldInfo = getFieldInfo(fid);
info.Filename = fopen(fid);
info.DBFVersion = version;
info.FileModDate = date;
info.NumRecords = numRecords;
info.NumFields = length(fieldInfo);
info.FieldInfo = fieldInfo;
info.HeaderLength = headerLength;
info.RecordLength = recordLength;
%----------------------------------------------------------------------------
function [version, date, numRecords, headerLength, recordLength] = readFileInfo(fid)
% Read from File Header.
fseek(fid,0,'bof');
version = fread(fid,1,'uint8');
year = fread(fid,1,'uint8') + 1900;
month = fread(fid,1,'uint8');
day = fread(fid,1,'uint8');
dateVector = datevec(sprintf('%d/%d/%d',month,day,year));
dateForm = 1;% dd-mmm-yyyy
date = datestr(dateVector,dateForm);
numRecords = fread(fid,1,'uint32');
headerLength = fread(fid,1,'uint16');
recordLength = fread(fid,1,'uint16');
%----------------------------------------------------------------------------
function fieldInfo = getFieldInfo(fid)
% Form FieldInfo by reading Field Descriptor Array.
%
% FieldInfo is a 1-by-numFields structure array with the following fields:
% Name A string containing the field name
% Type A string containing the field type
% ConvFunc A function handle to convert from DBF to MATLAB type
% Length A number equal to the length of the field in bytes
lengthOfLeadingBlock = 32;
lengthOfDescriptorBlock = 32;
lengthOfTerminator = 1;
fieldNameOffset = 16; % Within table field descriptor
fieldNameLength = 11;
% Get number of fields.
fseek(fid,8,'bof');
headerLength = fread(fid,1,'uint16');
numFields = (headerLength - lengthOfLeadingBlock - lengthOfTerminator)...
/ lengthOfDescriptorBlock;
% Read field lengths.
fseek(fid,lengthOfLeadingBlock + fieldNameOffset,'bof');
lengths = fread(fid,[1 numFields],'uint8',lengthOfDescriptorBlock - 1);
% Read the field names.
fseek(fid,lengthOfLeadingBlock,'bof');
data = fread(fid,[fieldNameLength numFields],...
sprintf('%d*uint8=>char',fieldNameLength),...
lengthOfDescriptorBlock - fieldNameLength);
data(data == 0) = ' '; % Replace nulls with blanks
names = cellstr(data')';
% Read field types.
fseek(fid,lengthOfLeadingBlock + fieldNameLength,'bof');
dbftypes = fread(fid,[numFields 1],'uint8=>char',lengthOfDescriptorBlock - 1);
% Convert DBF field types to MATLAB types.
typeConv = dbftype2matlab(upper(dbftypes));
% Return a struct array.
fieldInfo = cell2struct(...
[names; {typeConv.MATLABType}; {typeConv.ConvFunc}; num2cell(lengths)],...
{'Name', 'Type', 'ConvFunc', 'Length'},1)';
%----------------------------------------------------------------------------
function typeConv = dbftype2matlab(dbftypes)
% Construct struct array with MATLAB types & conversion function handles.
typeLUT = ...
{'N', 'double', @str2double2cell;... % DBF numeric
'F', 'double', @str2double2cell;... % DBF float
'C', 'char', @cellstr;... % DBF character
'D', 'char', @cellstr}; % DBF date
unsupported = struct('MATLABType', 'unsupported', ...
'ConvFunc', @cellstr);
% Unsupported types: Logical,Memo,N/ANameVariable,Binary,General,Picture
numFields = length(dbftypes);
if numFields ~= 0
typeConv(numFields) = struct('MATLABType',[],'ConvFunc',[]);
end
for k = 1:numFields
idx = strmatch(dbftypes(k),typeLUT(:,1));
if ~isempty(idx)
typeConv(k).MATLABType = typeLUT{idx,2};
typeConv(k).ConvFunc = typeLUT{idx,3};
else
typeConv(k) = unsupported;
end
end
%----------------------------------------------------------------------------
function out = str2double2cell(in)
% Translate IN, an M-by-N array of class char, to an M-by-1 column vector
% OUT, of class double. IN may be blank- or null-padded. If IN(k,:) does
% not represent a valid scalar value, then OUT(k) has value NaN.
if isempty(in)
out = {[NaN]};
return
end
% Use sscanf when possible, but fall back to str2double for unusual cases.
fmt = sprintf('%%%df',size(in,2));
[data count] = sscanf(reshape(in',[1 numel(in)]),fmt);
if count == size(in,1)
out = cell(count,1);
for k = 1:count
out{k} = data(k);
end
else
out = num2cell(str2double(cellstr(in)));
end
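To make the header layout that readFileInfo and getFieldInfo rely on more concrete, here is a minimal sketch that writes a tiny DBF file by hand and reads the same header fields back. The file contents (one numeric field named VAL, width 5, two records) are made up for illustration, and only the dBASE III style layout assumed by the code above is covered.

```matlab
% Write a tiny dBASE III style file: one numeric field 'VAL' (width 5), two records.
fname = [tempname '.dbf'];
fid = fopen(fname, 'w', 'ieee-le');
fwrite(fid, 3, 'uint8');                            % byte 0: version
fwrite(fid, [108 4 18], 'uint8');                   % bytes 1-3: year-1900, month, day
fwrite(fid, 2, 'uint32');                           % bytes 4-7: number of records
fwrite(fid, 32 + 32 + 1, 'uint16');                 % bytes 8-9: header length
fwrite(fid, 1 + 5, 'uint16');                       % bytes 10-11: record length
fwrite(fid, zeros(1, 20), 'uint8');                 % bytes 12-31: reserved
fwrite(fid, [double('VAL') zeros(1, 8)], 'uint8');  % field name, null-padded to 11 bytes
fwrite(fid, double('N'), 'uint8');                  % field type (numeric)
fwrite(fid, zeros(1, 4), 'uint8');                  % reserved
fwrite(fid, 5, 'uint8');                            % field length (offset 16 in descriptor)
fwrite(fid, zeros(1, 15), 'uint8');                 % decimal count and reserved bytes
fwrite(fid, 13, 'uint8');                           % header terminator (0x0D)
fwrite(fid, double('  12.5'), 'uint8');             % record 1: blank deletion flag + ' 12.5'
fwrite(fid, double('     7'), 'uint8');             % record 2: blank deletion flag + '    7'
fclose(fid);

% Read the header back using the same offsets as the functions above.
fid = fopen(fname, 'r', 'ieee-le');
fseek(fid, 4, 'bof');
numRecords   = fread(fid, 1, 'uint32');
headerLength = fread(fid, 1, 'uint16');
recordLength = fread(fid, 1, 'uint16');
numFields    = (headerLength - 32 - 1) / 32;        % leading block and terminator removed
fseek(fid, 32 + 16, 'bof');
fieldLength  = fread(fid, 1, 'uint8');

% Read the single field from every record, skipping the deletion flag.
fseek(fid, headerLength + 1, 'bof');
fmt = sprintf('%d*uint8=>char', fieldLength);
raw = fread(fid, [fieldLength numRecords], fmt, recordLength - fieldLength);
values = str2double(cellstr(raw'));
fclose(fid);
delete(fname);
```

The skip argument of fread (recordLength - fieldLength) is what lets the main loop in dbfread pull one column out of the fixed-width records in a single call.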
UPDATE
STR2DOUBLE2CELL subfunction sometimes works incorrectly if number of digits in the input parameter is different (see this discussion).
Here is my version of STR2DOUBLE2CELL:
function out = str2double2cell(in)
% Translate IN, an M-by-N array of class char, to an M-by-1 column vector
% OUT, of class double. IN may be blank- or null-padded. If IN(k,:) does
% not represent a valid scalar value, then OUT(k) has value NaN.
if isempty(in)
out = {[NaN]};
return
end
out = cellfun(@str2double,cellstr(in),'UniformOutput',false);
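As a quick check of the replacement, the sketch below runs the same cellfun call on a small char matrix. The sample rows are made up and mimic blank-padded DBF fields whose digit counts differ.

```matlab
% Rows have the same width, as they would when read from a fixed-width DBF field.
in = ['  1.5'; ' 22  '; 'abc  '];

% Same conversion as in the updated str2double2cell: one cell per row,
% NaN where the row is not a valid number.
out = cellfun(@str2double, cellstr(in), 'UniformOutput', false);
```

Each row is converted on its own, so a row with fewer digits does not affect its neighbours, at the cost of being slower than a single sscanf call on large files.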

The way I see it, you have two options:
Method 1: use ODBC to read the dBASE files:
This requires the database toolbox
cd 'path/to/dbf/files/'
conn = database('dBASE Files', '', '');
cur = exec(conn, 'select * from table');
res = fetch(cur);
res.Data
close(conn)
'dBASE Files' is an ODBC Data Source Name (DSN) (I believe it is installed by default with MS Office). It looks for .dbf files in the current directory.
Or maybe you can use a DSN-less connection string with something like:
driver = 'sun.jdbc.odbc.JdbcOdbcDriver';
url = 'jdbc:odbc:DRIVER={Microsoft dBase Driver (*.dbf)};DBQ=x:\path;DefaultDir=x:\path';
conn = database('DB', '', '', driver, url);
...
If this gives you trouble, try using the FoxPro ODBC driver instead.
For Linux/Unix, the same thing could be done. A quick search revealed these:
How do set up an ODBC DSN on Mac or Linux/Unix
how to create and use dBase-format files with OpenOffice
Method 2: read/write .dbf files directly with a library
There's a Java library available for free from SVConsulting, that allows you to read/write DBF files: JDBF. (UPDATE: link seems to be dead, use Wayback Machine to access it)
You can use Java classes directly from MATLAB. Refer to the documentation to see how.
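As a rough illustration of calling Java from MATLAB, the sketch below uses a standard class, java.util.ArrayList, rather than the JDBF API, whose class names are not shown here. The jar location in the commented javaaddpath line is hypothetical, and the example assumes MATLAB is running with its JVM enabled.

```matlab
% For a third-party library you would first put its jar on the Java class path:
% javaaddpath('/path/to/jdbf.jar');   % hypothetical location of the downloaded jar

% Java classes are then created and called with ordinary MATLAB syntax.
list = java.util.ArrayList();
list.add('NAME');
list.add('AGE');

n = list.size();              % number of stored elements
first = char(list.get(0));    % Java indices are zero-based; convert to a MATLAB char
```

Values returned by Java methods usually need converting (for example with char or double) before they behave like native MATLAB data.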

If you are interested only in numerical values, try the xlsread command.

Related

MATLAB time duration in milliseconds from an excel file

I am making a MATLAB program that uses data from an Excel file selected in an open-file dialog.
[filename, pathname] = uigetfile({'*.xlsx','Excel Files(*.xlsx)'; '*.txt','Txt Files(*.txt)'}, 'Pick a file');
FilePath = append(pathname,filename);
opts = detectImportOptions(FilePath, "ReadVariableNames", false);
opts = setvartype(opts, 1, 'char');
data = readtable(FilePath, opts);
table = data(:,1);
This is the code so far.
[image]
As you can see, the time is stored as a string, but what I really want is the time difference (duration) in milliseconds.
The raw data looks like this:
[image]
Columns A and C hold the same time, so I only want to use the data in column A.
Please help a newbie with this! I appreciate it.
Problem resolved.
I am sharing my code to help other people who run into the same problem.
Please leave a comment if it can be simplified further.
Importing an Excel file to analyze
% Clean the memory and the code previously running
clc;
clear all;
close all;
% Sampling frequency of the acquired data
fs = 1e2; % Sampling Frequency - this can be found on LabView code.
Ts = 1/fs; % Sampling Interval
%Importing data from an excel file
[filename, pathname] = uigetfile({'*.xlsx','Excel Files(*.xlsx)'; '*.txt','Txt Files(*.txt)'}, 'Pick a file');
FilePath = append(pathname,filename);
[fPath ,fName, fExt] = fileparts(FilePath);
a. To find "Time duration" from the file
opts = spreadsheetImportOptions("NumVariables", 1);
% Specify sheet and range
opts.Sheet = "sheet1";
opts.DataRange = "A2";
% Specify column names and types
opts.VariableNames = "Time";
opts.VariableTypes = "datetime";
% Specify file level properties
opts.ImportErrorRule = "omitrow";
opts.MissingRule = "omitrow";
% Specify variable properties
opts = setvaropts(opts, "Time", "InputFormat", 'mm:ss.SSS');
tTable = readtable(FilePath, opts, "UseExcel", false);
tArray = table2array(tTable);
% Calculating time duration
tArray = tArray - tArray(1);
% Converting to seconds
time = milliseconds(tArray)*1e-3;
% Clear temporary variables
clear opts;
% Discarding data if time difference is too big
ii = size(time(:,1));
k = 0;
disp('Now removing error data elements');
for i = 1:1:ii(1)-1
a = time(i);
b = time(i+1);
if (b-a)>0.5 && k==0
k=i+1;
fprintf('Elements from %g seconds will be removed. (%dth element)\n', time(k),k);
for j = ii(1):-1:k
if rem(j,10)==0
fprintf('%dth element is removed... \n',j);
end
time(j) = [];
end
break % Break the loop after removing error data
end
end
disp('Time table is set.');
clearvars -except filename FilePath fs pathname time k ii Ts fName
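Since the answer asks whether this can be simplified, here is one possible sketch of the same clean-up step without nested loops. It assumes time is a column vector of seconds, as above; the sample values are made up.

```matlab
% Sample time vector in seconds with one large gap after the fourth element.
time = [0; 0.01; 0.02; 0.03; 5.00; 5.01];

% Index of the first step larger than 0.5 s (empty if there is none).
gapIdx = find(diff(time) > 0.5, 1);

% Drop everything after the gap, as the loop above does.
if ~isempty(gapIdx)
    time(gapIdx+1:end) = [];
end
```

diff compares each element with its successor in one call, so the first index it flags corresponds to i in the loop version and gapIdx+1 to k.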
I created a similar Excel file and found that the time columns are read as char. So I get the names of those columns and convert them to datetime. After that, it appears to work. Hopefully this works for you.
% Define the excel file name (can be the path)
ExcelFile = 'demo_stackflow.xlsx';
% Get the options for that sheet and preserving the variable name
opts = detectImportOptions(ExcelFile,'Sheet','Sheet1',...
"VariableNamingRule","preserve")
% Get the idx where the variable is a char (In this case col 1 and 3)
CharVars = opts.VariableNames(contains(opts.VariableTypes,'char'));
% Convert the char columns to datetime
opts = setvaropts(opts,CharVars,'Type',"datetime");
% Get the data
data = readtable(ExcelFile, opts)
% Print the 1st column to see if the data type of the column is
% datetime
T = data(:,1)
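If the time stamps arrive as text in the form mm:ss.SSS, another possible route is to convert them to duration values directly and subtract the first one. This is a minimal sketch with made-up sample values; it assumes a release in which duration accepts text input with an 'InputFormat' option (R2018a or later, as far as I know).

```matlab
% Made-up time stamps in minutes:seconds.milliseconds form.
stamps = {'00:01.250'; '00:01.750'; '00:02.500'};

% Parse the text as durations rather than as calendar dates.
t = duration(stamps, 'InputFormat', 'mm:ss.SSS');

% Elapsed time since the first sample, as plain numbers in milliseconds.
elapsedMs = milliseconds(t - t(1));
```

Working with duration avoids attaching an arbitrary date to each sample, which is what happens when the column is imported as datetime.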

Matlab: Error using readtable (line 216) Input must be a row vector of characters or string scalar

I got the error "Error using readtable (line 216) Input must be a row vector of characters or string scalar" when I tried to run this code in MATLAB:
clear
close all
clc
D = 'C:\Users\Behzad\Desktop\New folder (2)';
filePattern = fullfile(D, '*.xlsx');
file = dir(filePattern);
x={};
for k = 1 : numel(file)
baseFileName = file(k).name;
fullFileName = fullfile(D, baseFileName);
x{k} = readtable(fullFileName);
fprintf('read file %s\n', fullFileName);
end
% allDates should be out of the loop because it's not necessary to be in the loop
dt1 = datetime([1982 01 01]);
dt2 = datetime([2018 12 31]);
allDates = (dt1 : calmonths(1) : dt2).';
allDates.Format = 'MM/dd/yyyy';
% 1) pre-allocate a cell array that will store
% your tables (see note #3)
T2 = cell(size(x)); % this should work, I don't know what x is
% the x is xlsx files and have different sizes, so I think it should be in
% a loop?
% creating loop
for idx = 1:numel(x)
T = readtable(x{idx});
% 2) This line should probably be T = readtable(x(idx));
sort = sortrows(T, 8);
selected_table = sort (:, 8:9);
tempTable = table(allDates(~ismember(allDates,selected_table.data)), NaN(sum(~ismember(allDates,selected_table.data)),size(selected_table,2)-1),'VariableNames',selected_table.Properties.VariableNames);
T2 = outerjoin(sort,tempTable,'MergeKeys', 1);
% 3) You're overwriting the variabe T2 on each iteration of the i-loop.
% to save each table, do this
T2{idx} = fillmissing(T2, 'next', 'DataVariables', {'lat', 'lon', 'station_elevation'});
end
x holds each xlsx file from the first loop. My xlsx files have different numbers of columns and rows. I want the second loop to process all the xlsx files in the directory.
Do you know what the problem is and how to fix it?
readtable expects a file name as its first input and returns a table. In your code you have the following:
x{k} = readtable(fullFileName);
All fine, you are reading the tables and storing the contents in x. Later in your code you continue with:
T = readtable(x{idx});
You have already read the tables, so what you wrote is effectively T = readtable(readtable(fullFileName)), which passes a table where a file name is expected. Just use T = x{idx}.
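The pattern can be sketched end to end as follows. To keep the example self-contained it writes two small CSV files into a temporary folder instead of using the asker's xlsx files; the folder, file names, and column names are all made up.

```matlab
% Create a temporary folder with two small files of different sizes.
D = tempname;
mkdir(D);
writetable(table([2; 1], [3; 4], 'VariableNames', {'a', 'b'}), fullfile(D, 'f1.csv'));
writetable(table([7; 5; 6], [8; 9; 10], 'VariableNames', {'a', 'b'}), fullfile(D, 'f2.csv'));

% First loop: read every file once and keep the tables in a cell array.
files = dir(fullfile(D, '*.csv'));
x = cell(1, numel(files));
for k = 1:numel(files)
    x{k} = readtable(fullfile(D, files(k).name));
end

% Second loop: x{idx} is already a table, so use it directly.
sortedTables = cell(size(x));
numRows = zeros(size(x));
for idx = 1:numel(x)
    T = x{idx};                          % no second call to readtable
    sortedTables{idx} = sortrows(T, 1);  % sort by the first variable
    numRows(idx) = height(T);
end
```

Storing each result in its own cell (sortedTables{idx}) also avoids overwriting the output on every iteration, which the comments in the question point out.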

For loop iteration through .txt files

I have a function that is part of a class that I have named TestData. Here are the properties and the constructor:
classdef TestData
properties
metaData = []; % stores meta data in Nx2 array
data = []; % stores data in PxQ array
colLabels = []; % labels columns
colUnits = []; % provides units
temp = []; % temporary to bypass setters while structuring data
end
methods
%% Constructor
function this = TestData(varargin) % Will take in a variable number of inputs
if (nargin == 0)
return;
end
if (nargin == 1 && ischar(varargin{1})) % Case of the input being a file (.txt, .s*p, etc..)
this = readDataFromFile(this, varargin{1});
return;
elseif (nargin == 4) % Case of the input being raw data. Note that the order matters.
this.metaData = varargin{1};
this.colLabels = varargin{2};
this.colUnits = varargin{3};
this.data = varargin{4};
return
else
error('Not enough input arguments.');
end
end
Now, let us say that I have 42 text files that I want to pass into the object. I have created a function for this called extractData that sifts through the data and finds a user-specified x-axis and y-axis. It does so by first calling a function readDataFromFile, which sets the property arrays that are outlined in the code above. It is a fairly long function and I won't include it for the time being, but I do know that it is working.
The problem: I create an object of the class called myTestData and run this in the Command Window: myTestData = myTestData.extractData('Tip Type','Tip S/N'). This looks for the Tip Type and the Tip S/N. As you can see in the code, the type and S/N are in column 3 of the matrix. If the string is found, the function takes the row it was found on, accesses column 3 of that row, and places that value into the temp array. In my case, the type is passive and the S/N is AM66191.
The function writes these values to the temp array, but on the second iteration of the for loop it writes something else over the first row, and passive and AM66191 are displayed on the second row. This continues all the way to the end: on row 42 I see the type and S/N, but all the other rows are just garbage. I have attached an image of what the temp array looks like after the second iteration of the for loop. Here is the code for the extractData function:
function this = extractData(this, xAxis, yAxis)
s = dir('*.txt'); % Gather all text files
disp(length(s))
for i=1:length(s) % Loop through and gather data until last element of struct
j = 1;
fid = s(i).name; % File name (a char array, not an open file identifier)
this = this.readDataFromFile(fid);
disp(fid)
x = this.metaData(find(contains(this.metaData(:,1), xAxis)),3);
this.temp(i,j) = x;
disp(this.temp(i,j))
j = j+1;
y = this.metaData(find(contains(this.metaData, yAxis)),3); %#ok<*FNDSB>
this.temp(i,j) = y;
disp(this.temp(i,j))
end %for
end %extractData

Read multiple files with for loop

My code is below. In the code, I am evaluating only the data in the 'fb2010' file. I want to add the other files, 'fb2020', 'fb2030', and 'fb2040', and evaluate their data with the same code. My question is how to apply a for loop to include the other data files. I tried, but I got confused by the for loop.
load('fb2010'); % loading the data
x = fb2010(3:1:1502,:);
% y_filt = filter(b,a,x); % filtering the received signal
y_filt= filter(b,a,x,[],2);
%%%%%%% fourier transform
nfft = length(y_filt);
res = fft(y_filt,nfft,2)/nfft;
res2 = res(:,1:nfft/2+1); %%%% taking single sided spectrum
res3 = fft(res2,[],2);
for i = 3:1:1500 %%%% dividing each row by first row.
resd(i,:) = res3(i,:)./res3(1,:);
end
I'm assuming that your files are MAT-files, not ASCII. You can do this by having load return a struct and using dynamic field referencing:
n = 4;
for i = 1:n
vname = ['fb20' int2str(i) '0']; % Create file/variable name based on index
s = load(vname); % Load data as struct (overwriting previous s)
x = s.(vname)(3:1:1502,:); % Access struct with dynamic field reference
% Rest of your code
...
end
If you're using a plain ASCII file, load won't produce a struct. However, such files are much simpler (see documentation for load/save). The following code would probably work:
n = 4;
for i = 1:n
vname = ['fb20' int2str(i) '0']; % Create file/variable name based on index
s = load(vname); % Load data as matrix (overwriting previous s)
x = s(3:1:1502,:); % Directly index matrix
% Rest of your code
...
end
It would be a good idea to add the file extension to your load command to make your code more readable.
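Putting these suggestions together, the following sketch creates four small MAT-files in a temporary folder and then loops over them with explicit .mat extensions. The folder and the matrix contents are made up; only the fb20x0 naming scheme comes from the question.

```matlab
% Create sample MAT-files fb2010.mat ... fb2040.mat, each holding one variable
% with the same name as the file.
D = tempname;
mkdir(D);
for i = 1:4
    vname = sprintf('fb20%d0', i);
    S = struct(vname, i * ones(4, 3));   % made-up 4-by-3 data
    save(fullfile(D, [vname '.mat']), '-struct', 'S');
end

% Loop over the files, loading each into a struct and indexing it dynamically.
results = cell(1, 4);
for i = 1:4
    vname = sprintf('fb20%d0', i);
    s = load(fullfile(D, [vname '.mat']));   % explicit extension, returns a struct
    x = s.(vname)(3:end, :);                 % dynamic field reference
    results{i} = sum(x(:));                  % stand-in for "rest of your code"
end
```

Keeping the per-file output in a cell array (results{i}) means each file's result survives the next iteration instead of being overwritten.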

Query DBF Data Base in MatLab using JDBC under windows 7 x64 [duplicate]
