Why does Matlab dbf-reader read certain integers wrong? - matlab

I use the matlab dbf reader available
here
I've noticed that three digit integers some times are read wrong.
Original data from dbf-file:
LAMAX,DTLD,1,599,727Q9,A,STANDARD,1,18,18,0,2359.5
But looking at the data in Matlab you see that 599 becomes 995.
Why is that?
'LAMAX','DTLD',[1],[995],'727Q9','A','STANDARD','1','18','18','0',
[2.3595e+03]
This is how I read the dbf file with matlab code
[dbfData, NAMES] = dbfread(path2file);
where dbfData is the actual data and NAMES are the field names in the dbf-file.
EDIT:
The dbf-file was created with INM
When I open the dbf file using OpenOffice the headers look like this
METRIC_ID,C,6 ; GRID_ID,C,8I_INDEX,N,3,0 ; J_INDEX,N,3,0 ; ACFT_ID,C,12 ; OP_TYPE,C,1 ; PROF_ID1,C,8 ; PROF_ID2,C,1 ; RWY_ID,C,8 ; TRK_ID1,C,8 ; TRK_ID2,C,1 ; DISTANCE,N,9,1
The distorted integers are stored with 3 digits numbers without decimals J_INDEX,N,3,0

Have you used the updated version of STR2DOUBLE2CELL?
From the link above:
STR2DOUBLE2CELL subfunction sometimes works incorrectly if number of digits in the input parameter is different

Related

How to import dates correctly from this .csv file into Matlab?

I have a .csv file with the first column containing dates, a snippet of which looks like the following:
date,values
03/11/2020,1
03/12/2020,2
3/14/20,3
3/15/20,4
3/16/20,5
04/01/2020,6
I would like to import this data into Matlab (I think the best way would probably be using the readtable() function, see here). My goal is to bring the dates into Matlab as a datetime array. As you can see above, the problem is that the dates in the original .csv file are not consistently formatted. Some of them are in the format mm/dd/yyyy and some of them are mm/dd/yy.
Simply calling data = readtable('myfile.csv') on the .csv file results in the following, which is not correct:
'03/11/2020' 1
'03/12/2020' 2
'03/14/0020' 3
'03/15/0020' 4
'03/16/0020' 5
'04/01/2020' 6
Does anyone know a way to automatically account for this type of data in the import?
Thank you!
My version: Matlab R2017a
EDIT ---------------------------------------
Following the suggestion of Max, I have tried specifiying some of the input options for the read command using the following:
T = readtable('example.csv',...
'Format','%{dd/MM/yyyy}D %d',...
'Delimiter', ',',...
'HeaderLines', 0,...
'ReadVariableNames', true)
which results in:
date values
__________ ______
03/11/2020 1
03/12/2020 2
NaT 3
NaT 4
NaT 5
04/01/2020 6
and you can see that this is not working either.
If you are sure all the dates involved do not go back more than 100 years, you can easily apply the pivot method which was in use in the last century (before th 2K bug warned the world of the danger of the method).
They used to code dates in 2 digits only, knowing that 87 actually meant 1987. A user (or a computer) would add the missing years automatically.
In your case, you can read the full table, parse the dates, then it is easy to detect which dates are inconsistent. Identify them, correct them, and you are good to go.
With your example:
a = readtable(tfile) ; % read the file
dates = datetime(a.date) ; % extract first column and convert to [datetime]
idx2change = dates.Year < 2000 ; % Find which dates where on short format
dates.Year(idx2change) = dates.Year(idx2change) + 2000 ; % Correct truncated years
a.date = dates % reinject corrected [datetime] array into the table
yields:
a =
date values
___________ ______
11-Mar-2020 1
12-Mar-2020 2
14-Mar-2020 3
15-Mar-2020 4
16-Mar-2020 5
01-Apr-2020 6
Instead of specifying the format explicitly (as I also suggested before), one should use the delimiterImportoptions and in the case of a csv-file, use the delimitedTextImportOptions
opts = delimitedTextImportOptions('NumVariables',2,...% how many variables per row?
'VariableNamesLine',1,... % is there a header? If yes, in which line are the variable names?
'DataLines',2,... % in which line does the actual data starts?
'VariableTypes',{'datetime','double'})% as what data types should the variables be read
readtable('myfile.csv',opts)
because the neat little feature recognizes the format of the datetime automatically, as it knows that it must be a datetime-object =)

MATLAB fwrite\fread issue: two variables are being concatenated

I am reading in a binary EDF file and I have to split it into multiple smaller EDF files at specific points and then adjust some of the values inside. Overall it works quite well but when I read in the file it combines 2 character arrays with each other. Obviously everything afterwords gets corrupted as well. I am at a dead end and have no idea what I'm doing wrong.
The part of the code (writing) that has to contain the problem:
byt=fread(fid,8,'*char');
fwrite(tfid,byt,'*char');
fwrite(tfid,fread(fid,44));
%new number of records
s = records;
fwrite(tfid,s,'*char');
fseek(fid,8,0);
%test
fwrite(tfid,fread(fid,8,'*char'),'*char');
When I use the reader it combines the records (fwrite(tfid,s,'*char'))
with the value of the next variable. All variables before this are displayed correctly. The relevant code of the reader:
hdr.bytes = str2double(fread(fid,8,'*char')');
reserved = fread(fid,44);%#ok
hdr.records = str2double(fread(fid,8,'*char')');
if hdr.records == -1
beep
disp('There appears to be a problem with this file; it returns an out-of-spec value of -1 for ''numberOfRecords.''')
disp('Attempting to read the file with ''edfReadUntilDone'' instead....');
[hdr, record] = edfreadUntilDone(fname, varargin);
return
end
hdr.duration = str2double(fread(fid,8,'*char')');
The likely problem is that your character array s does not have 8 characters in it, but you expect there to be 8 when you read it from the file. Whatever the number of characters in the array is, that's how many values fwrite will write out to the file. Anything less than 8 characters and you'll end up reading part of the next piece of data when you read from the file.
One fix would be to pad s with blanks before writing it:
s = [blanks(8-numel(records)) records];
In addition, the syntax '*char' is only valid when using fread: the * indicates that the output class should be 'char' as well. It's unnecessary when using fwrite.

How to read a number from text file via Matlab

I have 1000 text files and want to read a number from each file.
format of text file as:
af;laskjdf;lkasjda123241234123
$sakdfja;lskfj12352135qadsfasfa
falskdfjqwr1351
##alskgja;lksjgklajs23523,
asdfa#####1217653asl123654fjaksj
asdkjf23s#q23asjfklj
asko3
I need to read the number ("1217653") behind "#####" in each txt file.
The number will follow the "#####" closely in all text file.
"#####" and the close following number just appear one time in each file.
clc
clear
MyFolderInfo = dir('yourpath/folder');
fidin = fopen(file_name,'r','n','utf-8');
while ~feof(fidin)
tline=fgetl(fidin);
disp(tline)
end
fclose(fidin);
It is not finish yet. I am stuck with the problem that it can not read after the space line.
This is another approach using the function regex. This will easily provide a more advanced way of reading files and does not require reading the full file in one go. The difference from the already given example is basically that I read the file line-by-line, but since the example use this approach I believe it is worth answering. This will return all occurences of "#####NUMBER"
function test()
h = fopen('myfile.txt');
str = fgetl(h);
k = 1;
while (isempty(str) | str ~= -1 ) % Empty line returns empty string and EOF returns -1
res{k} = regexp(str,'#####\d+','match');
k = k+1;
str = fgetl(h);
end
for k=1:length(res)
disp(res{k});
end
EDIT
Using the expression '#####(\d+)' and the argument 'tokens' instead of 'match' Will actually return the digits after the "#####" as a string. The intent with this post was also, apart from showing another way to read the file, to show how to use regexp with a simple example. Both alternatives can be used with suitable conversion.
Assuming the following:
All files are ASCII files.
The number you are looking to extract is directly following #####.
The number you are looking for is a natural number.
##### followed by a number only occurs once per file.
You can use this code snippet inside a for loop to extract each number:
regx='#####(\d+)';
str=fileread(fileName);
num=str2double(regexp(str,regx,'tokens','once'));
Example of for loop
This code will iterate through ALL files in yourpath/folder and save the numbers into num.
regx='#####(\d+)'; % Create regex
folderDir='yourpath/folder';
files=cellstr(ls(folderDir)); % Find all files in folderDir
files=files(3:end); % remove . and ..
num=zeros(1,length(files)); % Pre allocate
for i=1:length(files) % Iterate through files
str=fileread(fullfile(folderDir,files{i})); % Extract str from file
num(i)=str2double(regexp(str,regx,'tokens','once')); % extract number using regex
end
If you want to extract more ''advanced'' numbers e.g. Integers or Real numbers, or handle several occurrences of #####NUMBER in a file you will need to update your question with a better representation of your text files.

How to import large dataset and put it in a single column

I want to import the large data set (multiple column) by using the following code. I want to get all in a single column instead only one row (multi column). So i did transpose operation but it still doesn't work appropriately.
clc
clear all
close all
dataX_Real = fopen('dataX_Real_in.txt');dataX_Real=dataX_Real';
I will really appreciate your support and suggestions. Thank You
The sample files can be found using the following link.
When using fopen, all you are doing is opening up the file. You aren't reading in the data. What is returned from fopen is actually a file pointer that gives you access to the contents of the file. It doesn't actually read in the contents itself. You would need to use things like fread or fscanf to read in the content from the text data.
However, I would recommend you use dlmread instead, as this doesn't require a fopen call to open your file. This will open up the file, read the contents and store it into a variable in one function call:
dataX_Real = dlmread('dataX_Real_in.txt');
By doing the above and using your text file, I get 44825 elements. Here are the first 10 entries of your data:
>> format long;
>> dataX_Real(1:10)
ans =
Columns 1 through 4
-0.307224970000000 0.135961950000000 -1.072544100000000 0.114566020000000
Columns 5 through 8
0.499754310000000 -0.340369000000000 0.470609910000000 1.107567700000000
Columns 9 through 10
-0.295783020000000 -0.089266816000000
Seems to match up with what we see in your text file! However, you said you wanted it as a single column. This by default reads the values in on a row basis, so here you can certainly transpose:
dataX_Real = dataX_Real.';
Displaying the first 10 elements, we get:
>> dataX_Real = dataX_Real.';
>> dataX_Real(1:10)
ans =
-0.307224970000000
0.135961950000000
-1.072544100000000
0.114566020000000
0.499754310000000
-0.340369000000000
0.470609910000000
1.107567700000000
-0.295783020000000
-0.089266816000000

How to avoid the repeated paragraghs of long txt files being ignored for importdata in matlab

I am trying to import all double from a txt file, which has this form
#25x1 string
#9999x2 double
.
.
.
#(repeat ten times)
However, when I am trying to use import Wizard, only the first
25x1 string
9999x2 double.
was successfully loaded, the other 9 were simply ignored
How may I import all the data? (Does importdata has a maximum length or something?)
Thanks
It's nothing to do with maximum length, importdata is just not set up for the sort of data file you describe. From the help file:
For ASCII files and spreadsheets, importdata expects
to find numeric data in a rectangular form (that is, like a matrix).
Text headers can appear above or to the left of the numeric data,
as follows:
Column headers or file description text at the top of the file, above
the numeric data. Row headers to the left of the numeric data.
So what is happening is that the first section of your file, which does match the format importdata expects, is being read, and the rest ignored. Instead of importdata, you'll need to use textscan, in particular, this style:
C = textscan(fileID,formatSpec,N)
fileID is returned from fopen. formatspec tells textscan what to expect, and N how many times to repeat it. As long as fileID remains open, repeated calls to textscan continue to read the file from wherever the last read action stopped - rather than going back to the start of the file. So we can do this:
fileID = fopen('myfile.txt');
repeats = 10;
for n = 1:repeats
% read one string, 25 times
C{n,1} = textscan(fileID,'%s',25);
% read two floats, 9999 times
C{n,2} = textscan(fileID,'%f %f',9999);
end
You can then extract your numerical data out of the cell array (if you need it in one block you may want to try using 'CollectOutput',1 as an option).