import data from files with unknown format in matlab - matlab

I have some files with about 3000 entries including text and numeric data similar to this:
1 0 23 'x' 'x' 'x' 0 0 0 1 1 10.3 54 123.45678 'x' 'x' 'x' ...
I want to import each file's data into a separate 1x3000 vector in MATLAB, but when I use the importdata function, it creates a 1x1 struct with two fields (data and textdata).
file_path = '/home/my/file/path';
list_of_files = dir(file_path);
for i = 3:numel(list_of_files)
new_data = importdata(fullfile(file_path,list_of_files(i).name));
end
I also tried the textscan function, but it requires a format specification and the format of the files is unknown (the length of each file is constant, but it is not clear where an 'x' or a number appears).
Does anybody have suggestions for what to do?

You cannot have an array containing both numeric values and strings.
If you want both numeric values and strings in one container, you have to use a cell array.
Since, as you wrote, you do not like having a struct with two fields, textscan seems a promising way, even if it is a little more complicated.
You can overcome the problem of the format specification by specifying:
the format as a string (%s)
the delimiter as ' (in your example the text is enclosed within two ')
Your input file will then be stored in a cell array as a set of strings.
Now you can extract both the numeric values and the strings by scanning the elements of the cell array.
To identify the numeric values, you can try to convert each string into a numeric array with the function str2num:
if the string contains only numeric values, you can append the values to an array
if the conversion fails, the element was a string, and you can append it to a string
You can also set a flag that enables inserting a value (e.g. NaN) into the numeric output array whenever a string is found; this lets you see where the strings were in the input file.
Also, for both of the above cases, you can record the length of each run of numbers or of each string in another array.
This allows you to work out where a specific number or string was in the input file.
Below you can find a possible implementation of the approach described above.
% Open the input file
fp=fopen('mix_n_s.dat','r');
% Read the input file as strings in a cell array, using "'" as the
% delimiter
c=textscan(fp,'%s','delimiter','''');
% Close the input file
fclose(fp);
% Extract the cell array
a=c{1};
% Initialize the output variables
% Array with the numeric values
numeric_array=[];
% Char array collecting the strings found in the input file
the_strings=[];
% Array with the number of numeric values and the string lengths
the_cnt=[];
% Define the flag enabling the insertion of NaN in the output numeric
% array when a string is found
insert_nan=1;
% Scan the cell array to extract the numbers and the strings
for i=1:length(a)
    x=a{i};
    % If the i-th element is empty (this occurs when there are at least two
    % consecutive strings in the input file), do nothing
    if(~isempty(x))
        % If the i-th element is not empty, try to convert it into a
        % numeric array
        m=str2num(x);
        % If the output is not empty, you have read one or more numeric
        % values
        if(~isempty(m))
            % Then store them in the array
            numeric_array=[numeric_array m];
            % The length of m gives you the number of numeric values;
            % store it in the counter array
            the_cnt=[the_cnt length(m)];
        else
            % If the conversion failed, you have read a string; append it
            % to the string output
            the_strings=[the_strings ' ' x];
            % Store the length of the string in the counter array; if you
            % store it as a negative value, you can recognise it later on
            the_cnt=[the_cnt -length(x)];
            % If the flag is on, insert NaN in the numeric array
            if(insert_nan)
                numeric_array=[numeric_array NaN];
            end
        end
    end
end
% Display the results
numeric_array
the_strings
the_cnt
Based on the input example you've provided (I've slightly modified the strings):
1 0 23 'x' 'abcd' 'efghilm' 0 0 0 1 1 10.3 54 123.45678 'x' 'x' 'x'
the output is the following (the flag for inserting NaN is on):
numeric_array =
Columns 1 through 7
1.0000 0 23.0000 NaN NaN NaN 0
Columns 8 through 14
0 0 1.0000 1.0000 10.3000 54.0000 123.4568
Columns 15 through 17
NaN NaN NaN
the_strings =
x abcd efghilm x x x
the_cnt =
3 -1 -4 -7 8 -1 -1 -1
It can be interpreted as follows:
looking at the numeric_array array: the input file contains three numeric values, then three strings (shown as NaN), then eight numeric values and finally three more strings
looking at the the_cnt array: positive entries count consecutive numeric values, while negative entries give the length of each string (discard the - sign).
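If the quoted strings never contain whitespace, a shorter route to the single 1xN container the question asks for might look like the sketch below. The sample line and the variable names (txt, tokens, vals, out) are made up for illustration. It splits the text on whitespace, lets str2double mark the non-numeric tokens as NaN, and stores numbers and strings together in one cell array.

```matlab
% Hypothetical sample line; in practice txt could come from fileread(...)
txt = '1 0 23 ''x'' ''abcd'' 0 10.3';

% Split on whitespace (assumes no string contains a space)
tokens = strsplit(strtrim(txt));

% str2double returns NaN for every token that is not a plain number
vals  = str2double(tokens);
isNum = ~isnan(vals);

% Build one 1xN cell array: doubles where possible, text elsewhere
out = tokens;
out(isNum)  = num2cell(vals(isNum));
out(~isNum) = strrep(out(~isNum), '''', '');   % drop the surrounding quotes
```

Here out keeps every entry at its original position, so no separate counter array is needed. A file that legitimately contains the unquoted text NaN would need extra handling.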
Hope this helps.
Qapla'

Related

Matlab fscanf read two column character/hex data from text file

I need to read data stored as two columns of hex values in the text file temp.dat into a Matlab variable with 8 rows and two columns.
I would like to stick with the fscanf method.
temp.dat looks like this (8 rows, two columns):
0000 7FFF
30FB 7641
5A82 5A82
7641 30FB
7FFF 0000
7641 CF05
5A82 A57E
30FB 89BF
% Matlab code
fpath = './';
fname = 'temp.dat';
fid = fopen([fpath fname],'r');
% Matlab treats hex as a character string
formatSpec = '%s %s';
% Want the output variable to be 8 rows two columns
sizeA = [8,2];
A = fscanf(fid,formatSpec,sizeA)
fclose(fid);
Matlab is producing the following which I don't expect.
A = 8×8 char array
'03577753'
'00A6F6A0'
'0F84F48F'
'0B21F12B'
'77530CA8'
'F6A00F59'
'F48F007B'
'F12B05EF'
In another variation, I attempted changing the format string like this
formatSpec = '%4c %4c';
Which produced this output:
A =
8×10 char array
'0↵45 F7↵78'
'031A3F65E9'
'00↵80 4A↵B'
'0F52F0183F'
'7BA7B0C20 '
'F 86↵0F F '
'F724700AB '
'F6 1F↵55 '
Still another variation like this:
formatSpec = '%4c %4c';
sizeA = [8,16];
A = fscanf(fid,formatSpec);
Produces a one by 76 character array:
A =
'00007FFF
30FB 7641
5A82 5A827641 30FB
7FFF 0000
7641CF05
5A82 A57E
30FB 89BF'
Would like and expect Matlab to produce a workspace variable with 8 rows and 2 columns.
Have followed the example on the Matlab help area here:
https://www.mathworks.com/help/matlab/ref/fscanf.html
My Matlab code is based on the 'read file contents into an array' section about 1/3 of the way down the page. The example I reference is doing something very similar except that the two columns are one int and one float rather than two characters.
Running Matlab R2017a on Redhat.
Here is the complete code with the solution provided by Azim and comments about
what I learned as a result of posting the question.
fpath = './';
fname = 'temp.dat';
fid = fopen([fpath fname],'r');
formatSpec = '%9c\n';
% specify the output size as the input transposed, NOT the input.
sizeA = [9,8];
A = fscanf(fid,formatSpec,sizeA);
% A' is an 8 by 9 character array, which is the goal matrix size.
% B is an 8 by 1 cell array, each member has this format 'dead beef'.
%
% Cell arrays are data types with indexed data containers called cells,
% where each cell can contain any type of data.
B = cellstr(A');
% split divides str at whitespace characters.
S = split(B)
fclose(fid);
S =
8×2 cell array
'0000' '7FFF'
'30FB' '7641'
'5A82' '5A82'
'7641' '30FB'
'7FFF' '0000'
'7641' 'CF05'
'5A82' 'A57E'
'30FB' '89BF'
It is likely that your 8x2 MATLAB variable will end up being a cell array. This can be done in two steps.
First, your lines have 9 characters, so you could use formatSpec = '%9c\n' to read each line. Next, you need to adjust the size parameter to read 9 rows and 8 columns: sizeA = [9 8]. fscanf fills its output column by column, so each line of the file ends up as one column of the output; transposing the output gets you closer.
In the second step you need to convert the result of fscanf into your 8x2 cell array. Since you have R2017a, you can use cellstr and split to get your result.
Finally, if you need the integer values of each hex value you can use hex2dec on each cell in the cell-array.
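As a rough illustration of that last step, the sketch below applies hex2dec to every cell with cellfun, which keeps the two-column shape of the cell array. S here is a hand-typed two-row stand-in for the 8x2 result, and D is a name chosen for this example.

```matlab
% Stand-in for the first two rows of the 8x2 cell array S
S = {'0000', '7FFF'; ...
     '30FB', '7641'};

% Convert each hex string to its (unsigned) integer value;
% the result has the same size as S
D = cellfun(@hex2dec, S);
```

hex2dec treats its input as unsigned, so values such as CF05 come out as positive numbers. If the file holds signed 16-bit data, a further conversion would be needed.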

How to convert a String to a Matrix Matlab

I'm trying to convert a string into a matrix, with a=1, b=2, ..., "Space"=28, etc.
My question is: how would I convert a string to a matrix?
For example:
abc=[1,2,3]
Tried a for loop, which does convert the string into numbers.
Here is where I try to make it into a Matrix
String1=char(string)
String2=reshape(String1,[10,14]);
The error I get is
"To RESHAPE the number of elements must not change"
on the line String2=reshape(String1,[10,14]);
If you need a general coding from characters into numbers (not necessarily ASCII):
Define the coding by means of a string, such that the character that appears first corresponds to number 1, etc.
Use ismember to do the "reverse indexing" operation.
Code:
coding = 'abcdefghijklmnñopqrstuvwxyz .,;'; %// define coding: 'a' is 1, 'b' is 2 etc
str = 'abc xyz'; %// example text
[~, result] = ismember(str, coding);
In this example,
result =
1 2 3 28 25 26 27
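The RESHAPE error in the question appears because reshape needs exactly rows*columns elements (140 for a 10-by-14 matrix). One possible way around it, sketched under the assumption that padding with the code for a space is acceptable, is to pad the code vector before reshaping. The coding string, the text and the 3-by-4 target size below are hypothetical and chosen only to keep the example small.

```matlab
coding = 'abcdefghijklmnopqrstuvwxyz ';   % hypothetical coding: 'a' is 1, ..., ' ' is 27
str    = 'hello world';                   % example text, 11 characters
nRows  = 3;                               % assumed target size
nCols  = 4;

% Map each character to its position in the coding string
[~, codes] = ismember(str, coding);

% Pad with the code for a space until there are nRows*nCols elements
nPad = nRows*nCols - numel(codes);
if nPad < 0
    error('Text does not fit into a %d-by-%d matrix.', nRows, nCols);
end
[~, padCode] = ismember(' ', coding);
codes = [codes, repmat(padCode, 1, nPad)];

% reshape fills the matrix column by column
M = reshape(codes, nRows, nCols);
```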

Exporting blank values into a .txt file - MATLAB

I'm currently trying to export multiple matrices of unequal lengths into a delimited .txt file, so I have been padding the shorter matrices with 0's such that dlmwrite can use horzcat without error:
dlmwrite(filename{1},[a,b],'delimiter','\t')
However ideally I do not want the zeroes to appear in the .txt file itself - but rather the entries are left blank.
Currently the .txt file looks like this:
55875 3.1043e+05
56807 3.3361e+05
57760 3.8235e+05
58823 4.2869e+05
59913 4.3349e+05
60887 0
61825 0
62785 0
63942 0
65159 0
66304 0
67509 0
68683 0
69736 0
70782 0
But I want it to look like this:
55875 3.1043e+05
56807 3.3361e+05
57760 3.8235e+05
58823 4.2869e+05
59913 4.3349e+05
60887
61825
62785
63942
65159
66304
67509
68683
69736
70782
Is there any way I can do this? Is there an alternative to dlmwrite that does not require matrices of equal lengths?
If a is always longer than b, you could split a into two parts: the first with the same length as b, and the rest. Write the first part together with b, then append the rest:
a = [1 2 3 4 5 6 7 8]';
b = [9 8 7 ]';
len = numel(b);
dlmwrite( 'foobar.txt', [a(1:len), b ], 'delimiter', '\t' );
dlmwrite( 'foobar.txt', a(len+1:end), 'delimiter', '\t', '-append');
You can read in the numeric data and convert to string and then add proper whitespaces to have the final output as string based cell array, which you can easily write into the output text file.
Stage 1: Get the cell of strings corresponding to the numeric data from column vector inputs a, b, c and so on -
%// Concatenate all arrays into a cell array with numeric data
A = [{a} {b} {c}] %// Edit this to add more columns
%// Create a "regular" 2D shaped cell array to store the cells from A
lens = cellfun('length',A)
max_lens = max(lens)
A_reg = cell(max_lens,numel(lens))
A_reg(:) = {''}
A_reg(bsxfun(@le,[1:max_lens]',lens)) = cellstr(num2str(vertcat(A{:}))) %//'
%// Create a char array that has string data from input arrays as strings
wsp = repmat({' '},max_lens,1) %// Create whitespace cell array
out_char = [];
for iter = 1:numel(A)
out_char = [out_char char(A_reg(:,iter)) char(wsp)]
end
out_cell = cellstr(out_char)
Stage 2: Now, that you have out_cell as the cell array that has the strings to be written to the text file, you have two options next for the writing operation itself.
Option 1 -
dlmwrite('results.txt',out_cell(:),'delimiter','')
Option 2 -
outfile = 'results.txt';
fid = fopen(outfile,'w');
for row = 1:numel(out_cell)
fprintf(fid,'%s\n',out_cell{row});
end
fclose(fid);
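For the two-column case in the question, a more direct variant might skip the padding entirely and write each row with fprintf, leaving a field empty once a vector has run out. This is only a sketch: the vectors, the output file name and the %g number format are assumptions made for the example.

```matlab
a = [55875; 56807; 57760];        % longer column (sample values)
b = [3.1043e+05; 3.3361e+05];     % shorter column (sample values)

outfile = fullfile(tempdir, 'blank_demo.txt');   % hypothetical output file
fid = fopen(outfile, 'w');
for k = 1:max(numel(a), numel(b))
    fields = {'', ''};            % empty text means a blank entry
    if k <= numel(a), fields{1} = sprintf('%g', a(k)); end
    if k <= numel(b), fields{2} = sprintf('%g', b(k)); end
    fprintf(fid, '%s\t%s\n', fields{:});   % tab-delimited row
end
fclose(fid);
```

Rows past the end of b still contain the tab, so each line keeps two fields and the file stays easy to parse.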

Matlab: Cell column with mixed char/double entries - how to make all numerical?

I'm importing large datasets into Matlab from different Excel files. I use [~,~,raw] = xlsread('myfile.xlsx') to obtain a raw input into a single Matlab cell.
One column consists of interest rates, and the entries are imported as either CHAR (if they're decimal numbers) or DOUBLE (if they're rounded to integers).
Now, I want to slice out that column and get a numerical vector, which Matlab doesn't like. If I use str2num, all the CHAR entries are converted into DOUBLE, but the DOUBLE entries become NaN. Is there a function/solution that takes into account that some entries are already DOUBLE?
You can probably work this into your existing code rather than create a whole new function, but this should work for you. The function is not vectorized, but since it is applied to a cell vector I don't think that's an issue.
function number = str2numThatHandelsNumericInputs(obj)
if isnumeric(obj)
number = obj;
else
number = str2num(obj);
end
end
Or as Eitan points out a better function:
function num = str2numThatHandelsNumericInputs(num)
if ischar(num)
num = str2num(num);
end
end
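The helper above handles one cell at a time. A sketch of applying the same ischar test to a whole column at once, without a separate function file, could look like this. The column contents and the names col, isText and vals are invented for the example, and str2double is swapped in for str2num because it accepts a cell array of text directly.

```matlab
% Hypothetical column sliced from raw, mixing text and doubles
col = {'1.2345'; 3; 4; '567.1232'};

% Logical mask of the cells that hold text
isText = cellfun(@ischar, col);

% Convert the text cells, copy the numeric cells as they are
vals = zeros(size(col));
vals(isText)  = str2double(col(isText));
vals(~isText) = [col{~isText}];
```

Text that does not parse as a number ends up as NaN in vals, which makes bad entries easy to find afterwards.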
I think I didn't quite understand your question, because I understood you have something like this:
raw = {...
'1.2345' , NaN
3 , inf
4 , @cos
'567.1232' , { struct }
};
In which case you could just use str2double:
>> inds = cellfun('isclass', raw(:,1), 'char'); % indices to non-numeric data
>> raw(inds,1) = num2cell(str2double(raw(inds,1))); % convert in-place
>> [raw{:,1}].' % extract numeric array
ans =
1.2345
3.0000
4.0000
567.1232
But is this what you mean?

read text files containing binary data as a single matrix in matlab

I have a text file which contains binary data in the following manner:
00000000000000000000000000000000001011111111111111111111111111111111111111111111111111111111110000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111111000111100000000000000000000000000000000
00000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111111000111110000000000000000000000000000000
00000000000000000000000000000000000000111111111111111111111111111111111111111111111111111111110000000000000000000000000000000
00000000000000000000000000000000000000000000111111111111111111111111111111111111110000000011100000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111111100111110000000000000000000000000000000
00000000000000000000000000000000000111111111111111111111111111111111111111111111111111110111110000000000000000000000000000000
00000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000000001111111111111111111111111111111111111111111111000011100000000000000000000000000000000
00000000000000000000000000000000000000001111111111111111111111111111111111111111111111000011100000000000000000000000000000000
00000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111000000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111110000011100000000000000000000000000000000
00000000000000000000000000000000000000000000011111111111111111111111111111111111100000000011100000000000000000000000000000000
00000000000000000000000000000000000000111111111111111111111111111111111111111111111111110111100000000000000000000000000000000
Please note that each 1 or 0 is independent, i.e. the values are not decimal numbers. I need to find the column-wise sum of the file. There are 125 columns in all and there are 840946 rows.
I have tried textread, fscanf and a few other matlab commands, but the result is that they all read each row in decimal format and create a 840946x1 array. I want to create a 840946x125 matrix to compute a column wise sum.
You can use textread to do it. Just read strings and later process them with sscanf, one digit at a time
A = textread('data.txt', '%s');
ncols = size(A, 1);
nrows = size(A{1}, 2);
A = reshape(sscanf([A{:}], '%1d'), nrows, ncols);
Note that now A is transposed, i.e. you have 125 rows, one for each column of the file.
The column-wise sum of the file is therefore the sum along the second dimension of A:
colsum = sum(A, 2);
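The orientation is easy to get wrong, so here is a small check of the same sscanf/reshape steps on three made-up lines of four digits each (rows stands in for the cell array that textread returns):

```matlab
% Three hypothetical lines of the file, four columns each
rows = {'0011'; '0101'; '0111'};

% One digit per %1d; reshape puts each line of the file into one column
A = reshape(sscanf([rows{:}], '%1d'), numel(rows{1}), numel(rows));

% A is 4x3, so summing along dimension 2 adds up each file column
colsum = sum(A, 2).';
```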
Here's a slightly hack-ish approach:
A = textread('data.txt', '%s');
colsum = sum(cat(1,A{:})-'0')
Breakdown:
textread will read each line of 0's and 1's as a single string. A will therefore be a cell-string, with each element equal to a string of length 125.
cat(1,A{:}) will concatenate the cell string into a "normal" Matlab character array of size 840946-by-125.
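Subtracting the character '0' (ASCII value 48) from a character array consisting of 0's and 1's returns their numeric values: '0'-'0' = 0 and '1'-'0' = 1. This works because MATLAB converts characters to their ASCII codes in arithmetic; for example, 'a'-0 = 97, the ASCII value for lower-case 'a'.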
sum will finally sum over the columns of this array.
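With 840946 rows, the full numeric matrix takes a lot of memory (840946 x 125 doubles is roughly 840 MB). If only the column sums are needed, one alternative is to accumulate them line by line with fgetl. The sketch below writes a tiny stand-in file first so that it is self-contained; the file name and its contents are hypothetical.

```matlab
% Create a small stand-in for the real data file (3 rows, 4 columns)
fname = fullfile(tempdir, 'bits_demo.txt');
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '0011', '0101', '0111');
fclose(fid);

% Read one line at a time and add its digits to the running sums
fid = fopen(fname, 'r');
colsum = [];
txt = fgetl(fid);
while ischar(txt)                 % fgetl returns -1 at end of file
    bits = txt - '0';             % '0'/'1' characters -> 0/1 doubles
    if isempty(colsum)
        colsum = zeros(size(bits));
    end
    colsum = colsum + bits;
    txt = fgetl(fid);
end
fclose(fid);
```

This trades speed for memory: it never holds more than one line at a time, but looping over every line is likely slower than the vectorized versions above.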