Reading numeric data from a CSV file in Matlab for specific columns? - matlab

I have two .csv files which I am trying to read into Matlab as numeric matrices. The first, call it list_a, simply has two columns of ID numbers and corresponding values (approx. 50000 lines) with a ',' delimiter. list_b has 6 columns with a ';' delimiter. I am only interested in the first two columns, which contain numbers; the other columns contain text that I don't care about.
I initially tried using the readtable function in Matlab but noticed that these values aren't stored as numeric values, which is a requirement I have. I couldn't figure out how to cast these as integers after reading them either.
For list_a I have used the dlmread function, which I believe reads the file as numeric values.
For list_b I have tried using the dlmread function, in which row and column offsets can be specified (https://www.mathworks.com/help/matlab/ref/dlmread.html#d117e329603) - the problem here, however, is that the length of the file could change in the future, so I'm not sure what to enter for the row offsets.
I'm also not sure I understand how this function works, considering I tried testing it for the first 1000 rows as follows:
csv_matrix = dlmread(csv_fullpath,';',[1 1 1000 2]);
and subsequently got the following error message - even though "field number 3" shouldn't even be included in the first place:
Error using dlmread (line 147)
Mismatch between file and format character vector.
Trouble reading 'Numeric' field from file (row number 1, field number 3) ==>
RandomTextInFile\n Error in Damage_List_Reader (line 15)
csv_matrix = dlmread(csv_fullpath,';',[1 1 1000 3]);
I get the impression that I'm making this problem a lot harder than it needs to be, so if there's an all-around better way to do this, I'm all ears. Thanks!

I would suggest using fopen in combination with textscan (e.g. for list_a) like this:
file = fopen('list_a.csv');
out = textscan(file, '%d%f', 'delimiter', ',');
ID = out{1};
Vals = out{2};
'%d%f' specifies the FormatSpec, i.e. the way the data is formatted in the file. With this, you can capture any data from a csv file (and also omit columns you don't need). I recommend reading the textscan Matlab doc for further formatting options.
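For list_b, a similar textscan call can skip the text columns with the starred conversion %*s. This is only a sketch: the column types (integer ID, floating-point value, four text columns) are assumed from the question's description.

```matlab
% Sketch for list_b: 6 columns, ';' delimiter
% Assumed types: integer ID, float value, 4 text columns to skip (%*s)
fid = fopen('list_b.csv');
out = textscan(fid, '%d%f%*s%*s%*s%*s', 'Delimiter', ';');
fclose(fid);
ID   = out{1};   % numeric, regardless of how long the file grows
Vals = out{2};
```

This also sidesteps the row-offset problem, since textscan simply reads until the end of the file.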
P.S.: I think you can put an "end" (without the quotations) instead of one of the row offset values if the number of rows/cols isn't fixed.

Related

Can i write out a txt or csv doc with data of varying dimensions in Matlab?

I am using Matlab R2013b.
I have a 100x100 matrix which contains both numbers and strings. I converted it to a cell array (alldat) and wrote it to a csv file (blah.csv).
I then tried to append a single number to the top line of this csv file...which Matlab won't let me do.
cell2csv('blah.csv',alldat)
I can append the single number 'n' at the bottom of the matrix:
dlmwrite('blah.csv',n,'-append','delimiter',' ','roffset',1)
But it won't let me do it the other way around (so I can put the number in the first cell of the csv file, then have the matrix below it).
Can anyone advise?
I also tried outputting the cell array to a txt document using dlmwrite:
dlmwrite('blah.txt',alldat,'delimiter',' ');
And I kept getting this error:
Error using dlmwrite (line 113) The input cell array cannot be
converted to a matrix.
I often use tables for such tasks. Since you have a 100 x 100 array and not variables with different dimensions, it should be possible to adapt.
VarA={'12A3';123;'12B3'};
VarB={'45A6';456;'45B6'};
T=table(VarA,VarB);
writetable(T,'test.csv','WriteVariableNames',false)
T1=readtable('test.csv','ReadVariableNames',false)
You may want to use cell2table to create a table directly from your cell array, although it didn't work for me because it made some strange conversions from number to character.
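As for getting the single number onto the first line: one possible sketch is to write the number with fprintf first and then append the table below it. Note that the 'WriteMode' option of writetable is only available in newer releases (around R2020a), so this is an assumption about your MATLAB version; n and T below reuse the names from the question and the example above.

```matlab
% Sketch: single number on the first line, table appended below it
% ('WriteMode' requires a newer MATLAB release, ~R2020a)
n = 42;                                    % example value for the first cell
fid = fopen('test.csv', 'w');
fprintf(fid, '%d\n', n);                   % the number on its own first line
fclose(fid);
writetable(T, 'test.csv', 'WriteVariableNames', false, ...
    'WriteMode', 'append');                % append the table below the number
```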

Reading a text file with no delimiter into a vector MATLAB

I have a text file that has 200 rows, and there are 200 values in every row. The file consists of integers, but they are not separated by any delimiter, not even a space. Here is an example,
1111111111111111111111111111111111111111122222222222222222222222222220000111
1111111111111111111111111111100000000003123333333333333333333333333333300002
0000000000022222222222222222222222222222222211111121212222222222222222111111
The file may contain some strings at the beginning, but I want only to read these numbers. I want to be able to count the occurrence of every integer. So, I will read all these numbers into a vector, or a matrix, where every element in the vector is a number in the file. So, the vector must contain 200 * 200 elements. Then, I will calculate the occurrence of every element.
I checked available file reading methods like textscan, but I think that textscan with this format C = textscan(fid,'%d %d'); would require specifying %d 200 * 200 times. Is this the case, or is there a way to use textscan here?
I also tried importdata, but when I tried to print the result I didn't get the numeric values. It seems that it only reads the first row, because of this line 200x1 double. Here is the output,
A =
data: [200x1 double]
textdata: {6x1 cell}
colheaders: {[1x107 char]}
Can you please tell me what method I can use to read the file described above?
With the data you have, importdata imports only the double values and the headers. You could use the readtable function as follows (I assume 1 header line):
datafile='test.txt';
headerlines=1;
%OPTION1
A=readtable(datafile); %from Matlab R2013b
AA=cell2mat(table2array(A(headerlines+1:end,:)));
%OPTION2
A=textread(datafile,'%s'); %from Matlab R2006a
AA=cell2mat(A(headerlines+1:end,:));
%PROCESSING
b=zeros(size(AA));
for k=1:size(AA,1)
b(k,:)=str2double(regexp(AA(k,:),'\d','match'));
end
%COUNTING
[nelements,centers]=hist(b',0:9);
The regular expression does the trick of getting out the numbers to columns:
regexp('01112345640','\d','match')
This should return a 1x11 cell with the numbers in char-format.
A simple approach:
each integer is a separate number (in the desired output), so read the data in line by line as a string, then loop over the characters:
for j = 1:numel(a_line_of_integers)
    x(j) = str2double(a_line_of_integers(j)); % convert one character at a time
end
And repeat for every row you read in. Note in passing that if you switch to R, x = as.numeric(strsplit(a_line_of_integers, "")[[1]]) is much faster and easier.
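Since every character in a line is itself a digit, the per-character str2double loop can also be replaced by character arithmetic, which converts a whole line at once. A sketch, with two short stand-in lines in place of the file contents:

```matlab
% Vectorized digit conversion via char arithmetic ('3' - '0' is 3)
lines = {'11122', '00031'};            % stand-ins for lines read from the file
M = zeros(numel(lines), numel(lines{1}));
for k = 1:numel(lines)
    M(k,:) = lines{k} - '0';           % one full row converted at once
end
counts = histc(M(:), 0:9);             % occurrences of each digit 0..9
```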

importing demarcated blocks of a file into distinct cells of an array

I have an input file having the following basic structure:
master header line(s)
block 1 header line(s)
... [m' x n] numerical matrix ...
block 2 header line(s)
... [m'' x n] numerical matrix ...
...
block N header line(s)
... [m(N) x n] numerical matrix ...
where n is constant, but m may assume different values (as indicated by the prime marks).
I am wondering if there is a simple way to load data of this organization into a cell array (or another structure of some kind) having the following form: each block of data (as defined by the header) is represented by a cell in a cell array, the contents of which are the numerical data in the form of a double array. To concretize that description, the desired MATLAB representation would appear as follows: cell{1} contains a double array containing the numerical data listed under the block 1 header; cell{2} contains a double array containing the numerical data listed under the block 2 header; etc.
Of course, there are simple alternatives, such as splitting the input file into individual block-specific files and sequentially reading each file into an element of a cell array via a loop statement, but I am interested to know whether there is a solution that does not require such manipulation.
I've had to do something similar. One way, as you say, is to divide into files. But really, since your file has a set structure:
1 - open the file (e.g. using fopen)
2 - read the master header line (e.g. using fgetl)
3 - read a block header (e.g. using fgetl)
4 - read the next M rows (e.g. using fgetl, fscanf, etc.) and store them as a matrix
5 - loop back to 3 until EOF.
(apologies for the pseudocode, I don't have access to Matlab on this computer)
Yes, this is still manipulation of the file, but it becomes extendable to when the file isn't as ordered as the example you gave (which is the case I have), and is extremely easy to read and debug. However, it will be slow if your file is hundreds of MBs.

dynamically specifying floating point number size in a function-enclosed sscanf statement

I have a structured data file consisting of header lines interspersed with blocks of data. I am reading each block of data (as defined by the header line) into a separate cell of a cell array. For instance, suppose that after loading the data with textscan, I have a cell array x and an array of indices of header lines and EOF (headerIdx) of the following form:
x={'header line 1';'98.78743';'99.39717';'99.93578';'100.40125';'100.79166';'101.10525';'101.34037';'101.49553';'101.56939';'101.56072';'101.4685';'101.29184';'101.03002';'100.68249';'header line 2';'100.24887';'99.72897';'99.12274';'98.43036';'97.65215';'96.78864';'95.84054';'header line 3';'3.2';'4.31';'2.7';'4.6';'9.3'};
headerIdx=[1;16;24;30];
I then attempt to extract each block of data below a header line into a separate element of a cell array using sscanf and str2mat (as suggested by this post). Initially, this approach failed because the elements within a given block of data were of different length. This can be solved by including a numerical flag for the '%f' argument to help sscanf know where to delimit the input data (as suggested by this post). One can then use a strategy such as the following to effect the conversion of structured data to a cell array of block-specific double arrays:
extract_data = @(n) sscanf(str2mat(x(headerIdx(n)+1:headerIdx(n+1)-1)).',['%' num2str(size(str2mat(x(headerIdx(n)+1:headerIdx(n+1)-1)).',1)) 'f']);
extracted_data = arrayfun(extract_data,1:numel(headerIdx)-1,'UniformOutput',false);
The numerical flag of the format string can either be set to something arbitrarily large to encompass all the data, or can be set on a block-specific basis as I have done in the example above. The latter approach leads to redundant evaluation of str2mat (once for the input to sscanf and once for the input to the '%f' string generator). Can this redundancy be avoided without using loop statements that store the output of the str2mat command in a temporary variable? Note that one cannot simply take the output of the size command applied to the output of str2mat(x).' on the entire data set because the header lines are generally going to be the lines with the greatest number of characters.
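One common trick for avoiding the redundant str2mat evaluation without an explicit loop is to bind its result through a second anonymous function, so the char matrix is computed once and passed in as an argument. A sketch reusing the names from above:

```matlab
% Compose two anonymous functions so str2mat runs once per block
pick = @(n) str2mat(x(headerIdx(n)+1:headerIdx(n+1)-1));  % block as char matrix
conv = @(s) sscanf(s.', ['%' num2str(size(s,2)) 'f']);    % size(s,2) = chars per line
extract_data   = @(n) conv(pick(n));
extracted_data = arrayfun(extract_data, 1:numel(headerIdx)-1, 'UniformOutput', false);
```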
Finally, I have constructed the x matrix above to reflect the fact that some blocks of data may have different precision than other blocks. This is the reason to set the format string in a block-specific manner. My testing has shown that despite accurate construction of a block-specific format string (['%' num2str(size(str2mat(x(headerIdx(n)+1:headerIdx(n+1)-1)).',1)) 'f']), the data in all elements of the resulting cell array (extracted_data) are ultimately forced to have the same precision (see below). Why is this the case, and how can it be corrected?
extracted_data{:}
ans =
98.7874
99.3972
99.9358
100.4013
100.7917
101.1052
101.3404
101.4955
101.5694
101.5607
101.4685
101.2918
101.0300
100.6825
ans =
100.2489
99.7290
99.1227
98.4304
97.6522
96.7886
95.8405
ans =
3.2000
4.3100
2.7000
4.6000
9.3000

Matlab: reading from a .csv file

I am trying to import some data from a .csv file. I have searched for solutions, but none seems to solve my problem. My .csv is just one column of numbers, but when I try to read it with csvread('myfile.csv') it says that it cannot convert from string. When I double-click on the .csv file in matlab I can see that every number from the .csv has this aspect:
"996.47"
So every number is between double quotes, and whatever I do I can not get just the number between them. I have also tried opening the file and using textscan, but I can't find a way. Thank you very much in advance.
You can try this workaround:
V = dlmread('myfile.csv','"');
v = V(:,2)
According to your description you have one column of values formatted like "996.47". The first line creates a matrix where columns are delimited by '"' - you get three columns where the middle one is filled with your values. The second line extracts the middle column.
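Another option along the same lines: textscan's %q conversion reads a double-quoted field and strips the quotes, so the values can be converted directly. A minimal sketch:

```matlab
% %q reads quoted fields like "996.47" and strips the quotes
fid = fopen('myfile.csv');
out = textscan(fid, '%q');
fclose(fid);
v = str2double(out{1});          % numeric column vector
```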
what about using
importdata('yourfile.csv')
It should work if you are only interested in data.
If you want a more generic solution that doesn't need to deal with indexing, you can use MATLAB's built-in function importdata.
x = importdata('yourfile.csv'); % reads in the file as text surrounded by double quotes
x = cellfun(@str2num, strrep(x, '"', '')); % removes the double quotes and converts the text to numbers