I've got a large text file with some headers and numerical data. I want to ignore the header lines and specifically output the data in columns 2 and 4.
Example data
[headers]
line1
line2
line3
[data]
1 2 3 4
5 6 7 8
9 10 11 12
I've tried using the following code:
FID = fopen('datafile.dat');
data = textscan(FID,'%f',4,'delimiter',' ','headerLines',4);
fclose(FID);
I only get an output of 0x1 cell
Try this:
FID = fopen('datafile.dat');
data = textscan(FID,'%f %f %f %f', 'headerLines', 6);
fclose(FID);
data will be a 1x4 cell array. Each cell will contain a 3x1 array of double values, which are the values in each column of your data.
You can access the 2nd and 4th columns of your data by executing data{2} and data{4}.
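As a minimal, self-contained sketch of the call above, the snippet below writes the sample data to a temporary file and then pulls columns 2 and 4 into one matrix. The temporary file name and the variable cols24 are hypothetical. This copy of the file has exactly 5 lines before the first numeric row, so it uses 'HeaderLines', 5; count the lines in your own file and adjust.

```matlab
% Write the sample from the question to a temporary file (hypothetical name).
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, '[headers]\nline1\nline2\nline3\n[data]\n');
fprintf(fid, '%d %d %d %d\n', [1 2 3 4; 5 6 7 8; 9 10 11 12].');
fclose(fid);

% Five lines precede the first numeric row in this copy of the file.
fid = fopen(fname, 'r');
data = textscan(fid, '%f %f %f %f', 'HeaderLines', 5);
fclose(fid);
delete(fname);

% Combine columns 2 and 4 into a single 3x2 numeric matrix.
cols24 = [data{2}, data{4}];
disp(cols24)
```

Concatenating the two cells works here because every column has the same length and class.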
With your original code, the main issue is the header count: the data file has 6 header lines, but you specified only 4, so textscan reaches header text where it expects a number and stops without matching anything.
Additionally, though, you'll run into problems with the specification of the number of times to match the formatSpec. Take for instance the following code
data = textscan(FID,'%f',4);
which specifies that you will attempt to match a floating-point value 4 times. Keep in mind that after matching 4 values, textscan will stop. So for the sake of simplicity, let's imagine that your data file only contained the data (i.e. no header lines), then you would get the following results when executing that code, multiple times:
>> FID = fopen('datafile_noheaders.dat');
>> data_line1 = textscan(FID,'%f', 4)
data_line1 =
[4x1 double]
>> data_line1{1}'
ans =
1 2 3 4
>> data_line2 = textscan(FID,'%f', 4)
data_line2 =
[4x1 double]
>> data_line2{1}'
ans =
5 6 7 8
>> data_line3 = textscan(FID,'%f', 4)
data_line3 =
[4x1 double]
>> data_line3{1}'
ans =
9 10 11 12
>> data_line4 = textscan(FID,'%f', 4)
data_line4 =
[0x1 double]
>> fclose(FID);
Notice that textscan picks up where it "left off" each time it is called. In this case, the first three times that textscan is called it returns one row from your data file (in the form of a cell containing a 4x1 column of data). The fourth call returns a cell containing an empty 0x1 array, because there is nothing left to read. For the use case you described, this format is not particularly helpful.
The example given at the top should return data in a format that is much easier to work with for what you are trying to accomplish. In this case it will match four floating point values in each of your rows of data, and will continue with each line of text until it can no longer match this pattern.
Related
I have a text file.
In the file is approx 20,000 rows of data. Each row has one column & contains 256 characters (which are all numbers).
I need to split each row into a cell array or matrix. So each 8 characters are "one piece" of information. I want to split the first 3 characters into a cell array and the next 5 characters into a double, then same again for the next 8 characters.
example
1653256719812345
myCellArray (1 x 2) myDoubleArray (1 x 2)
[165, 198] [32567, 12345]
What is the best way to do this?
Use textscan.
fid = fopen('MyFileName.txt');
data = textscan(fid, '%3d%5d', 'Delimiter', '');
fclose(fid);
testing:
% Test with string of 256 random digits that all happen to be 1:8 repeated 32 times
x = '1234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678';
>> y = textscan(x, '%3d%5d', 'Delimiter', '')
y =
[32x1 int32] [32x1 int32]
>> y{1}
ans =
123
123
123
123
...
I don't know the exact format of your files, so you may have to do this line-by-line within a loop (in which case you would get each line using fgetl and then replace fid in the textscan statement with the output from fgetl).
In general, whenever you find yourself having to read in data that was produced by FORTRAN code (fixed field width text files), textscan's 'Delimiter', '' and 'Whitespace', '' parameters are your friend.
Use regexp. If the file data.txt contains
1653256719812345
1563256719812345
1233256719812345
1463256719812345
Then the following MATLAB statements will read the numbers.
>> txt = fileread('data.txt') % Read entire file in txt
>> out = regexp(txt,'(\d{3})(\d{5})(\d{3})(\d{5})','tokens') % Match regex capturing groups
out =
{1x4 cell} {1x4 cell} {1x4 cell} {1x4 cell}
Each cell in out is a row from the file containing the parsed numbers as strings. You can use str2double to convert the numbers to a numeric data type in MATLAB:
>> nums = cellfun(@str2double,out,'uni',0)
nums =
[1x4 double] [1x4 double] [1x4 double] [1x4 double]
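The regexp approach can be sketched end to end without a file. Here the four sample rows are built with sprintf instead of being read with fileread, and the variable M is a hypothetical name for the final matrix; both are assumptions made only to keep the example self-contained.

```matlab
% Build the four sample rows in memory instead of calling fileread.
txt = sprintf('%s\n', '1653256719812345', '1563256719812345', ...
                      '1233256719812345', '1463256719812345');

% One token set per row: 3 digits, 5 digits, 3 digits, 5 digits.
out = regexp(txt, '(\d{3})(\d{5})(\d{3})(\d{5})', 'tokens');

% Convert each row of tokens to doubles, then stack the rows.
nums = cellfun(@str2double, out, 'UniformOutput', false);
M = vertcat(nums{:});
disp(M)
```

Stacking with vertcat gives one numeric row per line of text, which is usually easier to index than a cell array of 1x4 vectors.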
Iterate over your rows one by one and run something like the following code.
k=int2str(1653256719812345);
>> myCellArray{1}=k(1:3)
myCellArray =
'165'
>> mydoublearray(1)=str2double(k(4:8))
mydoublearray =
32567
If there's some formulaic pattern you should incorporate that instead of manually hard coding it.
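One way to exploit the repeating 8-character pattern, shown here as a sketch rather than the answerer's own method, is to reshape each row so that every record occupies one line of a char matrix. The variable names row, blocks, codes and values are hypothetical, and the example uses the 16-character sample from the question instead of a full 256-character row.

```matlab
row = '1653256719812345';              % sample row from the question

% Each column of reshape(row, 8, []) is one 8-character record,
% so the transpose has one record per row.
blocks = reshape(row, 8, []).';

% Characters 1-3 of every record stay as text in a cell array.
codes = cellstr(blocks(:, 1:3)).';

% Characters 4-8 of every record become doubles.
values = str2double(cellstr(blocks(:, 4:8))).';
```

This assumes the row length is an exact multiple of 8; reshape raises an error otherwise, which is a useful check on malformed rows.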
I have a .out file (.txt) in the form:
This is a text file
This file was created by Andrew on 4/5/14
Certificate Result Test #12
Time A B C D
50 4 3 8 9
55 4 8 7 4
60 8 4 1 4
65 7 1 5 1
70 4 2 2 2
How do I read the numbers in the table into a matrix, called M, in MATLAB whilst ignoring all the text at the start?
I have tried using fscanf and M = dlmread(filename), but I receive errors saying "Mismatch between file and format string" because of the lines of text at the start.
Thanks in advance
Use textscan with the 'HeaderLines' option:
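fid = fopen('my_file.out'); % or whatever your file is called
M = textscan(fid,'%d %d %d %d %d','HeaderLines',7); % int32 columns; set 'HeaderLines' to the number of lines before the first data row in your file
fclose(fid);
Note that M is a 1x5 cell array with one column of the table per cell (M{1} is Time, M{2} is A, and so on), not a numeric matrix.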
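Since the question asks for a matrix, here is a minimal sketch of one way to get there: read with '%f' and 'CollectOutput' so that textscan returns all five columns in a single numeric array. The temporary file and the names T and C are hypothetical. This copy of the file has 4 lines before the first data row (no blank lines), so it uses 'HeaderLines', 4 rather than the 7 above.

```matlab
% Recreate the table from the question in a temporary file (hypothetical name).
T = [50 4 3 8 9; 55 4 8 7 4; 60 8 4 1 4; 65 7 1 5 1; 70 4 2 2 2];
fname = [tempname '.out'];
fid = fopen(fname, 'w');
fprintf(fid, 'This is a text file\n');
fprintf(fid, 'This file was created by Andrew on 4/5/14\n');
fprintf(fid, 'Certificate Result Test #12\n');
fprintf(fid, 'Time A B C D\n');
fprintf(fid, '%d %d %d %d %d\n', T.');
fclose(fid);

% Skip the 4 text lines and collect all same-class columns into one array.
fid = fopen(fname, 'r');
C = textscan(fid, '%f %f %f %f %f', 'HeaderLines', 4, 'CollectOutput', true);
fclose(fid);
delete(fname);

M = C{1};   % 5x5 double matrix
disp(M)
```

Using '%f' instead of '%d' yields doubles, which is usually what later arithmetic expects.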
textscan is a powerful tool, with good low level functions. There is also the more convenient 'importdata' which works for many files of this kind:
m = importdata('my.txt', ' ', 6)
m =
data: [5x5 double]
textdata: {6x5 cell}
colheaders: {'Time' 'A' 'B' 'C' 'D'}
As you can see, it not only returns the data in m.data, but you also get the column headers for free.
I would like to display the data from only one column of a .csv file in a matrix. Each row has multiple integers (three, to be precise) separated by semicolons in a single cell. Here is an example of what the data looks like:
A B
1;2;3
4;5;6
(note that A means column A, column B is empty)
The desired output would be an array in Matlab with 3 columns and 2 rows.
>> matrixFromCsvFile=
1 2 3
4 5 6
What I tried:
fid = fopen('test.csv');
matrixFromCsvFile = textscan(fid, '%d %d %d', 'delimiter', ';')
fclose(fid);
Instead of the desired output I got this:
>> matrixFromCsvFile =
[2x1 int32] [2x1 int32] [2x1 int32]
>> matrixFromCsvFile{1}
>> ans =
1
4
Did I really just create 3 arrays within an array? I want just one. Luckily the 1 and 4 values are correct though. This already took me a long time to achieve, and I'm stuck.
You can fix your example just by adding a CollectOutput flag to textscan:
M = textscan(fid, '%d %d %d', 'delimiter', ';','CollectOutput',1);
By default textscan outputs columns separately (so your data is there, just in e.g. M{1}, M{2}, M{3}). Setting CollectOutput puts consecutive columns of the same class into a single array.
e.g. this would give me five columns in five arrays:
M = textscan(fid, '%d %d %f %f %f');
This would give me two arrays, one containing the first two columns, one containing the last three:
M = textscan(fid, '%d %d %f %f %f','CollectOutput',1);
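The difference can be seen in a short sketch that passes the text to textscan directly instead of a file identifier. The in-memory string txt stands in for the file contents and is an assumption, as are the variable names separate and collected.

```matlab
% Two rows of semicolon-separated integers, standing in for test.csv.
txt = sprintf('1;2;3\n4;5;6');

% Default behaviour: one cell per format specifier.
separate = textscan(txt, '%d %d %d', 'Delimiter', ';');

% CollectOutput merges adjacent columns of the same class.
collected = textscan(txt, '%d %d %d', 'Delimiter', ';', 'CollectOutput', true);

matrixFromCsvFile = collected{1};   % 2x3 int32 matrix
disp(matrixFromCsvFile)
```

Note that the result is still wrapped in a cell array, so one level of brace indexing is needed to reach the matrix.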
Use importdata:
M = importdata('test.csv',';',1)
matrixFromCsvFile = M.data
If you keep your original textscan call instead, you can concatenate the resulting cell array into a single matrix:
matrixFromCsvFile = cell2mat(matrixFromCsvFile);
Another question on fprintf
I have a matrix s(n,5) that I want to shorten (just take columns 3,4 and 5) into s1(n,3) and save with a different name.
s1=s(:,3:5);
txtfilename = [Filename '-1.txt'];
% Open a file for writing
fid = fopen(txtfilename, 'w');
% print values in column order
% three values appear on each row of the file
fprintf(fid, '%f %f %f\n', s1);
fclose(fid);
I don't think I understood the way to use fprintf and rewrite my new matrix, because it is sorting the values.
Thanks for your help
The problem is that MATLAB stores data in column-major order, meaning that when you do s1(:), the first three values are the first three values in the first column, not the first row. (This is the order in which fprintf reads values out of s1.) For example:
>> M = magic(3)
M =
8 1 6
3 5 7
4 9 2
>> M(:)
ans =
8
3
4
1
5
9
6
7
2
You can simply transpose the matrix to output the way you want:
fprintf(fid, '%f %f %f\n', s1.');
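The effect of the transpose can be checked without writing a file by using sprintf, which consumes its arguments in the same order as fprintf. This is only an illustrative sketch on magic(3); the names byColumn and byRow are hypothetical.

```matlab
M = magic(3);                           % [8 1 6; 3 5 7; 4 9 2]

% Without the transpose, values are consumed down the columns of M.
byColumn = sprintf('%d %d %d\n', M);    % lines: 8 3 4 / 1 5 9 / 6 7 2

% With the transpose, each output line is one row of M.
byRow = sprintf('%d %d %d\n', M.');     % lines: 8 1 6 / 3 5 7 / 4 9 2

fprintf('%s', byRow);
```

The same reasoning applies to s1: the format string is reused for every group of three values, so the data has to be arranged so that each row of the output is contiguous in memory.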
I'm trying to read a txt file and then loop through all the strings in it. Unfortunately I can't get it to work.
fid = fopen(fullfile(source_dir, '1.txt'),'r')
read_current_item_cells = textscan(fid,'%s')
read_current_item = cell2mat(read_current_item_cells);
for i=1:length(read_current_item)
current_stock = read_current_item(i,1);
current_url = sprintf('http:/www.', current_item)
.....
I basically try to convert the cell arrays to a matrix as textscan outputs cell arrays. However now I get the message
Error using cell2mat (line 53) Cannot support cell arrays containing cell arrays or objects.
Any help is very much appreciated
That is the normal behaviour of textscan. It returns a cell array where each element of it is another cell OR array (depending on the specifier) containing the values corresponding to each format specifier in the format string you have passed to the function. For example, if 1.txt contains
appl 12
msft 23
running your code returns
>> read_current_item_cells
read_current_item_cells =
{4x1 cell}
>> read_current_item_cells{1}
ans =
'appl'
'12'
'msft'
'23'
which itself is another cell array:
>> iscell(read_current_item_cells{1})
ans =
1
and its elements can be accessed using
>> read_current_item_cells{1}{1}
ans =
appl
Now if you change the format from '%s' to '%s %d' you get
>> read_current_item_cells
read_current_item_cells =
{2x1 cell} [2x1 int32]
>> read_current_item_cells{1}
ans =
'appl'
'msft'
>> read_current_item_cells{2}
ans =
12
23
But the interesting part is that
>> iscell(read_current_item_cells{1})
ans =
1
>> iscell(read_current_item_cells{2})
ans =
0
That means the cell element corresponding to %s is turned into a cell array, while the one corresponding to %d is left as an array. Now since I do not know the exact format of the rows in your file, I guess you have one cell array with one element which in turn is another cell array containing all the elements in the table.
What can happen is that the data gets wrapped into a cell array of cell arrays, and to access the stored strings you need to index past the first array with
read_current_item_cells = read_current_item_cells{1};
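Converting with cell2mat will not work if your strings are not equal in length. In that case you can use strvcat, which pads shorter strings with trailing spaces (char does the same and is the function MathWorks now recommends over strvcat):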
read_current_item = strvcat(read_current_item_cells{:});
Then you should be able to loop through the char array:
for ii=1:size(read_current_item,1)
current_stock = read_current_item(ii,:);
current_url = sprintf('http:/www.', current_stock)
.....
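An alternative that avoids the padded char array altogether is to loop over the cell array and use brace indexing, sketched below on the two-line example file from this answer. The in-memory string txt replaces 1.txt, and names, counts and labels are hypothetical variables; the label format is made up purely for illustration and is not the URL from the question.

```matlab
% Stand-in for the contents of 1.txt.
txt = sprintf('appl 12\nmsft 23');

% '%s %d' returns a cell array of strings and an int32 column.
C = textscan(txt, '%s %d');
names = C{1};
counts = C{2};

% Brace indexing returns each string as a plain char vector.
labels = cell(1, numel(names));
for k = 1:numel(names)
    current_stock = names{k};
    labels{k} = sprintf('%s: %d', current_stock, counts(k));
end
disp(labels)
```

Because each string is taken straight from its cell, no trailing spaces are introduced and strings of different lengths need no special handling.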