Dimensions of matrices being concatenated are not consistent - matlab

I read a CSV file with textscan, and when I try to write the result to a file I get this error: Error using horzcat. Dimensions of matrices being concatenated are not consistent.
If I change the first format specifier in textscan (I mean %s) to %f, the error vanishes.
The error occurs when MATLAB tries to build [datatest{1} probability].
probability is a 1000*1 double
datatest{1} is a 1000*1 cell
datatest=textscan(FileID,'%s %*f %f %f %*s %*s %*s %*s %*s %*s %*s %*s %*s %f %f %f %f %f %f %f %f %f %f',1000,'headerlines',1,'delimiter',',');
csvwrite('output.csv',[datatest{1} probability]);

Your variable datatest{1} contains 1000 cells, each of which holds a string (these may or may not all be the same length).
In your statement [datatest{1} probability] you are trying to concatenate a cell array (containing strings) with a numeric double array, and this does not work. When one operand is a cell array, MATLAB wraps the other operand into a single cell, so it ends up trying to join a 1000*1 cell with a 1*1 cell, hence the dimension error.
Even if you were to create a cell array containing all your desired columns, myCellArray={datatest{1} probability}, this would not help you, because that output cannot be passed to the function csvwrite.
csvwrite, or its better sister dlmwrite, do not accept cell arrays. You would have to convert the cell values into numeric values. Unfortunately, you want to write both strings and numeric values, so your only way is to use low-level functions like fprintf.
In your case, to write the file you were expecting, you can use the following code.
col1 = datatest{1} ; %// extract the column of interest for easier indexing later on
fidw = fopen('output.csv','w') ; %// get a handle on a file to write (necessary with "fprintf")
for iline = 1:numel(probability) %// loop on each line
fprintf( fidw , '%s, %f\n' , col1{iline} , probability(iline) ) ; %// write the line
end
fclose(fidw) ; %// close the file - IMPORTANT - (necessary with "fprintf")
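If your MATLAB release has tables (R2013b or later), another route is to put both columns in a table and let writetable deal with the mixed types. This is only a sketch: the data and the output path below are made-up stand-ins for datatest{1}, probability and output.csv.

```matlab
% Stand-in data (hypothetical); in the question these come from textscan
col1 = {'id_a'; 'id_b'; 'id_c'};     % N-by-1 cell array of strings
probability = [0.25; 0.5; 0.75];     % N-by-1 double

% A table can hold columns of different types side by side
T = table(col1, probability);

% Write the rows without a header line of variable names
outFile = fullfile(tempdir, 'output_sketch.csv');   % hypothetical path
writetable(T, outFile, 'WriteVariableNames', false);

% Read the file back as text to inspect what was written
txt = fileread(outFile);
disp(txt)
```

The fprintf loop above still gives you full control over the number format (for example '%.6f'), which writetable does not.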

Related

I want to read this specific csv file, using Matlab. I used textscan but I failed

csv file:
Date,Open,High,Low,Close,Volume,Adj Close
20170217,64.470001,64.690002,64.300003,64.620003,21234600,64.620003
20170216,64.739998,65.239998,64.440002,64.519997,20524700,64.519997
I used this:
fileID = fopen('table.csv');
C = textscan(fileID,'%s %f %f %f %f %d %f','Delimiter',',');
fclose(fileID);
celldisp(C)
but it does not read anything.
You can use the csvread function to read a csv file.
m=csvread('table.csv',1,0)
The values are stored in a matrix.
Since your file has a header line, you have to tell csvread to start reading from the second row of the file.
You can do it by adding two parameters in the call:
the first defines the row from which to start reading (notice that the index is zero-based, so 1 means the second row)
the second defines the column from which to start (in the example, 0 means the first column)
If, nevertheless, you want to use textscan, you have to modify your code as follows:
fileID = fopen('table.csv');
% C = textscan(fileID,'%s %f %f %f %f %d %f','Delimiter',',');
C1 = textscan(fileID,'%s',2);
C2 = textscan(fileID,'%d%f%f%f%f%d%f','delimiter',',')
fclose(fileID);
You have to call textscan twice:
the first time to read the first row (the header)
the second time to read the data
Notice the third parameter in the first call: it specifies that the format (%s) has to be applied twice.
This is because no delimiter is given in that call, so textscan splits on whitespace, and the space in the last header field (Adj Close) breaks the header row into two strings.
Once you've read the header row, you call textscan again to read the numeric values.
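As an alternative to the two calls, textscan can skip the header itself through its 'HeaderLines' option, which lets you keep the original format string. A minimal sketch, assuming the file looks exactly like the sample in the question (it is recreated here in a temporary, hypothetical location so the snippet runs on its own):

```matlab
% Recreate the sample file from the question in a temporary location
fname = fullfile(tempdir, 'table_sketch.csv');   % hypothetical path
fid = fopen(fname, 'w');
fprintf(fid, 'Date,Open,High,Low,Close,Volume,Adj Close\n');
fprintf(fid, '20170217,64.470001,64.690002,64.300003,64.620003,21234600,64.620003\n');
fprintf(fid, '20170216,64.739998,65.239998,64.440002,64.519997,20524700,64.519997\n');
fclose(fid);

% Skip one header line, then parse the comma-separated data rows
fileID = fopen(fname);
C = textscan(fileID, '%s %f %f %f %f %d %f', ...
    'Delimiter', ',', 'HeaderLines', 1);
fclose(fileID);

dates = C{1};   % cell array of date strings
opens = C{2};   % double column with the Open values
celldisp(C)
```

Without 'HeaderLines', the first %f meets the text Open in the header row and cannot convert it, which is consistent with the original call returning no usable data.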
You can also read the CSV with xlsread('file');
If that returns NaN values, request all three outputs:
[num, text, raw] = xlsread('file');
and loop over the text output to handle the non-numeric cells.

matlab reading mixed data from file

I am pretty new to MATLAB. I've been reading the documentation but can't figure out why MATLAB does not correctly read the string from the file. What I am trying to do is to read mixed data types from a file. Some sample data is:
t a e incl lasc aper meanan truean rupnode rdnnode name
0.000000 1.2712052487 0.8899021688 22.2458 265.2511471042 322.1539251184 -13.6281352271 -130.986 0.155342 0.889756 phaet_000018
0.000000 1.2712052478 0.8899021575 22.2458 265.2511428392 322.1539270642 -13.6281369694 -130.986 0.155342 0.889756 phaet_000044
0.000000 1.2712052496 0.8899021868 22.2458 265.2511587897 322.1539149438 -13.6281365049 -130.986 0.155342 0.889755 phaet_000006
The first line is header. So here is what I've done so far:
fid = fopen('data.dat');
header = fgetl(fid); % I read the header
Now I read the data:
data = fscanf(fid,'%f %f %f %f %f %f %f %f %f %f %s',[11 inf]);
data1 = data';
fclose(fid);
I can now access the first element as:
data1(1,1)
However, when I do:
data(1,11)
instead of phaet_000018 I am getting a number (112). Any idea what I am doing wrong?
There are a few issues with your code.
First, your sizeA input to fscanf is backwards. sizeA with a vector input is defined as:
Read at most m*n numeric values or character fields. n can be Inf, but m cannot. The output, A, is m-by-n, filled in column order.
So you've asked fscanf to give you 11 rows and whatever number of columns. You can't have an Inf row specification so you'll want to remove the third input entirely and reshape your data afterwards.
For example:
fid = fopen('data.dat');
header = fgetl(fid);
data = fscanf(fid,'%f %f %f %f %f %f %f %f %f %f %s');
fclose(fid);
% We just happen to know this explicitly, not knowledge to generally assume
ncols = 22;
% Reshape and transpose
data = reshape(data, ncols, []).';
Gives us a 3 x 22 data array, which is kinda sorta what we want.
So where are the extra columns coming from? For %s fields, fscanf reads the string until it encounters whitespace. Because the output of fscanf is a numeric array it must convert this string into a numeric value, so it converts each character to its numeric equivalent (double(letter)) and outputs that into the matrix.
Using our above data matrix as an example, we have:
>> char(data(1, 11:end))
ans =
phaet_000018
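The same character-to-number round trip can be reproduced without any file, which also explains the 112 from the question. A small sketch using one of the names from the sample data:

```matlab
name = 'phaet_000018';

% fscanf stores each character of a %s field as its character code
codes = double(name);      % 1-by-12 row vector of codes
firstCode = codes(1);      % code of the letter 'p'
disp(firstCode)

% char() turns the codes back into the original text
restored = char(codes);
disp(restored)
```

Since each name takes 12 numeric slots, a row holds 10 + 12 = 22 values, which is where ncols = 22 comes from.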
With this in mind, your initial code only happens to work because all of your strings are the same length. If we change the length of one or more of the strings, this data import will fail:
Error using reshape
Product of known dimensions, 22, not divisible into total number of elements, 65.
Error in testcode (line 11)
data = reshape(data, ncols, []).';
So what can we do instead? If you need this string from your data I would recommend trying textscan:
fid = fopen('data.dat');
header = fgetl(fid);
data = textscan(fid, '%f %f %f %f %f %f %f %f %f %f %s');
fclose(fid);
This will read your data into a 1x11 cell array, where each column corresponds to a column in your data:
>> data{1} % t
ans =
0
0
0
To collect your numeric data you can iterate through the cell array, or you can use the 'CollectOutput' flag in textscan:
fid = fopen('data.dat');
header = fgetl(fid);
data = textscan(fid, '%f %f %f %f %f %f %f %f %f %f %s', 'CollectOutput', true);
fclose(fid);
Which will output a 1x2 cell array, where data{1} is your numeric array and data{2} is a cell array containing your strings:
>> data{1} % Numeric data
ans =
0 1.2712 0.8899 22.2458 265.2511 322.1539 -13.6281 -130.9860 0.1553 0.8898
0 1.2712 0.8899 22.2458 265.2511 322.1539 -13.6281 -130.9860 0.1553 0.8898
0 1.2712 0.8899 22.2458 265.2512 322.1539 -13.6281 -130.9860 0.1553 0.8898
>> data{2} % Strings
ans =
3×1 cell array
'phaet_000018'
'phaet_000044'
'phaet_000006'
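Once the numbers and the names are in separate containers, strcmp on the name column gives a logical index into the numeric matrix. The sketch below recreates the three sample rows in a temporary file (the path is hypothetical) and looks up the a value for one name; the variable names colNames, idx, row and aValue are only for illustration.

```matlab
% Recreate the sample data in a temporary file (hypothetical path)
fname = fullfile(tempdir, 'data_sketch.dat');
fid = fopen(fname, 'w');
fprintf(fid, 't a e incl lasc aper meanan truean rupnode rdnnode name\n');
fprintf(fid, '0.000000 1.2712052487 0.8899021688 22.2458 265.2511471042 322.1539251184 -13.6281352271 -130.986 0.155342 0.889756 phaet_000018\n');
fprintf(fid, '0.000000 1.2712052478 0.8899021575 22.2458 265.2511428392 322.1539270642 -13.6281369694 -130.986 0.155342 0.889756 phaet_000044\n');
fprintf(fid, '0.000000 1.2712052496 0.8899021868 22.2458 265.2511587897 322.1539149438 -13.6281365049 -130.986 0.155342 0.889755 phaet_000006\n');
fclose(fid);

fid = fopen(fname);
header = fgetl(fid);
% Ten %f fields followed by one %s field, built with repmat
fmt = [repmat('%f ', 1, 10) '%s'];
data = textscan(fid, fmt, 'CollectOutput', true);
fclose(fid);

colNames = strsplit(header);                 % 11 column names from the header
idx = strcmp(data{2}, 'phaet_000044');       % logical row index by name
row = data{1}(idx, :);                       % the 10 numeric values of that row
aValue = row(strcmp(colNames(1:10), 'a'));   % pick the column called 'a'
fprintf('%.10f\n', aValue);
```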

Using textscan to read certain rows

I am trying to read data from a text file using textscan in MATLAB. Currently, the code provided below reads rows 1 to 4. I need it to read rows 5 to 8, then rows 9 to 12, and so on. How would I achieve this?
fileID=fopen(fileName);
num_rows=4;
nHeaderLines = 2;
formatSpec = '%*s %*s %s %s %*s %*s %*s %f %*s';
dataIn = textscan(fileID,formatSpec,num_rows,'HeaderLines',nHeaderLines, 'Delimiter',',' );
fclose(fileID);
Use
file = fopen('myfile');
content = textscan(file,'%s','delimiter','\n');
fclose(file);
and you have all the lines in your file as cell array of strings. Then take any number of rows you want and process them as you like.
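Taking the rows in groups of four could then look roughly like this. It is a sketch only: the file contents and path are invented so that the snippet runs on its own, and allLines, starts and blocks are illustrative names. If your file has header lines, drop them first with allLines = allLines(nHeaderLines+1:end).

```matlab
% Build a small stand-in file with 10 comma-separated rows (hypothetical data)
fname = fullfile(tempdir, 'rows_sketch.txt');
fid = fopen(fname, 'w');
for k = 1:10
    fprintf(fid, 'row%d,%d\n', k, k*10);
end
fclose(fid);

% Read every line of the file into a cell array of strings
file = fopen(fname);
content = textscan(file, '%s', 'delimiter', '\n');
fclose(file);
allLines = content{1};

% Cut the lines into consecutive blocks of num_rows (the last may be shorter)
num_rows = 4;
starts = 1:num_rows:numel(allLines);
blocks = cell(1, numel(starts));
for b = 1:numel(starts)
    last = min(starts(b) + num_rows - 1, numel(allLines));
    blocks{b} = allLines(starts(b):last);
end

% Example: split the first line of the second block (row 5) into fields
fields = strsplit(blocks{2}{1}, ',');
disp(fields)
```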

Detect number of columns in a columnar text file

I am trying to interpret data from an eye tracking device. The files exported from the eye tracker are in ASCII format.
Recording files that contain data from a single eye only look like this (only four rows shown):
6372825 645.3 275.4 1362.0 ...
6372826 644.6 274.0 1364.0 ...
6372827 644.2 273.2 1365.0 ...
6372828 642.5 272.7 1367.0 ...
Note that the dots at the end of each row above are a part of the output file, i.e. I haven't added them for the purposes of this question. I normally detect these dots and later throw them away.
The format of the above columns is [timestamp, X, Y, pupilSize, {...}]
A recording from both eyes looks like this (only four rows shown):
505076 416.8 755.4 1148.0 23.6 751.1 1239.0 .....
505077 417.0 758.4 1143.0 23.7 753.1 1244.0 .....
505078 416.7 761.4 1146.0 24.6 752.1 1249.0 .....
505079 416.1 764.8 1150.0 27.3 750.2 1250.0 .....
In this case, the data format is [timestamp, X(left), Y(left), pupilSize(left), X(right), Y(right), pupilSize(right), {.....}]
In both cases, I'd like to extract the numbers from the text and assign them to an array. Here's how I do this for recordings from a single eye:
eyeData = textscan(fid,'%d %f %f %f %s');
I can do the same for binocular recordings, using the following code:
eyeData = textscan(fid,'%d %f %f %f %f %f %f %s');
The trouble is, I'd like to be able to automatically detect whether the data I'm dealing with are monocular or binocular. In other words, I need a way of determining whether the ASCII file has five columns or eight. Note that the last column in both cases just consists of a series of dots. Whilst I typically just throw this away, it may be useful in determining the number of eyes in the recording (since monocular recordings end each row with ... and binocular with .....)
Any ideas as to how I might work out how many columns are in each ASCII file are welcome!
You can read the first data line, check the number of columns, and then restore the file position indicator. For example:
pos = ftell(fid);
cols = numel(regexp(fgetl(fid), '\s*([^\s]*)\s*'));
fseek(fid, pos, 'bof');
This can be followed by:
if (cols == 5)
eyeData = textscan(fid, '%d %f %f %f %s');
else
eyeData = textscan(fid, '%d %f %f %f %f %f %f %s');
end
By the way, note that you can tell textscan to discard the dots by using %*s instead of the last %s in the pattern string.
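The column count can also drive the format string directly, so that no if/else is needed. A small sketch on one of the binocular sample rows; in real use the line would come from fgetl(fid) as above, and sampleLine, cols and fmt are illustrative names.

```matlab
% One row from the binocular sample (in practice: sampleLine = fgetl(fid))
sampleLine = '505076 416.8 755.4 1148.0 23.6 751.1 1239.0 .....';

% strsplit with no delimiter argument splits on whitespace
cols = numel(strsplit(strtrim(sampleLine)));

% One %d for the timestamp, %f for every value column, %*s to drop the dots
fmt = ['%d' repmat(' %f', 1, cols - 2) ' %*s'];
disp(fmt)

% After restoring the file position: eyeData = textscan(fid, fmt);
```

For a monocular row the same code yields cols = 5 and the format '%d %f %f %f %*s'.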
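You can also count the columns in a file with a shell command, which you can call from MATLAB using
[status, out] = system(shell_command);
Here status is the exit code and out holds the text printed by the command (a call with a single output returns only the status). To build a shell_command that fits your needs, check out the following link:
unix - count of columns in file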

Matlab textscan gone wrong: cellfun to select data from certain lines

Hi I am using the following code to read some values from lines containing 'GPGGA' from data.txt
fid = fopen('D:\data.txt','r');
A=textscan(fid,'%s %*s %f %s %f %s %*s %*s %*s %*s %*s %*s %*s %*s %*s','Delimiter',',');
fclose(fid);
Loc = [A{[2, 4]}];
row_idxs = cellfun( @(s) strcmp(s, '$GPGGA'), A{1});
Loc = Loc(row_idxs, :);
display(Loc);
The code works perfectly if the last line in data.txt is deleted. Not sure why it throws this error when the last line is included in the text file. What is the reason? I'm confused!
"??? Error using ==> horzcat
CAT arguments dimensions are not consistent.
Error in ==> test at 4
Loc = [A{[2, 4]}];"
data.txt
$GPGSV,4,1,16,05,15,046,23,29,47,071,21,16,31,291,18,31,39,202,18*73
$GPGSV,4,1,16,05,15,046,23,29,47,071,21,16,31,291,18,31,39,202,18*73
$GPGSV,4,1,16,05,15,046,23,29,47,071,21,16,31,291,18,31,39,202,18*73
$GPGSV,4,1,16,05,15,046,23,29,47,071,21,16,31,291,18,31,39,202,18*73
$GPGSV,4,2,16,23,13,298,17,25,15,119,17,06,22,247,16,03,04,251,14*75
$GPGSV,4,2,16,23,13,298,17,25,15,119,17,06,22,247,16,03,04,251,14*75
$GPGSV,4,2,16,23,13,298,17,25,15,119,17,06,22,247,16,03,04,251,14*75
$GPGSV,4,2,16,23,13,298,17,25,15,119,17,06,22,247,16,03,04,251,14*75
$GPGGA,1.8,98.90,S,18.0014,E,1,04,1.0,87.8,M,48.0,M,,*76
$GPGGA,1.3,98.91,S,18.0015,E,1,04,1.0,100.7,M,48.0,M,,*40
$GPGGA,1.3,98.92,S,18.0016,E,1,04,1.0,105.4,M,48.0,M,,*4F
$GPGGA,1.8,98.93,S,18.0017,E,1,04,1.0,87.8,M,48.0,M,,*76
$GPGGA,1.8,98.94,S,18.0018,E,1,04,1.0,87.8,M,48.0,M,,*76
$GPGSV,4,4,16,27,,,,26,,,,24,,,,22,,,*79
Your format string does not match the data. It describes only 15 columns, while the $GPGSV lines in the sample data you've posted have 20 columns. I suggest using the following code (which runs without error on my machine) instead:
fid = fopen('D:\data.txt','r');
A=textscan(fid,'%s %*s %f %s %f %s %*[^\n]', 'Delimiter',',');
fclose(fid);
Loc = [A{[2, 4]}];
row_idxs = cellfun( @(s) strcmp(s, '$GPGGA'), A{1});
Loc = Loc(row_idxs, :);
display(Loc);
Note the construct %*[^\n] in my format string. This tells textscan to ignore all columns from this point onwards. It is much neater than writing out lots of %*s over and over. Also, it means you're less likely to miscount the number of columns when building the format string :-)
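Two small additions that may help when debugging this kind of error. First, [A{[2, 4]}] only works when the selected columns have the same number of rows, so comparing the column lengths shows at once whether the format string drifted out of step with the file. Second, strcmp accepts a cell array directly, so the cellfun wrapper is optional. The cell array below is hand-made stand-in data with the same layout as the textscan output, not the result of reading data.txt.

```matlab
% Stand-in for the textscan output: tag, latitude, hemisphere, longitude
tags = {'$GPGSV'; '$GPGGA'; '$GPGGA'; '$GPGSV'};
lat  = [1; 98.90; 98.91; 4];
hemi = {'16'; 'S'; 'S'; '16'};
lon  = [5; 18.0014; 18.0015; 27];
A = {tags, lat, hemi, lon};

% Every column must have the same length before horizontal concatenation
lens = cellfun(@numel, A);
if any(lens ~= lens(1))
    error('Column lengths differ: %s', mat2str(lens));
end

Loc = [A{[2, 4]}];                    % numeric latitude and longitude columns
row_idxs = strcmp(A{1}, '$GPGGA');    % strcmp works on the whole cell array
Loc = Loc(row_idxs, :);
disp(Loc)
```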