Read images from a csv file with Octave - matlab

I want to read the training.csv file with Octave for the Kaggle competition.
The file contains 16 fields. The first 15 are the coordinates of keypoints. The 16th is the image, which is 9216 numbers (0 to 255) separated by spaces.
I tried the following, without luck:
- data = csvread('training.csv');
- data = dlmread('training.csv', ',');
- [l1,l2,l3,l4,l5,l6,l7,l8,l9,l10,l11,l12,l13,l14,l15, image] = textread("training.csv", "%f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %s", "delimiter", ",", "endofline", "\n", "headerlines", 1);
Note:
The file also contains missing data.
The first 3 lines of the csv file are here: pastebin.com/pwBQgcfa

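This post helped greatly in figuring this out.
The key is to:
- Remove the header row.
- Replace each ",," (double comma, i.e. an empty field) with ",0,". Repeat until none remain, because a run such as ",,," needs two passes.
- Replace every remaining "," (single comma) with " " (space).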
The code to read the preprocessed file:
fn = 'training_space.txt';
M = dlmread(fn); % one row per line: the keypoint values followed by the 9216 pixels
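As a minimal sketch of an alternative that avoids the text-editing step, each row can be split on commas while keeping empty fields, so a missing keypoint becomes NaN instead of 0. The temporary file, its three keypoint columns (instead of fifteen), its column names and its four pixels (instead of 9216) are made-up stand-ins for training.csv.

```octave
% Hypothetical miniature of training.csv: 3 keypoint columns + an Image
% column of space-separated pixels; the second row has a missing keypoint.
fn = [tempname() '.csv'];
fid = fopen(fn, 'w');
fprintf(fid, 'x1,y1,x2,Image\n');
fprintf(fid, '66.0,39.0,30.2,238 236 237 238\n');
fprintf(fid, ',38.5,29.9,219 215 204 196\n');
fclose(fid);

% Split the file into lines, then drop the header and any trailing empty line.
rows = regexp(fileread(fn), '\r?\n', 'split');
rows = rows(2:end);
rows = rows(~cellfun(@isempty, rows));
delete(fn);

% Size the outputs from the first data row.
first = strsplit(rows{1}, ',', 'CollapseDelimiters', false);
nKey = numel(first) - 1;
nPix = numel(sscanf(first{end}, '%f'));
keypoints = NaN(numel(rows), nKey);
images = zeros(numel(rows), nPix);

for k = 1:numel(rows)
  % Keep empty fields so the column positions stay aligned.
  fields = strsplit(rows{k}, ',', 'CollapseDelimiters', false);
  keypoints(k, :) = str2double(fields(1:nKey));   % '' becomes NaN
  images(k, :) = sscanf(fields{end}, '%f').';     % space-separated pixels
end
```

Using NaN keeps a missing keypoint distinguishable from a genuine coordinate of 0; isnan can mask those entries later.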

Related

Read line delimited by comma and tab

I would like to read files containing numbers on each line. Here is an example of the format:
0,0,0 1 0 0 0
0.02,0.1,0.98 8.77 0.985292 0.112348 0.112348
0.04,0.2,1.96 8.77 0.985292 0.112348 0.224696
As shown above, the first three numbers on each line are separated by commas, and the remaining numbers are separated by tabs. As a result, I can't use dlmread or textscan directly. Is there any way to solve this?
Yes: you should add two name-value parameters to your textscan call:
- 'Delimiter': chooses the delimiter(s)
- 'MultipleDelimsAsOne': treats repeated delimiters as one
Option 1:
A small trick: you can specify more than one delimiter by passing a cell array as input, e.g. {',',' '}.
Result = textscan(fileID,'%f %f %f %f %f %f %f','Delimiter',{',',' '},'MultipleDelimsAsOne',1);
Option 2 (this should work):
This time I don't use MultipleDelimsAsOne, but I specify that the delimiter can be either a comma or a tab (\t).
Result = textscan(fileID,'%f %f %f %f %f %f %f','Delimiter',{',','\t'});
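If textscan's delimiter handling still gives trouble, a different technique (not the one in the answer above) is to turn the commas into whitespace and let sscanf read every number, since %f skips spaces, tabs and newlines alike. This sketch assumes every line holds exactly seven numbers with no missing values; the temporary file is made up from the question's sample.

```octave
% Made-up copy of the question's sample: 3 comma-separated values, then 4 tab-separated ones.
fn = [tempname() '.txt'];
fid = fopen(fn, 'w');
fprintf(fid, '0,0,0\t1\t0\t0\t0\n');
fprintf(fid, '0.02,0.1,0.98\t8.77\t0.985292\t0.112348\t0.112348\n');
fprintf(fid, '0.04,0.2,1.96\t8.77\t0.985292\t0.112348\t0.224696\n');
fclose(fid);

nCols = 7;                               % assumed number of values per line
txt = strrep(fileread(fn), ',', ' ');    % commas -> spaces
vals = sscanf(txt, '%f');                % %f skips spaces, tabs and newlines
M = reshape(vals, nCols, []).';          % one row per line of the file
delete(fn);
```

Because reshape fills column by column, the transpose at the end is what turns each group of seven values back into one row.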

Dimensions of matrices being concatenated are not consistent

I read a CSV file with textscan, and when I try to write the data to a file I get this error: "Error using horzcat. Dimensions of matrices being concatenated are not consistent."
If I change the first format specifier in textscan (%s) to %f, the error goes away.
The error occurs when MATLAB evaluates [datatest{1} probability], where:
probability is a 1000x1 double
datatest{1} is a 1000x1 cell
datatest=textscan(FileID,'%s %*f %f %f %*s %*s %*s %*s %*s %*s %*s %*s %*s %f %f %f %f %f %f %f %f %f %f',1000,'headerlines',1,'delimiter',',');
csvwrite('output.csv',[datatest{1} probability]);
Your variable datatest{1} is a cell array of 1000 cells, each containing a string (the strings may or may not have the same length).
In the statement [datatest{1} probability] you are trying to concatenate a cell array (of strings) with a numeric array of type double, and this does not work: the concatenation operator needs operands of compatible types.
Even if you created a cell array holding all your desired columns, e.g. myCellArray = {datatest{1} probability}, it would not help, because the result cannot be passed to csvwrite.
csvwrite, and its more flexible counterpart dlmwrite, do not accept cell arrays; you would have to convert the cell contents into numeric values. Since you want to write both strings and numbers, you need a low-level function such as fprintf.
In your case, to write the file you were expecting, you can use the following code.
col1 = datatest{1} ; %// extract the column of interest for easier indexing later on
fidw = fopen('output.csv','w') ; %// get a handle on a file to write (necessary with "fprintf")
for iline = 1:numel(probability) %// loop on each line
fprintf( fidw , '%s, %f\n' , col1{iline} , probability(iline) ) ; %// write the line
end
fclose(fidw) ; %// close the file - IMPORTANT - (necessary with "fprintf")
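The same file can also be written without an explicit loop by interleaving the two columns in a 2-by-N cell array and expanding it into fprintf's argument list; fprintf reuses the format until the arguments run out. A minimal sketch, where col1 and probability are small made-up stand-ins for datatest{1} and the question's probability vector, and the output goes to a temporary file:

```octave
col1 = {'id_001'; 'id_002'; 'id_003'};      % hypothetical string column
probability = [0.25; 0.5; 0.75];            % hypothetical numeric column

% Row 1 holds the strings, row 2 the numbers; out{:} walks column by column,
% so the arguments arrive as string, number, string, number, ...
out = [col1.'; num2cell(probability.')];

fn = [tempname() '.csv'];
fidw = fopen(fn, 'w');
assert(fidw ~= -1, 'could not open the output file');
fprintf(fidw, '%s, %f\n', out{:});
fclose(fidw);

written = fileread(fn);   % read back to inspect the result
delete(fn);
```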

textscan Unexpected Empty Cell with valid Format string

I am reading a tab-delimited file. Five representative lines of this file are:
Date Time Property Path 1 Path 2 Path 3 Path 4 Path 5 Path 6 Path 7 Path 8
Lev 1 Lev 1 Lev 1 Lev 1 Lev 1 Lev 1 Lev 1 Lev 1
1/1 00:00:00 F1 (sm³/s) -1.3405E-003 -1.1170E-002 -1.0123E-004 9.7769E-003 -8.4673E-004 1.1710E-003 2.6890E-004 2.2413E-003
1/1 01:00:00 F1 (sm³/s) 1.9988E-004 1.6655E-003 2.2252E-004 1.6883E-003 1.8612E-003 2.0221E-004 2.0795E-004 1.7333E-003
1/1 02:00:00 F1 (sm³/s) -4.0722E-004 -3.3931E-003 -4.4324E-004 -2.1177E-003 -3.7075E-003 -2.5364E-004 -3.7330E-004 -3.1115E-003
When I use the following format string I get the expected results:
test = '1/1 00:00:00 F1 (sm³/s) -1.3405E-003 -1.1170E-002 -1.0123E-004 9.7769E-003 -8.4673E-004 1.1710E-003 2.6890E-004 2.2413E-003';
textscan(test, '%*s %*s %*s %*s %f %f %f %f %f %f %f %f')
Gives me:
ans =
[-0.0013] [-0.0112] [-1.0123e-04] [0.0098] [-8.4673e-04] [0.0012] [2.6890e-04] [0.0022]
Which is what I want, but when I attempt:
C = textscan(fid,...
'%*s %*s %*s %*s %f %f %f %f %f %f %f %f',...
'CollectOutput', false,...
'Headerlines', 2);
I get a 1x8 cell of empty cells.
What is the error in the format string translation?
I don't think there's anything wrong with your format string specifically.
Try reading the lines individually with fgetl (or similar) and check that there's nothing unexpected in the file. For example, your code works for me, but I can replicate your error by putting an additional blank line at the start of the file, which causes textscan to read the second header line as data (and fail inelegantly). That particular error goes away if you increase the value of HeaderLines.
fid = fopen('test.txt');
fgetl(fid) % repeat until you see your first line of data
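The fgetl check can be automated. A minimal sketch, assuming (hypothetically) that every data line starts with a date such as 1/1: count the lines before the first one that matches, rewind with frewind, and pass that count as HeaderLines. The temporary file reproduces the stray blank line described above, shortened to two Path columns and using ASCII "sm3/s".

```octave
% Made-up reproduction: a blank first line, two header lines, two data lines.
fn = [tempname() '.txt'];
fid = fopen(fn, 'w');
fprintf(fid, '\n');
fprintf(fid, 'Date\tTime\tProperty\tPath 1\tPath 2\n');
fprintf(fid, '\t\t\tLev 1\tLev 1\n');
fprintf(fid, '1/1\t00:00:00\tF1 (sm3/s)\t-1.3405E-003\t-1.1170E-002\n');
fprintf(fid, '1/1\t01:00:00\tF1 (sm3/s)\t1.9988E-004\t1.6655E-003\n');
fclose(fid);

fid = fopen(fn, 'r');
nSkip = 0;
tline = fgetl(fid);
% Count blank and header lines until a line starts with "<digits>/<digits>".
while ischar(tline) && isempty(regexp(tline, '^\d+/\d+\s', 'once'))
  nSkip = nSkip + 1;
  tline = fgetl(fid);
end
frewind(fid);   % back to the start of the file
C = textscan(fid, '%*s %*s %*s %*s %f %f', 'HeaderLines', nSkip);
fclose(fid);
delete(fn);
```

Here nSkip ends up as 3 rather than the 2 used in the question, which is exactly the off-by-one the blank line causes.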
I tried your code and it works:
file = 'd.txt';
fid = fopen(file);
C = textscan(fid,...
'%*s %*s %*s %*s %f %f %f %f %f %f %f %f',...
'CollectOutput', false,...
'Headerlines', 2);
fclose(fid);
Output:
celldisp(C)
C{1} =
-0.0013405
0.00019988
-0.00040722
C{2} =
-0.01117
0.0016655
-0.0033931
C{3} =
-0.00010123
0.00022252
-0.00044324
C{4} =
0.0097769
0.0016883
-0.0021177
C{5} =
-0.00084673
0.0018612
-0.0037075
C{6} =
0.001171
0.00020221
-0.00025364
C{7} =
0.0002689
0.00020795
-0.0003733
C{8} =
0.0022413
0.0017333
-0.0031115
I ran into a problem where textscan returned only empty cell arrays, and a Google search led me here. I solved it by calling fgetl(fid) a couple of times and then frewind(fid) (where fid is the file identifier returned by fopen). I'm not sure why, but reading the lines first made textscan pick up the values.

Read part of a text-based file

I have a text-based file with a .ptx extension. It contains point cloud information; see the following example:
100
50
0.352 -5.207 -0.823 0.238 61 61 61
0.345 -5.202 -0.824 0.234 60 60 60
...
Question:
How can I load the file starting from the third row (ignoring the first two rows) and save it as a matrix?
I would recommend using textscan.
Something like:
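fid = fopen('sample.ptx');
in = textscan(fid,'%f %f %f %f %f %f %f','HeaderLines',2);
fclose(fid);
You can specify the number of header lines to skip with 'HeaderLines'. Each %f reads one floating-point column of the input. Note that textscan needs a file identifier from fopen here: if you pass it a char array such as 'sample.ptx', it scans that text itself instead of opening the file.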
Here is a full example of how to apply textscan and transform the result into a matrix:
fid = fopen('textscantest.txt','r');
assert(fid ~= -1); % verify the file was opened (fopen returns -1 on failure)
C = textscan(fid,'%f %f %f %f %f %f %f','HeaderLines',2);
fclose(fid);
M = [C{:}] % concatenate the seven column vectors into one matrix
Note that since everything goes into a single matrix, all columns need the same data type, which is why %f is used for every column.
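For a purely numeric file like this one, dlmread offers another route: its third and fourth arguments are zero-based row and column offsets, so a row offset of 2 starts reading at the third line. A sketch, assuming the values are separated by single spaces; the file written here is a made-up copy of the two sample rows.

```octave
% Made-up copy of the sample: two header rows, then the point rows.
fn = [tempname() '.ptx'];
fid = fopen(fn, 'w');
fprintf(fid, '100\n50\n');
fprintf(fid, '0.352 -5.207 -0.823 0.238 61 61 61\n');
fprintf(fid, '0.345 -5.202 -0.824 0.234 60 60 60\n');
fclose(fid);

M = dlmread(fn, ' ', 2, 0);   % skip rows 0 and 1, start at column 0
delete(fn);
```

If the real file separates values with runs of several spaces, the explicit ' ' separator would produce extra empty fields, and the textscan version above is the safer choice.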

Detect number of columns in a columnar text file

I am trying to interpret data from an eye tracking device. The files exported from the eye tracker are in ASCII format.
Recording files that contain data from a single eye only look like this (only four rows shown):
6372825 645.3 275.4 1362.0 ...
6372826 644.6 274.0 1364.0 ...
6372827 644.2 273.2 1365.0 ...
6372828 642.5 272.7 1367.0 ...
Note that the dots at the end of each row above are a part of the output file, i.e. I haven't added them for the purposes of this question. I normally detect these dots and later throw them away.
The format of the above columns is [timestamp, X, Y, pupilSize, {...}]
A recording from both eyes looks like this (only four rows shown):
505076 416.8 755.4 1148.0 23.6 751.1 1239.0 .....
505077 417.0 758.4 1143.0 23.7 753.1 1244.0 .....
505078 416.7 761.4 1146.0 24.6 752.1 1249.0 .....
505079 416.1 764.8 1150.0 27.3 750.2 1250.0 .....
In this case, the data format is [timestamp, X(left), Y(left), pupilSize(left), X(right), Y(right), pupilSize(right), {.....}]
In both cases, I'd like to extract the numbers from the text and assign them to an array. Here's how I do this for recordings from a single eye:
eyeData = textscan(fid,'%d %f %f %f %s');
I can do the same for binocular recordings, using the following code:
eyeData = textscan(fid,'%d %f %f %f %f %f %f %s');
The trouble is, I'd like to be able to automatically detect whether the data I'm dealing with are monocular or binocular. In other words, I need a way of determining whether the ASCII file has five columns or eight. Note that the last column in both cases just consists of a series of dots. Whilst I typically just throw this away, it may be useful in determining the number of eyes in the recording (since monocular recordings end each row with ... and binocular with .....)
Any ideas as to how I might work out how many columns are in each ASCII file are welcome!
You can read the first data line, count its columns and then restore the file position indicator. For example:
pos = ftell(fid);
cols = numel(regexp(fgetl(fid), '\s*([^\s]*)\s*'));
fseek(fid, pos, 'bof');
This can be followed by:
if (cols == 5)
eyeData = textscan(fid, '%d %f %f %f %s');
else
eyeData = textscan(fid, '%d %f %f %f %f %f %f %s');
end
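Putting the two pieces together, here is a minimal sketch that runs the detection on two made-up files built from the monocular and binocular rows in the question. It counts fields with regexp(..., '\S+', 'match'), a slightly simpler pattern than the one above, and uses %*s so the trailing dots are discarded:

```octave
% Made-up sample files built from the rows shown in the question.
samples = {{'6372825 645.3 275.4 1362.0 ...', ...
            '6372826 644.6 274.0 1364.0 ...'}, ...
           {'505076 416.8 755.4 1148.0 23.6 751.1 1239.0 .....', ...
            '505077 417.0 758.4 1143.0 23.7 753.1 1244.0 .....'}};
nCols = zeros(1, numel(samples));
eyeData = cell(1, numel(samples));

for k = 1:numel(samples)
  fn = [tempname() '.asc'];
  fid = fopen(fn, 'w');
  fprintf(fid, '%s\n', samples{k}{:});   % one sample row per line
  fclose(fid);

  fid = fopen(fn, 'r');
  pos = ftell(fid);                                       % remember the data start
  nCols(k) = numel(regexp(fgetl(fid), '\S+', 'match'));   % fields on the first line
  fseek(fid, pos, 'bof');                                 % go back to it
  if nCols(k) == 5
    eyeData{k} = textscan(fid, '%d %f %f %f %*s');            % one eye
  else
    eyeData{k} = textscan(fid, '%d %f %f %f %f %f %f %*s');   % both eyes
  end
  fclose(fid);
  delete(fn);
end
```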
By the way, note that you can tell textscan to discard the dots by using %*s instead of the last %s in the pattern string.
You can count the columns in a file with a shell command, which you can call from MATLAB using
[status, s] = system(shell_command);
The command's text output is the second return value; the first is its exit status.
To produce a shell_command that fits your needs, check out the following question:
unix - count of columns in file
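For example, on a Unix-like system one possible shell_command prints the field count of the first line with awk. This sketch assumes head and awk are on the PATH (it will not work on a stock Windows install), and the sample file is made up:

```octave
% Made-up one-line binocular sample.
fn = [tempname() '.asc'];
fid = fopen(fn, 'w');
fprintf(fid, '505076 416.8 755.4 1148.0 23.6 751.1 1239.0 .....\n');
fclose(fid);

% Builds: head -n 1 "<file>" | awk '{print NF}'
shell_command = sprintf('head -n 1 "%s" | awk ''{print NF}''', fn);
[status, out] = system(shell_command);   % status 0 means the command succeeded
nCols = str2double(strtrim(out));        % awk prints the count followed by a newline
delete(fn);
```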