Matlab - Select rows given a condition

I have a cell variable (Size:2639516x12, Bytes:3863876744, Class:cell) and I want to select rows based on the value in the first column. For instance, if I have
A:
1997 FD 89
1997 GD 65
1999 FDK 87
2010 UY 123
I would like to get
B:
1997 FD 89
1997 GD 65
To get to cell A I use the following code:
% Transfer csv file to matlab
Data_file = fopen('Data.csv');
Data = textscan(Data_file,'%q %q %q %q %f %f %f %f %s %f %f %f %s %f %s %f %s %f %f %f %s','delimiter',',','headerlines', 1);
fclose(Data_file);
% Extract column 5 and wrap numeric columns 6 and 7 in cells
F_5=Data{:,5};
F_6=num2cell(Data{:,6});
F_7=num2cell(Data{:,7});
% Keep the first 4 digits of each number in F_5
F_5A=max(0,fix(log10(F_5)+1)-4);
F_5B=fix(F_5./10.^F_5A);
% Wrap the result in cells
F_5C = num2cell(F_5B);
%Create new cell A w/ variables I want
A=[F_5C Data{:,1} Data{:,2} Data{:,3} Data{:,4} F_6 F_7];
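The digit-truncation step above can be checked on a few sample values. This is only a sketch: the input numbers are made up for illustration, and it assumes the values are positive and not exact powers of ten.

```matlab
% Hypothetical sample values standing in for column 5 of the CSV file
F_5 = [19970315; 1999; 201012];

% Number of digits beyond the first four (0 if there are four or fewer)
nExtra = max(0, fix(log10(F_5) + 1) - 4);

% Drop the extra digits by dividing and truncating
first4 = fix(F_5 ./ 10.^nExtra);
```

Here first4 holds the leading four digits of each value, which is what later ends up in the first column of A.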

Using logical indexing
B = A(cell2mat(A(:,1))==1997,:);
Thanks to excaza for pointing out that the values may not be exactly rounded.
If the year values are not exact integers (i.e. some cells hold values such as 1996.999999 or 1997.0001), compare against a tolerance instead:
e = 0.001; % some small tolerance
B = A(abs(cell2mat(A(:,1))-1997)<e,:);
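As a quick check, the tolerance-based selection can be tried on a small stand-in for A. The four rows below are taken from the question, with one year deliberately perturbed; the tolerance value is an assumption you would adjust to your data.

```matlab
% Small stand-in for the large cell array A (second year is slightly off)
A = {1997,      'FD',  89;
     1997.0001, 'GD',  65;
     1999,      'FDK', 87;
     2010,      'UY',  123};

years = cell2mat(A(:,1));         % numeric column vector of years
tol   = 0.001;                    % assumed tolerance
mask  = abs(years - 1997) < tol;  % logical vector, one entry per row
B     = A(mask, :);               % keep matching rows, all columns
```

Building the mask once and reusing it is also convenient if you need to count the matches (nnz(mask)) before copying rows out of a very large cell array.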

You can use the following code to extract specific rows based on the first column of a.
b=a(~cellfun('isempty',(cellfun(@(x) find(x==1997),a(:,1),'UniformOutput',false))),:);
Here is how it works:
a =
[1997] 'FD' [ 89]
[1997] 'GD' [ 65]
[1999] 'FDK' [ 87]
[2010] 'UY' [123]
b=a(~cellfun('isempty',(cellfun(@(x) find(x==1997),a(:,1),'UniformOutput',false))),:);
b =
[1997] 'FD' [89]
[1997] 'GD' [65]
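A somewhat shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: let cellfun return the logical mask directly instead of going through find and isempty. It assumes every cell in the first column holds a numeric scalar.

```matlab
a = {1997, 'FD',  89;
     1997, 'GD',  65;
     1999, 'FDK', 87;
     2010, 'UY',  123};

% cellfun returns a logical column vector because each call yields a scalar
mask = cellfun(@(x) isequal(x, 1997), a(:,1));
b = a(mask, :);
```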

Related

MATLAB Unwanted conversion from double to int32 after indexing

I have this code to load data from text file, four columns - string, 3 integer columns and decimal number.
fileID = fopen(Files(j,1).name);
formatSpec = '%*s %d %d %d %f';
data = zeros(0, 4, 'double');
k = 0;
while ~feof(fileID)
k = k+1;
temp = textscan(fileID,formatSpec,10,'HeaderLines',1);
data = [data; [ temp{:} ] ] ;
end
fclose(fileID);
In the temp variable, the last column is stored as a decimal number; however, the command
data = [data; [ temp{:} ] ]
rounds it, and I get 0 or 1 in the last column.
Why is that?
temp looks like (1x4 cell):
[5248000;5248100;....] [5248100;5248200;....]
[111;95;....] [0,600000000000000;0,570000000000000;....]
data then looks like (1x4 matrix):
5248000 5248100 111 1
5248100 5248200 95 1
EDIT:
Retesting (recreation of the same variable, just copied numbers from variable editor)
temp=cell(1,4);
temp{1}=[0;100;200;300;400;500;600;700;800;900];
temp{2}=[100;200;300;400;500;600;700;800;900;1000];
temp{3}=[143;155;150;128;121;122;137;145;145;126];
temp{4}=[0.340000000000000;0.450000000000000;0.370000000000000;...
0.570000000000000;0.570000000000000;0.580000000000000;...
0.590000000000000;0.500000000000000;0.460000000000000;0.480000000000000];
tempx=[temp{:}]
This works correctly! The last column is decimal.
But why doesn't it work on "real" data from the textscan function?
Consider reading everything as floating point values:
change
formatSpec = '%*s %d %d %d %f';
to
formatSpec = '%*s %f %f %f %f';
It works in your EDIT because your variables are already of type double.
Concatenating double and int data casts to the int type. For example:
>> [pi; int16(5)]
ans =
2×1 int16 column vector
3
5
To avoid that, cast to double before concatenating. In your case, use something like the following to convert each cell's contents to double (thanks to @CrisLuengo for a correction):
[ data; cell2mat(cellfun(@double, temp, 'UniformOutput', false)) ]
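The effect can be reproduced without a file. The sketch below uses two values from the question; the variable names are made up for illustration.

```matlab
ints = int32([5248000; 5248100]);  % what %d produces in textscan
vals = [0.60; 0.57];               % what %f produces

mixed = [ints, vals];              % result is int32, so vals are rounded
fixed = [double(ints), vals];      % cast first, result stays double

disp(class(mixed));                % int32
disp(class(fixed));                % double
```

In mixed, the last column holds 1 and 1, which matches the rounded output shown in the question.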

matlab reading mixed data from file

I am pretty new to matlab. I've been reading the documentation but can't figure out why matlab does not correctly read the string from the file. What I am trying to do is read mixed data types from a file. Some sample data:
t a e incl lasc aper meanan truean rupnode rdnnode name
0.000000 1.2712052487 0.8899021688 22.2458 265.2511471042 322.1539251184 -13.6281352271 -130.986 0.155342 0.889756 phaet_000018
0.000000 1.2712052478 0.8899021575 22.2458 265.2511428392 322.1539270642 -13.6281369694 -130.986 0.155342 0.889756 phaet_000044
0.000000 1.2712052496 0.8899021868 22.2458 265.2511587897 322.1539149438 -13.6281365049 -130.986 0.155342 0.889755 phaet_000006
The first line is header. So here is what I've done so far:
fid = fopen('data.dat');
header = fgetl(fid); % I read the header
Now I read the data:
data = fscanf(fid,'%f %f %f %f %f %f %f %f %f %f %s',[11 inf]);
data1 = data';
fclose(fid);
I can now access the first element as:
data1(1,1)
However, when I do:
data1(1,11)
instead of phaet_000018 I get a number (112). Any idea what I am doing wrong?
There are a few issues with your code.
First, your sizeA input to fscanf is backwards. sizeA with a vector input is defined as:
Read at most m*n numeric values or character fields. n can be Inf, but m cannot. The output, A, is m-by-n, filled in column order.
So you've asked fscanf to give you 11 rows and whatever number of columns. You can't have an Inf row specification so you'll want to remove the third input entirely and reshape your data afterwards.
For example:
fid = fopen('data.dat');
header = fgetl(fid);
data = fscanf(fid,'%f %f %f %f %f %f %f %f %f %f %s');
fclose(fid);
% We just happen to know this explicitly, not knowledge to generally assume
ncols = 22;
% Reshape and transpose
data = reshape(data, ncols, []).';
Gives us a 3 x 22 data array, which is kinda sorta what we want.
So where are the extra columns coming from? For %s fields, fscanf reads the string until it encounters whitespace. Because the output of fscanf is a numeric array it must convert this string into a numeric value, so it converts each character to its numeric equivalent (double(letter)) and outputs that into the matrix.
Using our above data matrix as an example, we have:
>> char(data(1, 11:end))
ans =
phaet_000018
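The same behaviour can be seen on a single string with sscanf, which follows the same conversion rules as fscanf. This is a minimal sketch; the input text is a shortened version of one line from the question.

```matlab
txt  = '0.155342 0.889756 phaet_000018';

% Two numbers, then one numeric value per character of the %s field
vals = sscanf(txt, '%f %f %s');

n    = numel(vals);            % 2 numbers + 12 character codes
name = char(vals(3:end).');    % convert the codes back to text
```

vals(3) is 112, the character code of 'p', which is the number the question reports.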
With this in mind, your initial code only happens to work because all of your strings are the same length. If we change the length of one or more of the strings, this data import will fail:
Error using reshape
Product of known dimensions, 22, not divisible into total number of elements, 65.
Error in testcode (line 11)
data = reshape(data, ncols, []).';
So what can we do instead? If you need this string from your data I would recommend trying textscan:
fid = fopen('data.dat');
header = fgetl(fid);
data = textscan(fid, '%f %f %f %f %f %f %f %f %f %f %s');
fclose(fid);
This will read your data into a 1x11 cell array, where each column corresponds to a column in your data:
>> data{1} % t
ans =
0
0
0
To collect your numeric data you can iterate through the cell array, or you can use the 'CollectOutput' flag in textscan:
fid = fopen('data.dat');
header = fgetl(fid);
data = textscan(fid, '%f %f %f %f %f %f %f %f %f %f %s', 'CollectOutput', true);
fclose(fid);
Which will output a 1x2 cell array, where data{1} is your numeric array and data{2} is a cell array containing your strings:
>> data{1} % Numeric data
ans =
0 1.2712 0.8899 22.2458 265.2511 322.1539 -13.6281 -130.9860 0.1553 0.8898
0 1.2712 0.8899 22.2458 265.2511 322.1539 -13.6281 -130.9860 0.1553 0.8898
0 1.2712 0.8899 22.2458 265.2512 322.1539 -13.6281 -130.9860 0.1553 0.8898
>> data{2} % Strings
ans =
3×1 cell array
'phaet_000018'
'phaet_000044'
'phaet_000006'
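textscan also accepts a character vector instead of a file identifier, which makes it easy to try 'CollectOutput' without a file. The sketch below uses a shortened, partly made-up two-column version of the data, with names of different lengths to show that this no longer matters.

```matlab
% Two lines of sample text; the second name is intentionally shorter
txt = sprintf('%s\n%s\n', ...
    '0.155342 0.889756 phaet_000018', ...
    '0.155342 0.889755 phaet_6');

C = textscan(txt, '%f %f %s', 'CollectOutput', true);

nums  = C{1};   % 2-by-2 double matrix (the two %f columns combined)
names = C{2};   % 2-by-1 cell array of character vectors
```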

How to read specific words and numbers from a txt file and save them in a matrix

I am working on object detection and want to generate a ground truth .mat file from 7481 text files. The contents of these files all have this format:
car 0.00 0 -1.82 804.97 167.34 995.43 327.94 1.63 1.48 2.37 3.23 1.59 8.55 -1.47
misc 0.00 5 2.35 254.24 -2 305.25 7.6 4.58 5.35 2.35 1.35 2.35 3.36 1.56
bicycle 0.00 1 2 3 1 2.3 4.25 3.1 2 1 2.4 1.25 46.5 1.54
don't know 0.00 2.21 5.32 1.23 5.25 9.46 4.35 1.25 5 1 3 2 4 1.54
i.e., every text file has several rows (the number of rows differs between files), and in each row the first term is the type (car/misc/people/van/don't know....), followed by 14 double numbers separated by spaces. I want to do the following:
1. Check whether the type is car/van/misc/tram.
2. If the type is one of them, pick the 4th, 5th, 6th, 7th and 14th of the following 14 numbers and save them in a matrix.
3. Repeat 1 and 2 for all the text files in the folder, then generate a mat file containing the ground truth information.
My code so far:
clc;
clear all;
DetDir = '/scratch/yangj/project/car_dataset/training/label/';
F = dir([DetDir,'/*.txt']);
for frameNum = 1:7481
detFile = [DetDir,F(frameNum).name];
fid = fopen(detFile);
while 1
tline = fgetl(fid);
if ~ischar(tline), break, end
str = tline;
end
fclose (fid);
end
I think I should do the type checking and number picking in the while loop, but I have no idea how to write the code to achieve this.
Could you help me with that?
If your delimiter is a space, the don't know entries are quite annoying, because they split into two fields. I would suggest fixing that first, for instance with this nice (Perl) function replaceinfile, which can change don't know into, for instance, don't_know.
If that is fixed, the following should work:
N = numel(F);
C = cell(N,1);
for idx = 1:N
% get the data
fid = fopen([DetDir F(idx).name]);
data = textscan(fid,'%s %f %f %f %f %f %f %f %f %f %f %f %f %f %f');
fclose(fid);
% combine all numeric data
M = horzcat(data{2:end});
% check for a string match
b = cellfun(@(type) strcmp(data{1}, type), {'car','van','misc','tram'}, 'uni', 0);
% keep only the interesting part of the numeric data
C{idx} = M(any(horzcat(b{:}),2),[4 5 6 7 14]);
end
% combine and save
gt = vertcat(C{:});
save('gt.mat', 'gt');
If you do not change the don't know statements in the files, the code will actually still run, but (in general) not result in the desired gt matrix.
To answer your question about adding additional stuff:
After constructing M, simply add:
M(:,end+1) = M(:,6)-M(:,4); % this becomes the 15-th value
Including the filenumber is done by changing C{idx} = M(any(horzcat(b{:}),2),[4 5 6 7 14]); into
fnr = (idx-1) * ones(sum(sum(horzcat(b{:}),2)),1);
C{idx} = [fnr M(any(horzcat(b{:}),2),[4 5 7 14 15])];
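The string match and column selection can also be written with ismember, which returns the row mask in one call. This is an alternative sketch, not the code above; the type list and the numeric matrix are made-up stand-ins for data{1} and M.

```matlab
% Hypothetical stand-ins for data{1} and M from one label file
types = {'car'; 'misc'; 'bicycle'; 'van'};
M = [(1:14); (1:14) + 100; (1:14) + 200; (1:14) + 300];

% Logical column vector: true where the type is one of the wanted classes
keep = ismember(types, {'car', 'van', 'misc', 'tram'});

% Keep only those rows and the 4th, 5th, 6th, 7th and 14th numbers
gt = M(keep, [4 5 6 7 14]);
```

Because keep is already a single logical vector, nnz(keep) gives the number of kept rows directly if you need it for the file-number column.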

textscan Unexpected Empty Cell with valid Format string

I am reading a tab-delimited file. Five representative lines of this file are:
Date Time Property Path 1 Path 2 Path 3 Path 4 Path 5 Path 6 Path 7 Path 8
Lev 1 Lev 1 Lev 1 Lev 1 Lev 1 Lev 1 Lev 1 Lev 1
1/1 00:00:00 F1 (sm³/s) -1.3405E-003 -1.1170E-002 -1.0123E-004 9.7769E-003 -8.4673E-004 1.1710E-003 2.6890E-004 2.2413E-003
1/1 01:00:00 F1 (sm³/s) 1.9988E-004 1.6655E-003 2.2252E-004 1.6883E-003 1.8612E-003 2.0221E-004 2.0795E-004 1.7333E-003
1/1 02:00:00 F1 (sm³/s) -4.0722E-004 -3.3931E-003 -4.4324E-004 -2.1177E-003 -3.7075E-003 -2.5364E-004 -3.7330E-004 -3.1115E-003
When I use the following format string I get the expected results:
test = '1/1 00:00:00 F1 (sm³/s) -1.3405E-003 -1.1170E-002 -1.0123E-004 9.7769E-003 -8.4673E-004 1.1710E-003 2.6890E-004 2.2413E-003';
textscan(test, '%*s %*s %*s %*s %f %f %f %f %f %f %f %f')
Gives me:
ans =
[-0.0013] [-0.0112] [-1.0123e-04] [0.0098] [-8.4673e-04] [0.0012] [2.6890e-04] [0.0022]
Which is what I want, but when I attempt:
C = textscan(fid,...
'%*s %*s %*s %*s %f %f %f %f %f %f %f %f',...
'CollectOutput', false,...
'Headerlines', 2);
I get a 1x8 cell of empty cells.
What is the error in the format string translation?
I don't think there's anything wrong with your format string specifically.
Try pulling in the lines individually with fgetl or similar and just check that there's nothing you weren't expecting in the file. For example - your code seems to work for me but I can replicate your error by putting an additional blank line at the start of the file, which causes textscan to try and read the second header line as a line of data (and fail inelegantly). That particular error can be removed by increasing the value of HeaderLines.
fid = fopen('test.txt');
fgetl(fid) % repeat until you see your first line of data
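Instead of calling fgetl by hand, the header lines can be counted in a loop. The sketch below writes a small temporary file with an extra blank first line and assumes that data lines start with a date such as 1/1; both the file contents and that pattern are assumptions for illustration.

```matlab
% Create a small test file: blank line, two header lines, one data line
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '\nDate\tTime\n\tLev 1\n1/1\t00:00:00\n');
fclose(fid);

% Count lines until the first one that looks like data
fid = fopen(fname, 'r');
nHeader = 0;
tline = fgetl(fid);
while ischar(tline) && isempty(regexp(tline, '^\d+/\d+', 'once'))
    nHeader = nHeader + 1;
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

The resulting nHeader (3 for this test file) is the value to pass as 'HeaderLines' to textscan.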
I tried your code and it works!
file=('d.txt');
fid=fopen(file);
C = textscan(fid,...
'%*s %*s %*s %*s %f %f %f %f %f %f %f %f',...
'CollectOutput', false,...
'Headerlines', 2);
Output:
celldisp(C)
C{1} =
-0.0013405
0.00019988
-0.00040722
C{2} =
-0.01117
0.0016655
-0.0033931
C{3} =
-0.00010123
0.00022252
-0.00044324
C{4} =
0.0097769
0.0016883
-0.0021177
C{5} =
-0.00084673
0.0018612
-0.0037075
C{6} =
0.001171
0.00020221
-0.00025364
C{7} =
0.0002689
0.00020795
-0.0003733
C{8} =
0.0022413
0.0017333
-0.0031115
I came across a problem where my textscan would only return empty cell arrays, and a Google search led me here. I solved it by calling fgetl(fid) a couple of times and then frewind(fid) (fid being the file identifier returned by fopen); something about reading the lines first made it easier to bring in the values.

Read part of a text-based file

I have a text-based file with a .ptx suffix. It contains point cloud information; please see the following example:
100
50
0.352 -5.207 -0.823 0.238 61 61 61
0.345 -5.202 -0.824 0.234 60 60 60
...
Question:
How can I load the file from the third row (ignoring the first two rows) and save it as a matrix?
I would recommend using textscan.
Something like:
fid = fopen('sample.ptx'); % textscan needs a file identifier, not a file name
in = textscan(fid,'%f %f %f %f %f %f %f','HeaderLines',2); fclose(fid);
You can specify a number of header lines to skip using 'HeaderLines'. The %f refers to the format of the input data. Hope that helps.
Here is a full example of how to apply textscan and transform the result into a matrix:
fid = fopen('textscantest.txt','r');
assert(fid~=-1); % verify file is opened (fopen returns -1 on failure)
C = textscan(fid,'%f %f %f %f %f %f %f','HeaderLines',2);
fclose(fid);
M = [C{:}]
Note that since you want everything in one matrix, every column must have the same data type, so %f is used for all columns.
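To try this end to end without an existing .ptx file, the sketch below first writes the sample from the question to a temporary file and then reads it back. The temporary file name is arbitrary and the two data rows are the ones shown in the question.

```matlab
% Write the sample from the question to a temporary file
fname = [tempname '.ptx'];
rows  = [0.352 -5.207 -0.823 0.238 61 61 61;
         0.345 -5.202 -0.824 0.234 60 60 60];
fid = fopen(fname, 'w');
fprintf(fid, '100\n50\n');                                   % two header rows
fprintf(fid, '%.3f %.3f %.3f %.3f %d %d %d\n', rows.');      % one line per row
fclose(fid);

% Read it back, skipping the two header rows
fid = fopen(fname, 'r');
assert(fid ~= -1);                                           % file opened?
C = textscan(fid, '%f %f %f %f %f %f %f', 'HeaderLines', 2);
fclose(fid);
delete(fname);

M = [C{:}];   % 2-by-7 double matrix
```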