I have a text file with data structure like this
30,311.263671875,158.188034058,20.6887207031,17.4877929688,0.000297248129755,aeroplane
30,350.668334961,177.547393799,19.1939697266,18.3677368164,0.00026999923648,aeroplane
30,367.98135376,192.697219849,16.7747192383,23.0987548828,0.000186387239864,aeroplane
30,173.569274902,151.629364014,38.0069885254,37.5704650879,0.000172595537151,aeroplane
30,553.904602051,309.903320312,660.893981934,393.194030762,5.19620243722e-05,aeroplane
30,294.739196777,156.249740601,16.3522338867,19.8487548828,1.7795707663e-05,aeroplane
30,34.1946258545,63.4127349854,475.104492188,318.754821777,6.71026540999e-06,aeroplane
30,748.506652832,0.350944519043,59.9415283203,28.3256549835,3.52978979379e-06,aeroplane
30,498.747009277,14.3766479492,717.006652832,324.668731689,1.61551643174e-06,aeroplane
30,81.6389465332,498.784301758,430.23046875,210.294677734,4.16855394647e-07,aeroplane
30,251.932098389,216.641052246,19.8385009766,20.7131652832,3.52147743106,bicycle
30,237.536972046,226.656692505,24.0902862549,15.7586669922,1.8601918593,bicycle
30,529.673400879,322.511322021,25.1921386719,21.6920166016,0.751171214506,bicycle
30,255.900146484,196.583847046,17.1589355469,27.4430847168,0.268321367912,bicycle
30,177.663650513,114.458488464,18.7516174316,16.6759414673,0.233057001606,bicycle
30,436.679382324,273.383331299,17.4342041016,19.6081542969,0.128449092153,bicycle
I want to index those file with a label file.and the result will be something like this.
60,509.277435303,284.482452393,26.1684875488,31.7470092773,0.00807665128377,15
60,187.909835815,170.448471069,40.0388793945,58.8763122559,0.00763951029512,15
60,254.447280884,175.946624756,18.7212677002,21.9440612793,0.00442053096776,15
However there might be some class that is not in label class and I need to filter those line out so I can use load() to load in.(you can't have char inside that text file and execute load().
here is my implement:
function test(vName,meta)
f_dt = fopen([vName '.txt'],'r');
f_indexed = fopen([vName '_indexed.txt'], 'w');
lbls = loadlbl()
count = 1;
while(true),
if(f_dt == -1),
break;
end
dt = fgets(f_dt);
if(dt == -1),
break
else
dt_cls = strsplit(dt,','){7};
dt_cls = regexprep(dt_cls, '\s+', '');
cls_idx = find(strcmp(lbls,dt_cls));
if(~isempty(cls_idx))
dt = strrep(dt,dt_cls,int2str(cls_idx));
fprintf(f_indexed,dt);
end
end
end
fclose(f_indexed);
if(f_dt ~= -1),
fclose(f_dt);
end
end
However it work very very slow because the text file contains 100 thousand of lines. Is it anyway that I could do this task smarter and faster?
You may use textscan, and get the indices/ line numbers of the labels you want. After knowing the line numbers, you can extract what you want.
fid = fopen('data.txt') ;
S = textscan(fid,'%s','delimiter','\n') ;
S = S{1} ;
fclose(fid) ;
%% get bicycle lines
idx = strfind(S, 'bicycle');
idx = find(not(cellfun('isempty', idx)));
S_bicycle = S(idx)
%% write this to text file
fid2 = fopen('text.txt','wt') ;
fprintf(fid2,'%s\n',S_bicycle{:});
fclose(fid2) ;
From S_bicycle, you can extract your numbers.
function []= read_c3d_feat(output_list_relative)
dir_list = importdata(output_list_relative);
dim_feat = 512;
for i = 1 : size(dir_list, 1)
dir_str = char(dir_list(i));
feat_files = dir([dir_str, '/*.res5b']);
num_feat = length(feat_files);
feat = zeros(num_feat, dim_feat);
for j = 1 : num_feat
feat_path = strcat(dir_str, '/', feat_files(j).name);
...............
....................so on
Give me error like
Error using dir
Invalid path. The path must not contain a null character.
Error in read_c3d_feat (line 12)
feat_files = dir([dir_str, '/*.res5b']);
Your dir_list variable must have strings which contain null characters, as the error tells you. If you try using hard-coded strings you will see it works:
function read_c3d_feat(output_list_relative)
dir_list = {'21';'45';'18'};
for i = 1:size(dir_list, 1)
dir_str = dir_list{i}; % Loops through '21','45','18'
% The dir function now works because we know dir_str is a valid string
feat_files = dir([dir_str, '/*.res5b']);
end
end
This means you need to debug your code and find out what this line is actually assigning to dir_list:
dir_list = importdata(output_list_relative);
Note that if dir_list is a cell of text entries, you should be indexing it with curly braces as above. If instead it is a matrix (because all of the entries seem to be numerical anyway) then you should be using num2str when passing to dir:
function read_c3d_feat(output_list_relative)
dir_list = importdata(output_list_relative);
dim_feat = 512;
for i = 1:size(dir_list, 1)
feat_files = dir([num2str(dir_list(i)), '/*.res5b']);
% ...
How do I get to read my file with increment .htm file with correct file format and path?
path:DATA\WEBPAGE_SOURCE\train75_phish_data\1.htm
file:1.htm,2.htm,3.htm....etc
Inside 1.htm,2.htm,3.htm....etc are the soucre code of webpage
I do try with the following example, but got the error when i=21.
data2=fopen(strcat('DATA\WEBPAGE_SOURCE\train75_phish_data\',int2str(i),'.htm'),'r')
I have refer to this, still cannot work, any ideas?
http://www.mathworks.com/help/matlab/ref/fopen.html
Here is my code:
data = importdata('DATA/URL/trainURL')
domain_URL = regexp(data,'\w*://[^/]*','match','once')
[sizeData b] = size(domain_URL);
for i = 1:150
A7_data = domain_URL{i};
data2=fopen(strcat('DATA\WEBPAGE_SOURCE\train75_phish_data\',int2str(i),'.htm'),'r')
CharData = fread(data2, '*char')'; %read text file and store data in CharData
img_only = regexp(CharData, '<img.*?>', 'match');
feature7_data=(cellfun(#(n) isempty(n), strfind(img_only, A7_data)))
B7(i)=sum(feature7_data)
end
feature7(B7>=10)=1;
feature7(B7<10&B7>5)=0;
feature7(B7<=5)=-1;
feature7'
Here is my output:
data = importdata('DATA/URL/trainURL') is a list of URL being saved inside
I could not loop the results for i=20, it will come to error when iteration=21, I want to loop until 150, it cnt read the 'data2' for 'i=21'
I think you need to handle possible exceptions that can come in a more principled way. Try this:
data = importdata('DATA/URL/trainURL')
domain_URL = regexp(data,'\w*://[^/]*','match','once')
[sizeData b] = size(domain_URL);
for i = 1:150
A7_data = domain_URL{i};
filename = fullfile('DATA\WEBPAGE_SOURCE\train75_phish_data\',strcat(int2str(i),'.htm'));
if (exist(filename,'file')),
disp(sprintf('file %s exists, processing it',filename));
data2=fopen(filename,'r');
CharData = fread(data2, '*char')'; %read text file and store data in CharData
fclose(data2);
img_only = regexp(CharData, '<img.*?>', 'match');
feature7_data=(cellfun(#(n) isempty(n), strfind(img_only, A7_data)))
B7(i)=sum(feature7_data)
else,
disp(sprintf('file %s does not exist, skipping it!',filename));
end
end
feature7(B7>=10)=1;
feature7(B7<10&B7>5)=0;
feature7(B7<=5)=-1;
feature7'
after the line that does the fread.
I've got a folder which contains subfolders with text files. I want to read those file with the same order as they are in the subfolders. I've got a problem with that. I use the following matlab code:
outNames = {};
k=1;
feature = zeros(619,85);
fileN = cell(619,1);
for i=1:length(nameFolds)
dirList = dir(strcat(path, num2str(cell2mat(nameFolds(i,1)))));
names = {dirList.name};
outNames = {};
for j=1:numel(names)
name = names{j};
if ~isequal(name,'.') && ~isequal(name,'..')
[~,name] = fileparts(names{j});
outNames{end+1} = name;
fileName = strcat(path, num2str(cell2mat(nameFolds(i,1))), '\', name, '.descr' );
feature(k,:) = textread(fileName);
fileN{k} = [fileName num2str(k)];
k= k+1;
end
end
end
In one subfolder I've got the above text file names:
AnimalPrint_tiger_test_01.descr
AnimalPrint_tiger_test_02.descr
AnimalPrint_tiger_test_03.descr
AnimalPrint_tiger_test_04.descr
AnimalPrint_tiger_test_05.descr
AnimalPrint_tiger_test_06.descr
AnimalPrint_tiger_test_07.descr
AnimalPrint_tiger_test_08.descr
AnimalPrint_tiger_test_09.descr
AnimalPrint_tiger_test_10.descr
AnimalPrint_tiger_test_11.descr
AnimalPrint_tiger_test_12.descr
AnimalPrint_tiger_test_13.descr
AnimalPrint_tiger_test_14.descr
AnimalPrint_tiger_test_15.descr
AnimalPrint_zebra_test_1.descr
AnimalPrint_zebra_test_2.descr
AnimalPrint_zebra_test_3.descr
AnimalPrint_zebra_test_4.descr
AnimalPrint_zebra_test_5.descr
AnimalPrint_zebra_test_12.descr
But it seems that it reads first the AnimalPrint_zebra_test_12.descr and after the AnimalPrint_zebra_test_1.descr and the rest. Any idea why this happens?
dir sorts the files according to their names, for instance
test_1
test_12 % 1 followed by 2
test_2
test_3
You may want to build your own order with ['test_' num2str(variable) '.descr'] that concatenates test_ with an incrementing variable.
I am looking for MATLAB code that does some routine (updates a file.m), if file.csv is edited more recently than file.m.
Something that should look like:
% Write time extraction
tempC = GetFileTime('file.csv', [], 'Write');
tempdateC = tempC.date
tempM = GetFileTime('file.m', [], 'Write');
tempdateM = tempM.date
% Write time comparison
if numel(dir('file.m')) == 0 || tempdateC >= tempdateM
matDef = regexprep(fileread('file.csv'), '(\r\n|\r|\n)', ';\n');
f = fopen('file.m', 'w');
fwrite(f, ['Variable = [' matDef(1:end) '];']);
fclose(f);
end
The lines for timestamp extraction seem to be incorrect MATLAB code. The rest works (Evaluate variables in external file strings).
You can extract the modification time of a file using MATLAB's dir command. Something like:
function modTime = GetFileTime(fileName)
listing = dir(fileName);
% check we got a single entry corresponding to the file
assert(numel(listing) == 1, 'No such file: %s', fileName);
modTime = listing.datenum;
end
Note that the output is in MATLAB's datenum serial date format.