How to load an .arff file into MATLAB

Is there a package for loading .arff files into MATLAB?
The .arff format is used by Weka for running machine learning algorithms.

Since Weka is a Java library, you can directly use the API it exposes to read ARFF files:
%## paths
WEKA_HOME = 'C:\Program Files\Weka-3-7';
javaaddpath([WEKA_HOME '\weka.jar']);
fName = [WEKA_HOME '\data\iris.arff'];
%## read file
loader = weka.core.converters.ArffLoader();
loader.setFile( java.io.File(fName) );
D = loader.getDataSet();
D.setClassIndex( D.numAttributes()-1 );
%## dataset
relationName = char(D.relationName);
numAttr = D.numAttributes;
numInst = D.numInstances;
%## attributes
%# attribute names
attributeNames = arrayfun(@(k) char(D.attribute(k).name), 0:numAttr-1, 'UniformOutput',false);
%# attribute types (Java indices are 0-based)
types = {'numeric' 'nominal' 'string' 'date' 'relational'};
attributeTypes = arrayfun(@(k) D.attribute(k-1).type, 1:numAttr);
attributeTypes = types(attributeTypes+1);
%# nominal attribute values
nominalValues = cell(numAttr,1);
for i=1:numAttr
    if strcmpi(attributeTypes{i},'nominal')
        nominalValues{i} = arrayfun(@(k) char(D.attribute(i-1).value(k-1)), 1:D.attribute(i-1).numValues, 'UniformOutput',false);
    end
end
%## instances (nominal values come back as 0-based indices)
data = zeros(numInst,numAttr);
for i=1:numAttr
    data(:,i) = D.attributeToDoubleArray(i-1);
end
%## visualize data
parallelcoords(data(:,1:end-1), ...
'Group',nominalValues{end}(data(:,end)+1), ...
'Labels',attributeNames(1:end-1))
title(relationName)
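For comparison, here is roughly what the Weka loader extracts, done by hand. This is a minimal Python sketch, not Weka's parser: it assumes a dense ARFF file with only numeric and nominal attributes, no quoted names or values, no sparse rows and no missing values (`?`). The function name `read_arff` is hypothetical. Like Weka's `attributeToDoubleArray`, it stores each nominal value as its 0-based index, which is why the MATLAB code above adds 1 before indexing `nominalValues`.

```python
import io

def read_arff(lines):
    """Minimal dense ARFF reader (numeric and nominal attributes only)."""
    relation = None
    names, types, nominal, rows = [], [], [], []
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for value, kind, levels in zip(values, types, nominal):
                # nominal values become their 0-based index, as in Weka
                row.append(float(levels.index(value)) if kind == "nominal" else float(value))
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            names.append(name)
            if spec.startswith("{"):
                types.append("nominal")
                nominal.append([v.strip() for v in spec.strip("{}").split(",")])
            else:
                types.append("numeric")  # assumption: anything else is numeric
                nominal.append(None)
        elif lower.startswith("@data"):
            in_data = True
    return relation, names, types, nominal, rows

sample = """% tiny example
@RELATION iris
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
0.2,Iris-setosa
1.3,Iris-versicolor
"""
relation, names, types, nominal, rows = read_arff(io.StringIO(sample))
print(relation, names, rows)  # iris ['petalwidth', 'class'] [[0.2, 0.0], [1.3, 1.0]]
```

String, date and relational attributes would need extra handling that this sketch leaves out.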
You can also call Weka's functionality directly from MATLAB. For example, to train a J48 classifier:
%## classification
classifier = weka.classifiers.trees.J48();
classifier.buildClassifier( D );
fprintf('Classifier: %s %s\n%s', ...
char(classifier.getClass().getName()), ...
char(weka.core.Utils.joinOptions(classifier.getOptions())), ...
char(classifier.toString()) )
The output (a C4.5 decision tree):
Classifier: weka.classifiers.trees.J48 -C 0.25 -M 2
J48 pruned tree
------------------
petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
| petalwidth <= 1.7
| | petallength <= 4.9: Iris-versicolor (48.0/1.0)
| | petallength > 4.9
| | | petalwidth <= 1.5: Iris-virginica (3.0)
| | | petalwidth > 1.5: Iris-versicolor (3.0/1.0)
| petalwidth > 1.7: Iris-virginica (46.0/1.0)
Number of Leaves : 5
Size of the tree : 9

Yes, there are a few MATLAB interfaces for WEKA on the MATLAB File Exchange. I normally use this one: http://www.mathworks.com/matlabcentral/fileexchange/21204-matlab-weka-interface which provides saveARFF() and loadARFF() functions.

If you only want to load an ARFF file into MATLAB and don't need any other Weka functionality, remove the header of the file (the attribute definitions), replace the class values with numeric equivalents, save the file in CSV format, and then load it with MATLAB's built-in csvread function. This way there is no need for a third-party package.
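The manual header-stripping step can also be scripted. A minimal Python sketch, assuming a simple dense ARFF file (comma-separated rows, no quoted values) whose last attribute is the nominal class; the function name `arff_to_numeric_csv` is hypothetical. It writes only the data rows and replaces each class label by its 1-based position in the class attribute's `{...}` list, so the result is purely numeric.

```python
import csv
import io

def arff_to_numeric_csv(src, dst):
    """Copy the @data rows of an ARFF file to CSV without the header.
    The last column (assumed to be a nominal class) is replaced by the
    1-based position of its label in the class attribute's {...} list."""
    class_levels = []
    in_data = False
    writer = csv.writer(dst, lineterminator="\n")
    for raw in src:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank line or ARFF comment
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                # keep the levels of the most recently declared attribute,
                # so after the header this holds the last (class) attribute
                if "{" in line:
                    inside = line[line.index("{") + 1 : line.rindex("}")]
                    class_levels = [v.strip() for v in inside.split(",")]
                else:
                    class_levels = []
            elif lower.startswith("@data"):
                in_data = True
            continue
        *features, label = [v.strip() for v in line.split(",")]
        writer.writerow(features + [class_levels.index(label) + 1])

src = io.StringIO("@relation t\n@attribute x numeric\n@attribute class {yes,no}\n@data\n1.5,no\n2,yes\n")
out = io.StringIO()
arff_to_numeric_csv(src, out)
print(out.getvalue(), end="")  # 1.5,2 then 2,1
```

With real files you would pass open file objects instead, e.g. `open("iris.arff")` and `open("iris.csv", "w", newline="")`, and then read the result in MATLAB with `csvread('iris.csv')`.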

M = importdata('filename.arff');
This is very slow for large files, but it works (tested in MATLAB R2010b).

Searching the MATLAB Central File Exchange reveals some possibilities. In particular, the submissions by Durga Lal Shrestha and Gerald Augusto Corzo Perez look promising, though I haven't tried either.

If the methods above don't work and you need the header information, open the ARFF file in Weka, choose the Save As option, and save the data in CSV format.

Related

Simulink model 'to workspace' output

I am trying to control motor torque. I use a workspace variable as an input in Simulink and want to output a similar variable to the workspace.
size(T_u) is [3, 91], whereas the output I get from the simulation has size [91, 90].
I don't understand why this is so.
Code that I am using:
load('Motor_Param.mat')
t = 1:0.1:10;
T_o = [0.05*(10-t);0.04*(10-t);0.03*(10-t)];
T_d = zeros(size(T_o));
T_e = (T_d - T_o);
C_PD = pid(100,0,10,100);
T_u = zeros(size(T_e));
for k=1:size(T_e,1)
    T_u(k,:) = lsim(C_PD,T_e(k,:),t);
%T_u(1,:)= -45.0450000000000 -44.5444552724092 -44.0439110892737 -43.5433674500493 -43.0428243541925 -42.5422818011600 -42.0417397904094 -41.5411983213986 -41.0406573935862 -40.5401170064312 -40.0395771593933 -39.5390378519326 -39.0384990835098 -38.5379608535861 -38.0374231616233 -37.5368860070837 -37.0363493894301 -36.5358133081260 -36.0352777626353 -35.5347427524223 -35.0342082769522 -34.5336743356904 -34.0331409281029 -33.5326080536564 -33.0320757118181 -32.5315439020554 -32.0310126238368 -31.5304818766308 -31.0299516599067 -30.5294219731343 -30.0288928157839 -29.5283641873264 -29.0278360872332 -28.5273085149760 -28.0267814700274 -27.5262549518604 -27.0257289599483 -26.5252034937652 -26.0246785527857 -25.5241541364848 -25.0236302443380 -24.5231068758215 -24.0225840304120 -23.5220617075865 -23.0215399068228 -22.5210186275990 -22.0204978693939 -21.5199776316868 -21.0194579139572 -20.5189387156857 -20.0184200363529 -19.5179018754402 -19.0173842324294 -18.5168671068029 -18.0163504980435 -17.5158344056347 -17.0153188290603 -16.5148037678048 -16.0142892213531 -15.5137751891906 -15.0132616708034 -14.5127486656779 -14.0122361733011 -13.5117241931606 -13.0112127247442 -12.5107017675407 -12.0101913210389 -11.5096813847285 -11.0091719580996 -10.5086630406426 -10.0081546318487 -9.50764673120954 -9.00713933821711 -8.50663245236405 -8.00612607314350 -7.50562020004906 -7.00511483257487 -6.50460997021554 -6.00410561246623 -5.50360175882257 -5.00309840878072 -4.50259556183731 -4.00209321748951 -3.50159137523496 -3.00109003457184 -2.50058919499879 -2.00008885601498 -1.49958901712007 -0.999089677814209 -0.498590837598075 0.00190750402718064
    a = sim('Motor_Control','SimulationMode','normal');
    out = a.get('T_l')
end
The .mat and .slx files are available here: https://drive.google.com/open?id=1kGeA4Cmt8mEeM3ku_C4NtXclVlHsssuw
If you set the Save format of the To Workspace block to Timeseries, the output data has one row per timestep and one column per element of the signal.
In your case I enabled Display > Signals & Ports > Signal Dimensions, which shows the dimension of every signal in your model.
The signal that you output to the workspace has dimension 90. If I print size(out.Data), I get
ans = 138 90
where 90 is the signal dimension and 138 is the number of timesteps in your Simulink model.
You could then take the last row of the data (which has length 90) and add it to your array.
I edited your code; its output now has size [21, 3]. The 21 comes from t_final/sample_time + 1.
In your code, the time vector t should start from 0.
The Motor_Control.slx model has a sample time of 0.1 s. If you run the model for 9 seconds, each signal gets 91 samples, which is why you have a [91, 90] output. The model I downloaded from your Drive link has a 2 s simulation time.
T_u is used as an input to the Simulink model. It is not constant, so T_u must be a time series.
The edited code is below:
load('Motor_Param.mat')
t = 0:0.1:10;
T_o = [0.05*(10-t);0.04*(10-t);0.03*(10-t)];
T_d = zeros(size(T_o));
T_e = (T_d - T_o);
C_PD = pid(100,0,10,100);
T_u = timeseries(zeros(size(T_e)),t);
for k=1:size(T_e,1)
    T_u.Data(k,:) = lsim(C_PD,T_e(k,:),t);
    a = sim('Motor_Control','SimulationMode','normal');
    out = a.get('T_l')
end
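The sample counts quoted in this answer follow from one formula: a fixed-step run from t_start to t_final with step Ts logs (t_final - t_start)/Ts + 1 samples, because both endpoints are included. A small Python check of that arithmetic; the helper name is hypothetical, and `round` guards against floating-point error in the division. It does not apply to variable-step solvers, where the number of timesteps (such as the 138 above) is chosen by the solver.

```python
def logged_samples(t_final, sample_time, t_start=0.0):
    """Number of samples a fixed-step run logs, counting both endpoints."""
    return round((t_final - t_start) / sample_time) + 1

print(logged_samples(2, 0.1))              # 21, the 2 s model from the Drive link
print(logged_samples(9, 0.1))              # 91, a 9 s run
print(logged_samples(10, 0.1, t_start=1))  # 91, the question's t = 1:0.1:10
```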

SqueezeNet Deep Compression

Does anyone know where or how to obtain the 0.47 MB version of SqueezeNet?
In other words, how can I make the weight bitwidth 6 instead of 8?
I cannot find where to change this in this SqueezeNet generation code.
With the following method I got a 0.77 MB model. Let's assume we have a SqueezeNet_model (a Keras model). First, convert it to a TensorFlow Lite model:
import os
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_keras_model(SqueezeNet_model)
tflite_model = converter.convert()
open("SqueezeNet_model.tflite", "wb").write(tflite_model)
Then, we can use post-training quantization to decrease the size of the model:
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
open("SqueezeNet_Quant_model.tflite", "wb").write(tflite_quant_model)
print("Quantized model in MB:", os.path.getsize('SqueezeNet_Quant_model.tflite') / float(2**20))  # I got a 0.77 MB model
Finally, we can test our model with:
import numpy as np
# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="SqueezeNet_Quant_model.tflite")
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Test model on some input data (x_test: images, y_test: one-hot labels).
input_shape = input_details[0]['shape']
acc = 0
for i in range(len(x_test)):
    input_data = np.array(x_test[i].reshape(input_shape), dtype=np.float32)
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()
    output_data = interpreter.get_tensor(output_details[0]['index'])
    if np.argmax(output_data) == np.argmax(y_test[i]):
        acc += 1
acc = acc / len(x_test)
print(acc * 100)
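On the original 6-bit question: `tf.lite.Optimize.DEFAULT` stores weights as 8-bit integers, so the conversion above does not by itself give 6-bit weights. Deep Compression, which produced the 0.47 MB SqueezeNet, combines pruning, k-means weight sharing and Huffman coding, and this sketch does not reproduce it. As a simpler stand-in, here is a NumPy sketch of uniform (linear) quantization of one weight tensor to 2**6 = 64 levels, plus the raw bit-packed size n_weights * bits / 8 bytes. The function names and the random test tensor are hypothetical, and the size estimate ignores the per-tensor offset/step and any entropy coding.

```python
import numpy as np

def quantize_uniform(weights, bits=6):
    """Map weights to 2**bits evenly spaced levels between min and max.
    Returns integer codes (uint8, so bits <= 8) plus the offset and step."""
    levels = 2 ** bits
    w_min, w_max = float(weights.min()), float(weights.max())
    step = (w_max - w_min) / (levels - 1) if w_max > w_min else 1.0
    codes = np.round((weights - w_min) / step).astype(np.uint8)  # 0 .. levels-1
    return codes, w_min, step

def dequantize(codes, w_min, step):
    """Reconstruct approximate weights from the integer codes."""
    return w_min + codes.astype(np.float32) * step

def packed_size_bytes(n_weights, bits=6):
    """Raw storage if the codes were bit-packed (no offset/step, no entropy coding)."""
    return n_weights * bits / 8

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(3, 3, 64, 64)).astype(np.float32)
codes, w_min, step = quantize_uniform(w, bits=6)
w_hat = dequantize(codes, w_min, step)
print("levels used:", len(np.unique(codes)))
print("max abs error:", np.abs(w - w_hat).max(), "step/2:", step / 2)
print("6-bit size (bytes):", packed_size_bytes(w.size, 6))
```

Uniform levels are easy to reason about but waste resolution on the sparse tails of the weight distribution; k-means weight sharing places the 64 levels where the weights actually cluster, which is one reason Deep Compression can keep accuracy at 6 bits.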

how to read from file and display the data in desired rows in Matlab

I am trying to read from a file and display the data in rows 6, 11, 111 and 127 in MATLAB, but I could not figure out how. I have searched the MATLAB forums and this site for an answer. I tried fscanf, textscan and other functions, but they did not work as intended. I also tried a for loop, but again the output was not what I wanted. Right now I can only read and display one row. I want to display all of them (the data in the rows listed above) at the same time. How can I do that?
MATLAB code:
n = [0 :1: 127];
%% Problem 1
figure
x1 = cos(0.17*pi*n)
%it creates file and writes content of x1 to the file
fileID = fopen('file.txt','w');
fprintf(fileID,'%d \n',x1);
fclose(fileID);
%line number can be changed in order to obtain wanted values.
fileID = fopen('file.txt');
line = 6;
C = textscan(fileID,'%s',1,'delimiter','\n', 'headerlines',line-1);
celldisp(C)
fclose(fileID);
And this is the file:
1
8.607420e-01
4.817537e-01
-3.141076e-02
-5.358268e-01
-8.910065e-01
-9.980267e-01
-8.270806e-01
-4.257793e-01
9.410831e-02
5.877853e-01
9.177546e-01
9.921147e-01
7.901550e-01
3.681246e-01
-1.564345e-01
-6.374240e-01
-9.408808e-01
-9.822873e-01
-7.501111e-01
-3.090170e-01
2.181432e-01
6.845471e-01
9.602937e-01
9.685832e-01
7.071068e-01
2.486899e-01
-2.789911e-01
-7.289686e-01
-9.759168e-01
-9.510565e-01
-6.613119e-01
-1.873813e-01
3.387379e-01
7.705132e-01
9.876883e-01
9.297765e-01
6.129071e-01
1.253332e-01
-3.971479e-01
-8.090170e-01
-9.955620e-01
-9.048271e-01
-5.620834e-01
-6.279052e-02
4.539905e-01
8.443279e-01
9.995066e-01
8.763067e-01
5.090414e-01
-4.288121e-15
-5.090414e-01
-8.763067e-01
-9.995066e-01
-8.443279e-01
-4.539905e-01
6.279052e-02
5.620834e-01
9.048271e-01
9.955620e-01
8.090170e-01
3.971479e-01
-1.253332e-01
-6.129071e-01
-9.297765e-01
-9.876883e-01
-7.705132e-01
-3.387379e-01
1.873813e-01
6.613119e-01
9.510565e-01
9.759168e-01
7.289686e-01
2.789911e-01
-2.486899e-01
-7.071068e-01
-9.685832e-01
-9.602937e-01
-6.845471e-01
-2.181432e-01
3.090170e-01
7.501111e-01
9.822873e-01
9.408808e-01
6.374240e-01
1.564345e-01
-3.681246e-01
-7.901550e-01
-9.921147e-01
-9.177546e-01
-5.877853e-01
-9.410831e-02
4.257793e-01
8.270806e-01
9.980267e-01
8.910065e-01
5.358268e-01
3.141076e-02
-4.817537e-01
-8.607420e-01
-1
-8.607420e-01
-4.817537e-01
3.141076e-02
5.358268e-01
8.910065e-01
9.980267e-01
8.270806e-01
4.257793e-01
-9.410831e-02
-5.877853e-01
-9.177546e-01
-9.921147e-01
-7.901550e-01
-3.681246e-01
1.564345e-01
6.374240e-01
9.408808e-01
9.822873e-01
7.501111e-01
3.090170e-01
-2.181432e-01
-6.845471e-01
-9.602937e-01
-9.685832e-01
-7.071068e-01
-2.486899e-01
2.789911e-01
Assuming the file is not exceedingly large, the simplest way is probably to read the entire file and index the output by the line numbers you want.
line = [6 11 111 127];
fileID = fopen('file.txt');
C = textscan(fileID,'%s','delimiter','\n');
fclose(fileID);
disp(C{1}(line))
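The same read-everything-then-index idea, sketched in Python for comparison. The helper name `read_lines` is hypothetical; it takes 1-based line numbers, as in the MATLAB code, and converts them to Python's 0-based list indices. The demo writes a throwaway file rather than relying on the question's `file.txt`.

```python
import os
import tempfile

def read_lines(path, line_numbers):
    """Return the given lines (1-based numbers, as in MATLAB) of a text file."""
    with open(path) as f:
        lines = f.read().splitlines()
    return [lines[n - 1] for n in line_numbers]

# demo on a small throwaway file holding the numbers 1..10, one per line
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "w") as f:
        f.write("\n".join(str(k) for k in range(1, 11)) + "\n")
    print(read_lines(path, [2, 4, 10]))  # ['2', '4', '10']
```

For the question's file this would be `read_lines("file.txt", [6, 11, 111, 127])`.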

Octave: Load all files from specific directory

I used to use MATLAB, and I loaded all .txt files from the directory "C:\folder\" into MATLAB with the following code:
myFolder = 'C:\folder\';
filepattern = fullfile(myFolder, '*.txt');
files = dir(filepattern);
for i=1:length(files)
    eval(['load ' myFolder,files(i).name ' -ascii']);
end
If C:\folder\ contains A.txt, B.txt, C.txt, I would then have matrices A, B and C in the workspace.
The code doesn't work in Octave, maybe because of fullfile? Anyway, with the following code I get matrices named C__folder_A, C__folder_B and C__folder_C, but I need matrices called A, B and C.
myFolder = 'C:\folder\';
files = dir(myFolder);
for i=3:length(files)
    eval(['load ' myFolder,files(i).name ' -ascii']);
end
Can you help me?
Thanks,
Martin
PS: The loop starts at 3 because files(1).name is '.' and files(2).name is '..'.
EDIT:
I have just found a solution. It's not elegant, but it works.
I add the folder containing the files to the path with addpath, so I don't have to give the full directory name in the loop.
myFolder = 'C:\folder\';
addpath(myFolder)
files = dir(myFolder);
for i=3:length(files)
    eval(['load ' files(i).name ' -ascii']);
end
Loading files into variables whose names are generated dynamically is usually bad design; you should load them into a cell array instead. But this should work:
files = glob('C:\folder\*.txt')
for i=1:numel(files)
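    [~, name] = fileparts(files{i});
    % single quotes in the eval'd code stop Octave from reading the backslashes in the path as escape sequences
    eval(sprintf('%s = load(''%s'', ''-ascii'');', name, files{i}));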
endfor
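The design advice above (use a container keyed by file name instead of dynamically named variables) looks like this in Python with NumPy. A sketch, assuming whitespace-separated numeric text files like the ones `load -ascii` reads; the function name `load_ascii_folder` is hypothetical.

```python
import glob
import os
import tempfile
import numpy as np

def load_ascii_folder(folder):
    """Load every *.txt file in folder into a dict keyed by file name,
    e.g. A.txt -> data['A'] as a 2-D array (like load -ascii gives)."""
    data = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.txt"))):
        name = os.path.splitext(os.path.basename(path))[0]
        data[name] = np.loadtxt(path, ndmin=2)
    return data

# demo with two small files
with tempfile.TemporaryDirectory() as tmp:
    for name, text in {"A": "1 2\n3 4\n", "B": "5 6 7\n"}.items():
        with open(os.path.join(tmp, name + ".txt"), "w") as f:
            f.write(text)
    mats = load_ascii_folder(tmp)
    print({k: v.shape for k, v in mats.items()})  # {'A': (2, 2), 'B': (1, 3)}
```

The loaded matrices are then `mats["A"]`, `mats["B"]` and so on, which can be looped over without any `eval`.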
The function scanFiles recursively searches the directory initialPath and its subdirectories for files with the given extensions. The parameter fileHandler is a function you can use to process each file's info structure (e.g. read text, load an image, etc.).
Source
function scanFiles(initialPath, extensions, fileHandler)
  persistent total = 0;
  persistent depth = 0; depth++;
  initialDir = dir(initialPath);
  printf('Scanning the directory %s ...\n', initialPath);
  for idx = 1 : length(initialDir)
    curDir = initialDir(idx);
    curPath = strcat(curDir.folder, '\', curDir.name);
    if regexp(curDir.name, "(?!(\\.\\.?)).*") * curDir.isdir
      scanFiles(curPath, extensions, fileHandler);
    elseif regexp(curDir.name, cstrcat("\\.(?i:)(?:", extensions, ")$"))
      total++;
      file = struct("name",curDir.name,
                    "path",curPath,
                    "parent",regexp(curDir.folder,'[^\\\/]*$','match'),
                    "bytes",curDir.bytes);
      fileHandler(file);
    endif
  end
  if!(--depth)
    printf('Total number of files:%d\n', total);
    total=0;
  endif
endfunction
Usage
# txt
# textFileHandlerFunc=@(file)fprintf('%s',fileread(file.path));
# scanFiles("E:\\Examples\\project\\", "txt", textFileHandlerFunc);
# images
# imageFileHandlerFunc=@(file)imread(file.path);
# scanFiles("E:\\Examples\\project\\datasets\\", "jpg|png", imageFileHandlerFunc);
# list files
fileHandlerFunc=@(file)fprintf('path=%s\nname=%s\nsize=%d bytes\nparent=%s\n\n',
                                file.path,file.name,file.bytes,file.parent);
scanFiles("E:\\Examples\\project\\", "txt", fileHandlerFunc);
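The same scan-and-dispatch pattern, sketched in Python for comparison. Here `os.walk` does the recursion, so the persistent counters and the `.`/`..` filtering are not needed. The function name and the info-dictionary fields mirror the Octave version but are otherwise hypothetical; extensions are passed as a list and matched case-insensitively.

```python
import os
import tempfile

def scan_files(initial_path, extensions, file_handler):
    """Walk initial_path recursively and call file_handler(info) for every file
    whose extension (case-insensitive) is in extensions. Returns the count."""
    wanted = {"." + e.lower().lstrip(".") for e in extensions}
    total = 0
    for folder, _subdirs, names in os.walk(initial_path):
        for name in names:
            if os.path.splitext(name)[1].lower() in wanted:
                path = os.path.join(folder, name)
                file_handler({
                    "name": name,
                    "path": path,
                    "parent": os.path.basename(folder),
                    "bytes": os.path.getsize(path),
                })
                total += 1
    return total

# demo: two .txt files (one in a subfolder, upper-case extension) and one .png
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.TXT"), "c.png"):
        with open(os.path.join(tmp, rel), "w") as f:
            f.write("hello")
    n = scan_files(tmp, ["txt"], lambda info: print(info["name"], info["bytes"], "bytes, parent:", info["parent"]))
    print("Total number of files:", n)  # 2
```

Passing the handler in as a function keeps the walking logic separate from what is done with each file, the same design choice as the Octave fileHandler parameter.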

Changing format of many files in Excel

I have a folder with thousands of CSV files. When I open one of them, the data looks like this:
20110503 01:46.0 1527.8 1 E
20110503 01:46.0 1537.8 1 E
20110504 37:40.0 1536.6 1 E
20110504 37:40.0 1533.6 1 E
20110504 36:17.0 1531.1 1 E
The second column (time) shows only minutes and seconds before the decimal point. If I select the second column, right-click, choose Format Cells, select Time and pick the 13:30:55 format, the same data looks like this:
20110503 19:01:46 1527.8 1 E
20110503 19:01:46 1537.8 1 E
20110504 0:37:40 1536.6 1 E
20110504 0:37:40 1533.6 1 E
20110504 8:36:17 1531.1 1 E
Now I can see hours, minutes and seconds. I have written a MATLAB function that reads these files, but it needs the hours, so it only works after I change the format to display them. Now I have to apply the function to all the files in the folder.
I'm wondering, is there a way to change the default time display so hours are included? If not, is there a way of writing a script to change the format of these files? Thanks!
Note: the part of my MATLAB function that reads the file looks like this:
fid = fopen('E:\Tick Data\Data Output\NGU13.csv','rt');
c = fscanf(fid, '%d,%d:%d:%d,%f,%d,%*c');
datamat = reshape(c,6,length(c)/6)'; % reshape into matrix
yyyymmdd = datamat(:,1);
hr = datamat(:,2);
mn = datamat(:,3);
sec = datamat(:,4);
pp = datamat(:,5); % price
vv = datamat(:,6); % volume
In Notepad, you can see hours, minutes, seconds, and milliseconds:
20111206,09:50:56.411,4.320,1,E
20111206,10:02:10.167,4.300,1,E
20111206,11:24:09.052,4.313,1,E
20111206,11:46:09.359,4.307,1,E
20111206,11:50:22.785,4.320,1,E
For a record of the form
20010402, 09:30:24.456, 4.235, 1, E
you should use this format string:
fmt = '%f%f:%f:%f.%f%f%*s';
data = textscan(fid, fmt, 'Delimiter',',','CollectOutput',true);
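To see which fields such a record actually contains, here is a small Python sketch that splits one line into date, hour, minute, second, millisecond, price and volume, skipping the trailing flag much as `%*s` does. It assumes exactly the comma-separated layout shown in the Notepad output; the function name `parse_tick` is hypothetical.

```python
def parse_tick(line):
    """Split 'yyyymmdd,hh:mm:ss.fff,price,volume,flag' into numeric fields."""
    date, clock, price, volume, _flag = [p.strip() for p in line.split(",")]
    hms, millis = clock.split(".")
    hour, minute, second = (int(x) for x in hms.split(":"))
    return int(date), hour, minute, second, int(millis), float(price), int(volume)

print(parse_tick("20111206,09:50:56.411,4.320,1,E"))  # (20111206, 9, 50, 56, 411, 4.32, 1)
```

Because the hours are present in the file text itself, a parser like this (or the textscan call above) reads them regardless of how Excel chooses to display the column.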