String attribute to numeric in WEKA - type-conversion

I am new to Weka and am trying to analyse a dataset.
I imported a csv into the explorer window and noticed that one column from the csv file that contains numeric values as percentages (e.g. 46%) has been imported as nominal.
How can I transform these values from nominal to numerical?
Any tips would be much appreciated.

The quickest way would be:
open the CSV file in a text editor
remove all the %
re-import the CSV file in Weka
Other tools, like the Spreadsheet file viewer or the Flow editor in ADAMS, allow that kind of conversion on the fly using the SpreadSheetConvertCells transformer:
finder: define the correct column range to operate on
conversion: adams.data.conversion.StringExpression -expression "substitute(X, \\\"%\\\", \\\"\\\")"
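If the file is too large to edit by hand, the text-editor steps above can be scripted. The following is a minimal Python sketch, not part of Weka or ADAMS; the function name `strip_percent` and the column name `score` are made up for illustration. It removes the trailing % from one column so that the values can be re-imported as numeric.

```python
import csv
import io


def strip_percent(csv_text, column):
    """Return csv_text with a trailing '%' removed from every value in `column`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # "46%" becomes "46"; values without a % are left unchanged
        row[column] = row[column].strip().rstrip("%")
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample data; replace with the contents of the real CSV file
sample = "id,score\n1,46%\n2,7.5%\n"
print(strip_percent(sample, "score"))
```

Save the result to a new file and import that file in Weka instead of the original.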

Related

Convert spreadsheet to .csv on command line without evaluating formulas

I want to convert a spreadsheet (e.g. .xls or from LibreOffice Calc) to some text format, e.g. .csv, without evaluating formulas, so that the formulas are stored in the text file. I know that LibreOffice has an option "Save cell formula instead of calculated values" when saving as .csv, and according to How to export spreadsheet to CSV without evaluating formulas Excel can do this too, but I'd like to do it on the command line. I know that ssconvert from the Gnumeric package can convert on the command line, but as far as I can see there's no option to keep the formulas.
The bigger picture is that I want to write a script that takes two versions of an .ods file, converts them and shows the differences. When only one cell has really changed but many other cells depend on it, then I want to see only the real change.
I have used xls2csv under Cygwin. Just a Google search shows many implementations. I would start there.
http://search.cpan.org/~ken/xls2csv-1.07/script/xls2csv
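Once both versions have been exported to CSV with the formulas kept as text, the comparison step the question describes could look roughly like this. It is only a sketch in Python: the function name `changed_cells` and the sample data are hypothetical, and it assumes the conversion to CSV has already been done by some other tool.

```python
import csv
import io
from itertools import zip_longest


def changed_cells(old_csv, new_csv):
    """Return (row, column, old, new) for every cell whose text differs."""
    old_rows = list(csv.reader(io.StringIO(old_csv)))
    new_rows = list(csv.reader(io.StringIO(new_csv)))
    changes = []
    # zip_longest pads the shorter file or row, so added cells are reported too
    for r, (old_row, new_row) in enumerate(
            zip_longest(old_rows, new_rows, fillvalue=[]), start=1):
        for c, (old, new) in enumerate(
                zip_longest(old_row, new_row, fillvalue=""), start=1):
            if old != new:
                changes.append((r, c, old, new))
    return changes


# Hypothetical exports: only the input cell changed, the formula text did not
old = "1,=A1*2\n"
new = "5,=A1*2\n"
print(changed_cells(old, new))
```

Because the dependent cell still holds the same formula text, only the cell that was really edited is reported.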

Reading data in dymola from chx file

I want to import data into dymola from a chx file which is generated by the output of a program and then run a simulation with those outputs as parameters.
The file has parameters of the form:
<tubedata>
<nrows>28</nrows>
<ncolumns>3</ncolumns>
</tubedata>
I want to import this file into dymola, insert all the variables into a record file and then run simulation.
I'm not sure whether chx files are simply XML-formatted files, but if they are, there is a rather new library that lets you read data from XML files (and xls, json, and ini files for that matter):
https://github.com/tbeu/ExternData
You could write an XSLT transformation on the .chx file to put the data in Modelica table format. See for example https://build.openmodelica.org/Documentation/Modelica.Blocks.Tables.CombiTable1D.html
on how to format the table. Then use the table to set the parameters.
Alternatively, I think you can load a .mos script file in Dymola with the following format (I am not 100% sure of it):
x1 := value1
x2 := value2
for the parameters.
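If the .chx file really is well-formed XML, the name/value lines above could be generated rather than typed. This is a rough Python sketch under that assumption; the function name `chx_to_assignments` is invented here, and whether Dymola accepts the resulting lines in a .mos script is as uncertain as stated above.

```python
import xml.etree.ElementTree as ET


def chx_to_assignments(xml_text):
    """Turn each child element of the root into a 'name := value' line."""
    root = ET.fromstring(xml_text)
    lines = []
    for child in root:
        value = (child.text or "").strip()
        lines.append(f"{child.tag} := {value}")
    return lines


# Sample modelled on the fragment in the question, with closing tags in place
sample = "<tubedata><nrows>28</nrows><ncolumns>3</ncolumns></tubedata>"
for line in chx_to_assignments(sample):
    print(line)
```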

Xlsread returning zero values....?

I am getting zero values when using the xlsread command in MATLAB. I am using a real-world dataset taken from the UCI repository, which has both integer and float values.
[Train,textData,rawData] = xlsread('C:\Users\pooja\Documents\project\breastcancer.csv');
I have tried with xls format too..
[Train,textData,rawData] = xlsread('C:\Users\pooja\Documents\project\breastcancer.xls');
Thanx in Advance..!
In the wide world of computers, there are a lot of data formats. You need to remember that data formats are different from each other. Generally software like Matlab allows you to open different types of data formats. Each one of course with its own function.
You can guess that the function xmlread is to read XML files. If you want to read csv files or any other type of file in the world, please (I think this is obvious) do not use xmlread!
Specifically, to open CSV files MATLAB has csvread. Please do not use csvread to open files that are not CSV.

Matlab's Import Tool recognizes a column as numbers but generate %s in formatSpec

I use Matlab's Import Tool to generate a script that will take care of importing several CSV files with the same columns.
The Import Tool successfully manages to recognize the type of each column of my CSV file:
However, in the generated script, the same columns are cast as strings (%s = string):
Any idea why?
Surprisingly, it works fine with CSV files with fewer columns (it works with 70-column CSV files, but the issue arises with 120-column CSV files). Here is one example of a CSV file that triggers the issue.
I use R2014b x64 with Windows 7 SP1 x64 Ultimate.
This is happening because one of the columns in your file contains data which contains numbers and text. The Import Tool is predicting that you're going to want to extract the numbers from this field, so it labels the column as 'NUMBER'. However, standard textscan doesn't allow for this, so the Import Tool must generate code to read in all of the data as text, and does the numeric conversion afterwards. The Import Tool is trying to help avoid errors using textscan.
The result of running the generated code is still numeric, as is shown in the Import Tool.
The specific column is labeled SEGMENT_ID in your example file. It contains data like:
l8K3ziItQ2pRFQ8
L79Y0zA8xF7fVTA
JYYqCCwRrigaUmD
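The "read everything as text, then convert" approach described above can be illustrated in a few lines. This is a Python sketch of the general idea only, not the code the Import Tool generates; the function name `column_to_numbers` is hypothetical, and cells that are not plain numbers simply become NaN here.

```python
import math


def column_to_numbers(cells):
    """Convert text cells to floats, using NaN where a cell is not a number."""
    numbers = []
    for cell in cells:
        try:
            numbers.append(float(cell))
        except ValueError:
            # Mixed or purely textual cells cannot be parsed as a number
            numbers.append(math.nan)
    return numbers


# The last value is taken from the SEGMENT_ID examples above
print(column_to_numbers(["12", "3.5", "l8K3ziItQ2pRFQ8"]))
```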

Convert dataset of .mat format to .csv octave/matlab

There are datasets in .mat format on this site: http://www.cs.nyu.edu/~roweis/data.html
I want to change the format to .csv.
Can someone tell me how to change the format to create the .csv file.
Thanks!
Suppose that the .mat files from the site are available already. In the command window in Matlab, you may write, for example:
load('C:\Users\YourUserName\Downloads\mnist_all.mat');
to load the .mat file; the result should be a set of matrices test0, test1, ..., train0, train1 ... created in your workspace, which you want saved as CSV files. Because they have different sizes, you need to save one CSV per variable, e.g. (also in the command window):
csvwrite('C:\Users\YourUserName\Downloads\mnist_test0.csv', test0);
Repeat the command for each variable, and do not forget to change also the name of the output file to avoid overwriting.
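For readers without Matlab, the same "one CSV per variable" idea can be sketched in Python. The example below is an illustration only: the function name `write_variables_as_csv` is made up, and it works on a plain dict of 2-D lists so that it runs with the standard library. With SciPy installed, `scipy.io.loadmat` returns such a dict of arrays (plus metadata keys starting with a double underscore, which the sketch skips).

```python
import csv
import os
import tempfile


def write_variables_as_csv(variables, out_dir):
    """Write each 2-D variable to out_dir/<name>.csv and return the paths."""
    paths = []
    for name, rows in variables.items():
        if name.startswith("__"):
            continue  # skip metadata keys such as "__header__"
        path = os.path.join(out_dir, name + ".csv")
        with open(path, "w", newline="") as handle:
            csv.writer(handle).writerows(rows)
        paths.append(path)
    return paths


# Hypothetical stand-ins for the matrices loaded from the .mat file
variables = {"test0": [[0, 1], [2, 3]], "train0": [[4, 5, 6]]}
out_dir = tempfile.mkdtemp()
for path in write_variables_as_csv(variables, out_dir):
    print(path)
```

Each variable gets its own file name, which avoids the overwriting problem mentioned above.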
Have you tried the csvwrite function in Matlab?
Just load your .mat files with the load function and then write them with csvwrite!
I do not have a Matlab license so I installed GNU Octave 4.2.1 (2017) on Windows 10 (thank you to John W. Eaton and others). I was not fully successful using the csvwrite so I used the following workaround. (BTW, I am totally incompetent in the Octave world. csvwrite worked for simple data structures).
In the Command Window I used the following two commands
load myfile.mat
save("-text","myfile.txt","variablename")
When the "myfile.mat" is loaded, the variable names for the data vectors loaded are displayed in the workspace window. This is the name(s) to use in the save command. Some .mat files will load several data structures.
The "-text" option is the default, so you may not need to include this option in the command.
The output file lists the .mat file contents in text format as a single column (of potentially sequential variables). It should be easy to use your text editor to massage this data into the original matrix structure for use in whatever app you are comfortable with.
Had a similar issue. Needed to convert a series of .mat files that had two columns of numerical data into standard data files (ascii text). Note that I don't really ever use csv, but everything here could be adapted by using csvwrite instead of the standard save.
Using Octave 4.2.1 ....
load myfile.mat
LI = [L, I] ## L and I are column vectors representing my data
save myfile.txt LI
Note that L and I appear to be default variable names chosen by Octave for the two column vectors in my original data file. A script that iterated over all files with the .mat extension in my directory would be ideal, but this got the job done. It saves the data as two space-separated columns.
*** Update
The following script works on Octave 4.2.1 for a series of data files with the .mat extension that are in the same directory. It will iterate over them and write the data out to text files with the same name but with the extension .dat . Note that this is not efficient, so if you have a lot of files or if they are large it can take a while to run. I would suggest that you run it from the command line using octave mat2dat.m so you can actually watch it go.
I make no guarantees that this will work for you, but it did for me. I also am NOT proficient in Octave or Matlab, so I'm sure a better solution exists.
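# mat2dat.m
# Convert every .mat file in the current directory to a .dat text file.
# Statements are left without semicolons so Octave echoes progress.
dirlist = glob("*.mat")
for i = 1:length(dirlist)
  filename = dirlist{i,1}
  load(filename, "L", "I")                   # L and I are the column vectors in each file
  LI = [L, I]                                # combine them into a two-column matrix
  tmpname = filename(1:length(filename)-3)   # drop "mat" but keep the trailing dot
  txtname = strcat(tmpname, 'dat')
  save(txtname, "LI")
end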