How to perform nominal to numeric conversion of attributes in WEKA? - classification

I have a dataset containing a mixture of numeric and nominal attributes. I want to convert all the nominal attributes to numeric so that I can apply the SVM classifier kernels (PolyKernel and RBFKernel), which only work with numeric attributes. Any help would be greatly appreciated. FYI, I've already tried the NominalToBinary filter (it's not really what I want).

One thing you could do is convert all of the label names for the attribute using RenameNominalValues. Please note that all of these new labels would need to be numeric, so you might need to rename them as shown below.
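For example (the exact mapping is up to you, as long as every new label is numeric), the labels of the attribute shown further below could be replaced as:
false:0, true:1, maybe:2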
Once this is done, you could save the ARFF file and change the entry in your attribute list from:
@attribute a0 {false,true,maybe}
to
@attribute a0 numeric
Once saved, reload the document and hopefully all will load okay.
Alternatively, you could try your favorite spreadsheet application, if converting your data back to ARFF would not be an issue.
Hope this helps!

There is no direct filter to convert nominal data to numeric data. If your nominal attribute has two values (e.g. SEX: MALE, FEMALE), you can easily apply the unsupervised filter NominalToBinary.
But if the attribute has more than two values, NominalToBinary will split it into several binary attributes, which may not be what you want. In that case you can use the RenameNominalValues filter to map each label to a number.
E.g.: if your dataset has an attribute called "region" with the values INNER_CITY, TOWN, RURAL and SUBURBAN, you can easily convert those nominal values using RenameNominalValues.
The filter has a value-replacement field; you only have to add mappings like this: INNER_CITY:0, TOWN:1, RURAL:2, SUBURBAN:3
Then you can see your results.

The NominalToNumeric filter (package: weka.filters.unsupervised.attribute) that is part of ADAMS allows you to do exactly that. You can either use the internal representation (i.e., order of labels starting at 0), or, if there is a numeric part in the label that can be turned into a number, use regular expressions to convert these sub-strings.
ADAMS also offers the Weka Investigator, a more powerful tool than the Weka Explorer. Just download the adams-ml-app-snapshot to get access to this filter.
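If installing ADAMS is not an option, the same internal-representation conversion can be written in a few lines against the standard WEKA Java API, since WEKA already stores every nominal value as the 0-based index of its label. A minimal sketch, assuming WEKA 3.7+ and a dataset of only nominal and numeric attributes (the class name and file path are made up; for classification you would probably want to skip the class attribute):

import java.util.ArrayList;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class NominalToNumericSketch {
    // Rebuild the header so every nominal attribute becomes a numeric one,
    // then copy the raw double values; for nominal attributes these are
    // already the 0-based label indices.
    public static Instances convert(Instances data) {
        ArrayList<Attribute> attrs = new ArrayList<>();
        for (int i = 0; i < data.numAttributes(); i++) {
            Attribute a = data.attribute(i);
            attrs.add(a.isNominal() ? new Attribute(a.name()) : (Attribute) a.copy());
        }
        Instances result = new Instances(data.relationName(), attrs, data.numInstances());
        for (int j = 0; j < data.numInstances(); j++) {
            result.add(new DenseInstance(1.0, data.instance(j).toDoubleArray()));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("mydata.arff"); // hypothetical path
        System.out.println(convert(data));
    }
}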

Yes, you can convert nominal data to numeric in WEKA:
Example:
select: filters.unsupervised.attribute.OrdinalToNumeric
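If you are on a WEKA version that ships this filter (in older releases it may need to be installed as a package), applying it programmatically looks roughly like this sketch (the file name is hypothetical):

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.OrdinalToNumeric;

Instances data = DataSource.read("mydata.arff");
OrdinalToNumeric otn = new OrdinalToNumeric();
otn.setInputFormat(data);                 // standard WEKA filter protocol
Instances numeric = Filter.useFilter(data, otn);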

Related

ADF String to Decimal returns NULL value

I have an imported CSV file with string values.
In this file there are amounts, several of which equal 0,00.
I want to create a TotalCA column by adding several fields in my table, and to convert the result to a numeric value.
I use the toDecimal function, but the values all come back NULL and the created column is grayed out.
I have done a lot of research and I can't find a solution. Can you help me?
Thank you
Lea
I made some example CSV data, if I understand you correctly:
Like you said, some rows hold values greater than 0, and others contain "0.00" for a zero value. So the rows actually contain different data types: int and decimal.
For this reason, as I tested, none of toDecimal(), toFloat() or toDouble() works. I used a Derived Column expression to do the data conversion.
We can't keep both representations; we can only choose one type. If you choose decimal or float, the other rows' data would be converted to values like '11.0', which I think is also not what you want.
Source projection: I preset the column type to double.
(Decimal can't keep '0.00'; it only returns '0'.)
In short, the only way is to use the String data type to keep the data, and also to use String to receive the data in the sink dataset.
HTH.
Thank you all for your answers.
Here is my CSV file
If I go to the Source Projection module and change the type of my column LFC1_UM01S to decimal, this is what I get:
Why are some values considered as NULL?
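If the NULLs correspond to the rows where the value uses a comma as the decimal separator (e.g. 0,00), one thing worth trying, untested here, is to normalise the separator in a Derived Column before converting; both replace() and toDecimal() are part of the data flow expression language (the column name is taken from the question above):

toDecimal(replace(LFC1_UM01S, ',', '.'))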

Spotfire - Filtering a Table by Values in a Calculated Column

I am trying to filter a table visualization of all of my data by looking to see whether a Study Number contains Activity A. If a Study Number contains Activity A, then I want to keep all rows with those Study Numbers, even the rows whose Activity is not A. See the mock data below. In my real data set I have ~55,000 rows.
I have created a calculated column to return Study Numbers where Activity = A, but I am not sure where to go from there. Thanks for any help.
If(UniqueConcatenate([Activity]) OVER ([Study Number])~="A","Y","N")
will give you a resulting column that you can then filter on (or you can use the formula as a Data Limiting Expression).
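As a data limiting expression you would presumably use the boolean form directly, i.e. the same logic without the If wrapper:

UniqueConcatenate([Activity]) OVER ([Study Number]) ~= "A"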

Range values in Tableau

I want to visualise the below excel table in Tableau.
When this table is added to Tableau, it treats the Salary values as strings, so they show up under Dimensions rather than Measures, and no proper graph can be made from them.
How can I convert these salary range values to integers?
As @Alexandru Porumb suggested, the best solution is to have a min_salary column and a max_salary column (unless you really have the actual salary available, which is even better).
If you don’t want to revise the incoming data, you can get the same effect using the Split() function in a calculated field from Tableau to derive two integer fields from the original string field.
For example, you could define a calculated field called min_salary as INT(SPLIT([Salary], '-', 1)). Split() extracts part of a string based on a separator string. Int() converts the string to an integer.
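A matching max_salary field would presumably take the second token, since a positive token number counts tokens from the left:

INT(SPLIT([Salary], '-', 2))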
You could simplify the way Tableau sees the data and separate the salary column into Min and Max columns; that way you wouldn't have the hyphen that makes Tableau treat the entry as a string.
A simplistic idea, I know, but it may help until a better solution is provided.
Hope it helps

Power BI - Numeric Format

I would like to know how to change the standard format for numbers in Power BI.
The software I use separates decimals with a comma ("000.000,00"), but the idea is to set that to the US format: "000,000.00".
How can I fix that?
There are two approaches to this problem.
If all the data sources you have in the current Power BI file are in the European format, you can set the locale of the file accordingly:
File -> Options and settings -> Options
Once set, Power BI can identify the format for numbers correctly when you change the data type in Query Editor.
If you're just catering for a particular data source, you can use the Replace Values transformation twice, first removing the thousands separator "." and then replacing the decimal "," with ".", and change the data type to Decimal Number afterwards.
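In Power Query's M, that second approach amounts to something like this sketch (the Source step and the "Amount" column name are hypothetical):

= Table.TransformColumnTypes(
      Table.ReplaceValue(
          Table.ReplaceValue(Source, ".", "", Replacer.ReplaceText, {"Amount"}),
          ",", ".", Replacer.ReplaceText, {"Amount"}),
      {{"Amount", type number}})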

Problems reading CSV in Octave

I have a .csv file and I can't read it in Octave. In R I just use the command below and everything is read correctly:
myData <- read.csv("myData.csv", stringsAsFactors = FALSE)
However, in Octave the command below doesn't read it properly:
myData = csvread('myData.csv',1,0);
When I open the file with Notepad, the data looks something like this; note there isn't a comma separating the last column name (i.e. Column3) from the first value (i.e. Value1), and the same thing happens between the last value of the first row (i.e. Value3) and the first value of the second row (i.e. Value4):
Column1,Column2,Column3Value1,Value2,Value3Value4,Value5,Value6
Column1 is meant for date values (format yyyy-mm-dd hh:mm:ss); I don't know if that has anything to do with the problem.
Alex's answer already explains why csvread does not work for your case: that function only reads numeric data and returns an array. Since your fields are all strings, you need something that reads a CSV file into a cell array.
That function is named csv2cell and is part of the io package.
As a separate note, if you plan to do operations with those dates, you may want to convert the date strings into serial date numbers. This lets you put your dates in a numeric array, which allows for faster operations and reduced memory usage. Also, the financial package has many functions for dealing with dates.
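A minimal sketch of both steps (the io package must be installed first, e.g. with pkg install -forge io; the date format is taken from the question):

pkg load io
data = csv2cell('myData.csv');            % cell array, including the header row
headers = data(1,:);
dates = datenum(data(2:end,1), 'yyyy-mm-dd HH:MM:SS');  % serial date numbers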
csvread only reads numeric data, so a date unfortunately does not qualify.
In Octave you might want to check out the dataframe package. In Matlab you would use readtable.
Otherwise, there are also more primitive functions you can use, like textscan.