Text Classification with the class label and probability of the class label - ipython

I have used a decision tree to classify text into several classes, as shown in figure 1 below:
I would like to append the probability of each prediction next to the "predictions" column, as in figure 2 below. I have tried to find a solution but can't find one anywhere. Is there any way to do this?

Related

How to copy data from a 1 x 1 double timeseries to a 1 D Lookup Table in Matlab

I have been asked to look at a Matlab project. I'll link screenshots to clarify the problem. I need to create a 1-D Lookup Table with the data from a 1 x 1 double timeseries from another model that has been supplied. One problem is that there are a lot of data points (12500). Is it possible to copy these points across without having to drag the mouse down over all 12500 points? Someone has actually tried dragging the mouse over all the points and said it didn't work anyway, but I don't really want to try it myself, as it would be way too cumbersome for my liking, even if it did work.
Here is an example of what the 1 x 1 double timeseries looks like (just using 5 points instead of 12500 for simplicity's sake):
Here is the model with the 1D look up table highlighted in blue at the left:
Here is what the 1-D lookup table looks like when opened:
Any insight appreciated.
I've worked out how to copy the data from the timeseries table (actually from the input to this, which is a 1 x 1 struct), but the problem is that the values have no commas between them and the 1-D lookup table requires commas.
Note that the problem has now been solved using Excel, although not by the method I was trying to make work in the question. An answer has been posted which may work, but I'm not sure if I will go to the trouble of attempting to implement it at this stage. However, if need be and all being well, I will either do that or delete the question.
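The missing-commas problem described above can be worked around outside Simulink by reformatting the copied text. The following is only a minimal sketch in Python: the function name `to_comma_separated` and the sample values are hypothetical, and it assumes the copied values are plain numbers separated by spaces or newlines.

```python
def to_comma_separated(raw_text):
    """Turn whitespace-separated numbers into a bracketed, comma-separated list."""
    # split() with no arguments handles spaces, tabs and newlines alike
    values = [float(token) for token in raw_text.split()]
    # repr() keeps the full precision of each float when writing it back out
    return "[" + ", ".join(repr(v) for v in values) + "]"


# Hypothetical sample standing in for the 12500 copied points
pasted = "0.1 0.25\n0.5 1.0 2.0"
breakpoints = to_comma_separated(pasted)
print(breakpoints)
```

The resulting string can then be pasted into a field that expects a comma-separated vector.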
You can import a look-up table object (Simulink.LookupTable object) either from the MATLAB workspace or directly from Excel.
If you want to automate it, it basically comes down to these two points:
"
Open the model containing the lookup table block and in the Modeling tab, select Model Settings.
In the Model Properties dialog box, in the Callbacks tab, click PostLoadFcn callback in the model callbacks list.
... the next time you open the model, Simulink® invokes the callback and imports the data.
"
After a little thought I feel like the answer is actually fairly trivial: it just can't be done. The input field of the look up table just can't hold 12500 points, most of which have 9 or so decimal places, so there's no way to do this.
This isn't to say there's no way to put the data into the model: this can be done using Excel.

No Model Summary For GLMs in Pyspark / SparkML

I'm familiarizing myself with Pyspark and SparkML at the moment. To do so I use the titanic dataset to train a GLM for predicting the 'Fare' in that dataset.
I'm following closely the Spark documentation. I do get a working model (which I call glm_fare) but when I try to assess the trained model using summary I get the following error message:
RuntimeError: No training summary available for this GeneralizedLinearRegressionModel
Why is this?
The code for training was as such:
glm_fare = GeneralizedLinearRegression(
    labelCol="Fare",
    featuresCol="features",
    predictionCol='prediction',
    family='gamma',
    link='log',
    weightCol='wght',
    maxIter=20
)
glm_fit = glm_fare.fit(training_df)
glm_fit.summary
Just in case someone comes across this question, I ran into this problem as well and it seems that this error occurs when the Hessian matrix is not invertible. This matrix is used in the maximization of the likelihood for estimating the coefficients.
The matrix is not invertible if one of the eigenvalues is 0, which occurs when there is multicollinearity in your variables. This means that one of the variables can be predicted with a linear combination of the other variables. Consequently, the effect of each of the variables cannot be identified with any significance.
A possible solution would be to find the variables that are (multi)collinear and remove one of them from the regression. Note however that multicollinearity is only a problem if you want to interpret the coefficients and not when the model is used for prediction.
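The link between collinearity and a non-invertible matrix can be illustrated with a small, self-contained sketch. It is not what Spark does internally: it uses the unweighted Gram matrix X^T X (the least-squares analogue of the Hessian, whereas a GLM additionally weights the rows) and a hypothetical four-row design matrix whose third column is the sum of the first two.

```python
from fractions import Fraction


def gram(X):
    """Return X^T X for a matrix given as a list of rows."""
    n_cols = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(n_cols)]
            for i in range(n_cols)]


def determinant(M):
    """Determinant via Gaussian elimination, using exact fractions."""
    A = [[Fraction(v) for v in row] for row in M]
    n = len(A)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero pivot
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot: the matrix is singular
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            det = -det  # a row swap flips the sign
        det *= A[col][col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
    return det


# Hypothetical design matrix: third column = first + second (perfectly collinear)
X_collinear = [[1, 2, 3], [2, 1, 3], [3, 5, 8], [4, 0, 4]]
# The same data with the redundant column removed
X_reduced = [row[:2] for row in X_collinear]

det_collinear = determinant(gram(X_collinear))
det_reduced = determinant(gram(X_reduced))
print(det_collinear, det_reduced)
```

A zero determinant means the matrix cannot be inverted; dropping the redundant column restores a non-zero determinant.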
The GeneralizedLinearRegressionModel docs note that a model may have no summary available.
However, you can do an initial check to avoid the error:
glm_fit.hasSummary, which is a public boolean. In PySpark it is exposed as a property, so it is read without parentheses.
Use it as
if glm_fit.hasSummary:
    print(glm_fit.summary)
Here is a direct link to the Pyspark source code
and the GeneralizedLinearRegressionTrainingSummary class source code and where the error is thrown
Make sure the input values for your one-hot encoder start from 0.
One mistake I made that caused the summary not to be created: I fed quarter (1, 2, 3, 4) directly to the one-hot encoder and got a vector of length 4 in which one column was always 0. I converted quarter to 0, 1, 2, 3 and the problem was solved.
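The effect described above can be reproduced with a small pure-Python imitation of index-based one-hot encoding. This is a sketch only, not Spark's implementation: the helper names `one_hot` and `zero_columns` are hypothetical, and it assumes the encoder sizes its output from the largest index and drops the last category, which is consistent with the vector of length 4 reported above.

```python
def one_hot(indices, drop_last=True):
    """Encode category indices as 0/1 rows, sized by the largest index seen."""
    size = max(indices) + 1                      # index 0 counts as a category
    width = size - 1 if drop_last else size      # optionally drop the last one
    rows = []
    for idx in indices:
        row = [0] * width
        if idx < width:                          # the dropped category stays all-zero
            row[idx] = 1
        rows.append(row)
    return rows


def zero_columns(rows):
    """Return the positions of columns that are 0 in every row."""
    return [c for c in range(len(rows[0])) if all(row[c] == 0 for row in rows)]


quarters_raw = [1, 2, 3, 4]                      # fed to the encoder as-is
quarters_shifted = [q - 1 for q in quarters_raw]  # re-based to start at 0

encoded_raw = one_hot(quarters_raw)
encoded_shifted = one_hot(quarters_shifted)
print(zero_columns(encoded_raw), zero_columns(encoded_shifted))
```

With the raw values, the column for the never-observed index 0 is constant zero, which is the kind of degenerate feature that can make the fit fail; after shifting, no such column remains.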

Tableau percentage change of same measure

I have added a very simple example Tableau here: https://drive.google.com/file/d/1yG7EdIrKrTklhWOEoAmM4xwSbFZpOTvf/view?usp=sharing
In this example what I would like to achieve is to have a % change column which shows the % change.
I've tried using Quick Table Calculations but I cannot find any calculation that gives me the % change. I've also tried a calculated field with IF statements, but the calculation returned blank cells.
P.S.
This is part of a more complex example in this thread: https://community.tableau.com/message/753038#753038
I have followed the answer in the link above and I was able to get to the point where I have the data showing up in two separate columns "Current Year" and "Prior Year".
But then I'm stuck on the supposedly easy step of simply calculating the % change between those two columns.
I was able to solve this issue by right clicking on the measure and selecting "Add Table Calculation".
Then I chose Calculation Type = "Percent Difference From", Compute Using = "Table (across)" and finally Relative to = "Previous".
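For readers who want to see the arithmetic behind that table calculation, here is a rough Python sketch. It assumes the calculation is (current - previous) / |previous| for each column relative to the one before it; the function name and the sample figures are hypothetical and are not taken from the linked workbook.

```python
def percent_difference_from_previous(values):
    """Return the relative change of each value versus the one before it."""
    if not values:
        return []
    changes = [None]  # the first column has no previous value to compare with
    for previous, current in zip(values, values[1:]):
        if previous == 0:
            changes.append(None)  # undefined when the previous value is zero
        else:
            changes.append((current - previous) / abs(previous))
    return changes


# Hypothetical "Prior Year" and "Current Year" totals, in column order
yearly_totals = [200.0, 250.0]
changes = percent_difference_from_previous(yearly_totals)
print(changes)
```

Multiplying the result by 100 gives the figure as a percentage.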

Delaunay Triangulation - Removing Triangles

I made a Delaunay Triangulation using Matlab version 2013. I want to remove some of the triangles, meaning canceling their connectivity, for example triangle number 760. How can I make this change? When I tried to edit the connectivity list:
dt.ConnectivityList(760 , :) = [];
I got the message:
Cannot assign values to the triangulation.
I thought about maybe copying specific fields to a different structure, but:
a. I'm not familiar with structures so I don't know how to do it right.
b. After I copy the structure, how can I get my triangles?
dt contains 3 fields: Points, ConnectivityList and Constraints (empty field).
A brief note on MATLAB objects. When you access a field for reading, you are basically doing get(obj, fieldname);. When you try to set a field as you are doing, you are actually calling set(obj, fieldname, new_value). Objects do not necessarily allow you to do these operations.
The triangulation object is read-only, so you will have to make copies of all the fields. If, as you mentioned, you would like to make a structure with similar fields, you can do as follows:
dts = struct('Points', dt.Points, 'ConnectivityList', dt.ConnectivityList);
Now you can edit the fields.
dts.ConnectivityList(760, :) = [];
You may be able to plot the new structure, but the methods of the delaunayTriangulation class will not be available to you.
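To plot the result, use trisurf, which takes the coordinates as separate vectors (for 2-D points, pass zeros as the height):
trisurf(dts.ConnectivityList, dts.Points(:,1), dts.Points(:,2), zeros(size(dts.Points,1),1));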
I was facing the same problem and found another solution. Instead of creating a new struct, just create an object of its superclass, i.e. the triangulation class, with the edited connectivity list.
Here is my code, where
P is the list of points
C is the constraints (optional)
dt=delaunayTriangulation(P,C); %created triangulation but dt won't let you change connectivity list
list=dt.ConnectivityList;
%your changes here
x=triangulation(list,dt.Points);
Now you can use x as triangulation object
triplot(x)

How to use Decision Tree Classification Matlab?

I have data in form of rows and columns where rows represent a record and column represents its attributes.
I also have the labels (classes) for those records.
I know about decision trees concept and I would like to use matlab for classification of unseen records using decision trees.
How can this be done? I followed this link but it's not giving me the correct output:
Decision Tree in Matlab
Essentially I want to construct a decision tree based on training data and then predict the labels of my testing data using that tree. Can someone please give me a good and working example for this ?
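I used the following code to achieve it, and it is working correctly:
function DecisionTreeClassifier(trainingFile, testingFile, labelsFile, outputFile)
    % Read the training records, their class labels and the unseen records
    training = csvread(trainingFile);
    labels = csvread(labelsFile);
    testing = csvread(testingFile);
    % Build the tree from the training data, then classify the test records
    tree = ClassificationTree.fit(training, labels);
    prediction = predict(tree, testing);
    % Write the predicted labels to the output file
    csvwrite(outputFile, prediction);
end
Note that newer MATLAB releases warn: ClassificationTree.fit will be removed in a future release. Use fitctree instead.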