How to set a class_weight Dictionary for Random Forest? - classification

I'm dealing with an unbalanced dataset, so I decided to use a weight dictionary for classification.
The documentation says that a weight dict must be defined as shown below:
https://imbalanced-learn.org/stable/generated/imblearn.ensemble.BalancedRandomForestClassifier.html
weight_dict = [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}]
Since I want to predict 12 classes, which are located in the last column, I assume that the setting would be like:
weight_dict = [{0: 1, 1: 5.77390289e-01}, {0: 1, 1: 6.48317326e-01},
{0: 1, 1: 1.35324885e-01}, {0: 1, 1: 2.92665797e+00},
{0: 1, 1: 5.77858906e+01}, {0: 1, 1: 1.73193507e+00},
{0: 1, 1: 9.27828244e+00}, {0: 1, 1: 1.18766082e+01},
{0: 1, 1: 8.99009985e+01}, {0: 1, 1: 6.39833279e+00},
{0: 1, 1: 2.55347077e+01}, {0: 1, 1: 9.47015372e+02}]
Honestly, I don't clearly understand the notation of the first indicators, I mean the:
0:1 of {0: 1, 1: 1}
or the:
1: value.
Do they represent column position or label order?
What is the right way to set it?
I'll be grateful for your insights.

I don't clearly understand the notation of the first indicators 0:1 of {0: 1, 1: 1}
The notation is {<class label> : <count>}. The class label is in its original (i.e. untransformed) representation.
For example, the following would order the generation of an Iris training set that contains 25 samples of "setosa", and 50 samples of "versicolor" and "virginica" each:
weight_dict = {"setosa" : 25, "versicolor" : 50, "virginica" : 50}
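If the goal is to weight classes rather than to resample them, scikit-learn documents class_weight for a single target column as one dict of {class label: weight}; the list-of-dicts form quoted in the question is meant for multi-output problems, with one dict per output column. The following is a minimal sketch of building such a single dict, assuming the weights should follow the "balanced" heuristic n_samples / (n_classes * class_count). The helper name balanced_class_weights and the toy labels are hypothetical, not taken from the question.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Map each class label to n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Toy target column with three imbalanced classes (illustrative only).
y = [0] * 6 + [1] * 3 + [2] * 1
weights = balanced_class_weights(y)

# One dict keyed by the class labels themselves; the rarest class
# receives the largest weight.
for label in sorted(weights):
    print(label, round(weights[label], 3))
```

A dict of this shape could then be passed as class_weight=weights to a classifier that accepts that parameter (not run here); with 12 classes it would simply have 12 keys.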

Related

How should you test the significance of 2 classification accuracy scores: paired permutation test

I have a single trained classifier tested on 2 related multiclass classification tasks. As each trial of one classification task is related to the corresponding trial of the other, the 2 sets of predictions constitute paired data. I would like to run a paired permutation test to find out if the difference in classification accuracy between the 2 prediction sets is significant.
So my data consists of 2 lists of predicted classes, where each prediction is related to the prediction in the other test set at the same index.
Example:
actual_classes = [1, 3, 6, 1, 22, 1, 11, 12, 9, 2]
predictions1 = [1, 3, 6, 1, 22, 1, 11, 12, 9, 10] # 90% acc.
predictions2 = [1, 3, 7, 10, 22, 1, 7, 12, 2, 10] # 50% acc.
H0: There is no significant difference in classification accuracy.
How do I go about running a paired permutation test to test significance of the difference in classification accuracy?
I have been thinking about this and I'm going to post a proposed solution and see if someone approves or explains why I'm wrong.
actual_classes = [1, 3, 6, 1, 22, 1, 11, 12, 9, 2]
predictions1 = [1, 3, 6, 1, 22, 1, 11, 12, 9, 10] # 90% acc.
predictions2 = [1, 3, 7, 10, 22, 1, 7, 12, 2, 10] # 50% acc.
paired_predictions = [[1,1], [3,3], [6,7], [1,10], [22,22], [1,1], [11,7], [12,12], [9,2], [10,10]]
actual_test_statistic = accuracy(predictions1) - accuracy(predictions2) # 0.9 - 0.5 = 0.4
all_simulations = [] # empty list
for each of number_of_iterations:
    shuffle(paired_predictions) # only shuffle between pairs, not within
    simulated_predictions1 = paired_predictions[first prediction of each pair]
    simulated_predictions2 = paired_predictions[second prediction of each pair]
    simulated_accuracy1 = proportion of times simulated_predictions1 equals actual_classes
    simulated_accuracy2 = proportion of times simulated_predictions2 equals actual_classes
    all_simulations.append(simulated_accuracy1 - simulated_accuracy2) # Put the simulated difference in the list
p = count(absolute(all_simulations) > absolute(actual_test_statistic)) / number_of_iterations
If you have any thoughts, let me know in the comments. Or better still, provide your own corrected version in your own answer. Thank you!
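For comparison, here is a minimal runnable sketch of the usual form of a paired permutation test. It differs from the proposal above in one respect: instead of shuffling whole pairs across trials, it randomly swaps the two predictions within each pair, which keeps every pair aligned with its actual class. The function name paired_permutation_test, the iteration count, the seed, and the "+1" correction in the p-value are choices made for this sketch, not something stated in the question.

```python
import random


def paired_permutation_test(actual, pred1, pred2, n_iter=10000, seed=0):
    """Two-sided paired permutation test on the difference in accuracy."""
    rng = random.Random(seed)
    # Per-trial correctness (1 = correct, 0 = wrong) for each prediction set.
    c1 = [int(p == a) for p, a in zip(pred1, actual)]
    c2 = [int(p == a) for p, a in zip(pred2, actual)]
    n = len(actual)
    # Work with counts of correct trials to avoid float comparisons.
    observed = sum(c1) - sum(c2)
    extreme = 0
    for _ in range(n_iter):
        diff = 0
        for a, b in zip(c1, c2):
            # Under H0 the two members of a pair are exchangeable,
            # so swap them with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            diff += a - b
        if abs(diff) >= abs(observed):
            extreme += 1
    # Count the observed labelling as one of the permutations.
    p_value = (extreme + 1) / (n_iter + 1)
    return observed / n, p_value


actual_classes = [1, 3, 6, 1, 22, 1, 11, 12, 9, 2]
predictions1 = [1, 3, 6, 1, 22, 1, 11, 12, 9, 10]
predictions2 = [1, 3, 7, 10, 22, 1, 7, 12, 2, 10]

acc_diff, p = paired_permutation_test(actual_classes, predictions1, predictions2)
print(acc_diff, p)
```

Only trials where exactly one of the two prediction sets is correct can change the statistic, so with very few such trials (four in this example) the smallest attainable p-value is limited.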

How to find duplicate elements length in array flutter?

I want to implement an add-to-checkout feature in which the number of items added is displayed. The plus button adds elements to a list and the minus button removes elements from it. The goal is just to display each item added and its quantity. I have added the items to a list and want to count how many times each duplicate item occurs. How can we do that in Flutter?
Here is your solution (null safe):
void main() {
  List<int> items = [1, 1, 1, 2, 3, 4, 5, 5];
  Map<int, int> count = {};
  items.forEach((i) => count[i] = (count[i] ?? 0) + 1);
  print(count.toString()); // {1: 3, 2: 1, 3: 1, 4: 1, 5: 2}
}

PostGIS conditional aggregation - presence/absence matrix

I have a dataset that resembles the following:
site_id, species
1, spp1
2, spp1
2, spp2
2, spp3
3, spp2
3, spp3
4, spp1
4, spp2
I want to create a table like this:
site_id, spp1, spp2, spp3, spp4
1, 1, 0, 0, 0
2, 1, 1, 1, 0
3, 0, 1, 1, 0
4, 1, 1, 0, 0
This question was asked here; however, my list of species is much longer, so writing a massive query that lists each species manually would take a significant amount of time. I would therefore like a solution that does not require this and can instead read from the existing species list.
In addition, when I experimented with that query, the count() function kept accumulating, so I ended up with values greater than 1 where the same species appeared multiple times in a site_id. Ideally I want a binary 1 or 0 output.
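To make the two requirements concrete (columns taken from a species list rather than typed by hand, and binary values rather than counts), here is a small sketch of the intended logic in Python. It is not a PostgreSQL/PostGIS solution; the function name presence_absence and the in-memory records are assumptions for illustration only.

```python
def presence_absence(records, species_list):
    """Build one 0/1 row per site from (site_id, species) records."""
    seen = {}
    for site_id, species in records:
        # A set collapses repeated observations, so values never exceed 1.
        seen.setdefault(site_id, set()).add(species)
    return {
        site_id: [1 if sp in present else 0 for sp in species_list]
        for site_id, present in sorted(seen.items())
    }


records = [
    (1, "spp1"),
    (2, "spp1"), (2, "spp2"), (2, "spp3"),
    (3, "spp2"), (3, "spp3"),
    (4, "spp1"), (4, "spp2"),
    (4, "spp2"),  # duplicate observation, still recorded as 1
]
# The column order comes from an existing species list, which may
# include species that were never observed (spp4 here).
species_list = ["spp1", "spp2", "spp3", "spp4"]

matrix = presence_absence(records, species_list)
for site_id, row in matrix.items():
    print(site_id, row)
```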

How to compare 2 sets of different date which contains 2 different sets of data?

I have 2 sets of dates. Their first and last dates are the same, but the dates in between may differ. DateA and DateB each have a value on every date, stored in arrays A and B respectively.
DateA= '2016-01-01'
'2016-01-02'
'2016-01-04'
'2016-01-05'
'2016-01-06'
'2016-01-07'
'2016-01-08'
'2016-01-09'
'2016-01-10'
'2016-01-12'
'2016-01-13'
'2016-01-14'
'2016-01-16'
'2016-01-17'
'2016-01-18'
'2016-01-19'
'2016-01-20'
DateB= '2016-01-01'
'2016-01-02'
'2016-01-03'
'2016-01-04'
'2016-01-05'
'2016-01-09'
'2016-01-10'
'2016-01-11'
'2016-01-12'
'2016-01-13'
'2016-01-15'
'2016-01-16'
'2016-01-17'
'2016-01-19'
'2016-01-20'
A = [5, 2, 3, 4, 6, 1, 7, 9, 3, 6, 1, 7, 9, 2, 1, 4, 6]
B = [4, 2, 7, 1, 8, 4, 9, 5, 3, 9, 3, 6, 7, 2, 9]
I have converted the dates into date numbers, i.e.
datenumberA= 736330
736331
736333
736334
736335
736336
736337
736338
736339
736341
736342
736343
736345
736346
736347
datenumberB= 736330
736331
736332
736333
736334
736338
736339
736340
736341
736342
736344
736345
736346
736348
736349
Now I want to compare the value of A on DateA(n) with the value of B on the DateB date that is closest to and before DateA(n).
For example,
comparing the value of A on DateA '2016-01-12' with that of B on DateB '2016-01-11'.
Please help and thanks a lot.
It'll get you the desired output!
all_k=0;
out(1)=1; % not comparing the first index as you mentioned
for n=2:size(datenumberA,1)
    j=0;
    while 1
        k=find(datenumberB+j==datenumberA(n)-1); % finding the index of DateB closest to and before DateA(n)
        if size(k,1)==1 break; end % if found, come out of the while loop
        j=j+1; % otherwise keep adding 1 to the values of datenumberB until found
    end
    if size(find(all_k==k),2) ~=1 % to avoid if any DateB is already compared
        out(end+1)=A(n)> B(k); % comparing the value in A with the corresponding value in B
        all_k(end+1)=k; % storing which indices of DateB are already compared
    end
end
out' % Output
Output:-
ans =
1
0
0
1
0
0
1
0
0
1
0
0
1
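The lookup at the heart of this question, finding the DateB entry closest to and strictly before a given DateA entry, can also be done with a binary search on the sorted dates. Below is a minimal sketch in Python rather than MATLAB; the function name closest_before is hypothetical, and it covers only the lookup step, not the "skip already compared" rule used in the loop above.

```python
from bisect import bisect_left
from datetime import date


def closest_before(sorted_dates, target):
    """Return the index of the latest date strictly before target, or None."""
    i = bisect_left(sorted_dates, target) - 1
    return i if i >= 0 else None


# DateB and B as given in the question.
dates_b = [date.fromisoformat(s) for s in (
    "2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04", "2016-01-05",
    "2016-01-09", "2016-01-10", "2016-01-11", "2016-01-12", "2016-01-13",
    "2016-01-15", "2016-01-16", "2016-01-17", "2016-01-19", "2016-01-20",
)]
values_b = [4, 2, 7, 1, 8, 4, 9, 5, 3, 9, 3, 6, 7, 2, 9]

# The question's example: DateA '2016-01-12' pairs with DateB '2016-01-11'.
idx = closest_before(dates_b, date(2016, 1, 12))
print(dates_b[idx], values_b[idx])

# The first date has nothing before it, so there is no match.
print(closest_before(dates_b, date(2016, 1, 1)))
```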

Google Chart, how to move annotation on top of columns

I'm using Google Charts' stacked column chart. What I want to achieve is to display the total on top of each column, and I'm using an annotation for this. As you can see in the image, only the annotation on the 5th column (1,307.20) is working as expected.
From what I can tell, this seems to be a bug in Google Charts, which can be explained as follows:
[[Date, Car, Motobike, {role: :annotation}],
[June 2015, 500, 0, 500],
[Feb 2015, 500, 600, 1100],
[March 2015, 700, 0, 700]]
With the above data, the annotation for Feb 2015 is the only one displayed correctly. The other two are not, since the last value of each is 0. When I change the last value to 1 for June and March, the annotation is displayed correctly.
So I thought of a workaround: always display the "non-zero" data on top. Here's the result:
The annotations are moved to the top properly, but as you can see, they are located within the column, and what I want is to move them above the column.
I've been stuck on this for a while, and the Google documentation doesn't help much with this case. Any help would be highly appreciated.
I had the same problem: some of my series had 0 as the last value, so the label would show on the X axis instead of at the top. With dynamic data it would be a real challenge to ensure the last value was never 0. @dlaliberte gave me a hint where to start with this comment:
"As a workaround, you might consider using a ComboChart with an extra
series to draw a point at the top of each column stack. You'll have to
compute the total of the other series yourself to know where to put
each point."
I found a combo chart in Google's gallery and opened jsfiddle to see what I could do. I left the data mostly as it was, but changed the series name labels and made the numbers a little simpler. Don't get caught up on the purpose of the graph; the data is irrelevant. I just wanted to figure out how to get my annotation to the top of the graph even when the last column was 0 (https://jsfiddle.net/L5wc8rcp/1/):
function drawVisualization() {
  // Some raw data (not necessarily accurate)
  var data = google.visualization.arrayToDataTable([
    ['Month', 'Bolivia', 'Ecuador', 'Madagascar', 'Papua New Guinea', 'Rwanda', 'Total', {type: 'number', role: 'annotation'}],
    ['Application', 5, 2, 2, 8, 0, 17, 17],
    ['Friend', 4, 3, 5, 6, 2, 20, 20],
    ['Newspaper', 6, 1, 0, 2, 0, 9, 9],
    ['Radio', 8, 0, 8, 1, 1, 18, 18],
    ['No Referral', 2, 2, 3, 0, 6, 13, 13]
  ]);
  var options = {
    isStacked: true,
    title : 'Monthly Coffee Production by Country',
    vAxis: {title: 'Cups'},
    hAxis: {title: 'Month'},
    seriesType: 'bars',
    series: {5: {type: 'line'}},
  };
  var chart = new google.visualization.ComboChart(document.getElementById('chart_div'));
  chart.draw(data, options);
}
That produced this graph, which is a great start:
As you can see, since series 5 (our Total of the other series) is of type: 'line', it will always point to the top of the stack. Now, I didn't necessarily want the line in my chart, since it was not used to compare continuous horizontal totals, so I updated series 5 with lineWidth: 0, and then made the title of that category '' so that it wouldn't be included in the legend as a stack (https://jsfiddle.net/Lpgty7rq/):
function drawVisualization() {
  // Some raw data (not necessarily accurate)
  var data = google.visualization.arrayToDataTable([
    ['Month', 'Bolivia', 'Ecuador', 'Madagascar', 'Papua New Guinea', 'Rwanda', '', {type: 'number', role: 'annotation'}],
    ['Application', 5, 2, 2, 8, 0, 17, 17],
    ['Friend', 4, 3, 5, 6, 2, 20, 20],
    ['Newspaper', 6, 1, 0, 2, 0, 9, 9],
    ['Radio', 8, 0, 8, 1, 1, 18, 18],
    ['No Referral', 2, 2, 3, 0, 6, 13, 13]
  ]);
  var options = {
    isStacked: true,
    title : 'Monthly Coffee Production by Country',
    vAxis: {title: 'Cups'},
    hAxis: {title: 'Month'},
    seriesType: 'bars',
    series: {5: {type: 'line', lineWidth: 0}},
  };
  var chart = new google.visualization.ComboChart(document.getElementById('chart_div'));
  chart.draw(data, options);
}
And Voila!
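In the tables above the Total and annotation columns were typed in by hand. With dynamic data they would need to be computed before the rows reach arrayToDataTable. The following is a small sketch of that step, written in Python on the assumption that the rows are prepared server-side; the function name with_totals is hypothetical.

```python
def with_totals(rows):
    """Append each stack's total twice: as the line-series value and as its annotation."""
    out = []
    for label, *values in rows:
        total = sum(values)
        out.append([label, *values, total, total])
    return out


# The five series values per row, taken from the example above.
rows = [
    ["Application", 5, 2, 2, 8, 0],
    ["Friend", 4, 3, 5, 6, 2],
    ["Newspaper", 6, 1, 0, 2, 0],
    ["Radio", 8, 0, 8, 1, 1],
    ["No Referral", 2, 2, 3, 0, 6],
]

for row in with_totals(rows):
    print(row)
```

Because the total is its own series, its position no longer depends on whether the last stacked value happens to be 0.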
Use alwaysOutside: true.
annotations: {
  textStyle: {
    color: 'black',
    fontSize: 11,
  },
  alwaysOutside: true
}
You will want to use the annotations.alwaysOutside option:
annotations.alwaysOutside -- In Bar and Column charts, if set to true,
draws all annotations outside of the Bar/Column.
See https://google-developers.appspot.com/chart/interactive/docs/gallery/columnchart
However, with a stacked chart, the annotations are currently always forced to be inside the columns. This will be fixed in the next major release.
As a workaround, you might consider using a ComboChart with an extra series to draw a point at the top of each column stack. You'll have to compute the total of the other series yourself to know where to put each point. Then make the pointSize 0, and add the annotation column after this series.