I am working on a two-class classification problem with a highly imbalanced dataset. I am using the binary focal cross-entropy loss function to avoid overfitting, but I am getting this error. The snippet is enclosed here. Any suggestions would be greatly appreciated.
I am new to machine learning and have been working on a classification problem. After preprocessing, my model consistently shows poor accuracy, even after hyperparameter optimization. Can anyone suggest where I am going wrong? Thank you.
In preprocessing, I filled the null values with the mean, checked whether the data was normally distributed, and applied feature scaling after the train/test split.
What is the difference between closed-set and open-set classification problems? Please explain in terms of the face recognition problem.
Open-set classification is a problem of handling unknown classes that are not contained in the training dataset, whereas traditional classifiers assume that only known classes appear in the test environment.
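In face recognition terms, the difference can be sketched as follows. This is only a minimal illustration, assuming faces have already been turned into embedding vectors; the `gallery` names, the two-dimensional vectors, and the `threshold` value are all hypothetical. A closed-set identifier always returns the nearest enrolled identity, while one simple open-set variant rejects the probe as "unknown" when even the nearest identity is too far away.

```python
import math

def identify(embedding, gallery, threshold):
    """Return (closed_set_answer, open_set_answer) for one probe embedding."""
    best_name, best_dist = None, math.inf
    for name, reference in gallery.items():
        d = math.dist(embedding, reference)  # Euclidean distance
        if d < best_dist:
            best_name, best_dist = name, d
    # Closed-set: the probe is assumed to be one of the enrolled people.
    closed_answer = best_name
    # Open-set (one simple variant): reject if the nearest match is too far.
    open_answer = best_name if best_dist <= threshold else "unknown"
    return closed_answer, open_answer

# Hypothetical enrolled identities and probes, for illustration only.
gallery = {"alice": [0.0, 0.0], "bob": [1.0, 1.0]}
known_probe = identify([0.1, 0.0], gallery, threshold=0.5)
stranger_probe = identify([5.0, 5.0], gallery, threshold=0.5)
```

Here the stranger is forced onto the nearest enrolled identity in the closed-set case, but is rejected in the open-set case. Distance thresholding is just one possible rejection rule, not the only way to handle unknown classes.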
I am trying to use the following CNN architecture for semantic pixel classification. The code I am using is here
However, from my understanding this type of semantic segmentation network typically should have a softmax output layer for producing the classification result.
I could not find softmax used anywhere within the script. Here is the paper I am reading on this segmentation architecture. From Figure 2, I am seeing softmax being used. Hence I would like to find out why this is missing in the script. Any insight is welcome.
You are using quite complex code for training and inference. But if you dig a little, you'll see that the loss functions are implemented here and that your model is actually trained using the cross_entropy loss. Looking at the doc:
This criterion combines log_softmax and nll_loss in a single function.
For numerical stability it is better to "absorb" the softmax into the loss function and not to explicitly compute it by the model.
This is quite common practice: the model outputs "raw" predictions (aka "logits"), and the loss (aka criterion) applies the softmax internally.
If you really need the probabilities you can add a softmax on top when deploying your model.
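To illustrate why absorbing the softmax into the loss helps, here is a minimal pure-Python sketch (not the library's implementation). The function names are chosen for this example only. It computes the cross-entropy directly from logits via a stable log-softmax, and applies a separate softmax only when probabilities are needed.

```python
import math

def log_softmax(logits):
    """Stable log-softmax: shift by the max logit before exponentiating."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]

def cross_entropy_from_logits(logits, target):
    """Negative log-likelihood of the target class, computed from raw logits."""
    return -log_softmax(logits)[target]

def softmax(logits):
    """Probabilities, e.g. for use at deployment time."""
    return [math.exp(v) for v in log_softmax(logits)]

# Extreme logits: a naive softmax would call math.exp(1000.0),
# which raises OverflowError in Python.
logits = [1000.0, 0.0, -1000.0]
loss = cross_entropy_from_logits(logits, target=0)
probs = softmax(logits)
```

With these logits the loss and the probabilities stay finite, whereas exponentiating the raw logits first would overflow.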
I'm working on a 4-class classification problem. It's not particularly unbalanced, has no missing features, and has a lot of observations. Everything seems fine, but when I run the classification with fitcecoc, it assigns everything to the first class. I tried using fitclinear and fitcsvm on one-vs-all decomposed data, but got the same results. Do you have any clue what could cause this problem?
Here are a few recommendations:
Have you normalized your data? SVM is sensitive to features being on different scales.
Save the mean and std you compute on the training data and use those values to normalize the test samples during the prediction phase.
Change the C value and see if that changes the results.
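The normalization advice can be sketched as below. The question uses MATLAB, so this Python version is only an illustration of the idea; `fit_scaler`, `transform`, and the toy data are hypothetical names and values. The key point is that the mean and std come from the training set alone and are then reused for the test samples.

```python
import statistics

def fit_scaler(train_rows):
    """Compute per-feature mean and std from the training data only."""
    columns = list(zip(*train_rows))
    means = [statistics.mean(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds

def transform(rows, means, stds):
    """Standardize rows with previously saved statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Toy data with two features on very different scales.
train = [[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]]
test = [[2.0, 4000.0]]

means, stds = fit_scaler(train)                 # saved during training
train_scaled = transform(train, means, stds)
test_scaled = transform(test, means, stds)      # reuses the training statistics
```

After scaling, both training features have zero mean and unit variance, so neither dominates the SVM's distance or margin computation purely because of its units.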
I hope these help.
I'm working on a feed-forward backpropagation network in C++ but cannot seem to make it work properly. The network I'm basing mine on is using the cross-entropy error function. However, I'm not very familiar with it and even though I'm trying to look it up I'm still not sure. Sometimes it seems easy, sometimes difficult. The network will solve a multinomial classification problem and as far as I understand, the cross-entropy error function is suitable for these cases.
Does anyone know how it works?
Ah yes, good ol' backpropagation. The joy of it is that it doesn't really matter (implementation-wise) what error function you use, so long as it is differentiable. Once you know how to calculate the cross entropy for each output unit (see the wiki article), you simply take the partial derivative of that function with respect to the weights to get the weight updates for the hidden layer, and once again for the input layer.
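As a concrete starting point, here is a minimal sketch of the output-layer step. It assumes a softmax output layer and a one-hot target, which the question does not state explicitly, and it is written in Python rather than C++ for brevity. Under those assumptions the derivative of the cross-entropy with respect to each logit reduces to (probability minus target), which the code checks against finite differences.

```python
import math

def softmax(z):
    """Softmax with the usual max-shift for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(z, y):
    """Cross-entropy between one-hot target y and softmax(z)."""
    p = softmax(z)
    return -sum(t * math.log(q) for t, q in zip(y, p) if t > 0)

def grad_wrt_logits(z, y):
    """Analytic gradient dE/dz_i = p_i - y_i (softmax + cross-entropy)."""
    return [q - t for q, t in zip(softmax(z), y)]

z = [2.0, 1.0, 0.1]   # example output-layer logits
y = [0.0, 1.0, 0.0]   # one-hot target
analytic = grad_wrt_logits(z, y)

# Central finite differences as a sanity check on the analytic gradient.
eps = 1e-6
numeric = []
for i in range(len(z)):
    up, down = z[:], z[:]
    up[i] += eps
    down[i] -= eps
    numeric.append((cross_entropy(up, y) - cross_entropy(down, y)) / (2 * eps))
```

This gradient at the output layer is what gets propagated backwards through the hidden layers by the chain rule. A finite-difference check like this is also a handy way to debug a hand-written backpropagation implementation.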
However, if your question isn't about implementation, but rather about training difficulties, then you have your work cut out for you. Different error functions are good at different things (best to just reason it out based on the error function's definition) and this problem is compounded by other parameters like learning rates.
Hope that helps, let me know if you need any other info; your question was a lil vague...