Is TabNet's mask explanation biased? - neural-network

I have been reading about the TabNet model and the way its predictions are "explained" through the attentive transformers' mask values.
However, if the input values are not normalized, aren't these mask values simply a normalization scalar (and not a measure of feature importance)?
E.g.: if a feature Time is expressed in days and has a mean mask value of 1/365, couldn't it mean that the mask is simply normalizing the feature?
Let me know if I wasn't clear in my question.
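One practical way to test this suspicion, if you can re-train, is to standardize each feature first, so that no mask value can be acting as a unit-conversion scalar. A minimal z-score sketch (pure Python, hypothetical values, not TabNet-specific; in practice something like sklearn's StandardScaler does the same job):

```python
# Z-score standardization: after this, a mask value of 1/365 can no longer
# be explained away as "just rescaling days to years".
def zscore(xs):
    m = sum(xs) / len(xs)
    sd = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - m) / sd for x in xs]

time_in_days = [30.0, 180.0, 365.0, 365.0]  # hypothetical raw "Time" feature
print(zscore(time_in_days))                 # zero mean, unit variance
```

If the mask pattern looks similar before and after standardization, the masks are more plausibly reflecting feature importance rather than compensating for scale.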

Related

In tf.keras is there a way to mask non-zero inputs

I'm trying to build an autoencoder where some of the relevant input can be zero, and non-relevant input will have a distinct unused value such as -999 or -np.inf.
I know how to use a Masking layer to mask timesteps where all inputs equal the mask value, or an Embedding layer with mask_zero=True to mask zero-valued inputs in a timestep.
My question is whether there is an Embedding-layer-equivalent solution for non-zero values.
Thanks!
P.S. I realise I could replace all legitimate zero values with some other value, replace all non-relevant values with zeros, and use an Embedding layer with mask_zero=True as is, but I'm looking for a more straightforward solution.
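For what it's worth, the remapping trick from the P.S. can be done in a couple of lines before the data reaches the model. A sketch (pure Python; `SENTINEL` and `ZERO_SUBSTITUTE` are my own hypothetical names and values):

```python
SENTINEL = -999         # marker for non-relevant entries in the raw data
ZERO_SUBSTITUTE = 1e-6  # hypothetical stand-in for legitimate zeros

def remap(row):
    # Move the sentinel to 0 (maskable via mask_zero=True / Masking(mask_value=0))
    # and move legitimate zeros to an otherwise unused value.
    out = []
    for v in row:
        if v == SENTINEL:
            out.append(0.0)
        elif v == 0:
            out.append(ZERO_SUBSTITUTE)
        else:
            out.append(float(v))
    return out

print(remap([1.0, 0.0, -999, 2.5]))  # [1.0, 1e-06, 0.0, 2.5]
```

Note also that tf.keras.layers.Masking accepts an arbitrary mask_value (e.g. Masking(mask_value=-999.)), but as you said, it only masks timesteps where every feature equals that value, not individual entries.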

Implementation of Radon transform in Matlab, output size

Due to the nature of my problem, I want to evaluate the numerical implementations of the Radon transform in MATLAB (i.e. different interpolation methods give different numerical values).
While trying to code my own radon and comparing it to MATLAB's output, I found that my radon projection sizes are different from MATLAB's.
So, a bit of intuition on how I compute the number of radon samples needed. Let's do the 2D case.
The idea is that the maximum size occurs when the diagonal (of a rectangular image, at least) is projected in the radon transform, so diago=sqrt(size(I,1)^2+size(I,2)^2). As we don't want anything left out, n_r=ceil(diago). n_r should be the number of discrete samples the radon transform needs to ensure no data is left out.
I noticed that MATLAB's radon output size is always odd, which makes sense, as you would always want a "ray" through the rotation center. And I noticed that there are 2 zeros at the endpoints of the array in all cases.
So in that case, n_r=ceil(diago)+mod(ceil(diago)+1,2)+2;
However, it seems that I get small discrepancies with Matlab.
A MWE:
% Try: 255,256
pixels=256;
I=phantom('Modified Shepp-Logan',pixels);
rd=radon(I,45); % radon expects theta in degrees (the output size is unaffected by theta anyway)
size(rd,1)
s=size(I);
diagsize=sqrt(sum(s.^2));
n_r=ceil(diagsize)+mod(ceil(diagsize)+1,2)+2
size(rd,1) gives 367, while n_r = 365.
As MATLAB's Radon transform is a function I cannot look into, I wonder what could cause this discrepancy.
I took another look at the problem and I believe this is actually the right answer. From the "hidden documentation" of radon.m (type edit radon.m and scroll to the bottom):
Grandfathered syntax
R = RADON(I,THETA,N) returns a Radon transform with the
projection computed at N points. R has N rows. If you do not
specify N, the number of points the projection is computed at
is:
2*ceil(norm(size(I)-floor((size(I)-1)/2)-1))+3
This number is sufficient to compute the projection at unit
intervals, even along the diagonal.
I did not try to rederive this formula, but I think this is what you're looking for.
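The two formulas can be compared directly. Below is my own Python transcription (function names are mine); it reproduces the 367-vs-365 discrepancy for a 256x256 image, and shows the two agree for 255x255 (hence the "Try: 255,256" comment in the MWE):

```python
import math

def matlab_radon_rows(h, w):
    # 2*ceil(norm(size(I) - floor((size(I)-1)/2) - 1)) + 3, from radon.m
    a = h - math.floor((h - 1) / 2) - 1
    b = w - math.floor((w - 1) / 2) - 1
    return 2 * math.ceil(math.hypot(a, b)) + 3

def question_formula(h, w):
    # n_r = ceil(diago) + mod(ceil(diago)+1, 2) + 2, from the question
    d = math.ceil(math.sqrt(h * h + w * w))
    return d + (d + 1) % 2 + 2

print(matlab_radon_rows(256, 256), question_formula(256, 256))  # 367 365
print(matlab_radon_rows(255, 255), question_formula(255, 255))  # 363 363
```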
This is a fairly specialized question, so I'll offer an idea without being completely sure it answers your specific problem (normally I would pass and let someone else answer, but I'm not sure how many Stack Overflow readers have studied the Radon transform). I think what you might be overlooking is the floor function in the documentation for the radon function call. From the doc:
The radial coordinates returned in xp are the values along the x'-axis, which is
oriented at theta degrees counterclockwise from the x-axis. The origin of both
axes is the center pixel of the image, which is defined as
floor((size(I)+1)/2)
For example, in a 20-by-30 image, the center pixel is (10,15).
This gives different behavior for odd- or even-sized problems that you pass in. Hence, in your example ("Try: 255, 256"), you would need a different case for odd versus even, and this might involve (in effect) padding with a row and column of zeros.
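That floor((size(I)+1)/2) convention is easy to check numerically (my own quick transcription of the doc's formula; MATLAB indices are 1-based):

```python
def center_pixel(h, w):
    # floor((size(I)+1)/2) in MATLAB's 1-based indexing
    return ((h + 1) // 2, (w + 1) // 2)

print(center_pixel(20, 30))  # (10, 15), as in the documentation
print(center_pixel(21, 31))  # (11, 16): the true geometric center for odd sizes
```

For even sizes the "center" pixel sits half a pixel off the geometric center, which is exactly the odd/even asymmetry described above.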

Image Parameters (Standard Deviation, Mean and Entropy) of an RGB Image

I couldn't find an answer for RGB images.
How can one get the standard deviation, mean and entropy of an RGB image using MATLAB?
From http://airccse.org/journal/ijdms/papers/4612ijdms05.pdf TABLE3, it seems the author got a single value; did he take the average of the RGB values?
I'd really appreciate any help.
After reading the paper, because you are dealing with colour images, you have three channels of information to access. This means that you could alter one of the channels for a colour image and it could still affect the information it's trying to portray. The author wasn't very clear on how they were obtaining just a single value to represent the overall mean and standard deviation. Quite frankly, because this paper was published in a no-name journal, I'm not surprised how they managed to get away with it. If this was attempted to be published in more well known journals (IEEE, ACM, etc.), this would probably be rejected outright due to that very ambiguity.
As I interpret this procedure, averaging all three channels doesn't make sense, because you want to capture the differences over all channels. Averaging will smear that information, and those differences get lost. Practically speaking, if you averaged all three channels and one channel changed its intensity by 1, the reported change in the average would be so small that it probably would not register as a meaningful difference.
In my opinion, what you should perhaps do is treat the entire RGB image as a 1D signal, then perform the mean, standard deviation and entropy of that image. As such, given an RGB image stored in image_rgb, you can unroll the entire image into a 1D array like so:
image_1D = double(image_rgb(:));
The double casting is important because you want to maintain floating point precision when calculating the mean and standard deviation. The images will probably be of an unsigned integer type, and so this casting must be done to maintain floating point precision. If you don't do this, you may have calculations that get saturated or clamped beyond the limits of that data type and you won't get the right answer. As such, you can calculate the mean, standard deviation and entropy like so:
m = mean(image_1D);
s = std(image_1D);
e = entropy(image_1D);
entropy is a function in MATLAB that calculates the entropy of images, so you should be fine here. As noted by @CitizenInsane in his answer, entropy unrolls a grayscale image into a 1D vector and applies the Shannon definition of entropy on it. By the same token, you can do the same thing with an RGB image, but we have already unrolled the signal into a 1D vector anyway, so the input to entropy will be well suited for the unrolled RGB image.
I have no idea how the author actually did it, but what you could do is treat the image as a 1D array of length W×H×3 and then simply calculate the mean and standard deviation.
I don't know if Table 3 was obtained the same way, but at least looking at the entropy routine in MATLAB's Image Processing Toolbox, the RGB values are vectorized into a single vector:
I = imread('rgb'); % Read RGB values
I = I(:); % Vectorization of RGB values
p = imhist(I); % Histogram
p(p == 0) = []; % remove zero entries in p
p = p ./ numel(I); % normalize p so that sum(p) is one.
E = -sum(p.*log2(p));
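For comparison, the same vectorize-then-histogram pipeline in pure Python (toy data, not values from the paper):

```python
import math
from collections import Counter

def shannon_entropy(values):
    # Mirrors the MATLAB snippet: histogram, drop empty bins,
    # normalize to probabilities, then E = -sum(p .* log2(p)).
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

pixels = [0, 0, 255, 255]       # unrolled toy "image": two equally likely values
print(shannon_entropy(pixels))  # 1.0 bit
```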

MATLAB edge function

I have a question regarding the parameters in the edge function.
edge(img,'sobel',threshold);
edge(img,'prewitt',threshold) ;
edge(img,'roberts',threshold);
edge(img,'canny',thresh_canny,sigma);
How should the threshold for the first 3 types be chosen? Is there something that can help choose this threshold (a histogram, for instance)? I am aware of the function graythresh, but I want to set it manually. So far I know it's a value between 0 and 1, but I don't know how to interpret it.
Same thing for Canny. I'm trying to input an array thresh_canny = [low_limit, high_limit], but I don't know how to choose these values. How does the sigma value influence the image?
It really depends on the type of edge you want to see in the output. If you want to keep only really strong edges, use a value at the higher end of the threshold range (say 0.9-1); the threshold is relative to the highest gradient magnitude of the image.
As far as sigma is concerned, it is used to smooth the input image before edge detection, which reduces noise in the input image.
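To make the "relative to the highest gradient magnitude" point concrete, here is a toy pure-Python sketch (not MATLAB's actual implementation, whose scaling differs in details): compute the Sobel gradient magnitude, normalize by its peak, and keep pixels above the threshold.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def conv_at(img, ker, r, c):
    # 3x3 correlation centered at (r, c)
    return sum(ker[i][j] * img[r + i - 1][c + j - 1]
               for i in range(3) for j in range(3))

def edge_map(img, t):
    h, w = len(img), len(img[0])
    mag = [[math.hypot(conv_at(img, SOBEL_X, r, c), conv_at(img, SOBEL_Y, r, c))
            for c in range(1, w - 1)] for r in range(1, h - 1)]
    peak = max(max(row) for row in mag) or 1.0
    # t in [0, 1] is interpreted relative to the strongest edge in the image
    return [[m / peak >= t for m in row] for row in mag]

step = [[0, 0, 1, 1, 1] for _ in range(5)]  # vertical step edge
print(edge_map(step, 0.5))  # only the columns straddling the step are flagged
```

Raising t toward 1 keeps only the strongest edges; for Canny, a larger sigma blurs the image more aggressively first, suppressing fine detail along with the noise.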

Fill areas in a matrice

As you can see in this figure, I have 3 "lines" that are not linear, with values of zeros (or NaNs) in between.
What I would like is to create a picture with 3 filled areas, each holding a single value obtained by averaging all the values in its "line" section, and to fill the area of zeros below with that value.
Averaging is not the problem, and using a fill or patch command is not the problem either. My problem is how to identify a piece of data that is not linear or homogeneous in its content by its shape, and to efficiently create three slices of data, because the real values in the matrix must stay.
I will appreciate any idea!
Thanks
It's a little tough to tell from a colormap like that, but it looks like the inhomogeneities are considerably worse along the columns. If that is the case, then you may be best off comparing the values of each column to the two adjacent ones and creating a bias index.
Just taking a stab at it - I'll use inhomoge as shorthand for your inhomogeneous image.
% predict each column from the average of its two neighboring columns
predictedInhomoge=(inhomoge(:,1:end-2)+inhomoge(:,3:end))/2;
biasImage=(predictedInhomoge-inhomoge(:,2:end-1))./(inhomoge(:,2:end-1)+predictedInhomoge);
image(255*biasImage/max(biasImage(:)));
If your blue colored data are actually zeros and not just small values, the biasImage above will have a lot of pointless Inf values. If that's the case, just drop the denominator from the biasImage equation so that it is no longer normalized.
biasImage=(predictedInhomoge-inhomoge(:,2:end-1));
If the biasImage looks like the highest values reflect your inhomogeneities then you're in business, just pick a threshold and recalculate the values above it. If it doesn't, then I'd consider trying something Bayesian.
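Reduced to one summary value per column (say, column means), the neighbor-prediction idea looks like this (pure Python, hypothetical numbers; the function name is mine):

```python
def column_bias(cols):
    # Predict each column from the average of its two neighbors and
    # report the normalized deviation, as in the MATLAB snippet above.
    bias = []
    for c in range(1, len(cols) - 1):
        predicted = (cols[c - 1] + cols[c + 1]) / 2
        bias.append((predicted - cols[c]) / (predicted + cols[c]))
    return bias

# The anomalously bright middle column shows up as a strong negative bias:
print(column_bias([10.0, 10.0, 30.0, 10.0, 10.0]))
```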
Probably doesn't need to be said, but any NaNs are easy to find and remove.