I'm trying to build an autoencoder where some of the relevant input can be zero and non-relevant input will have a different unused value such as -999 or -np.inf.
I know how to use a Masking layer to mask timesteps where all input is equal to mask value or use Embedding layer with mask_zero=True to mask zero value inputs in timestep.
my question is if there's an embedding layer equivalent solution for non-zero values.
Thanks!
P.S. I realise I can replace all legitimate zero values with some other value and replace all non relevant values with zeros and use embedding layer with mask_zero=True as is but looking for a more straightforward solution
Related
I have been reading about TabNet model and it's predictions "explanations" through the attentive transformers' masks values.
However, if the inputs values are not normalized, aren't this masks values simply a normalization scalar (and not a feature's importance value)?
E.g.: A feature Time is expressed in days and has a mask mean value of 1/365, it could mean that the mask is simply normalizing the feature?
Let me know if I wasn't clear in my question.
function out = findWaldo(im, filter)
% convert image (and filter) to grayscale
im_input = im;
im = rgb2gray(im);
im = double(im);
filter = rgb2gray(filter);
filter = double(filter);
filter = filter/sqrt(sum(sum(filter.^2)));
out = normxcorr2(filter, im);
Question1: Why we first do rgb2gray on im and filter?
Question2: What does the last second line actually do? Namely,
filter = filter/sqrt(sum(sum(filter.^2)));
Question 1: Why apply rgb2gray first?
normxcorr2 standing for "Normalized 2-D cross-correlation" works on a 2D signal (see doc). A RGB image is a 3D signal: width x height x color (e.g. 1024 x 1024 x 3, 3 since it's three colors). That is why you flatten it first to one color channel. Applying the filter to the image on each color separately would be the alternative, but then you also need to process three correlations (average them or something...).
Question 2: What does filter = filter/sqrt(sum(sum(filter.^2)));do?
It squares the filter image, then sums over rows and columns (basically all squared gray values of the filters) to get a single number that the squareroot is applied to and then is used to divide all filter image values.
I'd say it is some sort of normalization to handle specific input signals, maybe an attempt to get values from 0 - 1. But since normalized cross correlation (normxcorr2) does normalization itself, this step is definitely not needed for it. Unless you don't do something other than cross correlation with the filter variable, I'd consider this an artifact that should be deleted.
General explanation of the function
This function receives two inputs: an image file and a template.
For example, the image file may be a large scene of findings Waldo game and the template can be a picture of Waldo himself.
The output is a matrix called 'out' with same size as the image file. s.t. each pixel holds a "matching results". The higher the value - the higher the chances that the patch centered around the pixel holds a similar pattern such as the template.
The maximal value should be on the pixel in which Waldo is.
Question 1
rgb2gray function receives an rgb image with 3 channels and transorm it into gray image.
It is done on im and on filter because normxcorr2 function only works with grayscale images.
Question 2
The last perform normalization of the pattern: it divides it by it's norm, thus changing it to 1. In fact, this line is not required and should be deleted. Normalization stage is already performed inside normxcorr2 function.
I have two equally-sized data-arrays (mainly zeros, and sparsely filled with ones), and make the conv of it. As a result I get this.
Now one can see a peak around -10^{-5}. My question is, how can I do the convolution such that I only get a small region around that peak?
I know that the convolution is defined from minus infinity to infinity. Mathematically I would want to change those limits to (in my example) [-1.5*10^5,-0.5*10^-5].
Thanks alot for your help!
edit
I found the solution: One can use xcorr(a,fliplr(b)) instead of conv(a,b). Now xcorr has the option "maxlags", which is exactly the thing I was searching for.
You can reduce the number of output values of conv, but not arbitrarily. Try 'same' or 'valid' options:
C = CONV(A, B, SHAPE) returns a subsection of the convolution with size specified by SHAPE:
'full' - (default) returns the full convolution,
'same' - returns the central part of the convolution
that is the same size as A.
'valid' - returns only those parts of the convolution
that are computed without the zero-padded edges.
LENGTH(C)is MAX(LENGTH(A)-MAX(0,LENGTH(B)-1),0).
To specify arbitrary output limits, you probably need to do the convolution manually with a user-defined function. But it may be more efficient to use conv and then trim the output.
I have created an Auto Encoder Neural Network in MATLAB. I have quite large inputs at the first layer which I have to reconstruct through the network's output layer. I cannot use the large inputs as it is,so I convert it to between [0, 1] using sigmf function of MATLAB. It gives me a values of 1.000000 for all the large values. I have tried using setting the format but it does not help.
Is there a workaround to using large values with my auto encoder?
The process of convert your inputs to the range [0,1] is called normalization, however, as you noticed, the sigmf function is not adequate for this task. This link maybe is useful to you.
Suposse that your inputs are given by a matrix of N rows and M columns, where each row represent an input pattern and each column is a feature. If your first column is:
vec =
-0.1941
-2.1384
-0.8396
1.3546
-1.0722
Then you can convert it to the range [0,1] using:
%# get max and min
maxVec = max(vec);
minVec = min(vec);
%# normalize to -1...1
vecNormalized = ((vec-minVec)./(maxVec-minVec))
vecNormalized =
0.5566
0
0.3718
1.0000
0.3052
As #Dan indicates in the comments, another option is to standarize the data. The goal of this process is to scale the inputs to have mean 0 and a variance of 1. In this case, you need to substract the mean value of the column and divide by the standard deviation:
meanVec = mean(vec);
stdVec = std(vec);
vecStandarized = (vec-meanVec)./ stdVec
vecStandarized =
0.2981
-1.2121
-0.2032
1.5011
-0.3839
Before I give you my answer, let's think a bit about the rationale behind an auto-encoder (AE):
The purpose of auto-encoder is to learn, in an unsupervised manner, something about the underlying structure of the input data. How does AE achieves this goal? If it manages to reconstruct the input signal from its output signal (that is usually of lower dimension) it means that it did not lost information and it effectively managed to learn a more compact representation.
In most examples, it is assumed, for simplicity, that both input signal and output signal ranges in [0..1]. Therefore, the same non-linearity (sigmf) is applied both for obtaining the output signal and for reconstructing back the inputs from the outputs.
Something like
output = sigmf( W*input + b ); % compute output signal
reconstruct = sigmf( W'*output + b_prime ); % notice the different constant b_prime
Then the AE learning stage tries to minimize the training error || output - reconstruct ||.
However, who said the reconstruction non-linearity must be identical to the one used for computing the output?
In your case, the assumption that inputs ranges in [0..1] does not hold. Therefore, it seems that you need to use a different non-linearity for the reconstruction. You should pick one that agrees with the actual range of you inputs.
If, for example, your input ranges in (0..inf) you may consider using exp or ().^2 as the reconstruction non-linearity. You may use polynomials of various degrees, log or whatever function you think may fit the spread of your input data.
Disclaimer: I never actually encountered such a case and have not seen this type of solution in literature. However, I believe it makes sense and at least worth trying.
I have a question regarding the parameters in the edge function.
edge(img,'sobel',threshold);
edge(img,'prewitt',threshold) ;
edge(img,'roberts',threshold);
edge(img,'canny',thresh_canny,sigma);
How should the threshold for the first 3 types be chosen? Is there an aspect that can help choosing this threshold (like histogram for instance)? I am aware of the function graythresh but I want to set it manually. So far I know it's a value between 0-1, but I don't know how to interpret it.
Same thing for Canny. I`m trying to input an array for thresh_canny = [low_limit, high_limit]. but don't know how to look at these values. How does the sigma value influence the image?
It really depends on the type of edge you want to see in the output. If you want to see really powerful edges, use a smaller interval in the higher end of threshold (say 0.9-1) and this is relative to highest gradient magnitude of the image.
As far as the sigma is concerned, it is used in filtering the input image before passing to edge. This is to reduce noise in the input image.