Replace values with NaN or Inf when certain conditions are met

I created the following three-dimensional mockup matrix:
mockup(:,:,1) = ...
[100, 100, 100; ...
103, 95, 100; ...
101, 85, 100; ...
96, 90, 102; ...
91, 89, 99; ...
97, 91, 97; ...
105, 83, 100];
mockup(:,:,2) = ...
[50, NaN, NaN; ...
47, NaN, 40; ...
45, 60, 45; ...
47, 65, 45; ...
51, 70, 45; ...
54, 65, 50; ...
62, 80, 55];
I also defined percentTickerAvailable = 0.5.
The columns represent equity prices of three different assets. For further processing, I need to manipulate the NaN values as follows:
1) If the percentage of NaNs in a given ROW is greater than 1 - percentTickerAvailable, replace all values in that row with NaN. That is, if not enough assets have prices in that row, ignore the row completely.
2) If the percentage of NaNs in a given ROW is less than or equal to 1 - percentTickerAvailable, replace the respective NaNs with -inf.
To be clear, the "percentage of NaNs in a given ROW" is the number of NaNs in that row divided by the number of columns.
The adjusted mockup matrix should look like this:
mockupAdj(:,:,1) = ...
[100, 100, 100; ...
103, 95, 100; ...
101, 85, 100; ...
96, 90, 102; ...
91, 89, 99; ...
97, 91, 97; ...
105, 83, 100];
mockupAdj(:,:,2) = ...
[NaN, NaN, NaN; ...
47, -inf, 40; ...
45, 60, 45; ...
47, 65, 45; ...
51, 70, 45; ...
54, 65, 50; ...
62, 80, 55];
So far, I did the following:
function vout = ranking(vin, percentTickerAvailable)
percentNonNaN = 1 - sum(isnan(vin), 2) / size(vin, 2);
NaNIdx = percentNonNaN < percentTickerAvailable;
infIdx = percentNonNaN > percentTickerAvailable & ...
percentNonNaN < 1;
[~, ~, numDimVin] = size(vin);
for i = 1 : numDimVin
vin(NaNIdx(:,:,i) == 1, :, i) = NaN;
end
about = vin;
end % EoF
Calling mockupAdj = ranking(mockup, 0.5) already transforms the first row of the second page, mockup(1,:,2), correctly to [NaN, NaN, NaN]. However, I am struggling with the second point. With infIdx I have already identified the rows that correspond to the second condition, but I don't know how to use that information to replace the single NaN in mockup(2,2,2) with -inf.
Any hint is highly appreciated.

This is a good example of something that can be solved using vectorization. I am providing two versions of the code: one that uses the modern syntax (including implicit expansion) and one for older versions of MATLAB.
Several things to note:
In the NaN substitution stage, I'm using a "trick" where 0/0 is evaluated to NaN.
In the Inf substitution stage, I'm using logical masking/indexing to access the correct elements in vin.
R2016b and newer:
function vin = ranking (vin, percentTickerAvailable)
% Find percentage of NaNs on each line:
pNaN = mean(isnan(vin), 2, 'double');
% Fills rows with NaNs:
vin = vin + 0 ./ (1 - ( pNaN >= percentTickerAvailable));
% Replace the rest with -Inf
vin(isnan(vin) & pNaN < percentTickerAvailable) = -Inf;
end
Prior to R2016b:
function vin = rankingOld (vin, percentTickerAvailable)
% Find percentage of NaNs on each line:
pNaN = mean(isnan(vin), 2, 'double');
% Fills rows with NaNs:
vin = bsxfun(@plus, vin, 0 ./ (1 - ( pNaN >= percentTickerAvailable)));
% Replace the rest with -Inf
vin(bsxfun(@and, isnan(vin), pNaN < percentTickerAvailable)) = -Inf;
end
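To check the row rule outside MATLAB, here is a minimal plain-Python sketch (standard library only; the names adjust_page and adjust are hypothetical). It follows the rule as worded in the question: a row becomes all NaN when its NaN fraction exceeds 1 - percentTickerAvailable, and otherwise its NaNs become -Inf. A 3-D array is modelled as a list of 2-D pages, each a list of rows.

```python
import math


def adjust_page(rows, percent_ticker_available):
    """Apply the question's row rule to one 2-D page (a list of rows)."""
    limit = 1 - percent_ticker_available
    out = []
    for row in rows:
        # fraction of NaNs in this row = number of NaNs / number of columns
        frac_nan = sum(math.isnan(v) for v in row) / len(row)
        if frac_nan > limit:
            # too few assets have prices: blank the whole row
            out.append([math.nan] * len(row))
        else:
            # keep the prices, mark the missing ones with -Inf
            out.append([-math.inf if math.isnan(v) else v for v in row])
    return out


def adjust(pages, percent_ticker_available):
    """Apply adjust_page to every page of a list-of-pages '3-D array'."""
    return [adjust_page(page, percent_ticker_available) for page in pages]
```

Applied to the two mockup pages with 0.5, this should reproduce mockupAdj: the first row of page 2 (two NaNs out of three) is blanked, and the single NaN in the second row becomes -Inf.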

1)
The percentage of NaN in any given row should be smaller than 1
... Are you talking about a ratio? If so, this check is useless, since that is always the case. Or are you talking about percentages? In that case your code doesn't do what you describe. My guess is ratio.
2) Based on my guess, I have a follow-up question: following your description, shouldn't mockup(2,2,2) stay NaN? That row is 33% NaN (< 50%), so it does not fulfill your condition 2.
3) Based on the answers I consider logical, I would change the first line to percentNaN = sum(isnan(vin), 2) / size(vin, 2); for readability, and change NaNIdx to NaNIdx = percentNaN > percentTickerAvailable; accordingly. Now just add one line in front of your loop:
vin(isnan(vin)) = -inf;
Why? Because this replaces all the NaNs with -inf. The loop then overwrites the rows that satisfy condition 1 with NaN again, so you don't need infIdx.
4) Be aware that your function never assigns vout, so as it stands it cannot return it. Just let it return vin, and you'll be fine.

You can also use logical indexing to achieve this task:
x(:,:,1) = ...
[100, 100, 100; ...
103, 95, 100; ...
101, 85, 100; ...
96, 90, 102; ...
91, 89, 99; ...
97, 91, 97; ...
105, 83, 100];
x(:,:,2) = ...
[50, NaN, NaN; ...
47, NaN, 40; ...
45, 60, 45; ...
47, 65, 45; ...
51, 70, 45; ...
54, 65, 50; ...
62, 80, 55];
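% Fix the threshold (maximum allowed fraction of NaNs per row).
tres = 0.5;
% Check which values are NaN.
in = isnan(x);
% Which rows have more than tres (50 %) NaNs? ind is rows-by-1-by-pages.
ind = (sum(in,2)./size(x,2)) > tres;
% Generate row and page subscripts of those rows (find turns the mask into linear indices).
[x1,~,x3] = ind2sub(size(ind),find(ind));
% Clear the NaN mask on those rows, so only rows with at most 50 % NaNs keep it.
for k = 1:numel(x1)
    in(x1(k),:,x3(k)) = false;
end
% Calculate the new values: the remaining NaNs become -Inf ...
x(in) = -inf;
% ... and rows with too many NaNs are set entirely to NaN.
for k = 1:numel(x1)
    x(x1(k),:,x3(k)) = NaN;
end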

Related

Calculate padding for 3D CNN in Pytorch

I'm currently trying to apply a 3D CNN to a set of images with dimensions 193 x 229 x 193 and would like to retain the same image dimensions through each convolutional layer (similar to TensorFlow's padding='SAME'). I know that the padding can be calculated as follows:
S=Stride
P=Padding
W=Width
K=Kernel size
P = ((S-1)*W-S+K)/2
Which yields a padding of 1 for the first layer:
P = ((1-1)*193-1+3)/2
P= 1.0
However, I also get a result of 1.0 for each of the subsequent layers. Does anyone have any suggestions? Sorry, beginner here!
Reproducible example:
import torch
import torch.nn as nn
x = torch.randn(1, 1, 193, 229, 193)
padding = ((1-1)*96-1+3)/2
print(padding)
x = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)(x)
print("shape after conv1: " + str(x.shape))
x = nn.Conv3d(in_channels=8, out_channels=8, kernel_size=3,padding=1)(x)
x = nn.BatchNorm3d(8)(x)
print("shape after conv2 + batch norm: " + str(x.shape))
x = nn.ReLU()(x)
print("shape after reLU:" + str(x.shape))
x = nn.MaxPool3d(kernel_size=2, stride=2)(x)
print("shape after max pool" + str(x.shape))
x = nn.Conv3d(in_channels=8, out_channels=16, kernel_size=3,padding=1)(x)
print("shape after conv3: " + str(x.shape))
x = nn.Conv3d(in_channels=16, out_channels=16, kernel_size=3,padding=1)(x)
print("shape after conv4: " + str(x.shape))
Current output:
shape after conv1: torch.Size([1, 8, 193, 229, 193])
shape after conv2 + batch norm: torch.Size([1, 8, 193, 229, 193])
shape after reLU:torch.Size([1, 8, 193, 229, 193])
shape after max pooltorch.Size([1, 8, 96, 114, 96])
shape after conv3: torch.Size([1, 16, 96, 114, 96])
shape after conv4: torch.Size([1, 16, 96, 114, 96])
Desired output:
shape after conv1: torch.Size([1, 8, 193, 229, 193])
shape after conv2 + batch norm: torch.Size([1, 8, 193, 229, 193])
...
shape after conv3: torch.Size([1, 16, 193, 229, 193])
shape after conv4: torch.Size([1, 16, 193, 229, 193])

TL;DR: your formula also applies to nn.MaxPool3d.
You are using a max pool layer with kernel size 2 (implicitly (2,2,2)) and stride 2 (implicitly (2,2,2)). This means that for every 2x2x2 block you only get a single value. In other words, as the name implies, only the maximum value of each 2x2x2 block is pooled into the output array.
That's why you're going from (1, 8, 193, 229, 193) to (1, 8, 96, 114, 96) (notice the division by 2, rounded down).
If you set kernel_size=3, stride=1 and padding=1 on nn.MaxPool3d instead, you will preserve the shape of your blocks.
Let #x be the input size and #w the kernel size along one dimension. If we want the output to have the same size, then #x = floor((#x + 2p - #w)/s + 1) needs to hold. Ignoring the floor, that gives 2p = s(#x - 1) - #x + #w = #x(s - 1) + #w - s (your formula).
Since s = 2 and #w = 2 here, this gives 2p = #x, i.e. the padding would have to be half the input size, which is not possible.
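The same-size condition is easy to check numerically. Below is a minimal plain-Python sketch (standard library only; out_size and same_padding are hypothetical helper names) of the per-dimension output-size formula floor((W + 2P - K)/S) + 1 and of the padding it implies, 2P = W(S - 1) + K - S. For a 3x3x3 kernel with stride 1 it gives P = 1 for any width; for the 2x2x2 max pool with stride 2 it asks for P = W/2, or has no integer solution when W is odd.

```python
def out_size(w, k, s=1, p=0):
    """Output length along one dimension: floor((w + 2p - k) / s) + 1."""
    return (w + 2 * p - k) // s + 1


def same_padding(w, k, s=1):
    """Padding p solving 2p = w(s - 1) + k - s, or None if no non-negative integer p exists."""
    two_p = w * (s - 1) + k - s
    if two_p < 0 or two_p % 2:
        return None
    return two_p // 2
```

For example, out_size(193, 2, 2) and out_size(229, 2, 2) give 96 and 114, matching the shapes printed after the max pool, while out_size(193, 3, 1, 1) stays at 193. As far as I know, PyTorch's pooling layers also reject padding larger than half the kernel size, which is presumably why P = W/2 is called impossible above.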

Converting a matrix loop into an equivalent function

Given a recursive loop similar to the following:
A = [5,2;0,2]
B = [5;6]
x = [0;7]
for i = 1:10
x(:,i+1) = A * x(:,i) + B
end
How can this be represented without a loop?
Sample output:
[ 0, 19, 140, 797, 4186, 21339, 107520, 539257, 2699606, 13504679, 67536700;
7, 20, 46, 98, 202, 410, 826, 1658, 3322, 6650, 13306]

Here is a more mathematical approach, which solves for the general (closed-form) term of your recursion:
u = pinv(A-eye(2))*B;
C = arrayfun(@(n) A^n*(x+u)-u,0:10,'UniformOutput',false);
M = cat(2,C{:});
which gives
M =
Columns 1 through 9:
0 19 140 797 4186 21339 107520 539257 2699606
7 20 46 98 202 410 826 1658 3322
Columns 10 and 11:
13504679 67536700
6650 13306
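The closed form comes from shifting by a fixed point: with u solving (A - I)u = B, substituting y_n = x_n + u turns x_{n+1} = A x_n + B into y_{n+1} = A y_n, hence x_n = A^n (x_0 + u) - u, which is what the arrayfun line evaluates for n = 0..10. This assumes A - I is invertible (here it is, with determinant 4). As a check, the plain-Python sketch below (standard library only; all helper names are hypothetical) compares the closed form with the loop, using exact Fraction arithmetic and Cramer's rule in place of MATLAB's pinv.

```python
from fractions import Fraction


def mat_vec(M, v):
    """2x2 matrix times 2-vector."""
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]


def mat_mul(M, N):
    """2x2 matrix product."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_pow(M, n):
    """M^n by repeated multiplication (fine for small n)."""
    R = [[1, 0], [0, 1]]
    for _ in range(n):
        R = mat_mul(R, M)
    return R


def solve2(M, b):
    """Solve the 2x2 system M u = b exactly with Cramer's rule."""
    det = Fraction(M[0][0] * M[1][1] - M[0][1] * M[1][0])
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - b[0] * M[1][0]) / det]


def iterate(A, B, x0, n_steps):
    """The original loop: x_{n+1} = A x_n + B, returned as a list of columns."""
    cols = [list(x0)]
    for _ in range(n_steps):
        Ax = mat_vec(A, cols[-1])
        cols.append([Ax[0] + B[0], Ax[1] + B[1]])
    return cols


def closed_form(A, B, x0, n_steps):
    """x_n = A^n (x0 + u) - u with (A - I) u = B."""
    A_minus_I = [[A[0][0] - 1, A[0][1]], [A[1][0], A[1][1] - 1]]
    u = solve2(A_minus_I, B)
    y0 = [x0[0] + u[0], x0[1] + u[1]]
    cols = []
    for n in range(n_steps + 1):
        y = mat_vec(mat_pow(A, n), y0)
        # the results are exact integers, so int() loses nothing
        cols.append([int(y[0] - u[0]), int(y[1] - u[1])])
    return cols
```

With A = [[5, 2], [0, 2]], B = [5, 6] and x0 = [0, 7], both functions produce the same eleven columns, ending in [67536700, 13306].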

I think you are looking to create a recursive function. If so, the below might work for you.
A = [5,2;0,2]
B = [5;6]
x = [0;7]
x = myRecursive(A,B,x, 10)
function [x] = myRecursive(A,B,x,n)
x(:,end+1) = A * x(:,end) + B;
if size(x,2) <= n
x = myRecursive(A,B,x,n);
end
end

A method to vectorise a call to prod() for lots of arrays of varying length?

So my problem is this: I'd like to do the following without the for loop, i.e. get the prod() of multiple vectors of different lengths.
I am dealing with rays intersecting voxels. I typically have 1e6 rays and 1e5 voxels, but this can vary.
intxRays is a list of rays that have intersected voxels.
gainList is a one-dimensional vector; each element holds a value corresponding to a specific ray-voxel intersection calculated previously (actually with the help of you lovely lot here).
rayIntxStart and rayIntxEnd are vectors of indices into gainList marking where each ray's corresponding values start and end (they're all in order).
Here is the code, along with some example data and the expected output.
gainSum = zeros(1, 5);
% only interested in the intx uniques
intxSegCtr = 1;
% loop through all of the unique segments
for rayCtr = 1:max(intxRays)
if rayCtr == intxRays(intxSegCtr)
startInd = rayIntxStart(intxSegCtr);
endInd = rayIntxEnd(intxSegCtr);
% find which rows correspond to those segments
gainVals = gainList(startInd:endInd);
gainProd = prod(gainVals);
% get the product of the gains for those voxels
gainSumIdx = intxRays(intxSegCtr);
gainSum(gainSumIdx) = gainProd;
% increment counter
intxSegCtr = intxSegCtr + 1;
end
end
Example data for five rays and nine voxels. For simplicity, assume the voxel gain array (used in the previous step) looks like this:
voxelGains = 10:10:90;
Now say rays 1 and 3 don't hit anything, ray 2 hits voxels 1 and 2, ray 4 hits voxels 2:7, and ray 5 hits voxels 6:9:
intxRays = [2, 4, 5];
gainList = [10, 20, 20, 30, 40, 50, 60, 70, 60 70, 80, 90];
rayIntxStart = [1, 3, 9];
rayIntxEnd = [2, 8, 12];
For these numbers the above code would give as a result:
gainSum = [0, 200, 0, 5.0400e+09, 3.024e+07];
I hope this all makes sense.
When I developed it I was using far smaller numbers of rays and voxels, and it worked fine. As I scale up, though, this loop has become the major bottleneck in my code: the gainVals and gainProd assignments alone take about 80% and 15% of my runtime, respectively.
This is the only method I can find that works; padding and the like won't work due to the sizes involved.
Is there a way to get the value I want, without this loop?
Many thanks!

OK, this is only a very small performance boost, but it might help. To test a fully vectorized approach without the loop, a bigger data sample would be needed.
Below are three solutions: your original, an optimized version, and the optimized version as a one-liner. Could you please check whether this already helps you?
clear all
% original: loop through all rays
intxRays = [2, 4, 5];
gainList = [10, 20, 20, 30, 40, 50, 60, 70, 60 70, 80, 90];
rayIntxStart = [1, 3, 9];
rayIntxEnd = [2, 8, 12];
gainSum = zeros(1, 5);
tic
% only interested in the intx uniques
intxSegCtr = 1;
% loop through all of the unique segments
for rayCtr = 1:max(intxRays)
if rayCtr == intxRays(intxSegCtr)
startInd = rayIntxStart(intxSegCtr);
endInd = rayIntxEnd(intxSegCtr);
% find which rows correspond to those segments
gainVals = gainList(startInd:endInd);
gainProd = prod(gainVals);
% get the product of the gains for those voxels
gainSumIdx = intxRays(intxSegCtr);
gainSum(gainSumIdx) = gainProd;
% increment counter
intxSegCtr = intxSegCtr + 1;
end
end
toc
clear all
% loop only over intxRays instead of every ray up to max(intxRays)
intxRays = [2, 4, 5];
gainList = [10, 20, 20, 30, 40, 50, 60, 70, 60 70, 80, 90];
rayIntxStart = [1, 3, 9];
rayIntxEnd = [2, 8, 12];
gainSum = zeros(1, 5);
tic
for rayCtr=1:length(intxRays)
%no if as you just go through them
%intxRays(rayCtr) is the corresponding element
startInd = rayIntxStart(rayCtr);
endInd = rayIntxEnd(rayCtr);
% find which rows correspond to those segments
gainVals = gainList(startInd:endInd);
gainProd = prod(gainVals);
% get the product of the gains for those voxels and set them to the ray
gainSum(intxRays(rayCtr)) = gainProd;
end
%disp(gainSum);
toc
clear all
% same as above, but condensed to one line so no intermediate variables are created
intxRays = [2, 4, 5];
gainList = [10, 20, 20, 30, 40, 50, 60, 70, 60 70, 80, 90];
rayIntxStart = [1, 3, 9];
rayIntxEnd = [2, 8, 12];
gainSum = zeros(1, 5);
tic
for rayCtr=1:length(intxRays)
gainSum(intxRays(rayCtr))=prod(gainList(rayIntxStart(rayCtr):rayIntxEnd(rayCtr)));
end
toc
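One loop-free idea, offered here as an untested suggestion rather than something from the answer above, is prefix products: if P(k) is the product of the first k gains, the product over positions s..e is P(e)/P(s-1), so every segment product follows from one cumulative product and one element-wise division. The plain-Python sketch below (standard library only; segment_products and gain_sum are hypothetical names) demonstrates the idea on the question's example data with exact integer arithmetic. It assumes no gain is zero. In floating point the running product can overflow or lose precision for long rays; in MATLAB the idea would presumably map onto cumprod (or cumsum of log) plus vector indexing.

```python
from itertools import accumulate
from operator import mul


def segment_products(gain_list, starts, ends):
    """Product of gain_list over each 1-based, inclusive (start, end) range."""
    # prefix[k] = product of the first k gains; prefix[0] = 1
    prefix = [1] + list(accumulate(gain_list, mul))
    # product over s..e = prefix[e] / prefix[s - 1]; exact for nonzero integers
    return [prefix[e] // prefix[s - 1] for s, e in zip(starts, ends)]


def gain_sum(n_rays, intx_rays, gain_list, starts, ends):
    """Scatter the segment products into a per-ray vector (0 for rays with no hits)."""
    out = [0] * n_rays
    for ray, p in zip(intx_rays, segment_products(gain_list, starts, ends)):
        out[ray - 1] = p  # rays are numbered from 1, as in MATLAB
    return out
```

On the example data, gain_sum(5, [2, 4, 5], gainList, [1, 3, 9], [2, 8, 12]) returns [0, 200, 0, 5040000000, 30240000], the expected gainSum.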

cut vector according to NaN values

data_test is a vector of numbers with some NaNs.
data_test = [NaN, 2, 3, 4, NaN,NaN,NaN, 12 ,44, 34, NaN,5,NaN];
I would like to cut data_test at the NaNs and create a cell array containing the pieces of data_test between the NaNs:
data_cell{1}=[2 3 4];
data_cell{2}=[12 44 34];
data_cell{3}=[5];
At this point I need to filter these values (that part is fine; just as an example, the filtered values will be data_test + 1):
data_cell{1} -> data_cell_filt{1}
data_cell{2} -> data_cell_filt{2}
data_cell{3} -> data_cell_filt{3}
and put the filtered values back into data_test:
data_cell_filt{1}, data_cell_filt{2}, data_cell_filt{3} -> data_test
so that data_test becomes
data_test = [NaN, 3, 4, 5, NaN,NaN,NaN, 13 ,45, 35, NaN, 6, NaN];
PS: in my case data_test has ~20000 elements.

You can do it easily with a loop or use arrayfun like this:
A = [NaN, 2, 3, 4, NaN, NaN, NaN, 13, 45, 35, NaN, 6, NaN]
i1 = find(diff(isnan(A))==-1)+1 %// Index where clusters of numbers begin
i2 = find(diff(isnan(A))==1) %// Index where clusters of numbers end
data_cell_filt = arrayfun(@(x,y) A(x:y), i1, i2, 'uni', false)

One approach using accumarray, cumsum and diff:
%// find the index of regular numbers
idx = find(~isnan(data_test))
%// group the numbers which are adjacent, to some index number
idx1 = cumsum([1,diff(idx)~=1])
%// put all those numbers of same index number into a cell
out = accumarray(idx1.',data_test(idx).',[],@(x) {x.'})
Sample run:
data_test = [NaN, 2, 3, 4, NaN,NaN,NaN, 12 ,44, 34, NaN,5,NaN];
>> celldisp(out)
out{1} =
2 3 4
out{2} =
12 44 34
out{3} =
5

Convolution-based approach:
ind = isnan(data_test);
t = conv(2*ind-1, [-1 1], 'same'); %// convolution is like correlation but flips 2nd input
starts = find(t==2); %// indices of where a run of non-NaN's starts, minus 1
ends = find(t==-2); %// indices of where it ends
result = mat2cell(data_test(~ind), 1, ends-starts); %// pick non-NaN's and split
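The whole split, filter and put-back round trip can also be sketched in plain Python (standard library only; split_on_nan and filter_runs are hypothetical names). itertools.groupby groups consecutive elements by whether they are NaN, which plays the role of the diff/cumsum run detection in the MATLAB answers. The filter is assumed to return as many values as it receives, so each filtered run can be written back in place.

```python
import math
from itertools import groupby


def split_on_nan(data):
    """Return (start_index, values) for each run of non-NaN values (0-based indices)."""
    runs = []
    i = 0
    for is_nan, group in groupby(data, key=math.isnan):
        group = list(group)
        if not is_nan:
            runs.append((i, group))
        i += len(group)
    return runs


def filter_runs(data, filt):
    """Apply filt to each non-NaN run and write the result back in place; NaNs are kept."""
    out = list(data)
    for start, values in split_on_nan(data):
        out[start:start + len(values)] = filt(values)
    return out
```

With the question's data_test and a "+1" filter, filter_runs(data_test, lambda v: [x + 1 for x in v]) gives [NaN, 3, 4, 5, NaN, NaN, NaN, 13, 45, 35, NaN, 6, NaN].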

Matlab code runs too slow on three dimensional array

I'm trying to vectorize the following code:
% code before
% code before
% a lot of code before we got to the current comment
%
% houghMatrix holds some values
for i=1:n
for j=1:m
for k = 1:maximalRadius
% get the maximal threshold
if houghMatrix(i,j,k) > getMaximalThreshold(k)
lhs = [j i k];
% verify that the new circle is not listed
isCircleExist = verifyCircleExists(circles,lhs,circleCounter);
% not listed - then we put it in the circles vector
if isCircleExist == 0
circles(circleCounter,:) = [j i k];
fprintf('Circle %d: %d, %d, %d\n', circleCounter, j, i, k);
circleCounter = circleCounter + 1;
end
end
end
end
end
Using tic/toc, I got the following output:
>> x = findCircles(ii);
Circle 1: 38, 38, 35
Circle 2: 89, 51, 34
Circle 3: 72, 66, 11
Circle 4: 33, 75, 30
Circle 5: 90, 81, 31
Circle 6: 54, 96, 26
Elapsed time is 3.111176 seconds.
>> x = findCircles(ii);
Circle 1: 38, 38, 35
Circle 2: 89, 51, 34
Circle 3: 72, 66, 11
Circle 4: 33, 75, 30
Circle 5: 90, 81, 31
Circle 6: 54, 96, 26
Elapsed time is 3.105642 seconds.
>> x = findCircles(ii);
Circle 1: 38, 38, 35
Circle 2: 89, 51, 34
Circle 3: 72, 66, 11
Circle 4: 33, 75, 30
Circle 5: 90, 81, 31
Circle 6: 54, 96, 26
Elapsed time is 3.135818 seconds.
That is, an average of about 3.1 seconds.
I tried to vectorize the code, but the problem is that I need to use the indices i, j, k in the body of the innermost loop (the third for).
Any suggestions on how to vectorize the code would be greatly appreciated.
Thanks
EDIT:
% -- function [circleExists] = verifyCircleExists(circles,lhs,total) --
%
%
function [circleExists] = verifyCircleExists(circles,lhs,total)
MINIMUM_ALLOWED_THRESHOLD = 2;
circleExists = 0;
for index = 1:total-1
rhs = circles(index,:);
absExpr = abs(lhs - rhs);
maxValue = max( absExpr );
if maxValue <= MINIMUM_ALLOWED_THRESHOLD + 1
circleExists = 1;
break
end
end
end

Here's what I think you want to do: for each valid coordinate triplet, you want to check whether a nearby triplet has already been found; otherwise, you add it to the list. This operation can be fully vectorized if there's no possibility of "chaining", i.e. if each cluster of possible candidate voxels can only accommodate one center. In that case, you can simply use:
%# create a vector of thresholds
maximalThreshold = getMaximalThreshold(1:maximalRadius);
%# make it 1-by-1-by-3
maximalThreshold = reshape(maximalThreshold,1,1,[]);
%# create a binary array the size of houghMatrix with 1's
%# wherever we have a candidate circle center
validClusters = bsxfun(@gt, houghMatrix, maximalThreshold);
%# get the centroids of all valid clusters
stats = regionprops(validClusters,'Centroid');
%# collect centroids, round to get integer pixel values
circles = round(cat(1,stats.Centroid));
Alternatively, if you want to follow your scheme of selecting valid circles, you can get the ijk indices from validClusters as follows:
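MINIMUM_ALLOWED_THRESHOLD = 2;
%# columns of potentialCircles are the [row col page] = [i j k] subscripts
[potentialCircles(:,1),potentialCircles(:,2), potentialCircles(:,3)]= ...
ind2sub(size(houghMatrix),find(validClusters));
nPotentialCircles = size(potentialCircles,1);
for iTest = 2:nPotentialCircles
absDiff = abs(bsxfun(@minus,potentialCircles(1:iTest-1,:),potentialCircles(iTest,:)));
%# duplicate if some earlier, unmasked candidate is within the threshold in all three coordinates
if any(all(absDiff <= MINIMUM_ALLOWED_THRESHOLD + 1, 2))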
%# mask the potential circle
potentialCircles(iTest,:) = NaN;
end
end
circles = potentialCircles(isfinite(potentialCircles(:,1)),:);
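The greedy duplicate test itself is easy to state without MATLAB. Below is a plain-Python sketch (standard library only; dedupe_candidates is a hypothetical name, and the candidate triplets in the usage note are made up) that keeps a candidate only if its Chebyshev distance, i.e. the largest absolute coordinate difference, to every already-kept candidate exceeds MINIMUM_ALLOWED_THRESHOLD + 1, which is the test verifyCircleExists performs. Like the loop above, it is order-dependent: which member of a cluster survives depends on the order in which candidates are visited.

```python
def dedupe_candidates(candidates, threshold=2):
    """Greedily keep candidates that are farther than threshold + 1 from every kept one.

    Distance is the Chebyshev distance max(|a_i - b_i|), mirroring verifyCircleExists.
    """
    kept = []
    for c in candidates:
        is_new = all(
            max(abs(a - b) for a, b in zip(c, k)) > threshold + 1
            for k in kept
        )
        if is_new:
            kept.append(c)
    return kept
```

For example, dedupe_candidates([(38, 38, 35), (39, 38, 35), (89, 51, 34), (41, 40, 36), (42, 38, 35)]) keeps (38, 38, 35), (89, 51, 34) and (42, 38, 35).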
