UNet segmentation: chest X-ray lung segmentation using UNet - semantic-segmentation

I want to segment the lungs in chest X-ray images using a UNet model. I have an image dataset in which each image has size (256, 256). When I use the UNet model to segment them, it shows this error:
ValueError: Exception encountered when calling layer "model" (type Functional).
Input 0 of layer "conv2d" is incompatible with the layer: expected min_ndim=4, found ndim=3. Full shape received: (None, 256, 256)
Call arguments received by layer "model" (type Functional):
• inputs=tf.Tensor(shape=(None, 256, 256), dtype=uint8)
• training=True
• mask=None
What is the solution? The UNet model I am using expects inputs of dimension (height, width, channel).
# Build the model
inputs = tf.keras.layers.Input((IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS))
s = tf.keras.layers.Lambda(lambda x: x / 255)(inputs)
# Contraction path
c1 = tf.keras.layers.Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(s)
c1 = tf.keras.layers.Dropout(0.1)(c1)
c1 = tf.keras.layers.Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c1)
...
I tried using the UNet with dimension (height, width); in that case it shows this error:
ValueError: Input 0 of layer "conv2d_40" is incompatible with the layer: expected min_ndim=4, found ndim=3. Full shape received: (None, 256, 256)
Please help, thanks.
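For reference, a minimal sketch of one common fix, assuming the grayscale images are loaded into a NumPy array X of shape (num_images, 256, 256) (X is a placeholder name): add an explicit channel axis so each sample matches the (height, width, channel) input the model expects, with IMG_CHANNELS = 1.
import numpy as np

# X: grayscale images of shape (num_images, 256, 256), dtype uint8
X = np.expand_dims(X, axis=-1)  # -> (num_images, 256, 256, 1)
print(X.shape)
# Build the model with IMG_CHANNELS = 1 so the Input layer expects (256, 256, 1)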

Related

How can I get fingerprints from an image using Flutter?

I am using the opencv package https://pub.dev/packages/opencv_4 but cannot get the fingerprints clearly from the image.
I am using this function:
Uint8List? _byte = await Cv2.morphologyEx(
  pathFrom: CVPathFrom.GALLERY_CAMERA,
  pathString: photoFinger.path,
  operation: Cv2.COLOR_BayerGB2RGB,
  kernelSize: [30, 30],
);
I have Python code using the OpenCV library, but I don't know how to convert it into Dart using the OpenCV Dart package.
The Python code is:
import cv2
import numpy as np
# Read the input image
img = cv2.imread('input_image.jpg', cv2.IMREAD_GRAYSCALE)
# Apply Gaussian blur to remove noise
img = cv2.GaussianBlur(img, (5, 5), 0)
# Apply adaptive thresholding to segment the fingerprint
img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 11, 5)
# Apply morphological operations to remove small objects and fill in gaps
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel, iterations=1)
img = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel, iterations=1)
# Estimate the orientation field of the fingerprint
sobel_x = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
sobel_y = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
theta = cv2.phase(sobel_x, sobel_y)
# Apply non-maximum suppression to thin the ridges
theta_quantized = np.round(theta / (np.pi / 8)) % 8
thin = cv2.ximgproc.thinning(img, thinningType=cv2.ximgproc.THINNING_ZHANGSUEN)
# Extract minutiae points from the fingerprint
minutiae = cv2.FastFeatureDetector_create().detect(thin)
# Display the output image with minutiae points
img_with_minutiae = cv2.drawKeypoints(img, minutiae, None, color=(0, 255, 0), flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imshow('Output Image', img_with_minutiae)
cv2.waitKey(0)
cv2.destroyAllWindows()

How to make custom 'any object' Cascade (.xml) for opencv-python?

I want to make a Haar cascade so that I can use it to detect an object in opencv-python. For example, I want to detect a watch. I tried making a cascade using Cascade Trainer GUI, but it isn't giving me the expected results.
Well, before training, search the internet first. Maybe a cascade for the object you want to detect has already been trained, so you don't need to train it again.
For example, you want to detect a watch; the Haar cascade file is available here.
So I tested the file to see whether it works; the result is:
Code:
import cv2

w_cascade = cv2.CascadeClassifier('watchcascade10stage.xml')
cap = cv2.VideoCapture(0)

while True:
    ret, img = cap.read()
    if ret:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        watches = w_cascade.detectMultiScale(image=gray,
                                             scaleFactor=1.3,
                                             minNeighbors=50)
        for (x, y, w, h) in watches:
            cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 0), 2)
            font = cv2.FONT_HERSHEY_SIMPLEX
            cv2.putText(img, 'Watch', (x - w, y - h), font, 0.5, (11, 255, 255), 2, cv2.LINE_AA)
        cv2.imshow('img', img)
    k = cv2.waitKey(1) & 0xff
    if k == 27:
        break

cap.release()
cv2.destroyAllWindows()
You can find other tutorials by searching the internet. For instance, start with this video.
So the thing is, a Haar cascade is not a detector or even a classifier by itself; it is a feature extractor. If you are going to use Haar features, you will use them in conjunction with an SVM (support vector machine) for classification, and then implement a sliding window to detect watches.
So the steps are as follows (a sliding-window sketch is shown after this answer):
1. Extract a patch of the image using a sliding window.
2. Pass it to an SVM trained on Haar features.
3. Draw a rectangle if the prediction is true.
I recommend this tutorial series: https://pythonprogramming.net/haar-cascade-object-detection-python-opencv-tutorial/. Please do reach out to me if you still need help.
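As a rough illustration of step 1, here is a minimal sliding-window sketch in Python; the window size, step, and the classify() call are placeholders for whatever your trained classifier needs.
def sliding_window(image, window=(64, 64), step=16):
    # Yield (x, y, patch) for every window position over a grayscale NumPy/OpenCV image.
    h, w = image.shape[:2]
    for y in range(0, h - window[1] + 1, step):
        for x in range(0, w - window[0] + 1, step):
            yield x, y, image[y:y + window[1], x:x + window[0]]

# Usage sketch: run each patch through the trained classifier (e.g. an SVM)
# and keep the windows it accepts.
# detections = [(x, y) for x, y, patch in sliding_window(gray)
#               if classify(patch)]  # classify() is a placeholder for your trained SVM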

Cropping layer in my keras model has zero dimensions

I created a simple model with Keras to understand the cropping layer:
def other_model():
    x = keras.Input(shape=(64, 64, 3))
    conv = keras.layers.Conv2D(5, 2)(x)
    crop = keras.layers.Cropping2D(cropping=32)(conv)
    model = keras.Model(x, crop)
    model.summary()
    return model
But I get the following summary:
Layer (type)                 Output Shape         Param #
input_12 (InputLayer)        (None, 64, 64, 3)    0
conv2d_21 (Conv2D)           (None, 63, 63, 5)    65
cropping2d_13 (Cropping2D)   (None, 0, 0, 5)      0
Total params: 65
Trainable params: 65
Non-trainable params: 0
Why are the 1st and 2nd spatial dimensions of the Cropping2D output equal to zero?
They are supposed to be 32.
The cropping argument is the number of pixels that will be cut off at every side of your image. You chose a value greater than or equal to half the image size, so it didn't work.
It is a bit unclear in the documentation, but if you give a single integer value (cropping=32) as the parameter, it crops 32 pixels off each side of the image.
If you have an image with 64x64 pixels and cropping=32, the target size therefore will be 0x0 pixels...
If you want a target size of about 32x32 pixels, you have to give cropping=16.
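A quick sketch to check the arithmetic, assuming the usual Keras Cropping2D semantics: note that the Conv2D with its 2x2 kernel already shrinks the feature map from 64x64 to 63x63, so cropping=16 here yields 31x31 rather than exactly 32x32.
import keras

x = keras.Input(shape=(64, 64, 3))
conv = keras.layers.Conv2D(5, 2)(x)                 # valid 2x2 conv: 64 -> 63
crop = keras.layers.Cropping2D(cropping=16)(conv)   # 63 - 2*16 = 31 per side
keras.Model(x, crop).summary()                      # Cropping2D output: (None, 31, 31, 5)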

Can I use rectangle images with a convolution neural network in Keras?

Say I'd like to use Keras's Convolution2D layer to build a CNN; can the input image be of size [224, 320, 3] instead of something like [224, 224, 3]?
Should I keep my images in their rectangular format or scale them to be square? I've tried making them square, but the quality is greatly diminished, and there is important data around the edges.
If I build it with rectangular input images, will it end up breaking down the line?
I'd also like to attach a decoder onto the end of the CNN to output images of the same shape (essentially a VAE with rectangular images, not squares).
Can I use Conv2D on arbitrary rectangles?
The short answer is yes. One of the big reasons squares are used is that the math for the max-pooling/strides/padding is easy if it is exactly the same for both height and width; it just keeps things simple. In the case of 224, you can use Conv2D with padding='same', followed by MaxPool2D, several times to decrease both the height and width from 224 to 112, then 56, 28, 14, and finally 7.
When you do that with an input image of 224x320, the progression of reductions is: 224x320, 112x160, 56x80, 28x40, 14x20, 7x10. Not a big deal, and it works out pretty well. If instead the image were 224x300, it wouldn't get far before the second dimension stopped dividing nicely.
Here is some TensorFlow code for the encoder side of an autoencoder:
import tensorflow as tf
import numpy as np

encoder = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer([224, 320, 3]),
    tf.keras.layers.Conv2D(filters=16, kernel_size=5, padding='same', activation='tanh'),
    tf.keras.layers.MaxPool2D(2),
    tf.keras.layers.Conv2D(filters=16, kernel_size=5, padding='same', activation='tanh'),
    tf.keras.layers.MaxPool2D(2),
    tf.keras.layers.Conv2D(filters=16, kernel_size=5, padding='same', activation='tanh'),
    tf.keras.layers.MaxPool2D(2),
    tf.keras.layers.Conv2D(filters=16, kernel_size=5, padding='same', activation='tanh'),
    tf.keras.layers.MaxPool2D(2),
    tf.keras.layers.Conv2D(filters=16, kernel_size=5, padding='same', activation='tanh'),
    tf.keras.layers.MaxPool2D(2),
    tf.keras.layers.Conv2D(filters=32, kernel_size=5, padding='same', activation='tanh'),
])

data = np.zeros([1, 224, 320, 3], dtype=np.float32)
print(encoder.predict(data).shape)
The output is
(1, 7, 10, 32)
The reverse can be used to make a decoder; a sketch follows below.
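For concreteness, here is one possible decoder sketch: it mirrors the encoder with five Conv2DTranspose upsampling steps, taking (7, 10, 32) back to (224, 320, 3). The filter counts and activations are illustrative choices.
import tensorflow as tf
import numpy as np

decoder = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer([7, 10, 32]),
    tf.keras.layers.Conv2DTranspose(filters=16, kernel_size=5, strides=2, padding='same', activation='tanh'),
    tf.keras.layers.Conv2DTranspose(filters=16, kernel_size=5, strides=2, padding='same', activation='tanh'),
    tf.keras.layers.Conv2DTranspose(filters=16, kernel_size=5, strides=2, padding='same', activation='tanh'),
    tf.keras.layers.Conv2DTranspose(filters=16, kernel_size=5, strides=2, padding='same', activation='tanh'),
    tf.keras.layers.Conv2DTranspose(filters=16, kernel_size=5, strides=2, padding='same', activation='tanh'),
    tf.keras.layers.Conv2D(filters=3, kernel_size=5, padding='same', activation='sigmoid'),  # back to 3 channels
])

latent = np.zeros([1, 7, 10, 32], dtype=np.float32)
print(decoder.predict(latent).shape)  # (1, 224, 320, 3)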

Why does VGG-16 take an input size of 512 * 7 * 7 in its fully-connected layer?

According to https://github.com/pytorch/vision/blob/master/torchvision/models/vgg.py,
I don't understand why VGG models take 512 * 7 * 7 as the input size of the first fully-connected layer.
The last convolution block is
nn.Conv2d(512, 512, kernel_size=3, padding=1),
nn.ReLU(True),
nn.MaxPool2d(kernel_size=2, stride=2, dilation=1)
The code in the link above:
class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )
To understand this, you have to know how the convolution operator works in CNNs.
nn.Conv2d(512, 512, kernel_size=3, padding=1) means that the input to that convolution has 512 channels and that the output after the convolution will also have 512 channels. The input is convolved with a kernel of size 3x3 that moves as a sliding window. Finally, padding=1 means that before applying the convolution, we symmetrically add zeros to the edges of the input matrix.
In the example you mention, you can think of 512 as the depth, while 7x7 is the width and height obtained by applying several convolutions and poolings. Imagine that we have an image with some width and height and we feed it to a convolution; the resulting size will be
owidth = floor(((width + 2*padW - kW) / dW) + 1)
oheight = floor(((height + 2*padH - kH) / dH) + 1)
where height and width are the original sizes, padW and padH are the horizontal and vertical padding, kW and kH are the kernel sizes, and dW and dH are the horizontal and vertical strides, i.e. the number of pixels the kernel moves each step (if dW=1, the kernel starts at pixel (0,0) and then moves to (1,0)).
Usually the first convolution operator in a CNN looks like nn.Conv2d(3, D, kernel_size=3, padding=1), because the original image has 3 input channels (RGB). Assuming the input image has a size of 256x256x3 pixels, if we apply the operator as defined above, the resulting image has the same width and height as the input but its depth is now D.
Similarly, if we define the convolution as c = nn.Conv2d(3, 15, kernel_size=25, padding=0, stride=5), with kernel_size=25, no padding, and stride=5 (dW=dH=5, meaning the kernel moves 5 pixels each time: from (0,0) it moves to (5,0), until we reach the end of the image on the x-axis, then it moves to (0,5) -> (5,5) -> (10,5) until it reaches the end again), the resulting output will have a size of 47x47x15.
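That formula is easy to check with a small helper function (conv_output_size is just an illustrative name, not part of torchvision):
import math

def conv_output_size(size, kernel, padding=0, stride=1):
    # Output width/height of a convolution, per the formula above.
    return math.floor((size + 2 * padding - kernel) / stride) + 1

print(conv_output_size(256, kernel=3, padding=1, stride=1))   # 256: a 3x3 conv with padding=1 preserves size
print(conv_output_size(256, kernel=25, padding=0, stride=5))  # 47, matching the example above
print(conv_output_size(14, kernel=2, padding=0, stride=2))    # 7: one MaxPool2d(kernel_size=2, stride=2) step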
The VGG neural net has two sections of layers: the "features" section and the "classifier" section. The input to the features section is always an image of size 224 x 224 pixels.
The features section has 5 nn.MaxPool2d(kernel_size=2, stride=2) layers. See the referenced source code, line 76: each 'M' character in the configurations sets up one MaxPool2d layer.
A MaxPool2d layer with these parameters halves the spatial size of the tensor. So we have 224 --> 112 --> 56 --> 28 --> 14 --> 7, which means the output of the features section is a 512-channel 7 x 7 tensor. This is the input to the "classifier" section.
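A quick sanity check of that shape, sketched here assuming torchvision is available (untrained weights are fine since only shapes matter):
import torch
from torchvision import models

vgg = models.vgg16()
x = torch.zeros(1, 3, 224, 224)      # dummy RGB image at the expected input size
features = vgg.features(x)
print(features.shape)                # torch.Size([1, 512, 7, 7])
print(features.flatten(1).shape)     # torch.Size([1, 25088]) == 512 * 7 * 7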