How do you "import" image data for MNIST? - neural-network

So I've been using TensorFlow's tutorials for neural networks. I completed the "basic classification" tutorial, which is essentially just MNIST, and have been working on my own custom variation as a little thought experiment. Everything is pretty self-explanatory except getting the dataset into a usable form, since the tutorial uses a premade dataset and looks like it cuts some corners. All I would like to know is how to turn a colored photo into a usable piece of data. I assume that will just be a 1D array. As a side question, will a neural network lose any effectiveness if a 2D photo is stored in a 1D array, if it's not a CNN?

The datasets included in Keras are premade and usually preprocessed so that beginners can easily try their hand at them. To use your own images, for example for a cat-dog image classification problem, place the images in two separate directories, such as
images/cats and images/dogs.
Now, we parse each and every image in these directories:
import os
from PIL import Image

master_dir = 'images'
img_dirs = os.listdir( master_dir )
for img_dir in img_dirs:
    img_names = os.listdir( os.path.join( master_dir , img_dir ) )
    for name in img_names:
        img_path = os.path.join( master_dir , img_dir , name )
        image = Image.open( img_path ).resize( ( 64 , 64 ) ).convert( 'L' )
        # Store this image in an array with its corresponding label
Here, the image will be an array of shape (64, 64), which indicates that the image is grayscale. Instead of .convert( 'L' ) in the code, we can use .convert( 'RGB' ) to get an RGB image of shape (64, 64, 3).
Now,
1. Collect all the images and labels in Python lists.
2. Convert the lists to NumPy arrays.
3. Store the NumPy arrays in .npy files using the np.save() method.
4. In the file which trains the model, load these files using the np.load() method and feed them to the model.
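Putting those steps together, here is a minimal sketch under the same assumptions as above (class folders under images/, 64 x 64 grayscale images; the file names features.npy and labels.npy are just examples):

import os
import numpy as np
from PIL import Image

master_dir = 'images'
images , labels = [] , []

# Each subdirectory (e.g. images/cats, images/dogs) becomes one class label.
for label , img_dir in enumerate( sorted( os.listdir( master_dir ) ) ):
    for name in os.listdir( os.path.join( master_dir , img_dir ) ):
        img_path = os.path.join( master_dir , img_dir , name )
        image = Image.open( img_path ).resize( ( 64 , 64 ) ).convert( 'L' )
        images.append( np.array( image ) )   # shape (64, 64), values 0-255
        labels.append( label )

# Convert the Python lists to NumPy arrays and save them to .npy files.
x = np.array( images , dtype=np.float32 ) / 255.0   # optional: scale to [0, 1]
y = np.array( labels , dtype=np.int32 )
np.save( 'features.npy' , x )
np.save( 'labels.npy' , y )

# In the training script, load the arrays back and feed them to the model.
x_train = np.load( 'features.npy' )
y_train = np.load( 'labels.npy' )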

Related

Instance annotations in KITTI-360 2D instance datasets

I am trying to count the instances of the vehicle class in each image of the KITTI-360 instance-segmented dataset. As a trial, I first tried to do it on a single image, but I am getting only one instance value when I run my code, which means that all the instances of the vehicle class are denoted by only one value in the image. I have attached the code that I used for finding this below.
I want to know why this is, or whether I am doing something wrong in my code.
"""
This file is for the verification of the instance confirmation for the pixel values
"""
This file is for the verification of the instance confirmation for the pixel values
"""
#Imports
import os
import numpy as np
import cv2
import json
# Import image from the file location
CWD = os.getcwd()
print(CWD)
instance_folder = os.path.join(CWD, 'image_my_data', "instance")
print(instance_folder)
instance_image_path = os.path.join(instance_folder, "0000004402.png")
print(instance_image_path)
instance_image_array = cv2.imread(instance_image_path)
# print the size of the image for reference
print(instance_image_array.shape)
# The following pixel locations were measured; I wanted to see the instance values at these locations.
# Pixel location as tuples
pixel_location_1 = (210, 815)
pixel_location_2 = (200, 715)
# print the pixel location, for the above values
print('pixel values at (210, 815)', instance_image_array[pixel_location_1[0], pixel_location_1[1]])
print('pixel values at (200, 715)', instance_image_array[pixel_location_2[0], pixel_location_2[1]])
Note: I chose the pixel coordinates above by opening the image in Paint and noting down the x and y coordinates at locations where I can visually see that two separate instances of the class are present.
Hope someone is able to help me with this.
I found the answer to my own question. The easiest way to find the instances in an image is to read the image with cv2.imread(image_path, cv2.IMREAD_ANYDEPTH).
The reason for doing this is that the KITTI-360 instance images are not ordinary 8-bit images. Regular imread reads them as 8-bit RGB images, which does not give the correct instance ids. Reading with the flag above keeps the image as a single channel at its original bit depth, and that single channel contains the instance id of each object.
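To illustrate, here is a minimal sketch along those lines (the file path follows the question; the KITTI-360 convention instance_id = semantic_id * 1000 + instance index, with 26 as the car class, is an assumption to be checked against the dataset documentation):

import cv2
import numpy as np

# Read the instance image at its native bit depth as a single channel.
instance_image = cv2.imread('image_my_data/instance/0000004402.png', cv2.IMREAD_ANYDEPTH)

# Every distinct value in this array is one annotated instance (or a stuff class).
unique_ids = np.unique(instance_image)
print('unique ids:', unique_ids)

# Assuming instance_id = semantic_id * 1000 + index and semantic id 26 for cars,
# the ids belonging to the car class would be:
car_ids = unique_ids[unique_ids // 1000 == 26]
print('number of car instances:', len(car_ids))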
I hope this helps someone else.

Problem with creating a 3D mask of DICOM-RT contour data using MATLAB

I am having trouble extracting a tumor from a dicom image using an RT mask. Due to GDPR I am not allowed to share the dicom images, even though they are anonymized; however, I am allowed to share the images themselves. I want to extract the drawn tumor from the CT images, using the drawn GTV stored as an RT structure, in MATLAB.
Let's say that the directory where my CT images are stored is called DicomCT and that the RT struct dicom file is called rtStruct.dcm.
I can read and visualize my CT images as follows:
V = dicomreadVolume("DicomCT");
V = squeeze(V);
volshow(V)
volume V - 3D CT image
I can load my rt structure using:
Info = dicominfo("rtStruct.dcm");
rtContours = dicomContours(Info);
I get the plot giving the different contours.
plotContour(rtContours)
Contours for the GTV of the CT image
I used this link for the information on how to create the mask such that I can apply it to the 3D CT image: https://nl.mathworks.com/help/images/create-and-display-3-d-mask-of-dicom-rt-contour-data.html#d124e5762
The dicom information tells me the image should have 3 mm slices, hence I took 3x3x3 for the referenceInfo.
referenceInfo = imref3d(size(V),3,3,3);
rtMask = createMask(rtContours, 1, referenceInfo)
When I plot my rtMask, I get a grey screen without any trace of the mask. I think that something is wrong with the way that I define the referenceInfo, but I have no idea how to fix it or what is wrong.
volshow(rtMask)
Volume plot of the RT mask
What would be the best way forward?
I was actually having a similar problem to yours a couple of days ago. I think you might have two possible problems (neither of them your fault).
Your grey screen might be a rendering error that isn't being surfaced, caused by how the volshow() function works. I found it does some things I don't understand with graphics memory and with representing numeric-type volumes vs. logical volumes. I found this out the hard way on my work PC, where I only have Intel HD graphics. Using
iptsetpref('VolumeViewerUseHardware',true)
for logical volumes worked fine for me. You can also test this by replotting the mask as double instead of logical:
rtMask = double(rtMask)
volshow(rtMask)
If it's not a rendering error caused by the interaction between your system and volshow(), it might be genuine confusion about createMask and the reference info it needs (caused by a rather poor explanation in the tutorial you linked). Using pixel-size info instead of the actual axes limits can produce a partial visualization of the segmentation, or even miss it entirely, because of scale. This nice person explained it more elegantly in this post, using the actual geometrical info of the dicom contours as the limits.
https://es.mathworks.com/support/search.html/answers/1630195-how-to-convert-dicom-rt-structure-to-binary-mask.html?fq%5B%5D=asset_type_name:answer&fq%5B%5D=category:images/basic-import-and-export&page=1
basically use
plotContour(rtContours);
ax = gca;
referenceInfo = imref3d(size(V),ax.XLim,ax.YLim,ax.ZLim);
rtMask = createMask(rtContours, 1, referenceInfo)
in addition to your code, and it might work.
I hope this could be of help to you.

Nib.load() error - Trying to load PNG and DICOM images to be resized for FCNN

I have 40 DICOM and 40 PNG images (data and their masks) for a fully convolutional network (FCNN). They are loaded into my Google Drive and have been found by the notebook via print(os.listdir(...)), as evidenced below in the first block of code, where the names of all 80 files in the above sets are listed.
I have also globbed all of the DICOM and PNG files into img_path and mask_path, both with length 40, in the second block of code below.
I am now attempting to resize all of the images to 256 x 256 before feeding them into the U-Net-like architecture for segmentation. However, I cannot load them via the nib.load() call, as it cannot work out the file type of the DCM and PNG files; for the latter it does not error, but still produces an empty set of data, as the last block of code shows.
I assume that, once the first couple of lines inside the for loop in the third block of code are rectified, pre-processing will be complete and I can move on to the U-Net implementation.
I have the current pydicom running in the Colab notebook and tried it in lieu of the nib.load() call, which produced the same error as the current one.
#import data as data
import os
import pydicom
from PIL import Image
import numpy as np
import glob
import imageio

print(os.listdir("/content/drive/My Drive/Images"))
print(os.listdir("/content/drive/My Drive/Masks"))

pixel_data = []
images = glob.glob("/content/drive/My Drive/Images/IMG*.dcm")
for image in images:
    dataset = pydicom.dcmread(image)
    pixel_data.append(dataset.pixel_array)
#print(len(images))
#print(pixel_data)

pixel_data1 = []  # ----------------> this section is the trouble area <-------
masks = glob.glob("content/drive/My Drive/Masks/IMG*.png")
for mask in masks:
    dataset1 = imageio.imread(mask)
    pixel_data1.append(dataset1.pixel_array)
print(len(masks))
print(pixel_data1)
['IMG-0004-00040.dcm', 'IMG-0002-00018.dcm', 'IMG-0046-00034.dcm', 'IMG-0043-00014.dcm', 'IMG-0064-00016.dcm',....]
['IMG-0004-00040.png', 'IMG-0002-00018.png', 'IMG-0046-00034.png', 'IMG-0043-00014.png', 'IMG-0064-00016.png',....]
0 ----------------> outputs of trouble area <--------------
[]
import glob
img_path = glob.glob("/content/drive/My Drive/Images/IMG*.dcm")
mask_path = glob.glob("/content/drive/My Drive/Masks/IMG*.png")
print(len(img_path))
print(len(mask_path))
40
40
images = []
a = []
for a in pixel_data:
    a = resize(a, (a.shape[0], 256, 256))
    a = a[:, :, :]
    for j in range(a.shape[0]):
        images.append(a[j, :, :])
No output, this section works fine.
images = np.asarray(images)
print(len(images))
10880
masks = []  # -------------------> the other trouble area <-------
b = []
for b in masks:
    b = resize(b, (b.shape[0], 256, 256))
    b = b[:, :, :]
    for j in range(b.shape[0]):
        masks.append(b[j, :, :])
No output, trying to solve the problem of how to fix this section.
masks = np.asarray(masks)  # ------------> fix the above section and this
print(len(masks))          # should have no issues
[]
You are trying to load the DICOM files again using nib.load, which does not work, as you already found out:
for name in img_path:
    a = nib.load(name)  # does not work with DICOM files
    a = a.get_data()
    a = resize(a, (a.shape[0], 256, 256))
You already have the data from the DICOM files in the pixel_data list, so you should use these:
for a in pixel_data:
    a = resize(a, (a.shape[0], 256, 256))  # or something similar, depending on the shape of pixel_data
...
Your last loop for mask in masks: is never executed, because two lines above it you set masks = [].
It looks like it should be for mask in mask_path:, since mask_path is the list of mask file names.
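A minimal sketch of how that loop could look, assuming the masks are plain 2D PNGs and resize is skimage.transform.resize as in the rest of the code (imageio.imread already returns a NumPy array, so there is no .pixel_array attribute):

import imageio
import numpy as np
from skimage.transform import resize

mask_slices = []
for path in mask_path:                 # mask_path holds the PNG file names
    m = imageio.imread(path)           # already a plain NumPy array
    m = resize(m, (256, 256))          # each mask is a single 2D image
    mask_slices.append(m)

masks = np.asarray(mask_slices)
print(masks.shape)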

Ground truth pixel labels in PASCAL VOC for semantic segmentation

I'm experimenting with FCN(Fully Convolutional Network), and trying to reproduce the results reported in the original paper (Long et al. CVPR'15).
In that paper the authors reported results on the PASCAL VOC dataset. After downloading and untarring the train-val dataset for 2012 (http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar), I noticed there are 2913 png files in the SegmentationClass subdirectory and the same number of files in the SegmentationObject subdirectory.
The pixel values in these png files seem to be multiples of 32 (e.g. 0, 128, 192, 224...), which don't fall in the range between 0 and 20. I'm just wondering what the correspondence is between the pixel values and the ground-truth labels for pixels. Or am I looking at the wrong files?
Just downloaded Pascal VOC. The pixel values in the dataset are as follows:
0: background
[1 .. 20] interval: segmented objects, classes [Aeroplane, ..., Tvmonitor]
255: void category, used for border regions (5px) and to mask difficult objects
You can find more info on the dataset here.
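For reference, a small sketch mapping those values to class names, using the standard VOC 2012 ordering (background is 0, the 20 object classes are 1-20, 255 is void):

# Standard PASCAL VOC 2012 class indices: 0 is background, 255 is void.
VOC_CLASSES = [
    'background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
    'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
    'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']

def label_name(value):
    # Map a ground-truth pixel value to its class name.
    return 'void' if value == 255 else VOC_CLASSES[value]

print(label_name(15))   # person
print(label_name(2))    # bicycle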
The previous answer by captainist discusses png files saved with color palettes; I think it's not related to the original question. The linked tensorflow code simply loads a png that was saved with a color map (palette), converts it to a numpy array (at this step the color palette is lost), and then saves the array as a png again. The numerical values are not changed in this process; only the color palette is removed.
I know that this question was asked some time ago, but I ran into a similar question when experimenting with PASCAL VOC 2012 and tensorflow deeplab.
If you look at the file download_and_convert_voc2012.sh, there are lines marked by "# Remove the colormap in the ground truth annotations". This part processes the original SegmentationClass files and produces the raw segmented image files, which have each pixel value between 0 and 20. (If you wonder why, check this post: Python: Use PIL to load png file gives strange results)
Pay attention to this magic function:
def _remove_colormap(filename):
    """Removes the color map from the annotation.

    Args:
        filename: Ground truth annotation filename.

    Returns:
        Annotation without color map.
    """
    return np.array(Image.open(filename))
I have to admit that I do not fully understand the operation by
np.array(Image.open(filename))
I have shown below a set of images for your reference (from top to bottom: original image, segmentation class, and segmentation raw class).
The values mentioned in the original question look like the "color map" values, which can be obtained with the getpalette() function from the PIL Image module.
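To make the palette mechanism concrete, here is a small sketch (using one of the files from the snippet below): for a png stored in 'P' mode, np.array() returns the raw palette indices, which for VOC are exactly the class labels, while getpalette() holds the RGB triples those indices are displayed with (the color-map values mentioned in the question):

import numpy as np
from PIL import Image

img = Image.open('SegmentationClass/2007_000129.png')   # mode 'P'

# np.array() on a 'P'-mode image returns the palette indices,
# i.e. the class labels themselves.
labels = np.array(img)
print(sorted(set(labels.flatten())))   # e.g. [0, 2, 15, 255]

# getpalette() returns a flat list [R0, G0, B0, R1, G1, B1, ...];
# entry i is the display colour for label i.
palette = img.getpalette()
print(palette[:9])                     # RGB triples for labels 0, 1 and 2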
For the annotated values of the VOC images, I use the following code snip to check them:
import numpy as np
from PIL import Image

files = [
    'SegmentationObject/2007_000129.png',
    'SegmentationClass/2007_000129.png',
    'SegmentationClassRaw/2007_000129.png',  # processed by _remove_colormap()
                                             # in captainst's answer...
]

for f in files:
    img = Image.open(f)
    annotation = np.array(img)
    print('\nfile: {}\nanno: {}\nimg info: {}'.format(
        f, set(annotation.flatten()), img))
The three images used in the code are shown below (left to right, respectively):
The corresponding outputs of the code are as follows:
file: SegmentationObject/2007_000129.png
anno: {0, 1, 2, 3, 4, 5, 6, 255}
img info: <PIL.PngImagePlugin.PngImageFile image mode=P size=334x500 at 0x7F59538B35F8>
file: SegmentationClass/2007_000129.png
anno: {0, 2, 15, 255}
img info: <PIL.PngImagePlugin.PngImageFile image mode=P size=334x500 at 0x7F5930DD5780>
file: SegmentationClassRaw/2007_000129.png
anno: {0, 2, 15, 255}
img info: <PIL.PngImagePlugin.PngImageFile image mode=L size=334x500 at 0x7F5930DD52E8>
There are two things I learned from the above output.
First, the annotation values of the images in the SegmentationObject folder are assigned per object instance. In this case there are 3 people and 3 bicycles, and the annotated values run from 1 to 6. However, for images in the SegmentationClass folder, the values are assigned by the class of the objects: all the people belong to class 15 and all the bicycles to class 2.
Second, as mkisantal has already mentioned, after the np.array() operation the color palette was removed (I "know" it by observing the results, but I still don't understand the mechanism under the hood...). We can confirm this by checking the image mode of the outputs:
both SegmentationObject/2007_000129.png and SegmentationClass/2007_000129.png have image mode=P, while
SegmentationClassRaw/2007_000129.png has image mode=L. (ref: The modes of PIL Image)

Converting PIL image to VIPS image

I'm working on some large histological images using the vips image library. Together with the image I have an array of coordinates. I want to make a binary mask that masks out the part of the image within the polygon created by the coordinates. I first tried to do this using the vips draw function, but this is very inefficient and takes forever (in my real code the images are about 100000 x 100000 px and the array of polygons is very large).
I then tried creating the binary mask using PIL, and this works great. My problem is converting the PIL image into a vips image. Both have to be vips images to be able to use the multiply command. I also want to write to and read from memory, as I believe this is faster than writing to disk.
In the im_PIL.save(memory_area,'TIFF') command I have to specify an image format, but since I'm creating a new image, I'm not sure what to put here.
The Vips.Image.new_from_memory(..) command returns: TypeError: constructor returned NULL
from gi.overrides import Vips
from PIL import Image, ImageDraw
import io
# Load the image into a Vips-image
im_vips = Vips.Image.new_from_file('images/image.tif')
# Coordinates for my mask
polygon_array = [(368, 116), (247, 174), (329, 222), (475, 129), (368, 116)]
# Making a new PIL image of only 1's
im_PIL = Image.new('L', (im_vips.width, im_vips.height), 1)
# Draw polygon to the PIL image filling the polygon area with 0's
ImageDraw.Draw(im_PIL).polygon(polygon_array, outline=1, fill=0)
# Write the PIL image to memory ??
memory_area = io.BytesIO()
im_PIL.save(memory_area,'TIFF')
memory_area.seek(0)
# Read the PIL image from memory into a Vips-image
im_mask_from_memory = Vips.Image.new_from_memory(memory_area.getvalue(), im_vips.width, im_vips.height, im_vips.bands, im_vips.format)
# Close the memory buffer ?
memory_area.close()
# Apply the mask with the image
im_finished = im_vips.multiply(im_mask_from_memory)
# Save image
im_finished.tiffsave('mask.tif')
You are saving from PIL in TIFF format, but then using the vips new_from_memory constructor, which is expecting a simple C array of pixel values.
The easiest fix is to use new_from_buffer instead, which will load an image in some format, sniffing the format from the string. Change the middle part of your program like this:
# Write the PIL image to memory in TIFF format
memory_area = io.BytesIO()
im_PIL.save(memory_area,'TIFF')
image_str = memory_area.getvalue()
# Read the PIL image from memory into a Vips-image
im_mask_from_memory = Vips.Image.new_from_buffer(image_str, "")
And it should work.
The vips multiply operation on two 8-bit uchar images will make a 16-bit ushort image, which will look very dark, since the numeric range will still be 0 - 255. You could either cast it back to uchar again (append .cast("uchar") to the multiply line) before saving, or use 255 instead of 1 for your PIL mask.
You can also move the image from PIL to VIPS as a simple array of bytes. It might be slightly faster.
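A minimal sketch of that byte route, assuming the mask stays a single-band 8-bit ('L' mode) PIL image as in the question (note that new_from_memory does not copy the data, so keep mask_bytes alive while the vips image is in use):

# Raw 8-bit pixels straight from the PIL mask, no TIFF encode/decode step
mask_bytes = im_PIL.tobytes()

# Wrap the bytes as a single-band uchar vips image of the same size
im_mask_from_memory = Vips.Image.new_from_memory(
    mask_bytes, im_PIL.width, im_PIL.height, 1, Vips.BandFormat.UCHAR)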
You're right, the draw operations in vips don't work well with very large images in Python. It's not hard to write a thing in vips to make a mask image of any size from a set of points (just combine lots of && and < with the usual winding rule), but using PIL is certainly simpler.
You could also consider having your poly mask as an SVG image. libvips can load very large SVG images efficiently (it renders sections on demand), so you just magnify it up to whatever size you need for your raster images.