Ground truth pixel labels in PASCAL VOC for semantic segmentation

I'm experimenting with FCN (Fully Convolutional Network) and trying to reproduce the results reported in the original paper (Long et al., CVPR'15).
In that paper the authors report results on the PASCAL VOC dataset. After downloading and untarring the train-val dataset for 2012 (http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar), I noticed there are 2913 png files in the SegmentationClass subdirectory and the same number of files in SegmentationObject.
The pixel values in these png files seem to be multiples of 32 (e.g. 0, 128, 192, 224...), which don't fall in the range between 0 and 20. I'm wondering what the correspondence is between these pixel values and the ground truth pixel labels. Or am I looking at the wrong files?

Just downloaded Pascal VOC. The pixel values in the dataset are as follows:
0: background
[1 .. 20] interval: segmented objects, classes [Aeroplane, ..., Tvmonitor]
255: void category, used for border regions (5px) and to mask difficult objects
You can find more info on the dataset here.
The previous answer by captainst discusses png files saved with color palettes; I think it's not related to the original question. The linked TensorFlow code simply loads a png that was saved with a color map (palette), converts it to a numpy array (at this step the color palette is lost), then saves the array as a png again. The numerical values are not changed in this process; only the color palette is removed.
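As a minimal sketch of that round trip (using PIL and numpy; the filename is just an example VOC label file):
import numpy as np
from PIL import Image

img = Image.open('SegmentationClass/2007_000129.png')  # palettized png, mode 'P'
arr = np.array(img)  # values are palette indices, i.e. the class labels 0..20 and 255
Image.fromarray(arr).save('label_raw.png')  # saved without a palette (mode 'L'); values unchanged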

I know that this question was asked some time ago, but I ran into a similar question when trying PASCAL VOC 2012 with tensorflow deeplab.
If you look at the file download_and_convert_voc2012.sh, there are lines marked by "# Remove the colormap in the ground truth annotations". This part processes the original SegmentationClass files and produces the raw segmentation files, in which each pixel value lies between 0 and 20 (plus 255 for void). (If you wonder why, check this post: Python: Use PIL to load png file gives strange results.)
Pay attention to this magic function:
import numpy as np
from PIL import Image

def _remove_colormap(filename):
    """Removes the color map from the annotation.

    Args:
        filename: Ground truth annotation filename.

    Returns:
        Annotation without color map.
    """
    return np.array(Image.open(filename))
I have to admit that I did not fully understand the operation
np.array(Image.open(filename))
at first. It works because the annotation png is stored in palette mode ('P'): each pixel holds a palette index rather than an RGB triple, and those indices are exactly the class labels. np.array() reads the raw indices and simply drops the palette.
I have shown below a set of images for your reference (from top to bottom: original image, segmentation class, and segmentation raw class).

The values mentioned in the original question look like the "color map" values, which can be obtained with the getpalette() function of the PIL Image module.
To check the annotated values of the VOC images, I use the following code snippet:
import numpy as np
from PIL import Image

files = [
    'SegmentationObject/2007_000129.png',
    'SegmentationClass/2007_000129.png',
    'SegmentationClassRaw/2007_000129.png',  # processed by _remove_colormap()
                                             # in captainst's answer...
]
for f in files:
    img = Image.open(f)
    annotation = np.array(img)
    print('\nfile: {}\nanno: {}\nimg info: {}'.format(
        f, set(annotation.flatten()), img))
The three images used in the code are shown below (left to right, respectively):
The corresponding outputs of the code are as follows:
file: SegmentationObject/2007_000129.png
anno: {0, 1, 2, 3, 4, 5, 6, 255}
img info: <PIL.PngImagePlugin.PngImageFile image mode=P size=334x500 at 0x7F59538B35F8>
file: SegmentationClass/2007_000129.png
anno: {0, 2, 15, 255}
img info: <PIL.PngImagePlugin.PngImageFile image mode=P size=334x500 at 0x7F5930DD5780>
file: SegmentationClassRaw/2007_000129.png
anno: {0, 2, 15, 255}
img info: <PIL.PngImagePlugin.PngImageFile image mode=L size=334x500 at 0x7F5930DD52E8>
There are two things I learned from the above output.
First, the annotation values of the images in the SegmentationObject folder are assigned per object instance: in this case there are 3 people and 3 bicycles, so the annotated values run from 1 to 6. For images in the SegmentationClass folder, however, the values are assigned by the class of the objects: all the people belong to class 15 and all the bicycles to class 2.
Second, as mkisantal has already mentioned, the np.array() operation removes the color palette (consistent with the palette-index mechanism described above). We can confirm this by checking the image mode of the outputs:
Both SegmentationObject/2007_000129.png and SegmentationClass/2007_000129.png have image mode=P, while SegmentationClassRaw/2007_000129.png has image mode=L. (ref: The modes of PIL Image)
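To see where values like 0, 128, 192... in the original question come from, you can inspect the palette itself. A small sketch with getpalette() (same example file as above; the printed colors assume the standard VOC color map):
from PIL import Image

img = Image.open('SegmentationClass/2007_000129.png')  # mode 'P'
palette = img.getpalette()  # flat list [R0, G0, B0, R1, G1, B1, ...]
for k in (0, 2, 15):  # background, bicycle, person
    print(k, palette[3 * k: 3 * k + 3])  # e.g. class 15 (person) -> [192, 128, 128]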

Nib.load() error - Trying to load PNG and DICOM images to be resized for FCNN

I have 40 DICOM and 40 PNG images (data and their masks) for a fully convolutional network, loaded into my Google Drive and found by the notebook via print(os.listdir(...)), as evidenced in the first block of code below, where the names of all 80 files in the above sets are listed.
I have also globbed all of the DICOM and PNG files into img_path and mask_path, both with lengths of 40, in the second block of code below.
I am now attempting to resize all of the images to 256 x 256 before inputting them into the U-Net-like architecture for segmentation. However, I cannot load them via the nib.load() call, as it cannot work out the file type of the DCM and PNG files; for the latter it does not even error, but still produces an empty set of data, as the last block of code shows.
I assume that, once the first couple of lines inside the for loop in the third block of code are rectified, pre-processing should be complete and I can move on to the U-Net implementation.
I have the current pydicom running in the Colab notebook and tried it in lieu of the nib.load() call, which produced the same error as the current one.
#import data as data
import os
import pydicom
from PIL import Image
import numpy as np
import glob
import imageio

print(os.listdir("/content/drive/My Drive/Images"))
print(os.listdir("/content/drive/My Drive/Masks"))

pixel_data = []
images = glob.glob("/content/drive/My Drive/Images/IMG*.dcm")
for image in images:
    dataset = pydicom.dcmread(image)
    pixel_data.append(dataset.pixel_array)
#print(len(images))
#print(pixel_data)

pixel_data1 = []  # ----------------> this section is the trouble area <-------
masks = glob.glob("content/drive/My Drive/Masks/IMG*.png")
for mask in masks:
    dataset1 = imageio.imread(mask)
    pixel_data1.append(dataset1.pixel_array)
print(len(masks))
print(pixel_data1)
['IMG-0004-00040.dcm', 'IMG-0002-00018.dcm', 'IMG-0046-00034.dcm', 'IMG-0043-00014.dcm', 'IMG-0064-00016.dcm',....]
['IMG-0004-00040.png', 'IMG-0002-00018.png', 'IMG-0046-00034.png', 'IMG-0043-00014.png', 'IMG-0064-00016.png',....]
0 ----------------> outputs of trouble area <--------------
[]
import glob
img_path = glob.glob("/content/drive/My Drive/Images/IMG*.dcm")
mask_path = glob.glob("/content/drive/My Drive/Masks/IMG*.png")
print(len(img_path))
print(len(mask_path))
40
40
images = []
a = []
for a in pixel_data:
    a = resize(a, (a.shape[0], 256, 256))
    a = a[:, :, :]
    for j in range(a.shape[0]):
        images.append(a[j, :, :])
No output, this section works fine.
images = np.asarray(images)
print(len(images))
10880
masks = []  # -------------------> the other trouble area <-------
b = []
for b in masks:
    b = resize(b, (b.shape[0], 256, 256))
    b = b[:, :, :]
    for j in range(b.shape[0]):
        masks.append(b[j, :, :])
No output; trying to solve the problem of how to fix this section.
masks = np.asarray(masks)  # ------------> fix the above section and this
print(len(masks))          # should have no issues
[]
You are trying to load the DICOM files again using nib.load, which does not work, as you already found out:
for name in img_path:
    a = nib.load(name)  # does not work with DICOM files
    a = a.get_data()
    a = resize(a, (a.shape[0], 256, 256))
You already have the data from the DICOM files in the pixel_data list, so you should use that:
for a in pixel_data:
    a = resize(a, (a.shape[0], 256, 256))  # or something similar, depending on the shape of pixel_data
    ...
Your last loop, for mask in masks:, is never executed because two lines above it you set masks = [].
It looks like it should be for mask in mask_path: instead; mask_path is the list of mask file names.
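A minimal corrected sketch of the two trouble sections, assuming the Drive paths from the question (note the leading slash that was missing from the original masks glob, and that imageio.imread() already returns a numpy array, so there is no .pixel_array attribute to access):
import glob
import imageio
from skimage.transform import resize  # assumed to be the resize() used in the question

pixel_data1 = []
mask_path = glob.glob('/content/drive/My Drive/Masks/IMG*.png')  # leading slash restored
for mask in mask_path:
    pixel_data1.append(imageio.imread(mask))  # already an array

print(len(mask_path))  # should now print 40

masks = []
for b in pixel_data1:  # iterate over the loaded arrays, not the empty masks list
    masks.append(resize(b, (256, 256)))  # assuming 2-D, single-channel masks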

Interpolation between two images with different pixelsize

For my application, I want to interpolate between two images (CT to PET).
Therefore I map between them like this:
[X,Y,Z] = ndgrid(linspace(1,size(imagedata_ct,1),size_pet(1)),...
                 linspace(1,size(imagedata_ct,2),size_pet(2)),...
                 linspace(1,size(imagedata_ct,3),size_pet(3)));
new_imageData_CT = interp3(imagedata_ct,X,Y,Z,'nearest',-1024);
The size of my new image new_imageData_CT matches the PET image. The problem is that the data of my new image is not correctly scaled, so it appears compressed. I think the reason is that the voxel sizes of the two images are different and are not involved in the interpolation. For example:
CT image size : 512x512x1027
CT voxel size[mm] : 1.5x1.5x0.6
PET image size : 192x126x128
PET voxel size[mm] : 2.6x2.6x3.12
So how can I take the voxel size into account in the interpolation?
You need to perform the matching in the patient coordinate system, but there is more to consider than just the resolution and the voxel size. You also need to synchronize the positions (and maybe the orientations, but differing orientations are unlikely) of the two volumes.
You may find this thread helpful to find out which DICOM tags describe the volume and how to calculate the transformation matrices for converting between the patient coordinate system (x, y, z in millimeters) and the volume coordinate system (column, row, slice number).
You have to make sure that the volume positions are comparable, as the positions of the slices in the CT and PET do not necessarily refer to the same origin. The easy way to do this is to compare the DICOM attribute Frame Of Reference UID (0020,0052) of the CT and PET slices. For all slices that share the same Frame Of Reference UID, the position of the slice in the DICOM header refers to the same origin.
If the datasets do not contain this tag, it is going to be much more difficult, unless you just take it as an assumption. There are methods to deduce the matching slices of two different volumes from the contents of the pixel data, referred to as "registration", but this is a science of its own. See the link from Hugues Fontenelle.
BTW: in your example, you are not going to find a matching voxel in both volumes for every position, as the volumes have different physical extents. E.g. for the x-direction:
CT: 512 * 1.5 = 768 millimeters
PET: 192 * 2.6 ≈ 499 millimeters
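To illustrate the voxel-size part only (ignoring origin and orientation, which the Frame Of Reference discussion above covers), here is a sketch in Python/NumPy of interpolating on a grid expressed in millimeters; the same idea carries over to the MATLAB ndgrid/interp3 code in the question by multiplying the index grids by the voxel sizes:
import numpy as np
from scipy.interpolate import RegularGridInterpolator

ct = np.zeros((512, 512, 1027), dtype=np.float32)  # stand-in for imagedata_ct
ct_spacing = (1.5, 1.5, 0.6)    # CT voxel size in mm
pet_shape = (192, 126, 128)
pet_spacing = (2.6, 2.6, 3.12)  # PET voxel size in mm

# Physical (mm) coordinates of the CT voxel centers along each axis
ct_axes = [np.arange(n) * s for n, s in zip(ct.shape, ct_spacing)]
interp = RegularGridInterpolator(ct_axes, ct, method='nearest',
                                 bounds_error=False, fill_value=-1024)

# Sample the CT at the physical (mm) coordinates of the PET voxel centers
pet_axes = [np.arange(n) * s for n, s in zip(pet_shape, pet_spacing)]
pts = np.stack(np.meshgrid(*pet_axes, indexing='ij'), axis=-1)
ct_on_pet_grid = interp(pts)  # shape == pet_shape; assumes a shared origin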
I'll leave it to someone else to answer the question, but I think you may be asking the wrong one. I lack context, of course, but at first glance Matlab isn't the right tool for the job.
Have a look at ITK (a C++ library with Python wrappers) and the "Multi-modal 3D image registration" article.
Try 3DSlicer (it has a GUI for the previous tool).
Try FreeSurfer (similar, focused on brain scans).
After you've done that registration step, you can export the resulting images (now of identical size and spacing) and continue with your interpolation in Matlab if you wish (or with the same tools).
There is a toolbox in Slicer called PETCTFUSION which aligns the PET scan to the CT image.
You can install it in the new version of Slicer.
In the module's Display panel shown below, options to select a colorizing scheme for the PET dataset are provided:
Grey will provide white-to-black colorization, with black indicating the highest count values.
Heat will provide a warm color scale, with dark red at the lowest and white at the highest count values.
Spectrum will provide a color scale that goes from cool (dark blue) at the low-count end to white at the highest.
This panel also provides a means to adjust the window and level of both PET and CT volumes.
I normally use the resampleinplace tool after the registration. You can find it in the package Registration, and then Resample Image.
If you would like to know more about the PETCTFUSION, there is a link below:
https://www.slicer.org/wiki/Modules:PETCTFusion-Documentation-3.6
Since Slicer is compatible with Python, you can use the Python interactor to run your own code too.
And let me know if you face any problems.

To imcrop non-textual PDF for Mathematica?

I want to prepare one selection of data from my high-quality PDF document, which has no textual elements (just a plot) and was originally prepared by Matlab.
I do not want to give the whole picture to my colleagues because it is too overwhelming.
#1 Tools in Matlab
I know the thread How can I read an image file that is stored in PDF format (much like reading a jpeg file with I = imread('image.jpg'))?, but my colleagues have had discouraging experiences with that approach, and for my task PDF should be enough because my data is just a high-quality plot without textual elements.
The most relevant thread is this one: How to extract data from pdf file in matlab?
Most attempts are based on extracting PDF to TXT, like How to Read PDF file in Matlab?, about pdftotext.
I now want to imcrop the PDF so that the output can be used in the time-series analysis of Mathematica here, but I did not find that Matlab's default imcrop tool supports PDF (Crop an Image).
Some findings
Show and Save as PDF, based on the answer. I do pdf = Import["filename.pdf"]; Show[pdf[[1]], PlotRange -> {{50, 200}, {100, 300}}] and I see a well-selected picture in the image viewer, but when exporting the picture back to Mathematica I fail and see the complete picture. Why? PlotRange does not crop; it only puts a white mask on top of the picture, which can be separated again in Mathematica.
Going from Show to ImageCrop, based on this answer. Wrong approach; I confused it with ImageTake.
Going from Show to ImageTake, based on this answer. Show and ImageTake are not interchangeable, because ImageTake takes its parameters in reversed order, {ymin, ymax}, {xmin, xmax}, according to the manual. However, I could not manage to get the correct selection by just reversing the parameters. Why?
Comments for Mathematica
It would be nice if the selected regions corresponded to each other.
Therefore, I would like to have a visual tool to select the appropriate area from the figure.
I notice that some aliasing occurs when enlarging the original image.
It would be nice to know how Mathematica handles such cases with ImageTake.
How can you prepare an imcrop of a PDF image for the time-series toolbox of Mathematica?
I think this question is about image extraction.
However, I have extended the question in the thread Better Colormap of Matlab and Image Extraction for Time-Series Toolbox of Mathematica? for Mathematica.
Mathematica will import your pdf as a graphics object, which you can 'crop' using PlotRange:
pdf = Import["filename.pdf"];
Show[pdf[[1]], PlotRange -> {{50, 200}, {100, 300}}]
Note the values are {{xmin, xmax}, {ymin, ymax}} in "points".
You can also rasterize and then use ImageTake:
ImageTake[Rasterize[pdf[[1]]], {10, 100}, {20, 100}]
Here the values are {ymin, ymax}, {xmin, xmax} (note the reversed order).
Note the [[1]] here is effectively the page number; I'm pretty sure Import returns a list of pages even if the pdf is a single page.
If you want to actually extract plot data, that's a whole other question. For that I'd suggest mathematica.stackexchange.com, and provide an example file.
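If a route outside Mathematica is acceptable, the same rasterize-then-crop idea can be sketched in Python, assuming the third-party pdf2image package (a wrapper around poppler) is installed; the filename and the pixel box are placeholders:
from pdf2image import convert_from_path

page = convert_from_path('filename.pdf', dpi=300)[0]  # first page as a PIL image
cropped = page.crop((50, 100, 200, 300))  # pixel box: (left, upper, right, lower)
cropped.save('cropped.png')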

Using dicomwrite with color images

I am trying to write a sequence of color images to a DICOM file in Matlab. Each image is of type uint16. The sequence is stored in a 4D matrix named output of size 200x360x3x360 (number of rows x number of columns x number of channels x number of images). When I execute dicomwrite(output,'outputfile.dcm'), it gives the following error:
It says the data bit depth is 8, but I've ensured that each image is 16-bit, so I'm not sure what's going wrong.
The documentation for dicomwrite says it can write color images as well. In fact, dicomread can read color DICOM images such that the size of the matrix which stores the read data is 200x360x3x360, so I guess it should be possible to write color images with dicomwrite as well. Any help in this regard is appreciated. There is a related post, but it doesn't talk about a color image sequence.
The comment by JohnnyQ is correct.
This page, down in section A.8.5.4, lists the Multi-frame True Color SC Image IOD Content Constraints (partial quote):
In the Image Pixel Module, the following constraints apply:
Samples per Pixel (0028,0002) shall be 3
Bits Allocated (0028,0100) shall be 8
Bits Stored (0028,0101) shall be 8
High Bit (0028,0102) shall be 7
Pixel Representation (0028,0103) shall be 0
It seems Matlab will not do the conversion for you, so you should down-convert each 16-bit color channel to 8-bit for DICOM.
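A minimal sketch of that down-conversion (shown in Python/NumPy; the uint16 array is a stand-in for the output variable from the question, and a plain linear rescale is assumed, though a window/level-based mapping may preserve contrast better):
import numpy as np

output = np.zeros((200, 360, 3, 360), dtype=np.uint16)  # stand-in for the question's data
output8 = (output.astype(np.float64) / 65535.0 * 255.0).round().astype(np.uint8)
In MATLAB itself, im2uint8(output) from the Image Processing Toolbox performs the equivalent uint16-to-uint8 rescaling before the dicomwrite call.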

Dicom: Matlab versus ImageJ grey level

I am processing a group of DICOM images using both ImageJ and Matlab.
In order to do the processing, I need to find spots that have grey levels between 110 and 120 in an 8-bit-depth version of the image.
The thing is: the images that Matlab and ImageJ show me are different, even though they come from the same source file.
I assume that one of them is performing some sort of conversion of the grey levels when reading the file or before displaying it. But which one?
And in that case, how can I calibrate them so that they display the same image?
The following image shows a comparison of the image read.
In the case of the imageJ, I just opened the application and opened the DICOM image.
In the second case, I used the following MATLAB script:
[image] = dicomread('I1400001');
figure (1)
imshow(image,[]);
title('Original DICOM image');
So which one is changing the original image, and if that's the case, how can I modify it so that both versions look the same?
It appears that by default ImageJ uses the Window Center and Window Width tags in the DICOM header to perform window and level contrast adjustment on the raw pixel data before displaying it, whereas the MATLAB code is using the full range of data for the display. Taken from the ImageJ User's Guide:
16 Display Range of DICOM Images: With DICOM images, ImageJ sets the initial display range based on the Window Center (0028, 1050) and Window Width (0028, 1051) tags. Click Reset on the W&L or B&C window and the display range will be set to the minimum and maximum pixel values.
So, setting ImageJ to use the full range of pixel values should give you an image to match the one displayed in MATLAB. Alternatively, you could use dicominfo in MATLAB to get those two tag values from the header, then apply window/leveling to the data before displaying it. Your code will probably look something like this (using the formula from the first link above):
img = dicomread('I1400001');
imgInfo = dicominfo('I1400001');
c = double(imgInfo.WindowCenter);
w = double(imgInfo.WindowWidth);
imgScaled = 255.*((double(img)-(c-0.5))/(w-1)+0.5); % Rescale the data
imgScaled = uint8(min(max(imgScaled, 0), 255)); % Clip the edges
Note that 1) double is used to convert to double precision to avoid integer arithmetic, 2) the data is assumed to be unsigned 8-bit integers (which is what the result is converted back to), and 3) I didn't use the variable name image because there is already a function with that name. ;)
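For comparison, the same window/level mapping can be sketched in Python with pydicom (using the filename from the question; note that Window Center/Width can be multi-valued in some files, in which case one pair has to be picked first):
import numpy as np
import pydicom

ds = pydicom.dcmread('I1400001')
img = ds.pixel_array.astype(np.float64)
c = float(ds.WindowCenter)  # (0028,1050)
w = float(ds.WindowWidth)   # (0028,1051)

scaled = 255.0 * ((img - (c - 0.5)) / (w - 1.0) + 0.5)  # same formula as above
scaled = np.clip(scaled, 0, 255).astype(np.uint8)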
A normalized CT image (e.g. after the modality LUT transformation) will have intensity values ranging from -1024 to over 2000 Hounsfield units (HU), so an image processing filter should work within this data range. An RGB display driver, on the other hand, can only display 256 shades of gray. To overcome this limitation, most typical medical viewers apply window leveling to create a view of the image in which the anatomy of interest has the proper contrast for the RGB display driver (mapping the image data of interest to 256 or fewer shades of gray). One way to define the window level settings is to use the Window Center (0028,1050) and Window Width (0028,1051) tags. Also, a single CT image can have multiple window level values, each pair being basically one view of the anatomy of interest. So using view data for image processing, instead of the actual image data, may not produce consistent results.