Correct me if I am wrong.
The 'classic' way to pass images through torchvision transforms is to
use Compose, as shown in its doc page. This, however, requires passing a PIL Image as input.
An alternative is to use ConvertImageDtype with torch.nn.Sequential. This 'bypasses'
the need for a PIL Image, and in my case it is much faster because I work with numpy arrays.
My problem is that results are not identical.
Below is an example with custom Normalize.
I would like to use torch.nn.Sequential (tr) because it is faster for my needs,
but the error compared to Compose (tr2) is very large (~810).
from PIL import Image
import torchvision.transforms as T
import numpy as np
import torch
o = np.random.rand(64, 64, 3) * 255
o = np.array(o, dtype=np.uint8)
i = Image.fromarray(o)
tr = torch.nn.Sequential(
    T.Resize(224, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ConvertImageDtype(torch.float),
    T.Normalize([0.48145466, 0.4578275, 0.40821073], [0.26862954, 0.26130258, 0.27577711]),
)
tr2 = T.Compose([
    T.Resize(224, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
])
out = tr(torch.from_numpy(o).permute(2,0,1).contiguous())
out2 = tr2(i)
print(((out - out2) ** 2).sum())
The interpolation method seems to matter A LOT, and if I use the default BILINEAR the error is ~7, but I need to use BICUBIC.
The problem seems to lie in ConvertImageDtype vs ToTensor, because if I replace
ToTensor with ConvertImageDtype the results are identical (I cannot do it the other way around,
because ToTensor is not a subclass of Module, so I cannot use it with nn.Sequential).
However, the following gives identical results:
tr = torch.nn.Sequential(
    T.ConvertImageDtype(torch.float),
)
tr2 = T.Compose([
    T.ToTensor(),
])
out = tr(torch.from_numpy(o).permute(2,0,1).contiguous())
out2 = tr2(i)
print(((out - out2) ** 2).sum())
This means that the interpolation changes something in the results, which matters only
when I use ToTensor vs ConvertImageDtype.
Any input is appreciated.
This is documented here:
The output image might be different depending on its type: when downsampling, the interpolation of PIL images and tensors is slightly different, because PIL applies antialiasing. This may lead to significant differences in the performance of a network. Therefore, it is preferable to train and serve a model with the same input types. See also below the antialias parameter, which can help making the output of PIL images and tensors closer.
Passing antialias=True produces almost identical results.
This is interesting, because the doc says that it can be set to True for InterpolationMode.BILINEAR mode only.
Yet I am using BICUBIC and it still works.
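For completeness, a minimal sketch of the tensor pipeline with antialiasing enabled, continuing the snippet above (assuming a torchvision version whose Resize accepts the antialias argument):
tr_aa = torch.nn.Sequential(
    T.Resize(224, interpolation=T.InterpolationMode.BICUBIC, antialias=True),
    T.CenterCrop(224),
    T.ConvertImageDtype(torch.float),
    T.Normalize([0.48145466, 0.4578275, 0.40821073], [0.26862954, 0.26130258, 0.27577711]),
)
out_aa = tr_aa(torch.from_numpy(o).permute(2, 0, 1).contiguous())
print(((out_aa - out2) ** 2).sum())  # should now be much closer to zero than with tr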
I'm working on a very unbalanced dataset for semantic segmentation. The majority of the image masks contain only background (0 pixels) and no feature (1 pixels), leading to very biased and inefficient U-Net networks.
I'm looking for code to select (from folders) only the masks, and the corresponding images, that have at least one feature to segment. Any ideas?
I think there are a lot of ways to achieve this, but the first one that comes to my mind is to check whether there are any non-zero values in your mask.
You forgot to tell us what framework you use, so let's assume it's Python; you could try something like:
import os
import numpy as np
import cv2

numpy_images_array = list()
numpy_masks_array = list()

# Sort both listings so masks and images pair up by filename
for mask_name, img_name in zip(sorted(os.listdir(MASKS_DIR)), sorted(os.listdir(IMG_DIR))):
    mask = cv2.imread(os.path.join(MASKS_DIR, mask_name))
    if np.any(mask != 0):  # or: if len(np.unique(mask)) > 1
        numpy_masks_array.append(mask)
        numpy_images_array.append(cv2.imread(os.path.join(IMG_DIR, img_name)))
I have six monthly raster maps of ET data in TIFF format, for the months April to September, and I would like to get the average/mean of those six maps as a single mean ET map.
ETmaps_04.tif
ETmaps_05.tif
ETmaps_06.tif
ETmaps_07.tif
ETmaps_08.tif
ETmaps_09.tif
ETmaps_average.tif (I need such a map!)
Any idea?
I prefer doing it with the GDAL package in Python 3.7. Thanks.
I have made a few assumptions here, but given that those hold, this should solve your problem:
All data can fit in memory
All images have the same size (and same geotransform)
All images have a single band
You should be able to modify the code in case some of the above assumptions do not hold.
from osgeo import gdal
import numpy as np

file_paths = ['''List of paths to your files''']

# We build one large np array of all images (this requires that all data fits in memory)
res = []
for f in file_paths:
    ds = gdal.Open(f)
    res.append(ds.GetRasterBand(1).ReadAsArray())  # We assume that all rasters have a single band
stacked = np.dstack(res)  # We assume that all rasters have the same dimensions
mean = np.mean(stacked, axis=-1)

# Finally save a new raster with the result.
# This assumes that all inputs have the same geotransform since we just copy the first
driver = gdal.GetDriverByName('GTiff')
result = driver.CreateCopy('ETmaps_average.tif', gdal.Open(file_paths[0]))
result.GetRasterBand(1).WriteArray(mean)
result = None  # Setting the dataset to None closes it and flushes the data to disk
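If you want to verify the size and geotransform assumptions before averaging, a rough sanity check could look like this, continuing the snippet above (a sketch using standard GDAL dataset attributes):
ref = gdal.Open(file_paths[0])
for f in file_paths[1:]:
    ds = gdal.Open(f)
    # All rasters should share the same pixel dimensions and georeferencing
    assert (ds.RasterXSize, ds.RasterYSize) == (ref.RasterXSize, ref.RasterYSize), f
    assert ds.GetGeoTransform() == ref.GetGeoTransform(), f
    ds = None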
Does pyspark's KernelDensity.estimate work correctly on a dataset that is normally distributed? I get an error when I try that. I have filed https://issues.apache.org/jira/browse/SPARK-20803 (KernelDensity.estimate in pyspark.mllib.stat.KernelDensity throws net.razorvine.pickle.PickleException when input data is normally distributed (no error when data is not normally distributed))
Example code:
from pyspark.mllib.stat import KernelDensity

vecRDD = sc.parallelize(colVec)
kd = KernelDensity()
kd.setSample(vecRDD)
kd.setBandwidth(3.0)
# Find density estimates for the given values
densities = kd.estimate(samplePoints)
When the data is NOT Gaussian, I get, e.g.:
5.6654703477e-05,0.000100010001,0.000100010001,0.000100010001,.....
For reference, using Scala with Gaussian data:
Code:
val vecRDD = sc.parallelize(colVec)
val kd = new KernelDensity().setSample(vecRDD).setBandwidth(3.0)
// Find density estimates for the given values
val densities = kd.estimate(samplePoints)
I get:
[0.04113814235801906,1.0994865517293571E-163,0.0,0.0,.....
I faced the same issue and was able to track the problem down to a very minimal test case. If you're using NumPy in Python to generate the data in the RDD, then that's the problem!
import numpy as np
from pyspark.mllib.stat import KernelDensity

kd = KernelDensity()
kd.setSample(sc.parallelize([0.0, 1.0, 2.0, 3.0]))                # THIS WORKS
# kd.setSample(sc.parallelize([0.0, np.float32(1.0), 2.0, 3.0]))  # THIS FAILS
kd.setBandwidth(0.35)
kd.estimate([0.0, 1.0])
If this was your issue as well, simply convert the NumPy data to Python base types until the Spark issue is fixed. You can do that with the np.asscalar function.
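For example, a minimal sketch of that conversion, continuing the snippet from the question (colVec stands for your own data and is just illustrative):
# Coerce NumPy scalar types back to plain Python floats before building the RDD
clean_vec = [float(x) for x in colVec]  # or np.asscalar(x) on older NumPy versions
vecRDD = sc.parallelize(clean_vec)
kd.setSample(vecRDD)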
I am using Keras (version 2.0.0) and I'd like to make use of pretrained models such as VGG16.
In order to get started, I ran the example from the [Keras documentation site](https://keras.io/applications/) for extracting features with VGG16:
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
import numpy as np
model = VGG16(weights='imagenet', include_top=False)
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
features = model.predict(x)
The preprocess_input() function bothers me
(the function does zero-centering by the mean pixel, as can be seen by looking at the source code).
Do I really have to preprocess input data (validation/test data) before using a trained model?
a)
If yes, one can conclude that you always have to be aware of what preprocessing steps were performed during the training phase?!
b)
If no: Does preprocessing of validation/test data cause a bias?
I appreciate your help.
Yes, you should use the preprocessing step. You could retrain the model without it, but then the first layers would have to learn to center your data, which is a waste of parameters.
If you do not re-center, your performance will suffer.
Great thread on reddit : https://www.reddit.com/r/MachineLearning/comments/3q7pjc/why_is_removing_the_mean_pixel_value_from_each/
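For reference, the zero-centering mentioned in the question roughly amounts to converting RGB to BGR and subtracting a per-channel ImageNet mean pixel. A minimal sketch of that idea (the exact constants are the commonly quoted ImageNet means and are an assumption here, not copied from the Keras source):
import numpy as np

def approx_vgg_preprocess(x):
    # x: float array of shape (batch, height, width, 3), RGB order, values in [0, 255]
    x = x[..., ::-1].astype('float64')         # RGB -> BGR (illustrative re-implementation)
    x -= np.array([103.939, 116.779, 123.68])  # subtract assumed per-channel mean (BGR order)
    return x
Whatever the exact constants, the key point stands: apply exactly the same preprocessing to training, validation, and test data.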
I'm trying to fit a mixed normal model to some data using scikit-learn's DPGMM algorithm. One of the advantages advertised in [0] is that I don't need to specify the number of components, which is good, because I do not know the number of components in my data. The documentation states that I only need to specify an upper bound. However, it looks very much like that is not true:
>>> data = numpy.random.normal(loc = 0.0, scale = 1.0, size = 1000)
>>> from sklearn.mixture import DPGMM
>>> d = DPGMM(n_components=5)
>>> d.fit(data.reshape(-1,1))
DPGMM(alpha=1.0, covariance_type='diag', init_params='wmc', min_covar=None,
n_components=5, n_iter=10, params='wmc', random_state=None, thresh=None,
tol=0.001, verbose=0)
>>> d.n_components
5
>>> d.means_
array([[-0.02283383],
[ 0.06259168],
[ 0.00390097],
[ 0.02934676],
[-0.05533165]])
As you can see, the fitting reports five components (the upper bound) even for data clearly sampled from just one normal distribution.
Am I doing something wrong? Did I misunderstand something?
Thanks a lot in advance,
Lukas
[0] http://scikit-learn.org/stable/modules/mixture.html#dpgmm
I recently had similar doubts about the results of this DPGMM implementation. If you check the provided example, you will notice that DPGMM always returns a model with n_components components; the trick is to remove the redundant components afterwards. This can be done with the predict function.
Unfortunately, this important piece is hidden in a comment in the code example:
# as the DP will not use every component it has access to
# unless it needs it, we shouldn't plot the redundant components
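In practice, a rough sketch of that filtering step, continuing the snippet from the question (variable names are just illustrative):
import numpy as np

labels = d.predict(data.reshape(-1, 1))  # hard-assign every sample to a component
used = np.unique(labels)                 # components the DP actually used
print(len(used))                         # effective number of components
print(d.means_[used])                    # means of the non-redundant components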
Perhaps look at using an improved sklearn solution for this kind of problem, namely a Bayesian Gaussian Mixture. With this model, the suggested prior number of components must be given, but once trained, the model assigns weightings to each component, which essentially indicate their relevance. Here is a pretty cool visual demo of BGMM in action.
Once you have experimented with training a few BGMMs on your data, you can get a feel for a sensible estimate of the number of components for your given problem.
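As a minimal sketch with the data from the question (assuming a scikit-learn version that provides BayesianGaussianMixture):
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

data = np.random.normal(loc=0.0, scale=1.0, size=1000)
bgmm = BayesianGaussianMixture(n_components=5,  # still just an upper bound
                               weight_concentration_prior_type='dirichlet_process')
bgmm.fit(data.reshape(-1, 1))
print(bgmm.weights_)  # most of the weight should end up on a single component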