Pillow Image.convert multiframe TIFF mode '1' to 'RGB' - not saving all frames

I have a multiframe TIFF image with mode '1' that I want to convert to a multiframe TIFF with mode 'RGB'. Only a single frame is saved in the output. Am I missing something?
from PIL import Image

def q(file_path='test.tiff'):
    with Image.open(file_path) as image:
        if image.mode != 'RGB':
            n = image.convert('RGB')
            n.save(fp='new.tiff', format="TIFF", save_all=True, compression="None")
    return

You need to iterate over the sequence of frames; convert() only operates on the current frame. See here.
Code adapted from the linked documentation:
from PIL import Image

with Image.open("animation.gif") as im:
    im.seek(1)  # skip to the second frame
    try:
        while 1:
            im.seek(im.tell() + 1)
            # do something to im
    except EOFError:
        pass  # end of sequence
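Applied to the original question, a minimal (untested) sketch, assuming Pillow's ImageSequence helper and the append_images save parameter:

from PIL import Image, ImageSequence

def convert_multiframe(src='test.tiff', dst='new.tiff'):
    with Image.open(src) as image:
        # convert() returns a new single-frame image, so collect one per frame
        frames = [frame.convert('RGB') for frame in ImageSequence.Iterator(image)]
    # save the first frame and append the remaining ones
    frames[0].save(dst, format="TIFF", save_all=True, append_images=frames[1:])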

Related

Determining the size of a PNG from raw bytes

I'm trying to load a sequence of PNG images concatenated together as bytes; I know the number of images, but not their individual file sizes.
Loading the first image is easy with something like
import io
from PIL import Image
buffer = io.BytesIO(image_bytearray)
image = Image.open(buffer)
image.load()
However, I'm not sure how to handle the subsequent images. Two approaches that seem to work but might be too brittle:
Split the bytes based on the PNG header e.g. image_bytearray.split(b"\x89PNG\r\n\x1a\n"). This seems to work, but I'm worried in some edge cases this could appear in a non-header location.
Use BytesIO.tell() to determine how much of the stream was read. This appears to be 4 bytes less than the actual file size; I can add 4 to account for this, but I'm not sure this won't change in a later version.
Here's a simple example that illustrates the two approaches:
import io
import sys

from PIL import Image
import PIL


def setup_bytes():
    """Concatenate the bytes of some images. The actual images don't matter."""
    files = ["Tests/images/hopper.png", "Tests/images/test-card.png"]
    bytes_out = bytes()
    for file in files:
        im = Image.open(file)
        buffer = io.BytesIO()
        im.save(buffer, format="PNG")
        print(f"writing {len(buffer.getvalue())} bytes")
        bytes_out += buffer.getvalue()
    return bytes_out


def read_split(bytes_in):
    png_header = b"\x89PNG\r\n\x1a\n"
    images_out = []
    image_byte_splits = bytes_in.split(png_header)
    for image_bytes in image_byte_splits:
        if len(image_bytes) == 0:
            continue
        # add back the header
        image = Image.open(io.BytesIO(png_header + image_bytes))
        image.load()
        print(f"read {len(png_header) + len(image_bytes)} bytes")
        images_out.append(image)
    return images_out


def read_streaming(bytes_in):
    images_out = []
    bytes_read = 0
    # Read the images back from the bytes (without knowing the sizes).
    while bytes_read < len(bytes_in):
        buffer = io.BytesIO(bytes_in[bytes_read:])
        image = Image.open(buffer)
        image.load()
        images_out.append(image)
        # Start the next read at the end of the current image.
        # These extra 4 bytes appear necessary.
        read = buffer.tell() + 4
        print(f"read {read} bytes?")
        bytes_read += read
    return images_out


def main():
    print(f"python sys.version = {sys.version}")
    print(f"Pillow version = {PIL.__version__}")
    b = setup_bytes()
    read_split(b)
    read_streaming(b)


main()
My questions are:
Is splitting safe? Is there a chance that the header could also appear in the body of an image?
Is adding an offset to tell() safe? Is there a way to get the image loading to leave the position at the actual end of the file?
Is there a better way to do this in general? Some of the classes in PngImagePlugin.py look like they'd be useful for examining the chunks without actually decompressing the image data.
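One dependable alternative relies on the PNG container format itself rather than Pillow internals: every PNG is an 8-byte signature followed by chunks, each consisting of a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC, terminated by an IEND chunk. Walking the chunks gives the exact file size without decompressing anything. A minimal sketch (the function name is mine):

import struct

def png_byte_length(data, offset=0):
    """Return the exact size of the PNG that starts at `offset` in `data`."""
    pos = offset + 8  # skip the 8-byte PNG signature
    while True:
        # each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        pos += 8 + length + 4
        if ctype == b"IEND":
            return pos - offset

With that, the loop in read_streaming could advance by png_byte_length(bytes_in, bytes_read) instead of relying on tell() + 4.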

loading images into an array python

I am trying to read my own images, which are 28 x 28 in dimension. The images are stored in a folder called my_own_images, and the image names are 2828_my_own_3.png, 2828_my_own_7.png, etc.
I am using imageio.imread(image_file_name, as_gray=True). However, I get an error for the as_gray argument. I am trying to convert the images to grayscale.
The code is below:
import imageio
import glob
import numpy
import matplotlib.pyplot as plt
%matplotlib inline

my_dataset = []
for image_file_name in glob.glob('my_own_images/2828_my_own_?.png'):
    print("loading ... ", image_file_name)
    # use the filename to set the correct label
    label = int(image_file_name[-5:-4])
    # load image data from png files into an array
    img_array = imageio.imread(image_file_name, as_gray=True)
    print(img_array.shape)
    # reshape from 28x28 to list of 784 values, invert values
    img_data = 255.0 - img_array.reshape(784)
    # then scale data to range from 0.01 to 1.0
    img_data = (img_data / 255.0 * 0.99) + 0.01
    print(numpy.min(img_data))
    print(numpy.max(img_data))
    # append label and image data to test data set
    record = numpy.append(label, img_data)
    print(record)
    my_dataset.append(record)
    pass
The error I'm getting:
open() got an unexpected keyword argument 'as_gray'
It seems we are reading the same book (Make Your Own Neural Network). I encountered the same error with Anaconda (I had imageio 2.2.0 installed), so I updated imageio to 2.3.0, relaunched Jupyter Notebook, and re-ran the code; it worked for me. (Hope it helps.)
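If upgrading is not an option, a fallback sketch (assuming, per the answer above, that as_gray arrived in imageio 2.3.0; the channel averaging below is an approximation, not imageio's exact grayscale conversion):

import imageio

try:
    img_array = imageio.imread('my_own_images/2828_my_own_3.png', as_gray=True)
except TypeError:
    # older imageio: read normally and average the channels ourselves
    img_array = imageio.imread('my_own_images/2828_my_own_3.png')
    if img_array.ndim == 3:
        img_array = img_array.mean(axis=-1)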

Traffic Sign detection and Recognition using Neural networks

I wanted to detect and recognize traffic signs from a video feed. I used the TensorFlow ML framework for recognition of signs and a Haar cascade classifier for detection of signs.
Here is the code:
import cv2
import numpy as np
import tensorflow as tf
import os, time
import threading

# constants
IMAGE_SIZE = 200.0
MATCH_THRESHOLD = 3

def SignRecognizer():
    # to neglect all tensorflow compilation warnings
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
    # path to the blob
    image_path = '/root/Desktop/blob.jpg'
    # read the image data
    image_data = tf.gfile.FastGFile(image_path, 'rb').read()
    # load label file, strip off carriage return \n
    label_lines = [line.rstrip() for line in tf.gfile.GFile("/root/Desktop/another_model/retrained_labels.txt")]
    # unpersist graph from file
    with tf.gfile.FastGFile("/root/Desktop/another_model/retrained_graph.pb", 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        _ = tf.import_graph_def(graph_def, name='')
    with tf.Session() as sess:
        # feed the image_data as input to the graph and get the first prediction
        softmax_tensor = sess.graph.get_tensor_by_name("final_result:0")
        predictions = sess.run(softmax_tensor,
                               {'DecodeJpeg/contents:0': image_data})
        # sort to show labels of first prediction in order of confidence
        top_k = predictions[0].argsort()[-len(predictions[0]):][::-1]
        for node_id in top_k:
            human_string = label_lines[node_id]
            print("%s" % (human_string))
            break

roundabout_cascade = cv2.CascadeClassifier("/root/Desktop/tsp/haarcascade_roundabout.xml")
videocapture = cv2.VideoCapture(0)
scale_factor = 1.3

while 1:
    ret, pic = videocapture.read()
    # do roundabout detection on street image
    gray = cv2.cvtColor(pic, cv2.COLOR_RGB2GRAY)
    signs = roundabout_cascade.detectMultiScale(pic, scaleFactor=1.4, minNeighbors=6)
    # initialize ORB and BFMatcher
    orb = cv2.ORB_create()
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    # find the keypoints and descriptors for roadsign image
    roadsign = cv2.imread("/root/Desktop/tsp/roundabout.jpg", 0)
    kp_r, des_r = orb.detectAndCompute(roadsign, None)
    for (x, y, w, h) in signs:
        # cv2.rectangle(pic, (x, y), (x + w, y + h), (255, 0, 0), 2)
        # obtain object from street image
        obj = gray[y:y + h, x:x + w]
        color_image = pic[y:y + h, x:x + w]
        cv2.imwrite("/root/Desktop/blob.jpg", color_image)
        cv2.imshow('blob', color_image)
        # start a new thread and run SignRecognizer on it
        t = threading.Thread(name="SignRecognizer", target=SignRecognizer)
        # set the thread as a daemon to prevent blocking of the main program
        t.setDaemon(True)
        t.start()
        ratio = IMAGE_SIZE / obj.shape[1]
        obj = cv2.resize(obj, (int(IMAGE_SIZE), int(obj.shape[0] * ratio)))
        # find the keypoints and descriptors for object
        kp_o, des_o = orb.detectAndCompute(obj, None)
        if len(kp_o) == 0 or des_o is None:
            continue
        # match descriptors
        matches = bf.match(des_r, des_o)
        # draw object on street image, if threshold met
        if len(matches) >= MATCH_THRESHOLD:
            cv2.rectangle(pic, (x, y), (x + w, y + h), (255, 0, 0), 2)
            font = cv2.FONT_HERSHEY_SIMPLEX
            cv2.putText(pic, 'Roundabout sign', (x, y), font, 1, (255, 255, 255), 1, cv2.LINE_AA)
    cv2.imshow('roundabout_signs', pic)
    k = cv2.waitKey(30) & 0xFF
    if k == 2:
        break

cv2.waitKey(0)
cv2.destroyAllWindows()
The SignRecognizer function reads the blob image file and recognizes the sign using the model I created with the TensorFlow ML framework.
I used VideoCapture(0) to start the webcam and simulate a live video feed.
I also used OpenCV's ORB (Oriented FAST and Rotated BRIEF) to remove false positives.
I used the threading module to run SignRecognizer on another thread and set it as a daemon so that the main program wasn't blocked during recognition.
Everything works great, but there seems to be a little lag in spite of using the threading module. Is there any way to make it lag-free?
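One plausible source of the lag: SignRecognizer reloads the graph and opens a new tf.Session on every detection, and a fresh thread is spawned each time. A common pattern is a single long-lived worker thread fed through a queue, with the model loaded once up front. A rough sketch (the recognize placeholder stands in for the TensorFlow code above):

import queue
import threading

jobs = queue.Queue(maxsize=1)  # keep only the freshest blob

def recognize(image_data):
    # placeholder: run the (already loaded) graph on image_data here
    print("recognizing %d bytes" % len(image_data))

def recognizer_worker():
    # load the graph and create the session once, before this loop
    while True:
        image_data = jobs.get()  # blocks until the main loop hands over a blob
        recognize(image_data)

threading.Thread(target=recognizer_worker, daemon=True).start()

# in the detection loop, instead of starting a new thread per detection:
# try:
#     jobs.put_nowait(blob_bytes)
# except queue.Full:
#     pass  # drop this one rather than stall the video loop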

Test labels for regression caffe, float not allowed?

I am doing regression using caffe, and my test.txt and train.txt files are like this:
/home/foo/caffe/data/finetune/flickr/3860781056.jpg 2.0
/home/foo/caffe/data/finetune/flickr/4559004485.jpg 3.6
/home/foo/caffe/data/finetune/flickr/3208038920.jpg 3.2
/home/foo/caffe/data/finetune/flickr/6170430622.jpg 4.0
/home/foo/caffe/data/finetune/flickr/7508671542.jpg 2.7272
My problem is that caffe does not seem to allow float labels like 2.0. When I use float labels while reading, for example, the 'test.txt' file, caffe only recognizes
a total of 1 images
which is wrong.
But when I change, for example, the 2.0 to 2 in the file (and likewise in the following lines), caffe now gives
a total of 2 images
implying that the float labels are responsible for the problem.
Can anyone help me solve this? I definitely need to use float labels for regression, so does anyone know a workaround or solution? Thanks in advance.
EDIT
For anyone facing a similar issue, use caffe to train Lenet with CSV data might be of help. Thanks to @Shai.
When using the image dataset input layer (with either lmdb or leveldb backend) caffe only supports one integer label per input image.
If you want to do regression, and use floating point labels, you should try and use the HDF5 data layer. See for example this question.
In python you can use h5py package to create hdf5 files.
import h5py, os
import caffe
import numpy as np

SIZE = 224  # fixed size for all images
with open('train.txt', 'r') as T:
    lines = T.readlines()
# If you do not have enough memory, split the data into
# multiple batches and generate multiple separate h5 files
X = np.zeros((len(lines), 3, SIZE, SIZE), dtype='f4')
y = np.zeros((len(lines), 1), dtype='f4')
for i, l in enumerate(lines):
    sp = l.split(' ')
    img = caffe.io.load_image(sp[0])
    img = caffe.io.resize(img, (SIZE, SIZE, 3))  # resize to fixed size
    # you may apply other input transformations here...
    # Note that the transformation should take img from SIZE-by-SIZE-by-3
    # and transpose it to 3-by-SIZE-by-SIZE, for example:
    transposed_img = img.transpose((2, 0, 1))[::-1, :, :]  # RGB -> BGR
    X[i] = transposed_img
    y[i] = float(sp[1])
with h5py.File('train.h5', 'w') as H:
    H.create_dataset('X', data=X)  # note the name X given to the dataset!
    H.create_dataset('y', data=y)  # note the name y given to the dataset!
with open('train_h5_list.txt', 'w') as L:
    L.write('train.h5')  # list all h5 files you are going to use
Once you have all h5 files and the corresponding test files listing them you can add an HDF5 input layer to your train_val.prototxt:
layer {
  type: "HDF5Data"
  top: "X"  # same name as given in create_dataset!
  top: "y"
  hdf5_data_param {
    source: "train_h5_list.txt"  # do not give the h5 files directly, but the list
    batch_size: 32
  }
  include { phase: TRAIN }
}
Clarification:
When I say "caffe only supports one integer label per input image" I do not mean that the leveldb/lmdb containers are limited; I mean the tools of caffe, specifically the convert_imageset tool.
On closer inspection, it seems that caffe stores data of type Datum in leveldb/lmdb, and the "label" property of this type is defined as an integer (see caffe.proto). Thus, when using the caffe interface to leveldb/lmdb, you are restricted to a single int32 label per image.
Shai's answer already covers saving float labels to HDF5 format. In case LMDB is required/preferred, here's a snippet on how to create an LMDB from float data (adapted from this github comment):
import lmdb
import numpy as np
import caffe

def scalars_to_lmdb(scalars, path_dst):
    db = lmdb.open(path_dst, map_size=int(1e12))
    with db.begin(write=True) as in_txn:
        for idx, x in enumerate(scalars):
            content_field = np.array([x])
            # expand to shape (1, 1, 1)
            content_field = np.expand_dims(content_field, axis=0)
            content_field = np.expand_dims(content_field, axis=0)
            content_field = content_field.astype(float)
            dat = caffe.io.array_to_datum(content_field)
            # keys must be bytes under Python 3
            in_txn.put('{:0>10d}'.format(idx).encode('ascii'),
                       dat.SerializeToString())
    db.close()
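For example, storing the float labels from the question might look like this (the output path is hypothetical):

scalars_to_lmdb([2.0, 3.6, 3.2, 4.0, 2.7272], '/home/foo/labels_lmdb')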
I ended up transposing, switching the channel order, and using unsigned ints rather than floats to get results. I suggest reading an image back from your HDF5 file to make sure it displays correctly.
First, read the image as unsigned ints:
import numpy as np
from PIL import Image

img = np.array(Image.open('images/' + image_name))
Then change the channel order from RGB to BGR:
img = img[:, :, ::-1]
Finally, switch from Height x Width x Channels to Channels x Height x Width:
img = img.transpose((2, 0, 1))
Merely changing the shape will scramble your image and ruin your data!
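A quick way to convince yourself (a minimal sketch): reshape keeps the bytes in place and merely reinterprets them, while transpose actually reorders the pixels.

import numpy as np

img = np.arange(2 * 2 * 3).reshape(2, 2, 3)  # a tiny H x W x C "image"
chw = img.transpose((2, 0, 1))               # correct: C x H x W
wrong = img.reshape(3, 2, 2)                 # same shape, scrambled pixels
print(np.array_equal(chw, wrong))            # False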
To read back the image:
import h5py
from skimage.viewer import ImageViewer

with h5py.File(h5_filename, 'r') as hf:
    images_test = hf.get('images')
    targets_test = hf.get('targets')
    for i, img in enumerate(images_test):
        print(targets_test[i])
        viewer = ImageViewer(img.reshape(SIZE, SIZE, 3))
        viewer.show()
Here's a script I wrote which deals with two labels (steer and speed) for a self-driving car task: https://gist.github.com/crizCraig/aa46105d34349543582b177ae79f32f0
Besides @Shai's answer above, I wrote a MultiTaskData layer that supports float-typed labels.
Its main idea is to store the labels in the float_data field of Datum; the MultiTaskDataLayer then parses them as labels for any number of tasks according to the values of task_num and label_dimension set in net.prototxt. The related files include: caffe.proto, multitask_data_layer.hpp/cpp, io.hpp/cpp.
You can easily add this layer to your own caffe and use it like this (this is an example for a face expression label distribution learning task, in which "exp_label" can be a float-typed vector such as [0.1, 0.1, 0.5, 0.2, 0.1] representing the probability distribution over face expression classes):
name: "xxxNet"
layer {
name: "xxx"
type: "MultiTaskData"
top: "data"
top: "exp_label"
data_param {
source: "expression_ld_train_leveldb"
batch_size: 60
task_num: 1
label_dimension: 8
}
transform_param {
scale: 0.00390625
crop_size: 60
mirror: true
}
include:{ phase: TRAIN }
}
layer {
name: "exp_prob"
type: "InnerProduct"
bottom: "data"
top: "exp_prob"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 8
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "exp_loss"
type: "EuclideanLoss"
bottom: "exp_prob"
bottom: "exp_label"
top: "exp_loss"
include:{ phase: TRAIN }
}

sws_scale screws up last pixel row in smaller x264 mp4 encoding

I am muxing pictures in the PIX_FMT_ARGB format into an mp4 video.
It all works well, except that the last pixel row of the outgoing image is screwed up: in most cases the last row is completely black, sometimes there are other colors; it seems to somehow depend on the machine it runs on.
I am absolutely sure that the error must be in sws_scale, as I am saving the images before and after the scaling. The input images do not have the error, but after sws_scale() I save the yuv image and the error is apparent.
Here is an example:
Original
Yuvfile (after sws_scale)
At the bottom of the Yuvfile, you will see the black row.
This is how I do the scaling (it follows the official ffmpeg examples, more or less):
static int sws_flags = SWS_FAST_BILINEAR | SWS_ACCURATE_RND;

if (img_convert_ctx == NULL)
{
    img_convert_ctx = sws_getContext(srcWidth, srcHeight,
                                     PIX_FMT_ARGB,
                                     codecContext->width, codecContext->height,
                                     codecContext->pix_fmt,
                                     sws_flags, NULL, NULL, NULL);
    if (img_convert_ctx == NULL)
    {
        av_log(c, AV_LOG_ERROR, "%s", "Cannot initialize the conversion context\n");
        exit(1);
    }
}

fill_image(tmp_picture, pic, pic_size, frame_count, ptr->srcWidth, ptr->srcHeight);

sws_scale(img_convert_ctx, tmp_picture->data, tmp_picture->linesize,
          0, srcHeight, picture->data, picture->linesize);
I also tried a number of different SWS_ flags, but all yield the same result.
Could this be a bug in sws_scale or am I doing something wrong? I am using the latest version of the ffmpeg libraries.
The problem was this function:
fill_image(tmp_picture, pic, pic_size, frame_count, ptr->srcWidth, ptr->srcHeight);
It did not copy the input image to tmp_picture correctly; it skipped the last line.
Moral: do not trust years-old functions :D
180 is not a multiple of 8; this could be the reason for the black row. Can you try scaling to the nearest multiple of 8, say 184, or 192 (a multiple of 16)? Non-h264 codecs need the height to be a multiple of 8.
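For reference, rounding a dimension up to the nearest multiple is a one-liner; a sketch (in Python, since the arithmetic is the point):

def round_up(n, m):
    """Round n up to the nearest multiple of m."""
    return ((n + m - 1) // m) * m

print(round_up(180, 8))   # 184
print(round_up(180, 16))  # 192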