Tensorflow tf.train.Saver saves suspiciously large .ckpt files?

I'm working with a reasonably sized net (1 convolutional layer, 2 fully connected layers). Every time I save variables using tf.train.Saver, the .ckpt files take up half a gigabyte of disk space each (512 MB, to be exact). Is this normal? I have a Caffe net with the same architecture that requires only a 7 MB .caffemodel file. Is there a particular reason why Tensorflow saves such large files?
Many thanks.

Hard to tell how large your net is from what you've described -- the number of connections between two fully connected layers scales up quadratically with the size of each layer, so perhaps your net is quite large depending on the size of your fully connected layers.
If you'd like to save space in the checkpoint files, you could replace this line:
saver = tf.train.Saver()
with the following:
saver = tf.train.Saver(tf.trainable_variables())
By default, tf.train.Saver() saves all variables in your graph -- including the variables created by your optimizer to accumulate gradient information. Telling it to save only trainable variables means it will save only the weights and biases of your network and discard the accumulated optimizer state. Your checkpoints will probably be a lot smaller, with the tradeoff that you may experience slower training for the first few batches after you resume, while the optimizer re-accumulates gradient information. In my experience it doesn't take long at all to get back up to speed, so personally I think the tradeoff is worth it for the smaller checkpoints.
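One follow-up detail when restoring from such a checkpoint: the optimizer's accumulator variables are no longer stored, so they need an initial value from somewhere else. A minimal sketch of one way to handle that (TF 1.x; "model.ckpt" is a placeholder path):

import tensorflow as tf

# ... build the same graph: model, loss, optimizer ...

saver = tf.train.Saver(tf.trainable_variables())  # weights and biases only

with tf.Session() as sess:
    # Give every variable (including the optimizer's accumulators,
    # which are not in the checkpoint) an initial value first...
    sess.run(tf.global_variables_initializer())
    # ...then overwrite the trainable variables from the checkpoint.
    saver.restore(sess, "model.ckpt")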

Maybe you can try (in Tensorflow 1.0):
saver.save(sess, filename, write_meta_graph=False)
which doesn't save the MetaGraph information.
See:
https://www.tensorflow.org/versions/master/api_docs/python/tf/train/Saver
https://www.tensorflow.org/programmers_guide/meta_graph
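Both suggestions can also be combined; roughly (where "model.ckpt" and step are placeholders for your own path and step counter):

# Save only the trainable variables and skip the MetaGraph.
saver = tf.train.Saver(tf.trainable_variables())
saver.save(sess, "model.ckpt", global_step=step, write_meta_graph=False)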

Typically you only save tf.global_variables() (which is shorthand for tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES), i.e. the collection of global variables). This collection is meant to include the variables that are necessary for restoring the state of the model: things like the current moving averages for batch normalization, the global step, the states of the optimizer(s) and, of course, the tf.GraphKeys.TRAINABLE_VARIABLES collection. Variables of a more temporary nature, such as the gradients, are collected in LOCAL_VARIABLES; it is usually not necessary to store them, and they can take up a lot of disk space.
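To see which variables in your own graph fall into which collection (and therefore what a given Saver would write to disk), you can print the collections after building the graph; a small diagnostic sketch:

import tensorflow as tf

# ... graph construction ...

# What tf.train.Saver() saves by default:
print(tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES))
# Just the weights and biases:
print(tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES))
# Temporary state that usually does not need to be checkpointed:
print(tf.get_collection(tf.GraphKeys.LOCAL_VARIABLES))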

Related

MATLAB: Are there any problems with many (millions) small files compared to few (thousands) large files?

I'm working on a real-time test software in MATLAB. On user input I want to extract the value of one (or a few neighbouring) pixels from 50-200 high resolution images (~25 MB).
My problem is that the total image set is too big (~2000 images) to store in RAM, so I need to read each of the 50-200 images from disk after each user input, which of course is way too slow!
So I was thinking about splitting the images into sub-images (~100x100 pixels) and saving these separately. This would make the image-read process quick enough.
Are there any problems I should be aware of with this approach? For instance, I've read about people having trouble copying many small files; will this affect me, e.g. by making the image reads slower?
rahnema1 is right - imread(...,'PixelRegion') will speed up the read operation. If that is not enough for you, even if your files are not fragmented, maybe it is time to think about some kind of database?
Disk operations are always the bottleneck. First we switch to disk caches, then distributed storage, then RAID, and after some more time we end up with in-memory databases. You should decide which access speed is reasonable for your use case.

Train a huge model inception with keras

I need to train an inception model with more than 400 000 images.
I know I can't load it all into memory, since it's too big.
So I will certainly train it batch by batch rather than epoch by epoch (and so load every batch from disk).
But won't that be very slow?
Do you know if there is a different way of doing it?
I also want to apply different, random transformations to my images during training.
I looked at the ImageDataGenerator class, but it's incompatible with the images I have.
So, is there a way to do that without the generator?
Thank you!
You can use the fit_generator method (https://keras.io/models/model/#fit_generator) of the model. This still loads images from disk batch by batch, but it is done in parallel with training and has less overhead. You can write your own generator to apply the transformations you want (https://wiki.python.org/moin/Generators).
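For the custom-generator route, a rough sketch of what that could look like (only an illustration: the file list, the labels array, the target size and the random horizontal flip are assumptions, and the fit_generator argument names follow Keras 2):

import numpy as np
from keras.preprocessing import image

def disk_batch_generator(filenames, labels, batch_size, target_size=(299, 299)):
    """Yield (x, y) batches forever, loading images from disk on demand."""
    while True:
        order = np.random.permutation(len(filenames))    # reshuffle each pass
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            batch = []
            for i in idx:
                img = image.load_img(filenames[i], target_size=target_size)
                x = image.img_to_array(img) / 255.0
                if np.random.rand() < 0.5:               # random horizontal flip
                    x = x[:, ::-1, :]
                batch.append(x)
            yield np.stack(batch), labels[idx]

# model.fit_generator(disk_batch_generator(train_files, train_labels, 32),
#                     steps_per_epoch=len(train_files) // 32,
#                     epochs=10)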
If you need faster data access you can take a look at HDF5. You can store the images in HDF5 to provide faster indexing and loading for your program. (http://www.h5py.org/)

tensorflow store training data on GPU memory

I am pretty new to tensorflow. I used to use Theano for deep learning development. I notice a difference between the two, namely where input data can be stored.
Theano supports shared variables to store input data in GPU memory, which reduces the data transfer between CPU and GPU.
In tensorflow, we need to feed data into placeholder, and the data can come from CPU memory or files.
My question is: is it possible to store input data on GPU memory for tensorflow? or does it already do it in some magic way?
Thanks.
If your data fits on the GPU, you can load it into a constant on GPU from e.g. a numpy array:
with tf.device('/gpu:0'):
    tensorflow_dataset = tf.constant(numpy_dataset)
One way to extract minibatches would then be to slice that array at each step, using tf.slice, instead of feeding it through a placeholder:
batch = tf.slice(tensorflow_dataset, [index, 0], [batch_size, -1])
There are many possible variations around that theme, including using queues to prefetch the data to GPU dynamically.
It is possible, as has been indicated, but make sure that it is actually useful before devoting too much effort to it. At least at present, not every operation has GPU support, and the list of operations without such support includes some common batching and shuffling operations. There may be no advantage to putting your data on GPU if the first stage of processing is to move it to CPU.
Before trying to refactor code to use on-GPU storage, try at least one of the following:
1) Start your session with device placement logging to log which ops are executed on which devices:
config = tf.ConfigProto(log_device_placement=True)
sess = tf.Session(config=config)
2) Try to manually place your graph on GPU by putting its definition in a with tf.device('/gpu:0'): block. This will throw exceptions if ops are not GPU-supported.
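Putting the two checks together, a rough sketch (the stand-in array, its shape and batch_size are assumptions; substitute your own data and ops):

import numpy as np
import tensorflow as tf

numpy_dataset = np.random.rand(1000, 784).astype(np.float32)   # stand-in data
batch_size = 128

# (2) Pin the data and the ops that use it to the GPU; ops without a GPU
# kernel will raise an error when the graph is run.
with tf.device('/gpu:0'):
    tensorflow_dataset = tf.constant(numpy_dataset)
    batch = tf.slice(tensorflow_dataset, [0, 0], [batch_size, -1])

# (1) Log where each op actually ends up.
config = tf.ConfigProto(log_device_placement=True)
sess = tf.Session(config=config)
sess.run(batch)   # placement is logged here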

Optimizing compression using HDF5/H5 in Matlab

Using Matlab, I am going to generate several data files and store them in H5 format as 20x1500xN, where N is an integer that can vary, but is typically around 2300. Each file will have 4 different data sets with equal structure. Thus, I will quickly run into a storage problem. My two questions:
Is there any reason not to split the 4 different data sets, and instead just save them as 4x20x1500xN? I would prefer having them split, since they are different signal modalities, but if there is any computational/compression advantage to not having them separated, I will join them.
Using Matlab's built-in compression, I set deflate=9 (and DataType=single). However, I have now realized that using deflate multiplies my computational time by 5. I realize this could have something to do with my ChunkSize, which I just set to 20x1500x5 without any reasoning behind it. Is there a strategic way to optimize computational load w.r.t. deflation and compression time?
Thank you.
1- Splitting or merging? It won't make a difference in the compression procedure, since it is performed in blocks.
2- Your choice of chunk shape does seem, indeed, bad. The chunk shape determines the shape and size of each block that will be compressed independently. The bad news is that each of your chunks is 600 kB, which is much larger than the L2 cache, so your CPU is likely twiddling its thumbs waiting for data to come in. Depending on the nature of your data and the access pattern you will use the most (read the whole array at once, random reads, sequential reads...) you may want to target the L1 or L2 sizes, or something in between. Here are some experiments done with a Python library that may serve you as a guide.
Once you have selected your chunk size (how many bytes each compression block will have), you have to choose a chunk shape. I'd recommend the shape that most closely fits your reading pattern if you are doing partial reads, or filling in fastest-axis-first if you want to read the whole array at once. In your case, this will be something like 1x1500x10, I think (the second axis being the fastest, the last one the second fastest, and the first the slowest; correct me if I am mistaken).
Lastly, keep in mind that the details are quite dependent on the specific machine you run it on: the CPU, the quality and load of the hard drive or SSD, the speed of the RAM... so fine tuning will always require some experimentation.
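If you want a quick way to experiment with chunk shapes and deflate levels, the same trade-offs are easy to explore with h5py in Python (a sketch only; the array is a random stand-in for one of your 20x1500xN data sets, and the chunk shape mirrors the suggestion above):

import numpy as np
import h5py

# Stand-in for one 20x1500xN single-precision data set (N ~ 2300).
data = np.random.rand(20, 1500, 2300).astype(np.float32)

with h5py.File('test.h5', 'w') as f:
    # Chunk shape 1x1500x10: each compressed block is
    # 1 * 1500 * 10 * 4 bytes = 60 kB, well under typical L2 cache sizes.
    f.create_dataset('signal', data=data,
                     chunks=(1, 1500, 10),
                     compression='gzip', compression_opts=9)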

Deep-learning for mapping large binary input

This question may come across as being too broad, but I will try to make every sub-topic as specific as possible.
My setting:
Large binary input (2-4 KB per sample) (no images)
Large binary output of the same size
My target: Using Deep Learning to find a mapping function from my binary input to the binary output.
I have already generated a large training set (> 1'000'000 samples), and can easily generate more.
In my (admittedly limited) knowledge of Neural networks and deep learning, my plan was to build a network with 2000 or 4000 input nodes, the same number of output nodes and try different amounts of hidden layers.
Then train the network on my data set (waiting several weeks if necessary), and checking whether there is a correlation between in- and output.
Would it be better to input my binary data as single bits into the net, or as larger entities (like 16 bits at a time, etc)?
For bit-by-bit input:
I have tried "Neural Designer", but the software crashes when I try to load my data set (even on small ones with 6 rows), and I had to edit the project save files to set Input and Target properties. And then it crashes again.
I have tried OpenNN, but it tries to allocate a matrix of size (hidden_layers * input nodes) ^ 2, which, of course, fails (sorry, no 117GB of RAM available).
Is there a suitable open-source framework available for this kind of binary mapping function regression? Do I have to implement my own?
Is Deep learning the right approach?
Has anyone experience with these kind of tasks?
Sadly, I could not find any papers on deep learning + binary mapping.
I will gladly add further information, if requested.
Thank you for providing guidance to a noob.
You have a dataset containing pairs of binary valued vectors, with a max length of 4,000 bits. You want to create a mapping function between the pairs. On the surface, that doesn't seem unreasonable - imagine a 64x64 image with binary pixels – this only contains 4,096 bits of data and is well within the reach of modern neural networks.
As you're dealing with binary values, a multi-layered Restricted Boltzmann Machine would seem like a good choice. How many layers you add to the network really depends on the level of abstraction in the data.
You don’t mention the source of the data, but I assume you expect there to be a decent correlation. Assuming the location of each bit is arbitrary and is independent of its near neighbours, I would rule out a convolutional neural network.
A good open source framework to experiment with is Torch - a scientific computing framework with wide support for machine learning algorithms. It has the added benefit of utilising your GPU to speed up processing thanks to its CUDA implementation. This would hopefully spare you from waiting several weeks for a result.
If you provide more background, then maybe we can home in on a solution…
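In the meantime, if you want a quick baseline before setting up RBMs in Torch, a plain fully connected network that maps input bits to output bits is easy to sketch (this is only an illustration in Keras, not the recommendation above; the layer sizes and optimizer are arbitrary assumptions):

from keras.models import Sequential
from keras.layers import Dense

n_bits = 4000                      # ~4 KB samples, one unit per bit

model = Sequential([
    Dense(1024, activation='relu', input_shape=(n_bits,)),
    Dense(1024, activation='relu'),
    Dense(n_bits, activation='sigmoid'),   # one independent probability per output bit
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# X, Y: float arrays of 0s and 1s with shape (num_samples, n_bits)
# model.fit(X, Y, batch_size=128, epochs=10)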