Extremely low accuracy on own data in Caffe

I'm trying to train a network on my own data. The whole dataset consists of 256x256 JPEG images. There are 236 object classes. The training and validation sets have ~247K and ~61K images, respectively. I've made LMDBs from them using the $CAFFE_ROOT/build/tools/convert_imageset utility.
Just to get started, I'm using CaffeNet's topology for my model. During training I come across the weird message "Data layer prefetch queue empty", which I have never seen before.
Moreover, the network initially has an abnormally low accuracy (~0.00378378), and over the next 1000 iterations it reaches at most ~0.01 and does not increase further (it just fluctuates).
What am I doing wrong, and how can I improve the accuracy?
Runtime log:
http://paste.ubuntu.com/15568421/
Model:
http://paste.ubuntu.com/15568426/
Solver:
http://paste.ubuntu.com/15568430/
P.S. I'm using the latest version of Caffe, Ubuntu Server 14.04 LTS, and a g2.2xlarge instance on AWS.
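For what it's worth, with 236 classes an accuracy around 1/236 ≈ 0.0042 is chance level, so the network may simply not be learning anything from the data it sees. One quick sanity check (a minimal sketch, assuming the lmdb and caffe Python modules; the path is a placeholder) is to read a few records back from the training LMDB and confirm that labels and image shapes look right and that consecutive records are not all from the same class, which would point to an unshuffled database (convert_imageset only shuffles when given --shuffle):

import lmdb
from caffe.proto import caffe_pb2

env = lmdb.open('train_lmdb', readonly=True)  # placeholder path to the training LMDB
with env.begin() as txn:
    for i, (key, value) in enumerate(txn.cursor()):
        datum = caffe_pb2.Datum()
        datum.ParseFromString(value)
        # records come back in database order; long runs of identical labels
        # suggest the LMDB was written without shuffling
        print(key, datum.label, datum.channels, datum.height, datum.width)
        if i >= 20:
            break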

Related

How to evaluate multiple test runs using one ROC curve

I ran out of memory (11 GB VRAM) while testing my CNN with 10 test images. I'm using the U-Net architecture with 20 training images (1600x1200x1 each), 48x48 patches (190,000 sub-images) and a batch size of 32 (as recommended).
So right now I'm testing my network in 5 runs of 2 images each. After that I want to evaluate my network using one ROC curve.
So here are my questions: Can I evaluate my network if I split the testing? If yes, how can I manage it?
If not, what do I have to change in my configuration so that the memory doesn't run out?
By the way, I'm a beginner with neural networks, and I'm sorry for my bad English!
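For reference, splitting the test set only changes how the predictions are produced, not how they are evaluated: if you save the per-pixel scores and the matching ground-truth masks from every run, you can pool them afterwards and compute a single ROC curve over all 10 images. A minimal sketch with scikit-learn, where the random arrays are stand-ins for the saved predictions and masks of each run:

import numpy as np
from sklearn.metrics import roc_curve, auc

all_scores, all_labels = [], []
for run in range(5):  # 5 separate test runs of 2 images each
    # placeholders: in practice, load the probability map and the binary
    # ground-truth mask that this run produced
    scores = np.random.rand(2, 1600, 1200)
    labels = np.random.randint(0, 2, size=(2, 1600, 1200))
    all_scores.append(scores.ravel())
    all_labels.append(labels.ravel())

fpr, tpr, _ = roc_curve(np.concatenate(all_labels), np.concatenate(all_scores))
print('AUC over all 10 test images:', auc(fpr, tpr))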

'std::bad_alloc' while reading lmdb in caffe

I have created an LMDB file that contains non-encoded 6-channel images. When I load it into a network in Caffe, the system RAM usage (as seen with the 'top' command) is initially around 10% after the network is loaded, but it keeps increasing until it reaches above 90%. I am using a system with 32 GB RAM, and it slows down extremely until the code crashes with the following error:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Note that this happens even before running a single forward pass.
The size of the lmdb file I'm using is 545 MB.
I've used Python NetSpec to define the network. Here is the code:
net = caffe.NetSpec()
net.data0, net.label = CreateAnnotatedDataLayer(
    train_data, batch_size=1, train=True, output_label=True,
    label_map_file=label_map_file,
    transform_param=train_transform_param,
    batch_sampler=batch_sampler)
net.data, net.data_d = L.Slice(net.data0, slice_param={'axis': 1},
                               ntop=2, name='data_slicer')
Since my LMDB has 6-channel images and the pretrained network takes 3 channels, I am using a Slice layer to split each image into two 3-channel images that can be fed into two different convolutional layers.
Any suggestions would be helpful.
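One small diagnostic that may help narrow this down (a sketch assuming the lmdb Python bindings; the path is a placeholder) is to check how many records the LMDB holds and how large one stored record is, to see whether the raw data by itself can account for the RAM growth or whether memory is accumulating somewhere else, e.g. in the data-loading pipeline:

import lmdb

env = lmdb.open('my_6channel_lmdb', readonly=True)  # placeholder path
with env.begin() as txn:
    n_records = txn.stat()['entries']
    first_value = next(iter(txn.cursor()))[1]
print('records: %d, first record: %d bytes, rough total: %.2f GB'
      % (n_records, len(first_value), n_records * len(first_value) / 1e9))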

TensorFlow: CIFAR-10 multi-GPU example performs worse with more GPUs

I have to test the distributed version of TensorFlow across multiple GPUs.
I ran the CIFAR-10 multi-GPU example on an AWS g2.8xlarge EC2 instance.
The running time for 2000 steps of cifar10_multi_gpu_train.py (code here) was 427 seconds with 1 GPU (flag num_gpu=1). Afterwards the eval.py script returned precision @ 1 = 0.537.
Running the same example for the same number of steps (with one step being executed in parallel across all GPUs), but using 4 GPUs (flag num_gpu=4), took about 530 seconds, and the eval.py script returned only a slightly higher precision @ 1 of 0.552 (maybe due to randomness in the computation?).
Why is the example performing worse with a higher number of GPUs? I used a very small number of steps for testing purposes and was expecting a much larger gain in precision with 4 GPUs.
Did I miss something or make some basic mistake?
Has someone else tried the above example?
Thank you very much.
The cifar10 example uses variables on the CPU by default, which is what you need for a multi-GPU architecture. With 2 GPUs you may achieve about a 1.5x speed-up compared to a single-GPU setup.
Your problem has to do with the dual-GPU architecture of the Nvidia Tesla K80: it has an internal PCIe switch through which the two GPU dies communicate, and this introduces communication overhead (see the K80 block diagram, not reproduced here).
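To make the first point concrete, the pattern the multi-GPU example implements keeps the shared variables on the CPU while each GPU computes one "tower" of the loss, and the per-tower gradients are averaged before a single update. Below is a heavily simplified, hedged sketch of that pattern in TF 1.x style; the linear model is a made-up stand-in for the CIFAR-10 network, and NUM_GPUS is an assumption:

import tensorflow as tf

NUM_GPUS = 4

def tower_loss(x, y, reuse):
    # toy stand-in model; the real example builds the CIFAR-10 CNN here
    with tf.variable_scope('model', reuse=reuse):
        with tf.device('/cpu:0'):  # shared weights stay on the CPU
            w = tf.get_variable('w', [32 * 32 * 3, 10])
            b = tf.get_variable('b', [10])
    logits = tf.matmul(x, w) + b   # the math runs on this tower's GPU
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y))

x = tf.placeholder(tf.float32, [None, 32 * 32 * 3])
y = tf.placeholder(tf.int64, [None])
opt = tf.train.GradientDescentOptimizer(0.1)

tower_grads = []
x_splits = tf.split(x, NUM_GPUS)
y_splits = tf.split(y, NUM_GPUS)
for i in range(NUM_GPUS):          # one loss tower per GPU
    with tf.device('/gpu:%d' % i):
        loss = tower_loss(x_splits[i], y_splits[i], reuse=(i > 0))
        tower_grads.append(opt.compute_gradients(loss))

# average the per-tower gradients, then apply a single update to the CPU copy
avg_grads = []
for grads_and_vars in zip(*tower_grads):
    grads = [g for g, _ in grads_and_vars]
    avg_grads.append((tf.add_n(grads) / NUM_GPUS, grads_and_vars[0][1]))
train_op = opt.apply_gradients(avg_grads)

Because the variables live on one device, every step has to move gradients and updated parameters across the PCIe links, so for a short run of 2000 steps the communication and startup overhead can easily outweigh the per-step gain from the extra GPUs.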

Size of a random forest model in MLlib

I have to compute and keep in memory several (e.g. 20 or more) random forest models with Apache Spark.
I have only 8 GB available on the driver of the YARN cluster I use to launch the job, and I am running into OutOfMemory errors because the models do not fit in memory. I have already decreased spark.storage.memoryFraction to 0.1 to try to increase the non-RDD memory.
I thus have two questions:
How can I make these models fit in memory?
How can I check the size of my models?
EDIT
I have 200 executors which have 8GB of space.
I am not sure my models live on the driver, but I suspect they do, since I get OutOfMemory errors while there is plenty of free space on the executors. Furthermore, I store these models in Arrays.
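Regarding the second question, the MLlib tree-ensemble models expose their structure directly, which gives a first rough handle on their size (a forest grows roughly exponentially with maxDepth, and every model held in a driver-side Array lives in driver memory). A small hedged sketch using the PySpark API on toy data; the same numTrees/totalNumNodes accessors also exist on the Scala RandomForestModel:

from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import RandomForest

sc = SparkContext(appName='rf-size-check')

# toy stand-in data; replace with your real LabeledPoint RDD
data = sc.parallelize(
    [LabeledPoint(i % 2, [float(i), float(i % 7)]) for i in range(1000)])

model = RandomForest.trainClassifier(
    data, numClasses=2, categoricalFeaturesInfo={},
    numTrees=20, maxDepth=10)

# tree count and node count are the quantities that drive model size
print('trees:', model.numTrees(), 'total nodes:', model.totalNumNodes())

The usual levers are reducing maxDepth or numTrees, or giving the driver more memory via spark.driver.memory (or the --driver-memory flag of spark-submit).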

Deep-learning for mapping large binary input

This question may come across as too broad, but I will try to make every sub-topic as specific as possible.
My setting:
Large binary input (2-4 KB per sample) (no images)
Large binary output of the same size
My target: Using Deep Learning to find a mapping function from my binary input to the binary output.
I have already generated a large training set (> 1'000'000 samples), and can easily generate more.
With my (admittedly limited) knowledge of neural networks and deep learning, my plan was to build a network with 2000 or 4000 input nodes, the same number of output nodes, and try different numbers of hidden layers.
Then I would train the network on my data set (waiting several weeks if necessary) and check whether there is a correlation between input and output.
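As a scaled-down illustration of that plan (a hedged sketch, not a recommendation of this particular library for the full-size problem), the task can be phrased as multi-label classification with one output unit per bit. Here scikit-learn's MLPClassifier is used on toy data in which the "unknown mapping" is just a fixed bit permutation, and N_BITS is far smaller than the real 2000-4000:

import numpy as np
from sklearn.neural_network import MLPClassifier

N_BITS = 256                         # the real inputs would be 2000-4000 bits

rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(2000, N_BITS))
perm = rng.permutation(N_BITS)
Y = X[:, perm]                       # made-up target: a fixed bit permutation

# one input node per bit, one sigmoid output per bit (multi-label mode)
net = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=30)
net.fit(X, Y)
print('mean per-bit accuracy:', (net.predict(X) == Y).mean())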
Would it be better to feed my binary data into the net as single bits, or as larger entities (e.g. 16 bits at a time)?
For bit-by-bit input:
I have tried "Neural Designer", but the software crashes when I try to load my data set (even on small ones with 6 rows), and I had to edit the project save files to set Input and Target properties. And then it crashes again.
I have tried OpenNN, but it tries to allocate a matrix of size (hidden_layers * input nodes) ^ 2, which, of course, fails (sorry, no 117GB of RAM available).
Is there a suitable open-source framework available for this kind of binary mapping function regression? Do I have to implement my own?
Is Deep learning the right approach?
Does anyone have experience with this kind of task?
Sadly, I could not find any papers on deep learning + binary mapping.
I will gladly add further information, if requested.
Thank you for providing guidance to a noob.
You have a dataset containing pairs of binary-valued vectors with a maximum length of 4,000 bits, and you want to learn a mapping function between the pairs. On the surface, that doesn't seem unreasonable - imagine a 64x64 image with binary pixels; that only contains 4,096 bits of data and is well within the reach of modern neural networks.
As you're dealing with binary values, a multi-layered Restricted Boltzmann Machine would seem like a good choice. How many layers you add to the network really depends on the level of abstraction in the data.
You don’t mention the source of the data, but I assume you expect there to be a decent correlation. Assuming the location of each bit is arbitrary and is independent of its near neighbours, I would rule out a convolutional neural network.
A good open-source framework to experiment with is Torch, a scientific computing framework with wide support for machine learning algorithms. It has the added benefit of using your GPU to speed up processing thanks to its CUDA implementation, which would hopefully save you from waiting several weeks for a result.
If you provide more background, then maybe we can home in on a solution…