How to evaluate multiple test runs using one ROC curve - neural-network

I ran out of memory (11 GB VRAM) while testing my CNN with 10 test images. I'm using the U-Net architecture with 20 training images (1600x1200x1 each), 48x48 patches (190,000 sub-images) and a batch size of 32 (recommended).
So right now I'm testing my network in 5 runs of 2 images each. Afterwards I want to evaluate my network using a single ROC curve.
So here are my questions: Can I still evaluate my network if I split the testing like this? If yes, how can I manage it?
If not, what do I have to change in my configuration so that the memory doesn't run out?
By the way, I'm a beginner with neural networks, and I'm sorry for my bad English!
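Splitting the test run does not prevent a single ROC curve: the curve only needs the pooled prediction scores and ground-truth labels, so they can be collected run by run and the curve computed once at the end. A minimal sketch with scikit-learn, where load_batch and model.predict are placeholders standing in for the asker's own data loading and U-Net inference:

import numpy as np
from sklearn.metrics import roc_curve, auc

all_scores, all_labels = [], []
for run in range(5):                              # 5 test runs of 2 images each
    patches, masks = load_batch(run)              # placeholder: (N, 48, 48, 1) patches + binary masks
    scores = model.predict(patches, batch_size=32)
    all_scores.append(scores.ravel())             # flatten per-pixel probabilities
    all_labels.append(masks.ravel())

y_score = np.concatenate(all_scores)              # pool everything from all runs
y_true = np.concatenate(all_labels)
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC over all 10 test images:", auc(fpr, tpr))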

Related

yolov4.cfg: consequences of increasing the subdivisions parameter

I'm trying to train a custom dataset using the Darknet framework and YOLOv4. I built my own dataset, but I get an out-of-memory message in Google Colab. It also said "try to change subdivisions to 64" or something like that.
I've searched around for the meaning of the main .cfg parameters such as batch, subdivisions, etc., and I understand that increasing the subdivisions number means splitting into smaller "pictures" before processing, thus avoiding the fatal "CUDA out of memory". And indeed switching to 64 worked well. What I couldn't find anywhere is the answer to the ultimate question: are the final weight file and accuracy "crippled" by doing this? More specifically, what are the consequences for the final result? Leaving aside the training time (which will surely increase since there are more subdivisions to process), how will the accuracy be affected?
In other words: if we use exactly the same dataset and train using 8 subdivisions, then do the same using 64 subdivisions, will the best_weight file be the same? And will the object detection success rate be the same or worse?
Thank you.
Suppose you have 100 batches, with
batch size = 64
subdivision = 8
Darknet will divide each batch into 64/8 = 8 mini-batches of 8 images each.
It then loads and processes those 8 parts one by one in RAM; if you have low RAM/VRAM capacity, you can raise the subdivisions parameter accordingly.
You can also reduce the batch size, so that it takes up less memory.
This does nothing to the dataset's images.
It simply splits a batch that is too large to be loaded into memory at once into smaller pieces.
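For reference, these are the two lines in the [net] section of the .cfg file that the error message refers to (the values shown are only an example; raising subdivisions leaves the number of images per weight update unchanged):

[net]
# images per weight update; this stays the same when you raise subdivisions
batch=64
# the batch is split into this many mini-batches that are loaded onto the
# GPU one at a time, so a higher value needs less memory
subdivisions=64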

Caffe: How to train end-to-end (image to image)?

We are quite new to Caffe, but what we have seen so far looks really promising.
After reading a few papers (1, 2), we wanted to reproduce the result of 1, specifically regarding a segmentation challenge (4).
We downloaded the modified Caffe from 3 and were able to execute it, only to see that the trained network didn't work with the dataset from 4.
At first we thought that the network needed to be trained for the specific problem.
That led to the problem of how to do 'image-to-image (aka end-to-end) learning' (4, training data).
This led us to 'holistically nested edge detection' (HED, 2), where image-to-image learning seems to be used.
With HED, we were able to retrain the network on our own. But it doesn't work (it produces all-0 or all-0.5 images, i.e. black images :-( ) when we try to train the network for the dataset of 4. For initialization we wrote a script to calculate the mean map which we use for the dataset of 4.
Our questions are:
How can we reproduce the result mentioned in 1 by running image-to-image training?
or
How do you train networks where image-to-image learning is involved?
Since we only have 30 image-to-image pairs, should we implement deformation as mentioned in 1/3 via MATLAB/Python, or is there functionality within Caffe already?
Are we missing something simple from 1 or 2?
Kind regards,
Klaus and Bernhard
PS: We asked the same question at the caffe-user group and intend to post solutions at both locations.
After some time, and trying several different things out, I stumbled upon:
https://github.com/naibaf7
Using that Caffe fork, together with caffe_neural_models and caffe_neural_tool, training image (raw) to image (labels) can be done quite simply.
Just check out 'caffe_neural_models/net*' for different configurations.
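For orientation only, here is a rough sketch of what per-pixel (image-to-image) training can look like in plain Caffe prototxt, assuming the image/label pairs are packed into HDF5 files; this is just the general pattern, not the actual configuration shipped with caffe_neural_models:

layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  # train_h5_list.txt is a placeholder list of HDF5 files with image/label pairs
  hdf5_data_param { source: "train_h5_list.txt" batch_size: 1 }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param { num_output: 64 kernel_size: 3 pad: 1 }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "score"
  type: "Convolution"
  bottom: "conv1"
  top: "score"
  # one output channel per class, keeping the spatial resolution
  convolution_param { num_output: 2 kernel_size: 1 }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score"
  # label is a per-pixel integer class map of the same height and width
  bottom: "label"
}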

Deep-learning for mapping large binary input

This question may come across as too broad, but I will try to make every sub-topic as specific as possible.
My setting:
Large binary input (2-4 KB per sample) (no images)
Large binary output of the same size
My target: Using Deep Learning to find a mapping function from my binary input to the binary output.
I have already generated a large training set (> 1'000'000 samples), and can easily generate more.
With my (admittedly limited) knowledge of neural networks and deep learning, my plan was to build a network with 2,000 or 4,000 input nodes, the same number of output nodes, and to try different numbers of hidden layers.
Then train the network on my data set (waiting several weeks if necessary), and check whether there is a correlation between input and output.
Would it be better to input my binary data as single bits into the net, or as larger entities (like 16 bits at a time, etc)?
For bit-by-bit input:
I have tried "Neural Designer", but the software crashes when I try to load my data set (even on small ones with 6 rows), and I had to edit the project save files to set Input and Target properties. And then it crashes again.
I have tried OpenNN, but it tries to allocate a matrix of size (hidden_layers * input_nodes)^2, which, of course, fails (sorry, no 117 GB of RAM available).
Is there a suitable open-source framework available for this kind of binary mapping function regression? Do I have to implement my own?
Is Deep learning the right approach?
Does anyone have experience with this kind of task?
Sadly, I could not find any papers on deep learning + binary mapping.
I will gladly add further information, if requested.
Thank you for providing guidance to a noob.
You have a dataset containing pairs of binary valued vectors, with a max length of 4,000 bits. You want to create a mapping function between the pairs. On the surface, that doesn't seem unreasonable - imagine a 64x64 image with binary pixels – this only contains 4,096 bits of data and is well within the reach of modern neural networks.
As you're dealing with binary values, a multi-layered Restricted Boltzmann Machine would seem like a good choice. How many layers you add to the network really depends on the level of abstraction in the data.
You don’t mention the source of the data, but I assume you expect there to be a decent correlation. Assuming the location of each bit is arbitrary and is independent of its near neighbours, I would rule out a convolutional neural network.
A good open source framework to experiment with is Torch - a scientific computing framework with wide support for machine learning algorithms. It has the added benefit of utilising your GPU to speed up processing thanks to its CUDA implementation. This would hopefully avoid you waiting several weeks for a result.
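As a rough illustration of the bit-vector setup only (a plain fully connected network in PyTorch, not the stacked RBMs or the Lua Torch framework mentioned above): each output bit can be treated as an independent binary target trained with a per-bit binary cross-entropy loss.

import torch
import torch.nn as nn

N_BITS = 4000                                    # vector length as read by the answer above

model = nn.Sequential(
    nn.Linear(N_BITS, 1024),
    nn.ReLU(),
    nn.Linear(1024, 1024),
    nn.ReLU(),
    nn.Linear(1024, N_BITS),                     # one logit per output bit
)
criterion = nn.BCEWithLogitsLoss()               # per-bit binary cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy 0/1 batch standing in for the generated input/output pairs.
x = torch.randint(0, 2, (32, N_BITS)).float()
y = torch.randint(0, 2, (32, N_BITS)).float()

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
print(loss.item())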
If you provide more background, then maybe we can home in on a solution…

How to get significant speed up using Parallel Computing Toolbox of MATLAB in core i7 processor?

I am working on image processing. I have a computer with an Intel(R) Core(TM) i7-3770 CPU @ 3.40 GHz and 4 GB RAM. I want to parallelize our image-processing algorithm using the SPMD command of the PCT. For this I divided the image vertically into 8 parts, sent them to different labs, and using the SPMD command executed the image-processing algorithm in parallel on the different parts on the different labs.
I get the same (correct) answer as from the sequential code, but it takes more time than the sequential code. I have tried this with images ranging from very large to very small, but didn't get any significant speed-up.
How can I get a significant speed-up using the SPMD command?
Since you did not provide any code, I'll have to stick to a general answer. In all parallel computing there are several design considerations; the two most important are: first, is your code able to run in parallel, and second, how much communication overhead do you create?
Calling workers means sending information back and forth, so there is an optimum in parallel computing. Make sure you give your workers enough work, so that the time spent communicating to and from them is less than the time saved by computing in parallel.
Last but not least: if you provide a working code example the community is able to help you much better!
If you want to apply the same operation to several blocks within an image, then rather than worry about constructs such as spmd, you can just apply the command blockproc and set the UseParallel option to true. It will parallelize everything for you, without you needing to do anything.
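A minimal sketch of that route, assuming an open worker pool; the image file, block size and per-block function below are placeholders rather than the asker's actual algorithm:

% Requires the Image Processing Toolbox and the Parallel Computing Toolbox.
img = imread('myimage.tif');                 % placeholder input image
matlabpool open 8                            % parpool(8) in newer releases
myfun = @(bs) medfilt2(bs.data, [5 5]);      % stand-in for the real per-block operation
out = blockproc(img, [200 150], myfun, 'UseParallel', true);
matlabpool close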
If that doesn't work for you and you really have a requirement to implement your algorithm directly using spmd, you'll need to post some example code to indicate what you've tried, and where it's going wrong.

How to fully use the CPU in Matlab [Improving performance of a repetitive, time-consuming program]

I'm working on an adaptive and fully automatic segmentation algorithm for varying light conditions. The core of this algorithm uses Particle Swarm Optimization (PSO) to tune the fuzzy system, and believe me, it's very time-consuming :| For only 5 particles and 100 iterations I have to wait 2 to 3 hours! And that's just for processing one image from my data set, which contains over 100 photos!
I'm using MATLAB R2013 with an Intel Core i7-2670QM @ 2.2 GHz, 8.00 GB RAM, and a 64-bit operating system.
The problem is: when the program starts, it uses only 12%-16% of my CPU and only one core is working!
I've searched a lot and came across matlabpool, so I added this line to my code:
matlabpool open 8
Now when I start the program, Task Manager shows 98% CPU usage, but only for a few seconds! After that it drops back to 12-13% CPU usage :|
Do you have any idea how I can get this code to run faster?!
12 percent sounds like MATLAB is using only one thread/core, and that one at full load, which is normal.
matlabpool open 8 is not enough; this simply opens the workers. You have to use commands like parfor to assign work to them.
Further to Daniel's suggestion, ideally to apply PARFOR you'd find a time-consuming FOR loop in your algorithm where the iterations are independent, and convert that to PARFOR. Generally, PARFOR works best when applied at the outermost level possible. It's also definitely worth using the MATLAB profiler to help you optimise your serial code before you start adding parallelism.
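A tiny sketch of the pattern, where expensiveEvaluation is a placeholder for one independent iteration of your own loop (for example, evaluating one particle or one image):

matlabpool open 8                        % open the workers (parpool in newer releases)
results = zeros(1, 100);
parfor k = 1:100                         % iterations must not depend on each other
    results(k) = expensiveEvaluation(k); % placeholder for one independent piece of work
end
matlabpool close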
With my own simulations I find that I cannot recode them using parfor; the for loops I have are too intertwined to take advantage of multiple cores.
HOWEVER:
You can open a second (and third, and fourth, etc.) instance of MATLAB and tell each additional instance to run another job. Each open instance of MATLAB will use a different core. So if you have a quad-core, you can have 4 instances open and get 100% utilisation by running code in all 4.
So, I gained efficiency by having multiple instances of MATLAB open at the same time, each running a job. My jobs took 8 to 27 hours at a time, and as one might imagine, without liquid cooling I burnt out my CPU fan and had to replace it.
Also, do look into optimizing your MATLAB code; I recently optimized mine and now it runs 40% faster.